Raymond S. Nickerson
Tufts University
Preface
...experimental results that support the view that in many circumstances people's intuitions about statistics or probabilistic events are quite good.
My own interest in probabilistic reasoning derives in part from the belief
that we all engage in it, more or less constantly, that we sometimes reason
probabilistically in ways that suit our purposes very well and that we some-
times do rather poorly in this regard, and, finally, that a better understanding of
probability and our abilities and limitations as probabilistic thinkers might
help improve our thinking generally. But even if one could not legitimately
claim that there are compelling practical reasons for studying probability and
probabilistic thinking, I suspect that the allure of the topic, for me at least,
would remain. I find probability, chance, randomness, and closely related con-
cepts fascinating. I do not pretend to know what these terms mean in any deep
sense; in fact the more I have thought about them, the more elusive whatever
their referents are have become. I know that I am not alone in this respect and,
indeed, believe that I am in rather good company.
My motivation for writing this book, as nearly as I can tell, was to learn
something about probabilistic reasoning—about its origins, its status, its use in
various contexts, what recent research has revealed about it, and so on. One
should write a book, I have been told, with a specific audience in mind. This is
undoubtedly good advice, but I confess to not following it. What I have written
has been determined primarily by what I have found interesting. The book is
divided roughly into two parts. The emphasis in the first part, which includes
the first seven chapters, is historical and conceptual. The later chapters focus
more on what research has shown about people's abilities and limitations as
probabilistic thinkers.
Chapter 1 presents the history of the development of probabilistic ideas be-
ginning with the seminal correspondence between Blaise Pascal and Pierre de
Fermat in 1654 and continuing with the contributions of other 17th- and
18th-century figures. The concept of randomness, which is at once elusive and
central to probability theory, is the subject of chapter 2. Coincidences and the
question of what makes them interesting are discussed in chapter 3. The notion
of inverse probability, as represented by Bayes's theorem, and Bayesian deci-
sion making and reasoning are discussed in chapter 4. Chapter 5 presents a va-
riety of problems that illustrate some of the subtleties that can be involved in
dealing with probabilities. Chapter 6 continues this theme with a discussion of
several paradoxes and dilemmas involving probabilities. Chapter 7 gives an ac-
count of the birth of statistics and its increasing applications in science.
Chapter 8 reviews empirical work on the question of how well people are
able to estimate probabilistic variables and predict the outcomes of probabilis-
tic events. Chapter 9 continues the review of empirical work, now with a focus
on people's abilities to perceive covariation and contingency, and the factors
that affect their performance in this regard. Chapter 10 deals with decision
making under uncertainty and the notions of expected utility, inductive
heuristics, and probabilistic mental models. Chapter 11 focuses on the general
question of how good people are as intuitive statisticians. Some concluding re-
marks comprise chapter 12.
I can only hope that some readers will learn as much from reading the book
as I think I did from writing it, and with as much pleasure. As to who such read-
ers might be, I hope they might include psychologists, philosophers, and econ-
omists for whom probabilistic reasoning is an essential aspect of their efforts to
understand the workings of the human mind more generally, researchers study-
ing probabilistic reasoning, and students preparing to do so. I believe the book
could be used as a major text for a graduate or upper-class seminar on probabil-
istic reasoning, or as a supplementary text for graduate or upper-class courses
in cognitive psychology, philosophy, economics, or any subject for which
probabilistic reasoning is a key topic. I have tried to write also, however, for the
general reader who has an interest in the subject and little or no technical train-
ing beyond standard high school math. Familiarity with basic algebra should
suffice for the great majority, if not all, uses of mathematics, of which there are
relatively few.
I owe thanks to Jonathan Baron, Susan Butler, Richard Chechile, Julie
Downs, Ruma and Rafael Falk, R. Duncan Luce, Nathan Nickerson, Richard
Shiffrin, Salvatore Soraci, Howard Wainer, and Michael Wenger for helpful
comments on parts of the manuscript and/or stimulating conversations on one
or another aspect of probabilistic reasoning, to my granddaughter Amara
Nickerson for helping to organize and check references, and, as always, to my
wife Doris for her constant love, companionship, and encouragement.
CHAPTER 1

Probability and Chance
Probability is... the philosophical success story of the first half of the twentieth
century.
—Hacking (1990, p. 4)
Strictly speaking it may even be said that nearly all our knowledge is problemat-
ical; and in the small number of things which we are able to know with certainty,
even in the mathematical sciences themselves, the principal means for ascer-
taining truth—induction and analogy—are based on probabilities.
—Laplace (1814/1951, p. 1)
Subjective probabilities are required for reasoning ... a theory of partially or-
dered subjective probabilities is a necessary ingredient of rationality.
—Good (1983, p. 95)
BEGINNINGS
The use of chance devices and the drawing of lots for purposes of sortilege
and divination were common to many cultures of antiquity. Classical Greek lit-
erature contains numerous references to games of chance at least as early as the
Trojan wars, and there is evidence to suggest that such games were known in
Egypt and elsewhere long before then. One of the earliest known written docu-
ments about the use of chance devices in gaming is in the Vedic poems of the
Rgveda Samhita. "Written in Sanskrit circa 1000 B.C., this poem or song,
called the 'Lament of the Gambler,' is a monologue by a gambler whose gam-
bling obsession has destroyed his happy household and driven away his de-
voted wife" (Bennett, 1998, p. 34).
Gambling had become a sufficiently common form of recreation in Europe
by the time of the Roman Empire that laws forbidding it were passed—and
largely ignored. The emperor Caesar Augustus (63 B.C.-14 A.D.) was an avid
roller of the bones. The bones, in this context, were probably astragali, a form
of dice made from the heel bones of running animals. The astragalus, some-
times referred to as a talus, huckle-bone, or knuckle-bone, was shaped in such a
way that, when tossed, it could come to rest with any of four sides facing up, the
other two sides being somewhat rounded. It is known to have been used in the
playing of board games at least as early as 3500 B.C. Dice, which presumably
are evolutionary descendants of astragali, are known to have existed in the
Middle East as early as the third millennium B.C. According to Bennett
(1998), the earliest known six-sided die was made of baked clay around 2750
B.C. and was found in what was once Mesopotamia and is now Northern Iraq.
In view of the fact that chance devices have been used for a variety of pur-
poses for so long, scholars have found it difficult to explain why quantitative
theories of randomness and probability were not developed until relatively re-
cent times. Although why a theory of probability was so long in coming re-
mains unknown, there has been much speculation as to what some of the
contributing factors may have been. Hypothesized deterrents include perva-
sive belief in determinism, lack of opportunity to observe equiprobable sets
(astragali did not turn up all four faces with equal relative frequency), absence
of economic incentives, and lack of a notational system suitable for represent-
ing the critical ideas and computations. Each of these explanations has been
challenged, if not discredited to one or another degree (Hacking, 1975). There
appear what Gigerenzer et al. (1989) refer to as suggestive fragments of proba-
bilistic thinking in classical and medieval literature, and there may have been
some theoretical ideas about probability, especially in India, that are lost to us,
but apparently nothing approaching a systematic theoretical treatment of the
subject was attempted in Europe until the 17th century.
Gigerenzer et al. (1989) attribute considerable significance to the Reforma-
tion and Counter-Reformation and the associated clashes between extremist
views on faith and reason, which influenced attitudes regarding how beliefs
should be justified:
Confronted with a choice between fideist dogmatism on the one hand and the
most corrosive skepticism on the other, an increasing number of seventeenth-
century writers attempted to carve out an intermediate position that abandoned
all hope of certainty except in mathematics and perhaps metaphysics, and yet
still insisted that men could attain probable knowledge.... The new criterion for
rational belief was no longer a watertight demonstration, but rather that degree
of conviction sufficient to impel a prudent man of affairs to action. For reason-
able men that conviction in turn rested upon a combined reckoning of hazard and
prospect of gain, i.e. upon expectation. (p. 5)
They continue:
Blaise Pascal and Pierre de Fermat thought collaboratively about the stakes-di-
vision problem as a consequence of the question being put to Pascal by a gam-
bler, the Chevalier de Méré. Pascal and Fermat exchanged letters on the topic over
several months during the summer and fall of 1654. (The collaborators never
met before, during, or after their remarkable correspondence.) The correspon-
dence between Pascal and Fermat regarding the stakes-division problem is
worth considering in some detail, because it is viewed as one of the defining
events in the emergence of probability theory as a mathematical discipline.
And it provides a glimpse at the kind of struggle that the founders of probabil-
ity theory had in trying to understand their own intuitions and to make them ex-
plicit and clear. An English translation of what is preserved of the corres-
pondence—apparently the surviving record is not entirely complete— may be
found in David (1962, Appendix 4).
The specific situation that Pascal and Fermat considered was this. Two play-
ers are involved in a series of games and play is terminated when one of the play-
ers, say A, is two games short of winning the series and the other player, say B, is
three games short of doing so. How should the stakes be divided between the
players at this point? The tacit assumption is that A should get a larger share than
B because A would have been more likely to win the series had play continued,
but exactly what proportion of the total should each player receive?
Fermat had proposed a way of analyzing the situation. In a letter dated Monday,
August 24, 1654, Pascal repeats Fermat's analysis and argues that it works for two
players but will not do so if more than two are involved. Fermat's analysis starts
with a determination of the maximum number of additional games that would
have to be played to decide the winner of the series. In this case, the number is four,
because in the playing of four games either A will win two or B will win three.
Fermat's method then calls for identifying all possible outcomes of four games, of
which there are 16, determining the percentage of them that would be won by each
player, and dividing the stakes in accordance with the ratio of these percentages.
The situation is represented in Pascal's letter by the following tabular ar-
rangement in which each column represents one possible outcome of the four
games, with a representing a win of an individual game by A and b a win by B
(Pascal actually used 1s and 2s in the last row of the table where I have As and
Bs, but this is irrelevant to the point of the discussion):
a a a a a a a a b b b b b b b b
a a a a b b b b a a a a b b b b
a a b b a a b b a a b b a a b b
a b a b a b a b a b a b a b a b
A A A A A A A B A A A B A B B B
Certainly it is easy to see that it is absolutely equal and immaterial to them both
whether they let the game take its natural course, which is to cease play when
one man has won, or to play the whole four games: therefore since these two pro-
cedures are equal and immaterial, the result must be the same in them both. Now,
the solution is correct when they are obliged to play four games, as I have
proved: therefore it is equally correct in the other case. (David, 1962, p. 242)
a a a a a a a a a b b b b b b b b b c c c c c c c c c
a a a b b b c c c a a a b b b c c c a a a b b b c c c
a b c a b c a b c a b c a b c a b c a b c a b c a b c
So, given the need of A to win one game, and that of the other two players
each to win two, the playing of three games will not invariably produce an un-
ambiguous result. Pascal dismissed the possibility of simply adding up the to-
tal "wins" for each player and dividing the stakes in the proportion 19:7:7 on
the grounds that a combination that is favorable to two players (e.g., abb)
should not count as much for each of them as does a combination that is favor-
able to that player alone. One possibility that he considered is that of counting
those combinations that are favorable to two players as half a win for each of
them. This would give a division in the proportion 16:5.5:5.5.
Pascal argues that the latter solution would be fair if the players' intention
was to play three additional games and to share the winnings equally if there
happened to be two winners, but not if their intention was to play only until the
first player to reach his goal had done so. He contends that, given the second in-
tention, the solution is unfair because it is based on the false assumption that
three games will always be played. A similar assumption caused no difficulty
in the two-person situation considered earlier, but it is problematic here. The
difference between the situations is that in the first one a combination with
more than one winner could not occur whereas in this one it could. Pascal goes
on to say that if play is to be continued only until one of the players reaches the
number of games he needs, the proper division is 17:5:5. This result, he says,
can be found by his "general method," which is not described in the letter.
In a reply to Pascal, dated Friday, September 25, 1654, Fermat agrees that the
appropriate division in the three-person situation considered by Pascal is 17:5:5:
I find only that there are 17 combinations for the first man and 5 for each of the other
two: for, when you say that the combination a c c is favourable to the first man and to
the third, it appears that you forgot that everything happening after one of the play-
ers has won is worth nothing. Now since this combination makes the first man win
the first game, of what use is it that the third man wins the next two games, because
even if he won thirty games it would be superfluous? (David, 1962, p. 248)
Fermat seems to be saying that if one uses his combinatorial analysis and
credits a combination only to the first of the players to reach his goal, then one
will get the right proportion in a straightforward way. If we do this with the pre-
vious table, for example, and let the top-to-bottom order of the rows represent
the order in which the games are played, we get:
a a a a a a a a a b b b b b b b b b c c c c c c c c c
a a a b b b c c c a a a b b b c c c a a a b b b c c c
a b c a b c a b c a b c a b c a b c a b c a b c a b c
A A A A A A A A A A A A B B B A B C A A A A B C C C C
(The bottom row marks the winner of each column: 17 As, 5 Bs, and 5 Cs.)
Fermat argues that this method is general in the sense that it works just as
well no matter how many games are to be played. If it is done with four games,
for example, it will be seen that A wins 51 of the 81 possible combinations and
B and C each win 15, giving again the proportion 17:5:5. The tone of Fermat's
letter suggests that he understood all this before Pascal pointed out the "prob-
lem" with his analysis of the two-person game.
Fermat proposes another way to view the situation. I quote his presentation
of this view in its entirety, because it would be easy to change it subtly but ma-
terially by paraphrasing it:
The first man can win, either in a single game, or in two or in three.
If he wins in a single game, he must, with one die of three faces, win the first
throw. [Throughout the correspondence Fermat and Pascal use the abstraction
of an imaginary die with two or three faces.] A single [three-faced] die has
three possibilities, this player has a chance of 1/3 of winning, when only one
game is played.
If two games are played, he can win in two ways, either when the second player
wins the first game and he wins the second or when the third player wins the first
game and he wins the second. Now, two dice have 9 possibilities: thus the first
man has a chance of 2/9 of winning when they play two games.
If three games are played, he can only win in two ways, either when the second
man wins the first game, the third the second and he the third, or when the third
man wins the first game, the second wins the second and he wins the third; for, if
the second or third player were to win the first two games, he would have won the
match and not the first player. Now, three dice have 27 possibilities: thus the first
player has a chance of 2/27 of winning when they play three games.
The sum of the chances that the first player will win is therefore 1/3, 2/9 and 2/27
which makes 17/27.
And this rule is sound and applicable to all cases, so that without recourse to any
artifice, the actual combinations in each number of games give the solution and
show what I said in the first place, that the extension to a particular number of
games is nothing but a reduction of the several fractions to a common denomina-
tor. There in a few words is the whole mystery, which puts us on good terms
again since we both only seek accuracy and truth. (David, 1962, p. 248)
Although Fermat does not use it, his comments suggest the following tree
representation of the situation he described—the case of three players, one of
whom, say A, needs one more win and two of whom, B and C, both need two:
[Tree diagram not reproduced: each level represents one game, with equally likely branches for wins by A, B, and C; boxed outcomes mark the points at which play stops.]
The boxed outcomes in this representation indicate possible terminal
points. Play ceases after Game 1, for example, if that game is won by A, which
will happen with probability 1/3. If B wins Game 1, which also will happen
with probability 1/3, Game 2 is played. In 1/3 of those games A will win Game
2; so the probability that B wins Game 1 and A wins Game 2 is (1/3) x (1/3) or
1/9. And so on. The probability that A will win the series is the sum of the prob-
abilities of the individual ways in which A can win, which is 1/3 + 2/9 + 2/27 =
17/27. The probability that B will win the series is 1/9 + 2/27 = 5/27, as is the
probability that C will win it.
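The same answer can be computed by writing the tree as a recursion on the players' remaining needs. The following sketch (again my illustration, using exact fractions; the function name is mine) returns 17/27 for A and 5/27 for each of B and C:

    from fractions import Fraction

    def series_win_probabilities(goals):
        # goals maps each player to the number of games he still needs;
        # every game is assumed won by each player with equal probability.
        players = list(goals)
        for player, need in goals.items():
            if need == 0:  # this player has already won the series
                return {q: Fraction(int(q == player)) for q in players}
        p_game = Fraction(1, len(players))
        probs = dict.fromkeys(players, Fraction(0))
        for winner in players:
            reduced = dict(goals)
            reduced[winner] -= 1  # the branch in which this player takes the game
            for q, pr in series_win_probabilities(reduced).items():
                probs[q] += p_game * pr
        return probs

    print(series_win_probabilities({"A": 1, "B": 2, "C": 2}))
    # {'A': Fraction(17, 27), 'B': Fraction(5, 27), 'C': Fraction(5, 27)}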
I have considered the correspondence between Pascal and Fermat at some
length, because it provides a glimpse into the thinking of two giants of mathe-
matics—Todhunter (1865/2001) calls them the most distinguished mathemati-
cians of Europe at the time—as they attempted to deal with probabilistic
concepts before the theory of probability had been developed to any great de-
gree. (Fermat was not a mathematician by profession, but apparently did math-
ematics for the sheer pleasure that he derived from it. Extraordinarily
productive, E. T. Bell [1937] calls him "the prince of amateurs" [p. 56].) One
sees a progression of thought that involves a gradual increase in awareness of
the various dimensions of the problem and the realization that convergence on
a consensus requires making assumptions clear.
It is interesting to note that neither Pascal nor Fermat seemed intent on rush-
ing into print to establish the priority of their early work in probability theory. As
Todhunter (1865/2001) points out, it apparently was sufficient for each of these
men to gain the approbation of the other. Todhunter notes, too, that the theory
was advanced only little during the half century following their correspon-
dence—Pascal soon turned his attention from mathematics, Fermat died in
1665, and there soon were other topics, such as the differential calculus of New-
ton and Leibnitz, to capture mathematicians' attention. Pierre de Montmort,
writing shortly after the turn of the 18th century could claim, with only moderate
exaggeration, to be exploring a subject that after being only slightly noticed had
been entirely forgotten for sixty years (Todhunter, 1865/2001).
Montmort gives the following general solution to the "problem of points"
that was the subject of the Pascal-Fermat correspondence. Let m and n
represent the number of points that A and B, respectively, need to win the
game, and let p represent the probability of A winning a single trial and q
the probability of B doing so, p + q = 1. Let r = m + n - 1, the number of
trials in which the game must be decided. A's chance of winning is given by

$P(A) = \sum_{k=0}^{n-1} \binom{r}{k} p^{r-k} q^{k}$

and B's by

$P(B) = \sum_{k=0}^{m-1} \binom{r}{k} q^{r-k} p^{k}.$
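As a check on the reconstructed formulas above, the following sketch (an illustration under the stated definitions, not Montmort's notation) applies A's formula to the original two-player case, in which A is two games short, B three, and the players have even chances on each trial; it returns 11/16, matching the 11:5 division read off Pascal and Fermat's sixteen-outcome table:

    from fractions import Fraction
    from math import comb

    def montmort_chance(m, n, p):
        # Chance that player A, needing m more points, wins the series
        # against B, needing n more, when A wins each trial with probability p.
        q = 1 - p
        r = m + n - 1  # the series must be decided within r trials
        # A wins the series iff B takes at most n - 1 of the r trials.
        return sum(comb(r, k) * p ** (r - k) * q ** k for k in range(n))

    print(montmort_chance(2, 3, Fraction(1, 2)))  # 11/16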
Practical Applications
Although much of their work involved games of chance and the solutions of
abstract problems, many of the contributors to the development of theory
found ways to apply their work to socially significant problems. Some, includ-
ing Daniel Bernoulli and Jean le Rond D'Alembert, applied probability theory
to the question of the advisability of the use of smallpox vaccination;
Bernoulli, Euler, and others applied it to the computation of annuities; some
calculated rates for insurance that took account of the probabilities of specific
insurable events; several used the theory to argue the irrationality of participat-
ing in lotteries; Laplace applied it to problems in astronomy, to voting,
and—especially with his invention of the method of least squares—to the pre-
diction of error in many contexts.
As in mathematics more generally, problems that might appear to be solved
simply for the intellectual challenge they posed were later turned to some prac-
tical purpose. Daniel Bernoulli, for example, solved the following problem:
"In a bag are 2n cards; two of them are marked 1, two of them are marked 2, two
of them are marked 3,... and so on. We draw out m cards; required the probable
number of pairs which remain in the bag" (Todhunter, 1865/2001, p. 229).
Subsequently he found the solution to the problem useful in calculating tables
of remaining life expectancies of married couples (as couples), letting the cou-
ples be represented by the pairs of cards in the problem.
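Todhunter does not quote Bernoulli's answer in the passage above, but the expected number of surviving pairs follows from linearity of expectation: a pair remains in the bag exactly when neither of its two cards is among the m drawn. The following sketch (my illustration; the function names are mine) computes this and checks it by simulation:

    import random
    from collections import Counter
    from math import comb

    def expected_pairs(n, m):
        # A given pair survives iff both of its cards escape the m drawn,
        # so by linearity of expectation:
        return n * comb(2 * n - 2, m) / comb(2 * n, m)

    def simulated_pairs(n, m, trials=50_000):
        # Monte Carlo estimate of the same expectation.
        bag = list(range(n)) * 2  # two cards bearing each mark
        total = 0
        for _ in range(trials):
            remaining = Counter(bag) - Counter(random.sample(bag, m))
            total += sum(1 for count in remaining.values() if count == 2)
        return total / trials

    print(expected_pairs(10, 7))   # 4.105..., the exact expectation
    print(simulated_pairs(10, 7))  # should land close to the exact value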
People who wished to apply notions of probability or statistics to practical
problems did not feel compelled to wait until a firm theoretical foundation
had been built before doing so. Indeed, probability theory is an existence
proof, par excellence, of the principle that the usefulness of mathematics of-
ten outruns its theoretical justification. It is one of several examples that
could be given of an area of mathematics whose rigorous development was
spurred by interest in practical problems to which it could be applied and, as
such, it gives the lie to the idea that the best mathematics invariably comes
from interests other than practical ones. As Daston (1987a) puts it, "The doc-
trine of aleatory contracts stimulated the development of a mathematics of
chance and provided the fledgling theory of probability with both problems
and a conceptual framework within which to solve them" (p. 254). Today
probability theory is among the most extensively used areas of mathematics,
despite continuing debates about the meanings of its foundational concepts;
Kac (1964) refers to it as "a cornerstone of all the sciences" (p. 95). The ways
in which this area of mathematics has been applied have had a profound im-
pact on how we view ourselves and the world.
The importance of probability and statistics in the physical sciences was es-
tablished by their application to a variety of problems, first to astronomy nota-
bly by Laplace (Stigler, 1987), and later to the kinetic theory of gases by
Maxwell, Boltzmann, and Gibbs, to the analysis of Brownian motion by Ein-
stein and Smoluchowski, and to the theory of radioactive decay by Rutherford.
Pais (1986) identifies Rutherford's discovery of a half-life and Planck's devel-
opment of the theory of the quantum at about the same time as the two events
that marked the end of the era of classical physics and necessitated a funda-
mental revision of the concept of causality. That atoms decay spontaneously
and that the probability that a given atom will decay instantaneously is constant
over time are sufficiently well established in modern physics to be viewed as
facts and are invoked to explain a wide range of other phenomena. Why atoms
have this probabilistic property—why one cannot tell that a particular atom is
more or less likely than others of the same type to decay in a given time—re-
mains a mystery. The theory of the quantum has made probability central to
particle physics in many ways; indeed, one may say that the theory of probabil-
ity is fundamental to quantum physics.
More generally, although the term revolution is undoubtedly overworked, it
seems appropriate to apply it to the change in thinking effected by probability,
chance, and closely related concepts during the 19th and early 20th centuries.
As I. B. Cohen (1987) puts it:
The physical, biological, and social scientists of the twentieth century are al-
most universally aware that the establishment of a statistically based physics (ra-
dioactivity and quantum physics), biology (especially genetics), evaluation of
experimental data, and social science have constituted so sharp a break with the
past that no term of lesser magnitude than revolution should be used to charac-
terize this mutation. (p. 34)
WHAT IS PROBABILITY?
The vision and the intuition of the fair coin are fogged over and the road to
axiomatization is beset with pitfalls. So also are the philosophical and psycho-
logical bases of probability. (P. J. Davis & Hersh, 1981, p. 165)
Despite the many attempts no agreement as to what probability is appears to be
in sight. (Macdonald, 1986, p. 16)
The abstract noun "probability"—despite what we learnt at our kindergartens
about nouns being words that stand for things—not merely has no tangible coun-
terpart, referent, designation or what you will, not merely does not name a thing
of whatever kind, but is a word of such a type that it is nonsense even to talk
about it as denoting, standing for, or naming anything. (Toulmin, 1958, p. 65)
The concept of probability has been around in something close to its current
form for more than 300 years and has proved to be extremely useful. The theory
of probability is now an established area of mathematics with a well-stocked
store of axioms, theorems, corollaries, lemmas, and the rest. What exactly
"probability" means, or should mean, however, is far from established. Is there
such a thing as objective probability as distinct from a state of mind? Is probabil-
ity a property of a proposition or of an event? Is the concept of chance incompati-
ble with that of determinism? How does one determine the probability of a
possible event? These and many similar questions have been the subjects of de-
bate and discussion that began in the beginning and continue to this day.
One point on which there appears to be little debate is that whatever proba-
bility is it is closely related to the concept of chance. Uspensky (1937) put it
this way: "It is always difficult to describe with adequate conciseness and clar-
ity the object of any particular science; its methods, problems, and results are
revealed only gradually. But if one must define the scope of the theory of prob-
ability the answer may be this: The theory of probability is a branch of applied
mathematics dealing with the effects of chance" (p. 1). Certainly in what fol-
lows, we shall have many occasions to refer to chance, and the closely associ-
ated concept of randomness; it would not be possible to discuss probability
without use of them.
As to what constitutes chance, Laplace (1814/1951) offered the following
definition:
The theory of chance consists in reducing all the events of the same kind to a cer-
tain number of cases equally possible, that is to say, to such as we may be equally
undecided about in regard to their existence, and in determining the number of
cases favorable to the event whose probability is sought. The ratio of this num-
ber to that of all the cases possible is the measure of this probability, which is
thus simply a fraction whose numerator is the number of favorable cases and
whose denominator is the number of all the cases possible. (p. 7)
Laplace argued that when the probability is unknown, all possible values of
probability between 0 and 1 should be considered equally likely. (This view
has been controversial [Hacking, 1975; Polya, 1954b].) Together, Uspensky's
definition of probability and Laplace's definition of chance make a fairly tight
circle; the idea of equal possibility or likelihood keeps popping up in discus-
sions of both of these concepts. Laplace did, however, make provision for situ-
ations in which all the cases are not equally possible. In such cases, one should
"determine first their respective possibilities, whose exact appreciation is one
of the most delicate points of the theory of chance. Then the probability will be
the sum of the possibilities of each favorable case" (p. 11). (Laplace often used
possibility in much the way we today use probability.)
Peirce (date unknown/1956) referred to the theory of probability as "simply
the science of logic quantitatively treated" (p. 1334): "The general problem of
probabilities is, from a given state of facts, to determine the numerical proba-
bility of a possible fact. This is the same as to inquire how much the given facts
are worth, considered as evidence to prove the possible fact. Thus the problem
of probability is simply the general problem of logic" (p. 1334). Despite his use
of simply in these comments, Peirce saw probability as a controversial and not
well understood subject. "It may be doubted," he claimed, "if there is a single
extensive treatise on probabilities in existence which does not contain solu-
tions absolutely indefensible" (p. 1334). This state of affairs he attributed in
part to the lack of a regular method for calculating probabilities and in part to
the disputed nature of the fundamental principles of its calculus.
Probabilistic reasoning is an example, in Peirce's (date unknown/1956)
view, of explicative, analytic, or deductive reasoning, which he distinguished
from amplificative, synthetic, or, "loosely speaking," inductive reasoning. He
interpreted probability frequentistically and defined the probability of a mode
of argument "as the proportion of cases in which it carries truth with it" (p.
1336). The probability of the conclusion B given the premise A, for example,
was seen as the ratio of the number of times in which both A and B are true to
the total number of times in which A is true independently of the truth of B.
Peirce also recognized the distinction between probability "as a matter of fact,"
which is calculated as a ratio of two sets of events, one of which is a subset of
the other, and probability as "the degree of belief which ought to attach to a
proposition" (p. 1342), although he claimed that most writers had mixed the
two conceptions together. The latter view, "though answering well enough in
some cases" (p. 1346), he considered to be generally inadequate.
In contrast to the view that probabilistic reasoning is a form of deduction is a
close association of it with induction. "The two topics, induction and probability,
are nowadays so closely linked in philosophers' minds, that no explanation is felt
to be needed why a book, for example, which is about induction should also be
about probability" (Stove, 1986, p. 113). Stove argues that probability and induc-
tion are so closely linked in philosophers' minds that they "are often inextricably
confused" (p. 113). Perhaps the point on which one would be most likely to get
consensus among philosophers, mathematicians, and others who make use of the
concepts of probability and chance is that it is difficult indeed to find definitions on
which one can expect general agreement. As Carnap (1953) put it: "Practically ev-
eryone will say that probability as used in science has only one meaning, but when
you ask what that meaning is, you will get different answers" (p. 128).
Any probability should in principle be indexed with the name of the person, or
people, whose opinion it describes. (Edwards, Lindman, & Savage, 1963, p. 197)
A question about probability that has been around almost as long as the con-
cept and that persists to this day is whether statements such as "The probability
that the next toss of the die will come up 4 is 1/6" and "The chance of rain to-
morrow is about .8" should be viewed as anything more than reflections of
states of mind—expressions of expectations, of degrees of knowledge or igno-
rance, about the outcomes of uncertain events, or of levels of confidence in the
truth of specified assertions.
According to one view, the idea of chance is necessitated only by the limita-
tions of human knowledge. This view was held by many of the major contribu-
tors to probability theory, including Laplace and de Morgan. According to it,
all apparently probabilistic phenomena are, in fact, deterministic. They are un-
predictable because of our ignorance of the cause-effect relationships in-
volved, or because, even if we understand the relationships in principle, the
situations are too complex to be tractable computationally.
Hacking (1990) notes that the association of chance and probability with
lack of knowledge prevailed during what we now refer to as "the Age of Rea-
son." There is, in his characterization of the view, a hint of something bordering
on the disreputable; the need to invoke such concepts appears to have been seen
as an embarrassment to the human intellect:
Chance, superstition, vulgarity, unreason were of one piece. The rational man,
averting his eyes from such things, could cover chaos with a veil of inexorable
laws. The world, it was said, might often look haphazard, but only because we do
not know the inevitable workings of its inner springs. As for probabilities—
whose mathematics was called the doctrine of chances—they were merely the
defective but necessary tools of people who know too little. (p. 1)
Citing B. J. Shapiro (1983) and Daston (1988), Gigerenzer (1994) notes that
many connotations of probability were promoted at different times during the
18th and 19th centuries, including "physical symmetry (e.g., the physical con-
struction of dice, now called 'propensity'); frequency (e.g., how many people
of a given age died annually); strength of argument (e.g., evidence for or
against a judicial verdict); intensity of belief (e.g., the firmness of a judge's
conviction in the guilt of the accused); verisimilitude; and epistemological
modesty, among others" (p. 134). A remarkable aspect of the use of the concept
by Enlightenment probabilists, Gigerenzer suggests, is the ease with which
they slid from one meaning of the term to another.
Writing in the 18th century, D'Alembert suggested the need for a distinction
between what is metaphysically possible and what is physically possible. Any-
thing is metaphysically possible that is not ruled out as a conceptual absurdity;
something is physically possible only if it could actually occur: "It is meta-
physically possible to throw two sixes with two dice a hundred times running;
but it is physically impossible, because it never has happened and never will
happen" (Todhunter, 1865/2001, p. 262).
The considerable attention given to probability during the 20th century did
not result in a convergence on a definition to which all theorists would sub-
scribe. Not only do different writers continue to conceptualize probability in
different ways, many explicitly recognize two or more types of probability and
discuss the circumstances under which each meaning is to be used. W. Weaver
(1950), for example, makes a distinction between mathematical probability
and statistical probability. For mathematical probability, "equally likely
cases" is an undefined concept, playing much the same role in probability the-
ory that points and lines play in geometry. For statistical probability, the likeli-
hood of cases is determined empirically; the likelihood that a randomly
selected child will be a girl is the percentage of all children that are girls.
Margolis (1987) distinguishes between probability-g, or probability in the
gambling sense, and probability-b, which connotes believability or plausibility.
He argues that sometimes participants in experiments on probabilistic reasoning
answer questions on the basis of probability-b when the experimenter intended
that they answer them on the basis of probability-g.
Nagel (1936/1956) distinguishes three interpretations of probability: the
view formulated by de Morgan (and held more recently by Harold Jeffreys
[1934, 1939] and L. J. Savage [1954/1972, 1962] among others), which sees
probability as a state of mind or the strength of belief in the truth of a proposi-
tion; the view, associated with John Maynard Keynes (1921), that probability
is an unanalyzable but intuitively understandable logical relation between
propositions; and the view that equates probability with relative frequency in
the same sense as noted earlier. Nagel argues that probability can have different
meanings in different contexts. He sees the frequency interpretation (when
properly qualified) as the most satisfactory one in everyday discourse as well
as in applied statistics and measurement, and in many branches of the theoreti-
cal sciences. He sees it also as appropriate for interpreting some statements
about the probability of hypotheses, but does not think it the appropriate inter-
pretation in statements about the probability of complicated scientific theories.
Keynes (1921/1956) treats the concept of probability almost as an undefin-
able primitive:
No knowledge of probabilities, less in degree than certainty, helps us to know
what conclusions are true, and... there is no direct relation between the truth of a
proposition and its probability. Probability begins and ends with probability.
That a scientific investigation pursued on account of its probability will gener-
ally lead to truth, rather than falsehood, is at the best only probable. The proposi-
tion that a course of action guided by the most probable considerations will
generally lead to success, is not certainly true and has nothing to recommend it
but its probability. The importance of probability can only be derived from the
judgment that it is rational to be guided by it in action; and a practical depend-
ence on it can only be justified by a judgment that in action we ought to act to
take some account of it. (p. 1373)
Keynes expects that people with the same education would tend to agree re-
garding what the probability of a specified uncertain event would be.
Ayer (1965) distinguishes three types of judgments of probability: judg-
ments of a priori probability, statistical judgments, and judgments of credibil-
ity. As illustrations of the three, he gives a statement of the likelihood of tossing
double six with a pair of true dice, the observation that any given unborn child
is slightly more likely to be a boy than a girl, and the claim (as of 1965) that
there is little chance that Britain will join the Common Market. Ayer distin-
guishes also five different types of events that are described as happening by
chance: (a) a member of a series that conforms with the a priori calculus of
chances (theory of probability), (b) a deviation from an established frequency,
(c) an event that was not intended by the agent that caused it, (d) concurrent
(coincidental) events for which there is no ready causal explanation, and (e)
specific outcomes of statistically determined processes (on average, one of six
tosses of a die will show a three; which tosses show a three is a matter of chance
in this sense).
L. J. Cohen (1982) identifies five different interpretations of probability:
"We can interpret the probability-function that is regulated by the classical cal-
culus of chance either as a ratio of a priori chances, or as a relative frequency of
empirically given outcomes, or as a causal propensity, or as a logical relation,
or as a measure of actual or appropriate belief" (p. 253). He does not contend
that one of these interpretations is right and the others wrong: "Just as we can
usefully measure quantities of apples in at least three different ways (by num-
ber, by weight or by volume), so too, if we desire, we can measure probabilities
in several different ways according to the purpose in hand" (p. 253).
Continuing the escalation, Good (1983b) explicitly distinguishes seven
kinds of probability: tautological (mathematical), physical, and five types of
intuitive (logical, subjective, multisubjective, evolving, and psychological).
He defines subjective probability as "psychological probability to which some
canons of consistency have been applied" and logical probability as "the sub-
jective probability in the mind of a hypothetical perfectly rational man" (p.
122). Logical probability is sometimes referred to as credibility.
In sum, two points are clear: First, there is not a consensus among people
who use probability theory about what probability means, and second, it is pos-
sible to distinguish several different connotations that have been given to the
term. A comment by Toulmin (1958) describes the situation well:
The attempt to find some "thing," in terms of which we can analyze the solitary
word "probability" and which all probability-statements whatever can be
thought of as really being about, turns out therefore to be a mistake.... To say
that a statement is a probability-statement is not to imply that there is some one
thing which it can be said to be about or express. There is no single answer to the
questions, "What do probability-statements express? What are they about?"
Some express one thing: some another. Some are about to-morrow's weather:
some about my expectation of life. If we insist on a unique answer, we do so at
our own risk. (p. 70)
Probability and determinism are often viewed as antithetical ideas. The be-
havior of deterministic systems is considered to be lawful and, at least in
principle, predictable. That of probabilistic systems is seen to be erratic and
inherently unpredictable in detail. In fact, completely deterministic systems
can produce unpredictable behavior. Mathematical chaos, complexity the-
ory, and other areas of mathematics that focus on nonlinear systems present
many examples of unpredictably complex—some would say truly ran-
dom—behavior produced by systems whose operation is governed by rela-
tively simple deterministic rules. Some observers believe this fact reveals a
deep truth about the universe and our ability to know it: "The conclusion must
be that even if the universe behaves like a machine in the strict mathematical
sense, it can still happen that genuinely new and in-principle unpredictable
phenomena occur" (Davies, 1988, p. 55).
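A tiny numerical experiment makes the point concrete. The logistic map below is a standard example from chaos theory (offered here as an illustration; it is not drawn from the sources cited above): a fully deterministic rule under which two starting values differing by one part in a billion soon follow effectively unrelated trajectories:

    # The logistic map x -> 4x(1 - x): a fully deterministic rule.
    x, y = 0.4, 0.4 + 1e-9  # starting points one part in a billion apart
    for step in range(1, 51):
        x = 4 * x * (1 - x)
        y = 4 * y * (1 - y)
        if step % 10 == 0:
            print(f"step {step:2d}: x = {x:.6f}  y = {y:.6f}")
    # By roughly step 30 the tiny initial difference has grown to order
    # one, and the two trajectories are no longer recognizably related.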
Not only can deterministic systems be unpredictable, but the behavior of proba-
bilistic systems can be highly predictable in the aggregate; at the appropriate level
of description, probabilistic phenomena are remarkably lawful and predictable,
and this is a primary reason why probability theory has been of such great interest
and of use in so many contexts. It would seem that "we can't get away from deter-
minism. Chase it out the door, by postulating total incoherence, and it comes back
through the window, in the guise of statistical laws" (Ekeland, 1993, p. 50).
The relationship between probability and determinism is not easy to de-
scribe in a few words, in part because both terms are used to represent a variety
of concepts. We have already noted several connotations that have been given
to probability. Determinism has been conceived in several ways as well.
Gigerenzer et al. (1989) distinguish five—metaphysical, epistemological, sci-
entific, methodological, and effective—and argue that whereas "the empire of
chance" has shaken scientific determinism, it has left the other types intact.
Some students of chance have seen statistical regularity as a form of deter-
minism. De Moivre (1756/1962), for example, expressed that view this way:
"If from numberless observations we find the ratio of events to converge to a
determinate quantity, as to the ratio of P to Q, then we conclude that this ratio
expresses the determinate law according to which the event is to happen" (p.
264). Buckle (1857-1861) saw statistical regularities as evidence of a lawful-
ness that could be countermanded neither by chance nor by human will. Davies
(1992) captures something of this idea in declaring a difference between
stochasticity and anarchy.
Heated and extensive debate has occurred also as to whether the phenome-
non of statistical regularity should be seen as evidence against human freedom.
No one denies the reality of the phenomenon; the debate has centered on the
question of whether aggregate regularity is consistent with freedom at the level
of the individual. Like most such debates, this one has had no clear winner, or at
least no losers willing to acknowledge themselves as such.
A particularly interesting aspect of the debate, because it tells us something
about human reasoning, is the contrast between attitudes prevailing in the early
part of the 19th century and those more common a century later: "In the 1930s,
the conviction that the laws of nature are probabilistic was thought to make the
world safe for freedom. The incoherence went in the opposite direction in the
1830s: if there were statistical laws of crime and suicide, then criminals could
not help themselves. In 1930, probability made room for free will; in 1830, it
precluded it" (Hacking, 1990, p. 136). Hacking goes on to note, however, that
doubts about whether the fact of statistical regularity really does support the
idea of free will persist: "The cool-headed analytic view says that a statistical
law may apply to a population, but members of the population remain free to do
as they please. The law applies only to the ensemble of individuals.... Despite
this glib and comfortable opinion, we have not made our peace with statistical
laws about people. They jostle far too roughly with our ideas about personal re-
sponsibility" (p. 117).
Given the course that physics took during the 20th century, the history of
the development of probability theory and statistics now seems a little ironic.
Quetelet and others wanted to quantify social and psychological phenomena
in the fashion of the physical sciences, in order to give the former the cer-
tainty enjoyed by the latter, and the application of statistics to social and psy-
chological phenomena was central to this effort. Porter (1986) summarizes
the attitudes among social scientists regarding the application of statistics to
their field this way:
The evident success of statistics as an approach to social science was not inter-
preted by contemporaries as vindication of a metaphysic which regarded the
laws governing certain domains as only probable. On the contrary, statistical
laws were deliberately formulated to extend the certainty of sciences like astron-
omy and mechanics to knowledge of phenomena which hitherto had resisted ex-
act scientific investigation. (p. 69)
The irony is that an area of mathematics that was seen to be the avenue to
certainty regarding social and psychological phenomena subsequently became
instrumental in relieving us of much of the certainty with which we had viewed
the physical world before the appearance of quantum mechanics on the scene.
The acceptance of indeterminism has been referred to as "one of the most strik-
ing changes of modern scientific thought" (Porter, 1986, p. 149), and the dis-
covery that the physical world is not deterministic as "the most decisive
conceptual event of twentieth century physics" (Hacking, 1990, p. 1).
In quantum mechanics, the probabilistic nature of the properties of sub-
atomic particles is considered to be not a matter of the limitations of our
knowledge, but inherent to the particles themselves. Our inability to deter-
mine both the position and momentum of a particle to a high degree of accu-
racy is not just a consequence of the crudeness of our measuring techniques,
and it is not even only a result of the fact that the act of determining one of
these properties has an effect on the other property; according to some inter-
pretations of the theory, a particle does not have a precise location and a pre-
cise momentum at the same time.
The emergence of probability theory and statistics as mathematical disci-
plines was not the only—perhaps not even the primary—cause of the demise
of determinism in physics. This was forced by the results of experiments that
could not be accounted for in classical deterministic terms. The fact that by
the time these experimental results were obtained, probability theory and sta-
tistics were well established meant that scientists had the concepts and tools
they needed to accommodate the new results. One wonders what course
physics might have taken had these concepts and tools not been at hand.
Surprisingly, acceptance of the idea that the universe is nondeterministic
at the quantum level has had the effect of greatly increasing physicists' abil-
ity to predict phenomena at higher levels of organization. Discovery of the
indeterminate character of the building blocks of nature has somehow made
their behavior in the aggregate more understandable and susceptible to
modification and control. In short, indeterminism at the level of the individ-
ual particle does not seem to rule out the possibility of determinism at the
level of their aggregate behavior. This observation applies equally to the be-
havior of inanimate physical systems and to that of individual people and
large groups. Koestler (1972) speaks of the paradox of the lawfulness of
chance: "The paradox consists, loosely speaking, in the fact that probability
theory is able to predict with uncanny precision the overall outcome of pro-
cesses made up out of a large number of individual happenings, each of
which in itself is unpredictable. In other words, we observe a large number
of uncertainties producing a certainty, a large number of chance events cre-
ating a lawful outcome" (p. 25).
In order for the probability of the drawing of k blue and 4 - k white balls
to be equal for all values of k from 0 to 4, it would have to be the case that

$\binom{4}{0}(1-p)^{4} = \binom{4}{1}p(1-p)^{3} = \binom{4}{2}p^{2}(1-p)^{2} = \binom{4}{3}p^{3}(1-p) = \binom{4}{4}p^{4},$

and there is no value of p for which these equalities can all hold.
Nevertheless, the principle of indifference is a controversial one, and at least
as sometimes expressed, can lead to contradictions. (See the discussion of
Bertrand's paradox in chap. 6.) Carnap (1953) illustrates this by applying
Jeffreys' (1939) claim that in the absence of a reason to consider one hypothe-
sis more likely than another the probabilities are equal—to the situation in
which the question is what probability to assign to the drawing of a ball of a
specified color from an urn containing balls of three colors, say blue, red, and
yellow, in unknown proportions:
Let us consider as a starting hypothesis that the first ball we draw from the urn will
be blue. According to Jeffreys' (and Laplace's) statement of the principle of indif-
ference, if the question is whether the first ball will be blue or not blue, we must as-
sign equal probabilities to both these hypotheses; that is, each probability is 1/2. If
the first ball is not blue, it may be either red or yellow, and again, in the absence of
knowledge about the actual proportions in the urn, these two have equal probabili-
ties, so that the probability of each is 1/4. But if we were to start with the hypothe-
sis that the first ball drawn would be, say, red, we would get a probability of 1/2 for
red. Thus Jeffreys' system as it stands is inconsistent. (p. 132)
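The conflict Carnap describes is easy to set out explicitly. In the following sketch (my illustration of the quoted argument), each way of framing the question yields an internally coherent assignment, but the three framings contradict one another, since blue receives 1/2 under one framing and 1/4 under the others:

    from fractions import Fraction

    # Apply the indifference rule separately to each color: "the first
    # ball is X or not X" gives X probability 1/2, and the two remaining
    # colors 1/4 each. Each framing is internally coherent, yet the
    # framings disagree about every color's probability.
    colors = ("blue", "red", "yellow")
    for favored in colors:
        assignment = {c: Fraction(1, 2) if c == favored else Fraction(1, 4)
                      for c in colors}
        print(f"favoring {favored}: {assignment}")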
Probability in its most general use is a measure of our degree of confidence that a
thing will happen. (Tippett, 1941/1956, p. 1485)
Whatever else it may be, the idea of probability is a reflection of the fact that
our knowledge of the world is often uncertain and incomplete. Because our in-
dividual knowledge bases about specific aspects of the world differ, the proba-
bilities we assign to possible events should sometimes differ also. If a fair coin
were about to be tossed, you and I would undoubtedly agree that the probabil-
ity that the toss would result in a head is .5. If the coin had already been
tossed, but neither of us had seen the outcome, we both would again consider
the probability of head to be .5. If you now looked at the coin but I did not, the
probability of head for you would immediately become either 0 or 1, but for me
it would remain at .5. Your uncertainty about this aspect of the world has been
resolved by the information you received from your observation, but my uncer-
tainty remains. Or if one of us knew the coin tosser to be using a trick coin
weighted so as to produce a head on about 75% of the tosses, and the other did
not have this knowledge, again the probability of the outcome of the upcoming
toss would be different in our minds.
In this book, a number representing a probability will be taken to reflect a
belief regarding the likelihood of the occurrence of a particular event. Gen-
erally it will be assumed that such beliefs are based on, and are consistent with,
either assumptions of physical symmetry (as would be involved in coin tossing
or dice throwing) or objective relative frequency data (as, for example, in actu-
arial statistics) when such data are available. When notions of symmetry are
not relevant or relative frequency data are not available (as when estimating the
probability of a third world war before the middle of the 21st century), such a
number will be taken to reflect the judgment of an individual or group, the basis
of which may or may not be possible to make explicit.
For purposes of describing experimental investigations of reasoning under
uncertainty, it may not be necessary to make much of the distinction between rel-
ative frequency and probability. To be sure, some investigators have had people
estimate relative frequencies and some have had them estimate probabilities, but
typically the basis for judging the accuracy of the probability estimates (when
that has been a concern) has been relative-frequency data. When, for example,
people have been asked to estimate the probability that one will be the victim of
some type of accident, it has been assumed that the true probabilities are seen in
past-incidence statistics. In this book, when discussing experimental work on
reasoning under uncertainty, I will follow the convention of using the same terms
(relative frequency, probability) as have the investigators whose work is cited,
and generally will not make anything of the differences in terminology.
Until the development of the theory of quantum mechanics early in the 20th
century, the need to make use of statistical probability was assumed to be a re-
flection of the limitations of our powers of observation and understanding of
physics. The outcome of the toss of a coin, for example, was assumed to be pre-
dictable, in principle, though not in practice, given the current state of under-
standing of the interacting causal forces. With the development of the theory of
quantum mechanics came acceptance of the idea that some properties of the
universe are inherently probabilistic, and are not predictable, except statisti-
cally, even in principle. In quantum theory, distributions of quantum events are
predictable, but individual events are not. The intensity of a field at a particular
point is said to equal the probability of the occurrence of the field's associated
quantum particle at that point; whether it will actually be observed at that point
can be predicted only probabilistically until the observation is made.
The debate between those who hold a frequentist view of probability and
those who hold a subjectivist view is unlikely to be resolved soon; there are
knowledgeable people on both sides of the divide who see their position as ob-
viously correct and who fail to understand how anyone could find the opposing
view tenable. Fortunately, the debate represents little threat to probability cal-
culus as a mathematical discipline; the mathematical theory of probability, es-
pecially as axiomatized by Kolmogorov, appears to be immune to disagree-
ments about what probability really is. It is remarkable that the theory has
proved to be so practically useful in so many contexts, despite the continuing
lack of consensus regarding what the basic concept means, but a similar obser-
vation can be made about mathematics more generally.
G. Shafer (1993) argues that an axiomatic theory of probability can be based
on axioms pertaining to any of three ideas—knowledge of the long run, fair
price (or fair odds), and warranted belief—that the concept of probability ties
together in what he calls a circle of reasoning. But whichever idea is selected as
foundational, one cannot do full justice to probability without involving the
other two ideas as well; the three were inextricably intertwined in the emer-
gence of mathematical probability. Historically, Shafer attributes the develop-
ment of the theory of fair price in games of chance to Pascal, Fermat, and
Huygens in the 1650s, and the step from fair price to probability during the fol-
lowing 50 years primarily to Jacob Bernoulli, who also proved the law of large
numbers, which is fundamental to knowledge of the long run. Knowledge of
the long run was brought back to fair price by Condorcet in the 1780s, thus
making the circle complete. The contemporary thinker can enter this circle at
any point, Shafer suggests, but getting the ideal picture of probability requires
that one appreciate all three ideas and the connections among them.
Fair coins, if tossed repeatedly, will come up heads approximately half of the
time. No one, no matter what one's understanding of what probability "really is,"
doubts that this is so. But why is it so? What determines that heads and tails will
occur with about equal frequency? This question pertains to physical or aleatory
probability, and it is one that, in one form or another, has perplexed probability
theorists, and other people who have thought about such things, for a long time.
Consider again Bernoulli's limit theorem. What gives us confidence that
what this theorem claims—that "in the limit" the relative frequencies with
which specific events are observed in a sample will be arbitrarily close to the
probabilities of those events—is true? We cannot check it empirically, because
we cannot examine a sample that is "close to infinity" in size. Moreover, sup-
pose we could do so. If we found that the observed relative frequencies of spe-
cific events in such a sample differed from the assumed probabilities of those
events, we would revise our assumptions regarding the probabilities. If we
found, for example, that a large number of tosses, say a million, of a coin
yielded heads and tails in the ratio 55:45, we would conclude not that the limit
theorem is wrong but that the probability of that particular coin coming up head
is not .5 but closer to .55. And we would do so on the strength of Bernoulli's
limit theorem or reasoning of a similar kind. But is this not circular?
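The circularity is easier to feel with a concrete example. The following simulation sketch (in Python; the .55 bias and the sample sizes are illustrative assumptions, not anything from the text) tosses a coin whose true probability of heads is .55. As the number of tosses grows, the observed proportion settles near .55, and it is precisely such observations that would lead us to revise our assumption about the coin rather than to abandon the limit theorem.

    # Simulate tossing a coin with a hypothetical bias of .55 and watch
    # the relative frequency of heads approach that value.
    import random

    random.seed(1)                     # fixed seed for a reproducible run
    p_true = 0.55                      # assumed probability of heads
    for n in (100, 10_000, 1_000_000):
        heads = sum(random.random() < p_true for _ in range(n))
        print(n, "tosses:", heads / n)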
What are we to say to Gigerenzer et al.'s (1989) claim that Bernoulli's
limit theorem "created a model of causation that was essentially devoid of
causes" (p. 29)? Is it true, as these writers suggest, that this new model "aban-
doned all search for mechanisms, for the hidden springs and principles that
ran the clockwork of the world" (p. 30)? "In Bernoulli's urn model," they ar-
gue, "numbers generated numbers; the physical processes by which they did
so were wholly inscrutable" (p. 30). Are we closer to an understanding of
what probability really is today than when Bernoulli's theorem and urn
model were originally articulated?
Early literature on probability contains references to possibilities, propensi-
ties, proclivities, and facilities, but such terms have little explanatory power.
Sometimes the circularity of attempts to define or account for probability was
glaring, as when Laplace, in his first paper on the topic, noted that probability
could be defined in terms of a ratio among cases, so long as the cases are
equally probable (Oeuvres, VIII, p. 10; noted in Hacking, 1975, p. 131). Fortu-
nately, as already noted, it is not necessary to be able to explain probability in
order to be able to use the concept to good effect.
It would be very unlikely for unlikely events not to occur. (Paulos, 1990, p. 37)
The essence of chance is that anything is possible, even the improbable.
(Ekeland, 1993, p. 113)
Just how great do the odds against a particular outcome have to be before one can
be confident that the outcome is not the result of chance? (Devlin, 2000b, p. 14)
Poincaré (1913/1956) described the birth of a great man as the greatest bit of
chance:
It is only by chance that the meeting occurs of two germinal cells of different sex, contain-
ing precisely, each on its side, the mysterious elements whose mutual reaction
must produce the genius. One will agree that these elements must be rare and
that their meeting is still more rare. How slight a thing it would have required to
deflect from its route the carrying spermatozoan. It would have sufficed to de-
flect it a tenth of a millimeter and Napoleon would not have been born and the
destiny of our continent would have been changed. No example can better make
us understand the veritable characteristics of chance. (p. 1392)
The fact that any particular hand is so improbable is undoubtedly one of the
reasons people can play such a game as bridge for many years without becom-
ing bored with it. One can easily play the game regularly for many years with-
out ever seeing the same hand twice.
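The improbability is easy to quantify. The number of distinct 13-card hands that can be dealt from a standard 52-card deck is C(52,13) = 52!/(13! × 39!) = 635,013,559,600, so the probability of receiving any particular hand in a random deal is about 1 in 635 billion, the basis of the "less than 1 in 600 billion" figure cited below.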
Paulos' claim that it would be silly to decide that one could not have been
dealt the bridge hand that one holds on the grounds of its improbability is
unlikely to provoke much debate. The bridge hand illustration makes the
case too for the observation by Devlin (2000b) that "simply computing the
probability of an event is not enough to decide whether some phenomenon
is the outcome of chance or design" (p. 14). But suppose now that someone
predicts the cards that her next hand will contain and it turns out that she is
dealt precisely the predicted cards. The probability that the prediction
would prove to be accurate, assuming a random deal, is less than 1 in 600
billion, the same as the probability of getting any particular unpredicted
hand, but in this case we would all be surprised indeed that the prediction
came true and we would surely believe that the deal was not quite as random
as it was supposed to be.
Why are we surprised in the second case and not in the first? Should we be
surprised in the second case and not in the first? What evokes the surprise, or
what should do so, is not the fact that a low-probability hand has been dealt but
that in the second example one was able to predict the dealing of a particular
low-probability hand. If we registered surprise at every occurrence of a low-
probability event, we would exist in a state of constant amazement, because all
events are, from one or another perspective, low-probability events. The ability
to predict the occurrence of specific low-probability events, however, is some-
thing we do not, generally speaking, have.
To make the same point in slightly different terms, we can note that life is
full of situations in which we can say that an improbable event is certain to oc-
cur—that the probability that an improbable event will occur is 1. Such events
are represented by a lottery with, say, 1,000,000 participants. Assuming the
lottery is fair, the probability that any specific participant will win is very
small, 1/1,000,000, close enough to 0 to be considered 0 for practical purposes.
The probability that someone will win, however, is 1. Some individual will feel
very fortunate after the fact and may be inclined to wonder "why me?" But it
had to be somebody; the only way to avoid that is to not have the lottery, and
most of the real-life events for which this is symbolic do not give us that option.
I have argued that the chance occurrence of a low-probability event should
not necessarily evoke surprise, but that the chance occurrence of a predicted
low-probability event should; the occurrence of a predicted low-probability
event is likely to make us wonder whether it really occurred by chance. This is
not the whole story, however; sometimes we are truly surprised by low-proba-
bility events even if they were not predicted, and it is hard to argue convinc-
ingly that we should not be. Consider again the bridge hand. We are not
surprised when we are dealt a hand that has less than 1 chance in 600 billion of
being dealt, but we would be surprised if dealt a perfect hand—13 cards of the
same suit—even though the probability of receiving that particular hand is just
the same as that of getting any other particular hand.
Why are we more surprised at getting a perfect hand than at getting any partic-
ular hand? Again we come back to the distinction between specific events and
events with specific properties. Our surprise at being dealt a particular bridge
hand is determined, at least in part, I suspect, by the relative size of the set of pos-
sible hands that share salient features with the hand in hand. There are only four
members of the set of perfect hands, whereas most specific hands have features
in common with many other hands. The probability of getting a hand with, say,
four cards in each of two suits, three in a third, and two in the fourth is very large
by comparison with the probability of getting one with all cards in the same suit.
The probability of getting a hand with a mix of face and number cards is much
larger than the probability of getting one with all face cards. And so on. We need
not suppose that one can calculate, or even accurately estimate, the probabilities
involved in order to have some rough feel for them. People who play the game
frequently are likely to acquire from experience an approximate knowledge of
the relative frequency with which hands with certain properties of interest occur,
and even the most inexperienced novice is likely to recognize that a hand with a
mix of suits is much more probable than one with all cards of the same suit.
The distinction between small-probability events that are noticed after the
fact and those that are predicted in advance is critically important for many
purposes; however, I do not mean to suggest that low-probability events that
are noticed after the fact should always be dismissed as chance events. If I toss
a coin 10 times and it comes up head every time, I am surprised, whereas if it
comes up TTHTHHHTHT, I am not, despite the fact that I recognize the proba-
bility of each sequence to be precisely the same—about .001. The all-heads se-
quence makes me wonder about the fairness of the coin and the other one does
not. Are my different reactions justified? The reader who feels they are not
might imagine tossing the coin 100 times, or 1,000 times, and asking whether a
sequence of all heads would be more surprising than a particular sequence
with a mix of heads and tails, despite the fact that any two specific 100-toss (or
1,000-toss) sequences are equally probable.
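The probabilities just mentioned are easily verified: each toss halves the probability of any particular sequence, so a specified 10-toss sequence has probability (1/2)^10 = 1/1024, or about .001, and a specified 100-toss sequence has probability (1/2)^100, roughly 8 × 10^-31.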
The argument I want to make is that discovered low-probability regularities
can be legitimate bases for surprise and that they can serve as reasonable stim-
uli for hypothesis formation and can motivate searches for causal explanations.
Getting all heads in even only 10 tosses of a coin justifies, in my view, some
suspicion that something peculiar is going on. I would want to entertain—and
test—some hypothesis other than that the coin is fair. The observation of the 10
heads does not prove that the tossing was a nonrandom process, but it raises the
suspicion and rightfully prompts the search for a causal explanation. When
does perceived low-probability structure warrant investigation? When should
it be taken as suggestive evidence of a nonchance effect? When does it beg a
causal explanation?
These are difficult questions, and they pose challenges for statistical decision
making, a subject to which we return in chapter 10. It suffices to note here that ap-
parent structure can happen by chance and apparent structure will happen by
chance. A search for structure among the outputs of a random process is bound to
succeed. It does not follow, however, that all apparent structure should be dis-
missed as effects of chance.
Paradoxically, chance lies at the root of most of the uniformities of the world we
are familiar with. (Barrow, 1990, p. 297)
The existence of chance, far from opposing the order of the universe, manifests
ever more cogently the existence of order. (Nogar, 1966, p. 292)
Chance has been evoked many times in the preceding discussion; it is time
to take stock of this concept. We speak of "chance devices," "chance events,"
and "chance variations." We refer to "effects of chance," noting that this or that
event was "due to chance" or "happened by chance." We refer to many games
as "games of chance." The importance of the concept to mathematics and sci-
ence is seen in references to the "mathematics of chance," the "theory of
chance," the "doctrine of chances," "the science of chance phenomena," and
the, some would say incongruous, term "the laws of chance."
Chance is a controversial concept. As I see it, the word has several connota-
tions as used in the scientific and mathematical literature. I note three:
SUMMARY
Some conception of chance dates to antiquity, as evidenced by such practices
as the drawing of lots, the rolling of bones, and other forms of gambling. De-
spite this fact, the development of a quantitative theory of probability did not
begin in earnest until the 17th century. Accounts of the thinking of the early de-
velopers of this theory reveal struggles between conflicting intuitions regard-
ing what the answers to specific questions of probability should be. Such
conflicting intuitions were often resolved by consensus, but not always.
As to what probability really means, differences of opinion persist.
Frequentist (or statistical) and subjectivist (or judgmental) connotations repre-
sent one, and perhaps the most important, distinction in both historical and
contemporary views. The development of probability theory as a mathematical
discipline seems not to have been impeded by the lack of consensual closure
about meaning; the theory has been, and continues to be, used to good effect in
numerous areas of practical application.
The concept of chance, which is central to probability theory, also has sev-
eral connotations in the literature, and what, precisely, it should be taken to
mean is still a matter of debate. That what are generally considered to be
chance events are lawful, which is to say predictable, in the aggregate is be-
yond dispute, thus the peculiar notion of "laws of chance." But chance is not an
explanatory concept, at least in the view of the writer; why chance events are
lawful remains an unanswered question.
CHAPTER 2
Randomness
Whatever your views and beliefs on randomness—and they are more likely than
not untenable—no great harm will come to you.
—Kac (1983, p. 406)
Some Conceptions
During the 1870s, William Shanks published the value of π to 707 places, a
prodigious feat, given that the computation was done entirely by hand. Three
quarters of a century passed before someone produced, with the help of com-
puting machinery, an approximation with a larger number of digits. Before this
time, the last 200 or so digits of Shanks' approximation had been suspect, be-
cause some digits were represented noticeably more than others. The digits of π
are expected to be randomly distributed (strictly, this is the conjecture that π is
a normal number, which remains unproved), and therefore should appear with
about equal frequency in any sizeable part of its
decimal approximation. As it turned out, Shanks' approximation was indeed
incorrect after about the 500th decimal place (Ogilvy & J. T. Anderson, 1966).
This story illustrates the equal-representation conception of randomness:
Given a process that selects repeatedly (with replacement) from a finite set in
such a way that every member of the set has the same chance as every other of
being selected on each occasion, we would expect every member of the set to
be represented approximately the same percentage of times among the selected
items after a large number of selections.
This property is easily misunderstood. If an urn contains three black balls
and one white one, we do not expect black and white balls to be represented
equally in a large number of random selections, with replacement, from the
urn. What we mean by equal representation is that in a large number of random
selections, each ball will be represented approximately the same number of
times, or that on any given draw every ball, independently of color, has the
same chance of being drawn.
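The digit-frequency check that cast suspicion on Shanks' figure is easy to rerun today. Here is a minimal sketch in Python, assuming the mpmath arbitrary-precision library (any comparable tool would serve):

    # Count how often each digit appears among the first 1,000 decimal
    # places of pi; each count should be roughly 100 if the digits are
    # about equally represented.
    from collections import Counter
    from mpmath import mp, nstr

    mp.dps = 1010                      # working precision, with some slack
    digits = nstr(mp.pi, 1001)[2:]     # the 1,000 digits after "3."
    counts = Counter(digits)
    for d in "0123456789":
        print(d, counts[d])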
A random set or sequence is said to be irregular in the sense that it is rela-
tively unstructured. I say "relatively unstructured" because structure or regu-
larity is a matter of degree. A measure defined by Pincus (1991) and his
colleagues (Pincus & Kalman, 1997; Pincus & Singer, 1996), which they call
approximate entropy (ApEn), can be computed for any sequence and indicates
the degree to which the sequence approximates maximum irregularity. ApEn is
maximum when all k-tuples of digits are as nearly equally represented in the
sequence as possible. Unpredictability, in the sense that knowledge of a part of
a set or sequence does not provide a basis for predicting any other part, follows
from the lack of structure; the greater the irregularity or lack of structure, the
less the predictability.
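Pincus's statistic has a precise definition in the papers cited; the following Python sketch is only a simplified, exact-match rendering of the idea for short strings of 0s and 1s, but it shows the direction of the measure: a highly regular string scores near zero, a more irregular one scores higher. The block length and the test strings are illustrative choices.

    # A rough, exact-match version of the approximate-entropy idea for
    # binary strings; larger values indicate greater irregularity.
    import math

    def apen(s, m=2):
        def phi(k):
            n = len(s) - k + 1
            blocks = [s[i:i + k] for i in range(n)]
            # For each block, the fraction of blocks identical to it;
            # every block matches itself, so the count is never zero.
            return sum(math.log(blocks.count(b) / n) for b in blocks) / n
        return phi(m) - phi(m + 1)

    print(apen("10101010101010101010"))  # regular: close to 0
    print(apen("00101110100010110101"))  # irregular: noticeably larger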
A sequence is said to be incompressible if a description of it cannot be given
that is shorter than the sequence itself. The sequence of even numbers starting
with 2 and ending with 1,000 can be described (to wit, the last few words) with
fewer characters than would be required to write out the sequence in its entirety;
so this sequence is compressible. Another way to make the same point is to say
that the sequence is such that it can be generated by a rule or procedure that is
shorter than the sequence. The procedure in this case could be: Start with 2; let
each successive integer be the preceding integer plus 2; end with 1,000. Follow-
ing this line of thought, a random sequence is sometimes defined as a sequence
the shortest description of which is itself (Chaitin, 1975; Kolmogorov, 1965;
Martin-Löf, 1966). The sequence 10101010101010101010 ... is easily com-
pressed, for example, as "alternating 1s and 0s," or as "10 repeated indefinitely."
The sequence 00101110100010110101 ... is not so readily compressed. So ac-
cording to the compressibility criterion, the second sequence might be consid-
ered random, but the first clearly would not.
Because any sequence can be represented in binary form, the compressibil-
ity criterion is sometimes expressed in terms of the length of a computer pro-
gram, represented in binary form, that would generate the sequence of interest
relative to the length of the sequence itself, also expressed in binary form. The
sequence is said to be compressible if the program is shorter than it. Any rela-
tively short binary number is likely to be shorter than a program that would be
able to generate it, but if the number is very long and has enough structure to
permit it to be produced by a simple rule, the program could be shorter than the
number itself.
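Kolmogorov complexity itself is not computable, but an ordinary compression routine gives a rough practical feel for the distinction. In the Python sketch below (the string lengths are arbitrary, and zlib is only a crude stand-in for the shortest possible program), the patterned string shrinks to a few dozen bytes while the irregular one does not compress at all:

    # Compare the compressed sizes of a patterned string and an
    # irregular string of the same length.
    import os, zlib

    patterned = b"10" * 5_000             # "1010...": 10,000 bytes
    irregular = os.urandom(10_000)        # 10,000 bytes of OS-supplied noise
    print(len(zlib.compress(patterned)))  # a few dozen bytes
    print(len(zlib.compress(irregular)))  # close to, even above, 10,000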
Some Difficulties
All of these properties have been invoked to characterize randomness, but, al-
though each contributes to the richness of the concept, each also has limita-
tions. The expectation of relatively equal representation, for example, is an
expectation and neither a necessary nor sufficient condition of randomness. It
is not necessary because a random process can produce a set in which the items
do not appear with anything close to equal frequency. It is not sufficient, be-
cause equal representation can be obtained by a process that samples in a
highly regular and deterministic way. One would ensure equal representation
of the 10 digits in a large number of digits, for example, by selecting the digits
in order, from 0 to 9, and repeating this process as many times as necessary to
get a sample of the desired size.
Total irregularity appears to be unattainable. According to an area of mathe-
matics pioneered by Frank Ramsey, any sizable set will contain structured sub-
sets within it, no matter how it was produced (Graham, Rothschild, & Spencer,
1990; Graham & Spencer, 1990).
The idea of minimal description or incompressibility is closely related to the
concept of randomness as the absence of pattern or organization. The location of
dots on a page would be said to be random if the simplest way to "describe" its
"organization" were to present the page itself. If the arrangement admits of a
simpler description, or if it could be generated by a procedure that could be de-
scribed more tersely than the page—that would not have to give the coordinates
of the individual dots—it is not random according to this conception.
One attractive consequence of defining randomness in terms of incompress-
ibility (or absence of structure) is that it appears to help avoid the type of per-
plexing question that was raised in the preceding chapter at the beginning of
the discussion of what constitutes an improbable event. By most conceptions
of randomness, the outcome of the toss of a fair coin is considered a random
event and a sequence of heads and tails produced by 20 tosses would be consid-
ered a random sequence. But, as already noted, the probability of getting a
specified highly patterned sequence, say, alternating heads and tails, is pre-
cisely the same as the probability of getting a specified sequence with no dis-
cernible pattern, or a little less than one in a million in both cases.
The compressibility idea seems to permit us to avoid the question of why we
should consider one of two equally likely outcomes to be random and the other
not, given that they are equally probable. Clearly the one sequence is com-
pressible whereas the other is not. We can state a simple rule for generating the
first one, but, it would appear, there is no better way to specify the second one
than to write it out explicitly. But consider the following sequence:
00111101011100001010...
This sequence was produced by the following rule: Choose a number x between
0 and 1. Compute the sequence 2x, 2²x, 2³x, .... For each number in this sequence
substitute 1 when its fractional part is less than ½ and 0 otherwise.
The value of x that I used was .38. For many choices of x, this rule will pro-
duce a sequence of Is and Os that will pass most, if not all, of the tests of ran-
domness that one might wish to apply (Kac, 1983). The short—20-digit—
sequence that I produced is shorter than the expression of the rule, so giving the
rule does not demonstrate its compressibility; but it could be made arbitrarily
long by simply applying the rule many times, and any long sequence produced
by this procedure can be described in compressed form by stating the rule and
specifying the value of x that was used to produce the sequence. And each time
the rule is applied with the same value of x, precisely the same random-looking
sequence will be produced.
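Here is a minimal sketch that reproduces the sequence above from the rule and x = .38 (the implementation, including the use of exact rational arithmetic to avoid floating-point error after many doublings, is mine, not Kac's):

    # Repeatedly double x, keep the fractional part, and emit 1 when that
    # fractional part is below one half.
    from fractions import Fraction

    x = Fraction(38, 100)              # the value of x used in the text
    bits = []
    for _ in range(20):
        x = (2 * x) % 1                # fractional part of 2x, 2**2*x, ...
        bits.append("1" if x < Fraction(1, 2) else "0")
    print("".join(bits))               # prints 00111101011100001010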
There are other formulas that will generate numbers that appear to be ran-
dom—that will pass many of the tests that are conventionally used to establish
randomness. One such is the recursive formula x_{n+1} = kx_n(1 - x_n), 0 < x_1 < 1. For
some values of k, x stabilizes at some number; at other values of k, x oscillates
successively between two or more numbers. But for certain values of k, x varies
haphazardly (Paulos, 1992).
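A short illustration (the starting value is arbitrary; k = 4 is a value commonly cited as producing chaotic behavior):

    # Iterate x_{n+1} = k * x_n * (1 - x_n) and print the first few values.
    def logistic(k, x, n):
        values = []
        for _ in range(n):
            x = k * x * (1 - x)
            values.append(round(x, 3))
        return values

    print(logistic(2.8, 0.3, 10))   # settles toward a fixed point near .643
    print(logistic(4.0, 0.3, 10))   # wanders with no evident pattern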
These examples illustrate the asymmetry of the compressibility test for ran-
domness. Anyone who saw a sequence produced by these processes would be
hard-pressed to find a way to compress them if he or she did not know the for-
mula that produced them. If one can find a way to compress a sequence, one
can say with certainty that, by the criterion of compressibility, the sequence is
not random. Inability to find a way to compress a sequence, however, does not
guarantee that none exists, so the most one can say in this case is that the se-
quence has not been shown to be nonrandom.
Random to Whom?
How should one test for randomness? One answer to this question that has been
proposed is that there is no way to demonstrate, beyond all doubt, that the out-
put of a given process is truly random. Horwich (1982), for example, argues
that no finite sequence can be shown to be entirely representative of the long-
term output of a stochastic process; conversely, one can argue that any finite se-
quence could be the output of a stochastic process, because, according to the
theory, every sequence that has nonzero probability of occurring will be pro-
duced by a random process if only it runs long enough.
It is sometimes held that establishing that a nonrandom sequence is not ran-
dom is more straightforward than establishing that a random sequence is ran-
dom (Wagenaar, 1972), but this is debatable. Given that a random process can,
and will, produce sequences that have as much structure as one pleases, one
can never say with certainty by inspecting a sequence that it is not random if
one defines a random sequence as a sequence produced by a random process.
Randomness as a Contingent Property
Twenty tosses of a coin that produced 20 heads would, by just about any
test, be judged to be nonrandom, but 20 consecutive heads embedded in a
sequence of a few million tosses would not be considered evidence that the
entire sequence was nonrandom; to the contrary, the larger sample would be
suspect if it had no runs of this length. It follows from this type of consider-
ation that it is possible to have many sequences that pass conventional tests
of randomness that, when joined together, yield a much longer sequence
that fails because it does not contain sufficiently long runs (Ayton, Hunt, &
G.Wright, 1991a).
Suppose we invoke the idea of compressibility. Devlin (2000b) gives the
following sequence as one he generated by rolling a fair die 42 times: 1,3,3,6,
5,6,6,4,3,5,2,4,6,1,5,2,1,2,5,6,1,1,4,3,5,4,1,4,4,1,5,1,3,3,2,2,6,2,
6,4,3,5. This sequence probably satisfies the criterion of noncompressibility;
it is hard to imagine producing a description of it that is shorter than the se-
quence itself. So, according to this criterion, we would call the sequence ran-
dom. Devlin goes on to argue:
But suppose I doctored the list a bit, changing the first 1 to a 2? Not only would
such a sequence be the product of design (because of my tampering with the
original result), but it would also carry the mark of design, inasmuch as each
number from one through six would occur exactly seven times. Still, there seems
to be no way to specify the new list of numbers that is shorter than the sequence
itself.... Hence, according to Kolmogorov's definition, the sequence is random,
despite the element of design in its generation. (p. 15)
There are, of course, many ways in which a circular area can be divided into
subareas of equal size. Figure 2.2 shows a few of them. Which of these divi-
sions or other possibilities should be used to test whether the distribution of
points in Fig. 2.1 is random? Presumably it should make no difference. If a test
using one division showed the distribution to be random and one with a differ-
ent division showed it to be nonrandom, we would wonder either about our
concept of randomness or about the appropriateness of our test for it.
If we divide the circle into four quadrants, a chi-square test is likely to come
out one way if the division is made horizontally and vertically and another way
if it is made with the two diagonals (bottom two divisions in Fig. 2.2). In the first case, the
test will not force rejection of the null hypothesis and will therefore lead us to
conclude that the assumption of a random distribution is tenable. In the second
case, the test will make us reject the hypothesis of no difference and conclude
that the distribution is not random. So if we want to believe that the distribution
is random, we should perform the test with the first division, and if we prefer to
believe that it is not random, we should do it with the other one.
It is generally agreed by statisticians that this way of proceeding is cheating.
In situations of this sort, one is likely to be able to find a way of partitioning the
world so as to get a desired outcome if one has the opportunity to scrutinize the
data before deciding how to design the test. This is one of the reasons that many
statisticians insist that the hypotheses that are to be tested by experimentation
and the statistical procedures that are to be used to test those hypotheses be
specified in detail before the experimental data are collected.
FIG. 2.2. Possible ways to partition the area of a circle to test for the randomness of the
distribution of points in the circle.
FIG. 2.3. The area of a circle partitioned as it might be to determine whether points
within it representing the locations of darts thrown at a bull's-eye are randomly dispersed
over the entire area. The area of the inner circle is equal to half the area of the larger one,
that is, equal to the area of the larger circle minus the area of the smaller one.
the annulus surrounding it. But all the points that are less than half the length of
the radius of the target from the center of the target define an area, as shown in
Fig. 2.4, that is only one quarter the area of the whole target, or one third of the
area of the annulus surrounding it. It would be possible to have a distribution of
dart locations that would be considered clustered around the center of the target
according to the first criterion, represented by Fig. 2.3, but that would be con-
sidered a chance distribution by the second criterion, represented by Fig. 2.4.
What appears to be a difficulty follows from the nonlinear relationship be-
tween a circle's area and its radius, A = πr², and in particular the fact that
πr²/2 = π(r/√2)². When the areas of two circles are in the ratio 2:1, their respec-
tive radii are in the (approximate) ratio 1.41:1; when their radii are in
the ratio 2:1, their respective areas are in the ratio 4:1. So which of the two ap-
proaches considered is the appropriate one to use to test the question of
whether or not the dart locations are random with respect to the center of the
target? The answer is one cannot say without being more precise about what
one wants to mean by random in this context. Imagine being given the task of
distributing points on the target area in such a way that their locations would be
FIG. 2.4. The ratio of the areas of the outer and inner circles is 4:1; the ratio of the radii
of the outer and inner circle is 2:1.
random with respect to the center. One way to proceed would be to superim-
pose a grid on the target and select the x, y coordinates of each point using a pair
of numbers from a table of random numbers. This would produce a distribution
of points that is expected to be roughly the same in any two regions of equal
area, and, given this operational definition of randomness, the first approach
mentioned earlier would be appropriate.
Alternatively, one might determine the random points, again by drawing
pairs of numbers from a random number table, but this time using the first num-
ber of each pair to designate the distance of a point from the center and the sec-
ond one to determine the orientation of a radius drawn through the point with
respect to the 360 degrees of the circle. This would produce a distribution that
is uniform with respect to distance from the center, but with a greater density of
points for a region closer to the center than for a region of equal area farther
from the center, and given this operational definition of randomness, the sec-
ond approach mentioned earlier would be appropriate.
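The difference between the two operational definitions is easy to demonstrate numerically. In the Python sketch below (the point count is an arbitrary choice), points generated by the x, y method fall inside a central circle of half the total area about half the time, whereas points generated by the radius-and-angle method fall there about 71% of the time, because a uniformly drawn radius is less than r/√2 with probability 1/√2:

    # Compare the two ways of generating "random" points on a unit disk.
    import math, random

    random.seed(2)
    N = 100_000
    inner_xy = inner_ra = 0
    cutoff = 1 / math.sqrt(2)          # the radius enclosing half the area
    for _ in range(N):
        # Method 1: uniform x, y inside the disk (rejection sampling).
        while True:
            x, y = random.uniform(-1, 1), random.uniform(-1, 1)
            if x * x + y * y <= 1:
                break
        inner_xy += math.hypot(x, y) < cutoff
        # Method 2: uniform distance from center (the uniformly drawn
        # angle would not affect this count, so it is omitted).
        inner_ra += random.uniform(0, 1) < cutoff
    print(inner_xy / N)                # about 0.50
    print(inner_ra / N)                # about 0.71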
One may have good reason to prefer one of these definitions of randomness
over the other for a specific purpose, but neither can be said to be correct in an
absolute sense. The important point is that failure to be specific about what one
means by randomness in particular contexts can be problematic. We will en-
counter this fact again in the chapter on paradoxes and dilemmas, in particular
in a discussion of Bertrand's paradox.
Another point regarding testing for randomness that is implicit in the pre-
ceding comments is that variables can be random in some respects and struc-
tured in others. One might say, for example, that the distribution of points in a
circle is random with respect to the horizontal dimension, but nonrandom with
respect to the vertical. Or we might describe a distribution as random with re-
spect to angular displacement from the vertical but nonrandom with respect to
distance from the center.
The rule that hypotheses and statistical testing processes should be specified
before the collection of experimental data seems a good one to apply whenever
experimentation is to be done. This does not mean, however, that what appears
to be structure that was not predicted or anticipated should be ignored. When
one discovers by observation that there is a way of looking at a data set that
makes it appear nonrandom, as, for example, when one observes that one way
of partitioning a space is likely to produce statistical evidence of structure
whereas another partitioning is not, this can be a useful hint for further experi-
mental exploration. Such evidence of structure must be considered tentative,
and should never be reported without a clear explanation of the role that selec-
tion played in revealing it, but it can be of considerable interest, nonetheless.
Gilovich (1991) discusses an interesting case of after-the-fact assessment
of the randomness of a distribution of points over an area based on studies by
Clarke (1946) and D. Johnson (1981). The question of interest was whether
the V-1 bombs dropped on central London by the German air force during
World War II were randomly distributed over the entire area. Visual inspec-
tion of the distribution (see Gilovich, p. 20) suggests a greater-than-chance
concentration in the northwest and southeast quadrants. And indeed, a
chi-square test would permit rejection of the hypothesis of equal distribution
over the four quadrants with a high degree of confidence. But is the perfor-
mance of such a test appropriate, given the absence of an a priori reason for
partitioning the area into quadrants by dividing it north-south and east-west,
and the fact that interest in the possibility of a nonrandom distribution was
stimulated by inspection of the data? Gilovich says unequivocally no, on the
grounds that "with hindsight it is always possible to spot the most anomalous
features of the data and build a favorable statistical analysis around them" (p.
21). The reason for partitioning the map into four rectangular quadrants, in-
stead of dividing it into four sectors with two bisecting diagonals, say, is to
maximize the sensitivity of the test to the structure that we think we see by in-
specting the figure.
A more extreme case of selectivity in testing for randomness is seen in the
following illustration. Consider a set of points distributed as shown in Fig. 2.5.
It is easy to imagine a partitioning of this space for which a chi-square test
would provide evidence of randomness. If one divides the area into 16 equal
squares as shown in Fig. 2.6, for example, one gets a chi-square of 11 that, with
15 degrees of freedom, yields a p of about .75, so one cannot reject the hypothe-
sis of a chance distribution. But in inspecting the original distribution, one may
notice that there appear to be more points in the upper right and lower left quad-
rants than in the upper left and lower right. If one divides the total area into four
equal-size squares, as in Fig. 2.7, and does a chi-square test, one gets a
chi-square of about 8.6 that, with 3 degrees of freedom, yields a p of less than
.05, so one can now reject the hypothesis of random distribution.
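Both p values are easy to check with standard software. A sketch assuming the scipy library is available (chi2.sf gives the upper-tail probability of the chi-square distribution):

    # Reproduce the two p values cited in the text.
    from scipy.stats import chi2

    print(chi2.sf(11.0, df=15))   # about .75: randomness not rejected
    print(chi2.sf(8.6, df=3))     # about .035: rejected at the .05 level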
Is one entitled to conclude from this second analysis that the points are not
randomly distributed within the entire area? Many statisticians would say defi-
nitely not. I want to argue that the question requires a qualified answer. One can
certainly say that the distribution of points in the space does not pass the partic-
ular (second) test of randomness that was used. Because of the way in which
the test was applied, one does not have a good basis for drawing any firm con-
clusions; however, the observed pattern of results raises a question as to
FIG. 2.5. Is the distribution of points in the square random?
FIG. 2.6. A chi-square test with this partitioning of the square supports the hypothesis
that the points are distributed randomly.
FIG. 2.7. A chi-square test with this partitioning of the square supports the hypothesis
that the points are not distributed randomly. There appears to be greater clustering in the
lower left and upper right quadrants than in the upper left and lower right ones.
whether there may be some causal reason why the points are clustered more
along the major diagonal (lower left to upper right) than along the minor one
(upper left to lower right). It would be foolish to ignore this possible clue to
structure; it provides a reason for making an effort to check more data with this
hypothesis in mind.
Although the illustrations in this discussion have been spatial, the issue of
selectivity is relevant to the problem of distinguishing between randomness
and nonrandomness more generally. Imagine a long sequence of Is and Os of
the kind that might be generated by many tosses of a coin. If one were to scan
such a sequence looking for a subsequence of length 20, say, that appeared to
be nonrandom, and that would be shown to be nonrandom by the application
of some standard statistical test, one would be very likely to be able to find
one. This method of finding structure would be rejected by statistical sophis-
ticates, of course. We should note, however, that in principle the approach is
not unlike the practice of repeating a failed experiment until it yields the de-
sired statistical result.
Every so often, there is a report in the news media of a neighborhood or
town that has become alarmed because there appears to be a higher-than-
chance incidence of some disease (usually cancer) among its residents. From
the perspective of a resident, this is an understandable cause for concern and
for an effort to determine whether there is a causal explanation for the unusu-
ally high rate of the condition. However, from the perspective of a demogra-
pher, the finding that among the many thousands of towns and neighborhoods
in the country, a few have an unusually high rate of cancer would not be sur-
prising; this is to be expected on a purely statistical basis, just as an occa-
sional run of 10 consecutive heads is to be expected if one tosses a fair coin a
sufficiently large number of times. One cannot blame the resident of a
high-incidence town for looking for something unusual about the area that
would account for the rate, but one should not be surprised, either, if there is
nothing unusual to be found.
As already noted, any sizable set will contain structured subsets within it, no
matter how it was produced; so, if one looks for structure, one is very likely to
find it, even in data that have been produced by a random process. This being
said, it would be hard to deny that the adventitious discovery of structure has
played an important role in science and is a critical aspect of effective thinking
more generally as well. The moral of the fact that structure may be found even
in products of random processes is that discovered structure should be taken
not as strong evidence of causal forces at work but as a stimulus for further in-
vestigation—controlled experimentation when feasible—aimed at providing
independent evidence as to the tenability of the hypothesis that the observed
structure is real and not simply the result of having selected a low-frequency
outcome of a random process.
However, not all the hypotheses that one might want to entertain regarding
randomness lend themselves to testing via controlled experimentation. As-
tronomers, for example, are interested in the question of whether stars and gal-
axies are distributed randomly throughout the universe. They are quite
convinced that, except on the largest scale, the degree of clustering is greater
than would be expected by chance, and this conclusion has been drawn on the
basis of after-the-fact observation. No one proposed a hypothesis about the dis-
tribution of stars and specified a statistical process for testing it before the ac-
tual distribution of stars was observed. It was by looking at the actual distri-
bution that astronomers got the idea that it is nonrandom in specific ways.
When data suggesting structure are obtained from observation and they are not
subject to experimental corroboration, as in the case of the distribution of stars,
perhaps the best that can be done is to consider the aggregate weight of all the
data that can be brought to bear on the question of interest.
THE PRODUCTION AND PERCEPTION OF RANDOMNESS
Charles Dickens is said to have refused, late one December, to travel by train
because the annual quota of railroad accidents in Britain had not yet been filled
that year.
Nature abhors a vacuum, human nature abhors chaos. Show us randomness and
we will find order, pattern, clusters, and streaks. (Myers, 2002, p. 134)
In view of the difficulty that statisticians have had in agreeing on the nature of
randomness, we should not be surprised if lay people often have an imperfect
understanding of the concept. Many experiments have been done to determine
how good people are at producing or identifying random sequences or sets
(Nickerson, 2002). The general conclusion that the results of these experi-
ments in the aggregate seem to support is that people are not very good at these
tasks—that they find it hard to generate random sets on request and to distin-
guish between those that have been produced by random processes and those
that have not. It appears that people often consider sequences that have been
generated by random processes to be nonrandom and they see contingencies
where they do not exist (J. Cohen, 1972; Wagenaar, 1972). Studies of the per-
ception of covariation, for example, some of which have been motivated by an
interest in superstitious behavior, have yielded evidence that people sometimes
impute contingency relationships between variables that are independent
(Catania & Cutts, 1963; Hake & Hyman, 1953; J. C. Wright, 1962). The gam-
bler's fallacy, the negative-recency effect, the law of small numbers, and a vari-
ety of other concepts attest to the prevalence of the belief among researchers
that people find it hard to distinguish consistently between random and non-
random sets and to generate random ones on request.
The tenability of the conclusion that people are poor producers and
perceivers of randomness has been challenged on the ground that much of the
work has been predicated on the assumption that there exist valid objective cri-
teria with which to judge the adequacy of subjective conceptions of random-
ness and nonrandomness (Ayton, Hunt, & G. Wright, 1991a, 1991b). Ayton et
al. (1991a) argue not only that this assumption is false but that in many cases
"psychologists have set their subjects the task of generating or recognizing
random sequences, without explicitly defining what sort of sequence would
count as patterned, and therefore nonrandom, and then do not demur from
passing judgment on the adequacy of the performance of the task" (p. 224).
They suggest that often, in telling people to attempt to produce sequences that
have certain properties (appear to be jumbled or orderless) or that fail to have
others (structure or regularity), they are, in effect, instructing them to produce
sequences that have properties (e.g., local representativeness, no long runs,
etc.) that are later taken as evidence of nonrandomness.
Other investigators have argued that the controversial nature of the concept
of randomness and lack of agreement among experts as to how it should be de-
fined make it difficult to assess the ability of nonexperts to recognize or pro-
duce randomness, because the standard against which their performance
should be judged is unclear (Lopes, 1982). If experts do not agree as to what
constitutes randomness, it is not clear what we should assume that nonexperts
take the term to mean. One suspects that there are large individual differences
in this regard and that many of the conceptions are imprecise. Evidence that
people produce sequences that are more nearly random by conventional tests
when asked to make them unpredictable than when asked to make them ran-
dom (Finke, 1984) suggests that randomness and unpredictability may not be
equivalent in many minds.
A review of experimentation on people's ability to produce or perceive ran-
domness (Nickerson, 2002) revealed the importance of task instructions and
the difficulty of interpreting results when instructions are vague or ambiguous,
as they often have been. The results of many experiments have been taken as
evidence that people are not good at generating or recognizing randomness,
but, although the conclusion might be correct, much of the experimental sup-
port that has been advanced for it is weak because of the ambiguities involved.
Perhaps the best known case of an imputed contingency is the "gambler's fal-
lacy," according to one form of which arun of successive occurrences of one type
of random event (e.g., a run of heads in coin tossing) will make an additional oc-
currence of that event appear to be less likely. Another, but closely related, form
is the belief that when a sample of ongoing random events shows a "deficit" of
one type of event, the probability of the imminent occurrence of that event is in-
creased. If, for example, heads has outnumbered tails in a series of coin tosses,
the probability of the occurrence of tails is assumed to be increased until the bal-
ance is restored. Laplace (1814/1951) gives an amusing example of this form of
the bias: "I have seen men, ardently desirous of having a son, who could learn
only with anxiety of the births of boys in the month when they expected to be-
come fathers. Imagining that the ratio of these births to those of girls ought to be
the same at the end of each month, they judged that the boys already born would
render more probable the births next of girls" (p. 162).
Closely related to the gambler's fallacy is the "negative-recency" effect,
which refers to a bias against repetition evidenced by people attempting to gen-
erate or identify random sequences (Bar-Hillel & Wagenaar, 1993; Wagenaar,
1970). A spatial analog to the negative-recency effect has been observed when
people make presumably random selections from a list and avoid selecting ad-
jacent items (H. C. A. Dale, 1960). A negative-recency effect appears to be op-
erative in the attitudes many people express regarding the occurrence of
natural disasters that are generally assumed to be randomly timed; many peo-
ple seem to believe that the occurrence of a natural disaster (earthquake, flood,
tornado) more or less guarantees that such an event will not recur for a rela-
tively long time (Burton, Kates, & White, 1978).
The gambler's fallacy can take more complicated forms as, for example,
when in a two-alternative prediction task people tend to predict the more fre-
quent event after one occurrence of the less frequent event and to predict the less
frequent event after two consecutive occurrences of the more frequent one
(Jarvik, 1951). Study of the gambler's fallacy is complicated by the fact that peo-
ple sometimes make predictions that are consistent with the assumption that they
believe that independent events are contingent even when they indicate, when
asked, that they believe them to be independent. Bar-Hillel and Budescu (1995)
cite this finding, reported in an unpublished manuscript by Gold and Hester
(1989), as evidence of the need for caution in inferring what people believe about
probabilities from their predictions of the outcomes of probabilistic events.
When attempting to predict the outcomes of binary events (win-loss of a bet
on a coin toss, win-loss of a football game) people sometimes use information
regarding previous outcomes differently, depending on what they assume
about how the outcomes are determined. When outcomes are assumed to be
determined by chance, as in the case of bets on coin tosses, previous wins are
likely to lead to prediction of a loss; when they are assumed to be determined
causally, as in the case of football games, previous wins are likely to lead to
prediction of a win. The first case is an example of the gambler's fallacy. The
second has sometimes been viewed as an example of a complementary mis-
conception of chance within the world of sports, which is illustrated by the
"hot-hand" phenomenon in basketball: It is commonly believed among basket-
ball players and fans that players sometimes experience an unusual streak of
successful shots; on such occasions they are said to be hot, or to have a hot hand
(Gilovich, Vallone, & Tversky, 1985).
In a long series of events of the same kind the single chances of hazard ought
sometimes to offer the singular veins of good luck or bad luck which the major-
ity of players do not fail to attribute to a kind of fatality. (Laplace, 1814/1951,
p. 164)
Some investigators have argued that both the gambler's fallacy and the hot-hand
belief rest on a misunderstanding of randomness, and, in particular, on unaware-
ness of the frequency with which runs of moderate length are likely to occur in
randomly generated sequences; runs in small samples are more likely than they
are generally believed to be (Wagenaar, 1972). Gilovich (1991) explains the hot-
hand belief this way: "Players and fans are not mistaken in what they see: Bas-
ketball players do shoot in streaks. But the length and frequency of such streaks
do not exceed the laws of chance and thus do not warrant an explanation involv-
ing factors like confidence and relaxation that comprise the mythical hot hand"
(p. 16). The mistake that players and fans who believe in the hot-hand phenome-
non make lies, according to this view, not in what they see but in how they inter-
pret what they see. The claim that the length and frequency of streaks of
successful shots do not exceed the laws of chance is based on an analysis of
shooting records of professional players by Gilovich et al. (1985).
Gilovich et al. (1985) note that they do not attempt in their analysis to cap-
ture all that people might mean by "the hot hand" or "streak shooting," but
they argue that the common use of the terms implies that the sequences of hits
and misses of basketball shots should differ from sequences of heads and tails
produced in coin tossing in two ways: (a) the probability of a hit should be
greater following a hit than following a miss, and (b) the number of streaks of
successive hits should be greater than the number produced by a chance pro-
cess with a constant hit rate. The data presented by Gilovich et al. constitute
evidence against the idea that the probability of a hit is greater, on the aver-
age, immediately following one or a sequence of other hits than following
one or a sequence of misses.
Some dyed-in-the-wool sports fans who are aware of the data presented
by Gilovich and colleagues are reluctant to accept them as compelling evi-
dence that there is no such thing as a hot hand (Hooke, 1989; Larkey, R. A.
Smith, & Kadane, 1989). I confess to being among them, and although I re-
alize that that reluctance may simply bear out Gilovich's (1991) observa-
tion that belief in the hot hand within the basketball world is very strong and
not easily changed by the presentation of counter-indicative evidence of the
sort that he and his colleagues have produced, I will try to say why I find
their data unconvincing. I do not mean to deny that observers often see evi-
dence of conditional dependencies where none exists. This was demon-
strated convincingly by Tversky and Gilovich's (1989a) finding that
believers in the hot-hand phenomenon are sometimes convinced they see
streak shooting in randomly generated data. But from the fact that people
sometimes see dependencies that do not exist, it does not follow that there
are never such dependencies to be seen.
The hot-hand or streaky performance can mean at least two things: (a) that
the outcomes of successive trials are not independent (e.g., that the probability
of success on trial n, given success on trial n - 1, is greater than would be pre-
dicted from a knowledge of the overall probability of success and the assump-
tion of trial independence); (b) that a player's short-term steady-state
probability of success fluctuates over time more than is consistent with the as-
sumption of random variation around a stable long-term probability—that in-
dividual players experience limited periods of time during which they play
significantly better than they do on average. Gilovich et al. recognize the latter
conception in noting that references to hot-hand or streak-shooting phenom-
ena "express the belief that the performance of a player during a particular pe-
riod is significantly better than expected on the basis of the player's overall
record" (p. 295). In my view, the data they present do not constitute strong evi-
dence against the occurrence of this type of hot-hand phenomenon.
According to this conception, a hot (or cold) hand would reveal itself, not in
the difference between the probability of a hit conditional on a preceding hit
and the probability of a hit conditional on a preceding miss, but in a short-term
increase (or decrease) in a player's hit rate. To be specific, when a player who
has a long-term-average hit rate of .5 plays a game in which his hit rate is .7, one
might say that during that game he had a hot hand. Conversely, when the same
player plays a game with a hit rate of .3, one might say that his hand, during that
game, was cold. Of course, short-term fluctuations in hit rate would be ex-
pected strictly on the basis of chance, just as the proportion of heads in short se-
quences of coin tosses would be expected to deviate from precisely .5.
However, if the probability of the outcome is really constant, and not varying
over time, one expects the extent to which the relative frequencies of outcomes
in small samples will deviate from that probability to be limited and within
specifiable bounds.
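Those bounds can be made concrete with a small simulation. In the sketch below, all the particulars (a .5 hit rate, 20 shots per game, an 82-game season) are illustrative assumptions; the point is that per-game rates of .7 or .3 arise by chance alone when the underlying probability is truly constant:

    # Per-game hit rates for a constant-probability shooter. The spread
    # should approximate the binomial value sqrt(.5 * .5 / 20), about .11.
    import random, statistics

    random.seed(3)
    rates = [sum(random.random() < 0.5 for _ in range(20)) / 20
             for _ in range(82)]
    print(round(statistics.stdev(rates), 3))   # close to 0.11
    print(min(rates), max(rates))              # typically near 0.25 and 0.75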
Gilovich et al. (1985) addressed the possibility that the short-term hit rate
varies from the long-term average more than would be expected by chance, as-
suming a fixed underlying probability, in two ways. They compared two esti-
mates of the standard error of individual players' per-game shooting
percentage, one based on the player's shooting percentages for each game and
the other from his overall shooting percentage across games. They also looked
for evidence of more sequences of successive hits than would be expected by
chance if the hit rate were constant over time. The rationale for the latter analy-
sis was the assumption that if a player is occasionally hot, his performance re-
cord ought to reveal a greater number of hit streaks than would be expected by
chance. Both analyses failed to yield evidence of a nonchance effect. This re-
sult justifies the claim that no evidence for the reality of the hot hand was
found, but not, unless one wishes to assert that the null hypothesis has been
proved, to claim that it has been shown not to exist.
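The following sketch is not Gilovich et al.'s analysis, but it illustrates the chance benchmark their tests rely on: a simulated shooter whose hit probability never changes still produces conditional hit rates and run counts that wander from sample to sample.

```python
import random

def season(p=0.5, n=500, seed=1):
    """Simulate n shots by a shooter whose hit probability p never changes."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]

shots = season()
after_hit = [b for a, b in zip(shots, shots[1:]) if a]
after_miss = [b for a, b in zip(shots, shots[1:]) if not a]
runs = 1 + sum(a != b for a, b in zip(shots, shots[1:]))  # same-outcome stretches

print(f"P(hit | preceding hit):  {sum(after_hit) / len(after_hit):.3f}")
print(f"P(hit | preceding miss): {sum(after_miss) / len(after_miss):.3f}")
print(f"number of runs: {runs}")
```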
Larkey et al. (1989) argue that the analyses from which Gilovich et al.
(1985) concluded that streak shooting is illusory ignore the effects on perfor-
mance of game context and how a player's shooting interacts with the activities
of the other players. On the basis of an analysis of the shooting records of 18
star players during the 1987-1988 NBA season that was designed, in their
view, to take game context into account, these authors concluded that streak
shooting does occur and offered Vincent ("the Microwave") Johnson as a bona
fide streak shooter. Although I agree with Larkey, Smith, and Kadane's point
that any analysis that purports to demonstrate the existence or nonexistence of
streak shooting should take game context into account, their own analysis was
criticized, effectively in my view, by Tversky and Gilovich (1989b).
Gilovich et al. (1985) make it clear how randomly produced runs can easily
be misperceived as hot-hand effects. And the results of their analyses certainly
give believers in the hot-hand phenomenon something to think about. As
Hooke (1989) has argued, however, they do not justify the conclusion that no
such thing as a hot hand exists, which is what designating belief in the hot hand
as erroneous and as based on a cognitive illusion seems to imply. It may be the
case that the belief is erroneous and that the basis for it is totally illusory, but
this has yet to be demonstrated. There is a difference between saying that no
one has produced compelling evidence that it exists and saying that it does not
exist. And although some of the people who have written on this subject recog-
nize this difference, not all do. S. J. Gould (1989), for example, citing the data
presented by Tversky and Gilovich (1989a), says flatly of the hot hand, "no
such phenomenon exists" (p. 12). One might argue that in the absence of com-
pelling evidence of its existence the assumption of its nonexistence should be
made on the principle of parsimony, but we should distinguish an assumption
that is made in the interest of parsimony and that is not ruled out by evidence
from a conclusion that is forced by evidence.
Hooke (1989) makes what seems to me a critical point in contending that
while we know what should happen if only chance is involved, we do not have a
good idea of exactly what to expect if streak shooting actually exists; in his
words, "we don't have a well-formulated hypothesis to test against the null hy-
pothesis" (p. 36). Also, the null-hypothesis testing that has been applied to
hot-hand data has been primarily, if not exclusively, rejection-support (RS) test-
ing as contrasted with acceptance-support (AS) testing. Normally RS testing is
appropriate when the null hypothesis represents what the experimenter does not
believe and expects to be able to reject, and rejection is to be taken as evidence
supportive of the experimenter's theoretical position. (See also Meehl's [1967,
1990, 1997] distinction between strong and weak uses of statistical significance
tests.) AS testing is appropriate when the null hypothesis represents what the ex-
perimenter believes and acceptance of it would be taken as support for the exper-
imenter's view (Binder, 1963; Steiger & Fouladi, 1997). In RS testing, the
decision criterion is biased against Type I error (rejection of the null hypothesis
when it is true), while in AS testing, the criterion is biased against Type II error
(acceptance of the null hypothesis when it is false). Using RS testing is question-
able when one's intent is to show the null hypothesis to be true (i.e., to show the
hypothesis that there is such a thing as a hot hand to be false).
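Hooke's point can be made concrete with a simulation. The sketch below posits a hypothetical, mildly streaky shooter whose hit probability is .53 after a hit and .47 after a miss, and applies a simple two-proportion z test for serial dependence; both the parameters and the test are my assumptions, not anything drawn from the hot-hand literature. The test fails to reject the null hypothesis in most simulated seasons, which is one way of seeing why failing to find evidence of a hot hand is not the same as demonstrating its nonexistence.

```python
import math
import random

def markov_season(p_after_hit, p_after_miss, n, rng):
    """n shots from a shooter whose hit probability depends on the last shot."""
    shots, prev = [], rng.random() < 0.5
    for _ in range(n):
        prev = rng.random() < (p_after_hit if prev else p_after_miss)
        shots.append(prev)
    return shots

def rejects_independence(shots, z_crit=1.96):
    """Two-proportion z test of P(hit | hit) = P(hit | miss)."""
    after_hit = [b for a, b in zip(shots, shots[1:]) if a]
    after_miss = [b for a, b in zip(shots, shots[1:]) if not a]
    p1 = sum(after_hit) / len(after_hit)
    p2 = sum(after_miss) / len(after_miss)
    pooled = (sum(after_hit) + sum(after_miss)) / (len(after_hit) + len(after_miss))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(after_hit) + 1 / len(after_miss)))
    return abs(p1 - p2) / se > z_crit

rng = random.Random(0)
detected = sum(rejects_independence(markov_season(0.53, 0.47, 500, rng))
               for _ in range(2000))
print(f"streaky seasons detected: {detected / 2000:.2f}")  # well below 1.0
```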
One basis for reluctance to accept the conclusion that the hot hand is illu-
sory in the absence of evidence that forces one to this conclusion is the im-
plausibility—to me at least—of the implied hypothesis that an athlete's
performance is precisely constant over time. It is too easy to think of plausi-
ble reasons why a player's (shooter's) performance (hit probability) might,
unlike the fair coin, differ from one time to another. These include general
health, fatigue, mental state, playing arena, teammate combinations, and op-
posing players. One might argue that the effects of such variables, especially
in combination, can be assumed to be random, and for a large enough sample,
perhaps they can; but this does not rule out the possibility that a player's hit
rate might be better than his average when playing, uninjured and well rested,
on his home court, with his favorite playmaker on the floor, and against a
weak defender.
I suspect that the truth about the hot hand lies somewhere between the ex-
treme view that sees it whenever a player's short-term success rate is notice-
ably above his long-term average or he makes several baskets in a row and the
other extreme position according to which it never occurs. Believers in the
hot-hand phenomenon may be too quick to attribute any impressive sample of
shooting performance to it; nonbelievers may be too quick to rule out the possi-
bility that shooters may sometimes be genuinely better than they typically are.
Evidence that performance in sporting events sometimes is streaky comes
from a study by Gilden and Gray Wilson (1995). Using a "runs z score" as "a
measure of outcome clustering that is independent of both sequence length and
hit rate," these investigators found evidence of streaky performance in both
golf putting and dart throwing. In both cases, the probability of streaks ap-
peared to be a nonlinear function of task difficulty, being greatest for interme-
diate difficulty levels and smaller or (in the case of dart throwing) not different
from chance with very easy and very difficult targets.
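A runs z score of the standard Wald-Wolfowitz form is sketched below; that Gilden and Gray Wilson's measure corresponds exactly to this form is an assumption on my part. Strongly negative values indicate fewer, longer runs, that is, more clustering of hits and misses, than a chance ordering predicts.

```python
import math

def runs_z(outcomes):
    """Wald-Wolfowitz runs z score for a binary sequence (True = hit)."""
    n1 = sum(outcomes)            # hits
    n2 = len(outcomes) - n1       # misses
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(var)

# A clustered sequence produces far fewer runs than chance expects.
print(runs_z([True] * 5 + [False] * 5 + [True] * 5 + [False] * 5))  # ~ -3.2
```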
Wardrop (1995) has recently reported an analysis of some of the data con-
sidered by Tversky and Gilovich (1989a) that suggests that the question of
whether the hot hand is an illusion may be more difficult to answer than one
might have assumed. The data on which Wardrop focused were the 1980-1981
and 1981-1982 free-throw shooting records of nine regulars of the Boston
Celtics. The question of interest was whether there was any evidence in these
records of streak shooting in the first sense mentioned previously—the proba-
bility of a hit on the second of two shots being higher following a hit than fol-
lowing a miss. Tversky and Gilovich had concluded from their analysis of the
same data that there was no evidence that the outcome of the second shot de-
pended on that of the first.
Wardrop (1995) notes that one comes to different conclusions about the de-
pendencies if one examines the data on a player-by-player basis than if one
looks at them in the aggregate. Roughly half of the players did better on the sec-
ond shot if they missed the first one and the other half did the reverse, but none
of the differences is very impressive from a statistical point of view. When the
data are pooled over players, however, the probability that the second shot was
a hit is significantly greater if the first shot was a hit than if it was a miss.
Wardrop argues that basketball fans are more likely to have a representation in
memory that corresponds to the aggregate contingency table than a representa-
tion of a table for each individual player and that, therefore, their belief that the
second of two shots is more likely to be a hit if the first one was a hit should not
be considered unfounded, at least in the case of free throws. This does not jus-
tify the belief with respect to individual players, but does so with respect to a
team as a whole. (Wardrop also makes a statistical argument that several play-
ers did better on their second shot than on their first one, and argues that while
this outcome does not support the hot-hand hypothesis, it does count against
the hypothesis of independence.)
Wardrop's (1995) analysis involves a paradox—Simpson's paradox—that
is discussed in chapter 6. It is most readily illustrated in this context by focus-
ing, as Wardrop did, on the performance of Larry Bird and Rick Robey, the best
and worst free-throw shooters, respectively, on the team. As shown in Table 2.1,
both Bird and Robey were somewhat more likely to make the second of two
free throws if the first was a miss than if it was a hit, but when their data are
combined, the opposite relationship holds. The reversal is accounted for in this
case by the fact that Bird has more impact on the combined table than does
Robey by virtue of taking more free throws and making a larger percentage of
them. It turns out that the differential weightings of the individual players' re-
sults have a similar effect when the data are combined for the whole team.
TABLE 2.1
Frequencies of Pairs of Free Throws of Bird and Robey Individually and Combined
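The cell frequencies of Table 2.1 are not reproduced here, but the reversal can be seen with hypothetical counts of the same shape, which I have invented purely to exhibit the pattern: two players, one a frequent and accurate free-throw shooter, the other neither.

```python
# (second-shot hits, second-shot attempts), keyed by the first shot's outcome.
# All counts are hypothetical, chosen only to reproduce the reversal pattern.
players = {
    "good shooter": {"after hit": (250, 280), "after miss": (47, 52)},
    "weak shooter": {"after hit": (54, 91), "after miss": (49, 80)},
}

totals = {"after hit": [0, 0], "after miss": [0, 0]}
for name, table in players.items():
    for condition, (hits, attempts) in table.items():
        totals[condition][0] += hits
        totals[condition][1] += attempts
        print(f"{name:12s} {condition}: {hits / attempts:.3f}")

for condition, (hits, attempts) in totals.items():
    print(f"{'combined':12s} {condition}: {hits / attempts:.3f}")
```

Each player individually shoots slightly better after a miss, yet the pooled rates run the other way, because the accurate, high-volume shooter dominates the combined after-hit column.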
My sense is that the tale of the hot hand has not yet been told in its entirety.
There can be no doubt that people see structure in events that have been gener-
ated by random processes and sometimes misinterpret that structure as evi-
dence of nonrandom effects. That all the apparent structure that is interpreted
as evidence of a hot hand or similar phenomenon is the result of random pro-
cesses is much less clear.
Nonaging Processes
SUMMARY
Coincidences
Of all the abuses of mathematics, of all the abuses of science generally, no single
phenomenon causes more misunderstanding than coincidences.
—Dewdney (1993, p. 40)
Strange coincidence, that every man whose skull has been opened had a brain!
—Wittgenstein (1953/1972, p. 28)
The writer of the Time magazine article from which this quotation was taken
pointed out that, in addition to the accurate correspondences, such as those
mentioned, many others have been produced by tweaking the facts just a bit.
To a list similar to the one that appeared in Time, Gardner (1967) added the
following observations:
Both the Federal Bureau of Investigation and the Secret Service, had they been
skilled in the prophetic aspects of numerology, would have been more alert on
the fatal day.
The digits of 11/22 (November 22) add to 6, and Friday has six letters. Take the
letters FBI, shift each forward six letters in the alphabet, and you get LHO, the
initials of Lee Harvey Oswald. He was, of course, well known to the F.B.I. More-
over, Oswald has six letters. Oswald shot from the sixth floor of the building
where he worked. Note also that the triple shift of FBI to LHO is expressed by the
number 666, the infamous number of the Beast.
The Secret Service, an arm of the Treasury Department, likewise should have
been more alert. Two weeks before the assassination the Treasury Department
released a new series of dollar bills, a sample of which I enclose. [This is all pre-
sented by Gardner as a letter he received from the famous numerologist, "Dr.
Matrix."]
Observe that this series is designated by the letter K on the left. In 1913, half a
century earlier, when the Federal Reserve districts were designated, Dallas was
assigned the letter K, the eleventh letter of the alphabet. For this reason "Dallas,
Texas," where Kennedy was murdered appears beneath the "K." Dallas, Texas
has eleven letters. John Kennedy has eleven letters.
The serial number on this bill, as on all K bills, begins with K and ends with
A—"Kennedy Assassination." Beneath the serial number on the right is "Wash-
ington, D.C.," the origin of the President's fatal trip.
Below the serial number on the right, as well as above and below the serial num-
ber on the left, are two 11's. Eleven, of course, is the month of November. The
two 11's add to 22, the day of the assassination. To the right of Washington's pic-
ture is "Series 1963A," the year of the assassination. (pp. 45-47)
It is amazing indeed, is it not, that the FBI and the Secret Service could have
been so blind to the obvious pointers to the coming tragedy! Gardner's spoof is
good fun, but the "coincidences" and the interpretations Dr. Matrix makes of
them are not much more far-fetched than some that have been taken quite seri-
ously in other contexts by people with a numerological bent.
WHAT IS A COINCIDENCE?
"And we are both widows too!" said Barbara's mother. "We must have been
made to know each other."
"I haven't a doubt about it," returned Mrs. Nubbles. "And what a pity it is we
didn't know each other sooner."
"But then, you know, it's such a pleasure," said Barbara's mother, "to have it
brought about by one's son and daughter, that it's fully made up for. Now, ain't
it?"
To this, Kit's mother yielded her full assent, and, tracing things back from ef-
fects to causes, they naturally reverted to their deceased husbands, respecting
whose lives, deaths, and burials, they compared notes, and discovered sundry
circumstances that tallied with wonderful exactness; such as Barbara's father
having been exactly four years and ten months older than Kit's father, and one of
them having died on a Wednesday and the other on a Thursday, and both of them
having been of a very fine make and remarkably good-looking, with other ex-
traordinary coincidences. (Dickens, 1840/1894, p. 301)
Meaningfulness
All of us constantly witness very low probability co-occurrences, and pay them
no mind; they do not even register as coincidences in the usual sense of the
term. If someone points out to me that the co-occurrence of a blue jay landing
on the bird feeder outside my window and the radio beginning to play Clair de
Lune at the precise moment that I walked by the window on my way to get a cup
of coffee was very unlikely a priori, I would have to admit that that was so, but
normally I would not be surprised by such a co-occurrence—I would not even
notice it as an interesting coincidence because the co-occurring events bear no
meaningful relationship to each other in my mind. Our days are filled with such
meaningless co-occurrences that go unnoticed.
Perhaps the most obvious difference between coincidences that capture our
attention and those that do not is that in the former case the co-occurring events
are meaningfully related in the observer's mind in some way. The appearance of
a blue jay on the feeder outside one's window and the playing of Clair de Lune
on one's radio are not, under most circumstances, likely to be meaningfully re-
lated events in most people's minds, so such a coincidence goes unnoticed. In
contrast, the co-occurrence of the appearance of a blue jay on one's feeder and
the playing of Rhapsody in Blue, while no less improbable than the coincidence
of the blue jay and Clair de Lune, is much more likely to capture one's attention,
because the two events in this case are more likely to be related in one's thinking.
Assumed Small Prior Probability
Personal Involvement
Falk (1989) and Falk and MacGregor (1983) obtained evidence that people
find it somewhat easier, or more natural, to perceive coincidences in the latter
way when they are experienced by other people than when experienced by
themselves. People express more surprise, for example, when a coincidence in-
volves them than when a comparable, or even identical, coincidence involves
others. Falk was able to show that this is not simply a consequence of people se-
lecting their own coincidental events to report that were subjectively surprising to
themselves but not to others. She found that even meaningless contrived coin-
cidences were perceived as more surprising when they involved oneself than
when they did not.
Given that co-occurrences that we perceive as interesting coincidences
capture our attention when we experience them, it is not surprising to find
that they tend to be remembered rather well (Hintzman, Asher, & Stern,
1978). Because they are remembered well, it may appear, in retrospect, that
one has experienced more of them than one might have expected to by
chance. This illustrates one way in which selectivity may play a role in mak-
ing attention-getting chance coincidences seem to be more common than
they are. Selectivity is also involved when we focus on low-probability
co-occurrences that are of special interest to us and ignore those (much more
frequent ones) that are not. And it may affect our impressions of the incidence
of such events in general because only those co-occurrences that are of spe-
cial interest to people who experience them get reported—the fact that person
X found himself sitting next to person Y (whom he did not know and in whom
he had no interest) in a theater in London on September 14, 1986, is unlikely
to be considered noteworthy, despite the fact that the a priori probability of
this co-occurrence must be considered extremely low.
This quote from Paulos alludes to the fact that some coincidences may be much
more highly likely than is generally realized, simply on the basis of chance.
Consider, for example, the probability that within any modest-size group there
are two people who have the same birthday. Imagine a gathering of four people.
What is the probability that among these people there are at least two who cele-
brate their birthdays on the same day? Inasmuch as each person could have
been born on any of 365 days, there are 365^4 possible arrangements of the four
birthdates. There are 365!/(365 - 4)!, or 365 x 364 x 363 x 362, ways in which
the birthdates could be distributed so that no two are the same. So the probabil-
ity that at least two of the four people have the same birthday is
1 - [365!/(365 - 4)!]/365^4. In general, the probability that among n people at
least two have the same birthday is 1 - [365!/(365 - n)!]/365^n. The answer for
n = 4 is .016, not very likely. However, for moderately large n the probability is
much larger; it is greater than .5 for n of 23 or more.
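A minimal sketch of the calculation, computing the probability of no shared birthday as a running product rather than with factorials:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(f"n = 4:  {p_shared_birthday(4):.3f}")   # 0.016
print(f"n = 23: {p_shared_birthday(23):.3f}")  # 0.507, the first n above .5
```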
Since it was first identified as an interesting phenomenon relating to prob-
ability theory, the chance coinciding of birthdays has received considerable
attention in the statistical literature. Diaconis and Mosteller (1989), who re-
view some of this literature, mention von Mises (1939) as perhaps the first to
discuss the phenomenon. They point out too that the problem has been gener-
alized in various ways and formulas have been developed to compute (or ap-
proximate) the probability that two or more of N people have the same
birthday plus or minus one day, the probability of k or more of N people hav-
ing the same birthday, and so on. B. Levin (1981) developed an algorithm for
computing the smallest number of people, N, required to ensure a probability
of at least 0.5 that at least k of them have the same birthday. Diaconis and
Mosteller give N ≈ 47(k - 1.5)^(3/2) as a function that approximates the values
produced by Levin's algorithm for k smaller than 20. According to Levin's
computation, one needs a group of 1,181 for the probability that 10 or more
people will have the same birthday to be greater than 0.5. For k = 10, Diaconis
and Mosteller's approximation of N gives 1,165.
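The approximation is easy to check against the k = 10 figure just quoted; the function name below is mine.

```python
def approx_group_size(k):
    """Diaconis and Mosteller's approximation to Levin's exact algorithm."""
    return 47 * (k - 1.5) ** 1.5

print(round(approx_group_size(10)))  # 1165, versus Levin's exact 1,181
```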
When should coincidences be surprising? What about the kinds of coinci-
dences that were mentioned at the beginning of this section—learning that a
new acquaintance has three children with the same first names as one's own, or
hearing from an old friend about whom one has just dreamed? Each of these
events seems highly unlikely, and therefore surprising when it occurs. But
what we easily overlook when we think about such events is the very large
number of things that could occur in one's life, any one of which might be con-
sidered a surprising coincidence. The a priori probability that a specific one of
them will occur may be very small, but the probability that some one or more of
them will occur is very high; or to make the point the other way around, the
probability that no such events will occur is very low. When such an event does
occur, we are more likely to be impressed by the low a priori probability of that
particular event than by the high probability of the occurrence of some atten-
tion-getting low-probability event.
The point is that the fact that coincidences occur should not be surprising.
To the contrary, we should be amazed if they did not occur. The example of the
coinciding birthdays is but one of many that could be used to make the point.
The example serves also to sharpen an important distinction. The probability
that any two randomly selected people will have the same birthday is very
small, less than .003, but as we have seen, the probability that some pair of per-
sons among a set of n people will have the same birthday is fairly large when n
is on the order of a dozen or so.
Diaconis and Mosteller (1989) express the inevitability of low-probability
consequences with "the law of truly large numbers," according to which "with
a large enough sample, any outrageous thing is likely to happen" (p. 859). They
point out, for example, that if a once-in-a-million coincidence occurs to one
person in a million every day, on the order of 100,000 such occurrences a year
would be expected in a population of 250 million people. (A similar argument
was made by Alvarez [1965] in a letter to the editor of Science regarding para-
psychology.) On the same logic, one-in-ten-thousand or one-in-one-thousand
coincidences, any of which might seem remarkable from a narrow perspective,
would be very common from a broad one. The same logic can bring us to the
conclusion that any given individual is likely to experience some number of
such low-probability coincidences over the course of a lifetime. In being sur-
prised by the occurrence of specific low-probability events, one is making
what Falk (1981-1982) calls the selection fallacy: One attends to a low-proba-
bility coincidence after the fact and is surprised by it only because one does not
recognize that some such events are bound to occur.
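The arithmetic behind Diaconis and Mosteller's 100,000 figure is worth making explicit; a sketch using their stated numbers:

```python
population = 250_000_000      # people
p_per_person_per_day = 1e-6   # a once-in-a-million coincidence
days_per_year = 365

expected_per_year = population * p_per_person_per_day * days_per_year
print(f"{expected_per_year:,.0f} such coincidences per year")  # ~91,250
```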
In view of the high probability of low-probability co-occurrences, why are
we surprised when they happen? The answer that comes most immediately to
mind is that we are unaware, or forget, that the likelihood of such co-occur-
rences is high. As Gardner (1979) puts it, "Even mathematicians can forget that
if enough people doodle long enough with random sequences of digits, it is
highly probable that they will find highly improbable patterns. It is because
most people fail to grasp this basic notion that they are unduly impressed when,
out of the billions upon billions of possible ways coincidences can arise in
daily life, one does occur" (p. 25).
Despite the fact that we should be surprised if coincidences did not occur,
their occurrence can still be attention getting. A few days ago, I took a leisurely
meandering ride on my bike on a particularly pleasant afternoon. I noticed at
one point on the trip that my odometer was turning to 21 at precisely the mo-
ment that I was passing a roadside marker that read "mile 21." This was the
only marker I saw on the trip. If I were more superstitious than I am, I might
have read some significance into this coincidence. I am not quite that superstitious;
nevertheless, the event was sufficiently attention getting to have been re-
tained as a memorable aspect of the trip.
COINCIDENCES IN SCIENCE
The noticing of coincidences has played an important role in scientific dis-
covery and the development of scientific theory. The ability to see obscure
connections—connections that most people miss—is believed by some to be
among the more important abilities a scientist can have. Isaac Newton's pos-
tulation of the universal law of gravitation has been attributed to his notice of
what was, at the time, a "very approximate, numerical coincidence," namely
the fact that the parabolic path of a thrown rock and the path of the moon's or-
bit about the earth are particular cases of the same mathematical object
(Wigner, 1960/1980). Apparent coincidences invite speculation in science as
elsewhere, and finding explanations for coincidences has often been a goal of
scientific research and theory building. The similarity of the eastern coastline
of South America and the western coastline of Africa invites consideration of
the possibility that the two continents were once joined. The fact that quarks
and leptons are similar in certain respects, including the apparent organiza-
tion of both in three groups, begs an explanation, and physicists are likely to
see such a correspondence as the hint of a deeper causal connection. When
the numbers 2.5029 and 4.669201 or their reciprocals keep popping up in
studies of mathematical chaos, one is led to suspect that there is something
special about these numbers in this context and that their frequent appearance
is not "just" coincidental.
The distinction between coincidences in science that motivate search for a
causal connection and those that do not can be subtle. The relative diameters of
the sun and the moon and their relative distances from the earth (about 400-to-1
in both cases) are balanced in such a way that each body subtends about the
same visual angle (1/2 degree) and therefore they have almost exactly the same
apparent size to an observer on earth. (Although this coincidence has not moti-
vated a search for a causal connection, it has proved to be useful to science by,
for example, permitting the testing of a prediction of Einstein's theory of rela-
tivity during an eclipse of the sun.) The period of rotation of the sun about its
axis and the period of revolution of the moon about the earth both are about 28
days. These are examples of coincidental relationships revealed by science to
which scientists have attached no particular significance. They are, one might
say, curiosities, but not of the type that motivate most people to seek causal ex-
planations. In contrast, the fact that the orbits of the planets of our solar system
are, with one exception, all in about the same plane is too great a coincidence
for anyone to believe that there is not some causal connection, and any theory of
the origin of the solar system that is to get serious consideration among scien-
tists must take this fact into account.
Commenting on the numerous regularities involving atomic weights and
other characteristics of the elements that had been pointed out by his fellow
chemists, Mendeleev quoted approvingly Strecker's (1859) observation that "it
is hardly probable that all the above mentioned relations between the atomic
weights (or equivalents) of chemically analogous elements are merely acciden-
tal" (Mendeleev, c. 1869/1956, p. 915). Mendeleev went on to produce the peri-
odic table of elements for which we remember him today. The history of science
provides many similar examples of "coincidences" that turned out, upon further
investigation, to be anything but coincidental in the usual sense of the term.
A particularly interesting illustration of the fact that advances in scientific the-
ory have sometimes provided new ways to view previously unexplained coinci-
dences relates to inertia and gravity. It had been known at least since the days of
Newton that the force of gravity on an object is proportional to the object's inertia.
Lederman (1993) credits Newton with being the first to recognize that the m of
F = ma,
that represents the inertial mass, the stuff that resists force, and the M in
F = G(M_1 M_2)/R^2,
that represents gravitational mass, the stuff that exerts pull on another object,
are equal, or very nearly so. He refers to the equality of m and M, which are suf-
ficiently different entities that they should not have the same name, as "an in-
credible coincidence" that "tormented scientists for centuries," and notes that
Einstein incorporated it into his general theory of relativity (p. 96). Berlinski
(2000) points out that the concepts of inertial mass and gravitational mass per-
tain to different parts of Newton's system: "Newton's second law of motion
draws a connection between force, inertial mass and acceleration. There is no
appeal to gravitation. Newton's universal law of gravitation, on the other
hand, draws its connection between force, gravitational mass and accelera-
tion. There is no appeal to inertia. Two different conceptions of mass; two dif-
ferent laws of nature" (p. 201).
Recognition of the dual role of mass was articulated in the principle of
equivalence, which refers to the equality of mass as a measure of resistance to
acceleration and as a source of gravitational attraction. Newton himself saw
the coincidence as remarkable and mysterious, and offered no explanation for
it. Einstein resolved the mystery by declaring gravity and inertia to be the same
thing and making this equivalence the center of the general theory of relativity,
according to which there is only one type of mass and it is a consequence of the
curvature of space.
These cases illustrate that attempting to account in a causal way for what
might appear, at first, to be chance coincidences has been a fruitful endeavor in
science. But one can find examples of attempting to account for coincidences
that are generally considered not to be science at its best. A well-known case in
point is that of the attention given, beginning in the mid-19th century, to the great
pyramid of Egypt. Much of the fascination that modern observers have had for
this structure is due to certain mathematical relationships noted first by John
Taylor (1859) and shortly later by Charles Smyth (1865). Taylor and Smyth dis-
covered numerous intriguing coincidences in measurements they made—for ex-
ample, that the ratio of twice the pyramid's base to its height was roughly the
same as the ratio of the diameter of a circle to its circumference; that the ratio of
the pyramid's base to the width of a casing stone was 365, the number of days in a
year; and that the pyramid's height multiplied by 10^9 was approximately equal to
the distance from the earth to the sun. The discovery of these and many other cor-
respondences provided von Daniken (1969) a basis for arguing that the earth had
been visited by intelligent extraterrestrials in the past.
What is the critical difference between this work with pyramid measure-
ments and the events mentioned in connection with Newton, Mendeleev, and
Einstein? In each of the latter cases, a coincidence was noticed that seemed to
call for a search for a common physical explanation, and the explanations that
were proposed were subject to test by virtue of their predictive implications re-
garding observable phenomena. In the case of the pyramid measurements,
many relationships were considered and only those that were deemed to reflect
interesting correspondences were given further attention. This kind of selectiv-
ity lends itself to underestimation of the probability that the coincidences noted
could be due to chance.
Gardner (1957), who describes this work, refers to Smyth's book as a clas-
sic of its kind illustrating beautifully "the ease with which an intelligent man,
passionately convinced of a theory, can manipulate his subject matter in such
a way as to make it conform to precisely held opinions" (p. 176). As Gardner
points out, a complicated structure like the pyramid provides a great many
opportunities for measurements of length and those measurements and the
results of calculations based on them are bound to coincide here and there
with numbers that are of interest for historical or scientific reasons, simply
because the set of numbers to be considered is so large. If one has the freedom
to pay attention only to those measurements or calculations that yield some-
thing of interest and ignore the great many that do not, one is pretty sure of
finding some correspondences by chance: "Since you are bound by no rules,
it would be odd indeed if this search for Pyramid 'Truths' failed to meet with
considerable success" (p. 177).
Kepler's Spheres
One of the better known examples of a coincidence that fired the imagination
of a great scientist is Kepler's discovery of a correspondence between the or-
bits of the planets of the solar system and the regular polyhedra. Since before
Plato it had been known that there are five such shapes: the tetrahedron (pyra-
mid; 4 triangular faces), the hexahedron (cube; 6 square faces), the octahedron
(8 triangular faces), the dodecahedron (12 pentagonal faces), and the
icosahedron (20 triangular faces). In Kepler's day there were only five known
planets in addition to the earth. Kepler was convinced that these "perfect"
forms held a key to a fundamental aspect of the solar system's design. He la-
bored to find the connection between the polyhedra and the planetary orbits,
and eventually he believed he had succeeded in doing so.
For each regular polyhedron, imagine two spheres, the largest sphere that
can be enclosed within the polyhedron and the smallest sphere in which the
polyhedron can be contained. For any given polyhedron the ratio of the diame-
ters of the inner and outer sphere is a constant; that is, it is independent of the
size of the polyhedron. This ratio differs, however, for the different polyhedra.
Kepler discovered that if the polyhedra are scaled in size and nested in a spe-
cific order in such a way that the inner sphere of one polyhedron coincides ex-
actly with the outer sphere of the next smaller one in the nesting, the ratios of
the diameters of the nested spheres approximate the ratios of the diameters of
the planetary orbits.
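The ratios Kepler worked with follow from standard solid geometry; the sketch below computes the inner-to-outer sphere ratio for each regular polyhedron, using the textbook inradius and circumradius formulas for unit edge length.

```python
import math

S5 = math.sqrt(5)

# (inradius, circumradius) of each regular polyhedron with edge length 1.
solids = {
    "tetrahedron": (math.sqrt(6) / 12, math.sqrt(6) / 4),
    "cube": (0.5, math.sqrt(3) / 2),
    "octahedron": (math.sqrt(6) / 6, math.sqrt(2) / 2),
    "dodecahedron": (0.5 * math.sqrt(2.5 + 11 / (2 * S5)),
                     (math.sqrt(3) / 4) * (1 + S5)),
    "icosahedron": ((math.sqrt(3) / 12) * (3 + S5),
                    0.25 * math.sqrt(10 + 2 * S5)),
}

for name, (r_inner, r_outer) in solids.items():
    print(f"{name:12s} inner/outer sphere ratio: {r_inner / r_outer:.4f}")
```

The cube and octahedron turn out to share one ratio (.5774) and the dodecahedron and icosahedron another (.7947), so the choice of nesting order, and not only the solids themselves, matters to the quality of the fit.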
Here is his own account of the connection he worked out and his elation
upon finding it, as quoted in Boorstin (1985, p. 310):
Kepler considered the fact that there are only five regular polyhedra to be the
reason why there were only six planets. He had, Butler (1970) suggests,
"thought that since there could only be five regular solids, thus disposed, he
had shown conclusively why there were six and only six planets, echoing
Bongo's view that numerological arguments reveal to us the 'why' of things"
(p. 87). The Bongo to whom Butler refers is Pietro Bongo, who wrote during
the 16th century on number symbolism, or number mysticism.
How surprised should we be by the correspondence that Kepler discovered?
Is this one of those coincidences that seems to demand an explanation in terms
of a common cause? We need to distinguish here between surprise at the fact
that Kepler thought there might be some connection between perfect solids and
planetary orbits and surprise at the "closeness of fit" that he found. Let us con-
sider first the question of how surprised we should be that Kepler was looking
for a connection between the solids and the orbits.
The regular polyhedra—variously known as the "Platonic solids," the "per-
fect solids," or the "cosmic bodies"—had been objects of great fascination
among philosophers since their discovery at least 2,000 years before Kepler
and they had figured prominently in explanations of natural phenomena among
the classical Greeks. The Pythagoreans, who thought the number of such fig-
ures was four (the dodecahedron had not yet been discovered), associated them
with the four primal elements—earth (hexahedron), air (octahedron), fire (tet-
rahedron), and water (icosahedron). By the time of Plato the dodecahedron had
been discovered and it represented to him the universe as a whole. It really is
not surprising that Kepler, with his mystical perspective, his propensity to see
beauty in regularity, and his firm conviction that the universe and all it con-
tained were the perfect work of an intelligent architect, should think to look for
a correspondence between the five perfect mathematical objects that he knew
and the orbits of the planets that, to him, comprised the entire solar system.
Whether we should be surprised at the closeness of the correspondence (not
exact) between the ratios of Kepler's nested spheres and the ratios of the plane-
tary orbits is a different matter altogether. This is a question of statistical reason-
ing. How should we think about this question? The basic data of interest are two
sets of ordered ratios. One of these sets of ratios is derived from the diameters of
the planetary orbits, ordered from, say, largest to smallest. The first ratio in the
set is the ratio of the diameter of the largest orbit to that of the second-largest; the
second one is the ratio of the diameter of the second-largest orbit to that of the
third-largest; and so on. The second set is derived in an analogous way from the
diameters of the nested spheres containing the perfect solids. The first ratio is the
ratio of the diameter of the largest sphere to that of the next-largest sphere, and so
on. But which polyhedron should determine the largest sphere?
Here is a very important point. Five polyhedra can be ordered in 120 differ-
ent ways, so Kepler had a choice of 120 different nesting arrangements, each of
which would produce a unique set of ordered ratios against which to compare
the ratios derived from the planetary orbits. Did Kepler pick the one that pro-
duced the best fit? I presume that he did. We would be more impressed by the fit
that Kepler found if it had been produced by arranging the polyhedra in a natu-
ral order, say from most simple to most complex or vice versa. There seems to
be no rationale for the order that Kepler selected, except, presumably, the fact
that that order came closest to yielding the correspondence for which Kepler
was looking. Today this would not generally be considered acceptable form,
but probability theory and statistics were not established fields of mathematics
at the time Kepler did his work. The book in which Kepler advanced his ideas
about the correspondence between the planetary orbits and the perfect polyhe-
dra was published in 1596, 61 years before Huygens's publication of the initial
work of Pascal and Fermat on probability and 67 years before the appearance
of Cardano's The Book of Games of Chance. And, as Devlin (2000a) points out,
"in seeking to understand the patterns of nature through the abstract patterns of
mathematics [he was] working within a tradition that continues to this day to be
highly productive" (p. 152).
Large-Number Coincidences
Among the coincidences that have increasingly attracted the attention of phys-
icists, astronomers, and cosmologists in recent years are several that involve
large, in some cases very large, numbers. Some of these large-number coinci-
dences, though interesting, have prompted little comment; others have caused
a great deal of speculation and debate.
An example of the first case is the coincidence between the gravitational en-
ergy in the universe and the universe's total mass: "When one adds up all the
mass in the observable portion of the universe, a very large number is obtained.
When one calculates the amount of negative gravitational energy that exists in
the same region of space, another large number is obtained. As far as we can
tell, the two numbers are about equal" (Morris, 1987, p. 141). Morris notes,
however, that although this is an intriguing coincidence, no conclusions have
been drawn from it.
A large-number coincidence that has puzzled physicists for many years in-
volves what would appear to be two completely independent ratios that turn
out to be almost identical. The ratio of the linear extent of the known universe
to the diameter of an atom is roughly the same as the ratio of the electromag-
netic force between an electron and a proton to the gravitational force be-
tween them. In both cases the ratio is roughly 10^40. Although associated with
the names of Eddington and Dirac, this coincidence was apparently first
pointed out in 1919 by Hermann Weyl (J. A. Wheeler, 1986). (The first ratio is
sometimes expressed in terms of the Hubble time—about 10^10 years—and the
"nuclear time," which is the time that light takes to travel a distance equal to the
diameter of a proton. Davies [1982] gives 10^41 as this ratio. If one uses the lin-
ear extent of the universe and the diameter of a proton for one ratio and the
gravitational attraction and electromagnetic repulsion between two electrons
for the other, one gets ratios closer to 10^42 [Feynman, 1965/1989].)
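The second of these ratios can be checked directly from standard physical constants; a sketch for the electron-proton case (the distance between the particles cancels, since both forces fall off as the square of it):

```python
k_e = 8.988e9    # Coulomb constant, N m^2 / C^2
G = 6.674e-11    # gravitational constant, N m^2 / kg^2
e = 1.602e-19    # elementary charge, C
m_e = 9.109e-31  # electron mass, kg
m_p = 1.673e-27  # proton mass, kg

ratio = (k_e * e**2) / (G * m_e * m_p)
print(f"electromagnetic/gravitational force ratio: {ratio:.2e}")  # ~2.3e39
```

The result, about 2.3 x 10^39, is "roughly 10^40" in the order-of-magnitude sense the text intends.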
These two ratios are sometimes referred to as the first and second cosmic
numbers, respectively. The fact that they are nearly equal is a surprising coinci-
dence, inasmuch as there is no reason to expect them to be related in any way.
And there is more. Whereas the second number is invariant, the first one
changes as the universe expands, so the mystery of the coincidence is deepened
by the fact that the numbers coincide at precisely the time that we happen to be
around to notice the fact. As if this were not mystery enough, Dirac pointed out
in the 1930s that the estimated number of particles in the observable universe is
approximately the square of the coincidental ratio. Coincidences involving
magnitudes of about 10^40 are intriguingly common and have attracted the atten-
tion of both physicists and cosmologists. Discussions of several of them can be
found in Davies (1982) and in Barrow and Tipler (1988).
Other Coincidences
There are numerous other coincidences the significance of which has been the
subject of speculation and debate. An example is the coincidence between the
laws that relate strength to distance for both electromagnetic and gravitational
force. Although, as already noted, the electromagnetic repulsion between two
electrons is greater by at least 40 orders of magnitude than the gravitational at-
traction between them, these forces both vary inversely with the square of the
distance between the objects involved. To date, no one knows whether this co-
incidence is significant, but the joining of the forces of electromagnetism and
gravity within a single theoretical framework remains a major goal of physics.
The earth has many properties that contribute to its habitability by life as we
know it. These include its distance from the sun, the near circularity of its orbit,
the tilt of its rotational axis, the size and position of its moon, its stratospheric
ozone shield against ultraviolet radiation, its magnetic shield against poten-
tially lethal cosmic radiation, the abundance of oxygen in its atmosphere and
liquid water on its surface; the list is easily extended. Some people find it hard
to accept this constellation of unusual properties that makes the earth condu-
cive to life as purely coincidental and take it as evidence that the earth is, by de-
sign, a special place.
Taking the existence of the earth with all its properties as a given, there are
other coincidences that are fortunate from the point of view of its inhabitants.
A case in point is the fact that a human being happens to be about the right size
to have developed the use of fire for practical ends. A fire that is smaller than a
small campfire is difficult to keep going because it is so easily extinguished,
and a much larger one is hard to keep under control: "Prometheus was just large
enough to feed the flame and keep from getting burnt" (Stevens, 1974, p. 25).
Another example is the coincidence involving the freezing temperature of
water, the fact that much of the earth has temperatures below this point for sig-
nificant portions of the year, and the curious fact that, unlike most liquids,
which contract upon freezing, water expands—not much, but enough to make
ice float. This is fortunate because, if ice sank, lakes would freeze solid in win-
ter—as the ice formed, it would sink to the bottom until the entire lake was a
solid block—and many would thaw only close to the top in the summer. This
would mean that much of the earth that is now habitable would not be, at least
by life forms with which we are familiar that depend on an abundance of liquid
water as a habitat.
When one considers the universe more generally, one also finds numerous
quite remarkable coincidences, some of which appear to be essential to the exis-
tence of life, and others of which seem to be required for the existence of any-
thing. These include the fact that gravity is at the right strength to permit the
formation of planets (Carter, 1974); several remarkable properties of water in
addition to its expansion upon freezing (Greenstein, 1988); the coupling of the
resonance between helium and beryllium, which enhances the production of be-
ryllium in red giant stars, and that between the reacting beryllium and helium nu-
clei with the carbon atoms that they form (if it were not for this unique double
matching of resonances there would be no carbon production and consequently
no carbon-based life; Barrow [1991] refers to this coincidence as "something
akin to the astronomical equivalent to a hole-in-one" [p. 95]); the fact that neu-
trons outweigh protons by just enough to permit the existence of hydrogen and
hence the existence of hydrogen stars, which, unlike helium stars, last long
enough for life to develop (Davies, 1982); the precise correspondence between
the expansive force of the big bang and the force of gravity (to within 1 part in
10^60) at the "Planck time" of 10^-43 seconds after the big bang (Gribbin & Rees,
1989); the equivalence of the electrical charges of protons and electrons, which
differ dramatically in almost every other respect (if these charges were not equal
to an accuracy of one part in 100 billion, objects the size of human beings or
smaller could not exist; an even greater accuracy is required to permit the exis-
tence of much larger structures, such as the earth or the sun) (Greenstein, 1988);
the precise strengths of the strong and weak forces (small differences in either
would rule out the possibility of the universe as we know it) (Barrow & Silk,
1983; Davies, 1982; Gribbin & Rees, 1989). And so on.
I mention these "coincidences" to make the point of their existence, but I
will make no attempt here at an extensive discussion of what, to me, is a most
intriguing topic. Thoughtful and thought-provoking discussions of these and
other coincidences that are essential to life as we know it, and in some cases to
the existence of a habitable universe or even any universe at all, can be found in
Roxburgh (1977), Davies (1982, 1983, 1988, 1992), Hoyle (1983), Polking-
horne (1986, 1988), Barrow and Tipler (1988), Greenstein (1988), Gribbin and
Rees (1989), and Barrow (1990, 1991, 1992), among other places. Is the identi-
fication of such coincidences simply another illustration of selectivity of the
sort described earlier in connection with studies of the great pyramid, and are
the coincidences that are identified to be expected on the basis of chance if
properly viewed, or are they aspects of the universe that beg causal explana-
tion? I confess to leaning strongly to the latter view.
There can be no question about the fact that chance coincidences can be
made to appear to be more significant than they are by selectively focusing on
them after the fact and failing to take account of the high probability that some
such co-occurrences are bound to happen by chance. One might push this ob-
servation to the extreme of dismissing all coincidences that do not have obvi-
ous explanations as due to chance. This does not seem reasonable to me; in my
view, there is a point at which attributing a coincidence to chance taxes credu-
lity to a greater degree than assuming that there is a causal explanation to be
found. Where that point is depends in part, I suspect, on one's philosophical
and/or religious perspective as much as on anything else.
Davies (1982) refers to the discovery that many of the coincidences men-
tioned in the preceding section are necessary for our existence as one of the
most fascinating discoveries of modern science. Certainly it has prompted a
great deal of thought and discussion among scientists and philosophers in re-
cent years. One outcome of this thinking that has received considerable atten-
tion is the anthropic principle or, as it is sometimes called, the anthropic
cosmological principle.
The term anthropic principle was used first by the British physicist Brandon
Carter in 1974. The basic idea has been elaborated by numerous others, nota-
bly G. J. Whitrow and Robert Dicke. According to this principle, the very exis-
tence of human life constrains the universe. That is to say, given the existence
of observers, the universe must have certain properties that are prerequisite to
their existence. One statement of the principle is, "If some feature of the natural
world is required for our existence, then [inasmuch as we exist] it must indeed
be the case" (Greenstein, 1988, p. 46). Another similar expression of it is, "The
only things that can be known are those compatible with the existence of
knowers" (Greenstein, 1988, p. 47).
These assertions seem obvious, unlikely to provoke controversy, and not even
very interesting. To be sure, if the conditions that permit life and knowers to exist
did not hold, we would not be here to realize it; they must hold, given that we are
here. But does this observation constitute an explanation of anything? Does it
make the fact that the critical conditions do hold any less mysterious? For some
at least, it does not. As Gribbin and Rees (1989) put it, "To say that we would not
be here if things were otherwise... need not quench our curiosity and surprise at
finding that the world is as it is" (p. 285). Or as Davies (1992) argues, "The fact is
we are here, and here by grace of some pretty felicitous arrangements. Our exis-
tence cannot of itself explain these arrangements" (p. 204).
A universe that is compatible with the existence of knowers appears to be
extremely unlikely on a priori grounds, depending as it does on an impressive
number of apparently independent, but precise conditions. Our surprise comes
from the narrowness of the tolerances of the critical conditions, from the fact,
in other words, that if this or that constant differed from its actual value by the
tiniest amount, no universe containing knowers—indeed perhaps no universe
at all—could exist. The existence of the universe and our existence in it would
impress us less if they were known to be compatible with a wide range of con-
ditions. Why, we feel compelled to ask, did precisely the right conditions per-
tain, when the probability of their doing so by chance seems to have been so
infinitesimally small?
The anthropic principle has been stated in a variety of ways and means differ-
ent things to different people. A distinction is often made between weak and
strong versions of the principle. According to the weak anthropic principle
(which is the interpretation given in the preceding paragraphs), life can exist
only in an environment that is habitable. From the fact of our existence, there-
fore, we can infer certain properties of the universe, namely those properties it
must have in order to permit our existence. For example, from the fact that we
are a carbon-based form of life, we can infer that the universe must be, say, be-
tween 10 and 20 billion years old.
The argument involves assumptions about the origins and life cycles of stars,
the synthesis of complex elements, the formation of the solar system, the evolu-
tion of life, and the times required for these processes, but the reasoning is rela-
tively straightforward and easy to follow. If the universe were much younger,
according to this argument, we would not be here, because there would not have
been enough time for carbon and the other heavy elements that are necessary for
human life to have been produced by the thermonuclear activity within the stars.
When it is much older, the stars that are necessary to sustain life will have burned
themselves out. We live, so this argument goes, during the only period in the
universe's evolution that is conducive to life. A similar line of reasoning can be
used to infer other properties that the universe must have.
The strong form of the anthropic principle holds that a habitable environ-
ment must exist; in Carter's words, "The universe must be such as to admit the
creation of observers within it at some stage" (quoted in Davies, 1982, p. 120).
Davies points out that the strong anthropic principle, at least as articulated by
Carter, has a quite different philosophical basis than does the weak one:
Indeed, it represents a radical departure from the conventional concept of scientific
explanation. In essence, it claims that the universe is tailor-made for habitation, and
that both the laws of physics and the initial conditions obligingly arrange them-
selves in such a way that living organisms are subsequently assured of existence. In
this respect the strong anthropic principle is akin to the traditional religious explana-
tion of the world: that God made the world for mankind to inhabit. (p. 120)
SUMMARY
Coincidences are ubiquitous. They capture our attention when they unexpect-
edly bring together presumably independent events that we find interesting or
meaningful in combination. Some coincidences are best considered chance
co-occurrences; others can be traced to common, or closely related, causes.
When noticed, coincidences often motivate search for causal explanation. Ef-
forts to find explanations can evoke or reinforce superstitious thinking; they
have also sometimes led to scientific discoveries. Certain remarkable coinci-
dences involving physical and astronomical constants have attracted the atten-
tion of cosmologists, among others, in recent years and speculation continues
regarding what they might mean.
CHAPTER 4
Inverse Probability
and the Reverend
Thomas Bayes
A form of reasoning that has received a great deal of attention from philoso-
phers, mathematicians, and psychologists is based on theorems proposed by the
18th-century British cleric Thomas Bayes (1763) and the French astronomer
and mathematician Pierre Laplace (1774). Today Bayes's name is more
strongly associated with this development than is that of Laplace, who is better
known for his work on celestial mechanics culminating in a five-volume opus
by that title and his treatise on probability theory published in 1812; but
Todhunter (1865/2001) credits Laplace with being the first to enunciate dis-
tinctly the principle for estimating the probability of causes from the observa-
tions of events. Although the popularity of Bayesian statistics waxed and
waned—perhaps waned more than waxed—over the years, it has enjoyed
something of a revival of interest among researchers during the recent past, as
indicated by a near doubling of the annual number of published papers making
use of it during the 1990s (Malakoff, 1999).
In classical logic, given the premise "If P, then Q," one cannot argue from
the observation Q to the conclusion P; such an argument is known as "affirma-
tion of the consequent" and is generally considered fallacious. Nevertheless
both in everyday life and in scientific reasoning, we do take the observation of
Q as evidence favoring the likelihood of P: If my hypothesis is correct (P), mix-
ing these two chemicals should produce a small explosion (Q); observing a
small explosion upon mixing the chemicals (Q) strengthens my confidence in
the correctness of my hypothesis. Most of us would probably agree, perhaps
with certain caveats, that that is as it should be.
Although Bayes is remembered for having quantified this idea, the idea it-
self, or something close to it, was probably around long before Bayes. McLeish
(1994) sees the logic that underlies Bayes's rule in the reasoning represented in
the Talmudic law, as, for example, when a decision must be made as to whether
a child of a recently widowed and remarried woman is more likely that of the
deceased husband or the new one. Inasmuch as the baby was born 9 months af-
ter the death of the woman's first husband and 6 months after her remarriage,
either paternity is possible. The reasoning takes account of the probabilities of
full-term and premature births, as well as that of a woman showing signs of
pregnancy within 3 months of conception.
THE THEORY
Bayes formalized his notion in what has sometimes been referred to as the "in-
verse probability theorem," which may be represented as follows:

\[
p(H_i \mid D) = \frac{p(D \mid H_i)\, p(H_i)}{\sum_{k=1}^{n} p(D \mid H_k)\, p(H_k)},
\]

where p(H_i | D) is the probability of Hypothesis i given the datum D (the poste-
rior probability of H_i), p(D | H_i) is the probability of the datum D given Hypoth-
esis i (the probability of D conditional on H_i), p(H_i) is the prior probability of
Hypothesis i (the probability of H_i before the observed datum D), and n is the
total number of hypotheses in the set. Because the hypotheses among which one
wishes to distinguish are often potential causes of an observed effect, the theorem
is sometimes written

\[
p(C_i \mid E) = \frac{p(E \mid C_i)\, p(C_i)}{\sum_{k=1}^{n} p(E \mid C_k)\, p(C_k)},
\]
where C and E represent cause and effect, respectively. Conceptualizing the re-
lationship in terms of hypotheses and data gives it a somewhat broader conno-
tation than does conceptualizing it in terms of causes and effects, because the
hypotheses can include, but are not necessarily restricted to, causal hypotheses
and, similarly, effects are one type of data that can figure in Bayesian reasoning
but not the only one.
Bayes's rule is nothing more nor less than a formula for computing a specific
probability given that specific other probabilities are known. It tells us how to
compute the probability of a hypothesis conditional on some data, if we already
know the probability of that hypothesis and all competing hypotheses in the ab-
sence of the data and, for each hypothesis, the probability of observing the data
in question conditional on that hypothesis. In short, one has to know quite a lot
about a situation in order to apply Bayes's theorem, or, if one does not know the
essential probabilities, one must make some assumptions or guesses about
them. Much of the contention about the application of Bayes's theorem in par-
ticular cases has to do with the question of the justifiability of assumptions or
guesses that are made.
The rule may be applied iteratively to accommodate a sequence of observations, in which case the value of p(H_i | D) that results from any given computation becomes the p(H_i) for the next iteration. Thus the following representation is appropriate for the situation in which Bayes's theorem is to be used when updating a posterior probability estimate with a sequence of observations:

    p(H_i | D_n) = p(D_n | H_i) p(H_i | D_{n-1}) / Σ_k p(D_n | H_k) p(H_k | D_{n-1}),

where p(H_i | D_n) represents p(H_i) after the nth observation. To calculate p(H_i | D_1), the probability of H_i following the first observation, p(H_i | D_{n-1}) in the equation would be replaced with p(H_i), which represents the a priori, or prior, probability of H_i: the probability of H_i before any observations are made.
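By way of illustration (mine, not the author's; the likelihoods are invented values), the following sketch applies the update twice, feeding each posterior back in as the next prior:

    # Iterative updating: the posterior after each observation becomes
    # the prior for the next. The numbers are illustrative assumptions.

    def bayes_update(priors, likelihoods):
        joint = [p * lk for p, lk in zip(priors, likelihoods)]
        p_data = sum(joint)
        return [j / p_data for j in joint]

    beliefs = [1/3, 1/3, 1/3]              # prior p(H_i) before any data
    for likelihoods in ([0.1, 0.5, 0.9],   # p(D_1 | H_i)
                        [0.1, 0.5, 0.9]):  # p(D_2 | H_i)
        beliefs = bayes_update(beliefs, likelihoods)
    print(beliefs)  # -> approximately [0.009, 0.234, 0.757]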
Likelihood Ratio
A concept that is often encountered in the Bayesian literature is that of the like-
lihood ratio. A likelihood ratio is the ratio of two conditional probabilities. The ratio

    p(D | H_i) / p(D | H_j),

for example, is said to be the likelihood ratio of D given H_i to D given H_j. People sometimes find it easier to estimate likelihood ratios than to estimate the underlying conditional probabilities, and likelihood ratio estimates can suffice for the application of Bayes's rule in some instances.
The likelihood ratio tells us nothing about the absolute sizes of the condi-
tional probabilities involved, but only about the size of one relative to the size
of the other. It does give a good indication, however, of how discriminating a
bit of evidence is with respect to the two hypotheses involved. When the ratio is
close to 1, the observation does not help much in choosing between them; how-
ever, the more it deviates from 1 in either direction, the greater the credence it
gives to one hypothesis or the other. This feature of the ratio is usually said to
reflect the diagnosticity of the data. The greater the diagnosticity of any datum,
the more useful it is in helping us choose among candidate hypotheses.
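An invented numerical illustration: if p(D | H_1) = .60 and p(D | H_2) = .30, the likelihood ratio is

    .60/.30 = 2.0,

and observing D doubles the odds favoring H_1; if instead p(D | H_1) = .48 and p(D | H_2) = .45, the ratio is only about 1.07, and the same datum, though quite probable under either hypothesis, is almost worthless for discriminating between them.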
Odds

Bayes's rule can also be cast in terms of odds. Dividing the expression for p(H_i | D) by the corresponding expression for p(H_j | D), the common denominator cancels, and we have

    p(H_i | D) / p(H_j | D) = [p(D | H_i) / p(D | H_j)] × [p(H_i) / p(H_j)];

that is, the posterior odds equal the likelihood ratio times the prior odds.
Thus, to revise the odds so as to take into account a new observation, one
multiplies the existing odds by the likelihood ratio of the observation. In other
words, the odds for two hypotheses following the nth observation is simply the
product of the odds following the (n - l)th observation and the likelihood ratio
of the nth observation. So if one knows, or can estimate, the prior odds and the
likelihood ratio for a given observation, one can update the odds in light of the
observation without estimating conditional probabilities.
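A small invented example: suppose the prior odds favoring H_1 over H_2 are 1 to 4, that is, Ω_0 = .25, and an observation arrives whose likelihood ratio is L = 8. The posterior odds are then

    Ω_1 = L × Ω_0 = 8 × .25 = 2,

or 2 to 1 in favor of H_1 (equivalently, p(H_1 | D) = 2/3), and no individual conditional probability had to be estimated along the way.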
C. R. Peterson and Phillips (1966) suggested modifying the preceding equation by adding an exponent on the likelihood ratio,

    Ω_n = L_n^c × Ω_{n-1},

where Ω_n denotes the odds after the nth observation and c is an exponent estimated from people's actual revisions; values of c less than 1 reflect the conservatism discussed later in this chapter.

To see the rule at work, consider an urn known to contain black and white balls in one of three proportions, the three hypotheses being that the proportion of black balls is .1 (Hypothesis 1), .5 (Hypothesis 2), or .9 (Hypothesis 3), each hypothesis having a prior probability of 1/3 (Case 1 of Table 4.1). Suppose a single ball is drawn and proves to be black. The denominator of Bayes's rule is

    Σ_{k=1}^{3} p(black | H_k) p(H_k),

which, since the datum in hand is a black ball and there are three hypotheses to consider, we will rewrite as

    (.1)(.333) + (.5)(.333) + (.9)(.333) = .5,

and the posterior probabilities are therefore

    p(H_1 | D) = (.1)(.333)/.5 ≈ .067,
    p(H_2 | D) = (.5)(.333)/.5 ≈ .333,
    p(H_3 | D) = (.9)(.333)/.5 ≈ .600.
The reader who is surprised to see that the posterior probability of the third
hypothesis is almost 10 times as large as the posterior probability of the first
hypothesis after the drawing of only a single ball is in good company. Most
people do not change their estimates as much as Bayes's rule indicates they
should as a consequence of observations; this is the well-known finding of con-
servatism in evidence evaluation that many investigators have described, about
which more later. Suppose that we draw a second ball and it too is black. The
denominator of our equation is now

    (.1)(.067) + (.5)(.333) + (.9)(.600) ≈ .713,

and the posterior probabilities become .009, .234, and .757, as shown in Table 4.1.
Suppose instead that the second ball drawn had been white. In keeping with our intuitions, drawing a ball of each color on two succes-
sive draws increases the posterior probability that the urn from which the balls
are being drawn is the one containing black and white balls in equal numbers,
and decreases the posterior probabilities of the competing hypotheses equally,
but not by enormous amounts. The reader will easily verify that the distribution
of probabilities following the drawing of a ball of each color would be the same
independently of whether the black or the white ball was drawn first. If this
were not the case, the outcome would so strongly violate our intuitions that we
would find it hard to take the rule seriously.
This example is very simple, in that there is no difficulty in determining what
the set of hypotheses should be and there is no problem in assigning values for ei-
ther prior or conditional probabilities. Most real-life problems are not this straight-
forward. Also there is really no need for human judgment in this example.
However, it is interesting to consider how closely the posterior probabilities that
people assign on an intuitive basis match those produced by application of Bayes's
rule even in comparably simple cases. Because of experimental results showing
people to be too conservative in their estimates of posterior probabilities (see later
discussion) some investigators have proposed that people should be relied upon to
make conditional probability judgments and that posterior probabilities should be
calculated—from these estimated conditionals—automatically.
A difficulty with this view is that when the conditional probabilities are known,
as in the previous example, there is no need for anyone to estimate them, and when
they are unknown it is not clear how accurate people's estimates of them are. In
many instances, there is no alternative to estimates produced by people. In such
cases, people could be asked to estimate either conditional or posterior probabili-
ties. The opinion that people should be asked to estimate conditionals from which
posteriors can be calculated rests on the assumption that the greater ability of peo-
ple to estimate conditionals in situations in which they can be determined objec-
tively carries over to situations in which they cannot be so determined.
A tacit assumption throughout the foregoing discussion has been that the
observations—the data—used to update posterior probabilities are unambigu-
ous. The ball that is drawn from the urn is either black or white, and upon draw-
ing and observing it there is no question as to which is the case. Often in real
life the evidence to which one has access is not so assuredly reliable. The situa-
tion is more like that in which one gets information through a noisy channel;
one does not observe the ball directly, for example, but gets a report from an ob-
server who sometimes lies or makes a mistake.
Techniques have been proposed for taking account of the unreliability of
data in applying Bayes's rule (Gettys & Wilke, 1969; Schum & DuCharme,
1971; Schum, DuCharme, & DePitts, 1973; Snapper & Fryback, 1971; Steiger
& Gettys, 1972). They all involve introducing one or more terms in the compu-
tation to adjust the magnitude of the influence that a reported observation can
have in updating the probabilities of alternative hypotheses and they have the
effect of decreasing the diagnosticity of reported data. We will not consider
these techniques here, but it is important to understand that the need for them
arises because Bayes's rule assumes reliable data, and that as-
sumption often is not valid in real-life situations.
The deduction of the probability of causes from the probability of their conse-
quences is a game whose rules are such that no one can take part in it without
cheating. (Eddington, 1935, p. 126)
The question of whether Bayes's rule is descriptive of how people actually ap-
ply new data to the modification of their beliefs has received a great deal of at-
tention from researchers, and we will consider some of the experimental
results presently. Whether or not people behave like Bayesians, Bayes's rule
has been promoted widely as a normative, or prescriptive, theory of how be-
liefs should be changed as a consequence of the acquisition of new data that re-
late to them. Not everyone who has thought about it has readily accepted this
view, however, and debates about what inverse probability, which is what the
rule is said to compute, really means go back to the time of Bayes.
A problem perceived by Bayes (who never published his rule) was the need to
estimate prior probabilities. In order to apply the theorem, quantitative values
for prior probabilities must be assigned to the competing hypotheses under
consideration. In many practical situations, there is no objective way to deter-
mine prior probabilities, so if the rule is to be applied in these cases, subjective
probabilities must be used. In the absence of information that would justify
some other assumption, there seems to be no alternative to assuming that all
hypotheses are equally likely a priori. Today this might be justified on the
grounds that it is the assumption that maximizes a priori uncertainty, in the in-
formation-theoretic sense, and this seems appropriate for the situation in
which one has no basis for considering any hypothesis to be more or less likely
than any other. Bayes had trouble with the "uniform prior probability distribu-
tion" assumption, whereas Laplace did not.
One of the implications of the use of subjective estimates of prior proba-
bility distributions, when agreed-upon objective distributions are not known,
is that different individuals applying Bayes's rule to the same observations
will produce different posterior probabilities if their subjective estimates of
the prior probabilities differ. To people who consider probability to be a sub-
jective entity in any case, this poses no problem, because on the view that a
probability reflects nothing more than a state of mind—a degree of knowl-
edge or ignorance of the situation of interest—there is little reason to assume
equality among all observers in that regard. To one who wishes to give proba-
bility a more objective meaning, this feature of Bayes's rule is bothersome,
and the general applicability of the rule has been challenged by holders of a
frequentistic view of probability (Venn, 1888; von Mises, 1928/1957). Ed-
dington (1935) expressed a negative view of the use of subjective probabili-
ties in scientific calculations this way: "the most elementary use of the word
probability refers to strength of expectation or belief which is not associated
with any numerical measure. There can be no exact science of these non-nu-
merical probabilities which reflect personal judgment" (p. 113). Bunge
(1996) criticizes Bayesians for using probability—"or rather their own ver-
sion of it"—when confronted with ignorance or uncertainty: "This allows
them to assign prior probabilities to facts and propositions in an arbitrary
manner—which is a way of passing off mere intuition, hunch, guess for sci-
entific hypothesis" (p. 103).
TABLE 4.1
Posterior Probabilities of Each of Three Hypotheses About the Proportion
of Black Balls in an Urn After n Draws, Each of Which Produces a Black Ball

                          Case 1                  Case 2
n                   0       1       2       0       1       2
p(H_1 | D_n)     .333    .067    .009      .8    .364    .070
p(H_2 | D_n)     .333    .333    .234      .1    .227    .219
p(H_3 | D_n)     .333    .600    .757      .1    .409    .711

Note. H_1: proportion of black balls = .1; H_2: proportion of black balls = .5; H_3: proportion of black balls = .9. In Case 1, the prior probabilities for the three hypotheses are equal at .333; in Case 2, they are .8, .1, and .1 for Hypotheses 1, 2, and 3, respectively.
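The values in Table 4.1 can be checked with a short script (a sketch of my own, not from the text; the function mirrors the one given earlier):

    # Reproduce Table 4.1: two successive black draws under each of
    # the two sets of prior probabilities.

    def bayes_update(priors, likelihoods):
        joint = [p * lk for p, lk in zip(priors, likelihoods)]
        p_data = sum(joint)
        return [j / p_data for j in joint]

    p_black = [0.1, 0.5, 0.9]  # p(black | H_i) under the three hypotheses

    for priors in ([1/3, 1/3, 1/3], [0.8, 0.1, 0.1]):  # Case 1, Case 2
        beliefs = list(priors)
        rows = [[round(b, 3) for b in beliefs]]        # n = 0 column
        for _ in range(2):                             # draws n = 1 and n = 2
            beliefs = bayes_update(beliefs, p_black)
            rows.append([round(b, 3) for b in beliefs])
        print(rows)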
It is often observed that, given enough data, the influence of the prior probabilities washes out; but any comfort we derive from this supposition does not apply to situations in
which only a single observation is to be made. There are many real-life, or
real-life-like, situations in which one has the results of a single observation and
wants to know what to make of them. A relatively reliable, but not infallible,
medical test has proved positive; what do I conclude from that regarding the
probability that I have the rare disease for which the test is diagnostic? An ac-
quaintance in whose judgment I have considerable, but not total, confidence
tells me that...; how seriously should I take this claim? Answering these ques-
tions within a Bayesian framework means revising prior probabilities in the
light of the observations, and inasmuch as there is only one observation in each
case, how the prior probabilities are estimated and what the estimates are are of
some importance.
L. J. Cohen (1981) focuses on two situations that are typical of those to which
Bayesian analysis is usually considered appropriate. The first is a problem at-
tributed to Kahneman and Tversky (1972a) that goes as follows. Eighty-five
percent of the taxi cabs in a given city are blue, and the remaining 15% are
green. A cab that was involved in a hit-and-run accident at night was identified
by a witness as green. Tests showed that the witness was able to distinguish a
blue cab from a green cab under night lighting conditions four out of five times.
What is the probability that the cab involved in the accident was blue? The me-
dian probability estimate produced by the participants in the cited experiment
was .20. This result suggested to the investigators that the participants ignored
prior probabilities and based their estimates entirely on what they were told re-
garding the accuracy of the witness in identifying cab color.
According to one view, the correct answer is given by the formula

    p(B | g) = p(g | B) p(B) / [p(g | B) p(B) + p(g | G) p(G)],

where p(B | g) is the posterior probability that the cab is Blue given that the witness said "green," p(g | B) is the probability that the witness would say "green" given that the cab was Blue, and p(B) is the prior probability that the cab was Blue. The prior probability distribution represents what one should consider the probabilities to be if one did not have the testimony of a witness.
In fact, given the aforementioned statement of the problem, we cannot solve this equation exactly because we do not know the value of p(g | B) or p(g | G). The problem statement tells us only that the witness could identify the correct cab color four times out of five, but it does not tell us the conditional probabilities of the two types of errors: saying "blue" when the cab was Green and saying "green" when it was Blue. We will return to this point later. For now, if we make the assumption that these two error probabilities are equal, and we use the 85-to-15 ratio of Blue-to-Green cabs as the basis for assigning prior probabilities, p(B) = .85 and p(G) = .15, then the computation is:

    p(B | g) = (.2)(.85) / [(.2)(.85) + (.8)(.15)] = .17/.29 ≈ .59.

The problem can also be laid out as a table of joint probabilities (see Table 4.2), in which:
A: p(B&b), the probability the cab is Blue and the witness says "blue."
B: p(G&b), the probability the cab is Green and the witness says "blue."
C: p(B&g), the probability the cab is Blue and the witness says "green."
D: p(G&g), the probability the cab is Green and the witness says "green."
E, the sum of p(B&b) and p(B&g), is the probability that a randomly se-
lected cab is Blue and F, the sum of p(G&b) and p(G&g), is the probability that
it is Green. G is the probability that the witness says "blue" and H the probabil-
ity that he says "green." Of greatest interest for present purposes are the follow-
ing conditional probabilities:
A/E: p(b | B), the probability the witness is correct given the cab is Blue.
D/F: p(g | G), the probability the witness is correct given the cab is Green.
A/G: p(B | b), the probability the witness is correct given he says "blue."
D/H: p(G | g), the probability the witness is correct given he says "green."
TABLE 4.2
Representation of the Cab Colors—Witness Report Problem

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"        A       B        G
             "green"       C       D        H
Column Σ                   E       F
Note. Each cell entry, A, B, C and D, represents the joint probability of the cab being the color indicated
by the associated column and the witness reporting the color indicated by the associated row. Marginal
entries, E, F, G, and H, are row and column sums. (Note that B and G, as markers in the table, are not to be
confused with B and G standing for Blue and Green cabs in probability notation in the text.)
Note that the unconditional probability that the witness makes a correct
identification is A + D, the probability that the cab is Blue and the witness says
"blue" plus the probability that the cab is Green andthe witness says "green."
Table 4.3 shows the situation defined by the conditions of Lyon and Slovic's
(1976) study, in which the prior probabilities of the cab being Blue and Green
were .85 and .15, respectively, and the probability that the witness's report was
correct was .8 irrespective of the cab's actual color. The conditional probabili-
ties of interest are represented here only implicitly; the probability that the witness says "green" given that the cab is actually Blue, p(g | B), for example, is the ratio of the joint probability of the cab being Blue and the witness saying "green," p(B&g), to the probability of the cab being Blue, p(B&g) + p(B&b), which is .17/.85, or .2.
According to the conventional Bayesian approach to this problem, the prob-
ability that the cab was actually Blue, given that the witness said it was "green,"
is computed from the application of Bayes's rule, using the base rates .85 and
.15 as the prior probabilities of the cab being blue and green, respectively, and
applying the conditional probabilities of the witness to derive the posterior
probabilities, which represent the probability of the cab being a specified color
that takes both the prior probabilities and the observational data into account.
The computation, as was noted previously, gives .17/.29, or .59, as the poste-
rior probability that the cab was blue, given that the witness reported it to be
green. The situation is as shown in Table 4.3.
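The arithmetic behind Table 4.3 and the .59 can be checked with a few lines of Python (a sketch of my own; the probabilities come straight from the problem statement and the equal-error assumption):

    # Taxicab problem: joint probabilities assuming an 85/15 base rate
    # and a witness who is correct with probability .8 for either color.
    p_color = {"Blue": 0.85, "Green": 0.15}
    p_correct = 0.8

    # joint[(color, report)] = p(cab is color AND witness reports report)
    joint = {}
    for color in p_color:
        for report in ("blue", "green"):
            hit = (report.capitalize() == color)
            joint[(color, report)] = p_color[color] * (p_correct if hit else 1 - p_correct)

    p_says_green = joint[("Blue", "green")] + joint[("Green", "green")]  # .29
    print(joint[("Blue", "green")] / p_says_green)  # -> about .59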
L. J. Cohen (1981) argues that .59 is not the correct answer to the question.
What this ratio represents, in his words, is as follows:
[It represents] the value of the conditional probability that a cab-colour identifi-
cation by the witness is incorrect, on the condition that it is an identification as
green. Jurors, however, or people thinking of themselves as jurors, ought not to
TABLE 4.3
The Analysis Shown in Table 4.2 With the Values Used in the Taxicab Problem
Discussed Earlier

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .68      .03      .71
             "green"      .17      .12      .29
Column Σ                  .85      .15     1.00
TABLE 4.4
Case 1

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .40      .10      .50
             "green"      .10      .40      .50
Column Σ                  .50      .50     1.00
TABLE 4.5
Case 2

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .50      .20      .70
             "green"      .00      .30      .30
Column Σ                  .50      .50     1.00
TABLE 4.6
Case 3

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .30      .00      .30
             "green"      .20      .50      .70
Column Σ                  .50      .50     1.00
In all three cases the probability of a correct identification, all trials combined, is .8, but only in Case 1 is it true that the probability that the witness
is correct is .8 independently of whether he says "green" or "blue." The proba-
bility that he is correct, given that he says "green," is 1.0 and .71 in Cases 2 and
3, respectively; the probability that he is correct, given that he says "blue," is
.71 and 1.0 in the same two cases.
If we plug the appropriate conditional probabilities from these matrices into the Bayesian calculation and use .85 as the prior probability that the cab is blue, we get as the posterior probability p(B | g), in the three cases:

    Case 1: (.2)(.85) / [(.2)(.85) + (.8)(.15)] = .17/.29 ≈ .59,
    Case 2: (.0)(.85) / [(.0)(.85) + (.6)(.15)] = 0/.09 = .00,
    Case 3: (.4)(.85) / [(.4)(.85) + (1.0)(.15)] = .34/.49 ≈ .69.
Alternatively, if we accept Cohen's argument that "the fact that cab colours actually vary according to an 85/15 ratio is strictly irrelevant" to the estimate of the probability that the cab is Blue, given that the witness says it is "green," and use .5 as the prior probability, we get for the three cases, again using Bayes's rule:

    Case 1: ≈ .20, Case 2: .00, Case 3: ≈ .29.
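These case-by-case results can be verified with a short script (mine; the conditional probabilities are read off Tables 4.4 through 4.6):

    # p(Blue | "green") for Cases 1-3 under two priors: the 85/15 base
    # rate and Cohen's uniform prior.
    cases = {  # (p("green" | Blue), p("green" | Green)), from Tables 4.4-4.6
        "Case 1": (0.2, 0.8),
        "Case 2": (0.0, 0.6),
        "Case 3": (0.4, 1.0),
    }
    for p_blue in (0.85, 0.5):
        for name, (g_if_blue, g_if_green) in cases.items():
            num = g_if_blue * p_blue
            den = num + g_if_green * (1 - p_blue)
            print(p_blue, name, round(num / den, 2))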
L. J. Cohen (1981) argues that .2 is the correct answer, not because it comes
from the application of Bayes's rule, as in Case 1 as just listed, but on the basis
of its being the conditional probability of the witness saying "green" when the
cab is really Blue. As we have noted, this conditional probability is an assump-
tion that goes beyond what is given in the statement of the problem. Cohen un-
doubtedly realized the need for this assumption, because he mentioned that
Lyon and Slovic (1976) had made it explicit in an article that postdated
Kahneman and Tversky's original one.
Another conceivable, though perhaps less likely, interpretation of the claim
that the witness distinguishes blue cabs from green in 80% of cases is that he
does so when he encounters the two colors with the relative frequency with
which cabs of these colors appear around town, which is to say when 85% of
the instances he has to judge are blue and 15% are green. With this interpreta-
tion, we again can imagine three cases analogous to those described previously
(see Tables 4.7, 4.8, and 4.9).
TABLE 4.7
Case 4

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .68      .03      .71
             "green"      .17      .12      .29
Column Σ                  .85      .15     1.00

TABLE 4.8
Case 5

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .80      .15      .95
             "green"      .05      .00      .05
Column Σ                  .85      .15     1.00

TABLE 4.9
Case 6

                         Cab Color
                         Blue    Green    Row Σ
Witness Says "blue"       .65      .00      .65
             "green"      .20      .15      .35
Column Σ                  .85      .15     1.00

(Note: In Cases 5 and 6, I used the largest probabilities in the Blue and "blue" and Green and "green" cells, respectively, that were possible with the constraint that the overall probability of a correct identification stay constant at .8. In Case 6, it was possible to have the probability of the Green and "green" cell match the probability of the occurrence of a Green cab, thus making p(correct | Green) = 1.0. In Case 5, it was not possible to have the probability of Blue and "blue" equal the probability of the occurrence of a Blue cab, because anything greater than .8 in this cell would have made the overall probability of a correct identification be greater than .8.)
Again, in all three cases the probability of a correct identification, all trials
combined, is .8, but the probability that the identification is correct conditional
on a particular identification response varies with the specifics of the cases.
The conditional probabilities of interest are as shown in Table 4.10.
Let us again calculate with Bayes's rule the probability that the cab was Blue, given that the witness reported it to be "green," using the values assumed in these three cases. The results are .59 for Case 4, 1.00 for Case 5, and .57 for Case 6.
TABLE 4.10
The Probability of Being Correct, Conditional on Either of the Two Cab Colors
or Either of the Two Reports, for Cases 4, 5, and 6 Discussed Earlier

                            Case
                        4      5      6
p(correct | Blue)      .80    .94    .76
p(correct | Green)     .80    .00   1.00
p(correct | "blue")    .96    .84   1.00
p(correct | "green")   .41    .00    .43
Note that in each case the probability that the cab is Blue, given that the witness says "green," is the complement of the prob-
ability that the witness is correct, given that he says "green." The correspon-
dence of the numbers simply shows that one gets the same result if one uses
Bayes's rule as one does when one obtains the probabilities directly by consid-
ering the appropriate outcome ratios.
The cases just considered illustrate that the claim that the witness can distin-
guish blue cabs from green ones in 80% of cases does not provide the informa-
tion needed to support an unqualified answer to the question of what is the
probability that the cab in the accident was Blue, given that the witness has
identified it as "green." One might acknowledge the legitimacy of these vari-
ous interpretations of the claim that the witness can distinguish blue cabs from
green ones in 80% of cases, but argue that the most natural interpretation is one
that assumes that the witness is as likely to identify a blue cab as green
as to identify a green one as blue, and that in the absence of evidence to the con-
trary, we should assume this to be the one that people naturally make. It seems
to me a reasonable working assumption, but it is important to recognize that it
is an assumption.
L. J. Cohen (1981) argues that "[t]he fact that cab colours actually vary ac-
cording to an 85/15 ratio is strictly irrelevant to this estimate [the estimate of
the probability that the cab was blue, given that the witness has said it was
green], because it neither raises nor lowers the probability of a specific
cab-colour identification being correct on the condition that it is an identifi-
cation by the witness" (p. 328), and he justifies this view on the grounds that
"[a] probability that holds uniformly for each of a class of events because it is
based on causal properties, such as the physiology of vision, cannot be al-
tered by facts, such as chance distributions, that have no causal efficacy in the
individual events" (p. 329).
A conventional counter to this argument is that, given the assumption that
the probability that the witness will misidentify the color on a single trial is .2,
independently of whether the cab is actually blue or green, then a long run of
trials on 85% of which the cab is blue will produce the outcome represented by
Case 4. If we think of the experiment as a random one from that long run of
imaginary trials, and take the theoretical relative frequencies as indicative of
the momentary probabilities, then the numbers in the matrix associated with
Case 4 represent also the probabilities associated with that trial, and the correct
answer to the question of the probability that the cab was blue, given that the
witness said it was green, is .59. Cohen argues that a relative-frequency concep-
tion of probability is not appropriate here, that probability in this case is better
viewed as a "causal propensity," the judgment of which does not rely on rela-
tive frequencies, and that the sought-for probability is .2, the probability that
the witness makes an error.
I confess to not fully understanding Cohen's argument. It seems to me he dis-
misses the relevance of relative frequencies, but uses them all the same. He de-
nies the relevance of the relative frequencies of blue and green cabs in operation,
but he accepts the relative frequency with which the witness makes errors of
identification (Margalit & Bar-Hillel, 1981). And it is not clear to me how an er-
ror of identification qualifies as a causal propensity any more than do the rela-
tive numbers of cabs of different colors in operation. One can think of causes of
errors of identification (poor vision, poor lighting), but in what sense is the
causal connection between such factors and the problem greater than is the rela-
tive availability of cabs of specified colors for involvement in an accident?
In defending Cohen's analysis of the taxicab problem, Levi (1981) argues
that the percentage of the total population of cabs that are blue (or green) is not
relevant to the jurors' task; what they need to be told, he claims, is "the percent-
age of blue (green) cabs in the city involved in accidents" (p. 343). (See also
Niiniluoto [1981].) A good Bayesian, he argues, "should neglect the base rate
given in the example because it is useless for the purpose of determining sub-
jective probabilities" (p. 343). He notes too that when people have been told
that 85% of the taxicabs involved in accidents have been blue, they have not ne-
glected base rates.
I would agree that knowledge of the percentage of cabs of a specified color
that have been involved in accidents should count for more than knowledge of
the percentage of cabs of a specified color in the town's fleet. If told, for exam-
ple, that 75% of the cabs in town are green, but that 75% of the cabs that have
been involved in accidents have been blue, I would think the latter percentage a
more appropriate basis than the former for estimating the prior probability that
a random accident will involve a blue cab. But the question is, if one does not
have the accident statistics and has only the population statistics, should one
use the latter or should one ignore them? It seems to me one should use them,
but that one should have less confidence in one's answer than would be justi-
fied if it were based on more situation-specific base rate data.
Let us turn to the second situation considered by L. J. Cohen (1981) in his dis-
cussion of the possibility of demonstrating human irrationality. A patient suf-
fers from one of two diseases, A or B. Imagine yourself as that patient. Cohen
posits the following: a diagnostic test, tried out on equal numbers of A- and B-patients and known to be correct on 80% of those occasions, indicates that you are suffering from Disease B, although Disease A is by far the more common of the two. Should you be treated for Disease A, in deference to the relative frequencies of the two diseases, or for Disease B, in deference to the test?
Cohen leaves no doubt of his answer to the question: "It is the former option
that would be the irrational one for you, qua patient, not the latter" (p. 329).
Note first that, as was true of the taxicab problem as presented, the descrip-
tion of this problem is, strictly speaking, inadequate to permit unequivocal
computation of posterior probabilities, whatever one's attitude toward base
rates. The problem is with the claim that the diagnosis "has been tried out on
equal numbers of A- and B-patients and is known to be correct on 80% of those
occasions" (p. 329). The statement is sufficiently vague to permit the interpre-
tation that 80% of the total number of diagnoses were correct without necessar-
ily assuming that 80% of the A-patients were diagnosed correctly and that 80%
of the B-patients were diagnosed correctly. The statement does not rule out the
possibility, for example, of an overall rate of 80% correct diagnoses, with 90%
of one type of patient being diagnosed correctly and 70% of the other being so.
As in the case of the taxicab scenario, one might be willing to assume that most
readers would interpret the claim as that of equal accuracy of diagnosis for both
types of patient, but it is important to recognize that that is an assumption and it
is essential to the further analysis of the problem. I believe that a nontrivial
amount of the difficulty people have in dealing with questions of probability
stems from the imprecise use of language, and it therefore is better to err in the
direction of more-than-adequate precision than in the opposite one. In any
case, for the sake of further discussion, let us take the intent of the claim to be
that the probability of a misdiagnosis given the presence of disease A was the
same as that of a misdiagnosis given the presence of disease B and that it was .2
in both cases. Given this interpretation, we may represent the situation de-
scribed by Cohen as in Table 4.11.
The conventional Bayesian answer to the question of the probability that you are suffering from Disease A, given that the tests indicate that you are suffering from Disease B, which I will represent as p(A | b), is

    p(A | b) = p(b | A) p(A) / [p(b | A) p(A) + p(b | B) p(B)]
             = (.2)(.95) / [(.2)(.95) + (.8)(.05)] = .19/.23 ≈ .83.

TABLE 4.11
Joint Probability of the Disease State Indicated by the Associated Column
and the Diagnosis Indicated by the Associated Row

                    Disease State
                    A       B      Row Σ
Diagnosis "a"      .76     .01      .77
          "b"      .19     .04      .23
Column Σ           .95     .05     1.00

As noted earlier, L. J. Cohen (1981) rejects this answer, arguing that the standard statistical method of taking the prior frequency into account would be correct "if what one wanted was a probability for any patient considered not as a concrete particular person, not even as a randomly selected particular person, but simply as an instance of a long run of patients" (p. 329). But the individual patient, concerned only about her own situation, Cohen argues, "needs to evaluate a propensity-type probability, not a frequency-type one, and the standard statistical method would then be inappropriate" (p. 329). Cohen contends that this view is not a repudiation of Bayesian analysis, but only of the assumption
that the prior probabilities are appropriately taken to be equivalent, in this case,
to the relative frequency of occurrence of the two types of disease in the popu-
lation. Cohen's analyses of the taxicab and medical-test problems evoked
much commentary, positive and negative.
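The medical-test computation lends itself to the same sort of check; this sketch (mine) uses the values of Table 4.11:

    # Disease problem: base rates .95/.05, test correct with probability
    # .8 for either disease; the test says "b" (Disease B).
    p_A, p_B = 0.95, 0.05
    accuracy = 0.8

    num = (1 - accuracy) * p_A   # p("b" | A) * p(A) = .19
    den = num + accuracy * p_B   # ... + p("b" | B) * p(B) = .23
    print(num / den)             # -> about .83, despite the test's verdict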
All good Bayesian statisticians reserve a little of probability for the possibility
that their model is wrong. (DeGroot, 1982, p. 337)
The Bayesian approach to reasoning about probabilistic situations has much to
recommend it, and it is not surprising that it has received so much attention
from both philosophers and psychologists, or that it has been applied to practi-
cal advantage in many contexts. In part just because the approach is so attrac-
tive and useful in many ways, it is important to be aware of its limitations, of
which there are several.
In a discussion of these limitations a distinction should be made between
limitations inherent to the Bayesian approach and other limitations that may
result from the ways in which people apply it. With respect to the second type
of limitation, Fischhoff and Beyth-Marom (1983) identify a variety of poten-
tial sources of bias. According to their analyses, biases can affect just about any as-
pect of the Bayesian approach, of which they identify seven: hypothesis
formation, assessing component probabilities, assessing prior odds, assessing
likelihood ratios, aggregating data, information search, and action selection.
This list suggests a fairly broad connotation of Bayesian reasoning includ-
ing, as it does, such activities as information search and action selection,
which, strictly speaking, are not distinguishing aspects of the Bayesian ap-
proach. Indeed, the only aspect of decision making that uses Bayes's theorem
directly is hypothesis evaluation—specifically, the updating of posterior prob-
abilities. Many of the biases that affect performance in a Bayesian deci-
sion-making context are not unique to this context, but may be found also when
decision making is done without reference to Bayes's rule (Krischer, 1980;
Politser, 1981). Here I will deal only with the question of limitations that are in-
herent to a Bayesian approach.
Conditions of Applicability
The main risk associated with Bayesian reasoning, as I see it, is the risk of ex-
pecting more of it than it can deliver. Bayes's rule was advanced to answer a
quite specific question: how to revise probabilities associated with existing hy-
potheses in the light of new data relevant to the truth or falsity of those hypothe-
ses. Given the conditions under which the rule is supposed to work, it appears
to do a very good job. Those conditions are restrictive, however, and often are
not realized in real-life situations.
The requirement of a well-structured problem in which one has an exhaus-
tive and mutually exclusive set of hypotheses about the possible states of the
world is perhaps the most obvious restriction. The assumption that one and
only one of those hypotheses is true does not allow for the possibility that none
of them is true, or that several of them are true, or partially so.
Inasmuch as the rule specifies only how to revise existing probability esti-
mates, one must begin by specifying prior probabilities if they have not already
INVERSE PROBABILITY • 139
been specified. Prior probabilities for one problem may, of course, be posterior
probabilities of a preceding problem, and this process may be iterated many
times. But, as Rozeboom (1997) points out:
On pain of infinite regress, this process must have a non-Bayesian origin. And
there is no coherent credibility distribution assigning certainty to evidence e that
cannot be reached by conditionalization on e from some other coherent credibil-
ity distribution. So if a theory of rational belief can advocate no normative stan-
dards beyond Bayesian ideals, neither has it any grounds for preferring one
coherent allocation of posterior belief strengths compatible with our accumu-
lated data to another. (p. 345)
Rozeboom is quick to point out that this does not defeat the Bayesian per-
spective, but it does show the insufficiency of that perspective, in the absence
of additional principles, to justify belief acquisition.
If one has no rational basis on which to estimate prior probabilities, one
must do so arbitrarily. In the absence of any reason for believing otherwise, one
might assign equal probability to all the hypotheses under consideration, ac-
cording to the principle of insufficient reason, as first expressed by Jacob
Bernoulli, but, as Luce and Raiffa (1957) have pointed out, indiscriminate use
of this principle has led to many nonsensical results. Jaynes (1968) proposed
use of the principle of maximum entropy, according to which the assignment is
done in such a way as to maximize entropy without contradicting the available
prior information. One must decide, too, how many hypotheses one should consider and what they should be; Bayes's rule provides no guidance for this task.
Extra-Bayesian Requirements
A major advantage of the use of Bayes's rule to evaluate hypotheses is the re-
quirement it imposes that all hypotheses of interest be explicitly identified
and that the application of data to the evaluation of any one of them must take
into account the implications of those data to competing hypotheses.
Bayesian analysis constitutes a method of applying data to the task of judging
the relative tenability of each of a set of hypotheses. It requires that more than
one hypothesis be under consideration and has nothing to offer to the individ-
ual who has only a single hypothesis in mind. This might be seen as a problem
inasmuch as it forces one who would use the approach to come up with more
than one hypothesis to account for a situation of interest, but this is really an
advantage just because it does force one to consider possible alternatives to
what may be a favored point of view.
At the very least one must consider not only a hypothesis of interest, H, but
the complementary hypothesis, ~H, as well. This is an important discipline.
Much evidence indicates that a common failing in reasoning is to evaluate the
credibility of a hypothesis by considering the probability of observing some
specific data if that hypothesis were true, p(D I H), without considering the
probability of observing the same data if that hypothesis were false, p(D I ~H)
(Nickerson, 1998). The Bayesian prescription precludes this by requiring that
an exhaustive set of hypotheses be considered and that evaluation involve the
application of the same data to the entire set at the same time.
Many critics of conventional statistical significance testing argue that the use
of Bayes's rule is a preferred approach to the evaluation of hypotheses (Ed-
wards, Lindman, & L. J. Savage, 1963; Gelman, Carlin, Stern, & Rubin,
1995; Greenwald, 1975; Lindley, 1984; Rindskopf, 1997; Rouanet, 1996;
Rozeboom, 1960; Rubin, 1978). One argument in favor of this view is that the
use of Bayes's rule permits evidence to be applied simultaneously to a set of
hypotheses of interest, and not just to a null hypothesis and its complement.
And it yields an update of the probabilities of all the hypotheses, not just a bi-
nary decision with respect to a null hypothesis. Also, unlike conventional sta-
tistical significance testing, Bayes's rule represents a means of cumulating
the effects of evidence on the hypotheses of interest from a series of studies
conducted over time. In theory at least, it allows for the posterior probabili-
ties of one set of studies to be the priors for a subsequent one, and the process
can be iterated indefinitely. To date, this advantage seems not to have had a
large effect on experimental design.
SUMMARY

CHAPTER 5
Some Instructive Problems

A SMALL WAGER
Imagine the following gamble. A coin is to be tossed repeatedly until one of the
two following sequences occurs: (A) head, head, or (B) tail, head. Depending
on which of these sequences occurs first either you or your opponent wins the
game. You are given your choice of which sequence is to be yours. Should you
have a preference for one over the other? In fact you should. The odds are 3 to 1
in favor of sequence B and this is so despite the fact that in a long series of
tosses each of these two sequences is equally likely.
Why should one prefer sequence B? Consider the four possible outcomes of
the first two tosses: HH, HT, TH, and TT. If the first of these occurs, the game is
over and the player with sequence A wins. The game is also over if the third
event occurs and in this case the player with sequence B wins. Assuming a fair
coin, these events are equally likely and each occurs with a probability 1/4. If
either Event 2 or 4 occurs, the game proceeds, since neither of these is a win-
ning combination. But in both cases, it is now impossible for the combination
HH to occur before TH, because in each case the occurrence of the first head
will necessarily follow a tail and consequently terminate the game with a win
for the player who has sequence B.
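A reader who wants empirical reassurance can simulate the wager; the following Monte Carlo sketch (mine, with the trial count an arbitrary choice) tosses until HH or TH appears:

    import random

    # Toss a fair coin until the last two tosses are HH or TH; report how
    # often TH arrives first.
    def winner():
        prev = random.choice("HT")
        while True:
            cur = random.choice("HT")
            if prev == "H" and cur == "H":
                return "HH"
            if prev == "T" and cur == "H":
                return "TH"
            prev = cur  # the pair ended in T; keep tossing

    trials = 100_000
    th_wins = sum(winner() == "TH" for _ in range(trials))
    print(th_wins / trials)  # -> close to 0.75, i.e., 3-to-1 odds for TH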
This gamble illustrates how easy it is to overlook critical aspects of a proba-
bilistic situation and to jump to erroneous conclusions by making inferences
from simple situations to slightly more complicated ones without noticing the
complication. What is it that makes the illustrative gamble so easy to mis-
perceive? When one tosses a fair coin twice, the probability of getting a tail fol-
lowed by a head is precisely the same as getting two heads in a row. Now
imagine the following situation. A coin is to be tossed repeatedly and a record
kept of the sequence of heads and tails that is produced—TTHTHHHTHTT...
Suppose that this time we decide to enter this sequence at a random point, by
picking a number from a hat, say, and check the next two outcomes following
that point. If either of our two combinations is found, the game is over. If nei-
ther is found, we enter the sequence at another point, determined by the draw-
ing of another number from the hat, and again check the next two outcomes.
And so on. In this case, one should have no preference between HH and TH, be-
cause they are equally likely to be found, no matter how long the game goes on.
The situation described first is analogous to entering the sequence at a ran-
domly determined point, checking the first and second outcomes following
that point and, if neither of the game-terminating pairs is found, moving on to
check the second and third outcomes, then the third and fourth, and so on. The
second pair of outcomes in this case is not independent of the first, but has one
member in common with it. And given the rules of the game, that member is
necessarily a T, so inspection of the second and third outcomes can only yield
TH, which will terminate the game with a win for B, or TT, which will mean a repetition of the cycle.
This type of difference, and the ease with which it is overlooked, is the basis
of a wagering scam that can take a variety of forms. In one of them you are to
make a wager with an opponent as to which of two three-outcome sequences
will occur first in a series of coin tosses, and your opponent, being polite, gives
you the opportunity of choosing first. As in the two-toss example just consid-
ered, some sequences have a better chance than others of occurring first in an
extended set of tosses, and if your opponent is aware of the differences and you
are not, you are likely to lose.
Suppose Jack and Jill are playing this game and Jack, on the assumption that
all possible sequences are equally likely, bets on HHH. Jill then, being smarter
than Jack, picks THH. These sequences are equally likely to occur on the first
three tosses, and each has probability 1/8 of doing so. However, if the game
goes beyond three tosses, Jill is sure to win. The only way that Jack can win is for the first three tosses to contain no T; once a T occurs, it becomes impossible for Jack ever to win, because THH must then occur before HHH can. So, THH will win over HHH
seven times in eight; or equivalently, the odds are seven to one in favor of THH.
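The same kind of simulation (again my sketch, not the author's) confirms the seven-to-one advantage of THH over HHH:

    import random

    # First-occurrence race between two triplets in an ongoing toss series.
    def first_of(a, b):
        last3 = ""
        while True:
            last3 = (last3 + random.choice("HT"))[-3:]  # keep the last three tosses
            if last3 == a:
                return a
            if last3 == b:
                return b

    trials = 100_000
    wins = sum(first_of("THH", "HHH") == "THH" for _ in range(trials))
    print(wins / trials)  # -> close to 7/8 = 0.875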
It is easy to see the advantage of THH over HHH. In the cases of some of the
other sequence pairs the relative advantage of one over the other is not so obvi-
ous, but invariably one of the pairs does have an advantage. Here I will trace out
a hypothetical example for just a few steps to make the idea plausible. Suppose,
for example, that Jack had chosen HTH and Jill HHT. Which of them, if either,
has the advantage? One way to represent the situation is with a tree, each level
of which represents a toss of the coin. Figure 5.1 shows all 32 possible se-
quences of five tosses.
Figure 5.2 is drawn so as to highlight the nine ways in which the game can be terminated within five tosses: two after three tosses, three after four, and four after five.

FIG. 5.2. All possible ways in which the game can be terminated by the occurrence of HHT or HTH within five tosses.

Inasmuch as the probability of arriving at a particular point in this tree as a consequence of n tosses is 1/2^n, the probability that this game would terminate within five tosses is (2 × 1/2^3) + (3 × 1/2^4) + (4 × 1/2^5) = 9/16. The probability that HTH would win within the first five tosses is 1/8 + 1/16 + 1/32 = 7/32, and the probability that HHT would win in the same number of tosses is 1/8 + 2/16 + 3/32 = 11/32; in other words, HHT would have had the advantage,
the odds in its favor being 11 to 7, or 1.57 to 1.00. It is obvious that the probabil-
ity that there will be a winner increases with the number of tosses that are al-
lowed; the reader may wish to verify that the extent to which the odds favor
HHT get better the longer the game is allowed to go on. The odds favoring
HHT are 1.69 to 1.00 if the game is allowed to go to 6 tosses and 1.92 to 1.00 if
it is allowed to go to 10.
This example perhaps suffices to make it intuitively clear that two triplets can
have the same probability of occurring in three tosses and yet one of them be
much more likely than the other to occur first in an extended series of tosses.
What is probably less intuitively clear, but no less true, is the fact, first noted by
Penney (1969), that given a specified triplet, there is always another triplet that
has an advantage over it in an extended series of tosses. In other words, if Jill un-
derstands the situation completely, she can always pick a triplet that is a better
bet than the one Jack picked, no matter what triplet Jack picked. Hombas (1997)
has presented a detailed analysis of the situation. Table 5.1 gives the results of his
analysis and shows for each possible combination of triplets chosen by two play-
ers, A and B, the probability that the triplet chosen by the second player to
choose, B, will occur before the one chosen by the first player to choose.
Player B's optimal strategy is to pick the triplet corresponding to the row
that contains the largest probability in the column corresponding to the triplet
picked by A. Suppose, for example, that A chooses TTH. Then B should
choose HTT, in which case the latter's chances of winning would be 3/4, or odds of 3-to-1. If A chose either HHH or TTT, B could get his probability of winning to 7/8, or odds of 7-to-1, by choosing THH in the former case and HTT in the latter. A can limit B's potential advantage to odds of 2-to-1 by choosing HTH, HTT, THH, or THT, but no matter what A chooses, B can ensure favor-
TABLE 5.1
Probabilities That the Triplet Chosen Second by B Will Occur
Before the Triplet Chosen First by A
able odds of at least 2-to-1. The solution represented in Table 5.1 illustrates a
type of intransitivity that can arise among probabilistic relationships. Note that
HHT beats HTT, which beats TTH, which beats THH, which beats HHT,
which is where we started. Other intransitivities involving probabilistic rela-
tionships will be noted in chapter 6.
There once was a king in a polygamous land who worried about the fact that
men and women were born in about equal numbers. This bothered him because
the ideal situation, in his view, was for there to be many more women in the
realm than men so that every man could have several wives. As it was, the men
who were successful in acquiring several wives were resented by those who
were not and this caused a lot of friction and strife. The king's predecessors had
solved this problem by periodically sending the men off to war, thus ensuring
that females would outnumber males in the realm by a large margin despite the
near-equal birth rates of the two genders.
This king had no taste for war, however, and believed there must be a more
humane way to accomplish his goal. After thinking about the problem for a
long time and consulting his advisers, he decided to issue a decree that any
family could have as many children as it wished until it produced its first boy,
whereupon it was to cease and desist from further reproduction; having a child
after already having had a boy would result in banishment from the land. Fur-
thermore, to motivate families to keep having children as long as they were
producing girls, he provided a most generous monetary award upon the birth of
every female child. The king reasoned that his decree would guarantee that no
family in his realm would have more than one boy, and that his incentive policy
would ensure that many families would have more than one girl, and some
would have several.
The decree was issued and strictly enforced, and the awards were made
without fail. The king observed the results of his action for a few years with
great satisfaction. Considering only the children born after the issuance of the
decree, one could not find a family in the realm that had more than a single boy,
but one could find many families with two, three, four, or more girls. The king
was ecstatic to learn that in his realm of a few tens of thousands of people there
were a few families that had had as many as 10 girls before having to call it
quits because of finally having a boy!
So pleased was the king with how well his policy was working that he de-
cided to quantify the results so they could be reported to the people. He com-
missioned a census aimed at counting the number of boys and girls in the realm
that had been born since the issuance of the decree. To the king's dismay, the
count, and the recount, showed the number of male children born after the de-
cree to be about the same as the number of female children born during the
same time. Strict compliance with the decree had had no effect on the ratio of
male-to-female births in the realm, which had remained one-to-one.
Where had the king gone wrong in his thinking? It was true that an effect of
the decree was to guarantee that no family in the realm had more than one boy.
And it was the case that there were many families with more than one girl.
Where the king had gone wrong was in assuming that these effects were tanta-
mount to a larger number of female births than of male births.
I have not tried to determine in any very formal way whether most people
would share the king's surprise at the ineffectiveness of his attempt to increase
the ratio of women to men in his kingdom. The reactions of a few people to
whom I have told the story make me suspect that many people would do so. If
this conjecture is correct, a question that arises is whether there is an insight
that can make the situation clear.
Assume, for the sake of simplicity, that every family in the realm has as
many children as possible without violating the decree, and that the probability
that any random child will be a boy is exactly 1/2. One insight that may help
make the outcome plausible is that whereas every family will have one boy, one
half of all families—those who produce a boy as the first child—will have no
girls. Specifically, the distribution of girls over families in the realm will be as
follows: One half of all families will have no girls, one half will have one or
more girls, one fourth will have two or more girls, one eighth will have three or
more, and, in general, (1/2^n)th will have n or more. As it happens, the number of
boys produced by having one boy in every family turns out to be about the same
as the number of girls produced by this distribution.
Mathematically, the result follows from the fact that the expected number of girls per family is

    Σ_{n=1}^{∞} n (1/2)^{n+1} = 1;

that is, on average each family ends up with exactly one girl, just as, by the terms of the decree, it ends up with exactly one boy.
The process may be clarified with a concrete illustration. Imagine 1,000
families, each producing a child, yielding a total of 500 girls and 500 boys. The
families that produced boys can have no more children, but each of those that
produced a girl the first time goes on to have a second child; 250 of these produce
girls and 250 boys. We now have a total of 750 girls and 750 boys. The 250 fam-
ilies that had two girls go on to produce a third child, and so on. The king's de-
cree clearly had an effect on family structure in the realm. It ensured that no
family would have more than one boy, and that only parents who produced
strings of girls had sizable families. It had no effect, however, on the ratio of
girls and boys in the realm.
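A simulation makes the same point; this sketch (mine, with 10,000 families as an arbitrary choice) applies the decree's stopping rule directly:

    import random

    # Each family has children until its first boy, then stops.
    boys = girls = 0
    for _ in range(10_000):
        while random.random() < 0.5:  # a girl; the family continues
            girls += 1
        boys += 1                     # the first boy; the family stops
    print(boys, girls)  # -> two counts of roughly equal size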
The reader who has difficulty with this story may wish to do an analogous ex-
periment. Consider a sequence of coin tosses to be terminated with the first occur-
rence of a head. Thus H is one such sequence, TH is another, TTH another, and so
on. Now toss a coin many times, keeping track of the sequences that occur. After a
few thousand tosses, you will find that about half of your sequences are one toss
long (H), about one quarter are two tosses long (TH), about one eighth three tosses
long (TTH), and, in general, about 1/2^n are n tosses long. Although many se-
quences have several Ts, none has more than one H. The total number of Hs and Ts
in all the sequences combined, however, will be about equal. This should be intu-
itively obvious from the fact that in any large number of tosses of a (fair) coin, the
number of heads should approximately equal the number of tails, and in conduct-
ing this experiment all one does is toss a coin a large number of times. Perhaps it is
clear from this example that there is no "contingent-termination" method that will be generally effective in altering the percentage of outcomes of a given type (say, tosses of heads) produced by a random process.
A SUBTLE DISTINCTION WITH UNSUBTLE IMPLICATIONS
In a certain town, the average family has three children; it follows that the aver-
age child in this town has two siblings. If you did not object to this inference,
you are probably in good company. It sounds right, but it is not. The number of
siblings that the average child in this town has is likely to be much greater than
two; it would be two only in the unlikely event that every family has three children (the average number), in which case every child has two siblings. If
there is any variability in family size, the average child has more siblings than
does a child from an average-size family.
J. J. Jenkins and Tuten (1992) discuss this situation and note that the fact that
must be grasped to see through the puzzle is that the average child does not
come from the average family. Imagine that there are 10 families in this town, 5
of which have one child each and 5 of which have five each. In the aggregate the
10 families have 30 children, which gives us an average of 3 per family (al-
though in this example, there is no actual family that has the same number of
children as the "average family"). If each family had the average number of
children, three, each child would have two siblings. But each of the 5 children
in the 1-child families has no siblings, whereas each of the 25 children in the
5-children families has four siblings, so the number of siblings of the average
child is [(5 × 0) + (25 × 4)]/30 = 3.33.
One might object that language has been used a little loosely in this discus-
sion. What, after all, is an "average family" or an "average child"? Does it
make sense to speak of an "average family" in a situation, like our imaginary
one, in which no family in the population of interest is "average?" And one
could argue that "average child" should be taken to mean child from an average
family, although that is not the way this term was used in our example.
To be more precise, we could reword the original assertion as follows. In a
certain town, the average number of children per family is three; it follows
that the average number of siblings per child in this town is two. This infer-
ence is, of course, incorrect, and having just thought through the problem as
originally worded, we are likely to see that straightaway. Whether people will
generally be more apt to see the problem with the inference when stated in the
second, more precise, way than when stated in terms of average families and
average children will be left to the reader to check. My suspicion is that many
people will see no problem with the inference in either form, until it is
pointed out to them, but the second wording does have the advantage over the
first that its use of the terms "per family" and "per child" makes explicit the
fact that the number used as the denominator to calculate the "average"
differs in the two cases.
Failure to make this type of distinction can lead to the drawing of unwarranted
conclusions or present the opportunity for deception. Consider, for example, the
question of average class size in a school district, a question of considerable
practical interest to parents and school administrators in many towns. One is in-
clined to assume that the smaller the average class size, the better. The reason, or
at least a reason, for this assumption is the belief that a child is likely to get more
personal attention in a small class than in a large one. But it is important to recog-
nize that average number of pupils per class is not the same as average number
of classmates per pupil. Consider the two following situations. School District A
has 20 classrooms, each housing a class of 20 students, whereas District B has 5
classrooms with 5 students, 5 with 10, 5 with 25 and 5 with 40. Both districts
have an average of 20 pupils per class; however, the average number of class-
mates per pupil is 19 in District A and approximately 28.4 in District B.
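The same style of computation as in the siblings example applies; a sketch for the two districts, under the assumption that a pupil's classmates are simply the other pupils in his or her class:

```python
def class_stats(sizes):
    pupils = sum(sizes)
    avg_class = pupils / len(sizes)
    # a pupil in a class of size s has s - 1 classmates
    avg_classmates = sum(s * (s - 1) for s in sizes) / pupils
    return avg_class, avg_classmates

district_a = [20] * 20
district_b = [5] * 5 + [10] * 5 + [25] * 5 + [40] * 5
print(class_stats(district_a))   # (20.0, 19.0)
print(class_stats(district_b))   # (20.0, 28.375)
```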
J. J. Jenkins and Tuten (1992) point out that the problem represented by the
average family-average child puzzle has important implications for the inter-
pretation of descriptive statistics. By way of illustration, they cite the following
example noted by D. S. Smith (1979): "The mean population of the 433 Mid-
western counties in 1900 was 36,853. The average person in the Midwest in
1900, however, had another 284,821 persons living in his county; a sample of
Midwestern counties would produce a poor sample of Midwesterners" (p. 85).
Jenkins and Tuten note too that the distinction has implications for sampling: If
one wants to select a representative sample of families, one generally will not
be assured of getting one by randomly sampling children. Moreover, they ar-
gue, the problem is a very general one: "The average churchgoer does not at-
tend the average-size church; the average person does not live in the average-
size household; the average club member does not belong to the average-size
club; the average rat pup is not from the average-size litter; ... In the most gen-
eral sense, we must be aware that the average individual classified by some sys-
tem will not be in the average-size class of that system" (p. 524).
SIBLING GENDER
Some of the problems just described might be mildly surprising to people with
a good understanding of probability theory when they first think about them,
but they are unlikely to provoke any lasting debate, because the reasons why a
superficial analysis might produce the wrong answer are obvious on reflection.
Situations can be described, however, for which people who are familiar with
probability theory will disagree as to what the probabilities of specified possi-
bilities are. The following example is one of several discussed by Bar-Hillel
and Falk (1982): "Mr. Smith is the father of two. We meet him walking along
the street with a young boy whom he proudly introduces as his son. What is the
probability that Mr. Smith's other child is also a boy?" (p. 109). Bar-Hillel and
Falk report that two professors of mathematics, when given this problem, dis-
agreed as to whether the correct answer is 1/2 or 1/3.
The following problem appeared several times in the "Ask Marilyn" column
of Parade Magazine (Vos Savant, 1990a, 1990b, 1991):
Suppose you're on a game show, and you're given a choice of three doors. Be-
hind one door is a car; behind the others, goats. You pick a door—say, No.
1—and the host, who knows what's behind the doors, opens another door—say,
No. 3—which has a goat. He then says to you, "Do you want to pick door No. 2?"
Is it to your advantage to switch your choice? (Vos Savant, 1991, p. 12)
Publication of this problem, and of the answer proposed by Vos Savant (that it
is to your advantage to switch), stimulated a great deal of mail, much of it from
people with university- or graduate-level exposure to probability theory. Some
responders argued strongly for one position, some for the other. Responses often
indicated a great deal of confidence in the positions defended, and sometimes
showed disdain for the incompetence of anyone who could hold a different view.
Since its first appearance in Vos Savant's column, this problem—sometimes re-
ferred to as "the player's dilemma" and sometimes as "Monty's dilemma," after
Monty Hall, the longtime host of "Let's Make a Deal"—has been discussed by
several writers in the literature on probabilistic thinking (Falk, 1992; Gillman,
1992; J. P. Morgan, Chaganty, Dahiya, & Doviak, 1991a, 1991b; Selvin,
1975a, 1975b; Seymann, 1991; Shaughnessy & Dick, 1991).
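Anyone who doubts the advantage of switching can simulate the game. The following sketch assumes the standard reading of the problem—the host always opens a goat door other than the one chosen, picking at random when he has a choice; as the discussion below makes clear, other readings are possible and change the answer:

```python
import random

random.seed(2)

def play(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print("stay:  ", sum(play(False) for _ in range(trials)) / trials)  # ~1/3
print("switch:", sum(play(True) for _ in range(trials)) / trials)   # ~2/3
```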
Problems like those described by Bar-Hillel and Falk and by Vos Savant il-
lustrate in a particularly compelling way that probabilistic reasoning can be
tricky, even for people who are well versed in probability theory. What makes
these problems, and many others that could be considered, difficult may have a
complex answer. However, I want to argue that a major contributing factor is
the fact that statements of problems often are incomplete or ambiguous in the
sense that they admit of more than one interpretation, depending on assump-
tions that the reader may make about the situation described, possibly without
realizing he or she is making them, in the process of deriving an answer. I have
tried to make this possibility plausible by noting the assumptions that are im-
plicit in the different answers to the two problems just mentioned and to others
of a similar nature (Nickerson, 1996a). If this hypothesis is correct, one way to
reduce disagreements regarding solutions to probability problems among peo-
ple who are knowledgeable about probability theory is to insist that the assump-
tions on which probabilities are calculated be made explicit. Two
probabilistically knowledgeable persons working on the same assumptions
should not produce different answers to the same problem.
Bar-Hillel and Falk's (1982) analysis of the sibling gender problem points
out the relevance of how the information that Smith has at least one son was ob-
tained and notes the need to make an assumption about the probability of meet-
ing Smith on the street with a boy if he happened to have one boy and one girl. If
we assume that if Smith has a boy and a girl, we are equally likely to find
him in the company of either—that the child we see with Smith was randomly
selected from the two he has—the correct answer is 1/2. In contrast, if we be-
lieved him to be sufficiently partial to boys that he would walk only with a son
if he has a boy and a girl, then the correct answer is 1/3. On the other hand if we
believe him to be partial to girls and willing to walk with a son only if he has no
daughter, the correct answer is 1 (Nickerson, 1996, Table 1).
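All three answers are easy to check by simulation. In the following sketch (the rule functions are my encodings of the assumptions just described), two-child families are generated at random, and we condition on Smith's being seen with a boy:

```python
import random

random.seed(3)

def other_is_boy(walk_rule, trials=200_000):
    hits = seen_with_boy = 0
    for _ in range(trials):
        kids = [random.choice("BG"), random.choice("BG")]
        if walk_rule(kids) == "B":
            seen_with_boy += 1
            rest = kids.copy()
            rest.remove("B")          # take out the boy we saw
            hits += rest[0] == "B"
    return hits / seen_with_boy

random_child  = lambda kids: random.choice(kids)            # -> ~1/2
prefers_boys  = lambda kids: "B" if "B" in kids else "G"    # -> ~1/3
prefers_girls = lambda kids: "G" if "G" in kids else "B"    # -> 1.0
for rule in (random_child, prefers_boys, prefers_girls):
    print(other_is_boy(rule))
```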
Bar-Hillel and Falk (1982) argue that in the absence of evidence to the con-
trary, it seems natural to assume that a father of a son and daughter would be as
likely to be seen with one as with the other. This seems right to me, but the point
is that it is an assumption and should be recognized as such. The point is illus-
trated by consideration of another possible answer to the question, but one that
is justified only if a different assumption is made about the probability that a fa-
ther of a son and daughter would elect to walk with the son.
Bar-Hillel and Falk (1982) give the following argument as one that might be
used in support of the answer 1/3. Given that Smith is the father of two chil-
dren, he must be the father of two boys, of two girls, or of one boy and one girl,
with probabilities 1/4, 1/4, and 1/2, respectively. Discovering that at least one of
the children is a boy rules out the possibility of two girls and identifies the fam-
ily as a member of the subset of two-child families that have at least one boy,
and we know that about 1/3 of such families have two boys. So, one might con-
clude, the probability that the other child is a boy is 1/3.
Bar-Hillel and Falk (1982) point out that the conclusion is justified only if
another unstated assumption is made, namely that the family not only is a
member of the subset of two-child families that have at least one boy but that it
is a randomly selected member of that subset, which is tantamount to assuming
that all members of this subset are equally likely to be represented on the street
by a father and son. But this assumption would be reasonable only in a land
where fathers who had a son and a daughter would walk only with the son. In a
land where fathers with a son and a daughter are as likely to walk with one as
with the other, any family with two sons is twice as likely to be represented on
the street by a father and son as is any family with only one son.
THREE CARDS
Another problem that illustrates the ease with which people can be confused
by conditional probabilities involves three cards, one of which (RR) is red on
both sides, one of which (WW) is white on both sides, and one of which (RW)
is red on one side and white on the other. Imagine the following scenario. Per-
son A asks person B to shuffle the cards out of sight, pick one at random and
report the color of (only) one side. A cannot see the card and neither can you.
B shuffles the cards, picks one and reports "red." A then notes that the card
obviously cannot be the white-white one, but that it could be either of the
other two. A then offers to bet $10 against 9 that the other side of this card is
red. Should you take the bet?
Assuming that B selected at random which side of the card to report, the
probability that you will lose if you play is 2/3. A has offered you a bet in which
his expected gain is (2/3) x $9 - (1/3) x $10 = $2.67 (which, of course, is your
expected loss). It is true that knowing that one side of the card is red eliminates
the WW card from consideration, but it does not follow that the card in
hand is equally likely to be either of the remaining two cards. The important
thing to see is that the RR card is twice as likely to show a red face, if picked, as
is the RW card. About two thirds of the college students to whom Bar-Hillel
and Falk (1982) gave a version of this problem judged the probability that both
sides of the card were red, given one side was shown to be red, to be 1/2.
Here is a way to think of the problem that may help make it less confusing.
Imagine that the six sides of the three cards are distinguishable by some way
other than their colors. To be specific, let us imagine that the sides are num-
bered 1 through 6, the sides of the red-red card being numbered 1 and 2, those
of the white-white card 3 and 4 and those of the red-white card 5 and 6. If a card
is drawn at random and placed on the table so a random side is up, then each
number is equally likely to be showing following a draw. Let the notation R1R2
indicate that the red-red card has been drawn and the side with the number 1 is
up. The situation is represented in Table 5.3. It should be clear from the table
that in two of the three equiprobable cases in which a red card is seen, the un-
seen side of the card is also red. So, given that the "up" side of a randomly se-
lected card is red, the probability that the "down" side is also red is 2/3.
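A simulation confirms the table's verdict. The sketch below draws a card and a visible side at random many times (both choices random, as the stated assumption requires) and computes the relative frequency with which the hidden side is red when the visible side is red:

```python
import random

random.seed(4)
cards = [("R", "R"), ("W", "W"), ("R", "W")]

both_red = red_shown = 0
for _ in range(100_000):
    card = random.choice(cards)
    up, down = random.sample(card, 2)   # random side up
    if up == "R":
        red_shown += 1
        both_red += down == "R"
print(both_red / red_shown)   # ~2/3, so A's bet is a bad one for you
```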
The following are two variations on the three-card problem:
A: While blindfolded, you draw all three cards from a hat and place them in
a row on the table. When you are allowed to look at the cards you see that, left to
right, they are showing red, white, and white. What is the probability that the
hidden side of the leftmost card is red?
B: While blindfolded, you draw all three cards from a hat and place them in a
row on the table. When you are allowed to look at the cards you see that they are
showing two reds and a white. What is the probability that the hidden side of
the leftmost red card is red?
One way to solve these problems is to consider all possible outcomes of the
drawing of the three cards as represented in Table 5.4. The notation in this case
indicates the triplets of sides showing. Inasmuch as there are three cards, each
TABLE 5.3
A Card (From Among Three Cards, Red1-Red2, White3-White4, and Red5-White6)
Is Chosen at Random and the Color of a Random Side of the Selected Card Is Seen

             Color Seen
Card         R        W        Σ
R1R2        1/6       0       1/6
R2R1        1/6       0       1/6
W3W4         0       1/6      1/6
W4W3         0       1/6      1/6
R5W6        1/6       0       1/6
W6R5         0       1/6      1/6
Σ           1/2      1/2       1
of which has two sides, there are 2^3 = 8 ways in which the sides can be combined.
Each one of these combinations can occur in six different orders, so taking or-
der into account there are 48 equiprobable arrangements of the three cards. All
are listed in the table.
To see the answer to Question A, we note that 8 of the 48 possible arrange-
ments of cards show RWW, in that order. In all eight cases, the red face show-
ing is either R1 or R2 (never R5), and both R1 and R2 have red on the opposite
side, so the answer is that the probability that the hidden side is red is 1. The an-
swer is obvious too if we note that if two white faces are showing, one must be
W3W4 and the other R5W6, from which it follows that the remaining card—the
one showing red—must be R1R2. This applies independently of the order in
which the cards are arranged.
Now consider Question B. Inspection of Table 5.4 will reveal that of the 48
equally probable arrangements, 24 have two reds showing and for 12 of these
the leftmost card showing red is red on both sides (R1 on one side and R2 on the
other), so the answer to the question is 12/24 or 1/2. Again, the answer should
be intuitively clear simply from consideration of the fact that if two reds are
showing, one of them must be R1R2 and the other R5W6, and each of these is as
likely as the other to be the leftmost of the cards showing red. It should be clear
also that the answer would be the same if the question had been: given that
when you look at the cards, you see two reds and a white, what is the probabil-
ity that the hidden side of the rightmost red card is red?
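Both variations can be checked the same way. In the following sketch (under the same random-placement assumptions), the three cards are laid out in random order with random sides up, and we condition on the visible pattern:

```python
import random

random.seed(5)
cards = [("R1", "R2"), ("W3", "W4"), ("R5", "W6")]

a_hits = a_cases = b_hits = b_cases = 0
for _ in range(300_000):
    order = random.sample(cards, 3)                          # random row order
    layout = [tuple(random.sample(c, 2)) for c in order]     # (up, down) pairs
    colors = [up[0] for up, down in layout]                  # visible colors
    if colors == ["R", "W", "W"]:                            # Question A
        a_cases += 1
        a_hits += layout[0][1][0] == "R"
    if sorted(colors) == ["R", "R", "W"]:                    # Question B
        b_cases += 1
        left_red = next(i for i, c in enumerate(colors) if c == "R")
        b_hits += layout[left_red][1][0] == "R"

print("A:", a_hits / a_cases)   # 1.0
print("B:", b_hits / b_cases)   # ~0.5
```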
TABLE 5.4
All Possible Arrangements of the Six Sides of Three Cards, Taking Order Into Account
The following problem was posed by Copi (1968):

Remove all cards except aces and kings from a deck, so that only eight cards re-
main, of which four are aces and four are kings. From this abbreviated deck,
deal two cards to a friend. If he looks at his cards and announces (truthfully)
that his hand contains an ace, what is the probability that both his cards are
aces? If he announces instead that one of his cards is the ace of spades, what is
the probability then that both his cards are aces? (These two probabilities are
not the same!) (p. 433)
Copi (1968) gives 3/11 as the first probability and 3/7 as the second. Rose
(1972) accepts 3/7 as the correct probability for the second case and argues that
accepting this answer for that case commits one to acceptance of it for the first
case as well:
For if at least one of the cards in the hand is an ace, then either the hand contains
As (subscript = suit) or it contains Ah or it contains Ad or it contains Ac. But if the
hand contains As, the probability of two aces is 3/7. If it contains Ah, the probabil-
ity of two aces is 3/7. If it contains Ad, the probability of two aces is 3/7. And if it
contains Ac, the probability of two aces is 3/7. Therefore, by constructive di-
lemma, the probability that both cards are aces is 3/7, even if no suit has been
mentioned. (p. 523)
The error of those who arrive at a probability of 3/11 when no suit has been men-
tioned lies in supposing that the various possible hands are equipossible, and
then noting that of the 22 hands containing at least one ace there are six contain-
ing two aces.... But the equipossibles are not the hands, for this is not one of
those standard questions about probability of a certain hand; what is distinctive
and crucial about the problem before us is the information that there is at least
one ace. And any one of the 28 aces that occur in the above array [Rose had listed
the 22 hands containing at least one ace, 6 of which contain two] could provide
just as much support for that information as could any of the other aces. Thus the
equipossibilities are the 28 aces in the above array of hands, and not the 22 hands
themselves; by the same token, the favorable outcomes here are those aces that
are accompanied by another ace, and not the hands in which there are two aces.
Therefore, since a total of 12 of the 28 aces are accompanied by another ace, the
probability that the hand contains two aces is 12/28 = 3/7, exactly the same as
when a suit was named. (p. 523)
Rose's (1972) analysis goes wrong, in my view, in two ways. First, it takes
Copi's (1968) statement of the problem as adequate to permit an unequivocal
solution, but it is not. We are not told under what conditions the friend to whom
the cards are dealt will announce (truthfully) that his hand contains an ace.
Rose's analysis assumes that the friend will announce that his hand contains at
least one ace whenever it does so. But this is an assumption that need not be
true. The friend might announce on the basis of some other rule—he might, for
example, be more inclined to announce having a king whenever he had at least
one king, in which case he would announce having an ace only when he had
two of them. Or he might use some other rule, or no rule at all, deciding
whether to announce having an ace or having a king—when he has one of
each—on the basis of momentary whim. The point is, in the absence of knowl-
edge of, or an assumption about, his reporting rule, the problem is not solvable.
In this regard, the situation is analogous to the coin-tossing and sibling gender
problems discussed above. A similar ambiguity applies to the case in which the
friend announces having the ace of spades; again, one must know, or make an
assumption about, the conditions under which he would announce this in order
to know what to make of that announcement.
But now let us make the assumption that the friend's rule is to report having an
ace whenever he has at least one ace. We might imagine that after dealing the
cards, you ask your friend to look at both of them and tell you whether or not at
least one of them is an ace. What do we make of Rose's argument that the proba-
bility that the hand contains two aces, given that it contains at least one, is 12/28,
or 3/7? If we analyze this situation, we get the answer 3/11. There are 28 possible
two-card hands that can be dealt from eight cards; 6 will have two aces, 6 two
kings, and 16 will have one ace and one king. The entries in Table 5.5 show the
relative frequencies with which the friend will report holding an ace.
Each cell in the table represents the probability of the joint occurrence of the
hand indicated by the associated row label and the report indicated by the associ-
ated column label. Letting p(AA | a) represent the probability that there are two
aces, given that there is at least one ace (that the friend reports having an ace),
p(AA&a) the probability of the joint occurrence of two aces in the hand and the
report of having an ace, and p(a) the probability of the report of having an ace,

p(AA | a) = p(AA&a)/p(a) = (6/28)/(22/28) = 6/22 = 3/11.

In other words, the probability of there being two aces, given that there is at least
one ace, is the ratio of the number of hands having two aces (6) to the number of
hands having at least one ace (22).
Rose (1972) contends that this ratio is not the appropriate one for this prob-
lem. The argument goes as follows. If we consider only the 22 hands that con-
tain at least 1 ace, we see that there are 28 aces in those hands—12 of them in
the 6 hands that contain 2 aces each, and 16 in the hands that contain 1 ace and 1
king. The probability of there being two aces, given that there is one ace, is the
ratio of the number of aces in the hands containing two aces (12) to the number
of aces in the hands containing at least one ace (28).
Which of these analyses (if either) is correct? Clearly they cannot both be.
Imagine that the card game is played a sufficiently large number of times that
the relative frequencies of the various possible hands closely approximate
TABLE 5.5
The Rule Is to Report Having an Ace Whenever the Hand Contains at Least One Ace

                 Friend's Report
Hand         "ace"      "king"       Σ
AA            6/28         0        6/28
AK           16/28         0       16/28
KK             0          6/28      6/28
Σ            22/28        6/28        1
those predicted by theory and the assumption that all possible hands are
equiprobable on every deal. The proportion of hands that contain two aces will
be about 6/28; the proportion that contain at least one ace will be about 22/28;
and the proportion of those that contain at least one ace that contain two aces
will be about 6/22, or 3/11.
It is also true that about 12/28, or 3/7, of all the aces that are in hands con-
taining at least one ace (which is to say all the aces dealt) will be in hands that
contain two aces. But the conditional probability represented by these numbers
is the probability that an ace will be dealt into a hand containing two aces,
given that it is dealt at all. This is not the same as the probability that a hand
contains two aces, given that it contains one. The problem posed by Copi
(1968) has to do with the latter probability and not the former. The proof of the
pudding here is in the eating. The individual who believes that the probability
that a hand contains two aces, given that it contains one, is 3/7 and accepts bet-
ting odds consistent with this belief will lose money to one who bets in accor-
dance with the belief that that conditional probability is 3/11.
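The two ratios, and the difference between the questions they answer, can be exhibited by direct enumeration. In the following sketch, which assumes (as the text does at this point) that the friend reports an ace whenever he holds at least one, the first ratio conditions on hands and the second on dealt aces:

```python
import itertools

deck = ["AS", "AH", "AD", "AC", "KS", "KH", "KD", "KC"]
hands = list(itertools.combinations(deck, 2))            # all 28 hands

with_ace  = [h for h in hands if any(c[0] == "A" for c in h)]
both_aces = [h for h in with_ace if all(c[0] == "A" for c in h)]
print(len(both_aces), "/", len(with_ace))                # 6 / 22 = 3/11

# Rose's 3/7 answers a different question: the chance that a dealt ace
# lands in a two-ace hand.
aces_dealt = [(c, h) for h in with_ace for c in h if c[0] == "A"]
in_two_ace = [x for x in aces_dealt if x[1] in both_aces]
print(len(in_two_ace), "/", len(aces_dealt))             # 12 / 28 = 3/7
```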
A. I. Dale (1974), like Rose, fails to note the ambiguity in Copi's statement
of the problem, and tacitly assumes that the friend announces that the hand
contains an ace whenever it contains at least one ace. This is seen in the asser-
tion that "Copi's question may equivalently be phrased as 'find the probability
that both cards are aces, given that at least one of them is an ace'" (p. 205). But
one of the main points I am trying to make in this discussion is that such a re-
phrasing of the problem requires the making of an assumption, and that the
overlooking of this fact is at the base of much of the misunderstanding about
conditional probabilities. But again, suppose we explicitly make the assump-
tion that Rose and Dale both implicitly make.
A. I. Dale (1974) criticizes Rose's (1972) analysis on the grounds, in part,
that it considered the order of the cards in a hand to be irrelevant to the solution
of the problem: "The order is of supreme importance, and failure to recognize
this fact has, I think, been responsible for wrong solutions to both the problem
under consideration here and many others" (p. 204). Dale's analysis begins
with a listing of all 56 possible two-card hands, order being taken into account,
and proceeds with a consideration, by enumeration, of specific conditional
probabilities. The probability that the second card is an ace, given that the first
one is an ace, for example, is found by counting the number of instances of the
first card being an ace (28) and the number of those cases for which the second
card is also an ace (12) and taking the ratio, which, in this case, is 3/7. Similarly,
to find the probability that both cards are aces, given that at least one is, one
counts the number of hands that contain at least one ace (44) and the number of
those that contain two aces (12) and takes the ratio, which is 3/11. (Although
this ratio was obtained by considering the full set of 56 hands, taking card order
into account, the same 3/11 would be obtained by enumeration using only the
28 possibilities when card order is not taken into account.)
A. I. Dale (1974) attributes Rose's answer of 3/7 to the question of the prob-
ability that both cards are aces, given that one is, to "the failure to notice that the
event 'at least one card is an ace' is the union of the two mutually-exclusive
events 'exactly one card is an ace' and 'exactly two cards are aces'" (p. 205).
Dale goes on to argue as follows: "I do not think it is correct to regard any one
of the aces in the above array [Dale's listing of all 56 possible hands] as provid-
ing the same amount of support for the information that there is at least one ace.
That is, while the hand As, Kh, (say), provides a certain amount of support for
this proposition, the hand As, Ah does not provide more support, which is what
Rose seems to be claiming" (p. 205). I confess to not understanding exactly what
is being said here, so I do not see it as a convincing account of Rose's miscalcu-
lation, though miscalculation I consider it to be.
Both Rose and Dale accept Copi's answer of 3/7 as the probability that the
hand contains two aces when the friend announces that he has the ace of
spades. We should note that this too involves an unstated assumption, namely
that the friend announces that his hand contains an ace of spades whenever it
does so. If, in those instances in which he held an ace of spades, he would
sometimes announce holding the other card, the probability of holding two
aces, given the announcement of holding an ace of spades, would not necessar-
ily be 3/7. What it would be would depend on the specifics of the conditions un-
der which the ace of spades would be announced.
In a commentary on the analyses by Rose and Dale, Faber (1976) puts his
finger on the crux of the matter on which this discussion has focused, namely
the indeterminacy of Copi's problem in the absence of knowledge of, or an as-
sumption about, the conditions under which the friend (the player in Faber's
terms) announces that his hand contains an ace (or that it contains a specified
ace). "To calculate probabilities it is not enough to know," Faber argues, "that
there is at least one ace in the hand. We must also know how this evidence came
to light; that is we must know the rule by which the player decided what to an-
nounce" (p. 284). Faber considers three possibilities:
"Rule 1: The player chooses randomly which of the two cards to inform us
about."
"Rule 2: The player is obliged to announce an ace when he holds one, but is
free to specify the suit of either of two aces when two are present."
"Rule 3: The player must announce an ace if he holds one, and must specify
the spades ace in preference to another suit when two aces are present" (pp.
284, 285).
Faber (1976) argues that the probability that the hand contains two aces
given that the player announces that it contains the ace of spades, or just that it
contains an ace, under each of these rules is as shown in Table 5.6.
Faber (1976) suggests that, in the absence of any hints from Copi (1968)
about the rule under which the friend in the original statement of the problem
was operating, it is reasonable to assume Rule 1, and he argues that under this as-
sumption, the two probabilities that Copi claims are different are really the same.
As I understand Faber's comments, he considers the situation in which
the player invariably announces holding a particular card by identifying
both its face and suit—ace of spades, king of hearts; he does not consider
the situation in which the player is free sometimes to announce holding an
ace (or a king) without naming the suit. The rules, as expressed earlier, do
not specify this constraint, but Faber's analysis suggests that this was the in-
tention. Moreover, the constraint is essential to preclude perverse but infor-
mative implementations of the rules, as expressed. For example, Rule 3
would permit the player to announce having an ace, without reporting its
suit, whenever he holds only one, and to specify the ace of spades only when
he holds two aces, one of which is the ace of spades; this reporting rule
would, of course, provide useful information to an observer who knew it
was being used. In general, any implementation of the rules, as expressed,
in which the player decides, on the basis of a card's face, whether or not to
report its suit could be informative in a similar way. The following com-
ments are predicated on the assumption that the rules, as intended by Faber,
could be expressed as follows.
Rule 1: The player chooses randomly which of the two cards to inform us
about, and reports the face and suit of that card.
TABLE 5.6
Probability That the Hand Contains Two Aces, Contingent on What the Player
Announces, Under Each of Three Announcement Rules According to Faber (1976)

                      Player Announces
            The Ace of Spades    An Ace Other Than Spades
Rule 1            3/7                      3/7
Rule 2            3/11                     3/11
Rule 3            3/7                      3/11
Rule 2: The player is obliged to announce an ace and its suit when he holds
one, and is to select which suit to report at random when he holds two aces.
Rule 3: The player must announce an ace, reporting its suit, if he holds only
one, and must specify the spades ace in preference to another suit when he
holds two aces, one of which is the ace of spades.
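With the rules restated this way they are easy to simulate. The sketch below encodes each rule as a function (my own encoding, one of several possible) and estimates the two conditional probabilities; its output agrees with the analysis that follows, including the probability 1/5, rather than Faber's 3/11, for a nonspades announcement under Rule 3:

```python
import itertools, random

random.seed(6)
deck = ["AS", "AH", "AD", "AC", "KS", "KH", "KD", "KC"]
hands = list(itertools.combinations(deck, 2))   # 28 equiprobable hands

def rule1(hand):            # announce a random card of the hand
    return random.choice(hand)

def rule2(hand):            # announce an ace if any; random suit if two
    aces = [c for c in hand if c[0] == "A"]
    return random.choice(aces) if aces else random.choice(hand)

def rule3(hand):            # as rule2, but the ace of spades takes precedence
    aces = [c for c in hand if c[0] == "A"]
    if "AS" in aces:
        return "AS"
    return random.choice(aces) if aces else random.choice(hand)

def cond_prob(rule, announcement_test, trials=300_000):
    num = den = 0
    for _ in range(trials):
        hand = random.choice(hands)
        if announcement_test(rule(hand)):
            den += 1
            num += all(c[0] == "A" for c in hand)
    return num / den

for rule in (rule1, rule2, rule3):
    p_spades = cond_prob(rule, lambda a: a == "AS")
    p_other  = cond_prob(rule, lambda a: a[0] == "A" and a != "AS")
    print(rule.__name__, round(p_spades, 3), round(p_other, 3))
# rule1: ~3/7, ~3/7;  rule2: ~3/11, ~3/11;  rule3: ~3/7, ~1/5
```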
Recall that of the 28 possible hands (ignoring order), 6 have two aces, 6 have
two kings, and 16 have an ace and a king. It may help at this point to lay out the
hands to highlight these facts. This is done in Table 5.7. The hands that contain
the ace of spades are marked with an asterisk. We assume, of course, that all possible hands
are equally probable.
Under Rule 1, when the player gets a particular one of these hands, he is
equally likely to reveal the identity of either card. So when he gets an ace and a
king, he is equally likely to announce the one as the other, and when he gets two
aces, he is equally likely to name either suit. The probability of getting two aces
in hand is 6/28. In only three of such cases will one of the two aces be the ace of
spades, and on only 1/2 of those instances will the player announce "ace of
spades"; so the joint probability that the player's hand will contain two aces
and that he will announce "ace of spades," p(AA&"As"), is (3/28)(1/2) = 3/56.
The probability that the hand will contain the ace of spades is 7/28 and, inas-
much as the player will announce "ace of spades" on half of such instances, the
probability that the player will announce "ace of spades," p("As"), is
(7/28)(1/2) = 7/56. So the probability that the player has two aces, given that he
announces having the ace of spades, p(AA | "As"), is
TABLE 5.7
All Possible Two-Card Hands (Ignoring Order) That Can Be Dealt
From a Deck Consisting of Four Aces and Four Kings

*AsAh  *AsAd  *AsAc   AhAd   AhAc   AdAc
*AsKs  *AsKh  *AsKd  *AsKc
 AhKs   AhKh   AhKd   AhKc
 AdKs   AdKh   AdKd   AdKc
 AcKs   AcKh   AcKd   AcKc
 KsKh   KsKd   KsKc   KhKd   KhKc   KdKc

(* = hand contains the ace of spades)
"As") = XAA&"AS")/X"AS") = (3/56)/(7/56) = 3/7.
Now consider the probability that the player has two aces, given that he an-
nounces having an ace other than the ace of spades (still under Rule 1). Again
the probability of getting two aces in hand is 6/28. In three such cases one of the
two aces will be something other than spades, and in the other three both of
them will be. In half of the former cases and all of the latter, the player will an-
nounce an ace other than spades, so the joint probability that the player's hand
will contain two aces and that he will announce a nonspades ace,
p(AA&"Anot-s"), is (3/28)(1/2) + (3/28) = 9/56. The probability that the hand
will contain an ace other than spades is 18/28; in three of such hands both cards
are nonspades aces and in the remaining 15 only one is, so he will announce a
nonspades ace in all three of the former cases and in half of the other 15, which
is to say the probability that the player will announce a nonspades ace,
p("Anot-s"), is (3/28) + (15/28)(1/2) = 21/56. So the probability that the player
has two aces, given that he announces having a nonspades ace, p(AA | "Anot-s"), is

p(AA | "Anot-s") = (9/56)/(21/56) = 3/7.
A similar analysis of the implications of Rule 2 will show that the corre-
sponding probabilities are both 3/11. Consider first the case in which the
player announces "ace of spades." Given that the player is obliged to an-
nounce an ace when he has one, but is free (let us assume equally likely) to
specify the suit of either ace when two are present, the joint probability that
the player's hand will contain two aces and that he will announce "ace of
spades" is, as with Rule 1, (3/28)(1/2) = 3/56. Again, the probability that the
hand will contain the ace of spades is 7/28, but now the player will announce
"ace of spades" whenever the ace of spades is paired with a king and on half
of the instances when it is paired with another ace, so the probability that the
player will announce "ace of spades" is (4/28) + (3/28)(1/2) = 11/56. Thus the
probability that the player has two aces, given that he announces having the
ace of spades, is

p(AA | "As") = (3/56)/(11/56) = 3/11.
Suppose, again under Rule 2, that the player announces having an ace other
than the ace of spades. The joint probability that the player's hand will contain
two aces and that he will announce an ace other than spades is (3/28) +
(3/28)(1/2) = 9/56. The probability that the hand will contain an ace other than
spades is 18/28; the ace of spades will be one of the aces in three of these cases
but not in the other 15, so the player will announce an ace other than spades with
probability (15/28) + (3/28)(1/2) = 33/56. So in this case, the probability that the
player has two aces, given that he announces having a nonspades ace, is

p(AA | "Anot-s") = (9/56)/(33/56) = 3/11.
Finally, let us consider Rule 3, which obliges the player to announce that his
hand contains the ace of spades whenever it does so. With this rule, the joint
probability that the player's hand will contain two aces and that he will an-
nounce "ace of spades" is 3/28. The probability that the player will announce
"ace of spades" is the same as the probability that the hand contains the ace of
spades, or 7/28. So the probability that the player has two aces, given that he
announces having the ace of spades, is

p(AA | "As") = (3/28)/(7/28) = 3/7.
Rule 3 is a bit more complicated when applied to those cases in which the
player announces some ace other than spades. This is because a listener who
knows the rule knows that when the player announces a nonspades ace, the other
card in the hand is not the ace of spades; if it were the ace of spades he could not
have announced the nonspades ace. The joint probability that the player's hand
will contain two aces and that he will announce an ace other than spades is 3/28.
Inasmuch as there are 15 hands that contain one or two aces, neither of which is
the ace of spades, the probability that he will announce an ace other than spades
is 15/28. So, given Rule 3 as restated above, the probability that the player
has two aces, given that he announces having a nonspades ace, is

p(AA | "Anot-s") = (3/28)/(15/28) = 1/5.
There are at least two other rules by which the player of the game described
by Copi (1968) could decide what to announce, both of which are plausible
possibilities of what Copi had in mind and are unambiguous:
Rule 4: The player must announce that he holds at least one ace if he does so,
without specifying its suit if he holds only one and without specifying the suit
of either if he holds two, and he must announce nothing else.
Rule 5: The player must announce the ace of spades if he holds it, and he
must announce nothing else.
With Rule 4, the probability that the player has two aces, given that he an-
nounces holding at least one, is, as in the earlier discussion of Table 5.5,
(6/28)/(22/28) = 3/11. With Rule 5, the joint probability that the player's hand
will contain two aces and that he will announce "the ace of spades" is 3/28. The
probability that he will announce "the ace of spades" is the same as the
probability that the hand contains the ace of spades, or 7/28. So the probability
that the player has two aces, given that he announces having the ace of spades, is

p(AA | "As") = (3/28)/(7/28) = 3/7.
Perhaps these two probabilities represent the distinction that Copi (1968)
had in mind in claiming that the probability that both cards are aces is differ-
ent when the holder of the hand announces that one of the cards is an ace than
when he announces that one of them is the ace of spades; what was missing
from Copi's claim was a recognition that in order to determine the probabili-
ties, it is not enough to assume the card holder's report is truthful, but an as-
sumption must be made about the conditions under which he will make
various possible (truthful) reports. Faber's (1976) critique of Copi's claim
was justified in pointing this out, but the rules he articulated were themselves
somewhat ambiguous, and, given unambiguous restatements of them, his
analysis to illustrate the effects of different possible reporting rules was not
quite right with respect to Rule 3.
In the article following that of Faber in the same journal, S. Goldberg (1976)
comments on the problem posed by Copi and on the earlier commentaries by
Rose, Dale, and Faber. Goldberg chooses to view Copi's problem as "most rea-
sonably formulated as simply asking us to compute the conditional probabili-
ties that both cards in the hand are aces, given that the hand contains at least one
ace, and then given that the hand contains the ace of spades" (p. 287). How we
come to know that the hand contains at least one ace, or that it contains the ace
of spades, "whether by announcement or otherwise, is irrelevant," Goldberg
argues, and raising this issue unnecessarily complicates the picture. It should
be clear from the foregoing that, given this interpretation of the problem, the an-
swer is 3/11 in the former case and 3/7 in the latter. But should we accept this
interpretation? Failure to recognize that inferring the probability of an event
conditional on the report of some observation may require an assumption
about the conditions under which the report would be made—and that different
assumptions could justify different inferences—is precisely the point at issue.
And the literature provides much evidence that lack of clarity on this point can
lead to considerable confusion and debate.
S. Goldberg (1976) argues that "the two announcements in Copi's exercise
are most reasonably and simply interpreted as truthful affirmative responses to
the questions: 'Does the hand contain at least one ace?' and 'Does the hand
contain the ace of spades?'" (p. 287). (Rules 4 and 5 are designed to yield an-
swers to these questions.) This is an opinion, and perhaps one that many people
share, or would endorse if asked. But not everyone who has thought about the
problem shares this opinion, as the foregoing discussion illustrates. Moreover,
if what one wants to get at is people's understanding of the theory of probabil-
ity by seeing how well they can solve problems, the problems should be posed
in terms that are open to as little opinion-based interpretation as possible.
Copi's problem could easily be stated as:
Imagine the following situation. You have a deck of cards from which all but the
four aces and four kings have been removed. From this abbreviated deck, you
deal two cards, at random, to a friend. You then ask if the dealt hand contains at
least one ace, and your friend answers truthfully that it does; what is the proba-
bility that both her cards are aces? Now suppose you deal two cards, at random,
from the same eight-card deck, and this time you ask your friend whether the
hand contains the ace of spades. Again, assume she answers truthfully that it
does; what is the probability then that both her cards are aces?
TABLE 5.8
The Rule Is to Report a Randomly Selected Card of the Hand

              Card Reported
Hand        Ah        As        J         Σ
AhAs        1/6       1/6       0        1/3
AhJ         1/6        0       1/6       1/3
AsJ          0        1/6      1/6       1/3
Σ           1/3       1/3      1/3        1
TABLE 5.9
The Rule Is to Report Only Whether or Not the Hand Contains the Ace of Hearts

              Report
Hand        Yes       No        Σ
AhAs        1/3        0       1/3
AhJ         1/3        0       1/3
AsJ          0        1/3      1/3
Σ           2/3       1/3       1
TABLE 5.10
The Rule Is Always to Report the Suit of the Ace When There Is Only One Ace
and the Suit of a Randomly Selected Ace When There Are Two

              Card Reported
Hand        Ah        As        Σ
AhAs        1/6       1/6      1/3
AhJ         1/3        0       1/3
AsJ          0        1/3      1/3
Σ           1/2       1/2       1
(rather than the ace of hearts) on some proportion of the cases in which the
hand contains both aces.
A similar argument applies to the four-card case in which it is "given" that
the hand contains an ace without anything being said about suit. Again what we
make of this depends on what we assume about the conditions for obtaining
this information. If we assume that we are told there is an ace whenever there is
at least one ace, the probability that there are two aces conditional on being in-
formed that there is one, is, as Gillman (1992) claims, 1/5. But if we assume
that when there is one ace and one jack, we are equally likely to be told that
there is one jack as we are to be told that there is one ace, the probability that
there are two aces, conditional on being informed that there is one, is 1/3. This
follows from the fact that we will be informed that there is an ace in all in-
stances in which the hand contains two, but in only half of the four times
as many instances in which it contains only one.
Gillman was not insensitive to the issue that I have been trying to address.
The card problem was adapted from one described by Ball (1892) (now Ball &
Coxeter, 1987). Gillman (1992) used the term "given that" a hand contains
specified cards, rather than saying that someone has asserted that it does so, as
Ball had done, for the explicit purpose of avoiding the ambiguity I have been
claiming. To interpret an assertion, Gillman notes, it would be necessary to
know how one decided what to assert: "My present rule is that you are to state
whether your hand contains an ace" (p. 6). With that interpretation of "given," I
agree with Gillman's figures. Gillman also considers the possible rule, in the
case of the four-card problem, of reporting one of the cards at random and gives
1/3 as the probability of two aces conditional on knowledge of the presence of
one in this case.
Essentially the same problem (using aces and deuces of spades and clubs)
was discussed by Freund (1965), who argued that the problem as he found it—in
a book of mathematical puzzles by Gamow and Stern (1958)—was incomplete
and could not be solved, at least not "without smuggling in unwarranted assump-
tions" (p. 44). Freund identified as critical the question of how we come to know
that the hand in question (which in his scenario is held by an opponent) contains
at least one ace, or a specified ace. To put the problem in perspective, he proposes
imagining that a spy provides us with the required knowledge, and considers two
possible ways in which the spy gets information:
Case 1: The spy looks at our opponent's entire hand; either he reports
whether or not he sees (at least) one ace, or he also reports the suit (flipping a
coin to decide whether to report spades or clubs when he sees both aces in our
opponent's hand).
Case 2: The spy has the chance to see only one card (randomly selected from
our opponent's hand) and he either reports whether or not it is an ace, or he also
reports the suit. (p. 29)
If he merely reports that he sees (at least) one ace, we find ourselves in the first sit-
uation described above [elimination of the one hand that has no aces, leaving five
equiprobable hands]. Using Bayes' Rule it can easily be shown that the 5 remain-
ing hands have equal a posteriori probabilities and, hence, that the correct odds
are 4 to 1. If he also reports the suit, namely, that he sees the ace of spades, the use
of Bayes' Rule shows that the 3 remaining hands do not have equal a posteriori
probabilities. As a matter of fact, the a posteriori probability of the hand which
consists of both aces is only half of that of the other two remaining hands, and it
follows that the correct odds against our opponent having both aces are still 4 to 1.
The same argument applies also if the spy reports the ace of clubs, (p. 29)
I think this analysis is correct, given an unstated assumption that I will make
explicit. The wording "he reports whether or not he sees (at least) one ace, or he
also reports the suit..." [emphasis added] makes it clear, in my view, that the
spy's rule is to report seeing an ace in all cases in which he does see at least one.
This precludes the possibility that, upon seeing an ace and a deuce, he will re-
port seeing the deuce and not report seeing the ace. And the proviso that, if re-
porting suits, the spy will flip a coin to decide which ace to report when he sees
two of them rules out the possibility of a bias in reporting suits when he has a
choice of which ace to report. So far, so good; but there remains an ambiguity
that needs to be resolved, and thus the need for a further assumption.
The assumption is needed because Case 1 can be interpreted in two ways.
One way to interpret it is as a description of two subcases: Case 1a, in which
the spy always reports only whether or not he sees (at least) one ace and does
not report the suit, and Case 1b, in which he always reports the suit of an ace if
he sees at least one, flipping a coin to decide which one to report when he sees
two (flipping the coin in private, of course, so as not to reveal by this act the
fact that the hand contains two aces). Given this interpretation of Case 1,
Freund's (1965) analysis, I believe, is correct. However, another interpreta-
tion of Case 1 does not rule out the possibility that the spy sometimes reports
only whether or not he sees an ace without reporting the suit, and sometimes
reports both the presence of an ace and its suit (or the suit of a randomly cho-
sen one of them if he sees two).
This interpretation does not preclude the operation of an information-con-
veying bias such as would pertain, for example, if the spy reported seeing an
ace whenever the hand contained at least one ace and, for those hands that con-
tained only one ace, reported the ace's suit only, or primarily, when it was the
ace of spades. This seems an unlikely interpretation, but it is a possible one, and
if the spy were motivated to give a deceptive, albeit accurate, report, this is one
way he could do it. In dealing with spies, it is well to be aware of all the possi-
bilities! So the correctness of Freund's (1965) analysis of Case 1 rests, I claim,
on an unstated assumption. The assumption could be that Case 1 is to be inter-
preted as two subcases, as described earlier, or it could be that the spy only
sometimes reports the suit of an ace in a one-ace hand, but his decision of when
to do so is independent of which ace the hand contains.
Given either of these assumptions, Freund (1965) is correct in giving 4 to 1
as the odds against the hand having two aces, given the spy's report that it has at
least one, or his report that it has the ace of spades. This may be easier to see in
the first instance than in the second. In the first instance, the spy is equally
likely to report seeing at least one ace in five of the six possible hands, only one
of which contains two aces, so the probability that the hand contains two aces,
given the report that it contains at least one, is 1/5. The instance in which the
spy reports the suit is slightly more complicated. Three of the six possible
hands contain the ace of spades, and one of these three contains two aces. The
important thing to notice is that, given that the spy is to flip a coin to decide
which ace to report when the hand contains two aces, he is only half as likely to
report the ace of spades for the two-ace hand as he is to report the ace of spades
for either of the hands that contain the ace of spades as the only ace, so the prob-
ability that the hand contains two aces, given that the spy reports that it contains
the ace of spades, is (1/2)/[(1/2) + 1 + 1] = 1/5, or, if you prefer,
(1/2)(1/6)/[(1/2)(1/6) + (1/6) + (1/6)] = 1/5. The 1/6 of the latter equation repre-
sents the a priori probability of each hand containing the ace of spades, and the
1/2 represents the fact that that ace would be reported for only half of the hands
that contain both aces.
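The same figure falls out of a small exact computation. The following sketch enumerates the six hands under the interpretation labeled Case 1b above (the spy sees the whole hand, always reports an ace if one is present, and names a fair-coin-chosen ace when there are two):

```python
import itertools
from fractions import Fraction

deck = ["AS", "AC", "DS", "DC"]   # aces and deuces of spades and clubs
hands = list(itertools.combinations(deck, 2))   # 6 equiprobable hands

num = den = Fraction(0)
for hand in hands:
    prior = Fraction(1, len(hands))
    aces = [c for c in hand if c[0] == "A"]
    if "AS" in aces:
        # probability this hand leads to the report "ace of spades"
        p_report = Fraction(1, 2) if len(aces) == 2 else Fraction(1)
        den += prior * p_report
        if len(aces) == 2:
            num += prior * p_report
print(num / den)   # 1/5, i.e., odds of 4 to 1 against two aces
```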
Freund's (1965) analysis of Case 2 yields odds of 2 to 1 against the hand
having both aces, given the spy's report that the card he sees is an ace, and in-
dependently of whether he reports the suit of that ace. Case 2, as described by
Freund, is ambiguous in the same way as Case 1. One interpretation of it is as
two subcases: In Case 2a, the spy always reports only whether or not the card
he inspects is an ace; in Case 2b, he always reports whether or not it is an ace
and gives the suit if it is. (He may report the suit if it is not an ace as well; this
makes no difference.) An alternative interpretation leaves open the possibil-
ity that the spy sometimes reports only whether or not the card he sees is an
ace without reporting the suit, and sometimes reports both whether it is an ace
and its suit if it is. As in Case 1, the second interpretation does not rule out the
possibility of the spy making the probability of reporting the suit of an ace
contingent on which ace he sees. In Case 2, however, such a bias would not
convey useful information; if the spy only sees one of the cards, his honest re-
port of whether or not it is an ace is informative with respect to the a posteriori
conditional probability that the other card is also an ace, but his report (or
lack of report) of that ace's suit is not.
The 2-to-l odds against the hand having both aces, given the spy's report
that the card he sees is an ace, or that it is the ace of spades, is easily seen if we
consider all 12 two-card hands that are possible when card order is taken into
account. Seeing only one card is equivalent to looking always at the first card
dealt (left) or always at the second card dealt (right). As shown in Table 5.11,
two out of six of the hands that have an ace in the left (right) position have one
also in the right (left) position. And one out of three of the hands that have an
ace of spades in the left (right) position have an ace in both positions, which is
what Freund's (1965) analysis claims.
TABLE 5.11
All Possible Two-Card Hands Dealt From Four Cards (Aces and Deuces of Spades
and Clubs), Taking Order Into Account

AsAc  AsDs  AsDc  AcDs  AcDc  DsDc
AcAs  DsAs  DcAs  DsAc  DcAc  DcDs
several family sizes. The results of this exercise are shown in Tables 5.12, 5.13,
and 5.14 for families with two, three, and four children respectively.
Each row of each table represents one of the possible combinations of males
and females in a family of the specified number of children. Column C indi-
TABLE 5.12
Representing a Family of Two Children

A      B        C         D      E      F      G
Males  Females  Families  M'sBs  M'sSs  F'sBs  F'sSs  ACD  ACE  BCF  BCG
2      0        1         1      0      0      0      2    0    0    0
1      1        2         0      1      1      0      0    2    2    0
0      2        1         0      0      0      1      0    0    0    2
TABLE 5.13
Representing a Family of Three Children

A      B        C         D      E      F      G
Males  Females  Families  M'sBs  M'sSs  F'sBs  F'sSs  ACD  ACE  BCF  BCG
3      0        1         2      0      0      0      6    0    0    0
2      1        3         1      1      2      0      6    6    6    0
1      2        3         0      2      1      1      0    6    6    6
0      3        1         0      0      0      2      0    0    0    6
TABLE 5.14
Representing a Family of Four Children

A      B        C         D      E      F      G
Males  Females  Families  M'sBs  M'sSs  F'sBs  F'sSs  ACD  ACE  BCF  BCG
4      0        1         3      0      0      0      12   0    0    0
3      1        4         2      1      3      0      24   12   12   0
2      2        6         1      2      2      1      12   24   24   12
1      3        4         0      3      1      2      0    12   12   24
0      4        1         0      0      0      3      0    0    0    12
cates the number of families, on average (from totals of 4, 8, and 16 in Tables
5.12 through 5.14, respectively), that would be expected to have that combina-
tion. This column contains, of course, the binomial coefficients. Columns D
and E give the number of brothers and sisters a male would be expected to have;
F and G give the same information for females. The four rightmost columns
give, respectively, for each combination of males and females in the family, the
numbers of brothers of males (ACD), sisters of males (ACE), brothers of fe-
males (BCF), and sisters of females (BCG).
Adding over rows (male-female combinations) shows the total expected
numbers of brothers and sisters to be the same for males and females. Con-
sidering all males in families of three children, 3/12, or 1/4, are in families of
three males (and so have two brothers), 6/12, or 1/2, are in families of two males
(one brother), and 3/12, or 1/4, are in families of one male (no brothers). So the
average number of brothers for males in families of three children is (1/4)(2) +
(1/2)(1) + (1/4)(0) = 1. Similar calculations show that the average number of sis-
ters for males (as well as brothers for females and sisters for females) in families
of three children is also (1/4)(2) + (1/2)(1) + (1/4)(0) = 1. The same analysis will
show equivalence for families of other sizes as well.
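The equivalence is easy to verify by enumeration for any family size; a minimal sketch, assuming boys and girls are equally likely and births independent:

```python
from itertools import product

def sibling_expectations(n):
    # Enumerate all 2^n equally likely gender orderings for n children.
    boys = brothers = sisters = 0
    for family in product("BG", repeat=n):
        for child in family:
            if child == "B":
                boys += 1
                brothers += family.count("B") - 1
                sisters += family.count("G")
    return brothers / boys, sisters / boys

for n in (2, 3, 4):
    print(n, sibling_expectations(n))
# n=2: (0.5, 0.5); n=3: (1.0, 1.0); n=4: (1.5, 1.5)
```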
Another way to approach the problem is to ask: Given that one is a boy (girl)
in a family of n children, what is the expected number of brothers and the ex-
pected number of sisters he (she) will have? The possibilities for n = 2 are as
shown in Table 5.15, taking birth order into account.
The total number of boys represented in the table is four and the total num-
ber of brothers of boys [Σ boys x (brothers/boy)] is two, so the expected number of
brothers per boy is .5. Similarly, the total number of sisters of boys is two, so
the expected number of sisters per boy in a family of two is also .5. These num-
bers are consistent with the expected number of siblings, which, of course,
must be one. The same figures hold for a girl in a family of two children.
TABLE 5.15
Possible Combinations of Brothers and Sisters in a Family of Two Children,
Taking Birth Order Into Account
SUMMARY
One might expect that when the theory was just beginning to be developed, being
surprised by results was not an uncommon experience, and that disagreement
among mathematicians as to how probabilities should be computed was not rare.
Several accounts of specific surprises and disagreements are recorded in the lit-
erature. One such account relates to the surprise of the Chevalier de Méré upon
learning that although the probability of obtaining at least one six in four tosses
TABLE 5.16
Possible Combinations of Brothers and Sisters in a Family of Three Children,
Taking Birth Order Into Account
Some Probability
Paradoxes and Dilemmas
PARADOXES
(Weintraub, 1988; Zabell, 1988), a dilemma (Mosteller, 1965), and an absur-
dity (Székely, 1986).
I will not attempt a precise definition here. The following problems have
been called paradoxes, or are similar to others that have been so called. Readers
who feel that some of them are better described by other words should make
what they consider to be suitable substitutions in terminology.
Possibly the most famous paradox involving probability theory is one de-
scribed by Nicholas Bernoulli in a letter to Pierre de Montmort who later pub-
lished it in a book on games of chance (Montmort, 1708). Its name comes
from the fact that Nicholas' cousin, Daniel Bernoulli, published his resolu-
tion of it (not universally accepted as a resolution) in the annals of the Acad-
emy of St. Petersburg in 1738.
The "expected value" of the outcome of a probabilistic event is said to be the
sum of the products of the values of each of the possible outcomes multiplied
by its probability of occurrence. In other words
where p. and V. are the probability and value, respectively, of the /th outcome.
Thus, if you stood to gain $1 by the toss of a head with a fair coin and $0 with
the toss of a tail, the expected value of a toss would be(.5x$l) + (.5x $0) or 50
cents. Presumably, if you were offered the chance to purchase this gamble, you
should consider any purchase price less than $.50 to be a good buy and any
price over $.50 to be a poor one.
Consider the following gamble. A fair coin is to be tossed until it comes
up head; that is to say the tossing will continue as long as the outcomes are
tails, the first head terminates the game, and when it occurs determines how
much you win. (As initially posed by N. Bernoulli to Montmort, the prob-
lem involved the tossing of dice, but shortly later Cramer restated the prob-
lem as a game of heads and tails [Jorland, 1987], and the latter version has
generally been discussed since then.) If the first toss is head, the game is
over and you receive $2. If a tail occurs first and then a head, you receive $4;
if a tail occurs twice before the first head, you receive $8; if three times,
$16, and so on, the amount you win doubling with each additional toss of a
tail; so if a head occurs first on the kth toss, you win 2^k dollars. In general the
situation is as follows: the probability that the first head occurs on the kth toss
is (1/2)^k, and the payoff in that case is 2^k dollars.
What would you consider a reasonable price to pay for the opportunity to play
this game? The expected value of the gamble, being the sum of the products of all
possible winnings and their probabilities of occurrence, is, in dollars,

(1/2)(2) + (1/4)(4) + (1/8)(8) + ... = 1 + 1 + 1 + ...,

which is infinite.
So if expected value were used as the index of reasonable price, a rational in-
dividual should be willing to pay a very great amount indeed to play this game.
Nevertheless, it probably would surprise no one to discover that when people
are asked what they would be willing to pay for the opportunity to take this
gamble, almost no one would give a very large sum.
Inasmuch as an infinite amount of money is a bit beyond the reach of any fi-
nite being, we might make the situation more realistic by limiting the number of
tosses allowed, say by agreeing to consider the game to be terminated either by
the toss of a head or with the nth toss (irrespective of what it is), whichever came
first; if the nth toss is a head, you win $2^n and if it is a tail, you win nothing. (The
idea of limiting the number of tosses so as to avoid the need to deal with infinity
was suggested at least as early as 1777 by Georges Louis de Buffon [Todhunter,
1865/2001], who proposed that it be limited to 29 on the grounds that 2^29 would
exceed the amount of money that could be furnished by all of France.)
Suppose, for example, that we agreed to terminate the game either by the oc-
currence of a head or with the third toss, whichever came first. Now imagine
that you have an opportunity to play this game a very large number of times.
We would expect that about half of the games would be terminated by the first
toss, because it was a head; in all these cases you would win $2. About one
fourth of the games would be terminated by the second toss, and on these you
would win $4. The remaining one fourth of the games would be terminated
with the third toss, and you would win $8 on the one half of these that are heads.
So on any given game, you would win $2, $4, $8, or $0 and your average,
per-game, win—the expected value of the game—would be

    (1/2)($2) + (1/4)($4) + (1/8)($8) + (1/8)($0) = $3.
Given the opportunity to play this game many times, you probably would be
happy to do so at any cost to you that is much less than $3 per game. At any cost
greater than $3 per game, you probably would not find it very attractive, be-
cause you would be almost sure to lose money in the long run. You might even
be willing to pay something close to $3 to play this game even if you could only
play it once, because you have three chances in eight of winning $4 or more,
and you might consider that to be a reasonable gamble.
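The $3 figure is easy to check by simulation. The following sketch (Python; mine, not from the text) plays the three-toss version of the game many times and reports the average payoff:

    import random

    def truncated_st_petersburg(max_tosses=3):
        # Double the prize with each tail; the first head ends the game.
        # If all allowed tosses are tails, the payoff is nothing.
        for k in range(1, max_tosses + 1):
            if random.random() < 0.5:   # a head on toss k pays 2^k dollars
                return 2 ** k
        return 0

    n_games = 1_000_000
    average = sum(truncated_st_petersburg() for _ in range(n_games)) / n_games
    print(average)   # ~3.0, matching the expected value computed above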
Now suppose that we agreed that the game would be terminated either by
the toss of a head or by the 1,000th toss, whichever came first. The situation is
completely analogous to the simpler one, except that the expected value of the
game, in dollars, is

    \sum_{k=1}^{1000} \left(\frac{1}{2}\right)^k 2^k = 1 + 1 + \cdots + 1 = 1{,}000.
Theoretically, one should consider any cost of playing this game that is
much less than $1,000, say $800, to be attractive. However, it seems unlikely
that many people would be willing to pay more than a relatively few dollars to
play it. The probability is .5 that one would win only $2, and the probability
that one would win as much as $1,000 is only about .001. (Even in the St. Pe-
tersburg game, the probability is .5 that one will win only $2, and, as
Samuelson [1977] points out, the expected total number of coin tosses, n, per
game is only 2:

    E[n] = \sum_{k=1}^{\infty} k \left(\frac{1}{2}\right)^k = 2.)
The St. Petersburg paradox illustrates in a particularly compelling way that the
expected value of a probabilistic outcome is not necessarily the value of the most
likely outcome. This distinction is seen in the following situation, from Paulos
(1992), which may be thought of as a modern analogue of the paradox. Consider
a volatile stock that each year, with equal probability, increases in value by 60%
or decreases by 40%. What is $1,000 invested in this stock likely to be worth in
100 years? Inasmuch as the average change in the stock's value is (+60 - 40)/2,
or 10%, the expected value of the $1,000 investment after 100 years is $1,000 ×
1.1^100, or $13,780,612; not a bad return. What your broker is less likely to tell you
is that, given the assumption that the stock is equally prone to increase (by 60%)
or decrease (by 40%) in any given year, the most probable scenario is that it will
increase in 50 of the 100 years and decrease in the other 50, and its final value in
this case will be $1,000 × 1.6^50 × 0.6^50, or $130.
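Paulos's two figures can be reproduced in a couple of lines (a Python sketch; illustrative only):

    # Expected value uses the average yearly factor (1.6 + 0.6)/2 = 1.1;
    # the most likely outcome is 50 up years and 50 down years.
    expected = 1000 * 1.1 ** 100
    modal = 1000 * 1.6 ** 50 * 0.6 ** 50
    print(round(expected))   # 13780612
    print(round(modal))      # 130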
One's initial reaction to such disparate numbers for expected value and most
probable outcome is likely to be that something must be wrong—that one of
the calculations must have been done incorrectly or that the reasoning must
have been based on some unstated faulty assumption. But where is the mis-
take? Is there something wrong with the idea of averaging the equally likely
60% gain and 40% loss to get a mean annual gain of 10%? Suppose you made
two $1,000 investments and gained 60% on one while losing 40% on the other
in the first year; your average gain per investment, (600 − 400)/2 = 100 dollars,
would be 10%. But this is not analogous, one might argue, to the case in which
you have a gain of 60% followed by a loss of 40% (or vice versa) on the same
investment, in which case the total change is 600 - 640 = -40 (or -400 + 360 =
-40), which is -4%. Does this mean that using the average of the equally likely
percentage changes to calculate expected value of the imagined investment is
not legitimate?
In fact it is legitimate. To get a better intuitive feel for the situation, it may
help to trace the first few branches of the tree of possible histories of the invest-
ment. Table 6.1 gives, for each number of years 1 through 5, and 10, every pos-
sible outcome (combination of gains and losses) for the investment over that
number of years, the number of ways each outcome can be realized, the proba-
bility of obtaining that outcome, the value of that outcome, the product of the
probability and the value of that outcome (which is the contribution of that out-
come to the expected value for the year), and the (approximate) expected value
for the year (the sum of these products for that year).

TABLE 6.1
Possible Outcomes of a $1,000 Investment That Is Equally Likely to Increase
by 60% or Decrease by 40% per Year

Year  Outcome  No.  Prob. (P)  Value (V)  P × V   Exp. Value
1     1G        1     .500       1,600      800
      1L        1     .500         600      300
                                                     1,100
2     2G        1     .250       2,560      640
      1G, 1L    2     .500         960      480
      2L        1     .250         360       90
                                                     1,210
3     3G        1     .125       4,096      512
      2G, 1L    3     .375       1,536      576
      1G, 2L    3     .375         576      216
      3L        1     .125         216       27
                                                     1,331
4     4G        1     .0625      6,554      410
      3G, 1L    4     .2500      2,458      614
      2G, 2L    6     .3750        922      346
      1G, 3L    4     .2500        346       86
      4L        1     .0625        130        8
                                                     1,464
5     5G        1     .0313     10,486      328
      4G, 1L    5     .1563      3,932      614
      3G, 2L   10     .3125      1,475      461
      2G, 3L   10     .3125        553      173
      1G, 4L    5     .1563        207       32
      5L        1     .0313         78        2
                                                     1,610
***
The expected value at the end of each of these years, obtained by adding up
all the possible outcomes weighted by their probabilities of occurrence, is the
same as we would get by using the standard compound-interest formula with
the interest set at 10%. The breakdown shows how the various possibilities
contribute to the expectation. It also gives a sense of how the most likely
(modal) outcome steadily decreases. At the end of 10 years, the most likely
value of the investment is $815, which represents a loss of $185, and 638 of the
1,024, or about 62%, of the equally-likely outcomes represent a loss of at least
that amount. Continuation of this analysis would show the expected value
growing increasingly rapidly, reaching $13,780,612 at the end of the 100th
year, whereas the value of the most likely outcome continues slowly to de-
crease, and the probability of a net loss remains always greater than .5.
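The 10-year figures quoted above follow directly from the binomial distribution of gain and loss years; a brief sketch (Python, illustrative):

    from math import comb

    n = 10
    total_ev, losing_outcomes = 0.0, 0
    for k in range(n + 1):               # k = number of 60%-gain years
        ways = comb(n, k)
        value = 1000 * 1.6 ** k * 0.6 ** (n - k)
        total_ev += ways * value / 2 ** n
        if value < 1000:                 # ends below the initial investment
            losing_outcomes += ways
    print(round(1000 * 1.6 ** 5 * 0.6 ** 5))   # 815: modal (most likely) value
    print(round(total_ev))                     # 2594, i.e., 1000 * 1.1**10
    print(losing_outcomes, 2 ** n)             # 638 of 1,024 outcomes lose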
Saying that the expected value of this investment after 100 years is
$13,780,612 is a little like saying that you and some celebrity had an average in-
come last year of $20,000,000; true, perhaps, but probably not very exciting
from your point of view. (I am making an assumption here about who is most
likely to read this book.) The expected value is skewed greatly by virtue of
low-probability outcomes of extremely high worth. Suppose that the stock were
to move in the same direction, either to increase or to decrease in value, every one
of the 100 years. The probability of either of these outcomes (assuming the di-
rection of change is random) is vanishingly small—(1/2)^100, or approximately
one chance in 10^30—but theoretically possible; the important point is that al-
though the two outcomes are equally probable, their impacts on the expected
value computation are very different in magnitude. Following 100 years of
steady 60% increases, the value of the stock would be $1,000 × 1.6^100, or $2.58 ×
10^23, many orders of magnitude more than the net worth of all the businesses on
earth. Following 100 years of steady 40% declines, it would be worth $1,000 ×
0.6^100, or, for practical purposes, 0. The average of these outcomes is $1.29 ×
10^23, which is to say the positive scenario counts much more in the expectation
than the negative one, because the latter is bounded below by 0.
The probability of 100 consecutive moves in the same direction is so small
that neither of these possibilities has an appreciable effect on expected value;
however, the principle that the comparison illustrates pertains to other possibil-
ities of higher probability and thus of greater impact on the expectation. The
probability that the number of years that the stock gains in value will exceed by
a specified amount the number of years that the stock loses in value is the same
as the probability that the number of losing years exceeds by that amount the
number of gaining years; however, the impact on expected value of an excess of
gain years over loss years is much greater than the opposing impact of an equal
excess of loss years over gain years. The overall effect of this asymmetry is that
the expected value of the investment gives a highly distorted picture of what is
most likely to happen.
The counterintuitiveness of the compound interest example is due, at least
in part perhaps, to the easy-to-overlook nonequivalence of equal percentage
gains and losses. In particular, if one failed to think about it, one might assume
that an increase and decrease (or a decrease and increase) of a given percentage
offset each other so one ends up where one began, but of course that is not the
case. A loss of 50% offsets a gain of 100%, or conversely, it takes a gain of
100% to compensate for a loss of 50%. In our example, a loss of 40% more than
offsets a gain of 60%, so the greater the equal number of gains and losses of
these magnitudes the further behind one will fall. An investment that gains and
loses 50% in alternate years will lose 25% of its existing value every 2 years, so
at the end of n years it will be worth only 0.75^(n/2) of its original value; $1,000 in
this case becomes $237 in 10 years. To satisfy my curiosity, I calculated the
losses that would exactly offset gains of specific magnitudes. The results, ex-
pressed in percentages, are shown in Table 6.2.
It does not follow from any of this that a decision to invest in a stock that is
expected to behave as the volatile one just described would necessarily be irra-
tional. Whether such an investment would make sense would depend on the in-
vestor's attitude toward risk. One might be inclined to invest in such a stock,
although hopefully not one's last dollar, on the grounds that even though the
TABLE 6.2
Magnitudes of Losses That Would Offset Specified Gains, All in Percentages
Gain Loss
10 09.1
20 16.7
30 23.1
40 28.6
50 33.3
60 37.5
70 41.2
80 44.4
90 47.4
100 50.0
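The entries in Table 6.2 all come from one relationship: a gain g and a loss l exactly offset each other when (1 + g)(1 − l) = 1, that is, when l = g/(1 + g). A two-line check (Python, illustrative):

    for gain in range(10, 101, 10):
        g = gain / 100
        print(gain, round(100 * g / (1 + g), 1))   # 60 -> 37.5, 100 -> 50.0, etc.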
most likely outcome is that one would experience a modest loss, there is a very
good chance that one would realize a large gain. This is especially true if one is
permitted to decide when to liquidate the investment.
Here is an analogue of the volatile-stock story that may also help to put the
situation in perspective. Imagine that you are given the opportunity to partici-
pate in the following game. You are to put down $1 at the outset; this is your ini-
tial "investment" in the game. (If you prefer to imagine a larger investment,
simply multiply all the dollar figures in what follows by whatever you wish; an
investment of $1,000, for example, changes all the dollar figures by a factor of
1,000.) A coin is to be tossed 10 times. Every time the coin comes up heads, the
current value of your investment will be increased by 60% and every time it co-
mes up tails, the current value of your investment will be decreased by 40%.
Would you find this an attractive game to play? The expected value of your in-
vestment at the end of the game is $2.59 (representing a gain of $1.59); the most
likely value is about $0.82 (representing a loss of about 18 cents). You are more
likely to lose money than to make any (the probability that you will lose some
portion of the initial investment is about .62). The most you can lose is $1,
whereas it is possible to win a much larger amount (over $100 in the most ex-
treme case), and the chances of a relatively spectacular win are quite good; in
particular the probability of at least doubling your investment is about .38 and
the probability of increasing it by almost a factor of six is about .17. The proba-
bility of getting k or more heads in 10 tosses and the final value of the initial $1
investment for each value of k are given in Table 6.3.
In general, an excess of one type of outcome over the other (heads vs. tails in
this game, gain years vs. loss years in the original volatile-stock scenario) is
equally likely in both directions; for example, the probability of seven or more
heads is the same as the probability of seven or more tails—it is about .17 in each
case. However, one stands to win a great deal more with seven or more heads
than one can lose with seven or more tails; in fact, given the outcome seven or
more heads, the expected gain is $9.87 ($10.87 expected value including the ini-
tial $1 investment), and the minimum is $4.80, whereas given the outcome seven
or more tails, the expected loss is about $0.91, and the maximum $1.
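These conditional figures can be verified directly (a Python sketch; the numbers are those quoted above):

    from math import comb

    def p_at_least(h, n=10):
        return sum(comb(n, k) for k in range(h, n + 1)) / 2 ** n

    def value(k, n=10):                  # value of $1 after k heads, n-k tails
        return 1.6 ** k * 0.6 ** (n - k)

    print(round(p_at_least(7), 2))       # .17, same as for seven or more tails
    # expected value of the stake, given seven or more heads:
    num = sum(comb(10, k) * value(k) for k in range(7, 11))
    den = sum(comb(10, k) for k in range(7, 11))
    print(round(num / den, 2))           # 10.87, i.e., an expected gain of $9.87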
If we were to define the game as a sequence of 100 coin tosses, to correspond to
the original stock scenario, the asymmetry would become much more extreme.
Now the maximum possible (though extremely unlikely) gain would be $2.58 ×
10^20, whereas the maximum possible loss would remain at $1, the value of the orig-
inal investment. The most likely final value of the investment would be $0.13. As
with the 10-toss game, the probability is greater than .5 that the final value would
be less than $1, but the chances of realizing an enormous gain are quite good.
Table 6.4 shows the approximate probability of getting exactly k heads in 100
tosses and the value of the original $1 given k heads. (The gain in each case is the
TABLE 6.3
The Probability of Getting Exactly k, and k or More, Heads in 10 Tosses
and the Resulting Value of $1, Given That a Toss of a Head Means a 60% Increase
and One of a Tail a 40% Decrease
latter value minus $1 representing the original investment.) The gain associated
with k heads is the minimum gain, given k or more heads; remember that the
maximum loss is never greater than $1, in the situation we are considering.
Before turning to the table, it may be useful to recall that the formula for
computing the probability of exactly H heads in 100 tosses is
    p(H) = \binom{100}{H} \left(\frac{1}{2}\right)^{100},

where \binom{100}{H} represents the number of combinations of 100 things taken H at a
time, and the general formula for computing \binom{n}{r}, the number of combinations of n
things taken r at a time, is n!/(r!(n - r)!). For large n, it is convenient to use Stirling's
approximation for the factorial, n! \approx e^{-n} n^n (2\pi n)^{1/2}. Alternatively, one can
use logarithms of factorials, often provided in books of mathematical tables at
TABLE 6.4
The Probability of Getting Exactly k, and k or More, Heads in 100 Tosses
and the Approximate Value of $1 Resulting From k Heads, Given That a Toss
of a Head Means a 60% Increase and One of a Tail a 40% Decrease
least for factorials up to 100! (The probabilities in the second column of the ta-
ble were calculated with logarithms; the use of Stirling's approximation would
have yielded very slightly larger values.)
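In a modern language the log-factorial route is a one-liner; here is a sketch (Python) using lgamma, since lgamma(n + 1) is the logarithm of n!:

    from math import lgamma, exp, log

    def p_exactly(h, n=100):
        log_comb = lgamma(n + 1) - lgamma(h + 1) - lgamma(n - h + 1)
        return exp(log_comb - n * log(2))

    print(round(p_exactly(50), 4))   # 0.0796, the single most likely outcome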
The probability that 100 tosses will produce at least H heads is
    p(k \ge H) = \sum_{k=H}^{100} \binom{100}{k} \left(\frac{1}{2}\right)^{100},
and the value of the original $1 investment at the end of a game, given the oc-
currence of k heads, is

    V(k) = 1.6^k \times 0.6^{100-k}.

(Probabilities given in the tables as 1 are not exactly 1, of course, but round off to 1
when carried to 5 places beyond the decimal point.)
Thus the probability of having at least one out of five games increase the
value of the initial investment by at least a factor of 5.8 is about .61 and the
probability of having at least one game increase the value by at least a factor of
15.46 is about .25. A similar analysis for some of the possible outcomes of five
100-toss games is given in Table 6.6.
So the probability of having at least one game produce 55 or more heads
and, consequently, at least a 17-to-l return on the investment is about .64. The
probability that at least one game will produce at least 60 heads and therefore a
2,361-to-1 return is about .13.
It should be clear that these games become increasingly attractive the larger
the number of times one is permitted to play them. If one is permitted to play
the game 100 times, the probabilities comparable to those in the preceding ta-
ble are as shown in Table 6.7.
In other words, the probability of having at least one game produce a
2,362-to-l return is about .94, and the probability of having at least one pro-
duce a return of over 300,000-to-1 is a non-negligible .16. These figures might
make many people willing to pay a fair amount of money for the opportunity to
play the 100-toss game 100 times.
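The at-least-once probabilities in Tables 6.6 and 6.7 follow from the rule that if a single game succeeds with probability p, then m independent games produce at least one success with probability 1 − (1 − p)^m; a sketch (Python, illustrative):

    from math import comb

    def p_at_least(h, n=100):            # chance of h or more heads in n tosses
        return sum(comb(n, k) for k in range(h, n + 1)) / 2 ** n

    for h in (55, 60):
        p = p_at_least(h)
        print(h, round(1 - (1 - p) ** 5, 2), round(1 - (1 - p) ** 100, 2))
    # 55 -> about .64 in five games; 60 -> .13 in five games, .94 in one hundred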
TABLE 6.6
Probability of Getting k or More Heads at Least Once in Five 100-Toss Games
and Approximate Value of $1, Given k Heads
TABLE 6.7
Probability of Getting k or More Heads at Least Once in 100 100-Toss Games
and Approximate Value of $1, Given k Heads
The St. Petersburg paradox generated a great deal of interest among early prob-
ability theorists and much debate about what would constitute an acceptable
resolution of it. Of particular importance for the subsequent development of
theories of decision making and rational behavior is the role it played in forcing
a distinction between monetary value and "utility," a distinction promoted by
Daniel Bernoulli (1738)—he used the terms "physical fortune" and "moral
fortune"—in his original attempt to resolve the paradox. Samuelson (1977)
credits Cramer (1728) as the originator of the idea of diminishing marginal
utility, proposed in a letter to Nicholas Bernoulli and later published by Daniel Bernoulli in
his 1738 paper. According to this idea, the utility to an individual of an increase
of a given amount in her fortune will vary inversely with the amount of wealth
she already has. Many of the proposed resolutions of the paradox were based
on this idea—which was often expressed as a concave relationship between
wealth and utility—and, in particular, on the assumption that increase in utility
diminishes essentially to zero as wealth increases indefinitely.
The distinction between value and utility was also used extensively by von
Neumann and Morgenstern (1944/1953) in their seminal work on decision the-
ory in its modern form. This distinction recognizes that what a specified amount
of money is worth to individuals depends on how much they have; to a person
who has only $1,000, the possibility of losing $1,000 presumably is much more
than 1,000 times worse than the possibility of losing $1, but this probably is not
the case for one whose net worth is measured in millions. Bernoulli assumed that
how much satisfaction—increase in moral fortune—one derives from a given
small increase in one's physical fortune is inversely proportional to the size of
one's physical fortune before the increase. This principle is incorporated in ex-
pected utility theory as the assumption that utility is a concave (looking up from
the abscissa) function of money (Pratt, 1964).
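To make the concavity point concrete: with Bernoulli's logarithmic utility the St. Petersburg expectation becomes finite. A quick derivation (mine, not from the text, and it ignores prior wealth for simplicity):

    E[u] = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^k \log 2^k
         = \log 2 \sum_{k=1}^{\infty} \frac{k}{2^k} = 2 \log 2,

the utility of a certain $4, which is a finite, and modest, fair price.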
The relationship between money and utility is further complicated by the
recognition that the subjective value of a given amount of money may depend
not only on how much money one has to begin with but on whether the money
in question is a (real or potential) gain or loss. To the individual who has only
$1,000, the prospect of gaining another $1,000 is likely to be very attractive,
but the degree of happiness caused by this eventuality is unlikely to match the
degree of sadness that would accompany a loss of the same amount. In general,
it appears that losses have greater subjective value than gains of equal amounts
(Galanter & Pliner, 1974).
The St. Petersburg paradox and the variations on it considered earlier not
only illustrate the necessity of the distinction between monetary value and util-
ity, or between objective and subjective worth; they also demonstrate that the
attractiveness of a possible venture with an uncertain outcome can depend not
only on the mathematical expectation of the outcome but on the entire proba-
bility distribution of possible outcomes. A distribution that is highly skewed—
with a great difference between mean and mode—represents a very different
situation from one that is symmetrical about its mean, and, as Allais (1979/
1990) has pointed out, an attitude that takes account of the dispersion of psy-
chological values should not be considered irrational.
The stir that the paradox made and the rethinking that it caused of the mean-
ings of such foundational concepts as probability and expectation also illus-
trate the importance of intuitive notions of reasonableness as the final court of
appeals on questions of what shall pass for rationality and what shall not. Any
theory of rationality that prescribed that one should be willing to pay an infinite
amount of money, or even a large amount, for the opportunity to make the St.
Petersburg wager would be a strange guide for behavior indeed.
Zabell (1993) has argued that probability theory is useful precisely because
it does not correspond to our intuitions at all points; if it did, he says, we would
not need it. The point is well taken in the sense that our immediate and casual
intuitions when thinking about probabilistic events are often wrong, and a con-
sultation of probability theory can set them right. When, however, what the
theory has dictated has violated the intuitions of people who have thought
deeply and long about the issues, either the basis of the faulty intuitions has be-
come understood and the intuitions changed, or the theory has been modified
to be consistent with the intuitions that persist. Intuition is the final judge.
FIG. 6.1. Bertrand's paradox: illustrating three possible answers to the question of
the probability of a randomly drawn chord of a circle being longer than the side of
an inscribed equilateral triangle.
Answer 2: 1/3. Each end of the chord is equally likely to be anywhere on the
perimeter of the circle. Suppose one end is at A. Draw an inscribed equilateral
triangle with a vertex at A. The vertices of this triangle divide the circumfer-
ence into three equal arcs. The chord originating at A will be longer than a side
of the triangle if and only if it terminates on the arc opposite the triangle's ver-
tex at A. Inasmuch as it is equally likely to terminate anywhere on the perime-
ter, the probability that it will terminate on the arc opposite A is 1/3.
Answer 3: 1/4. For a chord to be longer than AC its midpoint must lie within
the circle inscribed inside the inscribed triangle. The midpoint of a randomly
drawn chord is equally likely to fall anywhere within the larger circle, so the
probability of it falling within the smaller circle is the ratio of the area of the
smaller circle to that of the larger, which is 1/4.
What is the resolution of this paradox? Is there one? Each of the answers to
the original question appears to have a legitimate rationale. Is one and only one
of them correct? Can they all be correct? And if so, what does this do to our ba-
sic ideas about what probability means? If a well-formed question of probabil-
ity can have two or more incompatible answers, is the theory of probability not
inconsistent and therefore of dubious value?
The difficulty here is similar in principle to the basis for confusion in some
of the problems discussed in the preceding sections. At the heart of it is the fact
that the statement of the problem does not provide enough information to let
one come up with an unqualified answer that is obviously correct. Specifically
what is missing is a definition of what is meant by "randomly drawing" a
chord, and this is critical because the concept is ambiguous.
Imagine that we decide to do an experiment in order to determine empiri-
cally which, if any, of the aforementioned answers to the Bertrand paradox is
correct. We propose to draw at random a very large number of chords and see
what percentage of them are longer than AC. This is a brute-force way to settle
the issue. But it turns out not to be that simple, because without first settling the
question of what it means to draw a random chord, we cannot proceed.
Suppose you were given the task of specifying the chords. How would you
go about guaranteeing that each chord is a random selection? Having just
thought about Bertrand's paradox, you might consider several possibilities.
Here are three rules that might come to mind.
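One way to make such rules concrete is to write them down as sampling procedures. The following Monte Carlo sketch (Python; these three rules are standard illustrations and not necessarily the ones the text goes on to describe) estimates, for each rule, the proportion of chords longer than the side of the inscribed triangle, which is sqrt(3) for a unit circle:

    import random
    from math import sqrt, pi, cos, sin

    def chord_len(a, b):        # endpoints given as angles on the circle
        return sqrt((cos(a) - cos(b)) ** 2 + (sin(a) - sin(b)) ** 2)

    def rule1():                # two endpoints uniform on the circumference
        return chord_len(random.uniform(0, 2 * pi), random.uniform(0, 2 * pi))

    def rule2():                # midpoint uniform along a random radius
        d = random.uniform(0, 1)
        return 2 * sqrt(1 - d * d)

    def rule3():                # midpoint uniform over the disk
        while True:
            x, y = random.uniform(-1, 1), random.uniform(-1, 1)
            if x * x + y * y <= 1:
                return 2 * sqrt(1 - x * x - y * y)

    side, n = sqrt(3), 100_000
    for rule in (rule1, rule2, rule3):
        print(sum(rule() > side for _ in range(n)) / n)   # ~1/3, ~1/2, ~1/4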
Suppose a car has traversed a distance of 1 mile, and we know that the time
taken was between one and two minutes, but we know nothing further about it.
Applying the principle of indifference, we conclude that there is a probability
of 1/2 that the time taken was in the range of 1 to 1 1/2 minutes, and a probabil-
ity of 1/2 that the time taken was in the range 1 1/2 to 2 minutes. A logically
equivalent way of expressing our knowledge is to say that the car covered the
distance at an average speed between 30 and 60 miles per hour. Applying the
principle of indifference again, we conclude that there is a probability of 1/2
that the average speed was between 30 and 45 miles per hour, and a probability
of 1/2 that the average speed was between 45 and 60 miles per hour. Unfortu-
nately, we have just been guilty of self-contradiction. A time of 1 1/2 minutes
for a distance of one mile is an average speed of 40, not 45, miles per hour. On
the basis of the same information, formulated in different but equivalent terms,
we get the result that there is a probability of 1/2 that the average speed is be-
tween 30 and 40 miles per hour, and also that there is a probability of 1/2 that
the average speed is between 30 and 45 miles per hour. Since it is not impossi-
ble that the average speed is between 40 and 45 miles per hour, the foregoing
results are mutually incompatible, (p. 94)
Is this a paradox or a faulty analysis? One might argue that it is the latter and
that the fault lies in the assertion that the two ways described in the problem
statement of expressing our knowledge of the situation are "logically equiva-
lent." Assuming that there is a probability of 1/2 that the time taken was in the
range of 1 to 1 1/2 minutes is not logically equivalent to assuming that there is a
probability of 1/2 that the average speed was between 30 and 45 miles per hour.
More generally, assuming the equiprobability of all possible times between 1
and 2 minutes is not equivalent to assuming the equiprobability of all possible
speeds between 30 and 60 miles per hour, because the relationship between
time taken and speed is curvilinear.
But now, given that all one knows about the situation is that a specific dis-
tance has been traversed within some time between specified limits, what is the
appropriate assumption to make? The equiprobability of all possible times?
The equiprobability of all possible speeds? Something else, say a "splitting of
the difference" between these two? Unless one has some reason for preferring
one assumption over the other possibilities, it is not clear this question has any
correct answer. The important point, for present purposes, is that if one as-
sumes equiprobability of all possible times, one gets one answer, if one as-
sumes equiprobability of all possible speeds, one gets a different one, and there
is nothing mysterious or paradoxical about this; it follows simply from the re-
lationship between speed, time, and distance traversed.
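A numerical illustration (Python; my example, not the quoted author's) of how the two indifference assumptions diverge, here for the probability that the average speed exceeds 45 miles per hour:

    import random

    n = 100_000
    # assumption 1: time uniform on [1, 2] minutes for the one-mile trip
    p1 = sum(60 / random.uniform(1, 2) > 45 for _ in range(n)) / n
    # assumption 2: average speed uniform on [30, 60] miles per hour
    p2 = sum(random.uniform(30, 60) > 45 for _ in range(n)) / n
    print(round(p1, 2), round(p2, 2))    # ~0.33 versus ~0.50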
Intransitivity Paradoxes
TABLE 6.8
The Probability That Each Specified Die, Marked as Indicated in the Text,
Will Win When Rolled in the Indicated Three-Dice Combination
Probability of Winning
Combination A B C D
ABC 4/9 2/9 3/9 —
ABD 2/6 1/6 — 3/6
ACD 2/9 — 4/9 3/9
BCD — 1/3 1/3 1/3
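The markings of dice A through D are described in text not reproduced here, so the following sketch (Python) uses a different, well-known nontransitive trio to exhibit the same phenomenon: each die beats the next around the cycle with probability 5/9.

    from itertools import product

    A = (2, 2, 4, 4, 9, 9)
    B = (1, 1, 6, 6, 8, 8)
    C = (3, 3, 5, 5, 7, 7)

    def p_beats(x, y):          # probability die x rolls higher than die y
        return sum(a > b for a, b in product(x, y)) / 36

    print(p_beats(A, B), p_beats(B, C), p_beats(C, A))   # each 5/9, about .556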
Simpson's Paradox
A paradox that has received some attention both from statisticians and from
psychologists is sometimes referred to as the reversal paradox and sometimes
as Simpson's paradox (Blyth 1972a, 1972b; M. R. Cohen & Nagel, 1934;
Lindley & Novick, 1981; Simpson, 1951). In general terms the paradox con-
sists in the fact that two variables that are related in a certain way (positively or
negatively) at one level of analysis may have the opposite relationship when
analyzed at a different level.
The phenomenon is readily seen in analyses involving contingency tables.
Table 6.9 is an example from Hintzman (1980) that illustrates how a table that
is the sum of two other tables may show a relationship between the variables in-
volved that is opposite the relationship shown in each of the component tables.
The thing to note is that in the composite table (a + b), P(X | Y) < P(X),
whereas in both of the component tables (a and b), P(X | Y) > P(X).
Messick and van de Geer (1981) have shown that, given a conditional proba-
bility relationship of the form

    P(A \mid B) > P(A \mid \bar{B}),

it is always possible to partition the data with respect to some third variable, C,
such that

    P(A \mid B, C_i) < P(A \mid \bar{B}, C_i)

for all i, which is to say that reversibility is always possible: "The consequence
of this result is to generalize the reversal paradox and to show that any ordinal
relationship between two variables can be 'reversed' by the introduction of a
third variable such that within each level of this third variable, the relationship
TABLE 6.9
The Composite Table Shows a Relationship Between X and Y That Is Opposite
From the Relationship Shown in Each of the Component Tables
Note. From Hintzman (1980), Table 1. Copyright 1980 by the American Psychological Association.
Adapted by permission of the author.
between the two original variables is precisely the opposite of the original rela-
tionship" (p. 588). They note too that the reversal process can be repeated (re-
storing the original relationship) with a further partitioning, and so on
indefinitely. The implication, they suggest, is that what the relationship be-
tween two variables is taken to mean should depend on the set of variables ex-
amined in its determination.
Messick and van de Geer (1981) make essentially the same observation
with respect to two-by-two contingency tables. Any such table can be decom-
posed into two additive tables in such a way that the direction of the relation-
ship in the original table is reversed in both subtables, provided only that the
original table contains no zero entries and has at least two observations in
each of the off-diagonal cells.
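Since Hintzman's Table 6.9 is not reproduced here, the following sketch (Python, with illustrative numbers of my own) shows the reversal in miniature: option A has the higher success rate within each subgroup, yet the lower rate in the aggregate.

    groups = [
        # (successes_A, trials_A, successes_B, trials_B), hypothetical data
        (9, 10, 80, 100),    # subgroup 1: A 90% vs. B 80%
        (30, 100, 2, 10),    # subgroup 2: A 30% vs. B 20%
    ]
    for s1, t1, s2, t2 in groups:
        print(s1 / t1, ">", s2 / t2)                     # A wins in each subgroup
    sa = sum(g[0] for g in groups); ta = sum(g[1] for g in groups)
    sb = sum(g[2] for g in groups); tb = sum(g[3] for g in groups)
    print(round(sa / ta, 3), "<", round(sb / tb, 3))     # 0.355 < 0.745 overall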
That Simpson's paradox is of more than theoretical interest is illustrated by
a study involving the question of gender bias in admissions to graduate pro-
grams at the University of California at Berkeley (Bickel, Hammel, &
O'Connell, 1975). In 1973, of all the students who applied to the various grad-
uate programs at Berkeley, about 44% of the men and about 35% of the women
were admitted. Given a total of 12,763 applicants (8,442 men and 4,321
women), the shortfall of women is highly statistically significant and would
seem to represent strong evidence of discrimination against women in the uni-
versity's admissions policy.
However, when the admissions data from each of the university's graduate
departments and interdepartmental graduate programs were examined in iso-
lation, there was evidence of a bias against women in only four of them, and
there was evidence of a bias against men in six. Given the results of this depart-
ment-by-department analysis, it is hard to see how the university as a whole
could be guilty of a bias against women in its admissions policy.
The explanation of the apparent discrepancy in the Berkeley data was that
departments to which women applied in the greatest numbers tended to accept
a smaller percentage of applicants than departments to which women were less
likely to apply. Bickel et al. (1975) suggested that an appropriate way to check
for bias on a university-wide basis would be to compare the overall number of
women accepted to the number expected on the assumption of no bias where
the number expected on the assumption of no bias is the sum of the number ex-
pected for each department and this number, for a given department, is the
product of three terms: the percentage of people applying to that department
who are women, the percentage of the people applying to that department who
are accepted, and the number of people applying to the department. In other
words, the expected number of women accepted by a department is taken to be
the number of people applying to the department weighted by both the percent-
age of that number who are women and the percentage who are accepted.
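In code, the proposed check is a single weighted sum; here is a sketch (Python) with purely hypothetical department figures, since the actual Berkeley numbers are not given here:

    departments = [
        # (applicants, fraction who are women, fraction accepted), hypothetical
        (900, 0.10, 0.60),
        (600, 0.45, 0.30),
        (500, 0.70, 0.15),
    ]
    expected_women = sum(n * fw * fa for n, fw, fa in departments)
    print(round(expected_women))   # compare with the number actually admitted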
How it is that evidence of a bias may appear with university-wide data when
such evidence does not exist at the departmental level may be more difficult to
grasp than the possibility of real bias at the departmental level being obscured
in aggregate university-wide data. Intuitively it is easy to see how, if each of
several departments had a bias against women and a roughly equal number had
a comparable bias against men, the combined data could show no bias at the
university-wide level.
The Berkeley data illustrate the possibility of coming to different conclusions
regarding some aspect of behavior—in this case the behavior of an institution
with respect to admissions—depending on whether one looks at the data set in
the aggregate or does a finer-grained analysis. But which of the two possibilities
is the correct one? E. Martin (1981) argues that "both are correct, that neither has
an a priori privileged status: they are but two aspects or views of the same thing,
just as latency and accuracy can be two measures made on a single process or
system. Thus neither a federal agency, say, nor a university is 'lying with statis-
tics' when it cites one or the other result in defense of some position" (p. 372).
I agree that an argument can be made for both types of analysis, but find it
hard to see them as equally informative. The department-by-department analy-
sis is more informative, in my view, than is the analysis based on univer-
sity-wide data. This is seen in the fact that the results of the analysis of the
university-wide data are inferable from the results of the department-by-de-
partment analysis, but the converse is not true. I would argue further that an in-
dividual who was aware of the results of both types of analysis and cited only
the results of the university-wide analysis in order to support the charge of bias
on the part of the university would be guilty, if not of lying with statistics, at
least of being less than forthright in the use of informative data.
Hintzman (1980) argues that the possibility of Simpson's paradox makes
the results of many experiments on human memory that have relied primarily
on contingency-table analyses suspect, because relationships that have been
revealed at one level of analysis might have been reversed had analyses been
done at other levels. In response to this criticism of the way memory data have
often been analyzed, E. Martin (1981) claims that there is little, if any, empiri-
cal evidence that the results of memory experiments that have been analyzed in
two-by-two contingency tables are Simpson paradoxical. He does not say that
they could not be Simpson paradoxical, but, as I understand his position, only
that there is no empirical evidence that they, in fact, are.
Messick and van de Geer (1981) note that the reversal paradox can be stated
either in terms of conditional probabilities or in terms of correlations between
variables. In the latter case, the paradox consists in the possibility of having a
positive (negative) correlation between two variables while having a negative
(positive) correlation between the same variables within each cell of a partition-
ing based on another variable. Messick and van de Geer point out further that,
stated in this way, the paradox is related, though not identical, to the problem of
illusory correlation (see chap. 9): "Such an illusory correlation can be sus-
pected to be present when two variables are found to be correlated statistically
but when the partial correlation between the variables with respect to a third
variable is of less extreme value, perhaps zero, or perhaps even reversed in
sign" (p. 584). They add the caution, however, that when the complete and par-
tial correlations have opposite signs, which, if either, correlation should be
considered illusory may depend on the specifics of the situation.
D. S. Wilson (1975) has shown how the reversal paradox could conceivably
have played a part in the development of altruism even if, as is generally as-
sumed, nonaltruistic individuals have a survival advantage over altruistic
members of the same group. If altruists and nonaltruists are unevenly distrib-
uted over "trait groups"—relatively many altruists in some groups and rela-
tively few in others—it could happen that altruists will have a greater survival
rate than nonaltruists in the population as a whole, even though the reverse re-
lationship holds for every trait group.
Simpson's paradox has been implicated by various investigators in other in-
teresting phenomena, such as paradoxes of preference (see Shafir, 1993), and
what has been referred to as the "hot-hand" illusion (see chap. 2). Other discus-
sions of Simpson's paradox may be found in Good (1972), Lindley (1972),
Winkler (1972), Messick and van de Geer (1981), S. H. Shapiro (1982), Wag-
ner (1982), and Sprent (1988). Falk and Bar-Hillel (1980) make the interesting
observation that a judicious redistribution of the U.S. population could in-
crease the average IQ in all 50 states; one can imagine other contexts in which
Simpson's paradox could be exploited to good cosmetic effect.
Paradoxes of Expectation
Paradoxes of Confirmation
A widely held view, and one that seems to be consistent with common sense, is
that a case of a hypothesis is confirmatory of that hypothesis. According to this
view, the hypothesis that all crows are black is supported by the observation of
a black crow. This idea has been challenged. Good (1967/1983b) gives an ex-
ample of a situation in which a case of a hypothesis can be disconfirmatory
with respect to that hypothesis:
Suppose that we know that we are in one or the other of two worlds, and the hy-
pothesis, H, under consideration is that all the crows in our world are black. We
know in advance that in one world there are a hundred black crows, no crows that
are not black, and a million other birds; and that in the other world there are a
thousand black crows, one white one, and a million other birds. A bird is selected
equiprobably at random from all the birds in our world. It turns out to be a black
crow. This is strong evidence (a Bayes-Jeffreys-Turing factor of about 10) that
we are in the second world, wherein not all crows are black. Thus the observa-
tion of a black crow, in the circumstances described, undermines the hypothesis
that all crows in our world are black, (p. 119)
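The factor of about 10 is simple arithmetic on the numbers in the quotation (a Python check):

    p_world1 = 100 / 1_000_100     # P(drawing a black crow | world 1)
    p_world2 = 1000 / 1_001_001    # P(drawing a black crow | world 2)
    print(p_world2 / p_world1)     # ~9.99, about 10 to 1 in favor of world 2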
A case of a hypothesis can also decrease one's confidence that the hypothesis is true. The following example he
attributes to Paul Berent. Every observation of a man that is less than 100 feet
tall is confirmatory with respect to the hypothesis that all men are less than
100 feet tall. But suppose a man is found who is 99 feet tall. This too is a "case
of the hypothesis," but it is likely to (ought to) shake one's confidence in the
truth of the hypothesis.
Gardner (1976) notes that not only can cases that are consistent with a hy-
pothesis legitimately shake one's confidence in that hypothesis, it is possible
for them to show a hypothesis definitely to be false. Imagine a deck of 10 regu-
lar playing cards containing all the values from ace to 10. Suppose the deck is
shuffled and the 10 cards are placed facedown in a row. Let the hypothesis be
that no card with the value n is in the nth position of the row. Now suppose that
as the cards are turned over, one at a time from left to right, each of the first 9 is
confirmatory of the hypothesis (does not show the value that corresponds to its
position in the row) and none of them is the 10; one knows before turning over
the 10th card that the hypothesis is false, which is to say that the aggregate ef-
fect of the 9 confirmatory cases is disconfirmation.
These and other paradoxes of confirmation establish that a case of a hypothe-
sis is not necessarily confirmatory of that hypothesis. It is probably safe to as-
sume, however, that these are the exceptions that prove the rule and that in a large
majority of the real-life situations of practical interest, a case of a hypothesis can
be taken as confirmatory of that hypothesis—generally speaking, observation of
a black crow lends credence to the hypothesis that all crows are black.
Other Paradoxes
Many other paradoxes have been described. Some involve choices that are in-
consistent with axioms of decision theory based on the idea that rationality dic-
tates the maximization of subjective expected utility. These include paradoxes
described by Allais and Ellsberg (Dawes, 1988). Whether choices that violate
such axioms are considered irrational depends, of course, on how binding one
considers a particular theory—more specifically, the axioms on which it is
based—to be. One must decide, in specific cases, which one finds more intu-
itively compelling, the axioms or the choices that violate them.
Summary
DILEMMAS
Prisoner's Dilemma
Two members of a criminal gang are arrested and imprisoned. Each prisoner is
in solitary confinement with no means of speaking to or exchanging messages
with the other. The police admit they don't have enough evidence to convict the
pair on the principal charge. They plan to sentence both to a year in prison on a
lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain.
If he testifies against his partner, he will go free while the partner will get three
years in prison on the main charge. Oh, yes there is a catch ... If both prisoners
testify against each other, both will be sentenced to two years in jail.
The prisoners are given a little time to think this over, but in no case may either
learn what the other has decided until he has irrevocably made his decision. Each
is informed that the other prisoner is being offered the very same deal. Each pris-
oner is concerned only with his own welfare—with minimizing his own prison
sentence. (Poundstone, 1992, p. 118)
What makes this a dilemma is the fact that (a) each prisoner can reason that he
should testify against his partner on the grounds that his own sentence will be
smaller by 1 year if he testifies than if he does not, whether or not his partner testi-
fies—which is to say the strategy of testifying dominates that of not testify-
ing—and if one testifies and the other does not, the one who testifies will do very
well (whereas the one who refuses to testify will do very poorly) indeed, but (b) if
both prisoners testify they both will do worse than they would have if both had re-
fused to do so. The situation may be represented by a two-by-two table showing
the outcomes for both prisoners of the four possible combinations of choices of
"not testifying" and "testifying" by each. The entries in each cell of the following
matrix show the "payoff" to prisoner A and prisoner B, in that order, of these com-
binations; payoffs in this case are expressed as negative numbers, inasmuch as
they represent years in a jail sentence, which each prisoner would like to minimize.
As the terms of the bargain quoted above imply, the matrix is:

                     B does not testify    B testifies
A does not testify        -1, -1             -3, 0
A testifies                0, -3             -2, -2
There are many variations of the prisoner's dilemma, but they all have the
same basic form. In general terms, we may think of the two options available to
each "player" as cooperation and defection. In all cases, the strongly preferred
outcome for a given player, say A, is the one that obtains when that player, A,
defects and the other, B, cooperates; however, both players prefer the outcome
that occurs when both cooperate over that that occurs when both defect. The
following matrix (from Poundstone, 1992, p. 120), in which the cell entries
represent positive payoffs, say in dollars, illustrates the general situation.
B cooperates B defects
A cooperates 2,2 0,3
A defects 3,0 1,1
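A mechanical check (Python sketch) of the dominance argument on this matrix: whatever B does, A's defect payoff beats A's cooperate payoff, yet mutual defection pays each player less than mutual cooperation.

    payoff_A = {("C", "C"): 2, ("C", "D"): 0,   # A's payoffs from the matrix above
                ("D", "C"): 3, ("D", "D"): 1}
    for b in ("C", "D"):
        assert payoff_A[("D", b)] > payoff_A[("C", b)]        # defection dominates
    print(payoff_A[("D", "D")], "<", payoff_A[("C", "C")])    # 1 < 2: the dilemma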
One may think of the four possible outcomes in the following terms: "There is
a reward payoff (the $2.00 above) for mutual cooperation, which both desire
more than the punishment payoff ($1.00 above) both receive for not cooperating.
But both covet the temptation payoff ($3.00 above), the highly desirable out-
come of a single defection, even more than the reward. Both fear being the one
who doesn't defect and getting stuck with the sucker payoff (0 above)" (p. 120).
One explanation of why a player of this game might choose to cooperate de-
spite the fact that an analysis indicates that he should defect independently of
what he believes his opponent will do notes the possibility of confusing the sit-
uation represented by the preceding payoff matrix with that represented by the
following one:
B cooperates B defects
A cooperates 2,2 0,0
A defects 0,0 1,1
In this case (which is not the prisoner's dilemma), A would reason that inas-
much as both he and his opponent will realize the greatest payoff if both cooper-
ate and that his opponent is likely to notice this also and choose to cooperate, he
should choose to cooperate. The suggestion is that A might erroneously apply
this reasoning to the preceding matrix, which does represent the prisoner's di-
lemma. It is true in the prisoner's dilemma case that the two players will realize
the greatest aggregate payoff if both cooperate, and that both are likely to realize
that, but if A makes the choice on the basis of thinking only this far about the situ-
ation, he runs the risk of B defecting and leaving him holding the (empty) bag. So
from a normative point of view, this would be a mistake of reasoning on A's part.
What is interesting is that both A and B will be the better for it if they both make
the same mistake. McCain (2000) argues that such mistakes could easily be in-
terpreted as evidence of altruism or of higher rationality. (Of course, the fact that
the choice of cooperation could be due, in some instances, to an incomplete or er-
roneous analysis of the situation does not rule out the possibility it could be due,
in other instances, to altruism or higher rationality.)
Investigators have studied both the case in which the same participants play
the same prisoner's dilemma game repeatedly an unspecified number of times,
and those in which they face the situation only once. This distinction is an im-
portant one, because in the latter case, the normative prescription is always to
defect, but in the former there is no guaranteed best choice and the effective-
ness of any particular strategy will depend on the strategy of the other player
(Nowak, May, & Sigmund, 1995). In the game of indefinite duration, one's be-
havior in any given instance may be determined in part by the intention of in-
fluencing the behavior of one's opponent in the future, or by that of punishing
uncooperative behavior in the past. The most reliable finding from studies in-
volving repeated play without a prespecified stopping point is that consistent
cooperative (competitive) play on the part of one participant tends to evoke the
same type of play on the part of the other (Axelrod, 1984).
This is a finding of some practical significance, from a social psychology
point of view, because it may provide some insight into the evolution of coop-
erative ("reciprocally altruistic") or competitive behavior (Boyd, 1988;
Trivers, 1971). A strategy that has proved to be remarkably effective in re-
peated-play prisoner's dilemma situations is the "tit-for-tat" strategy in which
the tit-for-tat player always selects the option (cooperate or defect) that was
chosen by the opponent on the preceding trial (Axelrod, 1980a, 1980b; Nowak
& Sigmund, 1992; An. Rapoport & Chammah, 1965).
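A minimal repeated-play sketch (Python; the strategies are simplified for illustration) using Poundstone's payoffs, pitting tit-for-tat against itself and against an unconditional defector:

    PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
              ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

    def play(strat1, strat2, rounds=100):
        s1 = s2 = 0
        last1 = last2 = "C"              # tit-for-tat opens cooperatively
        for _ in range(rounds):
            m1, m2 = strat1(last2), strat2(last1)
            p1, p2 = PAYOFF[(m1, m2)]
            s1, s2 = s1 + p1, s2 + p2
            last1, last2 = m1, m2
        return s1, s2

    tit_for_tat = lambda opponent_last: opponent_last    # copy their last move
    always_defect = lambda opponent_last: "D"
    print(play(tit_for_tat, tit_for_tat))      # (200, 200): sustained cooperation
    print(play(tit_for_tat, always_defect))    # (99, 102): exploited only once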
The original and most frequently studied form of the prisoner's dilemma in-
volves only two players, but generalizations of it include communities in
which each player opposes several others, for example, nearest neighbors
(Lloyd, 1995). Axelrod (1984) has shown how mutual cooperation can emerge
from a cluster of individuals committed to reciprocity in prisoner's dilemma
situations even within a larger community of egoists (defectors), and how
"nice" strategies (strategies based on the principle of not being the first to de-
fect) can be stable in such communities. Hirshleifer and Rasmusen (1989) have
demonstrated how the possibility of ostracizing noncooperators can allow co-
operators to maintain cooperation in a repeated-play game. Cole (1997) takes
the relative effectiveness of nice strategies as an argument against equating
survival fitness with strength: "Just because the 'fittest' tend to survive ...
doesn't necessarily mean the 'fittest' are the strongest, or meanest, or even the
most reproductively profligate; the fittest may be those who learn best how to
use cooperation to their own ends" (p. 121).
Simulations of extended games with various strategies—tit-for-tat, generous
tit-for-tat (in which defection is occasionally responded to with cooperation),
win-stay lose-shift (in which a player sticks with the preceding choice if it
yielded a high payoff and switches if it yielded a low one)—have revealed the
possibility of the spontaneous emergence of cooperative behavior and its persis-
tence over time under certain conditions; transitions between predominantly co-
operative behavior and large-scale defection can be relatively abrupt, however,
and what causes them is not completely understood (Nowak et al., 1995).
Amnon Rapoport (1967) argues that when one can assume that one's own
behavior in a prisoner's dilemma game can influence the behavior of one's
opponent on subsequent trials, one may no longer have a true dilemma. This
is because the set of possible outcomes is no longer completely represented
by the original payoff matrix; a complete representation would have to take
account of the assumed influence on the opponent's play and the effects of
this on future payoffs. The dilemmatic character of the situation is most
clearly represented by the case in which the participants play the game only
once, so the possibility of influencing future behavior is not a complicating
factor—or, in the case of repeated play, the dilemma is most stark if we con-
sider only the final time the participants must make a decision and they know
it to be the final time.
Poundstone (1992) contends that the absolute amounts of the payoffs in a
prisoner's dilemma situation are not critical, that it is necessary only that the
payoffs be ranked in a certain way and that the reward be greater than the av-
erage of the temptation and sucker payoffs. The latter condition is to rule out
the possibility that, in a series of prisoner dilemma games, players could win
more by taking turns defecting unilaterally than by adopting the cooperative
strategy in each game. Nozick (1993) challenges the idea that the relative
sizes of the payoffs should make no difference to a rational participant in a
prisoner's dilemma game. Contrast the following two situations (the num-
bers are from Nozick).
                B cooperates    B defects
A cooperates     1000, 1000      0, 1001
A defects        1001, 0         1, 1
B cooperates B defects
A cooperates 3,3 -200,500
A defects 500, -200 2,2
To say all this about symbolic utility is to say that our responses to the Pris-
oner's Dilemma are governed, in part, by our view of the kind of person we
wish to be and the kinds of ways we wish to relate to others. What we do in a
particular Prisoner's Dilemma situation will involve all this and invoke it to
different degrees depending upon the precise (ratios of differences among)
utility entries in the matrix and also upon the particular factual circumstances
that give rise to that matrix, circumstances in which an action may come to
have its own symbolic meanings, not simply because of the structure of the ma-
trix, (p. 57)
The point is that the prisoner's dilemma is an interesting and revealing rep-
resentation of a decision situation in the abstract, but that its application to
real-life problems is likely to be complicated by considerations that are diffi-
cult to capture in the representation.
In the conventional prisoner's dilemma game, neither player is permitted to
know the other player's choice before making his or her own. If one knew the
other player's choice before making one's own, there would be no dilemma for
the player in the know, and the game would not have generated the interest that
it has. However, some research on the effects of knowing the opponent's choice
before making one's own has revealed behavior similar in some respects to that
of the student who makes one choice when either of two conditions is known to
pertain but makes another choice when it is known only that one or the other of
the two possibilities pertains (Shafir, 1993). Shafir and Tversky (1992) found,
for example, that a quarter of the participants in their study defected when they
knew their opponent's choice, independently of what that choice was, but
made the cooperative response when they did not know their opponent's
choice. It is as though, they suggest, these people see the situation from the per-
spective of individual rationality when the opponent's choice is known and
from that of collective rationality when it is not.
A Related Problem
              Player B
Player A      Strategy 1    Strategy 2
Strategy 1      0, 0          5, 0
Strategy 2      0, 5          6, 6
By choosing Strategy 1, one guarantees that one's opponent gets nothing, and leaves open
the possibility of getting 5 oneself. Five is not as good as 6, but the combination
of 5 for oneself and 0 for one's opponent may be considered to be better than 6
for both. This seems a particularly ungenerous—not to say self-defeating—
perspective, but not an implausible one all the same. The idea that some people
are willing to pay a price in order to ensure that others do not get ahead of them
is hardly unthinkable, unattractive though it may be.
Dawes (1988) points out that the selection of Strategy 1 can be made consis-
tent with subjective expected utility theory if it is assumed that a player assigns
a negative value to the other's welfare. Thus, for example, if each of the num-
bers in the preceding table were replaced by the difference between it and the
other number in the same cell, as shown in the following matrix, then Strategy
1 becomes the dominating strategy for both players.
PlayerB
Player A Strategy 1 Strategy 2
Strategy 1 0,0 +5, -5
Strategy 2 -5,+5 0,0
Another situation that bears some resemblance to the prisoner's dilemma and
the situation described in the preceding section is one that is sometimes re-
ferred to as the ultimatum game:
Imagine that somebody offers you $100. All you have to do is agree with some
other anonymous person on how to share the sum. The rules are strict. The two of
you are in separate rooms and cannot exchange information. A coin toss decides
which of you will propose how to share the money. Suppose that you are the pro-
poser. You can make a single offer of how to split the sum, and the other per-
son—the responder—can say yes or no. The responder also knows the rules and
the total amount of money at stake. If her answer is yes, the deal goes ahead. If
her answer is no, neither of you gets anything. In both cases, the game is over and
will not be repeated. What will you do? (Sigmund, Fehr, & Nowak, 2002, p. 83)
According to some conceptions of rationality, the rational thing for the pro-
poser to do is to offer the smallest possible amount of money, $1, and the rational thing for the responder to do is to accept that offer. In fact, this is not what people do when put in this situation. The modal response typically is 50% of the total; about two thirds of proposers offer between 40% and 50%; only a small percentage (usually less than 5%) offer less than 20%, and such offers are often rejected (Camerer & Thaler, 1995; A. E. Roth, 1995; Sigmund et al., 2002).
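One way to see the gap between the game-theoretic prescription and actual behavior is to compute the proposer's expected payoff under an assumed responder model. The rejection-probability function in the following sketch is purely hypothetical, chosen only to reflect the finding that low offers are often rejected; under this model the expected-payoff-maximizing offer lies well above the minimum.

    # Expected payoff to the proposer of a $100 ultimatum game, under a
    # hypothetical responder model in which the probability of rejection
    # falls linearly from .9 for a near-zero offer to 0 for a 50-50 split.
    def p_reject(offer, total=100):
        share = offer / total
        return max(0.0, 0.9 * (0.5 - share) / 0.5)

    def expected_payoff(offer, total=100):
        return (1 - p_reject(offer, total)) * (total - offer)

    for offer in (1, 10, 20, 30, 40, 50):
        print(offer, round(expected_payoff(offer), 1))
    # Under this model the expected payoff is maximized by an offer of
    # roughly 40-50, consistent with the proposer behavior described above.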
Although there is evidence of cultural effects on precisely how people play
this game (Henrich, 2000), in none of 15 small-scale societies investigated in one
study did people respond in the way that the assumption that behavior is deter-
mined exclusively by self-interest would predict (Henrich et al., 2001). Individ-
ual studies in such places as Ljubljana (Slovenia), Yogyakarta (Indonesia),
Tokyo, and several U.S. cities have corroborated the finding that mean offers by
proposers tend to be between 40% and 50% and low offers are frequently re-
jected (Henrich, 2000). An exception to this rule was found by Henrich (2000)
among the Machiguenga in the Peruvian Amazon, where offers averaged 20%
(the modal offer was 15%) and were almost always accepted. Another, but less extreme, exception was reported by A. E. Roth, Prasnikar, Okuno-Fujiwara, and Zamir (1991), who found that Israeli proposers offered only 36% (the modal offer was 50%) and that responders were likely (.71) to reject offers of less than 20%. Henrich in-
terpreted his results as a challenge to the assumption "that all humans share the
same economic decision-making processes, the same sense of fairness, and/or
the same taste for punishment" (p. 978).
Sigmund et al. (2002) interpret the typical finding as evidence that "real
people are a crossbreed of H. economicus and H. emoticus, a complicated hy-
brid species that can be ruled as much by emotion as by cold logic and selfish-
ness," and see it as a challenge "to understand how Darwinian evolution
would produce creatures instilled with emotions and behaviors that do not
immediately seem geared toward reaping the greatest benefit for individuals
or their genes" (p. 84). They see it also as evidence that people generally take
account of a coplayer's view of the situation and that they place a high value
on fair outcomes.
Other accounts of what appears to be magnanimous behavior in the ultima-
tum game have questioned the players' understanding of the situation (e.g., the
fact that the game is to be played only once, when that is the case), or proposed
that the utility that some players are trying to maximize differs from the game's
payoff (Nowak, Page, & Sigmund, 2000). With respect to the latter proposal,
we should not overlook the possibility—high probability in my view—that
some people's utility functions include a genuine interest in the well-being of
others that cannot be reduced to pure self-interest.
When the same players play the game repeatedly for an unspecified number
of times, and any given player is sometimes the proposer and sometimes the re-
sponder, punishment of proposers who make low offers can modify their be-
havior in the direction of higher offers (Ostrom, Walker, & Gardner, 1992).
Even if the situation is structured so that low proposers cannot be identified in-
dividually (and thereby acquire reputations), low offers may elicit punishment
reactions and these can be effective in increasing subsequent offers (Fehr &
Gächter, 2000). Generally, in the absence of the ability to punish free-riding in
public-goods experiments of the prisoner's dilemma or ultimatum game type,
cooperation deteriorates badly over time (Fehr & K. M. Schmidt, 1999); how-
ever, the credible threat of punishment for free-riding can eliminate or severely
reduce the practice.
Nowak et al. (2000) have shown by computer simulation that fairness (the of-
fer by proposers of much more than the minimum) can evolve within a commu-
nity of players, even when no two players interact more than once, if proposers
have a means of finding out what offers have been accepted or rejected by re-
sponders in the past, but that when they do not have this means, evolution pro-
motes low offers and low demands. As the authors put it, "When reputation is
included in the Ultimatum Game, adaptation favors fairness over reason" (p. 74).
In the aggregate the results of experimentation with the ultimatum game
(and other games that involve decisions regarding cooperation or sharing) sup-
port the idea expressed as a conjecture by Fehr and Gächter (2000) that "in addition to purely selfish subjects, there is a nonnegligible number of subjects who are (i) conditionally cooperative and (ii) willing to engage in costly punishment of free-riders" (p. 984). As evidence of the first point, Fehr and Gächter cite work by Fehr, Kirchsteiger, and Riedl (1993) and Berg, Dickhaut, and McCabe (1995), and as evidence of the second point work of A. E. Roth (1995) and Fehr, Gächter, and Kirchsteiger (1997).
In applying the results of experiments of the type reviewed here to real-life
decisions, we should note too the possibility that some people's behavior may
be motivated, at least some of the time, by true altruism, by which I mean a gen-
uine interest in the welfare of others, independently of how their welfare re-
lates to one's own. There are ways, of course, to see altruism as basically
selfish—people can, for example, get considerable gratification from public
recognition of acts of altruism. But many people also do altruistic things anon-
ymously, and though it is possible to see such acts as selfish also—one gets the
satisfaction of self-esteem—this seems to me tantamount to ruling out unself-
ish acts by definition.
Social Dilemmas
The prisoner's dilemma is seen as prototypical of all situations that are charac-
terized by the "temptation to better one's own interests in a way that would be
ruinous if everyone did it" (Poundstone, 1992, p. 126). Sociologists and econo-
mists have identified many situations in which when members of a group act in
their individual self-interest they appear to be acting against the best interests
of the group. Hollis, Sugden, and Weale (excerpted in A. Fisher, 1988) charac-
terize such situations in which enlightened self-interest is self-defeating as
those in which one's order of preferences is as follows:
1. You do X, others do Y.
2. You do Y, others do Y.
3. You do X, others do X.
4. You do Y, others do X. (p. 163)
A dilemma of a different sort is described by Slote (1986/1990): imagine that the closer one stands to a certain fountain, the more happy days of life one gains, but that for any position one might take there is always a closer one. We should assume also that the individual who has to decide how close to
the fountain to stand wants to maximize the number of happy days of life. What
makes this a dilemma is the fact that no matter how close to the fountain one
chooses to stand one could have chosen to stand closer: "If it is irrational to do
one thing, when one has more reason to perform some alternative, when it
would have been better for one to have performed some definitely available al-
ternative, then one has, inevitably, acted irrationally in the circumstances just
mentioned. We have described a rational dilemma" (p. 470).
Slote (1986/1990) points out that the dilemma he describes is a dilemma
only from the perspective that rationality dictates that one attempt to maximize
the expected utility of one's decisions, or that one select the best of all possible
alternatives when faced with a choice. It is not necessarily a dilemma from a
satisficing point of view. For a satisficer it would be rational to take the attitude
that it suffices to get close enough to the fountain to add many happy days to one's
life and that it is not necessary to try to add the maximum possible number. One
might even take the position that it is irrational to try to maximize when maxi-
mization is impossible, not only practically but theoretically as well.
Here is a variation on the situation described by Slote. Suppose you are
competing with several other people for some very highly desirable prize that
is to be given to the person who names the largest number. Each of you has only
one chance to name a number, and the winner takes all; there are no consolation
prizes. It seems reasonable to assume that the larger the number you name, the
greater is the likelihood that you will win the prize. But no matter what number
you name, one can always ask why you did not name a larger one.
It is not so clear that the satisficer's perspective works as well in this case as
in the preceding one; one cannot argue that a given large number will yield a
"good-enough" prize, because if it is not the largest number among those
named, it will yield no prize at all. The argument that it is not rational to try to
maximize when maximization is not possible also does not apply, because one
need not maximize anything in this case, one need only pick a number that is
larger than that picked by any of the other contestants. Perhaps this is not a di-
lemma from the perspective of a model of rationality that assumes one should
maximize expected utility but that, like Good's (1983a) Type II rationality, dis-
counts utility by taking the labor and cost of calculations and thinking (or fret-
ting) into account.
Slote (1986/1990) considers this interpretation of his version of the di-
lemma: "The difficulty of standing at closer and closer distances thus ap-
proaches the limits of the agent's power and skill. In that case, an agent who
through great effort and concentration chose to stand at a distance so near
that it would have been very difficult (for him or for anyone else) to choose
to stand any nearer, may count as having chosen rationally, or at least not ir-
rationally" (p. 477). A similar argument may be applied to the task of nam-
ing a large number. Slote notes that this type of argument would not apply if
it could be assumed that standing nearer and nearer the fountain (or naming
a larger and larger number) need not get increasingly difficult, but this
seems implausible.
Problems like those represented by the prisoner's and other social dilemmas
continue to be debated by decision theorists and other social scientists and are
not likely to be resolved to everyone's satisfaction any time soon. As Moser
(1990) has pointed out, the debates involve important questions about the roles
of cooperation and causation in rational decision making. They also raise fun-
damental issues about human nature, not only regarding the question of what it
is but that of what we would like it to be.
Summary
Statistics
Statistics, more than most other areas of mathematics, is just formalized com-
mon sense, quantified straight thinking.
—Paulos (1992, p. 58).
Few, if any, characteristics of the world are more apparent than variability. We see it everywhere we look. There are, by some counts, tens of millions of spe-
cies of living creatures in the world. When our focus is narrowed to a single
species, say our own, variability is still the rule. Excluding identical twins, no
two people look exactly alike and even twins can usually be distinguished by
people who know them well.
One of the ways in which we cope with diversity is through the process of
conceptual categorization. We group things that are similar in certain respects
into conceptual classes or categories, give these categories names and then, for
many purposes, respond to members of the same category—items with the
same name—as though they were identical. Even within categories, however,
there is much variability. Though all chairs are chairs, they differ greatly in
size, shape, color, and numerous other respects. All snowflakes are six-sided,
but no two of them, we are told, have precisely the same crystalline structure.
Whereas people share certain characteristics that define their humanity, they
differ in height, weight, age, intelligence, hair color, eye color, and countless
other less obvious respects.
Statistics is the area of mathematics that helps us deal with variability in a
quantitative way. Compared to some areas of mathematics, such as geometry,
algebra, trigonometry and even analysis, statistics emerged as a distinct disci-
pline relatively recently. Since its development, however, its influence on
thinking in the social, biological, and physical sciences, as well as in the busi-
ness world and even in everyday life has been profound.
Much of the early impetus for collecting and publishing social numbers came from concern with the social problems associated with the rapid growth of cities at the time. It was in
large measure from this interest in counting for social or political purposes that
statistics as a mathematical discipline eventually emerged: "Statistical laws
that look like brute, irreducible facts were first found in human affairs, but they
could be noticed only after social phenomena had been enumerated, tabulated
and made public. That role was well served by the avalanche of printed num-
bers at the start of the nineteenth century" (Hacking, 1990, p. 3).
By way of justifying his use of the term avalanche, Hacking points out that
in Germany alone, there were 410 statistical periodicals being published by
1860, whereas as of 1800 there had been essentially none. The tabulations pro-
duced by the counters during this period provided practically unlimited oppor-
tunities for subsequent workers to search for, and theorize about, statistical
regularities in real-world events. The tabulators were to subsequent theorists
what Tycho Brahe was to Kepler.
The information and the "political arithmetic" ("social mathematics,"
"moral statistics") that developed around these numbers reflected a desire to
provide a scientific basis for public policy and were used for a variety of gov-
ernmental purposes, especially to justify political or social reforms. The active
members of the early statistical societies were more likely to be politically ac-
tive than to be seriously interested in mathematics or natural science (Porter,
1986), although eminent mathematicians were involved as well. Laplace
(1814/1951), for example, applied his calculus of probabilities to the "moral
sciences" and a variety of social issues; noting that average life expectancy in
France rose from about 28 at birth to about 43 for those who had already lived
beyond infancy, he argued the case for vaccination as one means of attacking
infant mortality. Gigerenzer et al. (1989) describe the development of statisti-
cal thinking in the 19th century as reflecting a mix of social and political views:
"Even in 1900, after the successful application of statistical reasoning to phys-
ics and biology, and the beginnings of a mathematical field of statistics, the
term still referred first of all to social numbers, and only by analogy to this
branch of applied mathematics" (p. 69).
Interest in tabulating and applying the results to the resolution of social
problems appears to have become somewhat weaker during the second half of
the 19th century than it had been during the first half. And little progress was
made in developing social statistics into a formal discipline during this time:
"Fifty years after the start of the era of statistical enthusiasm in the 1820s, the
same impressionistic and arbitrary eyeballing techniques were used to argue
for a positive or negative relationship between two variables. No permanent
cumulation in techniques of data collection and analysis had been made. Until
the twentieth century, this is typical of empirical social research everywhere"
(Oberschall, 1987, p. 113). "By the end of the [nineteenth] century,"
Oberschall says, "moral statistics meant voluminous compilations of statisti-
cal data without an attempt to make sense of its contents in other than a superfi-
cial descriptive fashion" (p. 115).
But statistics did become a mathematical discipline in time, and its effect on
thinking extends far beyond political arithmetic. The work of Quetelet and
other early tabulators or "statists" called attention to the possibility of studying
and describing aggregate phenomena, and even of developing laws in terms of
which such phenomena might be understood, despite the unpredictability or
"unlawfulness" of the individual cases of which the aggregates were com-
posed. The ideas and methods that were developed were found to be useful in
diverse domains in addition to government and public policy, including espe-
cially insurance and biometrics, and eventually the physical sciences as well.
Statistics is one area of mathematics in which applications often spurred theo-
retical developments. Stewart (1989) points out that by the end of the 19th cen-
tury statistics had provided an alternative to differential equations as a basis for
describing many natural phenomena, but that despite this fact there was virtu-
ally no contact, at a mathematical level, between the two disciplines.
During the first half of the 20th century, mathematical statistics developed
rapidly and affected science profoundly. Scientists had been conducting exper-
iments long before this time, but before the 20th century the results of experi-
mentation were seldom reported very fully; instead, scientists typically
presented their conclusions and published data that "demonstrated" their truth.
The work of a few statisticians—notably Ronald Fisher—produced an ap-
proach to experimental design and data analysis that changed the way experi-
mental science was done (Salsburg, 2001).
Fisher's work had a profound effect on psychological research, especially
on the design of experiments and the interpretation of results. Statistical hy-
pothesis testing became the standard modus operandi. But during the latter half
of the 20th century, statistics began to affect the thinking of psychologists in
another important way. As Gigerenzer et al. (1989) put it:
The view that the nature of human thought might be statistical calculation, or at
least, should be, arose at the same time as the view that sensory detection and
discrimination, perception, recognition from memory, and other cognitive pro-
cesses might involve statistical calculation.... All these statistical perspectives
emerged around 1960, shortly after statistics had been institutionalized as the in-
dispensable tool of the experimenter. (p. 216)
In physics, probabilistic and statistical ideas became important in two dif-
ferent ways; first, during the latter part of the 18th and the beginning of the 19th
century, in application to the treatment of errors of observation, and later, dur-
ing the latter part of the 19th century and early in the 20th, in theory construction (Krüger, 1987a). These ideas impacted theory construction first in
statistical mechanics and then in quantum mechanics. The application of sta-
tistics to biology was spurred by the rediscovery of Mendel's work on heredity
in 1900 (R. J. Berry, 1988). This application met with some resistance because
not only did Mendel's theory run counter to prevailing ideas about inheritance,
but the application of mathematics to biology was itself unacceptable to many
biologists of the day (Barber, 1961; Flammarion, 1890). In time, however, sta-
tistical analysis came to be seen as an indispensable tool for biologists, as well
as for science more generally. By the end of the 20th century, probability and
statistical ideas had permeated essentially all the sciences, hard and soft alike.
Perhaps nowhere is the usefulness of statistics in science more apparent than in
the intersection of astro- and particle physics, where the tenability of theories
of cosmology rests, in part, on the ability to estimate the frequency of very low
probability events (e.g., the splitting of a deuteron nucleus by a neutrino) by
applying statistical techniques to very large numbers of observations (MacDonald, Klein, & Wark, 2003).
The normal, or Gaussian, distribution is described by the function

$$f(x) = \frac{n}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

where n is the number of measurements and μ and σ² are their mean and variance respectively. It is descriptive, or approximately so, of a great many distributions found in nature, a fact that Fourier (1819), among others, found to be
remarkable. It was applied to probability theory by de Moivre as the limit of the
binomial distribution,

$$P(k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
The fact that it can be used to approximate the binomial distribution for large
n facilitates the estimation of probabilities that would be computationally te-
dious if not intractable with the binomial function and is a great convenience in
the practice of statistical hypothesis testing and decision making more generally.
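By way of illustration, the following sketch, which assumes only Python's standard library, compares an exact binomial probability with its normal approximation; the particular values of n, p, and k are arbitrary.

    import math

    n, p, k = 1000, 0.5, 520   # arbitrary example values

    # Exact binomial probability P(X = k).
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)

    # Normal approximation: evaluate the Gaussian density with mu = np and
    # variance np(1 - p) at x = k.
    mu, var = n * p, n * p * (1 - p)
    approx = math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(f"exact  = {exact:.6f}")
    print(f"approx = {approx:.6f}")  # the two agree closely for large n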
Porter (1986) points out that the history of this curve is practically coexten-
sive with the history of statistical mathematics during the 19th century. He also
notes that although the curve originally was thought of as a representation of
error—it was used effectively by astronomers, along with the method of least
squares, to deal with the problem of variability in the measurement of astro-
nomical phenomena—it was reinterpreted as a law of genuine variation. Porter
refers to this reinterpretation as "the central achievement of nineteenth-century
statistical thought" (p. 91).
The reinterpretation came about gradually by a process that in retrospect
looks like creative misunderstanding:
Once again the lead was taken by social thought, and in this matter its influence
on the natural sciences is demonstrable. The main line of development pro-
ceeded from Laplace to Quetelet to Maxwell and Galton, and from the error of
mean values in demography as well as astronomy to the deviations from an ide-
alized average man to the distribution of molecular velocities in a gas and the in-
heritance of biological variation in a family. Ultimately, even the analysis of
error was transformed by this line of statistical thinking. (Porter, 1986, p. 91)
Thus over time, beginning perhaps with Quetelet's discovery that the "error
curve" was descriptive not only of measurements made by astronomers but also
of such natural variables as human height and other features, there developed an
interest in variability as a natural phenomenon in its own right. Around 1840,
Quetelet's focus began to shift from the stability of averages and rates to the way
variables are distributed. He was particularly intrigued by how often the distribu-
tion appeared to be similar to that of errors of observation (Gaussian) and in his
1848 book, Du Système Social et des Lois Qui le Régissent (On the Social System and the Laws That Govern It), proposed the "law of accidental causes" to account for the regularity of distributions of all sorts (Lécuyer, 1987).
Interest in variability per se, and especially in the variability of human traits
and abilities, was apparent in the work of Galton (1869, 1874, 1889) and his
lifelong focus on the exceptional as opposed to the commonplace. His study of
genius and his framing of the nature-nurture debate reflect this interest, and in
his autobiography, he explicitly acknowledges it: "The primary objects of the
Gaussian Law of Error were exactly opposed, in one sense, to those to which I
applied them. They were to get rid of, or to provide a just allowance for errors.
But those errors or deviations were the very things I wanted to preserve and to
know about" (Galton, 1909, p. 305). Murray (1987) identifies Fechner's use of
the equation for the normal distribution to estimate the variability in measure-
ments in psychophysical experiments on just noticeable differences as the first
application of probability theory in psychology.
The history of the development of statistical thinking—extensive accounts
of which are readily available (e.g., P. L. Bernstein, 1996; David, 1962;
Gigerenzer et al., 1989; Hacking, 1975, 1990; Porter, 1986; Salsburg, 2001)—
presents a curious mix of discoveries of lawful regularities in the behavior of
large collections of unpredictable individual cases, frustrations arising from
the inability in certain situations to get precisely the same measurements on
different attempts to measure the same thing, the realization that certain distri-
butions are characteristic of a wide variety of natural phenomena, interest in
why things vary as they do, and the practical need for mathematical concepts
and tools with which to deal with variability in an effective way. The purpose of
statistical hypothesis testing, about which more later, can be seen as that of dis-
tinguishing between variability (in one or more dependent variables) that can
be attributed to controlled variability (in one or more independent variables)
and that which cannot. In this context, error generally connotes the variability
in the first kind of variable that cannot be attributed to the variability in the sec-
ond kind, and it is sometimes divided into two types: sampling error (error aris-
ing from studying a sample that is not representative of the population of
interest) and measurement error (error that results from imprecision of a mea-
suring instrument) (Abelson, 1995).
USES OF STATISTICS
Statistical Description
Two of the more fundamental ideas in statistics are those of a variable and a fre-
quency distribution. A variable is any measurable property that varies in a pop-
ulation. Height, for example, is a variable property of humans. Variables are
said to be distributed in certain ways. If we were to measure the height of every
person in the world and then construct a graph showing the number of people
whose height was between, say, 5 ft. and 5 ft. 1 in., those between 5 ft. 1 in. and
5 ft. 2 in., those between 5 ft. 2 in. and 5 ft. 3 in., and so on, when we had fin-
ished, extending the height measure in both directions sufficiently far to ac-
commodate the shortest and tallest people in the world, the resulting graph
would represent the distribution of the height of all people living in the world
today. This distribution would be more or less bell-shaped with its highest
point probably somewhere around 5½ feet.
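The construction just described amounts to sorting measurements into bins and counting the occupants of each bin; a minimal sketch, with invented height measurements:

    from collections import Counter

    # Hypothetical height measurements, in inches.
    heights = [63.2, 67.5, 68.1, 70.4, 66.0, 68.9, 65.3, 71.8, 69.0, 67.2]

    # Assign each measurement to a one-inch bin and count the bin occupancies;
    # the counts, plotted against the bins, form the frequency distribution.
    bins = Counter(int(h) for h in heights)
    for lower in sorted(bins):
        print(f"{lower}-{lower + 1} in.: {bins[lower]}")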
Not everything is distributed like height, of course. If we were to plot the dis-
tribution of family sizes, for example, showing the number of children in the
family, varying from 1 to, say, 20, on the horizontal axis and number of families
with that number of children on the vertical axis, we would find that the distri-
bution would be quite asymmetrical. There would be many more families with
1 or 2 children than with 19 or 20.
If we were to plot the distribution of many variables, we would find that dif-
ferent variables give rise to distributions with different shapes. However, we
would also discover that certain shapes occur frequently, because there are a
few distributions that are descriptive of the ways in which many variables are
distributed in the world. Among the better known of these, in addition to the
gaussian, are the rectangular, the exponential, and the logistic.
Producing a picture of a distribution is an excellent way to answer a question
regarding how any particular variable is distributed. However, although pic-
tures are very useful for some purposes, they are less so for others. Statisticians
have developed concepts that can be used to describe distributions when one
wants to talk about them. These descriptions make use of properties or charac-
teristics of distributions called "parameters." A parameter of a distribution is a
number that provides some summary information about the distribution. The
mean, median, and mode are sometimes referred to as central-tendency param-
eters, because each of them conveys information about the center, more or less,
of the distribution. Other useful parameters include the standard deviation and
interquartile range, which, because they convey information about the shape of
the distribution and, in particular, about the degree to which it is narrow and
peaked as opposed to broad and relatively flat, are sometimes referred to as pa-
rameters of dispersion.
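The parameters just named can be computed directly with Python's standard statistics module; the data in the following sketch are invented:

    import statistics

    data = [61, 64, 65, 66, 67, 67, 68, 68, 68, 70, 71, 74]  # invented sample

    print("mean  :", statistics.mean(data))     # central tendency
    print("median:", statistics.median(data))
    print("mode  :", statistics.mode(data))
    print("stdev :", statistics.stdev(data))    # dispersion
    # The interquartile range is the spread of the middle half of the data.
    q = statistics.quantiles(data, n=4)
    print("IQR   :", q[2] - q[0])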
Measures of central tendency and of dispersion are very useful in describing
distributions. Thus, in answer to the question of how tall American men are,
one might point to a graph of the distribution of heights or, alternatively one
might say that this distribution is approximately gaussian, or "normal," and has
a mean of 68 inches and a standard deviation of 3 inches. To someone who un-
derstands elementary statistics, such a description conveys nearly as much in-
formation as does the actual distribution graph. Such a person understands that
a gaussian distribution has the approximate shape of a bell, symmetrical about
its mean, and that about 95% of all the cases are within plus or minus two stan-
dard deviations of the mean. In the case of our example, the description tells us
that the average height of an American man is 68 inches and that about 95% of
all American men have a height somewhere between 62 inches and 74 inches.
(The numbers in this example were made up for purposes of this illustration.)
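The claim about two standard deviations can be checked from the Gaussian cumulative distribution, which is expressible in terms of the error function; a minimal sketch using the made-up figures of the example:

    import math

    mu, sigma = 68, 3   # the made-up mean and standard deviation from the text

    def normal_cdf(x, mu, sigma):
        # Gaussian cumulative distribution expressed via the error function.
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

    frac = normal_cdf(74, mu, sigma) - normal_cdf(62, mu, sigma)
    print(f"fraction between 62 and 74 inches: {frac:.4f}")  # about 0.9545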
A particularly useful fact relating to the normal distribution is represented
by what is known as the central limit theorem. According to this theorem, no
matter how a variable is distributed, if one draws a large number of random
samples of that variable and calculates the mean of each sample, one will find
that the distribution of the means will be normal. "Large number" in this con-
text is conventionally taken to be 30 or more, unless the underlying distribution
is highly asymmetric, in which case it may have to be a bit larger for the ap-
proximation to be accurate. This is a remarkable theorem and one that finds
many practical applications in statistics.
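A small simulation makes the content of the theorem vivid: draw many samples of size 30 from a distinctly non-normal (here, exponential) distribution and examine the distribution of the sample means. A minimal sketch:

    import random
    import statistics

    random.seed(1)

    # Draw 5,000 samples of size 30 from an exponential distribution (which
    # is strongly skewed) and record each sample's mean.
    means = [statistics.mean(random.expovariate(1.0) for _ in range(30))
             for _ in range(5000)]

    # The means cluster symmetrically around the population mean (1.0), with
    # standard deviation close to sigma / sqrt(n) = 1 / sqrt(30), about 0.18.
    print(round(statistics.mean(means), 3))
    print(round(statistics.stdev(means), 3))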
Distribution parameters facilitate discussion about variables, but it is im-
portant to recognize that they convey less information about variables than do
the distributions themselves, in either graphical or equation form. Two distri-
butions with the same mean can differ with respect to dispersion; and even if
they have equal standard deviations, the variability may be determined in dif-
ferent ways.
The usefulness of statistics for purposes of description does not depend on
relating statistics to any theory of probability or chance or on any assumptions
about why natural variables are distributed the way they are. But one would
like to know why so many variables are distributed in ways that can be de-
scribed by simple mathematical functions. What determines how a variable is
distributed? Why are certain distributions so frequently seen in nature? What
accounts for instances in which two variables that appear to have no relation-
ship to each other are distributed in the same way?
Statistical Estimation
The term "null hypothesis" has at least two connotations as applied to statisti-
cal testing. Usually it is intended to represent the hypothesis of "no differ-
ence" between two sets of data with respect to some parameter, such as their
means, or of "no effect" of an experimental manipulation on the dependent
variable of interest. Suppose one wished to know whether Europeans and
Asians differ with respect to how long they live. To approach this question
statistically one would compare representative samples of longevity data
from Europe and Asia. The specific question one would ask, by means of the
application of statistical procedures, would be whether the evidence justifies
rejection of the "null" hypothesis, typically represented by H0, that the two
samples of data were drawn from the same population. This may seem like a
strange way to describe the situation because it is obvious that the data are
from two different populations, one European and one Asian, but the ques-
tion that is being asked is if those populations differ with respect to the vari-
able of interest, namely longevity; the default—null—hypothesis is that they
do not differ in this regard. Statistical testing is used to see whether the data warrant rejection of that hypothesis or whether it is more reasonable to treat the two populations as the same with respect to this variable.
"Null hypothesis" also sometimes has the more inclusive meaning of the
hypothesis the nullification of which, by statistical means, would be taken as
evidence in support of a specified alternative hypothesis. Given the latter con-
notation, the null hypothesis may or may not be a hypothesis of no difference or
of no effect (Bakan, 1966).
The distinction between these connotations is sometimes made by referring
to the former as the nil-null hypothesis or simply the nil hypothesis; usually the
distinction is not made explicitly and whether null is to be understood to mean
nil-null, as it almost always does, must be inferred from the context. In what
follows, "null hypothesis" will be used to indicate the hypothesis of no differ-
ence or no effect unless otherwise specified, and will be represented as H0.
Application of a statistical significance test to the difference between two
means usually yields a value of p, the theoretical probability that a difference of
the size obtained, or larger, would have been obtained from two samples of the
size of those used had they been drawn at random from the same population. A
"confidence level," usually designated as alpha, is specified to serve as a deci-
sion criterion, and the null hypothesis is rejected only if the value of p yielded by the test is not greater than the value of alpha. If alpha is set at .05, say, and a significance test yields a value of p less than .05, the null hypothesis is rejected and the result is said to be statistically significant at the .05 level.
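In practice, the comparison of p with alpha is usually delegated to a library routine. The following sketch illustrates the decision rule with SciPy's two-sample t test; the longevity figures are invented.

    from scipy import stats

    # Invented longevity samples (years) from two populations.
    sample_a = [71, 74, 78, 69, 73, 77, 75, 72, 70, 76]
    sample_b = [70, 72, 69, 68, 74, 71, 67, 73, 70, 69]

    alpha = 0.05
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b)

    # Reject H0 only if p does not exceed the preselected criterion alpha.
    if p_value <= alpha:
        print(f"p = {p_value:.3f}: reject H0 at the {alpha} level")
    else:
        print(f"p = {p_value:.3f}: fail to reject H0")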
This logic of null hypothesis significance testing, as it is generally presented
in statistics texts, admits of only two possible decision outcomes: rejection (at
some specified level of confidence) of the hypothesis of no difference, and fail-
ure to reject this hypothesis (at that level). Given the latter outcome, all one is
justified in saying is that a statistically significant difference was not found;
one does not have a basis for concluding that the null hypothesis is true.
Inasmuch as the null hypothesis may be either true or false and it may either
be rejected or fail to be rejected, any given instance of null hypothesis testing
admits of four possible outcomes, as shown in Table 7.1.
There are two ways to be right: rejecting the null hypothesis when it is false
(when the samples were drawn from different populations) and failing to reject
it when it is true (when the samples were drawn from the same population).
And there are two ways to be wrong: rejecting the null hypothesis when it is
true and failing to reject it when it is false. The first of these two ways to be
wrong is usually referred to as a Type I error and the second as a Type II error.
Theoretically, the probability that a Type I error will be made, if the null hy-
pothesis is true, is given by the p value (criterion or confidence level) that is
used to reject the hypothesis (alpha). The probability of occurrence of a Type II
error, if the null hypothesis is false, usually referred to as beta, is generally
much larger than alpha, but not known precisely.
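The asymmetry between alpha and beta is easy to exhibit by simulation: when the null hypothesis is true, the rejection rate stays near alpha, whereas when it is false by a modest amount, the rate of Type II errors is typically much larger. A minimal sketch (the effect size and sample size are arbitrary):

    import random
    from scipy import stats

    random.seed(2)
    alpha, n, trials = 0.05, 20, 2000

    def reject_rate(effect):
        """Fraction of simulated experiments in which H0 is rejected."""
        count = 0
        for _ in range(trials):
            a = [random.gauss(0, 1) for _ in range(n)]
            b = [random.gauss(effect, 1) for _ in range(n)]
            if stats.ttest_ind(a, b).pvalue <= alpha:
                count += 1
        return count / trials

    print("Type I error rate (H0 true):", reject_rate(0.0))  # near alpha
    print("power (H0 false, d = 0.5):  ", reject_rate(0.5))  # well below 1
    # beta is 1 minus the power, here much larger than alpha.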
Null hypothesis significance testing (NHST) has been widely used in psy-
chological research since its invention, and its use has been subject to intense criticism for almost as long. Debate between critics and defenders of such test-
ing continues to the present time. I have reviewed the controversy elsewhere
(Nickerson, 2000), so will not do so again here, beyond noting some of the crit-
ics and defenders of NHST and listing in Table 7.2 some common misunder-
standings about it that have been pointed out in the literature.
TABLE 7.1
The Four Possible Outcomes of a Null Hypothesis Test

                           H0 is true              H0 is false
  Reject H0                Type I error            Correct rejection
  Fail to reject H0        Correct nonrejection    Type II error
TABLE 7.2
Some Common Misunderstandings About Null Hypothesis Significance Testing

p is the probability that the null hypothesis is true and 1 - p is the probability that the alternative hypothesis is true.

Rejection of the null hypothesis establishes the truth of a theory that predicts it to be false.

A small p is evidence that the results are replicable.

A small p means a treatment effect of large magnitude.

Statistical significance means theoretical or practical significance.

Alpha is the probability that if one has rejected the null hypothesis one has made a Type I error.

The value at which alpha is set for a given experiment is the probability that a Type I error will be made in interpreting the results of that experiment.

The value at which alpha is set is the probability of Type I error across a large set of experiments in which alpha is set at that value.

Beta is the probability that the null hypothesis is false, or the probability of making a Type II error.

Failing to reject the null hypothesis is equivalent to demonstrating it to be true.

Failure to reject the null hypothesis is evidence of a failed experiment.
STATISTICAL EXPLANATION
If one plots the frequency with which there have been zero, one, two, three,
four, or more than four outbreaks of war in a year, one gets a Poisson distribu-
tion. One gets the same type of distribution if one plots the frequency of years
having zero, one, two, three, four, or more than four wars coming to an end.
(The fact that one gets a similar distribution in both cases would be redundant if
all wars lasted the same amount of time, but they do not.) This is the kind of dis-
tribution one would expect if the initiation of war were a completely random
event. It suggests, in effect, that the instantaneous probability of the outbreak
of war is constant over time.
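To see what fitting a Poisson distribution involves operationally, one can estimate the rate parameter as the mean number of outbreaks per year and compare observed with expected counts. The counts in the following sketch are invented for illustration:

    import math

    # Invented data: observed number of years with k war outbreaks.
    observed = {0: 223, 1: 142, 2: 48, 3: 15, 4: 4}
    years = sum(observed.values())

    # Estimate the Poisson rate as the mean number of outbreaks per year.
    lam = sum(k * n for k, n in observed.items()) / years

    # Expected number of years with k outbreaks: years * e^(-lam) * lam^k / k!
    for k in observed:
        expected = years * math.exp(-lam) * lam**k / math.factorial(k)
        print(k, observed[k], round(expected, 1))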
L. F. Richardson (1956) notes that:
This explanation of the occurrence of war is certainly far removed from such ex-
planations as ordinarily appear in newspapers, including the protracted and crit-
ical negotiations, the inordinate ambition and the hideous perfidy of the
opposing statesmen, and the suspect movements of their armed personnel. The
two types of explanation are, however, not necessarily contradictory; they can
be reconciled by saying that each can separately be true as far as it goes, but can-
not be the whole truth. (p. 1258)
Does showing that the frequency of war outbreaks can be fitted with a Poisson
distribution constitute an explanation at all, and if so, what exactly does it explain?
The Poisson distribution describes a surprisingly large number of natural
phenomena. Kac (1983) gives an interesting illustration of the descriptive
range of this construct, involving deaths of Prussian soldiers by horse kicks
and the emission of alpha particles by a radioactive substance:
event" are sometimes given causal connotations. And in the ensuing comments
I more than once alluded to chance as a causative agent. Exactly what might it
mean for something to be determined by chance? It sounds as though one is
imputing to "chance," whatever that is, the power to determine the outcomes of
events—not the outcomes of individual events, to be sure, but the aggregate
outcome of multiple events. Chance is being identified as the cause of the
aggregate outcome. This makes chance sound very deterministic, which
appears to be a contradiction in terms.
One might say that this is not correct, that all one is doing is describing
the outcome, not accounting for it, when one attributes it to chance. But
Hacking (1987b) notes that at one point in the evolution of statistical think-
ing, "it was an essential part of the doctrine of chances that there was always
an underlying causal structure. It was the task of the analyst to find that
structure," but that at a later point, "people became indifferent to that" (p.
53). According to the later view, statistical law became autonomous, which
is to say usable "to explain something else, without itself having to be re-
duced" (p. 53). This appears to make the question of what accounts for the
predictability of chance events—the lawfulness of chance—moot. I do not
find this completely satisfying. Chance, in my view, is a great mystery; I
doubt if anyone understands it very well.
Krüger (1987b) also points out that acceptance of probability as an
explanatory construct was slow:
Although in the course of the nineteenth century, statistical practice and a corre-
sponding amount of probability theory spread rapidly through various disci-
plines, most scientists and philosophers remained opposed to taking probability
as fundamental or irreducible, or according it an explanatory function. Indeed,
that function appeared to imply the reality of possibilities or of indeterminate-
ness; hence to recognize it seemed to involve too high an ontological price. At
any rate, there is a time lag between the widespread use of statistics and proba-
bility on the one hand and the adoption of a probabilistic view of, or attitude to-
ward, reality (or parts of reality) on the other. (p. 60)
No less a figure than Kant insisted that nothing happens through blind
chance, which implies that all phenomena have deterministic explanations
even if one cannot discover what they are. According to this view, which dates
back at least to the classical Greek atomists (e.g., Leucippus and Democritus),
"chance is excluded not only from the domain of scientific concepts but also
from the world of events; laws of chance or probability theory cannot possibly
refer to reality" (p. 62). Krüger points out that many 19th-century scientists
and philosophers considered it obvious that statistical regularities must have
deterministic explanations. He credits Maxwell with transforming what had
been descriptive in social science and statistics into explanatory accounts of
observable phenomena like heat, flow, and diffusion.
That knowledgeable people can interpret the significance of chance in quite different ways is illustrated by the contrasting views of Monod (1972) and Polkinghorne
(1986). Monod takes the operation of chance in the processes of the world as
evidence of their meaninglessness. Polkinghorne views it as providing an
insight into the design of the world:
When I read Monod's book I was greatly excited by the scientific picture it pre-
sented. Instead of seeing chance as an indication of the purposelessness and fu-
tility of the world, I was deeply moved by the thought of the astonishing
fruitfulness it revealed inherent in the laws of atomic physics... the fact that they
have such remarkable consequences as you and me speaks of the amazing poten-
tiality contained in their structure. From this point of view the action of chance is
to explore and realize that inherent fruitfulness. (p. 54)
The "laws of chance" is the name we give to the fact that certain events that
are irregular and unpredictable individually are regular and predictable in the
aggregate. As an explanatory construct, chance, or the idea of the laws of
chance, raises the question of what an explanation is. What do we mean when
we say that the outcome of the toss of a coin is determined by chance? Or that
the laws of chance dictate that if a fair coin is tossed a large number of times, it
will come up heads on about half of the tosses? In what sense have we ex-
plained anything by such claims?
Perhaps we need to recognize levels of explanation. A concept that is used
as an explanatory construct at one level must be explained, if at all, at a deeper
level. I say "if at all" because instead of attempting to explain a concept, one
might opt to take it as a given—as a primitive of one's explanatory system. One
might elect, for example, to treat chance as such a construct—take it as a given
and decline to attempt to explain it in terms of more basic concepts. This does
not preclude one from studying how it works; it simply admits an inability, or at
least a disinclination to try, to determine why it works.
So if we accept the laws of chance, much as we accept the law of gravity, as
descriptive of a characteristic of the way things are, and let it go at that, we may
invoke the concept to account for events that we cannot account for
deterministically. We can, in other words, attribute to chance events or
outcomes that we cannot account for in a more traditional cause-effect way.
But we should recognize, it seems to me, that this is explanation at a less than
fundamental level. By defining random sampling as sampling in such a way
that every member of the population has an equal likelihood of being in the
sample, we can invoke the concept of regression to the mean, for example, to
account for a variety of phenomena. Such an explanation accounts very well
for the phenomena, given the definition of random sampling, but it does not
explain why random sampling works.
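A small simulation shows the phenomenon that the concept is invoked to explain: if each observed score is a stable true score plus independent random error, cases selected for extreme scores on one occasion will, on average, score less extremely on another. A minimal sketch:

    import random
    import statistics

    random.seed(3)

    # Each observed score = stable true score + independent random error.
    true_scores = [random.gauss(100, 10) for _ in range(10000)]
    test1 = [t + random.gauss(0, 10) for t in true_scores]
    test2 = [t + random.gauss(0, 10) for t in true_scores]

    # Select the cases that scored highest on the first test...
    top = sorted(range(10000), key=lambda i: test1[i])[-500:]

    # ...and compare their means on the two tests: the second mean is
    # noticeably closer to the population mean of 100.
    print(round(statistics.mean(test1[i] for i in top), 1))
    print(round(statistics.mean(test2[i] for i in top), 1))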
FIG. 7.2. Expected number of sums equal to S in 10,000 tosses of a pair of dice.
The final example involves another coin-tossing exercise. In this case suppose
we defined an event as the number of successive heads tossed before the occur-
rence of a tail. That is, on each trial we would toss the coin until it came up tail, and
the value of the variable for that trial would be the number of heads tossed; the oc-
currence of a tail would terminate the trial. Obviously, this variable can take on any
integer value equal to or greater than 0. When we plotted the distribution of values obtained in 10,000 trials, we would expect to get a distribution approximately like that shown in Fig. 7.3, which can be described by the exponential function $f(N) = 10{,}000\,(1/2)^{N+1}$.
FIG. 7.3. Expected number of times the first head will be preceded by N tails in 10,000 trials.
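A simulation of the trial procedure just described, compared against the expected counts 10,000 × (1/2)^(N+1), makes the shape of the distribution concrete:

    import random
    from collections import Counter

    random.seed(4)

    def heads_before_tail():
        """Toss a fair coin until a tail appears; return the number of heads."""
        n = 0
        while random.random() < 0.5:
            n += 1
        return n

    counts = Counter(heads_before_tail() for _ in range(10000))
    for n in range(8):
        expected = 10000 * 0.5 ** (n + 1)
        print(n, counts[n], round(expected))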
The idea that the results of statistical tests should be used primarily as aids to
human judgment, and seldom if ever as definitive justification for accepting or
rejecting a hypothesis has been defended forcefully by Abelson (1995) in his
presentation of "statistics as principled argument." Abelson identifies five fac-
tors that, in his view, determine the persuasive force of a statistically supported
argument, and organizes them around the acronym MAGIC: magnitude, articulation, generality, interestingness, and credibility.
Although some seminal suggestions regarding the use of statistics for inferen-
tial purposes had been put forth as early as the middle of the 19th century—for
example, the work of Gustav Radicke (Coleman, 1987)—what might be re-
ferred to appropriately as theories of statistical inference were not developed
until the 20th century. Gigerenzer and Murray (1987) point out that something
of an "inference revolution" occurred in psychology between 1940 and 1955,
as a consequence of which "inferential statistics" became indispensable for
psychologists. Unfortunately, as the same authors also note, many textbooks
that are used to teach students techniques of statistical analysis and experimen-
tal design present the material as though the theoretical underpinnings of sta-
tistical inference were noncontroversial. Often theory, let alone theoretical
controversy, is not even mentioned. Algorithmic procedures are given for cal-
culating various statistical quantities and for making tests of statistical signifi-
cance; follow the computational recipe, consult the appropriate tables, report
the results, and rest easy that science has been served. In fact, the development
of the theoretical underpinnings of statistical inference was full of controversy
and differences of opinion persist regarding some of the foundational issues.
Many people contributed to the development of mathematical statistics
during the early part of the 20th century. One who deserves special mention is
Karl Pearson. He is remembered primarily for the correlation coefficient that
bears his name, but his influence went well beyond this contribution. He
advanced the concept of a skew distribution, identified certain measures
(mean, standard deviation, symmetry, and kurtosis) as the important
parameters of a distribution, created the first goodness-of-fit test using the
chi-square statistic, and established—along with Francis Galton and Raphael
Weldon—Biometrika, which published primarily distribution measurements
on biological variables motivated by an interest in Darwin's then new theory of
evolution. Pearson is also remembered for the long-lasting feud between him
and another major figure in the statistical world of the early 20th century,
Ronald A. Fisher; the jury is still out on the question of who was the more
contentious of the two.
Three different schools of thought have defined much of the controversy
around statistics during the 20th century; one derives from the work of R. A.
Fisher, another from that of Jerzy Neyman and Egon Pearson (Karl Pearson's
son), and a third from the approach originally put forth by Thomas Bayes.
Gigerenzer et al. (1989; Gigerenzer & Murray, 1987) see each of these views as
"a considerable advance" over earlier ones, but they note that none of the three
has emerged as the clear winner, and that the issues on which they are divided
are deep and fundamental: "The different schools often disagree fiercely about
basic issues, and value-laden words from ordinary speech such as 'efficient,'
'unbiased,' and 'coherent,' have been enlisted as names of central concepts in
the various theories. By implication, rival approaches are charged with
inefficiency, bias, and incoherence" (p. 90). The rivalry, sometimes acrimoni-
ous, among the pioneers, especially between Fisher and Karl Pearson, has been
described in some detail by Salsburg (2001). Salsburg's book recounts also the
contributions of other notables to the development of statistics as a
mathematical discipline.
The approaches of each of these schools are widely applied: those of Fisher
and Neyman-Pearson primarily in the experimental sciences and those of
Bayes more in economics and related disciplines. Fisherian and Neyman-
Pearson ideas about statistical inference have been extremely influential in the
development of methodology and especially conventions of experimental
design in psychology. The Bayesian approach has been studied widely by
psychologists as a normative model of everyday reasoning and decision
making and has motivated a great deal of experimentation designed to
determine the extent to which people naturally behave as Bayesians. The
Fisherian and Neyman-Pearson views are considered briefly in the next two
subsections; the Bayesian approach has already been discussed in chapter 4.
Curiously, Bayesian analysis has not been applied much to the interpretation of
experimental data, even by those who consider it a normative model of
reasoning and decision making.
Fisherian Inference
The Fisherian approach, which was set forth in Statistical Methods for Re-
search Workers, first published in 1925, centers on designing experiments to
test the null hypothesis, the hypothesis that two samples were randomly drawn
from the same population. The statistical question that Fisherian significance
tests, such as the analysis of variance, are intended to answer is the question of
the probability of obtaining a difference between observed and hypothesized
data of a specified magnitude by chance. By convention, when the probability
of obtaining an observed difference by chance is determined to be less than a
specified criterion, say .05, the null hypothesis—the hypothesis that there is no
nonchance difference—is said to be rejected at a level of confidence defined by
that criterion. As already noted, the logic of the significance test is not univer-
sally accepted.
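For concreteness, here is a minimal sketch of a significance test of the kind just described, using SciPy's one-way analysis of variance on invented scores for three treatment groups; by the .05 convention, H0 is rejected only if the computed p falls below the criterion.

    from scipy import stats

    # Invented scores for three treatment groups.
    group1 = [23, 25, 21, 27, 24, 26]
    group2 = [30, 28, 33, 29, 31, 27]
    group3 = [22, 24, 26, 23, 25, 21]

    f_stat, p_value = stats.f_oneway(group1, group2, group3)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    # The null hypothesis (all groups drawn from the same population) is
    # rejected at the .05 level only if p < .05.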
Gigerenzer et al. (1989) distinguish between a substantive null hypothesis and a
statistical null hypothesis, and argue that it is only the latter that can be rejected by
a Fisherian significance test. The substantive null hypothesis is the hypothesis that
an experimental treatment has had no effect; the statistical null hypothesis is the
hypothesis that two samples have been drawn from the same population. System-
atic errors in the design of an experiment could lead either to the rejection of the
statistical null hypothesis when the substantive null hypothesis is true or to failure
to reject the statistical null hypothesis when the substantive one is false. For this
reason, proceeding inferentially from the rejection of a statistical null hypothesis
to the acceptance of the hypothesis of a causal effect, a common practice in the in-
terpretation of the results of psychological experiments, is not as straightforward
as typically assumed (Gigerenzer & Murray, 1987).
It is conventional in the Fisherian approach to set a strict criterion for rejecting the null hypothesis; .05 is commonly used, as is .01. This means that,
given the assumptions on which the test rests, the probability of rejecting the
null hypothesis if it is really true is small (less than .05 or less than .01 for
these two criteria). It means also, however, that the probability of failing to
reject the null hypothesis if it is really false is likely to be quite large.
(Fisherians never speak of accepting the null hypothesis; the only two op-
tions one has are to reject it or to fail to reject it.) The rationale for putting this
kind of bias in the system is the assumption that, in the context of scientific
research of the kind for which statistical hypothesis testing is usually done,
failing to reject the null hypothesis when it is really false is a less objection-
able type of error than rejecting the null hypothesis when it is really true. This
bias is an analogue of the preference in law for failing to convict a guilty party
over convicting an innocent one. Fisher did not put as much stock in a single experiment that yielded a p < .05 as have many of the subsequent users of his ideas; he stressed the importance of replication, arguing that a phenome-
non should be considered experimentally demonstrable when one knows
how to design an experiment so that it will almost always yield a statistically
significant result.
In a defense of the Fisherian approach to statistical inference, Macdonald
(1997b) notes that:
A low significance level does not require one to accept the presence of an effect.
One's acceptance can be influenced by such factors as the effect's importance,
its consistency with previous findings, its compatibility with one's existing be-
liefs and one's confidence in the researchers and in the study's methodology. In-
deed this acceptance can depend on aspects of the data of which the test takes no
account (e.g., inconsistencies within the data, overly good fits, unexplained pe-
culiarities, errors in other parts of the analyses, etc.). (p. 339)
Neyman-Pearson Inference
Jerzy Neyman was 37 years younger than Karl Pearson and a contemporary of
Karl's son, Egon. The Pearson of the Neyman-Pearson collaboration was
Egon. Neyman and E. Pearson's (1928a, 1928b, 1933) approach to statistical
inference can be seen as, in part, a reaction against what its proponents perceived to be the one-sided nature of the Fisherian method of hypothesis
testing. Instead of posing questions that could be answered only in a yes-no
fashion—yes the null hypothesis is rejected, no it is not—the Neyman-
Pearson approach provides for the consideration of how the weight of evidence
contributes to the relative statistical plausibility of each of two competing hy-
potheses, one of which can be the null hypothesis.
Use of this approach involves not only articulating the hypotheses that are to
be considered but also specifying the criterion that will be used to make a
selection between them. One selects a decision criterion on the basis of the
relative benefits and costs of the various possible ways to be right or wrong—
deciding in favor of Hypothesis 1 when Hypothesis 2 is true, and so on.
Neyman and Pearson saw their approach to hypothesis testing as one among
other alternatives and stressed the need for users of statistical inference
techniques to exercise judgment regarding the appropriateness of specific
techniques to their own situations.
In somewhat oversimplified terms, Neyman-Pearson hypothesis testing
can be equated with either-or decision making, whereas the Fisherian
approach is better described as yes-no. Because of the either-or nature of the
former, when one of the hypotheses is the null it is possible to compute the
probability of both types of error and thus also power. So, unlike with Fisherian
testing, acceptance of the null hypothesis is a legitimate outcome in this case.
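Because the Neyman-Pearson framework specifies an explicit alternative, both error probabilities can be computed in advance. The following sketch does this for a one-sided test of a normal mean with known variance; all the numbers are invented, and the normal quantile is obtained by bisection to keep the sketch free of dependencies.

    import math

    def normal_cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    def normal_quantile(p, lo=-10.0, hi=10.0):
        # Invert the normal CDF by bisection (sufficient for this sketch).
        for _ in range(100):
            mid = (lo + hi) / 2
            if normal_cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # One-sided test of H0: mu = 0 against H1: mu = 0.5, with sigma = 1,
    # sample size n, and Type I error probability alpha.
    alpha, mu1, sigma, n = 0.05, 0.5, 1.0, 25
    crit = normal_quantile(1 - alpha)           # reject H0 if z > crit

    # beta = P(fail to reject | H1 true); power = 1 - beta.
    z_shift = mu1 * math.sqrt(n) / sigma
    beta = normal_cdf(crit - z_shift)
    print(f"critical z = {crit:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
    # For these settings: critical z ~ 1.64, beta ~ 0.20, power ~ 0.80.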
Macdonald (1997b) describes both the Fisherian and Neyman-Pearson
approaches to statistical inference as being "concerned with establishing that
an observed effect could not plausibly be accounted for by sampling error" (p.
334). When an achieved significance level—the probability of obtaining an
effect as extreme as the one observed—is sufficiently low, the hypothesis that
the obtained effect could plausibly be due to sampling error is rejected, and
rejection of this hypothesis entails also rejection of the hypothesis of a true
effect opposite in direction to that observed.
Bayesian Inference
Gigerenzer et al. (1989) review the debate between the followers of Fisher and
those of Neyman-Pearson (as well as those of Bayes) in some detail and note
that it has not ended. They note too the remarkable fact that little hint of the his-
torical and ongoing controversy is to be found in textbooks that are used to
teach statistical significance testing to its potential users. The amalgamation of
ideas from different schools of thought that is commonly taught fails to do jus-
tice to the richness and complexity of their philosophical underpinnings and
promotes an uncritical and doctrinaire application of certain types of statistical
analyses as the sine qua non of good psychological science:
The need for personal judgment—for Fisher in the choice of model and test sta-
tistic; for Neyman and Pearson in the choice of a class of hypotheses and a rejec-
tion region; for the Bayesians in the choice of a prior probability—as well as the
existence of alternative statistical conceptions, were ignored by most textbooks.
As a consequence, scientific researchers in many fields learned to apply statisti-
cal tests in a quasi-mechanical way, without giving adequate attention to what
questions these numerical procedures really answer. (p. 106)
One must wonder why this state of affairs exists. Who has decided that
students, especially students who are preparing to do scientific research, have
no need to know about the philosophical disputes that have raged regarding the
justifiability of some of the fundamental tools that they are learning how to
use? Are students being "protected" from these controversies? Are instructors
concerned that exposure to them would undermine students' confidence in the
efficacy of statistical inference? Are instructors reluctant to try to make sense
of them? Are they themselves unaware of them? Are Gigerenzer and Murray
(1987) correct in attributing the widespread belief among psychologists in the
"illusion" of a single statistical theory for valid inference from data to
hypotheses in part to self-deception rooted in the strong desire to have such a
theory? "Psychologists seem not to wish to be cured of these illusions, for the
consequence would be the abandonment of the indispensable instrument and
hence the abandonment of the unification of psychological methodology
institutionalized by the inference revolution" (p. 25).
Gigerenzer et al. (1989) also note that significance testing, the rules of
which are based on an amalgam of the ideas of Fisher and Neyman-Pearson
that neither Fisher nor Neyman and Pearson would be likely to endorse, has
become almost the only statistical tool that is used in sociology and
psychology, and that in these fields other tools, such as confidence intervals,
the likelihood function, and Bayesian inference, have been given little
attention by comparison. They note too the strange incongruity in the fact that
many researchers who have done experimental studies of human reasoning
under uncertainty have concluded that their subjects are irrational because they
do not reason in accordance with Bayes' theorem, whereas they themselves use
Fisherian or Neyman-Pearsonian statistics or a mixture of them to test their
own hypotheses.
As institutionalized in university curricula and journal editorial policies, the
hybrid theory, as Gigerenzer et al. (1989) call the amalgam, with its sharp focus
on statistical significance testing has been a major determinant of how sociologi-
cal and psychological research is done: "In some fields, a strikingly narrow un-
derstanding of statistical significance made a significant result seem to be the
ultimate purpose of research, and non-significance the sign of a badly conducted
experiment—hence with almost no chance of publication" (p. 108). Support for
this view comes from a study by Sterling (1959) that showed that as of 1955
more than 80% of the articles in several leading psychology journals used signif-
icance tests to justify conclusions from the data. Gigerenzer et al. claim that as of
the late 1980s the figure was somewhere between 90% and 100%.
The point of this discussion is not to deny the usefulness of statistical
significance testing as a research tool. Used thoughtfully with an appreciation
of its limitations, it can be a powerful aid to extracting information from noisy
data. Used unthinkingly without an understanding of the assumptions on
which various tests rest and of the controversies that have surrounded some of
those assumptions, it can become a fetish and an impediment to scientific
advance. Statistics texts typically do make a distinction between statistical and
theoretical or practical significance, but the dominance of statistical
significance testing as an experimental paradigm has obscured this distinction
in practice. The purpose of research is to further our understanding of how the
world works, not just to discover statistically significant differences, and only
to the extent that the discovery of such differences really serves the larger
purpose does it contribute anything of worth to the enterprise.
Noting that early psychologists took physics as their model for an
experimental science, Gigerenzer (1987) raises the interesting question of why
they did not follow the lead of quantum theory, and proposes the answer that
"quantum theory seemed to violate two ideals connected with the struggle for
certain knowledge: determinism and objectivity" (p. 12). Probability, he
argues, was pressed into the service of these ideals through the way in which
statistics was applied to the drawing of inferences from experimental data:
"Probabilistic thinking was used as a means toward objectivity in the classical
sense of separating the experimenter from his knowledge. Such was the role of
inferential statistics as a mechanization of the experimenter's inference from
data to hypothesis" (p. 12).
The amalgamation of the ideas of Fisher and Neyman-Pearson into the
hybrid form of statistics that is generally taught in psychological experimental
design texts is seen as integral to maintaining the objectivity of psychology as
an experimental science:
The connection I shall draw between the kind of inferential statistics established
in psychology and objectivity is based on the following observations: (1) There
was a single dominant theory of inferential statistics that (2) was taught anony-
mously (i.e., without indication of its multiple and sometimes contradictory
sources) as "truth" per se. (3) Problems stemming in part from the fact that the
theory was spliced together from these different sources went unacknowl-
edged, (4) alternative theories were neglected, and (5) this dominant hybrid the-
ory was institutionalized by editors of journals and internalized by authors as the
one "true" path to experimental knowledge (and therefore toward publication).
(Gigerenzer, 1987, p. 12)
Determinism could coexist with probabilistic thinking during the first half
of the 20th century, Gigerenzer argues, because "probabilistic thinking was
enlisted in the service of determinism" (p. 15). The appearance of objectivity
was promoted by eliminating the need for the experimenter's judgment
through the application of mechanical statistical procedures to the interpreta-
tion of data; statistics became "a means for the mechanization of inductive
inference" (p. 25). Neglecting to acknowledge the controversial history of
statistics and the existence of approaches alternative to the one that was almost
universally taught were essential to the maintenance of this appearance, in
Gigerenzer's view.
The amalgam, or hybrid theory, could of course be exactly right. The fact
that neither Fisher nor Neyman and Pearson would own it as his, or their, own
does not necessarily make it wrong. Indeed, given the sharp differences of
opinion between Fisher on the one hand and Neyman and Pearson on the other
regarding how statistical inferences should be made, it would not be surprising
if a theory that stood the test of time turned out to show the influence of both
views but to not be totally consistent with either. A theory of statistical
inference should be judged on its merits and not on its historical pedigree.
Critics of the hybrid view generally would argue, I believe, that not only is this
view not completely consistent with Fisher's theory or that of Neyman and
Pearson, but that it also does not have a self-consistent theoretical foundation
of its own.
A strong argument could be made for the idea that the single most important in-
sight that has ever been expressed about decision making is the recognition that
in most choice situations there are more ways than one to be wrong, not all of
which are equally palatable to the decision maker. Perhaps the best-known
illustration of this principle was Pascal's famous wager, which goes roughly as
follows: God exists or He does not. Reason cannot decide the issue. One must
wager. One can bet that God exists or that He does not, and in both cases one can
be right or wrong. Suppose that one bets that God exists. If one is right, one gains
infinitely much; if one is wrong one loses little. Suppose that one bets that God
does not exist. If one is right, one gains little; if wrong, one loses infinitely much.
Clearly, Pascal argued, one should bet that God exists.
It is doubtful if any other wager, actual or proposed, has generated as much
commentary and debate as this one. Many variants of Pascal's wager, in which
the possible outcomes are not so disparate as those expressed by him but that
still represent greatly differing levels of desirability, have been constructed
since Pascal's time. The point of all of them is to demonstrate that the way one
wishes to "bet" on an uncertain outcome is likely to be determined not only by
how likely one believes the various thinkable outcomes to be but also by the
anticipated satisfaction or regret associated with betting correctly or
incorrectly (selecting an outcome that does not occur) in the various possible
ways. The wager illustrates in an intuitively compelling way that there can be
more to rationality in uncertain choice situations than simply making
selections so as to maximize the probability of being correct.
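In expected-value terms the wager can be written out directly. The following is a
toy sketch (mine, not the author's), with math.inf standing in for "infinitely
much," small finite numbers for "little," and an arbitrary assumed probability p
that God exists:

    # Toy expected-value rendering of Pascal's wager. Payoffs are hypothetical
    # stand-ins: math.inf for "infinitely much," +/-1 for "little."
    import math

    p = 0.5  # assumed probability that God exists; any p > 0 gives the same ordering

    # Bet that God exists: gain infinitely much if right, lose little if wrong.
    ev_bet_exists = p * math.inf + (1 - p) * (-1)
    # Bet that God does not exist: gain little if right, lose infinitely much if wrong.
    ev_bet_not = (1 - p) * 1 + p * (-math.inf)

    print(ev_bet_exists, ev_bet_not)  # inf -inf: bet that God exists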
This insight has had considerable influence in experimental psychology
through the application of the statistical theory of signal detection (D. M.
Green & Swets, 1966; Swets, 1964; Tanner & Swets, 1954). In its basic form
the theory deals with the problem of detecting signals in noise; the observer's
(detector's) task is to say whether an observation interval contains only noise
or a signal as well. Fundamental to the theory is a sharp distinction between the
sensitivity of the observer and the criterion that the observer uses to determine
whether to report that a signal is present.
The situations to which signal detection theory was originally applied were
designed so that an observer could not increase the probability of reporting the
presence of a signal when one had actually occurred without also increasing
(by a different amount) the probability of reporting the presence of a signal
when one had not occurred. In order to attain a desired probability of reporting
a signal when it was really present (the probability of a "hit"), one had to settle
for accepting also some nonzero probability of reporting a signal when it was
not present (the probability of a "false alarm").
Such situations are very common in life; in order to capture more of the real
signals, one has to be willing to pick up also more false alarms. Or to look at it
from another perspective, the decision problem becomes that of deciding what
kind of a compromise one is willing to strike between failing to capture real
signals on the one hand and incorrectly counting nonsignals as signals on the
other. A "signal," as the word is being used here, could be a literal signal, as in
the context of a sonar operator trying to decide whether a signal emitted by a
ship has occurred in a background of acoustic noise, or it could be a figurative
one, as when we treat the outcome of a diagnostic test as a possible signal for
the presence of a specific disease.
The problem of the covariation of hits and false alarms is illustrated clearly
in the case of medical diagnosis. Many disease symptoms are continuous in
nature—a high temperature, for example, is often symptomatic of illness, but
temperature can vary continuously over a considerable range and it is
impossible to select a point on this continuum such that all higher temperatures
occur only in the presence of disease and no lower ones ever do. Thus if one
uses temperature as an indication of disease, wherever one places the divide
between "normal" and "abnormally high" temperatures, there are likely to be
some healthy people with a temperature above this point and some sick ones
with a temperature below it. (For purposes of this illustration, I am ignoring
that an abnormally low body temperature also can indicate a medical problem.)
In order to decrease the percentage of healthy people whose temperature is
above the "abnormally high" criterion, we can raise that criterion, but in so
doing we have to accept the fact that we will also increase the percentage of
sick people whose temperature reading will fall below it; conversely, to
decrease the percentage of sick people whose temperature will fall below the
criterion, we can lower it, but in doing this we also increase the percentage of
healthy people whose temperatures will be classified as abnormally high.
The problem is very common and because signal detection theory pro-
vides a quantitative means of dealing with it, the theory has found applica-
tion in a great many contexts (Swets, 1986; Swets, Dawes, & Monahan,
2000a, 2000b). By quantifying what can be expected given what is known
or assumed about the nature of the "signal" that one is trying to detect and
the sensitivity of the detection process, it can help one decide on a decision
criterion that takes into account both of the types of mistakes that can be
made. In the case of the medical illustration, if temperature were the only
symptom one had on which to base a diagnosis of a particular disease
(which of course it is not), where one would set the criterion for "abnor-
mally high" would depend in part on the relative seriousness of the two pos-
sible kinds of mistake—judging a healthy person to be sick and judging a
sick one to be healthy.
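To make the covariation of hits and false alarms concrete, here is a minimal
sketch (not from the text) that assumes, purely for illustration, normally
distributed temperatures for healthy and sick people; as the criterion is raised,
the hit rate and the false-alarm rate fall together:

    # Hit/false-alarm tradeoff for a single diagnostic criterion. The temperature
    # distributions below are hypothetical, chosen only for illustration.
    from statistics import NormalDist

    healthy = NormalDist(mu=98.6, sigma=0.6)   # assumed healthy temperatures (deg F)
    sick = NormalDist(mu=100.4, sigma=1.0)     # assumed sick temperatures (deg F)

    for criterion in (99.0, 99.5, 100.0, 100.5):
        hit_rate = 1 - sick.cdf(criterion)             # sick people above criterion
        false_alarm_rate = 1 - healthy.cdf(criterion)  # healthy people above it
        print(f"criterion {criterion:5.1f}: hits {hit_rate:.3f}, "
              f"false alarms {false_alarm_rate:.3f}")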
TABLE 7.3
Results Expected With a Diagnostic Test That Correctly Detects 90% of the Cases
of Disease and Erroneously Shows Positive on 1% of the Cases of No Disease

                            Ground Truth
Test Result        Disease (D)    No Disease (~D)
Positive (P)         .00009           .01000          .01009
Negative (~P)        .00001           .98990          .98991
                     .00010           .99990         1.00000

Note. Assumed incidence is 1 in 10,000. Cell entries are joint probabilities, assuming the test is
administered to a large random sample of the population.

TABLE 7.4
Results Expected With a Diagnostic Test That Correctly Detects 90% of the Cases
of Disease and Erroneously Shows Positive on .1% of the Cases of No Disease

                            Ground Truth
Test Result        Disease (D)    No Disease (~D)
Positive (P)         .00009           .00100          .00109
Negative (~P)        .00001           .99890          .99891
                     .00010           .99990         1.00000

Note. Assumed incidence is 1 in 10,000. Cell entries are joint probabilities, assuming the test is
administered to a large random sample of the population.
In the case of Table 7.4, the probability of having the disease, given a positive
test result, is .00009/.00109 ≈ .09,
or almost 1 in 10. In both cases, remember that the test is 90% accurate in the
sense that, given the presence of the disease, it returns a positive result 90% of
the time. However, a random person getting a positive result would have only
about one chance in a hundred of having the disease in the first case and about 1
chance in 10 in the second. (Of course people who get tested for a particular
disease usually are not randomly selected from the population, but get the test
because they have some reason to suspect they might have the disease, and
these numbers would not apply to such nonrandom samples.) These examples
illustrate the importance of false-positive rates, especially given low-incidence
conditions, in interpreting the outcomes of diagnostic tests. They also serve as
a reminder that ambiguities in the way test accuracy is reported lend them-
selves to erroneous conclusions of various sorts.
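The posterior probabilities in these examples follow directly from Bayes' rule. A
minimal sketch, using the sensitivity, false-positive rate, and incidence assumed
in Tables 7.3 and 7.4 (the function name is mine):

    # P(disease | positive test) by Bayes' rule, for the rates assumed in the tables.
    def p_disease_given_positive(incidence, sensitivity, false_positive_rate):
        p_positive = sensitivity * incidence + false_positive_rate * (1 - incidence)
        return sensitivity * incidence / p_positive

    # Table 7.3: 1% false-positive rate -> about 1 chance in 100
    print(p_disease_given_positive(0.0001, 0.9, 0.01))   # ~0.009
    # Table 7.4: .1% false-positive rate -> about 1 chance in 10
    print(p_disease_given_positive(0.0001, 0.9, 0.001))  # ~0.083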
Statistical results can also be ambiguous, or meaningless, because of un-
certainty about the way in which the samples on which they are based were
selected. If there is a possibility that a statistical claim is based on a sample
that was selected in a biased way, the sample cannot be taken to be representative
of the population from which it was drawn, nor can the claim be assumed to hold
for that population. The results of
polls in which the respondents are self-selected are highly suspect, for exam-
ple, because people who voluntarily respond to a poll on a specific issue can-
not be assumed to be a random sample from the population of potential
respondents and therefore representative of the population as a whole with
respect to the issue involved.
The general moral of this story is that statistical assertions are very often
ambiguous, and sometimes even meaningless, and their ambiguity or lack of
meaning may go undetected if they are interpreted uncritically without an
effort to understand alternative interpretations they could have. Promoters
know rather well how to exploit ambiguous statistical claims. The kinds of
ambiguous claims that are sometimes made with the intent to mislead can also
be made in good faith and an unawareness of their ambiguity. The challenge to
the reader or listener to detect the ambiguity is the same in both cases, although
one may feel rather differently toward the originator of an ambiguous
statement that was intended to deceive than one does toward an individual who
speaks ambiguously from statistical naiveté.
In a very readable little book that does not require a grounding in mathematics
to be appreciated, Huff (1954/1973) has described numerous ways in which sta-
tistics—in the sense of the presentation of facts and figures—can be used, inten-
tionally or unintentionally, to mislead. More recently, Gilovich (1991) has done
so as well in a book with the delightfully ambiguous title How We Know What Is-
n't So. Other writers who have provided useful and highly accessible discussions
of how statistics can be misused or misunderstood include Jaffe and Spirer
(1987), Paulos (1990), and Dewdney (1993).
The idea of sampling is an essential element for making sensible decisions; in-
deed, it may be the basis of thought itself. (J. Cohen, 1957, p. 128)
Statistics appeared on the psychological scene in a major way first, following
the work of Fisher and of Neyman and Pearson, as a tool for making inferences
regarding the meaning of the outcomes of experiments. Perhaps because this
application of statistics was seen to be very successful, psychologists began to
use statistical inference procedures as metaphors for how people make deci-
sions in daily life. Gigerenzer and Murray point to the application of the statis-
tical theory of signal detection to human sensory and perceptual processes
(Green & Swets, 1966; Swets, Tanner, & Birdsall, 1961; Tanner & Swets,
1954) as an especially noteworthy example of the fruitful use of this metaphor.
Gigerenzer and his colleagues (Gigerenzer & Murray, 1987; Gigerenzer et
al., 1989) have suggested that psychologists have a tendency to make meta-
phors of mind based on tools and instruments that they find useful. Especially
notable examples of such "tools-to-theories" transactions have involved statis-
tics and computers. These metaphors, both of which emerged around 1960,
have had a major impact on the subsequent history of psychological research.
Not only did they provide conceptual models for perceptual and cognitive phe-
nomena and frames of reference within which the phenomena can be viewed,
they helped determine the kinds of research questions that were asked.
According to Gigerenzer and Murray (1987), as a consequence of the
application of signal detection theory to the study of sensation and perception,
and related work:
[The mind, at least in certain functional contexts,] was now pictured as a statisti-
cian of the Neyman and Pearson school. The processes of "inference," "decision,"
and "hypothesis testing" were freed from their conscious connections and seen as
unconscious mechanisms of the brain. Thus, uncertainty, in the sense of uncertain
inferences and decisions, became an essential feature of cognitive processes, and
computation of distributions and likelihoods, random sampling and power analy-
sis became the mind's way of coping with this uncertainty. (p. 60)
Problems were constructed so that they could be answered by calculating proba-
bilities, means, variances, or correlations.... The new vocabulary for under-
standing human reasoning was the vocabulary of the statistician; the new
elements of thinking were numbers (probabilities), and the process of thinking
itself was explained by statistical operations such as calculating likelihood ra-
tios. The theoretical questions asked by experimenters, the problems posed to
the subjects, and the explanations sought all reflected the fascination with prob-
ability and statistics. (p. 147)
SUMMARY
There can be little doubt of the power of statistical methods for making infer-
ences when they are thoughtfully applied with a clear understanding of their
limitations and the assumptions that underlie them. But the cautions that have
been raised by various writers (Abelson, 1995; Gigerenzer et al., 1989; Good,
1983a) about the dangers of applying these methods in cookbook fashion and
the temptation to gloss over the complexity of the ideas in the interest of simpli-
fying instruction or assuring the noncontroversial nature of a science's pre-
dominant methodology cannot be ignored.
Psychology, and other disciplines that make equally heavy use of statistical
inference techniques, would be better served if the training of students seeking
advanced degrees in the field put much more emphasis on the logic and
frequently debatable assumptions that underlie the statistical procedures that
have been developed for inference and less on the mechanics of the
procedures themselves. Such training should include a thorough acquaintance
with the history of probability theory and the philosophical questions that
center on the concept of chance. Among the accounts of various aspects of this
history and of many of the relevant issues are those of David (1962), Hacking
(1965, 1975, 1990), Porter (1986), Gigerenzer and Murray (1987), Krüger,
Daston, and Heidelberger (1987), Krüger, Gigerenzer, and M. S. Morgan
(1987), Gigerenzer et al. (1989), and Salsburg (2001).
Training in statistics should also provide a perspective in which its use in
hypothesis testing and decision making is viewed as one among other avenues
to discovery. Macdonald (1997b) has characterized psychology as "a mix of
evidence and theory held together by arguments" (p. 344). This seems right to
me. The results of statistical tests are among the evidences from which
arguments can be constructed, and they can be persuasive when used
appropriately, but they are by no means the only ones, and when used
inappropriately can be worse than no evidence at all.
In short, people who use statistics in their work to describe data sets or to
make inferences need to understand their tools, the rationales on which they
are based, and the assumptions that justify their use in particular instances. But
it is not only people who use statistics in their work who benefit from
knowledge of the subject. One must have some acquaintance with basic
statistics if one is to read the newspaper comprehendingly, if one is to avoid
being taken in by scams that capitalize on statistical naiveté, and if one is to be an
effective decision maker in an uncertain world.
CHAPTER 8
Estimation and Prediction
In the preceding chapters, much attention has been given to a variety of topics
relating to probability theory and statistics and relatively little to the results of
research on how, and how well, people think about probabilistic matters. The
results of some research have been discussed, but empirical studies have not
been the major focus. Beginning with this chapter, attention shifts to research,
its results, and the interpretations those results have been given.
Probabilistic or statistical reasoning has been studied in a variety of contexts
including clinical diagnosis (Meehl, 1954), management decision making (R.
V. Brown, Kahr, & C. R. Peterson, 1974), flood probability estimation by flood
plain residents (Slovic, Kunreuther, & White, 1974), experimental design
(Brewer & Owen, 1973), weather forecasting (Murphy & Winkler, 1974,
1977), climate change (National Academy of Sciences, 1983), accident analy-
sis (C. H. Green & R. A. Brown, 1978; Slovic, Fischhoff, & Lichtenstein,
1978), election outcome predictions (Black, 1958; I. Fischer & Budescu,
1994), among many others. The review of this research here is not intended to
be exhaustive, but it is extensive and the studies considered are intended to be
broadly representative of the work in the field.
If people were more capable of estimation and simple calculation, many obvious
inferences would be drawn (or not), and fewer ridiculous notions would be en-
tertained. (Paulos, 1990, p. 17)
The ability to estimate has not received the attention it deserves in education. It
is an extraordinarily useful ability. Evidence that this is beginning to be real-
ized is perhaps seen in one of six major recommendations in the 1980 agenda-
setting report of the National Council of Teachers of Mathematics: "Teachers
should incorporate estimation activities into all areas of the program on a regu-
lar and sustaining basis, in particular encouraging the use of estimating skills
to pose and select alternatives and to assess what a reasonable answer might
be" (p. 7). The importance of estimation skills has also been stressed by the
Curriculum Framework Task Force of the Mathematical Sciences Education
Board (1988). Although estimates can be made of many types of variables—
for example, quantities, magnitudes, durations—here attention is limited to
probabilistic or statistical variables.
When asked to observe a set of numbers and to estimate some measure of cen-
tral tendency, such as its mean, people are able under some conditions to pro-
duce reasonably accurate estimates (Beach & Swensson, 1966; Edwards,
1967; C. R. Peterson & Beach, 1967), although systematic deviations from ac-
tual values have also been reported (N. H. Anderson, 1964; I. P. Levin, 1974,
1975). When the numbers whose means are to be estimated have been pre-
sented sequentially, effects both of primacy (influence of the first few numbers
in the sequence) (Hendrick & Costantini, 1970) and of recency (influence of
the last few numbers) (N. H. Anderson, 1964) have been obtained. For skewed
distributions, estimates of means are likely to be biased in the direction of me-
dians (C. R. Peterson & Beach, 1967).
People's ability to estimate variability has been studied but less extensively
than their ability to estimate means. One focus of interest has been how per-
ceived variability depends on the mean around which the variability occurs.
Some investigators have reported results suggesting that perception of a distri-
bution's variability is not influenced by the distribution's mean (I. P. Levin,
1975; Pitz, Leung, Hamilos, & Terpening, 1976). Others have found the two
estimates to be related systematically. Estimates of the variability of a set of
numbers have been noted to decrease as the mean of the set increased and the
(absolute) variability remained the same; in other words, variances of the same
magnitude around a small mean and around a large mean appeared larger in the
former case (Beach & Scopp, 1968; Hofstätter, 1939; Lathrop, 1967). There is
some question as to the extent to which this relationship reflects a true mis-
perception as opposed to a confusion of variability in absolute terms with vari-
ability relative to a mean. A standard deviation of 20 pounds in the distribution
of weights of 100 freight cars seems a good bit smaller than a standard devia-
tion of 20 pounds in the distribution of weights of 100 people, even though in
absolute terms it is not. It could also stem, at least in part, from the linguistic
convention of making the interpretation of such words as small and large con-
tingent on the context in which they are used, so what may be happening is
analogous to when we consider a particular horse to be small or a particular dog
large, even though in absolute terms the former is larger than the latter.
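One way to make the distinction explicit is to compare the standard deviation
with the mean (the coefficient of variation). A small sketch, in which the mean
weights are hypothetical:

    # The same absolute standard deviation looks very different relative to the mean.
    cases = {"freight cars (lb)": (60_000, 20), "people (lb)": (160, 20)}
    for label, (mean, sd) in cases.items():
        print(f"{label}: sd = {sd}, sd/mean = {sd / mean:.5f}")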
Given a variable set of values, one's perception of its variability may depend
somewhat on the manner in which one becomes aware of the set. The variance
is likely to be perceived as larger, for example, if the values are presented in
random order than if they are presented in a regular ascending or descending
order (Lathrop, 1967). More information on work that has been done on the es-
timation of sample statistics and on proposed models of the process by which
estimates are made can be found in C. R. Peterson and Beach (1967), Pollard
(1984), and Busemeyer (1990).
The problem of estimating how long it will take and how much it will cost to
complete some task is a very common one in our society. Contractors must pre-
pare time and cost estimates when bidding on projects. Local, state, and na-
tional government agencies must make similar estimates when allocating
public resources to programs competing for the same funds. All of us, as indi-
viduals, estimate, for our own planning purposes, how long it will take us to do
specific things and, in some cases, what the costs will be.
How good are we at making such estimates? The answer to this question has
important implications for our individual and corporate lives. Especially inter-
esting and potentially significant is the possibility of systematic biases in the
way estimates are made and consistent errors in the results. If we are strongly
inclined to underestimate, or to overestimate, the time and/or money that will
be required to complete specific tasks, it should be helpful, for practical pur-
poses, to know that. And if we do tend consistently to err in one direction, we
would like to have an explanation of the fact.
A front-page article in the September 11, 1994, issue of the Boston Globe,
entitled "The Big Dig," reports progress on the Third Harbor Tunnel project in
Boston as of that time (Sennott & Palmer, 1994). Described in the article as
"the largest public works project in America," the Big Dig involves 7.5 miles of
construction including a tunnel providing access from south Boston under the
Boston Harbor to Logan Airport. Estimates of the cost of the project grew, after
the project was under way, from $360 million in 1976 to $7.74 billion in 1994.
The latter estimate was the official one as of the date of the article, but the writ-
ers noted that, unofficially, officials were acknowledging that the ultimate
costs were likely to be higher, perhaps as high as $10 or $12 billion. Official es-
timates of completion dates were 1998 as of February 1991, 2001 as of March
1992, and 2004 as of February 1993. A front-page article in the February 2,
2000, issue of the Boston Globe reported that Big Dig officials had just an-
nounced a $1.4 billion jump in the costs, bringing the then officially estimated
total to $12.2 billion (Palmer, 2000); the same article gave early 2005 as the
completion date, and described the project as being "roughly on track to be
completed on time." The September 2001 summary report of the Oversight
Coordination Commission of the Central Artery/Third Harbor Tunnel Project
(www.state.ma.us) gives $14.475 billion as the then current estimate of the to-
tal cost and a "substantial completion" date of 2004. (This report gives $2.3 bil-
lion as the initial cost estimate, made in 1984.) Some of the cost escalation is
because the scope of the project increased very considerably over the years,
some because of unanticipated complications encountered in the process of
construction, and some because the reported figures do not correct for infla-
tion, but even taking these factors into consideration, the history of this project
is a graphic example of how very easy it is to underestimate, by very large
amounts, the cost and time-to-completion of complex projects.
The Third Harbor Tunnel project is far from unique with respect to its history
of upward revisions of cost and time-to-completion estimates. Similar observa-
tions could be made about numerous major programs and projects. According to
Gibbs (1994), time and cost estimates are chronically too low in major software
development projects: "Studies have shown that for every six new large-scale
software systems that are put into operation, two others are canceled. The aver-
age software development project overshoots its schedule by half; larger projects
generally do worse" (p. 86). Pfleeger (1991) has made similar observations. One
gets the impression from the news media that cost and time overruns (underesti-
mates) are the rule in the government contracting world, if not in the world of
project planning more generally; and, it would seem, the more complex the pro-
ject, the greater the underestimation is likely to be.
But is this impression correct? It could be that the perception that overruns
(and underestimates) are the rule rather than the exception is a misperception
of the facts. Such a misperception could arise in a variety of ways. Perhaps
overruns are more likely to be reported by the news media than are projects that
finish on time and within budget. Even if this were not the case, it could be that
reports of overruns, especially those that involve large amounts of money and
some evidence or hint of scandal, capture our attention and linger in our memo-
ries more than do reports of projects without these characteristics. Granted the
possibility of the greater visibility and memorability of projects that exceed
time and cost estimates than of those that finish on time and within budget, I sus-
pect that underestimation is a more common occurrence than overestimation.
I do not mean to single out contractors here as poor estimators of how long it
will take to accomplish specific projects. My guess is that most of us are in this
boat. I know from experience that when I estimate how long it will take to finish
this book, I should make the best estimate of which I am capable, at least double
it, and then not be surprised when, in fact, it takes considerably longer than that.
My estimates of how long it will take to complete some personal chore seem al-
most invariably to miss the mark significantly on the low side; at least I am more
keenly aware of misses of this type than of those of the opposite one.
Some research has been done on the question of how well people predict the
time it will take them, or someone else, to perform a specified task. The results
of this research support the notion that people generally underestimate how
long it will take them to perform a specific task themselves, or they overesti-
mate how much they can do in a given time (Buehler, Griffin, & McDonald,
1997; Buehler, Griffin, & M. Ross, 1994, 1995; Byram, 1997; Hayes-Roth &
Hayes-Roth, 1979; Kidd, 1970), and this despite the fact that producing a spe-
cific estimate of how long it will take to complete a task may motivate one to
complete the task within that time, thus helping the estimate to become a
self-fulfilling prophecy (Sherman, 1980). And underestimates sometimes per-
sist despite efforts to improve them with debiasing techniques (Byram, 1997;
Newby-Clark, M. Ross, Buehler, Koehler, & Griffin, 2000).
Interestingly, it appears that people are less likely to be overly optimistic
when predicting the time it will take someone else to perform a task than when
predicting how long it will take themselves to do so; they seem more likely to
consider the possible effects of intrusions in the former case (Buehler et al.,
1994; Newby-Clark et al., 2000). However, experts in the performance of a
task may find it easy to underestimate how long it will take novices to do the
same task (Hinds, 1999).
Assuming that there is a general tendency to underestimate, what might ac-
count for it? One possibility that should not be overlooked is that of the opera-
tion of motivational factors. Inasmuch as contractors typically bid for jobs and
contracts are awarded to low bidders, any tendency they have to underestimate
time and costs might be attributed, at least in part, to the incentive the bidding
process provides to do so. Those who submit bids that take into account the
high probability that not everything will go exactly as planned are likely to find
themselves eliminated from the competition by others whose bids are based on
much more optimistic, if less realistic, projections. Of course, if the bid is bind-
ing in the sense that the contractor must deliver what is promised for the esti-
mated cost, this is a very risky strategy, but as it happens, in many contracts
time overruns carry no penalty and cost overruns often are collectable, so the
risk associated with underbidding may not be great.
Motivational factors could be involved in individuals' estimates as well as in
those of corporate entities. Estimating a relatively long time to perform a task
might be seen as an admission of lack of competence, for example, in which
case one might be inclined to lower one's best estimate of how long a task will
take, as opposed, say, to raising it. On this hypothesis, we underestimate how
long it will take ourselves to do something because we tend to believe that we
are more capable or more efficient than we really are, and we are more likely to
underestimate how long it will take us than to underestimate how long it will
take someone else, because we see ourselves as more capable and efficient than
our peers. The latter idea is consistent with numerous other self-serving biases
that research has revealed, such as the tendency to consider ourselves to have
more positive and fewer negative character traits than our peers (Alicke, 1985),
to be more fair-minded (Liebrand, Messick, & Wolters, 1986; Messick,
Bloom, Boldizar, & Samuelson, 1985), less prejudiced (Fields & Schuman,
1976; O'Gorman & Garry, 1976), better equipped for academic success or
marital happiness (Kunda, 1987), and better-than-average with respect to lead-
ership ability (College Board, 1976-1977).
Another possible determinant of chronic underestimation is the difficulty
we have in generating exhaustive lists of the members of vaguely defined sets
(and/or our tendency to overestimate the exhaustiveness of the lists we produce when we at-
tempt to do so). In estimating the time required to complete a task, one is likely
to imagine performing the task, which probably means going over it step-by-
step in one's mind, estimating the duration of each step. This process can go
wrong and lead to an underestimate in either of two ways. First, one may over-
look one or more steps. This possibility seems intuitively more likely than that
of imagining extra steps that are not required. Second, in estimating the dura-
tion of any given step, one may overlook things that could go wrong or fail to
think of complicating factors that could increase the time required to perform
it. Again this possibility seems more likely than the offsetting one of factoring
in to one's estimate complications and problems that do not arise. It appears
that when people imagine how a future project may go, they typically do not
imagine the various ways in which it might go wrong (Buehler et al., 1994).
Providing people with an explicit list of what can go wrong may improve esti-
mates in some cases (Engle & Lumpkin, 1992).
Evidence that people typically overestimate the probability of conjunctions
of independent events and underestimate the probability of disjunctions
(Bar-Hillel, 1973; L. J. Cohen, Chesnick, & Haran, 1971, 1972) also can help
account for any general tendency to underestimate the time required to com-
plete a task, especially a complex one. If a task is to be completed in minimum
time, all of the several subtasks must go right (estimating the probability of
their doing so is estimating the probability of a conjunction of events) and it
must be that almost none of the possibilities for something to go wrong materi-
alizes (estimating the probability of one or more of these events happening is
estimating the probability of a disjunction). If one is motivated to minimize es-
timates of time and costs, one is likely to do so for each of the subtasks. What is
easy to overlook is that even if the probability of a delay-causing problem is
small in each case, the probability that such a problem will occur in at least one
of the components can be large. The assumption that everything will proceed
as planned and that no contingencies will arise has been called by Kahneman
and Tversky (1979b) the planning fallacy.
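The conjunctive-disjunctive arithmetic is easy to check. A minimal sketch,
assuming (hypothetically) that each of n independent subtasks has only a .05
chance of delay:

    # Probability that at least one of n independent subtasks is delayed --
    # the disjunction that planners tend to underestimate. p is hypothetical.
    p_delay = 0.05
    for n in (1, 5, 10, 20, 40):
        p_at_least_one = 1 - (1 - p_delay) ** n  # complement of "all go right"
        print(f"{n:2d} subtasks: P(at least one delay) = {p_at_least_one:.2f}")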
Another conjecture relating to task-time estimation is the following: When
working on a nontrivial task that is spread over a significant period (say
months), if one is asked at various times to estimate what percentage of the task
has been completed, one is more likely to overestimate the percentage com-
pleted and to underestimate what remains to be done than the reverse. This con-
jecture rests on the assumption that we are likely to be more keenly aware of the
details of the finished aspects of a task than of those of the remaining ones; we
can call the former to mind because we have recently experienced them, but the
latter are not so readily identified because we have to rely on our imagination to
identify them.
A similar explanation can be given for the finding that people tend to remem-
ber better their own contributions to a group effort than the contributions of
other members of the group, and therefore are likely to exaggerate the relative
size of their own contributions (Brenner, 1973, 1976; Johnston, 1967; M. Ross
& Sicoly, 1979). Such a bias may be accounted for by a form of the availability
principle. One is bound to be more aware of the details of one's own work on a
collaborative task, especially with respect to those aspects of the effort that are
covert, than of the details of the work of one's collaborator(s). In recalling a
project, one is likely to have available more information pertaining to one's
own efforts than pertaining to the efforts of others, other things being equal.
Possibly the best-known misconception that relates to the ability of people to
predict probabilistic events is the gambler's fallacy. This concept was dis-
cussed in chapter 2. It suffices here to recall that it is the belief that a run of suc-
cessive occurrences of one type of random event (e.g., a run of heads in coin
tossing) will make an additional occurrence of that event less likely, or that
when a sample of ongoing random events shows a "deficit" of one type of
event, the probability of the imminent occurrence of that event is increased.
This fallacy has been known for a long time and has been demonstrated in
many experiments.
An experimental situation that has been widely used to study predictive behav-
ior involves giving a person the task of predicting, on each trial of the experi-
ment, the outcome of a probabilistic event, such as the color of the next flash of
a light that has been flashing different colors, say red and green, on an irregular
schedule. In this situation, people often predict a particular event, say "red,"
with about the same relative frequency as its actual occurrence and, as a conse-
quence, end up with a smaller percentage of correct predictions than they could
have obtained by simply predicting the more frequently occurring event on ev-
ery trial (Estes, 1964; Kintsch, 1970; Meyers, 1976). If, for example, the light
were flashing red on a random 70% of the trials, predicting red on 70% of the
trials would result in a success rate of approximately (.7 x .7) + (.3 x .3) = .58,
whereas predicting red on every trial would ensure a success rate of .7.
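The comparison is easy to verify by simulation. A minimal sketch, assuming the
70/30 schedule of the example (the trial count and seed are arbitrary):

    # Probability matching versus always predicting the more frequent event.
    import random

    random.seed(1)
    trials = 100_000
    p_red = 0.7
    match_correct = max_correct = 0
    for _ in range(trials):
        red = random.random() < p_red                       # actual flash color
        match_correct += (random.random() < p_red) == red   # predict red on 70% of trials
        max_correct += red                                  # always predict red
    print(f"matching:   {match_correct / trials:.3f}")  # about .58
    print(f"maximizing: {max_correct / trials:.3f}")    # about .70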
Analogous forms of suboptimal behavior have been observed in a variety of
contexts in which people would do well to make all-or-none choices but use
some other strategy instead. Even experienced poker players may make this
mistake. Lopes (1976), for example, found that the sizes of the bets that players
were willing to place increased in proportion to their subjective probability of
winning. This is a suboptimal strategy. At least in the long run, the expected
gain is maximized by betting the minimum possible amount whenever one be-
lieves one has a less than even chance of winning and the maximum possible
amount whenever one believes one's chances of winning are better than even
(N. H. Anderson, 1979).
Arkes, Dawes, and Christensen (1986) had people judge, on the basis of the
number of A's that specific students had received in three courses, whether or
not those students had graduated with honors. Participants were shown for
each possible number of A's (0-3) the percentage of students receiving that
number who graduated with honors. A minority of the students who received 0
or 1 A graduated with honors, whereas a majority of those who received 2 or 3
A's did. The percentages were such that participants who followed the rule of
guessing honors for all students who had received 2 or 3 A's and not honors for
all who had received 0 or 1 would have been sure of getting about 70% correct,
and they were given this information. Many participants did not use this rule,
however, and consequently got fewer than 70% correct.
This was especially true of those participants who had been explicitly en-
couraged to try to do better than 70% by being "extremely observant" and of
those who had been motivated by the promise of a monetary reward for espe-
cially high performance. Those who were told that even experts could expect to
do no better than 70% on the task and that people who tried to do better would
actually do worse were more inclined to use the simple binary rule. Arkes et al.
(1986) interpreted their results as demonstrating that an increased incentive to
do well can cause a decrease in performance on a probabilistic judgment task.
Such a finding is especially thought-provoking in view of the fact that high mo-
tivation characterizes many of the circumstances under which professional di-
agnoses (say for medical or psychological purposes) are performed.
It will not escape notice that in this experiment use of the simple strategy
of saying yes if the student has 2 or more A's makes the task trivially simple,
and perhaps boring. One might decide to try to beat the odds (to do better than
the 70% that is guaranteed by use of the two-or-more-A's-equals-yes rule),
realizing that the probability of doing so is small. What one gains by this de-
cision is participation in a real gamble with a long-shot chance of beating the
odds; what one gives up is participation in a boring game. There is some evi-
dence that even in casino gambling situations people may be motivated by the
desire not only to win money, but to engage in interesting or exciting games
(Keren & Wagenaar, 1985).
In a second experiment, Arkes et al. (1986) had people attempt to choose
from among three candidates the baseball player who won the "most valuable
player" award in the National League each year from 1940 to 1961 (excluding
a few years in which pitchers won the award). Four items of information were
provided for each candidate for each year: batting average, number of home
runs, number of runs batted in, and standing of the player's team at the end of
the year. All participants were informed that in about 75% of the cases choice
of the player whose team finished highest in the standings would be correct.
Those who were only moderately familiar with baseball tended to make use
of this fact and selected the player solely on the basis of his team's standing;
those who were highly familiar with baseball did not follow this simple rule.
The former participants did better than the latter as a group, although they ex-
pressed less confidence in their choices. Arkes et al. (1986) characterized the
moderately knowledgeable participants as slightly underconfident and the
highly knowledgeable ones as seriously overconfident, and noted that one of
the dangers of overconfidence is the assumption that no assistance is needed to
assure good decisions.
On the basis of a review of studies involving prediction in probabilistic situ-
ations, Meyers (1976) concluded that it is extremely difficult to get people to
follow strictly an optimal strategy. One factor that may contribute to the failure
of people to adopt optimal strategies in probabilistic prediction tasks is the pos-
itive reinforcement they receive on those trials on which their predictions
prove to be correct. If it is true that people pay more attention to their successes
than to their failures, then every time one makes a correct prediction, one's be-
havior, whatever it is, is reinforced. When one "probability matches" in proba-
bilistic prediction tasks, one can get a fair amount of reinforcement and that
may be enough to sustain the behavior. Indeed, as may be seen from Table 8.1,
in the two-alternative situation, the difference between the guaranteed percent-
age correct with an optimal strategy and the expected percentage correct with a
matching strategy is not enormous even at its maximum, when the probabili-
ties of the two events are .75 and .25, and it is quite small when the probabilities
of the two events are either very different or nearly the same.
This line of reasoning gets some support from the finding that item-by-item
feedback on the outcomes of probabilistic choice tasks sometimes results in
poorer performance than complete lack of feedback (Arkes et al., 1986;
Hammond, Summers, & Deane, 1973; Schmitt, Coyle, & King, 1976). Feed-
back may encourage people to focus on individual trials and to make each pre-
diction contingent on the outcome of the immediately preceding trial, using
some simple strategy such as "stay with same prediction following successful
trial, change prediction following an unsuccessful one." Such a strategy would,
of course, result in probability matching. It also may be, as Dawes (1979) has
suggested, that people feel compelled to try to account for all of the variance in
such tasks even though it is not possible to do so.
It could also be that college students participating in probabilistic prediction
experiments are unwilling to use optimal strategies, and clinicians resist using
algorithmic approaches to diagnosis, for the simple reason that, in both cases,
following the guaranteed strategy makes the task mechanical and boring.
TABLE 8.1
Guaranteed and Expected Success Rates for Optimal and Matching Strategies,
Respectively, in Two-Alternative Event Prediction Tasks
[Table entries not legible in this reproduction. The guaranteed rate for the optimal
strategy is p, the probability of the more frequent event, and the expected rate for
the matching strategy is p x p + (1 - p) x (1 - p).]
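Because the table's cell entries did not survive reproduction, the following
sketch computes the two quantities the table compares, where p is the probability
of the more frequent event (the particular values of p shown are my choice):

    # Guaranteed success rate for the optimal strategy (always predict the more
    # frequent event) versus the expected rate for probability matching.
    for p in (0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0):
        optimal = p
        matching = p * p + (1 - p) * (1 - p)
        print(f"p = {p:.2f}: optimal {optimal:.3f}, matching {matching:.3f}, "
              f"difference {optimal - matching:.3f}")  # largest gap at p = .75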
Our passions, our prejudices, and dominating opinions, by exaggerating the prob-
abilities which are favorable to them and by attenuating the contrary probabilities,
are the abundant sources of dangerous illusions. (Laplace, 1814/1951, p. 160)
Suppose you are to enter a lottery in which the probability of winning is .1.
Should you care whether the .1 represents there being one winning ticket in 10,
10 winning tickets in 100, 100 winning tickets in 1,000, ... ? As long as 1 ticket
in every 10 sold is a winner (and wins the same amount), what should it matter
how many tickets are sold? It appears that many people do have a preference
for a larger number of tickets and winners, for a fixed probability of winning.
Kirkpatrick and Epstein (1992) demonstrated this, for example, with an exper-
iment in which people who were to win a prize if they drew a red bean from a
bowl preferred to draw from a bowl that contained 100 beans, 10 of which were
red, than from a bowl containing 10 beans, 1 of which was red. Denes-Raj and
Epstein (1994) found that some people even preferred to draw from a bowl that
contained 100 items, from 5 to 9 of which were red, rather than from a bowl contain-
ing 10 items, 1 of which was red; these people actually preferred the situation
that provided the lower probability of winning.
When a compound event is composed of two or more independent compo-
nent events, the probability of the compound event is the product of the proba-
bilities of the component events, and the order of occurrence is irrelevant. That
order of event occurrence in such cases may not be seen as irrelevant was
shown by an experiment by Rowen (1973) in which participants were required
to choose one of two mutually exclusive actions both of which had the same
probability of success. However, both actions were composed of two inde-
pendent steps, the probability of success of which differed. Participants tended
to select the action composed of steps the first of which had the greater proba-
bility of success.
In most laboratory studies of the ability of people to predict probabilistic
events, the events are of no great consequence or intrinsic interest to the par-
ticipants. We can probably safely assume that, for the most part, people have
no strong preference for the occurrence of one of the possible events over that
of the other. Presumably they want their predictions to be as accurate as pos-
sible, because making correct predictions is their task, but, apart from this
consideration, whether the red light comes on more frequently than the green
one is of little concern.
In many real-life situations involving probabilistic events, we care a lot
about outcomes and strongly prefer some of the possibilities over others. We
care, for example, about how likely we are to end up as a highway fatality sta-
tistic, about the probability that the global temperature really is on the rise,
about the prospects of an improved economy, maybe even about the likeli-
hood that the weather will be good tomorrow. Do our preferences for particu-
lar outcomes over others affect our ability to estimate the probabilities of
uncertain events? And if so, how much of an effect do they have, and what is
the nature of that effect?
Sundstrom, Lounsbury, DeVault, and Peele (1981) took opinion polls of
people living in a small community near the planned site of one of the world's
largest nuclear-power plants as of 1980. They related their results to an expec-
tancy-value attitude model that assumed one's attitude toward some object, in
this case the power plant, is a simple sum of the possible positive and negative
consequences of the object's existence each multiplied by the individual's ex-
pectation of its occurrence. The results supported such a model inasmuch as
expressed attitudes reflected people's opinions about the relative probabilities
of positive and negative consequences of the plant's existence. People did not
give equal weight to all possible consequences, however, but tended to over-
weight some and underweight others. The correlation between the degree to
which one liked or disliked a potential consequence and its estimated probabil-
ity of occurrence was high, suggesting that the two judgments were not inde-
pendent. In a very different context, Babad and Katz (1992) showed that soccer
fans were likely to overestimate the probability that their favorite team would
win a specified game.
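As a rough illustration of such an expectancy-value model (the consequences,
values, and expectancies below are invented for the sketch, not taken from the
study):

    # Attitude = sum over consequences of (subjective value x expected probability).
    consequences = [                    # (description, value, expectancy) - hypothetical
        ("jobs and tax revenue", +3, 0.8),
        ("cheaper electricity", +2, 0.5),
        ("serious accident", -10, 0.05),
        ("routine emissions harm", -4, 0.2),
    ]
    attitude = sum(value * expectancy for _, value, expectancy in consequences)
    print(f"attitude score = {attitude:+.2f}")  # positive means favorable on balance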
The idea that how desirable, or undesirable, one considers an event to be
may affect one's estimate of the probability of its occurrence finds consider-
able support in the experimental literature (S. J. Hoch, 1985; Irwin, 1953;
MacCrimmon, 1968; Marks, 1951; Slovic, 1966; Weinstein, 1980). When
asked to predict cards drawn at random from packs containing cards represent-
ing different payoffs, people often tend to overestimate the likelihood of draw-
ing the more desirable cards (Irwin, 1944, 1953; Irwin & Snodgrass, 1966).
There is also some evidence that people tend to overestimate the probability
of an event to which their attention has been directed—a focal event. Gibson,
Sanbonmatsu, and Posavac (1997) found that people were more likely to bet on a
national basketball team they had been instructed to imagine winning than to bet
on any of a set of competing NBA teams. These investigators see this behavior as
an example of the more general finding of selective hypothesis testing, whereby
people tend to overrate the likelihood of a hypothesis that they are considering
relative to those of competing hypotheses that could be considered.
Lotteries
Laplace (1814/1951) argued that people who play at lotteries generally do not
understand the odds against their winning: "They see only the possibility by a
small stake of gaining a considerable sum, and the projects which their imagi-
nation brings forth, exaggerate to their eyes the probability of obtaining it; the
poor man especially, excited by the desire of a better fate, risks at play his ne-
cessities by clinging to the most unfavorable combinations which promise him
a great benefit" (p. 161). Laplace also pointed out that the player's exaggera-
tion of the prospects of winning is bolstered by the publicity that winning re-
ceives and that losing does not: "All would be without doubt surprised by the
immense number of stakes lost if they could know of them; but one takes care
on the contrary to give to the winnings a great publicity, which becomes a new
cause of excitement for this funereal play" (p. 161).
Do people who play lotteries really understand the odds against their win-
ning? Probably not. Certainly promotions of lotteries, including those run by
the state, are not designed to stress the fact that almost everyone who plays
loses. Publicity that follows drawings invariably focuses on winners. Seldom,
if ever, does one hear or read of the many people who, week after week, put
money that they can ill afford to lose into the black hole of a public lottery in the
unrealistic hope of striking it rich.
Would people play lotteries if they did understand the odds? Perhaps some
who now do so would not, but undoubtedly many would. No matter what the
odds, and how well they are understood, one can always take the position
"Someone has to win, and my chances are as good as anyone else's."
Is this a rational position? If one defines rationality in terms of the maximiza-
tion of expected value, it clearly is not. Lotteries are designed to generate income
for the entities (state, organization) that run them, which means that the payout to
the winner(s) must be less (and typically is considerably less) than what is taken
in from participants. It follows that the expected value of a ticket (the product of
the amount that would be won and the probability of winning) must be less (and
it typically is considerably less) than its cost. So when one buys a lottery ticket
for $1.00, one, in effect, pays $1.00 for something with an expected value of, per-
haps, 50 cents. Not very rational behavior from an expected-value point of view.
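A minimal sketch of this arithmetic, with a hypothetical jackpot and hypothetical odds chosen to yield the 50-cent figure:

```python
# Hypothetical lottery: a $1.00 ticket, a $1,000,000 jackpot,
# and 1-in-2,000,000 odds of winning (illustrative numbers only).
ticket_cost = 1.00
jackpot = 1_000_000
p_win = 1 / 2_000_000

expected_value = p_win * jackpot              # = $0.50
expected_loss = ticket_cost - expected_value  # = $0.50 per play, on average
print(f"expected value: ${expected_value:.2f}; expected loss: ${expected_loss:.2f}")
```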
But one might argue that this is a much too simple-minded assessment of the
situation. It completely ignores the value of the pleasure one gets from playing
the game. People spend money all the time on activities from which they have
no expectation of a monetary return (movies, concerts, roller coaster rides,
trips around a golf course). Like many activities, playing the lottery has enter-
tainment value, win or lose.
Compulsive gambling is not a new phenomenon, but as state lotteries have
become increasingly common, it appears to have become an increasingly no-
ticeable and troublesome problem with societal ramifications. It is a bit ironic
to hear enticing promotions of state-run lotteries and public-service announce-
ments of state-sponsored clinics for compulsive gambling nearly juxtaposed
on the airways. It would be good to know more about how people's tendency to
be overconfident of their ability to beat the odds, especially when bolstered by
lopsided reporting of lottery outcomes, contributes to this problem.
Sample-Size Intuitions
Researchers have focused both on the question of whether the law of large
numbers—which refers in this context to the tendency of large random sam-
ples, but not small ones, to resemble the populations from which they were
drawn—is reflected in the intuitions of people untrained in statistics, and on
the extent to which people who have had such training, such as research
psychologists, make statistical decisions that are inconsistent with it. Evans
(1989) points out that this research has concentrated on the issue of whether
people overestimate the power of small samples and that little, if any, attention
has been given to the question of whether people understand the diminishing
returns to be obtained from ever-increasing sample size. This is unfortunate, he
suggests, because "optimal decision behavior requires one to sample suffi-
ciently but not excessively" (p. 34).
Among the apparently faulty intuitions that have been observed about
probabilistic events is the expectation that small samples will resemble too pre-
cisely the populations from which they are drawn. For example, it has been
claimed that when people are asked to predict a sequence of tosses of a fair
coin, the proportion of heads in arbitrary short segments of the sequences they
produce tends to be closer to .5 than would be expected according to probabil-
ity theory (Tune, 1964). Tversky and Kahneman (1971) refer to the intuitions
behind such judgments as the "law of small numbers," according to which
small samples are assumed to be highly representative of the populations from
which they are taken. (See also Wagenaar, 1970, 1972.)
During the first couple of weeks of a major-league baseball season, the lead-
ing batting average is likely to be between .400 and .500. By the end of the sea-
son it is almost certain to be under .400. An interpretation that is sometimes put
on this fact is that batters get worse or pitchers improve as the year wears on. A
more plausible account recognizes that small samples are more likely than
large ones to produce deviant numbers. This explanation could be tested by
computing batting averages over 2-week periods throughout the season. On
the sample-size hypothesis, one would not expect the leading 2-week average
to decrease systematically over the year. Of course there may be systematic
trends superimposed on the sample-size effect, but the point is that even in the
absence of such trends the sample-size effect would be expected to yield the
observed decrease in the leading average, when that average is computed on a
steadily increasing sample size.
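The test proposed above is easy to run in simulation. The sketch below uses illustrative parameters (100 equally skilled .270 hitters, thirteen 2-week periods of about 50 at-bats each, none of it from the text): the league-leading cumulative average shrinks toward .270 as samples grow, whereas the leading single-period average merely fluctuates.

```python
import numpy as np

rng = np.random.default_rng(1)
n_batters, true_avg = 100, 0.270          # every batter has the same true ability
periods, ab_per_period = 13, 50           # thirteen 2-week periods, ~50 at-bats each

hits = rng.binomial(ab_per_period, true_avg, size=(periods, n_batters))
cum_hits = hits.cumsum(axis=0)
cum_ab = ab_per_period * np.arange(1, periods + 1)[:, None]

# Leading average over growing cumulative samples: drifts down toward .270.
print("leading cumulative average:", np.round((cum_hits / cum_ab).max(axis=1), 3))
# Leading average within each separate 2-week window: no systematic decline.
print("leading 2-week average:   ", np.round((hits / ab_per_period).max(axis=1), 3))
```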
Kahneman and Tversky (1972) have observed that people often do not take
sufficient account of sample size when estimating probabilities for which sam-
ple size is a relevant consideration. The following problem illustrates the point:
A certain town is served by two hospitals. In the larger hospital about 45 babies
are born each day, and in the smaller hospital about 15 babies are born each day.
As you know about 50 percent of all babies are boys. The exact percentage of
baby boys, however, varies from day to day. Sometimes it may be higher than 50
percent, sometimes lower.
For a period of one year, each hospital recorded the days on which (more/less)
than 60 percent of the babies born were boys. Which hospital do you think re-
corded more such days? (p. 443)
Of those participants who selected one hospital or the other (they were also
allowed to indicate indifference between them) about half selected the one
with the larger number of births. The result appears to indicate a lack of under-
standing of the fact that large deviations from a mean, in percentage terms, are
more likely in small samples than in larger ones.
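The binomial arithmetic behind the correct answer can be made explicit. A minimal sketch, assuming independent births each with probability .5 of being a boy:

```python
from scipy.stats import binom

for n in (15, 45):                  # daily births: smaller vs. larger hospital
    k = int(0.6 * n)                # "more than 60 percent boys" means X > k
    p_day = binom.sf(k, n, 0.5)     # P(X > k) for X ~ Binomial(n, 0.5)
    print(f"n = {n}: P(>60% boys on a given day) = {p_day:.3f}; "
          f"expected such days per year = {365 * p_day:.0f}")
```

The smaller hospital's daily proportion fluctuates more, so it records such days considerably more often.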
Several subsequent studies have shown that people's performance on such
problems can be quite sensitive to the way in which they are worded (Bar-Hillel,
1979; Evans & Dusoir, 1977; C. L. Olson, 1976), and this raises the possibility
that what is sometimes attributed to faulty statistical intuitions could be due, at
least in part, to a failure to comprehend the problem as given. Evans and Dusoir
(1977) found, for example, that when participants were asked which of the two
hospitals described previously was more likely to have a day on which all babies
born were boys, and were forced to select one or the other, 85% selected the
smaller hospital. When asked directly whether an accurate result was more
likely to come from a statistical estimate based on a small sample or on a large
one, nearly all participants correctly indicated the large sample.
Some research psychologists who have had formal training in statistics and
experimental design believe that a sample randomly drawn from a population
will be similar to the population in all important respects (Tversky &
Kahneman, 1971). This leads to the expectation that two randomly drawn
small samples from the same population will be more similar to the population
and to each other than sampling theory would predict, and to such errors of sta-
tistical reasoning as overestimation of the replicability of results from a single
experiment and failure to consider the possibility that unexpected results could
be due to sampling variability. In general, it appears that people often are un-
aware of or ignore the importance of the validity of the information on which
predictions are based and are unaware that in the case of low validity, predic-
tions should be regressed toward some central tendency measure such as the
mean (Kahneman & Tversky, 1973).
J. Cohen (1962) has reported evidence that the statistical power of many
psychological experiments is extremely low, which means that by using small
samples experimenters run a high risk of failing to detect effects that are real,
which is to say, they risk failing to reject the null hypothesis when it is false.
Tversky and Kahneman (1971) obtained evidence on this issue by asking pro-
fessional psychologists a variety of questions regarding the design of studies
intended to replicate or check previously obtained experimental results. In
general, the sample sizes the respondents specified for replication studies were
sufficiently small to guarantee a large percentage of failures to find effects that
were there to be found. In Tversky and Kahneman's words, the believer in the
law of small numbers "gambles his research hypothesis on small samples with-
out realizing that the odds against him are unreasonably high" and in "evaluat-
ing replications, his or others, he has unreasonably high expectations about the
replicability of significant results" (p. 109).
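The power arithmetic behind this conclusion is easy to check. A minimal sketch, using a two-sided, two-sample t-test and a medium standardized effect size of d = 0.5 (illustrative values; these are not Tversky and Kahneman's materials):

```python
import math
from scipy.stats import t, nct

def power_two_sample_t(d, n, alpha=0.05):
    """Power of a two-sided, two-sample t-test with n subjects per group
    and true standardized effect size d."""
    df = 2 * n - 2
    nc = d * math.sqrt(n / 2)               # noncentrality parameter
    crit = t.ppf(1 - alpha / 2, df)
    return 1 - nct.cdf(crit, df, nc) + nct.cdf(-crit, df, nc)

for n in (10, 20, 50, 100):
    print(f"n per group = {n:>3}: power = {power_two_sample_t(0.5, n):.2f}")
```

With 20 subjects per group, power is only about .33; a replication of that size would fail to detect a real medium-sized effect roughly two times in three.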
Somewhat in contrast to the evidence that indicates the existence of an intu-
itive law of small numbers, there is also evidence that people have some sensi-
tivity to the importance of sample size in statistical reasoning and are able to
apply this sensitivity in a discriminating way (C. R. Peterson, DuCharme, &
Edwards, 1968). Expressed confidence in estimates of the mean or variance of
a set of numbers tends to increase (confidence intervals decrease) as the sample
size increases (Bar-Hillel, 1979; DuCharme & C. R. Peterson, 1969; C. R. Pe-
terson & Beach, 1967). People seem to understand also that one needs a larger
sample to justify a generalization about a characteristic that is likely to be
highly variable in the population than to justify a generalization about a char-
acteristic that is likely to be more constant (Evans & Pollard, 1985; Nisbett et
al., 1983). (Understanding of this principle does not preclude stereotyping,
which can rest on underestimation of within-group variability with respect to
specific characteristics.) How likely people are to use the law of large numbers
effectively in reasoning depends, in part, on whether the situation about which
they are reasoning seems to be governed by chance effects to a substantial de-
gree (Jepson, Krantz, & Nisbett, 1983). For example, people are less likely to
believe that a small sample of a slot machine's behavior is representative of its
long-term behavior than to believe that a small sample of an athlete's perfor-
mance is representative of his or her general ability.
It appears also that people are more likely to use the law of large numbers
when the events involved are highly "codable" than when they are not (Kunda
& Nisbett, 1986a; Nisbett et al., 1983). To be codable in this context means
roughly to be counted easily or to have features or aspects that are easily
counted. Kunda and Nisbett use sports activities as examples of codable
events. A basketball game has associated with it many events that can be
counted, such as number of baskets, rebounds, or assists per player. Social be-
havior is not likely to be so easily coded. Kunda and Nisbett suggest also that
people are more likely to use the law of large numbers for familiar domains and
problem types than for unfamiliar ones, which may be because more familiar
events tend to be more codable.
Population-Size Intuitions
By comparison with the research that has been done on intuitions regarding
sample size, relatively few studies have addressed the question of intuitions
about population size or, more specifically, about sample-to-population ratio.
The few relevant studies that have been reported suggest that people tend to be-
lieve that the larger the population, the larger the sample that is needed to repre-
sent it adequately, or, conversely, that small samples are likely to be more
appropriate for smaller populations than for larger ones (Bar-Hillel, 1979; Ev-
ans & Bradshaw, 1986). Evans and Bradshaw's participants selected popula-
tion size as the most important of several variables as determinants of how
many batteries should be sampled from a truckload of either 10,000 or 20,000
to decide whether the entire load satisfied a statistical acceptance criterion.
Some readers may be surprised to learn that, within broad limits, the adequacy
of a sample is independent of the size of the population from which it is drawn.
"Within limits" is an important qualification here, because it is not the
case that population size is never relevant to the question of adequate sample
size. A random sample of modest size that almost exhausts the population
from which it is drawn is likely to represent its population somewhat better
than a random sample of equal size drawn from a much larger population. In
general, however, in situations in which random sampling makes practical
sense, the sample will be a relatively small proportion of the population, and
in these cases what constitutes a sample size that is adequate to provide an es-
timate of a parameter with a given level of expected accuracy is essentially in-
dependent of the size of the population. A random sample of 1000, for
example, is likely to provide as accurate a representation of a population of
100,000 as of a population of 10,000.
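The claim can be verified with the standard error of a sample proportion; including the finite-population correction shows how little population size matters until the sample becomes a substantial fraction of the population. A minimal sketch:

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion of size n drawn without
    replacement from a population of size N (finite-population correction)."""
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

for N in (10_000, 100_000, 1_000_000):
    print(f"N = {N:>9}: SE for n = 1000, p = 0.5 -> {se_proportion(0.5, 1000, N):.5f}")
```

The three standard errors are nearly identical despite the hundredfold range in population size.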
L. J. Cohen (1982) makes a distinction between confidence and weight, and
argues that confidence is determined strictly by sample size but that weight is de-
termined by sample-to-population ratio. Confidence, according to Cohen's anal-
ysis, should depend on the proportion of random samples that resemble (within
specified limits) their parent populations with respect to some measure of inter-
est, and this is strictly a function of sample size, at least for samples that are small
relative to their populations. In contrast, weight or strength of evidence reflects,
at least in part, whatever legitimate reasons one has to assume that the sample is
not biased in any way, and one way to increase the probability that it is not is to
increase the fraction of the total population it contains. Again, this is easiest to
see in the case of samples that are relatively large fractions of their parent popu-
lations; there simply are fewer opportunities for bias with a sample that contains
90% of its population than with one that contains only 10%.
In one study of risk estimation, both physicians and college students overesti-
mated the risk of death from various specified diseases, although the physicians'
estimates were more nearly accurate than those of the students (J. J. J.
Christensen-Szalanski, Beck, C. M. Christensen-Szalanski, & Koepsell, 1983).
For both groups, estimates were directly related to the frequency with which the
diseases had been encountered. The investigators concluded that experts and
nonexperts probably use similar thought processes to make frequency estimates
but that they differ with respect to their exposure to the estimated events.
A specified risk is likely to be perceived as more serious the larger the num-
ber of deaths or critical injuries that could result from a single incident (C. H.
Green & R. A. Brown, 1978). Also the frequency of sensational causes of death
(homicide, tornado) tends to be overestimated, whereas that of more mundane
causes (asthma, diabetes) is typically underestimated (Lichtenstein, Slovic,
Fischhoff, Layman, & Coombs, 1978). These findings may be attributable in part
to the greater memorability of incidents affecting many people at one time in at-
tention-getting ways and of events with sensational causes, and they may rest in
part also on the tendency of the news media to give greater coverage to these
types of events (Combs & Slovic, 1979). Suggestive evidence that people's esti-
mates of the relative frequency of various causes of death are influenced by the
amount of media coverage given to them may be seen in the fact that accidental
death is likely to be estimated to be more frequent than death from stroke, al-
though the reverse is true in fact (Slovic, Fischhoff, & Lichtenstein, 1976).
Reporting by the media of information regarding risks tends to focus on
newsworthy events—often catastrophic incidents such as the 1986 nu-
clear-power plant meltdown in Chernobyl or the 1984 toxic-chemical release in
Bhopal. It is not necessary to deny the importance of such events to observe that
media focus on them can help foster inaccurate beliefs about the relative magni-
tudes of various threats to the environment. The perceived risk of nuclear con-
tamination increased considerably, for example, following the Chernobyl
incident (Midden & Verplanken, 1990; Renn, 1990; van der Pligt & Midden,
1990; Verplanken, 1989). More generally, the public's attitudes about various
man-made health hazards—for example, possible carcinogens—appear to be
highly correlated with their frequency of mention by the media, which often is at
variance with the views of the scientific community (Rothman & Lichter, 1996).
It may be that rare events receive more publicity than frequent events in part be-
cause of their rarity, and that they consequently come to be seen as more common-
place than they are. Paulos (1990), who makes this point, notes that sensational-
istic events—terrorist kidnappings and cyanide poisonings—are given a great deal
of media coverage, often of a highly emotional sort, whereas more mundane trage-
dies, such as the 300,000 deaths per year from smoking in the United States
alone—"roughly the equivalent of three fully loaded jumbo jets crashing each and
every day of the year"—are given little attention by comparison.
The effect of media attention is also illustrated by the crash of the Hindenburg
in Lakehurst, New Jersey, in 1937. This incident effectively brought to a halt the
use of zeppelins as a means of transportation and interest in developing the tech-
nology it represented, despite the fact that the 36 people who lost their lives in
this tragic event were the only people to die as the result of a mishap during 20
years of commercial airship travel (McPhee, 1973/1992). Only very recently,
more than half a century after the disaster, is interest in zeppelin-type airships as
a means of transportation being revived (Hangenlocher, 1999).
The history of lighter-than-air aircraft stands in rather striking contrast to
that of steamships. More than 3,000 people were killed as the result of explo-
sions of steamship boilers between 1816 and 1851. (Burke [1966] gives an ac-
count of how the commonality of such explosions gradually changed the
public's attitude, and that of their elected representatives, about governmental
regulation of private enterprise.) One must assume that the fact that the
Hindenburg incident was captured on film as it happened, and was widely pub-
licized immediately following its occurrence and for years thereafter, had
much to do with the association of disaster with this type of air travel in the
public's mind and with the abrupt change of attitude regarding lighter-than-air
aircraft as a means of transportation.
Extensiveness of media coverage undoubtedly is a major factor in shaping
public opinion about the relative seriousness of various risks. It seems unlikely,
however, to be the whole story. It does not account, for example, for the finding
that experts commonly overestimate both the dangerousness of events and the
frequency of occurrence of dangerous events in psychiatric contexts (Ennis &
Litwack, 1974; Steadman & Cocozza, 1974; Ziskin, 1975). Sensational, emo-
tion-evoking events are more likely than more bland events to capture our at-
tention when they occur and our imagination when they do not, and to be
remembered in either case.
Hallman and Wandersman (1992; see also Wandersman and Hallman, 1993),
who have taken the position that people's responses to environmental threats are
predictable and not as irrational as they sometimes appear to be, note that those
responses are determined not only by the perceived likelihood of a threat being
realized, but also by a variety of other factors. People are likely, for example, to
find a risk that is voluntarily taken to be more acceptable than a risk of equal
magnitude that is involuntarily imposed (Fischhoff, Slovic, & Lichtenstein,
1978; Starr, 1972). Risks that are under individual control, fairly distributed
among a population, natural, familiar, detectable, well understood by science,
and ethical are more acceptable than risks that are under governmental control,
unfairly distributed, artificial, exotic, undetectable, or unethical.
Responses to perceived risk appear to be affected by personal values
(Brody, 1984; Office of Technology Assessment, 1987), which, in turn, are in-
fluenced by the social and cultural contexts in which people live (Bradbury,
1989; Covello & B. B. Johnson, 1987; Vaughn & Nordenstam, 1991). Citing
the distinction made by Hance et al. (1988) between hazard factors, which are
the ones usually measured, objectively if possible, and outrage factors, which tend
to be more social, political, or ethical in nature, Wandersman and Hallman
(1993) suggest that people are likely to be concerned (or outraged) by a risk
surrounded by outrage factors even if its probability of realization is low. Other
investigators who have emphasized the predictability of the perception of and
response to risk include Vlek and Stallen (1980), Baum, Fleming, and Singer
(1983), and Cvetkovich and Earle (1992).
When assessing the acceptability of future, especially far-future, risks, it
seems appropriate to discount them to some degree just because they are in the
future, and many things can happen to modify or nullify them—one has no guar-
antee, after all, of even being alive at any future date. The evidence suggests that
people do discount the future in the sense of attaching greater importance to
present costs and benefits than to costs and benefits that could be realized at a fu-
ture time. Possible future calamities are perceived as less serious the further in
the future their possible occurrence is considered to be. Although it is hard to
quantify the problem, it appears that people find it easy to overdiscount the fu-
ture. Thus they satisfy immediate wants by incurring future debt that they are un-
able to pay; they smoke and use other addictive drugs with apparent lack of
concern for the long-term risks to their health, and they often fail to incur modest
immediate costs (health check-ups, insurance) in the interest of protecting
against serious future problems (Klatzky & Messick, 1995; Platt, 1973).
In the context of economic decision making, there appears to be a general
preference for risk aversion, the prevalence of which Kahneman and Tversky
(1979b) have referred to as "perhaps the best known generalization regarding
risky choices" (p. 263). Risk aversion appears to hold generally, especially
when one's choice is between positive alternatives, but risk seeking is often ob-
served when the alternatives between which one must choose are both nega-
tive, as, for example, when one's choice is between a sure loss and a gamble
involving a possible loss of greater magnitude and a possible gain (Fishburn &
Kochenberger, 1979; Markowitz, 1952; Shafir & Tversky, 1995). Kahneman
and Tversky (1979b) attribute the preference for risk aversion in the domain of
gains and for risk seeking in the domain of losses to the same psychological
principle—the overweighting of certainty. However, it is not just that certainty
is desirable, they suggest, but that certainty increases the aversiveness of losses
and the desirability of gains.
Risk aversion for choices among positive alternatives and risk seeking for
choices among negative alternatives are both incorporated in "prospect the-
ory," developed by Kahneman and Tversky (1979b). According to this theory,
the decision maker's focus is on the gains and losses that could be realized as a
consequence of the decision and not on the decision maker's final state of
wealth; and losses of specific amounts have a greater effect on subjective value
than do gains of the same amount. This aspect of the theory might be accounted
for by the fact that a gain of a specific amount is a smaller proportion of one's
resulting total wealth than is a loss of the same amount: A gain of $25 on $100
is one fifth of one's resulting total of $125, whereas a loss of $25 from $100 is
one third of one's remaining total of $75.
Buffon, in his "Essai d'Arithmétique Morale," published in 1777, also ar-
gues that a loss of a given amount is perceived as larger than a gain of the same
amount: If one stakes half one's wealth on a gamble, one stands to increase
one's wealth by one third and to decrease it by one half. Curiously, to justify the
one third in this argument, he divides the gain by the resulting total wealth, but
to justify the one half, he divides the loss by the original wealth. As Todhunter
(1865/2001) points out, the argument would be even stronger if he divided by
the resulting wealth in both cases (as do Kahneman and Tversky).
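Buffon's arithmetic, made explicit (with $W$ denoting initial wealth and a stake of $W/2$):

$$\text{win:}\quad \frac{W/2}{3W/2} = \frac{1}{3} \quad (\text{gain} \div \text{resulting wealth}), \qquad \text{lose:}\quad \frac{W/2}{W} = \frac{1}{2} \quad (\text{loss} \div \text{original wealth}).$$

Dividing the loss by the resulting wealth instead, as Todhunter recommends, gives $(W/2)/(W/2) = 1$: the perceived loss equals one's entire remaining fortune, a considerably starker asymmetry.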
Prospect theory has proved to be predictive of behavior in many probabilis-
tic choice situations (Kahneman & Tversky, 1979b; Shafir & Tversky, 1995).
The asymmetric relationship between the subjective values of gains and losses
is known as loss aversion. This relationship has important implications in a va-
riety of contexts. Shafir and Tversky (1995) point out, for example, that a me-
diator of a dispute could increase the chances of obtaining an agreement by
framing concessions as bargaining chips rather than as losses.
On the average, groups are inclined to take riskier decisions than individu-
als, and individuals are likely to be willing to accept greater risk when partici-
pating in a group decision-making process than when making decisions on
their own (Bem, Wallach, & Kogan, 1965; R. D. Clark, 1971; Kogan &
Wallach, 1967; Wallach & Kogan, 1965). This finding has been referred to as
the "risky-shift" phenomenon, and may be a special case of participation in
group discussion moving people to more extreme positions than they other-
wise would express (Myers & Lamm, 1976).
On the basis of several studies of risk-taking behavior, Slovic (1972) has
concluded that a propensity for risk taking is probably not a situation-inde-
pendent character trait. One's tendency to take risks in one situation has proved
not to be a good predictor of one's willingness to do so in qualitatively different
situations. Whatever the explanation for different propensities for risk taking,
it can be argued that from the point of view of rationality, one's attitude toward
risk must be taken as a given. Allais (1979/1990) puts it this way: "It is ac-
cepted that a rational individual's scale of psychological values may differ
from the monetary scale, and that he may have a greater or lesser propensity for
safety or for risk. There seems to be agreement that this is an issue of psychol-
ogy and not of 'rationality'" (p. 121).
Risk Communication
Inasmuch as both governmental policies and personal decisions often are made
in response to perceived risks, it is of some importance that reliable data re-
garding risks be communicated in a way that will be correctly understood by
the policy formulators and decision makers. In view of the considerable evi-
dence that many people have difficulty in dealing with probabilistic concepts,
finding effective ways to present risks to the general public is a challenge, and
recognized as such (Allen, 1987; Hance et al., 1988; National Research Coun-
cil, 1989). The point can be illustrated by reference to the Three Mile Island incident.
The information immediately forthcoming from various sources was inconsis-
tent and confusing (Goldsteen, Schorr, & Goldsteen, 1989). Lack of confi-
dence in communication from government officials is indicated by the flight of
about 200,000 people from the area following the issuing of a gubernatorial ad-
visory that pregnant women and preschool children living within 5 miles of the
plant (an estimated total of about 3,500 people) might want to evacuate and that
everyone living within 10 miles should consider staying indoors (Erikson,
1990). On the other hand, overreaction is not the invariable response to risk.
Many people refused to leave the vicinity of Mt. Saint Helens before its erup-
tion in May 1980 (Saarinen, 1980), and people do elect to live in areas where
risks of natural disasters are relatively high.
Several investigators have attempted to compare exposure to each of a vari-
ety of risks on a common metric, such as amount of decrease in one's life ex-
pectancy (B. Cohen & Lee, 1979; Sowby, 1965; R. Wilson, 1979). Slovic,
Fischhoff, and Lichtenstein (1981/1986) note that whereas such comparisons
may serve a useful educational purpose, they fail to capture the complexity of
people's attitudes toward risks that are affected by numerous variables, such as
the degree to which a risk is controllable or avoidable, the potential it has to as-
sume catastrophic proportions, and the threat it poses to future generations.
The same investigators take the position that appropriate presentations of fac-
tual material within a comparative framework can do much to counter mis-
perceptions and misestimates and to put risks in a more accurate perspective.
People are more likely to see data regarding risks as relevant to them person-
ally if the data are framed in terms of personal risk than if presented as popula-
tion statistics (L. Jeffrey, 1989; Sharlin, 1986).
SELF-EVALUATION
Many things depend on people's subjective assessment of what they know and
do not know: whether they volunteer for certain roles or tasks, whether they seek
further practice or instruction, and whether they instill confidence in others, as
well as the answers they give to questions from superiors and subordinates and
the affect they induce on others by facial expressions and body language.
(Jacoby, Bjork, & Kelley, 1994, p. 57)
The ability to reflect on one's own thought processes is a fascinating ability and
one that we do not understand at all well. We know that we have this ability,
however, and we use it in a variety of ways. We can make judgments and then
make judgments of those judgments. We can estimate the likelihood of a speci-
fied event and then rate our confidence in the estimate. We can assess what we
know on a particular subject relative to what there is to know.
This is not to claim that we can do these things well or that the self-evalua-
tion judgments we make are necessarily accurate. Such judgments must have
an accuracy greater than chance, or they would not be worth making, but it
would be surprising if they were highly accurate, given the introspective basis
on which they are made. Some of these judgments may even be a bit paradoxi-
cal. Consider, for example, the problem of assessing how much one knows rel-
ative to what there is to know about a given subject. In order to make such an
assessment, one must not only know what one knows about the subject, but in
some sense one must also know what there is to know—which is to say one
must know what one does not know—in order to make the comparison.
Psychologists have done numerous experiments in which people have been
asked to express their degree of confidence in various types of judgments that
they themselves have made. Interest in the topic goes back at least to the early
decades of the 20th century (Hollingsworth, 1913a, 1913b; Lund, 1925; Trow,
1923). One purpose of confidence-judgment studies has been to investigate
how confidence, or degree of certitude, relates to a variety of more objective
variables, such as the probability that a judgment made with a given level of
confidence is correct.
When participants in experiments have expressed confidence as probability
estimates or as ratings that can be transformed—with some plausible assump-
tions—into probability estimates, it has often been possible to compare these
probability estimates with performance on the primary task, determining for
each confidence estimate what percentage of the judgments on the primary
task to which that estimate was assigned were correct. Plots of actual percent-
age correct against percentage correct "predicted" by the confidence estimates
have been referred to as calibration curves; perfect calibration is represented
by the unit line, which indicates that for any given confidence estimate, X, the
proportion of all the judgments with that estimate that were correct was X. Cal-
ibration curves have been plotted by numerous investigators for judgments ob-
tained under a wide variety of conditions (J. F. Yates, 1990, 1994).
Several different measures of performance that can be derived from confi-
dence estimates of the type used in calibration studies have been defined. Three
are over/underconfidence, calibration, and resolution. The following defini-
tions are from Lichtenstein and Fischhoff (1977):
$$\text{Over/underconfidence} = \frac{1}{N}\sum_{t=1}^{T} n_t\,(r_t - c_t)$$

$$\text{Calibration} = \frac{1}{N}\sum_{t=1}^{T} n_t\,(r_t - c_t)^2$$

$$\text{Resolution} = \frac{1}{N}\sum_{t=1}^{T} n_t\,(c_t - \bar{c})^2$$

where $N$ is the total number of responses, $n_t$ is the number of times the response $r_t$ was used, $c_t$ is the proportion correct for all items assigned probability $r_t$, $\bar{c}$ is the overall proportion correct, and $T$ is the total number of different response categories (confidence levels) used.
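As an illustration of how these measures are computed (a minimal sketch, not from the original text; the toy data are hypothetical), the following groups a set of (confidence, correct) pairs by confidence level and applies the definitions above:

```python
from collections import defaultdict

def calibration_measures(judgments):
    """Over/underconfidence, calibration, and resolution from
    (confidence, correct) pairs, per the definitions above."""
    N = len(judgments)
    by_level = defaultdict(list)                  # group outcomes by confidence r_t
    for r, correct in judgments:
        by_level[r].append(correct)
    c_bar = sum(c for _, c in judgments) / N      # overall proportion correct
    over_under = calib = resol = 0.0
    for r_t, outcomes in by_level.items():
        n_t = len(outcomes)
        c_t = sum(outcomes) / n_t                 # proportion correct at level r_t
        over_under += n_t * (r_t - c_t)
        calib += n_t * (r_t - c_t) ** 2
        resol += n_t * (c_t - c_bar) ** 2
    return over_under / N, calib / N, resol / N

# Hypothetical data: 1 = correct judgment, 0 = incorrect.
data = [(0.9, 1), (0.9, 1), (0.9, 0), (0.7, 1), (0.7, 0), (0.5, 1), (0.5, 0)]
print(calibration_measures(data))  # positive first value indicates overconfidence
```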
Overconfidence
We're so often cocksure of our decisions, actions, and beliefs because we fail to
look for counterexamples, pay no attention to alternative views and their conse-
quences, distort our memories and the evidence, and are seduced by our own ex-
planatory schemes. (Paulos, 1998, p. 55)
Probably the broadest generalization that comes out of calibration studies and
other investigations of judgments of judgments is that people tend to express a
higher degree of confidence in their judgments than is justified by the accuracy
of those judgments when they involve quantitative variables such as probabili-
ties. Calibration studies have generally shown overconfidence to be a more
common failing than underconfidence (Alpert & Raiffa, 1982; Arkes &
Harkness, 1983; J. J. Christensen-Szalanski & Bushyhead, 1981; Einhorn,
1980; Einhorn & Hogarth, 1978; Fischhoff, 1982; Fischhoff & Slovic, 1980;
Kelley & Lindsay, 1993; Lichtenstein & Fischhoff, 1977; M. G. Morgan &
Henrion, 1990; Paese & Feuer, 1991; Pitz, 1974; Ronis & F. Yates, 1987;
Slovic, Fischhoff, & Lichtenstein, 1977; Vallone, Griffin, Lin, & L. Ross,
1990). Reviews include Lichtenstein, Fischhoff, and Phillips (1982), Wallsten
and Budescu (1983), O'Connor (1989), and Keren (1991). The finding that
overconfidence in one's judgments is the rule has serious practical implica-
tions because, as Baron (1998) has pointed out, people are more likely to take
extreme actions on the basis of views in which they have great confidence than
on the basis of those for which they have reservations.
Much of the evidence of overconfidence comes from experiments in which
people have answered general-knowledge questions—questions pertaining to
the type of information found in almanacs or books of facts—with a forced-
choice format and have, for each answer, expressed their confidence that it was
correct, but overconfidence has been observed in other contexts as well. Retro-
spective judgments of comprehension of expository text have been observed to
be higher than justified (Glenberg & Epstein, 1985; Glenberg, Sanocki, Ep-
stein, & Morris, 1987; Glenberg, Wilkinson, & Epstein, 1982), as have expres-
sions of confidence in predictions of future events over which one is assumed
to have some degree of control (S. J. Hoch, 1985). Fischhoff and Slovic (1980)
asked people to perform a variety of difficult or nearly impossible tasks—iden-
tification of the nationalities of people on the basis of samples of their hand-
writing, projection of stock price activity, prediction of horse race outcomes—
and found that people's performance typically was not as good as their confi-
dence indicated they thought it would be. Griffin, Dunning, and L. Ross (1990)
showed that people tend to be overconfident of their ability to predict either
their own behavior or that of others in specified situations. People's assess-
ments of how well they know some subject matter appear not to be very indica-
tive of how well they will do on an exam on the subject (Cull & Zechmeister,
1994; Mazzoni & Cornoldi, 1993).
Experts have also often been found to be overconfident when assessing their
own knowledge in their areas of expertise. When asked to predict the outcomes
of their own cases, attorneys, for example, appear to be overconfident in the ag-
gregate, in the sense of being apt to express confidence of obtaining outcomes
better than those they actually obtain (E. F. Loftus & Wagenaar, 1988;
Wagenaar & Keren, 1986). Other professionals who have been found to be
overconfident when making judgments in their own areas of expertise include
physicians (Faust, Hart, & Guilmette, 1988; Lusted, 1977), psychologists
(Oskamp, 1965), and engineers (Kidd, 1970). Experts appear to do better when
there is a reliable basis for statistical prediction, such as when predicting
bridge hands (Keren, 1987) or betting odds for horse racing (Hausch, Ziemba,
& Rubenstein, 1981).
People, including experts, also tend to be overconfident of their ability to es-
timate failure rates of system components. When asked to specify one failure
rate that they would expect only 5% of components to exceed and another that
they would expect only 5% of components to fall below, they typically pick
values that are insufficiently extreme (Lichtenstein, Fischhoff, & Phillips,
1977). Kahneman and Tversky (1973) refer to the confidence that people feel
for highly fallible judgments as the "illusion of validity." Apparently experts
are not immune from this illusion (Nuclear Regulatory Commission, 1978;
Slovic, Fischhoff, & Lichtenstein, 1981).
Kuhn, Weinstock, and Flaton (1994) have demonstrated a connection be-
tween confidence in a decision and the approach that one takes in reaching it.
They found that some participants (satisficers) in a mock jury trial tended to
use evidence selectively in order to build up a single view of what had hap-
pened, whereas others (theory-evidence coordinators) considered more than
one possibility and attempted to evaluate all of them in light of the accumulat-
ing evidence. A particularly thought-provoking aspect of their results is the
fact that those participants who took the former approach had greater confi-
dence in their decisions than did those who took the latter.
Lichtenstein and Fischhoff (1977) found the tendency toward overconfi-
dence to be especially pronounced when people made judgments regarding
which there was little objective basis for accuracy (e.g., whether drawings
were made by Asian or European children) and for which accuracy was close
to chance. Calibration and resolution in this case were also poor. The same in-
vestigators found that providing people with some task-relevant knowledge de-
creased overconfidence and improved calibration and resolution, but that when
knowledge was very high (so that more than about 80% of the questions were
answered correctly) underconfidence sometimes became the rule.
They concluded that it is not the case that people who know more know
more about how much they know, in general, but rather that confidence appears
to be most predictive of what one knows at intermediate levels of knowledge;
when one knows very little, confidence is likely to be too high, whereas when
one knows a lot (gives correct answers on a large majority of the questions) it is
sometimes moderately too low. Their results over a series of experiments seem
also to be consistent with the conclusion that resolution is somewhat less af-
fected by changes in the difficulty of the task than are under- or overconfidence
or calibration. Making the question-answering task easier or more difficult, for
example, appears to have the effect of moving the entire correctness-confi-
dence curve up or down rather than changing its slope.
One account of why people tend generally to be overconfident in their an-
swers is that once having identified a plausible answer to a question or scenario
for the future, they fail to consider possible alternative answers or scenarios
(Griffin et al., 1990; S. J. Hoch, 1985; Shaklee & Fischhoff, 1982). A closely
related hypothesis is that when one has produced a tentative answer to a ques-
tion, people find it easier to bring to mind evidence in support of that answer
than evidence against it (Graesser & Hemphill, 1991; Koriat, Lichtenstein, &
Fischhoff, 1980). This account links the tendency to be overconfident of one's
answers with the idea of a pervasive confirmation bias (Nickerson, 1998). The
bias is expressed in this case as a tendency to bring to mind information that
will confirm the answer one has produced and to overlook information that
would count against it. This is similar to the explanation suggested by Nisbett
and L. Ross (1980) of why people may persevere with a belief even after learn-
ing that the information on which the belief was based was fictitious: after hav-
ing formed the belief they sought and found independent data to corroborate it.
Another suggestion, not necessarily in conflict with this one, is that a major
factor contributing to confidence in an answer is the speed (Nelson & Narens,
1990) and/or ease (Kelley & Lindsay, 1993) with which the answer comes to
mind. Nelson and Narens found a correlation between confidence and speed of
answering even when the answers were incorrect, although for a given speed,
confidence was higher for correct than for incorrect answers. V. L. Smith and
H. H. Clark (1993) also found that when people were able to answer a question,
the higher the confidence in the answer, the more quickly it was produced.
Kelley and Lindsay (1993) manipulated ease of access by exposing poten-
tial answers, some correct and some incorrect (but plausible), to questions be-
fore asking the questions. Prior exposure increased the speed with which
participants produced those answers and their confidence in them when they
used them, whether the answers were correct or incorrect. This was true even in a condition in which none of the
items on a preexposed list was a correct answer to the subsequent question and
participants were informed of this and told to ignore the list during the ques-
tion-answering phase of the experiment. The fact that they appeared to be un-
able to avoid the influence of prior exposure on feelings of confidence was
taken as strong evidence that confidence is based in part on the ease with which
answers, right or wrong, come to mind.
Griffin and Tversky (1992) make a distinction between strength (extreme-
ness) of evidence and weight (predictive validity) of evidence and hypothesize
that people tend to focus primarily on the former and make some—but typi-
cally insufficient—adjustments in response to the latter. This hypothesis leads
to the expectation that overconfidence will be the rule when strength is high
and weight low, whereas underconfidence will prevail when the opposite is the
case. Griffin and Tversky argue that this hypothesis can reconcile the apparent
discrepancy between the finding of conservatism in the updating of posterior
probabilities in Bayesian decision making (Edwards, 1968) and the finding
that people often make radical inferences on the basis of small samples
(Tversky & Kahneman, 1971). Conservatism or underconfidence has typically
been found, they suggest, when people have been exposed to large samples of
data of moderate strength, and radicalism or overconfidence has generally
been observed in situations involving moderately strong effects based on small
samples; so both phenomena follow, according to this view, from the domi-
nance of evidentiary strength over weight.
Numerous studies have shown that medical diagnoses based on case statis-
tics tend to be more accurate than those based on clinical judgments, and that
despite this fact clinicians typically have greater confidence in their own judg-
ments than in those derived statistically from incidence data (Dawes, 1976).
Studies yielding predictions from case statistics that are at least as good as, and
usually better than, those from judgments of experts include those of Dawes
(1971), Wiggins (1981), Wedding (1983), Leli and Filskov (1984), and Meehl
(1986), among many others. A simple formula developed by L. R. Goldberg
(1965, 1968) distinguishes neurotics from psychotics on the basis of Minne-
sota Multiphasic Personality Inventory scores with 70% accuracy. The fact that
70% is greater accuracy than any group of clinicians had attained on this task
did not suffice to convince clinicians to use the formula instead of their own in-
dividualized approaches (Arkes et al., 1986). One study of the way clinicians
score projective tests showed 18.5% of them using standard procedures and
81.5% using their own individualized procedures (Wade & T. B. Baker, 1977).
Although the quality of decision making has not always been found to in-
crease with the quantity of relevant information available to the decision
maker (L. R. Goldberg, 1968; Hayes, 1964), confidence may increase with
the amount of information available even when accuracy does not. Clini-
cians' confidence in their judgments about cases, for example, has been ob-
served to increase with the amount of information on which the judgments
were made even when the accuracy of the judgments did not improve
(Oskamp, 1965; Ryback, 1967).
Allwood and Granhag (1996) found that the accuracy or realism of confidence
judgments improved when people were asked to evaluate the extent of their
knowledge in the area of a question they were about to be asked. These investiga-
tors interpreted this result as at least suggestive evidence that people's evaluations
made them more aware than they otherwise might have been of the limitations of
their knowledge in the specified areas. An interesting aspect of Allwood and
Granhag's results is that participants judged their knowledge to be more nearly
comprehensive for broadly defined domains than for narrowly defined ones.
Confidence, or degree of certainty, has usually been expressed in calibration
studies as a rating on a linear scale (e.g., a 7-point scale anchored at one end by
"certainty" and at the other by "pure guess"). Sometimes, however, people
have been asked to estimate odds, the ratio of the chance that a particular situa-
tion pertains to the chance that it does not. The most extreme examples of over-
confidence have been found in these cases. Fischhoff, Slovic, and Lichtenstein
(1977) had people judge which of each of several pairs of causes of death was
the more prevalent, and express their confidence in each judgment in terms of
the odds that it was correct. Subjects tended to give odds that were far out of
line with their primary task performance, especially with the easiest compari-
sons: Odds given for comparisons on which the participants were right 90% to
95% of the time ranged from 10,000:1 to 1,000,000:1. One possible explana-
tion for such unrealistic numbers is that the participants lacked a clear under-
standing of what "odds" means. A lecture on probability and odds caused a
reduction in the use of such extremes but did not eliminate it. Slovic et al.
(1977) also found that people tended to be overconfident when asked to judge
the odds that their answers to general-knowledge questions were correct. It ap-
pears that overconfidence is the rule in calibration studies independently of
whether confidence is expressed with probabilities or with odds (Hazard & C.
R. Peterson, 1973; Seaver, von Winterfeldt, & Edwards, 1978).
Reducing Overconfidence
Studies of Bayesian decision making have shown that people often overesti-
mate the completeness of the sets of hypotheses provided about the possible
states of the world (Fischhoff et al., 1978; Gettys, S. D. Fisher, & Mehle, 1978;
Mehle, Gettys, Manning, Baca, & Fisher, 1981). Given this tendency, the use
of previously prepared checklists in decision-making situations would appear
to have some merit, and there is evidence that decision making can be im-
proved in this way (de Dombal, Leaper, Horrocks, Staniland, & McCann,
1974). Users of electronic document searches often overestimate, sometimes
by large amounts, the percentage of the relevant documents that a search has
returned (Blair & Maron, 1985; MacGregor, Fischhoff, & Blackshaw, 1987).
These and similar results are suggestive of a general tendency to overestimate
the completeness of lists that are presented to us for various purposes.
Such a tendency can help explain, at least in some cases, the common find-
ing of overconfidence in judgments of various types. If I can think of only two
or three possible hypotheses to account for some event and I believe my short
list to be relatively complete, when in fact it is not, I might judge the probability
that my favored hypothesis is correct to be quite high, on the assumption that it
is one from a small number of possibilities; whereas if I realized that it was re-
ally one from a large number of possibilities, my confidence in its correctness
might be considerably lower.
Gettys et al. (1978) have proposed essentially this account of the finding
that people tend to be overconfident of the probable adequacy of stated hypoth-
eses: People not only fail to generate some plausible hypotheses because of
memory failure or lack of knowledge, but they judge the set of hypotheses they
have before them (produced from memory or provided by someone else) to be
more complete than it is because they overlook the limitations of their own
knowledge and memory. In other words, they discount hypotheses they have
been unable to generate by ignoring the possibility that there may be some.
A tendency to overestimate the completeness of lists can be seen as itself a
manifestation of overconfidence. In estimating any set to be more complete
than it really is, one is, in effect, taking the fact that one cannot think of mem-
bers of the set that are not included in the enumeration as evidence that they
do not exist: If many do exist, one has overestimated either one's knowledge
of the set or one's ability to retrieve from memory the information retained
there. In either case, one has shown more confidence in one's cognitive re-
sources than warranted.
Overconfidence in one's own judgments may be a consequence, in part, of
the fact that judgments that are made outside the laboratory are typically made
for the purpose of choosing among action alternatives, and such choices often
have the effect of precluding the possibility of obtaining certain types of evi-
dence that would show them to have been poor ones, if they were. Inasmuch as
decisions often rule out the possibility of discovering what the consequences
of different choices would have been, the decision maker may not get the kind
of feedback that would reveal poor choices as such.
Juslin (1993) argues that if people are well calibrated to their natural environ-
ment, a minimum requirement for the observation of good calibration in the
experimental situation is that the questions that people are to answer, and with
respect to which they are to judge the probability of the correctness of their an-
swers, are selected in such a way as to ensure that cues that are valid in the natu-
ral environment remain valid in the experimental situation. He contends that
certain strategies that are commonly used to select items for use in experiments
more or less ensure that the knowledge that is valid in participants' natural en-
vironment will be less valid in the experimental situation; more specifically,
the argument is that such selection strategies typically result in sets of items for
which cues leading to wrong answers are "overrepresented" relative to their
commonness in the natural environment, and that this is especially so for rela-
tively difficult items. The hard-easy effect can be an artifact, he suggests, of
the partitioning of items into different difficulty categories after the fact on the
basis of performance on the primary task. Supporting this explanation are the
results of experiments showing good calibration for items selected at random
from a set assumed to be representative of the natural environment
(Gigerenzer, Hoffrage, & Kleinbölting, 1991; Juslin, 1993, 1994). A counter-
example is given, however, by Griffin and Tversky (1992). Also the Nickerson
and McGoldrick (1963) study mentioned earlier, which showed the hard-easy
effect, did not involve partitioning the items into different degree-of-difficulty
categories on the basis of performance on the primary task.
A common finding in revision-of-opinion studies with a Bayesian paradigm
was that people revise opinions on the basis of new data less than they should,
according to the Bayesian prescription. They are said to be too conservative in
their treatment of new data. As a consequence of this tendency, the (posterior)
probabilities that people assign to events tend to differ less from the prior proba-
bilities than they should with application of Bayes's rule. One might say that
the subjective posterior probabilities produced in these studies represent
underconfidence. (It must be said that, from an alternative perspective, the
finding might be interpreted as overconfidence in one's prior probability, or
overconfidence in an existing opinion.)
Erev, Wallsten, and Budescu (1994) address this apparent conflict in the lit-
erature directly and demonstrate that it is possible to get evidence either of
overconfidence or of underconfidence from the same data set, depending on
how it is analyzed. They note that Bayesian experiments showing conserva-
tism in the use of new data (underconfidence) have typically focused on sub-
jective probabilities as a function of known objective probabilities, whereas
calibration experiments have typically considered objective probabilities as a
function of expressed subjective probabilities. They show that it is possible
with a single data set to get the appearance of underconfidence with the first
type of analysis and that of overconfidence with the second type (see also Erev,
Bornstein, & Wallsten, 1993), and, with a log-odds model of how confidence
in a judgment might be determined, they show that even one whose judgments
are neither over- nor underconfident can appear to be either if the decision cri-
terion includes a random error term.
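The artifact is easy to reproduce in simulation. The sketch below is illustrative only (it uses a simple additive Gaussian error on the log-odds scale, not Erev et al.'s exact model): the same simulated judgments look overconfident when binned by stated confidence and conservative when binned by objective probability.

```python
import numpy as np

rng = np.random.default_rng(0)
logit = lambda p: np.log(p / (1 - p))
inv_logit = lambda x: 1 / (1 + np.exp(-x))

# Objective probabilities of being correct, and stated confidence equal to
# the objective value plus random error on the log-odds scale.
p_true = rng.uniform(0.55, 0.95, size=200_000)
correct = rng.random(p_true.size) < p_true
p_stated = inv_logit(logit(p_true) + rng.normal(0.0, 1.0, p_true.size))

edges = np.linspace(0.5, 1.0, 6)
for label, cond in (("binned by stated confidence", p_stated),
                    ("binned by objective probability", p_true)):
    print(label)
    which = np.digitize(cond, edges)
    for b in range(1, len(edges)):
        sel = which == b
        if sel.any():
            print(f"  {edges[b-1]:.1f}-{edges[b]:.1f}: "
                  f"mean stated = {p_stated[sel].mean():.2f}, "
                  f"proportion correct = {correct[sel].mean():.2f}")
```

In the first analysis the high-confidence bins show accuracy below the mean stated value (apparent overconfidence); in the second, stated confidence is less extreme than the objective probability (apparent conservatism), even though the simulated judgments are unbiased on the log-odds scale.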
Erev et al. (1994) do not conclude that all evidence of over- or under-
confidence are artifactual. Such a conclusion is unwarranted, they argue, in
view of the fact that the magnitudes of both over- and underconfidence can be
varied systematically and that, for both types of analysis, the effects sometimes
invert. Nevertheless, the demonstration that the same data set can be made to
appear indicative of either overconfidence or underconfidence, depending
only on how it is analyzed, is a powerful reminder of the need for care in the in-
terpretation of experimental results. Erev, Wallsten, and Budescu argue that
how one should look at judgment data depends on what is being asked:
"Conditionalize on the event state (or on probability when it can be independ-
ently defined) when the research is aimed at understanding the underlying cog-
nitive processes themselves. Conditionalize on response when the focus is on
accuracy. In either case, the performance measure will not be fully understood
without incorporating notions of error" (p. 526).
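The crux of the demonstration can be simulated in a few lines. In this sketch (my construction, following the logic Erev et al. describe rather than reproducing their model), a judge's confidence equals the true probability plus random error on the log-odds scale; conditioning on the objective probability makes the very same data look underconfident, while conditioning on the response makes them look overconfident.

import math
import random

random.seed(1)

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def mean(xs):
    return sum(xs) / len(xs)

pairs = []   # (objective probability, reported confidence)
for _ in range(200_000):
    p = random.uniform(0.55, 0.95)               # true probability of being correct
    r = sigmoid(logit(p) + random.gauss(0, 1))   # report = truth + log-odds error
    pairs.append((p, r))

# Conditionalize on the objective probability (the Bayesian-style analysis):
sel = [(p, r) for p, r in pairs if p > 0.85]
mp, mr = mean([p for p, r in sel]), mean([r for p, r in sel])
print(f"given p > .85: mean p = {mp:.2f}, mean report = {mr:.2f}")   # report < p

# Conditionalize on the response (the calibration-style analysis):
sel = [(p, r) for p, r in pairs if r > 0.85]
mp, mr = mean([p for p, r in sel]), mean([r for p, r in sel])
print(f"given report > .85: mean report = {mr:.2f}, mean p = {mp:.2f}")   # p < report

The same simulated judge thus appears underconfident under the first analysis and overconfident under the second, with no change in the underlying data.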
Contraction Bias
Not all studies of confidence have had people express confidence level by
giving confidence or degree of certainty ratings or probability estimates.
Pitz (1974) had college students indicate their degree of uncertainty about
the population of each of 23 countries in the Americas in the following way.
For each country a participant was to provide two numbers such that, in his
or her opinion, the true population was equally likely to be (a) less than the
smaller number, (b) greater than the larger, or (c) between the two. If partic-
ipants had an accurate understanding of their own uncertainty about the
populations involved, we would expect the true population to be in each of
the "tertiles" about one third of the time. In fact on only about 16% of the tri-
als did the true population fall between the two numbers given, which sug-
gests that the participants typically believed they could bracket the true
value more narrowly than they in fact could. Pitz referred to his result
as the "hyperprecision effect."
In another study described by Pitz (1974) in which the tertile estimation pro-
cedure was used, the hyperprecision effect was also obtained when partici-
pants estimated populations or the heights of well-known buildings, but not
when they estimated the ages of well-known people. In the last case, the center
tertile included the true age 47% of the time, which is to say, people underesti-
mated their ability to bracket these values, or overestimated their degree of un-
certainty. Pitz attributed this result to the greater store of knowledge that
people possess about people's ages. In fact the result does not really tell us that
the participants knew more about ages than about populations and building
heights; it tells us only that they knew more about ages, and less about popula-
tions and building heights, than they thought they did.
Pitz (1974) mentions also unpublished data obtained by Alpert and Raiffa
showing a hyperprecision effect. In this case participants were asked to esti-
mate the 1st and 99th percentile values for various quantities and gave numbers
that typically bracketed the correct value about 50% of the time rather than
98%. In the aggregate, the results of these studies suggest that one is more
likely to overestimate than to underestimate how precisely one can specify dis-
tribution percentiles, and this is consistent with the common finding of over-
confidence in other types of judgment tasks.
SUMMARY
The abilities to estimate and to predict are valuable skills. The focus of this
chapter has been on statistical estimation and prediction. This is a small piece
of the broader topic that would include estimation, or prediction, of a variety of
physical or temporal variables, but it is an important part inasmuch as many of
the practical contexts in which the skill is needed are probabilistic in nature.
Research results suggest that people are reasonably good at estimating mea-
sures of central tendency of distributions and relative frequencies of events,
and less good at predicting the time and resources that will be required to per-
form specified tasks. Estimates of time and costs typically err on the low side.
One factor that appears to contribute to this problem is the ease with which
people assume, perhaps unwittingly, that everything will go according to plan
(overestimation of the probability of the conjunction of successes) and that
nothing will go wrong (underestimation of the probability of the disjunction of
possible problems).
Predictions of the outcomes of probabilistic events are subject to a number
of misconceptions and biases. Perhaps the best known misconception is the
gambler's fallacy. Predictions often are influenced by preferences—people
tend to overestimate the likelihood of occurrence of what they wish to occur.
Arguably this tendency makes lotteries appear more attractive than they
might appear if people clearly understood the odds against winning. Predic-
tions can also be influenced by intuitions about sample size and population
size and how the two relate.
Opinions about the relative seriousness of various risks are shaped not only
by the objective probabilities with which feared events occur, but also by irrel-
evant factors such as the publicity that events of different sorts receive. Also the
acceptability of specific risks is determined not solely by the seriousness of the
consequences of the occurrence of a feared event, but by factors such as one's
sense of control of the risky situation (driver vs. passenger in a car), and
whether exposure to the risk is voluntary or imposed. Determining how to
communicate information about specific risks in such a way as to evoke appro-
priate action is an important research objective.
Evaluations of one's own estimation or prediction capability or performance
frequently prove to be too generous, which is to say, people's expressions of con-
fidence in the accuracy of their own judgments often tend to be higher than their
performance warrants. There is some evidence that people evaluate their own
performance more accurately when they judge it in terms of percentage of an-
swers correct than in terms of confidence on an answer-by-answer basis. Finding
ways to improve people's ability to assess their own capabilities and perfor-
mance more accurately is another continuing challenge for research.
CHAPTER 9

Perception of Covariation and Contingency
Chapman, 1967; Crocker, 1981; H. M. Jenkins & W. C. Ward, 1965; Shaklee &
Tucker, 1980; Shweder, 1977; Smedslund, 1963). Shweder (1977) suggests
that people's estimates of co-occurrence in their own experience are more
strongly dependent on resemblance or semantic similarity than on actual fre-
quency of co-occurrence. He suggests that people not only base judgments of
what goes with what on resemblance but that they seem unable to make use of
frequency information or to reason correlationally. On the other hand, Nisbett
and L. Ross (1980) note that, although laboratory studies indicate that people
are not very good at detecting covariation, they nevertheless adapt well in so-
cial contexts that seem to require the ability to detect covariation with consid-
erable accuracy. This, they suggest, requires an explanation. This does not
invalidate the results of laboratory studies, but it does suggest caution in ex-
trapolating those results to real-world contexts.
Several investigators have studied the ability of people to estimate the corre-
lation between series of number pairs and found it to be quite good (Erlick &
Mills, 1967; Jennings, Amabile, & L. Ross, 1982). Such estimates tend to be
more accurate when the correlations are positive than when they are negative
(Erlick & Mills, 1967) and the errors that are made are typically in the direction
of conservatism, which is to say in that of underestimating the strength of the
relationship (Beach & Scopp, 1966; Jennings et al., 1982).
BIASED FOCUS
The relationship between two variables, A and B, is often represented by show-
ing, in a table, the frequencies of occurrence of all four possible combinations
of the presence or absence of each variable. An example of such a table, some-
times referred to as a contingency table, is shown as Table 9.1.
TABLE 9.1
Illustrating the Tabular Representation of the Contingency Between A and B

               A Present        A Absent
B present      A and B          Not-A and B
B absent       A and not-B      Not-A and not-B

When people are asked to estimate the strength of the relationship between two
variables based on information in two-by-two contingency tables, they often
give too much weight to the cell representing the positive state of both vari-
ables (A and B) and pay too little attention to the other cells (Arkes & Harkness,
1983; Crocker, 1981; Doherty & Falgout, 1986; Einhorn & Hogarth, 1978; H.
M. Jenkins & W. C. Ward, 1965; Kuhn, Phelps, & Walters, 1985; C. R. Peter-
son & Beach, 1967; Schustack & Sternberg, 1981; Shaklee & Mims, 1982;
Shaklee & Tucker, 1980; Smedslund, 1963; Wasserman, Dorner, & Kao,
1990). The general finding is illustrated by a study by Smedslund in which a
group of nurses reviewed a set of clinical cases in which a specific symptom
sometimes co-occurred with a particular diagnosis. The set of cases selected
for review included all four possible combinations (symptom-disease, symp-
tom-no disease, no symptom-disease, no symptom-no disease) but the actual
correlation between symptom and disease was zero. The nurses reported the
correlation to be positive, however, on the strength of their observation that the
symptom often co-occurred with the disease.
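The role of the neglected cells can be made concrete with the phi coefficient, which uses all four frequencies. In the following sketch (my construction; the first set of frequencies is of the general kind Smedslund's nurses saw, with frequent symptom-disease co-occurrence but essentially no contingency), attention to the plus-plus cell alone would suggest a relationship that the full table does not support.

import math

def phi(a, b, c, d):
    """Phi coefficient for a 2x2 contingency table.
    a = A and B, b = not-A and B, c = A and not-B, d = not-A and not-B."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Symptom and disease co-occur often (a = 37), yet the full table shows
# essentially zero contingency:
print(round(phi(37, 17, 33, 13), 3))   # about -0.04
# A genuinely positive contingency, for comparison:
print(round(phi(40, 10, 10, 40), 3))   # 0.6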
The cell of a two-by-two contingency table that tends to get the most atten-
tion—the one that represents cases in which both of the variables of interest
occurred—is typically referred to as the "plus-plus" cell; the other three cells
represent cases in which one or both of the variables failed to occur. With re-
spect to this terminology, it is important to note that "plus-plus" need not re-
fer to a specific cell (e.g., upper left) of a contingency matrix, but rather to the
combination of variables on which the individual focuses. This is illustrated
by an experiment by Crocker (1982) in which some participants were asked
to judge whether practicing the day before a tennis match related to winning
and others were asked to judge whether such practice related to losing. Both
groups were asked which cells of the two-by-two contingency table that
showed the frequencies of all combinations of practicing-not practicing and
winning-losing would provide information useful for their task. The group
focused on winning was most interested in the cell representing the combina-
tion of practicing and winning; the one focused on losing wanted to know
about the cell representing the combination of practicing and losing. Both
showed the plus-plus bias, but what constituted the plus-plus cell differed for
the two groups.
Focusing on the plus-plus cell of a contingency table, while completely ig-
noring the other three cells, is an extreme form of biased focus and not very
representative of behavior. What is more typical is an undue emphasis on the
plus-plus cell and insufficient attention to the other cells. Attention typically
has been unequally distributed among the remaining cells, with the two cells
representing the presence of one or the other of the variables receiving more
than the one representing the absence of both (Arkes & Harkness, 1983;
Schustack & Sternberg, 1981; Shaklee & Mims, 1982). The strategy of focus-
ing primarily on the plus-plus cell and paying insufficient attention to the oth-
ers was noted by Inhelder and Piaget (1958) and interpreted by them as an
indication of immature judgment.
None of this is to suggest that all the cells of a two-by-two contingency table
are equally important and always deserving of equal weight in any attempt to
judge the strength of relationship between two variables. How much weight
one should give to the various cells of such a table depends, to some degree, on
the assumptions on which one is working. J. R. Anderson (1990) shows by a
mathematical analysis, for example, that the minus-minus cell should carry as
much weight as the plus-plus cell only if it can be assumed that the prior proba-
bility of the effect in question occurring in the presence of the cause in question
is equal to the prior probability of that effect not occurring in the absence of
that cause. Thus, according to Anderson, the fact that people tend to focus on
the plus-plus cell when attempting to determine the strength of relationship be-
tween two variables represented in a two-by-two contingency table is not com-
pelling evidence that they are behaving irrationally; "the critical issue is what
prior model is adopted in a rational analysis" (p. 160).
That being said, it must be noted that the consensus among investigators
who have studied this situation appears to be that the degree to which people
focus on the plus-plus cell and ignore or discount the others is not usually justi-
fied and typically leads to estimating the strength of relationship between two
variables to be stronger than the data indicate it to be. Focusing on the plus-plus
cell in two-by-two tables is sometimes seen as one of several examples of the
difficulty people have in using disconfirming information.
How misleading a focus on the plus-plus cell can be depends on the nature
of the relationship between the variables involved. If the variables are highly
correlated, such a focus is not likely to lead one to conclude that they are not. It
is easy to imagine circumstances, however, in which this focus could result in
the conclusion of a strong relationship when one does not exist. This asymme-
try helps account for the fact that this type of focus typically leads to an overes-
timation of the strength of the relationship between the variables involved.
It should be clear that by ignoring, or paying too little attention to, cells
other than the plus-plus cell, one is extracting much less information about the
possible relationship between A and B than a contingency table provides. Sup-
pose that A is a necessary cause of B (or that B is a sufficient cause of A). In that
case we would expect to find no occurrences of not-A and B. Or if A were a suf-
ficient cause of B (or B a necessary cause of A), we would expect no occur-
rences of A and not-B. If A were a necessary and sufficient cause of B (or B a
necessary and sufficient cause of A), we would expect to see entries only in the
plus-plus and minus-minus cells. Conversely, if we observed one of these pat-
terns, we would have grounds for suspecting a causal relationship of the asso-
ciated type: if, for example, we observed a table with sizeable numbers in all
cells except not-A and B and a zero in that one, we might begin to suspect that A
was a necessary cause of B. None of these possibilities would be noted, how-
ever, if we focused only on the plus-plus cell.
Leaving aside the question of a possible causal relationship between A and
B, the plus-plus cell by itself cannot tell us as much as there is to learn from the
table about the degree to which the two variables covary. The higher the posi-
tive correlation between them, the more we expect to see the observations con-
centrated in the plus-plus and minus-minus cells; the higher the negative
correlation, the more they would be concentrated in the two remaining cells.
Inhelder and Piaget (1958) suggested that in judging the strength of contin-
gency between two variables people may focus on the difference between (a)
the sum of the plus-plus and minus-minus cells and (b) the sum of the remain-
ing two cells: the larger the difference, the stronger the contingency. We should
note that the difference can be either positive (indicating a positive correlation)
or negative (indicating a negative one).
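As a formalization of this strategy (my own sketch of the description above), the judged strength of contingency is the difference between the concordant and discordant cell sums; its sign tracks the direction of the relationship, though, unlike a proper correlation measure, it ignores the marginal totals.

def piaget_difference(a, b, c, d):
    """a = plus-plus, d = minus-minus, b and c = the mixed cells."""
    return (a + d) - (b + c)

for table in [(40, 10, 10, 40),   # positive contingency
              (10, 40, 40, 10),   # negative contingency
              (25, 25, 25, 25)]:  # no contingency
    print(table, "->", piaget_difference(*table))
# prints 60, -60, and 0, respectively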
Perhaps, then, the most concise summary of the empirical work on covariation
detection is that judgments of covariation are relatively accurate when people
lack strong beliefs about the event relationship in question or when the situa-
tional information concerning the objective correlation between the events is
congruent with people's preconceptions about the event relationship. When ob-
jective data and preconceptions are incongruent, judgments of covariation are
frequently erroneous and biased in the direction of initial expectations. (p. 123)
This view has been challenged (Goddard & Allan, 1988) and defended (Al-
loy, 1988); it seems safe to say that precisely how good people are at detecting
covariation and the conditions that determine the accuracy with which they do
so remain subjects of debate.
ILLUSORY CORRELATION
You can find perfect correlations that mean nothing for any three people and
three characteristics, and in general for any N people and N characteristics....
Thus we tend to overestimate our general knowledge of others and are con-
vinced of all sorts of associations (more complicated variants of "more shy, less
intelligent") that are simply bogus. By failing to adjust downward our multiple
correlation coefficients, so to speak, we convince ourselves that we know all
manner of stuff that just isn't so. (Paulos, 1998, p. 27)
Cause-Effect Relationships
PROBABILITIES OF CONJUNCTIVE
OR DISJUNCTIVE EVENTS
p(AB) = p(A)p(B|A),

or as

p(AB) = p(B)p(A|B).
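A few worked numbers (my own illustration) make the point of the two headings that follow: the conjunction of many individually likely events is less probable than intuition suggests, and the disjunction of many individually unlikely events is more probable.

p_success = 0.95                     # each of 20 steps succeeds with p = .95
p_conjunction = p_success ** 20      # all 20 succeed (assuming independence)
p_disjunction = 1 - p_conjunction    # at least one fails
print(f"P(every step succeeds)  = {p_conjunction:.2f}")   # about 0.36
print(f"P(at least one failure) = {p_disjunction:.2f}")   # about 0.64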
Overestimates of Conjunctions
Underestimates of Disjunctions
Although the strength of the conjunction fallacy has been shown to vary con-
siderably with the specifics of the task (Morier & Borgida, 1984; J. F. Yates &
Carlson, 1986), it is sufficiently common that the question naturally arises as to
whether it may have some functional basis. Is there a point of view from which
the selection of Alternative (b) in the foregoing example might be considered
reasonable? Note that the information provided about Linda in this problem is
totally irrelevant to the selection of the more probable of the two statements.
Statement (a) is more probable (more accurately, not less probable) than State-
ment (b) by virtue of the conjunction rule, independently of any information
provided in Linda's description. However, it seems not unreasonable of partici-
pants in psychological experiments to assume that the information that is being
given to them in problem-solving situations is relevant to the problem they are
asked to solve. In this case, most of the information provided about Linda is, by
intention, the type of information that would be useful for distinguishing activ-
ists in the feminist movement from nonactivists but not for distinguishing bank
tellers from non-bank tellers. One might say the experimental approach is to
lead the participant down a garden path. Perhaps some people select (b) on the
assumption that the experimenter would not provide them with irrelevant or
misleading information.
It is easy to imagine being given the kind of information about Linda in this
example in a nonlaboratory situation in which one was being asked to judge the
probability that she was active in the feminist movement, whereas it is much
less easy to imagine being given this information if one were being asked to
judge whether she was a bank teller. If one were being asked to decide whether
she was a bank teller, one would expect to be given information that was more
relevant to that decision. The fact that people erroneously select Alternative (b)
when presented with this problem may be less than compelling evidence that
they are poor statistical reasoners; it may be that they interpret the question ac-
cording to certain rules of language usage that work in everyday conversation
if not always in the psychology laboratory. If the point were to determine
whether people understand and can apply the conjunction rule one might ask
them to judge which of the two statements, (a) or (b), is the more probable
without providing the garden-path information about Linda or indeed any in-
formation about her at all.
Kahneman and Tversky (1982a) make essentially this point in one of their
discussions of this and similar findings. They cite the "cooperativeness princi-
ple" by which the listener in a conversation is entitled to assume that the
speaker is trying to be informative (H. H. Clark & E. V. Clark, 1977; Grice,
1975; Searle, 1975). They note that people are likely to assume that the princi-
ples that apply to everyday conversations apply also in the experimental labo-
ratory, which is to say they are likely to assume that the information provided
them by the experimenter in problem-solving situations is relevant to the task
that is to be performed; if it were not, according to the cooperative principle,
the experimenter would not provide it. Macdonald (1986; Macdonald &
Gilhooly, 1990) makes a similar argument, and raises the question whether,
given the conventions of everyday conversation, participants in these experi-
ments are wrong to see the phrasing of questions as providing information.
Other investigators have also argued that what have sometimes been taken as
evidence of illogicality in dealing with Linda-like problems may have more to
do with linguistic conventions (Dulany & Hilton, 1991; Politzer & Noveck,
1991; Slugoski & A. E. Wilson, 1998).
The idea that people take the phrasing of questions as informative gets
strong support from the fact that participants in Tversky and Kahneman's
(1983) experiment considered it more likely than not that Linda was a feminist
bank teller. As Macdonald and Gilhooly (1990) argue, the probability that a
randomly chosen woman would be a feminist bank teller is surely very small.
One explanation of the relatively high probabilities that the participants as-
signed to Linda's being a feminist teller, they contend, is that the question itself
suggests that there may be reason to believe that she is.
L. J. Cohen (1982), among others, takes the position that the performance of
people in these situations can be seen as reasonable behavior, depending on
what we assume people understand their task to be. Perhaps they take their task
to be to assess the believability of a story about a person cast in terms of the
causes or effects of the collections of features constituting their profiles, rather
than as that of estimating the relative frequencies of different kinds of people:
One has to take into account not just the meanings of sentences in which instruc-
tions to subjects are formulated but also the implications of uttering them. So
when asked for the probability of a particular single event subjects may well in-
fer that what is wanted is an estimate of the believability of that event's occur-
rence as an apparently isolated event, which could well be lower than the
believability of the occurrence of a particular causal sequence containing the
event. If, on the other hand, subjects are asked specifically for the probability of
the single event's occurrence whether in isolation or within the particular se-
quence, it would be very surprising indeed if they then went on to declare the
particular sequence's occurrence to be even more probable. (p. 264)
Dawes (1988) notes that "on the average, 91% of the 32 internists ques-
tioned believed that the combination of a probable symptom (in this case
dyspnea) and the improbable one (in this case hemiparesis) was more likely
than the improbable symptom alone" (p. 130). What I want to call attention to
is the ambiguity of this statement. "The improbable symptom alone" could
be taken to refer to the improbable symptom (hemiparesis), whether or not
accompanied by the more probable symptom, or it could be taken to mean the
improbable symptom by itself, which is the improbable symptom in the ab-
sence of the more probable one. These two interpretations are quite different
and a conjunction fallacy is implicated only in the case of the former one, be-
cause, given the latter interpretation, the two sets—(a) dyspnea and
hemiparesis and (b) hemiparesis alone—are disjoint. The ambiguity of the
terminology in this description of the outcome of an experiment lends cre-
dence to the possibility that participants in the experiment could have been
confused by the same type of ambiguity.
Appeal either to the cooperativeness principle or to the filling-in hypothesis
helps us see how people could be led to overlook the simple set-subset relation-
ships involved in the situations described previously, but neither seems to ap-
ply as readily to another result obtained by Tversky and Kahneman. In this
case, one group of participants was asked to estimate the frequency of
seven-letter words with "n" in the next-to-last position, and a second group was
asked to estimate the frequency of seven-letter words ending in "ing." Inas-
much as the first set includes the second it cannot be the smaller of the two.
Nevertheless, estimates of the "ing" set were reliably larger than estimates of
the "_n_" set. One might object to treating this result as an example of the con-
junction fallacy, inasmuch as the same participants did not make both esti-
mates. On the other hand, if the estimates the two groups produced are taken as
indications of what people generally believe about the sizes of these two sets,
we must conclude that the subset is generally believed to be larger than the
more inclusive set from which it is drawn.
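The set-subset relation at issue is easy to verify mechanically. The sketch below assumes a plain text word list, one word per line; the file name words.txt is hypothetical.

with open("words.txt") as f:                  # hypothetical word list
    words = [w.strip().lower() for w in f]

seven = [w for w in words if len(w) == 7 and w.isalpha()]
n_next_to_last = {w for w in seven if w[5] == "n"}   # pattern _ _ _ _ _ n _
ing_ending = {w for w in seven if w.endswith("ing")}

# Every seven-letter word ending in "ing" has n in the next-to-last position,
# so the "ing" set cannot be the larger of the two:
assert ing_ending <= n_next_to_last
print(len(ing_ending), "<=", len(n_next_to_last))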
The final example relates to the conjunction fallacy in an unusual way and
was also reported by Tversky and Kahneman (1971). In this case, research psy-
chologists found a single experimental result that strongly supported an hy-
pothesis to be more compelling evidence than the same result combined with
another that provided weak, but still positive, support for the hypothesis. It ap-
pears that the weight of the strong result was diminished by being coupled with
the weak one, even though the conjunction of strong and weak positive results
is less likely to occur by chance than is a strong positive result alone.
Gigerenzer (1991b, 1993, 1994) argues that the "conjunction fallacy," as
commonly described, is not really a fallacy, at least in the eyes of a probabil-
ity theorist of the frequentist school. The argument is similar to that used to
discount the overconfidence effect (see chap. 8). Frequentists, Gigerenzer
contends, do not recognize the meaningfulness of the application of proba-
bility to single events, so the question of the probability that a person fitting
a given description is a member of a specified profession has no answer:
"What is called the 'conjunction fallacy' is a violation of some subjective
theories of probability, including Bayesian theory. It is not, however, a vio-
lation of the major view of probability, the frequentist conception" (1991b,
p. 92). "From the frequency point of view, the laws of probability are mute
on the Linda problem, and what has been called a conjunction fallacy is not
an error in probabilistic reasoning—probability theory simply doesn't ap-
ply to such cases. Seen from the Bayesian point of view, the conjunction fal-
lacy is an error" (1993, p. 293).
In any case, it appears that when problems like those described earlier are
phrased in frequentistic terms—for example, how many out of 100 people fit-
ting Linda's description are (a) bank tellers, (b) bank tellers and active in the
feminist movement?—the conjunction fallacy is much less likely to occur
(Fiedler, 1988; Hertwig & Gigerenzer, 1994). Gigerenzer (1994) argues that,
this being the case, "instances of the 'conjunction fallacy' cannot be properly
called reasoning errors in the sense of violations of the laws of probability. The
conceptual distinction between single-event and frequency representations
suffices to make this allegedly stable cognitive illusion largely disappear" (p.
144). And, Gigerenzer notes, the conjunction fallacy is not the only cognitive
illusion to which this argument applies.
CONDITIONAL PROBABILITIES
The probability of A conditional on B, often represented as p(A|B), is the
probability that A is true, given that B is true. According to the probability cal-
culus p(A|B) = p(A & B)/p(B), which is to say the probability that both A and
B occur, divided by the probability that B occurs. If A almost always occurs
when B does, the conditional probability is high; if B more often than not oc-
curs without A, the conditional probability is low.
Difficulties in Conditionalizing
In terms of the cell frequencies of Table 9.2, the probability that the test is
positive given that the disease is present (the sensitivity of the test) is

p(T+|D+) = (T+D+) / [(T+D+) + (T-D+)],

and the probability that the disease is present given that the test is positive
(the predictive value of a positive test) is

p(D+|T+) = (T+D+) / [(T+D+) + (T+D-)].
It is easy to see that these probabilities can be greatly different and that con-
fusing them could have unfortunate consequences. Other conditional probabil-
ities of interest that can be computed from the table, p(T-|D-) and p(D-|T-),
represent, respectively, the specificity of the test, which is the probability that
the test is negative conditional on the disease being absent, and the predictive
value of a negative test, which is the probability that the disease is absent con-
ditional on the test being negative. B. K. Holland (1998) discusses the relation-
TABLE 9.2
The Possible Combinations of Test Result (T) and Disease State (D),
Treating Each Variable as Binary

                      Disease Present (D+)    Disease Absent (D-)
Test positive (T+)    T+D+                    T+D-
Test negative (T-)    T-D+                    T-D-
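With invented frequencies in the four cells of Table 9.2 (chosen to represent a rare condition and a test that is right 90% of the time in both directions), a short computation shows how far apart p(T+|D+) and p(D+|T+) can be.

tp = 90      # T+D+: disease present, test positive
fp = 990     # T+D-: disease absent, test positive
fn = 10      # T-D+: disease present, test negative
tn = 8910    # T-D-: disease absent, test negative

sensitivity = tp / (tp + fn)    # p(T+|D+) = 0.90
ppv         = tp / (tp + fp)    # p(D+|T+) = about 0.083
specificity = tn / (tn + fp)    # p(T-|D-) = 0.90
npv         = tn / (tn + fn)    # p(D-|T-) = about 0.999
print(f"p(T+|D+) = {sensitivity:.2f}, but p(D+|T+) = {ppv:.3f}")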
SUMMARY
CHAPTER 10

Choice Under Uncertainty
PRELIMINARY DISTINCTIONS
"forced-choice," namely that in which the choice is forced between taking the
one possible action or declining to do so.
It is apparent that one franc has much greater value for him who possesses only a
hundred than for a millionaire. (Laplace, 1814/1951, p. 22)
Daniel Bernoulli is generally credited with being the first to realize that the de-
sirableness of money or of anything with cash value does not necessarily in-
crease linearly with the amount involved. The intuitive compellingness of this
insight is seen in the observation that the amount of pleasure one would derive
from an unexpected windfall of a fixed amount of money is likely to depend on
how much money one has already; a gift of $1,000 is likely to be a much greater
cause of celebration for a person who is broke than for one who already has $1
million salted away.
Bernoulli questioned the reasonableness of considering irrational the be-
havior of a pauper who, finding a lottery ticket known to be worth, with equal
probability, either 20,000 ducats or nothing, decides to sell it for 9,000 ducats,
1,000 ducats less than its expected value. It was this type of situation that led
Bernoulli to make a distinction between cash value and utility (or moral worth)
and to speculate that the utility of an individual's wealth increased as the loga-
rithm of its cash value. Laplace made much of this distinction between, in his
terms, fortune physique and fortune morale. A preference for certain outcomes
over uncertain outcomes with greater expected value has been found many
times since Bernoulli made this observation, and not only among paupers
(Allais, 1979/1990; Kahneman & Tversky, 1979b).
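Bernoulli's pauper can be checked with a short computation. The sketch below uses the figures from the text together with Bernoulli's speculative logarithmic utility; the pauper's initial wealth of 100 ducats is my assumption, made only for illustration.

import math

wealth = 100                     # assumed initial wealth, for illustration
ticket = 20_000                  # the ticket pays 20,000 ducats or nothing, 50/50
offer = 9_000                    # the certain selling price

expected_value = 0.5 * ticket                                      # 10,000 ducats
eu_keep = 0.5 * math.log(wealth + ticket) + 0.5 * math.log(wealth)
eu_sell = math.log(wealth + offer)
print(f"expected value of keeping: {expected_value:.0f} ducats")
print(f"log utility: keep = {eu_keep:.2f}, sell = {eu_sell:.2f}")
# Selling for 1,000 ducats below the expected value maximizes log utility.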
P. L. Bernstein (1996) refers to the idea that utility is inversely related to the
worth of what one already possesses as one of the great intellectual leaps in the
history of ideas. The hypothesized decelerating rate of increase in utility with
increasing cash value has the interesting consequence that negative utility of a
loss of a given value will be greater (in absolute terms) than the positive utility
of a gain of the same value. Bernstein puts it this way: "The logical conse-
quence of Bernoulli's insight leads to a new and powerful intuition about tak-
ing risk. If the satisfaction to be derived from each successive increase in
wealth is smaller than the satisfaction derived from the previous increase in
wealth, then the disutility caused by a loss will always exceed the positive util-
ity provided by a gain of equal size" (p. 112).
In the standard analytic representation, the hypothesized states of the world and
the alternative actions define a matrix of utilities, Uij:

                              Action Alternatives
                    A1    A2    A3    ...    Aj    ...    An
Hypothesized   H1   U11   U12   U13   ...    U1j   ...    U1n
States         H2   U21   U22   U23   ...    U2j   ...    U2n
               H3   U31   U32   U33   ...    U3j   ...    U3n
               ...
               Hi   Ui1   Ui2   Ui3   ...    Uij   ...    Uin
               ...
               Hm   Um1   Um2   Um3   ...    Umj   ...    Umn
The hypothesized states of the world are assumed to be an exhaustive and mu-
tually exclusive set, which is to say that it is assumed that one and only one of
them pertains. Associated with each hypothesized state, Hi, is a probability,
p(Hi), and the assumption of an exhaustive and mutually exclusive set means
that these probabilities sum to 1.
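The scheme reduces to a small computation. In the sketch below (utilities and probabilities invented for illustration), the expected utility of action Aj is the sum over states of p(Hi)Uij, and the analytic prescription is to choose the action for which that sum is largest.

p_H = [0.5, 0.3, 0.2]          # p(Hi); exhaustive and exclusive, so they sum to 1
U = [[10, 0, 4],               # rows are states H1..H3,
     [-5, 8, 4],               # columns are actions A1..A3
     [ 2, 1, 4]]

def expected_utility(j):
    return sum(p_H[i] * U[i][j] for i in range(len(p_H)))

eus = [expected_utility(j) for j in range(3)]
best = max(range(3), key=lambda j: eus[j])
print(f"expected utilities {eus}; choose A{best + 1}")   # A3, the sure 4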
Maximization of expected utility is one of the goals that can be used to guide
choices, given the analytic approach described previously. Some theorists,
like Good (1983a), have taken the strong position that the principle of maxi-
mizing expected utility, in one or another form, is the only defensible guide
for rational action. Expected utility theory has been widely interpreted not
only as a normative theory of rationality, but also as a descriptive theory of
how people actually behave or at least wish to behave (Arrow, 1971; Fried-
man, 1953; Friedman & L. J. Savage, 1948). On the other hand, the use of
subjective utility in models of choice is not without critics: "Rational choice
models make heavy use of both subjective utilities and subjective probabili-
ties, as well as of the simplistic hypothesis that selfishness is the only motiva-
tion of human behavior. Not surprisingly, none of these models fits the facts.
Hence, although at first sight they look scientific, as a matter of fact they are
pseudoscientific" (Bunge, 1996, p. 104).
Despite the usefulness of the principle of maximization of expected utility
and its popularity among economists, it has problems, some of which have
been known for a long time. There seems to be a growing consensus among in-
vestigators of decision-making behavior that the classical view of decision
making, which promotes maximization of expected utility as a goal, and analy-
sis of the situation into possible world-states and action alternatives, is not de-
scriptive of the way most decisions are actually made, including decisions with
significant stakes that are made under conditions that would permit an analytic
approach. As Beach, B. Smith, Lundell, and Mitchell (1988) put it, "Whatever
else may have been learned in thirty-five years of behavioral decision research,
the primary lesson must be that expected utility and its theoretical accoutre-
ments provide an unsatisfactory description of decision-making at the individ-
ual level" (p. 17). If behavior that maximizes expected utility is taken as the
norm for rationality, then there seems no alternative to the conclusion that peo-
ple behave irrationally more often than not, which is to say that, as a descriptive
model of human behavior, maximization of expected utility misses the mark
(Miller & Starr, 1967). In fact, the adequacy of expected utility theory as a nor-
mative model of rationality has also been the subject of some debate (Beach et
al., 1988; Krantz, 1991; Sahlin, 1987; Shafer, 1986).
Keynes (1921/1956), for example, rejects the idea that "in order to obtain... a
measure of what ought to be our preference in regard to various alternative
courses of action, we must sum for each course of action a series of terms made
up of amounts of good which may attach to each of its possible consequences,
each multiplied by its appropriate probability" (p. 1363). His rejection of this po-
sition is based in part on the grounds that not all the assumptions that underlie it
are justified, and in part also on the grounds that not only probabilities but the
weight of evidence on which they are based should be taken into account:
If, therefore, the question of right action is under all circumstances a determinate
problem, it must be in virtue of an intuitive judgment directed to the situation as
a whole, and not in virtue of an arithmetical deduction derived from a series of
separate judgments directed to the individual alternatives each treated in isola-
tion. We must accept the conclusion that, if one good is greater than another, but
the probability of attaining the first less than that of attaining the second, the
question of which it is our duty to pursue may be indeterminate, unless we sup-
pose it to be within our power to make direct quantitative judgments of probabil-
ity and goodness jointly. (p. 1364)
Keynes questions the assumption that goods and probabilities can be com-
bined multiplicatively in a straightforward way:
Is it certain that a larger good, which is extremely improbable, is precisely equiv-
alent ethically to a smaller good which is proportionately more probable? We
may doubt whether the moral value of speculative and cautious action respec-
tively can be weighed against one another in a simple arithmetic way, just as we
have already doubted whether a good whose probability can only be determined
on a slight basis of evidence can be compared by means merely of the magnitude
of this probability with another good whose likelihood is based on completer
knowledge. (p. 1366)
The important point in our argument is that one's disposition to choose affects
the situations in which one may expect to find oneself. A straightforward
maximizer, who is disposed to make maximizing choices, must expect to be ex-
cluded from cooperative arrangements that he would find advantageous. A con-
strained maximizer may expect to be included in such arrangements. She
benefits from her disposition, not in the choices she makes, but in her opportuni-
ties to choose. (p. 330)
One might argue that the essential difference between the straightforward
maximizer and the constrained maximizer is that the latter simply evaluates
utility in a broader frame of reference than does the former. But pressed to its
limits, this argument would allow the interpretation of all behavior as consis-
tent with an effort to maximize expected utility—broadly conceived—over a
lifetime, making the notion uninteresting if not tautological.
The point was made earlier that the goal one selects in any specific decision
situation is likely to depend not only on the specifics of the situation but on more
abiding factors such as one's willingness to accept risk and one's general outlook
on life. Good (1983a) argues the importance of the latter factor by pointing out
that the minimax strategy (the strategy of minimizing one's maximum possible
loss, which is an alternative to maximizing expected utility or gain) is an ultra-
conservative one, reasonable if, and only if, the least favorable initial distribution
is reasonable according to one's body of beliefs. The minimax solution assumes,
he suggests, that we live in the worst of all possible worlds. Whether or not one
accepts this assessment, it is clear that the minimax strategy will produce sub-
optimal results in many situations. If one used it to guide one's investment deci-
sions, one would do relatively well—less poorly than others—only when almost
all investments were doing poorly. When most investments were doing reason-
ably well, this strategy would leave one far behind the norm.
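The contrast can be put in a few lines. In this sketch (payoffs invented), minimax is implemented as choosing the action whose worst possible outcome is least bad; it picks the safe option even though another action has a much higher expected utility, echoing the investment example above.

p_H = [0.6, 0.3, 0.1]                  # probabilities of three market states
U = {"stocks":   [12, 4, -20],         # payoff of each action under each state
     "bonds":    [ 5, 4,  -2],
     "mattress": [ 0, 0,   0]}

def expected(action):
    return sum(p * u for p, u in zip(p_H, U[action]))

eu_choice = max(U, key=expected)                   # stocks (expected utility 6.4)
minimax_choice = max(U, key=lambda a: min(U[a]))   # mattress (worst case 0)
print(eu_choice, minimax_choice)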
The term heuristic is used in two somewhat different ways in the psychological
literature. Sometimes it connotes a strategy or rule of thumb that is used to ad-
vantage in solving problems for which algorithmic solutions are either not
known or impractical. The solutions that are obtained are not assumed to be op-
timal, but they can be good enough and they can be effected with limited re-
sources: "Simple heuristics can provide good solutions without a great deal of
complex general purpose calculation" (Goodie et al., 1999, p. 340). This use of
the term is similar to that found in the literature on computer science and artifi-
cial intelligence where a heuristic is a prescription for action that is not guaran-
teed to work but is judged to be likely to do so.
The second way in which the term is used in the psychological literature is to
connote reasoning strategies that often tend to lead the reasoner astray. Evans
and Bradshaw (1986) have this connotation in mind when they refer to a heu-
ristic as a theoretical construct that has been proposed to explain reasoning bi-
ases, or ways in which reasoning deviates systematically from the dictates of a
normative statistical theory. Klahr (1976) likens heuristics of this type to "cog-
nitive illusions," which are "compelling even when we know of their existence
and can explain their source" (p. 245).
Kahneman and Tversky (1972b, 1973, 1982a; Tversky & Kahneman, 1973,
1974, 1983) have identified several heuristics that people appear to use when
making judgments that call for probabilistic or statistical reasoning. The better
known and more thoroughly studied of these are the availability, representa-
tiveness, and anchoring-and-adjustment heuristics. Whether these reasoning
strategies should be considered heuristics in the first (beneficial) or second
(detrimental) sense, or possibly both, is a question to which we will return. I
note here that Kahneman (2000) takes exception to the widely held notion that
people who do research on heuristics and biases are interested primarily in
demonstrating human irrationality and argues that the greater interest is in "un-
derstanding the psychology of intuitive judgment and choice" (p. 682).
Availability
"Availability" is the name that Kahneman and Tversky (1973) gave to the heu-
ristic that people use when they estimate the frequency of a class, or the likeli-
hood of an event, on the basis of the ease with which they can bring examples or
occurrences to mind. Kahneman and Tversky point out that availability can be
a useful clue to relative frequency, because the relative frequency with which
an event has been encountered is one of the determinants of its availability in
memory; it is a fallible clue, however, because it is not the only such determi-
nant. Availability may be affected also by such factors as vividness, concrete-
ness, recency of activation, emotional quality, and other characteristics that
enhance memorability (Billings & Schaalman, 1980; Shedler & Manis, 1986;
S. E. Taylor, 1982), so its use can also lead to reasoning errors. In short, the
availability heuristic can be expected to work well when the primary basis of
availability is frequency of occurrence and to work less well when it derives
from other factors; the problem is that its basis in specific instances usually is
not clear. Much of the research relating to the use of the heuristic has focused
on situations in which availability is likely to be influenced strongly by one or
more factors other than relative frequency.
The use of the availability heuristic seems to account well for findings like
the following ones reported by Kahneman and Tversky (1973). When people
were asked to estimate the proportion of male or female names on a list of ce-
lebrities they had just read, their estimates were influenced by the relative
fame of the people on the list. In particular, they tended to overestimate the
representation of the gender which had the more famous names on the list,
which, presumably, were the names they could more readily recall. In an-
other example, people estimated the number of different 2-person commit-
tees that can be formed from 10 people to be greater than the number of
different 8-person committees that can be formed from the same group. Al-
though the number of possible committees is the same in both cases, the hy-
pothesized explanation for the larger estimate in the two-person case is that
two-person committees are easier to imagine than eight-person committees
and are therefore judged to be more numerous.
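The committee comparison is easy to check directly: choosing the 2 people who serve is the same as choosing the 8 who do not, so the two counts are necessarily equal.

from math import comb
print(comb(10, 2), comb(10, 8))   # both print 45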
Evidence regarding the importance of availability to likelihood estimates
has been reported by S. J. Hoch (1984), who had people generate reasons why a
future event might or might not occur. Some participants generated pro reasons
first and then con, whereas others did the reverse. In both cases they generated
more reasons of the type they were asked to produce first, and produced them
more quickly. When asked to estimate the likelihood of the event, people's esti-
mates tended to be consistent with the type of reason (pro or con) they had pro-
duced first. It appeared that the act of generating the initial list inhibited some-
what the generation of the second one and also tended to establish the partici-
pants' opinion on the matter. Increasing the length of the delay between the
generation of the pro and con lists diminished the interference and increased
the influence of the second list.
Availability seems to account also for the fact that when asked to judge the
size of the set of English words that begin with a specified letter, relative to the
size of the set of words that have that letter in the third-letter position, people
tend to judge the set with the specified letter in the first-letter position to be the
larger, even when there are many more words in the other set (Tversky &
Kahneman, 1973). Presumably people can search memory more effectively
when looking for words with a specified letter in first-letter position than when
looking for words with that letter in third-letter position, and the greater avail-
ability of first-letter words supports the judgment that there are more of this
class in the language.
The availability heuristic may be involved in a variety of other findings relat-
ing to systematic biases in estimation tasks. It could help account, for example,
for why, when husbands and wives were asked to estimate the percentage of fam-
ily activities (including unpleasant activities such as arguments) for which each
spouse was responsible, the importance of each person's role was estimated
higher by him or herself than by his or her spouse, or for why basketball players
judged members of their own team to have been more responsible than members
of the opposing team for critical plays in their games (M. Ross & Sicoly, 1979).
It appears also that people tend to overestimate the extent of their own contri-
butions to tasks that they perform collaboratively with others (M. Ross & Sicoly,
1979). Honest misjudgments of this type are easily accounted for by the avail-
ability principle. One is bound to be more aware of the details of one's own work
on a collaborative task, especially with respect to those aspects of the effort that
are covert, than of the details of the work of one's collaborators. In recalling a
project, one is likely to have available more information pertaining to one's own
efforts than pertaining to the efforts of others, other things being equal.
Use of the availability heuristic has been considered one of the reasons why
people often misestimate the frequencies of risky events; events that tend to be
memorable, and hence available, tend to be perceived to occur with relatively
high frequency (Slovic et al., 1981/1986). It also can help account for the com-
mon finding that people often tend to be overconfident of their own hypothe-
ses. If, when generating hypotheses regarding the cause of an observed effect,
people fail to think of some of the plausible alternatives to those they favor, the
probabilities they assign to the latter are likely to be inappropriately high. Peo-
ple tend to produce less-than-complete sets of hypotheses and overestimate the
likelihood of the hypotheses they produce (Fischhoff, Slovic, & Lichtenstein,
1978; Mehle, 1982; Mehle et al., 1981).
Representativeness
There can be little doubt that people use the types of heuristics identified by
Kahneman and Tversky when making judgments and decisions about statisti-
cal or probabilistic variables. As Tversky and Kahneman (1974) point out,
these principles simplify the task of reasoning about uncertain situations and,
in general, are quite useful. They frequently yield correct conclusions or at
least conclusions that are correct enough for the practical demands of the mo-
ment. However, as Tversky and Kahneman also point out, they "sometimes
lead to severe and systematic errors" (p. 1124).
Piattelli-Palmarini (1994) puts the case more strongly:
We have come to see that our minds spontaneously follow a sort of quick and
easy shortcut, and that this shortcut does not lead us to the same place to which
the highway of rationality would bring us. Few of us suffer from any illusion that
the summary paths taken by our intuitions and approximations would lead us to
exactly the same point to which reason and exact calculation might have brought
us. But we do delude ourselves into thinking that we are thereby brought to a
neighboring area, one that is close enough. (p. 143)
It seems clear that the use of the kinds of heuristics being considered some-
times yields judgments and decisions the rationality of which appears dubious
at best. What should we conclude about them in general? Is their use stark evi-
dence of human irrationality? Should we try, through the educational system,
to teach students not to use them?
We do not know how pervasively these heuristics are used outside the psy-
chological laboratory. Several investigators have questioned whether that use
is really very extensive, and not all are convinced that the types of problems
and problem contexts that have been used to study them in the laboratory are
representative of those that people typically encounter in daily life (L. J. Co-
hen, 1981; Dennett, 1981; Lopes, 1982; Macdonald, 1986).
We also do not know if, when they are used outside the laboratory, they typi-
cally work well or poorly. Many of the experimental situations in which they
have been studied have been carefully designed to demonstrate how they can
lead one astray. But how representative are these contrived situations of those
that people typically encounter in everyday situations? One of the criticisms
of laboratory demonstrations of cognitive illusions is that experimenters some-
times get the appearance of irrational behavior only by violating widely ac-
cepted principles of normal discourse in their structuring of problem scenarios.
If heuristics of the type considered here are used extensively, and their use typi-
cally results in faulty judgments and decisions, why have they survived?
Would we not expect grossly ineffective approaches to judgment and decision
making to become casualties of natural selection over time?
One can make a case for the practical utility of these heuristics, at least under
certain conditions. As already noted, availability might be expected to be corre-
lated with relative frequency and therefore often to serve as an effective clue to rel-
ative frequency in statistical reasoning. The correlation is not perfect, of course,
and availability is known to be affected by factors other than relative frequency, so
its use as an indicant of relative frequency can lead to reasoning errors; but it is at
least conceivable that availability is a sufficiently reliable indicator of relative fre-
quency to ensure the availability heuristic's usefulness in many contexts.
J. R. Anderson (1990) has pointed out that the utility of any heuristic ap-
proach to a cognitive problem should be evaluated in terms of the expected
value of the consequences of applying it. To borrow the example he uses to
make the point, the tendency that some people have not to believe an argument
if the arguer appears not to believe it is normatively irrational, inasmuch as the
validity of an argument does not depend on the beliefs of the one who advances
it, but it is possible that people who use this heuristic are less likely to accept an
invalid argument than those who do not.
Gigerenzer (1991; Gigerenzer & Murray, 1987) has criticized the use of the
Kahneman and Tversky heuristics to account for errors in probabilistic reasoning
on several grounds. First, he argues that many of the "errors" that they are invoked
to explain are not really errors, or are so only from a particular narrow and chal-
lengeable interpretation of probability. Second, he sees the heuristically based ex-
planations, in several cases at least, as little more than redescriptions of the
phenomena they are intended to explain. Third, he dismisses the heuristics consid-
ered previously as useful explanatory constructs for much the same reason that
Popper (1959) dismissed the theories of Marx, Freud, and Adler, namely, their ex-
cessive versatility: These heuristics, in his words, "are largely undefined concepts
and can post hoc be used to explain almost everything" (p. 102).
My opinion regarding heuristic approaches to intellectually demanding
tasks is that they are valuable to the extent that they (a) work acceptably well in
most of the important situations in which they are applied, and (b) make signif-
icantly lighter demands on one's cognitive resources than alternative ap-
proaches that would give more precise or more consistently correct results.
What one gains by using heuristics is computational simplicity, the ability to
address complex problems with only modest effort; what one gives up is preci-
sion and guaranteed statistically optimal results. Is this a reasonable trade?
I doubt if it is possible to answer this question in any very conclusive way.
We can, however, reflect on the question and on what some partial answers
might be. It is conceivable, for example, that the trade is worth it in the sense
that the ratio of the benefit derived from the use of heuristics to the cost of their
use is greater, when integrated over all situations in which they are used, than
would be the ratio of the benefit derived from the use of more exact approaches
to the cost of their use. It also could be that heuristics work well enough in situ-
ations that really matter to offset their ineffectiveness in less important situa-
tions. Both of these possibilities could be seen as adaptive.
The question arises, if a heuristic procedure is effective in some situations and
not in others, why is it not applied only in situations of the former type? There are
at least two plausible answers to this question. The first challenges the tacit as-
sumption on which the question rests. There may well be examples of heuristics
that are used to advantage on some occasions by some people and avoided in fa-
vor of more exact methods on other occasions by the same people. We do not
know that this is not the case. A second plausible answer addresses those cases in
which a heuristic procedure is used more or less invariably, assuming there are
such. Invoking again notions of benefits and costs, one might assume that what is
to be gained by deciding on a case-by-case basis whether use of the heuristic is
likely to yield acceptable results or a more precise approach is called for is not
worth the cost of being constantly faced with the need to make this decision.
This is all conjectural, but some conjecture is probably required to see the
use of heuristics from a sufficiently broad perspective. The evidence is com-
pelling that situations can be designed in which people apply heuristics and, as
a consequence of doing so, form judgments, draw conclusions, or make deci-
sions that are suboptimal in specifiable ways. When viewed in isolation, such
applications of heuristics are likely to be taken as instances of irrational behav-
ior, but when viewed from a broader perspective these applications might be
seen as exceptions to the general usefulness of the procedures involved.
None of this is to deny the possibility of making people more aware of the
benefits and limitations of heuristic thinking and more discriminating in the
use of specific heuristic rules. Presumably not all heuristics are equally effec-
tive, relative to any measure of effectiveness that might be applied, and specific
heuristics are applied to better advantage in some situations than in others. It is
hard to believe that a more extensive and explicit knowledge of the advantages
and limitations of heuristic thinking and of specific heuristic rules could be
detrimental to one's rationality and it is easy to see how it might increase the ef-
fectiveness with which one could meet intellectual challenges in general.
The idea that sometimes lack of knowledge can be advantageous by actually in-
creasing the probability of selecting the correct item among two or more alterna-
tives considered has been pursued experimentally by Gigerenzer and colleagues.
D. G. Goldstein and Gigerenzer (1999) describe the results of experiments
showing that basing choices on simple recognition can be an effective strategy in
dealing with situations in which one knows little or nothing about the alterna-
tives, but recognizes (only) one among them. This heuristic is domain specific,
Goldstein and Gigerenzer caution, working only in domains in which recogni-
tion is correlated with the property with respect to which the choice is to be
made. It works in the case of selecting, from two or more alternatives, the
foreign city that has the largest population, presumably because the greater the popu-
lation of a city, the greater the probability that one has heard of it.
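A minimal sketch of the recognition heuristic as Goldstein and Gigerenzer describe it (the cities and the recognition set here are invented): when exactly one of two alternatives is recognized, choose it; otherwise the heuristic is silent and one must guess or fall back on other knowledge.

import random

recognized = {"Munich", "Madrid", "Oslo"}   # cities this judge has heard of

def recognition_choice(a, b):
    if (a in recognized) != (b in recognized):
        return a if a in recognized else b   # pick the recognized alternative
    return random.choice([a, b])             # heuristic does not apply: guess

print(recognition_choice("Munich", "Herne"))   # -> Munich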
In a particularly thought-provoking study, Borges, D. G. Goldstein,
Ortmann, and Gigerenzer (1999) showed that investments in stocks selected
solely on the basis of recognition of company names outperformed stock port-
folios selected on the basis of conventional criteria, in some cases by experts.
The investigators' conclusion that "In general it seems that the greater the de-
gree of ignorance, the better it is for picking stocks" undoubtedly needs some
qualification (else we would all be millionaires), but their account of how lack
of knowledge can be beneficial in specific situations is instructive.
Gigerenzer and D. G. Goldstein (1999) consider several heuristics for de-
ciding when to stop searching for cues on which a selection might be made in a
choice situation and then making the selection. Suppose several cues are avail-
able regarding the property of interest; cues to the size of a European city, for
example, might include whether it has a soccer team, whether it is a national
capital, whether it is home to a university, and so on. If the cues can be ordered in terms
of their correlation with the property of interest, one often may do quite well by
relying solely on the single best cue among those one knows about. It works
best if, when the cues are ordered in terms of their importance, each cue is more
important than any combination of the less important cues (Martignon &
Hoffrage, 1999). Gigerenzer and Goldstein call this heuristic Take the Best,
and they present data showing that it can be about as effective as standard sta-
tistical analyses, including multiple regression (which require much greater
computational capability) in specific instances.
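In procedural terms, Take the Best is a lexicographic rule: consult cues in order of validity and stop at the first one that discriminates. A minimal sketch follows (the cue names, their ordering, and the city data are hypothetical):

```python
cities = {  # hypothetical cue values: 1 = cue present, 0 = absent
    "CityA": {"capital": 1, "soccer_team": 1, "university": 1},
    "CityB": {"capital": 0, "soccer_team": 1, "university": 0},
}
cue_order = ["capital", "soccer_team", "university"]  # most to least valid

def take_the_best(a, b):
    """Decide on the single best discriminating cue; ignore the rest."""
    for cue in cue_order:
        va, vb = cities[a][cue], cities[b][cue]
        if va != vb:
            return a if va > vb else b  # first discriminating cue decides
    return None  # no cue discriminates; one would have to guess

print(take_the_best("CityA", "CityB"))  # -> CityA (decided by "capital" alone)
```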
Other studies of the effectiveness of fast and frugal heuristics and single-rea-
son decision making—and especially of the Take the Best heuristic—are re-
ported by Czerlinski, Gigerenzer, and D. G. Goldstein (1999), Martignon and
Hoffrage (1999), Martignon and Laskey (1999), and other contributors to
Gigerenzer and Todd (1999b). Heuristics are described for estimating quantities
(Hertwig, Hoffrage, & Martignon, 1999), categorizing objects (Berretty, Todd,
& Martignon, 1999) or actions (Blythe, Todd, & Miller, 1999), searching for a
mate (Todd & G. F. Miller, 1999), and investing by parents in their offspring (J.
N. Davis & Todd, 1999). The general idea promoted by these investigators is that
people generally reason by availing themselves selectively of the contents of a
toolbox of ecologically rational simple heuristics that are effective despite mak-
ing limited demands on memory and computational resources. The heuristics in
the toolbox have been adapted over time and they are retained because they
work—in some cases as well as, or even better than, formal procedures that re-
quire significant computational resources. Different tools are suited to different
situations and it is important that there be a match between a tool that is selected
and the structure of the situation in which it is to be applied. A major challenge
for this line of research is that of identifying the conditions under which specific
tools should be used (Luce, 2000) and of safeguarding against applications that
can have the opposite of the desired effect (Margolis, 2000).
Framing Effects
Problem: Imagine that the U.S. is preparing for the outbreak of an unusual Asian
disease, which is expected to kill 600 people. Two alternative programs to com-
bat the disease have been proposed. Assume that the exact scientific estimates of
the consequences of the programs are as follows:
Frame 1: If Program A is adopted, 200 people will be saved. If Program B is
adopted, there is a one-third probability that 600 people will be saved and a
two-thirds probability that no people will be saved. Which of the two programs
would you favor?
Frame 2: If Program C is adopted, 400 people will die. If Program D is adopted,
there is a one-third probability that nobody will die and a two-thirds probability
that 600 people will die. Which of the two programs would you favor?
Kahneman and Tversky (1984) found that 72% of 152 people who were
given the first version of this problem selected Program A, whereas 78% of 155
people who were given the second version selected Program D. The same type
of result was obtained with sophisticated participants as with naive ones, and
even when the same people responded to both representations within a period
of a few minutes. Kahneman and Tversky's explanation invokes the notion that
people tend to be risk aversive when thinking in terms of gains and risk seeking
when thinking in terms of losses. The first representation of the epidemic prob-
lem formulates the outcomes of the alternative programs in terms of two possi-
ble gains relative to a reference point in which 600 people are expected to die.
The second representation provides options expressed as losses relative to a
reference point in which no one dies.
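The premise of the demonstration is that the paired programs are numerically equivalent in expected lives saved; the following sketch, a quick check using only the figures given in the problem, makes that equivalence explicit:

```python
# Expected number of lives saved (out of 600) under each program,
# using only the figures given in the problem statement.
expected_saved = {
    "A": 200,                      # 200 saved for certain
    "B": (1/3) * 600 + (2/3) * 0,  # one-third chance all 600 are saved
    "C": 600 - 400,                # 400 die for certain, so 200 are saved
    "D": (1/3) * 600 + (2/3) * 0,  # one-third chance nobody dies
}
print(expected_saved)  # all four programs save 200 lives in expectation
```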
This is a compelling demonstration that preferences among decision out-
comes are determined not only by what the outcomes are, but by the way in
which they are represented. The result has been replicated in other studies not
only when the differently framed choices were given to different groups of
people, but also when the same people were given both types of frames
(Dawes, 1988). However, before accepting without reservation the conclusion
that people's preference for a given outcome changes as a function of how that
outcome is described, we would like to feel sure that they believe that an out-
come described in different ways is indeed the same outcome. In the aforemen-
tioned case, we would want to be convinced that people believed the outcomes
of A and C (and of B and D) would be the same. Without this assurance, we
cannot rule out the possibility that their choices were determined by what
they thought the outcomes would be, as opposed to being influenced by how
identical outcomes were expressed.
The intent of the experimenters was to frame precisely the same prob-
lem—present precisely the same information—in different ways. The assump-
tion is that the assertion "200 people (out of 600) will be saved" is equivalent to
the assertion that "400 people (out of 600) will die." As Macdonald (1986)
points out, however, such assertions, as they are used in everyday language,
may not be seen as conveying exactly the same information. In many contexts,
a guarantee to save 200 people could be taken as a guarantee to save at least
200. If more than 200 were saved, the guarantor would have been seen as keep-
ing his promise, whereas, if fewer than 200 were saved, he would not. Similarly,
the assertion that 400 people will die could be taken as a threat to 400 lives at
least. If the language used in the problem is interpreted this way, then A and C
are not equivalent and do not describe equally desirable choices: "Subjects
may see themselves as comparing an uncertain result with either a clearly posi-
tive one (A) or a clearly negative one (C)" (p. 24). If linguistic ambiguities of
this sort play some role in determining people's choices, we still might want to
describe the effects as framing effects, but we would have to acknowledge the
possibility that framing effects result, at least in part, from the fact that differ-
ent frames may lend themselves to linguistic interpretations that are not equiv-
alent in meaning. Others have also argued this possibility (Berkeley &
Humphreys, 1982; Kühberger, 1995).
Even if the role of linguistic ambiguities were not an issue in interpreting the
framing effect described, it is not clear that the people's behavior in this situa-
tion should be considered irrational. J. R. Anderson (1990) points out that, al-
though Kahneman and Tversky's (1984) theory of problem framing accounts
for the choices their subjects made, there is little reason to consider those
choices irrational; inconsistent behavior should not be considered evidence of
irrationality in a situation in which there is no good reason to be consistent, and
no rational model would prescribe a consistent preference between choices
with equivalent outcomes. Anderson notes also that in some other choice situa-
tions in which framing effects have been found, the stakes riding on the choice
have been small. Inconsistent behavior in such instances, he argues, may con-
stitute evidence against rationality in the normative sense, but make a weak
case against it in the adaptive sense.
Different ways of presenting essentially the same information can have
different effects on those to whom the information is given. The relative at-
tractiveness of different medical procedures, for example, differs for both
physicians and patients depending on whether the probable outcomes are
described in terms of mortality or survival (McNeil, Pauker, Sox, &
Tversky, 1982) and on precisely how the degree of uncertainty regarding
the outcome is expressed (Teigen & Brun, 1999). In betting situations,
choices vary depending on whether a particular gamble is expressed in
terms of probability of winning or probability of losing (I. P. Levin, D. P.
Chapman, & R. D. Johnson, 1988; I. P. Levin et al., 1986; I. P. Levin, R. D.
Johnson, Russo, & Deldin, 1985). Lobbyists for the credit card industry
have shown a sensitivity to framing effects in arguing that price differences
between cash and credit purchases be expressed as cash discounts rather
than credit card surcharges (Thaler, 1980).
How such results should be interpreted depends in part on the confidence
one can have that the two forms of a given message have been equally under-
stood by those to whom they were given. The point is illustrated by an experi-
ment in which people were either (a) informed that over a period of 50 years of
driving (about 40,000 trips) the probability of being killed is about 1 in 100 and
that of experiencing at least one disabling injury is about 1 in 3 or (b) given the
equivalent information by being told that there is one fatal accident for every
3.5 million person trips and a disabling injury for every 100,000 person trips.
Those in the former group responded more favorably to the use of seat belts
than did those in the latter (Slovic et al., 1978). Perhaps most people would find
the first message somewhat easier to relate to their personal chances of having
a serious accident, simply because it is phrased explicitly in those terms. Simi-
lar questions of comprehension can be raised about many of the studies in
which presumably the same information was conveyed in two different ways.
Toda (1963) has argued that subjective probability is, in effect, defined by
the technique that is used to measure it. This argument gets some support in the
finding that the way people are asked to give probability estimates affects the
estimates they produce (Damas, Goodman, & C. R. Peterson, 1972; Herman,
Ornstein, & Bahrick, 1964; Pitz, 1974; Seaver et al., 1978).
A powerful effect of question wording was obtained by Fischhoff and
MacGregor (1983) (reported also in Fischhoff, Slovic, & Lichtenstein, 1982,
and in National Research Council, 1989, Appendix C). When asked to esti-
mate the lethality rate from influenza, people estimated about 393 deaths per
100,000 influenza cases. When told that about 80 million people have influ-
enza in a normal year and asked to estimate the number of those cases that
would end in death, their average estimate was 4,800, which is equivalent to
only about 6 deaths per 100,000. Here, too, one might assume that in telling
people how many cases of influenza there are in an average year, one is provid-
ing information that they might consider relevant to the rate estimate; applying
the rate of 393 per 100,000 to a total case number of 80 million would produce
an estimate of more than 300,000 deaths, which might seem implausibly high.
There is abundant evidence that public opinions, as reflected by the results
of polls, can vary considerably depending on the way in which questions are
framed (Moore, 1992; Payne, 1952; Wheeler, 1976). Even the degree of satis-
faction with one's life that one expresses can depend on the range of options
provided (Parducci, 1974). Especially problematic is the use of words that can
have a variety of connotations. It is hard to know, for example, what to make of
the results of such polls without knowing how people interpret such terms as possible and
doubt. Some people may apply possible only to events that they consider to be
somewhat likely, whereas others may apply it to those they consider even re-
motely conceivable; similarly, doubt can connote anything from fairly strong
disbelief—"Tom believes that Elvis is alive and well, but I really doubt it"—to
recognition of the remote possibility of being wrong—"I am convinced this is
the right thing to do, although I confess to a lingering doubt."
It is fairly easy to get people to make what appear to be inconsistent choices un-
der uncertainty. An example from Thaler (1983/1986; see also Thaler, 1980)
will make the point. He describes the following three situations:
Risk Situation 1: While attending the movies last week you inadvertently exposed
yourself to a rare, fatal disease. If you contract the disease, you will die a quick and
painless death in one week. The chance that you will contract the disease is exactly
.001—that is, one chance in 1000. Once you get the disease there is no cure, but
you can take an inoculation now which will prevent you from getting the disease.
Unfortunately, there is only a limited supply of inoculation, and it will be sold to
the highest bidders. What is the most you would be willing to pay for this inocula-
tion? (If you wish, you may borrow the money to pay at a low rate of interest.)
Risk Situation 2: This is basically the same as situation 1 with the following
modifications. The chance you will get the disease is now .004—that is, four in
1000. The inoculation is only 25 percent effective—that is, it would reduce the
risk to .003. What is the most you would be willing to pay for this inoculation?
(Again, you may borrow the money to pay.)
Risk Situation 3: Some professors at a medical school are doing research on
the disease described above. They are recruiting volunteers who would be re-
quired to expose themselves to a .001 (one chance in 1000) risk of getting the
disease. No inoculations would be available, so this would entail a .001 chance
of death. The 20 volunteers from this audience who demand the least money
will be taken. What is the least amount of money you would require to partici-
pate in this experiment? (p. 163).
Thaler reports that people to whom he has presented these situations typi-
cally give median responses of about $800 in Situation 1, $250 in Situation 2,
and $100,000 in Situation 3. There is a sense in which what one is buying in the
first two cases is the same, namely a reduction of .001 in the likelihood of con-
tracting a disease, which is equivalent to what one is selling in the third situa-
tion. People's responses here certainly appear to be inconsistent. In particular,
comparing the responses to Situation 3 with those to Situations 1 and 2, we see
a very large difference. In Thaler's (1983/1986) words, "this implies that a typ-
ical individual would refuse to pay $5,000 to eliminate a risk, and would refuse
to take $5,000 to accept the same risk. How can $5,000 be both better and
worse than bearing some risk?" (p. 164).
It appears from these results that people are not willing to pay as much to de-
crease an existing risk as they require to be compensated for accepting a new risk
of comparable magnitude. Thaler refers to this phenomenon as the "endowment
effect," the idea being that people demand more money to give something up
than they are willing to pay to acquire it. The effect has been observed in other
contexts than that just described (Kahneman, Knetsch, & Thaler, 1990).
To illustrate the endowment effect in a situation that does not involve risk,
Thaler poses the following decision situation: "Suppose you won a ticket to a
sold-out concert that you would love to attend, and the ticket is priced at $15.
Before the concert, you are offered $50 for the ticket. Do you sell? Alterna-
tively, suppose you won $50 in a lottery. Then a few weeks later you are offered
a chance to buy a ticket to the same concert for $45. Do you buy?" According to
Thaler many people say they would neither sell for $50 in the first case nor buy
for $45 in the second. A practical implication that is suggested by the endow-
ment effect is that people may be willing to work harder to hold on to some-
thing they have than to acquire it in the first place. Or to put it in other words,
having acquired something, people may be very reluctant to give it up, even if
they did not greatly desire it originally.
We have already noted that people often express a preference for a gam-
ble with a relatively small potential payoff and a relatively large probability
of winning over one with the same expected payoff but composed of a larger
potential payoff and smaller probability of winning. That is they opt for the
larger probability of winning. If given the chance to sell either gamble,
however, the same people are likely to demand a higher price for the second
one than for the first (Grether & Plott, 1979). One account of this apparent
inconsistency is that when the focus is on winning versus losing, people an-
chor on the probability of success, whereas when the focus is on selling,
they anchor on monetary amounts, and use of the anchor-and-adjustment
heuristic in the two cases yields different results because it starts from dif-
ferent points (Dawes, 1988).
An endowment effect of sorts is easy to understand in intuitive terms in
some contexts. I have a small collection of antique wood-working tools. There
are a few tools in that collection that I would be unwilling to sell even if offered
considerably more than I believe to be their fair market value, but I did not pay
more than what I considered their fair market value to be when I acquired them,
and if I did not already have them, I would be unlikely to pay that much for them now. The value
of these tools to me is determined, in part, by the fact that while in my posses-
sion they have become to me more than objects. They represent memories of
successful treasure hunts with my wife and of many pleasant hours spent learn-
ing about their functions, working with them, and admiring the craftsmanship
that produced them. Some have special significance because of unusual places
or circumstances in which they were found.
I suspect that most of us, if asked, would be able to identify things we
possess that have value to us by virtue of the fact that we possess them. We
would be unlikely to be willing to sell them for some specified amount of
money that is greater than what we would be willing to pay for them if we
did not possess them already. In fairness to those investigators who have
seen the endowment effect as evidence of an irrational inconsistency in
choice behavior, I think that instances of personal possessions that have
come to have special significance to their owners are not the types of enti-
ties for purchase and sale they have had in mind. My point is that one can
easily think of circumstances under which it seems reasonable and not sur-
prising that the desire to hold on to something one has is greater than the de-
sire to acquire it in the first place; I am not prepared to argue that this
asymmetry is reasonable in all cases.
Considerable attention has been given by researchers to the fact that some of
the techniques that have been used to get probability estimates from people
have the property that it is not in one's best interest to report one's real subjec-
tive probabilities. The problem is illustrated by the following situation. Imag-
ine a test composed of two-alternative forced-choice questions. Suppose that,
instead of selecting one of the alternatives in each case, the task is to state, for
each alternative, one's subjective probability that it is correct. (Assume that
one and only one of the alternatives is correct in each case, so one's subjective
probabilities for each pair of alternatives must sum to 1.0.) Suppose that when
the test is scored, the score received for each item is the amount placed on the
correct alternative, or some linear function thereof. Thus, for example, if a
probability of .8 had been assigned to the correct alternative on Item 1 and one
of .4 to the correct alternative on Item 2, .8 and .4 would be credited respec-
tively for these items, or, in any case, twice as much for the first item as for the
second. It is easy to show that, given this scoring rule, one should not assign
numbers to the alternatives in such a way as to reflect one's true beliefs about
their probabilities of being correct; rather one should assign 1.0 to any alterna-
tive whose probability one considers to be greater than .5, and 0 to any whose
probability one considers to be less than .5. (For those one considers to be ex-
actly .5, one could assign .5, or assign either 1.0 or 0 on the basis of a coin toss.)
To see why the "all-or-none" strategy is better than that of weighting the al-
ternatives according to one's real subjective probabilities, consider all those
cases in which one believes the probability of a given alternative is .7. If .7 is as-
signed to those alternatives, then one expects to have assigned this value to the
correct alternative 70% of the time and to the wrong alternative the remaining
30%. Thus, one's expected average score for this subset of items is (.7)(.7) +
(.3)(.3), or .58. On the other hand, if 1.0 is assigned to all those items that one
believes to be correct with probability .7 and 0 to those one believes to be cor-
rect with probability .3, one's expected average score for these items is
(.7)(1.0) + (.3)(0), or .70. One does better in the latter case. (The situation corre-
sponds to the one discussed relative to probability matching in chap. 8.)
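The arithmetic of this comparison is easy to verify; the following sketch, using only the .7 belief from the example, computes the expected scores under the linear rule for truthful and all-or-none reporting:

```python
p = 0.7  # one's true belief that a given alternative is correct

# Linear rule: the score is the probability placed on the correct alternative.
truthful = p * p + (1 - p) * (1 - p)    # report .7 -> expected score .58
all_or_none = p * 1.0 + (1 - p) * 0.0   # report 1.0 -> expected score .70
print(truthful, all_or_none)            # 0.58 vs 0.70: honesty does not pay here
```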
Several scoring rules have been invented that have the property, sometimes
called the "matching property," that one maximizes one's expected score when
one reports one's true subjective probabilities. These rules, which have been
referred to as "admissible probability measures" and as "proper scoring rules,"
have been the subject of both theoretical and empirical investigation (de
Finetti, 1962; Good, 1952; Roby, 1965; Shuford, Albert, & Massengill, 1966;
Toda, 1963). I will describe briefly one such rule—the spherical-gain rule—to
illustrate the concept.
According to the spherical-gain rule, described by Roby (1965), one's score
on any question (in a multiple-choice test item for which one and only one an-
swer is correct) is the number assigned to the correct alternative divided by the
square root of the sum of squares of the numbers assigned to all the alterna-
tives; that is,

$$S_j = \frac{x_{jc}}{\sqrt{\sum_k x_{jk}^2}},$$

where $S_j$ represents the score received for the jth question on the test, $x_{jk}$ the num-
ber that the student assigns to the kth alternative for the jth question, and $x_{jc}$ the
number assigned to the correct alternative for that question. (The rule gets its
name from the fact that the assignments of numbers to n alternatives can be rep-
resented as vectors in an n-dimensional space and interesting interpretations can
be given to various properties of the vectors, such as length and orientation.)
Imagine, for illustrative purposes, the situation represented in Table 10.1. It
should be obvious that $0 \le S_j \le 1$. The score will be 0 if 0 has been assigned to the
correct alternative (as in answer A in the table); it will be 1 if 0 is assigned to every alterna-
tive except the correct one (as in B). A "pure guess" in which the same nonzero
number is assigned to all alternatives (as in C) will result in a small, but not 0,
score. (This feature reflects the idea that knowing that one does not know is
preferable to believing that one knows when one really does not.) Answer D
represents a case in which one can rule out two alternatives and is able to ex-
press differential confidence about the remaining three. The score in this case
rewards the test taker for the partial knowledge displayed.

TABLE 10.1
Question: Which of the Following U.S. Presidents Served Two Nonconsecutive Terms?
1. James Madison
2. Grover Cleveland
3. Warren Harding
4. John Adams
5. James Buchanan
(The correct answer is #2.)

Hypothetical answers (number assigned to each alternative):

Alternative     A      B      C      D
1               0      0      1      0
2               0     10      1      7
3               0      0      1      0
4               5      0      1      6
5               0      0      1      2
Score         0.00   1.00   0.45   0.74
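The scores in Table 10.1 can be reproduced directly from the rule; the following is a minimal sketch (alternative 2, at index 1, is the correct one in every case):

```python
import math

def spherical_score(assignments, correct_index):
    """Score = number on the correct alternative / length of the vector."""
    norm = math.sqrt(sum(x * x for x in assignments))
    return assignments[correct_index] / norm if norm else 0.0

answers = {
    "A": [0, 0, 0, 5, 0],   # everything on a wrong alternative
    "B": [0, 10, 0, 0, 0],  # everything on the correct alternative
    "C": [1, 1, 1, 1, 1],   # a pure guess
    "D": [0, 7, 0, 6, 2],   # partial knowledge
}
for name, a in answers.items():
    print(name, round(spherical_score(a, correct_index=1), 2))
# A 0.0, B 1.0, C 0.45, D 0.74 -- the scores shown in Table 10.1
```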
In general, the larger the number placed on the correct alternative relative to
the numbers assigned to the other alternatives, the larger the resulting score.
More important, one maximizes one's expected score if (and only if) one as-
signs numbers to the alternatives in accordance with one's true beliefs regard-
ing their relative chances of being correct. Consider again the two-alternative
case mentioned earlier in which one believes that the probability that a particu-
lar alternative is correct is .7. As noted previously, given a linear scoring rule,
one's best strategy is to put all one's bet on the most likely alternative, in this
case the one with subjective probability .7. If one bet in accordance with one's
beliefs one's expectation would be .58, whereas if one put all one's bet on the
most likely alternative, it would be .70.
To simplify the illustration of what happens with the spherical-gain rule, let
us assume that the total points one uses equals 10 (any other number would do
as well). With the spherical-gain rule, one's best strategy is to bet strictly in ac-
cordance with one's true beliefs. If one put all one's bet on the alternative one
considered to be most likely correct, the one with probability .7 in our example,
the expected gain would be

$$(.7)\frac{10}{\sqrt{10^2}} + (.3)\frac{0}{\sqrt{10^2}} = .70,$$
whereas if one bet in accordance with one's true beliefs, placing .7 of one's bet
on the .7 alternative and .3 of it on the other one, one's expectation would be

$$(.7)\frac{7}{\sqrt{7^2 + 3^2}} + (.3)\frac{3}{\sqrt{7^2 + 3^2}} = \frac{5.8}{\sqrt{58}} \approx .76.$$
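That truthful reporting is optimal under the spherical rule, and not merely better than the all-or-none strategy, can be checked numerically; the sketch below sweeps over all possible reported splits for the two-alternative case with a true belief of .7:

```python
import math

def expected_spherical(report, belief=0.7):
    """Expected spherical score when one reports `report` on the
    alternative one believes correct with probability `belief`."""
    norm = math.sqrt(report ** 2 + (1 - report) ** 2)
    return (belief * report + (1 - belief) * (1 - report)) / norm

best = max((r / 100 for r in range(101)), key=expected_spherical)
print(best, round(expected_spherical(best), 3))  # 0.7 0.762: honesty is optimal
print(round(expected_spherical(1.0), 3))         # 0.7: all-or-none does worse
```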
The superiority of scoring rules that have the matching property over those
that do not have it is indisputable from a mathematical point of view; however,
the relevance of this distinction to much of human probabilistic reasoning is
debatable. The evidence that people behave very differently when scoring
rules are admissible than when they are not is thin, and in many, if not most, of
the real-life situations in which probabilistic reasoning is required, outcomes,
which are controlled by nature, do not demonstrably have the matching prop-
erty in any case. The use of such scoring rules for test administration is an inter-
esting possibility, however, especially when tests could be administered under
computer control so that test takers could get immediate feedback that might
help them develop a good understanding of the advantages of honestly reflect-
ing their true belief states in their answers. Admissible scoring rules have other
characteristics that could be exploited to advantage in providing much
finer-grained feedback to teachers regarding the knowledge states of students
and classes in the aggregate than do the results of conventionally administered
multiple-choice tests.
One way to run a lottery is to permit players to purchase tickets with preas-
signed numbers. Another is to permit the ticket buyers to specify the numbers
they wish to play. There is some evidence that players find the second approach
more to their liking than the first. In one study, in which a lottery was conducted
by drawing names of professional football players from a bag, participants
who selected the name of a player valued their tickets about four times as much
as did participants who were assigned a name at random (Langer, 1975).
Langer's explanation of this difference was that the ability to choose gives one
an illusion of control.
Apparently people believe—or at least act as though they believe—that
their chances of winning are greater when they select an item that is to be part
of a random draw than when they are assigned one. We know that many people
who play lotteries do not choose random numbers, but resort to a variety of
methods for selecting what they hope to be lucky ones. Given Langer's (1975)
findings, we would have to judge as highly rational the behavior of lottery op-
erators who have opted to permit ticket buyers to select their own numbers, be-
cause such lotteries should entice more players than those in which the player
has to take whatever happens to be on the next available ticket.
Paulos (1998) describes an interesting way in which feedback from lotteries
can provide information that can be interpreted, erroneously, as evidence that
players stand a better chance of winning if they pick their own numbers than if
they accept numbers picked by machine. Consider a lottery that has only two
players, and the winner is selected by a random drawing of a number between 1
and 10. Suppose that player A always plays her favorite number, say 7, whereas
B plays a different number each time. Over many games, A and B will win with
about equal frequency, but the number 7 will be the winning number more often
than any other number; 7 is the winning number on all of the 50% of the games
that A wins, whereas on the other 50% of the games, which B wins, the winning
numbers are distributed over the entire range, so no single number wins for B
on more than a small percentage of the games. If, in a large lottery, many of the
people who pick their own numbers stay with the same number week after week,
whereas machine-picked numbers vary randomly over all the possibilities, it can
happen that hand-picked numbers have a higher probability of winning than do
machine-picked numbers, whereas individual players who hand pick their num-
bers do not have a higher probability of winning than do those who buy numbers
picked by machine. Paulos points out that, in an analogous way, it can happen
that "any situation having many outcomes, one of which is conventionally fa-
vored by the society at large, will seem to generate the conventionally favored
outcome more frequently than chance would suggest" (p. 184).
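Paulos's point is easy to verify by simulation; in the following sketch the number of games and the random seed are arbitrary choices for illustration:

```python
import random
from collections import Counter

random.seed(0)  # arbitrary; for reproducibility of the illustration
wins, winning_numbers = Counter(), Counter()
for _ in range(100_000):
    drawn = random.randint(1, 10)   # the number drawn by the lottery
    b_pick = random.randint(1, 10)  # B changes numbers; A always plays 7
    if drawn == 7:
        wins["A"] += 1
        winning_numbers[drawn] += 1
    if drawn == b_pick:
        wins["B"] += 1
        winning_numbers[drawn] += 1

print(wins)                            # A and B win about equally often...
print(winning_numbers.most_common(3))  # ...but 7 wins far more than any other number
```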
Some people, including, perhaps especially, those who gamble, make a dis-
tinction between chance and luck (Keren & Wagenaar, 1985; Wagenaar &
Keren, 1988; Wagenaar, Keren, & Pleit-Kuiper, 1984). Luck is seen as a causal
agent that sometimes overrules the effects of chance. Good luck, for example,
might be credited with an unexpected run of wins in a game of chance, such as
roulette. A run of losses is attributed to bad luck. Belief in luck in this context
can be reinforced by the tendency to underestimate the probability of relatively
low-probability chance events, such as several successive tosses of heads on a
coin. Inasmuch as the probability of alternations is typically overestimated
whereas that of repetitions is underestimated (Budescu, 1985; Falk, 1981; Falk
& Konold, 1997; Kubovy & Gilden, 1990; Lopes & Oden, 1987; Am.
Rapoport & Budescu, 1992), a run demands an explanation as a non-chance
event, and the idea of luck satisfies that demand. Belief in luck can give one a
false sense of control over the situation: It supports the assumption that contin-
uing to play when luck is with one, and quitting when it is not, improves one's
chance of winning.
Falk (1991) points out that beliefs in the "hot-hand" phenomenon and in
luck are analogous in interesting ways:
You only have to exchange these two terms with each other and you get the same
story. People believe that one cannot force luck to happen. One should wait till
luck appears and know how to utilize it wisely. By the same token, a hot hand
cannot be summoned at will, but once a player is "hot" it is important for players
on the team to pass the ball to that player (and for the opposition to watch him
closely). In both contexts, people are confronted with random sequences of wins
and losses and their attention is drawn to the subjectively overlong runs. Far
from adjusting their concept of what could happen by chance, they now invoke
an interpretation in the form of "luck" or "hot hand." The gambler's fallacy in
perception of randomness thus appears in different disguises and surely it does
so in many other contexts. (p. 217)
SUMMARY
11
People as Intuitive Probabilists
Applying probabilities and statistics is much more a matter of grasping the situ-
ation, constructing informal arguments, and building comprehensive narratives
than of substituting numbers into formulas.
—Paulos (1998, p. 82)
One of the earliest and best documented findings regarding how well people
do, relative to what Bayes's rule says they should do, in extracting information
from data is that when people estimate posterior probabilities p(H|D)—when
they attempt to revise estimates of the probability that some particular hypoth-
esis is true upon receiving data that relate to that hypothesis—they tend to re-
vise their previous estimates in the appropriate direction but not by sufficiently
large amounts (Donnell & DuCharme, 1975; Edwards, 1968; Edwards,
Lindman, & Phillips, 1965; Messick & Campos, 1972; Navon, 1978; C. R. Pe-
terson & Beach, 1967; C. R. Peterson & DuCharme, 1967; C. R. Peterson & A.
J. Miller, 1965; C. R. Peterson, Schneider, & A. J. Miller, 1965; Phillips & Ed-
wards, 1966; Phillips, Hays, & Edwards, 1966; Am. Rapoport & Wallsten,
1972; Slovic & Lichtenstein, 1971). This result has been described as evidence
of people's conservatism as Bayesian information processors.
People tend, according to this view, to extract less information from data
than is there to be extracted. Another way to say this is that people require more
evidence than does an ideal Bayesian process to arrive at a given level of cer-
tainty regarding which of several competing hypotheses is true. If one begins
by assuming that each of several possibilities is equally probable, the effect of
this type of conservatism is to keep the estimator at all times closer to the condi-
tion of maximum uncertainty (equally likely alternatives) than is appropriate,
which means that at any given time the estimate of a posterior probability will
be too low for high probabilities and too high for low ones, relative to what
Bayes's rule would produce.
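The pattern can be made concrete with a small numerical sketch: a full Bayesian posterior after a run of observations, next to a "conservative" estimate that moves in the right direction but extracts less from the data (the damping factor used here is hypothetical, purely for illustration):

```python
def posterior(prior, likelihood_ratio):
    """Posterior probability of H from prior odds times the likelihood ratio."""
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.5
lr = (0.7 / 0.3) ** 4  # four observations, each favoring H at 7:3

full_bayes = posterior(prior, lr)
conservative = posterior(prior, lr ** 0.5)  # under-uses the evidence (hypothetical damping)
print(round(full_bayes, 3), round(conservative, 3))  # ~0.967 vs ~0.845
```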
Conservatism in the estimation of posterior probabilities has been described
as the primary finding of Bayesian research (Slovic et al., 1977). It was this
finding that prompted Edwards (1965) and others to propose that probabilistic
information-processing systems should use human experts to provide esti-
mates of data conditional upon hypotheses, p(D|H), for each of the hypotheses
under consideration and to have machines generate posterior probabilities,
p(H|D), in accordance with Bayes's rule, using the conditional probabilities
provided by the experts as inputs.
Why should conditional probabilities be easier to estimate accurately than
posterior probabilities? One possibility is that it is easier to think in terms of
contingencies in the direction of D given H (or H, therefore D) than in that of H
given D, because we find it more natural to reason from cause to effect. It may
be more natural in medicine, for example, to think in terms of the probability
that one has a particular symptom if one has a particular disease, because the
symptom is perceived as a consequence of the disease and not the reverse.
In addition, formal training programs may reinforce any tendency to think
in the D-given-H, or if-H-then-D, direction. Consider medicine again. Medical
students are taught what to expect by way of symptoms, given a specified dis-
ease. Less often are they encouraged to think in terms of the various diseases
for which a particular symptom could be a sign. One suspects this may be true
of teaching also in other areas, say electronic or automotive troubleshooting, or
criminal investigation.
Though most of the studies of posterior probability estimation have yielded
evidence of conservatism and have been consistent with the idea that human
estimators are better at estimating p(D|H) than p(H|D), there have been some
exceptions. Occasionally when the hypothesis set has been small and the envi-
ronment frequentistic, people have produced posterior probability estimates
that were more extreme than those produced by the application of Bayes's rule
(Southard, Schum, & Briggs, 1964). Messick and Campos (1972) found that
conservatism can be diminished by encouraging people to think of probability
as reflecting the ratio of favorable-to-total possible outcomes rather than
strength of belief. Other studies that cast doubt on the generality of the finding
of conservatism in the estimation of posterior probabilities include Howell
(1966), R. J. Kaplan and Newman (1966), and Schum, I. L. Goldstein, and
Southard (1966).
Conservatism, as represented by the results of the studies that have found it,
could be the consequence of an overweighting of the importance of prior prob-
abilities or, what amounts to the same thing, of an insensitivity to the
diagnosticity of incoming sample data, although it appears not to be the case
that people are totally insensitive to diagnosticity (Devine, Hirt, & Gehrke,
1990; Skov & Sherman, 1986; Snapper & C. R. Peterson, 1971; Trope &
Bassok, 1982, 1983). It could be due to inappropriate application of Bayes's
rule. And it could stem in part from a (not necessarily unreasonable) unwilling-
ness of people always to assume that the data selections are determined com-
pletely by chance (L. J. Cohen, 1982).
Another possibility is that people discount evidence to compensate for the fact
that it often is unreliable in real-world situations. One might argue that people
should not discount evidence in laboratory situations in which the data are unam-
biguous and completely trustworthy, but it is conceivable that they do so as a gen-
eral rule and that that habit serves them well in most of the cases they encounter in
the everyday world. The effect of the various normative approaches that have been
developed to deal with data that are known, or assumed, to be unreliable is to dis-
count evidence, thus moving probability updates in the direction of conservatism.
When people have been asked to make updates on the basis of data of different
specified degrees of reliability, they have tended to be overly conservative with
highly reliable data, and to become less conservative as reliability is decreased,
sometimes approaching optimality for intermediate degrees of reliability, and be-
coming insufficiently conservative for especially low ones (E. M. Johnson, 1974;
E. M. Johnson, Cavanagh, Spooner, & Samet, 1973; Schum et al., 1971).
Which of the various possibilities, or combination thereof, accounts for the
phenomenon is not clear, despite the considerable attention that has been given
to this question (Fischhoff & Beyth-Marom, 1983). Almost from the beginning
of work on Bayesian decision making, there have been proponents of the view
that conservatism results from miscalculation of conditional probabilities, as
well as of the alternative view that people can calculate, or estimate, the condi-
tional probabilities well enough but misapply the theorem as a whole. To
the extent that one attributes the finding of conservatism to computational dif-
ficulties, as distinct from the overweighting of prior probabilities or the
underweighting of updating information, one may question whether conserva-
tism is quite the right concept to apply.
Base Rates
Numerous experiments have yielded results that suggest that, upon receipt of
specific information about a probabilistic situation, people tend to rely on it
alone and to ignore, or underweight, prior probabilities, as represented by base
rates, more or less completely (Bar-Hillel, 1980; Bar-Hillel & Fischhoff, 1981;
Borgida & Brekke, 1981; J. J. Christensen-Szalanski & Beach, 1982; Grether,
1980; Lyon & Slovic, 1975, 1976). Contrary to the Bayesian prescription, the
new information seems to supplant the old entirely, or nearly so. Failure to give
due weight to base rates has been viewed as a particularly well documented and
pervasive manifestation of the shortcomings of people as intuitive statisticians.
On the face of it, the results of many experiments seem to support the idea
that people do indeed give inadequate weight to base rates in their estimates of
posterior probabilities. There is some question as to whether participants in ex-
periments always understand what the probabilities used in these problems
mean and whether they may confuse, say, the probability of a witness's report
conditional on the actual color of a cab with the probability that a cab was a
given color conditional on the witness reporting that color (Baron, 1988). This
is a worry and dictates caution in the interpretation of results. On the other
hand, the finding that people tend to ignore, or severely discount, base rates in
such problems is a robust one. It has been obtained with a variety of problems
and has proved to persist despite efforts by experimenters to increase the sa-
liency of the base rates in several ways.
Hammerton (1973), for example, used a disease-detection problem like those
discussed in chap. 7, in which the following information was pertinent. A partic-
ular diagnostic test will give a positive result with a probability of .9 when ap-
plied to a person who has the disease; when applied to a person who does not
have the disease, the same test will give a positive result with a probability of .01.
About 1% of the population has the disease. Mr. Smith has been tested and the
test result shows positive. What is the probability that Mr. Smith has the disease?
Letting p(H|D) and p(D|H) represent, respectively, the probability of hav-
ing the disease (Hypothesis) given a positive test result (Data) and the proba-
bility of getting a positive test result given one has the disease, Bayes's
computation is

$$p(H|D) = \frac{p(D|H)p(H)}{p(D|H)p(H) + p(D|\bar{H})p(\bar{H})},$$

or

$$p(H|D) = \frac{(.9)(.01)}{(.9)(.01) + (.01)(.99)} = \frac{.009}{.0189} \approx .48.$$
Another problem of the same general type, quoted from one such study, is the
following:

A light bulb factory uses a scanning device which is supposed to put a mark on
each defective bulb it spots in the assembly line. Eighty-five percent of the light
bulbs on the line are OK; the remaining 15% are defective. The scanning device
is known to be accurate in 80% of the decisions, regardless of whether the bulb is
actually OK or actually defective. That is, when a bulb is good, the scanner cor-
rectly identifies it as good 80% of the time. When a bulb is defective, the scanner
correctly marks it as defective 80% of the time. Suppose someone selects one of
the light bulbs from the line at random and gives it to the scanner. The scanner
marks this bulb as defective. What do you think is the probability (expressed as a
percentage) that this bulb is really defective? (p. 292)
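For reference, both posteriors can be computed directly from Bayes's rule; here is a minimal sketch using only the figures stated in the two problems:

```python
def posterior_positive(base_rate, hit_rate, false_alarm_rate):
    """p(disease/defect | positive result), by Bayes's rule."""
    p_positive = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
    return hit_rate * base_rate / p_positive

print(round(posterior_positive(0.01, 0.90, 0.01), 3))  # Hammerton's disease: 0.476
print(round(posterior_positive(0.15, 0.80, 0.20), 3))  # defective light bulb: 0.414
```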
Thus the evidence is strong that people often discount or ignore base rates. I
want to turn now to some evidence that people do not always do so, but before
doing that we should note a methodological problem that complicates the in-
terpretation of some studies of base-rate use. Sometimes it is not possible to de-
termine from data whether participants in a Bayesian information-processing
experiment were behaving in accordance with Bayes's rule or not, because it is
not clear what assumptions they made to fill in essential information missing
from problem descriptions. A critical assumption underlying the application of
Bayes's rule is that each observation is randomly and independently selected
from the entire set of hypothesized possibilities. When what constitutes that set
is not clear, the base rates will be ambiguous, and in order to judge the rational-
ity of people's estimates, one must know what assumptions they have made to
resolve that ambiguity.
But leaving that problem aside, let us accept as fact that people often fail to
pay much attention to base rates when reasoning about uncertain events. The
point to be made now is that they do not always do so. There are certain condi-
tions under which base rates are utilized and it appears to be possible to in-
crease the chance that they will be taken into account in other cases as well.
Base rates are used, for example, when case-specific information is not pro-
vided, which is to say when the base-rate information is all one has (Kahneman
& Tversky, 1973). It is important to bear in mind, however, that even irrelevant
case-specific information can swamp base-rate information, in some circum-
stances; apparently having no case-specific information and having some are
quite different situations, psychologically, independently of how useful or use-
less the case-specific information may be.
When case-specific information is available, base-rate information may be
used more appropriately if it is obtained from experience in real-world situations
than if it is provided verbally in laboratory situations (J. J. Christensen-Szalanski
& Bushyhead, 1981). It has been suggested that, in general, one is more likely to
use appropriate statistical reasoning when the statistical characteristics of the sit-
uation are objective and relatively apparent than when they are more subjective
or obscure (Jepson et al., 1983; Nisbett et al., 1976).
Even in laboratory situations base rates may be used if the information that
is critical to their relevance is sufficiently salient. For example, in one experi-
mental paradigm that has been used, a person is told the relative sizes of profes-
sion A and profession B, given some biographical information about an
individual, and asked to estimate for each of the professions the probability
that the individual is a member of it. The usual finding is that people pay less at-
tention to the information regarding the relative sizes of the professions—the
base-rate information—than they should, according to Bayes's rule, and more
to the biographical, or case-specific, information. It is important to note that
the base-rate information is relevant here only on the assumption that the indi-
vidual whose profession is being judged was randomly selected from the popu-
lation that includes the combined sets of the two professions. Gigerenzer, Hell,
and Blank (1988) have shown that when the validity of this assumption is made
salient by having people draw the biographical descriptions from an urn in
which the two professions are represented in the specified proportions, the
base-rate information is not ignored.
Other experimenters have been able to increase people's use of base rates by
making their random sampling from a population salient (R. D. Hansen &
Donoghue, 1977; Wells & Harvey, 1977) or by having them experience the sta-
tistics of a situation by witnessing a series of incidents, one at a time (J. J.
Christensen-Szalanski & Beach, 1982; Schlotterbek, 1992). Fischhoff, Slovic,
and Lichtenstein (1979) found they could improve the use of base-rate infor-
mation by giving the same participants several versions of a word problem that
were identical except for base rates; presumably the saliency of the
base-rate information was assured by this design.
There is also evidence suggesting that base rates are likely to be used when
they are expressed in sufficiently concrete terms (Manis, Dovalina, Avis, &
Cardoze, 1980), when they are expressed as frequencies (50 out of 1,000)
rather than as probabilities or percentages (.05 or 5%) (Cosmides & Tooby,
1990; Gigerenzer, 1993, 1994; Gigerenzer & Hoffrage, 1995), when their rele-
vance to a particular question is made clear (Brekke & Borgida, 1988), and es-
pecially if there is an apparent causal relationship between them and the cases
of interest (Ajzen, 1977; Bar-Hillel, 1980; Tversky & Kahneman, 1978).
When the taxicab problem is recast so that .85 and .15 represent the proportion
of accidents involving blue and green cabs respectively, rather than the propor-
tion of blue and green cabs on the road, the base rates are more likely to be
taken into account (Bar-Hillel, 1980, 1983, 1984). When the statement of the
taxicab problem provides the information that 85% of the cabs on the road are
blue and 15% are green, use of these numbers as base rates requires the as-
sumption that the probability that the cab involved in the accident in question
was blue (green) is exactly the same as the proportion of all city cabs that are
blue (green), which is tantamount to assuming that the probability that any
given cab will be involved in an accident is independent of its color. This seems
a reasonable default assumption in the absence of information to the contrary,
but it is an assumption. Bar-Hillel's result suggests that people may fail to
make this assumption in the absence of explicit instructions to do so.
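Under the independence assumption just described, the computation is straightforward; the sketch below uses the 80% witness reliability commonly used in versions of this problem (an assumed figure, since it is not stated here):

```python
def p_blue_given_report(p_blue=0.85, reliability=0.80):
    """Probability the cab was blue given a witness report of 'blue',
    assuming accident risk is independent of cab color."""
    p_report_blue = reliability * p_blue + (1 - reliability) * (1 - p_blue)
    return reliability * p_blue / p_report_blue

print(round(p_blue_given_report(), 3))     # using the base rates: ~0.958
print(round(p_blue_given_report(0.5), 3))  # ignoring them (50/50 prior): 0.8
```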
Although the data clearly demonstrate that people may become more sensi-
tive to base rates if they explicitly engage in random sampling, or perhaps are
helped in some other way to think in frequentistic terms, they do not support
the conclusion that base-rate neglect can always be eliminated in this way
(Camerer, 1990; Griffin & Tversky, 1992). Gigerenzer (1991b) also cautions
that simply telling people that cases were randomly sampled may not suffice to
convince them of that, especially in instances in which random sampling
would not be expected in the real-world situations the experimental scenarios
are intended to represent.
Base rates have been taken strongly into account also when problems that
are isomorphic with some that have shown base-rate neglect are explicitly
framed as games of chance (Ginossar & Trope, 1987). It appears too that base
rates can be made more salient through training people to pay attention to them
(J. J. Christensen-Szalanski & Beach, 1982), or by motivating them to do so by,
for example, having them—in the blue-green cab scenario—play the role of a
lawyer for one of the cab companies (Ginossar & Trope, 1987).
Gavanski and Hui (1992) found that people showed a high degree of sensi-
tivity to variations in base rates when the relevant sample spaces for judgments
corresponded to natural-kind categories, but not when the relevant sample
spaces were not natural-kind categories. Natural-kind categories are catego-
ries in terms of which people spontaneously partition the world and are to be
distinguished from categories that may be defined for special purposes but are
not commonly used by people in their everyday experience. Gluck and Bower
(1988) had participants classify individuals as having either a rare or a common
disease on the basis of a set of symptoms; classifications were sensitive
both to the rarity-commonality of the disease (base rates) and to the relative
likelihoods (likelihood ratio) of the symptoms given the disease.
Results from another study suggest that base rates are used in making estimates
of covariation significantly more if they relate to causal factors than if they relate to
noncausal factors, even though equally valid predictions can be made in both cases
(Ajzen, 1977). People who were told that students who spend more than 20 hours
per week studying have higher grade-point averages than students who spend less
than 20 hours used that base-rate information in evaluating information about
cases to predict grade-point averages; in contrast, people who were told that stu-
dents who lived more than 20 miles from campus had better grades than those who
lived closer placed less weight on information about distance of commute, even
though it presumably could have been as useful in making predictions. A causal
connection between study time and grade-point average is easy to imagine,
whereas one involving distance of home from campus is not so easy to see.
People also show some intuitive sensitivity to base rates in comments and
attitudes about low-probability events. Nisbett et al. (1983) point out, for ex-
ample, that the statement "I can't understand it; I have nine grandchildren and
all of them are boys" would not be perceived as strange by most people, but
substitute three for nine in that sentence and it might be. The student who fails a
test is likely to be less upset if three quarters of the class failed it than if nearly
everyone passed it. But granted that people sometimes show a sensitivity to the
relevance of base rates, they often do not, and it is this fact that many investiga-
tors feel requires explanation.
A practical problem in applying base rates in everyday reasoning is that of
figuring out what they are. Often it is very difficult, if not impossible, to do this,
or even to determine what would constitute a reasonable assumption in that re-
gard. Even if one can identify the populations the relative sizes of which are the
critical data, the relative sizes may not be known and what to assume about
them may not be clear. When relevant numbers are available, they may be am-
biguous. As we have already noted, for example, learning that a particular test
that is used for medical diagnosis is 90% accurate does not suffice, because
90% accuracy can have a variety of meanings not all of which imply the same
base rates. Moreover, one cannot always take reported base rates at face value.
It is easy to find examples of grossly inaccurate reports of base rates in public
media and, as J. R. Anderson (1990) points out, this fact might account, to
some degree, for the tendency of people in psychological experiments to take
those that they encounter in that context less seriously than they might.
Theorists do not agree regarding exactly how base rates should be taken into
account in estimating prior probabilities. Many appear to assume that prior
probabilities should be equated with base rates when the latter are known. But
others do not make this assumption (Levi, 1983; Okasha, 2000). Okasha makes
the important point that whether or not one uses base rates for prior probabili-
ties is not, strictly speaking, a Bayesian issue. Bayes's rule has to do with how
one should use data to modify prior probabilities; it does not specify how the
prior probabilities are to be determined.
Consensus Information
The failure of people to give due weight to base-rate information when making
judgments about category membership has a parallel in the way people tend to
discount consensus information when ascribing causes to human behavior. In
this case when asked to predict the behavior of individuals in specified situa-
tions, people sometimes tend to ignore available information regarding typical
behavior of people in those situations in the past. The finding is illustrated by
an experiment by Nisbett and Borgida (1975).
These investigators informed people of the results of two earlier experi-
ments, one by Nisbett and Schachter (1966) in which participants were asked
to take as much electric shock as they could stand, and another by Darley and
Latane (1968) in which participants witnessed what appeared to be a fellow
participant in an adjacent experimental cubicle having a seizure. Nisbett and
Borgida's subjects were also shown taped interviews with individual partici-
pants in the earlier experiments or were shown written descriptions of their
backgrounds and personalities and were asked to guess how the individuals be-
haved in the earlier experimental situations. Some of Nisbett and Borgida's
subjects were informed of the actual behavior of the participants in the earlier
experiment, as a group. The predictions of individual behavior were not
strongly influenced by the information about their aggregate behavior; those of
the subjects who had that information were similar to those of the subjects who
did not have it, and in neither case did they match the actual behavior very
closely. Also, the knowledge that most participants in the Darley and Latane
experiment had not offered to help the person who appeared to be having a sei-
zure did not make Nisbett and Borgida's subjects less likely to blame the indi-
vidual participants in the earlier experiment for their failure to help.
Nisbett et al. (1976) equate consensus information with base-rate infor-
mation, the difference being that consensus information is base-rate infor-
mation about behavior, rather than about category membership. Attribution
they see, however, as a more complicated and less direct inference than a
prediction. They contrast the difference between the two inferential pro-
cesses this way:
Kahneman and Tversky ask their subjects to produce a rather direct and uncom-
plicated chain of inference: "If the majority of the members of the population
belong to a particular category, then odds are the target case does also." Their
subjects fail to make such an inference. In the attribution research we have been
discussing, a still more elaborate chain of inference is requested: "If the majority
of the members of a population behave in a particular way, then the situation
must exert strong pressures toward that behavior, and therefore it is
unparsimonious to invoke personal idiosyncrasies to account for the behavior of
the target case if his behavior is modal." (p. 124)
Inasmuch as there is evidence, which Nisbett et al. review, that people fail to
apply behavioral base rates to predictions about target cases, the question of
why people ignore consensus information in making attributions reduces, in
their view, to the question of why they disregard base-rate information in gen-
eral. The topic of attribution—especially in the form of the question of how
people attribute effects to causes—has received a great deal of attention from
researchers; the role of base rates is but one aspect of this interest.
Gigerenzer (1994) makes the thought-provoking observation that although
psychologists have tended to take the neglect of base rates by participants in
psychological experiments as evidence of irrationality, experimenters them-
selves typically have ignored base rates in their own testing of hypotheses, and
have not seen the inconsistency in their behavior in this regard. A large major-
ity of statistical tests that are reported in the psychological literature—that
which deals with base-rate neglect included—are significance tests of the
Fisherian type, which do not take base rates into account.
Primacy Effects
Intuitively we would expect that, given n items of information that are relevant
to some conclusion that is to be drawn or decision that must be made, a rational
process for integrating those items of information should be insensitive to the
order in which they are considered. Bayes's rule has this property: n observa-
tions that are used to update a distribution of probabilities over a hypothesis set
will have the same combined effect independently of the order in which the ob-
servations are made.
The conclusions that people draw and the decisions they make often are not
insensitive to the order in which the data that helped determine those conclu-
sions or decisions were obtained (Pennington & Hastie, 1988; Walker,
Thibaut, & Andreoli, 1972). In particular, information that is acquired early in
the process often is given more weight than that acquired later. C. R. Peterson
and DuCharme (1967), for example, had people sample a sequence of colored
chips and estimate the probability that the sequence came from an urn with a
specified distribution of colors rather than from a second urn with a different
distribution. The sampling was arranged so that the first 30 trials favored one
urn and the second 30 favored the other so that after 60 trials the evidence was
equally strong (or equally weak) for both. People tended to favor the urn indi-
cated by the first 30 draws, which is to say that the evidence in the first 30 draws
was not counterbalanced by the evidence in the second 30 even though statisti-
cally it should have been. This is one illustration of what has been called a pri-
macy effect. There is other evidence that how data influence one's evaluation of
hypotheses can depend on when they are obtained in the hypothesis evaluation
process (Chenzoff et al., 1960; H. C. A. Dale, 1968).
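The normative expectation in the Peterson and DuCharme situation can be stated compactly. Assuming, for the sake of the sketch, that the two urns' color distributions are mirror images of one another, each draw favoring one urn multiplies the posterior odds by some likelihood ratio r, and each draw favoring the other multiplies them by 1/r, so after 30 draws of each kind

\[
\frac{P(H_1 \mid \text{data})}{P(H_2 \mid \text{data})}
= \frac{P(H_1)}{P(H_2)} \cdot r^{30} \cdot \left(\frac{1}{r}\right)^{30}
= \frac{P(H_1)}{P(H_2)},
\]

which is to say that the posterior odds should end up equal to the prior odds, favoring neither urn.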
Studies have revealed a number of related effects. They all might be consid-
ered primacy effects in the sense that information that is processed early—or,
perhaps more accurately, opinions that are formed from information that is
processed early—seem to be given inappropriately great weight. These effects
are discussed under such rubrics as persistence, commitment, and confirma-
tion bias, and they may not all have the same basis.
The primacy effect is of considerable practical importance as it relates to judi-
cial reasoning. Jurors are admonished not to form opinions as to the guilt or inno-
cence of a defendant until all of the evidence has been presented and they have
had a chance to deliberate about it. Complying with this admonition requires an
ability to repress any tendency to form an initial opinion on the basis of partial in-
formation. Whether jurors can do this effectively is a matter of some doubt (G. P.
Kramer, Kerr, & Carroll, 1990). Results of mock trial experiments indicate that
mock jurors often come to favor a particular verdict early in the trial process
(Devine & Ostrom, 1985; Weld & Danzig, 1940; Weld & Roff, 1938), and that
they typically do reach at least a tentative verdict in their own minds before the
deliberation process begins (Hastie, Penrod, & Pennington, 1983).
Additional Considerations
Most of the discussions of "cognitive strain" and "limited capacity" that are de-
rived from the study of problem solving and concept formation depict a person
as a computer that has the right programs but cannot execute them properly be-
cause its central processor is too small. The biases from availability and anchor-
ing certainly are congruent with this analogy. However, the misjudgment of
sampling variability and the errors of prediction illustrate more serious deficien-
cies. Here we see that people's judgments of important probabilistic phenomena
are not merely biased but are in violation of fundamental normative rules. Re-
turning to the computer analogy, it appears that people lack the correct programs
for many important judgmental tasks. (p. 173)
Pennington and Hastie (1993) give the following litany of the ways in which
human uncertainty assessment has been shown to be inconsistent with one or
more of the traditional probability calculi:
In most studies that claim to have demonstrated human errors, biases, and short-
comings, no argument is given to explain why the statistical rule is rational, nor
is rationality independently defined. (King, 1992, p. 179)
Any claim of irrationality presumes some standard with which the behavior that
is considered to be irrational is compared and found wanting. In the case of sta-
tistical or probabilistic thinking, a standard that has generally been applied in the
Western world, although with some waxing and waning of popularity over time
(Martignon & Laskey, 1999), is probability theory, and in particular, probability
theory as developed from the formulations of such 17th-century thinkers as
Pascal and Fermat; judgments or decisions are said to be irrational when they are
inconsistent with what "Pascalian" probability theory prescribes.
A forceful argument against the conclusion that people generally exhibit ir-
rationality in the sense of being at odds with probability theory has been made
by Gigerenzer (1991b, 1994). The argument rests on the claim that there is no
single normative theory of probability that is recognized as such by all mathe-
maticians who work in this area. There are rather several views of what proba-
bility means and of what the basic tenets of a normative theory should be, so the
worst that can be said about any particular judgment is that it is irrational ac-
cording to this or that theory of probability, but what is irrational according to
this theory may be exactly right according to that one.
Gigerenzer (1991b) argues that much of the psychological literature on cog-
nitive illusions and biases ignores this fact, that it assumes the existence of an
unequivocal normative theory, and that the theory it treats as definitive is not
considered as such by all major probability theorists:
What is called in the heuristics and biases literature the "normative theory of
probability" or the like is in fact a very narrow kind of neo-Bayesian view that is
shared by some theoretical economists and cognitive psychologists, and to a
lesser degree by practitioners in business, law, and artificial intelligence. It is not
shared by proponents of the frequentist view of probability that dominates to-
day's statistics departments, nor by proponents of many other views; it is not
even shared by all Bayesians.... By this narrow standard of "correct" probabilis-
tic reasoning, the most distinguished probabilists and statisticians of our cen-
tury—figures of the stature of Richard von Mises and Jerzy Neyman—would be
guilty of "biases" in probabilistic reasoning. (p. 87)
More generally, Gigerenzer et al. (1989) have been critical of the elevation
by psychologists of probability theory "as a mathematical codification of ratio-
nal belief and action in uncertain situations" (p. 226). They point out that, un-
like the 18th-century probabilists who had been willing to revise their
mathematics to fit better the dictates of common reason when they ran into the
St. Petersburg problem, "the twentieth-century psychologists had come so to
revere the mathematical theory of probability and statistics that they instead in-
sisted that common reason be reformed to fit the mathematics" (p. 226).
The objective of research on probabilistic reasoning should not be to attempt
to explain the difference between people's judgments and "normative" perfor-
mance as represented by a particular Bayesian prescription, Gigerenzer (1991b)
argues, but rather to explain people's judgments. More progress will be made, he
suggests, by viewing competing statistical theories as the bases of competing ex-
planatory models than by pretending that statistics speaks with one voice:
The history of probability theory, with all its changes in the interpretation of
probability, in the meaning of "descriptive" and "normative," ... should warn us
to be cautious in using one formal method as the norm, against which such judg-
ments are denigrated as irrational, independent of their content and context. A
rational mind may be more than the kind of intuitive statistician who mechani-
cally applies the same formula (be it expected utility, Bayes' theorem, or some
other) to all contents and in all contexts. But such a rational mind is much harder
to define. (Gigerenzer et al., 1989, p. 232)
It is one thing to establish one or more probabilistic interpretations for the calcu-
lus of chance, and quite another to show that the resultant theory applies to some
or all of the probability judgments that are made in everyday reasoning. In order
to discover what criteria of probability are appropriate for the evaluation of lay
reasoning we have to investigate what judgments of probability are intuitively
acceptable to lay adults and what rational constraints these judgments are sup-
posed to place on one another. (p. 319)
This is not to say that people cannot contradict themselves or make invalid
deductions. But when they do these things, the norms they violate, Cohen ar-
gues, can be discovered from their own intuitions.
L. J. Cohen (1979) rejects, in particular, the idea that Pascalian probability
theory is the only legitimate framework within which to reason about uncertain
predictions. He argues that one normative alternative to the Pascalian formula-
tion that is widely used in science can be traced to the writings of Francis Ba-
con. Cohen argues further that before one can legitimately classify an instance
of reasoning about probability as fallacious, one must know the theoretical
framework within which the reasoning is being done. What would be consid-
ered fallacious from a Pascalian perspective would not necessarily be seen as
fallacious from a Baconian point of view. Many of the alleged demonstrations
of human fallibility in probabilistic reasoning are unconvincing, Cohen ar-
gues, because they assume a particular perspective—the Pascalian one—that
people may not always use. Pascalian and Baconian judgments may differ
from one another in any particular circumstance, which is not to say that at
least one must be wrong. Neither, he contends, implies the falsity of the other.
The idea that two different judgments of probability regarding the same sit-
uation could both be right may be difficult for the reader to accept. I find it hard
to accept. Could we not settle the issue in favor of one or the other judgment—
or possibly against both of them—by doing an appropriate experiment? If the
probability of a specified outcome of an uncertain event is P_P and P_B, according
to the Pascalian and Baconian views, respectively, let us simply observe the
event a large number of times and determine the proportion of instances on
which the specified outcome occurs. But, at best, this works only for events of
the sort that can be observed a large number of times, and many of the events
about which we wish to reason probabilistically are not this type. Even when it
is possible, both in theory and in practice, to observe an event many times, what
it means to say that the probability of a specified outcome on a specific occur-
rence of that event is thus and so is a debatable question. When I say that I be-
lieve that the toss of a fair die will produce a three with probability 1/6, I am
sure I mean that I believe that three will come up on ap-
proximately 1/6 of a large number of tosses. But suppose that the die is to be
tossed just once. What I mean by saying that I believe the probability that it will
come up three on that toss is 1/6 is not quite so clear.
L. J. Cohen (1970, 1977) has developed the Baconian view of probability in
some detail. According to this view, probability has to do primarily with judg-
ments of the "inductive reliability of generalizations," and is focused on the in-
ductive establishment of causal laws. Cohen argues that the experimental
method of modern science is essentially Baconian in that it seeks to establish
the inductive reliability of generalizations, which we commonly refer to as sci-
entific laws. The Anglo-American legal system also endorses Baconian rea-
soning, in Cohen's view, by virtue of its commitment to adjudication on the
basis of the totality of the case-specific evidence that is before the court. No
court, he suggests, would convict a person in a civil suit solely on the grounds
that he belongs to a group over half of the members of which are known to have
committed the crime of which he is accused (base-rate information). He points
out too that if, in criminal cases, the injunction on jurors to assume a defendant
is innocent until proven guilty is interpreted as the assignment of zero as the
prior probability of guilt, then a guilty verdict within a Pascalian-Bayesian
framework is impossible, because in this system prior probabilities of zero in-
evitably produce posterior probabilities of zero, independently of what the
conditional probabilities are. None of this is to argue that Pascalian probabili-
ties cannot enter into forensic reasoning, but only that cases can seldom be de-
cided on the basis of them alone.
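The arithmetic behind the zero-prior point is immediate from Bayes's rule. Writing G for the hypothesis of guilt and E for the evidence,

\[
P(G \mid E) = \frac{P(E \mid G)\,P(G)}{P(E \mid G)\,P(G) + P(E \mid \neg G)\,P(\neg G)} = 0
\quad \text{whenever } P(G) = 0,
\]

whatever the values of the conditional probabilities P(E | G) and P(E | ¬G).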
Many of the experimental findings that have been interpreted as evidence
of fallacious reasoning—application of the representativeness heuristic, for
example—may be seen as fallacious, Cohen argues, if one assumes that rea-
soners were working within the framework of Pascalian probability but not if
they were approaching the problems from a Baconian perspective. As to
which perspective reasoners should use in specific instances, Cohen argues
that the charitable thing for experimenters to do is to assume that participants
in their experiments are at least as rational as themselves and to interpret their
performance on probabilistic reasoning tasks in whichever ways do not re-
quire the assumption that they are reasoning fallaciously: "One hypothesis
which seems to fit the available evidence is that people inexpert in statistical
theory tend to apply Baconian patterns of reasoning instead, and to apply
these correctly whenever they have an opportunity to make the probability in
question depend on the amount of inductively relevant evidence that is of-
fered" (L. J. Cohen, 1979, p. 397).
In short, Cohen argues that many uncertain situations can be assessed
from either a Pascalian or a Baconian point of view, that neither perspective
is always clearly more appropriate than the other, and that before one can le-
gitimately classify an instance of probabilistic reasoning as fallacious, one
must know which perspective the reasoner is taking. Cohen allows that
there are situations in which a particular view is correct and the alternative
is not—the Pascalian view is the only appropriate one, for example, when
the statement of a problem makes it clear that the probabilities involved are
equivalent to long-term percentages or relative frequencies—and he argues
that the evidence that people characteristically reason fallaciously in such
cases is not compelling, which is not to deny that certain types of fallacies
may be fairly common.
One of the more common criticisms of research focused on the identifica-
tion of common biases in probabilistic thinking, or on ways in which such
thinking commonly violates normative principles, is that the situations investi-
gated sometimes are contrived so as to ensure that the application of heuristics
or principles that are effective in everyday life will be inappropriate in the ex-
perimental context (Ayton et al., 1991a, 1991b; Goldman, 1986; Winkler &
Murphy, 1973). Moreover, what may appear to be an irrational bias in one con-
text may be functional in another. For example, the negative-recency bias (see
chap. 2) would be a useful bias to have in any context in which it is important to
be able to detect nonrandomness in sequences and in which either the most likely
form of departure from randomness is excessive repetition or an excess of
repetitions is more important to detect than an excess of alternations (Lopes,
1982). Navon (1978) notes ways in which the behavior that is typically consid-
ered evidence of excessive conservatism in Bayesian decision experiments can
be beneficial in real-world contexts.
A related criticism concerns the logic of null-hypothesis significance testing
itself, which can be seen by comparing two argument forms:

If H then not D
D
Therefore, not H

and

If H then probably not D
D
Therefore, probably not H

The first of these forms is a valid modus tollens argument. The second form
looks fine, but that it is not valid is easily seen by the following instantiation of it:

If a person is an American then the person is probably not a member of Congress
This person is a member of Congress
Therefore, this person is probably not an American

The second form is, however, essentially the logic of null-hypothesis testing as
it is commonly interpreted. According to the theory behind null-hypothesis
testing, if the null hypothesis is true, one is unlikely to obtain a p value as small
as .05, but the obtaining of a p as small as .05 is often taken as evidence that the
null hypothesis is unlikely to be true:

If the null hypothesis is true then p will probably not be as small as .05
p is as small as .05
Therefore, the null hypothesis is probably not true
Errors to which the gambler's fallacy would be expected to give rise appear
to be based, in some cases at least, on misconceptions about what a random se-
quence or a noncontingent relationship is (C. R. Peterson, 1980). Peterson ar-
gues that participants in psychological experiments sometimes fail to perceive
noncontingent events as noncontingent because the expectations they bring to
the experiment preclude the assumption that what is occurring is random.
When one does not make the assumption that the process that is determining
the outcome of an uncertain event is completely random, then it is questionable
whether what is usually deemed an instance of the gambler's fallacy is appro-
priately considered a fallacy (L. J. Cohen, 1982); moreover, in many experi-
mental situations what are represented to participants as random processes are
really not random.
Gilovich (1991) also argues that some of our reasoning difficulties stem
from the strong tendency we seem to have of perceiving even random events
and patterns as having some degree of structure: "Our difficulty in accurately
recognizing random arrangements of events can lead us to believe things that
are not true—to believe something is systematic, ordered, and 'real' when it is
really random, chaotic, and illusory" (p. 21). This tendency is exacerbated by
the fact that once we believe a phenomenon exists we are very good at invent-
ing reasons for why it should exist.
People untutored in probability theory may fail to understand that the
probabilities of a set of mutually exclusive possibilities cannot total more
than 1 or that the elimination of some of the alternatives from among such a
set has implications for the probabilities associated with the remaining alter-
natives. When told, in one study of statistical reasoning, that a certain one of a
few suspects in a murder mystery was not the culprit, participants tended to
fail to revise the probabilities assigned to the remaining suspects (L. B. Rob-
inson & Hastie, 1985).
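A minimal sketch with invented numbers shows the renormalization that the participants tended not to perform; the suspect labels and probabilities are hypothetical:

```python
# Hypothetical prior probabilities over three mutually exclusive and
# exhaustive suspects; they must sum to 1.
suspects = {"suspect_1": 0.5, "suspect_2": 0.3, "suspect_3": 0.2}

def eliminate(probs, cleared):
    """Drop a cleared suspect and renormalize so the rest again sum to 1."""
    remaining = {s: p for s, p in probs.items() if s != cleared}
    total = sum(remaining.values())  # 0.8 after clearing suspect_3
    return {s: p / total for s, p in remaining.items()}

print(eliminate(suspects, "suspect_3"))
# {'suspect_1': 0.625, 'suspect_2': 0.375}
```

Normatively, clearing the third suspect should raise each remaining probability by the factor 1/0.8; failing to revise at all leaves a distribution that no longer sums to 1.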
Sometimes it is difficult to tell whether inappropriate decisions based on the
probabilities of outcomes from random sampling are better attributed to faulty
intuitions regarding fundamental aspects of probability theory or to an inabil-
ity to compute or estimate what the probabilities of specific outcomes are. This
comment applies, for example, to the finding of L. J. Cohen and Chesnick
(1970) that some people prefer a lottery in which they draw 1 ticket from a pop-
ulation of 10 tickets to the opportunity to have as many as 20 draws (with re-
placement) from a population of 100 tickets.
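On one natural reading of the task (an interpretive assumption here, since the original statement of the problem is not reproduced), each lottery has a single winning ticket, and each of the 20 draws with replacement independently has 1 chance in 100 of producing it. The comparison is then straightforward:

\[
P(\text{win} \mid \text{1 draw from 10}) = \frac{1}{10} = .10, \qquad
P(\text{win} \mid \text{20 draws from 100}) = 1 - \left(\frac{99}{100}\right)^{20} \approx .18,
\]

so the option some people prefer is, on this reading, roughly half as likely to yield a win.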
Evans (1989) argues that what appears to be faulty reasoning under uncer-
tainty results in part from faulty intuitions, but more from a generally passive
approach to reasoning, whereby people tend to consider only what is explicitly
brought to their attention. Even correct intuitions, he suggests, are readily
overpowered by irrelevancies when the latter are made salient by the way in
which problems are posed to them. This view is compatible with Baron's
(1985, 1988) attribution of faulty reasoning to insufficient search. In both
cases, lack of effort—as distinct from lack of knowledge—is seen as a major
cause of poor reasoning. That many tasks used in psychologi-
cal experiments may elicit less-than-maximum effort from the participants is a
highly plausible possibility. That students who participate to satisfy a course
requirement, or to obtain extra credit, or even for cash payment will invariably
make a best effort on tasks in which they may have little or no intrinsic interest
is an assumption that is perhaps too easily made uncritically. This is not to sug-
gest that all experimental tasks fit this description for all participants, but cer-
tainly many do, and that suffices to make one cautious about generalizing the
results of many experiments to situations in which people are highly motivated
to do their best on tasks because they have a meaningful stake in the conse-
quences of their performance.
Although it is perhaps somewhat ironic to look to psychological experi-
ments for evidence of this assertion, a number of studies have shown that the
amount of effort participants devote to the analysis and interpretation of infor-
mation presented to them depends, at least in part, on the degree of their per-
sonal involvement with the issues to which the information pertains (Chaiken,
1980; Harkness, DeBono, & Borgida, 1985). Highly involved people appear to
process messages and arguments at a deeper level and more extensively than
less involved people, who are likely to be influenced by relatively superficial or
surface features (Borgida & Howard-Pitney, 1983; Petty & Cacioppo, 1979;
Petty, Cacioppo, & Goldman, 1981; Showers & Cantor, 1985).
As Harkness et al. (1985) point out, assuming some modest correlation be-
tween complexity of information-processing strategies and the quality of the
outcomes of those strategies in natural task environments, it makes sense for
people to use more complex strategies when they are more involved—when
they have more at stake. These investigators demonstrated a relationship be-
tween involvement and complexity of strategies used in a study in which data
about a potential dating partner were processed by participants, some of
whom were told they would actually be dating the individual several times
and some of whom were not given this expectation. Although the highly in-
volved participants in this study used more complex strategies than did the
less involved participants to analyze contingency data, the most common ap-
proach among them was not the normatively optimal one. Harkness et al. note
that the tendency for uninvolved participants to use easy and superficial strat-
egies can be seen as an indication of rationality—as evidence of mindfulness
in deciding when to be mindless: "Suboptimal performance on a trivial task
may be a marker of intellect" (p. 32).
In short, investigators have identified many possible bases of what appears to be
faulty reasoning under uncertainty. I say what "appears to be faulty reasoning" be-
cause some of the bases that have been suggested might better be considered
misperceptions, limitations of knowledge, or failures of motivation than faulty
reasoning per se. Causes that have been suggested include confusion about what
one has been asked to do in an experimental situation, insensitivity to such basic
distinctions as that between frequency and relative frequency, lack of understand-
ing of certain critical concepts such as those of randomness and contingency, fail-
ure to understand that the probabilities of an exhaustive set of mutually exclusive
possibilities must sum to 1, computational limitations, misapplication of heuristic
rules, and a tendency to approach problems too passively and hence to fail to make
the effort necessary to exploit one's reasoning capabilities fully.
Much of the research that has been reviewed in this book has focused on ways
in which probabilistic reasoning appears to go astray. But the judgment that
any particular instance of reasoning is faulty presupposes one has a standard
against which to make that judgment. As Larkey et al. (1989) put it, "Attrib-
uting error in reasoning about chance processes requires at the outset that you
know the correct model for the observations about which subjects are reason-
ing. Before you can identify errors in reasoning and explain those errors as the
product of a particular style of erroneous reasoning, you must first know the
correct reasoning. It is much easier to know the correct model in an experimen-
tal setting than in a natural setting" (p. 30).
A general conclusion that has been drawn from the work that has been con-
sidered here is that people's intuitions about probability and related con-
cepts—chance, randomness, covariation—are faulty. However, what to make
of this conclusion is called into question by the fact that highly knowledgeable
people disagree, rather strongly on occasion, as to what should be considered
fallacious and faulty and what should not (L. J. Cohen, 1979, 1980;
Gigerenzer, 1994; Kahneman & Tversky, 1979a).
What is clear is that there is no escaping appeal to intuition. When the intu-
itions that underlie the probabilistic reasoning of participants in experiments
are judged to be faulty by the experimenters, those judgments are themselves
based on other intuitions about probability—the experimenters' intuitions, in-
formed by one or another school of thought that represents the intuitions of
other scholars. And not all writers have the same intuitions in this regard.
We should not forget the essential role that intuition played in the develop-
ment of probability theory in the first place. As Daston (1987b) puts it: "Be-
cause its practitioners understood probability theory as a mathematical
codification of good sense, the right answers to these questions were those
that seconded the intuitions of reasonable men" (p. 298). And again, "When
mathematical results clashed with the practice of reasonable men, the eigh-
teenth-century probabilists consistently rearranged or modified the mathe-
matics to reconcile the two" (p. 298). Or, as Gigerenzer et al. (1989) put it,
probability theory was meant by its developers to capture, not to correct, rea-
sonable intuitions.
In chapter 1 of this book, we noted the way Pascal and Fermat sharpened
their respective intuitions in attempting to converge on a solution to the prob-
lem on which they were working that they would both consider correct. Daston
(1987b) refers to the St. Petersburg paradox to illustrate how intuitions could
be modified as a consequence of grappling with problems: "If the conventional
solution of the St. Petersburg problem ran counter to the judgment of reason-
able men, probabilists reexamined their definitions of expectation. In the
minds of eighteenth-century mathematicians, there did not exist any theory of
probabilities disembodied of subject matter" (p. 298).
So where does this leave us? To what do we as individuals appeal when
we try to judge the merits of the differing—and mutually incompatible—
points of view defended by the experts? We have no place to go but to our
own intuitions. We have no choice but to accept those arguments that we
find most intuitively compelling. But this is as it should be. If we are ratio-
nal creatures, we, each of us individually, and not the experts, are responsi-
ble for our individual beliefs.
This does not mean that all possible views are equally correct, or that dia-
logue and debate are pointless. What one finds intuitively compelling can
change as a consequence of the acquisition of information and exposure to new
views and arguments in their support. But it does mean that the only legitimate
defense that I can offer for holding a particular view at any given time is that I
find that view to be the most consistent with the evidence at my disposal as
judged by my own intuitions. If I adopt a belief about something of which I
have little direct knowledge because it represents what certain experts believe,
the decision to do so is my responsibility, and I must appeal to my intuitions—
hopefully informed by some evidence of the competence and integrity of the
experts in question—in making it.
We must state at once that Condorcet's work is excessively difficult; the diffi-
culty does not lie in the mathematical investigations, but in the expressions
which are employed to introduce these investigations and to state their results: it
is in many cases almost impossible to discover what Condorcet means to say.
The obscurity and self-contradiction are without any parallel, so far as our expe-
rience of mathematical works extends; some examples will be given in the
course of our analysis, but no amount of examples can convey an adequate im-
pression of the extent of the evils. We believe that the work has been very little
studied, for we have not observed any recognition of the repulsive peculiarities
by which it is so undesirably distinguished. (p. 352)
SUMMARY
12
Concluding Comments
Gambling, the drawing of lots, and other activities that today we associate
with chance go back to antiquity. Whether the ancients had a concept of chance
that was very close to what is generally meant by chance today is questionable.
The use of chance devices for purposes of divination suggests that users, at least
sometimes, believed the outcomes to be revealing of otherwise hidden truths:
Was the accused party guilty or not? Would the king's army prevail in battle?...
As a mathematical discipline, probability theory is a relative late-comer;
historians of mathematics usually mark its beginning in the middle of the 17th
century, with the work of Blaise Pascal, Pierre de Fermat, and some of their
contemporaries. It was axiomatized only early in the 20th century as a conse-
quence of the work, primarily, of Andrei Kolmogorov. Today well-formed
probability problems generally can be solved straightforwardly with the appli-
cation of the probability calculus, as laid out, for example, by William Feller
(1957, 1966). The usefulness of probability theory has been established be-
yond doubt in numerous contexts and its impact on the sciences, especially
during the 20th century, has been profound.
Despite these facts, the philosophical question of what probability and
closely related entities like chance and randomness really are remains a matter
of debate. All these terms have been given a variety of connotations, so being
explicit about what is intended in specific contexts can help avoid confusion
and misunderstanding. Whatever chance is, it is not the antithesis of lawful-
ness, as the predictability of chance events in the aggregate attests. Probability
paradoxes and dilemmas of various sorts provide interesting opportunities to
study probabilistic reasoning, and have sometimes played important roles in
the development of theories of choice behavior under uncertainty.
Statistics, which began as a strictly descriptive discipline—a means of dealing
with variability, especially in demographic data—evolved over time into an indis-
pensable tool for all the sciences, hard and soft. It is widely used for both descrip-
tive and inferential purposes and, like any powerful tool, is subject to misuses and
abuses of various sorts, either by intent or from an inadequate understanding of the
assumptions on which specific statistical procedures are based.
How good are people's untutored intuitions about probability, chance, ran-
domness, and closely related concepts? Certainly they are far from perfect.
There are numerous examples in the literature of probabilistic intuitions of
even prominent mathematicians that have proved to be wrong. And the evi-
dence that people in general often misjudge probabilities, sometimes system-
atically, is strong. A variety of errors of judgment involving probabilistic
situations that appear to be commonly made have been identified; the gam-
bler's fallacy in various guises, illusory correlations, the conjunction fallacy,
overconfidence in one's own judgments on probabilistic matters, and the con-
firmation bias are cases in point. Others could be mentioned. Confusions in-
volving conditional probabilities—failure to distinguish correctly what is
conditional on what—have often been reported.
It is also the case, however, that some experimental results that have been
widely taken as evidence of faulty probabilistic reasoning can be interpreted in
other ways. Probability problems can very easily be stated incompletely or am-
biguously. The incompleteness or ambiguousness may go unnoticed, and dif-
ferent people may unwittingly make different assumptions about the
statement's intended meaning—and arrive at different problem solutions.
Research on how people make choices under uncertainty has revealed fre-
quent departures from the predictions, or prescriptions, of normative models
that assume that choices are driven by the objective of maximizing expected
utility. Behavioral data have led to the development of alternative accounts of
how people make choices under uncertainty that emphasize the use of specific
heuristic strategies that work effectively in many contexts but that yield unde-
sirable consequences in others; precisely what those heuristic strategies are
and the conditions under which they are effective are matters of continuing re-
search. An unresolved issue is the extent to which many of the findings with
choice tasks in the laboratory generalize to real-world situations of interest.
In sum, as intuitive statisticians, people are neither at a total loss nor remark-
ably facile. They evidence some sound intuitions even without training, but
they also typically display a variety of misconceptions about probabilistic vari-
ables, and problems requiring a more than superficial understanding of proba-
bility theory are likely to cause significant difficulties. Training in probability
theory and mathematical statistics tends to improve probabilistic reason-
ing—it would be surprising and disheartening indeed if this were not so—but
there are probability problems that can be difficult even for experts, who some-
times disagree regarding what the solutions are. In the final analysis, we are all
pushed back to our intuitions, informed by training or not; we can accept only
what we find plausible. But a motivating premise of this book is that plausibil-
ity that rests on familiarity with the development of probabilistic thinking over
the centuries, with various perspectives that have been offered in recent times,
and with a body of experimental work on the topic is to be preferred to plausi-
bility that rests on uninformed intuition alone.
References
Allwood, C. M., & Granhag, P. A. (1996). Considering the knowledge you have: Effects on
realism in confidence judgements. European Journal of Cognitive Psychology, 8,
235-256.
Alpert, M., & Raiffa, H. (1982). A progress report on the training of probability assessors. In
D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics
and biases (pp. 294-305). New York: Cambridge University Press.
Alvarez, L. W. (1965). A pseudo experience in parapsychology [Letter to the Editor]. Sci-
ence, 148, 1541.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anderson, N. H. (1964). Test of a model for number-averaging behavior. Psychonomic Sci-
ence, 1, 189-190.
Anderson, N. H. (1979). Algebraic rules in psychological measurement. American Scien-
tist, 67(5), 555-563.
Apostolakis, G. (1990). The concept of probability in safety assessments of technological
systems. Science, 250, 1359-1364.
Arbuthnot, J. (1710). An argument for divine providence, taken from the constant regular-
ity observ'd in the births of both sexes. Philosophical Transactions of the Royal Society,
27, 186-190.
Arkes, H. R., Christensen, C., Lai, C., & Blumer, C. (1987). Two methods of reducing over-
confidence. Organizational Behavior and Human Decision Processes, 39, 133-144.
Arkes, H. R., Dawes, R. M., & Christensen, C. (1986). Factors influencing the use of a deci-
sion rule in a probabilistic task. Organizational Behavior and Human Decision Pro-
cesses, 37, 93-110.
Arkes, H. R., & Hammond, K. R. (Eds.). (1986). Judgment and decision making: An inter-
disciplinary reader. New York: Cambridge University Press.
Arkes, H. R., & Harkness, A. R. (1983). Estimates of contingency between two dichoto-
mous variables. Journal of Experimental Psychology: General, 112, 117-135.
Arrow, K. J. (1963). Social choice and individual values (2nd ed.). New York: Wiley.
Arrow, K. J. (1971). Essays in the theory of risk-bearing. Chicago: Markham.
Arrow, K. J. (1990). Values and collective decision making. In P. K. Moser (Ed.), Rational-
ity in action: Contemporary approaches (pp. 337-353). New York: Cambridge Univer-
sity Press.
Attneave, F. (1953). Psychological probability as a function of experienced frequency.
Journal of Experimental Psychology, 46, 81-86.
Axelrod, R. (1980a). Effective choice in the prisoner's dilemma. Journal of Conflict Reso-
lution, 24, 3-25.
Axelrod, R. (1980b). More effective choice in the prisoner's dilemma. Journal of Conflict
Resolution, 24, 379-403.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211,
1390-1396.
Ayala, F. J. (1978). The mechanisms of evolution. Scientific American, 239(3), 56-69.
Ayer, A. J. (1965). Chance. Scientific American, 213(4), 44-54.
Ayton, P., Hunt, A. J., & Wright, G. (1991a). Psychological conceptions of randomness.
Journal of Behavioral Decision Making, 2, 221-238.
Ayton, P., Hunt, A. J., & Wright, G. (1991b). Randomness and reality: Reply to comments
on "Psychological conceptions of randomness." Journal of Behavioral Decision
Making, 2, 222-226.
Babad, E., & Katz, Y. (1992). Wishful thinking—Against all odds. Journal of Applied So-
cial Psychology, 21, 1921-1938.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulle-
tin, 66, 1-29.
Baker, J. D., McKendry, J. M., & Mace, D. J. (1968). Certitude judgments in an operational
environment (Tech. Res. Note No. 200). Alexandria, VA: U.S. Army Research Institute
for the Behavioral and Social Sciences.
Ball, W. W. R., & Coxeter, H. S. M. (1987). Mathematical recreations and essays (13th
ed.). New York: Dover. (First edition authored by Ball and published 1892)
Barber, B. (1961). Resistance by scientists to scientific discovery. Science, 134, 596-602.
Bar-Hillel, M. A. (1973). On the subjective probability of compound events. Organiza-
tional Behavior and Human Performance, 9, 396-406.
Bar-Hillel, M. A. (1974). Similarity and probability. Organizational Behavior and Human
Performance, 11, 277-282.
Bar-Hillel, M. A. (1979). The role of sample size in sample evaluation. Organizational Be-
havior and Human Performance, 24, 245-257.
Bar-Hillel, M. A. (1980). The base-rate fallacy in probability judgments. Acta
Psychologica, 44, 211-233.
Bar-Hillel, M. A. (1983). The base-rate fallacy controversy. In R. W. Scholtz (Ed.), Deci-
sion making under uncertainty. Amsterdam: North Holland.
Bar-Hillel, M. A. (1984). Representativeness and fallacies of probability judgment. Acta
Psychologica, 55, 91-107.
Bar-Hillel, M. A., & Budescu, D. (1995). The elusive wishful-thinking effect. Thinking and
Reasoning, 1, 71-103.
Bar-Hillel, M. A., & Falk, R. (1982). Some teasers concerning conditional probabilities.
Cognition, 11, 109-122.
Bar-Hillel, M., & Fischhoff, B. (1981). When do base rates affect predictions? Journal of
Personality and Social Psychology, 41, 671-680.
Bar-Hillel, M. A., & Wagenaar, W. A. (1993). The perception of randomness. In G. Keren
& C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Method-
ological issues (pp. 369-393). Hillsdale, NJ: Lawrence Erlbaum Associates.
Baril, G. L., & Cannon, J. T. (1995). What is the probability that null hypothesis testing is
meaningless? American Psychologist, 50, 1098-1099.
Baron, J. (1985). Rationality and intelligence. New York: Cambridge University Press.
Baron, J. (1988). Thinking and deciding. New York: Cambridge University Press.
Baron, J. (1998). Judgment misguided: Intuition and error in public decision making. New
York: Oxford University Press.
Barrow, J. D. (1990). The world within the world. New York: Oxford University Press.
Barrow, J. D. (1991). Theories of everything: The quest for ultimate explanation. New
York: Oxford University Press.
Barrow, J. D. (1992). Pi in the sky: Counting, thinking, and being. New York: Oxford Uni-
versity Press.
Barrow, J. D. (1998). Impossibility: The limits of science and the science of limits. New
York: Oxford University Press.
Barrow, J. D., & Silk, J. (1983). The left hand of creation. New York: Basic Books.
Barrow, J. D., & Tipler, F. J. (1988). The anthropic cosmological principle. New York: Ox-
ford University Press.
Bartecchi, C. E., MacKenzie, T. D., & Schrier, R. W. (1995). The global tobacco epidemic.
Scientific American, 272(5), 44-51.
Baum, A., Fleming, R., & Singer, J. E. (1983). Coping with victimization by technological
disaster. Journal of Social Issues, 39, 117-138.
Bauman, L. J., & Siegel, J. (1987). Misperception among gay men of the risk of AIDS.
Journal of Applied Social Psychology, 17, 329-350.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philo-
sophical Transactions of the Royal Society, 53, 370-418. (Reprinted in G. A. Barnard
(1958), Studies in the history of probability and statistics. Biometrika, 45, 293-315)
Beach, L. R. (1966). Accuracy and consistency in the revision of subjective probabilities.
IEEE Transactions on Human Factors in Electronics, HFE-7, 29-37.
Beach, L. R., & Peterson, C. R. (1966). Subjective probabilities for unions of events.
Psychonomic Science, 5, 307-308.
Beach, L. R., & Scopp, T. S. (1966). Inferences about correlations. Psychonomic Science,
6, 253-254.
Beach, L. R., & Scopp, T. S. (1968). Intuitive statistical inferences about variances. Organi-
zational Behavior and Human Performance, 3, 109-123.
Beach, L. R., Smith, B., Lundell, J., & Mitchell, T. R. (1988). Image theory: Descriptive
sufficiency of a simple rule for the compatibility test. Journal of Behavioral Decision
Making, 1, 17-28.
Beach, L. R., & Swensson, R. G. (1966). Intuitive estimation of means. Psychonomic Sci-
ence, 5, 161-162.
Beatty, J., Cartwright, N., Coleman, W., Gigerenzer, G., & Morgan, M. S. (1987). Introduc-
tion to volume 2. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic
revolution: Vol. 2. Ideas in the sciences (pp. 1-4). Cambridge, MA: MIT Press.
Becker, G. M., DeGroot, M. H., & Marschak, J. (1963). Stochastic models of choice behav-
ior. Behavioral Science, 8, 41-55.
Bell, E. T. (1937). Men of mathematics: The lives and achievements of the great mathemati-
cians from Zeno to Poincare. New York: Dover.
Bell, P. A., Fisher, J. D., Baum, A., & Greene, T. C. (1990). Environmental psychology. Fort
Worth, TX: Holt, Rinehart & Winston.
Beltrami, E. (1999). What is random? Chance and order in mathematics and life. New
York: Springer-Verlag.
Bem, D. J., Wallach, M. A., & Kogan, N. (1965). Group decision making under risk of
aversive consequences. Journal of Personality and Social Psychology, 1, 453-460.
Bennett, D. J. (1998). Randomness. Cambridge, MA: Harvard University Press.
Berg, J., Dickhaut, J., & McCabe, K. (1995). Trust, reciprocity, and social history. Games
and Economic Behavior, 10, 122-142.
Berg, J., Dickhaut, J., & O'Brien, J. (1985). Preference reversal and arbitrage. In V. Smith
(Ed.), Research in experimental economics (pp. 31-72). Greenwich, CT: JAI.
Berger, J. O. (1997). Statistical decision theory and Bayesian analysis, New York:
Springer-Verlag.
Berger, J. O., & Berry, D. A. (1988). Statistical analysis and the illusion of objectivity.
American Scientist, 76, 159-165.
Berkeley, D., & Humphreys, P. (1982). Structuring decision problems and the "bias heuris-
tic." Acta Psychologica, 50, 201-252.
Berl, J., Lewis, G., & Morrison, R. S. (1976). Applying models of choice to the problem of
college selection. In J. S. Carroll & J. W. Payne (Eds.), Cognition and social behavior
(pp. 203-220). Hillsdale, NJ: Lawrence Erlbaum Associates.
Berlinski, D. (2000). Newton's gift: How Sir Isaac Newton unlocked the system of the
world. New York: Simon & Schuster.
Bernoulli, D. (1738). Specimen theoriae novae de mensura sortis. Commentarii
Academiae Scientiarum Imperialis Petropolitanae, 5, 175-192. (Translation pub-
lished in 1954 by L. Sommer in Econometrica, 22, 23-26)
Bernoulli, J. (1713). Ars conjectandi [The art of conjecture]. Basel, Switzerland:
Bernstein, P. L. (1996). Against the gods: The remarkable story of risk. New York: Wiley.
Berretty, P. M., Todd, P. M., & Martignon, L. (1999). Categorization by elimination: Using
few cues to choose. In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make
us smart (pp. 235-254). New York: Oxford University Press.
Berry, R. J. (1988). God and evolution: Creation, evolution and the Bible. London: Hodder
& Stoughton.
Bertrand, J. (1889). Calcul des probabilités [Calculus of probabilities]. Paris:
Gauthier-Villars et Fils.
Beyth-Marom, R. (1982). Perception of correlation reexamined. Memory and Cognition,
10, 511-519.
Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). Sex bias in graduate admissions:
Data from Berkeley. Science, 187, 398-404.
Billings, R. S., & Schaalman, M. L. (1980). Administrators' estimation of the probability of
outcomes of school desegregation: A field test of the availability heuristic. Organiza-
tional Behavior and Human Performance, 26, 97-114.
Binder, A. (1963). Further considerations on testing the null hypothesis and the strategy
and tactics of investigating theoretical models. Psychological Review, 70, 101-109.
Birnbaum, M. H. (1983). Base rates in Bayesian inference: Signal detection analysis of the
cab problem. American Journal of Psychology, 96, 85-94.
Black, D. (1958). The theory of committees and elections. Cambridge, England: Cam-
bridge University Press.
Blackburn, S. (1981). Rational animal? Commentary on Cohen, 1981. The Behavioral and
Brain Sciences, 4, 331-332.
Blair, D. C., & Maron, M. E. (1985). An evaluation of retrieval effectiveness for a full-text
document retrieval system. Communications of the ACM, 28, 289-299.
Blyth, C. R. (1972a). On Simpson's paradox and the sure-thing principle. Journal of the
American Statistical Association, 67, 364-366.
Blyth, C. R. (1972b). Rejoinder. Journal of the American Statistical Association, 67, 379-381.
Blyth, C. (1972c). Some probability paradoxes in choice among random alternatives. Jour-
nal of the American Statistical Association, 67, 366-373.
Blythe, P. W., Todd, P. M., & Miller, G. F. (1999). How motion reveals intention: Catego-
rizing social interactions. In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that
make us smart (pp. 257-285). New York: Oxford University Press.
Bolin, B., & Doos, B. R. (1986). The greenhouse effect: Climatic change and ecosystems.
New York: Wiley.
Boorstin, D. J. (1985). The discoverers: A history of man's search to know his world and
himself. New York: Vintage Books.
Borges, B., Goldstein, D. G., Ortmann, A., & Gigerenzer, G. (1999). Can ignorance beat
the stock market? In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us
smart (pp. 59-72). New York: Oxford University Press.
Borgida, E., & Brekke, N. (1981). New directions in attribution research (Vol. 5). Hillsdale,
NJ: Lawrence Erlbaum Associates.
Borgida, E., & Howard-Pitney, B. (1983). Personal involvement and the robustness of per-
ceptual salience effects. Journal of Personality and Social Psychology, 45, 560-570.
Bower, G. H. (1995). Emotion and social judgments. Washington, DC: Federation of Be-
havioral, Psychological and Cognitive Sciences.
Box, J. F. (1978). R. A. Fisher: The life of a scientist. New York: Wiley.
Boyd, R. (1988). Is the repeated prisoner's dilemma a good model of reciprocal altruism?
Ethology and Sociobiology, 9, 211-222.
Bradbury, J. A. (1989). The policy implications of differing concepts of risk. Science, Tech-
nology, and Human Values, 14, 380-399.
Bradley, J. V. (1981). Overconfidence in ignorant experts. Bulletin of the Psychonomic So-
ciety, 17, 82-84.
Braithwaite, R. B. (1974). The predictionist justification of induction. In R. Swinburne
(Ed.), The justification of induction (pp. 102-126). London: Oxford University Press.
(Original work published 1953)
Brams, S. J. (1976). Paradoxes in politics. New York: The Free Press.
Brams, S. J., & Fishburn, P. C. (1983). Approval voting. Boston: Birkhauser.
Brekke, N., & Borgida, E. (1988). Expert psychological testimony in rape trials: A so-
cial-cognitive analysis. Journal of Personality and Social Psychology, 55, 372-386.
Brenner, M. W. (1973). The next-in-line effect. Journal of Verbal Learning and Verbal Be-
havior, 12, 320-323.
Brenner, M. W. (1976). Memory and interpersonal relations. Unpublished doctoral disser-
tation, University of Michigan, Ann Arbor, MI.
Brewer, J. K. (1985). Behavioral statistics textbooks: Source of myths and misconcep-
tions? Journal of Educational Statistics, 10, 252-268.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the "Journal
of Educational Measurement." Journal of Educational Measurement, 10, 71-74.
Brody, C. J. (1984). Differences by sex in support for nuclear power. Social Forces, 63,
209-228.
Brown, R. V., Kahr, A. S., & Peterson, C. R. (1974). Decision analysis for the manager.
New York: Holt, Rinehart & Winston.
Browne, M. W., & Cudeck, R. C. (1992). Alternative ways of assessing model fit. Sociolog-
ical Methods and Research, 21, 230-258.
Brubaker, E. R. (1975). Free ride, free revelation, or the golden rule? Journal of Law and
Economics, 18, 147-161.
Buchanan, J. M. (1978). Cost and choice: An inquiry in economic theory. Chicago: Univer-
sity of Chicago Press.
Buckle, H. T. (1857-1861). History of civilization in England (2 vols.). London: J. W. Parker.
Budescu, D. V. (1985). Analysis of dichotomous variables in the presence of serial depend-
ence. Psychological Bulletin, 97, 547-561.
Budescu, D. V., & Bruderman, M. (1995). The relationship between the illusion of control
and the desirability bias. Journal of Behavioral Decision Making, 8, 109-125.
Buehler, R., Griffin, D., & McDonald, H. (1997). The role of motivated reasoning in opti-
mistic time predictions. Personality and Social Psychology Bulletin, 23, 238-247.
Buehler, R., Griffin, D., & Ross, M. (1994). Exploring the "planning fallacy": Why people
underestimate their task completion times. Journal of Personality and Social Psychol-
ogy, 67, 366-381.
Buehler, R., Griffin, D., & Ross, M. (1995). It's about time: Optimistic predictions in love
and work. In W. Stroebe & M. Hewstone (Eds.), The European Review of Social Psy-
chology (Vol. 6, pp. 1-32). Chichester, England: Wiley.
Buffon, G. L. L. (1777). Essai d'arithmétique morale [Essay on moral arithmetic].
Supplément à l'histoire naturelle (Vol. 44).
Bunge, M. (1996). In praise of intolerance to charlatanism in academia. In P. R. Gross, N.
Levitt, & M. W. Lewis (Eds.), The flight from reason (pp. 96-115). New York: New
York Academy of Sciences.
Burke, J. G. (1966). Bursting boilers and the federal power. Technology and Culture, 7,
1-23.
Burton, I., Kates, R. W, & White, G. R. (1978). The environment as hazard. New York: Ox-
ford University Press.
Busemeyer, J. R. (1990). Intuitive statistical estimation. In N. H. Anderson (Ed.), Contribu-
tions to information integration theory (Vol. 1, pp. 187-215). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Butler, C. (1970). Number symbolism. London: Routledge & Kegan Paul.
Byram, S. J. (1997). Cognitive and motivational factors influencing time prediction. Jour-
nal of Experimental Psychology: Applied, 3, 216-239.
Camerer, C. (1988). Illusory correlations in perceptions and predictions of organizational
traits. Journal of Behavioral Decision Making, 1, 77-94.
Camerer, C. (1990). Do markets correct biases in probability judgment? Evidence from
market experiments. In L. Green & J. H. Kagel (Eds.), Advances in behavioral econom-
ics (Vol. 2, pp. 126-172). Norwood, NJ: Ablex.
Camerer, C., & Thaler, R. H. (1995). Anomalies: Ultimatums, dictators and manners. Jour-
nal of Economic Perspectives, 9, 209-219.
Carlson, B. W. (1990). Anchoring and adjustment in judgments under risk. Journal of Ex-
perimental Psychology: Learning, Memory, and Cognition, 16, 655-676.
Carnap, R. (1953). What is probability? Scientific American, 189(3), 128-138.
Carnap, R. (1962). Logical foundations of probability (2nd ed.). Chicago: University of
Chicago Press. (Original work published 1950)
Carroll, J. B. (1971). Measurement properties of subjective magnitude estimates of word
frequency. Journal of Verbal Learning and Verbal Behavior, 10, 722-729.
Carter, B. (1974). Large number coincidences and the anthropic principle in cosmology. In
M. S. Longair (Ed.), Confrontation of cosmological theories with observational data
(pp. 291-298). Dordrecht, Netherlands: Reidel.
Carver, R. (1978). The case against statistical significance testing. Harvard Educational
Review, 48, 378-399.
Carver, R. (1993). The case against statistical significance testing, revisited. Journal of Ex-
perimental Education, 61, 287-292.
Cassells, W., Schoenberger, A., & Grayboys, T. B. (1978). Interpretation by physicians of
clinical laboratory results. New England Journal of Medicine, 299, 999.
Catania, A. C., & Cutts, D. (1963). Experimental control of superstitious responding in hu-
mans. Journal of the Experimental Analysis of Behavior, 6, 203-208.
Cervone, D., & Peake, P. K. (1986). Anchoring, efficacy, and action: The influence of judg-
mental heuristics on self-efficacy judgments and behavior. Journal of Personality and
Social Psychology, 50, 492-501.
Chaiken, S. (1980). Heuristic versus systematic information processing and the use of source ver-
sus message cues in persuasion. Journal of Personality and Social Psychology, 39, 752-766.
Chaitin, G. J. (1975). Randomness and mathematical proof. Scientific American, 232(5), 47-52.
Chapman, G. B., & Bornstein, B. H. (1996). The more you ask for, the more you get: An-
choring in personal injury verdicts. Applied Cognitive Psychology, 10, 519-540.
Chapman, G. B., & Johnson, E. J. (1994). The limits of anchoring. Journal of Behavioral
Decision Making, 7, 223-242.
Chapman, L. J. (1967). Illusory correlation in observational report. Journal of Verbal
Learning and Verbal Behavior, 6, 151-155.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous
psychodiagnostic observations. Journal of Abnormal Psychology, 72, 193-204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of
valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271-280.
Chenzoff, A., Crittenden, R. L., Flores, I., & Tolcott, M. A. (1960). Human decision making as related to air surveillance systems: A survey of literature and current research (Tech. Rep. No. 1, AFCCDD-TR-60-25). U.S. Air Force.
Chow, S. L. (1987). Experimental psychology: Rationale, procedures and issues. Calgary,
Alberta, Canada: Detselig Enterprises.
Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103, 105-110.
Chow, S. L. (1989). Significance tests and deduction: Reply to Folger (1989). Psychological Bulletin, 106, 161-165.
Chow, S. L. (1991). Conceptual rigor versus practical impact. Theory and Psychology, 1,
337-360.
Chow, S. L. (1996). Statistical significance: Rationale, validity, and utility. Beverly Hills,
CA: Sage.
Chow, S. L. (1998a). Précis of statistical significance: Rationale, validity, and utility. Be-
havioral and Brain Sciences, 21, 169-239 (including commentary).
Chow, S. L. (1998b). What statistical significance means. Theory and Psychology, 8,
323-330.
Christensen-Szalanski, J. J., & Beach, L. R. (1982). Experience and the base rate fallacy.
Organizational Behavior and Human Performance, 29, 270-278.
Christensen-Szalanski, J. J. J., Beck, D. E., Christensen-Szalanski, C. M., & Koepsell, T. D. (1983). Effects of expertise and experience on risk judgments. Journal of Applied Psychology, 68, 278-283.
Christensen-Szalanski, J. J., & Bushyhead, J. B. (1981). Physicians' use of probabilistic in-
formation in a real clinical setting. Journal of Experimental Psychology: Human Per-
ception and Performance, 7, 928-935.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American
Psychologist, 30, 116-127.
Clark, H. H., & Clark, E. V. (1977). Psychology and language. New York: Harcourt Brace.
Clark, R. D. (1971). Group-induced shift toward risk: A critical appraisal. Psychological
Bulletin, 76, 251-270.
Clarke, R. D. (1946). An application of the Poisson distribution. Journal of the Institute of
Actuaries (London), 72, 72.
Cohen, B., & Lee, I. (1979). A catalog of risks. Health Physics, 36, 707-722.
Cohen, I. B. (1987). Scientific revolutions, revolutions in science, and a probabilistic revolution 1800-1930. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp. 23-44). Cambridge, MA: MIT Press.
Cohen, J. (1957). Subjective probability. Scientific American, 197(5), 128-138.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1972). Psychological probability: Or the art of doubt. London: George Allen & Unwin.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Cohen, J., Dearnaley, E. J., & Hansel, C. E. M. (1958). Skill with chance: Variations in estimates of skill with an increasing element of chance. British Journal of Psychology, 49, 319-323.
Cohen, L. J. (1970). The implications of induction. London: Methuen.
Cohen, L. J. (1977). The probable and the provable. Oxford, England: Clarendon.
Cohen, L. J. (1979). On the psychology of prediction: Whose is the fallacy? Cognition, 7,
385-407.
Cohen, L. J. (1980). Whose is the fallacy? A rejoinder to Daniel Kahneman and Amos
Tversky. Cognition, 8, 89-92.
Cohen, L. J. (1981). Can human irrationality be experimentally demonstrated? Behavioral
and Brain Sciences, 4, 317-331.
Cohen, L. J. (1982). Are people programmed to commit fallacies? Further thoughts about the interpretation of data on judgment. Journal of the Theory of Social Behavior, 12, 251-274.
Cohen, L. J., & Chesnick, E. I. (1970). The doctrine of psychological chances. British Jour-
nal of Psychology, 61, 323-334.
Cohen, L. J., Chesnick, E. I., & Haran, D. (1971). Evaluation of compound probabilities in
sequential choice. Nature, 232, 214-216.
Cohen, L. J., Chesnick, E. I., & Haran, D. (1972). A confirmation of the inertial-ψ effect in sequential choice and decision. British Journal of Psychology, 63, 41-46.
Cohen, M. D., & March, J. G. (1974). Leadership and ambiguity: The American college president. New York: McGraw-Hill.
Cohen, M. R., & Nagel, E. (1934). An introduction to logic and scientific method. London:
Routledge & Kegan Paul.
Cole, K. C. (1997). The universe and the teacup: The mathematics of truth and beauty. New
York: Harcourt Brace.
Coleman, W. (1987). Experimental physiology and statistical inference: The therapeutic
trial in nineteenth-century Germany. In L. Krüger, G. Gigerenzer, & M. S. Morgan
(Eds.), The probabilistic revolution: Vol. 2. Ideas in the sciences (pp. 201-226). Cam-
bridge, MA: MIT Press.
College Board. (1976-1977). Student descriptive questionnaire. Princeton, NJ: Educa-
tional Testing Service.
Combs, B., & Slovic, P. (1979). Causes of death: Biased newspaper coverage and biased
judgments. Journalism Quarterly, 56, 837-843.
Compendium of curious coincidences: Parallels in the lives and deaths of A. Lincoln and J.
F. Kennedy. (1964, August 21). Time, 84, 19.
Condorcet, M. (1785). Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix [Essay on the application of analysis to the probability of decisions rendered by a plurality of votes]. Paris: De l'Imprimerie Royale.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Bezembinder, T. G., & Goode, F. M. (1967). Testing expectation theories of
decision making without utility or subjective probability. Journal of Mathematical Psy-
chology, 4, 72-103.
Cooper, A. C., Woo, C. Y., & Dunkelberg, W. C. (1988). Entrepreneurs' perceived chances
for success. Journal of Business Venturing, 3, 97-108.
Copi, I. M. (1968). Introduction to logic (3rd ed.). New York: Macmillan.
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing.
Psychological Methods, 2, 161-172.
Cosmides, L., & Tooby, J. (1990, August). Is the mind a frequentist? Paper presented at the
second annual meeting of the Human Behavior and Evolution Society, Los Angeles.
Cournot, A. A. (1843). Exposition de la théorie des chances et des probabilités [Exposition
of the theory of chances and probabilities]. Paris: Hachette.
Covello, V. T., & Johnson, B. B. (1987). The social and cultural construction of risk: Issues,
methods, and case studies. In B. B. Johnson & V. T. Covello (Eds.), The social and cul-
tural construction of risk (pp. vii-xiii). Dordrecht, Netherlands: D. Reidel.
Cox, D. R. (1977). The role of significance tests. Scandinavian Journal of Statistics, 4,
49-70.
Cramer, G. (1728). Letter to N. Bernoulli, published by D. Bernoulli (1738).
Crocker, J. (1981). Judgment of covariation by social perceivers. Psychological Bulletin,
90, 272-292.
Crocker, J. (1982). Biased questions in judgment of covariation studies. Personality and
Social Psychology Bulletin, 8, 214-220.
Crouch, E. A. C., & Wilson, R. (1982). Risk/benefit analysis. Cambridge, MA: Ballinger.
Crowe, B. (1969). The tragedy of the commons revisited. Science, 166, 1103-1107.
Cull, W. L., & Zechmeister, E. B. (1994). The learning ability paradox in adult
metamemory research: Where are the metamemory differences between good and poor
learners? Memory and Cognition, 22, 249-257.
Cvetkovich, G., & Earle, T. C. (Eds.). (1992). Public responses to environmental hazards.
Journal of Social Issues, 48(4), 1-187.
Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are simple heuristics?
In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp.
97-118). New York: Oxford University Press.
Dale, A. I. (1974). On a problem in conditional probability. Philosophy of Science, 41,
204-206.
Dale, H. C. A. (1960). A study of subjective probability. The British Journal of Statistical
Psychology, 13 (Pt. 1), 19-29.
Dale, H. C. A. (1968). Weighing evidence: An attempt to assess the efficiency of the human
operator. Ergonomics, 11, 215-230.
Damas, P. A., Goodman, B. C., & Peterson, C. R. (1972). Bayes theorem: Response scales
and feedback (University of Michigan Project No. 197-014). Ann Arbor, MI.
Darley, J. M., & Latané, B. (1968). Bystander intervention in emergencies: Diffusion of re-
sponsibility. Journal of Personality and Social Psychology, 8, 377-383.
Daston, L. J. (1987a). The domestication of risk: Mathematical probability and insurance
1650-1830. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic
revolution: Vol. 1. Ideas in history (pp. 238-260). Cambridge, MA: MIT Press.
Daston, L. J. (1987b). Rational individuals versus laws of society: From probability to sta-
tistics. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic revolu-
tion: Vol. 1. Ideas in history (pp. 295-304). Cambridge, MA: MIT Press.
Daston, L. J. (1988). Classical probability in the enlightenment. Princeton, NJ: Princeton
University Press.
David, F. N. (1962). Games, gods, and gambling: The origins and history of probability.
New York: Hafner.
Davies, P. C. W. (1982). The accidental universe. New York: Cambridge University Press.
Davies, P. C. W. (1983). God and the new physics. New York: Simon & Schuster.
Davies, P. (1988). The cosmic blueprint: New discoveries in nature's creative ability to or-
der the universe. New York: Simon & Schuster.
Davies, P. C. W. (1992). The mind of God: The scientific basis for a rational world. New
York: Simon & Schuster.
Davis, H. L., Hoch, S. J., & Ragsdale, E. E. (1986). An anchoring and adjustment model of
spousal predictions. Journal of Consumer Research, 13, 25-37.
Davis, J. N., & Todd, P. M. (1999). Parental investment by simple decision rules. In G.
Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp. 309-324).
New York: Oxford University Press.
Davis, P. J., & Hersh, R. (1981). The mathematical experience. Boston: Houghton Mifflin.
Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles
of human decision making. American Psychologist, 26, 180-188.
Dawes, R. (1976). Shallow psychology. In J. S. Carroll & J. W. Payne (Eds.), Cognition and
social behavior (pp. 3-12). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making.
American Psychologist, 34, 571-582.
Dawes, R. M. (1988). Rational choice in an uncertain world. New York: Harcourt Brace.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision-making. Psychological
Bulletin, 81, 95-106.
de Dombal, F. T., Leaper, D. J., Horrocks, J. C., Staniland, J. R., & McCann, A. P. (1974).
Human and computer-aided diagnosis of abdominal pain: Further report with emphasis
on performance of clinicians. British Medical Journal, 1, 376-380.
de Finetti, B. (1962). Does it make sense to speak of "good probability appraisers"? In I. J.
Good, A. J. Mayne, & J. M. Smith (Eds.), The scientist speculates: An anthology of
half-baked ideas (pp. 357-364). New York: Basic Books.
de Finetti, B. (1976). Probability: Beware of falsifications! Scientia, 3, 283-303.
DeGroot, M. H. (1982). Comment on Shafer, "Lindley's paradox" (Journal of the American Statistical Association, 77, 325-334). Journal of the American Statistical Association, 77, 336-339.
DeJoy, D. (1989). The optimism bias and traffic accident risk prevention. Accident Analysis
and Prevention, 21, 333-340.
De Moivre, A. (1756). Excerpt from The doctrine of chances. Reprinted as Appendix 5 in F.
N. David (1962). Games, gods, and gambling: The origins and history of probability
(pp. 254-267). New York: Hafner.
Denes-Raj, V., & Epstein, S. (1994). Conflict between intuitive and rational processing:
When people behave against their better judgment. Journal of Personality and Social
Psychology, 66, 819-829.
Dennett, D. C. (1981). Three kinds of intentional psychology. In R. Healey (Ed.), Reduc-
tion, time and reality (pp. 37-61). Cambridge, England: Cambridge University Press.
Devine, P. G., Hirt, E. R., & Gehrke, E. M. (1990). Diagnostic and confirmation strategies
in trait hypothesis testing. Journal of Personality and Social Psychology, 58,952-963.
Devine, P. G., & Ostrom, T. M. (1985). Cognitive mediation of inconsistency discounting.
Journal of Personality and Social Psychology, 49, 5-21.
Devlin, K. (2000a). The language of mathematics. New York: Freeman.
Devlin, K. (2000b). Snake eyes in the Garden of Eden. The Sciences, 40(4), 14-17.
Dewdney, A. K. (1993). 200 percent of nothing. New York: Wiley.
Diaconis, P., & Freedman, D. (1981). The persistence of cognitive illusions. Commentary
on Cohen, 1981. The Behavioral and Brain Sciences, 4, 333-334.
Diaconis, P., & Mosteller, F. (1989). Methods for studying coincidences. Journal of the
American Statistical Association, 84, 853-861.
Dickens, C. (1894). The old curiosity shop. Boston, MA: Houghton Mifflin. (Original work
published 1840)
Dickinson, A., Shanks, D., & Evenden, J. (1984). Judgment of act-outcome contingency: The
role of selective attention. Quarterly Journal of Experimental Psychology, 36A, 29-50.
Dickinson, R. E., & Cicerone, R. J. (1986). Future global warming from atmospheric trace
gases. Nature, 319, 109-115.
Ditto, P. H., Jemmott, J. B., & Darley, J. M. (1988). Appraising the threat of illness: A men-
tal representational approach. Health Psychology, 7, 183-201.
Ditto, P. H., & Lopez, D. F. (1992). Motivated skepticism: Use of differential decision crite-
ria for preferred and nonpreferred conclusions. Journal of Personality and Social Psy-
chology, 63, 568-584.
Dixon, P. (1998). Why scientists value p values. Psychonomic Bulletin and Review, 5,
390-396.
Doherty, M. E., & Falgout, K. (1986). Subjects' data selection strategies for assessing event covariation. Unpublished manuscript, Department of Psychology, Bowling Green State University, Bowling Green, OH.
Donmell, M. L., & DuCharme, M. W. (1975). The effect of Bayesian feedback on learning
in an odds estimation task. Organizational Behavior and Human Performance, 14,
305-313.
Dowie, J., & Elstein, A. (Eds.). (1988). Professional judgment: A reader in clinical decision
making. Cambridge, England: Cambridge University Press.
Dracup, C. (1995). Hypothesis testing—what it really is. The Psychologist, 8, 359-362.
DuCharme, W. M., & Peterson, C. R. (1969). Proportion estimation as a function of propor-
tion and sample size. Journal of Experimental Psychology, 81, 536-541.
Dulany, D. E., & Hilton, D. J. (1991). Conversational implicature, conscious representa-
tion, and the conjunction fallacy. Social Cognition, 9, 85-110.
Dunning, D. (1993). Words to live by: The self and definitions of social concepts and cate-
gories. In J. Suls (Ed.), Psychological perspectives on the self (pp. 99-126). Hillsdale,
NJ: Lawrence Erlbaum Associates.
Dyson, F. J. (1971). Energy in the universe. Scientific American, 225(3), 50-59.
Eddington, A. S. (1935). New pathways in science. New York: Macmillan.
Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportuni-
ties. In D. Kahneman, P. Slovic, & A. Tversky (Eds.) Judgment under uncertainty:
Heuristics and biases (pp. 249-267). New York: Cambridge University Press.
Edgeworth, F. Y. (1890). The element of chance in competitive examinations. Journal of
the Royal Statistical Society, 53, 460-475, 644-663.
Edney, J. J. (1980). The commons problem: Alternative perspectives. American Psycholo-
gist, 35, 131-150.
Edwards, W. (1962). Subjective probabilities inferred from decisions. Psychological Re-
view, 69, 109-135.
Edwards, W. (1965). Probabilistic information processing system for diagnosis and action
selection. In J. Spiegel & D. Walker (Eds.), Information system sciences. Proceedings of
the second congress (pp. 000-000). Washington, DC: Spartan Books.
Edwards, W. (1967). Dynamic decision theory and probabilistic information processing.
Human Factors, 4, 59-73.
Edwards, W. (1968). Conservatism and human information processing. In B. Kleinmuntz (Ed.), Formal representation of human judgment. New York: Wiley.
Edwards, W., Lindman, H., & Phillips, L. D. (1965). Emerging technologies for making de-
cisions. In New directions in psychology II (pp. 261-325). New York: Holt, Rinehart &
Winston.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psy-
chological research. Psychological Review, 70, 193-242.
Edwards, W., & Tversky, A. (1967). Decision making. Middlesex, England: Penguin.
Einhorn, H. J. (1980). Overconfidence in judgment. In R. A. Shweder & D. W. Fiske (Eds.),
New directions for methodology of social and behavioral science: No. 4. Fallible judg-
ment in behavioral research (pp. 1-16). San Francisco: Jossey-Bass.
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illu-
sion of validity. Psychological Review, 85, 395-416.
Einhorn, H. J., & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judg-
ment and choice. Annual Review of Psychology, 32, 53-88.
Einhorn, H. J., & Hogarth, R. M. (1985). Ambiguity and uncertainty in probabilistic infer-
ence. Psychological Review, 92, 433-461.
Ekeland, I. (1993). The broken dice. Chicago: University of Chicago Press. (Original work
published 1991 in French)
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Eco-
nomics, 75, 643-669.
Engle, P. L., & Lumpkin, J. B. (1992). How accurate are time-use reports? Effects of cogni-
tive enhancement and cultural differences on recall accuracy. Applied Cognitive Psy-
chology, 6, 141-159.
Ennis, B. J., & Litwack, T. R. (1974). Psychiatry and the presumption of expertise: Flipping
coins in the courtroom. California Law Review, 62, 693-752.
Erev, I., Bornstein, B. H., & Wallsten, T. S. (1993). The negative effect of probability assess-
ment on decision quality. Organizational Behavior and Human Decision Processes, 55,
78-94.
Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and
underconfidence: The role of error in judgment processes. Psychological Review, 101,
319-327.
Erikson, K. (1990, Fall). The fear you can't ignore. Best of Business Quarterly, pp. 52-59.
Erlick, D. E., & Mills, R. G. (1967). Perceptual quantification of conditional dependency.
Journal of Experimental Psychology, 73, 9-14.
Estes, W. K. (1964). Probability learning. In A. W. Melton (Ed.), Categories of human
learning (pp. 90-128). New York: Academic Press.
Estes, W. K. (1976a). The cognitive side of probability learning. Psychological Review, 83,
37-64.
Estes, W. K. (1976b). Some functions of memory in probability learning and choice behav-
ior. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp.
2-45). New York: Academic Press.
Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hillsdale,
NJ: Lawrence Erlbaum Associates.
Evans, J. St. B. T., & Bradshaw, H. (1986). Estimating sample-size requirements in re-
search design: A study of intuitive statistical judgment. Current Psychological Re-
search and Reviews, 5(1), 10-19.
Evans, J. St. B. T., & Dusoir, A. E. (1975). Sample size and subjective probability judgements.
Paper presented at the meeting of the Experimental Psychology Society, Oxford, England.
Evans, J. St. B. T., & Dusoir, A. E. (1977). Proportionality and sample size as factors in in-
tuitive statistical judgement. Acta Psychologica, 41, 129-137.
Evans, J. St. B. T., & Pollard, P. (1985). Intuitive statistical inferences about normally dis-
tributed data. Acta Psychologica, 60, 57-71.
Faber, R. J. (1976). Re-encountering a counter-intuitive probability. Philosophy of Science,
43, 283-285.
Falk, R. (1979). Revision of probabilities and the time axis. In Proceedings of the Third In-
ternational Conference for the Psychology of Mathematics Education (pp. 64-66).
Warwick, England.
Falk, R. (1981). The perception of randomness. In Proceedings of the Fifth International Con-
ference for the Psychology of Mathematics Education (pp. 222-229). Grenoble, France.
Falk, R. (1981-1982). On coincidences. The Skeptical Inquirer, 6, 18-31.
Falk, R. (1983). Experimental models for resolving probabilistic ambiguities. In R.
Hershkowitz (Ed.), Proceedings of the Seventh International Conference for the Psy-
chology of Mathematics Education (pp. 319-325). Rehovot, Israel: The Weizmann In-
stitute of Science.
Falk, R. (1986). Misconceptions of statistical significance. Journal of Structural Learning,
9, 83-96.
Falk, R. (1989). Judgment of coincidences: Mine versus yours. American Journal of Psy-
chology, 102, 477-493.
Falk, R. (1991). Randomness—An ill-defined but much needed concept. Commentary on
"Psychological conceptions of randomness." Journal of Behavioral Decision Making,
4, 215-218.
Falk, R. (1992). A closer look at the probabilities of the notorious three prisoners. Cogni-
tion, 43, 197-223.
Falk, R. (1993). Understanding probability and statistics: A book of problems. Wellesley,
MA: A. K. Peters.
Falk, R., & Bar-Hillel, M. (1980). Magic possibilities of the weighted average. Mathemat-
ics Magazine, 53, 106-107.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence
of a probabilistic misconception. Theory and Psychology, 5, 75-98.
Falk, R., & Konold, C. (1992). The psychology of learning probability. In F. S. Gordon & S.
P. Gordon (Eds.), Statistics for the twenty-first century (pp. 151-164). Washington, DC:
Mathematical Association of America.
Falk, R., & Konold, C. (1997). Making sense of randomness: Implicit encoding as a basis
for judgment. Psychological Review, 104, 301-318.
Falk, R., & MacGregor, D. (1983). The surprisingness of coincidences. In P. Humphreys, O. Svenson, & A. Vári (Eds.), Analysing and aiding decision processes (pp. 489-502). Budapest, Hungary: Akadémiai Kiadó.
Faust, D., Hart, K., & Guilmette, J. (1988). Pediatric malingering: The capacity of children
to fake believable deficits on neuropsychological testing. Journal of Consulting and
Clinical Psychology, 56, 578-582.
Fehr, E., & Gächter, S. (2000). Cooperation and punishment in public goods experiments.
American Economic Review, 90, 980-994.
Fehr, E., Gächter, S., & Kirchsteiger, G. (1997). Reciprocity as a contract enforcement de-
vice: Experimental evidence. Econometrica, 65, 833-860.
Fehr, E., Kirchsteiger, G., & Riedl, A. (1993). Does fairness prevent market clearing? An
experimental investigation. Quarterly Journal of Economics, 108, 437-460.
Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation.
Quarterly Journal of Economics, 114, 817-868.
Feldman, J. M., Camburn, A., & Gatti, G. M. (1986). Shared distinctiveness as a source of
illusory correlation in performance appraisal. Organizational Behavior and Human De-
cision Processes, 37, 34-59.
Feller, W. (1936-1937). Über das Gesetz der grossen Zahlen [On the law of large numbers]. Acta Litterarum ac Scientiarum Regiae Universitatis Hungaricae Francisco-Josephinae, 8, 191-201.
Feller, W. (1957). An introduction to probability theory and its applications (Vol. 1, 2nd
ed.). New York: Wiley.
Feller, W. (1966). An introduction to probability theory and its applications, Vol. 2. New
York: Wiley.
Ferguson, T. S. (1989). Who solved the secretary problem? Statistical Science, 4,282-296.
Ferrell, W. R., & McGoey, P. J. (1980). A model of calibration for subjective probabilities.
Organizational Behavior and Human Performance, 26, 32-53.
Feynman, R. P. (1989). The character of physical law. Cambridge, MA: MIT Press. (Origi-
nal work published 1965)
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors.
Psychological Research, 50, 123-129.
Fields, J. M., & Schuman, H. (1976). Public beliefs about the beliefs of the public. Public
Opinion Quarterly, 40, 427-448.
Finke, R. A. (1984). Strategies for being random. Bulletin of the Psychonomic Society, 22,
40-41.
Fischer, G. W. (1982). Scoring-rule feedback and the overconfidence syndrome in subjec-
tive probability forecasting. Organizational Behavior and Human Performance, 13,
1-16.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychol-
ogy: Human Perception and Performance, 3, 349-358.
Fischhoff, B. (1982). Debiasing. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judg-
ment under uncertainty: Heuristics and biases (pp. 422-444). Cambridge, England:
Cambridge University Press.
Fischhoff, B., & Beyth-Marom, R. (1983). Hypothesis evaluation from a Bayesian perspective. Psychological Review, 90, 239-260.
Fischhoff, B., Lichtenstein, S., Slovic, P., Derby, S. L., & Keeney, R. L. (1981). Acceptable
risk. New York: Cambridge University Press.
Fischhoff, B., & MacGregor, D. (1982). Subjective confidence in forecasts. Journal of
Forecasting, 1, 155-172.
Fischhoff, B., & MacGregor, D. (1983). Judged lethality: How much people seem to know
depends upon how they are asked. Risk Analysis, 3, 229-236.
Fischhoff, B., & Slovic, P. (1980). A little learning ...: Confidence in multicue judgment.
In R. S. Nickerson (Ed.), Attention and performance VIII (pp. 779-800). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appropri-
ateness of extreme confidence. Journal of Experimental Psychology: Human Percep-
tion and Performance, 3, 552-564.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated
failure probabilities to problem representations. Journal of Experimental Psychology:
Human Perception and Performance, 4, 330-344.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1979). Subjective sensitivity analysis. Organi-
zational Behavior and Human Performance, 23, 339-359.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1980). Knowing what you want: Measuring la-
bile values. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision behavior
(pp. 117-141). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1982). Lay foibles and expert fables in judg-
ments about risks. The American Statistician, 36, 241-255.
Fischhoff, B., Svenson, O., & Slovic, P. (1987). Active responses to environmental hazards: Perception and decision making. In D. Stokols & I. Altman (Eds.), Handbook of environ-
mental psychology (pp. 1089-1133). New York: Wiley.
Fishburn, P. C., & Kochenberger, G. A. (1979). Two-piece von Neumann-Morgenstern
utility functions. Decision Sciences, 10, 503-518.
Fisher, A. (1988). The logic of real arguments. New York: Cambridge University Press.
Fischer, I., & Budescu, D. V. (1994). Desirability and hindsight biases in predicting re-
sults of a multi-party election. In J. P. Caverni, M. Bar-Hillel, H. Barron, & H.
Jungermann (Eds.), Frontiers in decision research (pp. 193-211). Amsterdam: North
Holland.
Fisk, A. D., & Schneider, W. (1984). Memory as a function of attention, level of processing,
and automatization. Journal of Experimental Psychology: Learning, Memory, and Cog-
nition, 10, 181-197.
Flammarion, C. (1890). L'inconnu et les problèmes psychiques [The unknown and psychic
problems]. Paris.
Fleming, R. A. (1970). The processing of conflicting information in a simulated tactical de-
cision-making task. Human Factors, 12, 375-385.
Flexer, A. J., & Bower, G. H. (1975). Further evidence regarding instructional effects on
frequency judgments. Bulletin of the Psychonomic Society, 6, 321-324.
Folger, R. (1989). Significance tests and the duplicity of binary decisions. Psychological
Bulletin, 106, 155-160.
Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on
thinking about everyday problems. Cognitive Psychology, 18, 253-292.
Forgas, J. P., Bower, G. H., & Krantz, S. (1984). The influence of mood on perceptions of
social interactions. Journal of Experimental Social Psychology, 20, 497-513.
Fourier, J. B. J. (1819). Extrait d'un mémoire sur la théorie analytique des assurances [Ex-
cerpt from a memoir on the analytical theory of assurances]. Annales de Chimie et de
Physique, 10, 177-189.
Freudenthal, H. (1970). The aims of teaching probability. In L. Rade (Ed.), The teaching of
probability and statistics (pp. 151-167). Stockholm: Almqvist & Wiksell.
Freund, J. E. (1965). Puzzle or paradox. American Statistician, 19, 29, 44.
Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological
Methods, 1, 379-390.
Friedman, M. (1953). Essays in positive economics. Chicago: University of Chicago Press.
Friedman, M., & Savage, L. J. (1948). The utility analysis of choices involving risks. Jour-
nal of Political Economy, 56, 279-304.
Frisch, D., & Baron, J. (1988). Ambiguity and rationality. Journal of Behavioral Decision
Making, 1, 149-157.
Galanter, E., & Pliner, P. (1974). Cross-modality matching of money against other con-
tinua. In H. R. Moskowitz, B. Scharf, & J. C. Stevens (Eds.), Sensation and measure-
ment (pp. 65-76). Dordrecht, Netherlands: Reidel.
Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. London: Macmillan.
Galton, F. (1874). English men of science: Their nature and nurture. London: Macmillan.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Galton, F. (1909). Memories of my life (3rd ed.). London: Methuen.
Gamow, G., & Stern, G. (1958). Puzzle-math. New York: Viking.
Gärdenfors, P., & Sahlin, N.-E. (1983). Decision making with unreliable probabilities.
British Journal of Mathematical and Statistical Psychology, 36, 240-251.
Gardner, M. (1957). Fads and fallacies in the name of science. New York: Dover.
Gardner, M. (1967). The numerology of Dr. Matrix. New York: Simon & Schuster.
Gardner, M. (1974). On the paradoxical situations that arise from nontransitive relations.
Scientific American, 237(4), 110-114.
Gardner, M. (1976). Mathematical games: On the fabric of inductive logic, and some prob-
ability paradoxes. Scientific American, 234(3), 119-124.
Gardner, M. (1979). Mathematical games: In some patterns of numbers or words there may
be less than meets the eye. Scientific American, 241(3), 22-32.
Gauthier, D. (1990). Maximization constrained: The rationality of cooperation. In P. K.
Moser (Ed.), Rationality in action: Contemporary approaches (pp. 315-334). New
York: Cambridge University Press. (Original work published 1986)
Gavanski, I., & Hui, C. (1992). Natural sample spaces and uncertain belief. Journal of Per-
sonality and Social Psychology, 63, 766-780.
Gell-Mann, M. (1994). The quark and the jaguar: Adventures in the simple and the com-
plex. New York: Freeman.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. B. (1995). Bayesian data analysis. London:
Chapman & Hall.
Gettys, C. F., Fisher, S. D., & Mehle, T. (1978). Hypothesis generation and plausibility assessment (Annual Rep. No. R 15-10-78). University of Oklahoma, Decision Processes Laboratory.
Gettys, C. F., & Wilke, T. A. (1969). The application of Bayes's theorem when the true data
state is uncertain. Organizational Behavior and Human Performance, 4, 125-141.
Gibbs, W. W. (1994). Software's chronic crisis. Scientific American, 271(3), 86-95.
Gibson, B., Sanbonmatsu, D. M., & Posavac, S. S. (1997). The effects of selective hy-
pothesis testing on gambling. Journal of Experimental Psychology: Applied, 3,
126-142.
Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of
Science, 23, 170-181.
Gigerenzer, G. (1987). Probabilistic thinking and the fight against subjectivity. In L.
Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution: Vol. 2.
Ideas in the sciences (pp. 11-33). Cambridge, MA: MIT Press.
Gigerenzer, G. (1991b). How to make cognitive illusions disappear: Beyond "heuristics
and biases." In W. Stroebe & M. Hewslone (Eds.), European review of social psychol-
ogy (Vol. 2, pp. 83-115). London: Wiley.
Gigerenzer, G. (1993). The bounded rationality of probabilistic mental models. In K. I.
Manktelow & D. E. Over (Eds.), Rationality: Psychological and philosophical perspec-
tives (pp. 284-313). London: Routledge.
Gigerenzer, G. (1994). Why the distinction between single-event probabilities and fre-
quencies is important for psychology (and vice versa). In G. Wright & P. Ayton (Eds.),
Subjective probability (pp. 129-161). New York: Wiley.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of
bounded rationality. Psychological Review, 103, 650-669.
Gigerenzer, G., & Goldstein, D. G. (1999). Betting on one good reason: The Take the Best
heuristic. In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart
(pp. 75-95). New York: Oxford University Press.
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base
rates as a continuous variable. Journal of Experimental Psychology: Human Perception
and Performance, 14, 513-525.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without in-
struction: Frequency formats. Psychological Review, 102, 684-704.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A
Brunswikian theory of confidence. Psychological Review, 98, 506-528.
Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The em-
pire of chance: How probability changed science and everyday life. New York: Cam-
bridge University Press.
Gigerenzer, G., & Todd, P. M. (1999a). Fast and frugal heuristics: The adaptive toolbox. In
G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp. 3-34).
New York: Oxford University Press.
Gigerenzer, G., & Todd, P. M. (Eds.). (1999b). Simple heuristics that make us smart. New
York: Oxford University Press.
Gilbert, J. P., & Mosteller, F. (1966). Recognizing the maximum of a sequence. American
Statistical Association Journal, 61, 35-73.
Gilden, D. L., & Gray Wilson, S. (1995). Streaks in skilled performance. Psychonomic Bul-
letin and Review, 2, 260-265.
Gillies, D. A. (1973). An objective theory of probability. London: Methuen.
Gillman, L. (1992). The car and the goats. American Mathematical Monthly, 99, 3-7.
Gilovich, T. (1991). How we know what isn't so: The fallibility of human reason in every-
day life. New York: The Free Press.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the
misperception of random sequences. Cognitive Psychology, 17, 295-314.
Ginossar, Z., & Trope, Y. (1987). Problem solving in judgment under uncertainty. Journal
of Personality and Social Psychology, 52, 464-474.
Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 11, 702-718.
Glenberg, A. M., Sanocki, T., Epstein, W, & Morris, C. (1987). Enhancing calibration of
comprehension. Journal of Experimental Psychology: General, 116, 119-136.
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure
in the self-assessment of comprehension. Memory and Cognition, 10, 597-602.
Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive
network model. Journal of Experimental Psychology: General, 117, 227-247.
Glymour, C. (1996). Why I am not a Bayesian. In D. Papineau (Ed.), Philosophy of science (pp. 290-313). Oxford, England: Oxford University Press.
Goddard, M., & Allan, L. (1988). A critique of Alloy and Tabachnik's theoretical frame-
work for understanding covariation assessment. Psychological Review, 95, 296-298.
Gold, E., & Hester, G. (1989). The gambler's fallacy and the coin's memory. Unpublished
manuscript, Carnegie-Mellon University, Pittsburgh, PA.
Goldberg, L. R. (1959). The effectiveness of clinicians' judgments: The diagnosis of organic brain damage from the Bender-Gestalt test. Journal of Consulting Psychology,
23, 23-33.
Goldberg, L. R. (1965). Diagnosticians vs. diagnostic signs: The diagnosis of psychosis vs.
neurosis from MMPI. Psychological Monographs, 79 (Whole No. 9).
Goldberg, L. R. (1968). Simple models or simple processes? Some research in clinical
judgment. American Psychologist, 23, 483-496.
Goldberg, S. (1976). Copi's conditional probability problem. Philosophy of Science, 43,
286-289.
Golding, S. L., & Rorer, L. G. (1972). Illusory correlation and subjective judgement. Jour-
nal of Abnormal Psychology, 80, 249-260.
Goldman, A. I. (1986). Epistemology and cognition. Cambridge, MA: Harvard University
Press.
Goldsmith, R. W. (1978). Assessing probabilities of compound events in a judicial context.
Scandinavian Journal of Psychology, 19, 103-110.
Goldsteen, R., Schorr, J. K., & Goldsteen, K. S. (1989). Longitudinal study of appraisal at
Three Mile Island: Implications for life event research. Social Science and Medicine,
28, 389-398.
Goldstein, D. G., & Gigerenzer, G. (1999). The recognition heuristic: How ignorance
makes us smart. In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us
smart (pp. 37-58). New York: Oxford University Press.
Goldstein, D., Hasher, L., & Stein, D. K. (1983). The processing of occurrence rate and
item information by children of different ages and abilities. American Journal of Psy-
chology, 96, 229-241.
Goldstein, W., & Einhorn, H. (1987). Expression theory and the preference reversal phe-
nomena. Psychological Review, 94, 236-254.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society, Series B, 14,
107-114.
Good, I. J. (1972). Comment. Journal of the American Statistical Association, 67, 374-375.
Good, I. J. (Ed.). (1983a). Good thinking: The foundations of probability and its applications. Minneapolis: University of Minnesota Press.
Good, I. J. (1983b). The white shoe is a red herring. In I. J. Good (Ed.), Good thinking: The
foundations of probability and its applications (pp. 119-120). Minneapolis: University
of Minnesota Press. (Original work published 1967)
Goodie, A. S., Ortmann, A., Davis, J. N., Bullock, S., & Werner, G. M. (1999). Demons ver-
sus heuristics in artificial intelligence, behavioral ecology, and economics. In G.
Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp. 327-355).
New York: Oxford University Press.
Gould, L. C., Gardner, G. T., DeLuca, D. R., Tiemann, A. R., Doob, L. W., & Stolwijk, J. A.
J. (1988). Perceptions of technological risks and benefits. New York: Russell Sage
Foundation.
Gould, S. J. (1989). The streak of streaks. Chance: New Directions for Statistics and Com-
puting, 2(2), 10-16.
Gould, S. J. (1993). Eight little piggies. New York: Norton.
Graesser, A. C., & Hemphill, D. (1991). Question answering in the context of scientific
mechanisms. Journal of Memory and Language, 30, 186-209.
Graham, R. L., Rothschild, B. L., & Spencer, J. H. (1990). Ramsey theory (2nd ed.). New
York: Wiley.
Graham, R. L., & Spencer, J. H. (1990). Ramsey theory. Scientific American, 263(1),
112-117.
Grant, D. A. (1962). Testing the null hypothesis and the strategy and tactics of investigating
theoretical models. Psychological Review, 69, 54-61.
Green, C. H., & Brown, R. A. (1978). Counting lives. Journal of Occupational Accidents,
2, 55-70.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York:
Wiley.
Greening, L., & Chandler, C. C. (1997). Why it can't happen to me: The base rate matters,
but overestimating skill leads to underestimating risk. Journal of Applied Social Psychology,
27, 760-780.
Greenstein, G. (1988). The symbiotic universe: Life and the cosmos in unity. New York:
Morrow.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psycho-
logical Bulletin, 82, 1-20.
Gregory, W. L., Cialdini, R. B., & Carpenter, K. M. (1982). Self-relevant scenarios as medi-
ators of likelihood estimates and compliance: Does imagining make it so? Journal of
Personality and Social Psychology, 43, 89-99.
Grether, D. M. (1980). Bayes' rule as a descriptive model: The representativeness heuristic.
The Quarterly Journal of Economics, 95, 537-557.
Grether, D. M., & Plott, C. (1979). Economic theory of choice and the preference reversal
phenomenon. American Economic Review, 69, 623-638.
Gribbin, J., & Rees, M. (1989). Cosmic coincidences: Dark matter, mankind, and anthropic cosmology. New York: Bantam.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and se-
mantics: Vol. 3. Speech acts (pp. 41-58). New York: Seminar Press.
Griffin, D. W., Dunning, D., & Ross, L. (1990). The role of construal processes in overcon-
fident predictions about the self and others. Journal of Personality and Social Psychol-
ogy, 59, 1128-1139.
Griffin, D. W., & Tversky, A. (1992). The weighing of evidence and the determinants of
confidence. Cognitive Psychology, 24, 411-435.
Guttman, L. (1977). What is not what in statistics. The Statistician, 26, 81-107.
Hacking, I. (1965). The logic of statistical inference. Cambridge, England: Cambridge Uni-
versity Press.
Hacking, I. (1975). The emergence of probability. New York: Cambridge University Press.
Hacking, I. (1987a). Prussian numbers, 1860-1882. In L. Krüger, L. J. Daston, & M.
Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp.
377-394). Cambridge, MA: MIT Press.
Hacking, I. (1987b). Was there a probabilistic revolution 1800-1930? In L. Krüger, L. J.
Daston, & M. Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history
(pp. 45-55). Cambridge, MA: MIT Press.
Hacking, I. (1990). The taming of chance. New York: Cambridge University Press.
Hacking, I. (2001). An introduction to probability and inductive logic. New York: Cam-
bridge University Press.
Hake, H. W., & Hyman, R. (1953). Perception of the statistical structure of a random series
of binary symbols. Journal of Experimental Psychology, 45, 64-74.
Hakohen, D., Avinon, O., & Falk, R. (1981). What is surprising about coincidences? Un-
published manuscript (in Hebrew), The Hebrew University, Jerusalem. (Cited in Falk,
1989)
Hallman, W. K., & Wandersman, A. H. (1992). Attribution of responsibility and individual
and collective coping with environmental threats. Journal of Social Issues, 48(4),
101-118.
Hamill, R., Wilson, T. D., & Nisbett, R. E. (1980). Insensitivity to sample bias: Generalizing
from atypical cases. Journal of Personality and Social Psychology, 39, 578-589.
Hamilton, D. L. (1979). A cognitive attributional analysis of stereotyping. In L. Berkowitz
(Ed.), Advances in experimental social psychology (Vol. 12, pp. 53-84). New York: Ac-
ademic Press.
Hamilton, D. L. (1981). Illusory correlation as a basis for stereotyping. In D. L. Hamilton
(Ed.), Cognitive processes in stereotyping and intergroup behavior (pp. 333-353).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Hamilton, D. L., Dugan, P. M., & Trolier, T. K. (1985). The formation of stereotypic be-
liefs: Further evidence for distinctiveness-based illusory correlations. Journal of Per-
sonality and Social Psychology, 48, 5-17.
Hamilton, D. L., & Gifford, R. (1976). Illusory correlation in interpersonal perception: A
cognitive basis for stereotypic judgments. Journal of Experimental Social Psychology,
12, 392-407.
Hamilton, D. L., & Sherman, S. J. (1989). Illusory correlations: Implications for stereotype
theory and research. In D. Bar-Tal, C. F. Graumann, A. W. Kruglanski, & W. Stroebe
(Eds.), Stereotyping and prejudice: Changing conceptions (pp. 59-82). New York:
Springer-Verlag.
Hammerton, M. (1973). A case of radical probability estimation. Journal of Experimental
Psychology, 101, 252-254.
Hammond, K. R. (2000). Human judgment and social policy. New York: Oxford University
Press.
Hammond, K. R., & Marvin, B. A. (1981). Report to the Rocky Flats Monitoring Commit-
tee concerning scientists'judgments of cancer risk (Rep. No. 232). Boulder, CO: Uni-
versity of Colorado.
Hammond, K. R., McClelland, G. H., & Mumpower, J. (1980). Human judgment and deci-
sion making. New York: Hemisphere.
Hammond, K. R., Summers, D. A., & Deane, D. H. (1973). Negative effects of outcome
feedback in multiple-cue probability learning. Organizational Behavior and Human
Performance, 9, 30-34.
Hance, B. J., Chess, C., & Sandman, P. M. (1988). Improving dialogue with communities: A
risk communication manual for government. Trenton, NJ: Department of Environmen-
tal Protection.
Hangenlocher, K. G. (1999). A Zeppelin for the 21st century. Scientific American, 281(5),
104-109.
Hansen, R. D., & Donoghue, J. M. (1977). The power of consensus: Information derived from one's own behavior. Journal of Personality and Social Psychology, 35, 294-302.
Hansen, W. B., & Malotte, C. K. (1986). Perceived personal immunity: The development of
beliefs about susceptibility to the consequences of smoking. Preventive Medicine, 15,
363-372.
Hardin, G. (1968). The tragedy of the commons. Science, 162, 1243-1248.
Harkness, A. R., DeBono, K. G., & Borgida, E. (1985). Personal involvement and strategies
for making contingency judgments: A stake in the dating game makes a difference.
Journal of Personality and Social Psychology, 49, 22-32.
Harris, D. M., & Guten, S. (1979). Health protective behavior: An exploratory study. Jour-
nal of Health and Social Behavior, 20, 17-29.
Harris, J. M. (1981). The hazards of bedside Bayes. Journal of American Medical Associa-
tion, 246, 2602-2605.
Harris, P. (1996). Sufficient grounds for optimism: The relationship between perceived
controllability and optimistic bias. Journal of Social and Clinical Psychology, 15,
319-386.
Harris, R. J. (1997). Significance tests have their place. Psychological Science, 8, 8-11.
Hasher, L., & Chromiak, W. (1977). The processing of frequency information: An auto-
matic mechanism? Journal of Verbal Learning and Verbal Behavior, 16, 173-184.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of
Experimental Psychology: General, 108, 356-388.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information: The
case of frequency of occurrence. American Psychologist, 39, 1372-1388.
Hastie, R., Penrod, S. D., & Pennington, N. (1983). Inside the jury. Cambridge, MA: Har-
vard University Press.
Hausch, D. B., Ziemba, W. T., & Rubinstein, M. (1981). Efficiency of the market for race-
track betting. Management Science, 27, 1435-1452.
Hayes, J. R. M. (1964). Human data processing limits in decision making. In E. Bennett
(Ed.), Information system science and engineering: Proceedings of the First Congress
on the Information System Sciences. New York: McGraw-Hill.
Hayes-Roth, B., & Hayes-Roth, F. (1979). A cognitive model of planning. Cognitive Sci-
ence, 3, 275-310.
Hazard, T., & Peterson, C. R. (1973). Odds versus probabilities for categorical events
(Tech. Rep. 73-2). McLean, VA: Decisions and Designs, Inc.
Heidelberger, M. (1987). Fechner's indeterminism: From freedom to laws of chance. In L.
Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic revolution: Vol. 1.
Ideas in history (pp. 117-156). Cambridge, MA: MIT Press.
Heilbroner, R. (1974). An inquiry into the human prospect. New York: Norton.
Hendrick, C., & Constantini, A. F. (1970). Number averaging behavior: A primacy effect.
Psychonomic Science, 19, 121-122.
Henle, M. (1962). On the relation between logic and thinking. Psychological Review, 69,
366-378.
Henrich, J. (2000). Does culture matter in economic behavior? Ultimatum game bargaining
among the Machiguenga of the Peruvian Amazon. The American Economic Review, 90,
973-979.
Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In search of Homo economicus: Behavioral experiments in 15 small-scale societies. American Economic Review, 91, 73-78.
Henrion, M., & Fischhoff, B. (1986). Assessing uncertainty in physical constants. Ameri-
can Journal of Physics, 54, 791-798.
Herman, L. M., Ornstein, G. N., & Bahrick, H. P. (1964). Operator decision performance
using probabilistic displays of object location. IEEE Transactions on Human Factors in
Electronics, 5, 13-19.
Hertwig, R., & Gigerenzer, G. (1994). Unpublished manuscript. (Described in Gigerenzer,
1994)
Hertwig, R., Hoffrage, U., & Martignon, L. (1999). Quick estimation: Letting the environ-
ment do the work. In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us
smart (pp. 209-234). New York: Oxford University Press.
Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on pre-
dictions of novice performance. Journal of Experimental Psychology: Applied, 5,205-221.
Hintzman, D. L. (1980). Simpson's paradox and the analysis of memory retrieval. Psycho-
logical Review, 87, 398-410.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psy-
chological Review, 93, 411-428.
Hintzman, D. L., Asher, S. J., & Stern, L. D. (1978). Incidental retrieval and memory for co-
incidences. In M. M. Grunberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of
memory (pp. 61-68). London: Academic Press.
Hirshleifer, D., & Rasmusen, E. (1989). Cooperation in a repeated prisoners' dilemma with
ostracism. Journal of Economic Behavior and Organization, 12, 87-106.
Hoch, S. J. (1984). Availability and inference in predictive judgment. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 10, 649-662.
Hoch, S. J. (1985). Counterfactual reasoning and accuracy in predicting personal events.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 719-731.
Hoch, S. J., Malcus, L., & Hasher, L. (1986). Frequency discrimination: Assessing
global-level and element-level units in memory. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 12, 232-240.
Hoffman, M. (Ed.). (1991). The world almanac and book of facts. New York: World Alma-
nac.
Hoffman, P. (1988). Archimedes' revenge: The joys and perils of mathematics. New York:
Fawcett Crest.
Hoffrage, U. (2000). Why the analyses of cognitive processes matter. Behavioral and Brain
Sciences, 23, 679-680.
Hofstadter, D. R. (1983). Metamagical themas: Computer tournaments of the prisoner's di-
lemmas suggest how cooperation evolves. Scientific American, 248(5), 14-20.
Hofstätter, P. R. (1939). Über die Schätzung von Gruppeneigenschaften [On the estimation of group properties]. Zeitschrift für Psychologie, 145, 1-44.
Hogarth, R. M. (1975). Cognitive processes and the assessment of subjective probability
distributions. Journal of the American Statistical Association, 70, 271-291.
Hogarth, R. M. (1987). Judgement and choice (2nd ed.). London: Wiley.
Holland, B. K. (1998). Probability without equations: Concepts for clinicians. Baltimore,
MD: Johns Hopkins University Press.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1986). Induction: Processes
of inference, learning, and discovery. Cambridge, MA: MIT Press.
Hollingworth, H. L. (1913a). Characteristic differences between recall and recognition.
American Journal of Psychology, 24, 532-544.
Hollingworth, H. L. (1913b). Experimental studies in judgment. Archives of Psychology,
29, 1-96.
Hombas, V. C. (1997). Waiting time and expected waiting time—paradoxical situations.
The American Statistician, 51, 130-133.
Hooke, R. (1989). Basketball, baseball, and the null hypothesis. Chance: New Directions
for Statistics and Computing, 2(4), 35-37.
Hoorens, V., & Buunk, B. P. (1993). Social comparison of health risks: Locus of control, the
person-positivity bias, and unrealistic optimism. Journal of Applied Social Psychology,
18, 291-302.
Horgan, J. (1994). Radon's risks. Scientific American, 271(2), 14-16.
Horvath, R. A. (1987). The rise of macroeconomic calculations in economic statistics. In L.
Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution: Vol. 2.
Ideas in the sciences (pp. 147-169). Cambridge, MA: MIT Press.
Horwich, P. (1982). Probability and evidence. New York: Cambridge University Press.
Houghton, R. A., & Woodwell, G. M. (1989). Global climatic change. Scientific American,
260(4), 36-44.
Howard, R. A. (1966). Decision analysis: Applied decision theory. In D. B. Hertz & J.
Melese (Eds.), Proceedings of the Fourth International Conference on Operational Re-
search (pp. 55-71). New York: Wiley-Interscience.
Howard, R. A. (1968). The foundations of decision analysis. IEEE Transactions on Sys-
tems Science and Cybernetics, SSC-4, 211-219.
Howell, W. C. (1966). Task characteristics in sequential decision behavior. Journal of Ex-
perimental Psychology, 71, 124-131.
Howell, W. C. (1973). Storage of events and event frequency: A comparison of two para-
digms in memory. Journal of Experimental Psychology, 98, 260-263.
Howson, C., & Urbach, P. (1993). Scientific reasoning: The Bayesian approach (2nd ed.).
Chicago: Open Court.
Hoyle, F. (1983). The intelligent universe. London: Michael Joseph.
Huberty, C. J., & Morris, J. D. (1988). A single contrast test procedure. Educational and
Psychological Measurement, 48, 567-578.
Huff, D. (1973). How to lie with statistics. New York: Viking Penguin. (Original work pub-
lished 1954)
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3-7.
Iaffaldano, M. T., & Muchinsky, P. M. (1985). Job satisfaction and job performance: A
meta-analysis. Psychological Bulletin, 97, 251-273.
Ibrekk, H., & Morgan, M. G. (1987). Graphical communication of uncertain quantities to
non-technical people. Risk Analysis, 7, 519-529.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adoles-
cence. New York: Basic Books. (Original work published 1955)
Irwin, F. W. (1944). The realism of expectations. Psychological Review, 51, 120-126.
Irwin, F. W. (1953). Stated expectations as functions of probability and desirability of out-
comes. Journal of Personality, 21, 329-335.
Irwin, F. W., & Snodgrass, J. G. (1966). Effects of independent and dependent outcomes on
bets. Journal of Experimental Psychology, 71, 282-285.
Jacoby, L. L. (1972). Context effects on frequency judgments of words and sentences. Jour-
nal of Experimental Psychology, 94, 255-260.
Jacoby, L. L., Bjork, R. A., & Kelley, C. M. (1994). Illusions of comprehension, compe-
tence, and remembering. In D. Druckman & R. A. Bjork (Eds.), Learning, remember-
ing, believing: Enhancing human performance (pp. 57-80). Washington, DC: National
Academy Press.
Jaffe, A. J., & Spirer, H. F. (1987). Misused statistics: Straight talk for twisted numbers.
New York: Marcel Dekker.
Jarvik, M. E. (1951). Probability learning and a negative recency effect in the serial antici-
pation of alternative symbols. Journal of Experimental Psychology, 41, 291-297.
Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cyber-
netics, SSC-4, 227-240.
Jeffrey, L. (1989). Writing and rewriting poetry: William Wordsworth. In D. B. Wallace &
H. E. Gruber (Eds.), Creative people at work (pp. 69-89). New York: Oxford University
Press.
Jeffrey, R. (1983). The logic of decision (2nd ed.). Chicago: University of Chicago Press.
Jeffreys, H. (1934). Probability and scientific method. Proceedings of the Royal Society of
London, Series A, 146, 9-15.
Jeffreys, H. (1939). Theory of probability. Oxford, England: Clarendon.
Jemmott, J. B., Ditto, P. H., & Croyle, R. T. (1986). Judging health status: Effects of per-
ceived prevalence and personal relevance. Journal of Personality and Social Psychol-
ogy, 50, 899-905.
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between responses and
outcomes. Psychological Monographs: General and Applied, 79 (Whole No. 594).
Jenkins, J. J., & Tuten, J. T. (1992). Why isn't the average child from the average fam-
ily?—and similar puzzles. American Journal of Psychology, 105, 517-526.
Jennings, D., Amabile, T. M., & Ross, L. (1982). Informal covariation assessment:
Data-based versus theory-based judgments. In A. Tversky, D. Kahneman, & P. Slovic
(Eds.) Judgment under uncertainty: Heuristics and biases (pp. 211-230). New York:
Cambridge University Press.
Jepson, C., Krantz, D. H., & Nisbett, R. E. (1983). Inductive reasoning: Competence or
skill? Behavioral and Brain Sciences, 6, 94-101.
Johnson, D. (1981). V-l, V-2: Hitler's vengeance on London. New York: Stein & Day.
Johnson, E. J., & Schkade, D. A. (1989). Bias in utility assessment: Further evidence and
explanations. Management Science, 35, 406-424.
Johnson, E. J., & Tversky, A. (1983). Affect, generalization, and the perception of risk.
Journal of Personality and Social Psychology, 45, 20-31.
Johnson, E. M. (1973). Numerical encoding of qualitative expressions of uncertainty
(Tech. Paper No. 250). Alexandria, VA: U.S. Army Research Institute for the Behav-
ioral and Social Sciences.
Johnson, E. M. (1974). The effect of data source reliability on intuitive inference (ARI
Tech. Paper No. 251, AD 784097). Alexandria, VA: U.S. Army.
Johnson, E. M., Cavanagh, R. C., Spooner, R. L., & Samet, M. G. (1973). Utilization of reli-
ability measurements in Bayesian inference: Models and human performance. IEEE
Transactions on Reliability, R-22, 176-183.
Johnson, M. K., Peterson, M. A., Yap, E. C., & Rose, P. M. (1989). Frequency judgments:
The problem of defining a perceptual event. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 15, 126-136.
Johnson, M. K., Raye, C. L., Wang, A. Y., & Taylor, T. H. (1979). Fact and fantasy: The
roles of accuracy and variability in confusing imaginations with perceptual experi-
ences. Journal of Experimental Psychology: Human Learning and Memory, 5,
229-240.
Johnson, M. K., Taylor, T. H., & Raye, C. L. (1977). Fact and fantasy: The effects of inter-
nally generated events on the apparent frequency of externally generated events. Mem-
ory and Cognition, 5, 116-122.
Johnston, W. A. (1967). Individual performance and self-evaluation in a simulated
team. Organizational Behavior and Human Performance, 2, 309-328.
Jonides, J., & Jones, C. M. (1992). Direct coding for frequency of occurrence. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 18, 368-378.
Jorland, G. (1987). The St. Petersburg paradox, 1713-1937. In L. Krüger, L. J. Daston, &
M. Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp.
157-190). Cambridge, MA: MIT Press.
Juslin, P. (1993). An explanation of the hard-easy effect in studies of realism of confidence
in one's general knowledge. European Journal of Cognitive Psychology, 5, 55-71.
Juslin, P. (1994). The overconfidence phenomenon as a consequence of informal experi-
menter-guided selection of almanac items. Organizational Behavior and Human Deci-
sion Processes, 57, 226-246.
Kac, M. (1964). Probability. Scientific American, 211(3), 92-108.
Kac, M. (1983). Marginalia: What is random? American Scientist, 71, 405-406.
Kahan, J. P. (1974). Rationality, the prisoner's dilemma, and population. Journal of Social
Issues, 30, 189-210.
Kahneman, D. (2000). A psychological point of view: Violations of rational rules as a diag-
nostic of mental processes. Behavioral and Brain Sciences, 23, 681-683.
Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1990). Experimental tests of the endow-
ment effect and the Coase theorem. Journal of Political Economy, 98, 1325-1348.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive per-
spective on risk taking. Management Science, 39, 17-30.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty:
Heuristics and biases. Cambridge, England: Cambridge University Press.
Kahneman, D., & Tversky, A. (1972a). On the psychology of prediction. Oregon Research
Institute Bulletin, 12 (Whole No. 4).
Kahneman, D., & Tversky, A. (1972b). Subjective probability: A judgment of representa-
tiveness. Cognitive Psychology, 3, 430-454.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Re-
view, 80, 237-251.
Kahneman, D., & Tversky, A. (1979a). On the interpretation of intuitive probability: A re-
ply to Jonathan Cohen. Cognition, 7, 409-411.
Kahneman, D., & Tversky, A. (1979b). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263-291.
Kahneman, D., & Tversky, A. (1982a). On the study of statistical intuitions. Cognition, 11,
123-141.
Kahneman, D., & Tversky, A. (1982b). Variants of uncertainty. Cognition, 11, 143-157.
Kahneman, D., & Tversky, A. (1984). Choices, values, and frames. American Psycholo-
gist, 39, 341-350.
Kalbfleisch, J. G., & Sprott, D. A. (1976). On tests of significance. In W. L. Harper & C. A.
Hooker (Eds.), Foundations of probability theory, statistical inference, and statistical
theories of science (Vol. 2, pp. 259-272). Dordrecht, Netherlands: Reidel.
Kamlah, A. (1987). The decline of the Laplacian theory of probability: A study of Stumpf,
von Kries, and Meinong. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The proba-
bilistic revolution: Vol. 1. Ideas in history (pp. 91-116). Cambridge, MA: MIT Press.
Kaplan, M. F., & Schwartz, S. (Eds.). (1977). Human judgment and decision processes in
applied settings. New York: Academic Press.
Kaplan, R. J., & Newman, J. R. (1966). Studies in probabilistic information processing.
IEEE Transactions on Human Factors in Electronics, HFE-7, 49-63.
Kareev, Y. (1995a). Positive bias in the perception of covariation. Psychological Review,
102, 490-502.
Kareev, Y. (1995b). Through a narrow window: Working memory capacity and the detec-
tion of covariation. Cognition, 56, 263-269.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives. New York: Wiley.
Kelley, C. M., & Lindsay, D. S. (1993). Remembering mistaken for knowing: Ease of re-
trieval as a basis for confidence in answers to general knowledge questions. Journal of
Memory and Language, 32, 1-24.
Kellogg, R. T. (1981). Feature frequency in concept learning: What is counted? Memory
and Cognition, 9, 157-163.
Kempton, W. (1991). Lay perspectives on global climate change. Global Environmental
Change, 1, 183-208.
Keren, G. B. (1987). Facing uncertainty in the game of bridge: A calibration study. Organi-
zational Behavior and Human Decision Processes, 39, 98-114.
Keren, G. B. (1991). Calibration and probability judgments: Conceptual and methodologi-
cal issues. Acta Psychologica, 77, 217-273.
Keren, G. B., & Wagenaar, W. A. (1985). On the psychology of playing blackjack: Norma-
tive and descriptive considerations with implications for decision theory. Journal of Ex-
perimental Psychology: General, 114, 133-158.
Kerr, R. A. (2000). Draft report affirms human influence. Science, 288, 589-590.
Kessler, D. A., & Feiden, K. L. (1995). Faster evaluation of vital drugs. Scientific Ameri-
can, 272(3), 26-32.
Keynes, J. M. (1921). Treatise on probability. London: Macmillan.
Keynes, J. M. (1956). The application of probability to conduct. In J. R. Newman (Ed.), The
world of mathematics (Vol. 2, pp. 1360-1373). New York: Simon & Schuster. (Original
work published 1921)
Kidd, J. B. (1970). The utilization of subjective probabilities in production planning. Acta
Psychologica, 34, 338-347.
King, J. P. (1992). The art of mathematics. New York: Fawcett Columbine.
Kintsch, W. (1970). Learning, memory, and conceptual processes. New York: Wiley.
Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and
Psychological Measurement, 56, 746-759.
Kirkpatrick, L. A., & Epstein, S. (1992). Cognitive-experiential self-theory and subjective
probability: Further evidence for two conceptual systems. Journal of Personality and
Social Psychology, 63, 534-544.
Klahr, D. (1976). The social psychologist as troll. In J. S. Carroll & J. W. Payne (Eds.), Cog-
nition and social behavior (pp. 243-250). Hillsdale, NJ: Lawrence Erlbaum Associates.
Klatzky, R. L., & Messick, D. M. (1995). Curtailing medical inspections in the face of nega-
tive consequences. Journal of Experimental Psychology: Applied, 1, 163-178.
Knight, F. H. (1921). Risk, uncertainty and profit. Boston: Houghton Mifflin.
Knobloch, E. (1987). Émile Borel as a probabilist. In L. Krüger, L. J. Daston, & M.
Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp.
215-233). Cambridge, MA: MIT Press.
Koestler, A. (1972). The roots of coincidence: An excursion into parapsychology. New
York: Vintage Books.
Kogan, N., & Wallach, M. A. (1967). Risk taking as a function of the situation, the person,
and the group. In New directions in psychology (Vol. 3, pp. 111-278). New York: Holt,
Rinehart & Winston.
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1, 1-7.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Ex-
perimental Psychology: Human Learning and Memory, 6, 107-118.
Kramer, G. P., Kerr, N. L., & Carroll, J. S. (1990). Pretrial publicity, judicial remedies, and
jury bias. Law and Human Behavior, 14, 409-438.
Krantz, D. H. (1981). Improvements in human reasoning and an error in L. J. Cohen's.
Commentary on Cohen, 1981. The Behavioral and Brain Sciences, 4, 340-341.
Krantz, D. H. (1991). From indices to mappings: The representational approach to mea-
surement. In D. Brown & J. Smith (Eds.), Frontiers of mathematical psychology (pp.
1-52). New York: Springer-Verlag.
Kreps, D. M. (1990). A course in microeconomic theory. Princeton, NJ: Princeton Univer-
sity Press.
Krischer, J. P. (1980). An annotated bibliography of decision analytic applications to health
care. Operations Research, 28, 97-113.
Krüger, L. (1987a). The probabilistic revolution in physics: An overview. In L. Krüger, G.
Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution: Vol. 2. Ideas in the
sciences (pp. 373-378). Cambridge, MA: MIT Press.
Krüger, L. (1987b). The slow rise of probabilism: Philosophical arguments in the nine-
teenth century. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic
revolution: Vol. 1. Ideas in history (pp. 60-89). Cambridge, MA: MIT Press.
Krüger, L., Daston, L. J., & Heidelberger, M. (Eds.). (1987). The probabilistic revolution:
Vol. 1. Ideas in history. Cambridge, MA: MIT Press.
Krüger, L., Gigerenzer, G., & Morgan, M. S. (Eds.). (1987). The probabilistic revolution:
Vol. 2. Ideas in the sciences. Cambridge, MA: MIT Press.
Kubovy, M., & Gilden, D. L. (1990). Apparent randomness is not always the complement
of apparent order. In G. Lockhead & J. R. Pomerantz (Eds.), The perception of struc-
ture: Essays in honor of Wendell R. Garner (pp. 115-127). Washington, DC: American
Psychological Association.
Kühberger, A. (1995). The framing of decisions: A new look at old problems. Organiza-
tional Behavior and Human Decision Processes, 62, 230-240.
Kuhn, D., Phelps, E., & Walters, J. (1985). Correlational reasoning in an everyday context.
Journal of Applied Developmental Psychology, 6, 85-97.
Kuhn, D., Weinstock, M., & Flaton, R. (1994). How well do jurors reason? Competence dimen-
sions of individual variations in a juror reasoning task. Psychological Science, 5, 289-296.
Kunda, Z. (1987). Motivation and inference: Self-serving generation and evaluation of evi-
dence. Journal of Personality and Social Psychology, 53, 636-647.
Kunda, Z., & Nisbett, R. E. (1986a). Prediction and the partial understanding of the law of
large numbers. Journal of Experimental Social Psychology, 22, 339-354.
Kunda, Z., & Nisbett, R. E. (1986b). The psychometrics of everyday life. Cognitive Psy-
chology, 18, 199-224.
Landman, J., & Manis, M. (1983). Social cognition: Some historical and theoretical per-
spectives. Advances in Experimental Social Psychology, 16, 49-123.
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology,
32, 311-328.
Langer, E. J., & Roth, J. (1975). Heads I win, tails is chance: The illusion of control is a
function of the sequence of outcomes in a purely chance task. Journal of Personality
and Social Psychology, 32, 951-955.
Laplace, P. S. (1774). Mémoire sur la probabilité des causes par les événements [Memoir
on the probability of causes of events]. Mémoires présentés à l'Académie des Sciences,
6, 621-656.
Laplace, P. S. (1812). Théorie analytique des probabilités [Analytical theory of probabili-
ties]. (Reprinted in Laplace, P. S. [1878-1912]. Oeuvres complètes de Laplace. Paris:
Gauthier-Villars.)
Laplace, P. S. (1951). A philosophical essay on probabilities. (F. W. Truscott & F. L. Emory,
Trans.). New York: Dover. (Original work published 1814)
Larkey, P. D., Smith, R. A., & Kadane, J. B. (1989). It's okay to believe in the "hot-hand."
Chance: New Directions for Statistics and Computing, 2(4), 22-30.
Latane", B., Williams, K., & Harkins, S. (1979). Many hands make light the work: The
causes and consequences of social loafing. Journal of Personality and Social Psychol-
ogy, 37, 822-832.
Lathrop, R. G. (1967). Perceived variability. Journal of Experimental Psychology, 73,
498-502.
Lattimer, J. (1966). Similarities in fatal woundings of John Wilkes Booth and Lee Harvey
Oswald. New York: s.n.
Lattimer, J. (1980). Kennedy and Lincoln: Medical and ballistic comparisons of their as-
sassinations. New York: Harcourt Brace Jovanovich.
Layzer, D. (1990). Cosmogenesis: The growth of order in the universe. New York: Oxford
University Press.
Lécuyer, B.-P. (1987). Probability in vital and social statistics: Quetelet, Farr, and the
Bertillons. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The probabilistic revo-
lution: Vol. 1. Ideas in history (pp. 317-335). Cambridge, MA: MIT Press.
Leddo, J., Abelson, R. P., & Gross, P. H. (1984). Conjunctive explanations: When two rea-
sons are better than one. Journal of Personality and Social Psychology, 47, 933-943.
Lederman, L. (1993). The God particle: If the universe is the answer, what is the question?
New York: Dell.
Leli, D. A., & Filskov, S. B. (1984). Clinical detection of intellectual deterioration associ-
ated with brain damage. Journal of Clinical Psychology, 40, 1435-1441.
Levi, I. (1981). Should Bayesians sometimes neglect base rates? Behavioral and Brain Sci-
ences, 4, 342-343.
Levi, I. (1983). Who commits the base rate fallacy? Behavioral and Brain Sciences, 6, 502-506.
Levin, B. (1981). A representation for multinomial cumulative distribution functions. The
Annals of Statistics, 9, 1123-1126.
Levin, I. P. (1974). Averaging processes and intuitive statistical judgments. Organizational
Behavior and Human Performance, 12, 83-91.
Levin, I. P. (1975). Information integration in numerical judgments and decision processes.
Journal of Experimental Psychology: General, 104, 39-53.
Levin, I. P., Chapman, D. P., & Johnson, R. D. (1988). Confidence in judgments based on
incomplete information: An investigation using both hypothetical and real gambles.
Journal of Behavioral Decision Making, 1, 29-41.
Levin, I. P., Johnson, R. D., Deldin, P. J., Carsten, L. M., Cressey, L. J., & Davis, C. D.
(1986). Framing effects in decisions with completely and incompletely described alter-
natives. Organizational Behavior and Human Decision Processes, 38, 48-64.
Levin, I. P., Johnson, R. D., Russo, C. P., & Deldin, P. J. (1985). Framing effects in judg-
ment tasks with varying amounts of information. Organizational Behavior and Human
Decision Processes, 36, 362-377.
Lichtenstein, S., & Fischhoff, B. (1977). Do those who know more also know more about
how much they know? Organizational Behavior and Human Performance, 20,
159-183.
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior
and Human Performance, 26, 149-171.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1977). Calibration of probabilities: The
state of the art. In H. Jungermann & G. de Zeeuw (Eds.), Decision making and change in
human affairs (pp. 275-324). Dordrecht, Netherlands: Reidel.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The
state of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under
uncertainty: Heuristics and biases (pp. 306-334). Cambridge, England: Cambridge
University Press.
Lichtenstein, S., & Slovic, P. (1971). Reversals of preference between bids and choices in
gambling decisions. Journal of Experimental Psychology, 89, 46-55.
Lichtenstein, S., & Slovic, P. (1973). Response-induced reversals of preference in gam-
bling: An extended replication in Las Vegas. Journal of Experimental Psychology, 101,
16-20.
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Coombs, B. (1978). Judged fre-
quency of lethal events. Journal of Experimental Psychology: Human Learning and
Memory, 4, 551-578.
Liebrand, W. B. G., Messick, D. M., & Wolters, F. J. M. (1986). Why we are fairer than oth-
ers: A cross-cultural replication and extension. Journal of Experimental Social Psy-
chology, 22, 590-604.
Lindley, D. V. (1965). Introduction to probability and statistics from a Bayesian viewpoint.
Cambridge, England: Cambridge University Press.
Lindley, D. V. (1971). Making decisions. London: Wiley.
Lindley, D. V. (1972). Comment. Journal of the American Statistical Association, 67,
373-374.
Lindley, D. V. (1984). A Bayesian lady tasting tea. In H. A. David & H. T. David (Eds.), Sta-
tistics: An appraisal (pp. 455-485). Ames: Iowa State University Press.
Lindley, D. (1993). The end of physics: The myth of a unified theory. New York: Basic
Books.
Lindley, D. V., & Novick, M. R. (1981). The role of exchangeability in inference. The An-
nals of Statistics, 9, 45-58.
Linville, P. W., Fischer, G. W., & Fischhoff, B. (1993). AIDS risk perceptions and decision
biases. In J. B. Pryor & G. D. Reeder (Eds.), The social psychology of HIV infection (pp.
5-38). Hillsdale, NJ: Lawrence Erlbaum Associates.
Llewellyn-Thomas, H., Sutherland, H. J., Tibshirani, R., Ciampi, A., Till, J. E., & Boyd, N.
F. (1988). The measurement of patients' values in medicine. In J. Dowie & A. Elstein
(Eds.), Professional judgment: A reader in clinical decision making (pp. 395-408).
Cambridge, England: Cambridge University Press. (Original work published 1982)
Lloyd, A. L. (1995). Computing bouts of the prisoner's dilemma. Scientific American,
272(6), 110-115.
Loftus, E. F. (1979). Eyewitness testimony. Cambridge, MA: Harvard University Press.
Loftus, E. F., & Wagenaar, W. A. (1988). Lawyers' predictions of success. Jurimetrics
Journal, 29, 437-453.
Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contempo-
rary Psychology, 36, 102-105.
Loftus, G. R. (1995). Data analysis as insight. Behavior Research Methods, Instruments,
and Computers, 27, 57-59.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we
analyze data. Current Directions in Psychological Science, 5, 161-171.
Loomes, G., & Sugden, R. (1983). A rationale for preference reversal. American Economic
Review, 73, 428-432.
Lopes, L. L. (1976). Model-based decision and inference in stud poker. Journal of Experi-
mental Psychology: General, 105, 217-239.
Lopes, L. L. (1982). Doing the impossible: A note on induction and the experience of ran-
domness. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8,
626-636.
Lopes, L. L., & Ekberg, P-H. S. (1980). Test of an ordering hypothesis in risky deci-
sion-making. Acta Psychologica, 45, 161-167.
Lopes, L. L., & Oden, G. C. (1987). Distinguishing between random and nonrandom events.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 392-400.
Lotka, A. (1926). The frequency distribution of scientific productivity. Journal of the
Washington Academy of Sciences, 16, 317-323.
Lowrance, W. (1976). Of acceptable risk. San Francisco: Freeman.
Luce, R. D. (1956). Semi-orders and a theory of utility discrimination. Econometrica, 24,
178-191.
Luce, R. D. (2000). Fast, frugal, and surprisingly accurate heuristics. Behavioral and Brain
Sciences, 23, 757-758.
Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical survey.
New York: Wiley.
Lund, A. M., Hall, J. W., Wilson, K. P., & Humphreys, M. S. (1983). Frequency judgment ac-
curacy as a function of age and school achievement (learning disabled versus non-learn-
ing-disabled) patterns. Journal of Experimental Child Psychology, 35, 236-247.
Lund, F. H. (1925). Psychology of belief. Journal of Abnormal Social Psychology, 20,
180-190.
Lunt, P. K., & Livingstone, S. M. (1989). Psychology and statistics: Testing the opposite of
the idea you first thought of. The Psychologist, 2, 528-531.
Lusted, L. B. (1977). A study of the efficacy of diagnostic radiologic procedures: Final re-
port on diagnostic efficacy. Chicago: Efficacy Study Committee of the American Col-
lege of Radiology.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological
Bulletin, 70, 151-159.
Lyon, D., & Slovic, P. (1975). On the tendency to ignore base rates when estimating proba-
bilities. ORI Research Bulletin, 15, 1.
Lyon, D., & Slovic, P. (1976). Dominance of accuracy information and neglect of base rates
in probability estimation. Acta Psychologica, 40, 287-298.
MacCrimmon, K. R. (1968). Decision making among multiple-attribute alternatives: A
survey and consolidated approach (RAND Corporation Memo RM 4823-ARPA). Santa
Monica, CA: RAND Corporation.
McDonald, A. B., Klein, J. R., & Wark, D. L. (2003). Solving the solar neutrino problem.
Scientific American, 288(4), 40-49.
Macdonald, R. R. (1986). Credible conceptions and implausible probabilities. British
Journal of Mathematical and Statistical Psychology, 39, 15-27.
Macdonald, R. R. (1997a). Base rates and randomness. Behavioral and Brain Sciences, 20,
778.
Macdonald, R. R. (1997b). On statistical testing in psychology. British Journal of Psychol-
ogy, 88, 333-347.
Macdonald, R. R., & Gilhooly, K. J. (1990). More about Linda or conjunction in context.
European Journal of Cognitive Psychology, 2, 57-70.
MacGregor, D., Fischhoff, B., & Blackshaw, L. (1987). Search success and expectations
with a computer interface. Information Processing and Management, 23, 419-432.
Mackie, J. L. (1981). Propensity, evidence, and diagnosis. Behavioral and Brain Sciences,
4, 345-346.
Mackworth, N. H. (1965). Originality. The American Psychologist, 20, 51-66.
Malakoff, D. (1999). Bayes offers a "new" way to make sense of numbers. Science, 286,
1460-1464.
Malgady, R. G. (1998). In praise of value judgments in null hypothesis testing ... and of
"accepting" the null hypothesis. American Psychologist, 53, 797-798.
Malkiel, B. G. (1985). A random walk down Wall Street (4th ed.). New York: Norton.
Manis, M., Dovalina, I., Avis, N. E., & Cardoze, S. (1980). Base rates can affect individual
predictions. Journal of Personality and Social Psychology, 38, 231-240.
Marcus, R. (1980). Moral dilemmas and consistency. Journal of Philosophy, 77(3),
121-136.
Margalit, A., & Bar-Hillel, M. (1981). The irrational, the unreasonable, and the wrong.
Commentary on Cohen, 1981. The Behavioral and Brain Sciences, 4, 346-349.
Margolis, H. (1987). Patterns, thinking, and cognition. Chicago: University of Chicago
Press.
Margolis, H. (2000). Simple heuristics that make us dumb. Behavioral and Brain Sciences,
23, 758.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151-158.
Marks, R. (1951). The effect of probability, desirability, and "privilege" on the stated ex-
pectations of change. Journal of Personality, 19, 332-351.
Martignon, L., & Hoffrage, U. (1999). Why does one-reason decision making work? In G.
Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp. 119-140).
New York: Oxford University Press.
Martignon, L., & Laskey, K. B. (1999). Bayesian benchmarks for fast and frugal heuristics.
In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart (pp.
169-188). New York: Oxford University Press.
Martin, E. (1981). Simpson's paradox resolved: A reply to Hintzman. Psychological Re-
view, 88, 372-374.
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9,
602-619.
Mathematical Sciences Education Board. (1988). A framework for the revision of K-12
mathematical curricula (Final report to the Mathematical Sciences Education Board
from its Curriculum Frameworks Task Force). Washington, DC: National Academy
Press.
May, R. S. (1986). Inferences, subjective probability and frequency of correct answers: A
cognitive approach to the overconfidence phenomenon. In B. Brehmer, H. Jungermann,
P. Lourens, & G. Sevon (Eds.), New directions in research on decision making (pp.
175-189). Amsterdam: Elsevier Science.
Mazzoni, G., & Cornoldi, C. (1993). Strategies in study time allocation: Why is study time
sometimes not effective? Journal of Experimental Psychology: General, 122, 47-60.
McCain, R. A. (2000). Differences, games and pluralism. Behavioral and Brain Sciences,
23, 688-689.
McClintock, C. G., & McNeel, S. P. (1966). Reward level and game-playing behavior.
Journal of Conflict Resolution, 10, 98-102.
McCormick, N. J. (1981). Reliability and risk analysis. New York: Academic Press.
McKenna, F. P. (1993). It won't happen to me: Unrealistic optimism or illusion of control?
British Journal of Psychology, 84, 39-50.
McKenna, F. P., Stanier, R. A., & Lewis, C. (1991). Factors underlying illusory self-assess-
ment of driving skill in males and females. Accident Analysis and Prevention, 23,
45-52.
McKenzie, C. R. (1994). The accuracy of intuitive judgment strategies: Covariation assess-
ment and Bayesian inference. Cognitive Psychology, 26, 209-239.
McLeish, J. (1994). The story of numbers: How mathematics has shaped civilization. New
York: Fawcett Columbine. (Original work published 1991 under the title Number)
McNeil, B., Pauker, S., Sox, H., Jr., & Tversky, A. (1982). Comment on the elicitation of
preferences for alternative therapies. New England Journal of Medicine, 306,
1259-1262.
McPhee, J. (1992). The deltoid pumpkin seed. New York: Noonday Press. (Original work
published 1973)
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a re-
view of the evidence. Minneapolis: University of Minnesota Press.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox.
Philosophy of Science, 34, 103-115.
Meehl, P. E. (1986). Superiority of actuarial to clinical prediction. Journal of Personality
Assessment, 50, 370-375.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense
and two principles that warrant it. Psychological Inquiry, 1, 108-141.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests
by confidence intervals and quantify accuracy of risky numerical predictions. In L. L.
Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests?
(pp. 391-423). Mahwah, NJ: Lawrence Erlbaum Associates.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of
psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.
Mehle, T. (1982). Hypothesis generation in an automobile malfunction inference task. Acta
Psychologica, 52, 87-106.
Mehle, T., Gettys, C. F., Manning, C., Baca, S., & Fisher, S. (1981). The availability expla-
nation of excessive plausibility assessments. Acta Psychologica, 49, 127-140.
Ménard, C. (1987). Why was there no probabilistic revolution in economic thought? In L.
Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution: Vol. 2.
Ideas in the sciences (pp. 139-146). Cambridge, MA: MIT Press.
Mendeleev, D. (1956). Periodic law of the chemical elements. In J. R. Newman (Ed.), The
world of mathematics (Vol. 1, pp. 913-918). New York: Simon & Schuster. (Original
work published circa 1869)
Merton, R. K. (1961). Singletons and multiples in scientific discovery: A chapter in the so-
ciology of science. Proceedings of the American Philosophical Society, 105, 470-486.
Messick, D. M. (1973). To join or not to join: An approach to the unionization decision. Or-
ganizational Behavior and Human Performance, 10, 145-156.
Messick, D. M., Bloom, S., Boldizar, J. P., & Samuelson, C. D. (1985). Why we are fairer
than others. Journal of Experimental Social Psychology, 21, 480-500.
Messick, D. M., & Campos, F. T. (1972). Training and conservatism in subjective probabil-
ity. Journal of Experimental Psychology, 94, 335-337.
Messick, D. M., & van de Geer, J. P. (1981). A reversal paradox. Psychological Bulletin, 90,
582-593.
Metalsky, G. I., & Abramson, L. Y. (1981). Attributional styles: Toward a framework for
conceptualization and assessment. In P. C. Kendall & S. D. Hollon (Eds.), Assessment
strategies for cognitive-behavioral interventions (pp. 15-58). New York: Academic
Press.
Metz, K. H. (1987). Paupers and numbers: The statistical argument for social reform in
Britain during the period of industrialization. In L. Krüger, L. J. Daston, & M.
Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp.
337-350). Cambridge, MA: MIT Press.
Meux, E. P. (1973). Concern for the common good in an N-person game. Journal of Person-
ality and Social Psychology, 28, 414-418.
Midden, C. J. H., & Verplanken, B. (1990). The stability of nuclear attitudes after
Chernobyl. Journal of Environmental Psychology, 10, 111-119.
Miller, D. W., & Starr, M. K. (1967). The structure of human decisions. Englewood Cliffs,
NJ: Prentice-Hall.
Mintzberg, H. (1975, July/August). The manager's job: Folklore and fact. Harvard Busi-
ness Review, pp. 49-61.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mischel, W., & Peake, P. K. (1982). Beyond déjà vu in the search for cross-situational con-
sistency. Psychological Review, 89, 730-755.
Monod, J. (1972). Chance and necessity. London: Collins.
Montmort, P. R. (1713). Essai d'analyse sur les jeux de hasard [Essay on analysis of games
of chance]. Paris.
Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.), On the shoulders of giants: New ap-
proaches to numeracy (pp. 95-137). Washington, DC: National Academy Press.
Moore, D. W. (1992). The super pollsters. New York: Four Walls Eight Windows.
Morgan, J. P., Chaganty, N. R., Dahiya, R. C., & Doviak, M. J. (1991a). Let's make a deal:
The player's dilemma. The American Statistician, 45, 284-287.
Morgan, J. P., Chaganty, N. R., Dahiya, R. C., & Doviak, M. J. (1991b). Rejoinder. The
American Statistician, 45, 289.
Morgan, M. G., & Henrion, M. (1990). Uncertainty: A guide to dealing with uncertainty in
quantitative risk and policy analysis. New York: Cambridge University Press.
Morgan, M. S. (1987). Statistics without probability and Haavelmo's revolution in econo-
metrics. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolu-
tion: Vol. 2. Ideas in the sciences (pp. 171-197). Cambridge, MA: MIT Press.
Morier, D. M., & Borgida, E. (1984). The conjunction fallacy: A task specific phenomenon?
Personality and Social Psychology Bulletin, 10, 243-252.
Morris, R. (1987). The nature of reality. New York: Noonday Press.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy. Chi-
cago: Aldine.
Moser, P. K. (1990). Rationality in action: General introduction. In P. K. Moser (Ed.), Ra-
tionality in action: Contemporary approaches (pp. 1-10). New York: Cambridge Uni-
versity Press.
Mosteller, F. (1965). Fifty challenging problems in probability with solutions. Reading,
MA: Addison-Wesley.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and place for signifi-
cance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no
significance tests? (pp. 65-116). Mahwah, NJ: Lawrence Erlbaum Associates.
Murphy, A. H. (1973). A new vector partition of the probability score. Journal of Applied
Meteorology, 12, 595-600.
Murphy, A. H. (1974). A sample skill score for probability forecasts. Monthly Weather Re-
view, 102, 48-55.
Murphy, A. H., & Katz, R. W. (Eds.). (1983). Probability, statistics, and decision making in
the atmospheric sciences. Boulder, CO: Westview.
Murphy, A. H., & Winkler, R. L. (1971). Forecasters and probability forecasts: Some cur-
rent problems. Bulletin of the American Meteorological Society, 52, 239-247.
Murphy, A. H., & Winkler, R. L. (1974). Subjective probability forecasting experiments in
meteorology: Some preliminary results. Bulletin of the American Meteorological Soci-
ety, 55, 1206-1216.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable proba-
bility forecasts of precipitation and temperature? National Weather Digest, 2, 2-9.
Murray, D. J. (1987). A perspective for viewing the integration of probability theory into
psychology. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic rev-
olution: Vol. 2. Ideas in the sciences (pp. 73-100). Cambridge, MA: MIT Press.
Myers, D. G. (2002). Intuition: The powers and perils of our inner knowing. New Haven,
CT: Yale University Press.
Myers, D. G., & Lamm, H. (1976). The group polarization phenomenon. Psychological
Bulletin, 83, 602-627.
Myers, J. L. (1976). Probability learning and sequence learning. In W. K. Estes (Ed.),
Handbook of learning and cognitive processes: Vol. 3. Approaches to human learning
and motivation (pp. 171-206). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nagel, E. (1956). The meaning of probability. In J. R. Newman (Ed.), The world of mathe-
matics (Vol. 2, pp. 1398-1414). New York: Simon & Schuster. (Original work published
1936)
National Academy of Sciences. (1983). Changing climate: Report of the carbon dioxide
assessment committee. Washington, DC: National Academy Press.
National Aeronautics and Space Administration. (1985). Space shuttle data for planetary
mission RTG safety analysis. Huntsville, AL: Author.
National Council of Teachers of Mathematics. (1980). Agenda for action: Recommenda-
tions for school mathematics of the 1980s. Reston, VA: National Council of Teachers of
Mathematics.
National Research Council. (1989). Improving risk communication. Committee on Risk
Perception and Communication, National Research Council. Washington, DC: Na-
tional Academy Press.
Navon, D. (1978). The importance of being conservative: Some reflections on human
Bayesian behavior. British Journal of Mathematical and Statistical Psychology, 31,
33-48.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new find-
ings. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp.
125-173). New York: Academic Press.
Newby-Clark, I. R., Ross, M., Buehler, R., Koehler, D. J., & Griffin, D. (2000). People focus
on optimistic scenarios and disregard pessimistic scenarios while predicting task com-
pletion times. Journal of Experimental Psychology: Applied, 6, 171-182.
Neyman, J., & Pearson, E. S. (1928a). On the use and interpretation of certain test criteria
for purposes of statistical inference. Part I. Biometrika, 20A, 175-240.
Neyman, J., & Pearson, E. S. (1928b). On the use and interpretation of certain test criteria
for purposes of statistical inference. Part II. Biometrika, 20A, 263-294.
Neyman, J., & Pearson, E. S. (1933). The testing of statistical hypotheses in relation to
probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29, 492-510.
Nickerson, R. S. (1967). Expectancy, waiting time and the psychological refractory period. In
A. F. Sanders (Ed.), Attention and performance (pp. 23-34). Amsterdam: North-Holland.
Nickerson, R. S. (1994). Electronic bulletin boards: A case study of computer-mediated
communication. Interacting with Computers, 6, 117-134.
Nickerson, R. S. (1996). Ambiguities and unstated assumptions in probabilistic reasoning.
Psychological Bulletin, 120, 410-433.
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Re-
view of General Psychology, 2, 175-220.
Nickerson, R. S. (1999). How we know—and sometimes misjudge—what others know:
Imputing one's own knowledge to others. Psychological Bulletin, 125, 737-759.
Nickerson, R. S. (2000). Null hypothesis statistical testing: A review of an old and continu-
ing controversy. Psychological Methods, 5, 241-301.
Nickerson, R. S. (2001). The projective way of knowing: A useful heuristic that sometimes
misleads. Current Directions in Psychological Science, 10, 168-172.
Nickerson, R. S. (2002). The production and perception of randomness. Psychological Re-
view, 109, 330-357.
Nickerson, R. S., & Burnham, D. W. (1969). Response times with nonaging foreperiods.
Journal of Experimental Psychology, 79, 452-457.
Nickerson, R. S., & McGoldrick, C. C. (1963). Confidence, correctness, and difficulty with
non-psychophysical comparative judgments. Perceptual and Motor Skills, 17, 159-167.
Nickerson, R. S., & McGoldrick, C. C. (1965). Confidence ratings and level of perfor-
mance on a judgmental task. Perceptual and Motor Skills, 20, 311-316.
Niiniluoto, I. (1981). L. J. Cohen versus Bayesianism. Behavioral and Brain Sciences, 4,
349.
Nisbett, R. E., & Borgida, E. (1975). Attribution and the psychology of prediction. Journal
of Personality and Social Psychology, 32, 932-943.
Nisbett, R. E., Borgida, E., Crandall, R., & Reed, H. (1976). Popular induction: Informa-
tion is not necessarily informative. In J. S. Carroll & J. W. Payne (Eds.), Cognition and
social behavior (pp. 113-134). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nisbett, R. E., Fong, G. T., Lehman, D. R., & Cheng, P. W. (1987). Teaching reasoning. Sci-
ence, 238, 625-631.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Fong, G. T. (1982). Improving inductive infer-
ence. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty:
Heuristics and biases (pp. 445-462). New York: Cambridge University Press.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical
heuristics in everyday inductive reasoning. Psychological Review, 90, 339-363.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social
judgment. Englewood Cliffs, NJ: Prentice-Hall.
Nisbett, R. E., & Schachter, S. (1966). Cognitive manipulation of pain. Journal of Experi-
mental Social Psychology, 2, 227-236.
Nogar, R. J. (1966). The wisdom of evolution. New York: New American Library.
North, D. W. (1968). A tutorial introduction to decision theory. IEEE Transactions on Sys-
tems Science and Cybernetics, SSC-4, 200-210.
Novick, M. R., & Jackson, P. H. (1974). Statistical methods for educational and psycho-
logical research. New York: McGraw-Hill.
Nowak, M. A., May, R. M., & Sigmund, K. (1995). The arithmetics of mutual help. Scien-
tific American, 272(6), 76-79.
Nowak, M. A., Page, K. M., & Sigmund, K. (2000). Fairness versus reason in the ultimatum
game. Science, 289, 1773-1775.
Nowak, M. A., & Sigmund, K. (1992). Tit-for-tat in heterogeneous populations. Nature,
355, 250-253.
Nozick, R. (1993). The nature of rationality. Princeton, NJ: Princeton University Press.
Nuclear Regulatory Commission. (1978). Risk assessment review group report to the U.S. Nu-
clear Regulatory Commission (Rep. No. NUREG/CR-0400). Washington, DC: Author.
Nuclear Regulatory Commission (1986). Safety goals for the operation of nuclear power
plants: Policy statement. Federal Register, 51 (August 4), 28044-28049.
Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sci-
ences. Chichester, England: Wiley.
Oberschall, A. (1987). The two empirical roots of social theory and the probability revolu-
tion. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution:
Vol. 2. Ideas in the sciences (pp. 103-131). Cambridge, MA: MIT Press.
O'Connor, M. (1989). Models of human behavior and confidence in judgment: A review.
International Journal of Forecasting, 5, 159-169.
Office of Technology Assessment, U.S. Congress. (1987). New developments in biotechnology:
Public perceptions of biotechnology. Washington, DC: U.S. Government Printing Office.
Ogilvy, C. S., & Anderson, J. T. (1966). Excursions in number theory. New York: Oxford
University Press.
O'Gorman, H. J., & Garry, S. L. (1976). Pluralistic ignorance—a replication and extension.
Public Opinion Quarterly, 40, 449-458.
Okasha, S. (2000). Bayes, Levi, and the taxicabs. Behavioral and Brain Sciences, 23, 693.
Olson, C. L. (1976). Some apparent violations of the representativeness heuristic in human
judgment. Journal of Experimental Psychology: Human Perception and Performance,
2, 599-608.
Olson, M. (1965). The logic of collective action: Public goods and the theory of groups.
Cambridge, MA: Harvard University Press.
Ophuls, W. (1973). Leviathan or oblivion? In H. E. Daly (Ed.), Toward a steady state econ-
omy (pp. 215-230). San Francisco: Freeman.
Osberg, T. M., & Shrauger, J. S. (1986). Self-prediction: Exploring the parameters of accu-
racy. Journal of Personality and Social Psychology, 51, 1044-1057.
Osherson, D. N. (1995). Probability judgment. In E. E. Smith & D. N. Osherson (Eds.),
Thinking: An invitation to cognitive science (2nd ed., Vol. 3, pp. 35-75). Cambridge,
MA: MIT Press.
Oskamp, S. (1962). The relationship of clinical experience and training methods to several
criteria of clinical prediction. Psychological Monographs, 76.
Oskamp, S. (1965). Overconfidence in case study judgments. Journal of Consulting Psy-
chology, 29, 261-265.
Ostrom, E., Walker, J., & Gardner, R. (1992). Covenants with and without a sword:
Self-governance is possible. American Political Science Review, 86, 404-417.
Owens, D. (1992). Causes and coincidences. Cambridge, England: Cambridge University
Press.
Paese, P. W., & Feuer, M. A. (1991). Decisions, actions, and the appropriateness of confi-
dence in knowledge. Journal of Behavioral Decision Making, 4, 1-16.
Paese, P. W., & Sniezek, J. A. (1991). Influences on the appropriateness of confidence in
judgment: Practice, effort, information, and decision-making. Organizational Behav-
ior and Human Decision Processes, 48, 100-130.
Pagels, H. R. (1991). Perfect symmetry: The search for the beginning of time. New York:
Bantam Books.
Pais, A. (1986). Inward bound: Of matter and forces in the physical world. New York: Ox-
ford University Press.
Palmer, T. C., Jr. (2000, February 2). Big Dig costs take a jump of $1.4b. Boston Globe (pp.
A1, B8).
Parducci, A. (1974). Contextual effects: A range-frequency analysis. In E. C. Carterette &
M. P. Friedman (Eds.), Handbook of perception (Vol. 2, pp. 127-142). New York: Aca-
demic Press.
Pareto, V. (1897). Cours d'économie politique [Course in political economy] (Vol. 2).
Lausanne, Switzerland: Rouge.
Paulos, J. A. (1990). Innumeracy: Mathematical illiteracy and its consequences. New
York: Vintage Books.
Paulos, J. A. (1992). Beyond numeracy. New York: Vintage Books.
Paulos, J. A. (1998). Once upon a number: The hidden mathematical logic of stories. New
York: Basic Books.
Payne, S. L. (1952). The art of asking questions. Princeton, NJ: Princeton University Press.
Pearson, K., & Weldon, W. F. R. (1901). Editorial. Biometrika, 1, 3.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An inte-
grated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Peirce, C. S. (1932). Collected papers (C. Hartshorne & P. Weiss, Eds.). Cambridge, MA:
Belknap Press.
Peirce, C. S. (1956). The red and the black. In J. R. Newman (Ed.), The world of mathemat-
ics (Vol. 2, pp. 1334-1340). New York: Simon & Schuster.
Penney, W. (1969). A waiting time problem. Journal of Recreational Mathematics, 2, 241
(Problems 95 and 96).
Pennington, N., & Hastie, R. (1988). Explanation-based decision making: Effects of mem-
ory structure on judgment. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 14, 521-533.
Pennington, N., & Hastie, R. (1993). The story model for juror decision making. In R.
Hastie (Ed.), Inside the juror: The psychology of juror decision making (pp. 192-221).
New York: Cambridge University Press.
Perloff, L. S., & Fetzer, B. K. (1986). Self-other judgments and perceived vulnerability to
victimization. Journal of Personality and Social Psychology, 50, 502-510.
Peters, T. (1979, November/December). Leadership: Sad facts and silver linings. Harvard
Business Review, pp. 164-172.
Peterson, C. R. (1980). Recognition of noncontingency. Journal of Personality and Social
Psychology, 38, 727-734.
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological
Bulletin, 68, 29-46.
Peterson, C. R., & DuCharme, W. M. (1967). A primacy effect in subjective probability re-
vision. Journal of Experimental Psychology, 73, 61-65.
Peterson, C. R., DuCharme, W. M., & Edwards, W. (1968). Sampling distributions and
probability revisions. Journal of Experimental Psychology, 76, 236-243.
Peterson, C. R., & Miller, A. J. (1965). Sensitivity of subjective probability revision. Jour-
nal of Experimental Psychology, 70, 117-121.
Peterson, C. R., & Phillips, L. D. (1966). Revision of continuous probability distributions.
IEEE Transactions on Human Factors in Electronics, HFE-7, 19-22.
Peterson, C. R., Schneider, R. J., & Miller, A. J. (1965). Sample size and the revision of sub-
jective probability. Journal of Experimental Psychology, 69, 522-527.
Peterson, D. R. (1968). The clinical study of social behavior. New York: Appleton-Cen-
tury-Crofts.
Petty, R. E., & Cacioppo, J. T. (1979). Issue involvement can increase or decrease persua-
sion by enhancing message-relevant cognitive responses. Journal of Personality and
Social Psychology, 37, 349-360.
Petty, R. E., Cacioppo, J. T., & Goldman, R. (1981). Personal involvement as a determinant of
argument-based persuasion. Journal of Personality and Social Psychology, 41, 847-855.
Pfleeger, S. L. (1991). Software engineering: The production of quality software (2nd ed.).
New York: Macmillan.
Phillips, L. D. (1973). Bayesian statistics for social scientists. London: Nelson.
Phillips, L. D., & Edwards, W. (1966). Conservation in a simple probability inference task.
Journal of Experimental Psychology, 72, 346-357.
Phillips, L. D., Hays, W. L., & Edwards, W. (1966). Conservatism in complex probabilistic
inference. IEEE Transactions on Human Factors in Electronics, HFE-7, 7-18.
Phillips, L. D., & Wright, G. N. (1977). Cultural differences in viewing uncertainty and as-
sessing probabilities. In H. Jungermann & G. de Zeeuw (Eds.), Decision making and
change in human affairs (pp. 507-519). Dordrecht, Netherlands: Reidel.
Piattelli-Palmarini, M. (1989). Evolution, selection and cognition: From "learning" to pa-
rameter setting in biology and in the study of language. Cognition, 31, 1-44.
Piattelli-Palmarini, M. (1994). Inevitable illusions: How mistakes of reason rule our
minds. New York: Wiley.
Pincus, S. (1991). Approximate entropy as a measure of system complexity. Proceedings of
the National Academy of Sciences of the United States, 88, 2297-2301.
Pincus, S., & Kalman, R. (1997). Not all (possibly) "random" sequences are created
equal. Proceedings of the National Academy of Sciences of the United States, 94,
3513-3518.
Pincus, S., & Singer, B. H. (1996). Randomness and degrees of irregularity. Proceedings of
the National Academy of Sciences of the United States, 93, 2083-2088.
Pitz, G. F. (1974). Subjective probability distributions for imperfectly known quantities. In
L. W. Gregg (Ed.), Knowledge and cognition (pp. 24-40). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Pitz, G. F., Leung, L. S., Hamilos, C., & Terpening, W. (1976). The use of probabilistic in-
formation in making predictions. Organizational Behavior and Human Performance,
17, 1-18.
Platt, J. (1973). Social traps. American Psychologist, 28, 641-651.
Poincaré, H. (1956). Chance. In J. R. Newman (Ed.), The world of mathematics (Vol. 2, pp.
1380-1394). New York: Simon & Schuster. (Original work published 1913)
Poisson, S.-D. (1837). Recherches sur la probabilité des jugements en matière criminelle et
en matière civile, précédées des règles générales du calcul des probabilités [Investiga-
tions of the probability of judgments in criminal and civil matters, preceded by general
rules of the calculus of probabilities]. Paris: Bachelier.
Polanyi, M. (1963). Experience and the perception of pattern. In K. M. Sayre & F. J.
Grossman (Eds.), The modeling of mind: Computers and intelligence (pp. 207-220).
Notre Dame, IN: Notre Dame Press.
Politser, P. E. (1981). Decision analysis and clinical judgment: A reevaluation. Medical De-
cision Making, 1, 361-389.
Politzer, G., & Noveck, I. A. (1991). Are conjunction rule violations the result of conversa-
tional rule violations? Journal of Psycholinguistic Research, 20, 83-103.
Polkinghorne, J. C. (1986). One world. London: SPCK.
Polkinghorne, J. (1988). Science and creation: The search for understanding. Boston: New
Science Library Shambhala.
Pollard, P. (1984). Intuitive judgments of proportions, means, and variances: A review.
Current Psychological Research and Reviews, 3, 5-18.
Pollard, P. (1993). How significant is "significance"? In G. Keren & C. Lewis (Eds.), A
handbook for data analysis in the behavioral sciences: Methodological issues (pp.
449-460). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pollard, P., & Evans, J. St. B. T. (1983). The effect of experimentally contrived experience
on reasoning performance. Psychological Research, 45, 287-301.
Pollard, P., & Richardson, J. T. E. (1987). On the probability of making Type I errors. Psy-
chological Bulletin, 102, 159-163.
Pollatsek, A. (1991). Randomness is well-enough understood to be misunderstood. Com-
mentary on "Psychological Conceptions of Randomness." Journal of Behavioral Deci-
sion Making, 4, 218-220.
Pollatsek, A., Konold, C., Well, A., & Lima, S. (1984). Beliefs underlying random sam-
pling. Memory and Cognition, 12, 395-401.
Polya, G. (1954a). Mathematics and plausible reasoning: Vol. 1. Induction and analogy in
mathematics. Princeton, NJ: Princeton University Press.
Polya, G. (1954b). Mathematics and plausible reasoning: Vol. 2. Patterns of plausible in-
ference. Princeton, NJ: Princeton University Press.
Popper, K. R. (1959). The logic of scientific discovery. New York: Basic Books.
Porter, T. M. (1986). The rise of statistical thinking, 1820-1900. Princeton, NJ: Princeton
University Press.
Porter, T. M. (1987). Lawless society: Social science and the reinterpretation of statistics in
Germany, 1850-1880. In L. Krüger, L. J. Daston, & M. Heidelberger (Eds.), The proba-
bilistic revolution: Vol. 1. Ideas in history (pp. 351-375). Cambridge, MA: MIT Press.
Poulton, E. C. (1982). Biases in quantitative judgments. Applied Ergonomics, 13, 31-42.
Poundstone, W. (1990). Labyrinths of reason. New York: Doubleday.
Poundstone, W. (1992). Prisoner's dilemma: John von Neumann, game theory, and the
puzzle of the bomb. New York: Anchor Books.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122-136.
Press, F. (1975). Earthquake prediction. Scientific American, 232, 14-23.
Price, D. J. de S. (1961). Science since Babylon. New Haven, CT: Yale University Press.
Quetelet, L. A. J. (1829). Recherches statistiques sur le royaume des Pays-Bas [Statistical
investigations in the Netherlands]. Nouveaux mémoires de l'Académie royale des sci-
ences et belles-lettres de Bruxelles, 5.
Quetelet, L. A. J. (1847). De l'influence du libre arbitre de l'homme sur les faits sociaux
[On the influence of free will on social acts]. Bulletin de la Commission Centrale de
Statistique (of Belgium), 3, 135-155.
Quetelet, L. A. J. (1848). Du système social et des lois qui le régissent [On the social system
and the laws that govern it]. Paris: Guillaumin.
Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty.
Reading, MA: Addison-Wesley.
Raiffa, H., & Schlaifer, R. O. (1961). Applied statistical decision theory. Cambridge, MA:
Harvard Business School.
Rapoport, Am. (1967). Optimal policies for the prisoner's dilemma. Psychological Review,
74, 136-145.
Rapoport, Am., & Budescu, D. V. (1992). Generation of random series in two-person strictly
competitive games. Journal of Experimental Psychology: General, 121, 352-363.
Rapoport, Am., & Wallsten, T. S. (1972). Individual decision behavior. Annual Review of
Psychology, 23, 131-176.
Rapoport, An., & Chammah, A. M. (1965). Prisoner's dilemma. Ann Arbor: University of
Michigan Press.
Rathje, W. L. (1989, December). Rubbish! The Atlantic Monthly, pp. 99-109.
Reichardt, C. S., & Gollob, H. F. (1997). When confidence intervals should be used instead
of statistical tests, and vice versa. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.),
What if there were no significance tests? (pp. 259-284). Mahwah, NJ: Lawrence
Erlbaum Associates.
Renn, O. (1990). Public response to the Chernobyl accident. Journal of Environmental Psy-
chology, 10, 151-167.
Richardson, L. F. (1956). Statistics of deadly quarrels. In J. R. Newman (Ed.), The world of
mathematics (Vol. 2, pp. 1254-1263). New York: Simon & Schuster.
Rindskopf, D. M. (1997). Testing "small," not null, hypotheses: Classical and Bayesian ap-
proaches. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no sig-
nificance tests? (pp. 319-332). Mahwah, NJ: Lawrence Erlbaum Associates.
Roberts, L. (1989). Global warming: Blaming the sun. Science, 246, 992.
Robertson, L. S. (1977). Car crashes: Perceived vulnerability and willingness to pay for
crash protection. Journal of Community Health, 3, 136-141.
Robinson, D., & Levin, J. (1997). Reflections on statistical and substantive significance,
with a slice of replication. Educational Researcher, 26(5), 21-26.
Robinson, L. B., & Hastie, R. (1985). Revision of beliefs when a hypothesis is eliminated
from consideration. Journal of Experimental Psychology: Human Perception and Per-
formance, 11, 443-456.
Roby, T. B. (1965). Belief states and the uses of evidence. Behavioral Science, 10, 255-270.
Ronis, D. L., & Yates, J. F. (1987). Components of probability judgment accuracy: Individual
consistency and effects of subject matter and assessment method. Organizational Be-
havior and Human Decision Processes, 40, 193-218.
Rose, L. E. (1972). Countering a counter-intuitive probability. Philosophy of Science, 39,
523-524.
Ross, M., & Sicoly, F. (1979). Egocentric biases in availability and attribution. Journal of
Personality and Social Psychology, 37, 322-336.
Roth, A. E. (1995). Bargaining experiments. In J. H. Kagel & A. E. Roth (Eds.), Handbook
of experimental economics (pp. 253-348). Princeton, NJ: Princeton University Press.
Roth, A. E., Prasnikar, V., Okuno-Fujiwara, M., & Zamir, S. (1991). Bargaining and market
behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An experimental study. The
American Economic Review, 81, 1068-1095.
Rothman, S., & Lichter, S. R. (1996). Is environmental cancer a political disease? In P. R.
Gross, N. Levitt, & M. W. Lewis (Eds.), The flight from reason (pp. 231-245). New
York: New York Academy of Sciences.
Rouanet, H. (1996). Bayesian methods for assessing importance of effects. Psychological
Bulletin, 119, 149-158.
Rowen, J. (1973). Effects of some probability displays on choice. Organizational Behavior
and Human Performance, 9, 1-15.
Roxburgh, I. W. (1977). The cosmical mystery: The relationship between microphysics
and cosmology. In R. Duncan & M. Weston-Smith. (Eds.), The encyclopedia of igno-
rance (pp. 37-45). New York: Pocket Books, Simon & Schuster.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychologi-
cal Bulletin, 57, 416-428.
Rozeboom, W. W. (1997). Good science is abductive, not hypothetico-deductive. In L. L.
Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests?
(pp. 335-391). Mahwah, NJ: Lawrence Erlbaum Associates.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. An-
nals of Statistics, 6, 34-58.
Ruelle, D. (1991). Chance and chaos. Princeton, NJ: Princeton University Press.
Russel, G. D. (1973). Lincoln and Kennedy: Looked at kindly together. New York: Carlton
Press.
Ryback, D. (1967). Confidence and accuracy as a function of experience in judg-
ment-making in the absence of systematic feedback. Perceptual and Motor Skills,
24, 331-334.
Saarinen, T. F. (1980). Reconnaissance trip to Mt. St. Helens, May 18-21, 1980. The Bridge
(National Academy of Engineering), 10, 19-22.
Sahlin, N-E. (1987). The significance of empirical evidence for developments in the foun-
dations of decision theory. In D. Batens & J. P. Van Bendegem (Eds.), Theory and exper-
iment (pp. 103-122). Dordrecht, Netherlands: Reidel.
Saks, M. J., & Kidd, R. F. (1980). Human information processing and adjudication: Trial by
heuristics. Law and Society Review, 15, 123-160.
Salmon, W. C. (1974). The pragmatic justification of induction. In R. Swinburne (Ed.), The
justification of induction (pp. 85-97). London: Oxford University Press.
Salsburg, D. (2001). The lady tasting tea. New York: Freeman.
Samet, M. (1975a). Quantitative interpretation of two qualitative scales used to rate mili-
tary intelligence. Human Factors, 17, 76-86.
Samet, M. (1975b). Subjective interpretation of reliability and accuracy scales for evaluat-
ing military intelligence (Tech. Paper No. 260). Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.
Samuelson, P. A. (1960). The St. Petersburg paradox as a divergent double limit. Interna-
tional Economic Review, 1, 31-37.
Samuelson, P. A. (1977). St. Petersburg paradoxes: Defanged, dissected, and historically
described. Journal of Economic Literature, 15, 24-55.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Savage, L. J. (1962). Subjective probability and statistical practice. In G. A. Barnard & D.
R. Cox (Eds.), The foundations of statistical inference: A discussion (pp. 9-35). New
York: Wiley.
Savage, L. J. (1972). The foundations of statistics (2nd ed.). New York: Dover. (Original
work published 1954)
Savage, R. P., Jr. (1994). The paradox of nontransitive dice. The American Mathematical
Monthly, 101, 429-436.
Schaffner, P. E. (1985). Specious learning about reward and punishment. Journal of Per-
sonality and Social Psychology, 48, 1377-1386.
Schkade, D. A., & Johnson, E. J. (1989). Cognitive processes in preference reversals. Orga-
nizational Behavior and Human Decision Processes, 44, 203-231.
Schlotterbek, M. (1992). Bayesian inference without base rates. Diploma thesis, Univer-
sity of Tübingen, Germany. (Described in Gigerenzer, 1994)
Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173-1181.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psy-
chology: Implications for training of researchers. Psychological Methods, 1, 115-129.
Schmitt, N., Coyle, B. W., & King, L. (1976). Feedback and task predictability as determi-
nants of performance in multiple cue probability learning task. Organizational Behav-
ior and Human Decision Processes, 16, 388-402.
Schum, D. A., & DuCharme, W. M. (1971). Comments on the relationship between the im-
pact and the reliability of evidence. Organizational Behavior and Human Performance,
6, 111-131.
Schum, D. A., DuCharme, W. M., & DePitts, K. E. (1973). Research on human multi-stage
probabilistic inference processes. Organizational Behavior and Human Performance,
10, 318-348.
Schum, D. A., Goldstein, I. L., & Southard, J. F. (1966). Research on a simulated Bayesian
information processing system. IEEE Transactions on Human Factors in Electronics,
HFE-7, 37-48.
Schum, D. A., & Martin, A. W. (1980). Probabilistic opinion revision on the basis of evi-
dence at trial: A Baconian or Pascalian process (Research Rep. No. 80-02). Houston,
TX: Rice University, Department of Psychology.
Schum, D. A., & Martin, A. W. (1982). Formal and empirical research on cascaded infer-
ence in jurisprudence. Law and Society Review, 17, 105-151.
Schustack, M. W., & Sternberg, R. J. (1981). Evaluation of evidence in causal inference.
Journal of Experimental Psychology: General, 110, 101-120.
Schwarz, N. (1995). Social cognition: Information accessibility and use in social judg-
ment. In E. E. Smith & D. N. Osherson (Eds.), Thinking: An invitation to cognitive sci-
ence (2nd ed., Vol. 3, pp. 345-376). Cambridge, MA: MIT Press.
Searle, J. R. (1975). Indirect speech acts. In P. Cole & J. L. Morgan (Eds.), Syntax and se-
mantics: Vol. 3. Speech acts (pp. 59-82). New York: Seminar Press.
Seaver, D. A., von Winterfeldt, D., & Edwards, W. (1978). Eliciting subjective probabil-
ity distributions on continuous variables. Organizational Behavior and Human Perfor-
mance, 21, 379-391.
Sedikides, C. (1992). Changes in the valence of the self as a function of mood. In M. S.
Clark (Ed.), Emotion and social behavior: Review of personality and social psychol-
ogy (Vol. 14, pp. 271-311). Newbury Park, CA: Sage.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the
power of studies? Psychological Bulletin, 105, 309-316.
Seitz, F., Jastrow, R., & Nierenberg, W. A. (1989). Scientific perspectives on the green-
house problem. Washington, DC: George C. Marshall Institute.
Selvin, S. (1975a). On the Monty Hall problem. The American Statistician, 29, 134.
Selvin, S. (1975b). A problem in probability [Letter to the editor]. The American Statisti-
cian, 29, 67.
Sennott, C. M., & Palmer, T. C., Jr. (1994, September 11). The big dig. Boston Globe, pp. 1,
40, 41, 43.
Seymann, R. G. (1991). Comment. The American Statistician, 45, 287-288.
Shackle, G. L. S. (1967). Time in economics. Amsterdam: North Holland. (Original work
published 1958)
Shafer, G. (1986). Savage revisited (with discussion). Statistical Science, 1, 463-501.
Shafer, G. (1993). Can the various meanings of probability be reconciled? In G. Keren & C.
Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological
issues (pp. 165-196). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shafer, G., & Tversky, A. (1985). Languages and designs for probability judgment. Cogni-
tive Science, 9, 309-339.
Shafir, E. (1993). Intuitions about rationality and cognition. In K. I. Manktelow & D. E.
Over (Eds.), Rationality: Psychological and philosophical perspectives (pp. 260-283).
London: Routledge.
Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reason-
ing and choice. Cognitive Psychology, 24, 449-474.
Shafir, E., & Tversky, A. (1995). Decision making. In E. E. Smith & D. N. Osherson (Eds.), Thinking: An in-
vitation to cognitive science (2nd ed., Vol. 3, pp. 77-100). Cambridge, MA: MIT
Press.
Shaklee, H., & Fischhoff, B. (1982). Strategies of information search in causal analysis.
Memory and Cognition, 10, 520-530.
Shaklee, H., & Mims, M. (1982). Sources of error in judging event covariations: Effects of
memory demands. Journal of Experimental Psychology: Learning, Memory and Cog-
nition, 8, 208-224.
Shaklee, H., & Tucker, D. (1980). A rule analysis of judgments of covariation between
events. Memory and Cognition, 8, 459-467.
Shapiro, B. J. (1983). Probability and certainty in seventeenth-century England. Princeton,
NJ: Princeton University Press.
Shapiro, S. H. (1982). Collapsing contingency tables—a geometric approach. The Ameri-
can Statistician, 36, 43-46.
Sharlin, H. I. (1986). EDB: A case study in communicating risk. Risk Analysis, 6,
61-68.
Shaughnessy, J. M., & Dick, T. (1991). Monty's dilemma: Should you stick or switch?
Mathematics Teacher, 84(4), 252-256.
Shaver, J. P. (1993). What statistical significance testing is, and what it is not. Journal of Ex-
perimental Education, 61, 293-316.
Shaw, G. B. (1956). The vice of gambling and the virtue of insurance. In J. R. Newman
(Ed.), The world of mathematics (pp. 1524-1531). New York: Simon & Schuster. (Orig-
inal work published 1944)
Shedler, J., & Manis, M. (1986). Can the availability heuristic explain vividness effects?
Journal of Personality and Social Psychology, 51, 26-36.
Shepard, R. N. (1964). On subjectively optimal selection among multi-attribute alterna-
tives. In M. W. Shelley, II, & G. L. Bryan (Eds.), Human judgment and optimality (pp.
257-281). New York: Wiley.
Sherman, S. J. (1980). On the self-erasing nature of errors of prediction. Journal of Person-
ality and Social Psychology, 39, 211-221.
Sherman, S. J., McMullen, M. N., & Gavanski, I. (1992). Natural sample spaces and the inver-
sion of conditional judgments. Journal of Experimental Social Psychology, 28, 401-421.
Shodell, M. (1985). Risky business. Science, 230, 43-47.
Showers, C., & Cantor, N. (1985). Social cognition: A look at motivated strategies. Annual
Review of Psychology, 36, 275-305.
Shrout, P. E. (1997). Should significance tests be banned? Introduction to a special section
exploring the pros and cons. Psychological Science, 8, 1-2.
Shuford, E. H., Albert, A., & Massengill, H. E. (1966). Admissible probability measure-
ment procedures. Psychometrika, 31, 125-147.
Shweder, R. A. (1977). Likeness and likelihood in everyday thought: Magical thinking in
everyday judgments about personality. In P. N. Johnson-Laird & P. C. Wason (Eds.),
Thinking: Readings in cognitive science (pp. 446-467). New York: Cambridge Univer-
sity Press.
Shweder, R. A., & D'Andrade, R. G. (1980). The systematic distortion hypothesis. In R. A.
Shweder (Ed.), New directions for methodology of behavioral science: Fallible judg-
ment in behavioral research (pp. 37-58). San Francisco: Jossey Bass.
Sigmund, K., Fehr, E., & Nowak, M. A. (2002). The economics of fair play. Scientific
American, 286(1), 82-87.
Silka, L., & Albright, L. (1983). Intuitive judgments of rate of change: The use of teenage
pregnancies. Basic and Applied Social Psychology, 4, 337-352.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Econom-
ics, 69, 99-118.
Simon, H. A. (1957). Models of man: Social and rational. New York: Wiley.
Simon, H. A. (1990). Alternative visions of rationality. In P. K. Moser (Ed.), Rationality in
action: Contemporary approaches (pp. 189-204). New York: Cambridge University
Press. (Originally appeared in Reason in human affairs, 1983 (pp. 7-35). Stanford, CA:
Stanford University Press)
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of
the Royal Statistical Society, Series B, 13, 238-241.
Skov, R. B., & Sherman, S. J. (1986). Information-gathering processes: Diagnosticity, hy-
pothesis confirmatory strategies and perceived hypothesis confirmation. Journal of Ex-
perimental Social Psychology, 22, 93-121.
Slote, M. (1985). Commonsense morality and consequentialism. London: Routledge &
Kegan Paul.
Slote, M. (1986). Moderation, rationality, and virtue. In The Tanner Lectures on Human
Values (Vol. 7). Salt Lake City, UT: University of Utah Press.
Slote, M. (1990). Rational dilemmas and rational supererogation. In P. K. Moser (Ed.), Ra-
tionality in action: Contemporary approaches (pp. 465-480). New York: Cambridge
University Press. (Original work published 1986)
Slovic, P. (1966). Value as a determiner of subjective probability. IEEE Transactions on
Human Factors in Electronics, HFE-7, 22-28.
AUTHOR INDEX
Camerer, C., 223, 338, 407, 445, 459
Campbell, D. T., 250, 487
Campos, F. T., 399, 400, 469
Cannon, J. T., 250, 441
Cantor, N., 428, 480
Cardoze, S., 407, 468
Carlin, J., 142, 269, 454
Carlson, B. W., 343, 345, 375, 445, 488
Carnap, R., 19, 31, 35, 36, 37, 445
Carpenter, K. M., 286, 457
Carroll, J. B., 295, 445
Carroll, J. S., 413, 464
Carsten, L. M., 466
Carter, B., 102, 445
Cartwright, N., 442
Carver, R., 250, 445
Cassells, W., 404, 445
Catania, A. C., 71, 445
Cavanagh, R. C., 401, 462
Cervone, D., 377, 445
Chaganty, N. R., 156, 470
Chaiken, S., 428, 445
Chaitin, G. J., 56, 445
Chammah, A. M., 217, 476
Chandler, C. C., 309, 457
Chapman, D. P., 387, 465
Chapman, G. B., 375, 378, 445
Chapman, J. P., 332, 337, 338, 445
Chapman, L. J., 332, 337, 338, 433, 445
Cheng, P. W., 433, 472
Chenzoff et al., 412, 445
Chesnick, E. I., 290, 342, 343, 427, 446
Chess, C., 303, 310, 312, 458
Choo, G. T., 431, 487
Chow, S. L., 250, 445, 446
Christensen, C., 293, 294, 295, 320, 321, 440
Christensen-Szalanski, C. M., 306, 446
Christensen-Szalanski, J. J., 317, 402, 406, 407, 408, 417, 446
Christensen-Szalanski, J. J. J., 306, 446
Chromiak, W., 285, 458
Chronbach, L. J., 250, 446
Cialdini, R. B., 286, 457
Ciampi, A., 364, 466
Cicerone, R. J., 305, 449
Clark, E. V., 346, 446
Clark, H. H., 319, 346, 446, 481
Clark, R. D., 312, 446
Clarke, R. D., 67, 446
Cocozza, J. J., 307, 482
Cohen, B., 312, 446
Cohen, I. B., 16, 446
Cohen, J., 71, 250, 251, 279, 291, 398, 446
Cohen, L. J., 24, 35, 122, 125, 126, 130, 132, 133, 134, 135, 136, 290, 301, 303, 342, 343, 346, 347, 379, 401, 413, 421, 422, 423, 427, 429, 446
Cohen, M. D., 256, 447
Cohen, M. R., 208, 447
Cole, K. C., 206, 218, 447
Coleman, W., 265, 442, 447
Combs, B., 306, 447
Condorcet, M., 186, 206, 447
Constantini, A. R., 284, 459
Coombs, B., 306, 328, 376, 466
Coombs, C. H., 358, 362, 447
Cooper, A. C., 309, 447
Copi, I. M., 160, 161, 163, 165, 168, 169, 447
Cornoldi, C., 318, 468
Corrigan, B., 358, 359, 448
Cosmides, L., 407, 447
Cournot, A. A., 32, 186, 447
Covello, V. T., 310, 447
Cowan, T., 58, 483
Cox, D. R., 250, 447
Coxeter, H. S. M., 173, 441
Coyle, B. W., 295, 478
Cramer, G., 198, 447
Crandall, R., 405, 406, 411, 472
Cressey, L. J., 466
Crocker, J., 333, 334, 337, 338, 447
Crouch, E. A. C., 276, 314, 447, 487
Crowe, B., 226, 447
Croyle, R. T., 309, 461
Cudeck, R. C., 251, 444
Cull, W. L., 318, 447
Cutts, D., 71, 445
Cvetkovich, G., 310, 447
Czerlinski, J., 384, 447

D

Dahiya, R. C., 156, 470
Dale, A. I., 160, 163, 164, 448
Dale, H. C. A., 73, 412, 448
Damas, P. A., 388, 448
D'Andrade, R. G., 339, 480
Danzig, E. R., 413, 487
Darley, J. M., 309, 411, 448, 449
Daston, 375, 409, 413, 420, 421, 430
Daston, L., 2, 13, 21, 25, 26, 34, 45, 234, 235, 239, 243, 247, 265, 266, 269, 270, 280, 281, 455
Daston, L. J., 15, 22, 31, 241, 448, 464
David, F. N., 3, 4, 5, 6, 7, 8, 10, 12, 243, 281, 448
Davies, P., 26, 54, 103, 448
Davies, P. C. W., 27, 100, 101, 102, 103, 104, 105, 106, 107, 108, 448
Davis, C. D., 466
Davis, H. L., 377, 448
Davis, J. N., 335, 371, 384, 448, 456
Davis, P. J., 17, 55, 448
Dawes, R., 320, 448
Dawes, R. M., 214, 221, 222, 225, 274, 276, 293, 294, 295, 320, 344, 347, 348, 358, 359, 377, 386, 390, 417, 440, 448, 483
Deane, D. H., 295, 458
Dearnaley, E. J., 291, 446
DeBono, K. G., 428, 458
de Dombal, F. T., 323, 448
de Finetti, B., 35, 391, 448
DeGroot, M. H., 137, 367, 442, 448
DeJoy, D., 308, 309, 396, 449
Deldin, P. J., 387, 466
DeLuca, D. R., 313, 456
De Moivre, A., 11, 26, 240, 449
Denes-Raj, V., 296, 449
Dennett, D. C., 379, 449
DePitts, K. E., 117, 415, 478
Derby, S. L., 305, 452
DeVault, R. C., 297, 482
Devine, P. G., 400, 413, 449
Devlin, K., 46, 48, 61, 96, 449
Dewdney, A. K., 82, 276, 279, 449
Diaconis, P., 86, 88, 89, 352, 449
Dick, T., 156, 479
Dickens, C., 84, 449
Dickhaut, J., 224, 366, 442
Dickinson, A., 396, 449
Dickinson, R. E., 305, 449
Ditto, P. H., 309, 449, 461
Dixon, P., 250, 449
Doherty, M. E., 334, 449
Donmell, M. L., 399, 449
Donoghue, J. M., 407, 458
Doob, L. W., 313, 456
Doos, B. R., 305, 443
Dorner, W. W., 334, 486
Dovalina, I., 407, 468
Doviak, M. J., 156, 470
Dowie, J., 359, 449
Dracup, C., 250, 449
DuCharme, M. W., 399, 449
DuCharme, W. M., 117, 301, 399, 401, 412, 415, 449, 474, 478
Dugan, P. M., 339, 458
Dulany, D. E., 346, 449
Dunkelberg, W. C., 309, 447
Dunlap, W. P., 250, 251, 447
Dunning, D., 308, 317, 319, 321, 322, 396, 449, 457
Dusoir, A. E., 300, 451
Dyson, F. J., 103, 450

E

Earle, T. C., 310, 447
Eddington, A. S., 118, 119, 450
Eddy, D. M., 404, 425, 450
Edgeworth, F. Y., 236, 450
Edney, J. J., 226, 227, 450
Edwards, W., 19, 118, 142, 269, 284, 301, 320, 321, 322, 350, 357, 358, 359, 388, 399, 400, 409, 415, 450, 474, 475, 479, 485
Einhorn, H., 366, 456
Einhorn, H. J., 274, 317, 334, 359, 414, 415, 450
Ekberg, P-H. S., 377, 467
Ekeland, I., 26, 46, 450
Ellsberg, D., 274, 450
Elstein, A., 359, 449
Engle, P. L., 290, 450
Ennis, B. J., 307, 450
Epstein, S., 296, 449, 463
Epstein, W., 317, 455
Erev, I., 327, 328, 450
Erikson, K., 312, 450
Erlick, D. E., 333, 340, 450
Estes, W. K., 34, 293, 451
Evans, J. St. B. T., 299, 300, 301, 302, 371, 404, 428, 451, 475
Evenden, J., 396, 449
SUBJECT INDEX
A priori probability, 23
Acceptance-support testing, 77
Aces and kings problem, 160-170
Actuarial statistics, 246-247
Additive-difference model of decision making, 358
Adjustment, anchoring and, 375-378, 390
Admissible scoring rules, 391-394
Affirmation of the consequent, 109-110
Aleatory contracts, 15
Aleatory probability, 30, 31, 32, 34, 45
Alpha, 248-249
Altruism
    development of, 212
    prisoner's dilemma and, 217, 220
    social dilemmas and evidence of, 228
    ultimatum game and, 224-225
Ambiguity
    in problem statements, 425
    in statistical assertions, 274-279
Analysis of variance, 266
Anchoring, 375-378, 390
Annuities, 15, 233
Anthropic (cosmological) principle, 103-108
    strong form, 105-106
    testability of, 106-108
    weak form, 104-105
Applicability of Bayesian reasoning, 138-139
Approximate entropy, 55-56
Argument
    statistics in service of, 263-264
    strength of, 22
Ars Conjectandi (Bernoulli), 11, 12
Articulation, in argument, 264
Assassination coincidences, 82-83
Assessment, risk, 292, 303-304, 305, 313-314, 331
Assimilation effect, 425
Assumptions, in probability problems, 156-157, 161-165, 169-171, 173-175
Astragali, 2
Astronomy, 15
Attention, role played in frequency perception and retention, 286-287
Attribution, vs. prediction, 411
Availability, 371-373, 380
    principle of, 291
Average family-average child puzzle, 150-151
Axiomatic theory of probability, 39
Axiomatization of probability theory, 33

B

Baconian probability theory, 422-423
Bargaining, anchoring and adjustment and, 378
Base-rate neglect, 374-375, 399, 402-405, 409-410
Base rates, 401-402, 414, 434
    consensus information and, 411
    discounting, 374-375, 399, 402-405, 409-410
    reconciling, 409-410
    personal, 405
    population, 405
    prior probabilities and, 122-136
    relevance of, 134-136
    use of, 405-409
Basketball, hot-hand phenomenon and, 74-79
Bassette, 13
Batting averages, regression to the mean and, 254
Bayesian inference, 265, 266, 269, 270
Bayesian reasoning, 117-118, 398-414
    advantages of, 140-142
        cumulative use of evidence in hypothesis testing, 142
        explicit problem representation, 141
        insensitivity to initial probabilities, 141
    base rates and, 401-409
    consensus information, 410-412
    conservatism in probabilistic judgments, 399-401
    decision making and, 117-118, 360
    information presentation in frequency terms and, 413-414
    limitations of, 137-140
        conditions of applicability, 138-139
        extra-Bayesian requirements, 139-140
    as normative model, 413-414
    overconfidence and, 322-323
    primacy effects and, 412-413
    reconciling conservatism and base-rate discounting in, 409-410
Bayes' rule, 25, 110-114
    application of, 114-117, 417-418
        cab-color problem, 122-132
        disease diagnosis and, 132-134
    conditional probabilities and, 111, 116-117
    justifiability of, 118-122
        finding prior probabilities, 119-120
        sensitivity to prior probabilities, 120-122
    relevance of, 134-136
    weight of evidence and, 136-137
Behavioral patterns, correlation and, 339-341
Belief functions, 25
Belief-revision tool, 399, 434
Bertrand's paradox, 199-205
Beta, 249
Betting semantics, 25
Bias
    anchoring and adjustment and, 378
    availability and, 373
    Bayesian reasoning and, 138
    confirmation, 412, 437
    in estimation, 286, 287
    expectations and, 337
    heuristics and, 379
    normative theory and, 420
    overconfidence and, 324
    in sample, 279
Biased focus, 333-337
Big Dig, 287-288
Binary rule, 73, 293-294
Binomial distribution, 242, 260
Biology, statistics and, 236
Biometrics, 235
Biometrika, 265
Birthdays, chance coinciding of, 87-88
Book of Games of Chance, The (Cardano), 96
Boston Globe, 41, 287
Bounded rationality, 382
Bridge hand, small-probability events and, 47-48, 48-49
Brownian motion, 15

C

Cab-color problem, 407
    Bayesian analysis and, 122-132
    representativeness and, 374-375
Calibration, 315, 316, 317, 415
    contraction bias and, 328
    curves, 315-316, 317
    overconfidence and, 321
Card problems
    aces and kings, 160-170
    three cards, 157-160
    two aces and a jack, 170-176
Case-specific information, base rates and, 406-407, 410
La Logique, ou l'Art de Penser (Arnauld), 10
Large-number coincidences, 100-101
Laws
    of chance, 259
    of genuine variation, 242-243
    of large numbers, 12-13, 236-241, 414, 434
        sample size and, 185, 299, 301-302
        St. Petersburg paradox and, 185, 186
    of small numbers, 71, 257
        representativeness and, 374
        sample size and, 299-300, 301
        transformation of dilemmas into nondilemmas and, 226, 227
    of truly large numbers, 89
Least squares method, 15
Lexicographic model of decision making, 358
Liber de Ludo Aleae (Cardano), 4
Life expectancy
    nonaging process and, 80-81
    tables, 15, 233
Light bulb problem, 404
Likelihood, representativeness and, 375
Likelihood estimates, 372
Likelihood function, 270
Likelihood ratio, 112, 113, 352
Limit theorem of probability, 12, 13, 45
Linear model of decision making, 358
Linear-syllogism problem, 205
Logical probability, 24
Logistic distribution, 245
Loss aversion, 311
Lots, drawing of, 1-2
Lotteries, 15, 41-42, 48, 296, 298-299, 330-331, 394-395
Luck, vs. chance, 395

M

Magnitude, in statistically supported argument, 263-264
Martingale, 151-153
Matching property, 391
Mathematical Sciences Education Board, 284
Mathematics
    coincidences with physics, 96-100
    explanation in, 10
    probability and, 22, 29-30
Maximization of expected utility, 360-369, 437
Mean, 245
    estimates of, 284-285
    regression to, 252-256
Meaningfulness of coincidences, 85
Measurable uncertainty, 356
Measurement error, 244
Media
    biased focus in, 337
    risk perception and, 307
Median, 245
Memory
    for frequency of occurrence, 285, 286-287
    Simpson's paradox and, 211
Metaphysical determinism, 26
Metaphysical possibility, vs. physical possibility, 22
Method of least squares, 15
Methodological determinism, 26
Mind, statistics as theory of, 279-281
Mind-as-statistician metaphor, 280-281
Minimax strategy, 369
Minus-minus cell, 335-336
Mode, 245
Models, normative. See Normative models
Modus tollens argument, 426
Monty's dilemma, 156
Mood effects, on estimation of risk, 307-308
Moral dilemmas, 228-230
Moral fortune, 198
Moral imperative, protection of commons and, 227
Most likely outcome, St. Petersburg outcome, 187-197
Multidimensional variables, ordering intransitivities and, 206
Mutual funds, regression to the mean and, 253-254

N

NASA, estimation of risk, 50-51
National Council of Teachers of Mathematics, 284
Natural and Political Observations Made Upon the Bills of Mortality (Graunt), 233
Natural categories, 374
Natural-kind categories, 408
Natural phenomena, Poisson distribution and, 252
Negative-recency effect, 71, 73, 424
Neyman-Pearson inference, 265, 266, 268-269, 413
NHST. See Null hypothesis significance testing
Nil-null (nil) hypothesis, 248
Nonaging processes, 79-81, 213
Noncompressibility, randomness and, 61
Nontransitivity paradoxes, 208
Normal distribution, 242, 245
Normative goodness, 414
Normative models, 281
    Bayes's rule as, 413-414
    of decision making, 358-359, 437
    selecting, 414
Normative theory
    bias and, 420
    perspectives on, 420-424
Norm extraction method, 421
Nuclear power industry, risk assessment and, 304-305
Nuclear Regulatory Commission, 304
Null hypothesis, 62, 248, 266
    hot-hand phenomenon and, 76
    small samples and, 301
    statistical, 266-267
    substantive, 266-267
Null hypothesis significance testing (NHST), 249-251, 426-427
    common false beliefs, 250
    hot-hand data and, 77

O

Objective probability, 30, 31, 34
    vs. subjective, 31
Objectivity, 271
Observer
    anthropic principle and, 105-106
    reaction to coincidences, 85-87, 91
    role in perception of randomness, 58-59
    role in signal detection theory, 273
Odds, inverse probability and, 112-114
Opinion polling, 247
Opinions, anchoring and adjustment and, 377
Ordering relationships, 205-207
Outcome certainty, 356
Outcome of interest, 41, 44
Outcomes
    most likely, 187-197
    representation of, 386-387
Outcome uncertainty, 356
Outrage factors, risk and, 310
Overconfidence, 315-316, 317-321, 417, 437
    artifactual contributions to, 326-328
    bases for, 322-325
    reducing, 321-322
Overconfidence bias, 324
Overestimation, 288-289, 290-291
    of conjunctions, 342

P

Papers of the Imperial Academy of Sciences in Petersburg, 12
Parade Magazine, 156
Paradoxes, 181-215
    Bertrand's, 199-205
    of confirmation, 213-214
    of expectation, 212-213
    inquisitive prisoner, 181-182
    intransitivity, 205-208
    of lawfulness of chance, 28
    of preference, 212
    Simpson's, 78, 208-212
    St. Petersburg, 182-199, 364, 430
        expected values, 187-197
        most likely outcomes, 187-197
        value and utility, 198-199
Parameters, 245-246
Parameters of dispersion, 245
Parsimony, 76
Particle physics, probability and, 16
Partitioning of space, testing for randomness and, 62-67
Pascal-Fermat correspondence, 4-10
Pascalian probability theory, 421-422
Pascal's wager, 272