
Journal of Behavioral Decision Making

J. Behav. Dec. Making, 23: 496–526 (2010)


Published online 17 September 2009 in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/bdm.670

Probability Estimation in Poker: A Qualified Success for Unaided Judgment
JAMES LILEY and TIM RAKOW*
University of Essex, Colchester, UK

ABSTRACT

Poker players make strategic decisions on the basis of imperfect information, which are
informed by their assessment of the probability they will hold the best set of cards
among all players at the conclusion of the hand. Exact mental calculations of this
probability are impossible—therefore, players must use judgment to estimate their
chances. In three studies, 69 moderately experienced poker players estimated the
probability of obtaining the best cards among all players, based on the limited
information that is known in the early stages of a hand. Although several of the
conditions typically associated with well-calibrated judgment did not apply, players’
judgments were generally accurate. The correlation between judged and true prob-
abilities was r > .8 for over five-sixths of the participants, and when judgments were
averaged across players and within hands this correlation was .96. Players slightly
overestimated their chance of obtaining the best cards, mainly where this probability
was low to moderate (<.7). Probability estimates were slightly too strongly related to
the strength of the two cards that a player holds (known only to themselves), and
insufficiently influenced by the number of opponents. Seemingly, players show some-
what insufficient regard for the cards that other players could be holding and the
potential for opponents to acquire a strong hand. The results show that even when
judgment heuristics are used to good effect in a complex probability estimation task,
predictable errors can still be observed at the margins of performance. Copyright © 2009 John Wiley & Sons, Ltd.

key words: probability judgment; heuristics; simulation heuristic; anchors; anchoring and adjustment; calibration; support theory; expertise

INTRODUCTION

Poker is a game of chance and skill where players bet on the value of cards. The strategies that maximise
profit have been distilled from the experiences of professional players (e.g. Harrington & Robertie, 2006),
and have been subjected to statistical analysis where appropriate (Sklansky, 1994). Poker is a complex game,

* Correspondence to: Dr Tim Rakow, Department of Psychology, University of Essex, Colchester, CO4 3SQ, UK.
E-mail: timrakow@essex.ac.uk




with an element of uncertainty, which lends itself to the study of probability judgment when it is informed by
specialist knowledge and experience.
Texas Hold ‘em Poker (hereafter poker) is currently the most popular version of poker, which, encouraged
by an explosion of internet play, now boasts many high-stakes tournaments offering over $1 million in prize
money. For the benefit of readers unfamiliar with poker, we first explain the basics of the game. Games are
typically played with a maximum of 10 people per table. A hand of poker begins with each player being dealt
two cards face down (for their eyes only) from a regular deck of 52 cards (see Figure 1a). These two cards are
known as the ‘starting hand’. Five ‘community cards’ are then dealt face up. In the final stage of the hand each
player creates their best hand (in the hope of beating the other players) by selecting the highest value
combination of five cards from any of the seven cards that are visible to them (his/her own two starting cards
plus the five community cards).1 Valuable hands include two-, three- or four-of-a-kind, flushes (five cards of
the same suit), and straights (five cards in sequence)—the rules prescribe greater value to less likely
combinations of cards (e.g. a flush beats three-of-a-kind). Figure 1(b) shows the ranking of hands that
determines the winner. The community cards are dealt in three rounds, which are referred to as the ‘flop’
(three cards dealt), the ‘turn’ (one additional card) and the ‘river’ (one final card revealed). Each round of card
dealing is followed by a round of betting: first after the starting hands are dealt but before any community
cards are known (the ‘pre-flop’ round of betting), then after the flop, then after the turn, and finally after the
river. Bets accumulate in the centre of the table in ‘the pot’, which can be won by any player. Players may also
‘fold’ (i.e. withdraw from the current hand) at any point in the betting. Once a player has folded they no longer
contribute to the pot, but neither are they eligible to win it—so betting can be thought of as paying to retain
the opportunity to win. A player wins if all other players fold or if he holds the best hand among players who
remain after the river. Thus a player can sometimes bluff his way to victory if confident betting leads other
players (with better hands) to fold before the betting ceases. Therefore, the winning player may not always be
the player holding the best cards, as a player who folded may actually have held the best cards.
Explicit probability judgments are not required in poker—however, at many points in the game a player
may be guided by his beliefs about the probability of particular events or outcomes. The game requires a
player to judge the chance of his hand beating each of his opponents’ unknown hands—or, more subjectively,
to judge how likely it is that an opponent is bluffing or will continue to bet. Based on such judgments, a player
must decide when to fold or bet. Thus, some of the skill in poker revolves around assessing whether these
chances justify continuing to bet—an assessment that can be made at several junctures as the hand is played
out (see Figure 1a). Whilst a player may not consciously assess these chances for each and every hand,
players must make a judgment on some level in order to make decisions about whether to continue to play. In
this paper, we focus on the most objective of these judgments: assessing the probability of obtaining the best
cards among all players given particular cards.
In estimating the probability of obtaining the best cards against a given number of unknown opponents
there is a finite number of possible outcomes. However, with 1326 unique starting hand combinations, 19 600
(3-card) flop combinations and 2 118 760 (5-card) community card combinations to consider, there is a huge
range of possible outcomes that simply overwhelms human cognition. Incapable of these calculations, people
must implement a strategy that requires limited cognitive resources but maintains a degree of accuracy. What
kind of strategy might this be? The following three paragraphs outline three classes of psychologically
plausible strategy that poker players could adopt when assessing probabilities: specific judgment heuristics,
processes of hypothesis evaluation, and memory-based processes based on encoded frequencies. These
classes of strategy are not mutually exclusive (e.g. hypothesis evaluation may involve heuristic reasoning)
and there may be individual differences in their use (e.g. preferences among strategies or skill in applying

1. For simplicity we use masculine pronouns when referring to poker players from this point forward—so ‘he’ should be read ‘he or she’, and ‘his’ should be read as ‘his or her’. This choice of language is also in keeping with the observation that almost all of our poker-playing participants were male.


Figure 1a. The game of Texas Hold ‘Em Poker. Upper row shows community cards; lower rows show annotated
outcomes for two players, indicating the best hand of five cards that each player holds at each stage of the game.
Hand strength will change as the hand is played fully. The winner is the player holding the best hand after the final round
of betting after the river (unless all but one of the players have folded prior to this). In the second example, the 3€ in the
starting hand becomes redundant, as there are higher cards to accompany the pair of kings. 1, 2, 3 and 4 refer to betting rounds.
Figure 1(b). Poker hand rankings listed from best (1) to worst (10). In any case where two or more opponents have the same
hand ranking, the hand made up from higher ranking cards wins

them may vary with expertise). Nonetheless, they serve to map out what we should expect on the basis of
some of the key theories of probability judgment. Our studies do not provide an unequivocal test of these
strategies. Rather, these approaches are reviewed here as they provide a framework for understanding the task
and interpreting our results.
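As a quick check on the scale of the estimation problem described above, the quoted combination counts follow directly from the binomial coefficient; a minimal Python sketch (ours, for illustration only):

```python
from math import comb

# Combination counts quoted in the Introduction (conditioning on the player's own two cards):
print(comb(52, 2))  # 1326 unique starting hands
print(comb(50, 3))  # 19,600 possible 3-card flops given the two starting cards
print(comb(50, 5))  # 2,118,760 possible 5-card sets of community cards
```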
Several heuristics could simplify the process of probability assessment in poker, one of which is the
simulation heuristic (Kahneman & Tversky, 1982). Mental simulation can generate an evaluation of the


tendency of one’s model of the situation to produce different outcomes. Just as the availability heuristic uses
ease of recall to judge the relative frequency of past events or the size of existing sets (Tversky & Kahneman,
1973), the simulation heuristic uses the ease with which future possibilities can be constructed to assess their
probability. Whilst a poker player may be unable to simulate all possible hands in his mind, the simulation
heuristic allows him to reach an estimation of the chances of obtaining the best cards using a sample of
simulations as the basis for judgment. For instance, a player may consider some of the cards that his
opponents could be holding and/or some of the cards that could be dealt, and evaluate his chances by
considering the proportion of simulated opponents’ hands that are weaker than his own hand. For example, a
player holding a pair of 10s when a flop of 10^2|7€ is on the table will find it hard to simulate opponents’
hands or future community cards that can yield a hand stronger than his own. In contrast, an opponent holding
3^8 against this flop will quickly simulate possible hands for his opponents that will beat his hand—and,
accordingly, will judge his chances to be more modest. Anchoring and adjustment could also be implemented
usefully to judge the probability of obtaining the best cards for a particular hand (Tversky & Kahneman,
1974). For instance, even before the cards are dealt a player knows the potential number of players in the
hand, which could be used to calculate an equiprobability anchor (Teigen, 2001). For example, with five
players all have a 20% chance before the cards are dealt (assuming equal levels of skill)—this 20% prior
probability could then be revised as the cards are revealed and players fold. Additionally, the first
individuating information that a player receives is his two starting cards. The perceived strength of these cards
could be used to provide an initial estimate of the chances of obtaining the best cards (perhaps drawing on
past experience)—which again would adjust as the hand is played out.
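A minimal sketch of how such an equiprobability anchor could be computed (the function name is ours; the values for 1, 3, 5, 7 and 9 opponents correspond to the anchor values listed later in Table 1):

```python
def equiprobability_anchor(n_opponents: int) -> float:
    """Prior chance of holding the best cards before any cards are dealt,
    assuming all (n_opponents + 1) players are equally likely to win."""
    return 1.0 / (n_opponents + 1)

print(equiprobability_anchor(4))  # 0.2 -> the 20% five-player example above
print([round(100 * equiprobability_anchor(n), 2) for n in (1, 3, 5, 7, 9)])
# [50.0, 25.0, 16.67, 12.5, 10.0] -- the anchor values reported in Table 1
```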
Probability estimation in poker can be viewed as a process of evaluating successive hypotheses on the
basis of accumulating evidence, which will inevitably depend upon which cues players attend to, and how
they use this information. Kahneman and Lovallo (1993) discuss these attentional features and propose two
modes of forecasting: the inside view and the outside view, which Lagnado and Sloman (2004) characterised
with respect to probability judgment. The inside view is a singular mode of thinking and focuses on evidence
for the most salient outcome, while ignoring other, less obvious outcomes. An example of adopting an
inside view whilst judging the probability of obtaining the best cards would be to focus on one’s own starting
hand and how hand strength may increase with future community cards, without considering the cards
opponents may be holding. Viewed according to support theory (Brenner, Koehler, & Rottenstreich, 2002;
Rottenstreich & Tversky, 1997; Tversky & Koehler, 1994) this would correspond to over-reliance on the
support (evidence) for the focal hypothesis that I hold the best cards in comparison to under-weighting the
support for the alternate hypothesis that one of my opponents holds the best cards. The outside view
represents a more distributional mode of thinking, which considers a wider set of possibilities, including the
less immediately salient ones. An ‘outside judge’ in poker may consider what cards other players might be
holding, how the community cards could affect the strength of each opponent’s hand relative to his own hand
and what additional community cards may result in a strong hand for each of his opponents. Outside
judgment is usually achieved through statistical analyses and is less reliant on heuristics. Alternatively, an
outside judge may still rely on mental simulation but simulate alternative outcomes in a more systematic and
extensive manner. Dougherty, Gettys, and Thomas (1997) have shown that generating multiple causal
scenarios for alternative hypotheses decreases the perceived likelihood of the focal hypothesis—which
counteracts the typical tendency towards overestimation of focal hypotheses. Thus, whatever strategies an
outside judge adopts, the outside view requires more careful and effortful thinking (in considering complex
rules or a greater number of possibilities). The inside view seems to be the default position in most situations.
For instance, Koriat, Lichtenstein, and Fischhoff (1980) showed that probability judgments became
somewhat more appropriate when participants were actively encouraged to think of additional possibilities
(reasons why an answer could be wrong). In fact, many studies imply that less obvious possibilities fail to
come to mind when a problem is first considered unless they are prompted (e.g. Fischhoff, Slovic, &
Lichtenstein, 1978; Tversky & Koehler, 1994).


Another way to bypass intractable probability calculation is simply to rely upon memory and experience.
In a given situation, past experience of the number of wins and losses in equivalent situations can inform
assessments of the probability of obtaining the best cards in the current situation. Hasher and Zacks (1979)
demonstrated that people have a good facility for tracking small frequencies within a short time frame, and
others have argued that humans are generally well adapted to the task of logging event frequencies and using
them to assess probability or make inferences (Gigerenzer, Hoffrage, & Kleinbölting, 1991). However, one
challenge that the poker player faces is in constructing the appropriate reference class of events. For instance,
if my starting hand is a pair of sixes—what past experiences should I recruit? All previous instances where I
held a pair of sixes? But perhaps none of these cases included hands with the same number of opponents that I
face now. Given the number of starting hand and flop combinations, and the variable number of opponents,
players will often find themselves in unique circumstances. In these cases, even excellent memory for past
instances provides only a first approximation to the current probability of obtaining the best cards.
Tasks that rely upon specialist knowledge, an example of which we consider in this paper, have provided a
valuable context for studying the calibration of probability judgment (see Koehler, Brenner, & Griffin, 2002).
It has frequently been observed that meteorologists have very good, sometimes near-perfect,
calibration for probability judgments for a variety of weather events (e.g. Murphy & Winkler, 1977).
In other words, among a collection of meteorological events each of which is assigned a subjective
probability of X%, approximately X% of these events do occur. Horse-race odds have been shown to be well
calibrated (Hoerl & Fallin, 1974), and groups of executives in banking and the pharmaceutical industry have
been able to provide accurate subjective probabilities when forecasting interest rates or estimating the chance
that a project succeeds (Balthasar, Boschi, & Menke, 1978; Kabus, 1976). In contrast, physicians’ diagnostic
and prognostic probability judgments are notoriously variable in quality. A number of studies find poor
calibration in probability judgments for diagnosis for a number of diseases (e.g. Christensen-Szalanski &
Busheyhead, 1981; Poses, Cebul, Wigton, Centor, Collings, & Fleishli, 1992). This is often attributed to the
lack of feedback that doctors receive on their judgments, which contrasts with the continual feedback that
meteorologists receive (Bolger & Wright, 1994), and which also applies to the poker players that we study.
However, Koehler et al. (2002) note that feedback cannot be the sole determinant of the quality of probability
judgment, as there exists considerable variability among different studies in medicine where feedback
characteristics are similar. Others have suggested that values (e.g. the perceived severity of the outcome
event) can contaminate physicians’ probability judgments. For instance, Arkes et al. (1995) reported that
doctors overestimated the chances of terminally ill patients dying within 2, or within 6, months—a
pessimistic bias that these authors suggest could be attributed to avoiding giving false hope to patients.
However—just as for gamblers in a horse-race betting market, whose laying of bets implies good calibration
(Johnson & Bruce, 2001)—there is no rational motivation for poker players to adopt either a pessimistic or an
optimistic bias in judging their chances, as either stance is associated with failing to maximise financial gains
(either from under-betting or from over-betting). Notably, Keren (1987, Experiment 1) found that national
and international class bridge players (who receive prompt feedback) also showed superb calibration when
asked to judge the chances of making a contract in tournament play. Thus, there are instances where repeated
feedback and appropriate incentives seem to lead to good judgment.

STUDY 1

This first study is an initial exploration of the ability of moderately experienced players to judge the
probability of holding the best cards at the end of the hand using only the information that is available to them
in the early stages of a hand. Such judgments are not a formal requirement of the game of poker, but a player
whose beliefs about this are inaccurate (even if they are not explicitly stated) may lose money by over-betting,
or fail to exploit opportunities by under-betting. We consider the first two stages of a hand of poker


(Figure 1a). First, when only one’s own starting cards and the number of opponents are known: we examine
judgments of the probability of obtaining the best cards among all players for different starting hands and for a
varying number of opponents. Second, when the first set of community cards (the three ‘flop’ cards) is also
known: this involves judging the probability of obtaining the best cards for varying combinations of starting
hands and flops (holding the number of opponents constant), or for varying flops and numbers of opponents (for a
given pair of starting cards).

Method
Participants
Thirty-six poker players (35 males) were recruited with an average age of 20.9 years (range 18–28 years).
Most were university students and at least 60% were members of the University of Essex Poker Society.
Players reported knowing how to play poker for a median of 29 months (inter-quartile range, IQR of 16–42),
and playing online poker a median of 15 times a month (IQR 0–20) and live (i.e. face-to-face) poker a median
of 4 times a month (IQR 1–7). Most of this play would be for real stakes, though not necessarily large stakes.

Apparatus
Probability judgment tasks were presented as a sequence of three pencil and paper tasks. Publicly available
computer simulation software (Poker Pro Labs™, 2007) was used to obtain the true probabilities of the
judgment tasks. The ‘random card’ feature on another publicly available poker game simulator
(TheHendonMob.com, 2007) was used to randomise the 3-card flops for two of the tasks (the Flop and
Jack-Ten tasks described below).

Materials
Three different tasks were used to assess probability judgment accuracy. Playing card images were used to
provide the information about the hands.
- The Pre-flop task tested the accuracy of estimating probabilities with five different starting hands against 1, 3, 5, 7 and 9 opponents. No community cards were present. The starting hands selected for analysis were chosen by the experimenter (JL) and were A€A|, K J , 6|6^, 3|4| and 3^8 . These were selected in an attempt to provide a range of true probabilities (SD = 20.63, range = 78.9%) and to provide a range of hands for which quality typically would be perceived as good, bad or intermediate.
- The Flop task tested the accuracy of estimating probabilities with the same five starting hands as used in the Pre-flop task against two hand-picked 3-card flops and three randomly dealt flops, which were different for each starting hand. Each hand was against 5 opponents.
- The Jack-Ten task tested the accuracy of estimating probabilities with the same starting hand (J 10€) in combination with two hand-picked flops and three randomly dealt flops that were different to those in the Flop task. Accuracy was tested for each flop and starting hand combination for 1, 3, 5, 7 and 9 opponents. The purpose of the Jack-Ten task was to see how estimates vary with the number of opponents, so it was necessary to choose a starting hand that was unlikely to be viewed as especially strong or weak, so that a reasonable proportion of the variance in the task came from the different sets of flop cards. The hand-picked flops were included to ensure a variety of different scenarios and thus a spread of true probabilities (for the Flop task, SD = 29.0%, range = 86.4%; for the Jack-Ten task, SD = 32.4%, range = 90.8%).
This gave a total of 75 hands to be judged (25 per task)—see the Appendix for exact details of each hand.
Each task was presented on a single page, with the 25 hands set out in a 5 × 5 grid configuration (e.g. starting
hands by number of opponents in the Pre-flop task). Demographic information was requested, which included


sex, date-of-birth, when poker was first learnt, average frequency of playing online and live poker, and degree
scheme (i.e. major). Five independent judges rated the degree schemes for mathematical content (coded high
vs. low, according to the consensus of the judges), as this should predict mathematical skill.

Design
The study followed a within-subjects design with some additional correlational analyses. The dependent
variable was the probability judgments of the participants. These were compared against the actual
probability of obtaining the best cards, which was obtained by simulating each hand 10 million times. The
very large number of runs for the simulations ensured that this value would differ barely, if at all, from an
analytically derived probability (though such calculations are essentially intractable for many of the hands
presented)—therefore we refer to this value as the ‘true probability’. The independent variables vary between
tasks. In the Pre-flop task the independent variables are the starting hands and the number of opponents faced.
The independent variables in the Flop task are the starting hands and their flops—the number of opponents
remains constant. The independent variables in the Jack-Ten task are the flops and the number of
opponents—the starting hand remains fixed. The order that the hands were presented in each task was
determined by fixed randomisation. The order of presentation for the three tasks was counterbalanced (six
possible task orders).
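The true probabilities were obtained with the simulation software described in the Apparatus section. For illustration, the sketch below shows one way such a Monte Carlo estimate could be produced; it assumes the third-party Python package treys for hand evaluation (not the software used in the study), and the function name and the handling of split pots are our own simplifications:

```python
import random
from treys import Card, Evaluator  # assumed third-party package, not the software used in the study

RANKS = "23456789TJQKA"
SUITS = "shdc"
FULL_DECK = [Card.new(rank + suit) for rank in RANKS for suit in SUITS]

def estimate_best_cards_probability(hole, board, n_opponents, n_runs=100_000):
    """Monte Carlo estimate of the chance that `hole` yields the best final hand.

    hole  -- the player's two starting cards, e.g. ["As", "Ad"]
    board -- the community cards known so far (none, or the three flop cards)
    """
    evaluator = Evaluator()
    hole = [Card.new(c) for c in hole]
    board = [Card.new(c) for c in board]
    unseen = [c for c in FULL_DECK if c not in hole + board]
    wins = 0.0
    for _ in range(n_runs):
        dealt = random.sample(unseen, 2 * n_opponents + (5 - len(board)))
        opponent_holes = [dealt[2 * i:2 * i + 2] for i in range(n_opponents)]
        full_board = board + dealt[2 * n_opponents:]
        my_rank = evaluator.evaluate(full_board, hole)  # lower rank = stronger hand
        best_opponent = min(evaluator.evaluate(full_board, h) for h in opponent_holes)
        if my_rank < best_opponent:
            wins += 1
        elif my_rank == best_opponent:
            wins += 0.5  # crude handling of split pots
    return wins / n_runs

# e.g. a pre-flop pair of aces against five opponents (cf. the 49.3% figure quoted in Study 2)
# print(estimate_best_cards_probability(["As", "Ad"], [], n_opponents=5, n_runs=200_000))
```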

Procedure
Participants provided probability estimates for each task as percentages. All participants were familiar with
the basic rules of poker and hand rankings that determine the winner (see Figure 1b). For each task,
participants were told to estimate the chance of winning if the hand was played out fully to its conclusion with
all players remaining in the game (i.e. the probability of obtaining the best cards). All opponents’ hands were
unknown. Participants were told explicitly how the true probabilities had been calculated (play for each hand
was simulated 10 million times to give an accurate estimate of the true probability). Task order was fully
counterbalanced, and each ‘batch’ of six participants received one of the six possible task orders (randomised
within batch). Each task required 25 probability estimates to be made. The tasks were completed ‘unaided’
(i.e. participants did not have calculators, or books on poker strategy or theory, to hand), and
participants typically took about 25–30 minutes to complete the three tasks and to provide demographic
information.

Results
In order to assess estimation accuracy, the signed difference (judged probability minus true probability) and
the unsigned difference (the magnitude of the signed difference) were calculated for each judged value. Two
aggregate measures reflecting judgment accuracy were obtained from these signed and unsigned differences:
bias and absolute deviation. Bias is the mean of the signed differences, and absolute deviation is the mean of
the unsigned differences. In the items analysis below, this averaging was performed across participants and
within hands—therefore, the aggregate measures reflect how accurately each hand was judged. In the
subsequent analysis of individual participants, this averaging was performed across hands and within
participants—therefore, the aggregate measures reflect how well each participant performed. Absolute
deviation served as the primary indicator of accuracy—a score of zero indicates perfect judgment, though a
non-zero score does not indicate the direction of error. Negative values of the absolute deviation are not
possible—the greater the (positive) value the less accurate the judgment in absolute terms. Bias, which can be
positive or negative, gave a measure of (absolute) overestimation or underestimation, respectively.
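A minimal sketch of these two aggregate measures (the function name and the illustrative values are ours):

```python
import numpy as np

def bias_and_absolute_deviation(judged, true):
    """Aggregate accuracy measures as defined above (values in percentage points).

    judged, true -- equal-length sequences of judged and true probabilities, averaged
    either across participants within a hand (items analysis) or across hands within
    a participant (analysis of individual participants).
    """
    signed = np.asarray(judged, dtype=float) - np.asarray(true, dtype=float)
    bias = signed.mean()                       # positive values indicate overestimation
    absolute_deviation = np.abs(signed).mean()
    return bias, absolute_deviation

# e.g. three judgments with hypothetical values:
print(bias_and_absolute_deviation([60, 25, 80], [50, 30, 78]))  # (2.33..., 5.67...)
```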


Items analysis
We analysed each of the 75 hands for which participants supplied probability judgments by calculating the
mean probability judgment, absolute deviation and the bias for each hand. Note that a bias of zero for a given
hand does not necessarily equate to no bias on the part of individuals—merely that participants who
overestimated the hand are ‘balanced out’ by those who underestimated it. The true and mean judged
probability for each hand are given in the Appendix.
We calculated the correlation between judged and true probability for each participant. There was always a
good match between judged and true probabilities (in fact, an excellent match in most cases) with correlations
ranging from .64 to .95 (median of .88, IQR of .85–.92). Figure 2(b) illustrates these correlations for four
example participants: two participants representing the lower quartile by performance, and two representing
the upper quartile by performance. Fit lines are shown (solid lines): a quadratic function is plotted when this is a
significantly better fit to the data than a linear one (otherwise a linear function is shown). Examination of
these example participants indicates that the most accurate participants provided judgments that matched
very well to the true probabilities, and that even participants with below-average accuracy provided
appropriate judgments for the majority of hands. The relationship between the mean judged and true
probabilities was very strong, r = .96, p < .001 (Figure 2a). The same relationship was found using median
judgment in place of mean judgment (r = .96).

Figure 2. Study 1: Scatter-plots to illustrate the accuracy of probability estimates. (Dotted line is the identity line; solid
line is the best fit line.) (a) Mean estimated probability (averaged within hands across participants) plotted against true
probability. (b) Example participants representing the: (i) Lower quartile (left) and (ii) upper quartile (right) for accuracy.
Example participants were determined according to the correlation between judged and true probabilities (upper pair of
participants) and absolute deviation (lower pair)


Figure 2 illustrates that, in comparison to most studies of probability judgment, participants’ judgments
were very well calibrated. Figure 2(a) shows a general tendency to overestimate (i.e. positive bias), with many
points sitting a little below the identity line. A two-step hierarchical regression with true probability as the
dependent variable was used to determine that a quadratic function of mean judged probability was a better fit
than a straight line for these data (linear → quadratic, significant R² change = .01, F(1,72) = 8.86, p = .004).
The overall regression was significant, R² = .93, F(2,72) = 490.6, p < .001. This function, which is shown in
Figure 2(a), illustrates that overestimation is greatest for hands where the probability of obtaining the best
cards is low or moderate (i.e. 10–50% chance), but, on average, is minimal for hands where this probability is
high (>70%). Unsurprisingly, linear regression with bias as the dependent variable confirmed this pattern:
bias is significantly better described by a quadratic function of true probability than by a linear function
(linear → quadratic, significant R² change = .05, F(1,72) = 4.80, p = .032). This inverted-U-shaped function
gives maximum expected bias of 10.1% when the true probability is in the range 10–21% (i.e. a relatively flat
maximum), and a bias of zero for a true probability of 74%.
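The following sketch illustrates this kind of hierarchical (linear versus quadratic) comparison via an R² change F-test; it assumes the statsmodels package and does not reproduce the study data (the variable names in the usage comment are ours):

```python
import numpy as np
import statsmodels.api as sm

def linear_vs_quadratic(x, y):
    """Hierarchical regression of y on x: does adding x**2 improve on a straight line?

    Returns the two R-squared values and the F-test for the R-squared change,
    analogous to the linear -> quadratic comparisons reported above.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    linear = sm.OLS(y, sm.add_constant(x)).fit()
    quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x ** 2]))).fit()
    f_stat, p_value, _df_diff = quadratic.compare_f_test(linear)
    return linear.rsquared, quadratic.rsquared, f_stat, p_value

# usage (hypothetical variable names): linear_vs_quadratic(mean_judged_probability, true_probability)
```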
Figure 3 shows the absolute deviation for each hand, plotted as a function of the true probability. Again,
linear regression with absolute deviation as the dependent variable found that a quadratic function of true
probability was a better fit to these data than a linear one (linear → quadratic, significant R² change = .26,
F(1,72) = 25.8, p < .001). This quadratic function accounted for 26% of the variance in mean absolute
deviation, F(2,72) = 12.9, p < .001. Thus, in absolute terms, judgments deviated least from the true value
near the endpoints of the probability scale (especially the upper region), and deviated more from the true
value for hands with an intermediate true probability. However, if accuracy is assessed relative to the true
probability, then judgments for hands with low true probabilities are clearly the least accurate. For instance,
for a true probability of 10% the expected judgment is double what it should be and the expected mean
absolute deviation is 122% of the true value. In contrast, for a true probability of 50%, the expected
judgment is 1.13 times what it should be and the expected mean absolute deviation is 34% of the true value.
Figure 3 also illustrates a tendency for participants to be least accurate on the Flop task (higher absolute
deviations). The mean (SD) absolute deviation was 12.4 (4.0) for the Pre-flop task, 16.9 (5.8) for the Flop task
and 11.2 (3.7) for the Jack-Ten task. A single-factor between-groups ANOVA (using data for hands as the
cases) confirmed that there was a significant effect of task upon the mean absolute deviation, F(2,72) = 10.4,

Figure 3. Study 1: Accuracy (absolute deviation) by true probability and task for the 75 hands, with quadratic fit line
plotted


Table 1. Study 1: Mean (SD) judgment, absolute deviation and bias by number of opponents (Pre-flop and Jack-Ten tasks only)

Measure (%)                       Number of opponents
                                  1              3              5              7              9
Probability judgment              59.7 (22.6)    48.3 (23.4)    39.6 (22.5)    32.9 (22.1)    27.3 (21.6)
Absolute deviation                9.47 (2.66)    12.36 (3.19)   13.30 (3.39)   12.52 (4.18)   11.37 (5.18)
Bias                              –1.53 (4.71)   7.76 (5.72)    7.98 (6.25)    6.42 (5.71)    4.18 (5.67)
Equiprobability anchor values     50.00          25.00          16.67          12.50          10.00

Note: Summary measures are obtained for each hand by averaging across participants within that hand; the means and standard deviations shown are for these summary measures across hands.

MSe = 21.2, p < .001 (R² = .22). A Tukey’s HSD post hoc test showed that the mean absolute deviation on
the Flop task was significantly higher (i.e. worse) than that for the other two tasks.2
The tasks require participants to integrate information about different components of the game of poker:
the starting hand, the community cards and the number of opponents. The last of these variables was varied
systematically in the Pre-flop and Jack-Ten tasks—and the results for these 50 hands are shown in Table 1.
Table 1 shows a tendency towards smaller absolute deviation for 1 or 9 opponents. One-way ANOVA
indicated that there was no significant effect of the number of opponents upon absolute deviation,
F(4,45) = 1.50, MSe = 14.5, p = .219 (R² = .12)—however, a polynomial contrast showed that the quadratic
trend was significant, F(1,45) = 4.72, p = .035. Bias also follows an inverted-U-shaped function of the
number of opponents, with the least bias for 1 or 9 opponents, and the greatest bias for 5 opponents. One-way
ANOVA showed that there was a significant effect of number of opponents upon bias, F(4,45) = 4.87,
MSe = 31.8, p = .002 (R² = .30). A Tukey HSD post hoc test showed that the mean bias for hands
facing 1 opponent was significantly different from that for hands facing 3, 5 and 7 opponents, and a
polynomial contrast showed that the quadratic trend was significant, F(1,45) = 13.87, p = .001.
Table 1 shows that judged values are not especially close to the equiprobability anchors. Therefore, if
participants do anchor on these values, they also adjust away from them to a considerable degree. Moreover,
adjustments from these potential anchors are variable in size: standard deviations are large, and, for instance,
the average probability judgment is 10% above the equiprobability anchor for 1 opponent but 23% above the
anchor for 3 opponents.

Analysis of individual participants


The absolute deviation and bias were calculated for each participant. The median absolute deviation was
12.7% (range 7.1–24.1%, IQR 10.3–15.8%) and the median bias was 8.2% (range of –15.0 to 23.0%, IQR
2.4–10.1%). Thus over three-quarters of our participants generally over-estimated the probabilities, and the
‘average participant’ typically provided estimates that deviated (in absolute terms) by 12.7% above or below
the true probability. We examined whether accuracy (measured by absolute deviation or the correlation
between judged and true values) was predicted by age, frequency of play, mathematical content of degree,
experience (time since learned) or club ranking. Non-parametric correlation was used for this analysis due to
the presence of some outliers (e.g. the most experienced player had 104 months experience, whereas no other
player had played for more than 64 months). Frequency of play significantly predicted the correlation
between judged and true probabilities, r = .39, p = .018—those who played more frequently provided

2. Independent sample t-tests were conducted to see if there was any difference in absolute deviation between the handpicked hands and the random hands. No significant differences were found for the Flop task or the Jack-Ten task. These tasks were not combined for this analysis, as there may have been some effect of varying numbers of opponents in the Jack-Ten task.


judgments that ranked hands more correctly. The relationship between absolute deviation and amount of time
since the participant first learned to play poker was also moderate, r = –.32, p = .056—more experienced
players made more accurate judgments. All other correlations were weak and non-significant (all |r| < .23, all
p > .184). Perhaps surprisingly, then, club rankings based on 13 weekly tournaments did not significantly
predict task accuracy, so it is possible that the ability to assess the probability of achieving the best hand is not
crucial to becoming successful at poker. However, we should note that these rankings were only available for
22 participants, and, with club players attending for differing numbers of weeks, are not easily combined into
a reliable indicator for comparing performance across individuals.

Discussion
One of the most striking features of this data set is that participants’ judgments were generally well calibrated,
exhibiting just a small tendency to overestimation (mainly at the lower end and in the mid-range of the
response scale). We will reserve detailed discussion of the reasons for this uncommonly high level of
performance until the General Discussion, as Studies 2 and 3 throw additional light on the process by which
judgments are made. Therefore, in discussing this study, we focus on understanding why participants found
some hands easier to judge than others.
Figure 2 shows that participants are most accurate when the chance of obtaining the best cards is relatively
high. It is important to note that these situations cannot simply be identified by using just one of the three
components of the game (starting hand, community cards or number of opponents). For instance, for many
situations with 1 opponent there was only a moderate chance of obtaining the best cards, and holding a strong
starting hand (e.g. A|A€) was no guarantee of a high probability of obtaining the best cards once the flop is
dealt. Thus the successful identification of high probability situations (and appropriately discriminating these
from other situations) is evidence of successful information integration of at least two, or very likely more,
sources of information.
Performance seemed to be dramatically worse for some hands on the Flop task than for hands on the other
tasks with similar true probabilities. Three (out of the four) hands with the highest absolute deviations on the
Flop task had true probabilities of 43.6, 44.8 and 46.8%. All were overestimated by a margin of more than
20% on average. Each of these three hands had the same starting hand of A|A€. These cards (a pair of aces
from any two suits) are the best starting hand that can be dealt. People may have overestimated their
judgments because they focused on the perceived strength of this starting hand and failed to account for
situations where they may lose. This is consistent with the inside view (Kahneman & Lovallo, 1993). Data
from the Pre-flop task support this, as the highest absolute deviations (i.e. the greatest absolute discrepancies
from the true values) are for starting hands of A|A€, K J and 6^6|. These are the best three starting
hands of the five starting hands tested in this task and are generally regarded as good cards amongst
experienced poker players. Therefore, it seems that the tendency for overestimation is greatest when a player
is holding a strong starting hand.
However, overestimation was not restricted to situations where the starting cards were strong. For
example, a starting hand of 3^8 with a flop of 10 7|8€ was one of the most overestimated hands in the
Flop task with an absolute deviation of 22.1% and a bias of 21.6%. This starting hand is among the
worst a player can be dealt due to its low value cards and poor drawing possibilities (e.g. straights and flushes
are unlikely). A pair has been made, but the chance of improvement with additional community cards (i.e. the
turn and the river) is low. The true probability for this hand is 14.5%, and, although participants were often
positively biased for hands with low true probabilities, judgments for this hand were especially inaccurate in
comparison to other hands with a similar true probability. Consistent with the inside view, players may be
using a positive test strategy (Klayman & Ha, 1987; Mussweiler & Strack, 1999) to evaluate their simulations
of future card draws. Focussing on how additional cards can improve the chances of winning whilst failing to
consider how these same cards might assist other players could explain the general tendency to overestimate.


Accuracy in the Flop task was significantly worse than for the other two tasks. The Flop task required
participants to make judgments based on different combinations of starting hands and 3-card flops. However,
the lower performance in this task cannot be attributed to difficulties in integrating information concerning
the flop cards, as this was also a requirement of the Jack-Ten task. The number of opponents was fixed at 5
for the Flop task. One might assume that this meant that the chances of obtaining the best cards were not as
extreme as for the other tasks where players sometimes faced 1 or 9 opponents. However, the Jack-Ten and
Flop tasks both had the same number of hands where the true probability fell in either the 0–15% or the
85–100% range. One possibility is that by varying the number of opponents in the Jack-Ten and Pre-flop tasks, this
information became more salient—encouraging participants to adjust more effectively for the number of
opponents. The number of opponents is a relevant, if indirect, source of base-rate information (cf Cohen,
1981)—and it is a common finding that such information is often underutilised when its importance is not
stressed (Koehler, 1996). Thus, participants may be more likely to adjust estimates to take account of the
5 opponents that they face when they are also asked to consider 1, 3, 7 and 9 opponents (as in the Jack-Ten and
Pre-flop tasks) than when the number of opponents is fixed at 5 across the task (mean absolute deviation of
15.2% in the Flop task versus 10.6% with 5 opponents in the other two tasks). Thus although Table 1 indicates
that it may be slightly harder to judge hands with 5 opponents than with 1 opponent (perhaps because
1 opponent is a more common occurrence in actual play than 5 opponents), the greater difficulty of the Flop
task is not solely attributable to the fact that 5 opponents were faced—though differences between tasks in
salience of those 5 opponents may be important.

STUDY 2

Despite the overall high level of performance, Study 1 provides several converging pieces of evidence of a
tendency to rely slightly too strongly on the value of the starting hand to the detriment of fully incorporating
other information (the number of opponents or the flop cards). For instance, overestimation is greatest for
strong starting hands and when the number of opponents is plausibly less salient. This could be viewed in
terms of insufficient adjustment from a self-generated anchor (Epley & Gilovich, 2001; Tversky &
Kahneman, 1974): anchoring on the value of the starting hand and adjusting insufficiently for the flop cards or
the number of opponents. It could also be viewed as the result of sub-optimal simulation associated with an
‘inside view’ or a positive test strategy (Kahneman & Lovallo, 1993; Klayman & Ha, 1987): focussing on
what might enhance the value of one’s own hand with insufficient regard to how this might affect the strength
of one’s opponents’ hands (and the fact that there are multiple opponents). Study 2 builds on the groundwork
of Study 1 by directly testing for anchoring on starting card strength, consistent with over-reliance on the
‘inside view’.

Method
Participants
Twenty-one male poker players from the University of Essex participated; five of them indicated that they had
participated in Study 1. Participant characteristics were similar to Study 1, including age (mean of 20.7, range
18–25 years), frequency of live play (median of 6 times per month, IQR 4–12) and online play (median of
10 times per month, IQR 1–20). Participants in this study had generally been playing slightly longer than
those in Study 1: median of 36 months (IQR 23–51).

Apparatus and materials


The apparatus was the same as for Study 1, and the task was again presented as a pencil and paper task.
Variants of two hands used in the Flop task and two hands used in the Jack-Ten task of Study 1 were used,


giving four ‘sets’ of five cards to examine. Each set of five cards was then divided into two subsets in various
ways to create several starting hand and 3-card flop combinations (shown in Table 2). This created 14 hands
for which participants were to assess the probability of obtaining the best cards against 5 opponents.
- Set 1 (A|A€3^3€3 ) had three variants: starting hands A|A€, A|3 and 3€3 . (For instance: with a starting hand of A|A€ the flop was 3^3€3 , whereas, with a starting hand of A|3 the flop was A€3^3€.)
- Set 2 (A|Q K€J 10€) had four variants: starting hands A|K€, A|Q , K€J and J 10€.
- Set 3 (J J^10 10€2|) had three variants: starting hands J J^, J^10 and 10 10€.
- Set 4 (3 K J 9 10^) had four variants: starting hands K 3 , 10^9 , 10^3 and 9 3 .

The cards used as starting hands were selected to provide a range that varied in strength. The starting hand
and flop combinations provided a spread of true probabilities (SD = 26.6%, range = 64.4%).

Design and procedure


A within-subjects design was used, which also permitted some correlational follow-up analyses. The hands
were presented in a four-page booklet such that no two hands from the same set were on the same page. Each
of the first three pages showed four hands, and the last page showed two hands and requested demographic
information. The order of presentation on each page was determined by fixed randomisation, but the order of
the first three pages was randomised to provide some protection against order effects.
Here we briefly explain how the design of Study 2 allows us to test the main conjectures raised by Study 1.
It is important to note that the probability of obtaining the best cards is not solely dependent upon which five
cards are in a set—it depends also on which two are held as starting cards and which three belong to the
community cards. For instance, in Set 1 the player is holding a ‘full house’ (a pair plus three-of-a-kind),
which is a high-value hand. However, holding A-3 gives a better chance of obtaining the best cards than
holding A-A, because community cards of 3-3-3 give opponents more opportunities for a strong hand than
community cards of A-3-3. For instance, any player holding the remaining ‘3’ can make four-of-a-kind with
3-3-3 and beat the player’s full house. There is a 21% chance that 1 of the 5 opponents is holding this card
(3|), and, in addition there are other winning possibilities for the opponents depending upon the remaining
community cards that are dealt. Note that this is the case even though A-A represents a stronger starting hand
than A-3. Before any community card is dealt, a starting hand of A-A has a 49.3% chance of yielding the best
hand against 5 opponents, whereas with A-3 this probability is only 17.9% (barely different from the
equiprobability base rate of 16.7% when there are six players around the table). Thus, a tendency to anchor
upon (or to otherwise overweight) the strength of the starting hand and to take insufficient account of
possibilities that give opponents good hands would lead to overestimation of A-A matched with 3-3-3 relative
to A-3 matched with A-3-3.
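The 21% figure can be verified with a short combinatorial check (a minimal sketch):

```python
from math import comb

# A-A in hand and 3-3-3 on the flop leaves 47 unseen cards, of which the five
# opponents hold 10. The chance that the one remaining '3' is among those 10 cards:
p_nobody_has_the_three = comb(46, 10) / comb(47, 10)
print(1 - p_nobody_has_the_three)  # 0.2127..., i.e. roughly the 21% quoted above
print(10 / 47)                     # the same value calculated directly
```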
Participant instructions and the procedure for collecting data were the same as for Study 1.

Results
All task data were complete, though three participants failed to provide complete demographic data. A
typographic error in preparing the materials meant that one of the hands in Set 2 was not an exact re-
arrangement of the other hands in the set. With the starting hand as A|K€ the flop was presented as
Q|J 10€ when it should have been presented as Q J 10€. The true probability for the hand presented was
85.5%, when it would have been 76.2% for the intended flop. The data were treated in the same way as in
Study 1.

Table 2. Study 2: Comparison of judged probabilities (organised by strength of starting hand)

Stronger starting hand [pre-flop probability] | Flop cards | Mean judgment (%) | True probability (%) | Weaker starting hand [pre-flop probability] | Flop cards | Mean judgment (%) | True probability (%) | True probability difference | Mean judged difference (SD) | t(20): true vs. judged difference
(Difference columns: stronger minus weaker starting hand.)

Set 1
A|A€ [49.3] | 3€3 3^ | 87.1 | 74.6 | A|3 [17.9] | 3€3^A€ | 91.6 | 94.5 | –19.9 | –4.4 (14.5) | 4.88
A|3 [17.9] | 3€3^A€ | 91.6 | 94.5 | 3€3 [16.3] | 3^A|A€ | 80.7 | 81.8 | 12.7 | 10.9 (17.9) | 0.46
A|A€ [49.3] | 3€3 3^ | 87.1 | 74.6 | 3€3 [16.3] | 3^A|A€ | 80.7 | 81.8 | –7.2 | 6.5 (12.2) | 5.12

Set 2
K€A| [28.3] | Q|J 10€ x | 88.5 | 85.5 | A|Q [26.5] | J 10€K€ | 77.1 | 76.2 | 9.3 | 11.5 (15.0) | 0.66
K€A| [28.3] | Q|J 10€ x | 88.5 | 85.5 | J 10€ [22.2] | Q A|K€ | 83.4 | 85.5 | 0.0 | 5.1 (19.6) | 1.19
K€J [24.2] | A|10€Q | 88.3 | 85.5 | J 10€ [22.2] | Q A|K€ | 83.4 | 85.5 | 0.0 | 4.9 (18.3) | 1.23
A|Q [26.5] | J 10€K€ | 77.1 | 76.2 | J 10€ [22.2] | Q A|K€ | 83.4 | 85.5 | –9.3 | –6.4 (20.8) | 0.64
A|Q [26.5] | J 10€K€ | 77.1 | 76.2 | K€J [24.2] | A|10€Q | 88.3 | 85.5 | –9.3 | –11.3 (17.4) | 0.52
K€A| [28.3] | Q|J 10€ x | 88.5 | 85.5 | K€J [24.2] | A|10€Q | 88.3 | 85.5 | 0.0 | 0.2 (6.5) | 0.14

Set 3
10 10€ [30.1] | J^J 2| | 53.3 | 38.2 | J^10 [22.2] | 10€2|J | 74.9 | 63.1 | –24.9 | –21.6 (15.1) | 1.00
J J^ [33.8] | 10€10 2| | 58.6 | 39.2 | J^10 [22.2] | 10€2|J | 74.9 | 63.1 | –23.9 | –16.3 (16.1) | 2.17
J J^ [33.8] | 10€10 2| | 58.6 | 39.2 | 10 10€ [30.1] | J^J2| | 53.3 | 38.2 | 1.0 | 5.3 (14.1) | 1.41

Set 4
3 K [19.4] | J 9 10^ | 42.2 | 40.0 | 10^3 [11.7] | K 9 J | 23.2 | 11.8 | 28.2 | 19.1 (19.2) | 2.19
9 10^ [19.6] | 3 J K | 35.0 | 26.4 | 9 3 [14.9] | K J 10^ | 31.5 | 32.7 | –6.3 | 3.5 (13.3) | 3.38
9 10^ [19.6] | 3 J K | 35.0 | 26.4 | 10^3 [11.7] | K 9 J | 23.2 | 11.8 | 14.6 | 11.8 (14.8) | 0.87
9 3 [14.9] | K J 10^ | 31.5 | 32.7 | 10^3 [11.7] | K 9 J | 23.2 | 11.8 | 20.9 | 8.3 (17.6) | 3.27
3 K [19.4] | J 9 10^ | 42.2 | 40.0 | 9 3 [14.9] | K J 10^ | 31.5 | 32.7 | 7.3 | 10.7 (15.5) | 1.01
9 10^ [19.6] | 3 J K | 35.0 | 26.4 | 3 K [19.4] | J 9 10^ | 42.2 | 40.0 | –13.6 | –7.2 (15.6) | 1.87

*p < .05; **p < .01; ***p < .001.
x Intended to be Q J 10€—for which the true probability is 76.2%.

Items analysis and accuracy


The overall pattern of results was highly consistent with Study 1. The correlation between the judged and true
probability was strong or very strong for almost all participants, median r = .90 (range .57–.96, IQR .84–.95),
indicating a very good or excellent ability to distinguish good and bad hands for most participants. Again,
when averaging across participants within hands, judgmental accuracy as measured by the correlation
between the mean judged and true probabilities was excellent, r = .97, p < .001. The mean (SD) of the
absolute deviation for the 14 hands was 12.9% (4.5%). Again, there was a slight tendency towards
overestimation, with the mean (SD) bias for the 14 hands being 5.7% (7.2%). Four hands were slightly
underestimated (by no more than 3%) and 10 were overestimated (by up to 19.4%).

Comparison of hands by set


Overall, 67.7% of individual judgments were above the true probability for that hand (i.e. positively biased)
and 31.3% of individual judgments were below the true probability (i.e. negatively biased). However, the
main purpose of Study 2 was to determine whether giving undue weight to the strength of the starting hand
can explain the degree of overestimation. This was analysed by means of a pair-wise comparison between the
hands within each set (Table 2). For each pair of hands we calculated the difference in judged probability
between the two hands by subtracting the judged probability for the hand with the weaker starting hand from
the judged probability for the stronger starting hand (for each participant). The strength of the starting hand
was determined by calculating the probability that the starting hand would yield the best final hand of
cards—but making this calculation before any community cards are dealt. These probabilities were obtained
using a 10-million run simulation, and are shown in square brackets in Table 2.3 The judged difference was
then tested against the true difference by means of a series of one-sample t-tests. If hands with stronger
starting hands are judged more favourably than those with weaker starting hands, the judged difference will
be more positive than the actual difference. Where the true probability difference is negative, this
means the judged probability difference will be a smaller negative value or a positive value; where the true
probability difference is positive, the judged probability difference will be a larger positive value.
Differences consistent with this hypothesised direction of effect are indicated by a positive t-value in Table 2.
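A minimal sketch of this one-sample t-test against the true difference, assuming the scipy package (the function and array names are ours):

```python
import numpy as np
from scipy import stats

def judged_vs_true_difference(judged_strong, judged_weak, true_strong, true_weak):
    """One-sample t-test of the judged difference against the true difference.

    judged_strong, judged_weak -- the 21 participants' judgments (%) for the two hands
    true_strong, true_weak     -- the corresponding true probabilities (%)
    """
    judged_diff = np.asarray(judged_strong, float) - np.asarray(judged_weak, float)
    true_diff = true_strong - true_weak
    t, p = stats.ttest_1samp(judged_diff, popmean=true_diff)
    # Positive t: the judged difference is more positive than the true difference,
    # i.e. relatively more favourable judgments for the stronger starting hand.
    return judged_diff.mean(), true_diff, t, p
```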
Table 2 shows that 13 of the 18 judged probability differences are more positive than the true difference
(five significantly so)—consistent with a greater tendency for overestimation for stronger starting hands.
Five comparisons fall in the opposite direction, two of which are significant. Notably there are two instances
where the judged ordering of the hands reverses the actual ordering. In Set 1, A|A€ with 3€3 3^ is
assessed more favourably than 3€3 with 3^A|A€ when the reverse is true. Similarly, in Set 4,
9 10^ with 3 J K is incorrectly assessed to give a better chance of yielding the best final hand of
cards than 9 3 with K J 10^. Set 2 has three comparisons for which the true probability is the same for
both hands (85.5%)—in each case the hand with the stronger starting hand has a slightly higher mean judged
probability.

Discussion
Study 2 confirmed the finding of Study 1 that moderately experienced poker players could provide accurate
judgments of their chances of obtaining the best cards when their opponents’ cards are unknown and two
community cards remain to be drawn. In keeping with Study 1, judgments were tainted by a degree of

3. In calculating and using this formal measure of starting hand strength we are not presuming that players know these values. However, it is likely that players’ perception of starting hand strength will have a strong rank correlation with these calculated values, and that training in poker strategy means that skilful players could have an approximate interval scaling of starting hand strength that corresponds closely to this. This therefore serves as a valuable measure for assessing the cues that players attend to.


overestimation. Moreover, Study 2 also sheds light on the ‘margins’ of performance: illuminating why, in
some instances, performance, even if good, is less than perfect. We illustrate this by reference to the pair-wise
comparison of different hands.

Set 1: A|A€3^3€3 variants


The chance of obtaining the best cards with a starting hand of A♣A♦ was substantially overestimated, which
is consistent with judgments made in Study 1. This is not simply indicative of a general tendency to
overestimate. The mean judgment for this hand was significantly closer to that for a starting hand of A♣3♥
than it should have been, and higher than that for a starting hand of 3♦3♥ when it should have been lower
(also a significant effect). This is consistent with over-valuing the benefits of holding a pair of aces relative to
the moderately good opportunities that three 3’s in the community cards offer to opponents.

Set 2: A♣Q♥K♦J♥10♦ variants


Average judgments for this set were quite accurate, showing only a slight tendency towards overestimation.
Nonetheless estimates were consistent with the proposal that starting hand strength exerts undue influence on
probability judgments. Over-estimation was (just) greatest for the hand with the strongest starting hand in the
set (K♦A♣)—whereas the only hand that was underestimated was the one with the weakest starting hand
(J♥10♦).

Set 3: J♥J♠10♥10♦2♣ variants


There was a general tendency towards overestimation for hands in this set, consistent with the lower accuracy
of judgment for hands with a low or moderate true probability. However, importantly, overestimation
increased with increasing starting hand strength: overestimation by 11.8% for the weakest starting hand
(J♠10♥) but overestimation by 19.4% for the strongest starting hand (J♥J♠). This is what would be expected
if players anchor on (or overweight) the strength of starting hands but adjust insufficiently for other
information. Thus, although participants’ judgments do reflect the fact that a starting hand of J♠10♥ is more
likely to yield the best final hand of cards than a starting hand of J♥J♠ (given their respective flops), they
underestimate the extent to which this is the case.

Set 4: 3♥K♥J♥9♥10♠ variants


Judgments for these hands, all with relatively low true probabilities, were fairly precise. The least accurate
judgments were for the 10♠3♥ starting hand, which was actually the weakest starting hand in the set. This
runs counter to the hypothesis that players will be unduly influenced by the strength of the starting hands, and
is the source of the only two significant effects that run counter to the hypothesis that overestimation is
greatest when starting hands are strongest. However, the overestimation of this hand relative to others in the
set does raise a further interesting possibility. The three hearts in the flop for this hand could be viewed as an
attractive ‘unit’ offering a player good opportunities for a flush (five hearts). However, if viewed from the
‘inside’ rather than the ‘outside’, a player may fail to appreciate the full impact of the fact that these cards also
offer similar opportunities to other players. A similar argument could be made for the three 3’s in the flop of
the most overestimated hand in Set 1. The other three hands in this set follow the pattern of increasing
overestimation with increasing strength of the starting cards.


STUDY 3

Studies 1 and 2 show remarkably accurate probability judgments by poker players. However, we conducted
one further study to determine whether we had unwittingly helped our participants to provide a well-
calibrated set of judgments. In actual play, a poker player need only consider the strength of his cards, or his
chances of winning, for one hand at a time. However, in Study 1 participants were presented with 25 hands on
a single page, and in Study 2 participants saw three or four hands per page. One possibility is that this
encouraged explicit comparisons between hands—and that simply by ranking the hands in a somewhat
appropriate manner and spreading their judgments over the range of possible responses participants were able
to provide fairly accurate assessments. Moreover, in Study 1, participants may have been ‘cued’ to consider
the number of opponents as a relevant variable by the fact that the number of opponents was listed explicitly
for the Pre-flop and Jack-Ten tasks. Therefore, Study 3 used a more ecologically valid presentation format for
the task information (that mimicked online poker), made no explicit reference to variation in the number of
opponents, and showed participants only one hand at a time. The key findings of Studies 1 and 2 were then re-
examined in this new data set, collected using an information presentation format that more closely approximated
actual poker play.

Method
Participants
Twenty-eight male poker players from the University of Essex Poker Society with a mean age of 20.2 years (range 18–
24) participated. The median time since learning to play was 28 months (IQR 14–42), and the median
frequency of play was 6 times per month (IQR 4–10) for live play and 10 times per month (IQR 4–20) for
online play. Eleven participants reported participating in either Study 1 or 2—no one had participated in both
studies.

Apparatus and materials


Thirty-five hands used previously in Studies 1 and 2 were selected. The selection of hands ensured that a
range of hands would be judged within a reasonable amount of time, permitting us to re-test the conclusion
that moderately experienced poker players show high levels of skill in probability judgment. The inclusion of
several sets of hands from Study 2 allowed us to test the robustness of the findings of Study 2. Ten hands were
selected from Study 2: all hands from Sets 1, 3 and 4 (see Table 2). Ten hands from the Pre-flop task were
included: each of the five pairs of starting cards was included, once with a small number of opponents (1 or 3,
randomly selected) and once with a large number of opponents (7 or 9, randomly selected). Ten hands from
the Jack-Ten task were included, comprising two flops each played against 1, 3, 5, 7 and 9 opponents. (These
flops were selected to have the greatest variation in true probabilities—thereby providing the strictest test of
skill.) Five hands from the Flop task were included: each flop was paired with a starting hand of K♥J♥ and
played against 5 opponents. (We selected fewer hands from the Flop task than for the Pre-flop and Jack-Ten
tasks, as we were already including 10 hands from Study 2, for which the task and number of opponents were
the same.)
The information for each of the Pre-flop hands was shown on a separate page. The layout was
diagrammatic, and followed the style of presentation used in online poker (see Figure 4a). Thus, the players were
shown seated around the table, meaning that participants were required to count the number of opponents if they
wished to know this information. The position of the opponents around the table was determined randomly
for each hand. In order to mirror the pattern of information acquisition in actual play, the information for the
Jack-Ten, Flop task and Study 2 hands was shown over two pages (see Figure 4b). First, participants learned
their starting hands and then turned the page to see the flop cards (still seeing their starting cards) before making their judgment.


Figure 4. Study 3: presentation of task information: (a) Pre-flop judgments and (b) post-flop judgments presented on two
successive pages.

Thus, Study 3 represents a more ecologically valid task than Studies 1 and 2 in three
respects: (1) Task-relevant information is presented as it is in actual online play, (2) information becomes
available sequentially as in actual play, and (3) hands are considered one at a time, so removing the possibility
that simultaneous presentation of hands enhances judgment by making key variables (e.g. number of
opponents) more salient or allowing explicit ranking among hands. Note also that this procedure meant that
participants were required to switch between pre-flop and post-flop judgments—just as they would be if they
were making a variety of judgments over the course of a session of play. This also diminished the similarity
between successive judgments, making it harder to make useful direct comparisons between successive
hands.

Design and procedure


The 35 hands were arranged into a random order and divided as equally as possible between four booklets,
subject to the constraints that no two hands from the same set of Study 2 appeared in the same booklet and that
two hands using identical cards never appeared next to each other. The order of the booklets was randomised
for each participant. These constraints and this randomisation procedure ensured that: (a) We randomised the
presentation order of hands from Study 2 for which differences would be examined, (b) we discouraged the


use of anchoring on the previous judgement for identical cards, which would hardly ever occur in actual play,
and (c) in general, a reasonable mixing of the order of hands occurred between participants.
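One way to implement the constrained randomisation just described is sketched below; the authors do not report their exact procedure, so this is a simple rejection-sampling illustration with assumed data structures:

```python
# Hypothetical illustration of the booklet construction described above: shuffle the
# 35 hands and re-draw until (i) no booklet contains two hands from the same Study 2
# set and (ii) no two hands built from identical cards are adjacent in the sequence.
import random

def arrange_booklets(hands, n_booklets=4, seed=0):
    """hands: list of (hand_id, card_key, study2_set) tuples, where card_key is shared
    by hands that use identical cards and study2_set is a set label or None."""
    rng = random.Random(seed)
    for _ in range(100_000):
        order = hands[:]
        rng.shuffle(order)
        # split into near-equal contiguous booklets
        size, extra = divmod(len(order), n_booklets)
        booklets, start = [], 0
        for i in range(n_booklets):
            end = start + size + (1 if i < extra else 0)
            booklets.append(order[start:end])
            start = end
        # (a) no booklet contains two hands from the same Study 2 set
        sets_ok = all(
            len([s for _, _, s in b if s is not None]) ==
            len({s for _, _, s in b if s is not None})
            for b in booklets)
        # (b) hands using identical cards are never adjacent in the overall sequence
        adjacency_ok = all(order[i][1] != order[i + 1][1]
                           for i in range(len(order) - 1))
        if sets_ok and adjacency_ok:
            return booklets
    raise ValueError("no valid arrangement found")
```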
Participants were instructed to work through the booklets in the order that they were presented to them (i.e.
as randomised by the experimenter) and to work through each booklet in order. They were asked to work
carefully through the booklets, providing an estimate each time it was asked for. Participants were asked not
to go back and change answers once they had moved on to the next scenario. This was to further guard against
accuracy being artificially inflated by explicit comparison or ranking among hands. After completing the
task, participants provided demographic information (age, sex and degree scheme) and information about
their poker playing (when they learned, and frequency of live and online play). Data were collected at club
poker tournaments, with players completing the task when they had finished playing or during breaks
between play. Participants took approximately 10–30 minutes to complete the task, and received a fixed
participation fee of UK£4 (approximately US$6 at the time of the study).

Results
Five participants missed out one estimate each, which was presumably the consequence of turning two pages
at once, or not realising that the back cover of a booklet sometimes showed a hand. All remaining data were
complete, and were analysed as per Studies 1 and 2.

Assessment of individual and aggregate performance


Probability judgments in this more ecologically valid version of the task were again excellent, with the
distribution of accuracy measures across participants being very similar to Study 1. For individual
participants, the median correlation between judged and true probabilities (across 35 hands) was .87 (range
.41–.96, IQR .81–.91). Scatter-plots showing example participants are shown in Figure 5(b) (lower quartile of
performance left and upper quartile right). The mean absolute deviation was 11.1% (range 6.9–23.5%, IQR
10.1–13.2%). The majority of participants provided a positively biased set of estimates (19/28), though, in
general, bias was a little less than in the earlier studies: median of 3.1% (range –9.5 to 17.0%, IQR –2.4 to
5.4%). Again, a very accurate set of judgments is obtained by averaging across participants within hands (see
Figure 5a): the correlation between the mean estimated probability and true probability being r = .96,
p < .001 (also r = .96 if the median is used in place of the mean). Figure 5(a) illustrates that, again, the
greatest tendency for overestimation occurs when the true probability is small, and that judgements are not
systematically biased when the true probability of obtaining the best hand is moderate or large. Although
averaging judgements across participants improves upon the performance of individual participants (Ariely
et al., 2000), we should note that here (as in Studies 1 and 2) the very best participants attained levels of
performance that were close to that of the aggregated judgements shown in Figure 5(a).
We again used non-parametric correlation to test whether length or frequency of play predicted accuracy
as assessed by the correlation between true and judged probabilities and by absolute deviation. None of these
relationships were significant—the only non-trivial one being the positive relationship between frequency of
online play and the true-judged correlation (r = .17, p = .385).
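For reference, a minimal sketch of the three per-participant accuracy measures used throughout these analyses (judged–true correlation, mean absolute deviation and bias), assuming numpy and taking the 6c6d pre-flop values from Appendix A1 purely as illustrative inputs:

```python
# Sketch (not the authors' code) of the per-participant accuracy measures reported above.
import numpy as np

true_p = np.array([62.7, 31.5, 20.2, 15.6, 13.3])    # true probabilities (%), 6c6d pre-flop hands
judged_p = np.array([62.5, 47.8, 36.0, 26.5, 19.5])  # estimates (%) for the same hands

r = np.corrcoef(judged_p, true_p)[0, 1]          # correlation between judged and true probabilities
mad = np.mean(np.abs(judged_p - true_p))         # mean absolute deviation (%)
bias = np.mean(judged_p - true_p)                # mean signed deviation; > 0 indicates overestimation

print(f"r = {r:.2f}, mean absolute deviation = {mad:.1f}%, bias = {bias:+.1f}%")
```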

Study 2 hands—comparison of hands by set


Table 3 shows the data for the hands that were re-used from Study 2, which were analysed to determine
whether the strength of the starting hands predicts relative over- or under-estimation. (See the Appendix for
the true and mean judged probabilities for the other hands.) Ten of the 12 differences examined using t-tests
fall in the same direction as they did in Study 2. Notably, the positive t-values are mostly smaller than for
Study 2—thus where the hypothesised pattern of mis-estimation is found it is not as severe as in Study 2.


Figure 5. Study 3: Scatter-plots to illustrate the accuracy of probability estimates (dotted line is the identity line
for reference): (a) Mean estimated probability (averaged within hands across participants) plotted against true probability.
(b) Example individual participants representing the: (i) Lower quartile (left) and (ii) upper quartile (right) for accuracy.
Example participants were determined according to the correlation between judged and true probabilities (upper pair of
participants) and absolute deviation (lower pair)

Moreover, the negative t-values generally have greater magnitude than in Study 2—thus there are several
clear-cut instances of hands with weaker starting cards being over-estimated relative to other hands, contrary
to the general pattern of Study 2. Overall, with equal numbers of effects in each direction, the results shown in
Table 3 are equivocal with respect to over/under-estimation on the basis of starting hand strength.

Discussion
Study 3 provides strong evidence that the good-to-excellent accuracy in probability judgment seen in Studies
1 and 2 was not simply an artefact of the presentation format used in those studies. Using a presentation
format based on online poker and asking participants to consider only one judgment at a time, we found
similar levels of accuracy to Study 1 as assessed by the correlation between judged and true probabilities
across hands and by absolute deviation. In fact, judgment bias was slightly lower in this study.
In relation to re-examining the primary research question of Study 2, these data are equivocal. With
respect to Set 1 (A♣A♦3♠3♦3♥ variants) and Set 3 (J♥J♠10♥10♦2♣ variants) we find general
concordance with what would be predicted if participants place too much weight on their starting hands—but
the effect is less marked than in Study 2. Therefore, like the findings regarding bias discussed above,
judgments for these cards, whilst not perfect, were slightly more appropriate than in previous studies. With
respect to Set 4 (3♥K♥J♥9♥10♠ variants), we again found results contrary to the pattern observed in Sets 1
and 3—and this discrepancy was more marked than in Study 2. It is therefore clear that overweighting the
starting hands is certainly not a universal feature. That said, every significant effect in Set 4 seems to be
largely driven by the overestimation of 10♠3♥ paired with a flop of K♥9♥J♥. So, as discussed in Study 2, we
should retain the working hypothesis that ‘attractive’ flops may have an effect on judgment that is similar to
what we have generally seen with strong starting cards. Certainly, this would be consistent with Windschitl,
Kruger and Simms (2003), who found that competitors fail to fully appreciate that factors that assist their
performance (such as a flop offering good prospects for a highly ranked hand at the conclusion of the hand)
may not improve their chances of winning when such factors also assist their opponents.

Table 3. Study 3: Comparison of judged probabilities (organised by strength of starting hand)

Each row compares two arrangements of the same five cards. For the hand with the stronger starting hand and then the hand with the weaker starting hand, the entries are: starting hand [pre-flop probability], flop cards, mean judgment (%), true probability (%). The final three entries are the true probability difference, the mean judged probability difference (SD) and t(27) for the comparison of the true and judged differences (stronger minus weaker in each case).

Set 1
A♣A♦ [49.3], 3♦3♥3♠, 82.6, 74.6 | A♣3♥ [17.9], 3♦3♠A♦, 89.2, 94.5 | −19.9, −6.6 (21.7), +3.24
A♣3♥ [17.9], 3♦3♠A♦, 89.2, 94.5 | 3♦3♥ [16.3], 3♠A♣A♦, 79.9, 81.8 | 12.7, 9.3 (12.2), −1.49
A♣A♦ [49.3], 3♦3♥3♠, 82.6, 74.6 | 3♦3♥ [16.3], 3♠A♣A♦, 79.9, 81.8 | −7.2, 2.6 (25.5), +2.04

Set 3
10♥10♦ [30.1], J♠J♥2♣, 41.5, 38.2 | J♠10♥ [22.2], 10♦2♣J♥, 69.9, 63.1 | −24.9, −28.4 (19.7), −0.95
J♥J♠ [33.8], 10♦10♥2♣, 50.7, 39.2 | J♠10♥ [22.2], 10♦2♣J♥, 69.9, 63.1 | −23.9, −19.2 (18.6), +1.33
J♥J♠ [33.8], 10♦10♥2♣, 50.7, 39.2 | 10♥10♦ [30.1], J♠J♥2♣, 41.5, 38.2 | 1.0, 9.2 (21.8), +1.99

Set 4
3♥K♥ [19.4], J♥9♥10♠, 35.3, 40.0 | 10♠3♥ [11.7], K♥9♥J♥, 21.0, 11.8 | 28.2, 14.3 (15.4), −4.77
9♥10♠ [19.6], 3♥J♥K♥, 29.1, 26.4 | 9♥3♥ [14.9], K♥J♥10♠, 32.1, 32.7 | −6.3, −3.0 (19.5), +0.89
9♥10♠ [19.6], 3♥J♥K♥, 29.1, 26.4 | 10♠3♥ [11.7], K♥9♥J♥, 21.0, 11.8 | 14.6, 8.1 (15.3), −2.23
9♥3♥ [14.9], K♥J♥10♠, 32.1, 32.7 | 10♠3♥ [11.7], K♥9♥J♥, 21.0, 11.8 | 20.9, 11.2 (16.5), −3.02
3♥K♥ [19.4], J♥9♥10♠, 35.3, 40.0 | 9♥3♥ [14.9], K♥J♥10♠, 32.1, 32.7 | 7.3, 3.1 (19.0), −1.16
9♥10♠ [19.6], 3♥J♥K♥, 29.1, 26.4 | 3♥K♥ [19.4], J♥9♥10♠, 35.3, 40.0 | −13.6, −6.2 (20.4), +1.93

*p < .05; **p < .01; ***p < .001.

GENERAL DISCUSSION

A case study in excellent probability judgment


Perhaps the most remarkable feature of these data is just how good the poker players were at the task. Formal
estimation of the probability of obtaining the best cards is not a part of the game—and, whilst many of our
participants are serious about their pastime, they have not accumulated the 10 years of experience that is often
seen as a prerequisite of genuine domain expertise (Ericsson & Smith, 1991). Nonetheless, these players were
able to provide a more accurate set of probability judgments than has been seen in the majority of studies—
even those using experts (e.g. see Bolger & Wright, 1994). The probability judgment of our players was not as
accurate as the meteorologists in Murphy and Winkler’s (1977) much-cited study, but it was more accurate
than many other studies involving meteorologists (e.g. see data for Daan & Murphy, 1982, reported in
Koehler et al., 2002). Similarly, while some studies in medicine and elsewhere find performance comparable
to that of the poker players studied here (Arkes et al., 1995), most studies of physicians’ probability
judgments, and several studies in other domains, show a lower level of performance (often much lower). Thus
calibration superior to that seen in this investigation (e.g. Keren, 1987) is the exception, and calibration
inferior to that observed here is the norm across a range of domains (see Koehler et al., 2002; Lichtenstein,
Fischhoff, & Phillips, 1982).
Why were our participants so successful? Koehler et al. (2002) identify five reasons that various authors
have used to explain the generally superior performance of meteorologists in calibration research (p. 702).
We review each in turn to see if these can explain the success of our participants.
First, it has been noted that meteorologists employ advanced computer models and can draw upon
centrally provided forecasts that allow them to pattern match. It is the case that there are ‘patterns’ or
regularities to watch for in the game of poker (e.g. pairs or sequences of cards). So, even if participants were
encountering card combinations in the tasks that they had never seen before in actual play (a distinct
possibility given the millions of possible starting hand and flop combinations), these cards may have
‘matched’, or been somewhat equivalent to, ones that they had previously encountered or had been taught the
value of. However, our poker players completed the task without books or computer aids. Therefore, if
participants were imposing a pattern-like structure upon the task they were doing so on the basis of their own
reason or intuition (though they may have learned from books how to find ‘patterns’).
Second, meteorologists receive clear and unambiguous feedback about outcomes—which, perhaps
importantly, is generally fairly immediate (see Bolger & Wright, 1994). To a degree this is true in poker. In
the course of an evening’s play a hundred or so hands may be played, and the winner of each hand becomes
immediate and common knowledge—as was the case for the bridge players in Keren (1987). In principle, this
gives poker players the opportunity to log the frequency with which particular classes of starting cards that
they may hold (e.g. pairs) end up winning. However, players receive incomplete information concerning
other players’ cards: if a player folds he need not show his cards, and, of course, it may be that this player was
actually holding what was or would have become the best hand. This incomplete feedback for opponents’
cards suggests a possible ecological explanation for the tendency to overestimate the probability of holding
the best cards (which is distinct from the probability of winning in actual play because bluffing and folding
may result in a lesser hand winning out). Clearly, I always know when I have won a game. Also, when I have


folded, I usually know whether my cards would have beaten the winner’s hand (because the winning player’s
cards are shown unless a player wins by all other players folding). However, I never know when another
player would have won if they had not folded without showing their cards.⁴ Therefore, players’ feedback
concerning how frequently they end up with the best hand can be viewed as a biased sample of information. If
players fail to adjust for this bias, they will overestimate the chances of obtaining the strongest hand around
the table. Many studies imply that people have difficulty identifying when their experience is biased, or, if
they are aware of this, that they adjust insufficiently for sample bias (Fiedler, 2000; Juslin & Fiedler, 2006).
A third explanation for success in meteorologists’ forecasts is training in probabilistic thinking. Our
student participants came disproportionately (though certainly not exclusively) from more numerate
disciplines. However, studying a numerate academic discipline is no guarantee of training in probabilistic
thinking. Moreover, we found no difference in accuracy between those from more/less mathematical degree
schemes. Some participants may have read books on the theory and strategy of poker (e.g. Harrington &
Robertie, 2006)—but there is no a priori reason to suppose that our participants could be considered experts
in probabilistic reasoning.
Fourth, a failure to take account of base-rate information (i.e. the relative frequency of the target event) is a
contributory factor in many examples of poor probability judgment (e.g. Rakow, Harvey, & Finer, 2003).
Seemingly participants adopt a case-based approach: making intuitive judgments based on the individual
characteristics of the instance at hand, and neglecting to take account of the class or classes that the instance
belongs to, from which base rate information can be estimated or derived (see Griffin & Brenner, 2004).⁵ In
contrast, historical weather records provide meteorologists with explicit base rate information for the weather
event that they are assessing. Our poker players did not receive explicit base rate information. In Study 1, the
mean true probability across the 75 hands was 36.4% and 37.2% across 35 hands in Study 3—but we gave
participants no indication of this base rate for either study. One could argue that this is a good example of a
task where there is more than one relevant base rate (Cohen, 1981). Players should also have regard to what
we might term the a priori base rate: the probability of holding the best cards at the conclusion of the hand
based solely on the number of opponents (i.e. the ‘equiprobability anchor’—e.g. 50% with 1 opponent, or
16.7% with 5 opponents). The average value of this base rate was 20.8% in Study 1—however, neither this
figure nor the a priori base rate for individual hands was explicitly stated (though participants could have
chosen to calculate this themselves). The success of our participants in providing accurate probability
judgments implies that they did have a broadly appropriate regard for base rate information, even though this
information was not explicit. However, as discussed in Study 1, it may be that the a priori base rate is more
salient—and therefore incorporated more effectively into judgments—when it varies from hand to hand as in
the Pre-flop and Jack-Ten tasks, and, albeit less explicitly, in Study 3. Indeed, several studies have found that
manipulating base rates within subjects reduces base-rate neglect (e.g. Birnbaum & Mellers, 1983; Fischhoff,
Slovic, & Lichtenstein, 1979—for a review see Koehler, 1996).
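For concreteness, the a priori base rate (the equiprobability anchor) referred to above is, for a hand played against n opponents:

```latex
p_{\text{anchor}} = \frac{1}{n + 1},
\qquad \text{e.g. } \tfrac{1}{1+1} = 50\% \text{ with 1 opponent},\;
\tfrac{1}{5+1} \approx 16.7\% \text{ with 5 opponents.}
```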
The fifth explanation for successful probability judgment that Koehler et al. (2002) discuss is the
availability of accurate cues for judgment combined with the presence of moderate base rates (event
probabilities around 50%). Having genuinely predictive cues allows meteorologists to discriminate
effectively between situations where an event is more or less likely to occur. In the case of poker, the
information upon which judgments ought to rely (the starting cards, the community cards and the number of
players) is accessible and ought to be obvious—something which may not be the case for many diagnostic or
prognostic judgments in medicine. However, exactly how this information might best be segmented or combined to generate relevant cues is a far from trivial problem.
⁴ To adapt the words of Donald Rumsfeld: ‘There are known knowns: Things that we know have occurred. And there are known counterfactuals: Things that we know would have occurred. But there are also unknown counterfactuals: Things that we don’t know would have occurred’.
⁵ Note that ‘case-based reasoning’ has a particular meaning here—the evaluation of evidence based on the features of an individual case as opposed to distributional features—which differs from the usage of the phrase ‘case-based’ in some other contexts.


For instance, in Study 2 we illustrated how
the chance of making the strongest hand at the table does not depend solely upon the pool of cards that a
player can build their hand from—but depends on how that pool is divided between the player’s hand and the
community cards, and therefore upon how opposing players can use those cards, and consequently also upon
the number of opponents. Our design allows us to consider two ‘isolated cues’: the equiprobability anchor
derived from the number of players, and the starting hand strength. For Study 1, the correlation between the
equiprobability anchor and the true probability was r = .39. To provide a formal measure of card strength
we determined the probability of obtaining the best cards against 1 opponent for each starting hand
(i.e. before the flop is dealt). These values were also moderately correlated with the true probability (r = .38).
Thus, when treated in isolation these cues are clearly helpful, but are not powerful enough to explain the very
strong correlations between judged and true probabilities that most of our participants achieved: each of these
cues accounts for around 15% of the variance in true probability, yet most participants’ judgments could
account for more than 70% of this variance. This implies that poker players do not simply rely upon simple
cue information, but combine or aggregate information effectively from different sources to make
appropriate judgments. It therefore seems likely that they have used information in a configural fashion, much
as Busey and Vanderkolk (2005) report in the case of experts in fingerprint identification. Koehler et al.
(2002) point out that in most of the examples of very good calibration in full-range tasks the event base rate is
close to 50%. This is certainly true for Keren (1987) where bridge players had a 55% chance of making a
contract, for the most accurate set of physicians’ judgments reported by Arkes et al. (1995) where patients had
a 45% chance of surviving 6 months, and for the highly accurate predictions of sports writers obtained by
Onkal and Ayton (1997, reported in Koehler et al., 2002) with 56% of home team wins in a set of English
soccer matches. Thus again, our data are unusual in that successful probability judgments are made despite
moderately low base rates (36–37%). Consequently, not only were poker players’ probability judgments
unusually good—they were surprisingly good given the task characteristics.
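The variance figures cited above follow from squaring the relevant correlations (taking, for illustration, the r > .85 achieved by three-quarters of Study 1 participants):

```latex
r_{\text{cue}} \approx .39 \;\Rightarrow\; r_{\text{cue}}^{2} \approx .15,
\qquad
r_{\text{judgment}} > .85 \;\Rightarrow\; r_{\text{judgment}}^{2} > .70 .
```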
Thus, although some features of the task environment may have been conducive to good judgment, no
single one of these five possible reasons provides a definitive explanation of why our participants succeeded
(in probability judgment) when others have frequently failed. A sixth possibility may have contributed to the
success of our participants—or, perhaps more correctly, to the failure of others. Our task lent itself to the use
of simulation to generate accurate estimates of the true probabilities for each hand. This is not normally
possible in calibration research, where usually one relies upon observed outcomes only using the relatively
small set of outcomes for which judgments were made.
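To illustrate what such a simulation involves, a compact Monte Carlo sketch is given below. It is not the software used for these studies (commercial poker calculators are cited in the references), the hand evaluator is deliberately simple and unoptimised, and the treatment of ties (counted as half) is an assumption.

```python
# Hedged sketch: Monte Carlo estimate of the probability that a player's final
# 7-card hand is the best at the table, given their two hole cards, the flop,
# and the number of opponents (all assumed to stay in the hand).
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
RANK_VALUE = {r: i for i, r in enumerate(RANKS, start=2)}
DECK = [(r, s) for r in RANKS for s in "cdhs"]

def rank5(cards):
    """Return a comparable tuple for a 5-card hand (larger tuple = better hand)."""
    values = [RANK_VALUE[r] for r, _ in cards]
    counts = Counter(values)
    by_count = tuple(sorted(counts, key=lambda v: (counts[v], v), reverse=True))
    pattern = sorted(counts.values(), reverse=True)
    is_flush = len({s for _, s in cards}) == 1
    distinct = sorted(set(values), reverse=True)
    is_straight = len(distinct) == 5 and distinct[0] - distinct[4] == 4
    if set(values) == {14, 5, 4, 3, 2}:             # ace-low straight (the 'wheel')
        is_straight, distinct = True, [5, 4, 3, 2, 1]
    if is_straight and is_flush:  category = 8      # straight flush
    elif pattern == [4, 1]:       category = 7      # four of a kind
    elif pattern == [3, 2]:       category = 6      # full house
    elif is_flush:                category = 5
    elif is_straight:             category = 4
    elif pattern == [3, 1, 1]:    category = 3      # three of a kind
    elif pattern == [2, 2, 1]:    category = 2      # two pair
    elif pattern == [2, 1, 1, 1]: category = 1      # one pair
    else:                         category = 0      # high card
    tiebreak = tuple(distinct) if is_straight else by_count
    return (category, tiebreak)

def best7(cards):
    """Best 5-card rank obtainable from 7 cards."""
    return max(rank5(c) for c in combinations(cards, 5))

def estimate_best_hand_probability(hole, flop, n_opponents, n_sims=10000, seed=1):
    rng = random.Random(seed)
    remaining = [c for c in DECK if c not in set(hole) | set(flop)]
    score = 0.0
    for _ in range(n_sims):
        drawn = rng.sample(remaining, 2 + 2 * n_opponents)
        board = list(flop) + drawn[:2]                        # turn and river
        mine = best7(list(hole) + board)
        opponents = [best7(drawn[2 + 2 * i: 4 + 2 * i] + board)
                     for i in range(n_opponents)]
        best_opponent = max(opponents)
        if mine > best_opponent:
            score += 1.0
        elif mine == best_opponent:
            score += 0.5          # assumption: a tied-best hand counts as half
    return score / n_sims

# e.g. the Kh Jh hand with flop 6c 4c 5s against 5 opponents (Appendix A1: true value 5.6%)
print(estimate_best_hand_probability([("K", "h"), ("J", "h")],
                                     [("6", "c"), ("4", "c"), ("5", "s")], 5))
```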

Hypothesis evaluation and case-based reasoning in poker


Direct support models of probability judgment provide a framework for examining the manner in which
participants used the available information in our tasks (Brenner, 2003; Griffin & Tversky, 1992; Tversky &
Koehler, 1994). This class of models assumes that people rely mainly upon the characteristics of the case at
hand when judging probability. Griffin and Tversky’s (1992) strength–weight model proposes that
participants anchor on the strength (i.e. extremity) of evidence for the hypothesis under consideration and
adjust for the weight (i.e. predictive value) of the evidence. For instance, base rate neglect is equivalent to
placing too much emphasis on evidence strength, and adjusting insufficiently (for predictive value) towards
the base rate. Support theory (Brenner, 2003; Rottenstreich & Tversky, 1997; Tversky & Koehler, 1994)
proposes that people weigh the support (i.e. evidence) for a hypothesis against the alternatives.
Overestimation occurs if people consider only the support for the hypothesis (e.g. that I hold the best hand),
and is reduced if the support for the alternative hypotheses is made more salient (e.g. by more
detailed descriptions of events). Insufficient regard for the support for this alternate hypothesis could explain
the tendency to overestimate the probability of obtaining the best cards for weak hands. A player may find it
difficult to ‘unpack’ the (many) possible opponents’ hands that would be better than his. Thus, while


recognising that the support for his hand is weak, he may underestimate the strength of support for his
opponents’ hands.
How might the evaluation of the strength of evidence or the support for a hypothesis proceed in the early
stages of a hand of poker? Three quarters of our Study 1 participants produced judgments that correlated
r > .85 with the true probabilities (and three-quarters r > .81 in Study 3)—this could only be achieved by
evaluating and integrating multiple sources of information. However, inevitably, poker players were not
perfect in their judgment: the correlation between judged and true probabilities was always less than 1, and
there was a tendency towards overestimation for low and moderate probabilities. According to the strength-
weight model, these imperfections can be seen as deriving from a tendency to anchor on the value of the
starting hands, but to adjust (somewhat insufficiently) for the a priori base rate. According to support theory,
these imperfections can equivalently be seen as a tendency to evaluate the strength of one’s own hand, whilst
paying insufficient regard to the potential strength of one’s opponents’ hands. At least four lines of evidence
from our studies support this conclusion. First, in Study 1, overestimation was greatest for hands with the
strongest starting hands. Second, for rearrangements of the same five cards in Study 2, overestimation
relative to other hands was predicted by differences in starting hand strength (though this was not so in Study
3). Third, accuracy was worst in the Flop task where the a priori base rate was not salient. Fourth, analysis of
Study 1 reveals that probability judgments across the 75 hands were more strongly related to starting hand
strength than they should have been, and less strongly related to the equiprobability anchor than they should
have been. This is shown by the fact that the correlation between starting hand strength and true probability is
.38, but the correlation between starting card strength and the mean judged probability is slightly higher at .49
(see Footnote 3). In contrast, the correlation between the equiprobability anchor and the true probability is
.39, but the correlation between the equiprobability anchor and the mean judged probability is slightly lower at
.32. This finding is replicated for the 35 hands used in Study 3. Here the correlation between card strength and
true probability is .37, but the correlation between starting card strength and mean judged probability is
higher at .50. The correlation between the equiprobability anchor and true probability is .41, but the
correlation between the equiprobability anchor and mean judged probability is again lower at .24. Equivalent
results are obtained from analysis of each individual participant: the median correlation between judgment
and starting hand strength was .45 in Study 1 and .42 in Study 3, whereas the median correlation between
judgment and the equiprobability anchor was .29 in Study 1 and .21 in Study 3. Analysis of these correlation
coefficients for individuals shows that most participants’ judgments were more closely related to starting
hand strength than they were to the equiprobability anchor (78% of participants in Study 1 and 79% in Study
3)—even though these cues are similarly related to the true probability. Note that there is little apparent
difference in cue usage between the less and more ecologically valid version of the task (Study 1 vs. Study
3)—though the slightly lower weight given to the equiprobability anchor in Study 3 could imply a little less
regard for the number of opponents when this variable is not explicitly mentioned in Study 3.
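A sketch of the per-participant cue-use analysis described in this paragraph, with assumed array layouts (the paper does not specify whether Pearson or rank correlations were used for this step; Pearson is shown):

```python
# Hedged sketch (assumed data layout): for each participant, correlate their judgments
# across hands with each of the two isolated cues, then ask which cue their judgments
# track more closely.
import numpy as np

def cue_use(judgments, hand_strength, equiprobability_anchor):
    """judgments: participants x hands array of probability estimates;
    hand_strength, equiprobability_anchor: per-hand cue values."""
    r_strength = np.array([np.corrcoef(p, hand_strength)[0, 1] for p in judgments])
    r_anchor = np.array([np.corrcoef(p, equiprobability_anchor)[0, 1] for p in judgments])
    proportion_strength_dominant = np.mean(r_strength > r_anchor)
    return np.median(r_strength), np.median(r_anchor), proportion_strength_dominant
```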
The previous paragraph gives an account of why our poker players’ judgments are sub-optimal—but, in a
sense, it also provides the key to understanding why they were not especially far from being optimal. Players
may anchor on (or overweight) the strength of their current hand or their starting hand—but this information
is a valid cue—it is not an arbitrary anchor—and the fact that they can do so shows a measure of expertise in
assessing hands. Players may have insufficient regard for the base rate or take insufficient note of alternative
possibilities (e.g. in their mental simulations)—but they do nonetheless make some appropriate adjustment
for these factors, which participants in some other studies seem to have taken far too little notice of
(e.g. Buehler, Griffin, & Ross, 2002; Kahneman & Tversky, 1973). There are a few features of the game of
poker that might have encouraged participants to pay some (if slightly insufficient) attention to these factors
that people tend to neglect in other situations. First, poker is a zero-sum competitive game. If I win, all my
competitors lose—if one of my opponents wins, then I lose. It is likely that this makes it more natural to
evaluate the support for the hypothesis that I win relative to the support for the alternate hypothesis that one of
my opponents wins, without overlooking significant amounts of support for the alternate hypothesis. Second,


as already discussed, the a priori base rate ought to be fairly salient—I can see the number of players
remaining in the game, and informal or formal calculations of whether it is worth betting (i.e. paying) to
remain in the hand will probably encourage some reflection on this component of the game (for a description
of ‘counting outs’ and estimating ‘pot odds’ see Harrington & Robertie, 2006, pp. 119–144). Furthermore, the
zero-sum nature of the game ought to make it clear why the number of opponents is relevant to the calculation
of the probability of obtaining the best cards—imbuing the a priori base rate with causal properties, which
often decreases the degree of base rate neglect (Tversky & Kahneman, 1982). Third, even if a player does not
attend consciously to this base rate, as soon as he simulates outcomes for multiple opponents (ideally for all of
his opponents), and makes some adjustment to his beliefs on that basis, then he is making an adjustment
equivalent to adjusting for the base rate. Similarly, even if a player fails to simulate possible opponents’
hands, a mechanical adjustment on the basis of his base rate probability of obtaining the best cards is
equivalent to incorporating the fact that with more opponents in the game there are more chances for at least
one of them to acquire a strong hand.

Future research
Texas Hold ‘em Poker offers valuable possibilities for investigating behavioural decision making and
expertise. We considered only the first stages of the game, and there is considerable scope for extending the
tasks that we used to examine how beliefs update as new information becomes available as hands are played
out (Hogarth & Einhorn, 1992). Also, by asking players to judge the probability of holding the strongest hand
assuming that all players remain in the game, we by-passed some of the more subjective or intuitive (and,
arguably, most interesting) features of the game: confidence, bluffing and betting behaviour. An
understanding of these features would require different kinds of studies, at least some involving actual play.
The findings from our judgment task provide clear predictions concerning betting behaviour. Overestimation
was greatest when starting hands were strong—therefore, we would predict a tendency to adopt sub-optimal
betting strategies when holding such hands. However, it would be important to investigate whether betting
decisions can indeed be predicted from players’ explicitly stated beliefs.

A ‘friendly’ task environment or not?


Several aspects of Koehler et al.’s (2002) analysis (see above) emphasised the importance of considering
whether the task environment allows good judgment (see also Shanteau, 1992). This point has led some to
distinguish between ‘friendly’ and ‘unfriendly’ environments (Shanteau & Thomas, 2000): environments
differing in the extent to which they encourage accuracy, particularly in relation to the use of heuristics
(Hogarth & Karelaia, 2007). Following from our discussion above, an informal task analysis of probability
judgment in poker reveals that the game has some ‘friendly’ features, but that this is balanced by a similar
number of ‘unfriendly’ ones. The probability of obtaining the best cards is determinable at any stage of play,
though, in most cases these calculations are intractable. Nonetheless, the regularities of play that make
probabilities calculable may assist intuitive judgment even when players find these calculations too difficult
to perform. The rules of the game ensure that outcomes are well specified (e.g. in determining a winner) yet
feedback is incomplete (players do not always discover their opponents’ cards). Moreover, estimating the
probability of outcomes involves the comparison between the possible future hands of multiple opponents.
Predictive cues are available to the poker player—but these do not present themselves as concrete values, so
must be ‘extracted’ from the environment. (Meteorologists face a similar challenge when deriving
information from radar displays—see Stewart, Heideman, Moninger, & Reagan-Cirincione, 1992). In
addition, a simple linear addition of these cues is not sufficient for reliable prediction: cue-use must be
configural (because outcomes are determined by combining players’ cards with the community cards). In


sum, the game has enough regularity to be comprehensible, but enough complexity and uncertainty to be
interesting. This is presumably why the game has attracted millions of aficionados—and makes it a fruitful
area for decision research.

APPENDIX

Appendix A1. Mean estimated probability for each hand in Study 1


Starting hand   Flop   Number of opponents   True probability   Mean estimate (Study 1)   Mean estimate (Study 3)

Pre-flop task
6c6d — 1 62.7 62.5 56.9
6c6d — 3 31.5 47.8
6c6d — 5 20.2 36.0
6c6d — 7 15.6 26.5
6c6d — 9 13.3 19.5 24.8
3d8h — 1 34.8 30.8 31.0
3d8h — 3 15.9 21.3
3d8h — 5 10.1 15.6
3d8h — 7 7.4 11.6 10.5
3d8h — 9 6.0 8.3
KhJh — 1 61.5 62.6 58.8
KhJh — 3 37.0 50.3
KhJh — 5 27.6 40.6
KhJh — 7 22.1 32.2 31.7
KhJh — 9 18.5 25.7
3c4c — 1 35.7 41.8
3c4c — 3 20.5 32.3 20.1
3c4c — 5 15.2 26.1
3c4c — 7 12.6 19.5 14.9
3c4c — 9 11.0 14.9
AsAc — 1 84.9 83.7 78.3
AsAc — 3 63.8 71.8
AsAc — 5 49.3 61.1
AsAc — 7 38.9 52.5
AsAc — 9 31.3 44.6 48.2

Flop task
AsAc 3s3d3h 5 74.6 85.7
AsAc Kh6s9d 5 46.8 68.3
AsAc 10c4hQh 5 43.6 65.2
AsAc 2s8cAh 5 88.4 86.9
AsAc Jd5c9s 5 44.8 68.2
3d8h 10h7s8c 5 14.5 36.1
3d8h 6h7c2c 5 5.4 20.0
3d8h Qh7d7h 5 3.7 15.5
3d8h Ah3h8s 5 54.3 57.2
3d8h 5h6h7h 5 26.0 52.1
KhJh 10h9sQh 5 87.2 88.0 86.8
KhJh 10d3h9h 5 48.0 52.6 49.1
KhJh 6c4c5s 5 5.6 28.2 15.8
KhJh Ac4d9s 5 9.7 23.6 17.2
KhJh Js5h8h 5 59.9 63.0 64.1
3c4c 8c5c6c 5 65.3 69.3
3c4c KhJdJs 5 2.0 11.1
3c4c As5c7h 5 27.1 24.0
3c4c Js2s6c 5 16.8 18.9
3c4c 10d3h8d 5 12.0 24.7
6c6d AsAc6h 5 83.8 82.9
6c6d 8c6h7d 5 53.0 69.9
6c6d Ah3h9h 5 6.3 28.2
6c6d Qc9sKs 5 7.3 23.0
6c6d Jd5c7s 5 12.8 27.2

Jack-10 Task
10sJh Kh4d7h 1 41.2 38.5
10sJh Kh4d7h 3 17.9 24.6
10sJh Kh4d7h 5 12.0 17.3
10sJh Kh4d7h 7 9.0 12.3
10sJh Kh4d7h 9 7.2 7.5
10sJh Ad6c6d 1 36.2 36.6 31.5
10sJh Ad6c6d 3 11.8 24.1 19.1
10sJh Ad6c6d 5 6.0 16.7 12.5
10sJh Ad6c6d 7 3.5 12.0 11.9
10sJh Ad6c6d 9 2.3 7.3 11.8
10sJh 2cJd10h 1 89.5 85.0
10sJh 2cJd10h 3 74.2 74.1
10sJh 2cJd10h 5 63.1 63.4
10sJh 2cJd10h 7 54.8 57.0
10sJh 2cJd10h 9 48.4 50.3
10sJh QhAcKs 1 93.1 94.6
10sJh QhAcKs 3 89.0 87.8
10sJh QhAcKs 5 85.5 81.1
10sJh QhAcKs 7 82.4 76.4
10sJh QhAcKs 9 79.6 71.9
10sJh 10d3dQh 1 73.2 61.5 49.1
10sJh 10d3dQh 3 43.4 48.6 34.4
10sJh 10d3dQh 5 27.5 38.4 26.7
10sJh 10d3dQh 7 18.9 29.5 22.8
10sJh 10d3dQh 9 14.0 23.4 21.9
Cards: J = jack, Q = queen, K = king, A = ace.
Suits: c = clubs, d = diamonds, h = hearts, s = spades.
This information is given for Study 2 hands in Table 2 (Study 2) and Table 3 (Study 3).


ACKNOWLEDGEMENTS

This paper represents an equal collaboration between the two authors. The authors thank Steve Avons,
Michael Dowling and James O’Geran for comments on earlier versions of this work. The final version of this
paper was completed while Tim Rakow was a Visiting Fellow in the School of Psychology at the University
of New South Wales, and supported by a Short Visit Grant from The Royal Society.

REFERENCES

Ariely, D., Au, W. T., Bender, R. H., Budescu, D. V., Dietz, C. B., & Gu, H., et al. (2000). The effects of averaging
subjective probability estimates between and within judges. Journal of Experimental Psychology: Applied, 6, 130–147.
Arkes, H. R., Dawson, N. V., Speroff, T., Harrell, F. E., Alzola, C., & Phillips, R., et al. (1995). The covariance
decomposition of the probability score and its use in evaluating prognostic estimates. Medical Decision Making, 15,
120–131.
Balthasar, H. U., Boschi, R. A. A., & Menke, M. M. (1978). Calling the shots in R&D. Harvard Business Review, 56,
(May–June), 151–160.
Birnbaum, M., & Mellers, B. A. (1983). Bayesian Inference: Combining base rate opinions of sources who vary in
credibility. Journal of Personality and Social Psychology, 45, 792–803.
Bolger, F., & Wright, G. (1994). Assessing the quality of expert judgment: Issues and analysis. Decision Support Systems,
11, 1–24.
Brenner, L. A. (2003). A random support model of the calibration of subjective probabilities. Organizational Behavior
and Human Decision Processes, 90, 87–110.
Brenner, L. A., Koehler, D. J., & Rottenstreich, Y. (2002). Remarks on support theory: Recent advances and future
directions. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive
judgment (pp. 489–509). Cambridge: Cambridge University Press.
Buehler, R., Griffin, D., & Ross, M. (2002). Inside the planning fallacy: The causes and consequences of optimistic time
predictions. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive
judgment (pp. 250–270). Cambridge: Cambridge University Press.
Busey, T. A., & Vanderkolk, J. R. (2005). Behavioral and electrophysiological evidence for configural processing in
fingerprint experts. Vision Research, 45, 431–448.
Christensen-Szalanski, J. J. J., & Busheyhead, J. B. (1981). Physicians’ use of probabilistic information in a real clinical
setting. Journal of Experimental Psychology: Human Perception and Performance, 7, 928–935.
Cohen, L. J. (1981). Can human irrationality be experimentally determined? Behavioral and Brain Sciences, 4, 317–331.
Dougherty, M. R. P., Gettys, C. F., & Thomas, R. P. (1997). The role of mental simulation in judgments of likelihood.
Organizational Behavior and Human Decision Processes, 70, 135–148.
Epley, N., & Gilovich, T. (2001). Putting the adjustment back in the anchoring and adjustment heuristic. Psychological
Science, 12, 391–396.
Ericsson, K. A., & Smith, J. (1991). Prospects and limits of the empirical study of expertise: An introduction. In K. A.
Ericsson, & J. Smith (Eds.), Toward a general theory of expertise. Cambridge: Cambridge University Press.
Fiedler, K. (2000). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological
Review, 107, 659–676.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem
representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330–344.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1979). Subjective sensitivity analysis. Organizational Behavior and Human
Decision Processes, 23, 339–359.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of
confidence. Psychological Review, 98, 506–528.
Griffin, D., & Brenner, L. (2004). Perspectives on probability judgment calibration. In D. J. Koehler, & N. Harvey (Eds.),
Blackwell handbook of judgment and decision making (pp. 177–199). Oxford: Blackwell.
Griffin, D., & Tversky, A. (1992). The weighting of evidence and the determinants of confidence. Cognitive Psychology,
24, 411–435.
Harrington, D., & Robertie, B. (2006). Harrington on Hold ‘em: Expert strategy for no-limit tournaments, Vol. I: Strategic
play (1st ed.). Henderson, NV: Two Plus Two Publishing LLC.
Hasher, L., & Zacks, R. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology:
General, 108(3), 356–388.


Hoerl, A. E., & Fallin, H. K. (1974). Reliability of subjective evaluations in a high incentive situation. Journal of the Royal
Statistical Society, 137, 227–230.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief adjustment model. Cognitive
Psychology, 24, 1–55.
Hogarth, R. M., & Karelaia, N. (2007). Heuristic and linear models of judgment: Matching rules and environments.
Psychological Review, 114, 733–758.
Johnson, J. E. V., & Bruce, A. C. (2001). Calibration of subjective probability judgments in a naturalistic setting.
Organizational Behavior and Human Decision Processes, 85, 265–290.
Juslin, P., & Fiedler, K. (2006). Information sampling and adaptive cognition. New York: Cambridge University Press.
Kabus, I. (1976). You can bank on uncertainty. Harvard Business Review, 54 (May–June), 95–105.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking.
Management Science, 39, 17–31.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251.
Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.),
Judgment under uncertainty: Heuristics and biases (pp. 201–210). Cambridge: Cambridge University Press.
Keren, G. (1987). Facing uncertainty in the game of bridge: A calibration study. Organizational Behavior and Human
Decision Processes, 39, 98–114.
Klayman, J., & Ha, Y.-W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological
Review, 94, 211–228.
Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative and methodological challenges.
Behavioral and Brain Sciences, 19, 1–53.
Koehler, D. J., Brenner, L., & Griffin, D. (2002). The calibration of expert judgment: Heuristics and biases beyond the
laboratory. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive
judgment (pp. 686–715). Cambridge: Cambridge University Press.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human
Learning and Memory, 6, 107–118.
Lagnado, D. A., & Sloman, S. A. (2004). Inside and Outside Probability Judgment. In D. J. Koehler, & N. Harvey (Eds.),
Blackwell handbook of judgment and decision making (pp. 157–176). Oxford: Blackwell Publishing.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to 1980. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306–334).
Cambridge: Cambridge University Press.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable probability forecasts in meteorology:
Some preliminary results. National Weather Digest, 2, 2–9.
Mussweiler, T., & Strack, F. (1999). Hypothesis testing and semantic priming in the anchoring paradigm: A selective
accessibility model. Journal of Experimental Social Psychology, 35, 136–164.
Poker Pro Labs™ (2007). Holdem Poker Calculator (Program Version 1.0.25). Downloaded from: http://www.
pokerprolabs.com/holdem_poker_calculator/index.html on 12 October 2007.
Poses, R. M., Cebul, R. D., Wigton, R. S., Centor, R. M., Collings, M., & Fleishli, G. (1992). Controlled trial using
computerized feedback to improve physicians’ diagnostic judgments. Academic Medicine, 67, 345–347.
Rakow, T., Harvey, N., & Finer, S. (2003). Improving calibration without training: The role of task information. Applied
Cognitive Psychology, 17, 419–441.
Rottenstreich, Y., & Tversky, A. (1997). Unpacking, repacking, and anchoring: Advances in support theory. Psycho-
logical Review, 104, 406–415.
Shanteau, J. (1992). Competence in experts: The role of task characteristics. Organizational Behavior and Human
Decision Processes, 53, 252–266.
Shanteau, J., & Thomas, R. P. (2000). Fast and frugal heuristics: What about unfriendly environments? Behavioral and
Brain Sciences, 23, 762.
Sklansky, D. (1994). The theory of poker. Henderson, NV: Two Plus Two Publishing LLC.
Stewart, T., Heideman, K., Moninger, W., & Reagan-Cirincione, P. (1992). Effects of improved information on the
components in skill in weather forecasting. Organizational Behavior and Human Decision Processes, 53, 107–134.
Teigen, K. H. (2001). When equal chances = good chances: Verbal probabilities and the equiprobability effect.
Organizational Behavior and Human Decision Processes, 85, 77–108.
TheHendonMob.com (2007). Poker Calculator. Accessed from: http://www.thehendonmob.com/pokercalc/index.html
on 12 October 2007.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive
Psychology, 4, 207–232.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.


Tversky, A., & Kahneman, D. (1982). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.),
Judgment under uncertainty: Heuristics and biases (pp. 153–160). Cambridge: Cambridge University Press.
Tversky, A., & Koehler, D. J. (1994). Support Theory: A nonextensional representation of subjective probability.
Psychological Review, 101, 547–567.
Windschitl, P. D., Kruger, J., & Simms, E. N. (2003). The influence of egocentrism and focalism on people’s optimism in
competitions: When what affects us equally affects me more. Journal of Personality and Social Psychology, 85, 389–
408.

Authors’ biographies:
James Liley was an undergraduate psychology student at the University of Essex when this research was conducted. He is
currently employed as a Statistical Officer for the UK Government Statistical Service.
Tim Rakow is a Senior Lecturer in Psychology at the University of Essex. His research interests include probability
judgment, decisions from experience and risk communication.
Authors’ address:
James Liley and Tim Rakow, Department of Psychology, University of Essex, Colchester, CO4 3SQ, UK.
