
MEMORY

Editors
Susan E. Gathercole and Martin A. Conway,
University of Durham, Department of Psychology, Science Laboratories,
South Road, Durham, DH1 3LE.
Tel: +44 (0)191 374 2625 Fax: +44 (0)191 374 747
Email: S.E.Gathercole@durham.ac.uk
Editorial Board
Serge Bredart, J. Richard Hanley, Timothy Perfect
Roberto Cabeza, Akira Miyake, Mark Wheeler
Consulting Editors
Alan D. Baddeley, Fergus I. M. Craik, Henry L. Roediger III, Daniel Schacter
Aims and Scope of Memory
The Journal publishes high quality research in all areas of memory. This includes experimental studies of memory (including
laboratory-based research, everyday memory studies and applied memory research), developmental, educational,
neuropsychological, clinical and social research on memory. Memory therefore provides a unique venue for memory researchers to
communicate their findings and ideas both to peers within their own research tradition in the study of memory and also to the wider
range of research communities with direct interest in human memory.
Submission of manuscripts. Manuscripts should be prepared in APA format and submitted, in quadruplicate, to Susan E. Gathercole
and Martin A.Conway, Editors of Memory, University of Durham, Department of Psychology, Science Laboratories, South Road,
Durham, DH1 3LE.
Memory is published by Psychology Press Ltd, a member of the Taylor & Francis group. Correspondence for the publisher should be
addressed to the Head Office, 27 Church Road, Hove, East Sussex BN3 2FA, UK.
This edition published in the Taylor & Francis e-Library, 2005.
New subscriptions and changes of address should be sent to Psychology Press, Taylor & Francis Ltd, Rankine Road, Basingstoke,
Hants RG24 8PR, UK. Please send change of address notices at least six weeks in advance, and include both old and new addresses.
Subscription rates to Volume 11, 2003 (6 issues) are as follows:
To individuals: UK: £142.00 Rest of world: $235.00 (postage and packing included)
To institutions: UK: £294.00 Rest of world: $485.00 (postage and packing included)
Memory (USPS permit number 016165) is published bi-monthly in January, March, May, July, September and November. The
2003 US institutional subscription price is $485.00. Periodicals postage paid at Champlain, NY, by US Mail Agent IMS of New York,
100 Walnut Street, Champlain, NY.
US Postmaster: Please send address changes to pMEM, PO Box 1518, Champlain, NY 12919, USA.
Memory is available online: see Psychology Online at www.psypress.co.uk for information. Alternatively, please visit the journal
website at http://www.tandf.co.uk/journals/pp/09658211.html
Memory is covered by the following abstracting, indexing and citation services: Biobase; Current Contents (ISI); Embase; Ergonomics
Abstracts; Focus on Cognitive Psychology; LLBA; PsycINFO; Research Alert; SciSearch; Sociological Abstracts.
Copyright: The material published in this journal is copyright. No part of this journal may be reproduced in any form, by photostat,
microfilm, retrieval system, or any other means, without the prior written permission of the publisher.
This publication has been produced with paper manufactured to strict environmental standards
and with pulp derived from sustainable forests
© 2003 Psychology Press Ltd

ISBN 0-203-48789-3 Master e-book ISBN

ISBN 0-203-59546-7 (Adobe eReader Format)


Research on hindsight bias: A rich past, a productive present, and a
challenging future
Ulrich Hoffrage
Max Planck Institute for Human Development, Berlin, Germany
Rüdiger F. Pohl
Justus Liebig University Giessen, Germany

In this introduction to the present issue, we give a brief description of the phenomenon. Subsequently, we discuss
the major theoretical accounts, focusing on how these are related to the papers included in the issue.

Judgments about what is good and what is bad, what is worthwhile and what is a waste of talent, what is useful and
what is less so, are judgments that seldom can be made in the present. They can safely be made only by posterity.
(Tulving, 1991, p. 42)

With hindsight, we tend to exaggerate what we had known in foresight. For example, after the US and British troops attacked
Iraq in March 2003 without a further resolution of the United Nations Security Council, we were likely to overestimate how
predictable this was (as compared to a prediction made in, say, January). This effect has been termed “hindsight bias” or the
“knew-it-all-along” effect.
The oldest empirical demonstration of this phenomenon that we are aware of dates back more than 50 years. However, it was
only treated as a marginal part of that study, was not given a name, and was only marginally interpreted by the author (Forer,
1949). Things changed when Fischhoff (1975) published his classic paper on the hindsight bias. After this rediscovery,
numerous studies were conducted, and it was just 15 years later that Hawkins and Hastie (1990) published an extensive review
on the hindsight bias, closely followed by Christensen-Szalanski and Willham's (1991) meta-analysis, which covered 122
studies published in 40 articles. A more recent literature search in PsycINFO (up to week 21/2003) with the entry "hindsight
bias OR knew it all along" returned 152 hits. For the 5-year intervals starting in 1975, 1980, 1985, 1990, and 1995, the numbers
of hits were 3, 8, 15, 37, and 56, respectively, indicating considerable growth over time. Given the amount of research, it is hardly
surprising that this phenomenon is also treated in most textbooks on judgement and decision making, cognitive biases, and
memory.
To our knowledge, however, this is the first special issue of a journal exclusively focusing on hindsight bias. Although the time
may also be ripe for another review paper, this introduction to the special issue cannot provide a
comprehensive state-of-the-art overview. Instead, we first give a brief description of the phenomenon. Subsequently, we
discuss the major theoretical accounts, focusing on how these are related to the papers included in the present issue. We
thereby restrict ourselves to just an overview of these papers without giving away too much information—after all, this is only
the entrée and the meal is still to come. We close this introduction with due acknowledgements and some remarks, made in
hindsight, about the process of compiling this issue.


THE PHENOMENON OF HINDSIGHT BIAS


Hindsight bias can only be obtained when judgements are given under uncertainty: Telling adults that 7 × 7 equals 49, and then
asking them what they would have calculated had they not been told the solution, will hardly produce the effect. Where and
how has it been found?

Designs and definition


Two different general experimental procedures have been employed. In the memory design, people first give an (unbiased)
response, then receive the correct answer, and are finally asked to recall their original response. As a control, the same items are
given to other people without providing them with the correct answer before they recall their original response. In the

hypothetical design, people receive the correct answer right away and are then asked to provide the response they would have
given without this knowledge. As a control, other people are asked for their response without giving them the correct answer
beforehand.
Generally, hindsight bias may be said to exist whenever the responses made in hindsight lie closer to the correct answer
than those made in foresight, and when the measure that captures this difference is significantly larger than for a control
group. Comparison against a control group is important, because original and recalled responses may systematically differ for
other reasons. These include (a) regression effects: if the first responses to a numerical estimation task are distributed around
the correct answer, then the chances of recollecting an estimate that is closer to the correct answer, or that even lies beyond it,
are on average larger than the chances of recollecting an estimate that deviates in the opposite direction (Pohl, 1995); (b) the reiteration
effect: repeated exposure to statements increases people’s confidence that they are true, which, if true statements are more
prevalent than false statements, overall works in the same direction as the hindsight bias (Hertwig, Gigerenzer, & Hoffrage,
1997); or (c) thinking about a question a second time may activate more knowledge that is relevant for the answer.
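
To make this comparison concrete, the following minimal sketch (in Python) scores made-up hypothetical-design data against a control group; the item, the solution value, all estimates, and the variable names are invented for illustration, and in a real study the group difference would of course be tested for significance.

```python
import numpy as np

# Invented data: estimates of Goethe's age at death (he died at 82).
# The experimental group saw the solution and estimated "as if they had not known it";
# the control group estimated without ever seeing the solution.
solution = 82
experimental = np.array([78, 80, 75, 84, 79], dtype=float)
control = np.array([65, 70, 90, 60, 74], dtype=float)

# Hindsight bias shows up as a smaller mean distance to the solution
# in the experimental group than in the control group.
dist_experimental = np.abs(experimental - solution).mean()
dist_control = np.abs(control - solution).mean()
print(dist_experimental, dist_control, dist_control - dist_experimental)
```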

Materials and measures


Hindsight bias is very robust across content domains. It has been found in general-knowledge questions, in political or
business developments, in predictions of elections or sport results, in medical diagnoses, and in personality assessment, to
name only a few (for an overview, see Hawkins & Hastie, 1990).
It is also very robust across types of task. The following list includes the types that have been used most frequently.
Hindsight bias has been found with two-alternative forced-choice tasks, both with respect to choices and to confidence in
their correctness (“Which city has more inhabitants, London or Paris?”), with confidence in the correctness of assertions
(“True or false: London has more inhabitants than Paris”), with numerical questions (“How many inhabitants does London
have?”), with predicting outcomes of survey questions on a percentage scale (“How many German households currently have
Internet access?"), with rating the likelihoods of possible developments of a given scenario (e.g., outcomes of international
conflicts, patient histories, or consequences of business decisions), or with answers on closed rating scales (e.g., rating one’s
own or someone else’s performance, school grade, satisfaction, or personality traits).
The most common measures in the memory design compare pre- and post-outcome estimates with respect to their distance
to the solution (in the hypothetical design, pre-outcome and post-outcome estimates are obtained between-subjects). One such
shift measure, specifically tailored to tasks that involve numerical estimates on an unlimited scale, is given by the “ΔE” index
(Hardt & Pohl, 2003-this issue; Pohl, Eisenhauer, & Hardt, 2003-this issue). To allow for averaging across items with
different scales and for a more meaningful comparison between experiments, this index is used with standardised data. If the
task requires an answer on a limited scale (e.g., a dichotomous choice or an answer on a percentage scale), the measure of the
difference between the responses given in foresight and those given in hindsight can be simplified.
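
As an illustration of such a shift measure, the sketch below computes a standardised shift towards the solution for one item in the memory design. It is not the published ΔE formula (defined in Hardt & Pohl, 2003, and Pohl et al., 2003); the standardisation scheme, the function name, and the data are assumptions made only for this example.

```python
import numpy as np

def shift_index(originals, recollections, solution):
    """Illustrative shift measure for one item in the memory design.

    originals and recollections hold, per participant, the estimate given
    before the solution was presented and the recollection of that estimate
    afterwards. Values are z-standardised (pooled) so that items on different
    numerical scales could be averaged. A positive result indicates that
    recollections moved towards the solution. This is only a stand-in for
    the published delta-E index, whose exact definition may differ.
    """
    originals = np.asarray(originals, dtype=float)
    recollections = np.asarray(recollections, dtype=float)
    pooled = np.concatenate([originals, recollections, [solution]])
    mu, sd = pooled.mean(), pooled.std()
    z = lambda x: (np.asarray(x, dtype=float) - mu) / sd
    return np.mean(np.abs(z(originals) - z(solution)) -
                   np.abs(z(recollections) - z(solution)))

# Invented data: ten original estimates of Goethe's age at death (solution: 82),
# with recollections shifted 30% of the way towards the solution.
rng = np.random.default_rng(0)
originals = rng.normal(65, 10, 10)
recollections = originals + 0.3 * (82 - originals)
print(shift_index(originals, recollections, solution=82))  # positive: hindsight bias
```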
The memory design involves repeated measures; therefore one can and should in addition compute another dependent
variable, namely the proportion of correct recollections. Because correct recollections have a deviation of zero and thus
diminish the overall effect, they may contribute to the finding that hindsight bias is typically smaller in the memory design
than in the hypothetical design (for a direct comparison of the effects in the two designs, see Blank, Fischer, & Erdfelder,
2003-this issue; Musch, 2003-this issue; Schwarz & Stahlberg, 2003-this issue).

Relevance and related phenomena


Research on hindsight bias not only offers theoretical insights into the storage and retrieval of information in memory, but also has
significant practical implications (Christensen-Szalanski & Willham, 1991). Consider, for example, a researcher who is asked
to review a manuscript but already knows the opinion of a fellow researcher. Or consider a physician who, knowing the
diagnosis a colleague has made, is asked for a second opinion. Many studies have shown that the new and allegedly
independent judgements are most likely biased towards those that are already available (see Hawkins & Hastie, 1990). In other
words, second judgements are less independent from previous ones than we like to think. Moreover, feeling wiser in hindsight
could also lead us to wrong speculations about how we would have reacted in that situation (i.e., without the knowledge of
how things would turn out). For example, having understood why the Challenger disaster occurred may affect our evaluations
of the people involved and their omissions and commissions. Another example is the court trials in which investment
advisors have been sued by clients after their recommendations led to financial loss.
An experimental paradigm that is closely related to that of hindsight-bias studies is employed in studies on anchoring. In a
hindsight-bias experiment using a hypothetical design, participants are informed about the solution and are then asked what
they would have estimated. In contrast, studies on anchoring do not provide the solution but instead introduce an allegedly
random value. Participants are then asked to indicate whether the true solution lies above or below this value, and
subsequently to give an exact estimate. Both procedures lead to comparable distortions, suggesting that the hindsight bias and

anchoring effects may be driven by similar (if not the same) cognitive processes. Despite this similarity, both effects are still
treated separately in the literature, and we hope that Pohl et al. (2003-this issue), who discuss that relationship, will contribute
to connecting the corresponding research traditions.
Other related research paradigms are the misinformation effect observed in studies on eyewitness testimony (Loftus &
Loftus, 1980; McCloskey & Zaragoza, 1985) and the reiteration effect (Hertwig et al., 1997). Both phenomena involve a
change of response over time, in the case of the misinformation effect due to additional information from a different source
(followed by the question "What was the information in the original source?"), and in the case of the reiteration effect
due to another presentation of the same statement (followed by the question “How confident are you now that this statement is
true?”).

THEORETICAL ACCOUNTS
Is hindsight bias a memory bias or a judgemental bias? The fact that it is observed in a memory design may suggest the first,
but because it is also observed in the hypothetical design, the latter may suffice to account for the effect (including the effect
obtained in a memory design). Other phenomena lead to a similar question. Take a situation in which you cannot recall an
event: Is memory impaired (the trace is no longer in storage) or is remembering impaired (the trace may still be in memory,
but cannot be retrieved at the moment)? Or take the influence of false information on eyewitness testimony as another
example: Is the original trace in memory overwritten (Loftus & Loftus, 1980), or do different memory traces coexist, so that the
influence of false information is better explained as a judgemental phenomenon in which demand characteristics play an
important role in memory-test performance (McCloskey & Zaragoza, 1985)?
The hindsight bias has been explained in both ways. Fischhoff (1975) suggested that being told the solution impairs memory
—that is, alters one’s knowledge about the criterion or the event in question. Alternatively, information processing may be
biased when “rejudging the outcome” (Hawkins & Hastie, 1990, p. 321) during reconstruction. Hawkins and Hastie
considered three subtasks that are probably involved in such a (re)judgement, namely sampling of evidence, interpretation of
evidence, and integration of the implications of evidence. These distinctions are useful when characterising some of the
papers of this special issue of Memory.

Computational models
The paper following this overview introduces a cognitive process model named SARA (Selective Activation, Reconstruction,
and Anchoring; Pohl et al., 2003-this issue) that makes specific assumptions about the representation of the item-
specific knowledge base in memory and the cognitive processes leading to hindsight bias. The model assumes an
associatively connected knowledge base consisting of “images”. Cyclic search and retrieval processes are applied to these
images in order to generate an estimate, to encode the solution, and to recall or reconstruct the original estimate. These
processes change the association matrix (called “selective activation”) and thus alter the likelihood of an image being
retrieved in later memory search. In addition, hindsight bias may also result from “biased sampling”, that is, the solution may
serve as a retrieval cue biasing memory search towards solution-related images in memory. As a consequence, the
reconstructed estimate will likely be biased towards the solution.
The other computational model included in the present special issue of Memory is the RAFT model (Reconstruction After
Feedback with Take The Best; proposed by Hoffrage, Hertwig, & Gigerenzer, 2000). This model adds a new item to Hawkins
and Hastie’s list of what can happen during reconstruction—specifically, it postulates that being informed about the solution
can lead to an automatic update of the knowledge that was originally used when trying to infer that solution. Like other
reconstruction approaches, RAFT assumes that after being told the solution, we do not simply retrieve a trace from memory
(possibly meanwhile altered) about what we originally said, but rather engage in rejudging the problem. However, at the same
time it embraces Fischhoff’s idea of changes in memory, specifically by assuming changes in the knowledge that has been
used to make the original inference. Rejudging the problem based on such a distorted knowledge base may in turn lead to
distorted judgements. Hertwig, Fanselow, and Hoffrage (2003-this issue) put the RAFT model to another test and show that it
can account for the well-established fact that familiarity with the task seems to decrease the hindsight bias (Christensen-
Szalanski & Willham, 1991).
Both SARA and RAFT are highly formalised models, which were successfully implemented as computer programs. The
advantages are evident: When simulating known empirical data, computational models have to be refined and adapted to
account for these data, which in turn leads to a better understanding of the psychological mechanisms that may have produced
these data. New empirical evidence raises the question whether the models can account for it. With respect to Hardt and
Pohl’s (2003-this issue) data on the impact of anchor distance and anchor plausibility, the SARA model, which is designed
for the type of task used in this study, was able to do so. Moreover, both SARA and RAFT are able to predict new findings
and thus direct future empirical work (see the discussion sections of Pohl et al., and Hertwig et al., both 2003-this issue).

Another perspective could be to integrate these computational models—and they certainly have this potential—into frameworks
of "unified theories of cognition" such as ACT-R or SOAR.

Meta-cognitions and surprise


Unlike SARA and RAFT, which assume an implicit and unconscious influence of outcome information on memory (by changing
association strengths and cue values, respectively), other approaches exclusively locate hindsight bias in the reconstruction
phase. Following Stahlberg and Maass (1998), Schwarz and Stahlberg (2003-this issue) assume that meta-cognitions, that is,
cognitions about one’s own cognitive competence or capacity, are used when inferring one’s original answer. In particular,
they propose that the solution provided by the experimenter is chosen as an anchor in this process (see also Hardt & Pohl,
2003-this issue), and that feedback about their performance leads participants to use this anchor accordingly. For instance, if
participants receive solutions widely diverging from their original estimates and are subsequently told that their estimates had
been good, they show a larger hindsight bias than participants who learned that their estimates had been poor. (These authors
also provide a nice and brief overview of theoretical approaches, focusing on reconstruction.)
The next paper, by Werth and Strack (2003-this issue), provides yet another “inferential approach to the knew-it-all-along
phenomenon". These authors suggest that the feelings and experiences that arise when encountering the solution are used as cues
when inferring what one would have said. (A similar approach has been taken in Winman, Juslin, and Björkman’s (1998)
accuracy-assessment model, which is based on the assumption that people use an inferential strategy to re-assess their previously
accomplished level of accuracy. If they have been overconfident in their former estimates, this strategy may lead to hindsight
bias.)
A special kind of experience we may have when learning about the outcome of an event or the correct solution to a question
is surprise. There has been some discussion about the role of surprise, in particular due to some interesting but also seemingly
contradictory findings. After reviewing this literature, Pezzo (2003-this issue) proposes a sense-making model of the
hindsight bias, in which he distinguishes between “initial surprise” and “resultant surprise”. We may encounter initial surprise
when confronted with an event we did not expect. According to Pezzo, such an event triggers a sense-making process that,
dependent on the outcome of this process, may or may not lead to resultant surprise. Pezzo suggests that we typically only
have conscious awareness of resultant surprise. Further, he empirically demonstrates that high resultant surprise leads to
a reduced, eliminated, or possibly even reversed hindsight bias (which is, given the robustness of the effect, a remarkable
prediction and result). Instead of “having known it all along”, participants might experience in these cases a feeling of “I
would never have known that”. One interpretation could be that they managed to neither integrate the solution into their
knowledge base nor use it as a retrieval cue.

Motivational accounts and individual differences


While cognitive explanations for hindsight bias have received much more attention than motivational accounts, they do not
exclude the latter. In fact, motivational accounts have been discussed in the literature, and also some of the papers of this
special issue are in this tradition. For instance, Pezzo (2003-this issue), although focusing on surprise, also examined self-
defensive processing and suggests that its effects may only occur in situations in which the decision maker feels responsible
for the negative outcome. The subsequent paper by Mark, Boburka, Eyssell, Cohen, and Mellor (2003-this issue) focuses on
exactly this topic, namely on ways to successfully cope with such outcomes. The data they present indicate that people
involved in a particular situation perceive negative, self-relevant outcomes as less foreseeable than neutral observers do,
suggesting that hindsight bias in such situations may be attenuated due to self-protecting motives.
The next paper by Renner (2003-this issue) also studies self-serving processes. However, unlike Mark et al., who gave
(faked) feedback on performance in a laboratory task, Renner studied hindsight bias in a real-life setting. She asked people
who participated in a screening test for cholesterol to predict their test value and, after notification of the result, she asked
them to remember what they originally said. Those who received positive feedback, that is, a low cholesterol value, showed
no hindsight bias. Those who unexpectedly received threatening feedback, that is, a high value, showed hindsight bias when
they were asked immediately after having been told the result, and reversed hindsight bias when they were asked around 5
weeks later. This pattern is explained by a shift of the motivational focus from "hot affect" and fear control to more
cognitive event representations and danger control.
Besides self-defence, another motive that has received attention in the literature is self-presentation: According to this
view, participants of hindsight studies simply try to appear smarter than they really are (as illustrated by the connotation of the
term "I knew it all along"). Such motives have often been discussed in terms of individual differences, a topic explored by
Musch (2003-this issue). Musch computed the correlations between individuals’ degree of hindsight bias and their scores on
10 personality tests. When hindsight bias was measured in a hypothetical design, five of these correlations reached
significance, and when measured in a memory design, only two became significant. The effects of those variables that reached

significance (in neither of the two designs was self-presentation or self-deceptive enhancement among them) were all of
medium size, suggesting that a full account of hindsight bias also requires the viewpoint of personality psychology.

Components and adaptive value of the hindsight bias


The first heading in the present introduction was “the phenomenon of hindsight bias”. Blank et al. (2003-this issue), who were
the first to report a successful replication of the hindsight bias in political elections using a control group, conclude their paper
with a noteworthy speculation that directly relates to this heading. They observed a “lack of correspondence between memory
performance and subjective hindsight experience”, which leads them to suggest that “the hindsight bias may actually be an
interrelated complex of three subphenomena, memory distortion, illusion of foresight, and impression of necessity” (p.
501, italics by the authors). They subsequently use these components to discuss several approaches and results reported in
other papers of the present special issue, thereby also offering interesting perspectives for future research.
In quite a different approach to decomposing the hindsight bias, Erdfelder and Buchner (1998) presented a multinomial model
that allows one to infer the frequencies of various cognitive processes from the distributions of responses in different
experimental conditions. From their results, Erdfelder and Buchner concluded that reconstructive processes represent a major
source of hindsight bias, while the evidence for memory impairment as its cause was rather weak.
The papers included in this special issue of Memory address several questions, such as: What are the mechanisms,
moderating variables, and the components of the hindsight bias? A variety of answers are given, but note that they neither
contradict nor exclude each other. There is not even a contradiction between seemingly opposite answers to the question of
whether showing the hindsight bias is a good or a bad thing. Of course, there are situations in which one may wish to assess a
previous knowledge state exactly, and one may fear falling victim to the hindsight bias. However, these cases may be
relatively rare, so that the disadvantage of a biased memory reconstruction is probably more than outweighed by the benefits
of adaptive learning (Hoch & Loewenstein, 1989; Hoffrage et al., 2000).

ACKNOWLEDGEMENTS
The idea of this issue was born in 1998 during a coffee break at a conference. We (U.H. and R.F.P.) contemplated the
considerable amount of empirical and theoretical work on the hindsight bias that had been done during the preceding years, and we
found that the 2 hours allotted to a conference symposium did not provide enough time for all the ideas. Consequently, we
asked several colleagues whether they would be interested in coming to a meeting exclusively devoted to the hindsight bias.
The response was overwhelming and in the end a group of nearly 20 people participated in a workshop that we organised in
summer 1999, close to the University of Mannheim, Germany. We gratefully acknowledge the financial support from the
German Research Foundation (particularly the SFB 504, located at the University of Mannheim) for this wonderful
opportunity. At the end of those 3 days, the group reached the same conclusion as we had during our coffee break one year
earlier and thus confirmed what we knew all along, namely, that there is a critical mass of unpublished research, and that it is
a good idea to assemble this work in a special issue.
We were especially fortunate that Memory enthusiastically responded to this enterprise, and we would particularly like to
thank the editors, Sue Gathercole and Martin Conway, for their support as well as for their trust in our editorial decisions.
In order to support the exchange of ideas and to further improve the quality of the submitted papers, we organised two
additional workshops at our home institutions, and our thanks go to the University of Giessen and the Max Planck Institute
for Human Development in Berlin for their financial support. During these meetings, we extensively discussed previous drafts
of eight of the present papers (and of others, which we, as the editors, ultimately rejected). The fact that the authors read
(and participated in the discussion of) each paper is also reflected in the many cross-references between the papers in this
issue. In response to our call-for-papers that was sent to several mailing lists, we received additional manuscripts and finally
accepted two of these.
In addition to the discussion in both workshops, a two-stage, anonymous peer review took place. Some of the reviewers
revealed their identity and are acknowledged in the individual papers, but we would still like to take the present opportunity to
thank all reviewers and workshop participants for their many helpful comments. In particular, our thanks go to the authors for
their efforts which, in the end, led to this issue. Last but not least, we would like to acknowledge Jenny Millington for her
careful copy-editing, DP Photosetting who did the typesetting, and Isobel Muir and Mark Fisher from Psychology Press for
the great job they have done with organising the whole production process of this issue.

REFERENCES

Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491–504.

Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: An integrative multinomial processing tree model. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Forer, B.R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology,
44, 118–123.
Hardt, O., & Pohl, R.F. (2003). Hindsight bias as a function of anchor distance and anchor plausibility. Memory, 11, 379–394.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and heuristics affect our reconstruction of the past.
Memory, 11, 357–377.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hoch, S.J., & Loewenstein, G.F. (1989). Outcome feedback: Hindsight and information. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 15, 605–619.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge-updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Loftus, E.F., & Loftus, G.R. (1980). On the permanence of stored information in the human brain. American Psychologist, 35, 409–420.
Mark, M.M., Boburka, R.R., Eyssell, K.M., Cohen, L. L., & Mellor, S. (2003). “I couldn’t have seen it coming”: The impact of negative
self-relevant outcomes on retrospections about foreseeability. Memory, 11, 443–454.
McCloskey, M., & Zaragoza, M.S. (1985). Misleading postevent information and memory for events: Arguments and evidence against
memory impairment hypotheses. Journal of Experimental Psychology: General, 114, 1–16.
Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.
Pezzo, M.V. (2003). Surprise, defence, or making sense: What removes the hindsight bias? Memory, 11, 421–441.
Pohl, R.F. (1995). Disenchanting hindsight bias. In J.-P. Caverni, M. Bar-Hillel, F. H. Barron, & H. Jungermann (Eds.), Contributions to decision
making (pp. 323–334). Amsterdam: Elsevier.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M. Hewstone (Eds.), European
review of social psychology (Vol. 8, pp. 105–132). Chichester, UK: Wiley.
Tulving, E. (1991). Memory research is not a zero-sum game. American Scientist, 46, 41–42.
Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along phenomenon. Memory, 11, 411–419.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence-hindsight mirror effect in judgment: An accuracy-assessment model for
the knew-it-all-along phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 415–431.
SARA: A cognitive process model to simulate the anchoring effect and
hindsight bias
Rüdiger F. Pohl and Markus Eisenhauer
Justus Liebig University Giessen, Germany
Oliver Hardt
The University of Arizona, USA

The cognitive process model “SARA” aims to explain the anchoring effect and hindsight bias by making detailed
assumptions about the representation and alteration of item-specific knowledge. The model assumes that all
processes, namely generating an estimate, encoding new information (i.e., the “anchor”), and reconstructing a
previously generated estimate, are based on a probabilistic sampling process. Sampling probes long-term memory
in order to retrieve information into working memory. Retrieval depends on the associative strength between this
information and the currently active retrieval cues. Encoding the anchor may alter this associative pattern
(“selective activation”) or the anchor may serve as a retrieval cue, thus directing memory search (“biased
reconstruction”). Both processes lead to systematically changed retrieval probabilities, thus causing the anchoring
effect or hindsight bias. The model is completely formalised and implemented as a computer program. A series of
simulations demonstrates the power of SARA to reproduce empirical findings and to predict new ones.

Hindsight bias and anchoring effects are demonstrated in studies in which participants have to answer difficult questions (for
an overview see Hawkins & Hastie, 1990). Typically, these questions are designed in such a way that only a few participants
know the correct answer. Participants are therefore forced to generate uncertain estimates. A characteristic of this kind of
cognitive task is that the obtained estimates may be systematically biased if specific information is provided to the
participants before they estimate. Participants are usually not aware of this manipulation (Fischhoff, 1975) and there are
obviously only a few situations where hindsight bias or anchoring effects are considerably reduced or eliminated (Erdfelder &
Buchner, 1998, Exp. 3; Hasher, Attig, & Alba, 1981; Pohl, 1998).
Hindsight bias and anchoring effects are studied with slightly different procedures. Two experimental designs are commonly
used to study hindsight bias (Fischhoff, 1977). One is the “hypothetical design”, in which the solution to the question is
presented right at the beginning of the experiment. Participants are then asked to make an estimate without considering the
given solution, that is, “as if they didn’t know the correct answer”. Compared to control items (for which the solution was not
presented), the estimates for the experimental items are biased towards the given solution. In the case of a “memory design”,


participants first make an estimate. After a retention interval (usually a few days), the solution is presented before they are
finally asked to remember their previous estimate. This design, too, reveals a systematic bias of the estimate towards the given
solution. Thus, in both designs, hindsight bias (or the “knew-it-all-along-effect”, Wood, 1978) consists of a shift of generated
or remembered estimates towards the previously presented solution. In a meta-analysis covering 122 studies, Christensen-
Szalanski and Willham (1991) found only six studies in which no hindsight bias was revealed. This effect thus seems to be
extremely robust (Pohl & Hell, 1996).
The anchoring effect was demonstrated in Tversky and Kahneman’s (1974) experiment by using a manipulated wheel of
fortune. For one group of participants, the wheel always stopped at the number 10, for another group at 65. Participants were

Requests for reprints should be sent to Dr Rüdiger F. Pohl, FB 06—Psychology, Justus Liebig University, Otto-Behaghel-Str. 10, 35394 Giessen, Germany. Email: ruediger.pohl@psychol.uni-giessen.de
We are grateful to the German Science Foundation (Deutsche Forschungsgemeinschaft) for supporting the reported research and the development of the model SARA with two grants to the first author (Po 315/6–2 and Po 315/6–3). Our thanks also go to Gerhard Fessler, Gregor Lachmann, and Bettina Menzel, who were of invaluable help in running the computer simulations and in documenting the details of SARA. Much appreciated comments on earlier versions of this paper were provided by Arndt Bröder, Carola Fanselow, Wolfgang Hell, Ulrich Hoffrage, and Britta Renner.

then asked whether the percentage of African nations within the UN was higher or lower than the randomly obtained number.
They then produced an exact estimate. The mean estimate of the first group (25%) was lower than that of
the second group (45%). Both groups evidently used the anchor "randomly" generated by the wheel of fortune as a point of
orientation. Like hindsight bias, the anchoring effect can easily be induced and has been demonstrated in numerous
studies.
Up to now, hindsight bias and the anchoring effect have been treated rather separately in the literature. We hold, however, that
both are based on the same cognitive processes (Pohl & Eisenhauer, 1997; Pohl, Hardt, & Eisenhauer, 2000). We will
therefore use the terms “estimate” and “anchor” in the context of both phenomena.
Several theories have been proposed to explain the anchoring effect and hindsight bias. In general, two classes of
explanations may be distinguished. One of these assumes that encoding of the anchor modifies the memory representation of
the item-specific knowledge. Fischhoff (1975), for example, proposed an immediate and irreversible integration of the anchor
information into one’s memory. Moreover, the anchor may be used to reorganise the logical structure of one’s episodic
memory, as described in the “creeping determinism” mechanism (Fischhoff, 1975). Similarly, the authors of the RAFT model
(Hertwig, Fanselow, & Hoffrage, 2003-this issue; Hoffrage, Hertwig, & Gigerenzer, 2000) assume that the anchor is used to
draw inferences. This could result in filling in previous gaps in one's knowledge and in changing previously held false
beliefs. According to these theories, the anchoring effect and hindsight bias cannot be avoided because the underlying
memory representation has been changed as a result of anchor presentation. Both phenomena are thus considered inevitable
“side-effects” of learning.
Another class of theories explains the observed effects in terms of a biased reconstruction process in which the anchor
serves as a retrieval cue (e.g., Schwarz & Stahlberg, 2003-this issue; Stahlberg & Maass, 1998). According to these theories,
the memory representation is not altered. Rather, the anchor influences the reconstruction process as, for example, proposed in
Tversky and Kahneman’s (1974) classical “anchoring and adjustment” heuristic. According to the “relative trace-strength”
model (Hell, Gigerenzer, Gauggel, Mall, & Müller, 1988), hindsight bias is the result of some sort of weighted integration
process. More recently, Strack and Mussweiler (1997; Mussweiler & Strack, 1999) assumed that the anchor selectively
increases the accessibility of information in memory that is consistent with the anchor, so that anchor-related information is more
likely to be accessed at estimation or reconstruction. Finally, Winman, Juslin, and Björkman (1998) argued that people use
some meta-cognitive strategy to reassess their previously accomplished level of accuracy, which then leads to hindsight bias
as long as they are over-confident with respect to the accuracy of their estimates (see also Werth & Strack, 2003-this issue).
Unfortunately, regardless of their empirical adequacy, most theories lack a specific and precise description of the cognitive
processes that may lead to a systematically distorted estimate. It is therefore difficult to derive precise predictions from these
approaches without adding further assumptions. One noteworthy exception to this unsatisfying situation is the RAFT model
(Hertwig et al., 2003-this issue; Hoffrage et al., 2000) that builds on the PMM theory (“probabilistic mental models”;
Gigerenzer, Hoffrage, & Kleinbölting, 1991). However, the RAFT model applies only to a specific subset of experimentally used
procedures (i.e., comparisons of pairs of items with respect to a common feature; e.g., “Which city is larger, Hamburg or
Munich?”). In this paper we will illustrate how our model SARA accounts for a considerably wider range of experimental
procedures and manipulations. Specifically, the model SARA addresses the cognitive underpinnings of the various tasks
that participants perform when they engage in anchoring or hindsight-bias experiments. It does so by providing detailed
answers to the main questions: How are estimates generated, anchors encoded, and previous estimates reconstructed?
The paper is divided into two main parts. In the first part, we describe the basic assumptions of the model and how they are
formalised, such as the organisation of knowledge and the basic processes operating thereupon. In the second part, we
evaluate the model by looking at how well it allows simulation of empirical findings. We present a simulation of a complete
hindsight-bias experiment and then systematically vary the free parameters of the model in order to test the model’s
behaviour.

THE COGNITIVE PROCESS MODEL “SARA”

Overview of the model


We think that the hindsight-bias memory design actually includes the other two, simpler designs, namely the anchoring
and the hypothetical designs. Therefore, we base our discussion of the model on the more complex memory design.
The model “SARA” (Selective Activation, Reconstruction, and Anchoring; Pohl & Eisenhauer, 1997; Pohl et al., 2000)
assumes that each person possesses a number of item-specific information units (called "images") that are associated with the
given question that has to be answered. This knowledge (the “image set”) is used to generate an estimate, to encode an
anchor, and to later recollect the original estimate. All these processes modify the organisation of the image set by changing
the matrix of association strengths among the images themselves and between images and externally provided retrieval cues.

Association strengths are increased during cyclic search and retrieval processes (called “sampling”) and decreased during
forgetting (asymptotic decay). The total activation strength of an image determines its likelihood of being accessed in memory
search. New images (e.g., estimates and anchors) can be added to one’s image set.
Hindsight bias and the anchoring effect are seen as the result of selective activation, biased sampling, or both: Encoding the
anchor may change the memory representation of the image set (selective activation), and, at retrieval, the anchor may bias
the memory search towards anchor-related images of the image set (biased sampling). The first
process represents a change within long-term memory (as, for example, proposed in the assimilation theory; Fischhoff, 1975),
while the second one represents an effect of the retrieval process (as, for example, proposed in the reconstruction theory;
Schwarz & Stahlberg, 2003-this issue; Stahlberg & Maass, 1998). SARA thus incorporates both of the general explanations for
hindsight bias discussed above, namely an altered memory representation as well as a biased reconstruction. The result of
both processes is that the set of images retrieved during reconstruction differs in a systematic way from the set of images
retrieved during the generation of the original estimate. As a consequence, the reconstructed estimate will most likely be
biased towards the anchor.
In contrast to the memory design, the order of processes is different in the anchoring and in the hypothetical designs. Here,
the encoding of the anchor precedes the attempt to generate an “original” estimate. However, this attempt is biased because of
a selective activation caused by encoding the anchor or because of a biased reconstruction caused by the anchor serving as a
retrieval cue.
SARA is a partly simplified and partly extended version of the associative memory model “SAM” (Search of Associative
Memory, Raaijmakers & Shiffrin, 1980; Shiffrin & Raaijmakers, 1992). SAM has successfully simulated a number of
phenomena in the field of free recall and word recognition. Like SAM, the basic architecture of SARA comprises a set of
images that are stored in long-term memory and that may be recalled by currently active retrieval cues. The successful recall
of images into working memory depends on the currently established pattern of associations. This pattern may change as a
function of learning and forgetting.
SARA claims to capture all changes in a participant’s knowledge in each phase of the experimental procedure. The model
is thus able to predict the participant’s performance at any point in time. The precision and clarity of the model exceeds
previous explanations in this domain. As a consequence, the model has been successfully implemented as a computer
simulation. The next two sections present the model’s assumptions concerning the organisation of knowledge and the
cognitive processes that are postulated to be responsible for the anchoring effect and hindsight bias.

Organisation of knowledge
General architecture. In SARA, we distinguish between long-term memory, in which all information is stored, and working
memory, in which successfully retrieved information or information that will be encoded is processed. As in other theories of
memory, the capacity of long-term memory is assumed to be unlimited. In long-term memory, information is organised
according to similarity and is subject to common forgetting processes. The capacity of working memory is limited to a few units
of information (cf. Cowan, 2001; Miller, 1956).
Units of knowledge. Typically, difficult almanac questions are used to study hindsight bias and the anchoring effect (e.g.,
"How old was Goethe when he died?"). Participants generally do not know the correct answer but possess more or less
detailed knowledge from which to generate an estimate. This knowledge consists of the individually available information units
(images) that are associated with the question and stored in long-term memory. With respect to the given example about
Goethe’s age at death, these images might contain knowledge about the general life expectancy at that time in history or a
memory of some picture of Goethe portraying him as an old man. These images comprise a person’s image set to a specific
question. Each question is connected to a number of images in long-term memory. The model considers only images with a
content that is or can be transformed into a numerical value. For example, an image storing information about general life
expectancy might contain the fact “about 75 years” or the memory of Goethe’s picture might evaluate to “about 80 years old”.
The range of possible estimates is additionally limited by subjectively plausible minimum and maximum values (e.g., that
most adults die between 30 and 90 years of age).
The information contained in an individual image set unfortunately remains to a great extent obscure. Introspection, which
is an unreliable method in the first place, reveals that it is almost impossible to explicitly state which knowledge someone has
available for a given question. Therefore, and in order to simulate empirical data, artificial image sets are created in SARA for
each question and each person. Subsequent processes of the model make use of only these data. In the memory design, these
image sets are based on the original estimates from the same sample. In the hypothetical design as well as in the anchoring
paradigm, distributions of unbiased estimates are available from previous studies in our lab. In either case, a randomly
determined number of images within a specific range around the original estimate will be generated to serve as an individual
and item-specific image set.
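
As a rough sketch of this step, the code below generates such an artificial image set around an unbiased estimate; the number of images, the width of the range, and the uniform distribution are placeholder assumptions, not the settings used in the published simulations.

```python
import random

def make_image_set(original_estimate, n_min=3, n_max=8, spread=0.25, seed=None):
    """Generate an artificial, item-specific image set (illustrative only).

    A randomly determined number of numerical "images" is drawn from a range
    around the (unbiased) original estimate; all parameters are assumptions.
    """
    rng = random.Random(seed)
    n = rng.randint(n_min, n_max)
    lower = original_estimate * (1 - spread)
    upper = original_estimate * (1 + spread)
    return sorted(round(rng.uniform(lower, upper)) for _ in range(n))

# Hypothetical image set for "How old was Goethe when he died?" around an estimate of 64.
print(make_image_set(64, seed=1))
```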

Cues. Retrieval cues play a central role in searching long-term memory. These cues are assumed to be present in working
memory during all proposed cognitive processes. The retrieval cues can stem from external or from internal sources.
Externally provided cues contain either numerical (e.g., the anchor) or non-numerical information (e.g., the question or the
task). Internal cues consist of retrieved images, which are all assumed to have a numerical content. In particular, the anchor
may be an important cue because the likelihood of retrieving a specific image increases with this image’s similarity to the
anchor. The details of this mechanism are specified in the next section.
Organisation. The images of an image set are all connected to each other. The association strength between two images is
determined by their similarity (Kahneman & Miller, 1986). In SARA, the smaller the numerical distance between two images,
the stronger their mutual association. This corresponds to the association principles of semantic-memory models.
The relative distance d between the numerical contents of two images I1 and I2 [with 0 ≤ d ≤ 1] is calculated by scaling the
absolute numerical distance between their values v1 and v2 by the theoretically maximal distance (i.e., the distance between the
image with the numerically highest and the image with the numerically lowest value):

$$d(I_1, I_2) = \frac{|v_1 - v_2|}{v_{\max} - v_{\min}} \qquad (1)$$
The denominator of this equation is also used as the basis for guessing, whenever appropriate images do not exist or cannot be
retrieved.
The association strength S between two images I1 and I2 [with 0 < S < 1] is then simply expressed as the complement of their relative distance:

$$S(I_1, I_2) = 1 - d(I_1, I_2) \qquad (2)$$
The larger the distance between two images, the smaller their association strength. Some external cues (e.g., the question or
the context cue representing the specific task) do not have a numerical content. In SARA's implementation, the association
strengths between these cues and the images of the image set are therefore chosen randomly from a predefined range of
possible values.
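
A minimal sketch of these two quantities, using the example image set given later in the paper and assuming that Equation 2 is simply the complement of the relative distance:

```python
def relative_distance(v1, v2, v_min, v_max):
    """Relative distance d (Equation 1): the absolute difference between two
    image values, scaled by the theoretically maximal distance."""
    return abs(v1 - v2) / (v_max - v_min)

def association_strength(v1, v2, v_min, v_max):
    """Association strength S, assumed here to be the complement of the
    relative distance (cf. Equation 2): the larger the distance between
    two images, the smaller their association strength."""
    return 1.0 - relative_distance(v1, v2, v_min, v_max)

images = [38, 56, 62, 74, 80, 90]                    # example image set from the text
v_min, v_max = min(images), max(images)
print(association_strength(56, 62, v_min, v_max))    # close values: strong association
print(association_strength(38, 80, v_min, v_max))    # distant values: weak association
```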
Image sets of different questions quite possibly overlap in long-term memory (e.g., for questions dealing with semantically
related items). This can cause uncontrollable interference effects when images from one set are processed and activation
spreads to related images in a different set. Usually, careful material selection attempts to avoid these effects in the
experiment’s planning phase. To simplify matters, in SARA all image sets are supposed to be separate and independent from
each other. Nevertheless, possible interference effects are at least partially accounted for by the assumption of unsystematic
fluctuation of association strengths (see below).

Basic cognitive processes


Search and retrieval processes. In SARA, a cyclic search and retrieval process (sampling) is supposed to underlie processes
such as answering questions, encoding anchors, and reconstructing estimates. Sampling retrieves necessary information from
long-term memory and thus makes it available for further processing in working memory. The number of search and retrieval
cycles per sampling process is indirectly limited by the assumed capacity of working memory (e.g., Cowan, 2001; Miller,
1956), which determines the maximum number of images that can be transferred to working memory. Not all potentially
available images will be sampled from the image set every time. Instead, we assume that only a subset of images will be
retrieved and subsequently processed. The number of cycles that take place varies intra- and interindividually (from question
to question and from sampling process to sampling process).
Memory search is controlled by the cues that are currently present in working memory. The probability of finding and
retrieving a specific image Ii is a function of its overall activation A(Ii), which is the product of the association strengths S
between this image and each cue Cj currently in working memory:

$$A(I_i) = \prod_j S(I_i, C_j) \qquad (3)$$

The question is usually the first cue in working memory. The association strengths between the question and the images of the
image set therefore determine the initial probability of finding any particular image during the search process. An image is
more likely to be found if its overall activation is relatively high, but can only be retrieved if its overall activation also
exceeds a certain absolute threshold (i.e., a predefined retrieval probability). The relative probability PR that an image Ii is
found during sampling is given by its overall activation relative to the sum of the overall activations of all images Ik of the
image set:

$$P_R(I_i) = \frac{A(I_i)}{\sum_k A(I_k)} \qquad (4)$$

If an image is transferred into working memory, it becomes an additional cue for the remaining retrieval cycles. Sampling is
therefore increasingly directed (and restricted) by those images that have already been retrieved (i.e., the internal cues).

If sampling fails to retrieve any images into working memory, only the cues that have initially been present can be used to
generate or reconstruct an estimate. In this case, a guessing process that uses general information will be employed. This
general information corresponds to the knowledge about the numerical scale of the question and the maximum interval in
which a possible answer can be found. Regarding Goethe's age at death, for example, this knowledge might be based on the
average life span of adults (e.g., 30 to 90 years). The guessing procedure randomly draws a value out of this interval.
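
The following sketch illustrates Equations 3 and 4 together with the guessing fallback; the image labels, cue strengths, and plausible interval are invented for the example.

```python
import random

def overall_activation(strengths_to_cues):
    """Overall activation A of an image (Equation 3): the product of its
    association strengths to all cues currently in working memory."""
    activation = 1.0
    for s in strengths_to_cues:
        activation *= s
    return activation

def sampling_probabilities(activations):
    """Relative sampling probability of each image (Equation 4): its
    activation divided by the summed activation of the whole image set."""
    total = sum(activations)
    return [a / total for a in activations]

def guess(lower, upper, rng=random):
    """Guessing fallback: draw a value from the subjectively plausible
    interval (e.g., 30 to 90 years for an adult's age at death)."""
    return rng.uniform(lower, upper)

# Invented association strengths of three images to two cues (question, anchor).
strengths = {"I_74": [0.8, 0.9], "I_56": [0.6, 0.4], "I_38": [0.3, 0.2]}
activations = {name: overall_activation(s) for name, s in strengths.items()}
probabilities = sampling_probabilities(list(activations.values()))
print(activations)
print(dict(zip(activations, probabilities)))
print(guess(30, 90))
```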
Learning processes. The retrieval of images into working memory leads to changes in association strength within the image
set. More specifically, the association strength increases for all pairs of cues that have been together in working memory. The
association strength is incremented during each cycle for a certain amount that is determined by a power function (Anderson
& Schooler, 1991). The following equation is used in SARA to calculate the new association strength S between cues Ci and
Cj after both have been together in working memory during one cycle from tn to tn+1:
(5)
The factor x^{-b}, with x > 0 and 0 < b < 1, determines how the association strength increases from cycle to cycle. The higher the
association strength, the smaller any further increase. With more and more retrieval cycles, the values asymptotically
approach the maximum association strength.
The model thus “learns” about different constellations of images and about their relation to specific task contexts (like
“generate an estimate” or “encode the solution”). The increase in association strength and thus in retrieval probability is called
selective activation (Eisenhauer & Pohl, 1999; Pohl & Eisenhauer, 1997; Pohl et al., 2000). Selective activation is one of the
two basic principles of SARA that explain the anchor’s distorting influence. Strack and Mussweiler (1997; Mussweiler &
Strack, 1999) proposed a similar concept to explain the anchoring effect. According to their account, the consideration of the
anchor as a possible estimate yields a selective increase in the accessibility of anchor-related information. The RAFT model
(Hertwig et al., 2003-this issue; Hoffrage et al., 2000), too, includes a similar assumption. According to that model, presenting
the outcome information yields inferential processes that alter the memory representation’s contents, thereby producing
hindsight bias.
Forgetting processes. To account for changes due to forgetting, SARA incorporates two mechanisms: decay and
fluctuation.
Decay is thought to operate as the counterpart of learning. Associations that have been strengthened in earlier processes can
also be weakened: The longer the retention interval, the more pronounced the decay of these associations. During decay, the
association strengths asymptotically approach the values they had prior to the experiment (s0) but never reach the absolute
minimum. This principle reflects the assumption that the initial association matrix is based on the images’ numerical
similarities, which are considered to be relatively resistant to permanent loss or modification. However, total forgetting is
possible for newly encoded images (like the estimate or the anchor), since they represent recently added, “non-established”
information that has not yet been “consolidated”. The association strength S between image Ii and Ij at time tn+1 is derived
from the association strength at time tn with x > 0 and 0 < b < 1:
S_{t_n+1}(I_i, I_j) = S_{t_n}(I_i, I_j) − x^{−b} · [S_{t_n}(I_i, I_j) − s_0]        (6)
Besides this systematic forgetting process, we also assume non-systematic changes of association strengths (fluctuation), because the image set is not an isolated structure detached from the rest of one’s knowledge. Rather, images that are
part of an image set are also associated with all kinds of other information stored in long-term memory. As a result, any
association strength can change any time in an unpredictable and uncontrollable manner. In the model, we try to capture these
irregular events (and the permanent “white noise in the system”) by a random process that slightly changes the association
strengths of the image set in a non-systematic way (Mensink & Raaijmakers, 1988, 1989).
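Both forgetting mechanisms can be sketched in the same style. The decay function mirrors our reading of Equation 6 (strengths fall back towards their pre-experimental value s0), and the fluctuation function simply adds bounded random noise; names and default values are illustrative, not taken from the original program.

import random

def decay(s_now, s_initial, rate=2 ** -0.5):
    # Decay: the strength falls back towards its pre-experimental value s0
    # but never below it. Newly encoded images (estimate, anchor) start with
    # s0 = 0 and can therefore be forgotten completely.
    return s_now - rate * (s_now - s_initial)

def fluctuate(s, max_change=0.1, lower=0.0, upper=1.0):
    # Fluctuation: a small, non-systematic random change ("white noise"),
    # kept within the admissible range of association strengths.
    return min(upper, max(lower, s + random.uniform(-max_change, max_change)))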

SARA at work
The following illustration of the model at work is based on a typical study of hindsight bias in the memory design. Figure 1
provides a highly simplified overview of the experimental procedure and the assumed cognitive processes for a specific
question (“How old was Goethe when he died?”). Not included in the diagram are (a) different initial association strengths,
(b) different increases caused by the specific order in which images have been retrieved, and (c) changes caused by forgetting.
Generating an estimate. The given question and the task context serve as initial cues. The images’ values in this example
are 38, 56, 62, 74, 80, and 90 (Figure 1a). During sampling, several search and retrieval cycles attempt to find and retrieve
images. In the given example, the images with the numerical contents “74”, “56”, and “62” are successfully retrieved, thus
leading to specific changes in the association-strength matrix, as indicated by different levels in the striped areas in Figure 1.
The higher the relative position of the image, the stronger its overall activation strength. The retrieved numerical contents are
then integrated by the simplest integration algorithm, namely averaging (Anderson, 1981, 1986), yielding a mean value of
“64” (=[56+62+74]/3). This estimate is given as the answer to the question. It is added as a new image to the image set and
bound to the external retrieval cues with a high association strength.
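As a worked illustration of this integration step (the fallback to guessing is included for completeness; the function name and the plausible range for the Goethe question are taken from the example above and are otherwise our own):

import random

def generate_estimate(retrieved_values, plausible_range=(30, 90)):
    # Integrate the retrieved numerical contents by simple averaging; if no
    # image was retrieved, guess a value from the plausible range instead.
    if not retrieved_values:
        return random.uniform(*plausible_range)
    return sum(retrieved_values) / len(retrieved_values)

# Goethe example from the text: generate_estimate([56, 62, 74]) returns 64.0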

Figure 1. Schematic display of changes in the total activation of the images of an image set in the course of a hindsight-bias experiment in
the memory design according to the process model SARA: (a) generation of an estimate, (b) encoding of an anchor, and (c) reconstruction
of the estimate (see text for more details).

Presentation and encoding of the anchor. In this phase of the experiment (Figure 1b), the question is presented together
with a specific anchor (here, “82”). The participant usually considers the anchor a significant piece of information with respect
to the question. This significance may result from the labelling of the anchor (e.g., as “the solution” or as “another person’s
estimate”) or from the participant’s uncertainty about the correct answer, which renders any information regarding the
question relevant. Accordingly, we assume that the presentation of an anchor will lead to an attempt to encode it.
The anchor is encoded by the same sampling process described above. Both the question and the anchor as well as the task
context now serve as retrieval cues. The anchor is associated with a higher strength to images that are numerically close than
to images that are numerically distant from it. For this reason, numerically closer images have a higher retrieval probability.
In the given example, three images (“74”, “80”, and “90”) are successfully retrieved. As in estimation, the
association strengths between retrieved images and cues are increased, that is, these images are “selectively activated”.
Finally, the anchor itself is added to the image set with a high association strength to the external cues.
Recollection task. The participant is now asked to recollect his or her original estimate. A sampling process identical to the
one used for generating an estimate is initiated, in order to search for corresponding information in the image set (Figure 1c).
The question, the first task context (“Generate an estimate!”), and the present task context (“Recall your estimate!”) serve as
retrieval cues. Again, the retrieval probability of an image depends on its overall activation. The original estimate can, of
course, be found and retrieved. However, if its activation is not high enough, it may not be recognised as the information that
has been searched for. In this case, the estimate is treated in the same way as any other image. This reasoning is based on the
assumption that the clarity of source information is correlated with the overall activation of an image. If, on the other hand,
the overall activation of the retrieved estimate is high enough, the search will stop and the original estimate will be given as
the original answer, that is, the participant will produce a correct recollection.
If the anchor is retrieved but not recognised as such during memory search, it will be used as an additional cue in working
memory just like any other image from long-term memory. If, however, its overall activation is high enough to be recognised,
the anchor will only serve as a retrieval cue, but will not be considered in the subsequent integration process. In many studies,
presentation of the anchor occurs simultaneously with the recollection task, so that the anchor is still present and does not
need to be remembered. In this case, the anchor will serve as an additional cue right from the beginning. The impact of an
anchor as a retrieval cue is the second source of a distorted reconstruction that results in the anchoring effect or hindsight
bias.
In the numerical example, the images “62”, “74”, and “80” are successively retrieved. Neither the original estimate itself
(“64”) nor the anchor (“82”) was found. The recollected estimate would then be “72” (=[62+74+80]/3), which is given as the
answer and added as a new image to the image set. Note that the recollected value is closer to the anchor than the original
estimate had been. Hence, this falsely reconstructed recollection may be interpreted as showing hindsight bias.
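The decision logic of the recollection phase, as described in this section, can be summarised in a short sketch. The representation of retrieved images as (numerical content, activation) pairs and the threshold value are our own simplifications, not the authors' code.

def recollect(retrieved, original_estimate, anchor, identification_threshold=0.4):
    # retrieved: list of (numerical content, overall activation) pairs.
    # - a retrieved estimate that is recognised (activation above the
    #   identification threshold, cf. thI in Table 1) is reported verbatim;
    # - a recognised anchor serves only as a cue and is excluded from
    #   integration;
    # - everything else, including an unrecognised estimate or anchor,
    #   is averaged to reconstruct the answer.
    values = []
    for content, act in retrieved:
        if content == original_estimate and act >= identification_threshold:
            return content                      # correct recollection
        if content == anchor and act >= identification_threshold:
            continue                            # anchor recognised: cue only
        values.append(content)
    return sum(values) / len(values) if values else None

# Worked example from the text: three unrecognised images 62, 74, and 80
# yield (62 + 74 + 80) / 3 = 72 as the "recollected" estimate.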

EVALUATION OF THE MODEL


In this part, we first summarise the parameters of the model. In order to evaluate the capacities of the proposed model, we
then present the simulation of a complete set of empirical data across many questions and participants. Next, we discuss the
explanatory power of the model, that is, how the model’s assumptions fit the empirical evidence—or, in other words, how
empirically observed effects are translated into the parameters of SARA. We will then report some tests of the model’s
behaviour. Each of the free parameters was systematically varied, while keeping all others constant. After examining the effects of these manipulations and relating them to empirical findings, we finally point to the limitations of the model in its present version.

SARA’s parameters
Some of the model’s parameters are considered to be fixed (Table 1), while others may be used to capture specific
experimental settings and manipulations (Table 2). The fixed parameters are thought of as representing constant
characteristics of the human information processing system. Apart from known findings on the memory span (7±2), learning
and forgetting curves, and the lower and upper limits of the association strength (0 and 1, respectively), the values or ranges
of values were chosen either following the SAM model or after systematic model runs. More important for simulating
experimental manipulations are the free parameters. Their exact number depends on the specific design that SARA is
supposed to simulate. In the memory design, all 10 parameters of Table 2 may be needed, while in the hypothetical design as
well as in the anchoring design, only 6 of these are necessary. The first task in running a simulation is therefore to decide
which cognitive processes are necessary in which order. The second and more difficult task is to determine the specific values
of the free parameters of the model (cf. Table 2). Actually this task has two parts. First, all of the parameters need to be set to
specific values that reflect the general characteristics of the current experimental setting. For example, there could be a 1-
week retention interval in all conditions of Experiment 1, necessitating an increase in the amount of decay in comparison to
another Experiment 2 in which the retention interval was only 1 hour. Second, a subset of parameters must be selected that are
capable of capturing the effects of each experimental manipulation within the experiment to be simulated. Ideally, each
manipulation can be described by changes in exactly one parameter. For example, Condition A with a deeper elaboration may
be represented by a larger number of sampling cycles as compared to Condition B with a less deep elaboration.
The optimal values of the parameters are not obvious at the outset. However, there are two constraints on the
selection of values. First, each parameter should in substance reflect the experimental feature to be captured. For example,
varying the retention interval should be reflected

TABLE 1 Fixed parameters of the model

Parameter Description Values
Mn Memory span of individual n 5–9
S(Cq, Ii) Initial association strength of question q to image i .2–.7
S(Ct, Ii) Initial association strength of task context t to image i .2–.7
x^−b Change rate for increase (learning) or decrease (forgetting) of association strength 2^−.5
fl Fluctuation: maximum of random variation of association strength ±.1
thR Recall threshold: total activation necessary for recall of images .1–.3
thI Identification threshold: total activation necessary for identification of a recalled image .3–.5

TABLE 2 Free parameters of the model and their generally used range of values
Parameter Description Range
(1) Size of image set (knowledge):
Kqn Number of images for question q and person n 0–10
(2) Initial association strength between external cues and new images:
S(Cq, IE) (a) between question q and the estimate (E) .1–.9
S(Ct, IE) (b) between task-context t and the estimate (E) .1–.9
S(Cq, IA) (c) between question q and the anchor (A) .1–.9
S(Ct, IA) (d) between task-context t and the anchor (A) .1–.9
(3) Number of search and retrieval cycles (sampling):
ZG (a) for generating an estimate (G) 1–3
ZA (b) for encoding the anchor (A) 1–3
ZR (c) for reconstructing the estimate (R) 1–3
(4) Number of forgetting cycles (decay):
ZD1 (a) after generating an estimate 1–3
ZD2 (b) after encoding the anchor 1–3

in the number of decay cycles and not in any of the other parameters. Second, the selected values must be in a proper ordinal
relation with respect to other experiments or other conditions within the same experiment. For example, if the retention
interval is 1 day in Condition A and 1 week in Condition B, the number of decay cycles should be higher in Condition B than
in Condition A. Under these constraints, it is one of the goals of the model's simulation to find optimal values for each of the
parameters. To this end, all parameters are initially set to their most plausible value and then changed successively by a small
amount.
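In practice, the free parameters of Table 2 can be collected in a single configuration object that is then varied field by field. The following sketch is purely illustrative; the class, field names, and default values are ours and are not taken from the original program.

from dataclasses import dataclass

@dataclass
class FreeParameters:
    # Free parameters of one simulated condition (cf. Table 2).
    image_set_size: int = 5           # K: number of images per question/person
    s_question_estimate: float = 0.7  # S(Cq, IE)
    s_context_estimate: float = 0.7   # S(Ct, IE)
    s_question_anchor: float = 0.7    # S(Cq, IA)
    s_context_anchor: float = 0.7     # S(Ct, IA)
    cycles_generate: int = 2          # ZG: sampling cycles, estimate generation
    cycles_encode_anchor: int = 2     # ZA: sampling cycles, anchor encoding
    cycles_reconstruct: int = 2       # ZR: sampling cycles, reconstruction
    decay_cycles_1: int = 2           # ZD1: decay after generating the estimate
    decay_cycles_2: int = 2           # ZD2: decay after encoding the anchor

A condition with a longer retention interval, for example, would then differ from its control only in decay_cycles_1 or decay_cycles_2, in line with the ordinal constraints described above.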
The quality of the fit between simulated and empirical data is evaluated by looking at the percentage of correct
recollections, the amount of hindsight bias, and the distribution of original and recalled estimates. The goal is to produce data
that are identical to the empirical ones in all respects. This, of course, is an ambitious objective that will be put to the test by
subjecting both empirical and simulated data to a joint statistical analysis. More precisely, we computed a 2×2 contingency
table (chi-square test) across condition (experimental vs control) and data source (empirical vs simulated) to analyse the
percentage of correct recollections, a 2×2 ANOVA to compare the amount of hindsight bias, and finally a goodness-of-fit
statistic (chi-square tests) as well as a Kolmogorov-Smirnov test to compare the distributions of empirical and simulated
estimates. Due to these different statistics, no overall fitting function could be defined. Instead, fitting was done by hand and
eye, trying to keep all test statistics as low as possible. This is admittedly a weakness of the evaluation process that needs to
be fixed in future versions.
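The individual fit statistics mentioned here are straightforward to compute with standard tools. The following sketch uses SciPy purely for illustration; the counts and distributions are invented placeholders, not the study's data.

from scipy import stats

# Correct recollections: 2x2 table of absolute frequencies
# (rows: empirical vs simulated; columns: control vs experimental).
table = [[110, 95],
         [108, 90]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Distributions of estimates: two-sample Kolmogorov-Smirnov test.
empirical = [1.05, 0.98, 1.12, 0.87, 1.21, 0.93]
simulated = [1.01, 0.95, 1.08, 0.91, 1.15, 0.97]
ks_stat, p_ks = stats.ks_2samp(empirical, simulated)

Because the resulting statistics live on different scales, they still have to be weighed against each other by hand, as described above.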

Simulation of a hindsight-bias experiment


In this section we present a simulation of a simple experiment that demonstrated hindsight bias in the memory design. There
was no other empirical manipulation except whether or not the solution was given in the last phase before recollecting the
original estimates, thus creating experimental and control items, respectively. The goal of the simulation was to test whether
SARA would be able to replicate the given data in a satisfactory manner.
The original experiment. A total of 99 participants (students from the University of Trier, Germany) were tested in a
memory hindsight-bias design. They answered 40 difficult almanac-type questions from different knowledge domains by
giving a numerical estimate. After 1 week, they received the solutions to half of the questions and were asked to carefully read
the solutions. Then the solutions were taken away and the participants were asked to remember all their previous estimates.
The only manipulation in this experiment was whether the solution to a question was given (experimental items) or not
(control items). One of the 40 items had to be deleted after the experiment because the wording turned out to be ambiguous.
Furthermore, 114 data entries (i.e., 1.6% of the total set) were missing, while 323 others (i.e., 4.6%) were considered too
extreme (i.e., they were higher or lower than the median plus or minus three times the inter-quartile range; Tukey, 1977) and
were thus deleted from the data set. Missing entries were filled with random numbers that were taken from the range of all
answers to the specific question.
The percentage of correct recollections was found to be 28.4% for control and 24.5% for experimental items, a significant difference, p = .02. The amount of hindsight bias was measured by employing the
shift index “Δz” (Pohl, 1992) which is the standardised version of the “ΔE” index (cf. Fischer & Budescu, 1995). “ΔE”
compares the two distances from the original estimate E and from the recollected estimate RE to the anchor A:
ΔE = |E − A| − |RE − A|        (7)
A positive value indicates that the recollected estimate shifted towards the anchor (i.e., hindsight bias), a value of zero implies
that no systematic bias occurred, and a negative value signifies that a contrast effect emerged. To compute “Δz”, estimate,
anchor, and recalled estimate in Equation 7 were simply replaced with their standardised values. Standardisation of the data was
necessary in order to average across differently scaled questions. The analysis revealed that the shift was significantly larger for experimental than for control items, F(1, 98) = 56.920, p = .0001, thus indicating the typical
hindsight bias.
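A small sketch of both indices (helper functions of our own; the standardisation is assumed to use the mean and standard deviation of all answers to the respective question):

def delta_e(estimate, recalled, anchor):
    # Shift index ΔE (Equation 7): positive if the recollection moved towards
    # the anchor (hindsight bias), zero if unbiased, negative for a contrast
    # effect.
    return abs(estimate - anchor) - abs(recalled - anchor)

def delta_z(estimate, recalled, anchor, question_mean, question_sd):
    # Standardised shift index Δz: ΔE computed on z-standardised values, so
    # that differently scaled questions can be averaged.
    z = lambda v: (v - question_mean) / question_sd
    return delta_e(z(estimate), z(recalled), z(anchor))

# Goethe example: delta_e(64, 72, 82) = 18 - 10 = 8 years towards the anchor.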
The simulation with SARA. The model was run separately (with 99 simulated participants) for the control and for the
experimental condition. In the control condition, the simulation included three processes, namely the generation of an
estimate, a retention interval, and the reconstruction of the previous estimate. In the experimental condition, a fourth process
was inserted after the retention interval, namely the encoding of an anchor. Apart from that, all parameters were identical in
both conditions and set to the following values: the number of images varied from 3 to 7, the initial association strength from .7 to .8, and the number of all sampling and decay cycles from 1 to 3.
Let us first look at the percentage of correct recollections. In the simulation, 28.2% of the original estimates in the control
condition and 22.1% of those in the experimental condition were recalled correctly, p < .001. These quantities were similar to those found in the original study, namely 28.4% and 24.5%, respectively. A 2×2 contingency table using the absolute frequencies found no significant interaction, p = .28. Thus the simulated data closely fit the
empirical ones.
The incorrectly recalled estimates in the simulation showed shifts of and 0.020 for experimental and control
items, respectively. Again, these findings were similar to those of the original study, namely 0.266 and 0.037. Accordingly, an
overall 2×2 ANOVA including the data source (empirical vs. simulated) and the condition (experimental vs. control) showed
only a main effect for the condition, F(1, 196)=146.258, p< .001, but no effect of the data source, neither as a main effect nor
in interaction, both Fs<1. Thus, the simulated data were not different from the original ones.
Finally, we analysed the data by looking at the distributions of the mean distances between original estimates and anchors
and recalled estimates and anchors separately for empirical and simulated data. Figure 2 plots
these distributions for data in the experimental condition. Comparing the descriptive statistics revealed that the central
tendency (mean and median) as well as the dispersion (standard deviation and interquartile range) were highly similar for
empirical and simulated data in the experimental as well as in the control condition (Table 3).
In sum, we may conclude that the simulation reached a satisfactory fit to the empirical data. SARA was able to simulate a
complete experiment with 99 participants and 39 questions in which the original estimates were recalled either in an
experimental or in a control condition. In the simulation, the two conditions differed only with respect to whether a specific
process, namely the encoding of an anchor, was included or not (experimental vs control condition). Apart from that, all
parameters were identical in both conditions. This could possibly be interpreted as showing that a memory distortion alone
would be responsible for hindsight bias. However, encoding the anchor also allows for the anchor to later become a retrieval
cue, so that a biased reconstruction may be responsible as well. More research is needed to decide this issue (cf. Schwarz &
Stahlberg, 2003-this issue).

Explanatory power of SARA


SARA provides a highly detailed model that makes precise assumptions about the cognitive processes and memory
representations involved in the generation of anchoring effects and hindsight bias. This section is intended to further illustrate
how the model works and how it may simulate a number of well-known findings by adequately setting the model’s
parameters. Because the model is built on a modular basis so that the assumed processes can be arranged or repeated as

Figure 2. Distribution of mean distances to the anchor for original and recalled estimates separated for empirical and simulated data in the
experimental condition.

TABLE 3 Statistics for the mean distances of estimates and recollections to the anchor separated for empirical and simulated data
Estimates Recollections
Empirical Simulated Empirical Simulated
Experimental items
Mean 1.075 1.007 0.809 0.760
Median 1.046 1.033 0.823 0.724
Standard deviation 0.291 0.231 0.228 0.205
Interquartile range 0.380 0.394 0.308 0.275
Control items
Mean 1.064 1.014 1.027 0.995
Median 1.059 1.002 1.048 1.001
Standard deviation 0.283 0.236 0.228 0.224
Interquartile range 0.431 0.347 0.283 0.374

necessary, many known experimental procedures as well as new ones can be created. The advantage of the model is that it
specifies the state of the knowledge representation at any point in time. In the following, we discuss two main assumptions of
the model: the process of selective activation and the use of retrieval cues.

Selective activation. The main cognitive processes of the model (generating an estimate, encoding the anchor, and
reconstructing the estimate) all employ cyclic retrieval attempts (sampling). A central feature of the sampling process is that
the associations of retrieved images to currently available external cues (like the question, the task context, or the anchor) are
increased, which in turn leads to a higher retrieval probability of these images in later sampling processes (Eisenhauer & Pohl,
1999; Pohl & Eisenhauer, 1997). With this simple mechanism, SARA is able to explain several basic findings.
First of all, let us turn to the explanation of the anchoring effect and hindsight bias. Selective activation is one of the two
processes leading to these distortions. At the same time, the anchor itself is encoded as a new image into the image set. If the
anchor is retrieved in the reconstruction phase, it serves as a retrieval cue and thus biases memory search. Even worse, if the
anchor is not recognised, it will be integrated just like any other image to yield the reconstructed estimate. This last case
resembles the first account of hindsight bias, namely immediate and irreversible assimilation (Fischhoff, 1975).
Comparing different experimental procedures, hindsight bias is usually smaller in the memory than in the hypothetical
design (see, e.g., Davies, 1992; Hertwig, Gigerenzer, & Hoffrage, 1997; Musch, 2003-this issue; Powell, 1988). Given that
this finding is not based on an artefact—which would be the case if correct recollections, which can only occur in the memory
design, are not excluded from the analyses (Pohl, 1995; Stahlberg & Schwarz, 1999)—it would lend some support to the
hypothesis of a changed memory and would seriously challenge the hypothesis that hindsight bias is solely based on
reconstructive processes. According to SARA, one’s memory is already changed while generating the original estimate.
Images that were retrieved in this phase are selectively activated and thus have a higher retrieval probability than other, unused images. These activated images should therefore be helpful during reconstruction if the estimate itself cannot be
retrieved. As a consequence, hindsight bias will be smaller in the memory than in the hypothetical design.
SARA can also easily explain the results of studies using “dynamic” anchors that are constructed in relation to the first
estimate of a person in a memory design (e.g., Dehn & Erdfelder, 1998; Hardt & Pohl, 2003-this issue; Schwarz & Stahlberg,
2003-this issue; Stahlberg, Eller, Romahn, & Frey, 1993). In these studies, the distance of the anchor to one’s estimate is
varied. If the distance appears subjectively plausible and if hindsight bias is measured on an absolute scale, the bias increases
linearly with the anchor’s distance. In terms of SARA, these findings can again be explained on the basis of selective
activation. If the distance between the anchor and the estimate increases, more extreme images of the image set are activated,
resulting in a larger absolute hindsight bias. However, this increase should be diminished and eventually stop when the anchor
falls far outside the range of the image set, because increasingly less and finally nothing in the image set is selectively
activated by the anchor. And indeed, several authors found that extreme anchor distances did not produce a larger hindsight
bias (Pohl, 1998; Strack & Mussweiler, 1997, Exp. 3), while others even observed a smaller hindsight bias for extreme
anchors (Chapman & Johnson, 1994; Hardt & Pohl, 2003-this issue).
An interesting variation consists in the presentation of multiple anchors. If two anchors are placed symmetrically around
the original estimate, the recollection will no longer be distorted, as we found in an (as yet unpublished) study conducted in our
lab. The effects of contradictory anchors can also be derived from the results of studies on “counterfactual reasoning” (see the
review in Hawkins & Hastie, 1990, for some references). These studies reported that hindsight bias could be reduced or
eliminated when a person is forced to consider the opposite outcome. This corresponds to the presentation of two symmetrical
anchors that neutralise each other. If both anchors lie on the same side of the estimate, if they are asymmetrically distributed around it, or if more than two anchors are used, the result can nevertheless easily be predicted: The resulting effect corresponds to that of the (hypothetical) mean anchor. All these results support the
assumptions incorporated into SARA: Each anchor selectively activates numerically similar images of the image set, and the
reconstruction process in the end is bound to reflect all these changed retrieval probabilities.

Retrieval cues. Retrieval cues provide a further means of explaining specific empirical findings. Most importantly, the
anchor may serve as an additional cue while searching memory in the reconstruction phase, either because the anchor is still
available in working memory or because it has successfully been retrieved during a previous sampling cycle. In any case, if the
anchor is present in working memory, it will substantially influence the ongoing memory search and thus lead to hindsight
bias, because it is highly associated to images that are numerically similar, which will cause these images to be more likely to
be retrieved. This corresponds to the biased-reconstruction view of hindsight bias (Schwarz & Stahlberg, 2003-this issue;
Stahlberg et al., 1993; Stahlberg & Maass, 1998). Thus, SARA incorporates both of the main approaches to explaining
hindsight bias, namely altered memory as well as biased reconstruction.
The particular context of the task at hand allows for a specific recollection of individual episodes of the experiment. SARA
distinguishes between different recollection tasks (e.g., recollecting the original estimate, the given anchor, or an estimate
already recollected before). The particular context serves as an additional retrieval cue that allows for a differentiated
sampling. Davies (1987, Exp. 1) found that offering notes that were made while generating the original estimate could reduce
hindsight bias. These notes could have been the result of the first sampling process and thus represent specific images of the
image set. In the recollection phase, these notes could be represented as additional retrieval cues that are highly associated
with those images that were activated during the generation of the estimate. The reconstruction will therefore be more
accurate, so that hindsight bias is reduced.
A few studies have found that the recall of the anchor is also systematically biased, namely towards the original
estimate (see, e.g., Pohl, 1999). Some theories have problems with this finding. The assimilation theory, for example, would
predict the same recollection for both the estimate and the anchor. But the data showed clearly distinct recollections (Pohl,
1999). With respect to the biased-reconstruction view, one would need to assume that reconstruction of the anchor started
from the original estimate (just as reconstruction of the estimate started from the anchor). But that is a highly implausible

Figure 3. Percentage of correctly recalled estimates and amount of hindsight bias for incorrectly recalled estimates as a function of the
image-set size.

assumption. In SARA, the described finding can be explained by assuming a selective activation of the image set. In addition,
different task contexts serve as specific retrieval cues and thus help to differentiate between estimate and anchor recollection.

SARA’s behaviour
In this section we look at the model’s free parameters (cf. Table 2) and illustrate which empirical manipulations they can
possibly capture. We restrict our discussion to those parameters that appeared to be most easily applicable. These are (a) the
size of the image set, (b) the number of search and retrieval cycles during sampling, and (c) the number of forgetting cycles.
As default ranges for the values of the free parameters of the model, we set the number of images from 0 to 10, the initial
association strength from .2 to .5, and all sampling and decay cycles from 1 to 3. In the following runs that simulated the
experimental condition in the memory design with four phases (estimate generation, retention interval, anchor encoding, and
estimate reconstruction), we kept all parameters at the given default values, except one which was then varied systematically
across the specified range of values to see how the model responds to this manipulation. Owing to the complexity of the model, the parameters interact with each other, so that the results of manipulating one parameter depend on the
values of the others. The following results therefore represent only a selection of all possible behaviours of SARA.
To illustrate the behaviour of the model, we selected one question, namely “How old was Goethe when he died?”, for
which we had obtained a large number of independent estimates (464) from previous studies. We thus constructed 464
artificial image sets to represent individual participants that were then used by SARA. In order to reduce the random error, we
ran each simulation 10 times and then averaged across the ten results. The percentage of correct recalls and the amount of
hindsight bias for the incorrectly recalled estimates were chosen as the dependent measures. A correct recall was scored if
estimate and recalled estimate were identical. Hindsight bias was measured by “ΔE” (see Equation 7).
The results of the first simulation that used the default values as mentioned above were that 23.4% (SE=6.2%) of all
recollected estimates were correct, while the remaining ones showed a systematic bias towards the solution (i.e., “82”) with ΔE = 4.87 (SE = 0.207). That is, recalled estimates were on average almost 5 years closer to the true outcome than the original
estimates had been. In order to evaluate whether this bias indicates a small or large effect, we computed the mean absolute
difference between the original estimates and the solution, which yielded ΔE = 16.70 (by setting RE = A) as the maximally
possible hindsight bias for this set of estimates. The recollected estimates in our simulation thus moved 29.2% (=4.87/16.70)
towards the solution. (In a simulation of the control condition, in which the anchor was not presented, the bias index turned
out to be close to zero, SE = 0.154, thus signalling the absence of hindsight bias.)
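The simulation procedure just described can be summarised in a few lines. The run_model function, its return attributes, and the bias measure below are placeholders standing in for the full model; only the overall loop structure (all 464 image sets, 10 repetitions, averaged results) follows the description above.

def run_series(image_sets, run_model, n_runs=10):
    # Run the model n_runs times over all image sets and average the two
    # dependent measures (percentage of correct recalls, mean ΔE of the
    # incorrectly recalled estimates) to reduce random error.
    pct_correct, mean_bias = [], []
    for _ in range(n_runs):
        results = [run_model(s) for s in image_sets]  # one simulated participant per set
        wrong = [r for r in results if r.recalled != r.estimate]
        pct_correct.append(100.0 * (len(results) - len(wrong)) / len(results))
        mean_bias.append(sum(abs(r.estimate - r.anchor) - abs(r.recalled - r.anchor)
                             for r in wrong) / len(wrong))
    return sum(pct_correct) / n_runs, sum(mean_bias) / n_runs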

Size of image set. In the first series of simulations, the number of images in the participants’ image sets was varied from 1
to 10 (Figure 3). Theoretically, a set size of zero is also possible (in which case the model will simply guess a value from the
available range), but that is not of major interest here. The results of the simulations showed that both dependent measures
decreased monotonically with increasing set size. The percentage of correct recall decreases because it gets increasingly
difficult to find the original estimate among the other images (given that there are only between one and three sampling cycles
at retrieval). And hindsight bias is diminished because the chance of finding images that were not affected by selective activation at anchor encoding increases.
Empirical findings showed that hindsight bias is smaller for easy than for difficult questions (for details see Hawkins &
Hastie, 1990) and that experts reveal less hindsight bias than laypersons (Christensen-Szalanski & Willham, 1991; see also
Hertwig et al., 2003-this issue). This would be reflected in the model’s behaviour, if one assumes that a large number of
available images is a characteristic of expertise or easy items. We doubt that this is the case, because Pohl (1992) found that

Figure 4. Percentage of correctly recalled estimates as a function of the number of sampling cycles.

Figure 5. Amount of hindsight bias for incorrectly recalled estimates as a function of the number of sampling cycles.

experts and novices exhibited the same amount of hindsight bias, but that experts generated better estimates and more often
correctly recalled their estimates. Expertise is thus probably more adequately captured by the images’ quality (i.e., their
proximity to the solution) and their initially higher association strengths to the question than by their number. But these
features were not varied here.
Sampling cycles. Varying the number of cycles separately for the three main processes in the memory design (i.e.,
generating an estimate, encoding the anchor, and reconstructing the estimate) produced the results shown in Figures 4 and 5.
Increasing the sampling cycles for generating an estimate yielded a decrease in the percentage of correct recalls as well as a
decrease in hindsight bias. While the latter finding appears straightforward, the former may come as a surprise. However, the
number of retrieval cycles affects only the associative strength of those images that were successfully retrieved and then
served as retrieval cues. The association between the final estimate and the question as well as the estimate’s associations to
the other images are not affected. Due to this assumption of the model (and due to the restricted number of reconstruction
cycles), the relative probability of finding one of the retrieved images again during reconstruction increases. This in turn leads
to both less correct recall and less hindsight bias.
Regarding the encoding of the anchor, varying the number of sampling cycles yielded the following effects: While the
percentage of correct recalls was not affected, hindsight bias increased with an increasing number of cycles. This, of course, is
due to the selective activation of images during anchor encoding and thus reflects a central assumption of SARA.
Finally, an increased number of sampling cycles in the reconstruction phase led to an increase in the percentage of correct
recalls, except when there was only one sampling cycle. The high percentage of correctly recalled estimates given only a
single retrieval cycle is due to an additional number of “random hits” because with only one cycle, the reconstruction process
will return that single image that has been retrieved, no matter whether it is recognised or not. This type of random hit is not
possible with two or more cycles (given that at least two retrieval attempts were successful) because the numerical contents of
the retrieved images will be integrated (i.e., averaged) to produce the reconstruction of the original estimate. With an
increasing number of cycles, the probability of finding (and recognising) the original estimate increases, leading to an
increased percentage of correct recalls. Concerning the amount of hindsight bias, the number of sampling cycles showed no
effect, contrary to the situation in estimate generation and anchor encoding. However, hindsight bias was larger when there
was only one single sampling cycle at recollection. Again, this is caused by a procedural peculiarity. If the anchor is retrieved
during the first and only cycle, but is not recognised as such, it will be given as the recalled estimate. In this case, however,
hindsight bias is maximal. This situation cannot occur with more than one successful retrieval cycle. More intriguing,
however, is the finding that hindsight bias did not decrease with more than two reconstruction cycles. Apparently, the probability
of finding images that were used during generating the estimate and images that were used to encode the anchor was similar,
so that the amount of bias remained fairly constant. This would not have been the case if the number of cycles had been
different for the first two tasks. More precisely, hindsight bias would decrease with relatively more cycles at generating the
estimate, while it would increase with relatively more cycles at encoding the anchor.
The sampling-cycles parameter seems to be of importance for research that addresses the effects of a more or less intense
elaboration of the estimate or the anchor on hindsight bias. The deeper the elaboration, the more sampling cycles are likely to
be performed in order to generate an estimate or to encode an anchor. Empirical findings showed that a deeper encoding of one’s
own estimate reduced hindsight bias (Davies, 1987, Exp. 3; Hell et al., 1988), whereas a deeper encoding of the anchor
enlarged it (Kohnert, 1996, Exp. 1; Wasserman, Lempert, & Hastie, 1991; Wood, 1978, Exp. 2). These findings correspond to
the results obtained in the model’s simulation. However, before drawing any conclusions, it has to be taken into account that
the cited studies did not distinguish between correctly and incorrectly recalled estimates as we did in our analysis of the
results generated by SARA.
Appropriate analyses did indeed show a different pattern of effects. Two studies (Dehn & Erdfelder, 1998; Pohl, Ludwig, &
Ganner, 1999) found that a deeper elaboration during the generation of the estimate led to a higher percentage of correct
recall, but did not affect hindsight bias for the incorrectly recalled estimates. It is therefore much more plausible to attribute the
effects of a deeper elaboration to the initial association strengths between the generated estimate and the external retrieval
cues (leading to a higher percentage of correctly recalled estimates) than to the number of sampling cycles.
In a recent study in our lab, another experimental manipulation was used that may be captured by varying the number of
sampling cycles. In that study, generating the estimate (in the hypothetical design, i.e., after having encoded the anchor) took
place with or without time pressure. Under time pressure, only a few images can be retrieved, because only a few sampling cycles
may occur. These images are most likely those that were activated only a short time ago, and these are, of course, mainly
images that are similar to the anchor. As a consequence, hindsight bias should be larger under time pressure. The results
confirmed this prediction. (The percentage of correctly recalled estimates could not be analysed in this experiment because it
was run in a hypothetical design.)
Another recent experiment from our lab found that presenting the anchor twice rather than once in a memory design increased the amount of hindsight bias, p = .06, but did not change the percentage of correct recollections, t(47) = 1.225,
p=.23, just as was predicted by the model.
Forgetting cycles. In the memory design, there are two potential retention intervals, namely between generating the
estimate and encoding the anchor (Decay I) and between the latter and reconstructing the estimate (Decay II). The effects of
varying these forgetting cycles from 0 cycles (indicating no decay) to 10 cycles are shown in Figures 6 and 7. While an
increasing number of decay cycles had detrimental effects on the percentage of correct recall for both types of decay, it had
opposite effects on the amount of hindsight bias.
If there was more forgetting right after generating the estimate (Decay I), the percentage of correctly recalled estimates
decreased monotonically, while at the same time hindsight bias increased in a similar fashion. Both findings are
straightforward and reasonable. For Decay II (i.e., after encoding the anchor), the decrease in the percentage of correct recall
was less pronounced than for Decay I. The reason for this finding is that without Decay I, the original estimate as well as
those images that were used to generate it have a relatively high chance of being used to encode the anchor. Their
associative strength will thus be further increased (they are “rehearsed” so to speak). As a consequence, the probability of
later finding the original estimate is larger than in the case in which Decay I occurred. At the same time, Decay II led to a
strong decrease in hindsight bias, which is caused by “forgetting” the anchor and the images associated with it. With an even
longer retention interval—that is, with more forgetting cycles in Decay II—the associative pattern of the image set will
eventually return to its pre-experimental state, which then would no longer yield any hindsight bias.
These considerations fit nicely with empirical findings (Pohl, 1999; Schmidt, 1993). Comparing the retention interval between
estimation and outcome presentation in different studies on political elections (Decay I) revealed that hindsight bias increased
with an increasing retention interval (as summarised in Blank, Fischer, & Erdfelder, 2003-this issue). At the same time,
increasing the interval after an unfavourable outcome of a cholesterol test had been given led to a reduction in hindsight bias
(Renner, 2003-this issue). And given a fixed retention interval between estimation and recollection (Hell et al., 1988),
hindsight bias was found to be larger if the anchor was presented just before the recollection test (Decay I) than when it was
presented directly after the initial estimation (Decay II).

Figure 6. Percentage of correctly recalled estimates as a function of the number of decay cycles.

Figure 7. Amount of hindsight bias for incorrectly recalled estimates as a function of the number of decay cycles.

Limitations of the model


The evaluation of the behaviour of the cognitive process model SARA has shown that the model may be used as an effective
tool in reproducing several different types of effects on the percentage of correct recall and on the amount of hindsight bias.
However, we reported only the means for the dependent variables and not any effect-size measure or ANOVA result across the
tested parameter values. Looking at the ordinates in Figures 3 to 7 quickly reveals that effect sizes are different for the free
parameters of the model. How these effect sizes fit the empirical ones remains to be demonstrated.
Besides, there are some conceptual limitations that should be pointed out in order to reach a fair judgement of SARA’s
merits so far. One is the complexity of the model which may be seen as an advantage (because it allows simulation of a wide
variety of experiments) but also as a disadvantage (because it leads to numerous interactions between parameters). As a
consequence, the model may appear to be able to simulate just any pattern of results, which would certainly be a theoretical
drawback. From our experience with the model’s behaviour so far, we would deny that this poses a serious threat, but that
remains to be shown in future work.
Most importantly, SARA focuses on the automatic processes involved in producing hindsight bias. The model thus reflects
the basic observation that the anchoring effect and hindsight bias are extremely robust and can hardly be influenced intentionally
(Christensen-Szalanski & Willham, 1991; Pohl & Hell, 1996). However, there are several studies that provide evidence for
the influence of intentional, strategic, or metacognitive processes that may moderate the amount of hindsight bias (Kohnert,
1996; Schwarz & Stahlberg, 1999, 2003-this issue; Werth, 1998; Werth & Strack, 2003-this issue). In these cases evaluative
processes, which may go along with the presentation of the anchor, are of special importance. For example, anchors that are
stated to be an expert’s solution may lead to a larger hindsight bias than those of a non-expert (Pohl, 2000).
The plausibility of an anchor, in particular, causes strong effects (Hardt & Pohl, 2003-this issue; Pohl, 1998, 2000; Strack &
Mussweiler, 1997): While subjectively plausible anchors led to maximum anchor effects, subjectively implausible anchors
caused reduced or no effects at all or even led to reversed effects (i.e., remembered estimates were further from the anchor
than the original estimates). Reduced, eliminated, or reversed effects can also be observed in cases in which the anchor
consists of a surprising solution (Choi & Nisbett, 2000; Guerin, 1982; Kohnert, 1996; Ofir & Mazursky, 1997; Pezzo, 2003-this
issue; Pohl, Bender, & Lachmann, 2002). Evidently, surprise may have produced the meta-cognition “I couldn’t have known
that at all! That means that my estimate must have been far from the solution” (Mark, Boburka, Eyssell, Cohen, & Mellor,
2003-this issue; Mazursky & Ofir, 1990). Other studies showed that the discrediting of a previously presented anchor may
also lead to strategic effects, thereby reducing or even eliminating hindsight bias (Erdfelder & Buchner, 1998, Exp. 3; Hasher
et al., 1981, Exp. 2).
SARA is not yet able to simulate these processes. The changes in one’s knowledge base through selective activation cannot
be reversed. This corresponds to Fischhoff’s (1975) assumption of an immediate and irreversible assimilation. The only
possibility available is to present “counter-anchors”, which may compensate for the previous distortion (as, for example, in
the case of “counterfactual reasoning”; see above), or to have rather long retention intervals leading to complete forgetting.
But this would still not suffice to explain reversed effects.

CONCLUSIONS
The detailed and completely formalised process model SARA that was laid out in this paper offers a comprehensive cognitive
theory with a broad area of application in the field of distorted judgements and recollections. The model can easily explain
many of the well-known findings in the domain of the anchoring effect and hindsight bias. Several new findings are predicted
and thus allow testing of the model’s assumptions. Advantages of the model are (a) that it is mainly based on established
assumptions of cognitive psychology (borrowing from “SAM”; Raaijmakers & Shiffrin, 1980; Shiffrin & Raaijmakers, 1992),
(b) that it makes detailed assumptions about all processes involved, (c) that it postulates flexible process modules, which can
be used for many kinds of experimental procedures and manipulations, and (d) that it is implemented as a computer program
thus allowing the simulation of empirical findings.
With respect to the explanation of the anchoring effect and hindsight bias, the model proposes two mechanisms, namely,
(a) that encoding the anchor changes the associative pattern in the item-specific knowledge base (“selective activation”;
Eisenhauer & Pohl, 1999; Pohl & Eisenhauer, 1997; Pohl et al., 2000) and (b) that the anchor may act as a retrieval cue,
thereby directing memory search (“biased reconstruction”; Schwarz & Stahlberg, 2003-this issue; Stahlberg et al., 1993;
Stahlberg & Maass, 1998). Both processes lead to changed retrieval probabilities. While the impact of the latter mechanism
appears to be firmly established by many empirical findings, the role of the first one is still under dispute. Hopefully, future
simulations with SARA will help us to decide which mechanism plays exactly which role.

Perspectives
One crucial feature of the model is that it makes new predictions that can be empirically tested. Some ideas have already been
mentioned in the text. For example, it would be worthwhile to compare two or more simultaneously presented anchors to
anchors that were presented successively. This manipulation would be easy to simulate with SARA. If there are substantial
retention intervals between the successively presented anchors (allowing for some forgetting), sequential effects are to be
expected. The anchor presented last is supposed to have a greater influence than previously presented anchors. To our
knowledge there are no studies yet examining such sequential effects (but compare order effects in “belief updating”; Hogarth
& Einhorn, 1992; Kerstholt & Jackson, 1998).
Furthermore, additional predictions can be derived from the model: (1) If the availability of single images could be changed
(e.g., by encoding them more or less deeply) so that their order of retrieval would be altered, the model predicts that the
earlier an image is retrieved, the larger its impact should be. (2) If participants are asked to recall their original estimates
repeatedly, SARA would predict a constant hindsight bias. (3) SARA predicts the distorted recollection of the anchor (cf.
Pohl, 1999). (4) A counterintuitive prediction of the model is that a larger image set not only leads to a reduction in the
amount of hindsight bias, but also to a reduced percentage of correct recalls. This assumption could possibly be tested by
instructing participants to memorise artificial image sets that contain different numbers of images.
Although SARA certainly has its limitations and cannot account for all phenomena known in hindsight-bias research, in
particular for higher-order processes (as discussed above), we think that before modelling these higher-order processes, we
need to understand the basics of how anchoring and hindsight bias emerge in the first place. SARA appears to be a promising
candidate for modelling the cognitive processes on such a level.

REFERENCES

Anderson, J.R., & Schooler, J.W. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.
Anderson, N.H. (1981). Foundations of information integration theory. New York: Academic Press.
Anderson, N.H. (1986). Algebraic rules in psychological measurement. In H.R.Arkes & K.R. Hammond (Eds.), Judgment and decision
making: An interdisciplinary reader (pp. 77–94). Cambridge: Cambridge University Press.
Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491–504.
Chapman, G.B., & Johnson, E.J. (1994). The limits of anchoring. Journal of Behavioral Decision Making, 7, 223–242.
Choi, I., & Nisbett, R.E. (2000). Cultural psychology of surprise: Holistic theories and recognition of contradiction. Journal of Personality
and Social Psychology, 79, 890–905.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain
Sciences, 24, 87–185.
Davies, M.F. (1987). Reduction of hindsight bias by restoration of foresight perspective: Effectiveness of foresight-encoding and hindsight-
retrieval strategies. Organizational Behavior and Human Decision Processes, 40, 50–68.
Davies, M.F. (1992). Field dependence and hindsight bias: Cognitive restructuring and the generation of reasons. Journal of Research in
Personality, 26, 58–74.
Dehn, D.M., & Erdfelder, E. (1998). What kind of bias is hindsight bias? Psychological Research, 61, 135–146.
Eisenhauer, M., & Pohl, R.F. (1999). Selective activation as an explanation for hindsight bias. Proceedings of the Twenty First Annual
Conference of the Cognitive Science Society (pp. 144–148). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: An integrative multinomial processing tree model. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischer, I., & Budescu, D.V. (1995). Desirability and hindsight bias in predicting results of a multi-party election. In J.-P.Caverni, M.Bar-
Hillel, & F.H. Barron (Eds.), Contributions to decision making I (pp. 193–212). Amsterdam: Elsevier.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological
Review, 98, 506–528.
Guerin, B. (1982). Salience and hindsight biases in judgments of world events. Psychological Reports, 50, 411–414.
Hardt, O., & Pohl, R.F. (2003). Hindsight bias as a function of anchor distance and anchor plausibility. Memory, 11, 379–394.
Hasher, L., Attig, M.S., & Alba, J.W. (1981). I knew it all along: Or, did I? Journal of Verbal Learning and Verbal Behavior, 20, 86–96.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An interaction of automatic and motivational factors?
Memory and Cognition, 16, 533–538.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and heuristics affect our reconstruction of the past.
Memory, 11, 357–377.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Hogarth, R.M., & Einhorn, H.J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Kahneman, D., & Miller, D.T. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93, 136–153.
Kerstholt, J., & Jackson, J.L. (1998). Judicial decision making: Order of evidence presentation and availability of background information.
Applied Cognitive Psychology, 12, 445–454.
Kohnert, A. (1996). Grenzen des Rückschaufehlers: Die Verzerrung von Erinnerungen an früheres Wissen durch neue Informationen
[Limits of hindsight bias: The distortions of recollections of earlier knowledge through new information]. Bonn, Germany: Holos.
Mark, M.M., Boburka, R.R., Eyssell, K.M., Cohen, L. L., & Mellor, S. (2003). “I couldn’t have seen it coming”: The impact of negative
self-relevant outcomes on retrospections about foreseeability. Memory, 11, 443–454.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal of the hindsight bias. Organizational Behavior
and Human Decision Processes, 46, 20–33.
Mensink, G.-J., & Raaijmakers, J.G.W. (1988). A model of interference and forgetting. Psychological Review, 95, 434–455.
Mensink, G.-J., & Raaijmakers, J.G.W. (1989). A model of contextual fluctuation. Journal of Mathematical Psychology, 33, 172–186.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological
Review, 63, 81–97.
Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.
Mussweiler, T., & Strack, F. (1999). Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective
accessibility model. Journal of Experimental Social Psychology, 35, 136–164.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Pezzo, M.V. (2003). Surprise, defence, or making sense: What removes the hindsight bias? Memory, 11, 421– 441.
Pohl, R.F. (1992). Der Rückschau-Fehler: systematische Verfälschung der Erinnerung bei Experten und Novizen [Hindsight bias:
Systematic distortions of the memory of experts and lay persons]. Kognitionswissenschaft, 3, 38–44.
Pohl, R.F. (1995). Disenchanting hindsight bias. In J.-P. Caverni, M. Bar-Hillel, F.H.Barron, & H.Jungermann (Eds.), Contributions to
decision making (pp. 323–334). Amsterdam: Elsevier.
Pohl, R.F. (1998). The effects of feedback source and plausibility on hindsight bias. European Journal of Cognitive Psychology, 10,
191–212.
Pohl, R.F. (1999). More than hindsight bias: Systematically distorted recollections of the solutions of difficult knowledge questions. Paper
presented at the 11th Congress of the European Society for Cognitive Psychology, Ghent, Belgium.
Pohl, R.F. (2000). Suggestibility and anchoring. In V. De Pascalis, V.A.Gheorghiu, P.W.Sheehan, & I. Kirsch (Eds.), Suggestion and
suggestibility: Advances in theory and research (pp. 137–151). Munich, Germany: M.E.G.-Stiftung.
Pohl, R.F., Bender, M., & Lachmann, G. (2002). Hindsight bias around the world. Experimental Psychology, 49, 270–282.
Pohl, R.F., & Eisenhauer, M. (1997). SARA: An associative model for anchoring and hindsight bias. In M.G.Shafto & P.Langley (Eds.),
Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (p. 1103). Mahwah, NJ: Lawrence Erlbaum
Associates Inc.
Pohl, R.F., Hardt, O., & Eisenhauer, M. (2000). SARA—Ein kognitives Prozeßmodell zur Erklärung von Ankereffekt und Rückschaufehler
[SARA—A cognitive process model to explain anchoring effect and hindsight bias]. Kognitionswissenschaft, 9, 77–92.
Pohl, R.F., & Hell, W. (1996). No reduction of hindsight bias with complete information and repeated testing. Organizational Behavior and
Human Decision Processes, 67, 49–58.
Pohl, R.F., Ludwig, M., & Ganner, J. (1999). Kein Effekt der Tiefe der Elaboration auf das Ausmaß des Rückschaufehlers [No effect of the
degree of elaboration on the amount of hindsight bias]. Zeitschrift für Experimentelle Psychologie, 46, 275–287.
Powell, J.L. (1988). A test of the “knew-it-all-along” effect in the 1984 presidential and statewide elections. Journal of Applied Social
Psychology, 18, 760–773.
Raaijmakers, J.G.W., & Shiffrin, R.M. (1980). SAM: A theory of probabilistic search of associative memory. In G.H.Bower (Ed.), The
psychology of learning and motivation (Vol. 14; pp. 207–262). San Diego, CA: Academic Press.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Schmidt, C. (1993). Verzerrte Vorstellung von Vergangenem: Vorsatz oder Versehen? [Distorted images of the past: Intention or failure?]
Bonn, Germany: Holos-Verlag.
Schwarz, S., & Stahlberg, D. (1999). Hindsight-bias: The role of perfect memory and meta-cognitions (Report No. 99–35). Mannheim,
Germany: Sonderforschungsbereich 504: Rationality, decision making, and economic modelling.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Shiffrin, R.M., & Raaijmakers, J.G.W. (1992). The SAM retrieval model: A retrospective and a prospective. In A.F.Healy, S.M.Kosslyn, &
R.M. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in Honor of William K.Estes (Vol 2, pp. 69–86).
Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Stahlberg, D., Eller, F., Romahn, A., & Frey, D. (1993). Der Knew-it-all-along-Effekt in Urteilssituationen von hoher und geringer
Selbstwertrelevanz [The knew-it-all-along effect in judgmental settings of high and low self-esteem relevance]. Zeitschrift für
Sozialpsychologie, 24, 94–102.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
review of social psychology (Vol. 8, pp. 105–132). Chichester, UK: Wiley.
Stahlberg, D., & Schwarz, S. (1999). Would I have known it all along if I would hate to know it? The hindsight bias in situations of high and
low self esteem relevance (Report No. 99–34). Mannheim, Germany: Sonderforschungsbereich 504: Rationality, decision making, and
economic modelling.
Strack, F., & Mussweiler, T. (1997). Explaining the enigmatic anchoring effect: Mechanisms of selective accessibility. Journal of
Personality and Social Psychology, 73, 437–446.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Wasserman, D., Lempert, R.O., & Hastie, R. (1991). Hindsight and causality. Personality and Social Psychology Bulletin, 17, 30–35.
Werth, L. (1998). Ein inferentieller Erklärungsansatz des Rückschaufehlers [An inference explanation of hindsight bias]. Hamburg,
Germany: Kovac.
Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along phenomenon. Memory, 11, 411–419.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence-hindsight mirror effect in judgment: An accuracy-assessment model for
the knew-it-all-along phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 415–431.
Wood, G. (1978). The “knew-it-all-along” effect. Journal of Experimental Psychology: Human Perception and Performance, 4, 345–353.
Hindsight bias: How knowledge and heuristics affect our
reconstruction of the past
Ralph Hertwig
Columbia University, USA, and Max Planck Institute for Human Development, Berlin, Germany
Carola Fanselow and Ulrich Hoffrage
Max Planck Institute for Human Development, Berlin, Germany

Once people know the outcome of an event, they tend to overestimate what could have been anticipated in
foresight. Although typically considered to be a robust phenomenon, this hindsight bias is subject to moderating
circumstances. In their meta-analysis, Christensen-Szalanski and Willham (1991) observed that the more
experience people have with the task under consideration, the smaller is the resulting hindsight bias. This
observation is one benchmark against which the explanatory power of process models of hindsight bias can be
measured. Therefore, we used it to put the recently proposed RAFT model (Hoffrage, Hertwig, & Gigerenzer,
2000) to another test. Our findings were consistent with the “expertise effect.” Specifically, we observed—using
computer simulations of the RAFT model—that the more comprehensive people’s knowledge is in foresight, the
smaller is their hindsight bias. In addition, we made two counterintuitive observations: First, the relation between
foresight knowledge and hindsight bias appears to be independent of how knowledge is processed. Second, even
if foresight knowledge is false, it can reduce hindsight bias. We conclude with a discussion of the functional value
of hindsight bias.

Recollection or re-evaluation of past events can be affected by what has happened since. While interesting in its own right—
as it sheds light on the working of human memory—this phenomenon, hindsight bias, also matters because it affects how we
evaluate the actions of others. Take historians as an example. Historians are hermeneuts of the past, trying to explain why
things turned out the way they did. They must, for instance, evaluate the appropriateness of ex ante behaviour (e.g.,
Napoleon’s decision to invade Russia) that resulted in bad or good ex post outcomes. By necessity historians are cognisant of
the outcome, and this knowledge can affect their evaluations. A behaviour, for instance, may be judged to be more neglectful
than it would have been judged without knowing its negative consequences.
As an example, consider the tragic failure of an adventure that has captured the public’s imagination: Robert F.Scott’s race to
be first to reach the South Pole. In November 1911, Scott led a British team in an attempt to reach the Pole. After marching
and skiing more than 900 miles, Scott and his four companions reached their goal in January 1912, only to find that
Amundsen and his Norwegian colleagues had beaten them by almost a month. On their way back, Scott and his compatriots
froze to death in a tent just a few miles short of a depot of food and heating oil. “When word of their deaths reached
England, Scott was hailed as a hero, an exemplar of
English gentlemanly pluck in the face of dire adversity” (The New York Times, Science section, 28 August 2001). In recent
decades, however, historians have turned to less flattering second-guessing of Scott’s actions. For instance, the British
historian Roland Huntford sought to revise the public’s view of Scott. With the benefit of hindsight, he questioned many of
Scott’s decisions, such as why Scott and his men acted as their own pack animals, pulling a sled loaded with more than 200
pounds of equipment and supplies. He also asked how it was possible that Scott and his crew were not prepared for the
gruelling temperatures. In his foreword to the new edition of Huntford’s book (1999), the well-known travel writer Paul Theroux continued the tradition of denigrating Scott, describing him as “insecure, dark, panicky, humorless, an enigma to his men, unprepared, and a bungler, but in the spirit of a large-scale bungler, always self-dramatizing” (p. xiv).

Requests for reprints should be sent to Ralph Hertwig, Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany. Email: hertwig@mpib-berlin.mpg.de. This work was completed while Ralph Hertwig was a visiting scholar at Columbia University. We thank the Deutsche Forschungsgemeinschaft for its financial support of Ralph Hertwig with Grant HE 2768/6–1 and of Ulrich Hoffrage with Grant Ho 1847/1. We are grateful to Oliver Hardt, Rüdiger Pohl, Lael Schooler, and Anders Winman for many helpful comments, and we thank Callia Piperides and Anita Todd for editing the manuscript.
To what extent is Huntford’s, and for that matter Theroux’s, view of Scott tainted by the knowledge of the expedition’s
tragic end? In a new book, Susan Solomon (2001), senior scientist at the US National Oceanic and Atmospheric
Administration, has analysed meteorological data of the last 17 years from weather stations in Antarctica and compared them
with weather information from the diaries and letters of the men on the Scott expedition. Based on these data she argued that
an extremely rare spell of dramatic cold was the deciding factor in Scott’s fatal expedition. Contradicting Huntford’s
judgement, she concluded that Scott and his crew “planned meticulously, and they were undone by an act of nature… It
would have been a perfectly workable plan in a normal year” (The New York Times, Science section, 28 August 2001).
What was Scott—a neglectful bungler or a meticulous planner? Although we may never learn the truth, psychological
research can at the least elucidate how such drastically different views of Scott’s personality could have emerged. Clearly,
book authors writing on the same events have strategic incentives to overemphasise their differences in opinion. Hindsight
bias research, however, suggests another key factor that may have contributed to Huntford’s and Solomon’s diverging views.
For the sake of argument, let us assume that Huntford’s judgement is tainted more than Solomon’s by the benefit of knowing
the fatal outcome of the expedition. How could that be? One possibility is that Solomon’s edge in knowledge, that is, her
knowing about the exceptionally icy temperatures, was instrumental in protecting her from exaggerating what Scott could have
anticipated in foresight. In other words, being more knowledgeable may in fact have guarded her from concluding that the
expedition’s fatal outcome was inevitable and foreseeable. Is there reason to believe that a person’s amount of knowledge is
related to the hindsight bias? The brief answer is yes. As we describe in the next section, an extensive meta-analysis
concluded that the more experienced a person is with the task under consideration, the smaller is the effect of the hindsight
bias.

MODERATOR VARIABLES OF HINDSIGHT BIAS


Hindsight bias is one of the most frequently cited cognitive biases (Christensen-Szalanski & Beach, 1984). Not surprisingly,
it is also one of the most researched. In a meta-analysis of hindsight bias research, Christensen-Szalanski and Willham (1991)
analysed a total of 128 studies to identify important moderator variables of the phenomenon. They focused on two variables—
one is the question of whether or not hindsight bias is more pronounced when people are told that an event occurred versus
did not occur. The second variable is the effect of what the authors referred to as people’s “familiarity”, “expertise”, and
“experience” with the task. In what follows, we focus on this latter moderator variable (see Hertwig, Gigerenzer, & Hoffrage,
1997, for an account of the first variable).
To examine the impact of people’s experience, Christensen-Szalanski and Willham (1991) coded each study within a large
set of studies on hindsight bias as either “familiar” or “unfamiliar”. For example, Arkes, Wortmann, Saville, and Harkness’s
(1981) study was coded as familiar because their participants were experts in the field from which questions were sampled.
Specifically, Arkes et al. asked physicians to assign, to each of four possible diagnoses entertained in a medical case history,
the probability they thought they would have assigned. One group of physicians made their estimates without knowing the
outcome (i.e., the actual disease), whereas all others arrived at their estimates after having been told which of the four possible
diagnoses was correct. In their pool of studies, Christensen-Szalanski and Willham (1991) coded about half of the studies
as “familiar” and the other half as “unfamiliar”. Did the size of hindsight bias differ between these two sets? It did. Christensen-
Szalanski and Willham (1991) reported that “the more familiar the subject is with the task, the smaller the effect of the
hindsight bias” (p. 155). This effect is of medium to large size (r=0.42 when corrected for sampling error). Thus, Christensen-
Szalanski and Willham’s meta-analysis established experience to be a key moderator of hindsight bias. By identifying this
effect, henceforth the expertise effect, their meta-analysis has provided hindsight bias research with a key empirical
benchmark against which the explanatory power of models of hindsight bias can be evaluated.

EXPERT PERFORMANCE
How do experts and novices differ, and can these differences help to explain why novices are more disposed to the hindsight
bias? The prolific research on expert performance (e.g., Chase & Simon, 1973; Chi, Feltovich, & Glaser, 1981; de Groot,
1946/1965; Larkin, McDermott, Simon, & Simon, 1980) has revealed a number of differences in the way experts and novices
go about solving problems; some of those are immediately relevant to hindsight bias. First and foremost, knowledge emerges
as an “essential prerequisite to expert skill”, and “the extent of the knowledge an expert must be able to call upon is
demonstrably large” (Larkin et al., 1980, p. 1342). How large an expert’s repertoire of knowledge can be is illustrated in
Simon and Barenfeld’s (1969) classic study of master chess players. According to their estimates, a player who has spent at least several years seriously occupied with the game can be expected to have acquired a “vocabulary” of some 10,000 to 100,000 familiar subpatterns.
However, experts differ from novices in more than just their sheer amount of knowledge. The famous German physicist
Werner Heisenberg (1971) suggested that an expert is someone who knows some of the worst mistakes that can be made in
his subject, and how to avoid them. One interpretation of Heisenberg’s portrayal of expert performance is that expert
knowledge may also be more veridical than the knowledge of novices.
It makes intuitive sense to call upon differences in the quantity and quality of knowledge to explain expert-novice
differences. These presumed differences in knowledge, however, present what Ericsson and Staszewski (1989) called a
“thorny” problem: How do experts process an enormous amount of information, given that they are subject to the same or similar
elementary information-processing limits as novices? Shanteau (1992), a prominent scholar of expert decision making,
asserted that “experts should use all relevant information” (p. 253, emphasis added), defying those limits. By suggesting that
experts retrieve all the information available (either from internal or external memories) and combine various aspects into a
single judgement, he thus depicts expert decision making as akin to rational decision making. Specifically, he echoed two
commandments that are often taken as characteristics of rational choices and judgements, namely, “complete search” and
“compensation” (see Gigerenzer & Goldstein, 1999). The former prescribes, “thou shalt find all the information available”,
while the latter says, “thou shalt combine all pieces of information” (i.e., not rely on just one piece).
To conclude, expert and novice performance has been demonstrated or suggested to differ on multiple dimensions, among
them the amount, the accuracy, and the processing of knowledge. In principle, both the combination of these dimensions as
well as each one individually may be able to account for the expertise effect observed by Christensen-Szalanski and Willham
(1991). In what follows, we investigate how each of these dimensions affects hindsight bias. In our investigation, we employ
the RAFT model (Reconstruction After Feedback with Take The Best). This recent process model of the cognitive processes
underlying the hindsight bias (Hoffrage & Hertwig, 1999; Hoffrage et al., 2000) affords us the opportunity to map the three
dimensions of expertise considered here— amount, accuracy, and processing of foresight knowledge—into a single theoretical
framework, and then to analyse their impact on hindsight bias.

THE RAFT MODEL


Before introducing the RAFT model, we first distinguish two different types of research designs employed in hindsight bias
research. The hypothetical design approximates the situation of historians (such as Huntford) who typically evaluate an event
(e.g., Scott’s fatal expedition) in hindsight without having given an assessment prior to its occurrence. Specifically, this
design compares two groups of participants: One group has no outcome information, and the other has such information but is
asked to ignore it (e.g., the physicians in Arkes et al.’s, 1981, study). Finally, a comparison is made between the judgements of
both groups. In contrast, the memory design approximates everyday situations in which individuals (e.g., weather forecasters,
political pollsters) predict an event, learn about the actual outcome, and then eventually remember their previous judgement.
Because the RAFT model was primarily designed to model hindsight bias judgements in the context of the memory design,
we henceforth focus on this design.
The RAFT model is based on the theory of probabilistic mental models (Gigerenzer, Hoffrage, & Kleinbölting, 1991). This
theory models the cognitive processes in two-alternative-choice tasks, in which people are required to make inferences about
which of two objects, a or b, has a higher value on some quantitative dimension (henceforth, original choice). The RAFT
model applies this theoretical framework to a repeated measurement context in which a previous choice (made at Time 1)
needs to be recalled (at Time 3) after receiving feedback (at Time 2) on the correct choice (Hoffrage et al., 2000).
The RAFT model makes three assumptions about this recollection process (at Time 3): First, if the original choice (made at
Time 1) cannot be retrieved from memory, it will be reconstructed by rejudging the problem. Second, the reconstruction
involves the attempt to recall the knowledge on which the original choice was based. Third, the outcome information received
(at Time 2) is used to update old knowledge, in particular knowledge that was elusive and missing at Time 1. In conjunction,
these assumptions suffice to explain the occurrence of hindsight bias. Thus, the RAFT model suggests that outcome
information does not directly affect the memory trace for the original choice but exerts its impact indirectly by updating
knowledge that is used to reconstruct the original choice (in the context of hindsight bias, the notion of reconstruction has
been proposed by, for instance, Hawkins & Hastie, 1990; Stahlberg & Maass, 1998; see also Schwarz & Stahlberg, 2003-this
issue).
We now specify in detail the cognitive processes underlying the original choice at Time 1 and the recalled choice at Time
3.

Time 1:
Original choice
An anecdote helps to illustrate the proposed processes: A couple of months before the 2000 US presidential election, two
German colleagues of ours, Peter and Michael, had bet on its outcome. While Peter deemed Al Gore to be the likely winner,
Michael was convinced that George W. Bush would win. As we all know, the election was not settled until five weeks after
election day when the US Supreme Court’s intervention finally brought the contest to an end. This unusually long delay may
have contributed to the fact that, very much to Michael’s chagrin, Peter plainly forgot about their wager and when reminded,
recalled having picked Bush rather than Gore as the likely winner (for a study of hindsight bias in the context of political events,
see, for example, Blank, Fischer, & Erdfelder, 2003-this issue; Synodinos, 1986; Wendt, 1993).
How would RAFT account for Peter’s retrospective belief that he had picked Bush rather than Gore after he learned that
Bush won the election? The first step in the model is to account for the original choice: Not knowing who would win, Peter
initially tried to infer the more likely winner from what he knew about the two candidates. According to the RAFT model,
Peter constructed a probabilistic mental model to make the inference. Such a model connects the specific structure of the task
with a probability structure of a corresponding natural environment (stored in long-term memory) and consists of knowledge
in terms of a reference class, probability cues, and the cue values of the objects on the cues. Before we describe this
knowledge in more detail, let us stress that henceforth we use the terms “knowledge” and “know” in a rather narrow sense,
namely, to refer to the cue values a person has stored in long-term memory (regardless of whether or not those cue values are
accurate).
Knowledge of cues and cue values. In Peter’s case, the reference class might be some set of previous presidential elections
(e.g., all elections since 1948) with the competing candidates in those races as objects that compose the reference class. Each
candidate can be described on a number of cues related to the criterion “outcome of the election”. Cues are variables that
covary with the criterion, thus allowing a person to use them as predictors for the criterion variable. Cue values are the values
of the objects on the cues. In the case of dichotomous cues, the cue values are “positive” and “negative”.
Which cues may come to mind when one attempts to forecast the outcome of political elections? According to the common
wisdom of political forecasters (e.g., Lichtman, 1996), the outcome of the presidential election can be inferred on the basis of
predictors such as the re-election, the incumbent party, and the economic prosperity cues: The first cue refers to the
observation that if the President is running for re-election, he (or she in the future) may have a head start (e.g., because of
being well known to voters), and so may the candidate of the incumbent party. The third cue refers to the observation that the
party that promises to better maintain prosperity in the future has an advantage (Campbell & Mann, 1996). Moreover, it
appears that personal features of the candidates can also be predictive of their success or lack thereof. For instance, the
candidate who is charismatic (e.g., Kennedy), a national hero (e.g., Eisenhower), or plainly taller (than the opponent) has been
suggested to have better chances of winning.
Clearly, some of these cues are better predictors than others. A cue’s predictive ability is captured in the notion of
ecological validity, which is defined as the relative frequency with which the cue correctly predicts which object (here
candidate) scores higher on the criterion (in a defined reference class). The re-election cue, for instance, has an
ecological validity of 75% (assuming a reference class that consists of the presidential races between 1948 and 1996). In
contrast, the incumbent party cue only has a modest validity of 54% (again considering the races between 1948 and 1996).
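This definition can also be written compactly. The following is only a sketch, using the convention of Gigerenzer and Goldstein (1996) of counting just those pairs in the reference class for which the cue discriminates:

\[
v_c = \frac{R_c}{R_c + W_c},
\]

where \(R_c\) is the number of pairs for which cue \(c\) discriminates and points to the object with the higher criterion value, and \(W_c\) is the number of pairs for which it discriminates but points to the object with the lower criterion value.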
Inference mechanism. Let us assume that Peter’s probabilistic mental model includes four cues, with the economic
prosperity cue ranked highest, followed by the charisma, re-election, and incumbent party cues. The RAFT model accounts for
Peter’s inference with a processing strategy called the “Take The Best” heuristic. This lexicographic strategy assumes a
subjective rank order of cues according to their validities and makes the inference on the basis of the best (i.e., most valid) cue
that discriminates. The three building blocks of the heuristic (excluding the recognition principle, which is not relevant here;
see Gigerenzer & Goldstein, 1996) are:

• Search: Choose the cue with the highest validity and retrieve the object’s cue values from memory.
• Stop: If the best cue discriminates, stop searching. The cue is said to discriminate between two objects if one has a positive
cue value and the other does not. If the best cue does not discriminate, continue with the next best cue until a cue that
discriminates is found.
• Decide: Choose the object to which the cue points, that is, the object with the positive cue value (if criterion and cues are negatively correlated, then choose the object with the negative cue value). If no cue discriminates, then choose randomly.

Note that for the purpose of illustration, we treat all cues in the Gore-Bush example as binary cues although some, such as the charisma cue, may be continuous (for the treatment of continuous cues within the RAFT framework, see Hoffrage et al., 2000).
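To make the three building blocks concrete, here is a minimal Python sketch (our illustration, not the authors’ implementation; the encoding of cue values as "+", "-", and "?" and the example knowledge state are assumptions, and cues are assumed to be positively correlated with the criterion):

import random

def take_the_best(cues_by_validity, knowledge, a, b, rng=random):
    """Return (chosen object, deciding cue); the deciding cue is None for a guess.

    cues_by_validity: cue names ordered from highest to lowest validity.
    knowledge: dict mapping object -> dict mapping cue -> "+", "-", or "?".
    """
    for cue in cues_by_validity:                       # Search: best cue first
        value_a = knowledge[a].get(cue, "?")
        value_b = knowledge[b].get(cue, "?")
        # Stop: the cue discriminates if exactly one object has a positive value
        if (value_a == "+") != (value_b == "+"):
            return (a if value_a == "+" else b), cue   # Decide: follow that cue
    return rng.choice([a, b]), None                    # No cue discriminates: guess

# Peter's foresight knowledge (illustrative encoding of the Time 1 columns of Table 1)
cues = ["economic prosperity", "charisma", "re-election", "incumbent party"]
knowledge_t1 = {
    "Gore": {"economic prosperity": "?", "charisma": "-",
             "re-election": "-", "incumbent party": "+"},
    "Bush": {"economic prosperity": "?", "charisma": "?",
             "re-election": "-", "incumbent party": "-"},
}
print(take_the_best(cues, knowledge_t1, "Gore", "Bush"))  # ('Gore', 'incumbent party')

Applied to Peter’s foresight knowledge, this sketch reproduces the Time 1 choice described below: the first three cues either cannot be looked up or do not discriminate, and the incumbent party cue decides in favour of Gore.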

To illustrate the heuristic’s policy, Table 1 shows the Take The Best heuristic applied to Peter’s knowledge about the
candidates. At Time 1, Peter does not know the values for the highest-ranked cue, the economic prosperity cue, and thus Take
The Best cannot use it. In addition, the second-ranked cue, charisma, does not discriminate between the candidates either:
Although Peter knows that Gore is widely considered to lack charisma (“Gore the Bore”), he does not know how Bush scores
on this cue dimension. Since neither of the candidates is the sitting president seeking re-election, Peter’s third-ranked cue
does not discriminate either. Therefore, Take The Best determines the choice on the basis of the fourth most valid cue, the only one that discriminates between the candidates. Because Gore is the candidate of the incumbent party, the heuristic picks him as the candidate with the better chance of winning.

TABLE 1 Hindsight bias in the presidential election inference

                         Time 1              Feedback      Time 3
Cues                     Gore      Bush      “Bush”        Gore      Bush
Economic prosperity      ?         ?                       ?         ?
Charisma                 −         ?         →             −         +
Re-election              −         −                       −         −
Incumbent party          +         −                       +         −
Response                 “Gore”                            “Bush”

The probabilistic mental model contains four cues ranked according to their (assumed) validity. Cue values are positive (+) or negative (−); missing knowledge is indicated by question marks (?). To predict the winner of the election, the Take The Best heuristic looks up only the cue values in the shaded areas. The final decision is determined solely on the basis of the cue values in the lowest shaded cell. At Time 3, the cue value of the charisma cue for Bush shifts towards feedback, that is, from “?” to “+”. As a consequence, this cue now discriminates and points to Bush—hindsight bias occurs.

Time 3:
Reconstruction
Why does Peter misremember his original choice? One necessary reason for the occurrence of Peter’s hindsight bias is that he
is not able to retrieve his original choice from memory directly. According to the RAFT model, if the original response
cannot be retrieved, it will be reconstructed by reiterating the steps taken at Time 1.
The reconstruction process begins by retrieving the knowledge on which the choice at Time 1 was based, that is, by
retrieving the original cues and their values. In some cases, veridical retrieval may be possible; in others, memory of the cue
values may be elusive or missing—either because the knowledge retrieval from long-term memory is not completely reliable
or because knowledge was elusive or missing at Time 1. The RAFT model’s critical assumption is that outcome knowledge (e.g.,
Bush won the presidential race) transforms some of the elusive and missing cue values into positive or negative values, thus
possibly turning non-discriminatory cues into discriminatory ones. This is due to the reversibility of the cue-criterion
relationship: Because it is possible to draw inferences from a cue (e.g., height) to the criterion, the reverse is also possible—to
draw inferences from the criterion to the cues. Thus, what used to be the distal variable (i.e., outcome of the election) at Time
1 now turns into a proximal cue and is used to infer what used to be a proximal cue at Time 1 (e.g., charisma) and what turns
into a distal variable at Time 3. Such a reversal between proximal cues and distal variable is possible because cues and
criterion are correlated with each other.
To illustrate this, let us consider Peter’s updated probabilistic mental model at Time 3. After the 12 December ruling of the US
Supreme Court, Peter attempts to reconstruct his original choice. RAFT assumes that the new information concerning the de
facto winner affords the mind inferences about some of the cue values that were unknown at Time 1. That is, not all initial cue
values may be veridically remembered but some will have taken on values consistent with the newly acquired outcome
information. As Table 1 shows, in Peter’s updated mental model, Bush’s value for charisma is now seen as being “positive”.
Consequently, this cue now discriminates and points to Bush as the likely winner. If the same heuristic (here, Take The Best)
is applied to this updated knowledge base, the reconstructed choice will be consistent with the outcome information. In other
words, Peter remembers having originally deemed Bush to be the winner, thus exhibiting hindsight bias.1
It is important to note that within the RAFT model, updating cue values is thought to be a probabilistic process in which
some, but not all, missing cue values are updated. As a consequence, the RAFT model cannot predict which of the elusive and
missing cue values will be updated. It can, however, use the updated cue values to predict which recollections of the original
judgement will exhibit hindsight bias. In addition, RAFT assumes that knowledge retrieval from long-term memory is not
perfectly reliable. In other words, it also posits random deviations between the cue values at Time 1 and Time 3. Because such
alterations are independent of outcome knowledge, they can explain why, for a given item, hindsight bias (if they coincide with
the direction of the actual outcome), reversed hindsight bias (if they are counter to the direction of the actual outcome), or no
hindsight bias occurs.2
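The updating assumption just described can be illustrated with a short sketch (ours, with illustrative names; as in the simulations reported later, known cue values are retrieved veridically here, and only unknown values are updated, each with probability p):

import random

def update_with_feedback(knowledge, winner, loser, p=0.10, rng=random):
    """Fill in unknown ("?") cue values towards the outcome with probability p:
    positive for the reported winner, negative for the loser. Known values are
    copied unchanged (veridical retrieval)."""
    updated = {obj: dict(values) for obj, values in knowledge.items()}
    for obj, outcome_value in ((winner, "+"), (loser, "-")):
        for cue, value in updated[obj].items():
            if value == "?" and rng.random() < p:
                updated[obj][cue] = outcome_value   # cue value shifts towards feedback
    return updated

# Peter's foresight knowledge (cf. Table 1) and the feedback "Bush won the race"
foresight = {
    "Gore": {"prosperity": "?", "charisma": "-", "re-election": "-", "incumbent party": "+"},
    "Bush": {"prosperity": "?", "charisma": "?", "re-election": "-", "incumbent party": "-"},
}
hindsight = update_with_feedback(foresight, winner="Bush", loser="Gore", p=0.5)
# If Bush's charisma value happens to be filled in as "+", the charisma cue now
# discriminates, and re-applying Take The Best reconstructs "Bush": hindsight bias.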
The RAFT model can be summarised as follows: If the original choice cannot be retrieved from memory, an attempt is
made to reconstruct the probabilistic mental model that led to it. An identical reconstruction requires the type of inference strategy
for the original choice and its reconstruction to be the same, and the input into the strategy also to remain the same. Any
violation of these requirements may lead to differences between the original and the reconstructed choice. The RAFT model
assumes that feedback changes the input (i.e., the cue values) into the inferential strategy but does not exclude the possibility
that other requirements may also be violated (and there are indeed such accounts of hindsight bias, e.g., Hawkins & Hastie,
1990).
Let us conclude this introduction of the RAFT model with a caveat. The model focuses on the context of choices (here
between two alternatives, but it can, in principle, be applied to choices among multiple alternatives). Choices are in fact the
paradigmatic context of economic theory and, according to Kahneman and Tversky (1984), “making decisions [i.e., risky and
riskless choices] is like speaking prose—people do it all the time, knowingly or unknowingly” (p. 341). The context of
choices, however, is not the only context in which hindsight bias has been observed. Others are the estimation of quantities and
confidence judgements. For this reason, we need to caution that the scope of the RAFT model is limited to one task, albeit a
ubiquitous one. Thus, the RAFT model does not exclude other accounts of hindsight bias such as the SARA model that has
been designed to account for hindsight bias in numerical estimates (Pohl, Eisenhauer, & Hardt, 2003-this issue).
Before we describe how the RAFT model was implemented in the present computer simulation, we briefly review some of
the available empirical evidence that can be marshalled in support of it.

THE RAFT MODEL: EMPIRICAL SUPPORT


How do people make choices between two objects based on a bundle of imperfect cues? The Take The Best heuristic
embodies the bold possibility that only a single imperfect cue will be used to make such a choice, thus minimising both the
information-searching costs (e.g., in terms of time) and the computational costs. This policy is what Gigerenzer, Todd, and the
ABC Research Group (1999b) have called one-reason decision making. One advantage of this decision-making policy is that
it avoids conflicts between those cues that point in opposite directions. Avoiding such conflicts makes the Take The Best
heuristic non-compensatory, which means that a cue supporting alternative A cannot be outweighed by any combination of
less important cues, even if they all support alternative B.
Do people employ the Take The Best heuristic? There have been several independent investigations into the descriptive
validity of the Take The Best heuristic and variants thereof (e.g., Bröder, 2000; Rieskamp & Hoffrage, 1999; Slegers, Brake,
& Doherty, 2000). Recently, Hertwig and Hoffrage (2001) reviewed this set of published studies. While acknowledging that
conditions exist under which the majority of people could not be classified as using the Take The Best heuristic (see Bröder,
2000, Study 1), it seems fair to conclude that two key conditions of decision making—time pressure and the imposition of costs
on information search and use—favour the use of non-compensatory strategies such as the Take The Best heuristic. For
instance, Bröder (2000, Study 4) showed that when participants had to search for costly information, 65% of them were
classified as using the Take The Best heuristic. In contrast, less than 10% could be classified as using a simple linear decision
strategy (with unit weights).
Another core assumption of the RAFT model is that outcome knowledge (e.g., Bush won the presidential race) transforms
some of the elusive and missing cue values into positive or negative values, thus possibly turning non-discriminatory cues into
discriminatory ones. Does this updating actually occur? It does. As can be seen in Figure 1, Hoffrage et al. (2000) observed that
after feedback on the correct alternative (at Time 2), more cue values (object relations; see figure legend) shifted towards the
correct alternative than away from it (at Time 3). In contrast, cue values in the control condition (i.e., without feedback)
shifted equally often towards and away from it. This finding was obtained in two independent studies— one that used binary
cues (Study 1) and one that used continuous cues (Study 2).
The RAFT model accounts for the observed outcomes at Time 3 (hindsight bias, reversed hindsight bias, or veridical recall)
on the basis of the cue values at Time 3. Do the cue values at Time 3 in fact determine the observed outcomes? Hoffrage et al.
(2000, see their Figure 5) found that in 83.5% (Study 1) and 69.5% (Study 2) of the cases, the outcomes predicted by the RAFT model matched those actually observed (for various statistical tests of the performance of the RAFT model, see Hoffrage et al., 2000, pp. 572–577).

Figure 1. Percentage of shifts of object relations towards and away from the correct alternative in the feedback and no-feedback conditions in two studies (adapted from Hoffrage et al., 2000). The term object relations refers to the relation of objects with respect to a cue. This relation can be larger, smaller, equal, or unknown, and refers both to continuous cues (Study 1) and binary cues (Study 2).

1 While we use the Gore-Bush competition simply as an illustration, it does highlight an issue that, to the best of our knowledge, has hardly been addressed. Hindsight bias research typically assumes that the outcome of an event is unambiguous. To the chagrin of both presidential candidates, however, the outcome of the 2000 presidential election was ambiguous. In light of the fact that Gore won the popular vote (though did not reach the majority in the electoral college), some people (including a reviewer of this paper) argued that Gore was the “winner”. Be this as it may, the outcome of the 2000 presidential election demonstrates that, in real life, outcomes can be ambiguous or at least may be perceived as such. If so, one may speculate that the benefit of knowing the outcome may lose some of its alluring impact.

2 Explaining item-specific reversed hindsight bias through unsystematic changes in cue values does not exclude other accounts of the hindsight bias reversals, such as the surprise account proposed by Mazursky and Ofir (1990), and Ofir and Mazursky (1997) (for another account that treats feelings as information, see Werth & Strack, 2003-this issue). It is interesting to note that RAFT’s core assumption of updating could also play a key role in Pezzo’s (2003-this issue) sense-making model which builds on the very notion of surprise. His model assumes that in cases in which outcome information is incongruent with prior expectations, a sense-making process will be activated. Specifically, Pezzo predicts that if sense making succeeds, no “resultant surprise” is experienced, and hindsight bias will occur. On the assumption that sense making is more likely to succeed if updating has occurred, RAFT provides a cognitive rationale for why Pezzo’s account predicts hindsight bias in this case.
In sum, there are several empirical results that are consistent with the RAFT model and its building blocks such as the Take
The Best heuristic and the assumption of knowledge updating. Can we add to this collection of results by demonstrating that
the RAFT model can also account for Christensen-Szalanski and Willham’s (1991) expertise effect? In what follows, we
describe our investigation into this question in detail.

THE RAFT MODEL: IMPLEMENTATION


Using computer simulations, we investigated possible determinants of the expertise effect. Specifically, we examined the
impact of the amount of foresight knowledge (i.e., how much does a person know at Time 1), knowledge accuracy (how
accurate or inaccurate is a person’s knowledge at Time 1), and knowledge processing (how is a person’s knowledge processed
at Times 1 and 3) on hindsight bias.

Environment
We conducted the simulations using a real-world environment, namely, German cities. In this environment, simulated
“individuals” first (Time 1) answer real-world questions such as, “Which city has more inhabitants: (a) Essen or (b)
Bremen?”. Then, (Time 2) they learn the correct answer (e.g., “Essen”). Finally, in the attempt to reconstruct the original
answers, they rejudge the same questions at Time 3. The city environment consists of the set of German cities with more than
100,000 inhabitants (excluding Berlin, 82 cities), with population size as the criterion variable. The environment includes
eight binary ecological cues (see Table 3) and the actual 8×82 positive and negative values of the objects (cities) on the cues.
The cues include predictors such as the soccer-team cue (“Does the city have a team in the major league?”) and the state-
capital cue (“Is the city a state capital?”). The complete environment (e.g., cues and cue values) is shown in Gigerenzer and
Goldstein (1996). Next, we describe the parameters that we systematically varied in the simulations (see Table 2).

Knowledge: Amount, accuracy, and updating


In the present simulation, the amount of knowledge was simply the percentage of cue values a “person” knows. Within the
German city size environment, perfect knowledge means knowing all 656 cue values (i.e., the values of 82 cities on eight
cues). Individuals with incomplete knowledge have only a portion of the total set of cue values at their disposal. To avoid
selecting implausible knowledge parameters, we reanalysed a previous study in which we had asked 19 students (at the
University of Munich) to recall their values on each of the eight cues for each of the 82 cities. On average, participants
recalled 89% of all cue values (SD=10%). Amongst participants, the amount of knowledge ranged from 70 to 100% of all cue
values. Informed by this analysis, we implemented different amounts of knowledge, ranging from 30 to 100% (see Table 2),
thus extending the range beyond the (empirical) lower bound to examine how a small amount of knowledge affects hindsight
bias.
In addition to varying the sheer amount, we also varied knowledge accuracy. Knowledge accuracy is simply the percentage
of cue values (among the known ones) that are correct. Perfectly accurate knowledge means that every cue value a “person”
has stored is correct. Individuals with less than perfectly accurate knowledge need to rely on (some) cue values that are false.
In the reanalysis of the Munich student sample, we also found that, on average, 86% of the known cue
values were accurate (SD=6%) and that the proportion of false cue values ranged between 5 and 25%. Informed by these values, we implemented degrees of false knowledge (of cue values) ranging from 0 to 35% (see Table 2).

TABLE 2 Parameters investigated in the simulations

Amount of knowledge
  Description: Percentage of cue values stored in long-term memory.
  Implementation: Which of the cue values were designated to be unknown was randomly determined across all cue values.
  Range: Complete knowledge: 100% (656 values); incomplete knowledge: 30% (167 values), 40%, 50%, …, 90%.

Accuracy of knowledge
  Description: Percentage of false cue values among the known cue values.
  Implementation: Positive and negative cue values were randomly replaced by the opposite cue values.
  Range: Completely accurate knowledge: 0% false cue values; partly false knowledge: 5%, 10%, 15%, …, 35%.

Updating probability of cue values
  Description: Probability with which unknown cue values are updated in accordance with outcome information.
  Implementation: Unknown cue values were randomly replaced with positive or negative cue values (in accordance with outcome information).
  Range: Updating probability: 0.05, 0.1 (default value), 0.15, 0.2.
Finally, according to the RAFT model, elusive and missing cue values can be updated by feedback. For the sake of
simplicity, we assumed that cue values (i.e., positive, negative, and unknown) are veridically retrieved from long-term
memory (thus ignoring the fact that cue values at Time 1 and 3 may differ simply because memory retrieval is typically not
completely reliable), and that updating would only occur (with some probability) if cue values were unknown in long-term
memory. The updating probability for unknown cue values was set to range from 5 to 20% (Table 2). It is thus within close
range of the rate that we observed in Hoffrage et al.’s (2000) Study 2.
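As an illustration of how these parameters might be turned into a simulated “individual”, here is a sketch (ours, with hypothetical names; the actual implementation may have drawn exact proportions rather than making an independent random draw per cue value):

import random

def generate_knowledge(environment, amount=0.9, accuracy=0.86, rng=random):
    """environment: object -> cue -> True/False (the actual cue values).
    Returns object -> cue -> "+", "-", or "?", with roughly `amount` of the
    values known and roughly `accuracy` of the known values correct."""
    knowledge = {}
    for obj, cue_values in environment.items():
        knowledge[obj] = {}
        for cue, true_value in cue_values.items():
            if rng.random() > amount:
                knowledge[obj][cue] = "?"                      # value not stored
            else:
                value = true_value if rng.random() < accuracy else not true_value
                knowledge[obj][cue] = "+" if value else "-"    # possibly flipped
    return knowledge

# Tiny illustrative environment; the real one has 82 cities x 8 cues = 656 values
environment = {"Essen":  {"soccer team": False, "state capital": False},
               "Bremen": {"soccer team": True,  "state capital": True}}
print(generate_knowledge(environment, amount=0.5, accuracy=1.0))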
Across all simulations, we examined 8 (amount of knowledge: 30%, 40%,…, 90%, or 100% of the cue values were known)
× 8 (knowledge accuracy: 0%, 5%, …, 30%, or 35% of known cue values were false) combinations. Within each of the 64
combinations, we simulated 100 “individuals”, who differed randomly from one another in the particular cue values that were
false or missing. Similarly, for each individual (and for each of 100 runs of each individual) we randomly determined which of
the missing cue values would be updated as well as the set of 41 city pairs (out of the 82 cities) to which an “individual”
responded. The results were averaged across individuals and runs.
Let us also highlight that we did not predetermine the same hierarchy of cues for all simulated individuals. Rather, we
calculated cue validities on the basis of the existing knowledge of cue values for each individual (at Time 1). Thus, the cue
order of a person who knows, say, 50% of all cue values could be quite different from the cue order of another person with
more or less knowledge and even different from the cue order of another person with the same amount of knowledge
(depending on the distribution of known and unknown cue values).
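A person-specific cue validity of this kind could, for instance, be computed as sketched below (our illustration, not the authors’ code; only pairs that the cue discriminates in the individual’s own knowledge are counted, and the numbers in the toy example are illustrative):

from itertools import combinations

def cue_validity(cue, knowledge, criterion):
    """Validity of `cue` given an individual's knowledge, counting only pairs
    that the cue discriminates. knowledge: object -> cue -> "+", "-", or "?";
    criterion: object -> criterion value (here, population size)."""
    right = wrong = 0
    for a, b in combinations(knowledge, 2):
        value_a, value_b = knowledge[a][cue], knowledge[b][cue]
        if (value_a == "+") == (value_b == "+"):
            continue                                   # cue does not discriminate
        predicted = a if value_a == "+" else b
        larger = a if criterion[a] > criterion[b] else b
        right += predicted == larger
        wrong += predicted != larger
    return right / (right + wrong) if (right + wrong) else None

# Toy example: the cue discriminates the single pair but points to the smaller
# city, so its validity for this (single-pair) knowledge base is 0.0
knowledge = {"Essen": {"soccer team": "-"}, "Bremen": {"soccer team": "+"}}
print(cue_validity("soccer team", knowledge, {"Essen": 590_000, "Bremen": 550_000}))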
In Simulation 1, we assumed knowledge to be completely accurate and we kept the updating probability constant (i.e.,
default value of 0.10; see Table 2). Here we examined how an increasing amount of cue-value knowledge affects hindsight
bias. In Simulation 2, we replicated the knowledge simulation but explored the impact of an alternative inference mechanism
on the hindsight bias. In Simulation 3, we analysed how an increasing proportion of false knowledge affects hindsight bias.
The general policy we followed throughout the simulations was to vary the specific parameter under consideration, while
keeping all others constant or averaging across them (i.e., ceteris paribus policy). Although this procedure restricted our
ability to explore intricate interactions among parameters, it helped us to focus on the main effects, thus increasing the results’
transparency and comprehensibility.
In empirical studies, hindsight bias occurs when the recalled choices are more accurate than the original choices. To control
for other factors that may also affect the accuracy of the inferences at Time 3, the observed increase in accuracy is typically
appraised against changes in accuracy in a control condition in which no outcome information was provided, and in which
typically no systematic differences occur. Similarly, in our simulations, we expected no systematic differences in the control
conditions to occur, and, in fact, we found none. Therefore, we could simplify the hindsight bias measure: Specifically, we
computed the difference between the average percentages of correct inferences at Time 3 (henceforth, hindsight accuracy)
and Time 1 (henceforth, foresight accuracy). To control for different levels of foresight accuracy (thus avoiding the problem
of a ceiling effect), hindsight bias was expressed as the ratio of this difference and the maximum difference between hindsight
and foresight accuracy. Specifically, hindsight bias equals 100* (hindsight accuracy−foresight accuracy)/(100−foresight
accuracy).
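Written as a formula (with an illustrative worked number added here):

\[
\text{hindsight bias} = 100 \times \frac{\text{hindsight accuracy} - \text{foresight accuracy}}{100 - \text{foresight accuracy}}.
\]

For example, a simulated individual with 60% correct inferences at Time 1 and 70% at Time 3 receives a hindsight bias score of 100 × (70 − 60)/(100 − 60) = 25%.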
Figure 2. Foresight and hindsight accuracy (i.e., amount of correct inferences at Time 1 and Time 3) achieved by Take The Best, and
hindsight bias as a function of amount of knowledge.
SIMULATION 1:
HOW DOES AMOUNT OF KNOWLEDGE AFFECT HINDSIGHT BIAS?
In their meta-analysis of hindsight bias studies, Christensen-Szalanski and Willham (1991) observed that more expertise in
the domain from which the tasks were sampled yielded less hindsight bias. In the first simulation, we operationalised
expertise in terms of the amount of knowledge about cue values at Time 1 (all of which are correct). Does access to more cue
values guard against hindsight bias? Figure 2 depicts foresight accuracy, hindsight accuracy, and hindsight bias as a function
of the amount of foresight knowledge. Clearly, more knowledge reduces hindsight bias. Specifically, the size of hindsight bias
turns out to be a linear function of the amount of knowledge. That is, the more foresight knowledge a person has, the smaller
the bias he or she tends to exhibit. For instance, a person who knows only 30% of all cue values displays a hindsight bias that
is about seven times as large as that of a person who knows 90% of all cue values—34% versus 5%.
Which mechanism underlies the compelling relationship between cue-value knowledge and amount of hindsight bias? To
explore this question, let us focus on two of the eight degrees of knowledge (at Time 1), namely, scant knowledge and ample
knowledge (i.e., 30% versus 90% of cue values are known). In addition, let us introduce two new concepts, namely, frugality
and utilisation rate. The former refers to the average number of cues a heuristic needs to look up before it can arrive at a decision
(e.g., as to which of two objects scores higher on a criterion value); the latter refers to the percentage of choices in the total set
that are determined by a given cue. Both concepts will help to explain why less foresight knowledge yields more hindsight
bias.
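For concreteness, both measures can be computed from simple records of the simulated choices; the sketch below is ours, and the record format (a count of cues looked up plus the deciding cue, with None standing for a guess) is hypothetical:

from collections import Counter

def frugality(records):
    """Average number of cues looked up per choice."""
    return sum(r["cues_looked_up"] for r in records) / len(records)

def utilisation_rates(records):
    """Percentage of choices determined by each cue (None means a guess)."""
    counts = Counter(r["deciding_cue"] or "guess" for r in records)
    return {cue: 100 * n / len(records) for cue, n in counts.items()}

records = [{"cues_looked_up": 3, "deciding_cue": "soccer team"},
           {"cues_looked_up": 8, "deciding_cue": None},
           {"cues_looked_up": 2, "deciding_cue": "state capital"}]
print(frugality(records))          # 4.33...
print(utilisation_rates(records))  # each of the three outcomes at about 33.3%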
According to the RAFT model, outcome knowledge can be used to infer missing cue values. Therefore, some of the cues that
did not discriminate at Time 1 will discriminate at Time 3.3 Consequently, Take The Best becomes more frugal at Time 3,
that is, it needs to look up fewer cues before it arrives at a decision. Of course, such savings in information search will be
larger when the decision maker only has scant knowledge to begin with—with ample knowledge, Take The Best already
tends to be frugal. Indeed, while the average number of cues looked up decreases from 5.7 (Time 1) to 4.6 (Time 3) for scant
knowledge, it remains the same for ample knowledge (3.5 at Time 1 versus 3.5 at Time 3). This fact is illustrated in Figure 3,
which shows the utilisation rate of all eight cues and the “guessing” cue at Time 1 and Time 3, respectively. For scant
knowledge at Time 1, the three highest-ranked cues were used in about 25% of all inferences; at Time 3 they were used in
more than 40% of all inferences. Finally, the guessing rate dropped from about 33% to 20%. In contrast, in the case of ample
knowledge, the cues’ utilisation rates and the guessing rate remained almost unchanged.
The changes in the heuristic’s frugality and the cues’ utilisation rates point to a mechanism that accounts for scant foresight
knowledge yielding a larger hindsight bias: Scant knowledge leaves more room for updating to affect the process of
reconstruction than ample knowledge does. Why? First and foremost, the fewer cue values a person knows at Time 1, the
more cues the Take The Best heuristic has to look up to make an inference (e.g., with scant knowledge 5.7 cues). If the choice
at Time 1 has been made, for instance, by the sixth-ranked cue, then each higher-ranked and updated cue now has a chance to
be the one reason that the Take The Best heuristic uses to determine the choice at Time 3. In addition, less knowledge also
means more missing cue values. Therefore, if a cue is retrieved at Time 3, the likelihood that this cue includes an updated
value increases with less knowledge. Finally, if the Take The Best heuristic uses an updated cue to arrive at the choice, then this choice will necessarily be correct (in terms of the criterion value), because updating cue values is contingent on outcome knowledge. A correct choice at Time 3, in turn, can yield hindsight bias (if the choice at Time 1 was incorrect).

Figure 3. Utilisation rate at Time 1 and 3 for each of the eight cues, rank ordered according to their validity (see Table 3) as a function of amount of knowledge.

3 A cue is said to discriminate between two objects, a and b, if one object has a positive cue value and the other does not (i.e., it either has a negative value or is unknown) for this cue. Cue updating can turn a cue that does not discriminate between objects a and b into one that does discriminate by updating one or both missing values of the objects for the cue.
To conclude, to the extent that the amount of knowledge of cue values reflects expertise, the RAFT model can account for
the expertise effect reported in Christensen-Szalanski and Willham (1991). As Simulation 1 has shown, less foresight
knowledge makes the veridical reconstruction of the original choice less likely. Within the RAFT model, one can delineate a
mechanism that accounts for this effect. Less knowledge leaves more “room” for updating to affect the reconstruction process
at Time 3. This explanation gives rise to an interesting question: Would this effect of foresight knowledge on hindsight bias
also occur if cues were processed in a completely different way? The next simulation examines the question of whether the
effect of foresight knowledge arises specifically from the application of the Take The Best heuristic.

SIMULATION 2:
IS THE EFFECT OF KNOWLEDGE ON HINDSIGHT BIAS ROBUST ACROSS DIFFERENT
HEURISTICS?
How do experts make inferences about uncertain aspects of the world? As mentioned earlier, some researchers (e.g., Shanteau,
1992) have suggested that experts should bring all their relevant knowledge to bear. This idea reflects the widespread
assumption in cognitive psychology that more information yields better performance (see Hertwig & Todd, in press). The
research programme on fast and frugal heuristics (Gigerenzer et al., 1999b) has thoroughly challenged this ubiquitous
assumption and some experts have been shown to rely on just one or a few pieces of information (e.g., Green & Mehr, 1997).
Yet, it may be the case that those experts who attempt to look up all available information and to integrate it into one score
exhibit a different relationship between foresight knowledge and hindsight bias than those who employ a one-reason decision-
making strategy such as the Take The Best heuristic.
There is a plausible reason why this might be the case. If a choice has initially been made on the basis of a set of cues rather
than on one single cue, it may prove to be more robust towards slight changes in the updated knowledge state. The Take The
Best heuristic, in contrast, may amplify the effects of updating since a single updated cue value can lead to the opposite
choice. What strategy integrates multiple pieces of information while still being psychologically plausible? Robyn Dawes (e.g.,
1979) suggested a compensatory strategy that does not overtax the processing capabilities of the human mind. This strategy,
which Gigerenzer et al. (1999b) called Dawes’ rule, is a linear strategy with unit weights that has been advocated as a good
approximation of weighted linear models (Dawes, 1979; Einhorn & Hogarth, 1975). It simply adds up the number of positive
cue values and subtracts the number of negative cue values (ignoring missing values) and thus is very different from the Take
The Best heuristic: While both are fast (i.e., they do not involve much computation), Dawes’ rule is far from being frugal—it
bases its choice on all available pieces of information. Table 3 illustrates the policy of Dawes’ rule. It also demonstrates how
updating does or does not give way to hindsight bias, depending on the inference mechanism employed.
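A minimal sketch of Dawes’ rule (ours, not the authors’ code), applied to the Time 1 knowledge shown in Table 3; resolving ties by guessing is our assumption:

import random

def dawes_rule(knowledge, a, b, rng=random):
    """knowledge: object -> cue -> "+", "-", or "?"; unknown values are ignored."""
    def score(obj):
        values = list(knowledge[obj].values())
        return sum(v == "+" for v in values) - sum(v == "-" for v in values)
    score_a, score_b = score(a), score(b)
    if score_a == score_b:
        return rng.choice([a, b])                      # tie: guess (our assumption)
    return a if score_a > score_b else b

# Time 1 knowledge from Table 3 (Essen versus Bremen)
knowledge_t1 = {
    "Essen":  {"exposition site": "?", "soccer team": "-", "intercity train": "?",
               "state capital": "-", "licence plate": "-", "university": "+",
               "industrial belt": "?", "east Germany": "-"},
    "Bremen": {"exposition site": "-", "soccer team": "?", "intercity train": "+",
               "state capital": "+", "licence plate": "-", "university": "+",
               "industrial belt": "?", "east Germany": "?"},
}
print(dawes_rule(knowledge_t1, "Essen", "Bremen"))     # "Bremen" (scores -3 vs +1)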
Does more knowledge involve less hindsight bias if it is processed by Dawes’ rule? To answer this question, we conducted
a simulation that exactly replicated the previous one with the exception that the Take The Best heuristic was replaced by
Dawes’ rule. Before attending to hindsight bias, let us first consider the accuracy of Dawes’ rule prior to updating. On
average (i.e., across all degrees of knowledge of cue values), Dawes’ rule made 69.3% correct inferences at Time 1. The
surprising observation is that Take The Best matched this performance: It scored 69.7% correct inferences although it used only
about 44% of the information available to Dawes’ rule (for similar results see Gigerenzer, Czerlinski, & Martignon, 1999a).
Despite being much more frugal than Dawes’ rule, Take The Best does not pay a price in lost accuracy.

TABLE 3
Choices at Time 1 and 3 as a function of knowledge states and inference mechanism (Dawes’ rule versus the Take The Best heuristic) for the German cities task

                           Time 1                  Time 3
Cues                       Essen      Bremen       Essen      Bremen
Exposition site            ?          −            +          −
Soccer team                −          ?            −          ?
Intercity train            ?          +            ?          +
State capital              −          +            −          +
Licence plate              −          −            −          −
University                 +          +            +          +
Industrial belt            ?          ?            ?          ?
East Germany               −          ?            −          ?
Sum (Dawes’ rule)          −3         +1           −2         +1
Choice (Dawes’ rule)       “Bremen”                “Bremen”
Choice (Take The Best)     “Bremen”                “Essen”
The probabilistic mental model contains eight cues ranked according to their ecological validity. Cue values are positive (+) or negative
(−); missing knowledge is indicated by question marks. To infer whether Essen is larger than Bremen, Dawes’ rule adds up the
number of positive cue values and subtracts the number of negative cue values. The Take The Best heuristic, in contrast, looks up the
cue values in order of validity only until it reaches the first cue that discriminates between the two cities. At Time 3, the cue value
of the exposition site cue for Essen shifts towards feedback (i.e., Essen is larger than Bremen). As a consequence, the Take The Best
heuristic predicts hindsight bias on the level of choice, whereas Dawes’ rule predicts veridical recollection.
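
To make the two policies in Table 3 concrete, the following minimal Python sketch (ours, not part of the original simulations; the cue values are taken from Table 3, and the lenient stopping rule described later in the Discussion is assumed) applies Dawes’ rule and the Take The Best heuristic to the Time 1 and Time 3 knowledge states.

```python
# Cue values for Essen and Bremen, ordered by ecological validity as in Table 3.
# "+" = positive, "-" = negative, "?" = unknown.
time1 = {
    "Essen":  ["?", "-", "?", "-", "-", "+", "?", "-"],
    "Bremen": ["-", "?", "+", "+", "-", "+", "?", "?"],
}
# At Time 3, the exposition-site value for Essen has shifted towards the feedback.
time3 = {
    "Essen":  ["+", "-", "?", "-", "-", "+", "?", "-"],
    "Bremen": ["-", "?", "+", "+", "-", "+", "?", "?"],
}

def dawes_rule(profiles):
    # Unit-weight linear rule: number of "+" minus number of "-", ignoring "?";
    # choose the alternative with the higher score.
    scores = {city: values.count("+") - values.count("-")
              for city, values in profiles.items()}
    return max(scores, key=scores.get)

def take_the_best(profiles, a="Essen", b="Bremen"):
    # Look up cues in order of validity and stop at the first cue on which one
    # object is "+" and the other is not (the lenient stopping rule assumed here).
    for value_a, value_b in zip(profiles[a], profiles[b]):
        if (value_a == "+") != (value_b == "+"):
            return a if value_a == "+" else b
    return "guess"

for label, profiles in (("Time 1", time1), ("Time 3", time3)):
    print(label, dawes_rule(profiles), take_the_best(profiles))
# Time 1: both rules choose "Bremen"; Time 3: Dawes' rule still chooses "Bremen"
# (veridical recollection), while Take The Best now chooses "Essen" (hindsight bias).
```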

But does it pay a price in terms of a larger hindsight bias? The results (Figure 4) show that it does not:
Reconstructing one’s original choice on the basis of Dawes’ rule does not attenuate hindsight bias. On the contrary, hindsight
bias for Dawes’ rule is slightly but consistently larger than for the Take The Best heuristic. Across all degrees of knowledge,
the average hindsight bias was 18.4% and 17%, respectively. Except for this slight difference in hindsight bias, the results in
Figures 2 and 4 coincide: Again, the amount of hindsight bias is a linear function of the amount of knowledge. For instance,
the hindsight bias of a person who knows only 30% of all cue values is more than seven times the size of that of a person who
knows 90% of all cue values (38.2% versus 5.2%).
Thus, Dawes’ rule reproduces the same relation between foresight knowledge and hindsight bias as the Take The Best
heuristic. The underlying mechanism, however, must be a different one as Dawes’ rule processes cues differently. To identify
the mechanism, let us again focus on scant (i.e., 30% of cue values are known) versus ample (i.e., 90% of cue values are
known) knowledge. To reiterate, the policy of Dawes’ rule is simply to add up the number of positive cue values and subtract
the number of negative cue values (ignoring missing values) for each alternative. The decision rule is to choose the alternative
with the higher score (the “winner”). To understand why less foresight knowledge yields more hindsight bias, it is instructive to
consider the average difference between the “winner” and “loser” scores. With scant knowledge, the average difference is
much narrower than with ample knowledge, namely, 2.0 versus 3.7. This has an important implication: Updating one or only
a few missing cue values can overturn the original choice much more easily for scant knowledge than for ample knowledge.
As a consequence, hindsight bias is more likely to occur with scant knowledge.
To conclude, Dawes’ rule, a linear strategy with unit weights, produces the same finding we observed earlier: More
foresight knowledge results in less hindsight bias. In fact, both the non-compensatory Take The Best heuristic and the
compensatory Dawes’ rule yielded mostly identical results, with the latter exhibiting a slightly larger hindsight bias than the
former. Thus, Christensen-Szalanski and Willham’s (1991) expertise effect can be accounted for by two completely different
processing policies. Although this finding may seem surprising, it is consistent with the fact that these authors observed the
expertise effect across a wide range of studies, and thus very likely across different inference strategies.

SIMULATION 3:
HOW DOES ACCURACY OF KNOWLEDGE AFFECT HINDSIGHT BIAS?
Up to this point in our investigation, we assumed knowledge of cue values to be completely accurate. Knowledge, however,
may not always be accurate and, possibly, experts’ knowledge may be more exact than that of novices. In the final simulation,
we turned to the relationship between knowledge accuracy and hindsight bias. Specifically, we replicated Simulation 1 (using
the Take The Best heuristic and assuming an updating probability of .10) and introduced one additional variable, namely, the

Figure 4. Foresight and hindsight accuracy (i.e., amount of correct inferences at Time 1 and Time 3) achieved by Dawes’ rule (and Take
The Best), and hindsight bias as a function of amount of knowledge.

Figure 5. Foresight and hindsight accuracy (i.e., amount of correct inferences at Time 1 and Time 3) achieved by Take The Best, and
hindsight bias as a function of amount of false knowledge.

accuracy of knowledge. Specifically, we implemented eight different degrees of false knowledge, ranging from 0 to 35%
incorrect cue values (see Table 2). Figure 5 depicts the Take The Best heuristic’s foresight accuracy, hindsight accuracy, and
hindsight bias as a function of knowledge accuracy (the results are averaged across eight different amounts of knowledge,
ranging from 100% knowledge to 30% knowledge).
Before attending to hindsight bias, let us first consider the effect of false knowledge on judgement accuracy at Time 1. Not
surprisingly, it reduces the percentage of correct inferences in foresight. For illustration, compare the percentage of correct
inferences when 5% and 35% of all cue values are incorrect. Henceforth, we refer to these two states as (relatively) veridical
and flawed knowledge, respectively. While veridical knowledge yields about 67% correct inferences, drawing inferences from
flawed knowledge brings inference accuracy down to about 60%. Does flawed knowledge also result in a larger hindsight
bias? The surprising answer is no. As Figure 5 shows, the more flawed foresight knowledge is, the smaller the size of
hindsight bias. For instance, a person whose knowledge is veridical displays a hindsight bias that is about one and a half
times the size of the bias of a person whose knowledge is flawed (15.2% versus 9.7%).
Which mechanism might underlie this counterintuitive effect of false knowledge on hindsight bias? One candidate
explanation concerns the impact of false knowledge on the heuristic’s frugality. Although the insertion of incorrect cue values
reduces foresight accuracy, it also reduces the number of cues that the Take The Best heuristic needs to look up before it can
reach a decision. Specifically, with veridical knowledge the heuristic, on average, looks up 3.9 cues (of 9 cues including the
guessing cue). In contrast, with flawed knowledge it only needs to look up 2.9 cues. Adopting the same logic as before (in the
case of scant knowledge, see Simulation 1), this difference in the heuristic’s frugality can account for why less accurate
knowledge yields a smaller hindsight bias. At Time 3 hindsight bias can only occur if the Take The Best heuristic encounters
a cue that has been updated and now discriminates between the two objects before it reaches the cue that initially
discriminated at Time 1 (see Table 1 for an example). With flawed knowledge, however, the chances of coming across such a
cue are smaller than with veridical knowledge. The reason is that prior to the cue that discriminated at Time 1 there are only,
on average, 2.9 (as opposed to 3.9) cues for the effect of updating to occur. As a consequence, flawed knowledge admits less
“room” for hindsight bias.

This candidate explanation raises the question of how it is possible that flawed knowledge increases the frugality of the
Take The Best heuristic. To answer this question, one needs to analyse the informational structure of the knowledge environment.
In environments in which the distribution of cue values is skewed such that “negative” cue values outnumber “positive” ones
(e.g., in the German city environment, 71% of the cue values are negative), random insertion of false knowledge will reduce
the asymmetry in the number of positive and negative cue values. Consequently, the cues’ discrimination rate increases, thus
reducing the number of cues that need to be looked up. This fact has an interesting implication: If incorrect cue values were
systematically rather than randomly distributed (e.g., if positive cue values were falsely considered to be negative but not vice
versa) or if the frequency of positive and negative cue values was not as skewed, flawed knowledge might affect the
reconstruction process at Time 3 in a rather different way.
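
A back-of-the-envelope illustration of this argument (a simplifying sketch of ours, not the authors’ simulation: cue values are treated as independent draws, and a pair counts as discriminated when exactly one of the two values is positive):

```python
def discrimination_prob(p_positive):
    # Probability that exactly one object of a random pair has a positive value
    # on a given cue -- a crude proxy for the cue's discrimination rate.
    return 2 * p_positive * (1 - p_positive)

def after_random_flips(p_positive, flip_rate):
    # Randomly replacing a fraction flip_rate of values by their opposite moves
    # the proportion of positive values towards .5.
    return p_positive * (1 - flip_rate) + (1 - p_positive) * flip_rate

p = 0.29  # German city environment: 29% of cue values are positive
print(discrimination_prob(p))                            # ~0.41
print(discrimination_prob(after_random_flips(p, 0.35)))  # ~0.49
```

Pushing the share of positive values from .29 towards .50 raises the discrimination probability, which is the sense in which random error makes cues discriminate more often in a skewed environment.
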
Finally, it is noteworthy that flawed knowledge not only increases the discrimination rate, but it can also decrease the
validity of cues below 50%. As a consequence, those “invalid” cues will be eliminated from consideration, thus providing
even fewer cues for the effect of updating to occur. To test the extent to which both a higher discrimination rate and a smaller
set of cues might contribute to the “debiasing” effect of false knowledge, we ran a modified version of Simulation 3. Here
false knowledge was inserted such that the cues’ discrimination rate remained the same, and “invalid” cues (validity < 50%)
were not eliminated from consideration (i.e., the number of available cues remained constant). In this simulation, the
“debiasing” effect of false knowledge was reduced by half, thus indicating that both factors can provide a partial explanation
for the surprising effect of false knowledge.

DISCUSSION
Hindsight bias may occur because of the attempt to reconstruct one’s original judgement (Hawkins & Hastie, 1990). The
RAFT model (Hoffrage & Hertwig, 1999) was proposed to account for this process of reconstruction. The model’s core
assumption is that outcome knowledge (e.g., Bush won the presidential election) can be used to update the probabilistic
knowledge from which we draw inferences. Thus, hindsight bias is not so much viewed as a bias but as a consequence of
learning by feedback. By being explicit about the processes, the RAFT model opens the door to detailed analyses of the make-
up of hindsight bias.
The present simulations rendered three major results: Consistent with Christensen-Szalanski and Willham’s (1991)
observation, we found that more foresight knowledge results in a smaller hindsight bias (Simulation 1; see Figure 2). This
relation appears to be independent of the inference strategy used to process a person’s knowledge of cues and cue values:
Both a compensatory and a non-compensatory inference strategy yield comparable results, with the former showing a slightly
but consistently larger hindsight bias (Simulation 2; see Figure 4). Finally, we observed that more flawed foresight knowledge
led to a smaller hindsight bias (Simulation 3; see Figure 5).
Our investigations confirm the utility of developing and testing precise process models of hindsight bias. In addition, they
provide additional empirical support for the RAFT model in so far as the model can predict and account for the well-known
expertise effect. Finally, our results are also of importance as an existence proof that different cognitive strategies can yield
similar predictions, thus suggesting the possibility that a particular judgement or memory phenomenon may be robust across a
variety of different processing strategies. Next, we explore ways of testing the predictions that emerged from our simulations,
discuss alternative accounts of the expertise effect, and, more generally, evaluate the role of hindsight bias.

A first test of the expertise prediction and the issue of policy capturing
Consistent with Christensen-Szalanski and Willham’s (1991) finding, the RAFT model predicts that more knowledge (of cue
values) leads to less hindsight bias. This prediction can be evaluated in studies that manipulate or keep track of people’s cue
knowledge. To this end, we reanalysed a previous study (Hoffrage et al., 2000, Study 2) that recorded such knowledge. In this
study, participants were asked to assume the role of a health-insurance company employee. Assuming this role, they learned
some facts about a dozen fictional individuals who had submitted applications to purchase health insurance. These facts
referred to the applicants’ health status and included information about the presence or absence of three risk factors (parents’
hypertension, excess of weight, and smoking).
Participants were then instructed that the cost of a person’s health insurance depends on the presence or absence of health-risk
factors. Unfortunately, the applicants had forgotten to indicate their values for a key risk factor, namely, (high) blood
pressure. Therefore, the participants’ task was, among other things, to decide for pairs of two applicants “Which of them has
higher blood pressure?”. To be able to make this choice, participants also learned that parents’ hypertension, excess of
weight, and smoking were cues for high blood pressure, and they were told the validities of these cues (80%, 70%, and 60%).
Either before or after they gave their response (Time 1), they recalled the values on all three cues they had learned—these
values represent their amount of foresight knowledge. In the second session, participants received the correct answer (in one of
three conditions; for the other two see Hoffrage et al., 2000) and then were asked to recall their original choice.

Was hindsight bias in this study larger for people with less foresight knowledge, as predicted by the RAFT model? To
answer this question, we computed the correlation between participants’ amount of foresight knowledge (here focusing only
on correct cue values) and their respective hindsight bias. Consistent with the RAFT model’s prediction, we found that the
amount of foresight knowledge and the magnitude of hindsight bias are negatively correlated in the feedback condition.
The same correlation was positive in the no-feedback condition (r=.11), and the effect size of the difference
between the correlations in the two conditions amounted to q=0.26 (which corresponds to a medium effect size; see Cohen,
1988, p. 115). In short, the empirical results confirm those obtained in Simulation 1.
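
For readers who wish to reproduce this kind of comparison, Cohen’s q is the absolute difference between the two Fisher z-transformed correlations (Cohen, 1988). A minimal sketch (the correlation values passed in below are illustrative only; the feedback-condition correlation is not reprinted above):

```python
import math

def cohens_q(r1, r2):
    # Effect size for the difference between two correlations:
    # q = |z(r1) - z(r2)|, with z the Fisher r-to-z transformation (Cohen, 1988).
    return abs(math.atanh(r1) - math.atanh(r2))

# Illustrative values only: r = .11 is the no-feedback correlation reported above;
# -.20 is a made-up stand-in for the (unreprinted) feedback-condition correlation.
print(round(cohens_q(-0.20, 0.11), 2))  # 0.31 for these illustrative values
```
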
Before we turn to alternative accounts of the expertise effect, let us briefly discuss a methodological implication of the
surprising finding that compensatory and non-compensatory processing strategies of cues produced similar sizes of hindsight
bias (Simulation 2). In our view, this finding should not discourage experimenters from trying to find out which inference strategy
people use. True, on an aggregate level, the Take The Best heuristic and Dawes’ rule yielded almost identical amounts of
hindsight bias. On the level of individual judgements, however, they produced diverging judgements. Experimenters can take
advantage of this fact. Specifically, if a study’s aim is to identify which inference policy people use, the experimenter can
compose a set of items that amplifies the differences between the strategies. For instance we observed that, averaged across
all amounts of foresight knowledge and all proportions of false knowledge, the Take The Best heuristic and Dawes’ rule
yielded identical outcomes (i.e., hindsight bias, reversed hindsight bias, veridical recollection) in about two-thirds of all items.
Therefore, an experimenter who tries to capture people’s judgemental policies could sample from those choice tasks that
discriminate between the policies instead of drawing a representative sample of tasks (see Dhami, Hertwig, & Hoffrage, 2003,
for a discussion of this topic).

Alternative accounts
Next to the RAFT model’s account of the expertise effect, we can think of two alternative explanations for it: one follows
from the RAFT model; the other refers to a process whose existence the RAFT model acknowledges (see Figure 1 in Hoffrage
et al., 2000) but does not aim to model, namely, direct recall of the original judgement. We begin with the latter.
Better episodic memory. Experts are not only more knowledgeable than novices; they also seem to acquire mechanisms
that afford rapid storage of information in long-term memory (e.g., Ericsson & Chase, 1982; Ericsson & Kintsch,
1995). In the context of hindsight bias research, this ability may enable experts to reliably recall the specific decision episode
in which they determined which of two objects (for instance, Munich or Hamburg) is larger. In other words, experts may have
a better episodic memory (for a review of research on episodic memory, see Tulving, 2002) of the context in which they
arrived at their choice, thus rendering reconstruction processes unnecessary. If so, experts’ average amount of hindsight bias
would be smaller than that of novices. This explanation can be tested by using the number of veridical recollections (i.e.,
judgement at Time 1=recalled judgement at Time 3) as an estimate of cases of direct recall. We know of one hindsight bias
study that kept track of the frequency of veridical recollections as a function of expertise. Pohl (1992) asked second-year
psychology students (“novices”) and researchers (“experts”) to estimate numerical figures, such as “When did J.J.Gibson
publish the book The ecological approach to visual perception?”. He then provided them with the correct answer and asked
them to recall their original estimate. Consistent with the speculation that experts may have better episodic memory, Pohl
observed that the experts’ rate of veridical reproduction of previous estimates was almost twice as high as that for novices,
namely, 33% versus 19%, respectively.
These findings relate to another process account of hindsight bias—the SARA model (Pohl et al., 2003-this issue). This
model conceptualises knowledge in terms of “images” or information units that are used when people estimate numerical
values such as Goethe’s age at death. An image is, for instance, the (subjective) quantitative knowledge of the average life
expectancy of Goethe’s contemporaries. In their simulations, Pohl et al. (2003-this issue) observed that the availability of
more images brings the proportion of veridical recollections down because, so runs one plausible suggestion, it is more
difficult to find the original estimate among a larger (as opposed to a smaller) set of images. In addition, if one assumed that
experts have more images at their disposal, then the SARA model would predict that experts’ episodic memory (for their
original judgements) should be worse than that of novices. However, the authors of the SARA model question this prediction,
and suggest instead that expertise may be related more to the quality of images than to their quantity. Consistent with this
assumption, Pohl (1992) found that the original judgements of experts as compared to those of novices were significantly
closer to the solution. More generally, however, the SARA model and the RAFT model concur in predicting that more
knowledge (in terms of either more images or cue values) will reduce the size of the hindsight bias (see Pohl et al., 2003-this
issue).
More updating. The RAFT model, however, offers yet another candidate reason beyond more knowledge for why experts’
hindsight bias is smaller: the possibility that the rate of updating is higher for novices than for experts. Would a higher
updating rate yield more hindsight bias? To address this question, we varied the size of the updating probability, and found
that more updating does in fact produce a larger hindsight bias. Figure 6 shows the amount of hindsight bias for four different

Figure 6. Hindsight bias as a function of amount of knowledge and the probability of updating (p=.05, p=.10, p=.15, p=.20).

updating probabilities: .05, .10, .15, and .20. Clearly, if novices’ knowledge were updated at a higher rate than experts’
knowledge, then, all else being equal, the former would show a larger average hindsight bias. The results in Figure 6 also
show that the impact of the updating rate on the size of hindsight bias increases as knowledge decreases. But is
novices’ knowledge indeed more thoroughly updated than experts’ knowledge? As we discuss in the following section, this is
just one of the novel and, as we believe, exciting questions that emerge from the RAFT model.
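
The updating step whose probability is varied here can be sketched as follows (our minimal reading of the RAFT model’s updating assumption, not the authors’ simulation code): with probability p, a cue value that is missing at Time 1 is filled in with the value implied by the outcome feedback.

```python
import random

def update_missing_cues(cue_values, feedback_values, p_update=0.10, rng=random):
    # RAFT-style updating (sketch): each missing cue value ("?") is replaced,
    # with probability p_update, by the value consistent with the outcome feedback.
    updated = []
    for value, feedback_value in zip(cue_values, feedback_values):
        if value == "?" and rng.random() < p_update:
            updated.append(feedback_value)
        else:
            updated.append(value)
    return updated

# Example: Essen's Time 1 profile from Table 3; the feedback "Essen is larger than
# Bremen" makes positive values for Essen the feedback-consistent fill-ins.
essen_time1 = ["?", "-", "?", "-", "-", "+", "?", "-"]
print(update_missing_cues(essen_time1, ["+"] * 8, p_update=0.20))
```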

New questions
Hindsight bias is not a simple, uniform phenomenon. It is likely to be shaped by processes of recollection and reconstruction
(Erdfelder & Buchner, 1998) as well as by judgement phenomena such as the reiteration effect (Hertwig et al., 1997). The
results of our simulations support the view of hindsight bias as the sum of multiple determinants. Specifically, we find that
people’s hindsight bias is, all else being equal, a function of their amount of foresight knowledge, the proportion of false
knowledge, and the degree to which feedback updates their missing knowledge. Thus, hindsight bias is a nontrivial composite
of effects that take different directions: While the effects of knowledge accuracy and updating go in the same direction (i.e.,
hindsight bias increases with increasing accuracy and updating probability), the effect of the amount of knowledge goes in the
opposite direction (i.e., hindsight bias shrinks with increasing amount of knowledge)—at least in the present knowledge domain.
To the extent that hindsight bias reflects the combination of those three effects, their individual sizes will determine the size
of hindsight bias. In the simulations, we observed that the “debiasing” effect of more knowledge was larger than the “biasing”
effect of more accurate knowledge.4 In fact, a larger “debiasing” effect of knowledge amount could also explain why,
according to Christensen-Szalanski and Willham’s (1991) meta-analysis, experts exhibit less hindsight bias than novices
although the former may enjoy both ample and accurate knowledge.
However, to understand better how much influence each of the three effects exerts on hindsight bias, we need to know not
only the range of parameter values but also how they are distributed. Consider updating probability as an example. In our
simulations, we assumed updating to be constant across levels of knowledge. But this need not be so. Alternatively, the
updating may linearly (or exponentially) increase or decrease as a function of knowledge, or it may follow an (inverse)
U-shaped function. Thus, by focusing on the updating process, the RAFT model raises new questions (for instance, do
novices’ updating probabilities exceed those of experts, or vice versa) that, to the best of our knowledge, have not been
raised before, or only rarely. To the extent that knowledge updating represents a general learning mechanism that enables and
supports people’s inductive inferences, the answers to these questions will have relevance beyond the limits of hindsight bias
research.
Another novel issue concerns the question of how robust the RAFT model’s predictions are across different environments.
For illustration, consider the following property of the German city environment. In it, negative cue values outnumber
positive cue values (71% vs 29%, respectively). Does this ecological property affect the amount of hindsight bias predicted by
the RAFT model? To examine this question, we conducted another simulation in which, briefly sketched, we artificially
created environments that mimicked the German city environment (i.e., number of objects, number of cues, and cue
validities), except that positive cue values now outnumbered the negative ones (70% vs 30%, respectively); as in Simulation 1,
amount of knowledge ranged from 30 to 100% and updating probability was .10. Compared to the results of Simulation 1, we
again found the same linear relation between hindsight bias and amount of knowledge (see Figure 2); however, the
amount of hindsight bias was smaller, and ranged from 0% to 22% (compared to 0% to 34% in Simulation 1).

Why is this? This difference in the amount of hindsight bias is due to one of Take The Best’s key building blocks, its
stopping rule. The reader familiar with the Take The Best heuristic presented in Hoffrage et al. (2000) will have noticed that
in our simulations we simplified the stopping rule. Here we employed a lenient stopping rule that terminates search if one
object has a positive cue value and the other does not. Previously, we employed a strict stopping rule that terminated search
only when one object had a positive (larger) value and the other a negative (smaller) one. This change in the heuristic’s
architecture was made to achieve consistency with other applications of the Take The Best heuristic (e.g., Gigerenzer &
Goldstein, 1999). Moreover, the lenient stopping rule with its “positive bias” is ecologically rational for those environments in
which negative cue values outnumber positive ones, and thus unknown values are most likely negative ones (Gigerenzer &
Goldstein, 1999). As the new simulation showed, the lenient stopping rule leads to a smaller hindsight bias once positive cue
values outnumber negative ones in the environment. To test whether this change is due to the nature of the stopping rule, we also implemented the strict
stopping rule in the artificial environment (see above), and found that with this rule the amount of hindsight bias is
independent of the ratio of positive and negative cue values.
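
The difference between the two stopping rules is small but consequential. The following sketch (our rendering of the two rules as described above) contrasts them for a single cue:

```python
def discriminates_lenient(value_a, value_b):
    # Lenient stopping rule: stop search if one object has a positive value
    # and the other does not (i.e., "-" or "?").
    return (value_a == "+") != (value_b == "+")

def discriminates_strict(value_a, value_b):
    # Strict stopping rule: stop search only if one object has a positive
    # value and the other a negative one.
    return {value_a, value_b} == {"+", "-"}

print(discriminates_lenient("+", "?"))  # True: the lenient rule stops here
print(discriminates_strict("+", "?"))   # False: the strict rule keeps searching
```
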
To conclude, these novel results indicate that environmental structures may interact with the building blocks of heuristics.
Given this interaction, we can now examine whether empirically observed hindsight bias changes as a function of
environmental structure. Depending on the answer, we could then make inferences about the
building blocks of people’s heuristics, such as the nature of their stopping rules—strict or lenient.

Hindsight bias: Wagering on the future


Typically, hindsight bias has been interpreted as detrimental to people’s future predictive abilities. Take Fischhoff’s view, for
instance. His early experimental studies carved out this new topic for memory researchers. He stressed that hindsight bias is
not only robust and difficult to eliminate (Fischhoff, 1982a), but also has potentially harmful consequences:

When we attempt to understand past events, we implicitly test the hypotheses or rules we use both to interpret and to
anticipate the world around us. If, in hindsight, we systematically underestimate the surprises that the past held and
holds for us, we are subjecting those hypotheses to inordinately weak tests and, presumably, finding little reason to
change them. Thus, the very outcome knowledge which gives us the feeling that we understand what the past was all
about may prevent us from learning anything from it. (Fischhoff, 1982b, p. 343, emphasis added).

Similarly, Bukszar and Connolly (1988) concluded that, “those findings [of the hindsight bias] raise serious questions about
the ability of humans to learn from experience…. [I]n retrospect, people see the world as unfolding inevitably toward the
present. Outcomes fail to surprise” (p. 630).
We advocate an alternative view (Hoffrage & Hertwig, 1999; see also Roese & Olson, 1996). Although we do not deny
that hindsight bias can have tangible consequences, we interpret it as a by-product of an efficient memory system. Updating is
an important adaptive process because, among other reasons, “it is too expensive to maintain access to an unbounded number
of items” (Anderson & Schooler, 1991, p. 396). A stockpile of memories (e.g., the memories of previous judgements, choices,
decisions) would interfere with the only information that is relevant right now. In this sense, forgetting in conjunction with
updating information may be necessary for memory to maintain its function. It prevents us from using old and possibly
outdated information (see Bjork & Bjork, 1988; Ginzburg, Janson, & Ferson, 1996).
Given this view, does hindsight bias prevent us from learning from the past as suggested by Fischhoff (1982b)? No, on the
contrary, the very existence of bias suggests that we learn from the past. In the RAFT model, updating—that is, learning—
occurs on the level of the imperfect cues from which we draw inferences. This learning is possible because of what Egon
Brunswik (1952) called “vicarious functioning”. If information on one cue is not available, another can replace this cue. Further,
it is not only cues that are interchangeable: In many cases, the direction of inference between cue and criterion can
be reversed. For instance, not only can candidates’ charisma tell us about their chances in the election contest, but the
reverse also holds. Therefore, new information about the criterion can be used to update related knowledge in semantic memory—
similar to the updating of outdated information in episodic memory (see Bjork & Bjork, 1988).

4 This difference in effect sizes is not due to the fact that we examined a larger parameter space for knowledge amount (30– 100% known
cue values) than for knowledge accuracy (0–35% false cue values). To test for this possibility we simulated the effect size for knowledge
amount and knowledge accuracy assuming the same parameter space. The hindsight bias for 100% and 65% amount of knowledge
(averaged across all states of knowledge accuracy) amounted to 0% and 12%, respectively. In contrast, the hindsight bias for 0% and 35%
false cue knowledge (averaged across all states of known cue values) was 8.3% and 4.4% respectively. In addition, we simulated an
empirically derived parameter space (derived from the aforementioned Munich experiment). Here we found that the “debiasing” effect of
knowledge amount strongly exceeded the “biasing” effect of knowledge accuracy. Specifically, while more knowledge decreased
hindsight bias by 6.5 percentage points, more accurate knowledge increased hindsight bias by 0.7 percentage points.

Figure 7. Increase of Take The Best’s predictive accuracy after knowledge updating for the total choice set (assuming the same parameters
as in Simulation 1: false knowledge=0%, and updating probability=0.1).

Updating is adaptive: It increases the coherence of our knowledge and the accuracy of our inferences. Consider the results
reported in Figure 2: On average (across all knowledge states), the proportion of correct inferences increases from 64.1% in
foresight to 68.8% in hindsight. Accuracy improves because decision makers have new knowledge (about cue values). This new
knowledge, however, can not only be used to rejudge the initial choices but also to make inferences about novel choices for which
no outcome information was provided. In fact, in the German city environment the test set of 41 choices represents only about
1% of all possible choices (n=3,321). Does the updated knowledge also prove beneficial for the remaining 99% of novel inferences? It
does. As Figure 7 shows, the percentage of correct inferences for the complete set of old and new pair comparisons after updating
is higher than prior to it. That is, updating knowledge benefits future, novel choices.
Updating, however, is not a panacea. Although it increases the accuracy of our inferences, it may also give rise to
mental models of the environment that are not completely correct. That is, updating can generate cue values that conform to
the outcome of an event but nevertheless are false.5 Consider our previous example of US presidential elections. Among
candidates’ personal characteristics, their height has proved to be highly predictive: The taller candidate has won every
presidential race since World War II except in the post-Watergate election of 1976 (Carter versus Ford), and in the notorious
2000 election. Imagine a voter who initially did not know that Gore is taller than Bush. After the election, the voter may have
erroneously inferred that Bush is likely to be taller than Gore. Admittedly, updating by no means guarantees correct
knowledge. However, we suggest that updating will typically result in correct knowledge. In fact, in a reanalysis of Hoffrage
et al.’s (2000) Study 1 we found two-thirds of the updated information to be correct.

CONCLUSION
Winston Churchill was once asked what the desirable qualifications were for any young person who might wish to become a
politician. He responded that “it is the ability to foretell what is going to happen tomorrow, next week, next month, and next
year. And to have the ability afterwards to explain why it didn’t happen” (cited in Buchanan, 2000, p. 185). The interesting
insight from the present findings is that being able to explain in hindsight why something did or did not happen (in contrast to
one’s expectations) is the key to improving our ability to foretell what is going to happen in the future. In the same spirit, we
shall point out that while less knowledgeable decision makers may be less apt to veridically reconstruct past judgements, they
enjoy the benefit of learning more from new knowledge. In the trade-off between accurate memories of the past and accurate
inferences in the future, they wager on the latter.

REFERENCES

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.
Arkes, H.R., Wortmann, R.L., Saville, P.D., & Harkness, A.R. (1981). Hindsight bias among physicians: Weighing the likelihood of
diagnoses. Journal of Applied Psychology, 66, 252–254.
Bjork, E.L., & Bjork, R.A. (1988). On the adaptive aspects of retrieval failure in autobiographical memory. In M.M.Gruneberg, P.E.Morris
& R.N. Sykes (Eds.), Practical aspects of memory: Current research and issues, Vol I: Memory in everyday life (pp. 283–288). New York:
Wiley.
Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491–504.
Bröder, A. (2000). Assessing the empirical validity of the “Take The Best” heuristic as a model of human probabilistic inference. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 26, 1332–1346.

Brunswik, E. (1952). The conceptual framework of psychology. International Encyclopedia of Unified Science. Chicago: University of
Chicago Press.
Buchanan, M. (2000). Ubiquity. London: Weidenfeld & Nicolson.
Bukszar, E., & Connolly, T. (1988). Hindsight bias and strategic choice: Some problems in learning from experience. Academy of
Management Journal, 31, 628–641.
Campbell, J.E., & Mann, T.E. (1996). Forecasting the presidential election: What can we learn from the models? The Brookings Review, 14,
26–31.
Chase, W.G., & Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.
Chi, M.T.H., Feltovitch, P.J., & Glaser, R. (1981). Categorization of physics problems by experts and novices. Cognitive Science, 5,
121–152.
Christensen-Szalanski, J.J.J., & Beach, L.R. (1984). The citation bias: Fad and fashion in the judgment and decision literature. American
Psychologist, 39, 75– 78.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Dawes, R.M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571–582.
de Groot, A.D. (1965). Thought and choice in chess. The Hague: Mouton (first edition in Dutch, 1946).
Dhami, M.K., Hertwig, R., & Hoffrage, U. (2003). The role of representative design and sampling in an ecological approach to cognition.
Manuscript submitted for publication.
Einhorn, H.J., & Hogarth, R.M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Decision
Processes, 13, 171– 192.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and
reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Ericsson, K.A., & Chase, W.G. (1982). Exceptional memory. American Scientist, 70, 607–615.
Ericsson, K.A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211– 245.
Ericsson, K.A., & Staszewski, J.J. (1989). Skilled memory and expertise: Mechanisms of exceptional performance. In D.Klahr &
K.Kotovsky (Eds.), Complex information processing: The impact of Herbert A.Simon (pp. 235–267). Hillsdale, NJ: Lawrence Erlbaum.
Fischhoff, B. (1982a). Debiasing. In D.Kahneman, P. Slovic, & A.Tversky (Eds.), Judgment under uncertainty: Heuristics and biases
(pp. 422–444). Cambridge: Cambridge University Press.
Fischhoff, B. (1982b). For those condemned to study the past: Heuristics and biases in hindsight. In D. Kahneman, P.Slovic, & A.Tversky
(Eds.), Judgment under uncertainty: Heuristics and biases (pp. 335– 351). Cambridge: Cambridge University Press.
Gigerenzer, G., Czerlinski, J., & Martignon, L. (1999a). How good are fast and frugal heuristics? In J. Shanteau, B.A.Mellers, &
D.A.Schum (Eds.), Decision science and technology: Reflections on the contributions of Ward Edwards (pp. 81–103). Norwell, MA:
Kluwer.
Gigerenzer, G., & Goldstein, D.G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103,
650–669.
Gigerenzer, G., & Goldstein, D.G. (1999). Betting on one good reason: The Take The Best heuristic. In G. Gigerenzer, P.M.Todd, & the
ABC Research Group, Simple heuristics that make us smart (pp. 75– 95). New York: Oxford University Press.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological
Review, 98, 506– 528.
Gigerenzer, G., Todd, P.M., & the ABC Research Group. (1999b). Simple heuristics that make us smart. New York: Oxford University
Press.
Ginzburg, L.R., Janson, C., & Ferson, S. (1996). Judgment under uncertainty: Evolution may not favor a probabilistic calculus. Behavioral
and Brain Sciences, 19, 24–25.
Green, L., & Mehr, D.R. (1997). What alters physicians’ decisions to admit to the coronary care unit? The Journal of Family Practice, 45,
219–226.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgment of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Heisenberg, W.K. (1971). Physics and beyond: Encounters and conversations. New York: Harper & Row.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hertwig, R., & Hoffrage, U. (2001). Empirische Evidenz für einfache Heuristiken: Eine Antwort auf Bröder [Empirical evidence for simple
heuristics: A response to Bröder]. Psychologische Rundschau, 52, 162–165.
Hertwig, R., & Todd, P.M. (in press). More is not always better: The benefits of cognitive limits. In D.Hardman & L.Macchi (Eds.),
Thinking: Psychological perspectives on reasoning, judgment and decision making. New York: Wiley.
Hoffrage, U., & Hertwig, R. (1999). Hindsight Bias: A price worth paying for fast and frugal memory. In G. Gigerenzer, P.M.Todd, & the
ABC Research Group, Simple heuristics that make us smart (pp. 191–208). New York: Oxford University Press.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Huntford, R. (1999). The last place on earth. New York: Modern Library.
Kahneman, D., & Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341–350.

Larkin, J., McDermott, J., Simon, D.P., & Simon, H.A. (1980). Expert and novice performance in solving physics problems. Science, 208,
1335–1342.
Lichtman, A.J. (1996). The keys to the White House, 1996. Lanham, MD: Madison Books.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal of the hindsight bias. Organizational Behavior
and Human Decision Processes, 46, 20–33.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Pezzo, M. (2003). Surprise, defence, or making sense: What removes hindsight bias? Memory, 11, 421–441.
Pohl, R.F. (1992). Der Rückschau-Fehler: Systematische Verfälschung der Erinnerung bei Experten und Novizen [Hindsight bias:
Systematic distortions of the memory of experts and laymen]. Kognitionswissenschaft, 3, 38–44.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Rieskamp, J., & Hoffrage, U. (1999). When do people use simple heuristics, and how can we tell? In G. Gigerenzer, P.M.Todd, & the ABC
Research Group, Simple heuristics that make us smart (pp. 141–167). New York: Oxford University Press.
Roese, N.J., & Olson, J.M. (1996). Counterfactuals, causal attributions, and the hindsight bias: A conceptual integration. Journal of
Experimental Social Psychology, 32, 197–227.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Shanteau, J. (1992). Competence in experts: The role of task characteristics. Organizational Behavior & Human Decision Processes, 53,
252–266.
Simon, H.A., & Barenfeld, M. (1969). Information-processing analysis of perceptual processes in problem solving. Psychological Review,
76, 473–483.
Slegers, D.W., Brake, G.L., & Doherty, M.E. (2000). Probabilistic mental models with continuous predictors. Organizational Behavior and
Human Decision Processes, 81, 98–114.
Solomon, S. (2001). The coldest march: Scott’s fatal Antarctic expedition. New Haven, CT: Yale University Press.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
review of social psychology (Vol. 8, pp. 105–132). New York: Wiley.
Synodinos, N.E. (1986). Hindsight distortion: “I knew-it-all along and I was sure about it.” Journal of Applied Social Psychology, 16,
107–117.
Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25.
Wendt, D. (1993). No hindsight bias (“knew-it-all-along effect”) during province parliament elections in Schleswig-Holstein in 1988 and
1992. Zeitschrift für Sozialpsychologie, 24, 273–279.
Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along phenomenon. Memory, 11, 411–419.
Hindsight bias as a function of anchor distance and anchor plausibility
Oliver Hardt
The University of Arizona, USA
Rüdiger F.Pohl
Justus Liebig University Giessen, Germany

This study explored the influence of anchor distance on hindsight bias and how the subjective plausibility of
different anchors moderates this relation. In addition to the standard memory design used in hindsight bias
research, participants were asked to indicate the range of values for possible answers to difficult almanac questions.
Varying anchor distance on the basis of each participant’s individual range of possible answers showed (1) that
anchor plausibility decreased with increasing anchor distance following a nonlinear monotone function, (2) that
size of hindsight bias initially increased with increasing anchor distance but, from a certain distance, started to
decrease, and (3) that hindsight bias was always higher for plausible than for implausible anchors.

Hindsight bias is revealed when receipt of complementary information (i.e., the anchor) systematically influences subsequent
recollections of previous predictions (i.e., the estimate). It is commonly found that estimates are remembered as being closer
to the anchor than they actually had been.
The most outstanding feature of hindsight bias is its robustness (cf. Christensen-Szalanski & Willham, 1991; Hawkins &
Hastie, 1990). Therefore, manipulations that successfully disrupt the unavoidability and automaticity of the effects of anchor
presentation on subsequent recollection of the original estimate are of special interest—they are likely to reveal some of the
phenomenon’s crucial underpinnings. Intuitively, discrediting the anchor information should most seriously damage its
usefulness and thus reduce hindsight bias (Hasher, Attig, & Alba, 1981): Why should an implausible “hint” be employed to
reconstruct a former memory, or why should it be stored in memory and quite possibly interfere with existing bona fide
representations? However, the results of studies that investigated the effects of anchors that were different in plausibility
suggest that even numerically extreme and subjectively implausible anchors may provoke anchoring effects (e.g., Northcraft &
Neale, 1987; Pohl, 1998a; Russo & Shoemaker, 1989; Strack & Mussweiler, 1997). These studies indicate that hindsight bias
is influenced by both the anchor’s numerical extremity and perceived plausibility, which do not necessarily map onto each
other. However, the relationship between hindsight bias, anchor distance to the original estimate, and

© 2003 Psychology Press Ltd


http://www.tandf.co.uk/journals/pp/09658211.html DOI:10.1080/09658210244000504

anchor plausibility remains to be explored in more detail.

FEATURES AND DETERMINANTS OF ANCHOR PLAUSIBILITY


Anchor plausibility appears to be a function of the anchor’s compatibility with the individual’s knowledge about the domain
in question (cf. Pohl, 1998a; Strack & Mussweiler, 1997). At least two characteristics of the available knowledge seem to be
critical features of ascribed anchor plausibility: knowledge quantity and knowledge precision (Pohl, 1998a). Unlike precision,
knowledge quantity can hardly be reliably measured, so it will not be considered in this short review of findings that have
been published.

Requests for reprints should be sent to Oliver Hardt, The University of Arizona, Department of Psychology, Tucson, AZ
85721, USA. Email: hardt@u.arizona.edu. The reported research was partially supported by DFG grant Po 315/6–3 to the
second author. We would like to thank the reviewers Edgar Erdfelder, Wolfgang Hell, Ulrich Hoffrage, and Britta Renner for
their contributions that significantly improved the manuscript. We would also like to take the opportunity to thank Markus
Eisenhauer, Almut Hupbach, and Rainer Rothkegel for their helpful critical comments.

In the following paragraphs we will discuss how the variables that were the focus of our study influence subjective anchor
plausibility. Specifically, we were interested in the contribution of the anchor’s (1) distance to the estimate, (2) direction, and,
to a limited extent, (3) source.

Relative anchor distance


For almanac questions that require a numerical estimate, the compatibility of a numerical anchor with the individually
available domain knowledge is assumed to be reflected in the anchor’s numerical proximity to values considered acceptable
answers (Kahneman, 1992, p. 309). Thus, in some studies anchor plausibility was manipulated by varying the anchor’s
distance to the sample’s or a control’s mean estimate for the respective item (Chapman & Johnson, 1994; Kohnert, 1996;
Pohl, 1998a; Strack & Mussweiler, 1997). To study the influence of anchor plausibility in more detail, it seems promising to
present anchors that are related to the individual knowledge of each participant (e.g., Kohnert, 1996, Exp. 4, varied anchor
distance individually, based on the range in which participants anticipated the correct answer with 90% confidence).
Relative anchor distance can be measured in units of the range of possible estimates, which equals the absolute difference
between the lowest and the highest estimate that the participant still considers to be a possible answer to the question.
The midpoint of this range defines the origin of measurement.1 This method allows individual adjustment of anchor distance
for each participant and each item while still permitting comparisons across participants since the anchor distance can be
expressed in terms of relative anchor distance units. It can be expected that subjective anchor plausibility is related to the
subjective range of possible estimates: the larger the anchor’s relative distance to the subjective range’s midpoint, the smaller
its ascribed plausibility.
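
In other words (our rendering of the measure, not a formula quoted from the method section): with lower and upper bounds l and u of the subjective range, the midpoint is (l + u)/2, and the relative distance of an anchor a is (a − (l + u)/2)/(u − l). A minimal sketch:

```python
def relative_anchor_distance(anchor, lower, upper):
    # Anchor distance in units of the participant's subjective range of possible
    # estimates; the range midpoint serves as the origin of measurement.
    midpoint = (lower + upper) / 2
    return (anchor - midpoint) / (upper - lower)

# Example: subjective range 40-80, anchor 90 -> (90 - 60) / 40 = 0.75 range units.
print(relative_anchor_distance(90, 40, 80))
```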

Symmetry
Although it can be argued that oppositely directed anchors which are equal in relative distance are ascribed different
plausibilities (e.g., when, for a question about the length of a certain object, the anchor in the negative direction is closer to a
strict numerical limit than the positive anchor), in general and on average across all questions, this effect of anchor direction
on plausibility judgements is likely to be marginal. Hence it can be expected that ascribed anchor plausibility is more or less
symmetrically distributed: two anchors with equal relative distance but opposite directions should, on average, not be ascribed
significantly different plausibilities.

Source
To complete our analysis of the determinants of anchor plausibility, we will briefly discuss the relation between the anchor’s
source and plausibility, although the former was not manipulated in the present experiment. Pohl (1998a) demonstrated
hindsight bias for anchors that were labelled as the solution even if these values were subjectively implausible. This, however,
was not the case for anchors that were labelled as another participant’s estimate. Here, only subjectively plausible anchors led
to hindsight bias while subjectively implausible ones did not. This result corresponds to findings reported in the suggestibility
literature which demonstrate a stronger influence of authoritative than less authoritative sources (see, e.g., Toglia, Ross, Ceci,
& Hembrooke, 1992). In a summary of anchoring studies, Pohl (1998b) found accordingly that anchoring effects were larger
for anchors that were labelled as the solution or as an expert’s estimate than for anchors that were labelled as a layperson’s
estimate, which in turn were larger than the effects of anchors that were labelled as a random number. Presumably, the
variation of the anchor’s source affected the anchor’s perceived plausibility and thus its impact on the participants’ estimates.

1 Other methods to define the origin of measurement are, of course, possible. We decided to use the arithmetic mean because it assumes a
symmetrical distribution which we thought represents the distribution of possible estimates in the range—of which we only knew the minimum
and maximum values—more appropriately than other measures, such as the geometric or harmonic mean. However, it is very likely that the
distributions for certain questions are positively or negatively skewed, especially if these questions have strict numerical limits (such as
length, percent, or age). For these distributions, the arithmetic mean obviously is not the method best suited to determine the origin of
measurement. Thus future research to increase our knowledge about the distributions of values within the range of possible estimates is
certainly needed to determine how to calculate an index value that represents these distributions most adequately.

DETERMINANTS OF HINDSIGHT BIAS

Range of possible estimates


Pohl (1992) found that, while the number of known solutions was only slightly larger for experts, experts recalled distinctly more
of their original estimates than novices did. This finding is in line with the results of the meta-analysis by Christensen-
Szalanski and Willham (1991), who reported reduced hindsight bias for familiar tasks and items. Since a narrow range is
assumed to be associated with more precise knowledge, it can be expected that the number of correctly recalled estimates is larger
for narrow than for wide ranges of subjectively acceptable estimates. Furthermore, Pohl (1992) found that the absolute size of
hindsight bias was smaller for experts than for novices. Since the expertise of his sample was not directly measured, the
“degree” of qualitative difference between novices and experts could only be indirectly approximated. Hence, it seems
worthwhile to relate hindsight bias to the individual’s width of range of possible estimates, which is likely to reflect the precision
of the domain-specific knowledge and could thus serve as an indicator of expertise. Thus, it can be expected that (a) the
number of correctly recalled estimates will be larger for items with narrow than for items with wide ranges of possible
estimates, and (b) hindsight bias will be expressed more for items with wide than for items with narrow ranges of possible
estimates.

Anchor plausibility
Inconsistent findings have been reported regarding the effects of anchor plausibility on hindsight bias. While
Strack and Mussweiler (1997) did not detect differences in the anchoring effect elicited by implausible and plausible anchors,
Chapman and Johnson (1994) found a reduction for extremely implausible anchors.
The experiment of Hasher et al. (1981) can also be interpreted as a successful manipulation of anchor plausibility: For
anchors that were labelled as wrong after presentation, and that participants therefore presumably regarded as absolutely
implausible, hindsight bias was eliminated. Manipulating the source of the anchor, Pohl (1998a)
found a reduction of hindsight bias for subjectively implausible anchors that were characterised as another participant’s
estimate, while no reduction was revealed for anchors that were labelled as correct solutions. In the latter case, hindsight bias
was independent both of the anchor’s extremity and its subjective plausibility.
In most studies, the anchor values were based on an independent sample’s distribution of unbiased estimates, for example,
the 15th percentile as low and 85th percentile as high anchors. However, this procedure offers very little information about
the subjective range that denotes acceptable values for the individual participant. Additionally, the anchors’ plausibility was
not systematically controlled. Participants were not required to report the plausibility they actually ascribed to the anchors
presented (as, e.g., in the studies by Pohl, 1998a). Hence, given the evidence reported in the literature, it seems premature to draw
final conclusions about the effects of anchor plausibility on hindsight bias.

Anchor distance
Some authors have argued that the relation between anchor distance and anchoring effect (anchor-distance function) is linear:
the higher the anchor value, the larger the anchoring effect (Northcraft & Neale, 1987; Russo & Shoemaker, 1989). In
contrast to this position, however, Kohnert (1996, Exps. 3 and 4) concluded on the basis of his results that hindsight bias is
not dependent on anchor distance. In his studies, hindsight bias turned out to be similarly large, independent of whether the
anchor was within or outside the subjective range. Since he employed questions about quantities stated in percentages, anchor
extremity was limited to 0 and 100, respectively. The anchor values employed in these studies might not have been extreme
enough to rule out the possibility that the function may level off or even decrease at a certain anchor distance.
Using more extreme anchors for almanac questions, Strack and Mussweiler (1997) reported that anchor extremity had no
effect on the size of the anchoring effect (for similar results cf. Chapman & Johnson, 1994). Strack and Mussweiler (1997,
Exp. 3) constructed extreme anchors that were plus or minus 10 standard deviations away from the mean of an unbiased
distribution of estimates (calibration group). They compared the effects of these anchors that had been rated as highly
implausible by a separate sample to the effects of anchors that were only one standard deviation away from the calibration
mean and that had been rated as highly plausible. They used, for example, the question “How old was Gandhi when he was
shot?” which was accompanied either by the plausible low and high anchors of “64” and “79”, respectively, or by the
implausible low and high anchors of “9” and “140”, respectively. Most surprisingly, the participants’ judgements were found
to be similarly influenced by close and distant anchors, independent of the anchor’s plausibility.
The results of Strack and Mussweiler (1997) would rather support arguments for a linear anchor-distance function with a
slope of zero since both plausible and implausible (i.e., close and distant) anchor values led to the same bias independent of
anchor distance. Similarly, the findings of other studies indicate that the anchor’s relative impact may be diminished for
extreme anchor values (Chapman & Johnson, 1994; Kahneman, 1992; Quattrone, Lawrence, Finkel, & Andrus, 1981).
All these results suggest that the anchor-distance function is not linearly increasing. Moreover, some results show that
hindsight bias may be diminished, absent, or even reversed under certain conditions. In the aforementioned study by Pohl
(1998a) in which both the anchor’s source and extremity were manipulated, hindsight bias was not found for anchors that
were both labelled as another person’s estimate and perceived as being implausible. In all other cases, hindsight bias of similar
magnitude could be observed. Hence, reduction of hindsight bias was dependent more on the anchor’s subjective plausibility
than its distance. Summarising a number of similar anchoring and hindsight studies, Pohl (1998b) found that anchoring effects
were closely linearly related to anchor distance as long as the anchors were subjectively considered plausible. For anchors
subjectively considered implausible, no anchoring effect at all could be observed and, in some cases, a contrast effect even
emerged, irrespective of the anchor’s distance (see, e.g., Kohnert, 1996; Ofir & Mazursky, 1997).

RESEARCH OBJECTIVES
Against the background of these rather mixed results, it appears worthwhile to explore the relation between anchor distance,
plausibility, and hindsight bias more thoroughly. It seems promising to assess anchor plausibility for each item and each
participant separately to better understand the conditions that determine the amount of hindsight bias. Additionally, the
relation between hindsight bias and anchor distance that will emerge when anchor distances increase towards extreme values
needs to be studied more systematically.
We tried to address these questions in the study reported here. In this experiment, participants were first asked to indicate
their range of subjectively acceptable answers to difficult almanac questions. Then they were asked to find answers to the
questions. Anchors of different distances were constructed by employing the individual’s subjective ranges of acceptable
answers. One week later, these anchors were introduced as another participant’s estimate and participants were asked to rate
the anchors’ subjective plausibility. Finally, participants were asked to reproduce their own previous estimates.

METHOD

Participants
A total of 82 University of Trier students, 56 females and 26 males, between 19 and 38 years of age, served as participants.
They could choose between a monetary reward or course credits for compensation.

Material
A set of 44 difficult almanac-like questions was used in the experiment. The questions were taken from various topics,
including physics, geography, history, and biology, and all required numerical answers.2 Previous studies that used similar
material (e.g., Hell, Gigerenzer, Gauggel, Mall, & Müller, 1988; Pohl, 1996, 1998a; Pohl & Hell, 1996; Schmidt, 1993)
demonstrated that participants could not be expected to know the correct answers, but that they would still be able to
approximate adequate estimates.

Apparatus
A computer program was developed in Apple’s HyperCard and run on Apple Macintosh PowerPC-based computer systems to
present the questions and record the responses. Except for the initial greeting and assignment to a workstation, the whole experiment
was controlled by the software (instructions were also presented on the computer screen). The user interface was designed for
participants with limited computer experience. Onscreen tutorials explained the tasks and procedures.
To prevent list effects, the experimental items were presented in a new randomised order for each participant (using a constrained, and thus pseudo-random, algorithm).
In the first session, each participant was assigned a personal identification number (PIN) to enable anonymous log-in to the
experiment’s second session, subsequent to the 1-week retention interval. The PIN was unknown to the experimenter. Based
on answers in the first experimental session, individual anchor values were calculated for each participant and for each
question presented.

Design
A 1-week retention interval separated estimation from anchor presentation and recollection. The experiment was based on a
one-factor (relative anchor distance) within-subject design. Relative anchor distance was measured in units of the
participant’s subjective range of possible estimates (rp) to a given question. For each question and participant, rp thus could be
different. Relative anchor distance hence was comparable across participants. The midpoint of the range of possible estimates
(C_rp), that is, the value with equal distance to the range’s limits, was defined as the origin of measurement for relative anchor distance. A total of 11 different relative anchor distances was used for each participant.
Thus, four questions were pseudo-randomly assigned to each cell (44 questions divided by
11 anchor distances). The randomness of the question’s distribution to the independent variable’s levels was sometimes
restricted, since for some questions and some participants the resulting values would have been physically impossible. In
these cases, a different question was randomly selected until the resulting anchor value was acceptable. For the anchor distance of 0 (i.e., an anchor at the midpoint of the range), no systematic deviation of the recalled estimates is expected. The two anchors at a relative distance of ±0.5 are identical to the lower and upper limit of the subjective range of plausible values. All other anchors lie outside this range.3
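To make the construction of the participant-specific anchors concrete, the following sketch shows how such anchor values could be derived from an individual's subjective range. This is an illustration only, not the original program; in particular, the eleven relative distances listed are hypothetical placeholders, since the exact values used in the experiment are not reproduced here.

    # Illustrative sketch: deriving participant-specific anchors from the
    # subjective range of possible estimates. The relative distances below
    # are hypothetical placeholders, not the values used in the experiment.

    def make_anchor(lowest, highest, rel_distance):
        """Anchor for one item. rel_distance is expressed in units of the
        subjective range rp, with the range midpoint as the origin."""
        rp = highest - lowest                 # width of the subjective range
        midpoint = (lowest + highest) / 2.0   # origin of measurement
        return midpoint + rel_distance * rp

    REL_DISTANCES = [0.0, 0.5, -0.5, 1.5, -1.5, 3.0, -3.0, 6.0, -6.0, 12.0, -12.0]

    # Example: a participant's range for the length of the river Rhine is 800-1600 km.
    anchors = [make_anchor(800, 1600, d) for d in REL_DISTANCES]
    # d = +0.5 and d = -0.5 reproduce the range limits (1600 and 800),
    # d = 0 is the midpoint (1200); all other anchors lie outside the range.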

Dependent variables
Range of possible estimates. To compute the values of the 11 anchor distances, each participant’s subjective range of possible
estimates had to be determined for each question. To measure the range of each question, participants had to indicate the
absolutely lowest and absolutely highest possible answer. For each participant individually, the absolute
difference between these extremes denoted the subjective range of possible estimates for the current question.
Subjective plausibility of anchor value. The anchor was introduced as another participant’s estimate (cf. Pohl, 1998a).
Participants were asked to rate each anchor’s plausibility on a continuous scale using the computer mouse. Except for the
scale’s limits, no other points were distinct or labelled. The left end of the scale was named “absolutely implausible”, the
opposite end “extremely plausible”. Participants were instructed to rate an answer as “absolutely implausible” if they thought
that the answer by no means qualified as a possible solution to the question. An answer was to be rated as “extremely
plausible” if participants thought it was the correct solution to the question. The rating was transformed into percent by
relating the horizontal screen position of the rating cursor to the scale’s horizontal screen dimensions, with “0%” and “100%”
being assigned to “absolutely implausible” and “extremely plausible”, respectively.
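A minimal sketch of the described screen-position-to-percent conversion is given below; the variable names are ours, not taken from the original program.

    def rating_to_percent(cursor_x, scale_left_x, scale_right_x):
        """Map the horizontal cursor position on the rating scale to 0-100%,
        where the left end ("absolutely implausible") maps to 0% and the
        right end ("extremely plausible") maps to 100%."""
        return 100.0 * (cursor_x - scale_left_x) / (scale_right_x - scale_left_x)

    # e.g., a cursor at pixel 420 on a scale spanning pixels 100 to 600 yields 64.0
    plausibility = rating_to_percent(420, 100, 600)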

Mean shift of recollections: Hindsight bias. To measure the influence of anchor presentation on subsequent recollection of
the former estimate, several bias indices have been suggested (for a detailed discussion of the advantages and disadvantages
of most of the indices see Kohnert, 1996). In this study, Pohl’s (1992) mean shift of recollections (Δz) was employed to
quantify hindsight bias. Δz compares the distance between the estimate and the anchor value before and after anchor
presentation. Thus, the original estimate E and its recollection R are compared using the anchor value A as an arbitrary origin.
To compute Δz across differently scaled items, the data points needed to be converted to z-scores first. Extreme data points
(i.e., data points more than three standard deviations away from the mean) were eliminated from the data set before
calculating the scores. For each item separately, all estimates, anchor values, and recollections of all participants across all
experimental conditions are included into computation of the z-scores. Due to this procedure, the data points become
independent of the original scales and the previously divergently scaled items become comparable. Using the z-scored data
points, Δzi was calculated for each single pair of original and recalled estimates by subtracting (a) the absolute difference between the recollected estimate Ri and the anchor value Ai from (b) the absolute difference between the original estimate Ei and the anchor value Ai:

Δzi = |Ei − Ai| − |Ri − Ai|     (1)

If the recollected estimate is closer to the anchor value than the original estimate had been, Δzi is positive (positive shift). A negative Δzi denotes that the recollected estimate moved farther away from the anchor value than the original estimate had been (negative shift).4 In the condition with anchor distance 0 (anchor at the midpoint of the subjective range), Δz is expected not to be significantly different from zero. Because Δz uses distances to
determine the shift in recollection, it cannot assess the shift’s direction, that is, whether the recollected estimate is above or
below the anchor value.
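The computation of Δz described above can be summarised in the following sketch. It assumes, as our reading of the description suggests, that all estimates, recollections, and anchor values of one item are pooled for the z-standardisation; this pooling detail and the variable names are assumptions, not the original analysis code.

    import numpy as np

    def delta_z(estimates, recollections, anchors):
        """Mean shift of recollections for one item (cf. Pohl, 1992), sketched
        from the verbal description: z-standardise all values of the item,
        then compute |E - A| - |R - A| for each participant."""
        E, R, A = (np.asarray(x, dtype=float) for x in (estimates, recollections, anchors))
        pooled = np.concatenate([E, R, A])
        mu, sd = pooled.mean(), pooled.std()
        zE, zR, zA = (E - mu) / sd, (R - mu) / sd, (A - mu) / sd
        return np.abs(zE - zA) - np.abs(zR - zA)   # positive = shift towards the anchor

    # Three participants for one item: estimates, later recollections, and anchors
    shift = delta_z([1200, 950, 1500], [1100, 940, 1400], [1000, 900, 1300])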

2Some examples of the questions employed in the study were: “In what year was the lightning rod invented?”, “How long is the river Rhine
(in kilometres)?”, “How many bones are in the human body?”. A complete list of the questions can be obtained from the first author upon
request.
3For the sake of readability, we will refer to “relative anchor distance” simply as “anchor distance” in the remainder of this article.

Procedure
The experiment was split into two sessions that were 1 week apart. It took participants on average about 35 minutes to
complete the first, and 28 minutes to complete the second self-paced session. Participants were not informed about the overall
structure of the experimental sessions, so that they had no knowledge about the number and nature of the prospective tasks:
participants did not expect a memory test subsequent to estimation. Only one question was presented on the
computer screen at a time.
First session. The first session took the following form:
(1) Range of possible estimates. In the first part of the first session, the task was to indicate the range of possible estimates
for each question presented. Participants were instructed to enter the lowest and the highest value for a plausible solution.
Participants were explicitly told to key in what they personally would consider the lowest and highest plausible answer to the
questions, not what they thought others might believe to be adequate solutions. The responses to the current question had to
be confirmed before participants could proceed to the next question. In no task of the experiment could participants go back to any confirmed input.
(2) Estimation. The same questions as in the previous task were presented in a new random order. Participants were
instructed to find an answer to each of the 44 questions. The previously collected subjective ranges were not presented again.
Second session. The second session was as follows:
(1) Rating of anchor plausibility. Participants were informed that they would again be presented with the questions from the
first session but that this time the questions would be accompanied by another participant’s answer. They were told that their
task was to rate each answer’s plausibility. They were deceived by being told that for each question, some participant was
randomly selected to provide the estimate, whereas actually these answers had been exclusively tailored for every participant,
using the range of possible estimates each participant had entered in the experiment’s first session. The meaning of the term
“plausible” and the interpretation of the related rating scale were explained in detail. The 44 questions and feigned answers of
other participants (i.e., participant-specific anchor values) were presented in a new random order for each participant.
(2) Recollection. The questions and the accompanying answers were presented in a new random order. Participants were now instructed to reproduce their own original answers from the previous week’s session as precisely as possible. They were informed that each question would be presented together with the same other participant’s answer that they had just rated. For each question,
participants had to recall and type in their original estimate. No feedback on their accuracy was provided.
After completion of the last task, participants were debriefed and received a handout with background information detailing
the study’s purpose, a complete list of the questions and the correct solutions.

RESULTS

Analysis
For each of the 82 participants, five data points were collected for each of the 44 questions presented: (a) minimum and (b)
maximum value of the subjective range of possible estimates, (c) the estimate, (d) plausibility rating for the feedback value
(anchor) scaled to percent, and (e) the recollection of the estimate.
The data of five participants were excluded from the analysis. They had keyed in the anchor value instead of their
recollected estimate for more than 20% of the items, which indicated that these participants had obviously misinterpreted the
instructions (see Pohl, 1995). The data of another participant had to be eliminated from the data set because it was not
possible to construct the anchor with the highest relative distance for this individual, due to this person’s extremely wide ranges
of possible estimates for every item.
Cases in which the estimate matched the correct solution were excluded since for them no bias could be expected.
However, this case was observed only 19 times (0.6% of all estimates), indicating that the selected questions were indeed
rather difficult. Furthermore, data of 115 cases (3.5%) had to be eliminated from the data set because these data points had
been more than three standard deviations away from the mean and were thus considered too extreme. As a consequence,
sometimes participants had to be excluded from certain analyses due to empty cells.
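The exclusion rules described above can be illustrated with the small filter sketch below; the data layout and names are assumptions for illustration, not the authors' code.

    import numpy as np

    def outlier_mask(values):
        """Flag data points more than three standard deviations from the mean."""
        values = np.asarray(values, dtype=float)
        z = (values - values.mean()) / values.std()
        return np.abs(z) > 3

    def anchor_echo_rate(recollections, anchors):
        """Proportion of items on which the anchor itself was typed in as the
        recollection; participants above 20% were excluded."""
        return np.mean(np.asarray(recollections) == np.asarray(anchors))

    # Example: a participant who echoed the anchor on 10 of 44 items (about 23%) is excluded
    exclude = anchor_echo_rate([900] * 10 + [1100] * 34, [900] * 44) > 0.20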
For all analyses that assessed the mean shift of recollections Δz, 605 cases (18.1%) in which participants correctly recalled
their former estimate were excluded and analysed separately. Since this study focused on the relation between anchor distance,
anchor plausibility, and hindsight bias, we used the same reduced data set for the analyses of ascribed anchor
plausibility, in order to allow comparison with results obtained in the analyses of hindsight bias.

4 For each participant, the mean shift or recollections equals the mean of all items’ Δz.

Figure 1. Effects of anchor distance on ascribed anchor plausibility. The black data points indicate the subjective range of possible
estimates in this and all other graphs. The error bars represent the 95% confidence intervals. Post hoc comparisons revealed that ascribed anchor
plausibility was significantly different for every pair of adjacent anchor distances except for the two highest and for the two lowest anchor distances.

Determinants and features of anchor plausibility


Anchor distance. For all 11 relative anchor distances, the mean plausibility rating across items was calculated for each
participant. Figure 1 illustrates that the larger the anchor distance to the participants’ ranges of possible estimates, the smaller
the mean plausibility ascribed to it. Due to empty cells, five participants had to be excluded from the analysis. A repeated-
measures ANOVA confirmed a significant effect of anchor distance on ascribed anchor plausibility, F(10, 700)=74.20,
MSE=258.14, p < .01.
Symmetry. To analyse the impact of anchor direction on ascribed anchor plausibility, a two-way repeated-measures ANOVA across anchor direction (positive vs negative) and absolute anchor distance was computed. As can be seen in Figure 1, only small differences could be observed between positively and negatively directed anchors.
Confirming this impression, a significant effect of absolute anchor distance was revealed, F(4, 280)=132.99, MSE = 259.05, p
< .01, but neither a significant effect of anchor direction, F(1, 70)=2.74, MSE=170.94, p = .10, nor an interaction between
anchor direction and anchor distance, F < 1. Thus, for opposed anchors equal in relative distance, ascribed anchor plausibility
was not found to be significantly different.

Anchor function. Excluding anchors equal to the midpoint of the participants’ subjective range of possible estimates,
anchors equal in distance were grouped together. Thus leaving anchor direction out of consideration, a regression was
computed to analyse the relationship between anchor distance and anchor plausibility. As suggested by the pattern of the data
(see Figure 1), the relationship between the two variables was non-linear. Before calculating the regression, the axes were
transformed by employing the natural logarithm. The resulting regression predicted anchor plausibility by anchor distance with an accuracy that accounted for 99% of the variance, F(1, 3)=748.39, p < .01. Both intercept and slope proved to be significant (intercept: t = 228.15, p < .01; slope: p < .01). Thus, given a constant reduction of anchor distance,
ascribed anchor plausibility increased faster the closer the anchor approached the range of possible estimates.
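The log-log regression reported above corresponds to fitting a power function, roughly as in the following sketch. The distance and plausibility values shown are invented for illustration only; they are not the study's data.

    import numpy as np

    def fit_power_law(distance, plausibility):
        """Fit plausibility = a * distance**b by linear regression on
        log-transformed axes (both variables must be positive)."""
        x, y = np.log(distance), np.log(plausibility)
        b, log_a = np.polyfit(x, y, 1)        # slope and intercept on log-log axes
        r2 = 1 - np.sum((y - (log_a + b * x)) ** 2) / np.sum((y - y.mean()) ** 2)
        return np.exp(log_a), b, r2

    # Invented group means: ascribed plausibility (%) for five collapsed distances
    dist = np.array([0.5, 1.5, 3.0, 6.0, 12.0])
    plaus = np.array([55.0, 30.0, 18.0, 11.0, 7.0])
    a, b, r2 = fit_power_law(dist, plaus)      # b < 0: plausibility falls with distance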

Figure 2. Hindsight bias (Δz) for items with narrow and wide ranges of possible estimates, and anchors within and outside the range. The
error bars represent the 95% confidence intervals.

Determinants of hindsight bias


Participants correctly recalled their former estimates in 18% of the cases. The anchor value instead of the estimate was
reported in 6% of the estimate reconstructions while participants produced some other value in the remaining 76% of the
items.
Correct recall. Recollections that exactly matched the previous estimate were scored as “correct recall”, otherwise they
were scored as “incorrect recall”. For every participant, each item’s range of possible estimates was classified as either
“narrow” or “wide” by using a median split. For items with narrow ranges of possible estimates, 10% correct recalls were
observed, while 8% of the estimates were correctly recalled for wide-ranged items. A one-way repeated-measures ANOVA
across range (narrow vs wide) confirmed a significant effect of width of range on the percentage of correctly recalled
estimates, F(1, 75)=8.75, MSE=15.81, p < .01.
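The narrow/wide classification can be sketched as a simple median split over each participant's 44 range widths; the tie-handling rule in this sketch is our assumption, since the paper does not state it.

    import numpy as np

    def median_split(range_widths):
        """Label each item 'narrow' or 'wide' relative to the participant's own
        median range width; items at the median are counted as narrow here."""
        widths = np.asarray(range_widths, dtype=float)
        return np.where(widths <= np.median(widths), "narrow", "wide")

    labels = median_split([50, 120, 800, 30, 400])   # -> narrow, narrow, wide, narrow, wide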

Range of possible estimates. For anchors within and outside the range of possible estimates, the mean shift of recollections Δz was calculated for narrow and wide ranges. Figure 2 illustrates that for both anchor positions, Δz was smaller for items with a narrow range than for items with a wide range of possible estimates. Notably, for narrow-ranged items, anchors outside the range were not found to yield hindsight bias. Overall, Δz was higher for anchors within the range of possible estimates than for anchors outside the range. A two-way
repeated-measures ANOVA confirmed this impression. The analysis detected a significant effect of anchor position, F(1, 74)
=47.55, MSE=0.08, p < .01, and a significant effect of width of range, F(1, 74) = 23.89, MSE=0.07, p < .01, on Δz, but no
significant interaction of the two factors, F < 1.
Anchor plausibility. In order to compute stable and reliable measures of subjective anchor plausibility, we decided to
categorise anchor plausibility into only two categories. Employing the median-split procedure, ascribed anchor plausibility
was categorised as either “plausible” or “implausible” for each participant. Hindsight bias was observed for both categories of
subjective plausibility, revealed by one-sample t-tests against a hypothesised mean of zero: tplaus(75)=12.71, p < .01; timplaus(75)
=7.89, p < .01. A one-way repeated-measures ANOVA conducted on Δz detected a significant effect of anchor plausibility, F
(1, 75)=18.72, MSE=0.03, p < .01. For implausible anchors, Δz was significantly smaller than for plausible anchors.5

Anchor distance. The mean shift of recollections Δz was calculated across the 11 anchor distances. It was found that Δz increased with increasing anchor distance, except for the highest anchor distances. Although Δz was comparatively higher for
negatively than for positively directed anchors, the same pattern of data was observed for both anchor directions (Figure 3).
Excluding five participants due to empty cells, a one-way repeated measures ANOVA across all 11 anchor distances was
computed. The analysis detected a significant effect of relative anchor distance on Δz, F(10, 70)=6.68, MSE=0.19, p < .01. The effect of anchor direction on Δz was examined more closely. For this analysis, five participants had to be excluded due to one or more empty cells. Because it contained no information about direction, the anchor at distance 0 (the midpoint anchor) was not included in the analysis. A two-
way repeated-measures ANOVA across anchor direction (negative vs positive) and anchor distance
was conducted. Significant effects of anchor distance, F(4, 280)=7.95, MSE=0.20, p < .01, and anchor direction, F(1, 70)=10.64, MSE=0.27, p < .01, on Δz were detected. The interaction was not significant, F < 1. For negative anchors, Δz was significantly higher than for positive anchors.

Figure 3. Hindsight bias Δz predicted by collapsed relative anchor distance for two anchor directions. Separate quadratic regression functions were fitted for positive and for negative anchors.
Instead of the linear relation assumed before the experiment, the pattern of data suggested an inverted-U-shaped relationship
between anchor distance and Δz. Accordingly, quadratic regressions proved to best fit the data. Accounting for 98% of the
variance, F(2, 3) = 106.16, p < .01, Δz was predicted by collapsed anchor distance using a quadratic equation. Both
regression coefficients proved to be significant (both p < .01). In a comparable cubic model, the cubic term did not reach the level of significance, t=3.22, p=.08. The quadratic model therefore produced the better fit of the data.
Additionally, for each anchor direction, a separate regression was calculated (Figure 3). For both directions, quadratic
regressions best predicted Δz, accounting for 94% of the variance among negative, F(2, 3) = 37.33, p < .01, and for 92% of the
variance among positive anchors, F(2, 3) = 31.52, p < .01. In both regressions, the coefficients proved to be significant (both p = .01), which was not the case for comparable cubic models (tnegAnc = 1.28, p = .33; tposAnc = 0.30, p = .80). As in the previous analysis, the quadratic models produced a better fit than the cubic models.
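The model comparison described above amounts to comparing polynomial fits of degree two and three, as in the sketch below; the Δz values per collapsed distance are invented for illustration, not the observed means.

    import numpy as np

    def poly_fit_r2(x, y, degree):
        """Least-squares polynomial fit of the given degree and its R^2."""
        coeffs = np.polyfit(x, y, degree)
        resid = y - np.polyval(coeffs, x)
        return coeffs, 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

    dist = np.array([0.0, 0.5, 1.5, 3.0, 6.0, 12.0])       # collapsed distances (invented)
    dz   = np.array([0.01, 0.10, 0.22, 0.30, 0.28, 0.12])  # mean shift per distance (invented)

    quad,  r2_quad  = poly_fit_r2(dist, dz, 2)   # inverted U: negative leading coefficient
    cubic, r2_cubic = poly_fit_r2(dist, dz, 3)   # the extra cubic term adds little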
Interaction of anchor plausibility and anchor distance. For each participant, ascribed anchor plausibility was categorised into
plausible and implausible anchors using a median-split. Corresponding anchors different in direction but equal in distance
were grouped together. A two-way repeated-measures ANOVA across ascribed anchor plausibility (plausible vs implausible)
and grouped anchor distance was conducted, revealing main effects of
plausibility, F(1, 15) = 11.40, MSE = 0.46, p < .01, and distance F(5, 75)=8.98, MSE = 0.35, p < .01, on Δz. The interaction
between the two factors was not significant, F < 1. Across the grouped anchor distances, Δz was smaller for implausible than
for plausible anchors.
For both implausible and plausible anchors, the curve characteristics of Δz followed a non-linear function (Figure 4). A quadratic
regression for plausible and implausible anchors predicted Δz by collapsed anchor distance at high precision. A better fit was
accomplished for plausible, F(2, 3) = 105.01, p = .0017, than for implausible anchors, F(2, 3)=15.86, p=.0254.

5 Additionally, the same ANOVA was computed using categorisations of anchor plausibility that were based on different separation criteria.
For example, using identically defined categories for all participants, anchor plausibility was split into differently “grained” categorisations
that contained either three, four, five, or ten different ranks of anchor plausibility. A significant effect of anchor plausibility on Δz was detected
for every categorisation (although with increasing number of ranks, F-values kept decreasing). Yet, for every categorisation, post hoc
comparisons always revealed only a significant difference for Δz between two ranks of anchor plausibility. A linear relation between anchor
plausibility and Δz was not detected.

Figure 4. Hindsight bias Δz predicted by collapsed anchor distance and split by plausibility. Separate quadratic regression functions were fitted for plausible and for implausible anchors.

This was probably due to the smaller range of Δz scores for implausible anchors. In both models, the quadratic term was significant (plausible: p < .01; implausible: p = .05), which was not the case for the cubic terms in corresponding cubic models (tplaus = 1.27; timplaus = 0.25, p = .83).

DISCUSSION
The findings demonstrate (a) that anchor plausibility decreases with increasing anchor distance as described by a power-
function, (b) that hindsight bias and anchor distance are curvilinearly related, in that hindsight bias initially increases with
increasing anchor distance, but later decreases for most extreme anchors, and (c) that anchor plausibility moderates the latter
effect, such that implausible anchors reduce the size of hindsight bias without altering the nature of the quadratic relation.
These results challenge the automaticity of hindsight bias as they suggest the involvement of cognitive processes that evaluate
the anchor before it is applied to reconstruct the former estimate.

Determinants of anchor plausibility


Anchor plausibility was predicted by anchor distance to the power of a negative exponent: The farther away the anchor value
was from the subjective range of possible estimates, the smaller the plausibility ascribed to it. This relation was found to be
symmetrical—anchors with equal relative distance but opposite direction were, on average, ascribed the same
plausibility.
These findings are not very surprising and are in line with assumptions and reflections about the nature of anchor plausibility
(e.g., Pohl, 1998a; Strack & Mussweiler, 1997). Moreover, they suggest that the introspective assessment of one’s own
knowledge possesses at least some degree of validity.
A power-regression accounted for nearly all of the variance: anchor plausibility asymptotically decreased with increasing
anchor distance from the subjective range of possible estimates. The acceleration of the curve was highest at the points where
anchors left the range. This corresponds to intuition—it can be expected that, the farther away anchors are from the
knowledge available, the less distinct the degree of implausibility conveyed by two different anchors. Consequently, as soon
as the anchors leave the range of subjectively possible estimates, a pronounced decrease in ascribed plausibility should occur,
which indeed was revealed in this study. One might speculate that this relation between anchor distance and ascribed anchor
plausibility is due to its trivial nature. However, it should be remembered that the range of possible estimates was measured in
the first experimental session. After a retention interval of 1 week, participants still seemed able to reaccess, at high precision, the knowledge they had used in determining their subjective range of possible estimates: Only anchors within the range of
possible estimates were ascribed an average plausibility above 50%. It is remarkable that even for the “fuzzy” type of
knowledge accessed by almanac-like questions, a more or less stable set of knowledge representations can be assumed. On the
other hand, it is surprising that even anchors of extreme extent were still not considered absolutely implausible, while anchors
at the midpoint of the subjective range were not considered absolutely plausible. These findings demonstrate that although the
subjective ranges were not completely arbitrary, they nevertheless exhibit a great deal of uncertainty or fuzziness.
Despite its intuitive plausibility, the non-linear relation between anchor distance and anchor plausibility needs to be explained
in more detail. A special mechanism can be assumed that exclusively processes the knowledge associated with the questions
presented (e.g., Pohl, 1998a; Strack & Mussweiler, 1997). By employing representations that denote the limits of the range of
possible estimates, this mechanism determines anchor plausibility (e.g., the representations associated with a question may
contain general information about the category in question such as “Humans can live up to about 120 years”). Furthermore, this
process can be assumed to operate to a great extent automatically. The central problem to solve is how the process maps
anchor distance to anchor plausibility in accordance with a power-function.
A methodological artifact might be responsible for these findings. The true relation between the anchor’s distance and its
plausibility might have been a linear decrease in plausibility for a certain range of anchor values until plausibility reaches its
minimum. Then anchor plausibility stays at this constant minimum for all anchor values beyond that point. If the slopes of the
first parts of these compound functions were differently steep across items and participants, then aggregating across the whole range of anchor values would yield a decreasing power-like function. Unfortunately, this argument cannot be tested using the available data because every participant
received only one anchor with each item, so that no individual item-specific plausibility could be computed.
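A toy simulation can illustrate this artifact argument; all numbers below are arbitrary assumptions. Each item's plausibility is assumed to fall linearly with anchor distance until it reaches a floor of zero, but with item-specific slopes; averaging such piecewise-linear curves already produces a smoothly decelerating, power-like aggregate curve.

    import numpy as np

    # Toy illustration: per-item plausibility declines linearly with anchor
    # distance until it reaches a floor of zero, with item-specific slopes;
    # the average of such curves is convex and power-like.
    rng = np.random.default_rng(1)
    distances = np.linspace(0.1, 12.0, 100)
    slopes = rng.uniform(5.0, 60.0, size=500)              # arbitrary spread of slopes

    curves = np.clip(100.0 - np.outer(slopes, distances), 0.0, None)
    aggregate = curves.mean(axis=0)                        # decelerating, power-like decline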

Determinants of hindsight bias


Hindsight bias was found to be dependent on (1) the width of the range of possible estimates, (2) anchor plausibility, and (3)
anchor distance.
Width of the range of possible estimates. The number of correct recalls was larger for narrow-ranged than for wide-ranged
items. Assuming that the width of the subjective range is negatively correlated to the participant’s expertise, this result fits
nicely with data reported in the literature (e.g., Christensen-Szalanski & Willham, 1991; Pohl, 1992). Additionally, the
analyses revealed (a) that in line with the results reported by Pohl (1992), hindsight bias was smaller for narrow- than for
wide-ranged items, (b) that independent of width of range, hindsight bias was larger for anchors within the range of possible
estimates than for anchors outside the range, and (c) that hindsight bias was not elicited by anchors outside the range of
narrow-ranged items.
(1) Width of range and correct recalls. Traditionally, the higher number of correct recalls for narrow-ranged items would
best be explained with a memory-trace account, in that the original estimate, due to a stronger memory trace (e.g., Hell et al.,
1988; Stahlberg & Maass, 1998), was easier to recall for narrow- than for wide-ranged items. Most likely, the stronger
memory trace would be attributed to a deeper encoding of the estimate. However, the question remains: Why was the estimate
better encoded for narrow- than for wide-ranged items? It seems valid to assume that width of range is negatively correlated to
knowledge precision. Since a more expert-like knowledge can be expected to go along with a higher interest in the domain
questioned (e.g., Christensen-Szalanski & Willham, 1991; Pohl, 1992; Synodinos, 1986), the estimate is likely to be subject to
a more elaborate encoding, which would yield a higher probability of being recalled at recollection.
(2) Width of range and hindsight bias. In general, hindsight bias was smaller for narrow-ranged items than for wide-ranged
items. This result may follow from the fact that the width of range was confounded with the absolute values of possible
estimates. Again, this finding could also indicate that increasing knowledge precision reduces the likelihood of employing an
anchor to reconstruct the original estimate, since the width of range might denote the level of expertise for the given question.
Thus, the anchor could have been encoded to a lesser extent. This would then lead to the observed reduction of hindsight bias
for narrow-ranged items.
However, one problem remains: Why was hindsight bias eliminated for anchors outside the range of narrow-ranged items?
This is a noteworthy finding because it represents one of those rare cases in which hindsight bias failed to emerge and recollection of the original estimates remained unimpaired (cf. Erdfelder & Buchner, 1998; Hasher et al., 1981; Pohl, 1998a, 1998b). Given the
assumption that narrow ranges correlate with expertise, this result could indicate that participants with a relatively “precise”
domain knowledge have the tendency to reject anchors that fall outside their range of acceptable answers. As a result, these
anchors are not employed to reconstruct former estimates, so that the reconstructions will be closer to the original responses.
Anchor plausibility. Independent of relative anchor distance, hindsight bias was found to be larger for subjectively plausible
than for subjectively implausible anchors. Since in this study the anchors were labelled as another participant’s estimate, this
finding can be considered a replication of the results reported by Pohl (1998a). However, other authors have reported the same
distorting effects for plausible and implausible anchors (Chapman & Johnson, 1994, Exp. 1; Quattrone et al., 1981; Strack &
Mussweiler, 1997, Exp. 3). The empirical discrepancy between these findings and the results of our study most likely stems from differences in how plausibility was defined. The cited authors assessed the anchors’ plausibility indirectly either by
using the distribution of unbiased estimates or by employing the plausibility ratings of independent calibration groups, thus
applying the mean of these measures as the standard of comparison to all participants in the experimental condition. As
suggested by our discussion of the (numerical) location and the width of each individual’s knowledge base, these ranges
appear to vary interindividually to a great extent. Consequently, the same standard of comparison will yield largely divergent
plausibility ratings across participants. In our study, this methodological shortcoming was avoided by assessing the individual
ranges of acceptable estimates and by asking the experimental participants to rate the presented anchors’ plausibility themselves.
In conclusion, the present results support the assumption of an evaluative process that may moderate the anchor’s influence
on recollection of the original estimate. Hence, the notion of hindsight bias as a purely automatic process is seriously challenged
—neither immediate assimilation (Fischhoff, 1977), nor anchoring and adjustment (Jacowitz & Kahneman, 1995; Tversky &
Kahneman, 1974) can account for this result. Additionally, accounts that focus on explanations based on the relative strengths
of memory traces (e.g., Hell et al., 1988) are also incapable of explaining why and how anchor plausibility had an impact on
the size of hindsight bias.
Such an evaluative process may systematically tag domain knowledge that is similar to the discredited anchor in order to
prevent the use of knowledge that is associated with implausible information. In other words, presentation of implausible
anchors leads to inhibition of anchor-related information in subsequent processes that employ this knowledge to generate a
response, thus diminishing hindsight bias. In its most extreme case, this “deactivating” process might also account for the
contrast effect that was occasionally observed (see e.g., Kohnert, 1996; Ofir & Mazursky, 1997). Discarding the “discredited”
part of the available domain knowledge and then employing its altered remains to reconstruct a former estimate is likely to
yield a value that is farther away from the anchor than the original estimate had actually been. In addition, it might be
speculated whether such a discrediting procedure initiates a process that re-associates the question to alternative knowledge
stored in LTM, which is distinctly different from the anchor information. Either way, the result of such a process would be the
opposite of hindsight bias, namely a contrast effect.
Anchor distance. The relation between relative anchor distance and hindsight bias was found to follow an inverted-U-
shaped function. A quadratic regression accounted for more than 95% of the variance. Notably, splitting up anchor
distance by (a) anchor direction, or (b) ascribed anchor plausibility revealed the same quadratic relationship in all cases. First
of all, these results show that the anchor-distance function is not linear as some authors suggested (Northcraft & Neale, 1987;
Pohl, 1998b; Russo & Shoemaker, 1989). The function rather levels off and even decreases for extremely distant anchors.
This fits with other results that questioned the previously assumed linear relationship (Chapman & Johnson, 1994;
Kahneman, 1992; Pohl, 1998a; Quattrone et al., 1981; Strack & Mussweiler, 1997). Intuitively, these findings appear to be
quite plausible: At a certain distance to the available domain knowledge, the anchor becomes increasingly incompatible, so
that it cannot be employed to reconstruct the original estimate. Despite its intuitive plausibility, this finding can hardly be
explained by most of the accounts of hindsight bias. In most models, a simple linear relation between anchor distance and
hindsight bias would have been expected. However, a quadratic curve and the finding that, for identical anchor distances,
hindsight bias was larger for plausible than for implausible anchors indicate that other, evaluative processes are very likely to
be involved in the genesis of hindsight bias.
The finding that hindsight bias was larger for negatively directed (“low”) anchors than for positively directed (“high”)
anchors can hardly be explained in a satisfying manner. Others who analysed the direction effect for low and high anchors
reported all possible results. In a summary of 10 anchoring studies, and defining direction effects as having a larger impact of
at least 10% for low or high anchors, we found a larger bias for low anchors (as in this study) in five studies, no difference (as
would be expected) in two studies, and a larger bias for high anchors (opposite to this study’s findings) in three studies. It can
be argued6 that for skewed distributions, different effects should be expected for different anchor directions: If the possible
answers have a strict lower or upper limit (such as for questions about distances, lengths, life expectancy, or historical dates),
anchors directed towards the limit will become implausible and unacceptable sooner than anchors with the same amount of
deviation in the opposite direction. The distributions of the estimates for our questions, however, were not all skewed in the same
direction—about half of the distributions were positively skewed, while the other half were negatively skewed. Although this
does not provide sufficient evidence to rule out the possibility that differentially skewed distributions are responsible for the
significant effect of anchor direction on hindsight bias, it indicates that further research is needed to discover this effect’s
underlying mechanisms.

6 We thank Wolfgang Hell and Ulrich Hoffrage for this suggestion.



Explanatory power of formal models of hindsight bias


The cognitive process model SARA (Selective Activation, Reconstructive Anchoring; Pohl, Eisenhauer, & Hardt, 2003-this
issue; Pohl, Hardt, & Eisenhauer, 2000) can explain most of the results. In short, in SARA it is assumed that estimates are
generated by a probabilistic process that samples knowledge representations from long-term memory and integrates these
recalled representations to form the estimate. Presenting the anchor automatically leads to its encoding. The encoding process
strengthens the associations between representations that are similar to the anchor. As a result, these representations then have
a higher probability of being recalled in subsequent processes that use this knowledge to reconstruct the previous estimate. The
reconstruction of the former estimate is thus biased towards the anchor and similar representations, thereby producing
hindsight bias.
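A heavily simplified toy sketch, not the published SARA implementation, may help to fix the idea: knowledge representations are represented here as plain numeric values, sampling weights stand in for association strengths, and encoding an anchor strengthens representations similar to it. All numbers and functional forms are our assumptions.

    import numpy as np

    rng = np.random.default_rng(7)
    images = rng.normal(1200, 200, size=20)      # knowledge representations in LTM
    weights = np.ones_like(images)               # baseline association strengths

    def generate(images, weights, n_sample=5):
        """Sample a few representations (weighted) and integrate them by averaging."""
        p = weights / weights.sum()
        idx = rng.choice(len(images), size=n_sample, replace=False, p=p)
        return images[idx].mean()

    estimate = generate(images, weights)

    # Anchor encoding: strengthen representations similar to the anchor, so they
    # are more likely to be sampled when the former estimate is reconstructed.
    anchor = 900.0
    weights = weights * (1 + np.exp(-np.abs(images - anchor) / 100.0))
    recollection = generate(images, weights)     # biased towards the anchor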
The model SARA is not designed to determine the plausibility of a given anchor. Furthermore, the power-function
predicting anchor plausibility presents a problem since SARA expects a linear relationship between anchor distance and
plausibility. The farther away the anchor is from the individual’s knowledge, the more difficult retrieval of knowledge
representations will be. The extent to which representations can be readily retrieved may be employed in an assessment of the
anchor’s plausibility. However, nothing in this framework suggests a non-linear relation.
In its current version, SARA cannot account for the finding that plausibility moderates hindsight bias independent of the
anchor’s relative distance. An additional process seems to be necessary that, employing the anchor’s source and the available
domain knowledge, evaluates the anchor’s plausibility before encoding it. Depending on the anchor’s plausibility, a variable
weight might be attached to the anchor that determines the impact of this anchor at anchor encoding and at recollection of the
former estimate. The smaller the weight, the less the anchor’s impact.
One of the reviewers7 suggested that the observed curvilinear relationship between anchor distance and amount of
hindsight bias might result from the combined effects of two independent processes that follow non-linear functions. More
precisely, the impact of the anchor might monotonically increase with the anchor distance, but the probability of a biased
reconstruction depends on the anchor’s perceived plausibility: the smaller the anchor’s perceived plausibility, the less likely it
will be employed to reconstruct the original estimate (cf. Erdfelder & Buchner, 1998). Using a multinomial model to analyse
data of an experiment in which anchor distance was systematically varied in relation to the original estimate, Dehn and
Erdfelder (1998) could demonstrate that anchor distance alone appeared not to be responsible for the probability of a biased
reconstruction. However, they also found that in the case of a biased reconstruction, anchor distance had a significant impact
on the reconstructed estimate.
However, applying their multinomial model to our data failed because the model did not reach a fit satisfactory enough to allow safe interpretation of the estimated model parameters, G2=114, df=17, p < .0001.8 Therefore, the idea of understanding the
curvilinear relationship between anchor distance and amount of hindsight bias as a compound of two different processes
cannot be tested with the present data set. It is, however, certainly a proposition that deserves further investigation.
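For readers unfamiliar with the fit statistic reported above, the likelihood-ratio statistic G2 for observed versus model-expected category frequencies can be computed as in the sketch below; the counts in the example are invented.

    import numpy as np
    from scipy.stats import chi2

    def g_squared(observed, expected):
        """Likelihood-ratio goodness-of-fit statistic: G^2 = 2 * sum(O * ln(O / E)).
        Categories with zero observations contribute nothing to the sum."""
        O = np.asarray(observed, dtype=float)
        E = np.asarray(expected, dtype=float)
        mask = O > 0
        return 2.0 * np.sum(O[mask] * np.log(O[mask] / E[mask]))

    # Invented counts for four response categories and their model expectations
    g2 = g_squared([40, 25, 10, 5], [30, 30, 15, 5])
    p = chi2.sf(g2, df=3)   # df depends on the model; 3 for a fully specified 4-category model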

Conclusion
The results of the present study demonstrate that hindsight bias is a function of anchor distance and of subjective anchor
plausibility—hindsight bias is not a purely automatic process that solely depends on the anchor’s distance from the original
estimate. While anchor plausibility was closely related to anchor distance, plausibility nevertheless had an independent
impact on hindsight bias (cf. Pohl, 1998a). Furthermore, by presenting anchors of extraordinary extremity, it was demonstrated that even these anchors evoked hindsight bias, which attests to the “robustness” of the phenomenon (cf. Strack
& Mussweiler, 1997). But the data also showed that, compared to less distant anchors, the influence of these extreme anchors
was diminished (cf. Chapman & Johnson, 1994). As a result, the anchor-distance function of hindsight bias was found to be
inversely U-shaped. Using individually constructed anchors for each participant and item appeared to be a successful method.
We thus suggest that future studies interested in anchor plausibility should employ this method rather than determining
anchor values on the basis of, for example, an independent sample’s mean estimate. Cognitive models of hindsight bias and

7 We thank Edgar Erdfelder for this idea.


8 The reasons for this failure were most likely twofold: (1) The model compares the distribution of estimates in an experimental condition (with anchor) to those in a control condition (without an anchor). Unfortunately, the present experiment did not include a genuine control
condition. There was at most one (experimental) condition that comes close to a control condition, because the middle value of the
individual range was presented as an anchor, so that no shift of estimates would be expected. But this case is certainly different from not
presenting any anchor at all, because in the former case, the estimates will probably be distributed closer around the anchor, resulting in a
smaller deviation. This in turn influences how the distributions of estimates in the (other) experimental conditions are separated by the
model. (2) In addition, the rather extreme anchors of the present experiment led to ceiling effects in the multinomial model, because the
probability of finding estimates still beyond the anchor was virtually zero for larger anchor distances.

the anchoring effect need to incorporate meta-cognitive evaluative processes to account for the finding that anchor plausibility
moderates the anchor’s influence on subsequent recollection or estimation.

REFERENCES

Chapman, G.B., & Johnson, E.J. (1994). The limits of anchoring. Journal of Behavioral Decision Making, 7(4), 223–242.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48(1), 147–168.
Dehn, D.M., & Erdfelder, E. (1998). What kind of bias is hindsight bias? Psychological Research, 61, 135–146.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: An integrative multinomial processing tree model. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Hasher, L., Attig, M.S., & Alba, J.W. (1981). I knew it all along: Or, did I? Journal of Verbal Learning and Verbal Behavior, 20, 86–96.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107
(3), 311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An interaction of automatic and motivational factors?
Memory and Cognition, 16(6), 533–538.
Jacowitz, K.E., & Kahneman, D. (1995). Measures of anchoring in estimation tasks. Personality and Social Psychology Bulletin, 21(11),
1161–1166.
Kahneman, D. (1992). Reference points, anchors, norms, and mixed feelings. Organizational Behavior and Human Decision Processes, 51,
296–312.
Kohnert, A. (1996). Grenzen des Rückschaufehlers: Die Verzerrung von Erinnerungen an früheres Wissen durch neue Informationen
[Limits of hindsight bias: distorted memories of former knowledge caused by new information]. Bonn: Holos.
Northcraft, G.B., & Neale, M.A. (1987). Experts, amateurs, and real-estate: An anchoring-and-adjustment perspective on pricing decisions.
Organizational Behavior and Human Decision Processes, 39, 84–97.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Pohl, R.F. (1992). Der Rückschaufehler: systematische Verfälschung der Erinnerung bei Experten und Novizen [Hindsight bias: systematic
distortions in novices and experts]. Kognitionswissenschaft, 3, 38–44.
Pohl, R.F. (1995). Disenchanting hindsight bias. In J.-P.Caverni, M.Bar-Hillel, F.H.Barron, & H.Jungermann (Eds.), Contributions to decision making
(pp. 323–334). Amsterdam: Elsevier.
Pohl, R.F. (1996). Der Rückschaufehler -eine systematische Verfälschung der Erinnerung [The hindsight bias—a systematic distortion of
memory]. Report Psychologie, 21, 596–609.
Pohl, R.F. (1998a). The effects of feedback source and plausibility on hindsight bias. European Journal of Cognitive Psychology, 10(2),
191–212.
Pohl, R.F. (1998b). Durch “Anker” verzerrte Urteile und Erinnerungen [Anchoring-distorted judgments and memory]. In U.Kotkamp &
W.Krause (Eds.), Intelligente Informationsverarbeitung (pp. 57–64). Wiesbaden: Deutscher Universitäts-Verlag.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Pohl, R.F., Hardt, O., & Eisenhauer, M. (2000). SARA: Ein kognitives Prozessmodell zur Erklärung von Ankereffekt und Rückschaufehler
[SARA: A cognitive process model explaining anchoring effect and hindsight bias]. Kognitionswissenschaft, 9, 77–92.
Pohl, R.F., & Hell, W. (1996). No reduction in hindsight bias after complete information and repeated testing. Organizational Behavior and
Human Decision Processes, 67(1), 49–58.
Quattrone, G.A., Lawrence, C.P., Finkel, S.E., & Andrus, D.C. (1981). Explorations in anchoring: The effects of prior range, anchor
extremity, and suggestive hints. Unpublished manuscript, Stanford University, CA (as cited in Chapman & Johnson, 1994).
Russo, J., & Shoemaker, P.J.H. (1989). Decision traps. New York: Simon & Schuster.
Schmidt, C. (1993). Verzerrte Vorstellung von Vergangenem: Vorsatz oder Versehen? [Biased representations of the past: Intention or
mistake?]. Bonn: Holos.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
review of social psychology (Vol. 8, pp. 105–132). Chichester, UK: Wiley.
Strack, F., & Mussweiler, T. (1997). Explaining the enigmatic anchoring effect: Mechanisms of selective accessibility. Journal of
Personality and Social Psychology, 73(3), 437–446.
Synodinos, N.E. (1986). Hindsight distortion: “I-knew-it-all-along and I was sure about it”. Journal of Applied Social Psychology, 16,
107–117.
Toglia, M.P., Ross, D.F., Ceci, S.J., & Hembrooke, H. (1992). The suggestibility of children’s memory: A cognitive and social-
psychological interpretation. In M.L. Howe, C.J.Brainerd, & V.F.Reyna (Eds.), The development of long-term retention
(pp. 217– 241). New York: Springer.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Strength of hindsight bias as a consequence of meta-cognitions
Stefan Schwarz and Dagmar Stahlberg
University of Mannheim, Germany

Hindsight bias is the tendency of people to falsely believe that they would have correctly predicted the outcome of
an event once it is known. The present paper addresses the ongoing debate as to whether the hindsight bias is due
to memory impairment or biased reconstruction. The memory impairment approach maintains that outcome
information alters the memory trace of the initial judgement, whereas the biased reconstruction approach assumes
that people who have forgotten their initial judgements are forced to guess and, in the presence of outcome
information, are likely to use this information as an anchor. Whereas the latter approach emphasises the role of
meta-cognitive considerations, meta-cognitions are not included in the memory impairment explanation. Two
experiments show that the biased reconstruction approach provides a better explanation for empirical findings in
hindsight bias research than does the memory impairment explanation.

When people predict the outcome of an event and are later—after the actual outcome of the event is known—asked to recall
their prediction, they are usually unable to ignore their knowledge of the outcome and remember having made a better
prediction than they actually did. This tendency to adjust one’s own memory to the actual outcome of an event is known as
the hindsight bias or the knew-it-all-along-effect.
It was Fischhoff and his collaborators (e.g., Fischhoff, 1975, 1977; Fischhoff & Beyth, 1975) who first investigated this
interesting phenomenon in the mid-1970s. In a classic experiment (Fischhoff, 1975), participants read a brief report about the
nineteenth-century war between the British and the Gurkhas of Nepal. Subsequently, they were asked to judge the probability
of each of four possible outcomes of the war. Participants in the experimental group were informed about the alleged actual
outcome prior to their judgement. They were then asked to estimate the probabilities “as if they had not known the actual
outcome”. Participants in the control group did not receive any outcome information. The results showed that participants in
the experimental group strongly overestimated the probability of the outcome they believed to be the actual one. It seems that
participants tend to assign higher postdictive probabilities to events reported to have happened. Since then, a large number of
studies on the hindsight bias have been reported (for reviews see Hawkins & Hastie, 1990; or Stahlberg & Maass, 1998). In
all, they have shown that hindsight bias is a very robust phenomenon that is hard to suppress. Even if participants were
carefully informed about the effect or asked to try not to fall prey to this bias, they were unable to ignore the outcome information
(Fischhoff, 1975, 1977). In a meta-analysis of 128 hindsight bias studies, Christensen-Szalanski and Willham (1991) found only six studies without a significant effect.

THEORETICAL EXPLANATIONS OF THE HINDSIGHT BIAS


Although there is a rich literature on hindsight distortions, the underlying mechanisms are not yet fully understood. What
follows is a review of the three main approaches to explaining the hindsight bias: the motivational approach, the memory
impairment approach, and the biased reconstruction approach. Since the aim of this paper is to test different predictions

Requests for reprints should be sent to Prof Dr Dagmar Stahlberg, Lehrstuhl Sozialpsychologie, University of Mannheim,
68131 Mannheim, Germany. Email: dstahlberg@sowi.uni-mannheim.de. Financial support from the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 504, at the University of Mannheim, is gratefully acknowledged. We are
grateful to Ulrich Hoffrage, Jochen Musch, Rüdiger Pohl, Lioba Werth, and one anonymous reviewer for their helpful and
constructive comments on earlier drafts of this article.

derived from the latter two approaches, they are described in greater detail below. The motivational explanations are
mentioned here only for the sake of completeness.

Motivational explanations
It is a well-known fact in psychological research that human needs and motives influence judgement and decision processes.
Consequently, it was assumed that distortions like the hindsight bias could be driven by motivation. Researchers have
discussed the need for control (Campbell & Tesser, 1983), the need for cognition (Verplanken & Pieters, 1988), self-esteem
protection or enhancement (Mark & Mellor, 1991; Stahlberg & Schwarz, 1999), and, last but not least, self-presentational
concerns (Campbell & Tesser, 1983; Pohl, Stahlberg, & Frey, 1999; Stahlberg, Eller, Maass, & Frey, 1995). Christensen-
Szalanski and Willham (1991) concluded on the basis of their meta-analysis: “The cause of the hindsight bias is more likely
cognitively based, as originally suggested by Fischhoff (1975, 1977), than motivationally based” (pp. 163–164). Recently,
however, some researchers have reopened the debate over motivational influences on the hindsight bias (e.g., Renner, 2003-this
issue). A reduction of hindsight bias was found when participants received outcome information that reflected unfavourably
on them or the group to which they belonged (Louie, 1999; Louie, Curren, & Harich, 2000; Schwarz, 2002; Stahlberg &
Schwarz, 1999; Stahlberg, Sczesny, & Schwarz, 1999).

Memory impairment
All explanations in this category assume that outcome information impairs memory of previous judgements by either altering
or erasing existing memory traces, or rendering them less accessible. In the very first explanation of the hindsight bias,
Fischhoff (1975, p. 297) stated that the participant’s memory of the original prediction is altered by the subsequent outcome
information:

Assume that upon receipt of outcome knowledge judges immediately assimilate it with what they already know about
the event in question. In other words, the retrospective judge attempts to make sense, or a coherent whole, out of all that
he knows about the event.

According to Fischhoff’s immediate assimilation hypothesis, the outcome information is automatically integrated into the
existing knowledge structure, and this results in an inevitable and permanent modification of a person’s prior representation
of the event. Note that in a totally different context, namely the context of eyewitness testimony research, Loftus (1975;
Loftus & Hoffman, 1989; Loftus & Loftus, 1980) proposed a similar explanation for the fact that witnesses receiving
misleading postevent information about a previously observed event show poorer memory. Loftus stated that the original
memory trace is automatically updated when subsequent misleading information is encountered (“destructive actualisation
hypothesis”). Other researchers (e.g., McCloskey & Zaragoza, 1985a, 1985b) have challenged the assumption that subsequent
information affects memory of the original information. This will be discussed in greater detail later.
Other variants of the memory impairment approach have located the origin of the hindsight bias in the retrieval stage. Hell,
Gigerenzer, Gauggel, Mall, and Müller (1988), for example, proposed two separate memory traces: one for the original
prediction, and one for the outcome information. According to their dual memory traces model, the hindsight bias should
depend on how accessible the two memory traces are. Accessibility is mainly determined by two different features: the depth
and recency of information encoding. The authors claim that the hindsight bias is due to the fact that the outcome information
is—by definition—encoded more recently than the original prediction in hindsight situations. Thus, the outcome information
should be more accessible than the original prediction and should therefore play a more important role when a judgement is made
in hindsight. On the other hand, the deeper the memory trace for the original prediction, the smaller the distorting influence of the
outcome information should be. Consequently, the hindsight bias should be strong if the outcome information is encoded
more deeply than the original prediction. In a series of experiments, the authors were able to show that the depth of encoding
for the original prediction (or the outcome information) and the time of presentation for the outcome information influenced
the strength of the hindsight bias.
According to the selective retrieval hypothesis (Morton, Hammersley, & Bekerian, 1985; Slovic & Fischhoff, 1977),
known outcomes serve as retrieval cues for relevant case material. Once an outcome has been learned, information congruent
with this outcome will become highly accessible, whereas incongruent information cannot be retrieved with the same ease.
Thus, hindsight bias is due to a selective retrieval of information that is in line with the actual outcome.

Biased reconstruction
Several authors have stressed the role of reconstructional processes as a major source of hindsight bias (e.g., Erdfelder &
Buchner, 1998; Hoffrage, Hertwig, & Gigerenzer, 2000; Pohl, Eisenhauer, & Hardt, 2003-this issue; Stahlberg et al., 1995;
Stahlberg & Maass, 1998; Werth, Strack, & Förster, 2002; Winman, Juslin, & Björkman, 1998). These authors share the idea
that people use the outcome information as a basis for reconstructing an original judgement when asked to remember it. What
follows is a detailed explanation of our own biased reconstruction model (Stahlberg, 1994; Stahlberg & Eller, 1993; Stahlberg
et al., 1995; Stahlberg & Maass, 1998).
The biased reconstruction approach is loosely based on McCloskey and Zaragoza’s response bias hypothesis (McCloskey &
Zaragoza, 1985a, 1985b; Zaragoza & McCloskey, 1989), which was originally developed within the field of eyewitness
testimony research. Contrary to the above-mentioned “destructive actualisation hypothesis” (Loftus, 1975; Loftus &
Hoffman, 1989; Loftus & Loftus, 1980), the response bias hypothesis does not assume that subsequent (e.g., misleading)
outcome information impairs memory of the original judgement. Rather, it assumes that the subsequent outcome information
will be used as a basis for reconstructing the original judgement by those people who have forgotten it. In the context of
eyewitness testimony research, the response bias view has been quite successful (Belli, Windschitl, McCarthy, & Winfrey,
1992).
Since both lines of research—eyewitness testimony after misleading information and hindsight bias—pursue the question
of whether information stored in memory might be less accessible after being confronted with inconsistent new information,
the response bias hypothesis was adapted to hindsight bias research. The biased reconstruction approach assumes that people
who are asked to remember their original prediction after being informed about the actual outcome of an event can either
remember it or not. Those who do remember their original prediction are likely to reproduce it. Those who have forgotten it
are forced to reconstruct their prediction or guess, and, in the presence of outcome information, are likely to utilise this
information as an anchor. Only those who have forgotten their prediction produce the hindsight bias. The magnitude and
direction of the hindsight bias depends on people’s subjective assumption about their predictive ability. Since people are
generally overly optimistic about their abilities (Greenwald, 1980), in the majority of cases they will locate their presumed
prior estimate closer to the real outcome (anchor) than it originally was, producing a hindsight bias. But, if they have reason to
believe that the outcome was unpredictable (e.g., if they doubt their predictive abilities in a certain field of expertise), the
hindsight bias might be reduced, non-existent, or even reversed (Mazursky & Ofir, 1990; Ofir & Mazursky, 1997; Stahlberg &
Schwarz, 1999).

BIASED RECONSTRUCTION VERSUS MEMORY IMPAIRMENT


The intention of this paper is to show that the biased reconstruction approach offers a valuable alternative to Fischhoff’s
(1975) immediate assimilation explanation of hindsight distortions, in much the same way that McCloskey and Zaragoza’s
(1985a, 1985b) response bias view offers an alternative to Loftus’s (1975, 1979) destructive actualisation hypothesis in the
context of misleading postevent information research. Although in actual judgement situations memory impairment and
biased reconstruction processes may often be active at the same time and thus may both contribute to hindsight biases (see
Hawkins & Hastie, 1990; Pohl & Gawlik, 1995; Stahlberg & Maass, 1998), a clear distinction between the two theoretical
explanations will be made for the sake of conceptual clarity and the formulation of unambiguous empirical predictions. The
biased reconstruction approach is regarded as a judgement process that operates only at the response-generation stage, whereas
the memory impairment approach is a largely automatic memory process that intervenes between initial prediction and response
generation. That being the case, the following differences between both approaches can be predicted.
One difference concerns the number of correctly recalled predictions (hits). The adaptation of Loftus’s (1975) destructive
actualisation hypothesis to the memory impairment approach of hindsight bias research suggests that receiving outcome
information will impede people’s ability to recall their original prediction. This should result in a lower number of hits in the
outcome information condition (experimental group) than in the no-outcome information condition (control group). In
contrast, the biased reconstruction view assumes that the outcome information does not affect people’s ability to correctly
recall their initial prediction. Accordingly, this should result in a comparable number of hits in the experimental and no-
outcome control condition. Another difference concerns the role of meta-cognitive considerations. A meta-cognition is a
cognition about one’s own cognitive competence or capacity. Meta-cognitive processes are central in biased reconstruction but
irrelevant in memory impairment, where memory traces are believed to be updated automatically. According to the biased
reconstruction approach, hindsight distortions will only occur when people forget their original prediction and—while
reconstructing it—have reason to believe that their initial estimate must have been close to what is now the known outcome.
As mentioned above, in most cases people are supposed to make self-serving assumptions about their predictive abilities.
Thus, they locate their remembered previous prediction closer to the actual outcome information. However, whenever meta-
cognitive beliefs suggest that the original prediction must have been distant from the true outcome, hindsight distortions are
predicted to disappear or even reverse.

BASIC RESEARCH PARADIGMS IN HINDSIGHT BIAS RESEARCH


Hindsight bias has been investigated in two distinct research paradigms: the memory design and the hypothetical design. In
the memory design, participants make predictions about a certain event. After a certain time interval, experimental group
participants receive information about the real or alleged outcome. Control group participants do not receive such outcome
information. Both groups are then asked to recall their original predictions as accurately as possible. The hindsight bias occurs
when the recalled predictions relative to the original predictions show a stronger tendency to move in the direction of the real
or alleged outcome in the experimental group than in the control group. In a modified within-subject design, the experimenter
may include a series of judgemental items and provide outcome information for only some of them (the experimental items),
while the remaining items serve as control items.
In the hypothetical design, some participants judge the probability of an outcome without receiving any outcome
information (control group), whereas others first learn about the actual or alleged outcome and are then instructed to indicate
their probability judgements “as if they had not known the outcome” (experimental group). The hindsight bias emerges if the
probability judgement of the experimental group is closer to the real or alleged outcome than the probability judgement of the
control group.
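For illustration, the memory-design criterion can be expressed as a per-item shift score (a formalisation offered here purely as an illustration; the individual studies operationalise the bias in somewhat different ways). With P the original prediction, R the recalled prediction, and O the real or alleged outcome,

\[ s = (R - P)\cdot \operatorname{sign}(O - P) , \]

so that positive values of s indicate movement towards the outcome. Hindsight bias in the memory design then corresponds to a larger mean s in the experimental group than in the control group.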
Hindsight distortions have generally been reported to emerge more forcefully in the hypothetical than in the memory design
(Campbell & Tesser, 1983; Davies, 1992; Fischhoff, 1977; Wood, 1978). However, only two studies exist that directly compare
the hindsight bias in a memory design and a hypothetical design (Campbell & Tesser, 1983; Musch, 2003-this issue). In
Campbell and Tesser’s (1983) study, participants were first asked to answer 40 almanac questions (e.g., “Florida has the
nation’s lowest altitude”) and to estimate the probability that their answers were correct on a scale ranging from 0 (“definitely true”) to 21 (“definitely false”). After a 30-minute break, participants learned the correct answer to each item and were
instructed to recall their initial answer and the degree of confidence they had assigned to it. For this memory design,
Campbell and Tesser calculated a within-participant hindsight bias index by summing the changes from the foresight
responses to the hindsight responses. This index indicated a reliable hindsight bias in participants’ memories of their initial
answers. Next, the participants responded to 40 new almanac questions, this time with the answers displayed next to each item.
The participants were instructed to estimate the probabilities they believed they would have assigned to the statement if they
had not known the correct answer. These responses were used to construct a second between-participants index by summing
the differences between each participant’s response and the average response by other participants to the same item presented
without answers (i.e., based on the responses of participants in the first phase of the experimental session). The analysis of
this score also showed a clear hindsight bias for the hypothetical design: Participants who were given the correct answers
were more confident that they would have answered the questions correctly than participants predicting the correctness of
their responses without the answers. This procedure allowed the researchers to compare the hindsight bias for each participant
in both the memory design and the hypothetical design. Campbell and Tesser found that the participants exhibited a stronger
hindsight bias with hypothetical instructions than with memory instructions.
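The two indices can be illustrated with a minimal sketch (the variable names, data layout, and toy data below are hypothetical and do not reproduce Campbell and Tesser's exact scoring):

import numpy as np

rng = np.random.default_rng(0)                                 # toy data only
n_subj, n_items = 5, 40
pre  = rng.integers(0, 22, (n_subj, n_items)).astype(float)    # foresight confidence ratings (0-21 scale)
post = rng.integers(0, 22, (n_subj, n_items)).astype(float)    # recalled ratings after the answers were shown
sign = rng.choice([-1.0, 1.0], n_items)                        # +1 where a higher rating means "closer to the correct answer"

# Within-participant memory-design index: summed signed change towards the correct answer.
memory_index = ((post - pre) * sign).sum(axis=1)

# Between-participants hypothetical-design index: each "as if you had not known" rating is compared
# with the mean rating given to the same (new) item by participants who answered without the answers shown.
hypo      = rng.integers(0, 22, (n_subj, n_items)).astype(float)
no_answer = rng.integers(0, 22, (n_subj, n_items)).astype(float)
hypothetical_index = ((hypo - no_answer.mean(axis=0)) * sign).sum(axis=1)

print(memory_index)
print(hypothetical_index)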
This result can be easily explained in terms of the biased reconstruction approach.1 According to the biased reconstruction
explanation, the hindsight bias is only due to those participants who have forgotten their previous judgements and are
therefore forced to reconstruct or guess. Some participants in the memory design will remember their previous estimates
correctly, but correct recall is—by definition—impossible in the hypothetical design. As a result, all participants in the
hypothetical design, but only some of the participants in the memory design, are potentially susceptible to hindsight distortions.
Consequently, the likelihood that the hindsight bias will occur is lower in the memory design than in the hypothetical design.
Statistically speaking, from the perspective of the biased reconstruction approach the difference in hindsight bias between the
two paradigms should be attributable to a different number of hits. Experiment 1 was designed to test this explanation derived
from the biased reconstruction approach. A procedure similar to that of Campbell and Tesser (1983) was applied, and both hindsight bias paradigms (memory and hypothetical design) were integrated into one common design (see also Musch, 2003-this
issue). We predicted that the difference in hindsight bias between both paradigms would disappear when perfect recalls (hits)
are excluded from the memory design data. Additionally, this experimental design offered the opportunity to replicate
previous findings showing that outcome information does not affect the number of correct recalls (hits) in the experimental
condition (e.g., Dehn & Erdfelder, 1998; Erdfelder & Buchner, 1998; Pohl, 1993; Stahlberg et al., 1995). As outlined above,
this is what the biased reconstruction approach would predict, whereas the memory impairment approach would predict a
lower number of hits in the experimental condition than in the control condition.

EXPERIMENT 1

Method
Participants first made predictions on 18 survey-type questions (e.g., “What percentage of Germans live in cities with more
than 100,000 inhabitants?”) without being informed about the later memory task. After a delay of 30 minutes, they received
fictitious outcome information above or below their initial predictions, or they received no outcome information.
Subsequently they were asked to remember their initial predictions as accurately as possible. After having answered 18
survey-type questions in the memory design, participants responded to 18 new survey-type questions with hypothetical
instructions. The same feedback condition as in the memory design was applied, with the only difference that the feedback
was calculated on the basis of a randomly assigned yoked participant’s prediction in the memory design.2 Hence, the study
consisted of a 3 (feedback: above prediction, below prediction, no feedback)×2 (research paradigm: memory design,
hypothetical design) factorial design, with both factors representing within-participants factors. In the memory design, the
difference between recalled and initial predictions was analysed as dependent variable. In the hypothetical design, the
difference between the foresight prediction of a yoked participant and the “hypothetical” prediction was measured.
Additionally, we analysed the number of hits in the memory design.

Participants. A total of 36 first-year psychology students at the University of Mannheim (Germany) received DM 15
(approximately $7.50) for their participation. The participants were between 19 and 30 years old (with a mean age of 22.3
years); 25 were female, 11 were male.
Procedure. Each participant worked individually at a personal computer. Upon arrival, participants received written
instructions explaining the scope of the experiment as follows: “This is a study about social judgement. Your task will be to
predict the results of recent national opinion polls as accurately as possible. Our aim is to analyse whether people are able to
predict complex social issues using only their everyday knowledge. You will be asked which result you would have predicted
for a number of recent representative polls.” Then participants received 36 different survey-type questions (e.g., “What
percentage of German households have a television?”). To prevent participants from choosing round estimates such as 30% or 60%, which could easily be remembered, they were asked to give exact estimates such as 29% or 62%. Only one question at a
time was presented on the monitor. All questions were used the same number of times in all experimental conditions.
Memory vs hypothetical design. First, we gathered the data based on the memory design. One half of the participants made
predictions for questions 1 to 18, the other half answered questions 19 to 36 (pre-test). After a break of 30 minutes
(participants meanwhile filled out a personality questionnaire, a mere filler task), they received the same questions again, in a
different order (post-test). Outcome information was provided for some of the questions but not for others (see “Feedback”).
For each question participants were asked: “Please remember your initial predictions as accurately as possible and indicate
exactly the percentage that you had indicated when making your prediction half an hour ago.” Subsequently, all participants
had to answer the remaining 18 questions under hypothetical design conditions (that is: participants who had answered
questions 1 to 18 the first time now answered questions 19 to 36, and vice versa). Under these conditions, they were instructed
to make their predictions as if they did not know the true outcome. Again, participants received outcome information for some
of the questions but not for others (see “Feedback”).

Feedback. In the memory design, participants received, for six questions, fictitious outcome information that was 15, 16, or
17 percentage points above their initial prediction (condition “above”). For another six questions, the outcome information
was 15, 16, or 17 percentage points below their initial predictions (condition “below”). For six questions they received no
outcome information at all (condition “no” feedback). Feedback was prepared in such a way that, across participants, each
question was equally often associated with “above” (15%, 16%, and 17%), “below” (15%, 16%, and 17%), and “no”
feedback. Accordingly, in the hypothetical design, for two-thirds of the questions participants received outcome information
that was 15, 16, or 17 percentage points below or above the foresight prediction of a yoked participant in the memory design.
For six questions participants did not receive outcome information. If the feedback for a question would have resulted in an estimate below zero or above 100%, it was left out and the question was treated as a missing value. That was true for both designs.3

1 The biased reconstruction explanation assumes that the same cognitive processes are responsible for hindsight bias effects in the memory and the hypothetical design. However, other authors have suggested that hindsight bias effects in the two paradigms are due to different cognitive processes (e.g., Blank, Fischer, & Erdfelder, 2003-this issue; Musch, 2003-this issue).
2 An alternative to the yoking procedure would have been to base the feedback itemwise on the mean predictions of the participants in the memory design. However, pretests showed that this procedure would have led to a significantly reduced variation in feedback scores in the hypothetical design compared to the memory design, or to far less extreme individual feedback scores. The yoking procedure avoids this confounding of research paradigm (memory vs hypothetical) with extremity of feedback.

Dependent variables. In the memory design, the dependent measures were computed from the participants’ percentage
estimates for the 18 questions in the pre- and post-test. To measure the hindsight bias, we computed the mean post-test minus
pre-test differences across the six4 questions in the condition “above”, “below”, and “no” feedback, respectively. This resulted
in three mean difference scores for each participant. If the hindsight bias appeared, the mean difference score should be
positive in the “above” feedback condition, negative in the “below” feedback condition, and approximately zero in the “no”
feedback condition. In the hypothetical design, the dependent measures were computed accordingly, except that the foresight
prediction of a yoked participant in the memory design was subtracted from the “hypothetical” prediction of the participant.
Additionally, we measured the participant’s ability to remember his or her initial predictions correctly. Due to the fact that there
is—by definition—no initial prediction in the hypothetical design, the number of perfect hits (post-test score minus pre-test
score=0) was calculated only for the data in the memory design.
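A minimal sketch of these dependent measures for a single participant (illustrative code with hypothetical toy data; it is not the analysis script used in the experiment):

import numpy as np

rng = np.random.default_rng(1)                                  # toy data only
pre      = rng.integers(1, 100, 18).astype(float)               # initial predictions (in %)
post     = pre + rng.normal(0, 5, 18).round()                   # recalled predictions half an hour later
feedback = np.repeat(["above", "below", "no"], 6)               # feedback condition of each question

diff = post - pre                                               # recalled minus initial prediction
mean_diff = {c: diff[feedback == c].mean() for c in ("above", "below", "no")}
hits      = {c: int((diff[feedback == c] == 0).sum()) for c in ("above", "below", "no")}
# Hindsight bias: mean_diff["above"] > 0, mean_diff["below"] < 0, mean_diff["no"] close to 0;
# hits counts perfectly recalled predictions (difference score of 0), from 0 to 6 per condition.

print(mean_diff, hits)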

Results
Hindsight bias. The mean difference scores across all participants in the experimental conditions are presented in Columns 1
and 2 of Table 1. The difference scores were subjected to a 3×2 within-participants ANOVA. The analysis of the difference
scores revealed a main effect of feedback, F(2, 70)=35.91, p < .001. Subsequent t-tests indicated that the mean difference score was higher for “above” feedback questions (M=5.48%) than for “below” feedback questions, p < .001, and for “no” feedback questions (M=0.18%), p < .001.

TABLE 1 Means (and standard deviations) of the difference between pre-test estimate and post-test estimate as a function of feedback and research paradigm (Experiment 1)

             Research paradigm
Feedback     1 Memory design (hits included)   2 Hypothetical design   3 Memory design (hits excluded)
Above        3.98 (4.92)                       6.98a (9.16)            4.78a (5.56)
No           0.60b (3.82)                      −0.24b,c (11.37)        0.59c (4.16)
Below        −3.57 (4.71)                      −6.15d (9.05)           −4.01d (5.34)
Means within rows sharing the same superscript were not significantly different from each other (t-test).

The difference score for the “below” feedback questions was significantly lower than for the “no” feedback questions (M=0.18%), p < .001. The main effect of the factor research paradigm was not significant, F(1, 35)=0.014, p=.91.
Furthermore, the expected interaction between feedback and research paradigm was significant, F(2, 70)=3.26, p < .05. The mean difference score for the “below” feedback questions was significantly lower in the hypothetical design (M=−6.15%) than in the memory design (M=−3.57%), t(35)=1.96, p=.029, and the mean difference score for the “above” feedback questions was marginally higher in the hypothetical design (M=6.98%) than in the memory design (M=3.98%), t(35)=1.64, p=.055.5 The mean difference score for the “no” feedback questions did not differ significantly between the hypothetical design (M=−0.24%) and the memory design (M=0.60%), t(35)=0.41, p=.68. Although the hindsight distortions seemed more pronounced in the hypothetical design on inspection of the means, the effect size for feedback was higher in the memory design than in the hypothetical design.
Number of hits. The mean number of hits in the memory design (“above”: M=1.11, SD=0.92; “no”: M=1.08, SD=0.94;
“below”: M=0.81, SD = 0.95) was subjected to a one-way ANOVA. This analysis revealed no significant difference in the
number of hits between the three feedback conditions, F(2, 70)=1.16, p=.319. To test the different hypotheses that can be
derived from the two theoretical models (biased reconstruction: equal number of hits in experimental and control group; memory impairment: more hits in the control group than in the experimental group), we calculated the contrast between the mean number of hits in the “no” feedback condition (M=1.08) and the mean number of hits in the pooled “above” and “below” feedback conditions (M=0.96). This analysis showed that the mean number of hits in the “no” feedback condition was not significantly higher than the mean number of hits in the two pooled feedback conditions, t(35)=0.72, p=.475. A subsequent power analysis of t-tests for matched pairs revealed a power of 0.17 (f=0.117, N=36) for this comparison. Although the power to detect an effect, if one exists, is unacceptably low here, there are two reasons for not rejecting the null hypothesis in the present case. First, on a descriptive level the difference between the two groups is extremely small. Second, as mentioned above, a number of prior studies have shown that there is no difference between the number of correct recalls in outcome and no-outcome information groups (e.g., Dehn & Erdfelder, 1998; Erdfelder & Buchner, 1998; Pohl, 1993; Stahlberg et al., 1995).

3 That happened in 4.7% of the cases in which the experiment called for the outcome information to be supplied. These cases were evenly distributed over the “above” and the “below” feedback conditions as well as over the memory and the hypothetical design.
4 If the feedback for a question had to be left out (see Footnote 3), the corresponding mean difference score for the participant in question was calculated on the basis of the remaining five questions (or four, which happened only once).
5 Because of the directional hypothesis, both tests were conducted as one-tailed tests.

Hindsight bias after excluding hits in the memory design. Column 3 of Table 1 presents the mean pre-post differences in
the memory design after the questions for which this difference score was 0 were eliminated. Together with the unmodified
estimates of the hypothetical design (Column 2 in Table 1), these difference scores were subjected to a 3×2 within-participant
ANOVA. Once again, the hindsight bias was highly significant, with a main effect of feedback, F(2, 70)=38.3, p < .001. Subsequent t-tests indicated that the mean difference score was higher for “above” feedback questions (M=5.88%) than for “below” feedback questions, t(35)=8.33, p < .001, and for “no” feedback questions (M=0.18%), t(35)=5.00, p < .001. The mean difference score for the “below” feedback questions was significantly lower than for the “no” feedback questions (M=0.18%), t(35)=4.07, p < .001. The main effect of research paradigm was not significant, F(1, 35)=0.043, p=.84.
The theoretically significant effect, however, concerns the interaction between feedback and research paradigm. Previously
this interaction reached significance, but—as theoretically expected by the biased reconstruction approach—after we
eliminated the perfectly recalled predictions in the memory design, it was no longer significant, F(2, 70)=1.92, p=.15. To test
whether this reduction in significance is meaningful, we calculated a 3×2×2 within-participant ANOVA with the additional
factor “difference scores including hits versus difference scores without hits”. The only interesting result of this analysis is the
three-way interaction between the factors involved. Because this three-way interaction is highly significant, F(2, 70)=11.01, p
< .001, we conclude that the difference between the data pattern before and after eliminating the hits in the memory design data
as described above is indeed statistically meaningful.

Discussion
The purpose of this experiment, which combined the memory design and the hypothetical design within a single procedure, was to test the predictions derived from the biased reconstruction approach concerning the different magnitude of the hindsight bias in the two research paradigms. Overall the results show that the application of McCloskey and Zaragoza’s (1985a,
1985b) response bias view to hindsight bias research is fruitful. The predictions derived from the biased reconstruction
approach could be confirmed. In detail, the results are as follows:
First, the hindsight bias was once again replicated, here with survey-type questions as stimulus material. This was true for the
memory design as well as for the hypothetical design. In both designs the participants recalled higher estimates after they received
“above” feedback and lower estimates after they received “below” feedback as compared to the “no” feedback estimates.
Second, feedback (either above or below previous predictions) did not affect the number of hits in the experimental and the
control condition. If we apply Loftus’s destructive actualisation hypothesis to hindsight research, we would expect the
number of hits in the memory design in the “above” and “below” feedback conditions to be lower than in the “no” feedback
condition. This was not the case, which suggests that memory was not impaired by outcome information. Thus, the results
contradict the memory impairment view, but they are consistent with the biased reconstruction approach. As mentioned above,
this finding is supported by several other studies (e.g., Dehn & Erdfelder, 1998; Erdfelder & Buchner, 1998; Pohl, 1993;
Stahlberg et al., 1995).
Third, the results show that the biased reconstruction approach is able to explain differences between the findings in the
memory and hypothetical design in a very parsimonious way. Similar to Campbell and Tesser (1983) and Musch (2003, this
issue), inspection of the mean difference scores in the different feedback conditions revealed a more forceful hindsight
distortion in the hypothetical design than in the memory design. According to the biased reconstruction approach, only
participants who forget their initial prediction are responsible for the hindsight bias. There are participants with perfect recall
in the memory design, but—by definition—there are no participants with perfect recall in the hypothetical design.
Consequently, there is a lower chance for the hindsight bias to occur in the memory design (which leads to a higher hindsight distortion in the hypothetical design—as can be seen by the inspection of the mean difference scores in the experiment). Once
the perfect recalls in the memory design are eliminated from the data analysis, there is an equal chance for the hindsight bias
to occur in both designs. This was done in the present study, and we discovered that hindsight distortions in the hypothetical
design were no longer significantly more pronounced than in the memory design. According to the effect size measures in
both experimental designs, the factor feedback accounted for more variance in the memory design than in the hypothetical
design. At first glance, this surprising finding seems to contradict findings by other studies, which report a greater magnitude
of hindsight bias in the hypothetical design than in the memory design. But to our knowledge, the present study is one of the
first to statistically compute measures of effect size instead of merely analysing differences between means (see also Musch,
2003-this issue). Although the present data pattern in the memory and the hypothetical design is the same as reported in
previous studies, variances within memory design conditions are much smaller than variances within the hypothetical design.
From a hindsight perspective, the latter finding seems like a logical consequence of the yoking procedure that was applied in
this study: In general, the difference score of a participant in the memory design varied between zero and 15 percentage
points, whereas difference scores up to 80 and more percentage points appeared in the hypothetical design.

EXPERIMENT 2
As outlined earlier, biased reconstruction and memory impairment differ with regard to their emphasis on meta-cognitive
considerations. The biased reconstruction view argues that people who have forgotten their original prediction and are
therefore forced to guess will use more or less explicit assumptions about how close to or distant from the true outcome their
original estimate may have been. People who trust (or doubt) their predictive ability will remember an estimate that is close to
(or far away from) the outcome, and this results in a strong (weak or even reversed) hindsight bias. By contrast, the memory
impairment view does not emphasise a particular role of meta-cognitions. If outcome information is assimilated into the
existing knowledge structure in an automatic fashion, there is little reason to believe that meta-cognitive processes should
play a relevant role. To test whether meta-cognitive assumptions affect the strength of the hindsight bias (biased
reconstruction) or not (memory impairment), we experimentally manipulated participants’ assumptions of how close their
predictions allegedly were to the actual outcome.6 A first group of participants was told that overall their original predictions
had been quite close to the actual outcome (prediction quality: good), a second and a third group of participants received the
information that their predictions—as a rule—had been either too high or too low (prediction quality: too high or too low),
and a fourth group learned that they had made very poor predictions, which had sometimes been much too high and sometimes
much too low (prediction quality: poor). A further control group of participants did not receive any information about the
supposed quality of their predictions (no information).
The biased reconstruction view assumes that participants who received outcome information will first ask themselves
whether they can directly recall their previous predictions. If the answer to this question is “yes,” they will name what they
recall. In general, this will result in correct recalls of own predictions (hits).7 If they do not directly recall their own
predictions, they will try to reconstruct them. In this process, they are supposed to use all the information available, e.g., the
outcome information. Among other things, they ask themselves: “Will the outcome information help me to reconstruct my
prediction?”. Answering in the affirmative will occur if (1) the outcome information itself is perceived to be reliable, and (2)
participants have a reasonable assumption about how close their own prediction might have been to the actual outcome. It was
assumed that the latter would be the case in the experimental conditions “prediction quality: good, too high and too low”. If a
participant knows that—overall—his or her own predictions have been good, the strategy of reconstruction should be to
choose an answer that is close to the actual outcome. This will result in a strong hindsight bias. If the participant knows that—
overall— his or her own predictions have been too high (too low), he or she should choose an answer that lies above (below)
the actual outcome. This strategy will result in either a reduced or a strong hindsight bias, depending on whether the alleged
outcome was actually below or above the predictions: If, for example, a participant as a rule chooses an answer above the
outcome, this will attenuate hindsight bias if the prediction really was above (in the “below” feedback condition), and enhance
the bias if the prediction was below (in the “above” feedback condition).
However, in the experimental condition “prediction quality: poor” the answer to the above question will be “no”.
Whenever more than two outcome alternatives exist (and there are exactly 101 alternatives on a percentage scale with
one-point increments), the information that “I almost never come close to the actual outcome” will not help participants to
reconstruct their own prediction, as it might have been above the actual outcome or below the actual outcome. So what do
participants do in this highly ambiguous situation? Our prediction was that they will simply guess, try to ignore the actual outcome, or use another strategy of reconstruction, if possible (e.g., they will try to go back to the data on which their initial prediction was based). Altogether, these latter three strategies will result in a reduced hindsight bias.

6 A very brief summary of this experiment was presented in Stahlberg and Maass (1998).
7 Of course, there might be participants who are overconfident. These participants would also answer “yes”, but fail to remember their original prediction.

Figure 1 gives a summary of the model’s predictions based on the experimental conditions that are realised in Experiment 2.
In sum, we expected a strong hindsight bias in the condition “prediction quality: good”, a reduced hindsight bias in the
condition “prediction quality: poor”, and a hindsight bias of intermediate strength in the conditions “prediction quality: too
high/too low”. Furthermore, the design makes it possible to compare the answers of participants who received information
about the quality of their predictions and those who received no such information. It is therefore possible to analyse whether
the meta-cognition “I made good predictions” leads to a stronger hindsight bias and/or the meta-cognition “I made poor
predictions” leads to a reduced hindsight bias. Additionally, we analysed the number of hits in the different experimental
conditions (see Experiment 1).

Method
As in the previous experiment, participants were first asked to predict the results of 18 survey-type questions. After they answered
several paper and pencil personality questionnaires (a filler task that lasted about 15 to 20 minutes), they were told how
accurate they had been in predicting the survey results. Subsequently, participants received the same questions again and were
instructed to recall their initial predictions as accurately as possible. For two-thirds of the questions they received outcome
information that either exceeded or remained below their initial estimates (experimental questions), while no feedback was
provided for the six control questions. Hence, the design was a 3 (feedback: above, below, no)×5 (prediction quality: good,
too low, too high, poor, no information) design with one within-participants factor (feedback) and one between-participants
factor (prediction quality).

Participants. A total of 90 participants were recruited among the relatives and friends of the student experimenters (due to a
technical problem, the data of one participant were lost; thus, the final sample consisted of 89 participants). The participants
were between 18 and 70 years old (with a mean age of 34.6 years). The sample consisted of 50 employed people, 24 students,
3 pupils, 7 housewives, and 5 pensioners; 51 were female, 38 male. The data were collected with a laptop mainly at the
participants’ homes.
Procedure. The same cover story and stimulus material as in Experiment 1 was used. Upon their arrival, participants
received all instructions via computer display. Except for the personality questionnaires, all data were collected on the
computer. The procedure was exactly the same as in the memory design of Experiment 1, except that the factor “prediction
quality” was added. This information was given immediately before the participants were instructed to recall their initial
predictions.
Prediction quality. Participants were randomly assigned to one of the following conditions. Immediately before they were
asked to recall their initial predictions, participants received different information about their (alleged) ability to make
accurate predictions:

• Good: “62% of your predictions are 2 to 10 percentage points away from the actual estimate; 21% of your predictions are
exactly correct; 17% of your predictions are 10 to 20 percentage points away from the actual estimate. Altogether this is a
very good result.”

• Too high: “80% of your predictions are 15 to 25 percentage points above the actual estimate. Only 9% of your predictions
are exactly correct. 11% of your predictions are 10 to 15 percentage points below the actual estimate. As a rule your
predictions are too high. Altogether this is a relatively poor result.”
• Too low: “80% of your predictions are 15 to 25 percentage points below the actual estimate. Only 9% of your predictions
are exactly correct. 11% of your predictions are 10 to 15 percentage points above the actual estimate. As a rule your
predictions are too low. Altogether this is a relatively poor result.”
• Poor: “44% of your predictions are 15 to 25 percentage points above the actual estimate. 47% of your predictions are 15 to
25 percentage points below the actual estimate. Only 9% of your predictions are exactly correct. Altogether this is a
relatively poor result.”
• No information: participants in this condition received no information about the quality of their own predictions.

At the end of the study, all participants were carefully debriefed about the true nature of the experiment and were informed
that the information about their predictive ability was manipulated by the experimenter.
Dependent variables. The dependent measures were the same as in Experiment 1. The difference scores of percentage
estimates in the pre- and post-test and the number of hits (difference scores=0) were calculated as previously explained.

Figure 1. Predictions of the biased reconstruction model depending on the alleged prediction quality. Predictions are presented for
experimental conditions in Experiment 2 only. In the control conditions, participants are expected to either recall their own predictions or
use strategies of reconstruction that are not necessarily affected by the outcome information.

Results
Hindsight bias: Difference scores “post-test minus pre-test”. Table 2 presents the mean difference scores as a function of the
three feedback conditions and the five prediction quality conditions. The difference scores were subjected to a 3 × 5 ANOVA.
The analysis of the difference scores revealed the expected main effect of feedback, F(2, 168)=69.39, p < .001. Subsequent t-
tests indicated that the difference scores were higher for “above” feedback questions (M=4.06%) than for “below” feedback
questions, p < .001, and for “no” feedback questions, p < .001. The difference scores were also lower for “below” feedback questions than for “no” feedback questions, p < .001, thus indicating that participants showed a strong hindsight bias effect.
The interaction between feedback and prediction quality was also significant, F(8, 168)=2.59, p < .01. In order to analyse
this interaction, we subjected the difference scores in each prediction quality condition to a one-way ANOVA. The five one-way
ANOVAs are presented in Table 2. First of all, the results show that the effect size of the hindsight bias varied between the five prediction quality conditions.

TABLE 2 Means (and standard deviations) of the difference between pre-test estimate and post-test estimate as a function of prediction quality and feedback (Experiment 2)

                     Feedback                                         One-way analysis of variance
Prediction quality   Above           No              Below            F       df      p        η²
Good                 6.34 (3.32)     1.46 (2.84)     −6.26 (4.18)     59.16   2, 32   < .001   .787
Too high             3.18 (4.33)     −3.33a (3.80)   −4.36a (3.91)    17.48   2, 34   < .001   .507
Too low              3.93 (6.57)     −2.71b (7.31)   −3.98b (4.23)    10.92   2, 34   < .001   .391
Poor                 3.70c (4.71)    1.11c,d (3.82)  −0.79d (4.35)    4.21    2, 34   .023     .198
No information       3.30 (6.00)     −0.74 (6.53)    −5.43 (5.37)     11.57   2, 34   < .001   .405
Means within rows sharing the same superscript were not significantly different from each other (t-tests).

The effect size was highest in the condition “prediction quality: good” (η²=.787) and lowest in the condition “prediction quality: poor” (η²=.198).
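The effect sizes in the rightmost column of Table 2 are consistent with (partial) eta-squared values, which for these one-way ANOVAs can be recovered from the reported F ratios and degrees of freedom; for example, for the condition “prediction quality: good”:

\[ \eta^2 = \frac{df_{\mathrm{effect}} \, F}{df_{\mathrm{effect}} \, F + df_{\mathrm{error}}} = \frac{2 \times 59.16}{2 \times 59.16 + 32} \approx .787 . \]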


Strength of hindsight bias depends on the alleged prediction quality. The above-mentioned significant interaction between
feedback and prediction quality suggested that the strength of the hindsight bias varied across the five prediction quality
conditions. To elaborate on this analysis, we calculated the total difference score “mean difference score for ‘above’ feedback
questions minus mean difference score for ‘below’ feedback conditions”. This difference score served as a measure for the
hindsight bias in both feedback conditions. The higher this total difference score, the stronger the hindsight bias. Figure 2
shows the strength of the hindsight bias in the five prediction quality conditions according to the total difference score.
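Using the cell means in Table 2, each total difference score is simply the “above” mean minus the “below” mean; for example,

\[ \text{good: } 6.34 - (-6.26) = 12.60, \qquad \text{poor: } 3.70 - (-0.79) = 4.49 , \]

where small discrepancies (e.g., 12.60 here versus the reported 12.61%) presumably reflect rounding of the cell means.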
The total difference scores were subjected to a one-way ANOVA. This analysis showed that there were significant
differences between the prediction quality conditions, F(4, 84)=4.04, p < .01. To test the main hypotheses derived from the
model presented above, we calculated the following contrasts: The total difference score was significantly higher in the
condition “prediction quality: good” (M=12.61%) as compared to the pooled “poor” predictive ability conditions (“prediction
quality: too high, too low, and poor”, M=6.65%), t(84)=3.54, p < .001. The total difference score in the condition “prediction
quality: poor” (M=4.49%) was marginally lower than in the pooled conditions “prediction quality: too high and too low”
(M=7.72%), t(84)=1.85, p = .068. Furthermore, we calculated the contrasts between the two conditions “prediction quality:
good” and “prediction quality: poor” and the control condition “no information” about the predictive ability. The total
difference score was marginally higher in the condition “prediction quality: good” (M=12.61%), t(84)=1.9, p=.061, and lower in the condition “prediction quality: poor” (M=4.49%), p < .05, as compared to the “no information” condition (M=8.72%).
Number of hits. Table 3 presents the mean number of hits as a function of the three outcome feedback conditions and the
alleged prediction quality. The mean numbers of hits were subjected to a 3×5 ANOVA. This analysis revealed a marginally significant main effect of feedback, F(2, 168)=3.04, p=.051. To analyse whether the outcome information influenced participants’ ability to remember their initial predictions, we compared the mean number of hits in the “no” feedback condition (M=1.40) to the mean number of hits in the combined “above” and “below” feedback conditions (M=1.11). This comparison revealed that there were more hits in the “no” feedback condition than in the pooled (“above” and “below”) feedback conditions, t(88)=2.33, p < .05. The main effect of prediction quality was also significant, F(4, 84)=2.88, p < .05. Subsequent t-tests indicated that there was just one marginally significant difference in the number of hits, namely between the condition “prediction quality: too low” (M=0.72) and the “no information” condition (M=1.63). The interaction between feedback and prediction quality did not reach significance, F(8, 168)=0.29, p=.97.

Figure 2. Strength of the hindsight bias in the prediction quality conditions (Experiment 2).

TABLE 3 Mean number of hits* (and standard deviations) as a function of prediction quality and feedback (Experiment 2)

                     Feedback
Prediction quality   Above          No             Below
Good                 0.76 (0.90)    1.29 (1.36)    0.94 (0.90)
Too high             1.33 (1.08)    1.44 (1.29)    1.28 (0.96)
Too low              0.78 (1.11)    0.78 (0.81)    0.61 (0.85)
Poor                 1.28 (1.32)    1.61 (1.19)    1.11 (1.28)
No information       1.56 (1.42)    1.89 (1.49)    1.44 (1.15)
*The number of hits could vary between 0 and 6 in each feedback condition.

Discussion
This study has once again demonstrated the robustness of the hindsight bias. Participants exhibited reliable hindsight
distortions by shifting their estimates towards the alleged outcome. More importantly, with respect to the focus of this study,
the results showed that hindsight distortions were most pronounced when participants believed that they had been quite
accurate in their predictions. This finding is in line with the biased reconstruction explanation of the hindsight bias, which
argues that people’s subjective assumptions about how close or distant their original estimate had been from the true outcome
determine the direction and magnitude of the hindsight bias. Such meta-cognitions play no relevant role in the memory
impairment explanation. Yet the belief that one’s prediction must have been rather far from the true outcome did not entirely
eliminate the hindsight bias. This suggests that intentional reconstruction based on beliefs about one’s own predictive abilities
is not the only process operating in hindsight distortions. In line with other authors (Fischhoff, 1975; Fischhoff & Beyth, 1975;
Hawkins & Hastie, 1990), one might therefore conclude that automatic anchoring processes (for an overview see Wilson,
Houston, Etling, & Brekke, 1996) are operating in addition to intentional reconstruction mechanisms (for some more
experimental support see Stahlberg & Maass, 1998). Thus, participants may inadvertently use the outcome knowledge as an
anchor when reconstructing their estimates, even when they have little confidence in their predictive abilities.
Concerning the question of whether the outcome information influenced the participants’ ability to recall their initial
predictions, hits were more frequent on “no” feedback control items than on “above” or “below” feedback experimental
items. This result is contrary to the findings of other studies, which found no significant effect of the outcome information on
the number of hits (e.g., Dehn & Erdfelder, 1998; Erdfelder & Buchner, 1998; Pohl, 1993; Stahlberg et al., 1995). However,
this result is in line with the memory impairment approach. According to this approach, outcome information should impede
memory of the original prediction. Pohl and Gawlik (1995) reported a similar finding, and in an unpublished meta-analysis of 23 independent data sets, Pohl (1993) found a weak tendency for hits to be more frequent in no-outcome information conditions than in outcome information conditions. Given the mixed results concerning the effect of outcome
information on the number of hits in the different conditions, one cannot exclude with certainty the possibility that memory
processes—in addition to the proposed reconstruction processes—play a crucial role in the development of the hindsight bias.

GENERAL DISCUSSION
In sum, the research reported in this paper shows that the biased reconstruction approach is a valuable and parsimonious way
to explain hindsight distortions. It seems that hindsight distortions are in large part a (biased) judgement rather than a memory
phenomenon (see also Stahlberg & Maass, 1998). Nevertheless, memory impairment cannot be totally excluded as an
explanation for the hindsight bias. In Experiment 2, hits occurred more frequently in the “no” feedback items than in the
“above” and “below” feedback items. In contrast, in Experiment 1 the presentation of outcome information had no effect on
individuals’ ability to recall their initial predictions. Given the mixed findings, more research is needed to address the
question of whether memory impairment, biased reconstruction, or a combination of both causes the hindsight bias.
A preliminary answer to this question is reported in a paper by Stahlberg and Maass (1998). The authors assume that
outcome information will be assimilated and existing memory traces will be altered only if there is a sufficiently complex
knowledge structure that can accommodate the new information, and if the old and new information can be merged into an
intermediate response. If these conditions are missing, they expected people to behave exactly as predicted by the biased
reconstruction approach. Stahlberg and Maass tested this assumption in a study in which participants were encouraged either
to develop a coherent cognitive representation of a target person (impression formation condition), or simply to memorise
detailed information (memory condition). They predicted that people would assimilate subsequent information about the target
person in the impression formation condition, but show a biased reconstruction under memory conditions. Their results
supported this idea. Thus, the authors concluded that both processes mentioned above could contribute to the development of
the hindsight bias. Rather than being mutually exclusive, memory impairment and biased reconstruction seem to have distinct
areas of application.
Experiment 2 has demonstrated that meta-cognitions play a significant role in determining the strength of the hindsight bias.
As predicted, hindsight distortions were most pronounced when people had reason to believe that their initial predictions must
have been very close to the actual outcome. This finding is in line with the biased reconstruction explanation and receives
support from experiments by Werth (1998) and Werth, Strack, and Förster (2002), who also showed that idiosyncratic beliefs about one’s own performance influence the strength of the hindsight bias.
Finally, the fact that the magnitude of the hindsight bias is greater in the hypothetical design than in the memory design can
be easily explained by the biased reconstruction approach. It states that only participants who forget their initial estimate will
produce the hindsight bias. Due to the fact that there are—by definition—no participants with perfect recall in the
hypothetical design, the chance for the hindsight bias to occur is greater in the hypothetical design than in the memory design.
Once perfect recalls were excluded from the memory design data, so that the chance for the hindsight bias to occur was equal in both designs, hindsight distortions in the hypothetical design were no longer significantly larger than in the memory design.

REFERENCES

Belli, R.F., Windschitl, P.D., McCarthy, T.T., & Winfrey, S.E. (1992). Detecting memory impairment with a modified test procedure:
Manipulating retention interval with centrally presented event items. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 18, 356–367.
Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491– 504.
Campbell, J.D., & Tesser, A. (1983). Motivational interpretations of hindsight bias: An individual difference analysis. Journal of
Personality, 51, 605–620.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Davies, M.F. (1992). Field-dependence and hindsight bias: Cognitive restructuring and the generation of reasons. Journal of Research in
Personality, 26, 58–74.
Dehn, D.M., & Erdfelder, E. (1998). What kind of bias is the hindsight bias? Psychological Research, 61, 135–146.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and
reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Fischhoff, B., & Beyth, R. (1975). “I knew it would happen”. Remembered probabilities of once-future things. Organizational Behaviour
and Human Performance, 13, 1–16.
Greenwald, A.G. (1980). The totalitarian ego: Fabrication and reversion of personal history. American Psychologist, 35, 603–618.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight-bias: An interaction of automatic and motivational factors?
Memory and Cognition, 16, 533–538.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Loftus, E.F. (1975). Leading questions and the eyewitness report. Cognitive Psychology, 7, 560–572.
Loftus, E.F. (1979). Eyewitness testimony. Cambridge, MA: Harvard University Press.
Loftus, E.F., & Hoffman, H.G. (1989). Misinformation and memory: The creation of new memories. Journal of Experimental Psychology:
General, 118, 100–104.
Loftus, E.F., & Loftus, G.R. (1980). On the permanence of stored information in the human brain. American Psychologist, 35, 409–420.
Louie, T.A. (1999). Decision makers’ hindsight bias after receiving favorable and unfavorable feedback. Journal of Applied Psychology,
84, 29–41.
Louie, T.A., Curren, M.T., & Harich, K.R. (2000). “I knew we would win”: Hindsight bias for favorable and unfavorable team decision
outcomes. Journal of Applied Psychology, 85, 264–272.
Mark, M., & Mellor, S. (1991). Effect of self-relevance of an event on hindsight bias: The foreseeability of a layoff. Journal of Applied
Psychology, 76, 569–577.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal of the hindsight bias. Organizational Behavior
and Human Decision Processes, 46, 20–33.
McCloskey, M., & Zaragoza, M. (1985a). Misleading postevent information and memory for events: Arguments and evidence against
memory impairment hypotheses. Journal of Experimental Psychology: General, 114, 1–16.
McCloskey, M., & Zaragoza, M.S. (1985b). Postevent information and memory: Reply to Loftus, Schooler, and Wagenaar. Journal of
Experimental Psychology: General, 114, 381–387.
Morton, J., Hammersley, R.H., & Bekerian, D.A. (1985). Headed records: A model for memory and its failures. Cognition, 20, 1–23.
Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Pohl, R.F. (1993). Der Rückschau-Fehler. Ein Modell zur Analyse und Erklärung systematisch verfälschter Erinnerungen [Hindsight bias.
A model for analysing and explaining systematically distorted recollections]. Unpublished habilitation thesis, University of Trier,
Germany.
Pohl, R.F., & Gawlik, B. (1995). Hindsight bias and the misinformation effect: Separating blended recollections from other recollection
types. Memory, 3, 21–55.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Pohl, R.F., Stahlberg, D., & Frey, D. (1999). I’m not trying to impress you, but I surely knew it all along! Self-presentation and hindsight
bias. Working Paper No. 99–19, SFB 504, University of Mannheim, Germany.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Schwarz, S. (2002). Motivationale Einflüsse auf den Hindsight Bias: Selbstwertdienliche Verarbeitung von persönlich relevanten
Informationen [Motivational influences on the hindsight bias: Self-serving processing of self-relevant information]. Hamburg: Kovac
Verlag.
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception
and Performance, 3, 544–551.
Stahlberg, D. (1994). Der Knew-it-all-along-Effekt. Eine urteilstheoretische Erklärung [The knew-it-all-along-effect. A biased
reconstruction explanation]. Unpublished habilitation thesis, Christian-Albrechts-University, Kiel, Germany.
Stahlberg, D., & Eller, F. (1993). Hindsight-Effekte—eine urteilstheoretische Erklärung [Hindsight effects—a biased reconstruction
explanation]. In L. Montada (Ed.), Bericht über den 38. Kongreß der Deutschen Gesellschaft für Psychologie in Trier (pp. 735–741).
Göttingen: Hogrefe.
Stahlberg, D., Eller, F., Maass, A., & Frey, D. (1995). We knew it all along: Hindsight bias in groups. Organizational Behavior and Human
Decision Processes, 63, 46–58.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
Review of Social Psychology, Vol 8 (pp. 105–132). Chichester, UK: Wiley & Sons.
Stahlberg, D., & Schwarz, S. (1999). Would I have known it all along if I would hate to know it? The hindsight bias in situations of high and
low self-esteem relevance. Working paper No. 99–34, SFB 504, University of Mannheim, Germany.

Stahlberg, D., Sczesny, S., & Schwarz, S. (1999). Exculpating victims and the reversal of hindsight bias. Working paper No. 99–70, SFB
504, University of Mannheim, Germany.
Verplanken, B., & Pieters, R.G.M. (1988). Individual differences in reverse hindsight bias: I never thought something like Chernobyl would
happen. Did I? Journal of Behavioral Decision Making, 1, 131–147.
Werth, L. (1998). Ein inferentieller Erklärungsansatz des Rückschaufehlers [An inferential explanation of the hindsight bias]. Hamburg:
Kovac Verlag.
Werth, L., Strack, F. & Förster, J. (2002). Certainty and uncertainty: The two faces of the hindsight bias. Organizational Behavior and Human
Decision Processes, 87, 323–341.
Wilson, T.D., Houston, C., Etling, K.M., & Brekke, N. (1996). A new look at anchoring effects: Basic anchoring and its antecedents.
Journal of Experimental Psychology: General, 125, 387–402.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence-hindsight mirror effect in judgment: An accuracy-assessment model for the
knew-it-all-along phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 415–431.
Wood, G. (1978). The knew-it-all-along effect. Journal of Experimental Psychology: Human Perception and Performance, 4, 345–353.
Zaragoza, M.S., & McCloskey, M. (1989). Misleading postevent information and the memory impairment hypothesis: Comment on Belli
and reply to Tversky and Tuchin. Journal of Experimental Psychology: General, 118, 92–99.
An inferential approach to the knew-it-all-along phenomenon
Lioba Werth and Fritz Strack
University of Würzburg, Germany

Two studies tested the hypothesis that the knew-it-all-along effect may be the result of an inferential process:
specifically, that individuals use their feelings and experiences (e.g., “This question seems so familiar to me,
surely I would have known the answer!”) to infer their judgement. Drawing on subjective feelings such as
certainty or perceptual fluency, individuals can use a provided actual value as an informational cue and draw
inferences from it. Thus, the occurrence of the knew-it-all-along effect is expected to depend on the feeling of
confidence experienced with a question. This feeling may indicate to an individual that he or she did know the
answer; a total lack of such a feeling may suggest that he or she never would have known the answer. In the
reported studies we both measured feelings of confidence (Study 1) and induced them by manipulating perceptual
fluency (Study 2) to show that the knew-it-all-along effect is a phenomenon of inferences based on these
experienced feelings. Participants experiencing high confidence or high perceptual fluency assimilated their
judgements to the provided values more strongly than did participants experiencing low confidence or low
perceptual fluency.

How do people answer questions about which they have very little information when they are required to produce an answer?
The literature contains countless examples showing that individuals who render judgements under uncertainty use heuristics
to arrive at the required judgements (e.g., Tversky & Kahneman, 1974). They use experienced feelings as information (Schwarz
& Clore, 1983) when they are not confident about their judgements; that is, in the judgemental situation they make use of
their own subjective experiences or perceptions. It has been demonstrated that feelings of familiarity (Jacoby, Kelley, Brown,
& Jasechko, 1989a), ease of retrieval (Schwarz, Bless, Strack, Klumpp, Rittenauer-Schatka, & Simons, 1991), feelings of
uncertainty (Clore & Parrott, 1994), as well as feelings of knowing (Koriat, 1998; Koriat & Goldsmith, 1998; Nelson, Gerler,
& Narens, 1984) play an important role for decision making in that they are used as diagnostic cues to generate a judgement.
To give an example: In one of their most famous studies, Jacoby et al. (1989a) demonstrated that feelings of familiarity were
used for making fame judgements (see also Jacoby, Woloshyn, & Kelley, 1989b; Kelley & Lindsay, 1993). When participants
were required to evaluate people’s fame they tended to misattribute their own feelings of familiarity (which resulted from a
previous task) to fame. They inferred “this person feels so familiar to me; therefore she must be a famous person”. In this way
people became “famous overnight” as a result of an inference based on familiarity. The present article applies this feeling-as-
information logic to a special kind of judgement under uncertainty: The knew-it-all-along effect.


THE KNEW-IT-ALL-ALONG PARADIGM


The knew-it-all-along effect refers to a person’s tendency to distort a judgement about what he or she would have answered in
the direction of information that is provided concurrently with the question (the true answer or the actual outcome of a
situation). The knew-it-all-along paradigm includes one judgement phase in which participants are provided with the actual
value to the question and required to make a hypothetical judgement about what they would have answered had they not
known this actual value.1 The typical finding is that participants provided with the actual value shift towards the provided
actual value compared to a control group that did not receive an actual value.2

Requests for reprints should be sent to Lioba Werth, Department of Psychology II, Universität Würzburg, Röntgenring 10,
D-97070 Würzburg, Germany. Email: werth@psychologie.uni-wuerzburg.de
This research was supported by a grant from the Deutsche Forschungsgemeinschaft (Fo244/3–3). We would like to thank
Roland Deutsch, Jens Förster, Michael Häfner, Jennifer Mayer, Thomas Mussweiler, Roland Neumann, and Beate Seibt as
well as our reviewers, especially Rüdiger Pohl and Ulrich Hoffrage, for their helpful comments. We would also like to thank
Markus Denzler, Sonja Eiden, Stephanie Floter, Susanne Faber, Claudius Goring, and Bärbel Schöppner for collecting the data.
In this situation, ambiguity derives from two sources: First, the questions are deliberately chosen to be difficult so that people
cannot readily know the answer; second, the task itself is unusual and difficult: Participants are given the actual value or actual
outcome but have to ignore it. Therefore, lack of knowledge and experienced difficulty of the judgemental task result in
ambiguity of the situation and thus lead to uncertainty. This view of the knew-it-all-along effect as stemming from uncertainty
is consistent with several experimental findings: the magnitude of the knew-it-all-along effect increases with the ambiguity
of the task (Creyer & Ross, 1993), but decreases when counterfactual thinking is required (Arkes, Faust, & Hart, 1988;
Davies, 1992). Nevertheless, there is still no consensus about the underlying processes.

Approaches to the knew-it-all-along effect


Previous approaches. The traditional approach to the knew-it-all-along effect in the literature is that it reflects a creeping
determinism (Fischhoff, 1975). Proponents of this view argue that the information that participants receive about the correct
estimate is automatically assimilated to previous knowledge; consequently, unbiased knowledge is no longer accessible once
the correct estimate is encoded (e.g., Fischhoff, 1977; Leary, 1981).3
Other approaches assume that the knew-it-all-along effect is the result of anchoring (Connolly & Bukszar, 1990; Hardt &
Pohl, 2003-this issue; Sharpe & Adair, 1993; Tversky & Kahneman, 1974) or accessibility (Agans & Shaffer, 1994; Janoff-
Bulman, Timko, & Carli, 1985; Sharpe & Adair, 1993) caused by the provided value. According to these approaches, once
the actual value is encoded, only actual-value-consistent knowledge is activated; thus, the assimilation is due to selective
knowledge activation (Mussweiler & Strack, 1999a, 1999b, 2000; Pohl, Eisenhauer, & Hardt, 2003-this issue).
But there are findings that do not fit these approaches. An automatic assimilation of knowledge cannot explain why the
magnitude of the knew-it-all-along effect decreases when the provided value is perceived as highly incredible (e.g., Hardt &
Pohl, 2003-this issue; Pohl, 1998), or when counterfactual thinking is required (Arkes et al., 1988; Davies, 1992). Moreover, it
cannot account for a reversed knew-it-all-along effect (“I never would have known it!”) that is produced in the case of
extremely difficult items (Hudson & Campion, 1994). However, these findings do fit into an inferential approach to the knew-
it-all-along effect that we will focus on in the following.
Inferential approach. According to our approach, individuals may use inferential strategies in order to arrive at the
judgement in the knew-it-all-along paradigm. This approach is based on the assumption of previous reconstruction theories
(Erdfelder & Buchner, 1998; Stahlberg & Maass, 1998; see also Hertwig, Fanselow, & Hoffrage, 2003-this issue; Schwarz &
Stahlberg, 2003-this issue) that people construct their hypothetical judgement in the knew-it-all-along paradigm using the
correct answer they have been told as a cue for inferring what they would have said without this information. How does this
inferential process operate? As outlined above, various sources, for example ease of retrieval or perceptual fluency, are used
for drawing inferences in a judgement process. It is reasonable to assume that judgements in the knew-it-all-along paradigm
are similarly affected by the ease with which a potential answer comes to mind or the ease with which the construction
process is carried out.
Moreover, it has been shown that people express stronger confidence in answers they retrieve more quickly, regardless of
whether those answers are correct or incorrect (Nelson & Narens, 1990). Participants’ judgemental confidence is based on a
simple heuristic: Answers that come to mind easily are more likely to be correct than those that take longer to retrieve. Following
this logic, participants experiencing ease or high confidence in generating the answer should misattribute this feeling to the
target of judgement (e.g., “This question seems so familiar to me, I’m sure I would have known the solution!”) and
overestimate how much they would have known; the result is an assimilation towards the provided value. Conversely,
participants experiencing difficulty or low confidence in reconstructing the answer should misattribute this feeling to the
target of judgement (e.g., “This question is so difficult for me, I would have never known the answer!”) and overestimate how
much they would not have known; the result is less assimilation towards the provided value, no assimilation at all, or even a
contrast effect. Thus, assimilation and contrast are seen as the results of an inference process based on feelings of confidence.
This fundamental inference process should be applicable to various kinds of items (numeric items, forced-choice items,
probability items, etc.).

1 Two experimental designs have been employed to study effects that are variously referred to as hindsight bias and the knew-it-all-along
effect. Following Hertwig, Gigerenzer, and Hoffrage (1997), we confine the use of the term “hindsight bias” to effects obtained in designs
in which recalled judgements are studied (referred to as memory designs; see Werth, Strack, & Förster, 2002), and use the term
“knew-it-all-along effect” (see Stahlberg & Maass, 1998) for designs in which hypothetical judgements are studied (referred to as
hypothetical designs). Because the procedures of the memory-based and the hypothetical design differ, different processes are involved in
each, and it is reasonable to investigate the underlying processes separately. The explanatory approach proposed here is therefore
restricted to the hypothetical design.
2 The term actual value does not imply that the feedback is the correct answer, but that participants were led to believe that it was the
correct answer to the question.
3 Since empirical evidence for a motivational account (e.g., Campbell & Tesser, 1983) of the knew-it-all-along effect is generally weak (see
Christensen-Szalanski & Willham, 1991; Hawkins & Hastie, 1990), we will restrict ourselves to the cognitive accounts.
To sum up, the knew-it-all-along effect is likely to occur when the judgemental situation is ambiguous and people tend to
reduce this ambiguity by drawing inferences. We assume that people distort their judgement towards the provided value
whenever a feeling of confidence4 is (mis)attributed to the target of judgement. According to our theorising, an inference
implying high confidence with the task results in the knew-it-all-along effect, whereas an inference implying low confidence
actually leads to a reversed or at least to a reduced knew-it-all-along effect. These assumptions will be tested in the following
studies. While Study 1 measured feelings of confidence directly, Study 2 induced feelings of confidence via a perceptual-
fluency manipulation.

STUDY 1
Participants’ feeling of confidence was measured directly using a rating scale. We expected that answers given with high
confidence would evoke a stronger knew-it-all-along effect (“I would have known that!”), whereas a smaller, absent, or
reversed knew-it-all-along effect (“I would have never known that!”) was hypothesised for answers given with low
confidence.

Method
Because the two studies are similar with respect to dependent measures and measurement of the knew-it-all-along effect, the
method is described in detail here; any deviations in the second study from the measures specified here will be described as
they occur.
Participants. Eighteen University of Würzburg undergraduates majoring in different fields participated in the study. They
were recruited for a study evaluating stimulus material for quiz shows and were tested in groups of up to six people.
Participants were paid 10 DM (approximately 5 US$) for their participation.
Stimulus material. Participants worked on 40 test questions. All questions were difficult encyclopedia questions requiring
numerical answers. Examples: “How many 5-DM coins fit into a 10 litre bucket?” or “How many paintings did Albrecht
Dürer create?”. These questions were selected on the basis of the results of a preliminary testing carried out on 50
participants. From the pre-tested questions we selected only those that none of the participants had answered correctly. All
questions were presented in a fixed order.
In order to investigate inferential processes in the knew-it-all-along paradigm, we used a special procedure that departs
from the one typically employed in previous studies. It may be assumed that the greater the distance between the provided
value and the participant’s answer, the less confident the participant will be (Werth et al., 2002). Therefore, rather than
presenting all participants with the true values, the provided values were determined based on a calibration group. For each
question, participants received an “actual value” that was either 20% higher or 20% lower than the mean estimate of a
calibration group (N=50). (To give an example: For the question “How many currencies exist worldwide?” the low actual
value was 290, the high one 436.) This allowed us to dispense with a control group and to compare, for each question, the
knew-it-all-along effect for a high actual value with that for a low actual value.
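To make the ±20% rule concrete with the example just given: a low value of 290 and a high value of 436 jointly imply a
calibration-group mean of roughly 363 currencies, since 363 × 0.8 ≈ 290 and 363 × 1.2 ≈ 436 (this mean is inferred here from
the reported values; it is not a figure stated by the authors).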
Half of the questions included a high actual value; the other half a low one (the assignment of low and high actual values
was counterbalanced across participants). Separately for each participant, questions with high versus low confidence level
were computed by a median split.
Design. The result is a 2×2 factorial design with the within-factor “actual value” (high vs low) and the a posteriori
determined within-factor “confidence level” (high vs low).
Procedure. Participants were asked to work on a pretest of stimulus material for quiz shows. They were told
that they would have to answer difficult questions. They were given the actual value to each question because many
participants supposedly liked to know the answers, but they were required to answer the questions as they would have without
knowing the actual value. For each estimate a confidence rating was required (“How certain are you about your answer?”) on
a scale ranging from 1 (= extremely uncertain) to 7 (= extremely certain). (The confidence question referred to the
experienced confidence in the actual judgement process.) There was no time limit on the task.

4 Those feelings might refer to the presented question, to the required answer, or to both.

Dependent measures. Participants’ estimates were analysed in two different ways. First, in order to determine the influence
of the actual value, an analysis of the absolute estimates was carried out. To pool estimates across different domains, all estimates
were transformed into z-scores. A positive z-score indicates a high estimate, a negative z-score a low estimate. These scores
were averaged across questions per participant and condition. Analysis of the absolute estimates shows whether or not the
estimates were influenced by the actual values (that is, when given a high actual value, participants’ estimates should be
higher than in the case of low actual values), but it cannot determine the extent of that influence. To give an example: given
the high actual value of 436, Participant A might estimate 790 as the number of currencies worldwide, whereas Participant B
might say 490. Estimate A is much higher in absolute terms than estimate B, although it is less strongly assimilated towards
the actual value. In order to determine the amount of assimilation we therefore need a second dependent measure.
In order to determine the extent of the assimilation towards the actual values, we carried out an analysis of the deviated
distance between the actual values and the estimates. This distance was computed as the absolute difference between each
estimate and the corresponding actual value and then transformed into z-scores. The more negative the z-score, that is, the
smaller the distance, the stronger the assimilation towards the actual value. These scores were then averaged across questions
per participant and condition. Alpha was fixed at the 5% level.
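To make the two measures concrete, the following is a minimal sketch of the data preparation just described; it is not the
authors’ analysis code, and the file and column names (participant, question, estimate, actual_value, value_condition,
confidence) are hypothetical.

```python
# Minimal sketch of the two dependent measures described above (not the
# authors' code). Assumes a long-format table with one row per
# participant x question; file and column names are hypothetical.
import pandas as pd

def zscore(x):
    # z-transformation (population SD) of a pandas Series
    return (x - x.mean()) / x.std(ddof=0)

df = pd.read_csv("study1_long.csv")   # participant, question, estimate,
                                      # actual_value, value_condition, confidence

# Measure 1: absolute estimates, z-transformed within each question so that
# answers from different numerical domains can be pooled.
df["z_estimate"] = df.groupby("question")["estimate"].transform(zscore)

# Measure 2: "deviated distance" = absolute difference between the estimate
# and the provided actual value, z-transformed within each question.
# More negative scores mean a smaller distance, i.e., stronger assimilation.
df["distance"] = (df["estimate"] - df["actual_value"]).abs()
df["z_distance"] = df.groupby("question")["distance"].transform(zscore)

# A posteriori confidence factor: median split of the confidence ratings,
# computed separately for each participant.
participant_median = df.groupby("participant")["confidence"].transform("median")
df["confidence_level"] = (df["confidence"] > participant_median).map(
    {True: "high", False: "low"})

# Average both measures across questions per participant and design cell.
cell_means = (df.groupby(["participant", "value_condition", "confidence_level"])
                [["z_estimate", "z_distance"]]
                .mean()
                .reset_index())
print(cell_means.head())
```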

Results
Absolute estimates. To test whether the estimates would be biased towards the provided actual values, we conducted a
repeated measures analysis of variance (ANOVA). As Table 1 indicates, the typical knew-it-all-along effect was obtained.
When given a high actual value, participants’ estimates were accordingly higher (M = 0.20) than in the case of low actual
values (M = −0.20), F(1, 17) = 36.12, p < .001. Because this pattern was more pronounced in the case of high as opposed
to low confidence (due to more extreme estimates), an interaction effect between confidence level and actual value was also
obtained, F(1, 17) = 7.35, p < .015. The simple contrasts between low and high actual values were both significant: on
questions answered with high confidence, t(16) = −5.91, p < .001; on questions answered with low confidence, t(16) = −2.85,
p < .011. The main effect of confidence level did not reach statistical significance, F < 2.31.
TABLE 1
Absolute estimates (z-transformed) and distances (z-transformed) between estimates and actual values, by actual value and
confidence (Study 1). Standard deviations in parentheses.

                                                 Confidence
                                            High              Low
Absolute estimate
  High actual value                        .33 (.33)         .07 (.31)
  Low actual value                        −.24 (.24)        −.15 (.28)
Distance between actual value and estimate
  High actual value                       −.20 (.44)         .06 (.26)
  Low actual value                        −.30 (.44)         .21 (.35)

Deviated distance to the actual value. To test whether confidence modulates the magnitude of the knew-it-all-along effect,
we conducted a repeated measures ANOVA on the deviated distances between estimates and actual values. Table 1
indicates that the degree of confidence had a positive effect on the extent of the knew-it-all-along effect: the estimates were
more strongly assimilated towards the actual value when participants experienced high confidence (M = −0.25) than when
they experienced low confidence (M = 0.14), F(1, 17) = 22.45, p < .001. This pattern was more pronounced in the case of low
actual values as compared to high actual values, F(1, 17) = 5.51, p < .031, for the interaction.5 This interaction effect was not
predicted and thus will not be further interpreted. There was no main effect of actual value, F < 1.
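For completeness, a 2 × 2 repeated-measures ANOVA of the kind reported here could be run on such per-participant cell
means as sketched below; this is an illustration under the same hypothetical file and column names, not the authors’ analysis.

```python
# Illustrative 2 x 2 repeated-measures ANOVA on per-participant cell means
# (hypothetical file and column names; not the authors' analysis code).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

cell_means = pd.read_csv("study1_cell_means.csv")
# columns: participant, value_condition, confidence_level, z_estimate, z_distance

for dv in ["z_estimate", "z_distance"]:
    result = AnovaRM(data=cell_means, depvar=dv, subject="participant",
                     within=["value_condition", "confidence_level"]).fit()
    print(f"--- {dv} ---")
    print(result)
```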

Discussion
The results show that feelings of confidence influenced the extent of the knew-it-all-along effect. While the analysis of the
absolute estimates showed that the estimates were influenced by the actual values (when given a high actual value,
participants’ estimates were higher than in the case of low actual values), the deviated distance between the actual values and
the estimates determined the extent of the assimilation towards the actual values. Compared to participants experiencing low
confidence in generating the answer, participants experiencing high confidence more strongly assimilated their judgement
towards the provided value. The combination of both measures allows a more detailed understanding of the knew-it-all-along
effect than the exclusive interpretation of estimates. Whereas the latter shows that the provided values influenced the
estimates, the distance measure shows how this influence worked.
In this study, however, participants had to generate their answer before rating their confidence in it. The causal direction is
therefore ambiguous: judged confidence could be a result of having already moved one’s judgement towards the actual value,
rather than the assimilation resulting from participants’ confidence. If that were the case, the observed relation between
confidence and assimilation would be an artifact of the measurement procedure (the order of answer generation and
confidence judgement).
To rule out this alternative explanation and to overcome the post hoc nature of the confidence measure, an experimental
induction of confidence was warranted. The second study therefore sought to replicate the results of the first using induced
rather than (a posteriori) measured confidence.

STUDY 2
As outlined earlier, it has been shown that people express stronger confidence in answers they retrieve more quickly (Nelson &
Narens, 1990). Experienced perceptual fluency affects the ease with which the judgement process is carried out and thus
increases confidence in a judgement (see, e.g., Busey, Tunnicliff, Loftus, & Loftus, 1995). Therefore, this study used a perceptual
fluency manipulation adapted from Reber and Schwarz (1999; see also Whittlesea, Jacoby, & Girard, 1990). Questions were
presented in colours that made them more or less easy to perceive against a coloured background, a procedure that
manipulated ease of processing. We expected that the same question would evoke a stronger knew-it-all-along effect (“I
would have known that!”) when it was easy rather than difficult to process. A smaller, no, or reversed knew-it-all-along effect
(“I would have never known that!”) was hypothesised in the case of questions that were difficult to see.
Following this reasoning, participants experiencing ease in generating the answer should misattribute this feeling to the
target of judgement (e.g., “This question seems so familiar to me, I’m sure I would have known the solution!”) and
overestimate how much they would have known.

Method
Participants. A total of 96 University of Würzburg undergraduates majoring in psychology participated in the study. They
were recruited for a series of studies on computer-based working conditions and tested in groups of up to four people.
Participants were paid 20 DM (approximately 11 US$) for their participation.
Stimulus material. Participants worked on the same 40 questions as in the previous study (e.g., “How many paintings did
Albrecht Dürer create?”). All questions were presented on a computer in random order. There was no time restriction.
Perceptual fluency of the items was manipulated by the contrast of the question colour to the background. Questions of
easy visibility (high perceptual fluency) were yellow on a green background, whereas questions of difficult visibility (low
perceptual fluency) were yellow on a red background. Half of the questions were presented in the “easy visibility” condition
(10 with a low and 10 with a high actual value), the other half were presented in the “difficult visibility” condition.
Thus, there were four different versions of the stimulus material; each question was presented with a low actual value in
two of these versions and with a high actual value in the other two versions, and it was presented with easy visibility in one
version and difficult visibility in the other. The conditions were counterbalanced across participants.
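One way to realise the counterbalancing scheme just described is sketched below; this illustrates the logic only (it is not the
authors’ materials-construction procedure), and the question labels are hypothetical.

```python
# Illustrative construction of four counterbalanced stimulus versions: across
# the four versions each question occurs once in every cell of the 2 x 2
# (actual value x visibility) design, and each version contains 10 questions
# per cell. A sketch of the logic only, not the authors' procedure.
questions = [f"Q{i:02d}" for i in range(1, 41)]           # the 40 test questions
cells = [("high", "easy"), ("high", "difficult"),
         ("low", "easy"), ("low", "difficult")]           # (actual value, visibility)

versions = {v: [] for v in range(1, 5)}
for idx, q in enumerate(questions):
    for v in range(4):
        value_condition, visibility = cells[(idx + v) % 4]
        versions[v + 1].append((q, value_condition, visibility))

for v, items in versions.items():
    n_high = sum(1 for _, value, _ in items if value == "high")
    n_easy = sum(1 for _, _, vis in items if vis == "easy")
    print(f"Version {v}: {len(items)} questions, "
          f"{n_high} with a high actual value, {n_easy} easy to read")
```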
Design. This procedure results in a 2×2 factorial design with the within-factors “perceptual fluency” (easy vs difficult
visibility) and actual value (high vs low actual value).
Procedure. Participants were informed that they were taking part in a study investigating different computer-based working
conditions, to which end general knowledge questions were presented in different colours and types (in their condition only
the colour was varied). Participants were told that they were given the actual value to each question because many participants
liked to know the answers, but that they were required to answer the questions as they would have done had they not known
the actual value. Questioning of the participants after the experiment revealed that they did not suspect that we were interested
in effects of perceptual fluency (or visibility, readability) on their estimates or on the distance by which the estimates deviated
from the actual value.

5Because the contrasts between low and high confidence were both significant, the interaction does not question the interpretation: low
actual values: t(17)=6.15, p < .001; high actual values: t(17)=2.33, p < .032.

Participants were instructed to report when they could not read a question. Three participants were not able to read the
difficult visibility questions, so we adjusted the contrast on the computer screen for these persons. Thus, it was guaranteed
that all questions were readable by all participants.
Dependent measures. The same dependent measures as in the previous study were used. Absolute estimates and distances
between the “actual values” and the estimates were calculated according to the procedure explained in Study 1.

Results
Absolute estimates. As Table 2 shows, overall, high actual values led to higher absolute estimates (M = 0.05) than low actual
values (M = −0.06), F(1, 95) = 15.05, p < .001, thus confirming the hypothesis that the actual values had a biasing effect on
the estimates. There was neither a main effect of nor an interaction with perceptual fluency, both Fs < 1.
TABLE 2
Absolute estimates (z-transformed) and distances (z-transformed) between estimates and actual values, by actual value and
perceptual fluency (Study 2). Standard deviations in parentheses.

                                                 Perceptual fluency
                                            High              Low
Absolute estimate
  High actual value                        .08 (.32)         .05 (.34)
  Low actual value                        −.07 (.28)        −.04 (.32)
Distance between actual value and estimate
  High actual value                       −.02 (.39)         .03 (.40)
  Low actual value                        −.06 (.34)         .06 (.48)

Deviated distance to the actual value. Table 2 also shows the distances between estimates and actual values. The
expectation was that the assimilation towards the actual value would be modulated by perceptual fluency. Estimates for easy
visibility questions (high fluency, M = −0.04) were more strongly assimilated towards the actual values than estimates for
difficult visibility questions (M = 0.04), F(1, 95) = 5.86, p < .017. Neither the interaction effect nor the main effect of “actual
value” reached significance, both Fs < 1.

Discussion
The data show that perceptual fluency affected participants’ estimates: participants experiencing high perceptual fluency
assimilated their judgements towards the provided value, whereas participants experiencing low perceptual fluency showed a
smaller assimilation effect towards the provided value.
Although the effect is smaller in Study 2 than in Study 1 (a typical finding for perceptual fluency manipulations, given the
subtle nature of the induction procedure), this does not call the replicated pattern into question.
It is, however, necessary to consider the possibility that the items in the difficult visibility condition were simply
unreadable. After the screen contrast had been adjusted in a few cases, no participant reported that a question was not
readable at all (although all were instructed to do so). More importantly, if a question was not readable, the same would have
been true for the provided actual value. Nevertheless, the actual values of the difficult visibility questions had an effect on
participants’ estimates, which means that participants must have encoded them.

GENERAL DISCUSSION
To sum up, these studies showed that subjective feelings moderate the knew-it-all-along effect: Whereas a strong knew-it-all-
along effect was obtained for participants experiencing high confidence or high perceptual fluency, a smaller assimilation
effect was obtained for participants experiencing low confidence or low perceptual fluency. Thus, participants used feelings
as bases for inferring their judgements.

Theoretical implications
The relation between confidence and the amount of the knew-it-all-along effect has been assumed and discussed by various
authors (Christensen-Szalanski & Willham, 1991; Hawkins & Hastie, 1990; Hoch & Loewenstein, 1989; Hom & Ciaramitaro,
2001; Jacowitz & Kahneman, 1995; Winman, Juslin, & Björkman, 1998). Hoch and Loewenstein (1989) as well as Hom and
Ciaramitaro (2001), for example, assumed that some kind of metacognitive knowledge (resulting from feelings of confidence
or self-perceptions) moderates the knew-it-all-along effect. As was shown in the literature, metacognitive knowledge is an
important base for drawing inferences (Förster & Strack, 1998; Strack & Förster, 1998; Werth, 1998; Werth & Förster, 2002;
Werth et al., 2002). The present studies provided empirical evidence for this view and demonstrated how this process may
work.
Moreover, the present studies fit well with the assumptions of Stahlberg and Maass (1998; Schwarz & Stahlberg, 2003-this
issue) and Pezzo (2003-this issue). According to Stahlberg and Maass, deliberate judgements made while the answer is
generated may be responsible for the knew-it-all-along effect: individuals generate their judgements assuming that their
estimates are at least partially correct, and therefore feel justified in placing their answer somewhere near the actual value.
Similarly, Pezzo (2003-this issue) has shown that surprise might also be used as a cue for (re)constructing the answer. The
present paper provides empirical evidence that sheds light on the potentially underlying processes; it may thus be understood
as linking this assumption of metacognitive processes (in the special case of confidence) to an inferential approach to the
knew-it-all-along effect.
Neither the creeping-determinism approach to the knew-it-all-along effect nor anchoring effects can account for the present
data. The automatic assimilation of the actual value into pre-existing knowledge, as well as anchoring effects, should not
depend on feelings or other experiences. Within these approaches it seems difficult to explain why automatic knowledge
assimilation should have been stronger or more probable for the items answered with high confidence than for the other items.
That is why neither the assumption of automatic knowledge assimilation nor anchoring can on its own provide a sufficient explanation.
But one might speculate that inferential processes might moderate or interact with anchoring effects.
The present studies, in line with others (see also Werth, 1998; Werth et al., 2002), seem to be a promising starting point for
linking existing findings on the interplay of feelings and decision making, and of inferences and anchoring.

Conclusions
The work presented here was designed to specify the mechanisms underlying the knew-it-all-along effect. With regard to the
question of how hypothetical judgements are generated, we propose a constructive process that comprises both experiences
and inferences. The results of the present studies demonstrated a stronger knew-it-all-along effect when decision makers used
their feelings of confidence or fluency as a base to draw inferences about the presumed estimate.
This finding raises the question whether subjective feelings always improve judgements. The answer is no. In particular,
the validity of a judgement based on a feeling depends on the validity of the attribution of this feeling, for example, on its
representativeness for the target of judgement (Schwarz et al., 1991). It has been shown that the influence of feelings used “as
information” was eliminated when their informational value was discredited (Schwarz et al., 1991). Thus, when the
diagnosticity of one’s own feelings for the judgement is called into question, feelings should not be used for drawing
inferences.
Moreover, familiarity and perceptual fluency are only two examples of feelings that may be used as a basis for judgements;
other subjective experiences, such as a feeling of knowing or ease of retrieval, may similarly cause a knew-it-all-along effect
(e.g., Schwarz et al., 1991). Apart from subjective feelings, participants also use theories for drawing inferences, for example,
theories about their own performance or the performance of people in general (see Förster & Strack, 1998; Werth, Strack, &
Förster, 2001, 2002). Here again, the validity of the judgement depends on the validity of its base. Thus, research may benefit
from paying closer attention to experiences, feelings, or theories as bases from which to draw inferences.

REFERENCES

Agans, R.P., & Shaffer, L.S. (1994). The hindsight bias: The role of availability heuristic and perceived risk. Basic and Applied Social
Psychology, 15, 439–449.
Arkes, H.R., Faust, D., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305–307.
Busey, T.A., Tunnicliff, J., Loftus, G.R., & Loftus, E.F. (1995). Accounts of the confidence-accuracy relation in recognition memory.
Psychonomic Bulletin and Review, 7, 26–48.
Campbell, J.D., & Tesser, A. (1983). Motivational interpretations of hindsight bias: An individual difference analysis. Journal of
Personality, 51, 605–620.

Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Clore, G.L., & Parrott, W.G. (1994). Cognitive feelings and metacognitive judgments. European Journal of Social Psychology, 24,
101–115.
Connolly, T., & Bukszar, E. (1990). Hindsight bias: Self-flattery or cognitive error? Journal of Behavioral Decision Making, 3, 205–211.
Creyer, E., & Ross, W.T.Jr. (1993). Hindsight bias and inferences in choice: The mediating effect of cognitive effort. Organizational
Behavior and Human Decision Processes, 55, 61–77.
Davies, M.F. (1992). Field dependence and hindsight bias: Cognitive restructuring and the generation of reasons. Journal of Research in
Personality, 26, 58– 74.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and
reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischhoff, B. (1975). Hindsight ≠ Foresight: The effect of outcome knowledge on judgement under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Förster, J., & Strack, F. (1998). Subjective theories about encoding may influence recognition. Judgmental regulation in human memory.
Social Cognition, 16, 78–92.
Hardt, O., & Pohl, R.F. (2003). Hindsight bias as a function of anchor distance and anchor plausibility. Memory, 11, 379–394.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and heuristics affect our reconstruction of the past.
Memory, 11, 357–377.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hoch, S.J., & Loewenstein, G.F. (1989). Outcome feedback: Hindsight and information. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 15, 605–619.
Hom, H.L., & Ciaramitaro, M. (2001). GTIDHNIHS: I-knew-it-all-along. Applied Cognitive Psychology, 15, 493–507.
Hudson, J.P. Jr., & Campion, J.E. (1994). Hindsight bias in an application of the Angoff method for setting cutoff scores. Journal of
Applied Psychology, 79, 860–865.
Jacoby, L.L., Kelley, C.M., Brown, J., & Jasechko, J. (1989a). Becoming famous overnight: Limits on the ability to avoid unconscious
influences of the past. Journal of Personality and Social Psychology, 56, 326–338.
Jacoby, L.L., Woloshyn, V., & Kelley, C. (1989b). Becoming famous without being recognized: Unconscious influences of memory
produced by dividing attention. Journal of Experimental Psychology: General, 118, 115–125.
Jacowitz, K.E., & Kahneman, D. (1995). Measures of anchoring in estimation tasks. Personality and Social Psychology Bulletin, 21,
1161–1166.
Janoff-Bulman, R., Timko, C., & Carli, L.L. (1985). Cognitive biases in blaming the victim. Journal of Experimental Social Psychology,
27, 161–177.
Kelley, C.M., & Lindsay, D.S. (1993). Remembering mistaken for knowing: Ease of retrieval as a basis for confidence in answers to
general knowledge questions. Journal of Memory and Language, 32, 1–24.
Koriat, A. (1998). Metamemory: The feeling of knowing and its vagaries. In J.G.Adair & F.I.M.Craik (Eds.), Advances in psychological
sciences, Vol. 2: Biological and cognitive aspects (pp. 461–479). Hove, UK: Psychology Press.
Koriat, A., & Goldsmith, M. (1998). The role of meta-cognitive processes in the regulation of memory performance. In G.Mazzoni &
T.Nelson (Eds.), Metacognition and cognitive neuropsychology: Monitoring and control processes (pp. 97–118). Mahwah, NJ:
Lawrence Erlbaum Associates Inc.
Leary, R.L. (1981). The distorted nature of hindsight. Journal of Social Psychology, 115, 25–29.
Mussweiler, T., & Strack, F. (1999a). Comparing is believing: A selective accessibility model of judgmental anchoring. In W.Stroebe &
M.Hewstone (Eds.), European review of social psychology (Vol. 10, pp. 135–167). Chichester, UK: Wiley.
Mussweiler, T., & Strack, F. (1999b). Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective
accessibility model. Journal of Experimental Social Psychology, 35, 136–164.
Mussweiler, T., & Strack, F. (2000). Numeric judgment under uncertainty: The role of knowledge in anchoring. Journal of Experimental
Social Psychology, 36, 495–518.
Nelson, T.O., Gerler, D., & Narens, L. (1984). Accuracy of feeling-of-knowing judgments for predicting perceptual identification and
relearning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 279–288.
Nelson, T.O., & Narens, L. (1990). Metamemory: A theoretical framework and some new findings. In G.H.Bower (Ed.), The psychology of
learning and motivation (Vol. 26, pp. 125–173). San Diego, CA: Academic Press.
Pezzo, M. (2003). Surprise, defence, or making sense: What removes hindsight bias? Memory, 11, 421–441.
Pohl, R.F. (1998). The effects of feedback source and plausibility on hindsight bias. European Journal of Cognitive Psychology, 10,
191–212.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. Consciousness and Cognition: An International
Journal, 8, 338–342.

Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look
at the availability heuristic. Journal of Personality and Social Psychology, 61, 195–202.
Schwarz, N., & Clore, G.L. (1983). Mood, misattribution, and judgments of well-being: Informative and directive functions of affective
states. Journal of Personality and Social Psychology, 45, 513–523.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Sharpe, D., & Adair, J.G. (1993). Reversibility of the hindsight bias: Manipulation of experimental demands. Organizational Behavior and
Human Decision Processes, 56, 233–245.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
review of social psychology (Vol. 8, pp. 105–132). Chichester, UK: Wiley.
Strack, F., & Förster, J. (1998). Self reflection and recognition: The role of metacognitive knowledge in the attribution of the recollective
experience. Personality and Social Psychology Review, 2, 111–123.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 124– 131.
Werth, L. (1998). Ein inferentieller Erklärungsansatz des Rückschaufehlers. Der Rückschaufehler: Ein Effekt sowohl zu hoher als auch zu
geringer Urteilssicherheit [An inferential explanation of the hindsight bias], Hamburg: Kovac.
Werth, L., & Förster, J. (2002). Implicit person theories influence memory judgments: The circumstances under which metacognitive
knowledge is used. European Journal of Social Psychology, 32, 353–362.
Werth, L., Strack, F., & Förster, J. (2001). Social influence and suggestibility. In V.DePascal, V.A. Gheorghiu, P.W.Sheehan, & I.Kirsch
(Eds.), Hypnosis international monographs: Vol 4, Suggestion and suggestibility. Theory and research (pp. 153–166). München:
M.E.G.-Stiftung.
Werth, L., Strack, F. & Förster, J. (2002). Certainty and uncertainty: The two faces of the hindsight bias. Organizational Behavior and Human
Decision Processes, 87, 323–341.
Whittlesea, B.W.A., Jacoby, L.L., & Girard, K. (1990). Illusions of immediate memory: Evidence of an attributional basis for feelings of
familiarity and perceptual quality. Journal of Memory and Language, 29, 716–732.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence-hindsight mirror effect in judgment: An accuracy-assessment model for the
knew-it-all-along phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 415–431.
Surprise, defence, or making sense: What removes hindsight bias?
Mark V.Pezzo
University of South Florida, USA

This paper examines predictions concerning the absence of hindsight bias. Some hypothesise that because
hindsight bias increases with outcome “surprisingness”, only unsurprising outcomes will remove it. Others
suggest the opposite—that very surprising outcomes will reduce or reverse the bias. A proposed sense-making
model suggests that unexpected outcomes (i.e., initially surprising) invoke greater sensemaking, which typically
produces greater hindsight bias. If the process is not successful, however, the bias may be reduced or reversed.
Expected outcomes will also produce little hindsight bias, but only because they invoke relatively little
sensemaking in the first place. Feelings of surprise arising from sensemaking (i.e., resultant surprise) should be
inversely related to hindsight bias. Results of four experiments provide support for the model. A secondary goal
was to determine the boundaries of a defensive-processing mechanism also thought to reduce hindsight bias for
negative, self-relevant outcomes. Results suggest that a sense of responsibility for the outcome may be necessary
for defensive processing to be activated.

“I was surprised, but not that surprised. I mean, it makes sense.”


—from an internet chat.

When musing over past events, how inevitable do they seem? A sizeable literature on the hindsight bias tells us that we are
disposed to believe we “knew it all along” that such events would occur. In his seminal article, Fischhoff (1975) argued that
this inclination is caused by a relatively automatic and unconscious sense-making process (creeping determinism) that focuses
attention on outcome-consistent information and away from outcome-inconsistent information. More than two decades of
research, and over 120 published articles1 provide a wealth of evidence for the ubiquity of the hindsight bias—it has been
shown in medical diagnoses, presidential elections, legal decisions, accounting, sporting events, and myriad other domains
(see Christensen-Szalanski & Willham, 1991, and Hawkins & Hastie, 1990, for reviews). This paper, however, addresses
those instances in which the hindsight bias does not occur.
Consider a stockbroker who advises her clients, in good faith, to purchase many shares of a certain stock, but then learns that
the stock has “crashed”. Will the stockbroker later believe that she knew that this would happen? Or did voters supporting
President Clinton respond that they “knew it all along” that he was having an affair with Monica Lewinsky? Surely there are
times when we don’t think that we knew it all along. With few exceptions, however, the extant literature has not addressed the
conditions under which we do not exhibit the hindsight bias.


1 Christensen-Szalanski and Willham (1991) found 40 articles, reporting 128 experiments on hindsight bias. A recent search revealed over
120 articles, a likely underestimation, as articles can use terms besides “hindsight bias” (e.g., Alicke, Davis, & Pezzo, 1994; Tan & Lipe,
1997).
Requests for reprints should be sent to Mark V. Pezzo, Department of Psychology, University of South Florida, 140 7th Ave.
South, St. Petersburg, FL 33701, USA. Email: pezzo@stpt.usf.edu
I thank Stephanie Parks, Hal Arkes, Mark Leary, Eric Stone, Emily Page, Maria Watson, William Fleeson, John Seta, and three
anonymous reviewers for their helpful comments and suggestions. I also wish to thank the many lab assistants who helped
conduct this research: Adrianne Malanos, Jill Rosa, Peter Rives, Emily Greenwood, Matt Morris, Andrew Adamsbaum, Neal
Wildrick, Karen Manship, David James, Laurie Best, Bethany Wolf, and Ellen J. Godfrey.

Some researchers have effectively reduced the hindsight bias by forcing participants to consider how alternative outcomes
might have occurred (Arkes, Faust, Guilmette, & Hart, 1988; Davies, 1987; Nario & Branscombe, 1995; Slovic & Fischhoff,
1977). Although this approach is effective, it gives no indication of when people might spontaneously avoid the bias, either by
ignoring the outcome altogether, or by considering information that is incongruent with it.

THE ROLE OF SELF-RELEVANT OUTCOMES


Mark and Mellor (1991) suggest that outcome self-relevance might play an important role in determining whether or not the
hindsight bias occurs. They found that people who lost their jobs following a company layoff reported the layoff as less
foreseeable than did either community members or employees who did not lose their jobs. Mark and Mellor held that only the
laid-off workers did not exhibit the bias because they were motivated to reduce their own sense of culpability for not having
acted to circumvent their layoff. However, their study provides only indirect evidence for this because they included no pre-
outcome control condition. Participants were asked how “foreseeable” the outcome was only after the layoff had taken place.
If laid-off workers felt the outcome was not particularly foreseeable before the layoff as well, then no hindsight bias would be
said to exist. A regression-discontinuity technique was used to help reduce this possibility by statistically controlling for a
number of demographic factors among the different groups, but the inclusion of a no-outcome control condition would
provide more compelling evidence for such a claim.
A few researchers have used the no-outcome control design to examine the defensive-processing mechanism proposed by
Mark and Mellor (1991). Pezzo (1996) hypothesised that highly committed decision makers would exhibit less hindsight bias
if the outcome contradicted their decision. In two separate experiments, however, he found equivocal evidence for such an
effect. Louie (1999), by contrast, found that business students who believed that their stock choice had failed exhibited no
hindsight bias, but students whose stock succeeded did exhibit the bias. Similarly, Louie, Curren, and Harich (2000) found
that MBA students showed hindsight bias when the outcome was favourable to their team’s decision, but not when it was
unfavourable.
An interesting question for the defensive-processing hypothesis is whether a sense of culpability or responsibility for the
negative outcome is necessary to produce the effect. Such a requirement is implied by the work of Mark and Mellor and
others (e.g., Louie et al., 2000; Markman & Tetlock, 2000), but has never been directly tested. This is important, because
there exist negative outcomes that can be both upsetting and relevant to one’s self-esteem, but for which one would not feel
culpable (Cialdini, Borden, Thorne, Walker, Freeman, & Sloan, 1976). Would such an outcome be sufficient to activate the
defensive-processing mechanism?

THE ROLE OF SURPRISE


Perhaps more important for the defensive-processing approach than the issue of culpability is the fact that, in addition to being
threatening, negative outcomes are often more surprising than positive outcomes (Taylor & Brown, 1988). Additionally, there
is reason to believe that a self-relevant event, such as the layoff in Mark and Mellor’s (1991) study, may be more surprising
than one that is not self-relevant. For example, Falk (1989) found that coincidences with high personal significance were
judged more surprising than those with low personal significance, which were judged less likely than coincidences that
happen to other people. There is evidence to suggest that this effect is robust (cf. Zakay, 1984), and thus any examination of
self-relevance should probably also take into consideration perceptions of outcome surprisingness.
What is the effect of surprise on the hindsight bias? Christensen-Szalanski and Willham (1991) note that “nearly all
[hindsight] researchers… have disregarded the possible moderating effect of an event’s surprisingness on the hindsight bias,
and the few studies that do attend to an event’s surprisingness do not present the data that are needed to evaluate the
hypothesized moderating influence” (p. 152). In the decade since that article, however, a few researchers have made explicit
predictions concerning the effect of surprise.
Mazursky and Ofir (1990, 1996) suggested that an extremely surprising outcome, alone, could reduce or even reverse the
hindsight bias. They theorised that a surprising outcome triggers “special processing” which, along with feelings of surprise,
can cancel or reverse the effects of hindsight bias. They suggest that this processing involves a greater number of attempts at
explanation, recall, and justification, which may offset the typical effects of creeping determinism. Note also that, according
to their model, moderately or completely unsurprising outcomes should produce typical hindsight biases. In three separate
experiments, using evaluations of medical decisions, consumer products, and modern art, they showed that people receiving
extremely surprising outcomes gave hindsight estimates that were in the “reverse” direction (Ofir & Mazursky, 1997). That
is, likelihood estimates for highly expected outcomes that did not occur were even larger in hindsight. Reverse hindsight bias
is somewhat controversial, because previous claims (Guerin, 1982; Mazursky & Ofir, 1990; Verplanken & Pieters, 1988)
have been criticised for misinterpretation of the data (Arkes, 1988; Hawkins & Hastie, 1990; Mark & Mellor, 1994). Still,
others have reported such effects (Choi & Nisbett, 2000; Haslam & Jayasinghe, 1995; Menec & Weiner, 2000; Pezzo, 1996;
Winman, 1997).
Despite the intuitive appeal of reverse hindsight bias, others have argued that surprising outcomes should produce greater
rather than less hindsight bias (cf. Roese & Olson, 1996). As Mazursky and Ofir (1996) have noted, considerable evidence
exists indicating that unexpected outcomes produce the greatest search for causal antecedents (Hastie, 1984; Pyszczynski &
Greenberg, 1981; Sanna & Turley, 1996; Weiner, 1985). But this type of “sense-making” activity has been argued, by most,
to produce hindsight bias rather than remove it (Fischhoff, 1975; Roese & Maniar, 1997; Schkade & Kilbourne, 1991;
Wasserman, Lempert, & Hastie, 1991). From this logic it follows that outcomes that are most unexpected should produce the
greatest hindsight bias.
Conversely, outcomes that are congruent with expectations should produce little hindsight bias because they require little
or no search for causal antecedents. This prediction has received some empirical support; for example, Cannon and Quinsey
(1995, Study 1) found that the occurrence of an outcome that was generally expected (estimated likelihood=70%) did not
produce hindsight bias, but its unexpected non-occurrence (estimated likelihood=30%) did produce a significant hindsight
bias.
Some indirect evidence indicating a positive relationship between surprise and hindsight magnitude can be gleaned from
studies using almanac trivia questions. People typically exhibit greater hindsight bias when they are presented with difficult or
misleading questions, which are thought to be more surprising than easy questions (Christensen-Szalanski & Willham, 1991;
Fischhoff, 1977; Hoch & Loewenstein 1989; Winman, 1997).2 However, the best example of this relationship comes from
Schkade and Kilbourne (1991). Using business scenarios, they created high and low expectations for a positive outcome by
manipulating performance history and employee behaviour. A surprising outcome was defined as incongruent with a priori
expectations. Hindsight bias was relatively large for incongruent outcomes, but was smaller and sometimes eliminated for
congruent outcomes. These findings are consistent with the idea that unexpected outcomes produce greater sense-making
activity (Weiner, 1985). In response, Ofir and Mazursky (1997) suggest that Schkade and Kilbourne may not have used
outcomes that reached the threshold of surprise sufficient to offset or reverse the bias. Whether or not this is the case, it is
notable that Ofir and Mazursky have demonstrated reductions and even reversals of the bias.

A SENSE-MAKING MODEL OF HINDSIGHT BIAS


One possible solution requires that we consider two meanings for the term “surprise”. The most common usage defines a
surprising outcome as one that is incongruent with a priori expectations (Fischhoff & Beyth, 1975; Schkade & Kilbourne,
1991). Such surprise could be thought of as “initial” surprise, and it is this type of surprise that triggers sense-making activity
(Hastie, 1984; Pyszczynski & Greenberg, 1981; Weiner, 1985). The second meaning for surprise refers to the
phenomenological “feeling” that results from the sense-making process. Some initially surprising outcomes are surely easier
to make sense of than others. Those that are more difficult, however, likely result in conscious awareness of the incongruity,
and it is this awareness that we may call “resultant” surprise. Thus, an originally incongruent outcome (i.e., “initially”
surprising) could ultimately produce different levels of resultant surprise depending on the effectiveness of the sense-making
process.3
The model of this process is presented in Figure 1. It is hypothesised that unexpected outcomes spontaneously engage a sense-
making process, which, if successful, will produce hindsight bias. Although there may be multiple ways of “making sense” of
an outcome, the model assumes that the magnitude of the bias is a function of the relative ease with which causal antecedents
are uncovered (Roese & Olson, 1996). If an outcome is relatively easy to make sense of, it will produce a hindsight bias, and
should seem relatively unsurprising.
Note that the outcome will not necessarily seem completely unsurprising. Hindsight bias merely requires a difference
between pre- and post-outcome likelihood estimates, not a feeling that the person “knew it all along”. For example, if an initial
likelihood judgement is only 5% for some event, but later is recalled as 15%, hindsight bias would be said to exist, but no one
would claim to have “known all along” that the outcome was going to occur. Thus the outcome would still be perceived as
surprising but, importantly, less so.
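To make the arithmetic of this point concrete, a minimal sketch in Python follows; the 5% and 15% figures are simply the hypothetical values from the example above, not data.

# Within-subject hindsight bias: the recalled estimate minus the original one.
# Values are the hypothetical ones from the example in the text.
foresight_estimate = 0.05   # original likelihood judgement for the event
recalled_estimate = 0.15    # the same judgement as recalled after the outcome
hindsight_bias = recalled_estimate - foresight_estimate
print(f"hindsight bias = {hindsight_bias:+.2f}")   # +0.10: bias exists, yet 15%
# is still far from "knew it all along", so the outcome remains surprising.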
However, if an outcome produces an effortful search that is not successful, this should reduce, remove, or even reverse hindsight bias, and produce resultant surprise levels that are relatively high. Essentially, surprise should result from the growing awareness that the sense-making process is not reaching closure. Finally, the model also states that outcomes that are expected, or obvious, should produce modest (if any) hindsight and little surprise. Because the outcomes already make sense, the processing thought to produce hindsight bias is less likely to be activated (Schkade & Kilbourne, 1991).

2 Hoch and Loewenstein (1989) have been cited as evidence for a reverse hindsight bias with difficult items (e.g., Hawkins & Hastie, 1990; Louie, 1999; Louie et al., 2000; Pohl, 1998). In fact, Hoch and Loewenstein found that difficult items most often led to greater, not less, hindsight bias. The confusion may have occurred because difficult items were also found to reduce participants’ overconfidence—a related, but distinct, measure of cognitive bias.

Figure 1. A sense-making model of hindsight bias. Hindsight bias is produced by making sense of unexpected outcomes. It is significantly
reduced by unexpected outcomes that do not make sense, and removed by expected outcomes that do not require sensemaking. Resultant
feelings of surprise are negatively correlated with hindsight bias because they are reduced by successful sensemaking. Note that relatively
few outcomes in the past literature have been completely expected.

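As a compact illustration, the qualitative branching of the model in Figure 1 can be sketched as follows. This is only a reading of the figure in code form, with labels of my own choosing, not an implementation used in the experiments reported here.

def sense_making_model(outcome_expected: bool, sense_making_succeeds: bool):
    """Return (predicted hindsight bias, predicted resultant surprise).

    A qualitative sketch of Figure 1: expected outcomes trigger little
    sense-making; unexpected outcomes produce hindsight bias only when the
    search for causal antecedents reaches closure.
    """
    if outcome_expected:
        return ("modest, if any", "low")              # no search is triggered
    if sense_making_succeeds:
        return ("typical hindsight bias", "reduced")  # creeping determinism
    return ("reduced, removed, or reversed", "high")  # search fails to close

for expected in (True, False):
    for succeeds in (True, False):
        print(expected, succeeds, sense_making_model(expected, succeeds))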
The sense-making model can be applied to a number of findings in the hindsight literature. For example, Wasserman et al.
(1991) found that providing “chance” explanations for an outcome (e.g., tornado) produced a smaller hindsight bias than
providing typical “deterministic” explanations. Similarly, Tan and Lipe (1997) showed that outcomes that were perceived as
uncontrollable (due to their unpredictability) produced smaller hindsight effects. Both results could be explained in the
present model by noting that, because chance factors have no causal antecedents, they are probably difficult if not impossible
to make sense of. Pohl (1998) also reported that certain types of feedback believed to be “implausible” might be less
susceptible to hindsight effects.
Roese and Maniar (1997) provide evidence for two aspects of the model. First, they found that in games for which
Northwestern’s football team was expected to lose, their (unexpected) wins produced large hindsight effects in fans.
However, by the third game of their study, as expectations for success had grown considerably, the team’s win produced no
hindsight effect at all—presumably because the now-expected win required little sense-making activity. In some conditions, however, they induced participants to engage in causal thinking either by (a) having them “go over the game over and over in their heads”, or (b) thinking “if only” and imagining how the game might have turned out differently. Both techniques
produced an increase in causal thoughts, and both produced significant increases in the hindsight bias for all games (see also
Roese & Olson, 1996, Study 1).4
An examination of Ofir and Mazursky’s (1997) stimulus materials suggests that in all three of their experiments they presented outcomes that appear to be very difficult to make sense of. In Experiment 1, for example, a medical patient is said to die, but, similar to Wasserman et al.'s (1991) study, there are no causal antecedents (other than chance) that would explain why the patient would die.

3 Ofir and Mazursky (1997) clearly state that “acknowledged surprise” is associated with the reversal of the hindsight bias (p. 52). What is
less clear is whether they view this form of surprise as the cause, result, or independent of sense-making activity. The present model
suggests that acknowledged surprise is likely the result of a (failed) sense-making process.

Although the very unexpected death is sure to engage intense sense-making activity, it is unlikely
that this activity will be successful because all of the evidence points to the patient living. Further, this process could produce
the “reverse” hindsight bias that Ofir and Mazursky found. Because the only information available to participants points to the
alternative outcome (patient lives), the increased processing caused by the unexpected outcome will produce likelihood
estimates for the expected outcome that are even higher than they were before it was found that the opposite outcome
occurred.
Although Ofir and Mazursky (1997) propose a similar mechanism, the current model may present a more parsimonious
approach. Ofir and Mazursky claim that very surprising outcomes activate additional causal processing that, when combined
with acknowledged surprise, offsets or reverses the effects of typical hindsight processing. Rather than invoke two competing
processes, the present model suggests that only one process (sense-making) is at work and that this process must be
successful for hindsight bias to occur, regardless of how initially surprising the event may be.
It might be worth considering what role defensive processing would play in the proposed model. Typically, we respond to
threatening outcomes by forming excuses to reduce our perceived responsibility for the outcome (Markman & Tetlock, 2000).
Such behaviour has been shown to reduce hindsight for measures that imply responsibility—such as the “foreseeability”
ratings used by Mark and his colleagues (Mark & Mellor, 1991, Mark, Boburka, Eyssell, Cohen, & Mellor, 2003-this issue).
Consider this, however: according to Snyder and Higgins (1988) much of excuse making involves shifting attributions for
negative personal outcomes from internal (“I’m incompetent”) to external (“I’m overworked and tired”). One intriguing
possibility is that although shifting from internal to external causes may reduce perceived blame for an event, it does not
necessarily reduce the perceived likelihood of the event. In fact, a search for external causes for an event should still amass
more causal antecedents for that outcome than if one had not tried to make excuses. The ironic result may be that although
defensive processing might help to reduce one’s sense of culpability, it might also increase hindsight bias as measured by
likelihood ratings (cf. Roese & Maniar, 1997). Such a possibility presents interesting directions for future research.

THE PRESENT RESEARCH


The present research will test predictions concerning the absence of hindsight bias derived from currently competing theories
and from the proposed sense-making model. The primary goal of this research is to untangle the seemingly disparate
predictions concerning outcome “surprisingness”. A secondary goal of this research is to examine the limits of the defensive-processing hypothesis. Although defensive-processing models appear to require culpability as a precondition (cf. Mark et al., 2003-this issue), this requirement has not been directly tested. Does defensive processing occur for upsetting outcomes for which one is not
responsible?
The first of the present experiments compares predictions of two groups of loyal college basketball fans—one in foresight
and the other in hindsight—to test the effects of both outcome expectedness and self-relevance in a real-world setting. The
second and third experiments manipulate self-relevant outcome information via a bogus “cognitive abilities test” to examine
these same effects, but in a laboratory setting. A fourth experiment presents relatively unexpected psychological research
outcomes that are not self-relevant, but are either relatively difficult or easy to understand.

EXPERIMENT 1
Sports contests have proven to be successful outlets for testing hindsight predictions (e.g., Leary, 1981; Roese & Maniar,
1997). In the present research, the availability of both home and visiting fans provides an opportunity to examine the effects
of both naturally occurring differences in outcome expectations and the self-relevant implications of those outcomes. First,
presumably home fans would think that a home team win was more likely than would visiting fans. Second, such a win has
different connotations for home versus visiting fans.
Roese and Maniar (1997) found that participants induced to generate causal explanations for football game outcomes
exhibited the greatest hindsight bias. Because unexpected sports outcomes have been shown to spontaneously produce greater
sense-making activity (Lau & Russell, 1980), it is predicted that the fans of whichever team receives an unexpected outcome (regardless of whether it is a win or loss) will exhibit greater hindsight bias. Conversely, if either team’s fans receive an outcome that is expected, they should exhibit no hindsight bias. And if either team’s fans receive a very unexpected outcome—particularly one that is difficult to make sense of—these fans are also not expected to exhibit the bias.

4 Note that “if only” counterfactual processing is different from the “consider the opposite” approach (Arkes et al., 1988; Slovic & Fischhoff, 1977). The latter approach asks people to consider a different outcome occurring under the same conditions. This weakens the
causal link between the actual outcome and pre-existing conditions. “If only” processing, however, asks people to consider a different
outcome occurring under different conditions. This strengthens the causal link between the outcome and pre-existing conditions, producing
greater hindsight. When spontaneously making sense of an outcome, this form of “if only” processing is typically invoked (Roese & Olson,
1996).

What predictions might a defensive-processing hypothesis make? Although sporting events do not produce outcomes for
which fans feel any responsibility, Cialdini et al. (1976) have shown that sports can have a significant effect on fans’ self-
esteem even though fans play no role in determining the outcome. Leary (1981) predicted that greater self-esteem concerns
(e.g., public responding, high ego-involvement) would increase the hindsight bias for college football fans, because it is self-serving to act as though one “knew all along” what would happen. However, his results showed no effect of such motives on likelihood estimates. Defensive processing might still have played a role, however, because this study did not
record whether participants were home or visiting team fans, and thus could not take into account the self-relevant “meaning”
of the outcome. Roese and Maniar (1997) also did not distinguish between home and visiting fans, noting that “nearly all” of
their participants were home team fans (p. 1251). Had the distinction been made, the defensive-processing hypothesis might
predict that fans of the losing team would exhibit less hindsight bias than fans of the winning team. If such an effect were not
found, this could suggest that culpability is a necessary condition for activation of the defensive-processing mechanism.

Method
We approached 281 people immediately before or after three college basketball games. The home team (Wake Forest) played
three different visiting teams (Duke, University of North Carolina, and North Carolina State). College basketball was chosen
because of strong fan involvement. After indicating their team allegiance, participants were asked “Using a value from 0–
100%, how likely is it (was it) that the home team, Wake Forest, will (would) win this game?”. Hindsight bias was defined as
the difference between pre- and post-game estimates. A between-subjects design was used in that different participants were
(randomly) assigned to the pre-game and post-game conditions. During this season, Wake Forest had 23 wins and 5 losses,
and won all three of the games in the study. Assistants helping to collect data observed that the losing team fans in all three
games appeared visibly upset by their team’s failure.
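A minimal sketch of how this between-subjects measure could be computed is given below; the data frame and its values are invented for illustration and are not the study's data.

import pandas as pd

# One row per respondent: fan allegiance, and whether the 0-100% likelihood
# judgement was given before (foresight) or after (hindsight) the game.
df = pd.DataFrame({
    "fan":        ["home", "home", "visiting", "visiting"],
    "time":       ["foresight", "hindsight", "foresight", "hindsight"],
    "likelihood": [82.0, 80.0, 44.0, 52.0],
})

# Between-subjects hindsight bias: hindsight-group mean minus foresight-group
# mean, computed separately for home and visiting fans.
means = df.groupby(["fan", "time"])["likelihood"].mean().unstack("time")
print(means["hindsight"] - means["foresight"])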

Results
A three-way ANOVA was performed on participants’ likelihood judgements with team (Duke, NC State, or UNC), fan (home
or visiting), and time (response given before or after game) as the independent variables. Table 1 shows the mean, median,
and standard deviation values for the three teams. No predictions were made concerning the team variable, and it did not
significantly interact with any other variables, all ps > .51. Thus, in order to increase power, all further analyses collapsed
across this variable. Visiting fans were significantly less likely (M=44%) than home team fans (M=82%) to believe in
foresight that Wake Forest would win the game, F(1, 269)=237.8, p=.001.

TABLE 1 Estimated likelihood of home team win (0–100%)

                               Game
Condition              Duke   NC State    UNC
Home fans
  Foresight    Md        80         85     75
               M       82.7       85.1   78.6
               SD      11.1        9.5   17.8
               n         21         30     27
  Hindsight    Md        80         85     77
               M       80.9       82.4   74.9
               SD      12.2       12.2   23.7
               n         21         28     22
Opposing fans
  Foresight    Md        40         50     38
               M       38.8       47.0   32.5
               SD      15.8       22.9   26.8
               n         17         32     26
  Hindsight    Md        50         65     50
               M       49.3       52.7   40.0
               SD      12.2       33.3   25.7
               n         15         19     23

Most important was a significant two-way interaction of fan × time, indicating that the home and visiting fans exhibited hindsight biases of different magnitudes, F(1, 269)=4.78, p=.03. Planned comparisons showed that although the visiting team
fans exhibited marginally significant hindsight effects, F(1, 269)=3.51, p=.07, the home team fans did not exhibit any
hindsight bias, p=.43.5
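For readers who want to reproduce this kind of analysis, a hedged sketch of the 3 (team) × 2 (fan) × 2 (time) between-subjects ANOVA follows, using simulated data and statsmodels; this is one way to run such an analysis, not necessarily the software or exact procedure used in the original study.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated stand-in data: one row per respondent.
rng = np.random.default_rng(0)
rows = []
for team in ("Duke", "NC State", "UNC"):
    for fan in ("home", "visiting"):
        for time in ("foresight", "hindsight"):
            base = 80 if fan == "home" else 45
            rows += [{"team": team, "fan": fan, "time": time,
                      "likelihood": float(np.clip(rng.normal(base, 15), 0, 100))}
                     for _ in range(20)]
df = pd.DataFrame(rows)

model = smf.ols("likelihood ~ C(team) * C(fan) * C(time)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)

# Partial eta-squared for each effect: SS_effect / (SS_effect + SS_residual).
resid_ss = anova.loc["Residual", "sum_sq"]
anova["partial_eta_sq"] = anova["sum_sq"] / (anova["sum_sq"] + resid_ss)
print(anova.round(3))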

Discussion
The present study indicates that, under certain conditions, salient real-world outcomes can fail to produce any significant
hindsight bias. Because home and visiting team fans gave different foresight estimates (82% and 44%, respectively), this
study provides an opportunity to examine the effect of varying levels of unexpectedness on the hindsight bias. In particular,
home team college basketball fans, who expected their team to win each game, did not exhibit a hindsight bias. Visiting team
fans, however, who did not expect Wake Forest to win, exhibited a marginally significant hindsight bias.6 These results are
consistent with the model’s prediction that relatively unexpected outcomes produce hindsight bias, but expected outcomes do
not (cf. Cannon & Quinsey, 1997, Study 1). However, the results do not appear to support Ofir and Mazursky’s (1997) main
prediction that surprising outcomes will produce reverse hindsight bias. Of course, this may be because the outcomes in this
study were not sufficiently surprising to invoke any additional processing to offset the bias. Recall, however, that Ofir and
Mazursky also predict that unsurprising outcomes will produce hindsight. This did not occur in the present study. The highly
expected win for the home team fans failed to produce any hindsight bias.
One concern in this study is that the lack of hindsight bias for the home team fans could be caused by ceiling effects.
Although it is true that the home team fans gave relatively high estimates in foresight (M=82%, Md=83%), there is arguably
still “room” for fans asked in hindsight to give even larger estimates. This did not occur, however, and in fact their mean and
median likelihood estimates for a win indicated a small “reverse” hindsight bias for all three games, although not a
statistically significant one. Thus, although a ceiling effect cannot be completely ruled out, it seems unlikely.
Finally, consider the results from a motivational perspective. This study indicates that a negative outcome, even for
dedicated college basketball fans, is not sufficient to remove the hindsight bias. This may present a limitation for the
defensive-processing hypothesis. That is, a sense of culpability for a negative outcome, rather than just self-relevance, may be
necessary for the defensive-processing mechanism to be activated. If feelings of culpability do not exist, then the process at
work may be primarily cognitive in nature (Fischhoff, 1977; Schkade & Kilbourne, 1991).

EXPERIMENT 2
Use of real-world outcomes in the first experiment produced some design limitations. First, although the college basketball
games used in Experiment 1 did produce strong emotional reactions in fans and could even have had implications for those
fans’ self-esteem, no one could claim that the fans were responsible for the outcomes. Second, participants were not randomly
assigned to each outcome condition. Third, no explicit measures of sensemaking, defensive thinking, or resultant surprise
were included. Experiment 2 attempts to resolve these issues, in a laboratory setting, by randomly assigning participants to
receive either positive or negative test-performance feedback, for which they would presumably feel responsible, and
recording both participants’ thoughts about the outcome and their resultant surprise at it.
Participants in Experiment 2 received bogus performance feedback designed to be either threatening or not threatening to
participants’ self-image. Outcome expectations were manipulated by giving feedback that was either consistent or
inconsistent with participants’ a priori belief about their own performance abilities. An explicit measure of resultant surprise
was included, and an attempt to measure sense-making activity was achieved by coding participants’ thoughts as either
consistent or inconsistent with the outcome. Defensive processing was measured by counting the number of excuses participants provided. As an additional measure of the effects of defensive processing, participants’ responses were compared to those of an additional set of (yoked) control participants, who had no personal investment in the outcomes, and thus would not be expected to exhibit such processing.

5 The same analyses using non-parametric tests produced virtually identical p-values: .08 and .55, respectively.
6 As one reviewer notes, the NC State fans may have reduced this effect somewhat. These fans were not as confident that their team would win, and probably did not find their team’s loss to be as unexpected—and thus exhibited a smaller hindsight bias.


Method
Participants. Introductory psychology students were screened for academic self-esteem (Fleming & Courtney, 1984).
Students scoring in either the upper or lower third of the distribution were invited to participate in the experiment on the
condition that they bring an acquaintance who was not enrolled in introductory psychology. A total of 145 student-
acquaintance pairs participated.
Procedure. All participants were given a “Cognitive Abilities” test, and led to believe that it was an excellent predictor of
college performance. Much of the bogus test was modelled after items in Raven’s Progressive Matrices (Raven, Raven, &
Court, 1995), although some additional word problems were included. After initial instructions were given to all participants,
the acquaintance was taken to a separate room ostensibly to create a less distracting test environment for both participants. In
fact, this room had a one-way mirror that allowed the acquaintance to secretly observe the actor as he or she took the test. The
actor was positioned close enough to the mirror that the observer could read the test and see the responses that the actor gave.
In addition, the observer was provided with his/her own copy of the test. Thus, the observer served as a yoked control—able
to read along and see the answers given by the actor, but with no personal investment in the outcome. Similar results for both
actors and observers in a given condition (deemed threatening to actors) would weaken any claim of defensive processing.
Actors were given 20 minutes to complete the test. The experimenter then removed the test and answer sheet, ostensibly to
score it on a computer.
Outcome feedback. Five minutes later the experimenter returned with a printout indicating the participant’s percentile rank
on the test compared to other students at the university. Actors were either given negative feedback (30th percentile), positive
feedback (90th percentile), or no feedback (control). The observer, who remained in the adjacent room during this time, was
also shown a copy of the actor’s feedback sheet.
Thought listing. Participants were then given five minutes to record any thoughts (one per line) about their test performance
and feedback. Observers listed thoughts about the actor’s performance. Participants were then asked to indicate next to each
thought whether it was suggestive of a good performance, a poor performance, or irrelevant.7 Irrelevant thoughts were not
analysed. Some examples of reasons for a poor performance in this study were:

I’m horrible with word problems!
Other people are able to see patterns better than me.
My SAT scores are pretty consistent with this (bad outcome).

Examples of reasons for a good performance in this study were:

I’ve always been good at this stuff.
The problems seemed easy.
I checked my answers twice—I think they’re mostly right.

Hindsight measure. Actors given no performance feedback were asked to indicate, “What percentile rank did you think you
would fall into right after you completed this test?” Actors given feedback were asked to indicate how they thought they
would have responded before receiving feedback. Observers also gave estimates for the actor they observed.
Resultant surprise. Actors and observers receiving outcome information were asked to rate their resultant surprise at the
outcome using a 9-point scale with anchors of not at all and extremely surprising. The presentation order for surprise and
hindsight measures was counterbalanced to control for carryover effects.
Additional measures. Actors and observers were asked, also using a 9-point scale, to rate the following items: (a) test
difficulty, (b) their general test-taking abilities, (c) the other person’s general test-taking abilities. These questions appeared
randomly in between surprise and hindsight ratings.
Suspicion. After completing all of the measures, all participants were asked to indicate whether or not they were suspicious
of any of the procedures in the experiment, particularly the test and feedback they received. If either the actor or the observer
in a yoked pair indicated moderate or greater suspicion, the pair was not included in the analyses described below.

Results
A total of 19 actor/observer pairs were dropped from the analyses. Approximately half of these were due to record-keeping
errors or failure to complete the questionnaire. The other half were due to suspicion of the procedures, usually voiced by
observers. There remained 126 pairs of participants.
Manipulation checks. Regardless of feedback, low self-esteem actors rated their test-taking abilities lower than did high
self-esteem actors (Ms=4.9 vs 6.5, respectively), p < .001, d=1.0. Actors receiving negative feedback rated the
test more difficult (M=6.5) than actors receiving positive feedback (M=5.5), t(80)=2.62, p=.011, d=0.57. Control actors with
low self-esteem expected to perform worse than did those with high self-esteem, t(42)=3.05, p=.004, d = 0.94. We may
surmise, then, that low self-esteem actors found the positive outcome to be more unexpected and that high self-esteem actors
found the negative outcome more unexpected. Finally, all observers rated the test to be moderately difficult (M=6.2) and rated
the actors to have moderate test-taking abilities (M=6.5), regardless of which condition they were in.
Basic hindsight measure. Table 2 presents cell means, medians, and standard deviations for actors’ and observers’
estimates of the actors’ performance on the cognitive abilities test (in percentile units).
Actors. Four planned comparisons were performed comparing both positive and negative outcome conditions with their same-self-esteem controls. The hindsight bias was not found for outcomes that were generally congruent with participants’ expectations, but was found for those that were relatively incongruent. Predictions made by low self-esteem actors receiving a negative outcome did not differ from those made by participants in the control condition, p=.89, but participants receiving the positive outcome gave significantly larger estimates than controls, p=.002, d=1.22.

TABLE 2 Estimates of actors’ test score (percentile rank) as a function of actor self-esteem (SE) and outcome

                                   Outcome
                     Negative (30%)   Control   Positive (90%)
Actors’ estimates
  Low SE     Md           50             50           70
             M            49.1a          49.7a        65.1b
             SD           13.2           15.4         17.4
             n            22             22           21
  High SE    Md           50             60           70
             M            51.7a          64.1b        68.3b
             SD           18.1           16.0         16.4
             n            19             22           20
Observers’ estimates
             Md           50             70           80
             M            51.0a          65.3b        74.9c
             SD           11.3           14.9         13.4
             n            41             44           41

Estimates range from 0 to 100%. All comparisons are made within a self-esteem condition (i.e., row) across outcomes. Cells sharing a superscript do not differ significantly at the .05 level.

For high self-esteem actors, this pattern was reversed. Those receiving the positive outcome did not differ from the control group, p=.42, but those receiving the negative outcome gave significantly lower estimates, p=.014, d=0.73.

Observers. The outcomes used in this experiment were not self-relevant for observers. Although some actor/observer pairs
were friends, the majority were mere acquaintances. Thus, observers were not expected to be aware of actors’ self-esteem
concerns, nor were they expected to have much personal stake in actors’ test results. As predicted then, there was no main
effect of self-esteem, nor any interaction with self-esteem and outcome, ps=.72 and .46, respectively. Values in Table 2 are
collapsed across self-esteem for observers. The results indicate typical hindsight biases. Compared to the control condition,
observers receiving negative feedback (about actor’s performance) gave lower performance predictions, p < .001, d=1.1, and
those receiving positive feedback gave higher performance predictions, p=.003, d=0.68.

7 The “self-coding” approach was used because participants had more insight into the meaning of their idiosyncratically written thoughts
than we did. Later, an independent coder, blind to condition, recoded the thoughts. Agreement with participants’ own categories was 93%.

Resultant surprise ratings. Actors. All actors reported moderate levels of surprise. As might be expected, low self-esteem
actors reported being significantly more surprised by positive outcomes (M=7.1, SD=1.5) than by negative outcomes (M=5.7, SD=2.4), t(41)=2.28, p=.03, d=0.69. However, high self-esteem actors did not differ significantly in their reported surprise levels for positive (M=6.2, SD=2.1) and negative outcomes (M=5.2, SD=2.4), p=.19.

Observers. Observers reported that the negative outcome was more surprising (M=6.5, SD = 1.8) than the positive outcome
(M=4.8, SD = 2.74), t(80)=9.53, p=.003, d=0.70. No other effects were significant, Fs < 1.
Correlations between hindsight magnitude and resultant surprise. The two primary predictions from the model are that (a)
hindsight bias would be greatest in conditions with unexpected outcomes, and that (b) resultant surprise and hindsight bias
would be inversely related. Analysis of the likelihood estimates for both actors and observers provides support for the first
prediction. To determine correlations between surprise and hindsight bias, however, a single value indicating the magnitude
of the hindsight bias must be calculated for each participant. This was achieved by subtracting the prediction of each
participant receiving an outcome from the mean of his or her (high or low self-esteem) control group. The resulting value was
multiplied by –1 for positive outcomes so it would be positive for typical hindsight effects, but negative for a reverse hindsight
effect. The same procedure was used with observers, although their control condition did not separate into high and low self-
esteem.
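A minimal sketch of this scoring rule follows; the control mean shown is the low self-esteem control value from Table 2, while the individual estimates are invented for illustration.

def hindsight_magnitude(estimate, control_mean, outcome):
    """Single hindsight-bias score per participant, as described above.

    Positive values indicate a typical hindsight effect, negative values a
    reverse effect. 'outcome' is "negative" or "positive" feedback.
    """
    score = control_mean - estimate
    if outcome == "positive":
        score *= -1          # flip sign so a typical bias is positive here too
    return score

print(round(hindsight_magnitude(40, 49.7, "negative"), 1))   # 9.7  -> typical bias
print(round(hindsight_magnitude(60, 49.7, "positive"), 1))   # 10.3 -> typical bias
print(round(hindsight_magnitude(45, 49.7, "positive"), 1))   # -4.7 -> reverse bias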
As predicted, increases in hindsight were associated with decreases in resultant surprise for both actors and observers, both ps < .001. Similar correlations were found within each of the four cells created by the 2×2 crossing of self-esteem and
outcome valence, all ps < .02. That is, regardless of whether outcomes were generally congruent or incongruent, people who
found them to still be surprising after 5 minutes of thought showed less hindsight bias. This result is consistent with Ofir and
Mazursky’s (1997) finding that surprising outcomes produce little or reverse hindsight bias. Indeed, scatter plots of hindsight
magnitude and resultant surprise ratings for each cell (not shown) consistently indicate that participants expressing the most
surprise showed a “reverse” hindsight bias, in that their hindsight magnitude scores were negative.
Thought listing procedure. Thought listings were included because they might provide an indication of the ease with which
participants made sense of the outcomes. For example, thoughts consistent with the outcome (e.g., reasons for poor
performance given after negative outcome) might indicate successful sense-making activity, while inconsistent ones might
indicate sense-making difficulty.
Actors. Low and high self-esteem actors did not differ in the total number of thoughts listed (Ms=7.3 and 7.6, respectively), p=.18. Table 3 reports the proportion of total thoughts of each type (e.g., reasons for good or poor performance) in each cell.
Positive outcomes produced no significant differences in thoughts compared to those reported by control participants.
Negative outcomes, however, exerted a significant and similar effect on both low and high self-esteem participants. Compared
to same-self-esteem controls, participants receiving negative outcomes thought of significantly more reasons why they would
perform poorly, and significantly fewer reasons why they would perform well. Although such a finding is consistent with the
model for high self-esteem participants—they showed hindsight bias for negative outcomes, but not for positive outcomes—it
does not appear to explain the opposite results of low self-esteem participants.
Perhaps a better way to detect the effects of sensemaking is to examine within-cell correlations between hindsight magnitude, surprise, and the proportion of outcome-consistent and inconsistent thoughts.

TABLE 3 Proportion of thoughts predictive of good and poor test score as a function of self-esteem and outcome

                              Outcome
                     Negative   Control   Positive
Low self-esteem
  Good score    M      .29a       .45b      .43b
                SD     .19        .17       .23
  Poor score    M      .67a       .54b      .46b
                SD     .18        .17       .22
High self-esteem
  Good score    M      .32a       .53b      .54b
                SD     .17        .16       .16
  Poor score    M      .59a       .46b      .45b
                SD     .21        .16       .14

Values in the same row with different superscripts are significantly different, p<.05. Column values for a given self-esteem condition do not add to 100% because irrelevant thoughts are excluded.

We would expect the proportion of outcome-consistent thoughts (e.g., reasons for good performance following positive feedback) to be positively correlated with hindsight magnitude and negatively correlated with resultant surprise. Conversely, we would expect exactly the opposite for inconsistent thoughts—they should be negatively correlated with hindsight magnitude and positively correlated with resultant surprise.8 Indeed, this was found for high self-esteem actors in all conditions. All eight predicted correlations were in the expected direction, all ps < .03. Again, however, the results were less clear for low self-esteem participants. Although all correlations were in the expected direction, only three of the eight approached significance. First, as predicted, hindsight magnitude was negatively correlated with inconsistent thoughts, but only for positive outcomes, p=.02. Further, and as predicted, resultant surprise was negatively correlated with outcome-consistent thoughts, p=.01, and marginally correlated with outcome-inconsistent thoughts, r=.40, p=.07, but each of these only occurred for negative outcomes.
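A hedged sketch of this within-cell correlational approach is given below; the column names and values are hypothetical, and scipy's Pearson correlation is used as a stand-in for whatever software produced the reported values.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical per-actor records for one self-esteem x outcome cell.
df = pd.DataFrame({
    "self_esteem":         ["high"] * 6,
    "outcome":             ["negative"] * 6,
    "prop_consistent":     [0.70, 0.55, 0.60, 0.40, 0.35, 0.65],
    "hindsight_magnitude": [12.0, 6.0, 9.0, -2.0, -4.0, 10.0],
    "resultant_surprise":  [3, 5, 4, 8, 7, 4],
})

# Within each cell, correlate thought consistency with hindsight magnitude
# (expected positive) and with resultant surprise (expected negative).
for (se, outcome), cell in df.groupby(["self_esteem", "outcome"]):
    r_hind, _ = pearsonr(cell["prop_consistent"], cell["hindsight_magnitude"])
    r_surp, _ = pearsonr(cell["prop_consistent"], cell["resultant_surprise"])
    print(se, outcome, round(r_hind, 2), round(r_surp, 2))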
Observers. Similar findings were obtained for the thought listings of observers. Except for the negative outcome, consistent
thoughts were positively correlated with hindsight bias and negatively correlated with resultant surprise, while inconsistent
thoughts showed the opposite effect, all rs=.33 or greater, all ps < .035. In the negative outcome condition, hindsight
magnitude was not significantly correlated with either type of thought, both ps > .30.
Thus, for both actors and observers, significant correlations between thoughts and hindsight (where they were found)
indicated that people who spontaneously generated more outcome-inconsistent thoughts exhibited reduced hindsight bias—a
finding consistent with past literature (e.g., Arkes et al., 1988), and with the present model. It is unclear, however, why these
correlations were not consistently obtained, although one possible explanation will be discussed in the following section.

Discussion
Experiment 2 was designed to test predictions concerning the effects of both defensive processing and the unexpectedness of outcome information on the presence of the hindsight bias.
Defensive processing. Defensive processing is believed to be activated by people’s desire to reduce their sense of
culpability for a negative outcome (Louie et al., 2000; Mark & Mellor, 1991). Thus, the defensive-processing mechanism may
only be activated when the participants could feel responsible for causing the outcome they receive. Although participants in
Experiment 1 would not be expected to experience such feelings, those in Experiment 2 should feel responsible for their own
good or poor performance on a “cognitive abilities” test. To the extent that this is true, the defensive-processing hypothesis
predicts that participants in both low and high self-esteem conditions would deny the foreseeability of any negative feedback
they get about their test performance—these people should exhibit little or no hindsight bias for negative feedback. It is also
reasonable to predict that positive self-relevant outcomes will produce typical hindsight bias because it is to participants’
advantage to enhance the foreseeability of their causing such events (Louie et al., 2000). Finally, yoked control participants
should exhibit typical hindsight bias for both negative and positive outcomes, because neither outcome has implications for
their self-image.
The data do not provide unequivocal evidence for a defensive-processing mechanism. Although low self-esteem
participants did not show the bias when given negative performance feedback (as predicted by defensive processing), high
self-esteem participants did exhibit a significant hindsight bias for negative feedback. These data might be taken to indicate an
interesting qualification of the defensive-processing hypothesis: Perhaps only low self-esteem individuals utilise defensive
processing when faced with threatening outcomes. Although previous research indicates that it is usually high, rather than low,
self-esteem individuals who are most likely to employ defensive mechanisms in the face of threatening outcome information
(Agostinelli, Sherman, Presson, & Chassin, 1992; Brown & Gallagher, 1992), this possibility should be examined.
To test this idea, two independent raters counted the number of reasons given by actors for their poor performance that
could be classified as “excuses”—thoughts that shifted blame for a poor performance from internal to external causes (e.g.,
“This desk is too small to write on”, “That fan is distracting”, etc.) (cf. Markman & Tetlock, 2000). Inter-rater agreement was
high (89%) and discussion resolved any differences. High and low self-esteem participants did not differ in the proportion of
total thoughts classified as excuses (Ms=50% and 52%, respectively), p=.78. Further, there were no significant correlations between the number (or proportion) of excuses and either hindsight magnitude or resultant surprise ratings, for either high or low self-esteem participants, all ps > .31. Thus, although it is still possible that low self-esteem participants could have reduced the hindsight bias for negative outcomes via defensive processing, this study was unable to uncover any indication of such processing. It may also be that defensive processing is most likely to occur with questions that imply culpability. For example, most (though not all) studies that have shown reductions in hindsight effects for threatening outcomes asked participants to indicate how “foreseeable” the outcome was, rather than provide numerical estimates of the likelihood of the outcome (cf. Mark et al., 2003-this issue; Renner, 2003-this issue).

8 To do this, thought listings were simply recoded as either consistent or inconsistent with the actual outcome, and control subjects were
dropped from the analysis. Note that 8 separate within-cell correlations arise from a 2 (thought consistency) × 2 (outcome) × 2 (hindsight or surprise measure) crossing. Multiplying this by 2 self-esteem conditions yields a total of 16 within-cell correlations. Space prevents us reporting these in a separate table.

Another concern for the defensive-processing hypothesis, however, is that positive outcomes did not produce hindsight bias
for high self-esteem individuals. Again, one might predict that any outcome reflecting positively on one’s self-image would
produce a hindsight bias (Louie et al., 2000). Thus, it appears that something more than mere self-relevance and outcome
valence plays a role in determining whether or not the hindsight bias will occur.
Outcome expectations. Regardless of valence, outcomes that were generally congruent with a priori expectations produced
no hindsight bias, whereas those that were incongruent produced sizeable hindsight biases. In particular, participants with low
academic self-esteem showed the bias for positive outcomes, but not for negative outcomes, and those with high academic
self-esteem showed the bias for negative outcomes, but not for positive outcomes. These results are consistent with a
cognitive mechanism of hindsight bias in which incongruent events—independent of their implications for the self—produce
greater sense-making activity, which, in turn, produces greater hindsight bias (cf. Schkade & Kilbourne, 1991). The data are
also consistent with another prediction from the model, namely that hindsight bias and resultant surprise will be inversely
related. Those who found the outcomes to be most surprising after spending 5 minutes thinking about them showed the least, and often reverse, hindsight bias (cf. Ofir & Mazursky, 1997).
That the hindsight bias in the low self-esteem condition (positive outcome) was considerably larger than in the high self-
esteem condition (negative outcome) may simply reflect a difference in the degree to which the outcome feedback was
incongruent with expectations for the two groups. When compared with estimates from their own control condition, it appears that the 90th percentile (positive) feedback for low self-esteem participants was somewhat more unexpected than the 30th percentile (negative) feedback was for the high self-esteem participants. Data from Experiment 3 will help to verify this assertion. Similarly, the difference in hindsight magnitudes for observers is probably due to the negative outcome feedback being somewhat more different from control group expectations than the positive feedback.
Additionally, the fact that the hindsight bias for positive outcomes was significant for observers but not for high self-esteem
actors (who gave similar foresight estimates) might seem difficult to explain. However, one might counter this concern by
recalling that the magnitude of the bias for observers was greater for negative outcomes than for positive, and that this is
consistent with results for high self-esteem actors. Another possibility is that modesty on the part of high self-esteem actors
artificially reduced their foresight estimates. If this were the case, then high self-esteem actors would not actually be as
(initially) surprised as the observers by their positive feedback, and the hindsight bias of the observers, but not actors, would
make perfect sense. Of course, this explanation is speculative, and these results clearly invite further study. Despite the lack of
a definite cognitive explanation, however, note that defensive processing also cannot explain this apparent discrepancy, unless
the positive outcome could be viewed in some way as threatening to the high self-esteem actors.
Although the data in this experiment are consistent with the predictions of the sense-making model, the experiment does not
manipulate or measure sense-making activity directly. Thought listings were included because it was believed that they might
provide an indication of the ease with which participants made sense of the outcomes. A greater proportion of outcome-
consistent thoughts might indicate successful sense-making activity, while a greater proportion of inconsistent thoughts might
indicate difficulty in sensemaking. However, Schwarz (1998) has suggested that a distinction should be made between what
thoughts (or how many) come to mind and how easily those thoughts come to mind. He showed that it is often the ease with
which thoughts come to mind, rather than the absolute number of thoughts, that has the biggest impact on judgements.
Unfortunately, the measures used in the present study do not provide any indication of the relative ease with which various
thoughts were brought to mind—only how many came to mind (see also Sanna, Schwarz, & Stocker, 2002). This may explain
the sometimes inconsistent correlations between thoughts and hindsight and surprise measures, and should be examined in
future research.

EXPERIMENT 3
Experiment 2 measured resultant surprise and showed that it is inversely correlated with hindsight bias. But it is also argued
that differences in hindsight (and resultant surprise) are produced primarily by differences in outcome expectedness or initial
surprise. Although in Experiment 2 expectation levels were inferred from participants’ academic self-esteem (Fleming &
Courtney, 1984) and from the control condition likelihood estimates, it seems instructive to also obtain a more direct measure
of initial surprise. Thus, Experiment 3 asked participants to provide surprise ratings immediately after receiving the outcome,
rather than after 5 minutes of thought.9 It was predicted that given little time to make sense of the outcome, those in the two
incongruent outcome conditions would provide greater ratings of surprise than would those in the two congruent outcome
conditions. Thus, this experiment serves as a manipulation check for outcome incongruence in Experiment 2.

Method
A total of 44 introductory psychology students of high and low academic self-esteem were run in small groups of between one
and four, with no observers. They did not speak to each other and were not aware of the feedback other participants received.
Administration of the cognitive abilities test was identical to Experiment 2. After receiving feedback, participants were
immediately given a short questionnaire that first asked how surprised they were by the results. They were then asked
additional filler questions, asking them to indicate why they felt as they did, and other questions that were used in Experiment
2 (e.g., performance estimate, test difficulty, etc.). A no-outcome condition was not included, as the goal was to measure
initial feelings of surprise at the outcome.

Results and discussion


The goal of this experiment is to show that expectations for the outcome (i.e., “initial” surprise) differed across conditions in
Experiment 2. Because the order of the surprise and other questions was not counterbalanced, only the results for surprise
ratings are described below. The lack of a control condition prevented hindsight bias from being calculated, and perhaps more
important, asking participants to consciously consider their surprise first was likely to have carryover effects.
There was a significant outcome × self-esteem interaction on “initial” surprise ratings, F(1, 40)=23.07, p < .001, d=1.50. Simple effects tests showed that low self-esteem participants found the positive outcome more surprising (M=7.0, SD=1.3) than the negative outcome (M=3.3, SD=2.0), F(1, 43)=20.3, p < .001, d=1.77, and that high self-esteem participants found the negative outcome more surprising (M=7.0, SD=1.7) than the positive outcome (M=5.6, SD=1.8), F(1, 43)=4.42, p=.042, d=0.99. As expected, participants who received outcomes that were inconsistent with their pre-existing levels of
academic self-esteem rated the outcome as more surprising than did those who received outcomes that were consistent with their
academic self-esteem. Although participants could still have taken some time to “make sense” of the outcome before
responding to the surprise question, this measure is likely to correspond more closely with outcome unexpectedness than the
measure taken in Experiment 2. Thus, it can now be said with more conviction that the outcomes that did not produce
hindsight bias in Experiment 2 were those that were initially the least surprising (cf. Schkade & Kilbourne, 1991).
An additional test showed that for the two a priori “expected” conditions, high self-esteem participants receiving positive
outcomes expressed significantly greater initial surprise (M=5.6) than did low self-esteem participants who received a
negative outcome (M=3.3), t(21) = 2.80, p=.011, d=1.2. This is consistent with the finding from Experiment 2 that high self-
esteem participants showed a trend towards hindsight bias for positive outcomes that was, though not significant, similar to
that of observers, whereas low self-esteem participants showed no bias whatsoever.
One additional finding deserves attention. For some conditions, surprise levels seemed to increase somewhat during the 5
minutes. For example, low self-esteem participants who received a negative outcome gave an “initial” surprise rating of M=3.3
in Experiment 3, but other low self-esteem participants, given time to think about this same outcome, actually rated it as
somewhat more surprising 5 minutes later, M = 5.73 (Experiment 2). A similar shift occurred for high self-esteem participants
receiving positive outcomes. Such an upward shift across time in the same condition is counterintuitive; surprise levels should
generally decrease as one makes sense of the outcome. One explanation is simply that, because different people participated in
the two experiments, some discrepancies in the absolute values are to be expected. A more reasonable possibility comes from
the fact that many participants in Experiment 2 were run in the very beginning of their first year in school, whereas those in
Experiment 3 were run later in that school year. It could be that academic experiences during the school year caused an
amplification of participants’ expectations—the expectations of those with low academic self-esteem became even lower, and those of participants with high self-esteem became even higher. Thus, we would not expect either group to be quite as surprised by their results. Admittedly, this is speculative,
but it may provide a possible explanation. In any case, the results of Experiment 3, taken alone, seem to indicate that
outcomes defined as expected and unexpected were generally perceived so.
Experiments 1, 2, and 3 included variables to test for predictions made by the defensive-processing hypothesis. Although this
hypothesis makes important claims about when the hindsight bias does not occur, defensive-processing motives somewhat
confound the effects of sense-making “ease” on the hindsight bias. Thus, the final experiment does not use self-relevant
outcomes, nor does it ask people to list their thoughts in such a way that they may be motivated to engage defensive
processing. Instead it directly manipulates sense-making ease by presenting scenarios with relatively unexpected outcomes
that have been pilot tested to be either easy or difficult to make sense of.

9 Because people are likely to process outcome information the moment they receive it, it may be impossible to perfectly capture “initial” surprise. However, asking immediately after participants receive the outcome is arguably more accurate than asking 5 minutes later.

EXPERIMENT 4
The sense-making model suggests that hindsight bias is determined by the relative ease with which sense can be made of
unexpected outcomes. Sensemaking is essentially a search for causal antecedents, something that occurs spontaneously for
outcomes that are incongruent with expectations (Hastie, 1984; Sanna & Turley, 1996; Weiner, 1985). The final experiment
provides a test of two predictions of the sense-making model of hindsight bias concerning the ease in sense-making of
relatively incongruent outcomes. First, outcomes that are relatively easy to make sense of will produce a hindsight bias, and will
ultimately seem relatively unsurprising (Prediction 1). Second, outcomes that are difficult to make sense of will produce
either no hindsight bias or reverse hindsight bias, and will remain relatively surprising (Prediction 2).
In this study, an incongruent outcome is created by describing psychological research with a fairly intuitive solution, but
then providing a counterintuitive outcome as the ostensibly true finding (see Choi & Nisbett, 2000; Davies, 1987; Slovic &
Fischhoff, 1977, for other studies using a similar technique). Thus, sensemaking is expected to be activated by all outcomes
used in this study. However, half of the outcomes were designed and pilot tested to be relatively difficult to make sense of,
whereas the other half were designed to be relatively easy to make sense of.

Method
Participants and procedure. A total of 92 students at Wake Forest University received course credit for their participation.
Two students were dropped from the analyses for failing to provide complete data. All participants received a booklet with six
(single-paragraph) scenarios, each describing a psychology experiment. Five of the scenarios listed three possible outcomes
for the experiment, and one (divorce study) listed two possible outcomes. Participants in the “foresight” condition (n=47)
were asked to indicate the likelihood of each outcome using a percentage from 0–100% with the restriction that the estimates
must sum to 100%. Participants in the “hindsight” condition (n=43) were given the same instructions, but were also told
which of the outcomes was the actual finding, and that the alternatives were known to be incorrect. Hindsight participants
were asked to ignore the outcome information. Participants receiving an outcome were also asked to indicate how surprised
they were by the findings, using a 1–7 scale. As in Experiment 2, the order of presentation of resultant surprise and likelihood
measures was counterbalanced to control for carryover effects. Although the unexpected nature of the outcomes should cause
sensemaking to occur spontaneously, participants were also asked to proceed slowly, and prompted to make sense of the
outcomes by encouraging them to “make sure you understand everything about the scenario” before turning to the next page
and answering any questions.
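As a small illustration of the response constraint described above, the check below verifies that a participant's estimates for a scenario's alternative outcomes sum to 100%; the data are invented.

# Each entry holds one (hypothetical) participant's likelihood estimates for
# the alternative outcomes of a single scenario; they must sum to 100%.
responses = {
    "participant_1": [10, 25, 65],   # three-outcome scenario
    "participant_2": [40, 60],       # two-outcome (divorce) scenario
    "participant_3": [30, 30, 30],   # flagged: sums to 90, not 100
}

for pid, estimates in responses.items():
    total = sum(estimates)
    status = "ok" if total == 100 else f"violates constraint (sum = {total})"
    print(pid, status)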

Materials. A list of the six scenarios can be found in Tables 4 and 5. These scenarios were written and pilot tested to have
outcomes that were unexpected, but that also varied in the difficulty with which participants could ultimately make sense of
them. Pilot participants were asked to give their reactions to the ostensibly true outcome for each scenario. The scenarios
written to be difficult to make sense of were typically thought to be so by most (although not all) participants, and those
written to be easy to make sense of were found to be so by almost all participants. For example, one of the difficult scenarios
stated, “of girls who actually have sex, those who feel most guilty about it are most likely to get pregnant.” Most participants
had difficulty providing any reasons why this would be true. An easy scenario stated that when deciding on who to date, “for
both men and women, the only factor that mattered was how physically attractive the person was.” Although students
acknowledged that they would typically have guessed that only men care about “looks”, it was relatively easy for them to make sense of the finding that looks matter to both men and women.

TABLE 4 Likelihood estimates and resultant surprise ratings for research outcomes that are difficult to make sense of
Research outcome | Foresight M (SD), Md | Hindsight M (SD), Md | Surprise M (SD) | Surprise×Hindsight correlation
Teens who feel guilty are more likely to get pregnant | 12.9 (20.3), Md 5 | 19.9 (24.3), Md 10 | 5.4 (1.6) | r=−.77, p<.001
American students are less overconfident than Chinese | 18.7 (13.1), Md 20 | 18.6 (12.9), Md 20 | 5.3 (1.0) | r=−.26, p=.09
Democrats favour death penalty more than Republicans | 32.0 (18.5), Md 30 | 30.5 (17.0), Md 25 | 4.9 (1.2) | r=−.44, p=.047
No comparisons between foresight and hindsight estimates were significant, all ps≥.20.

TABLE 5 Likelihood estimates and resultant surprise ratings for research outcomes that are easy to make sense of
Research outcome | Foresight M (SD), Md | Hindsight M (SD), Md | Surprise M (SD) | Surprise×Hindsight correlation
“Looks” are equally important to both men and women | 27.3 (18.5), Md 20 | 45.3 (19.9), Md 40 | 3.95 (1.6) | r=−.64, p<.001
Seminary students only help if they aren’t late for a talk | 40.0 (26.5), Md 40 | 55.5 (27.2), Md 60 | 3.43 (1.6) | r=−.51, p=.001
Couples who argue are more likely to get divorced | 55.0 (20.7), Md 60 | 74.0 (21.5), Md 80 | 2.41 (1.4) | r=−.54, p<.001
All foresight/hindsight comparisons were significant, ps<.01. Effect sizes ranged from d=0.60 to 0.93.

Thus, the scenarios used should be sufficient to test the model.10
After the actual experiment was conducted, numerous participants volunteered to the experimenter that they had already
learned about one of the six research findings in their psychology class. After confirming that this particular topic (and none of
the others) had been covered in many of the introductory psychology classes contributing to the subject pool, this scenario
was dropped from analyses and a new scenario (Democrats favour the death penalty) was run with additional participants,
after additional pilot testing. A filler scenario, which was not analysed, was always presented first to capture the “within-
subject” nature of the design used with the first set of scenarios. Due to time constraints only 42 additional participants (21 in
foresight, 21 in hindsight) could be run in this condition. Below is a full example of a scenario written to be difficult to make
sense of (political attitudes) along with three possible outcomes given to participants. The ostensibly true outcome was
indicated by two asterisks.

Researchers in psychology have long been interested in the relationship between political affiliation and opinions about
various issues. It is important to be able to predict people’s opinions without actually asking them. One issue in
particular is that of capital punishment. In the United States, capital punishment is a hot topic, and a series of studies has
uncovered the following about Republicans and Democrats. In most states, it appears that:
Republicans are more likely than Democrats to favour capital punishment.
** Democrats are more likely than Republicans to favour capital punishment. There is no difference between the two
parties regarding capital punishment.

Results and discussion


Planned comparisons were used to test for hindsight bias for each of the easy and difficult scenarios. Consistent with
Predictions 1 and 2, the three outcomes judged by pilot participants to be easy to make sense of produced a sizeable hindsight
bias, whereas outcomes that were judged difficult to understand produced no hindsight bias (see Tables 4 & 5).11 Although all
of the outcomes in this experiment were at least somewhat unexpected,12 the foresight estimates for the difficult condition
were lower, on average, than for the easy condition—although not exclusively so.
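The planned comparisons just described contrast mean foresight and hindsight likelihood estimates for each scenario, with Cohen’s d as the effect size. A minimal computational sketch of such a comparison is shown below; the numbers are invented, scipy is assumed to be available, and this is an illustration of the standard procedure rather than the author’s analysis code.

```python
import numpy as np
from scipy import stats

# Hypothetical foresight and hindsight likelihood estimates (%) for the
# true outcome of one "easy" scenario; values are illustrative only.
foresight = np.array([30, 20, 40, 25, 35, 15, 45, 30])
hindsight = np.array([50, 45, 60, 40, 55, 35, 65, 50])

# Independent-samples comparison of the two conditions
t, p = stats.ttest_ind(hindsight, foresight)

# Cohen's d using the pooled standard deviation
n1, n2 = len(hindsight), len(foresight)
pooled_sd = np.sqrt(((n1 - 1) * hindsight.var(ddof=1) +
                     (n2 - 1) * foresight.var(ddof=1)) / (n1 + n2 - 2))
d = (hindsight.mean() - foresight.mean()) / pooled_sd

print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```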
Unexpected outcomes typically produce sense-making activity (Olson, Roese, & Zanna, 1996; Weiner, 1985), and this
activity has been shown to produce greater hindsight bias (Roese & Maniar, 1997; Roese & Olson, 1996). Thus, we might
predict that the outcomes in the difficult condition would produce even more sense-making activity—and thus hindsight bias
—than those in the easy condition. In fact, these outcomes produced no significant hindsight biases, whereas the outcomes in
the easy condition produced significant hindsight biases. This finding seems to indicate that the ease with which sense can be
made of an outcome, rather than “surprisingness” per se, is the more important determinant of hindsight bias.
Indeed, even though participants could proceed at their own pace, and thus likely took the time to try to understand the
outcomes, those receiving difficult outcomes still remained quite surprised. This is consistent with the idea that they had
difficulty in making sense of these outcomes. As in Experiment 2, ratings of surprise in this experiment indicate “resultant”
surprise rather than initial surprise, although little change between the two types would be expected for difficult outcomes.

10 Curiously, although many participants had difficulty generating any explanations for the “difficult” outcomes, they were also unable to generate more than one or two reasons for the “easy” outcomes, despite their claims that these outcomes made a lot of sense. This is consistent with the idea that the ease with which reasons are generated is not necessarily indicated by the absolute number of reasons generated (Schwarz, 1998).

Also, as in Experiment 2, resultant surprise measures were negatively correlated with likelihood ratings. Such a finding is
consistent with the idea that those participants who found an outcome to be difficult to make sense of (regardless of whether it
was designed a priori to be difficult) found the outcome to be more surprising, and exhibited less hindsight bias. Of course,
this study did not measure sense-making activity (or its success or failure) directly. Future research should examine ways to
obtain non-intrusive measures of such activity. Some possibilities are discussed below.
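As an illustration of the negative surprise–likelihood correlations reported in Tables 4 and 5, the following minimal sketch (Python with invented data, not the study’s) computes the Pearson correlation between resultant surprise ratings and the likelihood assigned to the true outcome among hindsight participants.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant data from a hindsight condition
# (illustrative only): likelihood assigned to the true outcome (%)
# and resultant surprise (1-7 scale).
likelihood = np.array([60, 45, 30, 20, 55, 35, 25, 50])
surprise   = np.array([ 2,  3,  5,  6,  2,  5,  6,  3])

# The pattern reported in Experiments 2 and 4: more surprising outcomes
# receive lower hindsight likelihood estimates (a negative correlation).
r, p = stats.pearsonr(surprise, likelihood)
print(f"r = {r:.2f}, p = {p:.3f}")
```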

GENERAL DISCUSSION
The hindsight bias is considered a robust phenomenon that occurs in myriad domains. The current research, however, was
designed to delineate those times when we do not experience hindsight bias. Using both real-world and laboratory settings,
the present research examined claims arising from two distinct perspectives, one cognitive and one motivational, both of
which predict specific absences of the hindsight bias.

The cognitive perspective: Outcome surprisingness


Interestingly, the results of this research provide support for both Ofir and Mazursky’s (1997) “surprise reduces the bias”
position and Schkade and Kilbourne’s (1991) “surprise increases the bias” position, sometimes in the same study.
Experiments 1 and 2, for example, both showed that the outcomes that were least expected produced the greatest hindsight
bias, and those that were most expected produced little or no bias (“surprise increases hindsight bias”). In Experiments 2 and
4, however, significant negative correlations were found between surprise ratings and hindsight bias in all conditions—those
outcomes that were most surprising produced the least hindsight bias (“surprise decreases hindsight bias”). The position taken
here is that outcomes that are unexpected or what might be termed “initially” surprising invoke a sense-making mechanism
that, if successful, produces hindsight bias and a reduced sense of “resultant” surprise. If the sense-making process is not
successful, however, resultant surprise will remain relatively high and hindsight bias should be reduced. It is resultant surprise that
seems most consistent with Ofir and Mazursky’s ideas, particularly their claim that surprise must be acknowledged before
hindsight bias can be reduced or reversed. However, the model does not require that we have conscious awareness of initial
surprise. In fact, in many cases it seems likely that we are not aware, for otherwise it would be difficult for hindsight to
“creep” up on us, as Fischhoff (1975) suggested. On the other hand, it is conceivable that people could be aware of their
initial surprise and still show hindsight bias, albeit of a slightly different form. Consider the expression, from the quote in the
beginning of the paper: “I was surprised, but not that surprised. I mean, it makes sense.” The position held here is that if an
outcome can be understood in light of one’s previous knowledge, then hindsight bias should occur. Rather than saying “I
knew it all along”, however, the acknowledged surprise might cause a person to say, “I should have known it all along.” To my
knowledge no research has looked at the hindsight bias from this perspective.
For those who did not claim to have known it all along, Ofir and Mazursky (1997) hold that this may be because their
acknowledged surprise triggered “special processing” that offset or reversed the process that normally produces hindsight
bias. Indeed, other researchers have suggested that additional, qualitatively different, processing occurs when people are
surprised by an outcome. For example, the experience of surprise has been posited to cause people to “rethink the judgement”
(Wasserman et al., 1991, p. 35), to “counteract the (hindsight) tendency to integrate the feedback into the subjects’ general
knowledge structures” (Hawkins & Hastie, 1990, p. 315), or to “serve as a cue to the outcomes’ unexpectedness (which)
reduces, even eliminates, traditional hindsight effects” (Louie, 1999, p. 30).
Although such a special mechanism is possible, the present model indicates that it may be unnecessary. Sherman, Cialdini,
Schwartzman, and Reynolds (1985) found that participants who tried to imagine a difficult-to-imagine outcome perceived the
outcome as less likely than if they had not tried to imagine it at all. Apparently, trying to imagine an event, but failing to do so,
causes that event’s perceived likelihood to decrease. This may be because difficult-to-imagine, or “unintegratable”
information tends to be forgotten or de-emphasised (Fischhoff & Beyth, 1975). Failure to integrate factors predictive of the
known outcome likely produces the same effects as the established debiasing strategy of “considering the opposite” (Arkes et
al., 1988; Davies, 1987; Slovic & Fischhoff, 1977). Thus, the reduction of the hindsight bias may be caused by a failure of the
sense-making process, rather than by an additional mechanism that overrides it. The negative correlation between “resultant”
surprise and hindsight bias magnitude in Experiments 2 and 4 seems consistent with this idea. Of course, Hawkins and Hastie
(1990) may have anticipated this argument when they asked “to what extent is surprise a by-product of the subjective
difficulty experienced while integrating outcome feedback…?” (p. 324). Two predictions of the sense-making model were

11 Non-parametric tests performed on the six scenarios revealed identical results.


12 Foresight ratings for the divorce scenario were fairly large because, due to experimenter error, this scenario provided participants with only two outcomes. The remaining five scenarios used three outcomes, which will generally lower the estimate for any given outcome.

supported in this research, namely that relatively unexpected outcomes that are easy to make sense of will produce hindsight
biases and that unexpected outcomes that are difficult to make sense of will not lead to hindsight biases.
Interestingly, cultural factors may moderate the need to explain unexpected events. Choi and Nisbett (2000) recently
reported that the more surprising their participants found the results of a psychological research study, the less hindsight bias
they exhibited. From the model, we assume that they refer to resultant surprise. More interesting, however, was the fact that
Korean students in their study consistently reported less surprise (and more hindsight bias) at the unexpected outcomes than
did American students. Choi and Nisbett contend that people from Asian cultures are less bothered by inconsistency, and thus
do not spontaneously consider alternative outcomes after receiving an unexpected outcome in the way that Westerners would.
Because no direct measures of “sensemaking” were taken in this research, of course, the findings do not offer unequivocal
evidence that sense-making success is the key determinant of hindsight bias. However, they do suggest that future research
should consider this possibility. One direction this research might take is to observe brain activity (e.g., PET, fMRI) as
participants receive unexpected outcomes that are either difficult or easy to understand. Difficult outcomes may force
participants to “recruit” multiple areas of the cortex in an attempt to find a solution. A large amount of brain activity might be
a sign that something is amiss, an indication of sense-making difficulty. One way to create such a situation might be to use
Sherman et al.’s (1985) approach by asking participants to think of an overly large number of reasons for an outcome. If they
attempt to do so, but fail, they may show increased or altered brain activity, and report likelihood estimates representing little
or reverse hindsight bias.

The motivational perspective: Defensive processing


A secondary goal of this paper was to examine the limits of the defensive-processing hypothesis (Louie et al., 2000; Mark &
Mellor, 1991). This motivationally based idea also suggests that there are times when the hindsight bias will not occur. Here,
however, the idea is that people can be motivated to ignore negative and self-relevant outcomes that threaten their self-image.
Indeed, two studies in this special issue of Memory present evidence for this claim (Mark et al., 2003-this issue, and Renner,
2003-this issue). Data presented in this paper, however, imply important limitations for this approach.
Arguments for a defensive-processing mechanism have intuitive appeal. However, because negative outcomes—
particularly those that are self-relevant—are often unexpected, care should be taken to also consider the effects of outcome
expectations (Falk, 1989; Taylor & Brown, 1988; Weiner, 1985). The data presented here suggest that outcomes that were
incongruent with expectations produced significant hindsight biases, regardless of self-relevance. Put differently, the
stockbroker and Clinton supporter described in the introduction may very well exhibit hindsight bias in response to their
upsetting outcomes, provided those outcomes are at least somewhat unexpected and some sense is able to be made of them.
In Experiment 1, for example, fans who expected their basketball team to win (and were correct) showed no hindsight bias,
but fans of the losing team, who expected their team to win (but were incorrect), did show the bias. Although sports outcomes
can have relevance for fans’ self-esteem (Cialdini et al., 1976), the fans could not claim responsibility for the outcome. Thus,
feelings of culpability may be required before a defensive-processing mechanism is activated.
In Experiment 2, participants could reasonably be said to feel responsible for their performance on a cognitive abilities test.
Here, low self-esteem individuals did not exhibit hindsight bias after negative test feedback, but did exhibit the bias after
positive feedback. This would seem to support the defensive-processing hypothesis. However, participants with high self-
esteem were found to exhibit the opposite. They did not show the bias for positive feedback, but showed a sizeable bias for
negative feedback. Unless there is a reason that defensive processing would occur only for those with low self-esteem, this is
somewhat problematic for the defensive-processing hypothesis. In fact, previous research has found that high rather than low
self-esteem individuals are most likely to utilise self-esteem-defensive mechanisms (Agostinelli et al., 1992; Brown &
Gallagher, 1992).
Interestingly, Wegner’s (1997) ironic mental processing model suggests that attempts to ignore threatening outcomes may
backfire. Wegner found that trying not to think about a “white bear” produced significantly more thoughts about it than if no
such attempt was made. In the defensive-processing model, a threatening outcome is essentially the white bear to be ignored.
Indeed, anecdotal evidence suggests that threatening outcomes may actually produce greater hindsight bias. For example,
feelings of guilt and self-blame can arise after concluding that we “should have known” and thus been able to prevent a tragic
event from occurring to a loved one. Tykocinski (2001) has even suggested that we might feel better following an unpleasant
outcome if that outcome seems relatively inevitable in hindsight. That is, it may be comforting to realise that nothing could
have been done to avoid the outcome. Clearly, defensive-processing hypotheses present fascinating possibilities. Future
research, however, will need to clearly define those situations in which defensive strategies are expected to mitigate the
hindsight bias, rather than exacerbate it.

REFERENCES

Agostinelli, G., Sherman, S.J., Presson, C.C., & Chassin, L. (1992). Self-protection and self-enhancement biases in estimates of population
prevalence. Personality and Social Psychology Bulletin, 18, 631–642.
Alicke, M.D., Davis, T.L., & Pezzo, M.V. (1994). A posteriori revision of a priori decision criteria. Social Cognition, 12, 281–308.
Arkes, H.R. (1988). Commentary on the article by Verplanken & Pieters. Journal of Behavioral Decision Making, 1, 146.
Arkes, H.R., Faust, D., Guilmette, T.J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305–307.
Brown, J.D., & Gallagher, F.H. (1992). Coming to terms with failure: Private self-enhancement and public self-effacement. Journal of
Experimental Social Psychology, 28, 3–22.
Cannon, C.K., & Quinsey, V.L. (1995). The likelihood of violent behaviour: Predictions, postdictions, and hindsight bias. Canadian
Journal of Behavioural Science, 27, 92–106.
Choi, I., & Nisbett, R.E. (2000). Cultural psychology of surprise: Holistic theories and recognition of contradiction. Journal of Personality
and Social Psychology, 79, 890–905.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cialdini, R.B., Borden, R.J., Thorne, A., Walker, M.R., Freeman, S., & Sloan, L.R. (1976). Basking in reflected glory: Three (football) field
studies. Journal of Personality and Social Psychology, 34, 366–375.
Davies, M.F. (1987). Reduction of hindsight bias by restoration of foresight perspective. Organizational Behavior and Human Decision
Processes, 40, 50–68.
Falk, R. (1989). Judgment of coincidences: Mine versus yours. American Journal of Psychology, 102, 477–493.
Fischhoff, B. (1975). Hindsight ≠ Foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Fischhoff, B., & Beyth, R. (1975). “I knew it would happen…” Organizational Behavior and Human Performance, 13, 1–16.
Fleming, J.S., & Courtney, B.E. (1984). The dimensionality of self-esteem: II. Hierarchical facet model for revised measurement scales.
Journal of Personality and Social Psychology, 46, 404–421.
Guerin, B. (1982). Salience and hindsight biases in judgements of world events. Psychological Reports, 50, 411–414.
Haslam, N., & Jayasinghe, N. (1995). Negative affect and hindsight bias. Journal of Behavioral Decision Making, 8, 127–135.
Hastie, R. (1984). Causes and effects of causal attribution. Journal of Personality and Social Psychology, 46, 44–56.
Hawkins, S., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hoch, S.J., & Loewenstein, G.F. (1989). Outcome feedback: Hindsight and information. Journal of Experimental Psychology: Learning,
Memory, & Cognition, 15, 605–619.
Lau, R.R., & Russell, D. (1980). Attributions in the sports pages: A field test of current hypotheses in attribution research. Journal of
Personality and Social Psychology, 39, 29–38.
Leary, M.R. (1981). The distorted nature of hindsight. Journal of Social Psychology, 115, 25–29.
Louie, T.A. (1999). Decision makers’ hindsight bias after receiving favorable and unfavorable feedback. Journal of Applied Psychology,
84, 29–41.
Louie, T.A., Curren, M.T., & Harich, K.R. (2000). “I knew we would win”: Hindsight bias for favorable and unfavorable team decision
outcomes. Journal of Applied Psychology, 85, 264–272.
Mark, M.M., & Mellor, S. (1991). Effect of self-relevance of an event on hindsight bias: The foreseeability of a layoff. Journal of Applied
Psychology, 76, 569–577.
Mark, M.M., & Mellor, S. (1994). “We don’t expect it happened”: On Mazursky and Ofir’s (1990) purported reversal of the hindsight bias.
Organizational Behavior and Human Decision Processes, 57, 247–252.
Mark, M.M., Boburka, R.R., Eyssell, K.M., Cohen, L.L., & Mellor, S. (2003). “I couldn’t have seen it coming”: The impact of negative
self-relevant outcomes on retrospections about forseeability. Memory, 11, 443–454.
Markman, K.D., & Tetlock, P.E. (2000). “I couldn’t have known”: Accountability, foreseeability and counterfactual denials of
responsibility. British Journal of Social Psychology, 39, 313–325.
Mazursky, D., & Ofir, C. (1990). I could never have expected it to happen: The reversal of the hindsight bias. Organizational Behavior and
Human Decision Processes, 46, 20–33.
Mazursky, D., & Ofir, C. (1996). “I knew it all along” under all conditions? Or possibly “I could not have expected it to happen” under some
conditions. Organizational Behavior and Human Decision Processes, 66, 237–240.
Menec, V.H., & Weiner, B. (2000). Observers’ reactions to genetic testing: The role of hindsight bias and judgments of responsibility.
Journal of Applied Social Psychology, 30, 1670–1690.
Nario, M.R., & Branscombe, N.R. (1995). Comparison processes in hindsight and causal attribution. Personality and Social Psychology
Bulletin, 21, 1244–1255.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.

Olson, J.M., Roese, N.J., & Zanna, M.P. (1996). Expectancies. In E.T.Higgins & A.W.Kruglanski (Eds.), Social psychology: Handbook of
basic principles. New York: Guilford Press.
Pezzo, M.V. (1996). Removing the hindsight bias: A test of the motivated processing hypothesis. (Ohio University, 1996). Dissertation
Abstracts International, 57, 6606.
Pohl, R.F. (1998). The effects of feedback source and plausibility of hindsight bias. European Journal of Cognitive Psychology, 10,
191–212.
Pyszczynski, T.A., & Greenberg, J. (1981). Role of disconfirmed expectations in the instigation of attributional processing. Journal of
Personality and Social Psychology, 40, 31–38.
Raven, J., Raven, J.C., & Court, J.H. (1995). Manual for Raven’s Progressive Matrices and Vocabulary Scales. Oxford: Oxford University
Press.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivation perspective. Memory, 11, 455–472.
Roese, N.J., & Maniar, S.D. (1997). Perceptions of purple: Counterfactual and hindsight judgments at Northwestern Wildcats football games.
Personality and Social Psychology Bulletin, 23, 1245–1253.
Roese, N.J., & Olson, J.M. (1996). Counterfactuals, causal attributions, and hindsight bias: A conceptual integration. Journal of
Experimental Social Psychology, 32, 197–227.
Sanna, L.J., Schwarz, N., & Stocker, S.L. (2002). When debiasing backfires: Accessible content and accessibility experiences in debiasing
hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 497–502
Sanna, L.J., & Turley, K.J. (1996). Antecedents to spontaneous counterfactual thinking: Effects of expectancy violation and outcome
valence. Personality and Social Psychology Bulletin, 22, 906–919.
Schkade, D.A., & Kilbourne, L.M. (1991). Expectation-outcome consistency and hindsight bias. Organizational Behavior and Human
Decision Processes, 49, 105–123.
Schwarz, N. (1998). Accessible content and accessibility experiences: The interplay of declarative and experiential information in judgment.
Personality and Social Psychology Review, 2, 87–99.
Sherman, S.J., Cialdini, R.B., Schwartzman, D.F., & Reynolds, K.D. (1985). Imagining can heighten or lower the perceived likelihood of
contracting a disease: The mediating effect of ease of imagery. Personality and Social Psychology Bulletin, 11, 118–127.
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception
and Performance, 3, 544–551.
Snyder, C.R., & Higgins, R.L. (1988). Excuses: Their effective role in the negotiation of reality. Psychological Bulletin, 104, 23–35.
Tan, H.-T., & Lipe, M.G. (1997). Outcome effects: The impact of decision process and outcome controllability. Journal of Behavioral
Decision Making, 10, 315–325.
Taylor, S.E., & Brown, J.D. (1988). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin,
103, 193–210.
Tykocinski, O.E. (2001). I never had a chance: Using hindsight tactics to mitigate disappointments. Personality and Social Psychology
Bulletin, 27, 376–382.
Verplanken, B., & Pieters, R.G.M. (1988). Individual differences in reverse hindsight bias: I never thought something like Chernobyl would
happen. Did I? Journal of Behavioral Decision Making, 1, 137–147.
Wasserman, D., Lempert, R.O., & Hastie, R. (1991). Hindsight and causality. Personality and Social Psychology Bulletin, 17, 30–35.
Wegner, D.M. (1997). When the antidote is the poison: Ironic mental control processes. Psychological Science, 8, 148–150.
Weiner, B. (1985). “Spontaneous” causal thinking. Psychological Bulletin, 97, 74–84.
Winman, A. (1997). The importance of item selection in “knew-it-all-along” studies of general knowledge. Scandinavian Journal of
Psychology, 38, 63–72.
Zakay, D. (1984). The influence of perceived event’s controllability on its subjective occurrence probability. Psychological Record, 34,
233–240.
“I couldn’t have seen it coming”: The impact of negative self-relevant
outcomes on retrospections about foreseeability
Melvin M.Mark Pennsylvania State University, USA
Renee Reiter Boburka East Stroudsburg University, PA, USA
Kristen M.Eyssell Gettysburg College, PA, USA
Laurie L.Cohen Arizona State University, USA
Steven Mellor University of Connecticut, USA

We examined a phenomenon related to hindsight bias, specifically, retrospective judgements about the
foreseeability of an outcome. We predicted that negative, self-relevant outcomes would be judged as less
foreseeable by the recipient of the outcome than by others, unlike either positive outcomes or outcomes that are
not self-relevant. In the context of a “stock market decision-making game”, the hypothetical stock selected by one
of two players showed an extreme increase or decrease. As predicted, the player who received an extreme
negative outcome reported that this outcome was less foreseeable than did the opponent and an observer, for
whom the outcome was less self-relevant. For no other kind of outcome was there a difference between the
recipient of an outcome, the opponent, and the observer. The findings have several implications, including the
possibility that hindsight bias should be considered as a special case of retrospective foreseeability.

Hindsight bias refers to the tendency, once the outcome of a particular event is known, to over-estimate how predictable that
outcome was in foresight. Hindsight bias has been demonstrated in a variety of applied and experimental settings (see
Hawkins & Hastie, 1990, and Christensen-Szalanski & Willham, 1991, for reviews). Fischhoff (1975), in his seminal work in
this area, suggested that hindsight bias makes it difficult to learn the lessons of the past. Many authors have since echoed this
argument (e.g., Christensen-Szalanski & Willham, 1991; Fischhoff, 1982). For example, Arkes and his colleagues (Arkes,
Faust, Guilmette, & Hart, 1988; Arkes, Wortmann, Saville, & Harkness, 1981) note that if, after learning about a confirmed
diagnosis, a physician believes that he


or she “knew-it-all-along”, the outcome information may not provide the learning that might otherwise take place. In contrast
to this original focus on its negative impact on decision making, in recent years hindsight bias has been framed as a natural
byproduct of generally functional processes. “Adjusting conclusions in light of outcome information is the sine qua non of
learning, but carries the attendant effect of an exaggerated certainty regarding that outcome. As such, the hindsight bias is best
viewed on the same conceptual field as other functionally sound cognitive simplifications, such as attitudes, stereotypes, and
impressions: Quick and often pragmatically useful inferences that are sometimes made at the expense of accuracy” (Roese &
Olson, 1996, p. 224; also see Hoffrage, Hertwig, & Gigerenzer, 2000, p. 579).
This re-framing of the hindsight bias has three important implications that underlie the present research. First, hindsight bias
should not be seen as a singular, distinct phenomenon, but should be considered in the context of related psychological
processes. Second, expanding on Roese and Olson’s analysis, receipt of outcome information sometimes leads to the
development rather than to the adjustment of conclusions. Outcomes often arise about which people have not made prior
judgements of likelihood (unless prompted to by a researcher’s question, as in a hindsight bias study). Few people, for
instance, probably estimated in foresight the likelihood that the estranged wives of O.J.Simpson and Robert Blake would be
murdered. Of course, although many events such as these occur about which people have not made prior likelihood
judgements, certain other events, such as highly publicised trials, sporting events, and elections, may readily stimulate
judgements in foresight about the likelihood of the alternative outcomes. However, even when people are faced with
processes that could easily elicit prospective judgements of likelihood, they are sometimes, by virtue of their roles,

Requests for reprints should be sent to Melvin M. Mark, Department of Psychology, 417 Moore, Penn State University,
University Park, PA 16802, USA. Email: M5M@PSU.EDU. Thanks are due to three anonymous reviewers and the Special
Issue editors for helpful feedback on previous drafts.

admonished to suspend judgements about what the outcome will be. For instance, jurors are often instructed to suspend
judgement until all the evidence has been presented.
The literature on hindsight bias can be seen as a special but important case in which individuals make retrospective
judgements of foreseeability. In essence, research on hindsight bias attempts to compare the retrospective foreseeability
judgements of those with outcome information to either (1) judgements of those same individuals prior to receiving outcome
information (within-subject designs) or (2) judgements of other individuals with no outcome information (between-subjects
designs) (Schwarz & Stahlberg, 2003-this issue). Yet comparison to a no-outcome-knowledge condition is not intrinsically
required for the study of retrospective foreseeability judgements. Indeed, related literatures, such as work on counterfactual
thinking, have progressed well without always including a no-outcome-information control (e.g., Markman & Tetlock, 2000;
Sherman & McConnell, 1995). Moreover, without broader attention to retrospective foreseeability judgements in general,
research on hindsight bias alone may leave several important questions unanswered. For instance, it is not clear whether the
findings of hindsight bias studies apply equally well to cases in which likelihood judgements would, and would not, be made
spontaneously in foresight without the researcher’s probing. In addition, hindsight bias designs focus attention on the outcome-information versus no-outcome-information contrast. This may in a sense serve as a heuristic for researchers, drawing attention away
from investigation of related questions involving retrospections about foreseeability.
Recent re-framing of the hindsight bias also has a third implication. By pointing out the similarity of hindsight bias to other
“functionally sound cognitive simplifications”, Roese and Olson indirectly highlight the possible role of certain motivational
factors in retrospective judgements of foreseeability. In particular, the impact of self-protective and self-enhancing
mechanisms has been demonstrated in several other areas where “cognitive simplifications” provide quick, often useful
inferences (e.g., Brown, 1986; Dunning, Leuenberger, & Sherman, 1995; Greenwald, 1980; Kunda, 1987; Taylor & Brown,
1988; Wood, Taylor, & Lichtman, 1985). In fact, self-protective mechanisms have received some attention in the hindsight
literature. Mark and Mellor (1991) examined whether the self-relevance of a negative outcome moderates the magnitude of
hindsight. Laid-off union members, survivors (i.e., union members who were not laid off), and non-union community
members made retrospective judgements of the foreseeability of the layoff. The layoff should have been most self-relevant for
the laid-off group, somewhat self-relevant for the layoff survivors, and low in self-relevance for the community members. As
predicted, laid-off respondents reported less foreseeability than layoff survivors, who in turn reported less foreseeability than
community members. These results were strengthened by the use of the regression-discontinuity design (Cook & Campbell,
1979). Workers were laid off according to seniority, and analyses indicated there was a discontinuity in reported foreseeability
at precisely the point where the cutoff for layoff occurred (i.e., at the point in seniority below which workers were laid off and
above which they were not). Nevertheless, the retrospective and quasi-experimental nature of the study leaves some ambiguity
about whether the findings were the result of self-protective motives. Because the Mark and Mellor (1991) investigation
lacked the usual hindsight bias study’s comparison to a no-outcome-information condition, it might be better characterised as
a study of retrospective foreseeability judgements than as a study of hindsight bias.
Mark and Mellor’s reasoning was that, when a person is directly affected by a negative event (such as a layoff from work),
hindsight bias may be inhibited by a self-serving bias which has the adaptive function of preserving one’s self-image (see,
e.g., Blaine & Crocker, 1993; Bradley, 1978; Mullen & Riordan, 1988; Taylor & Brown, 1988; Zuckerman, 1979; also see
Mark & Mellor, 1991, for additional details on the possible motivational bases of hindsight bias). That is, if an individual is
directly affected by a negative event and acknowledges that the event was foreseeable, the person would likely feel increased
responsibility for its occurrence. At the least, the person is likely to think counterfactually about things he or she should have
done to have avoided the undesirable outcome or at least to have minimised its negative effects1 (Macrae, 1992; Markman &
Tetlock, 2000; Roese & Olson, 1996; Turley, Sanna, & Reiter, 1995). This does not imply that hindsight bias would be
attenuated by any negative outcome the person experiences. Rather, the outcome probably needs to be relevant in the sense
of having feasible implications for the person’s judgement, skills, culpability, and the like (as in the laid-off worker’s decision
to stay for years in the same job, while not increasing savings or training for other jobs).
Mark and Mellor focused on negative self-relevant outcomes. The effect of positive self-relevant outcomes is less clear.
Given that positive self-relevant outcomes often stimulate self-enhancement (e.g., Taylor & Brown, 1988), they may increase
retrospective foreseeability judgements: “Of course I knew all along that this good thing would happen.” However, the
tendency to self-enhance appears to be less powerful than the tendency to self-protect (Agostinelli, Sherman, Presson, &
Chassin, 1992; Campbell, 1986; Goethals, 1986; Taylor, 1991; but cf. Miller & Ross, 1975). This may be part of a more
pervasive pattern whereby bad events are more powerful than good ones and, accordingly, self-protection is more powerful
than self-enhancement (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001). Thus, positive self-relevant events may be less
likely than negative self-relevant events to influence foreseeability.
Research by Louie and her colleagues (Louie, 1999; Louie, Curren, & Harich, 2000) addresses the question of positive
versus negative self-relevant outcomes, while also extending Mark and Mellor’s (1991) quasi-experimental methodology to
controlled experiments. Louie’s (1999, Study 1) participants read a case description of a company, decided whether they
would like to buy its stock, and subsequently either learned the stock price had increased or decreased, or got no outcome

information. Replicating Mark and Mellor (1991), there was no hindsight bias among people who experienced a negative self-
relevant outcome (i.e., who decided to purchase the stock and its value declined, or who did not purchase and the value
increased). That is, the retrospective foreseeability judgements of those who experienced a negative self-relevant outcome did
not differ from those in the no-outcome-information condition. In contrast, those who experienced a positive self-relevant
outcome showed hindsight bias. In addition, the pattern of attributions indicated that self-serving processes were operating.
Louie (1999, Study 2) replicated these findings and showed that a manipulation intended to constrain self-serving biases
eliminated the hindsight bias among those who experienced a positive self-relevant outcome. Louie et al. (2000) extended
research on self-serving biases and hindsight to a team setting. MBA students rated the likelihood of change in the market
share of a target firm in a class simulation. They later learned of the target firm’s performance and recalled their predictions.
The target was either the firm run by the student’s own team or a firm run by a team in another, parallel simulation (all teams
competed for class grade). When students’ own firm did badly they showed no hindsight, but students exhibited hindsight for
the poor performance of another team. Conversely, students showed hindsight when their own firm did well, but did not exhibit
hindsight for the good performance of another team.
The Louie (1999) and Louie et al. (2000) findings are consistent with Mark and Mellor’s (1991) suggestion that self-
relevance may moderate hindsight bias and, more generally, retrospective judgements of foreseeability. They also empirically
extend the self-serving bias account to positive outcomes. The studies of Louie and her colleagues also compensate for the
causal ambiguities that remain from Mark and Mellor’s (1991) quasi-experimental investigation, while also using procedures
that cleverly operationalise self-relevance (especially in the use of a real, graded classroom exercise in Louie et al., 2000). At
the same time, important questions remain. One is whether outcomes that are positive and self-relevant actually influence
judgements of foreseeability. In Louie’s (1999) studies, those who experienced positive self-relevant outcomes showed
hindsight bias, relative to a no-outcome-information control group; however, no comparison was made to the retrospections
of observers who saw the same outcome, but for whom it was low in self-relevance. It might be argued that such a
comparison occurred in the Louie et al. (2000) study, where hindsight did not occur when students made judgements about
another team’s success. However, all teams were competing in terms of course grade, and another firm’s actions highlight
decisions that one’s own firm did not take. Thus, another firm’s success could have been interpreted as one’s own team’s
failure. Therefore, findings for this group may simply replicate the finding that negative self-relevant outcomes attenuate
hindsight bias.
In short, although the Mark and Mellor (1991) and Louie (1999; Louie et al., 2000) studies demonstrate the role of self-
relevance in hindsight bias, they are not conclusive on the question of whether positive and negative outcomes have
symmetrical effects. More generally, studies to date do not provide a clear disentangling of outcome valence and self-
relevance which, furthermore, may vary as a function of one’s perspective relative to the outcome. Louie et al. (2000) noted
that the interpretation of an outcome may depend on one’s perspective (e.g., they describe the case of a power company that
defaulted on bonds, where the company’s view of its ability to predict the default differed from that of investors who sued).
Depending on one’s perspective, an outcome may be directly self-relevant, indirectly self-relevant, or not at all relevant. Such
differences in perspective may have important consequences for retrospective judgements about foreseeability.
In addition, in many contexts, comparative information is available simultaneously about one’s own outcomes and those of
others. Just as counterfactual reasoning that emphasises the explanation of alternative outcomes can reduce or eliminate
hindsight bias (e.g., Arkes et al., 1988; Davies, 1992), it is possible that the simultaneous consideration of different parties’
outcomes might make an outcome seem less foreseeable. In the Louie et al. (2000) research, however, participants were asked
to consider only the target firm’s performance.
We designed the present study, then, to test experimentally the effect of positive and negative outcomes, varying in self-
relevance, on retrospective judgements of foreseeability. We predicted that for a negative, self-relevant outcome, participants
would deny foreseeability (cf. Markman & Tetlock, 2000, for similar predictions in the context of counterfactual thinking).
Predictions for a positive, self-relevant outcome are more tenuous, but the findings of Louie (1999) and Louie et al. (2000)
suggest that a positive outcome may increase, and at least should not decrease, foreseeability. To avoid ambiguity about the
independent role of outcome valence and self-relevance, we had observers, who did not directly experience the outcomes, also
make judgements. In addition, because actual competitive settings often provide exposure to multiple outcomes
simultaneously, we asked participants to make judgements about multiple outcomes.

1 The findings of Roese and Olson (1996) and Roese and Maniar (1997), that counterfactual thinking increases the magnitude of hindsight bias, might seem to suggest the opposite effect, that is, that a search for alternative causes should lead to more hindsight when the person experiences a negative self-relevant outcome. However, Roese and Olson (1996) and Roese and Maniar (1997) examined the effect of counterfactual thoughts that emphasised the predictability of the observed outcome. In contrast, we suggest that negative self-relevant outcomes stimulate counterfactuals about alternative outcomes, and research shows that counterfactual reasoning that emphasises the
explanation of alternative outcomes can reduce or eliminate hindsight bias (e.g., Arkes et al., 1988; Davies, 1992).

METHOD

Overview and design


In the context of a “stock market decision-making game”, two players each “bought” a stock. A third individual was assigned
to observe the game without actively participating. During the course of the game, one of the two players’ stock either greatly
decreased (negative outcome) or increased (positive outcome), while the other stock held a modest gain (neutral outcome).
The extreme outcome should be most self-relevant for the player receiving it and least for the observer. All participants rated
the foreseeability of the outcomes of both stocks.
Formally, the experiment employed a mixed factorial design, with player role (player 1, player 2, or observer), recipient of
extreme change (player 1 or player 2), and direction of extreme change (increase or decrease) as between-subject factors, and
target of rating (player 1 or player 2) as a within-subject factor. However, the player 1-player 2 distinction holds no
substantive interest, and effects did not differ across these two roles. We therefore combined across the two player roles for
analyses. Table 1 (see later) gives the simplified design.

Participants
Participants were 77 female and 46 male students enrolled in introductory psychology classes at Penn State University.
Participants received extra course credit in exchange for participation.

Procedure
Participants, in groups of three, were recruited for an experimental game on decision making in a stock market setting. They
were informed that the game would consist of two separate halves, and that the object of each half was to make as much
money as possible by investing in one stock. Participants drew from a hat one of three cards labelled “Player 1”, “Player 2”,
or “Observer”. The experimenter explained that the two players would compete against each other in the first half, during
which the observer would make judgements about both players’ decisions and outcomes, and that in the second half the
observer would also compete in the game (to increase the observer’s motivation and attention). There was, in fact, no “second
half”. To increase motivation, participants were told that the winner of each half would be given tickets for a draw for $100 in
proportion to their “earnings”. In addition, the observer was told that he or she would be entered in a separate $100 draw for
all observers. In actuality, all participants received an equal chance in two $100 draws conducted at the end of data collection.
The experimenter explained that the first half of the game would last for three “trading weeks”, with each trading week
consisting of four “trading days” (Tuesday-Friday). The “days” and “weeks” of the stock-trading game all took place during a
single 90-minute session. The available stocks were listed in the “College Street Journal”, a one-page newsletter with a new
edition for each trading day. Participants were told that the names of the stocks were fictional, but that their progress was charted
from actual stocks. In reality, the temporal pattern of the stocks was controlled to manipulate the independent variables. The
Journal, loosely modelled after the Wall Street Journal, contained fictional articles and information. Some articles were
relevant to the stocks from which participants could choose, others were non-relevant filler articles, and other features
included the weather. The Journal also contained daily opening and closing trading prices for the eight stocks from which the
players could choose. (Prior to beginning the game, the experimenter explained opening and closing prices and the calculation
of profits or losses.)
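As a rough illustration of the profit calculation the experimenter described, consider the toy example below; all values are invented and are not taken from the study’s materials.

```python
# Illustrative profit/loss calculation for the stock game; the figures
# here are hypothetical, not the study's.
investment = 200.0      # dollars invested at the start of the game
opening_price = 25.00   # price per share when the stock was bought
closing_price = 27.50   # price per share at the end of a trading week

shares = investment / opening_price
profit = shares * (closing_price - opening_price)
print(f"Shares held: {shares:.0f}, profit for the week: ${profit:.2f}")
```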
All three participants received a packet of four College Street Journals, one for each day of the first trading week. After
reading the Journals and observing a graph of the stocks’ performance for the week, players invested $200 by writing down
their chosen stock and handing it to the experimenter.2 After answering a series of questions consistent with the cover story,
participants received four more Journals for the second trading week. In addition to the daily opening and closing values of
the eight stocks in the Journal, for the second (and third) week the experimenter plotted the progress of the players’ chosen
stocks and total assets on a blackboard. During the second week, neither player’s stock rose or fell in price to a great extent,
but instead showed a modest upward trend (pilot testing had shown that this resulted in similar expectations for the two
stocks’ performance).

2 If both players made the same stock selection, they were informed that a penalty would be incurred if they invested in identical stocks. The
penalty was that they would each get three fewer shares than their money could buy and the extra money would not be invested at any point
in the game. If one person decided to change his or her choice, neither participant was penalised. Players rarely made the same stock
selection, and in every instance one of the players changed his or her stock choice so that neither player was penalised. Deletion of the cases
in which players initially made identical selections does not materially influence the results.

During the third trading week, one (randomly selected) player received either the extreme positive or negative outcome.
The extreme increase or decrease in this stock’s share price was given in the daily stock summary sections of the third week’s
set of College Street Journals, and represented just over a 10% rise or fall in the original investment over the period of a week.
For example, in one case “Fitsu Computers” opened at $37.50 per share on Day 1 of the third trading week and closed at $18.00 per share on Day 4, for a net loss of $19.50 per share over the week. In addition, one of the journals for the third week also
contained an article that explained why the extreme outcome occurred. In the instance of the extreme decline in “Fitsu
Computers” price per share, an article explained that the stock declined sharply because the company’s products were made
outdated by a competitor. Two versions of the third week’s Journal were prepared for each stock, and which version was
distributed depended on the participants’ stock choices and the condition. All other stocks, including that of the player who
did not receive the extreme change, showed modest progress in the third week, similar to that of the first two weeks.
At the end of the third “trading week”, participants answered a series of questions in a confidential questionnaire. This
included a manipulation check, in which all participants rated, on a response scale ranging from (1) “increased greatly” to (9)
“decreased greatly”, the extent to which each stock’s value changed in the last trading period. The dependent measure was a
three-item “foreseeability scale”. All participants rated, on nine-point scales, the extent to which each stock outcome was
surprising (“I was surprised by the performance of this stock”), foreseeable (“I could foresee what this stock was going to
do”), and obvious in foresight (“It was obvious after the first four-day trading period that this stock was going to perform as it
did”). Coefficient alpha reliability was .77, and the three items were averaged for analysis. Other questions assessed whether
participants were suspicious of the cover story or had guessed the hypothesis in any way (no participants were suspicious).
Participants were then fully debriefed. When the study was over, two $100 draws were held.
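A minimal sketch of how a three-item scale of this kind might be scored is given below. The ratings are hypothetical, it is assumed here that the surprise item has already been reverse-scored so that all three items run in the same direction, and this is an illustration of the standard alpha formula rather than the authors’ analysis code.

```python
import numpy as np

# Hypothetical ratings (1-9) on the three foreseeability items for six
# participants; the data are illustrative, not the study's.
items = np.array([[2, 3, 2],
                  [5, 6, 5],
                  [7, 6, 8],
                  [3, 2, 3],
                  [8, 7, 7],
                  [4, 5, 4]])

# Cronbach's alpha for the k-item scale
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)

# Scale score: the mean of the items, as described in the article
foreseeability = items.mean(axis=1)
print(f"alpha = {alpha:.2f}", foreseeability)
```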

RESULTS

Manipulation check
The manipulation check assessed the extent to which participants perceived each stock as increasing or decreasing in value. A
2 (extremity of outcome: “extreme” or “neutral”)×2 (valence of the extreme outcome: positive or negative [i.e., a gain or a
loss])×3 (rater’s perspective: recipient of the outcome, opponent, or observer) multivariate analysis of variance, with
extremity as a within-subject factor, revealed the expected extremity×valence interaction, F(1, 117)=315.92, p < .01. As expected, a Newman-Keuls post-hoc test showed that extreme positive outcomes (M=1.43) differed significantly from the
extreme negative outcomes (M=8.54), with the mean ratings indicating that these were seen, respectively, as having increased
and decreased greatly. Also as expected, both extreme outcomes differed from the neutral outcomes. Ratings for the neutral
condition were not affected by the valence of the opponent’s extreme outcome. That is, a neutral outcome in the context of a
“winning” opponent (M=3.73) did not significantly differ from a neutral outcome in the context of a “losing” opponent (M=2.84). In addition, no main effects or interactions were found that included the rater’s perspective.3

Effects on foreseeability
Our primary prediction, drawing from the findings of Mark and Mellor and of Louie and her colleagues, was that participants
who experienced an extreme negative outcome (the negative self-relevant condition) would be more likely to deny the
foreseeability of that outcome, relative to the opponents and observers who also observed that outcome. We were also
interested in the somewhat more tenuous prediction that participants who experienced extreme positive outcomes might
report higher foreseeability than their opponents and observers. Louie (1999; Louie et al., 2000) observed hindsight bias
among those who experience a positive self-relevant outcome, but we are aware of no research that has compared such people
with observers for whom the outcome is not self-relevant. For neutral outcomes, no differences in foreseeability ratings were

3 However, the interaction between extremity of outcome, valence, and rater perspective was marginally significant, F(2, 117)=2.71, p < .075. The interaction appears to be driven by the tendency of winners (i.e., the recipients of the extreme positive outcome) to rate their opponent’s neutral outcome as having increased less (M=4.33) than did the recipients of neutral outcomes (M=3.46) and observers (M=3.50) (recall that the rating scale ranged from 1, indicating the stock had increased greatly, to 9, decreased greatly). More importantly, ratings
of neutral outcomes by their recipients did not depend on whether these outcomes occurred in the context of a winning (M=3.46) or losing
(M=3.09) opponent. This absence of a contrast effect for the recipients of neutral outcomes suggests that these players did not see their
opponent’s outcome as relevant to themselves. That is, if the players who received a neutral outcome had seen their opponents’ outcome as
self-relevant, this presumably would have led to a contrast effect in their ratings of their own neutral outcomes. Given that this did not
occur, it seems most reasonable to expect that there should be no effect of the extreme outcome’s valence on the foreseeability of neutral
outcomes.

predicted between the outcome recipient and the opponent and observer, because neutral outcomes were not expected to elicit
self-protective or self-enhancing motives (see footnote 3).
To test these hypotheses, we conducted planned contrasts for each of the four types of outcomes (i.e., extreme negative,
extreme positive, neutral in the context of a losing opponent, and neutral in the context of a winning opponent). The primary
hypothesis test involved comparing the foreseeability of the outcome as rated by the player who experienced it with ratings of
the same outcome by the opponent and the observer. For this contrast, weights of 2, −1, and −1 were assigned to the outcome
recipient, opponent, and observer, respectively. We also conducted a second planned contrast for each outcome, in which we
compared the ratings of opponents with those of observers, to assess whether there was some degree of self-relevance that led
opponents to judge an outcome differently from observers. For this contrast, weights of 0, 1, and −1 were assigned to the
outcome recipient, opponent, and observer, respectively.
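For concreteness, the first contrast can be written out as follows (a standard textbook formulation added here for illustration; it is not spelled out in the original):

\[ L = 2\bar{X}_{\text{recipient}} - \bar{X}_{\text{opponent}} - \bar{X}_{\text{observer}}, \qquad t = \frac{L}{\sqrt{MS_{\text{error}}\left(\tfrac{4}{n_{\text{recipient}}} + \tfrac{1}{n_{\text{opponent}}} + \tfrac{1}{n_{\text{observer}}}\right)}} \]

The second contrast substitutes the weights (0, 1, −1) for (2, −1, −1) in the same expression.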
The first planned contrast was significant for extreme negative outcomes, t(60)=2.57, p=.01, d=0.66. As shown in the first
row of Table 1, participants who received extreme negative outcomes were significantly less likely than their opponents or
observers to indicate that this outcome was foreseeable. However, the opponent of the player who received an extreme
negative outcome did not differ from the observer of this outcome. No significant differences were observed between the
outcome recipient and other participants or between the opponent and the observer for extreme positive outcomes or for
neutral outcomes. (Comparable results were observed when the recipient of an outcome was compared separately to the
opponent and the observer.)

TABLE 1 Mean foreseeability ratings, Ns, (and SDs) for extreme and non-extreme outcomes, by rater

                                Rater of outcome
Outcome rated            Outcome recipient   Opponent       Observer
Extreme outcome
  Negative^a             2.36 (1.43)         3.32 (1.80)    3.88 (2.15)
  n                      22                  19             22
  Positive^b             4.73 (1.22)         4.02 (2.23)    4.80 (1.63)
  n                      22                  18             20
Neutral outcome
  Losing opponent^b      6.00 (1.09)         5.83 (1.90)    6.32 (1.47)
  n                      19                  22             22
  Winning opponent^b     5.43 (1.81)         4.85 (1.41)    4.97 (1.32)
  n                      18                  22             20
^a In this row, the outcome recipient significantly differs from the others (i.e., opponent and observer) at p<.05 (i.e., Contrast 1 was significant).
^b In this row, neither planned contrast approached significance. Higher means indicate greater perceived foreseeability of the outcome.

DISCUSSION

The effect of self-relevant outcomes, negative and positive


Consistent with Mark and Mellor’s (1991) findings and with the findings of Louie (1999) and Louie et al. (2000) for negative
self-relevant outcomes, we found that when people experience outcomes that are negative and self-relevant, they see these
outcomes as less foreseeable than do others observing the same outcome. In particular, when one player’s stock dropped
precipitously, the person who had selected that stock saw the drop as significantly less foreseeable than did the opponent or the
observer. It appears that self-protective biases can operate to defend against the threat to the self-concept that would otherwise
occur after experiencing negatively valenced outcomes which could have negative implications for one’s sense of self—in
other words, after making a predictably bad decision (Blaine & Crocker, 1993; Mullen & Riordan, 1988; Taylor & Brown,
1988; Zuckerman, 1979).4 The present findings extend past research by experimentally comparing those who received
negative self-relevant outcomes with observers and opponents who received the same outcome information but for whom the
outcome was not self-relevant.
Despite the present and past evidence that negative, self-relevant outcomes attenuate foreseeability, there may be some
circumstances in which negative self-relevant outcomes would increase retrospective foreseeability and hindsight bias.
Holding the minority opinion about a group decision may be one such circumstance. For example, if a married couple
disagreed about whether or not to invest in a stock, and one prevailed on the other to buy, the dissenting partner may
experience heightened hindsight bias if the stock declines precipitously.
In contrast to negative outcomes, no effect of self-relevance was observed for positive outcomes. Players who experienced
a dramatic rise in their stock’s value did not see this outcome as more (or less) foreseeable than did their opponents or the
observer. This may seem inconsistent with Louie (1999) and Louie et al.’s (2000) finding of hindsight bias among those who
received positive self-relevant outcomes. In fact, however, there may be no inconsistency. Although Louie’s studies
demonstrated hindsight among those who received positive outcomes, relative to a no-outcome-information condition, they
did not examine whether the magnitude of their hindsight bias was greater than for people who knew the outcome but for
whom it was not relevant. Adding to the findings of Louie (1999) and Louie et al. (2000), the current results suggest that
positive self-relevant outcomes do not significantly increase or decrease retrospective foreseeability judgements, relative to
those who receive outcome information without self-relevance. This pattern of asymmetrical results is consistent with
research in a number of areas showing larger responses to negative information and events than to positive ones (Baumeister
et al., 2001).
Although we found no impact of positive self-relevant outcomes on hindsight in the present research, it would be premature
to conclude that only negative self-relevant outcomes influence perceived foreseeability (and, by extension, hindsight bias).
For example, positive self-relevant outcomes might increase hindsight bias under circumstances in which self-enhancing
motivations are intensified, such as when the person’s self-esteem had previously been threatened. In such a case, self-
enhancing motives may be increased, and claiming inflated foresight of a positive outcome would serve as a form of self-
affirmation (Steele, 1988). In addition, positive outcomes that are more intensely self-relevant than those manipulated in the
present research might themselves stimulate self-enhancement.

Self-relevance and self-serving processes


The high self-relevant conditions in the present research may have been less self-relevant than their counterparts in previous
studies. Certainly in relation to being laid off from work, and perhaps relative to a graded assignment in an MBA marketing
course (Louie et al., 2000), there may be less self-relevance for psychology undergraduates who receive an extreme outcome
in a stock market game. One possible implication, just noted, is that more intensely self-relevant outcomes might elicit self-
enhancement for positive outcomes (“Of course, I expected all along that things would turn out so well”), as well as self-
protection for negative outcomes. More generally, the question might be raised as to whether the observed effects are truly
due to self-serving biases or instead are the result of self-presentation (e.g., Schlenker, 1980).
There are several reasons we believe the findings are consistent with self-serving bias rather than self-presentation. First,
our results for negative outcomes converge with those of Mark and Mellor (1991), Louie (1999), Louie et al. (2000), and
Renner (2003, this issue, for the delayed measures), whose findings are not all easily accounted for by self-presentation.
Second, Louie et al. (2000) explicitly presented attributional and experimental evidence designed to support a self-serving
bias account, and Renner (2003-this issue) presents evidence that implicates perceived threat as having a mediating role,
consistent with the self-serving bias interpretation. These findings provide evidence implicating self-serving rather than self-
presentational processes. Third, the procedure we used, with its implications for lottery tickets and the competitive task,
clearly engaged our participants, who generally appeared to care about the outcome. Facial expressions, cheers or moans, and
verbal exclamations often accompanied the extreme outcomes. Fourth, the data collection procedures indicated that responses
were anonymous. No personal identifiers were associated with the responses, nor were there any other apparent cues to elicit
self-presentation. Although these procedures do not completely rule out the possibility that participants were to some extent
concerned about self-presentation, the convergence of evidence across studies leads us to endorse a self-serving bias
interpretation.

Retrospective judgements of foreseeability versus hindsight bias


The present research examined the effect of outcome valence and self-relevance on retrospective judgements of
foreseeability. We did not collect measures of judgements in foresight, nor did we include a no-outcome-information control
condition, as is standard in research on hindsight bias. Nevertheless, the findings, at least for negative self-relevant outcomes,
converge with other research (e.g., Louie, 1999; Louie et al., 2000) that included the traditional hindsight bias comparisons.
We argue that potential benefits may accrue if research on retrospections about foreseeability breaks out of a strict hindsight
bias research mould. Attention to hindsight bias has generated a rich array of empirical and conceptual developments (as this Special Issue of Memory illustrates). At the same time, any theoretical perspective or research tradition can also act as a set of blinders, diverting attention away from some questions while focusing attention on others (Greenwald, Pratkanis, Leippe, & Baumgardner, 1986). From this perspective, it may not have been coincidental that the Mark and Mellor (1991) investigation of self-serving biases, which Renner (2003-this issue, p. 457) characterises as the first hindsight study in a “highly self-relevant and consequential setting”, occurred in a real-world, quasi-experimental investigation that did not parallel conventional hindsight bias designs. As another example, without broadening the field’s focus to perceived foreseeability in general, we may fail to learn when it is that people spontaneously estimate the likelihood of an outcome in foresight (cf. Beach, 1993; Wong & Weiner, 1981).

4 The attributional findings from Louie (1999) support this classic motivational form of self-serving bias effect, as does the attenuation of hindsight in Louie et al. (2000) in response to a manipulation designed to inhibit self-serving biases. At the same time, the findings for negative self-relevant outcomes could have arisen at least in part because of related cognitive processes that are likely to be initiated by such outcomes. See Mark and Mellor (1991) for a summary of various cognitive processes that might also contribute to a self-serving pattern of responses.
In addition to serving as blinders on investigators’ attention, research traditions often bring practical constraints as well.
For instance, the present research included an observer who had outcome information but for whom the outcome lacked self-
relevance, as well as participants for whom each outcome was self-relevant. Obviously, it would be possible (and, indeed,
informative) to include such conditions in a traditional hindsight bias study that also manipulated outcome knowledge.
However, there are practical limits on study size, especially in the relatively complex settings needed to observe motivational
influences such as the influence of self-protection. And there could be pragmatic challenges of manipulating all these
variables successfully in a single study. For example, a self-relevance manipulation may not be equally potent for those in a
no-outcome information control condition as for those in an outcome-information condition.
Such constraints may hinder progress unless research on hindsight bias is diversified to study, more generally, the
psychology of retrospective judgements of foreseeability. Indeed, a case can be made that arbitrary methodological traditions
may have helped keep researchers from fully considering the role of self-serving biases until recently. The traditional
paradigms used to investigate hindsight bias do not allow a meaningful assessment of the impact of self-relevance. Hindsight
bias research typically involves participants receiving feedback about their performance on test items (e.g., Hoch &
Loewenstein, 1989), or making judgements about the outcomes of scenarios (e.g., Fischhoff, 1975; Wasserman, Lempert, &
Hastie, 1991) or about the performance of some paper-and-pencil other (e.g., Schkade & Kilbourne, 1991) or product (e.g.,
Mazursky & Ofir, 1990). To test hypotheses about the role of self-relevance, or to have a serendipitous finding that suggests
self-serving processes, it is necessary to use a procedure that allows participants to be exposed to positively or negatively
valenced outcomes that vary in self-relevance (Renner, 2003-this issue). It has often been observed that methodological
choices can restrict the development of knowledge, and this may be another case in point.
In addition, although the term hindsight bias may be well established, the field is moving away from a conceptualisation of
the phenomenon as a bias and towards a view of hindsight as a byproduct of a generally adaptive learning process (e.g., Hoch
& Loewenstein, 1989; Hoffrage et al., 2000; Roese & Olson, 1996). There appear to be important intersections among
hindsight bias/foreseeability, counterfactual thinking, causal attributions, the overconfidence effect, and other phenomena
(e.g., Hertwig, Gigerenzer, & Hoffrage, 1997; Lipe, 1991; Roese & Olson, 1996; Sherman & McConnell, 1995; Wasserman
et al., 1991). Viewed from this broader perspective, findings about the relative degree of retrospective foreseeability can be
quite informative, depending on the research question, even without the usual comparisons that allow an estimate of the
degree of hindsight “bias”.

Self-related concerns and recent work on hindsight bias


The present results may have implications for other research questions, including the ongoing debate about the relationship
between surprise and hindsight bias (e.g., Arkes et al., 1988; Mark & Mellor, 1994; Mazursky & Ofir, 1990; Ofir & Mazursky,
1997; Pezzo, 2003-this issue; Schkade & Kilbourne, 1991; Verplanken & Pieters, 1988). In particular, future research might
fruitfully examine whether the relationship between surprise and hindsight is moderated by outcome valence and self-
relevance. When surprise occurs in the context of a negative self-relevant outcome, hindsight bias may be attenuated by self-
protective processes. On the other hand, when an a priori surprising outcome is positive and self-relevant, the findings of Louie et al. (2000) and the present research suggest that the hindsight bias
will not be attenuated. Pezzo (2003-this issue) also raises a potentially important distinction between culpability and self-
relevance, and it may well be that negative self-relevant outcomes will attenuate hindsight, and therefore moderate the
relationship between hindsight and surprise, only when the person feels some sense of personal responsibility for the outcome
(culpability). This distinction may explain why Tykocinski (2001) found that people saw a negative event as more likely in retrospect—they probably did not see themselves as culpable. For instance, the Israeli college student participants in Tykocinski’s Study 2 probably did not see themselves as personally responsible for the defeat of their preferred candidate for
Prime Minister. Also see both Pezzo (2003-this issue) and Renner (2003-this issue) for suggestions that the relationship
between surprise and hindsight bias may change over time.
Work on self-relevance might also fruitfully be integrated with other recent work on the processes underlying hindsight.
Take as an example Hoffrage et al.’s (2000) RAFT (Reconstruction After Feedback with Take the Best) model. This model
makes three general assumptions about the “recollection process (at Time 3): First, if the original choice (made at Time 1)
cannot be retrieved from memory, it will be reconstructed by rejudging the problem. Second, the reconstruction involves the
attempt to recall the knowledge on which the original choice was based. Third, the outcome information received (at Time 2)
is used to update old knowledge, in particular knowledge that was elusive and missing at Time 1. In conjunction, these
assumptions suffice to explain the occurrence of hindsight bias.” (Hertwig, Fanselow, & Hoffrage, 2003-this issue, p. 360).
Research on self-relevance might be able to complement the RAFT model (or any other model of hindsight). The
reconstruction process may be influenced, not only by recall of the cues that were involved in the original choice and by the
(updated) cue values, but also by self-serving considerations in the case of self-relevant outcomes. In essence, self-serving
processes involve questions of the form, “How likely is it that I would have selected a stock if it was obvious that it would
decline precipitously in value?” It is unclear precisely when self-serving processes are integrated with the results of the
reconstruction identified by the RAFT model. As the present research suggests, however, the processes generated by a
negative self-relevant outcome can counteract the knowledge updating that would otherwise occur as a result of outcome
knowledge. In short, self-serving processes may need to be taken into consideration to obtain a comprehensive understanding
of hindsight.
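To make the quoted RAFT assumptions concrete, the following toy sketch (our illustration in Python, with made-up cue values; it is not code from Hoffrage et al., 2000) shows how rejudging a problem with feedback-updated cue knowledge pulls the reconstructed choice towards the known outcome.

# Illustrative sketch only: a toy Take The Best rule and a RAFT-style
# reconstruction of an earlier choice after outcome feedback.
from typing import List, Optional

def take_the_best(cues_a: List[Optional[bool]], cues_b: List[Optional[bool]]) -> str:
    """Choose 'A' or 'B' from the first cue (ordered by validity) that
    discriminates; unknown cue values (None) cannot discriminate."""
    for a, b in zip(cues_a, cues_b):
        if a is None or b is None or a == b:
            continue
        return "A" if a else "B"
    return "guess"

# Time 1: the most valid cue value for A is unknown, so the second cue
# decides and the person chooses B.
cues_a = [None, False]
cues_b = [False, True]
original_choice = take_the_best(cues_a, cues_b)              # -> 'B'

# Time 2: feedback reveals that A actually had the higher criterion value;
# the missing cue value is updated consistently with that outcome.
cues_a_updated = [True, False]

# Time 3: if the original choice cannot be retrieved, it is reconstructed by
# rejudging with the updated knowledge, which now points towards the outcome.
reconstructed_choice = take_the_best(cues_a_updated, cues_b)  # -> 'A'
print(original_choice, reconstructed_choice)  # B A: recollection drifts towards the outcome

In the sketch, the response reconstructed at Time 3 matches the outcome rather than the original Time 1 choice, which is the hindsight pattern; the self-serving processes discussed above would enter as an additional influence on this reconstruction.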

REFERENCES

Agostinelli, G., Sherman, S.J., Presson, C.C., & Chassin, L. (1992). Self-protection and self-enhancement biases in estimates of population
prevalence. Personality and Social Psychology Bulletin, 18, 631–642.
Arkes, H.R., Faust, D., Guilmette, T.J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305–307.
Arkes, H.R., Wortmann, R.L., Saville, P.D., & Harkness, A.R. (1981). Hindsight bias among physicians weighing the likelihood of
diagnoses. Journal of Applied Psychology, 66, 252–254.
Baumeister, R.F., Bratslavsky, E., Finkenauer, C., & Vohs, K.D. (2001). Bad is stronger than good. Review of General Psychology, 5,
323–370.
Beach, L.R. (1993). Broadening the definition of decision making: The role of prechoice screening of options. Psychological Science, 4,
215–220.
Blaine, B., & Crocker, J. (1993). Self-esteem and self-serving biases in reactions to positive and negative events: An integrative review. In
R.F.Baumeister (Ed.), Self-esteem: The puzzle of low self-regard (pp. 55–85). New York: Plenum Press.
Bradley, G.W. (1978). Self-serving biases in the attribution process: A reexamination of the fact or fiction question. Journal of Personality
and Social Psychology, 36, 56–71.
Brown, J.D. (1986). Evaluations of self and others: Self-enhancement biases in social judgments. Social Cognition, 4, 353–376.
Campbell, J.D. (1986). Similarity and uniqueness: The effects of attribute type, relevance, and individual differences in self-esteem and
depression. Journal of Personality and Social Psychology, 50, 281–294.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation. Chicago: Rand McNally.
Davies, M.F. (1992). Field dependence and hindsight bias: Cognitive restructuring and the generation of reasons. Journal of Research in
Personality, 26, 58–74.
Dunning, D., Leuenberger, A., & Sherman, D.A. (1995). A new look at motivated inference: Are self serving theories of success a product
of motivational forces? Journal of Personality and Social Psychology, 69, 58–68.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in hindsight. In D.Kahneman, P.Slovic, & A.Tversky
(Eds.) Judgment under uncertainty: Heuristics and biases (pp. 335–351). New York: Cambridge University Press.
Goethals, G.R. (1986). Fabricating and ignoring social reality: Self-serving estimates of consensus. In J. Olson, C.P.Herman, & M.P.Zanna
(Eds.), Relative deprivation and social comparison: The Ontario Symposium on Social Cognition: IV (pp. 135–157). Hillsdale, NJ:
Lawrence Erlbaum Associates Inc.
Greenwald, A.G. (1980). The totalitarian ego: Fabrication and revision of personal history. American Psychologist, 35, 603–618.
Greenwald, A.G., Pratkanis, A.R., Leippe, M.R., & Baumgardner, M.H. (1986). Under what conditions does theory obstruct research
progress? Psychological Review, 93, 216–229.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and heuristics affect our reconstruction of the past.
Memory, 11, 357–377.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hoch, S.J., & Loewenstein, G.F. (1989). Outcome feedback: Hindsight and information. Journal of Experimental Psychology: Learning,
Memory and Cognition, 15, 605–619.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Kunda, Z. (1987). Motivated inference: Self-serving generation and evaluation of causal theories. Journal of Personality and Social
Psychology, 53, 636–647.
Lipe, M.G. (1991). Counterfactual reasoning as a framework for attribution theories. Psychological Bulletin, 109, 456–471.
Louie, T.A. (1999). Decision makers’ hindsight bias after receiving favorable and unfavorable feedback. Journal of Applied Psychology,
84, 29–41.
Louie, T.A., Curren, M.T., & Harich, K.R. (2000). “I knew we would win”: Hindsight bias for favorable and unfavorable team decision
outcomes. Journal of Applied Psychology, 85, 264–272.
Macrae, C.N. (1992). A tale of two curries: Counter-factual thinking and accident-related judgments. Personality and Social Psychology
Bulletin, 18, 84–87.
Mark, M.M., & Mellor, S. (1991). Effect of self-relevance of an event on hindsight bias: The foreseeability of a layoff. Journal of Applied
Psychology, 76, 569–577.
Mark, M.M., & Mellor, S. (1994). “We don’t expect it happened”: On Mazursky and Ofir’s purported reversal of the hindsight bias.
Organizational Behavior and Human Decision Processes, 57, 170–181.
Markman, K.D., & Tetlock, P.E. (2000). “I couldn’t have known”: Accountability, foreseeability and counterfactual denials of
responsibility. British Journal of Social Psychology, 39, 313–325.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal of the hindsight bias. Organizational Behavior
and Human Decision Processes, 46, 20–33.
Miller, D.T., & Ross, M. (1975). Self-serving biases in attribution of causality: Fact or fiction? Psychological Bulletin, 82, 213–225.
Mullen, B., & Riordan, C.A. (1988). Self-serving attributions for performance in naturalistic settings: A meta-analytic review. Journal of
Applied Social Psychology, 18, 3–22.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organization Behavior and Human
Decision Processes, 69, 51–57.
Pezzo, M.V. (2003). Surprise, defence, or making sense: What removes the hindsight bias? Memory, 11, 421–441.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Roese, N.J., & Manier, S.D. (1997). Perceptions of purple: Counterfactual and hindsight judgments at Northwestern Wildcats football games.
Personality and Social Psychology Bulletin, 23, 1245–1253.
Roese, N.J., & Olson, J.M. (1996). Counterfactuals, causal attributions, and the hindsight bias: A conceptual integration. Journal of
Experimental Social Psychology, 32, 197–227.
Schkade, D.A., & Kilbourne, L.M. (1991). Expectation-outcome consistency and hindsight bias. Organizational Behavior and Human
Decision Processes, 49, 105–123.
Schlenker, B.R. (1980). Impression management: The self-concept, social identity, and interpersonal relations. Belmont, CA: Brooks/Cole.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Sherman, S.J., & McConnell, A.R. (1995). Dysfunctional implications of counterfactual thinking: When alternatives to reality fail us. In
N.J.Roese & J.M. Olson (Eds.), What might have been: The social psychology of counterfactual thinking (pp. 199–232). Mahwah, NJ:
Lawrence Erlbaum Associates Inc.
Steele, C.M. (1988). The psychology of self-affirmation: Sustaining the integrity of the self. In L.Berkowitz (Ed.), Advances in
experimental social psychology (Vol. 21, pp. 261–302). Orlando, FL: Academic.
Taylor, S.E. (1991). Asymmetrical effects of positive and negative events: The mobilization-minimization hypothesis. Psychological
Bulletin, 110, 67–85.
Taylor, S.E., & Brown, J.D. (1988). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin,
103, 193–210.
Turley, K.J., Sanna, L.J., & Reiter, R.L., (1995). Counterfactual thinking and perceptions of rape. Basic and Applied Social Psychology, 17,
285–303.
Tykocinski, O.E. (2001). I never had a chance: Using hindsight tactics to mitigate disappointments. Personality and Social Psychology
Bulletin, 27, 376–382.
Verplanken, B., & Pieters, R.G. (1988). Individual differences in reverse hindsight bias: I never thought something like Chernobyl would
happen. Did I? Journal of Behavioral Decision Making, 1, 131–147.
Wasserman, D., Lempert, R.O., & Hastie, R. (1991). Hindsight and causality. Personality and Social Psychology Bulletin, 17, 30–35.
Wong, P.T.P., & Weiner, B. (1981). When people ask “why” questions and the heuristics of attributional search. Journal of Personality and
Social Psychology, 40, 650–663.
Wood, J.V., Taylor, S.E., & Lichtman, R.R. (1985). Social comparison in adjustment to breast cancer. Journal of Personality and Social
Psychology, 49, 1169–1183.
Zuckerman, M. (1979). Attribution of success and failure revisited, or: The motivational bias is alive and well in attribution theory. Journal
of Personality, 47, 245–287.
Hindsight bias after receiving self-relevant health risk information: A
motivational perspective
Britta Renner
Ernst-Moritz-Arndt-Universität Greifswald, Germany

The phenomenon of hindsight bias was explored in the context of self-relevant health risk information.
Participants in a community screening estimated their cholesterol level (foresight measure) before receiving
positive or negative feedback based on their actual cholesterol level. Hindsight estimations were then assessed
twice: once immediately after the feedback, and again several weeks later. While the unexpected positive
feedback group showed no systematic recall bias, hindsight estimations of individuals receiving unexpectedly
negative feedback showed a dynamic change over time. Immediately after the feedback, participants’ recollections
of their expected cholesterol level were shifted towards their actual cholesterol level (hindsight bias). In contrast,
several weeks later, foresight estimations were recalled as less accurate than they had been (reversed hindsight
bias). These data might reflect a change of the motivational focus from “hot affect” and fear control, which occur
immediately after receiving negative feedback, to danger control, which occurs some time after the feedback, as
proposed by the dual process model.

After learning the outcome of an event, people tend to remember their former predictions incorrectly as being more consistent
with the outcome than they really were. This phenomenon is called “hindsight bias” and has been demonstrated in numerous
studies, employing a wide range of judgement materials such as general almanac questions (e.g., Fischhoff, 1975; Hell,
Gigerenzer, Gauggel, Mall, & Müller, 1988; Pohl, Ludwig, & Ganner, 1999a), political events (e.g., Blank & Fischer, 2000;
Powell, 1988), medical diagnosis (Arkes, Wortmann, Saville, & Harkness, 1981), poor nursing performance (Mitchell & Kalb,
1981), rape scenarios (Carli, 1999; Stahlberg, Sczesny, & Schwarz, 1999), team decisions (Louie, Curren, & Harich, 2000),
and stock purchase decisions (Louie, 1999).
Studies of hindsight bias mostly invoke cognitive explanations, arguing, for example, that hindsight bias is a by-product of
adaptive learning (Hoffrage, Hertwig, & Gigerenzer, 2000), or the result of biased reconstruction (e.g., Dehn & Erdfelder,
1998), or memory impairment (e.g. Fischhoff, 1975; Hell et al., 1988). While the influence of cognitive factors has been well
demonstrated, hindsight bias might also be governed or moderated by motivational factors. The idea that hopes, fears, wishes,
desires, and apprehensions affect judgements is compelling. However, only a few studies have provided evidence for
motivational influences (Campbell & Tesser, 1983; Haslam & Jayasinghe, 1995; Hell et al., 1988; Louie, 1999; Louie et al.,
2000; Mark & Mellor, 1991; Mark, Boburka, Eyssell, Cohen, & Mellor,
2003-this issue; Pezzo, 2003-this issue; Schwarz, 2001; Stahlberg & Schwarz, 1999; Verplanken & Pieters, 1988). Other
studies found no indications for motivational impact (Leary, 1981, 1982; Pohl & Hell, 1996; Pohl, Stahlberg, & Frey, 1999b;
Stahlberg, Eller, Romahn, & Frey, 1993; Synodinos, 1986). As a result, it is commonly assumed that motivational influences
on the formation of hindsight bias are at most “non-negligible but small” (Hawkins & Hastie, 1990, p. 323; see also Pohl,
1998). One major shortcoming of the investigation of motivational effects on hindsight bias to date has been that the term

Requests for reprints should be sent to Dr Britta Renner, Psychologie, Ernst-Moritz-Arndt-Universitat Greifswald,
FranzMehring-Str. 47, 17487 Greifswald, Germany. Email: renner@uni-greifswald.deThis research was supported by the
Deutsche Forschungsgemeinschaft Grant Schw 208/11–01–03, The Techniker Krankenkasse, Landesvertretung für Berlin und
Brandenburg, and the Kommission für Forschung und Wissenschaftlichen Nachwuchs, Freie Universität Berlin. I thank the
reviewers Hartmut Blank, Stefan Schwarz, and two anonymous reviewers. I am also grateful to Ulrich Hoffrage, Rüdiger
Pohl, Harald Schupp, Wolfgang Hell, Judith Bäβler, Tony Arthur, and Matthias Siemer for their discussion and helpful
suggestions on earlier drafts of this paper.
440 RENNER

“motivational factor” served as an umbrella term for widely differing variables such as personality dispositions, monetary
incentives for correct recall, event favourableness, or personal involvement. To further specify the conceptualisation of
underlying motives, Verplanken and Pieters (1988) proposed a distinction between “person-related” and “decision-related”
motives.
Personality dispositions can be summarised under “person-related” motives. It seems plausible that individuals with a
stronger tendency to maintain a favourable image of themselves, or a stronger motive for predicting events than others, show
greater hindsight bias. Campbell and Tesser (1983), for instance, found that a greater need for favourable self-presentation and
a higher self-rated ego-involvement were positively related to hindsight bias. However, the predictive value of these
individual differences for hindsight bias was rather small. A high correlation between personality traits and hindsight bias
would require hindsight bias to be a result of a “thinking disposition” (Stanovich & West, 1998), i.e., stability of the hindsight
phenomenon at the individual level. On the contrary, a review of 29 empirical studies on hindsight bias indicated that the bias
varies considerably at the intra-individual level (Pohl, 1998, 1999). Taken together, the evidence so far does not support
reliable and interpretable associations between differences in personality dispositions and hindsight bias (Pohl et al., 1999b;
but see Musch, 2003-this issue).
Turning our attention to “decision-related” motives or motivational state variables, several studies explored hindsight bias
as a function of task involvement. Most empirical studies on hindsight bias conducted to date have only required estimates to
be made from artificial event descriptions, or material that has little or no relevance for the judges. Given a low level of
personal involvement, people may be especially prone to performance errors due to momentary lapses such as lack of
attention, distraction, or temporary memory deactivation (Stanovich & West, 1998), and motives might not in fact be the
issue. In this context, it is especially important to keep in mind that different motives of varying strength may be provoked,
depending on the motivational significance of the material to be judged. Two task features in particular might be of great
consequence for motivational phenomena: (1) self-involvement and (2) outcome valence. General knowledge questions, for
instance, probably induce only a low level of personal involvement in the judgement task, since the outcome of these
statements does not bear any significance for the judges. Whether judges, for example, underestimated or overestimated the
height of the Empire State Building has no effect on them, and consequently does not elicit motivational dynamism. Of
course, motivational drive might arise from an expected assessment of the degree of accuracy in answering these questions.
Respondents might strive to appear intelligent or highly prognostic, particularly when the degree of accuracy is explicitly
evaluated and rewarded. In line with this notion, Hell et al. (1988) found that monetary incentives for accuracy in recall of 88
general knowledge questions reduced the overall magnitude of the bias, in interaction with delay in the recall task, to a small
but significant degree. Unfortunately, these findings represent the exception rather than the rule (see for example Camerer,
Loewenstein, & Weber, 1989). Furthermore, experimental variations of this “try hard to recall accurately” instruction, which
were intended to elicit stronger task involvement, proved mostly to be ineffective (Hell et al., 1988; Pohl, 1998). Thus, studies
manipulating task involvement by monetary incentive or “try hard to recall accurately” instructions do not suggest strong
motivational effects on hindsight bias.
One might argue that motivational effects may be prevalent in important, naturally occurring judgement tasks that are
relevant for one’s personal life. Hence, to investigate motivational effects, outcomes should be constructed that have
relevance and bear consequences for the respondents. In this case, motivational effects are not only stimulated by the task, but
also by the outcome which impacts positively or negatively on the judges.
Given self-relevant outcomes, the study of hindsight bias can be informed by work in other domains. Within the motivated-
judgement literature, the empirical finding that has received the greatest attention during the last two decades is the robust
tendency to view oneself in an unrealistic light, and to perceive favourable information as more valid, accurate and internally
caused than unfavourable information (Armor & Taylor, 1998; Ditto, Scepansky, Munro, Apanovitch, & Lockhart, 1998).
Consequently, it is commonly assumed that personally relevant feedback that is inconsistent with self-beliefs and personal
goals produces systematically self-defensive biases in judgements as a function of its positivity (Armor & Taylor, 1998; Ditto
& Boardman, 1995; Kunda, 1987, 1990). Drawing especially on the finding that individuals take credit for favourable outcomes
and avoid blame for unfavourable outcomes, Mark and Mellor (1991) proposed that individuals show a reduced or even
reversed hindsight bias selectively for unfavourable outcomes (see also Louie, 1999; Mark et al., 2003-this issue; Stahlberg &
Schwarz, 1999). Derogating the predictability of the outcome decreases internal attributions for one’s plight, and saves
individuals from unpleasant feelings of guilt, regret, or blame for their situation, or from the notion that they should have
prevented it. Hence, unfavourable outcomes may lessen or even reverse the bias, due to self-serving mechanisms. For testing
this assumption, Mark and Mellor (1991) investigated hindsight bias in the context of a “real-life” setting. They asked (a)
workers who were laid off for an average of 27 months, (b) workers who were not laid off, and (c) members of the local
community, to rate retrospectively the foreseeability of the layoffs in their union local. The result showed that laid-off
workers rated the layoffs as less foreseeable than survivors or community members did. They concluded that laid-off workers
showed lower hindsight bias in comparison to the other two groups since the implications of blame for their situation
motivated them to deny the predictability of the outcome. As foresight estimations were not measured, this interpretation rests
on the assumption that the groups did not differ systematically in their perceived foreseeability before the layoffs. However,
the results did not change when Mark and Mellor (1991) controlled statistically for potential confounding variables (e.g., year
of job seniority), which supports the interpretation that the favourableness of the event influences hindsight bias. In more
recent studies, also using a between-subjects design but in an experimental setting with fictitious outcomes, equivalent results
have been obtained (Louie, 1999; Louie et al., 2000; Mark et al., 2003-this issue; Schwarz, 2001; Stahlberg & Schwarz,
1999).
To date, only one other study has investigated hindsight bias in the context of a non-laboratory, highly self-relevant and
consequential setting. Haslam and Jayasinghe (1995) asked undergraduates to predict their grades one week prior to a midterm
exam (foresight estimation). Two weeks after the exam respondents received their actual grade, and about one week later they
were asked to recall their predictions (hindsight estimation). The researchers report a typical hindsight bias for students who
were too optimistic in their predictions: those who received an unexpected poor grade improved their foresight estimation in
retrospect. In contrast, students who gave too pessimistic predictions showed a reversed hindsight bias. They recalled their
foresight predictions as less accurate than they had been. Clearly, these findings are not in line with Mark and Mellor’s (1991)
interpretation that negative self-relevant outcomes reduce hindsight bias. Haslam and Jayasinghe (1995) also consider their
findings as evidence for self-serving mechanisms. However, they propose a different motivational mechanism. They argue
that hindsight bias makes the outcome seem more foreseeable, and therefore enhances a sense of control in retrospect.
Particularly in the context of negative outcomes that are under behavioural control, such as poor task performance or illness
and disease, hindsight bias might function as a strategy for enhancing control by emphasising the relationship between action
and outcome. The ability to detect signs of illness beforehand is important for the prevention and control of health problems.
Hence, some individuals might gain benefit from the perception that they contributed to their situation because it restores a
sense of control, and implies that they will be able to avoid a similar misfortune in the future (Thompson, Armstrong, &
Thomas, 1998). From this perspective, individuals who face a negative outcome with most severe and threatening
consequences should demonstrate the most hindsight bias. In contrast, individuals who discovered that their predictions were
too pessimistic might show reversed hindsight bias, because admitting an unexpectedly favourable test result enhances
positive affect, and is a pleasant surprise. However plausible this interpretation might seem, it must remain speculative since
these findings have not been replicated.
Taken together, the present literature leads consistently to the pessimistic conclusion that motivational influences on the
formation of hindsight bias are at most, “non-negligible but small” (Hawkins & Hastie, 1990, p. 323; see also Pohl, 1998).
However, an area of research has been identified which suggests that motivational factors do impact on the formation of
hindsight bias. Studies on hindsight bias employing a naturally occurring outcome of high self-relevance have shown that
outcome valence does indeed influence the amount of hindsight bias, presumably by invoking self-protective motives. However,
to date only two studies have employed a design in which self-relevant outcomes were the focus of the estimates.
Interestingly, the pattern of hindsight bias and the proposed underlying self-serving mechanisms were quite different in both
studies. These apparently conflicting results may reflect the dynamic nature of the motive structure.
Self-relevant negative feedback invokes a dynamic motive structure, which might involve a rather short-term phase of “hot
affects” (e.g., fear), followed by more cognitive representations of the threat such as “perceived vulnerability” (Leventhal et
al., 1997; Renner & Schwarzer, in press). Hence, motives could change over time depending on the current situational
circumstances, and cannot be considered as constants that uniformly influence judgements. Interestingly, the time of
measurement of hindsight estimations differed dramatically in both studies described earlier. While Haslam and Jayasinghe
(1995) asked their participants about one week after the feedback to recall their foresight estimations, Mark and Mellor (1991)
elicited estimates after an average delay of 27 months. Accordingly, exploring effects of motivation on the hindsight
phenomenon might benefit by considering changes in the motive structure over time.

THE PRESENT RESEARCH


The present study explores the impact of motivation on hindsight bias in the context of self-relevant health risk information. A
community cholesterol screening was used as a context. Participants received their actual cholesterol reading as feedback,
which was either positive for the self (normal cholesterol reading) or negative for the self (elevated cholesterol reading). Since
a high cholesterol level is a primary risk factor for cardiovascular diseases such as heart attack, the feedback is naturalistic and
of clear emotional importance for the participants. Hence, it is justified to assume that self-protective motives of significant
strength are elicited. In order to assess the temporal dynamics presumed to be invoked by self-threatening health feedback, a
longitudinal perspective was applied and hindsight bias was measured on two occasions. The time of measurement was
chosen according to the motivational dynamics explored in the context of fear-communication. These studies showed that fear
appeals led initially to increased fear and acceptance of the recommended action, but both fear and attitude changes faded
away after 24 to 48 hours (Leventhal et al., 1997). Thus, in order to clearly separate these different foci of threat-feedback
processing, a first estimate of hindsight bias was taken shortly after receiving feedback, while the second measurement was
postponed for several weeks.
The specific hypotheses of the present study were developed according to the dual process model (Leventhal, Safer, &
Panagis, 1983; see also Leventhal et al., 1997), which distinguishes two motives aroused by self-threatening information: (1)
fear-control motivation and (2) danger-control motivation. Fear-control motivation stimulates behaviour or cognitions that are
needed to cope with emotion, whereas danger-control motivation stimulates behaviours that are needed to cope with the
threatening agent itself. Fear-control motivation is high immediately after receiving a negative self-threatening feedback,
whereas danger-control motivation is more prominent later on. This leads to the assumption that after a negative outcome is
given, motivational pressures that influence hindsight bias will lessen or change over time. One can speculate that
immediately after the self-threatening feedback, people strive to regain a sense of predictability and control in order to calm
their emotional upset and to generate action plans for coping, and therefore demonstrate hindsight bias (e.g., “I already knew
that my cholesterol level would be high since I have put on some weight in the last few months. But this will soon change”).
Hence, hindsight bias may serve the important function of controlling potentially disruptive emotions, which might interfere
with adaptive behaviour. Thus, in common with the predictability motive proposed by Haslam and Jayasinghe (1995), it is
hypothesised that immediately after the feedback participants who received an unexpected negative cholesterol test result will
demonstrate hindsight bias.
After the strong initial emotional impact has faded away and the threatening information has been “digested” people might
feel more in control, and may therefore focus more on decreasing their responsibility for the past negative outcome.
Alternatively, one can hypothesise that after problem-focused action has been taken and emotional upset has lessened,
individuals can more readily admit that their past judgement was inaccurate. This would result in a decreased hindsight bias.
Therefore, delayed hindsight estimates should be more influenced by the motive to decrease responsibility, as proposed by
Mark and Mellor (1991), and should therefore display an inverse pattern: unexpected unfavourable feedback should lead to a
decreased or even reversed hindsight bias.
To ensure that this pattern is due to negative feedback valence, comparisons were made between participants who received
unexpected negative feedback and participants who received an unexpected positive feedback. Participants receiving an
unexpected negative feedback should show a stronger tendency for hindsight bias immediately after feedback than
participants who received a positive test result, since they should be more motivated to regain predictability and control.
Delayed hindsight estimations of both groups should also differ: Unexpected negative feedback should lead to a decreased or
even reversed hindsight bias in comparison to unexpected positive feedback because the former should elicit a stronger
tendency to decrease responsibility at this point in time.

METHOD

Participants
A large proportion of the participants (66%) were recruited for a cholesterol-screening conducted by the Free University of
Berlin and the Technician’s Health Insurance Agency (Techniker Krankenkasse) through advertisements placed in local
newspapers in Berlin, Germany. In addition, a letter describing the study was sent to people insured with the Technician’s
Health Insurance Agency who lived near the four study locations (two universities and two city halls). In all, 1506 individuals
were recruited.
Of these 1506 individuals, 92 participants (6%) had to be excluded from the data set because they did not complete the
foresight measure or the first hindsight measure. Another 511 participants (34%) failed to complete the third questionnaire,
which included the second hindsight measure. Accordingly, the “study sample” comprised 903 individuals (60%), who
provided complete data sets including the foresight and both hindsight questions. The mean age of these participants was 42
years (SD=15.7), 47% were male, and the average cholesterol level was 220 mg/dl (SD = 45.3), which is within the
borderline high range and below the mean German population cholesterol level of 237 mg/dl (Troschke, Klaes, Maschewsky-
Schneider, & Scheuermann, 1998).
The data from the 511 individuals providing only the first hindsight estimate were considered in control analyses. Thus,
systematic differences between the study sample providing both hindsight measures and this “control sample” were explored
regarding the pattern of hindsight bias for the first hindsight measure. Of these 511 participants, 46% were male, and they
were on average 37 years old (SD=14.5). Average cholesterol level was 215 mg/dl (SD=45.4). Analysis showed that the
control sample was on average 5 years younger than the study sample providing complete data, t(1412)=5.38; p < .001.
Furthermore, they exhibited a significantly lower mean cholesterol reading, t(1412)=2.10; p=.036, in comparison to the study
sample. There was no significant gender difference between the two samples, p=.55.

Measures
Foresight estimation. Individuals completed a first questionnaire asking them to indicate their beliefs about a series of
different health problems and disorders. The foresight estimation was embedded in the questionnaire. Participants were asked
“Immediately after completing this questionnaire your cholesterol level will be measured. What cholesterol level do you
expect?” Participants rated their expected cholesterol test result on a scale of 1 (very low) through 4 (optimal) to 7 (very high).1

Hindsight estimation. After cholesterol test result feedback was given, participants completed a second questionnaire,
which included the first hindsight estimation. The general stem for the item was, “Please think back to the first brief
questioning, which took place before the cholesterol measurement. There, you were asked which cholesterol level you
expected. What did you expect at that time?”. Responses were made on the same 7-point scale used for foresight estimation
ranging from 1 (a very low cholesterol test result) through 4 (an optimal cholesterol test result) to 7 (a very high cholesterol
test result). The same item was included in the third questionnaire, which was completed at home.
Feedback reception. The second questionnaire, which was given shortly after receiving the feedback, also assessed various
responses associated with receiving health-relevant feedback information. Three questions served to assess perceived threat.
Participants were asked to rate how worried they felt due to their cholesterol test result. Ratings were made on a scale of 1
(absolutely not worried) through 4 (worried) to 7 (very worried). Furthermore, participants were asked to rate how serious a
threat to health their cholesterol level was on a 7-point scale, anchored by 1 (very low) through 4 (moderately high), to 7
(very high). A further question, the perceived pressure to change, reflects the extent to which a person feels the pressure to
lower the cholesterol level and change behaviours. High pressure to change is induced by threatening situations, which
require personal action to change the situation (cf. Fuchs, 1996). Participants were given the following statement: “It is
necessary for me to do something to lower my cholesterol level.” The responses were given on a 4-point scale, ranging from 1
(strongly disagree) to 4 (strongly agree). Own worry, perceived threat, and perceived pressure to change were assessed
immediately after the test results were given and before the first hindsight estimation. They were measured in fixed order and
were separated by other variables (e.g., perceived prevalence of cardiovascular risk factors and diseases).
Following Ofir and Mazursky (1997), feelings of surprise were assumed to influence the amount of hindsight bias. Accordingly, surprise
elicited by feedback was measured by asking participants to indicate how surprised they were by the cholesterol test result on
a 5-point scale, ranging from “I was very positively surprised by my cholesterol test result” (+2) through “My cholesterol test
result matched with my initial expectation” (0) to “I was very negatively surprised by my cholesterol test result” (−2).
Perceived surprise was assessed before the first hindsight estimation.

Procedure
Upon their arrival at the screening site, participants received a brief description of the study and signed a consent form.
Participants were then asked to answer a first questionnaire which included the foresight estimation of the cholesterol test
result. Afterwards, participants’ height and weight were measured. Then, trained laboratory assistants measured the total
cholesterol level using a fingerstick blood draw and a Reflotron desktop analyser. Following the cholesterol measurement,
participants were provided with their exact actual cholesterol level. Furthermore, participants received feedback on their
cholesterol level risk category according to international standards (National Heart, Lung, and Blood Institute, 1995).
Participants with a cholesterol level of 200 mg/dl or below were told that their cholesterol level was optimal and did not pose
a risk for cardiovascular diseases. Both individuals with borderline high cholesterol levels (between 201 mg/dl and 249 mg/
dl) and participants with high cholesterol levels (above 249 mg/dl) were informed about potential risks of borderline and high
cholesterol levels for cardiovascular diseases. The time between filling in the first questionnaire and the cholesterol feedback
was about 30–40 minutes. Shortly after receiving the cholesterol feedback, participants were given a second questionnaire.
Among filler questions, this questionnaire included the first hindsight measure and assessed perceived threat and surprise
elicited by the cholesterol feedback. After completing the second questionnaire participants received individualised follow-up
recommendations, they were thanked for their participation, and received a more detailed questionnaire that included an
additional hindsight estimation. This one was completed at home and sent back in a sealed envelope. On average, the third
questionnaire was sent back 5 weeks after feedback (SD=1.7).2

1Only elevated total cholesterol levels are considered as a health risk factor. Levels under 201 mg/dl do not require medical attention. Since
cholesterol levels could be considerably below 201 mg/dl, the rating scale included ratings from very low to very high.
2 A limitation of the study is that the feedback groups may have differed in their exact return time of the third questionnaire. Those who were given unexpected positive feedback might have filled in the questionnaire right away whereas those who received unexpected negative
feedback might have shown a greater delay. However, since the standard deviation of the average return time was less than 2 weeks, great
differences between the two groups do not seem plausible.
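The feedback categories described in the Procedure follow a simple threshold rule. A minimal sketch of that rule is shown below (the function name is ours and purely illustrative; the study itself used trained assistants applying the international guidelines cited above):

def risk_category(total_cholesterol_mg_dl: int) -> str:
    """Feedback category as described in the Procedure (thresholds from the text)."""
    if total_cholesterol_mg_dl <= 200:
        return "optimal"           # told the level poses no cardiovascular risk
    elif total_cholesterol_mg_dl <= 249:
        return "borderline high"   # informed about potential risks
    else:
        return "high"              # informed about potential risks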

RESULTS

Foresight accuracy
Hindsight bias is a pattern of relationships among foresight estimations, actual cholesterol test results, and hindsight
estimations. However, participants who predicted their cholesterol test result accurately could not improve their prediction in
hindsight, but could only show either perfect memory or a decrease in accuracy. Only participants who gave a foresight
estimation that did not match with the actual test result could theoretically demonstrate either accurate recall, hindsight bias, or
reversed hindsight bias. Thus, the first analysis compared the incidence of accurate foresight estimations, determined by
comparing participants’ foresight estimations with their actual cholesterol test results (see Table 1). Participants received three
qualitatively different types of feedback: optimal, borderline high, or high cholesterol level, and foresight estimations were given on a 7-point scale.3 Participants were divided according to whether they expected an optimal or even lower cholesterol test result (“not at risk”), or an elevated reading (“at risk”), and whether they had received an optimal reading (“not at risk”; ≤ 200 mg/dl) or an elevated reading (“at risk”; > 200 mg/dl), resulting in a 2×2 table.
As shown in Table 1, 64% of the study sample (n=580, i.e., the sum of the areas outside the two frames) were able to
accurately predict their actual cholesterol level. These individuals were excluded from all further analyses because due to
their accurate foresight estimations they could not show hindsight bias. Hence, the following analyses are solely based on
individuals who received unexpected feedback (n=323, i.e., the sum of the two framed areas).
Out of the remaining 323 participants of the study sample who predicted their cholesterol level inaccurately, 198 (22%, the
sum of the upper right frame in Table 1) estimated their cholesterol test result to be optimal or even lower while their actual
cholesterol level was borderline high or high (above 200 mg/dl). Thus, these participants received an unexpected negative test
result and therefore demonstrated an optimistic bias. When the foresight estimation exceeded the actual test result, a
pessimistic bias existed (a participant estimated that his or her cholesterol test result would be high or borderline high, and the
actual test result was optimal or lower). These participants received an unexpected positive test result (n =125, 14%, the sum
of the lower left frame in Table 1). Hence, if participants made an inaccurate prediction, they were more likely to make an unrealistically optimistic prediction than an unrealistically pessimistic one, p < .001.
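For readers who wish to reproduce this kind of cross-classification, a minimal sketch in Python follows. It is an illustration only, not the analysis code used in the study: the file name, column names, and the choice of a binomial test for the optimistic-versus-pessimistic asymmetry are all assumptions.

    # Illustrative sketch (hypothetical file and column names); not the original analysis.
    import pandas as pd
    from scipy.stats import binomtest

    df = pd.read_csv("screening.csv")                      # one row per participant

    df["expected_risk"] = df["foresight_rating"] >= 5      # ratings 5-7 = "at risk" expectation
    df["actual_risk"] = df["cholesterol_mg_dl"] > 200      # elevated reading = "at risk" feedback

    print(pd.crosstab(df["expected_risk"], df["actual_risk"]))   # the 2x2 classification

    # Unexpected negative result = not expected to be at risk, but elevated reading;
    # unexpected positive result = expected to be at risk, but optimal reading.
    optimistic = int(((~df["expected_risk"]) & df["actual_risk"]).sum())
    pessimistic = int((df["expected_risk"] & ~df["actual_risk"]).sum())
    print(binomtest(optimistic, optimistic + pessimistic, 0.5))  # test of the asymmetry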

Overall frequency of hindsight bias as a function of time


As a first step, the incidence or generality of biased judgements in hindsight was assessed. Biased judgements were calculated
by subtracting hindsight responses from those generated in foresight. A score of zero indicates accurate recall. For
participants who were given an unexpected high test result (negative feedback) a negative score indicates hindsight bias,
while a positive score shows reversed hindsight. Those who received an unexpected low cholesterol test result (positive
feedback) displayed hindsight bias if the deviation is positive, and reversed hindsight bias if the score is negative.
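A minimal sketch of this scoring rule is given below; the column names are hypothetical and the code is illustrative rather than the scoring script actually used in the study.

    # Illustrative sketch of the bias-score classification (hypothetical column names).
    import pandas as pd

    def classify_recall(row):
        score = row["foresight"] - row["hindsight"]        # zero indicates accurate recall
        if score == 0:
            return "accurate"
        if row["valence"] == "negative":                   # unexpected high (negative) feedback
            return "hindsight bias" if score < 0 else "reversed hindsight bias"
        # unexpected low (positive) feedback
        return "hindsight bias" if score > 0 else "reversed hindsight bias"

    df = pd.DataFrame({
        "foresight": [4, 5, 4],
        "hindsight": [5, 5, 3],
        "valence":   ["negative", "positive", "negative"],
    })
    print(df.apply(classify_recall, axis=1).tolist())
    # ['hindsight bias', 'accurate', 'reversed hindsight bias']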
Immediately after test result feedback, 70% of the participants confronted with an unexpected test result accurately recalled
their foresight estimation (see Table 2). As expected, recall accuracy declined as the recall interval increased: After a time
interval of 5 weeks had elapsed, only 47% recalled their foresight estimation accurately. Thus, accurate recall was higher when probed shortly after feedback than after a 5-week delay, p < .001.
Considering the immediate hindsight measure, there is a clear systematic recall bias, since 26% (n = 83) showed hindsight bias whereas only 4% (n = 13) displayed reversed bias, p < .001. Hence, the hindsight bias phenomenon could be replicated in a naturalistic, personally relevant setting. Considering the delayed hindsight measure, the judgement errors change from a pattern of hindsight to a systematic pattern of reversed hindsight. Participants more often remembered a foresight estimation that was more dissimilar than similar to the feedback. To be specific, 32%

3 Due to this discrepancy between the foresight measure and the feedback, one could realise different “matches” between both, depending
on the applied cut-off point. For cardiovascular diseases it is only relevant if the cholesterol level is elevated. Hence, the most important
qualitative distinction for participants was probably whether they considered themselves at risk (foresight ratings from “moderately high”
through “very high”) or not at risk (foresight ratings from “optimal” through “very low”).

TABLE 1 Frequencies and percentages for study sample and control sample as a function of expected cholesterol test result (foresight
estimation) and actual cholesterol test result (feedback)
Feedback
Foresight estimation (expected cholesterol level) | Optimal (≤ 200 mg/dl) | Borderline High (201–249 mg/dl) | High (≥ 250 mg/dl) | Total
Very Low (1)        | 2 (1); 0.2% (0.2%)       | 3 (1); 0.3% (0.2%)       | 1 (0); 0.1% (0%)         | 6 (2); 0.7% (0.4%)
Low (2)             | 17 (9); 1.9% (1.8%)      | 10 (10); 1.1% (2.0%)     | 1 (2); 0.1% (0.4%)       | 28 (21); 3.1% (4.1%)
Moderately Low (3)  | 10 (8); 1.1% (1.6%)      | 5 (5); 0.6% (1.0%)       | 3 (3); 0.3% (0.6%)       | 18 (16); 2.0% (3.1%)
Optimal (4)         | 159 (106); 17.6% (20.7%) | 131 (63); 14.5% (12.3%)  | 44 (26); 4.9% (5.1%)     | 334 (195); 37.0% (38.2%)
Moderately High (5) | 115 (68); 12.7% (13.3%)  | 200 (117); 22.1% (22.9%) | 133 (50); 14.7% (9.8%)   | 448 (235); 49.6% (46.0%)
High (6)            | 10 (8); 1.1% (1.6%)      | 17 (13); 1.9% (2.5%)     | 37 (18); 4.1% (3.5%)     | 64 (39); 7.1% (7.6%)
Very High (7)       | 0 (0); 0% (0%)           | 1 (1); 0.1% (0.2%)       | 4 (2); 0.4% (0.4%)       | 5 (3); 0.6% (0.6%)
Total               | 313 (200); 34.7% (39.1%) | 367 (210); 40.6% (41.1%) | 223 (101); 24.7% (19.8%) | 903 (511); 100.0% (100.0%)
Study sample n=903. Control sample n=511.
Numbers outside parentheses represent frequencies for the study sample.
Numbers within parentheses represent frequencies for the control sample.
The upper right frame includes participants who estimated their cholesterol test result to be optimal or low, and whose actual cholesterol
level was borderline high or high (above 200 mg/dl), thus they received an unexpected negative test result.
The lower left frame includes participants who estimated their cholesterol test result to be borderline high or high, and whose actual test
result was optimal or low. Therefore, they received an unexpected positive test result.

showed reversed hindsight bias, whereas 21% displayed hindsight bias.

Biased judgements in hindsight as a function of feedback valence and measurement point in time
A 2×3 mixed between-within-subjects analysis of variance (ANOVA) was conducted to determine whether participants
showed a memory bias towards the actual cholesterol level (hindsight bias), or a reversed hindsight bias as a function of
feedback valence and time of measurement. In this analysis, the within-subject factor named “Time of Measurement”
represents the three judgements of the expected cholesterol level, i.e., foresight, first and second hindsight estimate. The
between-subjects factor “Feedback Valence” is based on the feedback participants were given and consists of two levels,
unexpected negative or unexpected positive feedback. The reported results are based on a full factorial model which meets
standard requirements of ANOVA. Each effect was adjusted for all the other effects in the model. Therefore, the total SS is
divided into a source attributable to the between-subjects factor “Feedback Valence”, and a source attributable to the within-
subject factor “Time of Measurement”. In addition, changes over time in hindsight estimations were examined by
constructing simple main effects, which means the effect of the within-subject factor “Time of Measurement” was computed
within each level of the between-subjects factor “Feedback Valence”. This is a rather conservative approach, since total SS
remains constant over all analyses, but it has the advantage that all F values are comparable (cf. Kirk, 1968).
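As an illustration of this kind of model, the sketch below runs a mixed ANOVA with the pingouin package on a hypothetical long-format data set; note that the per-group follow-up tests shown use a separate error term for each group rather than the pooled error term described above, so the resulting F values would not match those reported here exactly.

    # Illustrative sketch (hypothetical file and column names); not the original analysis.
    import pandas as pd
    import pingouin as pg

    # Long format: one row per participant per measurement occasion.
    df = pd.read_csv("cholesterol_long.csv")   # columns: pid, valence, time, estimate

    # Omnibus 2 (Feedback Valence, between) x 3 (Time of Measurement, within) ANOVA.
    print(pg.mixed_anova(data=df, dv="estimate", within="time",
                         subject="pid", between="valence"))

    # Follow-up: effect of time within each feedback group (separate error terms).
    for group, sub in df.groupby("valence"):
        print(group)
        print(pg.rm_anova(data=sub, dv="estimate", within="time", subject="pid"))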

TABLE 2 Frequency of hindsight bias immediately after feedback (t1) and delayed after feedback (t2)
                                          | Accurate  | Hindsight bias | Reversed hindsight bias | Total
Immediately after feedback (t1)           | 227 (70%) | 83 (26%)       | 13 (4%)                 | 323 (100%)
Approximately 5 weeks after feedback (t2) | 151 (47%) | 69 (21%)       | 103 (32%)               | 323 (100%)
Results only for the study sample who received unexpected feedback, n=323.

Figure 1. Mean foresight and hindsight estimations as a function of feedback valence and time of measurement. (Means not sharing a
common superscript differ at p < .05.)

The 2×3 ANOVA yielded a significant main effect of the between-subjects factor “Feedback Valence”, F(1, 321)=422.23; p < .001. Inspection of the mean estimations revealed that participants who received unexpected positive feedback gave higher foresight and hindsight estimations than participants who received unexpected negative feedback. In addition, the within-subject factor “Time of Measurement” was significant, F(2, 642)=15.07; p < .001. Hence, mean foresight and hindsight estimations varied across time, confirming the results of the frequency analysis (Table 2). However, as expected, the two main effects were further qualified by a significant interaction, F(2, 642)=18.76; p < .001.
Accordingly, simple main effects of the within-subject factor “Time of Measurement” were assessed within each feedback group. Within the unexpectedly negative feedback group, estimates differed significantly across the three measurements, F(2, 642)=39.23; p < .001. Post hoc Scheffé contrasts were calculated according to the hypothesis. As shown in Figure 1, participants who were given unexpectedly negative feedback demonstrated hindsight bias when they had to recall their foresight estimate immediately after the negative health feedback. That is, recall of the foresight estimate showed an upward shift: in hindsight, immediately after the feedback, a foresight estimation was remembered that was more similar to the actual test result than the original estimate had been. Interestingly, the pattern of biased judgement reversed across time, showing a downward shift at the second hindsight measurement. Thus, at the delayed measurement participants recalled their foresight estimate as more discrepant from the actual cholesterol level than it had originally been.
To ensure that the observed pattern is due to feedback valence rather than to self-relevance of feedback, the effect of the
within-subject factor “Time of Measurement” was assessed within the unexpected positive feedback group. As Figure 1
shows, the results for this group contrast with the group receiving unexpected negative cholesterol feedback. Their recall of the
foresight estimate was characterised by a slight and statistically non-significant shift in the direction of the given feedback for both hindsight estimates, F(2, 642)=2.83; p=.06.
To summarise, mean hindsight estimations differed as a function of measurement point in time and feedback valence. As
hypothesised, unexpected negative feedback resulted in hindsight bias when measured immediately after feedback, whereas a reversed hindsight bias emerged when measured after a delay. The unexpected positive
feedback group showed on average neither a dynamic change over time nor a systematic bias in recall. The asymmetry
between both feedback groups supports a motivational account of hindsight bias.4

Control analyses: Hindsight bias in the control sample that provided only the first hindsight measure
Analogous to the study sample, 64% (n=325) of the control sample received a cholesterol test result that was consistent with their initial expectation, 21% (n=110) received an unexpected negative cholesterol reading, and 15% (n=76) were confronted with an unexpected positive result. There were no significant differences between the control group and the study group in the frequency of realistic, optimistic, or pessimistic expectations, p=.87 (see Table 1). Analogous to the analyses for the study sample, participants who predicted their cholesterol level accurately were excluded from further analyses, since by definition they could not demonstrate hindsight bias. This left 186 individuals who received unexpected feedback for the analyses.
Immediately after the test result feedback, most participants of the control sample who received unexpected feedback recalled their foresight estimation accurately (68%), but even so hindsight bias (28%) was more frequent than reversed hindsight bias (4%), p < .001. Hence, the hindsight bias phenomenon could be replicated in a second sample. In addition, the control sample and the study sample did not differ significantly with respect to the frequency of biased recall, p=.85.
To explore whether feedback valence moderated hindsight bias immediately after the feedback, a 2×2 ANOVA with the within-subject factor “Time of Measurement” (foresight and first hindsight estimate) and the between-subjects factor “Feedback Valence” (unexpected positive and unexpected negative feedback) was conducted. The main effect of “Feedback Valence” was significant, F(1, 184)=205.98; p < .001, whereas the within-subject factor “Time of Measurement” did not reach statistical significance, F(1, 184)=1.00; p=.32. More important, the hypothesised interaction emerged, F(1, 184)=29.48; p < .001. In accordance with the study sample, participants who received an unexpected negative feedback showed hindsight bias, F(1, 184)=25.28; p < .001. In contrast to the study sample, the unexpected positive feedback group also showed hindsight bias, F(1, 184)=8.30; p < .01, but to a smaller degree than the unexpected negative feedback group (Table 3).

TABLE 3 Mean foresight estimation (t0) and hindsight estimation immediately after feedback (t1) by valence of feedback
                                | Unexpected negative feedback (n=110) | Unexpected positive feedback (n=76)
Foresight estimation (t0)       | 3.7 (.70)                            | 5.1 (.31)
Immediately after feedback (t1) | 4.2 (.74)                            | 4.8 (.67)
                                | F(1, 184)=25.28, p<.001              | F(1, 184)=8.30, p<.01
Results only for the control sample that received unexpected feedback, n=186.
Numbers in parentheses are standard deviations.

While the overall pattern of the study sample was replicated in the control sample, some differences did occur. To explore whether both samples differed significantly in their estimations, a 2×2×2 ANOVA was conducted with the two samples as an additional between-subjects factor named “Design Group” (study sample and control sample). The interaction between “Time of Measurement” and the between-subjects factor “Feedback Valence” was again significant, F(1, 505)=66.76; p < .001. Neither the main effect of the factor “Design Group”, nor any interaction term including this factor, was significant (all Fs < 1, ns).
In summary, the result pattern observed for the study sample was replicated within the control sample.

Unexpected cholesterol feedback: Feelings of threat and surprise


In order to ensure that unexpected negative feedback elicited threat, additional analyses were performed comparing self-reported worry, perceived threat to health, and perceived pressure to change as a function of feedback valence. For these analyses, 2×2 ANOVAs were computed with “Feedback Valence” (unexpected negative or unexpected positive feedback) and “Design

4 To examine whether the observed recall bias at the second hindsight estimate was restricted to participants who mis-recalled their prediction at the first hindsight measurement, a 2×2×2 ANOVA was conducted with “Feedback Valence” (positive vs negative), “Accuracy” (accurate first hindsight estimate vs inaccurate first hindsight estimate), and “Time of Measurement” (foresight and second hindsight estimate). Neither the main effect for “Accuracy” nor any two-way interaction term including this factor reached statistical significance (all Fs < 1). However, the three-way interaction was significant, F(1, 319)=5.64, p=.018. Further inspection revealed that within the unexpected negative feedback group, participants who gave an inaccurate hindsight estimation at the first measurement (t1) showed a reversed hindsight bias at the second measurement: foresight, M=3.9 vs second hindsight, M=3.4; F(1, 319)=23.71, p < .001. Participants who gave an accurate hindsight estimation at the first measurement (t1) demonstrated a less pronounced but still significant reversed hindsight bias: foresight, M=3.8 vs second hindsight, M=3.5; F(1, 319)=3.71, p < .025. Within the unexpected positive feedback group no significant differences between the two accuracy groups were obtained, all Fs < 1.

Group” (study sample or control sample) as between-subjects factors (n=509).5 Neither the main effect for the factor “Design Group” nor the interaction effect reached statistical significance (all Fs < 1.15; ps > .29). Conversely, the factor “Feedback Valence” was significant. Participants who received an unexpected negative feedback (M=3.1; SD=1.4) were more worried about their test result than participants who received an unexpected positive feedback, M=2.1; SD=1.3; F(1, 500)=45.50; p < .001. In addition, they rated their unexpected negative cholesterol test result as a higher threat to their health than did participants who were told an unexpected positive test result, M=3.1; SD=1.3 versus M=2.4; SD=1.4; F(1, 501)=26.47; p < .001; η²partial=.06. An unexpected negative test result also elicited a stronger pressure to change than an unexpected positive feedback, M=3.0; SD=1.0 versus M=1.9; SD=1.0; F(1, 491)=98.08; p < .001. Hence, unexpected negative feedback elicited significantly more self-reported threat than unexpected positive feedback.
In addition, mean surprise ratings of participants differed as a function of feedback valence in the anticipated direction. Thus, participants with an unexpected negative feedback were negatively surprised (SD=0.90), whereas participants with an unexpected positive feedback were positively surprised (M=0.85; SD=0.91). Interestingly, the absolute level of reported surprise differed significantly, F(1, 495)=16.59; p < .001. An unexpected positive test result elicited stronger surprise than an unexpected negative feedback. Again, the study sample and the control sample did not differ significantly (all Fs < 0.39; ps > .53).

Hindsight bias and perceived threat


Additional analyses within both feedback groups were conducted to elaborate more on the underlying motives for biased
recall. It might be expected that the perceived threat elicited by the unexpected negative cholesterol feedback was more
pronounced for individuals who demonstrated hindsight bias compared to individuals who showed accurate recall or reversed
hindsight bias immediately after the feedback. In contrast,

TABLE 4 Mean ratings for self-rated worry, perceived health threat, and perceived pressure to change by accuracy of recall (t1) within
unexpected negative feedback and within unexpected positive feedback
                                                  Unexpected negative feedback                         Unexpected positive feedback
                                                  Accurate | Hindsight bias | Reversed hindsight bias | Accurate | Hindsight bias | Reversed hindsight bias
Worry (n=504; Adj. R²=.09)                  Mean  3.0a     | 3.3a           | 2.7b                    | 2.1c     | 2.2c           | 1.9c
                                            SD    1.4      | 1.4            | 1.9                     | 1.3      | 1.4            | 1.5
                                            n     202      | 91             | 13                      | 148      | 43             | 7
Perceived Threat (n=505; Adj. R²=.06)       Mean  3.0c     | 3.4a           | 2.5b                    | 2.4c     | 2.4c           | 2.3c
                                            SD    1.3      | 1.3            | 1.4                     | 1.3      | 1.6            | 1.2
                                            n     202      | 91             | 13                      | 149      | 43             | 7
Perceived Pressure to Change (n=495;        Mean  2.9c     | 3.2a           | 2.6b                    | 1.9d     | 2.0d           | 2.1d
Adj. R²=.18)                                SD    1.0      | 1.0            | 1.0                     | 1.1      | 0.9            | 1.4
                                            n     198      | 89             | 13                      | 145      | 43             | 7
n varies due to missing values.
Means within rows not sharing a common superscript differ at p < .05.

perceived threat might not vary as a function of recall accuracy for participants who were given an unexpected positive
feedback. Both samples (study sample and control sample) were collapsed for these analyses (n=509), and separate ANOVAs
containing the three-level between-subjects factor “Recall Accuracy” (hindsight bias, accurate recall, and reversed hindsight bias
at the first hindsight measure) and the two-level between-subjects factor “Feedback Valence” (unexpected negative and
unexpected positive feedback) were calculated.6 Simple main effects of recall accuracy within each feedback group and post-
hoc Scheffé contrasts were conducted to compare the extent to which participants who showed either hindsight bias, accurate
recall, or reversed hindsight bias, differed in perceived threat.
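A minimal sketch of such a between-subjects analysis is shown below, using statsmodels as one reasonable tool; the file and column names are hypothetical and the Scheffé contrasts are not included.

    # Illustrative sketch (hypothetical file and column names); not the original analysis.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("threat_ratings.csv")   # columns: worry, recall_accuracy, valence

    # 3 (Recall Accuracy) x 2 (Feedback Valence) between-subjects ANOVA on worry.
    model = smf.ols("worry ~ C(recall_accuracy) * C(valence)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))

    # Simple effect of recall accuracy within each feedback group.
    for group, sub in df.groupby("valence"):
        m = smf.ols("worry ~ C(recall_accuracy)", data=sub).fit()
        print(group)
        print(sm.stats.anova_lm(m, typ=2))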
Within the unexpected negative feedback group, a significant effect of recall accuracy emerged for all three threat variables, worry, perceived threat, and pressure to change (all Fs > 3.32; ps < .04). As Table 4 depicts, participants who were confronted with an unexpected negative test result and showed hindsight bias reported, on average, more worry, felt more threatened by their test result, and perceived a higher pressure to change than those who were also confronted with

5 Due to missing values, degrees of freedom vary between analyses.



an unexpected negative feedback, but showed reversed hindsight bias. In addition, those who recalled their foresight estimation accurately fell in between these two groups. As hypothesised, perceived threat did not differ within the unexpected positive feedback group as a function of recall accuracy (all Fs < 1.77; ps > .17). However, these results must be interpreted with caution because cell frequencies vary greatly and the differences are rather small. Since the findings are very similar for all three threat variables, though, they could be interpreted as support for the notion that the more unexpected the negative feedback was, the more threat it elicited, and the more hindsight estimations were biased towards the given feedback.

DISCUSSION
The present study explored the phenomenon of hindsight bias in the context of self-relevant health risk information. More
specifically, individuals participating in a community screening received feedback based on their actual cholesterol level,
while foresight measures were obtained before cholesterol measurement, and immediate and delayed hindsight measures
probed afterwards for memory bias. The findings suggest three conclusions: First, the phenomenon of hindsight bias and
reversed hindsight bias has been demonstrated for the first time in the domain of health psychology, and did hold up outside
the laboratory in “real life”. Second, the present data might also contribute towards framing hypotheses on “motivated”
hindsight bias. The unexpected positive feedback group showed no systematic recall bias, whereas the unexpected negative
feedback group showed evidence for memory bias. The difference between the two feedback groups suggests a motivational
account for the observed recall distortions. Third, a dynamic change of direction in the judgement bias was observed for
participants confronted with unexpected negative feedback. Immediately after feedback, recalls of foresight estimations were
shifted towards the actual cholesterol level, indicating hindsight bias. In contrast, reversed hindsight bias emerged when
recalled foresight estimations were probed several weeks later. These data might reflect a change of motivational focus from
“hot affect” and fear control to more cognitive event representations and danger control, as proposed by the dual process
model (Leventhal et al., 1983, 1997). Interestingly, the dynamic shift in judgement bias over time suggests that motivated
hindsight reflects an adaptive mechanism.

Hindsight bias and self-regulation


It was proposed that systematic changes in hindsight judgements would be observed for participants receiving self-threatening information. This hypothesis was derived from the literature on motivated judgement (Armor & Taylor, 1998; Ditto & Boardman, 1995; Ditto et al., 1998; Kunda, 1987, 1990) and health psychology (Leventhal et al., 1983, 1997), and incorporated the assumption that motivated hindsight bias works in the service of self-protective motivation (Haslam & Jayasinghe, 1995; Louie, 1999; Mark & Mellor, 1991; Stahlberg & Schwarz, 1999). In line with this hypothesis, feedback
valence moderated hindsight bias. Considering the immediate hindsight measure, both the positive and negative feedback
groups shifted their foresight estimate towards the actual cholesterol level (cf. Figure 1 and Table 3). However, in both
samples, the shift was more pronounced for participants who were given an unexpected negative cholesterol feedback.
Considering the delayed hindsight measure, a unique dynamic shift was observed within the unexpected negative feedback
group. They demonstrated hindsight bias at the immediate measure and reversed bias at the delayed measure. In contrast,
unexpected positive feedback was not accompanied by a dynamic shift from immediate to delayed hindsight estimations.
Motivated hindsight bias elicited by negative self-threatening information might be considered a particular instance of the more general asymmetric effects elicited by positive and negative events (Taylor, 1991; Cacioppo, Gardner, & Berntson, 1999). Accordingly, it is assumed that negative events, compared to positive events, elicit particularly strong immediate responses, followed by responses that minimise or help cope with the adverse event. Interestingly, the dynamic shift in immediate and delayed
hindsight measures within the group receiving unexpected negative feedback suggests that motivational effects on hindsight
have an adaptive role and are consequent upon the change in motivational focus over time, instigating different coping
strategies (Lazarus & Folkman, 1984; Leventhal et al., 1983, 1997). In other words, it is hypothesised that memory distortions
might play a functional role within the self-regulatory processes elicited by negative feedback, and, as a by-product, recall
errors vary as a phase-specific phenomenon.
It was proposed that receiving negative cholesterol feedback elicits fear-control processes. Unlike many other health
problems, an elevated cholesterol level is not associated with symptoms. One can reasonably assume that participants who
expected their cholesterol level to be favourable felt well. Thus, for people expecting a favourable or normal cholesterol level, negative cholesterol feedback might not only induce fear and worry, but could also be accompanied by shattered feelings of
control and self-efficacy. At this stage, controlling the threatening agent itself is not possible, since this requires long-term
changes in health behaviour. Consequently, the only possible way of coping at this particular point in time is to change one’s

6 Recall accuracy was determined by the method used for calculating overall frequency of hindsight bias.

beliefs and appraisals. The results of the immediate hindsight measure indicate a hindsight bias, i.e., the recalled foresight
estimate was shifted towards the actual cholesterol level. Furthermore, analyses showed that those who displayed hindsight
bias after receiving unexpected negative feedback felt more threatened than those who did not display the bias. This suggests
that people strive to shield their feelings of control and self-efficacy by making the outcome seem more foreseeable, as
assumed by Haslam and Jayasinghe (1995). Focusing on the causal importance of prior behaviour appears functional, since it allows people to see a connection between what they could be doing and the course of the disease (Thompson et al., 1998).
Other empirical evidence from coping research also suggests that people try to shield their feelings of perceived control and
self-efficacy, especially in the face of threatening information (Armor & Taylor, 1998). For example, Taylor (1983) reported
that 95% of the patients she studied had developed a theory of why their cancer occurred. Many of the causes mentioned by
these patients were behavioural patterns which could be modified through the patient’s own efforts. Similarly, Croyle and
Sande (1988) showed that participants who believed that they suffered from (non-existent) thioamine acetylase (TAA)
deficiency reported more deficiency-associated symptoms and more TAA risk-related behaviours than participants who
believed their TAA level was normal. Overall, striving to shield feelings of control and self-efficacy in the face of threat might
be an adaptive reaction, since these optimistic self-beliefs are crucial for the behavioural change that is needed for successful
risk management (Taylor, Kemeny, Reed, Bower, & Gruenewald, 2000; Renner, Knoll, & Schwarzer, 2000; Renner &
Schwarzer, in press; Schwarzer & Renner, 2000). Hindsight bias could accordingly be understood as a by-product of the
attempt to regain control in the face of threat, which in turn facilitates danger-control oriented behaviour in the long run.
Considering the delayed hindsight measure, participants receiving unexpected negative feedback recalled their foresight estimations as more discrepant from the feedback than they actually were, i.e., reversed hindsight bias. This pattern is in line with the
hypothesis of Mark and Mellor (1991), who proposed that people tend to derogate the foreseeability in order to avoid blame
for the situation. This avoidance tendency could emerge because after a certain time people might feel more in control and
adapted, and consequently their goal orientation shifts more towards avoiding blame.
One might suspect that the particular pattern of change from the first to the second hindsight measure reflects the composition of the subsample that returned the third questionnaire rather than a genuine shift: since the study involved a mailed questionnaire, systematic drop-out from t1 to t2 could impair the conclusions. The attrition was 34%, which is considerable but in line with other screening studies (cf. Glanz & Gilboy, 1995). Control analyses comparing the study sample with the control sample, which provided only the first hindsight measure, showed that the two samples did not differ systematically, either in foresight accuracy, perceived threat, or reported surprise, or in hindsight bias immediately after feedback. Hence, the findings do not support the
notion of a response bias or systematic drop-out which might have influenced hindsight measures. Admittedly, this discussion
is somewhat speculative, and future studies are needed that include the assessment of motivational factors at a more detailed level, which was not possible in this study.
Furthermore, the sequence proposed above might only apply to negative self-relevant stimuli that require a behavioural
response to avoid loss or harm. Negative events that are not under behavioural control, or require no adaptive behaviour
change, may immediately generate, for example, more blame-avoiding reactions. This might partly explain why in experimental contexts, for instance where participants were asked to make stock market decisions (Louie, 1999; Mark et al., 2003-
this issue), negative self-relevant outcomes were judged as less foreseeable in comparison to positive outcomes. Perceived
control could be manipulated, for instance, by providing information about the treatability of the given condition. However,
the data already suggest that paying attention to the dynamic change of motivation might resolve otherwise conflicting results
(Haslam & Jayasinghe, 1995; Mark & Mellor, 1991). Both patterns of recall bias reported in previous studies were replicated
in this study, with time of measurement as a critical variable.

Hindsight bias and feelings of surprise


In order to further support the hypothesis that self-threatening feedback is crucial for motivated hindsight bias, one has to
consider the surprise elicited by the cholesterol feedback. Mazursky and Ofir (1990; Ofir & Mazursky, 1997; but see also Mark & Mellor, 1994; Pezzo, 2003-this issue; Schkade & Kilbourne, 1991) postulated a “reversal” hypothesis, arguing that high levels of surprise will result in a reversed hindsight bias. They assume that the feeling of surprise serves as a memory cue to the outcome’s unexpectedness, which in turn leads people to exaggerate in hindsight the discrepancy between foresight
estimation and outcome. In line with this reasoning, one could propose that negative and positive feedback evoked different
surprise levels, and consequently different recall patterns. Analysis of the surprise level in this study revealed that
immediately after the feedback, participants who received an unexpected positive result demonstrated more surprise and less
hindsight bias than participants who received an unexpected negative result. Hence, these results provide support for the
cognitive-oriented notion suggested by Ofir and Mazursky (1997), which claims that the more people are surprised by the
outcome, the less they demonstrate hindsight bias. However, this interpretation is limited because perceived surprise was
assessed before hindsight estimations and could therefore have altered the process of recall, favouring the cue hypothesis.

Another limitation is that the surprise measure confounded the valence (positive vs negative) with the amount of surprise. Consequently, positive feedback might not have been more surprising, but might instead have elicited more pronounced perceived valence. However, in both samples, participants who received an expected positive feedback gave similar surprise ratings to those who received an expected negative feedback (all Fs < 1.7, ps > .05), suggesting that the degree of valence of positive and negative feedback was comparable.
While the cue proposition offers an explanation for reactions immediately after the feedback, it does not explain the
observed shift within hindsight estimates across time. In addition, one could reason that the causal sequence proposed by the
cue hypothesis is rather ambiguous. Ofir and Mazursky (1997) assume that memory distortions are caused by perceived
unexpectedness. However, people who are prone to hindsight bias, claiming they knew it all along, could not reasonably state
at the same time that the outcome took them by surprise. Thus, high perceived unexpectedness could be seen as a logical
consequence of reversed hindsight bias, rather than being the cause of it. In addition, assuming that reversed hindsight bias is
a consequence of experienced surprise, or vice versa, people ought to display hindsight bias on “objective” measures (that
means hindsight estimations that are more similar to the outcome than foresight estimations) as well as on “subjective”
measures, which concern the perceived foreseeability in retrospect. Conversely, one could assume that subjective and
objective measures can differ considerably. For example, some participants may recall a more accurate foresight estimation in
hindsight (displaying “objective” hindsight), but nevertheless they could believe that they would not have foreseen the
diagnosis, and still report feelings of surprise. The opposite is also plausible: i.e., people who do not show hindsight bias on
objective measures could claim that they did not foresee the outcome. “Subjective” hindsight measures (e.g., Mark & Mellor, 1991) as well as “objective” hindsight measures (e.g., Haslam & Jayasinghe, 1995) have been applied, but most researchers
do not differentiate between these two phenomena (but see Blank & Fischer, 2000; Blank, Fischer, & Erdfelder, 2003-this
issue). However, a numerical difference between foresight and hindsight measure is not sufficient to state that people claimed
they knew it all along. This distinction might be particularly fruitful for the understanding of how motivational and cognitive
factors influence hindsight bias. “Subjective” hindsight bias might be more influenced by motivational factors, whereas
“objective” hindsight bias might depend more on cognitive factors.

Limitations of the study


Limitations of internal and external validity of the presented study must be acknowledged. Since all participants were
volunteers and there was no pressure to attend, sample selection biases might reduce external validity. People who choose to
be tested are by definition self-selected and may be in part psychologically and behaviourally prepared for dealing with bad
news. Consequently, the degree to which the findings generalise to people who refrained from testing is limited.
The phenomenon of hindsight bias was explored here in an ex-post facto study (Broota, 1989). Thus, the cholesterol
feedback given to the participants was not randomly assigned to the two feedback groups (positive and negative feedback), but
was based on their actual cholesterol test results. The advantage of giving actual feedback is that it is naturalistic, was not
forced onto the participants, and importantly, has a clear emotional importance for the recipients. As outlined in the
introduction, it was hypothesised that self-relevant negative feedback is a necessary prerequisite for investigating effects of
self-defensive motivational mechanisms on hindsight estimations. At the same time, without any question, an ex-post facto design puts limitations on the generality of the findings presented here. However, while randomised group assignment is easily accomplished when studying cognitive phenomena using, for example, test performance on almanac questions as feedback, it raises a number of concerns in the domain of health psychology. It appears that random assignment
to experimental conditions is only ethically feasible for studying short-term effects. For instance, Jemmott, Ditto, and Croyle (1986) developed an experimental paradigm for studying reactions to risk factor information using a fictitious enzyme
deficiency (TAA) as a context. Importantly, the negative feedback was clearly emotionally upsetting for the participants (cf.
Croyle, Sun & Hart, 1997). Even worse, Baumann, Cameron, Zimmerman, and Leventhal (1989) stated that participants who
received false high blood pressure feedback reported more physical symptoms afterwards and rated their health status
significantly lower. The longitudinal perspective on motivational change was the main goal of the present study, which accordingly would have needed to withhold debriefing information for an average of 5 weeks. This long delay did not
appear ethically justifiable, considering the emotional impact of false negative feedback.
A further complication in realising the present study as an experimental design emerges when both independent variables
(cholesterol feedback and feedback expectations) are considered. Random assignment to the feedback condition (positive or
negative) would not control for a priori differences between the two conditions in terms of feedback expectations.
Consequently, a full experimental design would not only involve the random assignment to feedback, but would in addition
require the experimental manipulation of expectations regarding the cholesterol test. This is probably difficult to achieve since
risk perceptions and self-related health ratings are quite resistant to interventions (e.g., Weinstein & Klein, 1995). Given these ethical and practical limitations, an “ex-post facto” design appeared more appropriate for a first study in this area of research.

Since participants were not randomly assigned to the feedback conditions, a priori differences between the two feedback
groups could have seriously impaired internal validity. One could argue, for example, that the two groups differed on
personality variables, such as coping styles or depressive realism. However, several studies have shown little relationship between risk factor appraisals and individual difference variables such as self-esteem, monitoring vs blunting coping style, or repression-sensitisation (Croyle, Sun, & Louie, 1993; Ditto, Jemmott, & Darley, 1988). Moreover, while personality traits could provide alternative explanations for the findings concerning the first hindsight measure, they cannot account for the observed changes in the direction of the hindsight bias over time. Nevertheless, they may function as a moderator within feedback groups,
explaining why some individuals demonstrated hindsight bias and some did not when faced with unexpected negative
feedback.

Summary
The present study demonstrated the phenomenon of motivated hindsight bias in the context of health-related feedback.
Interestingly, applying a longitudinal perspective, hindsight bias was observed for the immediate hindsight measure, while
reversed hindsight bias was observed after a delay of about 5 weeks. These data are consistent with the notion of biased
memory recall reflecting self-serving mechanisms. However, self-protection might be dynamic, changing focus over time. These findings need to be extended in future studies employing more rigorous experimental designs, and exploring
the relation of threat, motivation, and memory bias at a more specific level.

REFERENCES

Arkes, H.R., Wortmann, R.L., Saville, P.D., & Harkness, A.R. (1981). Hindsight bias among physicians weighing the likelihood of
diagnoses. Journal of Applied Psychology, 66, 252–254.
Armor, D.A., & Taylor, S.E. (1998). Situated optimism: Specific outcome expectancies and self-regulation. In M.P.Zanna (Ed.), Advances
in experimental social psychology (Vol. 30, pp. 309–379). New York: Academic Press.
Baumann, L.J., Cameron, L.D., Zimmerman, R.S., & Leventhal, H. (1989). Illness representations and matching symptoms. Health
Psychology, 8, 449–469.
Blank, H., & Fischer, V. (2000). “Es mußte eigentlich so kommen”: Rückschaufehler bei der Bundestagswahl 1998. [“It had to turn out that
way”: Hindsight bias in the German Parliamentary Elections in 1998]. Zeitschrift für Sozialpsychologie, 31, 128–142.
Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491–504.
Broota, K.D. (1989). Experimental design in behavioral research. New York: Wiley.
Cacioppo, J.T., Gardner, W.L., & Berntson, G.G. (1999). The affect system has parallel and integrative processing components: Form
follows function. Journal of Personality and Social Psychology, 76, 839–855.
Camerer, C., Loewenstein, G., & Weber, M. (1989). The curse of knowledge in economic settings: An experimental analysis. Journal of
Political Economy, 97, 1232–1254.
Campbell, J.D., & Tesser, A. (1983). Motivational interpretations of hindsight bias: An individual difference analysis. Journal of Personality, 51, 605–620.
Carli, L.L. (1999). Cognitive reconstruction, hindsight, and reactions to victims and perpetrators. Personality and Social Psychology
Bulletin, 25, 966–979.
Croyle, R.T., & Sande, G.N. (1988). Denial and confirmatory search: Paradoxical consequences of medical diagnosis. Journal of Applied
Social Psychology, 18, 473–490.
Croyle, R.T., Sun, Y.C., & Hart, M. (1997). Processing risk factor information: Defensive biases in health-related judgments and memory.
In K.J.Petrie & J. A.Weinman (Eds.), Perceptions of health and illness: Current research and applications (pp. 267–290). Singapore:
Harwood Academic Publishers.
Croyle, R.T., Sun, Y.C., & Louie, D.H. (1993). Psychological minimization of cholesterol test results: Moderators of appraisal in college
students and community residents. Health Psychology, 12, 1–5.
Dehn, D.M., & Erdfelder, E. (1998). What kind of bias is hindsight bias? Psychological Research, 61, 135–146.
Ditto, P.H., & Boardman, A.F. (1995). Perceived accuracy of favorable and unfavorable psychological feedback. Basic and Applied Social
Psychology, 16, 137–157.
Ditto, P.H., Jemmott, J.B., & Darley, J.M. (1988). Appraising the threat of illness: A mental representational approach. Health Psychology,
7, 183–201.
Ditto, P.H., Scepansky, J.A., Munro, G.D., Apanovitch, A.M., & Lockhart, L.K. (1998). Motivated sensitivity to preference-inconsistent
information. Journal of Personality and Social Psychology, 75, 53–69.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288–299.
Fuchs, R. (1996). Causal models of physical exercise participation: Testing the predictive power of the construct “pressure to change.”
Journal of Applied Social Psychology, 26, 1931–1960.

Glanz, K., & Gilboy, M.B. (1995). Psychological impact of cholesterol screening and management. In R.T. Croyle (Ed.), Psychological
effects of screening for disease prevention and detection (pp. 39–64). New York: Oxford Press.
Haslam, N., & Jayasinghe, N. (1995). Negative affect and hindsight bias. Journal of Behavioral Decision Making, 8, 127–135.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight-bias: An interaction between automatic and motivational
factors? Memory and Cognition, 16, 533–538.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Jemmott, J.B., Ditto, P.H., & Croyle, R.T. (1986). Judging health status: Effects of perceived prevalence and personal relevance. Journal of
Personality and Social Psychology, 50, 899–905.
Kirk, R.E. (1968). Experimental design: Procedures for the behavioral sciences. Belmont, CA: Brooks/Cole Publishing.
Kunda, Z. (1987). Motivated inference: Self-serving generation and evaluation of causal theories. Journal of Personality and Social
Psychology, 53, 636–647.
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480–498.
Lazarus, R.S., & Folkman, S. (1984). Stress, appraisal, and coping. New York: Springer.
Leary, M.R. (1981). The distorted nature of hindsight. Journal of Social Psychology, 115, 25–29.
Leary, M.R. (1982). Hindsight distortion and the 1980 presidential election. Personality and Social Psychology Bulletin, 8, 257–263.
Leventhal, H., Benyamini, Y., Brownlee, S., Diefenbach, M., Leventhal, E.A., Miller, L., & Robitaille, C. (1997). Illness representations:
Theoretical foundations. In K.J.Petrie & J.A.Weinman (Eds.), Perceptions of health and illness: Current research and applications
(pp. 19–45). Singapore: Harwood Academic Publishers.
Leventhal, H., Safer, M.A., & Panagis, D.M. (1983). The impact of communications on the self-regulation of health beliefs, decisions, and
behavior. Health Education Quarterly, 10, 3–29.
Louie, T.A. (1999). Decision makers’ hindsight bias after receiving favorable and unfavorable feedback. Journal of Applied Psychology,
84, 29–41.
Louie, T.A., Curren, M.T., & Harich, K.R. (2000). “I knew we would win”: Hindsight bias for favorable and unfavorable team decision
outcomes. Journal of Applied Psychology, 85, 264–272.
Mark, M.M., & Mellor, S. (1991). Effect of self-relevance of an event on hindsight bias: The foreseeability of a layoff. Journal of Applied
Psychology, 76, 569–577.
Mark, M.M., & Mellor, S. (1994). “We don’t expect it happened”: On Mazursky and Ofir’s (1990) purported reversal of the hindsight bias.
Organizational Behavior and Human Decision Processes, 57, 247–252.
Mark, M.M., Boburka, R.R., Eyssell, K.M., Cohen, L. L., & Mellor, S. (2003). “I couldn’t have seen it coming”: The impact of negative
self-relevant outcomes on retrospections about foreseeability. Memory, 11, 443–454.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal of the hindsight bias. Organizational Behavior
and Human Decision Processes, 46, 20–33.
Mitchell, T.R., & Kalb, L.S. (1981). Effects of outcome knowledge and outcome valence on supervisors’ evaluations. Journal of Applied
Psychology, 66, 604–612.
Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.
National Heart, Lung, and Blood Institute (1995). Recommendations regarding public screening for measuring blood cholesterol (NIH
Publication No. 95–3045). Bethesda, MD: National Institutes of Health.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Pezzo, M.V. (2003). Surprise, defence, or making sense: What removes the hindsight bias? Memory, 11, 421–441.
Pohl, R.F. (1998). The effects of feedback source and plausibility of hindsight bias. European Journal of Cognitive Psychology, 10, 191–212.
Pohl, R.F. (1999). Hindsight bias: Robust, but not reliable. Unpublished manuscript. Justus-Liebig-University Giessen, Germany.
Pohl, R.F., & Hell, W. (1996). No reduction in hindsight bias after complete information and repeated testing. Organizational Behavior and
Human Decision Processes, 67, 49–58.
Pohl, R.F., Ludwig, M., & Ganner, J. (1999a). Kein Zusammenhang zwischen Grad der Elaboration und Ausmaß des Rückschaufehlers.
[No relation between depth of elaboration and amount of hindsight bias.] Zeitschrift für Experimentelle Psychologie, 46, 275–287.
Pohl, R.F., Stahlberg, D., & Frey, D. (1999b). I’m not trying to impress you, but I surely knew it all along! Self-presentation and hindsight
bias. Working paper No. 99–19, SFB 504. University of Mannheim, Germany.
Powell, J.L. (1988). A test of the knew-it-all-along effect in the 1984 presidential and statewide elections. Journal of Applied Social
Psychology, 18, 760–773.
Renner, B., Knoll, N., & Schwarzer, R. (2000). Age and body weight make a difference in optimistic health beliefs and nutrition behaviors.
International Journal of Behavioral Medicine, 7, 143–159.
Renner, B., & Schwarzer, R. (in press). Social-cognitive factors predicting health behavior change. In J.Suls & K.Wallston (Eds.), Social
psychological foundations of health and illness. New York: Blackwell.
Schkade, D.A., & Kilbourne, L.M. (1991). Expectation-outcome consistency and hindsight bias. Organizational Behavior and Human
Decision Processes, 49, 105–123.

Schwarz, S. (2001). Motivationale Einflüsse auf den Hindsight Bias: Selbstwertdienliche Verarbeitung von persönlich relevanten
Informationen. [Motivational influences on hindsight bias: Self-serving processing of personally relevant information]. Hamburg:
Verlag Dr Kovac.
Schwarzer, R., & Renner, B. (2000). Social-cognitive predictors of health behavior: Action self-efficacy and coping self-efficacy. Health Psychology, 19, 487–495.
Stahlberg, D., Eller, F., Romahn, A., & Frey, D. (1993). Der knew-it-all-along Effekt in Urteilssituationen von hoher und geringer
Selbstrelevanz. [The knew-it-all-along effect in judgmental settings of high and low self-esteem relevance]. Zeitschrift für
Sozialpsychologie, 24, 94–102.
Stahlberg, D., & Schwarz, S. (1999). Would I have known it all along if I would hate to know it? The hindsight bias in situations of high and
low self-esteem relevance. Working paper No. 99–34, SFB 504. University of Mannheim, Germany.
Stahlberg, D., Sczesny, S., & Schwarz, S. (1999). Exculpating victims and the reversal of hindsight bias. Working paper No. 99–70, SFB
504. University of Mannheim, Germany.
Stanovich, K.E., & West, R.F. (1998). Individual differences in rational thought. Journal of Experimental Psychology: General, 127,
161–188.
Synodinos, N.E. (1986). Hindsight distortion: “I knew-it-all along and I was sure about it.” Journal of Applied Social Psychology, 16,
107–117.
Taylor, S.E. (1983). Adjustment to threatening events: A theory of cognitive adaptation. American Psychologist, 38, 1161–1173.
Taylor, S.E. (1991). Asymmetrical effects of positive and negative events: The mobilization-minimization hypothesis. Psychological
Bulletin, 110, 67–85.
Taylor, S.E., Kemeny, M.E., Reed, G.M., Bower, J.E., & Gruenewald, T.L. (2000). Psychological resources, positive illusions, and health.
American Psychologist, 55, 99–109.
Thompson, S.C., Armstrong, W., & Thomas, C. (1998). Illusions of control, underestimations, and accuracy: A control heuristic
explanation. Psychological Bulletin, 123, 143–161.
Troschke, J. von, Klaes, L., Maschewsky-Schneider, U., & Scheuermann, W. (1998). Die Deutsche Herz-Kreislauf-Präventionsstudie. [The German cardiovascular prevention study]. Bern: Huber.
Verplanken, B., & Pieters, R.G. (1988). Individual differences in reverse hindsight bias: I never thought something like Chernobyl would
happen. Did I? Journal of Behavioral Decision Making, 1, 131–147.
Weinstein, N.D., & Klein, W.M. (1995). Resistance of personal risk perceptions to debiasing interventions. Health Psychology, 14,
132–140.
Personality differences in hindsight bias
Jochen Musch
University of Mannheim, Germany

Ten personality correlates of hindsight bias were tested in a study with 75 participants answering almanac-type
knowledge questions. Participants showed hindsight bias when hindsight estimates were compared to foresight
estimates (memory condition), when hindsight estimates were compared to foresight estimates of other
participants (BS=between-subjects hypothetical condition), and when hindsight estimates were compared to
foresight estimates in response to equally difficult control items (WS=within-subject hypothetical condition). The
magnitude of hindsight bias in both hypothetical conditions was positively associated with the participant’s field
dependence and his or her tendency for favourable self-presentation (as measured by social desirability and
impression management). Between-subjects hypothetical hindsight was associated with the participant’s
conscientiousness and need for predictability and control (as measured by a rigidity scale). In a multiple
regression analysis, 39% of the variance in BS hypothetical hindsight, 24% of the variance in WS hypothetical
hindsight, but no significant proportion of the variance in memory hindsight could be accounted for by personality
measures. It is concluded that individual differences in hindsight bias exist and must be taken into account in a
complete model of the effect.

Research in social and cognitive psychology has revealed a number of biases and shortcomings in everyday judgement and
information processing (Kahneman, Slovic, & Tversky, 1982; Nisbett & Ross, 1980). One such bias is the “knew-it-all-along”
effect, which refers to the tendency to overestimate how predictable the outcome of an event had been before one had
experienced the now familiar outcome. This hindsight bias has the effect of making the past seem less uncertain than it was. It
has been demonstrated across a range of predictions, subjects, and methodologies, in both laboratory and field settings (for
reviews, see Christensen-Szalanski & Willham, 1991; Hawkins & Hastie, 1990). The phenomenon has been observed in
domains as diverse as general knowledge (Fischhoff, 1977), scientific findings (Davies, 1987), football results (Leary, 1981),
election outcomes (Blank & Fischer, 2000), and the location of cities on a map (Pohl & Eisenhauer, 1995). It has been
obtained using both memory instructions and hypothetical instructions. With memory instructions, individuals first respond
without feedback. They are subsequently told the correct answers and asked to recall their previous judgements as accurately
as possible. In the hypothetical design, participants are given outcome information and are then asked to respond as they (or
others) would have answered, had they not been told the correct answers (e.g., Fischhoff, 1975; Wood, 1978). In both cases,
participants usually show an overestimation in the direction of the correct answer, either as compared to control participants
who respond without




feedback (hypothetical condition) or as compared to their own pre-outcome estimates (memory condition).
An important area of research on the hindsight bias phenomenon considers how the effect is linked to the laws of information
processing and storage (e.g., Erdfelder & Buchner, 1998; Fischhoff, 1975; Hell, 1993; Hertwig, Fanselow, & Hoffrage, 2003-

Requests for reprints should be sent to Jochen Musch, Department of Psychology, Lehrstuhl Psychologie III, Schloss Ehrenhof Ost, University of Mannheim, D-68131 Mannheim, Germany. Email: musch@psychologie.uni-mannheim.de
I am very grateful to Robert Mischke for his help in collecting and analysing the data. I am also indebted to Johanna Louda, Yvonne Mertens, Ursula Pryczak, and Romy Reyentanz for their help in collecting the data as part of their course work in a seminar on experimental psychology. Thanks are due to Hartmut Blank, Arndt Bröder, Katja Ehrenberg, Markus Eisenhauer, Edgar Erdfelder, Ulrich Hoffrage, Delroy Paulhus, Rüdiger Pohl, Anders Winman, and an anonymous reviewer for their many helpful comments and constructive suggestions on previous drafts of this article.

this issue; Hertwig, Gigerenzer, & Hoffrage, 1997; Hoffrage & Hertwig, 1999; Hoffrage, Hertwig, & Gigerenzer, 2000; Pohl,
1993; Pohl, Eisenhauer, & Hardt, 2003-this issue; Pohl & Gawlik, 1995; Pohl, Hardt, & Eisenhauer, 2000; Stahlberg &
Maass, 1998; Strack & Mußweiler, 1997; Winman, Juslin, & Björkman, 1998). Hindsight bias, according to this line of
research, is based on mechanisms such as immediate assimilation, memory impairment, biased reconstruction, selective
accessibility and activation, and anchoring. There exists extensive literature on this approach, summaries of which can be
found in Christensen-Szalanski and Willham (1991), Hawkins and Hastie (1990), and Stahlberg and Maass (1998).
Undoubtedly, this line of research has identified important cognitive determinants of the effect.
A second line of research on hindsight bias is concerned with how individual differences in motivation and personality are
linked to the magnitude of the effect (e.g., Campbell & Tesser, 1983; Davies, 1992; Hell, Gigerenzer, Gauggel, Mall, &
Müller, 1988; Renner, 2003-this issue; Verplanken & Pieters, 1988). Personality factors may supplement cognitive accounts of
hindsight bias, rather than compete with them. According to the individual difference approach, hindsight bias is influenced
by individual traits, needs, and motives, and is not exclusively the result of rational but sometimes faulty information-
processing mechanisms applying to all individuals.
Previous research has revealed substantial relationships between individual differences in variables such as intelligence and
knowledge on the one hand and cognitive biases on the other hand (Stanovich & West, 2000). With regard to hindsight bias in
particular, Stanovich (1999) has shown that the magnitude of the bias can vary as a function of knowledge and that
participants high in overconfidence bias also displayed greater hindsight bias. Nevertheless, individual difference variables
are often ignored in the study of cognitive biases. The present research is an attempt to close this gap and to shed some light
on what might be called a “blind spot” in many (but not all) previous studies of hindsight bias.
Using a paradigm introduced by Campbell and Tesser (1983), participants were asked to estimate the probability that given
assertions pertaining to almanac-type knowledge were correct. Later they were given the correct answers and were asked to
recall their initial estimates. A within-subject index of memory hindsight bias was then calculated on the basis of the changes
from the initial (foresight) responses to the final (hindsight) responses. In the second phase of the experiment, participants
responded to a new set of assertions which were presented along with their correct answers. The responses in this phase were
used to construct two additional indices of hypothetical hindsight. The first index was based on the difference between each
subject’s responses and the average responses of other subjects to the same items presented without answers. The second
index was based on the comparison of each subject’s responses to his or her responses to another, equally difficult set of
control items presented without answers. It was hypothesised that the magnitude of the hindsight bias indices would be
systematically related to individual differences in participants’ personalities.
The following individual difference measures which have previously been discussed with respect to their possible role in
mediating the bias were investigated: Self-presentational concerns (Campbell & Tesser, 1983; Pohl, Stahlberg, & Frey, 1999;
Stahlberg, Eller, Maass, & Frey, 1995); the motive to predict and control events (Campbell & Tesser, 1983); the Need for
Cognition (Verplanken & Pieters, 1988); Field Dependence (Davies, 1992, 1993); Suggestibility (Pohl, 1999b; Pohl &
Eisenhauer, 1995); and Conscientiousness. Before the method is described in detail, previous empirical evidence concerning
the impact of these personality variables on hindsight bias will be reviewed.

SELF-PRESENTATION
Among all individual difference measures, the motive of self-presentation has received most attention. Individuals giving
correct responses or making accurate predictions can pride themselves on good general knowledge and superior judging
abilities. Participants may therefore simply try to appear smarter than they really are by giving hindsight estimates that are more
in line with the actual solution or outcome.
Campbell and Tesser (1983) reasoned that certain individuals have a stronger tendency to maintain a positive image in
public than others, and consequently show more hindsight bias. In accordance with their prediction, they observed a
significant positive correlation of r=.24 between the magnitude of the hindsight bias and the personality trait of self-
presentation as measured by the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1964).
If self-presentational concerns are involved, participants should react in a different manner, depending on whether or not
the recollection of their original estimates proceeds in public or in private. Hindsight bias should be larger in a public recall
condition than in a private recall condition. However, Leary (1981, 1982) found no such effects concerning predictions about
the outcome of a football game and the 1980 presidential election. Similarly, Stahlberg et al. (1995) found no difference in the
amount of hindsight bias when they asked participants to generate hindsight estimates either individually or in front of a group
of other participants. Pohl et al. (1999, Exp. 1) also found no effect of a public-private manipulation regarding the recall and/
or the presentation of the original judgements. An additional manipulation informing participants in one group that their
recollections would be closely compared with their original judgements (presumably precluding a self-presentation tendency)
was not effective either (Pohl et al., 1999, Exp. 2). Fischer and Budescu (1995) led their participants to believe that they could
easily get away with a retroactive “adjustment” because their original estimates presumably had been lost. The hypothesised
effect of this manipulation did not emerge, however. Finally, Connolly and Bukszar (1990) found that participants showed
equal hindsight shifts regardless of whether they believed that the outcome given was real, or whether it was generated by the
toss of a coin.
Taken together, self-enhancing strategies have been difficult to demonstrate in experimental research on hindsight bias. An
exception is a recent study by Stahlberg and Schwarz (1999). After positive feedback in a simulated job interview, successful
applicants showed (self-enhancing) hindsight bias to a greater extent than uninvolved observers of the same interview.
A second aspect of the self-presentation motive has received much stronger experimental support than self-enhancement.
According to the self-presentation account, hindsight bias should be strongly reduced or even reversed if having had pre-outcome
knowledge is self-threatening rather than self-enhancing. This is exactly what was found by Stahlberg, Sczesny, and Schwarz
(1998). When a highly self-threatening outcome (a rape scenario) was involved, participants showed significantly reduced
hindsight bias. Similarly, Mark and Mellor (1991) observed a reduced hindsight bias among laid-off workers as compared to
survivors of the lay-off. In Louie’s (1999) study, business students deciding if they would purchase a company’s stock only
showed hindsight bias if the outcome (stock value increased versus decreased) matched favourably with their decisions. Thus,
an important factor seems to be whether the outcome reflects positively or negatively upon the respondent. A large body of
research on the self-serving bias is consistent with the notion that people take credit for favourable outcomes but not blame
for unfavourable outcomes (Bradley, 1978; but see Musch & Bröder, 1999).
Taken together, experimental evidence for the mediating role of self-presentational tendencies is mixed and rather
contradictory. While little evidence for self-enhancing strategies could be found, a self-protection motive seems to be present.
However, it is important to note that almost all of the experimental results inconsistent with the self-presentation account are
based on null findings, which are often difficult to interpret. Power problems may have played a role, and some of the
manipulations employed may not have been strong enough to have a significant impact on the self-presentation motive. It is
conceivable, for example, that the participants in the study of Fischer and Budescu (1995) simply did not believe the cover
story telling them that their original estimates had been lost, a possibility that is explicitly acknowledged by the authors of this
study.
Pohl et al. (1999, p. 19) criticised the fact that the original finding of Campbell and Tesser (1983) of a significant
correlation between individuals’ self-presentation and hindsight bias has never been replicated. Social desirability as
measured by a German version of the Marlowe-Crowne Social Desirability Scale did not correlate with the size of the
hindsight effect in Pohl et al. (1999) and Pohl and Eisenhauer (1995). Sample sizes were small in these studies, however. It
therefore seems necessary to attempt a replication of Campbell and Tesser’s (1983) finding. This was the first aim of the
present study. Because the Marlowe-Crowne scale is now rather outdated and contains several items that may no longer tap
socially desirable behaviour, a recently developed alternative scale, the Social Desirability Scale-17 (SDS-17; Stöber, 1999), was
used. The SDS-17 has satisfactory reliability, high convergent validity to other measures of socially desirable responding, and
high face validity as evidenced by judges’ ratings of its 17 items with respect to social desirability (Stöber, 1999).
The term “self-presentation” is often used to refer very generally to the maintenance of public and/or private self-
evaluation. However, it is conceptually possible and useful to distinguish between a motive to maintain a positive self-image
and a motive to maintain a favourable public image (Campbell & Tesser, 1983; Paulhus, 1984). Therefore, in addition to the
SDS-17, participants in our experiment were requested to fill out the two-factorial Balanced Inventory of Desirable
Responding (BIDR, Paulhus, 1994; for a short version of the German translation, see Musch, Brockhaus, & Bröder, 2002),
which allows a differentiation between these two major forms of self-presentation. The two subscales of the BIDR (Self-
Deceptive Enhancement and Impression-Management) are relatively independent (r=.20), although their sum is highly
correlated with the Marlowe-Crowne scale (r=.73; Paulhus, 1994). Use of the BIDR should provide more detailed insight into
different components of the self-presentation motive than can be obtained from a one-dimensional scale of socially desirable
responding.
The Self-Deceptive Enhancement (SDE) subscale of the BIDR indexes the tendency to give honest but inflated self-
descriptions. SDE pertains to a perception of exaggerated mental control and dogmatic over-confidence (e.g., “I am very
confident of my judgements”; “I never regret my decisions”) as opposed to insight into imperfections (“When my emotions
are aroused, it biases my thinking”). As Paulhus (1994) notes, Self-Deceptive Enhancement is conceptually related to traits
such as Rigidity and Dogmatism. The possible association of both with hindsight bias is discussed in more detail below.
Research has also shown that individuals high in SDE are susceptible to various egocentric and self-serving biases. For
example, they report a lower expectation that they can be involved in traffic accidents, and have a higher illusion of control
(Paulhus & Reid, 1991). Paulhus (1994) describes several studies showing SDE to be correlated with objective indicators of
memory overconfidence, overclaiming, and self-inflation. Paulhus (1994) also mentions an unpublished study in which 40
multiple-choice trivia questions were presented with their answers indicated. Respondents were instructed to rate how
confident they were that they would have answered that question correctly. Given that accuracy rates on these questions are
no better than chance, Paulhus (1994) calculated a simple (in the present terminology, hypothetical) hindsight score by averaging
the individuals’ 40 confidence ratings. He found significantly more hindsight bias among high SDE as compared to low SDE
respondents. To the extent to which Self-Deceptive Enhancement is involved in the effect, a significant correlation between
SDE and the amount of hindsight bias can be expected.
Whereas the SDE subscale tends to measure more private aspects of self-presentation, the Impression-Management (IM)
subscale of the BIDR indexes the tendency to give inflated self-descriptions to an audience. Contrary to SDE scores, IM
scores are very sensitive to situational demands for self-presentation (such as instructions to fake good or fake bad; Paulhus,
Bruce, & Trapnell, 1995). Examples of IM items are: “I never cover up my mistakes” and “I always declare everything at
customs”. If high IM individuals consciously change their initially unbiased recollections towards the correct judgement in
order to appear smart, a significant correlation between IM score and the amount of hindsight bias can be expected.
As a further test of the self-presentation motive, the type of hypothetical instruction was varied between subjects. One half
of participants were instructed to give hypothetical answers concerning their own behaviour: “Try to estimate as accurately as
you can the answer you believe you would have given to the statement if we had not told you the correct answer”. The other
half of participants were given similar hypothetical instructions, but this time with reference to other people: “Try to estimate
as accurately as you can the answer you believe most people would have given to the statement if they had not been told the
correct answer” (no italics were used in the actual instructions). While self-presentational tendencies can be expected in the
first condition, the assessment of the general knowledge of other people presumably stimulates self-enhancement or self-
protection strategies to a lesser extent. An overestimation of another person’s competence in judgement is not likely to
promote self-esteem and self-presentation. A significant difference between hypothetical hindsight bias in the two conditions
would therefore provide strong evidence for the efficacy of a self-presentation motive. On the other hand, if the hindsight
effect is a cognitive one of immediate automatic assimilation, there is little reason to expect an effect of the “self”/”other”
instruction manipulation (cf. Fischhoff, 1977).

PREDICTABILITY
Several theorists have argued that a basic human motive is the desire for certainty, predictability, and the need to experience
an integrated and meaningful world (e.g. Kelley, 1971). Campbell and Tesser (1983) hypothesised that individual differences
in this motive affect the magnitude of hindsight bias. In line with their prediction, participants’ scores on a Dogmatism and an
Intolerance for Ambiguity scale—two measures they used to cover the predictability motive— independently contributed to
the amount of hindsight bias (Campbell & Tesser, 1983). Specifically, they found that the higher a person scored on
Dogmatism, or on Intolerance for Ambiguity, the larger the hindsight bias was (Pohl, 1999b, found no relationship between
Dogmatism and hindsight bias, however).
One well-established finding is that dogmatic people tend to adopt and hold more extreme beliefs and disbeliefs than
people low in dogmatism (Rokeach, 1954). One reason for this is that individuals high and low in dogmatism generate
differential amounts of consistent and inconsistent reasons for their beliefs. Dogmatic individuals tend to generate few
contradictory reasons for their beliefs. Individuals low in dogmatism are more willing to consider contradictory evidence
(Davies, 1998). Owing to this difference, the belief-disbelief systems of individuals high in dogmatism are relatively closed
and compartmentalised, whereas individuals low in dogmatism are characterised by more open belief systems and a higher
readiness to tolerate disparate beliefs (Franklin & Carr, 1971). A poor original estimate in a hindsight task may constitute
inconsistent information for an individual high in dogmatism. Striving to avoid inconsistencies in his or her belief system, he
or she may react to such inconsistency by minimising or ignoring it (cf. Palmer & Kalin, 1985). In hindsight settings,
dogmatic individuals may therefore be inclined to insist that they “knew it all along” rather than acknowledge their original,
wrong estimate.
In addition to a Dogmatism scale, Campbell and Tesser (1983) used an Intolerance for Ambiguity scale to measure the
Predictability and Control motive. Dogmatism and Intolerance for Ambiguity have often been found to be correlated
constructs (e.g., MacDonald, 1970). People high in Intolerance for Ambiguity may try to “make sense” of the past by
superimposing structure and simplicity on their recollections of it. A third variable strongly related to these two constructs is
Rigidity (Brengelmann & Brengelmann, 1960). Measures of all three variables were therefore included in the present study.
In particular, the Dogmatism scale of Brengelmann and Brengelmann (1960), the Rigidity scale of the Munich Personality
Test (Zerssen, 1994), and the Tolerance for Ambiguity scale of Kischkel (1984) were employed.
In discussing their results, Campbell and Tesser (1983, p. 617) remark that Dogmatism and Intolerance for Ambiguity are
not necessarily the best indicators for the Predictability and Control motive, and that a Need for Cognition scale could have
been used as well.

NEED FOR COGNITION


Cohen, Stotland, and Wolfe (1955) were the first to systematically investigate a motive already postulated by Sigmund Freud,
namely the urge to know and to research. Cacioppo and Petty (1982) developed an instrument, the Need for Cognition scale,
to measure this tendency. In general, individuals high in Need for Cognition engage more in thinking than others, and they do
so because they enjoy being engaged in thinking (Cacioppo & Petty, 1982). Need for Cognition is not statistically related to
socially desirable responding (Bless, Wänke, Bohner, Fellhauer, & Schwarz, 1994), but significant correlations to academic
achievement have been found (Cacioppo & Petty, 1982, Studies 2 and 3). Research in the persuasion context shows that
individuals high in Need for Cognition base their attitudes more on a careful consideration of the arguments contained in a
persuasive message than individuals low in need for cognition (Bless et al., 1994). Also, the consistency between attitudes and
behaviour is higher for individuals high in Need for Cognition (see Petty & Cacioppo, 1986, for an overview of research on
the Need for Cognition construct).
Srull, Lichtenstein, and Rothbart (1985) observed that individuals high in Need for Cognition show better recall memory
than individuals low in Need for Cognition. Wood (1978) found that the more one thinks about an event prior to the event, the
less hindsight bias occurs. It can therefore be hypothesised that high need for cognition individuals (possibly because they
remember their foresight estimates better than individuals low in need for cognition) will express less hindsight bias.
Whereas Campbell and Tesser (1983) suggest that Need for Cognition (negatively signed) can be used as an alternative
operationalisation of the predictability and control motive instead of Dogmatism and Intolerance for Ambiguity, Verplanken
and Pieters (1988) argue for a differentiation between these constructs. In the only study to date to investigate the
relationship between Need for Cognition and the magnitude of hindsight bias, Verplanken and Pieters (1988) found that
individuals with a high Need for Cognition showed less hindsight bias than individuals with a medium or low Need for
Cognition. They also found evidence that high Need for Cognition individuals were better able to remember their foresight
estimate. However, it is important to note that the systematic hindsight distortion Verplanken and Pieters (1988) observed was
negative hindsight, which they explained by the unexpectedness of the Chernobyl accident they had investigated. This
unexpectedness supposedly led their participants to express an “I-never-thought-that-would-happen” attitude (see Ofir &
Mazursky, 1997, for other examples of a reversal of hindsight bias at high levels of surprise). An open question is whether
Need for Cognition also moderates positive hindsight bias effects that were expected for the less surprising items used in the
present study. To investigate this issue, participants were asked to fill out a German translation of the Need for Cognition
scale (Bless et al., 1994).

FIELD DEPENDENCE
Another potential individual-difference moderator of hindsight bias is Field Dependence (Davies, 1992, 1993). Field
dependence has been found to be linked to performance in a variety of perceptual, cognitive, and social domains (Witkin &
Goodenough, 1977, 1981). According to Witkin and Goodenough (1981), field independents are better than field dependents
at tasks requiring the breaking up of organised stimuli into individual elements. One of the most widely used tests of Field
Dependence is a perceptual restructuring task requiring subjects to pick out a simple figure hidden in a larger, more complex
one: the Embedded Figures Test (Witkin, Oltman, Raskin, & Karp, 1971). A superior restructuring ability of field
independents in learning and memory tasks has repeatedly been observed (Davis & Frank, 1979; Goodenough, 1976).
Moreover, field-independent persons are considered to be better than field-dependent persons in discriminating between
internal and external sources of information (Durso, Reardon, & Jolly, 1985).
With respect to the hindsight bias phenomenon, Davies (1992) argued that field independents—due to their superior
cognitive disembedding skills—are better at decomposing outcome and solution information. This should help them to avoid
the immediate assimilation and integration of the reported solution into their memory system. As predicted, Davies (1992,
Exp. 1; Davies, 1993) found that field-independent individuals showed less hindsight bias than field-dependent individuals.
He interpreted this finding as evidence that field independents are better at reconstructing in hindsight what their foresight
state was like (in the memory condition), or might have been like (in the hypothetical condition). In another experiment aimed
at reducing hindsight bias, Davies (1992, Exp. 2) found that field dependents benefit more from being asked to generate
reasons that contradict the given answers. The likely explanation for this difference is that only field-independent individuals
generate contradictory reasons without having to be urged to do so. However, Pohl and Eisenhauer (1995)— possibly because
of their rather small and homogeneous sample—could not replicate the original finding of a significant correlation between
Field Dependence and the magnitude of hindsight bias.
As Davies (1993) notes, it is not clear how differences in Field Dependence fit in with other individual difference variables
investigated in hindsight bias research. Davies speculated that Rigidity might be relevant: Field dependents are more
inflexible information processors than field independents (Frank, 1983). Because field-independent individuals often score
higher on cognitive ability and academic achievement tests (Kush, 1996), one can also speculate that Field Independence may
be related to Need for Cognition. The present study therefore tried to determine the relationship between Field Dependence
(as measured by LPS scale 10; Horn, 1983) and other personality traits possibly involved in hindsight bias.

SUGGESTIBILITY
Pohl and Eisenhauer (1995) speculated that in addition to Field Dependence, Suggestibility may play a role in the hindsight
bias effect. Suggestibility was originally investigated in the context of persuasibility (Hovland & Janis, 1959), and was
found to correlate significantly with Field Dependence (Blagrove, Cole-Morgan, & Lambe, 1994). A possible explanation for
this association is that individuals high in Suggestibility and Field Dependence are particularly receptive to social cues
(Melancon & Thompson, 1989), and may therefore be more susceptible to assimilating the correct response into their memory
system. Pohl (1999b) found no relationship between Suggestibility and the magnitude of hindsight bias, however. For
exploratory reasons, participants completed a German translation of the Gudjonsson Suggestibility Scale (GSS; Gudjonsson,
1984) which was prepared for the present study.

CONSCIENTIOUSNESS
Of the Big Five personality factors (Costa & McCrae, 1992), Conscientiousness is the one that is most closely linked to the
set of personality variables that has been discussed in hindsight bias research. Conceptually, Conscientiousness is strongly
related to Rigidity (Zerssen, 1994), which has been found to be positively associated with hindsight bias (Campbell & Tesser,
1983). A sample item of the German version of the Conscientiousness scale is “Whatever I do, I always strive for perfection”.
Individuals high in Conscientiousness may be inclined to minimise or ignore the failure of a bad original estimate by
exhibiting an “I-knew-it-all-along” attitude.
It seems that no one has yet investigated the possible relationship between an individual’s Conscientiousness and his or her
susceptibility to hindsight bias. For exploratory reasons and because of the dominant role of the Big Five factors in
personality research, Conscientiousness was included among the variables investigated in the present study.
In addition to determining single effects of the various personality measures, the present study aims at exploring their
construct similarity. A factor analysis of the personality measures is conducted for this purpose. Additionally, an attempt is being
made to assess the percentage of variance in hindsight bias that can be explained on the basis of a combination of individual
difference measures. Finally, the reliability of various hindsight bias indices is investigated. Pohl (1999a) argued that the
search for personality correlates of hindsight bias is bound to fail if the size of individual hindsight bias cannot be measured
reliably. In a recent meta-analysis, Pohl (1999a) computed an overall reliability of only .11 for individual memory hindsight.
He concluded that poor reliability is a major impediment to an efficient investigation of individual differences in hindsight
bias. For hypothetical conditions, no reliabilities of hindsight bias are reported in the literature. Significant correlations
between hindsight bias and personality variables were expected only to the extent to which measures of hindsight bias are
internally reliable.

METHOD

Participants
Participants were 80 volunteers: University of Bonn students from different faculties and non-students recruited
by the experimenters. The data of five participants had to be discarded because they misunderstood the memory instructions
and exactly reproduced the given solutions rather than their original estimates. Mean age of the remaining 75 participants (36
male and 39 female) was 27 years. Experimental sessions lasted between one and one and a half hours. Each participant was
tested individually.

Experimental materials
The material consisted of 80 almanac-type assertions (40 true and 40 false) taken from the studies of Campbell and Tesser
(1983), Hasher, Goldstein, and Toppino (1977) and various German reference works.1 The assertions deal with facts from
history, politics, biology, medicine, current affairs, geography, and social customs, among others. Items were chosen so that
participants most probably did not possess the specific knowledge of the correct answers, but should have some pre-
experimental knowledge as a basis for their estimates. Two examples are “Earth is the only planet in the solar system that has
one moon” (True) and “The Danube is the longest river in Europe” (False). For each statement, participants were asked to
indicate whether they thought it was true. This was done by marking a 21-point line scale anchored by “certainly false” and
“certainly true”.

1 I am grateful to Jennifer Campbell and Lynn Hasher for making their assertions available.

To construct the test booklets, the 80 items were randomly divided into two sets of 40 items each (Set A and Set B). Half of
the items in each set were true and half were false. The first section of each test booklet consisted of 40 items without
feedback information. Half of the booklets contained Set A items in this section and the other half contained Set B items.
After answering the first section of the test booklet, participants were asked to answer the personality questionnaires
contained in the second section of the test booklet (Table 1 shows an overview of the employed scales). The third section of
the booklet was given to participants after all personality questionnaires had been answered. It contained all answers to the 40
items in the first section of the booklet. Participants were asked to read these answers carefully. In the fourth section,
participants responded to all 80 items of both sets. Forty of these items were the same ones the subject had responded to in
the first section of the booklet, and were accompanied by memory instructions. Specifically, subjects were asked to “recall as
accurately as possible the response you gave earlier to this statement”. The remaining 40 items were new items for which
hypothetical instructions were given. The type of hypothetical instruction was varied between subjects. One half of
participants were given hypothetical instructions referring to themselves: “Try to estimate as accurately as you can the answer
you believe you would have given to the statement if we had not told you the correct answer”. The other half of participants
were given hypothetical instructions referring to other people: “Try to estimate as accurately as you can the answer you
believe most people would have given to the statement if they had not been told the correct answer” (no italics in original
instructions). The order of item sets and the associated instruction type (i.e., memory versus hypothetical) were counterbalanced
across test booklets. Because two different forms of hypothetical instructions were used, this resulted in eight different forms
of test booklets (order of the two item sets × order of the two instruction types × hypothetical instruction for self versus other).
Of each of the eight different forms, 10 copies were prepared to supply the 80 participants. The five participants who had to be
excluded from data analysis for reasons already explained were distributed across four of the eight different forms, without
any apparent pattern.

Procedure
Each experimental session was started by providing the participant with a randomly selected test booklet. Participants first
answered the general knowledge questions in the first section of the booklet. Next, they completed the personality inventories
contained in the second section, which took them approximately 40 minutes. Thus, the retention interval of the hindsight task
was about this length of time. After reading the solutions to the memory items in the third section of the test booklet,
participants answered the memory and hypothetical questions in the fourth section. The order in which memory and
hypothetical questions were asked was balanced. Finally, participants were debriefed and thanked for their participation.

Calculation of the dependent measures


All participants responded to 40 items with memory instructions (in a within-subject design) and 40 items with hypothetical
instructions (which were analysed in either a within-subject or between-subjects manner as explained below). Within each
type of item, the correct answer was “true” for half of the items and “false” for the remaining items. The magnitude of the
effect was computed as the increase in confidence given feedback as compared to one’s own prior confidence without
feedback (memory hindsight), as compared to the confidence of others who responded without feedback (between-subjects
hypothetical hindsight), and as compared to the confidence in response to control items for which no feedback was given
(within-subject hypothetical hindsight). In the hypothetical conditions, one half of participants were asked to estimate how
they would have answered had they not been given the correct answer (hypothetical-self condition); the other half of
participants were asked to estimate how most people would have answered had they not been told the correct answer
(hypothetical-other condition).
The magnitude of memory hindsight bias was calculated as follows: For each item, the difference between the subject’s
response without feedback and his or her response with feedback was calculated. This difference was scored as positive if it was
in the direction of the feedback and was scored as negative if it was not. In order to avoid confounding the percentage of exact
recollections with the amount of hindsight bias, correctly recalled estimates (28.6% of all estimates) were not considered.
Otherwise, a large number of perfect recollections may lead to an apparently smaller effect (Pohl, 1999a).2 The average shift
of all non-perfect recollections towards the solution was used as an index of memory hindsight for each participant.
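For illustration, this scoring can be sketched in a few lines of code. The sketch below is not the original analysis script; the function name, and the assumption that ratings are coded 1 (“certainly false”) to 21 (“certainly true”), are my own illustrative choices.

```python
import numpy as np

def memory_hindsight_index(original, recalled, is_true):
    """Per-participant memory hindsight index (illustrative sketch).

    original : foresight ratings given without feedback (assumed coding:
               1 = "certainly false" ... 21 = "certainly true")
    recalled : recollections of those ratings produced after feedback
    is_true  : Boolean array, True where the assertion is in fact true
    """
    original = np.asarray(original, dtype=float)
    recalled = np.asarray(recalled, dtype=float)
    shift = recalled - original                    # raw change after feedback
    signed = np.where(is_true, shift, -shift)      # positive = shift towards the solution
    non_perfect = shift != 0                       # exact recollections are excluded
    return signed[non_perfect].mean() if non_perfect.any() else 0.0
```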
For hypothetical instructions, two additional hindsight scores were computed. The first hypothetical score was based on a
between-subjects comparison, following the procedure of Campbell and Tesser (1983). In the computation of this between-
subjects (BS) hypothetical hindsight index, the difference between the participant’s response with feedback and the average
response of other participants responding to the same items without feedback was calculated. This difference was scored as
positive if it was in the direction of the feedback and negative if it was not. The hypothetical self (hypothetical other) BS
hindsight index was constructed by averaging the signed differences across the 40 items for all participants answering under
hypothetical-self (hypothetical-other) instructions. Total hypothetical BS hindsight scores were calculated by taking the
hypothetical BS hindsight score of each participant irrespective of the self/other instruction.
A problem with this BS measure of hypothetical hindsight bias is that it confounds hindsight bias with actual knowledge.
That is, it fails to separate those who actually knew it all along from those who erroneously believe they knew it all along.
For example, a person with high confidence in his or her knowledge and vast knowledge relating to the questions will be
scored as exhibiting strong hindsight bias, because his or her scores are consistently higher than those of the control group. On
the other hand, individuals who do exhibit hindsight bias may be scored as not exhibiting the bias if they have little
knowledge in the first place.3 A second hypothetical hindsight score that does not suffer from this problem was therefore
computed on the basis of a within-subjects comparison. For this purpose, each individual’s score on the memory item set in
the first section of the test booklet was used as his or her comparison standard for the hypothetical item set. This was done by
computing the difference between the sum of a participant’s responses to the first set of memory items (presented without
feedback) and the sum of his or her responses to the second set of equally difficult hypothetical items (presented with
feedback). This difference was scored as positive if responses were more accurate with feedback given; otherwise, the
difference was scored as negative. To obtain a per-item index comparable to the BS measure of hypothetical hindsight, the
signed difference was divided by 40 (the total number of items) to construct the hypothetical self (hypothetical other) WS
hindsight index. Total hypothetical WS hindsight scores were calculated by taking the hypothetical WS hindsight score of
each participant irrespective of the self/other instruction.
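The two hypothetical indices can likewise be sketched as follows. This is only an illustration: the function names, and the accuracy coding for the within-subject index (ratings for false assertions mirrored on the assumed 1-21 scale), are assumptions about scoring details that are not spelled out above.

```python
import numpy as np

def bs_hypothetical_index(with_feedback, control_means, is_true):
    """Between-subjects index: one participant's ratings given with feedback,
    compared item by item with the mean ratings of control participants who
    answered the same items without feedback; positive = towards the solution."""
    diff = np.asarray(with_feedback, float) - np.asarray(control_means, float)
    return np.where(is_true, diff, -diff).mean()

def ws_hypothetical_index(no_feedback_ratings, feedback_ratings,
                          no_feedback_is_true, feedback_is_true, n_items=40):
    """Within-subject index: accuracy on the item set answered with feedback
    minus accuracy on the equally difficult set answered without feedback,
    expressed per item (positive = more accurate with feedback)."""
    def accuracy(ratings, is_true):
        r = np.asarray(ratings, dtype=float)
        # mirror ratings for false assertions so that higher always means more accurate
        return np.where(is_true, r, 22 - r).sum()
    return (accuracy(feedback_ratings, feedback_is_true)
            - accuracy(no_feedback_ratings, no_feedback_is_true)) / n_items
```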

RESULTS

Preliminary analyses
Before examining the influence of the individual difference variables on the magnitude of hindsight bias, preliminary analyses
were undertaken to ascertain (a) whether Item Sets A and B were comparably difficult, a necessary prerequisite for using
them as a control condition for each other in the computation of the hypothetical within-subjects hindsight index, (b) whether
the present materials and procedures were effective in eliciting hindsight bias, (c) whether the type of instruction (e.g.,
memory versus hypothetical) exhibited an impact on the magnitude of the effect, and (d) whether the self/other-manipulation
within the hypothetical condition was effective.4
To determine the difficulty of the item sets, a perfectly correct estimate was counted as 1, and a perfectly incorrect estimate
was counted as 21. There was no difference in the overall difficulty of the two item sets, as indicated by the average of the 40
correctness scores of the initial estimates in the first section of the test booklet in which the respective item sets were
presented without feedback, M=10.3 versus 9.9, t(73) = 1.01, ns. This justifies the use of the two item sets as a control
condition for each other in the computation of the hypothetical within-subject hindsight index.
The size of the hindsight effect was significant in the memory condition, M=0.69, SD=1.37 [95% confidence interval for
population mean: (0.38, 1.01)], t(74)=4.36, p < .001 (effect size sensu Cohen: d=0.50). In the BS hypothetical condition, the
hindsight effect was also significant, M=1.80, SD=2.52 [95% confidence interval: (1.22, 2.38)], t(74)=6.19, p < .001 (which
amounts to an effect size d of 0.72). The same was true in the WS hypothetical condition, M=1.81, SD=2.71 [95% confidence
interval: (1.18, 2.43)], t(74)=5.78, p < .001 (d=0.67). Each of the means of the dependent variables can be interpreted as the
average size of the effect per item on a 21-point scale. The size of the memory hindsight effect was significantly smaller than
the average hindsight effect in the two hypothetical conditions, F(1, 74)=13.81, p < .001. Correlations between the three
hindsight indices were as follows: r (memory, BS hypothetical)=.09, ns; r (memory, WS hypothetical)=.28, p < .02; r (BS
hypothetical, WS hypothetical)=.81, p < .001.
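The reported statistics follow directly from the means and standard deviations of the individual hindsight scores. The following sketch (the function name and the use of scipy are my own illustrative choices, not the original analysis code) reproduces them for a one-sample comparison against zero.

```python
import numpy as np
from scipy import stats

def hindsight_effect_summary(scores):
    """One-sample t test against zero, Cohen's d, and 95% CI for a vector of
    per-participant hindsight scores (illustrative sketch)."""
    scores = np.asarray(scores, dtype=float)
    n, m, sd = len(scores), scores.mean(), scores.std(ddof=1)
    se = sd / np.sqrt(n)
    t = m / se                      # e.g., 0.69 / (1.37 / sqrt(75)) = 4.36
    d = m / sd                      # effect size sensu Cohen: 0.69 / 1.37 = 0.50
    half_width = stats.t.ppf(0.975, n - 1) * se
    return {"t": t, "d": d, "ci95": (m - half_width, m + half_width)}
```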
In both hypothetical hindsight conditions, the size of the bias was not significantly greater under hypothetical-self than
under hypothetical-other instructions, t(73)=0.90, ns, for BS hypothetical hindsight, and t(73)=0.52, ns, for WS hypothetical
hindsight. The following analyses are therefore based on the total hypothetical scores, irrespective of self/other instructions.

Reliability of hindsight scores


A sufficient reliability of hindsight bias measures is a necessary condition for the investigation of individual differences in
hindsight bias (Pohl, 1999a). The reliability of all three hindsight measures was therefore determined. This was done by

2 Campbell and Tesser (1983) summed across perfect and imperfect recollections. In the present analyses, however, the significance pattern
of the correlations with personality measures was independent of whether perfect recollections were taken into account or not.
3 I am grateful to Anders Winman who identified these weaknesses of the between-subjects hypothetical hindsight index and suggested the
alternative, within-subject index of hypothetical hindsight bias.


splitting the item pool into two halves and by comparing the participant’s hindsight scores in both halves. To this end, each
person’s data set was divided according to an odd/even rule. (Cronbach’s α was not used as a measure of reliability because
correct recollections were excluded in the computation of memory hindsight bias and, thus, a different set of items was
available for each participant.) Subset I comprised all items with an odd number, subset II comprised all items with an even
number. This classification was reversed for one half of the participants to avoid confounding set differences with material
effects. Hindsight indices were then computed for both subsets, and the correlation between the hindsight scores in subsets I
and II was used as a measure of the reliability (cf. Pohl, 1999a). For memory hindsight, this resulted in a reliability of .49
(averaging across the two different Item Sets A and B). Reliability of BS and WS hypothetical hindsight bias was .77 and .58,
respectively.
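The split-half procedure can be sketched as follows. The function and the handling of excluded items via NaN entries are illustrative assumptions, not a description of the software actually used.

```python
import numpy as np

def split_half_reliability(item_scores, reverse_assignment):
    """Odd/even split-half reliability of a hindsight index (sketch).

    item_scores        : participants x items array of signed per-item hindsight
                         shifts (excluded items, e.g. perfect recollections,
                         may be coded as NaN)
    reverse_assignment : Boolean vector; for these participants the odd/even
                         assignment is swapped so that set differences are not
                         confounded with material effects
    """
    scores = np.asarray(item_scores, dtype=float)
    odd = np.arange(scores.shape[1]) % 2 == 0        # odd-numbered items (indices 0, 2, 4, ...)
    rev = np.asarray(reverse_assignment, dtype=bool)[:, None]
    half1 = np.nanmean(np.where(rev, scores[:, ~odd], scores[:, odd]), axis=1)
    half2 = np.nanmean(np.where(rev, scores[:, odd], scores[:, ~odd]), axis=1)
    return np.corrcoef(half1, half2)[0, 1]           # correlation between halves = reliability estimate
```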

Individual differences
To determine common factors underlying the personality variables under investigation, a principal component analysis was
conducted. Four factors with eigenvalues larger than 1 could be extracted. Table 1 shows the solution resulting from varimax
rotation. Taken together, the four factors account for 71% of the variance of the 10 personality variables.
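For illustration, the extraction and rotation can be sketched as an eigendecomposition of the correlation matrix followed by a textbook varimax rotation. This is a generic re-implementation under the assumptions stated in the comments, not the software actually used for the analysis.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (textbook algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        B = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p))
        R = u @ vt
        if var_old != 0 and s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return loadings @ R

def rotated_principal_components(X):
    """Keep principal components with eigenvalues > 1 (Kaiser criterion) and
    varimax-rotate their loadings; X is a participants x variables matrix."""
    corr = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]                 # sort eigenvalues descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    return varimax(loadings), eigval
```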
Associations between personality variables and the amount of hindsight bias were analysed separately for the three
different operationalisations of hindsight bias. Pearson correlations between the hindsight bias scores and the individual
difference measures as well as the four factor scores are given in Table 2.
All significant correlations were in the predicted direction. Memory hindsight was positively associated with Dogmatism
and Conscientiousness. Significant correlations occurred between BS hypothetical hindsight and Social Desirability,
Impression Management, Rigidity, Field Dependence, and Conscientiousness. WS hypothetical

TABLE 1 Factorial structure of the personality variables


                                                                     Factor 1   Factor 2   Factor 3   Factor 4
Tolerance for Ambiguity (Kischkel, 1984)                               −.79*      −.06        .11       −.11
Dogmatism (Brengelmann & Brengelmann, 1960)                             .75*       .01       −.05        .00
Rigidity (Zerssen, 1994)                                                .74*       .17        .02       −.30
Social Desirability (Stöber, 1999)                                      .19        .85*      −.00        .03
BIDR—Impression Management^a (Paulhus, 1994; Musch et al., 2002)        .14        .83*      −.12        .01
BIDR—Self-Deceptive Enhancement (Paulhus, 1994; Musch et al., 2002)    −.41        .64*      −.09       −.18
Field Independence^b (Horn, 1983)                                       .11       −.16        .87*       .08
Need for Cognition (Bless et al., 1994)                                −.45        .03        .73*      −.17
Suggestibility^c (Gudjonsson, 1984)                                     .06        .05        .03        .90*
Conscientiousness (Borkenau & Ostendorf, 1993)                          .37        .45        .18       −.56
Explained variance                                                    26.9%      19.1%      14.2%      10.3%
Loadings > .60 (in absolute value) are marked * (N=75, varimax rotation). The ten eigenvalues were 2.69, 1.91, 1.42, 1.03, 0.73, 0.60, 0.53, 0.45, 0.36, and 0.27.
a Impression Management and Self-Deceptive Enhancement were measured using the respective subscales of the Balanced Inventory of
Desirable Responding (Paulhus, 1994; for a German version, see Musch et al., 2002). Dichotomous scoring as recommended by
Paulhus (1994) was employed.

4 Additional analyses indicated that there were no significant effects for sex of subject and type of test booklet (i.e., which set of items was
responded to without feedback and the temporal order of memory/hypothetical questions). These variables were therefore not included in
subsequent analyses.


b Following Pohl and Eisenhauer (1995), scale 10 of the LPS (Horn, 1983) was used to measure Field Independence. This scale is a variant
of the Embedded Figures Test, requiring participants to find simple figures hidden in larger, more complex objects. As pretests
showed, young adults with university education perform very well at this task. To avoid possible ceiling effects, participants
were therefore allowed only 2 instead of the usual 3 minutes to deal with the items of LPS scale 10 (Pohl & Eisenhauer, 1995,
did not report the performance of their participants in the Embedded Figures Test; given the present experiences, it can be
speculated that ceiling effects may have prevented them from detecting a significant correlation between hindsight bias and field
dependence).
c To measure Suggestibility, a German translation of the Gudjonsson Suggestibility Scale (GSS; Gudjonsson, 1984) was prepared. The GSS
has been developed to meet the demand for an objective psychometric instrument to quantify Suggestibility (Gudjonsson, 1984).
It reflects the extent to which individuals “yield” to various types of suggestive questions concerning the content of a short story.
For example, one item asks “Did the woman have one or two children?”. Because no children are mentioned in the story, the
answers “One” or “Two” are counted as “yield” answers indicating Suggestibility. The original GSS consists of two scales. Only
the “yield” subscale of the GSS was used in the present study. The second, “shift” subscale is intended to measure the extent to
which participants comply with interpersonal pressure and was not considered relevant for the purpose of the present investigation.

hindsight was associated with Social Desirability, Impression Management, and Field Dependence.
In a multiple regression analysis, the magnitude of the hindsight bias effect was regressed on the individual difference
variables. The 10 personality measures accounted for 19% of the variance in memory hindsight, 39% of the variance in BS
hypothetical hindsight, and 24% of the variance in WS hypothetical hindsight. Because of the low ratio of cases per predictor
in the above regression analyses, one may question whether the obtained high values of R2 are actually robust. When
regression weights developed on one group are applied to data from another group, multiple

TABLE 2 Correlations between individual difference measures and hindsight bias


                                Expected sign of       Pearson correlations with
                                correlation        Memory        Between-Subjects          Within-Subject
                                                   Hindsight     Hypothetical Hindsight    Hypothetical Hindsight
Social Desirability                   +              .06               .28*                     .25*
Self-Deceptive Enhancement            +             −.16               .14                      .09
Impression Management                 +             −.02               .34**                    .33**
Dogmatism                             +              .37**            −.05                      .04
Rigidity                              +              .08               .41**                    .19
Tolerance for Ambiguity               −             −.09              −.09                     −.11
Need for Cognition                    −             −.09              −.02                     −.20
Field Independence                    −              .10              −.24*                    −.36**
Suggestibility                        +             −.04              −.21                     −.06
Conscientiousness                     ±              .24*              .27*                     .14
Factor 1                                             .28*              .13                      .10
Factor 2                                            −.03               .32**                    .27*
Factor 3                                             .09              −.13                     −.31**
Factor 4                                             .07               .29*                     .09
R2                                                   .19               .39                      .24
R2 adjusted                                          .17               .33                      .20
Effect size f2                                       .23               .64                      .32
*p<.05; **p<.01; N=75. R2 indicates the percentage of variance that can be accounted for by the 10 personality variables; R2 adjusted
indicates the same percentage adjusted for shrinkage as explained in the text; and f2 indicates the respective effect sizes.

correlations are usually subject to shrinkage, the extent of which may sometimes be considerable. A simulation analysis was
therefore conducted in which one randomly chosen half of participants was used to compute the weights of the regression
equation. Subsequently, these weights were used to compute the multiple correlation for the remaining half of participants. This
procedure was done repeatedly to avoid artifacts of any particular set of participants. Averaging across a total of 1000
simulation runs, corrected R2 were computed for all three hindsight bias indices. The results of these simulation analyses show
that some shrinkage does occur (see Table 2); however, the results also show that the considerable proportions of variance
that can be explained by personality factors are probably real and not due to overfitting and capitalisation on chance.
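The cross-validation logic can be sketched as follows. This is an illustrative re-implementation, not the original simulation code; the function name is an assumption, and the final comment only restates the standard conversion from R2 to Cohen's f2 used in Table 2.

```python
import numpy as np

def cross_validated_r2(X, y, n_runs=1000, seed=0):
    """Average shrunken R^2: regression weights are estimated on a random half
    of the sample and evaluated on the other half (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    r2_values = []
    for _ in range(n_runs):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        A_train = np.column_stack([np.ones(len(train)), X[train]])      # intercept + predictors
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)       # OLS weights from one half
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta    # applied to the other half
        r2_values.append(np.corrcoef(pred, y[test])[0, 1] ** 2)         # shrunken R^2
    return float(np.mean(r2_values))

# Cohen's effect size f^2 reported in Table 2 is R^2 / (1 - R^2), e.g. .39 / .61 is approximately .64.
```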

DISCUSSION
Strong hindsight biases could be observed both under memory and hypothetical instructions. Participants overestimated both
how much they knew and how much they would have known. Hindsight bias was larger in the hypothetical than in the
memory condition, a finding that has frequently been reported in the literature (Campbell & Tesser, 1983; Davies, 1992;
Fischhoff, 1977).
Individual differences contributed significantly to the strength of the hindsight bias effect: 39% of the variance in BS
hypothetical hindsight, 24% of the variance in WS hypothetical hindsight, and 19% of the variance in memory hindsight
could be accounted for by personality measures. A principal component analysis revealed that most of the variance of the
personality variables can be accounted for by four factors.
Factor 1 explains the largest proportion of variance. It mainly consists of the Intolerance for Ambiguity and the Dogmatism
scale and replicates the Need for Predictability factor previously identified by Campbell and Tesser (1983). The hypothesis
that Rigidity is also related to this set of variables could be confirmed. The results are consistent with the notion that dogmatic
and rigid individuals, unable to tolerate an ambiguous state of knowledge, reduce the feeling of inconsistency induced by a
bad original estimate by insisting that they “knew it all along”.
A second factor represents the Self-Presentation motive. Social Desirability, Impression Management, and (to a lesser
extent) Self-Deceptive Enhancement load on this factor. The original finding of Campbell and Tesser (1983) of an association
between Social Desirability and hindsight bias could be confirmed. Going beyond the finding of Campbell and Tesser (1983),
the insignificant effect of Self-Deceptive Enhancement on the magnitude of the bias, as contrasted with the significant effects
of Social Desirability and Impression Management, may indicate that hindsight bias is more closely associated with a
tendency for positive self-presentation to others than with an attempt to maintain a positive private self-image. However, this
reasoning is rather speculative because there is only a tendency for a difference between the effects of Self-Deceptive
Enhancement and Impression Management (p < .10, one-tailed).
The manipulation of hypothetical-self versus hypothetical-other instructions aimed at reducing self-presentational concerns
produced no significant effects, which is in accordance with previous results (Wood, 1978). However, in descriptive terms,
hindsight bias was larger in the hypothetical-self than in the hypothetical-other condition, both in the between-subjects
condition (M=2.07 versus M=1.54) and in the within-subject condition (M=1.97 versus M=1.64). A post-hoc power analysis
was therefore conducted using the program GPower (Erdfelder, Faul, & Buchner, 1996). Given a significance level of α = .05,
the present sample sizes allowed detection of the hypothesised effect of the self/other instruction with a power (1-β) of .57, if
the effect was of medium size (d=0.50), and with a power of .93, if the effect size was large (d=0.80). Thus, the null effect of
Instruction Type is possibly caused by low power. On the other hand, it can be concluded that if there is an effect, its size is
probably not large. Further research is warranted before firm conclusions can be drawn on this issue.
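The reported power figures can be reproduced with any standard power routine. Below is a sketch using statsmodels rather than GPower, assuming groups of roughly 37 and 38 participants; these group sizes are my inference from N = 75.

```python
from statsmodels.stats.power import TTestIndPower

# Post-hoc power for the self/other instruction comparison (two-sided alpha = .05).
analysis = TTestIndPower()
for d in (0.50, 0.80):
    power = analysis.power(effect_size=d, nobs1=38, ratio=37 / 38,
                           alpha=0.05, alternative="two-sided")
    print(f"d = {d:.2f}: power = {power:.2f}")   # approx. .57 and .93, as reported above
```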
The third factor identified in the principal component analysis comprises what can be called cognitive style variables. Field
Independence and Need for Cognition load on this factor, but only Field Independence correlated significantly with hindsight
bias. Possibly, field-independent individuals are better at reconstructing their pre-outcome knowledge (Davies, 1992, 1993;
cf. Verplanken & Pieters, 1988). It might also be that field dependents are more influenced by external sources of information
(the correct answers provided by the experimenter; Davies, 1993). In line with this reasoning, Linton (1955) found that field
dependents showed greater conformity to others’ judgements and showed greater attitude change following persuasive
communications. Among the four factors identified in the principal component analysis, the cognitive style factor seems to
constitute the most obvious link between the individual difference approach adopted here and more cognitively oriented
explanations of the hindsight bias effect. The results call for a more thorough investigation of the relationship between
individual differences in cognitive style on the one hand and cognitive processes underlying hindsight bias on the other.
Only one variable loaded substantially on the fourth factor, namely, Suggestibility. The first-order correlation between
Suggestibility and hindsight bias is small and insignificant, however, suggesting a negligible role of Suggestibility in the
effect. To interpret this result, it must be noted that the internal consistency of the German version of the Gudjonsson
Suggestibility Scale (GSS) that had been prepared for the present study was disappointing (Cronbach’s α=.48 as compared to
.77 in the English original). A possible reason is that the short story at the heart of the scale was given in writing, whereas
Gudjonsson (1984) read it aloud to his participants. However, the lack of any relationship between Suggestibility and the
magnitude of hindsight bias replicates a converging finding of Pohl (1999b).
The Big-Five scale Conscientiousness— included in the present study primarily for exploratory reasons—is not strongly
associated with any of the four factors, but shows medium-sized loadings on the Need for Predictability factor, the Self-
Presentation factor, and (with negative sign) on the Suggestibility factor. Moreover, it is the only variable that is significantly
related to both memory and hypothetical hindsight bias. First-order correlations of Conscientiousness were highest with social
desirability (r=.44, p < .001; see Paulhus, 1994, for a similar finding) and Rigidity (r=.43, p < .001). It seems that
conscientiousness taps variance common to a number of other personality variables investigated in this study (as might be
expected from a Big-Five factor; Costa & McCrae, 1992). More research is necessary to better understand the role of
Conscientiousness in the hindsight bias effect.
Overall, personality variables were more strongly correlated with hypothetical than with memory hindsight. Hypothetical
hindsight may simply leave more room for personality variables to become effective. It is also possible that in the memory
condition, participants are potentially aware that their hindsight estimates can be compared with their foresight estimates. This
fact, possibly suppressing self-presentational concerns and other effects associated with the utterance of a foresight estimate,
can account for the greater influence of personality variables on the magnitude of hypothetical hindsight bias. One may also
argue on the basis of the present results that if one is not interested in individual differences, the memory design might be
more suitable for a “pure” cognitive assay of hindsight bias than the hypothetical design (see also Blank & Fischer, 2000, for
a discussion of the different mechanisms working in memory and hypothetical conditions).
The differential reliability of memory and hypothetical hindsight measures is another likely explanation for the diverging
pattern of results (cf. Pohl, 1999a). Reliabilities were higher for BS hypothetical (.77) and WS hypothetical (.58) hindsight
than for memory hindsight (.49). Reliability of the memory hindsight index was possibly not sufficient to detect individual
differences in the size of the effect. Pohl (1999a) reports an average reliability of only .11 for memory hindsight based on
almanac questions requiring numerical estimates (e.g., “How many crime stories were written by Agatha Christie?”). Based
on this finding, he argued that low reliability may be responsible for the small and nonsignificant correlations he observed
between memory hindsight and personality measures. The use of confidence ratings instead of numerical judgements and the
additional use of hypothetical instructions may account for the much more satisfying reliabilities observed in the present
study.
As discussed in the method section, a theoretical advantage of the WS hypothetical hindsight bias index is that unlike the
BS hypothetical index, it does not confound hindsight bias with actual knowledge. However, the WS and the BS index turned
out to be very strongly correlated in the present study (r=.81), suggesting that this confound might not be a very serious one.
Considering that random assignment to conditions usually provides a powerful safeguard against confounding with
knowledge, there seems to be no particularly strong reason to refrain from the use of the BS hypothetical hindsight index in future
studies.
In the past two decades, researchers in the information-processing tradition have concentrated on the question of how most
people typically process information in the hindsight paradigm. Partly because of some failures to demonstrate individual
differences in the magnitude of the effect, relatively few studies have been carried out on personality correlates of hindsight
bias. The present study reinstates the utility of applying different theoretical approaches to the hindsight phenomenon—one
involving general processing models, the other an individual difference perspective. Personality variables that reflect natural
variations in mediating processes may be usefully applied in testing the operation of mechanisms proposed by general
theories and models (see Stanovich & West, 1998, for a related approach).
Advancing an individual difference approach to hindsight bias is, of course, not equivalent to dismissing previous research
findings as artifacts of variables such as the self-presentation motive. The present effects of individual differences are non-
negligible, but they leave much systematic variance unaccounted for. For example, several studies reported effects of
variables like depth of encoding, retention interval, and item difficulty (see Hawkins & Hastie, 1990, for a review). These
effects point to the role of cognitive processes that cannot be couched in terms of personality variables. Moreover, the mere
existence of personality correlates of hindsight bias says nothing about how they get translated into the size of the effect. Where
are the effects of the Need for Predictability, the Self-Presentation Motive, Field Dependence, and Conscientiousness located
in the judgement process? Do individual differences exert their influence when a person is about to encode the solution, to revise
his or her knowledge structure, or to generate a response? Future investigations must examine more closely the mechanisms
by which personality variables affect the magnitude of the hindsight bias effect. In particular, the extent to which the observed
individual differences are actually differences in personality has yet to be determined. Possibly, cognitive variables are
captured by some of the personality measures included in the present study. For example, a correlation between field
independence and intelligence has repeatedly been observed (e.g., Roberge & Flexer, 1981).
Some additional caution is warranted when interpreting the results of this study. It is conceivable that individual differences
play a larger role when hindsight bias is being investigated using almanac questions than in other contexts (cf. Hawkins &
Hastie, 1990). However, for a variety of reasons it seems plausible to assume that individual differences may be even more
important in real-world contexts. This is because outside the laboratory, individuals often choose their tasks rather than being assigned tasks to perform. Individual difference effects are likely to be magnified when individuals
are more ego-involved in a task and face serious consequences from their decisions.
Stanovich and West (1998) recently found that hindsight bias was significantly associated with performance in other
cognitive tasks. In particular, they observed that people showing larger hindsight effects also showed more belief bias in
syllogistic reasoning, more errors in statistical reasoning, and more overconfidence. They interpreted this finding as evidence
for stable individual differences in rational thought. The notion that individual differences in hindsight bias generalise to other
kinds of biases and performance in different cognitive tasks is intriguing and worth pursuing in future research.

In an earlier review, Hawkins and Hastie (1990) concluded that hindsight biases are probably influenced by more than just
one mechanism. The present study confirms that composite, multifactor explanations will be required to provide a satisfying
account of the hindsight bias phenomenon. In particular, systematic individual differences in the magnitude of the bias exist
and must be taken into account in a complete model of the effect.

REFERENCES

Blagrove, M., Cole-Morgan, D., & Lambe, H. (1994). Interrogative suggestibility: The effects of sleep deprivation and relationship with
field dependence. Applied Cognitive Psychology, 8, 169–179.
Blank, H., & Fischer, V. (2000). “Es mußte eigentlich so kommen”: Rückschaufehler bei der Bundestagswahl 1998. [“It had to turn out that
way”: Hindsight bias in the German parliamentary elections in 1998]. Zeitschrift für Sozialpsychologie, 31, 128–142.
Bless, H., Wänke M., Bohner, G., Fellhauer, R., & Schwarz, N. (1994). Need for cognition: Eine Skala zur Erfassung von Engagement und
Freude bei Denkaufgaben [Need for Cognition: A scale for the assessment of involvement and happiness in cognitive tasks].
Zeitschrift für Sozialpsychologie, 25, 147–154.
Borkenau, P., & Ostendorf, F. (1993). NEO-Fünf-Faktoren-Inventar (NEO-FFI) [NEO Five-Factor Inventory]. Göttingen: Hogrefe.
Bradley, G.W. (1978). Self-serving biases in the attribution process: A reexamination of the fact or fiction question. Journal of Personality
and Social Psychology, 36, 56–71.
Brengelmann, J., & Brengelmann, L. (1960). Deutsche Validierung von Fragebogen dogmatischer und intoleranter Haltung [German
validation of scales for dogmatism and intolerance for ambiguity]. Zeitschrift für Experimentelle und Angewandte Psychologie, 7,
451–471.
Cacioppo, J.T., & Petty, R.E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.
Campbell, J.D., & Tesser, A. (1983). Motivational interpretations of hindsight bias: An individual difference analysis. Journal of
Personality, 51, 605–620.
Christensen-Szalanski, J.J., & Willham, C.F. (1991). The hindsight-bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cohen, A., Stotland, E., & Wolfe, D.M. (1955). An experimental investigation of need for cognition. Journal of Abnormal and Social
Psychology, 51, 291–297.
Connolly, T., & Bukszar, E.W. (1990). Hindsight bias: Self-flattery or cognitive error? Journal of Behavioral Decision Making, 3,
205–211.
Costa, P.T., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO PI-R) and NEO Five Factor Inventory. Professional Manual.
Odessa, FL: Psychological Assessment Resources.
Crowne, D.P., & Marlowe, D. (1964). The approval motive. New York: Wiley.
Davies, M.F. (1987). Reduction of hindsight bias by restoration of foresight perspective: Effectiveness of foresight-encoding and hindsight-
retrieval strategies. Organizational Behavior and Human Decision Processes, 40, 50–68.
Davies, M.F. (1992). Field dependence of hindsight bias: Cognitive restructuring and the generation of reasons. Journal of Research in
Personality, 26, 58–74.
Davies, M.F. (1993). Field dependence and hindsight bias: Output interference in the generation of reasons. Journal of Research in Personality,
27, 222–237.
Davies, M.F. (1998). Dogmatism and belief formation: Output interference in the processing of supporting and contradictory cognitions.
Journal of Personality and Social Psychology, 75, 456–466.
Davis, J.K., & Frank, B.M. (1979). Learning and memory of field independent-dependent individuals. Journal of Research in Personality,
13, 469–479.
Durso, F.T., Reardon, R., & Jolly, E.J. (1985). Self-nonself-segregation and reality monitoring. Journal of Personality and Social
Psychology, 48, 447–455.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and
reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, &
Computers, 28, 1–11.
Fischer, I., & Budescu, D. (1995). Desirability and hindsight biases in predicting results of a multi-party election. In J.P.Caverni, M.Bar-
Hillel, F.H.Barron, & H.Jungermann (Eds.), Contributions to decision making (Vol. 1, pp. 193–211). Amsterdam: Elsevier.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Frank, B.M. (1983). Flexibility of information processing and the memory of field-independent and field-dependent learners. Journal of
Research in Personality, 17, 89–96.
Franklin, B.J., & Carr, R.A. (1971). Cognitive differentiation, cognitive isolation, and dogmatism. Sociometry, 43, 230–237.
Goodenough, D.R. (1976). The role of individual differences in field dependence as a factor in learning and memory. Psychological
Bulletin, 83, 675–694.

Gudjonsson, G. (1984). A new scale of interrogative suggestibility. Personality and Individual Differences, 5, 303–314.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the conference of referential validity. Journal of Verbal Learning and Verbal
Behavior, 16, 107–112.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hell, W. (1993). Gedächtnistäuschungen [Memory illusions]. In W.Hell, K.Fiedler, & G.Gigerenzer (Eds.), Kognitive Täuschungen
[Cognitive illusions] (pp. 13–38). Heidelberg: Spektrum.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An interaction of automatic and motivational factors?
Memory and Cognition, 16, 533–538.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and heuristics affect our reconstruction of the past.
Memory, 11, 357–377.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect in hindsight bias. Psychological Review, 104, 194–202.
Hoffrage, U., & Hertwig, R. (1999). Hindsight bias: A price worth paying for fast and frugal memory. In G. Gigerenzer, P.M.Todd, & the
ABC Research Group, Simple heuristics that make us smart (pp. 191–208). New York: Oxford University Press.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory and Cognition, 26, 566–581.
Horn, W. (1983). Leistungsprüfsystem L-P-S [Achievement examination system L-P-S] (2nd ed.). Göttingen: Hogrefe.
Hovland, C.I., & Janis, I.L. (1959). Personality and persuasibility. New Haven, CT: Yale University Press.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University
Press.
Kelley, H.H. (1971). Attribution in social interaction. Morristown, NJ: General Learning Press.
Kischkel, K. (1984). Eine Skala zur Erfassung von Ambiguitätstoleranz. [A scale to measure ambiguity tolerance]. Diagnostica, 30,
144–154.
Kruglanski, A.W. (1996). Motivated social cognition: Principles of the interface. In E.T.Higgins & A.W. Kruglanski (Eds.), Social
psychology: Handbook of basic principles (pp. 493–520). New York: Guilford Press.
Kush, J.C. (1996). Field-dependence, cognitive ability, and academic achievement in Anglo American and Mexican American students.
Journal of Cross Cultural Psychology, 27, 561–575.
Leary, M.R. (1981). The distorted nature of hindsight. Journal of Social Psychology, 115, 25–29.
Leary, M.R. (1982). Hindsight distortion and the 1980 presidential election. Personality and Social Psychology Bulletin, 8, 257–263.
Linton, H.B. (1955). Dependence on external influence: Correlates in perception, attitudes, and judgment. Journal of Abnormal and Social
Psychology, 51, 502–507.
Louie, T.A. (1999). Decision makers’ hindsight bias after receiving favorable and unfavorable feedback. Journal of Applied Psychology,
84, 29–41.
MacDonald, A.P. (1970). Revised scale for ambiguity tolerance: Reliability and validity. Psychological Reports, 26, 791–798.
Mark, M.M., & Mellor, S. (1991). Effect of self-relevance of an event on hindsight bias: The foreseeability of a layoff. Journal of Applied
Psychology, 76, 569–577.
Melancon, J.G., & Thompson, B. (1989). Measurement characteristics of the Finding Embedded Figures Test. Psychology in the Schools,
26, 69–78.
Musch, J., Brockhaus, R., & Bröder, A. (2002). Ein Inventar zur Erfassung von zwei Faktoren sozialer Erwünschtheit [An inventory for the
assessment of two factors of social desirability]. Diagnostica, 48, 121–129.
Musch, J., & Bröder, A. (1999). Ergebnisabhängig asymmetrisches Attributionsverhalten: Motivationale Verzerrung oder rationale
Informationsverarbeitung? [Attribution asymmetry after success and failure: Motivational bias or rational information processing?].
Zeitschrift für Sozialpsychologie, 30, 246–254.
Nisbett, R.E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the hindsight bias? Organizational Behavior and Human
Decision Processes, 69, 51–57.
Palmer, D.L., & Kalin, R. (1985). Dogmatic responses to belief dissimilarity in the “bogus stranger” paradigm. Journal of Personality and
Social Psychology, 48, 171–179.
Paulhus, D.L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46,
598–609.
Paulhus, D.L. (1994). Balanced Inventory of Desirable Responding: Reference Manual for BIDR Version 6. Vancouver, Canada: University
of British Columbia.
Paulhus, D.L., Bruce, M.N., & Trapnell, P.D. (1995). Effects of self-presentation strategies on personality profiles and their structure.
Personality and Social Psychology Bulletin, 27, 100–108.
Paulhus, D.L., & Reid, D.B. (1991). Enhancement and denial in socially desirable responding. Journal of Personality and Social
Psychology, 60, 334–356.
Petty, R.E., & Cacioppo, J.T. (1986). Communication and persuasion. Central and peripheral routes to attitude change. New York:
Springer.
Pohl, R.F. (1993). Der Rückschaufehler—Ein Modell zur Analyse und Erklärung systematisch verfälschter Erinnerungen [The hindsight
bias—A model for the analysis and explanation of systematically biased recollections]. Unpublished Habilitationsschrift, University of
Trier. Trier, Germany.

Pohl, R.F. (1999a). Hindsight bias: Robust, but not reliable. Unpublished manuscript. University of Gießen, Germany.
Pohl, R.F. (1999b). Der Einfluß suggestiver Faktoren auf den Ankereffekt [The influence of suggestive factors on anchoring].
Experimentelle und Klinische Hypnose, 15, 29–53.
Pohl, R.F., & Eisenhauer, M. (1995). Der Rückschaufehler bei der Lokalisierung von Städten auf einer Landkarte [Hindsight bias in
locating cities on a map]. Zeitschrift für Experimentelle Psychologie, 42, 63–93.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Pohl, R.F., & Gawlik, B. (1995). Hindsight bias and the misinformation effect: Separating blended recollections from other recollection
types. Memory, 3, 21–55.
Pohl, R.F., Hardt, O., & Eisenhauer, M. (2000). SARA—ein kognitives Prozeßmodell zur Erklärung von Ankereffekt und Rückschaufehler
[SARA—a cognitive process model explaining the anchoring effect and hindsight bias]. Kognitionswissenschaft, 9, 77–92.
Pohl, R.F., Stahlberg, D., & Frey, D. (1999). I’m not trying to impress you, but I surely knew it all along! Self-presentation and hindsight
bias. Working paper No. 99–19, SFB 504, University of Mannheim.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Roberge, J.J., & Flexer, B.K. (1981). Re-examination of the covariation of field independence, intelligence and achievement. British
Journal of Educational Psychology, 51, 235–236.
Rokeach, M. (1954). The nature and meaning of dogmatism. Psychological Review, 61, 194–204.
Srull, T.K., Lichtenstein, M., & Rothbart, M. (1985). Associative storage and retrieval processes. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 11, 316–345.
Stahlberg, D., Eller, F., Maass, A., & Frey, D. (1995). We knew it all along. Hindsight bias in groups. Organizational Behavior and Human
Decision Processes, 63, 46–58.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
Review of Social Psychology, Vol 8 (pp. 105–132). Chichester, UK: Wiley.
Stahlberg, D., & Schwarz, S. (1999). Would I have known it all along if I would hate to know it? The hindsight bias in situations of high and
low self-esteem relevance. Working paper No. 99–34, SFB 504, University of Mannheim.
Stahlberg, D., Sczesny, S., & Schwarz, S. (1998). Der “Reversed hindsight bias”—die Rückkehr der Motivation in die Hindsight-
Forschung? [The reversed hindsight bias—the return of motivational accounts in hindsight research?]. In W.Hacker (Ed.), Abstracts of
the 41st Congress of the German Psychological Society [computer disk]. Dresden, Germany: Technical University Dresden.
Stanovich, K.E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Stanovich, K.E., & West, R. (1998). Individual differences in rational thought. Journal of Experimental Psychology: General, 127,
161–188.
Stanovich, K.E., & West, R. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain
Sciences, 23, 645–726.
Stöber, J. (1999). Die Soziale-Erwünschtheits-Skala-17 (SES-17): Entwicklung und erste Befunde zu Reliabilität und Validität [The Social
Desirability Scale-17 (SDS-17): Development and first results on reliability and validity]. Diagnostica, 45, 173–177.
Strack, F., & Muβweiler, T. (1997). Explaining the enigmatic anchoring effect: Mechanisms of selective accessibility. Journal of
Personality and Social Psychology, 73, 437–446.
Verplanken, B., & Pieters, R.G.M. (1988). Individual differences in reverse hindsight bias: I never thought something like Chernobyl would
happen. Did I? Journal of Behavioral Decision Making, 7, 131–147.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence-hindsight mirror effect in judgment: An accuracy-assessment model for
the knew-it-all-along phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 415–431.
Witkin, H.A., & Goodenough, D.R. (1977). Field dependence and interpersonal behavior. Psychological Bulletin, 84, 661–689.
Witkin, H.A., & Goodenough, D.R. (1981). Cognitive styles: Essence and origin. New York: International University Press.
Witkin, H.A., Oltman, P.K., Raskin, E., & Karp, S.A. (1971). A manual for the Embedded Figures Test. Palo Alto, CA: Consulting
Psychologists Press.
Wood, G. (1978). The knew-it-all-along effect. Journal of Experimental Psychology: Human Perception and Performance, 4, 345–353.
Zerssen, D. (1994). Persönlichkeitszüge als Vulnerabilitätsindikatoren—Probleme ihrer Erfassung [Personality traits as vulnerability
markers—Problems of assessment]. Fortschritt der Neurologie, Psychiatrie und ihrer Grenzgebiete, 62, 1–13.
Hindsight bias in political elections
Hartmut Blank
University of Leipzig, Germany
Volkhard Fischer
Hannover Medical School, Germany
Edgar Erdfelder
University of Mannheim, Germany

Two studies on political hindsight bias were conducted on the occasions of the German parliament election in
1998 and the Nordrhein-Westfalen state parliament election in 2000. In both studies, participants predicted the
percentage of votes for several political parties and recalled these predictions after the election. The observed
hindsight effects were stronger than those found in any prior study on political elections (using percentage of
votes as the dependent variable). We argue that the length of the retention interval between original judgement
and recollection is mainly responsible for this difference. In our second study, we investigated possible artifacts in
political hindsight biases using a control-group design where half of the participants recalled their predictions
shortly before or after the election. Hindsight bias was preserved, reinforcing the results of earlier studies with
non-control-group designs. Finally, we discuss the possibility that the hindsight experience (in political
judgement and in general) actually consists of three different, partly independent components.

The hindsight bias or knew-it-all-along effect (Fischhoff, 1975, 1977) is the tendency to retrospectively exaggerate one’s
foresight knowledge about the outcome of an event. This phenomenon has been found in experimental settings (e.g., Dehn &
Erdfelder, 1998; Hell, Gigerenzer, Gauggel, Mall, & Müller, 1988; see the overview by Hawkins & Hastie, 1990) and in a variety
of applied settings as well, for example, in medical diagnoses (Arkes, Faust, Guilmette, & Hart, 1988; Detmer, Fryback, &
Gassner, 1978), jurors’ decisions (Casper, Benedict, & Perry, 1989) or judgements of historical-political events (Fischhoff,
1975; Pennington, 1981a; Wasserman, Lempert, & Hastie, 1991). Several studies have also investigated the phenomenon of
concern in the present article, namely, hindsight bias in political elections (Fischer & Budescu, 1995; Leary, 1982;
Pennington, 1981b; Powell, 1988; Synodinos, 1986; Tykocinski, 2001; Wendt, 1993), and in most of these studies some
evidence for hindsight bias was found. Upon closer examination, however, these effects appear to be quite heterogeneous and
do not readily fit into a coherent picture.
Hindsight bias was found most easily for confidence judgements and, to a lesser extent, for

probability estimates, but not consistently for the predicted percentage of votes. To illustrate, Synodinos (1986) asked 217
students, on the day prior to the election, about the outcome of the Hawaiian gubernatorial election in 1982. His participants
predicted the percentage of votes that each of three candidates would receive, indicated their confidence for these predictions,
and further estimated the probability that each candidate would win the election. Another 257 participants made the same
judgements in retrospect 2 days after the election. In other words, Synodinos used a between-participants hypothetical design
in which some participants make foresight judgements concerning some event and others make the same judgements in
hindsight as if they had not known the outcome. As it turned out, there was no significant difference between the pre-election
and post-election samples for the estimated percentage of votes. However, the post-election participants were significantly

Requests for reprints should be sent to Dr Hartmut Blank, Institut für Allgemeine Psychologie, Universität Leipzig,
Seeburgstr. 14–20, D-04103 Leipzig, Germany. Email: blank@rz.uni-leipzig.de. The preparation of this article was supported
by Grant Er224/1-2 from the Deutsche Forschungsgemeinschaft (DFG). We are strongly indebted to Martin Beckenkamp of
the University of Saarbrücken for his generous support in conducting our second study. We also thank the participants
of the first and second Hindsight Bias workshops in Mannheim and Rauischholzhausen, sponsored by the DFG (SFB 504) and
the University of Giessen, respectively, for fruitful discussions of our studies. Study 2 was mainly inspired by the Mannheim
workshop. Finally, we are grateful to Mark Elliott, Oliver Hardt, and Stefan Schwarz for helpful comments on a previous version
of this article.

more confident in these estimates. Results were mixed for the judged probability of winning the election: Hindsight judgements
were significantly distorted towards the actual outcome for one of the three candidates but not for the others.
A study by Powell (1988) on the 1984 US presidential and Missouri gubernatorial elections found a very similar pattern of
results, using both a between-participants hypothetical design and a within-participant memory design. In the latter design, the
same participants provide foresight judgements of some fact or event and try to remember these judgements after learning
about the outcome.
Given such heterogeneity of results for different dependent measures, we cannot be so confident that the hindsight bias has
been convincingly demonstrated in political elections. One particular reason to be sceptical is the lack of bias for the
percentage of votes estimates in these studies, because these measures are very explicit, “hard” indicators of people’s actual
foresight knowledge and therefore bear most directly on the essence of the hindsight bias phenomenon (i.e., distortion of
foresight knowledge). Other studies employing such explicit measures of foresight knowledge do not alleviate our concerns
very much: While Leary (1982) and Fischer and Budescu (1995) found significant but small hindsight bias effects in the 1980
US presidential elections and the 1992 Knesset elections in Israel, respectively, Wendt (1993) found no effects at all in two
German Land parliament elections. Furthermore, since Leary (1982) used a hypothetical design, it is difficult to speak of an
actual distortion of foresight knowledge in this study because no such judgements had been made by the participants (at least
not in the experiment).
Thus, we felt that a convincing demonstration of memory hindsight bias in political elections was still desirable. In our
studies, we tried to create conditions that should be favourable to the emergence of hindsight distortions. This holds
particularly for the time interval between original and hindsight judgement: Longer retention intervals lead to weaker memory
traces and/or more forgetting (cf. Ebbinghaus, 1885). With memory for outcome information held constant, this should lead to
more hindsight bias; this follows from relative trace strength accounts of the hindsight bias (Hell et al., 1988) and from
research on retroactive interference in general (cf. Baddeley, 1976; Crowder, 1976) and is also supported by recent
multinomial (Erdfelder & Buchner, 1998) and computational (Pohl, Eisenhauer, & Hardt, 2003-this issue) models of
hindsight bias. In both of our studies, we used a retention interval of approximately 4 months, which was considerably longer
than any retention interval employed in previous studies. We were therefore quite confident of finding a substantial amount of
hindsight bias even when using both a “hard” measure of foresight knowledge and a memory design.
Besides a robust demonstration of memory hindsight bias in political elections, another focus of the present work was on
some more subjective facets of the participants’ hindsight experience and how these relate to the memory effects. This refers
to what we call foresight illusion and impression of necessity (these facets are featured in two terms coined by Fischhoff,
1975: knew-it-all-along effect and creeping determinism). While it is generally assumed that these terms, as well as the
hindsight bias in recollections of foresight estimates, describe basically the same phenomenon, we argue that they might in
fact constitute three more or less independent components of the hindsight experience, which may be subject to separate
influences. This issue is elaborated more deeply in the general discussion. In the present studies, we tried to capture these
additional facets by means of a qualitative analysis of participants’ self-reports on their hindsight experience (Study 1) and
two separate Likert-type items (Study 2).

STUDY 1:
THE GERMAN BUNDESTAG ELECTION IN 1998
On the occasion of the German Bundestag election in September of 1998, we started a broader investigation into our
participants’ foresight and hindsight perceptions of the election and its outcome. The results of this investigation are presented
in more detail elsewhere (Blank & Fischer, 2000), so that we will restrict our presentation here to the issues outlined above.

Method
Overview and political climate before the election. In the beginning of July 1998, when we asked our participants to make an
initial prediction of the election outcome, the opinion polls were in favour of the then-oppositional social democrat (SPD) and
green (GRÜNE) parties, mainly because the popular Gerhard Schröder had just been appointed chancellor-candidate of the
social democrats. Soon after, however, this climate gradually shifted back towards favouring the conservative-liberal (CDU/
CSU and FDP) government led by Helmut Kohl. Thus, 2 weeks before the Bundestag election, the race was open, and
therefore the actual outcome of the election on September 27, with its clear social democrat advantage over the conservatives,
was a surprise for most commentators. One month later, at the end of October, we asked our participants to recall the
predictions they had made at the beginning of July.
Participants. A total of 56 University of Leipzig psychology undergraduates took part in both data-collection sessions and
provided sufficient data for calculating a measure of hindsight bias. Participation was voluntary and anonymous (because

party affiliation and other political interest data were also registered). An anonymous coding system made it possible to match each participant’s foresight and hindsight responses.
Procedure. The first part of the study was conducted in a class setting and was announced (orally and in writing) as a study
in political psychology; hindsight bias was not mentioned. The questionnaire first asked the participants to predict the
percentage of votes the following parties or blocks would obtain in the Bundestag election: (1) CDU/CSU (conservative), (2)
SPD (social democrat), (3) FDP (liberal), (4) GRÜNE (green), (5) PDS (left-wing), (6) extreme right,1 and (7) others. Also,
the participants were asked to predict the winning party or coalition. Four months later, in the second part of the study, after
answering some other questions on political issues, the participants were asked to report the election outcome (they were not
reminded of the actual election outcome). They also indicated whether they were surprised by the outcome and, if yes, gave
reasons for their surprise. Finally, they were asked to recollect their earlier predictions concerning the election outcome,
purportedly to assess their ability to mentally reinstate the earlier political climate. After finishing the questionnaires, the
participants were thanked for their participation, and feedback on the results was announced to be given in a separate session.

Analysis. For each participant, we calculated an index of hindsight bias2 based on suggestions by Fischer and Budescu
(1995), namely, the shift of the recollected estimates towards the election outcome, averaged across parties. To illustrate,
assume that a participant has estimated 22% for party X, party X receives 25% of the vote, and the participant recalls that he/
she has predicted 24%. This means an absolute foresight distance (between original estimate and outcome) of three
percentage points and an absolute hindsight distance (between recollected estimate and outcome) of one percentage point.
Subtracting the hindsight distance from the foresight distance then yields a shift of the recollected estimate towards the
outcome of two percentage points for party X. These shifts are calculated for all parties and then averaged. This procedure is a
slight departure from Fischer and Budescu (1995), who added the shifts across the parties. Our modified procedure has the
advantage of making the index comparable across studies with different numbers of parties. To make this difference explicit,
we call our measure the modified Fischer-Budescu index. Note that the index was calculated on the basis of the actual election
results, not the results as recalled by the participants (which are themselves subject to distortions; see Blank & Fischer, 2000).
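As a computational restatement of the index just described, the following sketch (in Python; the function name, the dictionary-based data layout, and the example values are ours for illustration and are not the authors' analysis code) computes the modified Fischer-Budescu index for a single participant.

```python
# Minimal sketch of the modified Fischer-Budescu index for one participant.
# The function name and data layout are illustrative assumptions, not the authors' code.

def modified_fischer_budescu(foresight, hindsight, outcome):
    """Mean shift of the recollected estimates towards the outcome, across parties.

    foresight, hindsight, outcome: dicts mapping party -> percentage of votes.
    Parties lacking either a foresight or a hindsight estimate are skipped,
    mirroring the handling of missing estimates described for Study 2.
    """
    shifts = []
    for party, actual in outcome.items():
        if party in foresight and party in hindsight:
            foresight_distance = abs(foresight[party] - actual)
            hindsight_distance = abs(hindsight[party] - actual)
            # Positive values indicate a shift of the recollection towards the outcome.
            shifts.append(foresight_distance - hindsight_distance)
    return sum(shifts) / len(shifts)

# Worked example from the text: estimate 22%, outcome 25%, recollection 24%
# yields a shift of |22 - 25| - |24 - 25| = 2 percentage points.
print(modified_fischer_budescu({"X": 22.0}, {"X": 24.0}, {"X": 25.0}))  # 2.0
```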

Results and discussion


The means of our participants’ original and recollected estimates are given in Table 1 together with the election outcome. Simple
inspection of the table reveals a clear shift—designated delta delta (ΔΔ) in the last column, because it represents a difference
of (absolute) differences—of the hindsight recollections towards the actual election outcome for most parties already at this
descriptive level. This impression is supported by an analysis of the modified Fischer-Budescu index, M = 0.71, SE=.20, t
(55)=3.63, p=.001 (one-tailed), indicating that the average participant’s recollections had shifted from his/her original
predictions towards the election outcome by .71 percentage points (from an initial mean foresight distance of 2.97 percentage
points).3 Thus, the recalled estimates of the average participant were about one quarter closer to the actual election results than
the original estimates were.4 The conclusion from these results is straightforward: Employing a long retention interval
between prediction and recollection, we found strong evidence for hindsight bias in political elections in a memory design and
using the “hard” percentage of vote estimates as the target variable.
Nevertheless, there is one potential threat to this conclusion, due to the fact that we did not realise a control-group design
(as nobody else had in previous studies of the hindsight bias in political elections). In principle, therefore, our observed
hindsight effects might be caused by some other factor than outcome knowledge and might also show up in a control
condition without such knowledge. To be sure, it is hard to imagine a mechanism that would produce such a result, but here is an idea:5 Assume that (1) the participants make multiple predictions of the election outcome in different settings (private,
experimental etc.); (2) these predictions might be influenced by current survey results; (3) with time, the survey results will
approximate the actual outcome more closely and therefore, the participants’ predictions will be better too; (4) the participants
might confuse their experimental prediction with a more recent prediction. In this vein, our results might not demonstrate
hindsight bias but merely participants’ (correct) recollection of a more recent prediction. Note, however, that the argument
critically hinges upon an assumed improvement of the election forecasts. In fact, there was no such improvement during the
time between the first part of our study and the election (see, e.g., Infratest surveys of June 1998 to September 1998; Infratest

1Actually, this covered three parties of the extreme right. However, when planning our study, it was not yet clear whether each one of these
parties would take part in the election. Also, these parties are perceived as very similar, so we decided to treat them as one block.
2 We also calculated two other indices of hindsight bias, suggested by Hell et al. (1988) and Pohl (1993). The former is a measure of the relative shift of the recollected estimates towards the outcome, and the latter is a more general but also less intuitive variety of the modified Fischer-
Budescu index. These indices are reported in Blank and Fischer (2000). Therefore, and because they largely coincide with the present
analysis on the basis of the modified Fischer-Budescu index, they are not repeated here.

dimap, 2001), so that the above account seems unlikely for the present study. But even then, without a control condition, there
is always some residual risk of confounding influences which contribute to the results. Therefore, we decided to replicate our
study with a control-group design (see Study 2).
Another type of result of the present study concerns the participants’ subjective hindsight experience, as revealed by their
reasons for being surprised (N=19) or, more importantly, not being surprised (N=37) by the election outcome, which may be
indicative of hindsight distortions. The written comments of these latter 37 participants were content-analysed by the first
author (without knowledge about these participants’ memory hindsight status). This resulted in two substantial categories,
each including two subcategories, into which the comments of 28 participants could be classified; only 9 individuals provided
too little or unclassifiable information for a categorisation. An independent rater reclassified, after a brief explanation of the
category headings, 83% of the participants’ comments identically. Multiple categorisations were

TABLE 1 Study 1: Bundestag election


Party Foresight (F) Outcome (O) Hindsight (H) |F–O| |H–O| ΔΔ
CDU/CSU 31.5 35.1 34.3 3.6 0.8 2.8
SPD 37.3 40.9 39.0 3.6 1.9 1.7
FDP 5.7 6.2 5.2 0.5 1.0 −0.5
GRÜNE 7.7 6.7 8.1 1.0 1.4 −0.4
PDS 8.1 5.1 6.4 3.0 1.3 1.7
Extreme right 5.8 3.3 3.6 2.5 0.3 2.2
Other 3.9 2.7 3.6 1.2 0.9 0.4
Mean 2.2 1.1 1.1
Mean percentage of votes for the various parties in foresight and hindsight, together with the actual election outcome.
ΔΔ = difference of absolute distances |F–O| – |H–O|.

rare: Only 6 participants featured in two categories and 2 in three categories, whereas the remaining 20 were assigned to just
one category. The categories were: (1) Comments on the foreseeability of the election outcome. (1a) Internal foreseeability:
Twelve individuals indicated that they themselves had foreseen the outcome (e.g., “I knew that there would be a change of
government”). (1b) External foreseeability: Another ten individuals maintained that the outcome had been predicted by
opinion polls. (2) Substantial reasons for the election outcome. (2a) Political causation: Ten participants mentioned a growing
discontent and need for change in the population. (2b) Historical or normative necessity: Six individuals’ comments conveyed
the impression of a necessary historical development (e.g., “It had to turn out that way”) and also contained a kind of
justification for or satisfaction with the course of events (e.g., “The government was due to change, the CDU was in power
too long”); such comments stemmed mainly from left-oriented participants.
In sum, the hindsight experience of the participants (who were not surprised by the election outcome) was dominated by
two characteristics, foreseeability and perceived necessity of the outcome, which are captured quite well by the expressions
knew-it-all-along effect and creeping determinism (Fischhoff, 1975, 1977). Interestingly, there was no obvious relation
between these phenomenal characteristics and the magnitude of memory hindsight bias. Of course, this might be due to the
incidental nature of the observations. This issue will be taken up in Study 2 and in the general discussion.

STUDY 2:
THE NORDRHEIN-WESTFALEN LAND PARLIAMENT ELECTION IN MAY 2000
The main purpose of Study 2 was to assess the hindsight bias in political elections with a control-group design. Previous
studies may have resorted to the traditional before-after design because a control-group design runs into obvious difficulties;
above all, it is neither possible nor ethically desirable to run a control condition wherein the participants are isolated from the

3 Obviously, although based on the same data set, the analyses in terms of ΔΔ and the modified Fischer-Budescu index lead to somewhat
differing results (mean shifts of 1.1 and .71 percentage points, respectively). This is because, in the calculation of these values, absolute
distances enter the formula at different levels (individual or group); however, the mean of absolute distances is not equivalent to the
absolute distance of means.
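As a hypothetical illustration of this point (numbers chosen for simplicity, not taken from the data): if one participant's estimate lies 3 percentage points above the outcome and another's lies 1 point below it, the mean of the individual absolute distances is (3 + 1)/2 = 2 points, whereas the mean estimate deviates from the outcome by only |3 − 1|/2 = 1 point.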
4 The magnitude of the shift towards the election outcome depends somewhat on the index used to measure it. The values reported here are conservative, since the other indices yield larger shifts (see Blank & Fischer, 2000). In terms of conventional effect sizes, the shifts
indicated by the various indices represent medium to large effects (Cohen’s d ranging from about .5 to .7).
5This idea was raised by an anonymous reviewer of the Blank and Fischer (2000) article.

election outcome until some time after the election. However, long retention intervals between prediction and recollection
offer another possibility: If control and experimental participants recall their predictions shortly before and after the election,
respectively, the difference in retention interval between these groups is small compared to the entire interval, and therefore
the essential difference between them is the one of interest, that is, learning about the election outcome or not.
This design also made it possible to simultaneously investigate hindsight bias in a between-participants hypothetical
design. In addition to recalling their 3-month-old predictions, control participants made new predictions for the election
outcome, whereas experimental participants were asked what they would have predicted immediately before the election had
they not known the outcome. Finally, extending the observations from Study 1, the subjective hindsight experience was
assessed in a more systematic way: All participants were asked two questions concerning the foreseeability and perceived
necessity of the election outcome.

Method
Overview and political climate before the election. In the beginning of February 2000, all participants predicted the election
outcome. At that time, the surveys indicated a small but not insurmountable advantage of the governing SPD/GRÜNE coalition over the oppositional CDU/FDP camp. Soon after, however, the prospects for the opposition dropped markedly as a result
of a financial scandal, but rose again when the popular Angela Merkel was elected chairperson of the nationwide CDU in late
March. Immediately prior to the election, then, the political situation as indicated by the opinion polls closely resembled the
climate in which the first part of the study was conducted. For the second part of the study, the participants were randomly
divided into a control and an experimental group, which were run in the week before and after, respectively, the election of
May 14, which saw a close victory of the governing coalition.
Participants. A total of 89 psychology undergraduates participated in part one of the study, 44 from the University of Bonn
(Nordrhein-Westfalen) and 45 from the University of Saarbrücken (Saarland). From the initial sample 67 individuals also
participated in the second part of the study, 3 of whom were dropped from the analyses because of missing data, so that 64
participants were included in the analyses, 30 and 34 from the Bonn and Saarbrücken subsamples, respectively. As it turned
out, precisely half of these respondents from each subsample had previously been assigned to the control condition and the other half to the experimental condition. Thus, there were 32 control participants and 32 experimental participants altogether. There
were 18 males and 46 females in the final sample, with a median age of 22.5 years. All participants in both sessions received
course credit. Also, the participants were given collective feedback on the study in the classes from which they were
recruited.
Procedure. The procedure in the first part of the study was basically identical to Study 1. Under a similar cover story, all
participants were asked to predict the election outcome as in Study 1. Again, anonymity of the data was guaranteed via a
coding system. They were also asked to write their address on a separate envelope because they would be contacted by mail
for the second part of the study. Only then were the participants of each subsample randomly assigned to the control and
experimental conditions. Control participants were mailed the second questionnaire 1 week before the election. The
instructions and cover story from the first session were basically repeated, and the participants were firmly requested to return
the completed questionnaire by Friday before the election. The questionnaire began with the recall of the predictions made 3
months ago. Two further questions aimed at the participants’ hindsight experience: They were asked to judge the accuracy of
their initial prediction of the election outcome, on a 7-point response scale with the end-points “very accurate” (1) and
“completely wrong” (7). The second question first explained that sometimes political or historical events might simply be
chance outcomes, and in other cases more deterministic; then it asked for a corresponding judgement of the political-historical
necessity of the election outcome, on a 7-point response scale with end-points “necessity” (1) and “chance” (7). Some political
interest and knowledge questions followed, and finally the participants were asked to make another prediction of the election
outcome. Experimental participants were mailed a similar questionnaire the day before the election, so that they would not
receive it before the Monday after the election (the election was held on a Sunday, and there is no mail delivery on Sundays in
Germany). The questionnaire was to be returned by Friday of the same week.
The control and experimental questionnaires crucially differed with respect to the instructions for the new predictions: For
experimental participants, we professed in an additional passage that we had mailed the questionnaire very late, because we were
interested in the “last impression” before the election. Should they therefore have received the questionnaire only after the
election, they should answer these questions anyway and fill in what would have been their best predictions immediately
before the election. To heighten the credibility of this story, and also as a manipulation check, the participants were asked to
report the questionnaire’s day of arrival (none of these dates was critical). Another minor difference was that the two
subjective experience questions referred to the likely election outcome in the control group and to the final outcome in the
experimental group. All other questions were exactly the same in both questionnaires.
Upon returning for their course credit after termination of the study, all participants were given a postexperimental question
sheet to be answered anonymously (again using the coding system). This sheet included three questions tapping the perceived

purpose of the study, communication with other participants, and awareness of differences between the control and experimental
groups. In particular, we wanted to know whether such knowledge had affected the participants’ responses, and how. Without going into
the details, an informal analysis of the participants’ answers to these questions gave us no reasons to mistrust their responses;
thus, we went on with our data analysis.

Results and discussion


Preliminary analyses revealed that deviations from 100% in the sum of percentage estimates across parties had no discernible
effect on the pattern of results. Therefore, all 64 participants who had provided both memory and hypothetical hindsight bias
data were retained in the analyses. In rare cases, participants had provided no estimates for one or more parties. In such cases,
the memory hindsight bias index (the same as in Study 1) for these individuals was calculated by averaging the shifts for the
remaining parties. As in Study 1, all tests referring to hindsight bias effects (in the memory or in the hypothetical design) were
directional with alpha set at .05. The analyses referring to the subjective hindsight experience of the participants were more
exploratory in nature. Therefore, we conducted two-tailed tests, again with alpha=.05.
Memory hindsight bias. Table 2 first presents descriptive data for the control and experimental groups. Obviously, there are
clear ΔΔ shifts in the participants’ recollections towards the election outcome in the experimental but not in the control group.
Again, this impression is supported by the modified Fischer-Budescu index which is significantly greater than zero in the
experimental group, M=1.48, SE=.28, t(31)=5.22, p < .001.6 In the control group, this index did not reach significance (M=0.24, SE=.33, t(31)=0.73, p=.24). Also, the modified Fischer-Budescu index was significantly greater in the experimental than
in the control group, t(62)=2.85, p=.003. Thus, the conclusion from Study 1 is fully reinforced by the results of our second
study: There was a large memory hindsight bias in the experimental condition; also, on the descriptive level, the absolute
magnitude of this shift towards the election outcome (an average ΔΔ of 1.1 percentage points in Table 2) was the same as in Study
1. Most importantly, however, there was virtually no corresponding shift in the control group, reassuring us that the effects
observed in the experimental group and also in Study 1 were indeed a consequence of outcome knowledge.

Hypothetical hindsight bias. Memory hindsight bias indices are not suitable for the analysis of hypothetical hindsight bias.
In the latter, the question of interest is whether the hypothetical hindsight estimates of experimental participants are closer to
the actual election outcome than the foresight estimates of control participants. Thus, the natural procedure is to calculate the
mean absolute deviation (per individual across parties) of the foresight or hindsight estimates from the election outcome and
see whether these indices are lower in the experimental group. Indeed, there was a trend in this direction, but it did not reach
significance; Mexp=2.56 and Mcon=2.96 percentage points, t(62)=1.04, p=.15. Further, as should be expected, individuals who
showed more memory hindsight bias also tended to show more hypothetical hindsight bias, as evidenced by the negative
partial correlation (across control and experimental groups, but controlling for the influence of the experimental manipulation
itself) between our hypothetical hindsight bias measure and the modified Fischer-Budescu index (p=.01). In sum,
however, the main conclusion from these analyses is that the evidence for a hypothetical hindsight bias in Study 2 was rather
weak. This is surprising at first sight, because, if anything, the hypothetical hindsight bias tends to be larger than the memory
hindsight bias under comparable circumstances. Probably, however, these circumstances were too different in our study to yield
meaningful comparisons; in particular, the time interval between foresight and hindsight estimates was much longer in the
memory branch of our study.
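A compact restatement of the hypothetical hindsight bias measure used above may be helpful; the following sketch (Python, with names and data layout of our own choosing, not the authors' code) computes the mean absolute deviation of one individual's estimates from the election outcome, the quantity that is then compared between the experimental and control groups.

```python
# Sketch of the hypothetical hindsight bias measure: the mean absolute deviation of an
# individual's percentage estimates from the election outcome, averaged across parties.
# Lower values in the experimental (outcome-knowledge) group than in the control group
# would indicate hypothetical hindsight bias. Names and data layout are illustrative.

def mean_absolute_deviation(estimates, outcome):
    """estimates, outcome: dicts mapping party -> percentage of votes."""
    deviations = [abs(estimates[p] - outcome[p]) for p in outcome if p in estimates]
    return sum(deviations) / len(deviations)

# Hypothetical usage: one value per participant, then compare group means
# (the article reports M_exp = 2.56 and M_con = 2.96 percentage points).
example_outcome = {"CDU": 37.0, "SPD": 42.8}
example_estimates = {"CDU": 33.0, "SPD": 41.0}
print(mean_absolute_deviation(example_estimates, example_outcome))  # approximately 2.9
```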

TABLE 2 Study 2: Nordrhein-Westfalen election


Party Foresight (F) Outcome (O) Hindsight (H) |F−O| |H−O| ΔΔ
Experimental group
CDU 30.5 37.0 34.0 6.5 3.0 3.5
SPD 42.3 42.8 41.8 .5 1.0 −.5
FDP 6.4 9.8 6.5 3.4 3.3 .0
GRÜNE 9.8 7.1 8.5 2.7 1.4 1.3
PDS 4.4 1.1 3.9 3.3 2.8 .4
Right 3.6 1.2 2.1 2.4 .9 1.4

6 The initial average distance of the experimental participants’ predictions from the election outcome was 4.18 percentage points in the
experimental group (3.42 percentage points in the control group). Thus, the experimental participants’ recollections were about one third
closer to the election outcome (1.48/4.18), compared to these initial values. Other indices of hindsight bias (cf. Blank & Fischer, 2000)
yielded slightly larger values. The statistical effect sizes are large (Cohen’s d between 0.9 and 1.1 for the various indices).

Party Foresight (F) Outcome (O) Hindsight (H) |F−O| |H−O| ΔΔ


Other 3.9 1.1 2.5 2.9 1.5 1.4
Mean 3.1 2.0 1.1
Control group
CDU 33.0 37.0 33.7 4.0 3.3 .7
SPD 42.2 42.8 41.2 .6 1.6 −1.0
FDP 6.2 9.8 5.9 3.6 3.9 −.3
GRÜNE 9.5 7.1 9.5 2.4 2.4 .0
PDS 3.4 1.1 4.2 2.3 3.1 −.8
Right 2.8 1.2 2.4 1.6 1.2 .4
Other 3.7 1.1 4.0 2.7 3.0 −.2
Mean 2.5 2.6 −.2
Mean percentage of votes for the various parties in foresight and hindsight, together with the actual election outcome.
ΔΔ = difference of absolute distances |F–O| – |H–O|. Data are averaged across two subsamples (Bonn and Saarbrücken).

Subjective hindsight experience. The participants’ responses on the two subjective experience items, which assessed their subjective foresight accuracy and the perceived necessity of the election outcome, clustered around the midpoint of the scales, with essentially no difference between control and experimental participants (all ts < 1). Also, these
subjective measures were not significantly related—across control and experimental groups, calculating partial correlations as
above in order to control for the experimental-control difference itself—to the memory hindsight bias index (absolute
magnitude of the partial rs < .2). However, they correlated significantly with each other (partial r=.31, p= .014), indicating
that individuals who perceived the outcome to be more deterministic also judged their predictions to be more accurate.
Further, there was a marginally significant partial correlation between judged accuracy of the initial prediction and our
measure of hypothetical hindsight bias (r=.22, p=.08). That is, individuals who— for whatever reasons, factual knowledge or
mere self-enhancement (cf. Campbell & Tesser, 1983)—made better estimations of the election outcome also judged their
earlier predictions to be more accurate. In sum, no clear picture emerged with respect to the subjective hindsight experience of
individuals. This may be due to the low reliability of single-item measures but may also indicate dissociations in the
underlying phenomena. We will return to this issue in the general discussion.

A COMPARISON OF OUR RESULTS WITH THOSE OF PREVIOUS STUDIES


The strong memory hindsight biases found in both of our studies stand out against the rather weak and inconsistent effects
found in some previous studies of memory hindsight bias, as outlined in the introduction. We supposed that the length of the
retention interval between foresight and hindsight judgements is the crucial factor for the emergence of memory hindsight
bias. Our results fit nicely into this line of reasoning. However, we felt that a more systematic analysis of this relationship
might yield more convincing evidence for our point. To this end, we compared our results with those of earlier studies of
memory hindsight bias, using a common measure.

Method
We could find three other studies of memory hindsight bias in political elections that used the percentage of votes or an
equivalent measure as the dependent variable. Fischer and Budescu (1995) had their participants estimate, 3 weeks before the
1992 election in Israel, the number of Knesset seats won by various parties or blocks and recall these predictions 3 weeks
after the election. Powell (1988) asked his participants to estimate the percentage of votes (among other measures) for the
candidates in the 1984 US presidential and Missouri gubernatorial as well as lieutenant-gubernatorial elections, 1 day before
and 6 days after the elections; only data for the winning candidates are reported. Finally, Wendt (1993) assessed pre- and
postdictions for the Land parliament elections in Schleswig-Holstein, Germany, in 1988 and in 1992; the predictions were
made half a week before the elections and recalled another half week thereafter. Participants in all three studies were students.
In order to find a common measure of the memory hindsight bias effect for different studies, we made use of the descriptive
shifts ΔΔ already mentioned in the results sections of Studies 1 and 2 (see Tables 1 and 2).7 They denote the difference between the absolute foresight and hindsight distances of the mean estimates from the election outcome (|F–O| and |H–O|, respectively) for a given party. These shifts are then averaged across parties (or candidates), making our effect measure comparable across election
studies with different numbers of parties (or candidates).
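In symbols, the cross-study effect measure amounts to the following (our restatement of the verbal definition above, using the notation of Tables 1 and 2):

$$\overline{\Delta\Delta} \;=\; \frac{1}{K}\sum_{k=1}^{K}\Bigl(\bigl|\bar{F}_k - O_k\bigr| - \bigl|\bar{H}_k - O_k\bigr|\Bigr),$$

where K is the number of parties (or candidates), F̄_k and H̄_k are the group mean foresight and hindsight estimates for party k, and O_k is its actual percentage of votes.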

Results and discussion


Table 3 shows that the presumed relation between the length of the retention interval and the magnitude of memory hindsight
bias does indeed hold across studies. There is essentially no effect in studies with short (i.e., 1 week) intervals, but a sizeable
effect (more than one percentage point towards the outcome) in our two studies with retention intervals of several months; the
Fischer and Budescu (1995) study falls somewhere in between these extremes. However, one might argue that larger shifts
with longer retention intervals are merely a consequence of a larger initial uncertainty of the estimates. That is, at least in the
studies considered here, longer retention intervals also meant a larger interval between foresight estimate and the election. It is
conceivable that such early predictions deviate more from the actual outcome than predictions closer to the election and,
consequently, there is more “room” for hindsight shifts to occur. We checked this alternative account by calculating the size
of the hindsight shift relative to the initial deviation from the election result. These values are not given in the table because
they are highly correlated with the absolute shifts, allowing us to conclude that the increase of the hindsight effect with the
retention interval is not an artifact of the accuracy of the original estimates (or, put another way, of the particular effect
measure employed).

GENERAL DISCUSSION
The results of our studies leave no doubt about the existence of hindsight bias in political elections, even in a memory design
and using a “hard” indicator of foresight knowledge such as the estimated percentage of votes. Probably, earlier studies that did not
find hindsight bias (particularly Wendt, 1993) employed retention intervals that were too short for detecting the effect. At first
sight, one might wonder why 1 week should be too short, given that sizeable hindsight bias is often found in laboratory
studies with knowledge questions or fictitious events, using retention intervals as short as 1 hour (cf. Christensen-Szalanski &
Willham, 1991). However, in political election studies, the situation of the participants is quite different. Unlike the relatively
trivial judgements in laboratory studies (e.g., “How long is the Danube river?”), the election prospects of political parties are
personally important for many people, and even the less politically interested can hardly escape information about survey
results and the like, distributed by the media and elsewhere in public. Therefore, the participants’ predictions are hardly the half-guessed,
spontaneous estimates of many laboratory studies, but rather well-considered estimates based on elaborate knowledge
structures.

TABLE 3 Comparison of our results with those of earlier studies


Study                        Election                    Effect^a    N    F-H Interval^b
Powell (1988)^c              USA/Missouri 1984             .05      42    1 week
Wendt (1993)^d               Schleswig-Holstein 1988       .10      41    1 week
Wendt (1993)                 Schleswig-Holstein 1992      –.05      52    1 week
Fischer & Budescu (1995)^e   Israel 1992                   ≈.3      90    6 weeks
Our Study 2^f                Nordrhein-Westfalen 2000     1.07      32    3.5 months
Our Study 1                  Germany 1998                 1.12      56    4 months
^a Average ΔΔ across parties or candidates.
^b Time interval between foresight (F) and hindsight (H) judgements.
^c Does not coincide with the author’s own interpretation: according to Powell, the increase of the estimated percentage of votes for Reagan after his election, from 64% to 66%, indicates hindsight bias; however, the actual outcome was 59%, so this shift is away from the outcome and therefore in the negative direction (on the ΔΔ measure, |64 - 59| - |66 - 59| = -2 percentage points for this election). Averaging the participants’ judgements across all three elections investigated by Powell then yields the value of .05.
^d Averaged across two subgroups.
^e Effect extrapolated from the mean of the individual Fischer-Budescu indices, on the basis of the corresponding relationships in our own studies.
^f Only experimental participants.

Consequently, these estimates are deeply processed (cf. Craik & Lockhart, 1972), resulting in stronger memory traces, which are more resistant to forgetting than
those from superficial estimates in laboratory studies.

7 It was not possible to calculate conventional measures of effect size such as d (Cohen, 1988) because the necessary data were not reported in all of the studies. Moreover, such measures would be suboptimal in this case because there is already a common natural measure of effect size, namely the hindsight shift on the percentage-of-votes scale. This measure can be calculated from the foresight and hindsight group means and the election outcomes reported in the studies.

Another factor that makes memory traces in political election studies more enduring is the list-length effect (e.g., Roberts,
1972), which says that a given item is more likely to be forgotten the more items the list contains. In political elections, there
are seldom more than a handful of relevant parties or candidates, whereas laboratory studies sometimes used 100 knowledge
questions (e.g., Hell et al., 1988). For all these reasons, it will take some time before memory for the predicted election results
is weak enough to be influenced by outcome knowledge. Whether such influence is mediated by anchor effects, retrieval
failure, or other mechanisms is a different question and one that is beyond the scope of the present article.
We are particularly confident in our conclusion about memory hindsight bias in political elections because we have—for
the first time with respect to this topic—realised a control-group design in our second study. As it turned out, the results in the
experimental condition of this study largely paralleled those of our first study, which employed a traditional before-after
design, whereas there was essentially no effect in the control condition. This, in turn, makes it likely that—all other things
being equal—the hindsight bias in our first study was a pure effect of outcome knowledge, with no other contributing factors,
and it also raises confidence in the interpretation of the results of earlier political election studies as true hindsight bias effects
(disregarding the sometimes questionable magnitude and consistency of these effects for the moment). However, the ceteris
paribus clause is important: As outlined in the discussion of Study 1, a shift of the recollections towards the actual outcome might
also be expected if the participants recollect more recent predictions that are influenced by survey results of increasing
accuracy. During both of our studies, there was no such increase in the accuracy of the polls. Therefore, it would be premature
to generalise the results of our studies to situations where the accuracy of the survey results does increase, and studies lacking
control groups might produce severely misleading results under such circumstances. In any case, further studies of the
hindsight bias in political elections should keep this caveat in mind and have an eye on trends in the survey results and/or
employ suitable controls.
Our second study also allowed us to simultaneously investigate memory and hypothetical hindsight bias. Although both
effects were significantly correlated, the hypothetical effect was much weaker. This may be due to the different time horizon
of the effects: Whereas in the memory design, participants had to recall a judgement they had made three and a half months
earlier, in the hypothetical design they were to indicate what they would have said immediately before the election, which was,
for most participants, two or three days earlier. It is also conceivable that some of the participants had, in private settings,
actually made such judgements, which would effectively have turned the hypothetical design into a memory design for these
individuals. In any case, it is likely that the immediate preelection perspective was still much more salient for the participants
than their factual estimates from the first part of the study, and therefore hard to ignore. Under these circumstances, a large
amount of hypothetical hindsight bias would be unlikely. In general, it seems that the time intervals between prediction
(actual or projected), outcome and hindsight judgement are also important in hypothetical designs. It might be worthwhile to
systematically explore these temporal relationships and their effect on hindsight bias. In particular, hypothetical hindsight
judgements projected into the immediate past should be less influenced by the outcome because last week’s state of mind
should still be relatively easy for the individual to reconstruct. In contrast, when reconstructing one’s point of view from
several months ago, the person might be more likely to rely on metatheoretical assumptions (cf. Ross, 1989; Stahlberg &
Maass, 1998) and/or inferential strategies (cf. Werth, 1998).
This logic, however, may be limited to events that unfold in time and may not apply to matters of factual knowledge where
one should assume that, say, a state of ignorance with respect to a certain fact could not have changed very much over time. On
the other hand, people do have implicit theories about the development of their mental abilities across the life-span (cf. Ross,
1989), and such theories might also affect hindsight judgements, at least across a longer time distance (“Would I have known
that when I was younger?”).

The subjective hindsight experience


As Tulving complained in 1983, the scientific analysis of memory retrieval had concentrated almost exclusively on memory
performance, at the expense of the person’s recollective experience. This has changed somewhat in the years since then (see,
e.g., research on aspects of recollective experience like the feeling of knowing, feelings of familiarity, or the remember-know
distinction; Gardiner & Java, 1993; Jacoby, Kelley & Dywan, 1989; Koriat, 1993). Research on hindsight bias, in a sense, has
proceeded in the opposite direction. In Fischhoff’s (1975) original conception, hindsight bias was characterised by the
phenomenal experience of individuals, that is, their tendency to perceive an inner necessity in the course of events that led to
the final outcome, accompanied by the belief that the outcome was foreseeable and had in fact been foreseen by them. Later
research, ironically initiated by Fischhoff (1977) himself, concentrated on the ability of individuals to remember their
foresight judgements, that is, on memory performance. Accordingly, theory construction in this field was soon inspired by
theories of memory, particularly by theoretical explanations of the eyewitness misinformation effect (e.g., Bekerian &
Bowers, 1983; Loftus, 1979, 1991; McCloskey & Zaragoza, 1985). Despite this extensive focus on the memory dimension,
researchers still refer to the phenomenal experience of necessity and foreseeability when characterising hindsight bias in the
introductory sections of their papers. However, there is surprisingly little research actually directed at these subjective qualities
of hindsight bias.
The analysis of our participants’ self-reported reasons for not being surprised by the election outcome (in Study 1) and their
responses to the accuracy-of-foresight and perceived-necessity items (in Study 2) was a first attempt to tap these issues. As it
turned out, foreseeability and perceived necessity indeed figured prominently in the participants’ comments. However, in both
studies, these impressions were unrelated to the memory dimension, although in Study 2 they were related to each other and
also somewhat to the magnitude of hypothetical hindsight bias. While these generally weak interrelations may be a
consequence of the incidental character of the observations in Study 1 and the questionable reliability of single items in Study
2, another possible explanation of this pattern of results may be worth contemplating. The lack of correspondence between
memory performance and subjective hindsight experience may indicate that the latter is not a mere by-product or
epiphenomenon of the former. Rather, what we generally refer to as the hindsight bias may actually be an interrelated
complex of three subphenomena: memory distortion, illusion of foresight, and impression of necessity. These components will
often be correlated but are to a certain degree independent of each other, for logical reasons and because they may be
influenced by different variables.
For instance, while perceiving some necessity in the course of an event is a logical precondition for feeling able to foresee
its outcome, it is by no means a sufficient condition. It is entirely possible to believe, for example, that all things in the world
happen according to a divine plan and at the same time believe that one would never be able to foresee this plan and have
insight into the course of events. Also, it is logically possible to have exaggerated perceptions of the (general) foreseeability
and necessity of an event while at the same time exhibiting no distorted recollection of one’s foresight judgement; this can
trivially be the case if one has in fact correctly predicted the outcome.
The logical possibility of dissociations between hindsight bias subphenomena also suggests that different influences may be
operative with respect to the different components. For instance, the degree of memory hindsight bias will be most sensitive to
influences known from memory research in general and interference phenomena in particular, for instance, the relative
strengths of memory traces (Hell et al., 1988), which themselves depend on factors such as time (see our studies) or repetition,
retrieval dynamics (Pohl et al., 2003-this issue) and also anchoring effects (Erdfelder & Buchner, 1998; Hardt & Pohl, 2003-
this issue; Stahlberg & Maass, 1998). Foresight illusion, on the other hand, may be strengthened by self-enhancing
metacognitive beliefs regarding the accuracy of one’s predictions (cf. Campbell & Tesser, 1983; Schwarz & Stahlberg, 2003-
this issue; Stahlberg & Maass, 1998; Werth, 1998) but may also be affected by the valence of and personal responsibility for
the outcome (see Mark, Boburka, Eyssell, Cohen, & Mellor, 2003-this issue; Renner, 2003-this issue; Stahlberg, Sczesny, &
Schwarz, 1999; Stahlberg & Schwarz, 1999). Finally, the impression of a necessary unfolding of events towards the outcome
may be related to personality variables like tolerance of ambiguity, rigidity, or controllability (cf. Campbell & Tesser, 1983;
Musch, 2003-this issue; Tykocinski, 2001); it may also be influenced by political affiliation, as Tetlock (1999) has shown in
political experts’ hindsight evaluations of political-historical events.
The possibility of having different components of the hindsight experience may also account for apparently contradictory
effects of hindsight-relevant variables. For example, while some researchers (e.g., Mark et al., 2003-this issue) have provided
evidence that self-esteem-threatening outcomes (in this case, negative decision outcomes) prevent hindsight bias, there is other
evidence showing precisely the opposite effect: As Tykocinski (2001) demonstrated, voters showed increased hindsight bias
after their candidate had lost in an election. Upon closer examination, it turns out that Mark et al. had measured the
foreseeability (in hindsight) of the outcome, whereas Tykocinski had assessed its subjective probability (in hindsight). The
latter can be regarded as an operationalisation of the impression of necessity component, and in fact, perceiving a negative
outcome as inevitable should render it more palatable (“There was nothing that could be done”). In contrast, perceiving a
negative outcome as unforeseeable should serve to deny responsibility for it (“I couldn’t have known it”) and so make it less
harmful to one’s self-esteem. Thus, a self-esteem maintenance motivation might produce opposite hindsight effects depending
on which hindsight component is involved.
At present, such considerations are mostly speculative, because we do not have many data that bear directly on the
subjective hindsight experience of individuals, and/or the available research has not been systematically analysed with respect
to the proposed distinction. However, if there is some truth to this distinction, it might have important consequences for research on
hindsight bias. For example, it should alert us that different research designs may be differentially sensitive to the different
hindsight components. Of course, memory designs using almanac questions are ideally suited for the assessment of the
memory component. An assessment of the other components, however, particularly of the impression of necessity
component, requires the use of (real or simulated) events. A hypothetical design may be something in between, sensitive
primarily to foresight illusions (remember that foresight accuracy judgements and hypothetical hindsight bias were weakly
correlated in Study 2). In this vein, Musch (2003-this issue) found that self-enhancement strivings of individuals (which
might be associated with foreseeability illusions) were correlated with hypothetical but not with memory hindsight bias. In
any case, these are directions for further research that might be worth exploring.

REFERENCES

Arkes, H.R., Faust, D., Guilmette, T.J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305–307.
Baddeley, A.D. (1976). The psychology of memory. New York: Basic Books.
Bekerian, D.A., & Bowers, J.M. (1983). Eyewitness testimony: Were we misled? Journal of Experimental Psychology: Learning, Memory,
and Cognition, 9, 139–145.
Blank, H., & Fischer, V. (2000). “Es mußte eigentlich so kommen”: Rückschaufehler bei der Bundestagswahl 1998 [“It had to turn out that
way”: Hindsight bias in the German parliamentary elections in 1998]. Zeitschrift für Sozialpsychologie, 31, 128–142.
Campbell, J.D., & Tesser, A. (1983). Motivational interpretations of hindsight bias: An individual difference analysis. Journal of
Personality, 52, 605–620.
Casper, J.D., Benedict, K., & Perry, J.L. (1989). Juror decision making, attitudes, and the hindsight bias. Law and Human Behavior, 13,
291–310.
Christensen-Szalanski, J.J.J., & Willham, C.F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision
Processes, 48, 147–168.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal
Behavior, 11, 671–684.
Crowder, R.G. (1976). Principles of learning and memory. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Dehn, D.M., & Erdfelder, E. (1998). What kind of bias is hindsight bias? Psychological Research, 61, 135–146.
Detmer, D.E., Fryback, D.G., & Gassner, K. (1978). Heuristics and biases in medical decision-making. Journal of Medical Education, 53,
682–683.
Ebbinghaus, H. (1885). Über das Gedächtnis [On memory]. Leipzig: Duncker & Humblot.
Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and
reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.
Fischer, I., & Budescu, D.V. (1995). Desirability and hindsight biases in predicting results of a multi-party election. In J.-P.Caverni, M.Bar-
Hillel, F.H.Baron, & H.Jungermann (Eds.), Contributions to decision making (Vol. 1, pp. 193–211). Amsterdam: Elsevier.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental
Psychology: Human Perception and Performance, 1, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3,
349–358.
Gardiner, J.M., & Java, R.I. (1993). Recognising and remembering. In A.F.Collins, S.E.Gathercole, M. A.Conway, & P.E.Morris (Eds.),
Theories of memory (pp. 163–188). Hove, UK: Lawrence Erlbaum Associates Ltd.
Hardt, O., & Pohl, R. (2003). Hindsight bias as a function of anchor distance and anchor plausibility. Memory, 11, 379–394.
Hawkins, S.A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107,
311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An interaction of automatic and motivational factors?
Memory & Cognition, 16, 533–538.
Infratest dimap. (2001). DeutschlandTREND [April 1998-]. http://www.infratest.de/indi/politik/deutschlandtrend/
Jacoby, L.L., Kelley, C.M., & Dywan, J. (1989). Memory attributions. In H.L.Roediger & F.I.M. Craik (Eds.), Varieties of memory and
consciousness: Essays in honour of Endel Tulving (pp. 391– 422). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Koriat, A. (1993). How do we know that we know? The accessibility model of feeling of knowing. Psychological Review, 100, 609–639.
Leary, M.R. (1982). Hindsight distortion and the 1980 presidential election. Personality and Social Psychology Bulletin, 8, 257–263.
Loftus, E.F. (1979). Eyewitness testimony. Cambridge, MA: Harvard University Press.
Loftus, E.F. (1991). Made in memory: Distortions in recollection after misleading information. In G.H. Bower (Ed.), The Psychology of
Learning and Motivation, 25 (pp. 187–215). New York: Academic Press.
Mark, M.M., Boburka, R.R., Eyssell, K.M., Cohen, L. L., & Mellor, S. (2003). “I couldn’t have seen it coming”: The impact of negative
self-relevant outcomes on retrospections about foreseeability. Memory, 11, 443–454.
McCloskey, M., & Zaragoza, M. (1985). Misleading postevent information and memory for events: Arguments and evidence against
memory impairment hypotheses. Journal of Experimental Psychology: General, 114, 1–16.
Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.
Pennington, D.C. (1981a). The British firemen’s strike of 1977/1978: An investigation of judgements in foresight and hindsight. British
Journal of Social Psychology, 20, 89–96.
Pennington, D.C. (1981b). Being wise after the event: An investigation of hindsight bias. Current Psychological Research, 1, 271–282.
Pohl, R.F. (1993). Der Rückschau-Fehler: Ein Modell zur Analyse und Erklärung systematisch verfälschter Erinnerungen [The hindsight
bias: A model for the analysis and explanation of distorted recollections]. Unpublished habilitation thesis, University of Trier,
Germany.
Pohl, R.F., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to simulate the anchoring effect and hindsight bias.
Memory, 11, 337–356.
Powell, J. (1988). A test of the knew-it-all-along effect in the 1984 presidential and statewide elections. Journal of Applied Social
Psychology, 18, 760–773.
Renner, B. (2003). Hindsight bias after receiving self-relevant health risk information: A motivational perspective. Memory, 11, 455–472.
Roberts, W.A. (1972). Free recall of word lists varying in length and rate of presentation: A test of total-time hypotheses. Journal of
Experimental Psychology, 92, 365–372.
Ross, M. (1989). Relation of implicit theories to the construction of personal histories. Psychological Review, 96, 341–357.
Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-cognitions. Memory, 11, 395–410.
Stahlberg, D., & Maass, A. (1998). Hindsight bias: Impaired memory or biased reconstruction? In W. Stroebe & M.Hewstone (Eds.), European
Review of Social Psychology (Vol. 8, pp. 105–132). Chichester, UK: Wiley & Sons.
Stahlberg, D., & Schwarz, S. (1999). Would I have known it all along if I would hate to know it? The hindsight bias in situations of high and
low self-esteem relevance. Working paper, SFB 504, University of Mannheim.
Stahlberg, D., Sczesny, S., & Schwarz, S. (1999). Exculpating victims and the reversal of hindsight bias. Working paper, SFB 504,
University of Mannheim.
Synodinos, N.E. (1986). Hindsight distortion: “I-knew-it-all-along and I was sure about it”. Journal of Applied Social Psychology, 16,
107–117.
Tetlock, P.E. (1999). Theory-driven reasoning about plausible pasts and probable futures in world politics: Are we prisoners of our
preconceptions? American Journal of Political Science, 43, 335–366.
Tulving, E. (1983). Elements of episodic memory. Oxford: Clarendon Press.
Tykocinski, O.E. (2001). I never had a chance: Using hindsight tactics to mitigate disappointments. Personality and Social Psychology
Bulletin, 27, 376–382.
Wasserman, D., Lempert, R.O., & Hastie, R. (1991). Hindsight and causality. Personality and Social Psychology Bulletin, 17, 30–35.
Wendt, D. (1993). Kein Hindsight Bias (“Knew-It-All-Along-Effekt”) bei den Landtagswahlen in Schleswig-Holstein 1988 und 1992 [No
hindsight bias (“knew-it-all-along effect”) in the Schleswig-Holstein Land parliament elections in 1988 and 1992]. Zeitschrift für
Sozialpsychologie, 24, 273–279.
Werth, L. (1998). Ein inferentieller Erklärungsansatz des Rückschaufehlers [An inferential account of the hindsight bias]. Hamburg:
Kovac.
