
Journal of Economic Behavior & Organization 84 (2012) 510–524


An unlucky feeling: Overconfidence and noisy feedback


Zachary Grossman a,∗, David Owens b
a University of California, Santa Barbara, United States
b Haverford College, United States

Article info

Article history: Received 29 November 2011; Received in revised form 26 July 2012; Accepted 13 August 2012; Available online 25 August 2012.

JEL classification: C91; D03; D83.

Keywords: Overconfidence; Feedback; Overestimation; Absolute performance; Bayesian updating; Biased updating; Information processing; Learning transfer; Cross-game learning.

Abstract

How do individuals' beliefs respond to ego-relevant information? After receiving noisy, but unbiased, performance feedback, participants in an experiment overestimate their own scores on a quiz and believe their feedback to be 'unlucky', estimating that it under-represents their score by 13%. However, they exhibit no such overconfidence in non-ego-relevant beliefs, in this case estimates of others' scores. Comparing subjects' belief-updating to the Bayesian benchmark, we find that this 'unlucky feeling' is largely due to overconfident priors, with biased updating driving overconfidence only among the participants with the worst-calibrated beliefs. This suggests that social comparisons contribute to the biased response to feedback on relative performance observed in other studies. While feedback improves performance estimates, this learning does not translate into improved estimates of subsequent performances. This suggests that beliefs about ability are updated differently than beliefs about a particular performance, contributing to the persistence of overconfidence.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Life experience consistently supplies feedback about our personal attributes and our performance in a variety of settings.
A restaurateur’s experience provides her with information not only about the likelihood that her current restaurant will
succeed, but also about her ability to successfully operate other restaurants in the future. Though a Bayesian updater should
develop more accurate self-knowledge as she acquires relevant information, people frequently express overconfident esti-
mates of their relative ability or performance across a broad range of measurements (Hoelzl and Rustichini, 2005; Burks
et al., 2010)—even to their own financial detriment (Malmendier and Tate, 2008; Camerer and Lovallo, 1999)—including for
attributes about which people receive a multitude of feedback over the course of their lives.1

∗ Corresponding author at: Department of Economics, 2127 North Hall, University of California, Santa Barbara, CA 93106, United States. Tel.: +1 805 893 7309; fax: +1 805 893 8830.
E-mail addresses: grossman@econ.ucsb.edu (Z. Grossman), dowens@haverford.edu (D. Owens).
1 In a commonly cited example, Svenson (1981) finds that 93% of U.S. students believe that they are better than the median driver. Most people drive on a regular basis, and each trip is an opportunity to learn about one's driving ability.

http://dx.doi.org/10.1016/j.jebo.2012.08.006

Despite a vast catalogue of observed overconfidence, it is not clear to what extent it indicates biased information
processing.2 Moore and Healy (2008) and Benoît and Dubra (2011) argue that much of the evidence of overconfidence
is compatible with Bayesian updating. However, Burks et al. (2010) reject Bayesian updating, while Mobius et al. (2011) and
Eil and Rao (2010) find that beliefs respond to news conservatively and with an ego-preserving bias.
In this paper we present an experiment that examines how overconfidence about absolute (as opposed to relative)
performance arises and how it persists in the face of experience and feedback. The design allows us to distinguish between
two sources of overconfident posterior beliefs—overconfident priors and biased information processing—by measuring how
individuals’ beliefs about their score on a quiz react to noisy, but unbiased feedback. We also examine how experience and
feedback from the quiz affect subsequent beliefs about performance in a second, very similar quiz, which provides insight
into how feedback about one performance affects the accuracy of prior beliefs about the next performance and thus how
overconfidence may persist over a series of related tasks.
The experiment, described in Section 2, makes four main contributions to the study of overconfidence and the response
to feedback, each made possible by a particular feature of the design. First, all participants take a ten-question quiz on logic
and reasoning and provide estimates of absolute performance before taking the quiz, after the quiz, and after receiving noisy
performance feedback. Studying the beliefs about absolute performance allows us to provide new evidence on the extent
to which expressed overconfidence reflects biased updating, while also shedding light on the role of social comparisons in
driving the updating anomalies found by Mobius et al. (2011) and Eil and Rao (2010). Furthermore, while relative performance
is at times a key determinant of success, many risky endeavors hinge chiefly on absolute performance. For example, a lender’s
bottom line depends upon her being a good judge of borrowers’ probability of default, independent of the judgments of other
lenders. Thus, investigating the nature and severity of overconfidence in absolute performance measures can yield insight
into how overconfidence matters in a broader range of environments including entrepreneurial and investor choice.
Second, participants estimate different target scores in two separate conditions: their own score in the Self condition
and the score of another participant in the Other condition. The tasks in the two conditions are virtually identical, but the
estimated quantity in the Other condition is not ego-relevant, and thus provides a more faithful control than that of Eil and
Rao (2010), Ertac (2011), and Mobius et al. (2011). The controls in those studies feature events with objective probabilities,
about which updating presents the same statistical problem, but which may be interpreted very differently by participants.
Third, the nature of the feedback we give—a perturbation of the raw score, as opposed to a garbled binary signal—allows
us to examine quantitatively and at the individual level how individuals attribute performance to external circumstances or
luck. Like Ertac (2011), who provides incomplete, as opposed to noisy, feedback, our information process allows participants
to entirely exclude some possible outcomes, while leaving the relative probabilities of the remaining outcomes unchanged
for a Bayesian updater. In contrast, the feedback process of Eil and Rao (2010) changes the relative probabilities of the remaining ranks, and that of Mobius et al. (2011) features signals that are never perfectly informative. Finally, participants
complete the quiz/belief-elicitation protocol twice, which allows us to examine how information and experience carry over
from one task to the next.
Section 3 presents our results. On average our participants express accurate interim and posterior beliefs about oth-
ers’ scores, but they overestimate their own score and believe themselves to have received ‘unlucky’ feedback. However,
this effect is largely consistent with Bayesian updating and overconfident prior beliefs, as opposed to biased information processing; for most participants, we find no evidence that beliefs respond conservatively to feedback about their own performance. Furthermore, most participants' beliefs react fairly symmetrically to 'good news' and 'bad news', leading us to conclude that post-feedback overconfidence is primarily driven by overconfident priors. While experience and feedback from the first quiz improve the accuracy with which participants estimate others' scores on the second quiz, we find only minimal improvement in the accuracy of participants' estimates of their own scores. This suggests that people update their beliefs about their overall ability differently than they update their beliefs about a specific performance, which
may contribute to the persistence of overconfidence.
We conclude, in Section 4, with a discussion of how our results are consistent with research finding that overconfidence
levels and updating depend on whether or not the relevant information is ego-relevant.3 However, the results are less
consistent with recent work documenting an asymmetric response to news. For example, Eil and Rao (2010) ask subjects
to estimate their ranking in both an IQ task and in a measure of individual attractiveness both before and after receiving
binary feedback and find that participants respond much less to negative information about themselves than positive infor-
mation. We find limited evidence of such a “good news–bad news” effect, with the bias concentrated in a small number
of subjects with highly inaccurate prior beliefs. We do not reject the notion that judgments reflecting overconfidence may
be compatible with Bayesian updating, as pointed out by Moore and Healy (2008) and Benoît and Dubra (2011). Compar-
ing our results to related work, it seems that asymmetric responsiveness to feedback is more likely in beliefs about relative performance. This is further supported by our participants' beliefs in the Other condition, which are marginally more sensitive to bad news than to good news.

2 Not all research has found exclusive overconfidence and overoptimism. Ertac (2011) finds evidence of pessimism in interpreting feedback that is incomplete (as opposed to noisy) and Clark and Friesen (2009) find zero mean error and underconfidence to be more prevalent in forecasts of both relative and absolute performance on a real effort task. They also find that people have less difficulty predicting relative performance than predicting absolute performance. Moore and Healy (2008) also find subjects to be under-confident, in both a relative and absolute sense, on difficult tasks.
3 Charness et al. (2011) find that people make more errors in estimating their own performance on a mental ability task than on another in which feedback relates to a more abstract question. Ertac (2011) finds that whether or not information is ego-relevant affects the way it is processed.
However, when it comes to cross-game learning (Kagel, 1995; Cooper and Kagel, 2003, 2008, 2009), we do find some
evidence of a bias specific to ego-relevant judgments—namely that the transfer of learning from one quiz into the next
appears stronger when one’s own ego is not at stake. Among those estimating their own score, both the level and precision
of beliefs improve less from the first to the second quiz than among those estimating others’ scores. This suggests another
channel through which overconfidence may persist. Performance feedback does improve the accuracy of these participants’
beliefs about the relevant performance, but not necessarily about the underlying abilities that generate similar performances.
Thus, not only do people process information about themselves differently than they do information about others, the way they learn about their ability is different from the way they learn about a specific performance.

2. Experimental design

We conducted twelve sessions, six for each condition, with each participant taking part in only one session, at the
Experimental and Behavioral Economics Laboratory (EBEL) at the University of California, Santa Barbara. Participants were
randomly recruited from the EBEL subject pool (largely comprised of UCSB students and staff) using the online system ORSEE
(Greiner, 2003). Each session lasted roughly 75 min and, with one exception, included 11 participants.4 Participants earned
$12.65 on average, which includes a $5 show-up fee and which was paid privately in cash at the end of the session. Full
instructions and screenshots are provided in the Appendix.5
Upon arriving at the experiment, participants sat at computer terminals, were given a paper copy of the instructions, and
followed along as the experimenter read them aloud. Participants then completed a protocol in which they took a quiz and
were asked three times to state their beliefs about performance on the quiz. In two separate conditions participants were
asked to estimate a different target score, sT: participants in the Self condition were asked about their own score, s, while participants in the Other condition were asked about the score of an anonymous randomly selected other participant (RSOP).
Beliefs were elicited three times: (1) prior beliefs (b1 ), before participants took the quiz; (2) interim beliefs (b2 ), immediately
after taking the quiz; and (3) posterior beliefs (b3 ), after receiving feedback about the performance being estimated. We
conducted the sessions in back-to-back pairs featuring the same quizzes, but different conditions, and with the order of the
two quizzes reversed in consecutive session pairs.
After completing this process, participants repeated it with a second quiz, which featured new questions that were drawn
from the same source as the first quiz. When expressing their beliefs, participants did not view beliefs that they had expressed
previously. Further, they did not view payoffs for or scores on the first quiz until they had completed the second. The key
distinction between the first and second round of quizzes is that participants in the second round were experienced, having already taken the first quiz and received noisy feedback about their first-round target score.

2.1. Quizzes

The quizzes consisted of ten multiple-choice questions selected from a book of Mensa quizzes (Grosswirth et al., 1999).
They were presented in a Microsoft Excel spreadsheet which allowed participants to select one answer from a menu of
possible answers. We constructed a total of four quizzes, two of which were used in each session.6 To incentivize effort, for
each quiz we selected one question at random at the end of the session. Participants earned an extra $5 if their answer to this
question was correct. Participants were paid a $5 show-up fee, plus the incentive payment for each quiz, plus the earnings
from one of the three beliefs elicitations for each quiz. The elicitation chosen for payment was randomly selected from the
three beliefs elicitations for each quiz.7

2.2. Belief elicitation

Rather than simply asking participants to guess how many questions they answered correctly, we elicited the entire
distribution of their beliefs, using an interface and incentive scheme similar to those used by Moore and Healy (2008) and
Eil and Rao (2010). Fig. 1 presents a screenshot of our computer interface, which was programmed using Microsoft Excel.
For each raw-score outcome from zero to ten correct answers, participants selected the likelihood (in percentage terms)
with which they believe that outcome to have occurred. The percentages initially were set to zero for each outcome and
participants made adjustments by selecting numbers from the drop-down menu corresponding to each outcome.
Two charts on the same screen were provided as visual aids to the participants. An auto-adjusting histogram provided
a visual profile of the current selection and a single, green bar showed the running total of percentage-points selected.

4 Session 9, which was in the Self condition, had only 9 participants.
5 The software is available from the authors upon request.
6 All quizzes, plus correct answers, are provided in Appendix B.
7 While paying only one of the three elicitations per quiz mitigates the hedging incentive in the belief elicitations within each quiz, there is still hedging potential across tasks.

Fig. 1. The beliefs-elicitation interface.

The experimenter verified that the total was 100% before the participant was allowed to save the estimate and close the
program.8 We used the same interface for each elicitation, but participants did not have access to their history of belief
estimates in the form of previous files.9

8 Due to the brevity of the verification process, the experimenter had no practical way to associate an individual participant with her beliefs, nor could the experimenter track how the beliefs changed across elicitations and how they related to the participant's score and feedback. The prominence of the green bar on the belief-elicitation screen allowed the experimenter to accurately judge whether a participant's reported beliefs totaled to 100%, from a distance of a few feet. The verification process required only a few seconds for each belief elicitation, particularly after the first elicitation of the session, at which point instances in which a participant had to be told to adjust the total were exceedingly rare.
9 Participants were allowed to write on scrap paper and may have had access to their previous estimates had they recorded the estimates on the paper.

Participants' earnings for their probability estimates were determined by a quadratic scoring rule.10 Specifically, for each paid elicitation, a participant earned 1 + 5p_x − w dollars, where p_x is the participant's estimated likelihood for the true
outcome and w is the sum of squares of the likelihoods for each of the eleven possible outcomes (zero correct through ten
correct). Following Moore and Healy (2008) and Eil and Rao (2010), after learning the scoring-rule formula, participants
were told, “This formula may appear complicated, but what it means for you is very simple: You get paid the most when
you honestly report your best guesses about the likelihood of each of the different possible outcomes.” Furthermore, the
instructions walked participants through four detailed examples of how the payoff for the beliefs-elicitation was calculated.11
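To make the payment rule concrete, the following minimal sketch computes a participant's earnings for one paid elicitation under the quadratic scoring rule above. It is our own illustration (the function name and setup are ours; the original interface was implemented in Microsoft Excel).

    def scoring_rule_payoff(beliefs, true_outcome):
        # beliefs: list of 11 probabilities for scores 0..10, summing to 1.
        # Payoff is 1 + 5*p_x - w dollars, where p_x is the probability
        # assigned to the realized outcome and w is the sum of squared
        # probabilities across all eleven outcomes.
        assert len(beliefs) == 11 and abs(sum(beliefs) - 1.0) < 1e-9
        p_x = beliefs[true_outcome]
        w = sum(p ** 2 for p in beliefs)
        return 1 + 5 * p_x - w

    # A participant who puts all weight on the true score earns the maximum, $5:
    print(scoring_rule_payoff([0] * 7 + [1.0] + [0] * 3, 7))  # 1 + 5 - 1 = 5.0

Truthful reporting maximizes the expected payoff for a risk-neutral participant, consistent with footnote 10.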

2.3. Feedback

For each quiz, after all participants had entered their second score estimate, each received a slip of paper, face down for
privacy, with a number on it that constituted his or her feedback, f.12 This number was generated by perturbing the target score with a noise or 'luck' term, ε, drawn uniformly from the set {−2, −1, 0, 1, 2}. Participants were instructed as such, and the instructions emphasized the fact that the number on the paper was equally likely to be equal to sT − 2, sT − 1, sT, sT + 1, or sT + 2.
After receiving this feedback participants entered their beliefs a third and final time. After completing this step for the
first quiz, participants moved on to the second part of the session, in which they took a second quiz and entered their beliefs
another three times. After completing this step for the second quiz, participants were given a statement summarizing their
payment and their scores on the two quizzes, received their payment and exited.
After the elicitation of the prior beliefs distribution, b1 , participants viewed and completed the quiz, an experience
that informs interim beliefs, b2 . After expressing b2 , participants observed f and expressed posterior beliefs, b3 . Bayes’ rule
provides no benchmark for updating from b1 to b2 , as we do not observe the information provided by viewing and taking the
quiz. It does, however, provide a specific benchmark for updating from b2 to b3, shown in Eq. (1), in which b_j^i is the expressed probability that sT = i in belief elicitation j.

b_3^{i*} =
\begin{cases}
\dfrac{b_2^i}{\sum_{k=f-2}^{f+2} b_2^k} & \text{if } f-2 \le i \le f+2 \\
0 & \text{if } i < f-2 \text{ or } i > f+2
\end{cases}
\qquad (1)

Because |ε| ≤ 2, b_3^{i*} is restricted to be zero outside the range [f − 2, f + 2]. Furthermore, scores within [f − 2, f + 2] are expected in the same proportions as in the expressed interim beliefs. In particular, because of the uniformity of the noise, interim beliefs are not 'dragged' towards the feedback, f. For example, consider a participant with b_2^6 = b_2^7 = 0.50, who receives feedback f = 5. Such f could be labeled 'bad news', in the sense that the (unbiased) feedback is lower than she expected. However, her Bayesian posterior remains unchanged from b2: b_3^{6*} = b_3^{7*} = 0.50. Bayesian updating results in a truncation of b2, and no shifting of probabilities among sT ∈ [f − 2, f + 2].
When b_2^i = 0 for all i ∈ [f − 2, f + 2], Bayes' rule provides no benchmark for b3. This is an important departure from previous studies, such as Mobius et al. (2011) and Eil and Rao (2010). In these designs, feedback comes in the form of a binary signal that is more likely to be correct than wrong, so all beliefs generate a Bayesian posterior. In our design, feedback is never completely wrong, in that there is a limit to how much it can differ from sT.
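As an illustration of the information process and the benchmark in Eq. (1), the sketch below simulates feedback and computes the Bayesian posterior by truncation and renormalization. It is our own reconstruction for exposition (the names are ours), not the original experimental software.

    import random

    def generate_feedback(target_score):
        # Feedback perturbs the target score by a 'luck' term drawn
        # uniformly from {-2, -1, 0, 1, 2}.
        return target_score + random.choice([-2, -1, 0, 1, 2])

    def bayesian_posterior(interim, f):
        # Eq. (1): zero out beliefs outside [f-2, f+2] and renormalize the
        # rest. Returns None when interim beliefs put no weight in the range
        # ('Not Updateable': Bayes' rule offers no benchmark).
        in_range = lambda i: f - 2 <= i <= f + 2
        total = sum(p for i, p in enumerate(interim) if in_range(i))
        if total == 0:
            return None
        return [p / total if in_range(i) else 0.0 for i, p in enumerate(interim)]

    # The example from the text: interim beliefs put 0.50 each on scores 6
    # and 7; feedback f = 5 leaves them unchanged, since 6 and 7 lie in [3, 7].
    b2 = [0, 0, 0, 0, 0, 0, 0.50, 0.50, 0, 0, 0]
    print(bayesian_posterior(b2, 5))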

3. Results

We begin the analysis with a brief look at participants’ quiz performance. Table 1 summarizes scores and feedback by
condition, quiz order, and the specific quiz taken. We have a total of 99 observations in the Other condition and 90 in the Self
condition; unless otherwise noted, all analysis is done at the individual-quiz level, so each participant is represented by two
separate observations.13
Participants answer roughly half of the ten questions correctly, on average, and those in the Other condition perform marginally better, though this difference does not approach statistical significance (t = 1.20, p = 0.23).

10 The incentives provided by the quadratic scoring rule render truth-telling an optimal strategy for a risk-neutral expected-money maximizer, but incentive-compatibility fails in the presence of risk-aversion.
11 We submit that few participants understand the quadratic scoring rule computationally, but we believe that they generally trust the experimenter when told that honestly reporting their beliefs is the best thing to do.
12 In the instructions, we used neutral language and avoided using the word 'feedback'. Instead we called it 'imperfect information' about the score the participant was trying to estimate. The quiz score and feedback were automatically computed by the spreadsheet and recorded by an assistant in a separate room, who handed the slips of paper to the experimenter. Thus, the experimenter was unable to associate individual participants with their quiz scores and feedback during the course of the session.
13 We dropped some observations for various reasons: four sessions featured an error causing some participants to be given an incorrect score for one of the quizzes. As a result we omit from the data analysis all 39 observations for which the target score was miscalculated and 20 observations for a quiz following one in which the participant's target score was miscalculated. A programming error corrupted the feedback recorded for eight observations, which we dropped, and two participants (four observations) were dropped after we discovered that they were repeat participants.

Table 1
Average score (out of ten) and feedback.

                     Other                        Self
                s        f        N          s        f        N
Combined       5.33     5.37     99         4.91     4.93     90
              (0.25)   (0.30)              (0.25)   (0.28)
First quiz     5.15     5.20     54         4.68     4.74     50
              (0.34)   (0.41)              (0.31)   (0.39)
Second quiz    5.56     5.58     45         5.20     5.17     40
              (0.37)   (0.44)              (0.39)   (0.42)
Quiz A         6.90     7.10     20         5.70     5.70     20
              (0.58)   (0.62)              (0.59)   (0.53)
Quiz B         4.53     4.59     34         4.40     4.50     30
              (0.35)   (0.46)              (0.36)   (0.49)
Quiz C         4.85     4.65     20         4.65     4.65     20
              (0.61)   (0.72)              (0.49)   (0.64)
Quiz D         5.56     5.64     25         5.15     5.10     20
              (0.44)   (0.57)              (0.58)   (0.62)

Standard errors in parentheses.

Table 2
Average overestimation of the target score for each elicitation.

              b̄1 − sT     b̄2 − sT     b̄3 − sT
Other          0.47       −0.23       −0.05
N = 99        (0.29)      (0.30)      (0.14)
Self           1.40        1.14        0.67
N = 90        (0.23)      (0.20)      (0.15)

Standard errors of mean overestimation in parentheses.

Performance improves in both conditions from the first quiz to the second, and there is some variance in the difficulty of the four quizzes.14
By construction, feedback in both conditions is unbiased—on average neither an upward nor a downward perturbation of
scores—so it is not surprising that f is on average very similar to s.
The rest of the analysis proceeds as follows: first, we examine participants’ beliefs for evidence that they overestimate
their target score. Then, we explore the extent to which they find feedback on their own performance to be ‘unlucky,’ in
spite of the fact that it is generated by a transparently unbiased process. Next, we analyze the process by which participants
update their interim beliefs after receiving feedback. Finally, we search for improvement in the estimations between the
first and second quiz in each condition.

3.1. Overestimation

We examine the data for evidence of overestimation in two ways. As a preliminary measure, we examine the error in the
mean of each belief elicitation, which we use to proxy participants’ ‘best guess’ about the target score. However, we observe
the entire distribution of beliefs, so we next examine the profile of errors in the belief distribution. While the participants
in the Other condition on average do not appear to significantly overestimate the target score, both measures show that
overestimation is pervasive in the Self condition.
To begin, let b̄_j ≡ Σ_{i=0}^{10} b_j^i × i denote the mean of the belief distribution for elicitation j, where b_j^i is a participant's expressed belief in elicitation j that sT = i. We focus on overconfidence manifested by b̄_j − sT, the degree to which the mean of beliefs exceeds the target score. Table 2 presents this measure by condition, separately for each elicitation.
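As a minimal sketch (with our own variable names), the overestimation measure is computed from an elicited belief vector as follows.

    def mean_belief(b):
        # Mean of a belief distribution over scores 0..10.
        return sum(i * p for i, p in enumerate(b))

    def overestimation(b, target_score):
        # Degree to which the mean of beliefs exceeds the target score.
        return mean_belief(b) - target_score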
In the Other condition, while the mean prior belief is on average 0.47 points above the target score, corresponding to 9%
of the average quiz score, this overestimation is only marginally significant (t = 1.58, p = 0.12). This suggests that the quizzes
are slightly more difficult than participants expect. We observe no tendency to overestimate the performance of others
after having taken the quiz, as interim and posterior beliefs actually exhibit slight underestimation, though not significant
(t = −0.77, p = 0.44 and t = −0.06, p = 0.72).
In the Self condition, average overestimation is significantly greater than zero across all three elicitations (t = 5.89, 5.62,
and 4.43, respectively; p < 0.001 for all three). Further, the magnitudes of overconfidence are quantitatively meaningful,
corresponding to an overestimation of 23%, 20%, and 13% of the average score, respectively. The difference in mean overes-
timation across the two conditions, as tested by a two-sample t-test, is highly significant (p < 0.001) for all three elicitations.

14 Table A.8 in the appendix further breaks down scores and beliefs by quiz and by order.

[Fig. 2. Distribution of overestimation of beliefs: b_j − sT. Average stated probability by outcome relative to the target score (−10 to 10), for the Other and Self conditions. (a) Prior beliefs. (b) Interim beliefs.]

The decrease in overestimation from prior to interim beliefs in the Self condition further supports the notion that quizzes are
more difficult than participants expect when expressing b1 . Importantly, after taking the quiz and learning about its difficulty,
participants continue to overestimate their own score, which they do not when estimating another person’s performance.
Next, we examine evidence of overestimation in the entire distribution of beliefs, by subtracting the true target score, sT ,
from each bj . By outcome, we average bj − sT across all observations and display the resulting overestimation distributions in
Fig. 2, by condition and separately for each pre-feedback elicitation. The extent to which overestimation in the Self condition
exceeds that in the Other condition is apparent in both elicitations. On average, participants in the Self condition place
greater weight on outcomes above the actual target score than do those in the Other condition, while the opposite is true
for outcomes below the target score.

3.2. Estimated luck

We now investigate the degree to which participants consider noisy but unbiased feedback about their own performance
to be 'unlucky.' As b3 defines participants' post-feedback beliefs about sT, and feedback is generated by the process f = sT + ε, the distribution e ≡ f − b3 implicitly defines participants' beliefs about their luck, ε. We call e the distribution of estimated luck.15 As in the previous section, we first examine estimated luck using only the mean of the stated belief distribution, then
we use the entire distribution of beliefs.
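In code, the estimated-luck distribution is simply the posterior belief vector re-indexed: a participant who believes sT = i with probability p implicitly believes the luck term was f − i with probability p. A sketch under our own naming:

    def estimated_luck(posterior, f):
        # Implied beliefs about luck: Pr(luck = f - i) = b3^i for each
        # score i with positive posterior weight.
        return {f - i: p for i, p in enumerate(posterior) if p > 0}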
Table 3 highlights differences across conditions in estimated luck by summarizing the mean of estimated luck in each
condition. While participants in the Other condition express the belief that sT is perturbed upward by 0.11 questions, on
average, those in the Self condition estimate that it is perturbed downward by 0.65 questions. Thus, participants appear
to believe that feedback about their own score is significantly unlucky (t = 2.57, p = 0.005), but that feedback about their
colleague’s score is relatively accurate.
Fig. 3 presents the empirical distribution of e = f − b3, aggregated across individuals. Note that in both conditions, the bulk of e lies between −2 and 2. This shows that most of the mass of b3 is concentrated on [f − 2, f + 2], with 84% of participants in the Other condition, and 82% in the Self condition, correctly expressing b3 entirely within this range.

15 While the support of ε is confined to {−2, −1, 0, 1, 2}, some participants express b3 such that e falls outside of this range.

Table 3
Mean estimated luck.

                    Other      Self
ē = f − b̄3          0.11      −0.65
                   (0.14)     (0.15)
N                   99         90

Standard error of mean estimated luck in parentheses.

[Fig. 3. Distribution of estimated luck: f − b3. Average stated probability by estimated luck level (−10 to 10), for the Other and Self conditions.]

The figure shows a noticeable difference in the distribution of estimated luck across conditions. In the Other condition, it is distributed roughly symmetrically, or slightly right-skewed, with a peak at 0, suggesting that participants believe their colleagues' feedback to be unbiased, or even slightly lucky. In the Self condition, the distribution is skewed to the left of zero: subjects find feedback most likely to be an unlucky representation of their own performance.

3.3. Belief updating

Why might participants consistently believe that their feedback under-represents their actual performance—that they
are ‘unlucky’—when it is objective and unbiased? One explanation of this finding is that participants attribute positive
outcomes to their performance and negative outcomes to luck. However, the ‘unlucky feeling’ need not arise from a bias
in the updating process. It may be perfectly consistent with Bayes’ rule for a participant with overconfident prior beliefs
who receives relatively accurate feedback. In this section we investigate the extent to which post-feedback belief updating
conforms to the predictions of Bayes’ rule.
The fact that feedback f informs participants that the target score sT lies within the range [f − 2, f + 2] has two implications worth noting. First, Bayes' rule offers no benchmark for updating interim beliefs that place no weight on outcomes within
this range. Second, if Bayes’ rule does apply, it prescribes that the participant truncate the distribution of beliefs outside of
the [f − 2, f + 2] range and then redistribute the remaining probability within the feedback range proportional to the interim
beliefs.
We use these two facts to partition our sample into three updating categories. First, observations for which b_2^i = 0 for all i ∈ [f − 2, f + 2] we label as 'Not Updateable'. Second, when posterior beliefs ascribe positive probability to outcomes outside of the [f − 2, f + 2] range, the interim beliefs are 'Not Truncated'. Placing positive probability outside the feedback
range may signal a misunderstanding of the feedback and the related updating process. The remaining observations we
label ‘Truncated’, because the posterior beliefs truncate the interim beliefs within the feedback range—though this does not
guarantee that the probability is redistributed as prescribed.
We find 18 observations in the Other condition and 12 in the Self condition to be Not Updateable, while 11 and 13, respec-
tively, are Not Truncated, and the remaining 70 and 65, respectively, are Truncated. Of the 18 Not Updateable observations in
the Other condition, interim beliefs lie entirely below f − 2 for 13. Of the 12 represented in the Self condition, interim beliefs
lie entirely above f + 2 for 11. In other words, the participants who are most pessimistic about the performance of their peers
and most overconfident about their own performance are the ones for whom Bayes’ rule provides no guidance.
While Bayes' rule provides no posterior-belief benchmark for Not Updateable observations, we construct a proxy, b_3^P, for the most appropriate beliefs in these cases, appealing to the intuition that, when interim beliefs are falsified, participants should believe that their score is as close to their interim beliefs as is possible, given the constraints imposed by their feedback.16

Table 4
Average performance and mean belief (actual and Bayesian) for the three updating categories.

                              s        sT       f        b̄1       b̄2       b̄3       b̄3*
Not Updateable
  Other, N = 18             4.50     6.94     7.33     5.69     4.20     6.78     6.22
                           (0.52)   (0.76)   (1.00)   (0.42)   (0.43)   (0.62)   (0.65)
  Self, N = 12              3.25     3.25     2.42     6.19     6.43     4.75     4.08
                           (0.70)   (0.70)   (0.78)   (0.54)   (0.53)   (0.64)   (0.57)
Not Truncated
  Other, N = 11             3.64     4.81     5.46     4.84     4.87     5.03     5.34
                           (0.72)   (0.76)   (0.87)   (0.36)   (0.49)   (0.48)   (0.56)
  Self, N = 13              4.23     4.23     4.54     6.54     5.53     5.62     5.33
                           (0.63)   (0.63)   (0.71)   (0.54)   (0.64)   (0.65)   (0.65)
Truncated
  Other, N = 70             5.81     4.95     4.86     5.93     5.34     4.90     5.21
                           (0.29)   (0.26)   (0.29)   (0.13)   (0.15)   (0.24)   (0.19)
  Self, N = 65              5.41     5.41     5.51     6.30     6.13     5.72     5.83
                           (0.27)   (0.27)   (0.31)   (0.22)   (0.24)   (0.27)   (0.25)

Notes: Standard error of means in parentheses. For Not Updateable observations, b_3^P proxies for b_3^*.

Thus, if the mean of Not Updateable interim beliefs is higher (lower) than f, the proxy posterior beliefs, b_3^P, place probability 1 on f + 2 (f − 2).17 In the case of Not Updateable beliefs, we use b_3^P as a proxy for b_3^*.
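The classification and the proxy just described can be summarized in the following sketch (our own illustrative implementation; for simplicity it assumes the feedback range [f − 2, f + 2] lies within the score range 0..10).

    def classify_updating(b2, b3, f):
        # 'Not Updateable' if interim beliefs put zero weight on [f-2, f+2];
        # 'Not Truncated' if posterior beliefs put positive weight outside
        # [f-2, f+2]; otherwise 'Truncated'.
        in_range = lambda i: f - 2 <= i <= f + 2
        if sum(p for i, p in enumerate(b2) if in_range(i)) == 0:
            return "Not Updateable"
        if any(p > 0 for i, p in enumerate(b3) if not in_range(i)):
            return "Not Truncated"
        return "Truncated"

    def proxy_posterior(b2, f):
        # b3^P for Not Updateable beliefs: all probability on the end of the
        # feedback range nearest the interim beliefs (mean above f -> f + 2,
        # mean below f -> f - 2; per footnote 17, the mean never equals f).
        mean_b2 = sum(i * p for i, p in enumerate(b2))
        point = f + 2 if mean_b2 > f else f - 2
        return [1.0 if i == point else 0.0 for i in range(11)]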

3.3.1. Performance and beliefs across updating categories


We next examine performance, overestimation, and estimated luck across the three updating categories and compare
the actual posterior beliefs to those prescribed by Bayes’ rule. Table 4 summarizes performance, feedback, the mean of
the belief distribution and the mean of the Bayesian posteriors. It shows that Not Updateable and Not Truncated beliefs
are accompanied by lower quiz scores, greater overestimation of one’s own score, and greater deviation from Bayesian
posteriors, compared to Truncated beliefs.
In the Not Updateable category, the average quiz scores in the Other and Self conditions are 4.50 and 3.25, respectively,
and in the Not Truncated category they are 3.64 and 4.23, respectively. In contrast, the respective averages in the Truncated
category are 5.81 and 5.41. The overall average score of 5.59 in the Truncated category is significantly higher (t = 4.58, p = 0.00)
than the overall average score of 3.98 in the other two categories, which amounts to only 71% of the average score in the
Truncated category.
Initial analysis revealed significant and persistent overestimation in the Self condition. Subtracting the sT column in
Table 4 from the mean belief column for each of the three respective elicitations yields the same overestimation measure
reported in Table 2, only separated by updating category. Focusing on the Self condition, we see prior, interim, and posterior
overestimation of 0.89, 0.72, and 0.35, respectively, in the Truncated category. Combining the other two updating categories
yields the much higher respective overestimation measures of 2.61, 2.20, and 1.44. Thus, the overall overestimation in the
Self condition is largely driven by the extreme overestimation in the Not Updateable and Not Truncated categories.
Similarly, we find the estimated luck measure reported in Table 3 to be primarily driven by the observations in the Not
Updateable and Not Truncated categories. Subtracting the mean posterior belief (b3 ) column in Table 4 from the feedback
column yields estimated luck separated by updating category. In the Truncated category, Other and Self estimated luck is
−0.04 and −0.25, respectively, while combining the other two categories yields respective measures of 0.49 and −1.68. Thus,
it appears that the feeling that one’s feedback is unlucky is largely confined to the participants in the Self condition who
generate observations in the Not Updateable or Not Truncated category.
When participants in the Self condition perceive their feedback as unlucky, to what extent is this due to deviations from
Bayesian updating? We answer this question by comparing the mean of the actual posterior beliefs to the mean of the
Bayesian (or Bayesian proxy) posteriors, which are reported in the last column of Table 4. Looking first at the Truncated
category, the average of the mean Bayesian posterior belief is 5.83, compared to 5.72 for the actual posteriors. The reported
posteriors actually fall short of the Bayesian posteriors, albeit very slightly, and the difference is not statistically significant
(t = 1.43, p = 0.16). Because of this, we conclude that what little posterior overestimation and negative estimated luck do
manifest in the Truncated category is not driven by deviations from Bayesian updating and can largely be attributed to
overconfident prior beliefs.
Though beliefs are more optimistic than their Bayesian benchmarks in the Not Updateable and Not Truncated categories, the difference cannot account for the overestimation and negative estimated luck. In the Not Truncated category, the average of the mean Bayesian posterior belief is 5.33, compared to 5.62 for the actual posteriors, a difference that is not statistically significant (t = 1.26, p = 0.23). The mean of the Bayesian proxy beliefs in the Not Updateable category averages 4.08, while the mean of the actual posteriors averages 4.75. This difference is the largest of the three updating categories, but it is also the least meaningful, being based on only a crude proxy for Bayes' rule, and it is not significant (t = 1.41, p = 0.19).
16 This intuition is similar to that used to justify the D1 signaling refinement (Banks and Sobel, 1987), which requires that beliefs about the type who chose an off-the-equilibrium-path strategy place all probability on types that experience the least cost for the observed deviation.
17 While it is possible for b̄2 to exactly equal f, this never occurred in our sample.

Table 5
Mean reported or calculated likelihood for each outcome across conditions and updating category. Outcomes are defined relative to feedback and pooled outside the feedback range.

                                     <f−2    f−2    f−1    f      f+1    f+2    >f+2
Other
  Not Updateable    b2               0.78    0      0      0      0      0      0.22
  N = 18            b3^P             0       0.78   0      0      0      0.22   0
                    b3               0.07    0.42   0.17   0.11   0.08   0.07   0.09
  Not Truncated     b2               0.37    0.09   0.07   0.07   0.06   0.09   0.26
  N = 11            b3*              0       0.31   0.18   0.18   0.09   0.30   0
                    b3               0.22    0.12   0.15   0.15   0.11   0.08   0.18
  Truncated         b2               0.19    0.08   0.08   0.11   0.13   0.15   0.27
  N = 70            b3*              0       0.19   0.13   0.13   0.19   0.34   0
                    b3               0       0.16   0.19   0.25   0.23   0.17   0
Self
  Not Updateable    b2               0.08    0      0      0      0      0      0.92
  N = 12            b3^P             0       0.08   0      0      0      0.92   0
                    b3               0       0.06   0.02   0.02   0.18   0.48   0.25
  Not Truncated     b2               0.13    0.05   0.06   0.09   0.11   0.32   0.24
  N = 13            b3*              0       0.11   0.11   0.13   0.16   0.48   0
                    b3               0.11    0.06   0.07   0.08   0.10   0.31   0.27
  Truncated         b2               0.09    0.10   0.12   0.15   0.18   0.16   0.20
  N = 65            b3*              0       0.16   0.15   0.17   0.22   0.30   0
                    b3               0       0.12   0.18   0.23   0.26   0.21   0

Thus, even in these two categories, overestimation and negative estimated luck feature prominently in the Bayesian posteriors of those estimating their own score.

3.3.2. Actual updating


Table 5 shows how participants' beliefs respond, on average, to the feedback, and compares this evolution to the Bayesian
benchmark. It displays the average likelihood placed on each outcome, relative to the realized feedback f, with the outcomes
on each side of the [f − 2, f + 2] range pooled in a single column. Observations are segregated by experimental condition and
updating category.
Beginning with the Not Updateable category, the interim beliefs place most weight (0.78) below the feedback range in
the Other condition and above the feedback range (0.92) in the Self condition, consistent with the fact that observations in
this category are generated largely by the participants who underestimate the RSOP’s score or overestimate their own score.
The average effect of our proxy for Bayesian updating is to shift the weight from outside the feedback range to the nearest
extreme within the feedback range. In both conditions, the actual response places very high weight on the outcome on the
extreme of the feedback range favored by the interim beliefs, but only roughly half of what is prescribed by the Bayesian proxy: 0.42 instead of 0.78 on f − 2 in the Other condition and 0.48 instead of 0.92 on f + 2 in the Self condition. In both conditions,
the remaining weight is spread across the other outcomes in a manner that retains more weight on the side of f favored by
interim beliefs, and that does not exclude outcomes outside the feedback range.
Interim beliefs in the Not Truncated category are close to uniform, but retain slightly more weight below f in the Other
condition and above f, reflecting the ‘unlucky feeling’, in the Self condition. By definition, these observations feature too much
(i.e., non-zero) posterior weight outside the feedback range. While the average posterior likelihoods outside the feedback range fall by roughly one third in the Other condition (from 0.37 to 0.22 and 0.26 to 0.18), leaving room to reallocate a modest amount of weight to within the feedback range, no outcome bin in the Self condition features a weight shift in excess
of three percentage points. Thus, participants generating observations in this category appear to be remarkably insensitive
to feedback, particularly in the Self condition, but the unlucky feeling would persist even if they were perfect Bayesians.
Finally, in the Truncated category we observe interim beliefs that are close to uniform, with slightly more weight above f,
particularly in the Self condition. Bayesian posterior beliefs follow suit, taking into account the truncation of beliefs outside
the feedback range, and look remarkably similar across the two conditions, differing by no more than four percentage
points. In both conditions, actual posterior beliefs are reasonably close to the Bayesian posteriors, but are biased towards
the feedback, as if the noise distribution were perceived to be peaked at zero, instead of uniform. Thus, in contrast to the
Not Truncated category, beliefs in this category seem overly sensitive to f. Furthermore, to the extent that participants
estimating their own score in this category feel that their feedback was unlucky, the feeling is not due to deviations from
Bayesian updating and is more consistent with pre-feedback beliefs.

3.3.3. Conservatism and asymmetry in the response to feedback


Mobius et al. (2011) find that beliefs about relative performance tend to be 'conservative', or unresponsive to new information. In addition, Eil and Rao (2010) and Mobius et al. (2011) both find beliefs to be asymmetrically responsive. According to this bias, participants' beliefs respond more strongly to new information if it reflects positively on them than if it reflects negatively. To test the responsiveness of beliefs, we compare the actual response to feedback, measured by the difference between mean interim and posterior beliefs, (b̄3 − b̄2), to its Bayesian benchmark, (b̄3* − b̄2). Conservatism would suggest that |b̄3 − b̄2| < |b̄3* − b̄2|. Asymmetric responsiveness would lead (b̄3 − b̄2) to be higher, relative to (b̄3* − b̄2), when (b̄3* − b̄2) is positive than when it is negative.
[Fig. 4. Plots of the response of mean posterior beliefs to feedback, (b̄3 − b̄2), versus the Bayesian response, (b̄3* − b̄2), with the 45-degree Bayesian-benchmark line and separate markers for Truncated, Not Truncated, and Not Updateable observations. (a) Other condition. (b) Self condition.]

Table 6
Regressions examining the conservatism and symmetry of the response to feedback.

                       Other                               Self
                       (1)        (2)        (3)           (1)        (2)        (3)
(b̄3* − b̄2)            0.70       1.14       1.21          0.74       1.06       0.98
                      (0.27)     (0.11)     (0.11)**      (0.12)**   (0.24)     (0.12)
GN                     1.39       1.10       0.84          0.49       0.40       0.56
                      (0.33)***  (0.23)***  (0.22)***     (0.11)***  (0.16)**   (0.16)***
(b̄3* − b̄2) × GN       0.22      −0.37      −0.06          0.45       0.18       0.33
                      (0.29)     (0.21)*    (0.13)        (0.14)***  (0.31)     (0.20)
Constant              −0.93      −0.55      −0.58         −0.33      −0.22      −0.41
                      (0.30)***  (0.14)***  (0.14)***     (0.09)***  (0.14)     (0.11)***
Not Updateable         √          –          –             √          –          –
Not Truncated          √          √          –             √          √          –
R2                     0.8734     0.8704     0.9498        0.7016     0.7713     0.8758
N                      98         80         69            88         76         63

Notes: Dependent variable: (b̄3 − b̄2). GN = 1 iff f > b̄2. Three participants for whom f = b̄2 are not included in this analysis, as such feedback is considered neither good nor bad news. Asterisks applied to (b̄3* − b̄2) denote significant difference from one. Asterisks on all other coefficients denote significant difference from zero. Standard errors in parentheses, clustered by subject.
Before taking a more quantitative approach to the analysis of conservatism and asymmetry in the response to feedback, we begin by plotting (b̄3 − b̄2) vs. (b̄3* − b̄2) in Fig. 4, which features separate plots for each condition and uses different markers for observations from each of the three updating categories. The forty-five degree line, (b̄3 − b̄2) = (b̄3* − b̄2), is displayed to facilitate a comparison of participants' mean difference in beliefs to its Bayesian benchmark. Observations located above the diagonal represent updating to mean beliefs more favorable than the Bayesian benchmark, and those below represent less favorable updating.
The figure presents no obvious visual evidence of conservatism or asymmetry in either condition. Conservatism, or unresponsive beliefs, would cause the observations to form a flatter line than the Bayesian benchmark. With the exception of a few outliers (all from the Not Updateable category), the bulk of the observations adhere relatively closely to the Bayesian benchmark line, forming a very slightly steeper line in the Other condition, representing oversensitivity to feedback. Asymmetry would suggest differences in responsiveness on either side of (b̄3* − b̄2) = 0. A striking feature of panel 4(b) is the abundance of data points to the left of zero. This stems directly from the overconfidence manifested in b̄2, which makes 'good news' relatively rare.
Next we take a more quantitative look, with a series of regressions that measure conservatism and asymmetry in the belief-updating process. Table 6 presents the results of the regressions, in which the dependent variable is the difference in mean beliefs between interim and actual posterior beliefs, (b̄3 − b̄2), and the explanatory variables include the difference between the means of interim beliefs and the Bayesian posteriors, (b̄3* − b̄2), a dummy variable, GN (for 'good news'), indicating that feedback exceeds the mean interim belief (f > b̄2), and the interaction term (b̄3* − b̄2) × GN. For each condition, we repeat the analysis using all three updating categories (1), excluding the Not Updateable observations (2), and including only the Truncated category (3). Strict adherence to Bayes' rule would yield a coefficient on (b̄3* − b̄2) of one, and a coefficient of zero on all other variables. A coefficient significantly lower than one on (b̄3* − b̄2) would be evidence of conservatism, while a significant coefficient on the interaction term, (b̄3* − b̄2) × GN, would be evidence of asymmetry.18
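A sketch of this specification using pandas and statsmodels (the data-frame layout and column names are our own assumption; the paper does not specify the analysis software):

    import pandas as pd
    import statsmodels.formula.api as smf

    def run_updating_regression(df: pd.DataFrame):
        # df: one row per observation, with columns 'subject', 'mean_b2',
        # 'mean_b3', 'mean_b3_star' (mean of the Bayesian posterior), 'f'.
        df = df.copy()
        df["d_actual"] = df["mean_b3"] - df["mean_b2"]      # (b3 - b2)
        df["d_bayes"] = df["mean_b3_star"] - df["mean_b2"]  # (b3* - b2)
        df["GN"] = (df["f"] > df["mean_b2"]).astype(int)    # 'good news' dummy
        df = df[df["f"] != df["mean_b2"]]                   # drop neutral news
        # 'd_bayes * GN' expands to d_bayes + GN + d_bayes:GN plus a constant;
        # standard errors are clustered by subject, as in Table 6.
        model = smf.ols("d_actual ~ d_bayes * GN", data=df)
        return model.fit(cov_type="cluster", cov_kwds={"groups": df["subject"]})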
The full-sample regressions (1) for both conditions show some evidence of conservatism, with coefficients on the Bayesian
response term of 0.70 and 0.74 in the Other and Self conditions, respectively, though only the Self condition coefficient is
significantly less than one. This indicates that participants adjust their beliefs about their own score in response to nega-
tive feedback by less than three-quarters of the amount prescribed by Bayes' rule. Moreover, while the coefficient on the
interaction term in the Other condition is not significant, the 0.45 in the Self condition is highly significant, indicating that
participants are more sensitive to positive news about themselves than they are to negative news. Specifically, their mean
belief moves 19% more (0.74 + 0.45 = 1.19) than prescribed by Bayes' rule, in stark contrast to their conservative response to
bad news.
Tellingly, all evidence of conservatism disappears when we omit the Not Updateable observations. In the Other condition, the coefficient on (b̄3* − b̄2) is greater than one in columns (2) and (3), significantly so when only the Truncated observations are included (3). Similarly, neither of the coefficients on (b̄3* − b̄2) deviates significantly from one in the Self condition. Furthermore, the asymmetric response to feedback in the Self condition appears to be wholly dependent on the Not Updateable outliers, as neither interaction coefficient is significant in the regressions omitting that category. Interestingly, when just the Not Updateable category is omitted, there is some asymmetry in the Other condition, with a marginally significant interaction coefficient of −0.37 suggesting that participants under-react to good news about the RSOP's performance.
To summarize, we observe a self-serving good-news–bad-news effect similar to that observed by Eil and Rao (2010) only
among those participants with the lowest quiz scores and unrealistically high pre-feedback beliefs. Among the remaining
participants, who constitute the vast majority in the Self condition, the response in mean belief is not distinguishable from
the Bayesian response. In contrast, most participants in the Other condition are moderately over-sensitive to bad news and
possibly less sensitive to good news, which is also consistent with Eil and Rao’s finding of self-serving bias in the response
to feedback on relative performance. Furthermore, those with the lowest quiz scores and unrealistically low pre-feedback
beliefs are conservative in their response to good news.

3.4. Learning transfer

In the previous sections we observed that deviations from Bayesian updating do not account for much of our participants’
post-feedback overestimation of their own scores and that the bulk of the remaining overconfidence is driven by overcon-
fident priors. From where does this overconfidence originate? Among all the studies we have cited, participants do update
their beliefs in the right direction in response to feedback, even if the manner in which they do so is conservative or biased,
so it remains a puzzle how overconfidence persists in the face of repeated instances of tasks and feedback.
The evolution of beliefs across the two repetitions of our quiz and belief-elicitation protocol suggests a channel through
which overconfidence may persist. When estimating performance on the second quiz, participants in the Self condition gain
less from their experience on the first quiz than do those in the Other condition. Thus, even in the absence of bias in the
processing of information about performance on a given quiz, participants’ updating of their underlying ability, from which
each performance is generated, may be subject to a bias that affects how they transfer learning from one situation to the
next.
Table 7 shows the average accuracy of performance estimates, measured in terms of overestimation and mean squared
error, exhibited in prior, interim and posterior beliefs, for the first and second quiz in each condition. The bottom row of
each panel summarizes the deviation from Bayesian posteriors, on average. In the Other condition, once participants take
the first quiz, there is no significant overestimation, on average, with mean overestimations of −0.19, −0.10, −0.02, −0.28,
and −0.01 across the subsequent five elicitations. However, in the Self condition, while overestimation is greatly reduced after first-quiz feedback, on the subsequent quiz it nearly regains first-quiz levels (1.18, 1.08, and 0.65). The minor improvement may be due to the experience of taking the first quiz, using the beliefs-elicitation instrument, and receiving and processing
feedback, or it might reflect slightly higher average scores on the second quiz in both conditions.
Analysis of the mean squared error shows that the experience and feedback from the first quiz may be more helpful to
participants in the Other condition. The mean squared error of the prior estimate drops from 12.52 to 11.07 in the Other
condition and from 10.11 to 8.26 in the Self condition. For the posterior elicitation, mean squared error in the Other condition
improves by 1.43, from 4.21 to 2.78, while it only improves by 0.10, from 3.71 to 3.61, in the Self condition. The improvement
in the Other condition is marginally significant (p = 0.052) according to a two-sample t-test, while the improvement in the
Self condition is not statistically significant.

18 Participants for whom f = b̄2 are categorized as receiving neither 'good news' nor 'bad news', and are therefore omitted. Defining 'good news' as feedback that exceeds the modal interim belief yields the same qualitative results.

Table 7
Average accuracy (overestimation and mean squared error) of beliefs by elicitation and quiz.

                            Other                      Self
Beliefs                     Overest.     MSE           Overest.     MSE
First quiz
  b̄1                        0.88        12.52          1.57        10.11
                           (0.41)      (1.88)         (0.33)      (1.10)
  b̄2                       −0.19       11.42           1.18        6.00
                           (0.41)      (1.65)         (0.26)      (0.79)
  b̄3                       −0.10        4.21           0.68        3.71
                           (0.21)      (0.69)         (0.21)      (0.74)
  b̄3 − b̄3*                 −0.35        1.02          −0.11        0.86
                           (0.17)      (0.56)         (0.11)      (0.50)
                            N = 54                     N = 50
Second quiz
  b̄1                       −0.02       11.07           1.18        8.26
                           (0.43)      (1.47)         (0.34)      (1.19)
  b̄2                       −0.28       11.80           1.08        6.70
                           (0.45)      (1.48)         (0.32)      (1.28)
  b̄3                       −0.01        2.78           0.65        3.61
                           (0.17)      (0.43)         (0.22)      (0.83)
  b̄3 − b̄3*                 −0.24        0.20           0.06        0.94
                           (0.11)      (0.42)         (0.16)      (0.37)
                            N = 45                     N = 40

Notes: Standard errors in parentheses. Comparison to Bayesian posteriors (in the fourth row of each panel) excludes observations in the Not Updateable category. For these comparisons, N = 44 and N = 47, respectively, in the top panel, and N = 37 and N = 31, respectively, in the bottom panel.

Subjects in the Self condition have a distinct advantage in learning from the first quiz to the second. Because participants in
the Other condition estimate the performance of two different people in the two quizzes, their experience with the first quiz
is less relevant for estimating performance on the second quiz than for those in the Self condition, who estimate their own
performance in both quizzes. More learning in the Other condition could be explained by private information about ability
in the Self condition, limiting the impact of information and experience on beliefs relative to those in the Other condition.
However, comparing the MSE of actual posterior beliefs to that of the Bayesian posteriors supports the argument that
the limited cross-game learning in the Self condition can be attributed to an ego-preserving bias, as opposed to private
information. In the Other condition, the posterior MSE on the first quiz significantly exceeds the MSE of the Bayesian posteriors, by 1.02 (Z = 1.82, p < 0.04), but by the second quiz the MSE no longer significantly exceeds that of the Bayesian posteriors (Z = 0.48, p = 0.32).
In the Self condition, the posterior MSE does not decline relative to the MSE of Bayesian posteriors, with the difference
increasing trivially from 0.86 to 0.94 from the first to the second quiz.
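The paper reports Z statistics for these comparisons without naming the exact test; one plausible implementation, sketched below under the assumption of a paired Z-test on per-participant differences in squared errors, is:

    import numpy as np
    from scipy import stats

    # Hypothetical squared errors of the actual and Bayesian posteriors for
    # the same participants (Not Updateable observations already excluded).
    sq_err_actual = np.array([4.0, 1.0, 9.0, 0.0, 4.0, 1.0, 2.25])
    sq_err_bayes = np.array([1.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.25])

    # Paired Z-test on the per-participant difference in squared errors.
    diff = sq_err_actual - sq_err_bayes
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
    p_one_sided = stats.norm.sf(z)  # H1: actual posteriors are less accurate
    print(f"Z = {z:.2f}, one-sided p = {p_one_sided:.3f}")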

4. Discussion and conclusion

We quantitatively evaluate the role participants feel luck plays in unbiased but noisy assessments of their performance on an intelligence-based task. On average, they believe their feedback to under-represent their performance by
some 13%. By measuring beliefs both before and after feedback, we are able to distinguish biased updating from Bayesian
responses to overconfident prior beliefs and the evidence is more supportive of the latter. Like Ertac (2011), we find that
biased information processing is concentrated among the subjects with the least accurate, and most overconfident, prior
beliefs. Participants accurately estimate the performance of a randomly selected colleague, and accurately believe the feed-
back their colleagues received to be on average neither lucky nor unlucky. They also update their beliefs, on average, in a
manner consistent with Bayes’ rule. Though, like Clark and Friesen (2009), we find that updating does not eliminate forecast
errors, participants do seem to learn well about their quiz performance from the feedback, which sharply reduces the degree
of overestimation.
In one sense, our results are consistent with the findings of Eil and Rao (2010) and Charness et al. (2011), namely that
overconfident posterior beliefs are more common and more severe when one’s own image is at stake. While Eil and Rao (2010)
and Mobius et al. (2011) find that biased updating drives their results, we, like Ertac (2011), find that for most of our sample, beliefs respond symmetrically to good news and bad news, so biased updating does not explain posterior overconfidence. Two features of our design may explain the difference. One possibility is that asymmetry in updating depends upon the information process, with the type of signals provided by Ertac and in our study creating an environment that undermines the self-serving bias.
Another possible explanation lies in the fact that our study focuses on absolute performance, while the other studies focus
on relative performance feedback, which simultaneously conveys information about participants’ own performance and that
of others. That subjects in the Other condition are marginally more responsive to negative feedback helps reconcile our study with the existing literature. A moderately optimistic interpretation of feedback about one's own performance, combined with a somewhat pessimistic interpretation of feedback about one's peers, may together shape how one interprets information about one's standing relative to others.

Table A.8
Performance and belief by quiz.

                          Obs.   s             b1            b2            b3

Other treatment
Quiz A    First quiz      10     5.40 (0.82)   5.87 (0.22)   5.54 (0.40)   5.33 (0.46)
          Second quiz     10     8.40 (0.50)   5.11 (0.42)   4.98 (0.45)   7.73 (0.56)
Quiz B    First quiz      22     4.27 (0.49)   6.29 (0.31)   4.41 (0.32)   4.42 (0.42)
          Second quiz     12     5.00 (0.43)   6.26 (0.31)   5.37 (0.37)   5.09 (0.49)
Quiz C    First quiz      10     5.70 (1.01)   5.35 (0.38)   4.75 (0.41)   4.98 (0.96)
          Second quiz     10     4.00 (0.61)   5.22 (0.37)   5.15 (0.33)   4.19 (0.59)
Quiz D    First quiz      12     6.42 (0.50)   6.56 (0.17)   5.98 (0.34)   6.34 (0.49)
          Second quiz     13     4.77 (0.64)   4.94 (0.36)   5.04 (0.49)   4.95 (0.64)

Self treatment
Quiz A    First quiz      10     6.40 (0.72)   6.84 (0.56)   6.77 (0.71)   6.74 (0.68)
          Second quiz     10     5.00 (0.91)   5.17 (0.83)   5.68 (0.91)   5.28 (0.91)
Quiz B    First quiz      20     4.00 (0.42)   6.47 (0.28)   5.48 (0.34)   4.74 (0.41)
          Second quiz     10     5.20 (0.63)   7.35 (0.19)   5.76 (0.39)   5.81 (0.50)
Quiz C    First quiz      10     4.10 (0.67)   5.97 (0.77)   5.52 (0.88)   5.09 (0.86)
          Second quiz     10     5.20 (0.71)   6.49 (0.58)   6.82 (0.54)   6.00 (0.69)
Quiz D    First quiz      10     6.20 (0.70)   6.82 (0.32)   7.38 (0.26)   6.81 (0.46)
          Second quiz     10     4.10 (0.82)   5.20 (0.59)   5.56 (0.47)   5.02 (0.59)

As participants’ ‘unlucky feeling’ in the Self condition is largely consistent with Bayes’ rule, we conclude that it is mainly
caused by overconfident prior beliefs. The way we generate feedback, as a uniform disturbance of the target score, is a departure from the binary feedback used in other studies, and its simple, uniform nature may help our participants conform to Bayes' rule to a greater degree than previously observed. In fact, our participants over-react slightly to the unbiased feedback, which, given their overconfident priors, might be expected to sharply curb posterior overconfidence.
This noise-generating process restricts the range of possible scores to those close to the value of the feedback, so the feedback is never "totally wrong." It also obviates Bayes' rule for a number of our participants, who received feedback they regarded ex ante as having zero probability; on average, the members of this group overestimated their scores more severely and performed more poorly on the quiz than the rest of the participants. Thus, our findings do not directly dispute the conclusions of other studies that find biased updating, which is likely present in many settings. Rather, they highlight how judgments of absolute performance differ from judgments of relative performance.
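To illustrate the updating problem this process creates, the sketch below computes a Bayesian posterior over quiz scores given feedback drawn uniformly from a window around the true score. The window half-width k, the quiz length, and the example prior are illustrative assumptions, not the paper's exact parameters:

    import numpy as np

    def bayesian_posterior(prior, feedback, k=2):
        """Posterior over scores 0..len(prior)-1 given feedback drawn
        uniformly from {s - k, ..., s + k} around the true score s. Assumes
        the disturbed value may fall outside the score range, so
        P(feedback | s) is constant on the window and an indicator is
        proportional to the likelihood."""
        scores = np.arange(len(prior))
        likelihood = (np.abs(scores - feedback) <= k).astype(float)
        unnormalized = np.asarray(prior) * likelihood
        if unnormalized.sum() == 0:
            # The prior puts zero mass on every feasible score: Bayes' rule
            # is silent, matching the paper's 'Not Updateable' category.
            return None
        return unnormalized / unnormalized.sum()

    # Example: an overconfident prior on a 10-question quiz gets feedback 5.
    prior = np.array([0, 0, 0, 0, 0, 0, 0.1, 0.2, 0.4, 0.2, 0.1])
    print(bayesian_posterior(prior, feedback=5))
    # The support collapses to scores within k of the feedback; feedback of
    # 3 would be a zero-probability event for this prior and return None.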
Our findings also refine our understanding of the persistence of overconfidence in the face of repeated feedback. Feedback
from the first quiz improves the accuracy of beliefs about the second in the Other condition, but not in the Self condition,
where it seems more relevant. Thus, performance feedback improves beliefs about the performance itself, but not about the process that generates the score on the second quiz, that is, one's quiz-taking ability. If feedback that affects beliefs about performances fails to alter beliefs about the aptitudes that generate them, overconfidence with respect to those aptitudes may persist even in the face of abundant and repeated feedback.
Outside the laboratory, one has a lifetime over which to accumulate consistent and repeated feedback from which to learn about one's own performance and attributes. Despite the presence of self-serving biases that hinder one's ability to interpret information relevant to one's own performance, individuals do learn about their performances. Thus, overestimation should diminish as age and experience provide repeated learning opportunities.
However, our results suggest that overconfidence may also persist because of the failure to interpret learning acquired in
one setting as relevant to another. We do not observe how strongly participants believe their performances are related to one
another, so we cannot quantitatively evaluate the process by which they learn from their feedback on one quiz about their
performance on the next. Still, the two quizzes are likely far more similar than many pairs of related real-world task-feedback
events, so our evidence suggests that self-serving biases may interfere with learning transfer outside the laboratory.

Appendix A.

See Table A.8.

Appendix B. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jebo.2012.08.006.

References

Banks, J.S., Sobel, J., 1987. Equilibrium selection in signaling games. Econometrica 55 (3), 647–661.
Benoît, J.-P., Dubra, J., 2011. Apparent overconfidence. Econometrica 79 (5), 1591–1625.

Burks, S.V., Carpenter, J.P., Goette, L., Rustichini, A., 2010. Overconfidence is a Social Signaling Bias. IZA Discussion Papers 4840. Institute for the Study of
Labor (IZA).
Camerer, C., Lovallo, D., 1999. Overconfidence and excess entry. American Economic Review 89, 306–318.
Charness, G., Rustichini, A., van de Ven, J., 2011. Self-Confidence and Strategic Deterrence. Working Paper, UC Santa Barbara.
Clark, J., Friesen, L., 2009. Overconfidence in forecasts of own performance: An experimental study. Economic Journal 119, 229–251.
Cooper, D., Kagel, J., 2008. Learning and transfer in signaling games. Economic Theory 34, 415–439, http://dx.doi.org/10.1007/s00199-006-0192-5.
Cooper, D., Kagel, J., 2009. The role of context and team play in cross-game learning. Journal of the European Economic Association, 1101–1139.
Cooper, D.J., Kagel, J.H., 2003. Lessons learned: Generalizing learning across games. The American Economic Review 93, 202–207.
Eil, D., Rao, J., 2010. The good news–bad news effect: Asymmetric processing of objective information about yourself. American Economic Journal:
Microeconomics.
Ertac, S., 2011. Does self-relevance affect information processing? Experimental evidence on the response to performance and non-performance feedback.
Journal of Economic Behavior and Organization 80, 532–545.
Greiner, B., 2003. An online recruitment system for economic experiments. Forschung und wissenschaftliches Rechnen 63, 79–93.
Grosswirth, M., Salny, A.F., Stillson, A., 1999. Match Wits with Mensa: The Complete Quiz Book (Mensa Genius Quiz), 1st ed. Da Capo Press, Cambridge,
Massachusetts.
Hoelzl, E., Rustichini, A., 2005. Overconfident: Do you put your money on it? Economic Journal 115, 305–318.
Kagel, J.H., 1995. Cross-game learning: Experimental evidence from first-price and English common value auctions. Economics Letters 49, 163–170.
Malmendier, U., Tate, G., 2008. Who makes acquisitions? CEO overconfidence and the market’s reactions. Journal of Financial Economics 89, 20–43.
Mobius, M.M., Niederle, M., Niehaus, P., Rosenblat, T.S., 2011. Managing Self-Confidence: Theory and Experimental Evidence. Working Paper 17014. National
Bureau of Economic Research.
Moore, D.A., Healy, P.J., 2008. The trouble with overconfidence. Psychological Review 115, 502–517.
Svenson, O., 1981. Are we all less risky and more skillful than our fellow drivers? Acta Psychologica 47, 143–148.
