
AUDITING: A JOURNAL OF PRACTICE & THEORY
American Accounting Association
Vol. 28, No. 1, May 2009, pp. 53–78
DOI: 10.2308/aud.2009.28.1.53

Improving Assessments of Another Auditor’s Competence
Noel Harding and Ken T. Trotman
SUMMARY: Auditing standards require auditors to assess the competence of their
colleagues. However, previous studies have shown that auditors’ assessments of
their colleagues’ competence are inaccurate and overconfident, potentially leading to
a reduction in audit effectiveness. In two related studies, we investigate both the pro-
cess by which these assessments are made and a potential intervention aimed at
improving these judgments. In study 1, we investigate the anchors used by senior
auditors in assessing the competence of their subordinates and peers, and find that
the anchors vary depending on the familiarity of the audit senior with their colleague.
These findings inform study 2, which investigates the impact of different types of out-
come feedback on auditors’ assessments of another auditor’s competence. We find
that the effects of individual-specific feedback and average-group feedback are
contingent on the nature of the relationship between the assessor and assessee. Spe-
cifically, individual-specific outcome feedback is effective in reducing overconfidence
when assessing the competence of a colleague with whom the assessor has previously
worked, but not an unfamiliar colleague. When assessing the competence of an un-
familiar colleague, we find that average-group outcome feedback is effective in reduc-
ing overconfidence. Our results complement and extend earlier theory by showing that
individuals, in assessing a colleague’s competence, use anchors in addition to the
competence of the assessor.
Keywords: competence assessments; outcome feedback; audit judgments;
overconfidence.
Data Availability: Contact the authors.

Noel Harding is an Associate Professor and Ken T. Trotman is a Professor, both at the
University of New South Wales.
We are grateful for helpful comments on earlier versions of this paper from Tom Dyckman, Kathryn Kadous
(associate editor), Robert Libby, Mario Maletta, Mark Nelson, Mark Peecher, Steve Salterio, Ira Solomon, Hun-
Tong Tan, Arnie Wright, and two anonymous reviewers. We also thank participants at the 2005 AFAANZ Annual
Conference, 2006 AAA Annual Meeting, and 2006 AAA ABO Conference, and workshop participants at Macquarie
University, The Shanghai University of Finance and Economics, and Cornell University. The
authors also acknowledge the financial support of the Australian Research Council.
Editor’s note: Accepted by Kathryn Kadous, under Dan Simunic’s editorship.

Submitted: November 2007
Accepted: October 2008
Published Online: May 2009


INTRODUCTION

It is a requirement of auditing standards that auditors consider the professional
competence of other auditors when delegating, directing, supervising, and reviewing
audit work (e.g., ISA No. 220; see International Auditing and Assurance Standards
Board 2005). These activities require an auditor to be familiar with the competence of not
only their subordinates but also their peers. Peers sometimes review each other’s work
(Rich et al. 1997b) and often work as teams in the production of a group output (Trotman
2005). Auditors also assess the work of peers when relying on the work of another audit
firm.
While auditors have been shown to be sensitive to the perceived competence of other
auditors (e.g., Bamber 1983; Bamber and Bylinski 1987; Gibbins and Trotman 2002), it
has also been shown that these perceptions are inaccurate and overconfident (Kennedy and
Peecher 1997; Jamal and Tan 2001; Tan and Jamal 2001; Han et al. 2007). That is, auditors
are reacting to inaccurate perceptions of their colleagues’ competence. This overconfidence
has the potential to adversely affect audit effectiveness in a number of ways.
Overstating competence may reduce audit effectiveness in several ways: auditors may be
assigned to tasks they are underqualified to perform; supervision, direction, and coaching
of audit personnel may be inadequate; and the review of their work may be less
comprehensive than actual audit competence would require.
In addition, Rich et al. (1997a) note that reviewers will use their knowledge of a preparer’s
competency (among other things) in reviewing work papers. If reviewers do not accurately
assess the preparer’s competency, they may come to inappropriate conclusions.
We investigate the anchors used by auditors in assessing the competence of their sub-
ordinates and peers, and then consider the impact of different forms of outcome feedback
for improving the accuracy with which assessments of a familiar and unfamiliar peer are
made. Drawing on the psychology (e.g., Fussell and Krauss 1991) and accounting (e.g.,
Kennedy and Peecher 1997) literature examining assessments of others’ knowledge and
literature on social categorization (e.g., Snyder and Uranowitz 1978), we argue that the
inputs into the process by which auditors assess the competence of other auditors differ
with the level of familiarity between the assessor and assessee. We further argue that while
outcome feedback may be effective in improving assessments of another’s competence, it
is necessary for the feedback to inform and improve the accuracy of the anchor in order
for it to be beneficial (Hammond et al. 1977). That is, the effectiveness of outcome feedback
depends on the extent to which the feedback informs the particular anchor being used.
While the situation in which auditors assess an unfamiliar colleague’s competence is not
as common as the assessment of a familiar colleague’s competence, it is not insignificant.
Addressing the unfamiliar case is important given the increased globalization of
audit firms and the frequent movement of staff between offices both nationally and inter-
nationally, in addition to short-term assignments across different divisions and industry
specializations within the one office.
Our paper consists of two studies. In study 1 we examine, using verbal protocol anal-
ysis, the process by which auditors assess the competence of both peers and subordinates.
Study 1 was undertaken in order to better understand the inputs into the judgment process
so that we know which information will be most useful in improving these judgments. The
results of this study revealed the importance of anchors when making assessments of an-
other auditor’s competence. We also found that the process of evaluating competence in an
audit setting differs depending on the level of familiarity between the assessor and assessee.
We found that when assessing the competence of a familiar colleague (either subordinate
or peer), assessors most commonly anchor on their perception of the specific competence
of the assessee. However, when assessing the competence of an unfamiliar colleague, as-
sessors most commonly anchor on their perception of the average competence of the peer
group from which the assessee is drawn (e.g., the competence of an average staff auditor
or senior). Consequently, in study 2 we consider both individual-specific feedback (i.e.,
feedback that relates specifically to the individual whose competence is being assessed) and
average-group feedback (i.e., feedback that relates to the average performance of a group
of individuals from which the person being assessed is drawn), given that either may im-
prove the accuracy of the inputs (i.e., anchor) and reinforce the judgment process depending
on the relationship between the assessor and assessee.
Study 2 examines the influence of three levels of feedback (no feedback, individual-
specific feedback, and average-group feedback) on an assessor’s accuracy, conditional on
two levels of assessee familiarity (familiar/unfamiliar with the assessee). Our theory and
results show that the provision of outcome feedback can be effective in improving assess-
ments of another auditor’s competence. However, this will not always be the case and
depends on both the type of outcome feedback and the relationship between the assessor
and the colleague whose competence is being assessed. Individual-specific feedback is
effective when assessing the competence of a colleague with whom the assessor is familiar,
but not when assessing the competence of a colleague with whom the assessor has not
previously worked (i.e., an unfamiliar colleague). When assessing the competence of an
unfamiliar colleague, average-group feedback is effective in improving competence assess-
ments. Thus, while intuitively one would expect that specific feedback would be more
informative than general feedback, we show that this is not always the case.
The major contributions of this paper are as follows. First, while there is evidence that
auditors are overconfident in assessing another auditor’s competence (Kennedy and Peecher
1997; Jamal and Tan 2001; Tan and Jamal 2001; Han et al. 2007), little is known about
how auditors assess the competence of another auditor, either a subordinate or a peer, and
the extent to which auditors are consciously aware of their decision processes. Previous
studies (e.g., Kennedy and Peecher 1997) provide evidence that, when predicting other
auditors’ task-specific performance, auditors anchor on perceptions of their own knowledge.
While our analysis highlights the role of an assessor’s own knowledge when predicting
task-specific performance, we suggest and find that overconfidence is also explained by
anchors in addition to auditors’ perception of their own knowledge. Second, this is the first
study to examine auditors’ verbal reports of their own decision processes while predicting
other auditors’ task-specific performance. Specifically, we employ verbal protocol analysis
to investigate what auditors anchor on when assessing the competence of a familiar and
unfamiliar subordinate and peer. Identifying a more comprehensive list of anchors used
when assessing the competence of other auditors (particularly anchors that auditors are
consciously aware of) increases the range of potential interventions aimed at reducing the
overconfidence identified in previous research. Third, we examine whether the provision of
different forms of outcome feedback can improve the assessment of another auditor’s com-
petence and how the effectiveness of these types of outcome feedback varies depending on
the familiarity of the assessee to the assessor.

STUDY 1
Background and Prior Research
The strategy with which individuals assess the competence of others, and therefore the
conditions necessary for different forms of outcome feedback to be effective, have received
some attention in both the psychology and accounting literatures. The limited psychology
literature reports results consistent with the understanding that assessors use an anchoring
and adjustment heuristic, anchoring on perceptions of their own knowledge, when assessing
the knowledge of others (e.g., Nickerson et al. 1987; Nickerson 1999).
Consistent with the Nickerson (1999) model, research in the accounting literature
(Kennedy and Peecher 1997; Han et al. 2007) theorizes and finds that assessors rely on
perceptions of their own knowledge to predict the knowledge of
their colleagues. That is, they use their own knowledge as the anchor and then use the
colleague’s category membership (based on hierarchical level within the audit firm) to make
any adjustment for perceived differences between themselves and the colleague. In this
regard, they propose a “knowledge gap” explanation to argue that the greater the difference
in knowledge, the greater the difficulty for assessors to realize that the colleague does not
possess the same level of knowledge as themselves and therefore the increased likelihood
of insufficient adjustments from the anchor. Following this argument, the greater the knowl-
edge gap, the greater the overconfidence.
Kennedy and Peecher (1997) found that, while both seniors and managers were over-
confident in their predictions of their subordinate’s knowledge, seniors were more overcon-
fident in their staff auditor’s knowledge (high knowledge gap) than managers in their
senior’s knowledge (low knowledge gap). They also found, consistent with the understand-
ing that an auditor’s own knowledge is one of the anchors relied upon when predicting the
knowledge of others, that a superior’s prediction of a subordinate’s knowledge is highly
correlated with the superior’s confidence in their own knowledge.
Tan and Jamal (2001) found that when evaluating the quality of a memo written by an
outstanding and average senior (as identified by the firm from which the participants were
drawn), managers more favorably evaluated the memo written by the outstanding senior
only when the identity of the memo preparer was known. When the identity of the preparer
was concealed, there were no differences in the level of evaluation between the outstanding
and average senior. Tan and Jamal note that “this result is consistent with the auditors
assessing the quality of their subordinate’s work at least partially based on the perceived
competence of these subordinates” (Tan and Jamal 2001, 100). This suggests that auditors
may use multiple anchors. Other psychology research (Nickerson 1999; Switzer and Sniezek
1991; Whyte and Sebenius 1997) also notes the use of multiple anchors. Thus, while
auditors may anchor on their own knowledge when assessing the knowledge of others,
other anchors may be used, either as a substitute for, or together with, the assessor’s own
knowledge. In study 1, we identify some previously unidentified anchors.
The question of which anchors are used in particular situations can be informed with
reference to the social categorization literature. This literature suggests that in order to
manage their interaction with the social world, individuals develop and use social categories
(e.g., Malt et al. 1995). Social categories contain the features believed to be common to
those individuals forming part of that social group (i.e., exemplars) (Moskowitz 2005).
Categories are argued to be hierarchically structured, with some overlap across the
different hierarchies (e.g., Rosch et al. 1976; Taylor and Crocker 1981; Lingle et al. 1984).
Higher level categories are more abstracted representations of lower level categories. With
extensive interaction with a particular person, a social category (or exemplar) representing
the typical features of only that individual will develop (Smith and Zarate 1992; Chen
2001). Assessors may, therefore, attribute either very specific or very general
characteristics to an individual as the anchor, depending on the specificity of the social
category into which that individual is placed. The effectiveness of outcome feedback in this setting, therefore,
depends on the extent to which the feedback informs the particular anchor being used.
Given the necessity to interact with colleagues in the course of completing an audit,
and the clearly defined hierarchical structure of audit firms, auditors are expected to have
a very structured series of social categories that begins with auditors in general, followed
by social categories for managers, seniors, and staff, then working down to categories or
exemplars represented by the typical characteristics of individuals with whom the auditor
often works. Study 1 explores this in relation to an auditor’s assessment of another auditor’s
competence.

Hypothesis Development
Although previous research has suggested that auditors are likely to anchor on their
own perceived knowledge levels, we propose that anchors in addition to the assessor’s own
knowledge may be employed. We propose that the anchor depends on the category mem-
bership of the colleague whose knowledge is being predicted. In situations where little, if
anything, is known about the individual, the individual will be placed into a very general
social category (i.e., superordinate category) (Oakes et al. 1991; Blanz and Aufderheide
The lack of knowledge about the other person will preclude a more specific
categorization into a lower level category. The assessor will attribute to the individual the
competence perceived to be typical of that general social category and then adjust for any
perceived differences that would distinguish the individual from the typical group member
(if adjustment was considered necessary). This is the context within which the limited
psychology literature has examined the issue (e.g., Fussell and Krauss 1992), that is, as-
sessors estimating the knowledge possessed by others with whom they are unfamiliar. In
such a situation, the only option is to attribute to the individual being assessed the typical
knowledge possessed by members of the superordinate social category to which they are
perceived to belong.
In an audit setting, at a minimum, the hierarchical level of the assessee will be known.
When an auditor (in this study, a senior) assesses the competence of another auditor (a
peer senior auditor or a subordinate staff auditor), it is proposed that the assessee will
initially be placed in the broad superordinate social category corresponding to their hier-
archical level. In the situation where little is known about the auditor being assessed (i.e.,
the assessor has not previously worked with the assessee), it will not be possible to place the
individual into a more detailed lower level subordinate social category. Therefore, when
assessing the task-specific competence of an unfamiliar senior auditor (i.e., peer), seniors
will use the competence believed to be typical of senior auditors in general as their anchor.
Similarly, when assessing the task-specific competence of an unfamiliar staff auditor (i.e.,
subordinate), seniors will use the general competence believed to be typical of staff auditors
in general as their anchor. This leads to the following two hypotheses.

H1a: When assessing the task-specific competence of a peer with whom the
assessor has not previously worked, seniors use their perception of the com-
petence of seniors in general as the anchor.
H1b: When assessing the task-specific competence of a subordinate with whom
the assessor has not previously worked, seniors use their perception of the
competence of staff auditors in general as the anchor.

On other occasions, more will be known about the other auditor being assessed. The
assessor and assessee may have, for example, worked together on a number of audit en-
gagements, the assessor may have supervised and/or reviewed the work of the assessee,
and they may have attended the same or similar training sessions. These experiences allow
for the development of more detailed lower level subordinate social categories and the
classification of individuals into those categories. When assessing the task-specific
competence of such an auditor, assessors use their perception of the specific auditor’s
overall competence as the anchor.
It will, however, still be necessary to make an inference about the peer’s or subordi-
nate’s competence. Even if the assessor has previously experienced the assessee’s compe-
tence in the specific area, the time that has passed since that exposure allows for, among
other things, knowledge to have been forgotten, new knowledge to have been learned, and
additional experience to have been gained. This leads to the following two hypotheses.

H2a: When assessing the task-specific competence of a peer with whom the
assessor has previously worked, seniors use their perception of the specific
peer’s overall competence as the anchor.
H2b: When assessing the task-specific competence of a subordinate with whom
the assessor has previously worked, seniors use their perception of the
specific subordinate’s overall competence as the anchor.
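
The four hypotheses together describe a simple anchor-selection rule: the assessee's
familiarity determines whether a generic-category or assessee-specific perception serves
as the anchor. The sketch below is only a schematic of that rule, not a model the authors
estimate; the function name and the perception scores are hypothetical illustrations on
the study's 0–100 percent confidence scale.

```python
def select_anchor(familiar: bool, assessee_level: str, perceptions: dict) -> int:
    """Anchor choice under H1a/H1b (unfamiliar: generic hierarchical category)
    and H2a/H2b (familiar: the specific colleague's overall competence)."""
    if familiar:
        # H2a/H2b: anchor on perceptions of this particular colleague.
        return perceptions[("specific", assessee_level)]
    # H1a/H1b: anchor on the competence typical of the hierarchical category.
    return perceptions[("generic", assessee_level)]

# Hypothetical perceived-competence scores (0-100 percent confidence).
perceptions = {
    ("generic", "senior"): 70, ("generic", "staff"): 50,
    ("specific", "senior"): 80, ("specific", "staff"): 45,
}
print(select_anchor(False, "senior", perceptions))  # H1a: unfamiliar peer -> 70
print(select_anchor(True, "staff", perceptions))    # H2b: familiar subordinate -> 45
```

Adjustment from the selected anchor (for perceived differences between the assessee and
the anchor's referent) would then follow, as described in the text.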

Research Methods
Participants
In total, 20 seniors from Australian offices of four international accounting firms par-
ticipated in the study. The mean experience of the participants was 33.4 months (range:
14–48 months). Participants were given an A$50 gift voucher. Verbal protocols were col-
lected individually.

Research Design
The hypotheses were examined with a 2 (familiarity) × 2 (hierarchical relationship)
design. The two familiarity levels were: familiar with the assessee and unfamiliar with the
assessee. The two hierarchical relationship levels were: a peer of the assessee (i.e., senior)
and a subordinate of the assessee (i.e., staff auditor). Both factors were manipulated within
subjects. The recording equipment failed for one of the protocols provided by one of the
participants. In total, therefore, we analyze 79 protocols.
Participants were required to assess the competence of four other auditors reflecting
two levels of familiarity and two levels of hierarchical relationship as follows: (1) a senior
(with approximately four years of audit experience) with whom the assessor had worked
in the last six months; (2) a senior (with approximately four years of audit experience) with
whom the assessor had not previously worked; (3) a staff auditor (with approximately one
year of audit experience) with whom the assessor had worked in the last six months; and
(4) a staff auditor (with approximately one year of audit experience) with whom the assessor
had not previously worked. They were asked to select colleagues who matched the familiar
descriptions and write these auditors’ first names only onto a “staff card” that could be
referred to while the materials were being completed.1 A description of two other
“fictitious” auditors was also provided. The unfamiliar staff auditor was described as
follows: “Lee is a staff auditor with one year of audit experience. Lee transferred to your
group from interstate.” The unfamiliar senior was described as follows: “Chris is a senior
auditor with four years of audit experience (one year as a senior). Chris transferred to
your group from interstate.”

1 Procedures were put in place to ensure that no one, other than the participants themselves, would ever know
the identity of the colleague whose competence they assessed.


The assessment of competence was operationalized by presenting audit circumstances
that might be encountered during the conduct of an audit. These are circumstances that
warrant the attention of the reviewing senior (for tasks completed by staff auditors) or
require revisions to the audit plan (for tasks completed by senior auditors). Audit compe-
tence is reflected in whether the staff auditor or senior auditor being assessed would bring
the matter to the attention of the reviewing senior or revise the audit program, respectively.

Case Development and Procedures
Each of the four cases outlined a circumstance that an auditor should be vigilant for
when conducting the audit. Participants were asked to assess their confidence in the staff
or senior auditor identifying and appropriately acting on the audit issue discussed in each
case. Participants responded on an 11-point scale anchored by “100 percent confident” and
“0 percent confident.”
The first staff auditor case (Case 1) involved attendance at the physical inventory count.
Participants were asked to report their confidence that if the staff auditor being evaluated
saw large amounts of inventory in difficult to reach locations, the staff auditor would iden-
tify this as being an issue and bring it to the attention of the reviewing senior. The specific
words used in the case were:
When observing the physical inventory count, the auditors observing the client’s count need
to be alert for the existence of events or circumstances that should be documented and
brought to the reviewing senior’s attention. These circumstances are not usually specified,
or are only generally discussed in the audit program that is followed by audit staff. An
example would be circumstances that may suggest valuation concerns such as inventory in
difficult to reach locations within the warehouse.
The second staff auditor case (Case 2) involved the performance of cut-off tests. Par-
ticipants were asked to report their confidence that if the staff auditor tested transactions
that involved a related party transaction, the auditor would identify this as being an issue
and bring it to the attention of the senior. The specific words used in the case were:
When performing cut-off tests, the auditor performing the work should be alert for the
existence of events or circumstances that should be documented and brought to the reviewing
senior’s attention. These circumstances are not usually specified, or are only generally dis-
cussed in the audit program that is followed by audit staff. An example would be transactions
involving a previously identified related party.
The first senior case (Case 3) related to planning issues following the identification of
an unusual fluctuation in sales returns and allowances. This fluctuation has implications for
a number of accounts including the valuation and existence of debtors, the valuation of
inventory, and the measurement of warranty liabilities. Participants were asked to indicate
their confidence that the auditor being assessed would identify the implications for the
related accounts. The specific words used in the case were:
When conducting analytical review, unusual fluctuations often have implications for accounts
other than that for which the fluctuation was identified. One example would be the identi-
fication of excessive sales returns and allowances. Such a fluctuation has implications for
the valuation of inventory and measurement of warranty liabilities.
The second senior case (Case 4) was concerned with planning issues surrounding the
possibility that debtors were understated. Participants were asked to indicate their confi-
dence that the auditor being assessed would identify the need for additional tracing, not
vouching. The specific words used in the case were:


When amending the audit program following concerns arising from analytical review, the
senior must ensure that the additional testing is consistent with the audit objective being
pursued. One example would relate to the possibility of debtors being understated. In such
a situation, it would be appropriate to, for example, trace sales invoices to debtors listing
(additional tracing) rather than increasing the debtors circularization sample size (additional
vouching).
The order in which the four assessments were made was randomized. In addition, the
two staff auditor and two senior auditor cases were first counterbalanced across the two
staff and senior assessments, respectively.2
Protocol Coding and Analysis
The research materials were administered by one of the researchers separately for each
participant. Consistent with the recommendations of Ericsson and Simon (1993), warm-up
(practice) exercises were employed in order to familiarize participants with the requirement
to “think aloud.” During the completion of the cases, the only interaction between the
researcher and the participant was a prompt to continue thinking aloud when they fell silent
for a period of 10 to 15 seconds.
All protocols were initially transcribed verbatim and then were broken up into discrete
protocol episodes representing a unique thought, word, or comment. This led to a total
of 870 protocol episodes. In order to code the protocol episodes, three broad categories of
operators were defined: information acquisition, processing, and decision. Information ac-
quisition referred to those operators that involved the identification and/or retrieval of in-
formation the participant believed necessary to complete the task. This information may
have been retrieved from memory or from the case materials. Processing operators involved
the use of information as the decision maker moved toward their judgment. Finally, decision
operators involved the use of information to make the decision required in a particular
situation.
Two coders, one of the researchers and an academic colleague with public accounting
experience, independently coded each of the protocols. Both coders were blind to the ex-
perimental condition from which each protocol was derived. The kappa coefficient was
0.83, representing a high level of reliability (Cohen 1960). The disagreements between
coders were discussed and resolved. The final agreed upon coding was used to derive a
decision path (flowchart) for each judgment describing the information used, when the
information was used, and how the information was used.
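
The reliability statistic reported above follows Cohen (1960): agreement between the two
coders, corrected for the agreement expected by chance from each coder's marginal
category rates. As an illustration only, here is a minimal sketch of that computation; the
`coder_a`/`coder_b` episode labels below are hypothetical and are not the study's data.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's (1960) kappa: chance-corrected agreement between two raters."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed proportion of episodes on which the two coders agree.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected if each coder assigned categories independently
    # at their own marginal rates.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of eight protocol episodes into the three broad
# operator categories (information acquisition, processing, decision).
coder_a = ["acq", "acq", "proc", "dec", "proc", "acq", "dec", "proc"]
coder_b = ["acq", "acq", "proc", "dec", "acq", "acq", "dec", "proc"]
print(round(cohen_kappa(coder_a, coder_b), 3))  # -> 0.81
```

A kappa above 0.80, as reported, is conventionally read as a high level of coder
reliability.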
To identify the anchor employed in each judgment, the information acquisition opera-
tors were examined for each protocol with a view to identifying any such operator that
involved a statement of possible value for the competence decision that was to be made.
Examples of such statements include “Auditor A is very good,” “Auditor A has a lot of
experience,” and “Senior auditors would be able to do this.” Once the particular information
acquisition operators were identified, the decision paths were examined so as to eliminate
any such information acquisition operator verbalized after processing had taken place (as
this would relate to an adjustment from a different anchor).3

2 The randomly determined orders were fairly well balanced across the 20 participants, with no one particular
order being represented on more than three occasions. It does not appear that the use of an anchor in one
condition influenced the anchor used in other conditions. No participant used the same anchor more than once,
and a review of the protocols revealed no instances of a participant referring to a previous case.
3 An independent coder blind to the study’s hypotheses also examined each protocol with a view to identifying
the anchor employed in each case. On only four occasions out of a total of 79 protocols did the independent
coder reach a conclusion different from that originally arrived at, and analyzing the data using the independent
coder’s conclusions did not change any statistical inferences.


Results
The above analysis allowed for the identification of an anchor in 71 of the 79 protocols.
There were eight occasions where participants immediately made a decision and the pro-
tocols did not reveal how that decision was made. The anchors employed in each of the
four research conditions are reported in Table 1.
Underlying our hypotheses was the belief that the relationship between the assessor
and assessee determines the anchor employed when assessing another auditor’s competence.
The bolded cells in Table 1 highlight the observations that were consistent with the hy-
pothesis in the four research conditions. Of all observations where an anchor was identified
(total 71), 72 percent (χ² = 13.53, p < .001) were consistent with expectations reflected
in the four hypotheses. In addition, the choice of either a generic or assessee-specific anchor
was not independent of the assessor-assessee relationship (p < .001, Fisher’s exact test). As
expected, when assessing the competence of a familiar colleague, auditors preferred an
assessee-specific anchor, but in the case of assessments of an unfamiliar assessee, they
preferred a generic competence anchor (see Table 1).

TABLE 1
Identified Anchors in Each Research Condition

                                                      Peer        Subordinate  Peer      Subordinate
                                                      Unfamiliar  Unfamiliar   Familiar  Familiar
Generic competence of seniors                             14           —           2          —
Generic competence of staff auditors                       —          10           —          2
Specific competence of individual peer                     —           —          15          —
Specific competence of individual subordinate              —           —           —         12
Assessor's competence                                      1           1           —          1
Competence of another auditor (a)                          1           3           —          —
Competence/dedication of Asian cultures                    —           1           —          —
Zero confidence (b)                                        3           2           —          —
Scale midpoint (c)                                         1           —           1          —
Non-competence-based characteristics of assessee           —           —           —          1
Subtotal                                                  20          17          18         16
No identifiable anchor (d)                                 —           2           2          4
Total                                                     20          19          20         20
Chi-Squared (χ²) Statistic (e)                          3.20        0.53        8.00       4.00
Significance                                            .074        .467        .005       .046

Bold indicates observations consistent with the hypothesis.
Hypothesis 1a: peer, unfamiliar research condition.
Hypothesis 1b: subordinate, unfamiliar research condition.
Hypothesis 2a: peer, familiar research condition.
Hypothesis 2b: subordinate, familiar research condition.
(a) This anchor was the competency of another auditor (same hierarchical level) with whom the assessor was familiar.
(b) These participants used zero confidence as their starting point.
(c) These participants used the scale midpoint as their starting point.
(d) In these eight cases, participants immediately made a decision and the protocols did not reveal how that decision was made.
(e) This test examined whether there was a greater than chance preference for an anchor consistent with that proposed in the hypothesis as compared with the aggregate of all other identified anchors. That is, a 2 × 1 contingency table.


Hypothesis 1a proposed that when assessing the competence of a peer with whom the
assessor had not previously worked, assessors would use their perception of the competence
of seniors in general as the anchor. Table 1 reveals that 14 of the 20 observations revealed
this anchor. A chi-squared test of proportions revealed that there was a greater than chance
preference for this anchor (χ² = 3.20, p = .074).4 Hypothesis 1a is, therefore, supported
at the 10 percent level.
Hypothesis 1b proposed that when assessing the competence of a subordinate with
whom the assessor had not previously worked, assessors would use their perception of the
competence of staff auditors in general as the anchor. Table 1 reveals that although 10 of
the 17 observations were consistent with this expectation (and the next most common
anchor was represented on only three occasions), there was no statistically significant
preference for the generic competence of staff auditors as the anchor (χ² = 0.53, p = .467).
Hypothesis 2a predicted that when assessing the competence of a peer with whom the
assessor had previously worked, the assessor would use the specific peer’s competence as
the anchor. Of the 18 observations exhibiting the use of an anchor in this research condition,
15 observations revealed the use of the specific peer’s competence as the anchor. There
was a greater than chance preference for the specific competence of the peer as the anchor
(χ² = 8.00, p = .005). Hypothesis 2a is, therefore, supported.
Hypothesis 2b proposed that the specific competence of the assessee would be used as
the initial reference point when assessing the competence of a subordinate with whom the
assessor had previously worked. Table 1 reveals that 12 of the 16 observations where an
anchor was evident were consistent with the use of this anchor. There was a greater than
chance preference for using the specific competence of the assessee as the anchor
(χ² = 4.00, p = .046). Hypothesis 2b is, therefore, also supported.
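The 2 × 1 chi-squared tests of proportions reported above can be reproduced directly from the counts in Table 1. The sketch below is ours (not the authors' code) and uses only the Python standard library; for one degree of freedom, the chi-squared p-value has a closed form via the complementary error function.

```python
import math

def chi_squared_2x1(consistent, total):
    """2 x 1 test of proportions: hypothesis-consistent anchors vs. the
    aggregate of all other identified anchors, against a chance (even) split."""
    other = total - consistent
    expected = total / 2
    chi2 = (consistent - expected) ** 2 / expected + (other - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-squared survival function reduces to
    # p = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts of hypothesis-consistent anchors taken from Table 1
for label, consistent, total in [("H1a", 14, 20), ("H1b", 10, 17),
                                 ("H2a", 15, 18), ("H2b", 12, 16)]:
    chi2, p = chi_squared_2x1(consistent, total)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Running this reproduces the reported statistics (e.g., χ² = 3.20, p = .074 for H1a and χ² = 8.00, p = .005 for H2a).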
In this study, the assessor’s knowledge or competence was identified as an anchor in
only three of the 79 cases (see Table 1). This is not to suggest that an auditor’s own
knowledge is unimportant when making assessments of another auditor’s competence. Au-
ditors may have multiple anchors, including perceptions of their own knowledge. To the
extent that an anchor is not consciously considered, or the assessor’s own knowledge in-
fluences the specification of the identified anchor, the protocols might not refer to an as-
sessor’s own knowledge.5 In addition to an examination of the role of outcome feedback
as it relates to the anchors identified in study 1, study 2 also investigates the role of the
assessor’s own knowledge when predicting the performance of others.

STUDY 2
While research in psychology has found outcome feedback to be of only limited benefit,
and then only for simple tasks (e.g., Balzer et al. 1994; Kluger and DeNisi 1996), research
in accounting domains has found that where the decision maker has task experience and
the task has a high level of predictability, outcome feedback can have a positive impact
(Hirst and Luckett 1992; Bonner and Walker 1994; Earley 2001). In study 2, we consider
which type of outcome feedback will be most effective in improving assessments of the

4 In this and subsequent analysis, we report the results of a conservative 2 × 1 contingency table test of proportions, with an anchor consistent with the hypothesized expectation representing one category and all other anchors collapsed into the other category.
5 In addition, it should be noted that the circumstances in which assessments were made in the current study were different from those present in Kennedy and Peecher (1997). In Kennedy and Peecher, participants answered the multiple-choice questions and then immediately assessed the likelihood that a subordinate would answer the questions correctly. In the current study, auditors were not asked to first complete the four cases before predicting their colleagues' responses to the cases. In this situation, the assessors' own knowledge is a less salient cue than was the case in Kennedy and Peecher (1997).


competence of a peer. As noted earlier, peer assessments occur in group settings within the
one firm and in reviewing the work of peers when relying on the work of another audit
firm.
In study 1, we found that different anchors were used in different situations. When
seniors assess the task-specific competence of another senior with whom they are familiar,
they rely on perceptions of this senior’s overall competence as an anchor. When assessing
the task-specific competence of an unfamiliar senior, the anchor relied on becomes the
average competence of seniors in general. We draw on these results to inform us of which
types of outcome feedback will be most useful given different levels of familiarity with the
assessee. Two types of outcome feedback are examined: ‘‘individual-specific feedback’’ and
‘‘average-group feedback.’’ Individual-specific feedback relates to the particular individual
whose competence is being assessed, while average-group feedback relates to the averaged
performance of a group of individuals to which the individual belongs. We choose these
two types of feedback because they provide information related to the two types of an-
chors used by auditors in assessing the task-specific competence of their peers, as found
in study 1.

Hypothesis Development
First we consider the situation when an auditor is assessing the task-specific competence
of someone they are familiar with and they use their perception of this specific auditor’s
competence as the anchor (as found in study 1). In this situation, individual-specific feed-
back will reinforce the process being used and provide the assessor with an opportunity to
improve the accuracy of the cue (i.e., anchor). This, we argue, will lead to improved
performance. The provision of average-group feedback, on the other hand, will not be
consistent with the anchor being used. Such feedback will not support any meaningful
improvement in the accuracy of the cues being used and, because it is inconsistent with
the strategy being employed, it may encourage the decision maker to change strategies in
an attempt to exploit the perceived informativeness of the feedback provided.
The opposite is argued to be true when an auditor assesses the performance of an
unfamiliar peer. Study 1 revealed that in these situations, the anchor employed is the average
competence of the group of individuals to which the assessee belongs. Here, individual-
specific feedback will not be consistent with the anchor used and the strategy being em-
ployed. It will not support any meaningful improvement in the accuracy of the predictions.
For an unfamiliar peer, it is average-group feedback that will improve the accuracy of the
anchor and reinforce the strategy being used. These expectations are expressed in H3a and
H3b.

H3a: When predicting familiar peers’ performance, auditors are more accurate
when provided with individual-specific feedback rather than either average-
group feedback or no feedback.
H3b: When predicting unfamiliar peers’ performance, auditors are more accurate
when provided with average-group feedback rather than either individual-
specific feedback or no feedback.

Method
Participants
Participants in this study were part-time graduate students studying in an Australian
accounting master’s degree program taught in China. Most participants were in full-time


employment. Participants were drawn from two locations, Beijing and Guangzhou. Participants in Beijing participated in the entire study (i.e., stage one and stage two); participants in Guangzhou were involved only in the first stage.6 Entry requirements into the postgraduate program in China are the same in both Beijing and Guangzhou and are consistent with those applied in Australia; the curriculum is the same in both cities; instruction is in English; and students in both locations receive the same qualification. Participants in this study were completing the final subject (Auditing and Assurance Services) of their master's program.
There are a number of important aspects of the postgraduate program in China that
make our sample of graduate students appropriate to this study. First, we have no reason
to expect that characteristics distinguishing graduate students from auditing practitioners
will interact with either feedback type or familiarity and, therefore, use auditing students
to test our theory (e.g., Peecher and Solomon 2001).7 Second, the Beijing and Guangzhou
programs are virtually identical except for the fact that they are taught in different cities.
While students are not aware of the specific individuals studying in the other city, students
know that they are studying an identical program with identical entry standards (related to
past academic record and work experience) and expectations as to progress. The curriculum
is the same, and the academics teaching each course are normally the same. This is not
dissimilar to the situation in which a firm’s offices located in different geographical regions
have consistent recruitment, evaluation, promotion, and training programs. Third, the fact
that students in each location work closely with a small group of their classmates on a
number of class exercises and assignments (and often study together) presents a situation,
again, not dissimilar to that in audit firms where staff work together on a number of client
assignments, study together toward their professional qualifications, and become familiar
with the competence of their colleagues.
In total, 80 participants in Beijing and 27 participants in Guangzhou completed the
stage one materials, comprising 48 randomly ordered multiple-choice audit questions (see
below). In stage two, 80 participants from Beijing completed the research materials.8 The
average age of the participants was 30.4 years, and three-quarters were female.9 Specific
procedures were put in place to ensure the confidentiality of the participants and their
responses. Of the 80 usable responses in stage two, 29 were in the no feedback condition,
25 in the individual-specific feedback condition, and 26 in the average-group feedback
condition.

6 As discussed below, there were two stages in the administration of the study. Stage one provided data that was necessary for the operationalization of stage two.
7 In addition, the operationalization of the experimental manipulations would have been extremely difficult if practitioners were to be used as participants. In such a study, it would be necessary to obtain objective measures of individual auditor performance, then have other auditors predict that performance, followed by either individual-specific feedback or average-group feedback. Exploratory discussions with firms indicated that the sensitive nature of the data was a major impediment to their involvement in the project.
8 This was the same group of participants who completed the stage one materials. From a total available population of 94 students, 11 were unwilling or unable to participate. In addition, three students did not provide complete responses and were, therefore, not included in the analysis. This yielded a usable sample of 80 participants. Of the 80 participants who completed stage two, five were not participants in stage one. Excluding these participants does not change our statistical inferences.
9 There were four participants who did not indicate their year of birth. Three participants did not indicate their gender. The gender imbalance is also reflected in the entire student population from which the sample was drawn. There is also no evidence to suggest that the proportion of females to males differed across research conditions (χ² = 0.310, p = 0.856).


Experimental Design
The hypotheses were examined with a 3 (feedback type) × 2 (familiarity) design.
Feedback was manipulated between subjects, while familiarity was manipulated within
subjects. Feedback was varied across three levels: no feedback (control), individual-specific
feedback, and average-group feedback. Individual-specific feedback is feedback relating
specifically to the competence (past performance) of the assessee. Average-group feedback
is the averaged competence (past performance) of all members of the group (social cate-
gory) to which the assessee belongs.
Familiarity was manipulated within subjects across two levels: familiar with the asses-
see and unfamiliar with the assessee. We operationalized this manipulation by asking par-
ticipants to predict the performance of two individuals: one they selected from a team they
had previously worked with, and the other was an unfamiliar person for whom only general
information was known. In the familiar case, the assessor was familiar with the prior work
performance of the assessee, but not necessarily the specific area where performance was
being predicted. In the unfamiliar case, the assessor was not familiar with the work per-
formance of the specific individual being assessed but was aware of the social category
from which the person was drawn.

Research Instrument and Procedures


Stage one involved participants completing a multiple-choice questionnaire containing
48 audit questions. This allowed for a criterion measure of prediction performance (it is
for these questions that the assessor predicted whether the assessee provided a correct
answer). These questions were based on material that had been covered in the just-
completed first half of the auditing course, were deliberately selected so as to vary in terms
of difficulty, and covered a range of audit topics. The questions were randomly ordered for
each participant, and a time limit of one hour was set for their completion.
Stage two was administered two weeks after the completion of stage one. All partici-
pants who were completing the audit course in Beijing were randomly allocated to one of
the three feedback conditions. Participants were reminded of the fact that two weeks earlier
they had completed a multiple-choice questionnaire containing 48 questions, and informed
that they would be predicting whether other participants answered those same questions
correctly.
Familiarity was operationalized by having the participants predict the performance of
a student selected at random from Guangzhou (unfamiliar) and also predict the performance
of a student with whom they had worked in completing the group assignment for the course
they were studying at the time (familiar).10 To provide participants with some general
information enabling them to establish the social category from which the unknown student
was drawn, participants were told the following information about the unfamiliar stu-
dent: they were studying accounting in the Guangzhou program, they had passed all subjects
to date, and they had studied the same topics in the auditing course as the Beijing students.
In this way, the assessor is aware of the general social category from which the person is
drawn, but not the specific person.

10 In order to maintain confidentiality, two names were randomly selected from the group in which the participant completed the auditing assignment. Participants were free to select either student and did not reveal or record this student's name on the research materials.


The 48 questions were divided into three blocks of 16 questions.11 For participants in
all conditions, the 48 questions were printed on separate pages (together with the correct
answer) and were randomly ordered within each of the three, 16-question assessment
blocks. The 48 questions, together with space to provide the likelihood assessments, were
presented to the participants in one complete package. Participants were instructed to com-
plete the predictions in the order provided.
Under each question and answer, there were places for participants to indicate their
assessment of whether the familiar and unfamiliar assessee would have answered the ques-
tions correctly. The order of the assessments (i.e., familiar and unfamiliar) was randomized
across participants. To avoid unnecessary confusion, the order did not vary for each indi-
vidual participant. At the completion of the experiment, participants completed a brief exit
questionnaire and a manipulation check on the familiarity variable. For each of the two
students whose performance was predicted, participants indicated their level of familiarity
on a seven-point scale anchored by 1, ‘‘not at all familiar,’’ and 7, ‘‘very familiar.’’
The experiment was administered by one of the authors across five sessions over two
days. The participants, who were randomly allocated to each treatment group, were advised
in advance of which session they should attend. In the first session, all participants in the
no feedback condition completed the materials. The average-group feedback condition was
given in the following two sessions, and the individual-specific feedback condition was given
in the final two sessions.12 At the end of each session, all participants were instructed not
to discuss the nature of the study with other students.
In the average-group feedback condition, average-group feedback was given immedi-
ately following the predictions for each question. After making the two predictions for each
question (i.e., for the familiar and unfamiliar colleagues), participants gave the sheet on
which the likelihood predictions were made to a research assistant who, in turn, gave the
participant a sheet containing average-group feedback. Average-group feedback was oper-
ationalized by providing statistics on what percentage of students responded A, B, C, or D
for each question. The feedback sheet also included the question and correct answer to-
gether with the original likelihood assessments provided by the participant (placed on the
sheet by the research assistant). The participant reviewed the feedback and, when ready,
returned the feedback sheet to the research assistant and moved on to the next question.13
An example of the feedback provided in this condition is presented in Figure 1.
The individual-specific feedback condition used the same procedures as those for the
average-group feedback condition except for the fact that individual-specific feedback was

11 The 48 questions were first ranked in order of difficulty (based on the answers/performance from stage one). Following Hirst et al. (1999), the questions were then systematically allocated to each of the three blocks by allocating the most difficult question to block one, the second most difficult question to block two, the third most difficult to block three, the fourth most difficult to block three, the fifth most difficult to block two, and so on. The purpose was to maintain equal difficulty across blocks.
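The systematic allocation described in this footnote is a serpentine (snake) ordering, which can be sketched as follows; the function name and the assumption that questions arrive sorted from most to least difficult are ours.

```python
def allocate_blocks(questions_by_difficulty, n_blocks=3):
    """Serpentine allocation: walk the blocks forward then backward
    (block 1, 2, 3, 3, 2, 1, 1, 2, 3, ...) so each block receives a
    comparable mix of difficult and easy questions."""
    order = list(range(n_blocks)) + list(reversed(range(n_blocks)))  # [0, 1, 2, 2, 1, 0]
    blocks = [[] for _ in range(n_blocks)]
    for i, question in enumerate(questions_by_difficulty):
        blocks[order[i % len(order)]].append(question)
    return blocks

# 48 questions ranked 0 (most difficult) to 47 (least difficult)
blocks = allocate_blocks(list(range(48)))
print([len(b) for b in blocks])  # [16, 16, 16]
```

With this scheme, the sum of difficulty ranks is identical across the three blocks, which is the stated goal of maintaining equal difficulty.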
12 At each of the five session times, only one large room was available. As a result, if different treatments were in the same room at the same time, it would have been obvious that some were getting feedback (particularly individual-specific). In addition, the need to administer feedback meant that fewer participants could be accommodated in the room at any one time. Consequently, participants were randomly allocated to one of the five session times in advance.
13 The average-group feedback was based on the performance of the Guangzhou students in stage one. There were no statistically significant differences in the performance of Beijing students as compared with Guangzhou students (t = 0.275, two-tailed p = 0.787). Each question was analyzed with a view to establishing whether there were any differences in the level of difficulty for Beijing students compared with Guangzhou students. Chi-squared tests revealed significant differences in the proportion of students answering correctly for four of the 48 questions. Two of the questions were more difficult in Beijing, and two were more difficult in Guangzhou. For each question, there were no statistically significant differences between the pooled performance and the performance in either Beijing or Guangzhou.


FIGURE 1
Example of Feedback Provided to Participants in the Average-Group Feedback Condition

Feedback
An audit of the financial report of Campbell Ltd is being conducted by an external auditor.
The external auditor is expected to:

A Express an opinion as to the fairness of Campbell Ltd’s financial report.


B Express an opinion as to the attractiveness of Campbell Ltd for investment
purposes.
C Certify the correctness of Campbell Ltd’s financial report.
D Make a 100% examination of Campbell Ltd’s records.

The correct answer is A

85.7% of students answered A and were, therefore, correct.


0.0% of students answered B and were, therefore, incorrect.
10.7% of students answered C and were, therefore, incorrect.
3.6% of students answered D and were, therefore, incorrect.

Your estimated likelihood that student A would answer the question correctly was:

    0  1  2  3  4  5  6  7  8  9  10
    (0 = "I am certain that the student would have answered the question incorrectly";
    10 = "I am certain that the student would have answered the question correctly")

Your estimated likelihood that student B would answer the question correctly was:

    0  1  2  3  4  5  6  7  8  9  10
    (0 = "I am certain that the student would have answered the question incorrectly";
    10 = "I am certain that the student would have answered the question correctly")

Feedback of this type was provided following each likelihood assessment. Participants' estimated likelihoods were entered on the feedback sheet by a research assistant.

provided in the place of average-group feedback. Individual-specific feedback was
operationalized by indicating the correct response and the response (A, B, C, or D) that the
student whose performance was being predicted provided as an answer, for example:
"The correct answer is A" and "The answer provided by student A was C and was therefore
incorrect." This was done for both assessments made by the participant. The no feedback
group was given neither average-group feedback nor individual-specific feedback.


Dependent Variables
Four measures of prediction performance were employed in this study: calibration,
confidence, percentage correct, and slope.
Participants were asked to assess whether the person whose competence they were
assessing would have answered each of 48 audit questions correctly. Responses were on
an 11-point scale anchored by 0, ‘‘I am certain that the student would have answered the
question incorrectly’’ and 10, ‘‘I am certain that the student would have answered the ques-
tion correctly.’’ To calculate each of the dependent variables, it was first necessary to convert
each participant’s responses so that they represented a prediction and a corresponding as-
sessment of their confidence in that prediction. Responses 0 to 4 and 6 to 10 were coded
as a prediction that the student would have answered the question incorrectly and correctly,
respectively. With regard to confidence, responses of 0 (10), 1 (9), 2 (8), 3 (7), and 4 (6)
were recorded as 1.0, 0.9, 0.8, 0.7, and 0.6, respectively.14
Calibration is a measure of the accuracy of a decision maker's confidence. Following
Dilla et al. (1991), calibration was calculated using the following formula:

    Calibration = (1/N) * Σ_{i=1}^{T} n_i * |P_i − C_i|

where:

    N   = total number of probability assessments;
    n_i = number of times a probability response was used;
    P_i = probability response category (i.e., 1.0, 0.9, 0.8, 0.7, 0.6, 0.5);
    C_i = percentage of correct responses for each category; and
    T   = total number of response categories.

A low score indicates superior calibration, with zero representing perfect calibration.
The use of the absolute value ensures that overconfidence at one response category does not
cancel out underconfidence at another response category.
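As an illustrative sketch (function names and data are ours, not the authors'), the recoding of the 0–10 scale and the calibration formula above can be implemented as:

```python
def recode(response):
    """Convert a 0-10 likelihood response into (prediction, confidence).
    Responses 0-4 predict an incorrect answer and 6-10 a correct answer;
    a response of 5 is coded as a 0.5-confidence prediction of a correct answer."""
    prediction = response >= 5                   # True = predicts a correct answer
    confidence = 0.5 + abs(response - 5) / 10    # 0 or 10 -> 1.0; 4 or 6 -> 0.6
    return prediction, confidence

def calibration(confidences, hits):
    """Calibration = (1/N) * sum over categories of n_i * |P_i - C_i|,
    where C_i is the proportion of correct predictions made at confidence P_i."""
    N = len(confidences)
    total = 0.0
    for p in set(confidences):
        outcomes = [hit for conf, hit in zip(confidences, hits) if conf == p]
        total += len(outcomes) * abs(p - sum(outcomes) / len(outcomes))
    return total / N

# Four predictions at 0.8 confidence, only two of which proved correct:
# n_i * |P_i - C_i| / N = 4 * |0.8 - 0.5| / 4 = 0.3
print(round(calibration([0.8, 0.8, 0.8, 0.8], [True, True, False, False]), 2))
```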
Following Pincus (1991), Simnett (1996), and Kennedy and Peecher (1997), confidence
was calculated using the following formula:

    Over/underconfidence = (1/N) * Σ_{i=1}^{T} n_i * (P_i − C_i)

where:

    N   = total number of probability assessments;
    n_i = number of times a probability response was used;
    P_i = probability response category (i.e., 1.0, 0.9, 0.8, 0.7, 0.6, 0.5);
    C_i = percentage of correct responses for each category; and
    T   = total number of response categories.

14 Participants, on average, responded "5" on the likelihood scale (which should represent the fact that participants were guessing) on 6.95 occasions (out of a total of 96 predictions made by them). Where a participant did respond "5," coding proceeded as follows. For the calculation of calibration, confidence, and slope, a response of "5" was coded as a prediction (with 50 percent chance of being correct) that the student would have answered correctly. For percentage correct, responses of "5" were considered to mean "I do not know" and were not included in the analysis.


A positive score indicates overconfidence, while a negative score indicates
underconfidence. Interpretation of this score, however, must proceed with caution. A score
of zero may indicate that the decision maker is perfectly calibrated or that overconfidence/
underconfidence at one probability response category is perfectly offset by underconfidence/
overconfidence at another probability response category. The score should, therefore, be
considered following an examination of calibration.
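A short sketch (ours, with invented data) makes the offsetting caveat concrete: the signed score can be zero while calibration is clearly imperfect.

```python
def over_under_confidence(confidences, hits):
    """(1/N) * sum over categories of n_i * (P_i - C_i).
    Positive = overconfidence, negative = underconfidence; unlike calibration,
    signed deviations at different response categories can offset one another."""
    N = len(confidences)
    total = 0.0
    for p in set(confidences):
        outcomes = [hit for conf, hit in zip(confidences, hits) if conf == p]
        total += len(outcomes) * (p - sum(outcomes) / len(outcomes))
    return total / N

# Overconfident at the 0.9 category (only half correct) and underconfident at
# the 0.6 category (all correct): the deviations cancel and the score is zero,
# even though the assessor is miscalibrated at both categories.
confs = [0.9, 0.9, 0.6, 0.6]
hits = [True, False, True, True]
print(round(over_under_confidence(confs, hits), 2))  # 0.0
```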
To calculate percentage correct, predictions were compared with the actual performance
of the person whose performance was being predicted in order to determine if the prediction
was correct or not.
Finally, slope was calculated as a measure of discrimination (see Yates 1990). Slope
measures the extent to which an assessor can differentiate between correct responses and
incorrect responses. It is calculated as the average confidence assigned when the prediction
was correct minus the average confidence assigned when the prediction was incorrect.
Higher scores indicate superior discrimination.
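Slope can be sketched in the same style (illustrative data, our own naming):

```python
def slope(confidences, hits):
    """Discrimination (slope): mean confidence assigned to correct predictions
    minus mean confidence assigned to incorrect predictions."""
    correct = [conf for conf, hit in zip(confidences, hits) if hit]
    incorrect = [conf for conf, hit in zip(confidences, hits) if not hit]
    return sum(correct) / len(correct) - sum(incorrect) / len(incorrect)

# An assessor who is more confident when right than when wrong earns a
# positive slope: (0.9 + 0.8)/2 - (0.6 + 0.5)/2 = 0.85 - 0.55 = 0.3
print(round(slope([0.9, 0.8, 0.6, 0.5], [True, True, False, False]), 2))  # 0.3
```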
Results

As a manipulation check, we examined the self-reported familiarity levels, which revealed significant differences between the familiar (mean 5.4) and unfamiliar (mean 2.3) students whose performance was being assessed (t = 14.774, one-tailed p < .001). In addition, there were no significant differences in self-reported familiarity across the feedback conditions for familiar students (average-group feedback: 5.6; individual-specific feedback: 5.4; no feedback: 5.2; F = 0.692, p = .504) or unfamiliar students (average-group feedback: 2.1; individual-specific feedback: 2.2; no feedback: 2.4; F = 0.332, p = .719). We also examined the knowledge of the assessor and found no significant differences in assessor knowledge across the no feedback (56.76 percent correct), individual-specific feedback (56.46 percent correct), and average-group feedback (54.15 percent correct) conditions (F = 0.422, p = .657).
In total, and across each of the experimental conditions, participants exhibited poor
prediction performance and were miscalibrated (overconfident) in their predictions. De-
scriptive statistics are reported in Table 2 and presented in Figure 2 (note the direction on
the Y axis is reversed for calibration and confidence).
Hypotheses Testing
Recall that H3a proposed that when predicting the performance of a familiar peer, the
provision of individual-specific feedback would be more effective in improving prediction
performance than the provision of either average-group feedback or no feedback. By con-
trast, H3b proposed that when predicting the performance of an unfamiliar peer, average-
group feedback would be more effective than either individual-specific feedback or no
feedback. Taken together, H3a and H3b predict a significant interaction between feedback
type and level of familiarity. Consistent with this prediction, a 3 × (2) ANOVA reported
in Panel A of Table 3 revealed a significant feedback by familiarity interaction (F = 46.922,
p < .001) when using calibration as the dependent variable. Consistent results were found
for confidence (F = 32.339, p < .001), percentage correct (F = 41.995, p < .001), and
slope (F = 23.257, p < .001).15
To test H3a and H3b, pairwise comparisons using the Tukey HSD procedure were
performed. The results from this analysis are reported in Panel B of Table 3. The results

15 Four participants in stage one failed to provide an answer for between one and five of the questions (all from Beijing). For these participants, a response (either A, B, C, or D) was randomly generated for each of the questions not answered. This assumes that the participant did not know the answer and would have guessed. All analyses were rerun excluding the performance of these four participants, with no change in the statistical inferences.

TABLE 2
Descriptive Statistics

                                         Familiar        Unfamiliar      Combined
                                         Mean    s.d.    Mean    s.d.    Mean    s.d.
No Feedback (n = 29) (NF)
  Calibration                            0.34    0.13    0.34    0.11    0.33    0.12
  Confidence                             0.30    0.15    0.30    0.12    0.30    0.12
  Percentage Correct                     53.7    11.0    52.3     9.7    53.0     9.2
  Slope                                 −0.01    0.05    0.00    0.05    0.00    0.03
Individual-Specific Feedback (n = 25) (ISF)
  Calibration                            0.20    0.08    0.33    0.13    0.25    0.10
  Confidence                             0.16    0.10    0.30    0.15    0.23    0.11
  Percentage Correct                     68.4    10.0    49.6    11.4    59.1     8.5
  Slope                                  0.06    0.05    0.02    0.06    0.05    0.05
Average-Group Feedback (n = 26) (AGF)
  Calibration                            0.31    0.07    0.19    0.07    0.24    0.06
  Confidence                             0.26    0.10    0.15    0.10    0.21    0.08
  Percentage Correct                     56.6     9.1    66.9     7.8    61.7     6.1
  Slope                                  0.01    0.06    0.08    0.04    0.04    0.04
Combined (n = 80)
  Calibration                            0.29    0.12    0.29    0.12    0.27    0.11
  Confidence                             0.25    0.13    0.25    0.14    0.25    0.11
  Percentage Correct                     59.2    11.8    56.2    12.2    57.8     8.8
  Slope                                  0.03    0.06    0.03    0.06    0.03    0.05

supported H3a. When the assessor was familiar with the assessee, those receiving
individual-specific feedback performed better than those receiving either average-group
feedback or no feedback. All differences were in the expected direction and significant at
p < .05. There were no statistically significant differences between average-group feedback
and no feedback conditions, highlighting that average-group feedback is no more likely to
result in improved performance than the provision of no feedback.
The results also supported H3b. When the assessor was unfamiliar with the assessee,
those receiving average-group feedback performed better than those receiving either
individual-specific feedback or no feedback. All differences were in the expected direction
and significant at p < .05. There were no statistically significant differences between
individual-specific feedback and no feedback conditions, highlighting that individual-
specific feedback is no more likely to result in improved performance than the provision
of no feedback.16,17
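The pairwise comparisons above can be sketched in code. This is an illustrative Tukey HSD implementation for a one-way layout with equal group sizes, run on hypothetical data; the group names mirror the paper's feedback conditions, and the repeated-measures structure of the actual design is ignored for simplicity.

```python
import itertools
import numpy as np
from scipy.stats import studentized_range

def tukey_hsd(groups):
    """Tukey HSD p-values for all pairs; groups maps a condition name to
    an equal-length list of scores."""
    k = len(groups)                       # number of conditions
    n = len(next(iter(groups.values())))  # observations per condition
    # Pooled within-group (error) mean square.
    mse = sum(((np.asarray(g) - np.mean(g)) ** 2).sum()
              for g in groups.values()) / (k * n - k)
    se = np.sqrt(mse / n)
    pvals = {}
    for a, b in itertools.combinations(groups, 2):
        q = abs(np.mean(groups[a]) - np.mean(groups[b])) / se
        pvals[(a, b)] = studentized_range.sf(q, k, k * n - k)
    return pvals

# Hypothetical calibration scores (lower is better) for familiar peers.
p = tukey_hsd({"ISF": [0.18, 0.20, 0.22, 0.19, 0.21],
               "AGF": [0.30, 0.32, 0.31, 0.29, 0.33],
               "NF":  [0.31, 0.33, 0.34, 0.30, 0.32]})
```

With these made-up scores, the ISF vs. AGF comparison is significant while AGF vs. NF is not, mirroring the pattern of Panel B for familiar peers.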

16 We also analyzed whether our results were being driven by difficult-to-predict questions. All analyses were rerun after eliminating the six and 12 questions for which performance was most difficult to predict (based on the number of incorrect predictions for each question), with no change in our statistical inferences.
17 We also tested for learning across the three blocks (see footnote 11). All analyses in the preceding section were re-performed for each of the three blocks of 16 questions. Consistent with expectations, for each of the three blocks, there was a significant feedback type by familiarity interaction for all dependent variables (all p < .02). Consistent with prior studies (e.g., Hirst et al. 1999), the positive effects of individual-specific feedback did not reveal themselves until the second block of 16 questions. Tukey HSD tests revealed statistically significant differences consistent with H3a in blocks two and three (all p < .01) but not block one. In contrast with prior studies, however, the positive effects of average-group feedback were immediately apparent. With the exception of an insignificant difference between individual-specific feedback and average-group feedback in block one for calibration, and an insignificant difference between average-group feedback and no feedback in block two for calibration and confidence, Tukey HSD tests revealed statistically significant differences consistent with H3b across all three blocks (all p < .01).


FIGURE 2
Prediction Performance
Observed Effects

[Figure: four panels plot prediction performance against familiarity (Unfamiliar vs. Familiar) for the Individual-Specific Feedback, Average-Group Feedback, and No Feedback conditions. The panels show calibration (Y axis 0.1 to 0.4, reversed), confidence (Y axis 0.1 to 0.4, reversed), percentage correct (Y axis 40 to 70), and slope (Y axis .00 to .09).]
TABLE 3
ANOVA

Panel A(a): ANOVA Summary Tables
(each cell reports SS, df, MS, F, p for the dependent variable named in the column)

Source           | Calibration                  | Confidence                   | Percentage Correct                 | Slope
Between-Subjects
Feedback Type    | 0.25, 2, 0.13, 7.37, .001    | 0.28, 2, 0.14, 6.14, .003    | 2,191.6, 2, 1,095.8, 8.36, .001    | 0.05, 2, 0.03, 8.12, .001
Error            | 1.30, 77, 0.02               | 1.77, 77, 0.02               | 10,089.2, 77, 131.0                | 0.24, 77, 0.01
Within-Subjects
Familiarity      | 0.01, 1, 0.01, 0.02, .886    | 0.01, 1, 0.01, 0.45, .505    | 436.3, 1, 436.3, 6.67, .012        | 0.01, 1, 0.01, 0.66, .419
Fam × Feed Type  | 0.40, 2, 0.20, 46.92, <.001  | 0.41, 2, 0.21, 32.34, <.001  | 5,479.2, 2, 2,739.6, 42.00, <.001  | 0.10, 2, 0.05, 23.26, <.001
Error            | 0.33, 77, 0.01               | —                            | 5,023.2, 77, 65.2                  | 0.16, 77, 0.01

Panel B: Pairwise Comparisons; Difference in Mean (Absolute Value)

                                Calibration   Confidence   Percentage Correct   Slope
Hypothesis 3a: Familiar Peer
  ISF vs. AGF                      0.11*         0.10*          11.8*            0.05*
  ISF vs. NF                       0.14*         0.14*          14.7*            0.07*
  AGF vs. NF                       0.03          0.04            2.9             0.02
Hypothesis 3b: Unfamiliar Peer
  AGF vs. ISF                      0.14*         0.15*          17.3*            0.06*
  AGF vs. NF                       0.15*         0.15*          14.6*            0.08*
  ISF vs. NF                       0.01          0.00            2.7             0.02

(a) This panel reports the results of a 3 × 2 (i.e., Feedback Type × Familiarity) ANOVA for each of the four measures of prediction performance (i.e., calibration, confidence, percentage correct, slope).

* Tukey HSD analysis significant, p < .05.
ISF = Individual-specific feedback;
AGF = Average-group feedback; and
NF = No feedback.

Additional Analysis
As additional analysis, we examined the level of familiarity between the assessor and
assessee, the extent to which the assessee is "average," and the knowledge gap between
the assessor and assessee.
As reported in Table 3, the ANOVA main effect for familiarity was not significant for any
measure of prediction performance except percentage correct. While there is evidence that
assessors perform better when assessing a familiar peer as measured by percentage correct
(F = 6.668, p = .012), there is no such evidence when calibration (F = 0.021, p = .886),
confidence (F = 0.449, p = .505), or slope (F = 0.661, p = .419) are examined.18 While
assessors may be able to predict the performance of a familiar peer better than that of an
unfamiliar peer, there are no differences in the assessors' ability to attribute an appropriate
level of confidence to these judgments.
Next we examined the relationship between the assessee's similarity or difference from
the average and prediction performance. Average-group feedback may be beneficial when
assessing a familiar "average" peer. Although the assessor will anchor on the specific peer's
overall competence in such a situation, the average-group feedback is quite specific where
an average peer is being assessed. If average-group feedback can improve the quality of
the anchor being used when assessing a familiar average peer, it might be expected that the
superiority of the individual-specific feedback over average-group feedback will be mod-
erated by the extent to which the assessee is similar to the average.
To examine the potential moderating role of similarity to the mean, we first calculated
the variation from average as the absolute value of the variation in the individual assessee’s
performance from that of the mean performance of all participants in stage one. The sample
was then split at the median, with assessees equal to or above (below) the median placed
in the high (low) variation from mean group.
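The median split described above might be sketched as follows (hypothetical data, not the authors' code):

```python
import statistics

# Split assessees into high/low variation-from-mean groups: absolute
# deviation of each stage-one score from the grand mean, split at the
# median (values at or above the median go to the "high" group).
def median_split(scores):
    grand_mean = statistics.mean(scores)
    deviations = [abs(s - grand_mean) for s in scores]
    cutoff = statistics.median(deviations)
    return ["high" if d >= cutoff else "low" for d in deviations]
```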
A 3 (feedback type) × 2 (familiarity) × 2 (familiar assessee variation from mean)
ANOVA with calibration as the dependent variable showed only a significant main effect
for feedback type (F = 7.847, p = .001) and a significant interaction between feedback
type and familiarity (F = 47.048, p < .001). No other main effect or interaction was
significant, suggesting that similarity to the mean did not influence the nature of the inter-
action reported in our testing of H3a and H3b.19 These results suggest that while average-
group feedback could be of benefit when assessing familiar peers who are similar to the
average, assessors in this study did not realize this benefit.
As previous research (e.g., Kennedy and Peecher 1997) has focused on both the asses-
sor’s knowledge and the knowledge gap between the assessor and assessee, we also
analyzed the overconfidence identified in our study with reference to the assessor’s and
assessee’s knowledge. This analysis is based on a sample of 75, as five participants in
stage two did not complete stage one.20 We first compared the assessors’ average like-
lihood assessments when they themselves answered the question correctly with average

18 We also analyzed the self-assessed familiarity levels within the two familiarity treatments, with no statistically significant correlation between self-assessed familiarity and any of the four measures of prediction performance. Similarly, a median split on self-assessed familiarity revealed no statistically significant differences in prediction performance between those familiar and unfamiliar with the assessee.
19 Consistent results were obtained when we also examined unfamiliar assessee variation from mean in a 3 × 2 × 2 × 2 ANOVA, and for all other measures of prediction performance. We also analyzed our data in an ANCOVA with variation from mean as a continuous covariate(s) with consistent results.
20 All statistical inferences remain unchanged when we conducted the analysis separately for each of the three feedback conditions.

Auditing: A Journal of Practice & Theory May 2009


American Accounting Association
74 Harding and Trotman

likelihood assessments when they answered the question incorrectly. Consistent with the
belief that an assessor's own knowledge impacts on assessments of another's competence,
paired samples t-tests revealed that participants assessed a higher likelihood that the peer
would answer the question correctly when they themselves had answered the question
correctly. When assessing the competence of a familiar peer, likelihood assessments when
the assessor answered the question correctly (mean 7.64) were higher than those when the
assessor answered the question incorrectly (mean 6.80; t = 6.758, two-tailed p < .001).
Similarly, when assessing the competence of an unfamiliar peer, the assessed likelihood
that the peer would answer correctly was greater when the assessor answered the question
correctly (mean 7.36) than when the assessor answered incorrectly (mean 6.44; t = 7.052,
two-tailed p < .001).21
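A paired-samples comparison of this kind can be sketched as follows. The data are hypothetical (the likelihood scale is assumed here to run from 0 to 10, matching the reported means); only the shape of the test is taken from the text.

```python
from scipy.stats import ttest_rel

# One value per assessor: mean assessed likelihood that the peer answers
# correctly, split by whether the assessor answered the item correctly.
when_assessor_correct = [7.8, 7.5, 7.9, 7.2, 7.6]
when_assessor_incorrect = [6.9, 6.6, 7.0, 6.5, 6.8]

# Paired-samples (dependent) t-test; two-tailed p-value by default.
t_stat, p_two_tailed = ttest_rel(when_assessor_correct,
                                 when_assessor_incorrect)
```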
To examine the impact of knowledge gap, we calculated two knowledge gap measures
for each participant. One represented the knowledge gap between the assessor and the
familiar student, and the other represented the knowledge gap between the assessor and
the unfamiliar student. These were calculated by subtracting the assessee's performance (in
terms of percentage correct) from the assessor's performance. A positive (negative) score
indicates that the assessor's knowledge was superior (inferior) to that of the assessee.
Analysis of covariance revealed that, for familiar assessments, there was a significant effect
of feedback type on overconfidence (F = 9.254, p < .001) but no relationship between
knowledge gap and overconfidence (F = 1.234, p = .270). For unfamiliar assessments,
however, in addition to a significant effect of feedback type (F = 7.687, p = .001),
knowledge gap was significantly related to overconfidence (F = 4.967, p = .029).
Consistent with Kennedy and Peecher (1997) and Han et al. (2007), these results suggest
that the knowledge gap between the assessor and assessee affects the extent of
overconfidence in assessments of an unfamiliar assessee's competence. This result is
consistent with Nickerson's (1999) theory in that the less one knows about the person
being assessed, the greater is the influence of one's own knowledge on the working model
of the specific other's knowledge. These results suggest that, consistent with the earlier
research, one's own competence, in addition to perceptions of the specific or average peer's
competence, may influence the accuracy with which assessments of another's competence
are made.
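The knowledge-gap measure and its association with overconfidence can be sketched as follows. The data are hypothetical, and the paper tests the relationship with ANCOVA; a simple correlation is shown here purely for illustration.

```python
from scipy.stats import pearsonr

# Hypothetical stage-one percentage-correct scores and overconfidence.
assessor_pct = [70.0, 55.0, 60.0, 48.0, 65.0]
assessee_pct = [50.0, 52.0, 45.0, 47.0, 40.0]
overconfidence = [0.30, 0.10, 0.22, 0.05, 0.33]

# Knowledge gap: assessor performance minus assessee performance.
# A positive gap means the assessor outperformed the assessee.
knowledge_gap = [a - b for a, b in zip(assessor_pct, assessee_pct)]
r, p_value = pearsonr(knowledge_gap, overconfidence)
```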

DISCUSSION AND CONCLUSIONS


In practice, auditors need to work with, delegate work to, supervise, and review the
work of other auditors, some of whom are familiar to them and some of whom are not.
In doing this, they need to consider the competence of the other auditor.
Study 1 used verbal protocol analysis to obtain a better understanding of how assess-
ments of another auditor’s competence are made. It showed that when predicting the task-
specific competence of a familiar colleague (either a peer or a subordinate), auditors rely
heavily on the perceived competence of the specific colleague as the anchor. When pre-
dicting the task-specific competence of an unfamiliar colleague, auditors rely heavily on
the perceived competence of the hierarchical group (either staff or seniors) to which the
colleague belongs.
Earlier psychology (Nickerson 1999) and accounting studies (Kennedy and Peecher
1997; Jamal and Tan 2001; Tan and Jamal 2001) have theorized and provide empirical
support that assessors anchor on their own knowledge. By comparison, the data reported
in study 1, which is the first study that we are aware of to specifically investigate the

21 There were no statistically significant differences in overconfidence when the assessors themselves answered the question correctly and incorrectly for either the familiar or unfamiliar peer.


process by which these judgments are made, found only three instances out of 79 judgments
where there was evidence of the auditor’s own knowledge being used as an anchor. How-
ever, our protocol methodology does not allow us to draw conclusions on whether one’s
own knowledge contributes toward the specification of the anchors identified, or on anchors
that might be used at the unconscious level (such as one’s own knowledge). Both possi-
bilities would suggest the use of anchors may be more complex than initially thought.
While auditors in study 1 did not generally refer to themselves while predicting other
auditors' task-specific performance, results from study 2 support earlier findings that at
least one of the anchors used is the auditor's own knowledge. In study 2 we found that,
when assessing both familiar and unfamiliar peers, participants provided higher likelihood
assessments when they themselves had answered the question correctly than when they
had answered it incorrectly. In addition, in study 2, the knowledge gap between an auditor
and his/her peer is associated with the auditor's overconfidence. Collectively, the evidence
supports prior theory and findings showing the relevance of auditors' own knowledge when
making these types of predictions. Importantly, our study extends this research by showing
that when predicting other auditors' task-specific performance, anchors other than the au-
ditor's own knowledge may also be used, and that assessors may rely on conscious or
unconscious anchors. One implication is that if the anchors verbalized in the protocols are
more salient in the auditors' decision processes, they are likely to be a more productive
avenue for reducing identified levels of overconfidence.
One way to reduce the identified overconfidence in auditors' assessments of
their colleagues’ competence (e.g., Kennedy and Peecher 1997; Jamal and Tan 2001) is to
improve the accuracy of the anchors consciously employed by auditors when making these
judgments. In study 2, we argued that outcome feedback could be effective in reducing the
level of overconfidence by improving the accuracy of the anchor and reinforcing the as-
sessment strategy. However, given our belief that the anchor employed will be contingent
on the assessor’s familiarity with the assessee, and the results of study 1 supporting this
belief, the effectiveness of different types of outcome feedback was argued to be contingent
on the familiarity of the assessee to the assessor.
Our study found that outcome feedback has the potential to improve judgments when
assessing the competence of others. However, the merits of individual-specific feedback as
compared with average-group feedback are contingent on the relationship between the as-
sessee and assessor. Individual-specific feedback was effective in improving accuracy, re-
ducing overconfidence, and improving discrimination when predicting the task-specific per-
formance of a colleague with whom the assessor is familiar. Average-group feedback was
effective when predicting the task-specific performance of a colleague with whom the as-
sessor is not familiar, but is aware of the general characteristics of the group from which
the person is drawn.
While study 2 results are based on peers assessing the knowledge of other peers, our
theory and results in study 1 inform an understanding of other relationships in the audit
domain, including those involving a superior and subordinate. Our theory and results sug-
gest that firms need to be aware of the level of familiarity in determining the type of
feedback that should be provided to assessors. While it might be assumed that feedback
should be as specific as possible, our results showed that this will not always be the case.
Indeed, where the assessor is unfamiliar with the specific competence of the assessee, the
provision of individual-specific feedback is not as likely as average-group feedback to result
in improved performance.
Our study has a number of limitations that suggest opportunities for future research.
First, protocol analysis provides information on the anchors participants believe they use


but is unlikely to identify all anchors, including unconscious anchors. Second, we use a
strong operationalization of outcome feedback, which is provided to participants after they
complete each case. Third, we used one form of average-group feedback and one form of
individual-specific feedback. While we suggest our theory will apply to variants of these
types of feedback, the strength of the results may vary with the informativeness of the
feedback relative to the anchor employed. For example, average-group feedback could
have been less informative (e.g., X percent answered A and were therefore correct). Sim-
ilarly, individual-specific feedback could have referred to more than one question, e.g., how
many questions the assessee answered correctly in total, in a particular category of question,
or in a particular block of questions. Fourth, study 2 only considered peer assessments, and
future research could test the theory with superior-subordinate pairs in an audit firm
hierarchy. Finally, the value of outcome feedback does vary with task complexity.

REFERENCES
Balzer, W. K., L. B. Hammer, K. E. Sumner, T. R. Birchenough, S. P. Martens, and P. H. Raymark.
1994. Effects of cognitive feedback components, display format, and elaboration on perform-
ance. Organizational Behavior and Human Decision Processes 58 (3): 369–385.
Bamber, E. M. 1983. Expert judgment in the audit team: A source reliability approach. Journal of
Accounting Research 21 (2): 396–412.
———, and J. H. Bylinski. 1987. The effects of the planning memorandum, time pressure and in-
dividual auditor characteristics on audit managers’ review time judgments. Contemporary Ac-
counting Research 4 (1): 127–143.
Blanz, M., and B. Aufderheide. 1999. Social categorization and category attribution: The effects of
comparative and normative fit on memory and social judgment. The British Journal of Social
Psychology 38 (2): 157–179.
Bonner, S. E., and P. L. Walker. 1994. The effects of instruction and experience on the acquisition of
auditing knowledge. The Accounting Review 69 (1): 157–178.
Chen, S. 2001. The role of theories in mental representation and their use in social perception: A
theory-based approach to significant-other representations and transference. In Cognitive Social
Psychology: The Princeton Symposium on the Legacy and Future of Social Cognition, edited
by G. B. Moskowitz. Mahwah, NJ: Erlbaum.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Mea-
surement 20 (1): 37–46.
Dilla, W., R. D. File, I. Solomon, and L. A. Tomassini. 1991. Predictive bankruptcy judgments by
auditors: A probabilistic approach. In Auditing: Advances in Behavioral Research, edited by L.
Ponemon and D. Gabhart. Berlin, Germany: Springer-Verlag.
Earley, C. E. 2001. Knowledge acquisition in auditing: Training novice auditors to recognize cue
relationships in real estate valuation. The Accounting Review 76 (1): 81–97.
Ericsson, K. A., and H. A. Simon. 1993. Protocol Analysis: Verbal Reports as Data. Revised Edition.
Cambridge, MA: MIT Press.
Fussell, S. R., and R. M. Krauss. 1991. Accuracy and bias in estimates of others’ knowledge. European
Journal of Social Psychology 21: 445–454.
———, and ———. 1992. Coordination of knowledge in communication: Effects of speakers’ as-
sumptions about what others know. Journal of Personality and Social Psychology 62 (3): 378–
391.
Gibbins, M., and K. T. Trotman. 2002. Audit review: Managers’ interpersonal expectations and con-
duct of the review. Contemporary Accounting Research 19 (3): 411–444.
Hammond, K., J. Rohrbaugh, J. Mumpower, and L. Adelman. 1977. Social judgment theory: Appli-
cations in policy formulation. In Human Judgment and Decision Processes in Applied Settings,
edited by M. Kaplan and S. Schwartz. New York, NY: Academic Press.


Han, J., K. Jamal, and H. T. Tan. 2007. Are auditors overconfident in predicting the knowledge of
other auditors? Working paper, University of Hong Kong.
Hirst, M. K., and P. F. Luckett. 1992. The relative effectiveness of different types of feedback in
performance evaluation. Behavioral Research in Accounting 4: 1–22.
———, ———, and K. T. Trotman. 1999. Effects of feedback and task predictability on task learning
and judgment accuracy. Abacus 35 (3): 286–301.
International Auditing and Assurance Standards Board. 2005. Quality Control for Audits of Historical
Financial Information. International Standard on Auditing (ISA) No. 220. New York, NY: In-
ternational Federation of Accountants.
Jamal, K., and H. T. Tan. 2001. Can auditors predict the choices made by other auditors? Journal of
Accounting Research 39 (3): 583–597.
Kennedy, J., and M. E. Peecher. 1997. Judging auditors’ technical knowledge. Journal of Accounting
Research 35 (2): 279–293.
Kluger, A. N., and A. DeNisi. 1996. The effects of feedback interventions on performance: A historical
review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin
119 (2): 254–284.
Lingle, J. H., M. W. Altom, and D. L. Medin. 1984. Of cabbages and kings: Assessing the extendibility
of natural object concepts models to social things. In Handbook of Social Cognition, edited by
R. S. Wyer and T. K. Srull. Hillsdale, NJ: Erlbaum.
Malt, B. C., B. H. Ross, and G. L. Murphy. 1995. Predicting features for members of natural categories
when categorization is uncertain. Journal of Experimental Psychology. Learning, Memory, and
Cognition 21 (3): 646–661.
Moskowitz, G. B. 2005. Social Cognition: Understanding Self and Others. New York, NY: Guilford.
Nickerson, R. S., A. Baddeley, and B. Freeman. 1987. Are people’s estimates of what other people
know influenced by what they themselves know? Acta Psychologica 64 (3): 245–259.
———. 1999. How we know—and sometimes misjudge—what others know: Imputing one’s own
knowledge to others. Psychological Bulletin 125 (6): 737–759.
Oakes, P. J., J. C. Turner, and S. A. Haslam. 1991. Perceiving people as group members: The role of
fit in the salience of social categorizations. The British Journal of Social Psychology 30: 125–
144.
Peecher, M. E., and I. Solomon. 2001. Theory and experimentation in studies of audit judgments and
decisions: Avoiding common research traps. International Journal of Auditing 5 (3): 193–203.
Pincus, K. V. 1991. Audit judgment confidence. Behavioral Research in Accounting 3: 39–65.
Rich, J. S., I. Solomon, and K. T. Trotman. 1997a. The audit review process: A characterization from
the persuasion perspective. Accounting, Organizations and Society 22 (5): 481–505.
———, ———, and ———. 1997b. Multi-auditor judgment/decision making research: A decade
later. Journal of Accounting Literature 16: 18–26.
Rosch, E., C. B. Mervis, W. D. Gray, D. M. Johnson, and P. Boyes-Braem. 1976. Basic objects in
natural categories. Cognitive Psychology 8: 382–439.
Simnett, R. 1996. The effect of information selection, information processing and task complexity on
predictive accuracy of auditors. Accounting, Organizations and Society 21 (7–8): 699–719.
Smith, E. R., and M. A. Zarate. 1992. Exemplar-based model of social judgment. Psychological
Review 99 (1): 3–21.
Snyder, M., and S. W. Uranowitz. 1978. Reconstructing the past: Some cognitive consequences of
person perception. Journal of Personality and Social Psychology 36 (9): 941–950.
Switzer, F. S., and J. A. Sniezek. 1991. Judgment processes in motivation: Anchoring and adjustment
effects on judgment and behaviour. Organizational Behavior and Human Decision Processes
49: 208–229.
Tan, H. T., and K. Jamal. 2001. Do auditors objectively evaluate their subordinates’ work? The Ac-
counting Review 76 (1): 99–110.
Taylor, S. E., and J. Crocker. 1981. Schematic bases of social information processing. In Social
Cognition: The Ontario Symposium, edited by E. T. Higgins, O. Herman, and M. Zanna. Hills-
dale, NJ: Erlbaum.


Trotman, K. T. 2005. Discussion of: Judgment and decision making research in auditing: A task,
person, and interpersonal interaction perspective. Auditing: A Journal of Practice & Theory 24
(Supplement): 73–87.
Whyte, G., and J. K. Sebenius. 1997. The effect of multiple anchors on anchoring in individual and
group judgment. Organizational Behavior and Human Decision Processes 69 (1): 74–85.
Yates, J. F. 1990. Judgment and Decision Making. New York, NY: Wiley.
