Advances in Accounting Behavioral Research, Volume 6

CONTENTS
LIST OF CONTRIBUTORS vii
REVIEWER ACKNOWLEDGMENTS ix
EDITOR’S COMMENTS xi
EDITORIAL POLICY AND SUBMISSION GUIDELINES xiii
PART I: ACCOUNTING BEHAVIORAL RESEARCH
A STRUCTURAL EQUATION MODEL OF AUDITORS’ PROFESSIONAL COMMITMENT: THE INFLUENCE OF FIRM SIZE AND POLITICAL IDEOLOGY
John T. Sweeney, Jeffrey J. Quirin and Dann G. Fisher 3
AN ANALYSIS OF GROUP INFLUENCES ON GOING CONCERN AUDITOR JUDGMENTS
Sunita S. Ahlawat and Timothy J. Fogarty 27
INVESTIGATING ERROR PROJECTION AMONG STATE AUDITORS: THE IMPACT OF INTENTIONAL AND SYSTEMATIC MISSTATEMENTS
John T. Reisch, Karen S. McKenzie and Alan H. Friedberg 53
HOW DOES NEGATIVE SOURCE CREDIBILITY AFFECT COMMERCIAL LENDERS’ DECISIONS?
Philip R. Beaulieu and Andrew J. Rosman 79
EARNINGS MANAGEMENT AND FRAMING: THE SPECIFIC CASE OF OBSOLETE INVENTORY
Marybeth M. Murphy and Joanne P. Healy 95
THE EFFECTS OF INCENTIVE STRUCTURE AND GOAL DIFFICULTY ON TIME PLANNING DECISIONS WITHIN A BALANCED SCORECARD FRAMEWORK
Brad Tuttle and Mark J. Ullrich 121
THE EFFECT OF FAIRNESS IN CONTRACTING ON THE CREATION OF BUDGETARY SLACK
Theresa Libby 145
PART II: PERSPECTIVES ON RESEARCH PRODUCTIVITY
A TOBIT ANALYSIS OF ACCOUNTING FACULTY PUBLISHING PRODUCTIVITY IN AUSTRALIAN AND NEW ZEALAND UNIVERSITIES
Brett R. Wilkinson, Chris H. Durden and Katherine J. Wilkinson 173
PART III: METHODOLOGICAL ISSUES IN BEHAVIORAL RESEARCH
CLASSIFICATION OF CUSTOMIZED ASSURANCE SERVICES BY DECISION MAKERS: THE CASE OF SysTrust™
Philip R. Beaulieu 189
LIST OF CONTRIBUTORS
Sunita S. Ahlawat, School of Business, The College of New Jersey, USA
Philip R. Beaulieu, Haskayne School of Business, University of Calgary, Canada
Chris H. Durden, Department of Accounting, University of Southern Queensland, Australia
Dann G. Fisher, Department of Accounting, Kansas State University, USA
Timothy J. Fogarty, Weatherhead School of Management, Case Western Reserve University, USA
Alan H. Friedberg, School of Accounting, Florida Atlantic University, USA
Joanne P. Healy, College of Business Administration, Kent State University, USA
Theresa Libby, School of Business and Economics, Wilfrid Laurier University, Canada
Karen S. McKenzie, School of Accounting, Florida Atlantic University, USA
Marybeth M. Murphy, College of Business Administration, Kent State University, USA
Jeffrey J. Quirin, Barton School of Business, Wichita State University, USA
John T. Reisch, School of Business, East Carolina University, USA
Andrew J. Rosman, School of Business, University of Connecticut, USA
John T. Sweeney, School of Accounting, Information Systems & Business Law, Washington State University, USA
Brad Tuttle, Moore School of Business, University of South Carolina, USA
Mark J. Ullrich (Deceased), Graduate School of Business & Public Policy, Naval Postgraduate School, USA
Brett R. Wilkinson, Hankamer School of Business, Baylor University, USA
Katherine J. Wilkinson, Rawls College of Business, Texas Tech University, USA
REVIEWER ACKNOWLEDGMENTS
The Editor and Associate Editors at AABR would like to thank the many excellent reviewers who have volunteered their time and expertise to make this an outstanding publication. Publishing quality papers in a timely manner would not be possible without their efforts.
Elizabeth Dreike Almer, Portland State University, USA
John C. Anderson, San Diego State University, USA
Philip R. Beaulieu, University of Calgary, Canada
Jean Bedard, Northeastern University, USA
James Bierstaker, University of Massachusetts, Boston, USA
Dennis M. Bline, Bryant College, USA
Robert H. Chenhall, Monash University, Australia
Freddie Choo, San Francisco State University, USA
Christie L. Comunale, Long Island University – C.W. Post Campus, USA
Charles Cullinan, Bryant College, USA
Elizabeth Davis, Baylor University, USA
Roger Debreceny, Nanyang Technological University, Singapore
William N. Dilla, Iowa State University, USA
Alan S. Dunk, University of Tasmania, Australia
Jennifer D. Goodwin, University of Queensland, Australia
Glen Gray, California State University, Northridge, USA
Heather Hermanson, Kennesaw State University, USA
Mary Callahan Hill, Kennesaw State University, USA
Karen L. Hooks, Florida Atlantic University, USA
James E. Hunton, Bentley College, USA
Mike Kirschenheiter, Columbia University, USA
Stacy Kovar, Kansas State University, USA
Kip R. Krumwiede, Brigham Young University, USA
Theresa Libby, Wilfrid Laurier University, Canada
Daryl Lindsay, University of Saskatchewan, Canada
Timothy J. Louwers, Louisiana State University, USA
Nace Magner, Western Kentucky University, USA
James Maroney, Northeastern University, USA
Lokman Mia, Griffith University – Gold Coast, Australia
Venky Nagar, University of Michigan, USA
Marcus Odom, Southern Illinois University, USA
Ed O’Donnell, Arizona State University, USA
William R. Pasewark, Texas Tech University, USA
Laurie Pant, Suffolk University, USA
Robert J. Parker, University of South Florida, USA
Will Quilliam, University of South Florida, USA
John Reisch, East Carolina University, USA
Michael Roberts, University of Alabama, USA
Andrew J. Rosman, University of Connecticut, USA
Steve G. Sutton, University of Connecticut, USA and University of Melbourne, Australia
Linda Thorne, York University, Canada
Sandra Vera-Munoz, University of Notre Dame, USA
Sally A. Webber, Northern Illinois University, USA
Kristin Wentzel, La Salle University, USA
Patrick Wheeler, University of Missouri, USA
Stephen W. Wheeler, University of the Pacific, USA
EDITOR’S COMMENTS
Welcome to Volume 6 of Advances in Accounting Behavioral Research. This issue contains an eclectic collection of behavioral research papers that examine several important issues. Several of the papers focus on aspects of auditors’ decisions, such as professional commitment in public accounting firms, mitigating bias via group decision making, and appropriately using sample information to estimate errors in governmental auditing. The decisions of other professionals who use accounting information, such as commercial lenders and divisional managers, are also examined. Two papers examine how accounting information affects the behaviors of individuals within an organization under various incentive structures. Two other papers provide broader perspectives on research, one developing a classification scheme for new assurance services and the other examining factors that affect the research productivity of accounting faculty members. Overall, this is an enlightening group of papers that provides insight into the behaviors of various users of accounting information.
Vicky Arnold
Editor
EDITORIAL POLICY AND
SUBMISSION GUIDELINES
Advances in Accounting Behavioral Research (AABR) publishes articles encompassing all areas of accounting that incorporate theory from, and contribute new knowledge and understanding to, the fields of applied psychology, sociology, management science, and economics. The Research Annual is primarily devoted to original empirical investigations; however, literature review papers, theoretical analyses, and methodological contributions are welcome. AABR is receptive to replication studies, provided they investigate important issues and are concisely written. The Research Annual especially welcomes manuscripts that integrate accounting issues with organizational behavior, human judgment/decision making, and cognitive psychology.
Manuscripts will be blind-reviewed by two reviewers and an associate editor.
The recommendations of the reviewers and associate editor will be used to
determine whether to accept the paper as is, accept the paper with minor revisions,
reject the paper or invite the authors to revise and resubmit the paper.
MANUSCRIPT SUBMISSION
Manuscripts should be forwarded via e-mail to the editor, Vicky Arnold, at Vicky.Arnold@business.uconn.edu. All text, tables, and figures should be incorporated into a Word document prior to submission. The manuscript should also include a title page containing the names and addresses of all authors and a concise abstract. Also, include a separate Word document with any experimental materials or survey instruments. If you are unable to submit electronically, please forward the manuscript along with the experimental materials to the following address:
Vicky Arnold, Editor
Advances in Accounting Behavioral Research
Department of Accounting U41A
School of Business
University of Connecticut
Storrs, CT 06269-2041, USA
References should follow the APA (American Psychological Association) standard. References should be indicated by giving (in parentheses) the author’s name followed by the year of publication of the journal article or book, or with the year in parentheses, as in “suggested by Earley (2000).”
In the text, use the form Rosman et al. (1995) where there are more than two authors, but list all authors in the references. Quotations of more than one line of text from cited works should be indented, and the citation should include the page number of the quotation; e.g. (Dunbar, 2001, p. 56).
Citations for all articles referenced in the text of the manuscript should be shown
in alphabetical order in the reference list at the end of the manuscript. Only articles
referenced in the text should be included in the reference list. Format for references
is as follows:
For Journals
Dunn, C. L., & Gerard, G. J. (2001). Auditor efficiency and effectiveness with
diagrammatic and linguistic conceptual model representations. International
Journal of Accounting Information Systems, 2(3), 1–40.
For Books
Ashton, R. H., & Ashton, A. H. (1995). Judgment and decision-making research in accounting and auditing. New York, NY: Cambridge University Press.
For a Thesis
Smedley, G. A. (2001). The effects of optimization on cognitive skill acquisition from intelligent decision aids. Unpublished doctoral dissertation, University.
For a Working Paper
Thorne, L., Massey, D. W., & Magnan, M. (2000). Insights into selection-
socialization in the audit profession: An examination of the moral reasoning of
public accountants in the United States and Canada. Working paper: York University, North York, Ontario.
For Papers From Conference Proceedings, Chapters From Book, etc.
Messier, W. F. (1995). Research in and development of audit decision aids. In: R. H. Ashton & A. H. Ashton (Eds), Judgment and Decision Making in Accounting and Auditing (207–230). New York: Cambridge University Press.
A STRUCTURAL EQUATION MODEL
OF AUDITORS’ PROFESSIONAL
COMMITMENT: THE INFLUENCE OF
FIRM SIZE AND POLITICAL IDEOLOGY
John T. Sweeney, Jeffrey J. Quirin and Dann G. Fisher
ABSTRACT
This study models auditors’ professional commitment as the product of
socialization forces operating within the public accounting profession. The
results of a structural equation analysis from a sample of 349 auditors
representing international, national and regional firms indicate that firm size
is inversely related to professional commitment. Furthermore, the findings
indicate that a strong relationship exists between an auditor’s political
ideology and professional commitment. Politically conservative auditors,
reflecting the dominant ideology in public accounting, reported significantly
higher professional commitment than politically liberal auditors.
INTRODUCTION
The accounting scandals that have marked the dawn of the 21st century, such as Enron, MCI, and Global Crossing, have damaged the credibility of the audit report and the reputation of the public accounting industry. Perhaps more than ever, commitment to the ideals and standards of the auditing profession is vital to maintaining stakeholders’ confidence in the integrity of the audit report and in the reliability of financial statement representations. The purpose of this research is to develop and test a comprehensive model of auditors’ professional commitment, with the objective of furthering our understanding of an attitude so essential to maintaining public trust.
Advances in Accounting Behavioral Research, Volume 6, 3–25
Copyright © 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06001-0
A primary contribution of this study to the auditing research literature is the inclusion of variables not previously considered in models of professional commitment, namely audit firm size and political ideology. Firm size proxies for differences in organizational culture (Pratt & Beaulieu, 1992), and the results indicate that auditors’ professional commitment is directly and inversely affected by firm size. As a profession, public accounting has a predominantly politically conservative culture (Sweeney, 1995). In this study, political ideology is modeled as a socializing variable. The findings indicate that auditors whose ideology is consistent with the prevailing conservative doctrine are more committed to the profession than auditors who are politically liberal.
The results of this study have important implications for the public accounting profession. First, the inverse relationship between firm size and professional commitment is cause for concern, as larger firms, especially the international firms, dominate the market for audit services. Larger firms also dominate, and increasingly emphasize, the more lucrative consulting and non-audit service areas. Perhaps as a result of the metamorphosis from traditional accounting firms to diverse service organizations, auditors from larger firms may have lessened their identification with and commitment to the ideals of the accounting profession. Second, the model indicates that political ideology directly influences commitment, perhaps because conservative auditors more readily embrace the conservative values traditionally associated with the profession. Political ideology also influences perceptions of success in public accounting, as conservative auditors report a significantly higher probability of attaining partnership in their firms than liberal auditors.
This paper proceeds in the following manner. The next section reviews the
literature relevant to the development of a model of professional commitment.
Hypotheses are then advanced, followed by sections discussing the methodology
and analysis. The final section consists of a summary and discussion.
LITERATURE REVIEW AND HYPOTHESES DEVELOPMENT
Professional commitment, representing the extent to which one identifies with and is willing to exert effort in support of a profession (Aranya et al., 1981; Aranya & Ferris, 1984), has been conceptualized as a socialization process in which emphasis is given to cultivating professional values (Jeffery & Weatherholt, 1996; Larson, 1977). For the auditing profession, these values include honoring the public interest, independence, integrity, and objectivity. An assumption underlying the attitude of professional commitment is that the stronger an individual’s identification with and loyalty to the public accounting profession, the less likely he or she is to subordinate professional responsibilities (Farmer, 1993). A strong commitment to the ideals of the profession is considered a prerequisite for independent professional judgments (Aranya et al., 1981; Gaffney et al., 1993).
The development of auditors’ professional commitment is generally assumed to
precede the development of their organizational commitment (Aranya et al., 1982;
Aranya & Ferris, 1984). Anticipatory professional socialization often begins in
college, when the choice of accounting as an undergraduate major and career
is made, while organizational commitment commences upon entrance to the
firm (Fogerty, 1992). Early conceptualizations of the professional-organizational
dynamic viewed the two constructs in conflict, as the demands of the employing
bureaucracy were perceived to be in competition with professional loyalties
(Sorenson & Sorenson, 1974). The conflict between organizational and profes-
sional socialization occurs when behaviors concordant with organizational norms
and goals are inconsistent with the profession’s code of conduct. Violation of
organizational norms may result in internal sanctions levied against the auditor.
Violation of professional standards, such as Arthur Andersen’s obstruction of justice in the Enron audit, can not only result in penalties levied against the perpetrator and his or her firm but also diminish the prestige of the auditing profession and the public’s perception of the assurance provided by the audit report.
More recent research has not viewed organizational and professional commit-
ment as inherently incompatible, finding instead a positive association between
the two constructs (Aranya et al., 1981, 1982). When the professional and organi-
zational commitments of public accountants are in conflict, however, researchers
have found lower job satisfaction and higher turnover intentions (Aranya & Ferris,
1984; Sorenson & Sorenson, 1974). In order to preserve the role of the audit
function in maintaining capital markets, it is essential that auditors’ commitment
to the profession take priority over loyalties to the organization (Schroeder &
Imdieke, 1977).1
Prior research has generally focused on the consequences of professional
commitment and has consistently found a significant association with important
outcome variables. Professional commitment has been positively associated with public accountants’ job satisfaction (Aranya et al., 1982; Bline et al., 1991) and organizational commitment (Aranya et al., 1982; Aranya & Ferris, 1984), and negatively associated with turnover/migration tendencies (Aranya et al., 1982; Bline et al., 1991) and organizational-professional conflict (Aranya & Ferris, 1984). Professional commitment has also been associated with auditors’ judgments regarding client retention decisions (Farmer, 1993).
A Model of Auditors’ Professional Commitment
The objective of this research is to model auditors’ professional commitment. The changing landscape and demographics of the accounting profession suggest that an understanding of the socializing factors leading to high levels of professional commitment is needed. Siegal et al. (1991, p. 58) define professional socialization as the “acquisition of the values, attitudes, skills and knowledge of a professional subculture.” Fogerty (1992) contends that socialization within public accounting organizations is to a large extent a coercive process, as new initiates are inculcated to adopt the values of the dominant culture. Our model of professional commitment examines two socialization factors not considered in prior published research: firm size and political ideology.
Firm Size
Pratt and Beaulieu (1992) asserted that differences in firm size proxy for differ-
ences in culture. They concluded that larger firms have more rigid control systems
than smaller firms, resulting in the large firms being more structured and mecha-
nistic than the smaller firms. Wheeler et al. (1987) found that the nature of the work
environment, the organizational structure, performance evaluations, compensation
and promotion procedures in large firms differed substantially from those of
smaller firms. Goetz et al. (1991) contended that the more structured and bureau-
cratic environment of larger firms resulted in less individual voice in determining
rules of conduct within the firm. Ponemon (1992) claims that such a strong firm
culture effectively results in the organization weeding out those persons who fail
to conform.
These factors imply that the loyalty of accountants in the larger firms must
be first to the organization and then to the profession. Goetz et al. (1991)
support this premise and assert that because smaller firms have “less stand-alone
credibility” than do larger firms, practitioners in the smaller firms need the
profession more than practitioners in the larger firms. Larger firms are more visible and prestigious, endowing their members with an identity separate from the profession. This suggests that auditors in smaller firms may identify more readily with the profession, vis-à-vis the organization, than auditors in larger firms and correspondingly develop a greater sense of commitment to the profession.
H1. Firm size is inversely related to auditors’ professional commitment.
Political Ideology
Socialization encourages persons “to become similar to their profession, not only
as it is embodied by other organizational members, but also as it is defined by
the profession’s espoused ideals” (Fogerty, 1992, p. 139). This description of
the socialization process implies the existence of a prototypic public accountant
embodying desirable characteristics, values and attitudes. The more effective the
socialization processes, the greater the correspondence between the prototype
and the professional member. Some values and attitudes (e.g. commitment, identification) may be more readily influenced and inculcated by the socialization process than others (e.g. religious preferences). It is also possible that some prototypic characteristics are not amenable to socialization (e.g. gender, race).
A particularly appropriate theory for examining the influence of prototypes
on socialization processes in the auditing profession is self-categorization theory
(SCT) (Chatman et al., 1998; Hogg & Terry, 2000; Tajfel & Turner, 1985).2 SCT
focuses on the process whereby individuals define their self-concept in relation
to their membership in social groups. Prototype-based comparisons, whereby the individual is socially categorized into favorable in-group or unfavorable out-group membership, lie “at the heart” of SCT processes (Hogg & Terry, 2000, p. 122). Prototypes are cognitive representations of the defining
and stereotypical features of in-groups, embodying exemplary or ideal types
and capturing characteristics that differentiate them from other groups. These
characteristics include demographic attributes, behaviors, attitudes and values.
Critical to the notion of prototypes is that they accentuate similarities within and
differences between groups (Hogg & Terry, 2000). For example, because the pro-
totypical partner in public accounting is male, an in-group characteristic may be
masculinity and an out-group characteristic femininity (Maupin, 1993; Maupin &
Lehman, 1994).3
Prototype-based self-categorization is relevant for modeling professional
commitment as a socialization process directed towards cultivating professional
values (Jeffery & Weatherholt, 1996; Larson, 1977) for several reasons. First, in-
group members, reflecting prototypic characteristics, are more likely to cooperate
with each other and to compete with out-group members (Chatman et al., 1998).
Second, in-group members are likely to receive favorable treatment compared to
out-group members (Ashforth & Mael, 1989). This favoritism may be reflected
in work assignments, performance evaluations, receipt of voluntary mentoring, or
through informal signals of preference relative to out-group members. As a result,
in-group members are likely to maintain more favorable attitudes towards their
profession and be more readily socialized than out-group members. Third, SCT
implies that a prototypically homogeneous audit profession is likely to develop,
which may facilitate socialization by reducing uncertainty regarding appropriate attitudes and behaviors (Hogg & Terry, 2000).
Chatman (1991) utilized a person-organization fit approach, defined as the con-
gruence between organizational and individual values, in examining socialization
processes in public accounting. She found that socialization is facilitated to the extent that new auditors possess or inculcate values similar to the prototypical
organizational values. Recruits whose values were aligned with the prevalent
organizational values had greater satisfaction and lower turnover than recruits
who maintained dissimilar values.
Kanter (1977) contends that in-group conformity is a prerequisite for ad-
vancement in organizations and that promotion largely depends upon presenting
political views as well as sex-role characteristics that are similar to the dominant
or prototypic upper-level managers. Sweeney and Fisher (1999) propose that
conservative political ideology represents a normative set of shared values in
public accounting and is an important socialization factor. In his analysis of the
influence of social class on political orientation, Burns (1992) identified several
dimensions collectively predictive of conservative ideology. The dimensions
identified as explaining a conservative/Republican political orientation included
engaging in mental (versus manual) labor, self-employment, individualistic
(versus collective) economic orientation, being white, and being male.4 These dimensions
are generally descriptive of the prototypic audit firm partner.
There has been little research to date examining the political orientation of
public accountants. In a broad sample of public accountants, Sweeney (1995)
found that approximately 80% identified themselves as politically conservative.
Further evidence of the conservative orientation of public accounting is reflected in political party contributions during the 1999–2000 election cycle. The
combined contributions of the American Institute of Certified Public Accountants
and the Big 5 international firms to the conservatively oriented Republican Party
($3,358,746) were approximately twice those to the more liberal Democratic
Party ($1,708,220) (FECInfo, 2001).
If political ideology is an important socializing variable in public accounting,
then conservative auditors are most likely to inculcate and embrace the prototypic
politically conservative values of the profession. Politically liberal auditors may
feel disenfranchised by the conservative orientation of public accounting and have
difficulty identifying with the dominant political values. As a result, it is likely that
politically conservative auditors would be more readily socialized, and therefore
be more committed to the profession, than their politically liberal counterparts.
H2. Politically conservative auditors will have greater professional commitment than politically liberal auditors.
Control Paths
Prior research has documented that partners in public accounting are typically
male (Hooks & Cheramy, 1994; Hull & Umansky, 1997) and, on average,
have developed to the conventional level of moral reasoning (Sweeney, 1995).
Researchers have suggested that masculinity (Maupin, 1993; Maupin & Lehman,
1994) and conventional moral reasoning (Ponemon, 1992) represent prototypes
in public accounting. Since the influence of both gender and moral reasoning on
professional commitment has been examined in prior research, these variables
are included as control paths in the model of professional commitment.
Although the literature suggests that gender barriers in public accounting may
preclude women from attaining the same level of commitment to the profession as
men (Maupin, 1993; Maupin & Lehman, 1994), the results of empirical research
have been equivocal. Gaffney et al. (1993) found that family obligations increased
the professional commitment of men in public accounting but had no effect on
women’s professional commitment. Street et al. (1993), after controlling for
positional level, did not find a difference in professional commitment between
female and male public accountants.
Covaleski et al. (1998) contend that although women may have “broken the
glass ceiling” to attaining partnership in Big 6 firms, there is still a paucity of
high-level female partners. Women who are unable or unwilling to adopt the masculine characteristics required by the male-dominated culture of public accounting may encounter obstacles in making partner (Maupin & Lehman, 1994). Given the predominance of male partners and the difficulties that women may encounter in adopting in-group male qualities, women in public accounting may
represent an out-group and have correspondingly less professional commitment
than men.
H3. Male auditors will have greater professional commitment than will female
auditors.
Ethics researchers in accounting have consistently found that the ethical devel-
opment of auditors, as measured by the P score of the Defining Issues Test
(DIT) (Rest, 1986, 1993), most commonly reflected conventional reasoning
and was inversely related to positional level (Lampe & Finn, 1992; Ponemon
& Gabhart, 1993; Shaub, 1994). This result seemingly contradicts Kohlberg’s
(1969) moral development theory, which holds that development is sequential
and progressive but not regressive. Ponemon (1992) contended that the inverse
relationship between P scores and rank in public accounting organizations was
the result of a selection-socialization process whereby firms prefer to hire and
then promote individuals with a shared set of ethical values and beliefs. He found
that conventional reasoning auditors, as measured by DIT P scores, were more likely to be favorably evaluated and promoted, and less likely to turn over, than
principled reasoning auditors. Ponemon (1992, p. 244) asserted that “individuals
with too high a level of ethical reasoning may experience difficulty progressing
to the upper echelons in the accounting firms’ formal hierarchy.” In other words,
accountants who reason at a conventional level are most likely to accept, embrace,
and support the prototypical norms of the firm and the profession, increasing
their acceptance within the organization and opportunities for promotion. Thus,
it appears that conventional reasoning, as opposed to higher order principled
reasoning, is representative of the ethical value prototype of the audit profession.
Dwyer et al. (2000) examined the relationship between practicing accountants’
professional commitment and DIT P scores. Their results suggested that ac-
countants’ ethical development influenced their interpretation of the professional
commitment construct, although the authors did not indicate a directional
relationship. Shaub et al. (1993) found that auditors’ professional commitment
was influenced by their ethical orientation, with ethical idealism positively
related to and ethical relativism negatively related to commitment. Jeffery and
Weatherholt (1996) posited a link between an accountant’s ethical development,
as measured by DIT P scores, and his or her professional commitment. Consistent
with Ponemon’s (1992) selection-socialization hypothesis, Jeffery and Weather-
holt found that conventional reasoning accountants had higher professional
commitment than principled reasoning accountants.
H4. Professional commitment will be inversely related to auditors’ ethical
development, as measured by the P score of the DIT.
The relationship between positional level and professional commitment has
also been examined in prior research and is included as a control path in our
model. Advancement within public accounting organizations is largely a result
of socialization processes, whereby individuals who reflect the dominant culture
and values of the organization are more likely to be promoted (Fogerty, 1992;
Ponemon, 1992; Pratt & Beaulieu, 1992). Early research on commitment
in public accounting organizations (Schroeder & Imdieke, 1977; Sorenson,
1967; Sorenson & Sorenson, 1974) suggested that partners were more organi-
zationally oriented and less professionally committed than were staff members.
More recent research has not supported this contention. These studies instead
found that professional commitment is positively associated with rank in the firm
(Adler & Aranya, 1984; Aranya et al., 1981; Aranya & Ferris, 1984; Jeffery &
Weatherholt, 1996; Norris & Niebuhr, 1983).
Goetz et al. (1991) speculated that experience and tenure heighten professional
commitment. This is consistent with viewing professional commitment as the product
of a socialization process. If the socialization process is successful, then it
follows that those who have been in the profession the longest should display the
strongest commitment. Turnover-survivorship processes would also suggest that
professional commitment should be stronger at higher positional levels. Individuals
who are more committed to the profession would be more likely to remain, which may
explain the inverse relationship between professional commitment and turnover
(Aranya et al., 1982; Bline et al., 1991).
H5. Professional commitment of auditors will increase with rank in the firm.
Hypothesis 3 and Hypothesis 5 posit that gender and rank will have an effect
on professional commitment. Prior studies involving public accountants have
indicated a strong relationship between gender and rank, with females being
underrepresented at higher ranks (Collins, 1993; Hooks & Cheramy, 1994;
Maupin, 1993; Maupin & Lehman, 1994; Sweeney, 1995). As a result, it is
necessary to control for the influence of positional level when assessing the
relationship between gender and professional commitment.
Prior research assessing the ethical development of public accountants has
generally found the DIT P scores of females to be higher than those of males
(Bernardi & Arnold, 1997; Enyon et al., 1997; Shaub, 1994; Sweeney, 1995).
The gender effect on P scores appears to hold regardless of firm size. As a result,
the ethical development of female auditors is expected, on average, to be more
advanced than that of male auditors. Therefore, the influence of gender must be
controlled for in assessing the effect of ethical development, as measured by DIT
P scores, on auditors’ professional commitment (H4).
Sweeney and Fisher (1998, 1999) and Fisher and Sweeney (2002) contend
that the DIT contains embedded political content that biases the measurement of
test-takers’ ethical development. Although Rest et al. (1999) dispute this claim,
they concede that as much as 40% of the variance in DIT P scores is explained
by political ideology. A priori, the political content of the DIT is expected to
bias the P scores of politically liberal auditors upward and those of politically
conservative auditors downward (Sweeney & Fisher, 1998, 1999).
Therefore, the influence of political ideology must be controlled in assessing the
effect of ethical development on professional commitment (H4).5
In summary, we hypothesize that auditors’ professional commitment is directly
impacted by the following variables: firm size (H1), political ideology (H2),
gender (H3), ethical development, as measured by DIT P scores (H4), and
positional level (H5). The model of professional commitment also includes the
following control paths: gender to positional level, gender to ethical development,
and political ideology to DIT P scores. Figure 1 presents our model of auditors’
professional commitment.
Fig. 1. Theoretical Model of Professional Commitment.
METHOD
Sample
Prior to collecting data, management representatives from offices of multiple
public accounting firms agreed to participate in the study and to provide
auditor subjects. Three international firms (Big 5: “large”), two national firms
(“medium”), and six local or regional firms (“small”) participated in the study. The
appropriate office representative indicated the approximate number of available
auditor subjects. The office representative was then provided with the required
number of research instruments to distribute to the participants. Each research
instrument consisted of a questionnaire, the six-story DIT and instructions,
enclosed in a stamped return envelope addressed to the researchers. Participation
was voluntary and subjects were assured of anonymity. Participants provided
demographic data but did not otherwise identify themselves.
Table 1. Descriptive Statistics for Sample.
                          Firm Size
Position        Small    Medium    Large    Totals
Staff              23        22       55       100
Senior             15        10       63        88
Supervisor         11         8       19        38
Manager            10        14       39        63
Partner            29         9       22        60
Totals             88        63      198       349
Males: 230       Liberals: 63         Average age: 30.3 years (S.D. = 8.1)
Females: 119     Conservatives: 286   Average experience: 7.3 years (S.D. = 7.1)
          Professional Commitment    P Score
Mean:     75.51                      42.14
S.D.:     11.73                      12.53
Range:    41–103                     8.3–73.3
A total of 383 research instruments were received by the researchers, resulting in
a response rate of approximately 72%. From this initial sample, 27 subjects failed
to pass the internal reliability checks of the DIT, two subjects did not indicate their
political ideology, and five did not complete the professional commitment section.
These respondents were purged from the sample. The final sample consisted of a
cross-section of 349 auditors, of which 66% were male and 82% were politically
conservative. Descriptive statistics for the sample are given in Table 1.
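The sample figures above are internally consistent, and this is easy to confirm. The Python sketch below recomputes the final sample size, the Table 1 margins, and the reported gender and ideology percentages from the counts given in the text. The number of instruments distributed is not reported; the value computed for it is only a back-calculation from the stated response rate of approximately 72%, so it is an assumption rather than a figure from the study.

```python
# Figures as reported in the Sample section and Table 1.
RECEIVED = 383
EXCLUDED = {
    "failed DIT internal reliability checks": 27,
    "political ideology not indicated": 2,
    "professional commitment section incomplete": 5,
}

# Table 1 counts: position -> (small, medium, large) firms.
TABLE1 = {
    "Staff": (23, 22, 55),
    "Senior": (15, 10, 63),
    "Supervisor": (11, 8, 19),
    "Manager": (10, 14, 39),
    "Partner": (29, 9, 22),
}

final_n = RECEIVED - sum(EXCLUDED.values())
row_totals = {pos: sum(counts) for pos, counts in TABLE1.items()}
column_totals = [sum(col) for col in zip(*TABLE1.values())]

# Assumption: instruments distributed, back-solved from the ~72% response rate.
approx_distributed = round(RECEIVED / 0.72)

pct_male = 100 * 230 / final_n
pct_conservative = 100 * 286 / final_n

print(final_n, column_totals, sum(column_totals))
print(round(pct_male), round(pct_conservative), approx_distributed)
```

The final sample of 349, the firm-size totals of 88, 63 and 198, and the rounded shares of 66% male and 82% conservative all agree with the text; the implied figure of roughly 532 instruments distributed is an estimate only.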
Measures
Professional commitment (PC) was measured with the 15-item scale adapted by
Aranya et al. (1981) from the Porter et al. (1974) organizational commitment
questionnaire. This scale has been utilized extensively by accounting researchers
to measure professional commitment (Aranya et al., 1982; Gaffney et al., 1993;
Harrell et al., 1986; Jeffery & Weatherholt, 1996; Street et al., 1993). Researchers
have indicated that the scale has good internal consistency, with Cronbach’s
alpha reported in the high 0.80s (Aranya et al., 1981; Aranya & Ferris, 1984;
Bline et al., 1991).
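Cronbach’s alpha, the internal-consistency statistic cited above (and reported as 0.88 for this sample in Note 6), compares the sum of the item variances with the variance of the summed scale: for k items, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The Python sketch below implements that standard formula. The item responses in the example are hypothetical, since the study’s item-level data are not reported.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    Sample variances (divisor n - 1) are used throughout; the divisor cancels
    in the ratio as long as it is applied consistently.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses: two items answered by four respondents.
example = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(cronbach_alpha(example))
```

For the hypothetical data the result is 0.75 (up to floating-point rounding); identical items would give exactly 1.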
Bline et al. (1991), in an extensive examination of the psychometric properties
of the professional commitment questionnaire, report that the scale measures a
construct distinct from organizational commitment. Their tests indicated that the
professional commitment scale has adequate reliability and validity. Furthermore,
the professional commitment construct correlated positively with job satisfaction
and negatively with intent to leave the profession. Other accounting researchers
have reported negative correlations between the professional commitment scale
and organizational-professional conflict (Aranya et al., 1981; Harrell et al., 1986)
and positive correlations with favorable work attitudes in public accounting
(Aranya et al., 1982).6
Ethical development was measured by the sample respondents’ P score
from the 6-story DIT (Rest, 1979, 1986, 1993). The P score is a continuous
measure, ranging from 0 to 95, reflecting the relative importance a subject gives
to principled moral reasoning in resolving moral dilemmas (Rest et al., 1997,
p. 498). Rest (1993) reports an average P score of 45 for college graduates,
although accounting researchers have generally found that public accountants
score lower than adults from the general population at similar educational levels
(Ponemon, 1992; Sweeney, 1995). Rest (1986, pp. 176–179) contends that the P
score correlates most strongly with educational level but only weakly with gender,
intelligence and ethnic background. Gender, however, appears to have a stronger
influence on accountants’ P scores than it does in the general population, with
females attaining significantly higher scores (Bernardi & Arnold, 1997; Enyon
et al., 1997; Shaub, 1994; Sweeney, 1995).
The DIT has been subjected to extensive reliability and validity tests with
generally good results (Rest, 1979, 1986; Rest et al., 1999). Some researchers
(Emler et al., 1983), however, contend that the DIT contains a political bias. In
studies with accounting subjects, Sweeney and Fisher (1998, 1999) found that the
DIT contained embedded political content that tended to overstate the scores
of political liberals and to understate the scores of political conservatives. They
suggest that researchers utilizing the DIT control for subjects’ political ideology in
order to more clearly interpret the relationship between P scores and the variable
of interest.
Subjects indicated their political ideology in response to the following question:
“Regarding important social and political issues, would you classify your
opinion or perspective as primarily conservative or liberal?” Forcing subjects to
identify their positions as primarily liberal or conservative is consistent with prior
research (Sweeney, 1995) and eliminates the ambiguity of a political “moderate”
classification.
EMPIRICAL RESULTS
Correlations
Table 2 presents correlation coefficients for professional commitment and
variables of interest. Subjects’ professional commitment is negatively associated
with the size of their respective firm and positively associated with their positional
level. Political ideology and gender are associated with professional commitment
and DIT P scores. Political ideology is not correlated with gender, position, or
firm size. The significant association between gender and position results from the
underrepresentation of female auditors at the higher ranks. The association between
firm size and position is an apparent artifact of the non-random sample selection
process.
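The one-tailed significance flags in Table 2 follow from the usual test of a Pearson correlation against zero, t = r√(n − 2)/√(1 − r²), with n = 349. The Python sketch below reproduces several of the flags. It assumes the predicted direction for each one-tailed test, and it uses the standard normal tail in place of the t distribution; that approximation is very close with 347 degrees of freedom, but it is not necessarily how the authors computed their p-values.

```python
import math

N = 349  # final sample size (Table 2)


def one_tailed_p(r, n=N):
    """One-tailed p-value for H0: rho = 0, assuming r lies in the predicted direction.

    Uses t = |r| * sqrt(n - 2) / sqrt(1 - r**2). With n - 2 = 347 degrees of
    freedom the t distribution is close to the standard normal, so the normal
    upper tail 0.5 * erfc(t / sqrt(2)) is used as an approximation.
    """
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 0.5 * math.erfc(t / math.sqrt(2))


def flag(r):
    """Table 2 convention: ** for p < 0.01, * for p < 0.05 (one-tailed)."""
    p = one_tailed_p(r)
    return "**" if p < 0.01 else "*" if p < 0.05 else ""


examples = {
    "PC vs political ideology": -0.132,
    "PC vs gender": -0.116,
    "firm size vs political ideology": 0.087,
    "P score vs position": -0.105,
    "position vs gender": -0.353,
}
for label, r in examples.items():
    print(f"{label:32s} r = {r:+.3f}  p = {one_tailed_p(r):.4f} {flag(r)}")
```

Each flag produced matches Table 2. The closest call is the 0.087 correlation between firm size and political ideology, whose one-tailed p of roughly 0.052 falls just above the 0.05 cutoff.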
Structural Equation Modeling
Structural equation modeling was used to evaluate the proposed hypotheses. The
structural equation model utilized to test the hypotheses corresponds to the model
in Fig. 1. Each link between the variables in Fig. 1 has a path coefficient that
measures the impact of the antecedent variable in explaining the variance in the
outcome variable. For example, the path coefficient for the link between political
ideology and P score indicates the increase in P score, measured in standard
deviations, associated with a one standard deviation increase in political ideology,
holding gender (the other antecedent of P score) constant.
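For a recursive model with observed variables only, like the one in Fig. 1, the standardized coefficients of each equation can be obtained from the correlation matrix by solving R_xx β = r_xy, which is an ordinary least-squares regression on standardized variables. The Python sketch below applies this to the Table 2 correlations. It is an illustrative approximation, not a reproduction of the authors’ estimation procedure, which the text does not describe in detail; the variable labels are hypothetical names chosen for the example. With a single antecedent the standardized coefficient is simply the correlation; with two antecedents it removes their shared overlap.

```python
# Table 2 correlations stored as a lower triangle, in the order of VARS.
VARS = ["PC", "firm_size", "ideology", "p_score", "position", "gender"]
LOWER = [
    [1.000],
    [-0.246, 1.000],
    [-0.132, 0.087, 1.000],
    [-0.017, -0.080, 0.194, 1.000],
    [0.234, -0.146, -0.046, -0.105, 1.000],
    [-0.116, 0.054, 0.055, 0.205, -0.353, 1.000],
]


def corr(x, y):
    """Look up r(x, y) in the symmetric matrix stored as a lower triangle."""
    i, j = VARS.index(x), VARS.index(y)
    return LOWER[max(i, j)][min(i, j)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_paths(outcome, predictors):
    """Standardized coefficients for one equation: solve R_xx * beta = r_xy."""
    rxx = [[corr(p, q) for q in predictors] for p in predictors]
    rxy = [corr(p, outcome) for p in predictors]
    return dict(zip(predictors, solve(rxx, rxy)))


position_eq = standardized_paths("position", ["gender"])
p_score_eq = standardized_paths("p_score", ["gender", "ideology"])
print({k: round(v, 3) for k, v in position_eq.items()})
print({k: round(v, 3) for k, v in p_score_eq.items()})
```

For the two control equations this returns −0.353 for gender to position and approximately 0.195 and 0.183 for gender and political ideology to P score, in line with Table 3. The same function can be applied to the professional commitment equation, although agreement there is not guaranteed because the text does not state the estimator behind Table 3.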
The goal of structural equation modeling is to evaluate whether associations
proposed in theory, or in prior research, fit the present data set. Evidence of
adequate fit is provided by various fit indices. However, relying on any one index
can be problematic, since several of the commonly used fit indices are sample-size
dependent. For this reason, multiple measures of overall model fit are reported in
this study.
Table 2. Correlation Matrix.
                               (1)       (2)       (3)       (4)       (5)       (6)
(1) Professional Commitment    1.000
(2) Firm Size                 −0.246**   1.000
(3) Political Ideology        −0.132**   0.087     1.000
(4) P Score                   −0.017    −0.080     0.194**   1.000
(5) Position                   0.234**  −0.146**  −0.046    −0.105*    1.000
(6) Gender                    −0.116*    0.054     0.055     0.205**  −0.353**   1.000
N = 349.
* p < 0.05 (one-tailed significance).
** p < 0.01 (one-tailed significance).
The Normed Fit Index (NFI) (Bentler & Bonett, 1980) ranges from 0 to 1, with
values over 0.9 indicating a good fit. This index may be viewed as the percentage
of observed-measure covariation explained by a given model. The disadvantage of
the NFI is that it can underestimate goodness-of-fit in small samples. Bentler’s
(1990) Comparative Fit Index (CFI) is based upon the Bentler and Bonett (1980)
NFI but includes a correction for sample-size dependency. CFI values always lie
between 0 and 1, with values over 0.9 indicating a relatively good fit (Bentler,
1990). Finally, the Adjusted Goodness of Fit Index (AGFI), devised by Joreskog
and Sorbom (1984), adjusts the traditional Goodness of Fit Index (GFI) for the
model’s degrees of freedom; it ranges from 0 to 1, with values above 0.9 indicating
acceptable fit. In sum, the GFI, the AGFI, the NFI, and the CFI are all reported in
this study. Reporting several indices with different properties lends some
assurance that the conclusions about fit do not rest on a single, possibly
spurious, measure.
Fig. 2. Structural Equation Model with Path Coefficients.
Table 3. Structural Equation Modeling Results.
Dependent   Independent          Associated   Path          t-Value   p-Value
Variable    Variable             Hypothesis   Coefficient
PC          Firm size            H1           −0.236        −4.65     0.001
PC          Political ideology   H2           −0.154        −2.98     0.002
PC          Gender               H3           −0.042        −0.76     0.224
PC          P Score              H4            0.059         1.13     0.132
PC          Position             H5            0.184         3.43     0.001
Position    Gender               –            −0.353        −7.04     0.001
P Score     Gender               –             0.195         3.78     0.001
P Score     Political ideology   –             0.183         3.55     0.001
N = 349. PC = Professional Commitment.
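The fit indices described above have simple closed forms. The Python sketch below uses their standard definitions: NFI = (χ²_null − χ²_model)/χ²_null; CFI = 1 − max(χ²_model − df_model, 0)/max(χ²_null − df_null, χ²_model − df_model, 0); and AGFI = 1 − [p(p + 1)/(2 df_model)](1 − GFI) for p observed variables. The paper does not report its chi-square statistics or degrees of freedom, so every input in the example is hypothetical; only the null-model degrees of freedom, 6 × 5/2 = 15 for an independence model of the six observed variables in Fig. 1, follow from the design.

```python
def nfi(chi2_model, chi2_null):
    """Bentler-Bonett Normed Fit Index: share of the null model's chi-square removed."""
    return (chi2_null - chi2_model) / chi2_null


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Bentler's Comparative Fit Index, based on the noncentrality estimate chi2 - df."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)  # d_model >= 0, so this also floors at 0
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def agfi(gfi, n_observed, df_model):
    """Adjusted GFI: scales the GFI shortfall by (data moments / model df)."""
    moments = n_observed * (n_observed + 1) / 2
    return 1.0 - (moments / df_model) * (1.0 - gfi)


# Hypothetical inputs for illustration only (not the study's statistics).
P_OBSERVED = 6                                  # observed variables in Fig. 1
DF_NULL = P_OBSERVED * (P_OBSERVED - 1) // 2    # independence model: 15 df
CHI2_NULL, CHI2_MODEL, DF_MODEL, GFI = 250.0, 9.0, 4, 0.99

print(f"NFI  = {nfi(CHI2_MODEL, CHI2_NULL):.3f}")
print(f"CFI  = {cfi(CHI2_MODEL, DF_MODEL, CHI2_NULL, DF_NULL):.3f}")
print(f"AGFI = {agfi(GFI, P_OBSERVED, DF_MODEL):.3f}")
```

In the example the AGFI falls below the GFI because 21 data moments are spread over only 4 model degrees of freedom, which is the degrees-of-freedom adjustment the AGFI is designed to make.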
The results of the structural equation analysis are depicted in Fig. 2. With GFI,
AGFI, NFI, and CFI values exceeding 0.9 in all instances, the theoretical model
appears to provide a very good fit to the data.
Tabular results of the structural equation analysis, including each hypothesis
and its corresponding path coefficient, are presented in Table 3. Consistent with
the relatively high model fit indices, six of the eight paths in Table 3 (three of
the five hypothesized paths and all three control paths) were significant,
providing further support for the proposed theoretical model of professional
commitment.
Tests of Hypotheses
Hypothesis 1 predicts a negative relationship between firm size and professional
commitment. The path coefficient for this theoretical link is −0.236 and is
significant at the p < 0.001 level. Thus, auditors in smaller firms tend to report
higher levels of professional commitment.
Hypothesis 2 predicts that conservative auditors will demonstrate higher
professional commitment than liberal auditors. For the full sample, a one-tailed
t-test indicated that the professional commitment of politically conservative
auditors was higher than that of liberal auditors (76.2 vs. 72.2;
p < 0.017). The path coefficient for this theoretical link is −0.154 (liberal
ideology is coded as the higher value) and is significant at the p < 0.002 level.
This result provides support for H2 and suggests that political ideology is an
influential socialization variable in public accounting.
Male auditors are predicted in H3 to have higher professional commitment
than female auditors. For the full sample, males reported higher commitment
than females (76.5 vs. 73.6; p < 0.016) but the association between positional
level and gender must be considered before drawing any conclusions regarding
the gender-professional commitment relationship. The control path between
gender and position has a coefficient of −0.353 and is significant at the p < 0.001
level, implying that male auditors in the sample are more likely to hold
higher-level positions. After controlling for the influence of positional level,
the path coefficient linking gender and professional commitment is −0.042 and
insignificant. This result suggests that gender does not play a direct role in the
development of an auditor’s professional commitment.
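The group means reported for H2 and H3 can be tied back to the correlations in Table 2. For a 0/1 variable, Pearson’s r reduces to the point-biserial correlation r = (M₁ − M₀)/sₙ × √(pq), where p and q are the two group proportions and sₙ is the standard deviation computed with divisor n. The Python sketch below assumes, from the signs in Table 2, that liberal and female are coded 1, and that the Table 1 S.D. of 11.73 uses divisor n − 1; both are assumptions, not statements from the text.

```python
import math

SD_PC = 11.73  # S.D. of professional commitment, Table 1


def point_biserial(mean_1, mean_0, n_1, n_0, sd=SD_PC):
    """Pearson r between a 0/1 indicator and a score, from group summaries.

    r = (M1 - M0) / s_n * sqrt(p * q), where p = n_1 / n, q = n_0 / n and s_n
    is the S.D. with divisor n. The reported S.D. is assumed to use divisor
    n - 1, so it is rescaled by sqrt((n - 1) / n) first.
    """
    n = n_1 + n_0
    s_n = sd * math.sqrt((n - 1) / n)
    p, q = n_1 / n, n_0 / n
    return (mean_1 - mean_0) / s_n * math.sqrt(p * q)


# Assumed coding, inferred from the signs in Table 2: liberal = 1, female = 1.
r_ideology = point_biserial(72.2, 76.2, n_1=63, n_0=286)
r_gender = point_biserial(73.6, 76.5, n_1=119, n_0=230)
print(f"ideology: {r_ideology:+.3f}  (Table 2: -0.132)")
print(f"gender:   {r_gender:+.3f}  (Table 2: -0.116)")
```

This gives about −0.131 for political ideology and −0.117 for gender, close to the −0.132 and −0.116 in Table 2; the small gaps are consistent with the group means having been rounded to one decimal.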
Hypothesis 4 predicts an inverse relationship between ethical development, as
measured by DIT P scores, and professional commitment. In order to interpret this
path unambiguously, the associations between P score and gender and between P
score and political ideology must be considered. The corresponding coefficient for
the path between gender and P score is 0.195 and significant (p < 0.001). This
result suggests that female auditors attain higher P scores than their male
counterparts. Additionally, the path coefficient linking political ideology with
P score is 0.183 and also significant (p < 0.001), suggesting that politically
liberal auditors attain higher P scores than politically conservative auditors.
After controlling for the influence of gender and political ideology, the path
coefficient between P score and professional commitment is 0.059 and
insignificant. H4 is therefore not supported, as an auditor’s ethical
development does not appear to directly influence his or her professional
commitment.
Hypothesis 5 predicts that there is a positive relationship between position and
professional commitment. The path coefficient linking these two constructs is
0.184 and is significant at the p < 0.001 level. This result provides support for H5
and suggests that auditors employed at higher levels within their respective firms
exhibit higher levels of professional commitment, although the relationship is not
necessarily linear. Furthermore, it is not clear from the analysis whether auditors
with higher levels of professional commitment are more likely to be promoted, or
whether auditors develop higher professional commitment as they advance within
the profession.
Additional Analysis
Table 4 examines the influence of the significant main effects on auditors’
professional commitment, partitioned by firm size, position and political ideology.
Professional commitment scores are highest in the small (local or regional) firms
and, as expected, at the partner level. Senior auditors in small and medium
(national) firms also demonstrate relatively high commitment. The influence of
political ideology is evident, as conservative auditors have higher commitment
than liberal auditors for every cell containing at least two liberal auditors.
Table 4. Summary of Professional Commitment Levels by Political Ideology
and Position for Each Firm Size.
Firm Size (n, Mean PC)   Position (n, Mean PC)    Conservative (n, Mean PC)   Liberal (n, Mean PC)
Small (88, 80.02)        Staff (23, 75.57)        17, 77.06                    6, 71.33
                         Senior (15, 80.20)       11, 81.27                    4, 77.25
                         Supervisor (11, 77.91)    9, 80.11                    2, 68.00
                         Manager (10, 76.90)       8, 77.13                    2, 76.00
                         Partner (29, 85.34)      21, 86.38                    8, 82.63
Medium (63, 76.44)       Staff (22, 76.64)        16, 78.31                    6, 72.17
                         Senior (10)                                           1, 93.00
                         Supervisor (8)                                        0, –
                         Manager (14)                                          1, 86.00
                         Partner (9)                                           1, 90.00
Large (198, 73.21)       Staff (55, 73.47)        48, 74.13                    7, 69.00
                         Senior (63)                                          15, 66.00
                         Supervisor (19)                                       2, 61.50
                         Manager (39)                                          7, 67.29
                         Partner (22)                                          1, 95.00
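The cell-level claim above, and the internal consistency of Table 4, can be checked for the cells whose values survive in the table. The Python sketch below keeps only cells with at least two liberal auditors, tests whether the conservative mean exceeds the liberal mean in each, and confirms that each reported position mean equals the n-weighted average of its conservative and liberal subgroups up to rounding. Cells whose conservative means are not shown in Table 4 are left out.

```python
# Recoverable Table 4 cells: (firm size, position) ->
#   (reported position mean, (conservative n, mean), (liberal n, mean))
CELLS = {
    ("Small", "Staff"): (75.57, (17, 77.06), (6, 71.33)),
    ("Small", "Senior"): (80.20, (11, 81.27), (4, 77.25)),
    ("Small", "Supervisor"): (77.91, (9, 80.11), (2, 68.00)),
    ("Small", "Manager"): (76.90, (8, 77.13), (2, 76.00)),
    ("Small", "Partner"): (85.34, (21, 86.38), (8, 82.63)),
    ("Medium", "Staff"): (76.64, (16, 78.31), (6, 72.17)),
    ("Large", "Staff"): (73.47, (48, 74.13), (7, 69.00)),
}


def pooled_mean(*groups):
    """n-weighted mean of (n, mean) subgroups."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Cells with at least two liberal auditors, per the criterion in the text.
eligible = {key: v for key, v in CELLS.items() if v[2][0] >= 2}
conservative_higher = all(cons[1] > lib[1] for _, cons, lib in eligible.values())

# Largest gap between a reported position mean and its pooled subgroup mean.
max_gap = max(abs(reported - pooled_mean(cons, lib))
              for reported, cons, lib in CELLS.values())

print(len(eligible), conservative_higher, round(max_gap, 4))
```

All seven recoverable cells qualify, conservatives score higher in each, and every reported position mean lies within 0.01 of the pooled value, which is consistent with two-decimal rounding of the subgroup means.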
An objective of socialization is to ensure that management promotes those
individuals who reflect the culture and values of the organization (Fogerty, 1992;
Kanter, 1977; Ponemon, 1992). If conservative ideology is a strongly held value
in the culture of public accounting, then politically conservative auditors should
perceive greater opportunities for advancement than politically liberal auditors.
To provide further evidence of the socializing influence of political ideology
in public accounting, subjects who were not partners were asked to respond
to the following question: “Please indicate what you believe are your chances
(likelihood) of making partner in your present firm.” The Likert response scale for
the question ranged from 1 (very low) to 7 (very high). Conservative auditors, on
average, perceived their opportunities for advancement to partner as significantly
greater than liberal auditors (3.68 vs. 2.96; p < 0.0003).
LIMITATIONS AND DISCUSSION

The credibility auditors confer upon financial statements is jointly dependent
upon their technical expertise and their commitment to the professional ideal of
independence (Watts & Zimmerman, 1986). In this study, auditors’ professional
commitment was modeled as the collective product of operant socializing
influences in public accounting firms. Two subsets of socializing forces were
examined: professional (firm size and position) and individual characteristics
affecting group membership (political ideology, gender, and ethical development).
The model was tested on a large sample of auditors representing all positional
levels in international, national and regional firms. The results of the structural
equation analysis were generally supportive of the model.
The results support the contention (Pratt & Beaulieu, 1992) that the culture
of the public accounting profession can be differentiated by firm size: firm size
showed a significant negative relationship with professional commitment.
Fogerty (1995, p. 46) suggests that firms “may differ in the balance they encourage
between commitment to the firm and to the profession.” Larger firms, perhaps
due to their stronger culture and identity separate from the profession (Goetz
et al., 1991), may be more likely to shift this balance towards organizational
commitment. Furthermore, auditors from larger firms may identify less with the
profession and more with the organization because of its greater prestige and
economic significance, while auditors from smaller firms may correspondingly
attach greater significance to professional membership. The findings also indicate
that rank or position, symbolic of status and compensation level, was positively
related to professional commitment.
The process of socialization implies that membership within the dominant
group conveys benefits. Feelings of inclusion or exclusion from the controlling
group are likely manifested in important job-related attitudes, such as professional
commitment (Chatman, 1991; Ponemon, 1992). A major contribution of this study
to the research literature is the inclusion of political ideology as a socializing force
in public accounting organizations. Politically conservative auditors, representing
the dominant ideology, had a greater commitment to the profession than did
liberal auditors.
The public accounting profession has, in recent years, increasingly emphasized
the recruitment of under-represented socio-economic groups; however, a truly
diverse workplace is open to disparate opinions and viewpoints. Although
public accountants have traded their green eyeshades for laptop computers, they
appear to still embrace a politically conservative ideology. Firm management
can benefit from this research in understanding that the public accounting
profession may be so doctrinally conservative that it effectively excludes
a significant segment of society, political liberals, whose perspectives
may be valuable in understanding a rapidly changing world. Efforts directed
towards changing the traditionally conservative image of the public account-
ing profession may be beneficial in attracting new members with alternative
viewpoints.
Although male auditors, on average, reported a stronger commitment to the
profession than female auditors, gender was not a significant direct factor in the
model after controlling for the influence of positional level. Gender did, however,
have an indirect impact on professional commitment through its association with
positional level. After controlling for political ideology and gender, the
relationship between auditors’ ethical development and professional commitment
previously reported by Jeffery and Weatherholt (1996) was not supported.
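The indirect impact of gender noted above can be made concrete with standard path-tracing rules: an indirect effect is the product of the standardized coefficients along its route, and a total effect adds the direct effect to all indirect effects. Applied to the Table 3 coefficients (with gender coded so that higher values denote female, as the signs in Table 2 indicate), a back-of-the-envelope decomposition is shown below. The paper does not test the significance of these indirect effects, and the route through P score includes a non-significant path, so the figures are illustrative only.

```latex
\begin{aligned}
\mathrm{IE}_{\text{gender}\to\text{position}\to\text{PC}} &= (-0.353)(0.184) \approx -0.065 \\
\mathrm{IE}_{\text{gender}\to\text{P score}\to\text{PC}} &= (0.195)(0.059) \approx 0.012 \\
\mathrm{TE}_{\text{gender}\to\text{PC}} &= -0.042 + (-0.065) + 0.012 \approx -0.095
\end{aligned}
```

Most of gender’s estimated total effect on professional commitment thus runs through positional level rather than through the direct path.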
The limitations of this research need to be recognized. First, the sample
selection process was non-random, which may limit generalizability. Second, as
the data were drawn from survey questionnaires, reliability is dependent upon the
truthful responses of the participants. Third, the dichotomous measure of political
ideology did not reflect the intensity of the subject’s commitment to conservative
or liberal positions. A more comprehensive measure may better contribute to our
understanding of the impact of political ideology as a socializing force in public
accounting organizations.
Potential extensions of this research include examining further the impact of
political ideology in accounting organizations. The relationship between political
ideology and important job-related attitudes, such as satisfaction, organizational
commitment and turnover intentions, may advance our understanding of profes-
sional socialization. Political ideology may also affect other important processes
in public accounting, such as recruitment and audit team dynamics.
NOTES
1. Former Securities and Exchange Commission (SEC) Chairman Arthur Levitt questioned
whether the expansion into more lucrative services compromises the traditional audit
function (Covaleski, 1999). Suggesting that the audit has merely become a conduit for
selling other services, Levitt contends that auditors may not be sufficiently committed to
societal expectations and professional standards.
2. SCT is an extension of social identity theory (SIT) (Ashforth & Mael, 1989; Brown,
2000; Tajfel & Turner, 1985). SIT maintains that one’s social identity is derived primarily
from group membership, that people strive to maintain a positive identity, and that this
positive identity largely results from favorable comparisons between relevant in-groups
and out-groups (Ashforth & Mael, 1989).
3. Fogerty (2000, p. 13) described the socializing influence of prototypes in public
accounting firms when he stated: “Experienced organizational members selectively provide
reinforcement, communicate the approved range for action, and serve as examples of
achievement.”
4. An individualist orientation supports the notion of capitalism in viewing people as
independent economic actors, as opposed to a collectivist orientation that is more aligned
with a socialist perspective (Burns, 1992, p. 352).
5. After controlling for political ideology and gender, Sweeney (1995) did not find a
significant relationship between rank and DIT P scores. Therefore, we do not control for
the influence of rank on ethical development.
6. Dwyer et al. (2000) examined the dimensionality of the Aranya et al. (1981) pro-
fessional commitment scale with a broad sample of practicing accountants and concluded
that the 15-item scale could be parsimoniously reduced to a five-item measure. In light of
this research, we performed a principal components, orthogonal rotation factor analysis of
the instrument. Results of the factor analysis indicated that 14 of the 15 items possessed
loadings of 0.40 or greater on a single factor. Item 7 of the instrument, which possessed a
loading of 0.15, was the lone item not contributing to the factor. The resulting eigenvalue
for the 14-item factor was 5.49. The Cronbach alpha for the 15-item measure was 0.88.
Supplemental analyses utilizing the reduced 5-item scale from Dwyer et al. (2000) were also
performed and the results were essentially identical to those incorporating the full scale.
ACKNOWLEDGMENTS
We gratefully acknowledge the helpful comments of the participants in the 2001
Annual Meeting of the Accounting, Behavior & Organizations Section, the 2002
Critical Perspectives in Accounting Conference, and the accounting research work-
shops at the Australian National University and at Washington State University.
REFERENCES
Adler, A., & Aranya, A. N. (1984). Comparison of the work needs, attitudes and preferences of profes-
sional accountants at different career stages. Journal of Vocational Behavior (August), 45–57.
Aranya, N., & Ferris, K. R. (1984). A re-examination of accountants’ organizational-professional
conflict. The Accounting Review, 59(October), 1–15.
Aranya, N., Lachman, R., & Amernic, J. (1982). Accountant’s job satisfaction: A path analysis.
Accounting, Organizations and Society (3), 201–215.
Aranya, N., Pollock, J., & Amernic, J. (1981). An examination of professional commitment in public
accounting. Accounting, Organizations and Society (4), 271–280.
Ashforth, B. E., & Mael, F. A. (1989). Social identity theory and the organization. Academy of
Management Review, 14, 20–39.
Bentler, P. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P., & Bonett, D. (1980). Significance tests and goodness of fit in the analysis of covariance
structures. Psychological Bulletin, 88, 588–606.
Bernardi, R. A., & Arnold, D. F. (1997). An examination of moral development within public accounting
by gender, staff level, and firm. Contemporary Accounting Research, 14, 653–668.
Bline, D. M., Duchon, D., & Meixner, W. F. (1991). The measurement of organizational and profes-
sional commitment: An examination of the psychometric properties of two commonly used
instruments. Behavioral Research in Accounting, 3, 79–96.
Brown, A. D. (2000). Organization studies and identity: Towards a research agenda. Human Relations,
54(1), 113–121.
Burns, T. J. (1992). Class dimensions, individualism, and political orientation. Social Spectrum, 12,
349–362.
Chatman, J. A. (1991). Matching people and organizations: Selection and socialization in public
accounting firms. Administrative Science Quarterly, 36, 459–484.
Chatman, J. A., Polzer, J. T., Barsade, S. G., & Neal, M. A. (1998). Being different yet feeling similar:
The influence of demographic composition and organizational culture on work processes and
outcomes. Administrative Science Quarterly, 43, 749–780.
Collins, K. M. (1993). Stress and departures from the public accounting profession: A study of gender
differences. Accounting Horizons, 7(March), 29–38.
Covaleski, J. M. (1999). SEC chief lashes out against auditors. Electronic Accountant (October 8th).
Covaleski, M. A., Dirsmith, M. W., Heian, J. B., & Samuel, S. (1998). The calculated and the avowed:
Techniques of discipline and struggles over identity in Big 6 public accounting firms. Admin-
istrative Science Quarterly, 43, 293–327.
Dwyer, P. D., Welker, R. B., & Friedberg, A. H. (2000). A research note concerning the dimensionality
of the professional commitment scale. Behavioral Research in Accounting, 12, 279–296.
Emler, N. P., Renwick, S., & Malone, B. (1983). The relationship between moral reasoning and political
orientation. Journal of Personality and Social Psychology, 45(5), 1072–1080.
Enyon, G., Hill, N. T., & Stevens, K. T. (1997). Factors that influence the moral reasoning abilities of
accountants: Implications for universities and the profession. Journal of Business Ethics, 16,
1297–1309.
Farmer, T. A. (1993). An examination of organizational commitment and professional commitment in
an auditing context. Journal of Managerial Issues (Winter), 503–516.
FECInfo (2001). Federal Election Commission: Final Report: U.S. Senate and House Campaigns.
Washington, DC: Federal Election Commission.
Fisher, D. G., & Sweeney, J. T. (2002). Morality vs. ideology: Implications for accounting ethics
research. Advances in Accounting Behavioral Research, 5, 141–160.
Fogerty, T. J. (1992). Organizational socialization in accounting firms: A theoretical framework and
agenda for future research. Accounting, Organizations and Society, 17, 129–149.
Fogerty, T. J. (1995). Questioning the assumed homogeneity of the behavioural environment of ac-
counting firms: Some exploratory empirical research. British Accounting Review, 27, 45–59.
Fogerty, T. J. (2000). Socialization and organizational outcomes in large public accounting firms.
Journal of Managerial Issues, 12(Spring), 12–33.
Gaffney, M. A., McEwen, R. A., & Welsh, M. J. (1993). Gender effects on commitment of public
accountants: A test of competing sociological models. Advances in Public Interest Accounting,
5, 45–73.
Goetz, J. F., Morrow, P. C., & McElroy, J. C. (1991). The effect of accounting firm size and member
rank on professionalism. Accounting, Organizations and Society, 16, 159–165.
Harrell, A., Chewning, E., & Taylor, M. (1986). Organizational-professional conflict and the job
satisfaction and the turnover intentions of internal auditors. Auditing: A Journal of Practice
and Theory, 5(Spring), 109–121.
Hogg, M. A., & Terry, D. J. (2000). Social identity and self-categorization processes in organizational
contexts. Academy of Management Review, 25(1), 121–140.
Hooks, K. L., & Cheramy, S. J. (1994). Facts and myths about women CPAs. Journal of Accountancy,
178(October), 79–86.
Hull, R. P., & Umansky, P. H. (1997). An examination of gender stereotyping as an explanation for
vertical job segregation in public accounting. Accounting, Organizations and Society, 22(6),
507–528.
Jeffery, C., & Weatherholt, N. (1996). Ethical development, professional commitment, and rule obser-
vance attitudes: A study of CPAs and corporate accountants. Behavioral Research in Accounting,
8, 8–31.
Joreskog, K., & Sorbom, D. (1984). LISREL – VI users guide (4th ed.). Mooresville, IN: Scientific
Software.
Kanter, R. (1977). Men and women of the corporation. New York: Basic Books.
Kohlberg, L. (1969). Stage and sequence: The cognitive developmental approach to socialization. In:
D. A. Goslin (Ed.), Handbook of Socialization Theory and Research (pp. 347–480). Chicago:
Rand McNally.
Lampe, J., & Finn, D. (1992). A model of auditors’ ethical decision process. Auditing: A Journal of
Practice & Theory (Suppl.), 1–21.
Larson, M. S. (1977). Rise of professionalism: A sociological analysis. Berkeley: University of California
Press.
Maupin, R. J. (1993). How can women’s lack of upward mobility in accounting organizations be
explained? Group and Organization Management, 18(June), 132–152.
Maupin, R. J., & Lehman, C. R. (1994). Talking heads: Stereotypes, status, sex-roles and satisfaction
of female and male auditors. Accounting, Organizations and Society, 19, 427–437.
Norris, D. R., & Niebuhr, R. E. (1983). Professionalism, organizational commitment and job satisfaction
in an accounting organization. Accounting, Organizations and Society, 9, 49–59.
Ponemon, L. A. (1992). Ethical reasoning and selection-socialization in accounting. Accounting,
Organizations and Society, 17, 239–258.
Ponemon, L. A., & Gabhart, D. (1993). Ethical reasoning in accounting and auditing. Vancouver,
Canada: Canadian General Accountants’ Research Foundation.
Porter, L. W., Steers, R. M., Mowday, R. T., & Boulian, P. V. (1974). Organizational commitment,
job satisfaction, and turnover among psychiatric technicians. Journal of Applied Psychology,
59(October), 603–609.
Pratt, J., & Beaulieu, P. (1992). Organizational culture in public accounting: Size, technology, rank,
and functional area. Accounting, Organizations and Society, 17, 667–684.
Rest, J. R. (1979). Development in judging moral issues. Minneapolis, MN: University of Minnesota
Press.
Rest, J. R. (1986). Moral development: Advances in research and theory. New York: Praeger.
Rest, J. R. (1993). Guide for the defining issues test. Version 1.3. Minneapolis, MN: University of
Minnesota.
Rest, J., Narvaez, D., Bebeau, M. J., & Thoma, S. J. (1999). Postconventional moral thinking: A
neo-Kohlbergian approach. New Jersey: Lawrence Erlbaum Associates.
Rest, J., Thoma, S. J., & Edwards, L. (1997). Designing and validating a measure of moral judgment:
Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1),
5–28.
Schroeder, R. G., & Imdieke, L. F. (1977). Local-cosmopolitan and bureaucratic perceptions in public
accounting firms. Accounting, Organizations and Society, 1, 39–45.
Shaub, M. (1994). An analysis of factors affecting the cognitive moral development of auditors and
auditing students. Journal of Accounting Education, 12, 1–26.
Shaub, M., Finn, D., & Munter, P. (1993). The effects of auditors’ ethical orientation on commitment
and ethical sensitivity. Behavioral Research in Accounting, 5, 145–169.
Siegal, P., Blank, M., & Rigsby, J. (1991). Socialization of the accounting professional: Evidence of the
effect of educational structure on subsequent auditor retention and advancement. Accounting,
Auditing and Accountability Journal, 4, 58–70.
Sorenson, J. E. (1967). Professional and bureaucratic organization in the public accounting firm. The
Accounting Review, 42(July), 553–565.
Sorenson, J. E., & Sorenson, T. C. (1974). The conflict of professionals in bureaucratic organizations.
Administrative Science Quarterly (March), 98–106.
Street, D. L., Schroeder, R. G., & Schwartz, B. (1993). The central life interests and organizational
professional commitment of men and women employed by public accounting firms. Advances
in Public Interest Accounting, 5, 201–229.
Sweeney, J. T. (1995). The moral expertise of auditors: An explanatory analysis. Research on
Accounting Ethics, 1, 213–234.
Sweeney, J. T., & Fisher, D. G. (1998). An examination of the validity of a new measure of moral
judgment. Behavioral Research in Accounting, 10, 138–158.
Sweeney, J. T., & Fisher, D. G. (1999). Politics, faking, and self-presentation: How valid is the P score
of the Defining Issues Test? Research on Accounting Ethics, 5, 51–75.
Tajfel, H., & Turner, J. C. (1985). The social identity theory of intergroup behavior. In: S. Worchel &
W. G. Austin (Eds), Psychology of Intergroup Relations (2nd ed., pp. 7–24). Chicago: Nelson-
Hall.
Watts, R. L., & Zimmerman, J. L. (1986). Positive accounting theory. Englewood Cliffs, NJ: Prentice-
Hall.
Wheeler, R., Felsig, R. M., & Reilly, T. (1987). Large or small CPA firms: A practitioner’s perspective.
CPA Journal (April), 29–33.
AN ANALYSIS OF GROUP
INFLUENCES ON GOING
CONCERN AUDITOR JUDGMENTS
Sunita S. Ahlawat and Timothy J. Fogarty
ABSTRACT
Studies showing that the processing of audit evidence produces judgment bias may
reflect their exclusive focus on individual decision-making.
Building on work that suggests important differences between individual
and group decision-making, this paper evaluates the decision-making attributes
of audit groups. Experienced auditors from offices of Big-Five firms in the
U.S. served as the participants in an experiment involving the going concern
judgment. Results show that recency does affect the judgments of individual
auditors but disappears as an important effect when groups make judgments.
Group responses are less extreme and exhibit greater confidence than those
of individuals.
© 2003 Published by Elsevier Ltd.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06002-2
INTRODUCTION
The descriptive theory of belief updating proposed by Hogarth and Einhorn (1992)
posits that the order in which evidence is received has a significant and predictable
influence on a person’s final judgment. Most of the attention generated by this
discovery has focused on recency effects. Recency refers to the tendency to
place greater weight on evidence received later in a sequence, which can lead to
over-reliance on the information presented last. A number of experimental
studies utilizing various conditions suggest that significant recency effects exist in
accountants’ and auditors’ belief revisions (e.g. Asare, 1992; Ashton & Ashton,
1988; Dillard et al., 1991; Pei et al., 1992; Trotman & Wright, 1996; Tubbs
et al., 1990). However, recent research has questioned the prevalence of recency
in auditing. Cushing and Ahlawat (1996) suggested that such effects may not be
common in audit practice. Other studies also have produced evidence that recency
effects do not always occur, or occur only under certain circumstances (Kennedy,
1993; Messier & Tubbs, 1994; Trotman & Wright, 1996).
This paper builds on the growing recognition that contextual factors (e.g.
accountability, cognitive involvement, experience, and task realism) might
mitigate bias in audit judgment. Another potential factor is group
influence. Many auditing situations involve either formal or informal group
consultation (Gibbins & Emby, 1985). For example, a team of audit staff and
seniors typically conducts audit fieldwork. The group expands as managers and
partners review this work prior to the issuance of an audit report. However,
research showing that cognitive heuristics and biases in auditors’ judgments
can lead to different outcomes, including different types of audit reports (e.g.
Asare, 1992), has developed with little consideration of group influences.
This research investigates the potential for group processes to overcome
weaknesses in accountants’ judgment. In addition to the recency bias, this paper
also examines two related attributes, decision confidence and belief revision,
that may differ between audit groups and individual auditors. This research finds
fundamental differences between groups and individuals in their exposure to
recency effects, the nature of their belief revision processes, and their confidence in
decisions. The remainder of the paper has four sections. The first reviews the
literatures on group decision-making and judgment biases as a prelude
to stating the research hypotheses. The second describes the empirical study.
The last two sections present the results and discuss their implications and
limitations.
LITERATURE REVIEW AND RESEARCH HYPOTHESES

Groups and Group Decision-Making
The unique condition of the group in business settings has been studied for some
time. Early studies measured the impact of social cues and interpersonal opinions
on performance and cognitive investment (Weiss & Shaw, 1979; White et al.,
1977). As this area matured, interactive effects between group conditions and
individual attributes were recognized (e.g. Vance & Biddle, 1985). Apart from
these more generic aspects, groups also were found to influence decision-making.
Although individuals come to the group with some degree of pre-discussion
preferences and unique decision-relevant information that continue to influence
group decisions (Winquist & Larson, 1998), the group resists reduction to the
sum of its members. Groups are believed to produce substantively different
decisions than individuals (Hill, 1982; Miner, 1984). The improved accuracy of
groups that has been reported in many areas may be attributable not only to the
increased perspectives contributed by members, but also to heightened caution,
because consensus processes tend to eschew extreme solutions (Myers & Lamm,
1976). Although the balance of evidence suggests net gains for group decisions
over those of individuals, a full explanation of their origin remains elusive. The
extent to which groups are effective at reducing the random error associated with
individual choice may depend on how effectively feedback can be incorporated.
Group advantages may also center on the reduction of individual
variability. However, the importance of these conditions varies with the context
of the decision.
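The variability argument can be illustrated with a small simulation. The sketch assumes, purely for illustration, that each auditor’s likelihood judgment equals a hypothetical true value plus independent random error, and it treats a “group” as the simple mean of three such judgments. Real groups, including those in this study, reach a consensus through discussion rather than by averaging, so this shows only the statistical part of the argument; all parameter values and function names are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

TRUE_LIKELIHOOD = 60.0  # hypothetical "correct" likelihood on the 0-100 scale
NOISE_SD = 10.0         # hypothetical SD of each auditor's random error
GROUP_SIZE = 3          # three-person units, as in this study
TRIALS = 10_000


def individual_judgment() -> float:
    """One auditor's judgment: the true value plus independent random error."""
    return random.gauss(TRUE_LIKELIHOOD, NOISE_SD)


def pooled_judgment(size: int) -> float:
    """Simple mean of `size` independent judgments. This is a statistical
    stand-in for a group; real groups deliberate rather than average."""
    return statistics.fmean([individual_judgment() for _ in range(size)])


individuals = [individual_judgment() for _ in range(TRIALS)]
pooled = [pooled_judgment(GROUP_SIZE) for _ in range(TRIALS)]

individual_sd = statistics.stdev(individuals)
pooled_sd = statistics.stdev(pooled)
expected_sd = NOISE_SD / GROUP_SIZE ** 0.5  # theory for independent errors

print(f"individual SD: {individual_sd:.2f}")
print(f"pooled SD:     {pooled_sd:.2f} (independent-error theory: {expected_sd:.2f})")
```

With independent errors the standard deviation of a three-judgment mean should be close to 1/√3 of the individual value (roughly 5.8 versus 10 here). If members’ errors are correlated, for example because of shared training, the reduction is smaller.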
Group Decision-Making in Accounting and Auditing
Solomon’s (1987) review of the literature on multi-auditor decision-making has
not resulted in a critical mass of work on audit groups. Notwithstanding the paucity
of academic treatments, the audit process resolutely remains the result of group
deliberations. Evidence gathered by auditors continues to reflect team processes.
Work done by staff members still requires a consensus distillation of conclusions.
Work reviewed by supervisors, and then by partners, indicates a group orientation
toward the work.1 The computerization of the audit may have changed the medium
for group interaction but it has not altered the necessity for a meeting of the minds
by auditors.
A going concern decision involves aspects of both individual and multi-person
decision-making. The decision is based on many pieces of evidence that may have
been gathered and initially reviewed by selected individuals. Because the going
concern decision is critical, it is unlikely to be made by an individual
without extensive consultation with the audit team and other audit firm members.
While the decision itself is likely to be made by a group, individual opinions
also are important since pivotally-situated individuals (managers, seniors, and
staff), who themselves have weighed the evidence, make recommendations
and suggestions. Consultation with other auditors prior to important decisions
(such as going concern) conforms to the requisites of professional auditing
standards (Reckers & Schultz, 1993). If group judgments are significantly
different from individual judgments, the practical implications of going concern
studies involving only individual judgment may be somewhat limited.
The going concern judgment has been characterized as a series of belief
revisions, where each revision is the weighted average of the previous judgment
and the value of the current evidence (Asare, 1992; Cushing & Ahlawat, 1996).
The final revised belief is then compared to the threshold for substantial doubt
to determine whether an unqualified opinion can be issued (Asare, 1992). Thus, unlike most other audit
decisions, the going concern matter goes right to the “bottom line” for both clients
and auditors.
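The weighted-average revision just described can be made concrete with a short sketch. It is an illustration under stated assumptions, not the procedure of Asare (1992), Cushing and Ahlawat (1996), or this study: the constant adjustment weight (0.3), the initial anchor (60), the substantial-doubt threshold (50), and the values assigned to mitigating (80) and contrary (30) evidence are all hypothetical, as are the function names. A constant weight is also a simplification of Hogarth and Einhorn’s (1992) model, in which the adjustment weight can depend on the current belief and on whether the evidence is positive or negative.

```python
from __future__ import annotations


def revise(prior: float, evidence_value: float, weight: float) -> float:
    """New belief = weighted average of the prior belief and the value of
    the current evidence item, both on the 0-100 likelihood scale."""
    return (1.0 - weight) * prior + weight * evidence_value


def final_belief(anchor: float, evidence: list[float], weight: float = 0.3) -> float:
    """Apply one revision per evidence item, in the order received.
    The constant weight of 0.3 is a hypothetical choice."""
    belief = anchor
    for value in evidence:
        belief = revise(belief, value, weight)
    return belief


def recommend_report(belief: float, threshold: float) -> str:
    """Recommend a modified report when the final likelihood of continued
    existence falls below the substantial-doubt threshold."""
    return "Modified" if belief < threshold else "Unqualified"


# Hypothetical inputs (not study data): anchor, threshold, and the value
# each evidence type is assumed to signal on the 0-100 scale.
ANCHOR, THRESHOLD = 60.0, 50.0
MITIGATING, CONTRARY = 80.0, 30.0

mmmccc = final_belief(ANCHOR, [MITIGATING] * 3 + [CONTRARY] * 3)
cccmmm = final_belief(ANCHOR, [CONTRARY] * 3 + [MITIGATING] * 3)

for label, belief in (("MMMCCC", mmmccc), ("CCCMMM", cccmmm)):
    print(f"{label}: {belief:.1f} -> {recommend_report(belief, THRESHOLD)}")
```

With these inputs the MMMCCC order ends at about 44.8, below the threshold, while the CCCMMM order ends at about 66.4, above it. Because each revision scales the prior belief, and with it all earlier evidence, by 0.7, the last items dominate; this is one simple way a recency effect can change the recommended report.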
The evaluation of going concern status is regarded as critical, difficult, and
complex by most partners (Chow et al., 1987). This necessitates some consideration
of how such a decision is made. Cushing and Ahlawat (1996) asserted that in
order to effectively revise beliefs the auditor must: (1) read and comprehend all
information cues provided; (2) adequately recall relevant information provided
in prior stages of the sequential task; (3) give sufficient attention to relevant prior
and new information at each stage; (4) effectively relate all of this information to
his or her existing knowledge structure; and (5) develop a problem representation
sufficient to complete the task effectively. Failure to carry out any one or more of
these activities could contribute to the recency effect (Cushing & Ahlawat, 1996).
However, these requisites also imply that recency can be reduced with greater
effort or attention. A number of studies have encouraged active involvement in the
above activities. These include studies that examined the effects of accountability
(Kennedy, 1993; Tetlock, 1983), documentation (Cushing & Ahlawat, 1996),
explanation (Anderson & Sechler, 1986), and commitment (Church, 1991).
Tetlock (1983) and Kennedy (1993) reported that judgments were less prone to
order effects when participants were told that they may subsequently have to justify
or explain their conclusions to others, such as their superiors. Apparently, the mere
prospect of accountability was sufficient to produce more desirable information
processing. In contrast, Cushing and Ahlawat (1996) and Church (1991) required
participants to prepare a memorandum documenting the rationale for an audit
decision. Similar results of reduced recency were reported. A common objective
underlying these manipulations was to produce greater cognitive involvement and
effort among participants.
Although many decisions in the audit process are important, few match the
consequences of the going concern decision. Accordingly, the audit firm would
like to be highly confident that it has made the correct decision. The measurement
of endogenous and exogenous levels of confidence has been part of the study of
groups for some time (e.g. Zarnoth & Sniezek, 1997). In situations that lack clear
correct answers, such as the going concern area, confidence and accuracy are not
redundant (Luus & Wells, 1994). The formation of groups to make decisions may
be a means to increase confidence levels. However, at this point it is unclear whether
audit groups are more confident about such decisions than individuals
making the same decision.
In sum, group dynamics may provide opportunities for more complete problem
analysis of the going concern decision. The group process may be another form of
cognitive investment that people put into a decision. Focusing primarily upon the
tendency towards recency effects in such an environment allows us to evaluate the
impact of the group. However, other group differences should also be examined to obtain a
broader picture of how groups compare to individuals in the auditing context.
Hypotheses
The studies discussed above suggest that the tools that enhance cognitive involvement
can mitigate order effects. Group decision-making can serve to enhance
effort and involvement. Group assistance can also be useful in lessening task
demands. Groups have collective experience to draw from, whereas individuals
work alone. Studies in social psychology have found that livelier interaction
among group members was associated with superior performance (e.g. Valacich
& Schwenk, 1995). Interacting groups also reduced belief perseverance (Wright
et al., 1990). These findings suggest that the interaction process itself may have
a positive effect on judgment.
Two aspects of group process could contribute to superior performance. The
group tends to broaden the information set that is brought to bear upon a choice
(Stasser, 1992). This information set includes perspectives on what factual data
means and what limitations it possesses. Group processes also reduce individual
inconsistency or extremity (Schultz & Reckers, 1981). As information exchange
between members occurs, group interaction becomes a “corrective function” when
individual members have initially incomplete or biased information (Stasser &
Titus, 1985) and are encouraged to alter opinions in order to reach a collective
judgment (Stasser & Davis, 1981).
The complexities of some audits make group processes even more salient.
Auditors are aware of the importance of group work and the need to share and
integrate expertise (Schultz & Reckers, 1981). The audit requires considerable
knowledge about industries and competitive factors in order to ascertain the
consequences of account balance fluctuations. Fisher and Ellis (1990) suggested that
social pressures created by the group interaction process would moderate extreme
or divergent views held by group members as they work to accommodate each
other’s views. In an audit setting, groups may be useful in preventing anecdotal
experience about certain business conditions from being overly generalized.
Groups may collectively recognize patterns and relationships that individuals
working alone may not. Group discussion can lead to a more complete problem
analysis resulting in improved judgment quality. Judgment quality is a function
of capacity, effort, internal data (i.e. memory and knowledge), and external data
access (Kennedy, 1993). Because of potential pooling of resources, correction of
errors, and use of qualitatively different learning strategies, process gains from
interaction are possible (Hill, 1982). Estimation biases observed with individual
judgments can be reduced considerably through group interaction (Sniezek &
Henry, 1989). Similarly, group interaction also may mitigate recency due to some
combination of enhanced capability, experience, and cognitive involvement.2
This study specifically examines the consequences of group interaction, addressing a
limitation of prior research that has studied individuals performing tasks that
are more likely to be performed by groups. By holding the information available to
decision-makers more constant than it would be in an actual audit, this study
focuses on the judgment process. Since groups can increase cognitive
effort, reduce complexity, and capitalize on experience, they should exhibit less
recency bias than individuals. Recency may not be a serious problem in practice
if audit groups are less susceptible to the order in which evidence is presented.
Based on the preceding discussion, the following hypothesis is tested:
H1. Audit groups will exhibit weaker recency effects in their going concern
judgments than will individual auditors.
Over the last fifteen years, many researchers have recognized that individual
confidence is an important dimension of group interaction. Unlike accuracy,
confidence can be made explicit at the time judgments are made. Therefore,
confidence is a key indicator of the extent to which uncertainty is perceived to be
inherent in a task. High levels of uncertainty suggest that a decision is unusually
sensitive to differences in judgment. This condition may make judgment biases
more consequential to the decision. Moreover, confidence and accuracy can be
affected by different factors (Luus & Wells, 1994).
Sniezek and Henry (1989, 1990) postulate a two-stage process for groups
to reach a consensus judgment: (1) the revision process; and (2) the weighting
process. At the revision stage, individual judgments are voluntarily revised in
light of information exchanged during interaction. At the weighting stage, group
members use some implicit or explicit rule to combine divergent views and
negotiate their individual judgments to form a single group judgment. This process
is sufficiently engaging and explicit that when group members adopt a single
group judgment, they may have higher confidence in that group judgment than
they would have had in their own individual judgment (Sniezek & Henry, 1990).
After the weighting process, group members should express higher confidence
about their decision because it takes into account a wide set of perspectives on
importance. Lower confidence would be inconsistent with the social pressures
that support the participatory consensus formation around the group’s choice. As
such, the group interaction process may lead to higher group confidence compared
to the individual members’ pre-group confidence (Sniezek & Henry, 1989, 1990).
The greater confidence may also reflect individuals’ recognition that groups can
potentially recognize, evaluate, and process more information than individuals.3
In an accounting study, Bloomfield et al. (1996) showed that interaction that
inspired group confidence contributed to group performance. In a different vein,
Allwood and Granhag (1996) found that groups inspired not only confidence,
but also realistic confidence.
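The two-stage process attributed above to Sniezek and Henry (1989, 1990) can be sketched as follows. The text above does not specify a particular revision or weighting rule, so both rules here are assumptions: in the revision stage each member moves halfway toward the mean of the other members’ views, and in the weighting stage the group adopts the median of the revised views. The member judgments and function names are hypothetical, and the sketch models only the judgment itself, not confidence.

```python
from __future__ import annotations

import statistics


def revision_stage(judgments: list[float], rate: float = 0.5) -> list[float]:
    """Stage 1 (revision): after exchanging information, each member moves a
    fraction `rate` of the way toward the mean of the other members' views.
    The linear rule and the default rate are hypothetical."""
    revised = []
    for i, own in enumerate(judgments):
        others = judgments[:i] + judgments[i + 1:]
        revised.append(own + rate * (statistics.fmean(others) - own))
    return revised


def weighting_stage(judgments: list[float]) -> float:
    """Stage 2 (weighting): combine the revised views with one explicit rule.
    The median is an assumed rule; real groups may use implicit rules."""
    return statistics.median(judgments)


# Hypothetical pre-discussion likelihood judgments of a three-person group.
members = [35.0, 55.0, 80.0]
revised = revision_stage(members)
group_judgment = weighting_stage(revised)

print("revised:", revised)
print("group judgment:", group_judgment)
print("spread before/after:", max(members) - min(members), max(revised) - min(revised))
```

With these inputs the revision stage narrows the spread of views from 45 to 11.25 points and the median rule yields a group judgment of 56.25, a small-scale picture of how interaction can moderate extreme or divergent views.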
The level of confidence is particularly important for the going concern decision
made by auditors. The evaluation of business survival is inherently oriented toward
the future and therefore is more uncertain than most auditing decisions. Since the
going concern decision has distinct adverse consequences for the client, high levels
of confidence are called for to withstand the client resistance that is likely to result.
Accordingly, the following hypothesis will be considered:
H2. Audit groups will exhibit greater confidence than individual auditors about
going concern decisions.
Research over the last thirty years has identified many reasons to depart from
the belief that the direction of influence in decision-making is symmetrical.
Human beings are not bound to strict mathematical consistency when dealing
with information that points to one conclusion relative to information that leads
to an opposite result. Pivoting around a baseline (zero), positive and
negative cues of equal magnitude have often been shown to be processed in
qualitatively different ways. However, the reasons that individuals are influenced
by these frames of reference are imperfectly understood (Newman, 1980).
If group-based reasoning is capable of integrating more information and wider
perspectives, it also may be capable of altering the tendency to treat categories of
cues in ways that are inconsistent with Bayesian logic. The more varied experiences
available to the group as input to their decision may work against the tendency
to overweight the negative or the positive. If framing effects are psychological in
nature, forcing them into open discussion may have the effect of exposing their
inconsistency. In other words, there may be more balance in how groups react to
positive and negative types of information than there would be in how individuals
react to that same information.
Auditing has been described as the attempt to confirm a series of interrelated
hypotheses about the client’s accounting records (Church & Schneider, 1993).
Evidence that the accounts are correct as stated therefore can be logically
opposed to evidence of material error. The going concern decision appears to be
a special case of this bifurcation of evidence, here contrasting pro-survival and
anti-survival implications for the business entity. Whereas the former militates
against a going concern problem, the latter type tends to confirm such a problem.
How individual auditors weigh these two types of evidence may not be the same
as how groups weigh them. Individual auditors may be less able to
recognize that they are acting in a way that systematically overweights either
positive or negative information. Groups may therefore be less likely to overreact
to either good news about audit client viability or bad news about doubtful
continuation. The following two-part hypothesis pinpoints the possible
differences:
H3a. Audit groups will revise their beliefs about going concern in response to
confirmatory going concern evidence less than individual auditors.
H3b. Audit groups will revise their beliefs about going concern in response to
mitigating going concern evidence less than individual auditors.
In sum, four specific effects that differentiate groups and individuals are expected.
Groups should be less influenced by the order of the evidence that they consider
in a going concern decision context. They should also exhibit higher levels of
confidence about the accuracy of their determinations. Groups are expected
to be more temperate in their reactions to incremental positive and negative
information. Together, the hypotheses suggest that groups will make less biased and
more confident going concern decisions.
THE EXPERIMENT
An experiment was designed to test the hypotheses in a context where auditors
are asked to evaluate a client’s ability to continue as a going concern. This type
of context has been employed frequently in prior studies of recency effects in
audit judgment. The specific task in the experiment involves making a series of
judgments about a firm’s going-concern status and a recommendation about the
type of audit report to issue.
The experiment was conducted in the offices of the participating international
public accounting firm over a four-week period. In each office, arrangements were
made for subjects to participate as individuals or as members of three-person
groups. Judgments were made privately by individuals or collaboratively in
groups. Although the assignment of participants to conditions was random, group
composition was subject to member availability at the pre-established time for
the exercise.4 The only qualifying stipulation was that participants were primarily
engaged in the auditing activities of the firm and that they had at least two years
of experience. A researcher distributed and collected all materials in person. For
groups, the researcher was present outside the meeting room for the duration of
the deliberations. Individuals completed the task in their offices, without the
researcher physically present.
Task and Procedure
Each participant was provided with case material. Although each member of the
group was given a copy of the case, groups were instructed to respond collectively
on a single response sheet. Group members were encouraged to discuss the case
prior to reaching a consensus. Each group designated one member to record the
group response.
A cover letter accompanying the case materials suggested that the task should
take about 60 minutes to complete. Whereas letters to groups emphasized the
importance of working collectively, letters to individuals stressed the need for
independent work. Both types of letters asked participants to proceed through the
materials in one sitting. All participants were guaranteed anonymity, assured that
there were no right or wrong answers, and told that most of the questions dealt
with matters of professional judgment.
Participants were asked to read the case assuming that they were performing a
review of preliminary results from the current year’s audit engagement. The case
was previewed for realism and relevance by audit professionals other than the
participants and was revised in accordance with their suggestions.
The experimental materials consisted of a set of instructions and a case
booklet. The case booklet contained background and financial information for a
hypothetical client. The background information included a detailed description
of the industry and the company, its operations, economic environment, and the type
of audit opinion it had received in the last two years. The financial information
comprised audited financial statements for the past three years and the current
year. This information included the balance sheet, income statement, selected
financial ratios, footnotes, statement of changes in financial position, and schedule
of working capital changes. The experimental materials were designed to create
a case in which the audit decision was not an obvious unqualified or modified
(going concern) opinion.
Figure 1 depicts the sequence of procedures required of the auditors for the
experiment. The case consisted of four tasks. Participants were asked to complete
each task in the order given to capture belief revision. They were instructed to
return the task to the appropriately labeled envelope, and to seal the envelope at
the end of each task.
Fig. 1. Procedure for the Experiment.
In Task 1, participants were first asked to provide their general threshold level
for substantial doubt, such that a modified audit opinion would be recommended
for any entity whose likelihood of continued existence fell below the threshold
level. This established, in quantified terms, participants’ baseline threshold for
substantial doubt before they considered the hypothetical client in particular.
Group members had to agree to a single baseline. The scale used for pinpointing
participants’ threshold levels ranged from 0 to 100, with endpoints labeled
“certain not to continue” (0) and “certain to continue” (100).
Participants then dealt with case-specific questions. They were asked to: (1)
assess the likelihood of the client’s continued existence through the end of the
current fiscal year; (2) recommend the type of audit report to be issued; and (3)
indicate their confidence in the audit report recommended. The same 0–100 scale, with
end points labeled “certain not to continue” (0) and “certain to continue” (100),
was used for this and each subsequent likelihood judgment. A similar 0–100 scale
with end points labeled “not confident at all” (0) and “very confident” (100) was
used to elicit participants’ confidence level. The audit report categories were
Unqualified, Modified, and Disclaimer. Under U.S. auditing standards, the modified
opinion would be appropriate if there were substantial doubt about the entity’s
continuation (AU 341, AICPA, 1990). At this point, participants did not know that
they would receive additional information or have an opportunity to revise their
previous judgments. In addition to familiarizing the participants with the client’s
overall operations and financial conditions, Task 1 allowed them to set their own
decisional anchor points.
Task 2 of the case sequentially presented six additional pieces of evidence.
Three of the evidence items were classified as “Contrary” with regard to the going
concern status of the hypothetical company. Contrary information is defined as
any evidence or issue that raises doubts about the entity’s ability to continue in
existence. Specifically, the contrary items related to: (1) the upcoming expiration
of a patent that had consistently generated approximately 25% of total sales;
(2) the departure of one of the company’s key sales executives; and (3) the
non-renewal of the company’s line of credit. The other three evidentiary items
could be considered “Mitigating” in nature, since they might quell traditional
auditor going concern doubts. The mitigating factors were: (1) the receipt of a
favorable marketing research report on a new product line; (2) the successful
deferment of an account payable over a three-year period; and (3) a successfully
concluded contract negotiation with an employee labor union. Following the
presentation of each of these pieces of evidence, participants were asked to
provide a revised assessment of the likelihood that the client would continue in
existence through the end of the current fiscal year. After providing the last of these
assessments, participants were again asked to recommend the type of audit report
to be issued and to indicate their confidence in the appropriateness of that report.
The six items were presented in two orders. In the condition labeled MMMCCC
on Fig. 1, the three mitigating factors (MMM) were presented first, followed by the
three pieces of contrary information (CCC). The order of evidence was reversed
in the second condition, labeled CCCMMM. The variation in the order of cues
was the recency manipulation. Each of these items was presented on a new page
contained in an envelope. Participants were asked to complete and seal a new
assessment, on the 0–100 scale, of the hypothetical company’s continuation as a going concern
before examining the next item of evidence. After the last piece of evidence was
revealed, participants were again asked about their confidence in the opinion
type they recommended, with a question identical to that used in Task 1.
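The procedure above yields, for each decision unit, an initial Task 1 likelihood followed by six sequential assessments. One possible way to summarize these data for H3a and H3b is sketched below: it computes the mean absolute change in likelihood following contrary (C) and mitigating (M) items. This measure, the function name, and the sample assessments are hypothetical illustrations, not the analysis or data reported in this study.

```python
from __future__ import annotations

import statistics


def revisions_by_type(order: str, assessments: list[float]) -> dict[str, float]:
    """Mean absolute revision following each evidence type.

    `order` is the presentation order, e.g. "MMMCCC" or "CCCMMM";
    `assessments` holds the Task 1 likelihood followed by one assessment
    per evidence item (so it must be one element longer than `order`).
    """
    if len(assessments) != len(order) + 1:
        raise ValueError("need one initial assessment plus one per evidence item")
    changes: dict[str, list[float]] = {"C": [], "M": []}
    # Pair each item with the assessment before and after it.
    for item, before, after in zip(order, assessments, assessments[1:]):
        changes[item].append(abs(after - before))
    return {kind: statistics.fmean(values) for kind, values in changes.items()}


# Illustrative (made-up) sequence for one unit in the MMMCCC condition.
result = revisions_by_type("MMMCCC", [60, 65, 70, 72, 60, 50, 45])
print(result)
```

For the illustrative sequence, the mean revision is 9.0 points after contrary items and 4.0 points after mitigating items. Comparing such averages between groups and individuals, separately for C and M items, would correspond to H3a and H3b respectively.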
Task 3 of the case required all participants to complete a questionnaire regarding their background and auditing experience. Since these questions concerned individual attributes, all participants, including those who had worked in groups for Tasks 1 and 2, were asked to work alone on Task 3.
Task 4 obtained data for a manipulation check. Nine pieces of evidence (including the six items presented in the experiment) were used to check respondents’ perceptions. Respondents were asked to classify these nine items as contrary, mitigating, or neither in relation to the going concern question. Participants who had worked in groups for Tasks 1 and 2 also performed this task collectively, in keeping with the intent to study the difference between groups and individuals.5
Participants
Ninety-one auditors from a Big-Five CPA firm participated in the experiment. Of
the 91 auditors, 49 were managers, and 42 were seniors. There were 21 groups,
each consisting of one manager and two seniors. The 28 people who worked as
individuals were all managers. This design feature was motivated by a desire to
have at least one experienced individual in each decision-making unit.6 Table 1
presents auditor experience by rank and treatment conditions.
On average, managers had 8.45 years of experience (range 5–15 years) while
seniors had 3.26 years of experience (range 2–5 years). The sample of individuals
had, on average, more experience (7.93 years) than auditors in the group condition (5.24 years). However, the managers who served as group members had more experience (9.19 years) than the individuals. Information on the extent to which group members had previously worked with each other on actual audit engagements was not available.
Table 1. Descriptive Information: Average Auditor Experience by Rank and Treatment Condition (Standard Deviation in Parentheses).

| Audit experience | Rank | Group, CCCMMM (a) | Group, MMMCCC (a) | Individual, CCCMMM (a) | Individual, MMMCCC (a) |
| Experience (in years) | Manager | 9.09 (1.38) | 9.30 (3.09) | 8.23 (1.09) | 7.67 (1.40) |
| | Senior | 3.36 (0.85) | 3.15 (0.99) | – | – |
| No. of audits since working as an auditor | Manager | 106.4 (73.99) | 139.5 (122.91) | 102.3 (69.63) | 105.67 (70.12) |
| | Senior | 22.3 (15.13) | 26.4 (17.27) | – | – |
| No. of audits in which an opinion other than unqualified was issued | Manager | 3.27 (1.79) | 3.60 (3.89) | 0.38 (2.87) | 2.47 (1.99) |
| | Senior (b) | 0.68 (0.99) | 1.15 (1.27) | – | – |

(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.
(b) 7 of 49 managers and 21 of 42 seniors indicated they had not been on any engagements in which the going concern opinion was in fact issued.
Most participants indicated that, as members of audit teams, they had been
involved in engagements in which an opinion other than unqualified was either
seriously considered (81 of 91), or actually issued (63 of 91). This suggests that
participants were familiar with non-standard audit reports in the “real world” of
audit practice. Of the 63 who had been on audits in which a going concern opinion
was issued, 42 were managers and 21 were seniors.
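As an arithmetic check on the experience figures reported above, the pooled averages can be recovered by weighting the Table 1 cell means by cell size. The Python sketch below assumes the cell sizes implied by the design (11 and 10 groups, each with one manager and two seniors, and 13 and 15 individual managers); the helper name `pooled_mean` is illustrative only.

```python
def pooled_mean(cells):
    """Weighted mean of cell means; cells is a list of (cell_mean, cell_size) pairs."""
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n

# Years of experience from Table 1 as (cell mean, number of auditors in the cell).
group_managers = [(9.09, 11), (9.30, 10)]       # one manager per group: CCCMMM, MMMCCC
group_seniors = [(3.36, 22), (3.15, 20)]        # two seniors per group: CCCMMM, MMMCCC
individual_managers = [(8.23, 13), (7.67, 15)]  # CCCMMM, MMMCCC

print(f"group managers:      {pooled_mean(group_managers):.2f}")
print(f"individual managers: {pooled_mean(individual_managers):.2f}")
print(f"all group members:   {pooled_mean(group_managers + group_seniors):.2f}")
print(f"all seniors:         {pooled_mean(group_seniors):.2f}")
```

The weighted means agree with the 9.19, 7.93, 5.24, and 3.26 years reported in the text.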
Experiment Design
Participants were assigned to one of four experimental conditions according to a 2 (decision unit) × 2 (order of evidence) design. Thus, the four treatment conditions for the first hypothesis were: Individual, CCCMMM; Individual, MMMCCC; Group, CCCMMM; and Group, MMMCCC. The dependent variable for the first hypothesis (H1) and the third hypothesis (H3) was the change in the assessed likelihood of the client’s continued existence. The change was measured between the assessment made after the initial review of the case in Task 1 (labeled $J_0$) and the assessment made after the review of all six additional items of evidence in Task 2 (labeled $J_6$). Thus, for H1, belief revision was computed as the percent change between the revised and initial likelihood judgments, $(J_6 - J_0)/J_0$; for H3, it was computed sequentially for the separate informational cues.7 Confidence assessments elicited from the participants were used to analyze H2. These were obtained after the initial (labeled $C_i$) and final (labeled $C_f$) recommendations of the type of audit report to be issued, in Tasks 1 and 2, respectively.
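The two belief-revision measures described above can be sketched as follows, assuming each decision unit’s seven likelihood judgments are stored in order as [J0, J1, ..., J6] on the 0–100 scale. The function names are hypothetical, and the example applies them to the Table 2 cell means for individuals in the MMMCCC condition purely for illustration; because that is a ratio of cell means rather than the mean of each unit’s own ratio, it need not equal the corresponding entry in Table 3.

```python
def cumulative_revision(judgments):
    """Percent-change belief revision (J6 - J0) / J0, the H1 dependent variable."""
    j0, j6 = judgments[0], judgments[-1]
    return (j6 - j0) / j0

def stepwise_revisions(judgments):
    """Change after each successive evidence item, J_k - J_(k-1), for k = 1..6."""
    return [later - earlier for earlier, later in zip(judgments, judgments[1:])]

# Table 2 cell means for individuals in the MMMCCC condition (illustration only).
individual_mmmccc = [70.67, 76.53, 76.73, 81.00, 60.00, 41.80, 34.27]

print(round(cumulative_revision(individual_mmmccc), 3))              # -0.515
print([round(d, 2) for d in stepwise_revisions(individual_mmmccc)])  # [5.86, 0.2, 4.27, -21.0, -18.2, -7.53]
```

The six stepwise changes sum to $J_6 - J_0$ (here, −36.40 points), so the sequential measure used for H3 decomposes the cumulative change used for H1.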
RESULTS
Descriptive Results
The results of the manipulation check in Task 4 were very satisfactory. Participants overwhelmingly classified the items in the expected direction. Only 3 (1.02%) of the 294 possible cases (6 items each from 28 individuals and 21 groups) were misclassified. Despite these few misclassifications, participants always revised their probability assessments during Task 2 in the expected direction (downward in response to contrary information and upward in response to mitigating factors).
The average likelihood judgments ($J_0$ through $J_6$) are reported in Table 2. The average initial judgment ($J_0$) of individuals (68.92 points) and groups (69.05 points) was not significantly different (p > 0.10). Table 2 shows how each subsequent informational unit altered the progressive going concern estimate in the predicted direction. The average downward belief revision for contrary information was 39.16 points, whereas the average upward belief revision for mitigating information was 15.82 points. This difference in magnitude is consistent with prior findings that auditors are particularly sensitive to disconfirming evidence (Ashton & Ashton, 1988; McMillan & White, 1993). The average downward revision for contrary information was smaller for groups (31 points) than for individuals (45 points). Similarly, the average upward revision for mitigating information was 11 points for groups and 19 points for individuals. Consistent with literature suggesting that groups temper extreme member positions, group responses in this audit context were less polarized than individual responses in both the positive and the negative directions.
Tests of Hypotheses
The first hypothesis specified that groups would exhibit weaker recency effects than individuals. A 2 (decision unit) × 2 (order) ANOVA was conducted with the percent change in cumulative belief revision, $(J_6 - J_0)/J_0$, as the dependent variable.
Table 2. Descriptive Information: Analysis of Belief Assessments by Treatment Conditions.
Mean (standard deviation) of initial ($J_0$) and revised ($J_1$ through $J_6$) likelihood assessments.

| Decision unit | Order (a) | $J_0$ | $J_1$ | $J_2$ | $J_3$ | $J_4$ | $J_5$ | $J_6$ |
| Group (N = 11) | CCCMMM | 69.54 (22.63) | 59.54 (24.54) | 47.72 (26.77) | 38.82 (27.76) | 42.73 (26.49) | 45.91 (25.18) | 51.36 (23.88) |
| Group (N = 10) | MMMCCC | 68.50 (16.67) | 71.50 (18.86) | 73.50 (15.47) | 78.50 (14.35) | 64.50 (17.55) | 54.20 (22.75) | 47.00 (21.24) |
| Individual (N = 13) | CCCMMM | 66.92 (21.27) | 41.92 (22.03) | 34.85 (16.62) | 23.46 (16.88) | 36.92 (16.40) | 52.07 (20.38) | 52.84 (18.76) |
| Individual (N = 15) | MMMCCC | 70.67 (14.12) | 76.53 (14.89) | 76.73 (13.23) | 81.00 (8.70) | 60.00 (8.45) | 41.80 (15.36) | 34.27 (14.28) |

(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.
Table 3. Tests of Hypothesis H1: Analysis of Variance: Order × Decision Unit with Belief Revision $(J_6 - J_0)/J_0$ as the Dependent Variable.

Panel A: Analysis of Variance
| Source | df | Mean Square | F | Sig. of F |
| Order | 1 | 0.464 | 9.085 | 0.004 |
| Decision unit | 1 | 0.033 | 0.654 | 0.423 |
| Order × Decision unit | 1 | 0.278 | 5.432 | 0.024 |
| Residual | 45 | 0.051 | | |

Panel B: Mean Belief Revision
| Order | Individual | Group |
| CCCMMM | −0.1921 | −0.2921 |
| MMMCCC | −0.5180 | −0.3133 |
| t | 4.13 | 0.20 |
| p | 0.000 | 0.847 |
The results are presented in Panel A of Table 3. The significance of the order variable (F = 9.085, p < 0.01) shows that recency effects are present in auditors’ going concern decisions. More importantly, however, the results reveal a significant interaction between order and decision unit (F = 5.43, p < 0.05). This result suggests that judgments were influenced not only by the order in which evidence was evaluated but also by whether judgments were made individually or in groups. The decision unit does not have a direct effect and is important only in altering the impact of order effects. This suggests that groups act as a “debiaser” by eliminating recency in auditor going concern judgments. H1 is supported.
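Because each effect in Table 3 has one numerator degree of freedom, its F statistic is the square of a t statistic with the residual degrees of freedom (45, i.e., 49 decision units minus 4 cells), and F itself is the effect mean square divided by the residual mean square (for example, 0.464 / 0.051 ≈ 9.1 for order). The standard-library Python sketch below recovers the reported significance levels from the reported F values by numerically integrating the t density; it is an illustrative check, not the authors’ procedure. Where SciPy is available, `scipy.stats.f.sf(F, 1, 45)` should give the same upper-tail probability directly. The decision-unit effect from Table 4 is included for comparison.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def f_upper_tail_1df(f_stat, df_resid, steps=2000):
    """P(F > f_stat) for F with (1, df_resid) degrees of freedom.

    With one numerator degree of freedom F = t**2, so the p-value equals the
    two-sided t p-value, 1 - 2 * (area under the t density on [0, sqrt(F)]).
    The area is approximated by composite Simpson's rule; steps must be even.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_density(0.0, df_resid) + t_density(t, df_resid)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_resid)
    return 1.0 - 2.0 * total * h / 3

RESIDUAL_DF = 45  # 49 decision units minus 4 cells
reported_f = {
    "Table 3, order": 9.085,
    "Table 3, decision unit": 0.654,
    "Table 3, order x decision unit": 5.432,
    "Table 4, decision unit": 4.149,
}
for source, f_stat in reported_f.items():
    p = f_upper_tail_1df(f_stat, RESIDUAL_DF)
    print(f"{source}: F(1, {RESIDUAL_DF}) = {f_stat}, p = {p:.3f}")
```

The computed probabilities agree with the reported significance levels (0.004, 0.423, 0.024, and 0.048) to about the third decimal place.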
Another test of recency among individual auditors shows that individuals in the MMMCCC condition made a greater average downward adjustment in their going concern likelihood judgments (from 70.67 to 34.27, a change of 36.40 points) than individuals in the CCCMMM condition (from 66.92 to 52.84, a change of 14.08 points). This difference in average belief revisions was significant (t = 3.96, p < 0.001). In contrast to the individual results, the likelihood judgments of audit groups exhibited no recency. Here, the average downward adjustment was 21.50 points (from 68.50 to 47.00) in the MMMCCC condition and an almost identical 18.18 points (from 69.54 to 51.36) in the CCCMMM condition. This difference was not significant (t = 0.47, p > 0.65).8 Hence, as expected, groups mitigated the recency effect. These results also support H1.
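The order comparison within each decision unit can also be expressed as a simple difference of mean adjustments taken from Table 2. The sketch below is descriptive only: it differences cell means and does not reproduce the t-tests or the ANOVA reported above; the dictionary and function names are illustrative.

```python
# Table 2 cell means: (J0, J6) on the 0-100 likelihood scale.
cell_means = {
    ("Individual", "MMMCCC"): (70.67, 34.27),
    ("Individual", "CCCMMM"): (66.92, 52.84),
    ("Group", "MMMCCC"): (68.50, 47.00),
    ("Group", "CCCMMM"): (69.54, 51.36),
}

def downward_adjustment(unit, order):
    """Mean decline in the likelihood judgment, J0 - J6, in points."""
    j0, j6 = cell_means[(unit, order)]
    return j0 - j6

def recency_gap(unit):
    """Extra decline when contrary evidence came last (MMMCCC) rather than first (CCCMMM)."""
    return downward_adjustment(unit, "MMMCCC") - downward_adjustment(unit, "CCCMMM")

individual_gap = recency_gap("Individual")  # 36.40 - 14.08 = 22.32 points
group_gap = recency_gap("Group")            # 21.50 - 18.18 = 3.32 points
print(f"individuals: {individual_gap:.2f}, groups: {group_gap:.2f}, "
      f"difference: {individual_gap - group_gap:.2f}")
```

A gap of roughly 22 points for individuals against roughly 3 points for groups is the pattern behind the significant order × decision-unit interaction.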
The second hypothesis asserted a relationship between decision unit and going concern judgment confidence. Specifically, audit groups were predicted to have greater confidence in their going concern decisions. For these purposes, decision confidence at the end of the case was used as the dependent variable. Final confidence is important because it reflects the processing of all the information in the case, whether by groups or by individuals. Table 4 presents an ANOVA testing the second hypothesis, with information order and decision unit included as possible effects on final confidence. The significance of decision unit (p < 0.05) suggests that groups have higher levels of confidence, consistent with H2.9 The failure of order effects, and of the interaction between order and decision unit, to reach significance suggests that only the structure of the decision-making unit influenced confidence.

Table 4. Tests of Hypothesis H2: Analysis of Variance: Order × Decision Unit with Final Confidence as the Dependent Variable.

Panel A: Analysis of Variance
| Source | df | Mean Square | F | Sig. of F |
| Order | 1 | 138.641 | 0.623 | 0.434 |
| Decision unit | 1 | 923.415 | 4.149 | 0.048 |
| Order × Decision unit | 1 | 0.025 | 0.000 | 0.992 |
| Residual | 45 | 222.587 | | |

Panel B: Average Confidence
| Decision unit | Initial | Final |
| Individual | 63.57 | 71.25 |
| Group | 75.57 | 80.23 |
| t | 2.27 | 2.25 |
| p | 0.028 | 0.029 |
Although H2 pertains to the existence of group differences, the change in confidence that occurred during the experiment was also considered. Groups exhibited significantly higher initial confidence than individuals (t = 2.27, p < 0.03). A 2 × 2 ANCOVA was conducted with final confidence as the dependent variable, initial confidence as the covariate, and decision unit and order as the independent variables. In results not shown, the initial confidence covariate was significant (p < 0.05). Neither of the two main effects nor their interaction was significant. This suggests that the difference in final confidence was driven by the initial differences rather than by differential processing of information. Nonetheless, groups maintained a significant confidence advantage over individuals throughout the entire process of belief revision. Groups begin more confident and stay that way as further information about relevant events is made known. However, groups do not become progressively and significantly more confident. The confidence difference appears to attach to the mere existence of the group rather than to its continued information-handling abilities.

Table 5. Tests of Hypothesis H3: Analysis of Responses to Contrary and Mitigating Information.

| | Individuals, mean (SD) | Groups, mean (SD) | t | p |
| Response to contrary information | 45.21 (13.44) | 31.09 (17.05) | 3.13 | 0.003 |
| Response to mitigating information | 19.18 (18.66) | 11.33 (15.19) | 1.57 | 0.122 |
The final hypothesis concerns differences between groups and individuals in the processing of contrary and mitigating information. In the test of H3, the six opportunities provided to participants to revise their probability beliefs were classified as contrary or mitigating. As shown in Table 5, there is a significant difference between individual and group responses to contrary information (t = 3.13, p < 0.01), with individuals reacting more severely. This is consistent with H3a. No significant differences exist between audit groups and individual auditors when presented with mitigating information (t = 1.57, p > 0.12). This does not support H3b.
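The responses in Table 5 appear to be net revisions over each three-item block. Assuming the response to an evidence type equals the absolute change across the block in which it was presented ($|J_3 - J_0|$ for the first block and $|J_6 - J_3|$ for the second), the Table 2 cell means, weighted by the number of decision units, reproduce Table 5 and the pooled 39.16 and 15.82 points reported earlier to within about 0.01 points of rounding. The sketch is a reconstruction under that assumption, not the authors’ code; the names are illustrative.

```python
# Table 2: (number of decision units, cell means J0..J6) for each condition.
table2 = {
    ("Group", "CCCMMM"): (11, [69.54, 59.54, 47.72, 38.82, 42.73, 45.91, 51.36]),
    ("Group", "MMMCCC"): (10, [68.50, 71.50, 73.50, 78.50, 64.50, 54.20, 47.00]),
    ("Individual", "CCCMMM"): (13, [66.92, 41.92, 34.85, 23.46, 36.92, 52.07, 52.84]),
    ("Individual", "MMMCCC"): (15, [70.67, 76.53, 76.73, 81.00, 60.00, 41.80, 34.27]),
}

def block_responses(order, j):
    """(contrary, mitigating) response, assuming each is the absolute net change
    over the three-item block in which that evidence type was presented."""
    first_block = abs(j[3] - j[0])
    second_block = abs(j[6] - j[3])
    if order == "CCCMMM":
        return first_block, second_block
    return second_block, first_block  # MMMCCC: the mitigating block came first

def pooled_responses(units):
    """N-weighted mean (contrary, mitigating) response across the selected decision units."""
    total_n = contrary = mitigating = 0.0
    for (unit, order), (n, judgments) in table2.items():
        if unit in units:
            c, m = block_responses(order, judgments)
            contrary += n * c
            mitigating += n * m
            total_n += n
    return contrary / total_n, mitigating / total_n

for label, units in [("Individuals", ("Individual",)), ("Groups", ("Group",)),
                     ("All", ("Individual", "Group"))]:
    c, m = pooled_responses(units)
    print(f"{label}: contrary {c:.2f}, mitigating {m:.2f}")
```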
Other Analyses
In Hypothesis H1, the dependent variable was the revision of the assessment of the
likelihood that the client firm will continue as a going concern. As Asare (1992)
points out, it is also important to learn whether the differences in audit judgments
induced by the recency effect are likely to lead to differences in substantive audit
decisions. Accordingly, an additional analysis was performed to examine whether
judgment differences were sufficient to influence the audit report decisions in this
particular case setting.
Table 6 reports the recommended audit opinion of participants in each of the four treatment conditions, both at the initial stage of the experiment (Task 1) and after all six additional items of information had been reviewed (the conclusion of Task 2). Since none of the groups or individuals selected a disclaimer of opinion at any point in the experiment, the audit opinion variable was binary. At the initial point, individuals are no more likely than groups to recommend a modified opinion (χ² = 0.92, p > 0.50). However, individuals show a stronger tendency to switch to a modified opinion during the course of the case. When final decisions are considered, individuals are more likely than groups to recommend a modified opinion (χ² = 5.029, p < 0.05).

Table 6. Additional Analyses: Audit Opinion by Treatment Conditions.

Panel A: Recommended Audit Opinion
| Unit | Order | N | Initial: Unqualified | Initial: Modified | Final: Unqualified | Final: Modified |
| Individual | CCCMMM | 13 | 8 | 5 | 4 | 9 |
| Individual | MMMCCC | 15 | 11 | 4 | 2 | 13 |
| Groups | CCCMMM | 11 | 9 | 2 | 6 | 5 |
| Groups | MMMCCC | 10 | 7 | 3 | 4 | 6 |
| Total | | 49 | 35 | 14 | 16 | 33 |

Panel B: Opinion Chosen vis-à-vis Opinion Indicated by Threshold
| Unit | Opinion according to threshold | Initial chosen: Unqualified | Initial chosen: Modified | Final chosen: Unqualified | Final chosen: Modified |
| Individuals | Unqualified | 18 (a) | 2 | 4 (a) | 3 |
| Individuals | Modified | 1 | 7 (a) | 2 | 19 (a) |
| Groups | Unqualified | 14 (a) | – | 9 (a) | – |
| Groups | Modified | 2 | 5 (a) | 1 | 11 (a) |

(a) Indicates agreement between threshold and actual opinion issued.

In results not shown, individuals in the
CCCMMM condition tended to recommend more unqualified and fewer modified opinions than individuals in the MMMCCC condition at the end of the experiment. This comparison, however, is not significant (χ² = 2.24, p > 0.05). A comparison of the distribution of final recommended opinions with the distribution of initial opinions shows that 4 of 8 individuals in the CCCMMM condition changed their recommendation from unqualified to modified, while 9 of 11 in the MMMCCC condition did so. A much less pronounced pattern existed for groups. Only 6 of 21 groups (3 in each order condition) changed their recommendation from unqualified to modified. However, neither of these comparisons is significant (χ² = 0.962, p > 0.05 and χ² = 0.829, p > 0.05 for individuals and groups, respectively). Contrary to the expected effect of recency on audit opinions, the number of modified opinions increased in both the individual and the group CCCMMM conditions. Although revisions of belief toward modified opinions may align with the aforementioned heightened sensitivity of auditors to adverse news, these results also suggest possible differences between binary (unqualified, modified) and continuous (percentage probability) outcomes.10
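The consistency classification summarized in Panel B of Table 6 (and described in Note 10) can be sketched in Python. The rule compares a likelihood rating with the participant’s threshold to obtain the opinion the threshold implies, then checks it against the opinion chosen. How ratings exactly at the threshold were treated is not stated, so the tie rule below is an assumption; the counts are transcribed from Panel B, with each dash entered as zero, and all names are illustrative.

```python
U, M = "unqualified", "modified"

def implied_opinion(likelihood, threshold):
    """Opinion implied by a 0-100 likelihood rating relative to the auditor's threshold.

    Assumption: a rating exactly equal to the threshold implies an unqualified opinion.
    """
    return M if likelihood < threshold else U

def consistency_rate(counts):
    """Share of recommendations whose chosen opinion matches the threshold-implied one."""
    agree = sum(n for (implied, chosen), n in counts.items() if implied == chosen)
    return agree / sum(counts.values())

def inconsistent_count(counts):
    """Number of recommendations that disagree with the threshold-implied opinion."""
    return sum(n for (implied, chosen), n in counts.items() if implied != chosen)

# Table 6, Panel B: (opinion implied by threshold, opinion chosen) -> count.
panel_b = {
    ("Individuals", "initial"): {(U, U): 18, (U, M): 2, (M, U): 1, (M, M): 7},
    ("Individuals", "final"): {(U, U): 4, (U, M): 3, (M, U): 2, (M, M): 19},
    ("Groups", "initial"): {(U, U): 14, (U, M): 0, (M, U): 2, (M, M): 5},
    ("Groups", "final"): {(U, U): 9, (U, M): 0, (M, U): 1, (M, M): 11},
}

def pooled_counts(unit):
    """Combine the initial and final Panel B tables for one decision unit."""
    pooled = {}
    for stage in ("initial", "final"):
        for key, n in panel_b[(unit, stage)].items():
            pooled[key] = pooled.get(key, 0) + n
    return pooled

print(implied_opinion(40.0, 50.0))  # hypothetical rating and threshold: modified
for (unit, stage), counts in panel_b.items():
    print(unit, stage, f"{consistency_rate(counts):.0%} consistent")
for unit in ("Individuals", "Groups"):
    pooled = pooled_counts(unit)
    print(unit, f"{inconsistent_count(pooled)} of {sum(pooled.values())} inconsistent")
```

The output matches Note 10: 89% and 82% of individuals and 90% and 95% of groups are consistent at the initial and final points, and 8 of 56 individual and 3 of 42 group recommendations are inconsistent overall.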
SUMMARY AND DISCUSSION

A considerable discrepancy seems to exist between decision-making in auditing practice and its academic study. Whereas audits are group efforts that draw on the contributions of many differently situated individuals, academic research has mostly studied autonomous individuals. If groups differ from individuals, aggregating the latter to draw implications about the work of the former may not be appropriate.
Many studies of auditors suggest that bias and inconsistency exist in individual
judgments. However, interventions that increase individuals’ cognitive effort and
engagement may reduce bias. Extensive research on decision-making suggests
that forming groups is an effective means of increasing the cognitive effort and
engagement of individuals. Accordingly, the grouping of auditors provides a means
of simultaneously increasing our appreciation of the nature of decision-making and
adding to the realism of the experimental evidence.
The task used in this research involved assessments of the going concern status
of a hypothetical audit client. In many ways, this may be an atypical judgment due
to its extreme nature. However, the descriptive evidence suggested that adequate
familiarity with the issue existed among the participants. The frequency with which auditing researchers have studied this issue also allows more direct comparisons with prior work. From a practice standpoint, the need for a strong going
concern evaluation cannot be denied.
This paper provides empirical support for the proposition that group decision
processes differ from those of individuals. The results suggest that when auditors
work in groups, judgments are less likely to be influenced by the order in
which evidence is received and evaluated. The recency effects reported in several
experiments with audit practitioners and reproduced in this study with respect to
individual decision-makers were not present for the same judgments that were
made by groups.
This study builds upon Kennedy’s (1993) and Cushing and Ahlawat’s (1996) work by showing how another factor relevant to the auditing environment, decision-making in groups, mitigates recency effects. Together, this line of work implies that recency effects may be
overstated by studies that lack external validity. The results of this study imply
that recency may be less onerous for the profession than others have suggested. If
audit decisions are made in groups, less recency bias appears to be present.
This study offers some interesting results regarding judgment confidence. Audit groups start with more confidence in their decisions than do individuals. Participants may intuitively appreciate the superior power of the collective to make an informed evaluation, or may simply appreciate the help that others provide when making difficult decisions. Faced with subsequent inconsistent evidence, groups tend to sustain, but not significantly increase, their confidence advantage over individuals. This suggests that the advantages of the group mode in an audit setting occur early in the deliberative process. The fact that the confidence of groups did not increase over time may also indicate that this collective mode is not necessarily prone to overconfidence.
The results suggest that one of the main differences that groups may offer is their willingness to temper extreme reactions to particular pieces of information. In the going concern situation, further evidence of financial distress would logically make the going concern question more salient. However, such evidence moved groups relatively little toward this conclusion. Groups appear to be more willing to suspend judgment or to put each additional piece of information in a broader context. Individuals demonstrate more sensitivity to “bad” news by making larger belief revisions. This difference between groups and individuals is not observed for information that tended to lessen the going concern problem: individuals did not react significantly more strongly than groups to facts suggesting that the hypothetical business would remain financially viable. Further research is needed to test possible reasons why the two decision units processed good news and bad news differently.
The results should redirect the attention of auditing organizations and academic
accountants to group dynamics. Groups appear to process information in ways less
affected by its order. Groups are also more confident about decisions and less likely
to overreact to “bad” news about a client. Auditing firms should be comfortable with the ability of groups to avoid recency bias but somewhat concerned about their tendency, perhaps, to react too little to going concern issues. In light of recent sudden corporate bankruptcies, the latter tendency needs to be guarded against.
This research did not attempt to evaluate the importance of degrees of
confidence. The superior confidence of groups does not necessarily imply that
groups made more technically correct decisions about the going concern status of
the hypothetical client. The hypothetical nature of the client prevents any proof
of superiority. A necessary prelude to the confidence that constituents might
have about auditing outcomes is the confidence that auditors themselves have
in auditing inputs. Nonetheless, subsequent research should be directed at the
specific value of confidence in auditing judgments.
The findings of this study are subject to certain limitations. One stems from
the unavailability of data regarding the extent to which group members actually
had experience working together on previous engagements. The effectiveness
of group processes may depend on such experience, as individuals learn to
systematically respect or discount the judgments of others. The importance of
working histories of groups may not be as high in auditing as in other business
settings. As firms get larger and centralize control over their human resources,
individual assignments become less predictable and stable. No attention was given to hierarchical differences among the participants who were assigned to groups. In the attempt to ensure sufficient going concern expertise, auditors of different ranks were mixed in the groups. No evidence exists on the question of whether participants of higher rank dominated group decisions. A more systematic attempt to isolate the power of more highly ranked individuals would have been necessary to shed light on this question.11 Another potential limitation stems from the fact that the managers in the group condition were more experienced than the auditors in the individual condition. Although the groups also included auditors with less experience than those who worked individually, an experience effect may have resulted if the more experienced group member dominated the group decisions.
NOTES
1. The professional nature of the work mitigates the fact that these groups often consist
of individuals at different levels within the organization. However, the empirical regularities
created by this professionalism need further investigation.
2. The expected ability of groups to make better-informed decisions does not take
into account situations where individuals first make judgments and then enter groups for
the reevaluation of the decision. This may cause groups to move towards more extreme
positions, as shown by Marxen (1990).
3. Group confidence might be lowered by cases where individuals strongly disagreed
with group positions. Therefore, the expectation that group confidence will be higher than
individual confidence implicitly asserts that these situations will be rare. This study does
not measure the degree to which satisfaction is related to confidence.
4. Group composition could be very important to the dynamics of group decision-making. Since this research could not tightly control the composition dimension, interpersonal issues such as charisma and persuasiveness could not be measured. On more objective dimensions such as experience and rank, a suitable mixture of people was achieved. See Table 1 and the Participants section.
5. The researcher did not inquire about the decision processes of the groups after the
experiment was completed. Investigating this in a way that did it justice would require
another study.
6. This choice on group composition creates an alternative interpretation about the extent
of influence lower level employees can have on higher ones. See Graen and Uhl-Bien (1995).
7. The belief-revision measure was also examined in raw change terms ($J_6 - J_0$). Since no differences in the substantive results occurred, these results are not shown.
8. Other tests were conducted to clarify the interpretation of the results presented in Table 3. An ANCOVA with experience as the covariate (p > 0.05) was considered. A significant order × decision-unit interaction (F = 4.674, p < 0.05) again resulted. This suggests that these findings are not attributable to an experience effect. Another analysis used $J_6$ as the dependent variable, $J_0$ as the covariate, and order and decision unit as the independent variables. This model captures belief revision in a different way by more explicitly controlling for the initial anchoring point ($J_0$). It also shows results similar to those reported above. Specifically, the interaction between order effects and decision unit was significant (F = 4.35, p < 0.05). Another covariate that could be important is the threshold for substantial doubt. The point at which the decision-maker is confronted with a reportable going concern issue may be a matter independent of the quantifiable belief-revision variable. Using as a covariate the probability estimate for this general threshold, collected from the participants in Task 1, the order × decision-unit interaction term was again significant (F = 4.89, p < 0.05). These results support the first hypothesis: audit groups making going concern decisions are less prone than individual auditors to recency effects.
9. As shown in Panel B of Table 4, this relationship was also analyzed using t-tests.
The results show that the difference between the final confidence of individuals (71.25)
and groups (80.23) was significant (t = 2.25, p < 0.03). This result is consistent with the
expectation in H2.
10. The bottom portion of Table 6 (Panel B) reports whether the participants’ recommended opinions were consistent with the relation between their likelihood ratings (initial, and final $J_6$) and the threshold judgment they provided at the beginning of Task 1, apart from consideration of the case materials. An auditor’s opinion type decision was considered consistent if the likelihood rating was below the threshold judgment and a modified report was chosen. Alternatively, consistency could also be achieved by recommending an unqualified opinion when the likelihood was above the given threshold. In total, only 7% (3 of 42) of the group recommendations of audit opinions were inconsistent, whereas a nearly twice as large share, 14% (8 of 56), of the individual recommendations were inconsistent. An even more telling pattern emerges when initial and final likelihood positions are differentiated. Groups become more consistent with their original threshold over time: initially, 90% of the groups are consistent, and this increases to 95% after the last piece of information has been processed. Individuals become less consistent: the percentage of individuals who are consistent falls from 89% to 82% over the course of the decision-making.
11. Conversations with practitioners about this did not reveal any consistent practice.
Some firms had a more hierarchical approach than others almost to the point of resting
this decision on the engagement partner after the other auditors had collected the relevant
information and suggested an outcome. Other firms had a more participatory process
wherein the decision cascaded from the lower levels to the top.
REFERENCES
Allwood, C. M., & Granhag, P. (1996). Realism in confidence judgments as a function of working in
dyads or alone. Organizational Behavior and Human Decision Processes, 64, 277–289.
American Institute of Certified Public Accountants (1990). Statement on auditing standards No. 59:
The auditor’s consideration of an entity’s ability to continue as a going concern. (AU 341) New
York, NY: AICPA.
Anderson, C. A., & Sechler, E. (1986). Effects of explanation and counter-explanation on the development
and use of social theories. Journal of Personality and Social Psychology, 50, 24–34.
Asare, S. K. (1992). The auditor’s going-concern decision: Interaction of task variables and the
sequential processing of evidence. The Accounting Review, 67, 379–393.
Ashton, A. H., & Ashton, R. (1988). Sequential belief revision in auditing. The Accounting Review,
63, 623–641.
Bloomfield, R., Libby, R., & Nelson, M. (1996). Communication of confidence as a determinant
of group judgment accuracy. Organizational Behavior and Human Decision Processes, 6,
287–300.
Chow, C., McNamee, A., & Plumlee, D. (1987). Practitioners’ perceptions of audit step difficulty and
criticalness: Implications for audit research. Auditing: A Journal of Practice and Theory, 6,
123–133.
Church, B. (1991). An examination of the effect that commitment to a hypothesis has on auditors’
evaluations of confirming and disconfirming evidence. Contemporary Accounting Research, 7,
513–534.
Church, B., & Schneider, A. (1993). Auditor generation of diagnostic hypotheses in response to a
superior’s suggestion: Influence effects. Contemporary Accounting Research, 10, 333–350.
Cushing, B., & Ahlawat, S. (1996). Mitigation of recency bias in audit judgment: The effect of documentation.
Auditing: A Journal of Practice & Theory, 16, 134–146.
Dillard, J. N., Kauffman, N., & Spires, E. (1991). Evidence order and belief revision in management
accounting decisions. Accounting, Organizations and Society, 7, 619–633.
Fisher, B. A., & Ellis, D. (1990). Small group decision-making: Communication and the group process.
New York, NY: McGraw-Hill.
Gibbins, M., & Emby, C. (1985). Evidence on the nature of professional judgment in public accounting.
In: A. R. Abdel-khalik & I. Solomon (Eds), Auditing Research Symposium (pp. 181–212).
Champaign, IL: University of Illinois.
Graen, G. B., & Uhl-Bien, M. (1995). Relationship-based approach to leadership: Development of
leader-member exchange (LMX) theory of leadership over 25 years: Applying a multi-level
multi-domain perspective. Leadership Quarterly, 6, 219–247.
Hill, G. W. (1982). Group versus individual performance: Are n + 1 heads better than one?
Psychological Bulletin, 19, 517–539.
Hogarth, R. M., & Einhorn, H. (1992). Order effects in belief updating: The belief adjustment model.
Cognitive Psychology, 24, 1–55.
Kennedy, J. (1993). Debiasing audit judgment with accountability: A framework and experimental
results. Journal of Accounting Research, 31, 231–245.
Luus, C. A. E., & Wells, G. (1994). The malleability of eyewitness confidence: Co-witness and perseverance
effects. Journal of Applied Psychology, 79, 714–723.
Marxen, D. (1990). A behavioral investigation of time budget preparation in a competitive audit environment.
Accounting Horizons, 4, 47–57.
McMillan, J., & White, R. (1993). Auditors’ belief revisions and evidence search: The effect of
hypothesis frame, confirmation bias, and professional skepticism. The Accounting Review, 68,
443–465.
Messier, W., & Tubbs, R. (1994). Mitigating recency effects in belief revision: The impact of audit
experience and the review process. Auditing: A Journal of Practice & Theory, 14, 57–72.
Miner, F. (1984). Group versus individual decision-making: An investigation of performance measures,
decision strategies and process. Organizational Behavior and Human Performance, 39,
112–124.
Myers, D., & Lamm, H. (1976). The group polarization phenomenon. Psychological Bulletin, 82,
602–627.
Newman, D. (1980). Prospect theory: Implications for information evaluation. Accounting, Organizations
and Society, 5, 217–230.
Pei, B. K., Reed, S., & Koch, B. (1992). Auditor belief revisions in a performance auditing setting:
An application of the belief-adjustment model. Accounting, Organizations and Society, 17,
169–183.
Reckers, P. M. J., & Schultz, J. (1993). The effect of fraud signals, evidence order, and group-assisted
counsel on independent auditor judgment. Behavioral Research in Accounting, 5, 124–144.
Schultz, J. J., & Reckers, P. (1981). The impact of group processing on selected audit disclosure
decisions. Journal of Accounting Research, 19, 482–501.
Sniezek, J. A., & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organizational
Behavior and Human Decision Processes, 43, 1–28.
Sniezek, J. A., & Henry, R. (1990). Revision, weighting, and commitment in consensus group judgment.
Organizational Behavior and Human Decision Processes, 45, 66–84.
Solomon, I. (1987). Multi-auditor judgment/decision-making research. Journal of Accounting
Literature, 6, 1–25.
Stasser, G. (1992). Information salience and the discovery of hidden profiles by decision-making
groups: A “thought experiment”. Organizational Behavior and Human Decision Processes,
52, 156–181.
Stasser, G., & Davis, J. (1981). Group decision-making and social influence: A social interaction
sequence model. Psychological Review, 88, 523–551.
Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision-making: Biased
information sampling during discussion. Journal of Personality and Social Psychology, 48,
1467–1478.
Tetlock, P. (1983). Accountability and the perseverance of first impressions. Social Psychology
Quarterly, 46, 285–292.
Trotman, K., & Wright, A. (1996). Recency effects: Task complexity, decision-mode, and task-specific
experience. Behavioral Research in Accounting, 8, 175–193.
Tubbs, R., Messier, W., Jr., & Knechel, W. (1990). Recency effects in the auditor’s belief-revision
process. The Accounting Review, 65, 452–460.
Valacich, J. S., & Schwenk, C. (1995). Devil’s advocacy and dialectical inquiry effects on face-to-face
and computer-mediated group decision-making. Organizational Behavior and Human Decision
Processes, 63, 158–173.
Vance, R., & Biddle, T. (1985). Task experience and social cues: Interactive effects on attitudinal
reaction. Organizational Behavior and Human Performance, 35, 252–265.
Weiss, H., & Shaw, J. (1979). Social influences in judgments about task. Organizational Behavior and
Human Performance, 24, 126–140.
White, S., Mitchell, T., & Bell, C. (1977). Goal setting, evaluation apprehension and social cues as
determinants of job performance and job satisfaction in a simulated organization. Journal of
Applied Psychology, 52, 665–673.
Winquist, J., & Larson, J. (1998). Information pooling: When it impacts group decision-making. Journal
of Personality and Social Psychology, 74, 371–378.
Wright, E., Luus, C., & Christie, S. (1990). Does group discussion facilitate the use of consensus
information in making causal attribution? Journal of Personality and Social Psychology, 59,
261–269.
Zarnoth, P., & Sniezek, J. (1997). The social influence of confidence in group decision-making. Journal
of Experimental Social Psychology, 33, 345–367.
INVESTIGATING ERROR PROJECTION
AMONG STATE AUDITORS: THE
IMPACT OF INTENTIONAL AND
SYSTEMATIC MISSTATEMENTS
John T. Reisch, Karen S. McKenzie and Alan H. Friedberg
ABSTRACT
This paper investigates state auditors’ decisions regarding the isolation
or projection of sample misstatements to underlying sample populations.
Seventy-eight state auditors completed four treatment cases that incor-
porate the complete 2 × 2 manipulation of intentional/unintentional and
systematic/non-systematic misstatements in different case scenarios, enabling
a test of the independent variables both across and within case scenarios.
The results indicate that both across and within case scenarios, auditors
tend to project systematic misstatements more often than they project
non-systematic misstatements. However, the auditors’ isolation/projection
decisions are generally not influenced by whether the sample misstatements
are intentional or unintentional.

ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06003-4
INTRODUCTION
In 2000, state and local governments in the U.S. generated over $1.2 trillion
in revenues; they also spent over $1.1 trillion, accounting for over 9% of the
U.S. gross domestic product (28% of gross domestic product when the federal
government is included) (OMB, 2001). The magnitude of this economic activity
accentuates the need for proper oversight of the sources and uses of the funds,
including the audits of state and local governments. Despite the extent to which
state and local government activity impacts the economy, relatively little behav-
ioral auditing research has been conducted on the effectiveness and efficiency of
the auditors employed by these entities. This study addresses the issue of auditor
effectiveness by empirically testing the professional judgments of state auditors
in a context-rich environment; specifically, it examines the subjective assessment
of sample evidence by state auditors.
Sampling is one area where the evaluation of evidence may be largely
affected by subjective differences in auditors’ judgments. The auditing standards
explicitly state that in a variables sampling context, the “auditor should project
the misstatement results of the sample to the items from which the sample was
selected” (AICPA, 2001, AU§350.26).1 However, in addition to the quantitative
task of projecting sample misstatements, the standards also note that auditors
should consider the qualitative aspects of the misstatements (AICPA, 2001,
AU§350.27), and that the “actions that might be taken in light of the nature and
cause of particular misstatements” are left to the discretion of the auditor (AICPA,
2001, AU§350.06). Thus, some ambiguity exists as to whether misstatements should
always be projected and, if not, under what conditions they should be isolated. If
an auditor inappropriately isolates misstatements found in a sample, the likelihood
of a non-representative, or biased, estimate of the account balance being tested
increases. More specifically, failure to project sample misstatements generally
results in an underestimation of the aggregate misstatement in the underlying
population, thereby increasing the auditor’s risk of incorrect acceptance. In the
case of state auditors, this implies a failure to satisfy an essential element of public
control and accountability.
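To make the cost of isolation concrete, the sketch below contrasts projecting a sample misstatement with isolating it. It uses the ratio approach, which is one common way of projecting; the passage above does not prescribe a projection method, and the function names and dollar amounts are hypothetical. The sketch also leaves out any allowance for sampling risk.

```python
def project_ratio(sample_misstatement: float, sample_book_value: float,
                  population_book_value: float) -> float:
    """Project a sample misstatement to the population (ratio approach).

    Assumes the misstatement per dollar of book value observed in the sample
    also holds for the population from which the sample was drawn.
    """
    if sample_book_value <= 0:
        raise ValueError("sample book value must be positive")
    rate = sample_misstatement / sample_book_value
    return rate * population_book_value


def isolate(sample_misstatement: float) -> float:
    """Treat the misstatement as unique: only the amount actually found counts."""
    return sample_misstatement


# Hypothetical amounts, not taken from the study's cases.
found = 2_000.0              # overstatement found in the sample
sample_bv = 100_000.0        # book value of the sampled items
population_bv = 5_000_000.0  # book value of the account being tested

projected = project_ratio(found, sample_bv, population_bv)
isolated = isolate(found)
print(f"projected: {projected:,.0f}  isolated: {isolated:,.0f}")
# prints "projected: 100,000  isolated: 2,000"
```

Isolating the error counts only the $2,000 actually found, whereas projecting it estimates a $100,000 misstatement in the account; this gap is the underestimation, and the resulting risk of incorrect acceptance, that the paragraph above describes.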
The extent to which state auditors do not project sample misstatements of
account balances, and the potential consequences of inappropriately isolating
misstatements, are important research topics. State auditors often conduct financial
statement audits, the results of which are used in a variety of ways, including the
allocation of resources among programs and personnel, monitoring compliance
with fiscal laws, and even bond ratings. This study focuses on non-sampling
risk,2 and extends existing literature in three ways. The first contribution is the
finding that systematic misstatements in sampling data significantly affect state
auditors’ decisions to project misstatements to the account population, while the
impact of intentional misstatements is generally not significant. Although the
impact of intentional misstatements on auditors’ projection decisions has been
indirectly investigated in prior studies (Burgstahler & Jiambalvo, 1986; Dusenbury
et al., 1994; Hermanson, 1997), the effect of systematic misstatements has not
been previously examined.
A second contribution of this study is the methodology used. Most other studies
(Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997) have
tested for factors impacting auditors’ isolation/projection decisions using incom-
plete designs that do not isolate the effects of the variables in all combinations of
the treatment variables.3 In this study, participants completed four separate cases
representing the 2 × 2 manipulation of the two independent variables, intentional
or unintentional misstatements and systematic or non-systematic misstatements.
In addition, the study’s design centers on an important aspect of sampling
research – the extent to which projection decisions are context specific; that is,
how different case scenarios (e.g. inventory versus receivables) and types of
misstatements can influence auditors’ decisions. The design utilized in this study
allows for analysis of the data across treatments (the complete 2 × 2 manipulation)
as well as within individual case scenarios because each participant completed
four separate cases, each representing an individual cell in the 2 × 2 design.
The third contribution of this study is the use of state auditors rather than
external auditors to test whether auditors project or isolate misstatements found
in samples. None of the prior studies investigating auditors’ evaluation of sample
findings use governmental auditors in their empirical tests; thus, a secondary
objective of the paper is to specifically investigate state auditors’ sample evalu-
ation decisions. As Green (1992, p. 62) notes, “Applying generic psychological
and/or informational processing theories fails to recognize that there are unique
characteristics in governmental and non-profit settings that could have different
(and perhaps contradictory) influences on behavior in those settings.” One cannot
simply generalize research findings from the for-profit environment to state
government auditors. Thus, by using state auditors as study participants, this study
extends the literature investigating auditors’ isolation/projection decisions to this
important setting.
A review of the existing literature investigating behavioral implications of audit
sampling is presented immediately below, leading to the research hypotheses and
experimental methodology. The results of the study are presented and analyzed
in the penultimate section, followed by concluding comments, limitations of the
study, and future research possibilities.
BACKGROUND AND HYPOTHESES DEVELOPMENT

Several empirical studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al.,
1994; Hermanson, 1997; Wheeler et al., 1997) have investigated the behavioral
implications of auditors analyzing sample results. These studies focus on the
potential biases auditors may have when sampling evidence. In particular, the
studies find that auditors sometimes inappropriately isolate sample misstatements
from the population being tested; that is, auditors may not project misstatements
to the population. The manner in which auditors respond to sample data varies
so much that it was “deemed a major problem” over a decade ago (Akresh
& Tatum, 1988), and anecdotal evidence suggests little has changed since. A
recent case filed by the SEC against two former Coopers & Lybrand auditors
accused of negligence during an audit of California Micro Devices clearly
illustrates the problem of evaluating sample evidence (MacDonald, 2000).
According to the SEC, the auditors ignored serious issues raised by one-third
of the returned accounts receivable confirmations. The SEC asserts that although
the audit firm did require adjustments for those confirmations, the auditors did
not investigate further to see whether other revenue problems existed. Apparently,
the auditors did not project the results of the sampled confirmations to the overall
receivable balance.
To examine the potential problems auditors may have when evaluating sample
results, the study investigates how two factors affect auditors’ isolation/projection
decisions: whether the sample misstatements are intentional or unintentional,
and whether the misstatements are systematic or non-systematic. The primary
objective in testing these factors is to provide new insight into the biases
that influence auditors’ decisions when deciding whether to isolate a sample
misstatement or project it to the underlying population from which it was
drawn. Testing the two factors concurrently responds to a call by Dusenbury et al.
(1994) for finer partitioning of misstatements in researching auditors’ projection
decisions.
The Uniqueness of Misstatements
The propensity of auditors to isolate rather than project sample misstatements can
occur when auditors assume that the misstatements do not exist elsewhere in the
population. One explanation for the lack of projection is that the auditors view the
misstatements as being unique or unusual and, therefore, not truly representative
of the underlying population being tested. Empirical evidence indicates that
the uniqueness perception of misstatements is highly significant in determining
whether auditors isolate misstatements found in sample data (Burgstahler &
Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997; Wheeler et al., 1997).
Burgstahler and Jiambalvo (1986) suggest that the positive correlation between
the projection of sample misstatements and the auditors’ perception of similarity
of the misstatement to other errors might be explained by the representativeness
heuristic (Kahneman & Tversky, 1972). According to Kahneman and Tversky,
individuals make inferences about a situation by comparing it to attributes of
similar situations they have encountered previously. When applied to a
sampling context involving a misstatement, the theory predicts that an auditor
will compare the characteristics of the sample error to characteristics of some
prototypical error(s) the auditor has categorized in memory. Thus, errors lacking
the prototypical characteristics of other, more common errors are more likely to be
judged unique by the auditor. Taking this one step further, when a sample error
is perceived as unique, the auditor may think that similar errors are unlikely to exist
in the underlying population and, therefore, will tend to isolate the
error. Conversely, errors that occur more commonly will have a greater likelihood of
being projected since the auditor will recognize characteristics of the misstatement
when compared to error attributes held in memory. The auditor will then determine
that the sample error is not unique and could likely occur again in the population.
The general notion of prototype matching explained by the representativeness
heuristic is partially supported by Dusenbury et al. (1994) in their investigation
of error containment. Containment is a process that involves a restratification
strategy whereby certain qualitative characteristics associated with the initial
sample misstatement are identified. All transactions in the population that meet
the qualitative criteria are then segregated, ex post, into a narrow stratum. If
no other misstatements are found after examining the entire stratum, the initial
misstatement does not need to be projected to the rest of the population since
the stratum has been thoroughly tested. Dusenbury et al. (1994) found that in the
absence of containment information, less frequent errors were isolated more often
than frequently occurring errors. However, when containment information was
provided to the participating auditors, more frequent errors were isolated more
often than less frequent errors, all of which involved irregularities.
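The containment procedure described above can be sketched as follows. The transaction fields, the clerk-based criterion and the function name are hypothetical. The sketch covers only the case the passage describes, in which the fully examined stratum reveals no further misstatements, and it does not model what an auditor would do otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    txn_id: str
    clerk: str          # hypothetical qualitative attribute used to restratify
    book_value: float
    misstated: bool     # result of examining the item (hypothetical)


def contain(population: List[Transaction],
            criterion: Callable[[Transaction], bool],
            initial_id: str) -> Tuple[List[Transaction], bool]:
    """Ex post restratification ("containment") of an initial sample misstatement.

    Every transaction meeting the qualitative criterion is segregated into a
    narrow stratum and the whole stratum is examined. Returns the stratum and
    a flag that is True when no misstatement other than the initial one turns
    up, i.e. when the initial misstatement need not be projected to the rest
    of the population.
    """
    stratum = [t for t in population if criterion(t)]
    others = [t for t in stratum if t.misstated and t.txn_id != initial_id]
    return stratum, not others


population = [
    Transaction("T1", "clerk_a", 500.0, True),   # the initial sample misstatement
    Transaction("T2", "clerk_a", 300.0, False),
    Transaction("T3", "clerk_b", 800.0, False),
]
stratum, contained = contain(population, lambda t: t.clerk == "clerk_a", "T1")
print([t.txn_id for t in stratum], contained)   # ['T1', 'T2'] True
```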
Intentional Nature of Misstatements
A number of accounting studies report the incidence of misstatements, in
general, to be low (Ashton, 1991; Libby, 1985; Libby & Frederick, 1990).
Intentional misstatements (i.e. fraudulent misstatements) occur even less often.
The representativeness heuristic might suggest that the low frequency of intentional
misstatements in audit populations would prompt auditors to isolate an irregularity
rather than project it to the underlying population being tested. However,
auditing standards suggest that irregularities or intentional misstatements warrant
additional consideration when uncovered. In fact, the standards specifically state:
Generally, an isolated, immaterial error in processing accounting data or applying accounting
principles is not significant to the audit. In contrast, when fraud is detected, the auditor should
consider the implications for the integrity of management or employees and the possible effect
on other aspects of the audit (AICPA, 2001, AU§312.08).
Thus, if an auditor uncovers an intentional misstatement in a sample, s/he may be
more inclined to project the sample misstatement than if the discovered misstatement
were unintentional. This premise is supported, in part, by Dusenbury et al. (1994), who found
that in the presence of containment information, intentional and less frequently
occurring errors were more likely to be projected than unintentional, more frequent
misstatements. Dusenbury et al. (1994, p. 262) suggest that “The discovery of an
irregularity, while normally rare and thus dissimilar to other errors, should induce
additional caution (AU§350.27) and might result in higher projection rates.”
Several studies (Anderson & Maletta, 1994; Ashton & Ashton, 1988; Kida,
1984; Trotman & Sng, 1989) have found that auditors place more importance
on negative evidence (evidence that counters client assertions) than on positive
evidence. Although auditors will likely view any sample misstatement as negative
evidence, an irregularity might be considered stronger negative evidence than a
similar, unintentional misstatement. If this assertion holds, auditors
would tend to handle an irregularity more conservatively than an
unintentional error. Thus, auditors may be more inclined to
project an intentional sample misstatement to the population than they would an
unintentional error.
Intentional Versus Unintentional Misstatements Hypothesis
Research (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson,
1997; Wheeler et al., 1997) indicates that external auditors generally have
a tendency to isolate less frequently occurring errors. Given that intentional
misstatements occur very infrequently, these research findings suggest that
irregularities discovered in samples are likely to be isolated by auditors because
the misstatements are perceived as being unique. Auditing standards and studies
on negative evidence, however, suggest that intentional misstatements warrant
more consideration than unintentional misstatements. In a sampling context, this
additional consideration may involve the projection of a sample misstatement,
a more conservative approach than isolation; this is consistent with the findings
of Dusenbury et al. (1994) who found that intentional misstatements, in the
presence of containment information, were more likely to be projected than
unintentional misstatements. Increased attention to intentional misstatements may
be particularly warranted in the case of governmental auditors since generally
accepted governmental auditing standards state that the threshold for audit risk
may be lower in governmental audits than in audits of commercial entities (GAO,
1994, §4.9), and because various legal and regulatory requirements faced by
governmental auditors may require reporting on any intentional misstatement,
regardless of its materiality. Thus, the following research hypothesis is proposed:
H1. The propensity of state auditors to project intentional sample misstatements
to the underlying population being tested will be greater than their propensity
to project unintentional misstatements.
Systematic Versus Non-systematic Misstatements Hypothesis
Misstatements, whether intentional or not, may occur systematically or
non-systematically. Systematic misstatements can be defined as those that are likely to
be repeated because of some characteristic(s) associated with a transaction or class
of transactions. Systematic misstatements may occur frequently or infrequently
depending on the persistence of the underlying cause. However, the presence
of a systematic error in an audit sample, by definition, implies that other errors
may be present in the underlying population due to the same causal condition.
Thus, normatively, systematic misstatements discovered in a sample should be
projected to the entire population. The following research hypothesis related to
auditors’ behavior in the evaluation of sample findings is proposed:
H2. The propensity of state auditors to project systematic sample misstatements
to the underlying population being tested will be greater than their propensity
to project non-systematic misstatements.
EXPERIMENTAL METHODOLOGY
Experimental Task
To test the hypotheses, a series of sampling cases (see Appendix) that incorporated
the experimental manipulations was developed. These cases enabled both across-scenario
and within-scenario analyses. Burgstahler and Jiambalvo’s (1986)
cases served as a basis for comparability with other studies; however, precise
replication of cases used in these other studies was not feasible given the
manipulations employed. In addition, although comparisons can be made between
the results of this study and other studies on sample projections, the pressures and
incentives encountered by governmental auditors are believed to vary from those
encountered by external auditors. Until research addresses environmental factors,
perceptions of differences are all that distinguish the governmental auditor from
the non-governmental auditor.
The experimental instrument included four treatment cases, each representing
a different account balance (e.g. sales and accounts receivable) and sampling
situation; thus, we are able to capture the four treatments of the 2 × 2 design in
which we manipulate two independent variables: (1) the type of misstatement as
either intentional or unintentional (INT); and (2) the nature of the misstatement
in terms of potential recurrence, operationalized as being either systematic or
non-systematic (SYS).4 Under the guidance provided by SAS No. 39, “Audit Sampling,”
the misstatements in each of the four cases should be projected. Where the cases identify
the employees responsible for the misstatements, these employees were intentionally kept at lower
levels (e.g. clerical employees or warehouse workers) to reduce the saliency of the
individuals involved, especially for manipulations containing intentional misstate-
ments. The dollar amounts of the misstatements were made immaterial since most
misstatements discovered by auditors are not individually material (Elder & Allen,
1998). In addition, keeping the materiality of the misstatements constant (i.e.
immaterial) enhanced control over the manipulations being tested by minimizing
potential confounding effects from the materiality of the misstatements.
The cases used were pretested at a chapter meeting of the Institute of Internal
Auditors. Most of the internal auditors in this chapter were governmental auditors,
and no feedback was received that indicated a problem understanding any of
the case scenarios. In addition, the results of the pretest suggested that the
experimental manipulations worked as intended.
This study is based on a repeated-measures design in which each subject received
every possible combination of the 2 × 2 manipulation of INT and SYS. Each
combination was incorporated randomly into one of the four different treatment
scenarios. Each case scenario in the experimental instrument was included on an
individual page, and participants were requested not to return to a scenario after it
was completed. The presentation of the case scenarios was randomized to minimize
potential order effects. After reading each scenario, participants decided
whether they would project the sample results to the account population being tested or
isolate the sample result from the population. Subjects then completed
a ten-point Likert-type scale measuring their comfort with that decision.
Subjects were instructed to consider each scenario independently and assume that
the samples were selected at random from the populations being tested.
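The repeated-measures assignment described above can be sketched as follows. The sketch assumes that the cell-to-scenario pairing and the page order were both drawn independently for each booklet; the passage does not say exactly how the randomization was carried out, and the function name and seed are hypothetical. The scenario names follow Fig. 1.

```python
import itertools
import random

# Case scenarios named in Fig. 1 of the paper.
SCENARIOS = ["sales", "inventory", "receivables", "unknown receipts"]
# The 2 x 2 treatment cells as (intentional, systematic) flags.
CELLS = list(itertools.product([True, False], repeat=2))


def build_instrument(rng: random.Random) -> list:
    """Assemble one participant's booklet as (scenario, intentional, systematic) pages.

    Each INT x SYS cell is paired with exactly one scenario at random, so every
    participant sees all four cells once; the page order is then shuffled to
    spread potential order effects across participants.
    """
    cells = CELLS[:]
    rng.shuffle(cells)                    # random cell-to-scenario assignment
    booklet = [(scenario, intentional, systematic)
               for scenario, (intentional, systematic) in zip(SCENARIOS, cells)]
    rng.shuffle(booklet)                  # random presentation order
    return booklet


rng = random.Random(42)
for page in build_instrument(rng):
    print(page)
```

Because every booklet contains each scenario and each cell exactly once, the same responses can later be pooled either by cell or by scenario.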
Fig. 1. Illustration of Data Analysis Across Cases (Direct Comparison of the Four
Experimental Manipulations Without Taking the Individual Case Scenarios into
Consideration). The Two Manipulated Variables were: (1) Intentional or Unintentional
Misstatement; and (2) Systematic or Non-systematic Misstatement. The Four Case Scenarios
Involved Misstatements in Sales, Inventory, Receivables, and Unknown Receipts.
The differences in the auditors’ decisions are analyzed both across all cases and
by individual case, as illustrated in Fig. 1. The analysis across the four treatment
cases tests the experimental conditions of the 2 × 2 manipulation of INT and SYS,
with each subject receiving each of the four conditions of the 2 × 2 design once.
In the analysis by individual case, all like cases (e.g. all sales scenarios, Case 1 in
Appendix) received by the participating auditors are tested in the aggregate.
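The two ways of cutting the same responses can be illustrated with a small tally. The field names and the eight responses below are hypothetical and serve only to show how the groupings differ; the across-case analysis reported in Table 2 uses logistic regression rather than raw rates.

```python
from collections import defaultdict


def projection_rates(responses, key):
    """Share of 'project' decisions (coded 1; isolate = 0) within each group."""
    tallies = defaultdict(lambda: [0, 0])   # group -> [projections, responses]
    for r in responses:
        group = key(r)
        tallies[group][0] += r["project"]
        tallies[group][1] += 1
    return {group: p / n for group, (p, n) in tallies.items()}


# Two hypothetical participants, four pages each (illustrative, not study data).
responses = [
    {"subject": 1, "scenario": "sales", "int": 1, "sys": 1, "project": 1},
    {"subject": 1, "scenario": "inventory", "int": 1, "sys": 0, "project": 0},
    {"subject": 1, "scenario": "receivables", "int": 0, "sys": 1, "project": 1},
    {"subject": 1, "scenario": "unknown receipts", "int": 0, "sys": 0, "project": 0},
    {"subject": 2, "scenario": "sales", "int": 0, "sys": 0, "project": 0},
    {"subject": 2, "scenario": "inventory", "int": 0, "sys": 1, "project": 1},
    {"subject": 2, "scenario": "receivables", "int": 1, "sys": 0, "project": 1},
    {"subject": 2, "scenario": "unknown receipts", "int": 1, "sys": 1, "project": 1},
]

# Across cases: scenario ignored, responses pooled by 2 x 2 cell.
across = projection_rates(responses, key=lambda r: (r["int"], r["sys"]))
# By individual case: like scenarios pooled, then split by cell.
by_case = projection_rates(
    responses, key=lambda r: (r["scenario"], r["int"], r["sys"]))
print(across)
```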
A major difference between the experimental design used in this study and
the research designs of most other studies investigating the isolation/projection
decisions of auditors (e.g. Burgstahler & Jiambalvo, 1986; Dusenbury et al.,
1994; Hermanson, 1997) is that in this study, the effects of the two independent
variables are isolated in the four combinations of the 2 × 2 design. The other
studies tested factors that affect auditors’ decisions by observing differences
across case scenarios. For example, auditors’ decisions regarding an infrequent
sample misstatement in a sales scenario were directly compared to decisions
about a more frequently occurring error in a case scenario involving inventory.
Thus, while appropriate comparisons can be made between the cases, many
extraneous variables that are not measured could also be affecting the results (e.g.
different levels of audit risk may be associated with different types of accounts).
The complete, multi-cell design of this study controls for potential confounding
effects associated with the direct comparisons of different case scenarios because
each case scenario in this study has four treatment combinations. In addition,
this study provides a robust test of the effects of intentional and systematic mis-
statements on auditors’ decisions since the effects are also tested in four different
case settings.
Subjects
A total of 100 experimental instruments were distributed to governmental auditors
from ten different state audit departments, representing all regions of the United
States. Seventy-eight were completed and returned. Department managers of the
participating states administered the distribution of instruments to members of
their audit departments, and collected and returned the completed instruments to
the authors. Each participant was assured complete anonymity.
As Table 1 indicates, 65 of the 78 participants (83%) had some professional
certification (e.g. CGFM, CPA), with the CPA designation being the most common
(62% of the participants). Further analysis shows that only seven participants
without a professional certification are from states that hire entry-level audit staff
from a variety of educational backgrounds (i.e. the state auditors are not required
to have an accounting education or a minimum number of accounting hours). Thus,
at most 7 subjects (8.9%) may have been responding to the instrument as newly
hired audit staff without accounting knowledge.5 The appropriateness
of the participating auditors for the experimental task was further indicated by
their responses to a question asking how frequently they use sampling
procedures on audit engagements: the mean response was 80% of engagements. In addition,
only four participants indicated that they did not use sampling or they did not
respond to this particular inquiry. Analyses of the data excluding these four
auditors indicate no significant differences from the results reported below.
At first glance, the case context might seem inappropriate for the governmental
environment investigated (e.g. “it is company policy to . . .” and “a review of
the company’s internal audit workpapers reveal . . .”); however, the context
is appropriate because auditors of state or local government (SLG) financial
statements must be knowledgeable about generally accepted accounting principles for
governments, which require two different accounting models. “General government”
type activities of SLGs employ a current financial resources measurement
focus on a modified accrual basis, whereas “proprietary” type activities of SLGs
employ an economic resources measurement focus on a full accrual basis. The
latter activity type is very similar to private sector accounting, although within the
realm of government controls. All state governments carry out general government
activities in the provision of a general basket of services to all constituents. Most
states also carry out proprietary activities, linking user fees (charges for services)
to costs of services much like a private business enterprise. Some states rely
on Public Benefit Corporations (PBCs) to provide services the state would otherwise have
accounted for as proprietary activities, but assign audit responsibility for the PBCs
to state auditors. Each of the state audit offices selected for participation had audit
responsibilities over such proprietary activities, whether the activities were part
of the state proper or activities of a PBC. In addition, those activities included
the focal issues of this study’s experimental cases (e.g. sales, receivables, and
inventory). The study emphasized the proprietary activities for comparability with
prior studies, all of which used private-sector-oriented subjects.

Table 1. Subject Profiles.

Panel A: Frequencies
Region of U.S.ᵃ    Number of Subjects    Professional Certification    Number of Subjectsᵇ
Central                    15            CPA                                   48
North Central              11            CGFM                                  21
Northeast                   7            CIA                                    3
Northwest                  15            Other                                  7
South                      15            None                                  13
Southeast                  20
Total                      78

Panel B: Means
Item                                               Mean    Standard Deviation
Frequency of sampling (% of audit engagements)     79.9    25.8
When sampling is used:
  (1) Frequency of statistical sampling (%)        26.3    30.7
  (2) Frequency of judgmental sampling (%)         73.3    30.7

ᵃ States categorized according to Webster’s College Dictionary (1991).
ᵇ Does not total 78 (the number of subjects) since some subjects hold more than one certification.
RESULTS AND ANALYSIS
Manipulation Checks
Prior to testing the hypotheses, manipulation checks were performed on INT
and SYS to determine whether subjects interpreted the independent variables as
intended. In the post-experimental questionnaire, subjects were asked to return
to each case and indicate whether the case misstatement was: (1) intentional or
unintentional; and (2) systematic or non-systematic. Subjects were asked to do
this without changing their initial case responses, and there was no evidence of
subjects changing their responses (e.g. erasures or overwritings).
Across the four treatment cases, subjects correctly identified whether the
misstatement was intentional 94.2% of the time (p < 0.0001 for all four treatment
cases), and subjects agreed with the systematic manipulations 62.6% of the time
(p < 0.10 for Cases 2–4; p = 1.00 for Case 1 (sales)). Subjects did not interpret
the manipulation of systematic misstatements in Case 1 as anticipated; only 49.3%
of the subjects agreed with the intended manipulation of the sales scenario.
The results of the systematic manipulation check do not invalidate the experiment;
rather, they enhance the findings. In testing the hypotheses, the
analyses were run two ways: (1) using the independent variables INT and SYS
in the models; and (2) replacing INT and SYS with the subjects’ assessments of
intentional and systematic misstatements in the cases. This enabled interpretation
of the effects of the variables on the isolation/projection decision from two angles.
First, an analysis was performed on the auditors’ decisions based on our manipula-
tions of the two independent variables. Second, an analysis was performed on the
subjects’ decisions according to how they interpreted whether the misstatements
were intentional and systematic.6
Analysis Across Treatment Cases
The data are first analyzed across cases, with all of the cases collapsed into a single group; that is, the individual case scenarios are ignored, yielding a repeated measures design in which each subject receives all four treatment combinations of the 2 × 2 manipulation. Because each scenario is expected to elicit the same response, multiple scenarios are used so that the results do not depend too heavily on any one particular scenario. To the extent that one or more of the scenarios would produce a different response in the dependent variable (the decision to isolate or project the sample misstatement), the analysis would be biased against finding a significant result; thus, collapsing the scenarios into a single group is a conservative approach to the data analysis.
Table 2. Logistic Regression Results of Isolation/Projection Decisions^a (Across Treatment Cases).
Independent Variable   Parameter Estimate   Odds Ratio   Wald χ²   p-Value   Model χ²   Model p-Value   c
Panel A: Using manipulated variables
Intercept              −0.6771                           10.71     0.0011    27.47      0.0001          0.660
INT                    0.2995               1.26         1.59      0.2077
SYS                    1.1864               3.28         24.90     0.0001
Panel B: Using subjects’ assessments of manipulated variables
Intercept              −1.1578                           24.30     0.0001    64.28      0.0001          0.741
INTCK                  0.3187               1.38         1.51      0.2186
SYSCK                  1.9360               6.93         55.77     0.0001
^a The dichotomous dependent variable is the auditors’ decision with regard to sample misstatement findings, coded 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic misstatement. INTCK and SYSCK refer to the participants’ assessments of intentional and systematic misstatements.
Table 2 presents the results of the logistic regression with all of the cases collapsed into one group. Panel A shows the results using the manipulated variables INT and SYS. In Panel B, the manipulated explanatory variables INT and SYS have been replaced in the logistic regression model with INTCK and SYSCK, the subjects’ assessments of whether the misstatements were intentional or systematic, respectively. In both across-treatment models, the model chi-square statistics are significant at the 0.01 level, indicating that the explanatory variables, taken together, significantly improve the prediction of the auditors’ propensity to isolate or project sample misstatements compared with a model containing only an intercept. In addition, goodness of fit for the logistic regression models was assessed with the c statistic, which is somewhat analogous to the coefficient of multiple correlation (Kane et al., 1996).7 In both models, the c statistic is greater than 0.65 (0.660 in Panel A and 0.741 in Panel B).
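Note 7 defines the c statistic through concordant, discordant, and tied pairs. The function below is a minimal sketch of that definition for a binary outcome (1 = Project, 0 = Isolate); the function name and the outcomes and predicted probabilities in the usage line are made-up illustration values, not the study's data.

```python
from itertools import product

def c_statistic(outcomes: list[int], predicted: list[float]) -> float:
    """c = (n_c + 0.5 * ties) / t over all pairs with different observed responses.

    outcomes: 1 = Project, 0 = Isolate; predicted: model probability of projecting.
    """
    projected = [p for y, p in zip(outcomes, predicted) if y == 1]
    isolated = [p for y, p in zip(outcomes, predicted) if y == 0]
    n_concordant = n_discordant = 0
    for p_proj, p_iso in product(projected, isolated):
        if p_proj > p_iso:
            n_concordant += 1  # the projected case received the higher predicted probability
        elif p_proj < p_iso:
            n_discordant += 1  # the isolated case received the higher predicted probability
    t = len(projected) * len(isolated)  # pairs with different observed responses
    ties = t - n_concordant - n_discordant
    return (n_concordant + 0.5 * ties) / t

# Illustrative values only: 3 concordant pairs, 1 discordant pair, no ties.
print(c_statistic([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))  # 0.75
```

A c of 0.5 means the model ranks pairs no better than chance and 1.0 means perfect ranking, which is why values such as 0.660 and 0.741 are read as modest to reasonable fit.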
The analysis across treatments does not support H1 regardless of whether INT or INTCK is included in the regression models: neither variable is significant (p = 0.2077 and p = 0.2186, respectively), indicating no significant difference in the auditors’ isolation/projection decisions between intentional and unintentional sample misstatements.
The independent variable SYS, which operationalizes the manipulation of systematic misstatements, is highly significant, as indicated in Table 2. As a result, H2 is supported; that is, auditors have a greater propensity to project systematic sample misstatements to the underlying population being tested than non-systematic misstatements. The odds ratio presented in the table also supports this conclusion. Because logistic regression models the log of the odds,8 exponentiating a parameter estimate gives the odds ratio, i.e. the factor by which the odds of projecting rather than isolating the sample misstatement change between the two levels of the variable (Stokes et al., 1995). Across all treatment cases, the odds that participants projected a systematic misstatement were 3.28 times the odds that they projected a non-systematic misstatement.
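Note 8 states that an odds ratio is the exponentiated parameter estimate. The snippet below applies that rule to several estimates taken from Tables 2 and 4 and compares the result with the published odds ratios after rounding to two decimals; the dictionary layout and labels are only an illustrative choice.

```python
import math

# Parameter estimates copied from Tables 2 and 4, paired with their published odds ratios.
estimates = {
    "SYS, across cases (Table 2, Panel A)": (1.1864, 3.28),
    "SYSCK, across cases (Table 2, Panel B)": (1.9360, 6.93),
    "SYS, Case 1 sales (Table 4, Panel A)": (2.1975, 9.00),
    "INT, Case 1 sales (Table 4, Panel A)": (-1.3655, 0.26),
}

for label, (beta, published) in estimates.items():
    odds_ratio = math.exp(beta)  # change in the odds of projecting per one-unit change (e.g. 0 -> 1)
    print(f"{label}: exp({beta}) = {odds_ratio:.2f} (published {published:.2f})")
```

A negative estimate, such as INT in the sales case, produces an odds ratio below 1, meaning the odds of projection fall rather than rise.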
The findings discussed above for the analyses across treatments strongly support H2, suggesting that state auditors have a greater propensity to project systematic misstatements than non-systematic misstatements. However, the rate at which systematic misstatements are isolated appears to be symptomatic of a lack of understanding of sampling, including the potential impact of not projecting a misstatement that may recur in similar circumstances. As indicated in Panel A of Table 3, the participating auditors chose to isolate systematic misstatements 35% of the time. The isolation rate for systematic misstatements decreases somewhat when it is based on the subjects’ assessments of whether the misstatements were systematic: as shown in Panel B of Table 3, the participating auditors isolated misstatements they deemed to be systematic 28% of the time. Although this is lower than the isolation rate in Panel A, it still indicates that auditors frequently isolate sample misstatements even when they believe the misstatements are apt to recur.9 The auditors’ decision frequencies in Table 3, Panels A and B, also show that across the treatment cases, the systematic/intentional manipulations had the largest number of projection decisions, while the non-systematic/unintentional manipulation generally had the fewest.
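As a rough cross-check between Tables 2 and 3, the sketch below (a) computes the unadjusted odds ratio for SYS directly from the Panel A counts in Table 3 and (b) converts the Table 2, Panel A coefficients into predicted projection probabilities. Part (b) assumes INT and SYS are coded 0/1 (1 = intentional/systematic), which the paper does not state; the helper name `p_project` is hypothetical. The unadjusted odds ratio should be expected to differ somewhat from the model-based 3.28, which also accounts for INT.

```python
import math

# (a) Unadjusted odds ratio from Table 3, Panel A (SYS manipulation counts).
sys_projected, sys_isolated = 102, 54
non_projected, non_isolated = 58, 98
crude_or = (sys_projected / sys_isolated) / (non_projected / non_isolated)
print(f"Unadjusted odds ratio for SYS: {crude_or:.2f}")

# (b) Predicted probability of projecting from Table 2, Panel A coefficients.
# Assumption: INT and SYS are coded 0/1 (1 = intentional / systematic).
b0, b_int, b_sys = -0.6771, 0.2995, 1.1864

def p_project(intentional: int, systematic: int) -> float:
    """Inverse-logit of the linear predictor: P(Project | INT, SYS)."""
    logit = b0 + b_int * intentional + b_sys * systematic
    return 1.0 / (1.0 + math.exp(-logit))

for systematic in (0, 1):
    # Every subject saw each cell once, so average equally over INT = 0 and 1.
    avg = (p_project(0, systematic) + p_project(1, systematic)) / 2
    label = "systematic" if systematic else "non-systematic"
    print(f"Model-implied projection rate, {label}: {avg:.1%}")
```

Under the 0/1 coding assumption, the averaged predictions (about 37% and 66%) land close to the observed projection rates of 37% and 65% in Table 3, Panel A.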
In addition to testing for the effects of the independent variables in the logistic regressions across treatments, tests for interactions were performed in every model used in this study; overall, no significant interaction effects were found. Tests were also conducted to determine whether the presentation order of the case scenarios, the professional certification of the auditors, or task-related knowledge (measured by the frequency with which sampling is performed on engagements) affected the auditors’ projection decisions. No significant effects were found for any of these factors in any of the tests performed in this study.
Analyses of Individual Cases
Table 4 presents logistic regression results for each case treatment rather than for a single aggregate group across the treatment cases. Each case and each treatment was completely randomized in the experiment to minimize any potential order effects.
Table 3. Projection/Isolation Decision Frequencies.
                                 Projected    Isolated
Panel A: Using manipulated variables
INT manipulation
  Intentional                    85 (54%)     71 (46%)
  Unintentional                  75 (48%)     81 (52%)
  Total                          160 (51%)    152 (49%)
SYS manipulation
  Systematic                     102 (65%)    54 (35%)
  Non-systematic                 58 (37%)     98 (63%)
  Total                          160 (51%)    152 (49%)
INT and SYS manipulations
  Systematic
    Intentional                  51 (65%)     27 (35%)
  Non-systematic
    Intentional                  34 (44%)     44 (56%)
  Total                          160 (51%)    152 (49%)
Panel B: Using subjects’ assessments of manipulated variables
INTCK manipulation
  Intentional                    88 (55%)     73 (45%)
  Total                          156 (51%)    152 (49%)
SYSCK manipulation
  Systematic                     113 (72%)    44 (28%)
  Non-systematic                 39 (27%)     106 (73%)
  Total                          152 (50%)    150 (50%)
INTCK and SYSCK manipulations
  Systematic
    Intentional                  65 (77%)     19 (23%)
  Non-systematic
    Intentional                  20 (27%)     54 (73%)
  Total                          152 (50%)    150 (50%)
Number (%) of projection and isolation decisions across treatment cases.
Table 4. Logistic Regression Results of Isolation/Projection Decisions^a (by Individual Case).
Independent Variable   Parameter Estimate   Odds Ratio   Wald χ²   p-Value   Model χ²   Model p-Value   c
Panel A: Case 1 (sales)
Using manipulated variables
Intercept              −1.0627                           4.79      0.0287    20.72      0.0001          0.778
INT                    −1.3655              0.26         5.82      0.0158
SYS                    2.1975               9.00         13.41     0.0003
Using subjects’ assessments of manipulated variables
Intercept              −0.6399                           2.10      0.1475    9.58       0.0083          0.696
INTCK                  −0.7632              0.47         2.24      0.1345
SYSCK                  1.2085               3.35         5.63      0.0177
Panel B: Case 2 (inventory)
Using manipulated variables
Intercept              −2.6914                           11.95     0.0005    33.87      0.0001          0.827
INT                    2.5369               12.64        10.21     0.0014
SYS                    3.0749               21.65        14.81     0.0001
Using subjects’ assessments of manipulated variables
Intercept              −3.1318                           15.25     0.0001    40.42      0.0001          0.872
INTCK                  2.1819               8.86         8.41      0.0037
SYSCK                  3.2703               26.32        20.99     0.0001
Panel C: Case 3 (receivables)
Using manipulated variables
Intercept              −0.1947                           0.24      0.6207    3.01       0.2225          0.610
INT                    0.4356               1.55         0.78      0.3766
SYS                    0.7499               2.12         2.43      0.1164
Using subjects’ assessments of manipulated variables
Intercept              −0.6816                           2.65      0.1475    11.61      0.0030          0.700
INTCK                  0.1420               1.15         0.07      0.1345
SYSCK                  1.6502               5.21         9.75      0.0177
Panel D: Case 4 (unknown receipts)
Using manipulated variables
Intercept              −0.2045                           0.29      0.5913    2.33       0.3113          0.596
INT                    0.4865               1.63         1.09      0.2974
SYS                    0.4549               1.58         0.95      0.3291
Using subjects’ assessments of manipulated variables
Intercept              −1.0347                           5.25      0.0220    15.80      0.0004          0.747
INTCK                  0.5288               1.70         1.03      0.3107
SYSCK                  1.8864               6.60         13.13     0.0003
^a The dichotomous dependent variable is the auditors’ decision with regard to sample misstatement findings, coded 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic misstatement. INTCK and SYSCK refer to the participants’ assessments of intentional and systematic misstatements.
The logistic regression results for the sales scenario (Case 1) are located in Panel A of Table 4. The manipulation INT is significant (p < 0.05), but unlike in the other three treatment cases, the auditors were more inclined to isolate the intentional sales misstatement and to project the unintentional misstatement (odds ratio = 0.26). However, the subjects’ assessment of whether the misstatement was intentional, INTCK, is not significant. Both the systematic manipulation (SYS) and the subjects’ assessment of whether the misstatement was systematic (SYSCK) are significant in the Case 1 regressions (p < 0.01 and p < 0.05, respectively). The odds ratios indicate that the odds of projecting a systematic sales misstatement were 9.00 and 3.35 times those of a non-systematic misstatement when the variables SYS and SYSCK, respectively, are used.
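Each Wald χ² in Tables 2 and 4 tests a single coefficient and so has one degree of freedom. Its p-value is then the upper-tail probability of a 1-df χ² variable, which equals erfc(√(χ²/2)). The sketch below, using only Python's standard library and a hypothetical helper name, checks two published entries; small differences in the last digit are expected because the published χ² values are rounded.

```python
import math

def wald_p_value_1df(wald_chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A 1-df chi-square is the square of a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(wald_chi2 / 2.0))

# Wald chi-square values from the tables, paired with the published p-values.
checks = [("INT, Case 1 (Table 4)", 5.82, 0.0158),
          ("INT, across cases (Table 2)", 1.59, 0.2077)]
for label, chi2, published in checks:
    print(f"{label}: chi2 = {chi2} -> p = {wald_p_value_1df(chi2):.4f} (published {published})")
```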
For the inventory scenario (Case 2), both manipulated variables INT and SYS are significant (p < 0.01), as shown in Panel B of Table 4, as are the subjects’ assessments of intentional and systematic misstatements, INTCK and SYSCK. The second case fits the model most strongly for both the manipulated and the subject-assessed variables, as evidenced by high odds ratios for the intentional and systematic variables, the large model chi-square statistics, and the relatively high c statistics (above 0.825). Using the subjects’ assessments of the manipulated variables for the inventory scenario, the results indicate that the odds of projection are over 26 times higher for systematic than for non-systematic misstatements and nearly nine times higher for intentional than for unintentional misstatements.
In Case 3, the receivables scenario, neither INT nor INTCK is significant in its respective model (Panel C of Table 4). Of the two systematic misstatement variables, only SYSCK is significant, as evidenced by a p-value of 0.0117 and an odds ratio indicating that the odds of projecting systematic misstatements are over five times those of non-systematic misstatements.
Panel D of Table 4 shows the results for the unknown receipts scenario (Case 4). Regression results indicate that none of the factors is significant in the model containing the manipulated variables INT and SYS. This may be attributable to a weak fit of the model for the unknown receipts scenario (model χ² = 2.33, c = 0.596). When the subjects’ assessed variables (INTCK and SYSCK) are included in the model, the model fits the data much better (model χ² = 15.80, c = 0.749) and the results are more meaningful. In this model, only SYSCK is significant (p < 0.01). Its odds ratio indicates that the odds of projecting an unknown receipts misstatement that the auditor assessed as systematic are over six times those of one assessed as non-systematic.
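The model χ² statistics in Table 4 test both explanatory variables jointly, so each has two degrees of freedom. For 2 df the χ² upper tail has the closed form exp(−χ²/2), which makes the published model p-values easy to verify. The sketch below (helper name hypothetical) checks the two Case 4 models; agreement is to roughly three decimals because the χ² values are rounded.

```python
import math

def chi2_p_value_2df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom.

    A chi-square variable with 2 df is exponential with mean 2, so P(X > x) = exp(-x / 2).
    """
    return math.exp(-chi2 / 2.0)

# Case 4 model chi-square values from Table 4, Panel D, with the published p-values.
for label, chi2, published in [("INT/SYS model", 2.33, 0.3113),
                               ("INTCK/SYSCK model", 15.80, 0.0004)]:
    print(f"{label}: chi2 = {chi2} -> p = {chi2_p_value_2df(chi2):.4f} (published {published})")
```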
Overall, the analysis of the individual case scenarios indicates that auditors’ isolation/projection decisions are not significantly affected by whether sample misstatements are intentional; thus, the within-case analyses do not support H1. In addition, the results suggest that auditors are more likely to project systematic sample misstatements to the underlying population than non-systematic misstatements. While SYS is significant only in Cases 1 and 2, the results are stronger when the subjects’ assessments of systematic misstatements (SYSCK) are included in the regression models: SYSCK is significant (p < 0.05) in all four cases.
Discussion of Findings
Across case treatments, intentional misstatements were not projected by the auditors more frequently than they were isolated, failing to support H1. Tests performed using the within-case analyses also suggest that, overall, the auditors’ isolation/projection decisions are not influenced by whether or not the sample misstatements were intentional.
The finding that intentional misstatements are generally not isolated contradicts the representativeness heuristic (Kahneman & Tversky, 1972) and prior empirical evidence suggesting that the perceived uniqueness of misstatements is highly significant in auditors’ decisions to isolate or project misstatements (e.g. Burgstahler & Jiambalvo, 1986; Hermanson, 1997). In practice, intentional misstatements occur infrequently relative to unintentional errors. Application of the representativeness heuristic suggests that intentional misstatements should be isolated more often than they are projected because auditors, having seen few, if any, intentional misstatements, may not have a category of irregularity attributes in memory. Authoritative standards appear to counteract the heuristic; generally accepted governmental auditing standards direct the state auditors to “. . . design the audit to provide reasonable assurance of detecting material misstatements resulting from non-compliance with provisions of contracts or grant agreements that have a direct and material effect on the determination of financial statement amounts” (GAO, 1994, §4.13). In addition, government auditors apply the AICPA’s generally accepted auditing standards, including SAS No. 99, “Consideration of Fraud in a Financial Statement Audit.”
The analyses performed both across and within case treatments suggest that
auditors tend to project systematic misstatements more often than they isolate
them, providing support for H2. While this finding was highly significant for
the analyses using the systematic manipulation (SYS), the results were even
stronger when tests were run based on the subjects’ perceptions of systematic
misstatements (SYSCK). This is evidenced by the overall higher odds ratios and
Wald χ² values in Tables 2 and 4.
CONCLUDING COMMENTS
In this study, two factors posited to affect governmental auditors’ sample projection decisions were tested: whether sample misstatements are intentional and whether they are systematic. The study’s research design allowed these two independent variables to be tested both across and within case scenarios. Prior studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997) found evidence of factors affecting auditors’ decisions as to whether sample errors should be projected to the population from which they were drawn; however, those findings were aggregated across cases, and the impact of the factors was not examined within specific case scenarios (i.e. the studies did not manipulate variables within case scenarios).
In analyses performed both across and within case treatments, the results of the study indicate that the state auditors did not generally project intentional misstatements more frequently than unintentional misstatements. However, the results suggest that auditors’ isolation/projection decisions are significantly influenced by whether the sample misstatements are systematic; specifically, auditors tend to project systematic misstatements more often than they isolate them, providing support for H2.
This study also breaks new ground by bringing state auditors into the existing research on auditors’ projection decisions in the evaluation of sample findings. No other study investigating auditors’ evaluation of sample findings has used governmental auditors in its empirical tests. Prior research has focused exclusively on external auditors, whose pressures and incentives are perceived to differ from those of the state auditors who participated in this study. For example, external auditors face greater litigation risk than governmental auditors, while governmental auditors may have lower thresholds for audit risk and materiality because of various legal and regulatory reporting requirements.
One unexpected finding of the study is the frequency with which the misstatements in the experimental cases were isolated, especially considering that every manipulation in each of the four treatment cases, normatively, should have been projected. Across all four treatment cases, auditors isolated 49% of the sample misstatements. Even when auditors perceived the misstatements to be systematic, approximately one-quarter (28%) of the sample misstatements were not projected to the underlying populations from which they were drawn. This finding may indicate that state auditors do not fully understand audit sampling, including how identified misstatements should be projected to the population from which the sample was drawn.
The inappropriate isolation of sample misstatements may impair the effectiveness of state auditors, which in turn could adversely affect the allocation of scarce resources by various state agencies. If a state auditor fails to project sample misstatements, the estimate of the account balance being tested is more likely to be biased, which could result in an underestimation of the aggregate misstatement in the population. By improving understanding of the biases present in sampling environments, this study may contribute to better judgments in governmental auditors’ projection decisions and, perhaps, to more informed decisions by lawmakers based on higher quality financial information.
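The underestimation described above can be illustrated with ratio projection, one common way of projecting a sample misstatement to a population. The sketch assumes hypothetical figures (a $10,000 sample drawn from a $1,000,000 account, with a single $945.16 understatement echoing Case 1) and a hypothetical helper name; it is not the procedure the participating auditors were asked to apply. Isolating the misstatement amounts to counting only the amount actually found.

```python
def ratio_projection(sample_misstatement: float, sample_book_value: float,
                     population_book_value: float) -> float:
    """Project a sample misstatement to the population by the ratio method.

    The misstatement rate observed in the sample (misstatement / sample book
    value) is assumed to apply to the whole population's book value.
    """
    misstatement_rate = sample_misstatement / sample_book_value
    return misstatement_rate * population_book_value

# Hypothetical figures for illustration only (not taken from the study).
sample_misstatement = 945.16
sample_book_value = 10_000.00
population_book_value = 1_000_000.00

projected = ratio_projection(sample_misstatement, sample_book_value, population_book_value)
isolated = sample_misstatement  # isolating: only the misstatement actually found is counted

print(f"Projected population misstatement: ${projected:,.2f}")
print(f"Misstatement counted if isolated:  ${isolated:,.2f}")
print(f"Potential understatement of the estimate: ${projected - isolated:,.2f}")
```

Under these assumed figures, isolating the error leaves roughly $93,571 of likely misstatement out of the auditor's estimate.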
Limitations and Extensions
A limitation of this study is the manner in which the instruments were distributed
to the subjects. As noted previously, the instruments were sent to the audit
directors of the participating state audit departments, which could have prevented
a random distribution of the instrument to the state auditors in each location; that
is, the audit directors may have selected the most diligent auditors in the office
to complete the task rather than distributing the instrument randomly across all
auditors in the office, limiting the external validity of the study.
To reduce potential confounding factors, subjects were told that none of the mis-
statements presented in the case scenarios were material, and subjects were faced
with a dichotomous decision task – to project or isolate the sample misstatements.
The participants were not given the opportunity to contain the misstatements. The
containment process has been posited as an explanation for choosing isolation over
projection of misstatements (Dusenbury et al., 1994; Wheeler et al., 1997), and is
a common practice among external auditors (Elder & Allen, 1998). Although this
study was not designed to test containment effects, very few of the participants’
comments suggested their decisions were based on perceived containment, or lack
thereof. Of all the subjects’ comments, only a few expressed a desire for containment information. The lack of options available to the subjects may limit the generalizability of the study; it also leaves an avenue open for future research.
Although the systematic misstatements in all four cases are likely to recur because of some characteristic(s) associated with a transaction or class of transactions, they are not operationalized in the same manner. For example, in Case 2 (inventory), the systematic manipulation is operationalized as an inventory control issue, based on whether or not there were past inventory problems, whereas in Case 3 (receivables), the systematic manipulation is also a control issue, but it involves a computer system malfunction. The lack of uniformity in the operationalization of systematic misstatements is a weakness of the study. However, the results were analyzed using both the initial manipulations of systematic misstatements and the participating auditors’ assessments of whether the misstatements were systematic, and in both analyses, the systematic variables are significant in explaining the auditors’ isolation/projection decisions in most models. Future research could address how auditors recognize and interpret the systematic nature of misstatements and how that affects the auditors’ decision processes.
The case scenarios were set up in random order to minimize potential order effects. Once the order of the scenarios had been selected for each participant, a specific manipulation of the two independent variables was assigned to each of the four treatment cases in a manner that ensured every participant received each of the four combinations of the 2 × 2 design (as illustrated in Fig. 1). While randomizing the research instrument in this manner should have minimized any order effects, our ability to test for order effects was limited given that 28 different combinations of the research instruments were distributed. Tests comparing the results of the different instrument combinations did not indicate the presence of any order effects. In addition, ad hoc measures were developed that compared the decisions among the different instruments in multiple ways (e.g. comparing the results based on which case was presented first, without regard to the remaining order). These tests are admittedly imprecise; however, no effects resulting from the order of the case presentation were noted, and the randomization of both the cases and the treatment combinations should have minimized potential order effects. Nevertheless, the low power of the tests for order effects is a limitation of the study.
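Counting the possible instrument versions helps show why 28 distributed versions leave little power for order-effect tests. Assuming, as the text describes, that the four cases may appear in any order and that the four cells of the 2 × 2 design are assigned one per case, the sketch below enumerates 4! × 4! = 576 possible versions. The enumeration is illustrative only; the study's actual assignment scheme (Fig. 1) may have constrained these combinations further.

```python
from itertools import permutations

cases = ["Case 1 (sales)", "Case 2 (inventory)", "Case 3 (receivables)", "Case 4 (unknown receipts)"]
# The four cells of the 2 x 2 design as (intentional?, systematic?) pairs.
cells = [(intentional, systematic) for intentional in (False, True) for systematic in (False, True)]

instruments = []
for case_order in permutations(cases):         # presentation order of the four cases
    for assignment in permutations(cells):     # each case receives a different cell
        # Pair each case (in presentation order) with its assigned treatment cell.
        instruments.append(tuple(zip(case_order, assignment)))

print(f"Possible instrument versions: {len(instruments)}")  # 24 * 24 = 576
print(f"Share distributed in the study: {28 / len(instruments):.1%}")
```

Under this assumption, the 28 versions used cover under 5% of the possible orderings, so most specific sequences were never observed.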
Finally, the use of state auditors as the subject pool limits the comparability of
this study to others that used non-governmental auditors as subjects. While both
governmental and non-governmental auditors must decide whether to isolate or
project sample misstatements to the population being evaluated, the experimental
manipulations may have affected the state auditors’ isolation/projection decisions
differently than they would have affected non-governmental auditors. Future
research should investigate the differences in audit environments between governmental and non-governmental employers and the impact of those differences
on the actions of the auditors.
NOTES
1. Generally accepted governmental auditing standards (GAGAS) incorporate AICPA
standards relevant to financial statement audits unless the General Accounting Office (GAO)
excludes them by formal announcement (GAO, 1994, p. 32).
2. Auditing standards (AICPA, 2001, AU§350.11) divide the risk that a sample may be non-representative of the population into sampling risk and non-sampling risk. Sampling risk is the inherent risk of sampling that arises simply because less than the entire population is examined. Non-sampling risk comprises risks that are not due to the sample selected but instead are associated with evaluating the sample, such as an auditor’s failure to recognize exceptions in the sample selected and the auditor’s inappropriate or ineffective application of audit procedures.
3. Only one study conducted on auditors’ isolation/projection decisions (Wheeler et al.,
1997) used a complete design. They used a full 3 × 2 design to test the impact of containment
information on auditors’ sampling decisions and did not test either factor (intentional or
systematic misstatement) investigated in this study. In addition, Wheeler et al. used a single
case scenario in their study whereas we use four different case scenarios.
4. In this study, the delineation between systematic and non-systematic may be more
precisely described as “more systematic” and “less systematic,” because almost every
misstatement will have certain characteristics that could be construed as systematic.
5. Analyses of the data excluding the seven auditors who may potentially lack a
background in accounting indicate no significant differences from the reported results.
6. Because the manipulation of whether a misstatement is intentional is rather well-
defined, analyses were also conducted that excluded the participants that initially failed
the INT manipulation check. The results were substantially similar to those presented in
the paper.
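7. The c statistic is derived by comparing observed and predicted responses across pairs of observations in the data set. Only pairs with different observed responses are counted; a pair is concordant if the observation with a Project decision has the higher predicted probability of projection, discordant if it has the lower predicted probability, and tied otherwise. It is defined by the equation $c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}$, where $t$ is the total number of pairs with different responses, $n_c$ is the number of concordant response pairs, $n_d$ is the number of discordant response pairs, and $t - n_c - n_d$ is the number of ties between the response pairs.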
8. The odds ratio is calculated by exponentiating the parameter estimates (variable coefficients), i.e. raising e, the base of the natural logarithm, to the power of each estimate (Stokes et al., 1995). For example, if the parameter estimate is 1.2528, then the odds ratio is 3.50 ($e^{1.2528} = 3.500$).
9. The data were also analyzed using repeated measures analysis of variance (ANOVA)
by combining the auditors’ isolate/project decision with the comfort of their decision. The
resulting analysis yielded very similar results to the logistic regression presented.
ACKNOWLEDGMENTS
We appreciate the helpful comments of Richard Dusenbury, Randy Elder,
David Gilbertson, Julia Higgs, Bill Hopwood, Dennis O’Reilly, Steve Wheeler,
participants at the 1999 Southeast Regional AAA and 2000 Auditing Section
Midyear meetings, two anonymous reviewers, and the editor.
REFERENCES
Akresh, A., & Tatum, K. (1988). Audit sampling – dealing with the problems. Journal of Accountancy
(December), 58–64.
American Institute of Certified Public Accountants (AICPA) (2001). AICPA Professional Standards
as of June 30th, 2000 (Vol. 1). New York, NY: AICPA.
Anderson, B. H., & Maletta, M. (1994). Auditor attendance to negative and positive information: The effect on experience-related differences. Behavioral Research in Accounting (6), 1–20.
Ashton, A. H. (1991). Experience and error frequency knowledge as potential determinants of audit expertise. The Accounting Review (April), 218–239.
Ashton, A. H., & Ashton, R. H. (1988). Sequential belief revision in auditing. The Accounting Review
(October), 623–641.
Burgstahler, D., & Jiambalvo, J. (1986). Sample error characteristics and projection of error to audit
populations. The Accounting Review (April), 233–248.
Dusenbury, R., Reimers, J., & Wheeler, S. (1994). The effect of containment information and error
frequency on projection of sample errors to audit populations. The Accounting Review (January),
257–264.
Elder, R. S., & Allen, R. D. (1998). An empirical investigation of the auditor’s decision to project
errors. Auditing: A Journal of Practice and Theory (Fall), 71–87.
General Accounting Office (GAO) (1994). Government Auditing Standards: 1994 Revision.
Washington, DC: Comptroller General of the United States.
Green, S. L. (1992). Behavioral research in governmental and nonprofit accounting: An assessment of
the past and suggestions for the future. Research in Governmental and Non-profit Accounting
(7), 53–78.
Hermanson, H. M. (1997). The effects of audit structure and experience on auditors’ decisions to isolate
errors. Behavioral Research in Accounting, Suppl. (9), 76–93.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology (July), 430–454.
Kane, G. D., Richardson, F. M., & Graybill, P. (1996). Recession-induced stress and the prediction of
corporate failure. Contemporary Accounting Research, 13(2), 631–642.
Kida, T. (1984). The impact of hypothesis-testing strategies on auditors’ use of judgment data. Journal
of Accounting Research (Spring), 332–340.
Libby, R. (1985). Availability and the generation of hypotheses in analytical review. Journal of
Accounting Research (Autumn), 648–667.
Libby, R., & Frederick, D. M. (1990). Experience and the ability to explain audit findings. Journal of
Accounting Research (Autumn), 348–367.
MacDonald, E. (2000). ‘What’s Wevenue?’ Auditors Miss a Fraud and SEC tries to put them out of
business – scam at California Micro was well-hidden, says lawyer for Coopers duo – CFO’s
misleading resume. Wall Street Journal (January 6), A1.
Office of Management and Budget (OMB) (2001). A citizens’ guide to the federal budget, fiscal year
2002. Washington, DC: U.S. Government Printing Office.
Random House Webster’s College Dictionary (1991). New York, NY: McGraw-Hill.
Stokes, M. E., Davis, C. S., & Koch, G. G. (1995). Categorical data analysis using the SAS system.
Cary, NC: SAS Institute.
Trotman, K. T., & Sng, J. (1989). The effect of hypothesis framing, prior expectations and cue
diagnosticity on auditors’ information choice. Accounting, Organizations and Society, 14(5/6),
565–576.
Wheeler, S., Dusenbury, R., & Reimers, J. (1997). Projecting sample misstatement to audit populations: Theoretical, professional, and empirical considerations. Decision Sciences (Spring), 261–278.
APPENDIX
The treatment cases included in the experimental instrument are given below. Cases 1–4 represent the four combinations of the complete 2 × 2 design that tests two sample misstatement manipulations: intentional or unintentional misstatement and systematic or non-systematic misstatement. In each case, the wording for the unintentional and non-systematic misstatement manipulations is given first, followed in parentheses by the wording for the intentional and systematic manipulations (both were italicized in the original instrument).
Case 1 (Sales)
Sales Account No. 77491 was understated by $945.16. It was determined that
a temporary clerical employee, who worked during a two week period in April,
mistakenly (deliberately) misfooted sales invoices for the account. The client’s
controller indicated that this was the only temporary employee (one of 25
temporary employees) used to process sales transactions.
Case 2 (Inventory)
During a physical inventory observation, it was discovered that inventory item No.
245-0672 (cleaning chemicals) was understated by 23 items valued at $50 each.
Further investigation revealed that a warehouse employee temporarily placed
the items in the breakroom to restock the company’s supplies closet (temporarily
placed the items in the breakroom with the intent of taking them home for his personal use) (the breakroom is adjacent to the company’s supplies closet). A review
of the company’s internal audit workpapers for the last two years, which report
on periodic surprise inventory test counts, revealed no similar instances (revealed
several similar instances) in which inventory was improperly segregated by
warehouse workers.
Case 3 (Receivables)
Receivables Account No. 16788 was overstated by $59. The misstatement was
discovered when the auditor compared the price on the selected sales invoice to the
client’s approved master price list in effect at the date of the sale. An investigation
into the matter revealed that a salesperson overcharged the customer for the item
when she inadvertently read the price of the next item on the master price list (to
increase her sales commission). The client’s accounting system was temporarily
down when the item was ordered and the transaction had to be manually processed.
When the system is operating, it cannot process transactions (it allows overrides
of transactions) if the price of the item is not within the approved master price
range. It was estimated that the system was down 3–5% of the time during the year.
Case 4 (Unknown Receipts)
It is company policy to place unidentified receipts into a temporary account
“Unknown Receipts.” For example, if cash or a check is received for payment
without a remittance advice or other means of identifying the account holder, the
transaction is recorded as a debit to cash and a credit to “Unknown Receipts.”
When the payor is later identified, an entry is made to debit “Unknown Receipts”
and credit the proper account receivable. Transactions involving the “Unknown
Receipts” account are carefully reviewed (are not carefully reviewed). A sample
of transactions indicated that cash was understated by $155.04. An investigation
revealed that an employee unintentionally (intentionally) reversed the entry for
an unknown receipt of cash by crediting cash and debiting “Unknown Receipts.”
HOW DOES NEGATIVE SOURCE
CREDIBILITY AFFECT COMMERCIAL
LENDERS’ DECISIONS?
Philip R. Beaulieu and Andrew J. Rosman
ABSTRACT
Data were collected from loan officers using a computerized process-tracing
program to shed light on how source credibility affects their judgments. Loan
structure was no more restrictive in the negative character condition than in
the positive one, and it did not differ significantly between loan officers who
approved and those who denied the loan. Negative source credibility affected
decision process effort but did not produce the tradeoff between loan approval
and loan structure that is suggested in the literature. Although significantly
more (fewer) loans were denied when character information was negative
(positive), a majority of loan officers in the negative character condition
approved the loan. While most loan officers were aware of negative source
credibility, they did not react by denying loans or adjusting loan structure.
INTRODUCTION
While many agree that source credibility is important to lending decisions, how
negative source credibility affects lender decisions is less understood. Some
suggest that loan structure (i.e. collateral and covenants) can be used to
compensate for negative source credibility (e.g. Mather, 1999; Oldham, 1998),
while others maintain that loan officers should not trade off perceived
weaknesses in source credibility against tighter structure. In their view, the
risk of attempting to counterbalance flawed character with loan structure is
too great, and it is safer to avoid a business relationship than to trust the
applicant’s financial representations (e.g. Pace & Simonson, 1977).
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06004-6
Research on whether lenders compensate for perceived weakness in source
credibility by imposing tighter loan structure requires joint study of loan approval
and loan structure decisions. However, the literature on how loan officers react to
negative source credibility has focused on loan approval (e.g. Beaulieu, 1994) or
loan structure (Mather, 1999), but not both. Thus, the primary contribution of this
paper is to determine whether the tradeoff exists.
Source credibility was manipulated in the experiment to be either positive
(suggesting a credible source) or negative (suggesting a non-credible source).
Data were collected using a computerized process-tracing program, which
recorded decision effort, perceptions of the credibility of projected
accounting information, loan approval/denial, and loan structure. The results
indicate that loan officers deny loans to less credible clients rather than
tighten the conditions of the loan, and that loan structure does not differ
significantly between the positive and negative character conditions or
between loan officers who approve and those who deny the loan.
LITERATURE REVIEW AND HYPOTHESIS DEVELOPMENT
Defining Source Credibility
In capital markets, source credibility refers to whether managers who direct the
preparation of financial statements inspire belief in the statements. Source
credibility is particularly important in today’s environment, as a number of
prominent companies, and several of their CEOs and CFOs, were accused of
falsifying documents and manipulating accounting information to hide poor
financial results.
Source credibility is distinct from credibility conferred by attestation services
offered by external auditors. While both forms of credibility are important,
source credibility, which has received relatively little attention in the finance and
accounting literature, is the focus of this paper. In a post-Enron world, new
research will likely address interactions between source credibility and attestation
services.
Source credibility is important whenever resource providers lack complete
information and must rely on others to provide fair and accurate information. Source
credibility is particularly important in commercial lending, which lacks the analyst
following, observable firm valuation in share prices, and visible disciplinary
managerial labor markets affecting large publicly traded firms. In equity markets,
for example, there is a public record of the accuracy of management forecasts
that can be used to evaluate source credibility (Hirst et al., 1999). In commercial
lending, on the other hand, loan officers lack access to the prior accuracy of
borrowers’ financial projections (since the media do not report their projections).
They must observe the behavior of borrowers, such as their willingness to disclose
information, in order to draw conclusions about source credibility.
Lenders refer to source credibility as the character of borrowers. Character
is usually defined as a borrower’s determination to repay debt, but it also connotes
honesty and integrity, which are to be considered when evaluating all statements
(financial or otherwise) made by borrowers. Research in many different contexts
has established that source credibility affects information use. It has been shown
to affect job choice intentions (Coleman & Irving, 1997), purchase intentions
(Gotlieb & Sarel, 1991; Grewal et al., 1994) and commercial loan officers’ loan
decisions (Beaulieu, 1994; Mather, 1999). In all of these contexts, people reduce
the influence of information from non-credible sources on their judgments and
action choices, a process called “discounting” (Beach et al., 1978; Kelley, 1972).
Most research in source credibility has defined and manipulated it as the prior
accuracy of sources of information. An example is Hirst et al. (1999), who studied
the effects of source credibility and the form of management earnings forecasts
on investors’ judgments and confidence. Participants in an experiment were asked
to assume the role of investors evaluating the common stock of a manufacturing
firm. Source credibility was manipulated as high (low) forecast accuracy, and
subjects were told that “any differences between (the company’s) prior forecasts and
the actual realizations were generally very small (very large).”1 Another
example of this research is Maines (1990), who examined whether the prior accuracy
of individual earnings forecasts affected judgments of the expected accuracy of
consensus forecasts that were based on the individual predictions.
Research conducted in this manner does not allow participants to make their
own judgments of source credibility; it simplifies the issue by defining it as
prior accuracy and specifying what that is. In contrast, Beaulieu (1994, 1996)
defined source credibility as whether sources of information inspire belief in their
representations. This more general definition encompasses prior accuracy, but also
includes many other indications of source credibility, which may be referred to
collectively as prior behaviors of the source. Relevant prior behaviors depend on
the context of information usage; an example in commercial lending is the ability
or failure of a borrower to provide documentation when promised (Beaulieu,
1994). By adopting Beaulieu’s definition and manipulation, we treat source
credibility as a more subtle, complex, and practical issue than most prior
research does, and we give participants more freedom to judge source credibility.
Prior research in lending has found that source credibility affects loan approval
(Beaulieu, 1994) and loan structure (Mather, 1999) judgments. However, loan
approval and loan structure are not separate, independent judgments even though
they have been examined separately. To understand the effect of source
credibility on lending decisions more completely, both judgments need to be
examined simultaneously. Doing so provides a more comprehensive understanding of how
loan officers react to negative source credibility, and in particular, whether
they compensate for negative source credibility by restrictive loan structure or
whether they simply deny the loan request at the outset. This is our basic research
question.
This research question is important because it focuses on shortcomings in the
current literature and helps resolve the debate on how loan officers react to negative
source credibility. Framing the research question in terms of a tradeoff between
loan approval and structure allows us to investigate whether Mather’s (1999)
finding that source credibility affects loan structure would hold if loan officers
were permitted to deny loans. Similarly, while Beaulieu (1994) documented that
more loan officers denied loan applicants with negative source credibility than
those with positive source credibility, there is no evidence whether loan candidates
with negative source credibility who were approved received more restrictive loan
structures than those who were denied or those who had positive source credibility.
If lenders do not structure approved loan candidates with negative source credibility
more restrictively, then such candidates face no consequence, such as tighter terms,
that would protect lenders.
Loan Approval/Denial and Loan Structure
Commercial lending experts recommend that loan officers evaluate source credi-
bility, in the form of a character judgment, as soon as contact with a prospective
borrower has been made. If character is not of sufficient quality, then analyzing
credit further or considering alternative loan structures may not be worthwhile.
This preliminary character judgment is the first hurdle of lending (McDonald &
McKinley, 1981; Pace & Simonson, 1977). Stephens (1980) confirmed that loan
officers want information about the applicant before examining the details of the
loan. This position can also be inferred from Eisenreich (1981, p. 9):
Since the majority of information will come from the borrower . . . the lender must have confi-
dence in the raw material of the judgment. If not or if critical facts cannot be verified, the lender
cannot make the decision. It would be a gamble rather than a calculated risk.
Loan officers may be tempted to work with potential borrowers, as is suggested
by positive accounting literature (Watts & Zimmerman, 1986), even when they
observe negative character. If the apparent risk of financial loss to the bank resulting
from the actions of borrowers with negative character could be managed through
loan structure, loan officers could acquire profitable clients. Suppose, for example,
that a borrower fails to disclose information about a relatively small liability. As
the following scenario suggests, the loan officer may still attempt to work with the
borrower. The italicized sentence becomes a rationalization for making the loan.
Use of the word “undisclosed” is usually just another way of saying that you have been lied
to, absent a brilliant excuse for amnesia. At this level, a withhold is usually the same as a lie
and is a serious character flaw. Regardless, if the nature and circumstance of an undisclosed
issue can be overcome, then the daunting task of managing any remaining financial risk is still
left to deal with. If what remains is a quantifiable financial issue, then this may be manageable
through the loan structure. Otherwise, walk away (Oldham, 1998, p. 64, emphasis added).
This quotation conflicts with the advice offered by the other commercial
lenders cited earlier (Eisenreich, 1981; McDonald & McKinley, 1981; Pace &
Simonson, 1977). It seems to advocate both screening borrowers of questionable
credibility and using loan structure to work with them. Although prudence suggests
that working with such borrowers should be the exception rather than the rule, loan
officers may use the exception to rationalize loan approvals.
Which reaction is more likely to occur is an open issue. Beaulieu (1994)
found that character had a significant main effect on loan officers’ loan decisions
(approval or denial) and that it interacted with accounting information to affect
both decisions and estimates of risk of nonpayment. Specifically, loan decisions
and risk estimates responded significantly to a change in the strength of
accounting information when character was positive, but not when it was negative.
Participants in Beaulieu’s study were told to assume, in a loan application case,
that structure of the proposed loan would be determined by the bank’s policy
at competitive terms and that collateral would be available to meet the bank’s
guidelines for that type of loan. They had no opportunity to adjust loan covenants
or collateral. In contrast, Mather (1999) instructed his subjects that loans had
already been approved, so that only the loan structure task was required. Under
these conditions, Mather found that loan officers set more restrictive loan structure
when credibility was unknown than when it was positive.
An objective of the current study is to help to resolve the debate by providing
evidence as to whether lenders simply deny a loan (H1) consistent with Beaulieu
(1994), or select collateral and covenant levels to compensate for weaknesses in
source credibility (H2) consistent with Mather (1999). Essentially, H1 and H2
are competing hypotheses. Because the guidance in the literature is at odds, the
hypotheses are stated in the null form.
H1. There will be no difference in the proportion of loan officers who approve
loans between the condition in which the borrower’s character is positive and
the condition in which it is negative.
H2. There will be no difference in proposed loan structure between loan officers
receiving negative and positive character information.
Process Effort
Loan officers make a critical decision regarding how much effort to expend
when they evaluate a loan candidate. Rosman and Bedard (1999) find evidence
that lenders will structure loans more restrictively when they expend less effort.
However, Rosman and Bedard do not consider the relationship between effort
and loan structure restrictiveness in light of weaknesses in a potential borrower’s
character.
When character is perceived to be weak but not entirely non-credible, the lender
may pour more effort into the file to check on the initial negative impression
of character and to relate character judgments to other information provided,
especially accounting information. This possibility is motivated by the fact that
initial impressions of character and personality can be incorrect (Korem, 1997).
That is, loan officers may consider approving a loan if no aspect of presentation
in the financial statements encourages caution, even though assessments of
management’s credibility raise doubts about their character.2 In such situations,
increased decision effort may alleviate concerns raised by initial negative character
judgments that are not severe enough to push loan officers past the threshold at
which they feel they must deny loans. Increased processing effort, as a response to negative (but not
extremely so) character information, is consistent with Shaub’s (1996) finding
that auditors lacking trust in a client will recommend more work in their audit
plans. It is also consistent with Beaulieu (2001), in which recommended evidence
collection was negatively related to a CFO’s integrity.
The other option available to loan officers when character judgments are
sufficiently negative is to deny loans because such credits do not clear the
“first hurdle” of commercial lending (Pace & Simonson, 1977). This implies
that information processing will be terminated quickly when the character of
borrowers is so negative that they are considered non-credible.
Options one and two (checking initial impressions of character and relating
it to accounting information, and denying loans without checking) require more
and less processing effort, respectively, than an average or baseline credit with
positive character information. It may not be obvious to loan officers whether the
loan applications of borrowers, perceived at first to have weak character, deserve
more or less analysis. Thus, we expect the following:
H3. Variance of effort should be greater when loan officers receive negative
character information than when they receive positive information.
METHOD
Procedure
Decision process and outcome data were collected using Search Monitor, which is a
computerized process-tracing program (Biggs et al., 1993; Brucks, 1988; Rosman
& Bedard, 1999). Search Monitor is interactive, menu-driven software that presents
case materials to participants and captures a complete trace of selected processes
including cue acquisition, acquisition order, and time to examine cues.
Subjects were advised at the beginning of the Search Monitor task that a
commercial loan applicant was seeking a loan package that included short- and
medium-term financing. The case used in this study integrates the case materials
used by Beaulieu (1994), which validated the source credibility measures, and
Rosman and Bedard (1999), which validated the realism of the lending task and
related measures.
The loan applicant, a manufacturer of chemical products, was briefly described,
including the contact person with the firm, its CFO. Further information about the
firm was accessed via a menu having six categories of financial and qualitative
data: profitability, inventory turnover, liquidity, and financial leverage & capital
structure (financial); and management and industry & product (qualitative). Each
of the four categories of financial data consisted of three ratios (and the dollar
values of numerators and denominators), divided into historical (years −2, −1
and 0) and projected (years +1 and +2) information. Case information indicated
that the historical information was given a clean audit opinion, while no opinion
had been expressed regarding the projected figures.
For example, the following menu was presented to participants who selected
profitability information.
(1) Historical net income/average equity
(2) Projected net income/average equity
(3) Historical net income/average total assets
(4) Projected net income/average total assets
(5) Historical net income/net sales
(6) Projected net income/net sales
The order of the six cues was randomized differently each time a participant
returned to the menu. Participants could move both within each of the six
categories of information and between categories as they wished. When they
indicated that they had finished selecting and viewing information, they were
given a series of screens to register their recommendations about the loan.
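The menu mechanics described above can be illustrated with a small Python sketch of a process-trace log. It is hypothetical and is not Search Monitor’s actual implementation or data format: the names `TraceLog`, `Acquisition`, and `PROFITABILITY_CUES`, and the simulated timestamps, are assumptions made for illustration. The sketch reshuffles cue order on every menu presentation, records each cue acquisition in order with its start and end times, and derives two summaries of the kind reported in Table 4 (total minutes on task and total visits to information screens).

```python
import random
from dataclasses import dataclass, field

# Cue labels from the profitability menu described in the text, used here as
# hypothetical identifiers (the real program's internal labels are unknown).
PROFITABILITY_CUES = [
    "Historical net income/average equity",
    "Projected net income/average equity",
    "Historical net income/average total assets",
    "Projected net income/average total assets",
    "Historical net income/net sales",
    "Projected net income/net sales",
]


@dataclass
class Acquisition:
    cue: str       # information screen that was opened
    start: float   # seconds from task start when the screen was opened
    end: float     # seconds from task start when the screen was closed


@dataclass
class TraceLog:
    """Hypothetical process-trace log (not Search Monitor's real format)."""
    rng: random.Random
    acquisitions: list = field(default_factory=list)

    def present_menu(self, cues):
        # Return a new random ordering each time the menu is shown, so that
        # a cue's screen position cannot systematically drive acquisition order.
        order = list(cues)
        self.rng.shuffle(order)
        return order

    def record(self, cue, start, end):
        # Appending preserves acquisition order.
        self.acquisitions.append(Acquisition(cue, start, end))

    def total_visits(self):
        # Number of information screens opened (revisits count again).
        return len(self.acquisitions)

    def total_minutes(self, task_end):
        # Time on task, from task start (0 s) to task_end seconds.
        return task_end / 60.0

    def minutes_on_cues(self):
        # Time spent with information screens open.
        return sum(a.end - a.start for a in self.acquisitions) / 60.0


# Usage with a fixed seed and made-up timestamps.
log = TraceLog(rng=random.Random(1))
menu = log.present_menu(PROFITABILITY_CUES)
log.record(menu[0], 10.0, 70.0)
log.record(menu[1], 75.0, 105.0)
print(log.total_visits(), log.total_minutes(task_end=300.0), log.minutes_on_cues())
# prints: 2 5.0 1.5
```

Reshuffling on every presentation is one straightforward way to implement the randomization the text describes; the text does not say exactly how the original program generated its orderings.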
Approval or denial of the loan was requested, assuming an interest rate set at one
percentage point above prime, followed by loan structure recommendations.3
Participants who recommended denial were told that although they did not
recommend approval, they had been asked to provide input on how to structure the
loan in the event that the loan committee recommended approval. This step was
necessary so that H2 could be examined. That is, even if a loan did not pass the initial
character judgment hurdle (see Pace & Simonson, 1977, discussed earlier), this
step ensures a test of the tradeoff between structure and character that is suggested
by some of the literature, including positive accounting theory. Combined, H1 and
H2 provide a stronger test of the two competing points of view that have been
expressed in the literature.
Four loan structure recommendations were requested (see below). Each had twelve
response options corresponding to ranges of percentages that varied by item.4,5
(1) Percentage of loan principal for which an equivalent amount of assets will be
collateralized.
(2) Level of profitability (ratio of net income to average equity) to be maintained.
(3) Level of liquidity (ratio of cash flows to fixed cash commitments) to be maintained.
(4) Level of leverage (ratio of total liabilities to equity) to be maintained.
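The four structure items are combined into the totals reported in Tables 2 and 3. A minimal Python sketch of that arithmetic, assuming each response is an integer from 1 (0%) to 12 (the maximum percentage for that item) and that the leverage response has already been converted as described in Note 4, which is not reproduced here; the function name `structure_scores` is hypothetical.

```python
def structure_scores(collateral, profitability, liquidity, leverage):
    """Return (covenant total, collateral-plus-covenant total) for one participant.

    Each argument is a response on a twelve-point scale (1 = 0%, 12 = maximum
    percentage for that item). The leverage response is assumed to be already
    converted as described in the paper's Note 4.
    """
    responses = (collateral, profitability, liquidity, leverage)
    for r in responses:
        if not (isinstance(r, int) and 1 <= r <= 12):
            raise ValueError(f"response {r!r} is outside the 1-12 scale")
    covenants = profitability + liquidity + leverage
    return covenants, collateral + covenants


# Four scales with twelve responses each give the maximum total of 48.
MAX_TOTAL = 4 * 12

print(structure_scores(12, 12, 12, 12))  # prints: (36, 48)
```

Because a mean of sums equals the sum of means, condition-level totals follow directly from the item means; for example, 3.6 + 7.5 + 5.0 = 16.1 for the covenant total in the negative condition of Table 2.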
The loan structure recommendations were followed by a question asking
participants to indicate confidence in their structure judgments on a nine-point
scale. Finally, two questions asked participants to rate the credibility of historical
financial information and management’s financial projections, also on nine-point
scales.
The character information used in this study was adapted from Beaulieu
(1994), which contains a complete description of the development and valida-
tion process. As shown in Table 1, character was manipulated between-subjects in
two places in the Search Monitor program. First, either positive or negative
character information regarding the CFO was provided in an introductory screen and was
seen by all participants in either condition of the experiment. Second, participants
could select more information about the CFO via the management information
menu. Those selecting the additional information received either a positive or
negative description, depending on the condition to which they had been assigned.
Table 1. Character Conditions and Locations in Search Monitor.a

Location of information: Introductory screen, viewed by all participants
  Positive character: When you visited the business, the CFO had available all
    the documentation that he had promised to provide. Among the items you
    examined during your initial evaluation are the following. The loan
    application stated that the firm had not been a defendant in legal actions
    in the last three years. A background check confirmed this. At your meeting,
    you said that a decision on the loan would be made within two weeks. The CFO
    accepted this time frame, and did not urge you to reach a decision earlier.
  Negative character: When you visited the business, the CFO did not have
    available all the documentation that he had promised to provide. However, the
    following information did become available to you during your initial
    evaluation. The loan application stated that the firm had not been a
    defendant in legal actions in the last three years. A background check
    showed that a former senior officer of the firm has filed a wrongful
    dismissal suit. The suit has recently been settled out of court. At your
    meeting, you said that a decision on the loan would be made within two weeks.
    The CFO accepted this time frame and did not urge you to reach a decision
    earlier.

Location of information: Management menu, CFO, viewed only if selected
  Positive character: The CFO’s work history is provided, then the following: At
    your first meeting the CFO answered your questions patiently, and volunteered
    additional information. He is an active member of several local community
    service organizations.
  Negative character: The CFO’s work history is provided, then the following:
    During your meeting with him, Mr. Butler ignored your suggestions for
    improving his firm’s operations and said that he did not need business
    advice. He has changed the firm’s public accountant twice in the last five
    years. Disagreements with the former public accountants were reported.

a The sentences in italics were rated as neutral, not providing information about character, in Beaulieu
(1994). They were not written in italics in Search Monitor.
Participants
Twenty-five bankers representing 11 banks in New England participated in the
study. There were no statistically significant differences between the 14 bankers
in the positive source credibility condition and the 11 bankers in the negative
condition on the following dimensions: years in banking, education level, and
loan size experience. On average, the 25 participants had 17.8 years of banking
experience (range of 8–26 years). All but three of the bankers had a college
education. The bankers had experience with loans that ranged from $1,000,000
to $50,000,000 (mean = $8,730,000), but the most typical loans encountered by
the participants during their normal course of business ranged from $90,000 to
$1,000,000 (mean = $363,000).
RESULTS
Manipulation Check
The potential for source credibility to impact the perception of the credibility of
accounting projections is important because projected accounting information is a
standard component of loan applications (Danos et al., 1989), and is not audited.
This type of credibility judgment differs from the credibility judgments made in
equity markets, because the latter are objective assessments of the
accuracy of management forecasts (e.g. Hirst et al., 1999). In contrast, source
credibility in the lending context is a subjective consideration of the prior behavior
of management, made because there is no objective public record of
management forecast accuracy. The credibility of projected unaudited information is a
judgment that precedes loan approval and loan structure and is used to assess the
success of the manipulation.
Participants rated the credibility of projected accounting information on a
nine-point scale (1 = low, 9 = high). Mean ratings were higher in the positive
condition than in the negative condition (5.43 vs. 4.18, t = 1.63, p = 0.06,
one-tailed). Credibility of the
historical, audited financial information was also judged on the same nine-point
scale. The mean ratings were 6.27 in the negative character condition and 7.14
in the positive condition (t = −1.13, p = 0.27).6 Therefore, any effects of
the manipulation of information about the CFO’s character on loan decisions,
structure recommendations, and processing effort result from changes in the
credibility of projected, rather than historical, accounting information.
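The manipulation-check p-values can be related to their t statistics through the Student t distribution. The Python sketch below assumes a pooled-variance two-sample t test with 14 + 11 − 2 = 23 degrees of freedom (the test variant is not stated here) and integrates the t density numerically with Simpson’s rule so that only the standard library is needed; the helper names `t_pdf`, `simpson`, and `t_upper_tail` are illustrative.

```python
import math


def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b]; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_upper_tail(t, df):
    # P(T > |t|) = 0.5 - integral of the density from 0 to |t| (by symmetry).
    return 0.5 - simpson(lambda x: t_pdf(x, df), 0.0, abs(t))


df = 14 + 11 - 2                             # assumed pooled-variance df
p_projected = t_upper_tail(1.63, df)         # one-tailed: projected information
p_historical = 2 * t_upper_tail(-1.13, df)   # two-tailed: historical information
print(round(p_projected, 3), round(p_historical, 3))
```

Under these assumptions the two values round to the reported 0.06 (one-tailed) and 0.27 (two-tailed), which also shows that the two reported p-values are on different bases.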
Hypothesis Testing
H1 investigated whether loan officers would simply deny loans if they became
sufficiently concerned about character and source credibility. All loan officers
given the positive character information about the CFO approved the loan (100%
of 14), as did 8 of the 11 given the negative version (73%). The χ² statistic is 4.34
(p = 0.037). Thus, the null hypothesis is rejected. H2 investigated whether loan
officers would adjust loan structure to compensate for negative source credibility.
Table 2 reports the four mean loan structure recommendations (collateral and
Table 2. Mean Structure Recommendations.a

                                        Negative    Positive
                                        Character   Character   t-Statistic (p)
Percentage of principal collateralized  10.5        10.9        0.60 (0.56)
Covenants
  Profitability to be maintained        3.6         3.1         0.82 (0.42)
  Liquidity to be maintained            7.5         8.7         1.08 (0.29)
  Leverage to be maintainedb            5.0         6.1         0.84 (0.41)
  Total of covenant recommendations     16.1        17.9        0.94 (0.36)

a As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants)
consisted of twelve responses. Each response represented a range of percentages, for example 10–20%
of assets collateralized, but the ranges of percentages differed among the four scales. For all four scales,
response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of
collateral and covenants possible is 48 (4 × 12).
b These scores have been converted as described in Note 4.
three covenants) individually and in total. While structure requirements appear
greater in the negative character condition in one case (profitability) and greater
in the positive condition in three cases (collateral, liquidity and leverage), none
of the differences are statistically significant, whether individual or aggregated.
Thus, the null form of H2 cannot be rejected. The results regarding H1 and H2
support the contention of those in practice who recommend that loan officers deny
loans when character is suspect, and imply that doing so is preferable to handling
negative character through loan structure.
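The χ² statistic reported for H1 can be recomputed from the 2 × 2 table of approvals and denials (positive condition: 14 approved, 0 denied; negative condition: 8 approved, 3 denied). The Python sketch below computes the Pearson statistic without a continuity correction, which reproduces the reported 4.34, and obtains the 1-degree-of-freedom p-value from the identity P(χ²₁ > x) = erfc(√(x/2)); the function name `pearson_chi2_2x2` is illustrative. The expected counts under independence are 12.32 and 1.68 in the positive condition and 9.68 and 1.32 in the negative condition.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are conditions and columns are outcomes. Returns (chi2, p, expected),
    where p uses the chi-square distribution with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    # Expected count for each cell = row total * column total / grand total.
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # With 1 df, chi2 is a squared standard normal: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, expected


# Rows: positive character (14 approved, 0 denied), negative (8 approved, 3 denied).
chi2, p, expected = pearson_chi2_2x2(14, 0, 8, 3)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # prints: chi2 = 4.34, p = 0.037
```

Applying Yates’ continuity correction would give a smaller statistic, so the reported value corresponds to the uncorrected form.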
Table 3 presents two sets of additional analyses of loan structure judgments:
(1) loan officers within the negative condition who did not deny the loan are com-
pared to those who did deny the loan; and (2) loan officers who did not deny the
loan are compared across conditions. None of the differences in means is
statistically significant for either set of analyses. Thus, deniers and approvers did
not differ significantly in how restrictively they structured loans, and neither did
approvers in the positive and negative conditions.
H3 investigated whether the uncertainty about the usefulness of decision-making
effort would cause loan officers in the negative character condition to exhibit greater
variance in effort than those in the positive condition. Table 4 reports the results for
two aggregate measures of processing effort: total time spent on the task (Panel A)
and total number of visits to information screens (Panel B). In Panel A, variance
of time spent on the task is much greater in the negative character condition; the
standard deviation is almost double that in the positive condition (14.3 vs. 7.5).
The F test is significant (F = 3.64, p = 0.032). In Panel B, variance of total visits
is similar in the negative and positive conditions, the standard deviations being 8.4
and 10.5, respectively (F = 1.55, p = 0.490). Thus, the variance of effort choices
was similar with respect to the quantity of information examined, but not with
respect to time spent examining it.

Table 3. Additional Analyses: Mean Structure Recommendations for those Who did not Deny the Loan.a

                                        Within Negative Condition             Within Did-Not-Deny Decision
                                        Did Not Deny  Deny     t-Statistic    Negative    Positive    t-Statistic
                                        (n = 8)       (n = 3)  (p)            Character   Character   (p)
                                                                              (n = 8)     (n = 14)
Percentage of principal collateralized  10.3          11.0     −0.66 (0.53)   10.3        10.9        −0.66 (0.53)
Covenants
  Profitability to be maintained        3.5           3.7      −0.19 (0.85)   3.5         3.1         0.64 (0.53)
  Liquidity to be maintained            8.1           6.0      1.21 (0.26)    8.1         8.7         −0.53 (0.60)
  Leverage to be maintainedb            4.7           5.7      0.40 (0.70)    4.7         6.1         0.96 (0.35)
  Total of covenant recommendations     16.4          15.3     0.35 (0.73)    16.4        17.9        −0.72 (0.48)
Total of collateral and covenants       26.6          26.3     0.09 (0.93)    26.6        28.8        −0.85 (0.40)

a As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants) consisted of twelve responses. Each response
represented a range of percentages, for example 10–20% of assets collateralized, but the ranges of percentages differed among the four scales. For all
four scales, response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of collateral and covenants possible
is 48 (4 × 12).
b These scores have been converted as described in Note 4.

Table 4. Measures of Processing Effort.

                                   Negative Character   Positive Character
Panel A: Total minutes on task
  Mean                             18.3                 21.3
  Standard deviation               14.3                 7.5
  Range                            6.3–52.6             10.7–35.2
  H0: variances are equal
    F                              3.64
    Prob > F                       0.032
Panel B: Total visits to information screens
  Mean                             20.1                 21.2
  Standard deviation               8.4                  10.5
  Range                            7–33                 9–47
  H0: variances are equal
    F                              1.55
    Prob > F                       0.490
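The variance comparisons in Table 4 can be reproduced approximately from the published standard deviations. The Python sketch below assumes the test is the two-sided “folded” F test (larger sample variance in the numerator, upper-tail area doubled), with n = 11 in the negative condition and n = 14 in the positive condition as reported under Participants. The helper names are illustrative, and the F-distribution tail is obtained by numerical integration rather than from a statistics library.

```python
import math


def f_pdf(x, d1, d2):
    # Density of the F distribution with (d1, d2) degrees of freedom, x >= 0.
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c) * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)


def simpson(f, a, b, n=4000):
    # Composite Simpson's rule on [a, b]; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def folded_f_test(sd1, n1, sd2, n2):
    """Two-sided test of equal variances using the folded F statistic.

    The larger sample variance goes in the numerator, and the p-value doubles
    the upper-tail area of F(df_num, df_den). Integrating the density from 0
    requires df_num >= 2, which holds for the samples used here.
    """
    (s_big, n_big), (s_small, n_small) = sorted([(sd1, n1), (sd2, n2)], reverse=True)
    f = (s_big / s_small) ** 2
    d1, d2 = n_big - 1, n_small - 1
    upper = 1.0 - simpson(lambda x: f_pdf(x, d1, d2), 0.0, f)
    return f, min(1.0, 2 * upper)


# Panel A (minutes): negative SD 14.3 (n = 11), positive SD 7.5 (n = 14).
f_a, p_a = folded_f_test(14.3, 11, 7.5, 14)
# Panel B (visits): negative SD 8.4 (n = 11), positive SD 10.5 (n = 14).
f_b, p_b = folded_f_test(8.4, 11, 10.5, 14)
print(f"Panel A: F = {f_a:.2f}, p = {p_a:.3f}")
print(f"Panel B: F = {f_b:.2f}, p = {p_b:.3f}")
```

With the rounded standard deviations in Table 4, the sketch gives F values of about 3.64 and 1.56 and two-sided p-values close to the reported 0.032 and 0.490. The one-sided upper tail for Panel A is roughly 0.016, about half the reported value, which is consistent with a two-sided test. The small gap from the reported F = 1.55 in Panel B likely reflects rounding of the published standard deviations.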
DISCUSSION AND CONCLUSIONS

In an experiment in which loan officers used a process-tracing program called
Search Monitor to evaluate a commercial loan application, two results were found
that help researchers and practitioners more fully understand the role of source
credibility in affecting loan officers’ decision behavior. First, loan officers dealing
with negative-character borrowers were less likely to approve loans than those
in the positive character condition (H1); and second, they did not compensate
for negative source credibility by structuring loans more restrictively (H2). These
results suggest that loan officers tend to deny loans rather than compensate for
negative character in loan structure. They did so even though the manipulation
check and results for variance of processing time in the negative condition (H3)
showed that they were aware of and sensitive to character issues.
However, most loan officers in the negative condition (8 out of 11)
did not deny the loan. How then did those who did not deny the loan in the negative
condition react to negative source credibility? Combining the results for H1 and
H2 with the additional analyses leads to the following conclusion: a minority of
loan officers react to negative source credibility, and they do so by denying loans,
while the majority do not react in terms of the final decisions to approve a loan
or to structure it restrictively. In short, proportionally few loan officers reacted to
negative source credibility, but when they did, they denied loans rather than accepting
the loan and addressing their concerns through loan structure.
In hindsight, these results mirror the reaction of the stock analyst community
to Enron. Those analysts who doubted Enron a year before its bankruptcy were
few and far between, but they did so by using their assessment of source credibility
as the lens through which to analyze the numbers. Enron’s management
was notorious for dealing arrogantly with analysts and for being unable to produce
financial information. This created an environment of distrust in which questionable
patterns of transactions could be pieced together. The advice of one
analyst who sold Enron stock short was simple: “Test what a company says; don’t
take it at face value.” In other words, it is necessary to assess the credibility of the
source of the information in order to be able to understand the information itself
(Bailey, 2001, p. F1).
As is true of all experimental research, the ability to generalize these results both
to other tasks and to other financial statement users (in commercial lending and
elsewhere) is limited. In particular, although the indicators of character used in
this experiment have been validated in other research (Beaulieu, 1994, 1996),
subtle changes in the apparent financial strength of firms, task or context may
encourage financial statement users to select other signals of source credibility.
Other sources of credibility, especially external audits, may become relatively
more or less important, depending upon task and context. For example, concerns
about accounting for intangible assets may upset the current balance of users’
reliance upon source credibility vs. credibility derived from audits. Our objective
is to encourage thought and research about this balance, and about the type of
credibility information that different users employ.
NOTES
1. Hirst et al. (1999) did not explain to participants how forecast accuracy was calculated.
2. An example of a presentation that encourages caution is writing off all bad debts in a
single period, making it difficult to chart profitability (Ruth, 1987).
3. We do not examine pricing, that is, charging interest sufficiently above the prime rate to
accommodate even the worst credit risks. It is difficult for loan officers in the United States
to price-protect themselves, because the commercial lending market is very competitive
and there is as little as a two-point spread separating prime from high-risk borrowers
(Emmanuel, 1989).
4. Consistent with Rosman and Bedard (1999), collateral was represented to the lenders
on a 12-point scale, which ranged from “0%” to “more than 100%” in 10% increments.
Profitability ranged from “0%” to “more than 50%” of the ratio of net income to average
equity, identified in 5% increments. Liquidity ranged from “0%” to “more than 150%” of
the ratio of cash flows to fixed cash commitments, in 15% increments. Leverage ranged
from “0%” to “more than 70%” of the ratio of total liabilities to equity, in 7% increments.
The upper bounds differ due to variation in the normal range of these ratios. The leverage
covenant was converted to a revised measure (i.e. 13 − x, where “x” is the value selected
by the participant) so that the direction of each scale was similar.
5. In contrast, Mather (1999) asked subjects to make judgments as to the number of
covenants they would seek and how tightly they would be imposed. However, the nature of
the covenants was not specified.
6. A potential concern regarding the experiment is that some participants may not have
seen all of the character information. As explained in Table 1, two facts in each condition
of the experiment were viewed only if selected. If a number of participants did not select
the additional screen about the CFO, the strength of the character manipulation would not
have been consistent. Ten of the 11 participants in the negative condition and 13 of 14 in
the positive condition accessed the optional CFO information. In total, 23 of 25 participants
investigated the CFO, evidence that the character manipulation was consistent across
conditions, and that character and source credibility were important to the participants. Both
participants who did not access the additional character information, one in the negative
condition and one in the positive condition, approved the loan.
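The scale construction in Note 4 can be expressed as a short coding routine. The Python sketch below is an illustration, not the authors' instrument: it assumes that response k on each 12-point scale is labelled (k − 1) times the scale's increment, with point 12 meaning “more than” the largest labelled value, and it reads the 13 − x conversion of the leverage covenant as making a higher score mean a more restrictive structure on all four scales. The function and constant names are hypothetical.

```python
# Hypothetical coding of the four 12-point loan-structure scales in Note 4.
# Assumption: response k is labelled (k - 1) * increment, and point 12 means
# "more than" the largest labelled value (e.g. "more than 100%" for collateral).
INCREMENTS = {"collateral": 10, "profitability": 5, "liquidity": 15, "leverage": 7}
SCALE_POINTS = 12


def _check(scale: str, response: int) -> None:
    if scale not in INCREMENTS:
        raise KeyError(f"unknown scale: {scale}")
    if not 1 <= response <= SCALE_POINTS:
        raise ValueError("responses run from 1 to 12")


def response_label(scale: str, response: int) -> str:
    """Percentage label shown to participants for a given scale point."""
    _check(scale, response)
    step = INCREMENTS[scale]
    if response == SCALE_POINTS:
        return f"more than {step * (SCALE_POINTS - 2)}%"
    return f"{step * (response - 1)}%"


def adjusted_score(scale: str, response: int) -> int:
    """Apply the 13 - x conversion to leverage so all scales run in the same direction."""
    _check(scale, response)
    return SCALE_POINTS + 1 - response if scale == "leverage" else response


def total_score(responses: dict[str, int]) -> int:
    """Sum of the four adjusted scores; at most 4 x 12 = 48."""
    return sum(adjusted_score(scale, r) for scale, r in responses.items())


print(response_label("profitability", 12))  # -> more than 50%
print(total_score({"collateral": 7, "profitability": 4, "liquidity": 5, "leverage": 9}))  # -> 20
```

Summing the four adjusted scores reproduces the 48-point maximum (4 × 12) stated in the table note.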
ACKNOWLEDGMENTS
The authors thank Jean Bedard, Karla Johnstone, Marlys Lipe, Inshik Seol, Kathy
Wilkicki and two anonymous reviewers.
REFERENCES
Bailey, S. (2001). Right on the money. The Boston Globe (December 5th), F1.
Beach, L. R., Mitchell, T., Deaton, M., & Prothero, J. (1978). Information relevance, content and source
credibility in the revision of opinions. Organizational Behavior and Human Performance, 21,
1–16.
Beaulieu, P. (1994). Commercial lenders’ use of accounting information in interaction with source
credibility. Contemporary Accounting Research, 10(Spring), 557–585.
Beaulieu, P. (1996). A note on the role of memory in commercial loan officers’ use of accounting and
character information. Accounting, Organizations and Society, 21(August), 515–528.
Beaulieu, P. (2001). The effects of judgments of new clients’ integrity upon risk judgments, audit
evidence, and fees. Auditing: A Journal of Practice & Theory (Fall), 85–99.
Biggs, S., Rosman, A., & Sergenian, G. (1993). Methodological issues in judgment and decision-
making research: Concurrent verbal protocol validity and simultaneous trace of process. Journal
of Behavioral Decision Making, 6, 187–206.
Brucks, M. (1988). Search monitor: An approach for computer-controlled experiments involving
consumer information search. Journal of Consumer Research, 15, 117–121.
Coleman, D., & Irving, G. (1997). The influence of source credibility attributions on expectancy theory
predictions of organizational choice. Canadian Journal of Behavioural Science, 29(April),
122–131.
Danos, P., Holt, D., & Imhoff, E. (1989). The use of accounting information in bank lending decisions.
Accounting, Organizations and Society, 14, 235–246.
Eisenreich, D. (1981). Credit analysis: Tying it all together – Part I. Journal of Commercial Bank
Lending (December), 2–13.
Emmanuel, C. (1989). Limiting exposure to fraudulent financial reporting. The Journal of Commercial
Bank Lending (September), 16–27.
Gotlieb, J., & Sarel, D. (1991). Comparative advertising effectiveness: The role of involvement and
source credibility. Journal of Advertising, 20(1), 38–45.
Grewal, D., Gotlieb, J., & Marmorstein, H. (1994). The moderating effects of message framing and
source credibility on the perceived price-risk relationship. Journal of Consumer Research,
21(June), 145–153.
Hirst, D. E., Koonce, L., & Miller, J. (1999). The joint effect of management’s forecast accuracy and
the form of its financial forecasts on investor judgment. Journal of Accounting Research, 37,
101–123.
Kelley, H. (1972). Attribution in social interaction. Morristown, NJ: General Learning Press.
Korem, D. (1997). The art of profiling: Reading people right the first time. Richardson, TX: International
Focus Press.
Maines, L. (1990). The effect of forecast redundancy on judgments of a consensus forecast’s expected
accuracy. Journal of Accounting Research, 28(Suppl.), 29–47.
Mather, P. (1999). Financial covenants and related contracting processes in the Australian private debt
market: An experimental study. Accounting and Business Research, 30(1), 29–42.
McDonald, J., & McKinley, J. (1981). Corporate banking: A practical approach to lending.
Washington, DC: American Bankers Association.
Oldham, J. (1998). The “killer” character component. The Secured Lender, 54(November/December),
62–66.
Pace, E., & Simonson, D. (1977). The four hurdles of lending. The Journal of Commercial Bank
Lending (March), 10–15.
Rosman, A., & Bedard, J. (1999). Lenders’ strategy selection in loan structure decisions. Journal of
Business Research, 83–94.
Ruth, G. (1987). Commercial lending. Washington, DC: American Bankers Association.
Shaub, M. (1996). Trust and suspicion: The effects of situational and dispositional factors on auditors’
trust of clients. Behavioral Research in Accounting, 8, 154–174.
Stephens, R. (1980). Uses of financial information in bank lending decisions. Ann Arbor, MI: UMI
Research Press.
Watts, R., & Zimmerman, J. (1986). Positive accounting theory. Englewood Cliffs, NJ: Prentice-Hall.
EARNINGS MANAGEMENT AND
FRAMING: THE SPECIFIC CASE
OF OBSOLETE INVENTORY
Marybeth M. Murphy and Joanne P. Healy
ABSTRACT
Recent events have shown that earnings management is a significant problem
in the business world and that the culture in place in many organizations may
encourage managers to manipulate earnings. While prior research has shown
that earnings management exists at the corporate level, it has not examined
whether managers at the divisional level are motivated to manage earnings.
The purpose of this study is to examine whether divisional managers will be
more inclined to manage earnings in order to maximize personal wealth. The
secondary research objective is to examine whether the information frame
will impact discretionary management accounting decisions. Members of
the Institute of Management Accountants participated in an earnings management
study in which two conditions were manipulated. First, the annual
compensation of subjects was contingent on whether target income was met.
Second, information about a potentially obsolete inventory item
was framed as either positive or negative. Subjects were asked to indicate the likelihood
that they would write off the potentially obsolete inventory. Research findings
support the earnings management hypothesis and indicate that managers
are less likely to write off obsolete inventory when their compensation is
impacted by the write-off. Study results also reveal that the manner in which
the inventory information is framed may affect managers’ write-off decision.
These results are important as they may indicate that earnings management
is more pervasive throughout the organization than previously shown.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06005-8
1. INTRODUCTION
Arthur Levitt, while chair of the Securities and Exchange Commission (SEC),
announced a focus on firms that manage earnings (Levitt, 1998). He outlined an
action plan to address earnings management. Initiatives included better accounting
practices, standards and interpretative guidelines, stricter SEC focus on earnings
management, a review of audit practices, and a call for a cultural change in the
business world regarding the acceptance of earnings manipulations. While the
SEC can address most of these concerns with better standards and practices,
changing the culture of business is more complex, because it involves changing the behavior
of individuals. Research is needed that addresses why individuals
manage earnings. Such research is important to future accounting practice.
The purpose of this study was threefold. First, earnings management was
experimentally examined in a managerial accounting setting. Previous empirical
research has examined earnings management at the corporate level indirectly
through the analysis of financial results.1 Researchers typically study discretionary
management decisions (e.g. write-downs of impaired assets) via publicly available
information and infer whether earnings management has occurred based on a
comparison of actual financial results to some expectation (Rees et al., 1996;
Zucca & Campbell, 1992). Rather than inferring earnings management behavior
from financial results, this study uses an experiment to examine directly whether bonus
plans influence managers’ decisions.
The second purpose of the study was to investigate earnings management at
the divisional level rather than the overall corporate view, looking at what occurs
within the firm.2 A survey by Buck Consultants of Fortune 1000 companies
found that 61% of U.S. companies offer variable compensation plans below the
executive level, and another 27% are considering them (Wilson, 2001). This
increase in bonus-type plans creates a greater incentive for earnings management.
Earnings management occurs at the corporate level due, in part, to managers’
efforts to achieve incentive-compensation targets (Watts & Zimmerman,
1978). Schipper (1989) states, “Clearly, compensation schemes and divisional
managers’ private information create a potential incentive to manipulate internal
managerial accounting reports.” If performance of managers at the lower levels
of the firm is also measured based on these types of targets, then the possibility
exists that earnings management could occur at these levels. Managers could use
various means to manipulate earnings, from writing off low value inventory items
to controlling the timing of shipments to customers. The outcome of some of
these methods could be “buried” in the results of normal operations and therefore
might not be obvious at the corporate level. Alternatively, the consolidation of
this manipulated divisional income could result in significantly greater earnings
management at the corporate level than previously estimated. Division-level
earnings management could be an intervening variable, one that may account for
the conflicting results in at least one published earnings management
study.3
The third purpose was to examine the effects of framing on earnings management.
Subjects were presented with information pertinent to a discretionary
managerial decision from either a negative or a positive viewpoint. Kahneman
and Tversky (1979) theorize that the way information is framed can impact
decision-making. This study looks at the potential impact of the information
frame on the decision to write off inventory.
The results support both earnings management and framing hypotheses.
Findings suggest that management accountants are more apt to write off inventory
when: (1) their personal wealth is unaffected; and (2) information is framed
negatively. An important contribution of this research is evidence that information
framing can affect the earnings management decision. Participants with negatively
framed information, for whom a write-off reduced personal wealth, reported a higher
(although not statistically significant) probability of writing off inventory than
participants with positively framed information who were not eligible for a bonus. The
management accounting implication of these results is that managers’ decisions
could be influenced by the way information is presented.
This paper is organized in the following manner. Background and hypotheses
are developed and presented in Section 2. The research design and methodologies
used to test the hypotheses are presented in Section 3. Results are shown in
Section 4 and finally, Section 5 presents contributions and implications for further
study.
2. THEORY AND HYPOTHESES DEVELOPMENT

Managers are faced with many different types of risky decisions each day. Many
of these decisions are made at the discretion of management and have potentially
far-reaching implications. They impact not only management and those internal to
the firm, but also potentially affect the wealth of the shareholders, since many of
these decisions affect a firm’s cash flows and reported net income. The following
hypotheses examine the decision making behavior of managers.
2.1. Hypothesis 1 – Earnings Management (EM) Hypothesis
Previous empirical research suggests that an incentive exists for managers to
manipulate earnings to achieve personal goals.4 Healy (1985) theorizes that managers
make discretionary accounting decisions that maximize the value of their
bonuses. As shown in Fig. 1, Healy hypothesizes that if income is above targeted
levels, management decisions will reflect an income-maximizing strategy and no
income-decreasing discretionary accruals will be booked, but if income is below
targeted levels, managers will take income-decreasing actions. Managers may elect
income-decreasing accruals when their firm’s income is either above an upper bound or
below a lower threshold. In these situations, the reduction of income has little or no effect
on managers’ compensation in the current year. Managers would take no income-decreasing
action if income were at or above the desired level (but below the upper bound), since such
action would reduce their bonus. Healy (1985) provides empirical support for
this theory.
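The predictions in Fig. 1 can be made concrete with a small piecewise function. The Python sketch below assumes one stylized bonus plan (a bonus proportional to income above the lower threshold, capped at the upper bound); the thresholds and rate are hypothetical, and the function names are illustrative rather than Healy’s. It shows why income-decreasing accruals are predicted outside the two bounds: the bonus is flat in those regions, so modest reductions in income leave it unchanged.

```python
# Stylized bonus plan and Fig. 1 predictions (illustrative; thresholds are hypothetical).
from enum import Enum


class Prediction(Enum):
    INCOME_DECREASING = "take income-decreasing actions"
    NO_DECREASE = "take no income-decreasing action"


def bonus(income: float, lower: float, upper: float, rate: float) -> float:
    """Assumed plan: nothing below `lower`, rate * (income - lower) up to a cap at `upper`."""
    if income < lower:
        return 0.0
    return rate * (min(income, upper) - lower)


def predicted_action(income: float, lower: float, upper: float) -> Prediction:
    """Predicted choice given earnings before discretionary accruals."""
    if income < lower or income > upper:
        # The bonus is flat here, so a reduction that stays in the region costs nothing this year.
        return Prediction.INCOME_DECREASING
    return Prediction.NO_DECREASE


LOWER, UPPER, RATE = 100.0, 200.0, 0.1  # hypothetical values
for income in (80.0, 150.0, 250.0):
    print(income, bonus(income, LOWER, UPPER, RATE), predicted_action(income, LOWER, UPPER).value)
```

The plan used in this study is simpler: it pays 0.5% of all income once budget is met and has no upper bound, so only the lower-threshold region is tested.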
While previous research has tested earnings management indirectly through
the analysis of financial data (Burgstahler & Dichev, 1997; Cahan et al., 1997; Wu, 1997),
little research has directly tested the behavior of managers. Using
Healy’s bonus maximization theory as a basis for predicting earnings management,
the following hypothesis is set forth:
H1. If earnings before discretionary accruals are less than the lower threshold,
the manager is more likely to take income-decreasing actions than when earnings
are just above the lower threshold.
Fig. 1. Predicted Outcomes Based on Bonus Maximization Theory. Note: Adapted from
Healy (1985).
2.2. Hypothesis 2 – Framing Hypothesis
Prospect theory (Kahneman & Tversky, 1979) proposes that information presentation
impacts the editing process involved in decision making. Subtle changes
in the wording of facts of a situation can alter an individual’s reference point
(the point at which a decision is made), and ultimately their final decision. For
example, stating probabilities as a 25% chance of gain (positive frame) versus a
75% chance of loss (negative frame) has been found to affect decision-making
(Kahneman & Tversky, 1984). Framing can directly affect the decision by altering
the context or frame of reference in ways that are irrelevant to the decision itself,
sometimes leading to sub-optimal decisions.
Some framing research in accounting has been conducted in auditing. Shields et al.
(1987) examined the effects of framing on an auditor’s uncertainty judgments
of account valuation. The sample space for accounts was framed as either book
value misstatements or audit values. They found no effect of the frame on the
auditor’s judgments of account accuracy. However, Ayers and Kaplan (1993)
found auditors exhibited confirming tendencies when assigned a misstatement
(non-misstatement) frame by selecting more misstatement (non-misstatement)
cues as relevant to explaining financial statement ratios. Beeler and Hunton (2002)
found evidence that the existence of non-audit revenues creates a predecisional
distortion of client-related information, thereby suggesting a potential impairment
of independence.
Framing research has also examined managerial accounting issues. Lipe
(1993) studied framing in a variance investigation decision and the
subsequent performance evaluation of the investigation manager. She found that
when investigation expenditures were framed as a cost, managers were evaluated
more favorably than when the same expenditures were framed as a loss. In
another study, Rutledge (1995) examined the interaction between recency effects
and framing. He found that recency effects might be tempered by the framing of
decision relevant information.
The above research indicates that framing may impact accounting decisions.
The way managers perceive information may influence their propensity to manage
earnings, leading to the following hypothesis:
H2. Managers will be more likely to take income-decreasing action when
relevant decision information is framed in a negative manner than when that
information is framed in a positive manner.
Presentation framing may also impact the decision to write off an inventory
item. This study uses a potentially obsolete inventory item to operationalize the
accounting decision. If information regarding an inventory item is presented in
a positive manner, the manager is expected to be less likely to classify the item
as obsolete and write it off. Conversely, when information about the item is
presented from a negative viewpoint, the item is more likely to appear obsolete and
be written off.
2.3. Hypothesis 3 – Interaction of Income Level and Framing
When considered together, H1 and H2 offer some interesting potential outcomes.

Figure 2 outlines the four possible combinations. At one end of the spectrum,
if income is less than the target and the frame is negative (Treatment D), the
highest level of income decreasing actions would be predicted. Both earnings
management and framing theories would suggest that managers would take actions
to decrease income. If income is already below the bonus level, writing off an
obsolete inventory item and decreasing income further would have no impact on
the employee’s compensation. A negative frame of the write-off item would suggest
to managers that the likelihood of future sale is low.
H3a. Managers will be most likely to take income-decreasing action when
relevant information is framed in a negative manner and when earnings before
discretionary accruals are just below the lower threshold.
Conversely, if income is greater than the target and the frame is positive (Treatment
A), no income decreasing actions are likely. Bonus maximization theory would
predict that managers do not desire to decrease income below the level necessary
to receive a bonus. Prospect theory would suggest that a positive frame about a
potential write-off would influence managers to view the item in question more
favorably, thereby making them less likely to consider the item obsolete and write
it off.
H3b. Managers will be least likely to take income-decreasing action when relevant
information is framed in a positive manner and when earnings before
discretionary accruals are just above the lower threshold.
The actions taken for the remaining two treatments (Treatment B – Income
greater than target, negative frame and Treatment C – Income less than target,
positive frame) are not as easy to predict. In each case, the two variables
predict opposite behavior. The outcome of the experiment
depends on which variable has the dominant effect. If neither variable dominates,
subjects in these treatments should be split evenly between those taking action and those not.
Analysis of the results will indicate which of these two effects
is greater.
Fig. 2. Predicted Interactions Between Earnings Management and Framing Theories.
H3c. There will be no difference in managers’ income-decreasing actions between the
condition in which relevant information is framed in a positive manner and earnings before
discretionary accruals are just below the lower threshold, and the condition in which relevant
information is framed in a negative manner and earnings are just above the
lower threshold.
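The four treatments in Fig. 2 and the logic of H3a to H3c can be encoded compactly. The Python sketch below rests on a stated assumption: it scores each cell by how many of the two theories (H1 and H2) point toward a write-off, which treats the two effects as equally strong, the same “neither variable dominates” premise that underlies H3c. The level names and function names are hypothetical.

```python
# Hypothetical encoding of the 2 x 2 design in Fig. 2 (INCOME x INVENTORY frame).
TREATMENTS = {
    ("above_target", "positive"): "A",
    ("above_target", "negative"): "B",
    ("below_target", "positive"): "C",
    ("below_target", "negative"): "D",
}


def treatment(income: str, frame: str) -> str:
    """Treatment label for an INCOME level and an INVENTORY frame (KeyError if unknown)."""
    return TREATMENTS[(income, frame)]


def write_off_pressure(income: str, frame: str) -> int:
    """Number of theories favoring a write-off: H1 (income below target) plus H2 (negative frame)."""
    treatment(income, frame)  # validate the inputs before scoring
    return int(income == "below_target") + int(frame == "negative")


# Rank treatments from most to least predicted write-off pressure (ties keep table order).
ranking = sorted(TREATMENTS, key=lambda cell: -write_off_pressure(*cell))
for income, frame in ranking:
    print(treatment(income, frame), income, frame, write_off_pressure(income, frame))
```

Treatment D scores 2 (H3a), Treatment A scores 0 (H3b), and Treatments B and C tie at 1, which is the equality H3c states; an unequal result for B and C would indicate which variable dominates.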
3. RESEARCH DESIGN AND METHODOLOGIES

3.1. Subjects
The intent of the study was to determine if earnings management would occur
in a bonus situation and whether the frame would influence a manager’s decision
process. Members of the Institute of Management Accountants (IMA) were chosen
as subjects, since these individuals are typically in managerial positions that involve
decisions such as writing off obsolete inventory. A randomly selected sample of
1000 actively employed members was obtained from the IMA.5
3.2. Instrument Design
To test earnings management at lower levels, the experiment was designed to
examine the behavior of divisional managers. This was accomplished by creating
a setting at the plant level of the firm. An experimental manipulation, consisting
of a short scenario with questions (shown in the Appendix), and a demographic
questionnaire were used to conduct the experiment. The experimental manipulation
was developed using an adaptation of Puto’s (1987) model of the buying decision
process (Fig. 3). The two boxes on the right side of the model represent those
items that are manipulated in the experiment, while the information on the left
side of the model is held constant.
3.2.1. EM Hypothesis Operationalization

In the experimental manipulation, the EM hypothesis was operationalized by
presenting managers with information concerning their plant’s position with
respect to a goal – budgeted operating income. Subjects were told that their
bonus compensation was dependent upon meeting budgeted income levels. If
budgeted income levels were met, managers received a bonus of 0.5% of income;
otherwise they received nothing. This simulates Healy’s lower threshold. To create
a positive initial reference point, managers were informed that the latest estimate
of operations indicated that budgeted net income would just be met. Net income
was set at $1,502,000, just above the threshold, so that an inventory write-off
would reduce net income below the threshold, eliminating the manager’s bonus.
For the negative initial reference point, a statement was included indicating that
early results suggested actual income would be lower than budgeted income.
Net income was set at $1,400,000, well below the threshold, to remove any
possibility that the threshold could be achieved. This operationalization simulated
a situation in which the manager had the opportunity to reduce the current year’s
earnings with no impact on personal wealth and improve prospects for subsequent
years.
Fig. 3. Theoretical Model Inventory Write-Off Decision.
The italicized line in the second paragraph of the scenario shown in the Appendix
indicates where these statements were placed in the survey instrument, with the
negative initial reference point shown in brackets. The first independent variable,
INCOME, resulted from the manipulation of this initial reference point. This variable
was manipulated between subjects.
3.2.2. Selection of the Discretionary Decision

Previous research has hypothesized that many types of discretionary decisions are
used to manage earnings at the corporate level. Studies have examined the timing
of recognition of extraordinary items (Barnea et al., 1976; Ronen & Sadan, 1975;
Walsh et al., 1991), write-downs of impaired assets (Zucca & Campbell, 1992), the
provision for bad debts (McNichols & Wilson, 1988) and non-recurring charges
(Elliott & Shaw, 1988). All have found support for earnings management at the
corporate level.
Hepworth (1953) suggests that the inventory valuation process can be used as a
less obvious method of income smoothing. In a study of business unit managers,
Guidry et al. (1999) tested an inventory model of earnings management along with
two other previously tested models. Evidence of earnings management was found
to be the strongest in the analysis of the inventory reserve account. They suggest
that this occurs due to information asymmetry that exists between these managers
and upper-level management regarding inventory valuation.
The decision to write off inventory involves considerable management dis-
cretion. Accounting Research Bulletin 43 (FASB, 1992) addresses the inventory
write-off in the following manner:
Thus, in accounting for inventories, a loss should be recognized whenever the utility of goods
is impaired by damage, deterioration, obsolescence, changes in price levels, or other causes.
The measurement of such losses is accomplished by applying the rule of pricing inventories at
cost or market, whichever is lower (Stmt. 5, Para. 8).
This guideline can be difficult to follow in some circumstances. Often, especially
for internally manufactured products, no ready market exists. This creates problems
for the application of the lower of cost or market rule. In addition, someone
must determine when damage, deterioration or obsolescence has
occurred and to what extent the inventory value has been affected. The effects of
these write-offs or reductions in inventory value are often not easily discernible
to users of the financial statements, since they are included as part of the
calculation of cost of goods sold. Managers’ decisions to write off immaterial
amounts have not previously been studied. These actions are important because if
managers across the firm (all trying to achieve income targets) decide to write off
small-value inventory items, the write-offs have the potential to be material in
the aggregate. In addition, such write-offs are probably the most common method of writing
off inventory (Hepworth, 1953). Therefore, immaterial inventory write-offs were
selected as the discretionary decision in this study.
To operationalize this inventory write-off, the value of the inventory item
involved was set at $15,000. A number of factors were considered when setting the
dollar level of the potential discretionary decision. The amount was set at about
4% of inventory, a level considered immaterial. An immaterial value was chosen
for a number of reasons. First, if written off, the amount would be buried in cost
of goods sold. Therefore it would not be obvious to outsiders and would probably not be
detected by auditors. These decisions would be of the type described by Hepworth
(1953). Second, most managers would be expected to act conservatively and
write off the amount. Therefore, differences in decision-making would be
attributable to bonus implications, information framing, or both. In addition to
materiality considerations, the amount is large enough that, if written off, it would
cause income to fall below budget for those
receiving information that income was above the threshold. For those
below the threshold, the write-off would have no impact on bonuses this year.
3.2.3. Expected Payoffs

The independent variables were designed with specific expected payoffs in mind.
In the case where income is greater than budget, estimated net income was
specifically established at a level at which the write-off of the inventory item
would result in net income falling below budgeted levels, thus eliminating the
manager’s bonus. Where income is less than budget, since budgeted net income
had not been achieved, the write-off of inventory would have no impact on the
bonus. Figure 4 indicates the values of these expected payoffs in the year depicted
in the scenario. These payoffs are based on a bonus equal to 0.5% of plant net
income ($7,500 if budgeted net income is met). This dollar value was selected to
approximate bonus compensation for plant controllers.6
Fig. 4. Bonus Payoff Based on Current Year Inventory Write-Off Decision.
The only subjects to receive a bonus in the current year were those with income
greater than budget who did not write off the inventory. Since their expected
payoff is $7,510 ($1,502,000 at 0.5%), this group had the greatest opportunity
cost from writing off the inventory. Based on this payoff, these subjects were
expected to be the least likely to write off inventory, in accordance with the bonus
maximization theory.
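The payoffs in Fig. 4 follow from simple arithmetic on the scenario’s terms. The Python sketch below assumes a budgeted net income of $1,500,000, inferred from the statement that the bonus is $7,500 when budget is met ($7,500 / 0.005); the constant and function names are illustrative. Only the above-budget group forgoes anything by writing off the $15,000 item, because $1,502,000 − $15,000 = $1,487,000 falls below budget and the $7,510 bonus is lost.

```python
BONUS_RATE = 0.005   # 0.5% of plant net income
BUDGET = 1_500_000   # assumption: inferred from $7,500 / 0.005
WRITE_OFF = 15_000   # value of the potentially obsolete inventory item


def bonus(net_income: float) -> float:
    """All-or-nothing bonus: 0.5% of net income if budget is met, else nothing."""
    return BONUS_RATE * net_income if net_income >= BUDGET else 0.0


def payoffs(pre_write_off_income: float) -> dict[str, float]:
    """Current-year bonus with and without the write-off."""
    return {
        "keep": bonus(pre_write_off_income),
        "write off": bonus(pre_write_off_income - WRITE_OFF),
    }


for label, income in (("income > budget", 1_502_000), ("income < budget", 1_400_000)):
    p = payoffs(income)
    print(f"{label}: keep ${p['keep']:,.0f}, write off ${p['write off']:,.0f}")
```

The difference between the two payoffs in each row is the opportunity cost of the write-off: $7,510 for the above-budget group and zero for the below-budget group.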
3.2.4. Framing Operationalization

In general, previous framing research suggests that managers may be influenced by
variables that are unrelated to the actual decision at hand (Johnson et al.,
1991; Kahneman & Tversky, 1984; Lipe, 1993; O’Clock & Devine, 1995; Puto,
1987). As part of the decision to write off inventory, managers in this study were
presented with various pieces of information about the current status of inventory.
Because framing has been found to affect the decision-making process, it could also play
a role in the decision to write off obsolete materials. Each subject was given an
inventory statement to review. The second independent variable, INVENTORY,
was operationalized as a statement about inventory expressed in either a positive
frame or a negative frame. The italicized sentence in the Appendix that is part of the
item description indicates the frame; the negative frame is shown in brackets. The
inventory information presentation frame was also manipulated
between subjects.
3.2.5. Statistical Analysis

These two independent variables, INCOME and INVENTORY, result in subjects
being assigned to one of four possible treatments shown previously in Fig. 2. The
dependent variable measured the percent likelihood that the subject would recommend writing off the inventory (PROBWO). The hypotheses were tested with a 2 × 2 ANOVA and Kruskal–Wallis multiple-comparison tests.
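The logic of a 2 × 2 ANOVA can be sketched as follows. This is an illustration only, under two stated assumptions: the data are hypothetical, and the design is balanced. The study's cells were not balanced (65, 72, 89 and 90 usable responses), and the paper does not say which sum-of-squares decomposition its software used; with unequal cell sizes a regression-based decomposition (for example, Type III sums of squares) is normally required, so this sketch is not the authors' computation.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Two-way ANOVA for a *balanced* design.

    cells maps (a_level, b_level) pairs to equal-length lists of responses.
    Returns {source: (sum_of_squares, df, F or None)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # responses per cell
    if any(len(v) != n for v in cells.values()) or n < 2:
        raise ValueError("this sketch needs equal cell sizes of at least 2")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Main effects, interaction, and within-cell (error) sums of squares.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab, df_err = df_a * df_b, len(cells) * (n - 1)
    mse = ss_err / df_err
    return {
        "INCOME": (ss_a, df_a, ss_a / df_a / mse),
        "INVENTORY": (ss_b, df_b, ss_b / df_b / mse),
        "INCOME x INVENTORY": (ss_ab, df_ab, ss_ab / df_ab / mse),
        "Error": (ss_err, df_err, None),
    }


# Hypothetical PROBWO responses, two per cell.
# Keys: (INCOME, INVENTORY) with 0 = target met / positive frame,
#       1 = target not met / negative frame.
toy = {
    (0, 0): [10, 30], (0, 1): [50, 70],
    (1, 0): [30, 50], (1, 1): [70, 90],
}
for source, (ss, df, f) in two_way_anova(toy).items():
    print(source, ss, df, "" if f is None else round(f, 2))
```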
3.3. Pretesting
Prior to mailing the survey instruments, two pretests were conducted to provide
evidence for content validity as well as to improve the experimental task. The
first pretest was conducted during the monthly meeting of a local IMA chapter.
Comments provided by the participants were incorporated to improve the scenario
before the second pretest was undertaken. In general, participants in the first pretest found the scenario incomplete. Specifically, they requested additional information on the relationship between the value of the write-off and total inventory. Participants also asked whether the parts could be resold as replacement parts, so a line was added stating that no such market existed. The second pretest was conducted during the monthly meeting of another IMA chapter five months later. Again, all comments were considered, and minor grammatical changes were made to the experiment. The responses from these 38 pretest participants were not included in the final sample.
3.4. Procedure
Dillman’s “Total Design Method” (1978) was employed in the design and mailing
of the questionnaires. Each envelope and cover letter was printed with the
individual’s name and address to make the request more personal. Questionnaires
were numerically coded to determine which subjects had responded to the
mailing. The cover letter indicated that this coding was for mailing purposes
only and individual responses would not be associated with names of subjects.
Participants were asked to complete the experiment and were provided with a
stamped, self-addressed envelope for its return.
4. RESULTS
4.1. Response Rate
Table 1 shows the number of responses. There were 242 (24.2%) responses to the initial mailing. A second mailing was sent to the non-respondents; an additional 149 questionnaires were returned, increasing the overall response rate to 39.1%. Of the 391 responses received, 27 were either returned to sender (5) or returned incomplete (22), and another 48 were from non-accountants. The remaining 316 were used for the analyses.
Table 1. Description of Questionnaire Responses.
                      PP    NP    PN    NN    Total
Total respondents     85    86    105   115   391
Returned to sender     0     0      2     3     5
Returned incomplete    6     4      6     6    22
Non-accountants       14    10      8    16    48
Total usable          65    72     89    90   316
PP means income greater than target and a positive inventory frame.
NP means income less than target and a positive inventory frame.
PN means income greater than target and a negative inventory frame.
NN means income less than target and a negative inventory frame.
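The counts in Table 1 reconcile with the response figures in the text. The check below uses only the table's counts, plus one assumption: a mailing of 1,000 questionnaires, which is inferred from "242 (24.2%)" rather than stated in this section.

```python
# Counts from Table 1, in column order PP, NP, PN, NN.
total_respondents = [85, 86, 105, 115]
returned_to_sender = [0, 0, 2, 3]
returned_incomplete = [6, 4, 6, 6]
non_accountants = [14, 10, 8, 16]

# Assumption: 1,000 questionnaires were mailed, inferred from the text's
# "242 (24.2%) responses from the initial mailing".
MAILED = 1_000
FIRST_MAILING = 242

usable = [t - r - i - n for t, r, i, n in zip(
    total_respondents, returned_to_sender, returned_incomplete, non_accountants)]
received = sum(total_respondents)
second_mailing = received - FIRST_MAILING

print("usable by cell:", usable, "total:", sum(usable))
print(f"second mailing: {second_mailing}, "
      f"overall response rate: {100 * received / MAILED:.1f}%")
print("returned to sender or incomplete:",
      sum(returned_to_sender) + sum(returned_incomplete),
      "| non-accountants:", sum(non_accountants))
```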
4.2. Test for Non-response Bias
Tests for non-response bias were conducted on the final sample of 316 participants. Mean responses to the probability-of-write-off question from the first mailing were compared with those from the second. Kruskal–Wallis tests indicated no significant difference between the two mailings (p = 0.236). t-tests were also conducted for years of experience, number of certifications, firm type, type of degree, and years in the current job; no significant differences were noted.
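With two groups, such as the two mailings, the Kruskal–Wallis statistic H is referred to a chi-square distribution with one degree of freedom. The sketch below illustrates this under stated assumptions: the example responses are made up (the paper does not report the raw data), and the tie correction that statistical packages typically apply is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks of values; ties get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(first, second):
    """Kruskal-Wallis H (no tie correction) and its chi-square(1) p-value."""
    if not first or not second:
        raise ValueError("both groups need at least one response")
    pooled = list(first) + list(second)
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    groups = (ranks[:len(first)], ranks[len(first):])
    # H = 12 / (N (N + 1)) * sum(n_i * meanrank_i^2) - 3 (N + 1)
    h = 12 / (n_total * (n_total + 1)) * sum(
        len(g) * (sum(g) / len(g)) ** 2 for g in groups) - 3 * (n_total + 1)
    # Upper tail of chi-square(1): P(X > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p


# Hypothetical PROBWO responses from the two mailings.
first_mailing = [70, 80, 50, 90, 60, 75]
second_mailing = [65, 85, 55, 75, 70]
h, p = kruskal_wallis_two_groups(first_mailing, second_mailing)
print(f"H = {h:.3f}, p = {p:.3f}")
```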
4.3. Manipulation Checks
The manipulation of the earnings management situation was tested using responses to the question, “Did you achieve the operating budget prior to the inventory write-off decision?” This verified whether subjects knew the position of estimated net income relative to the budget. Approximately 86% of the respondents answered the operating budget manipulation check correctly.7
The success of the inventory frame manipulation was confirmed by analyzing subjects’ responses to the question, “How risky do you feel it is for the inventory to remain on the books?” Subjects responded on a 7-point Likert-type scale anchored by “Very Risky” (1) and “Not Risky” (7). A Mann–Whitney test found a significant difference between the mean ratings for the positive (3.88) and negative (3.49) inventory frames at the 5% level (p = 0.0458), indicating that the frame manipulation succeeded.
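A Mann–Whitney test compares the rank distributions of the riskiness ratings in the two frame conditions. The sketch below shows the statistic and a large-sample normal approximation to its two-sided p-value. The ratings are hypothetical, and the sketch omits the continuity and tie corrections; because ratings on a 7-point scale contain many ties, a statistical package's tie-corrected p-value would differ somewhat.

```python
import math
from bisect import bisect_left, bisect_right


def mann_whitney(x, y):
    """Mann-Whitney U for sample x versus y, with a two-sided p-value from
    the normal approximation (no continuity or tie correction)."""
    pooled = sorted(list(x) + list(y))

    def avg_rank(v):
        # Tied values share the mean of the 1-based positions they occupy.
        first = bisect_left(pooled, v) + 1
        last = bisect_right(pooled, v)
        return (first + last) / 2

    n1, n2 = len(x), len(y)
    u = sum(avg_rank(v) for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Hypothetical riskiness ratings (1 = "Very Risky", 7 = "Not Risky").
positive_frame = [4, 5, 3, 4, 4, 5, 3, 4]
negative_frame = [3, 4, 3, 2, 4, 3, 3, 4]
u, z, p = mann_whitney(positive_frame, negative_frame)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```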
4.4. Demographics
Table 2 provides overall information about the respondents in this study. The respondents worked in a fairly diverse range of industries, with 46.9% employed by manufacturing firms. Subjects employed by service-oriented firms made up the next largest group (12.7%), followed by those from public accounting firms (10.4%). The remaining subjects (30.0%) worked in a variety of settings, including banking, retailing, non-profits, consulting, and distribution.
Table 2. Demographic Characteristics of Respondents.

Number of Respondents Percent of Total
Panel A: Responses by industry

Manufacturing 148 46.9
Services 40 12.7
Public accounting 33 10.4
Consulting 8 2.5
Retailing 12 3.8
Government 11 3.5
Financial services 6 1.9
Banking 8 2.5
Other 50 15.8
Total 316 100.0
Panel B: Responses by educational level
Less than bachelors 8 2.5
Bachelors 185 58.6
Masters or above 122 38.6
No response to question 1 0.3
Total 316 100.0
Panel C: Responses by certification
0 Certifications 149 47.2
Total 316 100.0
Panel D: Responses by managerial level
Owner/self-employed 20 6.3
Staff/individual contributor 56 17.7
First level management 44 13.9
Mid level management 100 31.7
Top level management 95 30.1
No response 1 0.3
Total 316 100.0
The respondents were well educated, with 97.2% holding at least a bachelor’s degree and 38.6% holding advanced degrees. More than half the group possessed some form of certification. Thirty-nine percent held one certification, while 13.5% had obtained two or more. The most common certifications were the Certified Public Accountant (CPA) and the Certified Management Accountant (CMA).
Significant accounting experience was a strong component of the subjects’ background, with an average of 16.3 years of experience as a practicing accountant. Sixty-nine percent had 10 or more years of experience in accounting-related fields, with 4 subjects having 40 or more years of experience. Another 22% had between 5 and 10 years of experience. Only 9% of the sample had less than five years of experience. This experience was further reflected in the average length of time that subjects had held their current position (5.5 years). Thirty-three percent of the subjects had held their current position for more than five years.
The experience of the sample was also evident in the subjects’ positions in the firm. Sixty-two percent held top or mid-level management positions, with titles such as Corporate Controller, Corporate Vice President of Finance and Corporate Financial Officer. Thirty-two percent were individual contributors or first-level managers. Six percent of the subjects were self-employed.
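The rounded shares in the paragraph above follow directly from Panel D of Table 2. A short check, using only the counts printed in the table:

```python
# Panel D counts from Table 2.
panel_d = {
    "Owner/self-employed": 20,
    "Staff/individual contributor": 56,
    "First level management": 44,
    "Mid level management": 100,
    "Top level management": 95,
    "No response": 1,
}
N = sum(panel_d.values())


def share(*levels):
    """Percent of all respondents who fall in the given managerial levels."""
    return 100 * sum(panel_d[level] for level in levels) / N


top_or_mid = share("Top level management", "Mid level management")
contributor_or_first = share("Staff/individual contributor",
                             "First level management")
self_employed = share("Owner/self-employed")
print(f"N = {N}; top or mid: {top_or_mid:.1f}%; "
      f"contributor or first level: {contributor_or_first:.1f}%; "
      f"self-employed: {self_employed:.1f}%")
```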
In addition to being experienced in their profession, the subjects were well matched to the study’s characteristics. Sixty percent of the respondents received bonus compensation as part of their pay. The subjects were also familiar with the decision to write off inventory. Eighty-two percent of the respondents indicated that at some point in their careers they had been involved in the decision to write off inventory, while 66% held positions that currently had input into the write-off decision-making process.
4.5. Overall Results
Table 3 reports the number of respondents, the mean percent likelihood of inventory write-off, and the standard error of the mean for each experimental condition. When the income target was met, the probability of writing off inventory was 57.5% for the positive inventory frame compared with 67.64% for the negative frame. The average likelihood of write-off when the income target was met was 63.1%. Where income targets were not met, respondents who received the positive inventory frame indicated a 63.13% likelihood that they would write off inventory, whereas those who received the negative frame indicated a 76.88% likelihood. The average likelihood of write-off when the income target was not met was 69.9%. The results can also be viewed by inventory frame. For the positive frame, the average likelihood of inventory write-off was 60.5%; for the negative frame, it was 73.1%.
Table 3. Average Likelihood (Standard Error) of Write-Off by Experimental Condition.
                           Inventory Frame
                           Positive        Negative        Total
Income target met          57.5 (4.12)     67.64 (3.92)    63.1
                           n = 65          n = 72          n = 137
Income target not met      63.13 (3.54)    76.88 (3.50)    69.9
                           n = 89          n = 90          n = 179
Total                      60.5            73.1
                           n = 154         n = 162
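Marginal totals of this kind are weighted averages of the cell means. The check below uses only the cell means and cell sizes printed in Table 3, because the individual responses are not available. The n-weighted averages come out near, though not exactly at, the printed totals, and the overall figure rounds to the 67% reported under the earnings management hypothesis. The paper does not say precisely how its totals were computed.

```python
# Cell means and sizes as printed in Table 3:
# (income condition, frame) -> (mean percent likelihood, n)
cells = {
    ("met", "positive"): (57.5, 65),
    ("met", "negative"): (67.64, 72),
    ("not met", "positive"): (63.13, 89),
    ("not met", "negative"): (76.88, 90),
}


def weighted_mean(keys):
    """Return (n-weighted mean of the selected cell means, combined n)."""
    n = sum(cells[k][1] for k in keys)
    return sum(cells[k][0] * cells[k][1] for k in keys) / n, n


margins = {
    "income met": weighted_mean([k for k in cells if k[0] == "met"]),
    "income not met": weighted_mean([k for k in cells if k[0] == "not met"]),
    "positive frame": weighted_mean([k for k in cells if k[1] == "positive"]),
    "negative frame": weighted_mean([k for k in cells if k[1] == "negative"]),
    "overall": weighted_mean(list(cells)),
}
for label, (m, n) in margins.items():
    print(f"{label}: {m:.2f}% (n = {n})")
```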
4.6. Tests of the Hypotheses
All hypotheses were tested using a 2 × 2 ANOVA and/or the Kruskal–Wallis multiple-comparison test. The results of the ANOVA are shown in Table 4.
4.6.1. Earnings Management Hypothesis

H1 predicts that when income is below target, managers will be more likely to take income-decreasing discretionary actions to manage earnings. Overall, subjects reported a 67% likelihood that they would write off the inventory. However, significant differences were found between groups. The mean response is 73.1% for those respondents who were presented with a scenario suggesting that budgeted net income would not be met, and only 60.5% for those who would likely meet their profit targets. These probabilities are significantly different (p = 0.001; Table 4). Although the overall likelihood of write-off was high, there were still significant differences between the income manipulation groups. This result strongly supports H1 and is consistent with previous empirical accounting research suggesting that some managers manipulate earnings when they are below budgeted threshold levels (Healy, 1985).
Table 4. ANOVA for Percent Likelihood of Write-Off.
Variable       DF    Sum of Squares    Mean Square    F-Ratio    Prob > F
Income          1       11875.56         11875.56       10.88      0.0010
Inventory       1        3780.09          3780.09        3.46      0.0627
Interaction     1         374.19           374.19        0.34      0.5582
Error         312                         1091.31
Total         315      357089.80
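The F-ratios in Table 4 can be reproduced from the printed figures. Each effect has one degree of freedom, so its F-ratio is its sum of squares divided by 1091.31, which identifies that figure as the mean square error. The p-values below rely on an assumption of this sketch: they use a large-sample normal approximation, whereas exact values come from the F(1, 312) distribution. The same mean square also implies standard errors of about √(1091.31/n) for the cell means, close to the parenthesized values in Table 3 (4.12, 3.92, 3.54, 3.50), which is what one would expect if those values are standard errors rather than standard deviations of individual responses (a pooled standard deviation is about 33 points).

```python
import math

MSE = 1091.31   # Error row of Table 4 (312 df)
effects = {"Income": 11875.56, "Inventory": 3780.09, "Interaction": 374.19}

results = {}
for name, ss in effects.items():
    f = (ss / 1) / MSE                # each effect has 1 df, so MS = SS
    # F(1, 312) is the square of a t with 312 df, which is close to a
    # standard normal; hence P(F > f) ~= P(|Z| > sqrt(f)) = erfc(sqrt(f / 2)).
    p_approx = math.erfc(math.sqrt(f / 2))
    results[name] = (f, p_approx)
    print(f"{name}: F = {f:.2f}, approx. p = {p_approx:.4f}")

# Standard error of a cell mean implied by the pooled error variance.
se = {n: math.sqrt(MSE / n) for n in (65, 72, 89, 90)}
for n, value in se.items():
    print(f"n = {n}: sqrt(MSE / n) = {value:.2f}")
```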
4.6.2. Framing Hypothesis

H2 hypothesizes that the way the information is framed will influence the write-off
decision, i.e., that managers presented with negatively framed information about
the inventory item will be more likely to write it off than those with posi-
tively framed information. Results provide marginal support for this hypothesis
( p = 0.0627). Participants who received negatively framed information responded
that the likelihood of writing off inventory was 69.9%, while those receiving the
positive frame would write off the item only 63.1% of the time. This suggests that information framing may affect discretionary decisions. These results are also consistent with previous research showing that framing can play a role in accounting decision making (Johnson et al., 1991; Lipe, 1993; O’Clock & Devine, 1995).
4.6.3. Interaction Hypotheses

First, the data were tested to determine whether an interaction existed between the variables. The results of the 2 × 2 ANOVA (Table 4) indicated no significant interaction (p = 0.5582). Although the interaction was not significant, analysis of differences between respondent groups could provide further insight. To conduct this analysis, the responses were divided into four groups based on the experimental condition (refer to Table 5). Comparisons among these groups form the basis for the analysis of H3a to H3c.
H3a predicts managers will be most likely to take income-decreasing action
when relevant information is framed in a negative manner and when earnings
before discretionary accruals are less than the lower threshold. The mean response
for Cell D was 76.88% (Table 5), the highest of the four cells. A Kruskal–Wallis
Multiple Comparison Test was run to determine if the results of Cell D were
significantly different from each of the other cells in Table 5. The percent likelihood of inventory write-off in Cell D was larger than, and significantly different from, that of each of the other cells. This suggests that the combined effects of INCOME and INVENTORY were responsible for the level of Cell D. H3a is supported.
H3b predicts that managers will be least likely to take income-decreasing action
when relevant information is framed in a positive manner and when earnings before
discretionary accruals are just above the lower threshold (Cell A). The mean value
of Cell A is 57.5, the lowest of all cells. Based on the Kruskal–Wallis test, Cell A
is significantly different from Cell D and marginally significantly different from
Cell B. However, there is no significant difference between Cell A and Cell C.
An interesting result is the difference between Cell B and Cell C (H3c). Respondents in Cell B would lose their bonus if they wrote off inventory, whereas those in Cell C were already below the threshold, so their personal wealth would be unaffected by an inventory write-off. However, the probability of a write-off is 67.64% in Cell B and 63.13% in Cell C. The direction of the difference seems to suggest that when information is framed negatively, managers are more likely to write off inventory, even when doing so would reduce their own income, than when information is framed positively and their income would be unaffected by the decision. Because the Kruskal–Wallis z-value comparing these two groups was not significant (z = 1.2152), H3c is not supported.
Table 5. Kruskal–Wallis Multiple-Comparison z-Value Test.
A = Positive Inventory, Positive Income; B = Negative Inventory, Positive Income;
C = Positive Inventory, Negative Income; D = Negative Inventory, Negative Income.
         A           B           C           D
A     0.0000      1.9465**    0.8606      4.0379*
B     1.9465**    0.0000      1.2152      2.0507*
C     0.8606      1.2152      0.0000      3.4575*
D     4.0379*     2.0507*     3.4575*     0.0000
∗ Medians significantly different at 0.05 level if z-value > 1.96.
∗∗ Medians marginally significantly different at 0.10 level if z-value is > 1.645.
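One common form of the Kruskal–Wallis multiple-comparison z-test is Dunn's procedure: all responses are ranked together, and each pair of groups is compared through the difference in their mean ranks divided by its standard error. The paper does not state which variant its software used (for example, whether ties were corrected), so this sketch with hypothetical data shows the general idea rather than the authors' exact computation. The 1.96 and 1.645 cutoffs in the Table 5 notes are the two-sided 0.05 and 0.10 critical values of a standard normal z.

```python
import math


def average_ranks(values):
    """1-based ranks of values; ties get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def dunn_z(groups):
    """|z| for every pair of groups, from mean ranks in the pooled ranking.

    No tie correction and no multiple-comparison adjustment are applied.
    """
    labels = list(groups)
    pooled = [v for label in labels for v in groups[label]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    mean_rank, start = {}, 0
    for label in labels:
        size = len(groups[label])
        mean_rank[label] = sum(ranks[start:start + size]) / size
        start += size

    z = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            se = math.sqrt(n_total * (n_total + 1) / 12
                           * (1 / len(groups[a]) + 1 / len(groups[b])))
            z[(a, b)] = abs(mean_rank[a] - mean_rank[b]) / se
    return z


# Hypothetical PROBWO responses for four cells labelled as in Table 5.
cells = {
    "A": [40, 55, 60, 50, 70],
    "B": [65, 70, 60, 75, 80],
    "C": [55, 65, 60, 70, 50],
    "D": [80, 90, 75, 85, 95],
}
for (a, b), value in dunn_z(cells).items():
    flag = "*" if value > 1.96 else "**" if value > 1.645 else ""
    print(f"{a} vs {b}: z = {value:.4f}{flag}")
```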
4.7. Further Analyses of Results
Further tests of the association of INCOME and INVENTORY with PROBWO were conducted using OLS regression analysis, with bonus pay, firm type, number of certifications, and years of experience as control variables. The results were similar to the ANOVA presented above: INCOME remained significant (p = 0.002), and INVENTORY remained marginally significant (p = 0.061). Years of experience was positively associated with PROBWO and was the only control variable to approach significance (p = 0.078). This may indicate that more experienced managers had a greater propensity to write off inventory and decrease earnings.
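The regression check described above can be sketched as ordinary least squares with dummy-coded INCOME and INVENTORY plus a control variable. Everything in the example is an assumption for illustration: the coding (INCOME = 1 when the target is not met, INVENTORY = 1 for the negative frame), the single control (years of experience), and the noise-free data, which are built from known coefficients so that the fit can be checked. The paper's regression also included bonus pay, firm type and number of certifications, and a real analysis would use a statistics package that also reports standard errors and p-values.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows whose first entry is 1.0 (intercept). The system is
    solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]          # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear regressors")
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical respondents: (INCOME, INVENTORY, years of experience).
respondents = [(0, 0, 5), (0, 1, 10), (1, 0, 20), (1, 1, 8), (0, 0, 15), (1, 1, 30)]
X = [[1.0, inc, inv, years] for inc, inv, years in respondents]
# Noise-free PROBWO generated from known coefficients (50, 10, 6, 0.5).
y = [50 + 10 * inc + 6 * inv + 0.5 * years for inc, inv, years in respondents]
coef = ols(X, y)
print([round(c, 6) for c in coef])   # intercept, INCOME, INVENTORY, experience
```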
4.8. Discussion
Previous earnings management research has been conducted empirically at the corporate level, inferring managerial actions from financial data. This study examines the theory in a behavioral setting at a divisional or plant level. As prior empirical research would predict, earnings management appeared to have occurred. Plant managers in this scenario were more apt to write off inventory
if its negative impact on income did not unfavorably affect their bonus. These
subjects had already missed their income targets, and therefore their bonuses, so
they risked little from a personal financial perspective by the write-off of the item.
Subjects who were above budgeted levels (and still had a bonus at stake) were
more cautious about their willingness to write off the inventory and therefore
miss their income target and risk losing their bonus. The results strongly support
the empirical research that suggests that some managers manipulate earnings by
expensing costs in fiscal years where net income expectations are not realized.
The impact of the inventory frame creates the potential for interesting research.
Many might have suggested that the results of the test of earnings management
would be a “given.” However, the probability of a write-off is actually higher
when the information is framed negatively and the executive would lose his/her
bonus by taking the write-off (mean = 67.64), than when the information is
framed positively and there is no chance the executive would receive a bonus
(mean = 63.13). While the difference in results is not statistically significant
and could be the result of random fluctuation, it does have interesting behavioral
implications. The differences in information frames were designed to be subtle and were not necessarily meant to mislead the reader. If these small changes in information presentation could yield differences in decision making in a situation where outcomes were more or less expected, how might framing affect other, less obvious decisions? This suggests that framing could be important in other accounting decisions as well.
Managers receive and communicate information about decisions every day. If the
frame impacts the decision making in such a seemingly predictable decision as in
the earnings management situation, it has the potential to impact other decisions.
Are managers aware of the potential impact of framing on their decision making?
What should they be alert to in the decision-making analysis?
5. CONTRIBUTIONS AND IMPLICATIONS FOR FURTHER STUDY
This study makes some interesting contributions to the body of accounting re-
search. First, it is a behavioral test of Healy’s bonus maximization theory. Previous
work has examined the theory indirectly by empirically testing the relationship
between various operating results. This study directly investigates the decision making of practicing accountants, many of whom are in a position to make decisions on the write-off of inventory. The scenario places them in a position to manage earnings by writing off inventory. The results support this theory.
Second, the study examines the earnings management question in a managerial
accounting setting at an operational level. Most previous studies have viewed earnings management from a company-wide perspective. But earnings management
could occur throughout the firm. Lower level managers could be manipulating
earnings to achieve their established goals. This could be accomplished with or
without their superiors’ knowledge and in line with or in opposition to firm goals.
Earnings management at various levels of the firm could lead to overall corporate
earnings management, but could also be performed in isolation. No previous
research has tested this possibility. Additional research should be conducted in
this area to confirm earnings management at the divisional level and to determine
what motivates managers to make these types of decisions. Perhaps the segment
reporting requirements will provide additional opportunities to explore earnings
management at divisional levels.
Another interesting outcome was the effect of the frame on the write-off decision. It is notable that something as simple and indirect as the frame of the information presentation (inventory frame) could have a marginally significant impact on results. These results raise the question of what other behavioral factors could influence managers’ decisions to manage earnings, and they provide a basis for future research into the effects of framing on discretionary managerial decisions.
NOTES
1. Burgstahler and Dichev (1997), Wu (1997), Cahan et al. (1997), Rees et al. (1996),
Healy (1996), Amir and Livnat (1996), Bernard and Skinner (1996), Dechow et al. (1996),
Dechow et al. (1995) are just a few of the most recent examples.
2. Schipper (1989) states that although there is a potential incentive for earnings
management at the divisional level, research in that area is “sparse to non-existent.”
3. White (1970) found no evidence of earnings management.
4. Watts and Zimmerman (1978) suggest that political costs and debt violations also
affect managers’ motivations to manipulate earnings. These factors would most likely
impact earnings management at the corporate level. The current research examines earnings
management at the plant level, and does not explicitly test for these other factors.
5. Student members or members reporting their employment status as retired were
excluded from the population.
6. Based on 1998 salaries and total compensation reported by Schroeder and Reichardt
(1999).
7. ANOVA tests were conducted on the sample excluding those individuals who
answered this question incorrectly. Results did not differ greatly from the entire test
sample. The p-value for the variable INCOME was p = 0.0005 for this group and 0.0010
for the full sample; for INVENTORY p = 0.0671 and 0.0627 respectively.
ACKNOWLEDGMENTS
We would especially like to thank Elizabeth Cole, Tim Fogarty, Pete Poznanski,
Ray Stephens and Linda Zucca for their helpful comments and assistance, and the
Institute of Management Accountants for its support. We
gratefully acknowledge the financial support received from the Research Council
of Kent State University.
REFERENCES
Amir, E., & Livnat, J. (1996). Multiperiod analysis of adoption motives: The case of SFAS No. 106.
The Accounting Review, 71(4), 539–553.
Ayers, S., & Kaplan, S. E. (1993). An examination of the effect of hypothesis framing on auditors’
information choices in an analytical task. Abacus, 29(2), 113–131.
Barnea, A., Ronen, J., & Sadan, S. (1976). Classificatory smoothing of income with extraordinary
items. The Accounting Review, 52(2), 110–122.
Beeler, J. D., & Hunton, J. E. (2002). Contingent economic rents: Insidious threats to audit indepen-
dence. Advances in Accounting Behavioral Research, 5, 21–50.
Bernard, V. L., & Skinner, D. J. (1996). What motivates managers’ choice of discretionary accruals?
Journal of Accounting and Economics, 22(1–3), 313–325.
Burgstahler, D., & Dichev, I. (1997). Earnings management to avoid earnings decreases and losses.
Journal of Accounting and Economics, 24(1), 99–126.
Cahan, S. F., Chavis, B. M., & Elemendorf, R. G. (1997). Earnings management of chemical firms in
response to political costs from environmental legislation. Journal of Accounting, Auditing &
Finance, 12(1), 37–65.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1995). Detecting earnings management. The Accounting
Review, 70(2), 193–225.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1996). Causes and consequences of earnings
manipulations: An analysis of firms subject to enforcement actions by the SEC. Contempo-
rary Accounting Research, 13(1), 1–36.
Dillman, D. A. (1978). Mail and telephone surveys – The total design method. New York, NY: Wiley.
Elliott, J. A., & Shaw, W. H. (1988). Write offs as accounting procedures to manage earnings. Journal
of Accounting Research, 26(Suppl.), 91–119.
Financial Accounting Standards Board (1992). Original pronouncements – accounting standards –
Volume II. Norwalk, CT.
Guidry, F., Leone, A. J., & Rock, S. (1999). Earnings-based bonus plans and earnings management by
business unit managers. Journal of Accounting and Economics, 26(1–3), 113–142.
Healy, P. M. (1985). The effect of bonus schemes on accounting decisions. Journal of Accounting and
Economics, 7(1–3), 85–107.
Healy, P. M. (1996). Discussion of a market-based evaluation of discretionary accrual models. Journal
of Accounting Research, 34(3), 107–115.
Hepworth, S. R. (1953). Smoothing periodic income. The Accounting Review (January), 32–39.
Johnson, P. E., Jamal, K., & Berryman, R. G. (1991). Effects of framing on auditor decisions. Organi-
zational Behavior and Human Decision Processes, 53(2), 75–105.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Economet-
rica, 47(2), 263–291.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist (April),
341–350.
Levitt, A. (1998). The numbers game (September 28th). New York, NY: NYU Center for Law and
Business.
Lipe, M. G. (1993). Analyzing the variance investigation decision: The effects of outcomes, mental
accounting and framing. The Accounting Review, 68(4), 748–764.
McNichols, M., & Wilson, G. P. (1988). Evidence of earnings management from the provision for bad
debts. Journal of Accounting Research, 26(Suppl.), 1–31.
O’Clock, P., & Devine, K. (1995). An investigation of framing and firm size on the auditor’s going
concern decision. Accounting and Business Research, 25(99), 197–201.
Puto, C. P. (1987). The framing of buying decisions. Journal of Consumer Research, 14(3), 301–315.
Rees, L., Gill, S., & Gore, R. (1996). An investigation of asset write-downs and concurrent abnormal
accruals. Journal of Accounting Research, 34(3), 157–169.
Ronen, J., & Sadan, S. (1975). Classificatory smoothing: Alternative income models. Journal of
Accounting Research, 3(4), 133–149.
Rutledge, R. W. (1995). The ability to moderate recency effects through framing of management
accounting information. Journal of Mathematical Economics, 11(2), 27–40.
Schipper, K. (1989). Commentary on earnings management. Accounting Horizons, 3(4), 91–102.
Schroeder, D., & Reichardt, K. (1999). IMA 98 Salary Guide. Strategic Finance, 8(20), 28–41.
Shields, M. D., Solomon, I., & Waller, W. S. (1987). Effects of alternative sample space representation
on the accuracy of auditor’s uncertainty judgments. Accounting, Organizations and Society,
12(4), 375–385.
Walsh, P., Craig, R., & Clarke, F. (1991). Big bath accounting using extraordinary items adjustments:
Australian empirical evidence. Journal of Business Finance and Accounting, 18(2), 173–189.
Watts, R., & Zimmerman, J. (1978). Towards a positive theory of the determination of accounting
standards. The Accounting Review, 53(1), 112–134.
White, G. E. (1970). Discretionary accounting decisions and income normalization. Journal of
Wilson, T. B. (2001). What’s hot and what’s not: Key trends in total compensation. Compensation &
Benefits Management, 17(2), 45–50.
Wu, Y. W. (1997). Management buyouts and earnings management. Journal of Accounting, Auditing &
Finance, 12(4), 373–389.
Zucca, L. J., & Campbell, D. R. (1992). A closer look at discretionary write-downs of impaired assets.
Accounting Horizons, 6(3), 30–41.
APPENDIX
Scenario
You are the plant accountant for a Cleveland area plant of the Spring Wire Company.
The responsibilities of your position include the processing of payroll, payments
to vendors (accounts payable), inventory accounting, preparation of budget and
estimates, and analysis of actual plant operating results. All members of the plant
staff (including yourself) are given a bonus contingent on achieving or exceeding
the plant’s operating budgeted net income of $1,500,000. If the budgeted operating
income is achieved, 0.5% of the current year’s net income will be paid to you in the
form of a bonus. (e.g. if net income is $1,510,000, your bonus would be $7,550.)
It is January 1, and you have received estimated net income for the year of
$1,502,000 [$1,400,000]. In past years, these early results have proved to be
accurate, with few unexpected adjustments made after this date.
You have one last chance to review the status of your inventory that was taken on
December 31st to determine if any potentially obsolete inventory items should be
written off. You are presented with the following information from the Inventory
and Materials Manager (also a staff manager) concerning the inventory item in
question.
Part Number PX23415 is sold to computer manufacturers. It has a current inven-
tory of 5,000 units on hand with a total current inventory value of $15,000. Your
plant’s total inventory including Part Number PX23415 is $350,000. The demand
for this product is 15% of last year’s demand [Industry sales of this product have
demonstrated an 85% decline in both volume and dollar amounts in the last year].
The inventory turnover ratio for this item has declined substantially from the
prior year. Of the original market for the product, about 20% of your competitors
remain [Approximately 80% of your competitors in the market for this product
have ceased production and sales]. No sales occurred during the months of
November or December for your company. Because of the nature of this product,
the potential for this part to be sold in the replacement parts market does not exist.
(1) Please indicate the percent probability in your opinion that this inventory will
be sold. (0–100%)
(2) Please indicate the percent probability that you would write off Part Number
PX23415 from inventory. (0–100%)
For questions 3 through 5, place an “X” on the box that best indicates your opinion.
(3) How risky do you feel it is for the inventory to remain on the books?
Very Risky Not Risky
(1) (2) (3) (4) (5) (6) (7)
(4) Indicate on the scale below your perception of what is occurring in the
marketplace to the demand for this part?
Significantly Decreased No Change
(1) (2) (3) (4) (5) (6) (7)
(5) Would you consider the bonus an important component of your income?
Very Important Not Important
(1) (2) (3) (4) (5) (6) (7)
(6) Did you achieve the operating budget prior to the inventory write-off decision?
Yes or No
THE EFFECTS OF INCENTIVE
STRUCTURE AND GOAL DIFFICULTY
ON TIME PLANNING DECISIONS
WITHIN A BALANCED SCORECARD
FRAMEWORK
Brad Tuttle and Mark J. Ullrich
ABSTRACT
Recent innovations in management control systems, such as the Balanced Scorecard System, reflect today’s complex business environment by accounting for performance in multiple areas. When individuals must allocate their time among multiple competing areas, the manner in which incentives are structured is hypothesized to influence their decisions differently depending on goal difficulty. A decision-making experiment was conducted to test this proposition. When incentives were structured so that each area of the Balanced Scorecard is rewarded separately, challenging goals received more planned attention than easy or unattainable goals, consistent with previous findings. When incentives were structured so that goals in all areas must be achieved together, the influence of goal difficulty on the time planning decision diverges from previous findings: areas having unattainable goals receive the same planned attention as areas having challenging goals. The results suggest that companies must consider how performance is rewarded within a Balanced Scorecard framework.

ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06006-X
INTRODUCTION
This study is motivated by today’s competitive business environment that requires
individuals to give their attention to many areas, all of which compete for their time.
Recent innovations in management accounting control systems, such as Kaplan
and Norton’s (1992, 1996) Balanced Scorecard, reflect this situation and attempt
to influence individuals to balance their time among multiple areas through the
establishment of goals, incentives, and accounting systems. While a great deal of
research has been conducted regarding the effects of incentives and goal difficulty
in relation to a single task (cf. Bonner, Hastie, Sprinkle & Young, 2000; Camerer
& Hogarth, 1999; Cameron & Pierce, 1994; Jenkins, Gupta, Mitra & Shaw, 1998;
Wood & Locke, 1990), very little is known about the effects of these variables
on behavior in relation to accomplishing multiple tasks (Ashford & Northcraft,
in press; Locke & Latham, 1990) as addressed by the Balanced Scorecard.
Research into the effects of incentives and goal difficulty on behavior within
a Balanced Scorecard framework is needed for several reasons. Foremost is the
fact that the kinds of incentive structures that are possible when multiple tasks are
involved have received scant attention in the literature. For instance, incentives
associated with a Balanced Scorecard can be structured so that rewards are
received only after meeting the goals in all areas. Alternatively, Balanced Scorecard
areas can be decoupled so that rewards are provided after meeting the goals associated
with individual areas. Furthermore, achieving the goal in one area may be easy, while
achieving the goal in another may be very challenging. The combinations of these
possibilities add a level of complexity to the Balanced Scorecard environment that
the existing literature has largely overlooked. For these reasons, Ashford and Northcraft
(in press) call for more research into decision-making when multiple tasks
compete for an individual’s time and attention. The use of the Balanced Scorecard
as a management tool has increased the need for this research.
Naylor, Pritchard and Illgen (1980) posit a theory, hereafter NPI theory,
suggesting that when individuals are faced with multiple objectives, how they
allocate their time among the areas that compete for their time is more important
to achieving overall satisfactory results than the total amount of time spent
working on all of their goals. This distinction has been termed direction of effort
versus level of effort (Blau, 1986, 1993). Because the many studies that examine
goal difficulty and incentives typically use only a single goal and single task,
they address only level of effort. The effects of incentives and goal difficulty on
direction of effort remain largely unexplored.1
As a practical issue, organizations would benefit from a better understanding
of how incentives and goal difficulty interact to influence how individuals expect
to use their time among their areas of responsibility. The Balanced Scorecard
System (Kaplan & Norton, 1992, 1996) is based on the premise that overall
performance is improved when goals in all areas are reached together. Failure
on one dimension cannot be completely compensated for by success in others.
Conceptually then, organizations may desire to reward individuals only when
they achieve satisfactory performance in all of the Balanced Scorecard areas.
One finding of goal research, however, is that while challenging goals generally
motivate more effort than easy goals (Wood & Locke, 1990), unattainable goals
often do not and can sometimes have large negative consequences (Fatseas &
Hirst, 1992; Lee, Locke & Phan, 1997; Mowen, Middlemist & Luther, 1981;
Wright, 1992). This being the case, basing rewards on areas coupled together via
a comprehensive control system may produce unintended consequences when
information suggests that goals in one or more areas are unattainable. Research
is clearly needed to answer these types of practical questions.
A theoretical justification for this study is that, for Balanced Scorecard systems
to work, they must affect the plans of individuals. Without premeditated, goal-
directed planning, individuals do not control their environments but are controlled
by them. This notion is consistent with the idea that closely related constructs like
goal commitment, goal motivation, and intentions affect goal-related performance
(Locke, Latham & Erez, 1988). If the Balanced Scorecard does not motivate
individuals sufficiently to alter their plans about where they will spend their time,
it is difficult to argue that they are committed to it (cf. Naylor & Illgen, 1984,
p. 98), and its objectives are unlikely to be achieved. Hence, this study builds on the theoretical
foundation from prior studies by looking at the time planning decisions of
the subjects.
Studies that examine the effects of planning and intentions on performance
generally conclude that these variables have a stronger effect than most other
variables. For instance, Chesney and Locke (1991) find that identifying an
appropriate strategy for completing a complex task in the initial planning stage
has a greater effect on performance than does goal difficulty. Early, Wojnaroski
and Prest (1987) find that planning is positively associated with performance
in both the laboratory and the field. In a study by Cotton and Tuttle (1986),
intentions predicted subsequent behavior more reliably than any other variable
they identified in the literature. McAllister, Mitchell and Beach (1979) find
that individuals who planned to spend more time on a task actually did spend
more time on it and thus conclude that intentions are positively related to
performance.
Also from a theoretical viewpoint, this research extends the findings of many
goal studies that employ tasks having a production-line orientation to a context that
more closely resembles those encountered by individuals in management roles.
Managers operate in environments that inherently place many demands on their
time at once. Although prior goal research has examined complex as well as simple
tasks (Chesney & Locke, 1991; Wood & Locke, 1990; Wood, Mento & Locke,
1987), subjects have typically worked towards only a single goal. Single-objective
settings are more typical of unskilled or process-oriented jobs than of management-level
positions. In contrast, allocating one’s time and attention among competing demands
is central to what managers do. A manager’s time is his or her most valuable and scarce
resource, and how that resource is allocated likely makes the most difference
to what gets accomplished (Miodonski, 1999; Plack, 2000). Few studies have
looked into factors that influence time allocation between tasks in a managerial
context.
Investigating the difficulty of the goal is also important in a Balanced Scorecard
framework. Information about goal difficulty is an integral, if not a necessary,
component of the successful achievement of most important goals (Wood &
Locke, 1990) and is a major rationale for the existence of the Balanced Scorecard.
Simply put, a “goal” without the ability to assess one’s position relative to it
is not much of a goal. Nevertheless, only a very small portion
of the goal literature examines behavior in a setting where information about the
level of goal difficulty in one area permits subjects to shift their time to or from
other relevant, work-related areas. Yet, this is exactly what is possible within a
Balanced Scorecard system.
BACKGROUND AND HYPOTHESES

Organizations set goals with the purpose of influencing their members to spend
their time on the areas that are deemed important to the organization. Considerable
theory, as well as anecdotal evidence from the field, suggests that goals influence
the way individuals allocate their time. Gollwitzer and Bargh’s (1996) action phases
model stresses planning, including how much time to allocate, as an antecedent of
goal-directed effort. Ajzen’s (1987) theory of planned behavior explicitly identifies
intentions as leading to purposeful, goal-directed behavior (Ajzen & Madden,
1986). In the popular literature, Stephen R. Covey’s (1989) The Seven Habits of
Highly Effective People stresses the importance of planning one’s time to achieve
specific goals. His time management planning forms explicitly ask individuals
to clearly define and record their goals directly on the time forms. Interestingly,
Covey asserts that multiple areas should be addressed and that individuals should
think about how much time they are willing to devote to each one.
Two generally accepted findings of goal research are that challenging goals
are more motivating than easy goals, and that impossible goals are often rejected
and, therefore, are less motivating than easy or challenging goals (Fatseas &
Hirst, 1992; Lee, Locke & Phan, 1997; Mowen et al., 1981; Wright, 1992). Erez,
Gopher and Arzi (1990) partially extend these conclusions to multiple tasks and
find that proportionately more attention is allocated to more difficult tasks. To the
extent that these findings generalize to time planning decisions by individuals,
they suggest that individuals will plan to spend more time on challenging goals
and less time on easy or unattainable goals.
Incentive Structure and Goal Difficulty
Bonner, Hastie, Sprinkle and Young (2000) refer to incentives, within the context
of a management control system, as the presence or absence of motivators linked
to performance. They differentiate incentives from incentive type, which refers to
how pay is tied to performance, and provide the following major categories: flat
rate, piece rate, variable ratio, quota (or goal), and tournament. As such, incentive
type refers to how incentives are tied to performance that is generally associated
with a single task. This differs from incentive structure, which is used in this
paper to refer to the way incentives are structured between tasks as in the multiple
areas of the Balanced Scorecard. An incentive structure may consist of just one
incentive type or of multiple incentive types across various performance measures
associated with different tasks and goals.
Organizations often implement monetary incentives to motivate goal-congruent
behavior. These incentives are designed to motivate individuals to increase their
goal-related effort by making the goal more attractive to attain (Vroom, 1964), by
reinforcing performance (Komaki, Coombs & Schepman, 1996), by motivating
individuals to set more or higher goals (Wright, 1991) or by increasing the
acceptance of difficult goals (Locke, Latham & Erez, 1988). Given the importance
of time planning to goal accomplishment, incentives should motivate individuals
to plan sufficient time to meet their goals. Evidence shows that performance-based
incentives increase the amount of time individuals spend on a task (Awasthi
& Pratt, 1990; Libby & Lipe, 1992; Sprinkle, 2000; Stone & Zeibart, 1995; Tuttle
& Burton, 1999; Tuttle & Harrell, 2001). Some research, however, suggests that the
relationship between incentives and behavior is not direct but is contingent upon
the type of incentive being offered (Bonner et al., 2000) and the difficulty of the goal
(Wright, 1991).
Using NPI theory as a framework, Wright (1991) suggests that goal difficulty
and the structure of incentives interact to determine effort. Wright argues that
incentives will have a negative effect when effort is costly and does not result
in extrinsic rewards. To illustrate, consider the case where an individual is paid
a salary with the possibility of earning a bonus if a high level of customer
satisfaction is achieved. Under these conditions, individuals who just miss their
goal, and thus receive no bonus, may feel punished. These individuals may be less
motivated than had they been provided the same goal but no monetary incentive.
Mowen et al. (1981) and Fatseas and Hirst (1992) find that goal-contingent
incentives produce significantly lower performance than either piece-rate or
fixed-pay incentives. Wright (1991) concludes that reward contingency (i.e. the
way in which incentives are tied to performance) may have a direct effect on an
individual’s personal goals, commitment to assigned goals and performance.
When individuals are faced with demands on their time in multiple areas, the
manner in which incentives are structured among those areas is likely to interact
with goal difficulty on how they plan their time. To illustrate, consider first an
incentive structure in which each individual area of the Balanced Scorecard is
associated with its own monetary incentive and in which individuals can adjust
their time both within an area and also between areas. A manager in this situation
would not need to allocate much time to sure winners to achieve results and can be
expected to reallocate his/her effort to goals in areas that are achievable but
challenging, thus increasing his/her overall expected remuneration. In
this case, the negative effects predicted by Wright (1991) are isolated to areas
with unattainable goals and are likely to be very strong since other Balanced
Scorecard areas compete for the manager’s attention. The manager is predicted to
shift his/her time to achieving challenging but attainable goals in other areas.
Now consider an alternative incentive structure in which a monetary incentive
is received only if the goals associated with all areas are reached. In this case, as
in the first, easy goals may not need much attention; and some of the manager’s
time may be shifted toward more challenging areas. However, the presence of
an unattainable goal eliminates the chance to receive the monetary incentive
regardless of performance in other areas. Given the arguments by Wright (1991)
based on NPI theory and the results of prior studies, we predict that this situation
will have negative motivational effects on all areas, not just on the areas in which
satisfactory performance is unattainable. Together, the two situations suggest that the shift in
attention will be greater when incentives are based on goal attainment in each area
separately than when incentives are based on goal attainment for all areas as a set.
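The contrast between the two incentive structures can be made concrete with a small expected-bonus calculation. The sketch below is illustrative only and rests on assumptions that are not part of the study: under the “each area separately” structure each of the four areas is assumed to carry an equal share of the 20% bonus, success in the four areas is assumed to be independent, and the salary and success probabilities are made-up numbers. The function names are hypothetical.

```python
import math

BONUS_RATE = 0.20  # the case offered a bonus of up to 20% of salary


def expected_bonus_separate(p_success, salary, bonus_rate=BONUS_RATE):
    """Hypothetical 'each area separately' rule: every area carries an equal
    share of the bonus, so the expected bonus scales with the average
    probability of meeting each area's goal."""
    return salary * bonus_rate * sum(p_success) / len(p_success)


def expected_bonus_set(p_success, salary, bonus_rate=BONUS_RATE):
    """'All areas as a set' rule: the bonus is paid only if every goal is met.
    Assuming independent areas, that probability is the product."""
    return salary * bonus_rate * math.prod(p_success)


if __name__ == "__main__":
    salary = 100_000  # hypothetical salary
    # Customer goal unattainable (p = 0); the other three are challenging.
    before = [0.0, 0.6, 0.6, 0.6]
    # Extra time raises the Financial goal's chance of success to 0.9.
    after = [0.0, 0.9, 0.6, 0.6]
    for rule in (expected_bonus_separate, expected_bonus_set):
        gain = rule(after, salary) - rule(before, salary)
        print(f"{rule.__name__}: expected gain from extra effort = {gain:.0f}")
```

Under these assumptions, extra effort in a competing area raises the expected bonus under the separate structure but leaves it at zero under the set structure once any one goal is unattainable, which is the mechanism behind the predicted spillover of negative motivational effects.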
Often, goals are set to be challenging (but attainable) from the outset. However,
environmental factors, such as changes in competition, can cause goal difficulty
to change midstream. Likewise, the manager may initially spend too much
or too little time in a particular area so that ultimately achieving satisfactory
performance in that area is almost certain or very unlikely. Under these conditions,
sustaining the same attention to these areas is counterproductive and adjustments
are necessary. For this reason, organizations develop accounting systems that
provide continuous information to individuals regarding goal difficulty (Anthony
& Govindarajan, 1998). Thus, information produced by the Balanced Scorecard is
intended to help individuals adjust their plans for achieving goals (Wood & Locke,
1990). The previous findings on goal difficulty should therefore also apply to
information about goal challenge communicated by the Balanced Scorecard. Accordingly,
the following hypotheses are proposed regarding managers’ time planning decisions:
H1. Managers will shift time from Balanced Scorecard areas that information
indicates are easy to areas that information indicates are challenging.
H2. Managers will shift time from Balanced Scorecard areas that information
indicates are unattainable to areas that information indicates are challenging.
H3. Shifts in time from Balanced Scorecard areas that information indicates
are easy to areas that information indicates are challenging will be greater when
incentives are based on goal attainment in each area separately than when
incentives are based on goal attainment for all areas as a set.
H4. Shifts in time from Balanced Scorecard areas that information indicates are
unattainable to areas that information indicates are challenging will be greater
when incentives are based on goal attainment in each area separately than when
incentives are based on goal attainment for all areas as a set.
Fig. 1. Predicted U-Shaped Functions.
Note that H1 and H2 combine to suggest that a manager’s time allocation will
follow an inverted U-shaped function in relation to goal difficulty for a single
Balanced Scorecard area. Furthermore, as a result of shifting time to and from
competing areas, the time allocated to these other areas will resemble a righted
U-shaped function in relation to the goal difficulty of the single target area (holding
goal difficulty constant for the other competing areas). Figure 1 expresses this
relationship and is consistent with several models of motivation beginning as
early as Atkinson (1958). H3 and H4 imply that both U-shaped functions will be
flatter when incentives are provided only when goals are achieved in all areas in
comparison to when incentives are based on goal attainment for each area of the
Balanced Scorecard individually.
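The claim that the competing areas trace an upright (“righted”) U whenever the target area traces an inverted U follows from simple bookkeeping if total goal-related hours stay fixed, an assumption in line with NPI theory’s view that total work effort is stable. The symbols below are notation introduced here for illustration, not the authors’: c is the hours planned for the target (Customer) area, T the total goal-related hours across all four areas, ō the average hours planned for the three competing areas, and d the difference c − ō.

```latex
\bar{o} = \frac{T - c}{3}, \qquad
d = c - \bar{o} = c - \frac{T - c}{3} = \frac{4c - T}{3},
\qquad
\frac{\partial \bar{o}}{\partial c} = -\frac{1}{3}, \quad
\frac{\partial d}{\partial c} = \frac{4}{3}.
```

With T held constant, each extra hour planned for the target area lowers the competing-area average by one third of an hour and raises d by four thirds of an hour, so the two functions are mirror images of each other.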
METHOD
Experimental Design
A decision-making experiment was conducted in which participants were
randomly assigned to one of six cells in a 2 × 3 design. Incentive structure was
manipulated between subjects at two levels by making the likelihood of promotion
and receiving a 20% bonus contingent upon achieving goals in four Balanced
Scorecard areas either: (1) individually; or (2) as a set. Goal difficulty was
manipulated between subjects at three levels as being: (1) easy; (2) challenging
but attainable; or (3) not attainable.
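The 2 × 3 between-subjects design can be sketched as a random assignment of participants to six cells. The paper does not say how randomization was carried out; the balanced shuffle below (repeat the six cell labels, then shuffle them) is one common approach and is an assumption, as are the function and variable names.

```python
import itertools
import random
from collections import Counter

INCENTIVE_STRUCTURES = ("separate", "set")  # goals rewarded individually vs. as a set
GOAL_DIFFICULTIES = ("easy", "challenging", "unattainable")  # Customer goal only

# The six cells of the 2 x 3 design.
CELLS = list(itertools.product(INCENTIVE_STRUCTURES, GOAL_DIFFICULTIES))


def assign_conditions(participant_ids, seed=None):
    """Balanced random assignment: repeat the cell labels to cover everyone,
    shuffle them, and pair them with participants, so that cell sizes differ
    by at most one."""
    ids = list(participant_ids)
    rng = random.Random(seed)
    repeats = len(ids) // len(CELLS) + 1
    labels = (CELLS * repeats)[: len(ids)]
    rng.shuffle(labels)
    return dict(zip(ids, labels))


if __name__ == "__main__":
    assignment = assign_conditions(range(193), seed=2003)
    print(Counter(assignment.values()))
```

Simple (unblocked) randomization would also be consistent with the text; the balanced version is shown only because it keeps cell sizes close before any exclusions are made.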
Decision Task and Materials
As shown in the Appendix, all participants were projected into the role of a
unit-level manager who was to plan his/her time among four areas corresponding
to a typical Balanced Scorecard: Customer, Financial, Internal Business, and
Learning & Growth. Example performance measures for each area were presented
along with information about goal difficulty. These four goals and associated
performance measures were derived from the Balanced Scorecard used by Mobil
Oil’s domestic marketing and oil refining division (Kaplan, 1997a, b). All subjects
received the same four areas and goals.
The participants were informed that they were being considered for a promotion
and that the corporation offered a performance bonus of up to 20% of their salary,
both of which were linked to their goals. About half of the participants were
informed that the likelihood of promotion and bonus depended upon “how many
goals you achieve” while the remaining participants were informed that their
promotion and bonus depended upon “achieving all four goals together.” This
constituted the incentive structure manipulation with one group’s bonus based
on achieving goals in individual areas and the other group’s bonus based on
achieving goals in the entire set of Balanced Scorecard areas. Thus, all subjects
were provided with a possibility to achieve the same reward; only the manner in
which the incentive was structured varied between groups.
Goal difficulty was manipulated for the Customer area by providing the subjects
with “reliable feedback suggesting that the Customer goal is” easily attainable,
challenging but attainable, or not attainable, depending on their experimental
condition. This resulted in three conditions: Easy, Challenging, and Unattainable.
Goal difficulty was held constant for the other three areas (Financial, Internal
Business, and Learning & Growth) at a “challenging but attainable level” for all
participants.
All participants were informed that they could work as many hours per week
as they wished and that they were free to allocate their work hours as they saw
fit except that they must spend 15 hours per week on tasks unrelated to their four
goals.2 The rest of their time at work was to be devoted to achieving the goals in the
four areas. The participants were then asked how they would allocate their hours
at work to achieve the goals in each of the four Balanced Scorecard areas. Thus,
the hours-per-week the subjects intended to work were collected for each goal
resulting in planned time to spend on Customer, Financial, Internal Business, and
Learning & Growth goals. The sum of these four responses is the total goal-related
hours per week. To measure the relative amounts of time allocated to the various
goals, the difference between the time allocated to the (manipulated) customer area
and the average time allocated to the three other areas was computed. Positive numbers
indicate more time allocated to the customer goal than to the other three goals on
average. Negative numbers indicate more time allocated to the three competing
goals, on average, than to the customer goal. Hence, this measure reflects the
relative emphasis that the subjects placed on the manipulated goal in comparison
to the other challenging goals competing for their time.3
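The two dependent measures described above can be written as a short function. This is a minimal sketch: the function name and the example hour allocations are hypothetical, not data from the study.

```python
from statistics import mean


def planned_time_measures(customer, financial, internal_business, learning_growth):
    """Return (total goal-related hours, relative emphasis on Customer).

    Relative emphasis = Customer hours minus the average hours planned for
    the three competing areas; positive values mean relatively more time on
    the (manipulated) Customer goal.
    """
    others = (financial, internal_business, learning_growth)
    total = customer + sum(others)
    emphasis = customer - mean(others)
    return total, emphasis


if __name__ == "__main__":
    # Hypothetical weekly plans (hours per week).
    print(planned_time_measures(12, 9, 8, 7))    # relatively more time on Customer
    print(planned_time_measures(6, 10, 10, 10))  # Customer goal de-emphasized
```

Both hypothetical plans total the same number of goal-related hours yet receive opposite-signed emphasis scores, which is why the difference measure captures direction of effort rather than level of effort.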
After the dependent measures were collected, subjects responded to a goal
difficulty manipulation check in which they selected the goal difficulty information
that they received in the case from among the three possibilities. Likewise, in
order to check the incentive manipulation, the participants selected the incentive
manipulation statement that they received in the case. Next, the participants
were asked two questions regarding the valence of the incentives and their
effort-to-performance expectancy. The valence question asked how attractive the
bonus and promotion were, using a nine-point Likert scale anchored by “1 = very
unattractive” and “9 = very attractive.” The effort-to-performance expectancy
question asked the subjects to rate how likely they would be to accomplish all four
goals if they exerted maximum effort, using a nine-point Likert scale anchored by
“1 = very unlikely” and “9 = very likely.”4
Finally, the participants were asked to provide demographic information. The
data were gathered during regularly scheduled classes. Participation was voluntary
and anonymous and the experiment took about 15 minutes to complete.5
Participants
One hundred and ninety-three Professional MBA students participated in the
study. Of these, 10 provided incomplete responses and 18 failed one or both
manipulation checks; these were deleted, leaving 165 usable responses. Because the
Professional MBA is a 12-month program that requires an undergraduate degree in
business and work experience, the subjects were quite homogeneous and well qualified
to perform the task. About 26% of the participants were women. The typical
participant was 30.3 years old with 7.8 years of work experience. On average,
the largest number of people the participants had ever supervised was 22. The participants tended
to agree with the statement, “My advancement and/or compensation at work is
contingent upon achieving a goal or goals . . .” in the same four areas used in
the experimental materials. On a scale of 1 = disagree and 7 = agree, the mean
response for this question is 4.7 for the customer area, 4.9 for the financial area,
5.3 for the internal business process area, and 4.9 for the learning and growth area.
Approximately 42% of the subjects stated that they are paid a bonus in addition to
their salary. These data suggest that the subjects have had exposure to the kinds of
goals, incentives and issues in the decision task and that the participants were, as
a result of their education and work experience, capable of providing meaningful
responses.
RESULTS
Preliminary Analysis
The demographic variables were tested to determine whether systematic differences
existed across cells. Chi-square tests show no significant differences across treatment
conditions (all p-values > 0.10) for any of the three categorical demographic
variables: gender, educational degree program, and current compensation plan.
Separate 2 × 3 ANOVAs were conducted for each continuous demographic
variable: the number of years of work experience, the maximum number of
individuals supervised, and the extent to which compensation at work was contingent upon
achieving a goal or goals in each of the four business areas. Incentive structure at
two levels and goal difficulty at three levels served as the independent variables.
The ANOVA results show no significant differences (all p > 0.10) for any
continuous demographic variable across cells. Thus, results from the analysis
of the demographic data suggest that randomization was effective and that the
subjects are homogeneous across treatment conditions.
The attractiveness of the incentives and the expectancy of accomplishing
the goals in all four areas (given maximum effort) should not differ between
incentive structures. The data support this proposition in that the attractiveness
of the incentives based on each area separately (mean = 8.11) does not differ
significantly from the attractiveness of the incentives based on achieving the goals
in all areas (mean = 8.28, t = 1.0441, p = 0.2980). Incentive structure was not
predicted to affect goal challenge but to interact with goal challenge to affect
motivation. Consistent with this notion, the expectancy of accomplishing the goals
in all four areas when incentives were based on each area (mean = 6.94) does not
differ significantly from that when incentives were based on all areas (mean = 7.24, t = 0.8874,
p = 0.3762).
The expectancy of accomplishing all goals, however, should differ by goal
difficulty so that the expectancy should decrease with goal difficulty. The results
generally support this proposition in that easy and challenging goals (means 7.84
and 7.81, respectively) are seen as more likely to be accomplished (p = 0.0001)
than unattainable goals (mean = 5.78).
The total amount of time the subjects planned to work in a week, as shown in
Table 1, was not affected by incentives or goal difficulty. The finding that subjects
do not adjust their workweek for incentives or goal difficulty is consistent with
Naylor, Pritchard and Illgen (1980) who assert that total work effort is stable
across most conditions other than those associated with individual differences.
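The reported degrees of freedom can be checked against the sample. The sketch below uses only numbers that appear in this section: 193 participants recruited, 10 incomplete responses, 18 failed manipulation checks, and the cell sizes in Panel B of Table 1, treating the 28/29 pair as the unattainable condition, which is what the error degrees of freedom imply. For a between-subjects ANOVA, the error df equal the number of observations minus the number of cells. The helper names are illustrative.

```python
# Cell sizes (n) from Panel B of Table 1, keyed by (goal difficulty, incentive structure).
CELL_N = {
    ("easy", "separate"): 30, ("easy", "set"): 27,
    ("challenging", "separate"): 25, ("challenging", "set"): 26,
    ("unattainable", "separate"): 28, ("unattainable", "set"): 29,
}

RECRUITED, INCOMPLETE, FAILED_CHECKS = 193, 10, 18


def usable_sample():
    """Participants remaining after the exclusions described under Participants."""
    return RECRUITED - INCOMPLETE - FAILED_CHECKS


def anova_error_df(difficulty_levels):
    """Error df for a between-subjects ANOVA restricted to the given
    difficulty levels: observations minus the number of cells."""
    cells = [n for (level, _), n in CELL_N.items() if level in difficulty_levels]
    return sum(cells) - len(cells)


if __name__ == "__main__":
    print(usable_sample(), sum(CELL_N.values()))
    print("Table 1:", anova_error_df({"easy", "challenging", "unattainable"}))
    print("Table 2:", anova_error_df({"easy", "challenging"}))
    print("Table 3:", anova_error_df({"challenging", "unattainable"}))
```

The same logic gives the model df: six cells leave 5 df for the full 2 × 3 model and four cells leave 3 df for the 2 × 2 models.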
Table 1. Total Planned Hours Summed Across All Four Areas.
Panel A: ANOVA
Source                                                     df      F         p
Goal difficulty (Easy vs. Challenging vs. Unattainable)     2    0.14    0.8714
Incentive structure (Separate vs. Set)                      1    1.75    0.1882
Goal difficulty × Incentive structure                       2    0.74    0.4771
Model                                                       5    0.66    0.6521
Error                                                     159
Panel B: Mean (Standard Deviation), Cell Size
                    Incentive Structure
Goal Difficulty     Each Goal Evaluated Separately    All Goals Evaluated As a Set
Easy                38.73 (7.12), n = 30              37.74 (6.57), n = 27
Challenging         40.68 (7.51), n = 25              37.23 (7.04), n = 26
Unattainable        38.79 (6.21), n = 28              38.69 (9.12), n = 29
Hypotheses Testing
The first hypothesis predicts that individuals will shift time from goals that
information indicates are easy to goals that information indicates are challenging.
Recall that goal difficulty was manipulated only for the customer goal and that
goal difficulty was held constant (i.e. challenging but attainable) for the other
three goals. As described in the Method section, relative time allocation was
measured as the difference between the time allocated to the (manipulated) customer
area and the average time allocated to the three other areas. The hypothesis
predicts that this difference should be larger (more positive) when the customer
goal is challenging than when it is easy.
The hypothesis was tested using a 2 × 2 ANOVA with the difference in time
allocated between the customer goal and the average of the other three goals as the
dependent measure. Goal difficulty (easy versus challenging but attainable) and
incentive structure (separate versus set) served as the independent variables. As
can be seen from Panel A in Table 2, the main effect for goal difficulty is highly
significant (F = 33.82, p = 0.0001). When the customer goal is challenging, then
all four goals are challenging. In the situation where all goals are challenging,
Panel B of Table 2 shows that the subjects allocated more time to the customer
area than to the other areas (mean difference = +1.77 hours), possibly reflecting
a bias towards taking care of customers or a belief that this area requires a greater
time commitment. In contrast, when the customer goal is easy, the subjects shift the
time they plan to spend toward the three other challenging goals (mean difference =
−3.10 hours). Hypothesis 1 is strongly supported.
Table 2. Difference^a in Time Allocated to Balanced Scorecard Areas Having
Easy or Challenging Customer Goals and Other Areas.
Panel A: ANOVA
Source                                        df       F         p
Goal difficulty (Easy vs. Challenging)         1    33.82    0.0001
Goal difficulty × Incentive structure          1     0.01    0.9243
Model                                          3    12.19    0.0001
Error                                        104
Panel B: Means
                      Incentive Structure
Goal Difficulty       Separately    As a Set    Average
Easy                    −2.51        −3.69       −3.10
Challenging              2.44         1.10        1.77
Incentive average       −0.04        −1.29
^a The difference is calculated as the time allocated to the customer area less the
average time allocated to the other areas, so that positive numbers reflect more
relative time spent in the manipulated customer area.
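The Average column and the Incentive average row in Panel B of Table 2 appear to be unweighted means of the cell means. The sketch below reproduces them and, for comparison, also computes cell-size-weighted means using the cell sizes in Table 1 (30 and 27 for the easy condition, 25 and 26 for the challenging condition). It is a consistency check on the published summary figures, not a reanalysis, and the function name is illustrative.

```python
from statistics import mean

# Table 2, Panel B: mean difference score by (goal difficulty, incentive structure).
CELL_MEANS = {
    ("easy", "separate"): -2.51, ("easy", "set"): -3.69,
    ("challenging", "separate"): 2.44, ("challenging", "set"): 1.10,
}
# Cell sizes from Table 1, Panel B.
CELL_N = {
    ("easy", "separate"): 30, ("easy", "set"): 27,
    ("challenging", "separate"): 25, ("challenging", "set"): 26,
}


def marginal_mean(position, level, weighted=False):
    """Average the cells whose key has `level` at index `position`
    (0 = goal difficulty, 1 = incentive structure), either as a plain
    mean of cell means or weighted by cell size."""
    keys = [k for k in CELL_MEANS if k[position] == level]
    if not weighted:
        return mean(CELL_MEANS[k] for k in keys)
    total_n = sum(CELL_N[k] for k in keys)
    return sum(CELL_MEANS[k] * CELL_N[k] for k in keys) / total_n


if __name__ == "__main__":
    for level in ("easy", "challenging"):
        print(level, round(marginal_mean(0, level), 2),
              round(marginal_mean(0, level, weighted=True), 2))
    for level in ("separate", "set"):
        print(level, round(marginal_mean(1, level), 3))
```

The unweighted version reproduces −3.10 and 1.77 exactly and matches the incentive averages to within rounding, whereas weighting by cell size moves the easy-condition average to about −3.07.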
The second hypothesis predicts that individuals will shift time from areas whose
goals information indicates are unattainable to areas whose goals information
indicates are challenging. The second hypothesis was tested in the same manner
to H1 using a 2 × 2 ANOVA with the difference in time allocated between the
customer area and the average of the other three areas as the dependent measure.
For this test, goal difficulty (challenging versus unattainable) and incentive
structure (separate versus set) served as the independent variables. As can be seen
from Panel A in Table 3, the main effect for goal difficulty is highly significant
(F = 10.14, p = 0.0019). This result is modified, however, by a significant goal
difficulty by incentive interaction as discussed below.
Overall results for H1 and H2 support the prediction that subjects shift the
time they are willing to spend from one area of responsibility to another due to
goal difficulty as described in Fig. 1. Figure 2 shows the results from the study
in the same graphic form as Fig. 1. Recall that the information indicated that all
non-customer goals (i.e. goals from competing areas) were challenging. As can be
seen, individuals react to goal difficulty information by shifting their time from
areas associated with easy goals to those associated with challenging goals, and
from unattainable goals to challenging goals, in a manner that supports our overall
prediction.
Table 3. Difference^a in Time Allocated to Balanced Scorecard Areas Having
Challenging or Unattainable Customer Goals and Other Areas.
Panel A: ANOVA
Source                                              df       F         p
Goal difficulty (Challenging vs. Unattainable)       1    10.14    0.0019
Model                                                3     7.44    0.0001
Error                                              104
Panel B: Means
                      Incentive Structure
Goal Difficulty       Separately    As a Set    Average
Challenging              2.44         1.10        1.77
Unattainable            −4.07         0.90       −1.59
Incentive average       −0.82         1.00
^a The difference is calculated as the time allocated to the customer area less the
average time allocated to the other areas, so that positive numbers reflect more
relative time spent in the manipulated customer area.
H3 and H4 suggest that incentive structure modifies the relationship between
goal difficulty and planned time, leading to the prediction that the interaction terms
in the analyses reported in Tables 2 and 3 should be significant. As can be seen in
Panel A of Table 2, incentive structure does not interact with goal difficulty (F = 0.01,
p = 0.9243), thus failing to support H3. Hence, the data do not suggest that
incentive structure modifies the amount of time subjects plan to spend on Balanced
Scorecard areas associated with easy versus challenging goals.
Panel B of Table 3 shows that when incentives are based on each goal separately,
information indicating that the customer goal is unattainable led individuals
to allocate more time to achieving competing goals (mean difference = −4.07)
than when information indicates that the customer goal is challenging
but attainable (mean difference = +2.44). These two conditions differ
significantly (p = 0.0001). However, when incentives are based on attaining all
goals as a set, the relative time allocated to the customer area and its competing
areas does not differ between the unattainable (mean difference = +0.90) and
challenging (mean difference = +1.10) conditions (p = 0.8894). This suggests
that incentive structure can modify the effects of goal difficulty, thus supporting
H4 while qualifying H2. Hence, the data suggest that incentive structure modifies
the amount of time subjects plan to spend in each Balanced Scorecard area when
those areas differ in terms of whether their associated goals are unattainable versus
challenging or easy.
Fig. 2. Observed U-Shaped Functions.
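The interaction examined by H3 and H4 can be read as a difference-in-differences of the cell means in Panel B of Tables 2 and 3: the simple effect of goal difficulty under the “separate” structure minus the same effect under the “set” structure. The sketch below computes these contrasts from the published cell means only. It does not reproduce the F tests, which require the individual responses, and the function names are illustrative.

```python
# Cell means of the difference score (customer hours minus the average of the
# other three areas), copied from Panel B of Tables 2 and 3.
MEANS = {
    ("easy", "separate"): -2.51, ("easy", "set"): -3.69,
    ("challenging", "separate"): 2.44, ("challenging", "set"): 1.10,
    ("unattainable", "separate"): -4.07, ("unattainable", "set"): 0.90,
}


def simple_effect(level_a, level_b, incentive):
    """Difference in mean emphasis between two difficulty levels
    under one incentive structure."""
    return MEANS[(level_a, incentive)] - MEANS[(level_b, incentive)]


def interaction_contrast(level_a, level_b):
    """Difference-in-differences: how much the simple effect changes when
    moving from the 'separate' to the 'set' incentive structure."""
    return (simple_effect(level_a, level_b, "separate")
            - simple_effect(level_a, level_b, "set"))


if __name__ == "__main__":
    for other in ("easy", "unattainable"):
        print(f"challenging vs. {other}:",
              round(simple_effect("challenging", other, "separate"), 2),
              round(simple_effect("challenging", other, "set"), 2),
              round(interaction_contrast("challenging", other), 2))
```

The contrast for challenging versus easy is only 0.16 hours (4.95 versus 4.79), in line with the non-significant interaction for H3, whereas the contrast for challenging versus unattainable is 6.31 hours (6.51 versus 0.20), the pattern underlying support for H4.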
Supplemental Analysis
Separate differential effects of the manipulations on the time allocated to each
of the three non-customer goals cannot be predicted. Nevertheless, in the spirit
of the study’s main premise that individuals consider all areas of the Balanced
Scorecard together as they plan their time, a supplemental analysis of these data
is reported. Panel A of Table 4 shows the results of a MANOVA in which the hours
allocated to the Financial goal, the Internal Business goal, and the Learning and
Growth goal are the dependent variables, with goal as a within-subjects factor.
Customer goal difficulty at three levels (easy, challenging, and unattainable)
and incentive structure at two levels (separate versus set) served as the independent
variables. As can be seen from the table, the analysis shows a three-way interaction
between goal, customer goal difficulty, and incentive structure (F(4, 318) = 2.44,
p = 0.0468), which makes the interpretation of other effects difficult.
Some insights into the interaction of these variables are possible by examining
the mean hours allocated towards attaining each of the three non-customer
goals, as well as hours allocated to the customer goal, as shown in Panel B of
Table 4. Consider first the case in which incentives are based on achieving each
goal separately. Here, the time allocated to the customer (manipulated) goal
follows the predicted inverted-U-shaped pattern (Fig. 1), and the time allocated to
each competing goal generally follows the predicted upright U-shaped pattern. As
hours are shifted to and from the customer goal according to its difficulty, the hours
are spread relatively consistently across the three competing goals. In contrast,
consider the case in which incentives are based on achieving all goals as a set. Here,
no perceptible difference in time allocation occurs between the challenging and
unattainable conditions across any of the four goals. When the customer goal is
challenging, the subjects allocated 10.1 hours to this goal, compared with 10.3 hours
when the goal is unattainable. Hours allocated to the other three goals in the
challenging versus unattainable conditions correspond closely as well: financial
goal = 8.6 and 9.6; internal business goal = 9.9 and 9.9; and learning and growth
goal = 8.6 and 8.9, respectively. These observations suggest that the pattern of
results shown in Fig. 2 is driven by the condition in which incentives reward each
goal separately, a conclusion that is consistent with H4.
This incentive structure more closely resembles those used in the prior studies
upon which the predictions were based (in contrast to incentives in which rewards
are received only after achieving an entire set of distinct, competing goals).
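The contrast drawn in this paragraph can be made concrete by subtracting, area by area, the challenging-condition means from the unattainable-condition means in Table 4, Panel B. This is a descriptive sketch only: it applies no significance test, and the data layout and the function name `shift_by_area` are illustrative choices, not part of the study.

```python
# Mean planned hours per week from Table 4, Panel B, for the two customer-goal
# conditions compared in the text.
# Tuple order: (Customer, Financial, Internal Business, Learning & Growth)
AREAS = ("Customer", "Financial", "Internal Business", "Learning & Growth")
CHALLENGING = {"separate": (12.0, 10.5, 9.9, 8.4), "set": (10.1, 8.6, 9.9, 8.6)}
UNATTAINABLE = {"separate": (6.6, 9.6, 11.3, 11.3), "set": (10.3, 9.6, 9.9, 8.9)}


def shift_by_area(incentive):
    """Unattainable minus challenging mean hours, rounded to the table's precision."""
    return {
        area: round(unattainable - challenging, 1)
        for area, challenging, unattainable in zip(
            AREAS, CHALLENGING[incentive], UNATTAINABLE[incentive]
        )
    }


for incentive in ("separate", "set"):
    print(incentive, shift_by_area(incentive))
```

Under separate incentives the customer area loses 5.4 hours, while under the set structure no area changes by more than 1.0 hour.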
Table 4. Time Allocated to Balanced Scorecard Areas Other than the Customer Area.
Panel A: MANOVA
Source                                                    df      F        p
Between-subjects effects
  Customer goal difficulty                                 2    3.03    0.0513
  Customer goal difficulty × Incentive structure           2    1.20    0.3040
Within-subjects effects
  Goal                                                     2    6.18    0.0023
  Goal × Customer goal difficulty                          4    2.11    0.0794
  Goal × Incentive structure                               2    0.71    0.4940
  Goal × Customer goal difficulty × Incentive structure    4    2.44    0.0468
Panel B: Mean Hours Allocated to Balanced Scorecard Areas
Incentive Based on Attaining Each Goal Separately
Customer Area   Customer   Financial   Internal Business   Learning & Growth
Easy               7.8        11.5          10.5                 9.0
Challenging       12.0        10.5           9.9                 8.4
Unattainable       6.6         9.6          11.3                11.3
Average            8.7        10.5          10.5                 9.5
Incentive Based on Attaining All Goals as a Set
Customer Area   Customer   Financial   Internal Business   Learning & Growth
Easy               6.7        10.4          11.2                 9.5
Challenging       10.1         8.6           9.9                 8.6
Unattainable      10.3         9.6           9.9                 8.9
Average            9.1         9.5          10.3                 9.0
DISCUSSION
Some strengths and limitations of the study should be mentioned before discussing
its findings. The study was conducted in the laboratory using a written exercise
designed to capture the essentials of managers' time allocation decisions. As such,
care should be taken when extrapolating the results to other contexts and situations.
On the other hand, the study employs a strong design and a high level of
experimenter control, which contribute to its internal validity and allow us to
examine the proposed causal relationships. It also uses a task that corresponds more
closely to the kinds of tasks performed by managers than many of the previous
goal studies, and the materials that the subjects used are based on the Balanced
Scorecard of an actual company. These factors increase the study's external
validity.
This is one of very few studies to examine the effects of goal difficulty and
incentive structure in a Balanced Scorecard context where multiple
demands vie for the subjects' time. Based on Naylor, Pritchard and Ilgen's (1980)
NPI theory, we predicted that subjects would shift their time between areas based on
the goal difficulty information they received and as influenced by their incentives.
These predictions were generally supported. Although we found considerable
support that incentive structure and goal difficulty affect how individuals allocate
their time between areas, we found no evidence that either influences the total
amount of time the subjects said they would work to achieve satisfactory results in
all the Balanced Scorecard areas. This is also consistent with NPI theory's assumption
that in a work-related situation, people do not change their total level of effort except
under very unusual circumstances. Rather, individuals shift their time from easy goals
to more challenging goals and from unattainable goals to challenging goals.
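As a rough descriptive check on the total-effort finding, the four area means in each row of Table 4, Panel B can be summed. This sketch is not the test the authors report; cell means alone say nothing about statistical significance. The 15 administrative hours are taken from the case instructions in the Appendix, and the helper name `total_goal_hours` is illustrative.

```python
# Mean planned hours per week from Table 4, Panel B.
# Tuple order: (Customer, Financial, Internal Business, Learning & Growth)
CELL_MEANS = {
    ("separate", "easy"): (7.8, 11.5, 10.5, 9.0),
    ("separate", "challenging"): (12.0, 10.5, 9.9, 8.4),
    ("separate", "unattainable"): (6.6, 9.6, 11.3, 11.3),
    ("set", "easy"): (6.7, 10.4, 11.2, 9.5),
    ("set", "challenging"): (10.1, 8.6, 9.9, 8.6),
    ("set", "unattainable"): (10.3, 9.6, 9.9, 8.9),
}
ADMIN_HOURS = 15  # fixed administrative hours specified in the case (Appendix)


def total_goal_hours(cell):
    """Sum of mean hours over the four Balanced Scorecard areas, to table precision."""
    return round(sum(CELL_MEANS[cell]), 1)


for incentive, difficulty in CELL_MEANS:
    goal_hours = total_goal_hours((incentive, difficulty))
    print(f"{incentive:<9} {difficulty:<13} {goal_hours:5.1f} {goal_hours + ADMIN_HOURS:5.1f}")
```

Summed goal hours range from 37.2 to 40.8 per week across the six cells (52.2 to 55.8 including the administrative hours).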
These findings suggest that organizations should consider incentives and
management control variables such as goal difficulty information as ways to
change or refocus individuals’ time and not as ways to induce more effort. This has
implications for the kinds of effort attributions that are sometimes made during
performance evaluation. Evaluators should be careful about attributing negative
performance to a lack of effort unless they have first ruled out misdirected effort.
The finding also highlights the importance of timely and accurate information if
individuals are to direct their time appropriately. The findings also imply
that individuals are sensitive to variables that are under organizational control
and susceptible to manipulation. Organizations could promote appropriate
goal-directed behavior by making sure that their incentives and reporting systems
focus individuals’ time on their important organizational goals.
One of the major insights of the study is that individuals react differently to
goal difficulty under different incentive structures. When a particular goal is
easy compared to challenging, the time allocated to achieving goals that are
competing for the manager’s attention does not differ according to incentive
structure. However, when information indicates that one goal is unattainable,
the incentive structure makes a substantial difference. When monetary incentives
are based on the extent to which the subjects met each goal individually, the
subjects shifted approximately 6.51 hours (the gap between the mean differences
of +2.44 and −4.07 reported above) from the area with unattainable goals
to alternative areas. However, when the monetary incentives are provided only
upon achieving the goals in all areas, the subjects did not plan to shift any hours
from the area with unattainable goals to alternative areas. It is easy to envision
situations in which either result is desirable. If goals are somewhat
arbitrary, in that just missing a goal is still beneficial, then basing rewards on
individual goal achievement could be counterproductive: once missing a goal
becomes obvious, individuals will dramatically decrease their planned effort in
that area and redirect it towards meeting challenging but still attainable goals.
On the other hand, there are situations in which organizations want to discourage
individuals from working on unattainable goals. In this case, they should base
monetary rewards on attaining individual goals rather than all goals.
We note that individuals still planned to spend time working on a goal despite
receiving reliable information that the goal was unattainable. This suggests that
individuals consider more than goal difficulty when planning their time. For
instance, individuals may continue to be psychologically committed to goals
that they have previously accepted despite receiving negative goal difficulty
information. In addition, they may feel a need to justify their actions and believe
that missing a goal is easier to justify if effort has been expended than if one
quits altogether. They may also wish to come as close as possible to achieving
the goal in order to preserve their reputations as best they can – coming close
may not be viewed as badly as being way off the mark. Also, individuals know
that in most cases, goal achievement in future periods is tied to the level of effort
exerted this period. Hence, they may be reluctant to completely cease working on
an unattainable goal in order to avoid beginning in a hopeless situation the next
period. These conjectures are potentially fruitful topics for future research.
Together, the findings from the study strongly suggest that when multiple
areas compete for attention, as in the Balanced Scorecard, the way incentives
are structured influences how individuals plan their time between areas rather
than their total level of effort. We have argued that planning one’s time to be
successful in multiple areas is a crucial aspect of what individuals, and particularly
managers, do. For these reasons, this study represents an important contribution
to knowledge about ways incentives can be structured in a Balanced Scorecard
framework to help organizations achieve their goals. We hope others will find
the approach taken by this study useful in examining these issues.
NOTES
1. Effort includes both time and intensity components; however, Larson and Callahan
(1990) argue that individuals are more likely to allocate their time differentially than to
vary their intensity between tasks. They argue that individuals "groove in" to an overall
level of intensity, which they strive to maintain over time.
2. In pretests, subjects were concerned about duties other than those directly tied to
Balanced Scorecard areas. Inclusion of the 15 hours per week on tasks unrelated to their
four goals controls for differences in the amount of time that subjects would otherwise
have assumed needed to be spent on these tasks.
3. One reviewer suggested analyzing the data using proportions rather than difference
scores. The results are equivalent using either method (cf. Tuttle & Harrell, 2001).
4. The manipulation checks were presented with the original case materials, and the
subjects were asked not to look back. A stronger test would have been to administer the
post-experimental materials separately from the case.
5. A small number of students, whom we did not count, chose not to participate. No
monetary incentive was provided.
ACKNOWLEDGMENTS
The authors would like to thank workshop participants at the University of Utah
and the University of South Carolina for their helpful comments.
REFERENCES
Ajzen, I. (1987). Attitudes, traits, and actions: Dispositional prediction of behavior in personality
and social psychology. In: L. Berkowitz (Ed.), Advances in Experimental Social Psychology
(Vol. 20, pp. 1–63). San Diego, CA: Academic Press.
Ajzen, I., & Madden, T. J. (1986). Prediction of goal-directed behavior: Attitudes, intentions,
and perceived behavioral control. Journal of Experimental Social Psychology, 22(5),
453–474.
Anthony, R., & Govindarajan, V. (1998). Management control systems. Homewood, IL: Irwin/McGraw-
Hill.
Ashford, & Northcraft, G. (2002). Robbing Peter to pay Paul: Feedback environments and enacted
priorities in response to competing task demands. Human Resource Management Review,
forthcoming.
Atkinson, J. W. (1958). Motives in fantasy, action, and society: A method of assessment and study.
Princeton, NJ: Van Nostrand.
Awasthi, V., & Pratt, J. (1990). The effects of monetary incentives on effort and decision performance:
The role of cognitive characteristics. The Accounting Review, 65(4), 797–811.
Blau, G. (1986). The relationship of management level to effort level, direction of effort, and managerial
performance. Journal of Vocational Behavior, 29, 226–239.
Blau, G. (1993). Operationalizing direction and level of effort and testing their relationship to individual
job performance. Organizational Behavior and Human Decision Processes, 55, 152–170.
Bonner, S. E., Hastie, R., Sprinkle, G. B., & Young, S. M. (2000). A review of the effects of financial
incentives on performance in laboratory tasks: Implications for management accounting.
Journal of Management Accounting Research, 12, 19–64.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review
and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis.
Review of Educational Research, 64, 363–423.
Chesney, A. A., & Locke, E. A. (1991). Relationships among goal difficulty, business strategies, and
performance on a complex management simulation task. Academy of Management Journal,
34(2), 400–424.
Cotton, J. L., & Tuttle, J. M. (1986). Employee turnover: A meta-analysis and review with implications
for research. Academy of Management Review, 11(1), 55–70.
Covey, S. R. (1989). The seven habits of highly effective people: Restoring the character ethic. New
York, NY: Simon and Schuster.
Earley, P. C., Wojnaroski, P., & Prest, W. (1987). Task planning and energy expended: Exploration of
how goals influence performance. Journal of Applied Psychology, 72, 107–114.
Erez, M., Gopher, D., & Arzi, N. (1990). Effects of goal difficulty, self-set goals, and monetary
rewards on dual task performance. Organizational Behavior & Human Decision Processes,
47(2), 247–270.
Fatseas, V. A., & Hirst, M. K. (1992). Incentive effects of assigned goals and compensation schemes
on budgetary performance. Accounting and Business Research, 22(88), 347–355.
Gollwitzer, P. M., & Bargh, J. A. (1996). The psychology of action. New York, NY: Guilford Press.
Jenkins, G. D., Gupta, N., Mitra, A., & Shaw, J. D. (1998). Are financial incentives related to performance?
A meta-analytic review of empirical research. Journal of Applied Psychology, 83(5),
777–787.
Kaplan, R. S. (1997a). Mobil USM&R (A): Linking the balanced scorecard. Boston, MA: Harvard
Business School Publishing.
Kaplan, R. S. (1997b). Mobil USM&R (B): New England sales and distribution. Boston, MA: Harvard
Business School Publishing.
Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive performance.
Harvard Business Review (January–February), 71–79.
Kaplan, R. S., & Norton, D. P. (1996). The balanced scorecard: Translating strategy into action. Boston,
MA: Harvard Business School Publishing.
Komaki, J. L., Coombs, T., & Schepman, S. (1996). Motivational implications of reinforcement theory.
In: R. M. Steers, L. W. Porter & G. A. Bigley (Eds), Motivation and Leadership at Work (pp.
34–52). New York, NY: McGraw-Hill.
Larson, J. R., Jr., & Callahan, C. (1990). Performance monitoring: How it affects work productivity.
Journal of Applied Psychology, 75(5), 530–538.
Lee, T. W., Locke, E. A., & Phan, S. H. (1997). Explaining the assigned goal-incentive interaction:
The role of self-efficacy and personal goals. Journal of Management, 23(4), 541–559.
Libby, R., & Lipe, M. G. (1992). Incentives, effort, and the cognitive processes involved in accounting-
related judgments. Journal of Accounting Research, 30(2), 249–273.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs,
NJ: Prentice-Hall.
Locke, E. A., Latham, G. P., & Erez, M. (1988). The determinants of goal acceptance and commitment.
Academy of Management Review, 13, 23–39.
McAllister, D. W., Mitchell, T. R., & Beach, L. R. (1979). The contingency model for the selection
of decision strategies: An empirical test of the effects of significance, accountability, and
reversibility. Organizational Behavior and Human Decision Processes, 24(2), 228–244.
Miodonski, B. (1999). Time management is key to juggling multiple jobs. Contractor, 46(2), 5.
Mowen, J., Middlemist, R., & Luther, D. (1981). Joint effects of assigned goal level and incentive
structure on task performance: A laboratory study. Journal of Applied Psychology, 66, 598–603.
Naylor, J., & Ilgen, D. (1984). Goal setting: A theoretical analysis of a motivation technology. Research
in Organizational Behavior, 6, 95–140.
Naylor, J., Pritchard, R., & Ilgen, D. (1980). A theory of behavior in organizations. New York, NY:
Academic Press.
Plack, H. (2000). Managing time can be crucial. Baltimore Business Journal, 17(40), 27.
Sprinkle, G. B. (2000). The effect of incentive contracts on learning and performance. The Accounting
Review, 75(3), 299–326.
Stone, D. N., & Ziebart, D. A. (1995). A model of financial incentive effects in decision making.
Organizational Behavior and Human Decision Processes, 61(3), 250–261.
Tuttle, B., & Burton, F. G. (1999). The effects of a modest incentive on information overload in an
investment analysis task. Accounting, Organizations and Society, 24, 673–687.
Tuttle, B., & Harrell, A. M. (2001). The impact of unit goal priorities, economic incentives, and interim
feedback on the planned effort of information systems professionals. Journal of Information
Systems, 15(2), 81–98.
Vroom, V. H. (1964). Work and motivation. New York, NY: Wiley.
Wood, R. E., & Locke, E. A. (1990). Goal setting and strategy effects on complex tasks. Research in
Organizational Behavior, 12, 73–109.
Wood, R. E., Mento, A. J., & Locke, E. A. (1987). Task complexity as a moderator of goal effects: A
meta-analysis. Journal of Applied Psychology, 72(3), 416–425.
Wright, P. M. (1991). Goals as mediators of the relationship between monetary incentives and performance:
A review and NPI theory examination. Human Resource Management Review, 1(1),
1–22.
Wright, P. M. (1992). An examination of the relationships among monetary incentives, goal level, goal
commitment, and performance. Journal of Management, 18(4), 677–693.
APPENDIX
Sample Decision Case
Columbia Corporation
Assume that you are a unit level manager employed by the Columbia Corporation.
Columbia’s senior management has identified a competitive strategy that is linked
to goals in four important business areas. All unit managers have the same goals. In
addition, performance measures were developed for each business area as follows:
Customer area
  Example performance measures: Mystery shopper ratings; Customer complaints; Customer compliments
  Feedback: Goal is easily attainable
Financial area
  Example performance measures: Return on capital employed (ROCE); Net margin; Sales & growth rate
  Feedback: Goal is challenging but attainable
Internal Business area
  Example performance measures: Profit per business unit; Number of inventory stock-outs; Quality assessment score
  Feedback: Goal is challenging but attainable
Learning & Growth area
  Example performance measures: Employee attitude survey; Employee skill development; Timely access to decision making information
  Feedback: Goal is challenging but attainable
Notice that you have received reliable interim feedback suggesting that the Customer
goal is easily attainable and that the other three goals are challenging but
attainable.
Bonus and Promotion: Two items are of particular interest. First, a division
manager is retiring and you are being considered for his replacement. Second,
Columbia provides a performance bonus of up to 20% of your salary.
Both your promotion and bonus depend on how many goals you achieve. The
more goals you achieve the greater your bonus and likelihood of promotion.
Decision: Like most managers, assume that you can work as many hours as you
want and you can allocate the hours as you see fit. Further, assume that during the
next performance evaluation period, you must spend 15 hours per week working on
administrative and other responsibilities that are not directly related to achieving
your goals in the four business areas (e.g. personnel issues, travel). Also, assume
that you will devote all your remaining work time towards achieving your goals in
the four business areas. Given the information in the case, please indicate below
how you would allocate your hours at work to achieve the goals in each business
area:
Goal Area                  Hours of Work Effort Allocated Each Week to Achieve the Goals in Each Business Area
Customer                   ____ Hours/week
Financial                  ____ Hours/week
Internal business          ____ Hours/week
Learning & growth          ____ Hours/week
Administrative & other       15 Hours/week
Total work hours           ____ Hours/week
THE EFFECT OF FAIRNESS IN
CONTRACTING ON THE CREATION
OF BUDGETARY SLACK
Theresa Libby
ABSTRACT
This paper explores the relationship between fairness in contracting and
the creation of budgetary slack. A laboratory experiment was performed
in which privately informed subjects were compensated under either a
truth-inducing or slack-inducing incentive contract. Contracting processes
were either fair or unfair as defined by procedural justice theory (Leventhal,
1980; Lind & Tyler, 1988). Under the slack-inducing contract, subjects
exposed to the fair contracting process created significantly less slack than
subjects exposed to the unfair contracting process. Slack created by subjects
compensated under the truth-inducing contract was low and insensitive to
the fairness or unfairness of the contracting process employed.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06007-1
INTRODUCTION
In large, decentralized organizations, accounting information often forms the basis
for budget estimates used in strategic planning, in coordinating work between
organizational divisions, and in setting targets used in performance evaluation
(Merchant, 1985). The accuracy of budget estimates is key to the effectiveness of
these short-run and long-run planning activities. Even so, prior research indicates
budget estimates are rarely accurate (Otley, 1985). The lack of accuracy of budget
estimates may be the result of the manager's inability to forecast operational
input-output relationships accurately because of uncertainty inherent in the task.
In addition, the organization may operate in an environment characterized by
uncertainty. The manager may respond by building into his or her budget estimate
a buffer against uncertainty in the environment or in the task (Davila & Wouters,
2000).1
Alternatively, inaccuracy in budget estimates may be motivated by budget-
constrained performance evaluation and reward systems (Jensen, 2001). Results
of several studies in the accounting literature indicate that budget-constrained
performance evaluation systems that emphasize variances in budget-to-actual
results lead to budget gaming (Bart, 1988; Hopwood, 1972; Merchant, 1985;
Walker & Johnson, 1999). One form of budget gaming that has been the focus of
significant study is the creation of budgetary slack (Young & Lewis, 1995).
Budgetary slack is defined as the intentional incorporation of budget amounts
that make the budget easier to attain (Dunk, 1993). Budgetary slack is created
when managers build excess resources into their budgets or knowingly understate
their productive capabilities (Baiman & Evans, 1983; Young & Lewis, 1995).
Budgetary slack is often manifested through overstated expenses or understated
revenues and production plans (Kren & Liao, 1988).
While budgetary slack may play a positive role by facilitating flexibility in
dealing with uncertainty (Cyert & March, 1963; Van der Stede, 2000), this paper
focuses on the alternative negative role budgetary slack plays when budgets
are used to set targets for performance evaluation.2 Budgetary slack created
when budget estimates are intentionally set at a level that is easy to attain can
be detrimental to management control system effectiveness, especially when
responsibility center managers are held accountable for meeting budget targets
and these targets are used to coordinate activities between organizational divisions
and to compensate managers for high performance.
According to Jensen (2001) and Murphy (2000), the typical pay-for-
performance compensation contract includes a fixed salary plus a bonus
increasing in performance above a pre-specified budget target. When a manager
is compensated under this type of contract, holds private information about the
productive capability of his/her division and participates in setting his/her own
budget target, incentives for slack creation exist. Consequently, this type of
contract has been labeled slack-inducing (Waller, 1988). A significant stream
of research has developed using the agency framework to test the ability of
other forms of budget-based incentive contracts to encourage managers to reveal
their private information while limiting the amount of budgetary slack managers
create (Baiman, 1982). These types of contracts have been labeled truth-inducing
(Waller, 1988). Truth-inducing contracts typically include a penalty for performance
that differs from a participatively set budget target (Weitzman, 1976).
Although theoretically sound, truth-inducing contracts are rarely used in practice
(Baker et al., 1988), perhaps because the costs of implementation outweigh
their benefits. An alternative to truth-inducing contracts that also reduces slack
would therefore be valuable. The objective of the present study is to
determine whether fair contracting processes combined with an
otherwise slack-inducing incentive contract provide a feasible alternative.3
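A minimal sketch of why the contract described by Jensen (2001) and Murphy (2000) is slack-inducing when the manager participates in setting the budget: with a fixed salary plus a bonus that grows with output above the target, every unit by which the budget is lowered raises pay. The salary, bonus rate, capability, and permitted budget range below are hypothetical numbers, and the linear bonus is an assumed functional form, not the contract used in the experiment.

```python
# Hypothetical slack-inducing contract: fixed salary plus a linear bonus on
# output above a self-set budget target. All numbers are illustrative only.
SALARY = 10.0
BONUS_RATE = 0.5


def slack_inducing_pay(actual, budget):
    """Fixed salary plus a bonus proportional to output in excess of the budget."""
    return SALARY + BONUS_RATE * max(0.0, actual - budget)


capability = 40                  # output the manager privately expects to achieve
allowed_budgets = range(20, 51)  # budgets the manager may submit (assumed bounds)

# The manager picks the budget that maximizes pay given private knowledge.
best_budget = max(allowed_budgets, key=lambda b: slack_inducing_pay(capability, b))
slack = capability - best_budget  # productive capability not reflected in the budget

print(best_budget, slack)
```

Here the manager submits the lowest permitted budget (20) and builds in 20 units of slack, because nothing in this contract penalizes understating capability.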
For the purposes of this study, fair contracting processes are defined according
to procedural justice theory (Leventhal, 1980; Lind & Tyler, 1988). Procedural
justice theory suggests organizational members will perceive a process to be fairer
the greater the degree to which the decision-maker creates a “positive atmosphere
of cooperation and compromise” even when “. . . the values, desires and concerns
of the decision-maker and affected parties may not always agree” (Hunton, 1996,
p. 650).
This paper describes the results of an experiment in which subjects performed
a production task and received compensation under a budget-based incentive
contract of either the slack-inducing or truth-inducing form. In addition, subjects
received information about a contracting process designed to be either fair or
unfair. Results indicated that subjects compensated under the slack-inducing contract
and assigned to the fair contracting process condition created significantly less
budgetary slack than subjects assigned to the unfair contracting process condition.
While subjects compensated under the truth-inducing contract, on average, created
less slack than subjects compensated under the slack-inducing contract, the fairness
of the contracting process had no effect on the amount of slack they created.
The remainder of the paper is organized as follows. In the next section,
hypotheses are developed followed by a description of the experimental design
and experimental method. The results of the statistical analyses are then reported
followed by discussion of the experimental findings, their limitations and their
implications for future research.
RELATED LITERATURE AND HYPOTHESIS DEVELOPMENT
The incentive contracting literature in accounting has developed in response to
the need for incentive contracts that motivate subordinates to be productive and to
truthfully communicate local private information to improve centralized allocation
and coordination decisions. In general, these studies recommend the use of budget-
based incentive contracts and participation in budgeting to motivate managers to set
accurate budget targets (Demski & Feltham, 1978; Melumad & Reichelstein, 1989;
Namazi, 1985). A major concern of this literature is that participation in setting
budget targets allows for information sharing, but also increases the potential for
the creation of budgetary slack if managers are then compensated based on meeting
or exceeding the budget that was participatively set (Antle & Eppen, 1985). Truth-
inducing contracts have been constructed to address this problem.
Truth-inducing contracts impose a penalty for misrepresentation, usually
scaled by the difference between budgeted and actual performance, providing an
incentive for subordinates to reveal their private information through the budget
targets they set (Kirby et al., 1991; Weitzman, 1976). The particular form of
truth-inducing contract studied here was developed theoretically by Reichelstein
and Osband (1984) and adapted to the budgeting context by Kirby et al. (1991).
The contract was further adapted by Kirby (1992) to a context in which the
manager selects a budget target and focuses effort on maximizing output to meet
or exceed that target. The contract is of the following form:
$$H(A, B) = v(B) + w(B)\,(A - B)$$
subject to
$v(B)$ is increasing and convex ($v'(B) > 0$, $v''(B) > 0$) and $w(B) = v'(B)$ for all $B$.
In this context, H(A, B) represents the manager’s total compensation, B repre-
sents the productivity estimate (or budget target) for the period, and A represents
the actual level of productivity for the period. The manager’s total compensation
(H) is therefore made up of an ex ante payment, v(B), and a bonus or penalty,
w(B)(A − B), whose value depends on the variance between budget and actual
performance.
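The truth-inducing property follows from the convexity of v stated above: v(B) + v′(B)(A − B) is the tangent line to v at B evaluated at A, and a convex function never lies below its tangent lines, so H(A, B) ≤ v(A) = H(A, A). The sketch below checks this numerically for a hypothetical quadratic v(B) = a + bB + cB² with c > 0; the coefficients and the budget grid are illustrative assumptions, not values from Kirby (1992) or from this experiment.

```python
# Hypothetical convex ex ante payment v(B) = a + b*B + c*B**2 (c > 0).
A_COEF, B_COEF, C_COEF = 5.0, 0.2, 0.01


def v(budget):
    """Ex ante payment for submitting budget B (increasing and convex for B >= 0)."""
    return A_COEF + B_COEF * budget + C_COEF * budget ** 2


def w(budget):
    """Bonus/penalty rate w(B) = v'(B)."""
    return B_COEF + 2 * C_COEF * budget


def truth_inducing_pay(actual, budget):
    """H(A, B) = v(B) + w(B) * (A - B)."""
    return v(budget) + w(budget) * (actual - budget)


actual = 40
best_budget = max(range(20, 51), key=lambda b: truth_inducing_pay(actual, b))
print(best_budget, round(truth_inducing_pay(actual, best_budget), 2))

# Misreporting in either direction costs c * (A - B)**2 under this quadratic v.
for budget in (30, 50):
    print(budget, round(v(actual) - truth_inducing_pay(actual, budget), 2))
```

For this quadratic choice the shortfall from truthful reporting is exactly c(A − B)², a penalty for misrepresentation that grows with the gap between budget and actual performance. Because H is linear in A, the same argument implies that a manager who is uncertain about A maximizes expected pay by reporting its expected value.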
The truth-inducing properties of this contract have been tested empirically by
Kirby (1992), Reichelstein (1992), and Chow et al. (2000). While the theoretical
design of the contract relies on the assumption that managers are strict utility
maximizers, Kirby (1992) finds the contract maintains its truth-inducing properties
even when this assumption is relaxed. Reichelstein (1992) reports a successful
application of this contract form by the German Department of Defense. Finally,
Chow et al. (2000) experimentally test several mechanisms designed to motivate
truthful upward communication of private information including this contract
form. They find this truth-inducing contract led to significantly less misrepresen-
tation of private information than a slack-inducing linear profit sharing scheme.
Accordingly, in the context of the current study, individuals compensated under
this form of truth-inducing contract are expected to create a relatively low amount
of budgetary slack. This prediction is stated formally as follows:
H1. Individuals compensated under a truth-inducing contract will create less
budgetary slack than individuals compensated under a slack-inducing contract.
Although truth-inducing contract forms have been widely examined in the
academic literature, the feasibility of implementing truth-inducing contracts in
real organizations has been challenged on the basis of complexity (Atkinson,
1985; Jennergren, 1980; Loeb & Magat, 1978) and cost (Evans et al., 2001;
Evans & Sridhar, 1996; Luft, 1994). For example, Evans and Sridhar (1996) find
the tighter the controls placed on the manager, the higher the risk premium the
manager will demand before accepting a particular contract. Evans et al. (2001)
find contracts derived on the assumption that all managers will create budgetary
slack are costlier than contracts developed assuming some managers derive utility
from honestly revealing their private information.
In addition, Luft (1994) finds individuals prefer bonus-framed to penalty-
framed contracts for several reasons including the perception that bonus-framed
contracts are fairer, more rewarding, and more motivational than penalty-framed
contracts. Consequently, employees may demand higher (i.e. more costly to the
employer) potential payoffs from penalty-framed truth-inducing contracts than
bonus-framed slack-inducing contracts to exert the same amount of effort. If this
is the case, then an alternative method of reducing the amount of budgetary slack
subordinates create while continuing to motivate them under a bonus-framed
budget-based incentive contract would be useful. Empirical results of prior studies
reviewed below suggest the use of fair processes in allocating organizational
resources may provide such an alternative.
Fair Process and Social Exchange Theory
Social exchange theory defines behavior in terms of two types of exchange, eco-
nomic exchange and social exchange (Blau, 1964). Economic exchange motivates
behavior intended to fulfil the formal economic employment contract. Employers
offer a “fair day’s pay” and expect employees to provide a “fair day’s work.” Social
exchange, on the other hand, is based on a psychological or implicit contract
that defines obligations on the part of both the organization and the employee
(Rousseau & Parks, 1993). Employees may go beyond the specific duties laid out
in the employment contract if they feel the organization “values their contributions
and cares for their general well-being” (Eisenberger et al., 1990, p. 51).
The division of employee behavior into these two separate but related
categories mirrors the theoretical predictions of organizational justice theory (OJT).
OJT suggests individuals’ overall perceptions of fairness in an organizational
setting are based on the combination of judgments about the fairness of the actual
amount of resources allocated by the organization to subordinates, known as dis-
tributive fairness, and judgments about the fairness of the processes used to make
allocation decisions, known as procedural fairness (Folger & Cropanzano, 1998).
150 THERESA LIBBY
Employees’ judgments about the fairness of the actual amount of organizational
resources distributed to them may be related to the economic exchange relationship
between the employee and the organization, while judgments about the fairness
of the allocation processes may be related to the social exchange (non-economic)
relationship.
Prior research has demonstrated the relationship between procedural fairness
and social exchange. Specifically, when organizational processes and procedures
are perceived by the employee to be fair, positive consequences result including
subordinates’ increased trust in superiors, increased commitment to organizational
goals, increased willingness to act in the organization’s best interests, reduced
turnover, and improved performance (Kim & Mauborgne, 1993; Moorman et al.,
1998; Naumann & Bennett, 2000).
Naumann and Bennett (2000) report the results of a survey of bank employees
in which they find significant positive correlations among procedural justice
climate, organizational commitment and extra-role behavior. Procedural justice
climate is defined as an organizational climate with a high emphasis on the fairness
of organizational procedures. Employees who perceived a high procedural justice
climate performed significantly more tasks that were in the organization’s best
interests, even though these tasks fell outside their employment contracts.
Moorman et al. (1998) study the link between procedural fairness, perceived
organizational support, and organizational citizenship behaviors. Their results
support the hypothesis that procedural fairness leads to an increase in subordinates’
perceptions of a supportive organizational environment. In addition, they found
that when subordinates felt their organization was supportive of them and valued
their contributions, subordinates reciprocated by increasing extra-role behaviors.
Kim and Mauborgne (1993), in a longitudinal study of multinational organi-
zations, found divisional managers who perceived they had been fairly treated by
head office in resource allocation decisions were more likely to comply with head
office requests and reported higher commitment to the organization and higher
trust in head office management than managers who reported being unfairly
treated. These positive outcomes may be a result of subordinates reciprocating
the fair treatment they have received from the organization.
While these studies consider the effect of fair and unfair organizational
processes on subsequent behavior, they do not consider the organizational
consequences of combining fair or unfair processes with various incentive
contract forms. In the current study, the consequence of combining fair and unfair
contracting processes with budget-based incentive contracts of the slack-inducing
and truth-inducing forms is examined.
When both contract and process are considered, a reduction in budgetary
slack created when contracting processes are perceived as fair may indicate
subordinates’ preference for non-economic benefits derived from reciprocating
the fair treatment they have received from their organization over economic
benefits that may be derived from slack creation. An interesting question then
becomes: under what conditions will economic and non-economic benefits be
traded off? This question is addressed to some degree in the accounting literature
by Luft (1994) and in the economics literature by Fehr et al. (2001) and Fehr and
Gachter (2002). These studies are reviewed below.
Luft (1994) argues that bonus-framed contracts, like the slack-inducing contract
considered here, provide employees with non-monetary payoffs in the form of ap-
proval and appreciation which are not communicated by penalty-framed contracts.
Approval and appreciation are non-monetary payoffs received through the social
exchange relationship between the employee and the organization. Penalty-framed
contracts, on the other hand, tend to focus the employee’s attention on the monetary,
or economic exchange relationship between the employee and the organization.
In addition, Luft (1994) suggests employers allow bonus-framed contracts to re-
main purposefully incomplete in terms of the amount of bonus that will be received.
That is, the employee trusts a bonus will be paid at some stipulated time in the
future assuming the employee exerts a pre-determined level of effort. The nature
of this “trust contract” implies a longer term relationship between the employee
and the organization. Fehr et al. (2001) test the use of fairness as an enforcement
device when contracts are left incomplete. They demonstrate, both theoretically
and experimentally, that an incomplete bonus-based contract can be more efficient
in motivating agents to exert effort than a more complete, penalty-based contract
because the incomplete contract leaves room for reciprocity between the principal
and the agent. They find fair treatment is reciprocated under the bonus-based
incomplete contract, even in a one-period world where the principal contracts
with the agent only once and therefore has an economic incentive not to pay the
promised bonus.
In the context of the current study, individuals compensated under a bonus-based
slack-inducing contract have the opportunity to act in their own best interests by
misinforming the organization of their actual productive capability (i.e. creating
budgetary slack). The individual receives a short-term economic benefit from doing
so in the form of a larger bonus. The organization is worse off if the inaccurate infor-
mation provided is also used for planning and coordination across the organization.
When budgeting processes are fair, the literature reviewed above suggests the
bonus-framed slack-inducing contract allows room for reciprocity between the
employee and the organization. Consequently, the social exchange relationship
between the individual and the organization becomes salient under a bonus-framed
slack-inducing contract and individuals should therefore respond to fair budgeting
processes by creating less budgetary slack. The opposite effect will occur when
budgeting processes are unfair; that is, employees who perceive budgeting
processes to be unfair will reciprocate this unfair treatment by acting in their own
rather than the organization’s best interests by creating a relatively high amount
of budgetary slack.
Penalty-framed truth-inducing contracts tend to be completely specified in
economic terms at the beginning of the period due to difficulties in enforcing
the penalty after the fact (Luft, 1994). As a result, a shorter-term economic
exchange relationship between the individual and the organization may become
salient under the penalty-framed truth-inducing contract. If so, procedural fairness
becomes less important and employees may then focus on the economic benefits
obtainable in the current period to a greater degree than consideration of any
future benefits that may accrue. Fehr and Gachter (2002) refer to this effect
as a “crowding out” of agents’ incentives to cooperate voluntarily. Results of
their experimental study indicate that incentive contracts that include a penalty
for shirking (i.e. the agent provides less than the agreed upon level of effort)
are less efficient than a fixed-fee contract because they discourage agents from
focusing on the longer-term employment relationship and therefore reduce the
agent’s interest in reciprocating fair treatment. Thus, individuals compensated
under the penalty-framed truth-inducing contract may not respond to the fairness
or unfairness of budgeting processes when selecting a budget target because
economic incentives embedded in the contract will be most salient to them.
In summary, this review of the literature implies that the relation between
fairness in contracting and budgetary slack creation is moderated by the form
of budget-based incentive contract employed. That is, fairness in contracting
will influence the amount of budgetary slack individuals create when they are
compensated under a slack-inducing, but not a truth-inducing incentive contract.
This line of reasoning leads to the following hypothesis:
H2. When a slack-inducing contract is employed, budgetary slack will be lower
when the contracting process is fair than when the contracting process is unfair;
however, when a truth-inducing contract is employed, budgetary slack will be
low irrespective of the contracting process employed.
EXPERIMENTAL DESIGN AND METHOD
Participants
The hypotheses were tested in an experiment in which contract type
(slack-inducing vs. truth-inducing) and contracting process (fair vs. unfair) were
manipulated between subjects in a 2 × 2 full-factorial design. Subjects were
recruited from students enrolled in a required first-year undergraduate business
course. A total of 181 students took part in the experiment (96 male and 85
female) in 12 small groups of 12 to 20 subjects. Subjects within each group were
randomly assigned to one of the four experimental conditions. The study took
approximately forty minutes to complete and the twelve groups took part in the
study over a one-week period. To control for information leakage over that week,
subjects were asked not to discuss the details of the experiment with their peers
and were debriefed and received feedback on their performance only after all
subjects had completed the experiment.
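The text states only that subjects within each session were randomly assigned to the four cells of the 2 × 2 design. One way such an assignment could be carried out is sketched below; the balancing rule (cell sizes kept within one of each other), the session size and all names are assumptions for illustration, not the study’s documented procedure.

```python
import itertools
import random

CONTRACTS = ("slack-inducing", "truth-inducing")
PROCESSES = ("fair", "unfair")
# The four cells of the 2 x 2 between-subjects design.
CONDITIONS = list(itertools.product(CONTRACTS, PROCESSES))


def assign_conditions(subject_ids, rng=None):
    """Randomly assign one session's subjects to the four cells.

    Assumed balancing rule: conditions are dealt out in rotation and then
    shuffled, so cell sizes within a session differ by at most one.
    """
    rng = rng or random.Random()
    slots = [CONDITIONS[i % len(CONDITIONS)] for i in range(len(subject_ids))]
    rng.shuffle(slots)
    return dict(zip(subject_ids, slots))


if __name__ == "__main__":
    session = [f"S{i:02d}" for i in range(1, 15)]  # hypothetical session of 14 subjects
    for subject, (contract, process) in assign_conditions(session, random.Random(7)).items():
        print(subject, contract, process)
```

Shuffling a pre-balanced list, rather than drawing each subject’s cell independently, is simply one design choice that avoids badly unequal cells in small sessions of 12 to 20 people.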
Experimental Task
Subjects acted as employees of the translation division of a book publisher.
Subjects performed a production task that involved translating symbols into
alphabetic characters using a translation key. The symbols were grouped into words
of different lengths and were presented to subjects on worksheets with ten words
per page. The words were groups of symbols that did not represent actual English
words. The task is a variation of the task developed by Chow (1983) and adapted
by Waller (1988).4
While uncertainty was designed into the experimental task, the level of
uncertainty was controlled across experimental conditions. Specifically, the
lengths of words appearing on each worksheet page varied between five and
nine symbols with the following probabilities: 0.15 for five-letter words, 0.50 for
seven-letter words, and 0.35 for nine-letter words. Subjects were aware of this
distribution when setting their budget targets.
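Because the word-length distribution was the same in every condition, it fixes how demanding an average page is. As an illustration only, the expected word length and its variance follow directly from the stated probabilities, and a hypothetical ten-word page can be drawn from the same distribution; the function names and the seed are assumptions, not part of the study materials.

```python
import random

# Word-length distribution stated in the text: symbols per word -> probability.
WORD_LENGTH_PROBS = {5: 0.15, 7: 0.50, 9: 0.35}


def expected_length(probs):
    """Mean number of symbols per word under the distribution."""
    return sum(length * p for length, p in probs.items())


def length_variance(probs):
    """Variance of the number of symbols per word."""
    mean = expected_length(probs)
    return sum(p * (length - mean) ** 2 for length, p in probs.items())


def sample_page(probs, words_per_page=10, rng=None):
    """Draw word lengths for one hypothetical worksheet page (ten words per page)."""
    rng = rng or random.Random()
    lengths = list(probs)
    weights = [probs[k] for k in lengths]
    return rng.choices(lengths, weights=weights, k=words_per_page)


if __name__ == "__main__":
    print(round(expected_length(WORD_LENGTH_PROBS), 2))  # 7.4
    print(round(length_variance(WORD_LENGTH_PROBS), 2))  # 1.84
    print(sample_page(WORD_LENGTH_PROBS, rng=random.Random(0)))
```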
Contract Type
Incentive contracts used to compensate subjects were based on the slack-inducing
contractual form used by Waller and Chow (1985) and the truth-inducing con-
tractual form developed by Kirby et al. (1991). Subjects earned tickets that were
entered in a raffle for one of twelve cash prizes of $150. The more tickets sub-
jects earned, the greater their chance to win one of these prizes. The two types of
incentive contracts were operationalized as follows:5
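Slack-inducing contract
$$
\text{Payment} =
\begin{cases}
3 \text{ tickets} + 0.30 \text{ tickets} \times (\text{Actual} - \text{Budget}) & \text{if Actual} > \text{Budget},\\
3 \text{ tickets} & \text{if Actual} \le \text{Budget}.
\end{cases}
$$
Truth-inducing contract
$$
\text{Payment} = \frac{(\text{Budget})^2}{100} + \frac{2(\text{Budget})}{100}\,(\text{Actual} - \text{Budget})
$$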
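Why the second form is called truth-inducing can be seen from a short derivation. Writing B for the budget and A for actual output, and assuming (for this illustration only) a risk-neutral subject whose expected output is E[A], expected payment under the truth-inducing contract is maximized where its derivative with respect to B is zero:

$$
\mathbb{E}[\text{Payment}] = \frac{B^2}{100} + \frac{2B}{100}\left(\mathbb{E}[A] - B\right) = \frac{2B\,\mathbb{E}[A] - B^2}{100},
\qquad
\frac{\partial\,\mathbb{E}[\text{Payment}]}{\partial B} = \frac{2\left(\mathbb{E}[A] - B\right)}{100} = 0
\;\Rightarrow\; B^{*} = \mathbb{E}[A].
$$

The second derivative, −2/100, is negative, so this is a maximum. Under the slack-inducing contract, by contrast, payment never rises as the budget rises, so the same subject gains by budgeting below expected output. Because truth-inducing payment is linear in A with slope 2B/100, a higher budget also makes payment more variable, so a subject’s attitude toward risk can pull the chosen budget away from E[A].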
Subjects assigned the truth-inducing contract were provided a table in which the
total compensation under this contract for various pairs of budgeted and actual
outcomes was calculated. This table is reproduced in Appendix A. All subjects
were given sample budget and actual amounts and asked to calculate the related
compensation that would be received. They then checked these calculations to
ensure that they understood the relationship between their payment, their budget
and their actual performance.
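The two payment rules can be written as small functions to check the kind of calculation subjects practised. This is a sketch under stated assumptions: the function names, the budget range and the assumed output of 25 words (the pre-test average in note 5) are choices made here, and the printed grid is not a reproduction of Appendix A.

```python
def slack_inducing_payment(budget, actual):
    """Tickets under the slack-inducing contract: 3 tickets plus
    0.30 tickets per word by which actual output exceeds the budget."""
    if actual > budget:
        return 3 + 0.30 * (actual - budget)
    return 3.0


def truth_inducing_payment(budget, actual):
    """Tickets under the truth-inducing contract:
    Budget^2 / 100 + (2 * Budget / 100) * (Actual - Budget)."""
    return budget ** 2 / 100 + (2 * budget / 100) * (actual - budget)


def best_budget(payment_fn, actual, budgets):
    """Budget in `budgets` that maximizes payment if output equals `actual`."""
    return max(budgets, key=lambda b: payment_fn(b, actual))


def compensation_grid(payment_fn, budgets, actuals):
    """Rows of (budget, [payment for each actual]) for a printed lookup table."""
    return [(b, [round(payment_fn(b, a), 2) for a in actuals]) for b in budgets]


if __name__ == "__main__":
    budgets = range(15, 36)  # assumed range around the pre-test average of 25 words
    print(best_budget(truth_inducing_payment, 25, budgets))  # 25
    print(best_budget(slack_inducing_payment, 25, budgets))  # 15 (lowest budget offered)
    for budget, row in compensation_grid(truth_inducing_payment, [20, 25, 30], [20, 25, 30]):
        print(budget, row)
```

With output fixed at 25 words, the truth-inducing rule pays most when the budget equals 25, while the slack-inducing rule pays most at the lowest budget on offer, which is the incentive contrast behind H1.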
Fair and Unfair Contracting Processes

The fair and unfair contracting processes were operationalized through scenarios
designed to reflect allocation processes and procedures that prior research indicates
most individuals would judge to be fair or unfair (see, for example, Lind et al., 1990;
Moorman et al., 1998; Naumann & Bennett, 2000). Specifically, judgments of
procedural fairness in this study depended on allowing individuals the opportunity
to voice their opinions during the allocation process (Hunton & Beeler, 1997; Lind
et al., 1990) and on procedures that would encourage subjects to evaluate the man-
ager as trustworthy (Tyler, 1989; Tyler & Lind, 1992).6 In addition, the procedures
were designed to ensure that decision making was unbiased (Greenberg, 1986) and
that the decision was based on full and complete information (Greenberg, 1987). In
general, subjects were told a new incentive contract had been implemented in the
division in which they worked. Subjects were asked to assume they were new to
the organization, but had collected the information included in the scenario during
the course of their employment to date. The scenarios used to operationalize the
fair and unfair contracting processes are included in Appendix B.
Experimental Procedures
Subjects first completed a five-minute practice period to become familiar with the
translation task. They earned a piece rate of one raffle ticket for every three words
correctly translated. At the end of this practice period, the subjects verified their
work and calculated the number of words that they had correctly translated in the
practice period.7 After practicing the task and being informed of the probability
distribution of words of different lengths, but before experimental manipulations
were introduced, subjects recorded their best estimate of next period performance;
that is, their best estimate of the number of words they expected to be able to
translate if given another five minutes in which to work. Subjects placed this
completed Best Estimate of Production sheet in an envelope and sealed it.
Subjects kept this sealed envelope with them until the experiment was complete
and consequently, this information was unknown to the researcher until subjects
had completed the experiment.
Subjects then read a description of the incentive contract under which they
were to work and the information about the fair or unfair contracting process.
They provided the researcher, acting as the division manager, with the budget
they wished to use in calculating the number of tickets earned in the work
period. Subjects were told the budget would also be used by the division
manager to co-ordinate production between divisions. Information asymmetry
was controlled at a relatively high level by informing all subjects they were new
to the organization and their manager was therefore unsure of their productive
capability and did not have access to the Best Estimate of Production forms.
Subjects wrote down their budgets and then performed the translation task for
five minutes.
The third part of the experiment involved filling out a post-experimental
questionnaire. The experimental materials were then collected and one week
later, subjects received a performance report and the tickets that they had earned.
Tickets were collected and placed in a container from which one of the subjects
drew a winning ticket in each group. A cash prize of $150 was paid to the winning
subject in each group and the goals of the experiment were discussed.8 These
experimental procedures are summarized in Fig. 1.
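Because one ticket was drawn per group, a subject’s chance of winning depends on his or her share of the group’s tickets, as note 8 points out. A minimal sketch with hypothetical ticket counts (the subject labels and counts are invented for illustration):

```python
import random


def win_probabilities(tickets):
    """Chance each subject's ticket is drawn when one ticket is drawn at random."""
    total = sum(tickets.values())
    return {subject: count / total for subject, count in tickets.items()}


def draw_winner(tickets, rng=None):
    """Draw one winning ticket, weighting each subject by tickets held."""
    rng = rng or random.Random()
    subjects = list(tickets)
    return rng.choices(subjects, weights=[tickets[s] for s in subjects], k=1)[0]


if __name__ == "__main__":
    group = {"S1": 6, "S2": 4, "S3": 10}  # hypothetical ticket counts
    print(win_probabilities(group))  # {'S1': 0.3, 'S2': 0.2, 'S3': 0.5}
    print(draw_winner(group, rng=random.Random(42)))
```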
Dependent Variable – Budgetary Slack
The dependent variable was the amount of budgetary slack subjects created.
Slack was measured as the difference between the best estimate of next period
performance subjects provided before they were given contract and process
information (i.e. prior to the introduction of the experimental manipulations) and
the budget subjects set after the incentive contract and process information was
provided to them. The pre-manipulation best estimate of next period performance
proxied for subjects’ private information about their own productive capability.
This information was known only to the subject until the experiment was
complete. Budget slack should therefore represent the intentional understatement
of subjects’ productive capabilities motivated by the budget-based incentive
contract and/or the contracting process employed. This method of measuring
budget slack is similar to the method used in prior experimental studies, including
Young (1985), Waller (1988), and Chow et al. (1988).
Fig. 1. Experimental Procedures.
Twelve subjects, approximately equally distributed across cells, failed to
provide information necessary to calculate budget slack and were therefore dropped
from the final sample. In addition, twenty-eight subjects (twenty-three of whom
were assigned the truth-inducing contract) unexpectedly chose a budget target
higher than their expected future performance. The economic incentives embedded
in the budget-based contracts used in this study were meant to encourage subjects
to choose a budget lower than (slack-inducing case) or equal to (truth-inducing
case) their own best estimate of next period’s performance in order to maximize
their compensation. It is unlikely that subjects misunderstood the incentives
embedded in the contracts, given that they were trained in the effects of varying
levels of budget and actual performance under the specific contract assigned
to them.
An analysis of data collected in the post-experimental questionnaire indicates
that this result may be due to differences in subjects’ risk tolerance. Following
Young (1985), subjects’ attitudes toward risk were measured through their responses
to a standard gamble. The mean score on this risk aversion measure for these 28
subjects was 0.55 (std. dev. = 0.28; theoretical range 0 to 1) compared
to a mean score of 0.65 (std. dev. = 0.20) for all other subjects taking part in the
experiment. The difference between these means is significant, F(1, 167) = 4.78
(p < 0.05), indicating these 28 subjects were significantly less risk averse than
other subjects taking part in the experiment. The effect is most pronounced
in the truth-inducing contract condition. This group of subjects appears to be
responding to incentives other than those anticipated; consequently, they were
dropped from the sample, leaving a final usable sample of one hundred and
forty-two subjects.
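The slack measure and the two screening rules described above can be summarized in a short sketch. The record layout and field names are hypothetical, and the rules paraphrase the text: records missing either value are dropped, as are records whose budget exceeds the pre-manipulation best estimate (negative slack).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    subject_id: str
    best_estimate: Optional[int]  # pre-manipulation estimate of next-period output
    budget: Optional[int]         # budget chosen after contract and process information


def budgetary_slack(s: Subject) -> int:
    """Slack = best estimate of next-period performance minus the chosen budget."""
    return s.best_estimate - s.budget


def screen(subjects):
    """Apply the exclusion rules; return (usable, missing, budget_above_estimate)."""
    usable, missing, above = [], [], []
    for s in subjects:
        if s.best_estimate is None or s.budget is None:
            missing.append(s)
        elif budgetary_slack(s) < 0:
            above.append(s)
        else:
            usable.append(s)
    return usable, missing, above


if __name__ == "__main__":
    sample = [
        Subject("A", 26, 22),    # slack of 4 words
        Subject("B", 24, 24),    # zero slack
        Subject("C", None, 20),  # missing estimate -> dropped
        Subject("D", 23, 27),    # budget above estimate -> dropped
    ]
    usable, missing, above = screen(sample)
    print([(s.subject_id, budgetary_slack(s)) for s in usable])  # [('A', 4), ('B', 0)]
```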
RESULTS
Manipulation Check for Contracting Process
To ensure subjects assigned the scenarios designed to represent fair and unfair
incentive contracting processes actually perceived these processes to be fair or
unfair respectively, subjects were asked to answer the following questions on
a scale of one (completely unfair) to five (completely fair): “How fair would
you judge the procedures used to set the formula on which your earnings were
based?” and “How fair would you judge the process of setting the budget used
to calculate your earnings?” These questions were based on measures reported
in Tyler and Lind (1992).9 Each subject’s score was their mean score across the
two questions included in the scale. The overall mean score on this scale was 3.60
(std. dev. = 0.81, Cronbach’s alpha = 0.67). Means and standard deviations for
perceived fairness of the contracting process are presented in Table 1, Panel A.
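Each subject’s fairness score averages two items, and the scale’s reliability is summarized by Cronbach’s alpha. A minimal sketch of both calculations, using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score); the responses and function names are invented for illustration and are not the study’s data.

```python
from statistics import mean, pvariance


def scale_scores(responses):
    """Each subject's score is the mean of his or her item responses."""
    return [mean(items) for items in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-subject item-response lists."""
    k = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical answers to the two procedural-fairness questions (1-5 scale).
    answers = [[4, 4], [3, 4], [5, 4], [2, 3], [4, 5], [3, 3]]
    print(scale_scores(answers))
    print(round(cronbach_alpha(answers), 2))  # 0.75
```

With only two items, alpha is sensitive to how strongly the pair is correlated, which is one reason a value such as the reported 0.67 can arise even for closely related questions.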
A 2 × 2 analysis of variance was performed on subjects’ perceptions of the
fairness of the contracting process (see Table 1, Panel B). Results indicated a
significant difference in subjects’ perceptions of the fairness of the contracting
process depending on whether they read the scenario describing the contracting
process designed to be fair or unfair, F(1, 138) = 5.75 (p < 0.05), but perceptions
of fairness did not differ depending on contract type, F(1, 138) = 0.38, or the
contract by process interaction, F(1, 138) = 1.08, indicating subjects responded
to the manipulation of process fairness as expected.10

Table 1. Procedural Fairness by Contract Type and Contracting Process.

Panel A: Mean (Standard Deviation) of Procedural Fairness
                               Slack-Inducing   Truth-Inducing   Marginals
                               Contract         Contract
Fair contracting process       3.71 (0.63)      3.77 (0.79)      3.73 (0.70)
                               n = 42           n = 31           n = 73
Unfair contracting process     3.41 (0.99)      3.56 (0.75)      3.47 (0.90)
                               n = 41           n = 28           n = 69
Marginals                      3.55 (0.84)      3.67 (0.77)      3.60 (0.81)
                               n = 83           n = 59           n = 142

Panel B: Analysis of Variance of Procedural Fairness
Source                  SS       df    MS      F
Contract type           0.30     1     0.30    0.38
Contracting process     4.48     1     4.48    5.75**
Contract × process      0.84     1     0.84    1.08
Error                   107.46   138   0.78
** p < 0.05.
In the fair contracting process condition, perceived fairness may also have
manifested itself as a felt social pressure to adhere to the existing norms or culture
of fairness within this organizational division (Naumann & Bennett, 2000).
This perspective may be reflected in subjects’ responses to a post-experimental
question asking them to rate the fairness of the work environment. A 2 × 2
analysis of variance indicated a significant main effect of contracting process on
subjects’ evaluation of the fairness of the work environment, F(1, 138) = 54.58
(p < 0.001), with subjects in the fair process condition rating the work
environment as significantly fairer (mean = 3.34, std. dev. = 1.03) than subjects
in the unfair process condition (mean = 2.35, std. dev. = 1.08). No significant
differences were found based on contract form or the contract by process
interaction.
Hypothesis Tests
A 2 × 2 analysis of variance with adjustment for non-orthogonality (regression
approach) was used to determine the statistical significance of the differences in
the mean amount of slack created in each experimental condition. The one hundred
and forty-two usable observations were included in this analysis. Cell means for
slack created by experimental condition are presented in Table 2 (Panel A) and
Fig. 2. Results of the analysis of variance are presented in Table 2 (Panel B).
Table 2. Budgetary Slack by Contract Type and Contracting Process.

Panel A: Mean (Standard Deviation) of Budgetary Slack
                               Slack-Inducing   Truth-Inducing
                               Contract         Contract
Fair contracting process       2.40 (2.38)      0.80 (1.47)
                               n = 42           n = 31
Unfair contracting process     3.85 (3.97)      0.71 (1.49)
                               n = 41           n = 28

Panel B: Analysis of Variance of Budgetary Slack Creation
Source                  SS       df    MS       F
Contract type           193.21   1     193.21   26.99***
Contracting process     15.85    1     15.85    2.21
Contract × process      20.44    1     20.44    2.86*
Error                   987.79   138   7.16

Panel C: Simple Effects Analysis
Comparison                                             F      df   Significance
Fair vs. unfair process for slack-inducing contract    4.09   81   p < 0.05
Fair vs. unfair process for truth-inducing contract    0.06   57   p = 0.94
* p < 0.10.
*** p < 0.001.

Fig. 2. Mean Budgetary Slack by Experimental Condition.

Based on previous theoretical and empirical results, H1 predicted that subjects
assigned the truth-inducing contract would create less budget slack than subjects
assigned the slack-inducing contract. Results of the analysis of variance reveal a
significant main effect of contract type, F(1, 138) = 26.99 (p < 0.001), indicating
that subjects assigned the slack-inducing contract created significantly more slack
on average (3.12 words) than subjects assigned the truth-inducing contract (0.76
words). These results provide support for H1.
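The marginal means of 3.12 and 0.76 words appear to be the Table 2 (Panel A) cell means weighted by cell size, and each F in Panel B is the effect mean square divided by the error mean square. The arithmetic can be checked with the reported figures alone; the dictionary layout and helper names below are illustrative.

```python
# Cell means and sizes from Table 2, Panel A: (contract, process) -> (mean slack, n)
CELLS = {
    ("slack-inducing", "fair"): (2.40, 42),
    ("slack-inducing", "unfair"): (3.85, 41),
    ("truth-inducing", "fair"): (0.80, 31),
    ("truth-inducing", "unfair"): (0.71, 28),
}

# Sums of squares and degrees of freedom from Table 2, Panel B.
EFFECTS = {
    "contract type": (193.21, 1),
    "contracting process": (15.85, 1),
    "contract x process": (20.44, 1),
}
ERROR_SS, ERROR_DF = 987.79, 138


def marginal_mean(contract):
    """n-weighted mean slack across both process conditions for one contract."""
    pairs = [v for (c, _), v in CELLS.items() if c == contract]
    return sum(m * n for m, n in pairs) / sum(n for _, n in pairs)


def f_ratio(effect):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ss, df = EFFECTS[effect]
    return (ss / df) / (ERROR_SS / ERROR_DF)


if __name__ == "__main__":
    for contract in ("slack-inducing", "truth-inducing"):
        print(contract, round(marginal_mean(contract), 2))  # 3.12, then 0.76
    for effect in EFFECTS:
        print(effect, round(f_ratio(effect), 2))  # 26.99, 2.21, 2.86
```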
H2 predicted that less slack would be created under a fair contracting
process than under an unfair contracting process when the incentive contract was
of the slack-inducing type, but that slack creation would be insensitive to
the contracting process when the incentive contract was of the truth-inducing
type. Results of the simple effects analysis employed to test this hypothesis are
presented in Table 2 (Panel C). No significant difference between slack created
under the fair and unfair contracting processes was found within the truth-inducing
contract condition, F(1, 57) = 0.06. Differences in the amount of slack created
depending on the fairness of the contracting process were significant within the
slack-inducing contract condition, F(1, 81) = 4.09 (p < 0.05), with subjects
in the unfair contracting process condition creating significantly more slack
than subjects in the fair contracting process condition. These results provide
support for H2.
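The simple effects in Table 2 (Panel C) can be approximated from the Panel A summary statistics with a pooled-variance comparison of the two process conditions, whose squared t statistic is an F with 1 and n1 + n2 − 2 degrees of freedom. The paper does not state its exact computation, so this is an assumption; with the rounded inputs it gives about 4.10 on 81 df for the slack-inducing contract and about 0.05 on 57 df for the truth-inducing contract, close to the reported 4.09 and 0.06.

```python
def pooled_f(mean1, sd1, n1, mean2, sd2, n2):
    """Two-group F(1, n1 + n2 - 2) from summary statistics (equal-variance t squared)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t ** 2, df


if __name__ == "__main__":
    # Fair vs. unfair process, using the Table 2 Panel A cell statistics.
    f_slack, df_slack = pooled_f(2.40, 2.38, 42, 3.85, 3.97, 41)
    f_truth, df_truth = pooled_f(0.80, 1.47, 31, 0.71, 1.49, 28)
    print(round(f_slack, 2), df_slack)  # 4.1 81
    print(round(f_truth, 2), df_truth)  # 0.05 57
```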
DISCUSSION
This study explores the relationship between fair and unfair contracting processes,
budget-based compensation contracts, and the creation of budgetary slack. Prior
research examines the effectiveness of a variety of forms of truth-inducing
contracts in reducing budgetary slack. The current study contributes to the
literature by examining the effectiveness of two specific contract forms when
combined with fair or unfair contracting processes in reducing the amount of
budgetary slack subjects create.
Results were as predicted. Consistent with results of prior studies in this area,
subjects compensated under the truth-inducing contract created significantly less
slack than subjects compensated under the slack-inducing contract. Subjects
compensated under a slack-inducing contract and exposed to an unfair contracting
process created more budgetary slack than subjects compensated under a slack-
inducing contract and exposed to a fair contracting process. Finally, the amount
of budgetary slack created by subjects compensated under the truth-inducing
incentive contract was insensitive to the fairness or unfairness of the contracting
process employed.
It is interesting to note that the benefit of fairness in contracting is realized
only for the slack-inducing contract form. In addition, the truth-inducing contract
form was more effective than the combination of the slack-inducing contract
and fair contracting process in reducing slack creation. The problem is that
truth-inducing contracts do not appear to be widely utilized in practice, can be
costly to implement, are considered less fair, and are overall less preferred than
bonus-framed contracts like the slack-inducing contract studied here (Luft, 1994).
Results of the present study indicate that fairness in contracting may represent a
relatively effective alternative to the implementation of a truth-inducing contract.
Whether organizations currently utilize this means of reducing slack creation
behavior warrants further investigation.
The use of student subjects may limit the generalizability of these results to
managers in real organizations. While this threat to external
validity cannot be ruled out, the elements of the theory of procedural justice
on which the hypotheses are based are not manager-specific, but have been
found to apply equally well in a variety of settings. In other words, individuals
do not need managerial experience to be affected by the fairness treatment
in the study described in this paper. In addition, the incentive contracts by
which subjects were compensated are rooted in the basic economic premise that
individuals prefer more money to less. These contracts should therefore retain
their motivational qualities regardless of the degree of managerial experience
of the subjects.
Future research is also required to test directly the process by which fairness
perceptions are translated into reductions in the amount of slack subordinates
create. Cropanzano and Folger (1991) suggest perceptions of fairness lead to
increases in organizational commitment, which in turn lead to positive
organizational outcomes. Future studies, perhaps in field settings, are required to
determine whether organizational commitment also mediates the relationship
between fair contracting process and the creation of budgetary slack in the
incentive-contracting setting studied here.
NOTES
1. Task and environmental uncertainty are fundamental issues faced by managers
(Thompson, 1967). Task uncertainty refers to the difficulty of the task, its degree of
variability and the extent to which successful completion of the task depends on the successful
completion of other tasks (Tushman & Nadler, 1978). Environmental uncertainty has many
dimensions, the most important of which may be the degree to which the organization is con-
nected to and relies on other entities in its environment for information and/or resources and
the extent to which these other entities are undergoing change (Lawrence & Lorsch, 1967).
2. While it may be difficult to distinguish between slack created as a buffer against uncer-
tainty and slack created to game the performance evaluation system in real organizations, the
current study benefits from the control allowed in the laboratory environment. Specifically,
in the laboratory setting described here, uncertainty is held constant across experimental
conditions allowing for an analysis of slack created for budget gaming purposes.
3. While some attention has been paid to the fairness construct in previous accounting-
related studies (Ehlen & Welker, 1996; Hunton & Gibson, 1999; Libby, 1999; Lindquist,
1995; Magner & Welker, 1994; Moser et al., 1995), a search of the literature failed to
indicate any other studies examining the relevance of the fairness construct to the creation
of budgetary slack.
4. Although this task is relatively simple, it is not unlike the typical simple, repetitive
production task for which a piece-rate and/or bonus compensation would be paid in actual
organizations in order to motivate performance. The simplicity of the task means that it is
easily understood by subjects and is easy for them to learn in a relatively short period of
time. Therefore, the task gains in terms of experimental realism what may be lost in terms
of mundane realism.
5. The compensation parameters in both the slack-inducing and truth-inducing contracts
were set based on the results of a pre-test in which average output on the experimental task
for subjects similar in background to those taking part in the experiment was 25 words
translated in five minutes (minimum of 14 words, maximum of 38 words).
6. Voice is a generic term indicating the ability for subordinates to communicate their
interests to their superiors in an organization in order to exert some influence over the
decisions their superiors make (Folger, 1977). Budget participation could be viewed as
a context-specific form of voice defined as the process by which managers communicate
information about their productive capabilities to their superiors in order to influence the
setting of targets in their budget-based incentive contracts (Kren, 1992).
7. Before subjects were paid, their practice period and work period performance was
verified and their total compensation was recalculated.
8. Note that subjects’ probability of winning the prize is dependent not just on their own
performance, but on the performance of other subjects in the group. Due to the one-period
nature of the experiment and the setting in which the experiment took place, subjects
had no opportunity to collude or act in any strategic way. Also, note that the perceived
attainability of the target is important. If the target had been viewed as unattainable,
subjects would have conserved energy by not performing the task and taking the fixed
portion of the payment available under each of the incentive schemes. No subject adopted
this strategy, indicating that the compensation scheme was motivational and that subjects
viewed the target as difficult but attainable.
9. This scale also includes questions about outcome fairness. The outcome-related
questions were adapted from Tyler and Lind (1992) as “How would you judge the formula
itself that will be used to calculate your earnings for the work period?” and “How fair would
you judge the budget itself?” Perceptions of outcome fairness did not differ depending
on contract type, F(1, 138) = 2.08, process, F(1, 138) = 1.50, or the contract by process
interaction, F(1, 138) = 0.06.
10. As an additional check on subjects’ perceptions of fairness, subjects were asked to
answer the following question: “Think about the information you received about the
negotiation process between the workers and managers in this organization that was involved
in setting the earnings formula. On a scale of 1 to 5, where 1 means completely unfair
and 5 means completely fair, how fair would you judge the negotiation process?” Results
of a 2 × 2 analysis of variance indicated a significant main effect of contracting process
on subjects’ evaluations of the fairness of the negotiation process, F(1, 138) = 39.67,
p < 0.001, with subjects in the fair process condition rating the negotiation process
as fairer (mean = 3.71, std. dev. = 0.86) than subjects in the unfair process condition
(mean = 2.68, std. dev. = 1.09). No significant differences were found based on contract
form or the contract by process interaction.
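Note 8 observes that a subject’s chance of winning the prize depends on the other subjects’ performance as well as his or her own. A minimal sketch of why, assuming the prize is drawn as a simple raffle in which every ticket earned (see Appendix A) is one equally likely entry; the function name `win_probability` and the ticket counts are hypothetical.

```python
from typing import Sequence


def win_probability(own_tickets: int, group_tickets: Sequence[int]) -> float:
    """Chance that one of the subject's tickets is drawn from the pooled raffle.

    Hypothetical raffle design: every ticket is an equally likely entry.
    `group_tickets` lists the tickets held by everyone in the group,
    including the subject.
    """
    total = sum(group_tickets)
    if total == 0:
        raise ValueError("no tickets in the draw")
    return own_tickets / total


# Same own performance, different peers: the probability changes.
print(win_probability(10, [10, 10, 10, 10]))  # 0.25
print(win_probability(10, [10, 20, 30, 40]))  # 0.1
```

Under this assumption a subject can raise the probability only by earning more tickets, since the other subjects’ counts are outside his or her control in a one-period session.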
ACKNOWLEDGMENTS
I would like to thank John Waterhouse, Bill Scott, Duane Kennedy, and Jane
Webster for their guidance in the development and execution of this project. I also
wish to thank Glenn Feltham, Joseph Fisher, Kathryn Kadous, Kevin Kelloway,
Robert Mathieu, Don Moser, Steve Salterio, participants at the 1999 Management
Accounting Research Conference, and the accounting research workshops at
HEC (Montreal) and the University of Alberta for their many helpful comments
and suggestions. I gratefully acknowledge the School of Accountancy, University
of Waterloo and CGA-Canada for their financial support of this project. Data
available from the author upon request.
REFERENCES
Antle, R., & Eppen, G. D. (1985). Capital rationing and organizational slack in capital budgeting.
Management Science (Feb), 163–174.
Atkinson, A. (1985). Truth-inducing schemes in budgeting and resource allocation. Cost & Management
(May/June), 38–42.
Baiman, S. (1982). Agency research in management accounting: A survey. Journal of Accounting
Literature, 1, 154–213.
Baiman, S., & Evans, J. H. (1983). Pre-decision information and participative management control
systems. Journal of Accounting Research, 21, 371–395.
Baker, G. P., Jensen, M. C., & Murphy, K. J. (1988). Compensation and incentives: Practice vs. theory.
Journal of Finance, 43(3), 593–617.
Bart, C. (1988). Budgeting gamesmanship. Academy of Management Executive, 285–294.
Blau, P. (1964). Exchange and power in social life. New York, NY: Wiley.
Chow, C. W. (1983). The effects of job standard tightness and compensation scheme on performance:
An exploration of linkages. The Accounting Review, 58, 667–685.
Chow, C. W., Cooper, J. C., & Waller, W. S. (1988). Participative budgeting effects of truth inducing
pay schemes. The Accounting Review, 63, 111–123.
Chow, C. W., Hwang, R. N., & Liao, W. (2000). Motivating truthful upward communication of private
information: An experimental study of mechanisms from theory and practice. Abacus, 36(2),
160–179.
Cropanzano, R., & Folger, R. (1991). Procedural justice and worker motivation. In: R. M. Steers &
L. W. Porter (Eds), Motivation and Work Behavior (5th ed., pp. 131–143). New York, NY:
McGraw-Hill.
Cyert, R. M., & March, J. G. (1963). A behavioral theory of the firm. Englewood Cliffs, NJ: Prentice-
Hall.
Davila, T., & Wouters, M. (2000). Meeting budgets: Budget emphasis and the release of budgetary
slack. Working Paper: Stanford University, Stanford, CA.
Demski, J., & Feltham, G. (1978). Economic incentives in budgetary control systems. The Accounting
Review, 53, 336–359.
Dunk, A. S. (1993). The effect of budget emphasis and information asymmetry on the relation between
budgetary participation and slack. The Accounting Review, 68(2), 400–410.
Ehlen, C. R., & Welker, R. B. (1996). Procedural fairness in the peer and quality review programs.
Auditing: A Journal of Practice and Theory, 15(1), 38–52.
Eisenberger, R., Fasolo, P., & Davis-LaMastro, V. (1990). Perceived organizational support and
employee diligence, commitment and innovation. Journal of Applied Psychology, 75,
51–59.
Evans, J. H., Hannan, R. L., Krishnan, R., & Moser, D. V. (2001). Honesty in managerial reporting.
The Accounting Review, 76(4).
Evans, J. H., & Sridhar, S. S. (1996). Multiple control systems, accrual accounting, and earnings
management. Journal of Accounting Research, 34(1), 45–65.
Fehr, E., & Gächter, S. (2002). Do incentive contracts crowd out voluntary cooperation? Institute for
Empirical Research in Economics, Working Paper No. 34, University of Zurich.
Fehr, E., Klein, A., & Schmidt, K. M. (2001). Fairness, incentives and contractual incompleteness.
CESifo Working Paper No. 445: Center for Economic Studies, Munich.
Folger, R. (1977). Distributive and procedural justice: Combined impact of voice and improvement on
experienced inequity. Journal of Personality and Social Psychology, 35, 108–119.
Folger, R., & Cropanzano, R. (1998). Organizational justice and human resource management.
Thousand Oaks, CA: Sage Publications.
Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied
Psychology, 71(2), 340–342.
Greenberg, J. (1987). Reactions to procedural injustice in payment distributions: Do the means justify
the ends? Journal of Applied Psychology, 72(1), 55–61.
Hopwood, A. G. (1972). An empirical study of the role of accounting data in performance evaluation.
Journal of Accounting Research, 10, 156–182.
Hunton, J. (1996). Involving information system users in defining system requirements: The influence
of procedural justice perceptions on user attitudes and performance. Decision Sciences, 27(4),
647–671.
Hunton, J., & Beeler, J. D. (1997). Effects of user participation in systems development: A longitudinal
field experiment. MIS Quarterly, 21(4), 359–388.
Hunton, J., & Gibson, D. (1999). Soliciting user-input during the development of an accounting
information system: Investigating the efficacy of group discussion. Accounting, Organizations and
Society, 24, 597–618.
Jennergren, L. P. (1980). On the design of incentives in Soviet firms: A survey of some research.
Management Science (Feb), 193–197.
Jensen, M. C. (2001). Corporate budgeting is broken – Let’s fix it. Harvard Business Review, 79(10),
94–101.
Kim, W. C., & Mauborgne, R. A. (1993). Procedural justice, attitudes, and subsidiary top-management
compliance with multinationals’ corporate strategic decisions. Academy of Management
Journal, 36(3), 502–526.
Kirby, A. J. (1992). Incentive compensation schemes: Experimental calibration of the rationality
hypothesis. Contemporary Accounting Research, 8, 374–408.
Kirby, A. J., Reichelstein, S., Sen, P. K., & Paik, T. (1991). Participation, slack, and budget-based
performance evaluation. Journal of Accounting Research, 29, 109–128.
Kren, L. (1992). Budgetary participation and managerial performance: The impact of information and
environmental volatility. The Accounting Review, 67(3), 511–526.
Kren, L., & Liao, W. M. (1988). The role of accounting information in the control of organizations: A
review of the evidence. Journal of Accounting Literature, 7, 280–309.
Lawrence, P. R., & Lorsch, J. W. (1967). Organization and environment: Managing
differentiation and integration. Boston: Graduate School of Business Administration, Harvard
University.
Leventhal, G. S. (1980). What should be done with equity theory? In: K. J. Gergen, M. S. Greenberg
& R. H. Willis (Eds), Social Exchange: Advances in Theory and Research. NY: Plenum Press.
Libby, T. (1999). The influence of voice and explanation on performance in a participative budgeting
setting. Accounting, Organizations and Society, 24(2), 125–138.
Lind, E. A., Kanfer, R., & Earley, P. C. (1990). Voice, control and procedural justice: Instrumental and
non-instrumental concerns in fairness judgments. Journal of Personality and Social Psychology,
59(5), 952–959.
Lind, E. A., & Tyler, T. R. (1988). The social psychology of procedural justice. NY: Plenum.
Lindquist, T. M. (1995). Fairness as an antecedent to participative budgeting: Examining the effects of
distributive justice, procedural justice and referent cognitions on satisfaction and performance.
Journal of Management Accounting Research, 7, 122–147.
Loeb, M., & Magat, W. (1978). Soviet success indicators and the evaluation of divisional management.
Journal of Accounting Research (Spring), 103–121.
Luft, J. (1994). Bonus and penalty incentives: Contract choice by employees. Journal of Accounting
and Economics, 18, 181–206.
Magner, N., & Welker, R. B. (1994). Responsibility center managers’ reactions to justice in budgetary
resource allocation. Advances in Management Accounting (Vol. 3, pp. 237–253). Greenwich,
CT: JAI Press.
Melumad, N. D., & Reichelstein, S. (1989). Value of communication in agencies. Journal of Economic
Theory, 47, 334–368.
Merchant, K. A. (1985). Budgeting and the propensity to create budgetary slack. Accounting,
Organizations and Society, 10(2), 201–210.
Moorman, R. H., Blakely, G. L., & Niehoff, B. P. (1998). Does perceived organizational support mediate
the relationship between procedural justice and organizational citizenship behavior? Academy
of Management Journal, 41, 351–368.
Moser, D. V., Evans, J. H., III, & Kim, C. K. (1995). The effects of horizontal and exchange inequity
on tax reporting decisions. The Accounting Review, 70(4), 619–634.
Murphy, K. J. (2000). Performance standards in incentive contracts. Journal of Accounting and
Economics, 30(3), 245–278.
Namazi, M. (1985). Theoretical developments of principal-agent employment contract in accounting:
The state of the art. Journal of Accounting Literature, 4, 113–163.
Naumann, S. E., & Bennett, N. (2000). A case for procedural justice climate: Development and test of
a multilevel model. Academy of Management Journal, 43(5), 881–889.
Otley, D. T. (1985). The accuracy of budgetary estimates: Some statistical evidence. Journal of Business
Finance and Accounting, 12(3), 415–425.
Reichelstein, S. (1992). Constructing incentive schemes for government contracts: An application of
agency theory. The Accounting Review, 67, 712–731.
Reichelstein, S., & Osband, K. (1984). Incentives in government contracts. Journal of Public
Economics, 24, 257–270.
Rousseau, D. M., & Parks, J. M. (1993). The contracts of individuals and organizations. In: L. L.
Cummings & B. M. Staw (Eds), Research in Organizational Behavior (Vol. 15). JAI Press.
Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill.
Tushman, M. L., & Nadler, D. A. (1978). Information processing as an integrating concept in
organizational design. Academy of Management Review, 3(3), 613.
Tyler, T. R. (1989). The quality of dispute resolution processes and outcomes: Measurement problems
and possibilities. Denver University Law Review, 66, 419–436.
Tyler, T. R., & Lind, E. A. (1992). A relational model of authority in groups. In: L. Berkowitz (Ed.),
Advances in Experimental Social Psychology (Vol. 25, pp. 115–191). Academic Press.
Van der Stede, W. A. (2000). The relationship between two consequences of budgetary controls:
Budgetary slack creation and managerial short-term orientation. Accounting, Organizations
and Society, 25(6), 609–622.
Walker, K. B., & Johnson, E. N. (1999). The effects of budget-based incentive compensation scheme
on the budgeting behavior of managers and subordinates. Journal of Management Accounting
Research, 11, 1–28.
Waller, W. S. (1988). Slack in participative budgeting: The joint effects of a truth-inducing pay scheme
and risk preferences. Accounting, Organizations and Society, 87–98.
Waller, W. S., & Chow, C. W. (1985). The self-selection and effort effects of standard-based
employment contracts: A framework and some empirical evidence. The Accounting Review, 60(3),
458–476.
Weitzman, M. (1976). The new Soviet incentive model. Bell Journal of Economics (Spring),
251–257.
Young, S. M. (1985). Participative budgeting: The effects of risk aversion and asymmetric information
on budgetary slack. Journal of Accounting Research, 23(2), 829–842.
Young, S. M., & Lewis, B. (1995). Experimental incentive contracting research in management
accounting. In: R. H. Ashton & A. H. Ashton (Eds), Judgment and Decision-making
Research in Accounting and Auditing (pp. 55–75). Cambridge, NY: Cambridge University
Press.
APPENDIX A
Sample Payments Under the Truth-inducing Contract
Cells of the table below represent the number of tickets earned under different
combinations of budgeted and actual performance. Diagonal cells were shaded
to emphasize that the maximum payments would be earned when the budgets
subjects selected were equal to their actual performance.
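The diagonal pattern described above can be illustrated with one textbook form of a truth-inducing scheme in the spirit of Weitzman (1976). The functional form and the parameters `fixed`, `beta`, `alpha` and `gamma` (with 0 < alpha < beta < gamma) are illustrative assumptions, not the contract parameters used in this study, and the budget and output levels are hypothetical values near the pre-test average of 25 words.

```python
from typing import List


def tickets(budget: int, actual: int, fixed: float = 10.0, beta: float = 2.0,
            alpha: float = 1.0, gamma: float = 3.0) -> float:
    """Payment (in tickets) under a hypothetical Weitzman-style scheme.

    Requires 0 < alpha < beta < gamma: each unit of budget pays beta, output
    above budget pays only alpha per unit, and each unit of shortfall costs
    gamma. For any actual output, payment therefore peaks at budget == actual.
    """
    if actual >= budget:
        return fixed + beta * budget + alpha * (actual - budget)
    return fixed + beta * budget - gamma * (budget - actual)


levels = range(20, 31, 2)  # hypothetical budgeted/actual word counts
table: List[List[float]] = [[tickets(b, a) for b in levels] for a in levels]
print("actual \\ budget", list(levels))
for a, row in zip(levels, table):
    print(a, row)  # largest entry in each row sits on the diagonal
```

Below the diagonal, raising the budget adds beta but forgoes alpha per unit, so payment rises; above it, each extra unit of budget adds beta but costs gamma, so payment falls. Overstating or understating capability is therefore never rewarded under these assumed parameters.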
APPENDIX B
Fair and Unfair Contracting Process Scenarios
Fair Process:
You have learned that your supervisor has held this supervisory position for many
years. You have also noted that your supervisor appears to be very popular with
your co-workers. Your supervisor’s philosophy is that the employees of the division
are the experts when it comes to the work that they do and that much can be learned
from listening to their suggestions.
The formula that is used to calculate your earnings, as was described above, is a
relatively new innovation within this division. The form of the contract was agreed
upon approximately one year ago based on negotiations between representatives
of the employee group and management. The management negotiation group was
headed by your supervisor.
Although you have been told that the negotiation process led to a degree of tension
between your co-workers and your supervisor, your co-workers seem to be fully
supportive of the contract as it now stands. You have been told by one of your
co-workers, whose opinion you respect, that this is mainly due to strong
communication between the employee and management groups during the negotiation
process.
You have also noticed that the majority of your co-workers with whom you have
talked about the negotiation process feel that the management team was sincerely
interested in their opinions about the earnings formula. Before the formula was
finalized, the management team performed an informal poll of the employees who
would be affected by it and found that the majority supported it. Whenever an issue
came up on which there was disagreement, the worker and management groups
were able to talk out their differences and come to a satisfactory solution, although
the management group also offered to allow any unresolved issues to be passed on
to an objective third-party decision maker.
Many of the employees of this division have held positions within the division
for many years. While increasing their overall pay is, of course, very important to
your co-workers, providing accurate budgets to management and increasing overall
production efficiency in order to ensure the long-term survival of the organization
also seems to be high on their list of priorities. You have heard four or five of
them say that they would have to be given a pretty large raise in pay before they
would be willing to move to a job in another division mainly due to the positive
atmosphere between employees and managers in this division.
Unfair Process:
You have learned that your supervisor has held this supervisory position for many
years. You have also noted that, although your co-workers are polite and do as the
supervisor asks, he does not seem to be very popular with them. The supervisor’s
philosophy is that employees should work hard to receive higher pay and leave
all other decisions to him. You have been told that the supervisor feels that his
long-term position as supervisor of the division makes him the best judge of how
the work should be done and he is not really interested in receiving feedback or
suggestions from the employees that he supervises.
The formula that is used to calculate your earnings, as was described above, is a
relatively new innovation within this division. The form of the contract was agreed
upon approximately one year ago based on negotiations between representatives
of the employee group and management. The management negotiation group was
headed by your supervisor.
You have been told that the negotiation process led to a great deal of tension between
your co-workers and your supervisor. Your co-workers seem to be quite bitter about
the contract as it now stands. You have been told by one of your co-workers, whose
opinion you respect, that this is mainly due to the lack of communication between
the employee and management groups during the negotiation process.
You have also noticed that the majority of your co-workers with whom you have
talked about the negotiation process feel that the management team appeared to
be completely uninterested in their opinions about the earnings formula. Before
the formula was finalized, the employee group suggested that an informal poll be
taken of employees who would be affected by it to measure their degree of support.
This suggestion was ignored by the management group. Whenever an issue came
up on which there was disagreement, the worker and manager groups found it
difficult to come to a satisfactory solution and generally, the solution was imposed
by the person in charge of the management group, who happens to have been your
supervisor.
Many of the employees of this division have held positions within the division for
only a year or two. Receiving the highest possible earnings at the end of each work
period seems to be of utmost importance to your co-workers. You have heard four
or five of them say that they view their position as only a “stepping stone” to a
better position within another division of the organization. Increasing production
efficiency and the long-term health of the division by providing accurate budgets
to management does not seem to be high on their list of priorities. A few of your
co-workers have commented that they would not have to be given a very large raise
in pay, or any raise at all, to convince them to move to a job in another division
of the organization where the atmosphere between the workers and the supervisor
was more positive.
A TOBIT ANALYSIS OF ACCOUNTING
FACULTY PUBLISHING
PRODUCTIVITY IN AUSTRALIAN AND
NEW ZEALAND UNIVERSITIES
Brett R. Wilkinson, Chris H. Durden and Katherine J. Wilkinson
ABSTRACT
This study examines the research behavior of Australian and New Zealand
accounting faculty to determine the characteristics that influence research
productivity. University reputations are integrally linked with research
performance, and determining the qualities that predict research behavior
may be of particular value in the selection and recruitment process. The
study finds that two key factors significantly impact performance: holding
a Ph.D. and having an academe-oriented rather than profession-oriented
background. These results may be interpreted as affirming the U.S. model
of developing specialist academic researchers through doctoral education
programs rather than employing faculty with strong professional experience.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06008-3
1. INTRODUCTION
Research is an integral function of any university and a key determinant of
academic reputation (Baden-Fuller et al., 2000). Primarily, a university’s research
productivity is measured by the quantity and quality of published outputs achieved
by its faculty. In this sense publication output is an important method of assessing
academic performance for promotion and tenure decisions (Englebrecht et al.,
1994; Zivney, Bertin & Gavin, 1995). It enables research students, prospective
faculty members, and the academic community in general to make better-informed
decisions about the standing of a particular institution or department (Bairam,
1996; Cargile & Bublitz, 1986; Demski & Zimmerman, 2000; Hull & Wright,
1990). Additionally, the volume of publishing activity may influence the level of
funding a department receives (Doyle & Arthurs, 1995; Gray & Helliar, 1994). The
importance of published research was aptly highlighted by Cargile and Bublitz
(1986) with their reference to a statement by Davidson (1957, p. 117) that:
The effectiveness or efficiency of a faculty is indeed difficult to measure, and I would not deny
the important faculty function performed by the non-writing researcher. However, I think he
is likely to be a relative “rare bird.” For the great majority of faculty members, it seems to me
that we must continue to emphasize the place of research and publication in their programs.
Only by this procedure can we hope to have accounting remain a vital and stimulating force
in business education and management.
Understanding the factors that impact the level of research (measured in publications)
achieved by faculty members is very important from a university perspective.
This is particularly relevant when recruiting new and inexperienced faculty, where
the existing faculty must rely on indicators of possible future publication success
rather than on an observed publication stream. The issue is even more salient given
that “Most academics publish very little, or not at all” (Demski & Zimmerman,
2000, p. 346). However, research studies in accounting investigating factors that
indicate future publishing output levels are relatively limited (e.g. Cargile &
Bublitz, 1986; Gee & Gray, 1989; Gray & Helliar, 1994; Maranto & Streuly, 1994).
These studies suggest that various factors, such as the institutional setting of the
researcher and possessing a Ph.D., impact the level of research output. A difficulty
with conducting this form of research is that only a relatively small number of
the factors that are likely to influence publication output are “observable” (e.g.
research interests, Ph.D. qualification). Other relevant factors are likely to be more
difficult to accurately measure (e.g. ability, ambition) (Gray & Helliar, 1994).
This study examines the research behavior of Australian and New Zealand
accounting faculty to determine the characteristics that influence research
productivity. In essence, the study asks what factors will predict the desired
research behavior, namely papers published in quality academic journals. It builds
on the work of Wilkinson and Durden (1998) and Durden et al. (1999) who
measured research outputs in an attempt to enable comparisons of performances
across universities. Those studies served to provide a basis for ranking university
departments, but did not seek to explain in any comprehensive sense the observed
differences between individual faculty performances. This study develops a Tobit
model to explain publishing output behavior. The findings indicate that two key
factors contribute to publishing performance – holding a Ph.D. qualification
and having an academe-orientation and background rather than an extensive
professional background. Other indicators of publishing productivity were having
stated research interests in the financial accounting, managerial accounting and
auditing fields. This may also reflect a bias in the higher-ranked journals toward
these areas of interest. That is to say, researchers may focus their research efforts in
financial accounting, managerial accounting and auditing because the more highly
ranked journals are more open to accepting research in these areas than in newer
subdisciplines. This is consistent with Daigle and Arnold’s (2000) suggestion that
many of the accounting information systems researchers are forced to develop
and promote research interests in other subdisciplines because research in these
other areas (financial, managerial and tax accounting) is more likely to result in
the highly-ranked journal publications required for tenure purposes.
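Because “Most academics publish very little, or not at all” (Demski & Zimmerman, 2000, p. 346), a publication score piles up at zero, which is the censored-outcome situation a Tobit model is built for. A minimal sketch of the standard Tobit log-likelihood with left-censoring at zero, assuming a latent index y* = x'β + ε with ε ~ N(0, σ²) and observed y = max(0, y*); the function names and the toy data are hypothetical, and the study’s own specification is the one developed in Section 3.

```python
import math
from typing import Sequence


def log_norm_cdf(z: float) -> float:
    """log Phi(z) via the complementary error function.

    Adequate for moderate z; for extreme negative z a dedicated routine
    such as scipy.stats.norm.logcdf is numerically safer.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def log_norm_pdf(z: float) -> float:
    """log phi(z) for the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(beta: Sequence[float], sigma: float,
                 X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Log-likelihood of a Tobit model left-censored at zero.

    Censored observations (y == 0) contribute log P(y* <= 0) = log Phi(-xb / sigma);
    uncensored observations (y > 0) contribute the normal density of y around xb.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        if yi <= 0.0:
            total += log_norm_cdf(-xb / sigma)
        else:
            total += log_norm_pdf((yi - xb) / sigma) - math.log(sigma)
    return total


# Toy data: columns are an intercept and a hypothetical Ph.D. dummy.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [0.0, 0.0, 0.8, 1.5]  # made-up weighted publication scores
print(tobit_loglik([-0.2, 1.2], 0.7, X, y))
```

In practice β and σ would be estimated by maximizing this function with a numerical optimizer. Unlike ordinary least squares on the raw scores, the censored terms let the zero publishers inform the estimates without being treated as exact values of the latent index.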
The remainder of the paper is organized as follows. Section 2 develops the
hypotheses in the context of the extant literature. Section 3 outlines the model
development and data analysis. Results are shown in Section 4 and conclusions
and limitations are discussed in Section 5.
2. HYPOTHESIS DEVELOPMENT
Based on an analysis of prior literature, several important characteristics appear to
impact research output. First, possessing a Ph.D. is expected to affect research productivity.
Since the Ph.D. comprises by definition an intensive research preparation process,
a positive relationship likely exists between research productivity and possession
of a Ph.D. degree. Arguments about the importance of the Ph.D. are often based
on theories of human capital (Long et al., 1998; Maranto & Streuly, 1994). In this
sense the Ph.D. provides students with higher levels of intellectual capital which
should result in higher levels of research output and career success. This may exist
among graduates from a range of Ph.D. programs rather than being restricted only
to those with high status academic origins (Long et al., 1998). Other research has
also indicated an association between holding a Ph.D. and research productivity
(Gray et al., 1987; Gray & Helliar, 1994). The Australian and New Zealand
context provides an opportunity to further explore the role of the Ph.D. because
only a relatively small proportion of faculty in these two countries hold a Ph.D.
At the time this study was undertaken, only 24% of faculty members were Ph.D.
qualified. H1, in the alternate form, is as follows:
H1. There is a positive relationship between Ph.D. qualification and research
productivity.
Other factors that would seemingly be significant predictors of publishing
productivity include ambition and motivation (Demski & Zimmerman, 2000;
Gray & Helliar, 1994; Long et al., 1998; Maranto & Streuly, 1994). Because
explicitly measuring these constructs is difficult, appropriate proxy measures must
be sought. Here, the tenure/position confirmation process and promotion structure
should motivate publishing activity in the early years of employment (Demski &
Zimmerman, 2000). This may decrease in importance as the number of years of
employment at a given institution significantly increases. Furthermore, periods of
low productivity are expected to impede promotion opportunities to other similarly
ranked institutions (Zivney et al., 1995). Accordingly, H2 is stated as follows:
H2. There is an inverse relationship between years of employment at a given
institution and publishing productivity.
Closely related to the above concepts of motivation and ambition is that of
motivation for, or interest in, research (Demski & Zimmerman, 2000). Here,
faculty with a predominantly professional background and focus may be less
likely to prioritize the research process. These faculty members may have entered
academe predominantly to become teachers rather than to pursue an interest in
research. Various accounting faculty who have commenced in academe after
spending considerable time in practice have commented on the attraction of
teaching as a reason for making the career change (e.g. see Beresford, 2001; Meyer
& Titard, 2000). Conversely, those with predominantly academic backgrounds
who started a research career early are likely to have self-selected into such a
career owing to a research orientation. This is also reinforced by the observation
that increasing numbers of accounting faculty no longer hold professional
qualifications (Newell et al., 1996; Otley, 2002). Accounting faculty pursuing an
academic career from an early age are expected to have a strong research, rather
than teaching or practice, orientation (Abdolmohammadi et al., 1985; Imhoff,
1988; Mautz, 1988). Further, in a non-U.S. context where the pursuit of the Ph.D.
has been less prevalent, it is now increasingly expected that university staff should
have a completed Ph.D. as a prerequisite to an academic career (Blaxter et al.,
1998). This also coincides with greater recognition of the pursuit of research,
rather than teaching, as the primary purpose of a university career (Blaxter et al.,
1998). The emphasis on research early in a university career is reflective of the
theory of accumulative advantage where an “Initial advantage or disadvantage
compounds over time. A premium is placed on a quick start, and ‘late blooming’
is penalized” (Maranto & Streuly, 1994, p. 388). Two measures are employed
as proxies for research interest or orientation: membership of a professional
accounting body, and employment background.1 H3 and H4 are stated as follows:
H3. There is an inverse relationship between research productivity and
professional body membership.
H4. There is an inverse relationship between research productivity and substan-
tial experience outside academe (five years or more).
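Taken together, H1 to H4 map onto a small set of regressors with predicted signs. A minimal sketch of how one faculty record might be coded, assuming hypothetical field names, a 0/1 coding for the qualitative characteristics and a simple year count for tenure at the institution; only the five-year threshold for outside experience (H4) comes from the text.

```python
from dataclasses import dataclass
from typing import Dict

# Predicted sign of each coefficient under H1-H4 (alternate form).
PREDICTED_SIGN: Dict[str, int] = {
    "phd": +1,                   # H1: Ph.D. qualification
    "years_at_institution": -1,  # H2: years employed at the current institution
    "professional_member": -1,   # H3: membership of a professional accounting body
    "outside_experience": -1,    # H4: five or more years outside academe
}


@dataclass
class Faculty:
    """Hypothetical record layout for one faculty member."""
    has_phd: bool
    years_at_institution: int
    professional_member: bool
    years_outside_academe: float


def regressors(f: Faculty) -> Dict[str, float]:
    """Code one faculty member's characteristics as the H1-H4 variables."""
    return {
        "phd": 1.0 if f.has_phd else 0.0,
        "years_at_institution": float(f.years_at_institution),
        "professional_member": 1.0 if f.professional_member else 0.0,
        # H4 uses a threshold: "substantial" means five years or more.
        "outside_experience": 1.0 if f.years_outside_academe >= 5 else 0.0,
    }


print(regressors(Faculty(has_phd=True, years_at_institution=3,
                         professional_member=False, years_outside_academe=4.5)))
```

Keeping the predicted signs next to the coding makes it easy to check, after estimation, whether each fitted coefficient has the sign its hypothesis anticipates.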
Measures of faculty research interests are included as control variables (Gray
& Helliar, 1994). No predictions are made as to the significance of these
interests, although the subdiscipline preferences of highly ranked journals may
affect the extent to which a researcher can achieve highly ranked publications
in a given field of interest (Hasselback et al., 2000). These
measures serve as controls for journal biases toward certain fields.
As a final control, a measure of the region of highest qualification is included.
Researchers trained in the U.S. may have greater access to U.S. journals,
many of which rank highly in a range of quality indexes (e.g. Brinn et al., 1996;
Hasselback et al., 2000; Nobes, 1985). This measure may also proxy for research
interest, since Australian and New Zealand accounting faculty who have chosen
to pursue Ph.D. qualifications in the U.S. are likely to have done so on the basis
of an orientation toward research rather than toward the profession.2
3. MODEL DEVELOPMENT AND DATA ANALYSIS
The data used in this study are sourced from Wilkinson and Durden (1998) and
Durden et al. (1999). These data were collected from several sources. The Jacaranda
Wiley Directory of Accounting: 1998–1999 for Australia and New Zealand was
used to derive names and basic data for all Australian and New Zealand accounting
faculty, at the lecturer level and above.3 After deletion of non-accounting faculty4
included in the directory (e.g. business law and finance faculty included within
accounting departments) and those below the rank of lecturer, the total sample
size is 716 faculty members.5 The basic data collected includes research interests,
the length of time of employment at the current institution, qualifications held
and prior employment details. The research output of each faculty member
for the five-year period 1992 to 1997 was derived from an electronic database
(ABI Inform) and from hard copies of the Accounting and Tax Index. While
these indexes provide a relatively comprehensive coverage of the literature,
they do not include several key Australian and New Zealand journals. Failure
to include these potentially biases the results since Australian and New Zealand
accounting faculty might be expected to focus their publishing efforts in these
journals. Accordingly, the data obtained from these indexes are supplemented
with publications from all Australian and New Zealand accounting journals not
included in either the Accounting and Tax Index or ABI Inform. Only research
published in journals was used.6 The weighted measure of publication output
developed by Wilkinson and Durden (1998) and Durden et al. (1999), which uses
Zeff’s (1996) library holdings as a quality index, serves as the measure of
research productivity. This weighted measure adjusts each author’s allocation for
the number of coauthors (half an article is attributed to each author of a
two-author paper, one third to each author of a three-author paper, and so on).
The measure also allocates a quality weighting based on Zeff’s (1996) library
survey. Zeff (1996) conducted a survey of 12 major libraries, five in the U.S.,
five in the U.K. and two in Australia and measured the library holdings of 77
accounting research journals. Each journal was then effectively given a rating
from zero to 12 according to how many of the 12 libraries held it. This study
uses a quality weighting based on this rating (e.g. a journal held by all 12
libraries was weighted as 1 and a journal held by 6 of the libraries was weighted
as 0.5).
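The two adjustments can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: since the text describes the coauthor share and the Zeff-based quality weight separately, the sketch assumes they are multiplied for each article and summed over a faculty member’s articles. The names `Article`, `article_weight` and `weighted_publications` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Article:
    libraries_holding: int  # Zeff (1996) rating: libraries (of 12) holding the journal
    n_authors: int          # total number of authors on the paper


def article_weight(article: Article) -> float:
    """Credit one author receives for one article (assumed multiplicative)."""
    if not 0 <= article.libraries_holding <= 12:
        raise ValueError("Zeff ratings run from 0 to 12")
    if article.n_authors < 1:
        raise ValueError("an article needs at least one author")
    coauthor_share = 1.0 / article.n_authors         # 1/2, 1/3, ...
    quality_weight = article.libraries_holding / 12  # 12 -> 1.0, 6 -> 0.5
    return coauthor_share * quality_weight


def weighted_publications(articles: Iterable[Article]) -> float:
    """Sum of per-article credits for one faculty member."""
    return sum(article_weight(a) for a in articles)


# Hypothetical record: a sole-authored paper in a journal held by all 12
# libraries plus a two-author paper in a journal held by 6 libraries.
record = [Article(libraries_holding=12, n_authors=1),
          Article(libraries_holding=6, n_authors=2)]
print(weighted_publications(record))  # 1.0 + 0.25 = 1.25
```

Under this reading, a single sole-authored article in a journal held by all 12 libraries is worth exactly one weighted publication, and every other article is worth a fraction of one.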
The Zeff (1996) results are comparable to those of studies using faculty surveys to measure
quality. The 15 journals that were held by 11 or 12 of the 12 libraries reviewed
by Zeff (1996) compared closely with the top journals identified using surveys
of accounting faculty (e.g. Brown & Huefner, 1994; Hull & Wright, 1990). A
key benefit of using Zeff’s ratings is that his research is international in nature
and thus likely provides a more appropriate measure of quality for faculty in the
Australian-New Zealand environment. The Zeff (1996) ratings for the journals in
which faculty in this study had published are shown in Table 1. Interested readers
are referred to Zeff (1996) for the ratings of the full 77 journal set. Descriptive
statistics for the data used in the model are shown in Table 2.
Because the independent variables may proxy for underlying but unmeasurable
qualities, some correlation among them is possible; however, as shown in Table 3,
there is no evidence of unreasonably high correlations.
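The p values in Table 3 are consistent with the usual t test of a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below recomputes a few of them, assuming that each correlation was computed on all N = 716 faculty (the text does not state this explicitly). It checks significance only. The multicollinearity concern rests on the magnitudes, and none of the coefficients in Table 3 exceeds 0.16 in absolute value.

```python
import math

from scipy import stats


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 given a sample correlation r on n pairs.

    Under H0, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution
    with n - 2 degrees of freedom.
    """
    df = n - 2
    t_stat = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return 2.0 * stats.t.sf(abs(t_stat), df)


N = 716  # assumption: every pairwise correlation uses all 716 faculty
for label, r in [("Years vs. Membership", 0.11230),
                 ("U.S. qualifications vs. Ph.D.", 0.15979),
                 ("Membership vs. Academe", -0.13644)]:
    print(f"{label}: r = {r:+.5f}, p = {pearson_p_value(r, N):.4f}")
```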
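The estimated model is as follows:
$$\text{Weighted publications} = \beta_0 + \beta_1\,\text{years employed} + \beta_2\,\text{financial accounting} + \beta_3\,\text{managerial accounting} + \beta_4\,\text{auditing} + \beta_5\,\text{tax} + \beta_6\,\text{theory} + \beta_7\,\text{education} + \beta_8\,\text{other} + \beta_9\,\text{U.S. qualified} + \beta_{10}\,\text{Ph.D.} + \beta_{11}\,\text{membership} + \beta_{12}\,\text{academe} + u$$
with u a normally distributed error term whose scale, σ, is estimated along with the coefficients.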
Table 1. Ratings Derived from Zeff (1996) for Journals in the Sample.
Journal                                                       Rating
Abacus                                                        12
Accounting and business research                              12
Accounting and finance                                        7
Accounting, auditing and accountability journal               8
Accounting forum                                              2
Accounting historians journal                                 10
Accounting history                                            0
Accounting horizons                                           12
Accounting, organizations and society                         12
The accounting review                                         12
Advances in accounting                                        7
Advances in international accounting                          6
Advances in management accounting                             3
Asian review of accounting                                    0
Auditing: A journal of theory and practice                    10
Australian accounting review                                  3
Behavioral research in accounting                             7
British accounting review                                     11
Contemporary accounting research                              11
Financial accountability and management                       6
The international journal of accounting                       11
Issues in accounting education                                11
Journal of accounting and economics                           12
Journal of accounting and public policy                       11
Journal of accounting, auditing and finance                   11
Journal of accounting education                               6
Journal of accounting research                                12
Journal of business finance and accounting                    12
Journal of cost management                                    9
Journal of international accounting, auditing and taxation    3
Journal of international financial management and accounting  7
Management accounting research                                7
Pacific accounting review                                     3
Research in accounting in emerging economies                  1
where:
years employed = years employed at the current institution;
financial, managerial, auditing, tax, theory, education and other = dummy variables
taking the value of 1 for each stated faculty area of research interest and zero
otherwise;
U.S. qualified = a dummy variable taking the value of 1 if the highest educational
qualification is from a U.S. institution and zero otherwise;
Ph.D. = a dummy variable taking the value of 1 if a Ph.D., DBA or D.Phil.
qualification is held and zero otherwise;
membership = a dummy variable taking the value of 1 if professional body
membership is held and zero otherwise;
academe = a dummy variable taking the value of 1 if the individual has less
than 5 years of experience outside academe and zero otherwise.
Table 2. Descriptive Statistics for Variables Used in the Study.
Panel A: Continuous Variables
Variable                               N    Mean   Standard Deviation   Minimum   Maximum
Weighted publications                  716  0.29   0.68                 0         5.625
Years employed at current institution  716  9.61   6.77                 0         39.000
Panel B: Dummy Variables
Variable                                  Frequency   Percent
Financial accounting research interest    160         22.4
Managerial accounting research interest   126         17.6
Auditing research interest                84          11.7
Taxation research interest                41          5.7
Accounting education research interest    68          9.5
Accounting theory research interest       17          2.4
Other research interests                  398         55.6
Qualifications from US                    30          4.2
Ph.D.                                     172         24.0
Professional membership                   551         77.0
Academic orientation                      466         65.1
Since a large number of faculty members had no publications during the period
of measurement, there is a high proportion of zeros in the weighted publications
measure. Thus, while data for the independent variables are available, the dependent
variable is censored at zero. One possibility would be to estimate the model via OLS
using only those faculty for whom the dependent variable is nonzero. However, as
noted by Judge et al. (1988), this results in biased and inconsistent estimators. A
more appropriate approach is to estimate a Tobit regression model.
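The contrast between OLS on the nonzero observations and a Tobit fit can be illustrated on simulated data. This is a sketch only: the generating values (intercept −0.5, slope 1.0, σ = 1) are arbitrary and unrelated to the study. The Tobit log-likelihood is maximized with SciPy’s BFGS optimizer rather than the SAS LIFEREG procedure the study used, and `tobit_negloglik` and `fit_tobit` are hypothetical names.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, X, y):
    """Negative log-likelihood of a Tobit model left-censored at zero.

    params holds the coefficients followed by log(sigma), so sigma stays positive.
    """
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    censored = y <= 0
    # Censored observations contribute P(y* <= 0) = Phi(-x'beta / sigma).
    ll_censored = norm.logcdf(-xb[censored] / sigma).sum()
    # Uncensored observations contribute the normal density of y around x'beta.
    ll_observed = norm.logpdf(y[~censored], loc=xb[~censored], scale=sigma).sum()
    return -(ll_censored + ll_observed)


def fit_tobit(X, y):
    """Maximize the Tobit likelihood numerically; returns (beta_hat, sigma_hat)."""
    start = np.zeros(X.shape[1] + 1)
    result = minimize(tobit_negloglik, start, args=(X, y), method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))


# Simulated (hypothetical) data: latent y* = -0.5 + 1.0 x + u, u ~ N(0, 1),
# observed y = max(y*, 0), so roughly 64% of the observations are zeros.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.maximum(X @ np.array([-0.5, 1.0]) + rng.normal(size=n), 0.0)

# OLS on the nonzero observations only, the approach the text rejects.
positive = y > 0
ols_beta, *_ = np.linalg.lstsq(X[positive], y[positive], rcond=None)

tobit_beta, tobit_sigma = fit_tobit(X, y)
print("OLS, y > 0 only:", np.round(ols_beta, 3))
print("Tobit MLE:      ", np.round(tobit_beta, 3), "sigma =", round(tobit_sigma, 3))
```

On data like these, the slope from OLS on the positive observations should come out well below the true value of 1. The Tobit estimates should land close to the generating parameters, because the likelihood uses the zeros as information rather than discarding them.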
McDonald and Moffitt (1980, p. 318) identify the Tobit model as assuming “an
underlying, stochastic index equal to (X_tβ + u_t) which is observed only when it is
positive, and hence qualifies as an unobserved, latent variable.” They express the
stochastic model as follows:
$$Y_t = \begin{cases} X_t\beta + u_t & \text{if } X_t\beta + u_t > 0,\\ 0 & \text{if } X_t\beta + u_t \le 0, \end{cases} \qquad t = 1, 2, \ldots, N.$$
A Tobit model is estimated via the SAS LIFEREG procedure, using the normal
probability distribution for the error term. As noted by McDonald and Moffitt
(1980), the estimated regression parameters cannot be interpreted in the usual
OLS sense. They will, however, enable us to ascertain which independent variables
significantly affect publishing performance. As noted later in the paper, the Tobit
model can be interpreted as providing a probability of publishing measure for
individuals with a given set of characteristics.
Table 3. Pearson Correlation Coefficients (p Values) Between Key Independent Variables.
                     Years                U.S. qualifications  Membership           Ph.D.                Academe
Years                1.00000              −0.02101 (0.5747)    0.11230 (0.0026)     −0.04369 (0.2429)    0.01701 (0.6496)
U.S. qualifications                       1.00000              −0.06764 (0.0705)    0.15979 (<0.0001)    0.06543 (0.0802)
Membership                                                     1.00000              −0.01835 (0.6241)    −0.13644 (0.0003)
Ph.D.                                                                               1.00000              0.11011 (0.0032)
Academe                                                                                                  1.00000
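Why the coefficients are not OLS slopes can be made concrete with the standard Tobit expressions (McDonald & Moffitt, 1980). With z = x′β/σ:

- P(y > 0) = Φ(z)
- E[y | y > 0] = x′β + σφ(z)/Φ(z)
- E[y] = Φ(z)x′β + σφ(z)

The effect of a regressor on E[y] is therefore β_j Φ(z) rather than β_j. The sketch below plugs in the index and scale printed in note 8 for the individual without a Ph.D. The function name `tobit_quantities` is hypothetical.

```python
from scipy.stats import norm


def tobit_quantities(xb: float, sigma: float) -> dict:
    """Tobit (censored at zero) quantities for one profile with index x'b and scale sigma."""
    z = xb / sigma
    p_positive = norm.cdf(z)                            # P(y > 0)
    inv_mills = norm.pdf(z) / norm.cdf(z)               # phi(z) / Phi(z)
    mean_if_positive = xb + sigma * inv_mills           # E[y | y > 0]
    unconditional_mean = p_positive * mean_if_positive  # = Phi(z) x'b + sigma phi(z)
    return {
        "P(y > 0)": p_positive,
        "E[y | y > 0]": mean_if_positive,
        "E[y]": unconditional_mean,
        # dE[y]/dx_j = beta_j * Phi(z): each coefficient is scaled by this factor
        "marginal effect scale": p_positive,
    }


# Index and scale for the note 8 profile without a Ph.D.
xb = -1.66455 + 0.00279 * 5 + 0.54979 + 0.11835 + 0.54851
sigma = 1.15438
for key, value in tobit_quantities(xb, sigma).items():
    print(f"{key:>22}: {value:.3f}")
```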
4. ANALYSIS AND RESULTS

The results of the Tobit model estimation are shown in Table 4. A comparison of this
model against an intercept-only Tobit model indicates that the model is statistically
significant, that is, that the estimated coefficients are jointly different from zero.
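Each χ² in Table 4 appears to be a one-degree-of-freedom Wald statistic for a single coefficient, which is the form in which PROC LIFEREG reports parameter tests. The comparison with the intercept-only model, by contrast, is a joint test. Assuming the 1-df reading, the printed p values can be recovered from the χ²(1) tail, up to rounding of the printed statistics. The sketch makes the distinction visible: the joint test is significant even though several individual coefficients (for example tax, theory and membership) are not.

```python
import math

# Chi-square statistics reported in Table 4, each assumed to have 1 degree of freedom.
TABLE4_CHI2 = {
    "Years employed": 0.12,
    "Financial accounting": 18.26,
    "Managerial accounting": 13.63,
    "Auditing": 7.44,
    "Tax": 2.00,
    "Theory": 1.30,
    "Education": 0.00,
    "Other": 2.06,
    "US qualified": 0.22,
    "Membership": 0.48,
    "Ph.D.": 114.74,
    "Academic": 20.07,
}


def chi2_1df_p_value(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    If W = Z**2 with Z standard normal, P(W > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


for name, stat in TABLE4_CHI2.items():
    p = chi2_1df_p_value(stat)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:<22} chi2 = {stat:7.2f}  p = {p:.4f}  ({verdict} at 5%)")
```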
The estimated model suggests that several factors are highly significant in
determining weighted publications achieved. Consistent with H1, whether an
individual holds a Ph.D. or not is a critical determinant of the level of weighted
publications obtained. Those with Ph.D.s were substantially more likely to have
achieved weighted publications than those without. This lends support to the
current trend in Australia and New Zealand toward requiring a Ph.D. for entry to
accounting academia. It is also consistent with the U.S. experience where Ph.D.s
have been required since the 1960s, a move that was expressly designed to foster
greater research outputs within business schools.
The results also support H4, relating to an individual’s orientation toward
academe or the profession. Whether an individual’s background included an
extended period (5 years or more) of experience outside academe was significant
in determining publications. Those without such experience (and hence deemed to
have an academic/research orientation) were significantly more likely to have
achieved weighted publications than those with a professional experience
background. This raises some issues about the type of faculty that universities
should seek to hire. Although there may be some tendency in Australia and New
Zealand to attach value to individuals with professional experience, this study
calls into question the extent to which such individuals will be likely to achieve
quality research outputs, a critical determinant of a university’s reputation.
Table 4. Results of the Tobit Model Estimation.
Variable                Parameter Estimate   χ²       p Value
Intercept               −0.67                57.02    <0.0001
Years employed          0.003                0.12     0.72
Financial accounting    0.57                 18.26    <0.0001
Managerial accounting   0.57                 13.63    0.0002
Auditing                0.46                 7.44     0.006
Tax                     −0.42                2.00     0.16
Theory                  0.38                 1.30     0.25
Education               0.01                 0.00     0.96
Other                   0.17                 2.06     0.15
US qualified            0.11                 0.22     0.64
Membership              0.09                 0.48     0.49
Ph.D.                   1.29                 114.74   <0.0001
Academic                0.55                 20.07    <0.0001
Scale (σ̂)               1.15
Log likelihood          −606.08
Note: The estimated model is
$$\text{Weighted publications} = \beta_0 + \beta_1\,\text{years employed} + \beta_2\,\text{financial accounting} + \beta_3\,\text{managerial accounting} + \beta_4\,\text{auditing} + \beta_5\,\text{tax} + \beta_6\,\text{theory} + \beta_7\,\text{education} + \beta_8\,\text{other} + \beta_9\,\text{US qualified} + \beta_{10}\,\text{Ph.D.} + \beta_{11}\,\text{membership} + \beta_{12}\,\text{academe} + u$$
where u is the normally distributed error term.
H2 (years of employment at the current institution) and H3 (membership of a
professional body) were not supported. The failure of professional membership
to explain productivity may be related to the fact that a high proportion of faculty
(77.0 percent) hold such membership. Faculty may derive significant benefits from such
membership (for example, insurance benefits) such that even faculty with a low
professional interest may maintain membership. The insignificance of years of
employment is surprising but may indicate that faculty with a strong research
interest maintain that interest over time and may derive sufficient reward within
their own institutions (Gray & Helliar, 1994).
Also of note were the significant coefficients for faculty research interests in
financial accounting, managerial accounting and auditing. This may reflect a bias
in the higher-ranked journals toward these areas of interest (Hasselback et al.,
2000). Some concerns could be raised with respect to the poor performance of
faculty with stated tax research interests. Here, there is a negative, though not
significant, relationship between an expressed interest in taxation and weighted
publications. This may reflect the tendency, particularly in Australia, for tax
faculty to be concentrated in law/business law disciplines. Tax publishing has
accordingly trended toward legal-based research rather than empirical accounting
research. This is consistent with comments by Schulman et al. (1996) concerning
the low level of empirical research into the policy implications of tax integration,
a reform that has been implemented in Australia, New Zealand, Canada and the
U.K., along with a range of other countries outside the U.S. The holding of U.S.
qualifications was also non-significant. This may be the result of the low number of
individuals holding such qualifications (30 out of 716 faculty).7
Table 5. Probability Distribution for an Individual with 5 Years’ Employment,
an Interest in Financial Accounting, with a U.S. Qualification and Classed
as an Academic.
                 Weighted Publication Level
                 Zero    Zero to One    One to Two    More than Two
Without Ph.D.    0.64    0.27           0.08          0.01
With Ph.D.       0.23    0.37           0.29          0.10
As noted earlier, the Tobit model parameters cannot be interpreted in the
same manner as those derived from ordinary least squares. However, the Tobit
model can be used to estimate the probability that an individual with a given
set of characteristics will publish at a certain level. In fact, an entire probability
distribution can be developed for an individual with a given set of characteristics.
Consider, for example, an individual with the following characteristics:
5 years of employment at the current institution;
a stated interest in financial accounting;
no membership of a professional organization;
less than five years of experience outside the academic environment;
a U.S. qualification but no Ph.D.
For this individual the model yields the probability distribution shown in Table 5,
row 1.8 If, by way of contrast, an otherwise equivalent individual who also holds
a Ph.D. is considered, the probability distribution shown in Table 5, row 2 arises.
Thus, for an individual with a Ph.D. relative to one without, the model predicts a
higher probability of publishing at every positive level and a much lower
probability (0.23 rather than 0.64) of having no publications.
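The formula in note 8 extends naturally to the full distribution. The sketch below uses the coefficients printed in note 8 for the individual without a Ph.D. For the Ph.D. row it adds the rounded Table 4 coefficient (1.29), because no unrounded value is reported. The zero-publication probabilities come out close to note 8 and Table 5 (about 0.65 and 0.23). Note 8 works through only the zero case, however, so the other cells computed this way may not match Table 5 exactly. Treat the function, whose name is hypothetical, as an illustration of the method rather than a reproduction of the table.

```python
from scipy.stats import norm

# Coefficients and scale as printed in note 8 (the Table 5 profile).
INTERCEPT = -1.66455
YEARS = 0.00279
FINANCIAL = 0.54979
US_QUALIFIED = 0.11835
ACADEMIC = 0.54851
SIGMA = 1.15438
PHD_ROUNDED = 1.29  # Table 4 value; an unrounded Ph.D. coefficient is not reported


def publication_distribution(xb, sigma, cuts=(0.0, 1.0, 2.0)):
    """P(y = 0), P(0 < y <= 1), P(1 < y <= 2), P(y > 2) under a Tobit censored at 0.

    Each cumulative probability is P(y <= w) = Phi((w - xb) / sigma), as in note 8;
    because y is censored at zero, P(y <= 0) is the probability of no publications.
    """
    cdf = [norm.cdf((w - xb) / sigma) for w in cuts]
    probs = [cdf[0]]
    probs += [upper - lower for lower, upper in zip(cdf, cdf[1:])]
    probs.append(1.0 - cdf[-1])
    return probs


xb_without_phd = INTERCEPT + YEARS * 5 + FINANCIAL + US_QUALIFIED + ACADEMIC
xb_with_phd = xb_without_phd + PHD_ROUNDED
for label, xb in [("Without Ph.D.", xb_without_phd), ("With Ph.D.", xb_with_phd)]:
    print(label, [round(p, 3) for p in publication_distribution(xb, SIGMA)])
```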
5. CONCLUSIONS, LIMITATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
This study is focused on developing a model that predicts the likelihood of current
or potential faculty publishing at various levels. Using a measure of
Australian and New Zealand faculty publishing productivity over a five-year
period, the study provides evidence that two key factors significantly affect
performance: holding a Ph.D. and having an academe-oriented rather than
profession-oriented background. These findings are revealing in terms of the
types of faculty that universities should consider recruiting, to the extent that
research productivity is perceived as being important.
As with all models, the Tobit analysis represents a simplification of reality.
The publication data on which the model is based most likely contain errors of
measurement, resulting in less precise estimates. Further, the extent to which
the reported research interests reflect genuine active research interests cannot be
ascertained from the data.
The model’s applicability is limited to the Australian and New Zealand context
from which the data were obtained. Generalization to other contexts, such as
North America, Europe and Asia, may be problematic given institutional and
cultural differences. The findings are, however, broadly consistent with the
results of Gray and Helliar (1994) in the U.K. context. We encourage research
exploring similarities and differences in the factors explaining research
productivity across different cultural settings in order to facilitate a greater
understanding. Analysis of the extent to which the model is robust to different
measures of research productivity would also be worthwhile.
NOTES
1. Employment background was coded as “professional” for individuals with 5 years or
more experience in a non-academic role, and as “academic” for those with less than 5 years
experience outside academe.
2. This assessment ignores migration of U.S. citizens already holding Ph.D. qualifi-
cations to Australia and New Zealand, about which no a priori belief is held. Further, the
study uses “highest qualification from U.S.” rather than Ph.D. specifically. A subsequent
test using only U.S. Ph.D. qualification resulted in no qualitative differences.
3. The directory also included part-time doctoral teaching assistants and assistant
lecturers, neither of whom would be considered permanent faculty; both groups were
therefore excluded. Tutors were also excluded on the basis that their role is explicitly
teaching-based, and on grounds that they also tend not to be regarded as permanent faculty.
4. Although the directory is primarily accounting-specific, it does include some
non-accounting faculty. Where possible, such faculty were identified and eliminated based on
qualifications, teaching responsibilities and research interests. It is possible, however, that
in some instances non-accounting faculty may not have been identifiable as such and hence
were included. For example, finance faculty listed in the directory that held professional
accounting memberships might not have been clearly distinguishable from accounting
faculty. It is likely, however, that most departments registered only accounting faculty
in the directory and that most non-accounting faculty that were included were identified
and deleted.
5. Limited other deletions were made including the deletion of a dean. Details can be
found in Wilkinson and Durden (1998) and in Durden et al. (1999).
6. Only published articles were included. Hence, published books, reports and
monographs were excluded from the study.
7. As a further check, this was restricted to U.S. Ph.D. qualifications. The estimated
coefficient was negative but not significant and there was no qualitative change in the other
estimated coefficients.
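8. Probabilities are calculated as follows: $P(\text{publications} \le WP) = P\big(Z \le (WP - X_t\beta)/\sigma\big)$, where $WP$ is a weighted publication level and $Z$ is a standard normal variable. For example, the probability that the individual in Table 5 without a Ph.D. will have zero publications is:
$$P(\text{publications} = 0) = P\left(Z < \frac{0 - \big(-1.66455 + 0.00279 \times 5\,(\text{YEARS}) + 0.54979\,(\text{FINANCIAL}) + 0.11835\,(\text{U.S. QUALIFIED}) + 0.54851\,(\text{ACADEMIC})\big)}{1.15438}\right) = P(Z < 0.376) = 0.647.$$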
ACKNOWLEDGMENTS
The authors wish to thank Peter Westfall for his assistance with the methodological
development. We also thank the editor, Vicky Arnold, and an anonymous reviewer
for helpful comments and suggestions in revising the paper.
REFERENCES
Abdolmohammadi, M. J., Menon, K., Oliver, T. W., & Umapathy, S. (1985). The role of the doctoral
dissertation in accounting research careers. Issues in Accounting Education, 3, 59–76.
Baden-Fuller, C., Ravazzolo, F., & Schweizer, T. (2000). Making and measuring reputations: The
research rankings of European business schools. Long Range Planning, 33(5), 621–650.
Bairam, E. I. (1996). Research productivity in New Zealand university economics departments,
1988–1995. New Zealand Economics Papers, 30, 229–241.
Beresford, D. R. (2001). Guest editorial: If I could do it over again . . .. The CPA Journal, 71(7), 80.
Blaxter, L., Hughes, C., & Tight, M. (1998). Writing on academic careers. Studies in Higher Education,
23(3), 281–295.
Brinn, T., Jones, M. J., & Pendlebury, M. (1996). U.K. accountants’ perceptions of research journal
quality. Accounting and Business Research, 26(3), 265–278.
Brown, L. D., & Huefner, R. J. (1994). The familiarity with and perceived quality of accounting
journals: Views of senior accounting faculty in leading U.S. MBA programs. Contemporary
Accounting Research.
Cargile, B. R., & Bublitz, B. (1986). Factors contributing to published research by accounting faculties.
The Accounting Review, 61(1), 158–178.
Daigle, R., & Arnold, V. (2000). An analysis of the research productivity of AIS faculty. International
Journal of Accounting Information Systems, 1, 106–122.
Davidson, S. (1957). Research and publication by the accounting faculty. The Accounting Review,
32(1), 114–118.
Demski, J. S., & Zimmerman, J. L. (2000). On Research vs. Teaching: A long-term perspective.
Accounting Horizons, 14(4), 343–352.
Doyle, J. R., & Arthurs, A. J. (1995). Judging the quality of research in business schools: The U.K. as
a case study. Omega International Journal of Management Science, 23(3), 257–270.
Durden, C. H., Wilkinson, B. R., & Wilkinson, K. J. (1999). Publishing productivity of Australian
accounting ‘units’ based on current faculty composition. Pacific Accounting Review, 11(1),
1–27.
Englebrecht, T. D., Govind, S. I., & Patterson, D. M. (1994). An empirical investigation of the publi-
cation productivity of promoted accounting faculty. Accounting Horizons, 8(1), 45–68.
Gee, K. P., & Gray, R. H. (1989). Consistency and stability of U.K. academic publication output criteria
in accounting. British Accounting Review, 21(1), 43–54.
Gray, R. H., Haslam, J., & Prodham, B. K. (1987). Academic departments of accounting in the U.K.:
A note on publication output. British Accounting Review, 19(1), 53–71.
Gray, R., & Helliar, C. (1994). U.K. accounting academics and publication: An exploration of
observable variables associated with publication output. British Accounting Review, 26(3),
235–254.
Hasselback, J. R., Reinstein, A., & Schwan, E. S. (2000). Benchmarks for evaluating the research
productivity of accounting faculty. Journal of Accounting Education, 18(2), 79–97.
Hull, R. P., & Wright, G. B. (1990). Faculty perceptions of journal quality: An update. Accounting
Horizons, 4(1), 77–97.
Imhoff, E. A. (1988). Planning academic accounting careers. Issues in Accounting Education, 3(2),
286–301.
Judge, G. G., Hill, R. C., Griffiths, W. E., Lutkepohl, H., & Lee, T.-C. (1988). Introduction to the theory
and practice of econometrics (2nd ed.). New York: Wiley.
Long, R. G., Bowers, W. P., Barnett, T., & White, M. C. (1998). Research productivity of graduates
in management: Effects of academic origin and academic affiliation. Academy of Management
Journal, 41(6), 704–714.
Maranto, C. L., & Streuly, C. A. (1994). The Determinants of accounting professors’ publishing pro-
ductivity – The early career. Contemporary Accounting Research, 10(2), 387–407.
Mautz, R. K. (1988). Editorial: Fifty years of accounting. Accounting Horizons, 2(1), 126–129.
McDonald, J. R., & Moffitt, R. A. (1980). The uses of Tobit analysis. Review of Economics and
Statistics, 62(2), 318–321.
Meyer, M. J., & Titard, P. L. (2000). Those who can . . . teach. Journal of Accountancy, 190(1), 49–58.
Newell, G., Langsam, S., & Kreuze, J. (1996). Accounting faculty profiles: Demographics and percep-
tions of academia. Journal of Education for Business, 72(2), 87–94.
Nobes, C. W. (1985). International variations in perceptions of accounting journals. The Accounting
Review, 60(4), 702–705.
Otley, D. (2002). British research in accounting and finance (1996−2000): The 2001 research assess-
ment exercise. British Accounting Review, 34(4), 387–417.
Schulman, C. G., Thomas, D. W., Sellers, K. F., & Kennedy, D. B. (1996). Effects of tax integration
and capital gains tax on corporate leverage. National Tax Journal, 46(1), 31–54.
Wiley, J. (1998). Jacaranda Wiley directory of accounting: 1998–1999. Brisbane, Australia: Jacaranda
Wiley.
Wilkinson, B. R., & Durden, C. H. (1998). A study of accounting faculty publishing productivity in
New Zealand. Pacific Accounting Review, 10(2), 75–95.
Zeff, S. A. (1996). A study of academic research journals in accounting. Accounting Horizons, 10(3),
158–177.
Zivney, T. L., Bertin, W. J., & Gavin, T. A. (1995). A comprehensive examination of faculty publishing.
Issues in Accounting Education, 10(1), 1–25.
CLASSIFICATION OF CUSTOMIZED
ASSURANCE SERVICES BY DECISION
MAKERS: THE CASE OF SysTrust™
Philip R. Beaulieu
ABSTRACT
When decision makers encounter new assurance services that can be
customized for individual clients, they must include them in their pre-existing
categorization of assurance, a cognitive task known as postclassification.
This paper draws upon three literatures (classification research in account-
ing, theory of assurance, and cognitive psychology) in order to suggest how
this task might be modeled and studied empirically, using the example of
SysTrust™. The paper also explains the role of the category use effect (Ross, 2000),
a necessary condition for successful postclassification in which decision makers are
reminded of pre-existing categories when they learn to use new categories.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06009-5
1. INTRODUCTION
New forms of assurance1 provided by public accountants have proliferated in the
last decade due to both supply and demand factors. On the supply side, public
accounting firms have sought to generate revenue in growth areas of assurance and
related consulting activities because growth opportunities in the mature market
for traditional financial statement assurance are limited. Demand for innovation in
assurance stems partly from technological innovation, which has led to concerns
about the reliability of electronic processes (AICPA, 2000) and electronic
reporting of information (Lymer et al., 1999). The result of these supply and
demand pressures is customized assurance geared towards non-traditional (at least
for public accounting firms) market segments. Customized assurance services,
for example services applied to the reliability of systems, share some features
with traditional financial statement assurance but also differ from it in significant
ways. A major challenge facing public accounting firms is to communicate these
similarities and differences between traditional and customized assurance to their
clients and other decision makers who rely upon their services (AICPA, 2000).
Decision makers face the cognitive task of revising their previous classification
of assurance services by incorporating new categories for customized services.
Previously, they may have placed assurance into one or two simple categories
representing all financial statement attestation, or attestation broken down into
audit and review level assurance. They now need new subcategories of assurance,
classified either as attestation or some other form of assurance, in which to
classify customized assurance. If decision makers cannot consistently recognize
distinguishing features of all forms of assurance, especially the level of assurance,
the risk is that some features will be inaccurately attributed to customized
assurance categories, and that public accountants will incur a negative public
reaction. For instance, decision makers may assume that assurance regarding
reliability of systems is at the same level as an audit of financial statements,
whereas practitioners impose qualifications on the assurance related to the
criteria by which systems are defined. This would not be an expectations gap
about one type of attestation (audit), as has been discussed before (e.g. Houston
& Taylor, 1999), but a set of multiple expectations gaps regarding multiple
assurance categories.
The purposes of this paper are to facilitate understanding of customized
assurance classification and suggest avenues for future empirical research
in customized assurance services by combining three literatures: behavioral
classification research in accounting and auditing (discussed in Section 2), theory
of assurance classification (Sections 3 and 4), and recent behavioral research in
cognitive psychology (Section 5). Specific research opportunities are identified
in Section 6. The first key source is Cohen (2000), who proposed two evidential
requirements for hierarchical classification systems that are used to organize the
paper: non-empirical logical evidence and empirical behavioral evidence. The primary
sources on assurance classification are various publications of the American
Institute of Certified Public Accountants (AICPA) and Kinney (2000), and the key
behavioral research is Ross (1996, 1997, 1999, 2000). Ross studied postclassification,
in which people revise pre-existing
classification systems to include new categories; this is relevant to the expansion
of assurance services beyond traditional audit, review, and compilation of
financial statements.
SysTrust™ Version 2.0, initially released as an exposure draft in 2000 and
effective for examination periods beginning August 31, 2002, is used to illustrate
classification of customized assurance services. Jointly issued by the AICPA
and the Canadian Institute of Chartered Accountants (CICA), SysTrust™ is
intended to provide assurance to a firm’s management, investors, and partners
regarding the reliability of its systems. SysTrust™ is described in Section 4 as
a customized service because it encompasses attestation and non-attestation
engagements, as well as many types of systems. This range of services within the
SysTrust™ brand name complicates the task of classification of the services by
decision makers, making the case of SysTrust™ ideally suited to the purposes of
this paper.
2. COGNITIVE MODELS OF CLASSIFICATION IN ACCOUNTING
Classification systems can in principle be based on the world, culture, language,
or the mind (Dahlgren, 1995). Although these perspectives are related, the mental
basis is the primary interest of this paper because relevant literature in both
accounting (especially Bonner et al., 1997) and cognitive psychology (Ross, 1996,
1997, 1999, 2000) is available. This section presents a brief review of research in
accounting with a cognitive approach to classification, and an explanation of two
evidential requirements for classification systems proposed by Cohen (2000) that
will be used to structure the remainder of the paper. A discussion of literature in
cognitive psychology is reserved for Section 5.
In accounting, cognitive models of classification are commonly referred to
as knowledge structures or schemas, and have been studied in auditing and in
decision-making uses of financial statements.2 The heaviest use of knowledge
structures as classification systems has been in auditing, where a variety of
dependent variables (tasks) have been studied. Frederick et al. (1994) and
Nelson et al. (1995) used card-sorting tasks to obtain evidence of transaction
cycle-dominant and audit objective-dominant knowledge structures; Nelson
et al. (1995) also included conditional probability estimation tasks. Bonner
et al. (1997) also examined knowledge structures based on transaction cycles and
audit objectives, but selected error frequency and audit planning tasks. Choo and
Trotman (1991) used recall and predictions of the probability of company failure
by auditors to test whether their knowledge structures encode the typicality of
a going-concern firm. Recall was also the dependent variable used by Moeckel
(1990) and Libby and Trotman (1993); see Libby (1995) for a review of research
in auditing involving knowledge structures and memory.
Memory models and recall have been featured in research concerning external
users of financial statements, although less has been done in this area than in
auditing. Beaulieu (1996) posited that commercial loan officers use a classification
system based on the Five Cs of Credit (character, capacity, capital, conditions
and collateral), a classification system used to teach loan officers to process infor-
mation and make loan decisions. Greater recall of decision-consistent character
and accounting (capacity and capital) information than decision-inconsistent
information provided evidence that the classification system resided in long-term
memory and biased recall in favor of decision-consistent information. Another
example of research involving users of financial statements is Kida et al. (1998),
who proposed that managers making stock investment and financial difficulty
decisions encode (classify) accounting information according to affect, a positive
or negative response to numerical data. Recall and decision results supported the
existence of an affect-based classification system in long-term memory.
A fair question to ask is whether auditing and accounting classification systems really
exist in the minds of auditors and financial statement users, psychologically and
neurologically, or whether they exist only as conventions that are convenient for
research purposes. Cohen (2000, p. 2) proposed two types of evidential require-
ments – logical and behavioral – for hierarchical classification systems, in which
“lower level items inherit the properties of higher level items.” Logical evidence
requires a convincing argument that a hierarchy is more efficient than alternative
methods of organizing and accessing knowledge. The argument for an efficient
system asserts that it enables economical storage and access to information,
and that “representation of factual knowledge at different levels of generality
facilitates the identification of useful analogies” (p. 5). Behavioral evidence
consists of experiments in which different hierarchical levels are presented,
causing effects in response times, error rates, and quality of responses.3 Bonner
et al. (1997) illustrate how these two criteria can be used to evaluate potential
classification systems.
Bonner et al. (1997) studied how accounting students learn to estimate the
frequency of financial statement errors. Subjects in their experiment were taught
either: (1) the relationship between financial statement errors and three categories
of transaction cycles: sales and receipts, inventory/purchases, and investments;
or (2) the relationship between errors and three categories of audit objectives:
proper cutoff, validity, and valuation. Subjects then observed a sequence of errors
and finally were asked for frequency estimates. The first hypothesis of Bonner
et al. (p. 391) was:
Subjects receiving transaction cycle (audit objective) category instruction prior to experiencing
frequencies will make frequency estimates which more closely reflect experienced error fre-
quencies for transaction cycle (audit objective) categories than for audit objective (transaction
cycle) categories.
The logical evidence required by Cohen (2000) to justify this hypothesis must come from
the domain of accounting and auditing; classification systems may cause similar
effects in other contexts, but a hypothesis worded as specifically as this ought to
be supported by a reasonable accounting story. Bonner et al. (1997) argued that
accounting students lack consistent education regarding categories of financial
statement errors and that in their early experience they learn slowly about errors
because they seldom encounter them. Thus, the first piece of logical evidence
supporting the hypothesis is that accounting students lack any classification
system for financial statement errors. The second piece of logical evidence drawn
from the accounting domain is that financial statement errors can be arrayed
in a matrix (three-by-three in the experiment, Table 2 in Bonner et al.) with
transaction cycles as columns and audit objectives as rows. The two classification
systems are alternatives, not sub- and super-categories of the same hierarchical
system, and knowledge of one system does not help a person understand the
logic of the other.
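To make the matrix argument concrete, the sketch below arrays a set of errors with audit objectives as rows and transaction cycles as columns, then summarizes the same errors through each classification system. The category labels follow the text; the counts are hypothetical, invented for illustration, and are not Bonner et al.'s data.

```python
# Category labels from the text; the counts are hypothetical.
OBJECTIVES = ["proper cutoff", "validity", "valuation"]               # rows
CYCLES = ["sales and receipts", "inventory/purchases", "investments"]  # columns

errors = [
    [5, 1, 0],  # proper cutoff
    [2, 6, 1],  # validity
    [0, 2, 7],  # valuation
]


def totals_by_objective(matrix):
    """Row sums: the errors as seen through the audit-objective system."""
    return {obj: sum(row) for obj, row in zip(OBJECTIVES, matrix)}


def totals_by_cycle(matrix):
    """Column sums: the same errors seen through the transaction-cycle system."""
    return {cyc: sum(row[j] for row in matrix) for j, cyc in enumerate(CYCLES)}


by_objective = totals_by_objective(errors)
by_cycle = totals_by_cycle(errors)
print(by_objective)
print(by_cycle)
```

Both summaries partition the same errors, so their grand totals agree, yet the column totals cannot be derived from the row totals: a different matrix with identical row totals could have quite different column totals. This is one way to see why knowing one system does not reveal the logic of the other.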
Cohen (2000) also requires behavioral evidence to support the view that
a classification system is a psychological phenomenon. Bonner et al. (1997)
obtained results supporting the hypothesis; for example, when subjects received
28 actual errors, they estimated 19.18 errors when asked to estimate according to
the same categories they were taught (either transaction cycles or audit objectives),
and 15.54 errors when asked to estimate according to different categories. Bonner
et al. ruled out alternative explanations of their results not based on the transaction
and audit objective classification systems, for example by testing differences in
linear trends of frequency estimates rather than mean differences. The results
of Bonner et al. are consistent with research in other contexts, but Bonner et al.
claim that they are interesting because classification in accounting and auditing
involves relatively ill-defined categories, compared to natural categories used
elsewhere in cognitive psychology. The combination of logical evidence from the
accounting domain and statistically significant behavioral evidence does more
than create interesting results – it inspires belief that two alternative classification
systems have psychological, and possibly neurological, reality.
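The size of the effect is easier to see as proportions of the 28 errors actually presented. The short computation below uses only the two estimates reported in the text; expressing them as shares of the actual count is an added illustration, not an analysis performed by Bonner et al.

```python
ACTUAL_ERRORS = 28  # errors presented to subjects in the example from the text

# Frequency estimates reported in the text for the two estimation conditions.
estimates = {
    "same categories as taught": 19.18,
    "different categories": 15.54,
}

for condition, estimate in estimates.items():
    share = estimate / ACTUAL_ERRORS
    print(f"{condition}: {estimate:.2f} of {ACTUAL_ERRORS} ({share:.1%})")

gap = estimates["same categories as taught"] - estimates["different categories"]
print(f"difference: {gap:.2f} errors ({gap / ACTUAL_ERRORS:.1%} of the actual count)")
```

Both estimates fall short of the actual count, but estimating by the categories that were taught recovers about 68.5% of the errors, against about 55.5% when estimating by the other categories.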
The following two sections address the logical evidence requirement of Cohen
(2000) in the context of customized assurance services, citing assurance literature
to build cases for two alternative classification systems. Section 5 on the category
use effect will cite cognitive psychology literature in order to suggest behavioral
tests of these systems.
3. CLASSIFICATION OF ASSURANCE SERVICES

The term “assurance services” came into use in the 1990s and was formally defined
by the AICPA Special Committee on Assurance Services (the “Elliott Committee”)
in 1996 as “independent professional services that improve the quality of informa-
tion, or its context, for decision makers” (AICPA, 1996). The term was intended
to include auditing as a subcategory, as indicated in the following quote, which
refers to the Special Committee’s conceptual framework for assurance services.
The framework’s primary objective is to provide a consistent view of assurance services. It
provides guidelines that will enhance consistency and quality in the performance of services.
It can also help establish a common public perception of the CPA’s function and value.
Assurance services evolve naturally from attestation services, which in turn evolved from
audits. The roots of all three are in independent verification. However, the form and content of
the services differ. The earlier services are highly structured services considered to be relevant
to the greatest number of users. The newer ones are more customized and targeted services
intended to be highly useful in more limited circumstances (AICPA, 2000, p. 1).
The term assurance services was not part of the auditing lexicon prior to the
1990s. For example, the classic book on the philosophy of auditing by Mautz and
Sharaf (1961) does not mention levels or categories of audit services in any of its
eight postulates of auditing or five primary concepts of auditing (evidence, due
audit care, fair presentation, independence and ethical conduct), let alone mention
assurance services. The term appeared in auditing textbooks after the AICPA
definition in 1996 – a year later in the case of Arens and Loebbecke (1997).
Around that time, audit partners began calling themselves assurance partners.
The meaning of new concepts is adjusted by usage until a generally accepted
meaning is established. The most relevant examples for the purposes of this paper
are the concepts of review and compilation services defined in 1978 by the AICPA.
Statement on Standards for Accounting and Review Services (SSARS) No. 1 stated
that in a review engagement, the CPA’s report would indicate “limited assurance,”
or negative assurance, that nothing came to the attention of the CPA indicating a
material misstatement (Kinney, 2000). A compilation was defined as providing no
opinion and no assurance regarding departures from GAAP, although the CPA is
still associated with the financial statements and has some responsibility (Kinney,
2000). Research regarding the financial statement users most affected by this
classification system, commercial lenders, has provided mixed evidence on their
understanding and use of reviews and compilations. Bandyopadhyay and Francis
(1995) found that loan officers’ interest rate recommendations and loan decisions
were affected by the level of attestation (including audit, review, and compilation).
Martin et al. (1988) reported that lenders do not generally differentiate between
audits and reviews, but their acceptance of compilations depends on a number of
factors, including the level of owners’ equity and term of the loan. Johnson et al.
(1983) found that level of attestation (audit, review, compilation, and no attesta-
tion) did not affect loan decisions; Wright and Davidson (2000) similarly found no
effect on loan risk assessments.
In the United States, a gap between users’ and practitioners’ expectations of
audits led to the adoption of many Statements on Auditing Standards (SAS),
including SAS Nos 52–60, as well as SAS No. 82 on consideration of fraud in
a financial statement audit. Thus, in addition to the research conducted between
1983 and 2000 on financial statement users’ perceptions of audit, review, and
compilation services, other papers addressed the expectations gap related solely to
audit-level attestation. Some of this research suggests that expectations gap stan-
dards might effectively narrow the gap (e.g. Bamber & Stratton, 1997; Campbell &
Mutchler, 1988; Jennings et al., 1993; Kinney & Nelson, 1996). However, a paper
by Houston and Taylor (1999) on WebTrust indicated that users of that assurance
service incorrectly inferred that additional assurance regarding product quality
was provided.
Although the research cited in the preceding paragraphs offers the hope that
users can be educated in order to calibrate their expectations of assurance services
consistently with practitioners, it also discourages the assumption that decision
makers have any particular classification system in mind. To be conservative, this
paper will assume nothing about the classification hierarchies that decision makers
might have adopted since 1996 to accommodate customized assurance. Instead,
two theoretical classification systems, those of the AICPA (2000) and Kinney (2000), will
be examined for their potential in assisting decision makers to classify customized
assurance efficiently.
In addition to defining assurance services in terms of improvements to the
quality and context of information, the Special Committee (AICPA, 2000)
related them to attestation and consulting services in a framework of categories.
Attestation is a subcategory of assurance with detailed standards, whereas there is
some overlap between the categories of assurance and consulting activities. The
primary distinction between assurance and consulting is the goal of the service;
assurance improves decision-makers’ output indirectly, through provision of better
information, whereas consulting aims to aid decision makers directly through
research and findings. The AICPA’s positioning of the assurance, attestation, and
management consulting categories is shown in Fig. 1. Essential features of these
categories are described in Table 1; the hierarchical relationship between attesta-
tion and assurance is evident in the table. For example, the objective of assurance
is better decision making, which subsumes the narrower objective of attestation,
reliable information. The level of assurance is defined as examination, review, or
agreed-upon procedures in the attestation category, but the assurance category is
flexible with regard to levels, which may range from explicit assurance about the
usefulness of information for specific purposes to implicit assurance resulting from
CPA involvement.
Fig. 1. Universe of CPA Services (Reproduced from AICPA, 2000, p. 8).
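Cohen's (2000) requirement that lower level items inherit the properties of higher level items can be sketched as a simple tree in which each node stores only the properties it defines and looks up everything else from its ancestors. The sketch below is a hypothetical illustration: the `Category` class and the property keys are invented, and the property values paraphrase Table 1 and the AICPA quotation above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """A node in a classification hierarchy (hypothetical illustration)."""
    name: str
    parent: Optional["Category"] = None
    properties: dict = field(default_factory=dict)

    def lookup(self, key: str) -> str:
        """Return the nearest definition of `key`, walking up toward the root."""
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        raise KeyError(key)


# Values paraphrase Table 1 and the AICPA quotation; keys are invented.
assurance = Category("assurance", properties={
    "objective": "better decision making",
    "independence": "included in definition",
    "roots": "independent verification",
})
attestation = Category("attestation", parent=assurance, properties={
    "objective": "reliable information",      # narrower objective overrides parent
    "independence": "required by standards",
})
review = Category("review", parent=attestation)  # defines nothing of its own

print(review.lookup("objective"))  # inherited from attestation
print(review.lookup("roots"))      # inherited from assurance
```

The review node stores nothing itself: it inherits attestation's narrower objective, which overrides the broader assurance objective, and reaches past attestation to inherit the root shared by all three levels.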
The test of logical evidence advocated by Cohen (2000) requires that the
hierarchical system in Fig. 1 be more efficient than alternative classification
systems in terms of information storage and access, and identification of useful
analogies. The system is economical in that there are just seven categories at
three levels; a hierarchy that could accommodate the complexity and variety of
assurance services in fewer categories is hard to conceive. The attestation category
is parsimonious because when decision makers encounter a service that they
expect is attestation, they only have to consider coding it as audit examination,
review, or agreed-upon procedures. The system might help decision makers think
of useful analogies when they encounter a new assurance service by identifying
its relationship to familiar services, particularly the audit and review levels of
attestation. Unfortunately, a drawback is that the boundaries of management
consulting overlap those of assurance and attestation (bisecting the agreed-upon
procedures category). Therefore, according to Fig. 1, a management consulting
service may be categorized as exclusively consulting, a non-attestation assurance
service, an agreed-upon procedure, or some other type of attestation. Table 1 is
somewhat inconsistent with Fig. 1 because it states that consulting engagements
offer no explicit assurance, and thus would not overlap attestation as in Fig. 1.
Table 1. Types of Services (Reproduced from AICPA, 2000, p. 7).
Result
  Attestation: Written conclusion about the reliability of the written assertions of another party.
  Assurance: Better information for decision-makers. Recommendations might be a byproduct.
  Consulting: Recommendations based on the objectives of the engagement.
Objective
  Attestation: Reliable information.
  Assurance: Better decision making.
  Consulting: Better outcomes.
Parties to the engagement
  Attestation: Not specified, but generally three (the third party is usually external); CPA generally paid by the preparer.
  Assurance: Generally three (although the other two might be employed by the same entity); CPA paid by the preparer or user.
  Consulting: Generally two; CPA paid by the user.
Independence
  Attestation: Required by standards.
  Assurance: Included in definition.
  Consulting: Not required.
Substance of CPA output
  Attestation: Conformity with established or stated criteria.
  Assurance: Assurance about reliability or relevance of information. Criteria might be established, stated, or unstated.
  Consulting: Recommendations; not measured against formal criteria.
Form of CPA output
  Attestation: Written.
  Assurance: Some form of communication.
  Consulting: Written or oral.
Critical information developed by
  Attestation: Asserter.
  Assurance: Either CPA or asserter.
  Consulting: CPA.
Information content determined by
  Attestation: Preparer (client).
  Assurance: Preparer, CPA, or user.
  Consulting: CPA.
Level of assurance
  Attestation: Examination, review, or agreed-upon procedures.
  Assurance: Flexible, for example, it might be compilation level, explicit assurance about usefulness of the information for intended purposes, or implicit from CPA involvement.
  Consulting: No explicit assurance.
Fig. 2. Information Quality Assurance Services (Adapted from Kinney, 2000, p. 12). Source: This figure is reproduced from Information Quality Assurance and Internal Control for Management Decision Making (2000, Irwin/McGraw-Hill) by W. Kinney and is reproduced with permission of The McGraw-Hill Companies.
Cohen (2000) suggests the logical test of comparing a hierarchical classification
system to alternative systems. The AICPA’s system provides a starting point
because it has the force of the CPA brand for customized assurance services
behind it, but other systems are possible. An alternative provided by Kinney
(2000, p. 12) is adapted as Fig. 2. This system contains nine categories at three
levels, similar in size and depth to the AICPA’s system. Otherwise, Kinney’s
system is, at least superficially, more complex. Two concepts are introduced
subordinate to assurance services: relevance improvement, which provides information that helps
“the decision maker form a better mental image of real-world conditions” (Kinney,
2000, p. 11); and reliability improvement and compliance, which increase the
decision maker’s confidence in the application and display of measurement results.
These two subcategories are the keys to Kinney’s system; cognitive effort expended
in comprehending them might allow a decision maker to classify new assurance
services quickly into the third-level categories. This system shares a difficulty with
the AICPA’s system in that two categories, internal control design/operation and
information origination services, intersect two higher-order categories. Kinney’s
system is different in that the attestation levels of review and agreed-upon
procedures, and the non-attestation service of compilation, are not specified.
Arguing that either classification system is superior on the basis of a logical
argument is difficult. The Kinney system places the decision maker using it under
a heavier conceptual burden up front because its two key categories have no single
real-world referent, such as an audit or review report. There is a potential difficulty
at the third level of the hierarchy because one category is “audits” of internal
control; what is an “audit” in quotation marks supposed to be? Unlike the AICPA’s
system, Kinney’s specifies only one category of attestation or assurance unequivocally:
the audit. Decision makers searching either classification for an analogy to a freshly
encountered assurance service are assisted differently in each system. Figure 1
(AICPA) has concrete examples available; and if a customized service does not
match any of them on essential dimensions, it might be placed in an open space
under assurance services, essentially constituting its own category. Using Fig. 2
(Kinney), decision makers would presumably make an initial classification with
respect to relevance or reliability improvement; and if the former (latter) is chosen,
decision makers would make a second classification choice regarding measurement
design or context improvement (audit or non-audit service). This approach enables
efficient storage and retrieval of information, but decision makers with different
cognitive styles might prefer one classification system or the other: the concrete
(AICPA) or the conceptual (Kinney).
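The two-step path a decision maker would follow through Kinney's system, as described in this paragraph, can be written as a small decision function. This is a sketch under assumptions: the function and constant names are hypothetical, and the second-level labels are paraphrased from the parenthetical wording in the text rather than copied from Fig. 2.

```python
from enum import Enum


class Improvement(Enum):
    RELEVANCE = "relevance improvement"
    RELIABILITY = "reliability improvement and compliance"


# Second-level choices available under each first-level choice
# (labels paraphrased from the text, not copied from Fig. 2).
SECOND_LEVEL = {
    Improvement.RELEVANCE: ("measurement design", "context improvement"),
    Improvement.RELIABILITY: ("audit", "non-audit service"),
}


def classify_kinney(first: Improvement, second: str) -> tuple:
    """Return the path from the root to the chosen category."""
    if second not in SECOND_LEVEL[first]:
        raise ValueError(f"{second!r} is not under {first.value!r}")
    return ("assurance services", first.value, second)


print(classify_kinney(Improvement.RELIABILITY, "audit"))
```

Rejecting a second choice that does not belong under the first reflects the economy of the hierarchy: once the first classification is made, only two options remain to be considered.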
In reality, decision makers likely have many different classification systems for
assurance services, in the extreme one unique system per person. Audit, review, and
compilation services are the types of assurance with which most decision makers
would be familiar, and many of them might classify these services in a fashion
similar to that intended by the AICPA (Fig. 1). Kinney’s (2000, Fig. 2) system
appeared in a book that explains assurance services to decision makers, but is
probably less well known. Regardless of the exact numbers of decision makers who
might have encountered either system, they can be seen as ideal, comprehensive
hierarchies capable of accommodating almost any customized assurance service.
With the admission that these systems are ideals that decision makers may adapt
to their individual circumstances, we proceed in the next section to show how they
could be revised to include SysTrust™, an example of customized assurance.
4. CLASSIFICATION OF CUSTOMIZED ASSURANCE: THE EXAMPLE OF SysTrust™
SysTrust™, as described in Exposure Draft Version 2.0, is intended to “increase
the confidence of management, customers, and business partners in systems that
support a business or particular activity” (AICPA/CICA, 2000, p. 4). Elsewhere,
Version 2.0 defines the set of SysTrust™ users more broadly:
Potential users of this service are shareholders, creditors, bankers, business partners, third-party
users who outsource functions to other entities, stakeholders, and anyone who in some way relies
on the continued availability, security, integrity, and maintainability of a system (AICPA/CICA,
2000, p. 4).
The four principles used to judge whether a system is reliable, mentioned in the
above quotation, are defined as follows (AICPA/CICA, 2000, pp. 11–13).
Availability: The system is available for operation and use at times set forth in service-level
statements or agreements.
Security: The system is protected against unauthorized physical and logical access. This prin-
ciple also addresses privacy concerns related to use of confidential information.
Integrity: System processing is complete, accurate, timely, and authorized.
Maintainability: The system can be updated when required in a manner that continues to provide
for system availability, security, and integrity.
In a SysTrust™ engagement, a practitioner collects evidence about the effectiveness
of controls over the principles for a defined period.4 Version 2.0 lists
over 200 illustrative controls, covering all four principles, whose effectiveness
practitioners may test. The result is a report on whether management maintained
effective controls over the SysTrust™ principles addressed by the engagement, or
on management’s assertion about the effectiveness of controls. Any system may
be addressed by a SysTrust™ engagement, not just Internet-related systems as in
the case of WebTrust™. For example, a corporation’s financial services system
may be defined for the purposes of a SysTrust™ engagement as its data center,
including infrastructure such as a CPU and peripherals, software, data, employees,
and procedures.
SysTrust™ engagements are generally considered attestation because they are
performed under Statement on Standards for Attestation Engagements (SSAE)
No. 1, found in Section 100 of the AICPA’s Professional Standards. However,
other customized engagements are permitted under SysTrust™ Version 2.0, as
described below.
This document so far has described how the SysTrust™ Principles and Criteria may be used
in examination/audit level attestation engagements for systems in production. The SysTrust™
Principles and Criteria may also be used in other types of engagements that meet client needs,
as long as the applicable professional standards and the SysTrust™ licensing agreement are
observed. Following are examples of other types of SysTrust™ engagements a practitioner
might perform (AICPA/CICA, 2000, p. 19).
The examples that follow this quote are reporting on selected SysTrust™ prin-
ciples, engagements for systems in the preimplementation phase, agreed-upon
procedures engagements, and consulting engagements (review level assurance is
not allowed). Thus, Exposure Draft Version 2.0 enables practitioners to customize
SysTrust™ assurance in several ways to meet specific client needs, but these
adjustments require a great deal of diligence on the part of decision makers to
understand. First, management defines the boundaries of the system in question
in a System Description attached to the management assertion regarding the
effectiveness of its controls, which in turn is attached to the assurance report.
Management can choose to define a system in any way it sees fit; the system
might be narrowly defined, as in the case of a data center, or broadly defined, as
in the case of an outsourced finance and accounting function or ERP system.
A second significant aspect of customization is that Version 2.0 allows reporting
on any one of the four SysTrust™ principles. Thus, an engagement could address
only the integrity principle, and provide no assurance regarding availability,
security, or maintainability. The accountant’s report would list all four principles
and state that integrity is the sole principle covered, but it would be left to
decision makers to search for a definition of the integrity principle. As defined
by SysTrust™, integrity consists of complete, accurate, timely, and authorized
processing, but the auditor’s report refers the user to the AICPA (or CICA) Web
site for the definition; it does not appear in the report itself.
Customization under SysTrust™ (Version 2.0) extends even further than the
definition of system boundaries and reporting on selected principles. There can
also be engagements for systems in the pre-implementation phase, i.e. systems that
have not yet been placed in operation. Here, the practitioner tests the suitability
of the design of controls at a point in time, rather than the operating effectiveness
of controls for a period of time, as is the case for other SysTrust™ reports. For
pre-implementation phase engagements, the system description attached to the
practitioner’s report would require additional detail, such as the version of the
system and “other appropriate identifiers” (AICPA/CICA, 2000, p. 20).
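The customization options described in the preceding paragraphs can be summarized as a small data model. The sketch below is hypothetical: the class, field, and method names are invented, and it assumes that any non-empty subset of the four principles may be selected (the text confirms reporting on selected principles, including a single one). It encodes three points from Version 2.0 as described here: management defines the system boundaries, review-level engagements are not offered, and pre-implementation engagements test design suitability at a point in time rather than operating effectiveness over a period.

```python
from dataclasses import dataclass
from enum import Enum

# The four SysTrust principles named in Version 2.0.
PRINCIPLES = frozenset({"availability", "security", "integrity", "maintainability"})


class EngagementType(Enum):
    EXAMINATION = "examination/audit"
    AGREED_UPON_PROCEDURES = "agreed-upon procedures"
    CONSULTING = "consulting"
    # No REVIEW member: review level assurance is not allowed.


@dataclass(frozen=True)
class SysTrustEngagement:
    """Hypothetical model of one customized SysTrust engagement."""
    system_description: str        # boundaries chosen by management
    principles: frozenset          # assumed: any non-empty subset of PRINCIPLES
    engagement_type: EngagementType
    pre_implementation: bool = False

    def __post_init__(self):
        if not self.principles or not self.principles <= PRINCIPLES:
            raise ValueError("principles must be a non-empty subset of the four principles")

    def subject_of_testing(self) -> str:
        if self.pre_implementation:
            return "suitability of control design at a point in time"
        return "operating effectiveness of controls for a period of time"


integrity_only = SysTrustEngagement(
    system_description="data center: CPU and peripherals, software, data, employees, procedures",
    principles=frozenset({"integrity"}),
    engagement_type=EngagementType.EXAMINATION,
)
print(integrity_only.subject_of_testing())
```

The `subject_of_testing` method captures only the design-versus-operation distinction drawn in the text; it says nothing about what a consulting engagement would deliver.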
There are few limits to customization of assurance under the proposed
SysTrust™, making it relatively difficult to perceive as a single product. However,
it has been trademarked and servicemarked in the United States and Canada, and
the brand appears in independent accountants’ or auditors’ reports, as in the phrase
“SysTrust™ Principles and Criteria.” SysTrust™ users have some alternatives
when they consider how to integrate it into their existing conceptual frameworks,
a process called postclassification in the cognitive psychology literature (Ross,
1999). These alternatives range from creating a single category for SysTrust™, with features
of all customized options attached, to creating many SysTrust™ categories under
pre-existing assurance categories with features matching customization. These
choices are considered below in three possible postclassifications, two using the
AICPA’s classification system and one based on Kinney’s (2000) hierarchy.
Fig. 3. Postclassification of Assurance Services – AICPA Option 1.
Figure 3 revises Fig. 1 (the AICPA system) by including a category for
SysTrust™ that spans the attestation and management consulting categories, and
the attestation subcategories of audit and agreed-upon procedures (excluding
review), as defined by Exposure Draft Version 2.0. It might be a challenge for
decision makers to add the SysTrust™ category because it intersects different
levels and types of assurance, but at least there are concrete reference points
(audit, agreed-upon procedures, and consulting) in the classification system. By
creating a single category for SysTrust™, this postclassification is also likely to
foster the brand-name awareness among decision makers that the AICPA desires.
A difficulty with this postclassification is that the subject matter and customization
features of SysTrust™, such as reporting on selected system reliability principles,
are not primary identifiers of the category.
Fig. 4. Postclassification of Assurance Services – AICPA Option 2.
An alternative postclassification to that shown in Fig. 3 would be to create three
separate SysTrust™ subcategories for audit examination, agreed-upon procedures,
and management consulting, as shown in Fig. 4. This option allows decision
makers to compare the subject matter of SysTrust™ with other forms of assurance,
matched according to level of assurance. For example, within the category of
audit examination, the categories of financial statements and SysTrust™ explicitly
recognize that assertions regarding financial information and systems are involved.
However, breaking SysTrust™ down into three subcategories of other concepts
might sacrifice brand recognition among decision makers and, because SysTrust™
would be distributed among several subcategories, increase the cognitive effort
required to classify each new SysTrust™ engagement. Using the single-category approach
(Fig. 3), more effort is likely expended initially in identifying the breadth of the
category, but less effort might be needed to store and access new information
once postclassification is complete.
Postclassification of SysTrust™ according to the Kinney (2000) system is
pictured in Fig. 5, which is restricted to the reliability improvement category,
the relevant portion of Fig. 2. SysTrust™ would be excluded from the category
of audits of financial statements and would constitute a subcategory of “audits”
of internal control quality, business processes, etc. The meaning of “audits” in
quotation marks would necessarily expand to include both true audits and quasi-audits.
There is less emphasis on levels of assurance at the top of the Kinney hierarchy
than in the AICPA’s classification system, so decision makers would be required
to recognize them at a lower point in the hierarchy, perhaps constructing
subcategories of SysTrust™ for audit, agreed-upon procedures and consulting (not
shown in Fig. 5). Kinney’s system is similar to the single-category approach based
on the AICPA’s system in that there is a relatively high initial postclassification
cost in creating a comprehensive category having many features. The cost may
be even greater under Kinney’s system because analogs of financial statement
assurance levels are further removed from SysTrust™. The advantage of Kinney’s
classification system is that decision makers could quickly classify SysTrust™ as
an assurance service that improves reliability of business processes (systems).
Fig. 5. Postclassification of Assurance Services – Based on Kinney (2000).
5. BEHAVIORAL EVIDENCE AND THE CATEGORY USE EFFECT
The preceding section presented three alternative postclassifications (Figs 3–5)
that decision makers could use to incorporate SysTrust™ in initial classification
systems (Figs 1 and 2) for assurance services. An argument can be made in favor
of each system’s ability to help decision makers store and access information
regarding key assurance concepts, but Cohen (2000) requires behavioral
evidence to support the assertion that any of these postclassifications
has psychological reality. Bonner et al. (1997) pointed out that research attention
in cognitive psychology is directed towards natural categories, making it difficult
to find combinations of theory and method appropriate for acquiring behavioral
evidence in the less concrete domain of assurance. For instance, Rosch et al.
(1976) use musical instruments, fruit, clothing, furniture, trees, fish, and birds
as taxonomy stimuli. Johnson and Mervis (1997) studied the effect of expertise
on categorization of songbirds. In addition to employing natural categories, the
focus of most research in cognitive psychology is how people initially form
mental categories to aid them in problem-solving or classification tasks (e.g. Malt
et al., 1995; Osherson et al., 1990), rather than the effect of using a given classification
system on subsequent revision of it, such as subclassification.
Ross (1996, 1997, 1999, 2000) is an exception to these trends because his
research concerns abstract classification systems and revisions to pre-existing
systems, which he refers to as postclassification. He claims that these learning
situations are common to many practical category uses (Ross, 2000); this would
include the situation faced by decision makers who must attempt to classify new
customized assurance services. Ross performed several experiments illustrating
the category use effect, but the one most relevant to customized assurance is
the second experiment in Ross (2000). It must be described in detail in order to
explain the category use effect. In the initial classification learning phase of this
experiment, summarized in Table 2, non-medical student subjects were instructed
to learn a classification system consisting of two fictitious diseases. They learned
the system by studying a series of patient cards, each “patient” consisting of a list
of three symptoms. There were twelve symptoms in all (e.g. cough, skin rash, sore
muscles), but only eight symptoms predicted the two diseases, four symptoms for
each disease. The symptoms were perfectly predictive; whenever they appeared,
the disease was present. For each patient card, subjects diagnosed one of the two
diseases, then received feedback. Diagnoses continued until a criterion level of
learning was achieved.
Table 2. Summary of Ross (2000), Experiment 2.
Columns (learning of use condition): Classification Required During Learning of Use; Classification Not Required During Learning of Use.
Classification learning
  Classification required: Classify “patients” of 3 symptoms into 1 of 2 disease categories.
  Classification not required: Same as other condition.
Learning of use
  Classification required:
    Classify patients into disease categories.
    Classify patients into 1 of 4 subcategories (treatments), based upon symptoms.
    A sheet with the two diseases and the two treatments relevant for each is visible.
    Relevant-use symptoms determine treatment. Irrelevant-use symptoms are irrelevant to treatment.
  Classification not required:
    Subjects not required to classify patients into disease categories.
    Classification into treatments: same as other condition.
    A sheet listing only two treatments for one of the diseases is shown. After symptoms have been listed, a second sheet listing only two treatments for the second disease is shown and symptoms are listed again. Order of treatments shown was counterbalanced.
    Relevant-use and irrelevant-use symptoms: same as other condition.
Final task
  Both conditions: List symptoms that a person would be likely to have, for both diseases.
Result
  Classification required: Ratio of correct to incorrect relevant-use symptoms higher than the ratio for irrelevant-use symptoms.
  Classification not required: Ratio of correct to incorrect relevant-use symptoms not significantly different for relevant-use and irrelevant-use symptoms.
Conclusion
  During additional learning of a classification system (learning of use), the original categories must be activated so that they can be modified (subcategories added).
The learning of use phase of the experiment had two conditions, one where
additional classification by disease was required during this phase, and one where
classification by disease was not required. In the former condition, subjects were
asked to diagnose (classify) each patient as before in the initial learning phase,
and decide which of four drug treatments (two for each disease) should be given.
The drug treatments were effective only when specific symptoms, which Ross
called relevant-use symptoms, appeared. There were four relevant-use symptoms.
The other four symptoms that indicated a particular disease gave no indication
as to which drug treatment would be effective in curing the disease; Ross termed
them irrelevant-use symptoms. To help the subjects learn, they were allowed to
look at a sheet listing the two diseases and the two treatments for each disease
as they worked. Subjects were given feedback on their diagnoses and treatment
decisions for each patient, and continued until a criterion level of learning was
achieved. Essentially, subjects learned a subclassification of the disease categories
in this phase, consisting of symptoms of each disease that would or would not
respond to treatment. In summary, there were a total of 12 symptoms: four did
not indicate a disease, eight indicated a disease (four for each disease), and of
the eight indicative symptoms, four indicated which of four treatments would be
effective (with two treatments available for each disease).
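The stimulus structure just summarized can be written down as data, which makes the counts easy to check. The following Python sketch is illustrative only: the labels S1 to S12, the disease names A and B, the treatment names, and the mapping of one relevant-use symptom to each treatment are assumptions, since the text gives only the counts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symptom:
    label: str                # hypothetical label such as "S1"
    disease: Optional[str]    # "A" or "B"; None for a nondiagnostic symptom
    treatment: Optional[str]  # treatment it points to; None if irrelevant to use


def build_design() -> list[Symptom]:
    """Build the 12-symptom structure described in the text.

    Assumption: each disease has two relevant-use symptoms (one per
    treatment) and two irrelevant-use symptoms; four symptoms diagnose
    neither disease.
    """
    design: list[Symptom] = []
    n = 1
    for disease in ("A", "B"):
        for treatment in (f"{disease}1", f"{disease}2"):
            design.append(Symptom(f"S{n}", disease, treatment))
            n += 1
        for _ in range(2):
            design.append(Symptom(f"S{n}", disease, None))
            n += 1
    for _ in range(4):
        design.append(Symptom(f"S{n}", None, None))
        n += 1
    return design


def summarize(design: list[Symptom]) -> dict[str, int]:
    """Count symptoms by role, mirroring the summary sentence in the text."""
    roles: Counter = Counter()
    for s in design:
        if s.disease is None:
            roles["nondiagnostic"] += 1
        elif s.treatment is None:
            roles["irrelevant_use"] += 1
        else:
            roles["relevant_use"] += 1
    roles["diagnostic"] = roles["relevant_use"] + roles["irrelevant_use"]
    roles["total"] = len(design)
    return dict(roles)


if __name__ == "__main__":
    print(summarize(build_design()))
```

Running the script prints the role counts, which should match the 12 / 8 / 4 breakdown stated above.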
In the other condition of the learning of use phase, where additional
classification by disease was not required, subjects did not classify patients into
disease categories before recommending a treatment. Each subject saw patient
cards indicating symptoms of only one of the two diseases and was able to look
at a sheet listing only the two drug treatments corresponding to that disease –
not the name of the disease. Later, subjects performed the same task with the
second disease (the order of diseases was counterbalanced). Thus, in this condition,
subjects were given no direct opportunity to learn how to use the original disease
classification system.
The final task in both conditions of the experiment was a feature generation
task; specifically, subjects were asked to list the symptoms that a person with a
disease would be likely to have. The difference was that the group not required
to classify by disease during learning of use performed the feature generation
task separately for each disease. A symptom was scored correct if it diagnosed
the disease, and incorrect if it indicated the other disease or was one of
the four symptoms that did not diagnose either disease. In the condition where
classification by disease was required in the learning of use phase, the ratio of
correct to incorrect relevant-use symptoms listed (0.80) was significantly higher
than the corresponding ratio for irrelevant-use symptoms (0.58). In the condition
where classification was not required during learning of use, the ratios of correct to
incorrect symptoms were lower and did not differ significantly between relevant-use
(ratio = 0.40) and irrelevant-use (ratio = 0.38) symptoms. This result indicates a
category use effect; using the disease categories while learning subclassifications
of the system – relevant-use versus irrelevant-use symptoms – improved the ability
of subjects to list symptoms in general, but particularly symptoms critical to the
subclassification being learned. The critical condition for the category use effect to
occur, identified in this experiment, is that the original categories must be activated
so that they can be revised. In plain language, people must be reminded of original
categories while they learn to use new, related categories in order for their use of
the original categories to be changed in the correct or intended manner.
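The scoring rule described above (correct if a listed symptom diagnoses the target disease, incorrect if it indicates the other disease or diagnoses neither) can be sketched in Python as follows. The symptom labels are hypothetical, and because the text does not say how errors are split between relevant-use and irrelevant-use symptoms, the sketch assumes that a symptom of the other disease is binned by its own use type and that nondiagnostic errors are tallied separately. The ratio is computed literally as correct divided by incorrect, following the wording in the text; the exact measure reported by Ross (2000) may differ.

```python
import math
from typing import Optional

# Hypothetical design: label -> (disease it diagnoses, use type).
# Nondiagnostic symptoms have neither a disease nor a use type.
DESIGN: dict[str, tuple[Optional[str], Optional[str]]] = {
    "S1": ("A", "relevant"), "S2": ("A", "relevant"),
    "S3": ("A", "irrelevant"), "S4": ("A", "irrelevant"),
    "S5": ("B", "relevant"), "S6": ("B", "relevant"),
    "S7": ("B", "irrelevant"), "S8": ("B", "irrelevant"),
    "S9": (None, None), "S10": (None, None),
    "S11": (None, None), "S12": (None, None),
}


def score(listed: list[str], target: str) -> dict[str, int]:
    """Tally the symptoms a subject listed for the target disease."""
    tally = {
        "relevant_correct": 0, "relevant_incorrect": 0,
        "irrelevant_correct": 0, "irrelevant_incorrect": 0,
        "nondiagnostic": 0,
    }
    for label in listed:
        disease, use = DESIGN[label]
        if disease is None:
            # Incorrect, but has no use type (assumption: reported separately).
            tally["nondiagnostic"] += 1
        elif disease == target:
            tally[f"{use}_correct"] += 1
        else:
            # Symptom of the other disease, binned by its own use type (assumption).
            tally[f"{use}_incorrect"] += 1
    return tally


def ratio(correct: int, incorrect: int) -> float:
    """Correct-to-incorrect ratio; inf if no errors, nan if nothing was listed."""
    if incorrect == 0:
        return math.inf if correct else math.nan
    return correct / incorrect
```

For example, a subject who lists S1, S3, S5 and S9 for disease A scores one correct and one incorrect relevant-use symptom, one correct irrelevant-use symptom, and one nondiagnostic error.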
In four other experiments, Ross (2000) ruled out alternative explanations of
the category use effect and found that it applied to a reverse-order task, in which
subjects were given one or two symptoms and asked to name the disease most
likely for a patient with the symptom(s). In other research, Ross found that the
category use effect applies to a problem-solving task in which formulas must be
learned (Ross, 1999). In short, the effect is robust across variables, measures, and
tasks, although the experiment described above is most relevant to the task of
learning features of customized assurance reports.
Applied to assurance services, the category use effect requires that decision
makers be reminded of initial categories of assurance as they encounter new
services, including SysTrust™. The AICPA assumes that initial categories will be
related to the CPA brand name in some fashion (refer to the quote in Section 3).
In the AICPA’s initial classification system (Fig. 1), the relevant categories are
attestation, including the subcategories of audit examination and agreed-upon procedures,
and management consulting. Presumably, this reminder would heighten
awareness among decision makers of the customization inherent in SysTrust™
with regard to level of attestation, regardless of whether postclassification involved
single (Fig. 3) or multiple (Fig. 4) categories for SysTrust™. If decision makers
were taught Kinney’s (2000) classification system initially (Fig. 2), the one
essential category would be “audits” of internal control quality, business process,
etc., because SysTrust™ is entirely contained in that category. However, reminders
about three higher levels of the hierarchy – audits of financial statements, reliability
improvement, and relevance improvement – might help decision makers
define SysTrust™ by contrasting it with other assurance services. In contrast
with category use in the AICPA’s classification system, in Kinney’s system the
focus is on the types of decisions involved and measurement systems supporting
them rather than on levels of assurance. However, if decision makers were required
to review the audits of financial statements category, they might be reminded
of similarities and differences between assurance levels implied by SysTrust™
and traditional audits.
This postulated category use effect in the context of assurance services resembles
an aspect of the study of audit category knowledge by Bonner et al. (1997).
In the first phase of their experiment subjects learned either a transaction cycle
or audit objective classification system, and in the second frequency-learning
phase they were shown nine errors and asked to classify them according to
their assigned system. The second phase of the experiment had a category use
component; there was an aided frequency knowledge test in which subjects were
reminded of error categories just before the test. There was some evidence in the
results that the reminder improved frequency knowledge, offering some support
for a category use effect in assurance services. However, Bonner et al. studied two
static classification systems and their concern was the effectiveness of category
learning before category use, not category learning concurrent with category use.
Thus, the study by Bonner et al. encourages inquiry into the category use effect in assurance,
but does not address it directly as a postclassification phenomenon. The following
section offers suggestions as to how the category use effect (Ross, 2000) can be
tested in the field of customized assurance services, specifically SysTrust™.
6. RESEARCH IMPLICATIONS
Behavioral evidence supporting the psychological existence of any classification
system will most likely be found in controlled experiments similar to Ross (2000).
More importantly, if the category use effect is to be studied, this methodology must
be used. This section begins with a detailed explanation of one possible experiment,
then considers variations in the design.
An experiment very similar to the one by Ross (2000), summarized in Table 2,
would involve classification of assurance engagements instead of diseases. The
initial classification learning task would consist of learning relevant features of
an assurance classification system as applied to traditional financial statement
assurance, such as the level of assurance implied by each category. An associated
characteristic would be the user’s risk level, the risk that “an assertion accompanied
by a favorable attest report is materially misstated” (Kinney, 2000, p. 270).
Risk level might be rated on a four-point scale including low, medium, high, and
very high. In the case of the AICPA’s system (Fig. 1), audit examination would be
labeled as low risk, review as low to moderate, and agreed-upon procedures as low
to very high (Kinney, 2000). After being taught the classification system, subjects
would be given a series of two-part engagement descriptions (corresponding to
patients in Ross, 2000), the first part containing a description of the firm, its indus-
try, its general financial position and performance, and the second part consisting
of an independent accountant’s report. Firms would be described as belonging to
different industries, and their financial condition would vary, so that there would
be some uncertainty regarding the risk of using the accounting information for an
investment decision. Subjects would be asked to make an investment decision and
rate user’s risk for each firm until a criterion level of agreement with the classification
system’s ratings was achieved, similar to criterion achievement in diagnosis
in Ross (2000).
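The classification learning stage proposed above pairs a four-point user’s risk scale with intended ranges for each AICPA category and continues until a criterion level of agreement is reached. A minimal Python sketch follows. Mapping Kinney’s “moderate” onto the scale point “medium” is an assumption, and so is the criterion rule: the text does not define the criterion, so the sketch hypothetically requires a run of consecutive in-range ratings.

```python
from collections.abc import Iterable
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    """Four-point user's risk scale proposed in the text."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


# Intended ranges for the AICPA categories, after Kinney (2000) as cited in
# the text; "moderate" is mapped to MEDIUM (an assumption).
INTENDED = {
    "audit examination": (Risk.LOW, Risk.LOW),
    "review": (Risk.LOW, Risk.MEDIUM),
    "agreed-upon procedures": (Risk.LOW, Risk.VERY_HIGH),
}


def agrees(category: str, rating: Risk) -> bool:
    """True if the rating lies inside the intended range for the category."""
    lo, hi = INTENDED[category]
    return lo <= rating <= hi


def trials_to_criterion(
    trials: Iterable[tuple[str, Risk]], run_length: int = 4
) -> Optional[int]:
    """Return the trial number at which `run_length` consecutive in-range
    ratings are first reached, or None if the criterion is never met.

    The consecutive-run rule and its default length are hypothetical.
    """
    streak = 0
    for n, (category, rating) in enumerate(trials, start=1):
        streak = streak + 1 if agrees(category, rating) else 0
        if streak >= run_length:
            return n
    return None
```

Because the intended range for agreed-upon procedures spans the whole scale, every rating of such a case counts as agreement under this rule.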
In the learning of use task, where the manipulation would occur, all subjects
would learn the essential features of SysTrust™, such as the decision situations
in which it could be used, levels of assurance, and customization options. Next,
they would all see cases similar to those seen in the first part of the experiment,
except that these would describe various fictitious SysTrust™ engagements with
different customization features. However, only subjects in the treatment group
would be asked to classify each SysTrust™ case according to the initial assurance
classification system and would be able to see a picture of the entire system (e.g.
Fig. 1), perhaps with some description of the categories. This classification task
would likely be more difficult than the corresponding task in Ross (2000) because
it is a challenge to perceive relationships between SysTrust™ engagements
addressing the reliability of systems and traditional forms of assurance concerning
accounting information used in investment decisions. No criterion level of
“achievement” would be sought at this point; the purpose is to demonstrate to
subjects through experience the similarities and differences between SysTrust™
and all other assurance services, and the range of engagements possible within
SysTrust™. Subjects in the control group would read the same descriptions of
SysTrust™ engagements in order to show them specific examples of the service,
but they would not be required to classify the engagements according to the initial
system and would not see a picture of it.
Still in the learning of use stage of the experiment, subjects would be taught
a postclassification of assurance services that includes SysTrust™, for instance
AICPA option 1 as pictured in Fig. 3. Subjects would be shown where in the
revised system various types of engagements would be placed, according to levels
of assurance and customization provided. The final task of the experiment would
require subjects to read examples of SysTrust™ engagements (including the
independent accountant’s reports), rate the reliability of the systems described,
and rate the user’s risk for each of them. The treatment group would observe the
entire postclassification system (e.g. Fig. 3), whereas the control group would
be shown only the part of the postclassification system containing SysTrust™. In
the case of Fig. 3, this would be only the oval containing SysTrust™ and the
subcategories of audit examination, agreed-upon procedures, and management
consulting. The categories of review, attestation, compilation, and assurance in
the initial classification system would not be shown to the control group.
In this design, judgments of user’s risk replace disease symptoms listed (Ross,
2000) as the dependent variable, but the design follows Ross (2000) in all other
respects as closely as possible. Evidence of a category use effect would be that
subjects in the treatment group rate user’s risk closer to the levels intended by
the AICPA (e.g. consistently low risk for audit examination engagements) than
the control group’s ratings. Such an effect would arise, first, because the treatment
group had attempted to classify SysTrust™ with the initial system and had been
prompted to recall the various levels of user’s risk associated with analogous
assurance services. Second, treatment subjects would have been able to observe the
entire postclassification system, not just the SysTrust™ categories, when making
user’s risk judgments. Stated as a formal hypothesis, the category use effect
would predict:
H1. Decision makers asked to classify SysTrust™ engagements according to
an initial classification system when learning to use SysTrust™, and to use a
complete postclassification system when asked to rate the user’s risk of systems,
will give ratings closer to those intended by the AICPA than decision makers
not asked to use an initial classification system, or a complete postclassification
system, during learning of use tasks.
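H1 turns on ratings being “closer to those intended by the AICPA.” One way to operationalize this, offered only as a sketch, is to score each rating by how many scale points it falls outside the intended range and then compare group means. The distance measure and the use of Welch’s t statistic are assumptions, since the paper does not specify an analysis; ratings are coded 1 (low) to 4 (very high).

```python
import math
from statistics import mean, variance


def distance_from_range(rating: int, lo: int, hi: int) -> int:
    """Scale points by which a rating falls outside the intended [lo, hi] range."""
    if rating < lo:
        return lo - rating
    if rating > hi:
        return rating - hi
    return 0


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Requires at least two observations per group and nonzero total variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Applied to the distances of the treatment group (first argument) and the control group (second), a negative t would point in the direction H1 predicts; a p-value would additionally require the t distribution.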
The experiment described above could provide behavioral evidence of a category
use effect, but it cannot identify a more (or the most) efficient assurance classification
system because subjects are required to learn only one system in the
classification learning task. In the absence of a compelling logical argument, as
required by Cohen (2000), that there exists an assurance classification system
that is more efficient than alternative methods of organizing and accessing
knowledge, additional behavioral evidence is needed to address the question of
cognitive efficiency. To answer this question, the research could be extended
by teaching another classification system, such as Kinney (2000, Fig. 2), in
the classification learning task and comparing user’s risk judgments with those
obtained under the AICPA’s system. Other dependent variables measuring cognitive
efficiency, for instance recall of the detailed information about the customized
features of individual SysTrust™ engagements, could be added to the design.
If one wished to predict that a relatively conceptual classification system is
more efficient than a concrete system, then one possible alternative hypothesis
would be:
H2. Among decision makers given the opportunity to use complete initial
assurance classification systems when learning postclassification systems
that include SysTrust™, those using systems based on Kinney (2000) will
later recall more information about SysTrust™ engagements than those using
AICPA-based systems.
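One hypothetical way to test H2 while retaining the H1 manipulation is to cross the initial classification system taught (AICPA-based or Kinney-based) with whether category use is required during learning of use; H2 would then compare recall across the two “use required” cells. The text does not lay out such a factorial design, so the following Python sketch of balanced random assignment is illustrative, and the cell labels and fixed seed are assumptions.

```python
import itertools
import random
from collections import Counter

SYSTEMS = ("AICPA", "Kinney")               # initial classification system taught
USE = ("use required", "use not required")  # category use during learning of use


def assign_conditions(
    subject_ids: list[str], seed: int = 0
) -> dict[str, tuple[str, str]]:
    """Shuffle subjects, then deal them into the four cells in rotation,
    so that cell sizes differ by at most one."""
    cells = list(itertools.product(SYSTEMS, USE))
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return {sid: cells[i % len(cells)] for i, sid in enumerate(ids)}


def cell_sizes(assignment: dict[str, tuple[str, str]]) -> Counter:
    """Number of subjects assigned to each cell."""
    return Counter(assignment.values())
```

Fixing the seed makes the assignment reproducible, which is convenient for documenting the procedure; a fresh seed would be drawn for an actual study.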
Designing research in assurance classification is inherently more complex than
the experiments conducted by Ross (2000). Not only are there alternative initial
classification systems, there are alternative postclassification systems even for the
same initial system, as shown in Figs 3 and 4. When subjects begin participating
they might (e.g. commercial loan officers) or might not (e.g. students not having
taken an auditing course) have already learned an assurance classification system.
Those in the former group might find it difficult to ignore their preconceptions
if they contradict what is taught in the classification learning task. One means
of dealing with pre-existing classification systems among user groups would be
to survey them regarding concepts such as levels of assurance and to adjust the
systems taught in experiments in light of the results.
Research could be extended beyond the strictly cognitive domain by introducing
a dependent variable not used in extant classification research – price.
Experimental markets could be employed to measure the willingness of subjects
to pay for customized assurance services, although some abstraction from the
details of specific services such as SysTrust™ might be necessary. A category use
effect would be evident if subjects who were reminded in some way of an original,
generic assurance classification system as they learned to use customized assur-
ance were willing to pay more for a customized service than subjects not reminded
of the initial system. This result would certainly please the AICPA and lend
support to its hope that the CPA brand can be extended to a broader spectrum
of assurance services.
Finally, research could be extended to other customized assurance services
offered under the AICPA brand, such as ElderCare. Services offered by other
providers could also be included. For example, experiments using recall, user’s risk
or price as dependent variables could require subjects to classify websites having
either WebTrust™ or BBB Online seals as to the level of assurance provided.
Regardless of the assurance services and dependent variables addressed, the difficulty
remains that an initial classification system must either be assumed or taught to
participants in the research. Consensus is more likely with relatively homogeneous
user groups. Thus, a sample composed of either commercial loan officers or
institutional investors will be more likely to share a common classification system
than a sample containing both types of decision makers, or one including trading
partners as well as investors. If actual decision makers of any type are sampled,
rather than students without prior beliefs about classification of assurance services,
then care must be taken to identify as precisely as possible the intended consumers
of each service.
In conclusion, an anecdote may illustrate the importance of understanding how
decision makers classify customized assurance services. In 1998 an accountant in
one of North America’s largest independent oil and gas producers mentioned to
me that a partner of a Big 5 firm had left him a business card containing the title
“assurance partner.” The accountant knew that the Big 5 firm was his company’s
auditor, but he was confused by the title. Did it mean that the partner was also
involved in insurance in some capacity? I explained that the partner was indeed an
auditor. Even though assurance is a more familiar term now, that simple explana-
tion could be misleading when describing practitioners who provide customized
assurance services.
NOTES
1. Assurance is defined in this paper as it is defined by the AICPA (2000, p. 1): “independent
professional services that improve the quality of information, or its context, for
decision makers.” Attestation, including audits, is a subcategory of assurance (see Fig. 1),
and at times in the paper attestation services will be referred to as a type of assurance.
2. The fields of auditing and decision-making uses of accounting information are most
relevant to this paper, but cognitive models of classification also appear in accounting
education literature, e.g. Butler and Mautz (1996) and Bagranoff et al. (1994). There
is considerable discussion of ontologies in information systems literature concerning
databases and artificial intelligence, e.g. Dahlgren (1995), Parsons and Wand (1997),
Terenziani (1995), and Wand and Wang (1996). However, much of this work is based on
culture and language, for example in reproducing users’ classifications in artificial systems,
rather than psychology and cognition (the focus of this paper).
3. Cohen (2000) discusses two other types of evidence that are less relevant to this paper.
Neuropsychological evidence shows that brain damage affects hierarchical levels differently.
Ontogenetic evidence shows that children acquire some hierarchical levels before others.
4. Boritz (2001) pointed out that SysTrust™ assurance does not pertain to system
reliability itself – it pertains to effectiveness of controls over principles. He questioned
whether this could cause an expectations gap.
ACKNOWLEDGMENTS
Thanks to Karla Johnstone, Janet Morrill, Steve Salterio, Mike Stein, Michael
Wright, and two anonymous reviewers.
REFERENCES
AICPA (1996). Report of the AICPA Special Committee on Assurance Services. http://www.aicpa.org/assurance/index.htm
AICPA (2000). Assurance Services – Definition and Interpretive Commentary. http://www.aicpa.org/assurance/scas/comstud/defincom/index.htm
AICPA/CICA (2000). SysTrust™ Principles and Criteria for Systems Reliability Exposure Draft,
Version 2.0. http://www.aicpa.org or http://www.cica.ca
Arens, A., & Loebbecke, J. (1997). Auditing: An integrated approach (8th ed.). Upper Saddle River,
NJ: Prentice-Hall.
Bagranoff, N., Houghton, K., & Hronsky, J. (1994). The structure of meaning in accounting: A cross-
cultural experiment. Behavioral Research in Accounting (Suppl.), 35–57.
Bamber, E. M., & Stratton, R. (1997). The information content of the uncertainty-modified audit report:
Evidence from bank loan officers. Accounting Horizons (June), 1–11.
Bandyopadhyay, S., & Francis, J. (1995). The economic effect of differing levels of auditor assurance
on bankers’ lending decisions. Canadian Journal of Administrative Sciences, 12, 238–249.
Beaulieu, P. (1996). A note on the role of memory in commercial loan officers’ use of accounting and
character information. Accounting, Organizations and Society (August), 515–528.
Bonner, S., Libby, R., & Nelson, M. (1997). Audit category knowledge as a precondition to learning
from experience. Accounting, Organizations and Society (July), 387–410.
Boritz, J. E. (2001). Information systems assurance. In: V. Arnold & S. G. Sutton (Eds), Researching
Accounting as an Information Systems Discipline. Sarasota, FL: American Accounting
Association (forthcoming).
Butler, J., & Mautz, R. D., Jr. (1996). Multimedia presentations and learning: A laboratory experiment.
Issues in Accounting Education (Fall), 259–280.
Campbell, J., & Mutchler, J. (1988). The expectations gap and going-concern uncertainties. Accounting
Horizons (March), 42–49.
Choo, F., & Trotman, K. (1991). The relationship between knowledge structure and judgments for
experienced and inexperienced auditors. The Accounting Review (July), 464–485.
Cohen, G. (2000). Hierarchical models in cognition: Do they have psychological reality? European
Journal of Cognitive Psychology, 12(1), 1–36.
Dahlgren, K. (1995). A linguistic ontology. International Journal of Human-Computer Studies,
43(5–6), 809–818.
Frederick, D., Heiman-Hoffman, V., & Libby, R. (1994). The structure of auditors’ knowledge of
financial statement errors. Auditing: A Journal of Practice and Theory (Spring), 1–21.
Houston, R., & Taylor, G. (1999). Consumer perceptions of CPA WebTrust assurances: Evidence of
an expectation gap. International Journal of Auditing, 3, 89–105.
Jennings, M., Kneer, D., & Reckers, P. (1993). The significance of audit decision aids and precise
jurists’ attitudes on perceptions of audit firm culpability and liability. Contemporary Accounting
Research (Spring), 489–507.
Johnson, D., Pany, K., & White, R. (1983). Audit reports and the loan decision: Actions and perceptions.
Auditing: A Journal of Practice and Theory (Spring), 38–51.
Johnson, K., & Mervis, C. (1997). Effects of varying levels of expertise on the basic level of categorization.
Journal of Experimental Psychology: General (September), 248–277.
Kida, T., Smith, J., & Maletta, M. (1998). The effects of encoded memory traces for numerical data on
accounting decision making. Accounting, Organizations and Society (July/August), 451–466.
Kinney, W. (2000). Information quality assurance and internal control for management decision making.
Boston: Irwin/McGraw-Hill.
Kinney, W., & Nelson, M. (1996). Outcome information and the “expectation gap”: The case of loss
contingencies. Journal of Accounting Research (Autumn), 281–294.
Libby, R. (1995). The role of knowledge and memory in audit judgment. In: R. Ashton & A. H.
Ashton (Eds), Judgment and Decision-Making Research in Accounting and Auditing.
Cambridge: Cambridge University Press.
Libby, R., & Trotman, K. (1993). The review process as a control for differential recall of evidence in
auditor judgments. Accounting, Organizations and Society (August), 559–574.
Lymer, A., Debreceny, R., Gray, G., & Rahman, A. (1999). Business reporting on the internet
(Discussion Paper). London: International Accounting Standards Committee (IASC).
Malt, B., Ross, B., & Murphy, G. (1995). Category coherence in cross-cultural perspective. Cognitive
Psychology, 29, 85–148.
Martin, C., Handorf, W., & Clewell, W. (1988). Small business lending and levels of report assurance.
Akron Business and Economic Review (Summer), 69–84.
Mautz, R., & Sharaf, H. A. (1961). The philosophy of auditing. Sarasota, FL: American Accounting
Association.
Moeckel, C. (1990). The effect of experience on auditors’ memory traces. Journal of Accounting
Research (Autumn), 368–387.
Nelson, M., Libby, R., & Bonner, S. (1995). Knowledge structures and the estimation of conditional
probabilities in audit planning. Accounting Review (January), 27–47.
Osherson, D., Smith, E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological
Review, 97, 185–200.
Parsons, J., & Wand, Y. (1997). Choosing classes in conceptual modeling. Communications of the
ACM, 40(6), 63–69.
Rosch, E., Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural
categories. Cognitive Psychology, 8(3), 382–439.
Ross, B. (1996). Category representations and the effects of interacting with instances. Journal of
Experimental Psychology: Learning, Memory and Cognition, 22, 1249–1265.
Ross, B. (1997). The use of categories affects classification. Journal of Memory and Language,
37(August), 240–267.
Ross, B. (1999). Postclassification category use: The effects of learning to use categories after learning
to classify. Journal of Experimental Psychology: Learning, Memory and Cognition, 25(May),
743–757.
Ross, B. (2000). The effects of category use on learned categories. Memory and Cognition, 28(January),
51–63.
Terenziani, P. (1995). Towards a causal ontology coping with the temporal constraints between causes
and effects. International Journal of Human-Computer Studies, 43(5–6), 847–863.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological foundations. Communications
of the ACM, 39(11), 86–95.
Wright, M., & Davidson, R. (2000). The effect of auditor attestation and tolerance for ambiguity on
commercial lending decisions. Auditing: A Journal of Practice and Theory (Fall), 67–81.