
PERSONNEL PSYCHOLOGY

1995, 48

ON THE INTERCHANGEABILITY OF OBJECTIVE


AND SUBJECTIVE MEASURES OF EMPLOYEE
PERFORMANCE: A META-ANALYSIS

WILLIAM H. BOMMER
Department of Management
Southern Illinois University at Edwardsville
JONATHAN L. JOHNSON
Department of Management
Indiana University
GREGORY A. RICH
Marketing Department
Bowling Green State University
PHILIP M. PODSAKOFF
Department of Management
Indiana University
SCOTT B. MACKENZIE
Marketing Department
Indiana University

A meta-analysis of studies containing both objective and subjective ratings of employee performance resulted in a corrected mean correlation
of .389. This value, although significantly greater than zero, indicates
that objective and subjective performance measures should not be used
interchangeably. Moreover, in no moderator subgroup examined did
the correlation suggest convergent validity. After discussing issues re-
lated to resolving the previous anomalies of primary and meta-analytic
results, a secondary analysis suggested that objective and subjective
measures of the same construct at the same level may be used inter-
changeably. The secondary analysis, however, was based on a very
limited sample. Future research should address the appropriate di-
mensionality of employee performance.

Because job performance is the most widely studied criterion variable in the organizational behavior and human resource management
literatures (cf. Campbell, 1990; Heneman, 1986; Schmidt & Hunter,
1992), the construct validity of performance measures is critical. The
convergent validity of performance measures is important to academics
and practitioners. For academics, it is central to hypothesis test validity
and theory construction, whereas practitioners are interested in accu-
rately assessing employee performance to utilize scarce resources.
Correspondence and requests for reprints should be addressed to William H. Bommer,
Department of Management, Southern Illinois University at Edwardsville, Edwardsville,
IL 62026-1100.
COPYRIGHT © 1995 PERSONNEL PSYCHOLOGY, INC.


Although there are multiple ways to partition performance measures, the most popular has been between objective and subjective measures. Objective measures are defined here as direct measures of count-
sures. Objective measures are defined here as direct measures of count-
able behaviors or outcomes, whereas subjective measures consist of su-
pervisor ratings of employee performance. Although these categories
are somewhat arbitrary, they provide a useful distinction by which pre-
vious research may be organized, synthesized, and interpreted.
Theorists who have examined objective and subjective performance
measures have generally agreed that they should not be used inter-
changeably (Murphy & Cleveland, 1991). These recommendations were
empirically supported by Heneman (1986), who reported a corrected
mean correlation of only .27 in a meta-analysis of the relationship be-
tween subjective supervisory ratings and objective result-oriented mea-
sures. Heneman concluded that the measures were not substitutable,
and that “when reviews of the literature are conducted, results should
be grouped by the type of performance criteria” (p. 820).
In spite of these theoretical recommendations and empirical find-
ings, many researchers continue to treat different performance mea-
sures synonymously. One need look at only a few studies that include
performance to realize that many authors’ conclusions are intended to
generalize to a broad performance construct, irrespective of the mea-
sure(s). This is further seen in recent meta-analyses in which the au-
thors did not distinguish between objective and subjective performance
measures (e.g., Fried, 1991; Williams & Livingstone, 1994).
A few primary researchers and meta-analysts have heeded these rec-
ommendations, and have explicitly looked for differences in relationship
strength between multiple independent variables with subjective and ob-
jective performance measures. But contrary to expectations, in practi-
cally no cases have significant relationship differences been found. For
example, Nathan and Alexander (1988) found a difference between ob-
jective and subjective measures in only one of the seven relationships
examined, and concluded that the objective/subjective distinction “may
be more illusory than real” (p. 531). Their conclusion is corroborated
by several meta-analyses. In examining the relationship between age
and performance, McEvoy and Cascio (1989) found no difference in
the strength of relationships involving productivity (objective measures)
and supervisor ratings (subjective measures). Ones, Viswesvaran and
Schmidt (1993) found no difference between the relationships involv-
ing production records and ratings in their examination of integrity tests
and overall job performance. Tett, Jackson, and Rothstein (1991) failed

to find objective/subjective performance measure differences in a meta-analysis examining personality as a predictor of job performance. Simi-
larly, Mathieu and Zajac (1990) found no objective/subjective relation-
ship strength differences in their commitment study. Even a study that
did not explicitly examine the difference in the relationship (Williams &
Livingstone, 1994), when manually recalculated, yielded no significant
difference.
Thus, there exists a fundamental tension between theoretical ex-
aminations of performance measurement and Heneman’s meta-analytic
findings on the one hand, and actual research practice and meta-analytic
results on the other. It is possible that equal predictability is a function
of the substitutability of objective and subjective measures, or it could be
a function of chance (it is possible for two otherwise unrelated variables
to be equally predictive of a third variable). These differences in theory
and practice suggest that further research is warranted.
The purpose of this study is to assess the relationship between objec-
tive and subjective performance measures, and to estimate the popula-
tion correlation between these measures. If the population correlation
is of a magnitude suggesting convergent validity, then the measures may
be combined largely without incident. Given prior research, however,
the likelihood of the overall convergence between all objective and all
subjective measures is unlikely (e.g., Heneman, 1986). However, it is not
unreasonable to expect that the measures may be interchangeable under
specific circumstances. Thus, we will examine a number of moderators
whose presence may suggest that objective and subjective measures can
be used interchangeably, and if so, in what situations.

Potential Moderators of the Relationship Between Objective and


Subjective Measures

Moderators

Previous research guided our search for moderators of the objective/subjective performance relationship. The moderators tested in Hen-
eman’s (1986) meta-analysis were included to determine whether the
conclusions reached are supported using an updated, larger data set.
Additional moderators were included based on prior empirical and con-
ceptual relevance. Thus, job type (sales and nonsales), rating method
(relative, absolute, and a hybrid of the two), rating format (composite
and overall), and the objective measure’s content (i.e., whether the ob-
jective measure assessed performance “quantity” or performance “qual-
ity”) were assessed.

Job type. Heneman (1986) reported a relatively weak mean corrected correlation (r = .27) between objective and subjective performance
measures. However, it is noteworthy that all but four samples included
by Heneman used nonsales employees. The lack of sales samples is
notable because of at least three reasons why the focal relationship
should be stronger in sales samples. First, sales managers’ salaries are
often contingent on their employees’ performance, thus increasing the
objective performance measure’s salience. Second, salespeople are tra-
ditionally evaluated on sales output, so managers are likely to consider it
when evaluating performance. Finally, in sales samples objective perfor-
mance measures are generally easily assessed and readily available, pro-
viding sales managers with the information necessary to evaluate their
personnel. Thus,
Hypothesis 1: Job type will moderate the objective/subjective performance
measure relationship. More specifically, objective and subjective perfor-
mance measures will be more strongly related in samples using salespeo-
ple than in samples using nonsales settings.

Objective measure’s content. In the section above, we argued that due to sales output’s quantitative nature, the objective/subjective rela-
tionship should be stronger in salesperson samples than other job types.
The objective measure’s content is related to job type, but it goes be-
yond a distinction based strictly on sales or nonsales samples. As with
sales, production quantity information is likely to be more easily ob-
tained than quality measures, and the higher frequency of the quantity-
related behaviors (relative to quality measures which are typically low
frequency error rates) should make them easier to observe. Thus, con-
sistent with Nathan and Alexander (1988) and Hoffman, Nathan, and
Holden (1991), we anticipated that objective production quantity mea-
sures would correspond more closely to subjective ratings than would ob-
jective quality measures. To test this, we assessed each study for its objec-
tive measure’s content, differentiating samples using objective produc-
tion quantity measures from samples assessing output quality. In most
cases, samples including quality were accuracy measures (i.e., shortages
and overages) and error rates. Consistent with these arguments,
Hypothesis 2: The objective measure’s content will moderate the objec-
tive/subjective performance measure relationship. More precisely, objec-
tive and subjective performance measures will be more strongly related in
samples using a quantitative objective measure than in samples utilizing
quality measures.

The examination of objective measure’s content leads to two interesting methodological issues. First, job type and objective measure’s con-
tent are not independent because sales samples exclusively used objec-
tive productivity measures. This lack of independence can hinder the
interpretation of the results, although a follow-up analysis will be con-
ducted to determine which potential moderator is most likely responsible
for the effects.
The second methodological issue concerns the classification of sam-
ples into appropriate subgroups representing the objective measure’s
content. In this instance, combining the samples in the two subgroups
yields a value larger than our data set. This results from some samples
containing both quantity and quality measures. In the six samples where
dual reporting occurred, we assigned each quality and quantity correla-
tion to its appropriate subgroup. Although correlations from the same
sample are not independent, their relatively infrequent occurrence does
not likely violate the robust assumptions of subgroup analyses.
Rating method and rating format. Consistent with Feldman (1981),
Heneman (1986) argued that less cognitively demanding rating proce-
dures should be more accurate than more complex ones. He identified
two dimensions central to evaluation complexity, labeling them rating
format and rating method. Rating format contrasted overall evaluations
(one-item measures of ratee’s global performance) with composite eval-
uations (multiple-items aggregated to form a measure). Heneman ar-
gued that composite instruments were less cognitively demanding be-
cause they provide the rater with more definite dimensional guidelines.
Rating method contrasted absolute comparisons (ratee compared to a
fixed performance standard) and relative comparisons (ratee compared
to peers). Heneman considered relative comparisons simpler than abso-
lute comparisons. On the assumption that objective measures are more
accurate than subjective ones, Heneman hypothesized that simpler rat-
ings will correlate more strongly with objective measures.
In this study, we did not employ Heneman’s exact rating format clas-
sification, because in several multiple-item scales, each item assessed
overall performance. We classified samples as overall if their subjec-
tive rating measured global performance, and composite if the rating in-
cluded specific performance dimensions, irrespective of the number of
items. For rating method, we used Heneman’s basic scheme, although
we added a hybrid rating method subgroup, representing scales combin-
ing aspects from relative and absolute methods.
Consistent with the arguments presented above,
Hypothesis 3: Rating format will moderate the objective/subjective per-
formance measure relationship. Thus, the performance measures will be

more strongly correlated when the subjective measure is evaluated with a composite format than when an overall format is used.
Hypothesis 4: Rating method will moderate the objective/subjective per-
formance measure relationship. Thus, the performance measures will be
more strongly correlated when the subjective measure is evaluated with a
relative format than when an absolute or hybrid format is used.

Method

Sample

Journals were searched for articles reporting a Pearson product-moment correlation between managerial subjective ratings and objec-
tive employee performance measures. To provide a sample representa-
tive of actual work situations, we included only field studies including
currently employed workers being evaluated by their supervisor. Fur-
ther, to increase sample homogeneity, studies using academics or other
researchers were not included (e.g., Farh, Werbel, & Bedeian, 1988). Ex-
amples of studies that were not used include Chaney (1966), who re-
ported tetrachoric correlations, Hollenbeck and Williams (1987), who
used self-rated performance, and Huber, Neale, and Northcraft (1987)
who employed an experimental setting.
Numerous search techniques yielded a comprehensive collection of
studies including the focal relationship. First, the studies from Hene-
man’s (1986) meta-analysis were obtained. We also examined all articles
citing Heneman (1986), as indicated by the Social Science Citation Index.
Third, a manual search of the following journals published between January 1950 and June 1994 was conducted: Academy of Management Journal, Administrative Science Quarterly, Journal of Applied Psychology, Journal of Consumer Research, Journal of Management, Journal of Marketing, Journal of Marketing Research, Journal of Personal Selling and Sales Management, Journal of the Academy of Marketing Science, Marketing Science, Organizational Behavior and Human Decision Processes, and Personnel Psychology. Marketing journals were included because a preliminary
investigation suggested the presence of relevant studies. Each journal
was then manually checked for studies reporting correlations between
objective and subjective employee performance measures, regardless of
the article’s main focus. Next, PsychLit (January 1972-June 1994) was
searched for additional articles including both objective and subjective

performance measures. Finally, the reference lists of the articles identified in the above steps were examined for titles suggesting suitability.
This process yielded 40 articles containing 50 independent samples.

Moderator Coding

Each sample was independently coded by the first three authors on each of the proposed moderators. Discrepancies among the raters were
discussed until all were confident in the coding.

Meta-Analytic Procedure

The meta-analysis was conducted following Hunter and Schmidt (1990). To provide the most accurate estimates, the weighted mean cor-
relations and their variances were corrected for measurement and sam-
pling error. To correct for measurement error, we used MetaDos (Stauf-
fer, 1994) using the option employing Hunter and Schmidt’s (1990) ar-
tifact distribution formulae. The issue of appropriate reliabilities for
the artifact distribution posed an interesting question. We used the Na-
tional Academy of Sciences estimate of .80 (Hartigan & Wigdor, 1989)
as the reliability for subjective ratings. It can be argued that this reliabil-
ity undercorrects, and thus, underestimates the true correlation between
objective and subjective measures. We chose, however, to use this poten-
tially “conservative” reliability to not overcorrect the population corre-
lations. For the present analysis, Type I error should be considered more
serious than Type II. Thus, the more conservative value seems justified.
On the objective side, selecting an appropriate reliability was more
complex. To provide an accurate estimate, we used two sets of com-
plementary values. Where possible, we used the reliability measure re-
ported. For those studies not reporting a reliability, we used the values
reported by Hunter, Schmidt, and Judiesch (1990). Using the Hunter
et al. (1990) data, reliabilities were matched to each sample based on
job type and the period over which the objective measure was assessed.
Using the reliability information from these sources, the mean reliabil-
ity was .85 based on 43 reliability estimates, again representing a con-
servative correction compared to other meta-analyses of this type (e.g.,
Heneman, 1986).
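The core of the correction described above can be illustrated with a minimal arithmetic sketch. The sketch below applies the classical per-correlation attenuation formula to the rounded summary values given in the text; it is our illustration, not the authors' full artifact-distribution procedure, so it only approximates the reported corrected correlation.

```python
import math

# Rounded inputs described in the text: the mean observed correlation,
# the subjective-rating reliability (Hartigan & Wigdor, 1989), and the
# mean objective-measure reliability (43 estimates).
r_obs = 0.317
rxx = 0.80   # subjective ratings
ryy = 0.85   # objective measures

# Classical correction for attenuation: divide the observed correlation
# by the square root of the product of the two reliabilities.
rho = r_obs / math.sqrt(rxx * ryy)
print(round(rho, 3))  # 0.384, close to the reported .389
```

The small gap between this value and the reported .389 reflects the artifact-distribution computation used in the study, which corrects variances as well as means.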
In this study, we use a relatively new and under-utilized method of
calculating the overall correlation in samples reporting multiple objec-
tive and/or subjective measures. We used linear composites (cf. Hunter
& Schmidt, 1990; Viswesvaran, Schmidt, & Ones, 1994) instead of the
commonly used averaging method (e.g., Heneman, 1986) or the inde-
pendent inclusion of all correlations (e.g., Mitra, Jenkins, & Gupta,

1992). Linear composites are superior to averaging because they provide a more construct valid estimate of the true correlation and avoid
over- or underestimating the sampling error, thus improving the preci-
sion of meta-analysis. Further, linear composites are superior to the in-
dependent inclusion of all correlations because composites do not dou-
ble count the study in the data set, which systematically underestimates
sampling error.
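The composite computation can be sketched as follows. This is a minimal illustration of the Hunter and Schmidt (1990) composite-correlation formula with hypothetical input values (the function name and example correlations are ours):

```python
import math

def composite_correlation(cross_rs, k_x, rbar_xx, k_y=1, rbar_yy=0.0):
    """Correlation of a unit-weighted composite of k_x standardized measures
    with a composite of k_y measures (Hunter & Schmidt, 1990). cross_rs holds
    all k_x * k_y cross-correlations; rbar_xx and rbar_yy are the mean
    intercorrelations within each set of measures."""
    var_x = k_x + k_x * (k_x - 1) * rbar_xx  # variance of the summed x-measures
    var_y = k_y + k_y * (k_y - 1) * rbar_yy  # variance of the summed y-measures
    return sum(cross_rs) / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical sample: two objective measures correlating .30 and .40 with a
# single subjective rating, and .50 with each other.
r_comp = composite_correlation([0.30, 0.40], k_x=2, rbar_xx=0.50)
print(round(r_comp, 3))  # 0.404
```

Note that the commonly used averaging method would report .35 for this sample; the composite value is larger because it credits the breadth of the measure set, which is why composites yield a more construct-valid estimate.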
The homogeneity of the meta-analytically derived relationship be-
tween subjective and objective performance measures was assessed by
credibility intervals (cf. Osburn & Callender, 1992; Whitener, 1990).
Thus, a 90% credibility interval was calculated around the corrected
mean correlation. According to Koslowsky and Sagie (1993), credibil-
ity intervals wider than 0.11 imply the presence of moderators.
Although credibility intervals (as well as any other homogeneity test)
indicate the presence of a moderator, they do not reveal its identity
(Whitener, 1990). To test the impact of potential moderators, we used a
combination of methods suggested in the meta-analysis literature. Thus,
strong moderating effects are indicated by the agreement of the follow-
ing three criteria: difference in corrected mean correlations, a signifi-
cant t test, and a significant critical ratio.
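For intuition, a familiar way to compare two independent subgroup correlations is Fisher's r-to-z transformation, sketched below. This is only an illustration: it ignores between-study heterogeneity, so it overstates significance relative to the critical ratios reported in Table 1, which are based on the subgroup variance estimates.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations,
    using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Observed subgroup correlations for the objective measure's content
# (quantity vs. quality), with pooled sample sizes from Table 1:
z = fisher_z_compare(0.316, 7395, 0.191, 3061)
print(z > 1.96)  # significant at p < .05 under this simple test
```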
In addition, to test the significance of the overall relationship, and
the value within each subgroup, 95% confidence intervals were exam-
ined. The relationship was significantly different from zero when the
confidence interval excluded zero.
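The two kinds of intervals can be sketched as follows, using the rounded overall values reported in Table 1. Conventions for the credibility-interval multiplier vary across sources; the 1.28 used here is our assumption (Hunter and Schmidt's 80% interval uses 1.28) and, with the rounded inputs, it approximately recovers the printed interval.

```python
import math

def confidence_interval(r_bar, var_r, k, z=1.96):
    """95% CI around the mean observed correlation, with the sampling
    variance of the mean estimated as var_r / k (in the spirit of
    Osburn & Callender, 1992)."""
    half = z * math.sqrt(var_r / k)
    return (r_bar - half, r_bar + half)

def credibility_interval(rho_bar, var_rho, z=1.28):
    """Credibility interval around the corrected mean correlation; the
    multiplier convention is an assumption here."""
    half = z * math.sqrt(var_rho)
    return (rho_bar - half, rho_bar + half)

# Overall values: observed r = .317, observed variance = .031, k = 50;
# corrected r = .389, corrected variance = .038.
ci = confidence_interval(0.317, 0.031, 50)  # ~(.268, .366), as printed
cv = credibility_interval(0.389, 0.038)     # ~(.139, .639) vs. the printed .135-.639
```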

Results

The meta-analytic results are reported in Table 1. The overall corrected mean correlation between objective and subjective performance measures was .389 (.317 observed), compared to Heneman’s (1986) find-
ing of .27 (.17 uncorrected). The 95% confidence interval around the
present mean correlation excluded zero, suggesting that the measures
were significantly related, contradicting Heneman’s conclusion that the
measures were not significantly related.
To test the overall relationship’s homogeneity, a 90% credibility in-
terval around the corrected mean was calculated. As shown in Table 1,
this interval ranged from .135 to .639. Because the credibility interval’s
width is nearly five times Koslowsky and Sagie’s (1993) suggested value,
the overall relationship is heterogeneous, signifying that moderators may
be present.
Because the overall collection of samples appeared to contain multi-
ple populations, each hypothesized moderator was examined. A perusal
of Table 1, however, reveals that only the objective measure’s content
TABLE 1
Meta-Analysis Results

                    Sample   # of     Observed  Corrected  Observed  Corrected  % of      90% Credibility  95% Confidence   Critical  t test
                    size     studies  r         r          variance  variance   variance  interval         interval^a       ratio
Overall             8,341    50       .317      .389       .031      .038       15.8      .135-.639        .268-.366
Job type
  Salespeople       4,173    22       .337      .410       .018      .021       23.2      .224-.596        .281-.393        0.83      0.87 df = 46
  Other             4,168    28       .297      .363       .043      .055       13.4      .063-.663        .221-.373
Rating format
  Overall           5,174    32       .316      .384       .026      .031       19.5      .158-.610        .260-.372        0.08      0.03 df = 39
  Composite         3,560    22       .312      .382       .037      .048       13.8      .101-.664        .231-.392
Rating method
  Absolute          4,477    33       .296      .359       .045      .057       13.9      .054-.664        .223-.368        1.40^b    1.64 df = 27^b
  Hybrid            2,321    9        .316      .388       .013      .013       32.2      .242-.534        .242-.390        0.39^c    0.52 df = 28^c
  Relative          1,611    9        .372      .453       .015      .014       36.2      .301-.605        .293-.452        1.02^d    1.18 df = 16^d
Objective measure
  Quantity          7,395    40       .316      .382       .034      .043       13.1      .116-.647        .259-.373        2.79*     2.82** df = 38
  Quality           3,061    16       .191      .242       .019      .022       26.5      .052-.432        .124-.256

^a Confidence intervals were calculated using Osburn and Callender’s (1992) formula for sampling variance (equation 5).
^b Absolute and hybrid rating methods were compared.
^c Absolute and relative rating methods were compared.
^d Hybrid and relative rating methods were compared.
* p < .05; ** p < .01

had an impact. In addition, no 95% subgroup confidence interval included zero, indicating that the correlation was significant in every con-
dition examined.
Moderator analysis found that when the objective measure reflected
performance quantity, the corrected correlation between objective and
subjective measures was r = .382. When the objective measure rep-
resented performance quality, the correlation was r = .242. The three
moderator tests were consistent in their support for subgroup differ-
ences. The difference in the corrected correlations was greater than
0.10, and the t test and critical ratio were significant. Further, the sub-
group confidence intervals were mutually exclusive, indicating a strong
effect. Thus, the relationship between subjective and objective perfor-
mance measures was stronger when the objective measure assessed per-
formance quantity, rather than performance quality.
Due to the lack of independence between objective measure content and employee job type, subsequent analyses were needed to make more
definitive statements regarding the underlying cause of the difference
found. Thus, we examined whether a difference was present between
sales samples and nonsales samples using a quantitative objective mea-
sure. This subsequent analysis failed to find a significant difference be-
tween the different job types when the objective measure was quantity
(.413 for salespeople versus .343 for nonsales samples), suggesting that
the objective measure’s content accounted for the overall moderating
effect.
As a final step in our analysis, we examined the relationship strength between subjective and objective indicators of a single performance con-
struct. This was done to determine the differences that may exist be-
tween “what” is being measured and “how” (i.e., subjective or objective)
the measures are assessed. The results of this analysis were intriguing.
Of the samples included in this study, only three could be found in which
the performance measures tapped precisely the same performance di-
mension. Specifically, subjective and objective measures of production
quantity were correlated at .706 (.599 observed).

Discussion

The results indicate three major findings. First, the corrected pop-
ulation correlation of interest was only .389, suggesting that the mea-
sures are not interchangeable. Second, of the four moderators tested,
only one affected the strength of the relationship between objective and
subjective measures, and even in this case the variance explained failed
to suggest interchangeability. In fact, no subgroup confidence interval
even included .50, suggesting that in no case did the measures share even

one-quarter of their variance. On this basis, our results agree with Hen-
eman’s (1986), who concluded that “the data suggest that ratings and
results cannot be treated as substitutes for one another” (p. 818). Our
third finding, on the other hand, was that when the “what” was held constant (i.e., restricted to samples that tapped precisely the same construct) and different “hows” were compared (objective versus subjective indicators), the measures appeared more reasonably substitutable. It
should be noted that this analysis contained only three samples, and
more research is needed before firm conclusions can be reached.

Theoretical Implications

Although our findings offer mixed support for Heneman’s (1986) general conclusions, the tensions presented in the introduction remain.
When considering previous studies that have looked at objective and
subjective performance measures, it would appear that a number of
questions may be important to the objective/subjective distinction.
What is the nature of the distinction; is it meaningful? Several au-
thors have argued that the objective/subjective distinction is less clear
than it is often assumed. Nathan and Alexander (1988) reached this
conclusion from their findings that objective performance measures are
not more predictable than subjective measures. Their conclusion is sup-
ported by other theorists (e.g., Campbell, 1990; Muckler & Seven, 1992)
who argue that there are subjective elements in all aspects of perfor-
mance measurement. Campbell (1990), for example, notes that deter-
mining threshold levels for acceptable performance is a subjective mat-
ter (When is a defect a defect?), and Muckler and Seven (1992) point to
subjective aspects of even the “hardest” statistical analyses (e.g., signifi-
cance levels). Even the choice of which performance dimensions should
be measured is not apparent (Pfeffer, 1981). Moreover, the interpretation of even the “hardest” performance measures may itself be open to
negotiation.
Considering these arguments, it is judicious to question whether the
objective/subjective distinction is vacuous. We think not, but the discus-
sion certainly needs redirection and further refinement. For instance,
Campbell (1990) points out that most research has assumed that perfor-
mance is a single construct, and the primary task is to reduce measure-
ment error. Moreover, working from the same assumption, reducing er-
ror variance associated with the objective measure will also necessarily
increase the relationship strength because the only remaining variance
will be explained by “the” performance construct. Indeed, Heneman’s
proposed moderators, and three of the four moderators in this study,
were intended to account for error in ratings. Campbell counters that

there is no single performance factor, and that the objective/subjective debate has been misdirected.
We agree with Campbell’s arguments regarding the complexity of the
performance factor structure and the subjective aspects of all measures,
but we are not yet ready to completely jettison the objective/subjective
distinction. Objective and subjective indicators are implicitly distin-
guished regarding how (versus what or why) performance is measured
(Muckler & Seven, 1992). Thus, the actual measurement process is im-
plicated, not the steps that precede or follow the recording of the perfor-
mance events. Even in this case, the measure’s objectivity or subjectiv-
ity falls along a continuum. Extremes in our sample range from Sackett, Zedeck, and Fogli (1988), wherein performance was measured by a machine completely unmediated by humans, to Weitz (1978), whose measure consisted only of a single supervisor’s overall rating made over
a year. Although a certain “gray area” exists between these extremes,
reasonable distinctions can be made. These distinctions are important
because the measurement process has implications for the type and mag-
nitude of measurement error that is present.
Beyond the objective/subjective distinction, we were interested to de-
termine how Heneman’s (1986) and our present findings could be rec-
onciled with the findings of researchers who have found no significant
differences in relationships involving objective and subjective measures.
Although this issue has a simple analytical answer, it raises important
conceptual questions. Statistically, if two criterion variables are corre-
lated at the magnitude found in this study (i.e., .389), a very wide range
exists over which any third variable may be correlated with these two
criterion variables. For instance, consider a case where a criterion mea-
sure, commitment for instance, has a correlation of .40 with an objec-
tive performance measure. Using the r(obj, subj) = .389 from this study, r(commit, subj) could range anywhere from -.689 to .999 (McNemar, 1969). Clearly, the commitment-subjective performance correlation could eas-
ily equal the .40 value, although it is intriguing that the relationship
strengths examined have so often been equal. It is of conceptual inter-
est to determine whether the independent variable is explaining the same
portion of the variance in the two performance measures. If the variance explained is the same portion of the overall variance, then either perfor-
mance measure would be acceptable. However, if the independent vari-
able explained different portions of the performance constructs, then it
would be fruitful to collect both types of measures.
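The admissible range quoted above follows from the requirement that the three variables' correlation matrix be positive semidefinite. A quick sketch (the function and variable names are ours) reproduces the figures:

```python
import math

def third_variable_range(r_xy, r_xz):
    """Admissible range for r_yz given r_xy and r_xz, from the requirement
    that the 3 x 3 correlation matrix be positive semidefinite
    (cf. McNemar, 1969)."""
    center = r_xy * r_xz
    half = math.sqrt((1 - r_xy ** 2) * (1 - r_xz ** 2))
    return (center - half, center + half)

# Commitment-objective correlation of .40 and the meta-analytic
# objective-subjective correlation of .389:
lo, hi = third_variable_range(0.40, 0.389)
# lo rounds to -0.689; hi falls just under 1.0 (reported as .999)
```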
As a final major issue, we were interested in whether the perfor-
mance theory on which most foregoing studies are based is adequate,
and if not, what would constitute an adequate theory. Campbell (1990)
argued persuasively that the field has not yet developed an adequate

performance theory, which he considers ironic given the numerous well-developed theories for antecedents to performance (e.g., cognitive abil-
ity, physical ability, commitment, etc.). Campbell proposed a perfor-
mance theory to fill the void. In Campbell’s model, each of his eight
factors represents the “highest” possible factor (i.e., there are no more
general performance constructs), but each is hypothesized to consist of
multiple lower-order job-specific factors.
Exploring performance’s factor structure is not a trivial issue, and
the outcome of such an analysis has important implications for the objec-
tive/subjective distinction. This issue is particularly relevant because ob-
jective and subjective measures are not equally susceptible to construct
validity threats (i.e., deficiency and contamination). Although objective
and subjective indicators are open to each threat, they tend not to be
equally susceptible. In general, the fear is that subjective measures are
excessively prone to contamination (Campbell, 1990), especially super-
visor bias. Moreover, subjective indicators likely include sizable random
error, if for no other reason than the supervisor’s bounded cognitive abil-
ities (Feldman, 1981) and/or observational opportunities, so objective
measures are commonly proposed as a superior alternative.
Because objective measures are intended to directly record the ac-
tual job-related behavior or outcome, unmediated by the contamina-
tion in a supervisor’s evaluation, they are frequently assumed to be free
of systematic bias and random error. Sackett et al.’s (1988) machine-
recorded performance measures are excellent examples of this; they
provide little opportunity for systematic rater bias, and they accurately
record all clerical entries and errors. Objective measures, however, can
be criticized as being excessively narrow. Thus, objective measures often
tap a single lower-order construct, only partially constituting the higher
order performance construct of interest. It should be further noted that
no measure could “objectively” measure all relevant performance as-
pects. Some key measures are necessarily subjective.

Practical Implications

At the most basic level, subjective measures should not be used as
proxies for objective measures if objective performance is the behav-
ior of interest. For example, if sales is the desired outcome, organiza-
tions should not reward employees based on a supervisor’s overall per-
formance evaluation of that employee. Conversely, if broadly defined
performance is deemed more important, it is equally inappropriate to
reward employees solely on gross sales. Although this recommendation
appears intuitive, organizations frequently reward inappropriate behav-
iors (cf. Dechant & Veiga, 1995; Kerr, 1975).

Although universal substitutability is certainly not advised based on
the results of this study, local substitutability may be appropriate. At
least in the case of production quantity, the problems associated with
using proxy measures may not be severe. In this case, however, the ob-
jective measure represents a more valid indicator of this performance
dimension. Thus, researchers and practitioners should opt for objec-
tive indicators of production quantity if they are available, but subjec-
tive measures will likely provide a reasonable approximation. This may
hold true for other cases where it is the precise construct that is being
measured through multiple means, though this question needs more at-
tention before a firm prescription can be provided.
In addition, human resource practitioners need to be cognizant that
the measurement process has implications for the type and magnitude of
measurement error present. Further, contrary to popular belief,
objective measures are not without hazard. Measures should be selected
that balance types of error, relevance, and the practicality of obtaining
the measure. Thus, both objective measures and subjective measures
are problematic, but an integrative approach can be used to adequately
assess the performance dimensions of interest.

Limitations

This study should be viewed in the context of several limitations.
First, a strong comparison of objective and subjective measures requires
that they assess precisely the same performance construct. Only three
studies included in our sample met such a stringent criterion, although
the remaining studies’ objective and subjective indicators tapped per-
formance constructs that overlapped to varying degrees. Thus, it
is difficult to determine whether the low correlations between objective
and subjective measures were due to underlying differences in the con-
structs or to differences in systematic and random measurement error.
Second, reliabilities were not reported in all of the studies, requiring the
use of estimates that may have over- or understated the relationship.
Third, we were unable to examine some potentially meaningful moder-
ators (e.g., presence of job analysis, employee participation in the devel-
opment of the performance appraisal instrument) because the required
information was not available in the primary studies. Finally, due to in-
complete information, we were unable to correct for range restriction,
resulting in a systematic downward bias in our estimates. Thus, there were a
number of factors that may have artificially suppressed the relationships
of interest. Only additional primary research, however, will provide the
data needed to address these issues.
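For readers less familiar with these artifact corrections, the standard psychometric formulas (Hunter & Schmidt, 1990) can be sketched as follows. The numeric values below are purely illustrative and are not drawn from our database:

```python
from math import sqrt

def disattenuate(r_xy, r_xx, r_yy):
    """Correct an observed correlation for unreliability in both
    measures (classical attenuation formula)."""
    return r_xy / sqrt(r_xx * r_yy)

def correct_range_restriction(r, u):
    """Thorndike Case II correction, where u is the ratio of the
    unrestricted to the restricted standard deviation (u > 1)."""
    return (r * u) / sqrt(1 - r**2 + (r**2) * (u**2))

# Illustrative values only: an observed r of .27, with assumed
# reliabilities of .90 (objective) and .52 (subjective ratings).
print(round(disattenuate(0.27, 0.90, 0.52), 3))           # → 0.395

# Range restriction depresses observed correlations; with u = 1.5:
print(round(correct_range_restriction(0.30, 1.5), 3))     # → 0.427
```

Because restricted samples (e.g., incumbents who survived selection) show less variance than the applicant population, the uncorrected correlation understates the population relationship, which is why the inability to correct for range restriction biases our estimates downward.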

Future Research

Beyond the examination of potential moderators, the present topic
requires some fundamental empirical research. It is extremely surpris-
ing that so little is understood regarding the factor structure of perfor-
mance. Campbell’s (1990) model provides an encouraging start, but this
model has not been widely assessed. Without empirically examining the
dimensionality of job performance, progress toward the further under-
standing of performance as a criterion or as a predictor is unlikely. An
examination of the dimensionality of performance may take one of many
forms, but any analysis should incorporate the following observations.
Research regarding performance needs to recognize that perfor-
mance factors can be measured in multiple ways, and that each construct
may be tapped by objective and/or subjective measures. By using the
tools associated with structural equation modeling, the errors implicit in
different types of ratings can be better understood. Thus, because all
constructs are measured with both random and systematic error, struc-
tural equation modeling may allow a researcher to separate these error
types by modeling forms of systematic error (e.g., halo, leniency, etc.)
as distinct constructs. Further, the measure’s level must also be consid-
ered. Thus, both higher and lower order constructs may be measured,
and there may be one or multiple indicators of each construct. The need
for a model encompassing these points can be seen in the following
example.
In Sackett et al.’s (1988) supermarket sample, the job-specific task
proficiency construct consists of speed and accuracy as lower order con-
structs, each with a single machine-recorded objective indicator. In ad-
dition, supervisory ratings were obtained for knowledge and judgment,
register operations, and overall performance. The higher order job-
specific task proficiency construct also has one subjective indicator along
with the lower-order constructs. This provides a good example of the
complexity involved in tapping different constructs at different levels,
and may help to explain why the relationship between various perfor-
mance indicators appears weak.
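In illustrative notation (the symbols and parameters below are ours, not Sackett et al.’s), this structure might be specified as a hierarchical measurement model:

```latex
\begin{align*}
\mathrm{Speed}_{\mathrm{obj}}    &= \lambda_{1}\,\mathrm{Speed} + \varepsilon_{1}\\
\mathrm{Accuracy}_{\mathrm{obj}} &= \lambda_{2}\,\mathrm{Accuracy} + \varepsilon_{2}\\
\mathrm{Rating}_{\mathrm{subj}}  &= \lambda_{3}\,\mathrm{Proficiency} + \delta\,\mathrm{Method} + \varepsilon_{3}\\
\mathrm{Speed}                   &= \gamma_{1}\,\mathrm{Proficiency} + \zeta_{1}\\
\mathrm{Accuracy}                &= \gamma_{2}\,\mathrm{Proficiency} + \zeta_{2}
\end{align*}
```

where Proficiency is the higher-order job-specific task proficiency construct, the $\varepsilon$ and $\zeta$ terms are random errors, and the Method factor carries systematic rater error (e.g., halo, leniency) in the subjective indicator. In such a model, an objective and a subjective indicator of the same construct need not correlate highly unless the method variance and lower-order specificity are small.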

Conclusion

In short, we agree with Campbell (1990), Muckler and Seven (1992),
and others that the objective/subjective distinction has been given too
much attention at the expense of examining performance’s underlying
structure, but we do not agree that no distinction can be made between
objective and subjective performance measures, nor do we agree that the
issue is singularly unimportant. First, the indicator one uses to measure
a particular performance aspect suggests the types of error against which
the investigator should be prepared to defend. Ratings are subject to nu-
merous well-documented sources of systematic bias and random error.
Objective measures tend to be less prone to bias and random error, but
they are no panacea. Performance constructs that can be measured by
objective means tend to be narrowly focused and are typically represen-
tative of lower-order factors, and both theoreticians and practi-
tioners are cautioned to not rely solely on objective measures for their
supposedly superior measurement properties. Academics and practi-
tioners alike are well advised to look to Campbell’s model as a checklist
against which to compare planned measures with the theoretical perfor-
mance factor structure. Investigators need to remember that it is better
to imperfectly measure relevant dimensions than to perfectly measure
irrelevant ones.

REFERENCES
References marked with an asterisk indicate studies included in the meta-analysis.
*Alexander ER, Wilkins RD. (1982). Performance rating validity: The relationship of
objective and subjective measures of performance. Group and Organization Studies,
7, 485-496.
*Anderson HE, Roush L, McClary JE. (1973). Relationships among ratings, production,
efficiency, and the General Aptitude Test Battery scales in an industrial setting.
Journal of Applied Psychology, 58, 7742.
*Arneson S, Milikin-Davies M, Hogan J. (1993). Validation of personality and cognitive
measures for insurance claim examiners. Journal of Business and Psychology, 7,
459-473.
*Avila RA, Fern EF, Mann OK. (1988). Unraveling criteria for assessing the performance
of salespeople: A causal analysis. Journal of Personal Selling and Sales Management,
8, 45-54.
*Baehr ME, Williams GB. (1968). Prediction of sales success from factorially determined
dimensions of personal background data. Journal of Applied Psychology, 52, 90-103.
*Barrick MR, Mount MK, Strauss JP. (1993). Conscientiousness and performance of sales
representatives: Test of the mediating effects of goal setting. Journal of Applied
Psychology, 78, 715-722.
*Bass AR, Turner JN. (1973). Ethnic group differences in relationships among criteria of
job performance. Journal of Applied Psychology, 57, 101-109.
*Behrman DN, Perreault WD. (1982). Measuring the performance of industrial salesper-
sons. Journal of Business Research, 10, 355-370.
*Blau G. (1993). Testing the relationship of locus of control to different performance
dimensions. Journal of Occupational and Organizational Psychology, 66, 125-138.
Campbell JP. (1990). Modeling the performance prediction problem in industrial and
organizational psychology. In Dunnette MD, Hough LM (Eds.), Handbook of
industrial and organizational psychology: Vol. 1 (2nd ed., pp. 687-732). Palo Alto,
CA: Consulting Psychologists Press.
Chaney FB. (1966). A cross-cultural study of individual research performance. Journal of
Applied Psychology, 50, 206-210.
*Cotham JC. (1969). Using personal history information in retail salesman selection.
Journal of Retailing, 45, 31-38.
*Cron WL, Slocum JW. (1986). The influence of career stages on salespeople's job at-
titudes, work perceptions, and performance. Journal of Marketing Research, 23,
119-129.
*Day NE. (1993). Performance in salespeople: The impact of age. Journal of Managerial
Issues, 2, 254-273.
*Deadrick D, Madigan RM. (1990). Dynamic criteria revisited: A longitudinal study of
performance stability and predictive validity. PERSONNEL PSYCHOLOGY, 43, 717-744.
Dechant K, Veiga J. (1995). More on the folly. Academy of Management Executive, 9,
15-16.
*Duarte NT, Goodson JR, Klich NR. (1993). How do I like thee? Let me appraise the
ways. Journal of Organizational Behavior, 14, 239-249.
Farh J, Werbel JD, Bedeian AG. (1988). An empirical investigation of self-appraisal-based
performance evaluation. PERSONNEL PSYCHOLOGY, 41, 141-156.
Feldman JM. (1981). Beyond attribution theory: Cognitive processes in performance
appraisal. Journal of Applied Psychology, 66, 127-148.
*Field HS, Bayley GA, Bayley SM. (1977). Employment test validation for minority and
nonminority production workers. PERSONNEL PSYCHOLOGY, 30, 374.
Fried Y. (1991). Meta-analytic comparison of the job diagnostic survey and job charac-
teristics inventory as correlates of work satisfaction and performance. Journal of
Applied Psychology, 76, 690-697.
*Gaylord RH, Russell E, Johnson C, Severin D. (1951). The relation of ratings to produc-
tion records: An empirical study. PERSONNEL PSYCHOLOGY, 4, 363-371.
Hartigan JA, Wigdor AK. (1989). Fairness in employment testing: Validity generalization,
minority issues, and the General Aptitude Test Battery. Washington, DC: National
Academy Press.
Heneman RL. (1986). The relationship between supervisory ratings and results-oriented
measures of performance: A meta-analysis. PERSONNEL PSYCHOLOGY, 39, 811-826.
*Hoffman CC, Nathan BR, Holden LM. (1991). A comparison of validation criteria: Ob-
jective versus subjective performance measures and self- versus supervisor ratings.
PERSONNEL PSYCHOLOGY, 44, 601-618.
*Hogan EA. (1987). Effects of prior expectations on performance ratings: A longitudinal
study. Academy of Management Journal, 30, 354-368.
Hollenbeck JR, Williams CR. (1987). Goal importance, self-focus, and the goal-setting
process. Journal of Applied Psychology, 72, 204-211.
Huber HL, Neale MA, Northcraft GB. (1987). Judgment by heuristics: Effects of ratee
and rater characteristics on performance standards on performance-related judg-
ments. Organizational Behavior and Human Decision Processes, 40, 149-169.
Hunter JE, Schmidt FL. (1990). Methods of meta-analysis. Newbury Park, CA: Sage
Publications.
Hunter JE, Schmidt FL, Judiesch MK. (1990). Individual differences in output variability
as a function of job complexity. Journal of Applied Psychology, 75, 28-42.
*Ivancevich JM, McMahon JT. (1982). The effects of goal setting, external feedback, and
self-generated feedback on outcome variables: A field experiment. Academy of
Management Journal, 25, 359-372.
Kerr S. (1975). On the folly of rewarding A, while hoping for B. Academy of Management
Journal, 18, 769-783.
*Kingstrom PO, Mainstone LE. (1985). An investigation of the rater-ratee acquaintance
and rater bias. Academy of Management Journal, 28, 641-653.
*Kirchner WK. (1960). Predicting ratings of sales success with objective performance
information. Journal of Applied Psychology, 44, 398-403.
*Knauft EB. (1949). A selection battery for bake shop managers. Journal of Applied
Psychology, 35, 304-315.
Koslowsky M, Sagie A. (1993). On the efficacy of credibility intervals as indicators of
moderator effects in meta-analytic research. Journal of Organizational Behavior,
14, 695-699.
*Lawshe CH, McGinley AD. (1951). Job performance criteria studies: I. The job perfor-
mance of proofreaders. Journal of Applied Psychology, 35, 316-320.
*Lee C, Gillen DJ. (1989). Relationship of Type A behavior pattern, self-efficacy percep-
tions on sales performance. Journal of Organizational Behavior, 10, 75-81.
*Levy M, Sharma A. (1993). Relationships among measures of retail salesperson perfor-
mance. Journal of the Academy of Marketing Science, 21, 231-238.
*Lopez FM. (1966). Current problems in test performance of job applicants: I. PERSONNEL
PSYCHOLOGY, 19, 10-18.
*MacKenzie SB, Podsakoff PM, Fetter R. (1991). Organizational citizenship behavior and
objective productivity as determinants of managerial evaluations of salespersons'
performance. Organizational Behavior and Human Decision Processes, 50, 123-150.
*MacKenzie SB, Podsakoff PM, Fetter R. (1993). The impact of organizational citizenship
behavior on evaluations of salesperson performance. Journal of Marketing, 57, 70-
80.
Mathieu JE, Zajac DM. (1990). A review and meta-analysis of the antecedents, correlates,
and consequences of organizational commitment. Psychological Bulletin, 108, 171-
194.
McEvoy GM, Cascio WF. (1989). Cumulative evidence of the relationship between em-
ployee age and job performance. Journal of Applied Psychology, 74, 11-17.
McNemar Q. (1969). Psychological statistics (4th ed.). New York: Wiley.
*Meglino BM, Ravlin EC, Adkins CL. (1989). A work values approach to corporate cul-
ture: A field test of the value congruence process and its relationship to individual
outcomes. Journal of Applied Psychology, 74, 424-432.
Mitra A, Jenkins Jr GD, Gupta N. (1992). A meta-analytic review of the relationship
between absence and turnover. Journal of Applied Psychology, 77, 879-889.
*Motowidlo SJ. (1982). Relationship between self-rated performance and pay satisfaction
among sales representatives. Journal of Applied Psychology, 67, 209-213.
Muckler FA, Seven SA. (1992). Selecting performance measures: “Objective” versus
“subjective” measurement. Human Factors, 34, 441-455.
Murphy KR, Cleveland JN. (1991). Performance appraisal. Needham Heights, MA: Allyn
& Bacon.
Nathan BR, Alexander RA. (1988). A comparison of criteria for test validation: A meta-
analytic investigation. PERSONNEL PSYCHOLOGY, 41, 517-535.
Ones DS, Viswesvaran C, Schmidt FL. (1993). Comprehensive meta-analysis of integrity
test validation: Findings and implications for personnel selection and theories of
job performance. Journal of Applied Psychology, 78, 679-703.
Osburn HG, Callender J. (1992). A note on the sampling variance of the mean uncor-
rected correlation in meta-analysis and validity generalization. Journal of Applied
Psychology, 77, 115-122.
Pfeffer J. (1981). Power in organizations. Marshfield, MA: Pitman Publishing.
*Podsakoff PM, MacKenzie SB. (1994). Organizational citizenship behavior and sales unit
effectiveness. Journal of Marketing Research, 31, 351-363.
*Puffer SM. (1987). Prosocial behavior, noncompliant behavior, and work performance
among commission salespeople. Journal of Applied Psychology, 72, 615-621.
*Rush CH. (1953). A factorial study of sales criteria. PERSONNEL PSYCHOLOGY, 6, 9-24.
*Sackett PR, Zedeck S, Fogli L. (1988). Relations between measures of typical and maxi-
mum job performance. Journal of Applied Psychology, 73, 482-486.
Schmidt FL, Hunter JE. (1992). Development of a causal model of processes determining
job performance. Current Directions in Psychological Science, 1, 89-92.
*Seashore SE, Indik BP, Georgopoulos BS. (1960). Relations among criteria of job per-
formance. Journal of Applied Psychology, 44, 195-202.
Stauffer JM. (1994). MetaDos: Psychometric meta-analysis program [Computer pro-
gram]. Terre Haute, IN: Indiana State University.
Tett RP, Jackson DN, Rothstein M. (1991). Personality measures as predictors of job
performance: A meta-analytic review. PERSONNEL PSYCHOLOGY, 44, 703-742.
*Validity Information Exchange, No. 8-35. (1958). PERSONNEL PSYCHOLOGY, 11, 501-504.
*Validity Information Exchange, No. 11-10. (1958). PERSONNEL PSYCHOLOGY, 11, 121-
122.
*Validity Information Exchange, No. 11-27. (1958). PERSONNEL PSYCHOLOGY, 11, 583-
584.
*Validity Information Exchange, No. 11-30. (1958). PERSONNEL PSYCHOLOGY, 11, 587-
589.
Viswesvaran C, Schmidt FL, Ones DS. (1994, April). Examining the validity of supervisory
ratings of job performance using linear composites. Paper presented at the Ninth
Annual Conference of the Society for Industrial and Organizational Psychology,
Nashville, TN.
*Weitz BA. (1978). Relationship between salesperson performance and understanding of
customer decision making. Journal of Marketing Research, 15, 501-516.
Whitener EM. (1990). Confusion of confidence intervals and credibility intervals in meta-
analysis. Journal of Applied Psychology, 75, 259-264.
Williams CR, Livingstone LP. (1994). Another look at the relationship between perfor-
mance and voluntary turnover. Academy of Management Journal, 37, 269-298.
