
Journal of Applied Psychology, 2011, Vol. 96, No. 4, 762–773
© 2010 American Psychological Association. 0021-9010/10/$12.00 DOI: 10.1037/a0021832

Validity of Observer Ratings of the Five-Factor Model of Personality Traits: A Meta-Analysis

In-Sue Oh, Virginia Commonwealth University
Gang Wang and Michael K. Mount, University of Iowa

Conclusions reached in previous research about the magnitude and nature of personality–performance linkages have been based almost exclusively on self-report measures of personality. The purpose of this study is to address this void in the literature by conducting a meta-analysis of the relationship between observer ratings of the five-factor model (FFM) personality traits and overall job performance. Our results show that the operational validities of FFM traits based on observer ratings are higher than those based on self-report ratings. In addition, the results show that when based on observer ratings, all FFM traits are significant predictors of overall performance. Further, observer ratings of FFM traits show meaningful incremental validity over self-reports of corresponding FFM traits in predicting overall performance, but the reverse is not true. We conclude that the validity of FFM traits in predicting overall performance is higher than previously believed, and our results underscore the importance of disentangling the validity of personality traits from the method of measurement of the traits.

Keywords: personality, five-factor model, observer ratings, job performance, selection

This article was published Online First December 13, 2010.
In-Sue Oh, Department of Management, School of Business, Virginia Commonwealth University; Gang Wang and Michael K. Mount, Department of Management and Organizations, Tippie College of Business, University of Iowa.
An earlier version of this article was presented at the 2010 annual meeting of the Academy of Management, Montreal, Quebec, Canada, August 6–10. We thank Frank Schmidt, Murray Barrick, and Kibeom Lee for their useful comments on an earlier version of this article. We also thank Jared LeDoux and Donald H. Kluemper for sharing their data with us.
Correspondence concerning this article should be addressed to In-Sue Oh, Department of Management, School of Business, Virginia Commonwealth University, 301 West Main Street, Box 844000, Richmond, VA 23284, or to Michael K. Mount, Department of Management and Organizations, Tippie College of Business, S358 John Pappajohn Business Building, University of Iowa, Iowa City, IA 52242-1994. E-mail: isoh@vcu.edu or michael-mount@uiowa.edu

Numerous meta-analytic studies using the five-factor model (FFM) personality traits as an organizing framework have shown that personality traits are valid predictors of job performance for numerous criteria. In particular, Conscientiousness, and to a lesser extent Emotional Stability, are the most consistent predictors across jobs and criteria (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997). Other FFM traits (e.g., Agreeableness) are also valid predictors of performance in certain jobs or for certain performance criteria (e.g., organizational citizenship behavior), but their validities do not generalize across jobs and performance criteria (see Ilies, Fulmer, Spitzmuller, & Johnson, 2009).

Despite these positive findings, the magnitude of the validity obtained in these studies is only moderate, with estimated operational validity in the .20s range (Barrick, Mount, & Judge, 2001; Schmidt, Shaffer, & Oh, 2008). In light of these findings, a set of former journal editors of the Journal of Applied Psychology and Personnel Psychology reviewed the literature pertaining to the relationship between personality traits and job performance (Morgeson et al., 2007), and they concluded that the validity of personality measures in predicting job performance is so low that the use of "self-report" personality tests in selection contexts should be reconsidered. Similarly, Ployhart (2006, pp. 878–879) stated, "This concern is not restricted to academics; applied experience suggests many practitioners and human resource (HR) managers remain skeptical of personality testing because these validities appear so small." It should be noted, however, that other scholars have refuted these claims and have presented a more favorable view of the validity of self-report personality measures (e.g., Oh & Berry, 2009; Ones, Dilchert, Viswesvaran, & Judge, 2007; Tett & Christiansen, 2007).

Thus, considering all arguments pro and con, it is fair to say that there is considerable debate about the amount of progress that has been made in understanding relationships between personality traits and job performance. However, one area where there is no disagreement is that virtually all of our knowledge about the usefulness of personality traits in selection contexts (e.g., which personality traits predict which criteria; how well they predict; and whether the validities generalize across jobs, organizations, and occupations) is based exclusively on a single method of measurement: self-reports of personality traits (see Mount, Barrick, & Strauss, 1994, for an exception). Accordingly, in Morgeson et al.'s (2007) article, several of the journal editors suggested that future research should focus on identifying alternatives to self-report personality measures. For example, K. Murphy (p. 719) stated,

I have more faith in the constructs than in the measures. And I think the problem is that we have been relying on self-report measures for the last 100 years. We should look at other ways of assessing personality. If you want to know about someone's personality just ask his or her co-workers.


Similarly, M. Campion (p. 719) stated, "I think one theme I hear is let's think about different ways of measuring these constructs. Let's not abandon the construct, but instead abandon self-report measurement and think about new and innovative ways of measuring the constructs." In their Annual Review of Psychology chapter, Sackett and Lievens (2008, p. 424) also noted that one strategy for improving the measurement of personality is to measure the same construct with different methods than self-reports. Examples of this strategy include measuring the FFM traits using interviews rather than self-reports (Barrick, Patton, & Haugland, 2000; Van Iddekinge, Raymark, & Roth, 2005) and developing implicit measures of personality using situational judgment tests (Motowidlo, Hooper, & Jackson, 2006) or conditional reasoning measures (James et al., 2005; also see Berry, Sackett, & Tobares, 2010).

Clearly, in any field of science, it is important to demonstrate the robustness of findings by showing that results converge across multiple methods of measurement of the same traits (Campbell & Fiske, 1959). Given the important role that personality traits play in predicting job performance at work (e.g., Barrick et al., 2001) and given recent, recurring pessimistic views of the use of self-reports of personality in high-stakes selection contexts (e.g., Morgeson et al., 2007), it is imperative that researchers examine alternatives to self-report measures of personality in predicting job performance. Accordingly, the major purpose of this study is to conduct a meta-analysis to estimate the validity of observer ratings of FFM personality traits in predicting job performance. We compare the magnitude of the validity estimates of observer ratings of FFM traits with previously derived validity estimates based on self-report measures of FFM traits. We also examine the incremental validity of observer ratings of personality over self-reports of personality using currently available meta-analytic evidence. We believe the findings have the potential to make important practical and theoretical contributions to the field. For example, if the results show that observer ratings of personality have substantially higher validities than validities based on self-reports, it affirms the perspective of both Murphy and Campion in Morgeson et al. (2007) that we should not abandon FFM constructs but rather should explore alternative ways of measuring the constructs. Theoretically, such findings would mean that personality traits should play even more prominent roles in models that seek to explain job performance.

Validity of Self-Reports Versus Observer Ratings of Personality Traits

Historically, the estimated strength of relationships between FFM traits and outcome variables has been based on the ability of self-reports of personality to predict the outcomes. In the field of industrial/organizational (I/O) psychology, use of self-reports of job-relevant personality traits as a selection procedure is often the norm given their positive and generalizable validity and their practically meaningful incremental validity over cognitive ability, the single best selection procedure (Schmidt & Hunter, 1998). Why have self-reports been the most frequently used method in personality measurement? First, it may reflect the convenience of obtaining such information. It is easier for researchers and practitioners to obtain ratings from a single individual who is the target research participant or the focal job applicant or incumbent than it is to collect the ratings from other individuals. Second, it may reflect the belief that personality judgments made by the target person are the most accurate representations of his or her actual personality. Third, it may reflect the fact that some personality traits (e.g., affective personality traits including Emotional Stability) are inherently internal to the target person and thus are less able to be observed by others or perhaps are unknowable by others (e.g., Spain, Eaton, & Funder, 2000). Lastly, it may reflect research findings showing that self-reports of personality are valid in predicting numerous important life outcomes, including job performance, both cross-sectionally and longitudinally (e.g., Judge, Higgins, Thoresen, & Barrick, 1999; Roberts, Kuncel, Shiner, Caspi, & Goldberg, 2007).

Nonetheless, as discussed earlier, the validity of self-reports of personality is not as high as many researchers expect it to be. As Sackett and Lievens (2008, p. 424) stated, "there seems to be a general sense that personality 'should' fare better than it does." There are several possible explanations for the relatively low validity of self-reports of personality traits. First, self-reports of personality are subject to response distortion (faking). Generally speaking, it is clear that individuals can fake or distort their responses to personality measures when instructed to do so in both laboratory and field settings (see Hooper & Sackett, 2008, for meta-analytic evidence). One form of response distortion is impression management, whereby in high-stakes testing individuals intentionally present themselves in a favorable light to obtain valued rewards (e.g., employment). A second form is self-deception, whereby individuals unintentionally distort their ratings of socially desirable traits in a positive direction (e.g., Paulhus, 1991; Paulhus & Reid, 1984). Both of these response distortion tendencies represent systematic, but irrelevant, variance that may lower the validity of the ratings. There is considerable debate in the literature about the impact of response distortion on the validity of self-reports of personality. Existing research is inconclusive about whether these response distortion tendencies significantly alter the validity of self-reports of personality. For example, Ellingson, Smith, and Sackett (2001) and Ones, Viswesvaran, and Reiss (1996) found that social desirability does not affect the construct and criterion-related validity of self-report measures of personality, respectively. However, in actual selection settings, equal validity does not necessarily translate into the same applicants being selected (Kehoe, 2002; Morgeson et al., 2007). To the degree that applicants do not distort their ratings equally, their rank order on test scores may change, which in turn may alter who is selected if a top-down selection strategy is used. An anonymous reviewer suggested to us that response distortion may be less severe for applicants with high levels of positive FFM traits (e.g., Conscientiousness) than for applicants with low levels of the same traits, which further indicates that the degree of response distortion may not be equal among applicants; some may fake more than others. Overall, the above discussion clearly illustrates that personality measurement (particularly in selection contexts) should not be limited to self-reports and that we need to look for alternative methods of measurement to move this field forward (Morgeson et al., 2007).
For example, there is evidence (e.g., Kolar, Funder, & Colvin, 1996; see Hofstee, 1994) that personality assessments made by well-acquainted observers can provide equally accurate information about the target person. According to socioanalytic theory (R. T. Hogan, 1991), there is a clear distinction between what self-reports and observer ratings of personality measure. Self-reports assess the internal dynamics (e.g., identity) that shape the behavior of an individual, whereas observer ratings capture the reputation of an individual. Because reputations typically are based on the individual's past performance, and because past performance is a good predictor of future performance, reputations are likely to be more predictive of behavior in work settings than the internal dynamics of one's personality. That is, observer ratings of personality are not so much a foresight as a hindsight, in which personality is inferred from the person's behavior rather than vice versa. Funder and Dobroth (1987, p. 409; also see Funder & Colvin, 1988) also argued, "evaluations of the people in our social environment are central to our decisions about who to befriend and avoid, trust and distrust, hire and fire, and so on." Hence, the goal of observer ratings of personality is behavioral prediction—from an evolutionary psychology perspective, prediction accuracy is critical to survival (R. T. Hogan, 2007). Furthermore, an individual's identity (self-reports of personality) and reputation (observer ratings of personality) are not necessarily the same, although they are related to each other to a meaningful degree. Connolly, Kavanaugh, and Viswesvaran (2007) presented meta-analytic evidence showing that the true-score correlations between self-reports and observer ratings of the FFM personality traits are moderate to strong, with values ranging from a low of .46 for Agreeableness to a high of .62 for Extraversion. This suggests that ratings/reports of the same trait obtained from different sources are related but also capture unique variance. Thus, one of the main questions we address in the present study is whether observer ratings of personality account for incremental validity in job performance over that accounted for by self-reports of personality. In line with this reasoning, Mount et al. (1994) found that observer ratings of sales representatives' personality showed substantial incremental validity above and beyond self-reports of personality in predicting overall job performance. When examined in the opposite direction, however, self-reports of personality did not show significant incremental validity over observer ratings of personality. Despite these positive findings, it should be noted that their results were limited to a single job—sales; thus, we do not yet know whether the higher validity of observer ratings compared with self-reports of personality is generalizable to other jobs.

Although Mount et al.'s (1994) results are encouraging, there are several possible reasons that observer ratings actually may have lower validity than self-reports. One is that observers have limited opportunities to observe a target person's behavior, both in terms of the diversity and relevance of situations and in terms of the duration and frequency of time spent observing the target. As mentioned earlier (Connolly et al., 2007), some personality traits are private to the target person and thus not easily observable to observers. In other words, target persons are in a better position to rate/report their own personality traits because they observe themselves all the time and across numerous different situations. Second, it is possible that observer ratings, like self-reports, contain response distortion (e.g., friendship bias), albeit different types than those associated with self-reports (e.g., faking). On one hand, it is likely that observers (friends, parents, referees) intentionally minimize targets' socially undesirable traits (e.g., being sloppy, rude, nervous, untrustworthy, or unimaginative) or exaggerate targets' socially desirable traits (e.g., being hardworking, assertive, cooperative, unenvious, or creative). On the other hand, the fundamental attribution error posits that observers are inclined to unconsciously make dispositional attributions for targets' behavior (Martinko & Gardner, 1987; L. Ross, 1977). Thus, observers may unintentionally amplify the role that targets' personality traits play but overlook the influence of situational factors. As a result, it is possible that observers unintentionally give high ratings on desirable traits to targets who achieve good results (e.g., job performance) or low ratings on desirable traits to those who achieve poor results, even though the good or bad results might be mainly due to situational factors that are out of targets' control. To the extent that these unintentional distortions exist, observer ratings of personality may have lower validity than self-reports of personality. Third, when assessing others' personality, observers may also be subject to rater errors, such as the leniency, central tendency, and halo errors found in performance appraisal (Ployhart, Schneider, & Schmitt, 2006; Scullen, Mount, & Goff, 2000). To the degree that observers make these rater errors, observer ratings of personality may have lower validities than self-reports. Nonetheless, given the lack of meta-analytic evidence regarding the validity of observer ratings of personality, it remains an empirical question whether the validity of observer ratings is higher or lower than that of self-reports.

In summary, there are compelling reasons to investigate the validity of observer ratings of personality traits versus self-reports, as well as their incremental validity over self-reports, in the prediction of job performance. First, nearly all of the previous research about the validity of personality traits is based on self-reports, and this has limited the robustness of conclusions about the usefulness of personality for selection purposes. Second, recent research (Connolly et al., 2007) shows that observer ratings demonstrate adequate psychometric properties—such as internal consistency, test–retest reliability, and convergent validity—which establishes the necessary conditions for further exploration of their validity. Third, by definition, observer ratings do not contain the self-distortion bias (e.g., faking) that constrains the validity, fairness, applicant reactions, and practicality (manager reactions) of self-reports, although observer ratings may be subject to other equally or more serious types of bias (Ployhart, 2006). Fourth, although there is very limited evidence, empirical studies (e.g., Mount et al., 1994) and socioanalytic theory (R. T. Hogan, 1991, 2007) suggest that compared with self-reports, observer ratings (one's reputation) may have higher validities and can account for unique variance in performance measures beyond that explained by self-reports (one's identity). Finally, on the basis of their reviews of the literature, members of a distinguished panel of former journal editors (Morgeson et al., 2007) and authors of recent reviews of selection research (Ployhart, 2006; Sackett & Lievens, 2008; Schmidt et al., 2008) have called for more research that examines alternatives to self-reports of personality.

Therefore, the main purpose of this study is to conduct a meta-analysis of the validity of observer ratings of personality traits for overall work performance. We examine, via meta-analysis and meta-analytic regression, two important research questions: (a) Are observer ratings of FFM personality traits valid predictors of overall job performance? and (b) Do observer ratings of FFM personality traits account for incremental validity over self-reports of FFM traits?
Method

Literature Search

We conducted an extensive electronic and manual search for both published and unpublished studies to minimize publication bias (Cooper, 2003). For the electronic search, we searched electronic databases such as EBSCO, PsycINFO, Web of Science, and Dissertation Abstracts. Given the large number of studies on the relationship between personality and performance, we developed two decision rules to search relevant primary studies. First, we limited our initial search to studies whose abstract included at least one keyword from each of the following two categories: (a) keywords representing personality traits (e.g., personality, Big Five, five-factor model [FFM]) and (b) keywords representing performance criteria (e.g., performance, organizational citizenship behavior [OCB], helping, deviance, counterproductive work behavior). Second, we searched primary studies citing Mount et al. (1994), the pioneering study examining the validity of observer ratings of personality in predicting job performance. For the manual search, we consulted all issues of major I/O psychology journals (e.g., Journal of Applied Psychology, Personnel Psychology) published as of May 2010 for in-press articles that may not yet have been included in the electronic databases. In addition, we searched abstracts of recent major scholarly meetings (e.g., the Society for Industrial and Organizational Psychology and the Academy of Management conference programs) and contacted several researchers active in personality research. Finally, we searched for possible unpublished and in-press studies by sending e-mail requests to the Academy of Management listservs and by posting a request on the call-for-papers section of the Society for Industrial and Organizational Psychology's official website.

Inclusion Criteria

To be included in the meta-analysis, primary studies had to meet the following criteria. First, at least one of the FFM personality traits had to be measured. We mainly included studies in which measures were explicitly designed to measure one of the FFM personality traits (factors), as was done in Hurtz and Donovan (2000). That is, studies in which non-FFM personality traits (e.g., Core Self-Evaluations; Scott & Judge, 2009) were measured were excluded. Second, participants' FFM personality traits had to be assessed by single or multiple observers. Third, participants' work performance (e.g., overall, task, contextual, counterproductive performance; Rotundo & Sackett, 2002) had to be examined at the individual level. We excluded primary studies in which interview performance was the criterion (e.g., Van Iddekinge, McFarland, & Raymark, 2007). Fourth, participants' FFM personality and performance had to be rated by different people/sources to avoid common source bias (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). At the suggestion of an anonymous reviewer, however, we relaxed this inclusion criterion to increase the number of relevant studies. We included four primary studies (Peterson, Smith, Martorana, & Owens, 2003; S. M. Ross & Offermann, 1997; Shaffer, Harrison, Gregersen, Black, & Ferzandi, 2006; Zimmerman, Triana, & Barrick, 2010) in which there is or might be slight overlap between personality and performance rating sources. For example, in Zimmerman et al. (2010), Master of Business Administration (MBA) applicants' personality was measured by three referees, one of whom may have been the current supervisor who provided performance ratings (R. Zimmerman, personal communication, May 12, 2010). In addition, we included two primary studies (Aronson, Reilly, & Lynn, 2006; Brennan & Skarlicki, 2004) in which personality and performance were assessed by the same observer. Contrary to our expectations, we found that the extrapolated results were virtually the same with and without the six primary studies (no more than ±.02). Thus, in Table 1, we report the results that include all six studies. (Results without these six studies are available upon request from the first author.) Fifth, we focused on samples of working adults in organizational settings to generalize our findings to a typical organizational setting and to be comparable with similar meta-analyses examining the validity of self-reports of personality for job performance (e.g., Hurtz & Donovan, 2000).

As a result, a total of 16 primary studies with 18 independent samples met the inclusion criteria and were included in the meta-analysis. Among the 16 primary studies, 12 were published articles, and four were unpublished (a dissertation, a data set, and two working papers). All of the samples are incumbent samples. To maximize coding accuracy, the first two authors independently coded all studies and compared their coding results. The initial agreement rate was 93%. All the remaining discrepancies were resolved through a series of discussions.

Meta-Analytic Procedures

We estimated the operational (true) validity (corrected for unreliability in the criterion measure and range restriction on the predictor measure) of observer ratings of personality using Hunter and Schmidt's (2004) validity generalization methods, which have been used in all other meta-analyses on the relationship between self-reports of personality and job performance (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000). Specifically, we used individual correction methods to synthesize validity coefficients across studies using the Hunter–Schmidt Meta-Analysis Package Program 1.1 (VG6 Module; Schmidt & Le, 2004).

To be comparable with the procedures used in previous meta-analyses (e.g., Hurtz & Donovan, 2000), we corrected each validity (correlation) coefficient for measurement error in the criterion measure using the meta-analytically derived mean interrater reliability estimate, given that no primary study reported interrater reliability estimates (Viswesvaran, Ones, & Schmidt, 1996). If there were multiple raters, we used the meta-analytic single-rater reliability (.52) and the Spearman–Brown formula to derive a reliability estimate that reflected the number of individuals who provided performance ratings. The mean interrater reliability estimate across the studies included in this meta-analysis was .67 (SD = .16). It should be noted that Hurtz and Donovan (2000) used a somewhat lower interrater reliability estimate (.53) to correct for measurement error in the criterion measures. With regard to which type of reliability to use (interrater or coefficient alpha) in correcting for unreliability of job performance ratings to estimate operational validity, there is an ongoing debate in the literature (Murphy & DeShon, 2000; Schmidt, Viswesvaran, & Ones, 2000). For informational purposes, we report results corrected for measurement error using both the meta-analytic mean interrater reliability estimate adjusted for the number of raters and the alpha coefficient estimates available from the primary studies included in this meta-analysis (M = .89, SD = .08).

Because we were interested in estimating operational validity, we did not correct for measurement error in the predictor (FFM personality) measures; the mean reliabilities (coefficients alpha) computed across primary studies included in this meta-analysis range from .77 for Openness to .86 for Extraversion (on average .82 across all FFM traits), which are generally comparable with Viswesvaran and Ones's (2000) FFM reliability generalization results. We further corrected for direct range restriction on the predictor measure using the meta-analytic information (ux [= restricted incumbent SD / unrestricted applicant SD] = .92 for Conscientiousness and Extraversion, and ux = .91 for the other FFM traits) reported in Schmidt et al. (2008). It should be noted that Hurtz and Donovan (2000) used essentially the same mean ux estimate (.92) to correct for range restriction on their predictor measures.

It is noted that by using the mean artifact estimates rather than their distributions, the variance due to artifacts was somewhat underestimated, and consequently the true standard deviation of operational validities was somewhat overestimated (McDaniel, Whetzel, Schmidt, & Maurer, 1994). This practice, nonetheless, made the 90% credibility value (a cutoff value used to determine validity generalization) more conservative. To ensure that the validity coefficients included in our meta-analysis were statistically independent, we computed a composite correlation when original studies reported multiple validity estimates within a single sample (e.g., correlations between personality and several facets of performance); otherwise, the average of the validity estimates was used. Finally, we initially set the cutoff value for the minimum number of primary studies to be included in each meta-analysis to three, based on Chambless and Hollon (1998), who argued that good empirical evidence exists when an important relationship is found in a minimum of three studies from at least two different researchers.
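To make the correction sequence concrete, the sketch below illustrates the two artifact corrections just described. It is only an illustration in Python: the function names, default values, and example input are ours, and the Hunter–Schmidt program applies these corrections with additional refinements that are omitted here.

```python
import math

def spearman_brown(r_single, k):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return (k * r_single) / (1 + (k - 1) * r_single)

def operational_validity(r_obs, n_perf_raters, r_single_rater=0.52, ux=0.92):
    """Illustrative individual corrections (not the full Hunter-Schmidt routine):
    1) disattenuate for criterion unreliability, using interrater reliability of
       performance ratings adjusted for the number of raters;
    2) correct for direct range restriction on the predictor (ux = restricted SD
       divided by unrestricted SD)."""
    r_yy = spearman_brown(r_single_rater, n_perf_raters)
    r_c = r_obs / math.sqrt(r_yy)              # criterion-unreliability correction
    U = 1.0 / ux                               # unrestricted-to-restricted SD ratio
    return (r_c * U) / math.sqrt(1 + r_c**2 * (U**2 - 1))  # range restriction

# Example: an observed correlation of .28 with two performance raters
print(round(operational_validity(0.28, 2), 2))  # roughly .36
```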

Table 1
Operational Validity of Observer Ratings of Five-Factor Model (FFM) Personality Traits for Overall Job Performance

FFM trait               | k  | N     | N raters | r̄   | ρ̂   | SD ρ | 80% CrI   | SE ρ̂ | 95% CI   | % Var
Conscientiousness (a)   | 17 | 2,171 | 1.72     | .28 | .37 | .17  | .16, .59  | .05  | .29, .46 | 31
Conscientiousness (b)   | 17 | 2,171 | 1.72     | .28 | .33 | .15  | .14, .53  | .04  | .25, .42 | 30
Emotional Stability (a) | 16 | 1,872 | 1.70     | .15 | .21 | .00  | .21, .21  | .02  | .16, .25 | 140
Emotional Stability (b) | 16 | 1,872 | 1.70     | .15 | .18 | .00  | .18, .18  | .02  | .14, .23 | 136
Agreeableness (a)       | 16 | 2,074 | 1.83     | .21 | .31 | .14  | .12, .49  | .04  | .22, .39 | 39
Agreeableness (b)       | 16 | 2,074 | 1.83     | .21 | .26 | .15  | .06, .46  | .04  | .17, .35 | 31
Extraversion (a)        | 14 | 1,735 | 1.73     | .19 | .27 | .15  | .08, .46  | .05  | .18, .36 | 40
Extraversion (b)        | 14 | 1,735 | 1.73     | .19 | .24 | .13  | .07, .40  | .04  | .16, .32 | 40
Openness/Intellect (a)  | 14 | 1,735 | 1.73     | .18 | .26 | .19  | .02, .50  | .06  | .15, .37 | 29
Openness/Intellect (b)  | 14 | 1,735 | 1.73     | .18 | .23 | .17  | .01, .44  | .05  | .13, .33 | 28

Note. k = number of independent validity coefficients (validities for counterproductive work behaviors are reverse coded when aggregating across performance criteria); N = total sample size; N raters = average number of observers (raters) for personality; r̄ = sample-size-weighted mean observed validity; ρ̂ = estimated mean operational (true) validity, corrected for unreliability in the criterion measure and range restriction on the personality measures; SD ρ = estimated standard deviation of the operational validities; 80% CrI = lower and upper bounds of the 80% credibility interval for the validity distribution; SE ρ̂ = estimated standard error of the mean operational validity; 95% CI = lower and upper bounds of the 95% confidence interval for the estimated mean operational validity; % Var = percentage of observed variance accounted for by statistical artifacts.
(a) Corrected for measurement error in the criterion using interrater reliability after adjusting for the number of raters (M = .67, SD = .16) and for range restriction on the predictor (.92 for Conscientiousness and Extraversion and .91 for the other FFM traits). (b) Corrected for measurement error in the criterion using internal consistency (coefficient alpha; M = .89, SD = .08) and for range restriction on the predictor (.92 for Conscientiousness and Extraversion and .91 for the other FFM traits).

Results

Overall Validity Coefficients

Table 1 presents the results of the omnibus meta-analysis aggregated across performance criteria (e.g., overall, task performance, contextual performance, and counterproductive work behavior). In this analysis, validity (correlation) coefficients for counterproductive work behavior are reverse coded. Average numbers of observers (raters) for the FFM personality traits are 1.72 for Conscientiousness, 1.70 for Emotional Stability, 1.83 for Agreeableness, 1.73 for Extraversion, and 1.73 for Openness/Intellect. The mean operational validity estimates (ρ̂) range from .21 (Emotional Stability) to .37 (Conscientiousness). Consistent with previous meta-analyses (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997) and second-order meta-analyses (Barrick et al., 2001; Schmidt et al., 2008), Conscientiousness (ρ̂ = .37, k = 17, N = 2,171) has the highest mean operational validity among the FFM traits, which represents a moderate to strong level of validity. The estimated operational validity for Agreeableness, Openness/Intellect, Extraversion, and Emotional Stability is greater than .20 (ks = 14–16, Ns = 1,735–2,074). The 80% credibility intervals for all FFM traits do not include zero, suggesting that all FFM traits' positive validities generalize in at least 90% of cases. Further, the 95% confidence intervals (CIs) for all FFM traits do not include zero, which indicates that all mean operational validity estimates are significantly greater than zero (nonzero) at the .05 level (Whitener, 1990).
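For readers unfamiliar with these intervals, the credibility and confidence bounds in Table 1 follow the usual normal-approximation form shown in this sketch of ours; its output differs slightly from the tabled values because the table was computed from unrounded standard deviations and standard errors.

```python
def intervals(rho_hat, sd_rho, se_rho):
    """80% credibility interval for the distribution of operational validities
    and 95% confidence interval for the mean operational validity."""
    cri_80 = (rho_hat - 1.28 * sd_rho, rho_hat + 1.28 * sd_rho)
    ci_95 = (rho_hat - 1.96 * se_rho, rho_hat + 1.96 * se_rho)
    return cri_80, ci_95

# Conscientiousness, interrater-reliability-corrected row of Table 1
print(intervals(0.37, 0.17, 0.05))  # approximately (.15, .59) and (.27, .47)
```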
Validity Coefficients as a Function of the Number of Observers (Raters)

The validity estimates in Table 1 were based on more than one observer (rater), and the average number of raters varied somewhat across the FFM traits. (On average, across the FFM traits there were 1.74 observers [raters].) It is well established that validity estimates may be influenced (increased) by the use of multiple observers (raters) because averaging across multiple raters may increase both reliability and validity, as the reliability index sets the upper limit of validity (Oh & Berry, 2009; Schmidt & Zimmerman, 2004). Thus, to make the validity estimates based on observer ratings of personality comparable with those based on self-reports of personality, we applied the procedure used in Schmidt and Zimmerman (2004), based on the Spearman–Brown formula, to extrapolate mean observed and operational validity estimates based on one, two, and three observer ratings of personality. Accordingly, Table 2 presents the mean observed and operational validity estimates of the FFM traits as a function of the number of observers (raters) and compares the results with operational validity estimates of self-reports for overall job performance. Using the average number of observers for each FFM personality trait and/or meta-analytic evidence on interrater reliability for a single observer (rater) reported in Connelly (2008), we computed the mean observed and operational validities as a function of the number of observers (reported in Columns 1 and 2 of Table 2). To make the comparison, we carefully chose meta-analytic results for self-reports of personality from Hurtz and Donovan (2000, Table 3, p. 875).

As shown in Table 2, the mean observed and operational validity estimates of the FFM personality traits measured by a "single" observer are higher than the corresponding results based on self-reports for overall job performance, on average by .10 (ranging from .04 [Emotional Stability] to .13 [Openness/Intellect]) and .12 (ranging from .04 [Emotional Stability] to .17 [Openness/Intellect]), respectively. Further, the mean observed and operational validities of the FFM personality traits based on three observers are even higher than the corresponding results based on self-reports for overall job performance, on average by .14 and .19, respectively. Standard deviations of the mean operational validity estimates and standard errors for the mean operational validity estimates were also estimated using the procedure described in Hunter and Schmidt (2004, pp. 158–159, 205).
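As a rough illustration of this extrapolation, the sketch below applies the composite-validity relation implied by the Spearman–Brown formula. The sketch is ours, and the single-rater interrater reliability it uses (.40) is a placeholder chosen for illustration rather than the Connelly (2008) estimate.

```python
import math

def composite_validity(r_one, k, r_oo):
    """Validity of the average of k observer ratings, given the validity of one
    observer rating (r_one) and the single-rater interrater reliability of the
    observer personality ratings (r_oo)."""
    return r_one * math.sqrt(k) / math.sqrt(1 + (k - 1) * r_oo)

def single_observer_validity(r_k, k, r_oo):
    """Invert the same relation to recover the one-observer validity from an
    estimate based on an average of k observers (k may be fractional)."""
    return r_k * math.sqrt(1 + (k - 1) * r_oo) / math.sqrt(k)

# Illustration for Conscientiousness: scale the 1.72-rater estimate (.37) down
# to one observer, then up to three observers (r_oo = .40 is a placeholder).
r1 = single_observer_validity(0.37, 1.72, 0.40)
print(round(r1, 2), round(composite_validity(r1, 3, 0.40), 2))  # ~.32 and ~.41
```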

Table 2
Operational Validity of Observer Ratings of Five-Factor Model (FFM) Personality for Overall Job Performance as a Function of the Number of Observers

FFM trait           | Rating source   | r̄   | ρ̂   | SD ρ | 80% CrI   | SE ρ̂ | 95% CI    | Diff.
Conscientiousness   | self-report (a) | .15 | .22 | .13  | .06, .38  | .03  | .17, .27  | —
Conscientiousness   | 1 observer (b)  | .25 | .32 | .15  | .13, .51  | .05  | .22, .42  | .10
Conscientiousness   | 2 observers (b) | .29 | .38 | .18  | .15, .61  | .06  | .26, .50  | .16
Conscientiousness   | 3 observers (b) | .31 | .41 | .19  | .17, .65  | .07  | .27, .55  | .19
Emotional Stability | self-report (a) | .09 | .14 | .05  | .07, .21  | .03  | .09, .19  | —
Emotional Stability | 1 observer (b)  | .13 | .18 | .00  | .18, .18  | .04  | .10, .26  | .04
Emotional Stability | 2 observers (b) | .16 | .22 | .00  | .22, .22  | .05  | .12, .32  | .08
Emotional Stability | 3 observers (b) | .17 | .24 | .00  | .24, .24  | .05  | .14, .34  | .10
Agreeableness       | self-report (a) | .07 | .10 | .10  | −.02, .22 | .02  | .05, .15  | —
Agreeableness       | 1 observer (b)  | .18 | .26 | .12  | .11, .41  | .05  | .16, .36  | .16
Agreeableness       | 2 observers (b) | .21 | .32 | .14  | .14, .50  | .06  | .20, .44  | .22
Agreeableness       | 3 observers (b) | .23 | .34 | .15  | .15, .53  | .06  | .22, .46  | .24
Extraversion        | self-report (a) | .06 | .09 | .11  | −.05, .23 | .03  | .04, .14  | —
Extraversion        | 1 observer (b)  | .17 | .24 | .13  | .07, .41  | .05  | .14, .34  | .15
Extraversion        | 2 observers (b) | .19 | .28 | .15  | .09, .47  | .05  | .18, .38  | .19
Extraversion        | 3 observers (b) | .21 | .29 | .16  | .09, .49  | .06  | .17, .41  | .20
Openness/Intellect  | self-report (a) | .03 | .05 | .08  | −.05, .15 | .03  | −.01, .11 | —
Openness/Intellect  | 1 observer (b)  | .16 | .22 | .16  | .02, .42  | .07  | .08, .36  | .17
Openness/Intellect  | 2 observers (b) | .19 | .27 | .20  | .01, .53  | .08  | .11, .43  | .22
Openness/Intellect  | 3 observers (b) | .20 | .29 | .21  | .02, .56  | .08  | .13, .45  | .24

Note. r̄ = estimated mean observed validity; ρ̂ = estimated mean operational (true) validity, corrected for unreliability in the criterion measure and range restriction on the personality measures; SD ρ = estimated standard deviation of the operational validity; 80% CrI = lower and upper bounds of the 80% credibility interval for the validity distribution; SE ρ̂ = estimated standard error of the mean operational validity; 95% CI = lower and upper bounds of the 95% confidence interval for the estimated mean operational validity; Diff. = raw difference between the operational validities for self-reports and observer ratings of a given FFM trait. Total N and k for each FFM trait are as follows: Conscientiousness (N = 2,171, k = 17); Emotional Stability (N = 1,872, k = 16); Agreeableness (N = 2,074, k = 16); Extraversion (N = 1,735, k = 14); Openness/Intellect (N = 1,735, k = 14).
(a) Values for self-report personality are taken from Hurtz and Donovan (2000, Table 3). (b) Values for observer ratings of personality are extrapolated using the Spearman–Brown formula from the interrater-reliability-corrected results in Table 1 and the average number of observers (raters) for each FFM trait: 1.72 for Conscientiousness, 1.70 for Emotional Stability, 1.83 for Agreeableness, 1.73 for Extraversion, and 1.73 for Openness/Intellect.

Table 3
Regression and Usefulness Analyses for Self-Report Versus "Single" Observer Ratings of Personality in Predicting Overall Job Performance

FFM trait           | ρ̂ self | ρ̂ other | ΔR self (ΔR² self) | ΔR other (ΔR² other) | R s+o (R² s+o) | β self | β other
Conscientiousness   | .22    | .32     | .019 (.013)        | .119 (.067)          | .339 (.115)    | .12    | .28
Emotional Stability | .14    | .18     | .020 (.008)        | .060 (.020)          | .200 (.040)    | .09    | .15
Agreeableness       | .10    | .26     | .001 (.001)        | .161 (.058)          | .261 (.068)    | .02    | .25
Extraversion        | .09    | .24     | .001 (.000)        | .151 (.050)          | .241 (.058)    | −.02   | .25
Openness/Intellect  | .05    | .22     | .003 (.001)        | .173 (.047)          | .223 (.050)    | −.04   | .24

Note. ρ̂ self = mean operational validity estimates for self-reports of five-factor model (FFM) personality traits from Hurtz and Donovan (2000, Table 3); ρ̂ other = mean operational validity estimates of observer ratings of personality from the current study, attenuated to a "single" observer (see Table 2); ΔR self (ΔR² self) = incremental validity (ΔR) and change in R² due to adding self-reports to observer ratings of a given FFM personality trait; ΔR other (ΔR² other) = incremental validity (ΔR) and change in R² due to adding observer ratings to self-reports of a given FFM personality trait; R s+o (R² s+o) = multiple R and R² from the combination of self-reports and observer ratings of a given FFM personality trait; β self and β other = standardized regression weights comparing self-reports versus observer ratings of a given FFM personality trait.

Incremental Validity of Single Observer Ratings of Personality Over Self-Reports

Table 3 presents the incremental validity of "single" observer ratings of personality over self-reports of personality for overall job performance. Column 1 shows mean operational validity estimates for self-reports of FFM traits for job performance reported in Hurtz and Donovan (2000, Table 3). Column 2 shows the attenuated operational validity estimates for a "single" observer rating. Column 3 shows incremental validities of self-reports over a "single" observer rating of a given FFM trait, computed using the meta-analytically derived intercorrelations between self-reports and observer ratings of FFM traits (Connolly et al., 2007), whereas Column 4 reports incremental validities of a "single" observer rating over self-reports of a given FFM trait. The results for each of the FFM traits show that self-reports of personality provide negligible incremental validity (.001–.020; on average .009) above and beyond observer ratings of personality. However, observer ratings of personality provide substantial incremental validity (.060–.173; on average .132) over self-reports of personality in predicting overall job performance. Column 5 shows the overall multiple Rs (.200–.336; on average .25) for both self-reports and observer ratings of a given FFM personality trait; it clearly shows that the multiple Rs are mostly due to observer ratings, rather than self-reports, of personality. Because "the validity of a hiring method is a direct determinant of its practical value" or utility, we focused on multiple R and incremental validity (change in R) above (Schmidt & Hunter, 1998, p. 262). However, as an anonymous reviewer suggested, some readers may be interested in seeing variance accounted for (R²) and change in R². Thus, these values are also provided in Table 3 (numbers in parentheses in Columns 3, 4, and 5). It is again clear that for each FFM trait, observer ratings of personality (.020–.067; on average .048, or 4.8%) account for more variance in job performance than do self-reports of personality (.000–.013; on average .005, or .5%), net of each other. Columns 6 and 7 show the standardized regression weights (βs) for self-reports and observer ratings of a given FFM trait. For example, when both observer ratings and self-reports of Conscientiousness are entered in the same equation predicting overall job performance, the estimated standardized regression weight for observer ratings (.28) is 2.33 times larger than that for self-reports (.12).
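For readers who wish to reproduce this kind of usefulness analysis, the quantities in Table 3 can be computed from three meta-analytic correlations per trait: the self-report validity, the single-observer validity, and the self–observer correlation (Connolly et al., 2007). The sketch below is ours; the self–observer correlation it uses (.55) is a hypothetical placeholder rather than the value used in the article, so its output will not exactly match Table 3.

```python
import math

def usefulness(r_self, r_obs, r_self_obs):
    """Standardized betas, multiple R, and incremental validities (delta R) when
    self-reports and observer ratings of one trait jointly predict performance.
    All inputs are correlations (e.g., meta-analytic estimates)."""
    denom = 1 - r_self_obs ** 2
    beta_self = (r_self - r_obs * r_self_obs) / denom
    beta_obs = (r_obs - r_self * r_self_obs) / denom
    big_r = math.sqrt(beta_self * r_self + beta_obs * r_obs)
    dr_self_over_obs = big_r - r_obs   # gain from adding self-reports to observer ratings
    dr_obs_over_self = big_r - r_self  # gain from adding observer ratings to self-reports
    return beta_self, beta_obs, big_r, dr_self_over_obs, dr_obs_over_self

# Conscientiousness with a placeholder self-observer correlation of .55
print([round(x, 3) for x in usefulness(0.22, 0.32, 0.55)])
```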
Discussion

Although progress has been made in the past 20 years in understanding the validity of personality traits for predicting job performance, the results of multiple meta-analyses and second-order meta-analyses have shown that the operational validities of self-reports of FFM traits are modest (in the .20s range) at best. Accordingly, this meta-analysis responds to the call by several prominent scholars (e.g., Morgeson et al., 2007; Ployhart, 2006) for research that investigates alternatives to self-report measures of personality. The results lead to different conclusions about the validity of FFM traits in predicting job performance and may inform the debate surrounding the usefulness of personality traits for predicting job performance.

Prior to discussing specific results, we urge the reader to use appropriate levels of caution in interpreting our results because of the relatively small number of primary studies/samples included in the meta-analysis (ks = 14–17). For example, given the relatively small ks, the standard deviations (variances) of the observed and operational validity estimates may be somewhat overestimated. This may result in wider credibility intervals and conservative estimates of the generalizability of the results across primary studies. However, the mean operational validities are likely to be accurate given the considerable sample sizes (Ns = 1,735–2,171). Relatedly, Hunter and Schmidt (2004, p. 408) stated that "several studies combined meta-analytically contain limited information about between-study variance (although they provide substantial information about means)."

Magnitude of the Validity of Observer Ratings of FFM Traits

One important contribution of this study is that the operational validities of all FFM traits based on a single observer rating are higher than those obtained in meta-analyses based on self-report measures. The operational validities we obtained for a "single" observer rating predicting overall performance ranged from .18 (Emotional Stability) to .32 (Conscientiousness), all of which except for Emotional Stability exceed or equal the highest validity reported in previous meta-analyses based on self-reports of the FFM traits (i.e., .22 for Conscientiousness; Hurtz & Donovan, 2000, Table 3). In fact, the magnitude of the differentials between observer-rating-based and self-report-based validities in predicting overall performance is substantial (at least .10) except for Emotional Stability.
Even for Emotional Stability, the observer validity is larger by .04 (.18 vs. .14), which translates into about a 30% gain in validity. As mentioned earlier, the reason why Emotional Stability based on observer ratings shows a relatively smaller validity gain might be that internal thought processes inherent to anxiety and depression are difficult to observe. Nonetheless, the findings regarding the mean operational validity for observer ratings of Openness/Intellect (.22 vs. .05 for self-reports; a 340% gain) are especially noteworthy in this regard because every prior meta-analysis based on self-reports has shown that Openness has essentially no validity in predicting overall performance (Schmidt et al., 2008).

One possible explanation pertains to the method of measuring Openness. In fact, most of the primary studies included in this meta-analysis measured Openness/Intellect using Goldberg (1992) or similar variants such as Saucier (1994; in fact, a short version of Goldberg, 1992) and the International Personality Item Pool (http://ipip.ori.org/ipip/).1 These FFM questionnaires emphasize more g-loaded facets of Openness, such as ideas, inquisitiveness, and intellectance, rather than other facets such as aesthetics and fantasy. Accordingly, the items' shared variance with g may account for the high validity obtained for Openness. This also relates to the distinction we made earlier between personality viewed from the self versus the observer perspective. From the self perspective, Openness may refer to traits associated with one's internal experience, such as the facets of fantasy, feelings, and aesthetics, whereas from the observer's perspective, Openness refers to those traits associated with external experience, such as the facets of actions, ideas, and values. Because the external facets are more observable (visible to others) and are more highly correlated with g, observer ratings are more valid predictors of job performance (Griffin & Hesketh, 2004).

1 We thank an anonymous reviewer for bringing this issue to our attention.

In sum, our findings based on a single observer rating lead to more optimistic conclusions about the validity of FFM traits in predicting overall performance; not only are the validity estimates substantially higher than those based on self-reports, but also all of the FFM traits are more generalizable predictors of overall performance.

Incremental Validity of Observer Ratings

Another contribution of the study is that single observer ratings have substantial incremental validity (on average .132) over corresponding self-report measures of FFM traits in predicting overall performance. However, the reverse is not true, as the incremental gain of self-reports over observer ratings was negligible (close to zero) for all FFM traits. Thus, when single observer ratings are used to predict overall performance, self-report measures add little to the prediction. These findings broadly support the findings of the pioneering study by Mount et al. (1994), which showed higher validities of observer ratings of personality relative to self-reports and substantial incremental validity of observer ratings of personality over self-reports in predicting job performance among sales representatives. Our results extend their findings by including multiple samples and by using meta-analytic methods that remove biases due to statistical and methodological artifacts.

Practical and Theoretical Implications

Practically speaking, one way to show the potential of observer ratings of personality as a selection tool is to examine whether this predictor accounts for variance in job performance above that accounted for by other well-established predictors. Following Schmidt and Hunter's (1998) analysis scheme, we computed the incremental validity of single observer ratings of Conscientiousness (ρ̂ = .32) over cognitive ability (ρ̂ = .51), which is the single best predictor of overall job performance. The overall R is .605, and the incremental validity (ΔR) of single observer ratings of Conscientiousness over general mental ability is .10. Further, one noteworthy advantage of observer ratings of personality over self-reports is the possibility of using multiple raters, which increases reliability and, in turn, increases validity (Schmidt et al., 2008). If three observers (raters) were used to assess a target person's personality, we could expect the validity to be estimated at .41 (with ΔR = .17) for Conscientiousness. In sum, our findings address many of the concerns raised by prominent scholars that personality traits should predict better than they do; in fact, our results confirm what those scholars suspected, as we found that all of the FFM traits predict overall job performance substantially better than previous research has shown.
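The same two-predictor logic applies here: combining the meta-analytic validity of general mental ability (.51) with that of a single observer rating of Conscientiousness (.32), and assuming for illustration a near-zero correlation between the two predictors (our simplification, not a value taken from the article), yields a multiple R and an incremental validity close to the .605 and .10 reported above.

```python
import math

# Our simplifying assumption: general mental ability and observer-rated
# Conscientiousness are treated as uncorrelated for this illustration.
r_gma, r_obs_consc = 0.51, 0.32
multiple_r = math.sqrt(r_gma ** 2 + r_obs_consc ** 2)  # R with uncorrelated predictors
delta_r = multiple_r - r_gma                           # gain from adding Conscientiousness
print(round(multiple_r, 3), round(delta_r, 3))         # approximately 0.602 and 0.092
```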
Given the aforementioned advantage whereby multiple observer ratings are available, one possible use of observer ratings for selection purposes is to embed FFM personality items in reference check procedures. For example, Zimmerman et al. (2010; included in the current meta-analysis) measured two FFM personality traits (Conscientiousness and Emotional Stability) using a reference checklist form. They reported stronger (observed) validities of .39 for Conscientiousness and .26 for Emotional Stability. Similarly, Taylor, Pajo, Cheung, and Stringfield (2004; included in the current meta-analysis) measured two FFM personality traits (Conscientiousness and Agreeableness) through a structured telephone reference check with a brief (less than 15 min) interview with each referee, and they also reported stronger validities.

Another noteworthy practical issue is that applicant reactions to selection procedures and tools can influence recruitment outcomes (Ployhart, 2006). As Hausknecht, Day, and Thomas (2004, p. 675) stated, "organizations using selection tools and procedures that are perceived unfavorably by applicants may find that they are unable to attract top applicants, and may be more likely to face litigation or negative public relations." On the positive side, the good news is that recent meta-analytic evidence indicates that reference forms (M = 3.29, SD = 0.93) are more favorably rated than are (self-report) personality tests (M = 2.83, SD = 1.01; d = 0.47, 95% CI [.36, .58]). As such, it is possible that embedding personality tests in reference forms may lead to more positive applicant reactions than asking applicants to respond to a personality questionnaire. Confirming this expectation, Zimmerman et al. (2010) reported that MBA applicant reactions to a reference checklist containing personality measures were positive. Notwithstanding the above advantages and possibilities of using observer ratings of personality in selection, Hurtz and Donovan (2000, p. 877) noted, "the practice of using rating sources other than oneself is not likely to be adopted in personnel selection practices." Although we do not share this view, we acknowledge that there are important issues that need to be addressed before observer ratings can be used for personnel selection purposes.
770 OH, WANG, AND MOUNT

reference forms, phone reference checks, interviews), it is difficult For example, the operational validity of Conscientiousness mea-
to overcome some of the contextual aspects of the rating proce- sured via a “single” observer rating (␳៮ˆ ⫽ .32) is based on 17
dures. As an anonymous reviewer suggested, applicants are un- independent samples (N ⫽ 2,171; average n per sample ⫽ approx-
likely to seek recommendation letters from individuals who are imately 130). Following Berry et al. (2010), we determined how
likely to rate them poorly, and past employers are generally un- many primary studies, each with an operational validity of .20 (a
willing to provide negative references because of concerns about typical level of operational validity) and sample size of 130, would
litigation.2 More research is definitely needed on this issue. None- be needed to lower the operational validity of observer ratings of
theless, we believe it is premature to dismiss the possible use of Conscientiousness for overall job performance (.32) to .22, the
observer ratings for personnel selection purposes. Our results show operational validity of self-reports of Conscientiousness. We found
that observer ratings have much higher validities in predicting job that it would take about 100 such primary studies with the speci-
performance than self-reports and additionally have an important fied parameters to lower the validity to .22. Thus, these findings
advantage over self-reports because multiple observer ratings can are quite robust, as it would take the addition of a large number of
be averaged, which can reduce the amount of bias (idiosyncrasy) primary studies to change our conclusions. Having said this, one
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

across observers and thus can increase validity. major goal of this meta-analysis was to stimulate additional re-
This document is copyrighted by the American Psychological Association or one of its allied publishers.

Regardless of the practical issues associated with using observer ratings for selection purposes, the present findings have significant theoretical implications for models of job performance. An important goal in I/O psychology is to develop models that seek to explain how individual differences and situational factors relate to job performance (e.g., Schmidt & Hunter, 1992). By disentangling the validity of FFM traits from their method of measurement, our results contribute to the development of these models by providing a better understanding of the importance of one set of individual difference variables, personality traits, in explaining job performance. Our findings that the operational validities for the FFM traits based on single observer ratings are, on average, 1.5 times (or by .12) larger than those for corresponding traits based on self-reports clearly show that personality traits play a more central role in explaining job performance than previous research has revealed. Further, our findings that each of the FFM traits is a valid (nonzero) predictor of overall performance and that the validities of all FFM traits generalize (80% credibility intervals do not contain zero) suggest that models that include only the FFM constructs of Conscientiousness (and Emotional Stability) are likely to be deficient in explaining job performance. Finally, the findings that FFM traits based on observer ratings account for substantial incremental validity over cognitive ability (e.g., for Conscientiousness alone there is an incremental gain of .10 based on a "single" observer rating) reveal that FFM traits are comparatively more important relative to general mental ability than previous research has suggested.
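The incremental validity figure can be reproduced approximately from the usual two-predictor multiple correlation formula. In the sketch below, the .51 value for general mental ability is the familiar Schmidt and Hunter (1998) estimate, and the near-zero predictor intercorrelation is an assumption made for illustration; the article's own incremental analyses rest on its reported correlation matrix rather than on these inputs.

import math

def multiple_r(r1, r2, r12):
    """Multiple correlation of the criterion with two predictors, from the
    zero-order validities r1 and r2 and the predictor intercorrelation r12."""
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)

r_gma = 0.51               # GMA validity for job performance (Schmidt & Hunter, 1998)
r_obs_con = 0.32           # observer-rated Conscientiousness validity (this study)
r_intercorrelation = 0.00  # assumed GMA-Conscientiousness correlation (placeholder)

r_both = multiple_r(r_gma, r_obs_con, r_intercorrelation)
print(f"R with GMA alone = {r_gma:.2f}")
print(f"R with GMA plus observer-rated Conscientiousness = {r_both:.2f}")
print(f"Incremental validity = {r_both - r_gma:.2f}")  # close to the .10 gain noted above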
Limitations and Future Research Directions

There are several limitations in this meta-analysis. First, as we acknowledged in several places, the number of primary studies and the sample sizes in this meta-analysis are relatively small. However, the overall findings are consistent with those of representative primary studies such as Mount et al. (1994). Specifically, the average raw score difference across the FFM traits is only .02 between the validities based on one observer in Table 2 and the validities reported in Table 1 of Mount et al. with coworker as personality rating source and supervisor as performance rating source. Given the much larger sample sizes in the current study, our study provides more accurate and powerful validity estimates of observer ratings of personality than Mount et al. (Hunter & Schmidt, 2004). In addition, we used relatively stringent inclusion criteria, which restricted the population of primary studies. In these circumstances, it is useful to know the robustness of the findings. For example, the operational validity of Conscientiousness measured via a "single" observer rating (ρ̂ = .32) is based on 17 independent samples (N = 2,171; average n per sample = approximately 130). Following Berry et al. (2010), we determined how many primary studies, each with an operational validity of .20 (a typical level of operational validity) and a sample size of 130, would be needed to lower the operational validity of observer ratings of Conscientiousness for overall job performance (.32) to .22, the operational validity of self-reports of Conscientiousness. We found that it would take about 100 such primary studies with the specified parameters to lower the validity to .22. Thus, these findings are quite robust, as it would take the addition of a large number of primary studies to change our conclusions. Having said this, one major goal of this meta-analysis was to stimulate additional research pertaining to observer ratings of personality. It is clear that more primary studies are needed so that we can have more accurate estimates and test more moderators (e.g., occupation, narrower facet, rating source, characteristics of observers such as the level of acquaintanceship).
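The arithmetic behind this robustness check can be approximated with a simple sample-size-weighted average of validities, as sketched below. This simplification stands in for the Berry et al. (2010) procedure that the article actually follows, so it lands near, rather than exactly on, the roughly 100 studies reported.

# Rough sketch of the "how many contrary studies would it take?" logic,
# using a simple sample-size-weighted average of operational validities.
current_rho = 0.32   # meta-analytic validity of observer-rated Conscientiousness
current_n = 2171     # total sample size across the 17 independent samples
added_rho = 0.20     # assumed validity of each hypothetical additional study
added_n = 130        # assumed sample size of each hypothetical additional study
target_rho = 0.22    # self-report Conscientiousness validity to be matched

k = 0
pooled = current_rho
while pooled > target_rho:
    k += 1
    pooled = (current_rho * current_n + added_rho * k * added_n) / (current_n + k * added_n)

print(f"Roughly {k} additional studies would be needed to pull .32 down to .22")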
We have alluded to areas where additional research is needed, but there are certainly other areas that need to be addressed. One is the extent to which the characteristics of the observers influence the validity of the ratings. For example, in the present study, most observers are coworkers (peers), which we expect to be the most frequent source of observer ratings in organizational settings. One research question is whether other rating sources such as subordinates, supervisors, customers, friends, or referees (those explicitly asked to provide ratings in high-stakes contexts) may yield comparable or even higher validities. Thus, a fruitful area for future research is to examine the psychometric properties of ratings made by raters from different perspectives. Potential moderators include how long the rater has known the ratee, in what context he or she has known the person (work or nonwork), and what the purpose of the observer rating is (for selection or development). Another fruitful avenue for future research might be to selectively choose and combine facets of the FFM, because this would yield shorter measures that may have higher validities (Hurtz & Donovan, 2000). Recent research findings show that some facets of FFM traits are significantly more predictive of job performance than other facets and are more predictive than the overall score on the FFM construct (e.g., Ashton, 1998; Dudley, Orvis, Lebiecki, & Cortina, 2006; Mount & Barrick, 1995; Paunonen & Nicol, 2001; Stewart, 1999; Tett, Steele, & Beauregard, 2003).

Second, both validation type (predictive vs. concurrent) and validation sample (applicants vs. incumbents) are important issues (moderators) but have rarely been empirically examined in the domain of personality (see Van Iddekinge & Ployhart, 2008, pp. 883–885). However, there are several reasons that these issues should be carefully examined in future personality validation studies. For example, "applicants are thought to be more likely than incumbents to attempt to distort their response (i.e., fake) on noncognitive predictors to increase their chances of being selected" (Van Iddekinge & Ployhart, 2008, p. 894). As discussed earlier, response distortion may lead to lower validity. In support of this notion, Hough (1998), reanalyzing personality validation data (albeit not based on explicit FFM measures) collected during Project A, in fact found that observed validity was smaller for predictive designs than for concurrent designs by .07 on average. The same may apply to observer ratings of personality. Observers (e.g., referees) are more likely to distort their ratings (i.e., be susceptible to friendship bias) when their target is in high-stakes selection contexts rather than in nonselection contexts. To the extent this is true, the validity estimates reported in this study may overestimate the validity that would be obtained in predictive designs, given that all primary studies included in the meta-analysis were based on incumbents and that all primary studies except for two (Taylor et al., 2004; Zimmerman et al., 2010) were based on concurrent designs. However, Weekley, Ployhart, and Harold (2004) did not find significant differences in the validity of three FFM traits (Conscientiousness, Extraversion, Agreeableness) between predictive designs (based on three primary studies) and concurrent designs (based on five primary studies). Thus, research on this issue is inconclusive, and more primary studies should examine it further.3

Lastly, Ployhart (2006, p. 884) argued that previous meta-analyses (e.g., Barrick & Mount, 1991; J. Hogan & Holland, 2003; Hurtz & Donovan, 2000) provide an effective summary of what has been done on the relationships between personality and job performance, "but we may often be interested in questions of what could be done or what should be done." On this point, as did Ployhart (2006), we agree with Landy, Shankster, and Kohler (1994, p. 286), who noted that "meta-analysis and traditional research should be complementary and not competitors." That is, meta-analysis should encourage, rather than discourage, future primary studies. We hope that this meta-analysis will further encourage organizational researchers to investigate the use of different sources of personality assessment other than self-reports in their primary studies, as well as to explore procedural issues (both legal and logistical) in the use of observer ratings for selection purposes.

Conclusion

Due caution should be exercised when interpreting our results, given the relatively small number of primary studies included in the meta-analyses. Nonetheless, the results reveal that the validities of observer ratings of FFM traits in predicting overall performance are substantially higher than those based on self-reports. Further, consistent with previous research, Conscientiousness has the highest validity in predicting overall job performance. The new substantive finding was that, when based on observer ratings, all FFM traits are valid (nonzero) and generalizable predictors of overall performance across situations. The results also show that observer ratings have substantial incremental validity over self-reports of FFM traits and over general mental ability in predicting overall performance.

Footnotes

1 We thank an anonymous reviewer for bringing this issue to our attention.
2 We thank an anonymous reviewer for bringing this issue to our attention.
3 We thank the Action Editor, Robert Ployhart, for bringing this issue to our attention.

References

References marked with an asterisk indicate studies included in the meta-analysis.

*Aronson, Z. H., Reilly, R. R., & Lynn, G. S. (2006). The impact of leader personality on new product development teamwork and performance: The moderating role of uncertainty. Journal of Engineering and Technology Management, 23, 221–247. doi:10.1016/j.jengtecman.2006.06.003
Ashton, M. C. (1998). Personality and job performance: The importance of narrow traits. Journal of Organizational Behavior, 19, 289–303. doi:10.1002/(SICI)1099-1379(199805)19:3<289::AID-JOB841>3.0.CO;2-C
*Barrick, M. R. (2009). [Observer ratings of personality and multiple performance outcomes]. Unpublished raw data.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance. Personnel Psychology, 44, 1–26. doi:10.1111/j.1744-6570.1991.tb00688.x
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). The FFM personality dimensions and job performance: Meta-analysis of meta-analyses. International Journal of Selection and Assessment, 9, 9–30. doi:10.1111/1468-2389.00160
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of interviewer judgments of job applicant personality traits. Personnel Psychology, 53, 925–951. doi:10.1111/j.1744-6570.2000.tb02424.x
Berry, C. M., Sackett, P. R., & Tobares, V. (2010). A meta-analysis of conditional reasoning tests of aggression. Personnel Psychology, 63, 361–384. doi:10.1111/j.1744-6570.2010.01173.x
*Brennan, A., & Skarlicki, D. P. (2004). Personality and perceived justice as predictors of survivor's reactions following downsizing. Journal of Applied Social Psychology, 34, 1306–1328. doi:10.1111/j.1559-1816.2004.tb02008.x
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. doi:10.1037/0022-006X.66.1.7
Connelly, B. S. (2008). The reliability, convergence, and predictive validity of personality ratings: An other perspective (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The convergent validity between self and observer ratings of personality: A meta-analytic review. International Journal of Selection and Assessment, 15, 110–117. doi:10.1111/j.1468-2389.2007.00371.x
Cooper, H. (2003). Editorial. Psychological Bulletin, 129, 3–9. doi:10.1037/0033-2909.129.1.3
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57. doi:10.1037/0021-9010.91.1.40
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122–133. doi:10.1037/0021-9010.86.1.122
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149–158. doi:10.1037/0022-3514.55.1.149
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with inter-judge agreement. Journal of Personality and Social Psychology, 52, 409–418. doi:10.1037/0022-3514.52.2.409
Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4, 26–42. doi:10.1037/1040-3590.4.1.26
Griffin, B., & Hesketh, B. (2004). Why openness to experience is not a good predictor of job performance. International Journal of Selection and Assessment, 12, 243–251. doi:10.1111/j.0965-075X.2004.278_1.x
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated model and meta-analysis. Personnel Psychology, 57, 639–683. doi:10.1111/j.1744-6570.2004.00003.x
Hofstee, W. K. B. (1994). Who should own the definition of personality? European Journal of Personality, 8, 149–162. doi:10.1002/per.2410080302
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88, 100–112. doi:10.1037/0021-9010.88.1.100
Hogan, R. T. (1991). Personality and personality measurement. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 2, 2nd ed., pp. 873–919). Palo Alto, CA: Consulting Psychologists Press.
Hogan, R. T. (2007). Personality and the fate of organizations. Mahwah, NJ: Erlbaum.
Hooper, A. C., & Sackett, P. R. (2008, April). Self-presentation on personality measures: A meta-analysis. Paper presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.
Hough, L. M. (1998). Personality at work: Issues and evidence. In M. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 131–166). Mahwah, NJ: Erlbaum.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869
Ilies, R., Fulmer, I., Spitzmuller, M., & Johnson, M. (2009). Personality and citizenship behavior: The mediating role of job satisfaction. Journal of Applied Psychology, 94, 945–959. doi:10.1037/a0013329
James, L. R., McIntyre, M. D., Glisson, C. A., Green, P. D., Patton, T. W., LeBreton, J. M., ... Williams, L. J. (2005). A conditional reasoning measure for aggression. Organizational Research Methods, 8, 69–99. doi:10.1177/1094428104272182
*Judge, T. A., & Colbert, A. E. (2001). Personality and leadership: A multi-sample study. Working paper.
Judge, T. A., Higgins, C., Thoresen, C. J., & Barrick, M. R. (1999). The Big Five personality traits, general mental ability, and career success across the life span. Personnel Psychology, 52, 621–652. doi:10.1111/j.1744-6570.1999.tb00174.x
*Kamdar, D., & Dyne, L. V. (2007). The joint effects of personality and workplace social exchange relationships in predicting task performance and citizenship performance. Journal of Applied Psychology, 92, 1286–1298. doi:10.1037/0021-9010.92.5.1286
Kehoe, J. (2000). Research and practice in selection. In J. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies (pp. 397–437). San Francisco, CA: Jossey-Bass.
Kolar, D. W., Funder, D. C., & Colvin, C. R. (1996). Comparing the accuracy of personality judgments by the self and knowledgeable others. Journal of Personality, 64, 311–337. doi:10.1111/j.1467-6494.1996.tb00513.x
Landy, F. J., Shankster, L. J., & Kohler, S. S. (1994). Personnel selection and placement. Annual Review of Psychology, 45, 261–296. doi:10.1146/annurev.ps.45.020194.001401
*LeDoux, J., & Kluemper, D. H. (2010, August). An examination of potential antecedents and organization-based outcomes of metaperception accuracy. Paper presented at the annual meeting of the Academy of Management, Montreal, Quebec, Canada.
Martinko, M. J., & Gardner, W. L. (1987). The leader member attribution process. Academy of Management Review, 12, 235–249. doi:10.2307/258532
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616. doi:10.1037/0021-9010.79.4.599
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60, 683–729. doi:10.1111/j.1744-6570.2007.00089.x
Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). Implicit policies about relations between personality traits and behavioral effectiveness in situational judgment items. Journal of Applied Psychology, 91, 749–761. doi:10.1037/0021-9010.91.4.749
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resource management. Research in Personnel and Human Resources Management, 13, 153–200.
*Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the Big Five personality factors. Journal of Applied Psychology, 79, 272–280. doi:10.1037/0021-9010.79.2.272
Murphy, K. R., & DeShon, R. P. (2000). Inter-rater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900. doi:10.1111/j.1744-6570.2000.tb02421.x
*Nilsen, D. (1995). An investigation of the relationship between personality and leadership performance (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Oh, I.-S., & Berry, C. M. (2009). The five-factor model of personality and managerial performance: Validity gains through the use of 360 degree performance ratings. Journal of Applied Psychology, 94, 1498–1513. doi:10.1037/a0017221
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. doi:10.1111/j.1744-6570.2007.00099.x
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679. doi:10.1037/0021-9010.81.6.660
*Parks, L., & Mount, M. K. (2005). The dark-side of self-monitoring: Engaging in counterproductive behavior at work. Best Papers Proceedings of the Academy of Management, 11–16.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598–609. doi:10.1037/0022-3514.46.3.598
Paulhus, D. L., & Reid, D. B. (1991). Enhancement and denial in socially desirable responding. Journal of Personality and Social Psychology, 60, 307–317. doi:10.1037/0022-3514.60.2.307
Paunonen, S. V., & Nicol, A. A. M. (2001). The personality hierarchy and the prediction of work behaviors. In B. W. Roberts & R. Hogan (Eds.), Personality psychology in the workplace (pp. 161–191). Washington, DC: American Psychological Association. doi:10.1037/10434-007
*Peterson, R. S., Smith, D. B., Martorana, P. V., & Owens, P. D. (2003). The impact of chief executive officer personality on top management team dynamics: One mechanism by which leadership affects organizational performance. Journal of Applied Psychology, 88, 795–808. doi:10.1037/0021-9010.88.5.795
Ployhart, R. E. (2006). Staffing in the 21st century: New challenges and strategic opportunities. Journal of Management, 32, 868–897. doi:10.1177/0149206306293625
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary practice and research. Mahwah, NJ: Erlbaum.
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903. doi:10.1037/0021-9010.88.5.879
Roberts, B. W., Kuncel, N., Shiner, R. N., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socio-economic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2, 313–345. doi:10.1111/j.1745-6916.2007.00047.x
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. Advances in Experimental Social Psychology, 10, 173–220. doi:10.1016/S0065-2601(08)60357-3
*Ross, S. M., & Offerman, L. R. (1997). Transformational leaders: Measurement of personality attributes and work group performance. Personality and Social Psychology Bulletin, 23, 1078–1086. doi:10.1177/01461672972310008
Rotundo, M., & Sackett, P. R. (2002). The relative importance of task, citizenship, and counterproductive performance to global ratings of job performance: A policy-capturing approach. Journal of Applied Psychology, 87, 66–80. doi:10.1037/0021-9010.87.1.66
Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450. doi:10.1146/annurev.psych.59.103006.093716
Salgado, J. F. (1997). The five factor model of personality and job performance in the European community. Journal of Applied Psychology, 82, 30–43. doi:10.1037/0021-9010.82.1.30
Saucier, G. (1994). Mini-markers: A brief version of Goldberg's unipolar Big-Five markers. Journal of Personality Assessment, 63, 506–516. doi:10.1207/s15327752jpa6303_8
Schmidt, F. L., & Hunter, J. E. (1992). Development of causal models of processes determining job performance. Current Directions in Psychological Science, 1, 89–92. doi:10.1111/1467-8721.ep10768758
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274. doi:10.1037/0033-2909.124.2.262
Schmidt, F. L., & Le, H. (2004). Software for Hunter–Schmidt meta-analysis methods. Iowa City, IA: Department of Management and Organizations, University of Iowa.
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased accuracy of range restriction corrections: Implications for the role of personality and general mental ability in job and training performance. Personnel Psychology, 61, 827–868. doi:10.1111/j.1744-6570.2008.00132.x
Schmidt, F. L., Viswesvaran, C., & Ones, D. S. (2000). Reliability is not validity and validity is not reliability. Personnel Psychology, 53, 901–912. doi:10.1111/j.1744-6570.2000.tb02422.x
Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553–561. doi:10.1037/0021-9010.89.3.553
Scott, B. A., & Judge, T. A. (2009). The popularity contest at work: Who wins, why, and what do they receive? Journal of Applied Psychology, 94, 20–33. doi:10.1037/a0012951
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85, 956–970. doi:10.1037/0021-9010.85.6.956
*Shaffer, M. A., Harrison, D. A., Gregersen, H., Black, J. S., & Ferzandi, L. A. (2006). You can take it with you: Individual differences and expatriate effectiveness. Journal of Applied Psychology, 91, 109–125. doi:10.1037/0021-9010.91.1.109
*Small, E. E., & Diefendorff, J. M. (2006). The impact of contextual self-ratings and observer ratings of personality on the personality–performance relationship. Journal of Applied Social Psychology, 36, 297–320. doi:10.1111/j.0021-9029.2006.00009.x
Spain, J. S., Eaton, L. G., & Funder, D. C. (2000). Perspectives on personality: The relative accuracy of self vs. others for the prediction of behavior and emotion. Journal of Personality, 68, 837–867. doi:10.1111/1467-6494.00118
Stewart, G. L. (1999). Trait bandwidth and stages of job performance: Assessing differential effects for conscientiousness and its subtraits. Journal of Applied Psychology, 84, 959–968. doi:10.1037/0021-9010.84.6.959
*Taylor, P. J., Pajo, K., Cheung, G. W., & Stringfield, P. (2004). Dimensionality and validity of a structured telephone reference check procedure. Personnel Psychology, 57, 745–772. doi:10.1111/j.1744-6570.2004.00006.x
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993. doi:10.1111/j.1744-6570.2007.00098.x
Tett, R. P., Steele, J. R., & Beauregard, R. S. (2003). Broad and narrow measures on both sides of the personality–job performance relationship. Journal of Organizational Behavior, 24, 335–356. doi:10.1002/job.191
Van Iddekinge, C. H., McFarland, L. A., & Raymark, P. H. (2007). Antecedents of impression management use and effectiveness in a structured interview. Journal of Management, 33, 752–773. doi:10.1177/0149206307305563
Van Iddekinge, C. H., & Ployhart, R. E. (2008). Developments in the criterion-related validation of selection procedures: A critical review and recommendations for practice. Personnel Psychology, 61, 871–925. doi:10.1111/j.1744-6570.2008.00133.x
Van Iddekinge, C. H., Raymark, P. H., & Roth, P. L. (2005). Assessing personality with a structured employment interview: Construct-related validity and susceptibility to response inflation. Journal of Applied Psychology, 90, 536–552. doi:10.1037/0021-9010.90.3.536
Viswesvaran, C., & Ones, D. S. (2000). Measurement error in Big Five factors personality assessment: Reliability generalization across studies and measures. Educational and Psychological Measurement, 60, 224–235. doi:10.1177/00131640021970475
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574. doi:10.1037/0021-9010.81.5.557
Weekley, J. A., Ployhart, R. E., & Harold, C. M. (2004). Personality and situational judgment tests across applicant and incumbent settings: An examination of validity, measurement, and subgroup differences. Human Performance, 17, 433–461. doi:10.1207/s15327043hup1704_5
Whitener, E. M. (1990). Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of Applied Psychology, 75, 315–321. doi:10.1037/0021-9010.75.3.315
*Yoo, T.-Y. (2007). The relationship between HEXACO personality factors and a variety of performance in work organizations. Korean Journal of Industrial and Organizational Psychology, 20, 283–314.
*Zimmerman, R. D., Triana, M. C., & Barrick, M. R. (2010). Predictive criterion-related validity of observer-ratings of personality and job-related competencies using multiple raters and multiple performance criteria. Human Performance, 23, 361–378.

Received February 25, 2010
Revision received September 20, 2010
Accepted September 23, 2010