
Journal of Applied Psychology
© 2021 American Psychological Association
ISSN: 0021-9010 https://doi.org/10.1037/apl0000964

Meta-Analysis of Biodata in Employment Settings: Providing Clarity to Criterion and Construct-Related Validity Estimates

Andrew B. Speer, Andrew P. Tenbrink, Lauren J. Wegmeyer, Caitlynn C. Sendra, Mike Shihadeh, and Sugandhjot Kaur
Department of Psychology, Wayne State University

Although biodata inventories have long been used to hire job applicants, there are limitations to current biodata knowledge and little in the way of contemporary biodata meta-analytic reviews. This study establishes a precise understanding of biodata validity by conducting an updated meta-analysis that differentiates biodata validity in terms of two important defining features: construct domain and scoring method (rational, hybrid, empirical). Evidence was established in terms of criterion-related validity with job
performance and additional work outcomes, as well as convergent validity with common external hiring
measures. In total, 180 independent samples of criterion correlations were examined, and 63 samples were
analyzed containing correlations with convergent measures. Findings across the meta-analyses revealed that
biodata inventories are one of the most predictive assessment methods available, but that the relationship
with work outcomes differs by construct domain and scoring method. Empirically scored overall composite
scales had superior criterion-related validity (ρ = .44) to rationally scored composite scales (ρ = .24).
Scales developed to measure conscientiousness and leadership were generally the most predictive of the job
performance of the narrow construct domains, and particularly when empirically keyed. However, when
biodata scores were correlated with theoretically aligned performance ratings, rational scoring resulted in
similar validity coefficients as empirical scoring. Finally, biodata scales exhibited expected patterns of
correlations with external measures and were only modestly correlated with cognitive ability and five-factor
model personality scores. Taken together, biodata inventories are highly predictive assessment methods and
are likely to provide unique variance over other common predictors.

Keywords: biodata, application blanks, biographical information, employee selection, meta-analysis

Supplemental materials: https://doi.org/10.1037/apl0000964.supp

Andrew B. Speer https://orcid.org/0000-0002-3376-2103

Portions of this research were presented at the 2021 Annual Conference for the Society for Industrial & Organizational Psychology in the presentation titled "Meta-Analysis of Biodata at Work: Criterion & Construct Validity Estimates."

Correspondence concerning this article should be addressed to Andrew B. Speer, Department of Psychology, Wayne State University, 5057 Woodward Avenue, Detroit, MI 48402, United States. Email: speerworking@gmail.com

Biodata inventories have long been used to hire job applicants, and this measurement method has demonstrated meaningful relationships with many work and life outcomes (e.g., Breaugh, 2014; Gessner et al., 1993; Mumford & Owens, 1987; Mumford et al., 1996; Oswald et al., 2004; Rothstein et al., 1990; Schmidt & Hunter, 1998). Despite this, there are limitations to our current knowledge of biodata inventories. These include a lack of an updated and nuanced meta-analytic understanding of the relationships between biodata scores and both performance outcomes and scores on other selection assessments. Importantly, past biodata summaries have failed to differentiate biodata according to which construct domain the measurement assesses. Furthermore, past meta-analytical work has failed to provide a systematic test of the impact of biodata scoring method on biodata validity. The current research provides an updated and improved biodata meta-analysis by differentiating biodata according to two major factors: construct domain and scoring method. It examines relationships with several performance outcomes and with common selection assessments, which is important in order to determine not only the construct validity of biodata but also its ability to uniquely contribute to selection test batteries over and above other common assessment measures. This work contributes in several major ways to the biodata literature.

First, existing biodata meta-analyses are outdated and challenging to interpret in several ways. The most recent meta-analyses were conducted in the 1990s, with Bliesener (1996) assessing criterion-related validity and Bobko et al. (1999) investigating construct relations with other predictors as part of a broader investigation. Bliesener's meta-analysis included data only up until 1993, with few correlations with performance ratings, and nearly 30 years have passed since. Other biodata meta-analyses or consortium studies are similarly decades old (e.g., Carlson et al., 1999; Hunter & Hunter, 1984; Reilly & Chao, 1982; Rothstein et al., 1990; Schmidt & Hunter, 1998) and therefore do not account for the considerable amount of primary research that has occurred since then. For example, the most recent meta-analysis of criterion-related validity by Bliesener (1996) reported only 16 effect sizes correlating biodata scores with job performance ratings (Table 1). Our present study, even after being more selective in the quality of the studies used, analyzed a total of 97 independent samples for this effect. In total, 180 independent samples of criterion correlations were examined in this study, and 63 independent samples containing convergent predictor correlations were analyzed, thus providing a much more comprehensive set of meta-analyses of biodata's effects.

Relatedly, there is a need for a more rigorous biodata meta-analysis to establish firm, accurate estimates of biodata relationships. Several features of past work make it challenging to interpret biodata validity estimates. For example, Bliesener's (1996) meta-analysis used self-report performance ratings, which are known to be biased (Murphy et al., 2018); combined studies that used different methods of obtaining respondent life history (open-ended survey responses, standardized questionnaires, interview responses); did not correct for artifacts such as criterion unreliability and range restriction; and at times reported correlations based on criterion averages across bands of biodata scores rather than individual respondents. These features make it challenging to interpret past biodata effects and highlight a need to perform an updated and more rigorous account of biodata validity.

Third, past meta-analytical work has failed to consider the construct validity and intended construct domains of biodata scales. Table 1 compares major recent biodata summary studies to this study, and as seen, none empirically delineated biodata scales by intended construct domain. Although a strength of biodata is that it is a measurement method and can therefore assess a range of constructs (e.g., knowledge and procedural skills, leadership), this has resulted in challenges interpreting the construct validity of biodata. For example, a biodata scale designed to reflect intellectual ability will differ substantially from one that reflects social skills; these scales represent separate construct domains that would be expected to demonstrate differential correlations with work outcomes (e.g., problem-solving at work) and other psychological measures (e.g., extraversion). Construct domain thus serves as a factor that must be considered when comparing the validity of different biodata scales.

Although this has been previously acknowledged (e.g., Bobko et al., 1999; Mumford et al., 1996, 2012), we still do not have a firm empirical understanding of the construct validity of biodata scales. Past biodata meta-analyses have not distinguished biodata scales in terms of intended construct domains; instead, biodata scales have been broadly lumped together. Thus, even though per the last meta-analysis (Bliesener, 1996) and a large consortium study (Rothstein et al., 1990) biodata is estimated to correlate about .32–.33 with job performance at work, it is unclear whether these estimates apply to all biodata scales consistently (Table 1). A lack of construct domain differentiation has left our current meta-analytic estimates of biodata validity limited and has hampered biodata theory.

As such, this study differentiates biodata scales according to intended construct domains, introducing a biodata taxonomy to help make sense of the literature and then summarizing results according to this taxonomy. This work builds on past biodata research (e.g., Mumford & Owens, 1987; Mumford et al., 1996; Oswald et al., 2004; Schmitt & Golubovich, 2013; Sisco & Reilly, 2007) and contributes by providing a much more precise understanding of biodata validity in terms of correlations with job performance, separated by biodata construct domain. This is in line with Lievens and Sackett's (2017) modular approach to selection instruments and their call to conduct meta-analyses that break down predictor–method relationships by defined method factors and by targeted constructs. By differentiating the construct domains of biodata, researchers and practitioners can develop a more precise understanding of the validity of this measurement method.

Fourth, although several primary studies have recently examined the impact of scoring method on biodata validity (e.g., Cucina et al., 2012), studies are generally limited to single biodata inventories and applications within the same or similar organizations. For example, Cucina et al. (2012) is an excellent study that demonstrates the benefits and weaknesses of various biodata scoring methods across various sample sizes. However, these analyses were performed on a single biodata inventory and in a single organizational context, and conclusions regarding the superiority of scoring method would be strengthened if findings were generalized across more biodata inventories and across samples. Within a single study it is possible that a rational scoring key was poorly developed, thus serving as a reason why an empirical key may be superior. For more robust confidence regarding the superiority of either method, validity must be compared across different inventories and different contexts. This is important because there is still debate regarding the appropriateness of different scoring methods (e.g., rational, empirical) in terms of validity. Contemporary narrative reviews of biodata are noncommittal or inconclusive regarding the predictive superiority of any single scoring method (Breaugh, 2009; Schmitt & Golubovich, 2013), and this is likely fueled by not having synthesized empirical evidence regarding these methods. Many researchers and practitioners simply do not know which of these methods should be preferred if the goal is to maximize criterion-related validity. A lack of meta-analytical evidence regarding scoring method is thus a knowledge gap; such meta-analytical evidence would have a substantial impact on how practitioners apply biodata in the field, and as this paper will show, scoring method is a major moderator of biodata validity.

Fifth, our understanding of an assessment matures as we increasingly understand how it relates to other constructs and measures. An often overlooked but important concern is how hiring methods correlate with other known predictors commonly used to hire job applicants. Knowledge of predictor intercorrelations allows for an understanding of construct validity by empirically establishing nomological networks (Cronbach & Meehl, 1955). It also has practical implications: understanding whether an assessment will provide unique variance over an existing selection measure is useful when developing an assessment procedure. For example, if biodata measures correlate very highly with other common measures such as personality assessments or cognitive ability tests, biodata scores would be redundant when added to hiring protocols. On the other hand, predictors that exhibit less overlap are more capable of providing unique variance over and above one another. For these reasons, it is important to investigate the correlations between biodata scores and scores from other assessment measures. Thus, this study also contributes to the literature by establishing correlations between biodata scores (by construct domain) and other common preemployment assessments. As shown in Table 1, current large-scale evidence of such relationships is limited.

Taken together, there is an obvious need for an updated and more comprehensive meta-analysis of biodata validity, and especially one that provides a nuanced understanding of biodata inventories by differentiating biodata validity according to construct domain and scoring method, as well as in terms of both criterion-related validity and relationships with other external selection assessments. We begin by describing what biodata is in terms of format, construct domain, and item scoring methods. We also introduce a taxonomy to code existing biodata scales into meaningful construct domains. We then examine the criterion-related and construct-related validity by construct domain and scoring method.

Table 1
Major Recent Large-Scale Biodata Summary Studies in Comparison to This Study

Criterion-related validity (cell entries: sample size (number of studies), uncorrected correlation (corrected correlation)):
- Rothstein et al. (1990); consortium; empirical/hybrid scoring. Performance ratings: 11,332 (1), .27 (.33).
- Bliesener (1996); meta-analysis. Performance ratings: 14,025 (16), .32 (NA); objective performance: 3,582 (19), .53 (NA); training performance: 35,256 (49), .36 (NA).
- Cucina et al. (2012); single study; N = 5,272 (1). Performance ratings: .25 (NA) overall; .35 (NA) empirical/hybrid scoring; .15 (NA) rational scoring.
- This study; meta-analysis. Reports all five criteria (performance ratings, specific performance dimensions, objective performance, training performance, advancement potential), both omnibus and by specific construct domain, separately for empirical/hybrid and rational scoring.

Convergent correlations:
- Bobko et al. (1999); meta-analysis. GMA: 1,363 (2), .19 (NA); conscientiousness: 1,363 (2), .51 (NA).
- This study; meta-analysis. Reports GMA and all FFM traits (E, A, C, ES, O), both omnibus and by specific construct domain, separately for empirical/hybrid and rational scoring.

Note. The chosen articles were viewed as seminal, large-scale studies completed in the past several decades, with some recommended by the review team. Within a cell, data are presented as sample size (number of studies), uncorrected correlation (corrected correlation); corrected correlations were corrected for criterion unreliability and range restriction. In studies where effect size estimates were provided for both empirical and rational scoring, we averaged the two effect sizes for the overall results in this table. GMA = general mental ability; E = extraversion; A = agreeableness; C = conscientiousness; ES = emotional stability; O = openness to experience. Omnibus refers to biodata analyses without delineation by construct domain.
What Is Biodata?

Biodata inventories have a long history of use and research. This measurement method is based on the notion that past behavior is the best predictor of future behavior. Thus, biodata inventories include questions about a person's life history. Early accounts of biodata instruments can be traced to the 19th century, when they were utilized for hiring insurance agents (Stokes, 1994). Since then, biodata inventories have been administered across a range of settings. For example, they were used during World War II to select pilots, with the question "Did you ever build a model airplane that flew?" holding its place in biodata folklore as one of the best life predictors of success in that role. More commonly, biodata inventories have been used across the public and private sectors as part of larger selection batteries for a range of different job types.

In terms of a definition, most researchers will agree that biodata deals with describing behaviors and events occurring earlier in one's life, including personal background and life history events (Asher, 1972; Hough, 2010; Mael, 1991; Mumford et al., 2012; Nickels, 1994; Owens & Schoenfeldt, 1979). Furthermore, most acknowledge that biodata inventories are standardized measures about a person's past (Mumford et al., 2012). For this study, we define biodata based on Mael's (1991) seminal biodata review, which outlined a theory and taxonomy of biodata features that is still widely followed today. Namely, the defining attribute of biodata is that it is historical in nature (i.e., deals with past events) and that it avoids overly generalized perceptions of oneself across many different types of situations. Thus, questions that are generalized perceptions across contexts, such as "I am a social person," are not biodata, nor are situational judgment questions or questions that deal with a future frame of reference. However, perceptions and attitudes regarding a person's past behaviors are biodata. For example, asking someone how they typically behave in social situations (Figure 1) would constitute a biodata item, even though the item is based on a person's self-assessment of their past behavior.

Figure 1
Example Biodata Items

(a) When working on teams, I typically ...
    - am supportive of other employees' ideas.
    - help implement others' plans to perfection.
    - provide suggestions to the group on what to do.
    - take the lead when no one else will.

(b) In the past year, how many times were you late to school or work?
    - Never
    - Once
    - Twice
    - Three-to-four times
    - More than four times
    - In the past year I was not involved in work or school

A few clarifying points about biodata inventories must be made when discussing the definition of biodata. First, biodata inventories are a method of measurement and therefore can assess a wide variety of constructs. According to ecological theory (e.g., Breaugh, 2009; Mael, 1991; Mumford et al., 1990), individuals enter life with a hereditarily determined set of individual differences and environmental resources, and via adaptation to the environment, that person is shaped over time. A person is exposed to or chooses which situations to engage in, which results in required adaptations to attain goals within those contexts. This produces a cyclical and interactive process where situational exposure and choice lead to adaptation, which in turn leads to future choices and changes that shape the qualities of the individual. This theory is further bolstered by social identity theory, such that experiences that are salient to one's self-identity shape future behavioral tendencies and self-views (Mael, 1991).

The major takeaway is that past history can explain currently existing differences in people, which in turn are related to important criteria (e.g., job performance). These past experiences can be conceptually grouped into factors or components, such that certain experiences are expected to covary; this has been demonstrated both empirically and based on a priori theory (e.g., Cucina et al., 2013; Karas & West, 1999; Mumford & Owens, 1987; Mumford et al., 1996; Oswald et al., 2004; Sisco & Reilly, 2007; Stricker & Rock, 1998). These covarying experiences share a similar construct domain, acting either as reflective indicators caused by a latent construct or serving as causal indicators that influence a formative construct.

The distinction of formative versus reflective may not always be clear for biodata, given the interactive assumptions of the ecological model. However, formative domains are most likely to occur when situational exposure is not under the control of the person but nonetheless is likely to shape a person's qualities (i.e., input variables, per the language of Owens & Schoenfeldt, 1979). Parental warmth would be an example representing this, as taken from Mumford and Owens (1987). Reflective biodata indicators, on the other hand, are more directly caused by the latent construct domain(s). For example, questions about how many hours someone spent studying are largely a function of one's achievement motivation (or other related traits). As another example, getting higher grades than others, quickly reading books, and more capably solving problems than one's peers are all at least partial indicators of intelligence as well. In this regard, biodata items are very capable of measuring traditional psychological constructs such as personality traits, cognitive abilities, knowledge, and interests.

Second, there has been debate whether biodata inventories should only capture objective past behaviors (i.e., "hard" content, e.g., "Have you ever formally supervised a team?") or whether attitudes and perceptions of past behaviors should be included as well (i.e., "soft" content, e.g., "Do you argue with peers at work?"). When biodata assessments are "soft" and veer from more objective items, the distinction between biodata inventories and other self-report assessments (e.g., personality inventories) may become blurred. This concern is valid, though also to be expected. Self-report personality assessments and biodata inventories are both methods of assessment, and the domain of personality constructs (which can be measured by many methods, e.g., self-report, peer report, assessment centers, narratives) is tremendously broad considering personality is generally defined as a person's typical pattern of thoughts, behaviors, and emotions (Funder, 2016).
Given this incredible scope of behaviors encompassed within the personality spectrum, biodata inventories will inevitably capture personality-related variance. However, the key to differentiating biodata inventories from self-report personality assessments is relying upon Mael's (1991) definition that biodata is historical in nature (i.e., deals with past events) and avoids overly generalized perceptions of oneself across many different types of situations. Based on this, some items used in traditional personality inventories clearly could be incorporated within biodata inventories (e.g., "I talk to a lot of different people at parties" from the International Personality Item Pool) and are incorporated in biodata inventories. On the other hand, items such as "I radiate joy" would not constitute biodata. Thus, biodata inventories can be seen as similar to self-report personality inventories but narrower in definition while also capable of assessing a wider array of constructs.1

Taxonomy of Biodata Constructs

Different construct domains represent different clusters of covarying behaviors, which in turn should relate differently to other variables. Like other measurement methods (e.g., interviews: Huffcutt et al., 2001; assessment centers: Meriac et al., 2008; personality inventories: Barrick & Mount, 1991), construct domain should impact both the construct validity and criterion-related validity of biodata scores. However, one of the primary theoretical voids in the biodata literature is the failure to consider such construct differences. To remedy this, a classification scheme to code biodata instruments into construct domains was developed.

Before describing this effort, it should be emphasized that there will never be a single agreed-upon taxonomy with which to address biodata. Biodata can assess far too many constructs, and those constructs vary substantially in terms of specificity. Nonetheless, we created a taxonomy specifically for this study. In doing this, several criteria were applied in efforts to describe the existing literature: (a) the taxonomy should be capable of encompassing all forms of knowledge, skills, abilities, and other characteristics and experiences; (b) the taxonomy should be at a moderate level of specificity so as to be comprehensive but parsimonious; (c) the taxonomy should align both with past biodata research and with contemporary conceptualizations of assessment and performance domains; and (d) emphasis was given to job-related construct domains, ignoring those with very distal conceptual relationships to job criteria (e.g., parental warmth, fear, socioeconomic status). With those criteria in mind, we began reviewing existing taxonomies (e.g., Mumford & Owens, 1987; Mumford et al., 1996; Oswald et al., 2004; Sisco & Reilly, 2007), and the research team triangulated communalities across them. Additionally, we consulted past efforts to classify constructs from other assessment methods and relied upon Huffcutt et al.'s (2001) interview construct taxonomy, which has been used for other methods-based meta-analyses (Christian et al., 2010). We also spoke with researchers and practitioners who have used biodata inventories. After doing all this, a classification scheme of 11 construct domains was specified, along with an "other" category2 and an overall composite score category. Table 2 displays these classifications, with definitions and examples of scales that would be coded within each category.

There are a few aspects to note regarding this classification scheme. First, many domains come from well-known constructs (e.g., conscientiousness). Because biodata inventories are often developed to meet very specific job or company needs, other domains reflect more generic categories that can encompass different types of content, such as knowledge and procedural skills. Although a researcher could try to code all individual types of knowledge into separate domains, the process would become unwieldy. Additionally, and to maintain parsimony, similar domains such as emotional stability and self-confidence were grouped together. The purity of this distinction is less than either by itself, but evidence suggests the two constructs are highly related and in fact are often treated within the same construct space (Judge & Bono, 2001). Other categories are defined solely by the intended outcome of prediction, assuming that the outcome has a well-defined construct space, such as academic achievement.

Finally, overall composite scales are scales that incorporate items from multiple construct domains into one measure. Overall composite scores reflect a direct aggregation of biodata item scores into a single overall composite or the aggregation of multiple biodata scale scores into a composite. These overall scores are often used in practice within larger inventories to facilitate work decisions (e.g., to hire applicants) because they are designed to comprehensively sample behaviors from the entire job domain. Thus, overall composite scores do not reflect a single construct but rather serve as broad indices to predict work outcomes. Overall composite scores share similarities with composite scores used from other selection measures such as assessment centers (i.e., overall assessment rating), interviews (i.e., composite score), multidimensional personality inventories (i.e., job-relevant composite or profile score), and multidimensional situational judgment tests (i.e., composite score). Construct-related understanding is diminished by collapsing across multiple construct domains, though this is offset because the composite score more comprehensively spans the domain of job-relevant variance and therefore should be the most predictive biodata operationalization.

Considering how these domains will differ in terms of criterion-related validity, assessments that capture a breadth of job-related content should exhibit higher criterion-related validity for broad outcomes like overall job performance, in line with bandwidth-fidelity theory (e.g., Ones & Viswesvaran, 1996; Schneider et al., 1996). This is because broad-bandwidth measures are more likely to assess the many KSAOs that are important at work. In addition to overall job performance, we also examine relationships with construct-aligned performance ratings that target the same behavioral domain as the predictor. This has recently been advocated (Christian et al., 2010), as it provides clarity regarding why predictors relate to outcome variables. In line with bandwidth-fidelity theory and construct correspondence theory (e.g., Judge et al., 2013), it is likely that narrow biodata domains will correlate more strongly with construct-aligned performance ratings. Finally, relationships with other performance outcomes (objective performance metrics, training performance, advancement potential) were also examined. Taking an exploratory approach that focuses on establishing empirical evidence across the many measured biodata construct domains:

Research Question 1: What is the relationship between biodata scores and work outcomes (by biodata construct domain)?

1 The similarities between biodata inventories and self-report personality inventories can be frustrating for researchers. Although Mael (1991) and other biodata experts (e.g., Asher, 1972) seem to prefer that biodata be objective and external, Mael and contemporary perspectives (Schmitt & Golubovich, 2013) describe biodata using both "soft" and "hard" items (Asher, 1972).

2 A reviewer indicated the "other" category is of little theoretical or practical benefit. We agree results for this category are not greatly informative, and yet some researchers might prefer comprehensive coverage of all biodata research within this meta-analysis. As such, we have placed biodata results for the "other" category in Appendix B.
Table 2
Biodata Construct Domain Classifications

Mental capacity. Demonstrates ability to use and manipulate information, learn, and solve problems. Example categorizations: verbal skills (Reiter-Palmon & Connelly, 2000); learning ability (Karas & West, 1999).

Knowledge and procedural skills. Possesses domain-specific knowledge (e.g., product knowledge) or procedural and technical skills (e.g., can use certain tools) through experience. Example categorizations: knowledge (Oswald et al., 2004); technical knowledge (Stokes et al., 1999).

Social skills/sociability. Demonstrates effective communication, the capacity to influence others, persuasion, sociability, and interpersonal competence. Example categorizations: extraversion (Cucina et al., 2013); social adjustment (Reiter-Palmon & Connelly, 2000).

Agreeableness. Demonstrates tendency to work well with others, altruistic behavior, compliance, and modesty. Example categorizations: agreeableness (Cucina et al., 2013); focus on others (Stokes et al., 1999).

Conscientiousness. Demonstrates organization, tidiness, and deliberation; hard-working, persistent, and driven by success. Example categorizations: conscientiousness (Cucina et al., 2013); need for achievement (Stricker & Rock, 1998).

Emotional stability and self-confidence. Demonstrates positive view of self and abilities. Shows tendencies to be calm, unworried, resistant to stress, and has a positive self-concept. Example categorizations: emotional stability (Cucina et al., 2013); calmness/self-assurance (Stokes et al., 1999).

Openness to experience. Demonstrates an appreciation of art, music, and poetry; curious, creative, imaginative, and considers new ideas. Example categorizations: openness to experience (Cucina et al., 2013); artistic (Oswald et al., 2004).

Leadership. Scale developed explicitly to assess leadership tendencies, experiences, preferences, or propensity. Demonstrates dominance and assertiveness; motivates others in a group. Example categorizations: leadership (Uhlman & Mumford, 1993); leadership (Karas & West, 1999).

Academic achievement. Scale developed explicitly to measure academic achievement, which includes succeeding in school and obtaining good grades. Example categorizations: GPA biodata key (Oswald et al., 2004); high-school achievement/college achievement (Gandy et al., 1994).

Interests/preferences. Shows an attitude or preference toward some specific activity, object, or event. Example categorization: Army identification (Wasko et al., 2019).

Physical fitness. Is physically fit (i.e., cardiovascular, physical strength, flexible); exercises. Example categorization: fitness motivation (Putka & Bradley, 2008).

Other. Any single-construct domain that does not fit into the preceding categories. Example categorizations: health (Oswald et al., 2004); multicultural (Oswald et al., 2004).

Overall composite scale. Scale consisting of multiple or disparate construct domains for the purposes of decision making (e.g., applicant hiring). Example categorization: individual achievement record (Gandy et al., 1994).

Note. All biodata scales were classified into the taxonomy separately by two raters, with an agreement of 89%. Coders met to resolve any rating disagreements by referring to the original article and then making a consensus decision.

Item Format and Scoring

There is great variability in how biodata items are scored. Item scoring can generally be differentiated into three broad types: (a) rational keying, (b) empirical keying, and (c) hybrid blends of the two. Rational keying uses expert judgments to assign points to response options in efforts to assess specific constructs (e.g., Mumford et al., 1996; Oswald et al., 2004; Reiter-Palmon & Connelly, 2000; Stricker & Rock, 1998). Using Figure 1a as an example, if item developers created this item to assess leadership, the third and fourth options would receive points when scoring the item and the alternate options would not; this ensures that only options theoretically aligned to the targeted construct domain are awarded points. Rational scoring may be beneficial because it produces a clearer understanding of scale meaning.

This contrasts with empirical keying, which weights item responses by how well they covary with important criteria (e.g., job performance, turnover). The empirical weighting can occur at the response option level (e.g., via point-biserial correlations with the outcome) or by using empirical methods to remove or optimally weight items via regression-based methods (Cucina et al., 2012; Putka et al., 2018). Cucina et al. (2012) provide an overview of various empirical scoring methods. Such methods work by assigning more weight to response options or items that are more strongly associated with desirable performance outcomes. Using the last example, response choices from each option in Figure 1a would be weighted by their correlations with the criterion (e.g., leadership ratings) when using the point-biserial method described by Cucina et al. (2012). Although empirical keying will score biodata inventories in a purely empirical manner, best practice still incorporates the development of questions based on job analysis or explicit domain sampling (Fine & Cronshaw, 1994; Russell, 1994). Thus, empirical keying does not preclude the presence of theory when developing biodata inventories, and contextual considerations and choice of criteria can further be incorporated to apply theoretical justification to the practice of empirical keying (Speer et al., 2020).

Finally, hybrid methods incorporate aspects of each approach by assigning weights to options that covary with important criteria or constructs, but only doing so if the weighting scheme makes conceptual or theoretical sense. Returning to Figure 1a as an example, if the fourth option were negatively correlated with the criterion but the test developers were trying to measure leadership, they might assign a weight of zero instead of a negative value, given this option is theoretically related to assertive leadership behaviors. Hybrid scoring thus balances the power of empirical scoring with a more theory-driven approach.
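To make the option-level logic concrete, the following sketch implements point-biserial empirical keying for a single multiple-choice item, building the key on a calibration sample and evaluating it in an independent holdout sample (the design this meta-analysis requires of empirically keyed effect sizes). It is a minimal, hypothetical illustration of the general technique; the data and names are invented and it is not the procedure of any particular primary study.

```python
import numpy as np

def build_empirical_key(responses, criterion, n_options):
    """Point-biserial empirical key: one weight per response option.

    responses: (n,) int array of chosen option indices (0-based).
    criterion: (n,) float array (e.g., supervisor performance ratings).
    Each option's weight is the point-biserial correlation between
    choosing that option (dummy-coded 0/1) and the criterion.
    """
    weights = np.zeros(n_options)
    for opt in range(n_options):
        chose = (responses == opt).astype(float)
        if chose.std() > 0:  # option needs some variance to correlate
            weights[opt] = np.corrcoef(chose, criterion)[0, 1]
    return weights

def score_item(responses, weights):
    """Apply calibration-sample weights to (new) respondents."""
    return weights[responses]

# Hypothetical calibration and holdout samples for a 4-option item.
rng = np.random.default_rng(0)
calib_resp = rng.integers(0, 4, size=500)
calib_perf = rng.normal(size=500) + 0.3 * (calib_resp == 3)
key = build_empirical_key(calib_resp, calib_perf, n_options=4)

holdout_resp = rng.integers(0, 4, size=200)
holdout_perf = rng.normal(size=200) + 0.3 * (holdout_resp == 3)
scores = score_item(holdout_resp, key)
# Cross-validated validity estimate, free of capitalization on chance:
print(np.corrcoef(scores, holdout_perf)[0, 1])
```

A hybrid key would follow the same steps but then zero out any estimated weight whose sign contradicts the targeted construct (e.g., a negative weight on the assertive option of the leadership item in Figure 1a).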
With a large enough sample to offset capitalization on chance (i.e., overfitting), empirically driven scoring should maximize criterion-related validity in new settings. Nonetheless, many researchers have criticized empirically scored biodata. There are several reasons for this. First, there are challenges in inferring construct validity when empirical scoring is used. Rational scales are more interpretable and easier to communicate. Interpretability is relevant in many contexts, though there are ways that researchers can establish construct validity of empirically derived scales (Speer et al., 2020). Second, past comparisons between rational and empirical methods have sometimes suggested there is no difference in criterion-related validity between these scoring methods (e.g., Hough & Paullin, 1994; Jackson, 1975; Schoenfeldt, 1999). However, past work espousing that empirical scoring does not improve prediction has methodological limitations. Chief among them is that most past studies have not held confounding factors constant when making comparisons, such as restricting item content to be identical across comparisons, or have looked at different criteria when making comparisons (Hough & Paullin, 1994; Jackson, 1975; Schoenfeldt, 1999; Stokes & Searcy, 1999). In cases where the exact same items are used and in the exact same contexts, evidence generally supports empirical scoring as having higher criterion-related validity, with Cucina et al. (2012) demonstrating this across various calibration sample sizes. That said, current meta-analytical evidence that contains data from multiple scales, jobs, and organizations, and across a large sample, does not exist to concisely summarize these differences. Such data could greatly help guard against false assumptions regarding scoring methods and provide definitive guidance for researchers and practitioners.

Research Question 2: Does the criterion-related validity of biodata scores vary by scoring method (by biodata construct domain)?

Construct Validity With Other Predictor Measures

An oft-overlooked but important issue is a measure's correlations with other predictors. Not only do assessment intercorrelations provide meaningful evidence regarding that measure's construct validity in terms of convergent and discriminant correlations, but the intercorrelations affect whether the hiring measure is likely to explain incremental variance over and above other measures. If a hiring assessment exhibits excessively large correlations with other predictors, there is less available unique variance to improve prediction of job performance. On the other hand, predictors that exhibit less overlap are more capable of providing unique variance over and above one another. For these reasons, it is important to investigate the correlations between biodata scores and scores from other commonly used prehire assessments.

Construct-related validity evidence is operationalized here by convergent and discriminant correlations with nonbiodata measures. The convergent assessments included as part of this study include the FFM personality traits (labeled as personality assessments when measured) and cognitive ability, as assessed via performance-based tests. Although other constructs and assessments are used in practice, these were chosen because they are commonly used within employee selection systems, they are constructs that have direct relevance to many of the biodata construct domains examined in this study, and because the included studies provided enough data in order to assess relationships with the biodata scores.

Conceptual alignment between biodata construct domains and external measures (i.e., expected stronger convergent correlations) was determined based on rational construct linkages and the existing literature. As a method of measurement, biodata can assess multiple constructs and, in many cases, very close equivalents to the FFM (e.g., Sisco & Reilly, 2007) or cognitive ability (Mount et al., 2000). Listing where similar constructs are measured and where larger correlations are therefore expected: the biodata domain of mental capacity overlaps conceptually with cognitive ability, the biodata domain of social skills/sociability overlaps with FFM extraversion through the enthusiasm facet of extraversion (DeYoung et al., 2007), the biodata domain of agreeableness overlaps with FFM agreeableness, the biodata domain of conscientiousness overlaps with FFM conscientiousness, the biodata domain of emotional stability and self-confidence overlaps with FFM emotional stability, and the biodata domain of openness to experience overlaps with FFM openness to experience. Though assessed via different methods, these sets of scores are assumed to measure either the same or very similar constructs.

Regarding the other expected relationships, the biodata domain of mental capacity should correlate with openness to experience, given that many cognitive ability measures correlate with this trait (e.g., Ackerman & Heggestad, 1997). The biodata domain of knowledge and skills should correlate with cognitive ability (e.g., Schmidt & Hunter, 2004). Furthermore, because conscientiousness is also generally related to job knowledge (Huang et al., 2013; McIlveen et al., 2013) and skill obtainment (e.g., Schmidt & Hunter, 1998), the biodata domain of knowledge and skills is also expected to be related to FFM conscientiousness. Social skills (i.e., the biodata domain of social skills/sociability) are expected to be correlated with FFM emotional stability (Joseph et al., 2015), and self-confidence (i.e., the biodata domain of emotional stability and self-confidence) is expected to be related to FFM extraversion via extraversion's dominance facet (Bosshardt et al., 1992). Openness to experience (i.e., the biodata domain of openness to experience) is linked to cognitive ability (Ackerman & Heggestad, 1997). The biodata domain of leadership should also be related to FFM extraversion and emotional stability given the shared construct overlap regarding the behavioral tendencies of assertiveness and dominance (Bosshardt et al., 1992; Judge et al., 2002). Finally, the biodata domain of academic achievement is expected to be correlated with FFM conscientiousness and cognitive ability, two traits commonly associated with academic success (Poropat, 2009).

Research Question 3: What are the relationships between biodata scores (by construct domain) and scores from FFM personality measures and cognitive ability tests?

Method

Literature Search Plan

Biodata Studies

Several rules were established to guide article inclusion for the meta-analysis. As a summary, we refer readers to Figure 2, which contains a PRISMA-style flowchart (Page et al., 2020) documenting the decision process. We also refer readers to the Online Supplemental Materials for lists of all effect sizes.
Figure 2
Study Inclusion Flowchart

Note. See the online article for the color version of this figure.

First, studies were restricted to measures that were labeled as biodata and not mixed measures that incorporated other types of assessments as well (e.g., situational judgment questions); for studies that provided example biodata items, this content had to adhere to the definition of biodata outlined previously in this paper for inclusion (see the "What Is Biodata?" section). We excluded open-ended resume data and any information provided from other sources (e.g., peer-provided biographical information). Second, we only included biodata scales that contained at least seven items, like Bliesener (1996). Scales with few items are unlikely to fully capture intended construct domains, nor are they likely to be sufficiently reliable. The requirement of at least seven items increases the chances that a scale adequately reflects the intended construct, is reliable, and therefore would be appropriate for use in practice. Third, we only included scales that contained descriptions regarding the content of the scale, which allowed sorting into construct domains, and we only coded articles where information regarding the coded variables was clear (e.g., what the outcome was, whether the scale was biodata, what the construct domain was). In cases where content, scoring method, or criterion were unclear, we opted on the side of caution and did not code. In cases where the scoring method could not be discerned but the scale was coded as biodata, we recorded the scoring method label as "unsure." Fourth, for any scale that used empirical scoring (at the item or option level), we only included results for holdout samples that were independent of scale creation or revision. This is vital to ensure empirical methods do not unfairly capitalize on chance (i.e., overfit) when calculating empirical relationships. Note that internal scale analyses (e.g., corrected item–total correlation analyses, factor-analytically determined scale composites) are not empirical scoring in the traditional sense, and thus these were not coded as such. Fifth, given that some biodata studies explored multiple combinations of items from the same item pool, we only incorporated the final scale (within construct category) for any given study.

Sixth, our primary focus was on criterion-related validity with job performance ratings. For this outcome, we only included studies where ratings were provided by supervisors, thus excluding any self-report or peer criteria, which can be confounded with the predictors themselves. Managerial performance ratings are "among the most commonly used and appropriate measures of performance" (p. 12, Principles for the Validation and Use of Personnel Selection Procedures, 2018).
Although other criteria can be used for validation purposes (e.g., archival metrics, production metrics), these are more likely to suffer from criterion deficiency and contamination, and the construct they target will vary considerably from setting to setting. Nonetheless, we also calculated criterion relationships with objective performance metrics, training performance, and advancement potential to provide a more comprehensive account of biodata's criterion-related validity across work outcomes.

Seventh, we only coded zero-order correlations or statistics that could be converted to observed correlations. In cases where a biserial correlation or a corrected correlation could not be converted to the observed value, the study was not coded. Eighth, we only coded primary effect sizes rather than papers that reported meta-analytic summaries of consortium studies. In cases where consortium studies were reported, attempts were made to contact authors for access to individual study effects.

Criterion and Convergent Variables

Overall job performance ratings and construct-aligned performance ratings were focused on as outcomes to compute criterion-related validity. (a) Overall performance ratings were operationalized as any single-item overall performance judgment or any composite of performance judgments broadly reflecting employee performance. We only included job performance ratings from supervisors or superiors. This was done for performance judgments and also for (b) construct-aligned performance ratings. Aligned performance ratings consist of any performance rating that was conceptually aligned to the biodata construct domain of interest (e.g., for a biodata scale assessing "leadership," an example aligned performance rating might be "leading others"). By incorporating aligned performance ratings, this study assessed biodata relationships when predictors had optimal theoretical alignment with criteria. As discussed by Christian et al. (2010), this allows for optimal clarity regarding the relationships between predictors and outcome constructs.

We also analyzed several additional criterion variables: objective performance metrics, training performance, and advancement potential. Objective performance metrics include regularly tracked performance measures such as counts, production rates, sales, timeliness metrics, quality metrics, quantity metrics, and financial metrics (Bommer et al., 1995; Pulakos & O'Leary, 2011). Training performance included instructor ratings, composite measures taken over a training period, or grades from training courses at work. Advancement potential was operationalized as judgments about the likelihood of employee advancement within the organization (e.g., leadership potential, potential to advance to senior levels of the company).

FFM personality traits and cognitive ability were included as convergent measures. Only self-report FFM measures were included. External measures were labeled according to the FFM if the scale was labeled as one of the FFM in the study (assuming no other information was presented that made us question whether the scale content reflected the construct domain). For personality scales without a direct FFM label, we mapped those scales to the FFM if enough information was available regarding the scale definition or scale content. In these cases, we required that the scale assess content relevant to both major facets of the FFM trait, per DeYoung et al.'s (2007) 10-facet taxonomy. Thus, we required that the measure broadly sample from the larger FFM construct domain for a mapping to occur. Paper-and-pencil or computer-administered cognitive ability tests were incorporated as measures of cognitive ability, including the ACT and SAT.

Article Search

We conducted a multipart search for relevant biodata articles containing primary data. We first conducted a comprehensive review of all studies published since Bliesener (1996), which included biodata articles dating back to 1993, and then later supplemented that with studies from Bliesener's article list. In the first stage of this, we began by reviewing seminal biodata papers and their references. We also examined studies that cited those papers. Second, we supplemented this by searching research databases (e.g., PsycINFO, Google Scholar) for relevant articles from 1993 onward using terms including biodata, biographical data, achievement record, weighted application blank, and life history. Third, we conducted an electronic search of the ProQuest Dissertations and Theses Global database to search for doctoral dissertations that could be included. Fourth, we used the Defense Technical Information Center (DTIC) to search online databases containing research and technical reports sponsored by the U.S. Department of Defense and other government agencies. Fifth, we reviewed conference programs for the past 10 years of the Society for Industrial and Organizational Psychology annual conference. We also contacted consulting companies for any available technical reports. Following this review of post-1992 studies, we examined and coded articles from Bliesener. In addition to this, the Biodata Handbook (Stokes et al., 1994) was inspected for remaining possible articles.

After accumulating a potential list of articles, the study authors reviewed them to determine if they met study criteria. If so, articles were assigned for review. In total, 180 independent samples of criterion correlations were examined in this study, and 63 independent samples containing convergent predictor correlations were analyzed. We refer readers to the Online Supplemental Materials for details regarding the included studies.

Coding Biodata Scales

Study authors coded biodata scales into the taxonomy provided in Table 2. Two raters independently made judgments of each article. All biodata scales were classified into the taxonomy separately based on scale content (agreement = 89%). Coders met to resolve any rating disagreements by referring to the original article and then making a consensus decision.

It was common for studies to include multiple biodata scales, and many of these pertained to the same construct domain within a sample. When this occurred, composite correlations were computed using the procedures outlined by Schmidt and Hunter (2015), and composite scale reliability was estimated as stratified α (Webb et al., 2006). In cases where multiple criteria or convergent measures were included in the same sample, composites were also formed via the composite formula. Simple averages were computed when the same construct domain was scored with multiple empirical keying methods. Meta-analyses were conducted separately within each construct domain (i.e., within each category listed in Table 2) and within each scoring method (empirical, rational, hybrid) to avoid issues of sample dependence. When multiple biodata scores were computed using different scoring methods within the same sample, averages were computed. Note that few studies used hybrid scoring, and thus meta-analytical results are rarely provided for this scoring method.
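To illustrate the compositing step described above, the sketch below applies the standard Schmidt and Hunter (2015) composite correlation formula for a unit-weighted sum of same-domain scales. The input values are hypothetical, and this is a simplified sketch rather than the authors' actual coding pipeline.

```python
import numpy as np

def composite_correlation(r_xy, R_xx):
    """Correlation of a criterion with the unit-weighted sum of k scales.

    r_xy: (k,) correlations of each scale with the criterion.
    R_xx: (k, k) scale intercorrelation matrix with unit diagonal.
    Standard composite formula (Schmidt & Hunter, 2015):
        r_composite = sum(r_xy) / sqrt(sum of all elements of R_xx)
    """
    r_xy = np.asarray(r_xy, dtype=float)
    R_xx = np.asarray(R_xx, dtype=float)
    return r_xy.sum() / np.sqrt(R_xx.sum())

# Hypothetical: two conscientiousness biodata scales in one sample.
r_xy = [0.25, 0.30]                  # each scale's r with performance
R_xx = [[1.0, 0.55],
        [0.55, 1.0]]                 # intercorrelation of the two scales
print(round(composite_correlation(r_xy, R_xx), 3))  # -> 0.312
```

The composite validity here (.31) exceeds the simple average of the two inputs (.275) because the scales are not perfectly correlated; compositing keeps one effect size per sample while still reflecting the full construct domain.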
Analyses

Criterion-Related Validity

We implemented Schmidt and Hunter's (2015) meta-analysis procedures, which compute average correlations and variances corrected for sampling error and for study artifacts. Because biodata scales are used to make hiring decisions, we calculated operational validity coefficients that do not correct for predictor unreliability. We corrected for range restriction and criterion unreliability using a mixed meta-analysis approach with partial artifact information. Only several studies reported normative data to account for range restriction, with only a few being cases of direct range restriction. We were able to locate an additional sample outside of our meta-analytic database that provided population and sample standard deviations for indirect range restriction (Rothstein et al., 1990), and we combined it with studies from our current meta-analysis to estimate artifact distributions. From the created artifact database, the sample-weighted average u-ratio was .92 (SD = .040) for cases of direct range restriction (k = 13, N = 13,344 incumbents).3 Of the studies included in the current meta-analysis, only a handful were coded as cases of direct range restriction. From the artifact database, the sample-weighted average u-ratio was .99 (SD = .004) for cases of indirect range restriction (N = 12,800 incumbents). Thus, indirect range restriction appears to have little effect on biodata variances. This likely occurs due to weaker biodata relationships with other selection tools, but also likely because applicant variance for self-report measures is often reduced due to faking. Given these u-ratios, only direct range restriction was corrected for. In studies where restricted and unrestricted variance estimates existed, direct range restriction corrections were implemented using study-specific values. In the handful of other studies in which direct range restriction occurred but where the necessary descriptive statistics were not published, the .92 average u-ratio described above was used.

Criterion unreliability was corrected for by using artifact distributions, hence making this a mixed meta-analysis. These artifact distribution corrections were applied to the partially (direct range restriction) corrected correlations. This involves correcting correlations and variance estimates using the mean and variance of the attenuation factor. Past meta-analytical estimates from Viswesvaran et al. (1996) were used for job performance ratings, with a mean interrater reliability of .52. For training performance, we calculated a sample-weighted average from eight individual reliability estimates listed by McNatt (2000) as well as three estimates from other studies (Lyons et al., 2001; Taylor et al., 2005); the sample-weighted average of these was .78 (SD = .092). Finally, for advancement potential, we calculated a sample-weighted average from six individual reliability estimates across four different studies (Allen et al., 2014; Carlson et al., 1999; Gentry et al., 2012; Gordon & Medland, 1965). The sample-weighted average of these was .86 (SD = .078).
selection tools, but also likely because applicant variance for self- Results
report measures is often reduced due to faking. Given these u-ratios, Criterion-Related Validity Estimates
only direct range restriction was corrected for. In studies where
restricted and unrestricted variance estimates existed, direct range Focusing on correlations with overall job performance (Table 3),
restriction corrections were implemented using study-specific va- strong correlations were found for overall composite scales
lues. In the handful of other studies in which direct range restriction (ρ = .37). However, validity coefficients were noticeably higher
occurred but where the necessary descriptive statistics were not validity for empirically derived scales (ρ = .44) than for rationally
published, the .92 average u-ratio described above was used. derived scales (ρ = .24), with empirical validity being .20 (83%)
Criterion unreliability was corrected for by using artifact dis- higher. The value of .44 for empirically derived overall composite
tributions, hence making this a mixed meta-analysis. These artifact scores is higher than past meta-analytical estimates. The SDp of
distribution corrections were applied to the partially (direct range empirical scores was also zero, suggesting that empirically scored
restricted) corrected correlations. This involves correcting correla- overall biodata scores exhibit consistently strong relationships with
tions and variance estimates using the mean and variance of the overall job performance across contexts. On the other hand, ratio-
attenuation factor. Past meta-analytical estimates from Viswesvaran nally derived scales had more modest validity. The 95% confidence
et al. (1996) were used for job performance ratings. A mean intervals for rational overall composite scales (.21–.28) and empiri-
interrater reliability estimate of .52 (SD = .095) was used for overall cal scales (.42–.46) did not overlap.
performance ratings, and an estimate of .53 (SD = .092) was used for Although it is tempting to view these results and claim superiority
construct-aligned performance ratings. of the empirical scoring method, there is an important confound. As
Criterion reliability information was also sparsely reported for the discussed in the introduction, a pure comparison of rational to
additional performance outcomes. Only a few studies included empirical scoring requires that the exact same items be used in
reliability estimates of objective performance. Thus, we chose to the exact same settings, with the only manipulated factor being the
combine those estimates with 39 objective metric reliability coeffi- scoring method. Given the data set, we were able to isolate results to
cients reported in Hunter et al. (1990). A sample weighted average
3
of these yields a reliability estimate of .78 (SD = .089). This is The average u-ratio were calculated by weighting by study sample size.
The inputs for these calculations can be seen in Appendix A. Unrestricted
similar to other meta-analytic estimates used in previous studies standard deviations came from applicant samples and restricted estimates
(Roth et al., 2003; Sturman et al., 2005; Van Iddekinge et al., came from incumbent samples. Thorndike (1949) Case II formula was used
2011). To assess the reliability of training performance, we for direct range restriction.
BIODATA META-ANALYSIS 11
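Stated in equation form, the computations above can be summarized as follows; this is an illustrative restatement of the Schmidt and Hunter (2015) procedures rather than the exact computational steps used in this study. A composite of p scales with mean intercorrelation \bar{r}_{xx} and mean scale-criterion correlation \bar{r}_{xy} has composite correlation

\[ r_{\text{composite}} = \frac{p\,\bar{r}_{xy}}{\sqrt{p + p(p - 1)\,\bar{r}_{xx}}}. \]

Direct range restriction was corrected with Thorndike's (1949) Case II formula, where u = s/S is the ratio of the restricted to the unrestricted standard deviation:

\[ r_c = \frac{r/u}{\sqrt{1 - r^2 + r^2/u^2}}. \]

Operational validities then corrected r_c for criterion unreliability only, \rho = r_c / \sqrt{r_{yy}}, whereas the fully corrected convergent correlations were disattenuated for unreliability in both measures, \rho = r_{xy} / \sqrt{r_{xx}\,r_{yy}}.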

Results

Criterion-Related Validity Estimates

Focusing on correlations with overall job performance (Table 3), strong correlations were found for overall composite scales (ρ = .37). However, validity coefficients were noticeably higher for empirically derived scales (ρ = .44) than for rationally derived scales (ρ = .24), with empirical validity being .20 (83%) higher. The value of .44 for empirically derived overall composite scores is higher than past meta-analytic estimates. The SDρ of empirical scores was also zero, suggesting that empirically scored overall biodata scores exhibit consistently strong relationships with overall job performance across contexts. On the other hand, rationally derived scales had more modest validity. The 95% confidence intervals for rational overall composite scales (.21–.28) and empirical scales (.42–.46) did not overlap.
Although it is tempting to view these results and claim superiority of the empirical scoring method, there is an important confound. As discussed in the introduction, a pure comparison of rational to empirical scoring requires that the exact same items be used in the exact same settings, with the only manipulated factor being the scoring method. Given the data set, we were able to isolate results to only those studies that used the exact same biodata content in the exact same settings (k = 18). Thus, we were able to perform a pure meta-analytic comparison of scoring method. Table 4 displays these results. When isolating to the same samples and exact same item content, large differences remained for overall biodata composite scales scored using empirical versus rational scoring, with empirical scoring exhibiting stronger validity.

Table 3
Relationships Between Biodata Scores and Overall Job Performance Ratings

Construct domain k N r ρ SDρ (SDr) 80% CV lower 80% CV upper

Overall composite scale 54 22,389 .27 .37 .00 (.06) .37 .37
Empirical 49 20,564 .31 .44 .00 (.06) .44 .44
Rational 22 16,279 .17 .24 .02 (.06) .21 .27
Conscientiousness 27 11,322 .14 .20 .07 (.10) .10 .29
Empirical 8 1,439 .27 .38 .00 (.08) .38 .38
Rational 26 9,565 .12 .17 .06 (.09) .09 .24
Academic achievement 3 614 .06 .08 .12 (.14) −.07 .23
Rational 3 614 .06 .08 .12 (.14) −.07 .23
Agreeableness 9 3,845 .12 .16 .09 (.11) .04 .28
Rational 8 2,088 .02 .03 .00 (.05) .03 .03
Social skills/sociability 19 7,879 .10 .15 .09 (.10) .04 .26
Rational 17 5,381 .05 .06 .02 (.07) .03 .10
Emotional stability and self-confidence 16 7,896 .12 .17 .06 (.08) .09 .24
Rational 15 6,139 .08 .11 .00 (.05) .11 .11
Leadership 7 2,233 .18 .25 .06 (.09) .18 .33
Empirical 3 844 .33 .46 .00 (.00) .46 .46
Rational 7 2,233 .17 .23 .03 (.08) .19 .28
Openness to experience 7 3,983 .13 .18 .13 (.14) .01 .35
Rational 6 2,226 .02 .02 .08 (.09) −.08 .12
Knowledge and skills 6 1,420 .02 .03 .14 (.15) −.15 .21
Rational 4 1,165 −.04 −.06 .04 (.07) −.11 −.01
Mental capacity 14 4,555 .12 .17 .08 (.10) .07 .27
Rational 11 3,253 .09 .13 .03 (.07) .09 .17
Physical fitness 11 4,348 .19 .27 .10 (.12) .14 .40
Rational 11 4,348 .19 .27 .10 (.12) .14 .40
Interests/preferences 4 3,101 .07 .09 .07 (.09) .00 .19
Rational 4 3,101 .07 .09 .07 (.09) .00 .19
Single-construct domains—Empirical 16 3,411 .28 .40 .00 (.08) .40 .40
Single-construct domains—Hybrid 3 2,318 .20 .28 .07 (.09) .19 .37
Single-construct domains—Rational 42 12,545 .11 .15 .06 (.09) .07 .23

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = operational validity coefficient corrected for range restriction and criterion unreliability; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Results shown by construct domain (not separated by method) average across methods within construct and within sample, meaning some samples scored the same content using different methods; this is why k for overall construct domain rows may be lower than the sum of the empirical and rational totals. Hybrid results were included but there were few. "Single-construct domains" represent the average of all individual biodata construct domain scales besides overall composite scales. For these analyses, all correlations were aggregated within each study setting. These "single-construct domain" analyses, in addition to all effects shown in this table, contain correlations from the "Other" biodata domain results, as shown in Appendix B.
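As a rough arithmetic check on the Table 3 values above, note that most samples required only the criterion unreliability correction, because indirect range restriction was left uncorrected and direct range restriction applied to few studies. Dividing the mean uncorrected correlations by \sqrt{.52} reproduces the tabled operational validities to within rounding:

\[ \rho \approx \frac{\bar{r}}{\sqrt{.52}}: \qquad \frac{.27}{.721} = .37, \qquad \frac{.31}{.721} = .43, \qquad \frac{.17}{.721} = .24. \]

The small remaining discrepancy for empirical composites (.43 here vs. .44 in the table) reflects the study-specific direct range restriction corrections applied before the reliability correction.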

Table 3 also contains average validity coefficients with overall performance for the narrow biodata domains. Conscientiousness biodata scales had the most effect sizes, and only conscientiousness and leadership had enough studies to compare rational to empirical scoring, though the comparison for leadership is based on a small number of studies. In each case, empirical scoring produced higher validity coefficients than rational scoring (.38 vs. .17 for conscientiousness; .46 vs. .23 for leadership). To further investigate the importance of scoring method, an additional analysis was conducted to compare empirical to rational scoring using an aggregated operationalization across all narrow biodata domains. All narrow scales were aggregated within scoring method within each sample. Although this muddles exactly what constructs are being compared, this approach results in a larger number of samples and larger sample sizes to compare the effect of scoring method.
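For readers who want the computational gist, the ρ, SDρ, and credibility values reported in the tables can be approximated with a bare-bones version of the procedures described in the Analyses section. The sketch below is a minimal illustration (the function name and interface are our own, and artifact-distribution refinements are omitted); it is not the code used in this study.

import numpy as np

def bare_bones_meta(r, n, a):
    # r: observed study correlations; n: study sample sizes;
    # a: per-study attenuation factors (e.g., sqrt of criterion reliability,
    #    possibly combined with a range-restriction factor), so rho = r / a.
    r, n, a = (np.asarray(x, dtype=float) for x in (r, n, a))
    r_bar = np.average(r, weights=n)                   # sample-weighted mean r
    var_r = np.average((r - r_bar) ** 2, weights=n)    # observed variance of r
    var_e = (1.0 - r_bar ** 2) ** 2 / (n.mean() - 1)   # expected sampling-error variance
    a_bar = np.average(a, weights=n)                   # mean attenuation factor
    rho = r_bar / a_bar                                # corrected mean correlation
    sd_rho = np.sqrt(max(var_r - var_e, 0.0)) / a_bar  # residual SD of true correlations
    return rho, sd_rho, (rho - 1.28 * sd_rho, rho + 1.28 * sd_rho)  # 80% CV

# Hypothetical example: three studies corrected for criterion unreliability (.52).
rho, sd_rho, cv = bare_bones_meta([.28, .31, .25], [220, 540, 310], [np.sqrt(.52)] * 3)

When the residual variance is zero, as for several empirical rows in Table 3, the credibility interval collapses to the point estimate, which is why some CV bounds equal ρ.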

Table 4
Comparing Scoring Method Validity With Overall Job Performance Ratings in Same Settings and Same Biodata Items

Construct domain k N r ρ SDρ (SDr) 80% CV

Overall composite scale—Empirical 18 13,907 .33 .45 .00 (.04) .45, .45
Overall composite scale—Rational 18 15,076 .17 .23 .01 (.06) .22, .25

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only included studies that compared empirical to rational scoring using the same samples and same set of items, with the only manipulated variable being scoring method.
As shown at the bottom of Table 3, empirically keyed narrow biodata scales were most strongly related to overall job performance (ρ = .40), followed by hybrid scales (ρ = .28) and then by rational scoring (ρ = .15).
To more thoroughly compare individual domains, we isolated results to only rational scoring, where most domains had small-to-moderate validity coefficients, with the strongest effect sizes occurring for physical fitness (ρ = .27), leadership (ρ = .23), and conscientiousness (ρ = .17). Most single-construct domains exhibited large credibility intervals, suggesting such scales are likely to be useful selection tools in certain settings but not in others. This highlights the importance of performing job analysis to link job demands to targeted construct domains.
Table 5 contains correlations with aligned performance ratings made by employees' superiors, providing a more comprehensive understanding of biodata construct validity. The number of studies and the total sample sizes are smaller, though, and as such results should be interpreted with caution. As seen in Table 5, correlations between biodata scores and theoretically aligned performance ratings were moderate for all construct domains. As expected, correlations were higher with aligned performance ratings (Table 5) than with overall job performance (Table 3) for all comparisons besides empirically scored conscientiousness and empirically scored leadership. Regarding the former effect, conscientiousness variance is saturated among nearly all work tasks. Reflecting this, performance dimensions related to conscientiousness (e.g., effort) exhibit among the strongest loadings on general performance factors (Viswesvaran et al., 2005). Empirically scored keys will ultimately assess the constructs most related to job performance, assuming the biodata items possess content validity relevant to those constructs (Speer et al., 2020). Because conscientiousness-related variance is relevant across most job duties, it is not surprising that the effects do not differ much when comparing correlations with overall performance versus specific, aligned performance competencies. For leadership, there were only three studies involved in each comparison, and thus we are hesitant to overinterpret. However, Viswesvaran et al. (2005) also found that leadership performance ratings have among the highest loadings on a general performance factor, likely resulting in a similar effect as that found for conscientiousness.
As before, in addition to individual effect sizes, we also show results where all narrow scales were aggregated within scoring method within each sample to provide more stable effect sizes. There was little difference in effect sizes between empirical (ρ = .34) and rational scoring (ρ = .32) when correlating scores with construct-aligned performance ratings. Thus, whereas empirical scoring was superior when predicting overall job performance, rational scoring seems capable of capturing targeted and specific performance variance. However, it should be noted that validity for empirical scales predicting construct-aligned performance ratings was consistent across settings (SDρ = .00), whereas there was more validity fluctuation for rational scales.
Finally, hybrid scoring exhibited the strongest correlation with construct-aligned performance ratings (ρ = .42), though this effect was based on only three studies.

Table 5
Relationships Between Biodata Scores and Aligned Job Performance Ratings

Construct domain k N r ρ SDρ (SDr) 80% CV lower 80% CV upper

Conscientiousness 9 3,125 .22 .30 .09 (.11) .19 .42
Empirical 3 242 .27 .37 .11 (.16) .23 .50
Rational 7 2,976 .21 .29 .09 (.11) .18 .41
Agreeableness 6 2,002 .21 .29 .05 (.09) .23 .36
Rational 5 1,915 .22 .31 .04 (.08) .26 .36
Social skills/sociability 14 4,104 .24 .34 .05 (.09) .28 .40
Empirical 6 1,027 .29 .40 .00 (.09) .40 .40
Rational 11 3,810 .22 .32 .05 (.09) .25 .38
Emotional stability and self-confidence 7 2,865 .23 .31 .13 (.15) .15 .48
Empirical 3 254 .16 .22 .07 (.14) .13 .31
Rational 4 1,375 .09 .12 .03 (.07) .08 .16
Leadership 5 1,959 .21 .28 .02 (.07) .26 .31
Empirical 3 1,230 .19 .26 .05 (.08) .20 .32
Rational 3 1,281 .24 .34 .00 (.03) .34 .34
Openness to experience 7 1,063 .31 .42 .10 (.13) .30 .54
Empirical 5 381 .28 .39 .12 (.17) .24 .55
Rational 4 863 .27 .37 .11 (.14) .23 .51
Knowledge & skills 6 1,011 .29 .40 .00 (.04) .40 .40
Empirical 4 329 .34 .46 .00 (.04) .46 .46
Rational 4 863 .28 .39 .00 (.02) .39 .39
Mental capacity 10 1,779 .20 .27 .11 (.14) .13 .40
Empirical 6 1,027 .30 .41 .02 (.10) .39 .44
Rational 5 924 .20 .27 .00 (.07) .27 .27
Single-construct domains—Empirical 12 2,104 .25 .34 .00 (.08) .34 .34
Single-construct domains—Hybrid 3 1,885 .31 .42 .17 (.18) .21 .64
Single-construct domains—Rational 13 4,539 .23 .32 .06 (.09) .24 .39

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = operational validity coefficient corrected for range restriction and criterion unreliability; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Results shown by construct domain (not separated by method) average across methods within construct and within sample, meaning some samples scored the same content using different methods; this is why k for overall domain rows may be lower than the sum of the empirical and rational totals. Hybrid results are included but there were few. The "single-construct domain" analyses, in addition to all effects shown in this table, contain correlations from the "Other" biodata domain results, as shown in Appendix B.
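A similar back-of-the-envelope check applies to Table 5, which used the .53 interrater reliability estimate for construct-aligned ratings: for example, .22/\sqrt{.53} = .30 for conscientiousness and .29/\sqrt{.53} = .40 for knowledge and skills, matching the tabled values. As before, rows with direct range restriction corrections will deviate slightly from this approximation.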
Correlations With Additional Performance Outcomes

Table 6 displays correlations with other work outcomes. There were fewer correlations available for these outcomes. Thus, we keep our discussion of these results brief. As seen, overall composite scales correlated .22 with objective performance. Rational scales exhibited stronger validity (ρ = .38), though this was based on only three studies, compared to 22 studies for empirical scales (ρ = .28).4 Leadership biodata scales were among the strongest narrow predictors of this outcome (ρ = .20). Overall composite scale correlations were higher for the outcome of training performance (ρ = .33), with conscientiousness biodata scales (ρ = .25) and physical fitness scales (ρ = .32) exhibiting the next strongest relationships with this outcome. Finally, overall composite scale correlations were strong in the prediction of advancement potential (ρ = .46), though this was only based on four studies. Physical fitness biodata scales (ρ = .24), mental capacity biodata scales (ρ = .23), and conscientiousness biodata scales (ρ = .20) exhibited moderate relationships with advancement potential. Collectively, biodata scales are valid predictors of multiple work outcomes.

4 For several of the studies included in these analyses, some of the biodata scales could not be coded according to scoring method (or were hybrid with fewer than three total studies), and hence why the overall k is larger than the sum of rational and empirical k.

Convergent Validity Estimates

Tables 7 and 8 provide convergent validity estimates between biodata scores and other external assessment measures. The bolded and italicized cells in Tables 7 and 8 reflect relationships that were expected to be stronger (i.e., convergent correlations). Several findings are noteworthy. First, in many cases the credibility intervals are quite large for the correlations between biodata scores and external measures. Thus, the relationships between these assessments are likely to vary in practice, even when taking into account the moderators of this study. Second, overall composite scales correlated most strongly with externally measured conscientiousness (ρ = .68). This large correlation is driven in part by a sample of over 10,000 respondents from McElreath et al. (2007), which used hybrid scoring. As seen in Table 8, the correlations are .50 with empirically scored overall composites and .43 with rational scoring. Thus, the relationship with FFM conscientiousness is strong, but these two types of measures capture unique aspects of human behavior. Overall composite scores also correlated strongly with emotional stability (ρ = .52), though this was once again driven by the McElreath et al. study, which had very large correlations with personality. Without considering this study, emotional stability correlates only .18 with empirical overall composite scores and .36 with rational overall composite scores. Overall composite scores also correlated strongly with extraversion (ρ = .46). There was a pattern of modest correlations with other external measures, ranging from .22 to .27 (cognitive ability = .22, openness = .25, agreeableness = .27).
When considering these results, perhaps the most interesting observation is just how low the correlation between cognitive ability and biodata is (ρ = .22). The low correlation with cognitive ability is remarkable because past seminal research has assumed a correlation of .50 between cognitive ability and biodata scores (Schmidt & Hunter, 1998). Using the Schmidt and Hunter (1998) estimated intercorrelation would therefore inappropriately cap the combined predictive ability of a composite of biodata and cognitive ability, which was estimated with a multiple R of .52 in Schmidt and Hunter's research. Based on the improved biodata construct validity estimates from this meta-analysis, the multiple R for empirically derived overall composite biodata scales and cognitive ability (using Schmidt & Hunter's .51 validity estimate for cognitive ability) is .61, and it is .63 if the empirical estimate for the biodata-cognitive ability relationship from this study is used (see the worked example at the end of this section). These findings make empirically scored biodata and cognitive ability one of the highest performing predictor pairs available to selection practitioners.
Evidence was also generally supportive of the construct validity of single domain biodata scores, with an average convergent correlation of .44 across all constructs, compared to an average discriminant correlation of .25. The most pronounced differences existed for biodata domains such as leadership, which converged strongly with extraversion (ρ = .66) and had more modest relationships with other external measures, including both conscientiousness (ρ = .34) and cognitive ability (ρ = .10). Conscientiousness biodata also exhibited strong evidence of construct validity, with an average convergent correlation of .65 and an average discriminant correlation of .28. On the other hand, scores from several biodata domains did not exhibit strong evidence of construct validity. These include the biodata domains of mental capacity and agreeableness, such that each domain had higher discriminant validity than convergent validity. Specifically, mental capacity biodata scores exhibited weak correlations with cognitive ability (ρ = .17). Agreeableness biodata scores correlated strongly with external measures of agreeableness (ρ = .49), but these scores also correlated quite strongly with external measures of conscientiousness (ρ = .57) and emotional stability (ρ = .52). Although these three constructs are related when assessed via other methods (van der Linden et al., 2010), such evidence casts doubt on the construct validity of agreeableness biodata scores and mental capacity biodata scores.
Altogether, though, the pattern of convergent correlations shown in Tables 7 and 8 (ρ̄ convergent = .44, ρ̄ discriminant = .25) provides evidence that biodata measures differ in expected ways in their relationships with other external psychological assessments, though as mentioned before, a great deal of variability still exists in the relationships between biodata scores and external measures. Combined with updated criterion-related validity estimates, these findings should help researchers better design comprehensive and maximally predictive selection batteries when using biodata.
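The multiple R values above follow from the standard formula for a regression-weighted composite of two predictors (a worked illustration of the numbers cited in this section):

\[ R = \sqrt{\frac{r_{1y}^2 + r_{2y}^2 - 2\,r_{1y}\,r_{2y}\,r_{12}}{1 - r_{12}^2}}. \]

With biodata validity r_{1y} = .44, cognitive ability validity r_{2y} = .51, and this study's overall biodata-cognitive ability estimate r_{12} = .22 (Table 8), R = \sqrt{(.1936 + .2601 - .0987)/(1 - .0484)} = .61. Substituting the estimate specific to empirically scored composites, r_{12} = .16 (Table 8), yields R = \sqrt{(.4537 - .0718)/(1 - .0256)} = .63.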
Table 6
Relationships Between Biodata Scores and Other Work Outcomes

Objective performance | Training performance | Advancement potential
Construct domain k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV

Overall composite scale 29 33,373 .18 .22 .09 (.10) .10, .33 28 44,056 .29 .33 .06 (.08) .25, .40 4 9,099 .43 .46 .09 (.10) .35, .57
Empirical 22 15,906 .24 .28 .09 (.12) .16, .40 16 31,053 .27 .31 .05 (.08) .24, .38
Rational 3 1,083 .33 .38 .16 (.17) .18, .58
Hybrid 3 7,516 .48 .51 .00 (.02) .51, .51
Conscientiousness 4 2,595 .14 .16 .06 (.08) .09, .23 4 8,102 .22 .25 .00 (.04) .25, .25 6 3,762 .18 .20 .15 (.16) .01, .39
Rational 4 2,595 .14 .16 .06 (.08) .09, .23 3 2,288 .25 .28 .04 (.07) .22, .33 6 3,762 .18 .20 .15 (.16) .01, .39
Social skills/sociability 6 2,937 .15 .17 .11 (.13) .03, .32
Rational 6 2,937 .15 .17 .11 (.13) .03, .32
Emotional stability and self-confidence 5 1,889 .09 .10 .00 (.04) .10, .10 4 3,660 .14 .16 .04 (.06) .11, .20 5 4,143 .06 .06 .02 (.04) .04, .09
Rational 4 1,795 .08 .09 .00 (.02) .09, .09 3 2,568 .12 .13 .04 (.06) .09, .18 5 4,143 .06 .06 .02 (.04) .04, .09
Leadership 3 1,324 .18 .20 .00 (.04) .20, .20
Rational 3 1,324 .18 .20 .00 (.04) .20, .20
Academic achievement 3 634 .07 .08 .08 (.11) −.02, .19
Rational 3 634 .07 .08 .08 (.11) −.02, .19
Physical fitness 4 2,866 .28 .32 .05 (.08) .26, .38 4 4,000 .22 .24 .11 (.12) .09, .38
Rational 4 2,866 .28 .32 .05 (.08) .26, .38 4 4,000 .22 .24 .11 (.12) .09, .38
Mental capacity 3 1,204 .21 .23 .05 (.08) .16, .30
Interests/preferences 3 3,063 .09 .09 .00 (.03) .09, .09
Rational 3 3,063 .09 .09 .00 (.03) .09, .09

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Corrections were made for range restriction and criterion unreliability. For several of the studies included in these analyses, some of the biodata scales could not be coded according to scoring method, and hence why in some cases the overall k is not followed by corresponding method moderator analyses of matching sample sizes. Results for the "other" biodata domains that did not fit into any of these categories can be found in Appendix B.
Table 7
Relationships Between Biodata Scores and External Measures of Extraversion, Agreeableness, and Emotional Stability

Extraversion | Agreeableness | Emotional stability
Construct domain k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV

Overall comp scale 4 2,849 .36 .46 .03 (.05) .42, .50 10 9,265 .21 .27 .20 (.15) .00, .53 8 21,120 .42 .52 .19 (.15) .28, .77
Empirical 3 1,281 .39 .46 .05 (.06) .40, .53 3 1,181 .20 .24 .01 (.01) .24, .24 3 1,281 .14 .18 .04 (.06) .12, .23
Rational 8 9,105 .22 .28 .21 (.16) .00, .55 5 7,914 .28 .36 .20 (.16) .10, .62
Academic achievement 4 1,661 .09 .12 .08 (.10) .01, .22 4 1,661 .15 .21 .15 (.12) .02, .40 4 1,661 .08 .10 .04 (.05) .06, .15
Rational 3 820 .11 .15 .12 (.11) .00, .30 3 820 .20 .29 .19 (.15) .04, .53 3 820 .11 .15 .03 (.06) .12, .19
Conscientiousness 12 7,078 .28 .35 .15 (.16) .16, .54 11 6,815 .20 .25 .18 (.14) .03, .48 11 19,468 .32 .43 .20 (.14) .18, .68
Rational 9 5,217 .29 .37 .15 (.14) .18, .56 9 5,217 .21 .27 .20 (.16) .02, .52 8 17,607 .32 .44 .20 (.14) .18, .70
Agreeableness 4 2,382 .15 .19 .10 (.11) .07, .31 4 2,382 .36 .49 .14 (.09) .31, .66 5 15,328 .40 .52 .09 (.07) .40, .64
Rational 3 1,625 .15 .20 .12 (.09) .05, .35 3 1,625 .39 .54 .14 (.09) .35, .72 4 14,571 .41 .53 .09 (.07) .42, .64
Social skills/sociability 9 4,261 .31 .40 .21 (.22) .14, .67 12 4,943 .20 .28 .18 (.14) .04, .51 9 4,550 .32 .44 .19 (.13) .20, .68
Rational 7 3,358 .34 .46 .20 (.15) .20, .72 10 4,040 .19 .27 .20 (.16) .01, .53 7 3,647 .33 .46 .20 (.14) .21, .72
ES & self-confidence 8 4,491 .25 .35 .12 (.13) .19, .51 8 4,491 .11 .13 .25 (.18) −.19, .45 8 4,491 .42 .59 .13 (.12) .43, .76
Rational 7 3,734 .28 .40 .08 (.06) .29, .50 7 3,734 .08 .10 .26 (.19) −.24, .44 7 3,734 .44 .64 .10 (.12) .51, .76
Leadership 5 2,320 .54 .66 .04 (.05) .61, .70
Empirical 3 844 .50 .63 .00 (.03) .63, .63
Rational 4 1,479 .55 .66 .06 (.06) .58, .73
Openness to experience 3 1,775 .15 .18 .02 (.05) .16, .20 3 1,775 .19 .24 .05 (.06) .18, .30 3 1,775 .19 .23 .10 (.10) .10, .37
Mental capacity 3 618 .41 .51 .00 (.06) .51, .51 3 618 .04 .05 .00 (.04) .05, .05 3 618 .31 .38 .00 (.04) .38, .38
Interests/preferences 3 1,805 .24 .32 .21 (.21) .05, .58 3 1,805 −.09 −.14 .10 (.08) −.26, −.01 3 1,805 −.05 −.07 .00 (.05) −.07, −.07

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; ES = emotional stability; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Bolded and italicized values are expected, theoretically aligned convergent relationships. Results shown by construct domain (not separated by method) average across methods within construct and within sample, meaning some samples scored the same content using different methods; this is why k for overall domain rows may be lower than the sum of the empirical and rational totals. Hybrid results are included but there were few. Results for the "other" biodata domains that did not fit into any of these categories can be found in Appendix B.
Table 8
Relationships Between Biodata Scores and External Measures of Conscientiousness, Openness, and Cognitive Ability

Conscientiousness | Openness | Cognitive ability
Construct domain k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV k N r ρ SDρ (SDr) CV

Overall comp scale 9 21,214 .54 .68 .20 (.18) .42, .93 4 2,849 .20 .25 .12 (.11) .09, .41 14 27,701 .18 .22 .09 (.08) .10, .34
Empirical 4 1,375 .42 .50 .01 (.05) .49, .52 3 1,281 .31 .38 .06 (.06) .30, .45 10 12,691 .13 .16 .06 (.06) .09, .24
Rational 5 7,914 .32 .43 .06 (.08) .35, .51 3 2,064 .15 .17 .06 (.07) .09, .25
Academic achievement 4 1,661 .22 .29 .26 (.21) −.04, .62 4 1,661 .16 .22 .15 (.12) .02, .41 5 68,148 .32 .41 .03 (.03) .36, .45
Rational 3 820 .24 .33 .37 (.29) −.14, .79 3 820 .11 .16 .20 (.17) −.10, .43 3 832 .10 .12 .00 (.03) .12, .12
Conscientiousness 14 20,247 .48 .65 .10 (.08) .52, .78 10 6,696 .27 .36 .09 (.08) .25, .47 12 26,987 .11 .15 .13 (.09) −.02, .31
Rational 11 18,386 .49 .66 .10 (.08) .52, .79 8 5,098 .27 .36 .10 (.09) .23, .48 9 19,311 .14 .19 .12 (.09) .03, .35
Agreeableness 5 15,328 .45 .57 .12 (.10) .42, .73 4 2,382 .13 .17 .26 (.21) −.17, .51
Rational 4 14,571 .45 .57 .13 (.10) .41, .73 3 1,625 .03 .05 .24 (.19) −.26, .37
Social skills/sociability 10 4,736 .25 .33 .12 (.11) .18, .48 8 4,079 .19 .26 .08 (.08) .16, .37 8 9,183 .07 .09 .07 (.06) −.01, .18
Empirical 3 6,701 .08 .10 .04 (.03) .06, .15
Rational 8 3,833 .21 .28 .08 (.08) .18, .38 6 3,176 .17 .23 .07 (.07) .15, .32 5 2,482 .04 .04 .12 (.11) −.11, .19
ES and self-confidence 9 4,978 .22 .30 .14 (.13) .12, .47 8 4,491 .12 .17 .18 (.14) −.06, .39 5 5,212 .08 .10 .04 (.05) .04, .16
Rational 8 4,221 .18 .25 .10 (.09) .13, .37 7 3,734 .08 .11 .15 (.11) −.08, .30 5 5,212 .08 .10 .04 (.05) .04, .16
Leadership 3 1,776 .28 .34 .16 (.13) .14, .54 6 5,962 .10 .10 .08 (.08) −.01, .21
Rational 3 3,595 .12 .14 .02 (.03) .11, .16
Openness to experience 3 1,775 .23 .27 .24 (.21) −.03, .58 4 2,029 .45 .58 .16 (.09) .38, .79 5 5,142 .24 .26 .07 (.07) .16, .35
Rational 3 1,272 .50 .68 .14 (.06) .50, .86 3 3,616 .22 .24 .07 (.07) .15, .33
Knowledge & skills 5 4,529 .23 .29 .07 (.06) .19, .38
Rational 3 3,460 .22 .28 .08 (.07) .16, .38
Mental capacity 4 1,105 .23 .28 .16 (.14) .08, .49 3 618 .49 .64 .06 (.07) .57, .72 5 9,661 .15 .17 .04 (.04) .12, .22
Empirical 3 6,701 .14 .16 .05 (.04) .10, .22
Rational 3 959 .20 .24 .13 (.12) .07, .41
Physical fitness 3 1,857 .02 .03 .00 (.04) .03, .03
Rational 3 1,857 .02 .03 .00 (.04) .03, .03
Interests/preferences 4 2,292 .20 .27 .10 (.09) .15, .40 3 1,805 −.02 −.02 .00 (.03) −.02, −.02 3 3,581 .08 .09 .09 (.09) −.03, .21
Rational 3 2,146 .19 .26 .09 (.09) .14, .38

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; ES = emotional stability; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Bolded and italicized values are expected, theoretically aligned convergent relationships. Results shown by construct domain (not separated by method) average across methods within construct and within sample, meaning some samples scored the same content using different methods; this is why k for overall domain rows may be lower than the sum of the empirical and rational totals. Hybrid results are included but there were few. Results for the "other" biodata domains that did not fit into any of these categories can be found in Appendix B.
Table 9
Relationships Between Biodata Scores and Overall Job Performance Ratings by Study Context

Construct domain Context k N r ρ SDρ (SDr) CV

Overall composite scale Applicant predictive 4 2,102 .26 .39 .00 (.03) .39, .39
Overall composite scale Incumbent sample 50 20,287 .27 .38 .00 (.06) .38, .38
Conscientiousness Applicant predictive 6 3,305 .05 .08 .00 (.04) .08, .08
Conscientiousness Incumbent sample 19 7,785 .18 .25 .06 (.09) .17, .32
Agreeableness Incumbent sample 9 3,845 .12 .16 .09 (.11) .04, .28
Social skills/sociability Applicant predictive 3 1,003 .11 .16 .07 (.09) .08, .25
Social skills/sociability Incumbent sample 14 6,644 .11 .15 .09 (.11) .03, .26
Emotional stability & self-confidence Applicant predictive 8 3,569 .07 .09 .00 (.05) .09, .09
Emotional stability & self-confidence Incumbent sample 8 4,327 .16 .22 .06 (.08) .15, .30
Leadership Incumbent sample 6 2,139 .19 .26 .06 (.09) .18, .34
Openness to experience Incumbent sample 6 2,953 .13 .19 .16 (.17) −.01, .39
Knowledge & procedural skills Incumbent sample 5 1,333 .00 .01 .13 (.14) −.16, .17
Mental capacity Applicant predictive 5 1,797 .12 .18 .13 (.14) .01, .35
Mental capacity Incumbent sample 9 2,758 .12 .17 .00 (.05) .17, .17
Physical fitness Applicant predictive 6 3,306 .17 .23 .09 (.11) .12, .34
Physical fitness Incumbent sample 5 1,042 .28 .39 .10 (.13) .26, .51
Single-construct domains—Rational Applicant predictive 8 3,568 .06 .08 .00 (.04) .08, .08
Single-construct domains—Rational Incumbent sample 32 8,745 .12 .17 .07 (.10) .08, .26
Single-construct domains—Hybrid Incumbent sample 3 2,318 .20 .28 .07 (.09) .19, .37
Single-construct domains—Empirical Applicant predictive 3 888 .22 .33 .00 (.04) .33, .33
Single-construct domains—Empirical Incumbent sample 13 2,523 .31 .42 .00 (.08) .42, .42

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Results are collapsed across scoring methods. Only reporting estimates where k ≥ 3. "Single-construct domains" represent the average of all individual biodata construct scales besides overall composite scales. For these analyses, all correlations were aggregated within each study setting. Note there were not enough predictive studies set in incumbent samples to separate incumbent into purely concurrent and purely predictive. The "single-construct domain" analyses, in addition to all effects shown in this table, contain correlations from the "Other" biodata domain results, as shown in Appendix B.

Testing Context as a Moderator of Criterion-Related Validity

We also conducted an ad hoc moderator analysis of testing context by breaking context into applicant and incumbent samples, focusing on overall performance ratings. Given that biodata inventories are self-report, they are susceptible to faking (Becker & Colquitt, 1992). As such, validity may be attenuated in applicant contexts where respondents are motivated to appear favorable. Table 9 provides these analyses. Although validity estimates were comparable for overall biodata composites between applicant samples (ρ = .39) and incumbent samples (ρ = .38), there were only four applicant samples in this comparison. When comparing narrow biodata domains, validity was generally higher within incumbent samples, with this occurring for conscientiousness, emotional stability and self-confidence, and physical fitness. Validity coefficients were similar between applicant and incumbent contexts for mental capacity and for social skills and sociability.
Given the conflicting findings, results were aggregated across all individual construct domains (i.e., all except overall biodata composites), which allows for a more comprehensive comparison. Aggregations were made across all scales within study setting and by scoring method where possible, given the strength of scoring method as a moderator. Table 9 also displays these results, showing that when comparing rationally scored scales across applicant (k = 8, ρ = .08) and incumbent samples (k = 32, ρ = .17), validity was marginally higher within incumbent settings. Empirical scales in incumbent samples (k = 13, ρ = .42) also had higher validity than empirical scales in applicant samples (k = 3, ρ = .33), but it should be noted that validity estimates are moderate-to-high in both settings. Thus, there seems to be some attenuation of validity when biodata scales are used in applicant settings, though not to a degree that diminishes the usefulness of biodata scores.

Item Format as a Moderator of Criterion-Related Validity

We also considered other possible moderators of biodata inventory validity, including the format of biodata items (e.g., Mael, 1991). Some primary studies have examined narrow item features (e.g., Becker & Colquitt, 1992; Graham et al., 2002). However, studies do not generally compute scale scores composed of only one narrow item format. Thus, we thought it may be possible to broadly differentiate biodata content in terms of whether the scale was composed of hard or soft items (Asher, 1972), or a combination of the two. A hard item is verifiable (e.g., "How many speeding tickets have you received in your life?"), whereas a soft item is unverifiable, with responses often expressed abstractly (e.g., "How often do you speed when driving automobiles?"). Unfortunately, many studies in the meta-analytic database did not clearly indicate the format of all biodata items included. We required that a study explicitly state whether all items were hard or soft (or that we have access to all the items to make this determination), resulting in no studies categorized as fully soft. Most studies were coded as mixed or as "unclear," and there were too few studies, holding biodata construct domain constant, to reliably examine this moderator. With little clarity in the composition of item types at the scale level, meaningful meta-analytic analyses could not be performed for this moderator.

Job as a Moderator of Criterion-Related Validity

Where enough studies existed, criterion-related validity was also examined within jobs, focusing on overall performance ratings. Such information is useful for practitioners who wish to identify job-relevant biodata content within specific work settings. As such, the study authors reviewed job titles and grouped them into similar job categories.5

5 As suggested by an expert review, even though we were able to examine validity by broad job types, there may be benefits to further examining validity by specific job demands or challenges. This was beyond the present study and there were not enough data to reliably make such distinctions, but this is an area for future research.
Because this meta-analysis shows it is important to separate biodata scales by construct domain and scoring method, results are presented individually by domain and by scoring method where possible. Table 10 provides the results. Cells are sparse, as expected given the many moderators already considered. Thus, results should be interpreted with caution. Technical and science jobs (e.g., scientists, nurses) and military jobs had results for the most construct domains. Overall biodata composite scores exhibited meaningful correlations with overall job performance for every job. In technical and science jobs, mental capacity (ρ = .33) and conscientiousness scale scores (ρ = .19) were most related to performance, though there were no empirically scored conscientiousness measures. Military jobs solely used rational scoring, and the most predictive domain was physical fitness (ρ = .27).

Table 10
Relationships Between Biodata Scores and Overall Job Performance Ratings by Job

Job/domain/method k N r ρ SDρ (SDr) CV

Technical and Science
Overall composite scale 7 1,604 .30 .43 .07 (.11) .34, .53
Empirical 5 1,315 .29 .42 .09 (.13) .31, .54
Conscientiousness 5 408 .14 .19 .20 (.24) −.07, .45
Rational 5 408 .14 .19 .17 (.21) −.06, .38
Agreeableness 3 228 .07 .10 .00 (.11) .10, .10
Rational 3 228 .07 .10 .00 (.11) .10, .10
Social skills/sociability 4 969 .12 .18 .03 (.08) .14, .22
Rational 3 228 −.01 −.02 .01 (.05) −.04, .00
Openness to experience 3 228 −.02 −.03 .02 (.02) −.05, −.01
Rational 3 228 −.02 −.03 .02 (.02) −.05, −.01
Knowledge & skills 3 228 .09 .12 .00 (.04) .12, .12
Rational 3 228 .09 .12 .00 (.04) .12, .12
Mental capacity 4 969 .22 .33 .08 (.11) .22, .44
Rational 3 228 .03 .04 .00 (.04) .04, .04
Military
Conscientiousness 9 5,001 .09 .12 .03 (.06) .08, .16
Rational 9 5,001 .09 .12 .03 (.06) .08, .16
Social skills/sociability 3 1,587 .12 .16 .00 (.05) .16, .16
Rational 3 1,587 .12 .16 .00 (.05) .16, .16
ES & self-confidence 9 5,066 .09 .13 .00 (.04) .13, .13
Rational 9 5,066 .09 .13 .00 (.04) .13, .13
Mental capacity 6 2,721 .09 .13 .05 (.08) .07, .19
Rational 6 2,721 .09 .13 .05 (.08) .07, .19
Physical fitness 11 4,348 .19 .27 .10 (.12) .14, .40
Rational 11 4,348 .19 .27 .10 (.12) .14, .40
Interests/preferences 3 2,891 .06 .09 .08 (.09) −.01, .18
Rational 3 2,891 .06 .09 .08 (.09) −.01, .18
Managers
Overall composite scale 14 2,647 .31 .43 .00 (.04) .43, .43
Empirical 13 2,531 .31 .44 .00 (.04) .44, .44
Sales
Overall composite scale 4 154 .39 .55 .00 (.12) .55, .55
Empirical 4 154 .39 .55 .00 (.12) .55, .55
Conscientiousness 4 993 .20 .28 .00 (.03) .28, .28
Empirical 3 522 .20 .27 .00 (.04) .27, .27
Rational 4 993 .22 .31 .00 (.05) .31, .31
Manual labor
Conscientiousness 3 756 .23 .32 .00 (.07) .32, .32
Rational 3 756 .22 .31 .00 (.07) .31, .31
Call center
Overall composite scale 5 1,375 .26 .36 .00 (.07) .36, .36
Empirical 3 540 .23 .32 .02 (.09) .29, .35
Individual contributor (other)
Overall composite scale 7 7,519 .24 .33 .00 (.04) .33, .33
Empirical 7 7,519 .30 .42 .00 (.04) .42, .42
Rational 3 6,545 .15 .21 .00 (.01) .21, .21
Conscientiousness 3 1,253 .13 .18 .00 (.02) .18, .18
Rational 3 1,253 .13 .18 .00 (.02) .18, .18
Social skills/sociability 4 1,463 .02 .02 .00 (.04) .02, .02
Rational 4 1,463 .02 .02 .00 (.04) .02, .02
Mixed sample
Overall composite scale 17 9,090 .27 .38 .00 (.04) .38, .38
Empirical 17 8,505 .33 .46 .00 (.04) .46, .46
Rational 16 8,783 .18 .25 .04 (.07) .20, .30

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = estimated true score correlation; ES = emotional stability; SDρ = standard deviation of the true correlation; SDr = observed standard deviation; CV = 80% credibility interval. Only reporting estimates where k ≥ 3. Results shown by construct domain (not separated by method) average across methods within construct and within sample, meaning some samples scored the same content using different methods; this is why k for overall domain rows may be lower than the sum of the empirical and rational totals. Hybrid results are included but there were few. Results for the "other" biodata domains that did not fit into any of these categories can be found in Appendix B.

Discussion

Because biodata inventories are a measurement method, controlling for construct domain is vital to understanding biodata validity. However, past biodata review studies have failed to do this, resulting in a major gap in biodata knowledge. From a theoretical perspective, it hampers our understanding of the types of constructs biodata inventories can reasonably assess or how predictive biodata scores are. It also muddles efforts to establish a nomological network. Practically, this chasm makes it challenging to identify how any given biodata inventory will generalize to new contexts and whether that biodata assessment will explain unique variance in work outcomes over and above alternate preemployment assessments. There is also the challenge that biodata scores differ based on scoring approach, and prior to this study it was unclear to what degree empirical versus rational scoring affects validity. This study thus simultaneously examined two major moderators of biodata validity (construct domain and scoring method) and established updated and nuanced estimates of biodata validity, both in terms of criterion-related validity and, importantly, in terms of correlates with other commonly used assessments. Several findings from this meta-analysis are particularly noteworthy.
First, overall biodata composites exhibited moderate-to-strong correlations with overall job performance ratings (ρ = .37), as well as meaningful correlations with other performance criteria. However, these findings are not stable across all biodata composite scales, as empirically derived scales (ρ = .44) had larger validity coefficients than rationally derived scales (ρ = .24) when predicting job performance ratings. These findings situate the validity of empirically scored biodata composites among other highly predictive selection tools such as structured interviews, work samples, and cognitive ability tests (Schmidt & Hunter, 1998).
Second, this study showed that the criterion-related validity of biodata depends upon what biodata scales are developed to assess. With the exception of empirically scored leadership and conscientiousness biodata, there was a great range in narrow domain validity coefficients, with most construct domains exhibiting small-to-modest correlations and possessing moderately wide credibility intervals. This suggests that specific biodata measures will only be related to overall job performance in certain job contexts, which falls in line with traditional practices of job analysis and the matching of KSAOs to job demands (e.g., Brannick et al., 2007; Tett & Christiansen, 2007). When narrow biodata domain scores were correlated with theoretically aligned performance ratings,
correlations were expectedly stronger, and this occurred for both empirical and rational scoring.
Third, this is the first meta-analysis to broadly establish the convergent and discriminant validity of biodata measures by construct domain. Once again, this has implications regarding the nomological network of biodata and practical implications regarding biodata's use in larger selection batteries. Across all construct domains, the average theoretically aligned convergent correlation was .44, and the average discriminant correlation was .25. This finding speaks to the flexibility of biodata to assess specific constructs. For example, certain biodata domains such as leadership are more capable of measuring socially oriented traits, whereas domains such as conscientiousness are more related to hard work and attention to detail. More generally, overall biodata composite scales exhibited moderate-to-high overlap with the FFM but only modest overlap with cognitive ability. Unfortunately, it is challenging to compare these biodata findings to other assessment methods, as few meta-analyses have explored the relationships between other popular assessment methods (e.g., selection interviews, assessment centers, situational judgment tests) and job-related traits (e.g., FFM) broken down by construct domain. The lack of such comparisons speaks to the comprehensiveness of the current work.
Fourth, and as already mentioned briefly, biodata scoring methods matter greatly. Despite past assumptions that rational and empirical scoring produce equivalently predictive biodata measures (e.g., Hough & Paullin, 1994; Jackson, 1975; Schoenfeldt, 1999), this study found that empirical scoring was superior in terms of criterion-related validity when predicting overall job performance. This makes sense, as empirical methods are explicitly designed to maximize prediction. The only instances in which empirical keys would be expected to result in lower validity would be if calibration sample sizes are small, or if the calibration sample differs greatly from the context in which operational biodata scales are to be used (Speer et al., 2020). The findings from this study should provide closure to the debate as to which method of biodata scoring will result in the highest criterion-related validity when predicting overall performance ratings. However, our results do suggest that rational scales can be effective at targeting specific performance domains, and thus rationally scored biodata might be useful if combined with other biodata scales or measures that collectively capture the full performance domain.
Although findings were favorable regarding empirical scoring, researchers may still have concerns over using empirical methods in practice. The major downside to empirically scored biodata inventories is lessened interpretability, which may be important in some settings. A perceived lack of interpretability may raise legal concerns. However, we must remember that validity is a unitarian concept (Binning & Barrett, 1989), and when biodata scores correlate with external criteria and expected convergent measures, this provides evidence for test validity. As outlined by Speer et al.'s (2020) Model of Empirically Scored Biodata (MESBI), empirical biodata scores can exhibit expected patterns of construct validity that can be theorized a priori. The crux is the biodata content that is input into the empirical keying step.
We suggest only using biodata content that is first rationally developed so as to sample from the targeted job domain (as opposed to strict dustbowl empiricism). Empirically scored biodata scales should still be developed with theoretical considerations in mind. For example, items can be written to target specific domains of the job (e.g., job knowledge, customer service experience) and then maximally weighted (i.e., empirically keyed) to predict work criteria. Content should be created based on job analysis or a study of the job. By doing this, scales obtain some level of content validity evidence. According to MESBI, the empirical key will then capture any construct variance that is related to job performance and that can be elicited from the item stimuli. Thus, if test developers restrict the item content to only that which is job relevant, concerns over interpretability and job relevance should subside.
This said, we do advocate for a mix between empirical and rational scoring by using hybrid scoring, which either adjusts scoring to fall in line with theory or removes items with nonintuitive empirical weights. This balances the predictive advantages of empirical scoring with the interpretability of rational scoring and is likely preferable in litigious environments. Although we found few studies that claimed to use hybrid scoring, we anecdotally know of many consulting companies that use this method, and there is little reason to suspect validity will drastically differ from strict empirical scoring. It is also possible that some of the coded empirical studies adjusted nonintuitive empirical weights without describing that in the manuscript, meaning some empirically coded scales may have actually been hybrid.
We have several additional recommendations regarding empirical or hybrid scoring, illustrated in the sketch that follows. (a) Sample sizes should be large enough to produce stable empirical keys. Cucina et al. (2012) provide helpful sample size guidelines for practitioners to follow. Putka et al. (2018) also provide guidance on sample size requirements when scoring with machine learning. (b) The calibration sample should be similar to the intended holdout sample. This applies to the types of respondents, the criteria used to perform keying, and the job context (Speer et al., 2020). Although various studies have shown that empirical biodata keys are fairly generalizable (e.g., Carlson et al., 1999; Rothstein et al., 1990), validity is higher when calibration sample features are close in similarity to the holdout context (Speer et al., 2020). When these conditions are met, empirically scored biodata inventories are capable of exhibiting strong levels of validity.
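To make these recommendations concrete, the sketch below illustrates one simple form of criterion-based empirical keying (item-criterion correlations as weights, a simplifying choice of ours) with a calibration/holdout split, plus the hybrid adjustment described above. All names and data are hypothetical; this is an illustration of the general workflow, not a prescribed implementation. See Cucina et al. (2012) and Putka et al. (2018) for formal guidance.

import numpy as np

def empirical_key(X_cal, y_cal, rational_sign=None):
    # Weight each item by its calibration-sample correlation with the criterion.
    # If rational_sign (+1/-1 per item) is supplied, zero out items whose
    # empirical weight contradicts theory, yielding a simple hybrid key.
    X = np.asarray(X_cal, dtype=float)
    y = np.asarray(y_cal, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    w = Xz.T @ yz / len(y)                          # item-criterion correlations
    if rational_sign is not None:
        w = np.where(np.sign(w) == np.asarray(rational_sign), w, 0.0)
    return w

def keyed_score(X, w):
    return np.asarray(X, dtype=float) @ w           # empirically keyed biodata score

# A calibration/holdout split guards against capitalizing on chance.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = X[:, :5].sum(axis=1) + rng.normal(size=400)
w = empirical_key(X[:200], y[:200], rational_sign=[1] * 20)
r_holdout = np.corrcoef(keyed_score(X[200:], w), y[200:])[0, 1]  # cross-validated validity

In practice, more sophisticated keying methods (e.g., the machine learning approaches referenced above) would replace the simple correlation weights, but the calibration/holdout logic remains the same.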
20 SPEER, TENBRINK, WEGMEYER, SENDRA, SHIHADEH, AND KAUR

measure knowledge and procedural skills is likely to exhibit strong between typical biodata scales and many O*NET constructs. Thus,
correlations with performance behaviors associated with job knowl- we felt our taxonomy would better reflect existing biodata scales and
edge and procedural skills. Tables 7 and 8 also provide evidence for would help to advance future biodata research. Support for the
which constructs can reasonably be targeted by biodata inventories. developed classification scheme can be found in our results showing
Work analysis should always guide scale content, and these meta- that biodata domains correlated with theoretically aligned external
analytic findings can help steer the possibilities and limits of measures. Furthermore, a high agreement was found between raters
biodata-based measurement. when categorizing biodata scales into the taxonomy. Nonetheless,
Of the construct domains, leadership and conscientiousness appear to be good candidates for measurement via empirically scored biodata inventories, based on their strong convergent correlations and high criterion-related validity. On the other hand, biodata scales might not be particularly well equipped to assess cognitive characteristics (e.g., cognitive ability), given the weak convergence with performance-based cognitive ability measures. This finding is particularly useful when considering how to fill in or complete missing components of a selection battery. Because of the low correlation with cognitive ability, biodata inventories will add incremental validity when these measures are used together. In fact, this study reveals that a selection battery composed of empirically scored biodata and a cognitive ability measure is among the best possible combinations of selection assessments (multiple R = .63). This is noteworthy, given that many researchers rely on findings from Schmidt and Hunter (1998), which estimated the multiple R as only .52. Given these new findings, biodata should be given much more attention from selection researchers.
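The multiple R for a two-predictor battery follows from the standard formula below. The plug-in values are illustrative stand-ins rather than the exact matrix entries used in the text: r_y1 = .44 for empirically keyed biodata (this study's overall composite estimate), r_y2 = .51 for cognitive ability (Schmidt & Hunter, 1998), and an assumed biodata-cognitive ability correlation of r_12 = .10.

```latex
% Multiple correlation for two predictors; plug-in values are illustrative
% assumptions, not the exact matrix entries used in the text.
\[
R = \sqrt{\frac{r_{y1}^{2} + r_{y2}^{2} - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^{2}}}
  = \sqrt{\frac{.1936 + .2601 - .0449}{.99}}
  \approx .64
\]
```

This lands in the neighborhood of the R = .63 reported above; the small predictor intercorrelation is what keeps the combined battery well above either predictor alone.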
The low correlation between biodata scores and cognitive ability scores also has implications for other goals of employee selection, such as fairness and diversity. Although this study did not examine adverse impact, it is commonly believed that biodata scores have weaker group differences than other common selection assessments, such as cognitive ability tests. Hough et al. (2001) concluded there were moderate differences in biodata scores between White and Black test takers and negligible differences between other groups (e.g., genders). Other research indicates a fluctuating pattern of results between demographic groups (Breaugh, 2014; Breaugh et al., 2014; Oswald et al., 2004; Stricker & Rock, 1998), which is likely due in part to comparing biodata scales that differ by construct domain. Nonetheless, the high validity found in this research, coupled with the low to moderate adverse impact reported in the aforementioned studies, speaks to additional possible benefits of using biodata inventories. However, future research should conduct updated meta-analyses of demographic group differences in biodata scores by construct domain and scoring method to help clarify these assumptions.
Additional Limitations and Areas for Future Research

There are other limitations to this research, as well as additional areas for future research. For one, although we created a taxonomy by which to categorize biodata scales, there were many possible ways to do this. Our approach was similar to past meta-analytic efforts to code disparate measures into broader categorizations (e.g., Christian et al., 2010; Huffcutt et al., 2001). We are not suggesting this taxonomy be used to describe all biodata scales. In discussing this taxonomy with other biodata experts, some suggested potentially adopting the O*NET taxonomy (National Center for O*NET Development, 2017). We agree this might be a useful operationalization, though the O*NET taxonomy is tremendously larger and more complex than the taxonomy we used. Furthermore, we did not believe there was great conceptual alignment between typical biodata scales and many O*NET constructs. Thus, we felt our taxonomy would better reflect existing biodata scales and would help to advance future biodata research. Support for the developed classification scheme can be found in our results showing that biodata domains correlated with theoretically aligned external measures. Furthermore, high agreement was found between raters when categorizing biodata scales into the taxonomy. Nonetheless, future research might consider scrutinizing and possibly expanding this taxonomy further.

It is also possible that moderators beyond those examined here are important to biodata. Some of these have already been discussed (e.g., format of biodata item). However, it was challenging to conduct additional moderator analyses beyond those performed here. Specifically, this study focused on two major moderators—biodata domain and scoring method—and each had substantial effects on biodata validity. Examining other moderators without simultaneously considering the construct domain or scoring method of biodata would not be particularly helpful given the impact of these two factors. After accounting for these moderators within our study, there were too few samples for additional partitioning in a meaningful way. However, future research might consider additional combinations of moderating factors, should additional data become available.

Additionally, this research focused on self-report biodata inventories. However, biographical information can be obtained in many ways, including interviews with knowledgeable peers, coding resume information, or even new methods of data mining that leverage "big data." Plenty of examples already exist of creative ways to capture behavioral data from electronic sources (Bleidorn & Hopwood, 2019; Landers, 2019; Sajjadiani et al., 2019; Speer, 2018). Future research might compare how different methods of obtaining biodata information impact the construct validity of biodata scores, with a particular focus on big data methods of data obtainment and scoring.
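As a toy illustration of this direction, the snippet below derives a few crude biodata-like features from resume text. The example string, regular expressions, and feature definitions are hypothetical and bear no relation to the coding schemes used in the studies cited above.

```python
# A toy sketch of mining simple biodata-like features from resume text.
# The example string and feature definitions are hypothetical illustrations.
import re

resume = (
    "2015-2019 Sales Associate, Acme Corp. "
    "2019-2021 Team Lead, Beta LLC. Promoted after one year."
)

# Employment spells, total tenure, and promotion language as crude
# proxies for experience-based biodata items.
spells = re.findall(r"(\d{4})-(\d{4})", resume)
features = {
    "n_jobs": len(spells),
    "total_years": sum(int(end) - int(start) for start, end in spells),
    "mentions_promotion": int(bool(re.search(r"promot", resume, re.I))),
}
print(features)
```

Whether such mined features behave like traditional self-report biodata is exactly the kind of construct validity question this future research would need to answer.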
Finally, another limitation is that there were incomplete data for many empirical scoring estimates. For some construct domains, the only method of scoring was rational, making it impossible to separate validity differences due to the method of scoring from those due to differences in construct domain. The small number of samples for several of the effect sizes also reduces the stability of observed effects. This was especially the case for the secondary performance outcomes. Thus, more data are likely necessary to firmly understand biodata relationships for certain construct/method partitions and for certain criteria. Additional data that examine the relationship between biodata scores and job performance within applicant samples would also be helpful. Unfortunately, there were many more incumbent validation studies than applicant validation studies. During our article search we contacted consulting companies for technical documentation of their biodata scales, with little luck. To establish a larger database of applicant biodata validation studies, consulting companies using biodata inventories will likely have to release more of their validation reports. If possible, future research should pursue this.
Conclusion

Biodata inventories have long been used to hire job applicants, and yet there have been major conceptual limitations to our understanding of the construct validity and utility of biodata measures. The present study helped clarify our understanding of biodata at work by partitioning effect sizes by biodata construct domain and scoring method, coming away with several major conclusions. (a) Biodata scales are not all the same, as there are meaningful differences by construct domain and by scoring method. (b) Biodata assessments exhibit meaningful levels of convergent validity with theoretically aligned external assessment measures, thus supporting their construct validity. (c) Given the correlations between biodata and other external assessment methods, biodata assessments can add incremental validity when used with other selection measures. (d) Biodata assessments have strong criterion-related validity, especially when scoring keys are empirically derived. We hope that this work encourages more frequent use of biodata inventories in selection contexts and future research aimed at understanding the contextual factors that determine when specific biodata measures will be most successful.

References

References marked with an asterisk indicate studies included in the meta-analysis.

Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121(2), 219–245. https://doi.org/10.1037/0033-2909.121.2.219
*Allen, M. T., Bynum, B. H., Oliver, J. T., Russell, T. L., Young, M. C., & Babin, N. E. (2014). Predicting leadership performance and potential in the US Army Officer Candidate School (OCS). Military Psychology, 26(4), 310–326. https://doi.org/10.1037/mil0000056
*Allworth, E., & Hesketh, B. (1999). Construct-oriented biodata: Capturing change-related and contextually relevant future performance. International Journal of Selection and Assessment, 7(2), 97–111. https://doi.org/10.1111/1468-2389.00110
*Allworth, E., & Hesketh, B. (2000). Job requirements biodata as a predictor of performance in customer service roles. International Journal of Selection and Assessment, 8(3), 137–147. https://doi.org/10.1111/1468-2389.00142
Asher, J. J. (1972). The biographical item: Can it be improved? Personnel Psychology, 25(2), 251–269. https://doi.org/10.1111/j.1744-6570.1972.tb01102.x
*Baehr, M. E., & Williams, G. B. (1968). Prediction of sales success from factorially determined dimensions of personal background data. Journal of Applied Psychology, 52(2), 98–103. https://doi.org/10.1037/h0020587
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26. https://doi.org/10.1111/j.1744-6570.1991.tb00688.x
Becker, T. E., & Colquitt, A. L. (1992). Potential versus actual faking of a biodata form: An analysis along several dimensions of item type. Personnel Psychology, 45(2), 389–406. https://doi.org/10.1111/j.1744-6570.1992.tb00855.x
*Becton, M. C., Matthews, M. C., Hartley, D. L., & Whitaker, D. H. (2009). Using biodata to predict turnover, organizational commitment, and job performance in healthcare. International Journal of Selection and Assessment, 17(2), 189–202. https://doi.org/10.1111/j.1468-2389.2009.00462.x
*Becton, M. C., Matthews, M. C., Hartley, D. L., & Whitaker, L. D. (2012). Using biodata as a predictor of errors, tardiness, policy violations, overall job performance, and turnover among nurses. Journal of Management & Organization, 18(5), 714–727. https://doi.org/10.5172/jmo.2012.18.5.714
*Berkeley, M. H. (1953). A comparison between the empirical and rational approaches for keying a heterogeneous test (No. RB-53-24). Air Force Personnel and Training Research Center.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74(3), 478–494. https://doi.org/10.1037/0021-9010.74.3.478
Bleidorn, W., & Hopwood, C. J. (2019). Using machine learning to advance personality assessment and theory. Personality and Social Psychology Review, 23(2), 190–203. https://doi.org/10.1177/1088868318772990
Bliesener, T. (1996). Methodological moderators in validating biographical data in personnel selection. Journal of Occupational and Organizational Psychology, 69(1), 107–120. https://doi.org/10.1111/j.2044-8325.1996.tb00603.x
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic matrix incorporating cognitive ability, alternative predictors, and job performance. Personnel Psychology, 52(3), 561–589. https://doi.org/10.1111/j.1744-6570.1999.tb00172.x
Bommer, W. H., Johnson, J. L., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On the interchangeability of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology, 48(3), 587–605. https://doi.org/10.1111/j.1744-6570.1995.tb01772.x
Bosshardt, M. J., Carter, G. W., Gialluca, K. A., Dunnette, M. D., & Ashworth, S. D. (1992). Predictive validation of an insurance agent support person selection battery. Journal of Business and Psychology, 7(2), 213–224. https://doi.org/10.1007/BF01013930
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job and work analysis: Methods, research, and applications for human resource management. Sage Publications.
Breaugh, J. A. (2009). The use of biodata for employee selection: Past research and future directions. Human Resource Management Review, 19(3), 219–231. https://doi.org/10.1016/j.hrmr.2009.02.003
Breaugh, J. A. (2014). Predicting voluntary turnover from job applicant biodata and other applicant information. International Journal of Selection and Assessment, 22(3), 321–332. https://doi.org/10.1111/ijsa.12080
*Breaugh, J. A., Frye, K., Lee, D., Lammer, V., & Cox, J. (2014). The value of biodata for selecting employees: Comparable results for job incumbent and job applicant samples? Journal of Organizational Psychology, 14(1), 40–51.
*Brown, S. H. (1981). Validity generalization and situational moderation in the life insurance industry. Journal of Applied Psychology, 66(6), 664–670. https://doi.org/10.1037/0021-9010.66.6.664
*Brown, S. H., Stout, J. D., Dalessio, A. T., & Crosby, M. M. (1988). Stability of validity indices through test score ranges. Journal of Applied Psychology, 73(4), 736–742. https://doi.org/10.1037/0021-9010.73.4.736
*Buel, W. D. (1964). Voluntary female clerical turnover: The concurrent and predictive validity of a weighted application blank. Journal of Applied Psychology, 48(3), 180–182. https://doi.org/10.1037/h0043613
*Buel, W. D. (1965). Biographical data and the identification of creative research personnel. Journal of Applied Psychology, 49(5), 319–321. https://doi.org/10.1037/h0022518
*Buel, W. D., Albright, L. E., & Glennon, J. R. (1966). A note on the generality and cross-validity of personal history for identifying creative research scientists. Journal of Applied Psychology, 50(3), 217–219. https://doi.org/10.1037/h0023364
*Caputo, P. M., Cucina, J. M., & Sacco, J. M. (2010, April). Approaches to empirical keying of international biodata instruments [Paper presentation]. Annual Conference for the Society for Industrial and Organizational Psychology, Atlanta, GA.
*Carlson, K. D., Scullen, S. E., Schmidt, F. L., Rothstein, H., & Erwin, F. (1999). Generalizable biographical data validity can be achieved without multi-organizational development and keying. Personnel Psychology, 52(3), 731–755. https://doi.org/10.1111/j.1744-6570.1999.tb00179.x
*Carraher, S. M., Mendoza, J. L., Buckley, M. R., Schoenfeldt, L. F., & Carraher, C. E. (1998). Validation of an instrument to measure service-orientation. Journal of Quality Management, 3(2), 211–224. https://doi.org/10.1016/S1084-8568(99)80114-X
*Chait, H. N., Carraher, S. M., & Buckley, M. R. (2000). Measuring service orientation with biodata. Journal of Managerial Issues, 12(1), 109–120. https://www.jstor.org/stable/40604297
*Chaney, F. B. (1966). A cross-cultural study of industrial research performance. Journal of Applied Psychology, 50(3), 206–210. https://doi.org/10.1037/h0023418
*Childers, O. (2016). Biodata: A thing of the past? Examining the predictive validity and user reactions of rationally-selected, empirically keyed biodata [Doctoral dissertation, University of Houston]. ProQuest Dissertations and Theses Global.
*Chirico, K. (2005). Predicting objective measures of performance [Doctoral dissertation, Auburn University]. ProQuest Dissertations and Theses Global.
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63(1), 83–117. https://doi.org/10.1111/j.1744-6570.2009.01163.x
*Cline, V. B., Tucker, M. F., & Anderson, D. R. (1966). Psychology of the scientist: XX. Cross-validation of biographical information predictor keys across diverse samples of scientists. Psychological Reports, 19(3), 951–954. https://doi.org/10.2466/pr0.1966.19.3.951
*Cline, V. B., Tucker, M. F., & Mulaik, S. A. (1966). Predicting factored criteria of performance measures of pharmaceutical scientists. SA Journal of Industrial Psychology, 4(1), 8–15.
*Consulting Firm Technical Report. (2004). Validation of law enforcement test: Validity report.
*Conte, J. M. (1998). Time orientation, biodata, and personality predictors of multiple performance criteria [Doctoral dissertation, The Pennsylvania State University]. ProQuest Dissertations and Theses Global.
*Converse, P. D., Oswald, F. L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (2008). Comparing personality test formats and warnings: Effects on criterion-related validity and test-taker reactions. International Journal of Selection and Assessment, 16(2), 155–169. https://doi.org/10.1111/j.1468-2389.2008.00420.x
*Cooper, L. A. (1999). An investigation of the relationship between the ASA model and biodata [Doctoral dissertation, University of Georgia]. ProQuest Dissertations and Theses Global.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
*Cucina, J. M., Caputo, P. M., Thibodeaux, H. F., & MacLane, C. N. (2012). Unlocking the key to biodata scoring: A comparison of empirical, rational, and hybrid approaches at different sample sizes. Personnel Psychology, 65(2), 385–428. https://doi.org/10.1111/j.1744-6570.2012.01244.x
*Cucina, J. M., Caputo, P. M., Thibodeaux, H. F., MacLane, C. N., & Bayless, J. M. (2013). Scoring biodata: Is it rational to be quasi-rational? International Journal of Selection and Assessment, 21(2), 226–232. https://doi.org/10.1111/ijsa.12032
*Dalessio, A. T., & Silverhart, T. A. (1994). Combining biodata test and interview information: Predicting decisions and performance criteria. Personnel Psychology, 47(2), 303–315. https://doi.org/10.1111/j.1744-6570.1994.tb01726.x
*Dean, M. A. (1999). On biodata construct validity, criterion-related validity, and adverse impact [Doctoral dissertation, Louisiana State University and Agricultural & Mechanical College]. ProQuest Dissertations and Theses Global.
*Dean, M. A. (2004). An assessment of biodata predictive ability across multiple performance criteria. Applied H.R.M. Research, 9(1), 1–12.
*Dean, M. A. (2013). Examination of ethnic group differential responding on a biodata instrument. Journal of Applied Social Psychology, 43(9), 1905–1917. https://doi.org/10.1111/jasp.12212
*Dean, M. A., & Russell, C. J. (2005). An examination of biodata theory-based constructs in a field context. International Journal of Selection and Assessment, 13(2), 139–149. https://doi.org/10.1111/j.0965-075X.2005.00308.x
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93(5), 880–896. https://doi.org/10.1037/0022-3514.93.5.880
*Drakeley, R. J., Herriot, P., & Jones, A. (1988). Biographical data, training success and turnover. Journal of Occupational Psychology, 61(2), 145–152. https://doi.org/10.1111/j.2044-8325.1988.tb00278.x
*Ducey, A. J. (2016). The cross-national generalizability of biographical data: An examination within a multinational organization [Doctoral dissertation, University of South Florida]. ProQuest Dissertations and Theses Global.
*Edgerton, H. A., Feinberg, M. R., & Thomson, K. F. (1957). Prediction of the "human relations" effectiveness of industrial supervisors. Personnel Psychology, 10(4), 421–430. https://doi.org/10.1111/j.1744-6570.1957.tb01614.x
*Fetzer, M. S. (2004). An examination of the construct validity, criterion-related validity, and adverse impact of the Cognitive Behavior Inventory (CBI) [Doctoral dissertation, The University of Southern Mississippi]. ProQuest Dissertations and Theses Global.
Fine, S. A., & Cronshaw, S. (1994). The role of job analysis in establishing the validity of biodata. In M. D. Mumford & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 39–64). CPP Books.
*Fiske, D. W. (1947). Validation of naval aviation cadet selection tests against training criteria. Journal of Applied Psychology, 31(6), 601–614. https://doi.org/10.1037/h0054274
*Frei, R. L. (1998). Fake this test! Do you have the ability to raise your score on a service orientation inventory? [Doctoral dissertation, The University of Akron]. ProQuest Dissertations and Theses Global.
Funder, D. C. (2016). The personality puzzle (8th ed.). W.W. Norton.
Gandy, J. A., Dye, D. A., & MacLane, C. N. (1994). Federal government selection: The individual achievement record. In M. D. Mumford & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 275–310). CPP Books.
Gentry, W. A., Gilmore, D. C., Shuffler, M. L., & Leslie, J. B. (2012). Political skill as an indicator of promotability among multiple rater sources. Journal of Organizational Behavior, 33(1), 89–104. https://doi.org/10.1002/job.740
Gessner, T. L., O'Connor, J. A., Clifton, T. C., Connelly, M. S., & Mumford, M. D. (1993). The development of moral beliefs: A retrospective study. Current Psychology, 12(3), 236–259. https://doi.org/10.1007/BF02686806
Gordon, L. V., & Medland, F. F. (1965). The cross-group stability of peer ratings of leadership potential. Personnel Psychology, 18(2), 173–177. https://doi.org/10.1111/j.1744-6570.1965.tb00275.x
Graham, K. E., McDaniel, M. A., Douglas, E. F., & Snell, A. F. (2002). Biodata validity decay and score inflation with faking: Do item attributes explain variance across items? Journal of Business and Psychology, 16(4), 573–592. https://doi.org/10.1023/A:1015454319119
*Griffin, B., & Hesketh, B. (2004). Why openness to experience is not a good predictor of job performance. International Journal of Selection and Assessment, 12(3), 243–251. https://doi.org/10.1111/j.0965-075X.2004.278_1.x
*Grösch, N. (2004). Validation of biodata inventory for expatriate selection: Assessing cross-cultural adaptability [Doctoral dissertation, Auburn University]. ProQuest Dissertations and Theses Global.
*Harold, C. M., McFarland, L. A., & Weekley, J. A. (2006). The validity of verifiable and non-verifiable biodata items: An examination across applicants and incumbents. International Journal of Selection and Assessment, 14(4), 336–346. https://doi.org/10.1111/j.1468-2389.2006.00355.x
*Harrington, C. (1969). Forecasting college performance from biographical data. Journal of College Student Personnel, 10(3), 156–160.
*Hawes, S. R. (2001). A comparison of biodata, ability, and a conditional reasoning test as predictors of reliable behavior in the workplace [Doctoral dissertation, The University of Tennessee]. ProQuest Dissertations and Theses Global.
*Helmreich, R., Bakeman, R., & Radloff, R. (1973). The life history questionnaire as a predictor of performance in Navy diver training. Journal of Applied Psychology, 57(2), 148–153. https://doi.org/10.1037/h0037126
*Hilliard, P. A. (2002). Comparison of the predictive validity of a written test, an integrity test, a conscientiousness questionnaire, a structured behavioral interview and a personality inventory in the assessment of job applicants' background investigations, and subsequent task and contextual job performance [Doctoral dissertation, University of Southern California]. ProQuest Dissertations and Theses Global.
*Hinrichs, J. R., Haanperä, S., & Sonkin, L. (1976). Validity of a biographical information blank across national boundaries. Personnel Psychology, 29(3), 417–421. https://doi.org/10.1111/j.1744-6570.1976.tb00425.x
*Hoffman, R. R., III, Muraca, S., Heffner, T. S., Hendricks, R., & Hunter, A. E. (2008). Selection for accelerated basic combat training (No. ARI-TR-1241). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/a494811.pdf
*Hoiberg, A., Booth, R. F., & Berry, N. H. (1977). Non-cognitive variables related to performance in Navy "A" schools. Psychological Reports, 41(2), 647–655. https://doi.org/10.2466/pr0.1977.41.2.647
*Horgen, K. E., Kubisiak, U. C., Connell, P. W., White, L. A., Bruk-Lee, V. B., Penney, L. M., Borman, W. C., Kaufman, J. D., & Bowles, S. V. (2005). Station commander job analysis and preliminary test validation results (Research No. 2006-01). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/a440172.pdf
Hough, L. M. (2010). Assessment of background and life experience: The past as prologue. In J. C. Scott & D. H. Reynolds (Eds.), Handbook of workplace assessment: Evidence-based practices for selecting and developing organizational talent (pp. 109–139). Jossey-Bass.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection, and amelioration of adverse impact in personnel selection procedures: Issues, evidence, and lessons learned. International Journal of Selection and Assessment, 9(1–2), 152–194. https://doi.org/10.1111/1468-2389.00171
Hough, L., & Paullin, C. (1994). Construct-oriented scale construction: The rational approach. In G. S. Stokes, M. D. Mumford, & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 109–145). CPP Books.
Huang, Y. M., Chen, C. C., & Lai, S. Y. (2013). Test of a multidimensional model linking applicant work experience and recruiters' inferences about applicant competencies. International Journal of Human Resource Management, 24(19), 3613–3629. https://doi.org/10.1080/09585192.2013.777935
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001). Identification and meta-analytic assessment of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86(5), 897–913. https://doi.org/10.1037/0021-9010.86.5.897
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96(1), 72–98. https://doi.org/10.1037/0033-2909.96.1.72
Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75(1), 28–42. https://doi.org/10.1037/0021-9010.75.1.28
Jackson, D. N. (1975). The relative validity of scales prepared by naive item writers and those based on empirical methods of personality scale construction. Educational and Psychological Measurement, 35(2), 361–370. https://doi.org/10.1177/001316447503500214
Joseph, D. L., Jin, J., Newman, D. A., & O'Boyle, E. H. (2015). Why does self-reported emotional intelligence predict job performance? A meta-analytic investigation of mixed EI. Journal of Applied Psychology, 100(2), 298–342. https://doi.org/10.1037/a0037681
Judge, T. A., & Bono, J. E. (2001). Relationship of core self-evaluations traits—self-esteem, generalized self-efficacy, locus of control, and emotional stability—with job satisfaction and job performance: A meta-analysis. Journal of Applied Psychology, 86(1), 80–92. https://doi.org/10.1037/0021-9010.86.1.80
Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87(4), 765–780. https://doi.org/10.1037/0021-9010.87.4.765
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical representations of the five-factor model of personality in predicting job performance: Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied Psychology, 98(6), 875–925. https://doi.org/10.1037/a0033901
*Karas, M., & West, J. (1999). Construct-oriented biodata development for selection to a differentiated performance domain. International Journal of Selection and Assessment, 7(2), 86–96. https://doi.org/10.1111/1468-2389.00109
*Kilcullen, R. N. (1995). The development and use of rationally-keyed background data scales to predict leader effectiveness [Doctoral dissertation, George Mason University]. ProQuest Dissertations and Theses Global.
*Kilcullen, R., Goodwin, J., Chen, G., Wisecarver, M., & Sanders, M. (2002). Identifying agile and versatile officers to serve in the objective force. U.S. Army Research Institute for the Behavioral and Social Sciences.
*Kilcullen, R. N., Putka, D. J., & McCloy, R. A. (2007). Validation of the rational biodata inventory (RBI): Concurrent validation of experimental Army enlisted personnel selection and classification measures (contractor report). Human Resources Research Organization.
*Knapp, D. J., & Heffner, T. S. (2010). Expanded enlistment eligibility metrics (EEEM): Recommendations on a non-cognitive screen for new soldier selection (Report No. 1267). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/a523962.pdf
König, C. J., Klehe, U. C., Berchtold, M., & Kleinmann, M. (2010). Reasons for being selective when choosing personnel selection procedures. International Journal of Selection and Assessment, 18(1), 17–27. https://doi.org/10.1111/j.1468-2389.2010.00485.x
Landers, R. N. (2019). The Cambridge handbook of technology and employee behavior. Cambridge University Press. https://doi.org/10.1017/9781108649636
*Laurent, H. (1970). Cross-cultural cross-validation of empirically validated tests. Journal of Applied Psychology, 54(5), 417–423. https://doi.org/10.1037/h0029920
*Law, K. S., Mobley, W. H., & Wong, C. (2002). Impression management and faking in biodata scores among Chinese job-seekers. Asia Pacific Journal of Management, 19(4), 541–556. https://doi.org/10.1023/A:1020521726390
*Lefkowitz, J. (1972). Differential validity: Ethnic group as a moderator in predicting tenure. Personnel Psychology, 25(2), 223–240. https://doi.org/10.1111/j.1744-6570.1972.tb01100.x
*Legree, P. J., Kilcullen, R. N., Putka, D. J., & Wasko, L. E. (2014). Identifying the leaders of tomorrow: Validating predictors of leader performance. Military Psychology, 26(4), 292–309. https://doi.org/10.1037/mil0000054
Lievens, F., & Sackett, P. R. (2017). The effects of predictor method factors on selection outcomes: A modular approach to personnel selection procedures. Journal of Applied Psychology, 102(1), 43–66. https://doi.org/10.1037/apl0000160
*Lyons, T. J., Bayless, J. A., & Park, R. K. (2001). Relationship of cognitive, biographical, and personality measures with the training and job performance of detention enforcement officers in a federal government agency. Applied H.R.M. Research, 6(1), 67–70.
*MacLane, C. N., Cucina, J. M., Busciglio, H. H., & Su, C. (2019). Supervisory opportunity to observe moderates criterion-related validity estimates. International Journal of Selection and Assessment, 28(1), 55–67. https://doi.org/10.1111/ijsa.12267
Mael, F. A. (1991). A conceptual rationale for the domain and attributes of biodata items. Personnel Psychology, 44(4), 763–792. https://doi.org/10.1111/j.1744-6570.1991.tb00698.x
*Mael, F. A., & Ashforth, B. E. (1995). Loyal from day one: Biodata, organizational identification, and turnover among newcomers. Personnel Psychology, 48(2), 309–333. https://doi.org/10.1111/j.1744-6570.1995.tb01759.x
*Mael, F. A., & Hirsch, A. C. (1993). Rainforest empiricism and quasi-rationality: Two approaches to objective biodata. Personnel Psychology, 46(4), 719–738. https://doi.org/10.1111/j.1744-6570.1993.tb01566.x
*Manley, G., Benavidez, J., & Dunn, K. (2007). Development of a personality biodata measure to predict ethical decision making. Journal of Managerial Psychology, 22(7), 664–682. https://doi.org/10.1108/02683940710820091
*Marler, L. E. (2008). Proactive behavior: A selection perspective [Doctoral dissertation, Louisiana Tech University]. ProQuest Dissertations and Theses Global.
*Matteson, M. T. (1978). An alternative approach to using biographical data for predicting job success. Journal of Occupational Psychology, 51(2), 155–162. https://doi.org/10.1111/j.2044-8325.1978.tb00410.x
*McDaniel, M. A. (1989). Biographical constructs for predicting employee suitability. Journal of Applied Psychology, 74(6), 964–970. https://doi.org/10.1037/0021-9010.74.6.964
*McElreath, J. (1999). Development of a biodata measure of leadership skills [Doctoral dissertation, Wayne State University]. ProQuest Dissertations and Theses Global.
*McElreath, J., Cucina, J. M., Busciglio, H., & Reilly, S. M. (2007). Construct validity of competency-based biodata scales in an enforcement occupation [Paper presentation]. 22nd meeting of the Society for Industrial and Organizational Psychology, New York, NY, United States. https://doi.org/10.1037/e518532013-165
*McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures. Journal of Applied Psychology, 85(5), 812–821. https://doi.org/10.1037/0021-9010.85.5.812
McIlveen, P., Beccaria, G., & Burton, L. J. (2013). Beyond conscientiousness: Career optimism and satisfaction with academic major. Journal of Vocational Behavior, 83(3), 229–236. https://doi.org/10.1016/j.jvb.2013.05.005
McNatt, D. B. (2000). Ancient Pygmalion joins contemporary management: A meta-analysis of the result. Journal of Applied Psychology, 85(2), 314–322. https://doi.org/10.1037/0021-9010.85.2.314
*Mead, A. D. (2000). Properties of a resampling validation technique for empirically scoring psychological assessments [Doctoral dissertation, University of Illinois at Urbana-Champaign]. ProQuest Dissertations and Theses Global.
Meriac, J. P., Hoffman, B. J., Woehr, D. J., & Fleisher, M. S. (2008). Further evidence for the validity of assessment center dimensions: A meta-analysis of the incremental criterion-related validity of dimension ratings. Journal of Applied Psychology, 93(5), 1042–1052. https://doi.org/10.1037/0021-9010.93.5.1042
*Michael, W. B., & Colson, K. R. (1979). The development and validation of a life experience inventory for the identification of creative electrical engineers. Educational and Psychological Measurement, 39(2), 463–470. https://doi.org/10.1177/001316447903900228
*Mitchell, T. W., & Klimoski, R. J. (1982). Is it rational to be empirical? A test of methods for scoring biographical data. Journal of Applied Psychology, 67(4), 411–418. https://doi.org/10.1037/0021-9010.67.4.411
*Mock, S. J. (1947). Biographical data. In J. P. Guilford (Ed.), Printed classification tests (pp. 767–795). Army Air Forces Aviation Psychology Program Research Reports.
*Moffett, R. G., III. (1997). Relationship between the Big Five personality factors and biodata factors [Doctoral dissertation, Auburn University]. ProQuest Dissertations and Theses Global.
*Mount, M. K., Witt, L. A., & Barrick, M. R. (2000). Incremental validity of empirically keyed biodata scales over GMA and the five factor personality constructs. Personnel Psychology, 53(2), 299–323. https://doi.org/10.1111/j.1744-6570.2000.tb00203.x
Mumford, M. D., Barrett, J. D., & Hester, K. S. (2012). Background data: Use of experiential knowledge in personnel selection. In N. Schmitt (Ed.), The Oxford handbook of personnel assessment and selection (pp. 353–382). Oxford University Press.
Mumford, M. D., Costanza, D. P., Connelly, M. S., & Johnson, J. F. (1996). Item generation procedures and background data scales: Implications for construct and criterion-related validity. Personnel Psychology, 49(2), 361–398. https://doi.org/10.1111/j.1744-6570.1996.tb01804.x
Mumford, M. D., & Owens, W. A. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11(1), 1–31. https://doi.org/10.1177/014662168701100101
Mumford, M. D., Stokes, G. S., & Owens, W. A. (1990). Patterns of life history: The ecology of human individuality. Lawrence Erlbaum.
Murphy, K. R., Cleveland, J. N., & Hanscom, M. E. (2018). Performance appraisal and management. Sage Publications.
National Center for O*NET Development. (2017, December 7). O*NET-SOC taxonomy. https://www.onetcenter.org/taxonomy.html
Nickels, B. J. (1994). The nature of biodata. In G. S. Stokes, M. D. Mumford, & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 1–16). CPP Books.
*O'Connell, M. S., Hattrup, K., Doverspike, D., & Cober, A. (2002). The validity of "mini" simulations for Mexican retail salespeople. Journal of Business and Psychology, 16(4), 593–600. https://doi.org/10.1023/A:1015406420028
Ones, D. S., & Viswesvaran, C. (1996). Bandwidth-fidelity dilemma in personality measurement for personnel selection. Journal of Organizational Behavior, 17(6), 609–626. https://doi.org/10.1002/(SICI)1099-1379(199611)17:6<609::AID-JOB1828>3.0.CO;2-K
*Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a biodata measure and situational judgment inventory as predictors of college student performance. Journal of Applied Psychology, 89(2), 187–207. https://doi.org/10.1037/0021-9010.89.2.187
*Owens, W. A. (1969). Cognitive, noncognitive, and environmental correlates of mechanical ingenuity. Journal of Applied Psychology, 53(3, Pt. 1), 199–208. https://doi.org/10.1037/h0027378
Owens, W. A., & Schoenfeldt, L. F. (1979). Toward a classification of persons. Journal of Applied Psychology, 64(5), 569–607. https://doi.org/10.1037/0021-9010.64.5.569
Page, M. J., McKenzie, J., Bossuyt, P., Boutron, I., Hoffmann, T., & Mulrow, C. D. (2020). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. MetaArXiv.
*Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata and situational judgment tests comparable? Personnel Psychology, 56(3), 733–752. https://doi.org/10.1111/j.1744-6570.2003.tb00757.x
Poropat, A. E. (2009). A meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin, 135(2), 322–338. https://doi.org/10.1037/a0014996
*Prasad, J. J., Showler, M. B., Schmitt, N., Ryan, A. M., & Nye, C. D. (2017). Using biodata and situational judgment inventories across cultural groups. International Journal of Testing, 17(3), 210–233. https://doi.org/10.1080/15305058.2016.1218338
Principles for the validation and use of personnel selection procedures. (2018). Industrial and Organizational Psychology: Perspectives on Science and Practice, 11(Suppl. 1), 2–97. https://doi.org/10.1017/iop.2018.195
Pulakos, E. D., & O'Leary, R. S. (2011). Why is performance management broken? Industrial and Organizational Psychology: Perspectives on Science and Practice, 4(2), 146–164. https://doi.org/10.1111/j.1754-9434.2011.01315.x
*Pulakos, E. D., & Schmitt, N. (1996). An evaluation of two strategies for reducing adverse impact and their effects on criterion-related validity. Human Performance, 9(3), 241–258. https://doi.org/10.1207/s15327043hup0903_4
*Putka, D. J. (2009). Initial development and validation of assessments for predicting disenrollment of four-year scholarship recipients from the reserve officer training corps (Report No. 2009-06). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/a495510.pdf
*Putka, D. J., Beatty, A. S., & Reeder, M. C. (2018). Modern prediction methods: New perspectives on a common problem. Organizational Research Methods, 21(3), 689–732. https://doi.org/10.1177/1094428117697041
Putka, D. J., & Bradley, K. M. (2008). Relations between Select21 predictor measures and first term attrition (Research Note No. 2008-02). U.S. Army Research Institute for the Behavioral and Social Sciences.
*Reeder, M. C., & Schmitt, N. (2013). Motivational and judgment predictors of African American academic achievement at PWIs and HBCUs. Journal of College Student Development, 54(1), 29–42. https://doi.org/10.1353/csd.2013.0006
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35(1), 1–62. https://doi.org/10.1111/j.1744-6570.1982.tb02184.x
Reiter-Palmon, R., & Connelly, M. S. (2000). Item selection counts: A comparison of empirical key and rational scale validities in theory-based and non-theory-based item pools. Journal of Applied Psychology, 85(1), 143–151. https://doi.org/10.1037/0021-9010.85.1.143
*Ritchie, R. J., & Boehm, V. R. (1977). Biographical data as a predictor of women's and men's management potential. Journal of Vocational Behavior, 11(3), 363–368. https://doi.org/10.1016/0001-8791(77)90031-8
*Roenicke, C. C. (2013). Extending the frame of reference effect beyond conscientiousness [Doctoral dissertation, Seattle Pacific University]. ProQuest Dissertations and Theses Global.
Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A new meta-analysis. Journal of Applied Psychology, 88(4), 694–706. https://doi.org/10.1037/0021-9010.88.4.694
Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75(2), 175–184. https://doi.org/10.1037/0021-9010.75.2.175
Russell, C. J. (1994). Generation procedures for biodata items: A point of departure. In M. D. Mumford & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 17–38). CPP Books.
*Russell, C. J., Kantrowitz, T., Tuzinski, K., & Reddock, C. (2015, April). Novel research and advances in biodata. In E. Dunleavy & J. Cucina (Chairs), Crafting biographical information items to predict network leadership [Symposium]. Annual Conference for the Society of Industrial and Organizational Psychologists, Philadelphia, PA.
*Russell, T. L., Paullin, C. J., Legree, P. J., Kilcullen, R. N., & Young, M. C. (2017). Identifying and validating selection tools for predicting officer performance and retention (Research No. 2017-01). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/1038674.pdf
Sajjadiani, S., Sojourner, A. J., Kammeyer-Mueller, J. D., & Mykerezi, E. (2019). Using machine learning to translate applicant work history into predictors of performance and turnover. Journal of Applied Psychology, 104(10), 1207–1225. https://doi.org/10.1037/apl0000405
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262
Schmidt, F. L., & Hunter, J. (2004). General mental ability in the world of work: Occupational attainment and job performance. Journal of Personality and Social Psychology, 86(1), 162–173. https://doi.org/10.1037/0022-3514.86.1.162
Schmidt, F., & Hunter, J. (2015). Methods of meta-analysis (3rd ed.). Sage Publications.
Schmitt, N., & Golubovich, J. (2013). Biographical information. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology: Test theory and testing and assessment in industrial and organizational psychology (pp. 437–455). American Psychological Association.
*Schmitt, N., Keeney, J., Oswald, F. L., Pleskac, T. J., Billington, A. Q., Sinha, R., & Zorzie, M. (2009). Prediction of 4-year college student performance using cognitive and noncognitive predictors and the impact on demographic status of admitted students. Journal of Applied Psychology, 94(6), 1479–1497. https://doi.org/10.1037/a0016810
*Schmitt, N., Oswald, F. L., Kim, B. H., Imus, A., Merritt, S., Friede, A., & Shivpuri, S. (2007). The use of background and ability profiles to predict college student outcomes. Journal of Applied Psychology, 92(1), 165–179. https://doi.org/10.1037/0021-9010.92.1.165
Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink science in five dimensions or less. Journal of Organizational Behavior, 17(6), 639–655. https://doi.org/10.1002/(SICI)1099-1379(199611)17:6<639::AID-JOB3828>3.0.CO;2-9
*Schoenfeldt, L. F. (1999). From dust bowl empiricism to rational constructs in biographical data. Human Resource Management Review, 9(2), 147–167. https://doi.org/10.1016/S1053-4822(99)00016-9
*Scollay, R. W. (1957). Personal history data as a predictor of success. Personnel Psychology, 10(1), 23–26. https://doi.org/10.1111/j.1744-6570.1957.tb00763.x
*Sinha, R., Oswald, F., Imus, A., & Schmitt, N. (2011). Criterion-focused approach to reducing adverse impact in college admissions. Applied Measurement in Education, 24(2), 137–161. https://doi.org/10.1080/08957347.2011.554605
*Sisco, H., & Reilly, R. R. (2007). Development and validation of a biodata inventory as an alternative method to measurement of the five-factor model of personality. The Social Science Journal, 44(2), 383–389. https://doi.org/10.1016/j.soscij.2007.03.010
*Solomonson, A. L. (1999). Development and evaluation of a construct-oriented biodata measure for predicting positive and negative contextual performance [Doctoral dissertation, University of Georgia]. ProQuest Dissertations and Theses Global.
Speer, A. B. (2018). Quantifying with words: An investigation of the validity of narrative-derived performance scores. Personnel Psychology, 71(3), 299–333. https://doi.org/10.1111/peps.12263
*Speer, A. B., Siver, S. R., & Christiansen, N. D. (2020). Applying theory to the black box: A model for empirically scoring biodata. International Journal of Selection and Assessment, 28(1), 68–84. https://doi.org/10.1111/ijsa.12271
Stokes, G. S. (1994). Introduction and history. In M. D. Mumford & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. xv–xix). CPP Books.
*Stokes, G. S., & Cooper, L. A. (2001). Content/construct approaches in life history form development for selection. International Journal of Selection and Assessment, 9(1–2), 138–151. https://doi.org/10.1111/1468-2389.00170
Stokes, G. S., Mumford, M. D., & Owens, W. A. (Eds.). (1994). Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. CPP Books.
*Stokes, G. S., & Searcy, C. A. (1999). Specification of scales in biodata form development: Rational vs. empirical and global vs. specific. International Journal of Selection and Assessment, 7(2), 72–85. https://doi.org/10.1111/1468-2389.00108
*Stokes, G. S., Toth, C. S., Searcy, C. A., Stroupe, J. P., & Carter, G. W. (1999). Construct/rational biodata dimensions to predict salesperson performance: Report on the U.S. Department of Labor sales study. Human Resource Management Review, 9(2), 185–218. https://doi.org/10.1016/S1053-4822(99)00018-2
*Stricker, L. J., & Rock, D. A. (1998). Assessing leadership potential with a biographical measure of personality traits. International Journal of Selection and Assessment, 6(3), 164–184. https://doi.org/10.1111/1468-2389.00087
Sturman, M. C., Cheramie, R. A., & Cashen, L. H. (2005). The impact of job complexity and performance measurement on the temporal consistency, stability, and test-retest reliability of employee job performance ratings. Journal of Applied Psychology, 90(2), 269–283. https://doi.org/10.1037/0021-9010.90.2.269
*Taylor, C. W., & Ellison, R. L. (1967). Biographical predictors of scientific performance. Science, 155(3766), 1075–1080. https://doi.org/10.1126/science.155.3766.1075
Taylor, P. J., Russ-Eft, D. F., & Chan, D. W. L. (2005). A meta-analytic review of behavior modeling training. Journal of Applied Psychology, 90(4), 692–709. https://doi.org/10.1037/0021-9010.90.4.692
*Tears, R. S. (2002). Development and evaluation of a construct-oriented biodata measure for predicting organizational citizenship behaviors [Doctoral dissertation, Auburn University]. ProQuest Dissertations and Theses Global.
*Telonson, P. A., Alexander, R. A., & Barrett, G. V. (1983). Scoring the biographical information blank: A comparison of three weighting techniques. Applied Psychological Measurement, 7(1), 73–80. https://doi.org/10.1177/014662168300700110
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60(4), 967–993. https://doi.org/10.1111/j.1744-6570.2007.00098.x
Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. Wiley.
*Tucker, M. F., Cline, V. B., & Schmitt, J. R. (1967). Prediction of creativity and other performance measures from biographical information among pharmaceutical scientists. Journal of Applied Psychology, 51(2), 131–138. https://doi.org/10.1037/h0024428
Uhlman, C. E., & Mumford, M. D. (1993). Application of background data to the selection of State Department foreign service officers. U.S. Department of State.
van der Linden, D., Nijenhuis, J., & Bakker, A. B. (2010). The general factor of personality: A meta-analysis of Big Five intercorrelations and a criterion-related validity study. Journal of Research in Personality, 44(3), 315–327. https://doi.org/10.1016/j.jrp.2010.03.003
*Van Iddekinge, C. H., Ferris, G. R., & Heffner, T. S. (2009). Test of a multistage model of distal and proximal antecedents of leader performance. Personnel Psychology, 62(3), 463–495. https://doi.org/10.1111/j.1744-6570.2009.01145.x
Van Iddekinge, C. H., Roth, P. L., Putka, D. J., & Lanivich, S. E. (2011). Are you interested? A meta-analysis of relations between vocational interests and employee performance and turnover. Journal of Applied Psychology, 96(6), 1167–1194. https://doi.org/10.1037/a0024343
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81(5), 557–574. https://doi.org/10.1037/0021-9010.81.5.557
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90(1), 108–131. https://doi.org/10.1037/0021-9010.90.1.108
*Wasko, L. E., Putka, D. J., Legree, P. J., & Kilcullen, R. N. (2019). Validation of measures for predicting leader development and assessment course performance (Technical Report No. 1375). U.S. Army Research Institute for the Behavioral and Social Sciences. https://apps.dtic.mil/dtic/tr/fulltext/u2/1080161.pdf
Webb, N. M., Shavelson, R. J., & Haertel, E. H. (2006). Reliability coefficients and generalizability theory. Handbook of Statistics, 26, 81–124. https://doi.org/10.1016/S0169-7161(06)26004-8
*Whitney, D. J., & Schmitt, N. (1997). Relationship between culture and responses to biodata employment items. Journal of Applied Psychology, 82(1), 113–129. https://doi.org/10.1037/0021-9010.82.1.113
*Wollowick, H. B., & McNamara, W. J. (1969). Relationship of the components of an assessment center to management success. Journal of Applied Psychology, 53(5), 348–352. https://doi.org/10.1037/h0028102

(Appendices follow)
Appendix A
Studies Used to Establish Range Restriction Estimates

Study Type N uSD rSD u


Brown (1981)—Company 1 Direct selection 3,590 4.98 4.71 0.95
Brown (1981)—Company 2 Direct selection 768 5.07 4.61 0.91
Brown (1981)—Company 3 Direct selection 949 5.20 4.37 0.84
Brown (1981)—Company 4 Direct selection 1,606 4.78 4.64 0.97
Brown (1981)—Company 5 Direct selection 752 5.13 4.74 0.92
Brown (1981)—Company 6 Direct selection 893 5.14 4.48 0.87
Brown (1981)—Company 7 Direct selection 606 4.93 4.79 0.97
Brown (1981)—Company 8 Direct selection 793 5.02 4.46 0.89
Brown (1981)—Company 9 Direct selection 771 4.92 4.24 0.86
Brown (1981)—Company 10 Direct selection 658 5.12 4.56 0.89
Brown (1981)—Company 11 Direct selection 661 4.98 4.44 0.89
Brown (1981)—Company 12 Direct selection 406 4.80 4.37 0.91
Becton et al. (2009) Direct selection 891 16.62 14.83 0.89
Ployhart et al. (2003) Indirect selection 425 10.15 10.22 1.01
Rothstein et al. (1990) Indirect selection 11,332 4.72 4.68 0.99
Consulting company data Indirect selection 1,043 0.15 0.15 1.00

Note. N = sample size, based on the incumbent sample; uSD = unrestricted standard deviation, estimated from an applicant sample; rSD = restricted standard deviation; u = the restricted standard deviation divided by the unrestricted standard deviation.
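The u values above feed the standard correction for direct range restriction. As a worked illustration (the observed validity of .30 is an arbitrary assumed value; u = .95 is taken from the Brown, 1981, Company 1 row above):

```latex
% Correction for direct range restriction, with u = rSD / uSD.
% r = .30 is an assumed illustrative observed validity; u = .95 is the
% Brown (1981), Company 1 value from the table above.
\[
\rho = \frac{r/u}{\sqrt{1 + r^{2}\left(\frac{1}{u^{2}} - 1\right)}}
     = \frac{.30/.95}{\sqrt{1 + .09\,(1.108 - 1)}}
     \approx \frac{.316}{1.005} \approx .31
\]
```

With u values this close to 1.00, the correction is modest; it becomes consequential only when selection is substantially more severe.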

Appendix B
Meta-Analytical Results for “Other” Biodata Category

If multiple "other" measures existed within the same sample, simple averages were computed for the "other" biodata category when composites needed to be formed; this reflects a randomly chosen measure from this undefined construct domain.

It should be noted that because the "other" category is composed of heterogeneous scale domains and has no psychological meaning, results should be interpreted with caution. Findings regarding overall validity, scoring method, and comparisons by other moderators are confounded by the lack of clarity in this category of biodata scales.

(Appendices continue)
Table B1
Meta-Analytical Results for the "Other" Biodata Domain

Category k N r ρ SDρ (SDr) 80% CV lower 80% CV upper

Overall job performance ratings
Other 22 8,969 0.01 0.01 .07 (.08) −.08 .10
Rational 19 7,667 −0.01 −0.02 .05 (.07) −.08 .05
Applicant predictive 9 4,311 0.02 0.04 .07 (.08) −.06 .13
Incumbent 13 4,658 −0.01 −0.01 .06 (.08) −.09 .06
Aligned job performance ratings
Other 14 4,499 0.28 0.39 .15 (.17) .21 .58
Empirical 9 1,374 0.23 0.32 .00 (.09) .32 .32
Hybrid 3 1,885 0.35 0.48 .19 (.21) .23 .73
Rational 3 1,792 0.24 0.34 .09 (.11) .23 .45
Objective performance
Other 6 2,710 0.06 0.07 .00 (.05) .07 .07
Rational 4 2,567 0.05 0.06 .00 (.03) .06 .06
Training performance
Other 8 10,920 0.18 0.20 .07 (.09) .10 .29
Rational 5 3,034 0.04 0.04 .00 (.03) .04 .04
Advancement potential
Other 8 5,117 0.06 0.06 .13 (.14) −.10 .23
Rational 6 4,556 0.04 0.04 .12 (.13) −.15 .19
Extraversion
Other 12 4,843 0.09 0.11 .21 (.22) −.16 .37
Rational 10 4,434 0.09 0.12 .21 (.17) −.15 .38
Agreeableness
Other 11 4,580 0.16 0.22 .08 (.10) .12 .33
Rational 10 4,434 0.16 0.23 .08 (.08) .12 .33
Emotional stability
Other 12 18,557 0.16 0.34 .12 (.06) .18 .50
Rational 9 16,824 0.16 0.35 .12 (.05) .20 .50
Conscientiousness
Other 15 18,576 0.15 0.30 .13 (.09) .13 .46
Rational 13 18,167 0.16 0.31 .09 (.06) .19 .43
Openness
Other 11 4,580 0.11 0.15 .15 (.12) −.05 .34
Rational 10 4,434 0.11 0.15 .15 (.13) −.05 .34
Cognitive ability
Other 15 93,931 0.05 0.09 .19 (.09) −.15 .34
Empirical 5 74,156 0.02 0.02 .02 (.02) −.01 .04
Rational 9 18,754 0.17 0.39 .27 (.13) .05 .74

Note. k = number of samples; N = combined sample size; r = uncorrected correlation; ρ = corrected correlation (corrected for range restriction and criterion unreliability for performance outcomes, and for measurement error in both measures for convergent correlations); SDρ = standard deviation of true correlations (SDr = observed standard deviation); CV = 80% credibility interval.
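For readers reproducing the credibility intervals, the 80% bounds follow directly from the corrected mean and SDρ. The worked example below uses the "Other" row for aligned job performance ratings and matches the tabled bounds up to rounding:

```latex
% 80% credibility interval: rho +/- z * SD_rho, with z = 1.28 for 80% coverage.
% Using rho = .39 and SD_rho = .15 from the aligned job performance "Other" row:
\[
.39 \pm 1.28 \times .15 = [.20,\ .58]
\]
```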

Received July 1, 2020


Revision received July 26, 2021
Accepted July 31, 2021 ▪
