*This document is intended to provide a simplified and concise overview of the science behind The Predictive Index
Behavioral Assessment. It therefore does not contain all of the scientific details related to the
development, scoring, and statistical properties of the assessment. For a complete and thorough overview, please
see The PI Behavioral Assessment Technical Manual or the EFPA report.
The Science Behind the
PI Behavioral Assessment
Table of Contents
About the Predictive Index
The PI Behavioral Assessment
Designed for the Workplace
Proven Predictive Power in the Field
Built for the Entire Employee Lifecycle
Engineered with Modern Psychometrics
Assessment Concepts and Theory
Assessment Development, Review, and Translation
Norm Sample
Validity
Reliability
Fairness
Conclusions
References
B EXTRAVERSION: The degree to which an individual seeks social interaction with other people. Individuals
who score high on this dimension tend to be outgoing, persuasive, and socially poised. Individuals who score
low on this dimension tend to be serious, introspective, and task-oriented.

C PATIENCE: The degree to which an individual seeks consistency and stability in his or her environment.
Individuals who score high on this dimension tend to be patient, consistent, and deliberate. Individuals who
score low on this dimension tend to be fast-paced, urgent, and intense.
PI Behavioral Assessment results are reported within-person, meaning that the focus is not on how one’s
personality compares to other people, but instead on whether or not one’s behavioral drives are aligned with
the behavioral demands of the work environment. This approach to assessment maintains comparable
validity and reliability to more conventional personality assessments but has the added benefit of focusing
on how a person is predicted to behave rather than how they compare to other people (e.g., Saville &
Willson, 1991). For example, in the charts below, the candidate’s behavioral pattern is shown first. This
person tends to approach their work with Dominance behaviors (Factor A) followed by Extraversion
behaviors (Factor B) and tends not to express Patience or Formality behaviors (Factors C and D) as
consistently. The second chart shows the demands of their particular job role, as measured with the PI Job
Assessment. This person is an excellent behavioral fit for their role.
Figure 1. PI Behavioral Assessment plots of a candidate (above) showing relative expression of their
behavioral drives. This is compared to a behavioral job target (below) from the PI Job Assessment. This
person is an excellent behavioral fit for their job.
[Figure 2: histogram of criterion validity coefficients (correlations), x-axis from 0.10 to 0.90, with bar
heights of 31%, 26%, 19%, 14%, 6%, 2%, and 2%.]
Figure 2. Distribution of criterion validity coefficient effects showing the strength of statistically significant
relationships between PI Behavioral Assessment scores and 4,800 measures of job performance in nearly
350 studies with PI clients. The PI Behavioral Assessment predicted performance in 94% of these studies.
PI continues to expand its portfolio of validity research and the sophistication of the methods used in its
analyses. PI clients have worked with PI's researchers to go beyond correlations, designing customized
studies that examine how behavior interacts with variables such as tenure or regional differences.
[Figure 3: two panels of sales ($0 to $1,500,000) comparing high Dominance (+2.0 σ) with low Dominance
(-2.0 σ) salespeople, first across three locations (Kennesaw, Alpharetta, Westminster) and then across the
Dominance range from -3σ to +3σ.]
Figure 3. Example of 2017 PI study showing relationship between Dominance behaviors and sales,
accounting for location effects in a sample of U.S. retail salespeople.
about who those others may be (e.g., coworkers, clients, friends, family, managers, neighbors). Thus, the
use of unspecific stems and solitary adjectives supports an unconscious projective response that reflects
the respondent’s own perceptions of the major drivers of his or her self-concept.
Finally, it is worth noting that the decision to use the free-choice (rather than the forced-choice) selection of
adjectives in the PI Behavioral Assessment was made with the intent of enhancing the projective value of
the instrument, since a free-choice form provides less direction for the subject than does a forced-choice
form. Daniels argued that allowing the respondent this flexibility better represents one's stimulus to the
environment. Daniels referenced the work of Rogers (1951), explaining that this format allows one to record
the respondent’s reactions to his or her perception of the environment. The use of a checklist of adjectives
also mirrors the designs of other free-choice personality assessments. Gough (1983) recounts that he used
125 of the 171 adjectives from Cattell’s (1946) study of personality in the development of a checklist-based
personality assessment in 1949, with additional adjectives being added to the form in 1952. Clarke’s (1956)
assessment (mentioned previously) was also based on Marston’s (1928) theories, and it too was designed
as a free-choice checklist. These checklist assessments are still in use today. In addition to the projective
value of allowing the respondent to select adjectives to describe his or her Self and Self-Concept behavioral
configurations, the PI Behavioral Assessment's adjective checklist has an element of efficiency to it,
allowing each respondent to quickly decide whether or not each of the 86 words describes him or her
(Daniels, 1955).
Norm Sample
Standardizing scores for the PI Behavioral Assessment and ensuring the generalizability of the
psychometric analyses requires a representative sample. PI’s tools are designed to be used with the global
workforce, so it was critical to recruit an international sample for piloting and calibrating the PI Behavioral
Assessment. The PI Behavioral Assessment Norm is based on a sample of 9,645 working adults from 129
countries, and the assessment is now taken by more than two million people around the world every year.
Of the initial sample’s 7,658 respondents who reported their gender, 52% were male and 48% were female
(see Figure 4). The average age of the 7,336 respondents who reported age in the sample was 40.1 years
and ranged between 18 years and 64 years (see Figure 5). Although the sample included diverse
nationalities, 64% of the sample was from the U.S. The primary ethnic composition of these U.S.
respondents was Caucasian (74%), followed by African American or Black (12%), Latino, Latina or Hispanic
(8%), and Asian (2%).
Validity
The construct validity of the current revision of the PI Behavioral Assessment was examined using
exploratory factor analysis, confirmatory factor analysis, content alignment ratings, corrected item-total
correlations, differential item functioning (DIF) analysis across groups, and convergent correlations with
other personality scales (Foster et al., 2020). Additional validation has been conducted through criterion
validation studies (discussed previously), report accuracy studies, and analysis of the validity of use cases.
Overall, the expansive portfolio of validity evidence described here underscores that the PI Behavioral
Assessment is a valid and powerful tool for assessing workplace behaviors.
PI researchers began validation of the current revision with content alignment studies. The purpose of
content alignment studies was to quantify subject matter expert opinion about the alignment of existing and
potential new words or items with the PI Behavioral Assessment factors. In the current revision, this involved
the 86 existing adjectives of the previous revision of the PI Behavioral Assessment and 140 new pre-test
adjectives. Fifteen content experts, each with a doctorate in psychology, education, or a related field,
participated in content alignment studies. All experts had extensive experience in assessment, education,
and psychology, particularly as they apply to organizational environments. The content experts were asked
to rate the degree to which each word aligns with each PI Behavioral Assessment Factor on a scale of 0 to
100. This information was used to inform item selection along with other data.
To obtain classical test theory item statistics, the new and existing words were administered to operational
samples of respondents. Researchers collected 136,544 usable cases for analysis of the existing version of
the assessment and approximately 10,000 cases for each of the 140 new pre-test words (Foster et al.,
2015). These responses were used to calculate selection rates for adjectives and corrected item-total
correlations, which were used to select items for the final form and demonstrate construct validity.
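A corrected item-total correlation pairs each item with the scale total computed without that item, so the item cannot inflate its own statistic. The computation can be sketched as follows (the 0/1 checklist encoding and the simulated data are assumptions; PI's operational scoring is not public):

```python
import numpy as np

def corrected_item_total(responses: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation for each column (item).

    `responses` is an (n_respondents, n_items) array of 0/1 checklist
    selections. Each item is correlated with the sum of the *other*
    items, so the item does not contribute to its own total.
    """
    total = responses.sum(axis=1)
    out = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]  # scale total with item j removed
        out[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return out
```

Items that measure the same factor as the rest of the scale show clearly positive corrected correlations; items near zero (or negative) are candidates for removal during form construction.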
Exploratory factor analyses (EFA) were conducted to evaluate the unidimensionality of each of the factors
on the PI Behavioral Assessment, which was important for validation, as well as a requirement for the
assumptions underlying internal consistency reliability. Scree plots from the EFA show clear
unidimensionality in all five of the measured traits. Foster et al. (2020) also conducted a confirmatory factor
analysis of the factor structure for the PI Behavioral Assessment, testing the assumption that the observed
responses on the PI Behavioral Assessment represented five latent constructs (Factors A – E). Foster et al.
(2016) found that the model fit well (Root Mean Square Error of Approximation [RMSEA]=0.064,
Standardized Root Mean Square Residual [SRMR]=0.043, Goodness of Fit Index [GFI]=0.93, Comparative
Fit Index [CFI]=0.99). Validity coefficients from the factor loadings ranged from 0.73 to 0.85.
PI also conducted two convergent validity analyses to test whether PI Behavioral Assessment scores
correlated with other similar (but not identical) personality scales. The first study had a sample size of n =
1,023, and the second had a sample size of n = 800. Table 3 shows a sample of these results, and the full
results are reported in Foster et al. (2016). The correlational evidence from these studies leads to the
conclusion that, overall, convergent and divergent validity for the PI Behavioral factor scores was obtained.
Table 3. Sample of Convergent Validity of the PI BA Self Raw Scores with IPIP Scales.
IPIP Scale A B C D E
Assertiveness 0.50 0.32 -0.07 0.11 0.20
Domineering 0.31 0.10 -0.20 -0.04 -0.01
Extraversion 0.37 0.59 0.10 0.08 0.09
Sociability 0.33 0.54 0.15 0.13 0.12
Calmness ___ 0.23 0.50 0.23 0.22
Good-Nature 0.17 0.30 0.47 0.18 0.16
Patience 0.12 0.20 0.48 0.23 0.21
Methodicalness 0.15 0.12 0.17 0.43 0.23
Conscientiousness 0.24 0.20 0.13 0.37 0.22
Judgment 0.22 0.13 0.13 0.28 0.32
Independent Minded 0.26 ___ 0.18 0.23 0.41
Note: Boldfaced correlations are hypothesized convergent validity coefficients. Convergent validity is shown
by the boldfaced correlation being stronger and more positive than other correlations within each row.
Finally, there is the validation of the use of the PI Behavioral Assessment. As noted previously, PI has
conducted hundreds of criterion validity studies, the results of which demonstrate that the PI Behavioral
Assessment is both predictive and relevant for hiring decisions; however, PI must also make the case that
the assessment is useful for development of existing employees. The utility of the assessment for this
purpose can be demonstrated in two ways. First, PI provides a software system of reports and content that
supports the development use case. This is evidenced by the availability of Coaching Guides, Relationship
Guides, Personal Development Charts, and Team Workstyles reports. The content and design of these
tools are built to support the employee development use case, and each of these items is backed by careful
user testing and iterative improvements based on user feedback. The second way to evaluate whether the
PI Behavioral Assessment is useful for employee development is the evidence collected from client use. Not
all clients use the PI Behavioral Assessment for employee development, but a survey of 1,127 PI clients in
May 2017 showed that 691 (61%) of these clients use the PI Behavioral Assessment for employee
development. PI also tracks client usage at a more granular level through software, but the overall trend
provides strong evidence that the PI Behavioral Assessment is useful for employee development at many
companies.
However, just because it is being used does not necessarily mean that it is having the intended
consequences of improving employee development; i.e., companies can certainly engage in employee
development without the PI Behavioral Assessment. To verify that our tools are having the intended benefits
for clients, PI tracks a variety of feedback measures, both from users and respondents. For example, of the
clients using the PI Behavioral Assessment for employee development in that May 2017 study mentioned
previously, 559 clients (81%) agreed or strongly agreed that PI’s tools have “helped us develop better
employees.” We also track how well the reports resonate with the respondents and users, with recent
tracking showing that 87% of people agree or strongly agree with the interpretive text provided in their
reports (only 4% of respondents disagreed with their results, indicating a very high report accuracy).
Reliability
Reliability refers to the precision or consistency of measurement (Nunnally & Bernstein, 1994). A common
way to estimate reliability is by computing internal consistency reliability. Of the measures of internal
consistency reliability, the one used most often is coefficient alpha, which reflects the extent to which items
on a measure are intercorrelated. PI has conducted reliability analyses at several points over the years and
with region-specific samples, but the best representation of the reliability of the current revision of the PI
Behavioral Assessment was calculated with 9,645 results from the norm group sample. Table 4 shows the
reliability estimates (alpha) and the standard error of measurement (SEM) for each factor score.
Table 4. Scale Factor Score Reliability Estimates (Alpha) and Standard Error of Measurement (SEM).
Self Self-Concept Synthesis
Alpha SEM(σ) Alpha SEM(σ) Alpha SEM(σ)
A DOMINANCE 0.81 0.44σ 0.77 0.49σ 0.85 0.33σ
B EXTRAVERSION 0.79 0.41σ 0.79 0.42σ 0.85 0.30σ
C PATIENCE 0.79 0.45σ 0.77 0.48σ 0.83 0.36σ
D FORMALITY 0.85 0.38σ 0.84 0.41σ 0.88 0.29σ
E OBJECTIVITY 0.87 0.44σ 0.84 0.50σ 0.90 0.35σ
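Coefficient alpha and the SEM in Table 4 are linked by a standard relationship, SEM = SD × sqrt(1 − alpha); for the Self Dominance row (alpha = 0.81), an assumed score SD of 1σ gives sqrt(0.19) ≈ 0.44σ, in line with the table. Both computations can be sketched as follows (the data here are simulated for illustration only):

```python
import numpy as np

def coefficient_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = responses.shape[1]
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def standard_error_of_measurement(score_sd: float, alpha: float) -> float:
    """SEM = observed-score SD times sqrt(1 - reliability)."""
    return score_sd * np.sqrt(1 - alpha)
```

The SEM expresses measurement precision in the score's own units, which is why Table 4 reports it on the σ scale alongside each alpha.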
In addition to the internal consistency reliability estimates in Table 4, one can consider construct
reliability estimates, which indicate whether the constructs themselves are expected to be stable and
replicable. Construct reliability can be estimated with Coefficient H, which has several useful properties:
it is not affected by a loading's sign, it does not decrease when additional indicators are added, and it is
never smaller than the reliability of the best single indicator in the CFA model (Hancock & Mueller, 2001).
Table 5 reports the Coefficient H values from the CFA analysis detailed in Foster et al. (2020).
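Coefficient H is computed directly from the standardized factor loadings. A minimal sketch (the example loadings in the test echo the 0.73–0.85 range reported earlier, not the actual Table 5 values):

```python
def coefficient_h(loadings):
    """Coefficient H (Hancock & Mueller, 2001) from standardized CFA loadings.

    H = S / (1 + S), where S is the sum of l^2 / (1 - l^2) over the
    loadings. Each loading enters as a square, so the sign has no
    effect, and adding indicators can only increase S (and thus H).
    """
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return s / (1 + s)
```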
Reliability can also refer to stability over time. For example, if the PI Behavioral Assessment is used to
inform a hiring decision, will the information gleaned at the hiring stage still hold true a year later? What
about five years later? The most straightforward approach to estimating this kind of reliability is via repeated
measurements of the same person. The correlation between temporally separated scores is known as the
test-retest reliability or coefficient of stability. Three studies by Everton (1999) and Harris et al. (2014) using
the previous release of the PI Behavioral Assessment found 2-week test-retest reliabilities for primary
factors (Self domain) in the 0.71 to 0.84 range. A later study reported by Harris et al. (2014) investigated
test-retest across six months with an average test-retest reliability across factors of 0.71.
These studies show strong test-retest reliability evidence over short periods in experimental conditions, but it
is also worth looking at operational evidence over longer periods of time. Using the previous revision of the
PI Behavioral Assessment, Fossey (2017) analyzed an operational sample of 11,684 respondents who took
the PI Behavioral Assessment twice. These respondents’ first administration occurred between April 2003
and July 2015, and their second administration occurred between August 2006 and July 2015. The median
amount of time that passed between the two administrations was 270 days but ranged from 1 day to over 11
years. Fossey’s (2017) study used an operational sample, meaning that these retests were not initiated as
part of a research study.
To measure the coefficients of stability over different time periods, Fossey (2017) calculated Pearson
correlations for each six-month retest window sample for the PI Behavioral Assessment Self Factors. Figure
6 shows the coefficients of stability in six-month increments up to a maximum eight-year retest window. For
context, the U.S. Bureau of Labor Statistics (2016) reported that the median tenure for U.S. employees is
only 4.2 years. Linear regression analyses showed that time did not have a large effect on the coefficients of
stability except in the case of the Formality scale, which showed a statistically significant decline in
coefficients of stability over the eight years of possible retest windows (F(1,14) = 4.82, p = 0.05). For the
Formality scale, each year that passed between the two administrations of the assessment is expected to
lower the coefficient of stability by only 0.02.
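Each coefficient of stability in this analysis is simply a Pearson correlation computed over the respondent pairs whose retest gap falls in a given six-month window. The windowing can be sketched as follows (the function name, the 182-day window constant, and the simulated data are assumptions, not Fossey's actual code):

```python
import numpy as np

def stability_by_window(first, second, gap_days, window_days=182):
    """Pearson correlation between first- and second-administration scores,
    grouped into retest-gap windows of `window_days` (roughly six months).

    Returns {window_index: correlation}; windows with fewer than two
    pairs are skipped because a correlation is undefined there.
    """
    first, second, gap_days = map(np.asarray, (first, second, gap_days))
    windows = {}
    for w in range(int(gap_days.max() // window_days) + 1):
        mask = (gap_days >= w * window_days) & (gap_days < (w + 1) * window_days)
        if mask.sum() >= 2:
            windows[w] = float(np.corrcoef(first[mask], second[mask])[0, 1])
    return windows
```

Plotting these per-window correlations against the window index is what produces a stability curve like the one described for Figure 6.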
[Figure 6: coefficients of stability (0.0 to 0.9) plotted for retest windows from 1 to 8 years in six-month
increments; window sample sizes n = 4755, 1791, 1294, 920, 709, 551, 443, 331, 262, 181, 151, 108, 77, 50,
35, and 26.]
In evaluating these estimates, Fossey (2017) cited the U.S. Department of Labor (1999), which observed
that acceptable coefficients of stability may be below 0.70 for constructs that have the potential to vary
over time. Crocker and Algina (2008) explained that coefficients of stability will also change based on the
time period between retests. As the coefficient of stability decreases, there is no way to know if it
represents fluctuation in the trait or a lack of reliability in the assessment instrument itself. In practice,
test-retest reliability coefficients largely vary depending on the type of assessment and the time period
between retests. Crocker and Algina (2008) observed that personality assessments like the PI Behavioral
Assessment typically have lower coefficients of stability than other types of assessments, such as
aptitude tests, but that personality assessments can still produce relatively stable measures over time.
Fossey (2017) argued that the one-year test-retest reliability range of 0.61-0.65 exhibited by the previous
revision of the PI Behavioral Assessment did not preclude its use for selection decision-making, and he
noted that other personality assessments with similar test-retest reliability estimates have been
recommended for selection use as long as the results are used in conjunction with other evidence (e.g.,
Axford & Hayes, 2014). To summarize, the PI Behavioral Assessment’s factor scores are stable enough
over time to support hiring decisions, when combined with other relevant sources of information, such as
cognitive ability, experience, and education/training.
Fairness
At its foundation, the issue of assessment fairness is an issue of validity. Any workforce assessment should
be fair for all test takers in the intended population, and one step to supporting fairness is to review items to
help avoid test content bias (Schmeiser & Welch, 2006; AERA, APA, & NCME, 2014). The adjectives on the
PI Behavioral Assessment go through a bias and fairness review before being field tested or being used for
scored administrations. Differential item functioning (DIF) analyses are performed regularly to understand
response differences that might affect scores of various demographic groups. These analyses were also
used to inform the selection of items (Foster et al., 2020). PI also provides translated versions of the PI
Behavioral Assessment and analyzes regional differences when feasible to ensure that interpretation is
comparable for users across the globe. The following sections provide an overview of some of the fairness
research and development that has gone into the PI Behavioral Assessment.
Demographic Groups
In employment settings, large differences between demographic groups’ scores on hiring assessments can
result in lower rates of selection for certain subgroups. When these lower rates occur in the United States, it
is called adverse impact. To date, there is no evidence to indicate that the inclusion of the PI Behavioral
Assessment in a company’s personnel selection system, either in a compensatory or in a “multiple-hurdle”
selection model, results in adverse impact against any protected class. In fact, in 60 years, there has never
been a successful legal challenge involving the PI Behavioral Assessment. In addition, The Predictive Index
has run the following studies to demonstrate that the use of its assessment tool does not result in adverse
impact:
• Wolman (1991) ran a study to determine whether men and women tended to score differently on
the PI Behavioral Assessment, and whether African Americans, Hispanics, and Caucasians tended
to score differently. His analyses showed that neither gender nor race was significantly related to PI
Behavioral Assessment scores.
• Harris (2004) analyzed PI Behavioral Assessment scores to determine whether they produced
adverse impact based on age. The study showed that for all PI Behavioral Assessment Factors,
there was no significant difference between people over age 40 (the protected class) and people
under age 40, confirming similar findings initially obtained by Everton (1998).
• In a 2008 banking industry study of 347 employees working in a variety of jobs (e.g., teller, branch
manager, loan officer), gender and race accounted for less than 2% and 3% of the variability,
respectively, in PI Behavioral Assessment factor scores (Harris, Tracy, & Fisher, 2014).
• DIF analysis was conducted to look for statistical bias based on gender, race/ethnicity, and age.
Adjectives with DIF effect sizes of 0.30 or higher were excluded from consideration for the
construction of the PI Behavioral Assessment (Foster et al., 2015).
• Raw score differences were again analyzed for a sample of nearly 620,000 respondents who took
the PI Behavioral Assessment between 2017 and 2020. Differences between gender and age
groups were negligible (Cohen’s D = 0.09 and 0.12, respectively). A one-way ANOVA test of score
differences by race and ethnicity also yielded negligible effects (Eta squared < 0.001) (Foster et al.,
2020).
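The effect sizes in the last bullet follow standard formulas: Cohen's d is a mean difference expressed in pooled-standard-deviation units, and eta squared is the between-group share of total variance in a one-way ANOVA. A sketch of both (with simulated data, since the respondent data are proprietary):

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

def eta_squared(groups):
    """One-way ANOVA effect size: between-group SS over total SS."""
    allv = np.concatenate([np.asarray(g, float) for g in groups])
    grand = allv.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_total = ((allv - grand) ** 2).sum()
    return ss_between / ss_total
```

By Cohen's (1988) conventional benchmarks, d values of 0.09 and 0.12 sit well below the 0.20 threshold usually labeled a "small" effect, which is why the document characterizes the group differences as negligible.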
PI’s software also uses a match score algorithm to objectively rank candidates’ behavioral match to job
targets set with the PI Job Assessment. To gather additional evidence regarding the fairness of match
scores, an adverse impact simulation study was conducted using various match score cut-offs. For
example, if all candidates with a match score of 10 points were hired, and all those with match scores
between 1 and 9 points were rejected, would selection ratios based on demographic groupings result in
adverse impact, as defined by the 4/5ths (or 80%) rule?
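The 4/5ths rule itself is a simple ratio test: divide the lower group's selection rate by the higher group's and flag values below 0.80. A sketch (with hypothetical counts; this is not PI's actual analysis code):

```python
def four_fifths_ratio(hired_a, total_a, hired_b, total_b):
    """Impact ratio: the lower group's selection rate over the higher group's.

    Under the 4/5ths (80%) rule of thumb, a ratio below 0.80 flags
    potential adverse impact against the lower-rate group.
    """
    rate_a = hired_a / total_a
    rate_b = hired_b / total_b
    low, high = sorted([rate_a, rate_b])
    return low / high

def flags_adverse_impact(hired_a, total_a, hired_b, total_b):
    return four_fifths_ratio(hired_a, total_a, hired_b, total_b) < 0.80
```

For example, selection rates of 48% and 50% give an impact ratio of 0.96, comfortably above the 0.80 threshold, while rates of 30% and 50% give 0.60 and would be flagged.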
Analysis from this study showed that, regardless of the cut-score used or demographic grouping in question,
there was no evidence of adverse impact. Figures 7a-h demonstrate that at each cut-off match score, the
selection ratio remains well above 80%. This study provides evidence that match scores can mitigate or
reduce bias in the hiring process, and the results further strengthen the fairness evidence for the PI
Behavioral Assessment. It is, however, important to note that this adverse impact analysis was hypothetical
and was conducted with data from hundreds of thousands of respondents applying to a variety of job
roles. This study is presented for illustrative purposes only, and it is not a substitute for hiring organizations
conducting their own adverse impact analyses based upon their hiring practices. Furthermore, The
Predictive Index recommends against using match scores as a true “cut-off”; rather, match scores should be
viewed as a single data point when making selection decisions.
Nevertheless, the evidence from the DIF procedures, simulated adverse impact analysis, and multiple
demographic comparison studies indicates that the PI Behavioral Assessment is age-, gender- and race-
neutral, and PI believes that the inclusion of a well-validated personality assessment such as the PI
Behavioral Assessment in a company’s personnel selection system can support inclusivity and lead to a
more demographically diverse workforce.
Figure 7a. Selection Ratio by Age (Under 40 – 40+)
Figure 7b. Selection Ratio by Gender (Male – Female)
Figure 7c. Selection Ratio by Ethnicity (White – Non-White)
Figure 7d. Selection Ratio by Ethnicity (White – African American)
Figure 7e. Selection Ratio by Ethnicity (White – Asian)
Figure 7f. Selection Ratio by Ethnicity (White – Latinx or Hispanic)
Figure 7g. Selection Ratio by Ethnicity (White – Native Hawaiian or Other Pacific Islander)
Figure 7h. Selection Ratio by Ethnicity (White – Native American or Alaskan Native)
[Each panel plots the selection ratio (0% to 100%) against match score cut-offs from 10 down to 2.]
Reasonable Accommodations
The PI Behavioral Assessment can be used with most disabled populations without any modification or
accommodation. The PI Behavioral Assessment is an untimed assessment that uses an adjective checklist
format, and the respondent can control many aspects of the delivery mode and context. Nevertheless, test
users should still be prepared to provide a reasonable accommodation in appropriate circumstances. In
cases where a disability might prevent the respondent from understanding the content of the assessment,
use of the PI Behavioral Assessment is not recommended.
Reasonable accommodations address the specific disability and its potential interaction with aspects of the
assessment design and administration. At a very high level, one can consider three categories of
disabilities: physical or sensory disabilities, psychological disabilities, and cognitive disabilities. One can also
consider their interactions with three aspects of the PI Behavioral Assessment: context, content, and
response (AERA, APA, NCME, 2014). Context refers to the conditions under which the assessment is
administered, as well as aspects of the assessment design, such as instructions and language format.
Fortunately, in the PI Behavioral Assessment, the respondent has control over many aspects of the context,
so they can take the PI Behavioral Assessment in a context that is most appropriate for them, without ever
requesting accommodations. For example, common context accommodations for other assessments may
include things like extended time, private testing rooms, extra resources, or special equipment like screen
magnifiers, but the PI Behavioral Assessment can be taken by the respondent without an administrator
present, and there are no time limits. As such, the test user does not need to make any changes for the
administration—the respondent has control over much of the format, and these changes will not impact the
validity of the results.
The content of the PI Behavioral Assessment (including its translations) has gone through a bias review,
and the content is not expected to result in any irrelevant score variance in disabled subpopulations. The
content of the PI Behavioral Assessment is also written at a very basic reading level. The English version of
the assessment is measured to have a reading level appropriate for 8- or 9-year-old children across three
readability scales: Flesch-Kincaid Grade Level, SMOG Index, and Automated Readability Index. As such,
the PI Behavioral Assessment content is expected to be accessible for most of the adult population, even if
they have a disability that impacts reading or verbal comprehension. If, however, the respondent has a
disability which will severely limit their ability to understand the instructions or content of the assessment,
then use of the PI Behavioral Assessment is not recommended.
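For context, the Flesch-Kincaid Grade Level cited above is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A sketch using a heuristic vowel-group syllable counter (published tools count syllables somewhat differently, so scores can vary by a fraction of a grade):

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level with a heuristic syllable count
    (each run of consecutive vowels, including y, counts as one syllable)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```

A grade of roughly 3 or 4 corresponds to the 8- to 9-year-old reading level claimed for the assessment content; longer sentences and polysyllabic words push the score up quickly.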
The content of the PI Behavioral Assessment is also considered appropriate for use with respondents who
have psychological disabilities. The PI Behavioral Assessment is a measure of normal personality,
described with five behavioral factors: Dominance, Extraversion, Patience, Formality, and Objectivity. It has
never been shown to be linked to any psychological disorders. This distinction is made for the PI Behavioral
Assessment because other Five Factor assessments include scales for Emotional Stability (or Neuroticism)
that have been shown to have weak relations to personality-based disorders and anxiety disorders. The PI
Behavioral Assessment does not include an Emotional Stability scale, and the assessment results are
expected to be valid for respondents with similar psychological disabilities.
Regional Equivalency
PI employs a Localization Project Manager (LPM) who oversees localization and translation projects of PI
content for client regions. The LPM worked with the assessment developers to translate the instructions and
adjectives for the assessment. To do this, they employed professional linguists provided by PI’s translation
company and in-country reviewers who were in the PI Partner network and who were native speakers of the
translated language. Back-translation and cultural reviews were also conducted (B. Van Raalte, personal
communication, May 19, 2017).
Before launching the current revision of the PI Behavioral Assessment, comparison norm tables were also
created: one from North American respondents and one from respondents in all other regions. PI
researchers compared the expected means and standard deviations of the two tables, as well as the
differences in factor scores that could arise from scoring against one table versus the other (Foster et al.,
2020). The impacts were considered negligible for the purposes of interpretation. The same analysis was
conducted with regional samples from Sweden and Norway as part of PI’s EFPA certification, with the same
region-neutral findings (Barnett et al., 2017). Because PI serves a global workforce, and because current regional
analyses have not shown any meaningful differences for interpretation, a single, amalgamated global norm
table is currently used for scoring. Nevertheless, PI is committed to providing updated or localized norms
when needed, and new regional analyses are conducted depending on demand and as samples permit.
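The logic of the norm-table comparison described above can be sketched briefly: if two norm tables have nearly identical means and standard deviations, a given raw factor score maps to nearly the same standardized score under either table, so interpretation is unchanged. The parameters below are hypothetical placeholders, not PI's published norms.

```python
# Hypothetical norm-table parameters for one factor.
# These values are illustrative only -- NOT PI's actual norms.
NORTH_AMERICAN_NORM = {"mean": 50.0, "sd": 10.0}
GLOBAL_NORM = {"mean": 50.4, "sd": 10.2}

def standardize(raw: float, norm: dict) -> float:
    """Convert a raw factor score to a z-score under a given norm table."""
    return (raw - norm["mean"]) / norm["sd"]

raw_score = 57.0
z_na = standardize(raw_score, NORTH_AMERICAN_NORM)
z_global = standardize(raw_score, GLOBAL_NORM)

# When the tables barely differ, the standardized scores barely differ,
# so the practical interpretation of the result is the same.
difference = abs(z_na - z_global)
```

A difference this small (roughly 0.05 standard deviations here) would not change how a factor score is read, which is the sense in which the regional impacts were judged negligible.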
Conclusions
This document summarizes the science behind the PI Behavioral Assessment; however, this is just an
overview. Clients have access to PI’s portfolio of research, technical manuals, administrator guides,
researchers’ “point-of-view” articles, and white papers. There is extensive scientific evidence showing that
the instrument is a well-constructed, thoroughly validated assessment that supports workplace decision-
making. The PI Behavioral Assessment is constructed following best practices for test development and
maintenance, and PI researchers continue to bolster the evidence for the instrument’s validity, reliability, and
fairness for its intended applications.
Even more impressive, and often overlooked, is the fact that the instrument takes an average of only six
minutes to complete, yet yields accurate results that are stable enough to support decisions
spanning years (such as hiring). The instrument also resonates with respondents: 96% respond
positively about the accuracy of their PI Behavioral Assessment report. The assessment’s ease of use and
approachable format help it fit seamlessly into the hiring process or into work with existing
employees, allowing companies to scale behavioral awareness across the entire organization as part of their talent
strategy.
The PI Behavioral Assessment has persisted as an industry staple for 60 years, retaining its foundations
and utility for the global workforce while keeping pace with advances in psychometrics and
industrial/organizational psychology. It is well-built as an efficient, versatile, and work-relevant assessment
that supports a variety of applications throughout the employee lifecycle. Clients who use the PI Behavioral
Assessment can rest assured that they are getting a well-maintained, accurate, and useful measure of
workplace personality that will continue to be supported by research for decades to come.
References
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education (2014). Standards for educational and psychological testing.
Washington, DC: American Educational Research Association.
Axford, S., & Hayes, T. L. (2014). Review of the Hogan Development Survey [Revised]. In J. F. Carlson, K.
F. Geisinger, & J. L. Johnson (Eds.), The Nineteenth Mental Measurements Yearbook. Lincoln, NE:
University of Nebraska Press.
Barnett, G., Fossey, A., Black, L., Mulvey, S., Poepsel, M., Mias, E., LaMark, D., Van Raalte, B.,
Dombrowski, R., Van Voorhis, L., Blair-Lamb, K., Kazi, Z., Holsinger, M., & Hosein, A. (2017).
Report on the PI Behavioral Assessment for EFPA Review [technical report]. Westwood, MA: The
Predictive Index.
Cantril, H. (1950). The why of man’s experience. New York, NY: Macmillan Company.
Cattell, R. B. (1946). The description and measurement of personality. New York, NY: Harcourt, Brace &
World.
Chen, A. (2018, October 10). How accurate are personality tests? Scientific American. Available from
https://www.scientificamerican.com/article/how-accurate-are-personality-tests/.
Clarke, W. (1956). The construction of an industrial selection personality test. The Journal of Psychology:
Interdisciplinary and Applied, 41(2), 379-394.
Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. Mason, OH: Cengage
Learning.
Daniels, A. S. (1955). Sources of The Predictive Index [technical report]. Wellesley, MA: The Predictive
Index.
Equal Employment Opportunity Commission, U.S. (1978). Uniform guidelines on employment selection
procedures (EEOC Publication No. 43 FR 38295, 38312, Aug. 25, 1978). Washington, DC:
Government Publishing Office.
Equal Employment Opportunity Commission, U.S. (1981). Age discrimination in employment act (EEOC
Publication No. 46 FR 47726, Sept. 29, 1981). Washington, DC: Government Publishing Office.
European Federation of Psychologists’ Associations (EFPA) (2013). EFPA review model for the description
and evaluation of psychological and educational tests: Test review form and notes for reviewers (v
4.2.6). A. Evers, C. Hagemeister, A. Høstmælingen, P. Lindley, J. Muñiz, & A. Sjöberg (Eds.).
Brussels: EFPA. Available from: http://www.efpa.eu/professional-development/assessment.
Everton, W. (1999). A further investigation of the construct validity of The Predictive Index®. (Validity Report
No. 1999.11.01). Wellesley Hills, MA: Predictive Index, LLC (Formerly Praendex, Inc.).
Everton, W. (2000). An investigation of the psychometric properties of the French Predictive Index checklist
[technical report]. Wellesley Hills, MA: Predictive Index, LLC (Formerly Praendex, Inc.).
Fiske, D. W. (1949). Consistency of the factorial structures of personality ratings from different sources.
Journal of Abnormal and Social Psychology, 44(3), 329-344.
Fossey, A. (2017). PI Behavioral Assessment: Test-retest reliability [technical report]. Westwood, MA: The
Predictive Index.
Foster, D., Maynes, D., Miller, J., Chernyshenko, O. S., Mead, A., & Drasgow, F. (2015). Meta-analysis
report of The Predictive Index (Form IV). Wellesley Hills, MA: Predictive Index, LLC.
Foster, D., Maynes, D., Miller, J., Chernyshenko, S., Mead, A., Drasgow, F., Poepsel, M., Barnett, G.,
Fossey, A., Siminovsky, A., & Mulvey, S. (2020). The Predictive Index Behavioral Assessment
technical manual [technical report]. Westwood, MA: The Predictive Index.
Foster, D., Maynes, D., Miller, J., Parr, R., & Drasgow, F. (2015). Report for development of norms for Form
V of the Predictive Index [technical report]. Westwood, MA: The Predictive Index.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G.
(2006). The International Personality Item Pool and the future of public-domain personality
measures. Journal of Research in Personality, 40, 84-96.
Gough, H. G., & Heilbrun, A. B. (1983). The Adjective Check List manual. Sunnyvale, CA: Consulting
Psychologists Press.
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R.
Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future — A
festschrift in honor of Karl Jöreskog. Lincolnwood, IL: Scientific Software International.
Harris, T. (2006). The impact of gender and race on Predictive Index scores in a large South African Sample
[technical document]. Wellesley Hills, MA: Praendex, Inc.
Harris, T. C., Tracy, A. J., & Fisher, G. G. (2014). Predictive Index technical overview. Wellesley Hills, MA:
Predictive Index, LLC (Formerly Praendex, Inc.).
Harris, T. C. (2004). The impact of age on Predictive Index scores: An update. Wellesley Hills, MA:
Predictive Index, LLC.
Kuder, G. F. (1954). The use of inventories as projective devices. Educational and Psychological
Measurement, 14(2).
Lecky, P. (1945). Self-consistency: A theory of personality. New York, NY: Island Press.
MacKinnon, D. W. (1944). The structure of personality. In J. M. Hunt (Ed.), Personality and the Behavior
Disorders (pp. 3-48). New York, NY: Ronald Press.
Marston, W. M. (1928). The emotions of normal people. New York, NY: Harcourt, Brace and Company.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370-396.
National Center for O*NET Development. (2020). Work Styles. O*NET OnLine. Retrieved May 15, 2020,
from https://www.onetonline.org/find/descriptor/browse/Work_Styles/
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Perry, J. C., & Lavori, P. W. (1983). The Predictive Index: A report on reliability and construct validity
[technical report]. Wellesley Hills, MA: The Predictive Index.
Rogers, C. (1959). A theory of therapy, personality and interpersonal relationships as developed in the
client-centered framework. In S. Koch (Ed.), Psychology: A Study of a Science. Vol. 3: Formulations
of the Person and the Social Context (pp. 184-256). New York, NY: McGraw Hill.
Rogers, C. R. (1951). Client centered therapy. Boston, MA: Houghton Mifflin Co.
Saville, P., & Willson, E. (1991). The reliability and validity of normative and ipsative approaches in the
measurement of personality. Journal of Occupational Psychology, 64, 219-238.
Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational
measurement (4th ed.). Westport, CT: Praeger.
Sjöberg, L. (2000). The psychometric structure of the Swedish version of the Predictive Index (PI).
Stockholm, Sweden: Stockholm School of Economics.
Sjöberg, L. (2003). Properties of the new Swedish version of the Predictive Index. Stockholm, Sweden:
Stockholm School of Economics.
The International Test Commission. (2006). International guidelines on computer-based and internet
delivered testing. International Journal of Testing, 6(2), 143-171.
The Predictive Index. (2015). Predictive Index management workshop [user document]. Westwood, MA:
The Predictive Index.
U.S. Bureau of Labor Statistics (2016). Employee tenure summary. Washington, DC: U.S. Bureau of Labor
Statistics. Available from https://www.bls.gov/news.release/tenure.nr0.htm.
U.S. Department of Labor (1999). Testing and assessment: An employer’s guide to good practices.
Washington, DC: U.S. Department of Labor. Available from:
https://wdr.doleta.gov/opr/FULLTEXT/99-testassess.pdf.
Wolman, R. N. (1991). Predictive Index factors in relation to race, sex, and job levels. Wellesley Hills, MA:
Predictive Index, LLC (Formerly Praendex, Inc.).