Chap 06 - Characteristics of Effective Tests
Characteristics of Effective Selection Techniques
© 2013 Cengage Learning
Reliability
• The extent to which a score from a test is
consistent and free from errors of measurement
• Methods of Determining Reliability
– Test-retest (temporal stability)
– Alternate forms (form stability)
– Internal reliability (item stability)
– Scorer reliability
Test-Retest Reliability
• Measures temporal stability
• Administration
– Same applicants
– Same test
– Two testing periods
• Scores at time one are correlated with
scores at time two
• Correlation should be above .70
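As a sketch (not from the slides), the test-retest correlation is simply the Pearson correlation between the two administrations; the applicant scores below are made up for illustration:

```python
# Test-retest reliability as the Pearson correlation between scores
# from two administrations of the same test (scores are hypothetical).

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

time1 = [82, 75, 90, 68, 77, 85]   # same applicants, first administration
time2 = [80, 78, 88, 70, 74, 86]   # same test, second administration

r = pearson_r(time1, time2)
print(round(r, 2))                 # 0.95 — above the .70 guideline
```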
Test-Retest Reliability
Problems
• Sources of measurement errors
– Characteristic or attribute being measured
may change over time
– Reactivity
– Carry-over effects
• Practical problems
– Time consuming
– Expensive
– Inappropriate for some types of tests
Internal Reliability
• Defines measurement error strictly in terms
of consistency or inconsistency in the
content of the test.
• Used when it is impractical to administer
two separate forms of a test.
• With this form of reliability, the test is
administered only once; it measures item
stability.
Spearman-Brown Formula

corrected reliability = (2 × split-half correlation) ÷ (1 + split-half correlation)
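A minimal sketch of applying the formula (the split-half correlation of .60 is a made-up value):

```python
# Spearman-Brown correction: estimates full-test reliability from the
# correlation between the two halves of a split-half design.

def spearman_brown(split_half_r):
    return (2 * split_half_r) / (1 + split_half_r)

# A split-half correlation of .60 implies a full-test reliability of .75
print(round(spearman_brown(0.60), 2))  # 0.75
```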
• Kuder-Richardson Formula
– Used for tests with dichotomous items (e.g., yes-no, true-false)
Interrater Reliability
• Used when human judgment of performance is
involved in the selection process
• Refers to the degree of agreement between two or
more raters
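As a simple illustration (ratings are made up), the most basic index of interrater reliability is the percentage of ratings on which two raters agree:

```python
# Percent agreement between two raters on the same applicants.
rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(round(agreement, 2))  # 5 of 6 ratings agree -> 0.83
```

Percent agreement is the simplest index; chance-corrected statistics such as Cohen's kappa are often preferred in practice.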
Reliability: Conclusions
• The higher the reliability of a selection test,
the better; reliability should be .70 or
higher
• Reliability can be affected by many factors
• If a selection test is not reliable, it is
useless as a tool for selecting individuals
Validity
• Definition
The degree to which inferences from scores on
tests or assessments are justified by the
evidence
• Common Ways to Measure
– Content Validity
– Criterion Validity
– Construct Validity
Content Validity
• The extent to which test items sample the
content that they are supposed to measure
Criterion Validity
• Criterion validity refers to the extent to which a
test score is related to some measure of job
performance called a criterion
• Established using one of the following research
designs:
– Concurrent Validity
– Predictive Validity
– Validity Generalization
Concurrent Validity
• Uses current employees
Predictive Validity
• Correlates test scores with future behavior
• Reduces the problem of range restriction
• May not be practical
Validity Generalization
• Validity Generalization is the extent to which a test
found valid for a job in one location is valid for the
same job in a different location
• The key to establishing validity generalization is
meta-analysis and job analysis
Construct Validity
• The extent to which a test actually measures
the construct that it purports to measure
• Is concerned with inferences about test
scores
• Determined by correlating scores on a test
with scores from other tests
Face Validity
• The extent to which a test appears to be job
related
• Reduces the chance of legal challenge
• Increasing face validity
Utility
The degree to which a selection
device improves the quality of a
personnel system, above and
beyond what would have occurred
had the instrument not been used.
Utility Analysis
Taylor-Russell Tables
• Estimates the percentage of future employees
that will be successful
• Three components
– Validity
– Base rate (successful employees ÷ total employees)
– Selection ratio (hired ÷ applicants)
Taylor-Russell Example
• Suppose we have
– a test validity of .40
– a selection ratio of .30
– a base rate of .50
• Using the Taylor-Russell
Tables what percentage of
future employees would be
successful?
Taylor-Russell Table (Base Rate = 50%)

                    Selection Ratio
r      .05  .10  .20  .30  .40  .50  .60  .70  .80  .90  .95
.00    .50  .50  .50  .50  .50  .50  .50  .50  .50  .50  .50
.10    .58  .57  .56  .55  .54  .53  .53  .52  .51  .51  .50
.20    .67  .64  .61  .59  .58  .56  .55  .54  .53  .52  .51
.30    .74  .71  .67  .64  .62  .60  .58  .56  .54  .52  .51
.40    .82  .78  .73  .69  .66  .63  .61  .58  .56  .53  .52
.50    .88  .84  .76  .74  .70  .67  .63  .60  .57  .54  .52
.60    .94  .90  .84  .79  .75  .70  .66  .62  .59  .54  .52
.70    .98  .95  .90  .85  .80  .75  .70  .65  .60  .55  .53
.80    1.0  .99  .95  .90  .85  .80  .73  .67  .61  .55  .53
.90    1.0  1.0  .99  .97  .92  .86  .78  .70  .62  .56  .53
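As a sketch, looking up the slide's example in the base-rate-.50 table is a simple row/column index; only the two rows needed here are encoded:

```python
# Taylor-Russell lookup: rows are validity coefficients, columns are
# selection ratios, values are the expected proportion of successful hires.
ratios = [.05, .10, .20, .30, .40, .50, .60, .70, .80, .90, .95]
table_br50 = {  # base rate = .50; only two rows shown for illustration
    .40: [.82, .78, .73, .69, .66, .63, .61, .58, .56, .53, .52],
    .50: [.88, .84, .76, .74, .70, .67, .63, .60, .57, .54, .52],
}

def proportion_successful(validity, selection_ratio):
    return table_br50[validity][ratios.index(selection_ratio)]

# Slide example: validity .40, selection ratio .30, base rate .50
print(proportion_successful(.40, .30))  # 0.69
```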
[Scatterplot: Criterion (y-axis, 1–10) plotted against Test Score (x-axis, 1–10) for 30 employees, divided into quadrants I–IV by a criterion cutoff and a test-score cutoff.]

21 ÷ 30 = .70
15 ÷ 30 = .50
Brogden-Cronbach-Gleser Utility
Formula
• Estimates utility as the amount of money an
organization would save if it used the test to
select employees.

Savings = (n)(t)(r)(SDy)(m) − cost of testing
• n= Number of employees hired per year
• t= average tenure
• r= test validity
• SDy=standard deviation of performance in dollars
• m=mean standardized predictor score of selected
applicants
Components of Utility
Selection ratio
The ratio of the number of openings to the
number of applicants
Validity coefficient
Base rate of current performance
The percentage of employees currently on the
job who are considered successful.
SDy
The difference in performance (measured in dollars)
between a good and an average worker (workers one
standard deviation apart)
Calculating m
• For example, we administer a test of mental ability
to a group of 100 applicants and hire the 10 with
the highest scores. The average score of the 10
hired applicants was 34.6, the average test score of
the other 90 applicants was 28.4, and the standard
deviation of all test scores was 8.3. The desired
figure would be:
• (34.6 - 28.4) ÷ 8.3 = 6.2 ÷ 8.3 = .75
Calculating m
• You administer a test of mental ability to a group
of 150 applicants, and hire 35 with the highest
scores. The average score of the 35 hired
applicants was 35.7, the average test score of the
other 115 applicants was 24.6, and the standard
deviation of all test scores was 11.2. The desired
figure would be:
– (35.7 - 24.6) ÷ 11.2 = 11.1 ÷ 11.2 = .99
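The arithmetic on both slides can be sketched as one small function (following the slides' computation: difference between the mean score of those hired and those not hired, divided by the SD of all scores):

```python
# m as computed on the slides: (mean of hired - mean of not hired) / SD of all scores.
def m_value(mean_hired, mean_not_hired, sd_all):
    return (mean_hired - mean_not_hired) / sd_all

print(round(m_value(34.6, 28.4, 8.3), 2))   # first example: 0.75
print(round(m_value(35.7, 24.6, 11.2), 2))  # second example: 0.99
```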
Example
– Suppose:
• we hire 10 auditors per year
• the average person in this position stays 2 years
• the validity coefficient is .40
• the average annual salary for the position is $30,000, so
SDy is estimated as 40% of salary, or $12,000
• we have 50 applicants for 10 openings, so the selection
ratio is .20 and m = 1.40
• testing costs $10 per applicant
– Our utility would be:
(10 × 2 × .40 × $12,000 × 1.40) − (50 × $10) =
$134,400 − $500 = $133,900
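The example above can be sketched directly from the Brogden-Cronbach-Gleser formula:

```python
# Brogden-Cronbach-Gleser utility, plugging in the example's numbers.
def utility(n, t, r, sd_y, m, cost_of_testing):
    # n hires/year * tenure * validity * SDy * m, minus testing cost
    return n * t * r * sd_y * m - cost_of_testing

# 10 hires/year, 2-year tenure, validity .40, SDy $12,000, m = 1.40,
# testing cost = 50 applicants at $10 each.
savings = utility(n=10, t=2, r=0.40, sd_y=12000, m=1.40, cost_of_testing=50 * 10)
print(round(savings))  # 133900
```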
Definitions
• Measurement Bias
– Technical aspects of the test
– A test is biased if there are group differences in
test scores (e.g., race, gender) that are unrelated
to the construct being measured (e.g., integrity)
• Predictive Bias
– A test is fair if people of equal probability of
success on a job have an equal chance of being
hired
Adverse Impact
Occurs when the selection rate for one group is
less than 80% of the rate for the group with the
highest selection rate

Example 1 (no adverse impact: .33 ÷ .40 ≈ .83)
                      Male  Female
Number of applicants   50     30
Number hired           20     10
Selection ratio       .40    .33

Example 2 (adverse impact: .20 ÷ .50 = .40)
                      Male  Female
Number of applicants   40     20
Number hired           20      4
Selection ratio       .50    .20
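The four-fifths rule can be sketched as a quick check over the two applicant pools shown above:

```python
# Four-fifths (80%) rule: a group shows adverse impact if its selection
# rate is below 80% of the highest group's selection rate.
def adverse_impact(hired_by_group, applicants_by_group):
    rates = {g: hired_by_group[g] / applicants_by_group[g] for g in hired_by_group}
    highest = max(rates.values())
    return {g: rate < 0.80 * highest for g, rate in rates.items()}

# First example: .33 / .40 is about .83, so no adverse impact
print(adverse_impact({"M": 20, "F": 10}, {"M": 50, "F": 30}))
# Second example: .20 / .50 = .40, adverse impact against females
print(adverse_impact({"M": 20, "F": 4}, {"M": 40, "F": 20}))
```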
Top-Down Selection
Advantages
• Higher quality of selected applicants
• Objective decision making
Disadvantages
• Less flexibility in decision making
• Adverse impact = less workforce diversity
• Ignores measurement error
• Assumes test score accounts for all the variance in
performance (Zedeck, Cascio, Goldstein & Outtz, 1996).
Passing Scores
Applicant Sex Score
Omar M 98
Eric M 80
Mia F 70 (passing score)
Morris M 69
Tammy F 58
Drew M 40
Passing Scores
Advantages
• Increased flexibility in decision making
• Less adverse impact against protected
groups
Disadvantages
• Lowered utility
• Can be difficult to set
Top-Down Banding
Traditional Bands
• Based on expert judgment
• Administrative ease
• Examples: college grading system, levels of
job qualifications
Expectancy Bands
Band   Test Score   Probability of Success
A      522–574      85%
D      0–418        56%
SEM Bands
“Ranges of Indifference”
SEM Banding
• Compromise between top-down selection and passing scores
• Based on the concept of the standard error of measurement
• To compute it, you need the standard deviation and the
reliability of the test

Standard error = SD × √(1 − reliability)
Advantages of Banding
• Helps reduce adverse impact, increase
workforce diversity, and increase perceptions of
fairness (Zedeck et al., 1996).
• Allows you to consider secondary criteria
relevant to the job (Campion et al., 2001).
Disadvantages of Banding
(Campion et al., 2001)
Banding Example

• Sample Test Information
– Reliability = .80
– Mean = 72.85
– Standard deviation = 9.1

• The Standard Error
Standard error = SD × √(1 − reliability)
= 9.1 × √(1 − .80)
= 9.1 × √.20
= 9.1 × .447
= 4.07

• The Band
Band = Standard error × 1.96
Band = 4.07 × 1.96 = 7.98 ≈ 8

• Example 1
– We have four openings
– We would like to hire more females

• Example 2
– Reliability = .90
– Standard deviation = 12.8
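Following the slide's computation, the band width can be sketched as one function (z = 1.96 for a 95% band):

```python
# SEM band width: z * SD * sqrt(1 - reliability).
def band_width(sd, reliability, z=1.96):
    sem = sd * (1 - reliability) ** 0.5
    return z * sem

print(round(band_width(9.1, 0.80), 2))   # slide example: 7.98, about 8 points
print(round(band_width(12.8, 0.90), 2))  # Example 2: 7.93
```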
Focus on Ethics
Diversity Efforts