
CHAPTER 6
EVALUATING SELECTION TECHNIQUES AND DECISIONS

CHARACTERISTICS OF EFFECTIVE SELECTION TECHNIQUES

• RELIABILITY
- The extent to which a score from a test or from an evaluation is consistent and free from error.
- If a score from a measure is not stable or error-free, it is not useful.

4 WAYS TO DETERMINE TEST RELIABILITY

• TEST-RETEST RELIABILITY
- The extent to which repeated administration of the same test will achieve similar results.
▪ TEMPORAL STABILITY
- The consistency of test scores across time.

• ALTERNATE-FORMS RELIABILITY
- The extent to which two forms of the same test are similar.
▪ COUNTERBALANCING
- A method of controlling for order effects by giving half of a sample Test A first, followed by Test B, and giving the other half of the sample Test B first, followed by Test A.
▪ FORM STABILITY
- The extent to which the scores on two forms of a test are similar.

• INTERNAL RELIABILITY
- The extent to which similar items are answered in similar ways is referred to as internal consistency and measures item stability.
▪ ITEM STABILITY
- The extent to which responses to the same test items are consistent.
▪ ITEM HOMOGENEITY
- The extent to which test items measure the same construct.

• SCORER RELIABILITY
- The extent to which two people scoring a test agree on the test score, or the extent to which a test is scored correctly.
▪ INTERRATER RELIABILITY
- Also known as inter-observer reliability or inter-scorer reliability; a measure of the consistency and agreement between two or more independent raters or observers when assessing or scoring the same set of data, such as responses, behaviors, or judgments.

METHODS USED TO DETERMINE INTERNAL CONSISTENCY:

• KUDER-RICHARDSON FORMULA 20 (K-R 20)
- A statistic used to determine the internal reliability of tests that use items with dichotomous answers (yes/no, true/false).

• SPLIT-HALF METHOD
- A form of internal reliability in which the consistency of item responses is determined by comparing scores on half of the items with scores on the other half of the items.

• SPEARMAN-BROWN PROPHECY FORMULA
- Used to correct reliability coefficients resulting from the split-half method.

• COEFFICIENT ALPHA
- A statistic used to determine the internal reliability of tests that use interval or ratio scales. (A short computational sketch of these internal-consistency statistics appears at the end of this section.)

EVALUATING THE RELIABILITY OF A TEST

• THE MAGNITUDE OF THE RELIABILITY COEFFICIENT
- To evaluate the coefficient, you can compare it with reliability coefficients typically obtained for similar types of tests.

• THE PEOPLE WHO WILL BE TAKING THE TEST
- For example, if you will be using the test for managers, but the reliability coefficient in the test manual was established with high school students, you would have less confidence that the reliability coefficient would generalize well to your organization.

• VALIDITY
- The degree to which inferences from scores on tests or assessments are justified by the evidence. As with reliability, a test must be valid to be useful. But just because a test is reliable does not mean it is valid.

5 COMMON STRATEGIES TO INVESTIGATE THE VALIDITY OF SCORES ON A TEST:
1. Content Validity
2. Criterion Validity
3. Construct Validity
4. Face Validity
5. Known-Group Validity

CHOOSING A WAY TO MEASURE VALIDITY
• NEXT-DOOR NEIGHBOUR RULE
• GANDY CRITICAL THINKING TEST
- A test itself can never be valid; when we speak of validity, we are speaking about the validity of the test scores as they relate to a particular job.

• FACE VALIDITY
- The extent to which a test appears to be job related. This perception is important because if a test or its items do not appear valid, test takers and administrators will not have confidence in the results.

• BARNUM STATEMENTS
- Statements, such as those used in astrological forecasts, that are so general that they can be true of almost anyone.
FINDING RELIABILITY AND VALIDITY INFORMATION

• MENTAL MEASUREMENTS YEARBOOK (MMY)
- Contains information about the reliability and validity of published tests.

• TESTS IN PRINT VIII
- Another excellent source of information.

COST-EFFICIENCY
- If two or more tests have similar validities, then cost should be considered.
- A particular test is usually designed to be administered either to individual applicants or to a group of applicants.
- An increasing number of organizations are administering their tests over the Internet or at remote testing locations. In computer-assisted testing, an applicant takes a test online; the computer scores the test, and the results and an interpretation are immediately available.
- This increase in efficiency does not come at the cost of decreased validity because, as mentioned previously, tests administered electronically seem to yield results similar to those administered through the traditional paper-and-pencil format.

• COMPUTER-ADAPTIVE TESTING (CAT)
- A type of test taken on a computer in which the computer adapts the difficulty level of questions asked to the test taker's success in answering previous questions.

ESTABLISHING THE USEFULNESS OF A SELECTION DEVICE

• TAYLOR-RUSSELL TABLES
- A series of tables based on the selection ratio, base rate, and test validity that yield information about the percentage of future employees who will be successful if a particular test is used.
- Designed to estimate the percentage of future employees who will be successful on the job if an organization uses a particular test.

• INFORMATION NEEDED FOR TAYLOR-RUSSELL TABLES
- Criterion validity coefficient
- Selection ratio
- Base rate

BASE RATE IS USUALLY OBTAINED IN 1 OF 2 WAYS

• 1ST METHOD
- Employees are split into two equal groups based on their scores on some criterion such as tenure or performance.

• 2ND METHOD
- Choose a criterion measure score above which all employees are considered successful.

- After the validity, selection ratio, and base rate figures have been obtained, the Taylor-Russell tables are consulted.

PROPORTION OF CORRECT DECISIONS
- The proportion of correct decisions is a fundamental concept in assessing the effectiveness of selection techniques in the context of employee hiring or promotion.
- This metric measures the accuracy of a selection method in correctly identifying the individuals who are the best fit for a job or role.
- In essence, it quantifies the percentage of times a selection procedure correctly identifies qualified candidates, thereby reducing the likelihood of hiring or promoting individuals who are not suited for the position.

LAWSHE TABLES
- Lawshe Tables are a statistical tool used to evaluate the validity of selection procedures, such as interviews, tests, or assessments.
- These tables provide a systematic approach to assessing the relationship between a selection technique and a specific job performance criterion.
- By using Lawshe Tables, organizations can determine the effectiveness of their selection methods in predicting job success. They are a structured way to gauge the validity and reliability of these techniques.

BROGDEN-CRONBACH-GLESER UTILITY FORMULA
- A mathematical approach for evaluating the utility or effectiveness of a selection procedure.
- It considers the costs, benefits, and probabilities associated with various selection decisions, helping organizations make informed choices based on a cost-benefit analysis.
- This formula assists in optimizing the selection process by balancing the trade-off between hiring the right candidates and managing the costs and potential risks involved.
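
The Brogden-Cronbach-Gleser estimate is commonly expressed as a dollar gain: average tenure × number hired × test validity × the dollar value of one standard deviation of job performance × the mean standardized test score of those hired, minus the total cost of testing all applicants. The sketch below is a minimal illustration of that form; every input value is an assumption chosen for the example, not a figure from these notes.

def brogden_cronbach_gleser(n_selected, tenure_years, validity,
                            sd_performance_dollars, mean_z_of_hired,
                            n_applicants, cost_per_applicant):
    # Estimated dollar gain: (tenure)(hires)(validity)(SDy)(mean z of hires)
    # minus the cost of testing every applicant.
    gain = tenure_years * n_selected * validity * sd_performance_dollars * mean_z_of_hired
    testing_cost = n_applicants * cost_per_applicant
    return gain - testing_cost

savings = brogden_cronbach_gleser(
    n_selected=10,                 # applicants hired
    tenure_years=2.0,              # expected average tenure of the hires
    validity=0.40,                 # criterion validity coefficient of the test
    sd_performance_dollars=12000,  # SD of job performance expressed in dollars
    mean_z_of_hired=1.1,           # mean standardized test score of those hired
    n_applicants=100,              # everyone who took the test
    cost_per_applicant=25,         # cost of testing one applicant
)
print(f"Estimated utility of the selection procedure: ${savings:,.0f}")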
DETERMINING THE FAIRNESS OF A TEST
- Once a test has been determined to be reliable and valid and to have utility for an organization, the next step is to ensure that the test is fair and unbiased.

• BIAS OR UNBIASED
- Refers to the technical aspects of a test.

• FAIRNESS
- Can include bias, but also includes political and social issues.

• MEASUREMENT BIAS
- Group differences in test scores that are unrelated to the construct being measured.
- If a test has words favoring one race over another, but those words aren't relevant to the job, the test could be seen as biased and unfair for that job.

• ADVERSE IMPACT
- An employment practice that results in members of a protected class being negatively affected at a higher rate than members of the majority class. Adverse impact is usually determined by the four-fifths rule.
- If 80% of men are hired for a job, but only 40% of women are hired, it could indicate adverse impact against women in the hiring process (a worked four-fifths calculation is sketched after the PASSING SCORE entry below).

• PREDICTIVE BIAS
- A situation in which the predicted level of job success falsely favors one group over another.
- If a test wrongly predicts that men are more likely to succeed in a job when that's not true, it's predictive bias.

• SINGLE-GROUP VALIDITY
- The characteristic of a test that significantly predicts a criterion for one class of people but not for another.
- If a test accurately predicts job performance for engineers but not for salespeople, it shows single-group validity.

• DIFFERENTIAL VALIDITY
- The characteristic of a test that significantly predicts a criterion for two groups, such as both minorities and non-minorities, but predicts significantly better for one of the two groups.
- If a test predicts job success for both men and women, but it's much more accurate for men, it shows differential validity.

• PERCEPTION OF FAIRNESS HELD BY THE TEST TAKER
- That is, a test may not have measurement or predictive bias, but applicants might perceive the test itself or the way in which the test is administered as not being fair.
- If they believe the test is too long or that it's administered in a confusing way, their perception of fairness can be affected.

MAKING THE HIRING DECISION
- After valid and fair selection tests have been administered to a group of applicants, a final decision must be made as to which applicant or applicants to hire. At first, this may seem to be an easy decision: hire the applicants with the highest test scores. But the decision becomes more complicated as both the number and variety of tests increase.

• MULTIPLE REGRESSION
- A statistical procedure in which the scores from more than one criterion-valid test are weighted according to how well each test score predicts the criterion.
- If we want to predict how well someone will do in a job, we can use their math and language test scores together to get a more accurate prediction.

• UNADJUSTED TOP-DOWN SELECTION
- Selecting applicants in straight rank order of their test scores.
- If we have test scores ranging from 100 to 50, we would hire the person with the score of 100 first, then the one with 99, and so on, in descending order.

• COMPENSATORY APPROACH
- A method of making selection decisions in which a high score on one test can compensate for a low score on another test.
- A high general weighted average might compensate for a low graduate record examination score.

• RULE OF THREE
- A variation on top-down selection in which the names of the top three applicants are given to a hiring authority who can then select any of the three.
- Giving the hiring manager the top three candidates to choose from, adding a bit of flexibility to the selection process.

• PASSING SCORE
- The minimum test score that an applicant must achieve to be considered for hire.
- If a job requires a passing score of 70, anyone scoring below that won't be considered.
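
The four-fifths rule mentioned under ADVERSE IMPACT is a simple selection-rate comparison. The sketch below reuses the 80%-of-men versus 40%-of-women example from these notes; the applicant counts themselves are invented for illustration.

def selection_rate(hired, applied):
    # Proportion of applicants from a group who were hired.
    return hired / applied

def four_fifths_rule(protected_rate, majority_rate):
    # Adverse impact is suggested when the protected group's selection rate
    # is less than four-fifths (80%) of the majority group's selection rate.
    impact_ratio = protected_rate / majority_rate
    return impact_ratio, impact_ratio < 0.80

men_rate = selection_rate(hired=40, applied=50)    # 80% of men hired
women_rate = selection_rate(hired=20, applied=50)  # 40% of women hired

ratio, adverse = four_fifths_rule(women_rate, men_rate)
print(f"Impact ratio: {ratio:.2f}; possible adverse impact: {adverse}")
# 0.40 / 0.80 = 0.50, well below the 0.80 threshold, so adverse impact is indicated.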
• MULTIPLE CUT-OFF APPROACH
- A selection strategy in which
applicants must meet or exceed the
passing score on more than one
selection test
- To get a job, one might need to do
well on both a math test and a
communication test, not just one of
them.
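
As a small illustration of the multiple cut-off approach, the sketch below screens applicants who must meet every passing score; the test names, cut-offs, and scores are invented for the example.

PASSING_SCORES = {"math": 70, "communication": 65}

def meets_all_cutoffs(applicant_scores, passing_scores=PASSING_SCORES):
    # Eligible only if every test score meets or exceeds its passing score;
    # unlike the compensatory approach, a high score cannot offset a low one.
    return all(applicant_scores[test] >= cutoff for test, cutoff in passing_scores.items())

applicants = {
    "Applicant A": {"math": 82, "communication": 71},  # passes both cut-offs
    "Applicant B": {"math": 95, "communication": 60},  # strong math does not compensate
}
for name, test_scores in applicants.items():
    print(name, "eligible" if meets_all_cutoffs(test_scores) else "screened out")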

• MULTIPLE HURDLE APPROACH


- A selection practice of
administering one test at a time so
that applicants must pass that test
before being allowed to take the
next test.
- Passing a fitness test to get to the
interview stage in firefighter hiring

• BANDING
- A statistical technique based on the
standard error of measurement that
allows similar test scores to be
grouped.
- Test scores that are very close to
each other, like 85 and 86, can be
considered in the same group
because of the standard error of
measurement.

• STANDARD ERROR
- Used to determine how many points apart two applicants' test scores must be before the scores can be considered significantly different.
- If two applicants have test scores
that differ by more than the
standard error, we can say their
scores are significantly different

• STANDARD ERROR OF
MEASUREMENT (SEM)
- The number of points that a test
score could be off due to test
unreliability.
- If a test has a SEM of 2 points, a
score of 80 could actually be
anywhere from 78 to 82 due to the
test's unreliability.
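
The last three entries (banding, standard error, and the standard error of measurement) fit together in one short calculation. The sketch below assumes the common conventions SEM = SD × sqrt(1 − reliability) and a band of 1.96 × SEM × sqrt(2) below the top score; the multiplier and all numbers are illustrative assumptions, not figures from these notes.

import math

def standard_error_of_measurement(sd_scores, reliability):
    # SEM: the number of points a score could be off due to test unreliability.
    return sd_scores * math.sqrt(1 - reliability)

def top_score_band(scores, sem, z=1.96):
    # Scores within z * SEM * sqrt(2) of the top score are grouped as
    # not significantly different from the top score.
    band_width = z * sem * math.sqrt(2)
    top = max(scores)
    return [s for s in scores if s >= top - band_width]

sem = standard_error_of_measurement(sd_scores=5.0, reliability=0.84)  # SEM = 2.0 points
print(f"SEM: {sem:.1f} points")  # with SEM = 2, a score of 80 could range roughly from 78 to 82
print("Scores banded with the top scorer:", top_score_band([86, 85, 80, 74, 70], sem))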
How does the length of a test generally relate to its internal consistency?
- Longer tests tend to have higher internal consistency.

What does single-group validity assess?
- Whether the test will significantly predict performance for one group and not others.

What is content validity concerned with?
- The extent to which test items sample the content they are supposed to measure.

If a test lacks either form stability or temporal stability, what method is needed to determine the cause of the unreliability?
- Test-retest reliability

What does criterion validity measure?
- The relationship between a test score and some measure of job performance.

How is adverse impact determined in the context of testing?
- Usually by the four-fifths rule, a comparison of selection rates across groups.

For which type of anxiety, according to the passage, is temporal stability important for a test to be useful?
- State anxiety

What are the four characteristics of effective selection techniques mentioned in the book?
- Reliable, valid, cost-efficient, and legally defensible.

What is adverse impact in testing?
- A statistically significant difference in selection rates between groups.

Which method involves splitting test items into two groups, usually odd-numbered and even-numbered items, and correlating the scores on these groups to determine internal consistency?
- Split-half method

What is the purpose of the alternate-forms reliability method?
- To eliminate the effects of test-taking order.

How does differential validity differ from single-group validity?
- Single-group validity is concerned with predicting performance for only one group, while differential validity involves the test being more valid for one group than for the other.

What does the term "bias" in the context of testing refer to?
- Group differences in test scores unrelated to the construct being measured.

In the context of test-retest reliability, why is it important to have a time interval between the two test administrations?
- To reduce the potential advantage to individuals who take the test a second time.

What is the main focus of construct validity?
- The extent to which a test measures the theoretical construct it purports to measure.

In a predictive validity design, how is criterion validity established?
- By correlating test scores with a future measure of job performance.

What is the relationship between reliability and validity?
- Reliability is necessary for validity, but having reliability does not guarantee validity.

Which characteristic is described as the extent to which a score from a selection measure is stable and free from error?
- Reliability

What does internal consistency measure?
- The consistency with which an applicant responds to items measuring a similar construct.

Which term refers to a method that involves using a computer program to calculate internal reliability?
- Coefficient alpha

What is the impact of item homogeneity on internal consistency?
- Higher item homogeneity leads to higher internal consistency.

How is test-retest reliability determined?
- By administering the same test twice to the same group of people and correlating the two sets of scores.
