l8 Validity Bias Fairness

BES3149 PSYCHOMET 1
VALIDITY, BIAS, AND FAIRNESS

test bias
Bias is a factor inherent in a test that systematically prevents accurate, impartial
measurement
test fairness
- Refers to the extent to which a test is used in an impartial, just, and equitable way
- Has to do with the appropriate use of test scores, and it is a social, philosophical, or
perhaps legal term that represents one’s value judgement
- A test may be valid, but it can be used fairly or unfairly
® The issue of test fairness = leads to a lot of debates and arguments
EXISTENCE OF TEST BIAS
- When test items are easier for one group of people than for another
- When people with certain qualities perform better than those without such qualities,
particularly when this quality is not related to the variable being measured
- Reflects a systematic variation in test scores
® Systematic variance = members of one group score higher or lower on a certain variable
categories of test bias
Construct validity bias
- Refers to whether a test accurately measures what it was designed to measure
Content validity bias
- Occurs when the content of a test is comparatively more difficult for one group than for
others
- Can occur when members of a subgroup, such as various minority groups, have not been
given the same opportunity to learn the material being tested = they score lower if the
test contains such items
- Can occur when scoring is unfair to a group
® For example, the answers that would make sense in one group’s culture are
deemed incorrect
® Among the Japanese, for instance, abasement (or the tendency to readily admit
one’s fault) is perceived to be positive, so they would be expected to score high in
that trait
- It can occur when questions are worded in ways that are unfamiliar to certain groups
because of linguistic or cultural differences.
- Item selection bias = subcategory of this bias, refers to the use of individual test items
that are more suited to one group’s language and cultural experiences
Predictive-validity bias
- Also known as criterion-related validity bias
- Refers to a test’s accuracy in predicting how well a certain student group will perform in
the future
® For example, a test would be considered “unbiased” if it predicted future
academic and test performance equally well for all groups of students
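One rough way to look at predictive-validity bias is to compare how strongly test scores relate to the later criterion within each group. The sketch below is only an illustration: the data are made up, the names `pearson_r` and `validity_by_group` are hypothetical, and a within-group correlation is a simple stand-in for "predicts equally well", not a full bias analysis.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def validity_by_group(records):
    """Map each group label to the test-criterion correlation in that group.

    records: iterable of (group, test_score, criterion_score) tuples.
    """
    by_group = {}
    for group, test, criterion in records:
        tests, criteria = by_group.setdefault(group, ([], []))
        tests.append(test)
        criteria.append(criterion)
    return {g: pearson_r(t, c) for g, (t, c) in by_group.items()}

# Made-up data: (group, test score, later grade average)
records = [
    ("A", 50, 2.0), ("A", 60, 2.5), ("A", 70, 3.0), ("A", 80, 3.5),
    ("B", 50, 3.0), ("B", 60, 2.0), ("B", 70, 3.5), ("B", 80, 2.5),
]
validity = validity_by_group(records)
for group, r in validity.items():
    print(group, round(r, 2))
```

In this invented data set the test tracks later performance closely for group A and not at all for group B, which is the kind of gap that would raise a question about predictive-validity bias.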
Factors that promote test bias

§ If the test developer is not demographically or culturally
representative of the intended test takers, test items may reflect
inadvertent bias
§ Norm-referenced tests may be biased if the “norming process”
does not include representative samples of all the tested
subgroups
§ Certain test formats may have an inherent bias toward some
groups, at the expense of others
§ The choice of language in test questions can introduce bias
§ Tests may be considered biased if they include references to
cultural details that are not familiar to particular groups
bias and rating errors

- Much of assessment involves rating
- Process of rating can be biased = rater gives inaccurate ratings due to certain factors
- Rating = numerical or verbal judgement (or both) that places a person or an attribute
along a continuum identified by a scale of numerical or word descriptors known as a rating
scale
- Rating error = judgement resulting from the intentional or unintentional misuse of a
rating scale
examples of rating errors
Leniency/generosity errors
- An error in rating that arises from the tendency on the part of the rater to be lenient in
scoring, marking, and/or grading
- Lenient to all ratees
Severity errors
- An error in rating wherein the rater becomes overly strict and gives low ratings
- Strict to all
Central tendency errors
- Rater is reluctant to give extremely high or low ratings
- Ratings cluster at the middle of the continuum
Halo effect
- Tendency for a rater to give a particular ratee a higher rating than he or she objectively
deserves because of the rater’s failure to discriminate among conceptually distinct and
potentially independent aspects of the ratee’s behavior
- The rater takes a positive view of a particular ratee
Horn effect
- The rater takes a negative view of a particular ratee
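Leniency, severity, and central tendency errors all show up as ratings bunched in one part of the scale. A minimal sketch of how one rater's ratings might be screened for this pattern follows; the function name `describe_rater`, the 1 to 5 scale, and the cut-off values are assumptions chosen for illustration, not established standards. A bunched distribution is only a prompt for a closer look, because the ratees may genuinely be similar.

```python
from statistics import mean, pstdev

def describe_rater(ratings, scale_min=1, scale_max=5):
    """Return (mean, spread, label) for one rater's ratings.

    The cut-offs (spread below 0.75, mean at least one point from the
    midpoint) are arbitrary illustrations, not accepted thresholds.
    """
    midpoint = (scale_min + scale_max) / 2
    avg = mean(ratings)
    spread = pstdev(ratings)
    if spread >= 0.75:
        label = "no obvious restriction of range"
    elif avg >= midpoint + 1:
        label = "possible leniency"
    elif avg <= midpoint - 1:
        label = "possible severity"
    else:
        label = "possible central tendency"
    return avg, spread, label

# Made-up ratings from four hypothetical raters on a 1-5 scale
raters = {
    "rater_1": [5, 5, 4, 5, 4, 5],
    "rater_2": [1, 2, 1, 1, 2, 1],
    "rater_3": [3, 3, 2, 3, 4, 3],
    "rater_4": [1, 5, 3, 2, 4, 3],
}
labels = {name: describe_rater(r)[2] for name, r in raters.items()}
for name, label in labels.items():
    print(name, label)
```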
TAKE NOTE:
§ Leniency/generosity, severity, and central tendency errors are
also called Distribution Errors or Restriction-of-Range Rating
Errors
§ One remedy to address these errors is to use ranking
® Procedure that requires the rater to measure individuals
against one another instead of against an absolute scale
® By using rankings instead of ratings, the rater is forced to
select first, second, third and so forth choices
§ Another remedy is to provide raters with a list of specific
competencies to be evaluated, as well as guidance on how each
competency should be evaluated
§ Distribution errors = the effect applies to all ratees
§ Halo and Horn effect = the effect applies only to a particular ratee
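The ranking remedy mentioned above can be pictured with a small sketch. It assumes a hypothetical rater who gave almost everyone the top rating; `rank_ratees` is an illustrative name. In practice the rater compares people directly rather than having ratings converted, and ties are broken alphabetically here only so the output is repeatable.

```python
def rank_ratees(scores):
    """Order ratees from best to worst and assign ranks 1, 2, 3, ...

    scores: dict mapping ratee name to the rater's overall judgement.
    Ties are broken alphabetically purely for a deterministic result;
    a real ranking procedure would ask the rater to break them.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}

# Made-up ratings clustered at the top of a 1-5 scale (leniency pattern)
ratings = {"Ana": 5, "Ben": 5, "Carla": 4, "Dino": 5}
ranks = rank_ratees(ratings)
print(ranks)
```

The ratings barely separate the four ratees, while the ranks force a first, second, third, and fourth choice.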
addressing rating errors

- Training programs to familiarize raters with common rating errors and sources of rater
bias have shown promise in reducing rating errors and increasing measures of reliability
and validity
- Lecture, role playing, discussion, watching oneself on videotape, and computer
simulation of different situations are some of the many techniques that could be brought
to bear in such training programs
avoiding test bias and lack of test fairness
- Very much like measurement error, some degree of bias and unfairness in testing may be
unavoidable
® The inevitability of test bias and unfairness is among the reasons that many test
developers and testing experts caution against making important decisions based
on a single test result
- Given the fact that test results continue to be widely used when making important
decisions, test developers and experts have identified a number of strategies that can
reduce, if not eliminate, test bias and unfairness
® A few representative examples include:
a. Striving for diversity in test-development staffing, and training test developers and scorers to be
aware of the potential for cultural, linguistic, and socioeconomic bias
b. Having test materials reviewed by experts trained in identifying cultural bias and by representatives
of culturally and linguistically diverse subgroups
c. Ensuring that norming processes and sample sizes used to develop norm-referenced tests are
inclusive of diverse subgroups and large enough to constitute a representative sample.
d. Eliminating items that produce the largest racial and cultural performance gaps, and selecting items
that produce the smallest gaps, a technique known as “the golden rule.”
® This particular strategy may be logistically difficult to achieve, however, given the number
of racial, ethnic, and cultural groups that may be represented in any given testing population
e. Screening for and eliminating items, references, and terms that are more likely to be offensive to
certain groups
f. Translating tests into a test taker’s native language or using interpreters to translate test items
g. Including more “performance-based” items to limit the role that language and word choice play in
test performance
h. Using multiple assessment measures to determine academic achievement and progress, and
avoiding the use of test scores, in exclusion of other information, to make important decisions about
students
® These recommendations are said to be more appropriate for tests in the educational
setting, although they may be applied to other settings (e.g., industrial, clinical)
as well
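Strategy (d) in the list above, keeping the items with the smallest group performance gaps, can be sketched roughly as follows. The data are made up, the names `item_gaps` and `select_items` are hypothetical, and the sketch assumes exactly two groups and uses the simple difference in proportion correct as the gap measure; the notes do not specify how the gap should be computed.

```python
def item_gaps(responses):
    """Absolute gap in proportion correct between two groups, per item.

    responses: dict mapping item id -> dict mapping group -> list of
    0/1 answers (1 = correct). Assumes exactly two groups per item.
    """
    gaps = {}
    for item, by_group in responses.items():
        proportions = [sum(a) / len(a) for a in by_group.values()]
        gaps[item] = abs(proportions[0] - proportions[1])
    return gaps

def select_items(responses, keep):
    """Return the ids of the `keep` items with the smallest gaps."""
    gaps = item_gaps(responses)
    # Sort by gap, then by item id so ties come out in a stable order
    return sorted(gaps, key=lambda item: (gaps[item], item))[:keep]

# Made-up responses for three items and two groups
responses = {
    "item1": {"A": [1, 1, 1, 0], "B": [1, 1, 0, 0]},
    "item2": {"A": [1, 1, 1, 1], "B": [1, 0, 0, 0]},
    "item3": {"A": [1, 0, 1, 0], "B": [0, 1, 1, 0]},
}
selected = select_items(responses, keep=2)
print(selected)
```

With more than two groups the pairwise comparison no longer applies directly, which reflects the logistical difficulty the notes mention for this strategy.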
Take note
- Test bias = closely related to the issue of test fairness
- Test fairness = how the test score was used or applied
