Categories of test bias
Construct validity bias
- Refers to whether a test accurately measures what it was designed to measure
Content validity bias
- Occurs when the content of a test is comparatively more difficult for one group than for
others
- Can occur when members of a subgroup, such as various minority groups, have not been
given the same opportunity to learn the material being tested; they tend to score lower
if the test contains such items
- Can occur when scoring is unfair to a group
® For example, the answers that would make sense in one group’s culture are
deemed incorrect
® Among the Japanese, for instance, abasement (the tendency to readily admit
one’s fault) is perceived as positive, so Japanese test takers would be expected to
score high on that trait
BES3149 PSYCHOMET 1
VALIDITY, BIAS, AND FAIRNESS
- It can occur when questions are worded in ways that are unfamiliar to certain groups
because of linguistic or cultural differences.
- Item selection bias, a subcategory of this bias, refers to the use of individual test items
that are better suited to one group’s language and cultural experiences
Predictive-validity bias
- Also known as bias in criterion-related validity
- Refers to a test’s accuracy in predicting how well a certain student group will perform in
the future
® For example, a test would be considered “unbiased” if it predicted future
academic and test performance equally well for all groups of students
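One rough way to check the idea above is to compare how strongly test scores relate to a later criterion within each group. The sketch below is only an illustration: the two groups, their scores, and the criterion values (for example, a later GPA) are hypothetical, and `pearson_r` and `predictive_validity` are helper names made up for this example. A fuller analysis of predictive bias would also compare regression slopes and intercepts across groups, not just correlations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def predictive_validity(pairs):
    """Correlation between test scores and the later criterion for one group."""
    scores = [score for score, _ in pairs]
    outcomes = [outcome for _, outcome in pairs]
    return pearson_r(scores, outcomes)


# Hypothetical data: (test score, later criterion such as GPA) per student.
group_a = [(50, 2.0), (60, 2.5), (70, 3.0), (80, 3.5), (90, 4.0)]
group_b = [(50, 3.0), (60, 2.0), (70, 3.5), (80, 2.5), (90, 3.0)]

r_a = predictive_validity(group_a)
r_b = predictive_validity(group_b)

print(f"Group A validity coefficient: {r_a:.2f}")
print(f"Group B validity coefficient: {r_b:.2f}")
```

With this made-up data the coefficient is about 1.00 for group A and about 0.14 for group B. A gap that large would suggest the test does not predict equally well for both groups, which is what predictive-validity bias refers to.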
Examples of rating errors
Leniency/generosity errors
- An error in rating that arises from the tendency on the part of the rater to be lenient in
scoring, marking, and/or grading
- Lenient to all ratees
Severity errors
- An error in rating wherein the rater becomes overly strict and gives low ratings
- Strict to all
Central tendency errors
- Rater is reluctant to give extremely high or low ratings
- Ratings cluster at the middle of the continuum
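The three errors described so far (leniency, severity, and central tendency) all show up in the overall shape of a rater's ratings, so a simple screen can look at each rater's mean and spread. The sketch below is a minimal illustration, assuming a 1 to 5 rating scale. The thresholds (`MEAN_MARGIN`, `MIN_SPREAD`), the rater names, and the ratings are all hypothetical and would need to be set for a real rating form.

```python
from statistics import mean, pstdev

# Assumptions for this sketch: a 1-5 scale and arbitrary cut-offs.
SCALE_MIDPOINT = 3.0
MEAN_MARGIN = 1.0   # how far from the midpoint the mean must be to flag it
MIN_SPREAD = 0.6    # spread below this suggests ratings cluster together


def describe_rater(ratings):
    """Return a tentative label for one rater's set of ratings."""
    avg = mean(ratings)
    spread = pstdev(ratings)
    if avg >= SCALE_MIDPOINT + MEAN_MARGIN:
        return "possible leniency"
    if avg <= SCALE_MIDPOINT - MEAN_MARGIN:
        return "possible severity"
    if spread < MIN_SPREAD:
        return "possible central tendency"
    return "no distribution error flagged"


# Hypothetical ratings given by four raters to the same six ratees.
raters = {
    "rater_1": [5, 4, 5, 5, 4, 5],
    "rater_2": [1, 2, 1, 2, 2, 1],
    "rater_3": [3, 3, 3, 4, 3, 3],
    "rater_4": [1, 5, 3, 2, 4, 3],
}

for name, ratings in raters.items():
    print(name, "->", describe_rater(ratings))
```

A flag here is only a prompt to look closer. A high average may simply mean the ratees really performed well, so the label says "possible" instead of asserting an error.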
Halo effect
- Tendency for a rater to give a particular ratee a higher rating than he or she objectively
deserves because of the rater’s failure to discriminate among conceptually distinct and
potentially independent aspects of the ratee’s behavior
- The rater holds a positive view of a particular ratee
Horn effect
- The rater holds a negative view of a particular ratee
TAKE NOTE:
§ Leniency/generosity, severity, and central tendency errors are
also called Distribution Errors or Restriction-of-Range Rating
Errors
§ One remedy to address these errors is to use ranking
® A procedure that requires the rater to measure individuals
against one another instead of against an absolute scale
® By using rankings instead of ratings, the rater is forced to
select a first, second, third, and so on
§ Another remedy is to provide raters with a list of specific
competencies to be evaluated, as well as guidance on how each
competency should be evaluated
§ Distribution errors = the effect applies to all ratees
§ Halo and horn effects = the effect applies only to a particular ratee
Avoiding test bias and lack of test fairness
- Very much like measurement error, some degree of bias and unfairness in testing may be
unavoidable
® The inevitability of test bias and unfairness is among the reasons that many test
developers and testing experts caution against making important decisions based
on a single test result
- Given the fact that test results continue to be widely used when making important
decisions, test developers and experts have identified a number of strategies that can
reduce, if not eliminate, test bias and unfairness
® A few representative examples include:
a. Striving for diversity in test-development staffing, and training test developers and scorers to be
aware of the potential for cultural, linguistic, and socioeconomic bias
b. Having test materials reviewed by experts trained in identifying cultural bias and by representatives
of culturally and linguistically diverse subgroups
c. Ensuring that norming processes and sample sizes used to develop norm-referenced tests are
inclusive of diverse subgroups and large enough to constitute a representative sample
d. Eliminating items that produce the largest racial and cultural performance gaps, and selecting items
that produce the smallest gaps—a technique known as “the golden rule.”
® This particular strategy may be logistically difficult to achieve, however, given the number
of racial, ethnic, and cultural groups that may be represented in any given testing population
e. Screening for and eliminating items, references, and terms that are more likely to be offensive to
certain groups
f. Translating tests into a test taker’s native language or using interpreters to translate test items
g. Including more “performance-based” items to limit the role that language and word choice play in
test performance
h. Using multiple assessment measures to determine academic achievement and progress, and
avoiding the use of test scores, in exclusion of other information, to make important decisions about
students
® These recommendations are said to be more appropriate for tests in the educational
setting, although they may be applied to other settings (e.g., industrial, clinical,
etc.) as well
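Strategy (d) above, the “golden rule” technique, can be sketched as a simple item-screening step: compute each item's performance gap across groups and keep the items with the smallest gaps. This is only an illustrative sketch. The item names, group labels, and proportion-correct values are hypothetical, and `performance_gap` and `select_smallest_gap_items` are names invented for this example.

```python
# Hypothetical proportion of test takers answering each item correctly, by group.
item_stats = {
    "item_1": {"group_a": 0.80, "group_b": 0.78},
    "item_2": {"group_a": 0.75, "group_b": 0.50},
    "item_3": {"group_a": 0.60, "group_b": 0.57},
    "item_4": {"group_a": 0.90, "group_b": 0.70},
    "item_5": {"group_a": 0.65, "group_b": 0.66},
}


def performance_gap(group_values):
    """Difference between the highest and lowest group proportion correct."""
    values = list(group_values.values())
    return max(values) - min(values)


def select_smallest_gap_items(stats, n_items):
    """Return the n_items item names with the smallest between-group gaps."""
    ranked = sorted(stats, key=lambda item: performance_gap(stats[item]))
    return ranked[:n_items]


selected = select_smallest_gap_items(item_stats, 3)
print("Items kept:", selected)

dropped = [item for item in item_stats if item not in selected]
print("Items dropped (largest gaps):", dropped)
```

Because the gap is taken as the highest minus the lowest group value, the same function works with more than two groups. That also illustrates the logistical concern noted under (d): every additional group needs its own reliable item statistics before the gaps can be compared.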
TAKE NOTE:
- Test bias = closely related to the issue of test fairness
- Test fairness = how the test score was used or applied