Chapter 19, "Test Bias," pages 511–544.
Imagine that you are a professional who wants to evaluate the progress of a given client through psychological testing.
Identify the different sources of bias you might encounter in the assessment process. Discuss the specific steps you would take to reduce test bias effects and what you would look for in an assessment instrument for purposes of bias reduction. Describe how the standard deviation of a test score affects the interpretation of test scores.
Response Guidelines
Critically respond to at least one other learner. Are their strategies for reducing bias adequate? Provide feedback to improve their strategy or strengthen their post.

The issue of systemic bias in psychological testing has been the subject of debate for many years. Standard 7.3 of the Standards text indicates that “when credible research reports that differential item functioning exists across age, gender, racial/ethnic, cultural, disability, and/or linguistic groups in the population of test takers in the content domain measured by the test, test developers should conduct appropriate studies when feasible. Such research should seek to detect and eliminate aspects of test design, content, and format that might bias test scores for particular groups” (AERA, APA, & NCME, 1999). It follows from this statement that even a test that seems reliable and valid can be biased if it systematically distorts the true scores of certain individuals or groups.

Although test bias has been difficult to define, Hunter and Schmidt (1976) identify three ethical issues at work in testing bias: unqualified individualism, the use of quotas, and qualified individualism (as cited in Kaplan & Saccuzzo, 2009). Unqualified individualism uses tests to select the most qualified individuals without regard to race and gender. The use of quotas recognizes race and gender in the testing process. Qualified individualism uses tests in the same way as unqualified individualism except it takes into consideration race, gender and religion.

Test bias can appear in many forms. Some of the common ones are cultural, gender, ethnic/race, age, measurement, and prediction bias. Measurement bias occurs when a test has systematic errors in measuring a particular characteristic or attribute. Prediction bias takes place when a test makes systematic errors in predicting some outcome.
To begin to understand whether a test is biased and will produce content-irrelevant scores, one must first ask two questions: 1) Does the test fulfill the intent of the developer? and 2) Does the test contain any content-irrelevant information that reveals a correct answer? To reduce bias, one must check whether any of the following are present: language that may have different meanings for different groups, language that is considered offensive or demeaning, use of stereotypes, religious language or references that are characteristic of particular groups, a particular bias for a specific geographical region, and the implication of a particular socioeconomic status. If testing bias does exist, an investigation should follow. Test developers should review the test and make appropriate adjustments. The test should be removed from use until further validation is provided.

Classical test theory tells us that errors of measurement are random because testing instruments are imperfect in measuring a person’s true score (Kaplan & Saccuzzo, 2009). Repeated applications of the same test can yield different scores. The standard deviation of the scores can tell us something about the error of measurement around the true score or mean. Since the standard deviation is the average deviation around the mean, in practice the standard deviation of a person's observed scores across repeated testings can estimate the standard error of measurement.

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles, applications, and issues (7th ed.). Belmont, CA: Cengage.

A test is biased if it systematically underestimates or overestimates the true scores of certain individuals. Even if a test is both reliable and valid, that is no guarantee that it will not be biased for, or against, certain groups. Bias can be due to external factors, such as group differences, and/or due to internal factors, such as some questions being significantly harder for a particular group. For instance, knowledge-based tests are always biased if given to people who have no way of realistically knowing the answers. Another example would be an English-based intelligence test given to a non-English speaker. It is very important for tests in selection procedures to be unbiased so that certain groups are not underrepresented in the workforce. But it is important to remember that individual differences within groups tend to be much greater than differences between groups, and it can be very difficult to decide upon a criterion for forming groups in the first place. Also, a very large sample is required before subtle degrees of bias can be detected, but if the sample is too large then almost every item will show a small, but significant, degree of bias.
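The sample-size trade-off described above can be illustrated with a short sketch. It assumes a hypothetical item that 70% of one group and 69% of another answer correctly (figures invented for illustration) and compares the groups with a pooled two-proportion z-test. That test is one simple way to compare item pass rates; it is not a procedure named in the readings, and a real differential item functioning study would also match examinees on overall ability.

```python
import math


def two_proportion_p_value(p1: float, p2: float, n_per_group: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (equal group sizes)."""
    pooled = (p1 + p2) / 2  # pooled pass rate when both groups have the same n
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical pass rates on a single item: a one-point gap between groups.
P_GROUP_A, P_GROUP_B = 0.70, 0.69

for n in (100, 10_000, 1_000_000):
    p_value = two_proportion_p_value(P_GROUP_A, P_GROUP_B, n)
    print(f"n per group = {n:>9,}: p = {p_value:.3g}")
```

The gap between the groups is identical in every row; only the sample size changes. With small groups the difference is statistically invisible, while with very large groups the same one-point gap becomes "significant," which is why statistical significance alone is a poor guide to whether an item is meaningfully biased.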
Bias
o Cultural bias: does the psychological construct have the same meaning from one culture to another; how are the different items interpreted by people from different cultures; actual content (face) validity may be different for different cultures, etc.
o Gender bias may also be possible.
o Test Bias
Bias in measurement occurs when the test makes systematic errors in measuring a particular characteristic or attribute, e.g., many say that most IQ tests may well be valid for middle-class whites but not for blacks or other minorities. In interviews, which are a type of test, research shows that there is a bias in favour of good-looking applicants.
Bias in prediction occurs when the test makes systematic errors in predicting some outcome (or criterion). It is often suggested that tests used in academic admissions and in personnel selection under-predict the performance of minority applicants. Also, a test may be useful for predicting the performance of one group, e.g., males, but be less accurate in predicting the performance of females.

Response sets = psychological orientation or bias towards answering in a particular way:
o Acquiescence: tendency to agree, i.e., say "Yes". Hence use of half negatively and half positively worded items (but there can be semantic difficulties with negative wording).
o Social desirability: tendency to portray self in a positive light. Try to design questions so that social desirability isn't salient.
o Faking bad: purposely saying 'no' or looking bad if there's a 'reward' (e.g., attention, compensation, social welfare, etc.).

Test Bias and How to Identify It
An assessment item is considered biased if it favors examinees, not because of their knowledge or skill relative to a learning objective, but because of their membership in a particular group. Group membership may be related to such factors as socioeconomic status, cultural background, religion, race, gender, region, or age, among others. To be free of bias, an item should measure content in a way that neither advantages nor disadvantages examinees because of group membership.

According to the Standards for Educational and Psychological Testing, “the term bias in tests and testing refers to construct-irrelevant components that result in systematically lower or higher scores for identifiable groups of examinees”. Construct-irrelevant score components may be introduced into tests due to inappropriate sampling of test content or lack of clarity in test instructions. They may also arise if scoring criteria fail to credit fully some correct problem approaches or solutions that are more typical of one group than another.

Classical reliability theory provides us with a way to think about this problem. It shows us that an obtained score $X$ for student $s$ (i.e., $X_s$) can be thought of as being made up of two components: $T_s$ (a true score) and $E_s$ (an error score) for that student, on that particular test.

\[ X_s = T_s + E_s \]

In this equation, the error term can also be partitioned into two parts, random error and systematic error. If we could eliminate test bias, we would therefore also reduce measurement error, and the observed scores for students would more closely reflect what they actually know about the construct being tested (i.e., the true score).

Notice that according to this interpretation of the equation above, bias can also work in favor of students by making it appear that they know more about what is being measured than they actually do. Test-wise students can often use clues within the test or individual assessment items to boost their observed score even though they know less about the content being assessed than their lower scoring classmates.

Test authors should not use any elements that they think might malign or give unfair advantage to any subgroup of examinees. That is, assessment writers should remove any elements that are offensive or questionable and would therefore draw attention away from the purpose of the assessment. Another kind of item bias can occur because of stereotyping groups that individual students may associate with. The characterization of any group within test items should not be at the expense of that group. In general, jargon, slang, and demeaning characterizations should not be used. References to color, marital status, or gender should only be made when relevant to the context (e.g., gender-neutral terms should be used whenever possible).

When writing or reviewing assessment materials for bias, consider the following:
1. Does the item contain language that is not commonly used or has different connotations for different groups (e.g., depending on where you live)?
2. Does the item portray group members in a stereotypical manner? These could include activities, occupations, or emotions.
3. Does the item contain wording demeaning or offensive to a particular group?
4. Does the item include religious references some students may not know?
5. Does the item assume that all students come from the same socioeconomic background (e.g., a suburban home with a two-car garage)?
6. Is the item appropriate for the geographical region?

The Anatomy of a Multiple-Choice Item
This example will help you understand how four terms, shown all in CAPS below, are used to label different parts of a test item.

What fruit carries its seeds on the outside?    STEM
a) apple         DISTRACTOR or FOIL    OPTION
b) grape         DISTRACTOR or FOIL    OPTION
c) strawberry    KEY                   OPTION
d) tomato        DISTRACTOR or FOIL    OPTION

The following guidelines are provided to help reduce bias and distortion in multiple-choice items. When creating a test, you must constantly ask two questions of each item: (a) Did I communicate my intent clearly? and (b) Did I give any content-irrelevant clues to the correct answer?
1. All assessment items must be clearly aligned with their learning targets.
2. An item should clearly measure its learning target by conforming to the assessment characteristics for that learning target.
3. An item should only measure one objective.
4. Make directions clear so that all students know what is expected of them.
5. The reading level and/or complexity of stimulus material should be appropriate to the examinees.
6. The item stem should clearly formulate a problem or question (i.e., use a direct question instead of an incomplete statement whenever possible).
7. Item stems should not contain irrelevant material unless this somehow serves the purpose of the question.
8. Avoid items that present an unnecessary instructional aside.
9. Distractors should be plausible to uninformed examinees, and contain common misconceptions.
10. There should be one and only one response that content experts can agree on as correct.
11. Items should not ask students to make value judgments.
12. Item difficulty should not be a function of the vocabulary used in the stem or item options.
13. Avoid the use of negative words (i.e., NOT, etc.) in stems or foils. Avoid using qualifiers such as LEAST or EXCEPT in the item stem. When these terms are used they should be highlighted in some way (e.g., bold or all CAPS).
14. Avoid the use of absolutes like always or never.
15. Items should not contain clang associations (i.e., words or phrases in the stem and the correct response that sound alike).
16. Foils should be grammatically correct completions of the item stem (e.g., use of “a” or “an”, number or tense of a verb, or plurals can give cues to correct answers).
17. Foils should be mutually exclusive. Foils should not overlap or include one another (e.g., one foil should not be a subset of another).
18. Each distractor should be a logical response to the item stem.
19. All distractors should be equally attractive.
20. The technical level of the distractors should match the technical level of the item stem.
21. The correct response should parallel the distractors in terms of length and complexity (i.e., options should be homogeneous).
22. Avoid using foils where a pair of opposites is presented and one of the pair is the correct answer (key).
23. Avoid using “All of the above”, “None of the above” or “I don’t know” as options.
24. Whenever possible use a logical order when listing foils (e.g., alphabetical if one word, by length of foil if more than one word, or from smallest to largest if numerical).
25. The key should appear in all option positions (i.e., A B C and D) with approximately equal frequency, and be randomly distributed throughout the test.
26. Foils in one item should not give clues that will help answer another item on the test, or eliminate distractors in another item.
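The guideline that the key should appear in all option positions with approximately equal frequency, and be randomly distributed, is easy to check mechanically once a test form is assembled. The sketch below is one possible check, not part of the original guidelines. The answer key, the function names, the tolerance of one item, and the use of the longest run of identical positions as a rough signal of patterning are all assumptions made for illustration.

```python
from collections import Counter


def key_position_counts(answer_key, positions=("a", "b", "c", "d")):
    """Count how often each option position holds the key."""
    counts = Counter(answer_key)
    return {position: counts.get(position, 0) for position in positions}


def is_roughly_balanced(counts, tolerance=1):
    """True if every position is within `tolerance` items of an even split."""
    expected = sum(counts.values()) / len(counts)
    return all(abs(count - expected) <= tolerance for count in counts.values())


def longest_run(answer_key):
    """Length of the longest streak of the same key position in a row."""
    best = run = 0
    previous = None
    for position in answer_key:
        run = run + 1 if position == previous else 1
        best = max(best, run)
        previous = position
    return best


# Hypothetical answer key for a 12-item test (invented for illustration).
hypothetical_key = list("cabdbcadacbd")

counts = key_position_counts(hypothetical_key)
print("Key position counts:", counts)
print("Roughly balanced:", is_roughly_balanced(counts))
print("Longest run of one position:", longest_run(hypothetical_key))
```

A long run or a lopsided count does not prove that test-wise students will benefit, but either one is a cue worth removing before the form is used.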
References
Standards for Educational and Psychological Testing (1999). American Educational Research Association, American Psychological Association, and National Council on Measurement in Education.
Mehrens, W., and Lehmann, I. (1978). Measurement and evaluation in education and psychology. Holt, Rinehart and Winston.