Chapter 19, "Test Bias," pages 511–544.
Imagine that you are a professional who wants to evaluate the progress of a given client through psychological testing.
Identify the different sources of bias you might encounter in the assessment process. Discuss the specific steps you would take to reduce test bias effects and what you would look for in an assessment instrument for purposes of bias reduction. Describe how the standard deviation of a test score affects the interpretation of test scores.
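The last question asks how the standard deviation shapes score interpretation. Under classical test theory, the standard deviation of observed scores is combined with the test's reliability r to estimate the standard error of measurement, SEM = SD × √(1 − r). That SEM then sets the width of the band placed around an individual score. The following is a minimal sketch. It assumes a hypothetical test with SD = 15 and reliability .91, and a client score of 106. These values are illustrative only and do not come from the reading. The band also assumes roughly normal measurement error.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory estimate: SEM = SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Band of observed +/- z * SEM (z = 1.96 gives roughly 95% coverage)."""
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability = .91; one client scored 106.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
low, high = score_band(observed=106.0, sem=sem)
print(f"SEM = {sem:.1f}; 95% band = {low:.1f} to {high:.1f}")
# prints: SEM = 4.5; 95% band = 97.2 to 114.8
```

With the same SD, a less reliable test (r = .75) gives an SEM of 7.5. The band is then wider, and there is less reason to treat two nearby scores as genuinely different.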
Response Guidelines
Critically respond to at least one other learner. Are their strategies for reducing bias adequate? Provide feedback to improve their strategy or strengthen their post.

The issue of systemic bias in psychological testing has been debated for many years. Standard 7.3 of the Standards for Educational and Psychological Testing states that "when credible research reports that differential item functioning exists across age, gender, racial/ethnic, cultural, disability, and/or linguistic groups in the population of test takers in the content domain measured by the test, test developers should conduct appropriate studies when feasible. Such research should seek to detect and eliminate aspects of test design, content, and format that might bias test scores for particular groups" (AERA, APA, & NCME, 1999). This statement makes clear that a test can appear reliable and valid and still be biased. That happens when it systematically distorts the true scores of certain individuals or groups.

Test bias has been difficult to define. Even so, Hunter and Schmidt (1976) identify three ethical positions at work in test bias: unqualified individualism, the use of quotas, and qualified individualism (as cited in Kaplan & Saccuzzo, 2009). Unqualified individualism uses tests to select the most qualified individuals. It relies on whatever information best predicts performance, which could include race or gender if they improved prediction. The use of quotas explicitly takes race and gender into account in the selection process. Qualified individualism uses tests in the same way as unqualified individualism, except that it deliberately excludes race, gender, and religion as selection variables, even when they would improve prediction.

Test bias can appear in many forms. Common ones are cultural, gender, ethnic/racial, age, measurement, and prediction bias. Measurement bias occurs when a test makes systematic errors in measuring a particular characteristic or attribute. Prediction bias occurs when a test makes systematic errors in predicting some outcome.
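Prediction bias, as described above, is often checked by fitting a single prediction equation to everyone. The check then asks whether that equation's errors lean in one direction for a particular group. The following is a minimal sketch of this regression-based check. It uses invented data and the hypothetical helper names `fit_line` and `mean_residual`. Both groups share the same slope, but group B's criterion scores run 0.4 points higher at every test score. A fuller analysis would also test slope and intercept differences for significance.

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residual(x, y, intercept, slope):
    """Average of (actual - predicted). Positive means the line under-predicts."""
    return mean(yi - (intercept + slope * xi) for xi, yi in zip(x, y))


# Hypothetical test scores (x) and later criterion scores (y) for two groups.
x_a, y_a = [40, 50, 60, 70], [2.0, 2.5, 3.0, 3.5]
x_b, y_b = [40, 50, 60, 70], [2.4, 2.9, 3.4, 3.9]

# A single regression line fitted to both groups combined.
intercept, slope = fit_line(x_a + x_b, y_a + y_b)
print(round(mean_residual(x_a, y_a, intercept, slope), 2))  # -0.2: over-predicted
print(round(mean_residual(x_b, y_b, intercept, slope), 2))  # 0.2: under-predicted
```

Here the test ranks people consistently within each group, yet the common equation under-predicts group B. This is the pattern the lecture notes describe for minority applicants in admissions and personnel selection.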
To begin to understand accurately whether a test is biased and will produce content-irrelevant scores, one must first ask two questions: 1) Does the test fulfill the intent of the developer? and 2) Does the test contain any content-irrelevant information that reveals a correct answer? To reduce bias, one must check whether any of the following are present:
- language that may carry different meanings for different groups
- use of stereotypes
- language that is considered offensive or demeaning
- religious language or references that are characteristic of particular groups
- a bias toward a specific geographical region
- the implication of a particular socioeconomic status

Bias can stem from external factors, such as group differences, and/or from internal factors, such as some questions being significantly harder for a particular group. For instance, knowledge-based tests are always biased if given to people who have no realistic way of knowing the answers. Another example would be an English-based intelligence test given to a non-English speaker.

A test is biased if it systematically underestimates or overestimates the true scores of certain individuals. Even a test that is both reliable and valid may still be biased for, or against, certain groups. Tests used in selection procedures must be unbiased so that certain groups are not underrepresented in the workforce. If test bias does exist, an investigation should follow. Test developers should review the test and make appropriate adjustments, and the test should be removed from use until further validation is provided.

Detecting bias is not straightforward. A very large sample is required before subtle degrees of bias can be detected. Yet if the sample is too large, almost every item will show a small but statistically significant degree of bias. It can also be very difficult to decide on a criterion for forming groups in the first place. It is also important to remember that individual differences within groups tend to be much greater than differences between groups.

Classical test theory tells us that errors of measurement are random, because testing instruments measure a person's true score imperfectly (Kaplan & Saccuzzo, 2009). Repeated applications of the same test can yield different scores. The standard deviation of those scores describes the measurement error around the true score, which is the mean of that distribution. The standard deviation is, roughly, the typical distance of scores from the mean; formally, it is the square root of the average squared deviation. In practice, the standard deviation of the observed scores and the reliability of the test are used together to estimate the standard error of measurement. The larger that error, the wider the band of uncertainty that should surround any individual's score.

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles, applications, and issues (7th ed.). Belmont, CA: Cengage.
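The point that a very large sample makes almost every item show a small but significant bias can be illustrated with a two-proportion z-test. The sketch below holds a hypothetical 2-point gap in percent correct fixed and varies only the group size. The z-test is a swapped-in illustration, not a procedure from the post. The 62% and 60% figures are invented, and the helper `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(p1: float, p2: float, n_per_group: int) -> tuple[float, float]:
    """z statistic and two-sided p-value comparing two proportions,
    assuming equal group sizes and a pooled standard error."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical item answered correctly by 62% of one group and 60% of another.
# The 2-point gap is held fixed; only the sample size changes.
for n in (100, 1_000, 10_000):
    z, p = two_proportion_z(0.62, 0.60, n)
    print(f"n = {n:>6} per group: z = {z:.2f}, two-sided p = {p:.4f}")
```

The gap is nowhere near significant at 100 per group but clearly significant at 10,000 per group, because z grows with the square root of n. This is why a practical-size criterion is usually needed alongside a significance test when flagging items.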
Test Bias
• Bias in measurement occurs when the test makes systematic errors in measuring a particular characteristic or attribute. For example, many say that most IQ tests may well be valid for middle-class whites but not for blacks or other minorities. In interviews, which are a type of test, research shows that there is a bias in favour of good-looking applicants.
• Bias in prediction occurs when the test makes systematic errors in predicting some outcome (or criterion). It is often suggested that tests used in academic admissions and in personnel selection under-predict the performance of minority applicants. A test may also be useful for predicting the performance of one group (e.g., males) but be less accurate in predicting the performance of females.
• Response sets = psychological orientation or bias towards answering in a particular way:
  o Acquiescence: tendency to agree, i.e., to say "Yes". Hence the use of half negatively and half positively worded items (although negative wording can cause semantic difficulties).
  o Social desirability: tendency to portray self in a positive light. Try to design questions so that social desirability isn't salient.
  o Faking bad: purposely saying 'no' or looking bad if there's a 'reward' (e.g., attention, compensation, social welfare).
• Bias
  o Cultural bias: Does the psychological construct have the same meaning from one culture to another? How are the different items interpreted by people from different cultures? Actual content (face) validity may differ across cultures.
  o Gender bias may also be possible.

Test Bias and How to Identify It
An assessment item is considered biased if it favors examinees not because of their knowledge or skill relative to a learning objective, but because of their membership in a particular group. Group membership may be related to such factors as socioeconomic status, cultural background, race, religion, region, gender, or age, among others. To be free of bias, an item should measure content in a way that neither advantages nor disadvantages examinees because of group membership. According to the Standards for Educational and Psychological Testing, "the term bias in tests and testing refers to construct-irrelevant components that result in systematically lower or higher scores for identifiable groups of examinees."
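One widely used statistical check for item bias of the kind defined above is the Mantel-Haenszel procedure for differential item functioning (DIF). The source does not prescribe it; it is a technique swapped in here for illustration. Examinees are first matched on total score. Within each score level, the procedure compares how often reference-group and focal-group members answer the item correctly. If matched examinees succeed equally often, the common odds ratio is near 1. The sketch below assumes invented counts for a single item. In practice the odds ratio is reported together with a significance test.

```python
import math

# Each stratum groups examinees with the same (or similar) total test score:
# (reference_right, reference_wrong, focal_right, focal_wrong)
Stratum = tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio across score strata.

    About 1.0: matched examinees do equally well in both groups.
    Above 1.0: the item favors the reference group.
    Below 1.0: the item favors the focal group.
    """
    numerator = denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    return numerator / denominator


# Hypothetical counts for one item at low, middle, and high total-score levels.
strata = [(20, 30, 10, 40), (40, 20, 25, 35), (50, 10, 40, 20)]
alpha = mantel_haenszel_odds_ratio(strata)
mh_d_dif = -2.35 * math.log(alpha)  # ETS delta metric; negative favors the reference group
print(f"MH odds ratio = {alpha:.2f}, MH D-DIF = {mh_d_dif:.2f}")
```

Matching on total score is the key design choice. It separates "this group knows less of the content overall" from "this item is harder for this group at the same level of overall knowledge", and only the second pattern counts as DIF.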
Construct-irrelevant score components may be introduced into tests through inappropriate sampling of test content or unclear test instructions. They may also arise if scoring criteria fail to credit fully some correct problem approaches or solutions that are more typical of one group than another. Bias can also work in students' favor by making it appear that they know more about what is being measured than they actually do. Test-wise students can often use clues within the test or within individual items to boost their observed score, even though they know less about the content being assessed than their lower-scoring classmates.

Classical reliability theory provides a way to think about this problem. It shows that an obtained score $X$ for student $s$ (i.e., $X_s$) can be thought of as made up of two components for that student on that particular test: $T_s$ (a true score) and $E_s$ (an error score).

$X_s = T_s + E_s$

In this equation, the error term can also be partitioned into two parts: random error and systematic error. According to this interpretation, if we could eliminate test bias, we would also reduce measurement error. Students' observed scores would then more closely reflect what they actually know about the construct being tested (i.e., the true score).

Another kind of item bias can occur through stereotyping of groups that individual students may associate with. The characterization of any group within test items should not be at the expense of that group. References to color, marital status, or gender should be made only when relevant to the context. Jargon, slang, and demeaning characterizations should not be used, and gender-neutral terms should be used whenever possible. In general, assessment writers should remove any elements that are offensive or questionable, since these would draw attention away from the purpose of the assessment.
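The partition of the error term into random and systematic parts can be written out explicitly. The sketch below assumes two things. First, the random component, labelled $R_s$ here, averages to zero over repeated administrations. Second, the systematic component, labelled $B_s$, is the same on every administration. Both labels are introduced for illustration and are not notation from the source.

$$
\begin{aligned}
X_s &= T_s + E_s, \qquad E_s = R_s + B_s \\
\mathbb{E}[X_s] &= T_s + \mathbb{E}[R_s] + B_s = T_s + B_s \quad \text{because } \mathbb{E}[R_s] = 0
\end{aligned}
$$

Averaging many administrations removes $R_s$ but leaves $B_s$ untouched. Because $B_s$ repeats every time, it does not make scores any less consistent. A test can therefore be highly reliable and still be biased.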
Test authors should not use any elements that they think might malign or give unfair advantage to any subgroup of examinees. When writing or reviewing assessment materials for bias, consider the following:
1. Does the item contain language that is not commonly used, or that has different connotations for different groups (for example, depending on where you live)?
2. Does the item portray group members in a stereotypical manner? Stereotypes could involve activities, occupations, or emotions.
3. Does the item contain wording demeaning or offensive to a particular group?
4. Does the item include religious references some students may not know?
5. Does the item assume that all students come from the same socioeconomic background (e.g., a suburban home with a two-car garage)?
6. Is the item appropriate for the geographical region?

The following guidelines are provided to help reduce bias and distortion in multiple-choice items. When creating a test, you must constantly ask two questions of each item: (a) Did I communicate my intent clearly? and (b) Did I give any content-irrelevant clues to the correct answer?
1. All assessment items must be clearly aligned with their learning targets.
2. An item should clearly measure its learning target by conforming to the assessment characteristics for that learning target.
3. An item should measure only one objective.
4. Make directions clear so that all students know what is expected of them.
5. The reading level and/or complexity of stimulus material should be appropriate to the examinees.
6. The item stem should clearly formulate a problem or question (i.e., use a direct question instead of an incomplete statement whenever possible).
7. Item stems should not contain irrelevant material unless it serves the purpose of the question.
8. Avoid items that present an unnecessary instructional aside.
9. Distractors should be plausible to uninformed examinees and contain common misconceptions.

The Anatomy of a Multiple-Choice Item
This example will help you understand how four terms, shown in CAPS below, are used to label different parts of a test item.

What fruit carries its seeds on the outside?   STEM
a) apple        DISTRACTOR or FOIL   OPTION
b) grape        DISTRACTOR or FOIL   OPTION
c) strawberry   KEY                  OPTION
d) tomato       DISTRACTOR or FOIL   OPTION

10. Item difficulty should not be a function of the vocabulary used in the stem or item options.
11. Items should not contain clang associations (i.e., words or phrases in the stem and the correct response that sound alike).
12. Foils should be grammatically correct completions of the item stem. Grammatical features (e.g., the number or tense of a verb, the use of "a" or "an", or plurals) should not give cues to the correct answer.
13. Whenever possible, use a logical order when listing foils (e.g., alphabetical if one word, by length of foil if more than one word, or from smallest to largest if numerical).
14. Foils should be mutually exclusive; they should not overlap or include one another (e.g., one foil should not be a subset of another).
15. Avoid the use of absolutes like always or never.
16. Avoid using "All of the above", "None of the above", or "I don't know" as options.
17. Avoid foils that present a pair of opposites in which one of the pair is the correct answer (key).
18. Each distractor should be a logical response to the item stem, and all distractors should be equally attractive.
19. The technical level of the distractors should match the technical level of the item stem.
20. The correct response should parallel the distractors in length and complexity (i.e., options should be homogeneous).
21. Items should not ask students to make value judgments.
22. The key should appear in all option positions (i.e., A, B, C, and D) with approximately equal frequency and be randomly distributed throughout the test.
23. Avoid using qualifiers such as LEAST or EXCEPT in the item stem.
24. Avoid the use of negative words (e.g., NOT) in stems or foils. When these terms are used, they should be highlighted in some way (e.g., bold or all CAPS).
25. There should be one and only one response that content experts can agree on as correct.
26. Foils in one item should not give clues that help answer another item on the test or eliminate distractors in another item.

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing.
Mehrens, W., & Lehmann, I. (1978). Measurement and evaluation in education and psychology. Holt, Rinehart and Winston.
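The guideline that the key should appear in every option position with roughly equal frequency, in random order, is easy to check or enforce mechanically. The following is a minimal sketch, assuming a hypothetical 26-item test with options A to D. The helper names `balanced_key_positions` and `key_balance_ok` are illustrative and do not come from the source.

```python
import random
from collections import Counter
from typing import Optional


def balanced_key_positions(n_items: int, options: str = "ABCD",
                           seed: Optional[int] = None) -> list[str]:
    """Give each option letter as the key (nearly) equally often,
    then shuffle so the sequence has no predictable pattern."""
    rng = random.Random(seed)
    keys = [options[i % len(options)] for i in range(n_items)]
    rng.shuffle(keys)
    return keys


def key_balance_ok(keys: list[str], options: str = "ABCD") -> bool:
    """True if every option is used as the key and the counts differ by at most one."""
    counts = Counter(keys)
    per_option = [counts[o] for o in options]
    return min(per_option) > 0 and max(per_option) - min(per_option) <= 1


# Hypothetical 26-item test.
keys = balanced_key_positions(26, seed=7)
print("".join(keys))
print(key_balance_ok(keys))           # True
print(key_balance_ok(list("AAAAB")))  # False: C and D are never the key
```

This can conflict with the guideline on listing foils in a logical order. When foils are ordered alphabetically or numerically, that order fixes where the key lands, so a test writer may need to decide which guideline governs a given item.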