
Session 3 - SUMMARY - CHAPTER 2 (pp. 27-56)
Chapter 2: Principles of Language Assessment
I. PRACTICALITY
A PRACTICAL TEST...
‭* stays within budgetary limits‬
‭* can be completed by the test-taker within appropriate time constraints‬
‭* has clear directions for administration‬
‭* appropriately utilizes available human resources‬
‭* does not exceed available material resources‬
‭* considers the time and effort involved in both designing and scoring.‬

Time is always a crucial practical factor for busy teachers in classroom-based testing.
‭II.‬ ‭RELIABILITY‬
‭A reliable test is consistent and dependable.‬
A RELIABLE TEST...
‭* has consistent conditions across two or more administrations‬
‭* gives clear directions for scoring/evaluation‬
* has uniform rubrics for scoring/evaluation
* lends itself to consistent application of rubrics by the scorer
‭* contains items/tasks that are unambiguous to the test-taker‬
The issue of the reliability of tests can be better understood by considering a number of factors that can contribute to their unreliability.
‭1.‬ ‭Student-Related Reliability‬
The most common issues are caused by physical or psychological factors (e.g., temporary illness, fatigue, a “bad day,” anxiety) or by a test-taker’s test-wiseness (strategies for efficient test-taking).
‭2.‬ ‭Rater Reliability‬
Inter-rater reliability is achieved when two or more scorers yield consistent scores on the same test.
Intra-rater reliability is an internal factor and a common concern for classroom teachers. It can be violated by unclear scoring criteria, fatigue, bias toward particular “good” and “bad” students, or simple carelessness. (A brief computational sketch of rater agreement follows this list of factors.)
‭3.‬ ‭Test Administration Reliability‬
Unreliability may also result from the conditions in which the test is administered (e.g., unexpected noise from outside that distracts test-takers, photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs).
‭4.‬ ‭Test Reliability‬
Test unreliability can be caused by many factors, including rater bias; subjective tests with open-ended responses are more susceptible to it than objective tests with predetermined fixed responses.
Further unreliability may be caused by poorly written test items (e.g., too many items for the time limit).
‭=> test characteristics can interact with student-related unreliability,‬
‭muddying the lines of distinction between test reliability and test‬
‭administration reliability.‬
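
To make rater reliability concrete, here is a minimal sketch (not from the chapter; the 1-5 scale and all scores are invented) of two simple ways to quantify how consistently two raters scored the same set of essays: the exact-agreement rate and the correlation between their scores.

```python
# Minimal sketch of quantifying inter-rater consistency.
# The two lists below are hypothetical ratings (1-5 scale)
# given by two raters to the same eight essays.
from statistics import correlation  # Python 3.10+

rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
rater_b = [4, 2, 5, 2, 3, 3, 5, 1]

# Exact-agreement rate: the share of essays given identical scores.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Pearson correlation: do the two raters rank test-takers similarly?
r = correlation(rater_a, rater_b)

print(f"exact agreement: {agreement:.0%}")  # 75% for these data
print(f"score correlation: {r:.2f}")        # ~0.95 for these data
```

High agreement and correlation suggest the scoring rubric is being applied consistently; low values point back to the factors above, such as unclear scoring criteria or rater bias.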
‭III. VALIDITY‬
‭●‬ ‭the extent to which inferences made from assessment results are appropriate,‬
‭meaningful, and useful in terms of the purpose of the assessment‬
A VALID TEST...
‭* measures exactly what it proposes to measure‬
‭* does not measure irrelevant or “contaminating” variables‬
* relies as much as possible on empirical evidence (performance)
‭* involves performance that samples the test’s criterion (objective)‬
‭* offers useful, meaningful information about a test-taker's ability‬
‭* is supported by a theoretical rationale or argument‬
‭1.‬ ‭Content-Related Evidence‬
● A test actually samples the subject matter about which conclusions are to be drawn and requires the test-taker to perform the behavior being measured.
‭●‬ ‭The difference between direct and indirect testing.‬
‭○‬ ‭Direct testing involves the test-taker in actually performing the‬
‭target task.‬
‭○‬ ‭In an indirect test, learners do not perform the task itself but‬
‭rather a task that is related in some way.‬
‭2.‬ ‭Criterion-Related Evidence‬
● Tests measure specified classroom objectives, and implied predetermined levels of performance are expected to be reached.
‭●‬ ‭is best demonstrated through a comparison of results of an assessment‬
‭with results of some other measure of the same criterion.‬
‭●‬ ‭Criterion-related evidence usually falls into one of two categories: (1)‬
‭concurrent and (2) predictive validity.‬
‭○‬ ‭concurrent validity - test results are supported by other‬
‭concurrent performance beyond the assessment itself‬
○ predictive validity - the assessment predicts a test-taker’s future performance, as in placement tests, admissions assessment batteries, and achievement tests designed to determine students’ readiness to “move on” to another unit
‭3.‬ ‭Construct-Related Evidence‬
‭●‬ ‭A construct is any theory, hypothesis, or model that attempts to‬
‭explain observed phenomena in our universe of perceptions.‬
‭●‬ ‭Constructs may or may not be directly or empirically measured—their‬
‭verification often requires inferential data.‬
‭●‬ ‭Proficiency and communicative competence are examples of linguistic‬
‭constructs; self-esteem and motivation are psychological constructs.‬
● Construct validity is a major issue in validating large-scale standardized tests of proficiency: such tests must adhere to the principle of practicality, must sample a limited number of domains of language, and may not be able to contain all the content of a particular field or skill.
‭4.‬ ‭Consequential Validity (Impact)‬
● Consequential validity encompasses all the consequences of a test: its accuracy in measuring intended criteria, its effect on the preparation of test-takers, and the (intended and unintended) social consequences of a test’s interpretation and use.
● The term impact is often used alongside consequential validity, perhaps more broadly encompassing the many consequences of assessment before and after a test administration.
‭5.‬ ‭Face Validity‬
● Face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.
‭●‬ ‭Test appearance does indeed have an effect that neither test-takers nor‬
‭test designers can ignore‬
‭●‬ ‭Teachers can increase a student's perception of fair tests by using:‬
‭○‬ ‭formats that are expected and well-constructed with familiar‬
‭tasks‬
‭○‬ ‭tasks that can be accomplished within an allotted time limit‬
○ items that are clear and uncomplicated
○ directions that are crystal clear
‭○‬ ‭tasks that have been rehearsed in their previous course work‬
‭○‬ ‭tasks that relate to their course work (content validity)‬
‭○‬ ‭level of difficulty that presents a reasonable challenge‬
‭The psychological state of the learner (confidence, anxiety, etc.) is an‬
‭important ingredient in peak performance.‬
‭IV. AUTHENTICITY‬
‭●‬ ‭the degree of correspondence of the characteristics of a given language test‬
‭task to the features of a target language task‬
‭AN AUTHENTIC TEST...‬
‭●‬ ‭contains language that is as natural as possible‬
‭●‬ ‭has items that are contextualized rather than isolated‬
‭●‬ ‭includes meaningful, relevant, and interesting topics‬
‭●‬ ‭provides some thematic organization to items, such as through a‬
‭story line or episode‬
‭●‬ ‭offers tasks that replicate real-world tasks‬

‭V. WASHBACK‬
‭●‬ ‭the effect of testing on teaching and learning‬
● refers to both the promotion and the inhibition of learning, thus emphasizing what may be referred to as beneficial versus harmful (or negative) washback
‭A TEST THAT PROVIDES BENEFICIAL WASHBACK ...‬
‭●‬ ‭positively influences what and how teachers teach‬
‭●‬ ‭positively influences what and how learners learn‬
‭●‬ ‭offers learners a chance to adequately prepare‬
● gives learners feedback that enhances their language development
‭●‬ ‭is more formative in nature than summative‬
‭●‬ ‭provides conditions for peak performance by the learner‬

‭●‬ t‭he effects that tests have on instruction in terms of how students prepare for‬
‭the test.‬
○ Washback can have a number of positive manifestations - from the benefits of preparing and reviewing for a test to the learning that accrues from feedback on one’s performance. => Teachers can provide information that “washes back” to students in the form of useful diagnoses of strengths and weaknesses.
‭●‬ ‭the effects of an assessment on preparation for the assessment. (Informal‬
‭performance assessment is by nature more likely to have built-in washback‬
‭effects because the teacher usually provides interactive feedback.)‬
● To enhance washback, comment generously and specifically on test performance.
● Washback can be appreciated by considering the differences between formative and summative tests: formative tests provide washback in the form of information to the learner on progress toward goals.
‭●‬ ‭Washback also implies that students have ready access to you to discuss the‬
‭feedback and evaluation you have given - students need to have a chance to‬
‭“feed back” on teachers’ feedback.‬
VI. APPLYING PRINCIPLES TO CLASSROOM TESTING
‭1.‬ ‭Are the Test Procedures Practical?‬

PRACTICALITY CHECKLIST
‭1. Are administrative details all carefully attended to before the test?‬
‭2. Can students complete the test reasonably within the set time frame?‬
‭3. Can the test be administered smoothly, without procedural “glitches”?‬
‭4. Are all printed materials accounted for?‬
‭5. Has equipment been pre-tested?‬
‭6. Is the cost of the test within budgeted limits?‬
‭7. Is the scoring/evaluation system feasible in the teacher’s time frame?‬
‭8. Are methods for reporting results determined in advance?‬
2. Is the Test Itself Reliable?
‭-‬ ‭Test and test administration reliability can be achieved by making sure that all‬
‭students receive the same quality of input, whether written or auditory.‬

TEST RELIABILITY CHECKLIST
‭1. Does every student have a cleanly photocopied test sheet?‬
‭2. Is sound amplification clearly audible to everyone in the room?‬
‭3. Is video input clearly and uniformly visible to all?‬
‭4. Are lighting, temperature, extraneous noise, and other classroom conditions equal‬
‭(and optimal) for all students?‬
‭5. For closed-ended responses, do scoring procedures leave little debate about‬
‭correctness of an answer?‬
3. Can You Ensure Rater Reliability?
‭-‬ ‭Intra-rater reliability for open-ended responses may be enhanced by answering‬
‭these questions:‬

INTRA-RATER RELIABILITY CHECKLIST


‭1. Have you established consistent criteria for correct responses?‬
‭2. Can you give uniform attention to those criteria throughout the evaluation time?‬
‭3. Can you guarantee that scoring is based only on the established criteria and not on‬
‭extraneous or subjective variables?‬
‭4. Have you read through tests at least twice to check for consistency?‬
‭5. If you have made “midstream” modifications of what you consider a correct‬
‭response, did you go back and apply the same standards to all?‬
‭6. Can you avoid fatigue by reading the tests in several sittings, especially if the time‬
‭requirement is a matter of several hours?‬
4. Does the Procedure Demonstrate Content Validity?
- Content validity: the extent to which the assessment requires students to perform tasks that were included in previous classroom lessons and that directly represent the objectives of the unit on which the assessment is based.

CONTENT VALIDITY CHECKLIST (FOR A TEST ON A UNIT)
1. Are unit objectives clearly identified?
‭2.‬ ‭Are unit objectives represented in the form of test specifications? (See below for‬
‭details on test specifications.)‬
‭3.‬ ‭Do the test specifications include tasks that have already been performed as part‬
‭of the course procedures?‬
4. Do the test specifications include tasks that represent all (or most) of the objectives for the unit?
5. Do those tasks involve actual performance of the target task(s)?
- Test specifications (specs): a test should have a structure that follows logically from the lesson or unit you are testing. Many tests have a design that:
‭+‬ ‭divides them into a number of sections (corresponding, perhaps, to the‬
‭objectives assessed)‬
‭+‬ ‭offers students a variety of item types‬
+ gives an appropriate relative weight to each section (a minimal scoring sketch follows this list)
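
As a small illustration of the last point above, here is a minimal sketch (the section names and weights are hypothetical, not from the chapter) of how the relative weights in a set of test specs can combine per-section scores into one total:

```python
# Hypothetical test specs: each section's relative weight in the total score.
SPEC_WEIGHTS = {"listening": 0.30, "grammar": 0.20, "reading": 0.30, "writing": 0.20}

def weighted_total(section_scores: dict[str, float]) -> float:
    """Combine per-section percentage scores (0-100) into one weighted total."""
    assert abs(sum(SPEC_WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(SPEC_WEIGHTS[name] * score for name, score in section_scores.items())

# Example: one student's percentage score in each section of the unit test.
print(weighted_total({"listening": 80, "grammar": 70, "reading": 90, "writing": 60}))
# -> 0.30*80 + 0.20*70 + 0.30*90 + 0.20*60 = 77.0
```

Making the weights explicit this way also documents the specs themselves: each section’s share of the total should mirror the emphasis its objectives received in the unit.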
‭5. Has the Impact of the Test Been Carefully Accounted for?‬

CONSEQUENTIAL VALIDITY CHECKLIST
‭1. Have you offered students appropriate review and preparation for the test?‬
‭2. Have you suggested test-taking strategies that will be beneficial?‬
‭3. Is the test structured so that, if possible, the best students will be modestly‬
‭challenged and the weaker students will not be overwhelmed?‬
‭4. Does the test lend itself to your giving beneficial washback?‬
‭5. Are the students encouraged to see the test as a learning experience?‬
6. Are the Test Tasks as Authentic as Possible?
‭-‬ ‭Evaluate the extent to which a test is authentic by asking the following questions:‬
AUTHENTICITY CHECKLIST
‭1. Is the language in the test as natural as possible?‬
‭2. Are items as contextualized as possible rather than isolated?‬
‭3. Are topics and situations interesting, enjoyable, and/or humorous?‬
4. Is some thematic organization provided, such as through a story line or episode?
5. Do tasks represent, or closely approximate, real-world tasks?
- Decontextualized tasks
- Contextualized tasks
‭7.‬ ‭Does the Test Offer Beneficial Washback to the Learner?‬

WASHBACK CHECKLIST
‭1. Is the test designed in such a way that you can give feedback that will be relevant to‬
‭the objectives of the unit being tested?‬
‭2. Have you given students sufficient pretest opportunities to review the subject matter‬
‭of the test?‬
‭3. In your written feedback to each student, do you include comments that will‬
‭contribute to students’ formative development?‬
4. After returning tests, do you spend class time “going over” the test and offering advice on what students should focus on in the future?
‭5. After returning tests, do you encourage questions from students?‬
‭6. If time and circumstances permit, do you offer students (especially the weaker ones)‬
‭a chance to discuss results in an office hour?‬
- Spending classroom time after the test to review its content helps students discover their areas of strength and weakness.
VII. MAXIMIZING BOTH PRACTICALITY AND WASHBACK

- building as much authenticity as possible into multiple-choice task types and items
‭-‬ ‭designing classroom tests that have both objective-scoring sections and‬
‭open-ended response sections‬
- varying the performance tasks
- turning multiple-choice test results into diagnostic feedback on areas of needed improvement
‭-‬ ‭maximizing the preparation period before a test to elicit performance relevant to‬
‭the ultimate criteria of the test‬
‭-‬ ‭teaching test-taking strategies‬
‭-‬ ‭helping students achieve learning beyond the test (don't “teach to the test”)‬
‭-‬ ‭triangulating information on a student before making a final assessment of‬
‭competence‬
