
Language Testing and Assessment

3. In an era of communicative language teaching, tests should measure up to standards of authenticity and meaningfulness. Teachers should design tests that serve as motivating learning experiences rather than anxiety-provoking threats. Tests should be positive experiences, should build a person's confidence, should become learning experiences, and should bring out the best in students; they shouldn't be degrading, artificial, or anxiety-provoking. Language assessment aims to create more authentic, intrinsically motivating assessment procedures that are appropriate for their context and are designed to offer constructive feedback to students.
4. What is a test? A test is a method of measuring a person's ability, knowledge, or performance in a given domain.
1. Method: A set of techniques, procedures, or items. To qualify as a test, the method must be explicit and structured, for example: multiple-choice questions with prescribed correct answers; a writing prompt with a scoring rubric; an oral interview based on a question script and a checklist of expected responses to be filled in by the administrator.
2. Measure: A means for offering the test-taker some kind of result. If an instrument does not specify a form of reporting measurement, that technique cannot be defined as a test. Scoring may take forms such as the following: a classroom-based short-answer essay test may earn the test-taker a letter grade accompanied by the instructor's marginal comments; large-scale standardized tests provide a total numerical score, a percentile rank, and perhaps some sub-scores.
5. 3. The test-taker (the individual): The person who takes the test. Testers need to understand who the test-takers are, what their previous experience and background is, whether the test is appropriately matched to their abilities, and how test-takers should interpret their scores.
4. Performance: A test measures performance, but the results imply the test-taker's ability or competence. Some language tests measure one's ability to perform language: to speak, write, read, or listen to a subset of language. Others measure a test-taker's knowledge about language: defining a vocabulary item, reciting a grammatical rule, or identifying a rhetorical feature in written discourse.
6. 5. Measuring a given domain: This means measuring the desired criterion and not including other factors. Proficiency tests: even though the actual performance on the test involves only a sampling of skills, the domain is overall proficiency in a language, that is, general competence in all skills of a language. Classroom-based performance tests: these have more specific criteria. For example, a test of pronunciation might well be a test of only a limited set of phonemic minimal pairs, and a vocabulary test may focus on only the set of words covered in a particular lesson. A well-constructed test is an instrument that provides an accurate measure of the test-taker's ability within a particular domain.
7. TESTING, ASSESSMENT & TEACHING
TESTING: Tests are prepared administrative procedures that occur at identifiable times in a curriculum. When tested, learners know that their performance is being measured and evaluated, so they muster all their faculties to offer peak performance. Tests are a subset of assessment: they are only one among many procedures and tasks that teachers can ultimately use to assess students. Tests are usually time-constrained (usually spanning a class period or at most several hours) and draw on a limited sample of behaviour.
ASSESSMENT: Assessment is an ongoing process that encompasses a much wider domain. A good teacher never ceases to assess students, whether those assessments are incidental or intended. Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student's performance. Assessment includes testing; it is more extended and includes many more components.
8. What about TEACHING? For optimal learning to take place, learners must have opportunities to "play" with language without being formally graded. Teaching sets up the practice games of language learning: the opportunities for learners to listen, think, take risks, set goals, and process feedback from the teacher (coach), and then recycle through the skills that they are trying to master. During these practice activities, teachers are indeed observing students' performance and making various evaluations of each learner. It can therefore be said that testing and assessment are subsets of teaching.
9. ASSESSMENT
Informal Assessment: Incidental, unplanned comments and responses. Examples include "Nice job!", "Well done!", "Good work!", "Did you say can or can't?", "Broke or break?", or putting a ☺ on some homework. Classroom tasks are designed to elicit performance without recording results and making fixed judgements about a student's competence. Examples of unrecorded assessment: marginal comments on papers, responding to a draft of an essay, advice about how to better pronounce a word, a suggestion for a strategy for compensating for a reading difficulty, and showing how to modify a student's note-taking to better remember the content of a lecture.
Formal Assessment: Exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teachers and students an appraisal of student achievement; they are the tournament games that occur periodically in the course of teaching. It can be said that all tests are formal assessments, but not all formal assessment is testing. Example 1: A student's journal or portfolio of materials can be used as a formal assessment of the attainment of certain course objectives, but it is problematic to call those two procedures "tests". Example 2: A systematic set of observations of a student's frequency of oral participation in class is certainly a formal assessment, but not a "test".
10. THE FUNCTION OF AN ASSESSMENT
Formative Assessment: Evaluating students in the process of "forming" their competencies and skills, with the goal of helping them to continue that growth process. It supports the ongoing development of the learner's language. Example: when you give students a comment or a suggestion, or call attention to an error, that feedback is offered to improve the learner's language ability. Virtually all kinds of informal assessment are formative.
Summative Assessment: It aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course. It does not necessarily point the way to future progress. Examples: final exams in a course and general proficiency exams. All tests and formal assessments (quizzes, periodic review tests, midterm exams, etc.) are summative.
11. IMPORTANT: As far as summative assessment is concerned, in the aftermath of any test students tend to think, "Whew! I'm glad that's over. Now I don't have to remember that stuff anymore!" An ideal teacher should try to change this attitude among students. A teacher should instill a more formative quality into his lessons and offer students an opportunity to convert tests into "learning experiences".
12. TESTS
Norm-Referenced Tests: Each test-taker's score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank. The purpose is to place test-takers along a mathematical continuum in rank order. Scores are usually reported back to the test-taker in the form of a numerical score (230 out of 300, 84%, etc.). Typical of these tests are standardized tests like the SAT, TOEFL, ÜDS, and KPDS. These tests are intended to be administered to large audiences, with results efficiently disseminated to test-takers. They must have fixed, predetermined responses in a format that can be scored quickly at minimum expense. Money and efficiency are primary concerns.
Criterion-Referenced Tests: These are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives. Tests that involve the students in only one class, and are connected to a curriculum, are criterion-referenced tests. Much time and effort on the part of the teacher are required to deliver useful, appropriate feedback to students. The distribution of students' scores across a continuum may be of little concern as long as the instrument assesses the appropriate objectives. With an emphasis on classroom-based testing, as opposed to standardized large-scale testing, criterion-referenced testing is of more prominent interest here than norm-referenced testing.
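The statistics a norm-referenced test reports (mean, median, standard deviation, percentile rank) can be illustrated with a short script. This is an illustrative sketch using Python's standard library; the score data are invented, not taken from any real test:

```python
import statistics

def norm_referenced_report(scores, test_taker_score):
    """Interpret one score against the whole group, as a
    norm-referenced test does."""
    mean = statistics.mean(scores)      # average score
    median = statistics.median(scores)  # middle score
    stdev = statistics.pstdev(scores)   # extent of variance in scores
    # Percentile rank: share of test-takers scoring below this one.
    below = sum(1 for s in scores if s < test_taker_score)
    percentile = 100 * below / len(scores)
    return mean, median, stdev, percentile

# Hypothetical scores for a group of ten test-takers (out of 300).
scores = [180, 195, 210, 225, 230, 240, 250, 260, 275, 290]
mean, median, stdev, percentile = norm_referenced_report(scores, 230)
print(f"mean={mean}, median={median}, percentile rank={percentile}%")
# → mean=235.5, median=235.0, percentile rank=40.0%
```

A score report like "230 out of 300, 40th percentile" is exactly the kind of rank-order interpretation described above.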
13. Approaches to Language Testing: A Brief History
Historically, language-testing trends have followed the trends of teaching methods. During the 1950s, an era of behaviourism and special attention to contrastive analysis, testing focused on specific language elements such as the phonological, grammatical, and lexical contrasts between two languages. During the 1970s and 80s, communicative theories were widely accepted, bringing a more integrative view of testing. Today, test designers are trying to form authentic, valid instruments that simulate real-world interaction.
14. APPROACHES TO LANGUAGE TESTING
A) Discrete-Point Testing: Language can be broken down into its component parts, and those parts can be tested successfully. The component parts are listening, speaking, reading, and writing; the units of language (discrete points) are phonology, graphology, morphology, lexicon, syntax, and discourse. A language proficiency test should sample all four skills and as many linguistic discrete points as possible.
B) Integrative Testing: Language competence is a unified set of interacting abilities that cannot be tested separately. In the face of evidence that, in one study, each student scored differently in various skills depending on his background, country, and major field, Oller later admitted that the "unitary trait hypothesis was wrong."


15. Integrative Testing (continued): The unitary trait hypothesis suggested an "indivisible" view of language proficiency: that vocabulary, grammar, phonology, the "four skills," and other discrete points of language could not be disentangled. Two examples of integrative tests are the cloze test and dictation.
Cloze Test: Cloze test results are good measures of overall proficiency. The ability to supply appropriate words in blanks requires a number of abilities that lie at the heart of competence in a language: knowledge of vocabulary, grammatical structure, discourse structure, reading skills and strategies, and expectancy rules to aid the short-term memory. It was argued that successful completion of cloze items taps into all of those abilities, which were said to be the essence of global language proficiency.
Dictation: Essentially, learners listen to a passage of 100 to 150 words read aloud by an administrator (or audiotape) and write what they hear, using correct spelling. Supporters argue that dictation is an integrative test because success on a dictation requires careful listening, reproduction in writing of what is heard, an efficient short-term memory and, to an extent, knowledge of grammar and discourse structure.

16.-17. c) Communicative Language Testing (a recent approach, after the mid-1980s): What does it criticise? Integrative tests such as cloze only tell us about a candidate's linguistic competence; they do not tell us anything directly about a student's performance ability (knowledge about a language, not the use of language). Any suggestion? A quest for authenticity: in order for a particular language test to be useful for its intended purposes, test performance must correspond in demonstrable ways to language use in non-test situations. Communicative competence is global and requires such integration that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. The supporters emphasized the importance of strategic competence (the ability to employ communicative strategies to compensate for breakdowns as well as to enhance the rhetorical effect of utterances) in the process of communication. Communicative testing presented challenges to test designers, who began to identify the real-world tasks that language learners were called upon to perform. It was clear that the contexts for those tasks were extraordinarily varied and that the sampling of tasks for any one assessment procedure needed to be validated by what language users actually do with language. As a result, the assessment field became more and more concerned with the authenticity of tasks and the genuineness of texts.

18. d) Performance-Based Assessment: Performance-based assessment of language typically involves oral production, written production, open-ended responses, integrated performance (across skill areas), group performance, and other interactive tasks. Any problems? It is time-consuming and expensive, but those extra efforts are paying off in more direct testing, because students are assessed as they perform actual or simulated real-world tasks. The advantage of this approach? Higher content validity is achieved because learners are measured in the process of performing the targeted linguistic acts. Importantly, performance-based assessment means that teachers should rely a little less on formally structured tests and a little more on evaluation while students are performing various tasks. In performance-based assessment, interactive tests (speaking, requesting, responding, and other interactive tasks) are IN ☺ and paper-and-pencil tests are OUT. The result: test tasks can approach the authenticity of real-life language use.

19. CURRENT ISSUES IN CLASSROOM TESTING: The design of communicative, performance-based assessment continues to challenge both assessment experts and classroom teachers. Three issues are helping to shape our current understanding of effective assessment: the effect of new theories of intelligence on the testing industry; the advent of what has come to be called "alternative assessment"; and the increasing popularity of computer-based testing.
New Views on Intelligence: In the past, intelligence was viewed strictly as the ability to perform linguistic and logical-mathematical problem solving. More recently, spatial intelligence, musical intelligence, bodily-kinesthetic intelligence, interpersonal intelligence, intrapersonal intelligence, and EQ (Emotional Quotient) have underscored the role of emotions in our cognitive processing: those who manage their emotions tend to be more capable of fully intelligent processing, because anger, grief, resentment, and other feelings can easily impair peak performance in everyday tasks as well as higher-order problem solving. For many years we lived in a world of standardized, norm-referenced tests that were timed, in a multiple-choice format, consisting of a multiplicity of logic-constrained items, many of which were inauthentic; we were relying on timed, discrete-point, analytical tests in measuring language, and we were forced to stay within the limits of objectivity and give impersonal responses. These conceptualizations of intelligence infused the 1990s with a sense of both freedom and responsibility in our testing agenda.
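A fixed-ratio cloze passage, one in which every nth word is deleted, is simple to generate mechanically. A minimal sketch; the passage, the deletion ratio, and the function name are illustrative choices, not prescribed by the text:

```python
def make_cloze(text, n=7, start=2):
    """Delete every nth word (a fixed-ratio cloze), keeping the
    deleted words as the answer key."""
    words = text.split()
    answers = []
    for i in range(start, len(words), n):
        answers.append(words[i])
        words[i] = "____"
    return " ".join(words), answers

passage = ("The ability to supply appropriate words in blanks requires "
           "knowledge of vocabulary and grammatical structure")
cloze, key = make_cloze(passage, n=5)
print(cloze)
print(key)  # → ['to', 'blanks', 'and']
```

Scoring such a test then amounts to comparing the test-taker's supplied words against the answer key (exact-word or acceptable-word scoring).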

20. Traditional and "Alternative" Assessment: It is difficult to draw a clear line of distinction between traditional and alternative assessment. Many forms of assessment fall in between the two, and some combine the best of both.
Traditional assessment: one-shot, standardized exams; timed, multiple-choice format; decontextualized test items; scores that suffice for feedback; norm-referenced scores; focus on the "right" answer; summative; oriented to product; non-interactive process; fosters extrinsic motivation.
Alternative assessment: continuous long-term assessment; untimed, free-response format; contextualized communicative tests; individualized feedback and washback; criterion-referenced scores; open-ended, creative answers; formative; oriented to process; interactive process; fosters intrinsic motivation.

21. IMPORTANT: More time and higher institutional budgets are required to administer and score assessments that presuppose more subjective evaluation, more individualization, and more interaction in the process of offering feedback; to use them, we must place some trust in our subjectivity and intuition. But the payoff of "alternative" assessment comes with more useful feedback to students, the potential for intrinsic motivation, and ultimately a more complete description of a student's ability.

22. Computer-Based Testing: Some computer-based tests are small-scale, "homegrown" quizzes; others are standardized, large-scale tests (e.g., TOEFL) in which thousands of test-takers are involved. A type of computer-based test, the Computer-Adaptive Test (CAT), is also available. In a CAT, the test-taker sees only one question at a time, and the computer scores each question before selecting the next one. Test-takers cannot skip questions and, once they have entered and confirmed their answers, they cannot return to them.

23. Advantages of computer-based testing: classroom-based testing; self-directed testing on various aspects of a language (vocabulary, grammar, discourse, etc.); practice for upcoming high-stakes standardized tests; some individualization, in the case of CATs; electronic scoring for rapid reporting of results.
Disadvantages of computer-based testing: lack of security and the possibility of cheating in unsupervised computerized tests; homegrown quizzes may be mistaken for validated assessments; open-ended responses are less likely to appear because of the need for human scorers; the human interactive element is absent.

24. An overall summary: Assessment is an integral part of the teaching-learning cycle; in an interactive, communicative curriculum, assessment is almost constant. Tests, a subset of assessment, can provide authenticity, motivation, and feedback to the learner, and they are essential components of a successful curriculum. Periodic assessments can increase motivation as milestones of student progress; spur learners to set goals for themselves; aid in evaluating teaching effectiveness; aid in the reinforcement and retention of information; confirm areas of strength and pinpoint areas needing further work; provide a sense of periodic closure to modules within a curriculum; and promote student autonomy by encouraging self-evaluation of progress.

25. Decide whether the following statements are TRUE or FALSE.
- It is possible to create authentic and motivating assessment that offers constructive feedback to students. (TRUE)
- Assessment and testing are synonymous terms. (FALSE: they are not.)
- Performance-based tests measure test-takers' knowledge about language. (FALSE: they are designed to test the actual use of language, not knowledge about language.)
- Tests are the best tools to assess students. (FALSE: we cannot say they are the best; they are one of many useful devices for assessing students.)
- Incidental, unplanned comments and responses to students are an example of formal assessment. (FALSE: they are informal assessment.)
- Most of our classroom assessment is summative assessment. (FALSE: it is formative assessment.)
- Formative assessment always points toward the future formation of learning. (TRUE)
- All tests should offer the test-takers some kind of measurement or result. (TRUE)
- The distribution of students' scores across a continuum is a concern in norm-referenced tests. (TRUE)
- Criterion-referenced testing has more instructional value than norm-referenced testing for classroom teachers. (TRUE)
- Tests are essential components of a successful curriculum and learning process. (TRUE)

CHAPTER 2: PRINCIPLES OF LANGUAGE ASSESSMENT

26. There are five testing criteria for "testing a test": 1. Practicality, 2. Reliability, 3. Validity, 4. Authenticity, 5. Washback.
PRACTICALITY: A practical test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient.
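The computer-adaptive procedure (one question at a time, the next item chosen from the previous answer, no returning) can be sketched as a simple up/down item selector. This is a toy illustration under invented difficulty levels, not the algorithm of any real CAT, which relies on item-response theory:

```python
def run_cat(item_bank, answers_correct, start_level=2):
    """Pick each next item one difficulty level harder after a correct
    answer and one level easier after a wrong one (toy adaptive rule)."""
    lo, hi = min(item_bank), max(item_bank)
    level = start_level
    administered = []  # difficulty levels presented, in order
    for correct in answers_correct:
        administered.append(level)
        level = min(level + 1, hi) if correct else max(level - 1, lo)
    return administered

# Hypothetical item bank with difficulty levels 1 (easiest) to 5 (hardest);
# the simulated right/wrong answer pattern is invented for illustration.
bank = {1: "item A", 2: "item B", 3: "item C", 4: "item D", 5: "item E"}
print(run_cat(bank, [True, True, False, True]))  # → [2, 3, 4, 3]
```

Because the next item depends on the answer just confirmed, skipping or revisiting questions would break the selection logic, which is why real CATs forbid both.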

For a test to be practical, the following conditions should hold: the cost of the test should be within budgeted limits; students should be able to complete the test reasonably within the set time frame; the test should be able to be administered smoothly, with procedures and administrative details clearly established before the test; all materials and equipment should be ready; the scoring/evaluation system should be feasible within the teacher's time frame; and methods for reporting results should be determined in advance.

27. RELIABILITY: A reliable test is consistent and dependable. The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to its unreliability. Consider fluctuations in the student (student-related reliability), in scoring (rater reliability), in test administration (test administration reliability), and in the test itself (test reliability).

28. Student-Related Reliability: Temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors may make an "observed" score deviate from one's "true" score. A test-taker's "test-wiseness," or strategies for efficient test taking, can also be included in this category.

29. Rater Reliability: Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. Intra-rater unreliability results from unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. One solution to intra-rater unreliability is to read through about half of the tests before rendering any final scores or grades, then recycle back through the whole set of tests to ensure an even-handed judgment. The careful specification of an analytical scoring instrument can increase rater reliability.

30. Test Administration Reliability: Unreliability may also result from the conditions in which the test is administered: street noise, photocopying variations, poor light, temperature, the condition of desks and chairs, and the like.

31. Test Reliability: Sometimes the nature of the test itself can cause measurement errors. Timed tests may discriminate against students who do not perform well under a time limit, and poorly written test items may be a further source of test unreliability.

VALIDITY: How is the validity of a test established? There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in its support. In some cases, it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested (content-related evidence, often popularly referred to as content validity). In other cases, we may be concerned with how well a test determines whether or not students have reached an established set of goals or level of competence. In still other cases, it could be appropriate to study statistical correlation with other related but independent measures. Other concerns about a test's validity may focus on the consequences of a test, beyond measuring the criteria themselves, or even on the test-taker's perception of validity. We will look at these five types of evidence below.

1. Content Validity: The extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons. If a test requires the test-taker to perform the behaviour that is being measured, content validity has probably been achieved. For example, if you assess a person's ability to speak the target language by asking students to answer paper-and-pencil multiple-choice questions requiring grammatical judgements, you do not achieve content validity. To understand content validity, consider the difference between direct and indirect testing: direct testing involves the test-taker in actually performing the target task, while indirect testing involves performing a task that is related to the target task but is not the target task itself. Direct testing is the most feasible way to achieve content validity in assessment. The first measure of an effective classroom test is the identification of objectives: classroom objectives should be identified and appropriately framed, and lesson objectives should be represented in the form of test specifications. A test should have a structure that follows logically from the lesson or unit you are testing. If you clearly perceive the performance of test-takers as reflective of the classroom objectives, then you can argue that content validity has been achieved.

2. Criterion-related Validity: This examines the extent to which the criterion of the test has actually been achieved. For example, a classroom test designed to assess a point of grammar in communicative use will have criterion validity if the test scores are corroborated either by observed subsequent behavior or by other communicative measures of the grammar point in question.

Criterion-related evidence usually falls into one of two categories:
Concurrent validity: A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language.
Predictive validity: The assessment criterion in such cases is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success. The predictive validity of an assessment becomes important in the case of placement tests, language aptitude tests, and the like.

3. Construct Validity: Every issue in language learning and teaching involves theoretical constructs, and construct validity asks, "Does the test actually tap into the theoretical construct as it has been identified?" That is, does the test, from the learner's perspective, test what it is designed to test? Imagine that you have been given a procedure for conducting an oral interview. The scoring analysis for the interview includes several factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in a theoretical construct that claims them to be major components of oral proficiency. So if you were asked to conduct an oral proficiency interview that evaluated only pronunciation and grammar, you could be justifiably suspicious about the construct validity of that test. In large-scale standardized tests such as the TOEFL, construct validity can appear as a major obstacle: for reasons of practicality (that is, both time and economy), such tests have not measured language skills like oral production directly, even though oral production is part of the construct that should be measured.

4. Consequential Validity: Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use. McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions, such as opportunities for coaching (private lessons, special attention) that only some families can afford, or the help that children with more highly educated parents get from their parents. Teachers should consider the effect of assessments on students' motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work.

5. Face Validity: The degree to which a test "looks right," and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the test-takers. Face validity means that the students perceive the test to be valid. It is not something that can be empirically tested by a teacher or even by a testing expert; it depends on the subjective evaluation of the test-taker. Students will generally judge a test to be face valid if directions are clear, the structure of the test is organized logically, its difficulty level is appropriately pitched, the test has no "surprises," and timing is appropriate. Content validity is a very important ingredient in achieving face validity: if a test samples the actual content of what the learner has achieved or expects to achieve, face validity will be more likely to be perceived. To give an assessment procedure that is "biased for best," a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed. A classroom test is not the time to introduce new tasks.

AUTHENTICITY: In an authentic test, the language is as natural as possible; items are as contextualized as possible; topics and situations are interesting, enjoyable, and/or humorous; some thematic organization, such as through a story line or episode, is provided; and tasks represent real-world tasks. More and more tests offer items that are "episodic," in that they are sequenced to form meaningful units, paragraphs, or stories. Reading passages are selected from real-world sources that test-takers are likely to have encountered or will encounter. Listening comprehension sections feature natural language with hesitations, white noise, and interruptions.

WASHBACK: Washback includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects, because the teacher is usually providing interactive feedback.
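An analytic oral-interview score of the kind described, where pronunciation, fluency, vocabulary use, grammatical accuracy, and sociolinguistic appropriateness are combined into one final score, can be computed as a weighted sum. The weights, the 0-5 rating scale, and the sample ratings below are all hypothetical, not taken from any published rubric:

```python
def analytic_score(ratings, weights):
    """Combine per-factor ratings (here on a 0-5 scale) into one
    weighted final score; the weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(ratings[factor] * weights[factor] for factor in weights)

# Hypothetical equal weighting of the five oral-proficiency factors.
weights = {"pronunciation": 0.2, "fluency": 0.2, "vocabulary": 0.2,
           "grammar": 0.2, "sociolinguistic": 0.2}
ratings = {"pronunciation": 4, "fluency": 3, "vocabulary": 5,
           "grammar": 4, "sociolinguistic": 4}
print(round(analytic_score(ratings, weights), 2))  # → 4.0
```

Making the factors and weights explicit like this is exactly the "careful specification of an analytical scoring instrument" that improves rater reliability, and the choice of factors is where the construct-validity argument lives.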

Formal tests can also have positive washback, but they provide no washback if the students receive only a simple letter grade or a single overall numerical score. Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment. Washback also implies that students have ready access to the teacher to discuss the feedback and evaluation he or she has given. Incorrect responses can become windows of insight into further work, especially when they represent accomplishments in a student's interlanguage, and correct responses need to be praised. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.

A note on face validity: if the test-takers think they are given enough time to do the test, find the directions clear, do not find the items needlessly complicated, and think that the questions are appropriate, then the test has face validity for them.

37. What is washback?
- In general terms: the effect of testing on teaching and learning.
- In large-scale assessment: the effects that the tests have on instruction in terms of how students prepare for the test.
- In classroom assessment: the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Diagnosing students' strengths and weaknesses is thus a facet of washback.

What does washback enhance?
- Intrinsic motivation
- Language ego
- Autonomy
- Interlanguage
- Self-confidence
- Strategic investment

38. What should teachers do to enhance washback?
- Comment generously and specifically on test performance
- Respond to as many details as possible
- Praise strengths
- Criticize weaknesses constructively
- Give strategic hints on how performance can be improved

39. Decide whether the following statements are TRUE or FALSE.
1. Diagnosing strengths and weaknesses of students in language learning is a facet of washback. (TRUE)
2. Face validity can be tested empirically. (FALSE)
3. One way of achieving authenticity in testing is to use simplified language. (FALSE)
4. In indirect tests, students do not actually perform the behavior being measured. (TRUE)
5. Washback implies that students have ready access to the teacher to discuss the feedback and evaluation he or she has given. (TRUE)
6. An expensive test is not practical. (TRUE)
7. One of the sources of the unreliability of a test is the school. (FALSE)
8. Tests should serve as learning devices through which washback is achieved. (TRUE)

40. Decide which type of validity each sentence relates to.
1. It is based on subjective judgment. (Face)
2. The test covers the objectives of the course. (Content)
3. It assesses a test-taker's likelihood of future success. (Criterion-related)
4. It measures whether the test meets the classroom objectives. (Content)
5. It includes the consideration of the test's effect on the learner. (Consequential)
6. It questions the accuracy of measuring the intended criteria. (Consequential)
7. It requires the test-taker to perform the behavior being measured. (Content)
8. It requires the test to be based on a theoretical background. (Construct)
9. It appears to measure the knowledge and abilities it claims to measure. (Face)
10. A high score on a final exam is substantiated by the student's actual proficiency. (Criterion-related)

41. Decide which type of reliability each sentence relates to.
1. There is a lot of noise outside the building. (Test administration reliability)
2. The tape is of bad quality. (Test administration reliability)
3. The room is dark. (Test administration reliability)
4. The students' psychological mood may affect it negatively or positively. (Student-related reliability)
5. The student is anxious. (Student-related reliability)
6. The student has had an argument with the teacher. (Student-related reliability)
7. The scorers interpret the criteria differently. (Rater reliability)
8. The teacher is tired but continues scoring. (Rater reliability)
9. The test is too long. (Test reliability)
10. There are ambiguous items. (Test reliability)

CHAPTER 3 DESIGNING CLASSROOM LANGUAGE TESTS

42. In this chapter we examine test types and learn how to design tests and revise existing ones. To start the process of designing tests, we ask some critical questions. Five questions should form the basis of your approach to designing tests for your class:

Question 1: What is the purpose of the test? Why am I creating this test? What specifically am I trying to find out?
- For an evaluation of overall proficiency? (Proficiency test)
- To place students into a course? (Placement test)
- To measure achievement within a course? (Achievement test)
Once you have established the major purpose of a test, you can determine its objectives.

Question 2: What are the objectives of the test? What language abilities are to be assessed?

Question 3: How will the test specifications reflect both the purpose and the objectives? The objectives should be incorporated into a structure that appropriately weights the various competencies being assessed.

Question 4: How will the test tasks be selected and the separate items arranged?

Question 5: What kind of scoring, grading, and/or feedback is expected? Tests vary in the form and function of feedback, depending on their purpose.

43. TEST TYPES
Defining your purpose will help you choose the right kind of test, and it will also help you focus on the specific objectives of the test. Below are the test types to be examined:
1. Language aptitude tests
2. Proficiency tests
3. Placement tests
4. Diagnostic tests
5. Achievement tests

44. Language Aptitude Tests
An aptitude test is designed to measure a person's capacity or general ability to learn a foreign language; it predicts success prior to exposure to the second language, and it is designed to apply to the classroom learning of any language. Two standardized aptitude tests have been used in the US:
- The Modern Language Aptitude Test (MLAT)
- The Pimsleur Language Aptitude Battery (PLAB)
Tasks in the MLAT include number learning, phonetic script, spelling clues, words in sentences, and paired associates. There is no unequivocal evidence that language aptitude tests predict communicative success in a language. Any test that claims to predict success in learning a language is undoubtedly flawed, because we now know that, with appropriate self-knowledge, active strategic involvement in learning, and strategic investment, everyone can succeed eventually.

45. Proficiency Tests
A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests are almost always summative and norm-referenced. Their role is gatekeeping: to accept or to deny someone's passage into the next stage of a journey. They are usually not equipped to provide diagnostic feedback. The TOEFL is a typical standardized proficiency test. Creating and validating proficiency tests with research is a time-consuming, costly process; choosing one of a number of commercially available proficiency tests is a far more practical method for classroom teachers.

46. Placement Tests
The objective of a placement test is to place students correctly into a course or level. A placement test usually includes a sampling of the material to be covered in the various courses in a curriculum; students should find the test neither too easy nor too difficult, but appropriately challenging. Certain proficiency tests can act in the role of placement tests. Example: the ESL Placement Test (ESLPT) at San Francisco State University has three parts:
- Part 1: students read a short article and then write a summary essay.
- Part 2: students write a composition in response to an article.
- Part 3: a multiple-choice editing task: students read an essay and identify the grammar errors in it.
The ESLPT is authentic but less practical, because human evaluators are required for the first two parts. Reliability problems are present but are mitigated by conscientious training of the evaluators. What is lost in practicality and reliability is gained in the diagnostic information that the ESLPT provides.

47. Diagnostic Tests
A diagnostic test is designed to diagnose specified aspects of a language. A test of pronunciation, for example, diagnoses the phonological features that are difficult for students and should become part of a curriculum; this information helps teachers make decisions about which aspects of English phonology to address. A diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation: test-takers are tape-recorded while reading a 150-word passage, and the administrator then refers to an inventory of phonological items to analyze the learner's production. After multiple listenings, the administrator produces a checklist of errors in five categories:
1. stress and rhythm
2. intonation
3. vowels
4. consonants
5. other factors
A writing diagnostic elicits a writing sample from students that allows the teacher to identify the rhetorical and linguistic features on which the course needs to focus special attention. A diagnostic test can help students become aware of their errors and encourage the adoption of appropriate compensatory strategies. There is a fine line of difference between a diagnostic test and an achievement test: an achievement test analyzes past learning, whereas a diagnostic test should elicit information on what students need to work on in the future.

48. Achievement Tests
An achievement test is related directly to lessons, units, or even a total curriculum. Achievement tests should be limited to the particular material addressed in a curriculum within a particular time frame, and should be offered after a course has focused on the objectives in question. The primary role of an achievement test is to determine whether the course objectives have been met and the appropriate knowledge and skills acquired; achievement tests analyze the extent to which students have acquired language features that have already been taught. They are often summative, because they are administered at the end of a unit or term, but effective achievement tests can also provide useful washback by showing students their errors and helping them analyze their strengths and weaknesses. Achievement tests range from five- or ten-minute quizzes to three-hour final examinations. For every test, the way results are reported is an important consideration: under some circumstances a letter grade or a holistic score may be appropriate; other circumstances may require that a teacher offer substantive washback to the learner. Achievement tests should also:
- achieve content validity by presenting tasks that mirror those of the course being assessed
- strive for authenticity, with a progression of tasks biased for best performance
- be practical
- be evaluated reliably by the teacher or scorer

49. Practical steps in constructing classroom tests:

A) Assessing Clear, Unambiguous Objectives
Your first task in designing a test is to determine appropriate objectives. Before giving a test, examine the objectives for the unit you are testing. An example objective: "Students will recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations."

B) Drawing Up Test Specifications
Test specifications will simply comprise:
a) a broad outline of the test
b) what skills you will test
c) what the items will look like
A set of test specifications can be drawn up from the objective stated above.

50. C) Devising Test Tasks
Consider how students will perceive the tasks (face validity), the extent to which authentic language and contexts are present, and the potential difficulty caused by cultural schemata. In revising your draft, ask yourself some important questions:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple-choice item have appropriate distractors; that is, are the wrong items clearly wrong and yet sufficiently "alluring" that they aren't ridiculously easy?
6. Is the difficulty of each item appropriate for your students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?
In the final revision of your test, imagine students taking it: time yourself to see whether the test should be shortened or lengthened, make the necessary adjustments, make sure the test is neat and uncluttered on the page, and, if there is an audio component, make sure the script is clear.

51. D) Designing Multiple-Choice Test Items
Two principles support multiple-choice formats: practicality and reliability. However, there are a number of weaknesses in multiple-choice items:
- The technique tests only recognition knowledge.
- Guessing may have a considerable effect on test scores.
- The technique severely restricts what can be tested.
- It is very difficult to write successful items.
- Washback may be harmful.
- Cheating may be facilitated.

Some important terms: multiple-choice items are receptive, or selective; the test-taker chooses from a set of responses rather than creating a response. Other receptive item types include true-false questions and matching lists. Every multiple-choice item has a stem, which presents several options or alternatives to choose from. One of those options, the key, is the correct response; the others serve as distractors.

52. IMPORTANT: Consider the following four guidelines for designing multiple-choice items for both classroom-based and large-scale situations:
1. Design each item to measure a specific objective. (An item that tests, say, both modal knowledge and article knowledge at the same time violates this guideline. With only a minimum of context in each stem, a wide variety of responses may be perceived as correct; eliminating unintended possible answers is often the most difficult problem of designing multiple-choice items.)
2. State both stem and options as simply and directly as possible. Do not use superfluous words; a related rule of succinctness is to remove needless redundancy from your options.
3. Make certain that the intended answer is clearly the only correct one.
4. Use item indices to accept, discard, or revise items. The appropriate selection and arrangement of suitable multiple-choice items on a test can best be accomplished by measuring items against three indices: a) item facility (IF), b) item discrimination (ID), and c) distractor analysis.

53. a) Item facility (IF), or item difficulty, is the extent to which an item is easy or difficult for the proposed group of test-takers: the number of students answering the item correctly divided by the total number of students. For example, if 13 out of 20 students answer an item correctly, IF = 13/20 = 0.65 (65%). Two good reasons for including a very easy item (an IF of 0.85 or higher) are to build in some affective feelings of "success" among lower-ability students and to serve as warm-up items; very difficult items can provide a challenge to the highest-ability students.

b) Item discrimination (ID), or item differentiation, is the extent to which an item differentiates between high- and low-ability test-takers. An item on which high-ability students and low-ability students score equally well has poor ID.
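Both indices above are simple proportions, so they are easy to compute for a classroom test. Below is a minimal Python sketch (the function names are illustrative, not from the notes) that reproduces the two worked figures used in this chapter: 13 correct answers out of 20 students, and 7 versus 2 correct answers in the top and bottom groups of 10.

```python
# Item facility (IF) and item discrimination (ID) as defined in the notes.
# The counts below are the worked examples from the text; the function
# names are illustrative.

def item_facility(num_correct, num_test_takers):
    """IF = proportion of test-takers who answered the item correctly."""
    return num_correct / num_test_takers

def item_discrimination(high_correct, low_correct, group_size):
    """ID = (correct in high group - correct in low group) / group size."""
    return (high_correct - low_correct) / group_size

# 13 of 20 students answer the item correctly -> moderately easy item:
print(item_facility(13, 20))          # 0.65

# Top 10 vs. bottom 10 of a 30-student class, 7 vs. 2 correct answers:
print(item_discrimination(7, 2, 10))  # 0.5
```

An IF of 0.85 or above marks a very easy item, and an ID near zero marks an item that is usually worth discarding, as discussed above.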

54. An item on which high-ability and low-ability students score equally well has poor ID, because it does not discriminate between the two groups. Example: a class of 30 students takes a test. Rank the students from the highest score to the lowest, and compare the top 10 with the bottom 10 on one item:

  Item #                            Correct  Incorrect
  High-ability students (top 10)       7        3
  Low-ability students (bottom 10)     2        8

  ID = (7 - 2) / 10 = 0.50

The result tells us that the item has a moderate level of ID. A high discrimination index would approach 1.0, and an item with no discriminating power at all would score 0. No absolute rule governs the establishment of acceptable and unacceptable ID indices, but in most cases you would want to discard an item that scored near zero. An item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.

c) Distractor efficiency (DE) is the extent to which the distractors "lure" a sufficient number of test-takers, especially lower-ability ones, with those responses somewhat evenly distributed across all distractors. Example (C is the correct response):

  Choices                       A   B   C*  D   E
  High-ability students (10)    0   1   7   0   2
  Low-ability students (10)     3   5   2   0   0

The item might be improved in two ways:
a) Distractor D doesn't fool anyone, and therefore it probably has no utility; a revision might provide a distractor that actually attracts a response or two.
b) Distractor E attracts more responses (2) from the high-ability group than from the low-ability group (0). Why are good students choosing this one? Perhaps it includes a subtle reference that entices the high group but goes "over the head" of the low group.
The other two distractors (A and B) seem to be fulfilling their function of attracting some attention from the lower-ability students.

55. Decide whether the following statements are TRUE or FALSE.
1. Multiple-choice items are receptive. (TRUE)
2. It is very easy to develop multiple-choice tests. (FALSE: it seems easy, but it's not that simple.)
3. The first task in designing a test is to determine the test specifications. (FALSE: the first task is to determine appropriate objectives.)
4. Multiple-choice tests are practical but not reliable. (FALSE: they can be both practical and reliable.)
5. Multiple-choice tests are time-saving in terms of scoring and grading. (TRUE)
6. The stem of a multiple-choice item should be as long as possible in order to help students understand the context. (FALSE: it should be short and to the point.)
7. Each multiple-choice item in a test should measure a specific objective. (TRUE)
8. The item discrimination index differentiates between high- and low-ability students. (TRUE)
9. If the item facility value is 0.10, it means the item is very easy. (FALSE: an item with an IF value of 0.10 is a very difficult one.)

56. SCORING, GRADING AND GIVING FEEDBACK

A) Scoring
As you design a test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section: the more importance you attach to a skill, the more points that skill should receive. For example, the four skills might be weighted as Oral production 30%, Listening 30%, Reading 20%, and Writing 20%.

57. B) Grading
Grading doesn't mean just assigning a letter grade ("A" for 90-100, and so on); it's not that simple. How you assign letter grades is a product of:
- the country, culture, and context of the class
- institutional expectations (most of them unwritten)
- explicit and implicit definitions of grades that you have set forth
- the relationship you have established with the class
- expectations that have been engendered in previous tests and quizzes in the class

58. C) Giving Feedback
Feedback should become beneficial washback. Some examples of feedback:
1. a letter grade
2. a total score
3. four subscores (speaking, listening, reading, writing)
4. for the listening and reading sections: an indication of correct/incorrect responses
5. for the oral interview: a) scores for each element being rated, b) a checklist of areas needing work, c) oral feedback after the interview, d) a post-interview conference to go over the results
6. for the essay: a) scores for each element being rated, b) a checklist of areas needing work, c) marginal and end-of-essay comments, d) suggestions, e) a post-test conference to go over the work, f) a self-assessment
7. on all or selected parts of the test: a whole-class discussion of the results
8. individual conferences with each student to review the whole test
9. peer checking of results

59. Decide whether the following statements are TRUE or FALSE.
1. A language aptitude test measures a learner's likely future success in learning a foreign language. (TRUE)
2. Language aptitude tests are very common today. (FALSE)
3. A proficiency test is limited to a particular course or curriculum. (FALSE)
4. A five-minute quiz can be an achievement test. (TRUE)
5. The aim of a placement test is to place a student into a particular level. (TRUE)
6. Any placement test can be used in any teaching program. (FALSE: not all placement tests suit every teaching program.)
7. Achievement tests are related to classroom lessons, units, or curricula. (TRUE)
8. Placement tests have many varieties. (TRUE)
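The distractor table above can also be checked mechanically. The sketch below is mine, not a procedure from the notes: the function name and the flagging rules are illustrative, encoding the two improvements discussed (a distractor that fools nobody, and one that lures the high group more than the low group).

```python
# Distractor analysis for the worked example (key = C). For each distractor
# we count how many high- and low-ability students chose it and flag the
# two problem cases discussed in the notes.
from collections import Counter

def distractor_report(high_responses, low_responses, key, options="ABCDE"):
    high, low = Counter(high_responses), Counter(low_responses)
    report = {}
    for opt in options:
        if opt == key:
            continue
        h, l = high[opt], low[opt]
        if h == 0 and l == 0:
            note = "fools nobody - replace it"         # like distractor D
        elif h > l:
            note = "lures the high group - revise it"  # like distractor E
        else:
            note = "working as intended"
        report[opt] = (h, l, note)
    return report

# Hypothetical answer sheets matching the table (10 students per group):
high = list("BCCCCCCCEE")   # A:0  B:1  C:7  D:0  E:2
low  = list("AAABBBBBCC")   # A:3  B:5  C:2  D:0  E:0
for opt, (h, l, note) in distractor_report(high, low, key="C").items():
    print(opt, h, l, note)
```

Passing the full option string rather than only the observed answers matters here: a distractor nobody chose (D) would otherwise never be flagged.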

CHAPTER 4 STANDARDIZED TESTING

61. WHAT IS STANDARDIZATION?
A standardized test presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another. Standardized tests measure a broad band of competencies ("general ability or proficiency") rather than one particular curriculum; they are norm-referenced, and their main goal is to place test-takers on a rank order. In general, standardized test items are multiple-choice, but MC is not the only item type: human-scored tests of oral and written production are also involved. Before administration, items are piloted and scientifically selected to meet difficulty specifications within each subsection. Standardized tests are expected to be both valid and practical. Well-known examples:
- Scholastic Aptitude Test (SAT): a college entrance exam
- Graduate Record Exam (GRE): a test for entry into many graduate school programs
- Graduate Management Admission Test (GMAT) and Law School Aptitude Test (LSAT): tests that specialize in particular disciplines
- Test of English as a Foreign Language (TOEFL) and International English Language Testing System (IELTS): standardized tests of English proficiency
These tests are standardized because they specify a set of competencies for a given domain and, through a process of construct validation, they program a set of tasks and a means for determining correct and incorrect responses.

62. ADVANTAGES AND DISADVANTAGES OF STANDARDIZED TESTS
Advantages:
- They are ready-made: teachers don't need to spend time preparing them.
- They can be administered to a large number of students within a time constraint.
- They are easy to score, thanks to the multiple-choice format (computerized or hole-punched grid scoring).
- They have face validity.
Disadvantages:
- Inappropriate use of the tests.
- Misunderstanding of the difference between direct and indirect testing.

63. DEVELOPING A STANDARDIZED TEST
Knowing how to develop a standardized test can help you revise an existing test, select, adapt, or expand an existing test, or create a smaller-scale standardized test of your own. Three tests will be followed through the development steps below:
(A) The Test of English as a Foreign Language (TOEFL)
(B) The English as a Second Language Placement Test (ESLPT) at San Francisco State University (SFSU)
(C) The Graduate Essay Test (GET)

64. Step 1: Determine the purpose and objectives of the test.
- TOEFL: to evaluate the English proficiency of people whose native language is not English. Colleges and universities in the US use TOEFL scores to admit or refuse international applicants for admission.
- ESLPT: to place already admitted SFSU students into appropriate courses in academic writing and oral production, and to provide teachers with some diagnostic information about their students.
- GET: a "gate-keeping" essay test offered at the beginning of each term, to determine whether students' writing ability is sufficient to permit them to enter graduate-level courses in their programs.

65. Step 2: Design test specifications.
In general, specifications spell out what each section of a test measures: an oral production section may test fluency and pronunciation through imitation; a listening section focuses on a particular feature of language or on overall listening comprehension; a reading section tests comprehension of long or short passages, single sentences, phrases, or words; a writing section may be open-ended (free composition) or structured to elicit anything from correct spelling to discourse-level competence.
- TOEFL: the first step is to define the construct of language proficiency. After breaking language competence down into subsets of the four skills, each performance mode can be examined on a continuum of linguistic units. Content validity, construct validity, and face validity are important theoretical issues here, along with practicality.
- ESLPT: designing the specifications was simpler. The purpose is placement, and construct validation consisted of an examination of the content of the ESL courses; the specifications mirrored the reading-based and process-writing approach used in the classes, with content and reliability in tasks and item response formats equally important. In the recent revision of the ESLPT, the multiple-choice editing test required choosing an appropriate essay within which to embed errors (students' previous written errors can be used as distractors); a more complicated task is to embed a specified number of errors from predetermined error categories.
- GET: the specifications are the skills of writing grammatically and rhetorically acceptable prose on a topic, with clearly produced organization of ideas and logical development.

66. Step 3: Design, select, and arrange the test tasks/items.
- TOEFL: content coding covers the skills and a variety of subject matter without bias (the content must be universal and as neutral as possible); statistical characteristics include IF and ID.
- ESLPT: for the written parts, the main problems are a) selecting appropriate passages (they must conform to the standards of content validity), b) providing appropriate prompts (they should fit the passages), and c) processing the data from pilot testing.
- GET: topics must be appealing and capable of yielding the intended product, an essay with an organized, logical argument and a conclusion.

67. Step 4: Make appropriate evaluations of different kinds of items.
IF, ID, and distractor analyses may not be necessary for a classroom (one-time) test, but they are a must for a standardized multiple-choice test. For production responses, different forms of evaluation become important:
- practicality: clarity of directions, timing of the test, ease of administration, and how much time is required to score
- reliability: a major player where more than one scorer is employed and, to a lesser extent, where a single scorer has to evaluate tests over long spans of time that could lead to a deterioration of standards
- facility: key for valid and successful items; unclear directions, complex language, an obscure topic, fuzzy data, or culturally biased information may raise the difficulty of an item unnecessarily

68. Step 5: Specify scoring procedures and reporting formats.
- TOEFL: scores are calculated and reported for the three sections of the test and as a total score.
- ESLPT: a score is reported for each of the essay sections (each essay is read by two readers), and the editing section is machine-scanned. The test provides data to place students as well as diagnostic information, but students do not receive their essays back.
- GET: each GET essay is read by two trained readers, who give scores between 1 and 4; a combined score of 6 is the recommended threshold for allowing students to pursue graduate-level courses. A student who scores below 6 either repeats the test or takes a remedial course.

69. Step 6: Perform ongoing construct validation studies.
Any standardized test must be accompanied by systematic periodic corroboration of its effectiveness and by steps toward its improvement.
- TOEFL: the latest study of the TOEFL examined its content characteristics from a communicative perspective, based on current research in applied linguistics and language proficiency assessment.
- ESLPT: the development of the new ESLPT involved a lengthy process of both content and construct validation, along with such practical issues as scoring the written sections and producing a machine-scorable multiple-choice answer sheet.
- GET: there is no research to validate the GET itself; administrators rely on research on university-level academic writing tests such as the TWE. No pilot testing of prompts is conducted, although the scorers have an opportunity to reflect on the validity of a given topic, and no data are collected from students on their perceptions. Some criticism of the GET has come from international test-takers, who posit that the topics and time limits of the GET work to the disadvantage of writers whose native language is not English; test designers must be careful about the potential cultural effect on the numerous international students who take the GET.

70-74. Four commercial proficiency tests compared:

TOEFL
- Primary market: U.S. and Canadian universities and colleges, for admission purposes
- Type: computer-based and paper-based
- Response modes: multiple-choice responses and essay
- Time allocation: up to 4 hours (CB), 3 hours (PB)
- Specifications (CB): a listening section with dialogs, short conversations, academic discussions, and mini-lectures; a structure section testing formal language with two question types (completing incomplete sentences, and identifying the one of four underlined words or phrases that is not acceptable in English); a reading section with four to five passages on academic subjects and 10-14 questions per passage; a writing section requiring examinees to compose an essay on a given topic

MELAB
- Primary market: U.S. and Canadian language programs and colleges, some worldwide educational settings
- Type: paper-based
- Response modes: multiple-choice responses and essay
- Time allocation: 2.5 to 3.5 hours
- Specifications: a 30-minute impromptu essay on a given topic; a 25-minute multiple-choice listening comprehension test; a 100-item, 75-minute multiple-choice test of grammar, cloze reading, vocabulary, and reading comprehension; an optional oral interview

IELTS
- Primary market: Australian, British, Canadian, and New Zealand academic institutions and professional organizations, and some American academic institutions
- Type: computer-based for the Reading and Writing sections, paper-based for the Listening and Speaking parts
- Response modes: multiple-choice responses, essay, and oral production
- Time allocation: 2 hours, 45 minutes
- Specifications: a 60-minute reading section; a 60-minute writing section; a 30-minute listening test of four sections; a 10- to 15-minute speaking test of five sections

TOEIC
- Primary market: worldwide, workplace settings
- Type: computer-based and paper-based
- Response modes: multiple-choice responses
- Time allocation: 2 hours
- Specifications: a 100-item, approximately 45-minute listening test administered by audiocassette, including statements, questions, short conversations, and short talks; a 100-item, 75-minute reading test including cloze sentences, error recognition, and reading comprehension
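The GET reporting rule in Step 5 lends itself to a small sketch. One assumption is flagged here: the notes give each reader's scoring range (1-4) and the threshold of 6, but do not state how the two readers' scores are combined; summing them is assumed in the code below.

```python
# Sketch of the GET decision rule: two trained readers each score the essay
# from 1 to 4 (per the notes). We ASSUME the two scores are summed, since
# the notes state only the combined threshold of 6 for entering
# graduate-level courses.

PASS_THRESHOLD = 6

def get_decision(reader1, reader2):
    if not all(1 <= s <= 4 for s in (reader1, reader2)):
        raise ValueError("each reader scores from 1 to 4")
    if reader1 + reader2 >= PASS_THRESHOLD:
        return "may enter graduate-level courses"
    return "repeat the test or take a remedial course"

print(get_decision(3, 3))  # may enter graduate-level courses
print(get_decision(2, 3))  # repeat the test or take a remedial course
```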

THE HISTORY OF STANDARDIZED TESTING
Mid 20th century:
*Standardized tests had unchallenged popularity and growth. Tests were considered to be a way of making reforms in education.
*Assessing students quickly and cheaply became a political issue. Standardized tests brought convenience, efficiency, and an air of empirical science; specialists design, revise, and validate many tests.
Late 20th century:
*The claims of the mid-20th century began to be questioned and criticised in all areas. Teachers were in the leading position of those challenges.
*Criticism: some teachers claimed that the tests were unfair; there was dissimilarity between the content and tasks of the tests and what teachers were teaching in their classes, hence possible inequity and disparity between such tests and classroom instruction.
*A movement started to establish standards by which students of all ages and subject-matter areas might be assessed.
The last 20 years:
*Educators became aware of weaknesses in standardized testing: the tests were not accurate measures of achievement and success, and they were not based on carefully framed, comprehensive, and validated standards of achievement.
*Solutions: aware of these weaknesses, educators started to establish standards on which students of all ages and subject-matter areas might be assessed, and there have been efforts to base standardized tests on clearly specified criteria for each content area being measured.
*Most departments of education at the state level in the US have now specified appropriate standards (criteria, objectives) for each grade level (pre-school to grade 12) and each content area (math, science, arts, etc.).
*It was found that the standardized tests of past decades were not in line with the newly developed standards, so an interactive process began, not only of developing standards but also of creating standards-based assessment.
*The construction of standards makes concordance possible between standardized test specifications and curricular goals and objectives (ESL, ELD, work and workplace programs).

CASAS AND SCANS
CASAS (Comprehensive Adult Student Assessment System): designed to provide broadly based assessments of ESL curricula across the US at higher levels of education (colleges, adult and language schools). It includes more than 80 standardized assessment instruments used to
*place students in programs
*diagnose learners' needs
*monitor progress
*certify mastery of functional skills
SCANS (Secretary's Commission on Achieving Necessary Skills): outlines the competencies necessary for language in the workplace. The competencies are acquired and maintained through training in basic skills (the four skills), thinking skills (reasoning and problem solving), personal qualities (self-esteem and sociability), resources (allocating time, materials, staff, etc.), interpersonal skills (teamwork, customer service, etc.), information processing (evaluating data, organising files, etc.), systems (understanding social and organizational systems), and technology use and application.

ELD STANDARDS (note: the standards in the curriculum should be realistic)
In creating benchmarks for accountability, there is a tremendous responsibility to carry out a comprehensive study of a number of domains:
*categories of language (phonology, discourse, pragmatic, functional, and sociolinguistic features)
*specification of what ELD students' needs are
*a thorough analysis of the means available to assess student attainment of those standards
*a realistic scope of standards to be included in the curriculum

ELD ASSESSMENT (note: how will we assess what students have learned?)
The development of standards obviously implies the responsibility for correctly assessing their attainment. The California English Language Development Test (CELDT) is a battery of instruments (not publicly available) designed to assess attainment of ELD standards across grade levels for English language learners (ELLs; the label LEP is discarded because of the negative connotation of the word "limited"). Its listening tasks are designed to assess the candidate's listening comprehension, alongside reading texts and other stimuli. A language and literacy assessment rubric collected students' performance in oral production, reading, and writing in different grades, with observations recorded on scannable forms; it provided useful data on students' progress.

TEACHER STANDARDS (note: what should a teacher be like? this introduces standards for teachers)
Standards for teachers (qualifications, expertise, training) cover domains such as:
*linguistic knowledge and language development
*culture and the interrelationship between language and culture
*planning and managing instruction
*assessment
Student learning is at the heart of the teacher's performance, and teachers can be assessed through their classroom performance. Performance-based assessment is integrated (not a checklist of discrete assessments). Each assessment has performance criteria against which performance can be measured; performance criteria identify to what extent the teacher meets the standard. Performance can be detailed with "indicators": examples of evidence that the teacher can meet a part of a standard. Indicators are more than statements; they are complex evidence of performance (samples of work, portfolios, observations, demonstrations), through which teachers can demonstrate the standards in their teaching.

CONSEQUENCES OF STANDARDS-BASED ASSESSMENT AND STANDARDIZED TESTING
Positive:
*high level of practicality and reliability
*provides insights into academic performance
*accuracy in placing large numbers of test-takers onto a norm-referenced scale
*ongoing construct validation studies
Negative:
*they involve a number of test biases, and a small but significant number of test-takers are assessed neither fairly nor accurately
*fosters extrinsic motivation
*multiple intelligences are not considered: standardized tests favour logical-mathematical and verbal-linguistic intelligence to the virtual exclusion of the others (contextualized, integrative intelligence)
*there is a danger of test-driven learning and teaching
*in general, performance is not directly assessed

TEST BIAS
Standardized tests involve many test biases (language, culture, race, gender, learning styles). The National Center for Fair and Open Testing has collected claims of test bias from students, teachers, parents, and legal consultants (see p. 105). Possible remedies:
*a single test with multiple test tasks, to account for learning styles and performance variables (some learners may need to be assessed with interviews, portfolios, conferences, or observations)
*in-class and extra-class graded work
*alternative forms of assessment (e.g. journals, portfolios, conferences, observations, self- and peer-assessment), with more formative rather than summative assessment
This would ease test-bias problems, but bias is difficult to control in standardized items.

TEST-DRIVEN LEARNING AND TEACHING
Another consequence of standardized testing. When students know that one single measure of performance will determine their lives, they are less likely to take positive attitudes towards learning: the motivation becomes extrinsic, not intrinsic. Teachers are also affected by test-driven policies; they are under pressure to make sure their students excel in the exam, at the risk of ignoring other objectives in the curriculum. A more serious effect has been to punish schools in lower-socioeconomic neighbourhoods.

ETHICAL ISSUES: CRITICAL LANGUAGE TESTING
One of the by-products of the rapidly growing testing industry is the danger of an abuse of power. Tests represent a social technology deeply embedded in education, government, and business (Shohamy). Tests are most powerful when they are the single indicator determining the future of individuals, and standardized tests bring with them certain ethical issues surrounding their gate-keeping nature (cut-off standards are, in most cases, specified by client educational institutions). Those who use standardized tests for gate-keeping purposes, with few if any other assessments, would do well to consider multiple measures before attributing infallible predictive power to a standardized test.
One important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test. Sometimes the performance does not indicate true competence: a bad night's rest, illness, an emotional distraction, test anxiety, a memory block, or other student-related reliability factors may be at play. Multiple measures give a more reliable and valid assessment than a single measure.

CHAPTER 6: ASSESSING LISTENING
OBSERVING THE PERFORMANCE OF THE FOUR SKILLS
Assessment rests on two interacting concepts, performance and observation, and an assessment is only as good as one's observation of the performance. The productive skills let us hear and see the process as it is performed: writing gives a permanent product (the written piece), but there is no permanent observable product for speaking, other than recorded speech. For the receptive skills, we can observe neither the process of performing nor a product.

THE IMPORTANCE OF LISTENING
Listening has often played second fiddle to its counterpart, speaking. It is rare to find just a listening test; listening is often implied as a component of speaking. Yet input in the aural-oral mode accounts for a large proportion of successful language acquisition.
Listening performance is the invisible, inaudible process of internalizing meaning from the auditory signals being transmitted to the ear and brain. When you hear an utterance, the following processes flash through your brain:
1. recognize speech sounds and hold a temporary "imprint" of them in short-term memory;
2. simultaneously determine the type of speech event;
3. use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message, and assign a literal and an intended meaning to the utterance;
4. delete the exact linguistic form in which the message was originally received in favor of conceptually retaining important or relevant information in long-term memory.
Activating students' schemata is thus central to comprehension.

BASIC TYPES OF LISTENING
For an effective test, designing appropriate assessment tasks in listening begins with the specification of objectives. Four types of listening performance are commonly identified:
1. Intensive: listening for perception of the components of a larger stretch of language.
2. Responsive.
3. Selective.
4. Extensive. Extensive listening will usually take place outside the classroom (Jeremy Harmer, p. 305); material for extensive listening can be obtained from a number of sources, and teachers use audio material on tape or hard disk when they want their students to practice listening skills.
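The reliability concerns raised here (multiple scorers, multiple measures) can be checked empirically by having two raters score the same set of performances. This sketch uses fabricated ratings; the two formulas, raw percent agreement and Cohen's kappa (which corrects for chance agreement), are standard and not specific to these notes:

```python
# Hedged sketch: a quick scorer-consistency check when two raters score the
# same performances on a 0-2 rubric. The ratings below are fabricated.
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of performances on which the two raters gave the same score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance, based on each rater's score frequencies."""
    n = len(r1)
    po = percent_agreement(r1, r2)  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # agreement expected by chance from the two raters' category frequencies
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (po - pe) / (1 - pe)

rater_a = [2, 1, 2, 0, 1, 2, 1, 0]  # rubric scores from rater A
rater_b = [2, 1, 1, 0, 1, 2, 2, 0]  # same eight performances, rater B
print(percent_agreement(rater_a, rater_b))       # 0.75
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.62
```

Low agreement on the same performances is exactly the scorer-reliability breakdown the notes warn about, and it argues for clearer scoring specifications or rater training before the scores are used for any gate-keeping decision.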

WHAT MAKES LISTENING DIFFICULT
1. Clustering: chunking phrases, clauses, and constituents.
2. Redundancy: repetitions, rephrasing, elaborations, and insertions.
3. Reduced forms: understanding reduced forms that may not be part of a learner's past experience in classes where only formal "textbook" language has been presented.
4. Performance variables: hesitations, false starts, pauses, and corrections.
5. Colloquial language: idioms, slang, reduced forms, shared cultural knowledge.
6. Rate of delivery: keeping up with the speed of delivery, processing automatically as the speaker continues.
7. Stress, rhythm, and intonation: correctly understanding prosodic elements of spoken language, which is more difficult than understanding the smaller phonological bits and pieces.
8. Interaction: negotiation, clarification, attending signals, turn-taking, and topic nomination, maintenance, and termination.

MICRO- AND MACROSKILLS OF LISTENING
Microskills: attending to the smaller bits and chunks of language, in more of a bottom-up process:
*discriminate among the sounds of English
*retain chunks of language of different lengths in short-term memory
*recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information
*recognize reduced forms of words
*distinguish word boundaries, recognize the core of a word, and interpret word order patterns and their significance
*process speech at different rates of delivery
*process speech containing pauses, errors, corrections, and other performance variables
*recognize grammatical word classes (nouns, verbs, etc.), systems (e.g. tense, agreement, pluralization), patterns, rules, and elliptical forms
*detect sentence constituents and distinguish between major and minor constituents
*recognize that a particular meaning may be expressed in different grammatical forms
*recognize cohesive devices in spoken discourse
Macroskills: focusing on the larger elements involved, in a top-down approach:
*recognize the communicative functions of utterances, according to situations, participants, and goals
*infer situations, participants, and goals using real-world knowledge
*from events and ideas described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification
*distinguish between literal and implied meanings
*use facial, kinesic, body-language, and other nonverbal clues to decipher meanings
*develop and use a battery of listening strategies, such as detecting key words, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof

DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING
Recognizing phonological and morphological elements:
*Phonemic pair, consonants. Test-takers hear: "He's from California." Test-takers read: A. He's from California B. She's from California
*Phonemic pair, vowels. Test-takers hear: "Is he living?" Test-takers read: A. Is he leaving? B. Is he living?
*Morphological pair, -ed ending. Test-takers hear: "I missed you very much." Test-takers read: A. I missed you very much B. I miss you very much
*Stress pattern in "can't". Test-takers hear: "My girlfriend can't go to the party." Test-takers read: A. My girlfriend can't go to the party B. My girlfriend can go to the party
*One-word stimulus. Test-takers hear: "vine" Test-takers read: A. vine B. wine
Paraphrase recognition:
*Sentence paraphrase. Test-takers hear: "Hello, my name is Keiko. I come from Japan." Test-takers read: A. Keiko is comfortable in Japan B. Keiko wants to come to Japan C. Keiko is Japanese D. Keiko likes Japan
*Dialogue paraphrase. Test-takers hear: Man: "Hi, my name is George." Woman: "Nice to meet you. I'm Maria. Are you American?" Man: "No, I'm Canadian." Test-takers read: A. George lives in the United States B. George is American C. George comes from Canada D. Maria is Canadian

DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING
*Appropriate response to a question. Test-takers hear: "How much time did you take to do your homework?" Test-takers read: A. in about an hour B. about an hour C. about $10 D. yes, I did
*Open-ended response to a question. Test-takers hear: "How much time did you take to do your homework?" Test-takers write or speak: __________________________________

DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING
The test-taker listens to a limited quantity of aural input and must discern within it some specific information.
Listening cloze (cloze dictation or partial dictation): listening cloze tasks require the test-taker to listen to a story, monologue, or conversation while simultaneously reading a written text in which selected words or phrases have been deleted. One potential weakness of the listening cloze technique is that the items may simply become reading comprehension tasks: test-takers who are asked to listen to a story with periodic deletions in the written version may not need to listen at all, yet may still be able to respond with the appropriate word or phrase.
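A listening-cloze sheet of the kind described here can be generated mechanically by deleting every nth word of the transcript. An illustrative sketch; the function name and the deletion interval are my own choices, not from the source:

```python
# Illustrative sketch: build a listening-cloze sheet by blanking every nth
# word of the transcript the test-takers will hear. Names are invented.

def make_listening_cloze(transcript, nth=7):
    """Return (cloze_sheet, answer_key) with every nth word blanked out."""
    blanked, answers = [], []
    for i, word in enumerate(transcript.split(), start=1):
        if i % nth == 0:
            answers.append(word)   # keep the deleted word for the answer key
            blanked.append("_____")
        else:
            blanked.append(word)
    return " ".join(blanked), answers

text = ("Yesterday the committee met to discuss the new budget and decided "
        "to postpone the final vote until next month")
sheet, key = make_listening_cloze(text, nth=5)
print(sheet)  # every fifth word replaced by _____
print(key)    # ['to', 'and', 'final']
```

In practice the deleted words can also be hand-picked (e.g. only content words the test-taker must actually hear), which changes what the item measures; the fixed interval used here is just the simplest variant.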

Other selective listening formats:
*Information transfer: aurally processed information must be transferred to a visual representation, e.g. labelling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.
*Chart filling: test-takers see a chart of Lucy's daily schedule and fill in the schedule as they listen.

DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING
*Dictation: test-takers hear a passage, typically 50-100 words, recited three times. The first reading is at natural speed, with no pauses, while test-takers listen for gist; during the second reading, at slowed speed with a pause at each break, test-takers write; during the third reading, again at natural speed, test-takers check their work.
*Communicative stimulus-response tasks: test-takers are presented with a stimulus monologue or conversation and then asked to respond to a set of comprehension questions. First, test-takers hear the instructions and the dialogue or monologue; second, they read the multiple-choice comprehension items and choose the correct one.
*Authentic listening tasks. Buck (2001, p. 92): every test requires some components of communicative language ability, every task shares some characteristics with target-language tasks, and no test is completely authentic. Alternatives that assess comprehension in a truly communicative context:
*Note-taking: listening to a lecture and writing down the important ideas. Disadvantage: scoring is time-consuming. Advantages: it mirrors a real classroom situation and fulfills the criteria of cognitive demand, communicative language, and authenticity.
*Editing: editing a written stimulus against an aural stimulus.
*Interpretive tasks: paraphrasing a story or conversation. Potential stimuli include song lyrics, poetry, and radio/TV news reports.
*Retelling: test-takers listen to a story and simply retell it, either orally or in writing, to show full comprehension. Difficulties: scoring and reliability; validity and authenticity must also be well incorporated into the task.

CHAPTER 7: ASSESSING SPEAKING
Challenges of testing speaking:
1. the interaction of speaking and listening (interactive listening in face-to-face conversation)
2. elicitation techniques
3. scoring

BASIC TYPES OF SPEAKING
1. Imitative ("parrot back"): the ability to imitate a word, phrase, or sentence; pronunciation is what is tested.
2. Intensive: producing short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (stress, rhythm, intonation).
3. Responsive (interacting with an interlocutor): includes interaction and tests comprehension, but at the somewhat limited level of very short conversations: standard greetings, small talk, simple requests and comments, and the like.
4. Interactive: the difference between responsive and interactive speaking lies in the length and complexity of the interaction, which includes multiple exchanges and/or multiple participants.
5. Extensive (monologue): extensive oral production tasks include speeches, oral presentations, and story-telling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.

MICRO- AND MACROSKILLS OF SPEAKING
The microskills of speaking refer to producing small chunks of language such as phonemes, morphemes, words, and phrasal units:
1. produce differences among English phonemes and allophonic variants
2. produce chunks of language of different lengths
3. produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours
4. produce reduced forms of words and phrases
5. use an adequate number of lexical units (words) to accomplish pragmatic purposes
6. produce fluent speech at different rates of delivery
7. monitor one's own oral production and use various devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message
8. use grammatical word classes (nouns, verbs, etc.), systems (e.g. tense, agreement, pluralization), word order, patterns, rules, and elliptical forms
9. produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents
10. express a particular meaning in different grammatical forms
11. use cohesive devices in spoken discourse
The macroskills involve the speaker's focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options:
1. appropriately accomplish communicative functions according to situations, participants, and goals
2. use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations
3. convey links and connections between events, and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, and generalization and exemplification
4. convey facial features, kinesics, body language, and other nonverbal cues along with verbal language
5. develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well the interlocutor is understanding you
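The dictation task described in this section is commonly scored by counting word-level errors against the original passage. A hedged sketch using only the standard library; the "correct words divided by total words" formula is one common convention, not the only one, and the sample passage is invented:

```python
# Hedged sketch: scoring a dictation by aligning the test-taker's
# transcription against the original passage with difflib and counting
# word-level differences as errors.
import difflib

def dictation_score(original, transcription):
    """Return (errors, score), with score = matched words / words in original."""
    ref = original.lower().split()
    hyp = transcription.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp)
    # total size of all aligned (identical) word runs
    matched = sum(block.size for block in matcher.get_matching_blocks())
    errors = len(ref) - matched
    return errors, matched / len(ref)

ref_text = "The train leaves at six in the morning"
student = "The train leave at six in morning"
print(dictation_score(ref_text, student))  # (2, 0.75)
```

This treats omissions and wrong word forms alike as errors; a finer rubric could weight grammatical slips differently, much as the 2-1-0 scales later in these notes do for speaking.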

Three important issues arise as you set out to design speaking tasks:
1. Interaction effect: it is impossible to test speaking in isolation; no speaking task can isolate the single skill of oral production, because concurrent aural comprehension, and possibly reading, is unavoidably involved.
2. Elicitation techniques: eliciting the specific criterion you have designated for a task can be tricky, because beyond the word level spoken language offers a number of productive options to test-takers; make sure your elicitation prompt achieves its aims as closely as possible.
3. Scoring: carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible.

DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING
Paying attention to pronunciation, especially the suprasegmentals, in an attempt to help learners become more comprehensible is acceptable, as long as a negative washback effect is avoided. In a simple repetition task, test-takers must retain the stimulus, whether a pair of words, a sentence, or perhaps a question (to test for intonation production), long enough to reproduce it, and then respond with an oral repetition of that stimulus. Scoring specifications must be clear in order to avoid reliability breakdowns; a common form of scoring simply uses a 2- or 3-point system for each response.
Scoring scale for repetition tasks:
2 - acceptable pronunciation
1 - comprehensible, partially correct pronunciation
0 - silence, or seriously incorrect pronunciation
The longer the stretch of language, the more possibility for error, and therefore the more difficult it becomes to assign a point system to the task. Repetition tasks should not be allowed to occupy a dominant role in an overall oral production assessment.

PHONEPASS TEST
The PhonePass test elicits computer-assisted oral production over a telephone: test-takers dial a designated number and listen for directions. The test has five sections:
Part A: test-takers read aloud selected sentences from among those printed on the test sheet.
Part B: test-takers repeat sentences dictated over the phone.
Part C: test-takers answer questions with a single word or a short phrase of two or three words.
Part D: test-takers hear three word groups in random order and link them into a correctly ordered sentence.
Part E: test-takers have 30 seconds to talk about their opinion of a topic that is dictated over the phone.
The sub-skills scored are repeat accuracy and fluency, pronunciation, reading fluency, and listening vocabulary. Scores are calculated by a computerized scoring template and reported back to the test-taker within minutes; the scoring procedure has been validated against human scoring with extraordinarily high reliabilities and correlation statistics, which has supported the construct validity of its repetition tasks not just for repetition but for discourse and overall oral production ability.

DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING
Test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Intensive tasks may also be described as limited-response or mechanical tasks, or what classroom pedagogy would label controlled responses.
*Directed response tasks: the administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative (a possible drawback), but they do require minimal processing of meaning in order to produce the correct grammatical output.
*Read-aloud tasks (to assess pronunciation and fluency): these extend beyond the sentence level, up to a paragraph or two. The technique is easily administered by selecting a passage that incorporates the test specs and recording the test-taker's output; scoring is easy because all of the test-takers' oral production is controlled. If reading aloud shows certain practical advantages (predictable output, practicality, reliability in scoring), there are also drawbacks: reading aloud is somewhat inauthentic in that, with the exception of a parent reading to a child, we seldom read anything aloud to someone else in the real world.
*Sentence/dialogue completion tasks and oral questionnaires (producing the omitted lines of a dialogue appropriately): test-takers read a dialogue in which one speaker's lines have been omitted; they are first given time to read through the dialogue, get its gist, and think about appropriate lines to fill in. An advantage of this technique lies in its moderate control of the test-taker's output (a practical advantage). One disadvantage is its reliance on literacy and the ability to transfer easily from written to spoken English; another is the contrived, inauthentic nature of the task.
*Picture-cued tasks (eliciting oral production by using pictures): one of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker.

Scoring scale for intensive tasks:
2 - comprehensible; acceptable target form
1 - comprehensible; partially correct target form
0 - silence, or seriously incorrect target form
*Translation (of limited stretches of discourse): test-takers are given a native-language word, phrase, or sentence and asked to translate it into the target language. As an assessment procedure, the advantage of translation lies in its control of the test-taker's output, which of course means that scoring is more easily specified (a practical advantage).
*Maps are another visual stimulus that can be used to assess the language forms needed to give directions and specify locations.

DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING
Assessment here involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker, and from interactive tasks in the somewhat limited length of utterances.
*Question and answer: tasks can consist of one or two questions from an interviewer, to which test-takers respond with a few sentences at most. The first question is often intensive in its purpose: a display question intended to elicit a predetermined correct response. Questions at the responsive level tend to be genuine referential questions, in which the test-taker is given more opportunity to produce meaningful language in response. In a further variation, test-takers themselves respond with questions. An advantage of these tasks is that they elicit short stretches of output and perhaps tap into the test-taker's ability to practice conversation.
*Giving instructions and directions: the technique is simple: the administrator poses the problem, and the test-taker responds. The integration of listening and speaking is probably more at stake here than simple oral production alone. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories.
*Paraphrasing: test-takers read or hear a limited number of sentences and produce a paraphrase. If you use short paraphrasing tasks as an assessment procedure, it is important to pinpoint the objective of the task clearly.

DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING
Tasks here include relatively long interactive discourse (interviews, role plays, discussions, games).
*Interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview is then scored on accuracy in pronunciation and/or grammar, vocabulary usage, fluency, pragmatic appropriateness, task accomplishment, and even comprehension. Placement interviews are designed to get a quick spoken sample from a student to verify placement into a course. The success of an oral interview depends on:
*clearly specifying the administrative procedures of the assessment (practicality)
*focusing the questions and probes on the purpose of the assessment (validity)
*appropriately eliciting an optimal amount and quality of oral production from the test-taker (biased for best performance)
*creating a consistent, workable scoring system (reliability)
An interview has four stages:
1. Warm-up (small talk): the interviewer directs mutual introductions, helps the test-taker become comfortable, apprises the test-taker of the format, allays anxieties, and sets the test-taker at ease. No scoring.
2. Level check: the interviewer stimulates the test-taker to respond using expected or predicted forms and functions; this stage also gives the interviewer a picture of the test-taker's extroversion, readiness to speak, and confidence. Linguistic target criteria are scored in this phase.
3. Probe: probe questions and prompts challenge test-takers to go to the heights of their ability and to extend beyond the limits of the interviewer's expectation, through difficult questions.
4. Wind-down: a short period during which the interviewer encourages the test-taker to relax with easy questions. No scoring.
A potentially tricky form of oral production assessment involves more than one test-taker with an interviewer. With two students in an interview context, and within the constraints set forth by the guidelines, both test-takers can ask questions of each other.
*Role play: a popular pedagogical activity in communicative language-teaching classes, it frees students to be somewhat creative in their linguistic output. While role play can be controlled or "guided" by the interviewer, this technique takes test-takers beyond the simple intensive and responsive levels to a level of creativity and complexity that approaches real-world pragmatics. Scoring presents the usual issues in any task that elicits somewhat unpredictable responses from test-takers.
*Discussions and conversations: as formal assessment devices, discussions and conversations with and among students are difficult to specify.

TEST OF SPOKEN ENGLISH (TSE)
The TSE is a 20-minute audio-taped test of oral language ability within an academic or professional environment. Its tasks are designed to elicit oral production in various discourse categories, rather than in selected phonological, grammatical, or lexical targets. The scores are also used for selecting and certifying health professionals such as physicians, nurses, pharmacists, physical therapists, and veterinarians.

Discussions and Conversations: as formal assessment devices, discussions are difficult to structure and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussion is an integrative task; assessing the performance of participants through scores or checklists should be carefully designed to suit the objectives of the observed discussion.
114. Games: among informal assessment devices are a variety of games that directly involve language production. Assessment games: 1. "Tinkertoy" game (Lego blocks); 2. Crossword puzzles; 3. Information gap grids; 4. City maps.
115. ORAL PROFICIENCY INTERVIEW (OPI)
The best-known oral interview format is the Oral Proficiency Interview. The OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on Teaching Foreign Languages (ACTFL). It is carefully designed to elicit pronunciation, fluency and integrative ability, sociolinguistic and cultural knowledge, grammar, and vocabulary. Performance is judged by the examiner to be at one of ten possible levels on the ACTFL-designated proficiency guidelines for speaking: Superior; Advanced high, mid, low; Intermediate high, mid, low; Novice high, mid, low.
116. Designing Assessment Tasks: Extensive Speaking
Extensive speaking involves complex, relatively lengthy stretches of discourse, with minimal verbal interaction; the tasks are variations on monologues. Once again the rules for effective assessment must be invoked: a. specify the criterion; b. set appropriate tasks; c. elicit optimal output; d. establish practical, reliable scoring procedures. Scoring is the key assessment challenge.
Oral Presentations: it would not be uncommon to be called on to present a report, a paper, a marketing plan, a sales idea, a design of a new product, or a method.
Picture-Cued Story-Telling: a technique for eliciting oral production through visual material: pictures, picture cards, photographs, diagrams, and charts. Consider a picture or series of pictures as a stimulus for a longer story or description. Test directions should specify the criterion, and criteria for scoring need to be clear about what it is you are hoping to assess.
Retelling a Story, News Event: test-takers hear or read a story or news event that they are asked to retell. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), and so it is also advisable to give some cognizance to comprehension performance in evaluating learners.
117. Translation (of Extended Prose): longer texts are presented for the test-taker to read in the native language and then translate into English (dialogues, directions for assembly of a product, a synopsis of a story, play, or movie, directions on how to find something on a map, and other genres). The advantage of translation is the control of the content, vocabulary, and, to some extent, the grammatical and discourse features. The disadvantage is that translation of longer texts is a highly specialized skill for which some individuals obtain post-baccalaureate degrees. Criteria for scoring should take into account not only the purpose in stimulating a translation but also the possibility of errors that are unrelated to oral production ability.

8 ASSESSING READING

118. TYPES (GENRES) OF READING
Academic reading: reference material, textbooks, theses, essays, papers, editorials and opinion writing.
Job-related reading: messages, letters/emails, memos.
Personal reading: newspapers, magazines, letters, emails, schedules (trains, bus).
119. Microskills of reading: Discriminate among the distinctive graphemes and orthographic patterns of English. Retain chunks of language of different lengths in short-term memory. Process writing at an efficient rate of speed to suit the purpose. Recognize a core of words, and interpret word order patterns and their significance. Recognize grammatical word classes (nouns, verbs, etc.), systems (tense, agreement, pluralization), patterns, rules, and elliptical forms. Recognize cohesive devices in written discourse and their role in signaling the relationship between and among clauses.
120. Macroskills of reading: Recognize the rhetorical forms of written discourse and their significance for interpretation. Recognize the communicative functions of written texts, according to form and purpose. Infer context that is not explicit by using background knowledge. From described events, ideas, etc., infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, generalization, and exemplification. Distinguish between literal and implied meanings. Detect culturally specific references and interpret them in a context of the appropriate cultural schemata. Develop and use a battery of reading strategies, such as scanning and skimming.

Some principal strategies for reading comprehension: Identify your purpose in reading a text. Apply spelling rules and conventions for bottom-up decoding. Use lexical analysis to determine meaning. Guess at meaning when you aren't certain. Skim the text for the gist and for main ideas. Scan the text for specific information (names, dates, key words). Use silent reading techniques for rapid processing. Use marginal notes, outlines, charts, or semantic maps for understanding and retaining information. Distinguish between literal and implied meanings. Capitalize on discourse markers to process relationships. Other strategies include detecting discourse markers, guessing the meaning of words from the context, and activating schemata for the interpretation of texts.

121. TYPES OF READING
Perceptive: involves attending to the components of larger stretches of discourse: letters, words, punctuation, and other graphemic symbols.
Selective: largely an artifact of assessment formats. The test designer focuses on formal aspects of language (lexical, grammatical, and a few discourse features). This category includes what many incorrectly think of as testing "vocabulary and grammar."
Interactive: the task is to identify relevant features (lexical, symbolic, grammatical, and discourse) within texts of moderately short length, with the objective of retaining the information that is processed.
Extensive: the purposes of assessment usually are to tap into a learner's global understanding of a text, as opposed to asking test-takers to "zoom in" on small details. Top-down processing is assumed for most extensive tasks.

122. PERCEPTIVE READING
Reading aloud: the test-taker reads items aloud, one by one, in the presence of an administrator.
Written response: reproduce the probe in writing; evaluation of the test-taker's response must be carefully treated.
Multiple-choice: choosing one of four or five possible answers.
Picture-cued items: test-takers are shown a picture along with a written text and are given one of a number of possible tasks to perform.

123. SELECTIVE READING
Multiple-choice (for form-focused criteria): items may have little context, but they might serve as a vocabulary or grammar check.
Matching tasks: the most frequently appearing criterion in matching procedures is vocabulary.
Editing tasks: detecting grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading.
Picture-cued tasks: read a sentence or passage and choose the one of four pictures that it describes; or read a series of sentences or definitions, each describing a labeled part of a picture or diagram.
Gap-filling tasks: completion items in which test-takers read part of a sentence and then complete it by writing a phrase.

124. INTERACTIVE READING
Cloze tasks: the ability to fill in gaps in an incomplete image (visual, auditory, or cognitive) and to supply (from background schemata) omitted details.
Impromptu reading plus comprehension questions: an assessment of reading would hardly be complete without some component involving impromptu reading and responding to questions.
Short-answer tasks: the age-old short-answer format following reading passages.
Editing (longer texts): the technique has been applied successfully to longer passages of 200 to 300 words. Its advantages: first, authenticity; second, the task simulates proofreading one's own essay; third, it can be connected to a specific curriculum.
Scanning: a strategy used by all readers to find relevant information in a text.
Ordering tasks: variations on these can serve as an assessment of overall global understanding of a story and of the cohesive devices that signal the order of events or ideas.
Information transfer (reading charts, maps, graphs, diagrams): such media presuppose the reader's schemata for interpreting them, and they are accompanied by oral or written discourse in order to convey, clarify, question, argue, and debate, among other linguistic functions.

125. EXTENSIVE READING
Involves longer texts than we have been dealing with up to this point.
Skimming tasks: the process of rapid coverage of reading matter to determine its gist or main idea.
Summarizing and responding: writing a summary of the text and then a response to it. Criteria for scoring need to be clear about what it is you are hoping to assess.
Note-taking and outlining: a teacher, perhaps in one-on-one conferences with students, can use student notes and outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

UNIT 9: ASSESSING WRITING

128. GENRES OF WRITING
Academic writing: papers and general subject reports; essays, compositions; academically focused journals; short-answer test responses; technical reports (e.g., lab reports); theses, dissertations.
Job-related writing: messages; letters/emails; memos (e.g., interoffice); reports (e.g., job evaluations, project reports); schedules, labels, signs; advertisements, announcements; manuals.
Personal writing: letters, emails, greeting cards, invitations; messages, notes; calendar entries, shopping lists, reminders; financial documents (e.g., checks, tax forms, loan applications); forms, questionnaires; medical reports, immigration documents; diaries, personal journals; fiction (e.g., short stories, poetry).
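Cloze tasks like those above are commonly built by fixed-ratio deletion (removing, say, every fifth or seventh word after an unaltered lead-in). The sketch below shows one way a teacher might generate such an item automatically; the passage, deletion ratio, and lead-in length are illustrative choices, not prescribed by these notes.

```python
# Generate a fixed-ratio cloze test: delete every nth word after an
# unaltered lead-in, and keep the deleted words as the answer key.
def make_cloze(text, n=7, lead_in_words=10):
    words = text.split()
    blanks = []   # deleted words, in order: this is the answer key
    out = []
    for i, w in enumerate(words):
        # Leave the first lead_in_words intact so readers build context.
        if i >= lead_in_words and (i - lead_in_words) % n == n - 1:
            blanks.append(w)
            out.append("(" + str(len(blanks)) + ")______")
        else:
            out.append(w)
    return " ".join(out), blanks

passage = ("Reading assessment often uses cloze passages. The test taker "
           "must fill each gap with a word that fits both the local "
           "grammar and the overall meaning of the passage.")
cloze_text, answers = make_cloze(passage, n=5, lead_in_words=6)
print(cloze_text)
print(answers)
```

A rational-deletion variant (choosing which words to delete by grammatical category rather than by position) would use the same skeleton with a different selection condition.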

129. TYPES OF WRITING PERFORMANCE
Imitative writing: assesses the ability to spell correctly and to perceive phoneme-grapheme correspondences; form rather than meaning (letters, words, punctuation, brief sentences; the mechanics of writing).
Intensive (controlled) writing: producing appropriate vocabulary within a context and correct grammatical features in a sentence; more form than meaning, but meaning and context are of some importance (collocations, idioms, correctness, appropriateness).
Responsive writing: connecting sentences to create logically linked two- or three-paragraph texts; discourse conventions with a strong emphasis on context and meaning (limited discourse level: connecting sentences logically, mostly 2-3 paragraphs).
Extensive writing: managing all the processes of writing, for all purposes, to produce longer texts (essays, theses, papers); the processes (strategies) of writing themselves.

130. MICROSKILLS AND MACROSKILLS OF WRITING
Micro-skills: Produce graphemes and orthographic patterns of English. Produce writing at an efficient rate of speed to suit the purpose. Produce an acceptable core of words and use appropriate word order patterns. Use acceptable grammatical systems (tense, agreement, pluralization), patterns, and rules. Express a particular meaning in different grammatical forms. Use cohesive devices in written discourse.
Macro-skills: Use the rhetorical forms and conventions of written discourse. Appropriately accomplish the communicative functions of written texts according to form and purpose. Convey links and connections between events, and communicate such relations as main idea, supporting idea, new information, generalization, and exemplification. Distinguish between literal and implied meanings when writing. Correctly convey culturally specific references in the context of the written text. Develop and use a battery of writing strategies, such as accurately assessing the audience's interpretation, using prewriting devices, writing with fluency in the first drafts, using paraphrases and synonyms, soliciting feedback, and using feedback for revising and editing.

131. IMITATIVE WRITING: TASKS IN HANDWRITING LETTERS, WORDS, AND PUNCTUATION
Copying (e.g., bit __ / bet __ / bat __): copy the words given in the spaces provided.
Listening cloze selection tasks: write the missing words in blanks, selecting from choices according to what is heard; a combination of dictation with a written text. Purpose: to give practice in writing. A reliable method to stimulate handwritten English.
Picture-cued tasks: write the word the picture represents. Make sure that the pictures are not ambiguous.
Form completion tasks: complete the blanks in simple forms (e.g., name, address, phone number). Make sure that students have practiced filling out such forms.
Converting numbers and abbreviations to words: either write out the numbers or convert the abbreviations to words. This is more reading than writing and has little meaningful value, but it is easy to administer, practical, and reliable.

132. SPELLING TASKS AND DETECTING PHONEME-GRAPHEME CORRESPONDENCES
The purpose is to assess the ability to spell words correctly and to process phoneme-grapheme correspondences.
Spelling tests: write words that are dictated, or choose words that have been heard or spoken. Scoring = correct spelling.
Picture-cued tasks: write the words displayed by pictures (e.g., boot-book, read-reed, bit-bite). Choose items according to your test purpose.
Multiple-choice techniques: choose and write the word with the correct spelling to fit the given sentence. Items are better when they have a writing component, and the addition of homonyms makes the task more challenging; be careful, though, since such items clash with reading.
Matching phonetic symbols: write the correctly spelled word for a phonetically transcribed item. Since the Latin alphabet and phonetic alphabet symbols differ from each other, this is inauthentic and needs practicing in class.

133. INTENSIVE (CONTROLLED) WRITING
Dictation: writing what is heard aurally; listening plus spelling and punctuation.
Dicto-comp: re-writing a paragraph in one's own words after hearing it two or three times; listening plus correct spelling and punctuation.
Grammatical transformation tasks: making grammatical transformations by changing or combining forms of language. They target grammatical competence and are practical and reliable, but they have no meaningful value; even with context there is no authenticity.

134. MORE INTENSIVE WRITING TASKS
Picture-cued tasks: 1. short sentences; 2. picture description; 3. picture sequence description. These combine reading of non-verbal means with writing; scoring is problematic when pictures are not clear.
Vocabulary assessment tasks: either defining a word or using it in a sentence; also assessing collocations and derived morphology. Using a word in a sentence is less authentic, and vocabulary and grammar intertwine, so specify the criterion.
Ordering tasks: ordering or re-ordering a scrambled set of words. If done orally, this is intensive speaking; if written, intensive writing. It combines reading and grammar and is appealing for those who like word games and puzzles, but it has low authenticity.
Short answer and sentence completion: answering or asking questions for the given statements, or writing two or three sentences using the given prompts (reading-writing integration: both reading and writing are involved). Scoring on a 2-1-0 scale is appropriate.
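The ordering task mentioned above (re-ordering a scrambled set of words) is easy to generate mechanically. A minimal sketch, with a hypothetical example sentence; the fixed seed simply makes the item reproducible from one printing of the test to the next.

```python
import random

# Build an ordering item: present the words of a sentence in scrambled
# order; the original sentence is the answer key.
def make_ordering_item(sentence, seed=0):
    words = sentence.split()
    rng = random.Random(seed)   # fixed seed -> reproducible item
    scrambled = words[:]
    while scrambled == words:   # guarantee the order actually changes
        rng.shuffle(scrambled)
    return scrambled, sentence

item, key = make_ordering_item("The students wrote their answers quickly")
print(" / ".join(item))
```

Scoring such an item on the 2-1-0 scale suggested in the notes (fully correct order, partially correct, incorrect) keeps marking quick and consistent.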

PROCESS WRITING
Many educators advocate a process approach to writing. Free writing, outlining, drafting, and revising are strategies that help writers create effective texts. Writers need to know their subject, purpose, and audience in order to write, and developing main and supporting ideas is central to essay writing. Some tasks commonly addressed in academic writing courses are compare/contrast, problem/solution, pros/cons, and cause and effect. Every genre of writing requires different conventions; knowing the conventions and opportunities of a genre helps students write effectively. Assessment of tasks in an academic writing course can be formative and informal.

135. RESPONSIVE AND EXTENSIVE WRITING
1. Authenticity (face and content validity): the teacher becomes less an instructor and more a coach or facilitator. Assessment is formative, and positive washback outweighs practicality and reliability.
2. Scoring: both how students string words together and what they say.
3. Time: no time constraints means freedom for drafts before the finished product. A questioned issue is the timed impromptu format.

136. PARAPHRASING
Its importance: to say something in one's own words, to avoid plagiarism, and to offer some variety in expression. Test-takers' task: paraphrasing sentences or paragraphs with purposes in mind. Assessment type: informal and formative. Scoring: conveying a similar message is primary.

137. GUIDED QUESTION AND ANSWER
Its importance: it provides the benefits of guiding test-takers without dictating the form of the output. Assessment type: informal and formative. Scoring: either on a holistic scale or an analytic one.

138. PARAGRAPH CONSTRUCTION TASKS
Topic sentence writing: the presence or absence of a topic sentence; the effectiveness of the topic sentence.
Topic development in a paragraph: the clarity of expression; the logic of the sequence; the unity and cohesion; the overall effectiveness.
Multi-paragraph essay: addressing the topic, main idea, and purpose; organizing supporting ideas; using appropriate details to support ideas; facility and fluency in language use; demonstrating syntactic variety.

139. TEST OF WRITTEN ENGLISH (TWE(R))
Time allocated: a 30-minute time limit, with no preparation ahead of time. Prompts are prepared by a panel of experts. Scoring: the mean of two independent ratings based on a holistic scale, by two trained raters working independently. Limitations: inauthentic (not real life), puts test-takers into an artificially time-constrained context, and is inappropriate for instructional purposes. Strength: serves administrative purposes. Six steps help test-takers succeed: 1. Carefully identify the topic. 2. Plan your supporting ideas. 3. In the introductory paragraph, restate the topic and state the organizational plan of the essay. 4. Write effective supporting paragraphs (show transitions, include a topic sentence, specify details). 5. Restate your position and summarize in the concluding paragraph. 6. Edit sentence structure and rhetorical expression.

140. SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING
Holistic scoring. Definition: assigning a single score to represent a general overall assessment. Purpose of use: appropriate for administrative purposes, such as admission into an institution or placement in a course. Advantages: quick scoring; high inter-rater reliability; scores easily interpreted by lay persons; emphasizes the strengths of the written piece; applicable to many different disciplines. Disadvantages: no washback potential; masks differences across the subskills within each score; not applicable to all genres; needs trained evaluators to use the scale accurately.
Primary trait scoring. Definition: assigning a score based on the effectiveness of the text in achieving its purpose (accuracy, clarity, expression of opinion). Purpose of use: to focus on the principal function of the text. Advantages: practical; allows both the writer and the scorer to focus on function and purpose. Disadvantage: the text is not broken down into subcategories with separate ratings for each.
Analytic scoring. Definition: breaking the text down into subcategories and giving a separate rating for each. Purpose of use: classroom instructional purposes. Advantages: more washback into the further stages of learning; diagnoses both the weaknesses and the strengths of a piece of writing. Disadvantage: lower practicality, since scorers have to attend to details within each sub-score.

141. BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING
Here the writer is discussing the process approach to writing and how assessment takes place within it. This approach pays attention to the various stages that any piece of writing goes through: by spending time with learners on pre-writing phases, drafting, re-drafting, and finally producing a finished version of their work, a process approach aims to get to the heart of the various skills that most writers employ. Types of responding: self, peer, and teacher responding. Assessment type: informal and formative, with potential positive washback. Role of the assessor: guide and facilitator.

142. GUIDELINES FOR ASSESSING THE STAGES OF WRITTEN COMPOSITION
Initial stages. Focus: meaning, main idea, and organization; grammar and vocabulary are secondary. Ignore: grammatical and lexical errors, and minor errors. Indicate: global errors, but do not correct them.
Later stages. Focus: fine-tuning toward a final version. Indicate: problems related to cohesion, documentation, and citation.

10 BEYOND TESTS: ALTERNATIVES IN ASSESSMENT
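Analytic scoring, as described above, reports separate sub-scores that a teacher can weight and combine into a total. The sketch below is a hypothetical illustration: the five category names and their weights are assumptions chosen for the example, not a rubric given in these notes.

```python
# Combine analytic sub-scores into one weighted total.
# Categories and weights are illustrative only.
WEIGHTS = {
    "content": 0.30,
    "organization": 0.25,
    "vocabulary": 0.20,
    "syntax": 0.15,
    "mechanics": 0.10,
}

def analytic_total(subscores, weights=WEIGHTS, scale_max=5):
    """Each sub-score is a 0..scale_max band rating; return a 0-100 total."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    raw = sum(weights[c] * subscores[c] for c in weights)
    return round(100 * raw / scale_max, 1)

essay = {"content": 4, "organization": 3, "vocabulary": 4,
         "syntax": 3, "mechanics": 5}
print(analytic_total(essay))
```

Because each sub-score survives in the report, the diagnostic washback the notes mention (students see exactly where they are weak) is preserved, at the cost of the extra rating time the notes list as a disadvantage.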

feelings. On the other hand. such as the following: a. Language learning logs b. is portfolio development. Communicate assessment criteria to students. State objectives clearly. test scores. tap into higherlevel thinking and problem-solving skills. Conferences and interviews and self assessment are Open ended in their time orientation and format Contextualized to a curriculum Referenced to the criteria ( objectives) of that curriculum and Likely to build intrinsic motivation. newspaper or magazine clippings. minimize time and money &#x2022. multiple choice decontextualized. SELF AND PEER ASSESSMENT Five categories of self and peer assessment: 1. Strategies based learning logs e. in this category. &#x2022. especially within a framework of communicative language teaching. do the scoring. and checklists. open-ended in their time orientation and format. ensure that people. 3. PORTFOLIOS One of the most popular alternatives in assessment. ideas. Diaries of attitudes. CONFERENCES AND INTERVIEWS Conferences Conferences is not limited to drafts of written work including portfolios and journals. Test. Give guidelines on what materials to include. feelings. portfolios include materials such as Essays and compositions in draft and final forms Reports. it is of course important to take the following steps: 1. use tasks that represent meaningful instructional activities. Designate an accessible place to keep portfolios. form. 152. etc Journals. 1. Decide how many students will be observed at one time 3. Design a system for recording observed performances 5. Determine the specific objectives of the observation.s thoughts. usually written with little attention to structure. Acculturation logs 151. Categories or purposes in journal writing. and call upon teachers to perform new instructional and assessment roles. or progress toward goals. Assessment of performance. Designate time within the curriculum for portfolio development. diaries. 
The dilemma of maximizing both practicality and washback The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. OBSERVATIONS In order to carry out classroom observation. and that foster extrinsic motivation. Set up the logistics for making unnoticed observations 4. 2. contextualized to a curriculum. 149. encourage open disclosure of standards and rating criteria. 146. considerable time and effort &#x2022. 6. or correctness. offer much authenticity and washback &#x2022.s oral production ascertains a students need before designing a course of curriculum seeks to discover a students&#x201F. &#x2022. &#x2022. Establish periodic schedules for review and conferencing. Especially large scaled standardized tests. assessment. JOURNALS a journal is a log or account of one&#x201F. much practicality or reliability &#x2022. Characteristics of Alternative Assessment require students to perform. cannot offer much washback or authenticity &#x2022. a student typically monitors him or herself in either oral or written 23 . project outlines Poetry and creative prose Artwork. &#x2022. and other affective factors g. 7. 2. Provide positive washback giving final assessment 150. Self-assessment reflections f. Conferences must assume that the teacher plays the role of a facilitator and guide.          Indicate: Global errors but not corrected Later stages Focus: Fine tuning toward a final version Ignore: Indicate: Problems related to cohesion/documentation/citation 144. create. tasks like portfolios. Successful portfolio development will depend on following a number of steps and guidelines. norm-referenced. not of an administrator. &#x2022. likely to build intrinsic motivation &#x2022. allow students to be assessed on what they normally do in class every day. and Self-and peer. demonstrations. journals. produce. 
Interviews Interview may have one or more of several possible goals in which the teacher assesses the student&#x201F. &#x2022. 10 BEYOND TESTS: ALTERNATIVES IN ASSESSMENT 145. 147. Responses to readings d. 5. or do something. photos. reactions. of a formal assessment. are non-intrusive in that they extend the day-to-day classroom activities. DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK LARGE SCALE STANDARDIZED TESTS ALTERNATIVE ASSESSMENT one-shot performances timed multiple-choice decontextualized norm-referenced foster extrinsic motivation highly practical.assessments-comments. referenced to the criteria (objectives) of that curriculum &#x2022. 148. and other personal reflection . &#x2022. Audio and/or video recordings of presentations. &#x2022. not machines. Grammar journals c. using human judgment. learning style and preferences One overriding principle of effective interviewing centers on the nature of the questions that will be asked. use real-world contexts or simulations. focus on processes as well as products. provide information about both the strengths and weaknesses of students. are multi-culturally sensitive when properly administered. 4. and written homework exercises Notes on lecturer. Plan how many observations you will make 153. reli-able instruments &#x2022. tend to be one shot performances that are timed.
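Step 4 of the observation procedure above ("design a system for recording observed performances") is often realized as a simple tally sheet. A minimal sketch of such a system; the behavior categories and student names are invented for illustration.

```python
from collections import Counter

# A tally sheet for classroom observation: count how often each
# pre-defined behavior is observed per student.
BEHAVIORS = ["volunteers an answer", "asks for clarification",
             "uses target grammar"]

class ObservationLog:
    def __init__(self):
        self.tallies = {}   # student -> Counter of observed behaviors

    def record(self, student, behavior):
        assert behavior in BEHAVIORS, "unknown category"
        self.tallies.setdefault(student, Counter())[behavior] += 1

    def report(self, student):
        return dict(self.tallies.get(student, Counter()))

log = ObservationLog()
log.record("Ana", "volunteers an answer")
log.record("Ana", "volunteers an answer")
log.record("Ana", "asks for clarification")
print(log.report("Ana"))
```

Restricting entries to a fixed list of categories (the `assert` above) is what makes the record systematic rather than anecdotal, which is the point of designing the system before observing.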

3. Metacognitive assessment for setting goals: tasks of this kind are more strategic in nature, with the purpose not just of viewing past performance or competence but of setting goals and maintaining an eye on the process of their pursuit.
4. Socioaffective assessment: yet another type of self- and peer-assessment comes in the form of methods of examining affective factors in learning. Such assessment is quite different from looking at and planning linguistic aspects of acquisition.
5. Student-generated tests: a final type of assessment, not usually classified strictly as self- or peer-assessment, is the technique of engaging students in the process of constructing tests themselves.

154. GUIDELINES FOR SELF- AND PEER-ASSESSMENT
Self- and peer-assessment are among the best possible formative types of assessment and possibly the most rewarding. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully: 1. Tell students the purpose of the assessment. 2. Define the task clearly. 3. Encourage impartial evaluation of performance or ability. 4. Ensure beneficial washback through follow-up tasks.

155. A TAXONOMY OF SELF- AND PEER-ASSESSMENT TASKS
It is helpful to consider a variety of tasks within each of the four skills (listening, speaking, reading, writing). An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern quite consistent with the other alternatives to assessment analyzed in this chapter: practicality, for instance, can reach a moderate level with such procedures as checklists and questionnaires.

CHAPTER 11: GRADING AND STUDENT EVALUATION

156. CALCULATING GRADES: ABSOLUTE AND RELATIVE GRADING
Absolute grading: if you pre-specify standards of performance on a numerical point system, you are using an absolute system of grading. For example, having established points for a midterm test, points for a final exam, and points accumulated for the semester, you might adhere to a pre-set table of specifications. The key to making an absolute grading system work is to be painstakingly clear on competencies and objectives.
Relative grading: relative grading is more commonly used than absolute grading. It is usually accomplished by ranking students in order of performance (percentile ranks) and assigning cut-off points for grades. An older, relatively uncommon method of relative grading is what has been called grading "on the curve," a term that comes from the normal bell curve of normative data plotted on a graph.

158. TEACHERS' PERCEPTIONS OF APPROPRIATE GRADE DISTRIBUTIONS
Most teachers bring to a test or a course evaluation an interpretation of estimated appropriate distributions, follow that interpretation, and make minor adjustments to compensate for such matters as unexpected difficulty. What is surprising, however, is that teachers' preconceived notions of their own standards for grading often do not match their actual practice.

GUIDELINES FOR SELECTING GRADING CRITERIA
It is essential for all components of grading to be consistent with an institutional philosophy and/or regulations (see below for a further discussion of this topic). All of the components of a final grade need to be explicitly stated in writing to students at the beginning of a term of study, with a designation of percentages or weighting figures for each component. If your grading system includes items (d) through (g) in the questionnaire above (improvement, behavior, effort, motivation), it is important for you to recognize their subjectivity; but this should not give you an excuse to avoid converting such factors into observable and measurable results. Consider allocating relatively small weights to items (c) through (h), so that a grade primarily reflects achievement: a designation of 5 percent to 10 percent of a grade to such factors will not mask strong achievement in a course.

INSTITUTIONAL EXPECTATIONS AND CONSTRAINTS
For many institutions letter grading is foreign, but point systems (100 points, or percentages) are common. Some institutions refuse to employ either a letter grade or a numerical system of evaluation and instead offer narrative evaluations of students; this preference for more individualized evaluations is often a reaction to the overgeneralization of letter and numerical grading. In some systems, one single final examination is the accepted determinant of a student's entire course grade.

CROSS-CULTURAL FACTORS AND THE QUESTION OF DIFFICULTY
A number of variables bear on the issue of test difficulty. In many cultures, it is unheard of to ask a student to self-assess performance. In some cultures, teachers assign a grade and nobody questions the teacher's criteria.
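The contrast between absolute and relative grading described above can be made concrete: an absolute system fixes score boundaries before the test, while a relative system derives boundaries from the class's percentile ranks, so the same raw score can earn different grades in different classes. The cut-offs below are illustrative assumptions, not values from these notes.

```python
# Absolute grading: fixed boundaries, set before the test is given.
def absolute_grade(score):
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

# Relative grading: rank the class and apply percentile cut-offs.
def percentile_rank(score, scores):
    below = sum(1 for s in scores if s < score)
    return below / len(scores)

def relative_grade(score, scores):
    p = percentile_rank(score, scores)
    for floor, letter in [(0.90, "A"), (0.70, "B"), (0.40, "C"), (0.15, "D")]:
        if p >= floor:
            return letter
    return "F"

scores = [55, 62, 68, 71, 74, 78, 81, 85, 88, 95]
for s in (71, 95):
    print(s, absolute_grade(s), relative_grade(s, scores))
```

With this class, a raw score of 71 earns a C absolutely but only a D relatively, which illustrates why the notes say relative grading allows the teacher's own interpretation and adjustment for unexpected test difficulty.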

In some cultures a "hard" test is a good test: the measure of a good teacher is one who can design a test so difficult that no student could achieve a perfect score. The fact that students fall short of such marks of perfection is a demonstration of the teacher's superior knowledge and, as a corollary, grades of A are reserved for a highly select few; in such a context, the notion of a teacher's preparing students to do their best on a test is an educational contradiction. Elsewhere, teachers typically expect that a good test results in a distribution like the one in the bar graph for a "great bunch" of students: a large proportion of As and Bs, a few Cs, and maybe a D or an F for the "deadbeats" in the class.

161. How do you gauge such difficulty as you design a classroom test that has not had the luxury of piloting and pre-testing? The answer is complex. It is usually a combination of a number of possible factors: experience as a teacher (with appropriate intuition); adeptness at designing feasible tasks; special care in framing items that are clear and relevant; mirroring in-class tasks that students have mastered; variation of tasks on the test itself; reference to prior tests in the same course; a thorough review and preparation for the test; knowledge of your students' collective abilities; and a little bit of luck.

162. WHAT DO LETTER GRADES "MEAN"?
Typically, institutional manuals for teachers and students will list the following descriptors of letter grades: A: excellent; B: good; C: adequate; D: inadequate/unsatisfactory; F: failing/unacceptable. The overgeneralization implicit in letter grading underscores the meaninglessness of the adjectives typically cited as descriptors of those letters. Every teacher who uses letter grades or a percentage score to provide an evaluation, whether on a formal assessment procedure or a summative end-of-course assessment, should: a. use a carefully constructed system of grading; b. assign grades on the basis of explicitly stated criteria; and c. base those criteria on the objectives of the course or assessment procedure(s).

163. ALTERNATIVES TO LETTER GRADING
For assessment of a test, paper, report, extra-class exercise, or other formal, scored task, the possibilities beyond a simple number or letter include: a teacher's marginal and/or end comments; a teacher's written reaction to a student's self-assessment of performance; a teacher's review of the test in the next class period; peer-assessment of performance; self-assessment of performance; and a teacher-student conference.
164. For summative assessment of a student at the end of a course, those same additional assessments can be made, perhaps in modified forms: a teacher's marginal and/or end-of-exam/paper/project comments; a teacher's summative written evaluative remarks on a journal, portfolio, report, paper, or other tangible product; a teacher's written reaction to a student's self-assessment of performance in a course; a completed summative checklist of competencies, with comments; narrative evaluations of general performance on key objectives; and a teacher's conference with the student.

A more detailed look is now appropriate for a few of the summative alternatives to grading:
1. Conferences. Perhaps enough has been said about the virtues of conferencing: you already know that the impracticality of scheduling sessions with students is offset by its washback benefits.
2. Narrative evaluations. In protest against the widespread use of letter grades as exclusive indicators of achievement, a number of institutions have at one time or another required narrative evaluations of students (pp. 296-297). In some instances those narratives replaced grades; in others they supplemented them. Advantages: individualization; evaluation of multiple objectives of a course; face validity; washback potential. Disadvantages: the results are not quantified by admissions and transcript evaluation offices; the method is not practical (it is time-consuming); and teachers may succumb to formulaic narratives that follow a template, so that uniform measures are applied across all students, with only a small chance of genuine individualization.
3. Checklist evaluations. To compensate for the time-consuming impracticality of narrative evaluation, some programs opt for a compromise: a checklist with brief comments from the teacher, ideally followed by a conference and/or a response from the student, in which the student states his or her own goals (in light of the results of the checklist and the teacher's comments). Advantages: increased practicality and reliability; teacher time is minimized; and some open-ended comments from the teacher are still available. When the checklist format is accompanied, as in this case, by comments, a conference, and a student response, virtually none of the disadvantages of narrative evaluations remain. In some institutions, checklists are accompanied by letter grades as well.
4. Self-assessment. Self-assessment of end-of-course attainment of objectives is recommended through the use of the following: checklists; a guided journal entry that directs the student to reflect on the content and linguistic objectives; an essay that self-assesses; and a teacher-student conference.

165. Is there a solution to the gate-keeping role of grades? Educators everywhere must work to persuade the gatekeepers of the world that letter and numerical evaluations are simply one side of a complex representation of a student's ability.

SOME PRINCIPLES AND GUIDELINES FOR GRADING AND EVALUATION
You should now understand that:
- grading is not necessarily based on a universally accepted scale
- grading is sometimes subjective and context-dependent
- grading of tests is often done "on the curve"
- grades reflect a teacher's philosophy of grading
- grades reflect an institutional philosophy of grading
- grades often conform, by design, to a teacher's expected distribution of students across a continuum
- cross-cultural variation in grading philosophies needs to be understood
- tests do not always yield an expected level of difficulty
- letter grades may not "mean" the same thing to all people
- alternatives to letter grades or numerical scores are highly desirable as additional indicators of achievement

With those characteristics of grading and evaluation in mind, the following principled guidelines should help you be an effective grader and evaluator of student performance:
1. Develop an informed, comprehensive personal philosophy of grading that is consistent with your philosophy of teaching and evaluation.
2. Ascertain an institution's philosophy of grading and, unless otherwise negotiated, conform to that philosophy (so that you are not out of step with others).
3. Design tests that conform to appropriate institutional and cultural expectations of the difficulty that students should experience.
4. Select appropriate criteria for grading and their relative weighting in calculating grades.
5. Communicate criteria for grading to students at the beginning of the course and at subsequent grading periods (mid-term, final).
6. Triangulate letter grade evaluations with alternatives that are more formative and that give more washback.
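The guideline on weighting criteria can be made concrete with a small arithmetic sketch. The criteria names, weights, and letter-grade cutoffs below are invented for illustration only; the text does not prescribe any particular scheme, so treat them as placeholders a teacher would replace with explicitly stated, course-specific criteria.

```python
# Hypothetical sketch: combining explicitly weighted criteria into one course
# grade. Criteria, weights, and cutoffs are example values, not prescribed.

def weighted_grade(scores, weights):
    """Return a 0-100 course score from per-criterion scores (0-100)
    and relative weights that sum to 1.0."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[criterion] * w for criterion, w in weights.items())

def to_letter(score):
    """Map a numeric score to a letter grade using example cutoffs."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

# Example: weights communicated to students at the start of the course.
weights = {"oral interview": 0.30, "essays": 0.30, "quizzes": 0.20, "journal": 0.20}
scores = {"oral interview": 85, "essays": 92, "quizzes": 78, "journal": 88}

total = weighted_grade(scores, weights)
print(f"{total:.1f} -> {to_letter(total)}")  # prints 86.3 -> B
```

Making the weights an explicit, inspectable table (rather than an impression formed at marking time) is exactly what "explicitly stated criteria" asks for: the same inputs always produce the same grade, and the scheme can be shared with students in advance.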