

Language testing

Ammar Mustafa Mahadi, Open University of Sudan


Teaching and Testing: Introduction


Teaching and testing are so closely interrelated that it is impossible to work in either field without being concerned with the other. Students learn a language. Teachers teach and test what students learn. Teachers test samples of what students learn. The samples are usually chosen from the learning problems, as knowing the problems is knowing the language.

The Need for Testing
o To evaluate students’ performance.
o To enable teachers to increase their own effectiveness in teaching.
o To locate the precise difficulties encountered by students.
o To see which parts of the syllabus need amendment.
o To motivate learners, as students learn from their weaknesses.
o To detect weaknesses in order to provide remedial work and/or additional practice.
o To reinforce learning as well as teaching.


o To have washback on the teaching methods and learning techniques used.
o To have washback on the syllabus taught.

Backwash (also washback)
Backwash is the effect of testing on teaching and learning. Language test washback is either positive or negative. Positive washback is said to result when a testing procedure which encourages good teaching practice is used.

e.g. The use of an oral interview in a final examination may encourage teachers to practise conversational language with their students. Negative backwash may occur when the test items have little relationship to the teaching curriculum, e.g. if the writing skill is tested by multiple-choice items.

2. Kinds of Tests
There are several kinds of tests. The kind of test to be used in each case depends on the purpose of the test.

In this course we will discuss four kinds of tests:
o Proficiency tests.
o Achievement tests.
o Diagnostic tests.
o Placement tests.

Proficiency Tests
Designed to measure people’s ability in the language, regardless of any training they have had; i.e. the test is not designed on a certain syllabus or textbook. The purpose of this type of test is to see how far candidates are proficient to use the language for a certain purpose: what they can do with the language, or whether they can use the language for a particular purpose. Example: whether a student can take a university course of electricity in English. Another type of proficiency test may be designed to see whether candidates have reached a certain level of proficiency. An example is the Cambridge examinations: Cambridge First Certificate Examination and Cambridge Proficiency Examination.

Achievement Tests

Achievement tests are directly related to language courses. The purpose of such tests is to see how successful the learners are in achieving the learning objectives. Achievement tests are of two types:
o Final achievement tests: administered at the end of the course; based on the course or syllabus content taught; may be written by the Ministry of Education or another board.
o Progress tests: intended to measure the progress the students are making; usually administered at different stages of the course being taught, hence should be based on the parts that have been covered by the learner.

Diagnostic Tests
Diagnostic tests are used to identify students’ strengths and weaknesses. A disadvantage of a diagnostic test is that it cannot be short: it has to be vast in order to ensure reliability, since one or two questions on a language category might be answered correctly by chance.

Placement Tests

Placement tests are used to provide information which will help to place students at the stage of the teaching programme most suitable to their abilities. Successful placement tests are those constructed for a particular situation. Therefore, every institution (with a different language course) should design its own placement test.

Test Validity
A valid test is one which measures accurately what it is intended to measure. This means a test for beginners cannot be used for advanced students.

The Aspects of Validity:

Content Validity
The contents of the test must contain representatives of the language categories that the test is designed to measure. To ensure content validity, specification of the measured language / skill items should be done. This will also give a fair washback.

Criterion-related Validity
There are two types of criterion-related validity:
a. Concurrent validity
b. Predictive validity

Concurrent validity (is more used): This is done when the time is not enough for all tests. Therefore, each student will be given a shorter time on some language categories of the test. On the other hand, a random sample of the students will be given the appropriate time to take the whole test. The results of the two groups will then be compared.
Example: A class of 50 students. A test for oral communication.
o Validity group, group (1): 50 students: each student is interviewed for 5 minutes on some parts of the test.
o Criteria group, group (2): 10 students: each takes the whole test in the full time of the test, 30 minutes.
The percentages of scores of the two groups are compared and the validity of the test of the first group is calculated. If the comparison between the scores of the two groups reveals a high level of agreement, then the validity of group one is good. The validity coefficient can be calculated.

Construct Validity
A test constructed to measure a certain specific characteristic of language learning. e.g.: if the communicative approach is adopted throughout a course, then the test should be designed to measure the four skills as integrated skills.

Face Validity
To ensure face validity the test designer may show the test to other colleagues and friends.
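The comparison in the concurrent validity example above can be expressed as a single validity coefficient. The sketch below is only an illustration under stated assumptions: it assumes the ten criterion students also took the short five-minute interview, it takes the coefficient to be a Pearson correlation between their two sets of scores, and the percentage scores are invented. The names pearson, short_interview and full_test are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each list")
    return cov / sqrt(var_x * var_y)


# Invented percentage scores for the 10 students of the criteria group.
short_interview = [55, 62, 70, 48, 81, 66, 59, 74, 90, 52]  # 5-minute interview
full_test = [58, 60, 73, 50, 85, 63, 61, 70, 88, 55]        # full 30-minute test

validity_coefficient = pearson(short_interview, full_test)
print(f"validity coefficient: {validity_coefficient:.2f}")
```

A coefficient close to 1 would indicate a high level of agreement between the short interview and the full test; what counts as high enough is a judgment for the tester.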

Colleagues and friends who review the test might see the test objective in a more objective manner. A test designed in one country might have low face validity in another country; authentic material should be used.

Content Validity
This is a type of test designed to contain representatives of all the contents of the course objectives. A list of the course objectives should be put in a table and then the weights can be assigned. The relation between the objectives should also be considered: more important items should have more weight.

Reliability
A test is said to be reliable if it is administered to the same candidates on different occasions and it produces the same measurements. A reliable test is consistent in its measurement. This can be seen in two ways:
o Test / re-test reliability: when the test is administered on different occasions, without additional teaching done between the different occasions.
o Mark / re-mark reliability: when the test papers are marked by different teachers and are awarded the same marks or grades.

Factors Affecting Reliability
o The extent of the sample: the greater the sample, the better the reliability.
o The administration conditions: e.g. time, recording of oral test.

o Scoring the test: objective multiple-choice tests are more reliable than subjective marking of compositions.
o Individuals’ personal factors: such as illness and motivation.
o Instructions: whether the rubrics are clear for all candidates.

Profile Reporting Reliability
A recent dimension in the concept of reliability, introduced by communicative language testing. In order to get a full profile of a student’s ability in the target language, it is necessary to assess his / her performance for each of the different areas of communication. Communicative areas of the language are such as: listening comprehension, speaking and listening, reading, reading and writing (summarizing), and writing.

Stages of Test Construction
The tester decides what it is that he / she wants to know and for what purpose. Hence, the tester has to answer the following questions:
o What kind of test is it to be?
o What is its precise purpose?
o What are the language abilities to be tested?

o How detailed must the results be?
o How accurate must the results be?
o How important is backwash?
o What constraints are involved?

Stages of Test Construction
Planning stage: decide on
o the content,
o the general layout,
o the length,
o the time limit,
o the types of test items (questions),
o the scoring method,
o the instructions to be given.
Pre-pilot stage: try the rough draft of the test on a small group of people in order to see the general impact of the test and to identify the unsatisfactory items.
Pilot stage: try out the draft on a larger sample of the same kind of people. This will provide washback for test administration and for revision.
Final validation: try out the final form of the test. This will give evidence of practicality and validity.

Test Techniques and Measuring Overall Ability
The phrase ‘test techniques’ means ways of eliciting behaviour from candidates which will tell us about their language abilities. A technique should be valid, reliable, economical in time and with useful washback.

Multiple Choice
In this technique a question is designed with a stem and options. One of the options is the correct answer and the others are distractors. The candidate is to choose the correct option.
Example:
Stem: He has been here …… half an hour.
Options: A: during  B: for  C: while  D: since
B is the correct answer. A, C and D are distractors.
The main advantages of multiple choice are that its scoring is perfectly reliable, rapid and economical on time.
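The reason multiple-choice scoring is reliable and rapid is that marking reduces to comparing each response with a fixed key. A minimal sketch, using the stem and options from the example above; the class name MultipleChoiceItem and the function score_test are hypothetical and not part of any testing library.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    stem: str
    options: dict  # option letter -> option text
    key: str       # letter of the correct option

    def distractors(self):
        """Letters of the incorrect options, in the order they were written."""
        return [letter for letter in self.options if letter != self.key]


def score_test(items, responses):
    """One mark per response that matches the key; no marker judgment involved."""
    return sum(
        1
        for item, response in zip(items, responses)
        if response.strip().upper() == item.key
    )


item = MultipleChoiceItem(
    stem="He has been here ...... half an hour.",
    options={"A": "during", "B": "for", "C": "while", "D": "since"},
    key="B",
)

print(item.distractors())
print(score_test([item], ["b"]))
```

Because two markers applying the same key must award the same marks, mark / re-mark agreement is guaranteed for this item type.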

Disadvantages of Multiple Choice Tests
o The technique tests only recognition knowledge, i.e. the candidate who recognizes the correct option might not be able to use the same form when speaking or writing.
o The effect of guessing on the scores is 25% (if the test contains four options). This also means that some candidates might get higher scores and others lower.
o Distractors are not always available. This limits what can be tested. Example: the past simple and the present perfect. Other distractors are not quite relevant.
o Cheating might be facilitated. To avoid cheating, a suggestion is to have two versions of the same test, with a different order of options.
o Difficulty of writing questions: the difficulty of writing possible test items makes testers commit mistakes, such as: more than one possible answer; ineffective distractors; clues in the options as to which is correct.

Cloze Test Technique
Cloze test techniques are of various types. They can be used to measure more than one language ability at a time, though they are recommended for testing reading skills.
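One common way to build a cloze passage is to leave the opening sentence intact as a lead-in and then delete words at a fixed interval. The sketch below illustrates that idea only; the function name make_cloze, its parameters and the sample passage are hypothetical, and the sentence splitting is deliberately naive.

```python
import re


def make_cloze(passage, every=7, lead_in_sentences=1):
    """Return (gapped_text, answers), deleting every `every`-th word after the lead-in."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    lead_in = " ".join(sentences[:lead_in_sentences])
    body_words = " ".join(sentences[lead_in_sentences:]).split()

    answers = []
    gapped = []
    for position, word in enumerate(body_words, start=1):
        if position % every == 0:
            # Keep surrounding punctuation; blank out only the word itself.
            core = word.strip(".,;:!?\"'()")
            if core:
                answers.append(core)
                gapped.append(word.replace(core, f"({len(answers)})......", 1))
                continue
        gapped.append(word)

    text = " ".join(part for part in (lead_in, " ".join(gapped)) if part)
    return text, answers


# Invented sample passage, for illustration only.
sample = (
    "Tests are part of every language course. "
    "Teachers use them to find out what students have learned and "
    "which parts of the syllabus still need more practice in class."
)
gapped_text, answer_key = make_cloze(sample, every=7)
print(gapped_text)
print(answer_key)
```

Changing the interval or the passage changes which words are deleted, so the tester should still check the deleted words by hand before using the passage.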

Varieties of Cloze Test Procedure
Blank-filling: The tester deletes a number of words in a passage, leaving blanks for the testee to fill in. The first sentences are usually left as a ‘lead-in’. Deletions are made at a fixed number of words, e.g. every seventh word is deleted. Careful selection of the text and of the deleted words is needed.

Characteristics of Blank-Filling Procedure:
Positive:
o Easy to construct.
o Easy to score.
o Easy to administer.
o Integrative method, i.e. it measures more than one language ability.
Negative:
o If ‘reading’ is measured, we cannot be sure of the speaking ability of the testee.
o Different passages used give different washback.
o The deletion of words might give a different …

Example: fill in the blanks:

Before reading tests in the second or foreign language can be successfully constructed, the first language reading skills of the students must be ascertained. Clearly there (1)……… often little purpose in testing those reading (2)………… in the second language which the students have not yet developed in their (3)………… language. However, the mere fact that (4)……… has mastered the reading skills in the first (5)……. is no guarantee that he / she will be able to transfer those skills of (6)……. into the (7)………. language.
Answers: (1) is (2) skills (3) first (4) students (5) language (6) reading (7) second

The C-Test
The C-test is a type of cloze test. But, instead of deleting whole words, the second half of every second word is deleted.
Characteristics of the C-test:
o Shorter passages can be used.
o More passages can be used.
o Only exact scoring is necessary.
o Can be used to test ‘parts of speech’.
The advantages given to ‘fill-in gap’ can also be given to the C-test.
Example: Complete the words in the paragraph:

There are usually five men in the crew of a fire engine. One o…… them dri…… the eng…… . The lea……. sits bes……. the dri…… . The o…… firemen s………. inside t…… cab of…… the f………. engine.

Dictation
Advantages of Dictation:
o Dictation can be used to test spelling and punctuation.
o Dictation can be used to test listening.
o A dictation test is easy to create and to administer.
A Disadvantage:
o Not easy to score, more so when dictating jumbled words to be written as sentences.
o To make scoring easy, partial dictation can be used, i.e. to dictate a text when testees have the written form with gaps to fill in.

Steps for Dictation
o The tester (teacher) reads all the text straight through. Testees (students) listen.
o The tester reads the text in stretches (meaningful parts), pausing between each stretch and the next for a time long enough for students to write.
o After testees finish writing, the tester may re-read the text once more, for testees to check.

Testing Writing

In order to test writing, we have to make testees write.

General Writing Stages:
o Mechanical abilities: the ability to write correct spelling, words and punctuation.
o Language use: the ability to write correct sentences.
o Treatment of content: the ability to think creatively and write relevant information.
o Stylistic skills: the ability to manipulate sentences and paragraphs, and use language effectively.
o Judgment skills: the ability to write for a particular purpose with a particular audience in mind (communicatively).

Specifications of a Writing Test
A writing test should measure only the writing skills. The writing skills that we should test are those testees were trained to master.
Content: (for a class test) the test should measure the writing skills the students were trained in.
Tasks: the test is to measure the types of texts and functions that students were learning to perform (included in the syllabus textbook). Example:
First level:
o greetings
o expressions of thanks
o wants / needs

o apology
Second level:
o describe
o explain
o compare / contrast
o argue
Material: test material should be authentic (or otherwise, the degree to which the authentic material is to be altered should be considered).
Rubrics: must be clear.
Length of the test: should be fixed.
Types of questions: chosen as appropriate.

Using Techniques to Test Writing
Multiple-choice Questions
Example:
Instruction: check the alternative that is correct:
Punctuation: “Do you plan to come tomorrow ( )” “Yes, I do”.
a. (:)  b. (.)  c. (,)  d. (?)
Spelling: “Did you rec……ve a letter from home today?”

(a) ei (b) ee (c) i (d) ie
A group of words is a ……rase.
(a) f (b) gh (c) p (d) ph
There were three men and two w…men present.
(a) i (b) e (c) o (d) u
Vocabulary: My nephew’s sister is my …….
(a) a….t (b) n…e (c) c….n (d) s…r

B) Filling the Gap
Can be used to measure writing skills in different categories. Example:
Grammar (tenses): fill the blank with the correct form: Yesterday I …………. (watch) a good film on T.V.
Vocabulary: A group of words that gives full meaning is a …….
Punctuation: put in the correct punctuation mark: He is a clever student ……. last exam he won the school prize …….

Guided Composition

Because candidates should know just what is required of them, they should be restricted. The guided composition technique uses different types of prompts to do that, such as notes or pictures.
Example: Compare the benefits of university education in English with its drawbacks. Use the points given below. You should write about one page.
Arabic:
o Easier for students
o Easier for most teachers
o Saves a year in most cases
English:
o Most scientific references are in English.
o Learning a second language is an advantage (for more education and culture).
o English is an international language, which means better jobs.

Scoring Writing Performance
There are generally two methods to score students’ performance on a writing test.
Analytic method: In this method the tester allots certain scores for each category he / she is measuring, e.g. spelling, word order, grammar.
Holistic method (also called impressionistic method): the tester assigns a certain score to a piece of writing on the basis of an overall impression of it.

Testing Oral Ability
The main objective here is to measure the extent of the learner’s development of the ability to interact in the target language.

Therefore the tester should set tasks that represent samples of the performances the learner is expected to perform.

Task Specifications:
Operations: At an intermediate level, a learner is expected to perform tasks such as:
o Thanks
o Requirements
o Opinions
o Information
o Want / need
Accuracy: A certain limit of accuracy is expected from the learner when producing the target language, though some grammatical / lexical errors that do not destroy communication are acceptable.
Communicative function: Appropriate use of language to function should be considered.
Flexibility: To measure the learner’s ability to initiate a conversation and to adapt to new topics.

Size: The ability to produce more complex utterances and develop these into discourse.

Oral Test Format
Interview: This is the most used format. Drawbacks:
a. The interviewee speaks to a superior.
b. There are no initiatives from the testee. Many functions (such as asking about information) would not be assessed.
Interaction with peers: Two or more students are asked to discuss a topic. A drawback is that one student might affect the others, e.g. takes most initiative and deprives the others.
Response to tape-recordings: This procedure helps to elicit uniform responses. Therefore it has a high reliability. A drawback is that there is no way to follow up the testee’s flexibility in responses.

Elicitation Techniques
Questions and requests for information: Direct questions can be used, e.g.: Can you tell me what you think of …? Explain how / why …? Yes / no questions should be avoided.
Picture: A single picture can be used for eliciting descriptions. A series of pictures is suitable as a basis of narration.
Role play: Candidates can be asked to assume a role in a particular situation. The tester observes. An advantage is that different functions can be tested. A disadvantage is that a testee might affect the others.

Testing Reading
The task of the tester is to set reading tasks that will result in behaviour that will demonstrate their successful completion. Setting such reading tasks is not easy, because receptive skills do not usually manifest themselves in overt behaviours.

Therefore, it is important to specify clearly what the testee should be able to do.

Reading Skills to be Tested (different levels)
o Scanning for specific information.
o Skimming to get the gist, the general meaning of the text.
o Identifying stages of an argument.
o Identifying examples presented in support of an argument.

Micro-Skills to be Tested
o Identifying referents of reference words, e.g. pronouns.
o Using context to guess the meaning of unfamiliar words.
o Understanding relations between parts of texts by recognising indicators in discourse, especially for the introduction, development, transition and conclusion of ideas.
o Recognising the significance of the use of different tenses.

Selecting a Text for a Reading Test
o Select a representative sample: one that represents the reading skills you need to test.
o Choose a text of a suitable length: testing scanning needs a longer text; for intensive reading, short passages are enough.
o To test scanning: choose a passage with plenty of discrete pieces of information.

e. which word in line 15 means the same as ‘woman and cofortable’ Testing Listening Skills 24 .g. arguments or topics. Do not use a text that students have already read. Identifying referents: e. Possible Techniques of Writing Reading Tests To measure reading skills testers should use questions techniques that measure only reading skills and do not entail writing abilities: Multiple choice questions.g.- Choose a topic which will interest the test.g. Short answers: should not entail much ability of writing. Identifying order of events. list the events (using numbers) as told in the passage. Avoid texts of specific culture bias. Avoid texts of students’ general knowledge. What does the pronoun ‘it’ in line 21 refer to? Guessing the meaning of unfamiliar word: e. True/ False questions. Unique Answer: Filling a gap with a word from the passage.

There are occasions when only listening is practised. Such occasions are like: listening to the radio, a tape-recorder or a lecture. However, in most cases listening is naturally practised with speaking (oral skills). When testing listening, most techniques used for reading tests can be applied (receptive skills).

Listening Skills to be Tested:
o Listening for specific information.
o Listening for the general idea.
o Following directions.
o Following instructions.
o Interpretation of intonation patterns (e.g. sarcasm …).
o Recognition of the function of structures (e.g. interrogative as a request: can you switch on the light?).

Tested Listening Categories
Testers usually test two listening categories:
o Phoneme discrimination and sensitivity to stress and intonation.
o Listening comprehension.

Texts Used in Testing Listening

o Monologues.
o Dialogues.
o Conversations.
o Others: such as announcements, talks, instructions, lectures, etc.
Note: The tester should choose the appropriate category and the appropriate text that are relevant to the testee’s level.

Types of Listening Tests
(1) Phoneme Discrimination Tests
o This type of test consists of a picture accompanied by three or four words spoken by the examiner, in person or on tape. The testee is to write the number of the word that is appropriate for the picture. e.g. spoken: 1. pin 2. pen 3. pair 4. pain
o Different pictures are shown. The testee hears only one word, twice. The testee is to choose the right picture.
o The testee hears three sentences. Two are identical, one is different. The testee is to indicate which two are the same, and which one is different.
o The testee is given a sheet with four written words. He listens to only one word spoken by the examiner. The testee is to indicate which word he heard. e.g. written: ten pen den ben

Spoken: pen.
o The testee hears a sentence. Then he chooses a word, from the four written words, which has occurred in the sentence. e.g. spoken: Put the pan in some hot water. written: pan pen pin pain

(2) Tests of Stress and Intonation
Two types of tests are generally used:
1. Testees listen to a taped sentence and indicate the syllable which carries the main stress of the whole structure. They show the main stress by putting a cross in the bracket under the appropriate syllable. e.g.
spoken: I’ve got three books now.
written: I’ve got three books now.
( ) ( ) (x) ( ) ( )
2. The examiner says a sentence (or plays it on a tape). The testee has to identify the mode of the sentence:
Spoken: He’s a fine goalkeeper.
Written: the speaker is:
o making a statement.
o being sarcastic.

o asking a question.

(3) Testing Statements and Dialogues
These types of tests are designed to measure how well students can understand short samples of speech. They can be used to test listening comprehension on categories such as grammar and lexicon. Two types that can be used are:
1. The testee hears a statement and then chooses the best option from four written paraphrases. e.g.
Spoken: I wish you’d done it when I told you.
Written:
A: I told you and you did it then.
B: I didn’t tell you but you did it then.
C: I told you but you didn’t do it then.
D: I didn’t tell you and you didn’t do it then.
2. The testee hears a short question and has to select the correct response. e.g.
Spoken: Why are you going home?
Written:
A: At six o’clock.
B: Yes, I am.
C: To help my mother.

D: By bus.

(4) Testing Comprehension through Visual Materials
Pictures, diagrams and maps can be used for testing listening on different categories such as grammar and lexis. The advantage of using such materials is that the testee’s performance is independent of other skills. There are many types of test designs:
Type 1: A picture is used in conjunction with a set of spoken statements. The testee decides which statement is true and which is not true.
Type 2: Testees are given a set of five pictures; each picture is somewhat different from the others. The testee listens to four sentences describing one of the pictures. The testee has to pick out the described picture.
Type 3: Simple diagram: students listen to a dialogue about one diagram from a set of four diagrams. The testee is to indicate which diagram the conversation / dialogue is about. e.g.
Spoken: Look! What’s that inside the square? It’s a white circle.

Is that a black circle? Yes, it is. Whereabouts? Above the square.
Type 4: Testees listen to a talk (a lecture) and then answer questions about the talk.

Testing Speaking
Most of the techniques and types used to test listening can also be used to test speaking. However, others can be added:
o Oral interview: an interview done by the tester and two (or more) testees.
o A short problem-solving activity involving a comparison or sequencing of pictures.
o A role play: each testee takes a role, after the situation is set by the tester.
o A longer activity, comprising group discussion.

Testing Grammar
Grammar can be tested by means of many types of questions:
Paraphrase: These require the student to write a sentence equivalent in meaning to one that is given. To restrict the student to the grammatical structure being tested, part of the paraphrase is given.

e.g.: Testing present perfect with for:
It is ten years since I last saw him.
I ……………………….

Completion: the testee is asked to complete the given sentence. e.g.:
Mr. Cole is being interviewed for a job.
Mr. John: Good morning, Mr. Cole. Which school ……….?
Mr. Cole: Oxford College.
Mr. John: And when ………………………………?
Mr. Cole: In 2005.
Mr. John: And since then what ……………….?
Mr. Cole: I worked in a bank for two years. Then I took a car selling job.
Mr. John: ………………………………?
Mr. Cole: Six years.

Cloze Test Questions: This type can be used to test several parts of grammar. e.g.
A: Prepositions of place: A picture can be used. The picture contains several objects, and the testee is to indicate the position of each item in relation to another.
B: Articles: Testees are required to write a, an, the or no article. e.g.: write a, an, the or x for no article:

In England children go to ………. school from Monday to Friday. ………… school that Mary goes to is very small. She walks there each morning with ……… friend. One morning they saw …. man throwing …….. stones at ……. dog. ………. dog was afraid of ………… man.
C: Linking words: One word is required. e.g.: Students should be careful about their examinations ……. they will fail.

Testing Vocabulary
Several techniques can be used to test vocabulary:
Synonyms: the testee is required to indicate the word of closest meaning to a certain word. e.g.: choose the word that is closest in meaning to the word on the left:
gleam: a. gather b. shine c. welcome d. clean
Definitions: A number of definitions are given for the tested word. The testee is to indicate which definition is the right one. e.g.:
A: After walking in the cold weather, I was happy to find the room quite cosy.
cosy means:
warm and comfortable
cold and uncomfortable

hot but comfortable
hot and uncomfortable
B: …………. is the second month of the year.
Gap filling: Testees are asked to fill in a gap in context. The context should not contain another word that the testee does not know. e.g.:
The strong wind …. the man’s effort to put up the tent.
a. hampered b. disabled c. deranged d. regaled
Production tests: Pictures, definitions and gap filling can also be used to test vocabulary in achievement tests. e.g.: write down the names of the following objects. Here students are presented with pictures of different objects, such as a watch, gloves …….