Testing
■ Reasons for testing students
■ Good tests
■ Test types
■ Marking tests
■ Designing tests

Reasons for testing students
At various stages during their learning, students may need or want to be tested on their ability in the English language. If they arrive at a school and need to be put in a class at an appropriate level, they may do a placement test. This often takes the form of a number of discrete (indirect) items (see below), coupled with an oral interview and perhaps a longer piece of writing. The purpose of the test is to find out not only what students know, but also what they don't know. As a result, they can be placed in an appropriate class.
At various stages during a term or semester, we may give students progress tests. These have the function of seeing how students are getting on with the lessons, and how well they have assimilated what they have been taught over the last week, two weeks or a month.
At the end of a term, semester or year, we may want to do a final achievement test (sometimes called an exit test) to see how well students have learnt everything. Their results on this test may determine what class they are placed in next year (in some schools, failing students have to repeat a year), or may be entered into some kind of school-leaving certificate. Typically, achievement tests include a variety of test types and measure the students' abilities in all four skills, as well as their knowledge of grammar and vocabulary.
Many students enter for public examinations such as those offered by the University of Cambridge ESOL, Pitman or Trinity College in the UK, and in the US, the University of Michigan and TOEFL and TOEIC. These proficiency tests are designed to show what level a student has reached at any one time, and are used by employers and universities, for example, who want a reliable measure of a student's language abilities.
So far in this chapter we have been talking about testing in terms of 'one-off' events, usually taking place at the end of a period of time (except for placement tests). These 'sudden death' events (where ability is measured at a particular point in time) are very different from continuous assessment, where the students' progress is measured as it is happening, and where the measure of a student's achievement is the work done all through the learning period and not just at the end. One form of continuous assessment is the language portfolio, where students collect examples of their work over time, so that these pieces of work can all be taken into account when an evaluation is made of their language progress and achievement. Such portfolios (called dossiers in this case) are part of the CEF (Common European Framework), which also asks language learners to complete language passports (showing their language abilities in all the languages they speak) and language biographies (describing their experiences and progress).
There are other forms of continuous assessment, too, which allow us to keep an eye on how well our students are doing. Such continuous recording may involve, among other things, keeping a record of who speaks in lessons and how often they do it, how compliant students are with homework tasks and how well they do them, and also how well they interact with their classmates.
Some students seem to be well suited to taking progress and achievement tests as the main way of having their language abilities measured. Others do less well in such circumstances and are better able to show their abilities in continuous assessment environments. The best solution is probably a judicious blend of both.

Good tests
Good tests are those that do the job they are designed to do and which convince the people taking and marking them that they work. Good tests also have a positive rather than a negative effect on both students and teachers.
A good test is valid. This means that it does what it says it will. In other words, if we say that a certain test is a good measure of a student's reading ability, then we need to be able to show that this is the case. There is another kind of validity, too, in that when students and teachers see the test, they should think it looks like the real thing - that it has face validity. As they sit in front of their test paper or in front of the screen, the students need to have confidence that this test will work (even if they are nervous about their own abilities). However reliable the test is (see below), face validity demands that the students think it is reliable and valid.
A good test should have marking reliability. Not only should it be fairly easy to mark, but anyone marking it should come up with the same result as someone else. However, since different people can (and do) mark differently, there will always be the danger that where tests involve anything other than computer-scorable questions, different results will be given by different markers. For this reason, a test should be designed to minimise the effect of individual marking styles.
When designing tests, one of the things we have to take into account is the practicality of the test. We need to work out how long it will take both to sit the test and also to mark it. The test will be worthless if it is so long that no one has the time to do it. In the same way, we have to think of the physical constraints of the test situation. Some speaking tests, especially for international exams, ask not only for an examiner but also for an interlocutor (someone who participates in a conversation with a student). But this is clearly not practical for teachers working on their own.
Tests have a marked washback/backwash effect, whether they are public exams or institution-designed progress or achievement tests. The washback effect occurs when teachers see the form of the test their students are going to have to take and then, as a result, start teaching for the test. For example, they concentrate on teaching the techniques for answering certain types of question rather than thinking in terms of what language students need to learn in general. This is completely understandable, since teachers want as many of their students as possible to pass the test. Indeed, teachers would be careless if they did not introduce their students to the kinds of test item they are likely to encounter in the exam. But this does not mean that teachers should allow such test preparation to dominate their lessons and deflect from their main teaching aims and procedures.
The washback effect has a negative effect on teaching if the test fails to mirror our teaching, because then we will be tempted to make our teaching fit the test, rather than the other way round. Many modern public examinations have improved greatly from their more traditional versions, so that they often do reflect contemporary teaching practice. As a result, the washback effect does not have the baleful influence on teaching which we have been discussing.
When we design our own progress and achievement tests, we need to try to ensure that we are not asking students to do things which are completely different from the activities they have taken part in during our lessons. That would clearly be unfair.
Finally, we need to remember that tests have a powerful effect on student motivation. Firstly, students often work a lot harder than normal when there is a test or examination in sight. Secondly, they can be greatly encouraged by success in tests, or, conversely, demotivated by doing badly. For this reason, we may want to try to discourage students from taking public examinations that they are clearly going to fail, and when designing our own progress and achievement tests, we may want to consider the needs of all our students, not just the ones who are doing well. This does not mean writing easy tests, but it does suggest that when writing progress tests, especially, we do not want to design the test so that students fail unnecessarily - and are consequently demotivated by the experience.

Test types
When designing tests, we can either write discrete items, or ask students to become involved in more integrative language use. Discrete-item testing means only testing one thing at a time (e.g. testing a verb tense or a word), whereas integrative testing means asking students to use a variety of language and skills to complete a task successfully. A further distinction needs to be made between direct and indirect test items. A direct test item is one that asks students to do something with language (e.g. write a letter, read and reply to a newspaper article or take part in a conversation). Direct test items are almost always integrative. Indirect test items are those which test the students' knowledge of language rather than getting them to use it. Indirect test items might focus on, say, word collocations (see page 75) or the correct use of modal verbs (see page 69). Direct test items have more to do with activation, whereas indirect items are more closely related to study - that is, the construction of language.

Indirect test items
There are many different ways of testing the students' knowledge of language construction. We will look at three of the most common.

Multiple choice
Multiple-choice questions are those where students are given alternatives to choose from, as in the following example:

Circle the correct answer.
You must _______ here on time.
a to get   b getting   c to have get   d get

Sometimes students are instructed to choose the 'correct' answer (because only one answer is possible), as in the example above. But sometimes, instead, they can be told to choose the 'best' answer (because, although more than one answer is possible, one stands out as the most appropriate), e.g.

Circle the best answer.
Police are worried about the level of ________ crime.
a juvenile   b childish   c young   d infant

Multiple-choice questions have the great advantage of being easy to mark. Answer sheets can be read by computer, or can be marked by putting a transparency over the answer sheet which shows the circled correct letters. Markers do not have to worry, then, about the language in the questions; it is simply a matter of checking the correct letters for each question.
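To make the arithmetic of such machine marking concrete, here is a minimal sketch (not from the original chapter) of checking circled letters against an answer key; the questions, key and student answers are all invented for illustration:

```python
# Minimal sketch of computer-scored multiple-choice marking.
# The answer key and the student's circled letters are invented.

answer_key = {1: "d", 2: "a", 3: "c"}      # correct letter per question
student_sheet = {1: "d", 2: "b", 3: "c"}   # letters the student circled

# Marking is "simply a matter of checking the correct letters":
# count the questions where the circled letter matches the key.
score = sum(1 for q, letter in answer_key.items()
            if student_sheet.get(q) == letter)

print(f"{score}/{len(answer_key)} correct")   # -> 2/3 correct
```

Because no judgment about language is involved, any two markers (or machines) running this check will return the same score, which is exactly the scorer reliability referred to below.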
One problem with multiple-choice questions lies in the choice of distractors, that is, the three incorrect (or inappropriate) answers. For while it may not be difficult to write one obvious distractor (e.g. answer a 'to get' in the first example above), because that is a mistake that students commonly make, it becomes less easy to come up with three items which will all sort out those students who know how this piece of language works from the ones who don't. In other words, there is a danger that we will either distract too many students (even those who should get the question right) or too few (in which case the question has not done its job of differentiating students).
Multiple-choice questions can be used to test reading and listening comprehension (we can also use true/false questions for this: students circle 'T' or 'F' next to statements concerning material they have just read or listened to).
The washback effect of multiple-choice questions leads some people to find them unattractive, since training students to be good at multiple-choice questions may not help them to become better language learners. And there is a limit to how much we can test with this kind of indirect item. Nevertheless, multiple-choice questions are very attractive in terms of scorer reliability.

Fill-in and cloze
This extremely common form of indirect testing involves the examinee writing a word in a gap in a sentence or paragraph, e.g.

Yesterday I went a _____ the cinema b _____ my friend Clare. I enjoyed the film c _____ she did not.

Gap-fill (or fill-in) items like this are fairly easy to write, though it is often difficult to leave a gap where only one item is possible. In such cases, we will need to be aware of what different answers we can accept. They also make marking a little more complex, though we can design answer sheets where students only have to write the required word against different letters, e.g.
a _____
b _____
c _____

A variation on fill-ins and gap-fills is the cloze procedure, where gaps are put into a text at regular intervals (say every sixth word). As a result, without the test writer having to think about it too much, students are forced to produce a wide range of different words based on everything from collocation to verb formation, etc., as in the following example.

All around the world, students a _____ all ages are learning to b _____ English, but their reasons for c _____ to study English can differ d _____. Some students, of course, only e _____ English because it is on f _____ curriculum at primary or secondary g _____, but for others, studying the h _____ reflects some kind of a i _____.

The random selection of gaps (every sixth word) is difficult to use in all circumstances. Sometimes the sixth word will be impossible to guess - or will give rise to far too many alternatives (e.g. gaps c and d above). Most test designers use a form of modified cloze to counteract this situation, trying to adhere to some kind of random distribution (e.g. making every sixth word into a blank), but using their common sense to ensure that students have a chance of filling in the gaps successfully - and thus demonstrating their knowledge of English.
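The modified cloze idea lends itself to a simple procedure, sketched below with an invented veto rule standing in for the test writer's common sense; nothing here comes from the chapter beyond the every-sixth-word principle:

```python
# Minimal sketch of a "modified cloze" generator: aim for a gap every
# sixth word, but let a veto rule stand in for the test writer's common
# sense (here, an invented rule: skip capitalised or punctuated words).

def make_cloze(text: str, interval: int = 6) -> tuple[str, list[str]]:
    gapped, answers = [], []
    since_gap = 0
    for word in text.split():
        since_gap += 1
        # Blank the word once the interval is reached, unless the veto
        # applies; then restart the count from the gap just made.
        if since_gap >= interval and word.isalpha() and not word[0].isupper():
            answers.append(word)
            gapped.append("_____")
            since_gap = 0
        else:
            gapped.append(word)
    return " ".join(gapped), answers

sample = ("All around the world, students of all ages are learning to "
          "speak English, but their reasons for wanting to study English "
          "can differ greatly.")
cloze_text, key = make_cloze(sample)
print(cloze_text)
print(key)   # -> ['of', 'speak', 'wanting']
```

Run on this sample sentence, the sketch blanks words that correspond to plausible answers for gaps a-c in the printed example, while the veto skips words a real test writer might also reject as unguessable.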

Transformation
In transformation items students are asked to change the form of words and phrases to show their knowledge of syntax and word grammar. In the following test type they are given a sentence and then asked to produce an equivalent sentence using a given word:

Rewrite the sentence so that it means the same. Use the word in bold.
Could I borrow five pounds, please?
lend _______________________

In order to complete the item successfully, the students not only have to know the meaning of borrow and lend, but also how to use them in grammatical constructions.
A variation of this technique is designed to focus more exactly on word grammar. Here, students have to complete lines in a text using the correct form of a given word, e.g.

It was a terrifying performance.   terrify
The acrobats showed _____ no fear even though   absolute
their feats of _____ shocked the crowd into stunned silence.   dare

These kinds of transformations work very well as a test of the students' underlying knowledge of grammar and vocabulary. However, the items are quite difficult to construct.
There are many other kinds of indirect test item. We can ask students to put jumbled words in order, to make correct sentences and questions. We can ask them to identify and correct mistakes or match the beginnings and ends of sentences. Our choice of test item will depend on which, if any, of these techniques we have used in our teaching, since it will always be unfair to give students test items unlike anything they have seen before.

Direct test items
In direct test items, we ask students to use language to do something, instead of just testing their knowledge of how the language itself works. We might ask our students to write instructions for a simple task (such as using a vending machine or assembling a shelving system) or to give an oral mini-presentation.
There is no real limit to the kinds of tasks we might ask students to perform. The following list gives some possibilities:

Reading and listening
Some reading and writing test items look a bit like indirect items (e.g. when students are given multiple-choice questions about a particular word in a text, or have to answer T/F questions about a particular sentence). But at other times we might ask students to choose the best summary of what they have heard or read. We might ask them to put a set of pictures in order as they read or listen to a story, or complete a phone message form (for a listening task) or fill out a summary form (for a reading task).
Many reading and listening tests are a blend of direct and indirect testing. We can ask students direct language - or text-focused - questions as well as testing their global understanding.

Writing
Direct tests of writing might include getting students to write leaflets based on information supplied in an accompanying text, or having them write compositions, such as narrative and discursive essays. We can ask students to write 'transactional letters' (that is, letters replying to an advertisement, or something they have read in the paper, etc). In transactional writing we expect students to include and refer to information they are given.

Speaking
We can interview students, or we can put them in pairs and ask them to perform a number of tasks. These might include having them discuss the similarities and differences between two pictures (see information-gap activities on page 129); they might discuss how to furnish a room, or talk about any other topic we select for them. We can ask them to role-play certain situations (see page 125), such as buying a ticket or asking for information in a shop, or we might ask them to talk about a picture we show them.

When designing direct test items for our students, we need to remember two crucial facts. The first is that, as with indirect tests, direct tests should have items which look like the kind of tasks students have been practising in their lessons. In other words, there is no point in giving students tasks which, because they are unfamiliar, confuse them. The result of this will be that students cannot demonstrate properly how well they can use the language, and this will make the test worthless.
Direct test items are much more difficult to mark than indirect items. This is because our response to a piece of writing or speaking will almost certainly be very subjective - unless we do something to modify this subjectivity. We will now go on to look at how this can be done.
Marking tests
The marking of tests is reasonably simple if the markers only have to tick boxes or individual words (though even here human error can often creep in). Things are a lot more complex, however, when we have to evaluate a more integrative piece of work.
One way of marking a piece of writing, for example, is to give it an overall score (say A or B, or 65%). This will be based on our experience of the level we are teaching and on our 'gut-instinct' reaction to what we read. This is the way that many essays are marked in various different branches of education, and sometimes such marking can be highly appropriate. However, 'gut instinct' is a highly subjective phenomenon. Our judgment can be heavily swayed by factors we are not even conscious of. All students will remember times when they didn't understand why they got a low mark for an essay which looked remarkably similar to one of their classmates' higher-scoring pieces.
There are two ways of countering the danger of marker subjectivity. The first is to involve other people. When two or three people look at the same piece of work and, independently, give it a score, we can have more confidence in the evaluation of the writing than if just one person looks at it.
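As a minimal illustration of this (the procedure and numbers below are invented, not taken from the chapter), independent scores can simply be pooled, with the spread between markers serving as a crude warning that a script needs discussion:

```python
# Minimal sketch of pooling independent markers' scores for one script.
# The marks are invented; a wide spread flags disagreement.

marks = [14, 16, 11]          # three markers, each scoring out of 20

average = sum(marks) / len(marks)
spread = max(marks) - min(marks)

print(f"average: {average:.1f}/20")       # -> average: 13.7/20
if spread > 3:                            # threshold is arbitrary here
    print("markers disagree - discuss or re-mark this script")
```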
The other way of making the marking more objective is to use marking scales for a range of different items. If we are marking a student's oral presentation, we might use the following scales:

Grammar         0 1 2 3 4 5
Vocabulary      0 1 2 3 4 5
Pronunciation   0 1 2 3 4 5
Coherence       0 1 2 3 4 5
Fluency         0 1 2 3 4 5

This kind of scale forces us to look at our student's speaking in more detail than is allowed by an overall impressionistic mark. It also allows for differences in individual performance: a student may get marked down on pronunciation, but score more highly on use of grammar, for example. As a result, the student's final mark out of a total of 25 may reflect his or her ability more accurately than a one-mark impression will do. But we are still left with the problem of knowing exactly why we should give a student 2 rather than 3 for pronunciation. What exactly do students have to do to score 5 for grammar? What would make us give students 0 for fluency? Subjectivity is still an issue here (though it is less problematic because we are forcing ourselves to evaluate different aspects of the students' performance).
One way of trying to make marking scales more objective is to write careful descriptions of what the different scores for each category actually represent. Here, for example, is a scale for assessing writing, which uses descriptions:

Ideas/Content
5 Exemplary: Original treatment of ideas, well-developed from start to finish; focused topic with relevant, strong supporting detail.
4 Strong: Clear, interesting ideas enhanced by appropriate details.
3 Satisfactory: Evident main idea with some supporting details. May have some irrelevant material, gaps in needed information.
2 Developing: Some attempt at support, but main topic may be too general or confused by irrelevant details.
1 Weak: Writing lacks a central idea; development is minimal or non-existent, wanders.

Organisation
5 Exemplary: Effectively organised in a logical and interesting way. Has a creative and engaging introduction and conclusion.
4 Strong: Structure moves the reader smoothly through the text. Well organised with an inviting introduction and a satisfying closure.
3 Satisfactory: Organisation is appropriate but conventional. There is an obvious attempt at an introduction and conclusion.
2 Developing: An effort has been made to organise the piece, but it may be a 'list' of events. The introduction and conclusion are not well developed.
1 Weak: A lack of structure makes this piece hard to follow. Lead and conclusion may be weak or non-existent.

Voice
5 Exemplary: Passionate, compelling, full of energy and commitment. Shows emotion and generates an emotional response from the reader.
4 Strong: Expressive, engaging, sincere tone with good sense of audience. Writer behind the words comes through occasionally.
3 Satisfactory: Pleasant but not distinctive tone and persona. Voice is appropriate to audience and purpose.
2 Developing: Voice may be mechanical, artificial or inappropriate. Writer seems to lack a sense of audience.
1 Weak: Writing tends to be flat or stiff. Style does not suit audience or purpose.

Word Choice
5 Exemplary: Carefully chosen words convey strong, fresh, vivid images consistently throughout the piece.
4 Strong: Word choice is functional and appropriate with some attempt at description; may overuse adjectives and adverbs.
3 Satisfactory: Words may be correct but mundane; writing uses patterns of conversation rather than book language and structure.
2 Developing: Word choice is monotonous; may be repetitious or immature.
1 Weak: Limited vocabulary range.

Sentence Fluency
5 Exemplary: High degree of craftsmanship; control of rhythm and flow so the writing sounds almost musical to read aloud. Variation in sentence length and forms adds interest and rhythm.
4 Strong: The piece has an easy flow and rhythm with a good variety of sentence lengths and structures.
3 Satisfactory: The writing shows some general sense of rhythm and flow, but many sentences follow a similar structure.
2 Developing: Many similar sentence beginnings and patterns with little sense of rhythm; sounds choppy to read aloud.
1 Weak: No real sentence sense - may ramble or sound choppy to read aloud. May have many short sentences or run-ons.

Conventions
5 Exemplary: The writing contains few, if any, errors in conventions. The writer shows control over a wide range of conventions for this grade level.
4 Strong: Generally, the writing is free from errors, but there may be occasional errors in more complex words and sentence constructions.
3 Satisfactory: Occasional errors are noticeable but minor. The writer uses conventions with enough skill to make the paper easily readable.
2 Developing: The writing suffers from more frequent errors, inappropriate to the grade level, but a reader can still follow it.
1 Weak: Errors in conventions make the writing difficult to follow. The writer seems to know some conventions, but confuses many more.

A marking scale for writing

This framework suggests that the students' writing will be marked fairly and objectively. But it is extremely cumbersome, and for teachers to use it well, they will need training and familiarity with the different descriptions provided here.
When marking tests - especially progress tests we design ourselves - we need to strike a balance between totally subjective one-mark-only evaluation on the one hand, and over-complexity in marking-scale frameworks on the other.

Designing tests
When we write tests for our classes, we need to bear in mind the characteristics of good tests which we discussed on pages 167-168. We will think very carefully about how practical our tests will be in terms of time (including how long it will take us to mark them).
When writing progress tests, it is important to try to work out what we want to achieve, especially since the students' results in a progress test will have an immediate effect on their motivation. As a consequence, we need to think about how difficult we want the test to be. Is it designed so that only the best students will pass, or should everyone get a good mark? Some test designers, especially for public exams, appear to have an idea of how many students should get a high grade, what percentage of examinees should pass satisfactorily, and what an acceptable failing percentage would look like.
Progress tests should not work like that, however. Their purpose is only to see how well the students have learnt what they have been taught. Our intention, as far as possible, should be to allow the students to show us what they know and can do, not what they don't know and can't do.
When designing tests for our classes, it is helpful to make a list of the things we want to test. This list might include grammar items (e.g. the present continuous) or direct tasks (e.g. sending an email to arrange a meeting). When we have made our lists, we can decide how much importance to give to each item. We can then reflect these different levels of importance either by making specific elements take up most of the time (or space) on the test, or by weighting the marks to reflect the importance of a particular element. In other words, we might give a writing task double the marks of an equivalent indirect test item to reflect our belief in the importance of direct test types.
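Weighting of this kind is simple arithmetic; the sketch below (section names, marks and weights all invented) shows a direct writing task counting double an equivalent indirect item, as the paragraph above suggests:

```python
# Minimal sketch of weighted marking. A direct writing task is given
# double the weight of an indirect grammar section; all figures are
# invented for illustration.

sections = {
    # section: (raw mark, maximum, weight)
    "grammar (indirect items)": (18, 20, 1),
    "writing task (direct)":    (14, 20, 2),   # counts double
}

total = sum(raw * w for raw, _, w in sections.values())
maximum = sum(mx * w for _, mx, w in sections.values())

print(f"{total}/{maximum} = {100 * total / maximum:.0f}%")   # -> 46/60 = 77%
```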
When we have decided what to include, we write the test. However, it is important that we do not just hand it straight over to the students to take. It will be much more sensible to show the test to colleagues (who frequently notice things we had not thought of) first. If possible, it is a good idea to try the test out with students of roughly the same level as the ones it is designed for. This will show us if there are any items which are more difficult (or easier) than we thought, and it will highlight any items which are unclear - or which cause unnecessary problems.
Finally, once we have given the test and marked it, we should see if we need to make any changes to it if we are to use some or all of it again.
It is not always necessary to write our own tests, however. Many coursebooks now include test items or test generators which can be used instead of home-grown versions. However, such tests may not take account of the particular situation or learning experiences of our own classes.

Conclusions
In this chapter we have:
■ discussed the different reasons that students take tests, and detailed the differences between placement tests, progress tests, achievement tests, public examinations and proficiency tests.
■ said that good tests are both valid and reliable - and that face validity ('looking good') is also important.
■ mentioned the fact that test design may be influenced by physical constraints (e.g. time and money).
■ talked about the washback effect, which can sometimes persuade teachers to work only on exam preparation with their students while ignoring general language development. We have said this is not usually a good thing. We talked about the effect of success or failure in tests on students' motivation.
■ looked at examples of different test types and items, including discrete test items (one thing at a time) and integrative test items (where students use a variety of language and skills); direct test items (where students are asked to do things with the language - e.g. writing a report) and indirect test items (where they are tested about the language - e.g. grammar tests).
■ discussed the issue of subjectivity when it comes to marking tests and shown how marking scales can counter such subjectivity - though if they are over-detailed they may become cumbersome.
■ said that when preparing tests, we need to decide what we want to test and how important each part of a test is in relation to the other parts. We said that teachers should show their tests to colleagues and try them out before using them 'for real'.
