
2.5 THE TRANSFORMATION OF TESTS

There are distinct historical phases in the nature of formal testing. These phases give us another
useful way to categorize and compare formal tests. Not surprisingly, each phase is linked to the
view of language and language learning that was most current at the time. The tests from each
phase are referred to as first, second and third generation tests.

2.5.1 FIRST GENERATION TESTS

These are the tests broadly associated with the grammar/translation approach to language
learning. Candidates are asked to complete tasks such as compositions, translations, or simple
question-and-answer activities devoid of context, e.g. "Write about a holiday you enjoyed (200
words)". Such a prompt specifies neither a reader (who is the text written to?) nor a purpose (why is
it being written?). Question types tend to be non-authentic: you do not usually translate large
chunks of literature unless this is your job.

Question types aim to elicit integrative language, that is, language that draws on a wide range of
language abilities. A composition, for example, tests grammar, vocabulary, punctuation, spelling
and discourse structure, so these tests assess accuracy and fluency together. These testing
techniques lead to subjective scoring, which requires an experienced tester to judge the sample
of language against their knowledge and experience of other, similar samples. This can lead to
problems of reliability in marking.

The degree of agreement between two examiners about the mark for the same language sample
is known as inter-rater reliability. The degree of agreement between a single examiner's marks
for the same language sample on two separate occasions is known as intra-rater reliability. Both
inter- and intra-rater reliability are low in first generation tests. With first generation test formats,
it is common for two different examiners to mark the same test very differently, or for the same
examiner to mark the same test differently on two occasions, and these test types have been
severely criticized for this unreliability.

It was the reliance on subjective marking, and the associated problems of reliability, that led to
the development of the next generation of tests.

2.5.2 SECOND GENERATION TESTS

With the structuralist view of language and its associated teaching techniques came the
opportunity to iron out some of the problems of first generation tests. Where first generation
testing techniques had been marked subjectively, with the associated difficulty of standardizing
marking to ensure fairness, language items could now be assessed objectively on a right or
wrong basis through multiple choice testing of discrete language items. The test could be marked
by a non-expert, by different people, or by the same person more than once, and the result would
always be the same.

However, the progress second generation tests made in objectivity was not matched by similar
progress in the authenticity of language or in placing language in context. Questions in second
generation tests normally measure one item of language, known as a discrete point (in contrast
with the integrative language tested in first generation tests). Since each question tests one tiny
aspect of language (e.g. a verb form or a preposition), tests are often very long and consist of
many questions. In a test of 200 items, there may be no connection between any of them.
Inevitably, these tests are criticized because they do not sample integrative language.

Later versions of these tests aimed to redress this problem by developing techniques that were
both objective and integrative. One example is the cloze test: words are removed from a text,
either at random or according to some linguistic criterion, and candidates must restore them to
complete the text. Because answers can be checked against a key, a cloze test can be marked
objectively, yet filling the gaps draws on a wide range of language abilities.

Nor are second generation test formats authentic: real-world language use does not normally
extend to multiple choice conversations! The overuse of second generation testing techniques
can lead to mechanistic teaching of discrete language items, and to very little language use.

2.5.3 THIRD GENERATION TESTS

Third generation tests bring together the testing of integrative language with the use of both
objective and subjective testing formats. These tests emerged on the back of developments in
communicative language teaching (CLT). Just as CLT strives to emulate real language use,
communicative tests aim to do the same, and their items are based on real language use.

One of the main issues in communicative language testing is the definition of Communicative
Language Ability, i.e. the theoretical basis on which to build a communicative test. Recent models
of communicative language ability propose that it consists of both knowledge of language and the
capacity to implement that knowledge in communicative language use.

Examples of communicative language testing tasks include reading an authentic text and
transferring information from it (such as correcting notes taken from it), writing a note with
instructions about some aspect of household organization, listening to an airport announcement
to find the arrival time of a plane, or giving someone spoken instructions on how to get to a house.

Comparing third generation tests with the strengths and weaknesses of the previous two
generations: both the texts used and the tasks set in third generation tests aim to be authentic,
and because they are authentic, all third generation techniques are contextualized by their very
nature. Candidates are asked to do tasks that have a clear reference in reality. Third generation
tests also assess integrative language. Speaking and writing tasks by their nature demand
integrative language use; similarly, listening and reading techniques demand global (integrative)
comprehension as well as comprehension of discrete items.

Finally, since communicative testing of the productive skills produces samples of integrative
language that have to be assessed subjectively, much effort has been put into achieving greater
inter- and intra-rater reliability. Often more than one assessor is used, and assessors attend
retraining meetings where they discuss how to interpret the descriptors of performance against
real test samples.

We will return to many of the points made briefly in this section, since they require deeper
coverage. This section should, however, provide some historical context for formal testing,
whether the tests are second or third generation. Tests are rarely pure examples of a single
generation of testing; techniques are normally mixed and matched. The strengths and
weaknesses of second and third generation tests make them suitable for different testing
purposes.

WEEK 02 - Material 3
