Delta M1 Testing and Evaluation
Assessment
Contents
0 Introduction
Task 1
Task 2
0 Introduction
This unit focuses on the evaluation of learners. A sound understanding of the various
principles of assessment is very important for Paper 2 Task 1 of the exam. Other
questions in which this knowledge is useful are Paper 1 Tasks 1 and 2, and possibly
Paper 2 Tasks 2 and 3.
The main aim of the unit is to provide an introduction to the key terms and concepts
of assessment. Bear in mind that it provides only an overview of a very complex
subject: while further reading is not essential for the exam, there is a great
deal more to be read. As always, there is a list at the end of this unit of all
the terminology mentioned, which you can use for revision purposes, as well as
some recommended reading on the subject.
In Paper 2 Question 1 you are asked to evaluate a test or part of a test ‘using
your knowledge of relevant testing concepts’. In order to do this, you will need
to be familiar with some of the relevant terminology. (The same terminology is
also very useful when writing the Module 3 extended assignment.)
For this unit we suggest you adopt a test-teach-test approach by first trying to match
the terms with the definitions on the following page, and then, when you have
finished reading the unit, try the task again.
N.B. It’s worth bearing in mind that several of these terms overlap.
achievement tests – the extent to which a test appears to test what it is
designed to test

analytic marking schemes – the extent to which a completed test would be given
the same score by two or more different markers

criteria-referenced testing – the influence a final test has on the teaching
that comes before it

indirect testing – the extent to which the same test given to the same learner
at a different time would produce the same score

test reliability – a means of analysing a learner’s strengths and weaknesses,
often used to provide information about what to include in a course programme
1 Evaluation
There are three types of evaluation: summative, formative and congruent. The
differences between them are defined by when they take place. Summative
evaluation takes place at the end of a period of study and aims to assess what has
been achieved in that time. Formative evaluation takes place during a period of study
and aims to provide feedback during a course so that it can be improved. Congruent
evaluation refers to the evaluation of a course before it starts to ensure that the
course design matches the course aims and objectives.
Assessment
Assessment is the measurement of the amount of learning that has taken place. It
can be carried out by the teacher (teacher assessment), by students (self-
assessment), by students and teachers together (collaborative assessment), or by
students with one another (peer assessment).
There are many ways in which information on learning can be provided through
such assessment activities.
Testing
If assessment is formal then it is known as testing. Andy Baxter (1997) describes
testing as a process in which teachers ask learners questions to which they already
know the answers (whereas when evaluating we are asking questions to which we
don’t have the answers). Testing is concerned with what has been learned whereas
evaluation is also concerned with the how and the why.
Given these definitions, which do you carry out more of: formal or informal testing?
2 Why assess learners?
Assessment can provide useful information for a number of different parties:
• the learners;
• the teacher;
• a director of studies;
• parents;
• employers;
• etc.
While larger organisations tend to use tests which measure learners against a
standard of proficiency that is not based on any syllabus they may have followed
(proficiency tests), smaller organisations will often test their learners based on the
extent to which they have mastered certain aspects of a syllabus or the overall
objectives of a syllabus (achievement tests). So the CAE or IELTS exams would be
examples of the former, while a coursebook test or a school end-of-term test could
be examples of the latter.
We can also use tests to decide what course learners should take (placement
tests). Such tests may also be achievement tests if learners are changing level,
or proficiency tests if the learners are new to the organisation. Alternatively,
they could be a mixture of the two.
Tests can also be used to identify the strengths and weaknesses of a learner or of a
teaching programme. Such tests are known as diagnostic tests.
Think of some tests you have given recently. What were the reasons for giving them?
3 Test Construction
Obviously the thoroughness of a test depends on its purpose. A good test,
though, should be:
• valid;
• reliable;
• practical;
• free of negative backwash (or washback).
Validity is usually broken down into three types:
• content validity;
• construct validity;
• face validity.
If a test of pronunciation only tested the learners’ ability to recognise
different sounds rather than their ability to produce them, then it would have
low construct validity, as it would be testing the wrong thing. Content validity
is often considered to be part of
construct validity, as clearly a test can’t accurately measure a learner’s ability or
knowledge if the test does not contain the language areas that are supposed to be
being tested.
Face validity is to do with the way in which a learner perceives a test. Does the
learner believe it is testing what it is supposed to test? In order to have face validity a
test needs to be designed in a way that allows the learner to see that it really is
testing what it is supposed to.
Reliability has two aspects:
• test reliability;
• scorer reliability.
Test reliability refers to the degree to which the same test given to the same learner
under the same conditions would produce the same results. Obviously the more
thorough the test the more data it produces, which increases reliability, but issues of
practicality mean that we often cannot be as thorough as we would like to be.
Creating a reliable test then is a question of compromise between thoroughness and
practicality. Giving learners fresh starts by providing a variety of tasks rather than,
say, one long one is one way of increasing the reliability of a test. Varying the
question types in a test yet sticking to ones that the learners are familiar with is also a
way of ensuring test reliability. Instructions should also be intelligible and the
conditions in which the test is taken should be the same each time it is sat.
If two different people mark the same test and give it the same mark then the test is
said to have scorer reliability.
To increase scorer reliability the test either has to have a set of right answers to mark
against (e.g. in a multiple choice exercise) or an answer key and marking scheme
instructing markers on how they should be marking. The scorer reliability of tests that
can be answered in a variety of ways (such as writing tasks) can be increased if
there is more than one marker.
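One rough-and-ready way of putting a figure on scorer reliability is to correlate two markers’ scores for the same set of scripts. The sketch below is purely illustrative: the marks are invented, and a high correlation alone does not prove a marking scheme is sound.

```python
# Illustrative sketch: estimating scorer reliability as the correlation
# between two markers' scores on the same set of scripts.
# The marks below are invented for the example.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

marker_a = [14, 11, 17, 9, 15, 12]   # marks out of 20 from marker A
marker_b = [13, 12, 16, 9, 14, 13]   # marks for the same scripts from marker B

r = pearson_r(marker_a, marker_b)
print(f"scorer reliability (Pearson r): {r:.2f}")
```

A value close to 1 suggests the two markers are ranking and spacing the scripts similarly; in practice agreement on absolute marks and clear marking criteria matter as well.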
Practicality refers to how easy a test is to administer. This covers not only
the space, time and invigilating staff needed to run the test, but also the time
and expertise necessary for its design, trialling and marking, including the
production of a valid and reliable answer key or marking scheme.
Validity and reliability are also not always easy to reconcile in a test. Reliable tests
are not always valid, and vice versa. Having learners write a letter of complaint might
be a very valid way of testing their ability to write such a letter, for example, but
unless the teacher has carefully considered how much guidance the learners will get,
devised exact criteria for marking and arrived at a clear understanding amongst the
various markers of what constitutes a good letter, then the test will not be reliable.
Alternatively, while getting learners to complete a gapped letter of complaint would
make for a very reliable test, its validity would be very low as it would not show how
well they could produce such a text themselves.
Tests such as the gapped letter described above have the advantage of being
relatively objective as there is often a single correct answer. Therefore they are very
easy to mark and can sometimes even be marked by a machine such as an OMR –
an optical mark reader. However, as well as lacking in validity they are often much
more difficult to design.
Subjective tests, such as writing a letter, are ones in which the marker uses
their judgement. These are usually easier to design but have to be marked by a
teacher, and the marking can be time-consuming. Other issues with subjective
testing are that learners can either play safe by avoiding things they are not
sure about, or produce language that is beyond the scope of what is being
tested.
Another way of labelling tests (and remember, many of these terms overlap
considerably) is by using the terms discrete-point testing and integrative testing.
A discrete-point test is one which consists of several items, each of which tests a
single point of knowledge at a time (e.g. a test in which each part tests a different
grammatical structure). If we want to know if a learner can recognise or produce a
specific language item, then we use discrete-point techniques. If, on the other
hand, we want to know how well a student can use their combined knowledge of
single items, then integrative testing techniques are the best method.
Discrete-point tests are usually objective and require short answers whereas
integrative tests are usually open-ended and require the learner to respond in their
own words. Most tests nowadays use a combination of these techniques depending
on the language and skill that is being tested.
Tests can also be described as direct or indirect. In a direct test a learner’s ability to
perform a task in the language is assessed by getting the learner to perform the task.
So if we want to assess a learner’s speaking then the test requires the learner to
speak. An indirect test assesses aspects of the language which give an indication of
how well a learner performs. To assess a learner’s spoken language, for example,
they might be asked to match spoken discourse markers with their functions, or
to choose the right response to a request.
Backwash describes the effect a test has on the teaching that precedes it, i.e.
the extent to which a course is influenced by the test it is leading up to. If
the content of a course is improved by the teacher having to ‘teach to the test’
then this is known as beneficial, or positive, backwash, whereas if the learners
are deprived of work on the areas they really need because the course focuses
too much on the upcoming test, then the backwash is said to be negative. It is
important to be aware of backwash and not automatically assume that what is in
the exam is necessarily what the learners need.
In the longer term of course it is also the case that tests change to reflect changes in
the way that teachers are teaching and in the content of courses.
TASK 1
1. Take a test you are familiar with (it can be an internationally taken test such as
IELTS, one your school uses, or even one you have designed yourself).
2. Using the key words in Sections 2 and 3, analyse the pros and cons of the test.
3. Post your list on the discussion board. Then read and discuss your colleagues’
reports.
4 What do we test?
Over the years there has been a move away from testing learners’ knowledge of a
language towards testing their ability to use it. There has also been less focus on
testing accuracy and more on testing communicative competence, and helping
learners to learn more effectively is seen by many these days as more constructive
than testing their memory.
Think of as many ways as you can of testing grammar and lexis, and compare your
list with the one below.
Some common types of testing techniques used for grammar and lexis are:
• Gap-filling
• Multiple choice
• Error spotting
• Transformation exercises (e.g. when learners are given a sentence which
they have to express in another way using either a sentence head or a key
word)
• Jumbled sentences for students to order
• Matching tasks (e.g. word and definition, halves of collocation, sentence
halves, sentences, etc.)
• Cloze tests (a text in which every 7th word is removed, though in practice,
to maintain coherence, it is often every 6th to 10th word)
• Skeleton sentences, which need to be written in full
• Odd one out
• Writing questions for answers
• Adding to categories
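The cloze procedure is mechanical enough to sketch in code. The short function below (the function name and sample passage are my own, for illustration) deletes every nth word after a short intact lead-in and returns both the gapped text and an answer key:

```python
def make_cloze(text, n=7, start=3):
    """Blank out every nth word of a text, leaving `start` words of
    intact lead-in, and return the gapped text plus the answer key."""
    words = text.split()
    answers = []
    for i in range(start + n - 1, len(words), n):
        answers.append(words[i])      # record the removed word
        words[i] = "_" * 8            # replace it with a gap
    return " ".join(words), answers

passage = ("A cloze test asks learners to restore words that have been "
           "systematically removed from a text, drawing on their grammar, "
           "vocabulary and overall sense of the passage.")

gapped, key = make_cloze(passage, n=7)
print(gapped)
print("Answers:", key)
```

In practice a teacher would still read through the output and adjust n, or move individual gaps, since a purely mechanical deletion can land on words that are impossible to recover from context.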
STOP AND THINK 5
Which of the techniques for testing lexis and grammar listed above could also be
used for testing reading and listening?
Common techniques for testing reading and listening include:
• Gap-filling
• Multiple choice questions
• True/false questions
• Completing tables
• Sequencing a jumbled text (reading)
• Writing answers to questions
• Matching (e.g. titles or topics to texts or paragraphs)
• Inserting headings, sentences, paragraphs back into texts
• Labelling diagrams
• Selecting a picture or sequencing pictures
• Spotting differences in content between a written and a spoken text
• Identifying features of spoken language (listening).
Testing Writing
Some of these tests involve reading, which could be said to lower their construct
validity. However, it could also be argued that the tasks are therefore more
communicative and reflect the circumstances in which we write in the real world.
The marking of a written text is a considerably more complex process than the
marking of a discrete-point grammar test. Ideally, a learner’s text should receive the
same score regardless of who marks it when, so a purely impressionistic mark is
insufficient. Marking writing involves a lot more than simply adding up points
however; a clear list of criteria is needed.
Such marking scales are either holistic or analytic. An example of a holistic
scale is that used in the Cambridge suite of exams (FCE, CAE, etc.), which uses
descriptors to assess the writing from a global point of view. Analytic marking
scales break the writing skill down into different components and award marks
for each one. The IELTS writing tasks, for example, are awarded marks under the
following categories:
• task achievement;
• coherence and cohesion;
• lexical resource;
• grammatical range and accuracy.
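A simple way to see how an analytic scale combines its components is to compute a weighted average of the category scores. The sketch below is a hypothetical illustration, not the actual IELTS procedure: the equal weighting, the rounding rule and the sample scores are all invented.

```python
# Hypothetical sketch of analytic marking: an overall band computed as a
# weighted average of per-category scores. The weighting, rounding rule and
# scores are invented for illustration; this is not the real IELTS procedure.

def analytic_band(scores, weights=None):
    """Combine per-category band scores into one overall band.

    `scores` maps category name -> band score; `weights` defaults to equal
    weighting. The result is rounded to the nearest half band.
    """
    if weights is None:
        weights = {c: 1.0 for c in scores}
    total_w = sum(weights.values())
    raw = sum(scores[c] * weights[c] for c in scores) / total_w
    return round(raw * 2) / 2  # nearest half band

sample = {
    "task achievement": 6.0,
    "coherence and cohesion": 5.0,
    "lexical resource": 6.0,
    "grammatical range and accuracy": 5.0,
}
print(analytic_band(sample))  # equal weights -> 5.5
```

The point of the breakdown is diagnostic as much as summative: a learner scoring 6.0 for lexical resource but 5.0 for coherence gets far more useful feedback than a single global mark.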
Testing Speaking
Despite its prevalence in communicative classrooms and the fact that learners often
cite the need to speak as a priority, the testing of speaking is often neglected. Why
do you think this might be?
The main problems when testing speaking are practicality and reliability. Formally
testing learners effectively usually involves dividing a group up into individuals or
groups of 2 or 3 and testing them at intervals, which obviously takes a lot of time.
Assessing learners’ speaking can also be very difficult as speaking is ephemeral and
therefore difficult to analyse in real time. Also, the examiner is often involved in
communication with the learner(s), or so intent on understanding the message of
what is being said that assessment of their speaking is not easy.
In assessing speaking we can use scales similar to those used in writing, i.e. either
holistic scales or analytic scales based on the speaking skill.
What categories could be used in an analytical scale for the speaking skill?
• Role-play
• Problem solving tasks
• Ranking tasks
• Debates
5 Problems with testing and alternative approaches
Testing has a number of potential drawbacks:
• Many tests do not accurately measure the skill(s) or use of the language
system(s) that they are intended to evaluate.
• They only give learners one shot at ‘getting it right’; informal continuous
assessment arguably gives a far more representative picture of a learner’s
abilities.
• Some learners simply ‘aren’t very good at tests’, maybe because of previous
testing experiences, nerves, attention span or because the way in which they
are being tested doesn’t match their learning style.
• Tests can result in negative backwash and therefore make lessons less
interesting/effective/relevant to learners’ needs.
The key to successful testing, i.e. testing that gives us accurate information
about our learners’ language abilities in a number of areas, is to test little
and often rather than setting one or two big tests at the end of units or
terms. Getting learners more involved in the
assessment process is another way of increasing its effectiveness. One way of doing
this is to get them to keep portfolios in which they could keep:
• test/mini-test results
• marked homework
• project work (may have been written as part of a group)
• audio-cassettes
• video-cassettes
• interesting articles/texts/song lyrics, etc. that the student has
found/read/understood
• compositions
• pages/extracts from a learner diary
• checklists/learned lists
• previous reports/evaluations by teachers, peers, or self
• lesson-redesigns; lesson analyses
• results of previous performance reviews
This ensures that the learners are responsible for keeping a varied and personal
record of their progress over a course, and shares the responsibility of keeping tabs
on it.
TASK 2
Are teachers the best people to judge whether a learner has made progress?
Shouldn’t the learner have some say in the matter? Andy Baxter lists 8 ways in which
the learners can be involved in assessment.
- Confidence ratings
- Checklists
- Learned lists
- Learner diaries
- Redesign and analyse a class
- Self-reports
- Student tests
- Clinics
Give and explain your answers to these questions on the discussion board. Then
read and discuss your colleagues’ postings.
6 List of key terms: Testing terminology answer key
direct testing – when the learner is asked to perform the skill that is being
tested

test reliability – the extent to which the same test given to the same learner
at a different time would produce the same score
NB Progress tests can be a type of formative assessment. Achievement tests are a type of
summative evaluation. The terms formative and summative are used to talk about evaluation
that provides information about both the learning and the teaching that has taken place,
whereas progress and achievement tests are designed primarily to test the effectiveness of
the learning.
7 Exam Practice
The text for this task is reproduced below. It is being used in the following
situation:
The group consists of six Czech learners all from the same company,
which provides financial services. There is a range of abilities within the
group, which is nominally Intermediate (CEFR B1-2). The stated needs
of the group are to improve their business vocabulary and spoken
fluency. The test was set as part of an end-of-term business English
test. There were other parts which tested the learners’ reading and
writing skills.
Using your knowledge of relevant testing concepts, evaluate the effectiveness of
the tasks for these learners in this situation.
Make a total of six points. You must include positive and negative points.
Progress Test – December 2010
Prepositions

Contracts
1. The / known / and / Seller / hereinafter / Buyer / parties / be / as / will /
The / The
2. contract / of / If / either / the / breaks / void / be / the / it / parties /
will / and / null
3. in / this / many / clauses / There / contract / are
Personal Finance
Write two words that can both go before the words on the right.
1. __________ / __________   account
2. __________ / __________   a mortgage
3. __________ / __________   money
4. __________ / __________   an invoice
Accounting
Match the words on the left with their definitions on the right.
1. amortization   a. record the lower price of an asset due to depreciation
2. write down     b. the current value of an asset
3. write off      c. the loss in value of an asset over time
4. book value     d. record the value of an asset as zero due to depreciation
8 Further Reading
Alderson, J. C., Clapham, C. & Wall, D. (1995) Language Test Construction and
Evaluation. CUP.
Allan, D. (1999) ‘Distinctions and Dichotomies: Testing and Assessment’. ETP,
Issue 11, April 1999.