
Unit 6

Assessment

Contents

0 Introduction

1 Evaluation, Assessment and Testing

2 Why assess learners?

3 Test Construction

   Task 1

4 What do we test?

5 Problems with testing and alternative approaches

   Task 2

6 List of key terms

7 Exam Practice

8 Further Reading

0 Introduction

This unit focuses on the evaluation of learners. A sound understanding of the various
principles of assessment is very important for Paper 2 Task 1 of the exam. Other
questions in which this knowledge is useful are Paper 1 Tasks 1 and 2, and possibly
Paper 2 Tasks 2 and 3.

The main aim of the unit is to provide an introduction to the key terms and concepts
of assessment. Bear in mind that it provides only an overview of a very complex
subject; while further reading is not essential for the exam, there is a great deal more
to be read. As always, there is a list at the end of the unit of all the terminology
mentioned, which you can use for revision purposes, as well as some recommended
reading on the subject.

STOP AND THINK 1

In Paper 2 Question 1 you are asked to evaluate a test or part of a test ‘using your
knowledge of relevant testing concepts’. In order to do this, you will need to be
familiar with some of the relevant terminology. (The same terminology is also very
useful when writing the Module 3 extended assignment.)

For this unit we suggest you adopt a test-teach-test approach: first try to match the
terms with the definitions below, and then, when you have finished reading the unit,
try the task again.

N.B. It’s worth bearing in mind that several of these terms overlap.

achievement tests – the extent to which a test appears to test what it is designed to test

analytic marking schemes – the extent to which a completed test would be given the same score by two or more different markers

backwash – when the learner is asked to perform the skill that is being tested (e.g. in order to test how well they can write an email, the task is to write an email)

construct validity – the extent to which a test tests what it is supposed to test

content validity – a means of determining the class or level that a learner should be placed in

criteria-referenced testing – the influence a final test has on the teaching that comes before it

diagnostic tests – assessment that provides information to be used as feedback to modify the teaching and learning activities of a course

direct testing – a way of testing a learner’s general level irrespective of the teaching that has preceded it

face validity – an attempt to avoid marker subjectivity by breaking up the learners’ language into different areas for assessment

formative evaluation – the extent to which a test tests what it is supposed to test and nothing else

indirect testing – the extent to which the same test given to the same learner at a different time would produce the same score

norm-referenced testing – testing the learners’ ability to perform certain skills by testing things related to the skill rather than by getting them to actually perform the skill

placement tests – evaluation carried out at the end of a course to determine the effectiveness of the teaching and learning

proficiency tests – a means of assessing a learner’s performance based on a list of criteria

scorer reliability – measuring a learner’s performance by comparing his score with that of other learners

summative evaluation – a way of testing whether learners can do what they have been taught

test reliability – a means of analysing a learner’s strengths and weaknesses, often used to provide information about what to include in a course programme

1 Evaluation, Assessment and Testing

Evaluation

Testing is a type of assessment, which in turn is an aspect of evaluation.

In very general terms, evaluation is the process of judging how satisfactory
something is. It covers not only the learning that has taken place but also the quality
of educational policy, the effectiveness of educational management, how well a
course has been designed, the quality of course materials and teaching aids, and
how well the teacher performs.

There are three types of evaluation: summative, formative and congruent. The
differences between them are defined by when they take place. Summative
evaluation takes place at the end of a period of study and aims to assess what has
been achieved in that time. Formative evaluation takes place during a period of study
and aims to provide feedback during a course so that it can be improved. Congruent
evaluation refers to the evaluation of a course before it starts to ensure that the
course design matches the course aims and objectives.

Assessment

Assessment is the measurement of the amount of learning that has taken place. It
can be carried out by the teacher (teacher assessment), by students (self-
assessment), by students and teachers (collaborative assessment), or by students
with one another (peer assessment).

There are many ways in which information on learning can be provided. Such
assessment activities can be:

• formal (i.e. carried out under test conditions);
• informal (i.e. collecting information about students’ performance in the
normal classroom environment, without establishing test conditions).

Informal assessment involves strategies such as:

• observing students’ behaviour and interactions and listening to what they say;
• measuring observational evidence against assessment criteria (sometimes
phrased as ‘can do’ statements);
• encouraging students to reflect on their progress, and to think together about
how to improve.

Testing

If assessment is formal then it is known as testing. Andy Baxter (1997) describes
testing as a process in which teachers ask learners questions to which they already
know the answers (whereas when evaluating we are asking questions to which we
don’t have the answers). Testing is concerned with what has been learned whereas
evaluation is also concerned with the how and the why.

STOP AND THINK 2

Given these definitions, which do you carry out more of: formal or informal assessment?

Make a list of some of the different ways you do each of these.

2 Why assess learners?

Generally we assess learners to provide information to stakeholders. Stakeholders
can be:

• the learners;
• the teacher;
• a director of studies;
• parents;
• employers;
• etc.

When we compare learners with one another it is known as norm-referenced
testing. This is done if a certain percentage of learners need to be selected for
something (such as entry to a course with a limited number of places). It therefore
does not provide information on a learner’s individual performance.

If we measure learners’ proficiency against a specific standard this is called criteria-
referenced testing, and shows what a learner can do in the language. This method
is useful when we want to measure a learner’s ability to perform specific tasks or
place the learner in a band or level.
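To make the distinction concrete, here is a minimal sketch in Python showing how the same set of results could be read in both a norm-referenced and a criteria-referenced way. The names, scores and cut-off are invented purely for illustration:

# A minimal sketch contrasting norm-referenced and criteria-referenced
# interpretations of the same scores. All names and numbers are invented.

scores = {"Ana": 72, "Boris": 58, "Chen": 91, "Dana": 64, "Eva": 85}

# Norm-referenced: compare learners with one another, e.g. select the
# top 40% for a course with a limited number of places.
ranked = sorted(scores, key=scores.get, reverse=True)
places = max(1, int(len(ranked) * 0.4))
print("Selected:", ranked[:places])  # ['Chen', 'Eva']

# Criteria-referenced: measure each learner against a fixed standard,
# here an assumed cut score of 70 representing 'can do' at this level.
CUT_SCORE = 70
for name, score in scores.items():
    print(name, "meets standard" if score >= CUT_SCORE else "below standard")

Note that the norm-referenced selection says nothing about what Chen or Eva can actually do in the language; only the criteria-referenced reading does.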

While larger organisations tend to use tests which measure learners against a
standard of proficiency that is not based on any syllabus they may have followed
(proficiency tests), smaller organisations will often test their learners based on the
extent to which they have mastered certain aspects of a syllabus or the overall
objectives of a syllabus (achievement tests). So the CAE or IELTS exams would be
examples of the former, while a coursebook test or a school end-of-term test could
be examples of the latter.

We can also use tests to decide what course learners should take (placement
tests). Such tests may also be achievement tests if learners are to change level, or
proficiency tests if the learners are new to the organisation. Alternatively, they could
be a mixture of the two.

Tests can also be used to identify the strengths and weaknesses of a learner or of a
teaching programme. Such tests are known as diagnostic tests.

STOP AND THINK 3

Think of some tests you have given recently. What were the reasons for giving them?

3 Test Construction

Obviously the thoroughness of a test depends on its purpose. A good test, though,
should be:

• valid;
• reliable;
• practical;
• free of negative backwash (or washback).

Validity is often divided into:

• content validity;
• construct validity;
• face validity.

A test has content validity if it tests a representative range or sample of what it is
supposed to test. So if a test aims to test the learners’ ability to produce a specified
range of vowel sounds, for example, then it should test a reasonable range of these
sounds and not just one or two.

If the same test only tested the learners’ ability to recognise the different sounds
rather than their ability to produce them, then it would have low construct validity as
it would be testing the wrong thing. Content validity is often considered to be part of
construct validity, as clearly a test can’t accurately measure a learner’s ability or
knowledge if the test does not contain the language areas that are supposed to be
being tested.

Face validity is to do with the way in which a learner perceives a test. Does the
learner believe it is testing what it is supposed to test? In order to have face validity a
test needs to be designed in a way that allows the learner to see that it really is
testing what it is supposed to.

There are two types of reliability:

• test reliability
• scorer reliability

Test reliability refers to the degree to which the same test given to the same learner
under the same conditions would produce the same results. Obviously the more
thorough the test the more data it produces, which increases reliability, but issues of
practicality mean that we often cannot be as thorough as we would like to be.
Creating a reliable test then is a question of compromise between thoroughness and
practicality. Giving learners fresh starts by providing a variety of tasks rather than,
say, one long one is one way of increasing the reliability of a test. Varying the
question types in a test yet sticking to ones that the learners are familiar with is also a
way of ensuring test reliability. Instructions should also be intelligible and the
conditions in which the test is taken should be the same each time it is sat.

If two different people mark the same test and give it the same mark then the test is
said to have scorer reliability.

To increase scorer reliability the test either has to have a set of right answers to mark
against (e.g. in a multiple choice exercise) or an answer key and marking scheme
instructing markers on how they should be marking. The scorer reliability of tests that
can be answered in a variety of ways (such as writing tasks) can be increased if
there is more than one marker.
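To see what scorer reliability looks like in practice, here is a minimal Python sketch comparing the marks two markers gave to the same six scripts. The scores and the simple agreement statistics are invented for illustration, not a prescribed procedure:

# Compare the scores two markers gave to the same completed tests.
marker_a = [14, 11, 17, 9, 15, 12]   # marks out of 20 from marker A
marker_b = [13, 11, 18, 8, 15, 13]   # marks from marker B, same scripts

pairs = list(zip(marker_a, marker_b))
exact_agreement = sum(a == b for a, b in pairs) / len(pairs)
mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)

print(f"Exact agreement: {exact_agreement:.0%}")      # 33%
print(f"Mean difference: {mean_abs_diff:.2f} marks")  # 0.67

# Low agreement suggests the marking scheme needs tightening: clearer
# criteria, standardisation meetings, or double-marking.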

Practicality refers to how easy a test is to administer. This means not only finding
the space, time and invigilating staff to run the test, but also the time and expertise
needed to design, trial and mark it, and to produce a valid and reliable answer
key/marking scheme.

There is an inevitable conflict between practicality and reliability as producing a test
that is both valid and reliable takes up a lot of time and resources.

Validity and reliability are also not always easy to reconcile in a test. Reliable tests
are not always valid, and vice versa. Having learners write a letter of complaint might
be a very valid way of testing their ability to write such a letter, for example, but
unless the teacher has carefully considered how much guidance the learners will get,
devised exact criteria for marking and arrived at a clear understanding amongst the
various markers of what constitutes a good letter, then the test will not be reliable.
Alternatively, while getting learners to complete a gapped letter of complaint would
make for a very reliable test, its validity would be very low as it would not show how
well they could produce such a text themselves.

Tests such as the gapped letter described above have the advantage of being
relatively objective as there is often a single correct answer. Therefore they are very
easy to mark and can sometimes even be marked by a machine such as an OMR –
an optical mark reader. However, as well as lacking in validity they are often much
more difficult to design.
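As a small illustration of why such objective items are cheap to mark, the following Python sketch scores a candidate’s answers against a key, which is essentially all an OMR does. The items and answers are invented:

# Marking an objective test is a mechanical comparison against a key.
answer_key = {1: "b", 2: "d", 3: "a", 4: "c", 5: "b"}
candidate  = {1: "b", 2: "a", 3: "a", 4: "c", 5: "d"}

score = sum(candidate.get(item) == right for item, right in answer_key.items())
print(f"Score: {score}/{len(answer_key)}")  # Score: 3/5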

Subjective tests, such as writing a letter, are ones in which the marker uses his
judgement. These are usually easier to design but have to be marked by a teacher,
and the marking can be time-consuming. Other issues with subjective testing are that
learners can either play safe by avoiding things they are not sure about or produce
language that is beyond the scope of what is being tested.

Another way of labelling tests (and remember, many of these terms overlap
considerably) is by using the terms discrete-point testing and integrative testing.
A discrete-point test is one which consists of several items, each of which tests a
single point of knowledge at a time (e.g. a test in which each part tests a different
grammatical structure). If we want to know if a learner can recognise or produce a
specific language item, then we use discrete item techniques.

An integrative test on the other hand requires learners to combine various
components of language systems in order to complete a task (for example, in order
to write an accident report they would need to show the ability to spell, use
appropriate discourse markers, use narrative tenses, select relevant lexis, use
punctuation, order the report into paragraphs, etc.). If the teacher wants to test how
well a student can use their combined knowledge of single items, then integrative
testing techniques are the best method.

Discrete-point tests are usually objective and require short answers whereas
integrative tests are usually open-ended and require the learner to respond in their
own words. Most tests nowadays use a combination of these techniques depending
on the language and skill that is being tested.

Tests can also be described as direct or indirect. In a direct test a learner’s ability to
perform a task in the language is assessed by getting the learner to perform the task.
So if we want to assess a learner’s speaking then the test requires the learner to
speak. An indirect test assesses aspects of the language which give an indication of
how well a learner performs. To assess a learner’s spoken language he might be
asked to match spoken discourse markers with their functions, for example, or to
choose the right response to a request.

McNamara (2000:5) distinguishes between performance tests and the more
traditional paper-and-pencil tests.

Backwash describes the effect a test has on the teaching that precedes it, i.e. the
extent to which a course is influenced by the test it is leading up to. If the content of a
course is improved by the teacher having to ‘teach to the test’ then this is known as
beneficial, or positive, backwash, whereas if the learners are deprived of work on the
areas they really need to work on as a result of the course focusing too much on the
upcoming test, then the backwash is said to be negative. It is important to be aware
of backwash and not automatically assume that what is in the exam is necessarily
what the learners need.

In the longer term of course it is also the case that tests change to reflect changes in
the way that teachers are teaching and in the content of courses.

TASK 1

1. Take a test you are familiar with (it can be an internationally taken test such as
IELTS, one your school uses, or even one you have designed yourself).
2. Using the key words in Sections 2 and 3, analyse the pros and cons of the test.
3. Post your list on the discussion board. Then read and discuss your colleagues’
reports.

4 What do we test?

Over the years there has been a move away from testing learners’ knowledge of a
language towards testing their ability to use it. There has also been less focus on
testing accuracy and more on testing communicative competence, and helping
learners to learn more effectively is seen by many these days as more constructive
than testing their memory.

This section looks at ways in which we can test learners’

• use of language components;
• use of language in the real world (the four skills).

Testing Grammar and Lexis

Below are some of the reasons we test grammar and lexis.

• The syllabuses of schools and coursebooks are very often structurally organised.
• Schools and teachers still tend to measure learners’ progress in terms of their
knowledge of grammar.
• Grammar and lexis are easy to test through objective tests, which are easy to
mark: there is often a right and a wrong answer.
• Students expect it.

STOP AND THINK 4

Think of as many ways as you can of testing grammar and lexis and compare your
list with the one below.

Some common types of testing techniques used for grammar and lexis are:

• Gap-filling
• Multiple choice
• Error spotting
• Transformation exercises (e.g. when learners are given a sentence which
they have to express in another way using either a sentence head or a key
word)
• Jumbled sentences for students to order
• Matching tasks (e.g. word and definition, halves of collocation, sentence
halves, sentences, etc.)
• Cloze tests (a text from which every 7th word is removed – though in practice,
to maintain coherence, it is often every 6th to 10th word; see the sketch after
this list)
• Skeleton sentences, which need to be written in full
• Odd one out
• Writing questions for answers
• Adding to categories
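Because a cloze test follows a mechanical rule, one can even be generated automatically. The following Python sketch removes every nth word from a passage and builds an answer key. The passage and the helper function make_cloze are invented for illustration, and a real cloze would normally also leave the opening sentence intact:

def make_cloze(text, n=7):
    """Replace every nth word with a numbered gap; return the gapped
    text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers

passage = ("Our company spends a lot of money on marketing because "
           "we now have hundreds of customers all over the country.")
gapped, key = make_cloze(passage, n=7)
print(gapped)       # ...spends a lot of (1) ______ on marketing ... have (2) ______ of customers...
print("Key:", key)  # Key: ['money', 'hundreds']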

STOP AND THINK 5

Make a list of advantages and disadvantages of the above testing techniques.

Testing Reading and Listening

STOP AND THINK 6

Which of the techniques for testing lexis and grammar listed above could also be
used for testing reading and listening?

Compare your list with the one below.

Some examples of testing techniques for reading and listening:

• Gap-filling
• Multiple choice questions
• True/false questions
• Completing tables
• Sequencing a jumbled text (reading)
• Writing answers to questions
• Matching (e.g. titles or topics to texts or paragraphs)
• Inserting headings, sentences, paragraphs back into texts
• Labelling diagrams
• Selecting a picture or sequencing pictures
• Spotting differences in content between a written and a spoken text
• Identifying features of spoken language (listening).

Testing Writing

Here are some more holistic ways of testing writing:

• Simply give learners a title and text type
• Learners expand notes into a text
• Learners produce a text based on visuals
• Learners rewrite a text (change the style)
• Learners reply to an email or letter
• Learners write a review of a film they have watched/book they have read
• Learners fill in a form
• Summary writing

Some of these tests involve reading, which could be said to lower their construct
validity. However, it could also be argued that the tasks are therefore more
communicative and reflect the circumstances in which we write in the real world.

The marking of a written text is a considerably more complex process than the
marking of a discrete-point grammar test. Ideally, a learner’s text should receive the
same score regardless of who marks it or when, so a purely impressionistic mark is
insufficient. Marking writing involves a lot more than simply adding up points,
however; a clear list of criteria is needed.

Exams such as the Cambridge suite or IELTS attempt to rationalise the marking of
the skills by using descriptors, profiles and bands. Candidates are placed into a
specific band if their writing is said to match the profile in that band. The profile is
usually a short paragraph comprising certain descriptors of what a learner in this
band can do with the language. For a look at the IELTS writing bands, see:
http://www.ielts.org/pdf/UOBDs_WritingT1.pdf.

Such marking scales are either holistic or analytic. The Cambridge suite exams
(FCE, CAE, etc.), for example, use a holistic scale, with descriptors to assess the
writing from a global point of view. Analytic marking scales break the writing skill
down into different components and award marks for each one. As you hopefully saw
when you clicked on the above link, the IELTS writing tasks are awarded marks
under the following categories:

• task achievement;
• coherence and cohesion;
• lexical resource;
• grammatical range and accuracy.
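As a minimal sketch of how an analytic scale might yield an overall result, the Python snippet below averages a band awarded for each of the four criteria listed above. The equal weighting and the rounding to the nearest half band are assumptions made for illustration, not the official IELTS procedure:

# Bands awarded per criterion (invented example for one script).
criterion_bands = {
    "task achievement": 6.0,
    "coherence and cohesion": 5.0,
    "lexical resource": 6.0,
    "grammatical range and accuracy": 5.0,
}

average = sum(criterion_bands.values()) / len(criterion_bands)
overall = round(average * 2) / 2   # assumed: round to nearest half band
print(f"Overall writing band: {overall}")  # 5.5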

Testing Speaking

STOP AND THINK 7

Despite its prevalence in communicative classrooms and the fact that learners often
cite the need to speak as a priority, the testing of speaking is often neglected. Why
do you think this might be?

The main problems when testing speaking are practicality and reliability. Testing
learners formally and effectively usually involves taking them individually or in
groups of two or three and testing them at intervals, which obviously takes a lot of
time.

Assessing learners’ speaking can also be very difficult as speaking is ephemeral and
therefore difficult to analyse in real time. Also, the examiner is often involved in
communication with the learner(s), or is so intent on understanding the message of
what is being said that assessing the speaking itself is not easy.

Informal assessment of speaking is therefore often a more practical and reliable
option. Monitoring group speaking activities during class time in order to assess
specific aspects of learners’ speaking, or to gain a holistic impression of their
communicative competence, can usually be done in the normal course of a lesson.
This technique also has the advantage of taking the spotlight off the learner, and if
the task they have been set is intrinsically motivating they should be concentrating
on task achievement rather than accuracy, which is a closer reflection of what they
usually need to do in the real world.

In assessing speaking we can use scales similar to those used in writing, i.e. either
holistic scales or analytic scales based on the speaking skill.

STOP AND THINK 8

What categories could be used in an analytic scale for the speaking skill?

Here are some ways of testing the speaking of individuals:

• Answer questions from the examiner
• Give a talk or presentation on a certain topic
• Discuss a topic
• Describe a picture, compare and contrast pictures
• Tell a story from pictures
• Read aloud (you would need to decide whether or not learners would have a
chance to see the text first)

Pairs can do the above but also:

• Role-play
• Problem solving tasks
• Ranking tasks
• Debates

The IELTS speaking paper is marked according to the following criteria:

• Fluency and Coherence
• Lexical Resource
• Grammatical Range and Accuracy
• Pronunciation

5 Problems with testing and alternative approaches

Does assessment help or hinder learning?

The following criticisms are often levelled at formal language testing.

• Many tests do not accurately measure the skill(s) or use of the language
system(s) that they are intended to evaluate.
• They only give learners one shot at ‘getting it right’; informal continuous
assessment arguably gives a far more representative picture of a learner’s
abilities.
• Some learners simply ‘aren’t very good at tests’, maybe because of previous
testing experiences, nerves, attention span or because the way in which they
are being tested doesn’t match their learning style.
• Tests can result in negative backwash and therefore make lessons less
interesting/effective/relevant to learners’ needs.

The key to successful testing, i.e. testing that gives us accurate information about our
learners’ language abilities in a number of areas, is to test a little and often rather
than to set one or two big tests at the end of units or terms. Getting learners more
involved in the assessment process is another way of increasing its effectiveness.
One way of doing this is to get them to keep portfolios containing:

• test/mini-test results
• marked homework
• project work (may have been written as part of a group)
• audio-cassettes
• video-cassettes
• interesting articles/texts/song lyrics, etc. that the student has
found/read/understood
• compositions
• pages/extracts from a learner diary
• checklists/learned lists
• previous reports/evaluations by teachers, peers, or self
• lesson-redesigns; lesson analyses
• results of previous performance reviews

This ensures that the learners are responsible for keeping a varied and personal
record of their progress over a course, and shares the responsibility of keeping tabs
on it.

TASK 2

Are teachers the best people to judge whether a learner has made progress?
Shouldn’t the learner have some say in the matter? Andy Baxter lists 8 ways in which
the learners can be involved in assessment.

- Confidence ratings
- Checklists
- Learned lists
- Learner diaries
- Redesign and analyse a class
- Self-reports
- Student tests
- Clinics

1. What do you think each of these methods would involve?
2. Do you use any of these techniques?
3. What are the benefits of involving learners in assessment, in your opinion?

Give and explain your answers to these questions on the discussion board. Then
read and discuss your colleagues’ postings.

6 List of key terms: Testing terminology answer key

achievement tests – a way of testing whether learners can do what they have been taught

analytic marking schemes – an attempt to avoid marker subjectivity by breaking up the learners’ language into different areas for assessment

backwash – the influence a final test has on the teaching that comes before it

construct validity – the extent to which a test is designed to test what it is supposed to test and nothing else

content validity – the extent to which a test tests what it is supposed to test

criteria-referenced testing – a means of assessing a learner’s performance based on a list of criteria

diagnostic tests – a means of analysing a learner’s strengths and weaknesses, often used to provide information about what to include in a course programme

direct testing – when the learner is asked to perform the skill that is being tested

face validity – the extent to which a test appears to test what it is designed to test

formative evaluation – assessment that provides information to be used as feedback to modify the teaching and learning activities of a course

indirect testing – testing the learners’ ability to perform certain skills by testing things related to the skill rather than by getting them to actually perform the skill

norm-referenced testing – measuring a learner’s performance by comparing his score with that of other learners

placement tests – a means of determining the class or level that a learner should be placed in

proficiency tests – a way of testing a learner’s general level irrespective of the teaching that has preceded it

scorer reliability – the extent to which a completed test would be given the same score by two or more different markers

summative evaluation – evaluation carried out at the end of a course to determine the effectiveness of the teaching and learning

test reliability – the extent to which the same test given to the same learner at a different time would produce the same score

NB Progress tests can be a type of formative assessment. Achievement tests are a type of
summative evaluation. The terms formative and summative are used to talk about evaluation
that provides information about both the learning and the teaching that has taken place,
whereas progress and achievement tests are designed primarily to test the effectiveness of
the learning.

7 Exam Practice

Paper Two Task One (18 marks)

The text for this task is reproduced below. It is being used in the following
situation:

The group consists of six Czech learners all from the same company,
which provides financial services. There is a range of abilities within the
group, which is nominally Intermediate (CEFR B1-2). The stated needs
of the group are to improve their business vocabulary and spoken
fluency. The test was set as part of an end-of-term business English
test. There were other parts which tested the learners’ reading and
writing skills.

Using your knowledge of relevant testing concepts, evaluate the effectiveness of the
tasks for these learners in this situation.

Make a total of six points. You must include positive and negative points.

   

Progress Test December 2010

Prepositions

Put one word in each gap.

1. Our company spends a lot _____ marketing.
2. We now have hundreds _____ customers all over the country.
3. _____ I joined the company I have been very busy.
4. Our profits have increased _____ 10% so far this year.
5. I took _____ a loan from the bank and now I have to pay it _____.
6. In most shops now you have a choice of paying _____ credit card or _____ cash.
7. Please sign _____ the bottom _____ the contract.

Contracts

Put the words in the correct order.

1. The known and Seller hereinafter Buyer parties be as will The The
2. contract of If either the breaks void be the it parties will and null
3. in this many clauses There contract are

Personal Finance

Write two words that can both go before the words on the right.

1. __________  account
   __________
2. __________  a mortgage
   __________
3. __________  money
   __________
4. __________  an invoice
   __________

Accounting

Match the words on the left with their definitions on the right.

1. amortization   record the lower price of an asset due to depreciation
2. write down     the current value of an asset
3. write off      the loss in value of an asset over time
4. book value     record the value of an asset as zero due to depreciation

8 Further Reading

Key sources. Try and read at least one of these.

Baxter, A. (1997) Evaluating Your Students. Richmond.

Harris, M. & McCann, P. (1994) Assessment. Heinemann.

Hughes, A. (1989) Testing for Language Teachers. CUP.

McNamara, T. (2000) Oxford Introductions: Language Testing. OUP.

Rea-Dickins, P. & Germaine, K. (1992) Evaluation. OUP.

Other useful books

Alderson, J. C., Clapham, C. & Wall, D. (1995) Language Test Construction and Evaluation. CUP.

Allan, D. (1999) Distinctions and Dichotomies: Testing and Assessment. ETP Issue 11, April 1999.

Bachman, L. (1990) Fundamental Considerations in Language Testing. OUP.

Bachman, L. & Palmer, A. (1996) Language Testing in Practice. OUP.

Bowler, B. & Parminter, S. (1997) Continuous Assessment. ETP Issue 3, April 1997.

Dawson, J. (1997) Assessing Spoken English. ETP Issue 5, October 1997.

Harris, M. (1997) Self-assessment in formal settings. ELTJ 51/1.

Heaton, J. B. (1988) Writing English Language Tests (new edition). Longman.

Underhill, N. (1987) Testing Spoken Language. CUP.

