
Comparing Quality:

what teachers need to ask exam providers

Nick Saville

ALTE, Vilnius
November 2007
Introduction
Outline of talk

• Testing for teachers - questions to ask
• Stakeholders as customers
• The ALTE Quality “toolkit”
• The CEFR and the Manual for Guidance
• Conclusions


Testing for Language Teachers
A. Hughes, 2003 (2nd edition), CUP

‘… many language teachers harbour a mistrust of tests and testers.’

‘Too often language tests have a harmful effect on teaching and learning and fail to measure accurately – whatever it is they are intended to measure.’ (page 1)

He also explains why teachers need to know more about language testing – both in order to write their own tests … and also to evaluate external examinations (the theme of this talk).


In addressing language teachers in particular
Hughes suggests that:

• Teachers need to be able to ask appropriate questions of test providers

• They also need to understand the main concepts of assessment

• Otherwise they can’t understand the answers!


Cyril Weir also argues that more people in education need to understand testing

• He suggests that users of tests – students and their teachers – are like customers …

I’ll come back to this idea later.


‘When we are buying a new car or a new house we have a whole list of questions we want to ask of the person selling it. Any failure to answer a question or an incomplete answer will leave doubts in our minds about buying. Poor performance in relation to one of these questions puts doubts in our minds about buying the house or car.’

C. Weir


So –

What sort of questions should we ask about language tests?

How can we tell the difference between a good test and a bad one?


A few suggestions:
e.g. about a test at CEFR B1 level (Threshold Level)

• Is this B1 test really a B1 test?
  How do you know?

• Does it test what you say it does?

• Has that claim been validated?
  What evidence, if any, is provided to support it?

• Are the judgements made by the oral examiners accurate and reliable?
  Are they appropriately trained and monitored?


What about the answers?

How do you know if the answers are adequate?

Or whether they show that the test meets acceptable standards?


Key points for this talk:
• Stakeholders for tests should be treated as clients or customers (users of products or services)

• The concept of quality is based on customer satisfaction

• As customers, stakeholders need to:
  - know what to look for (for their own purposes)
  - be able to ask the right questions
  - understand the answers
  - be able to find “tools” to help them make decisions


[Diagram: a transaction between a client and a service provider]

Client: “I’d like one of those digital cameras. How much are they?”
Service provider: “It depends… What features do you want?”
Panasonic Lumix DMC-FX07 - Digital Camera, Silver

Nico chose this one because of the following features:
• the price (around 200 euros)
• the mega pixels
• the brand of the lens
• the battery life
• the size (fits in a pocket)

[Image: Nico’s camera]
Other Lumix models …….

Panasonic DMC-FX100EB-K          Panasonic DMC-FZ50EB-K


Jon Simon chose this one



Digital cameras: Which? Magazine
The magazine of the UK consumers’ association

• Specification:
  - price
  - mega pixels
  - lens
  - battery life
  - size/weight

• Rating (5-point scale):
  - sharpness of picture
  - picture quality (overcast)
  - picture quality (sunny)
  - overall picture quality


Which? Magazine on tests

• For example, eye tests at the optician


Criteria for analysis
• Eye tests:
  - accuracy of prescription/diagnosis
  - thoroughness
  - speed of testing
  - qualifications of test providers
  - interpretability of results
  - cost


Criterial features for language tests:
• purpose
• construct definition
• test method – tasks etc.
• content coverage
• skills coverage
• accuracy of measurement
• predictive/diagnostic power
• score interpretability
• test length
• accessibility
• cost


Additional criterial features:
• degree of specificity – for work, for study …
• currency and recognition – where and how the test is used
• relationship to curriculum – washback
• impact in the wider world

• Other???


How can the work of ALTE help?

What “tools” are available to help stakeholders make the right choices?


www.alte.org

ALTE Quality



Aims of ALTE – since 1990
• To establish common levels of proficiency in order to promote the transnational recognition of certification in Europe
• To establish common standards for all stages of the language-testing process
• To collaborate on joint projects and in the exchange of ideas and know-how

Motto: “Attaining standards: sustaining diversity”




Quality issues: the ALTE toolkit
a) ALTE Code of Practice
b) Principles of Good Practice
c) Quality Management System - QMS
QMS checklists
d) Minimum standards – 17
Auditing the quality profile

All available on ALTE website in many languages


a) The ALTE Code of Practice

[Diagram: the ALTE CoP & QMS, from the general/philosophical to the specific/practical]
• ethical framework – ALTE Code of Practice, 1994
• principles – ALTE Principles of Good Practice, 1993; 2001
• quality management system – ALTE CoP & QM checklists, 2001
• standards – ALTE auditing system


a) The ALTE Code of Practice

… common standards for all stages of the language-testing process

1994 – ALTE published its first Code of Practice

Focused on the roles and responsibilities of stakeholders in striving for fairness


a) The ALTE Code of Practice
Striving for fairness for stakeholders of the examinations

The Code of Practice identifies the roles of three groups of stakeholder in the testing process:

• the examination developers – e.g. members of ALTE

• the examination takers – primary users, who take the examinations by choice, direction or necessity

• the examination users – secondary users (such as teachers) who require the examination for some decision-making or other purpose

The examination takers and users are the clients or customers.
a) The ALTE Code of Practice

The Code of Practice lays down four broad areas of responsibility:

• developing examinations
• interpreting examination results
• striving for fairness
• informing examination takers

Striving for fairness is a shared responsibility involving all stakeholders.


CoP now in 22 languages
(e.g. teachers)


Quality issues: the ALTE toolkit
a) ALTE Code of Practice
b) Principles of Good Practice
c) Quality Management System - QMS
QMS checklists
d) Minimum standards – 17
Auditing the quality profile

All available on ALTE website in many languages


b) Principles of Good Practice

The Principles of Good Practice set out in more detail the principles which ALTE members should adopt in order to achieve high professional standards.

It is also important for other stakeholders to understand some of these principles too – e.g. language teachers.


Principles of Good Practice
VRIP Features of the Exams
Four key concepts:
• Validity
• Reliability
• Impact
• Practicality

See Bachman and Palmer 1996


Validity
Arthur Hughes, 2003

• A test is said to be valid if it provides:
  “…consistent measures of precisely the abilities we are interested in.”

• So as testers we need to specify:
  - the abilities to be tested – the constructs
  - how those abilities are measured consistently and precisely – i.e. to achieve reliability


Reliability
• Accurate
• Consistent
• Stable results
• Is the same ability level needed to pass at each administration?
• For Speaking and Writing Tests – is there standardisation of procedures and the assessment?
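The reliability questions above can be quantified. Below is a minimal sketch, not taken from the talk, of one common internal-consistency statistic (Cronbach's alpha), assuming a provider can supply item-level scores as a candidates-by-items matrix; the data and the helper name are invented for illustration.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency estimate for a candidates x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of candidates' totals
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Invented example: 5 candidates answering 4 dichotomously scored items.
responses = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

A provider would normally report a statistic like this alongside other evidence, such as marker-agreement figures for Speaking and Writing.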


Impact
• Effects, beneficial or otherwise, on candidates:
  - washback on the classroom
• Wider social, economic or political impact
• Meeting learning goals of the test taker
• Testing real skills / using authentic materials
• Aligned to the CEFR?


Practicality
• Costs and availability of resources
• Availability of venues and administration dates
• Availability of qualified examiners


Quality issues: the ALTE toolkit
a) ALTE Code of Practice
b) Principles of Good Practice
c) Quality Management System - QMS
QMS checklists
d) Minimum standards – 17
Auditing the quality profile

All available on ALTE website in many languages


c) The Quality Management System

“Putting the principles into practice”

The CoP as self-evaluation checklists

Based on international QM systems such as ISO 9001:2000

Incorporates the principle of customer satisfaction


c) The Quality Management System

Aim

To improve quality in order to improve customer satisfaction

In language testing, who are the customers/clients?


Code of Practice

Primary users = test takers

Secondary users = those who use the results of the tests

Achieving fairness – a shared responsibility


c) The Quality Management System
The QM checklists are based on 4 aspects of the
Test Development and Administration Cycle:

1. Test Design and Construction
2. Administration
3. Processing (marking, grading, issue of results)
4. Analysis and Review

– based on the Model of Test Development


ALTE Quality Checklists - Units 1 to 4



Unit 1 - Test Construction

A. Conceptual phase
B. Test development, test construct & context
C. Communication with External Stakeholders



Example from the Conceptual Phase

Saville and Hargreaves – model of speaking ability used for Cambridge examinations

This question relates to the construct validity of the test.
Example from the Development Phase

This question also relates to various aspects of validity:

• construct-related
• criterion-related
• content-related


Example from the Communication Phase

This question relates to “informing test takers” and thus to the impact of the test (consequential validity).


Unit 2 - Administration/logistics

This question relates to the important area of setting test performance conditions – as changes in conditions can affect performance, this can be seen as a validity/reliability issue.


Unit 3 - Marking, Grading, Results

Examiner recruitment & training documents/regulations

This question relates to the establishment of procedures to ensure rater reliability.


Unit 4 - Test analysis and post-exam review

This question relates to the estimation of test bias – a validity issue.


Quality issues: the ALTE toolkit
a) ALTE Code of Practice
b) Principles of Good Practice
c) Quality Management System - QMS
QMS checklists
d) Minimum standards – 17
Auditing the quality profile

All available on ALTE website in many languages


d) Standards and the Quality Profile

Quality standards

[Diagram: a scale of practice – “Best Practice Models”, good practice, satisfactory, in need of improvement – with the Quality Standard marked on the scale]


d) Standards and the Quality Profile

• Minimum standards have been agreed to establish a Quality Profile for an exam or suite of exams
• ALTE members are now required to explain how their examinations meet these standards
• Evidence to back up the argument is also required
• The explanation and the evidence are checked by an auditing process


d) Standards and the Quality Profile
The 17 minimum standards are based on the Quality
Checklists and cover:

• test construction
• administration & logistics
• marking & grading
• test analysis
• communication with stakeholders

(in 26 languages)
Minimum Standards – example in English
TEST CONSTRUCTION

1 The examination is based on a theoretical construct, e.g. on a model of communicative competence.

2 You can describe the purpose and context of use of the examination, and the population for which the examination is appropriate.

3 You provide criteria for selection and training of test constructors and expert judgement is involved both in test construction, and in the review and revision of the examinations.

4 Parallel examinations are comparable across different administrations in terms of content, stability, consistency and grade boundaries.

5 If you make a claim that the examination is linked to an external reference system (e.g. Common European Framework), then you can provide evidence of alignment to this system.


Minimum Standards – another example (in Lithuanian)
TESTO RENGIMAS [Test construction]

1 Egzaminas pagrindžiamas teoriškai, teorinis konstruktas gali būti, pavyzdžiui, komunikacinės kompetencijos modelis.

2 Apibūdintas egzamino tikslas, jo naudojimo sąlygos, kandidatai, kuriems tinka šis egzaminas.

3 Yra nustatyti testų rengėjų atrankos ir mokymo kriterijai, ir specialistai aptaria ir sprendžia testų rengimo, egzamino peržiūros ir pertvarkymo klausimus.

4 Skirtingose vietose vykdomi egzaminai gali būti lyginami turinio, pastovumo, darnos ir įvertinimo ribų požiūriais.

5 Tvirtinimai, kad egzaminai atitinka kokią nors išorinę atskaitos sistemą (pavyzdžiui, „Bendruosius Europos kalbų metmenis“), pagrindžiami sąsajų su ta sistema įrodymais.


Minimum standard

Test construction

1 The examination is based on a theoretical construct, e.g. on a model of communicative competence.


Minimum standard
Administration & logistics

6 All centres are selected to administer your examination according to clear, transparent, established procedures, and have access to regulations about how to do so.


Minimum standard

Marking & grading

11 Marking is sufficiently accurate and reliable for purpose and type of examination.


Minimum standard
Test analysis

13 You collect and analyse data on an adequate and representative sample of candidates and can be confident that their achievement is a result of the skills measured in the examination and not influenced by factors like L1, country of origin, gender, age and ethnic origin.
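As an illustration only (not ALTE's prescribed analysis), the sketch below shows one crude way to screen for the group effects that Standard 13 rules out: compare how often two groups of candidates with similar total scores get a given item right, a simplified differential item functioning (DIF) check. The dif_screen helper and the data are hypothetical.

```python
import numpy as np

def dif_screen(item: np.ndarray, total: np.ndarray, group: np.ndarray) -> float:
    """Crude DIF index: mean difference in item facility between two groups,
    computed within total-score bands so that overall ability is held roughly
    constant. Values near zero suggest no obvious group effect on this item."""
    diffs, weights = [], []
    for band in np.unique(total):
        in_band = total == band
        reference = item[in_band & (group == 0)]
        focal = item[in_band & (group == 1)]
        if len(reference) and len(focal):
            diffs.append(focal.mean() - reference.mean())
            weights.append(in_band.sum())
    return float(np.average(diffs, weights=weights))

# Hypothetical data: 200 candidates, 20 dichotomous items, two groups (e.g. two L1s).
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 20))
group = rng.integers(0, 2, size=200)
total = responses.sum(axis=1)
print(f"DIF index for item 5: {dif_screen(responses[:, 5], total, group):+.2f}")
```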


Minimum standard
Communication with stakeholders

15 The examination administration system communicates the results of the examinations to candidates and to examination centres (e.g. schools) promptly and clearly.


d) Standards and the Quality Profile
• But – different ALTE exams need different quality profiles, to take into account the following:
  - context and purpose of use
  - target candidature – size, demographics, distribution
  - currency and recognition
  - distribution of test centres
  - type of organisation involved


The Auditing Process
• ALTE Members build an argument that the quality profile is sufficient and appropriate for a particular test or suite of tests

• There is no intention to impose a single set of quality profiles

• The audit has both a quality control and a consultancy function

• It aims to establish that minimum standards are met in a way that is appropriate to the context of the test and to offer recommendations towards best practice

There is always room for improvement!


Teachers can also base their questions to examination
providers on these minimum standards:

Standard 1

What theoretical construct is your exam based on?

What model of communicative competence do you use?

How is this reflected in your test tasks and procedures?


Teachers can also base their questions to examination
providers on these minimum standards:

Standard 4

How do you ensure that you produce parallel examinations which are comparable across different administrations?

How do you ensure comparability of content and stability of the grade boundaries?

What data and analysis do you have to demonstrate this?


Finally how can the CEFR help
testers … and teachers?

… returning to the question:

Is this B1 test really a B1 test?


Who knows anything about the CEFR?
The Common European Framework of Reference for Languages:
learning, teaching, assessment

Le Cadre Commun Européen pour les Langues: apprendre, enseigner, évaluer

Der Gemeinsame Europäische Rahmen für Sprachen: lernen, lehren, beurteilen

etc. …

Already in 36 languages (Lithuanian in preparation)


Most people now know the levels!!
A – Basic User:        A1 Breakthrough, A2 Waystage
B – Independent User:  B1 Threshold, B2 Vantage
C – Proficient User:   C1 Effective Operational Proficiency, C2 Mastery

But it is much more too!
Is this B1 test really a B1 test?
or
Is my B1 comparable with your B1?

How can we tell if two different tests can be used for the same purpose? Are they equivalent or comparable?

• Two tests of English at B1
• A test of English at B1 and a test of French at B1

What do we mean when we talk of ‘equivalence’ or ‘comparability’?


Equivalent – meaning #1
‘Different versions of the same test, which are
regarded as equivalent to each other in that they
are based on the same specifications and
measure the same competence.’

From: ALTE Multilingual Glossary of Testing Terms
Published in Studies in Language Testing, Vol. 6 (1998), Cambridge ESOL/CUP


Equivalent – meaning #1
‘To meet the strict requirements of equivalence
under classical test theory, different forms of a
test must have the same mean difficulty,
variance, and co-variance, when administered to
the same persons.’

ALTE Multilingual Glossary of Testing Terms, 1998

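To make the definition concrete, here is a minimal sketch with invented scores: it computes the mean, variance and covariance for two forms taken by the same candidates, the quantities the glossary says strictly parallel forms must share. A real equivalence study would use proper samples and statistical tests rather than simply printing the numbers.

```python
import numpy as np

# Invented scores of the same eight candidates on two forms of a test.
form_a = np.array([23, 31, 28, 35, 19, 27, 30, 25], dtype=float)
form_b = np.array([24, 30, 29, 34, 20, 26, 31, 24], dtype=float)

# Under classical test theory, strictly parallel forms administered to the same
# persons should show the same mean difficulty and variance, and covary strongly.
print(f"mean A = {form_a.mean():.1f},  mean B = {form_b.mean():.1f}")
print(f"variance A = {form_a.var(ddof=1):.1f},  variance B = {form_b.var(ddof=1):.1f}")
print(f"covariance A,B = {np.cov(form_a, form_b, ddof=1)[0, 1]:.1f}")
print(f"correlation A,B = {np.corrcoef(form_a, form_b)[0, 1]:.2f}")
```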


Equivalent – meaning #1
‘Equivalence is very difficult to achieve in practice.’

ALTE Multilingual Glossary of Testing Terms



Equivalent – meaning #2

‘the relationship between two (different) tests’

‘Strictly speaking, this concept is unjustifiable, since each test is designed for a different purpose and a different population, and may view and assess language traits in different ways as well as describing test-taker performance differently.’

Dictionary of Language Testing Terms, 1999 – Davies et al.
Published in Studies in Language Testing, Vol. 7 (1999), Cambridge ESOL/CUP


Equivalent – meaning #2

‘However, in reality test users may demand statements of equivalence between different tests (for example, admissions officers at educational institutions).’

Dictionary of Language Testing Terms, 1999


Equivalent – meanings #1 and #2
• It is common and quite easy to put together a table which ‘maps’ a range of different exams against each other (for example using the CEFR)

• But what evidence is there to support the claims of equivalence or comparability?


Remember….
‘… each test is designed for a different purpose
and a different population, and may view and
assess language traits in different ways as well
as describing test-taker performance differently.’

This is reflected in the approach taken by ALTE in the QM system and the way the auditing system works.
Council of Europe Manual for Guidance
Preliminary Pilot Version – 2003/4

November 2003, DGIV/EDU/LANG (2003) 10

Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment

It can be accessed in English and French at the Council of Europe webpage: www.coe.int/lang


The 4 parts of the Manual:
• Familiarisation:
  You must know the CEFR!!

• Specification:
  You must specify your constructs and content coverage in relation to the CEFR

• Standardisation:
  You must be able to compare the test with validated test samples, and student performances at CEFR levels

• Empirical validation:
  You need to have the relevant technical know-how to calibrate your exam materials
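As a toy illustration of the Standardisation and Empirical validation steps (my own sketch, not a procedure taken from the Manual), the code below compares the CEFR level awarded by an exam with a level judged independently from benchmarked performances, reporting raw agreement and Cohen's kappa; all candidate data are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two sets of CEFR judgements."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / n**2
    return (observed - expected) / (1 - expected)

# Invented data: level awarded by the exam vs. level judged from benchmarked samples.
exam_levels  = ["B1", "B1", "A2", "B1", "B2", "B1", "A2", "B1"]
judge_levels = ["B1", "A2", "A2", "B1", "B1", "B1", "A2", "B1"]
raw_agreement = sum(a == b for a, b in zip(exam_levels, judge_levels)) / len(exam_levels)
print(f"raw agreement = {raw_agreement:.0%},  kappa = {cohen_kappa(exam_levels, judge_levels):.2f}")
```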


The Manual can also help language teachers:
• to ask the right questions

• to understand the answers related to:
  - Specification
  - Standardisation
  - Empirical validation

• to evaluate claims of comparability and alignment

• Teachers should ask test providers to support their claims, using the CEFR and the Manual as points of reference


Conclusions
• Test providers can help teachers understand the concepts and explain the empirical evidence, e.g. using the ALTE toolkit

• Comparative frameworks such as the CEFR serve a useful function for a wide variety of test stakeholders

• The CEFR makes it easier to understand the range of options which are available and helps users to make appropriate choices for their needs


Conclusions
HEALTH WARNINGS

• Claims that are made need to be checked – make sure that there is adequate evidence to back them up

• Be careful in using the CEFR and other Frameworks:
  - they encourage oversimplification and misinterpretation
  - there is a danger that they become prescriptive, rather than informative, tools

• Use the ALTE toolkit and the Council of Europe Manual to help evaluate claims and understand the evidence


Conclusions

CAVEAT EMPTOR

Thank you for your attention

