1.1 Introduction
To help decide on the most suitable formats for inclusion in a test, it is useful to be aware of the alternative approaches to language testing and their limitations in terms of the criteria of validity, reliability and efficiency. Validity is concerned with whether a test measures what it is intended to measure. Reliability is concerned with the extent to which we can depend on the test results. Efficiency is concerned with matters of practicality and cost in test design and administration. These glosses should be sufficient to follow the review of approaches to language testing in this chapter. Readers requiring a more detailed treatment before continuing are referred to Chapter Two, where a full discussion of these concepts can be found.
Davies (1978) argued that by the mid-1970s, approaches to testing seemed to fall along a continuum stretching from 'discrete' item tests at one end, to integrative tests such as cloze at the other. He took the view that in testing, as in teaching, there was a tension between the analytical on the one hand and the integrative on the other, and considered that (p. 149) 'the most satisfactory view of language testing and the most useful kinds of language tests, are a combination of these two views, the analytical and the integrative.' He went on to say that it was probable, in any case, that no test could be wholly analytical or integrative (p. 149):
The two poles of analysis and integration are similar to (and may be closely related to) the concepts of reliability and validity. Test reliability is increased by adding to the stock of discrete items in a test: the smaller the bits and the more of these there are, the higher the potential reliability. Validity, however, is increased by making the test truer to life, in this case more like language in use.
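Davies's point that adding discrete items raises potential reliability can be made concrete with the Spearman-Brown prophecy formula, which predicts a test's reliability when it is lengthened by a given factor. The base reliability of 0.70 and the 20-item starting length below are invented for illustration only:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by factor k, given its current reliability r.
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Illustrative figures: a 20-item discrete-point test with reliability 0.70.
base = 0.70
for k in (1, 2, 3, 4):  # 20, 40, 60, 80 items
    print(f"{20 * k} items -> predicted reliability {spearman_brown(base, k):.3f}")
```

Doubling the hypothetical test from 20 to 40 items raises the predicted reliability from 0.70 to about 0.82, which is the sense in which more 'bits' buy potential reliability, while, as Davies notes, validity pulls in the opposite direction, towards fewer, truer-to-life tasks.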
Oller (1979), on the other hand, felt that testing should focus on the integrative end of the continuum. He made a strong case for following the swing of the testing pendulum away from what Spolsky (1976) had described as the 'psychometric-structuralist era'.
2 Communicative Language Testing
While the distinction between the direct and indirect testing of language abilities may lie not so much in the formats and procedures adopted as in the relationship claimed between test performance and the authentic task, indirect tests are limited in the information they can yield:
Analysis of a student's responses to an indirect test will not provide any relevant information as to the reasons for the student's difficulties in the authentic task, of which, one assumes, the indirect test is a valid and reliable measure. By their very nature, indirect tests can provide evidence for level of achievement, but cannot diagnose specific areas of difficulty in relation to the authentic task.
Integrative tests such as cloze only tell us about a candidate's linguistic competence. They do not tell us anything directly about a student's performance ability, and their main value in their unmodified form is in designating competence levels rather than relating candidates' performance to any external criteria. They are perhaps only of limited use where the interest is in what the individual student can or cannot do in terms of the various language tasks he may face in real life situations.
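Since cloze recurs throughout this chapter, a sketch of how a fixed-ratio cloze passage is typically constructed may help: every nth word is deleted after an intact run-in. The passage, deletion rate and run-in length below are arbitrary illustrative choices, not a prescription:

```python
# Build a cloze test by deleting every nth word (here n = 5),
# leaving an initial run-in of words intact to establish context.
def make_cloze(text: str, n: int = 5, run_in_words: int = 10):
    words = text.split()
    gapped, answers = [], []
    for i, w in enumerate(words):
        # Count deletions only after the run-in portion.
        if i >= run_in_words and (i - run_in_words) % n == n - 1:
            answers.append(w)
            gapped.append("_____")
        else:
            gapped.append(w)
    return " ".join(gapped), answers

passage = ("The committee met on Tuesday to discuss the proposal. "
           "Most members felt that the plan needed further revision "
           "before it could be put to a general vote.")
test_text, key = make_cloze(passage, n=5)
print(test_text)
print("Answer key:", key)
```

The mechanical nature of the deletion procedure is precisely why cloze scores speak to underlying linguistic competence rather than to performance on any externally specified task.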
The deficiencies in the type of information the 'discrete point' approaches of the psychometric-structuralist era and the more integrative approaches of the psycholinguistic-sociolinguistic era can provide bring about a need to investigate the 'communicative paradigm' to see whether this approach might prove more satisfactory.
Approaches to language test design
1.1.1 Terminology
There is a potential problem with terminology in some of the literature on communicative approaches to language testing. References are often made in the literature to testing communicative 'performance', e.g., B.J. Carroll's book (1980b) is entitled Testing Communicative Performance. It seems reasonable to talk of testing performance if the reference is to an individual's performance in one isolated situation, but as soon as we wish to generalise about ability to handle other situations, 'competence' as well as 'performance' would seem to be involved, or more precisely 'capacity' in the Widdowsonian sense (Widdowson, 1983). Bachman's use of the term 'communicative language ability', which includes both knowledge, or competence, and the capacity for implementing that competence in language use, would seem to be particularly helpful in providing a more inclusive and satisfactory definition of language proficiency.
Strictly speaking, a performance test is one which samples behaviour in a single setting with no intention of generalising beyond that setting; otherwise a communicative language test is bound to concern itself with 'capacity' (Widdowson, 1983) or 'communicative language ability' (Bachman, 1990). The very act of generalising beyond the setting actually tested implies some statements about abilities to use the language and/or knowledge of it. Conversely, it is difficult to see how competence (knowing about using a language) might be evaluated except through its realisation in performance. Only performance can be directly observed and hence evaluated. All linguistic behaviour, even completing multiple choice tests of phoneme discrimination, necessarily involves performance. In practice a clear distinction between performance and competence will be difficult to maintain.
In testing communicative language ability we are evaluating samples of performance, in certain specific contexts of use, created under particular test constraints, for what they can tell us about a candidate's communicative capacity or language ability. Skehan (1988) points out that while such tests may not replicate exactly the performance conditions of a specific task in the target situation, they are likely to replicate to some degree conditions of actual performance.
Skehan summarises the current position succinctly:

What we need is a theory which guides and predicts how an underlying communicative competence is manifested in actual performance; how situations are related to one another; how competence can be assessed by examples of performance on actual tests; what components communicative competence actually has; and how these interrelate ... Since such definitive theories do not exist, testers have to do the best they can with such theories as are available, arguments relating to the state of the available descriptions of language in use notwithstanding.
Agreement on what components should be included in a model of communicative language ability is by no means unanimous (see Courchene and de Bagheera, 1985, p. 49). Indeed relatively little is known about the wider communicative paradigm in comparison with linguistic competence per se, and adequately developed theories of communicative language use are not yet available. This is not to say we must wait for completion of such theories before appropriate testing procedures can be developed. Rather we need to investigate systematically some of the available hypotheses about language use and try to operationalise these for testing purposes. In this way the constructs and processes of applied linguistics may be examined empirically and their status evaluated.
Canale and Swain (1980) provided a useful starting point for a clarification of the terminology necessary for forming a more definite picture of the ability to use language communicatively. These authors took communicative competence to include grammatical competence (knowledge of the rules of grammar), sociolinguistic competence (knowledge of the rules of use and rules of discourse) and strategic competence (knowledge of verbal and non-verbal communication strategies). The model was subsequently updated by Canale (1983), who proposed a four-dimensional model comprising linguistic, sociolinguistic, discoursal and strategic competences; the additional distinction being made between sociolinguistic competence (sociocultural rules) and discoursal competence (cohesion and coherence).
The framework proposed by Bachman (1990) is consistent with these earlier definitions of communicative language ability. Language competence is composed of the specific knowledge and skills required for operating the language system, for establishing the meanings of utterances, for employing language appropriate to the context and for operating through language beyond the level of the sentence. Strategic competence consists of the more general knowledge and skills involved in assessing, planning and executing communicative acts efficiently. Skehan (1988) suggests that the strategic component is implicated when communication requires improvisation because the other competences are in some way insufficient. The final part of Bachman's model deals with skill and method factors which are meant to handle the actual operation of language in real situations and so locate competence in a wider performance framework.
Models such as these provide a potentially useful framework for the design of language tests, but it must be emphasised that they are still themselves in need of validation (see Brindley, 1986; Swain, 1985). The existence of the components of the model even as separate entities has not been established. Skehan (1988) rightly points out that the relationship between the various competences is not entirely clear, nor is the way they are integrated into overall communicative competence. Nor is it made clear how this communicative competence is translated into communicative performance. Candlin (1986) also outlined some of the problems to be faced in testing communicative competence and argued that their solution depends first on our description of this construct.
To date a limited amount of research has been carried out on investigating the measurement of language competence and method factors, but very little has been done on the specific measurement of communication strategies or their relationship to the other competences. This in itself may be an indication of the inherent difficulties in this area. There is a pressing need for systematic research to illuminate all of these unresolved issues.
To help clarify what is meant by communicative testing we are forced to resort to available pretheoretical data from the literature relating to the concept of communicative competence. Since Hymes's two-dimensional model of communicative competence, comprising a 'linguistic' and a 'sociolinguistic' element, most subsequent models have included consideration of a sociolinguistic dimension which recognises the importance of context to the appropriate use of language and the dynamic interaction that occurs between that context and the discourse itself.
For Hymes (1972), communicative competence had included the ability to use the language, as well as having the knowledge which underlay actual performance. Morrow (1979) felt that a distinction needed to be made between communicative competence and communicative performance, the distinguishing feature of the latter being the fact that performance is the realisation of Canale and Swain's (1980) three competences and their interaction 'in the actual production and comprehension of utterances (under general psychological constraints that are unique to performance)' (Morrow, 1979).
Morrow (1979) and Canale and Swain (1980) argued that communicative language testing, as well as being concerned with what the learner knows about the form of the language and about how to use it appropriately in contexts of use (competence), must also deal with the extent to which the learner is actually able to demonstrate this knowledge in a meaningful communicative situation (performance), i.e., what he can do with the language, or as Rea (1978, p. 4) put it: 'his ability to communicate with ease and effect in specified sociolinguistic settings.'
The capacity or ability (see Widdowson, 1983; Bachman, 1990) to use language communicatively thus involves both competence and demonstration of the ability to use this competence. It is held that the performance tasks candidates are faced with in communicative tests should be representative of the type of task they might encounter in their own real-life situation and should correspond to normal language use, where an integration of communicative skills is required with little time to reflect on or monitor language input and output. The criteria employed in the assessment of performance on these tasks should relate closely to the effective communication of ideas in that context.

This perspective is consistent with the work of language testers generally supportive of a broadly based model of communicative language ability, where there is a marked shift in emphasis from the linguistic to the communicative dimension. The emphasis is no longer on linguistic accuracy, but on the ability to function effectively through language in particular situations of use.
Cooper's (1968) warning against existing testing frameworks, which concentrated on linguistic competence and might therefore fail to reflect a person's communicative ability, was taken up by Morrow (1979, p. 149): success on such tests would not give any convincing proof of the candidate's ability to actually use the language, to translate the competence (or lack of it) which he is demonstrating into actual performance in ordinary situations.
B.J. Carroll (1980b, p. 1) adopted a similar line: what is to be tested is the quality of the candidate's performance in communicative events, the ability to use the language rather than simply knowledge about it.
Appropriate skills and performance conditions need to be identified for any tests which claim to assess communicative language ability. Test constructors must closely identify those skills and performance conditions (see Skehan, 1988) that are the most important components of language use in particular contexts. The incorporation of these features, where appropriate, would indicate the degree to which the test task reflected the attributes of the activity in real life that it was meant to replicate. Unless steps are taken to identify and incorporate such features it would seem imprudent to make statements about a candidate's ability to function in normal conditions in his or her future target situation.

We also have to ensure that the sample of communicative language ability in our tests is as representative as possible. What to sample in language testing, and how to sample with our tests, is a key issue. Many of the available descriptions of language use are now both detailed and quite extensive, but are not always called on by testers. We need to make good this deficiency. If we are to extrapolate from our test data and make statements about communicative language ability in real life situations, great care needs to be taken over the texts and tasks we employ in our tests. These should accord as far as possible with the general descriptive parameters of the intended target situation, particularly with regard to the skills necessary for successful participation in that situation. Additionally, tests should meet the performance conditions of that context as fully as possible. The difficulties of achieving this match with real life, and the resultant implications for generalisability from communicative test data, are discussed in the final section of this chapter.

In the testing literature there is a strong emphasis on the importance of test purpose, and it is held that no one solution can accommodate the wide variety of possible test scenarios. It is argued that appropriately differentiated tests in different skills areas need to be made available for evaluating different groups of examinees with different target situation needs. To measure language proficiency adequately in each situation, account must now be taken of: where, when, how, with whom and why the language is to be used, and on what topics, and with what effect. The fact that communicative performance is affected by prior knowledge/experience/abilities is accepted, along with the implications of this for test specificity (see Alderson and Urquhart, 1985b).

The important role of context as a determinant of communicative language ability is stressed, and an integrative approach to assessment is advocated as against a decontextualised approach. Language cannot be meaningful if it is devoid of context (linguistic, discoursal and sociocultural). For Oller (1973, 1979) the higher the level at which language is contextualised, the more effective language perception, processing and acquisition are likely to be. The variability in performance according to the discourse domain or type of task involved is recognised, with the attendant implications this might have for test length and the types of text and formats to be included in a test battery (see Douglas and Selinker, 1985; Skehan, 1987).

The authenticity of tasks and the genuineness of texts in tests is regarded as something worth attempting to pursue despite the problems involved both in the definition of this and in its realisation. If inauthentic tasks are included in tests of communicative language ability there is a real danger that the method employed could interfere with the measurement of the construct we are interested in. We could end up measuring ability to cope with the method rather than the ability to read, listen, write, speak or deal with a combination
of these skills in specified contexts. The more authentic the tasks the less we need to be concerned about this. If certain techniques only occur in tests, e.g., cloze or multiple choice, why should we ever contemplate their use? Tests of communicative language ability should be as direct as possible (attempt to reflect the 'real life' situation) and the tasks candidates have to perform should involve realistic discourse processing.

Unsimplified language, i.e., non-doctored, 'genuine' texts should be used as inputs (see Widdowson, 1983) and due reference made to the referential and functional adequacy of these. In addition attention needs to be paid to other task dimensions such as the size of the text to be understood or produced and to processing in real time.

The net result of these considerations is that different tests need to be constructed to match different purposes and these instruments are no longer uniform in content or method. A variety of tests is now required, whereas within previous orthodoxies authors were satisfied with a single 'best test'.
In assessing the ability to interact orally we should try to reflect the interactive nature of normal spoken discourse and attempt to ensure that reciprocity is allowed for in the test tasks included. The tasks should be conducted under normal time constraints and the element of unpredictability in oral interaction must be recognized, for authentic communication may lead the participants in unforeseen directions. Candidates may also be expected in certain tasks such as group discussion to demonstrate an ability to manage the interaction and/or to negotiate meaning with interlocutors. In short, what we know from the theory of spoken interaction should be built into tasks which purport to test it (see Bygate, 1987 and Weir and Bygate, 1990).
The legitimacy of separate skills testing is being questioned, however, and indeed the more innovatory testing of skills through an integrated story-line set of procedures (see Low, 1986) is gaining favour. The discredited holy grails of the psycholinguistic-sociolinguistic era, such as cloze, are still seen to have a minor role to play in adding to the reliability of test batteries and assessing the more specifically linguistic skills, but centre stage is now given to more direct attempts to operationalise the integrated testing of communicative language ability.
Direct testing requires an integrated performance from the candidate involving communication under realistic linguistic, situational, cultural and affective constraints. Candidates have to perform both receptively and productively in relevant contexts. The focus is on the expression and understanding of functional meaning as against a more limited mastery of form. The move to direct testing has been further encouraged by a concern among language testers about the problems of format effect.

Format effect relates to the possibility that test results may be contaminated by the test format employed, i.e., a different estimate of a skill such as reading might be obtained if a different format is employed. This possible influence of test method on trait estimation is increasingly recognised if not yet fully understood (see Bachman and Palmer, 1982; Bachman, 1990). There is some evidence (Murphy, 1978, 1980; Weir, 1983a) to suggest that multiple choice format may be particularly suspect in this respect.
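The way method variance can contaminate trait estimation can be illustrated with a toy simulation in which all magnitudes are invented: two formats measure the same underlying reading ability, but the multiple choice format contributes more format-specific variance, so its scores track the trait less closely:

```python
import random
import statistics

random.seed(42)

def pearson(x, y):
    # Pearson correlation between two equal-length score lists.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n = 2000
ability = [random.gauss(0, 1) for _ in range(n)]       # the trait we want
# A fairly direct task adds little method-specific noise...
direct = [a + random.gauss(0, 0.4) for a in ability]
# ...while the multiple choice format adds a larger method effect
# (guessing, option-elimination skill; magnitudes are invented).
mc = [a + random.gauss(0, 1.2) for a in ability]

print(f"trait vs direct format:          r = {pearson(ability, direct):.2f}")
print(f"trait vs multiple choice format: r = {pearson(ability, mc):.2f}")
```

With these invented magnitudes the direct-format scores correlate far more strongly with the trait than the multiple choice scores do; studies such as Bachman and Palmer (1982) separate the two sources of variance empirically using multitrait-multimethod designs.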
In order to elicit the student's best performance it is important to minimise any detrimental effect of the techniques of measurement on this performance. It is felt that the type of performance elicited by certain assessment methods may be qualitatively different from real-life language use, and to the extent that this is the case it is difficult to make statements about candidates' language proficiencies.
In the area of marking, the holistic and qualitative assessment of productive skills, and the implications of this for test reliability, need to be taken on board. The demands of a criterion-referenced approach to testing communicative language ability (where statistical analysis is likely to be more problematic) and the establishment of meaningful cut-off scores demand attention (see Brindley, 1986; Cziko, 1981; and Hauptman et al., 1985). At the final stage of the testing process the profiling of test results has to be addressed as we abandon the notion of a single general proficiency.
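One practical facet of establishing meaningful cut-off scores is checking how consistently a cut classifies candidates. A minimal check, with invented scores and an invented cut-off, compares mastery decisions across two parallel forms:

```python
# Classification consistency at a cut-off score: do two parallel
# forms put the same candidates on the same side of the cut?
# Scores and the cut-off of 60 are invented for illustration.
CUT = 60

form_a = [72, 58, 65, 40, 81, 59, 63, 55, 70, 61]
form_b = [68, 61, 60, 45, 79, 54, 66, 50, 73, 57]

master_a = [s >= CUT for s in form_a]
master_b = [s >= CUT for s in form_b]

agree = sum(a == b for a, b in zip(master_a, master_b))
print(f"Same mastery decision for {agree}/{len(form_a)} candidates")
```

Here 8 of 10 candidates receive the same decision on both forms; the two who flip are those scoring nearest the cut (58 vs 61, and 61 vs 57), which is why the placement of cut-off scores demands the attention described above.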
Though it is accepted that linguistic competence must be an essential part of communicative competence, the way in which they relate to each other, or indeed how either relates to communicative ability (in the performance sense), has not been clearly established by empirical research. A good deal of work needs to be done in comparing results obtained from performance on non-communicative, linguistically-based tests with those which sample communicative ability through instances of performance, before one can make any positive statements about the former being a sufficient indication of likely ability in the latter or in real life situations. No realistic comparisons are possible until reliable and effective, as well as valid, methods are investigated to assess proficiency in performing relevant communicative tasks.
For testers operating within the communicative paradigm there is greater pressure to validate tests because of an expressed desire to make the tests as direct as possible both in terms of task and criteria. Claims made for tests being able to measure or predict real-life language performance adequately must be tentative until the validity of the measures used is substantiated. There is a pressing need to establish the theoretical and empirical validity of measures conceived within this paradigm.

The use of introspection studies to investigate the validity of a skills approach to the testing of reading is an example of one way validation studies might develop in the future. Candidates taking a reading test might be asked to verbalise how they answer each item and the results of these investigations could then be compared with the tester's intentions in setting each item. This might shed light on the nature of the reading construct itself and the way suspected component skills relate to each other (or not) (see also Candlin, 1986).
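The comparison between tester intentions and candidate introspections described above can be operationalised very simply. Item numbers, skill labels and the reported responses below are all invented for illustration:

```python
# Tester's intended skill per item, versus the skill candidates most
# often reported using in think-aloud protocols (toy data throughout).
intended = {1: "skimming", 2: "scanning", 3: "inference",
            4: "scanning", 5: "inference"}
reported = {1: "skimming", 2: "scanning", 3: "word-level guessing",
            4: "scanning", 5: "inference"}

matches = [item for item in intended if intended[item] == reported[item]]
mismatches = {item: (intended[item], reported[item])
              for item in intended if intended[item] != reported[item]}

print(f"Items matching the tester's intention: {matches}")
print(f"Items needing review: {mismatches}")
```

In this toy data, item 3 flags a mismatch between the inference skill the tester intended and the word-level guessing candidates reported, which is exactly the kind of evidence about the reading construct and its component skills that such validation studies seek.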
The commitment to making tests communicative thus entails a high degree of explicitness both at the test design stage, where one is concerned with the required result, and at the evaluation stage, where one is estimating the acquired result (see Hawkey, 1982). It is not necessarily the case that communicative tests will look radically different from some existing tests; but there may be strong pragmatic reasons for trying to demonstrate any difference in the test content, the marking schemes to be applied or the way results are reported.
In the present state of uncertainty the effect of the test on the teaching that precedes it should receive serious consideration. If our communicative tests have a beneficial backwash effect in encouraging the development of communicative capacity in the classroom (see Swain, 1985; Hughes, 1989) then we can be less worried about the theoretical or empirical shortcomings of our knowledge of language in use. Similarly, if we can include in our tests what is considered to be most appropriate and best practice from the language classroom, the match between teaching, testing and reality is that much enhanced. The procedures adopted by the Royal Society of Arts/UCLES in the design of their Certificates in Communicative Skills in English are worthy of note in this respect (see Appendix III).
The promising field of Action Research has much to offer in this respect. Cumulative, informal, small-scale investigations by teachers, in a variety of classroom contexts, could help develop our understanding of a whole range of communicative techniques for assessing language proficiency (see Brindley, 1989).
There are strong arguments for following the lead of the UCLES/RSA in this respect. In the design of their Certificates in Communicative Skills in English, the test constructors drew heavily on what EFL teachers thought to be sound practice in the classroom. The communicative tests that resulted are in essence classroom-proven teaching techniques which are convertible to elicitation techniques in a test situation (see Appendix III for a full description of this test).
The only necessary difference between teaching and testing within the communicative paradigm relates to the amount of help that is available to the candidate from his teacher or peers. The help that is normally available in the teaching situation, e.g., prompts, reformulation of questions, encouragement, correction and the opportunity to try again, is removed in a test for reasons of reliability of measurement. In this sense the test might be viewed as an intermediate stage between the world of the classroom and the future target situation where the candidate will have to operate unaided.
That is, of all that a student is expected to know and/or do as a result of his course of study (in an achievement test) or that the position requires (in the case of a proficiency test), a test measures students only on a selected sample. The reliability of a test in this conception is the extent to which it provides a dependable indication of candidates' ability in relation to the whole of what has been sampled, and hence of the relation between the competence demonstrated and predicted performance.
Morrow (1979, p. 152) saw a third way out of the extrapolation quandary. His argument is that a model (as yet unrealised) for the performance of global communicative tasks may show, for any task, the enabling skills which have to be mobilised to complete it:

The status of these enabling skills vis-a-vis competence/performance is interesting. They may be identified by an analysis of performance in operational terms, and thus they are clearly, ultimately performance-based. But at the same time, their application extends far beyond any one particular instance of performance, and in this creativity they reflect an aspect of what is generally understood by competence. In this way they offer a possible approach to the problem of extrapolation.
He asserted that (p. 153): 'Analysis of the global tasks, in terms of which the candidate is to be assessed, will usually yield a fairly consistent set of enabling skills' and argues that assessment of ability in using these skills would therefore yield data which are relevant across a broad spectrum of global tasks, and are not limited to a single instance of performance.
For Morrow (1979, p. 153), a working solution to the problem would be the development of tests which measure both overall performance in relation to a specified task and the strategies and skills which have been used in achieving it:

Written and spoken production can be assessed in terms of both these criteria. In task-based tests of listening and reading comprehension, however, it may be rather more difficult to see just how the global task has been completed. ... it is rather difficult to assess why a particular answer has been given and to deduce the skills and strategies employed. In such cases, questions focusing on specific enabling skills do seem to be called for in order to provide the basis for convincing extrapolation.
He is aware, though, that there exists in tests of enabling skills a fundamental weakness in the relationship between the whole and the parts, as a candidate may prove quite capable of handling individual enabling skills, yet still not be able to communicate effectively.

Another problem is that it is by no means easy to identify these enabling skills; nor are there any guidelines for assessing their relative importance for the successful completion of a particular communicative task, let alone their relative weighting across a spectrum of tasks. Morrow would appear to assume that we are not only able to establish these enabling skills, but also to describe the relationship that exists between the part and the whole in a fairly accurate manner (in this case, how 'separate' enabling skills contribute to the communicative task). He appears to be saying that there is a prescribed formula: possession and ability to use enabling skills X + Y + Z = successful completion of communicative task (1), whereas it would seem likely that the added presence of a further skill or the absence of a named skill might still result in successful completion of the task in hand.
A pragmatic way out of this dilemma of how to know what we are testing would be to pursue an ethnographic validation approach as outlined in Section 2.1.2. Data could be collected from student introspections on the processes they are utilising to complete items. This could be used to help determine which items best fitted the required specification (see Aslanian, 1985; Cohen, 1985; and Jones and Friedl, 1986).
In addition, advice could be taken from professionals who control the content and the language of purposive interactions in the target domain of proficiency test candidates. They could be asked to comment on the appropriateness of test items for the intended population (see Douglas and Pettinari, 1983). The recent IELTS revision project has adopted this useful strategy.
The extrapolation problem faced by those adopting a more communicative approach to language test design seems to relate to the wider issue of the status of laws in the behavioural sciences. In the physical sciences, laws are extrapolations of replicable phenomena. Researchers in these domains can directly confront what they wish to investigate, formulate hypotheses and repeat experiments as many times as they wish to verify or falsify their hypotheses. Because of problems associated with the infinite variability of language in use and the problems involved in population sampling, the scientific paradigm is a difficult one to follow in educational measurement.

Hawkey (1982) described the classical scientific paradigm as a hypothetico-deductive methodology: formulating quantifiable, narrow, parsimonious hypotheses, tested through the observation of the behaviour of a random sample of the target population, followed by a statistical analysis of the results according to pre-ordained procedures. This approach is not suitable for large-scale proficiency testing where candidates might be operating in a variety of contexts and at a variety of levels. Account may need to be taken of a large number of variables, some of which are not predictable, all interacting in socio-cultural contexts. Thus there is a task sampling problem, a validity problem.
Unlike the scientific paradigm described by Hawkey, there might also be serious problems
in terms of population sampling. If the target population of students is transient, widely
dispersed and varied in terms of accessibility, the sampling might of necessity have to
be opportunistic. This is a population sampling problem, a reliability problem.
The concern might have to be, of necessity, with what Hawkey (1982, p. 16) described
as an 'illuminative evaluation' paradigm, where the focus was on the description of complex
phenomena, the resolution of significant features, and the comprehension of relationships.
Initial research in this area might have to limit itself to providing a descriptive framework
for establishing communication tasks of relevance to students in a specified context, prior
to test construction (see Weir, 1983b). No definitive claims could be made (without
empirical validation studies) about how the language user operates when involved in these
communication tasks or how he learns to perform such tasks (see Kelly, 1978). What might
be provided is a specification, coarse but robust, of the general communicative tasks facing
target students in their specified context.
Irrespective of the problematic nature of the exercise, the need for specifying as clearly
as possible what it is that is to be tested seems axiomatic for testing within the communicative
paradigm. The current interest in ESP is a reflection of this, and the acronym might better
be regarded as English for Specified Purposes rather than specific or special. This would
emphasise the belief that all teaching and testing is to varying extents specified and never
totally general.
However, the nature and shortcomings of target-situation analysis for arriving at
specifications of language needs for tests have been discussed extensively in the literature
(see Weir, 1983a). There are dangers in analyses being too specific, e.g. they may not
be operationalisable in a test, as well as in being too general, e.g. they may disadvantage
certain candidates. Perhaps the biggest danger is that there is a tendency for needs analysis
to claim a disproportionate amount of the time and resources available for research, often
at the expense of test development.
A comprehensive description of the specification stage of test design can be found
in Weir (1983a, 1983b), where the extensive research and development behind the
Associated Examining Board's Test in English for Educational Purposes (TEEP) is fully
documented (see also Appendix I). The careful reader can see how the results of the needs
analysis influenced the adoption of certain test formats (dictation, the integration of reading,
listening and writing activities) and clarified the range of skills to be tested and the
assessment criteria to be employed. The recent IELTS revision project has forsaken such
needs analysis and instead intends to rely on post hoc 'expert' comment to judge the
'authenticity' and other aspects of the products of the test writing teams, who were not
to be constrained by any a priori specifications (see Appendix V).
In the development of tests in the future the balance of attention must be paid to translating
specifications into test realisations and validating the latter (in the ways to be discussed
in Chapter Two), though the need to specify in advance of item writing the construct that
is to be measured is ignored at one's peril. The emphasis, however, should be on test
development and validation rather than on the analysis of needs for creating test
specifications. For this reason the discussion of needs analysis has been limited, and instead
the focus in Chapters Three and Four will be on test construction and in particular on
examining a range of possible formats for testing language skills within a communicative
framework.
The crucial stage in any test development occurs when the specification is translated
into a test realisation. The test that results should exhibit the qualities of validity, efficiency
and reliability which are examined next in Chapter Two. These qualities need to be
determined both qualitatively a priori and empirically a posteriori.