
Chapter One

Approaches to Language Test Design:


A Critical Review

1.1 Introduction
To help decide on the most suitable formats for inclusion in a test, it is useful to be aware
of the alternative approaches to language testing and their limitations in terms of the criteria
of validity, reliability and efficiency.
Validity is concerned with whether a test measures what it is intended to measure.
Reliability is concerned with the extent to which we can depend on the test results. Efficiency
is concerned with matters of practicality and cost in test design and administration.
These glosses should be sufficient to follow the review of approaches to language testing
in this chapter. Readers requiring a more detailed treatment before continuing are referred
to Chapter Two where a full discussion of these concepts can be found.
Davies (1978) argued that by the mid-1970s, approaches to testing seemed to fall along
a continuum stretching from 'discrete' item tests at one end, to integrative tests such as
cloze at the other. He took the view that in testing, as in teaching, there was a tension
between the analytical on the one hand and the integrative on the other, and considered
that (p. 149) 'the most satisfactory view of language testing and the most useful kinds of
language tests, are a combination of these two views, the analytical and the integrative.'
He went on to say that it was probable, in any case, that no test could be wholly analytical
or integrative (p. 149):

The two poles of analysis and integration are similar to (and may be closely related to) the
concepts of reliability and validity. Test reliability is increased by adding to the stock of discrete
items in a test: the smaller the bits and the more of these there are, the higher the potential
reliability. Validity, however, is increased by making the test truer to life, in this case more
like language in use.
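
Davies's remark that reliability rises as discrete items are added is commonly formalised in
test theory by the Spearman-Brown prophecy formula, which is not given in the text itself.
The sketch below is illustrative only: the function name and the example figures are
assumptions, and the formula presumes that the added items are parallel to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Spearman-Brown prophecy formula: r_k = k * r / (1 + (k - 1) * r).
    Assumes the added items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
doubled = spearman_brown(0.60, 2)
```

Under these assumptions, doubling the test raises the predicted reliability from 0.60 to 0.75.
The formula says nothing about validity, which is the contrast Davies draws.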
Oller (1979), on the other hand, felt that testing should focus on the integrative end
of the continuum. He made a strong case for following the swing of the testing pendulum
away from what Spolsky (1976) had described as the 'psychometric-structuralist era',
2 Communicative Language Testing

or the so-called 'discrete point' approach to testing, to what he termed the
'psycholinguistic-sociolinguistic era': the age of the integrative test.
In the description of these approaches below, [...] were 'distinct'
or 'pure' types. It is recognised that, in practice, [...] of the discrete
and the integrative, either in the test [...] the assessment
procedures adopted, but, while the distinction between the [...]

1.2 The psychometric-structuralist era
The clear advantages of testing 'discrete' linguistic points are that they yield data which
are easily quantifiable, as well as allowing [...]. Tests which focus
on 'discrete' linguistic items are efficient and have the [...] reliability of marking associated
with objectively scored tests, but both [...]
employed in it suffer from the defects of the construct they seek to measure.
The problem with this approach [...] on proficiency [...]
deficiencies in terms of the construct validity [...] of this approach:

Discrete point analysis necessarily [...] is greater than the sum of its parts [...]
crucial properties of the system which simply cannot be found in the parts [...]
Oller is on fairly safe ground here [...] a candidate's [...] driving [...] to demonstrate [...]
does not depend solely on a pencil and paper test to inform [...] about [...] knowledge
[...] the principles of driving. [...]
[...] criticised isolated skills tests from this point of view [...] unlikely that [...]
most commonly [...] either singly [...]


Kelly (1978) argued that if the goal of applied linguistics was seen as the applied
analysis of meaning, e.g., the recognition of [...] from its system-giving meaning,
then applied linguists should be more interested in the
development and measurement of ability to take part in specified communicative
performance, the production of and comprehension of coherent discourse, rather than in
linguistic competence. This echoed Spolsky's (1968) earlier point that perhaps instead of
attempting to establish a person's knowledge of a language in terms of a percentage mastery
of grammar and lexis, we would be better employed in testing that person's ability to
perform in a specified socio-linguistic setting.
Rea (1978, p. 51) has expressed a similar view:

although we would agree that language is a complex behaviour and that we would generally
accept a definition of overall language proficiency as the ability to function in a natural language
situation, we still insist on, or let oth[...] as an abstract array of discrete [...]
tests yield artificial, sterile and irrelevant types of items which have no relationship to the
use of language in real life situations.

Morrow (1979) argued that if we are to assess proficiency, i.e., potential success in
the use of the language in some general sense, it would be more valuable to test for a
knowledge of and an ability to apply the rules and processes by which these discrete elements
are synthesised into an infinite number of grammatical sentences and then selected as being
appropriate for a particular context, rather than simply to test knowledge of the elements
alone. Morrow (1979, p. 145) argues the point that:

knowledge of the elements of a language in fact counts for nothing unless the user is able
to combine them in new and appropriate ways to meet the linguistic demands of the situation
in which he wishes to use the language.

1.3 The psycholinguistic-sociolinguistic era
In response to a feeling that 'discrete point' tests were insufficient indicators of language
proficiency, the testing pendulum on the whole swung in favour of global tests in the 1970s,
into what Spolsky (1976) termed the psycholinguistic-sociolinguistic era, an approach
to measurement that was in many ways contrary to the allegedly atomistic assumptions
of the 'discrete point' tests (see Davies, 1978).
It was claimed by Oller (1979) that global integrative tests such as cloze and dictation
went beyond the measurement of a limited part of language competence achieved by
'discrete point' tests with their bias towards testing the receptive skills; that such tests
could measure the ability to integrate disparate language skills in ways which more closely
approximated the actual process of language use. Oller's view (1979, p. 37) was that:

The concept of an integrative test was born in contrast with the definition of a discrete point
test. If discrete items take language skill apart, integrative tests put it back together. Whereas
discrete items attempt to test knowledge of language one bit at a time, integrative tests attempt
to assess a learner's capacity to use many bits all at the same time, and possibly while exercising
several presumed components of a grammatical system, and perhaps more than one of the
traditionally recognized skills or aspects of skills [author's italics].
Read (1981, p. [...]) succinctly [...] the psycholinguistic-sociolinguistic era:

From a psycholinguistic [...] be seen as less of a well-defined [...]
contribution of the [...] broadening [...] grammatical sentences [...] rules for [...]
the basis on which the [...] judged [...] New criteria [...] have become introduced [...]
cannot be measured [...]

[...] scores in two tests which correlate very highly, in the
sense that both tests put the individuals in more or less the same rank order, but since
correlational measures take little or no account of mean scores, the group's scores may
be centred on very different means in the two tests, indicating quite different levels of
performance overall. In other words, correlational data do not provide evidence about
standards.
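
The point that correlation is blind to mean scores can be checked numerically. The following
sketch uses invented scores for five candidates on two hypothetical tests (the data and all
names are assumptions, not drawn from any study cited here): the rank order is identical
and the correlation is very high, yet the group means differ by 25 points.

```python
import math


def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented scores: the same five candidates on two hypothetical tests.
test_a = [40, 50, 60, 70, 80]
test_b = [22, 31, 34, 41, 47]

r = pearson(test_a, test_b)   # very high: candidates are ranked identically
mean_a = mean(test_a)         # 60.0
mean_b = mean(test_b)         # 35.0
```

A high value of `r` here reflects only agreement in relative standing; the gap between
`mean_a` and `mean_b` is invisible to it.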
The empirical evidence that has been marshalled in favour of the 'unitary competence
hypothesis' is open to some doubt and there is a growing body of evidence favouring a
divisibility hypothesis (see Vollmer, 1979; Bachman and Palmer, 1981a; Hughes, 1981a;
Vollmer, 1981; and Porter, 1983).
Principal component analysis is often used to substantiate the 'unitary competence
hypothesis', but this method is essentially designed to simplify data, and would be expected
to produce one factor from a battery of seemingly different language tests (see Porter,
1983). More crucially, this general language proficiency factor does not necessarily explain
all the variance in the results, and the percentage of variance explained differs from study
to study (see Vollmer, 1981). Because of the existence of factors other than the principal
component, which explain reasonable proportions of the remaining variance, it is often
possible by pursuing further factor analysis, for example Varimax rotation of the factor
structure, to obtain a number of independent factors each of which makes a sizeable
contribution to the total variance.
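
The argument that a dominant first component need not account for all the variance can be
illustrated with a small calculation. The sketch below assumes an invented correlation matrix
for four tests that all intercorrelate at 0.6 (a hypothetical figure) and uses power iteration,
a standard numerical method chosen here only to keep the example self-contained, to find
the share of total variance carried by the first principal component.

```python
import math


def first_component_share(corr, iterations=200):
    """Share of total variance explained by the first principal component.

    `corr` is a square correlation matrix (list of lists). The largest
    eigenvalue is found by power iteration; for a correlation matrix the
    total variance equals the number of variables.
    Assumes all correlations are positive, so a uniform starting vector
    is not orthogonal to the leading eigenvector.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalised vector estimates the eigenvalue.
        eigenvalue = sum(
            v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return eigenvalue / n


# Hypothetical battery: four tests, every pair correlating at 0.6.
battery = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]
share = first_component_share(battery)
```

With these assumed figures the first component carries 70 per cent of the variance, leaving
30 per cent to be accounted for by further components.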
There is also evidence in the literature that the format of a task can unduly affect the
performance of some candidates (see Boniakowska, 1986; Murphy, 1978, 1980; and Weir,
1983a). This makes it necessary to include a variety of test formats for assessing each
construct rather than rely on a single overall measure, such as cloze.
Though the tests Oller has advocated are global in that they require examinees to exhibit
simultaneous control over different aspects of the language system, they are nevertheless
indirect. Although the tests might integrate disparate language skills in ways which more
closely approximate actual language use, one would argue that their claim to the mantle
of communicative validity remains suspect, as only direct tests which simulate relevant
authentic communication tasks can claim to mirror actual communicative interaction (see
Kelly, 1978; Morrow, 1979). As Moller (1982b, p. 25) pointed out, the indirect tests Oller
has advocated do not: 'require subjects to perform tasks considered to be relevant in the
light of their known future use of the language.'
Advocates of communicative language testing would argue that Oller's view pays
insufficient regard to the importance of the productive and receptive processing of discourse,
arising out of the actual use of language in a social context with all the attendant performance
constraints, e.g., the interaction-based nature of discourse, unpredictability and behavioural
outcomes (see Morrow, 1979; Moller, 1981b). Both Rea (1978) and Morrow (1979) have
emphasised that although indirect measures of language abilities claim extremely high
standards of reliability and concurrent validity as established by statistical techniques, their
claim to other types of validity remains suspect.
Morrow (1979) cited as evidence for this the fact that neither cloze nor dictation offers
the opportunity for spontaneous production by the candidate and the language norms which
are followed are those of the examiner (or original author of the text), not of the student
himself. Neither testing procedure offers the possibility for oral or non-controlled written
production and since the oral and written skills are generally held to be highly important,
some means of assessing them reliably in communicative situations should be found.
Although integrative measures appear to correlate highly with other similar measures of
general language proficiency, there is empirical evidence that cloze correlates only
moderately with tests of written production (see Weir and Ormiston, 1978) and with spoken
production (see Vollmer, 1981). Given that the tests concerned are reliable, this would
suggest the possibility that proficiency in these areas cannot be adequately predicted by
a test of overall proficiency.
Morrow also claimed both cloze and dictation to be fundamentally suspect since they
are tests of underlying ability (competence) rather than actual performance. In other words,
they depend basically on a knowledge of the language system rather than the ability to
operate this system in authentic settings. B.J. Carroll (1980b, p. 9) reached the same
conclusion: 'this (cloze test) is still essentially usage based. The task does not represent
genuine interactive communication and is, therefore, only an indirect index of potential
efficiency in coping with day-to-day communicative tasks.'
Even if it were decided that indirect tests such as cloze were valid in some sort of derived
fashion, it still remains true that performing on a cloze test is not the same sort of activity
as reading. The pedagogical consequences of including this type of test measure in a battery
might be harmful if it results in candidates being taught specifically to handle indirect
assessment tasks in preference to teaching them to cope with more realistic tasks.
Kelly (1978, p. 241) made the further point that some candidates may manage to succeed
in the indirect task by training of a certain kind and thus invalidate the test: 'indirect tests
are subject to attacks on their validity in those cases where it is possible to bypass the
ability in question and develop proficiency in the assessment task alone.' He also noted
(1978, pp. 245-6) that:

Analysis of a student's responses to an indirect test will not provide any relevant information
as to the reasons for the student's difficulties in the authentic task, of which one assumes,
the indirect test is a valid and reliable measure. By their very nature, indirect tests can provide
evidence for level of achievement, but cannot diagnose specific areas of difficulty in relation
to the authentic task.
Integrative tests such as cloze only tell us about a candidate's linguistic competence.
They do not tell us anything directly about a student's performance ability, and their main
value in their unmodified form is in designating competence levels rather than relating
candidates' performance to any external criteria. They are perhaps only of limited use
where the interest is in what the individual student can or cannot do in terms of the various
language tasks he may face in real life situations.
The deficiencies in the type of information the 'discrete point' approaches of the
psychometric-structuralist era and the more integrative approaches of the
psycholinguistic-sociolinguistic era can provide bring about a need to investigate the
'communicative paradigm' to see whether this approach might prove more satisfactory.

1.4 The communicative paradigm

1.4.1 Terminology
There is a potential problem with terminology in some of the literature on communicative
approaches to language testing. References are often made in the literature to testing
communicative 'performance', e.g., B.J. Carroll's book (1980b) is entitled Testing
Communicative Performance. It seems reasonable to talk of testing performance if the
reference is to an individual's performance in one isolated situation, but as soon as we
wish to generalise about ability to handle other situations, 'competence' as well as
'performance' would seem to be involved, or more precisely 'capacity' in the Widdowsonian
sense (Widdowson, 1983). Bachman's use of the term communicative language ability
which includes both [...] implementing that
competence in language use (1990) [...] in
providing a more inclusive and satisfactory definition of language proficiency.
Strictly speaking, a performance test is one which samples behaviour in a single setting
with no intention of generalising beyond that setting; otherwise a communicative language
test is bound to concern itself with 'capacity' (Widdowson, 1983) or 'communicative
language ability' (Bachman, 1990). The very act of generalising beyond the setting actually
tested implies some statements about abilities to use the language and/or knowledge of
it. Conversely it is difficult to see how competence (knowing about using a language) might
be evaluated except through its realisation in performance. Only performance can be directly
observed and hence evaluated. All linguistic behaviour, even completing multiple choice
tests of phoneme discrimination, necessarily involves performance. In practice a clear
distinction between performance and competence will be difficult to maintain.
In testing communicative language ability we are evaluating samples of performance,
in certain specific contexts of use, created under particular test constraints, for what they
can tell us about a candidate's communicative capacity or language ability. Skehan (1988)
points out that while such tests may not replicate exactly the performance conditions of
a specific task in the target situation they are likely to replicate to some degree conditions
of actual performance.
Skehan summarises the current position succinctly:

What we need is a theory which guides and predicts how an underlying communicative
competence is manifested in actual performance; how situations are related to one another,
how competence can be assessed by examples of performance on actual tests; what components
communicative competence actually has; and how these interrelate .... Since such definitive
theories do not exist, testers have to do the best they can with such theories as are available.

1.4.2 The theoretical base

The validity of tests which claim to be communicative is a function of the degree of
understanding of communication and communicative ability on the part of the test
constructor. It is naive to assume that one can develop valid tests of communicative
language ability without reference to the construct which one is attempting to measure,
arguments relating to the state of the available descriptions of language in use
notwithstanding.
Agreement on what components should be included in a model of communicative
language ability is by no means unanimous (see Courchene and de Bagheera, 1985, p. 49).
Indeed relatively little is known about the wider communicative paradigm in comparison
with linguistic competence per se and adequately developed theories of communicative
language use are not yet available. This is not to say we must wait for completion of such
theories before appropriate testing procedures can be developed. Rather we need to
investigate systematically some of the available hypotheses about language use and try
to operationalise these for testing purposes. In this way the constructs and processes of
applied linguistics may be examined empirically and their status evaluated.
Canale and Swain (1980) provided a useful starting point for a clarification of the
terminology necessary for forming a more definite picture of the ability to use language
communicatively. These authors took communicative competence to include grammatical
competence (knowledge of the rules of grammar), sociolinguistic competence (knowledge
of the rules of use and rules of discourse) and strategic competence (knowledge of verbal
and non-verbal communication strategies). The model was subsequently updated by Canale
(1983), who proposed a four-dimensional model comprising linguistic, sociolinguistic,
discoursal and strategic competences; the additional distinction being made between
sociolinguistic (sociocultural rules) competence and discoursal competence (cohesion and
coherence).
The framework proposed by Bachman (1990) is consistent with these earlier definitions
of communicative language ability:

Communicative language ability consists of language competence, strategic competence, and
psychophysiological mechanisms. Language competence includes organisational competence,
which consists of grammatical and textual competence, and pragmatic competence, which
consists of illocutionary and sociolinguistic competence. Strategic competence is seen as
performing assessment, planning and execution functions in determining the most effective
means of achieving a communicative goal. Psychophysiological mechanisms involved in
language use characterise the channel (auditory, visual) and mode (receptive, productive) in
which competence is implemented.
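
The hierarchy in this description can be laid out as a nested structure, which makes the
levels of the model easier to see at a glance. The sketch below is only one possible encoding
of the components named in the passage; the variable and function names are hypothetical
and carry no theoretical weight.

```python
# Components exactly as named in the quoted description of Bachman's framework.
# The nesting mirrors "consists of" / "includes" in the passage.
COMMUNICATIVE_LANGUAGE_ABILITY = {
    "language competence": {
        "organisational competence": ["grammatical competence", "textual competence"],
        "pragmatic competence": ["illocutionary competence", "sociolinguistic competence"],
    },
    # Functions that strategic competence is said to perform.
    "strategic competence": ["assessment", "planning", "execution"],
    "psychophysiological mechanisms": {
        "channel": ["auditory", "visual"],
        "mode": ["receptive", "productive"],
    },
}


def leaves(node):
    """Flatten the nested model into a list of its lowest-level entries."""
    if isinstance(node, dict):
        return [leaf for child in node.values() for leaf in leaves(child)]
    return list(node)


lowest_level = leaves(COMMUNICATIVE_LANGUAGE_ABILITY)
```

Listing the lowest-level entries in this way shows how many separate elements a test
specification drawing on the model would, in principle, have to consider.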

Language competence is composed of the specific knowledge and skills required for
operating the language system, for establishing the meanings of utterances, for employing
language appropriate to the context and for operating through language beyond the level
of the sentence. Strategic competence consists of the more general knowledge and skills
involved in assessing, planning and executing communicative acts efficiently. Skehan (1988)
suggests that the strategic component is implicated when communication requires
improvisation because the other competences are in some way insufficient. The final part
of Bachman's model deals with skill and method factors which are meant to handle the
actual operation of language in real situations and so locate competence in a wider
performance framework.
Models such as these provide a potentially useful framework for the design of language
tests, but it must be emphasised that they are still themselves in need of validation (see
Brindley, 1986; Swain, 1985). The existence of the components of the model even as
separate entities has not been established. Skehan (1988) rightly points out that the
relationship between the various competences is not entirely clear, nor is the way they
are integrated into overall communicative competence. Nor is it made clear how this
communicative competence is translated into communicative performance. Candlin (1986)
also outlined some of the problems to be faced in testing communicative competence and
argued that their solution depends first on our description of this construct.
To date a limited amount of research has been carried out on investigating the
measurement of language competence and method factors but very little has been done
on the specific measurement of communication strategies or their relationship to the other
competences. This in itself may be an indication of the inherent difficulties in this area.
There is a pressing need for systematic research to illuminate all of these unresolved issues.
To help clarify what is meant by communicative testing we are forced to resort to available
pretheoretical data from the literature relating to the concept of communicative competence.
Since Hymes's two-dimensional model of communicative competence, comprising a
'linguistic' and a 'sociolinguistic' element, most subsequent models have included
consideration of a sociolinguistic dimension which recognises the importance of context
to the appropriate use of language and the dynamic interaction that occurs between that
context and the discourse itself.
For Hymes (1972), communicative competence had included the ability to use the
language, as well as having the knowledge which underlay actual performance. Morrow
(1979) felt that a distinction needed to be made between communicative competence and
communicative performance, the distinguishing feature of the latter being the fact that
performance is the realisation of Canale and Swain's (1980) three competences and their
interaction: 'in the actual production and comprehension of utterances (under general
psychological constraints that are unique to performance)' (Morrow, 1979).
Morrow (1979) and Canale and Swain (1980) argued that communicative language
testing, as well as being concerned with what the learner knows about the form of the
language and about how to use it appropriately in contexts of use (competence), must also
deal with the extent to which the learner is actually able to demonstrate this knowledge
in a meaningful communicative situation (performance), i.e., what he can do with the
language, or as Rea (1978, p. 4) put it: 'his ability to communicate with ease and effect
in specified sociolinguistic settings.'
The capacity or ability (see Widdowson, 1983; Bachman, 1990) to use language
communicatively thus involves both competence and demonstration of the ability to use
this competence. It is held that the performance tasks candidates are faced with in
communicative tests should be representative of the type of task they might encounter in
their own real life situation and should correspond to normal language use where an
integration of communicative skills is required with little time to reflect on, or monitor
language input and output. The criteria employed in the assessment of performance on
these tasks should relate closely to the effective communication of ideas in that context.
This perspective is consistent with the work of language testers generally supportive
of a broadly based model of communicative language ability where there is a marked shift
in emphasis from the linguistic to the communicative dimension. The emphasis is no longer
on linguistic accuracy, but on the ability to function effectively through language in
particular contexts of situation.
Cooper's (1968) [...] that existing test frameworks, because they concentrated on
linguistic competence, might fail to [...] a person's communicative ability, was taken
up by Morrow (1979, p. [...]) [...] would not give:

any convincing proof of the candidate's ability to actually use language, to translate the
competence (or lack of it) which [...]
B.J. Carroll (1980b, p. 1) adopted a similar line:

[...] theoretical or analytical knowledge of the target [...] that language within the [...]
and constraints of particular language-using [...]

His opinion (1980b, p. 7) is that: 'the ultimate [...]
Moller (1981b) [...] relates to the transmission and reception of particular meanings in
particular contexts, and what [...] be tested is the quality [...]
[...] (1978, p. [...]) [...]

To take part in a communicative event is to produce [...] in typical communication [...]
These statements [...] an emphasis in language teaching and, more recently, testing [...]
use and the concern that [...] communicative [...] 1970; Hymes, [...] essential
[...] for describing the broad [...] within which communicative language testing
should fall, but practitioners [...] the degree of
communicativeness of a test or to make their tests as communicative as possible within
the constraints obtaining.
What does a communicative test look like? How does it differ from other tests? These are
questions which we need to address.

1.4.3 Distinguishing features of communicative language tests
[...] only a few of the currently available theories of language use seem [...] the demands
of language testing, it is therefore [...] as precise as possible about the skills
and performance conditions for any tests which claim to assess communicative language
ability. Test constructors must closely identify those skills and performance conditions
(see Skehan, 1988) that are the most important components of language use in particular
contexts. The incorporation of these features, where appropriate, would indicate the degree
to which the test task reflected the attributes of the activity in real life that it was meant
to replicate. Unless steps are taken to identify and incorporate such features it would seem
imprudent to make statements about a candidate's ability to function in normal conditions
in his or her future target situation.
We also have to ensure that the sample of communicative language ability in our tests
is as representative as possible. What and how to sample with our tests is a key issue
in language testing. Many of the available descriptions of use are now both detailed and
quite extensive, but are not always called on by testers. We need to make good this
deficiency. If we are to extrapolate from our test data and make statements about
communicative language ability in real life situations, great care needs to be taken over
the texts and tasks we employ in our tests. These should accord as far as possible with
the general descriptive parameters of the intended target situation, particularly with regard
to the skills necessary for successful participation in that situation. Additionally, tests should
meet the performance conditions of that context as fully as possible. The difficulties of
achieving this match with real life and the resultant implications for generalisability from
communicative test data are discussed in the final section of this chapter.
In the testing literature there is a strong emphasis on the importance of test purpose,
and it is held that no one solution can accommodate the wide variety of possible test
scenarios. It is argued that appropriately differentiated tests in different skills areas need
to be made available for evaluating different groups of examinees with different target
situation needs. To measure language proficiency adequately in each situation, account
must now be taken of: where, when, how, with whom, and why the language is to be
used, and on what topics, and with what effect. The fact that communicative performance
is affected by prior knowledge/experience/abilities is accepted along with the implications
of this for test specificity (see Alderson and Urquhart, 1985b).
The important role of context as a determinant of communicative language ability is
stressed and an integrative approach to assessment as against a decontextualised approach
is advocated. Language cannot be meaningful if it is devoid of context (linguistic,
discoursal and sociocultural). For Oller (1973, 1979) the higher the level at which language
is contextualised, the more effective language perception, processing and acquisition are
likely to be. The variability in performance, according to the discourse domain or type of
task involved, is recognised with the attendant implications this might have for test length
and the types of text and formats to be included in a test battery (see Douglas and Selinker,
1985; Skehan, 1987).
The authenticity of tasks and the genuineness of texts in tests is regarded as something
worth attempting to pursue despite the problems involved both in the definition of this
and in its realisation. If inauthentic tasks are included in tests of communicative language
ability there is a real danger that the method employed could interfere with the measurement
of the construct we are interested in. We could end up measuring ability to cope with
the method rather than the ability to read, listen, write, speak or deal with a combination
of these skills in specified contexts. The more authentic the tasks the less we need to be
concerned about this. If certain techniques only occur in tests, e.g., cloze or multiple choice,
why should we ever contemplate their use? Tests of communicative language ability should
be as direct as possible (attempt to reflect the 'real life' situation) and the tasks candidates
have to perform should involve realistic discourse processing.
Unsimplified language, i.e., non-doctored, 'genuine' texts should be used as inputs (see
Widdowson, 1983) and due reference made to the referential and functional adequacy of
these. In addition attention needs to be paid to other task dimensions such as the size of
the text to be understood or produced and to processing in real time.
The net result of these considerations is that different tests need to be constructed to
match different purposes and these instruments are no longer uniform in content or method.
A variety of tests is now required whereas within previous orthodoxies authors were satisfied
with a single 'best test'.
In assessing the ability to interact orally we should try to reflect the interactive nature
of normal spoken discourse and attempt to ensure that reciprocity is allowed for in the
test tasks included. The tasks should be conducted under normal time constraints and the
element of unpredictability in oral interaction must be recognized, for authentic
communication may lead the participants in unforeseen directions. Candidates may also
be expected in certain tasks such as group discussion to demonstrate an ability to manage
the interaction and/or to negotiate meaning with interlocutors. In short what we know from
the theory of spoken interaction should be built into tasks which purport to test it (see
Bygate, 1987 and Weir and Bygate, 1990).
The legitimacy of separate skills testing is being questioned, however, and indeed the
more innovatory testing of skills through an integrated story-line set of procedures (see
Low, 1986) is gaining favour. The discredited holy grails of the psycholinguistic-sociolinguistic
era, such as cloze, are still seen to have a minor role to play in adding
to the reliability of test batteries and assessing the more specifically linguistic skills, but
centre stage is now given to more direct attempts to operationalise the integrated testing
of communicative language ability.
Direct testing requires an integrated performance from the candidate involving
communication under realistic linguistic, situational, cultural and affective constraints.
Candidates have to perform both receptively and productively in relevant contexts. The
focus is on the expression and understanding of functional meaning as against a more limited
mastery of form. The move to direct testing has been further encouraged by a concern
among language testers about the problems of format effect.
Format effect relates to the possibility that test results may be contaminated by the test
format employed, i.e., a different estimate of a skill such as reading might be obtained
if a different format is employed. This possible influence of test method on trait estimation
is increasingly recognised if not yet fully understood (see Bachman and Palmer, 1982;
Bachman, 1990). There is some evidence (Murphy, 1978, l98b; Weir, 1983a) to suggest
that multiple choice format may be particularly suspect in this respect.
In order to elicit the student's best performance it is important to minimise any detrimental
effect of the techniques of measurement on this performance. It is felt that the type of
performance elicited by certain assessment methods may be qualitatively different from
real life language use and to the extent that this is the case it is difficult to make statements
about candidates' language proficiencies.
In the area of marking, the holistic and qualitative assessment of productive skills, and
the implications of this for test reliability, need to be taken on board. The demands of
a criterion-referenced approach to testing communicative language ability (where statistical
analysis is likely to be more problematic) and the establishment of meaningful cut-off scores
demand attention (see Brindley, 1986; Cziko, 1981; and Hauptman et al., 1985). At the
final stage of the testing process the profiling of test results has to be addressed as we
abandon the notion of a single general proficiency.
Though it is accepted that linguistic competence must be an essential part of
communicative competence, the way in which they relate to each other, or indeed how
either relates to communicative ability (in the performance sense), has not been clearly
established by empirical research. A good deal of work needs to be done in comparing
results obtained from performance on non-communicative, linguistically-based tests with
those which sample communicative ability through instances of performance, before one
can make any positive statements about the former being a sufficient indication of likely
ability in the latter or in real life situations. No realistic comparisons are possible until
reliable and effective, as well as valid, methods are investigated to assess proficiency in
performing relevant communicative tasks.
For testers operating within the communicative paradigm there is greater pressure to
validate tests because of an expressed desire to make the tests as direct as possible both
in terms of task and criteria. Claims made for tests being able to measure or predict real
life language performance adequately must be tentative until the validity of the measures
used is substantiated. There is a pressing need to establish the theoretical and empirical
validity of measures conceived within this paradigm.
The use of introspection studies to investigate the validity of a skills approach to the
testing of reading is an example of one way validation studies might develop in the future.
Candidates taking a reading test might be asked to verbalise how they answer each item
and the results of these investigations could then be compared with the tester's intentions
in setting each item. This might shed light on the nature of the reading construct itself
and the way suspected component skills relate to each other (or not) (see also Candlin,
1986).
The commitment to making tests communicative thus entails a high degree of explicitness
both at the test design stage where one is concerned with the required result and at the
evaluation stage where one is estimating the acquired result (see Hawkey, 1982). It is
not necessarily the case that communicative tests will look radically different from some
existing tests; but there may be strong pragmatic reasons for trying to demonstrate any
difference in the test content, the marking schemes to be applied or the way results
are reported.
In the present state of uncertainty the effect of the test on the teaching that precedes
it should receive serious consideration. If our communicative tests have a beneficial
backwash effect in encouraging the development of communicative capacity in the classroom
(see Swain, 1985; Hughes, 1989) then we can be less worried about the theoretical or
empirical shortcomings of our knowledge of language in use. Similarly if we can include
in our tests what is considered to be most appropriate and best practice from the language
classroom the match between teaching, testing and reality is that much enhanced. The
procedures adopted by the Royal Society of Arts/UCLES in the design of their Certificates
in Communicative Skills in English are worthy of note in this respect (see Appendix III).

1.4.4 An acceptable paradigm?


In a recent Test of English as a Foreign Language (TOEFL) conference entitled 'Toward
Communicative Competence Testing', the theme was to explore ways in which the TOEFL
test might be made more 'communicative' without seriously impairing its present
psychometric attributes (see Stansfield, 1986). Bachman (1986) investigated the lack of
context associated with many of the TOEFL items and concluded (p. 85) that 'the majority
of the tasks measure only grammatical competence ... with only a handful tapping
illocutionary or sociolinguistic competence.'
Douglas (1986) argued for learning from interlanguage studies which had shed light
on the variability of performance occasioned by the elicitation procedures employed (task)
and by the discourse domain (context) in which the tasks are carried out (see Douglas
and Selinker, 1985; Selinker and Douglas, 1985; and Skehan, 1984, 1987). Douglas argued
that if the TOEFL and the Test of Spoken English (TSE) were revised in the direction
of domain-specific tests they would fit more easily into a framework of communicative
competence. Consideration was also given to the authenticity of the language used (see
Bachman, 1986; Douglas, 1986) and it was agreed that in the then current TOEFL not
enough attention was given (in the listening section) to replicating the normal features
of spontaneous spoken discourse, e.g., hesitations, nor to incorporating normal features
of interaction such as the negotiation of shared meaning.
What is significant about these discussions at the Educational Testing Service (ETS)
at Princeton is that an organisation which had hitherto been operating firmly in the
psychometric-structuralist tradition was now concerned with making its tests more
communicative. The conference was a limited indication of acceptance of the principles
of communicative language testing.
In the language classroom, however, the majority of commercially available tests are
still predominantly structure-based (see Archer and Nolan-Woods, 1976; Fowler and Coe,
1978; and Allen, 1982). Most language teaching coursebooks and accompanying teachers'
manuals, if they contain any advice on testing at all, usually offer vague theoretical
generalisations far removed from the practical needs of the teacher who has to construct
achievement tests for use in the classroom. Equally unpalatable is the outdated and overly
specific advice that is sometimes provided on various discrete point, non-communicative,
atomistic approaches, which pay scant regard to any of the insights gained through testing
research in the last two decades.
Very little help is normally provided in relating test task to test purpose or in selecting
appropriate formats for testing more communicatively. There is an almost total silence
on how to interpret test results once the data has been generated.
There is an urgent need for ELT publishers to take account of the developments in the
field of communicative language teaching and testing if new initiatives are not to be stifled.

The promising field of Action Research has much to offer in this respect. Cumulative,
informal, small-scale investigations by teachers, in a variety of classroom contexts, could
help develop our understanding of a whole range of communicative techniques for assessing
language proficiency (see Brindley, 1989).
There are strong arguments for following the lead of the UCLES/RSA in this respect.
In the design of their Certificates in Communicative Skills in English, the test constructors
drew heavily on what EFL teachers thought to be sound practice in the classroom. The
communicative tests that resulted are in essence classroom-proven teaching techniques
which are convertible to elicitation techniques in a test situation (see Appendix III for a
full description of this test).
The only necessary difference between teaching and testing within the communicative
paradigm relates to the amount of help that is available to the candidate from his teacher
or his peers. The help that is normally available in the teaching situation, e.g., prompts,
reformulation of questions, encouragement, correction and the opportunity to try again,
is removed in a test for reasons of reliability of measurement. In this sense the test might
be viewed as an intermediate stage between the world of the classroom and the future
target situation where the candidate will have to operate unaided.

1.4.5 The promised land?


So far in this chapter the development of a communicative approach to testing language
has been outlined and, in general, a positive view taken of these developments.
There are, however, a number of outstanding problems in adopting an approach within
this paradigm that need to be addressed. In order to try and draw these problems together
they are examined in relation to the central issue of generalisability of test results. This
is an unavoidable issue, whatever approach to testing we adopt.

1.5 The problem of extrapolation


Other than serious marker reliability problems, associated with the assessment of
performance (see Section 4.3), the major issue affecting an adoption of a 'communicative'
approach to language testing is the generalisability of the results produced by a test.
Any test can be seen as a sampling instrument that provides evidence on which to base
inferences that extend beyond the available data. For most purposes the evidence provided
by test performances has to be relevant to the whole domain of interest, that is, the test
has to be valid. It also has to be capable of allowing stable predictions to be made about
a candidate's performance in any part of the domain, in other words, the test has to be
reliable.
A communicative test implies the specification of performance tasks closely related to
the learner's practical activities, that is, to the communicative contexts in which he would
find himself. This creates a problem of generalisability of tasks to be selected. For Kelly
(1978, p. 17) the possibility of devising a construct valid proficiency test, i.e., one that
measured ability to communicate in the target language, was dependent on the prior
existence of 'appropriate objectives for the test to measure'.


There is an often expressed demand in the literature on performance-based tests for a
systematic and thorough specification of the communicative demands of the target situation
(see Wesche, 1985). Advocates of performance-based tests (see Morrow, 1977, 1979;
Carroll, 1980b, 1981b; and Wesche, 1981) seem to be arguing that it is only necessary
to select certain representative communication tasks, as we do not use the same language
for all possible communication purposes. In the case of proficiency tests, these tasks are
seen as inherent to the nature of the communication situation for which candidates are
being assessed. Caution, however, demands that we wait until empirical evidence is
available before making such confident statements concerning the identification of these
tasks. It is only after examining if it is feasible to establish suitable objectives, through
empirical research based on real people coping with real situations, that there would be
any grounds for claiming to have selected a representative sample of operational tasks
to assess performance ability. Even when empirical research is conducted to establish
suitable objectives, viz., to identify relevant communicative tasks and underlying constituent
enabling skills for a target population, the problems of sampling, practicality, reliability
and validity still remain.
The problems associated with establishing such specifications empirically, the methods
to use and the ranking of needs are discussed by Wesche (1981) and Weir (1983a). A
factor that emerges clearly is that the very increase in specificity brought about by the
needs analysis (particularly of the Munby variety) in itself serves to decrease the possibility
of generalisability. The more specific the tasks one identifies the less one can generalise
from performance on their realisation in a test. This type of needs analysis is in any case
unable to specify the relative importance of the variables. If, as Rea (1978) and Morrow
(1979) suggest, the aim should be to construct simulated communication tasks which closely
resemble those a candidate would face in real life and which make realistic demands on
him in terms of language performance behaviour, it might be difficult to do so reliably
or validly. Communication is not conterminous with language and much communication
is non-linguistic. Often the conditions for actual real-life communication are not replicable
in a test situation, which is unavoidably artificial and idealised and, to use Davies's (1978)
phrase, Morrow and others are perhaps fruitlessly pursuing the chimera of authenticity.
Further, even if the sample of communicative tasks possessed content and face validity,
might it not still lack generalisability in terms of the other communicative tasks which
are not included? Are assessments of performance on these tasks made under particular
linguistic and social constraints and thus not relatable to competence as 'characteristic
abilities'? In other words, if a selection is made, if a sample is taken from a domain, how
can it be ascertained that it is an adequate sample?
Kelly (1978, p. 226) observed that any kind of test is an exercise in sampling and from
this sample an attempt is made to infer students' capabilities in relation to their performance
in general:

That is, of all that a student is expected to know and/or do as a result of his course of study
(in an achievement test) or that the position requires (in the case of a proficiency test), a test
measures students only on a selected sample. The reliability of a test in this conception is
[...] a stable indication of candidates' ability in relation [...]

He pointed out (p. 230) that [...]

the number of different communication problems a candidate will have to solve in the real
world [...] permutations and combinations produced by the values of [...] messages, contexts
of situation and performance conditions that may be encountered.

Thus, on the basis of performance on a particular item, one ought to be circumspect, to
say the least, in drawing conclusions about a candidate's ability to handle similar
communication tasks.
Morrow (1977, p. 53) was also aware of the problems of extrapolation. He succinctly
set out the problem like this:

The very essence of a communicative approach is to establish particular situations with particular
features of context [...] to test the candidate's ability to use language [...] in terms of a
particular specification [...] language which is appropriate to a situation different in even
one respect from that established.

Alderson (Alderson and Hughes, 1981, p. 59) also accepted that to follow the communicative
paradigm one had to define what candidates had to do with language in specific situations,
but recognized that by specifying performance in this manner one might end up describing
an impossible variety of situations, which [...]
Generalising to the indefinitely large range of situations would require sampling as large
a number of tasks as possible, at the expense of test efficiency. The larger the sample of
tasks, and therefore of test items, the longer the communicative test will have to be.
[...] crucial importance: even [...] we [...] competence [...] and predicted [...]

It may not, however, be all that easy to identify this 'one situation' on which to base
our predictions. Let us take as an example the development of an EAP reading test. Here
an additional problem [...] in which a candidate [...]. There is evidence in the literature
that a number of students may be disadvantaged by being tested on their comprehension
of texts outside their academic field (see Weir, 1983a; Alderson and Urquhart, 1985a,
1985b). The implication of this is that different tests are necessary for audiences which
are clearly identifiable as different. There is an urgent need for further investigation into
language testing for specific purposes.
Morrow (1977) observed that in the case of conventional language tests aimed at
measuring mastery of the language code, extrapolation would seem to pose few problems.
The grammatical and phonological systems of a language are finite and manageable and
the lexical resources can be delimited. The infinite number of sentences in a language
is made up of a finite number of elements, and tests of the mastery of these elements are
extremely powerful from a predictive point of view. Davies (1978, p. 225) remarked:
'What remains a convincing argument in favour of linguistic competence tests (both discrete
point and integrative) is that grammar is at the core of language learning ... Grammar
is far more powerful in terms of generalisability than any other language feature.'
Kelly (1978) provides an interesting argument against this viewpoint. It is not known,
for example, how crucial [...] is to the overall objective of being able to communicate,
or how serious a disability it is not to know the second conditional. We still do not possess
what Kelly (1978, p. 17) described as '[...] knowledge of the relative functional importance
of the various structures [...]'.
Given this failing, it would seem ill-advised to make any claims about what students
should be able to do in a language on the basis of scores on discrete point tests of syntax
or lexis. The construct 'ability to communicate in a language' involves more than a mere
[...]. Consequently [...] attempt to devise measuring instruments which can assess
performance ability.
As a way out of the extrapolation quandary, Kelly (1978, p. 23g) suggested a two-stage
approach to the task of devising a test that represents a possible compromise between the
conflicting demands of the criteria of validity, reliability and efficiency:

The first stage involves the development of a direct test that is maximally valid and reliable,
and hence inefficient. The second stage calls for the development of efficient, hence indirect,
tests of high validity. The validity of the indirect tests is to be determined by reference to
the first battery of direct tasks. Clearly, where valid and reliable but inefficient tests already
exist for the construct in question, the research strategy is the development of efficient,
indirect tests whose results correlate highly with those of the existing test.

Thus, retreat from direct evaluation of performance may be acceptable, provided [...]
between [...] testing and predicted [...]
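
The logic of the two-stage approach described above can be illustrated with a small
calculation. What follows is only a sketch, not a procedure given in the source: it assumes
that the same candidates have taken both a direct test and a proposed indirect test, that
agreement between the two is summarised by a Pearson correlation coefficient, and that
some threshold (here an arbitrary 0.8) is treated as acceptable. The scores, the function
name `pearson_r` and the threshold are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six candidates (invented for illustration only)
direct_scores = [12, 15, 9, 20, 17, 11]      # stage one: direct performance test
indirect_scores = [30, 36, 25, 44, 40, 27]   # stage two: efficient indirect test

THRESHOLD = 0.8  # assumed cut-off, not taken from the source

r = pearson_r(direct_scores, indirect_scores)
indirect_is_acceptable = r >= THRESHOLD
print(f"r = {r:.3f}, acceptable: {indirect_is_acceptable}")
```

With these invented scores the coefficient lies well above the assumed threshold, so the
indirect test would be retained. In practice both the threshold and the size of the candidate
sample would need to be justified, and a high correlation alone would not settle questions
of validity.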
As far as large scale proficiency testing is concerned, another viable solution might be
to focus attention on language use in individual and specified situations while retaining,
for purposes of extrapolation, tests of the candidate's ability to handle those aspects of
language which are generalisable to all language use situations, namely the grammatical
and phonological systems. Or it may be shown that the former make the latter unnecessary,
as was the case in the TEEP research (see Weir, 1983a), where the grammar test was dropped
because it offered no additional information to that provided by the more use [...]

Morrow (1979, p. 152) saw a third way out of the extrapolation quandary. His argument
is that a model (as yet unrealised) for the performance of global communicative tasks may
show, for any task, the enabling skills which have to be mobilised to complete it:

The status of these enabling skills vis-à-vis competence: performance is interesting. They
may be identified by an analysis of performance in operational terms, and thus they are clearly,
ultimately performance-based. But at the same time, their application extends far beyond any
one particular instance of performance, and in this creativity they reflect an aspect of what
is generally understood by competence. In this way they offer a possible approach to the problem
of extrapolation.

He asserted that (p. 153): 'Analysis of the global tasks, in terms of which the candidate
is to be assessed, will usually yield a fairly consistent set of enabling skills' and argued
that assessment of ability in using these skills would therefore yield data which are relevant
across a broad spectrum of global tasks, and are not limited to a single instance of
performance.

For Morrow (1979, p. 153), a working solution to the problem would be the development
of tests which measure both overall performance in relation to a specified task and the
strategies and skills which have been used in achieving it:

Written and spoken production can be assessed in terms of both these criteria. In task-based
tests of listening and reading comprehension, however, it may be rather more difficult to
see just how the global task has been completed. ... it is rather difficult to assess why a
particular answer has been given and to deduce the skills and strategies employed. In such
cases, questions focusing on specific enabling skills do seem to be called for in order to provide
the basis for convincing extrapolation.

He is aware, though, that there exists in tests of enabling skills a fundamental weakness
in the relationship between the whole and the parts, as a candidate may prove quite capable
of handling individual enabling skills, yet still not be able to communicate effectively.
Another problem is that it is by no means easy to identify these enabling skills; nor
are there any guidelines for assessing their relative importance for the successful completion
of a particular communicative task, let alone their relative weighting across a spectrum
of tasks. Morrow would appear to assume that we are not only able to establish these
enabling skills, but also to describe the relationship that exists between the part and the
whole in a fairly accurate manner (in this case, how 'separate' enabling skills contribute
to the communicative task). He appears to be saying that there is a prescribed formula:
possession and ability to use enabling skills X + Y + Z = successful completion of
communicative task (1), whereas it would seem likely that the added presence of a further
skill or the absence of a named skill might still result in successful completion of the task
in hand.
A pragmatic way out of this dilemma of how to know what we are testing would be
to pursue an ethnographic validation approach as outlined in Section 2.1.2. Data could
be collected from student introspections on the processes they are utilising to complete
items. This could be used to help determine which items best fitted the required specification
(see Aslanian, 1985; Cohen, 1985; and Jones and Friedl, 1986).

In addition, advice could be taken from professionals who control the content and the
language of purposive interactions in the target domain of proficiency test candidates. They
could be asked to comment on the appropriateness of test items for the intended population
(see Douglas and Pettinari, 1983). The recent IELTS revision project has adopted this
useful strategy.
The extrapolation problem faced by those adopting a more communicative approach
to language test design seems to relate to the wider issue of the status of laws in the
behavioural sciences. In the physical sciences, laws are extrapolations of replicable
phenomena. Researchers in these domains can directly confront what they wish to
investigate, formulate hypotheses and repeat experiments as many times as they wish to
verify or falsify their hypotheses. Because of problems associated with the infinite variability
of language in use and the problems involved in population sampling, the scientific paradigm
is a difficult one to follow in educational measurement.
Hawkey (1982) described the classical scientific paradigm as a hypothetico-deductive
methodology: formulating quantifiable, narrow, parsimonious hypotheses, tested through
the observation of the behaviour of a random sample of the target population, followed
by a statistical analysis of the results according to pre-ordained procedures. This approach
is not suitable for large scale proficiency testing where candidates might be operating in
a variety of contexts and at a variety of levels. Account may need to be taken of a large
number of variables, some of which are not predictable, all interacting in socio-cultural
contexts. Thus there is a task sampling problem, a validity problem.
Unlike the scientific paradigm described by Hawkey there might also be serious problems
in terms of population sampling. If the target population of students is transient, widely
dispersed and varied in terms of accessibility, the sampling might of necessity have to
be opportunistic. This is a population sampling problem, a reliability problem.
The concern might have to be, of necessity, with what Hawkey (1982, p. 16) described
as an 'illuminative evaluation' paradigm, where the focus was on the description of complex
phenomena, the resolution of significant features, and the comprehension of relationships.
Initial research in this area might have to limit itself to providing a descriptive framework
for establishing communication tasks of relevance to students in a specified context, prior
to test construction (see Weir, 1983b). No definitive claims could be made (without
empirical validation studies) about how the language user operates when involved in these
communication tasks or how he learns to perform such tasks (see Kelly, 1978). What might
be provided is a specification, coarse but robust, of the general communicative tasks facing
target students in their specified context.
Irrespective of the problematic nature of the exercise, the need for specifying as clearly
as possible what it is that is to be tested seems axiomatic for testing within the communicative
paradigm. The current interest in ESP is a reflection of this and the acronym might better
be regarded as English for Specified Purposes rather than specific or special. This would
emphasise the belief that all teaching and testing is to varying extents specified and never
totally general.
However, the nature and shortcomings of target-situation analysis for arriving at
specifications of language needs for tests have been discussed extensively in the literature
(see Weir, 1983a). There are dangers in analyses being too specific, e.g., they may not
be operationalisable in a test, as well as in being too general, e.g., they may disadvantage
certain candidates. Perhaps the biggest danger is that there is a tendency for needs analysis
to claim a disproportionate amount of the time and resources available for research, often
at the expense of test development.
A comprehensive description of the specification stage of a test design can be found
in Weir (1983a, 1983b), where the extensive research and development behind the
Associated Examining Board's Test in English for Educational Purposes (TEEP) is fully
documented (see also Appendix I). The careful reader can see how the results of the needs
analysis influenced the adoption of certain test formats (dictation, the integration of reading,
listening and writing activities) and clarified the range of skills to be tested and the
assessment criteria to be employed. The recent IELTS revision project has forsaken such
needs analysis and instead intends to rely on post hoc 'expert' comment to judge the
'authenticity' and other aspects of the products of the test writing teams, who were not
to be constrained by any a priori specifications (see Appendix V).
In the development of tests in the future the balance of attention must be paid to translating
specifications into test realisations and validating the latter (in the ways to be discussed
in Chapter Two), though the need to specify in advance of item writing the construct that
is to be measured is ignored at one's peril. The emphasis, however, should be on test
development and validation rather than on the analysis of needs for creating test
specifications. For this reason the discussion of needs analysis has been limited, and instead
the focus in Chapters Three and Four will be on test construction and in particular on
examining a range of possible formats for testing language skills within a communicative
framework.
The crucial stage in any test development occurs when the specification is translated
into a test realisation. The test that results should exhibit the qualities of validity, efficiency
and reliability which are examined next in Chapter Two. These qualities need to be
determined both qualitatively a priori and empirically a posteriori.
