This article is a literature review which seeks to answer four questions: 1) What
is washback? 2) How does washback work? 3) How can we promote positive
washback? 4) How can we investigate washback? Building on the ’Washback
Hypothesis’ proposed by Alderson and Wall, and suggestions from Hughes, the
article proposes a model that identifies participants, processes and products which
may influence, or be influenced by, washback. Strategies for investigating wash-
back are also discussed.
I Introduction
This article is essentially a literature review, the purpose of which is
to explore the construct of language testing washback. The questions
addressed by the article are as follows:
1) What is washback?
2) How does washback work?
3) How can we promote beneficial washback?
4) How can we investigate washback?
My purpose is to explore these issues, specifically in the context of
external-to-programme tests (rather than teacher-made or internal-to-
programme examinations). The article is organized around the four
questions posed above.
II What is washback?
Although there is general agreement in the field as to the basic defi-
nition of washback, there is also considerable variety in opinions as
to how it functions. In this section, I will first consider several defi-
nitions of washback and then look specifically at washback in the
context of communicative language testing.
1 Definitions of washback
Buck (1988: 17) describes washback as follows:
There is a natural tendency for both teachers and students to tailor their class-
room activities to the demands of the test, especially when the test is very
important to the future of the students, and pass rates are used as a measure
of teacher success. This influence of the test on the classroom (referred to as
washback by language testers) is, of course, very important; this washback
effect can be either beneficial or harmful.
of external testing and the major impact it has on the lives of test
takers’. She cites as examples the introduction of new English-speak-
ing tests in Israel (Shohamy, Reves and Bejarano, 1986), and of the
ACTFL Guidelines and Oral Proficiency Interview in the USA.
Following Alderson and Wall (1993), Messick (this issue: 241,
emphasis added) paraphrases the definition of washback as ’the extent
to which the test influences language teachers and learners to do
things they would not otherwise necessarily do’. Messick (this issue:
243) adds an important dimension to the definition of washback when
he states that ’evidence of teaching and learning effects should be
interpreted as washback only if that evidence can be linked to the
...
There are also concerns about what constitutes both positive and
negative washback, as well as about how to promote the former and
inhibit the latter.
The increasing use of communicative language teaching can result in mis-
matches between the content and procedures of established language
tests and what teachers and learners do in (and out of)
classrooms. But in order to understand washback, and then to promote
positive washback, we also need to examine how it works.
washback effect on the classroom is likely to be quite beneficial
if students realize that they have to understand passages in their
entirety to complete the tests’ (1988: 29-30).
Thus in terms of washback alone, Buck considers four of these five
familiar formats for assessing listening comprehension to be inad-
equate. Only the fifth type, cloze tests based on summaries of listening
passages, he feels, would promote students’ understanding of
spoken English.
Thus, like Shohamy, Clark stresses the need for comparative infor-
mation in promoting curricular washback. This is, of course, one of
the intended purposes of standardized tests. Standardized scoring sys-
tems allow teachers, researchers and administrators from around the
world to make and interpret comparable statements about learners’
target language proficiency, in quantified terms which are understood
throughout the field.
2 Building in authenticity
Wesche (1983: 53) points out that ’by making our tests more reflec-
tive of the kinds of situations, language content and purposes for
which second-language speakers will need their skills, we will be able
to make more accurate predictions about how they will be able to
function using the target language in ’real life’’. She further notes
that ’Such testing is likely to have dramatic effects on the format and
content of second language curricula as well, and to improve student
motivation through its increased relevance’ (1983: 53). This last com-
ment relates directly to washback, both in terms of curricular inno-
vations (induced by washback to the programme) and learners’ motiv-
ation to prepare for the test (promoted by washback to the learners).
Doye (1991: 104) defines authenticity as the term is used in lan-
guage testing:
Absolute congruence would exist when the tasks in the test situation and in
the corresponding real-life situation would actually be identical. In this extreme
case the test situation and the tasks in it are called authentic. An authentic test
is therefore one that reproduces a real-life situation in order to examine the
student’s ability to cope with it.
But Messick notes two possible threats to validity which can affect
a test’s authenticity. One is construct under-representation (when the
assessment mechanism does not include key features of the construct
being measured) and the other is construct-irrelevant variance (when
the assessment includes ’excess reliable variance that is irrelevant to
the interpreted construct’). Both threats, he says, are present in any
test to some degree. They are tied to washback in that ’if one is
concerned with fostering positive washback and reducing negative
washback, one should concentrate first on minimizing construct
under-representation and construct-irrelevant difficulty in the assess-
ment’ (1996: 252).
the material at hand (van Lier, pers. comm.). The issue of learner
autonomy is related directly to Alderson and Wall’s tenth washback
hypothesis: ’a test will influence the degree and depth of learning’
(1993: 120).
The topic of self-assessment has received increasing attention from
language testing researchers in recent years (see, e.g., Oskarsson,
1980; Bachman and Palmer, 1981; LeBlanc and Painchaud, 1985;
von Elek, 1985). The following comments from von Elek (1985: 60)
4 Score reporting
Ideally, standardized, external-to-programme tests could provide posi-
tive washback both to the learners themselves and to programme rep-
resentatives, but to do so would minimally entail providing more
detailed score reports than are typically available with wide-scale pro-
ficiency tests. As mentioned above, Shohamy has pointed out that to
promote positive washback, assessment information must be
’detailed, innovative, relevant and diagnostic’ and that it must
’address a variety of dimensions rather than being collapsed into one
general score’ (1992: 515).
The G-TELP (ITSC, 1990) is one commercially available English
language test which does provide feedback to the learners in the form
of a detailed score report. The G-TELP Score Report provides test-
takers not only with information about their performance on the sub-
tests of listening, reading and vocabulary, and grammar, but also with
relatively specific feedback, in percentage terms, about their perform-
ance on the tasks and structures assessed in these subtests.
Spolsky (1990: 12) has tied the use of detailed score reports
indirectly to washback: ’Because there is a natural tendency on the
part of those who use test results to take shortcuts, there is a moral
responsibility on testers to see that results are not just accurate but
learners who choose to enrol and in some instances to take the test).
For these reasons, trying to develop a true experimental design to
assess washback would be both futile and unhelpful.
Alderson and Wall (1993: 127-28) are fully aware of this problem,
and they offer ’a series of proposals for research’. Among their
methodological suggestions we find a heavy (and in my opinion, very
healthy) emphasis on classroom observation and triangulation. (In
ethnography, triangulation is the use of two or more perspectives -
data sets, informants, theories, researchers, etc. - to investigate an
issue; see Denzin, 1970; van Lier, 1988). In investigations of lan-
guage testing washback, triangulation would minimally include teach-
ers’ and students’ perceptions of what I have called ’washback to the
programme’ and ’washback to the learner’.
VI Conclusion
To conclude, let us consider some criteria suggested by this literature
review as being likely to promote beneficial washback. I offer these
ideas as propositions (derived from the existing literature) which need
to be systematically studied.
The first is that for a test to provide beneficial washback, test-takers
and language programme representatives (including teachers, admin-
istrators, curriculum designers, etc.) must understand the purpose of
the test. What does it measure? To what end(s) will the results be
used? Hughes (1989: 46) made this point when he suggested that, to
promote beneficial washback, the test must be ’known and understood
by students and teachers’.
Secondly, results must be believable to test-takers and user agenc-
ies and must be provided in a timely, detailed fashion. The more
clearly interpretable and informative the score report, the greater the
likelihood that the test will yield positive washback (Shohamy,
1992: 515).
Thirdly, it also matters a great deal that the test-takers find the
results credible and fair. If a test lacks face validity, it is unlikely that
its results, no matter how clearly presented, will promote effective
learning, improved teaching or positive curricular reforms.
Fourthly, a test will promote beneficial washback to programmes
to the extent that it measures what programmes intend to teach. If
teachers and administrators can look at their students’ performance
on an external-to-programme measure that clearly relates to the pro-
gramme, they will have confidence in the positive results (where stu-
dents succeed), and they will take seriously the negative results
(where students fall short of mastery).
Acknowledgements
I am grateful to Leo van Lier, Gary Buck and Leslie Zambo for shar-
ing their insights on the effects of washback in educational assess-
ment. Revisions to earlier drafts of this article were prompted by
cogent comments from Charles Alderson, Gary Buck, Felicia DeVin-
cenzi, Samuel Messick, Cynthia Potter, Carol Taylor, Jean Turner,
Leo van Lier and Dianne Wall. I am also grateful to Educational
Testing Service for permission to publish this article, which is derived
from an earlier article commissioned by the TOEFL Program.
VII References
Alderson, J.C. 1981: Report of the discussion on communicative language
testing. In Issues in language testing. ELT Documents III, London:
The British Council, 55-65.
— 1987: Innovation in language testing: can the micro-computer help?
Language Testing Update Special Report 1. Lancaster: University of
Lancaster.
— 1990: Learner-centred testing through computers: institutional issues
in individual assessment. In de Jong, J.H.A.L. and Stevenson, D.K.,
editors, Individualizing the assessment of language abilities, Clevedon:
Multilingual Matters, 20-27.
Alderson, J.C. and Wall, D. 1993: Does washback exist? Applied Linguis-
tics 14, 115-29.
Allwright, D. and Bailey, K.M. 1991: Focus on the language classroom: an
introduction to classroom research for language teachers. New York:
Cambridge University Press.
Bachman, L.F. and Palmer, A.S. 1981: Self-assessment of communicative
competence in English. Salt Lake City, UT: Mimeograph.
Buck, G. 1988: Testing listening comprehension in Japanese university
entrance examinations. JALT Journal 10, 15-42.
Canale, M. and Swain, M. 1980: Theoretical bases of communicative
approaches to second language teaching and testing. Applied Linguis-
tics 1, 1-47.
Carroll, B.J. 1980: Testing communicative performance. Oxford: Perga-
mon Press.
Clark, J.L.D. 1983: Language testing: past and current status - directions
for the future. Modern Language Journal 67, 431-32.
Denzin, N.K. 1970: Sociological methods: a source book. Chicago, IL:
Aldine.
Dickinson, L. 1982: Self-assessment as an aspect of autonomy. Edinburgh:
Scottish Centre for Education Overseas.
Doye, P. 1991: Authenticity in foreign language testing. In Anivan, S., edi-
tor, Current developments in language testing. Anthology Series 25,
Singapore: Regional Language Centre, 103-10.
Green, D.Z. 1985: Developing measures of communicative proficiency: a
test for French immersion students in grades 9 and 10. In Hauptman,
P.C., LeBlanc, R., and Wesche, M.B., editors, Second language per-
formance testing, Ottawa: University of Ottawa Press, 215-27.
Hale, G.A., Stansfield, C.W. and Duran, R.P. 1984: Summaries of studies
involving the Test of English as a Foreign Language, 1963-
1982. TOEFL Research Report 16. Princeton, NJ: Educational Test-
ing Service.
Hart, D., Lapkin, S. and Swain, M. 1987: Communicative language tests:
perks and perils. Evaluation and Research in Education 1, 83-93.
Henning, G. 1987: A guide to language testing: development, evaluation,
research. New York: Newbury House.
Herman, J.L. and Golan, S. 1993: The effects of standardized testing on
teaching and schools. Educational Measurement: Issues and Practices
12, 20-25.
Hughes, A. 1989: Testing for language teachers. Cambridge: Cambridge
University Press.
— 1993: Backwash and TOEFL 2000. Unpublished manuscript, Univer-
sity of Reading.
International Testing Services Center 1990: G-TELP (General Tests of
English Language Proficiency) Information Bulletin. San Diego, CA:
International Testing Services Center, College of Extended Studies,
San Diego State University.
LeBlanc, R. and Painchaud, G. 1985: Self-assessment as a second language
placement instrument. TESOL Quarterly 19, 673-87.
Messick, S. 1994: The interplay of evidence and consequences in the vali-
dation of performance assessments. Educational Researcher 23, 13-23.
— 1996: Validity and washback in language testing. Language Testing
13, 241-56.
Morrow, K. 1979: Communicative language testing: revolution or evol-
ution? In Brumfit, C.J. and Johnson, K., editors, The communicative
approach to language teaching, Oxford: Oxford University Press, 143-
57 (reprinted in Issues in language testing. ELT Documents III, Lon-
don: The British Council).
— 1991: Evaluating communicative tests. In Anivan, S., editor, Current
developments in language testing. Anthology Series 25, Singapore:
Regional Language Centre, 111-18.
Oskarsson, M. 1980: Approaches to self-assessment in foreign language
learning. Oxford: Pergamon Press.
Raimes, A. 1990: The TOEFL Test of Written English: causes for concern.
TESOL Quarterly 24, 427-42.
Shohamy, E. 1992: Beyond proficiency testing: a diagnostic feedback test-
ing model for assessing foreign language learning. The Modern Langu-
age Journal 76, 513-21.
— 1993: The power of tests: the impact of language tests on teaching and
learning. NFLC Occasional Paper. Washington, DC: National Foreign
Language Center.
Shohamy, E., Reves, T. and Bejarano, Y. 1986: Introducing a new com-
prehensive test of oral proficiency. ELT Journal 40, 212-20.
Spolsky, B. 1990: Social aspects of individual assessment. In de Jong,
J.H.A.L. and Stevenson, D.K., editors, Individualizing the assessment
of language abilities, Clevedon: Multilingual Matters, 3-15.
Swain, M. 1984: Large-scale communicative testing: a case study. In Savig-
non, S.J. and Berns, M., editors, Initiatives in communicative language
teaching, Reading, MA: Addison-Wesley, 185-201.
van Lier, L. 1988: The classroom and the language learner: ethnography
and second-language classroom research. London: Longman.
— 1989: Reeling, writhing, drawling, stretching and feinting in coils: oral
proficiency interviews as conversation. TESOL Quarterly 23, 489-508.
von Elek, T. 1985: A test of Swedish as a second language: an experiment
in self-assessment. In Lee, Y.P., Fok, C.Y.Y., Lord, R. and Low, G.,
editors, New dimensions in language testing, Oxford: Pergamon Press,
47-58.
Wall, D. and Alderson, J.C. 1993: Examining washback: the Sri Lankan
Impact Study. Language Testing 10, 41-69.
Wesche, M.B. 1983: Communicative testing in a second language. Modern
Language Journal 67, 41-55.