
point and counterpoint

A critical review of the IELTS writing test

Hacer Hande Uysal

Administered at local centres in 120 countries throughout the world, IELTS
(the International English Language Testing System) is one of the most widely used
large-scale ESL tests that also offers a direct writing test component. Because of its
popularity and its use for making critical decisions about test takers, it is crucial to
draw attention to some issues regarding the assessment procedures of IELTS.
Therefore, the present paper aims to provide a descriptive and critical review of the
IELTS writing test, focusing particularly on reliability issues such as the single
marking of papers, the readability of prompts, and the comparability of writing
topics, and on validity issues such as the definition of an international writing
construct that disregards variations among rhetorical conventions and genres
around the world. Consequential validity (impact) issues will also be discussed,
and suggestions will be offered for the use of IELTS around the world and for
future research to improve the test.

Introduction Large-scale ESL tests such as the Cambridge certificate exams, IELTS, and
TOEFL (the Test of English as a Foreign Language) are widely used around
the world, and they play an important role in many people's lives as they are
often used for making critical decisions about test takers, such as admission
to universities. Therefore, it is necessary to address the assessment
procedures of such large-scale tests on a regular basis to make sure that
they meet professional standards and to contribute to their further
development. However, although there have been several publications
evaluating these tests in general, these publications often do not offer
detailed information specifically about the writing component of these tests.
Scholars, on the other hand, acknowledge that writing is a very complex
skill that is difficult both to learn and to assess, and that it is central to
academic success, especially at university level. For this reason, the present
article focuses only on the assessment of writing, particularly in the IELTS
test, because as well as being one of the most popular ESL tests throughout
the world, it is unique among tests in its claim to assess English as an
international language, indicating a recognition of the expanding status of
English. After a brief summary of the IELTS test in terms of its purpose,
content, and scoring procedures, the article discusses several reliability and
validity issues concerning the IELTS writing test to be considered both by
language testing researchers and by test users around the world.

314 ELT Journal Volume 64/3 July 2010; doi:10.1093/elt/ccp026

© The Author 2009. Published by Oxford University Press; all rights reserved.
Advance Access publication April 17, 2009
The IELTS writing test: general background information
The IELTS writing test is a direct test of writing in which tasks are
communicative and contextualized for a specified audience, purpose, and
genre, reflecting recent developments in writing research. There is no
choice of topics; however, IELTS states that it continuously pre-tests the
topics to ensure comparability and equality. IELTS has both academic and
general training modules consisting of two tasks per module. In the
academic writing module, for Task 1, candidates write a report of around 150
words based on a table or diagram, and for Task 2, they write a short essay or
general report of around 250 words in response to an argument or
a problem. In the general training module, in Task 1, candidates write
a letter responding to a given problem, and in Task 2, they write an essay in
response to a given argument or problem. Both academic and general
training writing modules take 60 minutes. The academic writing
component serves the purpose of deciding on the university admission of
international students, whereas general training writing serves the purposes
of completing secondary education, undertaking work experience or training,
or meeting immigration requirements in an English-speaking country.
Trained and certified IELTS examiners assess each writing task
independently, giving more weight to Task 2 than to Task 1 in marking.
After testing, writing scores, along with the other scores from each module
of the test, are averaged and rounded to produce an overall band score.
Detailed performance descriptors have been developed which describe
written performance at the nine IELTS bands, and results are reported as
whole and half bands; however, how these descriptors are turned into band
scores is kept confidential. There are no pass/fail cut scores in IELTS.
IELTS provides a guidance table for users on acceptable levels of language
performance for different programmes to inform academic or training
decisions; however, IELTS advises test users to agree on their own
acceptable band scores based on their experience and local needs.
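The averaging step can be illustrated with a short sketch. Since the actual conversion from descriptors to band scores is confidential, the rounding rule below (nearest half band, with ties rounded up) is purely an assumption for illustration, not the operational IELTS procedure.

```python
import math

def overall_band(module_scores, step=0.5):
    """Average the module band scores and round to the nearest half band.

    Illustrative only: IELTS does not publish its exact conversion, so the
    tie-breaking rule used here (round half up) is an assumption.
    """
    mean = sum(module_scores) / len(module_scores)
    return math.floor(mean / step + 0.5) * step

# e.g. hypothetical listening, reading, writing, speaking bands
print(overall_band([6.5, 6.5, 7.0, 6.0]))  # mean 6.5 -> band 6.5
```

A mean of 6.25 would round up to 6.5 under this assumed rule; whether the real test rounds such ties up or down is exactly the kind of detail the paper notes is kept confidential.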

Reliability issues Hamp-Lyons (1990) defines the sources of error that reduce reliability in
a writing assessment as the writer, the task, and the raters, as well as the
scoring procedure. IELTS has initiated research efforts to minimize such
errors, including those arising from the scoring procedure, and to
demonstrate that acceptable reliability rates are achieved.
In terms of raters, IELTS states that reliability is assured through the
training and certification of raters every two years. Writing is single marked
locally, and rater reliability is estimated by subjecting a selected sample of
returned scripts to a second marking by a team of IELTS senior examiners.
Shaw (2004: 5) reported that the inter-rater correlation was approximately
0.77 for the revised scale and that g-coefficients were 0.84–0.93 for the
operational single-rater condition. Blackhurst (2004) also found that the
paired examiner/senior examiner ratings from the sample IELTS writing
test data produced an average correlation of 0.91. However, despite the
reported high reliability measures, in such a high-stakes international test,
single marking is not adequate. It is widely accepted in writing assessment
that multiple judgements lead to a final score that is closer to a true score
than any single judgement (Hamp-Lyons 1990). Therefore, multiple raters
should rate the IELTS writing tests independently and inter- and intra-rater reliability
estimates should be calculated routinely to monitor the reliability and
consistency of the writing scores.
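As a concrete illustration of the statistics cited above, the inter-rater correlations reported by Shaw (2004) and Blackhurst (2004) are, in essence, Pearson correlations between two raters' scores over the same set of scripts. A minimal sketch follows; the band scores are invented for illustration, and operational studies also use generalizability (g-) coefficients, which this sketch does not cover.

```python
def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same scripts:
    a basic inter-rater reliability estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical band scores from an examiner and a senior examiner
examiner = [6.0, 6.5, 5.5, 7.0, 6.0, 7.5]
senior = [6.0, 6.0, 5.5, 7.5, 6.5, 7.0]
r = pearson_r(examiner, senior)
```

A correlation near 1.0 indicates the two raters rank scripts almost identically; values such as the reported 0.77 leave room for the score disagreements that motivate the call for multiple independent ratings.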
IELTS also claims that the use of analytic scales contributes to higher
reliability, as impressionistic rating and norm referencing are discouraged
and greater discrimination across bands is achieved. However, Mickan
(2003) addressed the problem of inconsistency in ratings in IELTS exams
and found that it was very difficult to identify specific lexicogrammatical
features that distinguish different levels of performance. He also discovered
that, despite the use of analytic scales, raters tended to respond to texts as
a whole rather than to individual components. Falvey and Shaw (2006), on
the other hand, found that raters tended to adhere to the assessment scale
step by step, beginning with task achievement and then moving on to the
next criterion. Given these conflicting findings about rater behaviour while
using the scales, more precise information about the scale, and about how
raters determine scores from the analytic categories, should be documented
in more detail to support the IELTS claims about the analytic scales.
IELTS pre-tests the tasks to ensure they conform to the test requirements in
terms of content and level of difficulty. O'Loughlin and Wigglesworth
(2003) investigated task difficulty in Task 1 of IELTS academic writing and
found differences among tasks in terms of the language used: the simpler
tasks with less information elicited higher performance and more complex
language from respondents in all proficiency groups. Mickan, Slater, and
Gibson (2000), on the other hand, examined the readability of test prompts
in terms of discourse and pragmatic features, along with the test-taking
behaviours of test takers in the writing test, and found that the purpose and
lexicogrammatical structures of the prompts influenced task
comprehension and writing performance.
IELTS also states that topics or contexts of language use that might
introduce a bias against any group of candidates of a particular background
are avoided. However, many scholars highlight that controlling the topic
variable is not an easy task, as it is highly challenging to determine
a common knowledge base that can be accessed by all students who come
from culturally diverse backgrounds and who might have varied reading
experiences of the topic or content area (Kroll and Reid 1994). Given the
importance of the topic variable in writing performance and the difficulty of
controlling it in such an international context, continuous research on topic
comparability and appropriateness should be carried out by IELTS.
The research conducted by IELTS has been helpful in understanding some
variables that might affect the reliability, and accordingly the validity, of the
scores. As indicated by research, different factors interfere with the
consistency of the writing test to varying degrees. Therefore, more research
is necessary, especially in the areas of raters, scales, tasks, test-taker
behaviour, and topic comparability, to diagnose and minimize sources of
error in testing writing. Shaw (2007) suggests the use of electronic script
management (ESM) data in further research to understand the various
facets, and the interactions among facets, which may have a systematic
influence on scores.

Validity issues IELTS makes use of both expert judgements by academic staff from the
target domain and empirical approaches to match the test tasks with the
target domain tasks and to achieve high construct representativeness and
relevance. Moore and Morton (1999), for example, compared the IELTS
writing task items with 155 assignments given in two Australian universities.
They found that IELTS Task 1 was representative of the target language use
(TLU) content, while IELTS Task 2, which requires students to agree or
disagree with a proposition, did not match exactly with any of the academic
genres in the TLU domain, as the university writing corpus was based on
external sources, whereas IELTS Task 2 was based on prior knowledge as
a source of information. IELTS Task 2 had a greater similarity to non-
academic public forms of discourse such as a letter to the editor; however,
it could also be considered close to the genre 'essay', which was the most
common of the university tasks (60 per cent). In terms of rhetorical
functions, the most common function in the university corpus was
evaluation, parallel to IELTS Task 2. In conclusion, it was suggested that an
integrated reading-writing task should be included in the test to increase
authenticity. Nevertheless, the IELTS claims are based on the investigation
of TLU tasks from only a limited context (British and Australian
universities); thus, the representativeness and relevance of the construct,
and the meaningfulness of interpretations in other domains, are seriously
questionable.
In terms of the constructs and criteria for writing ability, the general
language construct in IELTS is defined both in terms of language ability,
based on various applied linguistics and language testing models, and in
terms of how these constructs are operationalized within a task-based
approach. Task 1 scripts in both general training and academic writing are
assessed according to the criteria of task fulfilment, coherence, cohesion,
lexical resource, and grammatical range and accuracy. Task 2 scripts are
assessed on task response (making arguments), lexical resource, and
grammatical range and accuracy. However, according to Shaw (2004), the
use of the same criteria for both general training and academic writing
modules is problematic, and this practice has not been adequately
supported by scientific evidence. In addition, in the new criteria that have
been in use since 2005, the previous broad category 'communicative
quality' has been replaced by 'coherence and cohesion', causing rigidity and
too much emphasis on paragraphing (Falvey and Shaw 2006). Therefore, it
seems as if traditional rules of form, rather than meaning and intelligibility,
have recently gained weight in the construct definitions of IELTS.
IELTS also claims that it is an international English test. At present, this
claim is grounded on the following practices (Taylor 2002):
1 reflecting social and regional language variations in the test input in
terms of content and linguistic features;
2 incorporating an international team (UK, Australia, and New Zealand)
which is familiar with the features of different varieties into the test
development process;
3 including NNS as well as NS raters as examiners of oral and written tests.

However, the English varieties that are considered in IELTS include only the
varieties of the inner circle. Except for the inclusion of NNS raters in the
scoring procedure, the attempts of IELTS to qualify as an international test
of English are very limited and narrow in scope. As an international English
language test, IELTS acknowledges the need to account for language
variation within its model of linguistic or communicative competence
(Taylor 2002); however, its construct definition is no different from those of
other language tests. If IELTS claims that it assesses international English,
it should include international language features in its construct definition
and provide evidence that it can actually measure English as an
international language.
In addition, Taylor (2002) suggests that, besides micro-level linguistic
variations, macro-level discourse variations may occur across cultures.
Therefore, besides addressing the linguistic varieties of English around the
world (World Englishes), the IELTS writing test should also consider the
variations among rhetorical conventions and genres around the world
(world rhetorics) when defining the writing construct, especially in relation
to the criteria on coherence, cohesion, and logical argument. The published
literature presents evidence that genre is not universal but culture specific,
and that people in different parts of the world differ in terms of their
argument styles and logical reasoning, use of indirectness devices,
organizational patterns, the degree of responsibility given to readers, and
rhetorical norms and perceptions of good writing. In particular, the ability
to write an argumentative essay, which is tested in the IELTS writing test,
has been found to demonstrate unique national rhetorical styles across
cultures; the IELTS corpus database should therefore be used to find
common features of argumentative writing that are shared by all
international test takers in order to describe an international argumentative
writing construct (Taylor 2004). This is especially important as Cambridge
ESOL (the UK partner in IELTS) plans to develop a common scale for L2
writing ability in the near future.
It is also important for IELTS to consider these cultural differences in rater
training and scoring. Purves and Hawisher (1990), on the basis of their
study of an expert rater group, suggest that culture-specific text models also
exist in readers' heads; these models form the basis for judgements of the
acceptability and appropriateness of written texts and affect the rating of
student writing. For example, differences between NS and NNS raters were
found in their evaluations of topics, cultural rhetorical patterns, and
sentence-level errors (Kobayashi and Rinnert 1996). Therefore, it is also
crucial to investigate the rating behaviours of both NS and NNS raters in
relation to the test-taker profile.
In terms of consequences, the impact of IELTS on the content and nature of
classroom activity in IELTS classes, on materials, and on the attitudes of test
users and test takers has been investigated. However, this is not enough.
IELTS should also consider the impact of its writing test, in terms of the
chosen standards or criteria, on international communities in a broader
context. Considering that IELTS claims to be an international test, judging
written texts from students of various cultural backgrounds according to
one writing standard (based on Western writing norms) may not be fair.
Taylor (2002) states that people who are responsible for language assessment
should consider how language variation affects the validity, reliability, and
impact of tests, and should provide a clear rationale for why they include or
exclude more than one linguistic variety and where they get their norms.
As for the washback effects of IELTS, at present it is believed in the
academic world that international students and scholars must learn
Western academic writing so that they can function in the Anglo-American
context. This view, in a way, imposes Western academic conventions on the
whole international community, showing no acceptance of other varieties.
According to Kachru (1997), however, this may result in the replacement of
every rich, creative national style in the world with the Western way of
writing. This view is reflected in most other tests of English as well.
However, because IELTS claims to be an international test of English, it
should promote rhetorical pluralism and raise awareness of cultural
differences in rhetorical conventions rather than promoting a single
Western norm of writing, as pointed out by Kachru (1997). Therefore,
considering the high washback power of IELTS, the communicative aspects
of writing, rather than strict rhetorical conventions, should be emphasized
in the IELTS writing test.

Conclusion To sum up, IELTS is committed to improving the test further and has been
carrying out continuous research to examine its reliability and validity.
However, some issues, such as the fairness of applying a single prescriptive
criterion to international test takers coming from various rhetorical and
argumentative traditions, and the necessity of defining the writing
construct in line with the IELTS claim to be an international test of English,
have not been adequately included in these research efforts. In addition,
some research on the reliability of test scores highlights serious issues that
need further consideration. Therefore, the future research agenda for
IELTS should include the following issues.
In terms of reliability:
- the comparability and appropriateness of prompts and tasks for all test
takers should be continuously investigated
- multiple raters should be included in the rating process, and inter- and
intra-rater reliability measures should be constantly calculated
- more research is needed regarding the scales and how scores are rounded
to a final score
- rater behaviour while using the scales should be investigated.

IELTS has rich data sources, such as ESM, in hand; however, so far these
sources have not been fully exploited to understand the interactions among
the above-mentioned factors in relation to test-taker and rater profiles.
In terms of improving the validation efforts with regard to the IELTS
writing test:
- future research should explore whether the characteristics of the IELTS
test tasks and the TLU tasks match, not only in the domain of the UK and
Australia but also in other domains

- cultural differences in writing should be considered both in the construct
definitions and in rater training efforts
- research on determining the construct of international English ability and
international English writing ability should be conducted using the
already existing IELTS corpus, and the consequences of the assessment
practices and criteria, in terms of their impact on power relationships in
the world context, should also be taken into consideration
- test users should also take responsibility for undertaking their own
research to ensure that the test is appropriate for their own institutional or
contextual needs.

Final revised version received November 2008

Notes
This article forms one part of a Point/Counterpoint section. It will appear
together with its opposing Counterpoint view in the printed journal.
A version of this paper was presented at the 5th International ELT Research
Conference, Canakkale, Turkey, in May 2008.

References
Blackhurst, A. 2004. 'IELTS test performance data 2003'. Research Notes 18: 18–20.
Falvey, P. and S. D. Shaw. 2006. 'IELTS writing: revising assessment criteria and scales (phase 5)'. Research Notes 23: 7–12.
Hamp-Lyons, L. 1990. 'Second language writing assessment issues' in B. Kroll (ed.). Second Language Writing: Research Insights for the Classroom. Cambridge: Cambridge University Press.
Kachru, Y. 1997. 'Culture and argumentative writing in world Englishes' in L. E. Smith and M. L. Forman (eds.). Literary Studies East and West: World Englishes 2000 Selected Essays. Honolulu, HI: University of Hawaii Press.
Kobayashi, H. and C. Rinnert. 1996. 'Factors affecting composition evaluation in an EFL context: cultural rhetorical pattern and readers' background'. Language Learning 46/3: 397–437.
Kroll, B. and J. Reid. 1994. 'Guidelines for designing writing prompts: clarifications, caveats, and cautions'. Journal of Second Language Writing 3/3: 231–55.
Mickan, P. 2003. 'What is Your Score?' An Investigation into Language Descriptors from Rating Written Performance. IELTS Research Reports Vol. 5, Paper 3. Canberra, Australia: IDP IELTS Australia.
Mickan, P., S. Slater, and C. Gibson. 2000. A Study of Response Validity of the IELTS Writing Module. IELTS Research Reports Vol. 3, Paper 2. Canberra, Australia: IDP IELTS Australia.
Moore, T. and J. Morton. 1999. Authenticity in the IELTS Academic Module Writing Test: A Comparative Study of Task 2 Items and University Assignments. IELTS Research Reports Vol. 2, Paper 4. Canberra, Australia: IDP IELTS Australia.
O'Loughlin, K. and G. Wigglesworth. 2003. Task Design in IELTS Academic Writing Task 1: The Effect of Quantity and Manner of Presentation of Information on Candidate Writing. IELTS Research Reports Vol. 4, Paper 3. Canberra, Australia: IDP IELTS Australia.
Purves, A. and G. Hawisher. 1990. 'Writers, judges, and text models' in R. Beach and S. Hynds (eds.). Developing Discourse Practices in Adolescence and Adulthood. Advances in Discourse Processes Vol. 39. Norwood, NJ: Ablex Publishing.
Shaw, S. D. 2004. 'IELTS writing: revising assessment criteria and scales (phase 3)'. Research Notes 16: 3–7.
Shaw, S. D. 2007. 'Modelling facets of the assessment of writing within an ESM environment'. Research Notes 27: 14–9.
Taylor, L. 2002. 'Assessing learners' English: but whose/which English(es)?' Research Notes 10: 18–20.
Taylor, L. 2004. 'Second language writing assessment: Cambridge ESOL's ongoing research agenda'. Research Notes 16: 2–3.

The author
Hacer Hande Uysal is currently an assistant professor in the ELT
programme at Gazi University, Ankara, Turkey. She received her MA in
English Education and her PhD in Foreign Language/ESL Education from
the University of Iowa, USA. Her research interests are second language
writing, language planning, and teacher education.