DUDLEY W. REYNOLDS
Framing the Issue
Today three trends signal future directions in the development of new forms of
second language writing assessment. First, influenced by dynamic theories of lan-
guage development, researchers have begun to investigate techniques for assessing
what students can produce on their own versus with the help of scaffolding. Second,
drawing on recent advances in computerized natural language processing (NLP),
researchers are examining how language patterns drawn from large-scale analyses
of corpus data can be used for automated writing assessment. Finally, also taking
advantage of the “big data” analysis capabilities offered by cloud computing,
researchers are developing assessment systems that will facilitate and track class-
room-based formative assessments and combine them with information from more
standardized, benchmark assessments in order to provide policy and decision
makers at all levels with real-time, comprehensive data.
Making the Case
At the most general level, writing ability can be understood as the ability to
generate a written product or products with specified characteristics in a specified
manner. Assessment instruments therefore tend to focus either on the product(s) or on
the process of writing. When it comes to second language writers, instruments
may or may not emphasize the challenges faced by individuals when they write in
a code and cultural context with which they have limited familiarity.
An example of a product-focused assessment with less emphasis on the fact that
the writers are using a second language would be one that directs raters to make
judgments about the language, rhetoric, and content of a text. Language use may
be codified in terms of sophistication of vocabulary, syntactic development, and
conformity to grammatical conventions. Rhetoric may invoke judgments related
to audience appeal, organization, and clarity of purpose, while content judgments
might focus on the accuracy, development, or novelty of the propositions presented.
To the extent that these judgments represent characteristics of effective texts,
regardless of the linguistic repertoire of the writer, they prioritize writing ability
over second language writing ability. A product-focused assessment with
greater emphasis on the ability of second language users may forego judgments
about content and possibly even about rhetoric but will include measures of
language complexity, accuracy, and fluency, in an attempt to operationalize the
psycholinguistic dimensions of text production. Familiarity with the cultural
context, understood as adherence to genre features, might be added as well.
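To make these dimensions concrete, complexity, accuracy, and fluency are often operationalized as simple text statistics. The following sketch in Python is illustrative only: the particular measures are one minimal operationalization among many, and the error count is assumed to be supplied externally (for example, by a human annotator), since automatic error detection is itself a research problem.

# A minimal sketch of complexity/accuracy/fluency (CAF) proxies.
# These measures are illustrative, not a validated scheme;
# error_count is assumed to come from a rater or external tool.
def caf_proxies(text, error_count, minutes):
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "complexity": len(words) / max(len(sentences), 1),     # mean sentence length
        "diversity": len({w.lower().strip(".,;:!?") for w in words})
                     / max(len(words), 1),                     # type/token ratio
        "accuracy": 1 - error_count / max(len(words), 1),      # inverted error rate
        "fluency": len(words) / minutes,                       # words per minute
    }

print(caf_proxies("The cat sat on the mat. Then it slept.",
                  error_count=1, minutes=2.0))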
Assessment schemes that incorporate attention to the production process
often measure process in terms of the stages of production of a single text
(invention, drafting, revision, and editing). Research into the processes of skilled
and unskilled writers suggests that these “stages” are in fact behaviors that
writers engage in repeatedly during production. In order to make judgments
about these behaviors, the assessor will usually examine artifacts such as notes,
drafts, and marked-up copies of the text as well as written reflections made by
the writer about his or her process. Production ability may also be defined in
terms of fluency, that is, the rate at which a writer can generate text under
timed conditions.
Pedagogical Implications
The design and appropriate use of writing assessments in TESOL contexts require
an understanding of the available options for assessment tasks and scoring
procedures, as well as attention to potential issues related to fairness and equity.
Assessment tasks should be shaped by decisions about whether the goal is to
assess writing products or processes, the relative importance of task authenticity
and reliable scoring, and the need for summative judgments versus formative
feedback. The most common assessment task in second language writing con-
texts is probably a prompt-based essay. The prompt is a simulation of a “real-
world” context and purpose that might elicit the creation of a written text. It
serves both to inspire content and to prescribe boundaries for the form of the
response. It may consist of little more than a question or a provocative statement,
with instructions to produce a text of a certain length. Alternatively it may pro-
vide specifications regarding a context where the text would be read or used,
characteristics of the potential audience, suggestions for the writing process,
characteristics of a good or bad response, and information about how the
response will be evaluated. Finally, respondents may be expected to produce the
text as soon as they have read the prompt, in which case the result would be
considered an impromptu writing sample; or they may be given an extensive
period of time, in which case there may be greater expectations of content devel-
opment and revision.
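These design options can be thought of as fields in a prompt specification. The sketch below records the choices just described, including the impromptu-versus-extended timing decision; the field names and example values are this illustration's own invention, not a standard testing schema.

from dataclasses import dataclass, field
from typing import List, Optional

# A hypothetical prompt specification; the fields mirror the design
# options discussed above rather than any standard schema.
@dataclass
class WritingPrompt:
    question: str                           # question or provocative statement
    context: Optional[str] = None           # simulated real-world context
    audience: Optional[str] = None          # characteristics of intended readers
    process_suggestions: List[str] = field(default_factory=list)
    scoring_information: Optional[str] = None
    min_words: int = 250
    time_limit_minutes: Optional[int] = 30  # None signals an extended task

prompt = WritingPrompt(
    question="Should your city invest more in public transportation?",
    audience="readers of a local newspaper's opinion page",
    time_limit_minutes=30,  # an impromptu sample; None would allow revision
)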
One recent innovation in more formal writing assessments such as the one
included in ETS’s TOEFL® exam has been the use of integrated assessment of writ-
ing. Inspired by the desire to make such formal assessments mirror the way in
which writing is frequently assessed in classroom settings, integrated assessments
first provide respondents with one or more reading or listening passages on which
they are tested; then the respondents are asked to write an essay that responds to
or draws on the content of the passage(s). Integrated tasks are frequently used in
classroom settings because they mirror authentic, non-classroom writing tasks
such as summarizing a business meeting or writing an academic paper. But, if the
goal is to measure writing ability independently of other language abilities, it
should be kept in mind that it is not clear to what degree the quality of a text has
been influenced by the respondent’s ability to read or understand the source
passage(s).
As an alternative to tasks that base their judgments on a single product, many
classroom teachers and instructional programs have moved to requiring students
to assemble writing portfolios. Typical portfolio tasks require students to select
samples of their work that exemplify specified criteria, to evaluate the work in
relation to those criteria, and to reflect upon their overall learning. Often the works
students choose from will have been previously assessed as stand-alone products,
and so the portfolio task shifts the emphasis to the student’s general writing ability
and meta-awareness. Occasionally portfolios may still focus on a single essay that
has been previously graded. In this case, students must revise the essay and write
a commentary that explains why they made certain revisions, so that the emphasis
is again on broad learning. The biggest challenge with portfolio tasks is the
quantity of information provided through different formats. Judgments about
portfolios can be informed by which texts are selected, the characteristics of the
texts themselves, and the numerous reflective statements included.
While most writing assessment tasks today are based on having students actu-
ally write, it is also possible to consider more indirect measures of writing ability.
Historically, grammar and reading tests employing selected-response questions
that could be reliably scored were used to make inferences about one’s writing
ability. Concurrent validity studies frequently showed high correlations between
such measures and teachers’ judgments. It is also possible to construct discrete
test items that query one’s knowledge of writing conventions, rhetorical termi-
nology, and publication styles, or even items where respondents have to choose
between revision options, organizational sequences, or effective introductions.
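Because such items have a fixed key, a machine can score them with perfect consistency, even though the inference from the score to actual writing ability remains indirect. A minimal sketch follows; the item content is invented for illustration.

# A hypothetical selected-response item targeting revision choices.
item = {
    "stem": "Which revision best fixes the dangling modifier in "
            "'Running for the bus, Sam's backpack fell open.'?",
    "options": {
        "A": "Running for the bus, Sam felt his backpack fall open.",
        "B": "Sam's backpack, running for the bus, fell open.",
        "C": "Running for the bus, open fell Sam's backpack.",
    },
    "key": "A",
}

def score_item(item, response):
    # The fixed key makes scoring perfectly reliable.
    return 1 if response == item["key"] else 0

print(score_item(item, "A"))  # prints 1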
The writing task provides an opportunity for the students to demonstrate their
ability, but there must also be a system for representing judgments about that abil-
ity—for coming up with scores, grades, or ratings. Scores may be assigned directly
by human raters or by computerized algorithms guided by data from raters who
have made judgments about a set of reference texts. Computerized writing assess-
ments are becoming much more common because of the speed with which they
can be completed; human raters, however, are much better at handling outliers
within a set of responses and also do not require previously scored essays for nor-
ming. Whether human or computerized, scoring systems present two challenges:
What do the scores represent about one’s writing ability? And how do the scores
differentiate individual writers?
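The reliance on previously scored reference texts can be illustrated with a toy model: features are extracted from essays that human raters have already scored, a regression is fit to those scores, and the fitted model then predicts scores for new responses. The sketch below uses scikit-learn and deliberately crude features; operational systems draw on far richer natural language processing features and much larger reference sets.

# A toy automated-scoring sketch: fit a regression on human-rated
# reference essays, then predict a score for an unseen response.
from sklearn.linear_model import LinearRegression

def features(text):
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return [len(words),                            # length
            len(set(w.lower() for w in words)),    # vocabulary size
            len(words) / max(len(sentences), 1)]   # mean sentence length

reference_essays = [
    "Short answer.",
    "A somewhat longer response with a few developed points.",
    "A longer response with more developed ideas and varied vocabulary. "
    "It supports its claims across several sentences.",
]
human_scores = [1.0, 3.0, 5.0]  # scores previously assigned by trained raters

model = LinearRegression().fit(
    [features(t) for t in reference_essays], human_scores)
print(model.predict([features("A new essay to be scored automatically.")]))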
Ways of conceptualizing writing ability in terms of process and product have
been discussed above. When it comes to scoring, however, a decision must be
made about whether the different dimensions of that ability should be
represented individually (trait ratings) or as a single ability (holistic rating). The
choice between the two systems is often presented as being about whether the
ability to write is more than the sum of its parts. In practice, however, the decision
is often made on much more practical grounds. If there is a need to provide diag-
nostic feedback to teachers or students (or both), then scores that provide judg-
ments about components, as opposed to judgments about the general ability, are
more desirable. If the purpose of the assessment is to make decisions about level
placement or program entrance/exit, then a single holistic rating will suffice. One
other option is to devise a hybrid system where raters assign trait ratings that are
then combined according to some algorithm to produce a single overall score. By
weighting a trait such as “clarity of purpose” more than “accurate use of punctua-
tion,” for example, the algorithm can also reflect curricular priorities.
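Such a hybrid system amounts to a weighted average. The sketch below shows how the weights encode curricular priorities; the trait names and weight values are invented for illustration.

# Hypothetical trait weights reflecting curricular priorities:
# clarity of purpose counts three times as much as punctuation.
weights = {"clarity_of_purpose": 0.45, "organization": 0.25,
           "vocabulary": 0.15, "punctuation": 0.15}

def composite_score(trait_ratings, weights):
    # trait_ratings: the rater-assigned score (e.g., on a 1-6 scale) per trait
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[t] * rating for t, rating in trait_ratings.items())

print(composite_score({"clarity_of_purpose": 5, "organization": 4,
                       "vocabulary": 3, "punctuation": 2}, weights))  # 4.0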
Rater judgments are inherently variable. One advantage of holistic systems in
terms of reliability is that they allow for multiple paths to the same end, which
means that there is likely to be greater agreement between scores assigned by dif-
ferent raters. Too much variation in how raters assign scores, however, means that
it is not clear what the score represents. It is important therefore to provide raters
with clear descriptions of what different scores represent. Often such descriptions
take the form of scoring rubrics, which pair each score level with descriptors of
the qualities a response at that level should display.
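Whether such descriptions succeed in constraining rater variation is an empirical question, usually answered by having two raters score the same set of responses and quantifying their agreement. A minimal sketch of two common indices follows; operational programs would supplement these with more robust statistics such as weighted kappa or many-facet Rasch models.

# A minimal inter-rater reliability check: exact-agreement rate and
# Pearson correlation between two raters' scores on the same essays.
def rater_agreement(rater1, rater2):
    n = len(rater1)
    exact = sum(a == b for a, b in zip(rater1, rater2)) / n
    m1, m2 = sum(rater1) / n, sum(rater2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(rater1, rater2))
    sd1 = sum((a - m1) ** 2 for a in rater1) ** 0.5
    sd2 = sum((b - m2) ** 2 for b in rater2) ** 0.5
    return exact, cov / (sd1 * sd2)

# Five essays scored independently by two raters on a 1-6 scale
print(rater_agreement([4, 3, 5, 2, 4], [4, 4, 5, 2, 3]))  # (0.6, ~0.81)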
SEE ALSO: Analytic, Holistic, and Primary Trait Marking Scales; Automated
Writing Assessment; Ethics in Testing and Assessment; Integrated-Skills
Assessment; Large-Scale Writing Assessment; Norm-Referenced Testing and
Criterion-Referenced Testing; Placement Testing; Portfolios; Scoring Writing
Suggested Readings
Behizadeh, N., & Engelhard, G., Jr. (2011). Historical view of the influences of measurement
and writing theories on the practice of writing assessment in the United States. Assessing
Writing, 16(3), 189–211. doi:10.1016/j.asw.2011.03.001
Crusan, D. (2010). Assessment in the second language writing classroom. Ann Arbor,
MI: University of Michigan Press.
Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises
and perils. Language Assessment Quarterly, 10(1), 1–8. doi:10.1080/15434303.2011.622016
Hamp-Lyons, L. (Ed.). (1991). Assessing second language writing in academic contexts. Norwood,
NJ: Ablex.
Reynolds, D. W. (2010). Assessing writing, assessing learning. Ann Arbor, MI: University of
Michigan Press.
Weigle, S. C. (2002). Assessing writing. Cambridge, England: Cambridge University Press.