What We Don't Know about the
Evaluation of Writing
James C. Raymond
James C. Raymond is Associate Professor of English and Assistant Dean of the Graduate
School at the University of Alabama. He is the author of Writing (Is an Unnatural Act), editor of
Literacy as a Human Problem, and co-author of a forthcoming book on legal writing.
College Composition and Communication, Vol. 33, No. 4, December 1982
we want to assess editorial skills as distinct from other aspects of writing that are generally acknowledged to be more important. Usage, incidentally, is one aspect of writing that ought to be examined with empirical methods, though as Joseph Williams has told us recently11 and Thomas Creswell six years ago,12 we seem content to recycle old shibboleths instead of using methods currently available to discover the facts of usage in edited English.
2. There is safety in numbers. Inferences made on the basis of large samples may be useful, as long as they are not applied injudiciously to the evaluation of single papers. It is true that T-units are, on the average, longer in professionally written prose than in prose composed by twelfth-graders. It is not true that any given sample of prose with long T-units is necessarily better than another sample with short T-units, not even if they both contain precisely the same information. (A sketch of the arithmetic behind Hunt's measure follows these numbered points.)
3. There is safety in numbers. Because the performance of skilled writers varies considerably from one day to the next and from one writing task to another, it makes sense to construct assessment tests that require more than one kind of writing on more than one day. If external constraints preclude multiple testing, it makes sense to allow students who are dissatisfied with the results of a single essay exam to take another without prejudice.
4. There is safety in numbers. Because variability in reader response is both inevitable and desirable, it makes sense to have more than one reader evaluate any exam that will have serious consequences for individual students. Even in daily classroom practice, it makes sense for teachers to give students the option of having a second reader for any paper on which the instructor's grade is disputed.
5. Although training sessions for raters are normally motivated by the desire to achieve inter-rater reliability, their chief value is that they require evaluators to examine their assumptions critically and to arrive at an institutional policy about what is important and unimportant in writing. The inter-rater reliability achieved this way ought not to be confused with objectivity or validity; the consensus reached at one institution will and ought to vary from the consensus reached at other institutions, just as judgments about what constitutes publishable prose vary among editors and publishers.
6. The degree to which inter-rater reliability is a desirable characteristic in evaluation varies with the kind of assessment the procedure is intended to yield. It would be possible to achieve near-perfect inter-rater reliability by simply counting the number of words produced during the test period; but no one would seriously accept this as a measure of quality. Because the quality of writing resides not entirely in the text, but in the interactions among the text, its author, and its individual readers, we should not only expect but actually demand a reasonable amount of variation among raters when the goal is to evaluate a piece of writing as a whole. Instead of apologizing for reliability rates in the neighborhood of .80, we might well become suspicious of rates that are much higher than that. (The second sketch following these points shows one way such a rate can be computed.)
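A sketch of the arithmetic behind Hunt's measure, as promised in point 2 above, may be useful. The measure is simply words per T-unit, and the counting is trivial once the T-units are marked off; the contestable work is the marking off. The illustration below (in Python, with invented sample text) deliberately sidesteps that work by assuming that every sentence contains exactly one T-unit, an assumption that undercounts T-units in compound sentences.

    import re

    def mean_t_unit_length(text):
        # Simplifying assumption: each sentence holds exactly one T-unit,
        # so sentence boundaries stand in for T-unit boundaries. Genuine
        # segmentation would require identifying main clauses, as Hunt did.
        units = [u for u in re.split(r'[.!?]+\s+', text.strip()) if u]
        words = sum(len(u.split()) for u in units)
        return words / len(units) if units else 0.0

    sample = ("The committee met on Tuesday. It reviewed every essay "
              "that the students had submitted before the deadline.")
    print(round(mean_t_unit_length(sample), 1))   # prints 8.5

Even with perfect segmentation, the resulting number says nothing about whether the longer units serve the writer's purpose, which is precisely the point made above.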
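As for the reliability rates mentioned in point 6, one common index (though not the only one, and not necessarily the statistic behind the .80 figure above) is a product-moment correlation between two raters' scores on the same set of papers. A minimal sketch follows, with invented scores:

    from math import sqrt

    def pearson_r(xs, ys):
        # Pearson product-moment correlation between two lists of scores.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    rater_a = [4, 3, 5, 2, 4, 3, 5, 1]   # holistic scores on eight papers
    rater_b = [4, 2, 5, 3, 3, 3, 4, 1]   # a second reader, same papers
    print(round(pearson_r(rater_a, rater_b), 2))   # prints 0.86

Two raters who merely counted words would drive such a coefficient toward 1.0 without measuring quality at all, which is why a figure conspicuously higher than .80 should invite scrutiny rather than congratulation.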
Notes
1. Richard Braddock, Richard Lloyd-Jones, and Lowell Schoer, Research in Written Composition (Urbana, IL: National Council of Teachers of English, 1963), p. 5.
2. See Lee Odell and Charles Cooper, "Procedures for Evaluating Writing: Assumptions and Needed Research," College English, 42 (September, 1980), 36.
3. Paul B. Diederich, Measuring Growth in English (Urbana, IL: National Council of Teachers of English, 1974).
4. Richard Lloyd-Jones, "Primary Trait Scoring," in Evaluating Writing: Describing, Measuring, Judging, ed. Charles R. Cooper and Lee Odell (Urbana, IL: National Council of Teachers of English, 1977), pp. 33-66.
5. Kellogg W. Hunt, Grammatical Structures Written at Three Grade Levels (Urbana, IL: National Council of Teachers of English, 1965).
6. Charles R. Cooper, "Holistic Evaluation of Writing," in Evaluating Writing: Describing, Measuring, Judging, p. 3.
7. E. D. Hirsch, Jr., The Philosophy of Composition (Chicago: University of Chicago Press, 1977), p. 189.
8. Quoted by Jonathan Culler, Ferdinand de Saussure (New York: Penguin, 1976), p. 8.
9. Culler, p. xv.
10. Anne Ruggles Gere, "Written Composition: Toward a Theory of Evaluation," College English, 42 (September, 1980), 58.
11. Joseph M. Williams, "The Phenomenology of Error," College Composition and Communication, 32 (May, 1981), 152-168.
13. Aviva Freedman and Ian Pringle, "Writing in the College Years," College Composition and Communication, 31 (October, 1980), 314.
14. "Procedures for Evaluating Writing," p. 43.