
Assessing Writing 18 (2013) 187–201


Assessing cohesion in children’s writing: Development of a checklist
Lynda Struthers, Judith C. Lapadat, Peter D. MacMillan
University of Northern British Columbia, Canada

Article history: Received 26 October 2012; Received in revised form 4 May 2013; Accepted 10 May 2013; Available online 11 June 2013.

Keywords: Cohesion; Written composition; Writing assessment; Children; Writing development; Curriculum-based measurement.

Abstract

Cohesion in writing is achieved through the use of linguistic devices that tie ideas together across a text, and is an important element in the development of coherent writing. Research shows that inter- and intra-developmental differences may appear in how children learn to use these devices, but cohesion is commonly overlooked in the evaluation and instruction of writing. In this study, we developed a checklist to assess cohesion in the writing of children in Grades 4–7, with the purpose of informing instructional practices. Following the procedure outlined by Crocker and Algina (1986), we developed and evaluated a checklist designed to assess the types of cohesive devices present in the writing of children. The checklist items showed fair to good discrimination between high and low scoring writers as demonstrated by a classical item analysis. We also found good interrater reliability, and evidence for discriminative validity. As internal consistency was weak, however, further research is needed to refine the instrument. Implications for the assessment of cohesion and future research are discussed.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Written language is an important form of communication. Consequently, learning to write well is an important educational goal, and one that requires the development of a complex variety of skills.
Among these are fluency with transcription (spelling and letter formation/keyboarding), language-based skills such as word choice and construction of grammatically correct sentences, and mechanical
skills such as the appropriate use of capital letters and punctuation. However, for children to effectively
communicate their ideas in writing they need to do more than write correctly; they must learn to
construct coherent texts.
In order for educators to effectively teach composition skills, they must be able to systematically
assess writing. Assessment of the mechanical aspects of writing is well established; teachers are adept
in detecting errors of spelling, punctuation, grammar, and sentence structure. However, when it comes
to examining coherence, assessment typically involves teacher judgment or holistic ratings. Holistic
judgments are useful for gaining an overall impression of a written piece, but are less useful for exam-
ination of specific text features and skills (Walcott & Legg, 1998). Analysis of the features of students’
writing is important because it allows teachers to detect strengths and weaknesses, and subsequently
design differentiated instruction that addresses specific skill deficits (National Commission on Writing,
2003; Rousseau, 1990). Thus, in order to design differentiated or remedial instruction that helps stu-
dents learn to write coherently, assessment of the text level features that contribute to coherence is
warranted.
Aspects of a text that contribute to coherence include topic coherence and local connections among
sentences (McCutchen & Perfetti, 1982). Topic coherence refers to the integrity and overall semantic
unity of a written text, and is achieved when each sentence provides a relevant contribution to the
topic. Local connections involve the explicit and implicit connections between neighboring sentences.
Explicit connections, referred to as cohesive devices, can bolster the reader’s ability to make inferences
(Irwin, 1988; O’Reilly & McNamara, 2007; Palmer, 1999) and errors in cohesion can get in the way
of a reader’s efforts to understand the message of the writer (Hedberg & Fink, 1996; Watson Todd,
Khongput, & Darasawang, 2007). It is these cohesive devices that are of interest here as they represent
tangible aspects of a text that can be observed for the purposes of assessment and feature analysis.
However, the analysis of cohesion is not well addressed in assessments of written language com-
monly used in education. The purpose guiding this study, therefore, was to develop an instrument
that would allow for the analysis of cohesion in the writing of school-aged children with the aim of
informing differentiated instruction for those who struggle with creating cohesive texts.

1.1. Definitions of cohesion

Cohesive devices are lexical and grammatical structures that support the formulation of coherent
texts (Mortensen, Smith-Lock, & Nickels, 2009). Halliday and Hasan (1976) described five linguistic
devices that are used to establish cohesion in both spoken and written English: reference, conjunc-
tion, lexical cohesion, substitution, and ellipsis. Reference involves the use of pronouns, articles, and
demonstratives to refer to information previously mentioned in the text (e.g., John sniffed the air. He
could smell smoke), and thus contribute to local connectedness. Conjunction involves the use of addi-
tive (e.g., and), temporal (e.g., before), causal (e.g., because), adversative (e.g., but), and continuative
(e.g., now) conjunctions, as well as adverbial phrases to link ideas across phrases and sentences. Con-
junction also supports local connectedness. Lexical cohesion occurs when semantically related words
are used throughout the text. As such, lexical cohesion captures aspects of both local connectedness
and topic coherence. Reiteration is one type of lexical cohesion, which includes repetition of the same
word (e.g., dog – dog) or the use of superordinates (e.g., dog – animal), synonyms (dog – canine), or
near synonyms (dog – beast) to refer to the same item, person, or event. Collocation, another form of
lexical cohesion, involves the use of antonyms, complementary terms, and converses throughout the
text (e.g., hot – cold, sand – beach, asked – answered). Substitution involves the use of a generic term
to replace a redundant element (e.g., He really wanted a red ball. Finally he found one) and ellipsis
involves the elimination of redundant elements altogether (e.g., I was going to go but [I] didn’t [go]).
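To keep these five categories straight in the review that follows, the taxonomy can be summarized as a small lookup structure built from the examples given above. This is our own illustrative summary for reference, not part of Halliday and Hasan's formal framework, and the labels are assumptions of convenience.

```python
# Illustrative only: Halliday and Hasan's (1976) five cohesive devices,
# summarized with the examples used in the text above.
COHESIVE_DEVICES = {
    "reference": {
        "means": "pronouns, articles, demonstratives pointing back in the text",
        "example": "John sniffed the air. He could smell smoke.",
    },
    "conjunction": {
        "means": "additive, temporal, causal, adversative, continuative links",
        "example": "and / before / because / but / now",
    },
    "lexical_cohesion": {
        "means": "reiteration (dog-dog, dog-animal, dog-canine, dog-beast) "
                 "and collocation (hot-cold, sand-beach, asked-answered)",
        "example": "dog ... canine ... beast",
    },
    "substitution": {
        "means": "a generic term replaces a redundant element",
        "example": "He really wanted a red ball. Finally he found one.",
    },
    "ellipsis": {
        "means": "a redundant element is omitted entirely",
        "example": "I was going to go but [I] didn't [go].",
    },
}
```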
In addition to the cohesive devices described by Halliday and Hasan (1976), Perera (1984) indicated
that consistent use of tense markers across a text also supports connectedness in writing. As well, in a
small exploratory study of writing from students in Grades 3, 5, and 7, the first author found that the
use of organizational structures like paragraphs and logical sequencing enhanced the topic coherence
of a written text.

1.2. The value of assessing cohesion

1.2.1. Cohesion as an indicator of coherence and quality


Cohesion and coherence are not synonymous terms. Whereas coherence relates to the unity in
meaning conveyed by a text, cohesion equates to the text level markers through which coherence is
displayed (Bereiter & Scardamalia, 1987; Graesser, McNamara, Louwerse, & Cai, 2004). Coherence is
also impacted by reader factors such as background knowledge (McNamara, Crossley, & McCarthy,
2010; O’Reilly & McNamara, 2007) and comprehension skill (O’Reilly & McNamara, 2007). Neverthe-
less, cohesion may be an indicator of coherence and quality in children’s writing, as texts with better
use of cohesive devices are also rated as more coherent and of better quality than texts with less
cohesion.
For example Cox, Shanahan, and Sulzby (1990) found that appropriate use of cohesive ties was
positively correlated with ratings of quality in the expository texts of children in Grades 3 and 5.
As well, they found that cohesive harmony, a unitary measure of overall cohesiveness, was related
to quality in both expository and narrative texts. Fitzgerald and Spiegel (1986) found a significant
negative relationship between distance among cohesive ties and a holistic measure of coherence,
such that coherence scores were lower when the distance between ties was greater. Cameron et al.
(1995) found that cohesion accounted for 15% of the unique variance in overall quality of writing of
children, indicating that cohesion plays a role in writing well.
As cohesion is related to text quality and coherence, it may therefore act as a good indicator of coher-
ence. Whereas coherence is not directly observable, cohesion is. Thus, examination of cohesion can
help educators pinpoint where some children may be having difficulty in composing well-constructed
unified texts.

1.2.2. Development of cohesion


As writers develop, along with their growing awareness of audience, there is an increase in the
strategic manipulation of text level structures that allow the writer to intentionally guide their reader
to follow the intended message (Bereiter & Scardamalia, 1987). However, before they can manipulate
linguistic devices in their writing, writers must be able to produce them. Berninger, Mizokawa, Bragg,
Cartwright, and Yates (1994) found that, for school-aged children, production of local connections
between sentences preceded development of the metalinguistic manipulation of these same struc-
tures, suggesting that if children are not able to retrieve and produce cohesive devices, they will also
be unable to strategically manipulate these devices when composing.
Additionally, writers with differing developmental abilities appear to use cohesive devices differ-
ently from one another. For instance, a number of researchers (e.g., Crowhurst, 1987; Fitzgerald &
Spiegel, 1986; Yde & Spoelders, 1985) have found grade level effects for cohesion, showing that stu-
dents at various levels use cohesive devices differently. As well, students with learning disabilities
demonstrate poorer use of cohesive devices when compared to same-aged peers who are typically
developing (Feifer & De Fina, 2002; Silliman, Jimerson, & Wilkinson, 2000). In a study of cohesion in
the writing of children in the elementary grades, Hedberg and Fink (1996) found that students with
language-learning disabilities (LLD) had less cohesive density (a proportional measure of cohesion)
and cohesive harmony than peers without LLD. Similarly, Cox et al. (1990) found that poor readers used
cohesive ties inappropriately in their writing more often than did good readers. Individuals with LLD
may continue to exhibit weak cohesion in their writing in adulthood as well. For example, Mortensen
et al. (2009) found that adults with language impairments were more likely to make errors in pronoun
referencing than adults without language impairments.
Not only are there developmental and individual differences in the way cohesive devices are used
by children, but there are also intra-individual differences. For example, Berninger et al. (1994) found
children’s ability to connect sentences in paragraphs, largely through the use of cohesive devices,
was unrelated to their ability to write well structured sentences or spell well. Whitaker, Berninger,
Johnston, and Swanson (1994) also found no relationship among skills of lexical sophistication,
sentence complexity, or the use of linking words within writing produced by the same individual. How-
ever, they did find a strong relationship between an individual’s production of connections between
sentences in paragraphs and the ability to effectively revise these connections. Thus, it would appear
that difficulties with creating locally connected texts through the use of cohesive devices develop
separately from other writing skills and are related to the ability to edit these connections when revising.
Given that cohesion contributes to text quality; the use of cohesive devices changes with writing
development; difficulties using cohesive devices in writing may persist over time; and problems with
cohesion may occur when other areas of writing remain relatively intact, the assessment of cohesion
becomes important in understanding individual differences in writing performance. Assessment that
shows which cohesive devices students are or are not using effectively to connect their ideas in writing
will help educators design instructional programming to improve these students’ production and
strategic use of such devices.

1.2.3. Lack of availability of cohesion assessment instruments in schools


Although assessment of cohesion may provide useful information about student writing, it has
been largely overlooked in traditional writing assessments (Feifer & De Fina, 2002). In a recently
completed search of commercially available writing tests for school-aged children, we found only
four that attempted to measure aspects of cohesion: the Oral and Written Language Scales (OWLS);
the Writing Process Test; the Wechsler Individual Achievement Test – III (WIAT–III); and the Test of Early
Written Language, Second Edition (TEWL-2). The OWLS (Carrow-Woolfolk, 1996) contains only two
items worth one point each, scored for the effective use of connections between sentences, consistent
tenses, and transition words like ‘then’ in a paragraph writing task. The Writing Process Test provides a
single rating for Organization/Coherence in composition (Kimmel, 1998). The WIAT–III (Psychological
Corporation, 2002) gives credit for the use of linking expressions, unity, and logical order for paragraph
and essay writing. The TEWL-2 scores the use of pronouns, topic sentences, and sequencing of events
in a story writing task, each on a four point rating scale (Hurford & Trevisan, 1998). Each of these
instruments may be useful in determining when students are, or are not, writing cohesively, but does
not provide an analysis of how well, or which type of cohesion is being used.

1.3. Content and format considerations for assessment of cohesion

In order to determine which aspects of cohesion to assess, we examined research studies that
evaluated the use of cohesive devices in the writing of children. Consideration of both the methods
and results of these studies provided direction for the content and format of an assessment tool for
measuring children’s written cohesion. Specifically, we were interested in the measures typically used
and the developmental trends for cohesion.
One difficulty we encountered in our search was a lack of extensive research examining develop-
mental trends in the use of cohesive devices in children’s writing. Additionally, it was challenging to
extract developmental information from this small body of research due to the methodological differ-
ences that impacted the findings across studies. For example, while the majority of studies examined
inter-sentential use of cohesion strictly using Halliday and Hasan’s (1976) classifications (Cameron
et al., 1995; Fitzgerald & Spiegel, 1986; Rutter & Raban, 1982; Yde & Spoelders, 1985; Zarnowski,
1983), one study by Cox et al. (1990) examined only reference and ellipsis. In other studies the cat-
egories were adjusted. For instance, Crowhurst (1987) counted three categories of lexical cohesion
(lexical repetition, synonyms, and collocation) and Rutter and Raban (1982) expanded the lexical cat-
egory to include items that were “related as a direct result or consequence of an earlier referent” (p.
68). In another study, McCutchen and Perfetti (1982) did not directly use Halliday and Hasan’s cate-
gories. Instead, they counted incidents of reiteration and reference after parsing the sentences from
writing samples into given and new portions.
In addition to classification differences, studies utilized different writing tasks. These differences
included, but were not limited to, differences with audience, genre, methods for collecting the writing
samples, and sample length. Finally, different age-groups have been examined.
Not surprisingly, findings from these varied studies have been somewhat mixed. Given these
methodological differences, our review of the literature focused on convergent findings across studies
that superseded these differences and would provide evidence for general trends in the development
of cohesion. It should also be noted that most of the studies we reviewed examined narrative writing.
However, for the few studies that examined more than one genre, we were most interested in the
aspects of cohesion that appeared across genres, albeit in different patterns (Crowhurst, 1987) or at
different ages (McCutchen & Perfetti, 1982).

1.3.1. Methods of scoring cohesion


Researchers examining cohesion in writing have done so using a number of scoring approaches.
These include scores of raw counts and proportions of cohesive devices (Crowhurst, 1987), measures of
distances between cohesive ties and their referents (Fitzgerald & Spiegel, 1986), measures of accuracy
in the use of cohesive devices (Cox et al., 1990), and unitary scores of cohesion such as cohesive density
(Yde & Spoelders, 1985) or cohesive harmony (Hedberg & Fink, 1996).
Studies utilizing counts and proportions classify occurrences of cohesive devices according to
Halliday and Hasan’s (1976) definitions, and then tally the total number of devices used across a
certain number of text units (e.g., words or sentences) or calculate the proportion of devices that
represent each category. For example, several studies report the number of ties per T-unit that are
present in a text (Cameron et al., 1995; Cox et al., 1990; Crowhurst, 1987). A T-unit is defined as an
independent clause plus any attached subordinate clauses (Hunt, 1965). Other studies report the total
number of devices present within the entire text (Fitzgerald & Spiegel, 1986; Rutter & Raban, 1982;
Yde & Spoelders, 1985).
Measures of accuracy focus on ties that are complete (i.e., the referent is found within the text) or
ambiguous (i.e., the referent must be inferred or is unclear: Cox et al., 1990; McCutchen & Perfetti,
1982). Measures of distance examine how many sentences or other such text units are inserted
between cohesive elements. For example, McCutchen and Perfetti (1982) categorized and counted ties
as occurring in adjacent sentences (immediate), in sentences separated by one intermediate sentence
(mediated), and in sentences farther apart (remote). Additionally, some researchers have calculated
mean distances by averaging the number of ties by distance across the entire sample (e.g., Spiegel &
Fitzgerald, 1990).
Cohesive harmony and density, on the other hand, involve unitary measures. That is, they provide
a numerical representation of how cohesive a written text is, but do not offer as much insight into
analysis of the types of devices used or the presence of errors or ambiguities. Thus, for the purposes
of this study, we focused our attention on measures of counts, proportions, distances, and accuracy in
cohesion.
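To make the count-, proportion-, and distance-based scores just described concrete, a minimal sketch follows. It assumes the cohesive ties and T-units have already been identified by a human coder; the function names are ours for illustration and are not drawn from any of the cited studies' procedures.

```python
from collections import Counter

def ties_per_t_unit(tie_categories, n_t_units):
    """Raw count of cohesive ties per T-unit, plus the proportion of ties
    falling in each category (reference, conjunction, lexical, ...)."""
    total = len(tie_categories)
    counts = Counter(tie_categories)
    proportions = {cat: n / total for cat, n in counts.items()} if total else {}
    return total / n_t_units, proportions

def classify_distance(sentence_of_tie, sentence_of_referent):
    """Distance classification in the sense of McCutchen and Perfetti (1982):
    adjacent (or same) sentence = immediate, one intervening sentence =
    mediated, anything farther apart = remote."""
    gap = abs(sentence_of_tie - sentence_of_referent)
    if gap <= 1:
        return "immediate"
    if gap == 2:
        return "mediated"
    return "remote"

# Example: ties coded by hand from a short text containing 6 T-units.
ties = ["reference", "reference", "conjunction", "lexical", "lexical"]
density, mix = ties_per_t_unit(ties, n_t_units=6)
print(density, mix)
print(classify_distance(sentence_of_tie=4, sentence_of_referent=1))  # 'remote'
```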

1.3.2. Use of cohesive devices by children


We found several converging findings in the research regarding the types of devices commonly
used by children. One is that substitution and ellipsis are rarely found in texts produced by children
(Cameron et al., 1995; Crowhurst, 1987; Yde & Spoelders, 1985; Zarnowski, 1983). Rather, children’s
writing predominantly features lexical cohesion, reference, and conjunction. For example, in their
study of narrative writing by children in Grades 3 and 6, Fitzgerald and Spiegel (1986) found lexical
cohesion was the most commonly used device followed by reference, then conjunction. Cameron
et al. (1995) found the same pattern in narrative writing by children in Grade 4. Zarnowski (1983)
found that students in Grades 4, 6, and 8 used mainly conjunctions; lexical repetition, synonyms, near
synonyms, and collocation types of lexical cohesion; and pronominal and demonstrative referencing,
as opposed to other types of cohesive devices, when writing narratives. In an examination of written
narratives and arguments by students in Grades 6, 10, and 12, Crowhurst (1987), found that, across
grades and genres, lexical cohesion and reference (specifically pronoun referencing, demonstratives,
and the definite article ‘the’) were the most commonly used cohesive ties. Collectively, these findings
suggest that the content of an assessment tool for children should focus on reference, conjunction,
and lexical cohesion.
Despite these convergent findings, some results have been more conflicting, especially across stud-
ies counting the total number of devices used by different age groups. For example, Fitzgerald and
Spiegel (1986) found fewer cohesive ties used in written narratives by students in Grade 6 than in
Grade 3. Conversely, Yde and Spoelders (1985) found that children aged 10–11 years used more over-
all ties in their narrative writing than children aged 8–9 years. Zarnowski (1983) also found higher
numbers of cohesive ties in narratives with increased grade level for children in Grades 4, 6, and 8.

Differences specific to reference use have also been found. For example Yde and Spoelders (1985)
found that 10–11 year-olds used slightly more reference than 8–9 year-olds. Likewise, Zarnowski
(1983) found an increase in the use of reference ties between Grades 4 and 8. Conversely, Rutter and
Raban (1982) found higher use of demonstratives but slightly less reference in collected stories and
poems written by 10-year-olds than by 6-year-olds; however, this decrease was associated with a
switch from third to first person writing for their older students.
Some findings for conjunctions are also somewhat mixed. For example, Yde and Spoelders (1985)
found their older participants used slightly fewer conjunctions overall in their narratives than their
younger participants, but the older participants used a higher proportion of temporal conjunctions.
Crowhurst (1987) on the other hand found a decrease in temporal and causal conjunction use in
narratives across Grades 6, 10, and 12. She attributed these grade differences to a decrease in the
use of ‘then’ and ‘so’ for the older students, who tended to use a greater variety but not necessarily a
greater number of conjunctions. In arguments, this same decrease was not found; however, there was
a similar increase in the variety of conjunctions used, with Grade 12 students using the widest variety
in their arguments.
Unlike reference and conjunction, the findings for lexical cohesion have shown more consistent
age and grade related effects. For example, Rentel et al. (1983) found increases in the use of lexical
repetition in narrative writing across Grades 1–4. Rutter and Raban (1982) found more collocation
used by 10-year-olds than by 6-year-olds in stories and poems. Similarly, Crowhurst (1987) found
that collocation increased across Grades 6, 10, and 12 in narrative writing, and synonym use increased
across grades, irrespective of genre.
The mixed findings of these studies suggest that simple counts of cohesive devices may be espe-
cially sensitive to genre and methodological differences, so may not accurately reflect developmental
changes in cohesion. However, one promising finding that emerged was the use of a greater variety of
devices by older than by younger children (Crowhurst, 1987; Rutter & Raban, 1982). In fact, Crowhurst
noted that, in some cases, students with similar mean counts demonstrated differences in the variety
of cohesive ties they used. Thus, simple counts of cohesive devices seem to be insufficient for detecting
differences among writers across grades and genres. Instead, examination of the variety of devices
used may provide more reliable information.
In contrast to the mixed results emerging from studies examining counts of cohesive devices,
the findings regarding accuracy and distance have been more consistent. In general, studies have
found a decline in the number of errors or ambiguous ties in narrative (Cox et al., 1990; Fitzgerald &
Spiegel, 1986) and expository texts (Cox et al., 1990) and a decrease in the distance between ties and
their referents in narrative (Fitzgerald & Spiegel, 1986; McCutchen & Perfetti, 1982; Yde & Spoelders,
1985) and expository texts (McCutchen & Perfetti, 1982) with advancing grade in the elementary
years. For example, McCutchen and Perfetti (1982) found that, across Grades 2, 4, 6, and 8, children’s
expository writing reflected the following developmental pattern. Initially, the writing of children
contained many unconnected sentences. With advancing grade these unsuccessful connections were
replaced by remote connections. By Grade 6, students were using a high proportion of sentence
adjacent connections. The pattern was the same in narrative writing, but sentence adjacent con-
nections emerged earlier with this genre. Collectively, these findings suggest that, to assess cohesion,
the content of an assessment tool should reflect the unambiguous use of referents and the distance
between ties.

1.3.3. Considerations for format


Our final consideration was the format for the assessment tool. We wanted a tool that could be
used easily with a variety of classroom generated short writing samples that would utilize numerical
and objective scoring to allow for peer comparisons, and that would also provide specific analytic
information about cohesion for planning differentiated remedial writing instruction. As simple counts
have been shown to be unreliable indicators of developmental change, we felt that a categorical tool
would be a better alternative for providing quantitative information. We chose a checklist format as
this approach allows for a detailed analysis of the types of devices used successfully and unsuccessfully
(Rousseau, 1990) while avoiding the judgments typically required by rating scales (Gearhart, 2009).

2. Developing the checklist

Our aim in conducting this research was to design a checklist to measure cohesion in children’s
written texts, for the purpose of informing instruction. In this preliminary investigation, we focused on
the development of checklist items and limited our focus to the later elementary school years, when
skill with cohesion is generally emerging (Berninger et al., 1994). We asked, “Would a checklist, with
items based on variety, distance, and accurate use of cohesive devices, provide a method that would
adequately detect developmental differences in cohesion in children’s writing?” As the first steps in
answering this question, we examined:

1. how checklist items performed on a classical item analysis;
2. the reliability of the checklist in measuring cohesion; and
3. the ability of checklist scores to capture a distinct aspect of writing apart from general proficiency, and to differentiate grade as an initial indicator of validity.

We followed Crocker and Algina’s (1986) steps for developing assessment instruments. That is, we
first compiled a pool of checklist items based on the literature findings summarized in 1.3.2. We then
conducted a preliminary evaluation involving panel reviews and pilot project evaluations. Finally, we
conducted a large scale evaluation of the instrument, which included item analyses with subsequent
modification of checklist items, followed by reliability and validity checks.
We used short written narratives for the checklist evaluation procedure. This decision related
to the intended use of the checklist in practice. The school district participating in this study used
curriculum-based measurement (CBM) to monitor student progress in writing, so these short written
texts were gathered routinely. We hoped to extend the assessment utility of these writing fluency
measures by using them to evaluate cohesion. Additionally, we felt that if the checklist was able to
detect differences between writers with these short samples, then the checklist would likely be able to
detect writer differences with longer samples; however, we could not be as confident that the reverse
would be true.

3. Method

3.1. Participants

The study utilized archival texts produced by students in Grades 4–7 from four schools in a large
northern school district in British Columbia. One school provided 20 archival written texts; these were
used for a pilot project item analysis. Three other schools provided a total of 342 archival written texts
collected from the entire population of the target grades in each of the three schools. This larger group
of texts was used for the large scale evaluation of the checklist.

3.2. Materials

Students generated the sample texts by writing for 3 minutes from a story starter. This writ-
ing task was administered by school district staff using standardized procedures across grades and
schools. As these written texts were collected routinely, they provided a convenience sample for this
research.
Each text was coded for grade, and identifying information was removed prior to the sample collec-
tion. We excluded written texts if they contained fewer than two sentences or were illegible, resulting
in removal of 30 texts (8.7% of the sample) from the large scale study. The final sample for the large
scale study consisted of written texts from 71 students in Grade 4, 67 students in Grade 5, 84 students
in Grade 6, and 89 students in Grade 7. Two examples of texts, one from Grade 4 and one from Grade
7, are included in Appendix A. All 20 texts from the sample used in the pilot project met criteria for
inclusion.

3.3. Preliminary checklist development

3.3.1. Item development


We developed the checklist items based on the descriptions of cohesive devices typically used by
school-aged children presented in the research literature. Specifically, we designed items to account
for a variety of devices (with increased variety leading to higher scores), distances (with higher scores
occurring when ties were used in adjacent sentences), and accuracy (with credit given only for cohesive
devices used unambiguously). We also included items related to the global structure of a text based on
previous findings of an exploratory study conducted by the first author, and suggestions from Perera
(1984). Although these items do not fit the classical definition of cohesion provided by Halliday and
Hasan (1976), we felt that they also contributed to unity within a text, and could be objectively scored.
Our initial pool contained 27 items.
Two panels, one consisting of experienced teachers with graduate level course work in test design
and the other consisting of school speech-language pathologists with extensive experience in language
assessment, reviewed the items. These panels provided feedback on technical flaws, content, and
format of the assessment tool. Based on this feedback, we revised items and clarified scoring guidelines.
These revisions included rewording or rearranging items to reduce any noted ambiguity. Additionally,
we flagged some items as potential problems but left them unchanged pending the outcome of the
pilot project.

3.3.2. Pilot project


The pilot project study involved two preliminary phases of checklist evaluation. First, we scored
20 written texts and conducted a first round examination of checklist items using ITEMAN (1994).
ITEMAN is a computer program that generates statistics demonstrating the performance of test items.
These statistics include the proportion correct, which indicates the proportion of writers who received
credit on the item; the discrimination index, which indicates how well an item discriminates between
high and low scorers; and the point biserial correlation, which shows how well the score on a given
item relates to overall performance on the instrument. The texts used in the pilot project were not
used in the large scale analysis.
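ITEMAN is commercial software, but the three statistics named above are standard classical test theory quantities. A minimal sketch of how they might be computed from a writers-by-items matrix of 0/1 checklist scores is shown below; this is illustrative code rather than ITEMAN's own algorithm, and the 27% upper/lower split used for the discrimination index is a common convention we assume here.

```python
import numpy as np

def item_statistics(scores, upper_frac=0.27):
    """Classical item statistics for a writers-by-items matrix of 0/1 scores.

    Returns, per item: the proportion correct, a discrimination index based on
    contrasting upper and lower scoring groups, and the point-biserial
    correlation between the item and the total score."""
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    n = len(totals)
    k = max(1, int(round(n * upper_frac)))
    order = np.argsort(totals)
    lower, upper = order[:k], order[-k:]

    stats = []
    for j in range(scores.shape[1]):
        item = scores[:, j]
        p = item.mean()                              # proportion correct
        d = item[upper].mean() - item[lower].mean()  # discrimination index
        r_pb = np.corrcoef(item, totals)[0, 1]       # point-biserial with total
        stats.append({"prop_correct": p, "discrimination": d, "r_pb": r_pb})
    return stats

# Example with made-up scores for 6 writers on 3 binary checklist items.
demo = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
for row in item_statistics(demo):
    print(row)
```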
Next, using a subset of ten writing samples randomly selected from the 342 samples collected
for the large scale analysis, we conducted a preliminary interrater agreement study. We trained 13
volunteer raters (teachers with experience in writing assessment) how to use the checklist. Training
was conducted over a 3-hour session during which we provided background information on cohe-
sion, introduced the checklist, and provided scoring examples and non-examples for each of the
items, followed by practice scoring of four texts with subsequent feedback on agreement and accu-
racy. Following training, the volunteers independently scored 10 writing samples. We then evaluated
the amount of agreement among raters across items as another means of determining problematic
items.
We considered the results of the pilot interrater study in conjunction with the feedback from the
panel reviews and the previous item analysis. Based on this collective information we combined,
deleted, or expanded problem items to reduce ambiguities. The final product of this pilot project
was a 25 item checklist divided into four subsections, namely Reference (REF), Conjunction (CON),
Lexical Cohesion (LEX), and Global Cohesion. This 25 item checklist (included in Appendix B) was then
submitted to a large scale evaluation.

3.4. Large scale checklist evaluation

The first author scored the 312 texts that met criteria for inclusion in the study. After scoring the
written texts, we evaluated the checklist items using ITEMAN (1994). We revised items with a high
or low proportion correct (values >.85 or <.49 respectively) or items with poor discrimination indices.
We then adjusted the scoring to reflect these checklist changes and performed a second item analysis.
We also ran an additional item analysis to determine whether the items performed better as members
of their assigned subsection or in relation to the checklist as a whole.

To evaluate the reliability of the final version of the checklist, we calculated measures of internal
consistency for the total test and individual subsections. We also conducted an interrater reliability
study with the first author and three trained volunteers, all of whom were school speech-language
pathologists. Speech-language pathologists were chosen due to their similar training and experience
in language testing, and their availability to participate in the research. As a measurement of the instru-
ment’s discriminant validity, we then compared checklist scores to measures of fluency and syntactic
complexity. We calculated these scores on the same written texts to contrast checklist scores to meas-
ures of writing proficiency. The fluency measure consisted of calculating the total words written in 3
minutes (TWW). This measure of TWW has been demonstrated to be a reliable indicator of writing
proficiency and strongly related to other measures of writing ability (Benson & Campbell, 2009). The
syntactic measure consisted of calculating the mean length (in words) of T-units (MLTUs) for each
written text. The measure of MLTU has been demonstrated to increase across grades and therefore is
also related to development in writing (Hunt, 1965; Klecan-Aker & Hendrick, 1985; Loban, 1976). In
addition, we performed a discriminant analysis to determine whether checklist scores differentiated
grade level.
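Both concurrent measures are straightforward to compute once T-unit boundaries have been marked. A minimal sketch follows, under the simplifying assumptions that words are whitespace-delimited and that T-units have been segmented by hand (CBM scoring manuals give more detailed word-count rules, and automatic T-unit segmentation is a harder problem than shown here).

```python
def total_words_written(text):
    """TWW: total words written in the timed sample
    (whitespace tokenization is a simplification)."""
    return len(text.split())

def mean_length_of_t_unit(t_units):
    """MLTU: mean length, in words, of hand-segmented T-units
    (a T-unit = an independent clause plus any attached subordinate clauses)."""
    lengths = [len(t_unit.split()) for t_unit in t_units]
    return sum(lengths) / len(lengths) if lengths else 0.0

sample = "John sniffed the air. He could smell smoke because the fire was close."
t_units = ["John sniffed the air.",
           "He could smell smoke because the fire was close."]
print(total_words_written(sample))      # 13
print(mean_length_of_t_unit(t_units))   # 6.5
```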

4. Results

4.1. Item analyses

Total scores on the checklist ranged from 2 to 16 with a median score of 8 and a mean total score of
7.69 (SD = 2.20). Nine of the checklist items had discrimination indices below .10, so these items were
eliminated or combined with other items. Two of the items had a proportion correct of greater than
.90 indicating a ceiling effect. Because such items do not discriminate well among students (Sax, 1997)
we eliminated them. We removed one entire subsection, Global Cohesion, as we had eliminated all
but one of its items due to poor statistical performance. The final form of the checklist resulting from
these revisions consisted of 13 items divided among three subsections (REF, CON, and LEX). A copy of
this version of the checklist is included in Appendix C.
After adjusting the scores to reflect these final revisions to the checklist, we found a total score
range of 0 to 12 with a median score of 5 and a mean of 4.92 (SD = 1.92). The item analysis conducted
on this 13 item checklist revealed improved discrimination indices on all items with values ranging
from .14 to .50 indicating fair to good discrimination (Sax, 1997). Point biserial correlations ranged
from .22 to .47 (p < .001).
We completed a final item analysis to examine the performance of items compared to their subsec-
tions rather than the total test score. As can be seen from Table 1, we found improved discrimination
indices in the analysis of items by subsection for all but two items. The discrimination indices for
11 of the items ranged from .21 to .93 with point biserial correlations for all items ranging from .25
to .82 (p < .001). We subsequently examined subsection inter-correlations and found no relationship
among the three sections of the checklist (r ≤ .10). However, Pearson’s r correlations between the total
test and REF, CON, and LEX were .70, .67, and .42 (p < .001) respectively, demonstrating a relationship
between each of the subsections and overall cohesion scores.

4.2. Reliability

The internal consistency was α = .32 (SEM = 1.58) for the total checklist. Internal consistencies for
the subsections were α = .39 for REF, α = .22 for CON, and α = .10 for LEX. Thus, our findings for the
internal consistency of the checklist were weak.
In the interrater study, Pearson product moment correlations between pairs of raters ranged from
.70 to .91 (p < .05). It should be noted, however, that coefficients among the first author and two of the
raters ranged from .86 to .91. Thus, there was a high degree of agreement among three of the raters.
Despite the lower agreement with one, a one-way ANOVA showed no significant difference among the
raters, F (3, 44) = .479, p = .699. When examining the percentage of item-by-item agreement among
raters across texts, two of the checklist items (items 5 and 13) showed less than 80% agreement. The
remaining 11 items showed between 83.33 and 97.92% agreement.
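For readers who wish to reproduce these reliability figures on their own data, a minimal sketch of the two computations is given below. It is our own illustration, not the exact procedure or software used in this study; it assumes writers-by-items (or texts-by-items) matrices of 0/1 scores, and uses KR-20, which is coefficient α specialized to binary items. Pairwise rater correlations such as those reported above could then be obtained with np.corrcoef on the raters' total scores.

```python
import numpy as np

def kr20(scores):
    """Internal consistency (KR-20, i.e., coefficient alpha for 0/1 items)
    from a writers-by-items matrix of binary scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)                       # proportion scoring 1 on each item
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1.0)) * (1.0 - (p * q).sum() / total_var)

def percent_agreement(rater_a, rater_b):
    """Item-by-item percent agreement between two raters scoring the same texts.
    Both arguments are texts-by-items matrices of 0/1 scores."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return 100.0 * (a == b).mean(axis=0)          # one percentage per item
```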

Table 1
Item statistics of the 13 item checklist by total test and subsection.

Item statistics by total test            Subsection   Item statistics by subsection
Item no.   Prop. corr.   D     rpb                    Sub item   Prop. corr.   D     rpb
1          .66           .36   .39       REF          1-1        .66           .72   .64
2          .51           .46   .45                    1-2        .51           .88   .72
3          .20           .20   .26                    1-3        .20           .21   .25
4          .55           .50   .47                    1-4        .55           .44   .44
5          .25           .25   .28                    1-5        .25           .54   .59
6          .52           .24   .28       CON          2-1        .52           .63   .54
7          .12           .18   .33                    2-2        .12           .17   .31
8          .13           .14   .22                    2-3        .13           .18   .27
9          .57           .29   .32                    2-4        .57           .56   .50
10         .37           .34   .36                    2-5        .37           .60   .54
11         .29           .21   .29                    2-6        .29           .51   .49
12         .14           .21   .25       LEX          3-1        .14           .22   .62
13         .62           .37   .35                    3-2        .62           .93   .82

Note: Item no. = checklist item number; Prop. corr. = proportion correct; D = discrimination index; rpb = point biserial correlation; Sub item = item number by subsection.

Table 2
Mean writing scores and standard deviations by grade.

Score        Grade 4          Grade 5          Grade 6          Grade 7
             M (SD)           M (SD)           M (SD)           M (SD)
REF          2.16 (1.35)      1.97 (1.18)      2.18 (1.28)      2.30 (1.18)
CON          1.73 (1.10)      2.00 (1.10)      1.70 (1.17)      2.19 (1.30)
LEX          .58 (.58)        .69 (.61)        .70 (.62)        1.10 (.69)
Total test   4.46 (1.86)      4.76 (1.35)      4.67 (1.10)      5.64 (.58)
TWW          37.00 (13.33)    43.54 (14.49)    48.83 (15.22)    59.58 (13.76)
MLTU         7.88 (2.52)      7.94 (2.38)      8.87 (3.03)      9.53 (3.06)

Note: TWW = total words written; MLTU = mean length of T-unit.

4.3. Validity

Correlations between the cohesion checklist scores and other measures of writing proficiency
yielded mixed results. As can be seen by examining the descriptive statistics presented in Table 2,
there was a general trend of improvement in writing scores with increasing grade. The relationships
among test scores are summarized in Table 3. The relationships between TWW and scores for the
total checklist and the subsections of CON and LEX reflected medium effect sizes (Cohen, 1992). There
was no relationship between TWW and REF scores. The relationships between MLTU and the total
test score, REF, and LEX reflected small, although non-trivial effect sizes (Cohen, 1992). There was no
relationship between MLTU and CON.

Table 3
Correlations between checklist scores and concurrent writing measures.

Checklist score   Total words written r (p)   Mean length of T-unit r (p)
REF               −.005 (.929)                .203 (.000)
CON               .393 (.000)                 −.065 (.259)
LEX               .336 (.000)                 .173 (.003)
Total test        .346 (.000)                 .146 (.012)

The discriminant analysis showed a main effect of grade for total checklist scores, F (3,
306) = 6.70, p = .000. As can be seen from Table 2, there was a general trend of increasing checklist
scores from Grade 4 to Grade 7; however, the mean scores for Grades 4, 5, and 6 showed only small
raw score differences. Additionally, two of the subsections, CON, F (3, 306) = 4.41, p = .005 and LEX, F (3,
306) = 9.99, p = .000, also showed main effects for grade. The subsection of REF, F (3, 306) = .92, p = .43,
did not.
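For readers who want to run this kind of grade comparison on their own checklist data, a minimal sketch using a one-way ANOVA across grade groups is shown below. The score lists are fabricated placeholders, and this mirrors the univariate F tests reported above rather than the full discriminant analysis procedure.

```python
from scipy import stats

# Total checklist scores grouped by grade (placeholder values for illustration).
grade4 = [4, 5, 3, 6, 4]
grade5 = [5, 4, 5, 6, 5]
grade6 = [4, 5, 6, 5, 4]
grade7 = [6, 5, 7, 6, 6]

# One-way ANOVA testing whether mean checklist scores differ by grade.
f_stat, p_value = stats.f_oneway(grade4, grade5, grade6, grade7)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```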

5. Discussion

This study was undertaken as the first step in developing an instrument that could be used to
analyze cohesion in the writing of elementary school-aged children with the purpose of informing
differentiated instruction. We developed the initial checklist items to capture the variety of cohesive
devices, accuracy in the use of devices, and the distance between devices that may be encountered in
children’s writing. After completing panel reviews and pilot studies to establish a preliminary check-
list, we evaluated the performance of the checklist items using short written narratives produced by
children in Grades 4–7.
This procedure resulted in a 13 item checklist. In general, in accordance with our first test of the
checklist, the items performed adequately on a classical item analysis, but item statistics were further
improved when calculated in relation to subsection scores rather than the total test score. Two items
reflecting the use of subordinating temporal conjunctions (items 7 and 8) remained problematic.
However, the poor discrimination indices of these items were likely due to the infrequent appearance
of these particular types of cohesive devices in the written texts used in this study. Despite poorer
statistical performance, we did not remove these items from the checklist as they represented cohesive
devices that are typically seen in the writing of intermediate grade level children (Crowhurst, 1987;
Perera, 1984). Removal of these items would therefore negatively impact the generalizability of this
instrument to other texts or genres in future studies.
In response to our second test regarding the reliability of the checklist, we found good interrater
agreement among four raters on all but two items of the checklist; however, the slightly poorer agree-
ment of one rater may have accounted for some of the lower agreement on those items. Additionally,
one of the items, item 13, measures collocation. Scoring collocation requires a judgment based on
perceived relationships among words, which may be specific to individual raters (Crowhurst, 1987;
Morris & Hirst, 2004). Thus, despite our goal to make scoring objective through item wording and
binary scoring, some judgment was likely still required to score this item, therefore contributing to
the corresponding decreased agreement.
We also calculated internal consistency as a measure of reliability. The internal consistency of this
checklist was weak suggesting a lack of homogeneity in the items. We considered two explanations
for this finding.
One explanation for the low internal consistency may be related to the construct(s) being measured.
We found that item statistics were better when analyzed as part of their subsections than as part
of the checklist as a whole. We also found that the subsection scores did not correlate with one
another. These findings suggest that different aspects of cohesion may reflect different underlying
constructs that develop independently from one another. If true, this may account for some of the
difficulty researchers have experienced in detecting consistent developmental patterns for cohesion.
Furthermore, if cohesion reflects more than one construct, we would expect to find weak internal
consistency for items across the checklist as a whole.
Another explanation for the lack of internal consistency relates to the variability in the sample.
A side effect of reduced sample variability is lower reliability estimates (Sax, 1997). Because the
texts used in this study were generated in 3 minutes and were limited in length, they may not have
warranted the use of a large variety of cohesive devices. For example, the majority of writers in this
sample did not use paragraph structures and many of the texts contained only a single paragraph.
Consequently, items in the global cohesion subsection showed poor statistical performance and were
removed from the checklist. Items reflecting cohesive devices that were rarely used also demonstrated
poor findings for proportion correct. Such items were combined, thus contributing to a reduction in
the total number of checklist items. For example, the CON subsection, though capturing the same
content, was reduced from 12 to 5 items. Thus, the potential for the checklist to be sensitive to variety
in the types of cohesion markers used was somewhat diminished. Whereas collapsing items resulted
in improved item statistics, the internal consistency of the checklist may have been compromised.
Likewise, the internal consistency of subsections was also weak, possibly as a function of the limited
number of items per section.
Another way the variability of the checklist was limited was by the categorical nature of the items.
That is, the binary scoring of items, which was initially designed to reduce subjectivity in scoring,
ultimately impacted the instrument’s ability to detect some practical variation among writers. This
problem was particularly an issue for the REF subsection. For example, a writer who used a reference
cohesive tie correctly on three occasions, but incorrectly on one would receive the same score as a
writer who demonstrated multiple errors in use of the device or did not use the device at all. It also
likely impacted the scoring of the LEX subsection. That is, a writer who used many instances of lexical
cohesion would receive the same score as one who used only a single instance of the same lexical
device. Therefore, the instrument’s ability to detect practical variation was affected. Insensitivity to
such differences limited how well the checklist scores reflected variability in the sample.
Our third test focused on validity, by examining the relationship between checklist scores and
other measures of writing development. We found small to medium effect sizes (Cohen, 1992) in
the relationships between the scores on the checklist and measures of writing proficiency. These
small to medium effect size relationships showed that the subsection and total test scores of the
checklist had some relationship to other measures of writing, but appeared to be measuring a unique
attribute of writing not captured by fluency or syntactic complexity scores. It is not surprising that
some relationship was found, as children who demonstrate greater writing fluency and syntactic
complexity are generally better writers (Benson & Campbell, 2009; Klecan-Aker & Hendrick, 1985),
and better writers may also, as a group, be better able to use cohesive ties. If stronger relationships
had been found between cohesion and TWW or MLTU, this may have implied that cohesion scores
were a function of text length or syntactic complexity, respectively. Our findings of small relationships
support previous research showing that individuals who have difficulty with cohesion may not have
difficulties with other aspects of writing (Berninger et al., 1994; Irwin, 1988; Whitaker et al., 1994),
and provides further evidence that cohesion warrants assessment as a separate skill. Because cohesion
is related to quality in writing, and reflects topic coherence and local connectedness, assessment of
this aspect of writing is important if our goal is to produce writers who can effectively connect their
ideas in writing.
Our final investigation was to determine whether checklist scores were sensitive to developmen-
tal increases across grades, as evidence of the instrument’s sensitivity to developmental differences
among writers. We found that checklist scores showed a general trend of improvement from Grade 4
to 7; however, grade-by-grade differences in checklist scores did not show practical variation. That is,
the raw score discrepancies between grades were very small and sometimes non-existent. For exam-
ple, the difference in the mean raw score between students in Grade 4 and Grade 6 was only .22.
It may be that skill in the use of cohesive devices develops so slowly that differences on a grade by
grade basis are not noticeable. Small changes in cohesion across grades may also mean that students
in Grades 4–6 are in the same developmental writing stage, and therefore, we would not expect to see
a significant or quantitative shift until students move into the subsequent stage sometime around
Grade 7. So, while the checklist detects some differences among writers of different abilities, further
development is required to increase its sensitivity.

5.1. Limitations

The findings of this checklist development study were impacted by three key limitations. One
was the brief length of the written texts used to evaluate the checklist. Another was the categorical
nature of the checklist items. Both of these factors impacted the sample variability and likely had
a negative impact on the findings for internal consistency. Finally, the findings of this study were
limited by the use of a single type of writing to evaluate the checklist, as cohesion has been shown
to be affected by such things as genre (Crowhurst, 1987; McCutchen & Perfetti, 1982) and audience
(Liles, 1985).
Even though the goal of this research was to develop an instrument that could be used to assess
short writing samples, the CBM samples used here were likely too short. Though we were able to
detect some grade-level differences with these very short samples, longer samples may have resulted
in greater variability. Greater variability in the texts may have led to improved findings for internal
consistency and greater practical variation in grade-by-grade scores. So, although CBM writing samples
have been demonstrated to be valid indicators of writing proficiency, they do not appear well suited
for assessment of higher level text elements such as cohesion. Additionally, the categorical or binary
scoring system of this checklist created ‘absolute’ scoring criteria that reduced subjectivity, but also
variability. This finding suggests that the binary scoring of a checklist may not be well suited for analysis
of cohesion, particularly for reference and lexical cohesion. Instead, a multi-level scoring approach,
which gives credit on the basis of degree of use, may be required to capture variability among writers.
As well, to truly capture the range of cohesive devices used by children in their writing, the checklist
would need to be evaluated against other genres of writing. Even though we developed the checklist
with items that appear to capture the types of cohesion used across genres, we only tested narra-
tive writing in this preliminary study. Inclusion of other genres in the checklist evaluation may have
resulted in different item performance and increased variability.

5.2. Implications for assessment

Despite the limitations of this preliminary research, the checklist, in its current form, may still
have some utility in writing assessment. If used to examine several writing samples by an individual
student, the checklist can assist an educator in detecting which devices a student has mastered, which
are emerging, and which are absent or used incorrectly. This type of analysis has diagnostic merit
(Rousseau, 1990) and is helpful in guiding instruction (National Commission on Writing, 2003).

5.3. Implications for future research

Given the findings and limitations of this study, we suggest two key areas for future research in
the assessment of cohesion. One addresses the issue of sample variability. The other relates to the
underlying construct being measured.
One way to improve variability is to change the scoring of the instrument to increase its sensitivity
to differences in cohesion. Thus, for future development of this checklist, we should consider adding
items, and/or implementing a multiple point scoring system that reflects the degree of cohesive device
use. Another way to improve variability would be to use longer texts to evaluate the instrument.
Longer written texts may include a greater variety of devices, potentially leading to stronger internal
consistency. Future studies should therefore include longer writing samples to test this hypothesis.
In addition to research to improve the scoring format, further study of its validity is required.
Future studies of validity should include use of the instrument in measuring cohesion in writing across
genres and by comparing checklist scores to teacher ratings of quality and coherence.
It may also be beneficial to take a step back from the checklist itself and further examine the devel-
opment of cohesion with particular attention to the underlying construct(s). The lack of correlation
between subsections of the instrument and their differing relationships to other aspects of writing
proficiency suggest that subsections of the instrument may have tapped separate underlying linguistic
abilities with different developmental rates. For example, it could be that the use of reference relates
to early morphological development. If so, reference skills may not change dramatically across the
intermediate school years. Lexical cohesion, on the other hand, may be related to underlying semantic
development, which continues to grow throughout the school years (Byrnes & Wasik, 2009). Further
research into the patterns of development for the different kinds of cohesive markers and their under-
lying constructs will be helpful in determining the aspects of cohesion that might be more sensitive
to developmental differences at or across various grade levels and genres.

6. Conclusion

This study focused on the preliminary development of a checklist to assess cohesion in the writing of
children in Grades 4–7, utilizing brief writing samples. We found adequate performance on a classical
item analysis, good interrater reliability, and statistical evidence for discriminant validity; however,
there is still more work to be done before we will have an instrument that will be sensitive enough to
capture differences among students of similar ages.
Overall, this preliminary checklist development study was the first step in addressing a gap in
written language testing. Although further development is required, this study highlighted the com-
plexities involved in assessing cohesion in writing. Furthermore, we demonstrated that discourse level
aspects of writing, such as cohesion, may not be adequately captured by other assessment approaches.
Additional research to refine this instrument will require further examination of the developmental
progression of cohesion in the writing of children, changes to the scoring procedure, and evaluation of
the instrument using longer text samples and multiple genres. Finally, to extend the generalizability
of this assessment tool, we will need to administer it to a larger representative sample of students
from various grade levels, school districts, and communities.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at
http://dx.doi.org/10.1016/j.asw.2013.05.001.

References

Benson, B. J., & Campbell, H. M. (2009). Assessment of student writing with curriculum-based measurement. In: G. A. Troia (Ed.), Instruction and assessment for struggling writers: Evidence-based practices (pp. 337–357). New York, NY: Guilford Press.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berninger, V. W., Mizokawa, D. T., Bragg, R., Cartwright, A., & Yates, C. (1994). Intraindividual differences in levels of written
language. Reading and Writing Quarterly, 10 (3), 259–275.
Byrnes, J. P., & Wasik, B. A. (2009). Language and literacy: What educators need to know. New York, NY: Guilford Press.
Cameron, C. A., Lee, K., Webster, S., Munro, K., Hunt, A. K., & Linton, M. J. (1995). Text cohesion in children’s narrative writing.
Applied Psycholinguistics, 16 (3), 257–269.
Carrow-Woolfolk, E. (1996). Oral and written language scales. Circle Pines, MN: American Guidance Service.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112 (1), 155–159.
Cox, B. E., Shanahan, T., & Sulzby, E. (1990). Good and poor readers’ use of cohesion in writing. Reading Research Quarterly, 25
(1), 47–65.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart & Winston.
Crowhurst, M. (1987). Cohesion in argument and narration at three grade levels. Research in the Teaching of English, 21 (2),
185–197.
Feifer, S. G., & De Fina, P. A. (2002). The neuropsychology of written language disorders: Diagnosis and intervention. Middletown,
MD: School Neuropsych Press.
Fitzgerald, J., & Spiegel, D. L. (1986). Textual cohesion and coherence in children’s writing. Research in the Teaching of English,
20 (3), 263–280.
Gearhart, M. (2009). Classroom portfolio assessment for writing. In: G. A. Troia (Ed.), Instruction and assessment for struggling
writers: Evidence-based practices (pp. 311–336). New York, NY: Guilford Press.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language.
Behavior Research Methods, 36 (2), 193–202.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London, UK: Longman Group.
Hedberg, N. L., & Fink, R. J. (1996). Cohesive harmony in the written stories of elementary children. Reading and Writing: An
Interdisciplinary Journal, 8, 73–86.
Hunt, K. W. (1965). Grammatical structures written at three grade levels. Champaign, IL: National Council of Teachers of English.
Hurford, D. P., & Trevisan, M. S. (1998). Review of the test of early written language: Second edition. In: J. C. Impara & B. S. Plake
(Eds.), The thirteenth mental measurements yearbook (pp. 1027–1031). Lincoln, NE: Buros Institute of Mental Measurements.
Irwin, J. W. (1988). Linguistic cohesion and the developing reader/writer. Topics in Language Disorders, 8 (3), 14–23.
ITEMAN (Version 3.5). (1994). Computer software. St. Paul, MN: Assessment Systems Corporation.
Kimmel, E. W. (1998). Review of the writing process test. In: J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements
yearbook (pp. 1160–1161). Lincoln, NE: Buros Institute of Mental Measurements.
Klecan-Aker, J. S., & Hendrick, D. L. (1985). A study of the syntactic language skills of normal school-aged children. Language,
Speech, and Hearing Services in Schools, 16 (3), 187–198.
Liles, B. Z. (1985). Cohesion in the narratives of normal and language-disordered children. Journal of Speech and Hearing Research,
28, 123–133.
Loban, W. (1976). Language development: Kindergarten through grade twelve. Urbana, IL: National Council of Teachers of English.

McCutchen, D., & Perfetti, C. A. (1982). Coherence and connectedness in the development of discourse production. Text, 2,
113–139.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27 (1),
57–86.
Morris, J., & Hirst, G. (2004). The subjectivity of lexical cohesion in text. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi10.1.1.9.4620
Mortensen, L., Smith-Lock, K., & Nickels, L. (2009). Text structure and patterns of cohesion in narrative texts written by adults
with a history of language impairment. Reading and Writing: An Interdisciplinary Journal, 22 (6), 735–752.
National Commission on Writing. (2003, April). The neglected “R”: The need for a writing revolution. In: Report of the National Commission on Writing in America’s schools and colleges. The College Entrance Examination Board. Retrieved from http://www.host-collegeboard.com/prod_downloads/writingcom/neglectedr.pdf
O’Reilly, T., & McNamara, D. S. (2007). Reversing the reverse cohesion effect: Good texts can be better for strategic, high-
knowledge readers. Discourse Processes, 43 (2), 121–152.
Palmer, J. C. (1999). Coherence and cohesion in the English language classroom: The use of lexical reiteration and pronominalization. RELC Journal, 30 (2), 61–85.
Perera, K. (1984). Children’s writing and reading. Oxford, UK: Basil Blackwell.
Psychological Corporation. (2002). Wechsler individual achievement test (2nd ed.). Toronto, ON: The Psychological Corporation.
Rentel, V., King, M. L., Pettegrew, B., & Pappas, C. (1983). A longitudinal study of coherence in children’s written narratives. In:
Research report for the Ohio State University Research Foundation, Columbus, Ohio. (ERIC Document Reproduction Service No.
ED 237 989).
Rousseau, M. K. (1990). Errors in written language. In: R. A. Gable & J. M. Hendrickson (Eds.), Assessing students with special
needs (pp. 89–101). London: Longman Group.
Rutter, P., & Raban, B. (1982). The development of cohesion in children’s writing: A preliminary investigation. First Language, 3
(7), 63–75.
Sax, G. (1997). Principles of educational and psychological measurement and evaluation (4th ed.). Belmont, CA: Wadsworth
Publishing.
Silliman, E. R., Jimerson, T. L., & Wilkinson, L. C. (2000). A dynamic systems approach to writing assessment in students with
language learning problems. Topics in Language Disorders, 20 (4), 45–64.
Spiegel, D. L., & Fitzgerald, J. (1990). Textual cohesion and coherence in children’s writing revisited. Research in the Teaching of
English, 24, 48–66.
Walcott, W., & Legg, S. M. (1998). An overview of writing assessment: Theory, research, and practice. Urbana, IL: National Council
of Teachers of English.
Watson Todd, R., Khongput, S., & Darasawang, P. (2007). Coherence, cohesion and comments on students’ academic essays.
Assessing Writing, 12, 10–25.
Whitaker, D., Berninger, V., Johnston, J., & Swanson, H. L. (1994). Intraindividual differences in levels of language in intermediate
grade writers: Implications for the translating process. Learning and Individual Differences, 6 (1), 107–130.
Yde, P., & Spoelders, M. (1985). Text cohesion: An exploratory study with beginning writers. Applied Psycholinguistics, 6 (4),
407–415.
Zarnowski, M. (1983). Cohesion in student narratives: Grades four, six, and eight. Unpublished research report (ERIC Document
Reproduction Service No. ED 247 569).

Lynda Struthers holds a BSc in Speech Pathology and Audiology (University of Alberta) and MEd in Curriculum and Instruction
with a focus on language (University of Northern British Columbia). She has extensive experience as a school Speech-Language
Pathologist. Her research interests lie in the area of writing development in children.

Judith C. Lapadat is an Associate Vice-President (Students) and Professor, Faculty of Education at the University of Lethbridge.
She has published three books and over 50 peer-reviewed articles and chapters on technologically mediated teaching and
learning, qualitative methods, language disabilities, and literacy, as well as literary works.

Peter MacMillan holds a PhD (University of Alberta) and MA (UBC), in Educational Psychology with a specialty in Measurement
and Evaluation. His research interests lie in the areas of Rasch and Curriculum Based Measurement, particularly instrument
analysis and judged events. Prior to working at UNBC, Peter taught secondary science and mathematics.
