AUTOMATED EVALUATION OF TEXT AND DISCOURSE
WITH COH-METRIX
DANIELLE S. McNAMARA
Learning Sciences Institute and Psychology Department,
Arizona State University
ARTHUR C. GRAESSER
Institute for Intelligent Systems and Psychology Department,
The University of Memphis
PHILIP M. McCARTHY
Institute for Intelligent Systems, The University of Memphis
ZHIQIANG CAI
Institute for Intelligent Systems, The University of Memphis
www.cambridge.org
Information on this title: www.cambridge.org/9780521192927
© Danielle S. McNamara, Arthur C. Graesser, Philip M. McCarthy, and Zhiqiang Cai 2014
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2014
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication Data
McNamara, Danielle S.
Automated evaluation of text and discourse with Coh-Metrix / Danielle S.
McNamara, Arizona State University; Arthur C. Graesser, Institute for
Intelligent Systems, The University of Memphis; Philip M. McCarthy, Institute
for Intelligent Systems, The University of Memphis; Zhiqiang Cai, Institute
for Intelligent Systems, The University of Memphis.
pages cm
Includes bibliographical references.
isbn 978-0-521-19292-7 (Hardback) – isbn 978-0-521-13729-4 (Paperback)
1. Discourse analysis – Data processing. 2. Cognition – Data processing.
3. Psycholinguistics. 4. Cognitive science. 5. Corpora (Linguistics) 6. Computational
linguistics. I. Graesser, Arthur C. II. McCarthy, Philip M., 1967–
III. Cai, Zhiqiang, 1962– IV. Title.
p302.3.m39 2014
006.3/05–dc23
2013030437
isbn 978-0-521-19292-7 Hardback
isbn 978-0-521-13729-4 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.
Acknowledgments
Coh-Metrix has been built, tested, revised, and used by many researchers,
colleagues, and students over the past decade. We are extremely grateful to
the inestimable number of people who have contributed to the Coh-Metrix
project. We are likely to leave someone out if we attempt to list everyone who
has worked with us on Coh-Metrix. We must, however, explicitly acknowl-
edge a few key individuals. Max Louwerse, Randy Floyd, and Xiangen Hu
were co-investigators on the original Coh-Metrix project – we are thankful
for the opportunities we had to work with them and for their invaluable input
and contributions. Jianmin Dai joined our team more recently and has
contributed greatly to our Coh-Metrix analyses of writing and to the develop-
ment of various Coh-Metrix tools. Scott Crossley contributed to the develop-
ment of Coh-Metrix and has been perhaps the most avid user of Coh-Metrix
over the years. Working with Scott has been a delight, and without his work
we would have never progressed to where we are today. Finally, we cannot
express in words our gratitude to the many students who have worked on this
project and on related projects: We would be nothing without them.
The development of Coh-Metrix and much of the research referenced within
this book was supported by the Institute of Education Sciences, U.S. Department
of Education, through Grant [R305G020018-02] to the University of Memphis.
Research using Coh-Metrix was also supported by funding to develop and assess
the Writing Pal by the Institute of Education Sciences, U.S. Department of
Education, through Grant [IES R305A080589] to the University of Memphis
and Grants [R305A09623; R305A120707] to Arizona State University. Use and
modification of Coh-Metrix were also supported by the National Science
Foundation through grant [BCS 0904909] to the University of Memphis. The
development of the Coh-Metrix text easability components was partially sup-
ported by the Gates Foundation through a subcontract to Student Achievement
Partners. The opinions expressed are those of the authors and do not represent
views of the Institute or the U.S. Department of Education, the National Science
Foundation, or the Gates Foundation.
Introduction
extensive body of research that has evolved since Coh-Metrix was launched to
discourse processing researchers and scholars in other fields.
The Coh-Metrix facility and the associated theoretical framework would
never have been accomplished without an interdisciplinary team of research-
ers. The relevant major fields have included psychology, computer science,
linguistics, and education, but it is the more specialized hybrid fields that have
provided the more useful, targeted contributions: discourse processing, psy-
cholinguistics, reading, computational linguistics, corpus linguistics, cogni-
tive science, artificial intelligence, information retrieval, and composition.
Some of us brand ourselves as computational discourse scientists. We use
the term discourse as a general umbrella term for analyses of language, texts,
communication, and social interaction through various communication
channels. Our work is computational in two ways. First, we precisely specify
the algorithms or symbolic procedures that identify text categories, units, or
patterns at the various levels of a multilevel theoretical framework. Second,
we attempt to program the computer to implement these algorithms and
procedures. Many computer implementations are successful, but there are no
guarantees. Coh-Metrix includes only the successful automated algorithms
and procedures. And finally, we are scientists because we embrace scientific
methods in all stages of our research. That is, we sample texts in a systematic
manner when we empirically test well-formulated claims about text charac-
teristics. We perform statistical analyses that assess the generality of our
claims regarding targeted text categories. We collect data from human par-
ticipants to test claims and predictions about the impact of text characteristics
on comprehension and other psychological processes.
We are hopeful that Coh-Metrix will be useful to scholars in both the
sciences and humanities and to all sectors of the public. Coh-Metrix opens the
door to a new paradigm of research that coordinates studies of language,
discourse, corpus analysis, computational linguistics, education, and cogni-
tive science (Graesser, McNamara, & Rus, 2007). We hope that this book will
be of use to a wide range of readers, including researchers, educators, writers,
publishers, and students. Our vision is broad. There is the student in a
literature course who analyzes differences between various works by
Shakespeare, and the student in an educational psychology course who
compares textbooks written for elementary versus middle school courses.
There are the students who want to know about the nature of their own
writing and whether it improves over time. There is the book publisher who
wants to know whether a text in biology is written coherently compared with
other books on the market. There are the school superintendents who want to
evaluate all of the books being used in their school system. There is the
attorney who wants to know the difficulty of the Miranda Rights when
defending a client who has a modest understanding of the English language.
The uses and applications of Coh-Metrix are endless. Enjoy!
part i
COH-METRIX: THEORETICAL,
TECHNOLOGICAL, AND EMPIRICAL
FOUNDATIONS
Some texts are easy to read. Others are difficult. That is perfectly obvious. The
challenge lies in devising an objective means to measure texts on how difficult
they are to read. That is one of the puzzles that motivated our development of
Coh-Metrix and ultimately the writing of this book. How do we scale texts on
comprehension difficulty? Or on the flip side: easability?
It is often quite clear when texts are difficult or easy. Consider the two texts
below and cast your vote on which is difficult and which is easy.
Lady Chatterley’s Lover
He spread the blankets, putting one at the side for a coverlet. She took off her hat,
and shook her hair. He sat down, taking off his shoes and gaiters, and undoing his
cord breeches. “Lie down then!” he said, when he stood in his shirt. She obeyed in
silence, and he lay beside her, and pulled the blanket over them both.
A Mortgage
The assignment, sale, or transfer of the servicing of the mortgage loan does not
affect any term or condition of the mortgage instrument, other than terms directly
related to the servicing of your loan. Except in limited circumstances, the law
requires your present servicer send you this notice within 15 days before this
effective date or at closing.
important. Sex and romance are on par with money and domestic security,
although it could be argued that sex and romance are considerably more
interesting. Both texts require a sociocultural context for a complete under-
standing, be it knowledge of romance or of finance. Moreover, a deep under-
standing of the D. H. Lawrence story requires knowledge of the status of
women in the early 20th century (i.e., not great), when it was written. The
differences in comprehension difficulty for these two texts are indeed much
more complex and subtle than is readily apparent from the text alone.
This book will unveil the many ways that texts vary in comprehension
difficulty. What we sometimes call comprehension easability is aligned with
reading ease or readability, the other end of the continuum being text
difficulty or text complexity. Our theoretical approach is to analyze texts on
many levels of language, meaning, and discourse (Graesser & McNamara,
2011). A computer program called Coh-Metrix (and Coh-Metrix-TEA) per-
forms these analyses automatically for many of the levels that researchers
have identified over the years (Graesser, McNamara, & Kulikowich, 2011;
Graesser, McNamara, Louwerse, & Cai, 2004; McNamara & Graesser, 2012;
McNamara, Graesser, & Louwerse, 2012; McNamara, Louwerse, McCarthy, &
Graesser, 2010). The Coh-Metrix output on these many levels provides the
foundation for scaling texts on difficulty (versus easability).
what text?
Our emphasis in this book is on printed texts, although the texts may derive
from virtually any source and be composed for any English language com-
munity. For example, they may be newspaper articles, entries in encyclope-
dias, science texts in schools, legal documents, advertisements, short stories,
or theatrical scripts – the list goes on. The Coh-Metrix program holds up
quite well for most of the texts that we have analyzed. The majority of our
analyses have been on naturalistic texts, but we have also analyzed well-
controlled texts that discourse researchers have prepared or manipulated
for psychology experiments (McNamara et al., 2010). Our goal is to accom-
modate virtually any text in the English language that people write with the
intention of communicating messages to readers.
Our theoretical framework and the Coh-Metrix program can also be used
to analyze transcripts of naturalistic oral discourse. We have analyzed con-
versations in tutoring sessions, chat rooms, e-mail exchanges, and various
forms of informal conversation. Transcribed texts of conversations are replete
with speech disfluencies (um, ah, er), ungrammatical utterances, interrup-
tions, overlapping speech, slang, and semantically vague expressions (Clark,
1996). These deviations from well-formed, edited, neat and tidy text have a
major impact on some of the Coh-Metrix measures, but many of the meas-
ures are minimally disturbed. It is also possible to analyze students’ written
responses, explanations, and essays that are similarly replete with untidy
language and discourse (Crossley & McNamara, 2011; Louwerse, McCarthy,
McNamara, & Graesser, 2004; McNamara, Raine et al., 2012; Renner,
McCarthy, Boonthum-Denecke, & McNamara, 2012).
While Coh-Metrix analyses of more naturalistic discourse (e.g., dialogues)
have been highly successful, it remains important to acknowledge that some
classes of printed texts will stress the boundaries of Coh-Metrix. Current
versions of Coh-Metrix are not well equipped to handle mathematical expres-
sions, pictures, diagrams, and other forms of nonverbal media. Coh-Metrix
can be applied to poetry (Lightman, McCarthy, Dufty, & McNamara, 2007b),
but measures at some levels (such as syntax) will be compromised and Coh-
Metrix will not do justice to metaphorical expressions (Graesser, Dowell, &
Moldovan, 2011). Likewise, many aspects of the quality of writing, such as
rhetorical and pragmatic aspects of language, are not fully captured by Coh-
Metrix alone (McNamara, Crossley, & Roscoe, 2013). These challenges are on
deck for future research endeavors.
easy for them to read. The assignment of texts can also be tailored to
particular deficits that a student has at particular levels of language or
discourse. A student who is reading quite well but has trouble understanding
the global meaning of stories should be receiving different texts than students
who are having trouble with syntax or those who experience challenges with
vocabulary. Many claim that text assignment should be adapted to the
student’s profile of reading skills and proficiencies, and moreover, that stu-
dent motivation and learning improve when this happens (Connor,
Morrison, Fishman, Schatschneider, & Underwood, 2007).
Quality of public documents. The comprehension difficulty of many public
documents is too high for a large percentage of the population. The earlier
mortgage text illustrates the problem. Legal documents, medical documents,
and employment agreements are also excellent examples of challenging texts
that are difficult to understand for most of the public. Similarly, question-
naires and surveys administered to the public, such as tax forms and
census surveys, have a high percentage of questions that pose comprehension
difficulties to a significant portion of the public (Conrad & Schober, 2007;
Graesser, Cai, Louwerse, & Daniels, 2006). The reliability and validity of data
collected from these surveys are compromised when the questions have difficult
words, ambiguous meaning, complex syntax, or content that excessively burdens
cognitive resources. Individuals and society suffer the consequences.
Drug prescriptions and medical procedures. It is obviously important to
take the proper dosage of drugs, to be mindful of side effects, and to under-
stand medical procedures. Failure to do so may be a matter of life or death.
Unfortunately, the complexity of medical information is too high for most of
the public to comprehend, particularly when there is a large amount of jargon,
incoherent descriptions of procedures, and complex models of health and
biological mechanisms (Day, 2006). Interestingly, the advertisements tend to
be much easier to read than the warnings. Consider the following warning on
a nonprescription drug:
Do not use if you are now taking a prescription monoamine oxidase inhibitor
(MAOI) (certain drugs for depression, psychiatric, or emotional conditions, or
Parkinson’s disease), or for 2 weeks after stopping the MAOI drug.
Text Categories
There are many categories of text, or what some researchers call “genre,” a
French word for category. Text category schemes vary in the sets of categories
that are included as well as in grain size. These variations often depend on the
discipline and theoretical slant of the researchers. A traditional scheme of
Brooks and Warren (1972) divides texts into the categories of narrative,
expository, persuasive, and descriptive (see also McCarthy, Meyers, Briner,
Graesser, & McNamara, 2009). Each of these categories has subcategories and
potentially sub-subcategories in a hierarchical scheme with varying levels of
grain size. Narrative texts convey events and actions performed by characters
that unfold over time, as in the case of folktales, drama, and short stories
(Sanford & Emmott, 2012). Expository texts explain the nature of mecha-
nisms or other phenomena, as in the case of science texts and encyclopedia
articles. Subcategories of persuasive texts are sermons, editorials, and adver-
tisements. Descriptive texts describe either static entities (a visual scenario,
the attributes of an object, the personality of a person) or activities (a broad-
cast of the events at a baseball game).
There are a number of limitations of text categorization schemes. One
problem is that researchers disagree on what categories to include and on the
Text Dimensions
One approach to scaling texts is to have a single dimension of text difficulty.
This is the approach taken by metrics such as Flesch-Kincaid Grade Level
(FKGL; Klare, 1974–1975), Degrees of Reading Power (DRP; Koslin, Zeno, &
Koslin, 1987), and Lexile scores (Stenner, 2006). We and others have found
these three metrics of text complexity to be highly correlated (r > .90). These
and other similar readability formulas are correlated because they all include
features related to word frequency in the language and to sentence length.
Readability formulas are theoretically grounded on the assumption
that a reader’s understanding of sentences in a text is related to the likelihood
that the reader knows the words in the sentences and can parse the sentences
in the text.
The Flesch-Kincaid Grade Level metric is based on the length of words and
length of sentences. For example, Formula 1 shows the Flesch-Kincaid metric.
Words refers to the mean number of words per sentence and syllables refers to
the mean number of syllables per word.
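The published Flesch-Kincaid Grade Level formula (Kincaid et al., 1975) is FKGL = 0.39 × (mean words per sentence) + 11.8 × (mean syllables per word) − 15.59. A minimal sketch of it follows; the sentence splitter and vowel-group syllable counter are crude heuristics of our own, not the procedures Coh-Metrix itself uses.

```python
import re

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Crude heuristic: count vowel groups, ignoring a trailing 'e'.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower().rstrip("e"))))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (total_syllables / len(words))
            - 15.59)
```

By this formula, longer words and longer sentences both push the estimated grade upward, exactly the two factors discussed below.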
The grade level increases as the words and sentences increase in length. These
two factors of word length and sentence length are reasonable psychologi-
cally. Longer words tend to be less frequent in the English language so readers
have less world knowledge about these words. Longer sentences tend to place
a greater load on working memory and thereby increase comprehension
difficulty.
DRP and Lexile scores relate characteristics of the texts to readers’ per-
formance in a cloze task. In the cloze task, the text is presented with words left
blank during the course of reading; the reader is asked to fill in the words by
generating them or by selecting a word from a set of options. A text is at the
reader’s level of proficiency if the reader can perform the cloze task at a
Text Levels
In our view, the most promising approach to scaling texts on difficulty is to
adopt a multilevel theoretical framework for language and discourse process-
ing (Graesser & McNamara, 2011). Psychological theories of comprehension
have identified the representations, structures, strategies, and processes at
multiple levels of language and discourse (Graesser, Millis, & Zwaan, 1997;
Kintsch, 1998). For example, Graesser and McNamara (2011) consider six
levels: words, syntax, the explicit textbase, the referential situation model
(sometimes called the mental model), the discourse genre and rhetorical
structure (the type of discourse and its composition), and the pragmatic
communication level (between speaker and listener, or writer and reader).
We believe that a scale of text difficulty needs to consider these different levels.
Moreover, subscales are needed for each of the levels because a text can be
difficult according to some subscales but not for others.
The first five of these six levels are elaborated in Chapter 3 (see also
Chapter 2). Chapters 4 and 5 provide a more detailed description of the
computational components and measures associated with these levels.
Therefore, only a cursory description of these levels is provided in this
introductory chapter. The levels of words and syntax need little elaboration
here because they are largely self-explanatory. Quite clearly, the vocabulary in
a text can impose comprehension difficulties, as illustrated by the medical
warning example presented earlier. The syntactic composition of sentences
can result in very different comprehension problems than those attributed to
words. It is difficult to construct meanings from sentences that have syntactic
structures that are lengthy with many embedded subordinate clauses. We
believe that the word length and sentence length parameters of the readability
formula capture some facsimile of these word and syntax levels. However, the
other four levels move us beyond the readability formulas and into more
intriguing realms of meaning.
The textbase contains explicit ideas in the text in a form that preserves the
meaning but not the precise wording and syntax. According to van Dijk and
Kintsch (1983), the textbase contains explicit propositions in the text, as well as
links between propositions and a small set of inferences that connect these
explicit propositions. Propositions are more complex idea units than individual
words. For example, consider the first sentence in the earlier example from
Lady Chatterley’s Lover: “He spread the blankets, putting one at the side for a
coverlet.” The first sentence would have the following underlying propositions:
(1) the lover spread the blankets, (2) the lover put a blanket at the side, and (3)
the blanket was for a coverlet. In the van Dijk and Kintsch analysis, the
propositions are in a stripped down form that removes surface code features
captured by determiners (the, a), quantifiers (some, all, three), tense (past,
present, future), aspect (event completed versus in progress) and auxiliary
verbs (could, was). For example, a propositional representation of the lover
spread the blankets is spread (lover, blankets). Further, the textbase representa-
tion glosses over any distinction between the special blanket for the coverlet and
the other blankets. It also ignores the fact that the verb spread is in the past
tense, that the verb putting is a gerund, and that the timing of the spreading and
putting are not identical. These distinctions are explicit in the surface structure
of the reader’s understanding, but are not within the textbase. It is an empirical
question how much the reader tracks or remembers these subtleties.
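This stripped-down propositional notation can be made concrete. In the sketch below, a proposition is simply a predicate with an argument tuple; the predicate names and the Proposition type are our own illustrative choices, not a format used by Coh-Metrix or prescribed by van Dijk and Kintsch.

```python
from collections import namedtuple

# A textbase proposition: a predicate plus its arguments, with surface
# features (tense, determiners, aspect, auxiliary verbs) stripped away.
Proposition = namedtuple("Proposition", ["predicate", "arguments"])

# "He spread the blankets, putting one at the side for a coverlet."
textbase = [
    Proposition("spread", ("lover", "blankets")),
    Proposition("put-at", ("lover", "blanket", "side")),
    Proposition("for", ("blanket", "coverlet")),
]

# Note what this representation ignores: the past tense of "spread",
# the gerund "putting", and which particular blanket became the coverlet.
```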
One of the central questions about a reader’s textbase representation is
whether the noun entities (e.g., lover, blanket, coverlet, side) and propositions
We have said very little about the pragmatic communication level of dis-
course up to this point. This is an essential level to understand for compre-
hension to succeed. Texts are written to inform, persuade, tease, irritate,
entertain, seduce, and so on. The situational settings, speakers, audience, and
broader contexts are often absent when a text is analyzed. This is an unfortu-
nate limitation but it is ubiquitous when researchers analyze printed text. The
writer, the reader, and the occasion are stripped from the analysis when printed
text is read and analyzed. Beck, McKeown, Hamilton, and Kucan (2007) have
attempted to encourage their readers to resurrect this context in their
Questioning the Author intervention and this has been quite successful in
improving comprehension. However, this is a giant step that moves us from
the text to the sociocultural context.
conclusion
The Coh-Metrix program provides solid analyses of the first five levels
described in Graesser and McNamara (2011). In contrast, it has a relatively
anemic analysis of the pragmatic communication level. Indeed, we are pre-
pared to surrender and admit that this level is beyond the scope of the Coh-
Metrix project, but perhaps not beyond natural language processing. There
are certainly vestiges of text elements and discourse patterns that signal
components of pragmatic communication. But this research effort is at the
fringe and well beyond the scope of this book. In the meantime, we have
focused our efforts in Coh-Metrix on providing a selection of indices corre-
sponding to the first five levels of discourse: words, syntax, the textbase, the
situation model, and genre and rhetorical structure. The following chapters
in Part I of this book describe the technologies that have enabled the
measurement of these multiple levels of language, the indices provided in
Coh-Metrix Version 3.0, and studies that validate and demonstrate the utility
of Coh-Metrix.
Figure 2.1. Connection model of coherence. The figure on the left has few connections
and would lead to a less coherent representation than would the figure on the right,
which has more connections.
forget the others, whereas a reader with the representation on the right would
be more likely to remember the central idea as well as the other four ideas (or
nodes). This stems from a well-established notion that concepts or ideas with
more interconnected associations in memory are more likely to be remem-
bered. Likewise, when there are more connections in the text and when the
reader generates connections between ideas in the text and to prior knowl-
edge, then the reader’s understanding is more likely to be more coherent.
When the level of cohesion in the text is insufficient for the reader or when the
reader does not (or cannot) generate sufficient inferences to make connec-
tions between ideas, then the reader’s understanding will be less coherent.
Although cohesion is not directly tied to coherence, it is a crucial aspect of
predicting the likelihood that a given reader will be able to form a coherent
mental representation of a text.
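The contrast drawn in Figure 2.1 can be expressed as a toy graph. In the sketch below, each mental representation is a list of undirected connections between idea nodes, and a node's connection count serves as a rough proxy for how likely it is to be remembered; the node names and edge lists are invented for illustration.

```python
def connection_counts(edges):
    """Count the connections touching each idea node."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts

# Left panel of Figure 2.1: a central idea with a single weak connection.
sparse = [("central", "idea1")]

# Right panel: the same ideas, densely interconnected.
dense = [("central", "idea1"), ("central", "idea2"),
         ("central", "idea3"), ("central", "idea4"),
         ("idea1", "idea2"), ("idea3", "idea4")]
```

With more interconnected associations, the central idea in the dense representation has more retrieval routes, which is the sense in which the right-hand figure supports a more coherent, more memorable representation.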
The dog chased the cat who had been sitting on the brick fence. (2.1)
In this sentence, the verb “chase” connects the subject (dog) and the object
(cat) and conveys the relation between them. “The dog” occurring before “the
cat” conveys who is the subject and who is the object (given that the verb is
active rather than passive). Likewise, the verb “sit” connects the “cat” to the
“brick fence,” while the past tense of “had” indicates that the cat was no longer
sitting on the fence when the dog chased it, and so on. In essence, the syntax
provides cues as to how the words are related to each other at the sentence
level.
Clearly, syntax is essential for the reader to be able to understand the text.
However, an important difference between syntax and cohesion is that syntax
adheres to rules. Importantly, these rules cannot be easily violated by the
whims of a writer or speaker. For instance, none of the following sentences are
acceptable if we intend to convey the same meaning as in Example 2.1.
The dog the cat who had been sitting on the brick fence chased. (2.2)
The cat chased the dog who had been sitting on the brick fence. (2.3)
Who had been sitting on the brick fence the dog chased the cat. (2.4)
The the the been on who had brick fence dog chased cat sitting. (2.5)
The addition of the cohesive cue “because” in Example 2.3 is not a compulsory
rule of language; nonetheless, its addition facilitates the understanding of why
smoking was forbidden.
When discourse lacks cohesion, the reader must make inferences to con-
nect the dots. These inferences can be generated by accessing prior text,
everyday world knowledge, or subject matter knowledge associated with a
particular area of specialization (called domain knowledge). These inferences
can be relatively automatic and unnoticeable to the reader, or they may be
conscious and strategic; the inferences may be successful or unsuccessful and
correct or incorrect. The degree to which these inferences occur and are
successful is an important factor influencing the coherence of the reader’s
mental representation of a text. Inferencing can be a good thing, especially for
George got some beer out of the car. The beer was warm. (2.8)
George got some picnic supplies out of the car. The beer was warm. (2.9)
The sentence “The beer was warm” is read more quickly in the context of
“George got some beer out of the car” in Example 2.8, where there is overlap in
the referent, “beer,” in comparison to Example 2.9, where there is no common
referent between the two sentences. When text is read more quickly, it is
assumed that the text is easier to process for the reader.
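Referential overlap of the kind contrasted in Examples 2.8 and 2.9 is easy to approximate computationally. The sketch below checks whether two adjacent sentences share a content word; the tokenizer and the tiny stopword list are crude stand-ins of our own, far simpler than the part-of-speech-informed noun overlap that Coh-Metrix actually computes.

```python
import re

# A tiny illustrative stopword list, not a linguistically complete one.
STOPWORDS = frozenset({"the", "a", "of", "out", "some", "was", "got"})

def content_words(sentence):
    """Lowercased word tokens minus the stopword list."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS}

def has_referential_overlap(sent_a, sent_b):
    """True if the two sentences share at least one content word."""
    return bool(content_words(sent_a) & content_words(sent_b))

# Example 2.8: the referent "beer" repeats, so the sentences overlap.
# Example 2.9: "picnic supplies" and "beer" share no common referent.
```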
Indeed, there are numerous studies that have demonstrated that referential
overlap impacts reading times and recall of words and sentences (Haviland &
Clark, 1974; Kintsch & Keenan, 1973; Kintsch, Kozminsky, Streby, McKoon, &
Keenan, 1975). Some portion of the effect of referential cohesion may be
attributable to priming (Dell, McKoon, & Ratcliff, 1983). Lexical priming is
the term used to indicate that a concept may not be consciously active in working
memory but is activated to a certain extent, which facilitates its processing.
Priming can emerge from direct overlap in words or from semantically
related words, and is related to the notion of connections between ideas and
activation between those connections.
Although lexical priming may facilitate the reading of other related words,
there is no guarantee the primed concepts make it into a reader’s mental
representation of a text. This point is emphasized in the Construction-
Integration model of text comprehension (Kintsch, 1988, 1998). Specifically,
many words or concepts that are encoded can be lost after the network is
integrated because they have too few connections to other concepts in the
network (McNamara & Kintsch, 1996; McNamara & McDaniel, 2004). When
there are more connections between ideas in the reader’s mental representa-
tion, the ideas are more likely to be remembered. This quality of the mental
representation is often referred to as coherence.
to capitalize on the text manipulations and the recall test was more sensitive
to those differences.
Beck, McKeown, Sinatra, and Loxterman (1991) extended these findings to
children’s comprehension of social studies texts. They asked children in
grades 4 and 5 to read either the revised or original versions of four passages
from a fifth grade social studies text book about the American Revolution.
The revisions were designed to minimize the need for children to rely on
background knowledge to understand the text by reducing the gaps in the text
requiring knowledge-based inferences. To this end, the researchers made
explicit the causal connections between the ideas, concepts, and events and
added clarifications, elaborations, and explanations to important information
in the texts. In essence, they increased the cohesion in the text in various ways.
After reading the passages, the children were asked to recall the passage and
answer open-ended comprehension questions. The results indicated that the
revisions improved the students’ comprehension both in terms of their recall
as well as their performance on open-ended questions. This study extended
Beck and colleagues' earlier findings with grade 3 students to grades 4 and 5,
and demonstrated the results across a range of dependent variables, including
recall, multiple-choice questions, and open-ended comprehension questions.
Importantly, the studies conducted by Beck et al. (1984, 1991) did not
carefully control the types of manipulations made to the texts. The authors
increased the ease of the text across many theoretical dimensions, including
adding elaborations to unfamiliar concepts and improving the general quality
of the text. As such, we cannot say that the studies’ positive learning outcomes
can be attributed to cohesion alone.
Britton and Gulgoz (1991) approached the issue of text manipulation more
systematically by implementing a model of text processing (Kintsch & van
Dijk, 1978; Miller & Kintsch, 1980; van Dijk & Kintsch, 1983). Their method-
ology of revision differed from that of Beck et al. (1984, 1991), because Britton
and Gulgoz very carefully manipulated some features of the text while others
remained constant. Britton and Gulgoz manipulated an Original passage
about the war in Vietnam, Air War in the North, from three different
theoretical perspectives. They created Heuristic, Readability Formula, and
Principled versions of the passage. In the Heuristic revision, the authors used
their own intuitive notions of better writing practice to improve the passage.
Some information was reordered or clarified, unimportant ideas were
omitted, and important ideas were elaborated. In the Readability Formula
revision, modifications were made to shorten the sentences and use more
familiar words such that the readability (i.e., according to five indices,
including Flesch-Kincaid) was equal to that of the Heuristic revision
(i.e., approximately grades 11–12), and two grades lower than the Original and
Principled revision (i.e., approximately grades 13–14).
Most relevant here is the Principled version. In the Principled revision,
Britton and Gulgoz (1991) focused primarily on increasing cohesive cues from
the perspective of Kintsch and van Dijk’s theory of text processing (e.g.,
Kintsch & van Dijk, 1978; Miller & Kintsch, 1980; van Dijk & Kintsch, 1983).
They first identified potential coherence breaks based on van Dijk and
Kintsch’s model of comprehension. A coherence break was a location in the
text in which there was no explicit cue on how the new information was
linked to prior text. In Coh-Metrix, these breaks would be identified in terms
of low referential cohesion and the lack of explicit connectives. Britton and
Gulgoz found 40 coherence breaks in the text and applied three principles to
repair these breaks. Principle 1 was to add referential (i.e., argument) overlap
such that a sentence repeated an idea stated in the previous sentence.
Principle 2 was to rearrange part of each sentence so that readers first received
old information (i.e., an idea presented previously in the text) and then the
new information. Principle 3 was to make explicit any implicit references that
did not have a clear referent.
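The break-finding step can be approximated in code. In the minimal sketch below, a sentence counts as a candidate coherence break when it shares no content words with the sentence before it; the tiny stopword list and whitespace tokenization are simplifying assumptions, whereas Coh-Metrix restricts the comparison to nouns and pronouns identified by a part-of-speech tagger.

```python
# Sketch: flag potential coherence breaks between adjacent sentences.
# A "break" is approximated here as zero content-word overlap with the
# previous sentence.

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "to", "of",
             "and", "in", "it", "that", "this"}

def content_words(sentence):
    """Lowercase tokens minus a small stopword list (a crude proxy)."""
    return {w.strip(".,;:!?").lower() for w in sentence.split()} - STOPWORDS

def coherence_breaks(sentences):
    """Return indices of sentences that share no content words with the
    sentence before them (candidate coherence breaks)."""
    breaks = []
    for i in range(1, len(sentences)):
        if not (content_words(sentences[i]) & content_words(sentences[i - 1])):
            breaks.append(i)
    return breaks

text = [
    "We bought a keg of beer.",
    "The beer was warm.",
    "The picnic was ruined.",
]
print(coherence_breaks(text))  # [2]: the third sentence shares no referent
```

On the beer example from earlier in the chapter, only the third sentence is flagged, because "picnic" has no explicit link to the prior text.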
Consider these two examples, taken from the Original and Principled versions
of the text in Britton and Gulgoz (1991):
Most members of the Johnson administration believed bombing attacks would
accomplish several things. They would demonstrate clearly and forcefully the
United States’ resolve to halt communist aggression and to support a free
Vietnam. (2.10)
changes increased referential overlap with the paragraph that preceded it, and
also provided the reader with potentially missing background knowledge.
We can also consider the differences between the Principled revision and
Original version in terms of Coh-Metrix values. For example, as described in
Chapter 4, Coh-Metrix provides an argument overlap score (CRFAO1),
which indicates the average overlap between arguments (i.e., nouns, pro-
nouns) in a text. The argument overlap score is .68 for the Principled revision
and .38 for the Original version. We can also calculate overall cohesion scores
using Coh-Metrix Text Easability Scores as described in Chapter 5.
Accordingly, the Referential Cohesion Easability Z-score (ZREF) is 1.79 for
the Principled revision and –0.96 for the Original version. These values
provide some confirmation that the Principled revision was indeed higher
in cohesion than was the original version (see McNamara et al., 2010).
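A rough sketch of how such a score can be computed is given below. The argument sets per sentence are supplied by hand (Coh-Metrix derives them with a part-of-speech tagger), and the norm mean and standard deviation used for the z-score are invented placeholders, not the actual Coh-Metrix norms.

```python
def argument_overlap_score(sentence_args):
    """Mean binary argument overlap across adjacent sentence pairs
    (a CRFAO1-style score). sentence_args is a list of sets, one per
    sentence, containing that sentence's nouns and pronouns."""
    pairs = list(zip(sentence_args, sentence_args[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)

def z_score(value, norm_mean, norm_sd):
    """Standardize a raw score against corpus norms."""
    return (value - norm_mean) / norm_sd

# Hand-coded argument sets for three sentences; only the first
# adjacent pair shares an argument ("attacks").
args = [{"members", "administration", "attacks"},
        {"attacks", "resolve", "aggression"},
        {"vietnam", "war"}]
score = argument_overlap_score(args)
print(round(score, 2))                       # 0.5
print(round(z_score(score, 0.45, 0.25), 2))  # 0.2 (placeholder norms)
```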
To assess the effects of their text revisions, Britton and Gulgoz (1991) asked
college students to read either the original or a revised version of the text. The
students’ comprehension was measured with free recall, multiple-choice
questions, and a keyword association task. The authors found a significant
disadvantage for the version that was modified based on notions of
Readability. Those who read the Readability Formula version showed lower
performance on both the recall and the multiple-choice comprehension
assessments. By contrast, both the Principled and the Heuristic revisions
improved comprehension in comparison to the Original version. Further,
the students’ efficiency measure for recall (the number of propositions
recalled per minute of reading time) indicated that the revision made the
comprehension process more efficient. Although the Principled and Heuristic
revisions led to similar improvements, one advantage of the Principled
revision was that the modifications were guided by well-specified rules,
whereas the Heuristic revision was based solely on intuitions of improving
writing by an expert in discourse processing.
In sum, Britton and Gulgoz (1991) found that the Principled revision
improved comprehension according to their three dependent measures (i.e.,
free recall, multiple-choice questions, and a keyword association task) and,
by their efficiency measure, made the comprehension process more efficient.
There have been numerous studies on
the effects of cohesion using longer texts such as the one investigated by
Britton and Gulgoz (1991). A review of 19 studies and an analysis of the texts
using Coh-Metrix are available in McNamara, Louwerse, McCarthy, and
Graesser (2010). The experimental studies of text cohesion have implemented
a variety of techniques to enhance the coherence of text, including increasing
Heart Disease
The heart is the hardest-working organ in the body. We rely on it to supply blood
regularly to the body every moment of every day. Any disorder that stops the
heart from supplying blood to the body is a threat to life. Heart disease is such a
disorder. It is very common. More people are killed every year in the U.S. by heart
disease than by any other disease.
There are many kinds of heart disease, some of which are present at birth and
some of which are acquired later. (2.12)
In Example 2.12, local referential cohesion was modified in the first paragraph.
For instance, the third sentence was modified from the original version from
“Any disorder that stops the blood supply is a threat to life” to specify
explicitly that the blood supply is being supplied to the body, and conse-
quently increase the overlap between the sentences in the paragraph. The
second paragraph, “There are many kinds of heart disease . . .,” provides a
topic sentence that introduces the upcoming sections, “congenital heart
disease” and “acquired heart disease,” which were two of the three added
headers. The additions of "but," "for example," and "resulting in" are examples
of added connectives that specify the relationships between ideas in the text.
figure 2.2. Coh-Metrix argument overlap and Flesch-Kincaid Grade Level across the four versions of the text (high/low local cohesion × high/low global cohesion).
The insertion of “hearts have flaps, called valves, that control the blood flow
between its chambers” is an example where an unfamiliar term was defined
for the reader.
These revisions resulted in four versions that manipulated both local and
global cohesion in a factorial design. The primary contrast was between the
two texts that were maximally high or low in cohesion. Interestingly, the
cohesion of the text was negatively related to Flesch-Kincaid readability. As
shown in Figure 2.2, the Coh-Metrix measure of referential cohesion (i.e.,
argument overlap) decreased as cohesion decreased across the four versions
of the text. By contrast, readability estimates such as the Flesch-Kincaid
Grade Level made the opposite estimates of text ease. As cohesion decreased,
the text was estimated to be easier by Flesch-Kincaid Grade Level estimates.
Readability measures often predict a decrease in ease when cohesion is
increased because adding cohesion often results in increasing the length of
the sentences and adding more unfamiliar or longer words.
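The trade-off can be seen in the Flesch-Kincaid Grade Level formula itself, which rewards short sentences and short words regardless of cohesion. The sketch below implements the standard formula; the vowel-run syllable heuristic is a simplifying assumption (published implementations count syllables more carefully).

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count runs of vowels (a rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

# Same word and syllable totals spread over fewer sentences: the grade
# level rises even though nothing about cohesion has changed.
print(round(fk_grade_level(100, 10, 140), 2))  # 4.83
print(round(fk_grade_level(100, 4, 140), 2))   # 10.68
```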
McNamara et al. (1996) found that the benefits of cohesion were greater for
those readers who knew less about the heart before reading the text. They
found that low-knowledge readers benefited from the added cohesion accord-
ing to all of the comprehension and text recall measures. The size of the
difference in comprehension scores can be measured using Cohen’s d (see
Chapter 11 for a discussion of effect sizes and their interpretation).
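For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below computes it for invented comprehension scores; the data are illustrative only.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical proportion-correct scores for high- vs. low-cohesion readers.
high = [0.9, 0.6, 0.8, 0.5, 0.7]
low = [0.7, 0.5, 0.6, 0.6, 0.5]
print(round(cohens_d(high, low), 2))  # 0.95, a large effect
```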
In about one in every 200 cases something goes wrong. Sometimes a valve
develops the wrong shape. It may be too tight, or fail to close properly. (2.13)
figure 2.3. Model of reader inference using prior text and prior knowledge. Readers
make inferences when reading using prior text and prior knowledge.
The reader needs to make an inference that “something goes wrong” refers to
“the baby” and to “the heart,” and thus infer that the baby will be born with a
bad heart rather than a perfect heart. The reader further needs to have some
knowledge of what a valve is within the heart, and that it is not a plastic device.
Hence, the reader must make inferences accessing prior text as well as prior
knowledge. Neither of those inferences is likely to occur in the absence of
some other source of scaffolding (e.g., McNamara, 2004; McNamara &
Dempsey, 2011; see also Chapter 5). Hence, low-knowledge readers who are
faced with texts that contain many such gaps between ideas and sentences
understand very little of the text.
The story is quite different when students have sufficient knowledge to
generate the inferences called for by the low-cohesion text. Across a number
of studies, readers with more background knowledge have been found either to
not benefit from the cohesion or to benefit from the lack of cohesion in the text.
McNamara et al. (1996) found that the children with more knowledge about
the heart benefited from the low-cohesion version of the text according to
comprehension measures that tapped into deeper levels of comprehension.
According to the bridging-inference questions, problem-solving questions,
and the sorting task, the children with more knowledge showed better
comprehension if they had read the low-cohesion rather than the high-
cohesion versions of the text. According to their recall of the text and the
performance on text-based (shallow, detail) questions, they showed a slight
advantage from the highest-cohesion text, but on the questions and tasks that
relied on deeper levels of understanding they showed large advantages of
having read the low-cohesion text. The Cohen’s d effect sizes for these low-
cohesion advantages ranged from 0.40 to 1.00 (as reported in McNamara
et al., 2010).
Several subsequent studies by McNamara and colleagues sought to isolate
the locus of this reverse cohesion effect. McNamara (2001) conducted an
experiment to examine the inference generation explanation of the reverse
cohesion effect. The inference generation explanation is based on the Kintsch
(1998) Construction-Integration (CI) theory of text comprehension.
Accordingly, when readers generate inferences that link the text with prior
knowledge, the reader’s situation model level of understanding is enhanced.
The CI model distinguishes between the textbase level of comprehension and
the situation model level of comprehension. Important to the concept of text/
reader interactions is the reader’s level of comprehension. The principal levels
are the surface structure, the propositional textbase, and the situation model
(Kintsch, 1998). These levels of comprehension are also discussed in
Chapters 1 and 3. The surface structure refers to the reader’s memory for
the words and syntax of a text. For example, comprehension and memory
for the surface structure for the sentence “The streets were wet because it
was raining” includes only the words and syntax explicitly communicated.
In contrast, a textbase level representation of the sentence may be “The
roads were wet from rain.” The textbase level representation is memory
for the meaning behind the words and syntax, or the meaning at the
propositional level. One version of a propositional representation of
“The streets were wet because it was raining” is [Prop 1:wet(streets); Prop
2: cause(rain)]. The situation model level understanding is generally char-
acterized as resulting from knowledge-based inferences that go beyond the
text. In the case of the previous example, a reader might imagine that the
streets were slick and the sky was grey. The reader brings to the situation
knowledge about rain and streets and the various events that might occur
on wet streets, such as driving, running, or ducking under an awning.
When readers make more inferences that link to prior knowledge, then
the CI model predicts that the reader will construct a deeper, more stable
understanding of the text.
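The propositional notation above can be mirrored in a small data structure. The Prop class and overlap helper below are illustrative conveniences, not Coh-Metrix's internal representation.

```python
# A minimal propositional encoding of "The streets were wet because it
# was raining," following the Prop(predicate, arguments) notation in
# the text.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Prop:
    predicate: str
    arguments: Tuple[str, ...]

textbase = [
    Prop("wet", ("streets",)),
    Prop("cause", ("rain", "wet(streets)")),  # Prop 2 embeds Prop 1
]

def surface_overlap(prop, sentence_words):
    """Count how many proposition elements appear verbatim in the surface
    form; textbase elements need not match the surface wording exactly."""
    elems = {prop.predicate, *prop.arguments}
    return len(elems & set(sentence_words))

words = "the streets were wet because it was raining".split()
print(surface_overlap(textbase[0], words))  # 2: "wet" and "streets"
```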
According to the CI model, the high-knowledge readers in McNamara
et al. (1996) were able to gain from low-cohesion text because it forced them to
generate inferences, and that inferencing resulted in a better, or deeper,
understanding of the text. McNamara (2001) tested that notion by having
participants read both the high-cohesion and low-cohesion versions of text
about cell mitosis, or one of the text versions twice. The participants were in
one of four conditions. They either read the same version of the cell mitosis
text twice (high-high; low-low) or they read one or the other version first
(high-low; low-high). Notably, the readers read the same texts in the low-high
and the high-low conditions. That is, they read both the low-cohesion version
and the high-cohesion version of the texts but simply in different orders of
presentation. The reverse cohesion effect was predicted to emerge only when
high-knowledge readers read the low-cohesion version of a text during the
first exposure to the text. If a reader were exposed to a high-cohesion version
of a text followed by the low-cohesion version, the reverse cohesion effect
would not occur. During the first reading, the high-cohesion version would
not induce inferences. Then, when reading the low-cohesion version, a text
representation would be readily available in memory, and the reader would be
less likely to generate the gap-filling inferences. In sum, if the reverse cohesion
effect emerges from inducing the reader to generate inferences to fill in the
conceptual gaps in the low-cohesion text, then a reverse cohesion effect would
be observed for both the low-low and low-high conditions but not for the
high-high or high-low conditions.
that tap into that level, that the benefits of cohesion will emerge. If the textbase
level of understanding is relatively coherent without cohesion (as it was in
McNamara et al., 1996), then the benefits of inference generation are more
likely to emerge at deeper levels of understanding. These differences may well
depend on the overall difficulty of the text, as we discuss in Chapter 5.
High-knowledge readers who were skilled readers, and thus more naturally
generated inferences, did not need the low-cohesion text to induce them to
generate inferences, and thus there was no reverse cohesion effect.
The findings reported by O’Reilly and McNamara (2007) were replicated
by Ozuru, Dempsey, and McNamara (2009). Ozuru and colleagues used Coh-
Metrix cohesion measures to verify and control the cohesion manipulations
of two science texts, one on the topic of internal distributions of heat in
animals and the other on a plant’s response to an external stimulus. Ozuru
and his colleagues manipulated the cohesion of the texts by (a) replacing
ambiguous pronouns with nouns, (b) adding descriptive elaborations to link
unfamiliar concepts with familiar concepts, (c) adding connectives to specify
the relationships between sentences or ideas, (d) replacing or inserting words
to increase the conceptual overlap between adjacent sentences, (e) adding
topic headers, (f) adding thematic sentences that serve to link each paragraph
to the rest of the text and overall topic, and (g) changing sentence structures
to incorporate the additions and modifications. Coh-Metrix was used to
verify that these modifications resulted in higher-cohesion texts according
to objective measures, including local and global argument overlap and LSA
similarity. The results of the study confirmed that the high-cohesion text
generally improved comprehension at the textbase level. They also replicated
the results reported by O’Reilly and McNamara (2007) by showing that the
reverse cohesion effect (i.e., benefit of low cohesion for high-knowledge
readers) occurred exclusively for the high-knowledge, less-skilled readers.
This is presumably because the less-skilled readers needed the low cohesion in
the text to induce inference processes.
Ozuru, Briner, Best, and McNamara (2010) further examined the effects of
deep reading processes in the context of high- and low-cohesion text by
having participants self-explain while reading the text. Self-explaining in
this context involved explaining the meaning of target sentences in the texts
while reading. This process improves comprehension and learning by helping
readers engage in active inference processes. Because there are more gaps in
the low-cohesion text, requiring inference processes to bridge the gaps, Ozuru
and his colleagues hypothesized that the self-explanation process would result
in better comprehension for the low-cohesion than for the high-cohesion
text. That is, self-explanation would be most effective where it was needed: for
the low cohesion text. In turn, the low-cohesion text would enhance the
benefits of the self-explanation, because the gaps in the texts would elicit
more inference-based explanations.
Ozuru et al. (2010) also used Coh-Metrix to guide the cohesion manipu-
lations of their text, titled “Why Is There Sex,” excerpted from the Leahey and
The following example from the beginning of one of the narrative texts, called
Orlando, illustrates cohesion manipulations that were also implemented to
create a context and to facilitate interpretations of the situations described in
the text. The order in which information was presented was also changed for
the Orlando text such that the high-cohesion version provided greater tem-
poral cohesion. That is, information was presented in the order in which
events occurred. The low-cohesion version, on the other hand, presented
information in a nontemporal order, and thus the reader had to infer the
actual order of events.
Children in grade 4 read four texts, including one high-cohesion and one
low-cohesion text from each genre. Their comprehension of each text was
assessed using three measures: 12 multiple-choice questions, free recall, and
cued recall. The most important prediction made in this study was that at
the age when young children are expected to begin learning from text,
successful comprehension would largely depend on the reader’s knowledge
about the world and about specific domains. The results confirmed that
comprehension was enhanced by increased knowledge: High-knowledge
readers showed better comprehension than did low-knowledge readers,
and narratives were comprehended better than science texts. Interactions
between readers’ knowledge levels and text characteristics indicated that the
children showed larger effects of knowledge for science than for narrative
texts.
McNamara et al. (2011) found that the high-cohesion text improved
comprehension of the narrative texts as measured by the multiple-choice
questions – a measure that tends to tap textbase level understanding.
Importantly, they also found a reverse cohesion effect for the narrative
texts. That is, children with more knowledge better understood the low-
cohesion narrative texts than the high-cohesion narrative texts. Thus, when
the students possessed enough knowledge (i.e., they were high-knowledge
readers and the texts were narratives), they showed the same patterns that
have been observed for adults. The low-cohesion version, which required
more inferences, was understood better than the high-cohesion version was.
Decoding skill benefited comprehension for these young readers, but
effects of text genre and cohesion depended less on decoding skill than on
prior knowledge. Overall, the study indicates that the fourth grade slump is
at least partially attributable to the emergence of complex dependencies
between the nature of the text and the reader’s prior knowledge. The results
also suggested that simply adding cohesion cues, and not explanatory
information, is not likely to be sufficient for young readers as an approach
to improving comprehension of challenging texts. That is, there were some
benefits of the added cohesion, but they were not as substantial as hoped.
Clearly the young readers needed more cohesion and background
information added to the text in order to improve their comprehension
substantially.
conclusion
In conclusion, across a number of studies, it has been found that low-
knowledge readers gain from higher-cohesion text, and any source of
This chapter describes the scientific and technological advances that were the
precursors to the development of Coh-Metrix. The Coh-Metrix team has
developed numerous computational algorithms and procedures for measur-
ing ease (versus difficulty) at the various levels of language and discourse. We
are satisfied with our progress and achievements, but we cannot emphasize
too much that Coh-Metrix was hardly built in a vacuum. Coh-Metrix can be
viewed as a sandbox of automated language and discourse facilities that were
developed not only by our research team but also by others in computational
linguistics, corpus linguistics, discourse processes, cognitive science, psychol-
ogy, and other affiliated fields. We were able to build Coh-Metrix because we
had the advantage of standing on the shoulders of giants.
The contributions of our predecessors come in many varieties. Some
noteworthy examples of these contributions are highlighted below.
1. One type of contribution is lexicons or dictionaries of words that list
qualitative features or quantitative values for each word. For example,
WordNet (Fellbaum, 1998; Miller, Beckwith, Fellbaum, Gross, &
Miller, 1990) stores semantic and syntactic features of nouns, verbs,
adjectives, and other content words in the English language. The MRC
Psycholinguistic Database (Coltheart, 1981) has human ratings of
thousands of words on familiarity, imagery, concreteness, and mean-
ingfulness. The CELEX Lexical Database (Baayen, Piepenbrock, &
Gulikers, 1995) has estimates of how frequently English words are
used in a very large corpus of documents.
2. A second type of contribution is from applications. An application is a
fully functioning program that takes text as input and computes some
language or discourse code as output. We use the output when we
create a Coh-Metrix measure. A good example of this is when we used
the lexicon
There is a long history of analyzing words in the language, discourse, and
social sciences. Psychologists are prone to have humans rate or categorize
It should be noted that these frequency norms will change over time because
the reading materials vary over history and sociocultural contexts. Therefore, it
would be ideal to have an automated facility that periodically samples text
corpora and revises the frequency norms. This approach is being pursued by
many companies in their analyses of Web sites, Wikipedia, and the vast
repository of documents in the cloud. The Word Maturity index of Kireyev
and Landauer (2011) tracks the words exposed to readers of different ages. One
could also imagine word frequency norms that are tailored to particular
populations in a culture – at a grain size akin to the marketers of Amazon.com.
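Such a norm-building facility can be sketched in a few lines; the two-sentence corpus below is a toy stand-in for the large, periodically resampled corpora the text envisions.

```python
from collections import Counter

def frequency_norms(corpus_texts):
    """Per-million word frequencies from a (tiny, illustrative) corpus.
    A production facility would periodically resample a large corpus."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {w: c * 1_000_000 / total for w, c in counts.items()}

corpus = ["the heart pumps blood", "the heart is an organ"]
norms = frequency_norms(corpus)
print(norms["the"] > norms["organ"])  # True: "the" occurs more often
```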
There is ample evidence that text difficulty decreases as a function of the word
frequency of the words in the text. This is indeed reflected in readability
formulas that point to the length of words. We know that word frequency
robustly decreases as a function of word length: Frequent words are shorter
according to Zipf’s law (Zipf, 1949). We also know that the time it takes to read a
text decreases substantially as a function of the reading ease metrics, word
frequency, and the shortness of words. Available evidence supports the claim
that reading time decreases as a function of the logarithm of word frequency
(Haberlandt & Graesser, 1985; Just & Carpenter, 1987). Thus, the difference
between words occurring 10 versus 100 times per million has a much more
robust impact on reading times than the difference between words occurring 1,010 versus 1,100
times per million. Word frequency is extremely important because it is aligned
with world knowledge. Readers know much less about rare words, and this
has a tremendous impact on comprehension (McNamara, Kintsch, Songer, &
Kintsch, 1996; Perfetti, 2007; Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg,
2001; Snow, 2002; Stanovich, 1986).
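The arithmetic behind this contrast is easy to verify. On a base-10 logarithmic scale (a common choice in this literature), the 10-versus-100 difference is roughly 27 times larger than the 1,010-versus-1,100 difference:

```python
from math import log10

# Log-frequency differences for the two contrasts discussed in the text.
low_end = log10(100) - log10(10)       # 10 vs. 100 occurrences per million
high_end = log10(1100) - log10(1010)   # 1,010 vs. 1,100 occurrences per million

print(round(low_end, 3))           # 1.0
print(round(high_end, 3))          # 0.037
print(round(low_end / high_end))   # 27: the low-end contrast dominates
```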
WordNet. WordNet® is a computational, lexical database annotated by
experts on various linguistic and psychological features, containing more
than 170,000 English nouns, verbs, adjectives, and adverbs. The design of
WordNet is inspired by psycholinguistic theories of human lexical representa-
tions (Fellbaum, 1998; Miller et al., 1990). The words are organized in lexical
networks based on connections between related lexical concepts. English
nouns, verbs, adjectives, and adverbs are grouped into sets of underlying
lexical concepts (synsets). Some pairs of words are functionally synonymous
(e.g., lady and woman) because they have the same or a very similar
meaning. There are relations other than synonyms. Polysemy refers to the
number of senses of a word. A word with more senses runs the risk of being
ambiguous and of slowing down processing for less-skilled and low-knowledge
readers (Gernsbacher, 1990; Just & Carpenter, 1987; McNamara & McDaniel,
2004). However, there is an advantage of polysemy because more frequent
cohesion by explicitly linking ideas at the clausal and sentential level (Britton &
Gulgoz, 1991; Halliday & Hasan, 1976; Louwerse, 2001; McNamara & Kintsch,
1996; Sanders & Noordman, 2000). These include connectives that correspond to
additive cohesion (e.g., “also,” “moreover,” “however,” “but”), temporal cohesion
(e.g., “after,” “before,” “until”), and causal/intentional cohesion (e.g., “because,”
“so,” “in order to”). Logical operators (e.g., variants of “or,” “and,” “not,” and “if–
then”) are also cohesive links that influence the analytical complexity of a text.
Coh-Metrix has lists of connectives and discourse markers in various categories
that are accessed while interpreting text. The relative frequency of connectives
and discourse markers is expected to correlate positively with discourse cohesion
and text ease. The one caveat in this prediction is that connectives tend to
lengthen sentences so there is a potential burden on cognitive resources and
consequent memory for text (Millis, Graesser, & Haberlandt, 1993).
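A minimal incidence counter over the example connectives listed above might look as follows; the category lists are just the illustrative words from this paragraph (Coh-Metrix's lists are far larger), and multiword connectives such as "in order to" would require phrase matching that this sketch omits.

```python
# Sketch: incidence of connectives per 1,000 words by cohesion category.
CONNECTIVES = {
    "additive": {"also", "moreover", "however", "but"},
    "temporal": {"after", "before", "until"},
    "causal": {"because", "so"},
}

def connective_incidence(text):
    """Return connectives-per-1,000-words for each category."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    n = len(words)
    return {cat: sum(w in vocab for w in words) * 1000 / n
            for cat, vocab in CONNECTIVES.items()}

sample = "The beer was warm because the cooler broke, but we drank it."
print(connective_incidence(sample))
```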
Pronouns also have repercussions on cohesion and coherence. If the reader
cannot bind a pronoun to a referent, the reader runs the risk of not optimally
connecting ideas in the text. Therefore, the relative frequency of pronouns in
a text should be correlated positively with text difficulty to the extent that the
referents of pronouns are difficult to resolve. However, one also needs to be
tentative in making this prediction because there are other factors to consider.
Pronouns are frequent and have few letters, which should make them easy to
process at the lower, basic levels of reading. Pronouns are diagnostic of
narrative texts that are known to be easier to process than informational
texts. It is an open question whether the scale tips toward pronouns having
ungrounded referents (increasing difficulty) or toward pronouns being prevalent
in easy narrative text (decreasing difficulty). Empirical tests are needed to resolve such trade-offs.
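The two opposing indicators can be sketched as below; the pronoun list is partial, and the noun count is supplied by hand because a real implementation would rely on a part-of-speech tagger.

```python
# Sketch: pronoun incidence per 1,000 words and the pronoun-to-noun
# ratio, two simple indicators of the trade-off described in the text.
PRONOUNS = {"it", "they", "he", "she", "we", "you", "this", "that"}

def pronoun_stats(words, noun_count):
    """Compute pronoun incidence and pronoun-to-noun ratio for a token list."""
    pronoun_count = sum(w.lower() in PRONOUNS for w in words)
    return {
        "pronoun_incidence": pronoun_count * 1000 / len(words),
        "pronoun_noun_ratio": pronoun_count / noun_count,
    }

words = "It may be too tight or fail to close properly".split()
print(pronoun_stats(words, noun_count=1))  # incidence 100.0; ratio 1.0
```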
In summary, there is a wealth of computer technologies and psychological
theories that analyze words. The word level of the multilevel theoretical
framework is well fortified in computational power. As we go to the deeper
levels of meaning, the available repertoire of computer technologies becomes
sparse. However, the lexicons of words are quite plentiful.
syntax
In models of text and discourse comprehension, the surface structure is composed
of the words and the sentences (e.g., van Dijk & Kintsch, 1983). One
important aspect of the sentences in a text regards syntax. Both theoretical
and computational linguists have devoted considerable effort to analyzing the
syntax of sentences (Charniak, 2000; Chomsky, 1965; Winograd, 1983). The
words in a sentence are decomposed into basic meaning units called morphemes
(e.g., swimming → swim + -ing). The morphemes are grouped into phrases, such
as noun phrase (NP), verb phrase (VP), prepositional phrase (PP), and embedded
sentence constituents. The phrases are organized into a tree structure with
nodes and branches. The root of the tree is at the highest level and is the main
sentence node. The root sentence constituent has descending branches that point
to its component phrases (e.g., NP, VP, PP), which are also nodes at an
intermediate structural level. There may be many structural levels of the inter-
mediate nodes. Eventually the tree structure breaks down the information to the
point of reaching the terminal nodes, which are specific words or morphemes.
Figure 3.1 shows an example syntactic tree structure for the sentence “A dog is
swimming in my pool.” There is the Sentence root node and a set of intermedi-
ate phrase nodes (NP, VP, PP). There is a set of part-of-speech (POS) tags, as we
defined earlier. In this sentence the POS tags are determiner, noun, verb,
auxiliary verb, gerund (via the –ing, which is incorrectly assigned according to
some linguists), preposition, and possessive pronoun. The tense and aspect are
specified also in Figure 3.1: present tense and in-progress aspect.
[Figure 3.1: parse tree for “A dog is swimming in my pool,” descending from the S1 root through NP, VP, and PP nodes to the terminal words.]
Note: AUX = auxiliary verb, DT = determiner, NN = noun (singular or mass), NP = noun phrase, PP = prepositional phrase, PRP$ = possessive pronoun, S1 = sentence, S = simple declarative clause, VBG = verb (gerund or present participle), VP = verb phrase
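The tree in Figure 3.1 can also be written in bracketed form. The sketch below encodes it as nested (label, children) tuples and walks the tree to recover the terminal words; this encoding is purely illustrative, not the output format of any particular parser.

```python
# Sketch: the Figure 3.1 parse for "A dog is swimming in my pool"
# written as nested (label, children) tuples -- an illustrative
# bracketed encoding, not the output of any particular parser.
tree = ("S1",
        ("S",
         ("NP", ("DT", "A"), ("NN", "dog")),
         ("VP", ("AUX", "is"),
                ("VP", ("VBG", "swimming"),
                       ("PP", ("IN", "in"),
                              ("NP", ("PRP$", "my"), ("NN", "pool")))))))

def terminals(node):
    """Collect the terminal words, left to right."""
    if isinstance(node, str):
        return [node]
    label, *children = node
    out = []
    for child in children:
        out.extend(terminals(child))
    return out

print(" ".join(terminals(tree)))
```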
textbase
The textbase captures the meaning of explicit information in the text, as we
described in Chapters 1 and 2. Van Dijk and Kintsch (1983) distinguished
between the explicit textbase level and a deeper level called the situation model
level that contains more inferences and more global conceptualizations. The
theoretical boundary between the textbase and the situation model is not
always clear-cut, but it does provide a useful guide for separating the semantic
information that is closely tied to the explicit text and the inferences derived
from the text together with world knowledge, genre, and the pragmatic
context.
Propositions. According to van Dijk and Kintsch, the basic units of mean-
ing in the textbase are called propositions. Each proposition contains a
predicate (e.g., main verb, adjective, connective) and one or more arguments
(e.g., nouns, pronouns, embedded propositions) that have a thematic role,
such as agent, patient, object, time, or location. Below are an example
sentence and its propositional meaning representation.
When the committee met on Monday, they discovered the society was
bankrupt.
PROP 1: meet (AGENT=committee, TIME = Monday)
PROP 2: discover (PATIENT=committee, PROP 3)
PROP 3: bankrupt (OBJECT: society)
PROP 4: when (EVENT=PROP 1, EVENT=PROP 2)
The arguments are placed within the parentheses and have role labels,
whereas the predicates are outside of the parentheses. The propositional
representation of van Dijk and Kintsch does not incorporate some of the
more precise and subtle indexes of meaning, such as tense, aspect, quantifiers,
and voice. This decision was undoubtedly a simplification assumption rather
than a core theoretical claim. In principle, an expanded propositional
representation could be adopted that incorporates more precision and details
about meaning.
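The propositional representation above maps naturally onto a small data structure. The sketch below encodes PROP 1 through PROP 4 as predicates with role-labeled arguments; the class layout, and the THEME role standing in for the unlabeled embedded proposition in PROP 2, are our own illustrative choices rather than part of the van Dijk and Kintsch formalism.

```python
from dataclasses import dataclass, field

# Sketch: the propositions from the example sentence, encoded as a
# predicate plus role-labeled arguments. The dataclass layout is
# illustrative, not a Coh-Metrix structure; THEME is a hypothetical
# role name for the embedded proposition in PROP 2.
@dataclass
class Prop:
    predicate: str
    args: dict = field(default_factory=dict)  # role -> filler (str or Prop)

p1 = Prop("meet", {"AGENT": "committee", "TIME": "Monday"})
p3 = Prop("bankrupt", {"OBJECT": "society"})
p2 = Prop("discover", {"PATIENT": "committee", "THEME": p3})
p4 = Prop("when", {"EVENT1": p1, "EVENT2": p2})

print(p4.predicate, sorted(p4.args))
```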
Computational linguistics has not been able to develop computer programs
that can automatically translate sentences into a propositional representation
(or a logical form) with a high degree of reliability. Nevertheless, there have
been large-scale attempts to achieve these goals and progress has clearly been
made (Rus, 2004). For example, the assignment of noun-phrases to thematic
roles (e.g., agent, recipient, object, location) is approximately 80% correct in
the available computer systems (DARPA, 1995). One promising project is the
development of a corpus of annotated propositional representations in
PropBank (Palmer, Kingsbury, & Gildea, 2005). This effort will allow
researchers to systematically develop, test, and refine their algorithms for
automatic proposition extraction.
Cohesion. The propositions, clauses, and noun-phrase arguments are con-
nected by principles of cohesion. Referential cohesion occurs when a noun,
pronoun, or noun-phrase that captures an argument refers to another
constituent in the text. For example, if the preceding example sentence (“When
the committee met on Monday, they discovered the society was bankrupt.”)
were followed by “The meeting lasted several hours,” the noun-phrase argument
“the meeting” refers to PROP-1. Cohesion between propositions or clauses is
also established by discourse markers, such as connectives (e.g., “because,” “in
order to,” “so that”), adverbs (“therefore,” “afterwards”), and transitional
phrases (“on the other hand”). As discussed in Chapter 2, textbase difficulty
is expected to increase when there are cohesion gaps in the text.
Coreference Cohesion. Coh-Metrix does not have a proposition analyzer,
but it goes a long distance in textbase analysis by identifying clauses and
computing different types of cohesion relations between sentences. As dis-
cussed in Chapter 2, one ubiquitous type of cohesion relation is coreference
(Halliday & Hasan, 1976; Sanders & Noordman, 2000; van Dijk & Kintsch,
1983). Referential cohesion occurs when a noun, pronoun, or noun-phrase
argument refers to another constituent in the text. There is a referential
cohesion gap when the content words in a sentence do not connect to
words in surrounding text or sentences. Coh-Metrix tracks five major types
of lexical coreference by computing overlap in nouns, pronouns, arguments,
stems (morpheme units), and content words.
Noun overlap. Two sentences share one or more common nouns.
Pronoun overlap. Sentences share at least one pronoun with the same gender and
number.
Argument overlap. Sentences share the same nouns or pronouns (table/table, he/he).
Stem overlap. One sentence has a noun with the same semantic morpheme (called
a lemma) in common with any word in any grammatical category in the other
sentence (e.g., the noun “swimmer” and the verb “swimming”).
Content word overlap. Sentences are more connected to the extent that they have
more content words that overlap.
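A rough sense of how such overlap checks work can be given in a few lines. The sketch below assumes each sentence arrives pre-tagged as (word, POS, lemma) tuples, which a parser would normally supply; the tags here are hand-written, and the checks are simplified versions of the definitions above.

```python
# Sketch: sentence-pair overlap checks in the spirit of the list above.
# Each sentence is pre-tagged as (word, pos, lemma) tuples because the
# real system gets these from a parser; the tags here are hand-supplied.
s1 = [("The", "DT", "the"), ("swimmer", "NN", "swim"), ("won", "VBD", "win")]
s2 = [("She", "PRP", "she"), ("was", "AUX", "be"), ("swimming", "VBG", "swim")]

def _nouns(s):
    return {w.lower() for w, pos, _ in s if pos.startswith("NN")}

def noun_overlap(a, b):
    # Two sentences share at least one common noun form.
    return bool(_nouns(a) & _nouns(b))

def stem_overlap(a, b):
    # A noun in one sentence shares a lemma with any word in the other.
    a_noun_lemmas = {lem for _, pos, lem in a if pos.startswith("NN")}
    b_lemmas = {lem for _, pos, lem in b}
    return bool(a_noun_lemmas & b_lemmas)

print(noun_overlap(s1, s2), stem_overlap(s1, s2))
```

Here "swimmer" and "swimming" fail the noun-overlap check but pass the stem-overlap check, because both reduce to the lemma "swim."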
prominent in the syntactic parse (Lappin & Leass, 1994), and (c) considers
how often the referent has been mentioned in the previous text. However, the
Coh-Metrix anaphor resolution procedure merely computes whether there is
at least one acceptable referent of the pronoun (Yes or No) rather than filling
in the referent of the anaphor. It should be acknowledged that the perform-
ance of anaphora resolution systems in computational linguistics is modest
(Jurafsky & Martin, 2008).
Discourse Markers and Connectives. A very different mechanism for
establishing textbase cohesion is by various forms of discourse markers and
connectives (Halliday & Hasan, 1976; Louwerse, 2001; Sanders & Noordman,
2000). These include connectives that correspond to additive cohesion
(e.g., “also,” “moreover,” “however,” “but”), temporal cohesion (e.g., “after,”
“before,” “until”), and causal/intentional cohesion (e.g., “because,” “so,” “in
order to”). Logical operators (e.g., variants of “or,” “and,” “not,” and “if–then”)
are also cohesive links that influence the analytical complexity of a text. More
will be said about these connectives and discourse markers in the subsequent
section on the situation model. The connectives and discourse markers have
tight connections to the situation model in addition to the textbase level.
Lexical Diversity. Indices of lexical diversity are presumably related to both
text difficulty and textbase cohesion. Lexical diversity adds to difficulty
because each unique word introduces new information that needs to be
encoded and integrated into the discourse context. On the flip side, low
lexical diversity implies more repetition of the words and redundancy, and
thus higher cohesion. Lexical diversity is also related to lexical sophistication
on the part of the writer because it indicates that the author of the text is able
to use a wider variety of words.
The most well-known computation of lexical diversity is the type-token
ratio (TTR, Templin, 1957). This is the number of unique words in a text (i.e.,
types) divided by the overall number of words (i.e., tokens) in the text. One
problem with TTR, however, is that its results are sensitive to variations in
text length because as the number of word tokens increases, there is a lower
likelihood of those words being unique (McCarthy & Jarvis, 2010). This is of
particular concern because researchers frequently need to analyze texts that
dramatically vary in length. Coh-Metrix also includes measures such as vocd
and Measure of Textual Lexical Diversity (MTLD), which overcome the
potential confound of text length by using sampling and estimation methods
(McCarthy & Jarvis, 2010). The index produced by vocd is calculated through
a computational procedure that fits TTR random samples with ideal TTR
curves. MTLD is calculated as the mean length of sequential word strings in a
text that maintain a given TTR value.
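The length sensitivity of TTR is easy to demonstrate: repeating a text doubles the tokens without adding any types, so the ratio drops.

```python
# Sketch: type-token ratio, and a quick demonstration of its
# sensitivity to text length.
def ttr(words):
    return len(set(words)) / len(words)

short = "the cat sat on the mat".split()
longer = short * 2  # same vocabulary, twice the tokens

print(round(ttr(short), 2), round(ttr(longer), 2))
```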
situation model
As we discussed in Chapter 1, the situation model is a level of representation
that moves us beyond the explicit text into the realm of inferences and the
conceptual meaning of the text beyond language per se. This would be
impossible without the relevant bodies of world knowledge that are shared
by many in the sociocultural context (Graesser, Singer, & Trabasso, 1994;
Kintsch, 1998; McNamara & Magliano, 2009; Snow, 2002; van den Broek,
Rapp, & Kendeou, 2005). In narrative microworlds, the situation model
includes the setting (characters, objects, spatial layout), the plot (events,
actions, conflict), and mental states of characters (goals, emotions, percep-
tions). In informational texts, the situation model is the substantive content
of what the text is about. In a science text, for example, it would include the components of the
system, the spatial layout of the entities, the causal mechanisms, and perhaps
quantitative specifications of these viewpoints. Inferences are needed to
construct the situation model by catering to the unique constraints of the
textbase, the background world knowledge that becomes activated, and the
other levels in the multilevel theoretical framework (see Chapter 1).
Latent Semantic Analysis (LSA). In the early days of artificial intelligence
(AI), researchers struggled with the challenge of representing world knowl-
edge, recruiting such knowledge during comprehension, and generating
relevant inferences (Lenat, 1995; Schank & Abelson, 1977). AI researchers
identified packages of the generic world knowledge, such as person stereo-
types, spatial frames, scripted activities, and schemas. For example, scripts are
generic representations of everyday activities (e.g., eating at a restaurant,
washing clothes, playing baseball) that have actors with goals and roles,
sequences of actions that are typically enacted to achieve these goals, spatial
environments with objects and props, and so on. These scripts and other
generic knowledge packages were thought to be activated during comprehen-
sion through pattern recognition processes and to guide comprehension by
monitoring attention, generating inferences, formulating expectations, and
interpreting explicit text. AI researchers quickly learned that it was extremely
difficult to program computers to comprehend text even when the systems
were fortified with many different classes of world knowledge (Lehnert &
Ringle, 1982). Moreover, it was tedious to annotate and store large volumes of
world knowledge in formats needed to support computation (but see Lenat,
1995 for attempts to do so).
Coh-Metrix adopts a very different, statistical approach to representing
world knowledge, called Latent Semantic Analysis (Landauer & Dumais, 1997;
Landauer, McNamara, Dennis, & Kintsch, 2007). LSA is a mathematical,
conclusion
This chapter has identified the technologies and science that led to the
development of Coh-Metrix. It is quite apparent that many fields in the
interdisciplinary arena of computational discourse science were needed to
reach this point in research and development. Moreover, many of our
colleagues would not have bet 20 years ago on a computer facility like Coh-
Metrix being able to compute automatically so many measures at the levels of
words, syntax, textbase, situation model, and genre. Coh-Metrix is not a
Coh-Metrix Measures
banks of measures are quite similar. This chapter describes the indices that are
provided in Coh-Metrix 3.0. In this chapter we describe all of those indices in
the order they are output in the tool, except those that are associated with
readability and text ease, which are described in Chapter 5. The indices that
are described in this chapter and Chapter 5 are listed in Appendix A.
Comparative norms for each of the indices are provided in Appendix B by
grade level for three text genres (language arts, social studies, and science).
descriptive indices
Coh-Metrix provides descriptive indices to help the user check the Coh-
Metrix output (e.g., to make sure that the numbers make sense) and interpret
patterns of data. The extracted indices include those on the following list. In
the output for the current version of Coh-Metrix (Version 3.0), all of these
indices are preceded by DES to designate that they are descriptive measures.
1. Number of paragraphs (DESPC). This is the total number of para-
graphs in the text. Paragraphs are defined by hard returns within the
text.
2. Number of sentences (DESSC). This is the total number of sentences in
the text. Sentences are identified by the OpenNLP sentence splitter
(http://opennlp.sourceforge.net/projects.html).
3. Number of words (DESWC). This is the total number of words in the
text. Words are calculated using the output from the Charniak parser.
For each sentence, the Charniak parser generates a parse tree with part
of speech (POS) tags for clauses, phrases, words, and punctuation. The
referential cohesion
Referential cohesion refers to overlap in content words between local
sentences, or coreference. In the output for the current version of Coh-
Metrix (Version 3.0), all of these indices are preceded by CRF to designate
that they are coreference measures. As discussed in greater detail in
Chapters 2 and 3, coreference is a linguistic cue that can aid readers in
making connections between propositions, clauses, and sentences in their
textbase understanding (Halliday & Hasan, 1976; McNamara & Kintsch,
1996). Referential cohesion gaps can occur when the words or concepts in a
sentence do not overlap with other sentences in the text. As such, cohesion
gaps at the textbase level can have varying effects on comprehension and
reading time depending on the reader’s abilities (McNamara & Kintsch,
1996; O’Brien, Rizzella, Albrecht, & Halleran, 1998; O’Reilly & McNamara,
2007; see Chapter 2).
Coh-Metrix measures for referential cohesion vary along two dimensions.
First, the indices vary from local to more global. Local cohesion is measured
by assessing the overlap between consecutive, adjacent sentences, whereas
global cohesion is assessed by measuring the overlap between all of the
sentences in a paragraph or text. Second, the indices vary in terms of the
explicitness of the overlap. Coh-Metrix tracks different types of coreference:
noun overlap, argument overlap, stem overlap, and content word overlap.
Noun overlap measures the proportion of sentences in a text for which
there are overlapping nouns, with no deviation in the morphological forms
of the nouns (e.g., table/table). Argument overlap also considers overlap
between the head nouns (e.g., “table”/“tables”) and pronouns (e.g., “he”/
“he”) but does not attempt to determine the referents of pronouns (e.g.,
whether “he” refers to Sally or John). Stem overlap considers overlap between
a noun in one sentence and a content word (i.e., nouns, verbs, adjectives,
adverbs) in another sentence. The content word in the other sentence must
share a common lemma (i.e., core morphological element; e.g., “baby”/“babies”;
Noun / Argument / Stem / Content word / LSA overlap scores:
S1. The cell is the basic unit of life.
S2. Cells were discovered by Robert Hooke. (0 / 1 / 1 / 0 / 0.37)
S3. A cell is the smallest unit of life that is classified as a living thing. (0 / 1 / 1 / 0 / 0.40)
S4. Some organisms, such as most bacteria, are unicellular (consist of a single cell). (1 / 1 / 1 / 0.13 / 0.44)
S5. Other organisms, such as humans, are multicellular. (1 / 1 / 1 / 0.33 / 0.79)
S6. There are two types of cells: eukaryotic and prokaryotic. (0 / 0 / 0 / 0 / 0.34)
S7. Prokaryotic cells are usually independent. (1 / 1 / 1 / 0.50 / 0.85)
S8. Eukaryotic cells are often found in multicellular organisms. (1 / 1 / 1 / 0.20 / 0.70)
Average local (adjacent sentences): 0.57 / 0.86 / 0.86 / 0.17 / 0.55
Average global (all sentences): 0.43 / 0.82 / 0.82 / 0.13 / 0.41
lexical diversity
Coh-Metrix includes three types of indices of lexical diversity: type-token
ratio (TTR; LDTTRc, LDTTRa), the Measure of Textual Lexical Diversity
(MTLD; LDMTLDa), and vocd (LDVOCDa). Type-token ratio is calculated
for content words only (i.e., c) and also for all words (i.e., a), and MTLD and
vocd are calculated for all words (i.e., a). Lexical diversity refers to the variety
of unique words (types) that occur in a text in relation to the total number of
words (tokens). When the number of word types is equal to the total number
of words (tokens), all of the words are different. In that case, lexical diversity is
at a maximum, and the text is likely to be either very low in cohesion or very
short. A high number of different words in a text indicates that new words
need to be integrated into the discourse context. By contrast, lexical diversity
is lower (and cohesion is higher) when more words are used multiple times
across the text. The most well-known lexical diversity index is TTR, which is
simply the number of unique words divided by the overall number of words
(i.e., tokens). TTR is correlated with text length because as the number of
word tokens increases, there is a lower likelihood of those words being
unique. Measures such as MTLD and vocd overcome that confound by
using estimation algorithms (McCarthy & Jarvis, 2010). MTLD is calculated
as the mean length of sequential word strings in a text that maintain a given
TTR value. The index produced by vocd is calculated through a computa-
tional procedure that fits TTR random samples with ideal TTR curves.
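MTLD's forward pass can be sketched compactly. The version below closes a "factor" each time the running TTR falls to the 0.72 threshold used by McCarthy and Jarvis; it omits the backward pass and the pro-rating of the final partial factor that the published algorithm includes, so it is an approximation rather than the full measure.

```python
# Simplified sketch of MTLD's forward pass: walk the text, and each
# time the running TTR falls to the threshold, close a "factor" and
# reset. MTLD is tokens / factors. The published algorithm also runs
# a backward pass and pro-rates the final partial factor; both are
# omitted here for brevity.
def mtld_forward(words, threshold=0.72):
    factors, start = 0, 0
    for i in range(len(words)):
        segment = words[start:i + 1]
        if len(set(segment)) / len(segment) <= threshold:
            factors += 1
            start = i + 1
    return len(words) / factors if factors else float(len(words))

text = ("the cat saw the dog and the dog saw the cat "
        "then the cat ran and the dog ran too").split()
print(round(mtld_forward(text), 2))
```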
connectives
Connectives play an important role in the creation of cohesive links between
ideas and clauses and provide clues about text organization (Cain & Nash,
2011; Crismore, Markkanen, & Steffensen, 1993; Longo, 1994; Sanders &
Noordman, 2000; van de Kopple, 1985). Coh-Metrix provides an incidence
score (occurrence per 1,000 words) for all connectives (CNCAll) as well as
different types of connectives. Indices are provided on five general classes of
connectives (Halliday & Hasan, 1976; Louwerse, 2001): causal (CNCCaus:
“because,” “so”), logical (CNCLogic: “and,” “or”), adversative/contrastive
situation model
Referential cohesion is an important linguistic feature of text. However, there
are also deeper levels of meaning that go beyond the words. The term
“situation model” has been used by researchers in discourse processing and
cognitive science to refer to the level of mental representation for a text that
involves much more than the explicit words (Graesser & McNamara, 2011;
Graesser, Singer, & Trabasso, 1994; Kintsch, 1998; van Dijk & Kintsch, 1983;
Zwaan & Radvansky, 1998). Some researchers have described the situational
model in terms of the features that are present in the comprehender’s mental
representation when a given context is activated (e.g., Singer & Leon, 2007).
For example, with episodes in narrative text, the situation model would
include the plot. In an informational text about the circulatory system, the
situation model might convey the flow of the blood. In essence, the situation
model comprises the reader’s mental representation of the deeper underlying
meaning of the text (Kintsch, 1998).
The content words and connective words systematically constrain and are
aligned with aspects of these inferred meaning representations, but the
explicit words do not go the full distance in specifying the deep meanings.
Coh-Metrix provides indices for a number of measures that are potentially
related to the reader’s situation model understanding. These include meas-
ures of causality, such as incidence scores for causal verbs that reflect changes
of state (SMCAUSv: “break,” “freeze,” “impact,” “hit,” “move”), causal verbs
plus causal particles (SMCAUSvp: e.g., both causal verbs and connectives
such as “because,” “in order to”), and intentional verbs (SMINTEp: e.g.,
“contact,” “drop,” “walk,” “talk”). Coh-Metrix uses WordNet (Miller,
Beckwith, Fellbaum, Gross, & Miller, 1990) to classify verbs into the categories
of causal and intentional verbs. The distinction between causality and inten-
tionality has relevance to the nature of knowledge in situation models
(Zwaan & Radvansky, 1998). Intentional verbs signal actions that are volun-
tarily enacted by animate agents, motivated by plans in pursuit of goals (such
as buying groceries, telling a child to behave, or driving to work). By contrast,
causal verbs reflect events in the material world or psychological world (such
as an earthquake erupting, or a person discovering a solution) that either may
or may not be driven by goals of people.
Coh-Metrix also provides two ratio indices: the ratio of causal particles to
causal verbs (SMCAUSr) and the ratio of intentional particles to intentional
verbs (SMINTEr). These ratios are calculated to reflect the necessity of
connectives in text. This necessity will depend on the number of events
expressed in the text. A text is judged as more causally cohesive to the extent
that there are proportionally more connectives that relate actions and events
in the text. If there are numerous action, event, and intentional verbs without
causal connectives to aid the reader, then the reader may be more likely to be
forced to generate inferences to understand the relations between the actions
and events in the sentences.
Coh-Metrix also provides measures of verb overlap, which are calculated
using LSA (SMCAUSlsa) and WordNet (SMCAUSwn). These indices are
indicative of the extent to which verbs (which have salient links to actions,
events, and states) are repeated across the text. In the LSA algorithm, the
cosine of two LSA vectors corresponding to the given pair of verbs is used to
represent the degree of overlap of the two verbs. In the WordNet algorithm,
the overlap is binary: 1 when two verbs are in the same synonym set and 0
otherwise. McNamara et al. (2012) found that verb
cohesion is greater in the earlier-grade texts than in the later-grade texts
and that verb cohesion decreases monotonically across science, social studies,
and narrative texts. They hypothesized that verb cohesion may help compen-
sate for lower referential cohesion when the text focuses more on events than
objects, as in the cases of lower-grade texts and narrative texts.
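The binary WordNet-style overlap described above can be sketched in a few lines. The synonym sets below are hand-made stand-ins for WordNet's synsets, purely for illustration.

```python
# Sketch of the binary WordNet-style verb overlap: 1 if two verbs
# share a synonym set, 0 otherwise. The synsets below are hand-made
# stand-ins for WordNet's, purely for illustration.
SYNSETS = [
    {"shut", "close"},
    {"begin", "start", "commence"},
]

def verb_overlap(v1, v2):
    if v1 == v2:
        return 1
    return int(any(v1 in s and v2 in s for s in SYNSETS))

print(verb_overlap("shut", "close"), verb_overlap("shut", "start"))
```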
Coh-Metrix also provides a measure of temporal cohesion, which reflects
tense and aspect repetition in the text (SMTEMP). Time is represented
through morphemes associated with the main verb or helping verb that signal
tense (past, present, future) and aspect (in progress versus completed). This
measure tracks the consistency of tense and aspect across a passage of text.
The repetition scores decrease as shifts in tense and aspect are encountered.
When such temporal shifts occur, readers may encounter difficulties in the
absence of explicit particles that signal shifts in time, such as the temporal
adverbial (“later on”), temporal connective (“before”), or prepositional
phrases with temporal nouns (“on the previous day”). A low particle-to-
shift ratio is a symptom of problematic temporal cohesion.
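A tense-consistency score in the spirit of SMTEMP can be sketched as the proportion of adjacent sentence pairs that keep the same tense. The real measure derives tense and aspect from verb morphology; here each sentence is simply pre-labeled with its tense.

```python
# Sketch of a tense-consistency score in the spirit of SMTEMP: the
# proportion of adjacent sentence pairs that keep the same tense.
# Real Coh-Metrix derives tense and aspect from verb morphology;
# here each sentence is pre-labeled.
def tense_consistency(tenses):
    if len(tenses) < 2:
        return 1.0
    same = sum(1 for a, b in zip(tenses, tenses[1:]) if a == b)
    return same / (len(tenses) - 1)

# Four sentences: past, past, past, then a shift to present.
print(round(tense_consistency(["past", "past", "past", "present"]), 2))
```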
syntactic complexity
Theories of syntax assign words to part-of-speech categories (e.g., nouns, verbs,
adjectives, conjunctions), group words into phrases or constituents (noun-
phrases, verb-phrases, prepositional-phrases, clauses), and construct syntactic
tree structures for sentences. For example, some sentences are short, follow a
simple actor-action-object syntactic pattern, have few if any embedded
clauses, and use the active rather than the passive voice. Other sentences
have complex, embedded syntax that potentially places heavier demands on
working memory. The syntax in text tends to be easier to process when there are
shorter sentences, few words before the main verb of the main clause, and few
words per noun-phrase. As mentioned earlier, the average number of words in
sentences is provided in Coh-Metrix as a descriptive measure (DESSL). Coh-
Metrix also calculates the mean number of words before the main verb, or left
embeddedness (SYNLE), and the average number of modifiers per noun phrase
(SYNNP). Sentences with difficult syntactic constructions include the use of
embedded constituents and are often structurally dense, syntactically ambigu-
ous, or ungrammatical (Graesser et al., 2004). As a consequence, they are more
difficult to process and comprehend (Perfetti, Landi, & Oakhill, 2005).
Coh-Metrix assesses a combination of semantic and syntactic dissimilarity
by measuring the uniformity and consistency of the sentence constructions in
the text, based on the notion of a Minimal Edit Distance (MED; McCarthy,
Guess, & McNamara, 2009). Coh-Metrix 3.0 provides three variations on
MED: SYNMEDpos, SYNMEDwrd, and SYNMEDlem. MED calculates the
average minimal edit, or the distance that parts of speech (SYNMEDpos),
words (SYNMEDwrd), or lemmas (SYNMEDlem) are from one another
between consecutive sentences in a text. Consider, for example, a pair of
adjacent sentences with identical syntax in which the nouns “cat” and “dog”
appear in different positions. The SYNMEDpos syntactic dissimilarity is 0.0
because the syntax is the same. By contrast, because “cat” and “dog” occupy
different positions in each sentence, SYNMEDwrd and SYNMEDlem are both 0.4.
Considering these indices together indicates that the sentences have the
same syntax but different meanings.
SYNMEDpos considers parts of speech but not the words themselves (e.g.,
determiner + noun). In essence, SYNMEDpos calculates the extent to which
one sentence needs to be modified (edited) to make it have the same syntactic
composition as a second sentence. SYNMEDwrd and SYNMEDlem consider
the words but not the parts of speech (e.g., the + book). The three MED indices
tend to be moderately correlated with measures of referential and semantic
cohesion, with correlations ranging between −.3 and −.7. For example, using the
TASA corpus of 38,807 passages, SYNMEDwrd correlates −.75 with the refer-
ential cohesion easability score (see Chapter 5). However, SYNMEDwrd and
SYNMEDlem tend to be more strongly correlated with referential and semantic
cohesion (r = −.4 to −.7) than does SYNMEDpos (r = −.2 to −.6), which tends to
correlate also with syntactic complexity (r = −.3 to −.6).
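The word-level computation behind these indices can be sketched with a standard Levenshtein (minimal edit) distance over tokens, normalized by sentence length. The sentence pair below is hypothetical, two sentences in which "cat" and "dog" swap positions, chosen to reproduce the 0.4 value discussed above; the exact normalization Coh-Metrix applies may differ.

```python
# Sketch: word-level minimal edit distance between two sentences,
# normalized by sentence length -- the flavor of computation behind
# the SYNMED indices. Standard Levenshtein distance over tokens; the
# exact normalization Coh-Metrix uses may differ.
def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

s1 = "the cat chased the dog".split()
s2 = "the dog chased the cat".split()
print(edit_distance(s1, s2) / len(s1))  # 2 substitutions over 5 words
```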
Coh-Metrix 3.0 provides two measures of sentence-to-sentence syntax
similarity (SYNSTRUTa, SYNSTRUTt) by measuring the uniformity and
consistency of the syntactic constructions in the text. SYNSTRUTa is the
average parse tree similarity (Sim) between adjacent sentence pairs in a text.
SYNSTRUTt is the average parse tree similarity (Sim) between all combina-
tions of sentence pairs across paragraphs of the text. SYNSTRUT is based on
parse tree similarities between sentences. For two sentence parse trees, the
maximum common tree is found by removing uncommon subtrees. The
parse tree similarity is computed by the following formula:
Sim = (nodes in common tree) / (sum of nodes in the two sentence trees − nodes in common tree).
Figure 4.1 illustrates how the common tree is constructed. There are 8 nodes
in the first tree and 10 nodes in the second tree. In the figure, the yellow nodes
are common nodes. There are 6 common nodes. The rectangle leaves with
words are not counted as nodes. Therefore, the similarity is computed as
Sim = 6 / ((8 + 10) − 6) = 6/12 = 0.50. This index not only looks at syntactic
[Figure 4.1: parse trees for the two adjacent sentences, with the common nodes highlighted.]
Note: DT = determiner, NN = noun (singular or mass), NP = noun phrase, PRP = personal pronoun, S1 = sentence, S = simple declarative clause, VBD = verb (past tense), VP = verb phrase
Figure 4.1. Sentence-to-sentence syntax similarity. This figure presents sentence-to-
sentence syntax similarity (SYNSTRUT) between the two adjacent sentences: “The man
came. He entered the door.” The yellow nodes represent the common nodes between the
two sentences. The outcome of the analysis indicates that 6 nodes are common, and 12
are not, with the result of 0.50 for the index.
similarity across sentence pairs at the phrasal level, but also takes account of
the parts of speech involved. More uniform syntactic constructions result in
less complex syntax that is easier for the reader to process (Crossley,
Greenfield, & McNamara, 2008).
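The Sim formula itself is a one-liner. With Figure 4.1's counts (8 and 10 nodes, 6 in common), it reproduces the 0.50 value worked out above.

```python
# Sketch: the parse-tree similarity formula from the text,
# Sim = common / (nodes_a + nodes_b - common).
def tree_similarity(nodes_a, nodes_b, common):
    return common / (nodes_a + nodes_b - common)

# Figure 4.1's counts: 8 and 10 nodes, 6 in common.
print(tree_similarity(8, 10, 6))
```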
word information
Vocabulary knowledge, and thus the types of words that are presented in a
text, has a substantial impact on reading time and comprehension (Perfetti,
2007; Rayner et al., 2001; Stanovich, 1986). The words in textbooks and the
texts that children encounter beginning in the late elementary years contain
increasingly more complex and unfamiliar words (Adams, 1990; Beck,
McKeown, & Kucan, 2002). Therefore, it is important to analyze words on
multiple characteristics and dimensions that have relevance to reading devel-
opment and the construction of meaning in text. Coh-Metrix provides an
abundance of word measures that are described in this section.
Parts of Speech. As discussed in greater detail in Chapter 3, each word is
assigned a syntactic part-of-speech category. These syntactic categories are
segregated into content words (e.g., nouns, verbs, adjectives, adverbs) and
the words “cornet” (364), “dogma” (328), and “manus” (113), which
have an average Familiarity of 268. Words with very high Familiarity
include “mother” (632) and “water” (641), compared to “calix” (124) and
“witan” (110).
3. Concreteness (WRDCNCc). This is an index of how concrete or non-
abstract a word is. Words that are more concrete are those things you
can hear, taste, or touch. MRC provides ratings for 4,293 unique words.
Coh-Metrix provides the average ratings for content words in a text.
Words that score low on the concreteness scale include “protocol”
(264) and “difference” (270), compared to “box” (597) and “ball” (615).
4. Imagability (WRDIMGc). An index of how easy it is to construct a mental
image of the word is also provided in the merged ratings of the MRC,
which provides ratings for 4,825 words. Coh-Metrix provides the average
ratings for content words in a text. Examples of low-imagery words are
“reason” (285), “dogma” (327), and “overtone” (268) compared to words
with high imagery such as “bracelet” (606) and “hammer” (618).
5. Meaningfulness (WRDMEAc). These are the meaningfulness ratings
from a corpus developed in Colorado by Toglia and Battig (1978). MRC
provides ratings for 2,627 words. Coh-Metrix provides the average
ratings for content words in a text. An example of a meaningful word
is “people” (612) as compared to “abbess” (218). Words with higher
meaningfulness scores are highly associated with other words (e.g.,
“people”), whereas a low meaningfulness score indicates that the
word is weakly associated with other words.
6. Polysemy (WRDPOLc). Polysemy refers to the number of senses (core
meanings) of a word. For example, the word “bank” has at least two
senses, one referring to a building or institution for depositing money
and the other referring to the side of a river. Coh-Metrix provides
average polysemy for content words in a text. Polysemy relations in
WordNet are based on synsets (i.e., groups of related lexical items),
which are used to represent similar concepts but distinguish between
synonyms and word senses (Miller et al., 1990). These synsets allow for
the differentiation of senses and provide a basis for examining the
number of senses associated with a word. Coh-Metrix reports the
mean WordNet polysemy values for all content words in a text.
Word polysemy is considered indicative of text ambiguity because the
more senses a word has, the greater the number of potential lexical
interpretations. However, more frequent words also
tend to have more meanings, and so higher values of polysemy in a text
may be reflective of the presence of higher frequency words.
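The word measures above share a simple pattern: look each content word up in a rating resource and average the values over the text. The sketch below illustrates that pattern; the RATINGS and SENSES tables are tiny hypothetical stand-ins for the MRC database and for WordNet sense counts, not the actual Coh-Metrix resources.

```python
# Hypothetical MRC-style concreteness ratings (100-700 scale) and
# WordNet-style sense counts; illustrative stand-ins only.
RATINGS = {"box": 597, "ball": 615, "protocol": 264, "difference": 270}
SENSES = {"bank": 10, "run": 16, "dogma": 2}

def mean_rating(content_words, ratings):
    """Average a rating over the words that have one (WRDCNCc-style)."""
    rated = [ratings[w] for w in content_words if w in ratings]
    return sum(rated) / len(rated) if rated else None

def mean_polysemy(content_words, senses):
    """Average sense count over content words (WRDPOLc-style)."""
    counts = [senses.get(w, 1) for w in content_words]  # unknown: 1 sense
    return sum(counts) / len(counts)

print(mean_rating(["box", "ball", "protocol", "mystery"], RATINGS))  # 492.0
print(mean_polysemy(["bank", "run", "dogma"], SENSES))
```

In a full implementation, the ratings would come from the MRC database and the sense count for a word would come from WordNet (e.g., the number of synsets that list the word).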
norms
This chapter has presented all of the indices that are provided in Coh-Metrix
3.0 except those that are related to readability. Comparative norms for the
indices are provided in Appendix B, separated by grade level for three text
genres (language arts, social studies, and science). To create the norms, we
analyzed a subset of a large corpus of texts created by the Touchstone Applied
Science Associates (TASA), Inc. The TASA corpus has 9 genres consisting of
119,627 paragraphs taken from 37,651 samples. The passages all consisted of
one paragraph, because paragraph breaks are not marked in the TASA
corpus. Hence, these norms are not based on a corpus that provides variation
between paragraphs or information at the paragraph level. We nonetheless
used TASA because it is a large corpus that has proven to be representative of
other texts and differences between text genres.
We calculated norms for the three largest domains represented in TASA:
language arts, social studies, and science texts. To do so, we randomly chose
100 passages from each of the three genres and each of 13 grade levels, for a total of
3,900 passages. Grade level in the TASA corpus is indexed by the Degrees of
Reading Power (DRP; Koslin et al., 1987). Notably, because the grade levels are
estimated using DRP values, they correspond to grade levels estimated by a
readability measure and do not correspond to an actual grade level. As
described earlier, DRP grade level is defined by a formula that includes
word and sentence characteristics, such as word frequency and sentence
length. To simplify the data analysis and presentation, grade level was
collapsed across the DRP levels corresponding to the grade bands used within
the Common Core State Standards: grades K to 1 (n=100), 2 to 3 (n=200), 4 to
5 (n=200), 6 to 8 (n=300), 9 to 10 (n=200), and 11 and above (n=300). The
average DRP values as well as the range of DRP values for each grade band are
provided in Appendix B.
conclusion
This chapter has provided a description of the indices that we included in the
most recent version of Coh-Metrix, Version 3.0. This is a small selection of
hundreds of indices that we have explored over the past 10 years. These are the
indices that have risen to the top across the multitude of analyses and studies
conducted using Coh-Metrix. Many of the indices we have developed and
examined have not panned out. Either they simply did not measure what they
were intended to measure, or they were not as predictive of textual differences
in comparison to the indices we have included here.
We have included 106 indices in Coh-Metrix 3.0. We would have preferred
to narrow down the selection of indices even further than we have here.
However, we each have our favorites. Also, different measures are useful to
address different kinds of research questions. In addition, the number of
indices has increased because we have included in this version the standard
deviations for many of the measures. These had not been included in previous
public versions of Coh-Metrix. We have done so because we find the standard
deviation of an index informative both in terms of understanding variation
for the particular index and in terms of understanding the characteristics
of text.
In the following chapter we describe the remaining indices that were not
covered in this chapter. These are the indices related to readability, or text
difficulty. We include the Flesch measures of readability (i.e., Flesch Reading
Ease, Flesch-Kincaid Grade Level) that focus on the word and sentence levels
of complexity, but our primary focus is on the Coh-Metrix Text Easability
Principal Component scores. These are measures of text ease that have been
developed by statistically combining together the indices presented in the
current chapter. Our overarching goal in the Coh-Metrix project has been to
provide a means to enhance our understanding of text difficulty. Hence, the
text easability scores described in Chapter 5 represent a culmination of our
efforts in the Coh-Metrix project.
One important question with which the Coh-Metrix team has grappled is
how to measure text difficulty, complexity, or, in turn, its ease. This chapter
describes the two traditional readability measures provided by Coh-Metrix –
Flesch-Kincaid Grade Level (RDFKGL) and Flesch Reading Ease (RDFRE) –
as well as the readability index that we developed for second-language texts
(RDL2). We also describe the Coh-Metrix Text Easability Principal
Component Scores that are provided in Coh-Metrix 3.0 (i.e., PCNAR,
PCSYN, PCCNC, PCREF, PCDC, PCVERB, PCONN, PCTEMP).
The traditional and more common approach to scaling texts is to have a
single metric of text ease or difficulty. This is the approach taken by popular
metrics such as Flesch-Kincaid Grade Level (Kincaid, Fishburne, Rogers, &
Chissom, 1975) and Flesch Reading Ease (Flesch, 1948; Klare, 1974–1975),
which are provided by the Coh-Metrix tool. Both metrics are based
on the length of words and sentences within the text. In Coh-
Metrix, the Flesch-Kincaid Grade Level (RDFKGL) is computed as [(0.39 *
sentence length) + (11.8 * word length) – 15.59]. The Flesch Reading Ease
(RDFRE) is computed as [206.835 – (1.015 * sentence length) – (84.6 * word
length)]. Sentence length (DESSL) is measured by the mean number of words
per sentence in a text, whereas word length (DESWLsy) is measured as the
mean number of syllables per word (which is highly correlated with the mean
number of letters).
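The two formulas can be written directly in code. The sentence-length and word-length values passed in are assumed to have been computed already; tokenization and syllable counting are separate problems not shown here.

```python
def flesch_kincaid_grade_level(sentence_length, word_length):
    """RDFKGL = (0.39 * sentence length) + (11.8 * word length) - 15.59"""
    return 0.39 * sentence_length + 11.8 * word_length - 15.59

def flesch_reading_ease(sentence_length, word_length):
    """RDFRE = 206.835 - (1.015 * sentence length) - (84.6 * word length)"""
    return 206.835 - 1.015 * sentence_length - 84.6 * word_length

# e.g., a text averaging 15 words per sentence, 1.5 syllables per word:
print(round(flesch_kincaid_grade_level(15, 1.5), 2))  # 7.96
print(round(flesch_reading_ease(15, 1.5), 2))         # 64.71
```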
These readability measures can provide robust predictors of sentence-level
understanding and the amount of time it takes to read a passage. Indeed, these
types of text comprehension measures offer impressive validation of these
metrics. There are a number of theoretical explanations for the validity of
these and similar metrics, but two principal ones refer to the effects of word
knowledge and working memory while reading. First, infrequent words in a
language tend to be longer according to Zipf’s (1949) law, so the word length
level of challenge. A teacher may assign a text that is at just the right level,
challenge a student with a more difficult text, or provide a text that is easy
enough for the student to readily understand. A unidimensional metric
provides a simple solution to this task because the dimensions are generally
aligned with a common metric – grade level.
We have conducted two projects to explore unidimensional metrics of text
readability. The first resulted in the L2 Readability (RDL2) score that is
provided in Coh-Metrix 3.0. The second developed an algorithm to predict
textbook grade levels. These algorithms are described in the following
sections.
10–12. The assigned grade level of these texts is determined by the publisher
and presumably derived from a complex mix of quantitative indices (such as
Flesch-Kincaid Grade Level), expert judgment, and the availability and
requirements of the given state. In this study, Dufty et al.
(2006) examined the degree to which Coh-Metrix successfully predicted these
assigned grade levels. They found that Flesch-Kincaid Grade Level correlated
0.77 with grade level, and that cohesion as measured by LSA sentence to text
similarity correlated –.53. A multiple regression analysis indicated that a
combination of variables produced an R2 of .68, which means that cohesion
in combination with Flesch-Kincaid explains 68% of the variance in the grade
level of the textbooks. Of these variables, three cohesion variables significantly
contributed: LSA sentence to text, incidence of causal verbs, and the incidence
of causal connectives. The results suggested that cohesion could predict
publisher-assigned grade level, and that cohesion in combination with
Flesch-Kincaid Grade Level predicted publisher-assigned grade level better
than either readability alone or cohesion alone. This study, therefore, pro-
vided evidence to support the assumption that cohesion has an important role
to play in the evaluation of text difficulty.
a multidimensional approach
While their simplicity and alignment with grade level might be appealing,
there are a number of reasons why unidimensional representations of com-
prehension may be unsatisfying both theoretically and to a practitioner. First,
unidimensional representations of comprehension tend to ignore the impor-
tance of readers’ deeper levels of understanding. As discussed earlier, tradi-
tional readability measures focus on superficial characteristics of text related
to readers’ understanding of the words and of individual sentences in the text.
Likewise, cloze tasks are most often used to gauge individuals’ reading levels,
and these tasks assess comprehension at the word and sentence level. Hence,
traditional readability measures do not tap readers’ ability to comprehend
global levels of discourse meaning.
Second, unidimensional measures ignore the multiple factors that influ-
ence comprehension, particularly those that influence readers’ use of knowl-
edge and deep comprehension such as cohesion and text genre. Genre
refers to the category of text, such as whether the text is primarily narrative
(e.g., novels, folktales), expository (e.g., textbooks, journal articles), persuasive
(e.g., editorials, sermons), or descriptive (Biber, 1988; Pentimonti, Zucker,
Justice, & Francis, 2010). There are distinctive characteristics of language that
signal text genre (Biber, 1988). The genre of a text can be particularly
informative with regard to its difficulty. For example, narrative text is sub-
stantially easier to read, comprehend, and recall than is informational text
(Graesser & McNamara, 2011; Haberlandt & Graesser, 1985).
Third, unidimensional metrics of text difficulty are not particularly helpful
or informative to educators when specific guidance is needed for diagnosing a
student’s particular deficit and planning remediation for students (Connor,
Morrison, Fishman, Schatschneider, & Underwood, 2007; Rapp, van den
Broek, McMaster, Kendeou, & Espin, 2007). Readability formulas do not
identify particular characteristics of texts that may be challenging or helpful
to a student. Unidimensional readability scores provide too little information
to teachers on the nature of a text’s complexity. Most importantly, although a
grade level estimate may indicate to a teacher that a text is more or less
difficult, the score does not provide information on why it is difficult. The
scaling and selection of texts would potentially benefit from an analysis of
multiple levels of language and discourse. One of the advantages of Coh-
Metrix is that it has the potential to inform the type of questions and activities
teachers might employ when presenting texts to the entire class or small
groups. By knowing the potential difficulties of any text in advance, teachers
can craft questions or tasks that help students recognize and overcome these
difficulties.
Coh-Metrix assesses challenges that may occur at the word and sentence
levels as well as deeper levels of language. By doing so it comes closer to
having the capability to estimate how well a reader will comprehend a text at
deeper levels of cognition.
Through research on and with Coh-Metrix (see Chapters 2 and 6), we
have gained a deeper understanding of how texts differ and which indices
are most reliable in detecting these differences at meaningful, consequen-
tial levels. Most recently, this work has culminated in the development
of the Coh-Metrix easability components (Graesser, McNamara, &
Kulikowich, 2011). These components provide a more complete picture
of text ease (and difficulty) that emerge from the linguistic characteristics
of texts. The easability components provided by Coh-Metrix go beyond
traditional readability measures by providing metrics of text character-
istics on multiple levels of language and discourse. Moreover, they are well
aligned with theories of text and discourse comprehension (e.g.,
Graesser & McNamara, 2011; Graesser, Singer, & Trabasso, 1994; Kintsch,
1998; McNamara & Magliano, 2009).
In order to discover which aspects of texts constitute text complexity,
Graesser, McNamara, and Kulikowich (2011) conducted a principal compo-
nents analysis (PCA) on 54 Coh-Metrix indices for 37,520 texts in the TASA
corpus. This corpus comprises excerpts (M=287 words) from texts (without
paragraph break markers) that students can be expected to encounter from
kindergarten through 12th grade. The majority of the text genres are charac-
terized as language arts, science, and social studies/history texts, but the
corpus also includes texts from the domains of business, health, home
economics, and industrial arts. The TASA corpus is the most comprehensive
collection of K–12 texts currently available for research. PCA was used to
reduce the large multivariate database to fewer functional dimensions (e.g.,
Brun, Ehrmann, & Jacquet, 2007). Eight components accounted for a sub-
stantial 67.3% of the variability among texts. These components are notably
closely aligned with the multilevel theoretical framework described in
Chapter 3 and by Graesser and McNamara (2011).
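The dimensionality reduction described above can be sketched in a few lines: standardize the text-by-index matrix and eigendecompose its correlation matrix. The random matrix below merely illustrates the mechanics; it is not the 54-index, 37,520-text TASA analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))             # 200 "texts" x 6 "indices" (toy data)

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each index
R = np.cov(Z, rowvar=False)               # correlation matrix of the indices
eigvals, eigvecs = np.linalg.eigh(R)      # eigendecomposition (symmetric R)
order = np.argsort(eigvals)[::-1]         # largest-variance component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()       # proportion of variance per component
scores = Z @ eigvecs                      # component scores for each text
print(explained.round(3))
```

In the reported analysis, the first eight such components accounted for 67.3% of the variability among texts.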
In Coh-Metrix 3.0, we provide these eight components in the form of
z-scores and percentile scores. A z-score is a standard score that indicates
how many standard deviations an observation or datum is above or below the
mean, where the mean is set at 0. A percentile score varies from 0 to 100%,
with higher scores meaning the text is likely to be easier to read than other
texts in the corpus. For example, a percentile score of 80% means that 80% of
the texts are more difficult and 20% are easier. The eight components are as
follows.
component reflects the number of logical relations in the text that are
explicitly conveyed. This score is likely to be related to the reader’s
deeper understanding of the relations in the text.
8. Temporality (PCTEMPz, PCTEMPp). Texts that contain more cues
about temporality and that have more consistent temporality (i.e.,
tense, aspect) are easier to process and understand. In addition, tem-
poral cohesion contributes to the reader’s situation model level under-
standing of the events in the text.
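The conversion from a raw component score to a z-score, and from a z-score to a normal-distribution percentile, can be sketched as follows; the mean and standard deviation would come from the norming corpus, and the values used here are hypothetical.

```python
import math

def z_score(x, mean, sd):
    """Standard score: distance from the mean in standard-deviation units."""
    return (x - mean) / sd

def percentile_from_z(z):
    """Percentile under a normal distribution, via the error function."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical corpus mean/sd: a score 0.8 SD above the mean sits near the
# 79th percentile, i.e., easier than roughly 79% of texts on that component.
z = z_score(0.8, mean=0.0, sd=1.0)
print(round(percentile_from_z(z), 1))  # 78.8
```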
[Figure 5.1. Coh-Metrix easability percentile scores (Narrativity, Syntactic
Simplicity, Word Concreteness, Referential Cohesion, Deep Cohesion) for
language arts, social studies, and science texts.]
We can visualize differences between text genres using the easability scores.
Figure 5.1 provides the five main Coh-Metrix easability scores (Narrativity,
Syntactic Simplicity, Word Concreteness, Referential Cohesion, and Deep
Cohesion) for a subset of language arts (n=6755), social studies (n=4463), and
science (n=8550) texts above grade level 6 (i.e., using a Degrees of Reading
Power cutoff score of 55.99) from the TASA corpus. These graphs confirm
that the language arts texts tend to have higher narrativity than do the social
studies or science texts. This high narrativity reflects the use of more familiar
words combined with a tendency to focus on events and characters rather
than objects and ideas. By contrast, the social studies and science texts have a
greater density of information and thus lower narrativity.
If a passage is low in narrativity, the reader is potentially left unscaffolded
by world knowledge. In that case, students’ prior domain knowledge in
particular should be considered. While high narrativity scaffolds reading
comprehension by providing more familiar text, at the same time it is
important to recognize the importance of transitioning readers toward less
narrative text (Best, Floyd, & McNamara, 2008; Sanacore & Palumbo, 2009).
Developing readers must learn to understand increasingly complex and
unfamiliar ideas. If a teacher wishes to move the student toward learning to
use knowledge and generating inferences to understand more challenging
text, the teacher may consider where the text falls on the spectrum of
narrativity in terms of the Coh-Metrix easability scores.
Figure 5.1 confirms that science and social studies texts are informational
texts that are low in narrativity. These passages also tend to have somewhat
lower word concreteness because informational texts tend to include more
abstract concepts than do language arts texts. If a student has very little
domain knowledge, teachers may consider using informational texts that help
compensate for vocabulary and mental model deficits. For example, some
informational texts are higher in narrativity and word concreteness than
others are.
Furthermore, other sources of challenges and ease in the text should be
considered, such as syntax and cohesion (O’Reilly & McNamara, 2007).
Similar to the findings in McNamara, Graesser, and Louwerse (2012),
Figure 5.1 also indicates that science texts tend to have less complex syntax
(e.g., shorter, less complex sentences) and higher referential cohesion than the
other two genres. These sources of ease are necessary for informational texts
that contain a good deal of unfamiliar information. Science texts are, by their
very nature, composed of rare words, making it challenging for students to
understand the concepts in the text. For many readers, greater cohesion and
simpler syntax are crucial for this genre of text. Although language arts texts
tend to have more syntactic challenges for the reader and include more
referential cohesion gaps than do science texts, these types of challenges are
generally surmountable for readers with sufficient world knowledge.
In narrative texts, ease at the situation model level can compensate for
challenges that arise at other levels of language.
Interestingly, social studies texts seem to have potential challenges at all
five levels of language. This genre of text does not seem to have a consistent
source of ease to help compensate for those challenges. Likewise, McNamara,
Graesser, and Louwerse (2012) reported that social studies texts have the most
challenging words in comparison to language arts and science texts, but they
are also challenging in terms of syntax and cohesion. Thus, social studies
texts compensate for lexical challenges less than science texts do. Authors
of texts in domains related to social studies may assume that their readers
possess a sufficient level
of knowledge to make inferences about events in the world such as history,
government, civilization, war, geography, and so on. Indeed, readers who
possess the necessary knowledge are likely to comprehend these challenging
texts. But readers who do not may need additional scaffolding to help
compensate for the multiple challenges that potentially arise in social studies
texts.
Examining easability profiles for genres of texts can illuminate their poten-
tial challenges. In addition to examining groups of texts, we can also examine
differences between individual passages. To provide an example, we can
graph the five easability scores for the two passages in Chapter 1, Lady
Chatterley’s Lover and A Mortgage. The Flesch-Kincaid Grade Level scores
indicate that the A Mortgage excerpt is a highly challenging passage with a
grade level of
15.05 compared to the excerpt from Lady Chatterley’s Lover at a grade level of
[Figure 5.2. Coh-Metrix easability percentile scores for the excerpts from
Lady Chatterley’s Lover and A Mortgage.]
2.91. The latter would imply that Lady Chatterley’s Lover would be appropri-
ate for a second to third grade reader. However, an average grade level
estimate from 14 excerpts across the novel places the book at Grade 5.
The readability scores provide some indication of the reading skill neces-
sary to tackle these texts. Yet these readability scores do not reveal the
potential sources of the challenges or ease in these short excerpts. The
easability scores in Figure 5.2 convey first that the excerpt from Lady
Chatterley’s Lover is high in narrativity, whereas the excerpt from A
Mortgage is very low in narrativity, just as one would expect. There are
additional sources of challenges in A Mortgage. Sources of difficulty come
from the density of information (i.e., low narrativity), highly complex syntax,
moderate referential cohesion, and very low deep cohesion. These challenges
might be potentially offset for the reader by word concreteness, but more
likely prior knowledge of domains such as accounting would play a large role
in how well a reader understood this passage. The sources of complexity for
the excerpt from Lady Chatterley’s Lover seem to come solely from low
referential cohesion, but these are offset for the reader by syntactic simplicity,
word concreteness, and deep cohesion.
Overall, it may seem from Figure 5.2 that the Lady Chatterley’s Lover
passage would not be challenging. Likewise, the readability estimates placed
it at Grade 5. Both the readability scores and Coh-Metrix miss out on the
qualitative and sociological aspects of Lady Chatterley’s Lover that would
prevent a teacher from assigning it to a Grade 5 reader. In addition, a teacher
would have to consider the knowledge necessary to understand this novel. In
this case, knowledge of D. H. Lawrence’s ill health at the time that he wrote
Lady Chatterley’s Lover, as well as the relatively misogynist and sexually
repressed society of those times can help a reader understand the deeper
meaning of the story, particularly with respect to current times. Hence, the
Coh-Metrix easability components are informative in that they indicate that
prior knowledge is necessary to understand the passage (i.e., the referential
cohesion is low). But only a qualitative analysis with respect to the potential
readers and a teacher’s pedagogical goals will unveil whether a reading is
appropriate.
As illustrated with the past two examples, Coh-Metrix can be used to better
understand differences between texts at different readability levels, but it can
also be used to understand texts at similar readability levels. Texts often have
the same readability level yet differ vastly in their potential challenges.
There are extreme examples where a story
and a science text have the same grade levels but are very different in the skills
that would be called forth to understand the text. A more subtle example
comes from two Common Core State Standards (CCSS) story exemplars, Louisa
May Alcott’s Little Women and Mark Twain’s Tom Sawyer. The sample
excerpts from these stories, provided on pages 77–79 of appendix B (www.
corestandards.org/assets/Appendix_B.pdf), are declared to be at a CCSS 6–8
grade band. Likewise, the Flesch-Kincaid Grade Level estimates provided by
Coh-Metrix place Little Women at Grade 7 and Tom Sawyer at Grade 6. Below
are the first sentences from the excerpts provided by CCSS:
Little Women: Merry Christmas, little daughters! I’m glad you began at
once, and hope you will keep on. But I want to say one word
before we sit down. Not far away from here lies a poor
woman with a little newborn baby. Six children are huddled
into one bed to keep from freezing, for they have no fire.
Tom Sawyer: But Tom’s energy did not last. He began to think of the fun
he had planned for this day, and his sorrows multiplied.
Soon the free boys would come tripping along on all sorts of
delicious expeditions, and they would make a world of fun
of him for having to work – the very thought of it burnt him
like fire.
As shown in Figure 5.3, the two excerpts have very different profiles on the
various dimensions. They have similar levels of narrativity and referential
cohesion demands. The low referential cohesion is typical of narratives that
call for the reader to make inferences about the characters and events in the
story. Many of the events and characters in these stories may be readily
[Figure 5.3. Coh-Metrix easability percentile scores for the excerpts from
Little Women and Tom Sawyer.]
excerpt provided in the CCSS: “But Tom’s energy did not last. He began to
think of the fun he had planned for this day, and his sorrows multiplied.”
Words such as “energy,” “think,” “fun,” and “sorrows” are relatively familiar
words but have abstract connotations. The CCSS calls for students to under-
stand the connotations, denotations, and roles that specific words play in the
text, and these concepts are likely to be represented by more abstract words.
Hence, stories such as Tom Sawyer may be optimal for tackling inference
making processes about words and concepts in a text.
These two passages were both relatively low in referential cohesion.
However, some passages may have the same grade level estimates and differ
greatly in cohesion. As discussed many times in this book, cohesion is crucial
to comprehension, particularly for readers who have low domain knowledge.
A low-cohesion text should be considered in concert with an understanding
of readers’ knowledge base. If readers have little knowledge, the text is low in
narrativity, and the text is low in cohesion, then comprehension may suffer.
However, with sufficient scaffolding, low referential cohesion can help push
readers to generate inferences to fill in the cohesion gaps (e.g., McNamara,
2004). Consider the following two passages from the Common Core State
Standards (CCSS) informational text exemplars, Discovering Mars: The
Amazing Story of the Red Planet by Melvin D. Berger and Hurricanes:
Earth’s Mightiest Storms by Patricia Lauber, which are provided on pages
70–71 of appendix B (www.corestandards.org/assets/Appendix_B.pdf). These
two exemplars, shown in Figure 5.4, are declared to be at a CCSS 4–5 grade
[Figure 5.4. Coh-Metrix easability percentile scores for the excerpts from
Discovering Mars and Hurricanes: Earth’s Mightiest Storms.]
conclusion
There is a long history of unidimensional readability metrics that tap
parameters related to challenges at the word and sentence levels. Coh-
Metrix augments our understanding of readability foremost by providing
an estimate of text cohesion, and secondly by providing more specific infor-
mation on the multiple sources of difficulty that may challenge a reader. A
substantial advantage of Coh-Metrix is that it provides metrics on multiple
levels of language and discourse. Such a picture of texts will hopefully provide
educators and researchers with more information about text ease and the
potential challenges in various types of text.
It is crucial for educators to have access to information about the multiple
characteristics of a text, particularly in relation to other aspects of the text and
to the potential ability levels of the students. Narrativity provides information
about whether the reader is more or less likely to be able to use world
knowledge about events and event structures to understand the text.
Likewise, information on the cohesion indicates the degree to which a reader
will need to use knowledge to understand a text. This information can help
teachers align their pedagogical goals to a particular text. Coh-Metrix may
also provide information leading a teacher to use a different text. If a student
has very low domain or world knowledge, teachers may consider texts that
help compensate for vocabulary and mental model deficits.
While school systems and educators have recognized the importance of
text difficulty for decades and implemented any number of systems to grade
level text and assign readers to texts, there have been few efforts that offer
educators a means to understand characteristics of text relative to their
instructional goals as well as their students’ needs and abilities. The time is
ripe to do so, and teachers are calling for it. Our hope is that Coh-Metrix, and
particularly the Coh-Metrix easability metrics, will help improve student
outcomes in educationally meaningful ways.
between the low-cohesion and high-cohesion texts, and thus which would be
more predictive of cohesion differences across texts. We included noun, argu-
ment, and stem measures that were crossed with the distance of the overlap
(adjacent, two sentences, three sentences, all distances), as well as whether
overlap should be weighted as a function of distance (i.e., with adjacent overlap
given a higher weight than more distant overlap). All 21 indices showed
significant differences between the high-cohesion and low-cohesion versions
used in the targeted studies, with reported Cohen’s d effect sizes ranging from
0.64 to 1.08. The largest differences were observed for noun and argument
overlap and the smallest differences were observed for stem overlap. This latter
result is likely attributable to the types of manipulations in the targeted studies,
because the experimenters who implemented the changes in the texts likely
increased overlap by repeating the exact words rather than a stem of the word.
Thus, including stem overlap would dilute the differences between the text
versions, and argument overlap would be more precise. Weighting the distance
of the overlap also had an effect, but only for the global cohesion measures
(all distances) wherein weighting the closer overlap in comparison to the more
distant overlap increased the effect sizes. Hence, the cohesion indices were quite
robust and effectively picked up on the differences between the texts. The most
sensitive indices were the noun and argument overlap indices. Although this
may depend on this corpus, argument overlap has often risen to the top in terms
of discriminating between texts in other studies.
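To make the overlap indices concrete, here is a minimal sketch of a binary adjacent-overlap measure. It is only an analogue: Coh-Metrix identifies nouns, arguments, and stems with syntactic parsing and part-of-speech tagging, whereas this sketch simply treats any non-stopword token as a content word, and the stopword set shown is an illustrative assumption.

```python
def content_words(sentence, stopwords):
    # Crude content-word extraction; the real indices rely on a POS tagger.
    return {w.strip(".,;:").lower() for w in sentence.split()} - stopwords

def adjacent_overlap(sentences,
                     stopwords=frozenset({"the", "a", "an", "of",
                                          "is", "and", "then"})):
    # Proportion of adjacent sentence pairs that share at least one
    # content word: a rough analogue of a binary adjacent-overlap index.
    pairs = list(zip(sentences, sentences[1:]))
    hits = sum(1 for s1, s2 in pairs
               if content_words(s1, stopwords) & content_words(s2, stopwords))
    return hits / len(pairs)
```

In this toy form, a high-cohesion revision that repeats exact nouns across sentence boundaries raises the score, which is precisely the kind of manipulation the targeted studies implemented.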
The McNamara et al. (2010) study also examined effects of cohesion using
LSA indices. These results generally followed the patterns found for referen-
tial cohesion measures. However, the LSA paragraph-to-paragraph overlap
and paragraph-to-text overlap did not show differences between the high-
cohesion and low-cohesion texts. Moreover, the sentence measures (sentence
to sentence, all sentences, paragraph, and text) showed smaller differences
compared to the referential cohesion indices. The average effect size for the
referential indices was 0.98, whereas the largest difference observed among
the LSA indices was an effect size of 0.59. In McCarthy et al. (2012), we later
examined the ability of the LSA given/new score (see Chapter 4) to predict the
differences between these low-cohesion and high-cohesion texts, and found
similarly moderate effect sizes (Cohen’s d = 0.39). We assume that this
difference between the referential and LSA measures occurs because LSA
more generously assesses overlap by considering semantically related words,
whereas the referential indices are more stringent semantically. When using
LSA, a sentence is more likely to have some overlap with another sentence.
This is particularly important to the materials investigated in the McNamara
and colleagues’ study because the texts being compared were manipulated
versions of one another; that is, the differences were relatively subtle. This
conclusion concurs with those reported by McNamara, Cai, and Louwerse
(2007), who found that overlap measures more accurately predict local
cohesion, whereas the LSA indices better predict global cohesion.
McNamara et al. (2010) also measured cohesion in terms of the incidence
of connectives and the ratio of causal particles to causal verbs (SMCAUSr).
Among the various types of connectives, only causal connectives (CNCCaus)
discriminated between the high-cohesion and low-cohesion texts, presum-
ably because the researchers who created the texts primarily manipulated
causal cohesion and not additive, temporal, or clarification connectives. The
causal ratio index also showed a difference, with an effect size of 0.64. This
latter result indicates that the high-cohesion texts contained more causal
connectives, which were needed to express more explicitly the relations
between the actions and events described in the texts.
Analyses were conducted to examine which of the indices were most pre-
dictive of cohesion differences. We conducted a discriminant analysis to answer
this question. A discriminant analysis is akin to a regression analysis with a
categorical outcome: it predicts the category of each text, in this case high
versus low cohesion. The results indicated that text cohesion was predicted best
by a combination of word frequency (WRDFRQmc), LSA similarity (LSASS1),
referential noun cohesion (CRFNO1), and the causal ratio (SMCAUSr). The
high-cohesion texts were higher in cohesion according to LSA, referential
cohesion, and the causal ratio, but contained less frequent (less familiar)
words. This combination of indices appears to capture global, local, and causal
cohesion differences in the text.
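To give a feel for how a discriminant analysis assigns categories, the sketch below shows the degenerate one-predictor case, where (assuming equal class variances and priors) the decision rule reduces to a midpoint threshold between the two class means. The actual analysis combined four indices, and the index values below are invented purely for illustration.

```python
from statistics import mean

def train_discriminant(high_vals, low_vals):
    # One-feature linear discriminant: with equal variances and priors,
    # the decision boundary is the midpoint between the class means.
    threshold = (mean(high_vals) + mean(low_vals)) / 2
    high_is_larger = mean(high_vals) > mean(low_vals)

    def classify(x):
        return "high" if (x > threshold) == high_is_larger else "low"

    return classify

# Hypothetical referential-cohesion scores for texts of known category.
classify = train_discriminant(high_vals=[0.6, 0.7, 0.8],
                              low_vals=[0.1, 0.2, 0.3])
```

With several predictors, the same idea generalizes: the analysis finds the weighted combination of indices that best separates the two categories.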
In terms of the Coh-Metrix Project, this study was crucial in validating the
Coh-Metrix indices to provide measures of text cohesion. We acknowledge
that the researchers who modified the texts purposively modified referential
and causal cohesion, so it is not surprising that these measures rose to the
surface. However, from a validation perspective, if they had not, it would
have indicated that our measures had missed the mark. Moreover, the results
give credence to the general empirical claim that referential and causal
relationships play important roles in the difficulty of texts and how they are
comprehended.
Duran, Bellissens, Taylor, and McNamara (2007) provided further evi-
dence demonstrating the importance of cohesion to comprehension. Coh-
Metrix was used to classify 60 science texts as easy versus hard using Principal
Components Analysis (PCA). The PCA identified a referential cohesion
component and a word concreteness component in the underlying clustering
of the texts. We then chose four topics that included one easy and one hard
text in each of the topics and asked 24 participants to read either the easy or
the hard version for each of the four topics. The easy texts resulted in faster
reading times and better recall than the difficult texts: Participants recalled
more from the easy texts, and there was greater overlap between the text and
the recall according to LSA measures. This study differs from prior
studies because cohesion was not manipulated; instead it was naturally
occurring in the texts. When topic was controlled, cohesion and word
concreteness, as measured by Coh-Metrix, predicted the level of text difficulty.
This study was followed by our work developing measures of text readability
and reading ease, reported in Chapter 5.
In summary, the validity of the Coh-Metrix cohesion indices has been
established across a number of studies, including the study conducted by
McNamara et al. (2010). Coh-Metrix has also been used across a variety of
studies to control and verify the cohesion of texts when experimentally exam-
ining the effects of cohesion and text difficulty on comprehension. These studies
confirm the power of Coh-Metrix as a tool to provide information about the
cohesion and difficulty of a text. They also point to the importance of
considering the cohesion of texts when estimating the challenges they pose to
comprehension.
Across a number of studies, we have found that there are many linguistic
features that strongly discriminate between text genres. For example, in
Dempsey, McCarthy, and McNamara (2007), we found that phrasal verbs
alone successfully distinguished between genres. Indeed, across our explo-
rations comparing corpora with different genres, such as narratives and
informational texts, it is not uncommon for every Coh-Metrix variable to
show significant and meaningful differences between the genres. Genres are
different – very different. And Coh-Metrix picks up on that. So how are they
different?
Lightman, McCarthy, Dufty, and McNamara (2007) examined the distri-
butions of cohesion and text difficulty in narrative, science, and history
textbooks across the beginning, middle, and end of each chapter. We
expected that the three genres would show different flows of readability and
cohesion challenges across the chapters. We examined the readability of the
text in terms of Flesch-Kincaid Grade Level (see Chapter 5) and cohesion
using argument overlap and LSA. As expected, the science and history texts
were more difficult than the narratives in terms of Flesch-Kincaid grade
levels. Thus, the words were more familiar and the sentences were simpler
in the narrative texts. However, the science texts were also more cohesive.
They contained more overlap in words and concepts than did both the
history and narrative texts. The cohesion in science texts is necessary in
order to scaffold the reader who is confronted with more unfamiliar and
challenging concepts (e.g., McNamara & Kintsch, 1996). Whereas the science
texts showed higher cohesion, it was interesting that the history texts did not,
despite readability challenges similar to those observed in the science texts. Thus,
when reading the history texts, readers may not be scaffolded by cohesion as
well as they should be.
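For reference, the Flesch-Kincaid Grade Level used here is a simple function of average sentence length and average word length in syllables (Chapter 5 gives the details); the standard formula can be computed as:

```python
def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    # Standard Flesch-Kincaid Grade Level formula: longer sentences and
    # longer (multisyllabic) words push the estimated grade level up.
    return (0.39 * (n_words / n_sentences)
            + 11.8 * (n_syllables / n_words)
            - 15.59)
```

Note that the formula is blind to cohesion: two texts with identical word and sentence lengths receive the same grade level regardless of how well their sentences connect, which is exactly the gap the cohesion indices fill.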
When Lightman et al. (2007a) examined text difficulty and cohesion across
the chapters – that is, the flow of challenges in the texts – they found that the
science and history textbooks showed an increase in difficulty at the word and
sentence levels as well as a decrease in cohesion across each chapter. Hence, as
the books progressed, they became more difficult at all levels. The narrative
texts, by contrast, displayed a linear decrease in grade level difficulty across
chapters and only a slight decrease in cohesion. These results suggested that
texts for both expository domains gradually rise in complexity as they
develop. They also provide one example of how the linguistic properties and
structural characteristics of narrative fiction differ from those of expository
textbooks. Although science texts are clearly more challenging overall,
the content in science texts appears to be introduced slowly, with simpler,
more readable writing early on in a chapter.
Kulikowich (2011; see Chapter 5). Verb cohesion was one of the eight principal
components that emerged from the analysis conducted on the 37,520 TASA
texts. This result indicates that verb cohesion is an important factor in
accounting for variance in differences between texts.
Duran, McCarthy, Graesser, and McNamara (2007) examined temporal
cohesion across science, history, and narrative text genres. Temporality is
important because of its crucial role in organizing language and discourse.
Most theories of text comprehension consider temporality to be one of the
critical dimensions for building a coherent mental representation of events
that are described in texts, particularly in narrative texts (Zwaan &
Radvansky, 1998). In English, temporality is partially represented through
inflections and tense morphemes (e.g., “-ed,” “is,” “has”). The temporal
dimension also depicts unique internal event time frames, such as an event
that is complete or ongoing, by incorporating a diverse tense-aspect system
(ter Meulen, 1995). The occurrence of events at a point in time can also be
established by a large repertoire of adverbial cues, such as “before,” “after,”
“then” (Klein, 1994). These temporal features provide several different indices
of the temporal cohesion of a text.
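Coh-Metrix reports features such as these as incidence scores, that is, occurrences per 1,000 words. A minimal sketch (the marker set shown is a small illustrative sample, not the tool's actual word lists):

```python
def incidence_per_1000(tokens, markers):
    # Incidence score: occurrences of the marker words per 1,000 tokens.
    hits = sum(1 for t in tokens if t.lower() in markers)
    return 1000 * hits / len(tokens)

# Illustrative subset of temporal adverbial cues, not the full Coh-Metrix list.
temporal_markers = {"before", "after", "then", "later", "now"}
```

Normalizing by text length in this way lets incidence scores be compared across texts of very different sizes.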
To investigate differences in temporality across genres, Duran et al. (2007)
asked experts in discourse processing to rate 150 texts in terms of temporal
coherence on three continuous scales designed to capture unique representations
of time. These evaluations established a gold standard of temporality.
A multiple regression analysis using Coh-Metrix temporal indices
significantly predicted human ratings of temporal coherence. The predictors
included in the model were a subset of five temporal cohesion features
generated by Coh-Metrix: incidence of temporal expression words (“next,”
“following,” “yesterday,” “now,” “Monday,” “noon,” “week”), incidence of
positive temporal connectives (“before,” “then,” “later”), temporal adverbial
phrases (“in a moment,” “sooner or later”), incidence of past tense (“awoke,”
“began,” “saw”), and incidence of present tense (“look,” “move,” “talk”).
Collectively, all but one of the predictors (i.e., the incidence of positive
temporal connectives) significantly predicted the expert ratings of temporal
coherence. The indices accounted for 40% to 64% of the variance in the
experts’ ratings (depending on the type of rating). The study thus demon-
strated that the Coh-Metrix indices of local, temporal cohesion significantly
predicted human interpretations of temporal coherence, thereby validating
these Coh-Metrix measures of temporality.
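The "variance accounted for" figures are R-squared values; given the experts' observed ratings and a regression model's predicted ratings, the statistic can be computed as:

```python
from statistics import mean

def r_squared(observed, predicted):
    # Proportion of variance in the observed ratings accounted for by
    # the model's predictions: 1 - SS_residual / SS_total.
    grand_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - grand_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

An R-squared of 0.40 to 0.64 thus means the temporal indices reproduced roughly half of the expert-to-expert variation in the ratings.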
A discriminant analysis further indicated that the temporal cohesion
indices were highly predictive of text genres (i.e., science, history, and narra-
tive), and were able to classify texts as belonging to a particular genre with
very good reliability (i.e., recall and precision ranged from 0.47 to 0.92, with
an average F-measure of 0.68). The results indicated that narrative and
science texts were most different in terms of temporality, whereas history
and narrative texts were more similar. Science texts contained fewer temporal
adverbial phrases compared with narrative and history texts, whereas narra-
tive texts contained more than history texts. Narrative texts also contained
more positive temporal connectives than did the other two types. This
suggests that temporal adverbial phrases and temporal connectives are stylistic
markers of narration. The incidence of present tense was higher in science
texts than in both history and narrative texts, whereas the incidence of past
tense was higher in narrative texts. This makes sense because stories often tell
of past events whereas science is prone to articulate generic, timeless truths.
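The F-measure used above to summarize classification reliability is simply the harmonic mean of precision and recall for each genre:

```python
def f_measure(precision, recall):
    # F1: harmonic mean of precision (what fraction of texts assigned to
    # a genre truly belong to it) and recall (what fraction of the genre's
    # texts were found). The harmonic mean penalizes imbalance.
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot earn a high F-measure by inflating recall at the expense of precision, or vice versa.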
textbook on physics, and physics texts prepared by Kendeou and Van den
Broek (2009) for psychological experiments. They discovered that the
Coh-Metrix profiles were very similar for college students interacting with
AutoTutor versus a human tutor, and were very similar for the two texts that
deliver information in a monologue (the physics textbook and the experiment-
ers’ texts), but radically different for tutorial dialogues versus monologue
texts. Compared to the tutoring discourse, the two expository monologues
tended to be less fragmented, have more complex sentence syntax, and have
higher referential and situation model cohesion. Some of these differences are
compatible with the reported differences between print and oral language that
were identified in the early 1980s (Tannen, 1982). These results further confirm
the utility of the Coh-Metrix measurement profiles in discriminating different
types of texts and discourse registers.
Another style of discourse is related to truth versus deception. Duran, Hall,
McCarthy, and McNamara (2010) investigated whether cohesion and other
Coh-Metrix indices discriminated between deceptive and truthful dialogues.
The deceptive and truthful conversational dialogues
were collected by Hancock, Curry, Goorha, and Woodworth (2007) within
an instant-messaging (IM) environment. The Hancock and colleagues’ study
included 66 students who were randomly paired to create 33 same-sex inter-
locutor pairs. Each interlocutor was placed in a separate room to communi-
cate about various conversation topics using IM. One person in the dyad was
assigned the role of the sender to initiate and maintain the conversation, and
the other was the receiver. The sender was instructed to be truthful on two
topics and deceptive on the other two topics.
Duran et al. (2010) used Coh-Metrix to examine which indices were
predictive of the use of deception. The results indicated that the linguistic
features that characterized the deceptive exchanges were substantially differ-
ent from those that characterized the truthful ones. When the sender was
instructed to be deceptive, the conversational dialogues of both the sender
and receiver were characterized by (a) more words overall, but fewer words
used per conversational turn; (b) more meaningful words; (c) greater syntac-
tic complexity; and (d) lower cohesion (as measured by LSA given-new). The
latter result indicates that deceptive dialogues contained proportionally more
new information relative to the preceding context. The deceptive dialogues were not characterized
by higher referential cohesion, and so the deceivers did not seem to reiterate
or repeat information, but rather tended to include fewer semantic focal
points. They hypothesized that the truthful events were more extensively
linked in memory than were the fictitious details comprising the lies. When
recounting a truthful story, one detail reminds the sender of a related one,
conclusion
This chapter has focused on studies that have validated or made use of the
Coh-Metrix measures of cohesion. We did not include every Coh-Metrix
study involving cohesion indices, and we did not describe the multitude of
studies that have focused on other indices. We focus here on cohesion because
it is central to the purpose of Coh-Metrix. Cohesion measures are a unique
contribution of the Coh-Metrix tool and project. In our laboratory, the
measures are often used to assess the features of texts used in the context of
experimental studies of text comprehension. Coh-Metrix has also been used
in the context of a variety of corpus studies including validation studies,
exploratory studies, and natural language studies. This chapter has described
a plethora of studies that have shown that cohesion is an important feature of
text and discourse. These studies collectively demonstrate that Coh-Metrix
indices serve as valid proxies for their intended constructs, and that what they
measure is predictive of types of texts and human performance in theoret-
ically guided directions.
part ii
The Strategy
By now you should have a fair idea of what the Coh-Metrix tool is, what it is
for, where it all came from, and how to use it. However, knowing how to
operate a text analysis tool like Coh-Metrix and knowing how to write up a
research paper using a tool like Coh-Metrix are two very different things. In
this part of the book, our goal is to show you how to write such a paper. What
we have in mind is a short project paper, the kind of paper that would serve
well as a term paper, a conference proceedings manuscript, or even the basis
of a journal article, thesis, or dissertation.
A term paper, a conference proceedings manuscript, a journal article, a
thesis, and a dissertation may all sound like very different composition types.
However, there is a remarkably similar thread that runs through each of them.
After all, whatever the Coh-Metrix project is, there is still the need to answer
for the project's audience such questions as What is the project about?,
Why was it done?, How was it done?, What are the results?, and What does
it all mean? In many ways then, whether writing something as short as an
abstract or as long as a dissertation, the key aspects of a research paper
are almost always present. It is those key aspects, questions, or communica-
tion moves (Swales, 1981, 1990) that we will be highlighting and discussing
in this part of the book. By showing you where in the composition these
moves occur, what they function as, what they look like, and how to write
them, we hope to provide you with a thorough guide to writing an excellent
Coh-Metrix research paper.
What we offer in this section of the book is what some call a cookie-cutter
approach to writing. Some may hold this approach to writing in disdain
because it is formulaic and, like a menu-driven statistical tool, may result in
writing without thinking. However, we have found that beginning writers –
and in this case, beginning users of Coh-Metrix – benefit immensely from
writing formulas. Usually writers have to discover these formulas by trial and
error. Here we have attempted to speed up that process by not only providing
the cookie cutter but also providing multiple examples of Coh-Metrix
research.
Notably, we offer the following chapters to students and other novice
researchers who have little experience in writing, and in particular in writing
about the types of corpus analyses we describe in this book. As such, you
will see that we use a different rhetorical voice in this section. The previous
section covered theoretical, technological, and empirical information about
Coh-Metrix. The voice there was one similar to the writing you will find in
empirical chapters, proceedings, and journal articles. In this section, by
contrast, we adopt a voice directed at the student and the novice researcher.
We hope you like this kinder, gentler us.
process of studying this book. On the other hand, we do not assume that you
have already written a research paper or even an abstract for a research paper.
And we do not assume that you have specific ideas on what is mentioned in a
Method section, written in a Results section, disseminated in a Discussion
section, or composed in a corpus.
over, usually because the wife or girlfriend has found someone else. As the
name suggests, such a letter typically begins with the words Dear [+ name].
The Dear [+ name] is a move that serves to signal the opening of the letter;
that is, who the letter should be read by and that the essence of communica-
tion is about to be presented. Other moves in a Dear John letter might include
the cause/excuse for the splitting up, the reason why the split had to be conveyed
by mail, and, comfortingly enough, a sincere and heartfelt wish for the
recipient’s future happiness. The order of the moves is critical. For example,
it would not be suitably gripping for the writer to inform the recipient of the
forthcoming Hawaiian vacation with the new lover before she had actually
performed the move of notifying the current beau that it's all over. It is
noteworthy that moves from other, similar discourse structures (e.g., a post-
card) may not be appropriate. For example, a move that requests any happy
news from the recipient will likely be absent in a Dear John letter. Also absent
from a Dear John letter will be the move that often signals the end of a
communication, specifically hope to see you soon!
Moves are the functions of parts of texts, or what Mann and Thompson
(1988) call rhetorical functions. These functions are sometimes explicitly articu-
lated in the texts with words or phrases that signal the function to the
experienced reader. The words and phrases are typically frozen expressions
because their meaning over time has become fixed, broadly accepted, and
widely understood within the discourse community. For example, in a Dear
John letter, we can see that frozen expressions are very common among the
moves. The word “dear” in Dear [+name] is not arbitrary: It was chosen instead
of other alternatives such as “Hi” or “Hey.” The “dear” conveys a more formal
tone for such a note. This formality signals that it is unlikely that the commu-
nication relates to something mundane like setting the TiVo or feeding the cat.
Instead, “dear” is more likely to signal to an intimate partner the move that
conveys “Listen up, I have some news, and you ain’t gonna like it.” Other frozen
expressions in the Dear John letter include “I think we’ve both known for a long
time,” “I will always treasure . . .,” and “you’re too good for me anyway.”
Odd as it may sound, a Coh-Metrix research paper is just like a Dear John
letter: It is composed of a series of scripted moves that are most often in a fixed
order and very often have frozen expressions to signal their function. Also
similar to a Dear John letter, a Coh-Metrix research paper does not allow
numerous moves from other, similar discourse genres or registers. For
instance, a science paper typically has a move at the end of the introduction
that informs the reader as to the forthcoming section headers of the paper
(i.e., Method, Results, Discussion). This move provides a global overview of
the paper, but the reader could also get an overview by perusing sections over
the entire manuscript. This example serves as a reminder that many moves
are conventions that may or may not have a rational foundation. Like all
conventions, they have patterns and parts that the experienced reader needs
to see, and the inexperienced writer needs to learn.
The entire point of writing a research paper, as opposed to merely con-
ducting the experiment, is to clearly convey the researcher’s message to the
researcher’s audience. We have argued here that moves (and their associated
frozen expressions) are useful templates for constructing the research
paper. But moves are not just the scaffolding around which a draft paper is
wrapped, and neither are frozen expressions simply trite or vacuous clichés
that demonstrate a scientist’s lack of originality. Instead, both moves and
frozen expressions are warmly welcomed by readers in the discourse com-
munity because they make understanding the paper both easier and faster.
How do conventional communication moves and frozen expressions make
understanding the paper easier? The answer, simply put, is that they minimize
cognitive load and maximize common ground (Clark & Schaefer, 1989;
Kalyuga, 2012). Our cognitive resources are not limitless, so it is beneficial
to learning if our cognitive processes and activities are optimally managed:
We are likely to learn more if we are free to concentrate on understanding
the substantive content in the text rather than having to use our cognitive
resources to infer the writer’s intentions.
In practical terms, we optimize the reader's cognitive load by presenting our
paper in a predictable form, in a predictable order, and in predictable language.
As such, the more the reader’s expectations can be met, the more cognitive
resources the reader has available for understanding the study’s issue. For
example, the reader needs to know the research question, so explicitly making
a statement such as “Our research question is . . .” facilitates the reader’s
processing. That is, using explicit language means that the reader doesn’t have
to use up valuable cognitive resources by making inferences (which might not
even be correct!). Employing well-established moves and frozen expressions is
facilitative in this respect because they are part of accepted, standard, and
established language that conveys accepted, standard, and established meaning.
getting started
Many experienced researchers view a study as evolving through the following
cycles:
1. Theories beget hypotheses.
2. Hypotheses beget research questions.
Once you have chosen a theme, your next task is to consider the theme’s
practicality. That is, is it even possible to do such a study with the time and
resources available to you? To address that question, consider what you will
need, at a minimum, for a Coh-Metrix study of this kind.
1. You will need "typed texts."1
2. You will need these texts to be about 100 to 1,000 words long.
3. You will need at least 20 of these texts (see Chapter 9).
4. The texts need to be in (relatively) Standard English.
Given these limitations, it is probably not wise to conduct a study on the plays
of James Joyce, because he only wrote two of them. It is probably not prudent
to conduct a study on Russian novels, because they tend to be very long.
Telegrams are not a wise discourse form because the texts are very short,
and not in Standard English. Your time and effort are also serious factors.
Downloading from the Web is very fast, cheap, and easy; transcribing con-
versations, scanning books, and organizing essay collections is laborious and
time consuming.
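A quick way to check a candidate corpus against these constraints is a small screening script. The sketch below assumes each text is a plain string and uses a simple whitespace word count; the thresholds come directly from the list above:

```python
def screen_corpus(texts, min_words=100, max_words=1000, min_texts=20):
    # Keep only texts within the target word-count range and report
    # whether enough usable texts remain for a Coh-Metrix study.
    usable = [t for t in texts
              if min_words <= len(t.split()) <= max_words]
    return usable, len(usable) >= min_texts
```

Running such a check before any analysis saves the far more laborious step of discovering, after collection, that half the corpus is too short or too long to process.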
Narrowing Down the Theme. It is important to start a Coh-Metrix study
(like any serious study) by thinking in terms of bricks rather than houses. The
vast majority of researchers achieved their status as a result of a long series of
experiments, trials, observations, successes, and also failures. If your project
is on literature, you cannot answer a question as broad as “Is American
literature better than British literature?” If you plan on conducting a study
on gender, then it would require a series of studies on many corpora to answer
the question “Is female writing different from male writing?” These questions
are far too broad for any single study to ever address. The secret of a good
Coh-Metrix research paper is to narrow down your theme to a single doable
study. That said, over the course of many such studies, the bricks will
accumulate, but it is only at the end of a long process that we see a fully formed house.
Let’s now return to our list of possible themes and see how we can narrow
them down. For example, instead of just “hobbies,” we could have Traditionally
male hobbies, Traditionally female hobbies, Traditionally children’s hobbies,
Outdoor hobbies, Indoor hobbies, Winter hobbies, Summer hobbies, American
hobbies, Alaskan hobbies, New hobbies, Getting started in hobbies, Hobbies as
written by Americans, Hobbies as written by Australians, Hobbies as written by
1 Coh-Metrix can only process typed texts (not handwritten texts). Coh-Metrix, like most related
software, typically expects documents to be in the .txt format, although variations of the .doc
format may also be used. As technology develops, Coh-Metrix is likely to adapt to new and various
formats of documentation. Because of these changing circumstances, we provide document
settings and document-loading instructions on the tool itself.
Australians who became Americans, and so on and so forth. The point here is
to narrow down the theme, and to continue to narrow it down until you have
one very specific topic, which will be the subject of your study. In the example
we are using in this chapter, we narrowed down the broad theme of newspaper
stories to “the reporting of local versus global issues in newspaper stories.”
Our study focuses on the language features of newspaper reports. More specifi-
cally, we are interested in the differences between language used for the reporting
of international news (i.e., global issues), and language used for the reporting of
national news (i.e., local issues). Our research question is: Does the language of
news reports become more complex when reporting global issues as opposed to
local issues? And if so, what features of language are driving these differences? To
address our research questions, we formed two contrasting hypotheses. The first
hypothesis is that the language of news reports will become more complex when
reporting global issues because any reporting of global news is likely to be a more
important story, and therefore more difficult to explain: The language of the
report will reflect this difficulty. In contrast, our second hypothesis is that the
language of news reports will become less complex when reporting global issues
because the difficult nature of describing such world issues will cause writers to
use facilitative language: The language of the report will reflect this facilitation.
This study builds from the work of researchers such as Herb Clark, Art Graesser,
Walter Kintsch, Danielle McNamara, and John Swales. Their research suggests
that background knowledge, schemas, and expectations of shared experience need
to be established in order to increase the likelihood of comprehension, and that
explicit cohesion at the level of the text might facilitate this goal. Based on this
theory, we can expect some measure of assumed common ground between writer
and reader for local issues. As such, there will be little need for simple language or
explicit textual cohesion. However, if the writer pays little or no attention to the
focus of the report, then the complexity of the global issues might manifest itself
only in more complex and less facilitative language. Our goal in this study is to
discover and assess the language differences used in the reporting of local and
global issues, and, based on our findings, to offer some idea as to the effect these
language features might have on the communicative goals of writers. In order to
address our research question, we will construct two contrastive corpora: one of
newspaper stories concerning local issues, and one of newspaper stories consid-
ering global issues. Having formed the two corpora, we will process the text using
various cognitive and linguistic indices from Coh-Metrix, including situation
model, referential, causal, temporal, spatial, syntactic, and lexical diversity
indices. Coh-Metrix is particularly well suited to this study, having had its indices
validated in numerous previous studies. We will assess the differences between
the corpora by conducting a series of t-tests. The study is of interest to writers,
especially reporters, because their task is to effectively communicate information
to those who wish to learn. The task is also important to linguists and cognitive
scientists because it stands to better explain how differences in perceived catego-
ries (local, global) are made manifest through linguistic features.
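The series of t-tests mentioned in the pitch can be sketched in a few lines of Python. The scores below are invented stand-ins for Coh-Metrix index values (real values would come from the tool's output files), and the SciPy library is assumed to be available:

```python
from scipy import stats

# Hypothetical referential-cohesion scores for the two corpora
# (in a real study these values would come from Coh-Metrix output).
local_scores = [0.42, 0.38, 0.45, 0.40, 0.44, 0.39, 0.41, 0.43]
global_scores = [0.31, 0.35, 0.29, 0.33, 0.30, 0.34, 0.32, 0.28]

# Independent-samples t-test: do the two corpora differ on this index?
t_stat, p_value = stats.ttest_ind(local_scores, global_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In an actual study, one such test would be run for each Coh-Metrix index of interest, which is why the pitch speaks of "a series of t-tests."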
This Elevator Pitch may seem long on time and complex in structure. However,
as we shall see, neither is really the case. First of all, considering the length, the
aforementioned Elevator Pitch takes just two minutes to recite. Such a length
of time may be longer than many elevator rides, but even the most stuffy of
professors can usually (quite literally) spare two minutes for a student. Turning
to the complexity of structure of the pitch, we can actually see that the text
breaks down into a series of moves, the function of which can be represented by
a series of questions. In total, we use 11 Elevator Pitch questions (see Table 7.1).
Generally speaking, if all 11 of these questions have been answered, then your
Elevator Pitch work is complete.
Before we start describing the moves in more detail, it is important that
we make a quick note on the pronoun use we have adopted in this and the
forthcoming chapters. There are four authors of this book, so we always
use the pronoun “we.” If you are writing your project as a single author,
you can use the pronoun “I.” However, many researchers balk at the first
person and prefer passive constructions. It is probably a good idea to talk
this issue over with your advisor, or to read sample articles in the publica-
tion outlet.
Let’s now look more closely at just the first two of these questions. The other
nine moves will be discussed over the remainder of this section of the book.
conclusion
In this chapter we introduced the basic structure of a Coh-Metrix research
study. We outlined the major parts of the study – Introduction, Method,
Results, and Discussion – and we explained that each of these sections
comprises fairly standard moves, which are often constructed with the help
of standard frozen expressions. With regard to the moves, we discussed
choosing a theme for the study and narrowing that theme down to a workable
size. With regard to frozen expressions, we explained why they are useful and
why they are expected. We also provided several examples of frozen expres-
sions. In the next chapter we will be discussing the major moves of the
introduction section of a research paper.
The Introduction
researcher for the answer to be “yes.” That is, supplementary questions are
often much more speculative, and researchers are required to present a
number of ideas that might address these questions. Indeed, these questions
are often the basis for “further research,” a notion we revisit later in this book
when we describe the Discussion section (see Chapter 12).
As we mentioned in the section called “Moves” in Chapter 7, a research
paper often has frozen expressions that signal a specific meaning to the
experienced reader. For the research question, the frozen expression is,
simply enough, “Our research question is . . .” This frozen expression may
not seem like rocket science; however, many new researchers think they have
to be original in their writing when, in fact, broadly accepted terminology is
far more likely to be well received.
As a final remark for this section, we should keep in mind that the main
purpose of the research question is to keep the paper focused. That is, the
researchers (and the subsequent readers of that research) should always be
able to relate any part of the paper to the research question. In yet other
words, if the relationship between the research question and any subsequent
part of the paper isn’t immediately apparent, then either that part of the paper
or the research question needs to be modified. This having been said, we also
need to remember that different researchers have different styles of inquiry,
and different research has different demands on what kinds of questions can
be asked. As such, what we have written here on research question format
should serve the beginning researcher well, but it should never be treated as a
straitjacket.
Theory
Researchers in most academic fields consider the words “theory” and “hypoth-
esis” to have quite different meanings. By contrast, in informal situations, the
Theoretical Frameworks
To help us better understand how theory generally takes shape in discourse
science, we should examine the term “theoretical framework.” A theoretical
framework can be viewed as a preliminary theory. More specifically, a theo-
retical framework is a preliminary sketch of a complex system that organizes
a collection of related findings that researchers have packaged and presented
in a coherent fashion. This package may range from the entirely new to a well-
established cohort of findings supported by rigorous empirical studies. In
Coh-Metrix studies, a very pertinent example of a theoretical framework is
cohesion.
You may be familiar with a number of other terms that are very closely
related to what we have called theoretical framework. These terms include
“literature review” and a “major area paper.” Essentially, a literature review
(which is often a chapter in a dissertation) and a major area paper (which is
often a requirement for a doctorate degree) are examples of an extensive,
Hypotheses
Earlier we argued that a theory is a broadly accepted explanation and under-
standing of some phenomenon in the world. Thus, if a theory is an explan-
ation or an understanding, then theory, in whatever form, allows us to make
predictions: If we understand something, then we not only know how it works
but also how it will work. The articulation of a prediction is the application of
a theory, and when stated formally, it is referred to as a hypothesis.
A hypothesis is closely related to a research question. Recall from our
Elevator Pitch that our research question was: “Does the language of news
reports become more complex when reporting global issues as opposed to
local issues?” This research question had two corresponding hypotheses:
(1) the language of news reports will become more complex when reporting
global issues; and (2) the language of news reports will become less complex
when reporting global issues. From these examples we can see that hypotheses
are research questions set up as claims (or predictions). Also of importance,
note the word “will” in the hypothesis. This word is common in hypotheses
because we are predicting, and predictions are about the future. Also note that
hypotheses are often accompanied by an explanatory or supportive state-
ment. That is, the hypothesis states what will happen and the supporting
statement explains (briefly) why it will happen. For example, in our Elevator
Pitch, one of the supporting statements was “. . . because any reporting of
global news is likely to be a more important story, and therefore more difficult
to explain.” The clearest way to mark a supporting statement is to use the
word “because.” This word easily signals to the readers that the forthcoming
text will explain the preceding claim.
A researcher tests a hypothesis in order for us to learn more about the
theory from which the hypothesis was generated. If the results of the experi-
ment support the hypothesis, then the theory is strengthened. If the results of
the experiment do not support the hypothesis, then we may have misunder-
stood the theory, misarticulated the theory, or misapplied the theory. On the
other hand, if the results of the experiment are contrary to our hypothesis,
then we may have to reassess the theory, revise the theory, or reject the theory.
And often we simply have to reexamine the data, rethink the analysis, or redo
the experiment.
The important point with theories and hypotheses is that they are the
elements of the method through which we learn about the world. The theory
represents our current understanding, and our goal is to expand that under-
standing. To do so, we extrapolate from the theory an inference (i.e., a hypoth-
esis) about an as yet uncharted area of the framework. We then test the
hypothesis so that we might have evidence that will lead us to a better under-
standing of the world.
Applying Hypotheses
Let us now turn our attention to how hypotheses are put in place in a research
paper. Every research question has at least two hypotheses: H0 and H1. We use
H0 to designate what is called the null hypothesis. The null hypothesis is the
assumption that all things in the world are equal. The purpose of conducting a
test is to establish whether there is sufficient evidence to reject this null
hypothesis. That is, we want to establish whether there is sufficient evidence
to support H1, which is the theory-based prediction that at least two
things in the world are not equal. There can also be predictions motivated by
other theoretical frameworks or even theories that predict something very
different than H1 or H0, which can be designated as H2, H3, and so on. In
our Elevator Pitch example, we can say that H1 is the hypothesis that global
newspaper articles are more cohesive than are their local counterparts; we can
say that H2 is the hypothesis that local newspaper articles are more cohesive
than are their global counterparts; and because there is always an H0, we can
say that H0 is the hypothesis that the two categories of articles are equal in
terms of cohesion.
A Coh-Metrix study that is a nice example of such an H0, H1, H2 scenario
is provided by Lightman et al. (2007a). Erin Lightman and her colleagues
investigated cohesion in expository texts. Their research question was: Does
cohesion vary as a function of page progress through a book chapter? From
this question they formed three hypotheses:
1. Cohesion will remain relatively constant as a text progresses because
all places in a text are equal (H0).
2. Cohesion will gradually decrease as a text progresses because greater
cohesion is needed at the beginning of a text where the student is least
likely to understand the material (H1).
3. Cohesion will gradually increase as a text progresses because as a text
develops it becomes ever more complex and will subsequently need
greater authorial connections (H2).
In this format then, we are not looking so much at an arrangement of a yes/no
question, but at an arrangement of if hypothesis H1 is correct, then expect results
R1, but if hypothesis H2 is correct, then expect results R2. On the other hand, if
there is insufficient evidence for either H1 or H2, then we cannot reject H0.
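The if/then arrangement above can be expressed as simple decision logic. The function name, the alpha threshold, and the convention that a positive test statistic favors H1 are all illustrative choices for this sketch, not part of the Lightman et al. study:

```python
def interpret(t_stat, p_value, alpha=0.05):
    """Map a two-tailed test result onto the H0/H1/H2 arrangement.

    A positive t statistic is taken to mean the scores moved in the
    direction predicted by H1 (an assumption of this sketch).
    """
    if p_value >= alpha:
        return "insufficient evidence: cannot reject H0"
    return "results R1 support H1" if t_stat > 0 else "results R2 support H2"

print(interpret(3.1, 0.004))   # clear difference in the H1 direction
print(interpret(-2.8, 0.009))  # clear difference in the H2 direction
print(interpret(0.6, 0.55))    # nothing to report beyond H0
```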
1 From a statistical point of view, an analysis that is not guided by some kind of theory is also of questionable validity. That is, a "statistically significant result" is only really valid if we can "reject the null hypothesis." However, if there is effectively no hypothesis, then it is difficult to argue that it has been rejected, meaning the result is uninterpretable.
way or another) the cohesion of text. One of the measures in Coh-Metrix that
is strongly related to cohesion is lexical diversity (see McCarthy & Jarvis,
2007). Lexical diversity is an assessment of the range of vocabulary employed
in a text. Texts with a lower range of vocabulary should have higher cohesion
because the same content words are used repeatedly, and that should lend
itself to cohesion. For the first version of Coh-Metrix, the measure used to
assess lexical diversity was type-token ratio (TTR), an index described in
Chapter 4 of this book. Unfortunately, TTR is confounded by variations in
text length, meaning that researchers who wanted to assess texts of different
length for lexical diversity could never quite be sure whether they were
measuring different vocabulary ranges or just different lengths of texts. To
overcome this problem, a new index of lexical diversity (MTLD; McCarthy &
Jarvis, 2010, 2013) was designed, and this new index was tested to establish the
degree to which it was resistant to variations in text length. In the article that
reported the testing of MTLD, the authors situated the study just as we have
done in this paragraph: by first of all “establishing the problem” with lexical
diversity, and then showing how that problem had been addressed.
As we can see from this example, there are many people for whom this work
might have relevance, and many reasons for why it is relevant to them.
The relevance of a study is often overlooked in papers because the study’s
researchers and the study’s audience are both part of the same discourse
community. In other words, the people who care about the findings are other
people just like those who are conducting the research. As such, there is
considerable assumed common ground. However, even if the study is primarily
of interest to the field within which you are working, you would still hope that
the findings you are planning to report will lead to a better understanding of the
issue you have identified. With this in mind, consider the following extract by
Art Graesser and his colleagues: “[U]nderstanding at the level of the mental
model has particularly important implications for comprehension because this is
the level at which many readers struggle” (Graesser et al., 2003, p. 90). The study
was clearly written for an audience familiar with such concepts as mental models
and comprehension, but the relevance of the study is still explicitly stated so that
all readers can understand how the study is important to the developing field.
Just as the relevance of a study is often overlooked by writers, so too can it
be overlooked by readers. Often, the relevance of the study comes straight
after the purpose of the study, so inexperienced readers might not even notice
it as a distinct move. For example, in the following extract, Scott Crossley and
his colleagues say who should care about the study and what they should care
about (see italics in the excerpt) almost immediately after they remind us
what the purpose of the study was (see underlined).
The purpose of the study was to examine whether a tool such as Coh-Metrix could
discriminate between comparable text-types and provide useful information
about the subtle differences between texts. The results of this study suggest that
computational tools such as Coh-Metrix can be used as a means of distinguishing
groups of similar text-types. From a practical standpoint, the findings provide
researchers interested in the field of second language material development with
fundamental information about how simplified and authentic texts differ and to
what degree. (Crossley et al., 2007, pp. 208–209)
To help readers (noting that reviewers and professors are readers too) identify
the relevance of a study, it is probably a good idea to point out exactly to whom
the study is of interest and exactly why it is of interest to them. For example,
maybe the study has practical benefits, making it of interest to textbook
designers, teachers, or developers of intelligent tutoring systems. If so, make sure
that a good number of examples of what you are studying are included in the
paper, so that developers can easily establish how the research can be applied. If
the study is more directly of interest to the field, then you need to state clearly
which area of the field and why your study is of benefit to that area of the field.
Applying Frozen Expressions. As ever, there are some frozen expressions
that may be of use when writing relevance moves. For example, we can write:
“This study is of interest to X because Y.” In this formalism, X is who should
be interested and Y is the reason they should be interested. Sometimes, there
is just one major interested party. In this case, a helpful frozen expression is:
“This study is important because X.” Here, X is why people (or the field in
general) should care about the study.
Finally, just in case you might be pondering the value of frozen expressions
like these, we present below a little indication of their widespread use and
growth. The numbers associated with the frozen expressions that follow are
the number of Google hits for the phrase, as taken in June 2012. The numbers
in the parentheses are for the same phrases as recorded a year earlier (June
2011). We’ll leave the math (and the implications of the math) to you:
“this study is of interest to” = 96,700 (56,900)
“this paper is of interest to” = 65,800 (36,800)
“this project is of interest to” = 270,000 (53,400)
“this work is of interest to” = 1,290,000 (250,000)
conclusion
In this chapter we discussed forming a research question and supplementary
questions, stating theoretical frameworks and hypotheses, situating and
integrating theory, identifying the purpose of the study, and ensuring that
the relevance of the study is made explicit. In the next chapter we will be
discussing the material for the study (i.e., the texts comprising the corpus).
The Corpus
In many Coh-Metrix text analysis studies, there is no section with the label
“Method.” Instead, most Coh-Metrix text analysis papers tend to have two
major sections that lie between the Introduction and the Results: These
sections are descriptions of the corpus and the tool, respectively. The sections
on the corpus and the tool largely serve the same purpose as traditional
Method sections. That is, instead of describing the participants in the experi-
ment and how the experiment was conducted, the papers discuss the texts in
the corpus and the variables used from Coh-Metrix. Some Coh-Metrix corpus
studies do use a “Method” header, which is often followed by subheaders for
the description of the corpus, the tool, the variables, and so forth. The final
choice for headers is up to the researcher, the professor, or the conference/
journal guidelines.1 Whatever your headers, however, the next two major
sections we have to consider are the corpus (i.e., the collection of texts we will
use) and the tool, Coh-Metrix (what it is, what it does, why we’re using it, and
how we’re using it). This chapter focuses on the first of those sections, the
corpus.
In Chapter 8 we used the research question as our starting point for a Coh-
Metrix project. We also mentioned that most researchers would argue that
the starting point must be the theoretical framework. However, whether you
start with a research question or with theory, you will very soon afterward
need to be considering your corpus, and continue considering your corpus
during most of the research process.
A corpus is a collection of texts. These texts are the subject of any Coh-
Metrix analysis. The texts are of immense importance because they are the
1 For those interested in a detailed account of Method sections geared more toward psychology research papers, or if you are planning to do research that involves human participants, we recommend you also look at Kallet (2004).
empirical manifestations of the hypothesis you are testing (see Chapters 8 and
11 for more on hypothesis testing). Building a corpus is no simple matter, and
many criteria have to be considered (e.g., what kinds of texts should be in it,
how large does it have to be, etc.). Careful considerations of these and other
questions are just as important as forming the research question, the
hypotheses, and the theory.
With all of these points in mind, we shall now carefully examine the
concept of the corpus (plural: corpora) and, more particularly, the character-
istics of corpora that are suitable for Coh-Metrix studies.
what is a corpus?
At a basic level, a corpus is a set of texts that are relevant to the research
questions and that have relevant themes, registers, genres, or text types. At a
more sophisticated level, we can consider a corpus to be “a set of written,
representative and balanced, computationally readable texts that form a
reasonable point of departure as a thematically related language variety,
register, genre, or text-type.” Clearly, this long definition requires some
breaking down, and so the remainder of this section of the chapter examines
each of the elements in this definition so as to provide a better understanding
of what Coh-Metrix studies typically consider to be a corpus.
Language Variety, Register, Genre, or Text Type. By language variety,
register, genre, or text type we simply mean that we have no intention of
splitting hairs over these categorization terms, or trying to define where one
category ends and another one begins (interesting study though that may be).
We acknowledge that any number of researchers may feel that a distinction
between some of the terms is crucial. And, to be sure, we would probably call a
corpus of “narrative introductions” a text type rather than a genre, and a
corpus of public speeches a register rather than a language variety. However,
in Coh-Metrix studies, we have yet to experience reviewers having a problem
with how we choose to use these terms, so we leave the choice of terms up to
the individual researcher.
Written, Computationally Readable Texts. By written, computationally
readable texts we mean that Coh-Metrix can only analyze that which is
computationally analyzable. More simply, there is no slot in Coh-Metrix
through which we can deposit handwritten texts, painted texts, CDs of
talks, DVDs, or any example of sign language or Braille. Although making
such remarks might seem obvious, it is nevertheless important to consider
these limitations of Coh-Metrix because (1) many people ask us, (2) future
developments in Coh-Metrix need to consider these aspects because they are,
after all, language too, and (3) if the researcher’s texts are in any of these
forms, then they will have to be changed to .txt or .doc documents, a process
that might be extraordinarily long and arduous.
Thematically Related. By thematically related we mean that every text in the
corpus is related to every other text in the corpus by a single theme. Thus, just as
“eagle,” “crow,” “robin,” and “swan” are all related to the common theme of
“birds,” so too must every text in a corpus be an example of an overarching
theme. In our example corpus (which we introduced in the previous chapter),
all of our texts fall under the common theme of newspapers.
Representative and Balanced. The terms “representative” and “balanced”
are closely related to the previously discussed notion of “thematically related.”
The key difference is that while thematically related puts the focus on the need
for the texts to be members of a single theme, representative and balanced put
the focus on the need for the theme to have an appropriate membership of
texts. To explain further, the terms “representative” and “balanced” address
the reasonable expectation of someone using the corpus to find within it a
suitable diversity of types of text and a suitable frequency of examples of these
types. To draw an analogy, let us imagine that we happen upon a building that
calls itself Los Compadres. And let us imagine that this building has pinned
on its wall a sign that reads “restaurant.” Within the building, which we take to
be a Mexican restaurant, it would be reasonable for us to expect food items
that included burritos, tacos, enchiladas, and the like. It would also be
reasonable to expect tables, chairs, beer taps, and servers. The presence of
these diverse items constitutes “representativeness.” But now imagine that
inside this building there were just one burrito, one server, one kind of beer,
and 5,000 tables. Such a frequency of examples of the membership would be
extremely poorly “balanced.” Thus, balance refers to an appropriate number
of examples of the membership items.
Turning from a Mexican restaurant to a more text-like example, imagine a
corpus of American newspapers. A corpus of American newspapers is not
simply a corpus of newspapers; it is explicitly a corpus of American newspapers.
As such, it should not contain British, Australian, or Icelandic newspapers
because British, Australian, and Icelandic newspapers are not representative of
American newspapers. And if the corpus of American newspapers is truly a
corpus of American newspapers, then it would have to have both national and
local newspapers, because if it had only national newspapers, then it would be a
corpus of American national newspapers. Further, American national news-
papers have been around for more than 100 years, so if the corpus contained
only articles from, say, 1990 to 2010, then it would not be a corpus of American
national newspapers; it would be a corpus of articles from American national
newspapers from 1990 to 2010. And so on and so forth. The point here is that a
researcher needs to consider very carefully the scope of the corpus in order to
make it sufficiently representative (i.e., having all the major members) and
sufficiently balanced (i.e., having appropriate numbers of the major members).
But note here the use of the word “major.” We will return to this point later in
the section.
But the terms “representative” and “balanced” don’t apply just to the
diversity of the total items in the corpus. They also apply to the diversity
within the items themselves. That is to say, we must not make the mistake of
thinking that all texts are homogeneous; instead, we must accept that texts
(like pretty much everything else) are made up of many different parts, each
of which may be quite different in nature. For example, let’s consider a
news show, a restaurant dinner, and an ice hockey game. Now let’s divide
each of these examples into thirds (first third, middle third, and final
third). Arguably, the first third of the news show is the most important
part because that’s where the headlines and big stories are most likely to
be. For a restaurant dinner, the appetizer and the desert may be highly
enjoyable, but the middle third (the main course) is probably what the
customers will remember most about the dining experience. And in an
ice hockey game, action can happen in any of the three periods, but it's
probably the third period (i.e., the final third) that most people would want
to watch if they could only view one part of the game. A text is very similar.
The opening and the closing are quite different aspects, so much so that they
have come to be known by various names that identify them as distinct
types: openings are variously referred to by terms such as exposition,
introduction, foreword, commencement, and preface; closings are variously
referred to as the denouement, conclusion, postscript, and finale. Even texts as
small as the paragraph may open with something called a topic sentence and
close with something called a warrant sentence (McCarthy et al., 2008). And
we cannot even assume that the openings and closings are equal in size; after
all, the opening of War and Peace (a 2,000-page tome) is hardly equal in
length to the opening of Three Little Pigs. And what about the middle of the
text? Is the middle only the very middle? How many words on either side of
the middle are also “in the middle”? All of these questions need to be
carefully considered so that the corpus can be justified as representative
and balanced.
For Coh-Metrix analyses, it is vital that the corpora be representative and
balanced. However, let us make it clear that in research in general, the
composition of the corpus depends on the task at hand. For instance, imagine
that we wanted to examine the language of English, with all of its history and
variety. And imagine that to do this we used only one text type, let’s say
newspapers. And imagine further that the representativeness of this corpus
amounted to no more than a single type of newspaper, let’s say The Wall
Street Journal. Such a corpus, you might think, would be extremely flawed (given
what we have previously discussed). However, it is interesting (and maybe a
bit worrying) to note that the majority of computational parsing technology
(including the parser used in Coh-Metrix) has been developed, tested, and
validated on exactly this highly unrepresentative corpus.
Shouldn’t this lack of representation present a problem? In fact, it really
doesn’t present that much of a problem at all (at least for some tasks!).
Even though The Wall Street Journal is extremely unrepresentative of
English language as a whole, it is nevertheless a pretty large sample and it is
written in English. These two elements alone mean that a colossal amount of
information can be gleaned from it. Indeed, when Gildea (2001) assessed
state-of-the-art parsers by replacing The Wall Street Journal with the Brown
Corpus (arguably the very model of representativeness and balance, having
15 different registers and numerous examples of each), he found the two
corpora produced remarkably similar results.
The point with a corpus as seemingly unrepresentative as The Wall Street
Journal is that we can learn a lot from it. That is, we can still learn a lot from it
if our task is appropriate. For instance, we can use the corpus to learn that the
most common word in the English language is “the,” and within the corpus
we can find numerous examples of typical English syntax: subject-verb-
object. We can also search the corpus to see what is rare in English. For
example, the part-of-speech structure verb-noun-verb-adjective-article-verb
is very uncommon in English (and very uncommon in The Wall Street
Journal corpus). Having identified which structures are rare, we can assume
that those structures will be difficult for readers to process. In short, then, we
can do (and indeed have done) numerous investigations with this corpus, and
the findings from these investigations can be (and indeed have been)
extremely valuable to a wide variety of research fields.
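As an illustration of the kind of corpus query described above, here is a minimal sketch that counts word frequencies across a set of plain-text files. The `corpus/*.txt` layout and the simple tokenizer are assumptions for the example, not part of Coh-Metrix:

```python
# Count word frequencies across a corpus of plain-text files.
# The corpus/*.txt layout and the simple tokenizer are illustrative only.
import glob
import re
from collections import Counter

def word_frequencies(pattern="corpus/*.txt"):
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            counts.update(re.findall(r"[a-z']+", f.read().lower()))
    return counts

# On any sizable English corpus, word_frequencies().most_common(1)
# should put "the" at the top of the list.
```

The same loop can be repurposed to count part-of-speech sequences once the texts have been tagged, which is how rare structures of the verb-noun-verb-adjective-article-verb kind would be identified.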
There are, however, certain things we can’t do with a corpus such as The
Wall Street Journal. We cannot address research questions such as: Are
higher-graded student essays more cohesive? Are doctors’ conversational
turns more cohesive than patients' turns? Does newspaper English have
more examples of referential cohesion than causal cohesion? We cannot
address any of these questions because: (1) The Wall Street Journal is not in
any way a graded student essay, so we cannot make any claims that are
specifically about graded student essays; (2) The Wall Street Journal is not
in any way a conversation, so we cannot make any claims that are specifically
about conversational English; and (3) although The Wall Street Journal is an
example of a newspaper type, we cannot make any specific claim that it
“generalizes” to all newspapers. As such, the general rule of thumb for
satisfying representation is: the wider you make the representation, the more
able you are to generalize your conclusions.
A Reasonable Point of Departure. By reasonable point of departure we
mean that we don’t need a “perfect corpus”; we just need one that gets the ball
rolling. The concepts of representativeness and balance (discussed earlier)
make it extremely time consuming and expensive to collect the “perfect
corpus.” Put another way, the concepts of representativeness and balance
mean that the corpora we make must be extraordinarily narrowly defined in
order to be appropriately representative and balanced. Many researchers
working in the field of corpus linguistics take these issues extremely seriously
and dedicate huge amounts of time and effort to making remarkable corpora
that are impressively representative and balanced. The British National
Corpus is a good example of this dedicated effort, as are the famous Learner
Corpus and Brown Corpus. Another prime example is the TASA corpus,
which we have used for many purposes including the calculation of the
Norms in Appendix B.
For a Coh-Metrix study, an expansive effort in constructing the corpus is
not usually required. A corpus of the type used in Coh-Metrix studies is not
the same (nor meant to be the same) as a corpus such as Brown, Learners, or
TASA. Corpora such as those are painstakingly constructed as reference
points, suitable for multiple, extensive, and recursive examination. In a
Coh-Metrix study, the goal of the corpus is seldom the making of a fine and
solid reference repository. Instead, the goal is defined by the research ques-
tion, and the corpus is simply a means to this end (which is why putting it in a
method section is appropriate). As such, the important aspect of the corpus in
Coh-Metrix studies is that it be practical and suggestive, rather than exhaus-
tive and definitive.
The notion of “practical” and “suggestive” leads us back to the key phrase: a
reasonable point of departure. That is, if our research question requires us to
examine a set of texts to find evidence for or against a claim, then the question
is: Where is a practical place to start, from which the results are likely to be
sufficiently suggestive to guide our future research? Let’s say our research
question is: Are newspaper headline stories more cohesive than editorials?
To address this question, as a reasonable point of departure, we would
probably aim for a minimum corpus of, say, 3 major newspapers, with
40 editions of each over some fairly recent time slot (e.g., the immediately
previous 2 months, or 6 months from the previous year). To be sure, whatever
the results, the findings of this analysis can never be more than suggestive,
because the size and scope of the study is extremely limited. Nevertheless, the
corpus is still a reasonable point of departure because, while a positive result
(one that supports the H1 hypothesis; see Chapter 8) is only suggestive, a
negative result (one that finds no differences at all between the two text-types,
the H0 hypothesis) would almost immediately end the research project
(or dramatically change its direction). More importantly, a positive result
would guide the researcher into the next step of the project, which might
include (1) extending the current corpus to include more major newspapers;
(2) extending the corpus to include local newspapers; or (3) extending the
corpus to include English-language newspapers from other places in the
world. This building of the corpus, directed from the findings of the initial
analysis, returns us to our point about being practical. At the same time, our
negative-result example, leading to a possible abandonment of the project, also
leads us to the notion of practical, because here practical means disposable. After all,
can you imagine spending a year or more making a definitive corpus only
to find nothing at all in the results? As such, it is much better to start with a
small corpus and build out slowly and carefully, one step at a time, letting
the results of one study guide the direction of the next study, and, whatever
the results, offering only small, humble, and hedged claims as to their
generalizability.
There is one further point on this issue of a reasonable point of departure.
The researcher does not have to have a homemade corpus (like the example of
newspaper corpus given earlier). An alternative approach is to use an already
existing corpus (usually one that is established by way of publication). Such a
corpus (we’ll call it the stand-in corpus) may well be perfect for the analysis at
hand, but more often it is not. However, as we seldom have the time and
resources available to put together the perfect corpus, a stand-in corpus is
often a reasonable point of departure. The non-perfect nature of the corpus
makes any results you draw from the analysis suggestive, not definitive, but
these results will still offer direction for future research.
The approach of the stand-in corpus is as commonplace in the discourse
sciences as is the smaller, more practical homemade corpus. To better under-
stand the stand-in corpus, let's take an example and say that our research
question involves a study of narratives and expository texts. The Brown
Corpus would be a reasonable point of departure for this study because
(1) it has many examples of fiction texts in it; (2) it is large, certainly compared
to most Coh-Metrix studies; (3) it is well established; and (4) it is relatively
easy to get a hold of. But, at the same time, the Brown Corpus has problems:
(1) it is old, having been compiled in the 1960s; (2) it is composed of only
American texts; (3) it is limited in scope because all the texts are 2,000-word
extracts; (4) major registers such as African-American literature are not
present; and, most importantly in this example, (5) the research question
addresses narratives and expository texts whereas the Brown Corpus has
narratives and non-narrative texts. To equate the non-narratives with exposi-
tory texts means that the researcher will not gain a definitive answer.
Nonetheless, if the researcher were to find no cohesion differences at all
between narratives and non-narratives (in the Brown Corpus), then a serious
rethinking of the research project would be needed, and many months
(maybe years) would have been saved. From these examples we can better
understand just how important the concept of a reasonable point of
departure is.
the major writers or resources; and (3) large enough so that results stemming
from the research will be sufficiently compelling for the field to accept the
study as a meaningful step forward. This third point is critical. Recall that
discourse scientists use corpora to direct their future research. The future
research is probably going to be decided by the number and type of statisti-
cally significant results garnered from the analysis, and compelling statistical
results simply cannot be observed if the corpus size is too small (for more
on this subject, see Chapter 11 on results). As a very simple rule of thumb,
researchers are advised to have at least 20–30 texts for each variable in the
analyses they conduct. For example, if you want to examine a corpus for
its referential cohesion, it will cost you 20–30 texts. If you want to subse-
quently examine it for its syntactic complexity, it will cost you an additional
20–30 texts. And so on and so forth.
Three Hundred Texts of 300 Words Each. The 300:300 response is one
that students like hearing, presumably because it is very easy to understand.
By 300:300 we mean there should be a total of at least 300 texts, with each text
being about 300 words long (so that the mean text length is about 300 words,
with a standard deviation of less than 150, which is half of the mean). So, why
is 300 good?
In short, 300 is not “good.” It is simply a large enough number to probably
cover a wide range of requirements for empirical studies. Moreover, it is a
small enough number to be practical for collection in many studies. Let’s look
closer.
If the corpus has 300 texts, then its chances of being completely unrepre-
sentative and completely unbalanced are dramatically reduced. Of course,
there is no guarantee, but the larger the number, the lower the likelihood.
If the corpus has 300 texts, then, in all probability, it can be analyzed with a
large number of Coh-Metrix variables. The ratio rule of thumb described
earlier suggests that a corpus of 300 texts allows comfortably for a test of 10 to
15 variables.
Finally, a corpus of 300 is a nice round number that allows us to divide the
total into a training set of 200 texts and a testing set of 100 texts. We discuss
training and testing in detail later in this chapter. For now, it is enough to
know that the number 300 is suitable for such divisions.
On the issue of 300 words, we also want to make clear that the number is
simply convenient. Similar to 300 texts, the convenience in no way explicitly
helps the validity of a study, but it does cover a number of possible problems.
For example, texts of 300 words do not take too long to process in Coh-
Metrix, whereas texts containing thousands of words can be problematic
depending on the variables used. Similarly, very short texts (i.e., fewer than
100 words) are problematic for many variables because there are not enough
words to establish confidence in the assessment. And very short texts (i.e., a
paragraph or so) are often unlikely to have developed fully their range of
cohesion values.
The 300:300 rule is not a bad idea to keep in mind when starting a Coh-
Metrix study. However, the research question, the hypotheses, theory, prac-
ticality, and the need for a response that will guide future research must, in the
end, determine the final size of the corpus.
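The 300:300 heuristic is easy to check mechanically. A minimal sketch (the function name and return keys are our own; the thresholds match the rule as stated above):

```python
# Check per-text word counts against the 300:300 heuristic: at least
# 300 texts, with a standard deviation under half the mean text length.
from statistics import mean, pstdev

def check_300_300(word_counts, min_texts=300):
    m = mean(word_counts)
    sd = pstdev(word_counts)
    return {
        "n_texts": len(word_counts),
        "mean_length": m,
        "sd_length": sd,
        "meets_rule": len(word_counts) >= min_texts and sd < m / 2,
    }
```

For example, a corpus of 300 texts alternating between 250 and 300 words comfortably meets the rule, whereas a corpus of only 10 texts fails on size alone.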
each text is split in half, then the two halves cannot be described as inde-
pendent because they are each dependent on their corresponding half. Of
course, if each text has only one other half, the corpus might seem “independent
enough,” but because each text has its other half, each text is very closely related
to one text and only distantly related to the remaining 39 texts.
But why does this even matter? It matters because the statistical analysis we
conduct on the corpus takes the number of items in the analysis very
seriously. A corpus is, of course, just a sample of some phenomenon of the
world; it is a sample that, we are arguing, is representative of that phenom-
enon of the world (e.g., newspaper stories). The larger our corpus is, the more
closely it resembles the real-world phenomenon, because more of that phenom-
enon is in it. Consequently, the larger the corpus is, the greater the
confidence we can have in our analysis, and the statistics we use in our
analysis reflect this. As such, doubling our corpus by chopping it in half is
likely to get us a “better result” without actually increasing the corpus’s
representation of the world. Consequently, the result will be misleading.
ultimately responsible for making sure that the corpus is sufficiently clean,
because, as the old computational saying goes, when garbage goes in, garbage
comes out.
A second issue of cleaning concerns consistency. Many students ask what
they should take out of a text and what they should leave in (e.g., headers,
typos, spelling mistakes, pronunciation guides, etc.). To address this question,
we offer two golden rules of analysis:
1. Unless there is a good reason to take it out, you should leave it in.
2. What you do to one, you do to all.
Rule 1 simply asserts that the default condition of the text is exactly the way
you find it. Any changes made to it after that should be documented and
reported in your paper. Some of the most common changes are removing
annotations and picture captions. The annotations are removed because the
text is unreadable with them in, and if they are left uncleaned, Coh-Metrix
results are likely to be flawed. The picture captions need to be removed
because they are not part of the continuous text that the writer intended.
Moreover, their insertion into the document renders the sentence mean-
ingless, and the corresponding evaluations may be misleading.
Rule 2 means that you cannot pick and choose the texts that you modify. If
you remove something from one text (e.g., a date that happens to be at the end
of a text), then you must check that none of the other texts also have that date
(and if they do, then they all must be removed, or all kept). The same
consistency is necessary for spelling corrections and typos. It is tempting
when you see a spelling mistake to correct it, but unless you plan on correct-
ing the entire corpus, you should leave things the way you find them.
Finally, know that encountering a few “dirty” texts across the corpus is not
considered unusual. As a general rule of thumb, we say that the corpus
needs to be at least 95% clean. That is, about 95% of the texts should have
no problems at all, and at least 95% of each text should be thoroughly correct.
If your corpus is very large, and reading through all of it to make sure it is
clean would take considerable time, then assessing a sample of the texts (e.g.,
10–20%) is generally considered sufficient.
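The two golden rules and the sampling advice can be sketched as follows. The trailing-date pattern is purely an illustrative cleaning step, and the function names are ours; the point is that whatever step you choose is applied to every text (Rule 2), and that a 10–20% sample is drawn for the manual read-through:

```python
# Apply one cleaning step uniformly to every text, and draw a sample
# of the corpus for a manual cleanliness check.
import random
import re

def strip_trailing_date(text):
    # Illustrative step: remove a date such as "12 May 2013" ending a text.
    return re.sub(r"\s*\d{1,2} \w+ \d{4}\s*$", "", text)

def clean_all(texts):
    # Rule 2: what you do to one, you do to all.
    return [strip_trailing_date(t) for t in texts]

def sample_for_review(texts, fraction=0.15, seed=0):
    # Draw 10-20% of the texts for a manual read-through.
    k = max(1, round(len(texts) * fraction))
    return random.Random(seed).sample(texts, k)
```

Any change made this way should, of course, also be documented in the paper, per Rule 1.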
seem like a mundane task (and, actually, it is), without careful organization at
the outset, you’ll soon find yourself spending inordinate amounts of time
trying to pick through your files so as to try to make some sense out of things.
In short, mundane task or not, organization of the corpus must be taken
seriously.
Arranging Your Corpus. For most Coh-Metrix analyses, there are four
basic arrangements of the corpus: the between (contrastive), the within (com-
parative), the matched, and the standard. Before explaining why we need to
even care about these arrangements, let’s take a moment to explain what each
arrangement looks like.
The most common organization of data in a Coh-Metrix corpus is the
contrastive (or between or independent). Essentially, a contrastive organiza-
tion has one corpus that is divided into two (or more) roughly equal parts.
The object of the study is to contrast the two parts, the hypothesis being that
the two parts are different. Our newspaper example, the one we gave in the
Elevator Pitch in Chapter 7, is an instance of contrastive analysis between two
categories: local and global news reporting. In examples of published Coh-
Metrix studies, Crossley et al. (2007) contrasted two sets of texts used by
English language learners: one set was authentic texts and one set was
simplified texts. McCarthy et al. (2009) contrasted the writing of three sets
of scientists: one British, one American, and one Japanese. And Duran et al.
(2007) also used three categories to determine temporal cohesion differences
between the categories of narrative texts, history texts, and science texts.
The comparative (or within or repeated) organization again features two
(or more) sets of texts; however, the difference here is that the two sets are not
independent. For example, the two sets could be (1) students’ essays before an
intervention (e.g., a course in which they are taught something) and (2) essays
by those same students written after the intervention. Other forms of com-
parative design are the first half of a story compared to its second half, or a
first draft compared to a second (or final) draft, or two or more sections from
the same article. This last example occurred in a study by McCarthy et al.
(2007), in which the authors looked at five categories in journal articles:
abstracts, introductions, methods, results, and discussions. Because each
category comes from the same article (and therefore, presumably, the same
writers), it is not considered to be an independent arrangement.
A matched corpus is very much like a comparative corpus. The only
difference is that nonindependence is forced on the texts. For example,
Lightman et al. (2007b) examined (rather morbidly) the song lyrics of artists
that had committed suicide. Each artist that had committed suicide was
matched with a similar artist that had not committed suicide (e.g., Ian
Curtis was paired with David Byrne, Kurt Cobain was paired with Chris
Cornell). Effectively, there is no analytical difference between a matched
arrangement and a comparative arrangement, but the right terms should
still be used when describing the data.
And finally, the standard straight corpus, as the name might imply, is
simply the corpus you have without any form of categories. For example,
Weston et al. (2010) analyzed a corpus of free-writes. Each text in the corpus
was given a value for quality, but the corpus as a whole was considered just
one category: free-writes.
The statistical analysis you ultimately use to better understand your results
depends on the arrangement of the corpus. It is for this reason that you have
to make sure that your data is arranged as one of these four types, and not
some kind of odd mixture. That is, you can get into some serious statistical
trouble if some of your data is paired and the rest is independent. For
example, probably the simplest (and yet very powerful) form of statistical
textual analysis is a t-test. In the field of discourse science, a t-test allows you
to make a claim that your groups from your Coh-Metrix analyses are indeed
“different.” However, there is more than one kind of t-test, so the question
becomes which t-test you should use. A paired t-test should be performed if
your corpus is comparative or matched, whereas an independent t-test should
be used if your corpus is contrastive. We discuss statistical analyses in more
detail in Chapter 11. For now, it is enough to know that your corpus arrange-
ment is critical to establishing the value of your ultimate findings. If your
arrangement is a hodgepodge of mixed and independent, then no statistical
analysis will be appropriate, and therefore no meaningful assessment of your
data can be made.
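The choice between the two t-tests can be sketched with the standard textbook formulas. This is a stdlib-only illustration (the function names are ours): each function returns the t statistic, and the p-value is then looked up for the appropriate degrees of freedom:

```python
# Match the t-test to the corpus arrangement: paired for comparative or
# matched data, independent for contrastive data.
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    # Paired t-test: a one-sample t on the pairwise differences.
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def independent_t(a, b):
    # Independent (pooled-variance) t-test.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

def t_for_arrangement(a, b, arrangement):
    if arrangement in ("comparative", "matched"):
        return paired_t(a, b)
    if arrangement == "contrastive":
        return independent_t(a, b)
    raise ValueError("a standard corpus has no second group to compare")
```

Feeding a hodgepodge of paired and independent scores into either function is exactly the statistical trouble warned against above.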
Coding Your Files. In any kind of Coh-Metrix study, it is wise to code the
names of your files to reflect the categories of which they are a part. For
example, in the Duran et al. (2007) study (mentioned earlier in the subsection
on contrastive groups), the narrative texts were coded with the letter N
followed by an underscore, the history texts were coded with an H followed
by an underscore, and the science texts were coded with an S and an under-
score. In many studies, number sequences are preferred to letters. Whatever
the organization, the point is to be consistent, because you will be amazed
how soon you forget what was what and where was where. Here’s an example
of how some of the file names appeared in Nick’s study:
N_07_Treasure_23_045
H_12_Civil_07_063
S_09_Cells_18_107
In this coding, the first symbol represents the category (N, H, or S), the second
symbol represents the grade level of the text (7–12), the third is a short form of
the name of the text, the fourth is the sequence number of the text within the
category (1–50), and the final symbol is the number of the text in the entire
corpus (1–150). Note that if the highest numbers are likely to be three figures
(e.g., 107), then smaller numbers also need to have three figures (e.g., 045).
Keeping index names that appear as numbers to the same length may help
later when sorting data. In a matched corpus, the names of the two versions
are likely to be the same except for one key element: the one that indicates to
which of the two corpora it belongs. This single indicator is likely to be the
most important feature of the name (inasmuch as it will probably be the
feature that is viewed most often to check for membership). As such, in a
matched corpus, the distinguishing key is likely to be the first element of the
name.
Coding your files also becomes important when it comes time to conduct
the statistical analyses. The Coh-Metrix output includes only the names of the
files and the Coh-Metrix output. And so, the only way to categorize the items
is by means of the file names. If the file names include all of the necessary
information, then the Coh-Metrix data file is ready to be analyzed.2
2. When using Excel, we generally use the Text-to-Columns feature, which breaks up each of the
parts of the text title into separate columns. Each column can then be used as a variable in the
analyses.
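A stdlib stand-in for the Text-to-Columns step is to split the coded file names directly. The field labels below are our own names for the five parts of a name such as N_07_Treasure_23_045:

```python
# Split a coded file name into the columns used in the analyses.
def parse_filename(name):
    category, grade, title, seq, overall = name.split("_")
    return {
        "category": category,           # N, H, or S
        "grade": int(grade),            # grade level of the text (7-12)
        "title": title,                 # short form of the text name
        "seq_in_category": int(seq),    # sequence within the category (1-50)
        "seq_in_corpus": int(overall),  # number in the entire corpus (1-150)
    }
```

Because the zero-padded fields sort correctly as strings, the raw names can also be sorted directly before parsing.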
Table 9.1 (cont.)
countries (Bulgaria, the Czech Republic, Portugal, and Romania), then the
categorization process would be obvious: specifically, they would be catego-
rized according to where they were found.
conclusion
In this chapter we described the material for the experiment (i.e., the corpus).
Predominantly, we provided guidance as to the composition and organization
of the corpus. And we ended the chapter with examples
of the four major moves associated with the corpus section of a research
paper. In the next chapter we discuss the tool (i.e., Coh-Metrix) that you will
be using in your research project.
10
The Tool
and paste these sections, but we are saying that your sections are likely to look
very much like these.
One change that you can make (and really should make) is your selection
of where and how Coh-Metrix has been used in previous studies (Question 3
in Table 10.1). On this note, we advise you to list studies according to the
following criteria: (1) the studies that are most relevant to your own; (2) the
studies from the most major journals; and (3) the studies that are most recent.
Table 10.1 (cont.)
It is also a good idea if you have actually read the studies that you list; if not, it
can get a little embarrassing during presentations.
As a final point on the third move, note that listing previous studies
that have used the tool (i.e., Coh-Metrix) isn’t a validation of the tool per
se (at least, not in the more traditional sense of validation). Validation of a
tool is typically established by testing that the tool does what it is supposed to
do. Numerous such Coh-Metrix studies have been conducted. For example,
Danielle McNamara and her colleagues showed that Coh-Metrix coreference
measures replicated human assessments of high and low cohesion (McNamara,
Louwerse, McCarthy, & Graesser, 2010; see Chapter 6). We can refer to
validation studies of this type as intrinsic validity. That is, the study itself is
concerned with the validation process. By contrast, we can say that extrinsic
validation refers to a provision of evidence in terms of widespread use and
acceptance by the discourse community. Thus, intrinsic validity establishes
that X is suitably representative of Y, regardless of whether anyone treats it as
such, whereas extrinsic validity demonstrates that X is treated by the discourse
community as suitably representative of Y, regardless of whether it actually is.
Needless to say, a combination of both intrinsic and extrinsic validity is most
desirable to establish confidence in a computational tool – and fortunately,
Coh-Metrix has an abundance of both. Consequently, an extrinsic validity
move (such as that given in Table 10.1) should be enough for most readers to be
persuaded that the tool you are using (i.e., Coh-Metrix) has earned sufficient
trust to conduct the task at hand.
Selecting Variables
From Table 10.1, it is only the fourth move – What are you using Coh-Metrix
for? – that must change for each study. For this move you will select the
variables, or banks of variables, that are of most interest to your study. You
will say what the variables are called, why you have selected them, and what
you expect the results to show (i.e., your predictions; see Chapter 11 for more
on this issue). You will also need to describe each of the variables. Sometimes
each index is described separately, and sometimes you will describe the
indices in terms of groups or banks (see Chapter 4).
Selecting variables is not straightforward, and we need to discuss this issue
in quite some detail because if you select too many variables, or you select the
wrong variables, you run the risk of invalidating your study. On the other
hand, if you choose too few variables, you run the risk of finding no results,
making your study essentially worthless. As such, let’s tread very carefully
through this potential minefield.
Deciding How Many Variables to Use. To help you decide how many
variables you can use, we have provided four heuristics. Note that heuristics
are not laws; instead, they are pieces of advice or the generalization of past
practices. You need to consider very carefully how you will apply these
heuristics, taking as much advice as you can find. As you seek out this
advice, you will find many voices that are (shall we say) “animated.” In
short, passions can run high on this subject and you’d do well to spend a
good number of years simply soaking in the vast amount of commentary
that is out there.
The 20:1 Rule. The 20:1 rule says that you can use 1 variable for every 20
items in your corpus. For example, if you are looking at a corpus of 100 essays,
then you can use 100/20 = 5 variables. The number 20 is in no way ideal, and
many people would strongly argue that 30:1 is far more reasonable because it
allows for more powerful statistical analyses. Of course, more is always better,
but it is probably fair to say that 20:1 is broadly accepted as a minimum ratio of
items to variables (or indices).
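The arithmetic of the rule is a one-liner (the function name is ours):

```python
# Maximum number of variables the 20:1 (or a stricter 30:1) rule allows.
def max_variables(n_texts, ratio=20):
    # e.g., 100 texts at 20:1 allow 5 variables.
    return n_texts // ratio
```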
Use Them All, Report Them All. A second approach to selecting variables
is very simple: just use them all. But if you do use them all, you have to report
them all too. That is, it is just as important for researchers to know which
variables were not significant as which ones were significant (see Chapter 11).
Using all the variables has one major advantage and one major disadvantage.
The major advantage is that the results of the analysis can be seen from an
exceptionally broad perspective. As such, we can view a corpus textually
from numerous angles, providing us with the clearest possible insight into
how the corpus differs across constructs. The major disadvantage is that
using all the variables compromises traditional agreements on the level of
statistical significance. That is, the more variables we use, the more likely we
are to see what appears to be statistically “significant” results. However, such
a result is like shooting for three points in basketball: the more we shoot, the
more likely we are to get a basket, but that does not mean we are necessarily
getting any better at shooting. Therefore, while using all the variables
gets us a grand picture, it makes interpreting the accuracy of the picture
much harder.
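A quick simulation makes the disadvantage concrete: even if every variable were pure noise, a fixed share would still clear the conventional p < .05 threshold. (The simulation below is our own illustration, not a Coh-Metrix procedure.)

```python
# Simulate why "use them all" inflates apparent significance: with many
# purely random (null) variables, some clear p < .05 by chance alone.
import random

def expected_false_positives(n_variables, alpha=0.05, trials=2000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null, each variable's p-value is uniform on [0, 1].
        hits += sum(rng.random() < alpha for _ in range(n_variables))
    return hits / trials  # average "significant" results per analysis
```

With 100 null variables, the simulation returns roughly 5 spurious "significant" results per analysis, which is exactly the inflated picture described above.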
Use Theory. Some people argue that we can use as many variables as we
wish, provided we have good theoretical reasons for using them. Although
there is some merit to this claim, it is difficult to imagine the possibility of
sound theoretical reasons for a large basketful of variables. To be sure, includ-
ing reasoning for the use of any variable is a good idea, and having no reason to
include a variable probably means that it should be left out of the analysis. In
short, the better the theoretical reasons for including a variable, the greater the
benefit of the doubt when it comes to assessing the interpretation of the results.
Train and Test. If your data set is large enough – say, 300 items – then
training and testing is possible (see Chapter 9 for discussions on corpus
size). For this approach, we typically divide the data into two groups, with
the training set being two-thirds of the data (200 items in this example) and
the testing set being the remaining one-third (100 items in this case). We then
apply all the indices (or any number of the indices) to the training set only.
From these results, we take only the variables that meet a predefined level
(say, a p-value of less than .05; see Chapter 11). We then test those variables
that passed the criterion using the testing set data. If the variables are statisti-
cally significant on the testing set (again, say a p-value of less than .05), then
we have reason to have confidence in them.
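The train-and-test procedure can be sketched as follows. Here `p_value` stands in for whatever two-group test is run on a variable (the function and its arguments are assumptions for the sketch, not a Coh-Metrix API):

```python
# Screen variables on a two-thirds training set, then confirm the
# survivors on the held-out testing third.
import random

def train_test_select(items, variables, p_value, alpha=0.05, seed=0):
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = 2 * len(shuffled) // 3          # two-thirds training set
    train, test = shuffled[:cut], shuffled[cut:]
    # Keep only variables meeting the criterion on the training set...
    survivors = [v for v in variables if p_value(v, train) < alpha]
    # ...and confirm those survivors on the testing set.
    return [v for v in survivors if p_value(v, test) < alpha]
```

Shuffling before the split matters: if the corpus file list is ordered by category, an unshuffled split would put different mixes of categories in the two sets.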
Other Considerations in Variable Selection. In the preceding examples we
used the word “variable” rather than “index” or “measure.” We did so because
the jury is still out on whether the 20:1 rule applies to measures or indices.
Indeed, as mentioned, the 20:1 rule itself is not carved in stone. Bearing all this
in mind, our recommendation is to apply the 20:1 rule to measures (i.e., groups
of related indices that all purport to assess the same construct). However,
note that some constructs generate indices that are highly related, and there-
fore generally highly correlated (e.g., referential cohesion indices),
whereas other constructs (e.g., word frequencies and syntax) are far less likely
to produce highly correlated results. As such, always try to err on the high side
of items to variables.
A second major consideration in variable selection is to note that you get
rewarded for “success” and punished for “failure.” For example, let’s imagine
that we selected 10 variables and we gave good theoretical reasons for each one
of them. If, in the end, only 1 out of the 10 variables was statistically significant,
we’d have good reason not to trust that lone successful variable. That is, in
a result where we were wrong 9 times out of 10, the basis of our theory is likely
to be highly suspect, and the one success is more likely to be attributable
to chance. Similarly, if only 1 out of 10 referential cohesion indices shows
significant results, there is a very good chance that the one significant difference
occurred purely by chance. On the other hand, if we have significant results for
9 out of 10 analyses, then we can also have quite some confidence in the 10th
analysis, even if it isn’t (quite) “statistically” significant. That is, our theory is so
good that this time it is the one bad result that can be put down to chance.
A third consideration is that approaches used commonly in the past might
not be the best approaches to be used in the future. For example, the training
and testing approach (described earlier) is common in Coh-Metrix literature
(and common in many types of literature); however, we typically used this
approach during a long process of validation studies in which our goal was to
know how well the variables worked, and how powerful Coh-Metrix could be.
Put another way, the Coh-Metrix team has done plenty of these studies, but
how satisfactory this form of analysis is going forward could be described
as open to discussion. The main bone of contention, as we saw in the previous
chapter, is that any form of analysis that lacks the guidance of theory is of
debatable value to the developing theoretical framework. Of course, if theoret-
ical motivations are appropriately included in the analysis, then training/testing
is less of an issue; but then again, if theoretical motivations are appropriately
included, then there seems little reason to use a training/testing approach.
In the end, the best advice we can give you is the following:
1. Keep your items-to-variables ratio as high as possible. Having a large
corpus – say, 300 items – helps in this endeavor.
2. Think very carefully about each variable before you use it, because if
you use it, you really should report the result (whatever the result is).
3. Non-significant results, although seldom appreciated in the broader
field, can be every bit as enlightening as significant results.
4. Statistical significance is important, but it is not everything. Means,
standard deviations, and especially effect sizes can be just as enlightening
simple text such as “I saw my aunt, brother, father, and grandchild,” LIWC
would record a textual value of 50 for “family”: (dictionary words / total words)
* 100; which is (4/8) * 100 = 50.
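The calculation above is simple enough to sketch directly. The word list below is a toy stand-in for LIWC's actual "family" dictionary, not the real thing:

```python
# A LIWC-style category score: (dictionary words / total words) * 100.
# FAMILY is an illustrative word list, not LIWC's real dictionary.
FAMILY = {"aunt", "brother", "father", "grandchild", "mother", "sister"}

def category_score(text, dictionary):
    words = [w.strip(",.").lower() for w in text.split()]
    hits = sum(1 for w in words if w in dictionary)
    return hits / len(words) * 100

print(category_score("I saw my aunt, brother, father, and grandchild.", FAMILY))
# → 50.0  (4 dictionary words out of 8 total words)
```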
The apparent simplicity of the LIWC system should not make you think
its assessments are vapid or error prone. On the contrary, LIWC has been
used in numerous studies to investigate an impressively wide array of con-
structs (Pennebaker, 2011). Moreover, LIWC software can be dated back to at
least 2001 (Pennebaker, Francis, & Booth, 2001), making it one of the earliest
publicly available textual research tools. In short, LIWC’s contribution to
discourse science and ANLP cannot be overstated. And while its approaches
may lack the sophisticated mathematics of more contemporary measures, its
findings present a formidable list of achievements.
LIWC variables and Coh-Metrix variables share some overlap. Indeed, the
overlap is such that Duran et al. (2010) were able to replicate a deception
study that was originally devised by the LIWC team. However, while several
descriptive variables are certainly comparable across the two systems, their
respective goals are fairly distant. LIWC assesses the degree to which a given
construct is present in a given text; Coh-Metrix seeks to better assess a text for
its potential readability and comprehension. Clearly defining the purpose of
your own study should help you decide whether LIWC or Coh-Metrix is the
more appropriate system for your particular project.
Concordancers. Concordancers are computational tools that
focus on the identification of words in context. Thus, whereas “calculators”
(e.g., LIWC) focus on adding up how many times words occur in texts, con-
cordancers focus on identifying the snippets of text in which those words occur.
A concordancer is useful because it tells us about the company that any given
word keeps. For example, Rufenacht, McCarthy, and Lamkin (2011) assessed
the difference between early-learner reading texts for native English-speakers
(e.g., fairy tales) and conventional, early-learner reading texts for English-
language learners. Specifically, the authors used a concordancer to compare
the company of highly common words (e.g., “the”). The analysis suggested that
fairy tales were significantly more likely to feature concrete nouns with the
word “the” (e.g., “the ground,” “the fire,” “the wood,” “the ogre,” “the palace”),
whereas English-language learning texts were more likely to feature abstract
nouns with the word “the” (e.g., “the way,” “the idea”).
Numerous concordancer tools are freely available for download. Some of the
more famous systems include AntConc (http://www.antlab.sci.waseda.ac.jp/
index.html) and MonoConc (http://www.monoconc.com). Systems such as
these are easy to operate, function across a wide variety of platforms, and include
numerous textual investigation features (e.g., word counts, lists, contexts).
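The core operation of any concordancer is the keyword-in-context (KWIC) display: each occurrence of a node word shown with its surrounding words. The sketch below illustrates the idea only; it is not the API of AntConc, MonoConc, or any other real tool:

```python
# A toy keyword-in-context (KWIC) routine: show each occurrence of a
# node word together with the words on either side of it.
def concordance(text, node, width=2):
    words = text.lower().split()
    lines = []
    for i, w in enumerate(words):
        if w == node:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{node}] {right}")
    return lines

for line in concordance("the ogre crossed the ground near the palace", "the"):
    print(line)
```

Run on the sample sentence, this lists three contexts, one per occurrence of "the", which is exactly the kind of output that lets a researcher see what company a word keeps.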
underpinning for the variables you use; instead, you’re likely to want to throw a
lot of variables into the pot to see which ones stick, because you wouldn’t
want to miss some potentially useful index (even if its reason for “working”
isn’t particularly obvious). Although this “dumb” task does (probably) allow
for a greater generosity in the ratio of variables to texts, you’d do well to
remember that keeping the ratio at around the 20:1 or 30:1 minimum is
still highly recommended.
In Sum
The development and application of textual analysis tools can be placed in the
field of ANLP, which is dedicated to identifying, investigating, and resolving
language-related issues through automated approaches. Coh-Metrix studies
form one of the most prominent areas of this field, and that central position
looks likely to continue well into the future.
In terms of textual analysis systems, it is evident that Coh-Metrix is an
immensely powerful tool. Clearly, Coh-Metrix is also one of the most widely
applied and best-known textual analytics tools. But Coh-Metrix is not the
only textual analytics tool, and neither is its quantitative approach the only
approach available in textual investigative studies. Other tools (e.g., LIWC)
and contrasting analysis approaches (e.g., concordancing) are also available
to researchers, and knowing a thing or two about these other systems and
approaches may help you better design and execute your projects.
In terms of the algorithms that Coh-Metrix employs, most are theoretically
derived, and those theoretical underpinnings are described at length in Part I
of this book. Other algorithms in other systems may well produce “better”
accuracy results in some tasks (such as classification) because those variables
are derived more for the purpose of performance and are less constrained by
interests in cognitive processing. When selecting your variables, you should
consider the purpose of your project, and you should understand that the
variables you choose may not necessarily lead to the best statistical results
(in some particular instance of a project). Remember that a researcher’s goal
is likely to be expanding the theoretical framework rather than getting a single
“good result.” In short, try to keep in mind the (slightly) bigger picture.
conclusion
In this chapter we have described the tool you’re likely to use in your experi-
ment (i.e., Coh-Metrix). We have provided the four major moves associated
with describing the tool. We also have discussed some of the slightly broader
issues concerning computational textual analysis. In the next chapter we
discuss the basics of how to write up the Results section of the paper.
11
The Results
The Results section in a research paper is generally the last section to get started,
but it is often the first section to get finished. That is, once you have collected
your data (the corpus), you’ll need to analyze it, and once you’ve analyzed it,
writing it up is a relatively simple task. It is relatively simple because the
Results section is (or at least can be) highly formulaic. Indeed,
some software (e.g., the Gramulator; McCarthy, Watanabi, & Lamkin, 2012)
actually conducts statistical analyses and automatically outputs an acceptable
(if highly formulaic) results section.
As in Chapters 8, 9, and 10, this chapter looks at the writing process for a
short Coh-Metrix paper in terms of moves and frozen expressions. We will
also look briefly at the meaning of those strange letters and numbers that are
the main feature of the results section (the t’s, p’s, d’s, etc.). We have made
every effort to make this chapter as accessible as possible, assuming that the
reader is relatively new to reporting statistical results; however, as mentioned
at the beginning of Part II, we have also assumed that the reader has some
statistical knowledge. Therefore, the reader should be aware that it is beyond
the scope of this book to explain in any great depth what statistics are, which
kinds of statistics are appropriate for which kinds of analyses, how statistics
work, how they are calculated, how they should be interpreted, and how they
are often misinterpreted. To address questions such as these in more detail,
there are excellent resources available, such as the textbooks SPSS Made
Simple (Kinnear & Gray, 2008) and SPSS for Intermediate Statistics: Use
and Interpretation (Leech, Barrett, & Morgan, 2008). There are also excellent
Web resources such as www.talkstats.com and http://vassarstats.net.
To keep this chapter concise, we have provided the absolute minimum of
what you need to know for quantitative empirical research studies using tools
such as Coh-Metrix. That said, the information we provide should be enough
for many students and early researchers (especially those who do not come
before starting
An important starting point in any research, and particularly when conduct-
ing research with Coh-Metrix, is to start by checking your data. Any number
of Results sections have been written in our lab, only for the student to find
out later that the data set was flawed. The most important rule of thumb is to
check the ranges and means for all of the variables in the data set. Norms have
been provided in Appendix B that should give a clear idea of what minimum,
maximum, and average values to expect. Let’s take, for example, the refer-
ential cohesion measures that have ranges between 0 and 1. If you see that the
mean is greater than 1, or that the upper range exceeds 1, then that is a clear
indication of a problem. A second rule of thumb is to think about what the
expected values are given the nature of the corpus, and check whether the
means seem reasonable given the expectations. This is of course a more
mindful and challenging evaluation of the data.
Problems in a data set can arise from any number of missteps in the process
of creating the data set. One misstep may have arisen in the corpora. If the
corpora were not compiled, cleaned, organized, and coded correctly and
thoroughly (as described in Chapter 9), Coh-Metrix will chug ahead and
spit out a seemingly fine analysis of whatever it was fed. A second common
misstep can occur when compiling the data. The most common mistake we
have seen is when students merge data sets using copy and paste (e.g., rather
than using a merge function). We cannot count the number of times a copy-
and-paste was done without aligning the data sets correctly (merging by a
common ID is the only safe way to merge data sets). And so, our first and most
important piece of advice in this chapter is to start by checking your data.
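Both rules of thumb can be captured in a few lines of code. The sketch below is purely illustrative (the variable names, ranges, and IDs are made up, not Coh-Metrix output): check that each variable's values fall within its expected range, and merge data sets by a shared ID rather than by row order:

```python
# (1) Confirm a variable's values (and mean) fall in the expected range.
def check_range(rows, variable, lo, hi):
    values = [r[variable] for r in rows]
    bad = [v for v in values if not (lo <= v <= hi)]
    mean = sum(values) / len(values)
    assert not bad and lo <= mean <= hi, f"{variable}: out-of-range values {bad}"

# (2) Merge two data sets by a common ID, never by copy-and-paste order.
def merge_by_id(left, right):
    right_by_id = {r["id"]: r for r in right}
    return [{**l, **right_by_id[l["id"]]} for l in left if l["id"] in right_by_id]

cohesion = [{"id": "t1", "ref_cohesion": 0.42}, {"id": "t2", "ref_cohesion": 0.81}]
ratings = [{"id": "t2", "grade": 9}, {"id": "t1", "grade": 4}]  # different row order!

check_range(cohesion, "ref_cohesion", 0.0, 1.0)  # raises if any value escapes [0, 1]
merged = merge_by_id(cohesion, ratings)
print(merged)
```

Note that the two input lists are deliberately in different orders; merging by ID pairs them correctly anyway, which a copy-and-paste would not.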
reporting results
For our major results examples, let us imagine that a group of researchers
have become interested in essays produced in the English-speaking region of
Whereverland. A number of previous papers in the field have led to the
theoretical framework positing that writers from the north of Whereverland
appear to take great care in story writing with their explanations. Similarly,
the theory posits that writers from the south of Whereverland report stories
with more of a narrative style. The north, we learn, is more densely populated
than is the south, with greater numbers of businesses, colleges, and city folk.
The theory suggests that these people want their information quickly and
decisively, leading to the more expository form of essay. The south, appa-
rently, has a greater oral tradition, and it is argued that this tradition may have
blended into the writing style of people in this area. The researchers in the
study have sought to find empirical, quantitative evidence to support the
theory described here. They have hypothesized that the essays of Northern
Whereverland writers would feature a higher degree of referential cohesion
because coreference is a feature of expository writing (as compared to the
narrative style). After collecting an appropriate corpus for the analysis, they
processed the texts using Coh-Metrix and are now reporting their results.
The goal of our analysis was to determine the difference in referential cohesion
between the essays of writers from Northern Whereverland (NW) and the essays
of writers from Southern Whereverland (SW). In order to address this goal, we
conducted an independent t-test. The result was as predicted: (NW: M = 0.527,
SD = 0.259; SW: M = 0.347, SD = 0.160; t (38) = 2.651; p = .012; d = 0.838). The result
suggests that NW essays deploy greater explanatory features in their writing and
SW essays deploy a more narrative style.
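The descriptive and inferential statistics in a report like this can be computed directly from the two groups' scores. Below is a stdlib-only sketch using made-up samples (in practice, SPSS or scipy.stats.ttest_ind would also supply the p-value, which requires the t distribution and so is omitted here):

```python
import math
from statistics import mean, stdev

def independent_t_and_d(a, b):
    """t statistic, degrees of freedom, and Cohen's d (pooled SD)
    for an independent-samples comparison."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    d = (ma - mb) / math.sqrt(pooled_var)  # Cohen's d with pooled SD
    return t, na + nb - 2, d

# Hypothetical referential-cohesion scores, not the study's actual data.
nw = [0.7, 0.5, 0.6, 0.4, 0.8]
sw = [0.3, 0.4, 0.2, 0.5, 0.3]
t, df, d = independent_t_and_d(nw, sw)
print(f"t({df}) = {t:.3f}, d = {d:.3f}")
```

With the toy samples above this prints t(8) = 2.982, d = 1.886, in the same form the example report uses.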
This Results section features five major moves. The first two of these moves
(which comprise the first two sentences) are unlikely to change all that much
from study to study. And although the other three moves will change depend-
ing on the results, each of the moves remains very formulaic. We discuss each
of the moves in the order they appear.
Recall from Chapter 7 that the goal of a study, its theoretical framework,
hypotheses, and research question are all highly related. However, when we
are writing the introduction section of a paper, it is necessary to flesh out the
differences between each of these aspects so as to clearly form common
ground between the writer and the reader. But by the time readers have
reached the Results section of the paper, they will have expended a consid-
erable amount of their cognitive resources on coming to understand the
corpus and the tool (see Chapters 9 and 10). Consequently, readers are likely
to appreciate a gentle reminder of what the research is centered on. As such,
the first move of the results section is no more than a brief recap of the
research question.
Note that the NW in this move (and elsewhere) refers to the essays of
writers from Northern Whereverland, and the SW refers to the essays of writers
from Southern Whereverland. Many novice researchers have the idea that
abbreviating everything is something of a rite of passage, akin to a first
cigarette or getting a speeding ticket. And, indeed, abbreviations in results
sections are common practice, but they can be something of a burden for
readers to have to recall and unravel, so use them sparingly.
models), the more you’ll need to explain why you are using what you are
using. Of course, the question is: How do I know whether my choice of analysis
requires an explanation? A good rule of thumb is to consider how many times
you have used or read about the statistical method you are using (and how
many times your audience is likely to have done the same). The higher you
consider that frequency to be, the less you need to discuss it. A second
heuristic that might be of some use on this matter is the Excel option. Excel
is a very commonly employed Microsoft spreadsheet that calculates numerous
functions, including some statistics. Our Excel heuristic is simply this: if Excel
can do it (without the need for any additional add-in), then it is common
enough for the audience to be able to understand the approach. Returning to
our current example, given that a t-test might well be the most frequently
employed statistical test of all, it generally requires no great explanation in
your paper (although, if you’re a student, your professor might request one in
order to demonstrate that you understand what you are doing and why you
are doing it). Note also the word conducted is used in this move. Informally,
we might say run a t-test or do a t-test, but conduct is likely to garner greater
appreciation in formal circles.
number for the data set. For example, the mean of 1, 1, 1, 10, and 10 is 4.6. But,
of course, it would be hard to argue that 4.6 tells us much about what the data
is (i.e., how it is distributed). The high standard deviation (i.e., 4.93) warns us
that we might have trouble. Under such circumstances, you should first take a
look at the distribution of the data and make absolutely sure that the data are
correct. In our experience, unusual standard deviations are often a sign that
the data were computed, compiled, or calculated incorrectly. If the data are
confirmed to be error-free, the standard deviation should inform the
interpretation of the results. The mean and the standard deviation work well together
because the mean tells us the apparent result and the standard deviation
indicates whether we can trust that appearance.1
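The toy data set discussed above can be verified with the standard library's statistics module (statistics.stdev is the sample standard deviation, which matches the value quoted in the text):

```python
from statistics import mean, stdev

data = [1, 1, 1, 10, 10]
print(mean(data))              # → 4.6
print(round(stdev(data), 2))   # → 4.93
```

The standard deviation (4.93) exceeds the mean (4.6), which is exactly the warning sign the text describes: the mean alone is not telling us much about how these data are distributed.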
Turning to the inferential statistics, the t of the results is the t-value of the
t-test that was conducted. The t-value is computed from a formula that incorporates
the previously discussed means and standard deviations. In the previous
paragraph we said that the mean tells us the apparent result and the standard
deviation indicates whether we can trust that appearance. The t-test is a
much more rigorous assessment of the same information, and it allows us
to go beyond a summary of our data set (i.e., descriptive data) to making an
inference about the population from which that data was taken. At its most
basic, the higher the t-value is, the greater the difference is between the
two groups of data that were tested (i.e., the coreference values of the texts for
NW and SW). It is also important to know that t-values are
highly dependent on how many items are being assessed. Here, however, is
where it starts to get tricky. In the current example, we see t (38) = 2.651. The
38 in parentheses means there were 40 items (i.e., 40 total texts in the
corpus). In a t-test, the number in the parentheses is always the total number
of texts – 2 (i.e., 40 – 2 = 38). Why this number is what it is belongs to the
arcane subject of degrees of freedom, which is beyond the scope of this book
(fortunately for us). All we really need to know about this number is that the
higher it is, the higher the value of t is likely to be.2
1
Relative standard error (RSE) is a much more statistically satisfying way of assessing potential
problems in a data distribution. See http://en.wikipedia.org/wiki/Standard_error_%28statistics%
29#Relative_standard_error for more details on this. But despite RSE being more appropriate, it is
seldom produced in a Results section, whereas SD is almost always present. In this chapter, we
suggest that SD values should be considered with caution the closer they move toward equaling
the value of the mean. This suggestion is simply a heuristic, based on the numerous Coh-Metrix
studies we have conducted.
2
More technically, the higher the df value (which is 38 in this example), the lower the value of t that
is considered “significant.” However, as all df, t, and p values are created automatically these days,
the most immediate relationship between df and t is simply that they are highly correlated.
But how high is high? The p-value in the results addresses this question.
Generally, if the p-value is less than 0.05, usually written p < 0.05, then the
t-value is high enough for the result to be deemed “significant.” Significant is
the most frozen of all frozen expressions in research and must never be used
in any way other than to describe a numeric result. In our example, the
p-value is 0.012, so it is less than 0.05, so it is significant. The p-value is
important because it allows us to go beyond saying that “the result is in the
direction of NW,” and allows us to say that “the result for NW is significantly
higher.” This little difference in articulating a result might seem trivial, but it
is probably the most important part of the research paper. In short, if your
result is significant, then you have a winner; if your result is not significant,
then it’s back to the drawing board.
While we’re on the subject of p-values, let’s take this opportunity to briefly
look at what p < 0.05 means. Observe that 0.05 is one-twentieth of 1.00. Put
another way, 0.05 multiplied by 20 is 1. Or, if you like, 1 divided by 20 is 0.05.
In other words, the relationship between 0.05 and 1 is 20. This number 20 is
very important because it tells us what 0.05 means in practical language. It
means that scientists have generally agreed that a 1-in-20 chance of being
wrong is a risk that we can all live with (generally). Thus, if your result is
less than 0.05, it means nothing more and nothing less than there is about a
1-in-20 chance that your result, in the real world, isn’t actually significant at all
(instead, you just got lucky with your result). If you think 0.05 (or 1 in 20)
sounds like a fairly arbitrary way to decide whether or not something is
significant, then you’d be in good company! Indeed, the very person who
suggested the number, Ronald A. Fisher, would agree with you. But arbitrary
or not, like the height of a basketball net, it is a number that we are stuck with.
You may well be wondering at this point what t provides us that M and SD
and p don’t. In truth, the answer is not much. So why do we report the t value?
Historically, reporting the value of t was extremely important because
researchers had to first calculate it and then use it to manually look up
p values in a large table in a little book. The t-value, cross-referenced with
the degrees of freedom (here, 38), led us to the p-value. These days, to be
frank, it is only students who are forced to manually calculate t values and
then use lookup tables; everyone else uses simple software to calculate the
t value and such software invariably also supplies p, M, SD, and any number of
other things. As such, the value of t itself has become the statistical equivalent
of the human appendix, which is to say, removal would be painful, but
ultimately its loss would make very little difference at all.
The final statistic in our example is the d-value. Like the t-value, the d-value
is also a formula that is based on the means and the standard deviations. But
whereas the t-value helps us to establish whether the difference between the two
M-values is “significant,” the d-value tells us how large that difference is.
The d-value is referred to as an effect size. There are many different kinds of
effect sizes, but in this example we will only discuss d, which is known more
specifically as Cohen’s d. Cohen’s d is a widely used index of effect size, one
that is relatively simple to calculate, and one that is relatively simple to
interpret. As such, we find it appropriate to use in this example; however,
we neither claim that Cohen’s d is the best measure of effect size nor that
Cohen’s d is a synonym for effect size.
Essentially, Cohen’s d tells us the degree to which we could overlay one set
of data (e.g., NW) with another set of data (e.g., SW). If the value of Cohen’s
d is 0, then we have a perfect match, which tells us the two sets of data are not
different at all. As the value of Cohen’s d increases, so does the indication of
difference between the two sets of data. Over time, a relatively well-agreed
scale has emerged for how the value of Cohen’s d should be interpreted
(Cohen, 1988). Thus, below 0.2 can be called a small difference (about 85%
overlay of data), and from 0.2 to 0.5 can be called a moderate difference
(an overlay of about 67% of the data). Any value after 0.5 is a large difference; a
d-value of 1.0 has an overlay of about 45% and a d-value of 2.0 has an overlay
of about 19%.
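The interpretation scale above is easy to apply mechanically. The cut-points in the sketch below follow the text's reading of Cohen (1988); they are conventions, not an official API:

```python
# Map a Cohen's d value onto the conventional small/moderate/large labels
# described in the text (below 0.2 small; 0.2 to 0.5 moderate; above 0.5 large).
def interpret_d(d):
    d = abs(d)
    if d < 0.2:
        return "small"
    if d <= 0.5:
        return "moderate"
    return "large"

print(interpret_d(0.838))  # → large  (the d-value from the example results)
```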
For many people, significance is king, and nothing more than the p-value
need trouble them. But, as the legendary statistician R. Fisher himself was at
pains to point out, it is extremely important to interpret a result not with one
value but with every value you have at hand.
Fisher’s point is reminiscent of the old tale of the king and the six blind
men. As the story goes, the six blind men each thought themselves very
wise, and all day long in the market they would argue among themselves as
to who was smarter. The king grew tired of the constant bickering and
thought of a plan that might quiet them all. Now, conveniently for the
story, none of the men had ever seen or heard of an elephant, so, somewhat
implausibly, the king sent for an elephant to be taken to the market for the
blind men to examine. The king’s challenge to the men was to tell him what an
elephant was like. The first blind man took a hold of the elephant’s trunk and
announced confidently that an elephant was like the branch of a tree. The
second blind man took a hold of the elephant’s leg and announced confi-
dently that an elephant was like a pillar. The third blind man took a hold of
the elephant’s ear and announced confidently that an elephant was like a fan.
The fourth blind man took a hold of the elephant’s tail and announced
confidently that an elephant was like a rope. The fifth blind man took a
hold of the elephant’s tusk and announced confidently that an elephant was a
long pipe. And the sixth blind man placed his hands on the elephant’s body
and announced confidently that an elephant was like a wall. “You’re all right!”
bellowed the king, “but you’re all badly mistaken alone, and wise indeed
together.”
For us, the old tale of the king and the six blind men is a reminder that each
value to which we have access only tells us part of the story. It is only when we
put all of the pieces together that we know what we’re dealing with. Thus M,
SD, t, p, and d all work together to confirm, elaborate, bridge, and interpret a
result.
So, to sum up the results in numbers move, let’s put all the pieces together.
We know that the value of NW (M = 0.527) looks higher than the values of SW
(M = 0.347). That is to say, the result is in the direction of NW. But is the
number 0.527 really representative of NW? And is 0.347 really representative
of SW? Technically put, do the two values reflect different population means?
The SD value for NW (0.259) is just under halfway to 0.527, so the NW data
set is probably fine. The SW data set is also fine, with an SD of 0.160 (less
than halfway to the mean). Turning to the inferential statistics, we have
38 degrees of freedom, which means we have a total of 38 + 2 = 40 items in
our data set. We want to extrapolate from this sample what the difference
between the two groups would be if we actually had all the data in the world
to work from (the population, instead of just this sample). The difference
between the means (NW = 0.527 and SW = 0.347) might be true for this
sample of 40 texts, but how confident can we be that this difference would
be similar had all the texts from all NW and SW writers been available?
The p-value is less than 0.05, so we can assume the result only has a 1-in-20
probability of being the result of chance. As such, we can say that there is a
significant difference between the means of the coreference values for NW
and SW, with NW being higher. Not only is there a significant difference; we
can also say that the difference is large because the d-value is 0.838.
other matters
What we have described previously constitutes the minimum that a Results
section must include. In the following section, we briefly discuss several other
matters that might be included in the Results section, several other matters
that should be included in the Results section, and, just as importantly, several
matters that should not be included in the Results section.
In example (1), the p-value is 0.012. Because 0.012 is less than 0.05, the result is
deemed significant. The frozen expression is the result was as predicted. In
essence, the phrase means we thought this would be the result and it was. If a
result was predicted and it is also significant, we typically restrict ourselves to
only as much text as used in the example. That is, we often don’t actually write
“the result was significant” because if the result was as predicted, it implies that
it was significant; and in any case, the p-value that follows the statement
confirms the inference.
In example (2), the p-value is .076. The value .076 is not less than .05, so the
result is technically not significant. However, the frozen expressions of
importance here are marginally significant and approaching significance.
These are terms that denote a p-value that is less than .10 but greater than
.05. In essence, the expressions mean: We thought this would be the result but
it wasn’t; but, gosh-darn it, we were so close that we deserve a prize even though
there really isn’t a prize for coming in second. It is important to understand
that “approaching significance” is still “not significant”; however, it is also
important to understand that researchers go to a lot of trouble and spend a lot
of time in conducting experiments and they find it very hard to accept that a
“really close” result isn’t really a result at all. Such has been the breadth of
feeling on this issue that convention, somewhat unofficially, has come to
accept results that are really close to significant. And indeed, particularly in
exploratory research, leaving out the marginally significant results can mean
leaving out meaningful and important results. Notably, there is also the issue
of statistical power. Like degrees of freedom, this issue is beyond the scope of
this chapter, but you might want to take a moment to look it up.
In example (3), the p-value is .219. Because .219 is not less than .05 (and also
not less than .10), the result is deemed not significant. However, the
And sometimes this explanation can go into perhaps more detail than is
necessary, as when McCarthy et al. (2009, p. 150) wrote:
The third type of justification (i.e., the extrinsic) may require no more
than references to other works. As the name suggests, the statistical approach
in the study is used because other people before have used it. That is, it is
justified to use it now because it has been used before. For example, McCarthy
et al. (2008, p. 654) wrote
To examine the hypothesis that there are linguistic differences that differentiate
simplified and authentic texts, we conducted a discriminant function analysis. A
discriminant function analysis is a common approach used in many previous
studies that attempt to distinguish between text-types (e.g., Biber 1993; McCarthy,
Lewis, et al., 2006).
It may seem odd that a procedure is acceptable simply because someone else
used it. However, if it has been used in previous studies, then we can assume it
was reviewed and accepted there (and does not need to be re-reviewed). Also,
it is important to remember that convention is very strong in the sciences.
Indeed, the very fact that we can talk so much about moves and frozen
expressions is because people have come to accept and expect how we go
about writing up our research. Finally, we also use this approach when we are
deliberately replicating a procedure. For example, Louwerse et al. (2004)
conducted a study that was deliberately based on the study of Biber (1988).
As such, it was important for the authors to write “[W]e carefully followed
Biber’s study” (p. 845).
graphs
The presentation of graphs in a research paper is just as important as the
writing of the paper itself. By graphs we mean tables, figures, and any other
representations that are nonlexical. Almost all Coh-Metrix papers feature
some kind of graph and as such it is necessary to discuss them here.
The function of a graph is to facilitate the readers’ comprehension of the
research. More specifically, graphs are used to convey a message to the
audience concerning the goal of the paper that could not be equally well
conveyed in prose. Graphs are more useful than prose when the information
they convey is equal to or better than a prose version, yet less
cognitively demanding to process. This processing advantage
that is achieved by graphs can be attributed to the ease with which data can be
found, compared, and contrasted in a graph (relative to prose), and the fact
that differences are often easier to understand by way of visualization rather
than calculation. Although graphs are valuable, they can take up a consid-
erable amount of space in a research paper. As such, you always have to
consider carefully when a graph is worth including. One rule of thumb for
graph inclusion is to remember that a picture, as they say, is worth a thousand
words. So if you find that you can say all that needs to be said in just a couple
of sentences, then you probably don’t need a graph.
In Coh-Metrix papers, the most common form of graph is the table. Tables
are generally used to show results, although they are sometimes used to show
conclusion
In this chapter we discussed the reporting and presentation of the results
section of a research paper. We outlined the five major moves of the results
sections, along with the frozen expressions associated with them. We pointed
out that the Results moves had variations, depending on how well the results
met predictions. Several further issues were discussed, including the justifi-
cation of the approach used and the importance of graphs in a Results section.
The next chapter turns to the final section of a paper – the Discussion.
12
The Discussion
Whereas the primary purpose of the Results section is to explain what the
results of the experiments are, the primary purpose of the Discussion section
is to explain what the results of the experiments mean. Put another way, our
primary (but by no means the only) task in the Discussion section is to provide
a plausible explanation as to the relationship between our results and our
theoretical framework. This requirement is the tricky part because, unlike
other parts of the paper, which can be very cookie-cutter-esque, the require-
ment of the Discussion section demands an element of creativity on the part
of the researchers. That is, the findings of the study are only circumstantial
evidence, and it is up to the investigators to undertake the challenging task of
persuading the audience (i.e., the readership, the discourse community) that
what was found in the study contributes positively to our current understand-
ing of the world.
This task requires a careful meshing of the guiding theoretical framework
and the results. Both can be dauntingly messy. Results are seldom highly
significant with huge effect sizes; if they were, then pretty much no one
would be interested in the results because they are hardly likely to be telling
us anything we didn’t already know, or need to know. So, because frameworks
and results are messy, patching them together requires careful consideration,
rigorous examination, exhaustive reviewing, and, perhaps most important of
all, a creative perspective in order to make a grab-bag of knowledge-ingredients
into a comprehensible propositional-cake.
discussion moves
So, a Discussion section is not easy. But it still has to be written. As ever, the
best way to make sense of it all is to consider it in terms of moves and
their associated frozen expressions (see Chapter 7). However, because the
I. Summary Phase
a. Commencement move
b. Exposition move
1. Methods element
2. Purpose element
3. Results element
II. Denouement Phase
a. Interpretations move
b. Implications move
III. Acknowledgements Phase
a. Limitations move
b. Future research move
IV. Closure Phase
a. Wind-up move
b. Pitch move
figure 12.1. The discussion model helps organize the ending argument of your paper
Discussion requires more creativity on the part of the writer (to tie together
results into the theoretical framework), the moves of the Discussion are
somewhat less formalized than we have seen in other sections of the paper.
Put another way, the moves of the Discussion section are somewhat more
flexible in where they appear, how they appear, and even if they appear at all.
In some ways, this flexibility makes the Discussion section easier to write
because authors can weave something more like a narrative into the section,
even putting their own spin on how the results should be interpreted. That
said, the flexibility of the Discussion section may also cause authors to wander
off topic or make claims that are poorly evidenced. With such caveats in
mind, we propose that an effective Discussion section can broadly fit into the
following model (see Figure 12.1).
With this model in mind, in the sections that follow we explain each of the
four phases of the discussion (i.e., the summary phase, the denouement
phase, the acknowledgments phase, and the closure phase) together with
their associated moves, elements, and frozen expressions. We will also supply
examples from Coh-Metrix-related papers in order to show how authentic
studies have addressed parts of this discussion design. The chapter ends with
a model example of a Discussion section that is based on the newspaper study
described in the Elevator Pitch of Chapter 7.
summary phase
The summary phase simply sets the stage for the central question (what do the
results mean?) to be addressed in the denouement phase that follows (i.e., the
second phase). Indeed, some-
times, when space allows, the summary phase is its own major section of the
paper; however, when that happens, the distinction between a Summary and a
Discussion often gets very blurred, and the Discussion is probably already
blurry enough.
The summary phase aims to address two major questions: (1) What did we
do? and (2) What did we find? Two further questions that may find their way
into the project summary are (3) How did we do it? and (4) Why did we do it?
We begin by focusing on the first two of these questions in what we call the
commencement move of the summary phase. The third and fourth questions
are largely discussed in the subsequent exposition move.
Commencement Move
Generally, a summary phase opens with the commencement move. The pur-
pose of the commencement move is to bring readers and authors together at a
single point of embarkation from which the interpretations and implications of
the results can be “discussed” (hence the name “discussion”). The commence-
ment move is generally nontechnical, because it is important not to confuse
anybody right from the get-go. For the same reason, the commencement move
should also be relatively simple, relatively short, unassuming, and unequivocal.
The basic point here is that the commencement move needs to activate as
many schemata as possible for the reader while limiting the cognitive resources
needed to do so. In such a way, readers are most likely to have available to them
the cognitive resources necessary to integrate the forthcoming information into
their developing mental model of the text.
Possibly the easiest way to achieve a successful commencement move is to
simply state what the paper was about (i.e., what did we do?). The researchers
Rowe and McNamara (2008) provide a nice example of this move when they
write: “This study explored the mechanisms within the CI model related to
disambiguation.”
But as we have pointed out, the presentation of the project summary largely
depends on how the researchers interpret the interplay between the results
and the theoretical framework. As such, Coh-Metrix commencement moves
have come in many forms (see Table 12.1 for examples).
Any number of variations of the commencement move are perfectly
legitimate, but here we describe the most basic example (i.e., what we did).
When we address what we did, we are focusing on the fundamental act that
best describes the methodology used in the project. That is, the project is a
table 12.1. Examples of commencement moves

Past tense: “In this study, we analyzed three corpora of science journal abstracts written by American, British, or Japanese scientists.” (McCarthy, Lehenbauer, et al., 2007)

Present perfect: “Using the computational tool Coh-Metrix, this study has demonstrated that many properties of both simplified and authentic texts . . .” (Crossley & McNamara, 2008)

Present tense: “The findings from these studies indicate that argumentative essays judged to be of higher quality by expert human raters are more linguistically sophisticated, but at the same time contain fewer cohesive devices to facilitate text comprehension.” (Crossley & McNamara, 2011)
be written in the past tense (so, if the research verb is “assess,” then the
form of the verb will be “assessed”). To understand what else needs to be
in the move, we should consult the research question. Let’s look at two
examples of Coh-Metrix research questions that were first presented in
Chapter 7.
1. Bruss et al. (2004): Has the language used in scientific texts changed
over the last 200 years?
2. Louwerse et al. (2004): Can Coh-Metrix distinguish spoken English
from written English?
Given Michell Bruss and colleagues’ research question, we can infer that their
study was an assessment of the language used in scientific texts over the last
200 years. Therefore, we can also say that Bruss and colleagues assessed the language used
in scientific texts over the last 200 years.
Given Max Louwerse and colleagues’ research question, we can infer that
their study was an examination of whether Coh-Metrix could distinguish
spoken English from written English. Therefore, we can also say that they examined whether
Coh-Metrix could distinguish spoken English from written English.
Frozen Expressions. As always, a move has associated frozen expressions.
The most common frozen expression associated with the commencement
move is “In this study . . .”. Of course, the word “study” may change depending
on what the researchers view their undertaking to best represent (e.g., “study,”
“chapter,” “dissertation,” “project”). For simplicity, we refer to words like
“study” and “chapter” and the entire family of undertaking words as research
nouns. As such, our frozen expression for the commencement move can be
stated as “In this” + [research noun].
Putting all the pieces together, our model for the commencement move is:
“In this” + [research noun] (e.g., study) +
[Agent] (e.g., I or we) +
[Research verb] (e.g., assessed) +
[Research question] (e.g., Can Coh-Metrix distinguish spoken English
from written English?)
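As a minimal sketch, the template above can be treated as a fill-in-the-slots string. The function and parameter names here are our own, not part of the model itself, and the research question is recast as a declarative topic:

```python
def commencement_move(research_noun, agent, research_verb, topic):
    """Assemble the commencement-move template:
    "In this" + [research noun] + [Agent] + [Research verb] + [topic]."""
    return f"In this {research_noun}, {agent} {research_verb} {topic}."

# Louwerse et al.'s research question, recast as a declarative topic:
print(commencement_move(
    "study", "we", "examined",
    "whether Coh-Metrix could distinguish spoken English from written English"))
```

Running the sketch produces “In this study, we examined whether Coh-Metrix could distinguish spoken English from written English.”, which is exactly the kind of opening sentence the move calls for.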
Testing the Model. To test our model for the commencement move, let’s look
at four more Coh-Metrix research questions, all first introduced in Chapter 8.
3. McNamara et al. (2011) asked: Does world knowledge affect young
readers’ comprehension?
4. Ozuru et al. (2007) asked: Does the passage (more so than the question)
explain the difficulty in standardized reading tests?
5. Best et al. (2004) asked: Do the effects of reading skills depend on the
genre of the text?
6. McCarthy et al. (2007) asked: Can Coh-Metrix replicate human ability
to recognize genre at the sub-sentential level?
As we see in Table 12.3, applying our model to these questions makes for
perfectly good commencement moves.
exposition move
As we saw earlier, the commencement move of many Coh-Metrix studies has
taken the form of how the research was conducted, why the research was
conducted, or what the results of the research were. To simplify matters, we
have recommended that the commencement move take the form of address-
ing what we did (where “what we did” is a modified version of the research
question). This recommendation means we can now inform the readership
of the other three common forms of opening: how the research was
conducted, why the research was conducted, and what the results of the
research were. For simplicity’s sake, we refer to these three elements as the
method-purpose-results elements of the exposition move. We have already
seen some examples of these elements, but now let’s look a little more closely
at them.
Method Element. Starting with the Method element, the following text
comes from McCarthy et al. (2008, p. 251): “Our corpus . . . was formed from a
subset of 100-sentence self-explanations from a recent iSTART experiment.”
This statement briefly explains the composition of the corpus; however,
the authors did not explain what was done to the corpus (e.g., how it was
measured or how it was analyzed). Readers are left to presume that either the
composition of the corpus is of greater importance than the analysis, or that
the analysis is given elsewhere in the discussion.
The most probable reasons for the Method element in the aforementioned
example being so short are: (1) the element isn’t required at all, so researchers
frequently highlight only the part to which they wish to draw attention;
(2) many papers have size restrictions, and reminding people of information
rather than providing new information can seem wasteful.
Purpose Element. Turning now to the Purpose element, the following text
comes from McNamara et al. (2010, p. 315): “There is a need in discourse
psychology for computational techniques to analyze text on levels of cohesion
and text difficulty, particularly because discourse psychologists increasingly
use longer, naturalistic texts from real-world sources.” This statement briefly
explains the reason for conducting the research (i.e., the purpose). The state-
ment takes the classic form of “this is important . . . because . . .”. The purpose
element is not common in the summary part of the discussion. Instead, it may
turn up in the implications move or the wind-up move (discussed later).
Results Element. And finally, we have the Results element. To examine this
element, let’s look at an extract from Roscoe et al. (2011, p. 285):
Linguistic analyses of introduction, body, and conclusion paragraphs using
Coh-Metrix revealed several properties associated with paragraph quality. Some
features were common across all types: length, Givenness of information, and
vocabulary. Not surprisingly, paragraphs that were longer received higher ratings,
perhaps because they contained more elaborated arguments or evidence. Better
paragraphs also contained more given information, maintaining cohesion and
comprehensibility of ideas. Lastly, several measures of lexical sophistication were
predictive of paragraph quality, such as word frequency, hypernymy, and lexical
diversity. Paragraphs received higher scores when the writers displayed a deeper
and more varied choice of vocabulary. These results mimic those reported by
McNamara et al. (2010) regarding the entire essays.
The most notable feature of the extract, as compared to the previous elements,
is that it is long. Of course, the length stems from the fact that there is more
than one result that needs to be highlighted. A second notable feature is that
although the extract is “results,” there aren’t any numerals or statistics. Thus,
the results element is written in very general terms and writers can even get
away with a few examples of terms like more and greater without having to
add p-values (see Chapter 11).
Finally, note that the last sentence of the extract is less a statement of a
result and more a statement of implication. We deal with implications in
the denouement phase (discussed later in the chapter), so for now it is enough
to know that results elements can effectively end with a statement of the
implications of the results.
Frozen Expressions. Let’s take a moment to look at a few frozen expres-
sions that are common in the exposition move. The frozen expression “In
sum” is a useful way of joining together several smaller results into one big
picture. For example, McNamara et al. (2010, p. 76) write: “In sum, the results
of this study indicate that more-skilled writers use more sophisticated lan-
guage.” Obviously, what preceded this statement were several results showing
the more sophisticated language used by the more-skilled writers.
A second common example of a frozen expression associated with the
exposition move is “our results suggested.” This expression is a very simple
way to highlight to readers that the results element is about to follow. The
word “suggested” is of great importance here and was discussed in Chapter 11.
The point is that no result is the final word, so hedging is always the path of
least resistance.
denouement phase
The word “denouement” (DAY-NOO-MAWN) is French in origin and means
“the unraveling of the knot.” Many people would argue that the denouement is
the most important part of the Discussion section, serving to situate the result
of the study into the theoretical framework. In literature, movies, and drama
of any kind, the denouement is that part of the discourse in which all that is
unknown is made known, and in a Discussion section of a research paper it
functions the same way: by explaining how the mystery of the event (the
experiment) can be explained in terms of what we already know and agree
about the world (the theoretical framework).
This unraveling of the knot should be taken seriously, because by this stage
of the paper, all that has so far been presented are facts and figures that
any reasonably well-trained algorithm could produce. Indeed, the software
SCIgen (en.wikipedia.org/wiki/SCIgen) does exactly that by using moves and
frozen expressions, not unlike those described here, to generate nonsense
science papers that seem (to many people) to be just like the real thing. The
point here is that the researchers themselves have to unravel the knot; they
cannot rely solely on moves and frozen expressions; instead, they
must present a plausible explanation as to the interpretation of the result
and its subsequent implications. These two italicized words (interpretation
and implication) are key to this explanation, and they form the moves that
constitute the denouement phase.
Interpretations Move
To better understand the purpose of the interpretation move, we can turn to
the literary phrase of a “willing suspension of disbelief.” This phrase, given
to us by Samuel Taylor Coleridge, informs us that an audience (readership,
discourse community) is willing to be persuaded, even of the most incredible
of things, like a flying man, or beaming people from one place to another, or
even such nonsense as politicians putting aside their ideological differences in
order to serve the greater good. But audiences won’t believe just anything.
There are limits to what people will believe. And those limits are reached
when elements such as consistency and reason fail to be maintained.
As an example of a willing suspension of disbelief, let’s consider the movie
Superman. More specifically, let’s consider one much-talked-about part of the
Ilya Salkind Superman series, specifically that while it was perfectly OK for
Superman to fly, it was not OK for him to fly so fast that by doing so he was
able to reverse time (which is what he did in order to bring Lois Lane back
from the dead). The difference between these two aspects (a flying man and
reversing time) is that Superman’s ability to fly is explained in numerous
places and at numerous times by the fact that he is from Krypton: a planet
with far more evolved people, and far greater gravity (thus Superman’s ability
in both studies, those texts yielded different results depending on the method
used to measure them.
In Example 2, O’Reilly and his colleagues interpret their result as evidence
that their intervention (i.e., SERT training) is not transitory. This conclusion
is drawn from the fact that similar results were found at both the time of
training and one week after training was completed.
In Example 3, Crossley and his colleagues had previously explained that the
results showed a “difference” between the two text types studied (i.e., sim-
plified texts and authentic texts). The authors then try to explain why this
difference may have occurred.
Note that none of the interpretations are offered as a “proof” or a “claim of
fact.” Instead, the interpretations are offered only as a reasonable explanation
of how the results came to be what they are and how they fit into existing
theory. In that sense we can see that the authors have kept to the theoretical
storyline and supported any claims with appropriate reasoning.
Although the interpretations offered at the beginning of the section help
explain this critical feature of a discussion, we must admit that it is generally
quite difficult to precisely demarcate the ending of the result element of the
exposition move and the beginning of the interpretation move. In most studies,
authors will have blended into their narrative the interpretations, results,
implications, and many other elements. As such, our point is simply that an
interpretation should be there, and not that a single piece of text needs to be
reserved for its presentation.
Frozen Expressions. As always, there are some frozen expressions that
may help authors when writing the interpretation move.
Taken as a whole. This phrase is often useful for bundling up an array of
results before laying down an interpretation. The phrase also seems to have
the effect of lessening the impact of the weaker results. For example, we might
say: “Taken as a whole, the Allies fought an effective campaign during the
Second World War.” It is hard to see how anyone would disagree with such
a statement, and yet it neatly glosses over facts such as the Holocaust, the
millions of Allied and civilian deaths, and such military failures as Pearl
Harbor, the defense of the Philippines, or the opening U.S. military campaign
in Africa.
The result is encouraging. When results are clearly not all you would have
dreamed, but there is at least one avenue of hope, we can claim a result to be
“encouraging.” Encouraging is used in science in much the same way that it is
used in politics and sports. For example, when unemployment is going up,
but not by as much as it was the previous month, we often hear it described as
encouraging. And if a team loses a closely contested game in overtime, as
Implications Move
Let’s recap. We have reminded the readers as to what our research study
was about; we have stated the main findings; and we have offered a plausible
interpretation of the results. Our next major task is to explain the implications
of the interpretations.
Two reasonable questions to ask at this time are “implications for what?”
and “implications for whom?” The what would be the theoretical framework:
Writers need to explain such elements as whether the results appear to
support (or not support) the current framework, in what ways they support
(or don’t support) the framework, and what is likely to happen if the frame-
work assimilates the findings of the current study. In turn, the whom would
generally refer to teachers, materials designers, industry, and also other
researchers: Writers need to explain how the results might affect material
production and material usage, in what ways the results affect that material,
and what is likely to happen if the material producers and users assimilate the
findings of the current study.
In terms of writing up the implications, it is fair to say that the implications
move is seldom a single stretch of text. Instead, it is more often the case that
implications tend to pop up around findings and interpretations and wher-
ever else is relevant (recall that the Discussion section is quite fluid). As such,
it is difficult to offer obvious examples of implications paragraphs. Despite
this difficulty, we have selected the following Coh-Metrix extracts to show
how implications are included in paragraphs, and how a variety of frozen
expressions can help foreground the implications.
Coh-Metrix Examples of the Implications Move. As we discussed in
Chapter 8, any experiment of any value is related to some kind of theoretical
framework. And as we discussed earlier in this chapter, one of the major
[B]ecause we used the same texts as those in McNamara (2001, 2004), we predicted
that, overall, readers would have difficulty understanding the material. McNamara
(2001) argued that the difficulty of the text impeded readers’ ability to develop a
situation model of the material, and because so few readers were able to develop a
coherent situation model of the text, the default representation for the reader was the
textbase. This notion was supported by a difference of over 50% correct comparing
participants’ scores on text-based questions and bridging-inference questions. This
result contrasts with previous studies in which the overall performance for text-
based and bridging-inference questions was, on average, only 3% higher for text-
based questions as compared to bridging-inference questions (McNamara &
Kintsch, 1996, Experiments 1 and 2; McNamara et al., 1996, Experiment 2; emphasis
added).
Frozen Expressions. Once again, we have several frozen expressions that may
be helpful in directing the readership toward the implications of the study.
table 12.5. Examples of frozen expressions associated with the implications move

1. “It thus expands our understanding of the ways in which different types of reader interpret sentences in the comprehension process.” (Best, Ozuru, & McNamara, 2004)

2. “The results of this analysis . . . suggest that authentic texts are significantly more likely than simplified texts to contain causal verbs and particles. Therefore, they are possibly better at demonstrating cause-and-effect relationships and developing plot lines and themes than are simplified texts. This finding supports many of the criticisms that have been leveled against simplified texts by proponents of authentic texts, including claims that simplified texts exhibit stilted and unnatural language, do not demonstrate natural cause-and-effect relationships, and do not develop plots and ideas sufficiently.” (Crossley et al., 2007)

3. “These results suggest that the first half of sentences alone contains sufficient domain characteristics for skilled readers to begin the process of activating knowledge of text structure: a process which facilitates comprehension. Such research may lead to better understanding of how knowledge is represented and subsequently activated.” (McCarthy et al., 2007)

4. “This finding supports many of the criticisms that have been leveled against simplified texts by proponents of authentic texts, including claims that simplified texts exhibit stilted and unnatural language, do not demonstrate natural cause-and-effect relationships, and do not develop plots and ideas sufficiently.” (Crossley et al., 2007)
For the implications moves, we find that the associated frozen expressions
generally take the form of a word or phrase that signals a transition in the
text from an interpretation toward an imminent implication. Common
examples of these terms include the adverbials “thus,” “hence,” “therefore,”
“as such,” “along these lines,” “consequently,” and “correspondingly.” Noun
phrases are also common, with examples including “this research,” “these
processes,” and “this analysis.” In general, readers’ comprehension is likely to
be facilitated by these expressions, because explicit transitionals require less
inferencing on the part of the reader.
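These signals are regular enough that a draft can be scanned for them mechanically. The sketch below is our own illustration: the signal list restates the expressions named above, and the function name and sample text are hypothetical.

```python
import re

# Transition signals that commonly mark a move from interpretation
# to implication (the adverbials and noun phrases discussed above).
IMPLICATION_SIGNALS = (
    "thus", "hence", "therefore", "as such", "along these lines",
    "consequently", "correspondingly", "this research",
    "these processes", "this analysis", "this finding",
)

def flag_implication_openers(draft):
    """Return the sentences in a draft that open with an implication signal."""
    sentences = re.split(r"(?<=[.!?])\s+", draft.strip())
    return [s for s in sentences
            if s.lower().startswith(IMPLICATION_SIGNALS)]

sample = ("The simplified texts contained fewer causal verbs. "
          "Therefore, they may be less able to demonstrate cause-and-effect "
          "relationships. This finding supports earlier criticisms.")
for sentence in flag_implication_openers(sample):
    print(sentence)
```

A writer (or reviewer) could use such a pass to check that a Discussion draft actually signals its implications explicitly rather than leaving the transition to be inferred.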
In Table 12.5 we have provided several examples of frozen expressions
associated with the implications move. The first example comes from Best
et al. (2004). Here the authors use the word “thus” to indicate that the discourse
is transitioning from interpretations to implications. The second example
comes from Crossley et al. (2007). Here the authors begin with a single opening
sentence that serves as an interpretation of a wide array of results from the
study. The remainder of the paragraph is dedicated to implications, wherein the
use of the word “therefore” serves much the same purpose as the word “thus.”
Note also the use of the frozen expression “this finding,” which, as we discussed
earlier, is accompanied by the word “supports.” The third example, from
McCarthy et al. (2007), uses the noun phrase “such research” to indicate a
forthcoming implication. And finally, the fourth example, from Crossley et al.
(2007), uses the noun phrase “this finding,” which again indicates a forthcoming
implication.
acknowledgments phase
No project that claims to have made “valuable findings” can be simultane-
ously the “final word” on the subject. At the very least, those findings must be
open to scrutiny, and any conclusions based on those findings need to be
open to challenge. But long before any of that business can take place, the
researchers themselves must evaluate their own work: acknowledging prob-
lems, concerns, or shortcomings within the current study, and acknowledging
the long road ahead. These acknowledgments highlight the two major moves
of the acknowledgments phase: the limitations and the future research. Note
that these two moves may also be combined into a single hybrid move.
Table 12.6 provides several examples of the acknowledgment moves, with
each discussed in detail over the forthcoming related sections.
Limitations Move
All studies have limitations: no corpus can account for every possible text; no
experiment can control for every variable; and no collection of indices can
ever provide more than an approximation of a construct. But the good news is
that all (most) reviewers know this, and they understand that the researchers’
requirement is to make a “good-faith effort” to provide results that reflect the
real world, and not to cover every possible angle of every possible eventuality.
If it were otherwise, no one would ever publish anything.
Perfection may not be compulsory, but there are still lines in the sand, gray
areas, and debatable points. Moreover, a research project can often start off
with indisputable data but end up with an analysis that is anything but
indisputable. For example, from a perfectly good corpus the researchers may
have detected an unusual phenomenon. The researchers wish to investigate this
phenomenon more closely; to do so, however, their number of items becomes
table 12.6 (cont.)
of human life), the project was limited by too few artists deciding to end their
own lives, and by those artists who did end it all not being sufficiently verbose.
Thus, the authors had to explain these extenuations in their limitations move.
Some people may argue that just as bad workmen blame their tools, so too
do bad scientists blame their data; however, as we discussed in Chapter 9, we
have to start somewhere (even with poor data), and showing our results (even
if they’re bad) and discussing our thoughts on why they are bad are more
likely to lead us to long-term success than simply ignoring issues or dumping
all our analyses in the trash.
about future research than limitations. As an author, you want to avoid talking
the reviewers out of publishing your paper by stressing its weaknesses. So,
serving up limitations as future research avoids highlighting possible weak-
nesses that reviewers might otherwise seize upon.
Two examples of this hybrid form are presented in Table 12.6. In the Hall
et al. (2007) extract, the researchers begin with the limitation, which they
present as a cause and effect (see the first sentence of the extract). The authors
then move directly to explain how this limitation forms the springboard for
their next analysis (see the second sentence). The subsequent example is
much more complex. The authors (Crossley et al., 2008) begin with a simple
acknowledgment that no study is perfect. They then proceed to their first of
three extenuations before finally admitting a limitation. This limitation is
then immediately followed by a second extenuation before there is an impli-
cation (which again contains an extenuation). Finally, the authors present
their solution to the limitation, which is, of course, future research.
The comments in the preceding paragraphs may seem like we are making
a joke at the authors’ expense; however, in most cases we are actually the
authors ourselves. But in any case, the point of importance here is that today’s
limitations are tomorrow’s publications, and that there is no shame in
acknowledging that. That having been said, training a spotlight on our least
favorable attributes is probably a little too altruistic. As such, we recommend
that beginning researchers think carefully about the limitations of their studies
and what may mitigate those limitations, and present the collected evidence
positively as a course for the future.
Frozen Expressions
As ever, a number of frozen expressions have evolved as part of the acknowl-
edgments move. Some of these expressions are listed below:
Where X puts a limit on the word “limited” (e.g., might be, is somewhat,
arguably, etc.)
Although/While X, the results produced here offer an important and
exciting . . . .
Where X is an acknowledgment of the limitation
Although/While [acknowledgment of limitation], the results produced here
contribute to the field of X by Y.
Where X is the field in general or a particular subfield being highlighted
(e.g., the conference being applied to); and where Y is the interpretation and/or
implication of the findings.
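Hunting for frozen expressions of this kind in a draft is easy to automate. The sketch below is illustrative only: the pattern list is a hypothetical starter set drawn from the expressions discussed in this chapter, not an inventory that Coh-Metrix itself provides.

```python
import re

# Hypothetical starter patterns based on the frozen expressions discussed
# in this chapter; a real checklist would be tailored to your own field.
FROZEN_PATTERNS = [
    r"(?:although|while)\s+much\s+work\s+remains\s+to\s+be\s+done",
    r"this\s+finding\s+supports",
    r"such\s+research",
    r"the\s+results\s+produced\s+here",
]

def find_frozen_expressions(text):
    """Return (matched expression, character offset) pairs, in text order."""
    hits = []
    for pattern in FROZEN_PATTERNS:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[1])

draft = ("While much work remains to be done, this finding supports "
         "the view that such research is worth pursuing.")
for expression, offset in find_frozen_expressions(draft):
    print(offset, expression)
```

Matching case-insensitively lets one pattern catch both the sentence-initial and mid-sentence forms of each expression.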
closure phase
Who’s going to read your paper? Well, apart from your family and friends,
your audience is likely to be made up of reviewers, researchers, and profes-
sors. All of these people – except for your family and friends, who are most
likely positively biased – have two things in common: (1) they are all subject to
limited time, energy, and enthusiasm; and (2) they are all going to grade your
work. With these points in mind, it is well to remember that by the time your
readers have reached the final passage of your magnum opus, they’ll have had
to trawl through an ocean of facts, figures, and frameworks and, therefore,
they may be a little tired. But tired or not, your readers will probably choose to
evaluate your work, and to do so they will have to gather their thoughts as to
what the paper was really about and whether the effort they have just put
in was worthwhile. As such, this is the point of the paper where the writer is
advised to serve up a take-home message that is brief, memorable, and
satisfies the reader’s need for closure.
The closure phase features two moves: the wind-up and the pitch. Ideally,
closure is captured in a single paragraph, beginning with the wind-up and
ending with the pitch (which is the very last sentence of the paragraph). The
purpose of the wind-up element is to focus the reader on the “right” con-
clusion (i.e., the interpretation and implications of the study according to the
authors). The purpose of the pitch is to make that conclusion indelible.
Text: In this study, . . .
Commentary: Like "in conclusion," the phrase "in this study" signals that the paper is transitioning to wrapping up.

Text: our interest in topic sentencehood identification was directed at better evaluation of text structure in order to more effectively match text to reader.
Commentary: A restatement of the purpose of the study.

Text: Given that topic sentences are more likely to provide assistance to low skilled/low-knowledge readers, and given that such readers would probably benefit more from ideal type topic sentences,
Commentary: The authors then offer two restatements of assumptions from the theoretical framework. Although it is probably necessary to include this information, the clauses are both dependent, and require the reader to hold a significant amount of information in short-term memory before arriving at the main clause.

Text: then the Free Model of topic sentencehood introduced here offers systems such as Coh-Metrix the opportunity to better assess texts and better fulfill the Coh-Metrix goal of optimally matching text to readers.
Commentary: A reasonable and accurate conclusion is given; however, this final sentence is a massive 59 words long, with no fewer than 32 words occurring in the pitch move. As such, this take-home message will require a truck and trailer.
reflects three facts about the closure move. First, excellent examples of the
move are not common. Second, the move probably deserves more attention
than it has been given. And third, the move is far from easy, demanding a
great deal of the "creativity" that has often been mentioned in this chapter.
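The 59-word closing sentence criticized in the commentary above suggests a simple self-check: count the words in the final sentence of your Discussion before submitting. The sketch below uses naive sentence splitting, and the 30-word threshold is an arbitrary rule of thumb, not a Coh-Metrix measure.

```python
import re

def pitch_length(paragraph):
    """Word count of the final sentence, using naive splitting on . ! ? ."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return 0
    return len(sentences[-1].split())

closing = ("In this study we modeled topic sentencehood. "
           "Our model offers a better way to match texts to readers.")
words = pitch_length(closing)
print(words)
if words > 30:  # arbitrary rule-of-thumb threshold, not a Coh-Metrix value
    print("Consider tightening the pitch.")
```

The lookbehind split keeps each terminal punctuation mark attached to its sentence, so abbreviations aside, the last list element is the pitch itself.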
To close this section, two examples of closure moves are provided from
McCarthy, Renner et al. (2008) and McCarthy and McNamara (2007). The
abridged paragraphs along with corresponding critiques are provided in
Tables 12.7 and 12.8.
Text: While much work remains to be done,
Commentary: At the time this section of the chapter was finally approved (August 2012), a Google search of "while much work remains to be done" provided no fewer than 108,000 hits. Removing the word "work" provided 214,000 hits. Replacing "while" with "although" provided 152,000 hits with "work" included and 244,000 hits with "work" removed. In short, this expression is so commonly used that it should not be thought of as a frozen expression and would be better thought of as a cliché. The point is, of course much work remains to be done! No reader needs to be reminded of this. Instead, readers simply need to be told what work is planned, and those plans should be written in the appropriate acknowledgments section.

Text: our study demonstrates that
Commentary: The authors introduce a summary statement. Note how the paragraph would have read perfectly well without the previous cliché.

Text: genre recognition at the sub-sentential level is possible.
Commentary: A brief and effective statement of the study's achievement.

Text: Such recognition might provide a signature of reading ability, and as a consequence, a method of assessing reading ability. The major results of this study certainly provide sufficient initial evidence that such an approach is viable and that this paradigm can be further explored as an assessment of reading skill.
Commentary: A reasonable summary of the implications.

Text: Furthermore, there have been no previous investigations of how much text is required to recognize genre. This study indicates that very little text is actually required and that readers most likely activate information about text structure very early in the reading process.
Commentary: The final two sentences demonstrate how the study helps develop the theoretical framework. This is an effective strategy, although it is difficult to process the final sentence as a pure pitch move. Consequently, the last two sentences run together, meaning that they lose some impact in terms of being memorable.
memorable), all of them provide the reader with an indelible impression of the
authors’ intent. Naturally, not all readers will agree as to the impact of these
example pitches, but having a strategy with which to form a pitch move may be
helpful.
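The Google hit counts discussed above amount to a frequency test for cliché status. Lacking a web-scale index, the same idea can be approximated against a reference corpus of one's own. The sketch below is a minimal stand-in, with a toy three-document corpus in place of a real collection.

```python
import re

def phrase_frequency(phrase, documents):
    """Occurrences of a phrase per 1,000 words across a document collection."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = sum(len(pattern.findall(doc)) for doc in documents)
    words = sum(len(doc.split()) for doc in documents)
    return 1000.0 * hits / words if words else 0.0

# A toy three-document corpus standing in for a real reference collection.
corpus = [
    "While much work remains to be done, the model performs well.",
    "Much work remains to be done before deployment.",
    "The results were mixed.",
]
print(phrase_frequency("much work remains to be done", corpus))
```

A phrase that scores high against a corpus of published papers in your field is, by the chapter's definition, closer to a cliché than to a useful frozen expression.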
a model discussion
This chapter has been long, so it may be useful to provide a brief model of
how the Discussion section might fit together.
In this study, we assessed [whatever our research question was]. In order
to address [whatever we were addressing], we [whatever we did to address
it]. Our findings suggest [whatever they suggest]. The study is important
because [why it is important].

Collectively/In sum/Broadly speaking/Taken as a whole, our results
should be interpreted to mean [something].

Our findings support/contrast with [whoever and whatever they support,
and whoever and whatever they contrast with].

The implications of our findings
raise questions as to [whatever they raise questions as to]
indicate [whatever they indicate]
may mean [whatever they may mean]
provide evidence of/for [whatever they provide evidence of or for]

Although our study provided [something positive], there are issues as to
[whatever there are issues of]. Future research needs to address [whatever it
needs to address].

In conclusion, our study [what the study did in terms of the research
question, especially as it relates to the theoretical framework]. [Zinger pitch
in terms of some identifiable function.]
Using the information provided in this chapter, along with the immediately
preceding model, Table 12.10 provides a complete Discussion section based on
the Elevator Pitch that was provided in Chapter 7.
and finally
Experience tells us that the Discussion section, more so than any other
section, may well receive the least amount of the author's attention. Quite often
the Discussion section will not even be included when a student submits a
draft for review. Instead, a note will be attached along the lines of “I’ll fill in
the Discussion later.” Even for this book, there was some debate as to whether
table 12.10 A model of the Discussion section by sequential position, paragraph position, discussion phase, discussion move, and element of move

assessed in a series of t-tests.

3 | 1 | Summary | exposition | purpose | The study is important because anyone needing to learn how to communicate effectively (or how to understand what makes effective communication) needs to understand how the features of language can differ between contrasting registers, and why these differences are present.

4 | 1 | Summary | exposition | result | Our results suggested that the language of news reports becomes more complex when reporting global issues. Specifically, global news reports were significantly lower in terms of situation model cohesion and syntactic ease. In addition, global news reports demonstrated significantly higher lexical diversity values, meaning that a greater range of vocabulary was deployed across the texts. The result for narrativity was not significant.

5 | 2 | Denouement | interpretation | - | A plausible reason for these results is that any reporting of global news is likely to be an important story, and therefore one that is difficult to explain. This complexity may be reflected in the language selected by writers. If this writing is more complex then it is possible that writers either don't realize the complexity, or simply be prohibitive to a structure like newspapers.

6 | 3 | Denouement | implications | - | The findings of this study contrast with previous research (e.g., by researchers such as Graesser, Clark, McNamara, Swales, or Kintsch) inasmuch as the newspaper texts appear to be back-to-front in terms of cohesion. That is, theory suggests that background knowledge, schemas, and expectations of shared experience need to be established in order to increase the likelihood of comprehension, and that explicit cohesion at the level of the text might facilitate this goal. As such, the more complex global news story may require more facilitative language, whereas the local news can assume some degree of common ground. In the event, the findings suggest a simplification for local news and a less cohesive text for global news. Assuming comprehension is the goal of the newspaper (which is reasonable), these results have important implications because they suggest that reporters could possibly better serve their readership with an adjustment in their levels of cohesion.

7 | 4 | Acknowledgments | limitations / future research | - | Although the findings of this study offer important insight into perception of local and global issues, we advise some caution with the interpretations of these results until further research can be conducted on this complex issue. For instance, future research must consider to what degree a newspaper is a "learning text," and to what degree it can be compared to something like a more standard high-school text. Such information will better inform us as to expected comprehension levels and expected Coh-Metrix values of the corpora. Further, issues such as the reporter type and the audience type need to be considered. That is, do readers process newspaper text from non-native English speaking countries in a similar way to how native English newspapers are processed? In short, this study is somewhat limited by the difficulty it has in establishing a sufficiently wide number of baselines against which to better understand the findings of this study. To be sure, these baselines will be helpful in future research; however, until that can be achieved, the results produced here offer an important and exciting avenue of pursuit.

8 | 5 | Closure | wind-up | - | We write news and we read news because we want to understand our world: both the world close to us, and the world far away. How this news is reported is just as important as what is reported because our comprehension of the news dictates its value. This study demonstrates that reports of local events are textually different from reports of global events. And more importantly, that complex events might be associated with less facilitative language. The results here cannot yet supply evidence that adding cohesion to news text would be beneficial to news comprehension; future research will need to address that issue. However, what this study does provide is evidence that Coh-Metrix analysis of news text can detect levels of potentially beneficial lexical features.

9 | 5 | Closure | pitch | - | Consequently, we have the intriguing possibility that Coh-Metrix analysis might provide for greater comprehension of one of the world's most widely circulated information materials: the news.
Concluding Remarks
Our hope in this book has been to provide readers with a coherent description
of Coh-Metrix and how to make use of it. Coh-Metrix has changed our lives
in terms of how we conduct research: from the way we ask questions to the
way we answer them. Our understanding of text, including natural language,
discourse, and linguistics, has grown exponentially as we have developed
Coh-Metrix and explored language using our tools. It has opened doors we
never dreamed existed.
To us, Coh-Metrix is like using the Internet. That is, just as we can now
type pretty much any question into a search engine and expect to get an
actionable answer, so too can we ask Coh-Metrix to transform our vast
quantities of data into output that answers a world of questions about
language. But of course, there are certainly limits to Coh-Metrix 3.0. First,
although we have explored hundreds of indices in this project, Coh-Metrix
3.0 only includes a subset of these indices. Nonetheless, we have attempted
to include what we consider to be the most important and valid indices
among the entire array. Second, Coh-Metrix includes a wide variety of
indices, but most of these are related to text difficulty, and our particular
focus has been on measures related to cohesion. As such, Coh-Metrix
cannot answer every question about language. Third, our motto in the
Coh-Metrix project has been to explore the “low-hanging fruit.” The indices
we provide in Coh-Metrix tend not to involve highly complex computa-
tional linguistic algorithms. We have avoided algorithms that are computa-
tionally expensive because of the need to process text and provide results
relatively quickly. That said, the Coh-Metrix variables that we have included
are the potential building blocks of far more sophisticated assessments that
we will continue to develop in the next phases of the Coh-Metrix project.
Despite these limitations, Coh-Metrix provides a gold mine of information
about text – and all of it in one tool. We know from the use of the past
C:/ITOOLS/WMS/CUP-NEW/4417777/WORKINGFOLDER/MCNAM/9780521192927CON.3D 224 [223–228] 9.10.2013
8:16AM
that other researchers can benefit from them, but also so that findings,
developments, and discoveries are still current when they are disseminated.
It is virtually impossible to keep up with the rapid pace of technological
advances. During the last decade, the capabilities of technology have grown
exponentially. Indeed, we expect the opportunities from this growth to
provide exciting new adventures that we couldn’t possibly foretell today.
We look forward to the next decade, and the decades following, of Coh-
Metrix and its progeny. We hope you enjoy it too!
Finally, we have some concluding remarks reserved solely for our student
readers.
evenings trying to help you develop a career, they actually also have the occa-
sional life of their own. And when they receive an e-mail saying, “Thanks for the
comments but I actually sent you the wrong draft – the right draft is now
attached,” they are likely to descend into a hitherto unimaginable outpouring of
spit-infested meltdown. If you do suddenly realize the error of your ways, you
are advised to withdraw from the course, and possibly from the country.
References
Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge,
MA: MIT Press.
Allen, J. (1995). Natural language understanding. Redwood City, CA: Benjamin/Cummings.
Allen, J. F. (2009). Word senses, semantic roles and entailment. 5th International
Conference on Generative Approaches to the Lexicon, September 17–19, 2009. Pisa, Italy.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database
(CD-ROM). Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Beck, I., McKeown, M. G., & Kucan, L. (2002). Bringing words to life: Robust vocabulary
development. New York: Guilford Press.
Beck, I. L., McKeown, M. G., Omanson, R. C., & Pople, M. T. (1984). Improving the
comprehensibility of stories: The effects of revisions that improve coherence. Reading
Research Quarterly, 19, 263–277.
Beck, I. L., McKeown, M. G., Sinatra, G. M., & Loxterman, J. A. (1991). Revising social
studies text from a text-processing perspective: Evidence of improved comprehensi-
bility. Reading Research Quarterly, 27, 251–276.
Bell, C., McCarthy, P. M., & McNamara, D. S. (2012). Using LIWC and Coh-Metrix to
investigate gender differences in linguistic styles. In P. M. McCarthy & C. Boonthum-
Denecke (Eds.), Applied natural language processing and content analysis: Identification,
investigation, and resolution (pp. 545–556). Hershey, PA: IGI Global.
Best, R., Ozuru, Y., & McNamara, D. S. (2004). Self-explaining science texts: Strategies,
knowledge, and reading skill. In Y. B. Kafai, W. A. Sandoval, N. Enyedy, A. S. Nixon, &
F. Herrera (Eds.), Proceedings of the Sixth International Conference of the Learning
Sciences: Embracing Diversity in the Learning Sciences (pp. 89–96). Mahwah, NJ: Erlbaum.
Best, R. M., Floyd, R. G., & McNamara, D. S. (2008). Differential competencies contri-
buting to children’s comprehension of narrative and expository texts. Reading
Psychology, 29, 137–164.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, D. (1993). Register variation and corpus design, computational linguistics. Cambridge:
Cambridge University Press.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language
structure and use. Cambridge: Cambridge University Press.
C:/ITOOLS/WMS/CUP-NEW/4412252/WORKINGFOLDER/MCNAM/9780521192927RFA.3D 230 [229–246] 7.10.2013
3:30PM
Conrad, F. G., & Schober, M. F. (Eds.). (2007). Envisioning the survey interview of the
future. New York: Wiley.
Crismore, A., Markkanen, R., & Steffensen, M. S. (1993). Metadiscourse in persuasive
writing: A study of texts written by American and Finnish university students. Written
Communication, 10, 39–71.
Crossley, S. A., Allen, D., & McNamara, D. S. (2011). Text readability and intuitive sim-
plification: A comparison of readability formulas. Reading in a Foreign Language, 23,
84–102.
Crossley, S. A., Allen, D., & McNamara, D. S. (2012). Text simplification and compre-
hensible input: A case for an intuitive approach. Language Teaching Research, 16,
89–108.
Crossley, S. A., Dufty, D. F., McCarthy, P. M., & McNamara, D. S. (2007). Toward a new
readability: A mixed model approach. In D. S. McNamara & G. Trafton (Eds.),
Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 197–202).
Austin, TX: Cognitive Science Society.
Crossley, S. A., Greenfield, J., & McNamara, D. S. (2008). Assessing text readability using
psycholinguistic indices. TESOL Quarterly, 42, 475–493.
Crossley, S. A., Louwerse, M., McCarthy, P. M., & McNamara, D. S. (2007). A linguistic
analysis of simplified and authentic texts. Modern Language Journal, 91, 15–30.
Crossley, S. A., McCarthy, P. M., & McNamara, D. S. (2007). Discriminating between
second language learning text-types. In D. Wilson & G. Sutcliffe (Eds.), Proceedings of
the 20th International Florida Artificial Intelligence Research Society Conference
(pp. 205–210). Menlo Park, CA: The AAAI Press.
Crossley, S. A., & McNamara, D. S. (2008). Assessing second language reading texts at
the intermediate level: An approximate replication of Crossley, Louwerse, McCarthy,
and McNamara (2007). Language Teaching, 41, 409–429.
Crossley, S. A., & McNamara, D. S. (2009). Computational assessment of lexical differ-
ences in L1 and L2 writing. Journal of Second Language Writing, 18, 119–135.
Crossley, S. A., & McNamara, D. S. (2010). Cohesion, coherence, and expert evaluations of
writing proficiency. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd
Annual Conference of the Cognitive Science Society (pp. 984–989). Austin, TX: Cognitive
Science Society.
Crossley, S. A., & McNamara, D. S. (2011a). Text coherence and judgments of essay
quality: Models of quality and coherence. In L. Carlson, C. Hoelscher, & T. F. Shipley
(Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society
(pp. 1236–1241). Austin, TX: Cognitive Science Society.
Crossley, S. A., & McNamara, D. S. (2011b). Understanding expert ratings of essay
quality: Coh-Metrix analyses of first and second language writing. International
Journal of Continuing Engineering Education and Life-Long Learning, 21, 170–191.
Crossley, S. A., & McNamara, D. S. (2012a). Detecting the first language of second language
writers using automated indices of cohesion, lexical sophistication, syntactic complexity
and conceptual knowledge. In S. Jarvis & S. A. Crossley (Eds.), Approaching language
transfer through text classification: Explorations in the detection-based approach
(pp. 106–126). Bristol, UK: Multilingual Matters.
Crossley, S. A., & McNamara, D. S. (2012b). Interlanguage Talk: A computational anal-
ysis of non-native speakers’ lexical production and exposure. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing and content
analysis: Identification, investigation, and resolution (pp. 425–437). Hershey, PA: IGI
Global.
Crossley, S. A., Roscoe, R., Graesser, A., & McNamara, D. S. (2011). Predicting human
scores of essay quality using computational indices of linguistic and textual features.
In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Proceedings of the 15th International
Conference on Artificial Intelligence in Education. (pp. 438–440). Auckland, New
Zealand: AIED.
Crossley, S. A., Salsbury, T., McCarthy, P. M., & McNamara, D. S. (2008), LSA as a measure
of coherence in second language natural discourse. In V. Sloutsky, B. Love, & K. McRae
(Eds.), Proceedings of the 30th annual conference of the Cognitive Science Society
(pp. 1906–1911). Washington, DC: Cognitive Science Society.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2009). Measuring L2 lexical growth
using hypernymic relationships. Language Learning, 59, 307–334.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010a). The development of polysemy
and frequency use in English second language speakers. Language Learning, 60,
573–605.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010b). The development of semantic
relations in second language speakers: A case for latent semantic analysis. Vigo
International Journal of Applied Linguistics, 7, 55–74.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010c). The role of lexical cohesive
devices in triggering negotiations for meaning. Issues in Applied Linguistics, 18,
55–80.
Crossley, S. A., Weston, J., McLain Sullivan, S. T., & McNamara, D. S. (2011). The
development of writing proficiency as a function of grade level: A linguistic analysis.
Written Communication, 28, 282–311.
Defense Advanced Research Projects Agency (DARPA) (1995). Proceedings of the Sixth
Message Understanding Conference (MUC-6). San Francisco: Morgan Kaufman
Publishers.
Day, R. S. (2006). Comprehension of prescription drug information: Overview of a
research program. In Proceedings of the American Association for Artificial
Intelligence, Argumentation for Consumer Healthcare. Retrieved September 16, 2013,
from http://www.aaai.org/Library/Symposia/Spring/2006/ss06-01-005.php
Dell, G., McKoon, G., & Ratcliff, R. (1983). The activation of antecedent information
during the processing of anaphoric reference in reading. Journal of Verbal Learning
and Verbal Behavior, 22, 121–132.
Dempsey, K. B., McCarthy, P. M., & McNamara, D. S. (2007). Using phrasal verbs as an
index to distinguish text genres. In D. Wilson and G. Sutcliffe (Eds.), Proceedings of the
twentieth International Florida Artificial Intelligence Research Society Conference
(pp. 217–222). Menlo Park, CA: The AAAI Press.
Dufty, D. F., Graesser, A. C., Louwerse, M., & McNamara, D. S. (2006). Assigning grade
level to textbooks: Is it just readability? In R. Sun & N. Miyake (Eds.), Proceedings of
the 28th Annual Conference of the Cognitive Science Society (pp. 1251–1256). Austin,
TX: Cognitive Science Society.
Dufty, D. F., McNamara, D., Louwerse, M., Cai, Z., & Graesser, A. C. (2004). Automatic
evaluation of aspects of document quality. In S. Tilley & S. Huang (Eds.), Proceedings
of the 22nd Annual International Conference on Design of Communication: the
Engineering of Quality Documentation (pp. 14–16). New York: ACM Press.
Duncan, B., & Hall, C. (2009). A Coh-Metrix analysis of variation among biomedical
abstracts. In Proceedings of the Florida Artificial Intelligence Research Society
Conference (pp. 237–242). Menlo Park, CA: The AAAI Press.
Duran, N., Bellissens, C., Taylor, R., & McNamara, D. S. (2007). Qualifying text difficulty
with automated indices of cohesion and semantics. In D. S. McNamara & G. Trafton
(Eds.), Proceedings of the 29th Annual Meeting of the Cognitive Science Society
(pp. 233–238). Austin, TX: Cognitive Science Society.
Duran, N. D., Hall, C., McCarthy, P. M., & McNamara, D. S. (2010). The linguistic
correlates of conversational deception: Comparing natural language processing tech-
nologies. Applied Psycholinguistics, 31, 439–462.
Duran, N. D., McCarthy, P. M., Graesser, A. C., & McNamara, D. S. (2006). Using Coh-
Metrix temporal indices to predict psychological measures of time. In R. Sun &
N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science
Society (pp. 190–195). Austin, TX: Cognitive Science Society.
Duran, N. D., McCarthy, P. M., Graesser, A. C., & McNamara, D. S. (2007). Using
temporal cohesion to predict temporal coherence in narrative and expository texts.
Behavior Research Methods, 39, 212–223.
Duran, N. D., & McNamara, D. S. (2006, July). It’s about time: Discriminating differences
in temporality between genres. Poster presented at the 16th Annual Meeting of the
Society for Text and Discourse, Minneapolis, MN.
Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database [CD-ROM].
Cambridge, MA: MIT Press.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.
Freedman, A., & Pringle, I. (1980). Writing in the college years: Some indices of growth.
College Composition and Communication, 31, 311–324.
Garnham, A., Oakhill, J., & Johnson-Laird, P. N. (1982). Referential continuity and the
coherence of discourse. Cognition, 11, 29–46.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ:
Erlbaum.
Gernsbacher, M. A., & Givón, T. (Eds.). (1995). Coherence in spontaneous text. Amsterdam:
Benjamins.
Gildea, D. (2001). Corpus variation and parser performance. In D. Yarowsky (Ed.),
Proceedings of the 2001 Conference on Empirical Methods in Natural Language
Processing (pp. 167–202). Pittsburgh, PA: NAACL.
Gilhooly, K. J., & Logie, R. H. (1980). Age of acquisition, imagery, concreteness, familiar-
ity and ambiguity measures for 1,944 words. Behavior Research Methods &
Instrumentation, 12, 395–427.
Givón, T. (1995). Functionalism and grammar. Philadelphia: John Benjamins.
Graesser, A. C. (1981). Prose comprehension beyond the word. New York: Springer-Verlag.
Graesser, A. C., Cai, Z., Louwerse, M., & Daniel, F. (2006). Question Understanding Aid
(QUAID): A web facility that helps survey methodologists improve the comprehen-
sibility of questions. Public Opinion Quarterly, 70, 3–22.
Graesser, A. C., Chipman, P., Haynes, B. C., & Olney, A. (2005). AutoTutor: An intelli-
gent tutoring system with mixed-initiative dialogue. IEEE Transactions in Education,
48, 612–618.
Graesser, A. C., Dowell, N., & Moldovan, C. (2011). A computer’s understanding of
literature. Scientific Studies of Literature, 1, 24–33.
Haviland, S. E., & Clark, H. H. (1974). What’s new? Acquiring new information as
a process in comprehension. Journal of Verbal Learning and Verbal Behavior, 13,
515–521.
Healy, S. L., Weintraub, J. D., McCarthy, P. M., Hall, C., & McNamara, D. S. (2009).
Assessment of LDAT as a grammatical diversity assessment tool. In C. H. Lane &
H. W. Guesgen (Eds.), Proceedings of the 22nd International Florida Artificial Intelligence
Research Society (FLAIRS) Conference (pp. 249–253). Menlo Park, CA: The AAAI Press.
Hempelmann, C. F., Dufty, D., McCarthy, P., Graesser, A. C., Cai, Z., & McNamara, D. S.
(2005). Using LSA to automatically identify givenness and newness of noun-phrases
in written discourse. In B. Bara (Ed.), Proceedings of the 27th Annual Meeting of the
Cognitive Science Society (pp. 941–946). Mahwah, NJ: Erlbaum.
Hempelmann, C. F., Rus V., Graesser, A. C., & McNamara, D. S. (2006). Evaluating state-
of-the-art treebank-style parsers for Coh-Metrix and other learning technology envi-
ronments. Natural Language Engineering, 12, 131–144.
Herskovits, A. (1998). Schematization. In P. Olivier & K. P. Gapp (Eds.), Representation
and processing of spatial expressions (pp. 149–162). Mahwah, NJ: Lawrence Erlbaum
Associates.
Hu, X., Cai, Z., Louwerse, M. M., Olney, A. M., Penumatsa, P., & Graesser, A. C. (2003). A
revised algorithm for Latent Semantic Analysis. Proceedings of the 2003 International
Joint Conference on Artificial Intelligence (pp. 1489–1491). San Francisco: Morgan
Kaufmann.
Huot, B. (1996). Toward a new theory of writing assessment. College Composition and
Communication, 47, 549–566.
Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity.
Language Testing, 19, 57–84.
Jurafsky, D., & Martin, J. (2008). Speech and language processing. Englewood, NJ:
Prentice Hall.
Just, M. A., & Carpenter, P. A. (1971). Comprehension of negation with quantification.
Journal of Verbal Learning and Verbal Behavior, 12, 21–31.
Just, M. A., & Carpenter, P. A. (1987). The psychology of reading and language compre-
hension. Boston: Allyn & Bacon.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual
differences in working memory. Psychological Review, 99, 122–149.
Kallet, R. H. (2004). How to write the methods section of a research paper. Respiratory
Care, 49, 1229–1232.
Kalyuga, S. (2012). Cognitive load aspects of text processing. In C. Boonthum-Denecke,
P. McCarthy, & T. Lamkin (Eds.), Cross-disciplinary advances in applied natural
language processing: Issues and approaches (pp. 114–132). Hershey, PA: Information
Science Reference.
Kamil, M. L., Pearson, P. D., Moje, E. B., & Afflerbach, P. (Eds.). (2010). Handbook of
reading research (Vol. 4). New York: Routledge.
Keenan, J. M., Betjemann, R. S., & Olson, R. K. (2008). Reading comprehension tests vary
in the skills they assess: Differential dependence on decoding and oral comprehension.
Scientific Studies of Reading, 12, 281–300.
Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological
Review, 88, 197–227.
Kieras, D. E. (1978). Good and bad structure in simple paragraphs: Effects on apparent
theme, reading time, and recall. Journal of Verbal Learning and Verbal Behavior, 17,
13–28.
Kincaid, J., Fishburne, R., Rogers, R., & Chissom, B. (1975). Derivation of new readability
formulas for navy enlisted personnel. Branch Report 8–75. Millington, TN: Chief of
Naval Training.
King, M., & Rentel, V. (1979). Toward a theory of early writing development. Research in
the Teaching of English, 13, 243–255.
Kinnear, P. R., & Gray, C. D. (2008). SPSS 15 made simple. New York: Psychology Press.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-
integration model. Psychological Review, 95, 163–182.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge: Cambridge
University Press.
Kintsch, W., & Keenan, J. (1973). Reading rate and retention as a function of the number
of propositions in the base structure of sentences. Cognitive Psychology, 5, 257–274.
Kintsch, W., Kozminsky, E., Streby, W. J., McKoon, G., & Keenan, J. M. (1975).
Comprehension and recall of text as a function of content variables. Journal of
Verbal Learning and Verbal Behavior, 14, 196–214.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and
production. Psychological Review, 85, 363–394.
Kireyev, K., & Landauer, T. (2011). Word maturity: Computational modeling of word
knowledge. In Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies (pp. 299–308). Portland,
OR: Association for Computational Linguistics.
Klare, G. R. (1974–1975). Assessing readability. Reading Research Quarterly, 10, 62–102.
Klein, W. (1994). Time in language. London: Routledge.
Koslin, B. I., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading. New
York: College Entrance Examination Board.
Lamkin, T. A., & McCarthy, P. M. (2012). The hierarchy of detective fiction. In
C. Murray & P. M. McCarthy (Eds.), Proceedings of the 24th International Florida
Artificial Intelligence Research Society Conference (pp. 257–262). Menlo Park, CA: The
AAAI Press.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent
semantic analysis theory of acquisition, induction, and representation of knowledge.
Psychological Review, 104, 211–240.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment.
Assessment in Education: Principles, Policy & Practice, 10, 295–308.
Landauer, T., McNamara, D. S., Dennis, S., & Kintsch, W. (Eds.). (2007). Handbook of
latent semantic analysis. Mahwah, NJ: Erlbaum.
Lappin, S., & Leass, H. J. (1994). An algorithm for pronominal coreference resolution.
Computational Linguistics, 20, 535–561.
Leahey, T. H., & Harris, R. J. (1997). Learning and cognition (4th ed.). Saddle River, NJ:
Prentice Hall.
Leech, N. L., Barrett, K. C., & Morgan, G. A. (2008). SPSS for intermediate statistics: Use
and interpretation. Mahwah, NJ: Lawrence Erlbaum Associates.
Lehnert, W. G., & Ringle, M. H. (Eds.). (1982). Strategies for natural language processing.
Hillsdale, NJ: Lawrence Erlbaum.
McCarthy, P. M., Rus, V., Crossley, S. A., Graesser, A. C., & McNamara, D. S. (2008).
Assessing forward-, reverse-, and average-entailment indices on natural language
input from the intelligent tutoring system, iSTART. In D. Wilson & G. Sutcliffe
(Eds.), Proceedings of the 21st International Florida Artificial Intelligence Research
Society (FLAIRS) Conference (pp. 165–170). Menlo Park, CA: The AAAI Press.
McCarthy, P. M., Watanabe, S., & Lamkin, T. A. (2012). The Gramulator: A tool to
identify differential linguistic features of correlative text types. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing: Identification,
investigation, and resolution (pp. 312–333). Hershey, PA: IGI Global.
McCutchen, D. (1986). Domain knowledge and linguistic knowledge in the development
of writing ability. Journal of Memory and Language, 25, 431–444.
McCutchen, D., & Perfetti, C. A. (1982). Coherence and connectedness in the develop-
ment of discourse production. Text, 2, 113–139.
McNamara, D. S. (1997). Comprehension skill: A knowledge-based account. In
M. G. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference
of the Cognitive Science Society (pp. 508–513). Hillsdale, NJ: Erlbaum.
McNamara, D. S. (2001). Reading both high-coherence and low-coherence texts: Effects
of text sequence and prior knowledge. Canadian Journal of Experimental Psychology,
55, 51–62.
McNamara, D. S. (2004). SERT: Self-explanation reading training. Discourse Processes,
38, 1–30.
McNamara, D. S. (2011). Computational methods to extract meaning from text and
advance theories of human cognition. Topics in Cognitive Science, 2, 1–15.
McNamara, D. S., Boonthum, C., Levinstein, I. B., & Millis, K. (2007). Evaluating self-
explanations in iSTART: Comparing word-based and LSA algorithms. In T. Landauer,
D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis
(pp. 227–241). Mahwah, NJ: Erlbaum.
McNamara, D. S., Cai, Z., & Louwerse, M. M. (2007). Optimizing LSA measures of
cohesion. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.),
Handbook of latent semantic analysis (pp. 379–400). Mahwah, NJ: Erlbaum.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of
writing quality. Written Communication, 27, 57–86.
McNamara, D. S., Crossley, S. A., & Roscoe, R. D. (2013). Natural language processing
in an intelligent writing strategy tutoring system. Behavior Research Methods, 45,
499–515.
McNamara, D. S., & Dempsey, K. (2011). Reader expectations of question formats and
difficulty: Targeting the zone. In M. McCrudden, J. Magliano, & G. Schraw (Eds.),
Text relevance and learning from text (pp. 321–352). Charlotte, NC: Information Age
Publishing.
McNamara, D. S., & Graesser, A. C. (2012). Coh-Metrix: An automated tool for theoret-
ical and applied natural language processing. In P. M. McCarthy & C. Boonthum
(Eds.), Applied natural language processing and content analysis: Identification, inves-
tigation, and resolution (pp. 188–205). Hershey, PA: IGI Global.
McNamara, D. S., Graesser, A. C., & Louwerse, M. M. (2012). Sources of text difficulty:
Across genres and grades. In J. P. Sabatini, E. Albro, & T. O’Reilly (Eds.), Measuring
up: Advances in how we assess reading ability (pp. 89–116). Plymouth, UK: Rowman &
Littlefield Education.
McNamara, D. S., & Kintsch, W. (1996). Learning from text: Effects of prior knowledge
and text coherence. Discourse Processes, 22, 247–287.
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always
better? Text coherence, background knowledge, and levels of understanding in learn-
ing from text. Cognition and Instruction, 14, 1–43.
McNamara, D. S., Louwerse, M. M., McCarthy, P. M., & Graesser, A. C. (2010). Coh-
Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47, 292–330.
McNamara, D. S., & Magliano, J. P. (2009). Towards a comprehensive model of com-
prehension. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 51,
pp. 297–384). New York: Elsevier Science.
McNamara, D. S., & McDaniel, M. (2004). Suppressing irrelevant information:
Knowledge activation or inhibition? Journal of Experimental Psychology: Learning,
Memory, & Cognition, 30, 465–482.
McNamara, D. S., Ozuru, Y., & Floyd, R. G. (2011). Comprehension challenges in the
fourth grade: The roles of text cohesion, text genre, and readers’ prior knowledge.
International Electronic Journal of Elementary Education, 4, 229–257.
McNamara, D. S., Ozuru, Y., Graesser, A. C., & Louwerse, M. (2006). Validating Coh-
Metrix. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the
Cognitive Science Society (pp. 573–578). Austin, TX: Cognitive Science Society.
McNamara, D. S., Raine, R., Roscoe, R., Crossley, S., Jackson, G. T., Dai, J., Cai, Z.,
Renner, A., Brandon, R., Weston, J., Dempsey, K., Lam, D., Sullivan, S., Kim, L.,
Rus, V., Floyd, R., McCarthy, P. M., & Graesser, A. C. (2012). The Writing-Pal: Natural
language algorithms to support intelligent tutoring on writing strategies. In
P. M. McCarthy & C. Boonthum (Eds.), Applied natural language processing and
content analysis: Identification, investigation, and resolution (pp. 298–311). Hershey,
PA: IGI Global.
Meadows, M., & Billington, L. (2005). A review of the literature on marking reliability.
London: National Assessment Agency.
Meichenbaum, D., & Biemiller, A. (1998). Nurturing independent learners: Helping
students take charge of their learning. Cambridge, MA: Brookline Books.
Meyer, B. J. F. (1975). The organization of prose and its effect on memory. New York:
Elsevier.
Meyer, B. J. F., & Wijekumar, K. (2007). Web-based tutoring of the structure strategy:
Theoretical background, design, and findings. In D. S. McNamara (Ed.), Reading
comprehension strategies: Theories, interventions, and technologies (pp. 347–375).
Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to
WordNet: An on-line lexical database. International Journal of Lexicography, 3, 235–244.
Miller, J. R., & Kintsch, W. (1980). Readability and recall of short prose passages: A
theoretical analysis. Journal of Experimental Psychology: Human Learning and
Memory, 6, 335–354.
Millis, K., Graesser, A. C., & Haberlandt, K. (1993). The impact of connectives on
memory for expository texts. Applied Cognitive Psychology, 7, 317–340.
Millis, K., Magliano, J., Wiemer-Hastings, K., Todaro, S., & McNamara, D. S. (2007).
Assessing and improving comprehension with Latent Semantic Analysis. In
T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent
semantic analysis (pp. 207–225). Mahwah, NJ: Erlbaum.
Min, H. C., & McCarthy, P. M. (2010). Identifying varietals in the discourse of American
and Korean scientists: A contrastive corpus analysis using the Gramulator. In
H. W. Guesgen & C. Murray (Eds.), Proceedings of the 23rd International Florida
Artificial Intelligence Research Society Conference (pp. 247–252). Menlo Park, CA: The
AAAI Press.
Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2012). Measures of text difficulty: Testing
their predictive value for grade levels and student performance. New York: Student
Achievement Partners.
Oakhill, J., & Cain, K. (2007). Issues of causality in children’s reading comprehension. In
K. Cain & J. Oakhill (Eds.), Cognitive bases of children’s language comprehension
difficulties. New York: Guilford.
Oakhill, J., Yuill, N., & Donaldson, M. L. (1990). Understanding of causal expressions in
skilled and less skilled text comprehenders. British Journal of Developmental Psychology,
8, 401–410.
Oakhill, J. V. (1984). Inferential and memory skills in children’s comprehension of
stories. British Journal of Educational Psychology, 54, 31–39.
Oakhill, J. V., & Yuill, N. M. (1996). Higher order factors in comprehension disability:
Processes and remediation. In C. Cornoldi & J. V. Oakhill (Eds.), Reading compre-
hension difficulties: Processes and remediation (pp. 69–93). Mahwah, NJ: Lawrence
Erlbaum Associates.
O’Brien, E. J., Rizzella, M. L., Albrecht, J. E., & Halleran, J. G. (1998). Updating a situation
model: A memory-based text processing view. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 24, 1200–1210.
O’Reilly, T., Best, R., & McNamara, D. S. (2004). Self-explanation reading training: Effects
for low-knowledge readers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of
the 26th Annual Conference of the Cognitive Science Society (pp. 1053–1058). Mahwah,
NJ: Erlbaum.
O’Reilly, T., & McNamara, D. S. (2007). Reversing the reverse cohesion effect: Good
texts can be better for strategic, high-knowledge readers. Discourse Processes, 43,
121–152.
Ozuru, Y., Best, R., Bell, C., Witherspoon, A., & McNamara, D. S. (2007). Influence of
question format and text availability on assessment of expository text comprehension.
Cognition & Instruction, 25, 399–438.
Ozuru, Y., Briner, S., Best, R., & McNamara, D. S. (2010). Contributions of self-
explanation to comprehension of high and low cohesion texts. Discourse Processes,
47, 641–667.
Ozuru, Y., Dempsey, K., & McNamara, D. S. (2009). Prior knowledge, reading skill,
and text cohesion in the comprehension of science texts. Learning and Instruction, 19,
228–242.
Ozuru, Y., Rowe, M., O’Reilly, T., & McNamara, D. S. (2008). Where’s the difficulty in
standardized reading tests: The passage or the question? Behavior Research Methods,
40, 1001–1015.
Page, E. B., & Petersen, N. S. (1995). The computer moves into essay grading: Updating
the ancient test. Phi Delta Kappan, 76, 561–565.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery and meaningful-
ness values for 925 words. Journal of Experimental Psychology Monograph Supplement,
76 (3, Part 2).
Palmer, M., Kingsbury, P., & Gildea, D. (2005). The Proposition Bank: An annotated
corpus of semantic roles. Computational Linguistics, 31, 71–106.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word
Count: LIWC 2007. Austin, TX: LIWC.net.
Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The
development and psychometric properties of LIWC2007. Austin, TX: LIWC.net.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count
(LIWC) (Version LIWC2001) [Computer software]. Mahwah, NJ: Erlbaum.
Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us.
London: Bloomsbury Press.
Pentimonti, J. M., Zucker, T. A., Justice, L. M., & Kaderavek, J. N. (2010). Informational
text use in preschool classroom read-alouds. The Reading Teacher, 63, 656–665.
Perfetti, C. A. (2007). Reading ability: Lexical quality to comprehension. Scientific Studies
of Reading, 11, 357–383.
Perfetti, C. A., Landi, N., & Oakhill, J. (2005). The acquisition of reading comprehension skill. In
M. J. Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 227–247).
Oxford: Blackwell.
Pickering, M., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue.
Behavioral and Brain Sciences, 27, 169–226.
Popken, R. (1991). A study of topic sentence use in technical writing. The Technical
Writing Teacher, 18, 49–58.
Prince, E. F. (1981). Toward a taxonomy of given-new information. In P. Cole (Ed.),
Radical pragmatics (pp. 223–255). New York: Academic Press.
Rapp, D. N., van den Broek, P., McMaster, K. L., Kendeou, P., & Espin, C. A. (2007).
Higher-order comprehension processes in struggling readers: A perspective for
research and intervention. Scientific Studies of Reading, 11, 289–312.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124, 372–422.
Rayner, K., Foorman, B., Perfetti, C., Pesetsky, D., & Seidenberg, M. (2001). How
psychological science informs the teaching of reading. Psychological Science in the
Public Interest, 2(2), 31–74.
Renner, A., McCarthy, P. M., Boonthum-Denecke, C., & McNamara, D. S. (2012).
Maximizing ANLP evaluation: Harmonizing flawed input. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing and content anal-
ysis: Identification, investigation, and resolution (pp. 438–456). Hershey, PA: IGI
Global.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure
of categories. Cognitive Psychology, 7, 573–605.
Roscoe, R. D., Crossley, S. A., Weston, J. L., & McNamara, D. S. (2011). Automated
assessment of paragraph quality: Introductions, body, and conclusion paragraphs.
In R. C. Murray & P. M. McCarthy (Eds.), Proceedings of the 24th International
Florida Artificial Intelligence Research Society (FLAIRS) Conference (pp. 281–286).
Menlo Park, CA: AAAI Press.
Rowe, M., & McNamara, D. S. (2008). Inhibition needs no negativity: Negativity links in the
construction-integration model. In V. Sloutsky, B. Love, & K. McRae (Eds.), Proceedings
of the 30th Annual Conference of the Cognitive Science Society (pp. 1777–1782).
Washington, DC: Cognitive Science Society.
This appendix provides the list of indices in Coh-Metrix Version 3.0. The first column
provides the label that appears in the output in the current version. The second column
provides the label used in prior versions of Coh-Metrix. The third column provides a
short description of the index.
Each entry below lists the index number, the label in Version 3.x, the label in Version 2.x, and a short description.
Descriptive
1 DESPC READNP Paragraph count, number of paragraphs
2 DESSC READNS Sentence count, number of sentences
3 DESWC READNW Word count, number of words
4 DESPL READAPL Paragraph length, number of sentences, mean
5 DESPLd n/a Paragraph length, number of sentences, standard deviation
6 DESSL READASL Sentence length, number of words, mean
7 DESSLd n/a Sentence length, number of words, standard deviation
8 DESWLsy READASW Word length, number of syllables, mean
9 DESWLsyd n/a Word length, number of syllables, standard deviation
10 DESWLlt n/a Word length, number of letters, mean
11 DESWLltd n/a Word length, number of letters, standard deviation
Text Easability Principal Component Scores
12 PCNARz n/a Text Easability PC Narrativity, z score
13 PCNARp n/a Text Easability PC Narrativity, percentile
14 PCSYNz n/a Text Easability PC Syntactic simplicity, z score
15 PCSYNp n/a Text Easability PC Syntactic simplicity, percentile
16 PCCNCz n/a Text Easability PC Word concreteness, z score
17 PCCNCp n/a Text Easability PC Word concreteness, percentile
18 PCREFz n/a Text Easability PC Referential cohesion, z score
19 PCREFp n/a Text Easability PC Referential cohesion, percentile
20 PCDCz n/a Text Easability PC Deep cohesion, z score
21 PCDCp n/a Text Easability PC Deep cohesion, percentile
22 PCVERBz n/a Text Easability PC Verb cohesion, z score
23 PCVERBp n/a Text Easability PC Verb cohesion, percentile
24 PCCONNz n/a Text Easability PC Connectivity, z score
25 PCCONNp n/a Text Easability PC Connectivity, percentile
26 PCTEMPz n/a Text Easability PC Temporality, z score
27 PCTEMPp n/a Text Easability PC Temporality, percentile
Referential Cohesion
28 CRFNO1 CRFBN1um Noun overlap, adjacent sentences, binary, mean
29 CRFAO1 CRFBA1um Argument overlap, adjacent sentences, binary, mean
30 CRFSO1 CRFBS1um Stem overlap, adjacent sentences, binary, mean
31 CRFNOa CRFBNaum Noun overlap, all sentences, binary, mean
32 CRFAOa CRFBAaum Argument overlap, all sentences, binary, mean
33 CRFSOa CRFBSaum Stem overlap, all sentences, binary, mean
34 CRFCWO1 CRFPC1um Content word overlap, adjacent sentences, proportional, mean
35 CRFCWO1d n/a Content word overlap, adjacent sentences, proportional, standard deviation
36 CRFCWOa CRFPCaum Content word overlap, all sentences, proportional, mean
37 CRFCWOad n/a Content word overlap, all sentences, proportional, standard deviation
LSA
38 LSASS1 LSAassa LSA overlap, adjacent sentences, mean
39 LSASS1d LSAassd LSA overlap, adjacent sentences, standard deviation
40 LSASSp LSApssa LSA overlap, all sentences in paragraph, mean
41 LSASSpd LSApssd LSA overlap, all sentences in paragraph, standard deviation
42 LSAPP1 LSAppa LSA overlap, adjacent paragraphs, mean
43 LSAPP1d LSAppd LSA overlap, adjacent paragraphs, standard deviation
44 LSAGN LSAGN LSA given/new, sentences, mean
45 LSAGNd n/a LSA given/new, sentences, standard deviation
Lexical Diversity
46 LDTTRc TYPTOKc Lexical diversity, type-token ratio, content word lemmas
47 LDTTRa n/a Lexical diversity, type-token ratio, all words
48 LDMTLDa LEXDIVTD Lexical diversity, MTLD, all words
49 LDVOCDa LEXDIVVD Lexical diversity, VOCD, all words
Connectives
50 CNCAll CONi All connectives incidence
51 CNCCaus CONCAUSi Causal connectives incidence
52 CNCLogic CONLOGi Logical connectives incidence
53 CNCADC CONADVCONi Adversative and contrastive connectives incidence
54 CNCTemp CONTEMPi Temporal connectives incidence
55 CNCTempx CONTEMPEXi Expanded temporal connectives incidence
56 CNCAdd CONADDi Additive connectives incidence
57 CNCPos n/a Positive connectives incidence
58 CNCNeg n/a Negative connectives incidence
Situation Model
59 SMCAUSv CAUSV Causal verb incidence
60 SMCAUSvp CAUSVP Causal verbs and causal particles incidence
61 SMINTEp INTEi Intentional verbs incidence
62 SMCAUSr CAUSC Ratio of causal particles to causal verbs
63 SMINTEr INTEC Ratio of intentional particles to intentional verbs
64 SMCAUSlsa CAUSLSA LSA verb overlap
65 SMCAUSwn CAUSWN WordNet verb overlap
66 SMTEMP TEMPta Temporal cohesion, tense and aspect repetition, mean
Syntactic Complexity
67 SYNLE SYNLE Left embeddedness, words before main verb, mean
68 SYNNP SYNNP Number of modifiers per noun phrase, mean
69 SYNMEDpos MEDwtm Minimal Edit Distance, part of speech
70 SYNMEDwrd MEDawm Minimal Edit Distance, all words
71 SYNMEDlem MEDalm Minimal Edit Distance, lemmas
72 SYNSTRUTa STRUTa Sentence syntax similarity, adjacent sentences, mean
73 SYNSTRUTt STRUTt Sentence syntax similarity, all combinations, across paragraphs, mean
Syntactic Pattern Density
74 DRNP n/a Noun phrase density, incidence
75 DRVP n/a Verb phrase density, incidence
76 DRAP n/a Adverbial phrase density, incidence
77 DRPP n/a Preposition phrase density, incidence
78 DRPVAL AGLSPSVi Agentless passive voice density, incidence
79 DRNEG DENNEGi Negation density, incidence
80 DRGERUND GERUNDi Gerund density, incidence
81 DRINF INFi Infinitive density, incidence
Word Information
82 WRDNOUN NOUNi Noun incidence
83 WRDVERB VERBi Verb incidence
84 WRDADJ ADJi Adjective incidence
85 WRDADV ADVi Adverb incidence
86 WRDPRO DENPRPi Pronoun incidence
87 WRDPRP1s n/a First-person singular pronoun incidence
88 WRDPRP1p n/a First-person plural pronoun incidence
89 WRDPRP2 PRO2i Second-person pronoun incidence
90 WRDPRP3s n/a Third-person singular pronoun incidence
91 WRDPRP3p n/a Third-person plural pronoun incidence
92 WRDFRQc FRCLacwm CELEX word frequency for content words, mean
93 WRDFRQa FRCLaewm CELEX Log frequency for all words, mean
94 WRDFRQmc FRCLmcsm CELEX Log minimum frequency for content words, mean
95 WRDAOAc WRDAacwm Age of acquisition for content words, mean
96 WRDFAMc WRDFacwm Familiarity for content words, mean
97 WRDCNCc WRDCacwm Concreteness for content words, mean
98 WRDIMGc WRDIacwm Imageability for content words, mean
99 WRDMEAc WRDMacwm Meaningfulness, Colorado norms, content words, mean
100 WRDPOLc POLm Polysemy for content words, mean
101 WRDHYPn HYNOUNaw Hypernymy for nouns, mean
102 WRDHYPv HYVERBaw Hypernymy for verbs, mean
103 WRDHYPnv HYPm Hypernymy for nouns and verbs, mean
Readability
104 RDFRE READFRE Flesch Reading Ease
105 RDFKGL READFKGL Flesch-Kincaid Grade Level
106 RDL2 L2 Coh-Metrix L2 Readability
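Several of the descriptive and readability indices above can be approximated directly from surface counts. The following is a rough sketch, not the Coh-Metrix implementation: the sentence splitter is a naive punctuation split and the syllable counter is a crude vowel-group heuristic, both assumptions for illustration. The Flesch formulas themselves are the standard published ones.

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel-letter groups (real syllabification differs).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def descriptive_and_readability(text):
    # Naive sentence split on terminal punctuation (an assumption, not
    # Coh-Metrix's parser-based segmentation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent, n_word = len(sentences), len(words)            # DESSC, DESWC
    asl = n_word / n_sent                                  # DESSL: mean sentence length
    asw = sum(count_syllables(w) for w in words) / n_word  # DESWLsy: mean syllables/word
    fre = 206.835 - 1.015 * asl - 84.6 * asw               # RDFRE: Flesch Reading Ease
    fkgl = 0.39 * asl + 11.8 * asw - 15.59                 # RDFKGL: Flesch-Kincaid Grade Level
    return {"DESSC": n_sent, "DESWC": n_word, "DESSL": asl,
            "DESWLsy": asw, "RDFRE": fre, "RDFKGL": fkgl}
```

For example, `descriptive_and_readability("The cat sat. The dog ran.")` yields two sentences of three one-syllable words each, and the Flesch formulas reduce to functions of those two means.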
This appendix provides norms for the indices described in Chapters 4 and 5. To create
these norms, we analyzed a subset of a large corpus of texts created by Touchstone
Applied Science Associates (TASA), Inc. The total TASA corpus includes 9 genres
consisting of 119,627 paragraphs taken from 37,651 samples. Norms are provided for
the three largest domains represented in TASA: language arts, social studies, and science
texts. To create them, we randomly chose 100 passages from each of the three genres at
each of 13 grade levels, for a total of 3,900 passages.
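The stratified sampling just described (100 passages per genre at each grade level) can be sketched as follows. The corpus representation and the `"genre"`/`"grade"` field names are hypothetical placeholders, not the format of the actual TASA files.

```python
import random

def sample_norming_corpus(passages, genres, grade_levels, n_per_cell=100, seed=0):
    # Stratified random sample: n_per_cell passages from each (genre, grade) cell.
    # `passages` is assumed to be a list of dicts with "genre" and "grade" keys.
    rng = random.Random(seed)
    sample = []
    for genre in genres:
        for grade in grade_levels:
            cell = [p for p in passages
                    if p["genre"] == genre and p["grade"] == grade]
            sample.extend(rng.sample(cell, n_per_cell))
    return sample
```

With 3 genres and 13 grade levels, 100 passages per cell gives the 3,900-passage norming sample described above.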
Grade level in the TASA corpus is indexed by the Degrees of Reading Power (DRP;
Koslin et al., 1987), which is a readability measure that includes word- and sentence-level
characteristics. As can be observed in the table, DRP is highly correlated with the Flesch
Reading Ease and Flesch-Kincaid Grade Level measures of readability.
To simplify the data analysis and presentation, DRP levels were translated to their
corresponding grade-level estimates and then collapsed according to the grade bands
used within the Common Core State Standards: grades K to 1, 2 to 3, 4 to 5, 6 to 8, 9 to
10, and 11 and higher. Each grade level within each genre was represented by 100
passages. Because the Common Core grade bands include different numbers of grade
levels per band (e.g., 2–3 includes two grades, 6–8 includes three grades), there are
different numbers of passages represented for each grade band. The average DRP
values as well as the range of DRP values for each grade band are provided in Table B.1.
The majority of the values provided in the norms below can be used as
comparisons to other corpora. However, some of the indices are provided solely to
describe the corpus. The descriptive indices provided below are not intended to be
indicative of normative values that generalize to other text corpora. For example,
the passages in TASA all consist of one paragraph because paragraph breaks are not
marked in the TASA corpus. Hence, the paragraph count (i.e., DESPC) in the
norms table is 1. The standard deviation of the paragraph length (i.e., DESPLd) is 0
because this measure averages the length of paragraphs in terms of the number of
sentences across paragraphs (and there is only one paragraph in each text). The
average number of words and sentences (i.e., DESWC, DESSC) describes the corpus
but does not provide a normative value, because the length of the texts was kept
relatively constant within the TASA corpus. However, the remaining indices provide
a normative value that can be used to compare other texts in the corresponding
genre.
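One way to use these norms for such comparisons is to locate a new text's index value relative to the tabled genre mean and standard deviation as a z score. A minimal sketch; the example numbers are the DESSL mean and standard deviation from the language arts norms below, assuming the columns follow the grade-band order given in the text (the 19.937/6.676 pair corresponding to grades 6 to 8).

```python
def norm_z(value, norm_mean, norm_sd):
    # z score of a text's index value against a norm (M, SD) for a genre/grade band.
    return (value - norm_mean) / norm_sd

# Example: a text with mean sentence length (DESSL) of 20.0 words sits almost
# exactly at the language arts grades 6-8 norm (M = 19.937, SD = 6.676).
z = norm_z(20.0, 19.937, 6.676)
```

A z score near 0 indicates a typical value for the band; values beyond about plus or minus 1 flag texts that are unusually high or low on that index for the genre.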
Language Arts
Each row reports the mean and standard deviation, in that order, for the grade bands K–1, 2–3, 4–5, 6–8, 9–10, and 11 and higher.
DESPL 34.640 6.792 26.820 5.573 20.935 4.940 15.923 4.509 13.875 3.871 13.203 3.670
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 8.601 1.600 11.375 2.368 14.522 4.421 19.937 6.676 23.002 8.395 24.764 9.406
DESSLd 4.785 1.443 6.516 2.075 8.584 6.329 11.405 5.380 13.674 12.062 13.143 8.233
DESWLsy 1.205 0.061 1.270 0.055 1.320 0.068 1.378 0.068 1.435 0.063 1.546 0.092
DESWLsyd 0.470 0.095 0.555 0.080 0.619 0.101 0.685 0.079 0.756 0.079 0.871 0.103
DESWLlt 3.789 0.201 3.994 0.163 4.159 0.191 4.337 0.188 4.484 0.167 4.763 0.223
DESWLltd 1.730 0.220 1.929 0.185 2.075 0.214 2.242 0.183 2.377 0.173 2.615 0.209
Text Easability Principal Component Scores
PCNARz 1.368 0.574 1.164 0.618 0.745 0.773 0.446 0.714 0.250 0.632 −0.232 0.677
PCNARp 88.175 10.284 83.843 13.577 72.196 21.756 64.119 22.022 58.457 21.305 41.649 21.476
PCSYNz 1.625 0.670 0.891 0.634 0.297 0.755 −0.416 0.882 −0.720 0.848 −0.701 0.946
PCSYNp 91.153 9.522 77.387 16.676 59.784 23.071 38.152 24.265 29.343 21.547 31.250 22.614
PCCNCz 0.205 0.939 0.560 0.863 0.830 1.071 0.883 0.958 0.752 0.944 0.391 1.079
PCCNCp 55.749 27.500 66.449 24.680 71.996 26.876 74.252 24.359 70.562 25.400 59.456 29.013
PCREFz 0.044 0.959 −0.254 0.822 −0.390 0.816 −0.337 0.851 −0.378 0.793 −0.338 0.882
PCREFp 48.809 26.453 41.112 24.837 37.331 24.426 38.894 25.089 37.872 25.042 38.669 26.079
PCDCz −0.007 0.922 0.075 0.762 0.073 0.968 0.171 0.914 0.254 0.969 0.286 1.012
PCDCp 47.978 24.830 51.923 24.310 50.981 27.508 54.417 27.033 56.209 27.069 57.590 27.945
PCVERBz −0.024 0.854 −0.374 0.870 −0.089 0.938 −0.222 0.971 −0.294 0.965 −0.631 0.901
PCVERBp 49.733 26.128 38.730 25.442 46.428 27.910 43.596 28.543 42.111 27.689 31.619 24.405
PCCONNz −1.458 1.303 −2.083 1.279 −2.239 1.268 −2.455 1.262 −2.503 1.333 −2.399 1.230
PCCONNp 18.803 21.303 9.055 14.398 7.915 14.561 5.698 10.984 5.530 11.092 6.157 12.481
PCTEMPz 0.066 0.654 0.011 0.800 −0.034 0.989 −0.073 1.064 0.030 1.189 −0.032 1.118
PCTEMPp 52.650 21.834 51.020 25.011 50.525 28.250 49.177 29.389 52.743 31.570 50.784 29.938
Referential Cohesion
CRFNO1 0.149 0.134 0.162 0.133 0.182 0.151 0.225 0.172 0.246 0.165 0.303 0.201
CRFAO1 0.349 0.157 0.413 0.171 0.454 0.184 0.524 0.199 0.537 0.210 0.552 0.223
CRFSO1 0.168 0.143 0.191 0.143 0.222 0.170 0.289 0.198 0.328 0.198 0.414 0.230
CRFNOa 0.127 0.090 0.131 0.089 0.143 0.099 0.180 0.126 0.199 0.122 0.243 0.147
CRFAOa 0.275 0.116 0.339 0.142 0.362 0.149 0.427 0.180 0.443 0.183 0.456 0.204
CRFSOa 0.148 0.103 0.156 0.099 0.175 0.116 0.232 0.146 0.269 0.153 0.344 0.176
CRFCWO1 0.108 0.054 0.101 0.047 0.094 0.043 0.095 0.047 0.090 0.040 0.087 0.047
CRFCWO1d 0.143 0.039 0.125 0.036 0.113 0.034 0.099 0.037 0.089 0.032 0.084 0.035
CRFCWOa 0.083 0.035 0.077 0.032 0.071 0.030 0.072 0.033 0.068 0.029 0.067 0.037
CRFCWOad 0.133 0.028 0.112 0.019 0.100 0.021 0.089 0.024 0.080 0.019 0.076 0.023
LSA
LSASS1 0.220 0.091 0.232 0.083 0.250 0.092 0.302 0.099 0.334 0.117 0.379 0.100
LSASS1d 0.192 0.045 0.184 0.040 0.171 0.047 0.170 0.047 0.167 0.049 0.167 0.048
LSASSp 0.179 0.079 0.190 0.070 0.207 0.085 0.262 0.094 0.305 0.119 0.345 0.103
LSASSpd 0.188 0.036 0.176 0.034 0.164 0.037 0.164 0.036 0.164 0.037 0.163 0.038
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.380 0.060 0.352 0.042 0.343 0.053 0.348 0.050 0.358 0.056 0.374 0.049
LSAGNd 0.154 0.025 0.141 0.024 0.139 0.026 0.144 0.027 0.153 0.029 0.158 0.028
Lexical Diversity
LDVOCDa 73.046 20.551 87.097 20.668 90.344 21.384 91.741 20.729 94.064 19.323 93.553 20.263
Connectives
CNCAll 71.718 20.376 81.029 21.149 85.096 19.794 90.798 20.343 91.531 21.506 92.230 19.732
CNCCaus 19.564 11.450 19.730 9.578 19.886 10.761 21.003 9.589 22.830 10.172 24.596 11.061
CNCLogic 30.224 13.516 31.674 11.816 31.685 13.714 32.959 13.104 34.657 12.604 35.772 14.091
CNCADC 9.961 7.049 13.531 8.346 14.391 8.677 15.676 9.045 17.494 9.472 17.710 9.147
CNCTemp 19.152 11.858 20.625 10.014 20.647 11.790 21.766 9.687 20.100 9.705 19.467 9.656
CNCTempx 15.043 10.243 16.112 10.605 17.994 10.557 17.122 10.341 17.245 10.234 16.028 9.761
CNCAdd 37.158 15.511 43.945 14.980 45.453 15.327 49.345 14.983 50.120 15.974 49.906 14.787
CNCPos 66.102 19.949 72.767 19.937 74.704 19.291 78.699 19.547 78.614 19.900 78.575 19.267
CNCNeg 7.765 6.385 9.706 6.672 10.671 7.046 11.711 7.627 12.847 8.108 13.625 8.233
Situation Model
SMCAUSv 52.750 18.394 44.199 13.131 36.328 12.953 27.130 11.755 22.740 10.847 23.172 9.161
SMCAUSvp 61.127 19.923 53.469 13.750 44.633 13.791 36.104 12.926 32.486 12.063 32.783 10.589
SMINTEp 56.429 18.098 41.033 13.971 30.114 12.533 21.366 10.013 17.901 9.464 16.464 8.398
SMCAUSr 0.167 0.156 0.218 0.181 0.248 0.248 0.376 0.553 0.473 0.493 0.452 0.502
SMINTEr 0.336 0.249 0.433 0.297 0.639 0.537 0.919 0.771 1.138 0.884 1.249 1.057
SMCAUSlsa 0.082 0.024 0.071 0.023 0.077 0.034 0.080 0.032 0.083 0.036 0.087 0.037
SMCAUSwn 0.602 0.088 0.566 0.090 0.577 0.095 0.572 0.093 0.569 0.084 0.537 0.093
SMTEMP 0.851 0.061 0.841 0.077 0.833 0.097 0.821 0.106 0.825 0.115 0.820 0.111
Syntactic Complexity
SYNLE 2.163 0.707 2.593 0.773 3.229 1.242 4.078 1.700 4.644 2.335 5.512 2.430
SYNNP 0.565 0.144 0.623 0.137 0.730 0.166 0.821 0.149 0.877 0.160 0.936 0.164
SYNMEDpos 0.703 0.057 0.698 0.048 0.680 0.047 0.668 0.047 0.665 0.050 0.643 0.048
SYNMEDwrd 0.906 0.047 0.913 0.035 0.906 0.032 0.902 0.029 0.900 0.026 0.891 0.028
SYNMEDlem 0.882 0.052 0.889 0.041 0.885 0.035 0.882 0.032 0.882 0.028 0.873 0.031
SYNSTRUTa 0.172 0.059 0.143 0.036 0.121 0.037 0.097 0.035 0.086 0.031 0.087 0.032
SYNSTRUTt 0.159 0.045 0.134 0.032 0.114 0.029 0.089 0.027 0.083 0.024 0.081 0.024
Syntactic Pattern Density
DRNP 353.241 25.341 352.136 25.748 352.915 29.344 355.756 31.572 363.273 31.000 366.610 32.600
DRVP 264.580 29.825 252.577 31.921 229.998 35.829 214.462 35.386 199.327 32.115 191.868 38.489
DRAP 40.308 16.165 42.571 15.109 37.937 14.678 36.662 13.863 35.631 13.605 31.178 12.754
DRPP 74.397 19.640 85.912 18.751 100.214 21.102 109.790 22.740 115.670 20.955 123.168 21.929
DRPVAL 0.874 1.638 1.862 2.442 2.563 3.498 3.242 3.092 2.969 2.607 4.479 3.438
DRNEG 18.421 10.519 14.917 9.221 12.333 8.728 9.475 7.265 9.343 7.239 8.178 6.264
DRGERUND 7.297 4.945 9.008 5.595 8.642 4.995 9.082 5.421 8.838 5.130 9.022 5.110
DRINF 8.392 4.934 8.808 4.742 8.215 4.445 7.641 5.047 7.143 4.410 7.679 5.010
Word Information
WRDNOUN 210.219 36.902 214.872 36.543 226.645 43.466 230.869 37.516 240.713 35.191 256.079 39.605
WRDVERB 172.875 24.359 161.881 24.171 150.317 22.721 140.766 20.991 134.166 21.520 124.386 21.432
WRDADJ 53.907 19.192 57.607 17.886 66.806 21.064 76.646 20.967 83.810 23.646 91.914 21.640
WRDADV 69.431 23.138 68.531 22.846 62.670 21.274 59.978 19.873 58.900 19.306 54.634 18.949
WRDPRO 131.679 34.332 126.184 31.796 105.848 35.768 91.207 33.823 83.173 29.407 64.285 29.125
WRDPRP1s 35.083 28.790 29.780 28.337 18.946 25.020 15.573 22.917 10.791 18.383 5.478 13.913
WRDPRP1p 8.493 12.401 8.126 14.481 4.954 8.650 4.640 10.894 4.526 9.046 4.873 11.016
WRDPRP2 19.669 15.595 15.295 18.160 10.413 13.583 8.519 17.033 7.034 14.519 7.185 16.385
WRDPRP3s 43.289 30.308 47.940 30.440 45.865 32.239 38.140 30.650 37.031 29.535 23.508 25.622
WRDPRP3p 9.403 10.644 10.525 11.166 10.621 10.609 11.249 11.948 12.332 13.016 12.206 12.865
WRDFAMc 583.866 6.242 578.780 8.419 576.096 7.960 571.920 8.365 570.105 8.352 564.820 9.003
WRDCNCc 400.119 26.115 401.601 25.872 404.363 31.319 399.461 29.030 393.433 28.040 384.911 32.791
WRDIMGc 430.002 24.017 431.360 23.252 435.387 29.029 431.485 26.385 427.273 25.145 417.412 29.335
WRDMEAc 432.977 13.939 432.909 12.050 435.929 14.786 432.973 14.200 433.259 13.704 429.408 15.955
WRDPOLc 4.642 0.514 4.386 0.441 4.217 0.402 4.107 0.382 3.964 0.379 3.765 0.401
WRDHYPn 6.179 0.850 6.264 0.789 6.314 0.682 6.378 0.622 6.266 0.615 6.373 0.602
WRDHYPv 1.672 0.159 1.667 0.162 1.652 0.170 1.650 0.177 1.631 0.170 1.644 0.189
WRDHYPnv 1.469 0.261 1.511 0.245 1.570 0.254 1.606 0.234 1.624 0.206 1.726 0.230
Readability
RDFRE 95.495 3.854 87.917 3.890 80.502 5.292 70.209 5.873 62.299 7.797 51.092 9.258
RDFKGL 1.941 0.838 3.796 0.775 5.610 1.494 8.381 2.233 10.242 3.012 12.240 3.315
RDL2 27.133 6.216 22.239 4.978 19.238 4.755 15.467 5.032 13.967 4.103 11.808 5.045
Social Studies
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 7.983 1.488 10.340 1.470 12.081 2.008 15.316 3.704 18.040 5.580 20.338 5.229
DESSLd 3.361 1.132 4.247 1.531 4.964 1.905 6.681 3.172 8.070 3.895 9.375 3.970
DESWLsy 1.255 0.068 1.327 0.065 1.395 0.072 1.479 0.079 1.508 0.076 1.623 0.102
DESWLsyd 0.531 0.094 0.612 0.083 0.693 0.094 0.780 0.093 0.817 0.085 0.936 0.101
DESWLlt 3.967 0.205 4.190 0.180 4.379 0.185 4.587 0.203 4.647 0.199 4.930 0.257
DESWLltd 1.809 0.172 1.962 0.161 2.115 0.190 2.327 0.186 2.424 0.182 2.700 0.225
Text Easability Principal Component Scores
PCNARz 0.567 0.847 0.085 0.696 −0.247 0.660 −0.501 0.704 −0.535 0.639 −0.742 0.572
PCNARp 66.349 24.184 52.386 22.572 41.410 21.806 33.426 21.237 31.753 20.201 25.892 17.196
PCSYNz 1.604 0.623 1.152 0.533 0.811 0.616 0.401 0.710 0.049 0.789 −0.101 0.746
PCSYNp 91.492 8.278 84.402 11.144 75.412 16.879 63.186 22.124 52.366 24.017 47.311 22.974
PCCNCz 0.450 0.860 0.739 0.854 0.829 0.901 0.533 0.962 0.456 0.980 0.034 0.964
PCCNCp 62.647 25.222 71.680 23.749 73.278 23.832 65.566 26.855 62.945 27.651 51.251 27.792
PCREFz 0.253 0.978 0.128 0.947 −0.089 0.826 −0.267 0.808 −0.147 0.864 −0.310 0.855
PCREFp 55.911 28.598 52.252 27.285 46.381 25.298 41.257 24.713 44.600 26.443 39.602 25.268
PCCONNz −1.069 1.122 −1.615 1.142 −1.811 1.211 −2.067 1.302 −2.019 1.363 −2.254 1.276
PCCONNp 23.971 24.281 14.426 19.824 11.934 16.561 9.271 14.590 10.987 15.854 7.839 14.163
PCTEMPz 0.095 0.706 0.016 0.858 −0.027 0.972 0.114 0.935 −0.008 1.047 −0.154 1.085
PCTEMPp 53.207 23.365 51.220 26.453 50.261 28.317 53.983 28.019 50.949 29.482 47.270 30.021
Referential Cohesion
CRFNO1 0.226 0.162 0.298 0.179 0.325 0.172 0.351 0.174 0.397 0.186 0.399 0.197
CRFAO1 0.437 0.153 0.475 0.170 0.483 0.172 0.496 0.174 0.537 0.186 0.527 0.194
CRFSO1 0.280 0.193 0.364 0.193 0.411 0.190 0.456 0.193 0.501 0.195 0.523 0.212
CRFNOa 0.144 0.093 0.207 0.121 0.215 0.113 0.240 0.123 0.281 0.147 0.289 0.146
CRFAOa 0.294 0.109 0.339 0.132 0.340 0.146 0.354 0.150 0.398 0.167 0.399 0.157
CRFSOa 0.179 0.111 0.262 0.139 0.277 0.132 0.326 0.150 0.381 0.166 0.405 0.168
CRFCWO1 0.141 0.068 0.127 0.058 0.113 0.047 0.100 0.040 0.102 0.045 0.092 0.045
CRFCWO1d 0.163 0.050 0.139 0.038 0.126 0.034 0.110 0.030 0.105 0.033 0.095 0.034
CRFCWOa 0.089 0.037 0.082 0.036 0.074 0.033 0.066 0.027 0.070 0.032 0.064 0.028
CRFCWOad 0.141 0.034 0.120 0.026 0.107 0.023 0.094 0.022 0.090 0.022 0.083 0.021
LSA
LSASS1 0.264 0.090 0.296 0.099 0.315 0.094 0.344 0.107 0.360 0.100 0.382 0.107
LSASS1d 0.206 0.041 0.198 0.040 0.191 0.039 0.182 0.039 0.175 0.040 0.164 0.039
LSASSp 0.156 0.053 0.202 0.076 0.229 0.083 0.277 0.105 0.300 0.098 0.332 0.109
LSASSpd 0.179 0.033 0.180 0.034 0.180 0.034 0.173 0.033 0.166 0.031 0.159 0.030
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.377 0.054 0.376 0.056 0.374 0.050 0.374 0.057 0.376 0.050 0.382 0.053
LSAGNd 0.153 0.026 0.144 0.023 0.141 0.021 0.141 0.021 0.144 0.022 0.145 0.023
Lexical Diversity
LDTTRc 0.635 0.109 0.669 0.094 0.706 0.080 0.738 0.074 0.750 0.075 0.768 0.071
LDTTRa 0.473 0.075 0.497 0.062 0.523 0.053 0.544 0.051 0.546 0.054 0.558 0.048
LDMTLDa 54.345 24.124 59.491 20.692 66.751 21.020 75.340 22.556 77.985 23.133 84.314 24.050
LDVOCDa 69.449 22.970 72.753 19.683 77.440 18.942 82.238 20.288 81.764 19.591 87.326 19.731
Connectives
CNCAll 58.392 22.014 70.728 21.355 76.186 20.233 84.591 21.073 86.130 21.215 90.993 18.121
CNCCaus 17.730 11.448 21.273 11.873 21.854 10.673 24.530 12.556 26.200 11.606 26.776 10.524
CNCLogic 24.637 12.772 29.832 13.176 30.388 12.899 34.090 15.468 36.058 15.587 37.279 14.150
CNCADC 9.107 7.201 11.755 8.533 12.552 7.841 15.300 9.537 15.875 10.126 17.618 9.610
CNCTemp 12.035 9.772 14.929 10.194 16.065 9.549 17.775 9.822 18.087 9.025 18.169 9.035
CNCTempx 17.186 12.035 18.521 11.410 17.821 11.082 18.467 11.393 18.193 9.807 17.083 9.492
CNCAdd 30.570 13.200 37.075 12.994 40.490 13.892 44.441 14.422 44.462 14.981 48.488 14.460
CNCPos 52.524 20.862 62.688 19.735 66.544 18.965 72.794 18.701 74.129 19.404 77.561 16.614
CNCNeg 7.062 6.246 8.885 6.990 9.683 6.756 11.577 8.475 12.077 9.190 13.429 8.296
Situation Model
SMCAUSv 61.365 20.088 50.642 12.846 44.915 13.983 37.219 12.685 32.569 12.356 29.043 10.936
SMCAUSvp 69.113 20.467 59.100 14.680 53.736 15.325 46.898 14.893 42.641 13.929 38.772 12.597
SMINTEp 47.001 18.187 35.157 15.729 29.545 13.520 23.608 11.600 19.900 10.299 18.227 9.953
SMTEMP 0.869 0.073 0.857 0.083 0.849 0.090 0.853 0.090 0.838 0.099 0.818 0.105
Syntactic Complexity
SYNLE 1.951 0.578 2.734 0.718 3.299 0.879 4.240 1.174 4.844 1.351 5.608 2.259
SYNNP 0.630 0.152 0.747 0.153 0.820 0.147 0.899 0.161 0.926 0.157 0.960 0.153
SYNMEDpos 0.650 0.066 0.647 0.047 0.641 0.049 0.638 0.039 0.629 0.040 0.628 0.045
SYNMEDwrd 0.876 0.061 0.877 0.045 0.883 0.036 0.888 0.030 0.880 0.030 0.883 0.033
SYNMEDlem 0.840 0.069 0.846 0.049 0.854 0.040 0.862 0.031 0.856 0.031 0.861 0.034
SYNSTRUTa 0.220 0.066 0.183 0.045 0.160 0.044 0.135 0.039 0.121 0.036 0.107 0.036
SYNSTRUTt 0.186 0.050 0.160 0.036 0.143 0.034 0.128 0.033 0.112 0.033 0.100 0.029
Syntactic Pattern Density
DRNP 376.886 35.318 376.609 34.887 383.136 30.244 383.272 35.736 382.043 33.942 375.983 36.490
DRVP 232.749 42.527 222.847 40.522 201.074 36.206 190.151 41.273 188.737 40.653 186.081 39.188
DRAP 32.169 15.158 28.840 13.381 27.278 11.722 26.956 11.667 26.601 11.597 28.050 12.394
DRPP 92.603 26.049 105.813 26.246 118.513 20.893 123.142 23.135 125.957 23.317 128.927 23.647
DRPVAL 2.877 2.789 4.954 4.374 5.382 4.275 5.369 3.916 5.494 4.265 5.555 4.357
DRNEG 9.902 10.422 7.939 7.977 6.574 6.369 6.288 6.653 7.083 6.330 7.163 6.070
DRGERUND 4.560 4.348 4.641 3.743 5.193 4.000 5.778 4.641 5.898 4.603 6.831 4.544
DRINF 7.306 4.810 9.081 5.682 7.927 5.069 7.915 5.049 7.929 5.048 8.549 5.148
Word Information
WRDNOUN 251.523 48.531 267.986 41.900 279.999 36.277 281.971 41.589 279.454 38.987 279.553 38.174
WRDVERB 140.897 30.124 136.423 31.019 128.984 27.816 124.428 28.567 123.826 24.977 119.149 22.076
WRDADJ 61.441 21.994 71.183 22.908 80.723 25.257 91.124 26.149 91.040 24.166 99.109 25.138
WRDADV 52.279 25.096 48.251 19.409 43.362 15.897 44.283 17.965 45.003 17.571 47.179 18.207
WRDPRO 104.482 41.070 73.265 34.878 59.541 29.932 48.619 29.483 44.589 26.160 39.247 22.219
WRDPRP1s 17.842 32.371 5.380 13.603 3.190 11.449 1.915 6.725 1.818 6.221 1.467 5.729
WRDPRP1p 12.147 16.461 6.581 11.847 5.022 10.555 3.802 9.495 3.213 8.146 4.440 9.077
WRDPRP2 18.873 22.570 13.661 19.926 9.684 21.016 4.931 12.184 4.356 11.807 2.281 7.772
WRDPRP3s 22.336 28.217 17.996 25.936 13.081 18.815 12.667 18.976 12.758 18.809 9.674 14.766
WRDPRP3p 19.225 18.635 17.870 15.489 18.358 17.276 16.123 14.541 11.857 11.082 12.691 12.710
WRDFRQc 2.545 0.155 2.441 0.149 2.370 0.150 2.282 0.154 2.230 0.142 2.149 0.145
WRDFRQa 3.152 0.085 3.127 0.084 3.107 0.093 3.073 0.102 3.057 0.104 2.993 0.106
WRDFRQmc 1.727 0.239 1.498 0.244 1.415 0.309 1.223 0.402 1.116 0.458 0.980 0.384
WRDAOAc 277.278 25.974 297.128 28.422 315.551 31.601 341.214 30.303 354.961 30.564 381.515 31.295
WRDFAMc 583.750 7.528 579.332 7.508 574.291 8.897 569.452 8.817 566.657 8.834 563.451 10.140
WRDCNCc 407.502 24.920 407.562 25.021 408.246 26.648 396.036 28.948 392.308 28.606 378.074 26.879
WRDIMGc 436.973 22.560 437.829 21.671 439.071 22.895 427.854 26.085 424.426 25.038 410.346 24.994
WRDMEAc 442.245 15.125 443.975 13.702 444.783 13.287 438.801 15.023 435.847 17.297 430.164 17.090
WRDPOLc 4.663 0.476 4.518 0.458 4.262 0.444 4.025 0.472 3.945 0.404 3.800 0.422
WRDHYPn 6.060 0.720 6.033 0.652 5.859 0.763 5.934 0.776 6.117 0.707 6.314 0.686
WRDHYPv 1.566 0.166 1.546 0.166 1.563 0.174 1.581 0.195 1.604 0.186 1.626 0.209
WRDHYPnv 1.621 0.264 1.720 0.255 1.716 0.217 1.739 0.246 1.782 0.239 1.843 0.260
Readability
RDFRE 92.393 5.350 84.142 5.036 76.644 5.519 66.234 5.375 61.055 5.698 49.059 9.598
RDFKGL 2.317 0.911 4.079 0.733 5.556 0.870 7.802 1.158 9.194 1.815 11.430 2.240
RDL2 32.381 8.481 27.016 5.956 23.300 5.394 19.139 4.947 17.209 4.737 14.039 4.552
Science
DESPL 36.220 7.450 29.655 4.859 25.410 3.915 21.747 3.993 20.300 4.211 17.193 3.674
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 7.884 1.397 9.612 1.485 11.032 1.658 13.259 2.577 14.519 3.241 17.715 4.541
DESSLd 3.020 0.988 3.549 0.930 4.344 1.290 5.376 1.739 5.905 2.295 7.624 3.228
DESWLsy 1.224 0.050 1.293 0.059 1.369 0.069 1.460 0.071 1.518 0.069 1.617 0.097
DESWLsyd 0.487 0.078 0.575 0.086 0.680 0.104 0.761 0.094 0.826 0.082 0.923 0.108
DESWLlt 3.990 0.168 4.162 0.155 4.323 0.181 4.540 0.178 4.681 0.190 4.873 0.248
DESWLltd 1.712 0.178 1.875 0.182 2.120 0.213 2.312 0.191 2.454 0.172 2.662 0.219
Text Easability Principal Component Scores
PCNARz 0.505 0.700 0.096 0.675 −0.255 0.568 −0.550 0.596 −0.724 0.529 −0.959 0.521
PCNARp 65.737 21.564 52.473 21.926 40.811 19.308 31.458 19.066 25.996 15.956 19.716 13.919
PCSYNz 1.844 0.715 1.482 0.626 1.236 0.587 0.885 0.679 0.718 0.739 0.309 0.771
PCSYNp 93.516 8.342 89.560 10.278 85.697 11.654 76.742 18.014 71.898 20.423 59.820 23.839
PCCNCz 0.751 1.024 0.870 0.941 0.826 0.921 0.632 0.958 0.488 0.973 0.053 0.938
PCCNCp 70.805 25.489 74.441 23.220 73.087 23.862 67.847 25.795 64.372 26.565 50.665 28.309
PCREFz 0.947 0.923 0.938 0.806 0.810 0.900 0.557 0.949 0.405 0.980 0.444 1.011
PCREFp 75.220 21.344 76.715 19.316 72.707 23.590 65.528 25.724 60.585 27.897 61.826 27.935
PCDCz −0.368 0.875 0.023 0.917 0.155 0.920 0.222 0.873 0.166 0.953 0.214 0.957
PCDCp 38.052 26.286 49.119 26.881 53.715 27.233 55.915 26.460 53.581 26.370 54.898 27.163
PCVERBz 0.832 0.928 0.485 0.808 0.347 0.876 0.027 0.831 −0.113 0.876 −0.494 0.914
PCVERBp 73.537 23.465 64.341 23.442 59.863 25.269 50.723 25.649 46.396 27.163 35.142 25.627
PCCONNz −1.361 1.318 −1.712 1.268 −1.916 1.335 −2.076 1.372 −2.031 1.325 −1.989 1.260
PCCONNp 21.021 23.716 14.153 18.214 12.374 17.597 10.441 16.931 10.335 16.929 10.775 17.118
PCTEMPz −0.154 0.720 −0.172 0.837 −0.148 0.900 −0.144 0.943 −0.276 1.137 −0.021 1.053
PCTEMPp 45.097 23.640 45.724 25.206 46.161 27.327 46.597 27.496 45.171 29.609 50.612 30.055
Referential Cohesion
CRFNO1 0.313 0.154 0.414 0.168 0.464 0.172 0.499 0.179 0.495 0.189 0.528 0.200
CRFAO1 0.528 0.161 0.600 0.153 0.610 0.163 0.624 0.164 0.601 0.174 0.646 0.181
CRFSO1 0.378 0.182 0.491 0.178 0.557 0.180 0.596 0.178 0.583 0.191 0.653 0.192
CRFNOa 0.191 0.105 0.260 0.126 0.294 0.126 0.323 0.149 0.338 0.154 0.370 0.162
CRFAOa 0.375 0.150 0.421 0.135 0.431 0.146 0.434 0.153 0.434 0.156 0.477 0.175
CRFSOa 0.252 0.135 0.330 0.139 0.382 0.150 0.415 0.154 0.421 0.161 0.493 0.174
CRFCWO1 0.180 0.066 0.177 0.056 0.170 0.057 0.151 0.057 0.138 0.055 0.133 0.059
CRFCWO1d 0.190 0.041 0.168 0.032 0.163 0.033 0.141 0.036 0.133 0.035 0.122 0.041
CRFCWOa 0.112 0.045 0.110 0.040 0.102 0.038 0.092 0.039 0.089 0.037 0.086 0.039
CRFCWOad 0.158 0.030 0.144 0.022 0.136 0.024 0.120 0.026 0.115 0.025 0.105 0.028
LSA
LSASS1 0.327 0.089 0.373 0.098 0.391 0.101 0.409 0.109 0.412 0.111 0.465 0.124
LSASS1d 0.227 0.038 0.219 0.034 0.217 0.037 0.208 0.042 0.197 0.044 0.185 0.047
LSASSp 0.205 0.073 0.252 0.092 0.275 0.103 0.310 0.108 0.323 0.113 0.394 0.132
LSASSpd 0.190 0.029 0.196 0.031 0.198 0.033 0.195 0.035 0.188 0.036 0.182 0.039
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.413 0.049 0.421 0.052 0.419 0.057 0.416 0.058 0.413 0.061 0.430 0.069
LSAGNd 0.155 0.020 0.150 0.019 0.154 0.019 0.155 0.023 0.154 0.025 0.160 0.025
Lexical Diversity
LDVOCDa 58.811 17.974 63.544 16.570 67.023 16.656 73.977 22.106 74.525 20.588 76.040 21.928
Connectives
CNCAll 61.240 22.355 71.864 21.075 75.838 21.236 80.439 20.504 80.821 21.126 82.993 19.682
CNCCaus 16.616 11.832 21.045 12.954 23.136 11.189 23.393 10.053 23.362 11.194 25.732 11.322
CNCLogic 23.482 14.076 29.202 13.145 32.107 14.812 34.261 13.565 34.051 14.484 35.846 14.105
CNCADC 11.583 9.569 12.505 8.095 14.528 9.634 15.493 9.286 14.651 9.550 16.111 9.221
CNCTemp 14.001 10.562 15.859 10.103 15.728 11.393 16.778 9.666 17.080 10.698 16.619 8.742
CNCTempx 10.491 11.556 11.992 11.850 12.346 10.948 12.082 10.105 12.910 11.448 12.599 9.878
CNCAdd 30.396 14.332 34.893 14.780 37.389 15.078 40.608 15.876 42.023 15.437 42.843 14.158
CNCPos 52.508 19.499 62.084 19.816 64.591 19.458 68.542 18.549 69.903 18.726 70.818 18.139
CNCNeg 9.052 8.366 9.851 6.728 11.649 8.639 12.069 8.182 11.303 8.505 12.562 8.198
Situation Model
SMCAUSv 80.537 26.527 65.375 16.421 56.290 15.282 46.796 14.999 42.447 14.776 35.392 12.810
SMCAUSvp 90.447 25.793 77.056 19.399 68.493 18.587 58.267 16.947 53.433 17.243 47.354 15.434
SMINTEp 41.198 21.419 31.305 13.618 27.159 12.447 22.367 11.673 20.296 10.111 17.278 9.644
SMCAUSr 0.137 0.148 0.179 0.146 0.212 0.152 0.251 0.181 0.261 0.205 0.343 0.257
SMINTEr 0.424 0.430 0.610 0.525 0.741 0.565 0.899 0.949 0.893 0.841 1.072 0.823
SMCAUSlsa 0.112 0.034 0.111 0.036 0.112 0.037 0.114 0.039 0.115 0.040 0.122 0.050
SMCAUSwn 0.632 0.087 0.617 0.096 0.609 0.096 0.589 0.087 0.566 0.087 0.545 0.093
SMTEMP 0.852 0.076 0.843 0.079 0.841 0.085 0.835 0.090 0.819 0.110 0.835 0.101
Syntactic Complexity
SYNLE 1.843 0.718 2.567 0.790 3.038 0.813 3.864 1.187 4.367 1.409 5.070 1.561
SYNNP 0.650 0.167 0.729 0.174 0.825 0.161 0.882 0.178 0.920 0.166 0.990 0.161
SYNMEDpos 0.637 0.053 0.630 0.044 0.631 0.045 0.624 0.043 0.619 0.047 0.612 0.043
SYNMEDwrd 0.848 0.051 0.855 0.041 0.857 0.036 0.866 0.037 0.870 0.035 0.869 0.037
SYNMEDlem 0.812 0.055 0.817 0.043 0.820 0.040 0.833 0.041 0.839 0.039 0.841 0.039
SYNSTRUTa 0.214 0.061 0.190 0.045 0.169 0.045 0.150 0.040 0.145 0.042 0.120 0.038
SYNSTRUTt 0.168 0.040 0.156 0.038 0.143 0.035 0.133 0.033 0.131 0.033 0.111 0.030
Syntactic Pattern Density
DRNP 352.756 31.437 365.765 32.152 365.788 32.321 369.404 34.689 372.343 32.442 376.769 32.061
DRVP 248.430 46.815 231.734 42.833 222.430 37.745 208.325 37.778 203.136 35.739 187.587 32.070
DRAP 28.826 15.049 31.717 15.218 26.094 13.636 27.242 12.589 25.157 11.471 25.998 11.863
DRPP 85.102 25.395 98.117 25.188 102.450 22.239 114.323 23.615 119.598 23.745 127.057 21.393
DRPVAL 2.885 3.613 5.517 5.442 7.555 5.935 8.240 5.593 7.890 5.360 8.914 5.672
DRNEG 9.466 8.764 7.942 6.924 7.087 6.477 6.644 6.482 5.758 5.711 5.267 4.980
DRGERUND 5.485 5.544 5.533 4.956 6.127 4.932 6.209 4.609 7.142 4.934 6.366 4.779
DRINF 8.203 5.814 7.741 5.773 7.166 5.374 6.967 4.964 6.810 4.891 6.026 4.052
Word Information
WRDNOUN 238.129 45.681 260.970 44.323 272.192 41.299 283.527 40.363 285.882 43.436 290.676 36.160
WRDVERB 143.105 32.688 131.910 29.860 127.329 23.216 120.759 24.827 120.481 23.051 111.054 20.907
WRDADJ 63.722 25.663 65.988 22.732 74.060 23.462 81.881 23.873 90.459 24.846 98.167 24.938
WRDADV 45.789 23.589 48.947 19.425 43.719 20.045 45.224 17.898 42.583 18.595 43.377 17.738
WRDPRO 103.954 44.096 77.585 40.100 61.412 30.452 45.706 27.594 38.556 25.624 30.543 21.270
WRDPRP1s 5.022 19.190 1.473 7.621 0.308 2.149 1.034 5.086 0.348 3.760 0.314 1.953
WRDPRP1p 5.286 11.065 3.268 9.462 2.660 5.823 3.983 9.466 4.301 11.262 4.361 8.776
WRDPRP2 49.982 42.539 37.614 37.498 29.070 30.319 15.759 22.096 10.868 18.686 4.949 13.807
WRDPRP3s 11.569 23.661 7.274 16.673 3.656 11.044 3.491 9.663 4.238 10.942 3.816 11.135
WRDAOAc 264.503 27.001 288.838 31.288 306.917 30.729 326.958 33.093 341.231 31.113 363.769 31.012
WRDFAMc 584.050 7.625 578.728 8.821 575.049 8.419 571.466 9.109 569.298 9.276 563.479 10.348
WRDCNCc 411.898 37.944 415.776 33.342 416.006 31.143 409.929 31.523 404.657 32.706 392.882 29.889
WRDIMGc 437.035 30.840 439.250 27.163 437.664 25.952 431.475 26.907 427.115 27.848 415.133 24.421
WRDMEAc 441.454 15.135 438.805 15.434 435.548 16.660 431.930 15.050 430.667 15.242 424.332 16.467
WRDPOLc 5.048 0.580 4.830 0.589 4.682 0.571 4.335 0.459 4.225 0.467 3.929 0.418
WRDHYPn 6.574 0.604 6.595 0.619 6.625 0.555 6.489 0.546 6.530 0.493 6.397 0.554
WRDHYPv 1.581 0.174 1.576 0.169 1.546 0.177 1.542 0.155 1.538 0.161 1.526 0.173
WRDHYPnv 1.698 0.272 1.833 0.271 1.890 0.235 1.912 0.231 1.934 0.246 1.925 0.228
Readability
RDFRE 94.959 3.638 87.751 4.716 79.853 5.336 69.956 5.206 63.774 4.398 52.164 8.890
RDFKGL 1.926 0.735 3.400 0.737 4.848 0.783 6.777 0.914 7.946 0.942 10.352 1.974
RDL2 32.462 7.265 28.470 5.953 25.014 5.553 20.866 5.245 18.776 5.122 15.066 5.066
Table B.1. The TASA passages were categorized into grade bands using DRP
scores. This table provides the number of passages included within each grade
band, the mean and standard deviation of the DRP scores for each set of
passages, and the minimum and maximum cutoff DRP scores used to define the
grade bands.
Grade Band N Mean DRP Std. Deviation Minimum DRP Maximum DRP
K-1 300 43.2465 2.33841 35.00 45.99
2–3 600 48.8362 1.45713 46.00 50.99
4–5 600 53.3161 1.44334 51.00 55.99
6–8 900 59.1749 1.34791 56.00 60.99
9–10 600 62.2777 0.90323 61.00 63.99
11-CCR 900 67.4324 3.10350 64.00 85.80
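The banding rule in Table B.1 amounts to a range lookup: a passage's DRP score is matched against the minimum and maximum cutoffs of each band. A minimal sketch follows, with the band labels and cutoffs taken from the table; the `grade_band` function itself is illustrative and not part of the TASA or Coh-Metrix tooling.

```python
# DRP cutoffs (minimum, maximum) for each grade band, from Table B.1.
BANDS = [
    ("K-1",    35.00, 45.99),
    ("2-3",    46.00, 50.99),
    ("4-5",    51.00, 55.99),
    ("6-8",    56.00, 60.99),
    ("9-10",   61.00, 63.99),
    ("11-CCR", 64.00, 85.80),
]

def grade_band(drp):
    """Return the grade band whose DRP cutoff range contains the score."""
    for label, lo, hi in BANDS:
        if lo <= drp <= hi:
            return label
    return None  # score falls outside the tabulated range

print(grade_band(59.17))  # prints 6-8 (cf. the 6-8 band mean of 59.1749)
```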
C:/ITOOLS/WMS/CUP-NEW/4417777/WORKINGFOLDER/MCNAM/9780521192927APX02.3D 270 [253–270] 9.10.2013
8:13AM