

AUTOMATED EVALUATION OF TEXT AND DISCOURSE
WITH COH-METRIX

Coh-Metrix is among the broadest and most sophisticated automated textual
assessment tools available today. Automated Evaluation of Text and Discourse
with Coh-Metrix describes this computational tool, as well as the wide range of
language and discourse measures it provides. Part I of the book focuses on the
theoretical perspectives that led to the development of Coh-Metrix, its measures,
and empirical work that has been conducted using this approach. Part II shifts to
the practical arena, describing how to use Coh-Metrix and how to analyze,
interpret, and describe results. Coh-Metrix opens the door to a new paradigm
of research that coordinates studies of language, corpus analysis, computational
linguistics, education, and cognitive science. This tool empowers anyone with an
interest in text to pursue a wide array of previously unanswerable research
questions.

Danielle S. McNamara is a professor in the department of psychology and Senior
Scientist in the Learning Sciences Institute at Arizona State University.
Arthur C. Graesser is a professor in the department of psychology and the
Institute for Intelligent Systems at the University of Memphis and is a Senior
Research Fellow in the Department of Education at the University of Oxford.
Philip McCarthy is an assistant professor at the University of Memphis and a
member of the Institute for Intelligent Systems.
Zhiqiang Cai is a research assistant professor in the Institute for Intelligent Systems at
the University of Memphis.

Automated Evaluation of Text
and Discourse with Coh-Metrix

DANIELLE S. McNAMARA
Learning Sciences Institute and Psychology Department,
Arizona State University

ARTHUR C. GRAESSER
Institute for Intelligent Systems and Psychology Department,
The University of Memphis

PHILIP M. McCARTHY
Institute for Intelligent Systems, The University of Memphis

ZHIQIANG CAI
Institute for Intelligent Systems, The University of Memphis

32 Avenue of the Americas, New York, NY 10013–2473, USA

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9780521192927
© Danielle S. McNamara, Arthur C. Graesser, Philip M. McCarthy, and Zhiqiang Cai 2014
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2014
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication Data
McNamara, Danielle S.
Automated evaluation of text and discourse with Coh-Metrix / Danielle S.
McNamara, Arizona State University; Arthur C. Graesser, Institute for
Intelligent Systems, The University of Memphis; Philip M. McCarthy, Institute
for Intelligent Systems, The University of Memphis; Zhiqiang Cai, Institute
for Intelligent Systems, The University of Memphis.
pages cm
Includes bibliographical references.
ISBN 978-0-521-19292-7 (Hardback) – ISBN 978-0-521-13729-4 (Paperback)
1. Discourse analysis – Data processing. 2. Cognition – Data processing.
3. Psycholinguistics. 4. Cognitive science. 5. Corpora (Linguistics) 6. Computational
linguistics. I. Graesser, Arthur C. II. McCarthy, Philip M., 1967–
III. Cai, Zhiqiang, 1962– IV. Title.
P302.3.M39 2014
006.3′5–dc23
2013030437
ISBN 978-0-521-19292-7 Hardback
ISBN 978-0-521-13729-4 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.

We dedicate this book to our mentors and students. We learned from
giants and we continue to learn for as long as we have the privilege of
working with our students.
Contents

List of Figures
List of Tables
Acknowledgments

Introduction

Part I. Coh-Metrix: Theoretical, Technological, and Empirical Foundations
1 What Is Text and Why Analyze It?
2 The Importance of Text Cohesion
3 The Science and Technology That Led to Coh-Metrix
4 Coh-Metrix Measures
5 Coh-Metrix Measures of Text Readability and Easability
6 Using Coh-Metrix Measures: Studies of Cohesion in Text and Writing

Part II. A Beginner’s Guide to Writing Coh-Metrix Research
7 The Strategy: Moves, Frozen Expressions, and the Elevator Pitch
8 The Introduction
9 The Corpus
10 The Tool
11 The Results
12 The Discussion

Concluding Remarks
References
Appendix
A. Coh-Metrix 3.0 Indices
B. Coh-Metrix Indices Norms
Index

Figures

2.1 Connection model of coherence
2.2 Argument overlap and Flesch-Kincaid grade level as a function of cohesion
2.3 Model of reader inference using prior text and prior knowledge
3.1 Syntactic structure for “The dog is swimming in my pool”
4.1 Sentence-to-sentence syntax similarity
5.1 Coh-Metrix percentile scores for the five components (Narrativity, Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion) on 6,755 language arts, 4,463 social studies, and 8,550 science texts from TASA above DRP grade level 6
5.2 Coh-Metrix percentile scores for the five components (Narrativity, Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion) on two excerpts presented in Chapter 1, Lady Chatterley’s Lover and A Mortgage
5.3 Coh-Metrix percentile scores for the five components (Narrativity, Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion) on two excerpts from appendix B of the Common Core State Standards, Little Women and Adventures of Tom Sawyer
5.4 Coh-Metrix percentile scores for the five components (Narrativity, Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion) on two excerpts from appendix B of the Common Core State Standards, Discovering Mars: The Amazing Story of the Red Planet and Hurricanes: Earth’s Mightiest Storms
8.1 Coh-Metrix Research Paper Outline
9.1 Coh-Metrix Research Paper Outline
10.1 Coh-Metrix Research Paper Outline
12.1 The discussion model helps organize the ending argument of your paper

Tables

4.1 A comparison of the five coreference indices on a science text about cells
7.1 The 11 Elements of the Elevator Pitch
9.1 The four major moves of the corpus section
10.1 The four major moves of the tool section
12.1 Examples of four forms used in Coh-Metrix commencement moves
12.2 Examples of three grammatical structures used in Coh-Metrix studies
12.3 Six examples of the commencement move using the commencement model
12.4 Three Coh-Metrix studies featuring interpretation moves
12.5 Examples of implication frozen expressions
12.6 Examples of limitations moves, future research moves, and hybrids
12.7 Example 3 of a Closure Move by McCarthy, Renner, et al. (2008)
12.8 Example 5 of a Closure Move by McCarthy and McNamara (2007)
12.9 Six examples of pitches
12.10 A model of the discussion section by sequential position, paragraph position, discussion phase, discussion move, and element of move
B.1 TASA passage categorized into grade bands


Acknowledgments

Coh-Metrix has been built, tested, revised, and used by many researchers,
colleagues, and students over the past decade. We are extremely grateful to
the inestimable number of people who have contributed to the Coh-Metrix
project. We are likely to leave someone out if we attempt to list everyone who
has worked with us on Coh-Metrix. We must, however, explicitly acknowl-
edge a few key individuals. Max Louwerse, Randy Floyd, and Xiangen Hu
were co-investigators on the original Coh-Metrix project – we are thankful
for the opportunities we had to work with them and for their invaluable input
and contributions. Jianmin Dai joined our team more recently and has
contributed greatly to our Coh-Metrix analyses of writing and to the develop-
ment of various Coh-Metrix tools. Scott Crossley contributed to the develop-
ment of Coh-Metrix and has been perhaps the most avid user of Coh-Metrix
over the years. Working with Scott has been a delight, and without his work
we would have never progressed to where we are today. Finally, we cannot
express in words our gratitude to the many students who have worked on this
project and on related projects: We would be nothing without them.
The development of Coh-Metrix and much of the research referenced within
this book was supported by the Institute of Education Sciences, U.S. Department
of Education, through Grant [R305G020018-02] to the University of Memphis.
Research using Coh-Metrix was also supported by funding to develop and assess
the Writing Pal by the Institute of Education Sciences, U.S. Department of
Education, through Grant [IES R305A080589] to the University of Memphis
and Grants [R305A09623; R305A120707] to Arizona State University. Use and
modification of Coh-Metrix were also supported by the National Science
Foundation through Grant [BCS 0904909] to the University of Memphis. The
development of the Coh-Metrix text easability components was partially sup-
ported by the Gates Foundation through a subcontract to Student Achievement
Partners. The opinions expressed are those of the authors and do not represent
views of the Institute or the U.S. Department of Education, the National Science
Foundation, or the Gates Foundation.


Introduction

This book describes Coh-Metrix, a computational tool that provides a wide
range of language and discourse measures. It is a linguistic workbench that
researchers, teachers, and students of many different disciplines can use to
obtain information about their texts on numerous levels of language. This
book consists of two parts. Part I focuses on the theoretical
motivations and perspectives that led to the development of Coh-Metrix. It
also describes the tool’s technological foundations, the measures it provides, and
empirical work that has been conducted using Coh-Metrix. We see Part I as
being invaluable to researchers who wish to situate their Coh-Metrix work
within the theoretical and empirical fields of discourse processing,
psycholinguistics, text design, and related fields.
Part II shifts to the practical and pedagogical arena, describing how to use
Coh-Metrix and how to analyze, interpret, and describe Coh-Metrix results.
This section is written for computational novices and students who wish to
not only use Coh-Metrix (or similar computational tools), but also describe
the resulting studies and their outcomes.
Coh-Metrix was developed, refined, and tested between 2002 and 2011 at
the University of Memphis. The initial funding for the Coh-Metrix project
was awarded in 2002 (R305G020018) from the Office of Educational Research
and Improvement (OERI), which became the Institute of Education Sciences
(IES) the following year. Our initial discussions that led to the Coh-Metrix
grant proposal revolved around establishing common ground between an
interdisciplinary collection of researchers with very different backgrounds.
One fundamental issue that called for a common understanding was whether
we all believed that cohesion was observable in text, or alternatively whether it
could only be measured with respect to the reader. We all agreed, fortunately,
that cohesion could be measured in a text. We finally agreed to use the term
cohesion when referring to observable aspects of the text, and coherence when
referring to the consequences of cohesion in the mind of the reader (see
Chapter 5). This definition of terms was crucial to our moving forward. Since
that time, we have been working on developing, refining, and playing with
Coh-Metrix.
Coh-Metrix has quickly and effectively moved well beyond its original
goals of developing measures of cohesion to better match text to readers. It
is arguably the broadest and most sophisticated automated textual assessment
tool currently available on the web. Coh-Metrix empowers anyone with an
interest in text to pursue a wide array of previously unanswerable research
questions. Coh-Metrix automatically provides numerous measures of
evaluation at the levels of the text, the paragraph, the sentence, and the
word. Coh-Metrix uses lexicons, part-of-speech classifiers, syntactic parsers,
semantic analyzers, Latent Semantic Analysis (a statistical representation of
world knowledge based on corpus analyses), and several other components
that are widely used in computational linguistics. For example, the MRC
(Medical Research Council) Psycholinguistic Database (Coltheart, 1981) is
used for psycholinguistic information about words. WordNet has linguistic
and semantic features of words, as well as semantic relations between words
(Miller, Beckwith, Fellbaum, Gross, & Miller, 1990). Latent Semantic Analysis
computes the semantic similarities between words, sentences, and paragraphs
(Landauer & Dumais, 1997; Landauer, McNamara, Dennis, & Kintsch, 2007).
Syntax is analyzed by syntactic parsers (e.g., Charniak, 2000).
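
To give a rough sense of the kind of computation that lies behind a semantic similarity score, the following minimal sketch computes the cosine between two vectors, a measure commonly used to compare vectors in Latent Semantic Analysis. It is an illustration only and should not be read as the Coh-Metrix implementation: the function name cosine_similarity and the three-dimensional toy vectors are assumptions made for this example, whereas vectors in an actual semantic space are derived from a large corpus and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for three text units.
unit_a = [1.0, 2.0, 0.0]
unit_b = [2.0, 4.0, 0.0]  # same direction as unit_a
unit_c = [0.0, 0.0, 3.0]  # shares no dimension with unit_a

print(cosine_similarity(unit_a, unit_b))
print(cosine_similarity(unit_a, unit_c))
```

In this toy case, vectors that point in the same direction receive a cosine near 1, and vectors that share no dimensions receive a cosine of 0.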
This book describes a plethora of studies that have been conducted since
Coh-Metrix was first launched in 2003. Our research labs have collectively
published well over a hundred studies that have used Coh-Metrix to analyze
texts in print and oral discourse. Among those publications are studies that
have validated the use of Coh-Metrix to assess the cohesion of text (e.g.,
McNamara, Louwerse, McCarthy, & Graesser, 2011). Collectively, these stud-
ies have used Coh-Metrix to distinguish a wide range of texts. For example,
Louwerse, McCarthy, McNamara, and Graesser (2004) identified significant
differences between spoken and written samples of English. Graesser, Jeon,
Yang, and Cai (2007) identified differences between physics content that
occurred in textbooks, texts prepared by researchers, and conversational
discourse in tutorial dialogue. Lightman, McCarthy, Dufty, and McNamara
(2007a) distinguished the beginnings, middles, and ends of chapters in a
corpus of history and science textbooks for high school. Crossley, Louwerse,
McCarthy, and McNamara’s (2007) investigations of second language learner
texts revealed a wide variety of structural and lexical differences between texts
that were adopted (or authentic) versus adapted (or simplified) for second
language learning purposes. These few studies only begin to represent the
extensive body of research that has evolved since Coh-Metrix was launched to
discourse processing researchers and scholars in other fields.
The Coh-Metrix facility and the associated theoretical framework would
never have been accomplished without an interdisciplinary team of research-
ers. The relevant major fields have included psychology, computer science,
linguistics, and education, but it is the more specialized hybrid fields that have
provided the more useful, targeted contributions: discourse processing,
psycholinguistics, reading, computational linguistics, corpus linguistics,
cognitive science, artificial intelligence, information retrieval, and composition.
Some of us brand ourselves as computational discourse scientists. We use
the term discourse as a general umbrella term for analyses of language, texts,
communication, and social interaction through various communication
channels. Our work is computational in two ways. First, we precisely specify
the algorithms or symbolic procedures that identify text categories, units, or
patterns at the various levels of a multilevel theoretical framework. Second,
we attempt to program the computer to implement these algorithms and
procedures. Many computer implementations are successful, but there are no
guarantees. Coh-Metrix includes only the successful automated algorithms
and procedures. And finally, we are scientists because we embrace scientific
methods in all stages of our research. That is, we sample texts in a systematic
manner when we empirically test well-formulated claims about text charac-
teristics. We perform statistical analyses that assess the generality of our
claims regarding targeted text categories. We collect data from human par-
ticipants to test claims and predictions about the impact of text characteristics
on comprehension and other psychological processes.
We are hopeful that Coh-Metrix will be useful to scholars in both the
sciences and humanities and to all sectors of the public. Coh-Metrix opens the
door to a new paradigm of research that coordinates studies of language,
discourse, corpus analysis, computational linguistics, education, and cogni-
tive science (Graesser, McNamara, & Rus, 2007). We hope that this book will
be of use to a wide range of readers, including researchers, educators, writers,
publishers, and students. Our vision is broad. There is the student in a
literature course who analyzes differences between various works by
Shakespeare, and the student in an educational psychology course who
compares textbooks written for elementary versus middle school courses.
There are the students who want to know about the nature of their own
writing and whether it improves over time. There is the book publisher who
wants to know whether a text in biology is written coherently compared with
other books on the market. There are the school superintendents who want to
evaluate all of the books being used in their school system. There is the
attorney who wants to know the difficulty of the Miranda Rights when
defending a client who has a modest understanding of the English language.
The uses and applications of Coh-Metrix are endless. Enjoy!

Recommended Supplementary Readings


An introduction to Coh-Metrix is provided in a number of publications
(Graesser & McNamara, 2011; Graesser, McNamara, & Kulikowich, 2011;
Graesser, McNamara, Louwerse, & Cai, 2004; McNamara & Graesser, 2012;
McNamara, Louwerse, & Graesser, 2010). The Coh-Metrix research group
has published well over 50 articles in journals, books, and conference pro-
ceedings. Many of these articles can be accessed on the Coh-Metrix website
(www.cohmetrix.com) and many can be accessed from Danielle McNamara’s
lab website (soletlab.com). Most importantly, the cohmetrix.com site also
provides access to Coh-Metrix 3.0, the focus of this book.
A book edited by McCarthy and Boonthum-Denecke (2012) provides many
examples of research efforts in computational discourse science. This interdis-
ciplinary field is closely aligned with a number of other hybrid fields that
investigate language and discourse, including discourse processing (Graesser,
Goldman, & Gernsbacher, 2003; Sanford & Emmott, 2012), psycholinguistics
(Spivey, Joanisse, & McRae, 2010), reading (Kamil, Pearson, Moje, & Afflerbach,
2011; McNamara, 2007), computational linguistics (Jurafsky & Martin, 2008),
corpus linguistics (Biber, Conrad, & Reppen, 1998), and cognitive science
(Kintsch, 1998; Landauer, McNamara, Dennis, & Kintsch, 2007).
We have adopted a multilevel theoretical framework for analyzing text
difficulty with Coh-Metrix (Graesser & McNamara, 2011). An alternative
perspective assigns a text to a single dimension of text difficulty, as in the
case of Lexiles (Stenner, 2006). Another alternative positions a text in a
multiple dimensional space, as in the case of analyses by Biber (1988).
Multilevel theoretical frameworks have been proposed that include the
levels of words, syntax, textbase, situation model, and genre/rhetorical struc-
ture (Graesser & McNamara, 2011; Graesser, Millis, & Zwaan, 1997; Kintsch,
1998; Pickering & Garrod, 2004). More detailed theoretical and empirical
discussions of these levels are provided for words (Pennebaker et al., 2007;
Perfetti, 2007), syntax (Charniak, 2000; Rus et al., 2006), textbase (van Dijk &
Kintsch, 1983; McNamara et al., 2010), situation model (Graesser, Singer, &
Trabasso, 1994; Zwaan & Radvansky, 1998), and genre/rhetorical structure
(Biber, 1988). The book edited by McCarthy and Boonthum-Denecke (2012)
reports computational measures and psychological evidence for these five
levels and other aspects of language, discourse, and text.

Part I

COH-METRIX: THEORETICAL,
TECHNOLOGICAL, AND EMPIRICAL
FOUNDATIONS

What Is Text and Why Analyze It?

Some texts are easy to read. Others are difficult. That is perfectly obvious. The
challenge lies in devising an objective means to measure texts on how difficult
they are to read. That is one of the puzzles that motivated our development of
Coh-Metrix and ultimately the writing of this book. How do we scale texts on
comprehension difficulty? Or on the flip side: easability?
It is often quite clear when texts are difficult or easy. Consider the two texts
below and cast your vote on which is difficult and which is easy.
Lady Chatterley’s Lover
He spread the blankets, putting one at the side for a coverlet. She took off her hat,
and shook her hair. He sat down, taking off his shoes and gaiters, and undoing his
cord breeches. “Lie down then!” he said, when he stood in his shirt. She obeyed in
silence, and he lay beside her, and pulled the blanket over them both.

A Mortgage
The assignment, sale, or transfer of the servicing of the mortgage loan does not
affect any term or condition of the mortgage instrument, other than terms directly
related to the servicing of your loan. Except in limited circumstances, the law
requires your present servicer send you this notice within 15 days before this
effective date or at closing.

We do not need to conduct a survey to discover how most English speakers
will vote. The Chatterley text by D. H. Lawrence is clearly easier than the
mortgage text. The question is why?
Some obvious hypotheses fail to discriminate these two excerpts on
comprehension difficulty. Both passages have pronouns that require inferences to
understand what they refer to. Both texts also have low-frequency words in
the English language. Readers will be challenged by coverlet, gaiters, and cord
breeches, just as they will be challenged by words such as mortgage, instrument,
and present servicer. The core topics underlying these two texts are both
important. Sex and romance are on par with money and domestic security,
although it could be argued that sex and romance are considerably more
interesting. Both texts require a sociocultural context for a complete under-
standing, be it knowledge of romance or of finance. Moreover, a deep under-
standing of the D. H. Lawrence story requires knowledge of the status of
women in the early 20th century (i.e., not great), when it was written. The
differences in comprehension difficulty for these two texts are indeed much
more complex and subtle than is readily apparent from the text alone.
This book will unveil the many ways that texts vary in comprehension
difficulty. What we sometimes call comprehension easability is aligned with
reading ease or readability, the other end of the continuum being text
difficulty or text complexity. Our theoretical approach is to analyze texts on
many levels of language, meaning, and discourse (Graesser & McNamara,
2011). A computer program called Coh-Metrix (and Coh-Metrix-TEA) per-
forms these analyses automatically for many of the levels that researchers
have identified over the years (Graesser, McNamara, & Kulikowich, 2011;
Graesser, McNamara, Louwerse, & Cai, 2004; McNamara & Graesser, 2012;
McNamara, Graesser, & Louwerse, 2012; McNamara, Louwerse, McCarthy, &
Graesser, 2010). The Coh-Metrix output on these many levels provides the
foundation for scaling texts on difficulty (versus easability).

What Text?
Our emphasis in this book is on printed texts, although the texts may derive
from virtually any source and be composed for any English language com-
munity. For example, they may be newspaper articles, entries in encyclope-
dias, science texts in schools, legal documents, advertisements, short stories,
or theatrical scripts – the list goes on. The Coh-Metrix program holds up
quite well for most of the texts that we have analyzed. The majority of our
analyses have been on naturalistic texts, but we have also analyzed well-
controlled texts that discourse researchers have prepared or manipulated
for psychology experiments (McNamara et al., 2010). Our goal is to accom-
modate virtually any text in the English language that people write with the
intention of communicating messages to readers.
Our theoretical framework and the Coh-Metrix program can also be used
to analyze transcripts of naturalistic oral discourse. We have analyzed con-
versations in tutoring sessions, chat rooms, e-mail exchanges, and various
forms of informal conversation. Transcribed texts of conversations are replete
with speech disfluencies (um, ah, er), ungrammatical utterances, interruptions,
overlapping speech, slang, and semantically vague expressions (Clark,
1996). These deviations from well-formed, edited, neat and tidy text have a
major impact on some of the Coh-Metrix measures, but many of the meas-
ures are minimally disturbed. It is also possible to analyze students’ written
responses, explanations, and essays that are similarly replete with untidy
language and discourse (Crossley & McNamara, 2011; Louwerse, McCarthy,
McNamara, & Graesser, 2004; McNamara, Raine et al., 2012; Renner,
McCarthy, Boonthum-Denecke, & McNamara, 2012).
While Coh-Metrix analyses of more naturalistic discourse (e.g., dialogues)
have been highly successful, it remains important to acknowledge that some
classes of printed texts will stress the boundaries of Coh-Metrix. Current
versions of Coh-Metrix are not well equipped to handle mathematical expres-
sions, pictures, diagrams, and other forms of nonverbal media. Coh-Metrix
can be applied to poetry (Lightman, McCarthy, Dufty, & McNamara, 2007b),
but measures at some levels (such as syntax) will be compromised and Coh-
Metrix will not do justice to metaphorical expressions (Graesser, Dowell, &
Moldovan, 2011). Likewise, many aspects of the quality of writing, such as
rhetorical and pragmatic aspects of language, are not fully captured by Coh-
Metrix alone (McNamara, Crossley, & Roscoe, 2013). These challenges are on
deck for future research endeavors.

Why Should We Scale Texts on Difficulty?


Skeptics ask why we bother scaling texts on difficulty. What problems will this
solve? Text is qualitative verbal material, so what’s the point in assigning
numbers to the morass of qualitative symbolic codes? Wouldn’t it be better to
have a group of experts describe particular texts on qualitative attributes and
to scrap the mission of assigning numbers to texts?
Our response to the skeptics is that the assignment of Coh-Metrix values to
texts is quite important and eminently humane. Consider the following
applications of Coh-Metrix and the practical implications for quality of life.
Assigning texts to students in school. Ideally, the texts assigned to students
should be within an optimal zone of comprehension difficulty. The optimal
zone is a matter of debate and is likely to depend on the characteristics of the
student (Graesser et al., 2011) as well as the teacher’s pedagogical goals. Some
students are best served by texts at an intermediate level of difficulty for them:
Not too easy, not too difficult, but just right. If the texts are too easy, the
students are not challenged and they may become bored. If the texts are too
difficult, the students are overwhelmed, become discouraged, and tune out.
Some students are eager to read texts considerably above their comfort level
and others need to build self-confidence in reading by receiving texts that are
easy for them to read. The assignment of texts can also be tailored to
particular deficits that a student has at particular levels of language or
discourse. A student who is reading quite well but has trouble understanding
the global meaning of stories should be receiving different texts than students
who are having trouble with syntax or those who experience challenges with
vocabulary. Many claim that text assignment should be adapted to the
student’s profile of reading skills and proficiencies, and moreover, that stu-
dent motivation and learning improve when this happens (Connor,
Morrison, Fishman, Schatschneider, & Underwood, 2007).
Quality of public documents. The comprehension difficulty of many public
documents is too high for a large percentage of the population. The earlier
mortgage text illustrates the problem. Legal documents, medical documents,
and employment agreements are also excellent examples of challenging texts
that are difficult to understand for most of the public. Similarly, question-
naires and surveys administered to the public, such as tax forms and
census surveys, have a high percentage of questions that pose comprehension
difficulties to a significant portion of the public (Conrad & Schober, 2007;
Graesser, Cai, Louwerse, & Daniels, 2006). The reliability and validity of data
collected from these surveys are compromised when the questions have difficult
words, ambiguous meaning, complex syntax, or content that excessively
burdens cognitive resources. Individuals and society suffer the consequences.
Drug prescriptions and medical procedures. It is obviously important to
take the proper dosage of drugs, to be mindful of side effects, and to under-
stand medical procedures. Failure to do so may be a matter of life or death.
Unfortunately, the complexity of medical information is too high for most of
the public to comprehend, particularly when there is a large amount of jargon,
incoherent descriptions of procedures, and complex models of health and
biological mechanisms (Day, 2006). Interestingly, the advertisements tend to
be much easier to read than the warnings. Consider the following warning on
a nonprescription drug:

Do not use if you are now taking a prescription monoamine oxidase inhibitor
(MAOI) (certain drugs for depression, psychiatric, or emotional conditions, or
Parkinson’s disease), or for 2 weeks after stopping the MAOI drug.

These examples illustrate the value of analyzing texts on difficulty and
including quantitative scales in this process. We would argue that public
documents and medical instructions need to be within a reasonable zone of
text difficulty. The education of students hinges on the assignment of texts,
tests, and other materials that are within the students’ proficiency zones at
different levels of language, meaning, and discourse. Coh-Metrix can contribute
to these efforts to improve the texts that students and adults read.

three approaches to scaling texts on difficulty


Three perspectives can be taken for scaling texts on difficulty, each putting the
magnifying glass on different analytical schemes. We refer to these as text
categories, dimensions, and levels. We argue that a satisfactory model of
comprehension difficulty involves multiple levels of language and discourse.
Indeed, we have been particularly intrigued with the role of text cohesion and
coherence: the impetus for developing Coh-Metrix. “Cohesion” refers to the
connectedness of concepts presented in a text, whereas “coherence” refers to
the connectedness of mental representations that readers are likely to con-
struct from the text. Although these notions of cohesion and coherence
initially inspired our project, a broad spectrum of language and discourse
measures ultimately evolved over the years of its development. Our multilevel
theoretical framework (Graesser & McNamara, 2011) encompasses the diffi-
culty of words, sentences, and discourse in ways that stretch beyond the
notions of cohesion and coherence.

Text Categories
There are many categories of text, or what some researchers call “genre,” a
French word for category. Text category schemes vary in the sets of categories
that are included as well as in grain size. These variations often depend on the
discipline and theoretical slant of the researchers. A traditional scheme of
Brooks and Warren (1972) divides texts into the categories of narrative,
expository, persuasive, and descriptive (see also McCarthy, Meyers, Briner,
Graesser, & McNamara, 2009). Each of these categories has subcategories and
potentially sub-subcategories in a hierarchical scheme with varying levels of
grain size. Narrative texts convey events and actions performed by characters
that unfold over time, as in the case of folktales, drama, and short stories
(Sanford & Emmott, 2012). Expository texts explain the nature of mecha-
nisms or other phenomena, as in the case of science texts and encyclopedia
articles. Subcategories of persuasive texts are sermons, editorials, and adver-
tisements. Descriptive texts describe either static entities (a visual scenario,
the attributes of an object, the personality of a person) or activities (a broad-
cast of the events at a baseball game).
There are a number of limitations of text categorization schemes. One
problem is that researchers disagree on what categories to include and on the
definitions of the categories. A second problem is that a specific text can be
assigned to multiple categories. For example, the beginning of a short story
may fit the descriptive genre as the author describes the setting and characters.
The story may shift to the narrative genre when the plot unfolds, and
include text fitting the expository genre when particular details are filled in. A
third problem is that the categories are not well defined according to classical
definitions. A category C is well defined if category C has a set of features (i.e.,
properties, attributes, characteristics) that are necessary and jointly sufficient
to discriminate texts in category C from texts that are not in C. For example, a
composition instructor might claim that narrative texts have a plot, but
expository, persuasive, and descriptive texts do not have plots. The instructor
would be saying that plot is a necessary feature of narratives and may even be
sufficient to discriminate narratives from non-narratives. Unfortunately,
categories are rarely well defined, as scholars have known for decades
(Wittgenstein, 1953). Instead, categories are usually probabilistic prototypes.
That is, the texts in category C share many features with each other (called
family resemblance), and more features with each other than with texts outside
of category C. Moreover, there may be zero features that (a) are shared by all
texts in category C, or (b) are shared by no texts outside of C. In essence, the
features of a category apply to its members with a higher probability than they
apply to members of other categories. The fact that the categories are probabilistic prototypes is
prone to create confusion when researchers argue for or against a category
scheme. There is the risk of endless uncertainty and debate among scholars
rather than a convergence on a consensual set of text categories.
The fact that it is difficult to cleanly assign texts to specific categories does
not invalidate attempts to define text categories probabilistically. This is
succinctly captured by an old paradox that there is no point in time that
unambiguously segregates night and day and yet there exists a distinction
between night and day. Just as there is a prototypical nighttime and a
prototypical daytime, there are prototypical narrative texts and prototypical
science texts. There are also quantitative methods of representing these text
category prototypes, as we discuss later in this book (see Chapter 5).
Consequently, a particular text might have the value of being 70% narrative
versus 30% informational expository text. This probabilistic prototype view of
text categories is respectable and perfectly aligned with most categories as
they are defined in the cognitive sciences (Rosch & Mervis, 1975; Smith &
Medin, 1981).
Texts in some categories tend to be more difficult to comprehend than texts
in other categories. For example, narrative texts tend to be easier to compre-
hend than informational texts, such as encyclopedia articles and science
textbooks. According to some estimates, narrative texts are read approximately
twice as quickly and remembered twice as well as informational texts
(Graesser & Ottati, 1996; Haberlandt & Graesser, 1985). Perhaps it is possible
to scale text categories on difficulty and to use the category scale values to
scale individual texts. That is, if text T is in text category C and if category C
has a difficulty level of D, then text T would inherit the difficulty level
D. However, as far as we know, texts have not been scaled on difficulty in
this fashion.

Text Dimensions
One approach to scaling texts is to have a single dimension of text difficulty.
This is the approach taken by metrics such as Flesch-Kincaid Grade Level
(FKGL; Klare, 1974–1975), Degrees of Reading Power (DRP; Koslin, Zeno, &
Koslin, 1987), and Lexile scores (Stenner, 2006). We and others have found
these three metrics of text complexity to be highly correlated (r > .90). These
and other similar readability formulas are correlated because they all include
features related to the frequency of words in the language and the length of
sentences. Readability formulas are theoretically grounded on the assumption
that a reader’s understanding of sentences in a text is related to the likelihood
that the reader knows the words in the sentences and can parse the sentences
in the text.
The Flesch-Kincaid Grade Level metric is based on the length of words and
length of sentences, as shown in Formula 1.1. In the formula, Words refers to
the mean number of words per sentence and Syllables refers to the mean
number of syllables per word.

$$\text{Grade Level} = 0.39 \times \text{Words} + 11.8 \times \text{Syllables} - 15.59 \tag{1.1}$$

The grade level increases as the words and sentences increase in length. These
two factors of word length and sentence length are reasonable psychologi-
cally. Longer words tend to be less frequent in the English language so readers
have less world knowledge about these words. Longer sentences tend to place
a greater load on working memory and thereby increase comprehension
difficulty.
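Formula 1.1 can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, sentences are split naively on end punctuation, and syllables are approximated by counting vowel groups, which is an assumption of this example rather than the syllable counting used by any published implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a rough heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Apply Formula 1.1 using mean words per sentence and mean syllables per word."""
    # Naive sentence segmentation: split on runs of ., !, or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic strings, optionally with one internal apostrophe
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


easy = "The cat sat. The dog ran."
warning = "Do not use if you are now taking a prescription monoamine oxidase inhibitor."
easy_grade = flesch_kincaid_grade(easy)
warning_grade = flesch_kincaid_grade(warning)
```

The easy text has three words per sentence and one syllable per word, so the formula gives 0.39(3) + 11.8(1) − 15.59 = −2.62. The opening clause of the drug warning has longer words and a longer sentence, so it receives a higher grade level under the same heuristic.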
DRP and Lexile scores relate characteristics of the texts to readers’ per-
formance in a cloze task. In the cloze task, the text is presented with words left
blank during the course of reading; the reader is asked to fill in the words by
generating them or by selecting a word from a set of options. A text is at the
reader’s level of proficiency if the reader can perform the cloze task at a
threshold of performance (e.g., 75%). A text is defined as easy for a population
of readers at a particular grade level if performance exceeds 75% and is
difficult to the extent it is lower than 75%.
These unidimensional metrics of text difficulty provide a reasonable first
approximation to scaling texts on difficulty. Word and sentence length are
indeed excellent predictors of reading time (Haberlandt & Graesser, 1985;
Just & Carpenter, 1987). The Lexile and DRP scores have been impressive
predictors of reading and comprehension scores on psychometric tests that
are widely adopted throughout the country (Stenner, 2006). However, we
believe that a single dimension of text difficulty will not go the distance in
accounting for many facets of comprehension. This belief is shared by
many researchers, teachers, school administrators, policy makers, and others
in education, and it was part of our motivation for developing Coh-Metrix.
One potential multidimensional perspective on analyzing texts is to scale
the texts on particular text dimensions (Biber, 1988; Louwerse et al., 2004).
For example, a particular text can be scaled on the extent to which it is
(a) informational versus narrative, (b) print versus oral, (c) decontextualized
versus interactive with an audience, (d) academic versus informal, and so on.
Biber (1988) has developed an analytical scheme that scales texts on dimen-
sions such as these based on 67 features of words and syntax. These dimen-
sions and similar ones have been predictive of a variety of differences between
texts, and may map onto a scale of difficulty. For example, difficult texts
would tend to be informational, print, decontextualized, and academic.
Although this approach is reasonable, we are not aware of a project that has
systematically pursued this approach relative to predicting text difficulty.

Text Levels
In our view, the most promising approach to scaling texts on difficulty is to
adopt a multilevel theoretical framework for language and discourse process-
ing (Graesser & McNamara, 2011). Psychological theories of comprehension
have identified the representations, structures, strategies, and processes at
multiple levels of language and discourse (Graesser, Millis, & Zwaan, 1997;
Kintsch, 1998). For example, Graesser and McNamara (2011) consider six
levels: words, syntax, the explicit textbase, the referential situation model
(sometimes called the mental model), the discourse genre and rhetorical
structure (the type of discourse and its composition), and the pragmatic
communication level (between speaker and listener, or writer and reader).
We believe that a scale of text difficulty needs to consider these different levels.
Moreover, subscales are needed for each of the levels because a text can be
difficult according to some subscales but not for others.
The first five of these six levels are elaborated in Chapter 3 (see also
Chapter 2). Chapters 4 and 5 provide a more detailed description of the
computational components and measures associated with these levels.
Therefore, a cursory description of these levels is sufficient in this
introductory chapter. The levels of words and syntax need little elaboration
here because they are self-explanatory. Quite clearly, the vocabulary in
a text can impose comprehension difficulties, as illustrated by the medical
warning example presented earlier. The syntactic composition of sentences
can result in very different comprehension problems than those attributed to
words. It is difficult to construct meanings from sentences that have syntactic
structures that are lengthy with many embedded subordinate clauses. We
believe that the word length and sentence length parameters of the readability
formula capture some facsimile of these word and syntax levels. However, the
other four levels move us beyond the readability formulas and into more
intriguing realms of meaning.
The textbase contains explicit ideas in the text in a form that preserves the
meaning but not the precise wording and syntax. According to van Dijk and
Kintsch (1983), the textbase contains explicit propositions in the text, as well as
links between propositions and a small set of inferences that connect these
explicit propositions. Propositions are more complex idea units than individual
words. For example, consider the first sentence in the earlier example from
Lady Chatterley’s Lover: “He spread the blankets, putting one at the side for a
coverlet.” This sentence would have the following underlying propositions:
(1) the lover spread the blankets, (2) the lover put a blanket at the side, and (3)
the blanket was for a coverlet. In the van Dijk and Kintsch analysis, the
propositions are in a stripped down form that removes surface code features
captured by determiners (the, a), quantifiers (some, all, three), tense (past,
present, future), aspect (event completed versus in progress) and auxiliary
verbs (could, was). For example, a propositional representation of the lover
spread the blankets is spread (lover, blankets). Further, the textbase representa-
tion glosses over any distinction between the special blanket for the coverlet and
the other blankets. It also ignores the fact that the verb spread is in the past
tense, that the verb putting is a gerund, and that the timing of the spreading and
putting are not identical. These distinctions are explicit in the surface structure
of the reader’s understanding, but are not within the textbase. It is an empirical
question how much the reader tracks or remembers these subtleties.
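The predicate-argument notation above, such as spread (lover, blankets), maps naturally onto a small data structure. The following Python sketch is one possible encoding, not the representation used by van Dijk and Kintsch or by Coh-Metrix. The class name, the predicate labels "put" and "for", and the helper function are assumptions introduced for illustration; only the first proposition is written out in this form in the text.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Proposition:
    """A stripped-down idea unit: a predicate plus its arguments, with no tense or determiners."""
    predicate: str
    arguments: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.arguments)})"


# Textbase for "He spread the blankets, putting one at the side for a coverlet."
# The second and third encodings are hypothetical renderings of propositions (2) and (3).
textbase = [
    Proposition("spread", ("lover", "blankets")),
    Proposition("put", ("lover", "blanket", "side")),
    Proposition("for", ("blanket", "coverlet")),
]


def shared_arguments(p: Proposition, q: Proposition) -> Set[str]:
    """Arguments that two propositions have in common (a simple link between them)."""
    return set(p.arguments) & set(q.arguments)


links_1_2 = shared_arguments(textbase[0], textbase[1])
links_2_3 = shared_arguments(textbase[1], textbase[2])
```

In this encoding the first two propositions are linked through the shared argument "lover" and the last two through "blanket". Surface details such as tense and determiners are absent by design, which mirrors the stripped-down form described above.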
One of the central questions about a reader’s textbase representation is
whether the noun entities (e.g., lover, blanket, coverlet, side) and propositions
are connected in a coherent fashion. Indeed, our measures of cohesion in
Coh-Metrix were developed to assess the extent to which a text has referential
cohesion at the textbase level. Difficult texts have many cohesion gaps. If the
reader does not have enough world knowledge to fill these gaps, then com-
prehension will suffer (McNamara & Kintsch, 1996). Indeed, Chapter 2 elab-
orates on the research on cohesion and knowledge that was foundational to
Coh-Metrix.
The situation model is the subject matter that is being described in infor-
mational texts or the microworld that evolves in a narrative text. In narrative,
this would include the people, objects, spatial setting, actions, events, pro-
cesses, plans, thoughts and emotions of people, and other referential content.
Text comprehension researchers have investigated five dimensions of the
situation model in narrative text (Zwaan & Radvansky, 1998): causation,
intentionality, time, space, and protagonists. A break in cohesion or coher-
ence occurs when there is a discontinuity on one or more of these situation
model dimensions. Whenever such discontinuities occur, it is important to
have connectives (e.g., therefore, because), transitional phrases (e.g., later on
that day, on the other hand), adverbs (e.g., unfortunately, already), or other
signaling devices (e.g., first, second, third) that convey to the readers that there
is a discontinuity; we refer to these different forms of signaling as particles.
Cohesion is facilitated by particles that clarify and stitch together the actions,
goals, events, and states in the text. The coherence in the minds of the readers
is similarly facilitated. However, sometimes it is worthwhile to insert (or
leave) cohesion breaks at the level of the situation model for high-knowledge
readers with good general comprehension skills because such readers will
devote more effort to construct inferences to fill the gaps (McNamara &
Kintsch, 1996). Whereas the low-ability readers have trouble with these
cohesion gaps, the high-knowledge and skilled readers may be inspired to
perform deeper processing (see Chapter 5). This interaction between cohe-
sion and reader profile is an excellent example of the need to consider a more
complex picture than the unidimensional text difficulty perspective.
Text genre has already been described in this chapter but a few words should
be devoted to rhetorical structure. The rhetorical structure is the organization
of the text at a macro-level and the discourse function of particular excerpts.
Example rhetorical structures in informational texts are cause + effect, claim +
evidence, and problem + solution (Meyer & Wijekumar, 2007). An excerpt or
global stretch of text may have an associated point, message, or pragmatic
function. The epistemological status of the sentences in these rhetorical struc-
tures also needs to be understood. There is a difference between a question, a
worry, a belief, a hypothesis, a claim, and the assertion of a fact.
We have said very little about the pragmatic communication level of dis-
course up to this point. This is an essential level to understand for compre-
hension to succeed. Texts are written to inform, persuade, tease, irritate,
entertain, seduce, and so on. The situational settings, speakers, audience, and
broader contexts are often absent when a text is analyzed. This is an unfortu-
nate limitation but it is ubiquitous when researchers analyze printed text. The
writer, the reader, and the occasion are stripped from the analysis when printed
text is read and analyzed. Beck, McKeown, Hamilton, and Kucan (2007) have
attempted to encourage their readers to resurrect this context in their
Questioning the Author intervention and this has been quite successful in
improving comprehension. However, this is a giant step that takes us from
the text to the sociocultural context.

conclusion
The Coh-Metrix program provides solid analyses of the first five levels
described in Graesser and McNamara (2011). In contrast, it has a relatively
anemic analysis of the pragmatic communication level. Indeed, we are pre-
pared to surrender and admit that this level is beyond the scope of the Coh-
Metrix project, but perhaps not beyond natural language processing. There
are certainly vestiges of text elements and discourse patterns that signal
components of pragmatic communication. But this research effort is at the
fringe and well beyond the scope of this book. In the meantime, we have
focused our efforts in Coh-Metrix on providing a selection of indices corre-
sponding to the first five levels of discourse: words, syntax, the textbase, the
situation model, and genre and rhetorical structure. The following chapters
in Part I of this book describe the technologies that have enabled the
measurement of these multiple levels of language, the indices provided in
Coh-Metrix Version 3.0, and studies that validate and demonstrate the utility
of Coh-Metrix.

The Importance of Text Cohesion

The need to better understand the important role of cohesion in comprehension
was the primary inspiration to develop Coh-Metrix (hence the “Coh” in
Coh-Metrix). There has been considerable evidence that cohesion critically
determines both how challenging a text is and how well the reader will understand it.
Decades of research have demonstrated the importance of cohesion to text
comprehension, yet at the turn of this century there were no means available
for objectively measuring the cohesion of a text. Studies that had manipulated
cohesion (or coherence as it has also often been referred to) had used guidelines
to increase or decrease cohesion for any given text version, but there existed no
measures of text cohesion itself, particularly measures that could be calculated at
large scales (i.e., automatically). This situation presented a clear need to provide
researchers and educators with a tool to objectively measure cohesion.
As we discussed in Chapter 1, one purpose of Coh-Metrix is to assess the
characteristics of the text so that readers’ comprehension can be estimated for
that particular text. Notably, however, Coh-Metrix provides estimates of the
linguistic, semantic, and discourse characteristics of the text without taking
into consideration such fundamental factors as the reader and the task. Any
predictions based on Coh-Metrix values for a text should therefore be quali-
fied by the multiple real-world factors that surround the text, including the
reader and the task. Readers have varying abilities, knowledge, motivation,
and purposes for reading. Tasks vary from reading under duress to reading to
enjoy, to learn, and to solve problems. All of these factors potentially interact
with the features of the text. A text feature may have one effect in one situation
and an entirely different effect in another situation. Such interactions need to
be considered carefully when interpreting Coh-Metrix output.
This chapter discusses the importance of cohesion in text recall and
comprehension. We show how the effects of cohesion can particularly depend
on the reader’s prior knowledge and reading ability.

cohesion versus coherence


An obvious preliminary assumption that we made before starting the Coh-
Metrix project was that cohesion could be measured. We also made the
assumption that coherence lies in the mind of the reader, whereas cohesion
lies in the text or discourse (Carrell, 1982; Givon, 1995; Graesser, McNamara, &
Louwerse, 2003). An important premise is that the textual elements that create
cohesion, and thereby influence coherence, can be measured directly and can be quantified.
Coherence, by contrast, refers to how well the reader understands a text or
discourse, and therefore the coherence of text can only be measured indirectly.
We can do this, for example, by asking the reader questions, presenting tasks
that probe the depth and stability of comprehension, and assessing memory for
the information conveyed in the text. The coherence of a mental representation
emerges as a function of the number of associations or connections constructed
by the reader. When the representation includes many connections between
the ideas, then it is coherent; when it includes fewer connections, it is less
coherent.
In Figure 2.1, our notion of coherence is conveyed with an abstracted
representation, including nodes and links. The nodes may represent concepts
(e.g., objects, agents) whereas the connections represent the relations between
them (e.g., actions). For the figure on the left of Figure 2.1, only the concept in
the center is well connected to the other concepts. By contrast, for the figure
on the right, the outer four concepts have three connections rather than only
one (to the central concept). The added connections render the representa-
tion more stable because each of the nodes feeds the others’ activation, and
thus it is more coherent (Graesser, 1981; Kintsch, 1988; McNamara, 1997;
Trabasso & van den Broek, 1985). If we relate these representations to memory
(or recall), then we can predict that a reader with the representation on the left
will be more likely to remember the central idea (the node in the middle) and

Figure 2.1. Connection model of coherence. The figure on the left has few connections
and would lead to a less coherent representation than would the figure on the right,
which has more connections.
forget the others, whereas a reader with the representation on the right would
be more likely to remember the central idea as well as the other four ideas (or
nodes). This stems from a well-established notion that concepts or ideas with
more interconnected associations in memory are more likely to be remem-
bered. Likewise, when there are more connections in the text and when the
reader generates connections between ideas in the text and to prior knowledge,
then the reader’s understanding is likely to be more coherent.
When the level of cohesion in the text is insufficient for the reader or when the
reader does not (or cannot) generate sufficient inferences to make connec-
tions between ideas, then the reader’s understanding will be less coherent.
Although cohesion is not directly tied to coherence, it is a crucial aspect of
predicting the likelihood that a given reader will be able to form a coherent
mental representation of a text.
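The two representations described for Figure 2.1 can be written down as small graphs and compared numerically. The Python sketch below is an illustration under stated assumptions. The node labels are hypothetical. The right-hand network is built by joining the four outer concepts in a ring, which is the only way for each of them to have exactly three connections, but the actual layout of the figure is not reproduced here. Using connection density as a stand-in for coherence is a choice of this example and not a measure defined in the text.

```python
def degrees(edges):
    """Count how many connections each node has."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def density(edges, n_nodes):
    """Fraction of all possible node pairs that are actually connected."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible


outer = ["A", "B", "C", "D"]  # hypothetical labels for the four outer concepts

# Left network: every outer concept connects only to the central concept.
left = [("center", node) for node in outer]

# Right network: same spokes, plus a ring joining neighboring outer concepts.
ring = [(outer[i], outer[(i + 1) % len(outer)]) for i in range(len(outer))]
right = left + ring

left_degrees = degrees(left)
right_degrees = degrees(right)
left_density = density(left, 5)
right_density = density(right, 5)
```

In the left network each outer concept has a single connection and 4 of the 10 possible pairs are linked. In the right network each outer concept has three connections and 8 of the 10 possible pairs are linked, which is the sense in which the representation on the right is better connected.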

Cohesion and Cohesive Cues


As we have mentioned, two major assumptions of the Coh-Metrix project are that
(1) cohesion is in the text and (2) cohesion can be computationally measured.
But at this point it is important to emphasize that cohesion is a catch-all term,
referring to the many different lexical elements in the text that collectively
contribute to cohesion. When we consider cohesion at the level of a contri-
buting element, we use the term “cohesive cue.” Thus, for example, over-
lapping key words across sentences are a potential cohesive cue, and
connectives such as “and,” “but,” and “because” are potential cohesive cues.
A text may feature one or several cohesive cues. One goal of Coh-Metrix is to
provide measures for a wide range of cohesive cues so that we can better
understand the kinds of cohesive cues that are contributing to cohesion, and
the degree to which those cues are contributing.

What Does Cohesion Look Like?


Cohesion emerges from the presence or absence of cohesive cues in the text.
The purpose of cohesive cues is to tie different parts of the text together. In a
sense then, cohesion is similar to syntax because it generates order. However,
cohesive cues operate at a higher level than does syntax. Syntax ties together
words and phrases in a sentence at a fundamental level: Order conveys the
roles and relations of the words. By contrast, cohesion ties together the clauses
and sentences in text at a semantic level and thus helps the reader better
understand the ideas of the text.
For an example of syntax, consider the following sentence:

The dog chased the cat who had been sitting on the brick fence. (2.1)

In this sentence, the verb “chase” connects the subject (dog) and the object
(cat) and conveys the relation between them. “The dog” occurring before “the
cat” conveys who is the subject and who is the object (given that the verb is
active rather than passive). Likewise, the verb “sit” connects the “cat” to the
“brick fence,” while the past tense of “had” indicates that the cat was no longer
sitting on the fence when the dog chased it, and so on. In essence, the syntax
provides cues as to how the words are related to each other at the sentence
level.
Clearly, syntax is essential for the reader to be able to understand the text.
However, an important difference between syntax and cohesion is that syntax
adheres to rules. Importantly, these rules cannot be easily violated by the
whims of a writer or speaker. For instance, none of the following sentences are
acceptable if we intend to convey the same meaning as in Example 2.1.
The dog the cat who had been sitting on the brick fence chased. (2.2)

The cat chased the dog who had been sitting on the brick fence. (2.3)

Who had been sitting on the brick fence the dog chased the cat. (2.4)

The the the been on who had brick fence dog chased cat sitting. (2.5)

By contrast, we can easily manipulate textual cohesion. Indeed, it is the
relative ease with which we can do such manipulations that makes the Coh-Metrix
project so valuable. For example, consider the following examples:
Smoking was forbidden. The store had inflammables. (2.6)

Smoking was forbidden because the store had inflammables. (2.7)

The addition of the cohesive cue “because” in Example 2.7 is not a compulsory
rule of language; nonetheless, its addition facilitates the understanding of why
smoking was forbidden.
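The difference between Examples 2.6 and 2.7 is the kind of cue that can be counted automatically. The Python sketch below counts connectives in a text. It is a toy illustration: the four-word connective list is drawn from words mentioned in this book as examples and is far smaller than any real inventory, and the function name is hypothetical rather than part of Coh-Metrix.

```python
import re

# Tiny illustrative list (assumption); a real inventory of connectives would be much larger.
CONNECTIVES = {"and", "but", "because", "therefore"}


def connective_count(text: str) -> int:
    """Count word tokens in the text that appear in the connective list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in CONNECTIVES)


low_cohesion = "Smoking was forbidden. The store had inflammables."
high_cohesion = "Smoking was forbidden because the store had inflammables."

low_count = connective_count(low_cohesion)    # no listed connective occurs
high_count = connective_count(high_cohesion)  # "because" is counted once
```

A count of this kind says nothing about whether a connective is used appropriately; it only registers that an explicit cue is present where the other version leaves the causal link to inference.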
When discourse lacks cohesion, the reader must make inferences to con-
nect the dots. These inferences can be generated by accessing prior text,
everyday world knowledge, or subject matter knowledge associated with a
particular area of specialization (called domain knowledge). These inferences
can be relatively automatic and unnoticeable to the reader, or they may be
conscious and strategic; the inferences may be successful or unsuccessful and
correct or incorrect. The degree to which these inferences occur and are
successful is an important factor influencing the coherence of the reader’s
mental representation of a text. Inferencing can be a good thing, especially for
skilled or high-knowledge readers. However, if the writer’s assumption is that
the reader does not understand the principal content area of the text, then
abandoning the reader to a sea of cohesion gaps is unlikely to result in optimal
levels of understanding. Thus, adding cohesion to the text where needed
presumably facilitates reading comprehension.

The Importance of Cohesion


There are many forms of cohesion and numerous studies showing its impor-
tance (e.g., Gernsbacher & Givón, 1995; Halliday & Hasan, 1976; Lorch &
O’Brien, 1995; Sanders, 1997; Sanders, Spooren, & Noordman, 1992).
Referential cohesion is the overlap in words, or semantic references, between
units in the text such as clauses, sentences, and paragraphs. Coh-Metrix
focuses on overlap between sentences and paragraphs. Consider two famous
examples from Haviland and Clark (1974).

George got some beer out of the car. The beer was warm. (2.8)

George got some picnic supplies out of the car. The beer was warm. (2.9)

The sentence “The beer was warm” is read more quickly in the context of
“George got some beer out of the car” in Example 2.8, where there is overlap in
the referent, “beer,” than in Example 2.9, where there is no common
referent between the two sentences. When text is read more quickly, it is
assumed that the text is easier to process for the reader.
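
The contrast between Examples 2.8 and 2.9 can be illustrated with a minimal sketch of word overlap between two sentences. This is only a rough approximation under stated assumptions: the stop list, the function names, and the simple word matching are hypothetical choices made for this illustration, and the sketch is not the algorithm that Coh-Metrix uses.

```python
import re

# Hypothetical, deliberately tiny stop list chosen for these two examples.
# A fuller analysis would identify nouns and pronouns with a part-of-speech tagger.
STOP_WORDS = {"the", "a", "an", "some", "of", "out", "was", "got"}


def content_words(sentence):
    """Lowercase the sentence and return the set of words not in STOP_WORDS."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {word for word in words if word not in STOP_WORDS}


def has_overlap(first, second):
    """Return True when the two sentences share at least one content word."""
    return bool(content_words(first) & content_words(second))


example_2_8 = ("George got some beer out of the car.", "The beer was warm.")
example_2_9 = ("George got some picnic supplies out of the car.", "The beer was warm.")

print("Example 2.8 overlap:", has_overlap(*example_2_8))  # the shared word is "beer"
print("Example 2.9 overlap:", has_overlap(*example_2_9))  # no shared content word
```

Exact word matching of this kind misses semantically related words (e.g., picnic supplies and beer), which is one reason overlap is only one of several ways to index cohesion.
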
Indeed, there are numerous studies that have demonstrated that referential
overlap impacts reading times and recall of words and sentences (Haviland &
Clark, 1974; Kintsch & Keenan, 1973; Kintsch, Kozminsky, Streby, McKoon, &
Keenan, 1975). Some portion of the effect of referential cohesion may be
attributable to priming (Dell, McKoon, & Ratcliff, 1983). Lexical priming is
the term used to indicate that a concept may be unconscious in working
memory but is activated to a certain extent, which facilitates processing of it.
Priming can emerge from direct overlap in words or from semantically
related words, and is related to the notion of connections between ideas and
activation between those connections.
Although lexical priming may facilitate the reading of other related words,
there is no guarantee the primed concepts make it into a reader’s mental
representation of a text. This point is emphasized in the Construction-
Integration model of text comprehension (Kintsch, 1988, 1998). Specifically,
many words or concepts that are encoded can be lost after the network is
integrated because they have too few connections to other concepts in the
network (McNamara & Kintsch, 1996; McNamara & McDaniel, 2004). When
there are more connections between ideas in the reader’s mental representa-
tion, the ideas are more likely to be remembered. This quality of the mental
representation is often referred to as coherence.

Cohesion and Text Comprehension


Some research studies have gone beyond the sentence reading time approach
by presenting participants with more naturalistic text typical of chapters in a
textbook. This type of empirical study is important in this book and is more
appropriate for analysis in Coh-Metrix. The highly controlled sentence pairs
or “textlets” such as those given previously in Examples 2.8 and 2.9 are fine for
empirical research studies that require tighter control over variables, but they
do not necessarily scale up to naturalistic texts. In either case, there are many
studies showing that increasing text cohesion improves readers’ understand-
ing and memory for text (see McNamara, Louwerse, McCarthy, & Graesser,
2010 for a review).
One of the first studies on this topic was conducted by Beck, McKeown,
Omanson, and Pople (1984), who examined the benefits of increasing the ease
of processing of text for children. They revised two narrative passages from a
second-grade reading program. Their revisions aimed to alleviate three
problems in the text: (1) surface problems, including syntactic complexity,
unclear relations between reference and referent in the text, the inappropriate
use of connectives (e.g., because), and awkward descriptions of events and
states; (2) knowledge problems, involving readers’ lack of familiarity with the
meaning and significance of events and the relations between the events; and
(3) content problems, attributed to ambiguous, irrelevant, or confusing con-
tent. The authors identified 116 such problems in the text and repaired the
problems in the revision process. Third grade children read either the revised
or original versions of the passages, recalled the passages, and answered
multiple-choice questions. Beck and colleagues found overall benefits of the
text revisions on the children’s ability to recall the passages as well as their
ability to answer the multiple-choice questions. They also found that skilled
readers showed greater benefits from the added cohesion than did less skilled
readers in terms of their ability to recall the passages. Hence, all of the readers
tended to benefit from the manipulations of the text that were expected to
facilitate processing, but skilled readers tended to benefit more when their
recall was tested. This latter result may have been because skilled readers are
better able to verbalize their understanding and the recall test depended on
that ability. It also may have been because the skilled readers were better able
to capitalize on the text manipulations and the recall test was more sensitive
to those differences.
Beck, McKeown, Sinatra, and Loxterman (1991) extended these findings to
children’s comprehension of social studies texts. They asked children in
grades 4 and 5 to read either the revised or original versions of four passages
from a fifth grade social studies text book about the American Revolution.
The revisions were designed to minimize the need for children to rely on
background knowledge to understand the text by reducing the gaps in the text
requiring knowledge-based inferences. To this end, the researchers made
explicit the causal connections between the ideas, concepts, and events and
added clarifications, elaborations, and explanations to important information
in the texts. In essence, they increased the cohesion in the text in various ways.
After reading the passages, the children were asked to recall the passage and
answer open-ended comprehension questions. The results indicated that the
revisions improved the students’ comprehension in terms of both their recall
and their performance on open-ended questions. This study extended
Beck and colleagues’ previous findings from grade 3 to grades 4 and 5. Together,
the two studies demonstrated the results across a range of dependent variables,
including recall, multiple-choice questions, and open-ended comprehension questions.
Importantly, the studies conducted by Beck et al. (1984, 1991) did not
carefully control the types of manipulations made to the texts. The authors
increased the ease of the text across many theoretical dimensions, including
adding elaborations to unfamiliar concepts and improving the general quality
of the text. As such, we cannot say that the studies’ positive learning outcomes
can be attributed to cohesion alone.
Britton and Gulgoz (1991) approached the issue of text manipulation more
systematically by implementing a model of text processing (Kintsch & van
Dijk, 1978; Miller & Kintsch, 1980; van Dijk & Kintsch, 1983). Their method-
ology of revision differed from that of Beck et al. (1984, 1991), because Britton
and Gulgoz very carefully manipulated some features of the text while others
remained constant. Britton and Gulgoz manipulated an Original passage
about the war in Vietnam, Air War in the North, from three different
theoretical perspectives. They created Heuristic, Readability Formula, and
Principled versions of the passage. In the Heuristic revision, the authors used
their own intuitive notions of better writing practice to improve the passage.
Some information was reordered or clarified, unimportant ideas were
omitted, and important ideas were elaborated. In the Readability Formula
revision, modifications were made to shorten the sentences and use more
familiar words such that the readability (i.e., according to five indices,
including Flesch-Kincaid) was equal to that of the Heuristic revision
(i.e., approximately grades 11–12), and two grades lower than the Original and
Principled revision (i.e., approximately grades 13–14).
Most relevant here is the Principled version. In the Principled revision,
Britton and Gulgoz (1991) focused primarily on increasing cohesive cues from
the perspective of Kintsch and van Dijk’s theory of text processing (e.g.,
Kintsch & van Dijk, 1978; Miller & Kintsch, 1980; van Dijk & Kintsch, 1983).
They first identified potential coherence breaks based on van Dijk and
Kintsch’s model of comprehension. A coherence break was a location in the
text in which there was no explicit cue on how the new information was
linked to prior text. In Coh-Metrix, these breaks would be identified in terms
of low referential cohesion and the lack of explicit connectives. Britton and
Gulgoz found 40 coherence breaks in the text and applied three principles to
repair these breaks. Principle 1 was to add referential (i.e., argument) overlap
such that a sentence repeated an idea stated in the previous sentence.
Principle 2 was to rearrange part of each sentence so that readers first received
old information (i.e., an idea presented previously in the text) and then the
new information. Principle 3 was to make explicit any implicit references that
did not have a clear referent.
Consider these two examples, each consisting of two sentences, from the Original
and Principled versions of the text in Britton and Gulgoz (1991):
Most members of the Johnson administration believed bombing attacks would
accomplish several things. They would demonstrate clearly and forcefully the
United States’ resolve to halt communist aggression and to support a free
Vietnam. (2.10)

Most of both civilian and military members of the Johnson administration
believed bombing attacks would accomplish several things. The bombing
attacks would demonstrate clearly and forcefully the United States’ resolve to
halt communist North Vietnam’s aggression and to support a free South
Vietnam. (2.11)

Both of these texts require substantial prior domain knowledge to understand
them. However, Example 2.10, the original low-cohesion version, requires the
reader to make more inferences and rely more on prior knowledge. The
Principled, high-cohesion version in Example 2.11 increases referential over-
lap by specifying the bombing attacks as the referent for They (in They would
demonstrate clearly and forcefully the United States’ resolve to halt communist
aggression and to support a free Vietnam.). In addition, the high-cohesion
version informs readers that (a) the members of the administration include
both civilians and military officials, (b) the communists were in North
Vietnam, and (c) it was South Vietnam that sought freedom. These combined
changes increased referential overlap with the paragraph that preceded it, and
also provided the reader with potentially missing background knowledge.
We can also consider the differences between the Principled revision and
Original version in terms of Coh-Metrix values. For example, as described in
Chapter 4, Coh-Metrix provides an argument overlap score (CRFAO1),
which indicates the average overlap between arguments (i.e., nouns, pro-
nouns) in a text. The argument overlap score is .68 for the Principled revision
and .38 for the Original version. We can also calculate overall cohesion scores
using Coh-Metrix Text Easability Scores as described in Chapter 5.
Accordingly, the Referential Cohesion Easability Z-score (ZREF) is 1.79 for
the Principled revision and –0.96 for the Original version. These values
provide some confirmation that the Principled revision was indeed higher
in cohesion than was the original version (see McNamara et al., 2010).
To assess the effects of their text revisions, Britton and Gulgoz (1991) asked
college students to read either the original or a revised version of the text. The
students’ comprehension was measured with free recall, multiple-choice
questions, and a keyword association task. The authors found a significant
disadvantage for the version that was modified based on notions of
Readability. Those who read the Readability Formula version showed lower
performance on both the recall and the multiple-choice comprehension
assessments. By contrast, both the Principled and the Heuristic revisions
improved comprehension in comparison to the Original version. Further,
the students’ efficiency measure for recall (the number of propositions
recalled per minute of reading time) indicated that the revision made the
comprehension process more efficient. Although the Principled and Heuristic
revisions led to similar improvements, one advantage of the Principled
revision was that the modifications were guided by well-specified rules,
whereas the Heuristic revision was based solely on intuitions of improving
writing by an expert in discourse processing.
In sum, Britton and Gulgoz (1991) found that the Principled revision
improved comprehension according to their three dependent measures (i.e.,
free recall, multiple-choice questions, and a keyword association task) and
made the comprehension process more efficient, as indicated by the number of
propositions recalled per minute of reading time. There have been numerous studies on
the effects of cohesion using longer texts such as the one investigated by
Britton and Gulgoz (1991). A review of 19 studies and an analysis of the texts
using Coh-Metrix are available in McNamara, Louwerse, McCarthy, and
Graesser (2010). The experimental studies of text cohesion have implemented
a variety of techniques to enhance the coherence of text, including increasing
referential cohesion, clarifying terms, and adding connectives. Nonetheless,
the Coh-Metrix analysis presented by McNamara and colleagues indicated
that across studies noun overlap accounted for the greatest amount of var-
iance in the differences between the high- and low-cohesion versions.

Factors That Interact with Text Cohesion


As discussed earlier, one important impetus for developing Coh-Metrix came
from studies showing the benefits of text cohesion. But a second important
impetus for developing Coh-Metrix came from studies showing that the
effects of cohesion depended on factors such as the reader’s domain knowl-
edge and the comprehension task. Indeed, a good deal of research has shown
that the benefits (and even disadvantages; see section below) of increased
cohesion depend on the abilities of the reader. In the remaining sections of
this chapter, we describe how prior knowledge, reading skill, and the age of
the reader are key factors to consider when predicting the effects of cohesion
for particular readers.

The Reverse Cohesion Effect


Several studies have shown that the effects of cohesion depend greatly on the
prior knowledge of the reader. Low-knowledge readers gain greatly from
added cohesion whereas more knowledgeable readers (but not necessarily
experts) can gain from lower cohesion. This phenomenon has been referred
to as the reverse cohesion effect (O’Reilly & McNamara, 2007).
One such study was conducted by McNamara and Kintsch (1996). The
authors were following up on the findings reported by the previously dis-
cussed study of Britton and Gulgoz (1991). Specifically, McNamara and
Kintsch examined the effects of readers’ prior domain knowledge. In their
study, college students read either the Original or Principled version of the
Air War in the North passage from the Britton and Gulgoz study. In their first
experiment, McNamara and Kintsch assessed deep level comprehension with
a sorting task including 22 keywords from the text. The results indicated that
low-knowledge readers benefited from the high-cohesion text. However, they
also found that the high-knowledge readers who read the Original, low-
cohesion version developed a deeper understanding of the relationships
between the concepts in the texts as assessed by the sorting task.
McNamara and Kintsch reported similar results in their second experiment
in which they used open-ended comprehension questions to assess compre-
hension rather than multiple choice questions.
McNamara, Kintsch, Songer, and Kintsch (1996) reported similar results
for young adolescent students in the 7th to 10th grades. The children read one
of four versions of an encyclopedia article about heart disease. The four
versions were either high or low in local cohesion and either high or low in
macro-level (global) cohesion. The local cohesion modifications to the text
included replacing pronouns with noun phrases, defining unfamiliar con-
cepts, adding argument overlap between sentences, and adding connectives to
clarify relationships between ideas. Global cohesion was increased by adding
topic headers to sections and adding topic sentences to the beginnings of each
paragraph.
The excerpt below provides the first few paragraphs of the high-cohesion
Heart Disease text used in this study.

Heart Disease
The heart is the hardest-working organ in the body. We rely on it to supply blood
regularly to the body every moment of every day. Any disorder that stops the
heart from supplying blood to the body is a threat to life. Heart disease is such a
disorder. It is very common. More people are killed every year in the U.S. by heart
disease than by any other disease.
There are many kinds of heart disease, some of which are present at birth and
some of which are acquired later.

1. Congenital heart disease


A congenital heart disease is a defect that a baby is born with. Most babies are
born with perfect hearts. But one in every 200 babies is born with a bad heart. For
example, hearts have flaps, called valves, that control the blood flow between its
chambers. Sometimes a valve develops the wrong shape. It may be too tight, or fail
to close properly, resulting in congenital heart disease. Sometimes a gap is left in
the wall, or septum, between the two sides of the heart. This congenital heart
disease is often called a “septal defect”. When a baby’s heart is badly shaped, it
cannot work efficiently. (2.12)

In Example 2.12, local referential cohesion was modified in the first paragraph.
For instance, the third sentence was modified from the original version from
“Any disorder that stops the blood supply is a threat to life” to specify
explicitly that the blood supply is being supplied to the body, and conse-
quently increase the overlap between the sentences in the paragraph. The
second paragraph, “There are many kinds of heart disease . . .,” provides a
topic sentence that introduces the upcoming sections, “congenital heart
disease” and “acquired heart disease,” which were two of the three added
headers. The additions of “but,” “for example,” and “resulting in” are examples
of added connectives to specify the relationships between ideas in the text.

Figure 2.2. Argument overlap and Flesch-Kincaid grade level as a function of
cohesion. [Plot not reproduced: the left axis shows Flesch-Kincaid Grade Level
(5.8 to 7.4, labeled easy to hard), the right axis shows argument overlap
(0 to 0.7), and the horizontal axis shows the four cohesion conditions: high
local/high global, high local/low global, low local/high global, and low
local/low global.] McNamara et al. (1996) presented participants with four
versions of a text on heart disease, varying local cohesion and global cohesion.
Although the argument overlap decreased across the four text versions as
intended, readability measures such as the Flesch-Kincaid indicate that the text
is easier when the cohesion is lower. That is, the grade level goes down when
both local and global cohesion are lower.


The insertion of “hearts have flaps, called valves, that control the blood flow
between its chambers” is an example where an unfamiliar term was defined
for the reader.
These revisions resulted in four versions that manipulated both local and
global cohesion in a factorial design. The primary contrast was between the
two texts that were maximally high or low in cohesion. Interestingly, the
cohesion of the text was negatively related to Flesch-Kincaid readability. As
shown in Figure 2.2, the Coh-Metrix measure of referential cohesion (i.e.,
argument overlap) decreased as cohesion decreased across the four versions
of the text. By contrast, readability estimates such as the Flesch-Kincaid
Grade Level made the opposite estimates of text ease. As cohesion decreased,
the text was estimated to be easier by Flesch-Kincaid Grade Level estimates.
Readability measures often predict a decrease in ease when cohesion is
increased because adding cohesion often results in increasing the length of
the sentences and adding more unfamiliar or longer words.
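
To see why a readability formula can penalize added cohesion, it may help to write out the Flesch-Kincaid Grade Level formula. The sketch below applies its standard published form to Examples 2.6 and 2.7 from earlier in this chapter. The function name and the hand-counted word, sentence, and syllable totals are assumptions of this illustration, not Coh-Metrix output, and the formula was not designed for two-sentence textlets, so only the direction of the difference is meaningful.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts.

    Grade = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Example 2.6: "Smoking was forbidden. The store had inflammables."
# Hand counts (an assumption of this sketch): 7 words, 2 sentences, 13 syllables.
fk_without_connective = flesch_kincaid_grade(words=7, sentences=2, syllables=13)

# Example 2.7: "Smoking was forbidden because the store had inflammables."
# Hand counts (an assumption of this sketch): 8 words, 1 sentence, 15 syllables.
fk_with_connective = flesch_kincaid_grade(words=8, sentences=1, syllables=15)

print(f"Without connective: grade {fk_without_connective:.1f}")
print(f"With connective:    grade {fk_with_connective:.1f}")
```

Joining the two sentences with “because” raises the average sentence length, so the estimated grade level goes up even though the connective makes the causal relation explicit.
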
McNamara et al. (1996) found that the benefits of cohesion were greater for
those readers who knew less about the heart before reading the text. They
found that low-knowledge readers benefited from the added cohesion accord-
ing to all of the comprehension and text recall measures. The size of the
difference in comprehension scores can be measured using Cohen’s d (see
Chapter 11 for a discussion of effect sizes and their interpretation).
Accordingly, the average difference between reading a low-cohesion version
and a high-cohesion version for a low-knowledge reader was almost a full
standard deviation. For example, comparing the lowest-cohesion version to
the highest-cohesion text, the Cohen’s d effect sizes ranged from 0.37 on the
open-ended bridging inferences questions to 1.33 on the sorting task measure
(see McNamara et al., 2010). The overall effect size on all of the open-ended
comprehension questions was 0.93. This outcome means that when low-
knowledge children read the higher-cohesion text, they gain by nearly one standard
deviation, a large effect size.
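
For readers unfamiliar with the statistic, the sketch below shows one common way to compute Cohen’s d: the difference between two group means divided by their pooled standard deviation. Whether the studies discussed here used exactly this pooled form is an assumption, and the scores are invented solely for illustration; they are not data from any study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Difference between group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_variance = (
        (n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Invented comprehension scores for illustration only (hypothetical data).
high_cohesion_scores = [6, 7, 8, 9, 10]
low_cohesion_scores = [5, 6, 7, 8, 9]

effect_size = cohens_d(high_cohesion_scores, low_cohesion_scores)
print(f"Cohen's d = {effect_size:.2f}")
```

A value of d near 1.0 means that the two group means differ by about one standard deviation.
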
The particular benefits of increasing cohesion for low-knowledge readers
have been replicated in numerous studies (e.g., McNamara, 2001; O’Reilly &
McNamara, 2007). These studies also show that just about any source of
cohesion can help these readers. For example, in McNamara et al. (1996), the
low-knowledge readers benefited significantly from any one of the three texts
with added cohesion in comparison to the low-cohesion version of the heart
disease text. The low-knowledge readers who are confronted with texts that
contain cohesion gaps do not have sufficient knowledge to bridge those gaps.
As illustrated in Figure 2.3, when reading a current sentence that does not
have strong overlap or explicit connections to the previous sentence or nearby
sentences, the reader must make an inference in order to understand the text
successfully. Readers can bridge the cohesion gaps by making an effortful
connection to prior text or by retrieving whatever knowledge might be
relevant. Consider Example 2.13 from the low-cohesion Heart Disease
version:

In about one in every 200 cases something goes wrong. Sometimes a valve
develops the wrong shape. It may be too tight, or fail to close properly. (2.13)

Figure 2.3. Model of reader inference using prior text and prior knowledge.
Readers make inferences when reading using prior text and prior knowledge.
[Diagram not reproduced: boxes labeled Previous Sentence, Prior Text, Current
Sentence, and Prior Knowledge.]

The reader needs to infer that “something goes wrong” refers to
“the baby” and to “the heart,” and thus that the baby will be born with a
bad heart rather than a perfect heart. The reader further needs to have some
knowledge of what a valve is within the heart, and that it is not a plastic device.
Hence, the reader must make inferences accessing prior text as well as prior
knowledge. Neither of those inferences is likely to occur in the absence of
some other source of scaffolding (e.g., McNamara, 2004; McNamara &
Dempsey, 2011; see also Chapter 5). Hence, low-knowledge readers who are
faced with texts that contain many such gaps between ideas and sentences
understand very little of the text.
The story is quite different when students have sufficient knowledge to
generate the inferences called for by the low-cohesion text. Across a number
of studies, readers with more background knowledge have been found either
not to benefit from added cohesion or to benefit from the lack of cohesion in the text.
McNamara et al. (1996) found that the children with more knowledge about
the heart benefited from the low-cohesion version of the text according to
comprehension measures that tapped into deeper levels of comprehension.
According to the bridging-inference questions, problem-solving questions,
and the sorting task, the children with more knowledge showed better
comprehension if they had read the low-cohesion rather than the high-
cohesion versions of the text. According to their recall of the text and the
performance on text-based (shallow, detail) questions, they showed a slight
advantage from the highest-cohesion text, but on the questions and tasks that
relied on deeper levels of understanding they showed large advantages of
having read the low-cohesion text. The Cohen’s d effect sizes for these low-
cohesion advantages ranged from 0.40 to 1.00 (as reported in McNamara
et al., 2010).
Several subsequent studies by McNamara and colleagues sought to isolate
the locus of this reverse cohesion effect. McNamara (2001) conducted an
experiment to examine the inference generation explanation of the reverse
cohesion effect. The inference generation explanation is based on the Kintsch
(1998) Construction-Integration (CI) theory of text comprehension.
Accordingly, when readers generate inferences that link the text with prior
knowledge, the reader’s situation model level of understanding is enhanced.
The CI model distinguishes between the textbase level of comprehension and
the situation model level of comprehension. Important to the concept of text/
reader interactions is the reader’s level of comprehension. The principal levels
are the surface structure, the propositional textbase, and the situation model
(Kintsch, 1998). These levels of comprehension are also discussed in
Chapters 1 and 3. The surface structure refers to the reader’s memory for
the words and syntax of a text. For example, comprehension and memory
for the surface structure for the sentence “The streets were wet because it
was raining” includes only the words and syntax explicitly communicated.
In contrast, a textbase level representation of the sentence may be “The
roads were wet from rain.” The textbase level representation is memory
for the meaning behind the words and syntax, or the meaning at the
propositional level. One version of a propositional representation of
“The streets were wet because it was raining” is [Prop 1: wet(streets); Prop
2: cause(rain)]. The situation model level understanding is generally char-
acterized as resulting from knowledge-based inferences that go beyond the
text. In the case of the previous example, a reader might imagine that the
streets were slick and the sky was grey. The reader brings to the situation
knowledge about rain and streets and the various events that might occur
on wet streets, such as driving, running, or ducking under an awning.
When readers make more inferences that link to prior knowledge, then
the CI model predicts that the reader will construct a deeper, more stable
understanding of the text.
According to the CI model, the high-knowledge readers in McNamara
et al. (1996) were able to gain from low-cohesion text because it forced them to
generate inferences, and that inferencing resulted in a better, or deeper,
understanding of the text. McNamara (2001) tested that notion by having
participants read both the high-cohesion and low-cohesion versions of a text
about cell mitosis, or one of the text versions twice. The participants were in
one of four conditions. They either read the same version of the cell mitosis
text twice (high-high; low-low) or they read both versions, one after the other
(high-low; low-high). Notably, the readers read the same texts in the low-high
and the high-low conditions. That is, they read both the low-cohesion version
and the high-cohesion version of the texts but simply in different orders of
presentation. The reverse cohesion effect was predicted to emerge only when
high-knowledge readers read the low-cohesion version of a text during the
first exposure to the text. If a reader were exposed to a high-cohesion version
of a text followed by the low-cohesion version, the reverse cohesion effect
would not occur. During the first reading, the high-cohesion version would
not induce inferences. Then, when reading the low-cohesion version, a text
representation would be readily available in memory, and the reader would be
less likely to generate the gap-filling inferences. In sum, if the reverse cohesion
effect emerges from inducing the reader to generate inferences to fill in the
conceptual gaps in the low-cohesion text, then a reverse cohesion effect would
be observed for both the low-low and low-high conditions but not for the
high-high or high-low conditions.
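
The prediction just described can be laid out as a small sketch that enumerates the four reading-order conditions. The function and variable names are hypothetical, and the code encodes only the prediction of the inference generation explanation as summarized above, not any observed result.

```python
from itertools import product

COHESION_LEVELS = ("high", "low")


def predicts_reverse_cohesion_effect(first_reading, second_reading):
    """Prediction of the inference generation explanation for one condition.

    Only the first exposure matters: a low-cohesion first reading is what
    induces gap-filling inferences. The second reading is accepted as an
    argument only to make explicit that it does not change the prediction.
    """
    return first_reading == "low"


# Map each condition label (first reading, then second reading) to its prediction.
predictions = {
    f"{first}-{second}": predicts_reverse_cohesion_effect(first, second)
    for first, second in product(COHESION_LEVELS, repeat=2)
}

for condition, predicted in predictions.items():
    print(f"{condition}: reverse cohesion effect predicted = {predicted}")
```

The prediction applies to high-knowledge readers; low-knowledge readers are expected to benefit from high cohesion.
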
McNamara (2001) confirmed the prediction that high-knowledge readers
would benefit most from reading the low-cohesion text first, and also repli-
cated previous findings that the low-knowledge readers benefited from high
cohesion. Low-knowledge readers (who know less about cells) gained from
reading the high-cohesion text either twice, first, or last. But the high-
knowledge readers gained most from reading the low-cohesion text followed
by the high-cohesion text. Essentially, the low-cohesion text induced the
high-knowledge readers to generate inferences (and they had the knowledge
to do so). Subsequently, reading the high-cohesion version served to confirm
inferences generated while reading the low-cohesion version and also poten-
tially corrected erroneous inferences. Thus, this study provided some evi-
dence in favor of the explanation of the reverse cohesion effect that had been
offered by McNamara and colleagues (McNamara et al., 1996; McNamara &
Kintsch, 1996). Specifically, it suggested that the low-knowledge readers
gained from high-cohesion text because they could not generate the necessary
inferences to fill in the gaps, and that the high-knowledge readers gained from
the low-cohesion text because being induced to generate successful inferences
resulted in a better understanding of the text.
One important caveat emerged from this study and several subsequent studies:
namely, McNamara (2001) did not find the locus of the benefits at deep
levels of processing, or the situation model level, but rather at the textbase
level. The readers’ performance at the situation model level was assessed by
open-ended questions that required bridging inferences. These types of
questions require the reader to understand the relationship between at least
two separate ideas in the text. By contrast, text-based questions only tap into
the understanding of one single sentence in the text. McNamara (2001) found
that the high-knowledge readers gained from the low-cohesion text at the
textbase level but not the situation model level. This same trend has been
found in three other studies (O’Reilly & McNamara, 2007; Ozuru, Briner,
Best, & McNamara, 2010; Ozuru, Dempsey, & McNamara, 2009). These
results qualify the explanation offered by the CI Model of comprehension.
According to the CI model, the benefits of inference generation will emerge
primarily at the situation model level of understanding. However, the results
of the subsequent studies have indicated that inference generation can benefit
a textbase level of understanding, particularly when it is the basic under-
standing of the text that is suffering in the absence of the scaffolding offered by
the added cohesion. Benefits of scaffolding the comprehension process will
occur at the most shallow level of understanding that is incoherent in the
mind of the reader. If the reader does not understand the text at the textbase
level without scaffolding, then it is at that level, and on the types of questions
that tap into that level, that the benefits of cohesion will emerge. If the textbase
level of understanding is relatively coherent without cohesion (as it was in
McNamara et al., 1996), then the benefits of inference generation are more
likely to emerge at deeper levels of understanding. These differences may well
depend on the overall difficulty of the text, as we discuss in Chapter 5.

Reading Skill Overcomes Cohesion


The effects of cohesion also depend on reading ability. O’Reilly and
McNamara (2007) further examined the inference generation explanation of
the reverse cohesion effect by investigating the effects of reading skill. The
foundation of their study rested on studies indicating that more skilled read-
ers are more likely to generate inferences while reading. Indeed, skilled and
less-skilled readers differ primarily in terms of inference processes. These
inference processes include solving anaphoric reference, selecting the mean-
ing of homographs, processing garden-path sentences, and making appro-
priate inferences while reading (Long, Oppy, & Seely, 1994; Singer & Ritchot,
1996; Whitney, Ritchie, & Clark, 1991; Yuill & Oakhill, 1988). Skilled readers
are also more likely to generate inferences that repair conceptual gaps
between clauses, sentences, and paragraphs (Magliano, Millis, The RSAT
Development Team, Levinstein, & Boonthum, 2011; Magliano,
Wiemer-Hastings, Millis, Muñoz, & McNamara, 2002; Oakhill, 1984; Oakhill & Yuill,
1996). In contrast, less-skilled readers tend to ignore gaps and fail to make the
inferences necessary to fill in the gaps (Garnham, Oakhill, & Johnson-Laird,
1982; Oakhill, Yuill, & Donaldson, 1990). In sum, there is a good deal of
literature to support the notion that more-skilled readers generate more
inferences while reading than do less-skilled readers.
If that is the case, then skilled readers should not need the low-cohesion
text to induce them to generate inferences. The inference generation hypoth-
esis rests on the assumption that the high-knowledge readers need to be
induced to generate inferences, and that the high-cohesion text reduces the
need for inferences, and thus high-knowledge readers make fewer active
inferences. But if the reader is an active reader, then there should be no
need for low cohesion to induce inference generation. And that is what
O’Reilly and McNamara (2007) found: Among the high-knowledge readers,
only those who were less skilled (according to a median split on the
Nelson-Denny Reading Comprehension test) showed a reverse cohesion effect.
The high-knowledge readers who were more-skilled readers gained from
the high-cohesion text. Essentially, the more-skilled readers generated infer-
ences despite the absence of conceptual gaps in the high-cohesion text.
High-knowledge readers who were skilled readers, and thus more naturally
generated inferences, did not need the low-cohesion text to induce them to
generate inferences, and thus there was no reverse cohesion effect.
The findings reported by O’Reilly and McNamara (2007) were replicated
by Ozuru, Dempsey, and McNamara (2009). Ozuru and colleagues used Coh-
Metrix cohesion measures to verify and control the cohesion manipulations
of two science texts, one on the topic of internal distributions of heat in
animals and the other on a plant’s response to an external stimulus. Ozuru
and his colleagues manipulated the cohesion of the texts by (a) replacing
ambiguous pronouns with nouns, (b) adding descriptive elaborations to link
unfamiliar concepts with familiar concepts, (c) adding connectives to specify
the relationships between sentences or ideas, (d) replacing or inserting words
to increase the conceptual overlap between adjacent sentences, (e) adding
topic headers, (f) adding thematic sentences that serve to link each paragraph
to the rest of the text and overall topic, and (g) changing sentence structures
to incorporate the additions and modifications. Coh-Metrix was used to
verify that these modifications resulted in higher-cohesion texts according
to objective measures, including local and global argument overlap and LSA
similarity. The results of the study confirmed that the high-cohesion text
generally improved comprehension at the textbase level. They also replicated
the results reported by O’Reilly and McNamara (2007) by showing that the
reverse cohesion effect (i.e., benefit of low cohesion for high-knowledge
readers) occurred exclusively for the high-knowledge, less-skilled readers.
This is because the less-skilled readers need the low cohesion in the text to
induce inference processes.
Ozuru, Briner, Best, and McNamara (2010) further examined the effects of
deep reading processes in the context of high- and low-cohesion text by
having participants self-explain while reading the text. Self-explaining in
this context involved explaining the meaning of target sentences in the texts
while reading. This process improves comprehension and learning by helping
readers engage in active inference processes. Because there are more gaps in
the low-cohesion text, requiring inference processes to bridge the gaps, Ozuru
and his colleagues hypothesized that the self-explanation process would result
in better comprehension for the low-cohesion than for the high-cohesion
text. That is, self-explanation would be most effective where it was needed: for
the low-cohesion text. In turn, the low-cohesion text would enhance the
benefits of the self-explanation, because the gaps in the texts would elicit
more inference-based explanations.
Ozuru et al. (2010) also used Coh-Metrix to guide the cohesion manipu-
lations of their text, titled “Why Is There Sex,” excerpted from the Leahey and
Harris (1997) textbook on learning and cognition. The high-cohesion version
was revised by adding connectives, replacing pronouns with noun phrases,
and adding nouns to increase argument overlap. These modifications were
confirmed by Coh-Metrix, which showed higher argument overlap and LSA adjacent
similarity for the high-cohesion than for the low-cohesion version. Likewise, the
Coh-Metrix analysis confirmed the higher incidence of causal and logical
connectives in the high-cohesion text.
The results of the Ozuru et al. (2010) study showed that participants who
read the high-cohesion text produced higher-quality self-explanations; how-
ever, these higher-quality explanations did not affect comprehension. By
contrast, and as predicted, comprehension was enhanced by self-explanation
in the low-cohesion condition. The low-cohesion text required additional
inferences that were facilitated by the self-explanation process. Prior knowl-
edge was not a focus in the study by Ozuru and his colleagues, but the benefits
of self-explanation when reading low-cohesion text might be expected to
depend on prior knowledge. Along these lines, McNamara (2004) found
that providing low-knowledge readers with training and practice to use
reading strategies while self-explaining eliminated their deficits relative to
high-knowledge readers. That is, at least for the text-based questions, the low-
knowledge readers who had been provided training performed as well as the
high-knowledge readers did on comprehension questions when reading a
low-cohesion text. By contrast, when low-knowledge readers self-explained
low-cohesion text and had not been provided with training on reading
strategies and self-explanation, they understood the text quite poorly.

Early Reading and Cohesion


Another question that has been explored using Coh-Metrix is the effect of
cohesion on young children’s comprehension. McNamara, Ozuru, and Floyd
(2011) examined fourth grade students’ comprehension as a function of text
cohesion (high, low), text genre (narrative, science), and readers’ abilities
(reading decoding skills and world knowledge). The purpose of this study was
to further explore and better understand what has been called the fourth grade
slump (Meichenbaum & Biemiller, 1998; Sweet & Snow, 2003). Children at
that age are at a critical period in reading development. Importantly, they are
moving from learning to read to reading to learn, and they are often increas-
ingly faced with challenging, expository texts with unfamiliar concepts and
information. During this time period, some of these children display com-
prehension difficulties that had gone undetected previously.
To examine this issue, McNamara et al. (2011) used Coh-Metrix to guide
the cohesion manipulations of two narrative and two science texts.
Specifically, seven features of the text were modified to increase cohesion:
(a) replacing pronouns with noun phrases, (b) adding descriptive elabora-
tions, (c) adding sentence connectives, (d) replacing or inserting words to
increase conceptual overlap, (e) adding topic headers, (f) adding theme
sentences, and (g) moving or rearranging sentences to increase temporal or
referential cohesion. For example, if events were not originally presented in
chronological order, they were reordered chronologically. These
cohesion manipulations were implemented by the experimenters and then
checked using Coh-Metrix, with the aim of altering the texts so that the high-
cohesion versions approximated equivalent levels of cohesion. The following
example, from one of the science texts about plants, illustrates a few ways in
which cohesion was added. In Example 2.14, a sentence explaining that a
mineral is not a plant or an animal was added to the high-cohesion version.
The third sentence included a connective term “instead.”

Low cohesion (2.14)


Plants also need minerals. A mineral is a naturally occurring substance that is
neither plant nor animal.
High cohesion
Plants also need minerals. A mineral is not a plant or an animal. Instead, a mineral
is a substance in the ground that occurs naturally.

The following example from the beginning of one of the narrative texts, called
Orlando, illustrates cohesion manipulations that were also implemented to
create a context and to facilitate interpretations of the situations described in
the text. The order in which information was presented was also changed for
the Orlando text such that the high-cohesion version provided greater tem-
poral cohesion. That is, information was presented in the order in which
events occurred. The low-cohesion version, on the other hand, presented
information in a nontemporal order, and thus the reader had to infer the
actual order of events.

Low cohesion (2.15)


Salvador was upset. He told his Mama he was going out. He didn’t want to be
worried or sad.
High cohesion
Once upon a time there was a boy. His name was Salvador. Salvador adored his
pet pig named Orlando.

Children in grade 4 read four texts, including one high-cohesion and one
low-cohesion text from each genre. Their comprehension of each text was
assessed using three measures: 12 multiple-choice questions, free recall, and
cued recall. The most important prediction made in this study was that at
the age when young children are expected to begin learning from text,
successful comprehension would largely depend on the reader’s knowledge
about the world and about specific domains. The results confirmed that
comprehension was enhanced by increased knowledge: High-knowledge
readers showed better comprehension than did low-knowledge readers,
and narratives were comprehended better than science texts. Interactions
between readers’ knowledge levels and text characteristics indicated that the
children showed larger effects of knowledge for science than for narrative
texts.
McNamara et al. (2011) found that the high-cohesion text improved
comprehension of the narrative texts as measured by the multiple-choice
questions – a measure that tends to tap textbase level understanding.
Importantly, they also found a reverse cohesion effect for the narrative
texts. That is, children with more knowledge better understood the low-
cohesion narrative texts than the high-cohesion narrative texts. Thus, when
the students possessed enough knowledge (i.e., they were high-knowledge
readers and the texts were narratives), they showed the same patterns that
have been observed for adults. The low-cohesion version, which required
more inferences, was understood better than the high-cohesion version was.
Decoding skill benefited comprehension for these young readers, but
effects of text genre and cohesion depended less on decoding skill than on
prior knowledge. Overall, the study indicates that the fourth grade slump is
at least partially attributable to the emergence of complex dependencies
between the nature of the text and the reader’s prior knowledge. The results
also suggested that simply adding cohesion cues, and not explanatory
information, is not likely to be sufficient for young readers as an approach
to improving comprehension of challenging texts. That is, there were some
benefits of the added cohesion, but they were not as substantial as hoped.
Clearly, the young readers needed more cohesion and background
information added to the text in order to improve their comprehension
substantially.

Conclusion
In conclusion, across a number of studies, it has been found that low-
knowledge readers gain from higher-cohesion text, and any source of
cohesion or scaffolding can be helpful. High-knowledge readers can gain
from low-cohesion text if they need to be induced to generate inferences
while reading. All in all, the research points to a need to carefully consider
the cohesion of a text with respect to the readers’ knowledge and reading
skill level, as well as to the amount of scaffolding the readers might receive
while working through the text. These studies also pointed toward the
need for objective measures of cohesion, and hence the development of
Coh-Metrix.

The Science and Technology That Led to Coh-Metrix

This chapter describes the scientific and technological advances that were the
precursors to the development of Coh-Metrix. The Coh-Metrix team has
developed numerous computational algorithms and procedures for measur-
ing ease (versus difficulty) at the various levels of language and discourse. We
are satisfied with our progress and achievements, but we cannot emphasize
too much that Coh-Metrix was hardly built in a vacuum. Coh-Metrix can be
viewed as a sandbox of automated language and discourse facilities that were
developed not only by our research team but also by others in computational
linguistics, corpus linguistics, discourse processes, cognitive science, psychol-
ogy, and other affiliated fields. We were able to build Coh-Metrix because we
had the advantage of standing on the shoulders of giants.
The contributions of our predecessors come in many varieties. Some
noteworthy examples of these contributions are highlighted below.
1. One type of contribution is lexicons or dictionaries of words that list
qualitative features or quantitative values for each word. For example,
WordNet (Fellbaum, 1998; Miller, Beckwith, Fellbaum, Gross, &
Miller, 1990) stores semantic and syntactic features of nouns, verbs,
adjectives, and other content words in the English language. The MRC
Psycholinguistic Database (Coltheart, 1981) has human ratings of
thousands of words on familiarity, imagery, concreteness, and mean-
ingfulness. The CELEX Lexical Database (Baayen, Piepenbrock, &
Gulikers, 1995) has estimates of how frequently English words are
used in a very large corpus of documents.
2. A second type of contribution is from applications. An application is a
fully functioning program that takes text as input and computes some
language or discourse code as output. We use the output when we
create a Coh-Metrix measure. A good example of this is when we used
a syntactic parser (the one Coh-Metrix uses was based on Charniak,
2000). The parser takes each sentence as input and computes a
syntactic tree structure. These tree structures are used when we scale
texts on syntactic ease or difficulty.
3. A third type of contribution stems from statistical algorithms that can
be used to quantitatively measure texts or discourse components. As
one noteworthy example, Latent Semantic Analysis (LSA, Landauer,
McNamara, Dennis, & Kintsch, 2007) is a statistical representation of
word and world knowledge that is based on a large corpus of texts (10
million words or larger). The LSA statistical spaces allow us to com-
pute the conceptual similarity of any two texts (A and B), with values
ranging from approximately 0 to 1. LSA similarity values are used in
computations of text cohesion or coherence. For example, a text is
coherent to the extent that adjacent sentences in the text have com-
paratively high LSA similarity values.
4. A fourth type of contribution consists of theoretical advances in lan-
guage and discourse analysis. Researchers write books and articles that
analyze words, sentences, and discourse at many levels in our multilevel
theoretical framework (Graesser & McNamara, 2011). Their insights
are incorporated in our Coh-Metrix mechanisms and measures.
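The idea in item 3, that a text is coherent to the extent that adjacent sentences have high similarity values, can be sketched in a few lines. The sketch below is only an illustration under strong assumptions: it uses raw word-count vectors as a stand-in for LSA vectors (a real LSA space is derived from a large corpus), and the function names are hypothetical rather than part of Coh-Metrix.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Map a sentence to a word-count vector (a crude stand-in for an LSA vector)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_similarity(sentences):
    """Mean cosine similarity over all pairs of adjacent sentences."""
    vectors = [sentence_vector(s) for s in sentences]
    pairs = list(zip(vectors, vectors[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    text = ["The cell divides.", "The cell grows.", "Rain fell on the hills."]
    print(round(adjacent_similarity(text), 3))
```

With count vectors the values fall between 0 and 1, as in the LSA case, but only exact word repetition is rewarded; an LSA space would also credit sentences that use different but conceptually related words.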
This is a unique point in history because there is widespread access to
computer tools that analyze texts at many levels of language and discourse.
Thousands of texts can be quickly accessed and analyzed on thousands of
measures in a short amount of time. However, the storage and processing
components of modern computer technologies cannot alone account for the
advances in computational discourse science. The theoretical advances in
understanding language, discourse, and communication are also responsible
for our opportunity to build Coh-Metrix.
This chapter describes scientific and technological advances in an order
that is aligned with the six levels of the multilevel theoretical framework (see
Chapter 1). We begin with the words (i.e., the lexicon) and proceed to syntax,
the textbase, the situation model, and finally genre and rhetorical structure.
Specific measures of Coh-Metrix are defined more precisely in Chapters 4 and
5 of this book.

The Lexicon
There is a long history of analyzing words in the language, discourse, and
social sciences. Psychologists are prone to have humans rate or categorize
words on various psychological properties. In this section, we identify the
major lexicons that are incorporated in Coh-Metrix.
MRC Psycholinguistic Database. This database, assembled by Coltheart
(1981), is a collection of human ratings of more than 150,000 words on 26
psychological properties. Some of these properties are absent for particular
words. For example, imagery ratings are available for only 9,240 of the words.
Coh-Metrix includes the following six MRC properties of words, which are
described in greater detail in Chapter 4.
1. Age of acquisition. These are age-of-acquisition norms (Gilhooly &
Logie, 1980), reflecting the fact that some words appear in children’s
language earlier than others.
2. Familiarity. A rating of how familiar a word is to an adult.
3. Concreteness. How concrete or nonabstract a word is, based on human
ratings.
4. Imagability. How easy it is to construct a mental image of the word in
one’s mind, according to human ratings.
5. Colorado Meaningfulness. Meaningfulness is related to the degree to
which the word is associated with other words. The meaningfulness
ratings are from a corpus developed in Colorado by Toglia and Battig
(1978).
6. Paivio Meaningfulness. This meaningfulness rating is based on the
norms by Paivio (Paivio, Yuille, & Madigan, 1968) and Gilhooly and
Logie (1980). This measure has been included in various versions of
Coh-Metrix but is not included in Coh-Metrix 3.0.
The impact of these psychological properties on text difficulty is intuitively
straightforward. Text difficulty is predicted to decrease as a function of the
familiarity, imagability, concreteness, and meaningfulness of the words, and
to decrease when the words have an earlier age of acquisition.
CELEX Word Frequency. Word frequency refers to the relative frequency
of words in public documents per million words. Text difficulty is expected to
increase when there are rare words that most readers never or rarely encoun-
ter. It is therefore necessary to conduct a corpus analysis on a large volume of
texts that are representative of what a person reads and to compute how often
particular words occur. After exploring a large number of corpus analyses
with word frequency counts in the evolution of Coh-Metrix, we settled on the
word frequency counts of CELEX, the database from the Dutch Centre for
Lexical Information (Baayen, Piepenbrock, & Gulikers, 1995) that analyzed
17.9 million words.
It should be noted that these frequency norms will change over time because
the reading materials vary over history and sociocultural contexts. Therefore, it
would be ideal to have an automated facility that periodically samples text
corpora and revises the frequency norms. This approach is being pursued by
many companies in their analyses of Web sites, Wikipedia, and the vast
repository of documents in the cloud. The Word Maturity index of Kireyev
and Landauer (2011) tracks the words exposed to readers of different ages. One
could also imagine word frequency norms that are tailored to particular
populations in a culture – at a grain size akin to that used by the marketers of Amazon.com.
There is ample evidence that text difficulty decreases as a function of the
frequency of the words in the text. This is indeed reflected in readability
formulas that point to the length of words. We know that word frequency
robustly decreases as a function of word length: Frequent words are shorter
according to Zipf’s law (Zipf, 1949). We also know that the time it takes to read a
text decreases substantially as a function of the reading ease metrics, word
frequency, and the shortness of words. Available evidence supports the claim
that reading time decreases as a function of the logarithm of word frequency
(Haberlandt & Graesser, 1985; Just & Carpenter, 1987). Thus, the difference
between words occurring 10 versus 100 times per million has a much more
robust impact on reading times than the difference between words that appear 1,010 versus 1,100
times per million. Word frequency is extremely important because it is aligned
with world knowledge. Readers know much less about rare words, and this
has a tremendous impact on comprehension (McNamara, Kintsch, Songer, &
Kintsch, 1996; Perfetti, 2007; Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg,
2001; Snow, 2002; Stanovich, 1986).
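The logarithmic relation described above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes base-10 logarithms (the text does not specify a base), and the function name is hypothetical.

```python
import math


def log_frequency_gap(freq_a, freq_b):
    """Absolute difference in log10 frequency (occurrences per million words)."""
    return abs(math.log10(freq_b) - math.log10(freq_a))


rare_gap = log_frequency_gap(10, 100)        # 10 versus 100 per million
common_gap = log_frequency_gap(1010, 1100)   # 1,010 versus 1,100 per million
print(f"{rare_gap:.3f} {common_gap:.3f}")    # prints "1.000 0.037"
```

Both pairs differ by 90 occurrences per million, yet on the log scale the first gap is roughly 27 times larger than the second, which is why differences among rare words matter so much more for reading time.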
WordNet. WordNet® is a computational, lexical database annotated by
experts on various linguistic and psychological features, containing more
than 170,000 English nouns, verbs, adjectives, and adverbs. The design of
WordNet is inspired by psycholinguistic theories of human lexical representa-
tions (Fellbaum, 1998; Miller et al., 1990). The words are organized in lexical
networks based on connections between related lexical concepts. English
nouns, verbs, adjectives, and adverbs are organized into semantic organizations
of underlying lexical concepts. Some pairs of words are functionally synon-
ymous (e.g., lady and woman) because they have the same or a very similar
meaning. There are relations other than synonyms. Polysemy refers to the
number of senses of a word. A word with more senses runs the risk of being
ambiguous and of slowing down processing for less-skilled and low-knowledge
readers (Gernsbacher, 1990; Just & Carpenter, 1987; McNamara & McDaniel,
2004). However, there is an advantage of polysemy because more frequent
words tend to be polysemous. The correlation between polysemy and word
frequency is significantly positive. Therefore, there is a trade-off between these
two processing fallouts.
Another type of relation in the WordNet lexicon is the hypernym relation.
Hypernym count is defined as the number of levels in a conceptual taxonomic
hierarchy that are above (superordinate to) a word. For example, chair (as an
object) has seven hypernym levels: seat → furniture → furnishings →
instrumentality → artifact → object → entity. A word having many hypernym
levels tends to be more concrete whereas a word with few hypernym levels
tends to be more abstract. As predicted, there is a positive correlation between
hypernymy and MRC concreteness of content words.
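A hypernym count amounts to walking up a taxonomy and counting the levels passed. The sketch below shows this under stated assumptions: the taxonomy is a small hypothetical one invented for illustration, not WordNet's actual hierarchy, and each word is given a single parent, whereas WordNet senses can have several.

```python
# Hypothetical toy taxonomy: each word maps to its immediate hypernym.
# It is illustrative only and does not reproduce WordNet's hierarchy.
TOY_HYPERNYM = {
    "poodle": "dog",
    "dog": "mammal",
    "mammal": "animal",
    "animal": "organism",
    "organism": "entity",
}


def hypernym_chain(word, parent=TOY_HYPERNYM):
    """Return the list of superordinate concepts above a word, nearest first."""
    chain = []
    seen = {word}
    while word in parent:
        word = parent[word]
        if word in seen:  # guard against a malformed, cyclic taxonomy
            raise ValueError("cycle in taxonomy")
        seen.add(word)
        chain.append(word)
    return chain


def hypernym_count(word, parent=TOY_HYPERNYM):
    """Number of levels above a word in the taxonomy."""
    return len(hypernym_chain(word, parent))


if __name__ == "__main__":
    for w in ("poodle", "animal", "entity"):
        print(w, hypernym_count(w), hypernym_chain(w))
```

In this toy hierarchy the specific word "poodle" sits five levels below the root while the general word "animal" sits only two levels below it, mirroring the observation that words with many hypernym levels tend to be more concrete.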
WordNet classifies content words (i.e., nouns, verbs, adjectives) on a number
of other semantic features. Each sense of a noun is assigned features such as
±HUMAN, ±ANIMATE, and ±CONCRETE. Some of these semantic features
are aligned with syntax in important ways. For example, the sentence “The fig
read the paper” is considered ungrammatical because the verb “read” requires a
subject noun with the feature +HUMAN, but the noun “fig” is marked
–HUMAN. Perhaps this sentence would be acceptable as a metaphor, but without
more context the sentence is unilluminating and uninteresting, and it fails to
satisfy pragmatic principles of figurative language. The features of the nouns
and main verbs in sentences need to be coordinated in semantically well-
formed expressions. WordNet’s features are available to analyze the content
words and evaluate the quality of semantic integrations. More on which
features from WordNet are used in Coh-Metrix is provided in Chapter 4.
Main verbs also have features that are important for Coh-Metrix.
Some important features are CHANGE-OF-STATE, STATIVE, MOTION,
COGNITION, PERCEPTION, EMOTION, COMMUNICATION,
COMPETITION, CONSUMPTION, CREATION, POSSESSION, and
SOCIAL. The main verbs play an important role in classifying sentences on
their epistemological status, such as whether the sentence refers to an event,
an intentional action, or a state, as illustrated below.
Events: The thunder struck the tree. The soldier remembered the code.
Intentional Action: The child rescued her kitten. The mother read the newspaper.
State: The mountain is serene. The budget is gigantic.
Narrative texts tend to have a high frequency of events and actions rather
than states, except for the setting, which has more static information. Events
and actions are also important in causal knowledge, as discussed later in
this book.
WordNet is a powerful facility for analyzing the semantics of words, sen-
tences, and texts. Coh-Metrix currently uses some of the semantic features in its
computations, as described in Chapter 4, which covers the specific measures.
Future projects could make use of the WordNet features to capture more
subtle distinctions and to evaluate the semantic congruence between nouns and
verbs in sentences and clauses. However, some of the distinctions and con-
gruence patterns are subtle or infrequent, so the payoff may be minimal. The
current version of Coh-Metrix incorporates the WordNet features that were
found to be sufficiently robust and useful for our discourse analyses.
Parts of Speech. Coh-Metrix provides the part of speech (POS) for every
word contained in a text. There are more than 50 POS tags derived from the
Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993). The tags include
content words (e.g., nouns, verbs, adjectives, adverbs) and function words
(e.g., prepositions, determiners, pronouns). Coh-Metrix incorporates a natural
language processing tool, the Charniak parser (Charniak, 2000), for assigning
POS tags to each word. When a word can be assigned to more than one POS
category, the most likely category is assigned on the basis of its syntactic
context, using the Charniak parser. Moreover, the parser can use the syntactic
context to assign the most likely POS category to words it does not know.
The most obvious prediction would be that the content words carry the day
in predicting text comprehension. Indeed, content words are rarer and
semantically richer than function words. Nonetheless, there are some good
reasons for arguing that the function words have a significant impact on the
psychology of text comprehension. Pennebaker and his colleagues
(Pennebaker, 2011; Pennebaker et al., 2007) have documented the psycholog-
ical impact of a wide variety of word types, including pronouns, common and
auxiliary verbs, verb tenses, adverbs, conjunctions, negations, quantifiers,
numbers, and swear words. Work by Pennebaker and his colleagues suggests
that, surprisingly, it is the function words rather than the content words that
are diagnostic of many social psychological states. Function words are difficult
for people to deliberately control and perceive, so examining their use in
natural language samples provides a nonreactive way to explore social and
personality processes.
Special-Purpose Word Categories. Some categories of words have a special
significance from the standpoint of the psychological impact on comprehension
processes. Cohesion and coherence are particularly salient in this respect. As
discussed in Chapter 2, connectives are known to contribute to discourse
cohesion by explicitly linking ideas at the clausal and sentential level (Britton &
Gulgoz, 1991; Halliday & Hasan, 1976; Louwerse, 2001; McNamara & Kintsch,
1996; Sanders & Noordman, 2000). These include connectives that correspond to
additive cohesion (e.g., “also,” “moreover,” “however,” “but”), temporal cohesion
(e.g., “after,” “before,” “until”), and causal/intentional cohesion (e.g., “because,”
“so,” “in order to”). Logical operators (e.g., variants of “or,” “and,” “not,” and “if–
then”) are also cohesive links that influence the analytical complexity of a text.
Coh-Metrix has lists of connectives and discourse markers in various categories
that are accessed while interpreting text. The relative frequency of connectives
and discourse markers is expected to correlate positively with discourse cohesion
and text ease. The one caveat in this prediction is that connectives tend to
lengthen sentences so there is a potential burden on cognitive resources and
consequent memory for text (Millis, Graesser, & Haberlandt, 1993).
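
As a rough illustration of how a relative frequency of connectives might be computed, the Python sketch below counts the example connectives listed above and reports them per 1,000 words. It is not the Coh-Metrix implementation: the word lists contain only the examples quoted in the text, the per-1,000-word scaling is an assumption, and the names `CONNECTIVES`, `count_phrase`, and `connective_incidence` are hypothetical.

```python
import re

# Only the examples quoted in the text; a real connective list would be far longer.
CONNECTIVES = {
    "additive": ["also", "moreover", "however", "but"],
    "temporal": ["after", "before", "until"],
    "causal_intentional": ["because", "so", "in order to"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multiword) phrase in a token list."""
    words = phrase.split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def connective_incidence(text, per=1000):
    """Return occurrences of each connective category per `per` words (assumed scaling)."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in CONNECTIVES}
    return {
        category: per * sum(count_phrase(tokens, p) for p in phrases) / len(tokens)
        for category, phrases in CONNECTIVES.items()
    }


sample = "She stayed home because it rained. Moreover, she was tired, but happy."
scores = connective_incidence(sample)
```

Plain string matching of this kind cannot tell a connective from a homograph (for example, "after" used as a preposition), so a careful count would need the syntactic context as well.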
Pronouns also have repercussions on cohesion and coherence. If the reader
cannot bind a pronoun to a referent, the reader runs the risk of not optimally
connecting ideas in the text. Therefore, the relative frequency of pronouns in
a text should be correlated positively with text difficulty to the extent that the
referents of pronouns are difficult to resolve. However, one also needs to be
tentative in making this prediction because there are other factors to consider.
Pronouns are frequent and have few letters, which should make them easy to
process at the lower, basic levels of reading. Pronouns are diagnostic of
narrative texts that are known to be easier to process than informational
texts. The question is whether the scale tips toward pronouns having
ungrounded referents or toward pronouns being prevalent in easy narrative text.
Empirical tests are needed to resolve such trade-offs.
In summary, there is a wealth of computer technologies and psychological
theories that analyze words. The word level of the multilevel theoretical
framework is well fortified in computational power. As we go to the deeper
levels of meaning, the available repertoire of computer technologies becomes
sparse. However, the lexicons of words are quite plentiful.

syntax
In models of text and discourse comprehension, the surface structure is com-
posed of the words and the sentence (e.g., van Dijk & Kintsch, 1983). One
important aspect of the sentences in a text regards syntax. Both theoretical
and computational linguists have devoted considerable effort to analyzing the
syntax of sentences (Charniak, 2000; Chomsky, 1965; Winograd, 1983). The
words in a sentence are decomposed into basic meaning units called morphemes
(e.g., swimming → swim + -ing). The morphemes are grouped into phrases, such
as noun phrase (NP), verb phrase (VP), prepositional phrase (PP), and embedded
sentence constituents. The phrases are organized into a tree structure with
nodes and branches. The root of the tree is at the highest level and is the main
sentence node. The root sentence constituent has descending branches that point
to its component phrases (e.g., NP, VP, PP), which are also nodes at an
intermediate structural level. There may be many structural levels of the inter-
mediate nodes. Eventually the tree structure breaks down the information to the
point of reaching the terminal nodes, which are specific words or morphemes.
Figure 3.1 shows an example syntactic tree structure for the sentence “The dog is
swimming in my pool.” There is the Sentence root node and a set of intermedi-
ate phrase nodes (NP, VP, PP). There is a set of part-of-speech (POS) tags, as we
defined earlier. In this sentence the POS tags are determiner, noun, verb,
auxiliary verb, gerund (via the –ing, which is incorrectly assigned according to
some linguists), preposition, and possessive pronoun. The tense and aspect are
specified also in Figure 3.1: present tense and in-progress aspect.

(S1 (NP (DT The) (NN dog))
    (VP (AUX is)
        (VP (VBG swimming)
            (PP (IN in)
                (NP (PRP$ my) (NN pool)))))
    (. .))

Note: AUX = auxiliary verb, DT = determiner, NN = noun (singular or mass), NP = noun phrase,
PP = prepositional phrase, PRP$ = possessive pronoun, S1 = sentence, S = simple declarative
clause, VBG = verb (gerund or present participle), VP = verb phrase

Figure 3.1. Syntactic structure for “The dog is swimming in my pool”


Computational linguists have developed a large number of syntactic parsers
that automatically assign syntactic tree structures to sentences (Jurafsky &
Martin, 2008). Two popular contemporary parsers that we have implemented
in Coh-Metrix are the Apple Pie parser (Sekine & Grishman, 1995) and the
Charniak (2000) parser. Hempelmann, Rus, Graesser, and McNamara (2006)
evaluated the accuracy and speed of generating the parse trees for a number of
syntactic parsers and concluded that the Charniak parser fared the best.
A few more details about the parser will be noteworthy for the readers who
are linguistics aficionados. First, the parsers used in Coh-Metrix capture the
surface phrase-structure composition rather than deep structures, logical
forms, or propositional representations. Consequently, very different tree
structures are created for active voice (Rita called the dog) versus the passive
voice (The dog was called by Rita); the subject noun-phrase is different for the
active and passive voice rather than there being the same logical agent (Rita)
for the two voices. Second, the sentences can have recursive embedding of
constituents. For example, sentences can be embedded in other sentences and
in noun-phrases, as in the sentence “My daughter knows that the dog that
lives next door is swimming in my pool.” Third, the Charniak (2000) parser
generates a parse tree from an underlying formal grammar, which can be
induced from a corpus of texts through machine learning technologies.
Therefore, the syntax could be tailored to the particular language application
if the researchers so desired.
The syntactic structure of sentences can be scaled on difficulty in a number
of different ways that computational linguists have investigated (Allen, 1995;
Hempelmann et al., 2006; Jurafsky & Martin, 2008). Psycholinguists have also
investigated how reading times and eye movements are systematically influ-
enced by syntactic composition (Just & Carpenter, 1992; Rayner, 1998).
Syntactic difficulty increases with structural ambiguity, with the degree to
which sentences have embedded constituents, and with the load on working
memory. Working memory is taxed when there are noun-phrases with many
modifiers and when many words must be held in working memory before the
reader receives the main verb of the main clause (Graesser, Cai, Louwerse, &
Daniel, 2006).

textbase
The textbase captures the meaning of explicit information in the text, as we
described in Chapters 1 and 2. Van Dijk and Kintsch (1983) distinguished
between the explicit textbase level and a deeper level called the situation model
level that contains more inferences and more global conceptualizations. The
theoretical boundary between the textbase and the situation model is not
always clear-cut, but it does provide a useful guide for separating the semantic
information that is closely tied to the explicit text and the inferences derived
from the text together with world knowledge, genre, and the pragmatic
context.
Propositions. According to van Dijk and Kintsch, the basic units of mean-
ing in the textbase are called propositions. Each proposition contains a
predicate (e.g., main verb, adjective, connective) and one or more arguments
(e.g., nouns, pronouns, embedded propositions) that have a thematic role,
such as agent, patient, object, time, or location. Below are an example
sentence and its propositional meaning representation.
When the committee met on Monday, they discovered the society was bankrupt.
PROP 1: meet (AGENT=committee, TIME=Monday)
PROP 2: discover (PATIENT=committee, PROP 3)
PROP 3: bankrupt (OBJECT=society)
PROP 4: when (EVENT=PROP 1, EVENT=PROP 2)
The arguments are placed within the parentheses and have role labels,
whereas the predicates are outside of the parentheses. The propositional
representation of van Dijk and Kintsch does not incorporate some of the
more precise and subtle indexes of meaning, such as tense, aspect, quantifiers,
and voice. This decision was undoubtedly a simplifying assumption rather
than a core theoretical claim. In principle, an expanded propositional
representation could be adopted that incorporates more precision and details
about meaning.
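
The predicate-argument notation above maps naturally onto a small recursive data structure. The Python sketch below is one possible encoding, offered only as an illustration: the `Proposition` class, its field names, and the `render` method are hypothetical and are not part of van Dijk and Kintsch's theory or of Coh-Metrix.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Proposition:
    """A predicate plus a list of (role, filler) arguments.

    A filler is either a plain string or another Proposition, which allows
    embedded propositions. The role may be None when no label is given.
    """
    label: str
    predicate: str
    arguments: List[Tuple[Optional[str], Union[str, "Proposition"]]]

    def render(self) -> str:
        """Format the proposition in the PROP notation used in the text."""
        parts = []
        for role, filler in self.arguments:
            value = filler.label if isinstance(filler, Proposition) else filler
            parts.append(f"{role}={value}" if role else value)
        return f"{self.label}: {self.predicate} ({', '.join(parts)})"


# The four propositions of the example sentence, entered by hand.
prop1 = Proposition("PROP 1", "meet", [("AGENT", "committee"), ("TIME", "Monday")])
prop3 = Proposition("PROP 3", "bankrupt", [("OBJECT", "society")])
prop2 = Proposition("PROP 2", "discover", [("PATIENT", "committee"), (None, prop3)])
prop4 = Proposition("PROP 4", "when", [("EVENT", prop1), ("EVENT", prop2)])
```

An expanded representation of the kind mentioned above could add further fields (for example, tense or aspect) to the same class.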
Computational linguistics has not been able to develop computer programs
that can automatically translate sentences into a propositional representation
(or a logical form) with a high degree of reliability. Nevertheless, there have
been large-scale attempts to achieve these goals and progress has clearly been
made (Rus, 2004). For example, the assignment of noun-phrases to thematic
roles (e.g., agent, recipient, object, location) is approximately 80% correct in
the available computer systems (DARPA, 1995). One promising project is the
development of a corpus of annotated propositional representations in
PropBank (Palmer, Kingsbury, & Gildea, 2005). This effort will allow
researchers to systematically develop, test, and refine their algorithms for
automatic proposition extraction.
Cohesion. The propositions, clauses, and noun-phrase arguments are con-
nected by principles of cohesion. Referential cohesion occurs when a noun,
pronoun, or noun-phrase that captures an argument refers to another
constituent in the text. For example, if the preceding example sentence (“When
the committee met on Monday, they discovered the society was bankrupt.”)
were followed by “The meeting lasted several hours,” the noun-phrase argument
“the meeting” refers to PROP-1. Cohesion between propositions or clauses is
also established by discourse markers, such as connectives (e.g., “because,” “in
order to,” “so that”), adverbs (“therefore,” “afterwards”), and transitional
phrases (“on the other hand”). As discussed in Chapter 2, textbase difficulty
is expected to increase when there are cohesion gaps in the text.
Coreference Cohesion. Coh-Metrix does not have a proposition analyzer,
but it goes a long distance in textbase analysis by identifying clauses and
computing different types of cohesion relations between sentences. As dis-
cussed in Chapter 2, one ubiquitous type of cohesion relation is coreference
(Halliday & Hasan, 1976; Sanders & Noordman, 2000; van Dijk & Kintsch,
1983). Referential cohesion occurs when a noun, pronoun, or noun-phrase
argument refers to another constituent in the text. There is a referential
cohesion gap when the content words in a sentence do not connect to
words in surrounding text or sentences. Coh-Metrix tracks five major types
of lexical coreference by computing overlap in nouns, pronouns, arguments,
stems (morpheme units), and content words.
Noun overlap. Two sentences share one or more common nouns.

Pronoun overlap. Sentences share at least one pronoun with the same gender and
number.

Argument overlap. Sentences share the same nouns or pronouns (table/table, he/he).

Stem overlap. One sentence has a noun with the same semantic morpheme (called
a lemma) in common with any word in any grammatical category in the other
sentence (e.g. the noun “swimmer” and the verb “swimming”).

Content word overlap. Sentences are more connected to the extent that they have
more content words that overlap.
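
The overlap definitions above can be sketched in a few lines of Python. The example is illustrative only and makes several assumptions: each sentence is supplied already tagged as (word, stem, Penn Treebank tag) triples entered by hand, pronoun overlap is omitted because it needs gender and number information, and the content word score is computed as shared content words divided by all distinct content words in the pair, which is a choice made here and not necessarily the Coh-Metrix formula.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PRONOUN_TAGS = {"PRP", "PRP$"}
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs


def words_with(sentence, keep):
    """Lowercased words whose tag satisfies the predicate `keep`."""
    return {word.lower() for word, _stem, tag in sentence if keep(tag)}


def noun_overlap(a, b):
    """True if the two sentences share at least one noun."""
    is_noun = lambda tag: tag in NOUN_TAGS
    return bool(words_with(a, is_noun) & words_with(b, is_noun))


def argument_overlap(a, b):
    """True if the two sentences share a noun or a pronoun."""
    is_arg = lambda tag: tag in NOUN_TAGS or tag in PRONOUN_TAGS
    return bool(words_with(a, is_arg) & words_with(b, is_arg))


def stem_overlap(a, b):
    """True if a noun in one sentence shares its stem with any word in the other."""
    def one_way(x, y):
        noun_stems = {stem for _w, stem, tag in x if tag in NOUN_TAGS}
        return bool(noun_stems & {stem for _w, stem, _t in y})
    return one_way(a, b) or one_way(b, a)


def content_word_overlap(a, b):
    """Shared content words divided by all distinct content words (assumed formula)."""
    is_content = lambda tag: tag.startswith(CONTENT_PREFIXES)
    ca, cb = words_with(a, is_content), words_with(b, is_content)
    return len(ca & cb) / len(ca | cb) if ca | cb else 0.0


# Hand-tagged toy sentences; the stems are supplied manually for illustration.
s1 = [("The", "the", "DT"), ("swimmer", "swim", "NN"), ("rested", "rest", "VBD")]
s2 = [("She", "she", "PRP"), ("was", "be", "AUX"), ("swimming", "swim", "VBG")]
s3 = [("The", "the", "DT"), ("swimmer", "swim", "NN"), ("smiled", "smile", "VBD")]
```

Here `s1` and `s2` have no noun overlap but do have stem overlap ("swimmer" and "swimming"), whereas `s1` and `s3` share the noun "swimmer".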

Coh-Metrix would be particularly impressive if it could compute argument
overlap by resolving the referents of pronouns. This is called pronoun ana-
phora resolution in the discourse literature. In the sentence “After the com-
mittee discussed the expenditures with the society leaders, they decided to
table further discussion,” a satisfactory understanding would resolve what
“they” refers to. Does “they” refer to the committee, the leaders, the committee
together with the leaders, or the expenditures? Coh-Metrix indeed does have a
pronoun resolution module that (a) makes sure that the pronoun agrees with
the referent in number, person, and gender, (b) considers what ideas are most
prominent in the syntactic parse (Lappin & Leass, 1994), and (c) considers
how often the referent has been mentioned in the previous text. However, the
Coh-Metrix anaphor resolution procedure merely computes whether there is
at least one acceptable referent of the pronoun (Yes or No) rather than filling
in the referent of the anaphor. It should be acknowledged that the perform-
ance of anaphora resolution systems in computational linguistics is modest
(Jurafsky & Martin, 2008).
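
The agreement check in step (a), together with the Yes/No outcome, can be illustrated with a minimal Python sketch. It covers only agreement in number, person, and gender; syntactic prominence (b) and mention frequency (c) are left out. The `Mention` class, the feature codes, and the hand-assigned features of the candidate referents are all assumptions made for this example and do not describe the Coh-Metrix module itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A pronoun or candidate referent with hand-assigned agreement features."""
    text: str
    number: str  # "sg" or "pl"
    person: int  # 1, 2, or 3
    gender: str  # "m", "f", "n", or "any" when gender is unmarked


def agrees(pronoun, candidate):
    """True if the pronoun and candidate match in number, person, and gender."""
    gender_ok = (
        "any" in (pronoun.gender, candidate.gender)
        or pronoun.gender == candidate.gender
    )
    return (
        pronoun.number == candidate.number
        and pronoun.person == candidate.person
        and gender_ok
    )


def has_acceptable_referent(pronoun, candidates):
    """Yes/No decision: is there at least one candidate that agrees with the pronoun?"""
    return any(agrees(pronoun, c) for c in candidates)


# Candidates from the example sentence; treating "committee" as singular is an assumption.
candidates = [
    Mention("committee", "sg", 3, "n"),
    Mention("expenditures", "pl", 3, "n"),
    Mention("leaders", "pl", 3, "any"),
]
they = Mention("they", "pl", 3, "any")
she = Mention("she", "sg", 3, "f")
```

With these features, "they" has an acceptable referent, but agreement alone cannot decide between "expenditures" and "leaders"; that is where prominence and frequency would come in.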
Discourse Markers and Connectives. A very different mechanism for
establishing textbase cohesion is by various forms of discourse markers and
connectives (Halliday & Hasan, 1976; Louwerse, 2001; Sanders & Noordman,
2000). These include connectives that correspond to additive cohesion
(e.g., “also,” “moreover,” “however,” “but”), temporal cohesion (e.g., “after,”
“before,” “until”), and causal/intentional cohesion (e.g., “because,” “so,” “in
order to”). Logical operators (e.g., variants of “or,” “and,” “not,” and “if–then”)
are also cohesive links that influence the analytical complexity of a text. More
will be said about these connectives and discourse markers in the subsequent
section on the situation model. The connectives and discourse markers have
tight connections to the situation model in addition to the textbase level.
Lexical Diversity. Indices of lexical diversity are presumably related to both
text difficulty and textbase cohesion. Lexical diversity adds to difficulty
because each unique word introduces new information that needs to be
encoded and integrated into the discourse context. On the flip side, low
lexical diversity implies more repetition of the words and redundancy, and
thus higher cohesion. Lexical diversity is also related to lexical sophistication
on the part of the writer because it indicates that the author of the text is able
to use a wider variety of words.
The most well-known computation of lexical diversity is the type-token
ratio (TTR; Templin, 1957). This is the number of unique words in a text (i.e.,
types) divided by the overall number of words (i.e., tokens) in the text. One
problem with TTR, however, is that its results are sensitive to variations in
text length because as the number of word tokens increases, there is a lower
likelihood of those words being unique (McCarthy & Jarvis, 2010). This is of
particular concern because researchers frequently need to analyze texts that
dramatically vary in length. Coh-Metrix also includes measures such as vocd
and Measure of Textual Lexical Diversity (MTLD), which overcome the
potential confound of text length by using sampling and estimation methods
(McCarthy & Jarvis, 2010). The index produced by vocd is calculated through
a computational procedure that fits TTR random samples with ideal TTR
curves. MTLD is calculated as the mean length of sequential word strings in a
text that maintain a given TTR value.
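
The following Python sketch illustrates TTR and an MTLD-style calculation. It is a simplified reading of the procedure, not the Coh-Metrix code: the TTR threshold of 0.72, the averaging of a forward and a backward pass, the handling of the leftover partial segment, and the fallback value for a text that never reaches the threshold are assumptions about details not given in this chapter.

```python
def type_token_ratio(tokens):
    """Number of unique words (types) divided by the number of words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def _mtld_pass(tokens, threshold):
    """One directional pass: count segments ("factors") whose TTR drops below threshold."""
    factors = 0.0
    types, count = set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count < threshold:
            factors += 1.0          # a full segment is complete; start a new one
            types, count = set(), 0
    if count:
        # Credit the leftover segment in proportion to how far its TTR has fallen.
        remaining_ttr = len(types) / count
        factors += (1.0 - remaining_ttr) / (1.0 - threshold)
    # Fallback (an assumption): if no factor accrued, report the text length.
    return len(tokens) / factors if factors else float(len(tokens))


def mtld(tokens, threshold=0.72):
    """Mean segment length, averaged over a forward and a backward pass."""
    tokens = list(tokens)
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0


words = "the cat sat on the mat".split()
ttr_short = type_token_ratio(words)      # 5 types / 6 tokens
ttr_long = type_token_ratio(words * 2)   # same 5 types / 12 tokens
```

Doubling the text halves the TTR even though the vocabulary is unchanged, which is the length sensitivity described above.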

situation model
As we discussed in Chapter 1, the situation model is a level of representation
that moves us beyond the explicit text into the realm of inferences and the
conceptual meaning of the text beyond language per se. This would be
impossible without the relevant bodies of world knowledge that are shared
by many in the sociocultural context (Graesser, Singer, & Trabasso, 1994;
Kintsch, 1998; McNamara & Magliano, 2009; Snow, 2002; van den Broek,
Rapp, & Kendeou, 2005). In narrative microworlds, the situation model
includes the setting (characters, objects, spatial layout), the plot (events,
actions, conflict), and mental states of characters (goals, emotions, percep-
tions). In informational texts, this is the substantive content on what the text
is about. In a science text, for example, it would include the components of the
system, the spatial layout of the entities, the causal mechanisms, and perhaps
quantitative specifications of these viewpoints. Inferences are needed to
construct the situation model by catering to the unique constraints of the
textbase, the background world knowledge that becomes activated, and the
other levels in the multilevel theoretical framework (see Chapter 1).
Latent Semantic Analysis (LSA). In the early days of artificial intelligence
(AI), researchers struggled with the challenge of representing world knowledge,
recruiting such knowledge during comprehension, and generating
relevant inferences (Lenat, 1995; Schank & Abelson, 1977). AI researchers
identified packages of the generic world knowledge, such as person stereo-
types, spatial frames, scripted activities, and schemas. For example, scripts are
generic representations of everyday activities (e.g., eating at a restaurant,
washing clothes, playing baseball) that have actors with goals and roles,
sequences of actions that are typically enacted to achieve these goals, spatial
environments with objects and props, and so on. These scripts and other
generic knowledge packages were thought to be activated during comprehen-
sion through pattern recognition processes and to guide comprehension by
monitoring attention, generating inferences, formulating expectations, and
interpreting explicit text. AI researchers quickly learned that it was extremely
difficult to program computers to comprehend text even when the systems
were fortified with many different classes of world knowledge (Lehnert &
Ringle, 1982). Moreover, it was tedious to annotate and store large volumes of
world knowledge in formats needed to support computation (but see Lenat,
1995 for attempts to do so).
Coh-Metrix adopts a very different, statistical approach to representing
world knowledge, called Latent Semantic Analysis (Landauer & Dumais, 1997;
Landauer, McNamara, Dennis, & Kintsch, 2007). LSA is a mathematical,
statistical technique for representing world knowledge, based on a large
corpus of texts. The central intuition is that the meaning of a word is captured
by the company of other words that surround it in naturalistic documents.
Two words have similarity in meaning to the extent that they share similar
surrounding words. For example, the word “hammer” will be highly associ-
ated with words of the same functional context, such as “nail,” “saw,” “build,”
and “construction.” These words are not synonyms or hypernyms of “ham-
mer.” LSA taps word meanings in a very different way than the ways words
are treated in a dictionary, a thesaurus, and WordNet. Two words are more
similar in meaning to the extent that they hang around with similar words in
naturalistic documents.
LSA uses vector analysis combined with a statistical technique called
singular value decomposition to condense a very large corpus of texts to
100–500 statistical dimensions in a high-dimensional semantic space. The
conceptual similarity between any two text excerpts (e.g., word, clause,
sentence, text) is computed as an overlap score (technically a geometric
cosine) between the values and weighted dimensions of the two text excerpts.
The value of the cosine technically varies from –1 to 1 but typically varies from
0 to 1. This is because only positive frequencies of occurrence are calculated,
and thus the resulting cosine is heavily skewed to values above 0.
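
The cosine itself is simple to compute once the vectors are available. The Python sketch below shows the calculation on toy three-dimensional vectors; the vector values are invented for illustration and do not come from any trained LSA space, which would have 100–500 dimensions.

```python
import math


def cosine(u, v):
    """Geometric cosine between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical vectors, chosen so that "hammer" lies closer to "nail" than to "poem".
hammer = [0.8, 0.1, 0.3]
nail = [0.7, 0.2, 0.4]
poem = [0.0, 0.9, 0.1]

similar = cosine(hammer, nail)
dissimilar = cosine(hammer, poem)
```

Because these toy vectors contain no negative values, both cosines fall between 0 and 1, in line with the typical range noted above.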
LSA has had noteworthy successes in educational applications in addition
to basic research in cognitive science and discourse processing (Landauer
et al., 2007). For example, the LSA-based Intelligent Essay Assessor can grade
student essays as reliably as expert graders in high-stakes testing (Landauer,
Foltz, & Laham, 2003). LSA has been used successfully to track student
contributions in tutoring systems that interact with students in natural
language, such as AutoTutor (Graesser, Jeon, & Dufty, 2008) and iSTART
(McNamara, Boonthum, Levinstein, & Millis, 2007).
The application of LSA in Coh-Metrix lies in computing text coherence at
the level of the situation model. LSA similarity scores are computed between
adjacent sentences in the text, between all possible pairs of sentences in a
paragraph, and between adjacent paragraphs. Text difficulty is predicted to
increase as a function of decreases in LSA similarity scores. We are casting
this use of LSA as tapping situation model coherence because LSA moves us
beyond the text and into the minds of readers. However, it could be argued
quite persuasively that LSA also taps the explicit words of the textbase and is
functionally a form of cohesion. It would be pointless to argue strongly one
way or the other on what LSA is predominantly tapping. It undoubtedly
reflects both the textbase and situation model; it has vestiges of both cohesion
and coherence (McNamara, Cai, & Louwerse, 2007).
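
Given one LSA vector per sentence, the adjacent-sentence and all-pairs scores described above reduce to averages of cosines. The Python sketch below assumes the sentence vectors have already been obtained from some LSA space; the function names and the toy two-dimensional vectors are hypothetical, and taking the mean is an assumption about how the pairwise scores are aggregated.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Geometric cosine between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean(values):
    """Arithmetic mean of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def adjacent_similarity(vectors):
    """Mean cosine between each sentence vector and the one that follows it."""
    return mean([cosine(a, b) for a, b in zip(vectors, vectors[1:])])


def all_pairs_similarity(vectors):
    """Mean cosine over every possible pair of sentence vectors."""
    return mean([cosine(a, b) for a, b in combinations(vectors, 2)])


# Toy sentence vectors: the first two sentences are identical in meaning, the third is unrelated.
paragraph = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
local_score = adjacent_similarity(paragraph)
global_score = all_pairs_similarity(paragraph)
```

The same two functions could be applied to paragraph vectors to obtain the adjacent-paragraph score.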
The statistical representation of words in LSA depends on the corpus of
texts on which they are trained. The users of Coh-Metrix have the option of
declaring which corpus to use, but the corpus that is routinely used and serves
as the default is the Touchstone Applied Science Associates (TASA) corpus of
academic textbooks. This corpus of more than 11 million words covers a broad
range of topics represented in 37,651 texts. The corpus represents
the texts that a typical senior in high school would have encountered from
kindergarten through 12th grade. Most of the text genres were classified by the TASA
researchers as being in language arts, science, and social studies/history, but
other categories were business, health, home economics, and industrial arts.
The texts were passages (without marked paragraph breaks) with a mean
length of 288.6 words (SD = 25.4).
Coh-Metrix also computes an LSA-based measure of given (old) versus
new information in a text (McCarthy, Dufty, et al., 2012). Given information
is “recoverable” either anaphorically or situationally from the preceding
discourse, whereas new information is not recoverable (Haviland & Clark,
1974; Prince, 1981). The statistical method is called span (Hu et al., 2003), an
LSA-based metric that compares the LSA vector of each incoming sentence
V(S) to the existing vector of the preceding text V(P). The portion of the V(S)
vector that is shared (parallel) with the previous text is given (G). The
component of the vector that is perpendicular is considered new (N).
McCarthy and colleagues reported that the span method has a high correla-
tion with the theoretical analyses of given/new proposed by Prince (1981).
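
The parallel/perpendicular decomposition can be illustrated with a short Python sketch. It follows the description above literally, treating the preceding text as a single vector V(P); the published span method may differ in detail (for instance, in how the preceding sentences are combined), and the function name and the toy vectors are assumptions made for this example.

```python
import math


def given_new(sentence_vec, prior_vec):
    """Split V(S) into a part parallel to V(P) (given) and a perpendicular part (new).

    Returns the lengths (G, N) of the two components.
    """
    prior_sq = sum(p * p for p in prior_vec)
    if prior_sq == 0:
        # No preceding text: nothing is recoverable, so the whole vector is new.
        return 0.0, math.sqrt(sum(s * s for s in sentence_vec))
    scale = sum(s * p for s, p in zip(sentence_vec, prior_vec)) / prior_sq
    given = [scale * p for p in prior_vec]                 # projection onto V(P)
    new = [s - g for s, g in zip(sentence_vec, given)]     # remainder, perpendicular to V(P)
    return (
        math.sqrt(sum(g * g for g in given)),
        math.sqrt(sum(n * n for n in new)),
    )


# Toy vectors: V(P) lies along the first axis, so the second coordinate of V(S) is new.
G, N = given_new([3.0, 4.0], [2.0, 0.0])
newness = N / (N + G)  # proportion of new information in the sentence (one possible summary)
```

In this example G = 3 and N = 4, so slightly more than half of the sentence vector counts as new.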
In summary, LSA has fortified Coh-Metrix with a statistical technique for
capturing world knowledge, inferences, situation model cohesion, and esti-
mates of given versus new information as the text unfolds. This is a nontrivial
advance because we would otherwise be at the mercy of alternative symbolic
computational techniques that require decades to develop with costs of
money, labor and expertise. We of course welcome these alternative symbolic
methods when they arrive.
Situation Model Dimensions. Discourse psychologists have extensively
investigated five dimensions of the situation model (Zwaan & Radvansky,
1998): causation, intentionality, time, space, and protagonists. A break in
cohesion or coherence occurs when there is a discontinuity on one or more
of these situation model dimensions. Whenever such discontinuities occur, it
is important to have connectives, transitional phrases, adverbs, or other
signaling devices that convey to the readers that there is a discontinuity; we
refer to these different forms of signaling as particles. Cohesion is facilitated
by particles that clarify and stitch together the actions, goals, events, and
states conveyed in the text.
Coh-Metrix analyzes the situation model dimensions of causation, intentionality,
space, and time, but not protagonists. We have also come to learn
that it is sometimes difficult to persuade colleagues about the value of the
distinction between causality and intentionality; some demand the distinction
whereas others would prefer to gloss over it. We are also less than satisfied
with our computation of space. Our confidence is highest for causality and
temporality, with intentionality close behind.
(a) Causality and Intentionality. The distinction between causality and
intentionality is based on the event-indexing model (Zwaan, Magliano, &
Graesser, 1995; Zwaan & Radvansky, 1998). Intentionality refers to the actions
of animate agents as part of plans in pursuit of goals. Narrative texts are replete
with such intentionality because they are stories about people with plans that
follow a plot. In contrast, the causal dimension refers to mechanisms in the
material world or psychological world that may or may not be driven by goals
of people. A text about scientific processes and mechanisms is a prototypical
example of the causal dimension. Some researchers consider it important to
distinguish between intentional and causal dimensions because they are
fundamentally different categories of knowledge (Graesser & Hemphill,
1991; Keil, 1981) and may partly explain why science is more difficult to
comprehend than stories. Other researchers believe that the distinction is
unimportant or murky, so they choose to combine the causal and intentional
dimensions into an overarching causal category.
Consider the intentionality dimension. How do we pull out the goal-
oriented, plan-based situation model content that is so characteristic of plot
in narrative or procedural descriptions? As a first step, there needs to be some
way of identifying goals and intentional actions. This is accomplished by
identifying clauses in which (a) the noun in the syntactic subject position is
human or animate (i.e., a causal agent) and (b) the main verb is diagnostic of
goals and actions. The syntactic parser isolates the syntactic subject and then
WordNet takes over. The subject noun needs to be human or animate
according to WordNet, whereas the main verb needs to be in a change-of-
state or other relevant category according to WordNet (Fellbaum, 1998; Miller
et al., 1990). That is, the verbs are change verbs (e.g., “stretch”), contact verbs
(e.g., “smash”), create verbs (e.g., “build”), competition verbs (e.g., “fight”),
and communicate verbs (e.g., “tell”). These conditions need to be met in
order to classify a clause as being an intentional action or goal. Once this
intentional content is extracted from the text, we ask how much of this
content is woven together cohesively by intentional particles, namely connectives
(i.e., “in order to,” “to,” “so that,” “by means of,” “by”). Intentional cohesion
increases theoretically if the ratio of intentional particles to intentional
content is higher. Intentional cohesion is predicted to be inversely related to
text difficulty.
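
The ratio described above can be sketched in Python as follows. This is an illustration under strong simplifications, not the Coh-Metrix procedure: the small word sets stand in for the WordNet lookups, each clause is supplied by hand as a (subject noun, main verb lemma) pair in place of parser output, and all names are hypothetical.

```python
import re

# Hypothetical stand-ins for WordNet categories; real lookups would be far broader.
ANIMATE_SUBJECTS = {"rita", "committee", "girl", "dog"}
INTENTIONAL_VERBS = {"stretch", "smash", "build", "fight", "tell"}  # examples from the text
INTENTIONAL_PARTICLES = ["in order to", "so that", "by means of", "to", "by"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_intentional(clause):
    """A clause counts if its subject is animate and its main verb is in a listed class."""
    subject, verb = clause
    return subject in ANIMATE_SUBJECTS and verb in INTENTIONAL_VERBS


def count_particles(tokens):
    """Count particles, matching longer phrases first so "in order to" is not also counted as "to"."""
    phrases = sorted((p.split() for p in INTENTIONAL_PARTICLES), key=len, reverse=True)
    count, i = 0, 0
    while i < len(tokens):
        for words in phrases:
            if tokens[i:i + len(words)] == words:
                count += 1
                i += len(words)
                break
        else:
            i += 1
    return count


def intentional_cohesion(text, clauses):
    """Ratio of intentional particles to intentional clauses (0.0 if there are none)."""
    n_intentional = sum(1 for c in clauses if is_intentional(c))
    if n_intentional == 0:
        return 0.0
    return count_particles(tokenize(text)) / n_intentional


text = "Rita built a fence in order to protect the garden. The rain smashed the window."
clauses = [("rita", "build"), ("rain", "smash")]  # hand-coded subject and verb lemma
ratio = intentional_cohesion(text, clauses)
```

Only the first clause qualifies here, because "rain" is not animate. Note that counting every "to" and "by" will overcount, since both words also serve as ordinary prepositions.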
Next consider causal+intentional cohesion. This is computed in the same
way as intentional cohesion except that we relax some of the constraints. We
do not worry about the class of nouns that fill the subject position, but we
make sure that the main verbs are in WordNet verb classes that are diagnostic
of events or actions. The causal particles (“because,” “consequence of,” “as a
result”) and intentional particles are then compared with the frequency of
events or actions.
(b) Temporality. Temporality in text is important because of its ubiquitous
presence in organizing language and discourse. Time is represented through
inflected tense morphemes (e.g., “-ed,” “is,” “has”) in every sentence of the
English language. The temporal dimension also depicts unique internal event
timeframes, such as an event that is completed (i.e., telic) or ongoing (i.e.,
atelic), by incorporating an aspect system. The occurrence of events at a point
in time or relative points in time can be established by a large repertoire of
temporal particles, such as “before,” “after,” “then,” “Monday,” “10 pm.”
These temporal features provide several different measures of the temporal
cohesion of a text (Duran, McCarthy, Graesser, & McNamara, 2007).
Temporal cohesion in Coh-Metrix can be tracked by observing the con-
sistency of tense (e.g., past and present) and aspect (perfective and progressive)
across the sentences in the text. Stories of activities in the past tend to have
their stream of verb phrases in the past tense and perfective aspect. The
repetition score of tense and aspect is an excellent signal of temporal cohesion.
When there are deviations, as in the case of flashbacks and flash-forwards, it is
appropriate to have temporal particles that signal such deviations (“years ear-
lier,” “later that evening”). Failure to have these signals (when they are needed)
will increase text difficulty on the temporal dimension.
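
One simple way to express a repetition score is the proportion of adjacent sentences that keep the same tense and the same aspect. The Python sketch below assumes each sentence has already been labeled with a (tense, aspect) pair; the labels, the function names, and the choice to average the two proportions are assumptions made for illustration, since the exact computation is not specified here.

```python
def repetition(labels):
    """Proportion of adjacent positions whose labels are identical."""
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def temporal_repetition(sentences):
    """Average of tense repetition and aspect repetition across adjacent sentences."""
    tenses = [tense for tense, _aspect in sentences]
    aspects = [aspect for _tense, aspect in sentences]
    return (repetition(tenses) + repetition(aspects)) / 2.0


# Hand-labeled toy story: the third sentence shifts tense, the fourth shifts aspect.
story = [
    ("past", "perfective"),
    ("past", "perfective"),
    ("present", "perfective"),
    ("past", "progressive"),
]
score = temporal_repetition(story)
```

A story told entirely in the past perfective would score 1.0; shifts such as flashbacks lower the score unless they are signaled by temporal particles, which this sketch does not examine.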
(c) Spatiality. Herskovits (1998) proposed that there are two kinds of
spatial information: location information and motion information.
Herskovits also provided a list of particles that capture these two aspects
of spatiality. For example, “beside,” “upon,” “here,” and “there” indicate
location spatiality, whereas the prepositions “into” and “through” indicate
motion spatiality. Herskovits’s theory was extended by assuming that
motion spatiality is represented by motion verbs (“move,” “go,” “run”) in
WordNet and that location spatiality is represented by location nouns
(“place,” “region”) in WordNet. Classifications for both motion verbs and
location nouns can be found in WordNet (Fellbaum, 1998). Coh-Metrix
keeps track of spatial cohesion by simply tracking the relative frequency of
these spatial signals in text.

genre and rhetorical structure


Coh-Metrix can classify texts into different genres, as discussed in Chapter 1.
The major split is between narrative versus informational genres, but there is
also a three-part split among narrative (language arts), social studies (includ-
ing history), and science. A reader is expected to comprehend a text better if
he or she is able to classify its genre. There is some evidence that training
struggling readers to recognize genre and other aspects of global text structure
helps them improve comprehension (Meyer & Wijekumar, 2007; Oakhill &
Cain, 2007; Williams, 2007). Skilled readers activate particular expectations
and strategies depending on the genre that is identified. For example, they
tend to encode wording and syntax to a greater extent if they believe the text is
literary, but they encode the situation model to a greater extent if the text is
viewed as a newspaper article (Zwaan, 1994).
Genre Classification. There are different computational foundations for
classifying texts into genre, as we discussed in Chapter 1 (Biber, 1988;
Louwerse et al., 2004). One method is to conduct a principal component
analysis on a large set of Coh-Metrix features from other discourse levels and
determine which of these features predict particular discourse genre.
Graesser, McNamara, and Kulikowich (2011) adopted this approach and
discovered that Coh-Metrix could accurately distinguish between narrative
and science texts. McCarthy, Myers, Briner, Graesser, and McNamara (2009)
discovered that the initial words in the first sentence in a passage were
sufficient to classify texts into narrative versus science genres at levels
significantly above chance.
Rhetorical Structures. Texts can be broken down into sections with partic-
ular rhetorical structures. The rhetorical structure specifies the organization
of discourse, such as setting+plot+moral, problem+solution, compare-
contrast, claim+evidence, question+answer, and argue+counter argue
(Meyer, 1975). Formal text grammars specify the elements and composition
of these rhetorical patterns explicitly and precisely. At this point, it is beyond
the scope of Coh-Metrix to automatically segment and classify sections of
texts into these rhetorical categories, although we have met with some moderate
success using n-gram analyses. One challenge in such endeavors is to obtain
reliable human ratings for the rhetorical structures. Human experts are
needed to segment, annotate, and structure these text representations because
the theoretical distinctions are too complex or subtle for naïve coders to
understand. However, even highly trained experts can differ widely in their
ratings because of the ambiguous, inference-ridden nature of rhetoric.
At present, there are no computer programs that can translate texts into
these structured text representations automatically. Marcu (2000) has
attempted with moderate success to implement the rhetorical structure theory
(RST) that was developed theoretically by Mann and Thompson (1988). RST
specifies the relations among text spans, which are usually, but not necessa-
rily, identical to clauses. Text spans may have variable size, ranging from two
clauses to multi-sentence segments. RST postulates that there is a set of
rhetorical relations that dominate in most texts, but the door is open for
additional rhetorical relations that the writer needs. Mann and Thompson
(1988) identified 23 rhetorical relations, including circumstance, solution-
hood, elaboration, background, purpose, and non-volitional result. An RST
analysis starts by dividing the text into functional units that are called text
spans. Two text spans form a nucleus and a satellite (Mann & Thompson,
1988); the nucleus is the part that is more essential to the writer’s purpose than
is the satellite. Rhetorical relations are then composed between two non-
overlapping text spans and form schemas. These schemas are rearranged into
larger schema applications. The result of the analysis is a rhetorical structure
tree, which is a hierarchical system of schema applications.
Topic Sentencehood. Topic sentences convey the main idea, topic, or
theme of the paragraph whereas the remaining sentences embellish the
topic sentence. Topic sentences are assumed to help readers better compre-
hend and remember text, so it is prudent for topic sentences to be at the
beginning of a paragraph (Kieras, 1978; Lorch, Lorch, & Morgan, 1987;
McCarthy et al., 2008). Such facilitation is presumably important when the
text is challenging and when the reader lacks domain knowledge, as in the
case of expository texts. This ideal view of topic sentencehood, however, does
not appear to be compatible with patterns in naturalistic texts. Empirical
studies across a wide range of genres (scientific, academic, technical, and
periodical writing) have shown that topic sentences appear in only 50% of
paragraphs (Popken, 1991). This lack of topic sentencehood in professional
texts may pose challenges for low-knowledge and/or less-skilled readers who
need explicit cues in the text to help them organize the information.

conclusion
This chapter has identified the technologies and science that led to the
development of Coh-Metrix. It is quite apparent that many fields in the
interdisciplinary arena of computational discourse science were needed to
reach this point in research and development. Moreover, many of our
colleagues would not have bet 20 years ago on a computer facility like Coh-
Metrix being able to compute automatically so many measures at the levels of
words, syntax, textbase, situation model, and genre. Coh-Metrix is not a
perfect system in the sense of computing representations and processes in
ways that were theoretically intended. However, it is good enough to get the
job done for many language and discourse components.
A skeptic might raise the criticism that these analyses of language are
merely word crunchers and do not construct deep, structured meanings.
This observation is to some extent correct. However, one important
counterargument follows from the distinction between a trin and a prox (Page &
Petersen, 1995). A trin is an intrinsic characteristic of text that is closely
aligned with a theoretical component of language or discourse. A prox
(short for proxy) is a superficial, observable, countable feature of text that is
diagnostic of a trin. One or more proxies may be adequate to estimate a trin. It
is entirely an empirical question whether a proxy is adequate for recovering
the essential intrinsic characteristics of whatever theoretical component is
under consideration. The evidence thus far speaks strongly in favor of our
assumption that the landscape of proxies provided by Coh-Metrix goes a long
way in estimating the semantic characteristics of texts.

Coh-Metrix Measures

As we have discussed in the previous chapters, Coh-Metrix was developed to
analyze texts on multiple characteristics and levels of language and discourse.
Although the original inspiration for the development of Coh-Metrix was to
provide automated metrics of text cohesion (hence Coh-Metrix), it became
clear very early in the Coh-Metrix project that there was a need in the research
community for a more comprehensive tool capable of analyzing texts at
multiple language and discourse levels. The Coh-Metrix team has collected
and evaluated hundreds of indices since the beginning of the project. The
indices scale texts on characteristics related to words, sentences, and con-
nections between sentences. The measures that have been included in
Coh-Metrix naturally align with theories of discourse, which assume that
comprehension operates at multiple levels (e.g., Graesser & McNamara, 2011;
Kintsch, 1998; Snow, 2002). These theoretical frameworks describe represen-
tations, structures, and processes at multiple levels of language and discourse.
As described in Graesser and McNamara (2011), five levels have been pro-
posed most commonly in these frameworks: (1) words, (2) syntax, (3) the
explicit textbase, (4) the situation model, and (5) the discourse genre and
rhetorical structure (i.e., the type of discourse and its composition). The
theoretical alignment of Coh-Metrix with these levels is described in previous
chapters.
The number and particular measures provided by Coh-Metrix depend on
the version and the type of tool. We have developed public versions of the tool
that analyze individual texts and have provided between 40 and 80 theoret-
ically grounded and validated indices. We have also developed internal
versions of Coh-Metrix that analyze texts in batches and that include 600–
1,000 indices, many of which are redundant and many of which have not been
validated (and thus we do not release them to the public). Although the
specific Coh-Metrix measures vary somewhat across versions and tools, the
banks of measures are quite similar. This chapter describes the indices that are
provided in Coh-Metrix 3.0, in the order in which the tool outputs them, except
those that are associated with readability and text ease, which are described
in Chapter 5. The indices that are described in this chapter and Chapter 5 are
listed in Appendix A. Comparative norms for each of the indices are provided in
Appendix B by grade level for three text genres (language arts, social studies,
and science).

measures, indices, and banks


We try to use consistent terminology to discriminate between measures,
indices, and banks. We use the term “measure” to describe a theoretical
construct (e.g., referential cohesion, lexical diversity, word frequency). We
use the terms “index” or “indices” to describe any one of the ways Coh-Metrix
assesses that measure. For example, adjacent noun overlap and adjacent stem
overlap are both indices that are used in Coh-Metrix to measure local
referential cohesion. A bank of indices describes a group of conceptually or
mathematically similar indices or measures. For example, paragraph length,
sentence length, and word length all fall under the bank called descriptive.
Note also that the term “variable” can be used to describe a measure, an index,
or a bank.

descriptive indices
Coh-Metrix provides descriptive indices to help the user check the Coh-
Metrix output (e.g., to make sure that the numbers make sense) and interpret
patterns of data. The extracted indices include those on the following list. In
the output for the current version of Coh-Metrix (Version 3.0), all of these
indices are preceded by DES to designate that they are descriptive measures.
1. Number of paragraphs (DESPC). This is the total number of para-
graphs in the text. Paragraphs are defined by hard returns within the
text.
2. Number of sentences (DESSC). This is the total number of sentences in
the text. Sentences are identified by the OpenNLP sentence splitter
(http://opennlp.sourceforge.net/projects.html).
3. Number of words (DESWC). This is the total number of words in the
text. Words are calculated using the output from the Charniak parser.
For each sentence, the Charniak parser generates a parse tree with part
of speech (POS) tags for clauses, phrases, words, and punctuation marks. The
elements on the leaves of a parse tree are tagged words or punctuation marks.
In Coh-Metrix, words are taken from the leaves of the sentence parse
trees.
4. Mean length of paragraphs (DESPL). This is the average number of
sentences in each paragraph within the text. Longer paragraphs may be
more difficult to process.
5. Standard deviation of the mean length of paragraphs (DESPLd). This is
the standard deviation of the measure for the mean length of paragraphs
within the text. In the output, d is appended to the name of an
index to designate that it is a standard deviation. A large standard
deviation indicates that the text has large variation in terms of the
lengths of its paragraphs, such that it may have some very short and
some very long paragraphs. The presence of headers in a short text can
increase values on this measure.
6. Mean number of words (length) of sentences (DESSL). This is the
average number of words in each sentence within the text, where a
word is anything that is tagged as a part-of-speech by the Charniak
parser. Sentences with more words may have more complex syntax and
may be more difficult to process. While this is a descriptive measure,
this also provides one commonly used proxy for syntactic complexity.
However, Coh-Metrix provides additional more precise measures of
syntactic complexity discussed later in this chapter.
7. Standard deviation of the mean length of sentences (DESSLd). This is
the standard deviation of the measure for the mean length of sentences
within the text. A large standard deviation indicates that the text has
large variation in terms of the lengths of its sentences, such that it may
have some very short and some very long sentences. The presence of
headers in a short text may impact this measure. Narrative text may
also have variations in sentence length as authors move from short
character utterances to long descriptions of scenes.
8. Mean number of syllables (length) in words (DESWLsy). Coh-Metrix
calculates the average number of syllables in all of the words in the text.
Shorter words are easier to read, and the estimate of word length serves
as a common proxy for word frequency. This is discussed in greater
detail in Chapter 5.
9. Standard deviation of the mean number of syllables in words
(DESWLsyd). This is the standard deviation of the measure for the
mean number of syllables in the words within the text. A large standard
deviation indicates that the text has large variation in terms of the
lengths of its words, such that it may have both short and long words.
10. Mean number of letters (length) in words (DESWLlt). This is the
average number of letters for all of the words in the text. Longer
words tend to be lower in frequency or familiarity to a reader.
11. Standard deviation of the mean number of letters in words (DESWLltd).
This is the standard deviation of the measure for the mean number of
letters in the words within the text. A large standard deviation indicates
that the text has large variation in terms of the lengths of its words, such
that it may have both short and long words.
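
The descriptive indices listed above can be approximated with a short Python sketch. It is illustrative only and is not the Coh-Metrix implementation: a regular-expression splitter and word pattern stand in for the OpenNLP sentence splitter and the Charniak parser, the function name `descriptive_indices` is hypothetical, and the use of the population standard deviation is an assumption, because the text does not say which form is used.

```python
import re
import statistics


def descriptive_indices(text):
    """Approximate a few DES indices for a non-empty plain-text passage."""
    # Paragraphs are delimited by hard returns; empty lines are ignored.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]

    sentences_per_paragraph = []
    words_per_sentence = []
    for paragraph in paragraphs:
        # Stand-in sentence splitter: break after ., !, or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        sentences_per_paragraph.append(len(sentences))
        for sentence in sentences:
            # Stand-in tokenizer: runs of letters, digits, or apostrophes.
            words_per_sentence.append(len(re.findall(r"[A-Za-z0-9']+", sentence)))

    return {
        "DESPC": len(paragraphs),
        "DESSC": len(words_per_sentence),
        "DESWC": sum(words_per_sentence),
        "DESPL": statistics.mean(sentences_per_paragraph),
        "DESPLd": statistics.pstdev(sentences_per_paragraph),
        "DESSL": statistics.mean(words_per_sentence),
        "DESSLd": statistics.pstdev(words_per_sentence),
    }


sample = (
    "The cell is the basic unit of life. Cells were discovered by Robert Hooke.\n"
    "Other organisms, such as humans, are multicellular."
)
indices = descriptive_indices(sample)
print(indices["DESPC"], indices["DESSC"], indices["DESWC"])  # 2 3 21
```

Checking such counts against a hand count of a short passage is one way to confirm that the numbers in the output make sense.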

referential cohesion
Referential cohesion refers to overlap in content words between local
sentences, or coreference. In the output for the current version of Coh-
Metrix (Version 3.0), all of these indices are preceded by CRF to designate
that they are coreference measures. As discussed in greater detail in
Chapters 2 and 3, coreference is a linguistic cue that can aid readers in
making connections between propositions, clauses, and sentences in their
textbase understanding (Halliday & Hasan, 1976; McNamara & Kintsch,
1996). Referential cohesion gaps can occur when the words or concepts in a
sentence do not overlap with other sentences in the text. Such cohesion
gaps at the textbase level can have varying effects on comprehension and
reading time depending on the reader’s abilities (McNamara & Kintsch,
1996; O’Brien, Rizzella, Albrecht, & Halleran, 1998; O’Reilly & McNamara,
2007; see Chapter 2).
Coh-Metrix measures for referential cohesion vary along two dimensions.
First, the indices vary from local to more global. Local cohesion is measured
by assessing the overlap between consecutive, adjacent sentences, whereas
global cohesion is assessed by measuring the overlap between all of the
sentences in a paragraph or text. Second, the indices vary in terms of the
explicitness of the overlap. Coh-Metrix tracks different types of coreference:
noun overlap, argument overlap, stem overlap, and content word overlap.
Noun overlap measures the proportion of sentences in a text for which
there are overlapping nouns, with no deviation in the morphological forms
of the nouns (e.g., table/table). Argument overlap also considers overlap
between the head nouns (e.g., “table”/“tables”) and pronouns (e.g., “he”/
“he”) but does not attempt to determine the referents of pronouns (e.g.,
whether “he” refers to Sally or John). Stem overlap considers overlap between
a noun in one sentence and a content word (i.e., nouns, verbs, adjectives,
adverbs) in another sentence. The content word in the other sentence must
share a common lemma (i.e., core morphological element; e.g., “baby”/“babies”;
“mouse”/“mice”; “price”/“priced”). Whereas the first three types of indices are
binary (i.e., there either is or is not any overlap between a pair of sentences),
content word overlap refers to the proportion of content words (nouns, verbs,
adverbs, adjectives, pronouns) that are shared between sentences. Additional
information about the coreference measures, with examples from Table 4.1, is
provided below.

Table 4.1. A comparison of the five coreference indices on a science text about cells.
The Coh-Metrix adjacent coreference calculations for each of the five types of indices
are provided for each sentence in the text. The Coh-Metrix output is the average
across sentences. Each of the five types of indices is also calculated in terms of global
coreference, which is the average overlap between all pairs of sentences in the text.

                                                  Noun  Argument  Stem  Content Word  LSA
S1  The cell is the basic unit of life.
S2  Cells were discovered by Robert Hooke.        0     1         1     0             0.37
S3  A cell is the smallest unit of life that is   0     1         1     0             0.40
    classified as a living thing.
S4  Some organisms, such as most bacteria, are    1     1         1     0.13          0.44
    unicellular (consist of a single cell).
S5  Other organisms, such as humans, are          1     1         1     0.33          0.79
    multicellular.
S6  There are two types of cells: eukaryotic      0     0         0     0             0.34
    and prokaryotic.
S7  Prokaryotic cells are usually independent.    1     1         1     0.50          0.85
S8  Eukaryotic cells are often found in           1     1         1     0.20          0.70
    multicellular organisms.
Average local (adjacent)                          0.57  0.86      0.86  0.17          0.55
Average global (all sentences)                    0.43  0.82      0.82  0.13          0.41

1. Noun overlap (CRFNO1 and CRFNOa). These are measures of local
and global overlap between sentences in terms of nouns. Adjacent noun
overlap (CRFNO1) represents the proportion of sentences in the
text that have noun overlap with the immediately preceding
sentence. Among the coreference measures, it is the most strict, in the
sense that the noun must match exactly, in form and plurality. For
example, as shown in Table 4.1, there is no noun overlap between “cell”
and “cells” between sentences 3 and 2. The overlap must be the same
word, as in the overlap between “cell” in sentences 4 and 3 and
“organisms” in sentences 5 and 4.
Whereas local overlap considers only adjacent sentences, global
overlap (CRFNOa) considers the overlap of each sentence with every
other sentence. As shown in Table 4.1, 57% of the adjacent sentence
pairs contained an overlapping noun, and 43% of the sentence pairs in
the text contained an overlapping noun when comparing all of the
sentences (global overlap).
2. Argument overlap (CRFAO1 and CRFAOa). These local and global
overlap measures are similar to noun overlap measures but include over-
lap between sentences in terms of nouns and pronouns. Argument over-
lap occurs when there is overlap between a noun in one sentence and the
same noun (in singular or plural form) in another sentence; it also occurs
when there are matching personal pronouns between two sentences (e.g.,
“he”/“he”). The term “argument” is used in a linguistic sense, where
noun/pronoun arguments are contrasted with verb/adjective predicates
(Kintsch & Van Dijk, 1978). Consider argument overlap for the science
passage in Table 4.1 in the second column. Note that in comparison to
noun overlap, it is less strict because it considers the overlap for example
between “cells” and “cell.” Argument and stem overlap would also include
overlap between pronouns, such as “it” to “it” or “he” to “he,” which noun
overlap does not include.
3. Stem overlap (CRFSO1, CRFSOa). These two local and global overlap
measures relax the noun constraint held by the noun and argument
overlap measures. A noun in one sentence is matched with a content
word (i.e., nouns, verbs, adjectives, adverbs) in a previous sentence that
shares a common lemma (e.g., “tree”/“treed”; “mouse”/“mousey”;
“price”/“priced”). Notably, the outcomes for stem overlap and argu-
ment overlap in Table 4.1 were identical; however, this will not always
be the case.
4. Content word overlap (CRFCWO1, CRFCWO1d, CRFCWOa,
CRFCWOad). This measure considers the proportion of explicit
content words that overlap between pairs of sentences. For example,
if a sentence pair has fewer words and two words overlap, the pro-
portion is greater than if a pair has many words and two words
overlap. This measure includes both local (CRFCWO1) and global
(CRFCWOa) indices and also includes their standard deviations
(CRFCWO1d, CRFCWOad). In the example provided in Table 4.1,
the content word overlap both locally and globally was lower than that
estimated by the binary overlap scores. This measure may be partic-
ularly useful when the lengths of the sentences in the text are a
principal concern.
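
The noun overlap column of Table 4.1 can be reproduced with a small Python sketch. It assumes that the nouns in each sentence have already been identified: the hand-tagged noun sets below are one illustrative reading of the passage rather than Coh-Metrix output, and the function names are hypothetical.

```python
from itertools import combinations

# Hand-tagged nouns for sentences S1-S8 of Table 4.1 (an assumption of this sketch).
NOUNS = [
    {"cell", "unit", "life"},           # S1
    {"cells", "robert", "hooke"},       # S2
    {"cell", "unit", "life", "thing"},  # S3
    {"organisms", "bacteria", "cell"},  # S4
    {"organisms", "humans"},            # S5
    {"types", "cells"},                 # S6
    {"cells"},                          # S7
    {"cells", "organisms"},             # S8
]


def overlaps(a, b):
    """Binary noun overlap: 1 if the two sentences share an identical noun form."""
    return 1 if a & b else 0


def local_overlap(sentences):
    """Mean binary overlap over adjacent sentence pairs (in the style of CRFNO1)."""
    pairs = list(zip(sentences, sentences[1:]))
    return sum(overlaps(a, b) for a, b in pairs) / len(pairs)


def global_overlap(sentences):
    """Mean binary overlap over all sentence pairs (in the style of CRFNOa)."""
    pairs = list(combinations(sentences, 2))
    return sum(overlaps(a, b) for a, b in pairs) / len(pairs)


print(round(local_overlap(NOUNS), 2))   # 0.57
print(round(global_overlap(NOUNS), 2))  # 0.43
```

Because “cell” and “cells” are distinct strings, sentences 2 and 3 do not overlap in this sketch; mapping each noun to its singular form before the comparison would move the calculation toward argument overlap.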

latent semantic analysis


Latent Semantic Analysis (LSA; Landauer et al., 2007; see Chapter 3) provides
measures of semantic overlap between sentences or between paragraphs.
Coh-Metrix 3.0 provides eight LSA indices. Each of these measures varies
from 0 (low cohesion) to 1 (high cohesion). LSA considers semantic overlap
between explicit words and words that are implicitly similar or related in
meaning. For example, “home” in one sentence will have a relatively high
degree of semantic overlap with words such as “house” and “table” in another
sentence.
Coh-Metrix measures LSA-based cohesion in several ways, such as LSA
similarity between adjacent sentences (LSASS1), LSA similarity between all
possible pairs of sentences in a paragraph (LSASSp), and LSA similarity
between adjacent paragraphs (LSAPP1), as well as the standard deviations
of these indices (LSASS1d, LSASSpd, LSAPP1d).
Coh-Metrix also provides a unique measure called LSA Given-New
(LSAGN) and its standard deviation (LSAGNd) (Hempelmann et al., 2005;
McCarthy, Dufty et al., 2012). Text constituents can be classified into three
partitions: given, partially given (based on various types of inferential
availability), or not given (i.e., new). LSA Given-New is a proxy for how much
given versus new information exists in each sentence in a text, compared with
the content of the prior text; it can be expressed, for example, as G/(N+G),
where G is given information and N is new information. To illustrate the basic
notion of givenness, consider the following example.

1. President Barack Obama said on Monday he inherited many of the
country’s problems with high debt and deficits when he entered the
White House, sounding a theme likely to dominate his 2012 re-election
campaign.

In this example, “country’s problems” is new when it is first mentioned, while
“high debt” is coreferential with it. Thus, the constituent “high debt” is given
information even though there are lexical differences that have to be bridged
inferentially. “Re-election campaign,” on the other hand, is only inferentially
available – that is, it is neither fully new nor unexpected in view of the prior
text. Thus, “re-election campaign” is neither given nor new but somewhere in
between.
LSA Given/New is calculated by constructing a hyperplane out of all
previous vectors, rather than by simply adding vectors. The comparison
vector (e.g., a current sentence in the text) is projected onto the hyperplane.
The projection of the sentence vector onto the hyperplane is considered to be
the component of the vector that is shared with the previous text, or given
(G). The component of the vector that is perpendicular to the hyperplane is
considered to be the component of the sentence that is new (N). When there
is more given information in a text (e.g., 100%) and less new information,
G/(N+G) approaches 1. When there is less given information (e.g., 10%),
G/(N+G) approaches 0, indicating lower cohesion.
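
The projection just described can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Coh-Metrix code: G and N are taken to be the lengths of the projected and perpendicular components, the subspace spanned by the previous vectors is built by Gram-Schmidt orthonormalization, the vectors are toy three-dimensional stand-ins for LSA vectors, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis for the span of the prior vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def given_new(prior_vectors, current):
    """Return G / (N + G) for the current vector against all prior vectors."""
    given = [0.0] * len(current)
    for b in orthonormal_basis(prior_vectors):
        c = dot(current, b)
        given = [g + c * bi for g, bi in zip(given, b)]
    new = [x - g for x, g in zip(current, given)]  # perpendicular component
    g_len = math.sqrt(dot(given, given))
    n_len = math.sqrt(dot(new, new))
    total = g_len + n_len
    return g_len / total if total else 0.0


print(given_new([[1.0, 0.0, 0.0]], [1.0, 1.0, 0.0]))  # 0.5
print(given_new([], [1.0, 0.0, 0.0]))                  # 0.0
```

A sentence vector that lies entirely within the span of the earlier vectors scores 1, and the first sentence of a text, which has no prior context, scores 0.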

lexical diversity
Coh-Metrix includes three types of indices of lexical diversity: type-token
ratio (TTR; LDTTRc, LDTTRa), the Measure of Textual Lexical Diversity
(MTLD; LDMTLDa), and vocd (LDVOCDa). Type-token ratio is calculated
for content words only (i.e., c) and also for all words (i.e., a), and MTLD and
vocd are calculated for all words (i.e., a). Lexical diversity refers to the variety
of unique words (types) that occur in a text in relation to the total number of
words (tokens). When the number of word types is equal to the total number
of words (tokens), all of the words are different. In that case, lexical diversity is
at a maximum, and the text is likely to be either very low in cohesion or very
short. A high number of different words in a text indicates that new words
need to be integrated into the discourse context. By contrast, lexical diversity
is lower (and cohesion is higher) when more words are used multiple times
across the text. The best-known lexical diversity index is TTR, which is
simply the number of unique words divided by the overall number of words
(i.e., tokens). TTR is correlated with text length because as the number of
word tokens increases, there is a lower likelihood of those words being
unique. Measures such as MTLD and vocd overcome that confound by
using estimation algorithms (McCarthy & Jarvis, 2010). MTLD is calculated
as the mean length of sequential word strings in a text that maintain a given
TTR value. The index produced by vocd is calculated through a computa-
tional procedure that fits TTR random samples with ideal TTR curves.
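
The following Python sketch illustrates TTR and a simplified version of the MTLD procedure. It is an approximation offered for illustration, not the Coh-Metrix code: the threshold of 0.72, the partial credit for the final unfinished string, and the averaging of forward and backward passes follow commonly described versions of MTLD and should be treated as assumptions here, and the function names are hypothetical. No vocd sketch is attempted.

```python
def type_token_ratio(tokens):
    """Number of unique word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def _mtld_one_direction(tokens, threshold):
    factors = 0.0
    types, count = set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            # The running TTR has dropped to the threshold: close this string.
            factors += 1.0
            types, count = set(), 0
    if count:
        # Credit the unfinished final string in proportion to how far its TTR fell.
        factors += (1.0 - len(types) / count) / (1.0 - threshold)
    # With no repetition at all, diversity is unbounded under this sketch.
    return len(tokens) / factors if factors else float("inf")


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes; 0.72 is an assumed default."""
    if not tokens:
        return 0.0
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2.0


words = "the cell is the basic unit of life".split()
print(type_token_ratio(words))      # 0.875
print(type_token_ratio(words * 2))  # 0.4375
```

Repeating the same eight words halves the TTR even though the vocabulary is unchanged, which is the length confound that MTLD and vocd are designed to reduce.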

connectives
Connectives play an important role in the creation of cohesive links between
ideas and clauses and provide clues about text organization (Cain & Nash,
2011; Crismore, Markkanen, & Steffensen, 1993; Longo, 1994; Sanders &
Noordman, 2000; van de Kopple, 1985). Coh-Metrix provides an incidence
score (occurrence per 1,000 words) for all connectives (CNCAll) as well as
different types of connectives. Indices are provided on five general classes of
connectives (Halliday & Hasan, 1976; Louwerse, 2001): causal (CNCCaus:
“because,” “so”), logical (CNCLogic: “and,” “or”), adversative/contrastive
(CNCADC: “although,” “whereas”), temporal (CNCTemp, CNCTempx:
“first,” “until”), and additive (CNCAdd: “and,” “moreover”). In addition,
there is a distinction between positive connectives (CNCPos: “also,” “more-
over”) and negative connectives (CNCNeg: “however,” “but”).
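
An incidence score can be illustrated with a few lines of Python. The sketch below is hedged in two ways: the word lists contain only the examples named in the text, not the full connective lists used by Coh-Metrix, and single-token matching is assumed, so multiword connectives are not handled. The function name is hypothetical.

```python
def incidence_per_1000(tokens, targets):
    """Occurrences of target words per 1,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in targets)
    return 1000.0 * hits / len(tokens)


# Tiny illustrative word lists taken from the examples in the text.
CAUSAL = {"because", "so"}
NEGATIVE = {"however", "but"}

tokens = "the ice melted because the sun came out so we left".split()
print(round(incidence_per_1000(tokens, CAUSAL), 1))  # 181.8
print(incidence_per_1000(tokens, NEGATIVE))          # 0.0
```

Scaling to 1,000 words makes scores comparable across texts of different lengths, although very short passages such as this one produce inflated values.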

situation model
Referential cohesion is an important linguistic feature of text. However, there
are also deeper levels of meaning that go beyond the words. The term
“situation model” has been used by researchers in discourse processing and
cognitive science to refer to the level of mental representation for a text that
involves much more than the explicit words (Graesser & McNamara, 2011;
Graesser, Singer, & Trabasso, 1994; Kintsch, 1998; van Dijk & Kintsch, 1983;
Zwaan & Radvansky, 1998). Some researchers have described the situation
model in terms of the features that are present in the comprehender’s mental
representation when a given context is activated (e.g., Singer & Leon, 2007).
For example, with episodes in narrative text, the situation model would
include the plot. In an informational text about the circulatory system, the
situation model might convey the flow of the blood. In essence, the situation
model comprises the reader’s mental representation of the deeper underlying
meaning of the text (Kintsch, 1998).
The content words and connective words systematically constrain and are
aligned with aspects of these inferred meaning representations, but the
explicit words do not go the full distance in specifying the deep meanings.
Coh-Metrix provides indices for a number of measures that are potentially
related to the reader’s situation model understanding. These include meas-
ures of causality, such as incidence scores for causal verbs that reflect changes
of state (SMCAUSv: “break,” “freeze,” “impact,” “hit,” “move”), causal verbs
plus causal particles (SMCAUSvp: e.g., both causal verbs and connectives
such as “because,” “in order to”), and intentional verbs (SMINTEp: e.g.,
“contact,” “drop,” “walk,” “talk”). Coh-Metrix uses WordNet (Miller,
Beckwith, Fellbaum, Gross, & Miller, 1990) to classify verbs into the categories
of causal and intentional verbs. The distinction between causality and inten-
tionality has relevance to the nature of knowledge in situation models
(Zwaan & Radvansky, 1998). Intentional verbs signal actions that are volun-
tarily enacted by animate agents, motivated by plans in pursuit of goals (such
as buying groceries, telling a child to behave, or driving to work). By contrast,
causal verbs reflect events in the material world or psychological world (such
as an earthquake erupting, or a person discovering a solution) that either may
or may not be driven by goals of people.
Coh-Metrix also provides two ratio indices: the ratio of causal particles to
causal verbs (SMCAUSr) and the ratio of intentional particles to intentional
verbs (SMINTEr). These ratios are calculated to reflect the necessity of
connectives in text. This necessity will depend on the number of events
expressed in the text. A text is judged as more causally cohesive to the extent
that there are proportionally more connectives that relate actions and events
in the text. If there are numerous action, event, and intentional verbs without
causal connectives to aid the reader, then the reader may be more likely to be
forced to generate inferences to understand the relations between the actions
and events in the sentences.
Coh-Metrix also provides measures of verb overlap, which are calculated
using LSA (SMCAUSlsa) and WordNet (SMCAUSwn). These indices are
indicative of the extent to which verbs (which have salient links to actions,
events, and states) are repeated across the text. In the LSA algorithm, the
cosine of two LSA vectors corresponding to the given pair of verbs is used to
represent the degree of overlap of the two verbs. In the WordNet algorithm,
the overlap was a binary representation: 1 when two verbs were in the same
synonym set and 0 otherwise. McNamara et al. (2012) found that verb
cohesion is greater in the earlier-grade texts than in the later-grade texts
and that verb cohesion decreases monotonically across science, social studies,
and narrative texts. They hypothesized that verb cohesion may help compen-
sate for lower referential cohesion when the text focuses more on events than
objects, as in the cases of lower-grade texts and narrative texts.
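
The two verb overlap scores can be contrasted in a short Python sketch. Everything concrete in it is an assumption made for illustration: the synonym-set labels and two-dimensional vectors are toy values rather than WordNet or LSA data, the function names are hypothetical, and averaging over all verb pairs is one plausible aggregation that the text does not specify.

```python
import math
from itertools import combinations

# Hypothetical synonym-set labels and toy 2-dimensional "LSA" vectors.
SYNSETS = {
    "buy": {"purchase_set"},
    "purchase": {"purchase_set"},
    "run": {"run_set"},
}
VECTORS = {"buy": [1.0, 0.0], "purchase": [1.0, 1.0], "run": [0.0, 1.0]}


def wordnet_overlap(verb_a, verb_b):
    """1 if the two verbs share a synonym set, otherwise 0."""
    return 1 if SYNSETS.get(verb_a, set()) & SYNSETS.get(verb_b, set()) else 0


def lsa_overlap(verb_a, verb_b):
    """Cosine of the two verbs' vectors."""
    u, v = VECTORS[verb_a], VECTORS[verb_b]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise(verbs, score):
    """Average a pairwise score over every pair of verbs."""
    pairs = list(combinations(verbs, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


verbs = ["buy", "purchase", "run"]
print(round(mean_pairwise(verbs, wordnet_overlap), 2))  # 0.33
print(round(mean_pairwise(verbs, lsa_overlap), 2))      # 0.47
```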
Coh-Metrix also provides a measure of temporal cohesion, which reflects
tense and aspect repetition in the text (SMTEMP). Time is represented
through morphemes associated with the main verb or helping verb that signal
tense (past, present, future) and aspect (in progress versus completed). This
measure tracks the consistency of tense and aspect across a passage of text.
The repetition scores decrease as shifts in tense and aspect are encountered.
When such temporal shifts occur, readers may encounter difficulties in the
absence of explicit particles that signal shifts in time, such as the temporal
adverbial (“later on”), temporal connective (“before”), or prepositional
phrases with temporal nouns (“on the previous day”). A low particle-to-
shift ratio is a symptom of problematic temporal cohesion.
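One way to picture a tense-and-aspect repetition score is as the share of adjacent sentence pairs that keep the same tense and the same aspect. The sketch below is only an illustration under that assumption: the exact SMTEMP computation is not given in this section, and the labels and function name are hypothetical.

```python
def repetition_score(features):
    """Mean of the tense-match and aspect-match rates over adjacent sentences.

    `features` holds one (tense, aspect) pair per sentence.
    """
    pairs = list(zip(features, features[1:]))
    if not pairs:
        return 1.0  # a single sentence contains no shifts
    tense_rate = sum(a[0] == b[0] for a, b in pairs) / len(pairs)
    aspect_rate = sum(a[1] == b[1] for a, b in pairs) / len(pairs)
    return (tense_rate + aspect_rate) / 2

# Hand-labeled toy passages (assumptions for illustration only).
consistent = [("past", "completed")] * 4
shifting = [
    ("past", "completed"),
    ("present", "in progress"),
    ("past", "completed"),
    ("future", "completed"),
]
```

With these inputs the consistent passage scores 1.0 and the shifting passage scores about 0.17, which mirrors the statement that repetition scores decrease as shifts in tense and aspect are encountered.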

syntactic complexity
Theories of syntax assign words to part-of-speech categories (e.g., nouns, verbs,
adjectives, conjunctions), group words into phrases or constituents (noun-
phrases, verb-phrases, prepositional-phrases, clauses), and construct syntactic
tree structures for sentences. For example, some sentences are short and have a
simple syntax that follows an actor-action-object syntactic pattern, have few if any
embedded clauses, and have an active rather than passive voice. Some sentences
have complex, embedded syntax that potentially places heavier demands on
working memory. The syntax in text tends to be easier to process when there are
shorter sentences, few words before the main verb of the main clause, and few
words per noun-phrase. As mentioned earlier, the average number of words in
sentences is provided in Coh-Metrix as a descriptive measure (DESSL). Coh-
Metrix also calculates the mean number of words before the main verb, or left
embeddedness (SYNLE), and the average number of modifiers per noun phrase
(SYNNP). Sentences with difficult syntactic constructions make use of
embedded constituents and are often structurally dense, syntactically ambiguous,
or ungrammatical (Graesser et al., 2004). As a consequence, they are more
difficult to process and comprehend (Perfetti, Landi, & Oakhill, 2005).
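The two simplest syntactic measures mentioned above, mean sentence length and mean number of words before the main verb, can be sketched as follows. The sketch assumes that the position of each main verb has already been identified (in practice by a syntactic parser); here the positions are supplied by hand, and the function names are hypothetical.

```python
def mean_sentence_length(sentences):
    """Mean number of words per sentence."""
    return sum(len(tokens) for tokens, _ in sentences) / len(sentences)

def mean_left_embeddedness(sentences):
    """Mean number of words that precede the main verb of the main clause."""
    return sum(verb_index for _, verb_index in sentences) / len(sentences)

# Each entry pairs the tokens with the 0-based index of the main verb,
# which equals the number of words before it (hand-annotated assumption).
sentences = [
    ("The dog chases the cat".split(), 2),
    ("After the long storm ended the tired travelers slept".split(), 8),
]

length = mean_sentence_length(sentences)
left_embeddedness = mean_left_embeddedness(sentences)
```

The second sentence places eight words before its main verb ("slept"), so it raises the left-embeddedness average well above that of the simple actor-action-object sentence.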
Coh-Metrix assesses a combination of semantic and syntactic dissimilarity
by measuring the uniformity and consistency of the sentence constructions in
the text, based on the notion of a Minimal Edit Distance (MED; McCarthy,
Guess, & McNamara, 2009). Coh-Metrix 3.0 provides three variations on
MED: SYNMEDpos, SYNMEDwrd, and SYNMEDlem. MED calculates the
average minimal edit, or the distance that parts of speech (SYNMEDpos),
words (SYNMEDwrd), or lemmas (SYNMEDlem) are from one another
between consecutive sentences in a text. Consider the following example.

The dog chases the cat. (4.1)

The cat chases the dog.

The SYNMEDpos syntactic dissimilarity is 0.0 because the syntax is the same.
By contrast, “cat” and “dog” are in different positions in each sentence, and so
SYNMEDwrd and SYNMEDlem are both 0.4. Considering these indices
together indicates that the two sentences have the same syntax but different meanings.
SYNMEDpos considers parts of speech but not the words themselves (e.g.,
determiner + noun). In essence, SYNMEDpos calculates the extent to which
one sentence needs to be modified (edited) to make it have the same syntactic
composition as a second sentence. SYNMEDwrd and SYNMEDlem consider
the words but not the parts of speech (e.g., the + book). The three MED indices
tend to be moderately correlated with measures of referential and semantic
cohesion, with correlations ranging between −.3 and −.7. For example, using the
TASA corpus of 38,807 passages, SYNMEDwrd correlates −.75 with the refer-
ential cohesion easability score (see Chapter 5). However, SYNMEDwrd and
SYNMEDlem tend to be more strongly correlated with referential and semantic
cohesion (r = −.4 to −.7) than does SYNMEDpos (r = −.2 to −.6), which also tends to
correlate with syntactic complexity (r = −.3 to −.6).
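The dog-and-cat example (4.1) can be reproduced with a standard Levenshtein edit distance over tokens, divided by the length of the longer sentence. That normalization is an assumption chosen because it yields the 0.4 and 0.0 values reported above; the published MED algorithm may differ in detail, and the part-of-speech tags are supplied by hand.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

words_1 = "the dog chases the cat".split()
words_2 = "the cat chases the dog".split()
pos_1 = ["DT", "NN", "VBZ", "DT", "NN"]  # hand-assigned tags
pos_2 = ["DT", "NN", "VBZ", "DT", "NN"]

word_distance = normalized_distance(words_1, words_2)  # 2 edits / 5 words
pos_distance = normalized_distance(pos_1, pos_2)       # identical tag strings
```

A text-level score would then average these distances over consecutive sentence pairs.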
Coh-Metrix 3.0 provides two measures of sentence-to-sentence syntax
similarity (SYNSTRUTa, SYNSTRUTt) by measuring the uniformity and
consistency of the syntactic constructions in the text. SYNSTRUTa is the
average parse tree similarity (Sim) between adjacent sentence pairs in a text.
SYNSTRUTt is the average parse tree similarity (Sim) between all combinations
of sentence pairs across paragraphs of the text. SYNSTRUT is based on
parse tree similarities between sentences. For two sentence parse trees, the
maximum common tree is found by removing uncommon subtrees. The
parse tree similarity is computed by the following formula:

Sim = nodes in the common tree / (the sum of the nodes in the two
sentence trees − nodes in the common tree).

Figure 4.1 illustrates how the common tree is constructed. There are 8 nodes
in the first tree and 10 nodes in the second tree. In the figure, the yellow nodes
are common nodes. There are 6 common nodes. The rectangle leaves with
words are not counted as nodes. Therefore, the similarity is computed as
Sim = 6 / ((8 + 10) − 6) = 6/12 = 0.50. This index not only looks at syntactic

Sentence 1: [S1 [S [NP [DT The] [NN man]] [VP [VBD came]] [. .]]]
Sentence 2: [S1 [S [NP [PRP He]] [VP [VBD entered] [NP [DT the] [NN door]]] [. .]]]

Note: DT = determiner, NN = noun (singular or mass), NP = noun phrase, PRP = personal pronoun,
S1 = sentence, S = simple declarative clause, VBD = verb (past tense), VP = verb phrase
Figure 4.1. Sentence-to-sentence syntax similarity. This figure presents sentence-to-
sentence syntax similarity (SYNSTRUT) between the two adjacent sentences: “The man
came. He entered the door.” The yellow nodes represent the common nodes between the
two sentences. The outcome of the analysis indicates that 6 nodes are common out of
the 12 distinct nodes in the two trees combined, with the result of 0.50 for the index.
similarity across sentence pairs at the phrasal level, but also takes account of
the parts of speech involved. More uniform syntactic constructions result in
less complex syntax that is easier for the reader to process (Crossley,
Greenfield, & McNamara, 2008).
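The parse tree similarity for the Figure 4.1 example can be checked with a short sketch. Trees are written as (label, children) pairs with the word leaves omitted, and the common tree is found by matching nodes top-down and aligning children in order. This matching procedure is one plausible reading of "removing uncommon subtrees" and is an assumption, as are the function names.

```python
def count_nodes(tree):
    """Count the nodes of a (label, children) tree; word leaves are omitted."""
    _, children = tree
    return 1 + sum(count_nodes(child) for child in children)

def common_nodes(a, b):
    """Size of the largest common tree rooted at a and b, matched top-down.

    Children are aligned in order so that the number of shared nodes is as
    large as possible; subtrees without a counterpart are dropped.
    """
    if a[0] != b[0]:
        return 0
    kids_a, kids_b = a[1], b[1]
    best = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i in range(1, len(kids_a) + 1):
        for j in range(1, len(kids_b) + 1):
            best[i][j] = max(
                best[i - 1][j],
                best[i][j - 1],
                best[i - 1][j - 1] + common_nodes(kids_a[i - 1], kids_b[j - 1]),
            )
    return 1 + best[-1][-1]

def tree_similarity(a, b):
    """Sim = common nodes / (nodes in both trees - common nodes)."""
    shared = common_nodes(a, b)
    return shared / (count_nodes(a) + count_nodes(b) - shared)

# "The man came." and "He entered the door."
first = ("S1", [("S", [("NP", [("DT", []), ("NN", [])]),
                       ("VP", [("VBD", [])]),
                       (".", [])])])
second = ("S1", [("S", [("NP", [("PRP", [])]),
                        ("VP", [("VBD", []), ("NP", [("DT", []), ("NN", [])])]),
                        (".", [])])])

similarity = tree_similarity(first, second)
```

The sketch counts 8 and 10 nodes in the two trees and 6 common nodes, giving the 0.50 reported for the figure.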

syntactic pattern density


Syntactic complexity is also informed by the density of particular syntactic
patterns, word types, and phrase types. Coh-Metrix provides information on
the incidence of noun phrases (DRNP), verb phrases (DRVP), adverbial phrases
(DRAP), and prepositional phrases (DRPP). The relative density of each of these can be
expected to affect processing difficulty of text, particularly with respect to other
features in a text. For example, if a text has a higher noun and verb phrase
incidence, it is more likely to be informationally dense with complex syntax.
Coh-Metrix also measures relative frequency of sentences with passive
voice (DRPVAL), which are more difficult to process than are sentences
with active voice (Just & Carpenter, 1987). In addition, it provides the
incidence of negation (DRNEG), which is also associated with processing
difficulty (Clark & Clark, 1977; Just & Carpenter, 1971).
Finally, Coh-Metrix provides an indicator regarding the incidence of verb
conjugation in the text. It provides the relative frequency of the use of the
gerund (DRGERUND; in its -ing form) as well as verbs as infinitives
(DRINF). A verb’s infinitive is its unmarked form, such as “be,” “have,” or
“write.” Infinitives are prevalent in situation models with a high density of
intentional content, where agents perform actions in order to achieve goals.

word information
Vocabulary knowledge, and thus the types of words that are presented in a
text, has a substantial impact on reading time and comprehension (Perfetti,
2007; Rayner et al., 2001; Stanovich, 1986). The words in textbooks and the
texts that children encounter beginning in the late elementary years contain
increasingly more complex and unfamiliar words (Adams, 1990; Beck,
McKeown, & Kucan, 2002). Therefore, it is important to analyze words on
multiple characteristics and dimensions that have relevance to reading devel-
opment and the construction of meaning in text. Coh-Metrix provides an
abundance of word measures that are described in this section.
Parts of Speech. As discussed in greater detail in Chapter 3, each word is
assigned a syntactic part-of-speech category. These syntactic categories are
segregated into content words (e.g., nouns, verbs, adjectives, adverbs) and
function words (e.g., prepositions, determiners, pronouns). Many words can
be assigned to multiple syntactic categories. For example, the word “bank”
can be a noun (“river bank”), a verb (“don’t bank on it”), or an adjective
(“bank shot”). Coh-Metrix assigns only one part-of-speech category to each
word on the basis of its syntactic context.
Coh-Metrix computes the relative frequency of each word category by
counting the number of instances of the category per 1,000 words of text,
called incidence scores. These include noun incidence (WRDNOUN), verb
incidence (WRDVERB), adjective incidence (WRDADJ), adverb incidence
(WRDADV), and pronoun incidence (WRDPRO). Pronouns are segregated
further into first-person singular pronouns (WRDPRP1s: “I,” “me”), first-
person plural pronouns (WRDPRP1p: “we,” “us”), second-person pronouns
(WRDPRP2: “you”), third-person singular pronouns (WRDPRP3s: “he,”
“she,” “it”), and third-person plural pronouns (WRDPRP3p: “they,” “those”).
These distinctions between types of pronouns and their usage have impor-
tant repercussions on other levels of meaning (Pennebaker, Booth, &
Francis, 2007).
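An incidence score is simply a count rescaled to a text length of 1,000 words. A minimal sketch, assuming the part-of-speech tags have already been assigned (the tags and the function name are illustrative):

```python
from collections import Counter

def incidence_scores(tags):
    """Occurrences of each category per 1,000 words of text."""
    total = len(tags)
    if total == 0:
        return {}
    return {tag: 1000 * n / total for tag, n in Counter(tags).items()}

# One tag per word of "The dog chases the cat" (hand-assigned).
tags = ["DT", "NN", "VBZ", "DT", "NN"]
scores = incidence_scores(tags)
```

Two nouns in a five-word text give a noun incidence of 400 per 1,000 words, which makes texts of different lengths directly comparable.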
Word Frequency. Word frequency indices measure how often particular
words occur in the English language. Words that occur with a higher fre-
quency are more familiar to the reader and are processed more quickly.
Highly frequent content words are linked to richer bodies of world knowledge
(Beck et al., 2002; Haberlandt & Graesser, 1985; Perfetti, 2007). As discussed in
Chapter 3, word frequency in Coh-Metrix is currently computed using
CELEX, the database from the Dutch Centre for Lexical Information
(Baayen, Piepenbrock, & Gulikers, 1995), which is based on an analysis of
17.9 million words. The Coh-Metrix indices are computed over all the word
tokens in the text that appear in the CELEX database; words that are not
included in CELEX are left out of the computation.
Coh-Metrix includes the raw word frequency for content
words (WRDFRQc), the logarithm of word frequency for all words
(WRDFRQa), and the minimum log word frequency for content words
(WRDFRQmc). Log frequency is computed because reading times are line-
arly related to the logarithm of word frequency, not raw word frequencies
(Haberlandt & Graesser, 1985; Just & Carpenter, 1987; see also Chapter 3).
Usually content words, rather than the highly frequent function words, are
considered in these computations. Moreover, it is the low-frequency words in
a sentence that are an important limiting factor in comprehending sentences
and text. One rare word can make the entire sentence difficult to comprehend.
Hence, Coh-Metrix provides the average across sentences of the minimum log
word frequency.
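The idea of averaging the rarest word's log frequency over sentences can be sketched as follows. The frequency counts are invented for illustration, base-10 logarithms are an assumption (the base is not stated here), and the inputs are taken to be content words only.

```python
import math

def mean_min_log_frequency(sentences, frequency):
    """Average, over sentences, of the lowest log frequency in each sentence.

    Words missing from the frequency table are skipped, as are sentences
    in which no word is found.
    """
    minima = []
    for sentence in sentences:
        logs = [math.log10(frequency[w]) for w in sentence if w in frequency]
        if logs:
            minima.append(min(logs))
    return sum(minima) / len(minima) if minima else None

# Made-up counts; "quokka" is deliberately absent from the table.
toy_frequency = {"dog": 1000, "cat": 100, "chases": 10, "zebra": 1}
sentences = [["dog", "chases", "cat"], ["zebra", "chases", "dog", "quokka"]]

result = mean_min_log_frequency(sentences, toy_frequency)
```

The rarest known words are "chases" (log frequency 1) and "zebra" (log frequency 0), so the result is 0.5; a single rare word pulls its whole sentence down, as the paragraph above argues.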
Psychological Ratings. Coh-Metrix uses two lexical databases as sources
for additional information on words based on psychological and semantic
dimensions. The first source is the MRC Psycholinguistic Database, which
provides ratings for several thousand words along several psychological
dimensions (Coltheart, 1981; see Chapter 3). The age-of-acquisition measure
specifies the age at which a word first appears in a child’s vocabulary, whereas
the other measures are based on adults rating samples of content words on
7-point scales, with higher scores reflecting easier processing. Ratings on the
1–7 scale were subsequently multiplied by 100 and rounded to the nearest
integer so as to be able to present all the ratings as integers on a scale from 100
to 700. The familiarity, concreteness, and imagability measures were derived
from a merging of the Paivio, Yuille, and Madigan (1968) norms, the
Colorado norms (Toglia & Battig, 1978), and the Gilhooly and Logie (1980)
norms. Details of the merging are provided in appendix 2 of the MRC
Psycholinguistic Database User Manual (Coltheart, 1981a;
http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm).
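The rescaling of ratings and the averaging over content words can be sketched in a few lines. The three familiarity values are those quoted for "milk," "smile," and "pony" in the list of indices in this section; the dictionary layout, the function names, and the treatment of unrated words (skipped) are assumptions.

```python
def scale_rating(raw):
    """Convert a mean rating on the 1-7 scale to the 100-700 integer scale."""
    return round(raw * 100)

def mean_rating(content_words, norms):
    """Average rating of the content words that have a norm; others are skipped."""
    rated = [norms[w] for w in content_words if w in norms]
    return sum(rated) / len(rated) if rated else None

familiarity = {"milk": 588, "smile": 594, "pony": 524}

# "quokka" has no norm in this toy table and is ignored.
average = mean_rating(["milk", "smile", "pony", "quokka"], familiarity)
```

Rounded to the nearest integer, the average is 569, matching the value given for these three words under the Familiarity index.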
The second source is WordNet (Fellbaum, 1998; Miller et al., 1990; see
Chapter 3). From WordNet, Coh-Metrix provides estimates of word poly-
semy and hypernymy. The MRC and WordNet indices are described in the
following list.

1. Age of acquisition (WRDAOAc). Coh-Metrix includes the age-of-acquisition
norms from MRC, which were compiled by Gilhooly and
Logie (1980) for 1,903 unique words. The c at the end of the index name
indicates that it is calculated for the average ratings for content words in
a text. Age of acquisition reflects the notion that some words appear in
children’s language earlier than others. Words such as “cortex,”
“dogma,” and “matrix” (AOA = 700) have higher age-of-acquisition
scores than words such as “milk,” “smile,” and “pony” (AOA = 202).
Words with higher age-of-acquisition scores denote spoken words that
are learned later by children.
2. Familiarity (WRDFAMc). This is a rating of how familiar a word seems
to an adult. Sentences with more familiar words are
processed more quickly. MRC provides ratings for 3,488 unique words.
Coh-Metrix provides the average ratings for content words in a text.
Raters for familiarity provided ratings using a 7-point scale, with 1 being
assigned to words that they never had seen and 7 to words that they had
seen very often (nearly every day). The ratings were multiplied by 100
and rounded to integers. For example, the words “milk” (588), “smile”
(594), and “pony” (524) have an average Familiarity of 569 compared to
the words “cornet” (364), “dogma” (328), and “manus” (113), which
have an average Familiarity of 268. Words with very high Familiarity
include “mother” (632) and “water” (641), compared to “calix” (124) and
“witan” (110).
3. Concreteness (WRDCNCc). This is an index of how concrete or non-abstract
a word is. Words that are more concrete refer to things you
can hear, taste, or touch. MRC provides ratings for 4,293 unique words.
Coh-Metrix provides the average ratings for content words in a text.
Words that score low on the concreteness scale include “protocol”
(264) and “difference” (270), compared to “box” (597) and “ball” (615).
4. Imagability (WRDIMGc). An index of how easy it is to construct a mental
image of the word is also provided in the merged ratings of the MRC,
which provides ratings for 4,825 words. Coh-Metrix provides the average
ratings for content words in a text. Examples of low-imagery words are
“reason” (285), “dogma” (327), and “overtone” (268) compared to words
with high imagery such as “bracelet” (606) and “hammer” (618).
5. Meaningfulness (WRDMEAc). These are the meaningfulness ratings
from a corpus developed in Colorado by Toglia and Battig (1978). MRC
provides ratings for 2,627 words. Coh-Metrix provides the average
ratings for content words in a text. An example of a meaningful word
is “people” (612) as compared to “abbess” (218). Words with higher
meaningfulness scores are highly associated with other words (e.g.,
“people”), whereas a low meaningfulness score indicates that the
word is weakly associated with other words.
6. Polysemy (WRDPOLc). Polysemy refers to the number of senses (core
meanings) of a word. For example, the word “bank” has at least two
senses, one referring to a building or institution for depositing money
and the other referring to the side of a river. Coh-Metrix provides
average polysemy for content words in a text. Polysemy relations in
WordNet are based on synsets (i.e., groups of related lexical items),
which are used to represent similar concepts but distinguish between
synonyms and word senses (Miller et al., 1990). These synsets allow for
the differentiation of senses and provide a basis for examining the
number of senses associated with a word. Coh-Metrix reports the
mean WordNet polysemy values for all content words in a text.
Word polysemy is considered to be indicative of text ambiguity because
the more senses a word has, the greater the potential
number of lexical interpretations. However, more frequent words also
tend to have more meanings, and so higher values of polysemy in a text
may be reflective of the presence of higher frequency words.
7. Hypernymy (WRDHYPn, WRDHYPv, WRDHYPnv). Coh-Metrix also
uses WordNet to report word hypernymy (i.e., word specificity). In
WordNet, each word is located on a hierarchical scale allowing for the
measurement of the number of subordinate words below and superordinate
words above the target word. Thus, “entity,” as a possible
hypernym for the noun “chair,” would be assigned the number 1. All
other possible hyponyms of entity as it relates to the concept of a chair
(e.g., “object,” “furniture,” “seat,” “chair,” “camp chair,” “folding
chair”) would receive higher values (see also Chapter 3). Similar values
are assigned for verbs (e.g., “hightail,” “run,” “travel”). As a result, a
lower value reflects an overall use of less-specific words, whereas a
higher value reflects an overall use of more-specific words. Coh-
Metrix provides estimates of hypernymy for nouns (WRDHYPn),
verbs (WRDHYPv), and a combination of both nouns and verbs
(WRDHYPnv).
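The hypernymy values in item 7 can be pictured as depths in a hierarchy, with the most general term at level 1. The sketch below uses a single hand-built chain based on the "chair" example; it is a simplification, since WordNet words can have several senses and several paths to the top, and the table and function names are hypothetical.

```python
def hypernym_level(word, parent_of):
    """Level of a word in the hierarchy, counting the top-most term as 1."""
    level = 1
    while word in parent_of:
        word = parent_of[word]
        level += 1
    return level

def mean_hypernymy(words, parent_of):
    """Average hierarchy level over a list of words."""
    return sum(hypernym_level(w, parent_of) for w in words) / len(words)

# Toy child -> parent table following the "chair" example (assumption).
toy_parent = {
    "object": "entity",
    "furniture": "object",
    "seat": "furniture",
    "chair": "seat",
    "folding chair": "chair",
}

general = hypernym_level("entity", toy_parent)
specific = hypernym_level("folding chair", toy_parent)
```

Here "entity" sits at level 1 and "folding chair" at level 6, so a text that uses more specific words receives a higher average.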

norms
This chapter has presented all of the indices that are provided in Coh-Metrix
3.0 except those that are related to readability. Comparative norms for the
indices are provided in Appendix B, separated by grade level for three text
genres (language arts, social studies, and science). To create the norms, we
analyzed a subset of a large corpus of texts created by the Touchstone Applied
Science Associates (TASA), Inc. The TASA corpus has 9 genres consisting of
119,627 paragraphs taken from 37,651 samples. The passages all consisted of
one paragraph, because paragraph breaks are not marked in the TASA
corpus. Hence, these norms are not based on a corpus that provides variation
between paragraphs or information at the paragraph level. We nonetheless
used TASA because it is a large corpus that has proven to be representative of
other texts and differences between text genres.
We calculated norms for the three largest domains represented in TASA:
language arts, social studies, and science texts. To do so, we randomly chose
100 passages from each of the 3 genres and each of 13 grade levels, for a total of
3,900 passages. Grade level in the TASA corpus is indexed by the Degrees of
Reading Power (DRP; Koslin et al., 1987). Notably, because the grade levels are
estimated using DRP values, they correspond to grade levels estimated by a
readability measure and do not correspond to an actual grade level. As
described earlier, DRP grade level is defined by a formula that includes
word and sentence characteristics, such as word frequency and sentence
length. To simplify the data analysis and presentation, grade level was
collapsed across the DRP levels corresponding to the grade bands used within
the Common Core State Standards: grades K to 1 (n=100), 2 to 3 (n=200), 4 to
5 (n=200), 6 to 8 (n=300), 9 to 10 (n=200), and 11 and above (n=300). The
average DRP values as well as the range of DRP values for each grade band are
provided in Appendix B.

conclusion
This chapter has provided a description of the indices that we included in the
most recent version of Coh-Metrix, Version 3.0. This is a small selection of
hundreds of indices that we have explored over the past 10 years. These are the
indices that have risen to the top across the multitude of analyses and studies
conducted using Coh-Metrix. Many of the indices we have developed and
examined have not panned out. Either they simply did not measure what they
were intended to measure, or they were not as predictive of textual differences
in comparison to the indices we have included here.
We have included 106 indices in Coh-Metrix 3.0. We would have preferred
to narrow down the selection of indices even further than we have here.
However, we each have our favorites. Also, different measures are useful to
address different kinds of research questions. In addition, the number of
indices has increased because we have included in this version the standard
deviations for many of the measures. These had not been included in previous
public versions of Coh-Metrix. We have done so because we find the standard
deviation of an index informative both in terms of understanding variation
for the particular index and in terms of understanding the characteristics
of text.
In the following chapter we describe the remaining indices that were not
covered in this chapter. These are the indices related to readability, or text
difficulty. We include the Flesch measures of readability (i.e., Flesch Reading
Ease, Flesch-Kincaid Grade Level) that focus on the word and sentence levels
of complexity, but our primary focus is on the Coh-Metrix Text Easability
Principal Component scores. These are measures of text ease that have been
developed by statistically combining together the indices presented in the
current chapter. Our overarching goal in the Coh-Metrix project has been to
provide a means to enhance our understanding of text difficulty. Hence, the
text easability scores described in Chapter 5 represent a culmination of our
efforts in the Coh-Metrix project.

Coh-Metrix Measures of Text Readability and Easability

One important question with which the Coh-Metrix team has grappled is
how to measure text difficulty, complexity, or, in turn, its ease. This chapter
describes the two traditional readability measures provided by Coh-Metrix –
Flesch-Kincaid Grade Level (RDFKGL) and Flesch Reading Ease (RDFRE) –
as well as the readability index that we developed for second-language texts
(RDL2). We also describe the Coh-Metrix Text Easability Principal
Component Scores that are provided in Coh-Metrix 3.0 (i.e., PCNAR,
PCSYN, PCCNC, PCREF, PCDC, PCVERB, PCCONN, PCTEMP).
The traditional and more common approach to scaling texts is to have a
single metric of text ease or difficulty. This is the approach taken by popular
metrics such as Flesch-Kincaid Grade Level (Kincaid, Fishburne, Rogers, &
Chissom, 1975) and Flesch Reading Ease (Flesch, 1948; Klare, 1974–1975),
which are provided by the Coh-Metrix tool. These two Flesch-Kincaid met-
rics are based on the length of words and sentences within the text. In Coh-
Metrix, the Flesch-Kincaid Grade Level (RDFKGL) is computed as [(0.39 *
sentence length) + (11.8 * word length) – 15.59]. The Flesch Reading Ease
(RDFRE) is computed as [206.835 – (1.015 * sentence length) – (84.6 * word
length)]. Sentence length (DESSL) is measured by the mean number of words
per sentence in a text, whereas word length (DESWLsy) is measured as the
mean number of syllables per word (which is highly correlated with the mean
number of letters).
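The two formulas above translate directly into code. In this sketch the inputs are the mean number of words per sentence and the mean number of syllables per word; the function names and the example values (15 words per sentence, 1.5 syllables per word) are illustrative assumptions.

```python
def flesch_kincaid_grade_level(sentence_length, word_length):
    """(0.39 * sentence length) + (11.8 * word length) - 15.59."""
    return 0.39 * sentence_length + 11.8 * word_length - 15.59

def flesch_reading_ease(sentence_length, word_length):
    """206.835 - (1.015 * sentence length) - (84.6 * word length)."""
    return 206.835 - 1.015 * sentence_length - 84.6 * word_length

# Hypothetical text: 15 words per sentence, 1.5 syllables per word.
grade_level = flesch_kincaid_grade_level(15, 1.5)
reading_ease = flesch_reading_ease(15, 1.5)
```

For these values the grade level is about 7.96 and the reading ease about 64.71. The two scores move in opposite directions: longer sentences and longer words raise the grade level and lower the reading ease.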
These readability measures can provide robust predictors of sentence-level
understanding and the amount of time it takes to read a passage. Indeed, these
types of text comprehension measures offer impressive validation of the
metric. There are a number of theoretical explanations for the validity of
these and similar metrics, but two principal ones refer to the effects of word
knowledge and working memory while reading. First, infrequent words in a
language tend to be longer according to Zipf’s (1949) law, so the word length
variable theoretically serves as a proxy for a reader’s word knowledge. When
readers are likely to know fewer of the words, the text is likely to be more
difficult. Second, long sentences are more difficult to parse because they are
more likely to include complex syntax. Therefore, long sentences will theo-
retically tend to place more demands on working memory. Sentence length
serves as a proxy for these cognitive factors.
These two Flesch readability measures are highly correlated with other
traditional measures of text difficulty, such as Degrees of Reading Power
(DRP; Koslin, Zeno, & Koslin, 1987) and Lexile scores (Stenner, 2006),
according to available reports as well as statistical analyses we have con-
ducted, with correlations generally ranging from 0.85 to 0.95. These types of
readability formulas have been used for decades to provide educators with an
estimate of the difficulty of a text in relation to the grade level or reading
ability of the reader. One limitation of traditional readability measures is that
they consider only the superficial characteristics of text, which in turn tend to
be predictive of readers’ surface understanding: their understanding of the
words and of individual sentences. In addition, assessments that are used to
validate or provide readability scores most often use a cloze task. In the cloze
task, a word in a sentence is left blank and the reader is asked to fill in the
word by selecting one from a set of options. A text is considered to be at
the reader’s level of proficiency if the reader can perform the cloze task at a
threshold of performance (e.g., 75%). A text is generally defined as easy for a
population of readers if performance exceeds 75% and difficult to the extent
that it is lower than 75%. Grade level can be calibrated for a text by identifying
the age group that converges on the 75% level of performance. Cloze tasks by
their very nature assess comprehension primarily within sentences based on
word associations (Shanahan, Kamil, & Tobin, 1982) and depend primarily on
decoding rather than language comprehension skills (Keenan, Betjemann, &
Olson, 2008).
Some models of early reading focus primarily on sentence and word
understanding. However, most comprehension models (Graesser &
McNamara, 2011; Kintsch, 1998; McNamara & Magliano, 2009; Van Dijk &
Kintsch, 1983) propose that there are multidimensional levels of understand-
ing that emerge during the comprehension process, including (at least) sur-
face, textbase, and situation model levels (see Chapter 2). Readability
formulas, by contrast, assume a unidimensional representation.
The simplicity of a single dimension of text difficulty can be useful when
assigning texts for students to read. A single dimension provides a common
currency of difficulty for different texts in different categories, which makes it
easier for reading teachers when they strive to select texts at the appropriate
level of challenge. A teacher may assign a text that is at just the right level,
challenge a student with a more difficult text, or provide a text that is easy
enough for the student to readily understand. A unidimensional metric
provides a simple solution to this task because the dimensions are generally
aligned with a common metric – grade level.
We have conducted two projects to explore unidimensional metrics of text
readability. The first resulted in the L2 Readability (RDL2) score that is
provided in Coh-Metrix 3.0. The second developed an algorithm to predict
textbook grade levels. These algorithms are described in the following
sections.

second-language readability score


The Coh-Metrix L2 Readability (RDL2) score is a unidimensional readability
formula intended to predict the readability of texts, in particular for second-
language readers (Crossley, Allen, & McNamara, 2011; Crossley, Greenfield, &
McNamara, 2008). The L2 Readability score considers content word overlap,
sentence syntactic similarity, and word frequency. As such, this formula
considers text challenges at the sentence and the word level, but it also
considers the cohesion between sentences in the text. Specifically, the L2
Reading Index as reported by Crossley, Salsbury, McCarthy, and
McNamara (2008) is provided in formula 5.1.

−45.032 + (52.230 × CRFCWO1) + (61.306 × SYNSTRUT) + (22.205 × WRDFRQmc)    (5.1)
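Formula 5.1 is a weighted sum of three Coh-Metrix indices and can be sketched as a single function. The intercept is taken here as −45.032; the negative sign is an assumption worth checking against the published formula. The input values in the example are invented and do not come from any analyzed text.

```python
def l2_readability(crfcwo1, synstrut, wrdfrqmc):
    """Weighted sum of content word overlap, syntax similarity, and
    minimum log word frequency, following formula 5.1.

    The negative intercept is an assumption (see the note above).
    """
    return (-45.032
            + 52.230 * crfcwo1
            + 61.306 * synstrut
            + 22.205 * wrdfrqmc)

# Invented index values, for illustration only.
score = l2_readability(crfcwo1=0.10, synstrut=0.10, wrdfrqmc=2.0)
```

Because all three weights are positive, more content word overlap, more uniform syntax, and more frequent words each raise the predicted readability score.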

The L2 formula was based on the subset of Bormuth’s (1971) corpus of 32
academic reading texts used by Greenfield (1999) to develop the Miyazaki
EFL readability index. The Bormuth texts (M=269 words) were collected
from instructional materials including passages from biology, chemistry,
civics, current affairs, economics, geography, history, literature, mathe-
matics, and physics (see also, Crossley, McCarthy, Dufty, & McNamara,
2007). Greenfield collected cloze performance on the subset of passages
from 200 Japanese university students. The correlations with the students’
cloze scores were 0.85 for Flesch Reading Ease, Flesch-Kincaid Grade
Level (Kincaid et al., 1975), and the Miyazaki EFL readability index, and 0.86
for the Bormuth (1971) formula. The Coh-Metrix L2 Readability formula
correlated 0.93 with the Japanese students’ cloze test performance on the
passages. Hence, the L2 formula provides a significant improvement in
predicting cloze performance by L2 readers on academic texts. Notably, the
distinct difference between it and the other formulas is that it goes beyond
difficulty at the level of words and sentences and also considers challenges in
terms of the cohesion of the text.
The L2 Readability formula has not been further assessed in terms of its
ability to predict either L2 or first language readers’ comprehension of texts.
However, Crossley, Allen, and McNamara (2011) compared the L2 Readability
formula to the Flesch-Kincaid Grade Level and Flesch Reading Ease scores in
their ability to classify texts that are typically read by L2 readers. Texts for
language learners are routinely simplified in various ways to make them more
comprehensible to the readers. Material developers who are simplifying texts
often follow guidelines on word lists or use traditional readability formulas
such as the Flesch-Kincaid. Alternatively, materials developers follow intui-
tive approaches driven by the editor’s sense of text comprehensibility.
Crossley et al. (2011) compared the three readability formulas’ ability to
classify 300 L2 news texts that had been simplified by an independent group
of authors (i.e., Allen, 2009) at the beginning, intermediate, and advanced
levels using intuition and without word lists or readability formulas. The
British news texts were originally taken from the Guardian Weekly (see
http://www.onestopenglish.com) and were typically selected for their nonacademic
interest value. Crossley et al. (2011) found, as predicted, that the L2 formula
was the best predictor of level classification, correctly classifying 59% of the
reading texts by level overall. It fared best at classifying the beginner and
advanced texts (70% accuracy) and least well for the intermediate texts (39%
accuracy). This is not an uncommon finding where there is an intermediate
category that contains features from both categories. Importantly, the Flesch
indices fared more poorly, with average accuracies ranging between 44% and
48%. These results confirmed the advantages of the Coh-Metrix L2 Reading
Index in classifying and examining differing levels of intuitively simplified
texts over at least two traditional readability formulas.

assigning grade levels to textbooks


The typical approach in developing readability formulas is to develop an
algorithm that predicts readers’ comprehension, often on a cloze test.
Another approach to estimating the readability of texts is to predict the
publisher-assigned grade level of textbooks. Dufty, Graesser, Louwerse, and
McNamara (2006) sampled extracts of up to 5,000 words from 311 textbooks
that were provided by MetaMetrics, Inc. The text samples included narrative,
science, and social science genres in four grade categories: K–3, 4–6, 7–9, and
10–12. The assigned grade level of these texts is determined by the publisher
and presumably derived from a complex mix of quantitative indices (such as
Flesch-Kincaid Grade Level), the intuition of expert judgment, and the
availability and the requirements of the given state. In this study, Dufty et al.
(2006) examined the degree to which Coh-Metrix successfully predicted these
assigned grade levels. They found that Flesch-Kincaid Grade Level correlated
.77 with grade level, and that cohesion as measured by LSA sentence-to-text
similarity correlated –.53. A multiple regression analysis indicated that a
combination of variables produced an R² of .68, which means that cohesion
in combination with Flesch-Kincaid explains 68% of the variance in the grade
level of the textbooks. Of these variables, three cohesion variables significantly
contributed: LSA sentence to text, incidence of causal verbs, and the incidence
of causal connectives. The results suggested that cohesion could predict
publisher-assigned grade level, and that cohesion in combination with
Flesch-Kincaid Grade Level predicted publisher-assigned grade level better
than either readability alone or cohesion alone. This study, therefore, pro-
vided evidence to support the assumption that cohesion has an important role
to play in the evaluation of text difficulty.
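
As a quick check on the arithmetic behind these figures, squaring a correlation gives the proportion of variance that two variables share. The short sketch below applies that to the values reported by Dufty et al. (2006). The function name is ours, and the comparison with the regression R² is only illustrative, since the regression included several variables.

```python
def shared_variance(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r * r

fk_alone = shared_variance(0.77)         # Flesch-Kincaid vs. assigned grade level
cohesion_alone = shared_variance(-0.53)  # LSA sentence-to-text vs. grade level
combined_r2 = 0.68                       # R-squared reported for the regression

print(f"Flesch-Kincaid alone: {fk_alone:.2f}")
print(f"LSA cohesion alone:   {cohesion_alone:.2f}")
print(f"Gain from combining:  {combined_r2 - fk_alone:.2f}")
```

On these figures, Flesch-Kincaid alone accounts for roughly 59% of the variance and LSA cohesion alone for roughly 28%, so the combined model adds about 9 percentage points over Flesch-Kincaid by itself.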

a multidimensional approach
While their simplicity and alignment with grade level might be appealing,
there are a number of reasons why unidimensional representations of com-
prehension may be unsatisfying both theoretically and to a practitioner. First,
unidimensional representations of comprehension tend to ignore the impor-
tance of readers’ deeper levels of understanding. As discussed earlier, tradi-
tional readability measures focus on superficial characteristics of text related
to readers’ understanding of the words and of individual sentences in the text.
Likewise, cloze tasks are most often used to gauge individuals’ reading levels,
and these tasks assess comprehension at the word and sentence level. Hence,
traditional readability measures do not tap readers’ ability to comprehend
global levels of discourse meaning.
Second, unidimensional measures ignore the multiple factors that influ-
ence comprehension, particularly those that influence readers’ use of knowl-
edge and deep comprehension such as cohesion and text genre. Genre
refers to the category of text, such as whether the text is primarily narrative
(e.g., novels, folktales), expository (e.g., textbooks, journal articles), persuasive
(e.g., editorials, sermons), or descriptive (Biber, 1988; Pentimonti, Zucker,
Justice, & Francis, 2010). There are distinctive characteristics of language that
signal text genre (Biber, 1988). The genre of a text can be particularly
informative with regard to its difficulty. For example, narrative text is sub-
stantially easier to read, comprehend, and recall than is informational text
(Graesser & McNamara, 2011; Haberlandt & Graesser, 1985).
Third, unidimensional metrics of text difficulty are not particularly helpful
or informative to educators when specific guidance is needed for diagnosing a
student’s particular deficit and planning remediation (Connor,
Morrison, Fishman, Schatschneider, & Underwood, 2007; Rapp, van den
Broek, McMaster, Kendeou, & Espin, 2007). Readability formulas do not
identify particular characteristics of texts that may be challenging or helpful
to a student. Unidimensional readability scores provide too little information
to teachers on the nature of a text’s complexity. Most importantly, although a
grade level estimate may indicate to a teacher that a text is more or less
difficult, the score does not provide information on why it is difficult. The
scaling and selection of texts would potentially benefit from an analysis of
multiple levels of language and discourse. One of the advantages of Coh-
Metrix is that it has the potential to inform the type of questions and activities
teachers might employ when presenting texts to the entire class or small
groups. By knowing the potential difficulties of any text in advance, teachers
can craft questions or tasks that help students recognize and overcome these
difficulties.

coh-metrix text easability component z-scores


Coh-Metrix provides information about text at multiple levels of linguistic
analysis, including word characteristics, sentence characteristics, and the
discourse relationships between ideas in text (see Chapter 3). Our ultimate
objective has been to transcend traditional measures of readability that focus
on surface characteristics of texts, which principally tend to affect surface
comprehension. Indeed, one motivation for the development of Coh-Metrix
was to provide better measures of text difficulty (Duran, Bellissens, Taylor, &
McNamara, 2007), and particularly the specific sources of potential chal-
lenges or scaffolds within texts. Coh-Metrix, in contrast to traditional meas-
ures of text readability, has the potential to offer a more complete picture of
the potential challenges that may be faced by a reader as well as the potential
scaffolds that may be offered by the text. Coh-Metrix is motivated by theories
of discourse and text comprehension. As described in earlier chapters, such
theories describe comprehension at multiple levels, from shallow, text-based
comprehension to deeper levels of comprehension that integrate multiple
ideas in the text and bring to bear information that elaborates the ideas in
the text using world and domain knowledge (Graesser & McNamara, 2011).
Coh-Metrix assesses challenges that may occur at the word and sentence
levels as well as deeper levels of language. By doing so it comes closer to
having the capability to estimate how well a reader will comprehend a text at
deeper levels of cognition.
Through research on and with Coh-Metrix (see Chapters 2 and 6), we
have gained a deeper understanding of how texts differ and which indices
are most reliable in detecting these differences at meaningful, consequen-
tial levels. Most recently, this work has culminated in the development
of the Coh-Metrix easability components (Graesser, McNamara, &
Kulikowich, 2011). These components provide a more complete picture
of text ease (and difficulty) that emerges from the linguistic characteristics
of texts. The easability components provided by Coh-Metrix go beyond
traditional readability measures by providing metrics of text character-
istics on multiple levels of language and discourse. Moreover, they are well
aligned with theories of text and discourse comprehension (e.g.,
Graesser & McNamara, 2011; Graesser, Singer, & Trabasso, 1994; Kintsch,
1998; McNamara & Magliano, 2009).
In order to discover what aspects of texts comprise text complexity,
Graesser, McNamara, and Kulikowich (2011) conducted a principal compo-
nents analysis (PCA) on 54 Coh-Metrix indices for 37,520 texts in the TASA
corpus. This corpus comprises excerpts (M=287 words) from texts (without
paragraph break markers) that students can be expected to encounter from
kindergarten through 12th grade. The majority of the text genres are charac-
terized as language arts, science, and social studies/history texts, but the
corpus also includes texts from the domains of business, health, home
economics, and industrial arts. The TASA corpus is the most comprehensive
collection of K–12 texts currently available for research. PCA was used to
reduce the large multivariate database to fewer functional dimensions (e.g.,
Brun, Ehrmann, & Jacquet, 2007). Eight components accounted for a sub-
stantial 67.3% of the variability among texts. These components are notably
closely aligned with the multilevel theoretical framework described in
Chapter 3 and by Graesser and McNamara (2011).
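
To make the idea of "variance accounted for" concrete, the sketch below runs a principal components analysis on a toy data set of three made-up variables. It is a minimal illustration under stated assumptions, not the procedure used by Graesser et al. (2011). The data are synthetic stand-ins for Coh-Metrix indices. The eigenvalues are found by power iteration with deflation, a technique chosen here only to keep the example free of external libraries, and all function names are ours.

```python
import math
import random

def standardize(column):
    """Return the column as z-scores (mean 0, sample SD 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length numeric columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def eigenvalues(matrix, rng, iters=1000):
    """Eigenvalues of a symmetric positive semidefinite matrix, largest first."""
    k = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    found = []
    for _ in range(k):
        # Random unit start vector.
        v = [rng.gauss(0, 1) for _ in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Power iteration converges to the dominant eigenvector of m.
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in m]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient
        found.append(lam)
        # Deflation: remove the component just found.
        for i in range(k):
            for j in range(k):
                m[i][j] -= lam * v[i] * v[j]
    return found

# Synthetic data (hypothetical, for illustration only).
rng = random.Random(0)
n = 300
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.5 * rng.gauss(0, 1) for a in x1]  # strongly related to x1
x3 = [rng.gauss(0, 1) for _ in range(n)]      # unrelated to the others

lams = eigenvalues(correlation_matrix([x1, x2, x3]), rng)
proportions = [lam / len(lams) for lam in lams]
for i, p in enumerate(proportions, start=1):
    print(f"Component {i}: {100 * p:.1f}% of variance")
```

Because the analysis is run on a correlation matrix, the eigenvalues sum to the number of variables, so each component's share of the variance is its eigenvalue divided by that number. In the toy data, the two related variables load on one component, which therefore carries the largest share.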
In Coh-Metrix 3.0, we provide these eight components in the form of
z-scores and percentile scores. A z-score is a standard score that indicates
how many standard deviations an observation or datum is above or below the
mean, where the mean is set at 0. A percentile score varies from 0 to 100%,
with higher scores meaning the text is likely to be easier to read than other
texts in the corpus. For example, a percentile score of 80% means that 80% of
the texts are more difficult and 20% are easier. The eight components are as
follows.
1. Narrativity (PCNARz, PCNARp). Narrative text tells a story, with
characters, events, places, and things that are familiar to the reader.
Narrative is closely affiliated with everyday, oral conversation. This
robust component is highly affiliated with word familiarity, world
knowledge, and oral language. Non-narrative texts on less familiar
topics lie at the opposite end of the continuum.
2. Syntactic Simplicity (PCSYNz, PCSYNp). This component reflects the
degree to which the sentences in the text contain fewer words and use
simpler, familiar syntactic structures that are less challenging to process.
At the opposite end of the continuum are texts that contain sentences
with more words and that use complex, unfamiliar syntactic structures.
3. Word Concreteness (PCCNCz, PCCNCp). Texts that contain content
words that are concrete and meaningful and evoke mental images are
easier to process and understand. Abstract words represent concepts
that are difficult to represent visually. Texts that contain more abstract
words are more challenging to understand.
4. Referential Cohesion (PCREFz, PCREFp). A text with high referential
cohesion contains words and ideas that overlap across sentences and
the entire text, forming explicit threads that connect the text for the
reader. Low-cohesion text is typically more difficult to process because
there are fewer connections that tie the ideas together for the reader.
5. Deep Cohesion (PCDCz, PCDCp). This dimension reflects the degree to
which the text contains causal and intentional connectives when there
are causal and logical relationships within the text. These connectives
help the reader form a deeper and more coherent understanding of the
causal events, processes, and actions in the text. When a text contains
many relationships but does not contain those connectives, the reader
must infer the relationships between the ideas in the text. If the text is
high in deep cohesion, then those relationships and global cohesion are
more explicit.
6. Verb Cohesion (PCVERBz, PCVERBp). This component reflects the
degree to which there are overlapping verbs in the text. When there are
repeated verbs, the text likely includes a more coherent event structure
that will facilitate and enhance situation model understanding. This
component score is likely to be more relevant for texts intended for
younger readers and for narrative texts (McNamara, Graesser, &
Louwerse, 2012).
7. Connectivity (PCCONNz, PCCONNp). This component reflects the
degree to which the text contains explicit adversative, additive, and
comparative connectives to express relations in the text. This
component reflects the number of logical relations in the text that are
explicitly conveyed. This score is likely to be related to the reader’s
deeper understanding of the relations in the text.
8. Temporality (PCTEMPz, PCTEMPp). Texts that contain more cues
about temporality and that have more consistent temporality (i.e.,
tense, aspect) are easier to process and understand. In addition, tem-
poral cohesion contributes to the reader’s situation model level under-
standing of the events in the text.
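
The notion of overlap that underlies referential cohesion can be illustrated with a deliberately crude proxy: the proportion of adjacent sentence pairs that share at least one content word. This is a sketch under our own assumptions, not the way Coh-Metrix computes the component. The stopword list, the tokenizer, and the function names are all hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not the Coh-Metrix list).
STOPWORDS = {"a", "an", "the", "has", "have", "is", "are", "of", "and", "to", "in"}

def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS}

def adjacent_overlap(sentences):
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return hits / len(pairs)

example = [
    "The cell has a nucleus.",
    "The nucleus stores DNA.",
    "Plants need sunlight.",
]
print(adjacent_overlap(example))
```

In this example the first two sentences share the word "nucleus" and the last two share nothing, so one of the two adjacent pairs overlaps and the proportion is 0.5. A text in which the third sentence also mentioned the cell or its nucleus would score higher.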

Of these eight components (narrativity, syntactic simplicity, word
concreteness, referential cohesion, deep cohesion, verb cohesion, connectivity,
and temporality), the first five accounted for 54% of the variance. These first
five components have been incorporated within a tool intended for educators,
called Coh-Metrix text easability components, because they are most directly
associated with the ease of a text and because they account for the largest
portion of the variance among the 37,520 texts. We refer to these as dimen-
sions of text easability. Coh-Metrix provides both percentile scores and
z-scores as measures of easability. Notably, the percentile and z-scores have
a monotonic but not a linear relationship to each other. Generally, the
z-scores are the preferred scores for research and statistical purposes, but
the percentiles are more easily understood, particularly in a graph.
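
The monotonic but nonlinear relation between the two scores can be seen in a small sketch. It assumes, purely for illustration, that component scores are normally distributed, so that a z-score maps to a percentile through the standard normal cumulative distribution. Coh-Metrix may derive its percentiles differently (for example, empirically from the TASA texts), and the function name is ours.

```python
import math

def z_to_percentile(z):
    """Percentile implied by a z-score under a standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}  ->  percentile = {z_to_percentile(z):.1f}")
```

Under this assumption, moving from z = 0 to z = 1 spans about 34 percentile points, whereas moving from z = 1 to z = 2 spans only about 14. Equal steps in z therefore do not correspond to equal steps in percentile, which is one reason z-scores are better suited to statistical analysis.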
Graesser et al. (2011) described the relations between the component scores
and grade level estimates. They reported that Degrees of Reading Power
(DRP; Koslin, Zeno, & Koslin, 1987) grade level estimates are primarily
correlated with narrativity (r = −.69) and syntactic simplicity (r = −.47).
Texts at lower grade levels tend to have simpler syntax and are less likely to
contain features characteristic of informational texts (e.g., science, social
studies). They also found that word concreteness tended to decrease across
grade levels (r = −.23) but that referential cohesion (r = .03) and deep cohesion
(r = .11) did not vary systematically or strongly across grade levels as defined
by DRP in the TASA corpus. These results were expected because cohesion is
generally orthogonal to readability. The two constructs are generally not
correlated, presumably because variations in cohesion that affect comprehension
occur both within and across grade levels (i.e., independent of
sentence-level and word-level challenges).
The Coh-Metrix dimensions of text easability have been further evaluated
by an independent team (Nelson, Perfetti, Liben, & Liben, 2012). Nelson et al.
(2012) reported the correlations between the component percentile scores and
grade level estimates for four sets of texts (i.e., Common Core Exemplar
Texts, State Test Passages, Gates-MacGinitie, and SAT-9), as well as the
correlation between the component scores and student performance on three
assessments (Gates-MacGinitie, Oasis, and SAT-9). In their analyses relating
the Coh-Metrix component scores to the grade level of text, they confirmed
that syntactic simplicity was the dimension most highly correlated with grade
level. For most of the text sets, narrativity and referential cohesion were also
correlated with grade level, with more cohesive texts and more narrative texts
in the younger grade levels. They also reported that syntactic simplicity,
narrativity, and referential cohesion significantly correlated with student
performance on all three assessments.
One particular advantage Nelson et al. (2012) reported of the Coh-Metrix
components in comparison to unidimensional readability measures was that
Coh-Metrix provided information about the source of the challenges within
each of the different assessments. They found that the comprehension tests
tended to have different sources of challenges within the texts. Identifying the
source of difficulty in an assessment can help understand the nature of the test
as well as account for variation between students (e.g., Ozuru, Best, Bell,
Witherspoon, & McNamara, 2007). It is important to identify the source of
difficulty across texts within grade levels because texts rarely have challenges
at all levels of difficulty. When some aspects of a text are challenging, other
aspects of the text will tend to be easier, to offset the overall difficulty of the
text (e.g., McNamara, Graesser, & Louwerse, 2012). For example, using a
TASA corpus of 37,651 texts, only 89 (0.24%) passages are below the 30th
percentile on all five of the Coh-Metrix components, and likewise, only 88
(0.23%) passages are above the 30th percentile on all five of the Coh-Metrix
components. This means that more than 99% of the passages in TASA have at
least one dimension that is below or above the 30th percentile.
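
The percentages in this example follow directly from the reported counts, as the short check below shows. The variable names are ours; the three numbers are those given in the text.

```python
total = 37651          # TASA passages in this analysis
low_on_all_five = 89   # reported count
high_on_all_five = 88  # reported count

pct_low = 100 * low_on_all_five / total
pct_high = 100 * high_on_all_five / total
pct_mixed = 100 - pct_low - pct_high  # passages in neither group

print(f"{pct_low:.2f}% {pct_high:.2f}% {pct_mixed:.2f}%")
```

The two uniform groups together make up less than half a percent of the corpus, which leaves about 99.5% of the passages with a mixed profile across the five components.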
Coh-Metrix easability components augment readability formulas by pro-
viding a picture of the sources of challenges within texts. As discussed in
earlier chapters, one important distinction between texts and source of
difficulty is their genre. It is well documented that narrative is easier to read
than informational texts (Bruner, 1986; Haberlandt & Graesser, 1985;
Graesser, Olde, & Klettke, 2002). Narrativity captures some characteristics
of oral language (Biber, 1988; Clark, 1996; Tannen, 1982), which tends to be on
familiar, contextualized topics as opposed to the decontextualized language of
print. However, no text is pure in terms of genre (McCarthy, Myers, Briner,
Graesser, & McNamara, 2009). For example, some narrative texts have
informational content that explains the setting or context, and some science
texts have story-like language (e.g., the journey of an animal through a
jungle). Narrativity scores indicate the extent to which a text is likely to
contain more familiar, oral language that is easier to understand.

[Figure 5.1: three bar-chart panels (Language Arts, Social Studies, Science); vertical axis: Easability Percentile Scores (0 to 100); horizontal axis: Coh-Metrix Easability Components.]

Figure 5.1. Coh-Metrix percentile scores for the five components (Narrativity,
Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion)
on 6,755 language arts, 4,463 social studies, and 8,550 science texts from TASA above
DRP grade level 6.

We can visualize differences between text genres using the easability scores.
Figure 5.1 provides the five main Coh-Metrix easability scores (Narrativity,
Syntactic Simplicity, Word Concreteness, Referential Cohesion, and Deep
Cohesion) for a subset of language arts (n=6755), social studies (n=4463), and
science (n=8550) texts above grade level 6 (i.e., using a Degrees of Reading
Power cutoff score of 55.99) from the TASA corpus. These graphs confirm
that the language arts texts tend to have higher narrativity than do the social
studies or science texts. This high narrativity reflects the use of more familiar
words combined with a tendency to focus on events and characters rather
than objects and ideas. By contrast, the social studies and science texts have a
greater density of information and thus lower narrativity.
If a passage is low in narrativity, the reader is potentially left unscaffolded
by world knowledge. In that case, students’ prior domain knowledge in
particular should be considered. While high narrativity scaffolds reading
comprehension by providing more familiar text, at the same time it is
important to recognize the importance of transitioning readers toward less
narrative text (Best, Floyd, & McNamara, 2008; Sanacore & Palumbo, 2009).
Developing readers must learn to understand increasingly complex and
unfamiliar ideas. If a teacher wishes to move the student toward learning to
use knowledge and generate inferences to understand more challenging
text, the teacher may consider where the text falls on the spectrum of
narrativity in terms of the Coh-Metrix easability scores.
Figure 5.1 confirms that science and social studies texts are informational
texts that are low in narrativity. These passages also tend to have somewhat
lower word concreteness because informational texts tend to include more
abstract concepts than do language arts texts. If a student has very little
domain knowledge, teachers may consider using informational texts that help
compensate for vocabulary and mental model deficits. For example, some
informational texts are higher in narrativity and word concreteness than
others are.
Furthermore, other sources of challenges and ease in the text should be
considered, such as syntax and cohesion (O’Reilly & McNamara, 2007).
Similar to the findings in McNamara, Graesser, and Louwerse (2012),
Figure 5.1 also indicates that science texts tend to have less complex syntax
(e.g., shorter, less complex sentences) and higher referential cohesion than the
other two genres. These sources of ease are necessary for informational texts
that contain a good deal of unfamiliar information. Science texts are, by their
very nature, composed of rare words, making it challenging for students to
understand the concepts in the text. For many readers, greater cohesion and
simpler syntax are crucial for this genre of text. Although language arts texts
tend to have more syntactic challenges for the reader and include more
referential cohesion gaps than do science texts, these types of challenges are
generally surmountable for readers with sufficient world knowledge.
Language in narrative texts at the situation model level can compensate for
challenges that arise at other levels.
Interestingly, social studies texts seem to have potential challenges at all
five levels of language. This genre of text does not seem to have a consistent
source of ease to help compensate for those challenges. Likewise, McNamara,
Graesser, and Louwerse (2012) reported that social studies texts have the most
challenging words in comparison to language arts and science texts, but they
are also challenging in terms of syntax and cohesion. Thus, they compensate
for lexical challenges less than science texts do. Authors of texts in domains
related to social studies may assume that their readers possess a sufficient level
of knowledge to make inferences about events in the world such as history,
government, civilization, war, geography, and so on. Indeed, readers who
possess the necessary knowledge are likely to comprehend these challenging
texts. But readers who do not may need additional scaffolding to help
compensate for the multiple challenges that potentially arise in social studies
texts.
Examining easability profiles for genres of texts can illuminate their poten-
tial challenges. In addition to examining groups of texts, we can also examine
differences between individual passages. To provide an example, we can
graph the five easability scores for the two passages in Chapter 1, Lady
Chatterley’s Lover and A Mortgage. The Flesch Grade level scores indicate
that the A Mortgage excerpt is a highly challenging passage with a grade level of
15.05 compared to the excerpt from Lady Chatterley’s Lover at a grade level of
2.91. The latter would imply that Lady Chatterley’s Lover would be appropriate
for a second to third grade reader. However, an average grade level
estimate from 14 excerpts across the novel places the book at Grade 5.

[Figure 5.2: two bar-chart panels (Lady Chatterley’s Lover, A Mortgage); vertical axis: Easability Percentile Scores (0 to 100); horizontal axis: Coh-Metrix Easability Components.]

Figure 5.2. Coh-Metrix percentile scores for the five components (Narrativity,
Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion)
on two excerpts presented in Chapter 1, Lady Chatterley’s Lover and A Mortgage.

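
For readers who want to see where such grade level estimates come from, the sketch below implements the standard published Flesch-Kincaid Grade Level formula from raw counts. The counts in the usage lines are hypothetical and are not taken from either excerpt. How Coh-Metrix itself counts words, sentences, and syllables is not addressed here.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts of words, sentences, syllables."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts: short sentences, short words.
print(round(flesch_kincaid_grade(words=100, sentences=10, syllables=130), 2))

# Hypothetical counts: long sentences, long words.
print(round(flesch_kincaid_grade(words=100, sentences=4, syllables=170), 2))
```

The formula depends only on average sentence length and average syllables per word, which is why two passages that differ greatly in cohesion or narrativity can still receive the same grade level.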
The readability scores provide some indication of the reading skill neces-
sary to tackle these texts. Yet these readability scores do not reveal the
potential sources of the challenges or ease in these short excerpts. The
easability scores in Figure 5.2 convey first that the excerpt from Lady
Chatterley’s Lover is high in narrativity, whereas the excerpt from A
Mortgage is very low in narrativity, just as one would expect. There are
additional sources of challenges in A Mortgage. Sources of difficulty come
from the density of information (i.e., low narrativity), highly complex syntax,
moderate referential cohesion, and very low deep cohesion. These challenges
might be potentially offset for the reader by word concreteness, but more
likely prior knowledge of domains such as accounting would play a large role
in how well a reader understood this passage. The sources of complexity for
the excerpt from Lady Chatterley’s Lover seem to come solely from low
referential cohesion, but these are offset for the reader by syntactic simplicity,
word concreteness, and deep cohesion.
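
This kind of reading of a profile can be sketched in a few lines: given five percentile scores, list the components that fall below a low cutoff as likely challenges and those above a high cutoff as likely supports. The cutoffs of 30 and 70, the function name, and the percentile values are all our own assumptions for illustration. In particular, the values are made up and are not the scores plotted in Figure 5.2.

```python
COMPONENTS = ("Narrativity", "Syntactic Simplicity", "Word Concreteness",
              "Referential Cohesion", "Deep Cohesion")

def summarize(profile, low=30, high=70):
    """Split percentile scores into likely challenges and likely supports."""
    challenges = [c for c in COMPONENTS if profile[c] < low]
    supports = [c for c in COMPONENTS if profile[c] > high]
    return challenges, supports

# Made-up percentiles for illustration only; not the values in Figure 5.2.
hypothetical = {"Narrativity": 85, "Syntactic Simplicity": 75,
                "Word Concreteness": 80, "Referential Cohesion": 15,
                "Deep Cohesion": 90}

challenges, supports = summarize(hypothetical)
print("Challenges:", challenges)
print("Supports:", supports)
```

A summary of this kind only points to where a text may be demanding. Whether a given challenge matters depends on the readers and on the teacher's goals.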
Overall, it may seem from Figure 5.2 that the Lady Chatterley’s Lover
passage would not be challenging. Likewise, the readability estimates placed
it at Grade 5. Both the readability scores and Coh-Metrix miss out on the
qualitative and sociological aspects of Lady Chatterley’s Lover that would
prevent a teacher from assigning it to a Grade 5 reader. In addition, a teacher
would have to consider the knowledge necessary to understand this novel. In
this case, knowledge of D. H. Lawrence’s ill health at the time that he wrote
Lady Chatterley’s Lover, as well as the relatively misogynist and sexually
repressed society of those times, can help a reader understand the deeper
meaning of the story, particularly with respect to current times. Hence, the
Coh-Metrix easability components are informative in that they indicate that
prior knowledge is necessary to understand the passage (i.e., the referential
cohesion is low). But only a qualitative analysis with respect to the potential
readers and a teacher’s pedagogical goals will unveil whether a reading is
appropriate.
As illustrated with the past two examples, Coh-Metrix can be used to better
understand differences between texts at different readability levels, but it can
also be used to understand texts at similar readability levels. Texts often have
the same readability levels but they seem vastly different in terms of the
potential challenges of the text. There are extreme examples where a story
and a science text have the same grade levels but are very different in the skills
that would be called forth to understand the text. A more subtle example
comes from two Common Core State Standards (CCSS) story exemplars, Louisa
May Alcott’s Little Women and Mark Twain’s Tom Sawyer. The sample
excerpts from these stories, provided on pages 77–79 of appendix B
(www.corestandards.org/assets/Appendix_B.pdf), are declared to be at a CCSS 6–8
grade band. Likewise, the Flesch Grade level estimates provided by
Coh-Metrix place Little Women at Grade 7 and Tom Sawyer at Grade 6. Below
are the first sentences from the excerpts provided by CCSS:
Little Women: Merry Christmas, little daughters! I’m glad you began at
once, and hope you will keep on. But I want to say one word
before we sit down. Not far away from here lies a poor
woman with a little newborn baby. Six children are huddled
into one bed to keep from freezing, for they have no fire.
Tom Sawyer: But Tom’s energy did not last. He began to think of the fun
he had planned for this day, and his sorrows multiplied.
Soon the free boys would come tripping along on all sorts of
delicious expeditions, and they would make a world of fun
of him for having to work – the very thought of it burnt him
like fire.
As shown in Figure 5.3, the two excerpts have very different profiles on the
various dimensions, even though they have similar levels of narrativity and
similar referential cohesion demands. The low referential cohesion is typical of narratives that
call for the reader to make inferences about the characters and events in the
story. Many of the events and characters in these stories may be readily
understood by readers. Nonetheless, like Lady Chatterley’s Lover, they were
both written in different times, different societies, and a different dialect than
some readers will know. Such knowledge gaps will greatly affect readers’
ability to fill in the multiple cohesion gaps in these stories.

[Figure 5.3: two bar-chart panels (Little Women, Adventures of Tom Sawyer); vertical axis: Easability Percentile Scores (0 to 100); horizontal axis: Coh-Metrix Easability Components.]

Figure 5.3. Coh-Metrix percentile scores for the five components (Narrativity,
Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion)
on two excerpts from appendix B of the Common Core State Standards, Little Women
and Adventures of Tom Sawyer.

Other than low referential cohesion, the challenges in Little Women arise
primarily at the level of syntax. This is evident in the story from sentences
such as “They were all unusually hungry, having waited nearly an hour, and
for a minute no one spoke, only a minute, for Jo exclaimed impetuously, ‘I’m
so glad you came before we began!’” If a text is low in syntactic simplicity,
students’ level of reading skill should be particularly considered, especially to
the extent that other aspects of the text do not compensate for the challenges.
If a syntactically challenging text is also low in narrativity, then the teacher
may wish to consider whether the students’ reading skill and prior knowledge
are sufficient to tackle that text. However, highly narrative texts with chal-
lenging syntax, such as Little Women, may be optimal for tackling the
pedagogical goal of learning to parse sentences.
In contrast to Little Women, the principal source of difficulty in Tom
Sawyer stems from low word concreteness. Word concreteness refers to
here-and-now concepts, ideas, and things constituting core lexical knowledge
(Toglia & Battig, 1978). Words such as “table,” “chair,” and “street” are more
concrete, in contrast to words such as “love,” “air,” and “mind,” which are
more abstract.
Evidence of abstract words is already apparent in the first two sentences of the
excerpt provided in the CCSS: “But Tom’s energy did not last. He began to
think of the fun he had planned for this day, and his sorrows multiplied.”
Words such as “energy,” “think,” “fun,” and “sorrows” are relatively familiar
words but have abstract connotations. The CCSS calls for students to under-
stand the connotations, denotations, and roles that specific words play in the
text, and these concepts are likely to be represented by more abstract words.
Hence, stories such as Tom Sawyer may be optimal for tackling
inference-making processes about words and concepts in a text.
These two passages were both relatively low in referential cohesion.
However, some passages may have the same grade level estimates and differ
greatly in cohesion. As discussed many times in this book, cohesion is crucial
to comprehension, particularly for readers who have low domain knowledge.
A low-cohesion text should be considered in concert with an understanding
of readers’ knowledge base. If readers have little knowledge, the text is low in
narrativity, and the text is low in cohesion, then comprehension may suffer.
However, with sufficient scaffolding, low referential cohesion can help push
readers to generate inferences to fill in the cohesion gaps (e.g., McNamara,
2004). Consider the following two passages from the Common Core State
Standards (CCSS) informational text exemplars, Discovering Mars: The
Amazing Story of the Red Planet by Melvin D. Berger and Hurricanes:
Earth’s Mightiest Storms by Patricia Lauber, which are provided on pages
70–71 of appendix B (www.corestandards.org/assets/Appendix_B.pdf). These
two exemplars, shown in Figure 5.4, are declared to be at a CCSS 4–5 grade
band. Likewise, the Flesch Grade level estimates provided by Coh-Metrix
place both passages at Grade 5. Below are excerpts from the passages:
Discovering Mars: Mars is very cold and very dry. Scattered across the
surface are many giant volcanoes. Lava covers much of
the land. In Mars’ northern half, or hemisphere, is a
huge raised area. It is about 2,500 miles wide.
Hurricanes: Great whirling storms roar out of the oceans in many
parts of the world. They are called by several names –
hurricane, typhoon, and cyclone are the three familiar
ones. But no matter what they are called they are all the
same sort of storm.

[Figure: two bar-chart panels, Discovering Mars and Hurricanes, plotting
Easability Percentile Scores (0 to 100) against the Coh-Metrix Easability
Components.]

Figure 5.4. Coh-Metrix percentile scores for the five components (Narrativity,
Referential Cohesion, Syntactic Simplicity, Word Concreteness, and Deep Cohesion)
on two excerpts from appendix B of the Common Core State Standards, Discovering
Mars: The Amazing Story of the Red Planet and Hurricanes: Earth’s Mightiest Storms.

These two passages are both informational texts estimated to be at the same
grade level, yet their Coh-Metrix easability profiles are very different. Both
are estimated to have low narrativity, although Hurricanes is more narrative
than Discovering Mars. This difference in narrativity is evident in the
language of the preceding excerpts: Discovering Mars is a relatively dry text
that simply provides the information, whereas Hurricanes is more descriptive.
The two passages have equivalent syntactic simplicity, corresponding
well to the grade level estimates. The two passages are also comparable in
their level of deep cohesion, which is quite low. By contrast, the two differ
greatly in terms of word concreteness and referential cohesion. Discovering
Mars poses additional challenges from abstract words and low referential
cohesion. It is a relatively choppy text consisting of short sentences
that cover unfamiliar and abstract concepts. The Hurricanes passage
includes more concrete concepts and also maintains higher cohesion.
Hence, the Coh-Metrix easability scores indicate that Hurricanes will be
more easily understood by readers, particularly those with less knowledge
about the topic.
Any number of examples can be provided where the texts have the same
grade level estimates, but differ on a variety of dimensions. Grade level
estimates solely indicate whether a typical student or even a particular student
is well matched to a text with the goal of understanding the words or separate
sentences. These estimates do not indicate why comprehension may fail or
flourish, and they do not inform the teacher on how a certain text may or may
not align with either pedagogical goals or the readers in the classroom.
Because traditional readability measures are unidimensional, they provide
little guidance on how to modify instruction based on the difficulty of the text.
By contrast, the Coh-Metrix easability components allow the cohesion of a
text to be understood in concert with its other characteristics and with a
student’s abilities, which can potentially guide instruction.

Conclusion
There is a long history of unidimensional readability metrics that tap
parameters related to challenges at the word and sentence levels. Coh-
Metrix augments our understanding of readability foremost by providing
an estimate of text cohesion, and secondly by providing more specific infor-
mation on the multiple sources of difficulty that may challenge a reader. A
substantial advantage of Coh-Metrix is that it provides metrics on multiple
levels of language and discourse. Such a picture of texts will hopefully provide
educators and researchers with more information about text ease and the
potential challenges in various types of text.
It is crucial for educators to have access to information about the multiple
characteristics of a text, particularly in relation to other aspects of the text and
to the potential ability levels of the students. Narrativity provides information
about whether the reader is more or less likely to be able to use world
knowledge about events and event structures to understand the text.
Likewise, information on the cohesion indicates the degree to which a reader
will need to use knowledge to understand a text. This information can help
teachers align their pedagogical goals to a particular text. Coh-Metrix may
also provide information leading a teacher to use a different text. If a student
has very low domain or world knowledge, teachers may consider texts that
help compensate for vocabulary and mental model deficits.
While school systems and educators have recognized the importance of
text difficulty for decades and implemented any number of systems to grade
level text and assign readers to texts, there have been few efforts that offer
educators a means to understand characteristics of text relative to their
instructional goals as well as their students’ needs and abilities. The time is
ripe to do so, and teachers are calling for it. Our hope is that Coh-Metrix, and
particularly the Coh-Metrix easability metrics, will help improve student
outcomes in educationally meaningful ways.

Using Coh-Metrix Measures

Studies of Cohesion in Text and Writing

We discussed in Chapter 2 the importance of cohesion and coherence to
comprehension and how these findings were the main impetus for developing
Coh-Metrix. Our primary goal in the Coh-Metrix project has been to develop,
explore, and validate measures of text cohesion. Throughout the Coh-Metrix
Project we have developed and implemented many approaches to assessing
cohesion as well as other levels of language and discourse. The magnifying
glass has always primarily been on cohesion, so we have developed literally
hundreds of cohesion indices that vary in generality (see Chapter 4 for the
distinction between measure, index, bank, and variable). Some indices have
targeted one general construct, such as referential cohesion, whereas others
have drilled to a more specific level, such as temporal and verb cohesion.
A significant portion of our efforts has gone toward rooting among the indices
to choose the best ones and validating new ones. When there are many indices
to measure a similar construct, it has been necessary to identify which ones
rise to the top across the various studies and within studies. The indices need
to be validated so that we have some assurance that they assess what we think
they are assessing and that they are theoretically compatible with patterns of
data corresponding to types of texts or human performance. For example,
some studies show how particular indices account for differences between
texts that fit predictions based on theory or well-accepted empirical findings.
Alternatively, some indices are validated by patterns of data in psychological
experiments using behavioral tasks. We have conducted many such valida-
tion studies. This chapter describes some of the studies we have conducted,
particularly as they relate to referential, semantic, and situation model cohe-
sion. The chapter begins by examining measures of cohesion in the context
of empirical text comprehension studies and differences between types of
text. We subsequently describe our work examining the role of cohesion in
writing.

Cohesion and Comprehension


As discussed in Chapter 2, cohesion has an important role in the process and
products of text comprehension. There have been numerous studies showing
that cohesion cues influence the processing of text. In McNamara, Louwerse,
McCarthy, and Graesser (2010), we reviewed studies that had empirically
investigated the effects of text cohesion on comprehension. At the time of the
study, we identified 29 studies on text cohesion in the discourse processing
literature where readers had read relatively long texts (i.e., not textlets
or sentence pairs). We were able to find the texts from 15 of these studies
from the article, the Internet, or by contacting the authors of the studies.
From these studies we collected 19 pairs of high-cohesion and low-cohesion
texts. In our review of these studies, we confirmed effects of cohesion
across a variety of text genres, text manipulation methods, and types of
participants. The benefits of cohesion were robust! Although the results
often depended on the measures used (e.g., recall did not always show the
same effects as inference questions), cohesion improved comprehension
across a wide range of circumstances.
Some of the studies had examined how the benefits of cohesion depended
on individual differences. Among those studies that examined the effects of
prior knowledge, low-knowledge readers benefited more from added cohe-
sion than did high-knowledge readers (e.g., McNamara & Kintsch, 1996;
McNamara et al., 1996; see Chapter 2). By contrast, the studies that included
measures of reading skill tended to show that cohesion benefited readers
regardless of reading skill (Beck et al., 1984; Cataldo & Oakhill, 2000;
Linderholm et al., 2000; Loxterman et al., 1994; cf. O’Reilly & McNamara,
2007). When cohesion of text is manipulated, as it was in these studies, one
unexpected consequence is often an increase in difficulty in terms of word
frequency and syntax. The mean familiarity of the words decreases (for
example, by adding connectives and other discourse markers) and the syntax
becomes more complex (for example, by having embedded clauses referring
to other text constituents). Hence, traditional readability measures such as
the Flesch-Kincaid will predict that the low-cohesion texts will be easier to
read and understand than will be the high-cohesion texts. Nonetheless,
cohesion benefits readers, even when they are relatively less skilled. It
appears that additional cohesive elements do not increase the processing
demands of the text and tend to improve comprehension across a wide range
of circumstances.
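
The mismatch between traditional readability formulas and cohesion can be illustrated with the Flesch-Kincaid Grade Level formula, which depends only on words per sentence and syllables per word. The Python sketch below is illustrative only. The syllable counter is a crude vowel-group heuristic (an assumption, not the method used by Coh-Metrix), and the two text versions are invented for this example rather than taken from the reviewed studies.

```python
import re

def count_syllables(word):
    """Crude heuristic: count runs of consecutive vowels, with a minimum of 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_from_counts(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level computed from raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

def fk_grade(text):
    """Flesch-Kincaid Grade Level of a text, using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)

# Invented example: the second version adds a connective and repeats a noun.
low_cohesion = "The heart pumps blood. The cells get oxygen."
high_cohesion = ("The heart pumps blood, and consequently the cells "
                 "get oxygen from the blood.")

print(round(fk_grade(low_cohesion), 2))
print(round(fk_grade(high_cohesion), 2))
```

In this toy case the version with the added connective and repeated noun receives the higher grade level estimate, even though it makes the causal relation between the two events explicit.
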
One of the goals of the McNamara et al. (2010) study was to examine which
of the Coh-Metrix indices of referential cohesion showed the largest differences
between the low-cohesion and high-cohesion texts, and thus which would be
more predictive of cohesion differences across texts. We included noun, argu-
ment, and stem measures that were crossed with the distance of the overlap
(adjacent, two sentences, three sentences, all distances), as well as whether
overlap should be weighted as a function of distance (i.e., with adjacent overlap
given a higher weight than more distant overlap). All 21 indices showed
significant differences between the high-cohesion and low-cohesion versions
used in the targeted studies, with reported Cohen’s d effect sizes ranging from
0.64 to 1.08. The largest differences were observed for noun and argument
overlap and the smallest differences were observed for stem overlap. This latter
result is likely attributable to the types of manipulations in the targeted studies,
because the experimenters who implemented the changes in the texts likely
increased overlap by repeating the exact words rather than a stem of the word.
Thus, including stem overlap would dilute the differences between the text
versions, and argument overlap would be more precise. Weighting the distance
of the overlap also had an effect, but only for the global cohesion measures
(all distances) wherein weighting the closer overlap in comparison to the more
distant overlap increased the effect sizes. Hence, the cohesion indices were quite
robust and effectively picked up on the differences between the texts. The most
sensitive indices were the noun and argument overlap indices. Although this
may depend on this corpus, argument overlap has often risen to the top in terms
of discriminating between texts in other studies.
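
The logic of crossing overlap with distance and weighting can be sketched in a few lines of Python. This is a simplified illustration, not the Coh-Metrix implementation: it treats any shared content word as overlap, uses a small hypothetical stop-word list in place of part-of-speech tagging, and assumes a weight of 1/distance for the weighted variant (the text states only that adjacent overlap receives a higher weight).

```python
import re

# Hypothetical, minimal stop-word list; a real analysis would use
# part-of-speech tagging to isolate nouns or arguments.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "it", "in", "by"}

def content_words(sentence):
    """Lowercased words of a sentence with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS}

def overlap_index(sentences, max_distance=None, weighted=False):
    """Weighted proportion of sentence pairs that share a content word.

    max_distance=1 considers adjacent sentences only; None considers all pairs.
    If weighted, each pair counts with weight 1/distance (an assumed scheme).
    """
    bags = [content_words(s) for s in sentences]
    total = 0.0
    shared = 0.0
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            distance = j - i
            if max_distance is not None and distance > max_distance:
                continue
            weight = 1.0 / distance if weighted else 1.0
            total += weight
            if bags[i] & bags[j]:
                shared += weight
    return shared / total if total else 0.0

# Invented three-sentence texts.
high = ["The heart pumps blood.", "Blood carries oxygen.",
        "Oxygen reaches the cells."]
low = ["The heart pumps blood.", "Oxygen is carried.", "Cells use it."]

print(overlap_index(high, max_distance=1))   # adjacent sentences only
print(overlap_index(high))                   # all distances, unweighted
print(overlap_index(high, weighted=True))    # all distances, weighted
print(overlap_index(low))
```

In the invented high-cohesion text the overlap is strictly local, so weighting closer pairs more heavily raises the all-distances score relative to the unweighted version.
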
The McNamara et al. (2010) study also examined effects of cohesion using
LSA indices. These results generally followed the patterns found for referen-
tial cohesion measures. However, the LSA paragraph-to-paragraph overlap
and paragraph-to-text overlap did not show differences between the high-
cohesion and low-cohesion texts. Moreover, the sentence measures (sentence
to sentence, all sentences, paragraph, and text) showed smaller differences
compared to the referential cohesion indices. The average effect size for the
referential indices was 0.98, whereas the largest difference observed among
the LSA indices was an effect size of 0.59. In McCarthy et al. (2012), we later
examined the ability of the LSA given/new score (see Chapter 4) to predict the
differences between these low-cohesion and high-cohesion texts, and found
similarly moderate effect sizes (Cohen’s d = 0.39). We assume that this
difference between the referential and LSA measures occurs because LSA
more generously assesses overlap by considering semantically related words,
whereas the referential indices are more stringent semantically. When using
LSA, a sentence is more likely to have some overlap with another sentence.
This is particularly important for the materials investigated in the McNamara
et al. (2010) study because the texts being compared were manipulated
versions of one another; that is, the differences were relatively subtle. This
conclusion concurs with those reported by McNamara, Cai, and Louwerse
(2007), who found that overlap measures more accurately predict local
cohesion, whereas the LSA indices better predict global cohesion.
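
For readers unfamiliar with the effect sizes reported above, the following Python sketch shows one common way to compute Cohen's d: the difference between two group means divided by their pooled standard deviation. It is an illustration under assumptions. The source does not state which variant of d was used for the paired text versions, and the index values below are invented.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
                  / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Invented overlap scores for high-cohesion and low-cohesion text versions.
high_scores = [0.62, 0.55, 0.70, 0.48, 0.66]
low_scores = [0.50, 0.41, 0.63, 0.45, 0.52]

print(round(cohens_d(high_scores, low_scores), 2))
```

By convention, values near 0.2, 0.5, and 0.8 are often described as small, medium, and large effects, respectively.
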
McNamara et al. (2010) also measured cohesion in terms of the incidence
of connectives and the ratio of causal particles to causal verbs (SMCAUSr).
Among the various types of connectives, only causal connectives (CNCCaus)
discriminated between the high-cohesion and low-cohesion texts, presum-
ably because the researchers who created the texts primarily manipulated
causal cohesion and not additive, temporal, or clarification connectives. The
causal ratio index also showed a difference, with an effect size of 0.64. This
latter result indicates that the high-cohesion texts contained more causal
connectives relative to causal verbs, and that these connectives were needed to
express more explicitly the relations between the actions and events
described in the texts.
We conducted a discriminant analysis to examine which of the indices were most
predictive of cohesion differences. A discriminant analysis is a statistical
technique, akin to regression, that predicts a categorical outcome, in this case
whether each text is high or low in cohesion. The results indicated that text
cohesion was predicted best
by a combination of word frequency (WRDFRQmc), LSA similarity (LSASS1),
referential noun cohesion (CRFNO1), and the causal ratio (SMCAUSr). The
high-cohesion texts were higher in cohesion according to LSA, referential
cohesion, and the causal ratio, but contained less frequent (less familiar)
words. This combination of indices appears to capture global, local, and causal
cohesion differences in the text.
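
The idea behind a two-group discriminant analysis can be sketched in plain Python. The example below is a minimal Fisher linear discriminant with two predictors, offered only as an illustration: the analysis reported above used four indices, the feature values here are invented, and the variable names are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two features."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def pooled_scatter(groups, means):
    """2x2 within-class scatter matrix summed over both groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in zip(groups, means):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    return s

def fit_discriminant(low, high):
    """Return (weights, threshold) of a two-class Fisher discriminant."""
    m_low, m_high = mean_vector(low), mean_vector(high)
    s = pooled_scatter([low, high], [m_low, m_high])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m_high[0] - m_low[0], m_high[1] - m_low[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m_low[k] + m_high[k]) / 2 for k in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def classify(features, w, threshold):
    """Assign a text to the high or low group from its discriminant score."""
    score = w[0] * features[0] + w[1] * features[1]
    return "high" if score > threshold else "low"

# Invented (noun overlap, causal ratio) values for eight texts.
low_texts = [(0.20, 0.30), (0.25, 0.45), (0.30, 0.35), (0.15, 0.40)]
high_texts = [(0.55, 0.70), (0.60, 0.85), (0.50, 0.80), (0.65, 0.75)]

weights, cutoff = fit_discriminant(low_texts, high_texts)
print(classify((0.58, 0.72), weights, cutoff))
print(classify((0.22, 0.33), weights, cutoff))
```

The discriminant weights indicate how much each index contributes to separating the two groups, which is the sense in which a combination of indices "predicts" cohesion level.
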
In terms of the Coh-Metrix Project, this study was crucial in validating the
Coh-Metrix indices to provide measures of text cohesion. We acknowledge
that the researchers who modified the texts purposively modified referential
and causal cohesion, so it is not surprising that these measures rose to the
surface. However, from a validation perspective, if they had not, it would
have indicated that our measures had missed the mark. Moreover, the results
give credence to the general empirical claim that referential and causal
relationships play important roles in the difficulty of texts and how they are
comprehended.
Duran, Bellissens, Taylor, and McNamara (2007) provided further evidence
demonstrating the importance of cohesion to comprehension. Coh-Metrix
was used to classify 60 science texts as easy versus hard using Principal
Components Analysis (PCA). The PCA identified a referential cohesion
component and a word concreteness component in the underlying clustering
of the texts. We then chose four topics that included one easy and one hard
text in each of the topics and asked 24 participants to read either the easy or
the hard version for each of the four topics. The easy texts resulted in faster
reading times and better recall compared to difficult texts. The participants
recalled more from the easy texts, and there was a greater overlap between the
text and recall according to LSA measures. This study is different from prior
studies because cohesion was not manipulated; instead it was naturally
occurring in the texts. When topic was controlled, cohesion and word
concreteness, as measured by Coh-Metrix, predicted the level of text
difficulty. This study was followed by our work to develop measures of text
readability and reading ease, as reported in Chapter 5.
In summary, the validity of the Coh-Metrix cohesion indices has been
established across a number of studies, including the study conducted by
McNamara et al. (2010). Coh-Metrix has also been used across a variety of
studies to control and verify the cohesion of texts when experimentally exam-
ining the effects of cohesion and text difficulty on comprehension. These studies
confirm the power of Coh-Metrix as a tool to provide information about the
cohesion and difficulty of a text. They also simply point to the importance of
considering the cohesion of texts to estimate their potential challenges to
comprehension.

Cohesion and Genre


We have conducted a number of studies to examine the differences between
genres of texts. The predictors of genre that we examined primarily concern
cohesion, but we also explored other levels of language, such as word and
sentence measures. Genre
refers to the category of text (Biber, 1988; Pentimonti et al., 2010), such as
whether the text is primarily narrative (e.g., novels, folktales), expository
(e.g., textbooks, journal articles), persuasive (e.g., editorials, sermons), or
descriptive. As discussed in Chapter 5, the genre of a text can be informative,
particularly with regard to its difficulty. For example, narrative text is sub-
stantially easier to read, comprehend, and recall than are other genres of
text such as science, history, and other expository domains (Graesser &
McNamara, 2011; Haberlandt & Graesser, 1985). This ease of understanding
for texts that are more narrative in nature follows from a number of factors:
the words are generally more familiar; the concepts and events are generally
concrete, experiential, and familiar rather than abstract and unfamiliar; and
narratives discuss people, places, and events that are embodied in the real
world and lives of the reader, in some form or another (Bruner, 1986;
Graesser, Hoffman, & Clark, 1980; Rubin, 1995; Tonjes, Wolpow, & Zintz,
1999). Can we capture those differences using Coh-Metrix?
Across a number of studies, we have found that there are many linguistic
features that strongly discriminate between text genres. For example, in
Dempsey, McCarthy, and McNamara (2007), we found that phrasal verbs
alone successfully distinguished between genres. Indeed, across our explo-
rations comparing corpora with different genres, such as narratives and
informational texts, it is not uncommon for every Coh-Metrix variable to
show significant and meaningful differences between the genres. Genres are
different – very different. And Coh-Metrix picks up on that. So how are they
different?
Lightman, McCarthy, Dufty, and McNamara (2007) examined the distri-
butions of cohesion and text difficulty in narrative, science, and history
textbooks across the beginning, middle, and end of each chapter. We
expected that the three genres would show different flows of readability and
cohesion challenges across the chapters. We examined the readability of the
text in terms of Flesch-Kincaid Grade Level (see Chapter 5) and cohesion
using argument overlap and LSA. As expected, the science and history texts
were more difficult than the narratives in terms of Flesch-Kincaid grade
levels. Thus, the words were more familiar and the sentences were simpler
in the narrative texts. However, the science texts were also more cohesive.
They contained more overlap in words and concepts than did both the
history and narrative texts. The cohesion in science texts is necessary in
order to scaffold the reader who is confronted with more unfamiliar and
challenging concepts (e.g., McNamara & Kintsch, 1996). Whereas the science
texts showed higher cohesion, it was interesting that the history texts did not,
despite similar readability challenges as observed in the science texts. Thus,
when reading the history texts, readers may not be scaffolded by cohesion as
well as they should.
When Lightman et al. (2007a) examined text difficulty and cohesion across
the chapters – that is, the flow of challenges in the texts – they found that the
science and history textbooks showed an increase in difficulty at the word and
sentence levels as well as a decrease in cohesion across each chapter. Hence, as
each chapter progressed, the text became more difficult at all levels. The
narrative texts, by contrast, displayed a linear decrease in grade level
difficulty across chapters and only a slight decrease in cohesion. These results
suggest that texts in both expository domains gradually rise in complexity as
they develop. They also provide one example showing how the linguistic
properties and structural characteristics of narrative fiction differ from
those of expository textbooks. Although science texts are clearly more challenging overall,
the content in science texts appears to be introduced slowly, with simpler,
more readable writing early on in a chapter.
McNamara, Graesser, and Louwerse (2012) further examined the differences
between science, social studies, and narrative texts. They examined text
excerpts across grade levels 1 to 12 (defined by Degrees of Reading Power; see
Chapter 5). Like Lightman et al. (2007a), they found that the cohesion
levels of the science texts were higher than those of the history and narrative
texts, and that the social studies texts were equivalent to the narratives in
cohesion. The narrative texts had the least challenging words but the most
challenging sentences. By contrast, social studies texts had the most
challenging words in terms of both familiarity and concreteness. Social studies
texts were as challenging as, or more challenging than, science texts at the
sentence level, and they contained greater challenges at the lexical level.
Thus, social studies texts compensated less at the sentence level for the
difficulty of their words than did the science texts. These differences between genres were similarly captured
by the Coh-Metrix Easability scores in Chapter 5 (Graesser, McNamara, &
Kulikowich, 2011).
McNamara, Graesser, and Louwerse (2012) also found that referential
cohesion increased across the grade levels. That is, texts at the lower grade
levels tended to have lower cohesion than did texts at higher grade levels. This
is counterintuitive because readers at lower levels potentially have a greater
need for cohesion than do readers at higher grade levels. The lower cohesion
is partially attributable to the shorter length of the sentences: Short
consecutive sentences are less likely to overlap than longer sentences are. The
pattern may also arise because, as readability challenges increase (i.e., DRP
grade level), they increase at both the word and sentence levels. This in turn
increases the need for cohesion: The reader needs more
scaffolding to aid in filling in the gaps in the text and forming a coherent
textbase.
McNamara, Graesser, and Louwerse (2012) also developed and examined
measures of verb cohesion (SMCAUSlsa; SMCAUSwn; see Chapter 4). They
hypothesized that verb cohesion would be more important for texts encoun-
tered by younger readers because actions and events would be more prominent
in these texts than would objects. An example along those lines is a text for a
young reader such as “Horses eat hay. Chickens eat grain. Mice eat cheese.”
There is little referential overlap but perfect verb overlap. As expected, they
observed that verb cohesion was greater in the earlier DRP grade texts than in
the later grade texts. Thus, the results suggested that the lower referential
cohesion in the lower-grade-level texts may be in some part compensated for
by greater verb cohesion, shorter sentences, and more frequent words.
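
The contrast between referential and verb overlap in the “Horses eat hay” example can be checked directly. The Python sketch below uses hand-tagged tokens and exact word matching, which is a simplification: the Coh-Metrix verb cohesion indices named above rely on LSA and WordNet rather than exact repetition, and the tags here are supplied manually for illustration.

```python
# Hand-tagged (word, tag) tokens for the three example sentences.
sentences = [
    [("horses", "NOUN"), ("eat", "VERB"), ("hay", "NOUN")],
    [("chickens", "NOUN"), ("eat", "VERB"), ("grain", "NOUN")],
    [("mice", "NOUN"), ("eat", "VERB"), ("cheese", "NOUN")],
]

def adjacent_overlap(tagged_sentences, tag):
    """Proportion of adjacent sentence pairs sharing at least one word with `tag`."""
    word_sets = [{w for w, t in s if t == tag} for s in tagged_sentences]
    pairs = list(zip(word_sets, word_sets[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)

noun_overlap = adjacent_overlap(sentences, "NOUN")
verb_overlap = adjacent_overlap(sentences, "VERB")
print(noun_overlap, verb_overlap)  # 0.0 1.0
```

No adjacent pair shares a noun, while every adjacent pair shares the verb, matching the description of little referential overlap but perfect verb overlap.
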
Further evidence for the importance of verb cohesion comes from the
principal component analysis conducted by Graesser, McNamara, and
Kulikowich (2011; see Chapter 5). Verb cohesion was one of the eight principal
components that emerged from the analysis conducted on the 37,520 TASA
texts. This result indicates that verb cohesion is an important factor in
accounting for variance in differences between texts.
Duran, McCarthy, Graesser, and McNamara (2007) examined temporal
cohesion across science, history, and narrative text genres. Temporality is
important because of its crucial role in organizing language and discourse.
Most theories of text comprehension consider temporality to be one of the
critical dimensions for building a coherent mental representation of events
that are described in texts, particularly in narrative texts (Zwaan &
Radvansky, 1998). In English, temporality is partially represented through
inflections and tense morphemes (e.g., “-ed,” “is,” “has”). The temporal
dimension also depicts unique internal event time frames, such as an event
that is complete or ongoing, by incorporating a diverse tense-aspect system
(ter Meulen, 1995). The occurrence of events at a point in time can also be
established by a large repertoire of adverbial cues, such as “before,” “after,”
“then” (Klein, 1994). These temporal features provide several different indices
of the temporal cohesion of a text.
To investigate differences in temporality across genres, Duran et al. (2007)
asked experts in discourse processing to rate 150 texts in terms of temporal
coherence on three continuous scales designed to capture unique representations
of time. These evaluations established a gold standard of temporality.
A multiple regression analysis using Coh-Metrix temporal indices
significantly predicted human ratings of temporal coherence. The predictors
included in the model were a subset of five temporal cohesion features
generated by Coh-Metrix: incidence of temporal expression words (“next,”
“following,” “yesterday,” “now,” “Monday,” “noon,” “week”), incidence of
positive temporal connectives (“before,” “then,” “later”), temporal adverbial
phrases (“in a moment,” “sooner or later”), incidence of past tense (“awoke,”
“began,” “saw”), and incidence of present tense (“look,” “move,” “talk”).
All but one of the predictors (i.e., the incidence of positive
temporal connectives) significantly predicted the expert ratings of temporal
coherence. The indices accounted for 40% to 64% of the variance in the
experts’ ratings (depending on the type of rating). The study thus demon-
strated that the Coh-Metrix indices of local, temporal cohesion significantly
predicted human interpretations of temporal coherence, thereby validating
these Coh-Metrix measures of temporality.
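
An incidence-style temporal index of the kind listed above can be sketched as a simple count normalized by text length. In the Python example below, the word lists contain only the examples quoted in the text (not the full lists used by Coh-Metrix), normalization per 1,000 words is assumed, and the sample passage is invented.

```python
import re

# Only the example words quoted in the text; not the complete word lists.
TEMPORAL_EXPRESSIONS = {"next", "following", "yesterday", "now",
                        "monday", "noon", "week"}
TEMPORAL_CONNECTIVES = {"before", "then", "later"}

def incidence_per_1000(text, vocabulary):
    """Occurrences of vocabulary words per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in vocabulary)
    return 1000.0 * hits / len(words)

# Invented sample passage (14 words).
sample = ("Yesterday the storm grew. Then it moved north, "
          "and later it faded by noon.")

print(round(incidence_per_1000(sample, TEMPORAL_EXPRESSIONS), 1))
print(round(incidence_per_1000(sample, TEMPORAL_CONNECTIVES), 1))
```

Scores of this kind, computed for each text, are what would serve as predictors in a regression against the expert ratings.
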
A discriminant analysis further indicated that the temporal cohesion
indices were highly predictive of text genres (i.e., science, history, and narra-
tive), and were able to classify texts as belonging to a particular genre with
very good reliability (i.e., recall and precision ranged from 0.47 to 0.92, with
an average F-measure of 0.68). The results indicated that narrative and
science texts were most different in terms of temporality, whereas history
and narrative texts were more similar. Science texts contained fewer temporal
adverbial phrases compared with narrative and history texts, whereas narra-
tive texts contained more than history texts. Narrative texts also contained
more positive temporal connectives than did the other two types. This
suggests that temporal adverbial phrases and temporal connectives are stylistic
markers of narration. The incidence of present tense was higher in science
texts than in both history and narrative texts, whereas the incidence of past
tense was higher in narrative texts. This makes sense because stories often tell
of past events whereas science is prone to articulate generic, timeless truths.
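For readers unfamiliar with the F-measure reported above, here is a minimal sketch of how it combines precision and recall, assuming the balanced (F1) form, that is, the harmonic mean of the two. The per-genre values in the example are hypothetical and were chosen only to fall within the reported 0.47 to 0.92 range; they are not results from the study.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs, for illustration only.
example = {
    "science": (0.92, 0.80),
    "history": (0.47, 0.60),
    "narrative": (0.75, 0.70),
}

for genre, (p, r) in example.items():
    print(genre, round(f_measure(p, r), 2))
```

Because the harmonic mean is pulled toward the lower of the two values, a classifier cannot earn a high F-measure by maximizing precision at the expense of recall, or the reverse.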

cohesion and differences between specific types of texts
As the Coh-Metrix project has explored the use of Coh-Metrix indices to
discriminate between types of texts and discourse patterns, the degree of
differentiation has become more fine-grained. There have been numerous
studies that have identified language and discourse characteristics of specific
types of text, as will be illustrated in this section.
Crossley and colleagues (Crossley, Allen, & McNamara, 2012; Crossley,
Louwerse, McCarthy, & McNamara, 2007; Crossley & McNamara, 2008) used
Coh-Metrix to distinguish two types of passages used in second-language
learning textbooks: simplified passages and authentic passages. Simplified
passages are those that have been modified for second-language learners
to be easier to read. Indeed, several studies have reported comprehension
advantages for simplified as compared to authentic versions of texts for
second-language learners (Long & Ross, 1993; Tweissi, 1998; Yano, Long, &
Ross, 1994). However, the linguistic features of simplified texts were largely
unknown because second-language texts are often simplified using intuition
and without strict guidelines (Crossley, Allen, & McNamara, 2012). The
research by Crossley and colleagues using Coh-Metrix has indicated that
authentic texts tend to be syntactically more complex and include more
logical connectives, whereas simplified texts are characterized by higher levels
of referential and semantic cohesion, greater redundancy (e.g., lower lexical
diversity, higher G/N ratio), and lower levels of lexical sophistication (e.g.,
higher word frequency). Simplified texts provide second-language learners
with higher cohesion and more common connectives while at the same time
using more frequent, familiar words and less complex syntax than do
authentic texts. Second-language learning theorists and researchers have been
divided over whether to use authentic or simplified reading texts for beginning
and intermediate-level second-language learners. If the objective is to
facilitate comprehension, then the results of the studies conducted by
Crossley and colleagues indicate that simplified texts have clear advantages.
Graesser, Jeon, Yang, and Cai (2007) used Coh-Metrix to examine cohesion
in tutorial dialogues collected while students were being tutored in AutoTutor,
an intelligent tutoring system. AutoTutor engages students in a dialogue while
tutoring topics such as Newtonian physics, computer literacy, or critical think-
ing (Graesser, Chapman, Hayes, & Olney, 2005; Graesser, Jeon, & Dufty, 2008;
VanLehn et al., 2007). The system presents a series of problems and questions
that require the student to answer using verbal explanations. AutoTutor
engages the student via an animated agent in a dialogue that moves the student
toward constructing the correct answer, a process that typically takes about
100 conversational turns. The tutor attempts to induce students to generate
ideal answers to difficult questions requiring deep reasoning by using a variety
of dialogue moves, such as feedback, hints, prompts, assertions, corrections,
and answers to student questions.
Graesser et al. (2007) compared the tutorial dialogues of high-knowledge
students who had already taken the relevant topics in a college physics class
with those of novice students who had not taken college physics. Analyzing
the cohesion relations in the dialogues allowed them to better understand the
effects of college students’ background knowledge during the tutoring inter-
actions between the student and the pedagogical agent in AutoTutor. The
Coh-Metrix analysis indicated that the tutorial dialogues of high-knowledge
students shared substantially similar linguistic features with the dialogue of
novice students in referential cohesion, syntax, connectives, causal cohesion,
logical operators, and other measures. In contrast, there were significant
differences in the dialogue with high-knowledge students versus novice
students in semantic or conceptual overlap as measured by LSA. This result
suggests it is the more global or inferential level of meaning that differentiated
the discourse with students with differing physics knowledge, a result that is
compatible with conclusions in Chapter 2. Once again, this result supports
the notion that background knowledge on subject matter promotes deeper
levels of comprehension (i.e., the situation models and mental models) of
conceptual physics while interacting with AutoTutor (Graesser, Jeon, Cai, &
McNamara, 2008; Jeon, 2008).
Graesser, Jeon, Yang, and Cai (2007) also compared the cohesion and
language of dialogues with AutoTutor and three other types of discourse on
the very same physics topics: tutorial interaction between humans, a popular
textbook on physics, and physics texts prepared by Kendeou and Van den
Broek (2009) for psychological experiments. They discovered that the
Coh-Metrix profiles were very similar for college students interacting with
AutoTutor versus a human tutor, and were very similar for the two texts that
deliver information in a monologue (the physics textbook and the experiment-
ers’ texts), but radically different for tutorial dialogues versus monologue
texts. Compared to the tutoring discourse, the two expository monologues
tended to be less fragmented, have more complex sentence syntax, and have
higher referential and situation model cohesion. Some of these differences are
compatible with the reported differences between print and oral language that
were identified in the early 1980s (Tannen, 1982). These results further confirm
the utility of the Coh-Metrix measurement profiles in discriminating different
types of texts and discourse registers.
Another style of discourse is related to truth versus deception. Duran, Hall,
McCarthy, and McNamara (2010) investigated whether cohesion and other
Coh-Metrix indices discriminated between truthful dialogues and dialogues in
which one person was being deceptive. The deceptive and truthful conversational dialogues
were collected by Hancock, Curry, Goorha, and Woodworth (2007) within
an instant-messaging (IM) environment. Hancock and colleagues’ study
included 66 students who were randomly paired to create 33 same-sex inter-
locutor pairs. Each interlocutor was placed in a separate room to communi-
cate about various conversation topics using IM. One person in the dyad was
assigned the role of the sender to initiate and maintain the conversation, and
the other was the receiver. The sender was instructed to be truthful on two
topics and deceptive on the other two topics.
Duran et al. (2010) used Coh-Metrix to examine which indices were
predictive of the use of deception. The results indicated that the linguistic
features that characterized the deceptive exchanges were substantially differ-
ent from those that characterized the truthful ones. When the sender was
instructed to be deceptive, the conversational dialogues of both the sender
and receiver were characterized by (a) more words overall, but fewer words
used per conversational turn; (b) more meaningful words; (c) greater syntac-
tic complexity; and (d) lower cohesion (as measured by LSA given-new). The
latter results indicated that deceptive dialogues contain more information
related to preceding context. The deceptive dialogues were not characterized
by higher referential cohesion, and so the deceivers did not seem to reiterate
or repeat information, but rather tended to include fewer semantic focal
points. Duran et al. hypothesized that the truthful events were more extensively
linked in memory than were the fictitious details comprising the lies. When
recounting a truthful story, one detail reminds the sender of a related one,
rendering more information available to include in the account. By contrast, a
deceptive story is being constructed on the fly, without the benefit of a
coherent memory structure to cue a variety of details concerning the event.
The greater syntactic complexity in the deceptive conversations seemed to be
associated with stalling. That is, because the deceiver could not rely on a
coherent truthful story, utterances tended to be extended until the relevant
details could be constructed.
There are many other studies aimed at discriminating between variations
in language (Crossley & McNamara, 2012a, 2012b; Crossley, Salsbury, &
McNamara, 2010a, 2010b, 2010c; Hall, McCarthy, Lewis, Lee, & McNamara,
2007; McCarthy et al., 2009). For example, Hall et al. (2007) used Coh-Metrix
indices to examine variations in American and British English, specifically in
texts regarding the topic of law. The corpus included 400 American and
English/Welsh legal cases. As one might expect, the results confirmed that
there were substantial differences between the two. A discriminant function
analysis including five indices of cohesion (referential, causal, syntactic,
semantic, and lexical diversity) correctly classified 85% of the texts in the
test set. Specifically, the British texts contained more cohesion cues than did
the American legal texts. Thus, cohesion was found to be an important and
highly significant predictor of differences between American and British
English, at least in the context of law. This and other studies have provided
evidence that Coh-Metrix indices, and particularly cohesion indices, successfully
differentiate between closely related registers.

cohesion and language in writing


The texts that we analyze using Coh-Metrix are often finished, edited prod-
ucts that we find in textbooks, books, journals, newspapers, and so on.
Experienced writers produce text that is like a finished product. But many
writers do not. Learning to write is a process that takes time, instruction,
practice, and feedback. Coh-Metrix has served to better understand that
developmental process. A major proportion of research on writing examines
cognitive and behavioral processes that occur during the writing process as
well as individual differences, such as working memory, that moderate those
processes. By contrast, our focus is on the written product and inferring from
that product the processes in which the writer might have engaged, differ-
ences between writers, and, more importantly, the feedback that may be most
helpful to particular writers (see Crossley & McNamara, 2011 for a review).
Much of this work has been in the context of building a writing strategy
tutoring system called the Writing Pal (McNamara et al., 2011). The Writing
Pal is a game-based intelligent tutoring system that provides training and
practice in using writing strategies, as well as practice and feedback in writing
prompt-based persuasive essays. The Writing Pal project has targeted
prompt-based essays because they are often used to assess writing skill in
high school and college. These essays are generally time limited (the writer is
given 25 minutes to complete the essays) and on relatively familiar topics,
such as the significance of heroes and celebrities, or the value of choices in life.
The writer is asked to take a position on a particular question and support
that position with evidence and examples. Relatively successful essays are
approximately five paragraphs and contain about 700 words. Very poor
essays contain few paragraphs and may contain fewer than 250 words. Our
goal has been to understand the linguistic properties of essays so that mean-
ingful and impactful feedback can be provided to students on the strategies
that they should use to improve the essay and their writing.
We have collected essays from various sources and populations of emerg-
ing writers (e.g., high school students, college students) and examined differ-
ences between those essays as a function of a number of variables, such as the
writers’ age or grade level, whether they were English-speakers or English-
language learners, the prompt to which they were responding, and so on. One
of the goals of this research has been to investigate the role of cohesion in text
produced by developing writers, including young writers and second-
language writers. On the one hand, it might be expected that cohesion is
positively related with the quality of the essay. Cohesion facilitates text
comprehension, and thus better writers might be expected to provide more
cohesive cues in their writing. Indeed, an intuitive assumption, and one made
by many experts in English Language Arts and Composition, is that cohesion
is an essential component of writing. Cohesive cues such as lexical and
semantic overlap and the use of connectives have often been assumed to be
crucial components of higher-quality writing. Higher-quality writing has the
experiential quality of being more coherent and better organized. However,
as discussed in Chapter 2, it is important to distinguish between the cues that
are observed in the text or discourse (i.e., cohesion) and the connections that
are formed in the mind of the reader or listener (i.e., coherence). Many have
assumed that the coherence of higher-quality writing is grounded in cohesive
cues in the writers’ text. Coh-Metrix and other text analysis tools provide the
means to investigate the role of cohesion and other linguistic features in
essays produced by developing writers.
McNamara, Crossley, and McCarthy (2010) examined which linguistic
features were most predictive of essay quality for 120 college student writers
who wrote take-home (untimed) essays. They found that better essays were
more syntactically complex, had a greater diversity of words, and included
more rare, unfamiliar words. Hence, the more skilled writers were more
sophisticated in the words they used and the sentences they used to convey
ideas. By contrast, no measures of cohesion from Coh-Metrix correlated with
essay quality. These results indicated that higher-quality essays were more
likely to contain linguistic features associated with text difficulty and sophis-
ticated language. However, cohesion was unrelated to essay quality.
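As a rough illustration of what "diversity of words" means computationally, the sketch below computes the type-token ratio, the simplest lexical diversity measure: the number of distinct words divided by the total number of words. This is offered only as an assumption-laden proxy and is not the diversity index used in the study; the tokenizer is likewise a simplification.

```python
import re


def type_token_ratio(text):
    """Distinct word types divided by total word tokens (0.0 for empty text)."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


print(type_token_ratio("the cat saw the dog"))
```

The type-token ratio tends to fall as texts get longer, because common words repeat. For that reason it is best compared across texts of similar length, or replaced by a measure designed to be less sensitive to length.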
In another study, Crossley, Weston, McClain-Sullivan, and McNamara
(2011) examined differences in writing quality as a function of the develop-
ment of the writer. They compared essays written by 9th grade, 11th grade,
and college students. As expected, the essays increased linearly in quality as a
function of grade level of the writer. Fortunately, the college students wrote
higher-quality essays than did the 11th grade students, who in turn wrote
better essays than did the 9th grade students. Students indeed improve in
writing quality, according to the human ratings, from ninth grade to college. The
Coh-Metrix analyses showed that the ninth grade essays were characterized
by higher word frequency (i.e., more familiar words) and lower syntactic
complexity (i.e., simple sentences). Similar to McNamara et al. (2010), college
students’ essays were more syntactically complex, had a greater diversity of
words, and included more rare, unfamiliar words. In addition, cohesion
decreased as a function of grade level. The ninth grade essays included
more explicit cohesive cues such as connectives and word overlap, whereas
the college student essays included the least cohesive cues (see also Crossley,
Roscoe, Graesser, & McNamara, 2011). Thus, the writers’ sophistication in
language use increased across the grades, but the use of explicit cohesion cues
decreased.
Research studies examining essays by second-language writers have yielded
similar results. Crossley and McNamara (2012) examined 344 essays written
by high school students taking the Hong Kong advanced level examination
(HKALE) designed to assess ELL students’ ability to understand and use
English. The essays were graded by trained raters from Hong Kong on a
seven-point scale. In this study, only essays between 485 and 555 words were
included to control for the effects of text length and only essays that were
given a score between 1 and 6 were included, excluding those given a failing rating
called unclassifiable. The results indicated that the principal indicators of
essay quality were related to lexical sophistication, including greater lexical
diversity (i.e., D), fewer familiar words, more infrequent words, and fewer
meaningful words. However, cohesion indices such as content word overlap,
LSA G/N, and aspect repetition were negatively related to essay quality. Higher-
quality essays had lower cohesion. Thus, just as with the English-language
writers, second-language writers’ sophistication in language use increased as
their English proficiency increased, but the use of explicit cohesion cues
decreased.
In sum, both native and second-language writers are aware of and able to
use cohesive cues in their writing early in their development. Indeed, research
on native-language writers indicates that children learn and use cohesive
devices in their writing as early as Grade 2 and continue developing in their
use at least until around Grade 8 (King & Rentel, 1979; McCutchen, 1986;
McCutchen & Perfetti, 1982). After approximately Grade 9, however, it
appears that the use of these cues decreases as they become more proficient
writers (see also Freedman & Pringle, 1980). At the same time, they learn and
are able to use more sophisticated language such as rare words, more diverse
words, and more complex syntax. Likewise, research on second-language
writers indicates that more proficient second-language writers show greater
lexical diversity (e.g., Jarvis, 2002). The decrease in the use of explicit cohesive
cues indicates that skilled writers increase in their awareness of when these
cues are needed to support comprehension.
One important consideration in the evaluation of writing is the distinction
between cohesion and coherence. One question asked by Crossley and
McNamara (2010) was whether essay graders’ judgments of essay coherence
were related to essay quality. As we have explained, cohesion refers to the
presence or absence of explicit cues in a text. Coherence refers to the under-
standing that the reader derives from the text. It is that understanding
that would contribute to an essay grader’s score. The strongly held views
that essay quality is related to cohesion may be driven by raters’ sense of essay
coherence rather than by the presence of cohesive cues in the essay. Crossley and
McNamara examined the essay rubric scores from 184 essays written by
college students. The essay graders rated the essays using a rubric including
14 items within 3 subsections: structure, content, and conclusion. Included
among the items were two measures of coherence: reader orientation (defined
for the rater as the essay’s overall coherence and ease of understanding) and
continuity (defined as the strength of connection of ideas and themes within
and between the essay’s paragraphs). The strongest predictor of essay quality
was reader orientation, which had a .80 correlation with the graders’ holistic
score and predicted 65% of the variance in a regression analysis. The graders’
ratings of continuity correlated .65 with essay quality but did not account for
unique variance in the regression analysis. These results confirmed that
coherence is an important element of human judgments of essay quality.
However, the raters’ judgments of coherence were negatively related to
indices related to text cohesion.
The results reported by Crossley and McNamara (2010) indicated that
raters’ sense of coherence was positively related to essay quality, but Coh-Metrix
cohesion indices were negatively related. In addition, there is a great
deal of variance in writing that is not accounted for by Coh-Metrix indices:
The algorithms using Coh-Metrix to analyze writing have typically accounted
for only 20–30% of the variance in writing quality. These outcomes may be
attributable to a number of factors. One consideration regards the subjectivity
of writing evaluation. To assign a score, essay graders must interpret and
judge a multitude of unique qualities that comprise the essay. Without train-
ing, graders are unlikely to assign the same scores to the same essays con-
sistently (Huot, 1996; Meadows & Billington, 2005). This lack of consistency
is remediated by the use of detailed scoring rubrics and extensive training.
Thus, the particular essay features emphasized on the rubric may influence
the raters’ assessment of essay quality.
In addition, the particular essay features that influence scores are likely to
be influenced by the genre of the essay. The Writing Pal research has focused
on persuasive essays because that genre is often used to assess writing ability.
However, students are asked to produce a variety of writing genres. We
know that text genres differ widely in linguistic features. We can assume
that the features of essays that influence expert raters’ ratings of quality will
differ between writing genres. For example, the features of persuasive essays
that are weighed by essay graders are likely to be different from the features of
informational essays. Just as cohesion is more crucial to the comprehension of
informational text than to narratives, cohesive cues may have a greater
influence on raters’ assessments of the quality of informational text than they
seem to have on their assessments of persuasive writing.
A further consideration regards the difference between text difficulty and
essay quality. Coh-Metrix focuses on the assessment of text difficulty, and
thus the majority of its indices are related to text difficulty. However, text
quality is a different construct from text difficulty. Hence, our current efforts
are turning toward developing tools that specifically focus on indices that are
more predictive of the quality of writing. These efforts include the develop-
ment of global cohesion measures that examine the overlap between each of
the paragraphs (e.g., the overlap between the introduction and the conclu-
sion) and contextual cohesion measures that examine the overlap between the
prompt and different parts of the essays. In addition, we are developing
measures of rhetorical cues in the writing, such as the use of exemplification,
convincing arguments, description, narrations, and so on. We expect that
indices that are more strongly related to writing will provide stronger predictors
of writing quality.
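The idea behind a global cohesion measure can be sketched in a few lines. The example below compares the introduction and conclusion of an essay by the overlap of their content words (Jaccard overlap). This is only a stand-in under stated assumptions: the stopword list is a tiny hypothetical sample, paragraphs are assumed to be separated by blank lines, and simple word overlap is used here in place of the semantic overlap measures under development, which are not specified in this chapter.

```python
import re

# Tiny illustrative stopword list (an assumption; fuller lists are used in practice).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that", "it"}


def content_words(paragraph):
    """Return the set of lowercase word tokens that are not stopwords."""
    tokens = re.findall(r"[a-z']+", paragraph.lower())
    return {tok for tok in tokens if tok not in STOPWORDS}


def overlap(par_a, par_b):
    """Jaccard overlap: shared content words divided by all content words."""
    a, b = content_words(par_a), content_words(par_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def intro_conclusion_overlap(essay):
    """Overlap between the first and last paragraphs (blank-line separated)."""
    paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return 0.0
    return overlap(paragraphs[0], paragraphs[-1])


essay = "Heroes inspire people.\n\nSome body text here.\n\nHeroes inspire courage."
print(intro_conclusion_overlap(essay))
```

The same `overlap` function could be applied to the prompt and any part of the essay to approximate a contextual cohesion measure of the kind described above.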
We already have some evidence of success in that arena. For example,
Crossley and McNamara (2011) replicated the findings reported in their 2010
study, including a larger corpus of 315 essays and a different scoring rubric.
The rubric in this study comprised 10 items, including one that was most
representative of coherence, called organization, defined as the degree to
which the body paragraphs follow the plan set up in the introduction of the
essay. Organization correlated 0.77 with the graders’ holistic score and was
the strongest predictor, explaining 60% of the variance in a regression
analysis. As found previously, Coh-Metrix cohesion indices showed either
low or negative correlations with essay coherence. However, measures of
semantic overlap between paragraphs in the essays were positively correlated
with essay coherence (see also McNamara, Crossley, & Roscoe, 2013).

conclusion
This chapter has focused on studies that have validated or made use of the
Coh-Metrix measures of cohesion. We did not include every Coh-Metrix
study involving cohesion indices, and we did not describe the multitude of
studies that have focused on other indices. We focus here on cohesion because
it is central to the purpose of Coh-Metrix. Cohesion measures are a unique
contribution of the Coh-Metrix tool and project. In our laboratory, the
measures are often used to assess the features of texts used in the context of
experimental studies of text comprehension. Coh-Metrix has also been used
in the context of a variety of corpus studies including validation studies,
exploratory studies, and natural language studies. This chapter has described
a plethora of studies that have shown that cohesion is an important feature of
text and discourse. These studies collectively demonstrate that Coh-Metrix
indices serve as valid proxies for their intended constructs, and that what they
measure is predictive of types of texts and human performance in theoret-
ically guided directions.

part ii

A BEGINNER’S GUIDE TO WRITING COH-METRIX RESEARCH

The Strategy

Moves, Frozen Expressions, and the Elevator Pitch

By now you should have a fair idea of what the Coh-Metrix tool is, what it is
for, where it all came from, and how to use it. However, knowing how to
operate a text analysis tool like Coh-Metrix and knowing how to write up a
research paper using a tool like Coh-Metrix are two very different things. In
this part of the book, our goal is to show you how to write such a paper. What
we have in mind is a short project paper, the kind of paper that would serve
well as a term paper, a conference proceedings manuscript, or even the basis
of a journal article, thesis, or dissertation.
A term paper, a conference proceedings manuscript, a journal article, a
thesis, and a dissertation may all sound like very different composition types.
However, there is a remarkably similar thread that runs through each of them.
After all, whatever the Coh-Metrix project is, there is still the need to inform
the project’s audience of such questions as What is the project about?,
Why was it done?, How was it done?, What are the results?, and What does
it all mean? In many ways then, whether writing something as short as an
abstract or as long as a dissertation, the key aspects of a research paper
are almost always present. It is those key aspects, questions, or communica-
tion moves (Swales, 1981, 1990) that we will be highlighting and discussing
in this part of the book. By showing you where in the composition these
moves occur, what they function as, what they look like, and how to write
them, we hope to provide you with a thorough guide to writing an excellent
Coh-Metrix research paper.
What we offer in this section of the book is what some call a cookie-cutter
approach to writing. Some may hold this approach to writing in disdain
because it is formulaic and, like a menu-driven statistical tool, may result in
writing without thinking. However, we have found that beginning writers –
and in this case, beginning users of Coh-Metrix – benefit immensely from
writing formulas. Usually writers have to discover these formulas by trial and
error. Here we have attempted to speed up that process by not only providing
the cookie cutter but also providing multiple examples of Coh-Metrix
research.
Notably, we offer the following chapters to students and other novice
researchers who have little experience in writing, and in particular in writing
about the types of corpus analyses we describe in this book. As such, you
will see that we use a different rhetorical voice in this section. The previous
section covered theoretical, technological, and empirical information about
Coh-Metrix. The voice there was one similar to the writing you will find in
empirical chapters, proceedings, and journal articles. In this section, by
contrast, we adopt a voice directed at the student and the novice researcher.
We hope you like this kinder, gentler us.

some basic assumptions


We will consider writing a Coh-Metrix project from the point of view of a
person with little experience of using the tool (or similar tools) and little or no
experience of writing up such projects. In other words, this chapter (and those
that follow) primarily considers the writer/researcher as someone who might
be studying this book as part of a graduate course. We will also consider this
initial Coh-Metrix project to be a text analysis corpus study (which is more
common in linguistics than in cognitive psychology) rather than a participant
study (which is more common in cognitive psychology than in linguistics).
A text analysis corpus study is a study of easily available collections of
written texts (e.g., short stories, newspaper columns, poems, instruction
guides, biographies, Web pages, etc.) rather than a study involving a collection
of data from human participants. Collecting new data takes time and requires
approval from an institutional review board on research ethics. Coh-Metrix is
perfectly suited to data analyses from participant experiments, but because
those texts have not been edited and contain more language disfluencies
(e.g., misspelled words, ungrammatical sentences), a corpus study is an
easier starting point. Moreover, the vast majority of published Coh-Metrix
studies have been corpus analyses, so we have more examples that we can
refer you to.
Before we get started, we should also state a number of other assumptions
about your prior knowledge of research. We assume that you have read at
least one or two research papers. Therefore, you presumably understand the
basic purpose of the four primary parts of a research paper: Introduction,
Method, Results, and Discussion. And we assume that you have some knowl-
edge of rudimentary statistics, or that you intend to learn it as part of the
process of studying this book. On the other hand, we do not assume that you
have already written a research paper or even an abstract for a research paper.
And we do not assume that you have specific ideas on what is mentioned in a
Method section, written in a Results section, disseminated in a Discussion
section, or composed in a corpus.

the basic outline of a coh-metrix research paper
Given the goal of writing a short Coh-Metrix project, we now turn to the
typical outline of such a written study. Obviously, nobody is limited to the
parameters we give here, but they do conform to the approximate length of
many term papers or conference proceedings. As such, these criteria offer a
simple and practical framework within which to be thinking and planning
your study.
• Total words: 2,000–5,000 (excluding titles, abstract, references)
• Pages: 4–10 (assuming single-spaced, Times New Roman, 12-point font)
• Sections:
    I. Introduction: At least 1 full page, never more than 2 full pages
    II. Method
        a. Tool description: If it is Coh-Metrix, a maximum of 1 page,
           although a paragraph is usually enough.
        b. Corpus description: Less than a page, certainly not more than a
           page.
    III. Results: 1 to 4 pages, depending on the number of analyses and
         quantity of tables or figures.
    IV. Discussion: At least 1 full page, never more than 2 full pages

moves and frozen expressions


The four main sections of a research paper (i.e., Introduction, Method, Results,
Discussion) can easily look daunting to someone who has never written a research
paper. However, just as a research paper is divided up into four main parts, so
each of those parts is divided up into identifiable smaller parts called moves. We
will treat a move as a single unit of text (at times just a sentence, at times much
longer) that serves a specific communicative purpose to the audience.
For an example of a communicative purpose, consider a Dear John letter. A
Dear John letter is a written message (from a woman to a man) that has the
purpose of conveying to a husband or boyfriend that the relationship is
over, usually because the wife or girlfriend has found someone else. As the
name suggests, such a letter typically begins with the words Dear [+ name].
The Dear [+ name] is a move that serves to signal the opening of the letter;
that is, who the letter should be read by and that the essence of communica-
tion is about to be presented. Other moves in a Dear John letter might include
the cause/excuse for the splitting up, the reason why the split had to be conveyed
by mail, and, comfortingly enough, a sincere and heartfelt wish for the
recipient’s future happiness. The order of the moves is critical. For example,
it would not be suitably gripping for the writer to inform the recipient of the
forthcoming Hawaiian vacation with the new lover before she had actually
performed the move of notifying the current beau that it’s all over. It is
noteworthy that moves from other, similar discourse structures (e.g., a post-
card) may not be appropriate. For example, a move that requests any happy
news from the recipient will likely be absent in a Dear John letter. Also absent
from a Dear John letter will be the move that often signals the end of a
communication, specifically hope to see you soon!
Moves are the functions of parts of texts, or what Mann and Thompson
(1988) call rhetorical functions. These functions are sometimes explicitly articu-
lated in the texts with words or phrases that signal the function to the
experienced reader. The words and phrases are typically frozen expressions
because their meaning over time has become fixed, broadly accepted, and
widely understood within the discourse community. For example, in a Dear
John letter, we can see that frozen expressions are very common among the
moves. The word “dear” in Dear [+name] is not arbitrary: It was chosen instead
of other alternatives such as “Hi” or “Hey.” The “dear” conveys a more formal
tone for such a note. This formality signals that it is unlikely that the commu-
nication relates to something mundane like setting the TiVo or feeding the cat.
Instead, “dear” is more likely to signal to an intimate partner the move that
conveys “Listen up, I have some news, and you ain’t gonna like it.” Other frozen
expressions in the Dear John letter include “I think we’ve both known for a long
time,” “I will always treasure . . .,” and “you’re too good for me anyway.”
Odd as it may sound, a Coh-Metrix research paper is just like a Dear John
letter: It is composed of a series of scripted moves that are most often in a fixed
order and very often have frozen expressions to signal their function. Also
similar to a Dear John letter, a Coh-Metrix research paper does not allow
numerous moves from other, similar discourse genres or registers. For
instance, a science paper typically has a move at the end of the introduction
that informs the reader as to the forthcoming section headers of the paper
(i.e., Method, Results, Discussion). This move provides a global overview of
the paper, but the reader could also get an overview by perusing sections over
the entire manuscript. This example serves as a reminder that many moves
are conventions that may or may not have a rational foundation. Like all
conventions, they have patterns and parts that the experienced reader needs
to see, and the inexperienced writer needs to learn.
The entire point of writing a research paper, as opposed to merely con-
ducting the experiment, is to clearly convey the researcher’s message to the
researcher’s audience. We have argued here that moves (and their associated
frozen expressions) are useful templates for constructing the research
paper. But moves are not just the scaffolding around which a draft paper is
wrapped, and neither are frozen expressions simply trite or vacuous clichés
that demonstrate a scientist’s lack of originality. Instead, both moves and
frozen expressions are warmly welcomed by readers in the discourse community
because they make understanding the paper both easier and faster.
How do conventional communication moves and frozen expressions make
understanding the paper easier? The answer, simply put, is that they minimize
cognitive load and maximize common ground (Clark & Schaefer, 1989;
Kalyuga, 2012). Our cognitive resources are not limitless, so it is beneficial
to learning if our cognitive processes and activities are optimally managed:
We are likely to learn more if we are free to concentrate on understanding
the substantive content in the text rather than having to use our cognitive
resources to infer the writer’s intentions.
In practical terms, we optimize the reader’s cognitive load by presenting our
paper in a predictable form, a predictable order, and using predictable language.
As such, the more the reader’s expectations can be met, the more cognitive
resources the reader has available for understanding the study’s issue. For
example, the reader needs to know the research question, so explicitly making
a statement such as “Our research question is . . .” facilitates the reader’s
processing. That is, using explicit language means that the reader doesn’t have
to use up valuable cognitive resources by making inferences (which might not
even be correct!). Well-established moves and frozen expressions are
facilitative in this respect because they are part of accepted, standard, and
established language that conveys accepted, standard, and established meaning.

getting started
Many experienced researchers view a study as evolving through the following
cycles:
1. Theories beget hypotheses.
2. Hypotheses beget research questions.
3. Research questions are empirically tested.
4. Results are analyzed and incorporated back into the developing theory.
5. Repeat steps 1 through 4.
By the end of this part of the book, this cyclical process will hopefully be very
clear to you. However, for a first-time researcher, it may be difficult to jump
straight onto that carousel and enjoy the ride. That is, a first-time researcher
probably doesn’t have a sufficiently clear idea of most of the concepts listed
above and how these concepts work together. For that reason, we start our
Coh-Metrix journey with the concept that is more broadly employed and
(presumably) more easily understood: the theme of the paper. From this
position (i.e., starting with the theme), we will build upward and outward,
slowly bringing in each of the more technical concepts mentioned earlier.

choosing a theme for a coh-metrix research study
As we mentioned previously, experienced researchers use theory to help them
discover a gap in knowledge. Filling that gap through research becomes the
theme of the study. In contrast, inexperienced researchers are not fortified
with a rich foundation of theory, so it is best for them to identify a theme on a
topic that interests them. So, if you like literature, be thinking about literature.
If you like gender studies, be thinking about gender studies. If you like
politics, be thinking about politics. In the example we will use in this chapter
(see the Elevator Pitch below), our theme will be newspaper stories. However,
just in case you really are having trouble thinking up a theme, below is a
collage of examples that might get you started:

Hobbies, Reviews, Biographies, Jokes, Web Pages, Manuals, Magazines, Plays,
Poems, Conversations, Emails, Legal Documents, Abstracts, Advertisements,
Folk Tales, Children’s Stories, Romance Fiction, American Fiction, British
Fiction, the Fiction of the Central Republic of Congo, Pulp Fiction, Songs,
Essays, Summaries, Reports, History Text Books, Science Text Books, Text
Books on Crocheting, Sports Reports, Weather Reports, Editorials, Obituaries,
Help Directories, Forums, Wills, Dear Johns, Suicide Notes, Fund Raising Letters,
Postcards, State of the Union Speeches, Apologies, Prepared Statements, Last
Words of Convicted Prisoners, Plenary Speeches, After Dinner Speeches, Toasts,
Wedding Vows, Prayers, Sermons, Referee Reports, Letters to the Editor,
Resignation Addresses, Diaries, Free-writes, Philosophies, Histories, Riddles,
Rhymes, and the Unscripted Thoughts of Mothers and Fathers on the Subject
of Bringing up Babies.

Once you have chosen a theme, your next task is to consider the theme’s
practicality. That is, is it even possible to do such a study with the time and
resources available to you? To address that question, consider what we will
need, at a minimum, for a Coh-Metrix study of this kind.
1. You will need “typed texts.”¹
2. You will need these texts to be about 100 to 1,000 words long.
3. You will need at least 20 of these texts (see Chapter 9).
4. The texts need to be in (relatively) Standard English.
Given these limitations, it is probably not wise to conduct a study on the plays
of James Joyce, because he only wrote two of them. It is probably not prudent
to conduct a study on Russian novels, because they tend to be very long.
Telegrams are not a wise discourse form because the texts are very short,
and not in Standard English. Your time and effort are also serious factors.
Downloading from the Web is very fast, cheap, and easy; transcribing con-
versations, scanning books, and organizing essay collections is laborious and
time consuming.
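The length and quantity requirements above can be screened automatically before any texts are submitted to Coh-Metrix. The following is a minimal sketch in Python, not part of Coh-Metrix itself: the function name `check_corpus`, the `CorpusReport` class, and the whitespace-based word count are all assumptions made for illustration, and the tool's own word counts may differ slightly. The sketch does not attempt to judge whether a text is in Standard English.

```python
from dataclasses import dataclass

MIN_WORDS = 100    # lower bound on text length suggested above
MAX_WORDS = 1000   # upper bound on text length suggested above
MIN_TEXTS = 20     # minimum number of texts suggested above


@dataclass
class CorpusReport:
    """Hypothetical container for the outcome of the screening."""
    usable: list      # names of texts inside the word-count range
    too_short: list   # names of texts with fewer than MIN_WORDS words
    too_long: list    # names of texts with more than MAX_WORDS words

    @property
    def large_enough(self) -> bool:
        # True when enough usable texts remain for a study of this kind
        return len(self.usable) >= MIN_TEXTS


def check_corpus(texts: dict) -> CorpusReport:
    """Sort texts (name -> content) into usable, too short, and too long."""
    report = CorpusReport(usable=[], too_short=[], too_long=[])
    for name, content in texts.items():
        n_words = len(content.split())  # crude whitespace word count
        if n_words < MIN_WORDS:
            report.too_short.append(name)
        elif n_words > MAX_WORDS:
            report.too_long.append(name)
        else:
            report.usable.append(name)
    return report


if __name__ == "__main__":
    # Invented stand-in texts; a real study would read .txt files instead.
    corpus = {f"story_{i:02d}.txt": "word " * 300 for i in range(20)}
    corpus["telegram.txt"] = "ARRIVING TUESDAY STOP"
    report = check_corpus(corpus)
    print(len(report.usable), report.too_short, report.large_enough)
```

A screen of this kind only flags candidates for exclusion; whether a borderline text stays in the corpus remains a judgment for the researcher.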
Narrowing Down the Theme. It is important to start a Coh-Metrix study
(like any serious study) by thinking in terms of bricks rather than houses. The
vast majority of researchers achieved their status as a result of a long series of
experiments, trials, observations, successes, and also failures. If your project
is on literature, you cannot answer a question as broad as “Is American
literature better than British literature?” If you plan on conducting a study
on gender, then it would require a series of studies on many corpora to answer
the question “Is female writing different from male writing?” These questions
are far too broad for any single study to ever address. The secret of a good
Coh-Metrix research paper is to narrow down your theme to a single doable
study. That said, over the course of many such studies, the bricks will gather
up, but it is only at the end of a long process that we see a fully formed house.
Let’s now return to our list of possible themes and see how we can narrow
them down. For example, instead of just “hobbies,” we could have Traditionally
male hobbies, Traditionally female hobbies, Traditionally children’s hobbies,
Outdoor hobbies, Indoor hobbies, Winter hobbies, Summer hobbies, American
hobbies, Alaskan hobbies, New hobbies, Getting started in hobbies, Hobbies as
written by Americans, Hobbies as written by Australians, Hobbies as written by
Australians who became Americans, and so on and so forth. The point here is
to narrow down the theme, and to continue to narrow it down until you have
one very specific topic, which will be the subject of your study. In the example
we are using in this chapter, we narrowed down the broad theme of newspaper
stories to “the reporting of local versus global issues in newspaper stories.”

¹ Coh-Metrix can only process typed texts (not handwritten texts). Coh-Metrix, like most related
software, typically expects documents to be in the .txt format, although variations of the .doc
format may also be used. As technology develops, Coh-Metrix is likely to adapt to new and various
formats of documentation. Because of these changing circumstances, we provide document
settings and document-loading instructions on the tool itself.

the elevator pitch


As we work through this part of the book, we will gradually piece together an
Elevator Pitch. The Elevator Pitch conveys the essence of the study, such as
the underlying question, hypothesis, proposal, plan, and summary, depend-
ing on the maturity of the investigation. The staging of the Elevator Pitch runs
something like this: You get into an elevator at your university and you find
yourself face to face with a senior professor (perhaps someone who could be a
great asset to your career). The professor, not really wanting to talk to a lowly
student, still thinks it is polite enough to ask you what you’re doing in your
studies. You have the length of the elevator ride to coherently convey your
study (i.e., your Coh-Metrix study) to the professor, noting that your future
funding might well depend on how successfully you deliver your pitch.
On a more practical level, the Elevator Pitch is simply the framework around
which you develop your study. It is also the reference point to which you can
and should often return to make sure that you have not drifted away from the
goals of your study. And perhaps most importantly of all, the Elevator Pitch is
the cut-and-paste that you should send at the BOTTOM of every e-mail to
your advisor because, while professors may be enthusiastic supporters of
students’ careers, they may still need to be reminded once in a while as to what
exactly you’re doing. Thus, the Elevator Pitch serves to establish common
ground between writer and reader – a theme we shall return to shortly. Let’s
now look at an example of an Elevator Pitch:

Our study focuses on the language features of newspaper reports. More specifi-
cally, we are interested in the differences between language used for the reporting
of international news (i.e., global issues), and language used for the reporting of
national news (i.e., local issues). Our research question is: Does the language of
news reports become more complex when reporting global issues as opposed to
local issues? And if so, what features of language are driving these differences? To
address our research questions, we formed two contrasting hypotheses. The first
hypothesis is that the language of news reports will become more complex when
reporting global issues because any reporting of global news is likely to be a more
important story, and therefore more difficult to explain: The language of the
report will reflect this difficulty. In contrast, our second hypothesis is that the
language of news reports will become less complex when reporting global issues
because the difficult nature of describing such world issues will cause writers to
use facilitative language: The language of the report will reflect this facilitation.
This study builds from the work of researchers such as Herb Clark, Art Graesser,
Walter Kintsch, Danielle McNamara, and John Swales. Their research suggests
that background knowledge, schemas, and expectations of shared experience need
to be established in order to increase the likelihood of comprehension, and that
explicit cohesion at the level of the text might facilitate this goal. Based on this
theory, we can expect some measure of assumed common ground between writer
and reader for local issues. As such, there will be little need for simple language or
explicit textual cohesion. However, if the writer pays little or no attention to the
focus of the report, then the complexity of the global issues might manifest itself
only in more complex and less facilitative language. Our goal in this study is to
discover and assess the language differences used in the reporting of local and
global issues, and, based on our findings, to offer some idea as to the effect these
language features might have on the communicative goals of writers. In order to
address our research question, we will construct two contrastive corpora: one of
newspaper stories concerning local issues, and one of newspaper stories consid-
ering global issues. Having formed the two corpora, we will process the text using
various cognitive and linguistic indices from Coh-Metrix, including situation
model, referential, causal, temporal, spatial, syntactical, and lexical diversity
indices. Coh-Metrix is particularly well suited to this study, having had its indices
validated in numerous previous studies. We will assess the differences between
the corpora by conducting a series of t-tests. The study is of interest to writers,
especially reporters, because their task is to effectively communicate information
to those who wish to learn. The task is also important to linguists and cognitive
scientists because it stands to better explain how differences in perceived catego-
ries (local, global) are made manifest through linguistic features.

This Elevator Pitch may seem long on time and complex in structure. However,
as we shall see, neither is really the case. First of all, considering the length, the
aforementioned Elevator Pitch takes just two minutes to recite. Such a length
of time may be longer than many elevator rides, but even the most stuffy of
professors can usually (quite literally) spare two minutes for a student. Turning
to the complexity of structure of the pitch, we can actually see that the text
breaks down into a series of moves, the function of which can be represented by
a series of questions. In total, we use 11 Elevator Pitch questions (see Table 7.1).
Generally speaking, if all 11 of these questions have been answered, then your
Elevator Pitch work is complete.
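The statistical move in the pitch ("a series of t-tests") can be pictured concretely. The following is a minimal sketch in Python that compares one index across the two corpora. It uses Welch's unequal-variance form of the t-test, which is an assumption on our part here because the pitch does not say which variant is intended; the function name `welch_t` and the index scores are hypothetical and invented purely for illustration. The sketch returns only the t statistic and its degrees of freedom; in practice a statistics package would also supply the p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test.

    Each sample needs at least two values, and at least one sample must
    have nonzero variance, otherwise the statistic is undefined.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores on a single index, one value per text in each corpus
    local_scores = [0.42, 0.51, 0.47, 0.39, 0.55, 0.44]
    global_scores = [0.36, 0.41, 0.33, 0.45, 0.38, 0.40]
    t, df = welch_t(local_scores, global_scores)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Running one such comparison per index yields the "series" of tests mentioned in the pitch; when many indices are tested, it is worth asking your advisor how to account for multiple comparisons.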
Before we start describing the moves in more detail, it is important that
we make a quick note on the pronoun use we have adopted in this and the
forthcoming chapters. There are four authors of this book, so we always
use the pronoun “we.” If you are writing your project as a single author,
you can use the pronoun “I.” However, many researchers balk at the first
person and prefer passive constructions. It is probably a good idea to talk
this issue over with your advisor, or to read sample articles in the
publication outlet.

table 7.1 The 11 Elements of the Elevator Pitch

Each numbered line gives the Question; the indented text beneath it gives the
corresponding Elevator Pitch Element.

1. What is the theme of the study?
   Our study focuses on the language features of newspaper reports.

2. More narrowly, what are you looking at?
   More specifically, we are interested in the differences between language
   used for the reporting of global issues, and language used for the
   reporting of local issues.

3. What is your research question?
   Our research question is: Does the language of news reports become more
   complex when reporting global issues as opposed to local issues?

4. Do you have any supplementary research questions?
   And if so, what features of language are driving these differences?

5. What is your hypothesis?
   To address our research questions, we formed two contrasting hypotheses.
   The first hypothesis is that the language of news reports will become more
   complex when reporting global issues because any reporting of global news
   is likely to be a more important story, and therefore more difficult to
   explain: The language of the report will reflect this difficulty. In
   contrast, our second hypothesis is that the language of news reports will
   become less complex when reporting global issues because the difficult
   nature of describing such world issues will cause writers to use
   facilitative language: The language of the report will reflect this
   facilitation.

6. What theory and background motivated this study?
   This study builds from the work of researchers such as Herb Clark, Art
   Graesser, Walter Kintsch, Danielle McNamara, and John Swales. Their
   research suggests that background knowledge, schemas, and expectations of
   shared experience need to be established in order to increase the
   likelihood of comprehension, and that explicit cohesion at the level of the
   text might facilitate this goal. Based on this theory, we can expect some
   measure of assumed common ground between writer and reader for local
   issues. As such, there will be little need for simple language or explicit
   textual cohesion. However, if the writer pays little or no attention to the
   focus of the report, then the complexity of the global issues might
   manifest itself only in more complex and less facilitative language.

7. What is the purpose of this study?
   Our goal in this study is to discover and assess the language differences
   used in the reporting of local and global issues, and, based on our
   findings, to offer some idea as to the effect these language features might
   have on the communicative goals of writers.

8. What materials are you using for the study?
   In order to address our research question, we will construct two
   contrastive corpora: one of newspaper stories concerning local issues, and
   one of newspaper stories considering global issues.

9. Which instruments will you use for the study and why?
   Having formed the two corpora, we will process the text using various
   cognitive and linguistic indices from Coh-Metrix, including situation
   model, referential, causal, temporal, spatial, syntactical, and lexical
   diversity indices. Coh-Metrix is particularly well suited to this study,
   having had its indices validated in numerous previous studies.

10. Which statistical methods will you use?
    We will assess the differences between the corpora by conducting a series
    of t-tests.

11. What is the relevance of your study? In other words, who cares?
    The study is of interest to writers, especially reporters, because their
    task is to effectively communicate information to those who wish to learn.
    The task is also important to linguists and cognitive scientists because
    it stands to better explain how differences in perceived categories
    (local, global) are made manifest through linguistic features.

Let’s now look more closely at just the first two of these questions. The other
nine moves will be discussed over the remainder of this section of the book.

what is the theme of the study?


The first sentence of your Elevator Pitch should be short and sweet. The
listener or the reader needs to quickly and easily form a general idea of the
topic you are about to discuss. This point is not just about style. Long and
complex sentences are cognitively demanding to process. It is not a good idea
to lose your audience on the first sentence of the paper. Note also that when it
comes to writing the paper itself, a bland general opening statement such as
“Our study is on the language of newspaper reports” is not likely to capture
the imagination of the reader. You’ll want to do all you can to make the first
sentence not only short and sweet but also alluring, enticing, and stimulating.
That said, you shouldn’t waste time on the charm of the opening statement
until a full draft is complete. Your paper is likely to develop as the draft comes
together, and you don’t want the thrust of your findings to be restricted by the
pithiness of the first sentence.

more narrowly, what are you looking at?


After a short, simple, and broad opening line, in which you establish some
degree of common ground with your audience, write a longer and more
specific second line. The second line is what your topic is about. If you had
led with the second (and longer) line, the density of the information would risk
losing the audience.
The remaining questions from the Elevator Pitch will be discussed over
the next several chapters. Some are more relevant to the introduction of a
research paper (see Chapter 8), some are more relevant to the corpus and tool
sections (see Chapters 9 and 10), and some occur in several places across the
research paper. As we develop our research paper, we will introduce new
elements that don’t clearly feature in the Elevator Pitch. These new elements
are dependent on the results of the analysis, and will be introduced as they
arise. For example, in the discussion chapter, we will describe the moves of
“interpreting the results” and the “implications of the results.” Obviously, as
we don’t have the results before we actually conduct the experiment, we can’t
have those features in the Elevator Pitch from the get-go.
Finally, as you work through developing your study, try to remember that an
Elevator Pitch is a powerful vehicle for helping you gather the parts necessary
for a research paper. But don’t worry if your Elevator Pitch only comes together
slowly. You may often find yourself writing (or saying) at the end of your
working Elevator Pitch “and that’s as far as I’ve got,” and/or “I haven’t quite
worked this bit out yet.” The temptation for many people is to think that
something has to be perfect and complete before it is ready or useful. In
research, few things are ever truly complete, and we all have to live as best
we can with what we have available.

conclusion
In this chapter we introduced the basic structure of a Coh-Metrix research
study. We outlined the major parts of the study – Introduction, Method,
Results, and Discussion – and we explained that each of these sections
comprises fairly standard moves, which are often constructed with the help
of standard frozen expressions. With regard to the moves, we discussed
choosing a theme for the study and narrowing that theme down to a workable
size. With regard to frozen expressions, we explained why they are useful and
why they are expected. We also provided several examples of frozen expres-
sions. In the next chapter we will be discussing the major moves of the
introduction section of a research paper.

The Introduction

Writing the introduction section of a research paper can be a terribly frustrating
affair. It seems that of any major section of a paper, none require more
reediting, reorganizing, and resubmitting than the introduction. In this chap-
ter, we hope to somewhat alleviate that aspect of the research paper writing
process by describing and discussing the next six major moves of the Elevator
Pitch. To be sure, even with each of these six moves tackled and pinned, writing
the introduction is still likely to require a substantial amount of work, but
having the Elevator Pitch to work from should at least provide reference points
to guide you as to which parts of your paper need your attention most.
One of the major reasons for the introduction needing so much reworking
is that the research project seldom adheres rigidly to the plan. That is, no
matter how well you plan a project, the results are unlikely to fall neatly into
the baskets that you might have hoped for. And let’s face it, if the results of
the project really did end up completely as predicted then chances are your
experiment wasn’t really all that interesting to begin with. Of course, you
might just think that if a plan is truly well thought out, then such problems
are unlikely to be overly arduous. However, there is a good reason why our
language is replete with expressions such as “the best laid plans of mice and
men” and “the first casualty of war is the plan”: Expressions such as these
remind us that while we should carefully plan for the best, we should always
be ready to expect the worst. Consequently, if we rely on merely our project’s
plan, and write too much of the introduction before we have fully examined
the data, then we will often end up with a beautiful but unusable introduction.
Such an introduction inevitably faces shredding, and it is this shredding that
can lead to the frustration that is all too familiar to those writing the introduction
section of the research paper.
The aforementioned problems are somewhat alleviated by the flexibility of
the Elevator Pitch. That is, the Elevator Pitch is made up of moves, and these
moves can be moved about. If something in the introduction isn’t working,
then maybe it’s just in the wrong place. So long as the moves themselves are
coherent units, and all the necessary moves are present, then moving the
moves is like moving the furniture in your house. And, of course, moving the
furniture in your house is not necessarily a simple or a pleasant experience.
And equally, some items of furniture seem to want to compete for the
same space. And, granted, there are some parts of the room that seem totally
unwilling to accept certain items of furniture. But for all these problems,
moving furniture is always going to be an easier task than deconstructing and
reconstructing each item.
As such, in this chapter (as with all the chapters), we don’t focus on a line-
by-line assembly of a section of the research paper. Instead, we describe and
discuss the bigger pieces (the moves). We advise you to develop these moves
and to resist the temptation to spend too much time knitting them tastefully
together: That knitting task will be hard enough without having to endure all
the pain of the unthreading caused by unruly data.
With the aforementioned issues in mind, this chapter describes and
discusses six of the major moves associated with the introduction: the research
question, the supplementary research question, hypotheses, theory, the
purpose of the paper, and the relevance of the project. As with most
moves, these moves are not restricted to one part of the research paper, so
you should be prepared to tweak them to meet the needs of their relevant
section. But that said, these six moves will likely first appear in the intro-
duction, and as such, it is from the theoretical standpoint of the introduction
that we will discuss them.

the research question


Probably the most important part of all Coh-Metrix studies (and any study,
for that matter) is the research question. The research question is similar to a
thesis statement in the sense that the study itself is a response to the question.
In fact, everything in the study must in some way relate back to the research
question, and anything that doesn’t relate back to the research question
probably needs to be removed from the paper. Of course, the research
question can never stand on its own. Ultimately, the researcher also needs
to consider theory, hypotheses, purpose, and relevance. While those aspects are
all indispensable, and we address them in this chapter, an Elevator Pitch for a
first Coh-Metrix study is probably best tackled by starting with the research
question.

Consider the following six studies that used Coh-Metrix:


1. Bruss et al. (2004) asked: Has the language used in scientific texts
changed over the last 200 years?
2. Louwerse et al. (2004) asked: Can Coh-Metrix distinguish spoken
English from written English?
3. McNamara et al. (2011) asked: Does world knowledge affect young
readers’ comprehension?
4. Ozuru et al. (2008) asked: Does the passage (more so than the question)
explain the difficulty in standardized reading tests?
5. Best et al. (2008) asked: Do the effects of reading skills depend on the
genre of the text?
6. McCarthy et al. (2009) asked: Can Coh-Metrix replicate human ability
to recognize genre at the sub-sentential level?
You’ll notice that each of these studies is asking a research question. You’ll also
note that each of the research questions can be framed as a simple yes/no
question. Of course, the answer in the study is seldom as simple as “yes” or
“no,” but we advise you to make the research question follow this format. As
such, research questions that start with words like “How,” “Why,” “When,”
“What,” and “Where” are probably best avoided, at least at the earlier stage
of writing research papers. Another very important element of the yes/no
research question is to frame the question so that your predicted answer is
“yes” (and not “no”). The primary reason for framing the question to have
a yes answer is that this is a research paper (note the “search” part in
“research”): that is, it doesn’t make a lot of sense to be looking for something
(i.e., searching) if you don’t think it is there. Note also that if you look for
something and you don’t find it, it doesn’t mean the thing doesn’t exist: It may
simply mean that you didn’t look in the right place. There are several other
reasons for wanting the answer to our research questions to be “yes.” When
we later discuss theories, theoretical frameworks, hypotheses, and predictions,
we will return to this subject.
With the importance of the format of the research question in mind,
consider the three examples that follow:
1. Is the English writing of Korean scientists more cohesive than the
English writing of American scientists? (The researcher’s predicted
answer is “no.”)
2. Is the English writing of American scientists more cohesive than the
English writing of Korean scientists? (The researcher’s predicted
answer is “yes.”)
3. What is a major difference between the English writing of American
scientists and Korean scientists? (The researcher’s predicted answer is
“cohesion.”)
The surface form of these three examples is very similar; however, only
the second question is framed with a “yes” response. As such, for a beginning
researcher, it is the format we’d recommend.
Just as the Elevator Pitch is usually a work in progress, so too is the
research question. Indeed, up until the time that the paper
is submitted (to the professor, a conference, or a journal), the research
question needs to remain flexible. That said, and although tweaking research
questions is common, you are advised always to try to work within the frame
of the original research question, and only change the question when data or
circumstances make it absolutely necessary. For example, recall the last of
the six research questions earlier in the chapter: Philip McCarthy and his
colleagues asked: Can Coh-Metrix replicate human ability to recognize genre
at the sub-sentential level? The original research question here was: Can
Coh-Metrix replicate human ability to recognize genre at the sentence level?
The answer to both forms of the question was “yes”; however, during the
course of research, it was discovered that both humans and Coh-Metrix could
recognize genre at well below the sentence level; in fact, it generally required
no more than one to three words for genre to be accurately classified. As such,
we need to appreciate that our working research question helps guide us (the
researchers) in conducting our study. But our final research question (the one
in the submitted paper) helps guide our readers to a better understanding of the
research that we are presenting.
So far, we have considered all research questions as simple “yes” or “no”
propositions. In practice, research questions are often supplemented with
more complex follow-up questions. These questions are often prefaced with
the phrase “and if so.” For example, Duran et al. (2007) asked: Are textual
features of temporality critical for coherent representations – and if so, do Coh-
Metrix indices of temporality predict human evaluations? McCarthy et al.
(2006) asked: Can Coh-Metrix distinguish the texts of Victorian authors –
and if so, does it mean that these authors have writing styles that are relatively
stable? And Crossley et al. (2007) asked: Can Coh-Metrix distinguish lexical
differences between essays written by native and non-native English speakers –
and if so, what can we learn about L2 lexical proficiency that may be relevant
to language teachers and materials developers? The follow-up questions,
as opposed to the initial primary research question, are not always framed
as yes/no questions. And even when they are, there is less of a burden on the
researcher for the answer to be “yes.” That is, supplementary questions are
often much more speculative, and researchers are required to present a
number of ideas that might explain these questions. Indeed, these questions
are often the basis for “further research,” a notion we revisit later in this book
when we describe the Discussion section (see Chapter 12).
As we mentioned in the section called “Moves” in Chapter 7, a research
paper often has frozen expressions that signal a specific meaning to the
experienced reader. For the research question, the frozen expression is,
simply enough, “Our research question is . . .” This frozen expression may
not seem like rocket science; however, many new researchers think they have
to be original in their writing when, in fact, broadly accepted terminology is
far more likely to be well received.
As a final remark for this section, we should keep in mind that the main
purpose of the research question is to keep the paper focused. That is, the
researchers (and the subsequent readers of that research) should always be
able to relate any part of the paper to the research question. In other
words, if the relationship between the research question and any subsequent
part of the paper isn’t immediately apparent, then either that part of the paper
or the research question needs to be modified. This having been said, we also
need to remember that different researchers have different styles of inquiry,
and different research has different demands on what kinds of questions can
be asked. As such, what we have written here on research question format
should serve the beginning researcher well, but it should never be treated as a
straitjacket.

theories, frameworks, and hypotheses


A research project is never an island. It is always a peninsula. By this, we mean
that research projects do not pop up like isolated islands in a barren sea of
random experimentation. Instead, research projects grow out like a peninsula
from the fertile land of theory. In this section we explore the importance of
theory as it relates to our research project, and more particularly, to our
research question. However, before we show how this link is indispensable,
let us first make sure we have a working understanding of the distinction
between the closely related terms of theory and hypothesis.

Theory
Researchers in most academic fields consider the words “theory” and “hypoth-
esis” to have quite different meanings. By contrast, in informal situations, the
two terms are interchangeable. Clearly, this inconsistency is likely to cause
confusion. So, in a research project, such as the one we are describing here, it
is important to use the words in a way that is generally acceptable to fields such
as psychology, linguistics, and computer science.
In the classic sense, a theory is a broadly accepted explanation and
understanding of some phenomenon in the world. For example, the theory
of evolution posits that mechanisms such as mutation and natural selection
cause, over time, changes in the heritable traits of organisms. Plate tectonic
theory posits that the earth’s crust comprises several large plates that move
slowly atop a viscous region of the planet’s upper mantle. And the theory of
supply and demand describes how prices and quantities sold depend on one
another (for example, demand typically falls as price rises).
These theories specify complex systems that account for a great many
empirical facts. Ultimately, these theories generate a large number of
hypotheses, which we will describe in detail a little later in this chapter.
Classic theories, such as those described above, are easily identified and
generally come with their own Wikipedia page and half a library, so it is
important at this point to understand that the field most closely associated
with Coh-Metrix (discourse science) is a relatively young field, and therefore
it has not yet evolved to the point of having many mature theories. Thus, the
theories that we have available to us are not as easily identified, extracted,
learned, and applied.

Theoretical Frameworks
To help us better understand how theory generally takes shape in discourse
science, we should examine the term “theoretical framework.” A theoretical
framework can be viewed as a preliminary theory. More specifically, a theo-
retical framework is a preliminary sketch of a complex system that organizes
a collection of related findings that researchers have packaged and presented
in a coherent fashion. This package may range from the entirely new to a well-
established cohort of findings supported by rigorous empirical studies. In
Coh-Metrix studies, a very pertinent example of a theoretical framework is
cohesion.
You may be familiar with a number of other terms that are very closely
related to what we have called theoretical framework. These terms include
“literature review” and “major area paper.” Essentially, a literature review
(which is often a chapter in a dissertation) and a major area paper (which is
often a requirement for a doctoral degree) are examples of an extensive,
written manifestation of a given theoretical framework. In a term paper,
proceedings, or article, space restrictions mean that the write-up of the theo-
retical framework needs to be succinct. Nevertheless, in presenting the theo-
retical framework in such a manuscript, most people will simply refer to it as
the literature review.
Because a theoretical framework is not a fully mature theory, its form is
correspondingly open to question. Indeed, it is the purpose of the research to
help better solidify this framework. As such, the researcher is responsible for
arguing (convincingly) that the literature does indeed contain a theoretical
framework that is sufficiently coherent to warrant the research at hand. The
degree to which the researcher’s argument is accepted depends on the read-
ership (or the reviewers or the grading professor). Typically, to make the case,
the researcher needs to demonstrate that a number of studies are sufficiently
related (in their goals, findings, and focus) so as to form a tangible catalyst
from which the proposed study naturally emanates.
Numerous Coh-Metrix studies include examples of constructed theoret-
ical frameworks. For example, Crossley et al. (2007) pieced together various
studies on reading materials for second-language learners and emerged with
a paper on authentic and adapted text. Hall et al. (2007) pieced together
various studies on genres, English varieties, and language learning to emerge
with a paper on differences between cross-Atlantic legal documents. And
McCarthy et al. (2009) pieced together studies on the cross-cultural
compositional styles of native and non-native English speakers to emerge with a
paper comparing the English of Japanese scientists with the English of
American scientists. Additionally, Coh-Metrix studies have been conducted
that tapped theoretical frameworks associated with constructs such as
lexical proficiency (Crossley et al., 2009), lexical diversity (McCarthy &
Jarvis, 2007), deception (Duran, Hall, McCarthy, & McNamara, 2010), text
ease (Graesser et al., 2011), and, of course, cohesion (McNamara et al., 2010;
McNamara et al., 2011).
Mature theories can only hope to form if researchers are allowed to
explore kindling theoretical frameworks with a stream of empirical research
in a discovery-oriented style. A modern discovery-oriented researcher
pursues a delicate balance between the predictions of an emerging theoret-
ical framework and samples of observations in a new empirical landscape.
To be sure, many (most?) theoretical frameworks will never reach maturity,
and many others will be subsumed into larger more predictively successful
frameworks. However, it is by this very process that our knowledge of the
world progresses.

Hypotheses
Earlier we argued that a theory is a broadly accepted explanation and under-
standing of some phenomenon in the world. Thus, if a theory is an explan-
ation or an understanding, then theory, in whatever form, allows us to make
predictions: If we understand something, then we not only know how it works
but also how it will work. The articulation of a prediction is the application of
a theory, and when stated formally, it is referred to as a hypothesis.
A hypothesis is closely related to a research question. Recall from our
Elevator Pitch that our research question was: “Does the language of news
reports become more complex when reporting global issues as opposed to
local issues?” This research question had two corresponding hypotheses:
(1) the language of news reports will become more complex when reporting
global issues; and (2) the language of news reports will become less complex
when reporting global issues. From these examples we can see that hypotheses
are research questions set up as claims (or predictions). Also of importance,
note the word “will” in the hypothesis. This word is common in hypotheses
because we are predicting, and predictions are about the future. Also note that
hypotheses are often accompanied by an explanatory or supportive state-
ment. That is, the hypothesis states what will happen and the supporting
statement explains (briefly) why it will happen. For example, in our Elevator
Pitch, one of the supporting statements was “. . . because any reporting of
global news is likely to be a more important story, and therefore more difficult
to explain.” The clearest way to mark a supporting statement is to use the
word “because.” This word easily signals to the readers that the forthcoming
text will explain the preceding claim.
A researcher tests a hypothesis in order for us to learn more about the
theory from which the hypothesis was generated. If the results of the experi-
ment support the hypothesis, then the theory is strengthened. If the results of
the experiment do not support the hypothesis, then we may have misunder-
stood the theory, misarticulated the theory, or misapplied the theory. On the
other hand, if the results of the experiment are contrary to our hypothesis,
then we may have to reassess the theory, revise the theory, or reject the theory.
And often we simply have to reexamine the data, rethink the analysis, or redo
the experiment.
The important point with theories and hypotheses is that they are the
elements of the method through which we learn about the world. The theory
represents our current understanding, and our goal is to expand that under-
standing. To do so, we extrapolate from the theory an inference (i.e., a hypoth-
esis) about an as yet uncharted area of the framework. We then test the
hypothesis so that we might have evidence that will lead us to a better under-
standing of the world.

Applying Hypotheses
Let us now turn our attention to how hypotheses are put in place in a research
paper. Every research question has at least two hypotheses: H0 and H1. We use
H0 to designate what is called the null hypothesis. The null hypothesis is the
assumption that the things being compared are equal (i.e., that there is no
difference or effect). The purpose of conducting a
test is to establish whether there is sufficient evidence to reject this null
hypothesis. That is, we want to establish whether there is sufficient evidence
to support H1, which is the theory-based prediction that at least two of the
things being compared are not equal. There can also be predictions motivated by
other theoretical frameworks or even theories that predict something very
different from H1 or H0, which can be designated as H2, H3, and so on. In
our Elevator Pitch example, we can say that H1 is the hypothesis that global
newspaper articles are more cohesive than are their local counterparts; we can
say that H2 is the hypothesis that local newspaper articles are more cohesive
than are their global counterparts; and because there is always an H0, we can
say that H0 is the hypothesis that the two categories of articles are equal in
terms of cohesion.
A Coh-Metrix study that is a nice example of such an H0, H1, H2 scenario
is provided by Lightman et al. (2007a). Erin Lightman and her colleagues
investigated cohesion in expository texts. Their research question was: Does
cohesion vary as a function of the page-progress through a book chapter? From
this question they formed three hypotheses:
1. Cohesion will remain relatively constant as a text progresses because
all places in a text are equal (H0).
2. Cohesion will gradually decrease as a text progresses because greater
cohesion is needed at the beginning of a text where the student is least
likely to understand the material (H1).
3. Cohesion will gradually increase as a text progresses because as a text
develops it becomes ever more complex and will subsequently need
greater authorial connections (H2).
In this format then, we are not looking so much at an arrangement of a yes/no
question, but at an arrangement of if hypothesis H1 is correct, then expect results
R1, but if hypothesis H2 is correct, then expect results R2. On the other hand, if
there is insufficient evidence for either H1 or H2, then we cannot reject H0.
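The decision logic of such an H0, H1, H2 arrangement can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the cohesion scores are hypothetical (they are not data from Lightman et al.), the function names are our own, and the permutation test on a least-squares slope is simply one convenient way of asking whether a trend is stronger than chance would produce. It is not presented as the analysis used in the original study.

```python
import random


def slope(scores):
    """Least-squares slope of scores against their positions 0, 1, 2, ..."""
    n = len(scores)
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def permutation_p_value(scores, n_permutations=2000, seed=0):
    """Two-sided p-value: how often does a random reordering of the same
    scores give a slope at least as steep as the observed one?"""
    rng = random.Random(seed)
    observed = abs(slope(scores))
    shuffled = list(scores)
    at_least_as_steep = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(shuffled)) >= observed:
            at_least_as_steep += 1
    # The add-one correction keeps the estimate away from an impossible p of 0.
    return (at_least_as_steep + 1) / (n_permutations + 1)


def interpret(scores, alpha=0.05):
    """Map the result onto H0, H1 (cohesion decreases), or H2 (cohesion increases)."""
    if permutation_p_value(scores) >= alpha:
        return "cannot reject H0"
    return "supports H1" if slope(scores) < 0 else "supports H2"


# Hypothetical cohesion scores for ten successive sections of a chapter.
falling = [0.90, 0.85, 0.83, 0.78, 0.74, 0.70, 0.66, 0.61, 0.58, 0.52]
steady = [0.70, 0.72, 0.69, 0.71, 0.70, 0.72, 0.69, 0.71, 0.70, 0.71]

print(interpret(falling))                  # supports H1
print(interpret(list(reversed(falling))))  # supports H2
print(interpret(steady))                   # cannot reject H0
```

Note that the code never "accepts" H0: when the evidence is insufficient, the only conclusion available is that the null hypothesis cannot be rejected.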

situating the study


Coh-Metrix is such a powerful tool that we can easily forget that it is exactly
that – a tool! By this we mean that Coh-Metrix is not the agent of the study:
the researcher is the agent. As such, the researcher needs to guide the tool in a
systematic fashion that addresses research questions, topics, and themes. If
Coh-Metrix is simply released to analyze various groups of texts (e.g., British
novels versus American novels, or male writing versus female writing, or local
newspapers versus national newspapers, or anything else of the kind), then
Coh-Metrix will undoubtedly find differences. But simply identifying differ-
ences is not particularly useful to anyone. The differences may satisfy idle
curiosity, and it may also be acceptable to conduct such a study if someone
is simply practicing corpus analysis using Coh-Metrix; however, from a
research point of view, if the study just seems to plop out of nowhere, for
no particularly obvious reason, then the value of the study is unlikely to go
beyond the person conducting the study.1 Remember, we test hypotheses so
that we can build on our theories of how things in the world work. If we have
no hypothesis, then our results are fairly meaningless because they cannot
help our theoretical framework mature.
Given that we need hypotheses, and that hypotheses emerge from theoret-
ical frameworks, we now need to look at how this emergence is presented in a
study. We call this description of the emergence situating the study, and there
are three main kinds of situating that a researcher can use: (1) responding to
an identified problem, (2) filling a gap in the research, and (3) building on
existing research. To be sure, there are other kinds of studies (e.g., validation
studies), and there are also variations on the themes (e.g., replication studies),
but the three approaches to situating the study we mention here are probably
the three most common approaches used in Coh-Metrix studies.

1 From a statistical point of view, an analysis that is not guided by some kind of theory is also of
questionable validity. That is, a “statistically significant result” is only really valid if we can “reject
the null hypothesis.” However, if there is effectively no hypothesis, then it is difficult to argue that
it has been rejected, meaning the result is uninterpretable.

Responding to an Identified Problem

At its conception, the primary purpose of Coh-Metrix was to better match
text to reader. The designers of Coh-Metrix believed that such a goal could be
achieved by assessing cohesion in text. Consequently, from the get-go, the
Coh-Metrix tool has included a diverse range of measures that assess (in one
way or another) the cohesion of text. One of the measures in Coh-Metrix that
is strongly related to cohesion is lexical diversity (see McCarthy & Jarvis,
2007). Lexical diversity is an assessment of the range of vocabulary employed
in a text. Texts with a narrower range of vocabulary should have higher cohesion
because the same content words are used repeatedly.
For the first version of Coh-Metrix, the measure used to
assess lexical diversity was type-token ratio (TTR), an index described in
Chapter 4 of this book. Unfortunately, TTR is confounded by variations in
text length, meaning that researchers who wanted to assess texts of different
length for lexical diversity could never quite be sure whether they were
measuring different vocabulary ranges or just different lengths of texts. To
overcome this problem, a new index of lexical diversity (MTLD; McCarthy &
Jarvis, 2010, 2013) was designed, and this new index was tested to establish the
degree to which it was resistant to variations in text length. In the article that
reported the testing of MTLD, the authors situated the study just as we have
done in this paragraph: by first of all “establishing the problem” with lexical
diversity, and then showing how that problem had been addressed.
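The length problem with TTR is easy to see in a minimal sketch. The example below assumes the simplest possible definition (distinct words divided by total words, with words split on whitespace); the sentence and the function name are hypothetical, and the code is not Coh-Metrix's own implementation. It shows that a text drawing on exactly the same vocabulary receives a much lower TTR simply because it is longer.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of words (tokens)."""
    if not tokens:
        raise ValueError("type-token ratio is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


# A hypothetical nine-word sentence containing seven distinct words.
sentence = "the cat sat on the mat by the door".split()

short_text = sentence       # 9 tokens
long_text = sentence * 10   # 90 tokens, same seven-word vocabulary

print(round(type_token_ratio(short_text), 3))  # 0.778
print(round(type_token_ratio(long_text), 3))   # 0.078
```

Both texts use an identical range of vocabulary, yet their TTR values differ by a factor of ten, which is why a researcher comparing texts of different lengths cannot tell from TTR alone whether a difference reflects vocabulary or merely length.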

Filling a Gap in the Research


The facilitative effects of cohesion in text are widely documented (most
notably in this book). Indeed, any number of studies have provided compelling
evidence that learning gains can result from increasing cohesion in text
(McNamara et al., 2010). Given that cohesion facilitates learning, it is reason-
able to think that essays with higher cohesion will be judged as having higher
quality. However, despite this widespread assumption, there have been
remarkably few studies that have assessed the relationship between essay
quality and cohesion. Danielle McNamara and her colleagues (McNamara
et al., 2010; see Chapter 6) conducted such a study to understand that relation
better. As such, the authors could claim that their study was filling a gap in the
research. That is, measures of cohesion had been established as facilitative in a
wide range of applications, and the research had moved from important studies
of textbook cohesion (e.g., O’Reilly & McNamara, 2007) to the perhaps less pressing
matter studied by Erin Lightman and her colleagues (see Lightman et al., 2007b), who
looked at the lyrics of suicidal and non-suicidal songwriters (interesting as
that was). Thus, as a research program moves forward, it often leaves “gaps,”
and it is important that these gaps are identified and filled with evidence
rather than with untested assumptions. In the case of “cohesion and writing
quality,” this point is particularly important because the initial study that
sought to fill the gap actually found that cohesion did not explain writing
quality. Instead, writing quality was found to be aligned with measures of
text difficulty and sophisticated language. Clearly such a result is important
given the common requirement for students to write coherent essays if, in
fact, they are more likely to be judged by the complexity of their language (see
Chapter 6).

Building on Existing Research

When we discussed research questions, we
also discussed supplementary questions. Recall that we mentioned that these
supplementary questions were often more speculative than the main research
question was, leading researchers to posit several possible future
avenues of research. A study that builds on existing research is often a response
to these speculations whereby the researcher is able to essentially move a
supplementary question into the primary-question position. In many studies,
the researcher will directly “call for future research” on specific examples that
have resulted from the primary and supplementary questions. For example,
Philip McCarthy and his colleagues (see McCarthy et al., 2009) ended their
paper about the written English of Japanese scientists and that of native
English-speaking scientists with “Future research must . . . consider other
English language varieties’ production of scientific texts.” In response, Ben
Duncan picked up the gauntlet and extended the research into the writing of
Korean scientists (Duncan & Hall, 2009). And indeed, that research was
subsequently developed further by Julie Min (see Min & McCarthy, 2010).
The point here is that any research paper can only ever hope to cover a limited
area (recall the importance of narrowing the theme). In so doing, researchers
explicitly and implicitly raise any number of questions that provide an impetus
for new studies.

Applying Frozen Expressions

As we have mentioned, moves are generally
accompanied by frozen expressions. Just as these frozen expressions are effective
in signaling to experienced readers that a particular move is in play, so too
do these expressions help beginning researchers execute those moves. With this
interplay in mind, we can formalize the frozen expressions associated with the
previous three moves as follows:
1. For an “establishing the problem” paper, the frozen expression structure
we often use is:
   This study addresses the problem of X by Y.
   In this formalism, X is the problem and Y is the proposed solution.
2. For a “filling the gap” paper, the frozen expression structure we often
use is:
   This study fills a gap in the research by X.
   In this formalism, X is some broader restatement of the research question.
3. For a “building on existing research” paper, the frozen expression
structure we often use is:
   This study builds from the previous research of A (B, C, etc.) [by/in
   which etc.] X.
   In this formalism, A (B, C, etc.) refers to previous research,
   and X is a (brief) summary of the purpose of the study.

the purpose and relevance of the study


The purpose of the study and the relevance of the study can be easily confused.
One way to discover the distinction is to see the purpose as expectations of
the results, whereas the relevance can be seen as expectations of the conclu-
sions. Put another way, the purpose relates to the direct results of the
particular study, whereas the relevance relates to the broader impact on the
discourse community.
Let’s revisit our Elevator Pitch to see how purpose and relevance differ in
terms of their scope (see Table 7.1 in Chapter 7).
7. What is the purpose of this study?
Our goal in this study is to discover and assess the language differ-
ences used in the reporting of local and global issues, and, based on our
findings, to offer some idea as to the effect these language features
might have on the communicative goals of writers.
11. What is the relevance of your study? In other words, who cares?
The study is of interest to writers, especially reporters, because their
task is to effectively communicate information to those who wish to
learn. The task is also important to linguists and cognitive scientists
because it stands to better explain how differences in perceived cate-
gories (local, global) are made manifest through linguistic features.
As we can see from these examples, the purpose relates directly to the items
manipulated in the study (i.e., language differences, local issues, and global issues).
By contrast, relevance addresses the members of the discourse community who
might benefit from the broader impact of the study (i.e., writers, reporters,
linguists, and cognitive scientists). Critically, the purpose and relevance not
only say what is important; they also state why it is important. As we mentioned
earlier, explicitly stating the reasoning releases the readers from the burden of
having to make inferences. Consequently, the readers will have more cognitive
resources available to digest and integrate other information in the paper.
Having discussed what the purpose and relevance are, we also need to say
something about how the purpose and relevance should be formed and
presented. Most importantly, the purpose and the relevance need to be
sufficiently focused so as to identify a single, achievable, creditable goal. That
is, while you (as the writer) may well have a larger career goal in mind when
you put your study together, the pathway to that grand goal is laid one small
brick at a time. Thus, each research paper that you write needs to be clear as to
its scope and must not overreach that scope. Writers who attempt to hang too
large a coat on too small a hook receive short shrift from reviewers.
Additionally, if there really is much more that needs to be said, then there
is always another paper that can be written.
Let’s now turn to how “purpose” and “relevance” have shown up in some
published Coh-Metrix studies. We’ll begin with the purpose.
• Nick Duran and his colleagues write: “The purpose of this study is to
manipulate groups of mutually exclusive features of cohesion and
semantics to create an automated technique for identifying levels of
text difficulty” (Duran, Bellisens, Taylor, & McNamara, 2007, p. 233).
• Scott Crossley and his colleagues write: “The purpose of the study was to
examine whether a tool such as Coh-Metrix could discriminate between
comparable text-types and provide useful information about the subtle
differences between texts” (Crossley et al., 2007, pp. 208–209).
• Tenaha O’Reilly and Danielle McNamara write: “The goal of this study
was to determine whether the reverse cohesion effect would be offset by
comprehension skill” (O’Reilly & McNamara, 2007, p. 138).
Three features of the listed examples are worth discussing a little further:
hedging, terminology, and tense. First, we can see that hedging language is
important. For instance, O’Reilly uses the word “offset” rather than “eliminate,”
and Scott Crossley uses the word “useful” rather than “vital.” Try to always
remember that the reader will decide the value of the study, not the writer. As
such, saying less often ends up meaning much more. Second, we see that the
word “goal” can be used to mean “purpose.” Presumably, “goal,” “purpose,”
and even “relevance” are interchangeable terms at some level, even if their
function in the research paper is different, which suggests that when using
these terms you would probably do well by your readers if you are clear and
consistent as to their scope within the paper. Third, you may have noticed
from these three examples that two of them use the past tense and one uses
the present tense. The difference is probably attributable to the fact that each
example comes from a different part of the research paper: the abstract, the
introduction, and the discussion. When the goal or purpose is being stated in
the introduction, it is usually in the present tense; when it is in the discussion
section, it is usually in the past tense; abstracts might use either tense, often
depending on whether the researchers see the study as complete or ongoing.
The ubiquity of these manuscript locations also informs us as to the importance
of restating the goals of the paper. Thus, our recommendation for the goal
statements is that they appear at least in the abstract, the introduction, and the
discussion sections, although some slight rewording is often necessary.
Turning now to a Coh-Metrix example of the relevance of the study:
• Philip McCarthy and his colleagues write: “A computational approach
to distinguishing texts offers researchers and educators a number of
exciting avenues of interest” (McCarthy et al., 2006, p. 769).
In this example, the authors explicitly state who should care: researchers and
educators. But stating who should care is not much use without also stating
why they should care. With this in mind, Philip McCarthy and his colleagues
add the following text:
For example, it allows us the possibility of better estimating the creation of
undated works. It allows us to better settle issues of authorship and cases of
fraud. It allows computer text mining systems to predict text types so that parsers
and taggers can make better predictions of syntax and parts of speech. It presents
the possibility that student writers might be able to assess their works in progress
so as to better understand the characteristics of the style they are developing. And
it allows the possibility that the appropriateness of any given text to its audience
may be more easily assessed. (p. 769)

As we can see from this example, there are many people for whom this work
might have relevance, and many reasons for why it is relevant to them.
The relevance of a study is often overlooked in papers because the study’s
researchers and the study’s audience are both part of the same discourse
community. In other words, the people who care about the findings are other
people just like those who are conducting the research. As such, there is
considerable assumed common ground. However, even if the study is primarily
of interest to the field within which you are working, you would still hope that
the findings you are planning to report will lead to a better understanding of the
issue you have identified. With this in mind, consider the following extract by
Art Graesser and his colleagues: “[U]nderstanding at the level of the mental
model has particularly important implications for comprehension because this is
the level at which many readers struggle” (Graesser et al., 2003, p. 90). The study
was clearly written for an audience familiar with such concepts as mental models
and comprehension, but the relevance of the study is still explicitly stated so that
all readers can understand how the study is important to the developing field.
Just as the relevance of a study is often overlooked by writers, so too can it
be overlooked by readers. Often, the relevance of the study comes straight
after the purpose of the study, so inexperienced readers might not even notice
it as a distinct move. For example, in the following extract, Scott Crossley and
his colleagues say who should care about the study and what they should care
about (see italics in the excerpt) almost immediately after they remind us
what the purpose of the study was (see underlined).
The purpose of the study was to examine whether a tool such as Coh-Metrix could
discriminate between comparable text-types and provide useful information
about the subtle differences between texts. The results of this study suggest that
computational tools such as Coh-Metrix can be used as a means of distinguishing
groups of similar text-types. From a practical standpoint, the findings provide
researchers interested in the field of second language material development with
fundamental information about how simplified and authentic texts differ and to
what degree. (Crossley et al., 2007, pp. 208–209)

To help readers (noting that reviewers and professors are readers too) identify
the relevance of a study, it is probably a good idea to point out exactly to whom
the study is of interest and exactly why it is of interest to them. For example,
maybe the study has practical benefits, making it of interest to textbook
designers, teachers, or developers of intelligent tutoring systems. If so, make sure
that a good number of examples of what you are studying are included in the
paper, so that developers can easily establish how the research can be applied. If
the study is more directly of interest to the field, then you need to state clearly
which area of the field and why your study is of benefit to that area of the field.
Applying Frozen Expressions. As ever, there are some frozen expressions
that may be of use when writing relevance moves. For example, we can write:
“This study is of interest to X because Y.” In this formalism, X is who should
be interested and Y is the reason they should be interested. Sometimes, there
is just one major interested party. In this case, a helpful frozen expression is:
“This study is important because X.” Here, X is why people (or the field in
general) should care about the study.

I. Introduction: At least 1 full page, never more than 2 full pages.
   a. Theme
   b. Research Question
   c. Supplementary Research Question
   d. Hypotheses
   e. Theory
   f. Purpose
   g. Relevance
II. Method:
   a. Tool description: If it is Coh-Metrix, a maximum of 1 page, although a paragraph
      is usually enough.
   b. Corpus description: Less than a page, certainly not more than a page.
III. Results: 1 to 4 pages, depending on the number of analyses and quantity of tables
   or figures.
IV. Discussion: At least 1 full page, never more than 2 full pages.
   a. Research Question
   b. Supplementary Research Question
   c. Hypotheses
   d. Purpose
   e. Relevance
Figure 8.1. Coh-Metrix Research Paper Outline

Finally, just in case you might be pondering the value of frozen expressions
like these, we present below a little indication of their widespread use and
growth. The numbers associated with the frozen expressions that follow are
the number of Google hits for the phrase, as taken in June 2012. The numbers
in the parentheses are for the same phrases as recorded a year earlier (June
2011). We’ll leave the math (and the implications of the math) to you:
“this study is of interest to” = 96,700 (56,900)
“this paper is of interest to” = 65,800 (36,800)
“this project is of interest to” = 270,000 (53,400)
“this work is of interest to” = 1,290,000 (250,000)

“this study is important because” = 189,000 (119,000)
“this paper is important because” = 82,500 (16,700)
“this project is important because” = 177,000 (38,600)
“this work is important because” = 361,000 (69,000)
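
For readers who would rather let a script do the math, the hit counts reported for the frozen expressions can be compared with a few lines of Python. This is only a minimal sketch: the counts are copied from the text, while the names HITS and growth_factor are our own illustrative choices.

```python
# Google hit counts as reported in the text: (June 2012, June 2011).
HITS = {
    "this study is of interest to": (96_700, 56_900),
    "this paper is of interest to": (65_800, 36_800),
    "this project is of interest to": (270_000, 53_400),
    "this work is of interest to": (1_290_000, 250_000),
    "this study is important because": (189_000, 119_000),
    "this paper is important because": (82_500, 16_700),
    "this project is important because": (177_000, 38_600),
    "this work is important because": (361_000, 69_000),
}


def growth_factor(hits_2012, hits_2011):
    """Return how many times larger the 2012 count is than the 2011 count."""
    return hits_2012 / hits_2011


if __name__ == "__main__":
    # Print one line per phrase, e.g. a factor of 2.00 means the count doubled.
    for phrase, (y2012, y2011) in HITS.items():
        print(f'"{phrase}": x{growth_factor(y2012, y2011):.2f}')
```

A factor above 1 indicates that the phrase returned more hits in 2012 than in 2011; what that growth implies is still left to you.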

Back to the Outline


In the previous chapter we described a typical outline of a Coh-Metrix paper.
We can end this chapter by looking again at that outline, but now adding to it
the moves we have discussed over these first two chapters in Part II. In doing
so, we hope that you will begin to see how the paper is coming together. Note that
the moves frequently occur in two places in the paper (see Figure 8.1, partic-
ularly the boldface elements).

Conclusion
In this chapter we discussed forming a research question and supplementary
questions, stating theoretical frameworks and hypotheses, situating and
integrating theory, identifying the purpose of the study, and ensuring that
the relevance of the study is made explicit. In the next chapter we will be
discussing the material for the study (i.e., the texts comprising the corpus).
The Corpus

In many Coh-Metrix text analysis studies, there is no section with the label
“Method.” Instead, most Coh-Metrix text analysis papers tend to have two
major sections that lie between the Introduction and the Results: These
sections are descriptions of the corpus and the tool, respectively. The sections
on the corpus and the tool largely serve the same purpose as traditional
Method sections. That is, instead of describing the participants in the experi-
ment and how the experiment was conducted, the papers discuss the texts in
the corpus and the variables used from Coh-Metrix. Some Coh-Metrix corpus
studies do use a “Method” header, which is often followed by subheaders for
the description of the corpus, the tool, the variables, and so forth. The final
choice for headers is up to the researcher, the professor, or the conference/
journal guidelines.¹ Whatever your headers, however, the next two major
sections we have to consider are the corpus (i.e., the collection of texts we will
use) and the tool, Coh-Metrix (what it is, what it does, why we’re using it, and
how we’re using it). This chapter focuses on the first of those sections, the
corpus.
In Chapter 8 we used the research question as our starting point for a Coh-
Metrix project. We also mentioned that most researchers would argue that
the starting point must be the theoretical framework. However, whether you
start with a research question or with theory, you will very soon afterward
need to be considering your corpus, and continue considering your corpus
during most of the research process.
A corpus is a collection of texts. These texts are the subject of any
Coh-Metrix analysis. The texts are of immense importance because they are the
empirical manifestations of the hypothesis you are testing (see Chapters 8 and
11 for more on hypothesis testing). Building a corpus is no simple matter, and
many criteria have to be considered (e.g., what kinds of texts should be in it,
how large does it have to be, etc.). Careful consideration of these and other
questions is just as important as forming the research question, the
hypotheses, and the theory.

¹ For those interested in a detailed account of Method sections geared more toward psychology
research papers, or if you are planning to do research that involves human participants, we
recommend you also look at Kallet (2004).

With all of these points in mind, we shall now carefully examine the
concept of the corpus (plural: corpora) and, more particularly, the character-
istics of corpora that are suitable for Coh-Metrix studies.

What Is a Corpus?
At a basic level, a corpus is a set of texts that are relevant to the research
questions and that have relevant themes, registers, genres, or text types. At a
more sophisticated level, we can consider a corpus to be “a set of written,
representative and balanced, computationally readable texts that form a
reasonable point of departure as a thematically related language variety,
register, genre, or text-type.” Clearly, this long definition requires some
breaking down, and so the remainder of this section of the chapter examines
each of the elements in this definition so as to provide a better understanding
of what Coh-Metrix studies typically consider to be a corpus.
Language Variety, Register, Genre, or Text Type. By language variety,
register, genre, or text type we simply mean that we have no intention of
splitting hairs over these categorization terms, or trying to define where one
category ends and another one begins (interesting study though that may be).
We acknowledge that any number of researchers may feel that a distinction
between some of the terms is crucial. And, to be sure, we would probably call a
corpus of “narrative introductions” a text type rather than a genre, and a
corpus of public speeches a register rather than a language variety. However,
in Coh-Metrix studies, we have yet to experience reviewers having a problem
with how we choose to use these terms, so we leave the choice of terms up to
the individual researcher.
Written, Computationally Readable Texts. By written, computationally
readable texts we mean that Coh-Metrix can only analyze that which is
computationally analyzable. More simply, there is no slot in Coh-Metrix
through which we can deposit handwritten texts, painted texts, CDs of
talks, DVDs, or any example of sign language or Braille. Although making
such remarks might seem obvious, it is nevertheless important to consider
these limitations of Coh-Metrix because (1) many people ask us, (2) future
developments in Coh-Metrix need to consider these aspects because they are,
after all, language too, and (3) if the researcher’s texts are in any of these
forms, then they will have to be changed to .txt or .doc documents, a process
that might be extraordinarily long and arduous.
Thematically Related. By thematically related we mean that every text in the
corpus is related to every other text in the corpus by a single theme. Thus, just as
“eagle,” “crow,” “robin,” and “swan” are all related to the common theme of
“birds,” so too must every text in a corpus be an example of an overarching
theme. In our example corpus (which we introduced in the previous chapter),
all of our texts fall under the common theme of newspapers.
Representative and Balanced. The terms “representative” and “balanced”
are closely related to the previously discussed notion of “thematically related.”
The key difference is that while thematically related puts the focus on the need
for the texts to be members of a single theme, representative and balanced put
the focus on the need for the theme to have an appropriate membership of
texts. To explain further, the terms “representative” and “balanced” address
the reasonable expectation of someone using the corpus to find within it a
suitable diversity of types of text and a suitable frequency of examples of these
types. To draw an analogy, let us imagine that we happen upon a building that
calls itself Los Compadres. And let us imagine that this building has pinned
on its wall a sign that reads “restaurant.” Within the building, which we take to
be a Mexican restaurant, it would be reasonable for us to expect food items
that included burritos, tacos, enchiladas, and the like. It would also be
reasonable to expect tables, chairs, beer taps, and servers. The presence of
these diverse items constitutes “representativeness.” But now imagine that
inside this building there were just one burrito, one server, one kind of beer,
and 5,000 tables. Such a frequency of examples of the membership would be
extremely poorly “balanced.” Thus, balance refers to an appropriate number
of examples of the membership items.
Turning from a Mexican restaurant to a more text-like example, imagine a
corpus of American newspapers. A corpus of American newspapers is not
simply a corpus of newspapers; it is explicitly a corpus of American newspapers.
As such, it should not contain British, Australian, or Icelandic newspapers
because British, Australian, and Icelandic newspapers are not representative of
American newspapers. And if the corpus of American newspapers is truly a
corpus of American newspapers, then it would have to have both national and
local newspapers, because if it had only national newspapers, then it would be a
corpus of American national newspapers. Further, American national news-
papers have been around for more than 100 years, so if the corpus contained
only articles from, say, 1990 to 2010, then it would not be a corpus of American
national newspapers; it would be a corpus of articles from American national
newspapers from 1990 to 2010. And so on and so forth. The point here is that a
researcher needs to consider very carefully the scope of the corpus in order to
make it sufficiently representative (i.e., having all the major members) and
sufficiently balanced (i.e., having appropriate numbers of the major members).
But note here the use of the word “major.” We will return to this point later in
the section.
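
A quick tally can make these two notions concrete before any analysis is run. The following Python sketch assumes that each text in the corpus has already been given a category label (here the hypothetical labels "national" and "local"); it reports which expected categories are missing altogether (a representativeness problem) and what share of the corpus each category contributes (a balance problem). The function name describe_corpus is ours, not part of Coh-Metrix.

```python
from collections import Counter


def describe_corpus(labels, expected_categories):
    """Summarize which categories are present and how texts are spread over them."""
    counts = Counter(labels)
    # Representativeness: expected categories that have no texts at all.
    missing = sorted(c for c in expected_categories if counts[c] == 0)
    # Balance: share of the corpus contributed by each expected category.
    total = len(labels)
    shares = {c: (counts[c] / total if total else 0.0) for c in expected_categories}
    return missing, shares


# A corpus with only national newspapers is not representative of
# "American newspapers" as a whole.
only_national = ["national"] * 10
missing, shares = describe_corpus(only_national, ["national", "local"])
print(missing)  # local newspapers are absent

# A corpus with both categories can still be poorly balanced.
lopsided = ["national"] * 9 + ["local"]
missing, shares = describe_corpus(lopsided, ["national", "local"])
print(shares)
```

What counts as an acceptable share for each category is a judgment that depends on the research question; the sketch only makes the distribution visible.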
But the terms “representative” and “balanced” don’t apply just to the
diversity of the total items in the corpus. They also apply to the diversity
within the items themselves. That is to say, we must not make the mistake of
thinking that all texts are homogeneous; instead, we must accept that texts
(like pretty much everything else) are made up of many different parts, each
of which may be quite different in nature. For example, let’s consider a
news show, a restaurant dinner, and an ice hockey game. Now let’s divide
each of these examples into thirds (first third, middle third, and final
third). Arguably, the first third of the news show is the most important
part because that’s where the headlines and big stories are most likely to
be. For a restaurant dinner, the appetizer and the dessert may be highly
enjoyable, but the middle third (the main course) is probably what the
customers will remember most about the dining experience. And in an
ice hockey game, action can happen in any of the three periods, but it’s
probably the third period (i.e., the final third) that most people would want
to watch if they could only view one part of the game. A text is very similar.
The opening and the closing are quite different aspects, so much so that they
have come to be known by various names that identify them as distinct
types: openings are variously referred to by terms such as exposition,
introduction, foreword, commencement, and preface; closings are variously
referred to as the denouement, conclusion, postscript, and finale. Even texts as
small as the paragraph may open with something called a topic sentence and
close with something called a warrant sentence (McCarthy et al., 2008). And
we cannot even assume that the openings and closings are equal in size; after
all, the opening of War and Peace (a 2,000-page tome) is hardly equal in
length to the opening of Three Little Pigs. And what about the middle of the
text? Is the middle only the very middle? How many words on either side of
the middle are also “in the middle”? All of these questions need to be
carefully considered so that the corpus can be justified as representative
and balanced.
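
One simple way to start examining the openings, middles, and closings of texts is to cut each text into thirds by word count. The Python sketch below does just that. Equal thirds are only one possible answer to the question of where the “middle” begins and ends, and the function name split_into_thirds is our own illustration rather than anything provided by Coh-Metrix.

```python
def split_into_thirds(text):
    """Split a text into first, middle, and final thirds by word count."""
    words = text.split()
    n = len(words)
    first_cut = n // 3          # end of the first third
    second_cut = (2 * n) // 3   # end of the middle third
    return words[:first_cut], words[first_cut:second_cut], words[second_cut:]


opening, middle, closing = split_into_thirds(
    "one two three four five six seven eight nine ten"
)
# When the word count is not divisible by three, the leftover words
# end up in the later sections.
print(len(opening), len(middle), len(closing))
```

Whatever cut points you choose, state them in the corpus description so that readers can judge whether the sections are comparable across texts.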
For Coh-Metrix analyses, it is vital that the corpora be representative and
balanced. However, let us make it clear that in research in general, the
composition of the corpus depends on the task at hand. For instance, imagine
that we wanted to examine the language of English, with all of its history and
variety. And imagine that to do this we used only one text type, let’s say
newspapers. And imagine further that the representativeness of this corpus
amounted to no more than a single type of newspaper, let’s say The Wall
Street Journal. Such a corpus you might think was extremely flawed (given
what we have previously discussed). However, it is interesting (and maybe a
bit worrying) to note that the majority of computational parsing technology
(including the parser used in Coh-Metrix) has been developed, tested, and
validated on exactly this highly unrepresentative corpus.
Shouldn’t this lack of representation present a problem? In fact, it really
doesn’t present that much of a problem at all (at least for some tasks!).
Even though The Wall Street Journal is extremely unrepresentative of the
English language as a whole, it is nevertheless a pretty large sample and it is
written in English. These two elements alone mean that a colossal amount of
information can be gleaned from it. Indeed, when Gildea (2001) assessed
state-of-the-art parsers by replacing The Wall Street Journal with the Brown
Corpus (arguably the very model of representativeness and balance, having
15 different registers and numerous examples of each), he found the two
corpora produced remarkably similar results.
The point with a corpus as seemingly unrepresentative as The Wall Street
Journal is that we can learn a lot from it. That is, we can still learn a lot from it
if our task is appropriate. For instance, we can use the corpus to learn that the
most common word in the English language is “the,” and within the corpus
we can find numerous examples of typical English syntax: subject-verb-
object. We can also search the corpus to see what is rare in English. For
example, the part-of-speech structure verb-noun-verb-adjective-article-verb
is very uncommon in English (and very uncommon in The Wall Street
Journal corpus). Having identified which structures are rare, we can assume
that those structures will be difficult for readers to process. In short, then, we
can do (and indeed have done) numerous investigations with this corpus, and
the findings from these investigations can be (and indeed have been)
extremely valuable to a wide variety of research fields.
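
The kind of counting described here is easy to sketch. The Python example below tallies word frequencies in a tiny made-up sample (not The Wall Street Journal corpus) and then lists the most common word and the words that occur only once. It counts words only; identifying rare part-of-speech structures would additionally require a tagger, which this sketch does not attempt.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens (letters and apostrophes only)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# A made-up sample standing in for a real corpus.
sample = "The cat sat on the mat, and the dog sat by the door."
freqs = word_frequencies(sample)

most_common_word, count = freqs.most_common(1)[0]
rare_words = sorted(w for w, c in freqs.items() if c == 1)

print(most_common_word, count)
print(rare_words)
```

With a real corpus, the same tally would be run over every text in the collection, and the threshold for “rare” would need to be set with the size of the corpus in mind.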
There are, however, certain things we can’t do with a corpus such as The
Wall Street Journal. We cannot address research questions such as: Are
higher-graded student essays more cohesive? Are doctors’ conversational
turns more cohesive than patients’ turns? Does newspaper English have
more examples of referential cohesion than causal cohesion? We cannot
address any of these questions because: (1) The Wall Street Journal is not in
any way a graded student essay, so we cannot make any claims that are
specifically about graded student essays; (2) The Wall Street Journal is not
in any way a conversation, so we cannot make any claims that are specifically
about conversational English; and (3) although The Wall Street Journal is an
example of a newspaper type, we cannot make any specific claim that it
“generalizes” to all newspapers. As such, the general rule of thumb for
satisfying representation is this: the wider you make the representation, the
more able you are to generalize your conclusions.
A Reasonable Point of Departure. By reasonable point of departure we
mean that we don’t need a “perfect corpus”; we just need one that gets the ball
rolling. The concepts of representativeness and balance (discussed earlier)
make it extremely time consuming and expensive to collect the “perfect
corpus.” Put another way, the concepts of representativeness and balance
mean that the corpora we make must be extraordinarily narrowly defined in
order to be appropriately representative and balanced. Many researchers
working in the field of corpus linguistics take these issues extremely seriously
and dedicate huge amounts of time and effort to making remarkable corpora
that are impressively representative and balanced. The British National
Corpus is a good example of this dedicated effort, as are the famous Learner
Corpus and Brown Corpus. Another prime example is the TASA corpus,
which we have used for many purposes including the calculation of the
Norms in Appendix B.
For a Coh-Metrix study, an expansive effort in constructing the corpus is
not usually required. A corpus of the type used in Coh-Metrix studies is not
the same (nor meant to be the same) as a corpus such as Brown, Learner, or
TASA. Corpora such as those are painstakingly constructed as reference
points, suitable for multiple, extensive, and recursive examination. In a
Coh-Metrix study, the goal of the corpus is seldom the making of a fine and
solid reference repository. Instead, the goal is defined by the research ques-
tion, and the corpus is simply a means to this end (which is why putting it in a
method section is appropriate). As such, the important aspect of the corpus in
Coh-Metrix studies is that it be practical and suggestive, rather than exhaus-
tive and definitive.
The notion of “practical” and “suggestive” leads us back to the key phrase: a
reasonable point of departure. That is, if our research question requires us to
examine a set of texts to find evidence for or against a claim, then the question
is: Where is a practical place to start, from which the results are likely to be
sufficiently suggestive to guide our future research? Let’s say our research
question is: Are newspaper headline stories more cohesive than editorials?
To address this question, as a reasonable point of departure, we would
probably aim for a minimum corpus of, say, 3 major newspapers, with
40 editions of each over some fairly recent time slot (e.g., the immediately
previous 2 months, or 6 months from the previous year). To be sure, whatever
the results, the findings of this analysis can never be more than suggestive,
because the size and scope of the study are extremely limited. Nevertheless, the
corpus is still a reasonable point of departure because, while a positive result
(one that supports the H1 hypothesis; see Chapter 8) is only suggestive, a
negative result (one that finds no differences at all between the two text-types,
the H0 hypothesis) would almost immediately end the research project
(or dramatically change its direction). More importantly, a positive result
would guide the researcher into the next step of the project, which might
include (1) extending the current corpus to include more major newspapers;
(2) extending the corpus to include local newspapers; or (3) extending the
corpus to include English-language newspapers from other places in the
world. This building of the corpus, directed by the findings of the initial
analysis, returns us to the notion of “practical.” At the same time, our
negative-result example, leading to a possible abandoning of the project, also
leads us to the notion of “practical,” because here practical means disposable.
After all,
can you imagine spending a year or more making a definitive corpus only
to find nothing at all in the results? As such, it is much better to start with a
small corpus and build out slowly and carefully, one step at a time, letting
the results of one study guide the direction of the next study, and, whatever
the results, offering only small, humble, and hedged claims as to their
generalizability.
There is one further point on this issue of a reasonable point of departure.
The researcher does not have to have a homemade corpus (like the example
newspaper corpus given earlier). An alternative approach is to use an already
existing corpus (usually one that is established by way of publication). Such a
corpus (we’ll call it the stand-in corpus) may well be perfect for the analysis at
hand, but more often it is not. However, as we seldom have the time and
resources available to put together the perfect corpus, a stand-in corpus is
often a reasonable point of departure. The non-perfect nature of the corpus
makes any results you draw from the analysis suggestive, not definitive, but
these results will still offer direction for future research.
The approach of the stand-in corpus is as commonplace in the discourse
sciences as is the smaller, more practical homemade corpus. To better under-
stand the stand-in corpus, let’s make an example and say that our research
question involves a study of narratives and expository texts. The Brown
Corpus would be a reasonable point of departure for this study because
(1) it has many examples of fiction texts in it; (2) it is large, certainly compared
to most Coh-Metrix studies; (3) it is well established; and (4) it is relatively
easy to get a hold of. But, at the same time, the Brown Corpus has problems:
(1) it is old, having been compiled in the 1960s; (2) it is composed of only
American texts; (3) it is limited in scope because all the texts are 2,000-word
extracts; (4) major registers such as African-American literature are not
present; and, most importantly in this example, (5) the research question
addresses narratives and expository texts whereas the Brown Corpus has
narratives and non-narrative texts. Equating the non-narratives with expository
texts means that the researcher will not gain a definitive answer.
Nonetheless, if the researcher were to find no cohesion differences at all
between narratives and non-narratives (in the Brown Corpus), then a serious
rethinking of the research project would be needed, and many months
(maybe years) will have been saved. From these examples we can better
understand just how important the concept of a reasonable point of
departure is.

How Large Does a Corpus Have to Be?


Probably the most common question that students ask when beginning a
quantitative corpus analysis is how large the corpus has to be. There are three
major responses to that question, and we will discuss each one in turn.
Large Enough to Fairly Reflect the Source. If your research question
concerns, say, books of the Old Testament, then you’re going to have a corpus
of between 39 and 51 texts (depending on which church you take to be the
authority). In this scenario, it is simply not possible to have 200 texts for such
a corpus, because there simply aren’t 200 texts in any version of the Old
Testament. By the same token, it would not be good practice to have fewer
than 39 texts in such a corpus because (1) the texts are very easy to collect and
(2) if you did not include all of the texts, you would not really be assessing the
Old Testament (given the ease of collection).
On the other hand, let’s say that you were interested in a study of Internet
home pages. Some estimates put the total number of Internet home pages in
excess of 1 trillion, but whatever the number, it might be fair to argue that 39 to
51 is hardly a reasonable sample. The ease with which we can gather such
documents, and the incredible number of the documents available, means
that a reasonable size for such a corpus would be in the thousands. More to
the point, such a vast number of available texts should guide the researcher to
want to considerably narrow down the research question.
Large Enough to Drive Forward the Research Project. As we mentioned
earlier, a corpus needs to be representative and balanced, and yet at the same
time it must be practical and able to produce suggestive results. The point is
that the corpus must be (1) large enough to have at least several of the major
writers or resources; (2) large enough to have at least several examples from
the major writers or resources; and (3) large enough so that results stemming
from the research will be sufficiently compelling for the field to accept the
study as a meaningful step forward. This third point is critical. Recall that
discourse scientists use corpora to direct their future research. The future
research is probably going to be decided by the number and type of statisti-
cally significant results garnered from the analysis, and compelling statistical
results simply cannot be observed if the corpus size is too small (for more
on this subject, see Chapter 11 on results). As a very simple rule of thumb,
researchers are advised to have at least 20–30 texts for each variable in the
analyses they conduct. For example, if you want to examine a corpus for
its referential cohesion, it will cost you 20–30 texts. If you want to subse-
quently examine it for its syntactic complexity, it will cost you an additional
20–30 texts. And so on and so forth.
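The rule of thumb above reduces to simple arithmetic. The following is a minimal sketch in Python; the function names and the default range of 20 to 30 texts per variable are illustrative assumptions drawn from the rule as stated here, and nothing in it is part of Coh-Metrix itself.

```python
def texts_needed(n_variables, per_variable=(20, 30)):
    """Return the (minimum, maximum) corpus size suggested by the rule of thumb."""
    low, high = per_variable
    return n_variables * low, n_variables * high


def variables_supported(n_texts, per_variable=(20, 30)):
    """Return the (cautious, generous) number of variables a corpus can support."""
    low, high = per_variable
    # The cautious figure budgets 30 texts per variable; the generous one budgets 20.
    return n_texts // high, n_texts // low


print(texts_needed(2))           # referential cohesion plus syntactic complexity
print(variables_supported(300))  # a corpus of 300 texts
```

Under these assumptions, two variables call for roughly 40 to 60 texts, and a corpus of 300 texts supports roughly 10 to 15 variables.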
Three Hundred Texts of 300 Words Each. The 300:300 response is one
that students like hearing, presumably because it is very easy to understand.
By 300:300 we mean there should be a total of at least 300 texts, with each text
being about 300 words long (so that the mean text length is about 300 words,
with a standard deviation of less than 150, which is half of the mean). So, why
is 300 good?
In short, 300 is not “good.” It is simply a large enough number to probably
cover a wide range of requirements for empirical studies. Moreover, it is a
small enough number to be practical for collection in many studies. Let’s look
closer.
If the corpus has 300 texts, then its chances of being completely unrepre-
sentative and completely unbalanced are dramatically reduced. Of course,
there is no guarantee, but the larger the number, the lower the likelihood.
If the corpus has 300 texts, then, in all probability, it can be analyzed with a
large number of Coh-Metrix variables. The ratio rule of thumb described
earlier suggests that a corpus of 300 texts allows comfortably for a test of 10 to
15 variables.
Finally, a corpus of 300 is a nice round number that allows us to divide the
total into a training set of 200 texts and a testing set of 100 texts. We discuss
training and testing in detail later in this chapter. For now, it is enough to
know that the number 300 is suitable for such divisions.
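As a rough illustration of such a division, the sketch below shuffles a list of file names and splits it into 200 training texts and 100 testing texts. The file names, the seed, and the function name are hypothetical; the only point is that the split is random and can be repeated.

```python
import random


def split_corpus(file_names, n_train=200, seed=2013):
    """Shuffle the file names reproducibly and split them into two sets."""
    shuffled = sorted(file_names)          # fixed starting order
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be repeated
    return shuffled[:n_train], shuffled[n_train:]


corpus = [f"text_{i:03d}.txt" for i in range(1, 301)]  # hypothetical file names
training, testing = split_corpus(corpus)
print(len(training), len(testing))
```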
On the issue of 300 words, we also want to make clear that the number is
simply convenient. Similar to 300 texts, the convenience in no way explicitly
helps the validity of a study, but it does cover a number of possible problems.
For example, texts of 300 words do not take too long to process in Coh-
Metrix, whereas texts containing thousands of words can be problematic
depending on the variables used. Similarly, very short texts (i.e., fewer than
100 words) are problematic for many variables because there are not enough
words to establish confidence in the assessment. And very short texts (i.e., a
paragraph or so) are often unlikely to have developed fully their range of
cohesion values.
The 300:300 rule is not a bad idea to keep in mind when starting a Coh-
Metrix study. However, the research question, the hypotheses, theory, prac-
ticality, and the need for a response that will guide future research must, in the
end, determine the final size of the corpus.
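For readers who want to check a corpus against the 300:300 guideline, here is a minimal sketch. It assumes that words can be counted by splitting on whitespace (Coh-Metrix may count words differently), and the function name and thresholds are our own illustrative choices based on the figures given above.

```python
from statistics import mean, stdev


def check_300_300(texts, min_texts=300, min_words=100):
    """Summarize a corpus against the 300:300 guideline."""
    lengths = [len(text.split()) for text in texts]  # crude whitespace word count
    avg = mean(lengths)
    sd = stdev(lengths) if len(lengths) > 1 else 0.0
    return {
        "n_texts": len(lengths),
        "enough_texts": len(lengths) >= min_texts,
        "mean_words": avg,
        "sd_words": sd,
        "sd_below_half_mean": sd < avg / 2,
        # positions of texts too short for many variables to be trusted
        "too_short": [i for i, n in enumerate(lengths) if n < min_words],
    }


toy = ["word " * 300, "word " * 280, "word " * 60]  # a toy corpus of three texts
report = check_300_300(toy)
print(report["n_texts"], report["enough_texts"], report["too_short"])
```

A report of this kind is only a first look; as noted above, the research question and practicality must determine the final size of the corpus.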

requirements of the corpus


For the analyses you will be conducting on the corpus, and for the conclusions
you will draw from those analyses, two basic assumptions of the corpus have
to have been satisfied. These assumptions (and here, “assumptions” can be
read as “requirements”) are that the items in the analyses (i.e., the texts) are
random and that the items are independent. Let’s deal with each of these
terms.
Random. By random we do not mean “all mixed up.” We mean that the
texts were sampled systematically and objectively, with no example of any text
being favored. For example, if you are collecting a corpus of British literature,
you can’t “accidentally on purpose” make sure that your favorite writer or
favorite book gets included. Similarly, you can’t make sure that the text you
least like gets excluded. Sometimes you will have a long list of possible texts
(say 1,000), but only need 50 of them. In such a case you need to find some
way to randomly select those 50, rather than, say, taking the first 50 on the list,
the last 50 on the list, or the first 50 in alphabetical order. Randomness is
important for the validity of your corpus: You need to be able to claim that if
other researchers follow the same procedure as you, they should come up with
results similar to your own. The degree to which you have collected your
corpus in a nonrandom way is the degree to which your results are unlikely to
be replicated. And if no one can replicate your results, then it starts to look as
if you simply got it wrong.
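One simple way to select 50 texts from a list of 1,000 is to let a seeded random number generator do the choosing. The sketch below is illustrative only: the list of titles, the seed, and the function name are hypothetical. Recording the seed alongside the list means that other researchers following the same procedure can reproduce the same draw.

```python
import random


def draw_sample(candidates, k=50, seed=42):
    """Draw k texts at random, without replacement, from the candidate list."""
    rng = random.Random(seed)  # recording the seed lets others repeat the draw
    return rng.sample(sorted(candidates), k)


candidates = [f"novel_{i:04d}" for i in range(1, 1001)]  # hypothetical list of 1,000 titles
selected = draw_sample(candidates)
print(len(selected), len(set(selected)))
```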
Independent. Each text in the corpus is considered independent if it is
distinct from all the other texts in the corpus. But, of course, all texts in a
corpus are related at some level, otherwise it wouldn’t be a corpus. As such, we
need to clarify the term “independent.” Let’s take an example: Sooner or later,
every student of text analysis gets to thinking, “If I have only 20 texts, and
I really need 40 texts, then why don’t I just split my texts in half?” Don’t do
this! Splitting up texts to increase the size of your corpus is called falsely
increasing your degrees of freedom, which is basically a form of cheating. If
each text is split in half, then the two halves cannot be described as
independent, because each depends on its corresponding half. Of
course, if each text has just one other half, the corpus might seem “independent
enough,” but in that case each text is very closely related to one text and only
distantly related to the remaining 38 texts.
But why does this even matter? It matters because the statistical analysis we
conduct on the corpus takes the number of items in the analysis very
seriously. A corpus is, of course, just a sample of some phenomenon of the
world; it is a sample that, we are arguing, is representative of that phenomenon
(e.g., newspaper stories). The larger our corpus is, the more closely it
resembles the real-world phenomenon, because more of that phenomenon is
in it. Consequently, the larger the corpus is, the greater the
confidence we can have in our analysis, and the statistics we use in our
analysis reflect this. As such, doubling our corpus by chopping each text in half is
likely to get us a “better result” without actually increasing the corpus’s
representation of the world. Consequently, the result will be misleading.
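To see why the number of items matters so much, consider the standard error of a mean, which shrinks as the number of items grows. The sketch below uses made-up cohesion scores and takes the extreme case in which each text is simply counted twice, which stands in for splitting every text into two near-identical halves. It is an illustration under those assumptions, not an analysis of real data.

```python
from math import sqrt
from statistics import stdev


def standard_error(scores):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(scores) / sqrt(len(scores))


# Hypothetical cohesion scores for 20 genuinely separate texts.
scores = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.49, 0.44, 0.53, 0.36,
          0.46, 0.50, 0.41, 0.48, 0.39, 0.52, 0.45, 0.43, 0.54, 0.37]

# Counting every score twice gives 40 items but no new information about the world.
inflated = scores * 2

print(round(standard_error(scores), 4))
print(round(standard_error(inflated), 4))
```

The second standard error is about 30% smaller than the first, even though nothing new has been learned. A test based on it would look more convincing than the data justify.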

cleaning the corpus


Whether corpora are collected by the researcher, designed by professionals, or
borrowed from other studies, few of them are ever clean. A clean corpus is one
that is as close to a human-readable form as possible. In other words, a clean
text looks just like it would look if the writer had just finished typing it,
printed it off, and handed it over to the reader.
So when are corpora ever dirty? Many professional corpora are annotated
for such features as parts of speech, intonation, and even the actions of the
speaker (e.g., “walks out of the room”). In other cases, such as student essays,
odd line breaks may have occurred, bizarre spelling is ubiquitous, and there
are often such features as the student’s long and rather charmless evaluation
of the exercise just undertaken. In still further cases, corpora that have been
passed around from computer to computer tend to “grow” various oddities
such as the odd Spanish letter, a string of mathematical symbols, or maybe
just a wingding or two. And in cases where researchers have converted
documents that include pictures into text files, the picture in the document
disappears, often leaving the caption of the pictures lurking mysteriously in
the middle of the text. Each of these dirties has the potential to seriously
undermine the analysis offered by Coh-Metrix.
The biggest problem with these dirties is that they are never consistent. Or,
put another way, where they have been found to be consistent, we have
designed algorithms to correct for them. As such, it is the researcher who is
ultimately responsible for making sure that the corpus is sufficiently clean,
because, as the old computational saying goes, when garbage goes in, garbage
comes out.
A second issue of cleaning concerns consistency. Many students ask what
they should take out of a text and what they should leave in (e.g., headers,
typos, spelling mistakes, pronunciation guides, etc.). To address this question,
we offer two golden rules of analysis:
1. Unless there is a good reason to take it out, you should leave it in.
2. What you do to one, you do to all.
Rule 1 simply asserts that the default condition of the text is exactly the way
you find it. Any changes made to it after that should be documented and
reported in your paper. Some of the most common changes are removing
annotations and picture captions. The annotations are removed because the
text is unreadable with them in, and if they are left uncleaned, Coh-Metrix
results are likely to be flawed. The picture captions need to be removed
because they are not part of the continuous text that the writer intended.
Moreover, their insertion into the document renders the sentence mean-
ingless, and the corresponding evaluations may be misleading.
Rule 2 means that you cannot pick and choose the texts that you modify. If
you remove something from one text (e.g., a date that happens to be at the end
of a text), then you must check whether any of the other texts also have such a
date (and if they do, then the dates must all be removed, or all kept). The same
consistency is necessary for spelling corrections and typos. It is tempting
when you see a spelling mistake to correct it, but unless you plan on correct-
ing the entire corpus, you should leave things the way you find them.
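One way to honor both rules is to make each change with a single function that is run over every text, and to keep a log of what it touched. The sketch below removes a date that sits on the last line of a text. The date pattern, the text names, and the function name are all hypothetical; a real corpus would need its own pattern.

```python
import re

# Hypothetical pattern: a final line holding only a date such as "12 March 2007".
TRAILING_DATE = re.compile(r"\n\s*\d{1,2} [A-Z][a-z]+ \d{4}\s*$")


def clean_all(texts):
    """Apply one documented change to every text and log which texts changed."""
    cleaned, log = {}, []
    for name, text in texts.items():
        new_text = TRAILING_DATE.sub("", text)
        if new_text != text:
            log.append(name)  # these changes should be reported in the paper
        cleaned[name] = new_text
    return cleaned, log


texts = {
    "N_001": "Once upon a time there was a corpus.\n12 March 2007",
    "N_002": "A second story with no date at the end.",
}
cleaned, changed = clean_all(texts)
print(changed)
```

Because the same function runs over every text, no text is singled out, and the log gives you the record of changes that Rule 1 asks you to report.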
Finally, know that encountering a few dirties across the corpus is not
considered unusual. As a general rule of thumb, we say that the corpus
needs to be at least 95% clean. That is, about 95% of the texts should have
no problems at all, and at least 95% of each text should be thoroughly correct.
If your corpus is very large, and reading through all of it to make sure it is
clean would take considerable time, then assessing a sample of the text (e.g.,
10–20%) is generally considered sufficient.
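Drawing the sample to be checked can itself be done at random. The following sketch picks 10% of the files for a manual read-through and then computes the share judged clean. The file names, the seed, and the judgments are invented for illustration; the judgments would come from a person reading the texts.

```python
import random


def audit_sample(file_names, fraction=0.10, seed=7):
    """Pick a random subset of files to read through by hand."""
    k = max(1, round(len(file_names) * fraction))
    return random.Random(seed).sample(sorted(file_names), k)


def share_clean(judgments):
    """Proportion of audited texts judged to have no problems at all."""
    return sum(judgments.values()) / len(judgments)


files = [f"essay_{i:04d}.txt" for i in range(1, 2001)]  # hypothetical corpus of 2,000 files
to_read = audit_sample(files)
print(len(to_read))

# Hypothetical outcome of the manual read-through: True means "clean".
judgments = {name: True for name in to_read}
judgments[to_read[0]] = False
print(share_clean(judgments) >= 0.95)
```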

organizing your corpus


The organization of your corpus is important for two reasons. First, your
corpus needs to be organized so that the appropriate kinds of statistical
analysis are applied to the assessments. Second, your corpus needs to be
organized so that you can find things. Although organizing your corpus may
seem like a mundane task (and, actually, it is), without careful organization at
the outset, you’ll soon find yourself spending inordinate amounts of time
trying to pick through your files so as to try to make some sense out of things.
In short, mundane task or not, organization of the corpus must be taken
seriously.
Arranging Your Corpus. For most Coh-Metrix analyses, there are four
basic arrangements of the corpus: the between (contrastive), the within (com-
parative), the matched, and the standard. Before explaining why we need to
even care about these arrangements, let’s take a moment to explain what each
arrangement looks like.
The most common organization of data in a Coh-Metrix corpus is the
contrastive (or between or independent). Essentially, a contrastive organiza-
tion has one corpus that is divided into two (or more) roughly equal parts.
The object of the study is to contrast the two parts, the hypothesis being that
the two parts are different. Our newspaper example, the one we gave in the
Elevator Pitch in Chapter 7, is an instance of contrastive analysis between two
categories: local and global news reporting. In examples of published Coh-
Metrix studies, Crossley et al. (2007) contrasted two sets of texts used by
English language learners: one set was authentic texts and one set was
simplified texts. McCarthy et al. (2009) contrasted the writing of three sets
of scientists: one British, one American, and one Japanese. And Duran et al.
(2007) also used three categories to determine temporal cohesion differences
between the categories of narrative texts, history texts, and science texts.
The comparative (or within or repeated) organization again features two
(or more) sets of texts; however, the difference here is that the two sets are not
independent. For example, the two sets could be (1) students’ essays before an
intervention (e.g., a course in which they are taught something) and (2) essays
by those same students written after the intervention. Other forms of com-
parative design are the first half of a story compared to its second half, or a
first draft compared to a second (or final) draft, or two or more sections from
the same article. This last example occurred in a study by McCarthy et al.
(2007), in which the authors looked at five categories in journal articles:
abstracts, introductions, methods, results, and discussions. Because each
category comes from the same article (and therefore, presumably, the same
writers), it is not considered to be an independent arrangement.
A matched corpus is very much like a comparative corpus. The only
difference is that nonindependence is forced on the texts. For example,
Lightman et al. (2007b) examined (rather morbidly) the song lyrics of artists
that had committed suicide. Each artist that had committed suicide was
matched with a similar artist that had not committed suicide (e.g., Ian
Curtis was paired with David Byrne, Kurt Cobain was paired with Chris
Cornell). Effectively, there is no analytical difference between a matched
arrangement and a comparative arrangement, but the right terms should
still be used when describing the data.
And finally, the standard straight corpus, as the name might imply, is
simply the corpus you have without any form of categories. For example,
Weston et al. (2010) analyzed a corpus of free-writes. Each text in the corpus
was given a value for quality, but the corpus as a whole was considered just
one category: free-writes.
The statistical analysis you ultimately use to better understand your results
depends on the arrangement of the corpus. It is for this reason that you have
to make sure that your data is arranged as one of these four types, and not
some kind of odd mixture. That is, you can get into some serious statistical
trouble if some of your data is paired and the rest is independent. For
example, probably the simplest (and yet very powerful) form of statistical
textual analysis is a t-test. In the field of discourse science, a t-test allows you
to make a claim that your groups from your Coh-Metrix analyses are indeed
“different.” However, there is more than one kind of t-test, so the question
becomes which t-test you should use. A paired t-test should be performed if
your corpus is comparative or matched, whereas an independent t-test should
be used if your corpus is contrastive. We discuss statistical analyses in more
detail in Chapter 11. For now, it is enough to know that your corpus arrangement
is critical to establishing the value of your ultimate findings. If your
arrangement is a hodgepodge of paired and independent data, then no statistical
analysis will be appropriate, and therefore no meaningful assessment of your
data can be made.
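To make the difference between the two t-tests concrete, here is a minimal sketch that computes both t statistics by hand using only the Python standard library. The scores are invented, and the independent version shown is the classic pooled-variance form, which is an assumption on our part. In practice you would normally use a statistics package, which also reports the p-value (for example, SciPy provides `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic for a comparative or matched arrangement (paired scores)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1  # (t, degrees of freedom)


def independent_t(group_a, group_b):
    """Pooled-variance t statistic for a contrastive arrangement."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical cohesion scores for five students before and after an intervention.
before = [0.40, 0.35, 0.50, 0.45, 0.38]
after = [0.46, 0.41, 0.53, 0.52, 0.44]

t_paired, df_paired = paired_t(before, after)
t_indep, df_indep = independent_t(after, before)
print(round(t_paired, 2), df_paired)
print(round(t_indep, 2), df_indep)
```

With these made-up numbers, the paired statistic is far larger than the independent one, because pairing removes the differences between students from the comparison. Choosing the wrong test for the arrangement therefore changes the conclusion, not just the label.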
Coding Your Files. In any kind of Coh-Metrix study, it is wise to code the
names of your files to reflect the categories of which they are a part. For
example, in the Duran et al. (2007) study (mentioned earlier in the subsection
on contrastive groups), the narrative texts were coded with the letter N
followed by an underscore, the history texts were coded with an H followed
by an underscore, and the science texts were coded with an S and an under-
score. In many studies, number sequences are preferred to letters. Whatever
the organization, the point is to be consistent, because you will be amazed
how soon you forget what was what and where was where. Here’s an example
of how some of the file names appeared in that study:
N_07_Treasure_23_045
H_12_Civil_07_063
S_09_Cells_18_107
In this coding, the first element represents the category (N, H, or S), the second
element represents the grade level of the text (7–12), the third is a short form of
the name of the text, the fourth is the sequence number of the text within the
category (1–50), and the final element is the number of the text in the entire
corpus (1–150). Note that if the highest numbers are likely to be three figures
(e.g., 107), then smaller numbers also need to have three figures (e.g., 045).
Keeping index names that appear as numbers to the same length may help
later when sorting data. In a matched corpus, the names of the two versions
are likely to be the same except for one key element: the one that indicates to
which of the two corpora it belongs. This single indicator is likely to be the
most important feature of the name (inasmuch as it will probably be the
feature that is viewed most often to check for membership). As such, in a
matched corpus, the distinguishing key is likely to be the first element of the
name.
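The value of keeping numbers to the same length is easiest to see by sorting them as text, which is how file names are sorted. The sketch below builds a name in the format shown above and compares padded with unpadded numbers. The function name and the field widths are our own illustrative assumptions.

```python
def make_name(category, grade, title, seq_in_category, seq_in_corpus):
    """Build a file name whose numeric parts are padded to a fixed width."""
    return f"{category}_{grade:02d}_{title}_{seq_in_category:02d}_{seq_in_corpus:03d}"


print(make_name("N", 7, "Treasure", 23, 45))  # N_07_Treasure_23_045

# Sorted as text, an unpadded "107" comes before "45"; padding restores the order.
print(sorted(["45", "107"]))   # ['107', '45']
print(sorted(["045", "107"]))  # ['045', '107']
```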
Coding your files also becomes important when it comes time to conduct
the statistical analyses. The Coh-Metrix output includes only the names of the
files and the values of the Coh-Metrix indices. And so, the only way to categorize the items
is by means of the file names. If the file names include all of the necessary
information, then the Coh-Metrix data file is ready to be analyzed.2
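Splitting the file names back into their parts can also be done with a few lines of code. The sketch below assumes the five-part naming scheme shown earlier, with parts separated by underscores and no underscores inside the short title. The field names are our own labels, not Coh-Metrix terms.

```python
FIELDS = ("category", "grade", "title", "seq_in_category", "seq_in_corpus")


def parse_name(file_name):
    """Split a coded file name into named fields, one per underscore-separated part."""
    parts = file_name.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected file name: {file_name!r}")
    record = dict(zip(FIELDS, parts))
    for key in ("grade", "seq_in_category", "seq_in_corpus"):
        record[key] = int(record[key])  # numeric fields become integers
    return record


for name in ("N_07_Treasure_23_045", "H_12_Civil_07_063", "S_09_Cells_18_107"):
    print(parse_name(name))
```

Each field can then serve as a variable in the analysis, for example to group the texts by category or by grade level.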

writing up the corpus description


Although the Elevator Pitch doesn’t mention the corpus a great deal, you will
still need to know how the corpus section is written up in your paper. The
corpus section is fairly short and includes just three to four moves.
Referring to Table 9.1, the “What is the composition of the corpus?” move
addresses the broadest description of the corpus. All remaining questions
are far more fine-grained. The “How can the composition be justified?” move
addresses issues of representation and balance. The “How were the texts
collected?” move reveals the dates of the texts, where the texts came from
(e.g., websites, professional corpus, another study, etc.), and details of any
changes that the researchers made to the texts. The final move of “How were
the texts coded?” is only included if the researchers had to further categorize
the texts. That is, if the study were merely comparing native English-speaking
countries (New Zealand and Canada) to non-native English-speaking
countries (Bulgaria, the Czech Republic, Portugal, and Romania), then the
categorization process would be obvious: specifically, they would be
categorized according to where they were found.

2 When using Excel, we generally use the Text-to-Columns feature, which breaks up each of the
parts of the text title into separate columns. Each column can then be used as a variable in the
analyses.

Table 9.1 The four major moves of the corpus section

Question    Major Corpus Section Moves

1 What is the composition of the corpus?
  Our corpus comprises 300 texts, taken from the newspapers of 6 different
  countries. The two native English-speaking countries were Canada and New
  Zealand (100 texts each). The four non-native English-speaking countries
  were Bulgaria, the Czech Republic, Portugal, and Romania (50 texts each).
  The native English-speaking countries have twice as many texts as the four
  non-native English-speaking countries to reflect the assumption of a larger
  readership.

2 How can the composition be justified?
  The four European countries were selected to increase representation while
  staying broadly within a group that can be described as similar (i.e.,
  Western). More specifically, we did not include Middle Eastern countries or
  Asian countries because it is conceivable that cultural differences could
  affect the notion of local and global. As such, we chose to narrow our
  initial foray to representatives of European and Commonwealth countries.
  American and British newspapers were not included in the corpus for several
  reasons. First, because Britain and the United States are two of the most
  internationalized countries in the world, it is often hard to define where
  local issues become global issues. A second reason for not including British
  and American newspapers was accessibility. Although many international
  newspapers are freely available, a good number of British and American
  newspapers require a subscription charge, making their inclusion
  prohibitively expensive. Finally, British and American newspapers tended to
  produce substantially longer stories. The variations in length presented the
  possibility of confounding the analysis.

3 How were the texts collected?
  All texts from the corpus were taken from the second half of 2007. We wanted
  stories to be recent enough that they reflect issues of relevance to the
  readership; however, we also needed the stories to be as complete as
  possible, because local stories can become international stories. For these
  reasons, a time in the recent past was selected. All texts were randomly
  selected from their respective Web sites; however, only headline stories
  were included. Once downloaded and saved, all texts in the corpus were
  manually cleaned for headers, author names, and other features of the text
  that were not part of the story (e.g., dates and newspaper names).

4 How were the texts coded?
  For our initial analysis, we automatically divided the texts using a key
  word search. The key words were the variations on the countries’ names. For
  example, if a text from a newspaper in Romania included “Romania,”
  “Romania’s,” “Romanian,” or “Romanians,” then it was assigned as local; if
  none of the key words were present, then the text was deemed global
  (although nonlocal would also be a reasonable category). The
  key-word-sorting technique resulted in X texts assigned as local and Y texts
  assigned as global. Key word sorting is a common, if crude, technique that
  is likely to result in some texts being misaligned. Nevertheless, for an
  initial investigation, the key word division of texts into local and global
  is a reasonable point of departure.

I. Introduction: At least 1 full page, never more than 2 full pages
   a. Theme
   b. Research Question
   c. Supplementary Research Question
   d. Hypotheses
   e. Theory
   f. Purpose
   g. Relevance
II. Method
   a. Tool description
   b. Corpus description:
      i. What is the composition of the corpus?
      ii. How can the composition be justified?
      iii. How were the texts collected?
      iv. How were the texts coded?
III. Results: 1 to 4 pages, depending on the number of analyses and quantity of
   tables or figures.
IV. Discussion: At least 1 full page, never more than 2 full pages.
   a. Research Question
   b. Supplementary Research Question
   c. Hypotheses
   d. Purpose
   e. Relevance
Figure 9.1. Coh-Metrix Research Paper Outline
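When the categories are not obvious from where the texts were found, as in the key word sorting described under the fourth move of Table 9.1, the coding can be sketched in a few lines. The key word list below follows the Romania example; the function name and the sample sentences are hypothetical, and a real study would need a list for every country in the corpus.

```python
import re

# Hypothetical key words for one country, following the Romania example.
KEY_WORDS = {"Romania": ["Romania", "Romania's", "Romanian", "Romanians"]}


def assign_scope(text, country):
    """Label a text 'local' if any key word for its country appears, else 'global'."""
    for word in KEY_WORDS[country]:
        # \b keeps a key word from matching inside a longer, unrelated word
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            return "local"
    return "global"


print(assign_scope("The Romanian parliament met in Bucharest today.", "Romania"))
print(assign_scope("Oil prices rose sharply across world markets.", "Romania"))
```

As the table entry itself cautions, this kind of sorting is crude: a story about Bucharest that never names the country would be labeled global.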

some final words on your corpus


Back it up. Several times. Several places. Several computers.

back to the outline


Let’s end this chapter by updating our outline of a Coh-Metrix paper. We can
now add the major moves of the corpus and the tool to the outline (see
Figure 9.1).

conclusion
In this chapter we described the material for the experiment (i.e., the corpus).
Predominantly, we provided guidance as to the composition of the corpus,
and the organization of the corpus. And we ended the chapter with examples
of the four major moves associated with the corpus section of a research
paper. In the next chapter we discuss the tool (i.e., Coh-Metrix) that you will
be using in your research project.
10

The Tool

As we mentioned at the beginning of Chapter 9, the Method section of most
Coh-Metrix research papers comprises a section on the corpus and a section
on the tool. In the last chapter, we discussed the corpus, and in this chapter we
complete the traditional Method section by discussing the tool (which in this
case is Coh-Metrix). Of course, by the time you have reached this point in the
book, we assume that you already know a thing or two about Coh-Metrix.
That is, you will likely know what it is for, which buttons to press to make it
function, what the output looks like, and what some of the indices are. But
knowing what Coh-Metrix is and knowing how to describe it for your read-
ership are two quite different things. It is this description of the tool and its
reason for inclusion in your project that forms the focus of this chapter.

the four major moves of the tool section


In a research paper, the section describing the tool is relatively short because
your study is not likely to be about Coh-Metrix; it is merely using Coh-Metrix.
Therefore, all you need in order to describe the tool is to employ the following
four moves: explain (1) what the tool is, (2) what it does, (3) why it can be
trusted, and (4) why it is appropriate for the current study. To better under-
stand these moves, let’s look at some examples that we’ve adapted from a
study by Hall et al. (2007).

The Boilerplate Moves


The first three of the aforementioned moves are fairly boilerplate. They don’t
change much from paper to paper, because there really isn’t very much that
can change. As such, we recommend that you stick fairly closely to what we
have written earlier. Of course, we’re not saying that you should simply copy
and paste these sections, but we are saying that your sections are likely to look
very much like these.
One change that you can make (and really should make) is your selection
of where and how Coh-Metrix has been used in previous studies (Question 3
in Table 10.1). On this note, we advise you to list studies according to the
following criteria: (1) the studies that are most relevant to your own; (2) the
studies from the most prominent journals; and (3) the studies that are most recent.

t a b l e 1 0 . 1 The four major moves of the tool section

Question Major Moves of the Tools Section


1 What is the tool? Recent developments in computational linguistics and discourse
processing have made it possible for researchers to develop a
wide range of sophisticated indices. These indices have been
gathered together in a tool called Coh-Metrix (see Graesser
et al., 2004), developed at the Institute for Intelligent Systems
at The University of Memphis.
2 What does it do? Coh-Metrix processes texts for numerous indices of cohesion,
language, and readability, which together allow the tool to
estimate a wide range of textual features reflecting cohesion
relations, and world knowledge, together with language and
discourse characteristics. Coh-Metrix functions through a
variety of modules including syntactic parsers (Charniak,
2000), latent semantic analysis (LSA, Landauer, McNamara,
Dennis, & Kintsch, 2007), and many other computational
linguistics features (Jurafsky & Martin, 2008). In addition to
its sophisticated indices, Coh-Metrix also provides
researchers with a range of traditional textual measures such
as average word length, average sentence length, and the
readability formulas of Flesch Reading Ease and Flesch-
Kincaid Grade Level (Klare 1974–1975).
3 Why should we Several studies have validated the Coh-Metrix indices, most
trust it? notably the cohesion and LSA indices (McNamara et al., 2010),
the lexical diversity indices (McCarthy & Jarvis, 2010), and the
L2 index (Crossley, Salsbury, & McNamara, 2009). Coh-Metrix
has also been used to help establish a wealth of evidence on a
variety of text analysis studies. For example, McCarthy, Lewis
et al. (2006) demonstrated that Coh-Metrix was an effective
tool in detecting authorship even when individual authors
recorded significant shifts in their writing style. McCarthy et al.
(2007) used Coh-Metrix-based LSA indices to demonstrate
structural cohesion across variously themed psychology
articles. Duran et al. (2006) used Coh-Metrix to assess
temporal coherence across the textual domains of narratives,
history, and science. Other Coh-Metrix studies include
C:/ITOOLS/WMS/CUP-NEW/4417777/WORKINGFOLDER/MCNAM/9780521192927C10.3D 165 [163–175] 9.10.2013
7:54AM

The Tool 165

t a b l e 1 0 . 1 ( cont.)

Question Major Moves of the Tools Section


distinguishing between high-cohesion and low-cohesion texts
(McNamara, Ozuru, Graesser, & Louwerse, 2006), estimating
human-assigned grade levels of published textbooks (Dufty
et al., 2006), calculating textual genre (Duran & McNamara,
2006; McCarthy et al., 2006), assessments of the structural
organization of published high school textbooks (Lightman
et al., 2007a, 2007b), assessments of formal/informal and
spoken/written distinctions across genres (Dempsey,
McCarthy, & McNamara, 2007) Louwerse et al., 2004), studies
of gender differences across texts (Bell, McCarthy, &
McNamara, 2012), and assessments of authentic and modified
texts published for students of English as a second language
(Crossley, Louwerse et al., 2007; Crossley, McCarthy, &
McNamara, 2007). This wide variety and wealth of successful
studies provide compelling evidence that Coh-Metrix is an
ideal tool for investigating the characteristics of text.
4 What are you using it for?
Of particular interest to us in this study are the following
measures: (and those measures would then be listed here)

It is also a good idea if you have actually read the studies that you list; if not, it
can get a little embarrassing during presentations.
As a final point on the third move, note that listing previous studies
that have used the tool (i.e., Coh-Metrix) isn’t a validation of the tool per
se (at least, not in the more traditional sense of validation). Validation of a
tool is typically established by testing that the tool does what it is supposed to
do. Numerous such Coh-Metrix studies have been conducted. For example,
Danielle McNamara and her colleagues showed that Coh-Metrix coreference
measures replicated human assessments of high and low cohesion (McNamara,
Louwerse, McCarthy, & Graesser, 2010; see Chapter 6). We can say that
validation studies of this type establish intrinsic validity. That is, the study itself is
concerned with the validation process. By contrast, we can say that extrinsic
validity rests on evidence of widespread use and
acceptance by the discourse community. Thus, intrinsic validity establishes
that X is suitably representative of Y, regardless of whether anyone treats it as
such, whereas extrinsic validity demonstrates that X is treated by the discourse
community as suitably representative of Y, regardless of whether it actually is.
Needless to say, a combination of both intrinsic and extrinsic validity is most
desirable to establish confidence in a computational tool – and fortunately,
Coh-Metrix has an abundance of both. Consequently, an extrinsic validity
move (such as that given in Table 10.1) should be enough for most readers to be
persuaded that the tool you are using (i.e., Coh-Metrix) has earned sufficient
trust to conduct the task at hand.

Selecting Variables
From Table 10.1, it is only the fourth move – What are you using Coh-Metrix
for? – that must change for each study. For this move you will select the
variables, or banks of variables, that are of most interest to your study. You
will say what the variables are called, why you have selected them, and what
you expect the results to show (i.e., your predictions; see Chapter 11 for more
on this issue). You will also need to describe each of the variables. Sometimes
each index is described separately, and sometimes you will describe the
indices in terms of groups or banks (see Chapter 4).
Selecting variables is not straightforward, and we need to discuss this issue
in quite some detail because if you select too many variables, or you select the
wrong variables, you run the risk of invalidating your study. On the other
hand, if you choose too few variables, you run the risk of finding no results,
making your study essentially worthless. As such, let’s tread very carefully
through this potential minefield.
Deciding How Many Variables to Use. To help you decide how many
variables you can use, we have provided four heuristics. Note that heuristics
are not laws; instead, they are pieces of advice or the generalization of past
practices. You need to consider very carefully how you will apply these
heuristics, taking as much advice as you can find. As you seek out this
advice, you will find many voices that are (shall we say) “animated.” In
short, passions can run high on this subject and you’d do well to spend a
good number of years simply soaking in the vast amount of commentary
that is out there.
The 20:1 Rule. The 20:1 rule says that you can use 1 variable for every 20
items in your corpus. For example, if you are looking at a corpus of 100 essays,
then you can use 100/20 = 5 variables. The number 20 is in no way an ideal, and
many people would strongly argue that 30:1 is far more reasonable because it
allows for more powerful statistical analyses. Of course, more is always better,
but it is probably fair to say that 20:1 is broadly accepted as a minimum ratio of
items to variables (or indices).
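The arithmetic behind this heuristic can be sketched in a few lines of Python. This is only an illustration: the function name max_variables and its parameters are our own labels, not part of Coh-Metrix, and the ratio is passed in so that you can try 30:1 as easily as 20:1.

```python
def max_variables(n_items: int, items_per_variable: int = 20) -> int:
    """Largest number of variables allowed under an items-to-variables ratio.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if n_items < 0 or items_per_variable <= 0:
        raise ValueError("n_items must be >= 0 and items_per_variable must be > 0")
    # Integer division rounds down, which errs on the cautious side.
    return n_items // items_per_variable


print(max_variables(100))      # the 100-essay example from the text: 5
print(max_variables(100, 30))  # the stricter 30:1 ratio: 3
```

Rounding down is a deliberate choice here: with 110 essays you would still be limited to 5 variables at 20:1 rather than being nudged up to 6.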
Use Them All, Report Them All. A second approach to selecting variables
is very simple: just use them all. But if you do use them all, you have to report
them all too. That is, it is just as important for researchers to know which
variables were not significant as which ones were significant (see Chapter 11).
Using all the variables has one major advantage and one major disadvantage.
The major advantage is that the results of the analysis can be seen from an
exceptionally broad perspective. As such, we can textually view a corpus
from numerous angles, providing us with the clearest possible insight into
how the corpus differs between constructs. The major disadvantage is that
using all the variables compromises traditional agreements on the level of
statistical significance. That is, the more variables we use, the more likely we
are to see what appear to be statistically “significant” results. However, such
a result is like shooting for three points in basketball: The more we shoot, the
more baskets we are likely to make, but that does not mean that we are
getting any better at shooting. Therefore, while using all the variables
gets us a grand picture, it makes interpreting the accuracy of the picture
much harder.
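To put a rough number on the basketball analogy, the sketch below computes how quickly the chance of at least one spurious “significant” result grows with the number of variables tested. It rests on a simplifying assumption that we should flag loudly: every test is treated as independent, which is rarely true of related indices. The function names are our own and are purely illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and that no real effect exists.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of false positives expected when no real effect exists."""
    return n_tests * alpha


for m in (1, 5, 10, 50, 100):
    rate = familywise_error_rate(m)
    print(f"{m:>3} variables: P(at least one false positive) = {rate:.3f}, "
          f"expected false positives = {expected_false_positives(m):.2f}")
```

Under these assumptions, 10 variables already give roughly a 40% chance of at least one false alarm at the .05 level, which is why a lone significant result among many tests deserves caution.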
Use Theory. Some people argue that we can use as many variables as we
wish, provided we have good theoretical reasons for using them. Although
there is some merit to this claim, it is difficult to imagine the possibility of
sound theoretical reasons for a large basketful of variables. To be sure, includ-
ing reasoning for the use of any variable is a good idea, and having no reason to
include a variable probably means that it should be left out of the analysis. In
short, the better the theoretical reasons for including a variable, the greater the
benefit of the doubt when it comes to assessing the interpretation of the results.
Train and Test. If your data set is large enough – say, 300 items – then
training and testing is possible (see Chapter 9 for discussions on corpus
size). For this approach, we typically divide the data into two groups, with
the training set being two-thirds of the data (200 items in this example) and
the testing set being the remaining one-third (100 items in this case). We then
apply all the indices (or any number of the indices) to the training set only.
From these results, we take only the variables that meet a predefined level
(say, a p-value of less than .05; see Chapter 11). We then test those variables
that passed the criterion using the testing set data. If the variables are statisti-
cally significant on the testing set (again, say a p-value of less than .05), then
we have reason to have confidence in them.
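The train-and-test procedure can be sketched as follows. Several things here are assumptions made for illustration: each text is represented as a dictionary with a "category" key ("A" or "B") plus one numeric value per variable, all function names are our own, and the p-value comes from a simple permutation test on the difference in group means rather than from whatever test (e.g., a t-test) you would normally run in your statistics package.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def split_train_test(items, seed=0):
    """Shuffle the items and return (training two-thirds, testing one-third)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from an impossible p of 0.
    return (hits + 1) / (n_permutations + 1)


def p_for_variable(items, name):
    """p-value for one variable, comparing the two (hypothetical) categories."""
    group_a = [item[name] for item in items if item["category"] == "A"]
    group_b = [item[name] for item in items if item["category"] == "B"]
    return permutation_p_value(group_a, group_b)


def train_and_test(items, variable_names, alpha=0.05):
    """Return the variables that meet the criterion on BOTH sets."""
    train, test = split_train_test(items)
    # Step 1: screen every variable on the training set only.
    candidates = [v for v in variable_names if p_for_variable(train, v) < alpha]
    # Step 2: re-test only the survivors on the held-out testing set.
    return [v for v in candidates if p_for_variable(test, v) < alpha]
```

The key design point is that the testing set is never touched during screening; a variable earns our confidence only if it passes a second time on data that played no part in its selection.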
Other Considerations in Variable Selection. In the preceding examples we
used the word “variable” rather than “index” or “measure.” We did so because
the jury is still out on whether the 20:1 rule applies to measures or indices.
Indeed, as mentioned, the 20:1 rule itself is not carved in stone. Bearing all this
in mind, our recommendation is to apply the 20:1 rule to measures (i.e., groups
of related indices that all purport to assess the same construct). However,
note that some constructs generate indices that are highly related, and there-
fore generally they are highly correlated (e.g., referential cohesion indices),
whereas other constructs (e.g., word frequencies and syntax) are far less likely
to produce highly correlated results. As such, always try to err on the high side
of items to variables.
A second major consideration in variable selection is to note that you get
rewarded for “success” and punished for “failure.” For example, let’s imagine
that we selected 10 variables and we gave good theoretical reasons for each one
of them. If, in the end, only 1 out of the 10 variables was statistically significant,
we’d have good reason not to trust that lone successful variable. That is, in
a result where we were wrong 9 times out of 10, the basis of our theory is likely
to be highly suspect, and the one success is more likely to be attributable
to chance. Similarly, if only 1 out of 10 referential cohesion indices shows
significant results, there is a very good chance that the one significant difference
occurred purely by chance. On the other hand, if we have significant results for
9 out of 10 analyses, then we can also have quite some confidence in the 10th
analysis, even if it isn’t (quite) “statistically” significant. That is, our theory is so
good that this time it is the one bad result that can be put down to chance.
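The intuition about 1 success in 10 versus 9 successes in 10 can be checked with a binomial calculation. The sketch below asks: if none of the variables had any real effect, how likely would it be to see at least k significant results out of n? As before, it assumes independent tests at the .05 level, which is a simplification, and the function name is ours.

```python
from math import comb


def prob_at_least(k: int, n: int, alpha: float = 0.05) -> float:
    """P(at least k of n independent tests are 'significant' by chance alone)."""
    return sum(
        comb(n, j) * alpha ** j * (1 - alpha) ** (n - j)
        for j in range(k, n + 1)
    )


print(f"At least 1 of 10 by chance: {prob_at_least(1, 10):.3f}")
print(f"At least 9 of 10 by chance: {prob_at_least(9, 10):.2e}")
```

Under these assumptions, a single significant result out of 10 is quite likely to arise by chance, whereas 9 out of 10 is vanishingly unlikely, which is the asymmetry described above.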
A third consideration is that approaches used commonly in the past might
not be the best approaches to be used in the future. For example, the training
and testing approach (described earlier) is common in Coh-Metrix literature
(and common in many types of literature); however, we typically used this
approach during a long process of validation studies in which our goal was to
know how well the variables worked, and how powerful Coh-Metrix could be.
Put another way, the Coh-Metrix team has done plenty of these studies, but
how satisfactory this form of analysis is going forward could be described
as open to discussion. The main bone of contention, as we saw in the previous
chapter, is that any form of analysis that lacks the guidance of theory is of
debatable value to the developing theoretical framework. Of course, if theoret-
ical motivations are appropriately included in the analysis, then training/testing
is less of an issue; but then again, if theoretical motivations are appropriately
included, then there seems little reason to use a training/testing approach.
In the end, the best advice we can give you is the following:
1. Keep your items-to-variables ratio as high as possible. Having a large
corpus – say, 300 items – helps in this endeavor.
2. Think very carefully about each variable before you use it, because if
you use it, you really should report the result (whatever the result is).
3. Non-significant results, although seldom appreciated in the broader
field, can be every bit as enlightening as significant results.
4. Statistical significance is important, but it is not everything. Means,
standard deviations, and especially effect sizes can be just as enlightening
(see Chapter 11 for more on this). As such, some statistical knowledge is
vital for your study.
5. No approach is perfect. You need to be ready to defend the approach
you have taken and to acknowledge its limitations.
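Point 4 in the list above mentions effect sizes. As one common example, Cohen's d expresses the difference between two group means in units of their pooled standard deviation. The sketch below is a minimal illustration using the standard pooled-variance formula; the function name is our own, and your statistics package will normally report this value for you.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Tiny made-up samples: means of 4 and 3, both with a standard deviation of 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Unlike a p-value, d does not shrink or grow simply because the corpus is larger or smaller, which is why it is worth reporting alongside significance.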

the slightly bigger picture


If you’re conducting a Coh-Metrix study, it is likely that someone at some
time will ask you questions such as “Why Coh-Metrix? Why not some
other textual analysis system?” In a similar vein, people are likely to want to
know more about the broader research environment of Coh-Metrix. For
example, “What do you call a researcher who works with textual analysis
tools?” and “What do you call the field within which Coh-Metrix studies
are conducted?” These are all reasonable questions because Coh-Metrix
isn’t an isolated island of research, and those researchers conducting
Coh-Metrix projects aren’t working in a research environment that has
no history and no complementary interests. Indeed, the opposite is much
more the case, with those who developed Coh-Metrix liking to think of
the tool (and those who use the tool) as falling under a very large tent of
scientific research (see Chapters 2 and 3). With this “big tent” in mind,
then, we feel that it is worth knowing (at least) a little about the broader
environment of Coh-Metrix.
In this final section of the chapter we briefly address the slightly bigger
picture of Coh-Metrix. We begin with an outline of the complementary field
of Applied Natural Language Processing (ANLP), discussing the focus of this
field, and why Coh-Metrix studies are prominently represented in its scope of
interest. We then turn to a complementary textual analysis tool, Linguistic
Inquiry and Word Count (LIWC; Pennebaker, Booth, & Francis, 2007; Pennebaker,
Chung, Ireland, Gonzales, & Booth, 2007). Although there are many textual analysis tools available, LIWC
probably has the most comparable history, availability, and breadth of interest
to Coh-Metrix. As such, LIWC is the most suitable candidate to represent
alternative systems. Next we briefly discuss a more qualitative approach to
textual analysis, using concordancers. We demonstrate the difference between
this kind of research and Coh-Metrix projects, and offer some advice as to how
concordance work can complement quantitative studies more closely associ-
ated with Coh-Metrix. Finally, we discuss the algorithms that conduct the
textual analysis performed by tools such as Coh-Metrix. More specifically, we
visit the seldom-discussed trade-off between the accuracy of the variables and
the usefulness of the variables.

Applied Natural Language Processing


In this book, we have used the term “discourse science” to refer to the broad
scope of interdisciplinary research with which Coh-Metrix is typically asso-
ciated. As members of an interdisciplinary pursuit, discourse scientists rec-
ognize that there is considerable overlap between any number of research
areas, and more importantly, discourse scientists recognize that this overlap
in ideas, approaches, and interests feeds the individual and collective deve-
lopment and progress of all contributing fields. But just as there is overlap in
researchers’ ideas, approaches, and interests, so too is there overlap in research
fields. That is, while Coh-Metrix may be associated with discourse science, it is
also a prominent member of a field called applied natural language processing
(ANLP). A basic understanding of this complementary field may be of use
when considering why you’re using Coh-Metrix in your study.
The field of ANLP focuses on how automated approaches to textual analysis
assist with solving language-related issues (Boonthum-Denecke, McCarthy, &
Lamkin, 2012; McCarthy & Boonthum-Denecke, 2012). Of all automated
approaches to textual analysis, no system can claim a greater contribution to
ANLP than Coh-Metrix can. In fact, Coh-Metrix projects feature in 9 of the
55 chapters that form the two volumes edited by McCarthy and Boonthum-
Denecke on ANLP. Like discourse science, ANLP is inherently an interdisci-
plinary field, typically featuring contributions from cognitive psychologists,
computer scientists, and linguists. Perhaps the main difference between the
two fields is simply the focus of the particular project, with the focus of ANLP
inevitably being the computational aspect that is analyzing the construct of
interest. Thus, we could say that anyone who is applying Coh-Metrix in their
research is doing ANLP. The point here is simply that Coh-Metrix studies are
as likely to be recognized as examples of ANLP as they are to be recognized as
examples of discourse science.
Linguistic Inquiry and Word Count. LIWC (Pennebaker et al., 2007) is a
textual analysis system designed to identify social and psychological phenom-
ena. LIWC utilizes a wide variety of dictionaries (or word lists) to report the
percentage of words in a text that are representative of particular psychological
categories. The 2007 version of LIWC provides roughly 80 word categories, but
also groups these word categories into broader dimensions. Some examples
of the broader dimensions are linguistic words (e.g., pronouns, past tense),
psychological constructs (e.g., causations, sadness), personal constructs (e.g.,
work, religion), paralinguistic dimensions (e.g., speech disfluencies), and punc-
tuations (e.g., comma, period). For example, the dictionary of “family” consists
of lexical items such as “aunt,” “brother,” “father,” and “grandchild.” Given a
simple text such as “I saw my aunt, brother, father, and grandchild,” LIWC
would record a textual value of 50 for “family”: (dictionary words / total words)
* 100; which is (4/8) * 100 = 50.
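The calculation just described is easy to reproduce. The sketch below is not LIWC itself: the FAMILY set contains only the four words named above as a stand-in for the real dictionary, the tokenizer is a deliberately crude assumption, and the function names are ours.

```python
import re

# Hypothetical stand-in for the "family" dictionary: only the four words
# named in the text, not the actual LIWC word list.
FAMILY = {"aunt", "brother", "father", "grandchild"}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentage(text, dictionary):
    """(dictionary words / total words) * 100 for one word category."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in dictionary)
    return 100.0 * hits / len(words)


print(category_percentage("I saw my aunt, brother, father, and grandchild", FAMILY))  # 50.0
```

The example sentence has 8 words, 4 of which are in the dictionary, giving the value of 50 reported above.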
The apparent simplicity of the LIWC system should not make you think
its assessments are vapid or error prone. On the contrary, LIWC has been
used in numerous studies to investigate an impressively wide array of con-
structs (Pennebaker, 2011). Moreover, LIWC software can be dated back to at
least 2001 (Pennebaker, Francis, & Booth, 2001), making it one of the earliest
publicly available textual research tools. In short, LIWC’s contribution to
discourse science and ANLP cannot be overstated. And while its approaches
may lack the sophisticated mathematics of more contemporary measures, its
findings present a formidable list of achievements.
LIWC variables and Coh-Metrix variables share some overlap. Indeed, the
overlap is such that Duran et al. (2010) were able to replicate a deception
study that was originally devised by the LIWC team. However, while several
descriptive variables are certainly comparable across the two systems, their
respective goals are fairly distant. LIWC assesses the degree to which a given
construct is present in a given text; Coh-Metrix seeks to better assess a text for
its potential readability and comprehension. Clearly defining the purpose of
your own study should help you decide whether LIWC or Coh-Metrix is the
more appropriate system for your particular project.
Concordancers. A concordancer is any computational tool that
focuses on the identification of words in context. Thus, whereas “calculators”
(e.g., LIWC) focus on adding up how many times words occur in texts, con-
cordancers focus on identifying the snippets of text in which those words occur.
A concordancer is useful because it tells us about the company that any given
word keeps. For example, Rufenacht, McCarthy, and Lamkin (2011) assessed
the difference between early-learner reading texts for native English-speakers
(e.g., fairy tales) and conventional, early-learner reading texts for English-
language learners. Specifically, the authors used a concordancer to compare
the company of highly common words (e.g., “the”). The analysis suggested that
fairy tales were significantly more likely to feature concrete nouns with the
word “the” (e.g., “the ground,” “the fire,” “the wood,” “the ogre,” “the palace”),
whereas English-language learning texts were more likely to feature abstract
nouns with the word “the” (e.g., “the way,” “the idea”).
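To make the idea of “the company a word keeps” concrete, here is a minimal keyword-in-context sketch. It is not how AntConc, MonoConc, or any other concordancer is implemented; the tokenizer, the function names, and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(text, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence."""
    words = tokenize(text)
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, word, right))
    return lines


def right_neighbors(text, keyword):
    """Count the words that immediately follow each occurrence of keyword."""
    words = tokenize(text)
    return Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == keyword)


# Invented sample sentence, loosely in the style of a fairy tale.
sample = "The ogre left the palace and crossed the wood to reach the fire."
for left, kw, right in concordance(sample, "the", window=2):
    print(f"{left:>20} [{kw}] {right}")
print(right_neighbors(sample, "the").most_common())
```

The first function presents the text, as a concordancer does; the second takes one small step toward counting, which is where a comparison such as concrete versus abstract nouns after “the” would begin.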
Numerous concordancer tools are freely available for download. Some of the
more famous systems include AntConc (http://www.antlab.sci.waseda.ac.jp/
index.html) and MonoConc (http://www.monoconc.com). Systems such as
these are easy to operate, function across a wide variety of platforms, and include
numerous textual investigation features (e.g., word counts, lists, contexts).
Concordancers offer simplicity and accessibility, coupled with a rich history in
textual analysis. Although they typically lack sophisticated measuring com-
ponents, they can provide useful insight, often leading to a rich vein of inves-
tigation. One further advantage is that even the most technophobic of
departments can generally find some tolerance for concordancers. This toler-
ance presumably stems from the fact that concordancers seldom feature any-
thing more than the most basic of measures. This lack of explicit evaluation
means that there is little controversy in their output; after all, the output is
simply the co-text of any given search term.
For many researchers who might call themselves discourse scientists, a
concordancer is likely to be a supplemental rather than leading vehicle of
inquiry. That is, a concordancer doesn’t truly assess text; it merely presents it,
allowing researchers to form hypotheses more often than test them. While it
may be fair to say that concordancers seldom lead the research charge, it is also
perhaps fair to say that they are underused in presenting examples of claims
that have been formed by more sophisticated approaches. Some textual analysis
systems (e.g., the Gramulator; McCarthy, Watanabi, and Lamkin, 2012) have
therefore included simple concordancing modules so that derived evaluations
could be supplemented by textual extracts. Typical Coh-Metrix constructs
(e.g., cohesion) don’t necessarily lend themselves easily to textual examples
(as cohesion may be played out over long stretches of discourse at various levels
of abstraction); nevertheless, when possible, examples are always useful for
readers, so researchers using Coh-Metrix may well consider whether a con-
cordancer could be beneficial in their project.
Textual Analysis Algorithms. Coh-Metrix uses a wide variety of textual
analysis algorithms. Many of these algorithms (e.g., sentence length and word
frequency) are relatively simple and have long since shaken off any mean-
ingful controversy that may have accompanied them. On the other hand,
many Coh-Metrix variables are still the subject of debate. For example, many
studies (e.g., McCarthy, Rus, Crossley, Bigham, Graesser, & McNamara, 2007)
have reported inconsistencies with LSA variables (see Chapter 3 for a descrip-
tion of LSA). And there are measures such as lexical diversity, which have
a broad range of approaches, often leading to contrasting findings (see
McCarthy & Jarvis, 2007, 2010, 2012). Considering this issue of measurement
validity, and also bearing in mind that Coh-Metrix is, in essence, a large
repository of such measures, it is worth considering the theoretical relevance of
the variables that Coh-Metrix makes available. As with prior discussions on the
choice of tools and the choice of approaches, the issue here is why we might use
this variable (and not that one), and what we should and should not expect
from this variable.
Coh-Metrix incorporates numerous sophisticated algorithms, which are
developed based on principles of artificial intelligence, linguistics, and cognitive
psychology. In general, the purpose of these algorithms is to either simulate or
imitate human processes. In terms of simulation, the objective is to develop
algorithms that are operationally comparable to certain cognitive processes. In
this case, the algorithms are constrained by theory, which may affect their
performance. However, the goal in this case is to learn more about how the
mind might work, and so the algorithm’s function is less about how accurately it
dissects, clusters, or categorizes text and more about how it informs us as to
theoretical perspectives. In terms of imitation, the objective is to develop
algorithms that are operationally comparable to certain human performances.
In this case, the algorithms are less concerned with how we do things and more
with how well we do these things. Like simulation, imitation algorithms may
be guided by theory, but they may also have little or no a priori theoretical
connection to the underlying cognitive processes. For example, most people
can perform the task of differentiating between a narrative text and an expos-
itory text. Cognitively derived algorithms (e.g., LSA) can also perform this task
well; however, the genres can also be distinguished well based solely on the
distribution of the letter d (hardly a cognitive approach!). The letter d, as it
turns out, is commonplace in narratives because it features as the last letter in
the past tense of regular verbs. In contrast, the past tense is relatively rare in
expository texts. Here then, there is little reason to imagine that our minds
process text in terms of instances of a single letter, although that single letter (as
it turns out) works very well.
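An “imitation” measure of this kind takes only a few lines to write. The sketch below computes the proportion of alphabetic characters in a text that are the letter d. The two sample sentences are invented purely for illustration, the function name is ours, and no classification threshold is implied; a real study would derive one from a corpus.

```python
def letter_rate(text, letter="d"):
    """Proportion of alphabetic characters in text that equal the given letter."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return letters.count(letter) / len(letters)


# Invented one-sentence samples, not drawn from any corpus.
narrative = "The ogre walked to the palace and opened the gate."
expository = "Plants use light to make sugar from water."

print(f"narrative:  {letter_rate(narrative):.3f}")
print(f"expository: {letter_rate(expository):.3f}")
```

Notice that nothing in this function knows anything about verbs, tense, or genre; whatever success it has comes entirely from a surface regularity of English spelling.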
This discussion on algorithms (simulation versus imitation) informs us
that Coh-Metrix text analysis is not always about getting the best numerical
result. That is, if we want to learn more about how people process text, then a
“weaker” LSA result may be more informative than a “stronger” instances-of-
the-letter-d result. On the other hand, if your goal is classification (a pursuit
common in fields such as text mining, data mining, and machine learning),
then cognitive theory is probably not as high on your priority list. What
matters then is your goal. And the variables you select in your project should
reflect that goal (for a more formal discussion of this topic, see McNamara,
Crossley, and Roscoe, 2012).
This discussion of simulation versus imitation leads us back to an earlier
discussion on how many variables to use in your analysis. If your project is
more theoretically driven, then you’re likely to have to invest a fair amount of
time in selecting and describing your variables. By contrast, if your project is
simply to develop something for a relatively “dumb” classification applica-
tion, then you’re unlikely to want to spend much time giving the theoretical
underpinning for the variables you use; instead, you’re likely to want to throw a
lot of variables into the pot to see which ones stick, because you wouldn’t
want to miss some potentially useful index (even if its reason for “working”
isn’t particularly obvious). Although this “dumb” task does (probably) allow
for a greater generosity in the ratio of variables to text, you’d do well to
remember that keeping the ratio at around the 20:1 or 30:1 minimum is
still highly recommended.

In Sum
The development and application of textual analysis tools can be placed in the
field of ANLP, which is dedicated to identifying, investigating, and resolving
language-related issues through automated approaches. Coh-Metrix studies
form one of the most prominent areas of this field, and that central position
looks likely to continue well into the future.
In terms of textual analysis systems, it is evident that Coh-Metrix is an
immensely powerful tool. Clearly, Coh-Metrix is also one of the most widely
applied and best-known textual analytics tools. But Coh-Metrix is not the
only textual analytics tool, and neither is its quantitative approach the only
approach available in textual investigative studies. Other tools (e.g., LIWC)
and contrasting analysis approaches (e.g., concordancing) are also available
to researchers, and knowing a thing or two about these other systems and
approaches may help you better design and execute your projects.
In terms of the algorithms that Coh-Metrix employs, most are theoretically
derived, and those theoretical underpinnings are described at length in Part I
of this book. Other algorithms in other systems may well produce “better”
accuracy results in some tasks (such as classification) because those variables
are derived more for the purpose of performance and are less constrained by
interests in cognitive processing. When selecting your variables, you should
consider the purpose of your project, and you should understand that the
variables you choose may not necessarily lead to the best statistical results
(in some particular instance of a project). Remember that a researcher’s goal
is likely to be expanding the theoretical framework rather than getting a single
“good result.” In short, try to keep in mind the (slightly) bigger picture.

back to the outline


Let’s end this chapter by updating our outline of a Coh-Metrix paper. We can
now add the major moves of the tool to the outline (see Figure 10.1).

I. Introduction: At least 1 full page, never more than 2 full pages
   a. Theme
   b. Research Question
   c. Supplementary Research Question
   d. Hypotheses
   e. Theory
   f. Purpose
   g. Relevance
II. Method
   a. Tool description:
      i. What is the tool?
      ii. What does it do?
      iii. Why should we trust it?
      iv. Why are we using it in the current study?
   b. Corpus description:
      i. What is the composition of the corpus?
      ii. How can the composition be justified?
      iii. How were the texts collected?
      iv. How were the texts coded?
III. Results: 1 to 4 pages, depending on the number of analyses and quantity of
     tables or figures.
IV. Discussion: At least 1 full page, never more than 2 full pages.
   a. Research Question
   b. Supplementary Research Question
   c. Hypotheses
   d. Purpose
   e. Relevance
Figure 10.1. Coh-Metrix Research Paper Outline

conclusion
In this chapter we have described the tool you’re likely to use in your experi-
ment (i.e., Coh-Metrix). We have provided the four major moves associated
with describing the tool. We also have discussed some of the slightly broader
issues concerning computational textual analysis. In the next chapter we
discuss the basics of how to write up the Results section of the paper.

11

The Results

The Results section in a research paper is generally the last section to get started
but it is often the first section to get finished. That is, once you have collected
your data (the corpus), you’ll need to analyze it, and once you’ve analyzed it,
writing it up is a relatively simple task because the Results section is (or at
least can be) highly formulaic. Indeed,
some software (e.g., the Gramulator; McCarthy, Watanabi, & Lamkin, 2012)
actually conducts statistical analyses and automatically outputs an acceptable
(if highly formulaic) results section.
As in Chapters 8, 9, and 10, this chapter looks at the writing process for a
short Coh-Metrix paper in terms of moves and frozen expressions. We will
also look briefly at the meaning of those strange letters and numbers that are
the main feature of the results section (the t’s, p’s, d’s, etc.). We have made
every effort to make this chapter as accessible as possible, assuming that the
reader is relatively new to reporting statistical results; however, as mentioned
at the beginning of Part II, we have also assumed that the reader has some
statistical knowledge. Therefore, the reader should be aware that it is beyond
the scope of this book to explain in any great depth what statistics are, which
kinds of statistics are appropriate for which kinds of analyses, how statistics
work, how they are calculated, how they should be interpreted, and how they
are often misinterpreted. To address questions such as these in more detail,
there are excellent resources available, such as the textbooks SPSS Made
Simple (Kinnear & Gray, 2008) and SPSS for Intermediate Statistics: Use
and Interpretation (Leech, Barrett, & Morgan, 2008). There are also excellent
Web resources such as www.talkstats.com and http://vassarstats.net.
To keep this chapter concise, we have provided the absolute minimum of
what you need to know for quantitative empirical research studies using tools
such as Coh-Metrix. That said, the information we provide should be enough
for many students and early researchers (especially those who do not come
from disciplines with a strong tradition of statistics) to construct a Results
section that is satisfactory for many abstracts for many conferences. Some
people who are more advanced in their research might consider our examples
to be too short or too simple, but given the audience we assumed in Chapter 7
for this section of the book, we consider it a reasonable point of departure.
Also note that because we need to demonstrate different results scenarios in
this chapter, we will not be using any genuine data sets, although the data and
results we describe are all perfectly plausible.
Clearly, it is somewhat risky to announce unabashedly that this chapter
will be considerably less than a homage to statistics. We also acknowledge
that our lack of statistical deference runs the risk of frustrating some readers
and even offending others. Such readers will have some justification in feeling
aggrieved because we will take a certain degree of poetic license in explaining
some of our uses of statistics. To be sure, our poetic license might even
expand to the odd explanation that is kinda-sorta, eh . . . wrong: Thus, for
example, we might say something akin to jumping off a tall building is likely to
kill you; whereas, in fact, as any real statistician knows, the act of “jumping
off” a building makes absolutely no difference whatsoever: it’s the sudden
stop at the bottom that does all the damage. This said, to whatever degree we
stray from the land of “strictly speaking,” we do so only in the interests of
building a foundation of understanding, but you should know that many
might feel the need to contradict us in places, and if so, we would make no
objections to that.

before starting
An important first step in any research, and particularly in research conducted
with Coh-Metrix, is to check your data. Any number
of Results sections have been written in our lab, only for the student to find
out later that the data set was flawed. The most important rule of thumb is to
check the ranges and means for all of the variables in the data set. Norms have
been provided in Appendix B that should give a clear idea of what minimum,
maximum, and average values to expect. Let’s take, for example, the refer-
ential cohesion measures that have ranges between 0 and 1. If you see that the
mean is greater than 1, or that the upper range exceeds 1, then that is a clear
indication of a problem. A second rule of thumb is to think about what the
expected values are given the nature of the corpus, and check whether the
means seem reasonable given the expectations. This is of course a more
mindful and challenging evaluation of the data.
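The first rule of thumb can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measure names, the data values, and the bounds below are hypothetical placeholders (real bounds should come from the norms in Appendix B), and `check_ranges` is a helper name invented for this sketch.

```python
from statistics import mean

# Hypothetical expected bounds for two referential cohesion measures;
# real bounds should be taken from the published norms.
EXPECTED_RANGES = {
    "noun_overlap": (0.0, 1.0),
    "argument_overlap": (0.0, 1.0),
}

def check_ranges(rows, expected=EXPECTED_RANGES):
    """Return a list of readable warnings for out-of-range values or means."""
    warnings = []
    for name, (low, high) in expected.items():
        values = [row[name] for row in rows]
        smallest, largest, average = min(values), max(values), mean(values)
        if smallest < low or largest > high:
            warnings.append(
                f"{name}: observed range {smallest} to {largest} "
                f"falls outside {low} to {high}"
            )
        if not low <= average <= high:
            warnings.append(
                f"{name}: mean {average:.3f} falls outside {low} to {high}"
            )
    return warnings

# Made-up data: the third text has an impossible argument_overlap value.
rows = [
    {"text_id": "t01", "noun_overlap": 0.42, "argument_overlap": 0.55},
    {"text_id": "t02", "noun_overlap": 0.31, "argument_overlap": 0.47},
    {"text_id": "t03", "noun_overlap": 0.58, "argument_overlap": 5.10},
]

for warning in check_ranges(rows):
    print(warning)
```

A check of this kind only catches values that are impossible; the second rule of thumb (whether the means are reasonable for the corpus) still requires the researcher's judgment.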

Problems in a data set can arise from any number of missteps in the process
of creating the data set. One misstep may arise in the corpora. If the
corpora are not compiled, cleaned, organized, and coded correctly and
thoroughly (as described in Chapter 9), Coh-Metrix will still chug ahead and
spit out a seemingly fine analysis of whatever it was fed. A second common
misstep can occur when compiling the data. The most common mistake we
have seen is when students merge data sets using copy and paste (e.g., rather
than using a merge function). We cannot count the number of times a copy-
and-paste was done without aligning the data sets correctly (merging by a
common ID is the only safe way to merge data sets). And so, our first and most
important piece of advice in this chapter is to start by checking your data.
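As a minimal sketch of what merging by a common ID means in practice, the following Python joins two small data sets on a shared identifier and refuses to continue if the identifiers do not line up. The column name `text_id`, the data values, and the function `merge_by_id` are all hypothetical and used only for illustration; statistical packages offer their own merge functions that serve the same purpose.

```python
def merge_by_id(left, right, key="text_id"):
    """Join two lists of row dictionaries on a shared ID column.

    Raises ValueError if an ID is duplicated or appears in only one data
    set, so that a misaligned merge fails loudly instead of silently.
    """
    def index(rows):
        indexed = {}
        for row in rows:
            if row[key] in indexed:
                raise ValueError(f"duplicate ID: {row[key]}")
            indexed[row[key]] = row
        return indexed

    left_index, right_index = index(left), index(right)
    unmatched = set(left_index) ^ set(right_index)
    if unmatched:
        raise ValueError(f"IDs found in only one data set: {sorted(unmatched)}")
    return [{**left_index[i], **right_index[i]} for i in left_index]

# Made-up data: the two data sets list the texts in different orders,
# which is exactly the situation in which copy and paste goes wrong.
cohesion = [
    {"text_id": "t01", "noun_overlap": 0.42},
    {"text_id": "t02", "noun_overlap": 0.31},
]
regions = [
    {"text_id": "t02", "region": "SW"},
    {"text_id": "t01", "region": "NW"},
]

merged = merge_by_id(cohesion, regions)
print(merged)
```

Pasting the `region` column next to the `noun_overlap` column would have labeled text t01 as SW; matching on the ID avoids that regardless of row order.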

reporting results
For our major results examples, let us imagine that a group of researchers
have become interested in essays produced in the English-speaking region of
Whereverland. A number of previous papers in the field have led to the
theoretical framework positing that writers from the north of Whereverland
appear to take great care in story writing with their explanations. Similarly,
the theory posits that writers from the south of Whereverland report stories
with more of a narrative style. The north, we learn, is more densely populated
than is the south, with greater numbers of businesses, colleges, and city folk.
The theory suggests that these people want their information quickly and
decisively, leading to the more expository form of essay. The south, appa-
rently, has a greater oral tradition, and it is argued that this tradition may have
blended into the writing style of people in this area. The researchers in the
study have sought to find empirical, quantitative evidence to support the
theory described here. They have hypothesized that the essays of Northern
Whereverland writers would feature a higher degree of referential cohesion
because coreference is a feature of expository writing (as compared to the
narrative style). After collecting an appropriate corpus for the analysis, they
processed the texts using Coh-Metrix and are now reporting their results.

The goal of our analysis was to determine the difference in referential cohesion
between the essays of writers from Northern Whereverland (NW) and the essays
of writers from Southern Whereverland (SW). In order to address this goal, we
conducted an independent t-test. The result was as predicted: (NW: M = 0.527,
SD = 0.259; SW: M = 0.347, SD = 0.160; t (38) = 2.651; p = .012; d = 0.838). The result
suggests that NW essays might deploy greater explanatory features in their writing
and/or SW essays might deploy a more narrative style.

This Results section features five major moves. The first two of these moves
(which comprise the first two sentences) are unlikely to change all that much
from study to study. And although the other three moves will change depend-
ing on the results, each of the moves remains very formulaic. We discuss each
of the moves in the order they appear.

move 1: remind readers why we are here


The goal of our analysis was to determine the difference in referential
cohesion between the essays of writers from Northern Whereverland (NW) and
the essays of writers from Southern Whereverland (SW).

Recall from Chapter 7 that the goal of a study, its theoretical framework,
hypotheses, and research question are all closely related. However, when we
are writing the introduction section of a paper, it is necessary to flesh out the
differences between each of these aspects so as to clearly form common
ground between the writer and the reader. But by the time readers have
reached the Results section of the paper, they will have expended a consid-
erable amount of their cognitive resources on coming to understand the
corpus and the tool (see Chapters 9 and 10). Consequently, readers are likely
to appreciate a gentle reminder of what the research is centered on. As such,
the first move of the results section is no more than a brief recap of the
research question.
Note that the NW in this move (and elsewhere) refers to the essays of
writers from Northern Whereverland, and the SW refers to the essays of writers
from Southern Whereverland. Many novice researchers have the idea that
abbreviating everything is something of a rite of passage, akin to a first
cigarette or getting a speeding ticket. And, indeed, abbreviations in results
sections are common practice, but they can be something of a burden for
readers to have to recall and unravel, so use them sparingly.

move 2: inform readers as to how the analysis was conducted

In order to address this goal, we conducted an independent t-test.

The second move of the Results section is a simple statement of the
statistical test used – in this case, a t-test. For rudimentary statistical analysis
(such as a t-test), it is not necessary to explain why the researcher has chosen
that particular form of analysis. However, the more your analysis goes off
the beaten statistical track (e.g., logistic regressions and hierarchical linear
models), the more you’ll need to explain why you are using what you are
using. Of course, the question is: How do I know whether my choice of analysis
requires an explanation? A good rule of thumb is to consider how many times
you have used or read about the statistical method you are using (and how
many times your audience is likely to have done the same). The higher you
consider that frequency to be, the less you need to discuss it. A second
heuristic that might be of some use on this matter is the Excel option. Excel
is a very commonly employed Microsoft spreadsheet program that calculates numerous
functions, including some statistics. Our Excel heuristic is simply this: if Excel
can do it (without the need for any additional add-in), then it is common
enough for the audience to be able to understand the approach. Returning to
our current example, given that a t-test might well be the most frequently
employed statistical test of all, it generally requires no great explanation in
your paper (although, if you’re a student, your professor might request one in
order to demonstrate that you understand what you are doing and why you
are doing it). Note also the word conducted is used in this move. Informally,
we might say run a t-test or do a t-test, but conduct is likely to garner greater
appreciation in formal circles.

move 3: what was the result (in prose)?


The third move has five major alternative forms. As such, we have to work out
which of the forms is the most appropriate for the given results. For now, we
just explain the phrase that was given in the example, and we consider the
four other possibilities later in the chapter.
In our example, the move is enacted with the frozen expression: The result
was as predicted. To understand the meaning of the phrase, we need to recall
what we learned in Chapter 8 about the research question. There, we advised
you to word the research question so that the answer would always be “yes.”
Thus, if the result of the t-test analysis suggests that “yes” was indeed the right
answer, then you can write: The result was as predicted.
Note the word “predicted” in this move. As we discussed in Chapter 8, the
word “predicted” is very powerful in research because it is the essence of what
“science” is. That is, science is all about making predictions and testing those
predictions. Any research that does not predict or cannot have its predictions
reasonably tested is generally considered nonscientific. As such, the prosaic
Results move is one of the most important in a research paper, not least
because it makes all of us who do this kind of work feel like we can justify
calling ourselves scientists.

move 4: what was the result (numerically)?


The move that presents the numeric (or quantitative) result in the example is
NW: M = 0.527, SD = 0.259; SW: M = 0.347, SD = 0.160; t (38) = 2.651; p = .012;
d = 0.838. In this chapter, we are only dealing with what we need to know;
therefore, we touch on many of the elements of this move only lightly.
Breaking down the move then, we first need to understand that the results
contain three different types of abbreviation. The first type of abbreviation
(at least, in this example) refers to the groups in the analysis, NW and SW.
The NW (as we mentioned earlier) refers to the group of texts that
represent the Northern Whereverland writers. Note that NW precedes SW
in this example because NW was the result that was predicted to be higher.
Obviously, in many results sections there will be no need for the groups to be
abbreviated. Male and Female, for instance, are suitably short, and an abbre-
viation of M and F would be confusing because these letters are also used as
statistical symbols. The second set of abbreviations includes M and SD. These
are called descriptive statistics and their function is to summarize what the
data is rather than what the data might imply. In contrast, the third set of
abbreviations, which include t, p, and d, are referred to as inferential statistics.
Their function is to allow us to make generalizations about what our “sample”
of data might tell us about the entire “population” of data (had we been able to
collect it all).
Looking more closely at the descriptive statistics, the M in the result stands
for mean (which has exactly the same meaning as average). When we
compare M for NW and M for SW we see that the M of NW is apparently
the higher number. The frozen expression we often use to describe this
relationship between the means is “. . . in the direction of . . .”. Thus, we
could say, the result was in the direction of NW. We can view the mean as the
apparent result. We can call it the apparent result because it sure looks like
NW is higher than SW, but appearances can be deceptive, as we shall see. The
SD in the results stands for standard deviation. The standard deviation
reflects the (in)consistency of the values that make up the mean. For example,
if all the values in a data set are 5, then the mean will be 5 and the standard
deviation will be 0. The more that the numbers vary, the higher the standard
deviation becomes. The standard deviation becomes more important the
closer it gets to the mean. Thus, we can view the standard deviation as a
measure of concern with the data. If the standard deviation is high, such as
more than halfway toward the mean, then the result is likely to be highly
varied, and, as a consequence, of a high cause for concern. High variation can
be bad because it suggests the mean might not really be a useful representative
number for the data set. For example, the mean of 1, 1, 1, 10, and 10 is 4.6. But,
of course, it would be hard to argue that 4.6 tells us much about what the data
is (i.e., how it is distributed). The high standard deviation (i.e., 4.93) warns us
that we might have trouble. Under such circumstances, you should first take a
look at the distribution of the data and make absolutely sure that the data are
correct. In our experience, unusual standard deviations are often a sign that
the data were computed, compiled, or calculated incorrectly. If you have confirmed
that the data are error-free, the standard deviation should inform the interpretation
of the results. The mean and the standard deviation work well together
because the mean tells us the apparent result and the standard deviation
indicates whether we can trust that appearance.1
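The small example above (1, 1, 1, 10, and 10) can be checked with Python's standard `statistics` module. The sketch also applies the chapter's heuristic of treating a standard deviation of more than half the mean as a warning sign; the variable name `sd_is_worrying` is ours, and the halfway threshold is only the rule of thumb described in the text, not a formal test.

```python
from statistics import mean, stdev

values = [1, 1, 1, 10, 10]

m = mean(values)    # arithmetic mean
sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Heuristic from the text: an SD of more than half the mean is a warning sign.
sd_is_worrying = sd > m / 2

print(f"M = {m:.2f}, SD = {sd:.2f}, worrying = {sd_is_worrying}")
```

Note that the value of 4.93 quoted in the text corresponds to the sample standard deviation (`stdev`); the population version (`pstdev`) would give a smaller number for the same data.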
Turning to the inferential statistics, the t of the results is the t-value of the
t-test that was conducted. The t-value is calculated from a formula that incorporates
the previously discussed means and standard deviations. In the previous paragraph
we said that the mean tells us the apparent result and the standard
deviation indicates whether we can trust that appearance. The t-test is a
much more rigorous assessment of the same information, and it allows us
to go beyond a summary of our data set (i.e., descriptive data) to making an
inference about the population from which that data was taken. Ever so
basically, the higher the t-value is, the greater the difference is between the
two groups of data that were tested (i.e., the coreference values of the texts for
NW and SW). Importantly, t-values are highly dependent on how many items
are being assessed. Here, however, is
where it starts to get tricky. In the current example, we see t (38) = 2.651. The
38 in parentheses means there were 40 items (i.e., 40 total texts in the
corpus). In an independent t-test, the number in the parentheses is always the total
number of texts minus 2 (i.e., 40 – 2 = 38). Why this number is what it is belongs to the
arcane subject of degrees of freedom, which is beyond the scope of this book
(fortunately for us). All we really need to know about this number is that the
higher it is, the higher the value of t is likely to be.2
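To make the relationship between the means, the standard deviations, and t a little more concrete, here is a rough sketch that computes t and the degrees of freedom from summary statistics alone. Two assumptions are ours rather than the chapter's: that the 40 texts are split evenly (20 NW, 20 SW), and that the pooled-variance (equal-variance) form of the independent t-test is the one intended. The function name `independent_t` is hypothetical.

```python
from math import sqrt

def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Return (t, df) for an independent t-test from summary statistics,
    using the pooled (equal-variance) form of the test."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df

# Assumption: 20 NW texts and 20 SW texts (the text gives only the total of 40).
t, df = independent_t(0.527, 0.259, 20, 0.347, 0.160, 20)
print(f"t({df}) = {t:.3f}")
```

Because the example data are invented and the reported means and standard deviations are rounded, the computed t should come out close to, but not exactly equal to, the 2.651 given in the example.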

1 Relative standard error (RSE) is a much more statistically satisfying way of assessing potential
problems in a data distribution. See
http://en.wikipedia.org/wiki/Standard_error_%28statistics%29#Relative_standard_error
for more details on this. But despite RSE being more appropriate, it is
seldom produced in a Results section, whereas SD is almost always present. In this chapter, we
suggest that SD values should be considered with caution the closer they move toward equalling
the value of the mean. This suggestion is simply a heuristic, based on the numerous Coh-Metrix
studies we have conducted.
2 More technically, the higher the df value (which is 38 in this example), the lower the value of t that
is considered “significant.” However, as all df, t, and p values are created automatically these days,
the most immediate relationship between df and t is simply that they are highly correlated.

But how high is high? The p-value in the results addresses this question.
Generally, if the p-value is less than 0.05, usually written p < 0.05, then the
t-value is high enough for the result to be deemed “significant.” Significant is
the most frozen of all frozen expressions in research and must never be used
in any way other than to describe a numeric result. In our example, the
p-value is 0.012, so it is less than 0.05, so it is significant. The p-value is
important because it allows us to go beyond saying that “the result is in the
direction of NW,” and allows us to say that “the result for NW is significantly
higher.” This little difference in articulating a result might seem trivial, but it
is probably the most important part of the research paper. In short, if your
result is significant, then you have a winner; if your result is not significant,
then it’s back to the drawing board.
While we’re on the subject of p-values, let’s take this opportunity to briefly
look at what p < 0.05 means. Observe that 0.05 is one-twentieth of 1.00. Put
another way, 0.05 multiplied by 20 is 1. Or, if you like, 1 divided by 20 is 0.05.
In other words, the relationship between 0.05 and 1 is 20. This number 20 is
very important because it tells us what 0.05 means in practical language. It
means that scientists have generally agreed that a 20-to-1 chance of being
wrong is a statistic that we can all live with (generally). Thus, if your result is
less than 0.05, it means nothing more and nothing less than there is about a
20-to-1 chance that your result, in the real world, isn’t actually significant at all
(instead, you just got lucky with your result). If you think 0.05 (or 20 to 1)
sounds like a fairly arbitrary way to decide whether or not something is
significant, then you’d be in good company! Indeed, the very person who
suggested the number, Ronald A. Fisher, would agree with you. But arbitrary
or not, like the height of a basketball net, it is a number that we are stuck with.
You may well be wondering at this point what t provides us that M and SD
and p don’t. In truth, the answer is not much. So why do we report the t value?
Historically, reporting the value of t was extremely important because
researchers had to first calculate it and then use it to manually look up
p values in a large table in a little book. The t-value, cross-referenced with
the degrees of freedom (here, 38), led us to the p-value. These days, to be
frank, it is only students who are forced to manually calculate t values and
then use lookup tables; everyone else uses simple software to calculate the
t value and such software invariably also supplies p, M, SD, and any number of
other things. As such, the value of t itself has become the statistical equivalent
of the human appendix, which is to say, removal would be painful, but
ultimately its loss would make very little difference at all.
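For readers curious about what the "large table in a little book" was doing, the following sketch recovers a two-tailed p-value from a t-value and its degrees of freedom by numerically integrating Student's t distribution. It is an illustration only, written with the standard library; the function names are ours, and in practice a statistical package would be used instead.

```python
from math import gamma, pi, sqrt

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, found by integrating the density from 0 to |t|
    with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3  # probability of falling between 0 and |t|
    return 2 * (0.5 - area)

p = two_tailed_p(2.651, 38)
print(f"p = {p:.3f}")
```

For t (38) = 2.651 the result should land close to the p = .012 reported in the example, which is below the conventional 0.05 threshold.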
The final statistic in our example is the d-value. Like the t-value, the d-value
is also calculated from a formula that is based on the means and the standard deviations. But
whereas the t-value helps us to establish if the difference between the two
M-values is “significant,” the d-value tells us how large the difference is.
The d-value is referred to as an effect size. There are many different kinds of
effect sizes, but in this example we will only discuss d, which is known more
specifically as Cohen’s d. Cohen’s d is a widely used index of effect size, one
that is relatively simple to calculate, and one that is relatively simple to
interpret. As such, we find it appropriate to use in this example; however,
we neither claim that Cohen’s d is the best measure of effect size nor that
Cohen’s d is a synonym for effect size.
Essentially, Cohen’s d tells us the degree to which we could overlay one set
of data (e.g., NW) with another set of data (e.g., SW). If the value of Cohen’s
d is 0, then we have a perfect match, which tells us the two sets of data are not
different at all. As the value of Cohen’s d increases, so does the indication of
difference between the two sets of data. Over time, a relatively well-agreed
scale has emerged for how the value of Cohen’s d should be interpreted
(Cohen, 1988). Thus, a d-value of about 0.2 can be called a small difference (about 85%
overlay of data), and a d-value of about 0.5 can be called a moderate difference
(an overlay of about 67% of the data). A d-value of 0.8 or more is a large difference; a
d-value of 1.0 has an overlay of about 45% and a d-value of 2.0 has an overlay
of about 19%.
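A short sketch may help show where these numbers come from. The first function computes Cohen's d from the two means and standard deviations, assuming two groups of equal size. The second estimates the overlay of two normal distributions as 1 minus Cohen's U1; that this is the overlap measure behind the percentages quoted above is our inference (the percentages match it), not something the chapter states. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d for two groups of equal size: the difference between the
    means divided by the pooled standard deviation."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

def overlap(d):
    """Proportion of overlap between two normal distributions, taken here
    as 1 minus Cohen's U1 (an assumption about which measure is meant)."""
    u2 = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * u2 - 1) / u2
    return 1 - u1

d = cohens_d(0.527, 0.259, 0.347, 0.160)
print(f"d = {d:.3f}, overlap = {overlap(d):.0%}")

for value in (0.2, 0.5, 1.0, 2.0):
    print(f"d = {value}: overlap of about {overlap(value):.0%}")
```

As with the t-value, the d computed from the rounded summary statistics should be close to, though not identical with, the 0.838 reported in the example.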
For many people, significance is king, and nothing more than the p-value
need trouble them. But, as the legendary statistician R. Fisher himself was at
pains to point out, it is extremely important to interpret a result not with one
value but with every value you have at hand.
Fisher’s point is reminiscent of the old tale of the king and the six blind
men. As the story goes, the six blind men each thought themselves very
wise, and all day long in the market they would argue among themselves as
to who was smarter. The king grew tired of the constant bickering and
thought of a plan that might quiet them all. Now, conveniently for the
story, none of the men had ever seen or heard of an elephant, so, somewhat
implausibly, the king sent for an elephant to be taken to the market for the
blind men to examine. The king’s challenge to the men was to tell him what an
elephant was like. The first blind man took a hold of the elephant’s trunk and
announced confidently that an elephant was like the branch of a tree. The
second blind man took a hold of the elephant’s leg and announced confi-
dently that an elephant was like a pillar. The third blind man took a hold of
the elephant’s ear and announced confidently that an elephant was like a fan.
The fourth blind man took a hold of the elephant’s tail and announced
confidently that an elephant was like a rope. The fifth blind man took a
hold of the elephant’s tusk and announced confidently that an elephant was like a
long pipe. And the sixth blind man placed his hands on the elephant’s body
and announced confidently that an elephant was like a wall. “You’re all right!”
bellowed the king, “but you’re all badly mistaken alone, and wise indeed
together.”
For us, the old tale of the king and the six blind men is a reminder that each
value to which we have access only tells us part of the story. It is only when we
put all of the pieces together that we know what we’re dealing with. Thus M,
SD, t, p, and d all work together to confirm, elaborate, bridge, and interpret a
result.
So, to sum up the results-in-numbers move, let’s put all the pieces together.
We know that the value of NW (M = 0.527) looks higher than the value of SW
(M = 0.347). That is to say, the result is in the direction of NW. But is the
number 0.527 really representative of NW? And is 0.347 really representative
of SW? Technically put, do the two values reflect different population means?
The SD value for NW (0.259) is just under halfway to 0.527, so the NW data
set is probably fine. The SW data set is also fine, with an SD of 0.160 (less
than halfway to the mean). Turning to the inferential statistics, we have
38 degrees of freedom, which means we have a total of 38 + 2 = 40 items in
our data set. We want to extrapolate from this sample what the difference
between the two groups would be if we actually had all the data in the world
to work from (the population, instead of just this sample). The difference
between the means (NW = 0.527 and SW = 0.347) might be true for this
sample of 40 texts, but how confident can we be that this difference would
be similar had all the texts from all NW and SW writers been available?
The p-value is less than 0.05, so we can assume the result has less than a 1-in-20
probability of being the result of chance. As such, we can say that there is a
significant difference between the means of the coreference values for NW
and SW, with NW being higher. Not only is there a significant difference; we
can also say that the difference is large because the d-value is 0.838.

move 5: what do the results mean?


The fifth and final necessary move for a results section is the interpretation of
the result. In our earlier example we have written: The result suggests that NW
essays might deploy greater explanatory features in their writing and/or SW
essays might deploy a more narrative style.
The key word in this sentence happens to be the most important word in
the research-writing world. You must come to know this word intimately,
love it, and utter it endlessly. The word is “suggest.” The fact that you have a
significant result “proves” nothing. It is not the goal of research in general, or
science in particular, to “prove” anything. The only thing we can ever do is
assess “the preponderance of evidence” for a given theory, and from this
assessment determine the confidence we have in any prediction of causal
events. The result here does nothing more than lend evidence to the stated
claims that NW essays deploy greater explanatory features in their writing, and
that SW essays deploy a more narrative style. Thus, given the stated result here,
when people opine as to these claims, they will now have one more piece of
evidence to weigh in their argument.
The word “suggest” is hugely important in a research paper, although it
seldom lurks alone. In our example, the word that accompanies “suggest” is
the word “might.” Words such as “suggest,” “might,” “could be,” and so forth
can all be grouped as hedges. Hedges are disclaimers. They are the research-
writing equivalent of advertisements that state any kind of variation on the
phrase “results may vary.” And just like “results may vary,” many people read
“suggest” to mean typical, common, likely, probable, or usual. The word
“suggest” should not be interpreted as any of these things. It should be
interpreted as a piece of evidence, from which, in and of itself, there is
insufficient cause to form any kind of substantial conclusion.
Apart from the hedges, the what-do-the-results-mean move is no more
than a general restatement of the first sentence of the results section. In this
move, however, the researcher can afford to be much more explicit in the
claim, so long as the explicit claim is suitably garbed in hedgeware.

other matters
What we have described previously constitutes the minimum that a Results
section must include. In the following section, we briefly discuss several other
matters that might be included in the Results section, several other matters
that should be included in the Results section, and, just as importantly, several
matters that should not be included in the Results section.

what if the p-value is not less than 0.05?


Previously, we have written that when the p-value is less than 0.05, then the
result can be deemed significant. Notably, 0.05 is the traditional value for
significance, but it is not the only possible value. But when 0.05 is the chosen
threshold, what if the p-value is not less than 0.05? For example, what if the p-value were 0.08, or
0.12, or 0.45? If the value of p is not less than 0.05, then we cannot call the result
significant. Let’s look at some examples of results and see what happens with
various p values.

1. The result was as predicted: Group A: M = 0.527, SD = 0.259; Group B:
M = 0.347, SD = 0.160; t (38) = 2.651; p = 0.012; d = 0.838.
2. The result was in the direction of our prediction, and was approaching
significance: Group A: M = 0.471, SD = 0.262; Group B: M = 0.347,
SD = 0.160; t (38) = 1.823, p = 0.076, d = 0.576.
3. The result was in the direction of our prediction, although it did not
reach a level of significance: Group A: M = 0.431, SD = 0.256; Group B:
M = 0.347, SD = 0.160; t (38) = 1.25, p = 0.219, d = 0.395.
4. The result did not support our prediction: Group A: M = 0.345,
SD = 0.203; Group B: M = 0.347, SD = 0.160; t (38) = –0.032, p = 0.975, d = –0.01.
5. The result was contrary to our prediction: Group A: M = 0.235,
SD = 0.177; Group B: M = 0.347, SD = 0.160; t (38) = –2.1, p = 0.042, d = –0.664.

In example (1), the p-value is 0.012. Because 0.012 is less than 0.05, the result is
deemed significant. The frozen expression is the result was as predicted. In
essence, the phrase means we thought this would be the result and it was. If a
result was predicted and it is also significant, we typically restrict ourselves to
only as much text as used in the example. That is, we often don’t actually write
“the result was significant” because if the result was as predicted, it implies that
it was significant; and in any case, the p-value that follows the statement
confirms the inference.
In example (2), the p-value is .076. The value .076 is not less than .05, so the
result is technically not significant. However, the frozen expressions of
importance here are marginally significant and approaching significance.
These are terms that denote a p-value that is less than .10 but greater than
.05. In essence, the expressions mean: We thought this would be the result but
it wasn’t; but, gosh-darn it, we were so close that we deserve a prize even though
there really isn’t a prize for coming in second. It is important to understand
that “approaching significance” is still “not significant”; however, it is also
important to understand that researchers go to a lot of trouble and spend a lot
of time in conducting experiments and they find it very hard to accept that a
“really close” result isn’t really a result at all. Such has been the breadth of
feeling on this issue that convention, somewhat unofficially, has come to
accept results that are really close to significant. And indeed, particularly in
exploratory research, leaving out the marginally significant results can mean
leaving out meaningful and important results. Notably, there is also the issue
of statistical power. Like degrees of freedom, this issue is beyond the scope of
this chapter, but you might want to take a moment to look it up.
In example (3), the p-value is .219. Because .219 is not less than .05 (and also
not less than .10), the result is deemed not significant. However, the
researchers appear to have predicted that Group A would be higher than
Group B, and the means of the two groups reflect that (Group A: M = 0.431;
Group B: M = 0.347). If the means are as predicted but are not supported by
the p-value, then all we have is the frozen expression of “in the direction.” In
plain English then, the statement means the result looks good, but we acknowl-
edge that we’re not going to get any sort of prize for this.
In example (4), the p-value is .975 and the means are virtually the same.
Because .975 is not less than .05, the result is deemed not significant. The
corresponding expression is: The result did not support our prediction. The
meaning here is, simply: We got it wrong.
In example (5), the p-value is .042. Because .042 is less than .05, the result is
deemed significant. However, in this example the means are in the direction
opposite to that which was predicted. As such, the result can be thought of as
significantly wrong! The expression here is: The result was contrary to our
prediction and its meaning can be described as: Oh boy, did we ever get this
wrong.
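The decision logic behind the five examples can be summarized in a short sketch. The Python function below is an illustration only and is not part of Coh-Metrix or of any statistics package: the function name, the short labels it returns, and the parameters `alpha` and `marginal` are assumptions, with the .05 and .10 cut-offs taken from the discussion above.

```python
def frozen_expression(p_value, in_predicted_direction, alpha=0.05, marginal=0.10):
    """Return a short label for the phrasing that fits a result.

    p_value: the p-value reported by the test.
    in_predicted_direction: True if the group means differ in the
        direction the researchers predicted.
    """
    if p_value < alpha:
        # Significant: either as predicted or "significantly wrong".
        if in_predicted_direction:
            return "as predicted"
        return "contrary to our prediction"
    if in_predicted_direction:
        # Not significant, but the means point the predicted way.
        if p_value < marginal:
            return "approaching significance"
        return "in the direction of our prediction"
    # Not significant, and the means do not favor the prediction.
    return "did not support our prediction"


# The five examples as (p-value, means in the predicted direction?).
# The direction for example (2) is assumed; the text does not give its means.
examples = [(0.012, True), (0.076, True), (0.219, True), (0.975, False), (0.042, False)]
for number, (p, direction) in enumerate(examples, start=1):
    print(number, frozen_expression(p, direction))
```

A function like this only selects the phrasing; it does not replace the judgment about whether the test itself was appropriate.
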
One final point on the issue of numeric results regards software. These
days, few people calculate statistics by hand: Packages such as SPSS, SAS, and
R do everything we need in the blink of an eye. One consequence of this
convenience is that we save a huge amount of time and trouble, and another
consequence is that we get lazy. Statistical software packages assume that you
know what you’re doing; they never interpret the data and then output
something like “hang on a minute, there’s something fishy here.” As such,
we strongly encourage you not to leave responsibility to the software, and
instead try to learn as much as you can about the elements that you report in
the results section. As the expression has it, “art is a science and science is an
art.” By this we mean that the numbers in a Results section require careful
interpretation because it is not so much what the numbers are that is important;
it is what the numbers mean.
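
One way to act on this advice is to recompute a reported value by hand. The Python sketch below recomputes the effect size d for examples (3) to (5) from the reported means and standard deviations. It assumes one common definition of Cohen's d (the difference in means divided by a pooled standard deviation) and it assumes equal group sizes; the examples state neither, and the function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, assuming the two groups are the same size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (example number, Group A mean, Group A SD, Group B mean, Group B SD)
reported = [
    (3, 0.431, 0.256, 0.347, 0.160),
    (4, 0.345, 0.203, 0.347, 0.160),
    (5, 0.235, 0.177, 0.347, 0.160),
]
for number, mean_a, sd_a, mean_b, sd_b in reported:
    print(number, round(cohens_d(mean_a, sd_a, mean_b, sd_b), 3))
```

Under these assumptions the values come out close to the reported 0.395, –0.01, and –0.664. Small discrepancies are to be expected if the group sizes were unequal or a different formula was used.
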

justifying your procedure


Whichever approach you are taking to evaluate your data, it is necessary that
the approach is justified. Justification comes in three basic forms: the com-
monplace, the explained, and the extrinsic.
An example of the first type (i.e., the commonplace) is the t-test example
given previously in Move 2 of the results. As we mentioned there, a t-test is so
commonly employed that it is generally unnecessary to state how it works or
why you chose to employ it. Other statistical procedures that fall into the
commonplace category include ANOVA and Pearson’s correlation.
The second type of justification (i.e., the explained) is a simple sentence or
two explaining what the procedure does and, perhaps, why and how it is used
in the current study. Sometimes this explanation can be very short, as when
McCarthy et al. (2007) wrote: “Using the forced entry method of linear
regression, selected as a conservative form of multivariate analysis, a signifi-
cant model emerged” (p. 250).
Sometimes this explanation can give a little more detail, as when Crossley
and McNamara (2009, p. 128) wrote:

To test the accuracy of these lexical indices to distinguish between L1 and L2
essays, we conducted a series of discriminant function analyses. A discriminant
analysis is a statistical procedure that is able to predict group membership (in this
case L1 and L2 essays) using a series of independent variables (in this case the
selected Coh-Metrix variables).

And sometimes this explanation can go into perhaps more detail than is
necessary, as when McCarthy et al. (2009, p. 150) wrote:

To test the accuracy of our findings, we conducted a series of discriminant
analyses. A discriminate analysis is a statistical procedure that culminates with
a prediction of group membership (in this case, native language category). In
this study, as is typical of discriminant analyses studies, the accuracy of the
results are reported in terms of recall and precision. Recall shows the number of
correct predictions divided by the total number of items in the group. Precision,
on the other hand, is the number of correct predictions divided by the sum of
the number of correct and incorrect predictions. The distinction between
precision and recall is important because an algorithm that predicts everything
to be a member of a single group will account for all members of that particular
group (scoring 100% in terms of recall) but will also falsely claim many
members of other group(s), thereby scoring poorly in terms of precision.
Reporting both values allows for a better understanding of the accuracy of
the model.
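
The distinction drawn in this quotation can be made concrete with a small sketch. The Python code below computes recall and precision for one group as the quotation defines them, and then reproduces the case of a model that assigns every item to a single group. The function name, the group labels "A" and "B", and the ten-item data set are hypothetical and chosen only for illustration.

```python
def recall_and_precision(actual, predicted, group):
    """Recall and precision for one group, following the definitions quoted above."""
    # Correct predictions: items that belong to the group and were assigned to it.
    correct = sum(1 for a, p in zip(actual, predicted) if a == group and p == group)
    # Total number of items that really belong to the group.
    in_group = sum(1 for a in actual if a == group)
    # Correct plus incorrect predictions: everything assigned to the group.
    claimed = sum(1 for p in predicted if p == group)
    recall = correct / in_group if in_group else 0.0
    precision = correct / claimed if claimed else 0.0
    return recall, precision


# Four items truly belong to group "A" and six to group "B",
# but the model predicts "A" for every item.
actual = ["A"] * 4 + ["B"] * 6
predicted = ["A"] * 10
recall, precision = recall_and_precision(actual, predicted, "A")
print(recall, precision)
```

Here recall for group "A" is perfect because all four true members are found, whereas precision is only 4 out of 10 because six items from group "B" are falsely claimed.
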

The third type of justification (i.e., the extrinsic) may require no more
than references to other works. As the name suggests, the statistical approach
in the study is used because other people before have used it. That is, it is
justified to use it now because it has been used before. For example, McCarthy
et al. (2008, p. 654) wrote:

To assess the accuracy of the predictor variables, we used discriminant analysis
and followed similar procedures to earlier Coh-Metrix studies (e.g., Hall et al.,
2007; McCarthy, Lehenbauer, Hall, Duran, Fujiwara, & McNamara, 2007;
McCarthy, Lewis, Dufty, & McNamara, 2006).

Similarly, Crossley et al. (2007, p. 123) wrote:

To examine the hypothesis that there are linguistic differences that differentiate
simplified and authentic texts, we conducted a discriminant function analysis. A
discriminant function analysis is a common approach used in many previous
studies that attempt to distinguish between text-types (e.g., Biber 1993; McCarthy,
Lewis, et al., 2006).

It may seem odd that a procedure is acceptable simply because someone else
used it. However, if it has been used in previous studies, then we can assume it
was reviewed and accepted there (and does not need to be re-reviewed). Also,
it is important to remember that convention is very strong in the sciences.
Indeed, the very fact that we can talk so much about moves and frozen
expressions is because people have come to accept and expect how we go
about writing up our research. Finally, we also use this approach when we are
deliberately replicating a procedure. For example, Louwerse et al. (2004)
conducted a study that was deliberately based on the study of Biber (1988).
As such, it was important for the authors to write “[W]e carefully followed
Biber’s study” (p. 845).

some frozen expressions


Time and again, we return to the issue of frozen expressions. Three of the
most useful ones for Results sections are given here.
The Texts Were Then Processed through Coh-Metrix. Sometimes, when
reading an analysis, it can seem as though the results have jumped out of
nowhere. So, although it might seem obvious that at some stage in the
study we actually have to use the tool on our texts, it never hurts to help
readers put everything into place and into order.
Whereas X Used Y, We Used Z because . . . Sometimes we don’t want to
copy a previous study, but we do nevertheless want to justify what we’re doing
by referencing that study. For example, a previous study may have used the
statistical approach of a discriminant analysis, but the current approach uses a
fairly similar procedure of a logistic regression. Although discriminant
analysis and logistic regression have been used in studies to perform similar
tasks (see Lamkin & McCarthy, 2012), each has its own advantages and
disadvantages, and presumably, the word “because” in the frozen expression
will explain why the change was made.
Space Limitations Mean That We Are Unable to . . . At some point in
every researcher’s life, we move from wondering how on Earth we will make
our paper long enough to wondering how on Earth we will make our paper
short enough. No matter whether conference proceedings or an edited
article, the number of pages or the number of words is always strictly
adhered to, and no matter what that number is, you’ll always wish it
had been just a little bit more. As such, a great deal of researchers’
time is spent working out how to cut and where to cut so the final paper
reflects all that we want to say but only in the space in which we are allowed
to say it. In such cases you might think that the frozen expression “space
limitations mean that we are unable to . . .” would be a nice out for the
researcher. That is, simply hack out a graph or a table and bung in the “space
limitations” frozen expression, and all is well and done! But, it’s not as
simple as that. “Space limitations mean that we are unable to . . .” is generally
used only when readers and reviewers would readily agree that tables,
graphs, explanations, and the like are obviously going to be too large for
the paper. That said, the readers and reviewers can’t see that which was left
out, and so the paper’s case for dropping highly important content needs to
be quite compelling.

graphs
The presentation of graphs in a research paper is just as important as the
writing of the paper itself. By graphs we mean tables, figures, and any other
representations that are nonlexical. Almost all Coh-Metrix papers feature
some kind of graph and as such it is necessary to discuss them here.
The function of a graph is to facilitate the readers’ comprehension of the
research. More specifically, graphs are used to convey a message to the
audience concerning the goal of the paper that could not be equally well
conveyed in prose. Graphs are more useful than prose when the information
they convey is equal to or better than a prosaic version, and yet is less
cognitively demanding in terms of processing. This processing advantage
that is achieved by graphs can be attributed to the ease with which data can be
found, compared, and contrasted in a graph (relative to prose), and the fact
that differences are often easier to understand by way of visualization rather
than calculation. Although graphs are valuable, they can take up a consid-
erable amount of space in a research paper. As such, you always have to
consider carefully when a graph is worth including. One rule of thumb for
graph inclusion is to remember that a picture, as they say, is worth a thousand
words. So if you find that you can say all that needs to be said in just a couple
of sentences, then you probably don’t need a graph.
In Coh-Metrix papers, the most common form of graph is the table. Tables
are generally used to show results, although they are sometimes used to show
the organization of complex corpora (see Louwerse et al., 2004). When
including a table in your research paper, three important rules to keep in
mind are: (1) make sure the title is informative of the table (e.g., Coh-Metrix
indices as a function of low- and high-cohesion text versions as opposed to the
less informative version The 24 Coh-Metrix indices); (2) keep down, as far as is
possible, the number of abbreviations used in the table; when abbreviations
do become necessary, make sure that the title, header, or table note explains
what these abbreviations are; and (3) don’t clutter the table with unnecessary
data. Not everything that can be presented in a table must be presented in a
table, so choose carefully what needs to be there and what doesn’t. It is
tempting to throw a lot of data into the table, but if a table becomes hard
to read, then it increases rather than decreases the cognitive load (i.e., it is
counterproductive).
Figures, as compared to tables, are relatively rare in Coh-Metrix papers.
However, when they are used, they can be very powerful because they reduce
complex numerical relationships to a more easily processed visual image.
Figures rely more on lines and shapes than on the numbers that generated
those lines and shapes. They are probably at their most useful when they
are offering a visualization of multiple comparisons or when they are describ-
ing a trend. Multiple comparisons are generally shown using histograms. For
example, Millis, Magliano, Wiemer-Hastings, Todaro, and McNamara (2007)
use a histogram to compare how Latent Semantic Analysis (LSA) values differ
as a function of reading strategy used. And McCarthy, Briner, Rus, and
McNamara (2007) demonstrate how LSA values rise and fall across research
papers, leaving a cohesion signature that depends on the function of each
paper section.
Graphs can also show the relationship between complex systems. For
example, Cai, McNamara, Louwerse, Hu, Rowe, and Graesser (2004) use a
flowchart to demonstrate how LSA can be used to evaluate text similarity,
whereas McNamara et al. (2010) diagram the relationship between, on the one
hand, difficult text and sophisticated text and, on the other hand, the indica-
tive lexical features that comprise such texts.
The final type of graph we shall deal with is the screenshot. Graesser et al.
(2004) use screenshots to demonstrate the look and use of Coh-Metrix.
Undoubtedly, screenshots help readers better understand complex tools,
because a single picture is easier to construct than multiple descriptions are.
However, screenshots have two drawbacks: the text in the screenshot itself is
often difficult to read, and systems’ interfaces can change regularly, meaning
that people who use the system in conjunction with the image may become
very confused.

conclusion
In this chapter we discussed the reporting and presentation of the results
section of a research paper. We outlined the five major moves of the results
sections, along with the frozen expressions associated with them. We pointed
out that the Results moves had variations, depending on how well the results
met predictions. Several further issues were discussed, including the justifi-
cation of the approach used and the importance of graphs in a Results section.
The next chapter turns to the final section of a paper – the Discussion.

12

The Discussion

Whereas the primary purpose of the Results section is to explain what the
results of the experiments are, the primary purpose of the Discussion section
is to explain what the results of the experiments mean. Put another way, our
primary (but by no means the only) task in the Discussion section is to provide
a plausible explanation as to the relationship between our results and our
theoretical framework. This requirement is the tricky part because, unlike
other parts of the paper, which can be very cookie-cutter-esque, the require-
ment of the Discussion section demands an element of creativity on the part
of the researchers. That is, the findings of the study are only circumstantial
evidence, and it is up to the investigators to undertake the challenging task of
persuading the audience (i.e., the readership, the discourse community) that
what was found in the study contributes positively to our current understand-
ing of the world.
This task requires a careful meshing of the guiding theoretical framework
and the results. Both can be dauntingly messy. Results are seldom highly
significant with huge effect sizes; if they were, then pretty much no one
would be interested in the results because they are hardly likely to be telling
us anything we didn’t already know, or need to know. So, because frameworks
and results are messy, patching them together requires careful consideration,
rigorous examination, exhaustive reviewing, and, perhaps most important of
all, a creative perspective in order to make a grab-bag of knowledge-ingredients
into a comprehensible propositional-cake.

discussion moves
So, a Discussion section is not easy. But it still has to be written. As ever, the
best way to make sense of it all is to consider it in terms of moves and
their associated frozen expressions (see Chapter 7). However, because the
Discussion requires more creativity on the part of the writer (to tie together
results into the theoretical framework), the moves of the Discussion are
somewhat less formalized than we have seen in other sections of the paper.
Put another way, the moves of the Discussion section are somewhat more
flexible in where they appear, how they appear, and even if they appear at all.
In some ways, this flexibility makes the Discussion section easier to write
because authors can weave something more like a narrative into the section,
even putting their own spin on how the results should be interpreted. That
said, the flexibility of the Discussion section may also cause authors to wander
off topic or make claims that are poorly evidenced. With such caveats in
mind, we propose that an effective Discussion section can broadly fit into the
following model (see Figure 12.1).

I. Summary Phase
    a. Commencement move
    b. Exposition move
        1. Methods element
        2. Purpose element
        3. Results element
II. Denouement Phase
    a. Interpretations move
    b. Implications move
III. Acknowledgements Phase
    a. Limitations move
    b. Future research move
IV. Closure Phase
    a. Wind-up move
    b. Pitch move
figure 12.1. The discussion model helps organize the ending argument of your paper

With this model in mind, in the sections that follow we explain each of the
four phases of the discussion (i.e., the summary phase, the denouement
phase, the acknowledgments phase, and the closure phase) together with
their associated moves, elements, and frozen expressions. We will also supply
examples from Coh-Metrix-related papers in order to show how authentic
studies have addressed parts of this discussion design. The chapter ends with
a model example of a Discussion section that is based on the newspaper study
described in the Elevator Pitch of Chapter 7.

the summary phase


In many ways the summary phase does not truly belong in a Discussion at all
because it does not address the issue of what the results mean. Instead, the
summary phase simply sets the stage for that very question to be addressed in
the denouement phase that follows (i.e., the second phase). Indeed, some-
times, when space allows, the summary phase is its own major section of the
paper; however, when that happens, the distinction between a Summary and a
Discussion often gets very blurred, and the Discussion is probably already
blurry enough.
The summary phase aims to address two major questions: (1) What did we
do? and (2) What did we find? Two further questions that may find their way
into the project summary are (3) How did we do it? and (4) Why did we do it?
We begin by focusing on the first two of these questions in what we call the
commencement move of the summary phase. The third and fourth questions
are largely discussed in the subsequent exposition move.

Commencement Move
Generally, a summary phase opens with the commencement move. The pur-
pose of the commencement move is to bring readers and authors together at a
single point of embarkation from which the interpretations and implications of
the results can be “discussed” (hence the name “discussion”). The commence-
ment move is generally nontechnical, because it is important not to confuse
anybody right from the get-go. For the same reason, the commencement move
should also be relatively simple, relatively short, unassuming, and unequivocal.
The basic point here is that the commencement move needs to activate as
many schemata as possible for the reader while limiting the cognitive resources
needed to do so. In such a way, readers are most likely to have available to them
the cognitive resources necessary to integrate the forthcoming information into
their developing mental model of the text.
Possibly the easiest way to achieve a successful commencement move is to
simply state what the paper was about (i.e., what did we do?). The researchers
Rowe and McNamara (2008) provide a nice example of this move when they
write: “This study explored the mechanisms within the CI model related to
disambiguation.”
But as we have pointed out, the presentation of the project summary largely
depends on how the researchers interpret the interplay between the results
and the theoretical framework. As such, Coh-Metrix commencement moves
have come in many forms (see Table 12.1 for examples).

table 12.1. Examples of four forms used in Coh-Metrix commencement moves

Form                  Text                                          Author
What research         This study explored the mechanisms            Rowe & McNamara (2008)
was conducted         within the CI model related to
                      disambiguation.
How the research      Our corpus . . . was formed from a subset     McCarthy et al. (2007)
was conducted         of 100-sentence self-explanations from
                      a recent iSTART experiment.
Why the research      There is a need in discourse psychology       McNamara, Louwerse,
took place            for computational techniques to               McCarthy, & Graesser
                      analyze text on levels of cohesion and        (2010)
                      text difficulty, particularly because
                      discourse psychologists increasingly
                      use longer, naturalistic texts from
                      real-world sources.
What the research     In sum, the results of this study indicate    McNamara, Crossley, &
found                 that more-skilled writers use more            McCarthy (2010)
                      sophisticated language.

Any number of variations of the commencement move are perfectly
legitimate, but here we describe the most basic example (i.e., what we did).
When we address what we did, we are focusing on the fundamental act that
best describes the methodology used in the project. That is, the project is a
comparison, a demonstration, an assessment, or some other such course of
action that was used to address the research question. In most Coh-Metrix
studies, the word that is associated with these actions is the verb “assessed”
(presumably because Coh-Metrix is an “assessment” tool). Although “assessed”
is the most common verb, there are many other verbs that have also been
used: these include “evaluated,” “compared,” “examined,” “explored,” “demon-
strated,” “analyzed,” “presented,” and “contrasted.” For simplicity, we will refer
to words like “assessed” and “examined” and the entire family of methodolog-
ical action words as research verbs. Thus, we can say that the simplest form of
commencement move involves the use of a research verb in order to state what
was done in the study.
Because the Discussion is concerned with the implications of the study, the
study itself must be complete. And if the study is complete, then the study’s
research verb generally needs to be in past tense. In general, what was done
and what was found are in past tense and their implications (beyond the
study) are in present tense. This having been said, we once again have to
remind readers that the Discussion must be flexible, and so the grammatical
form used in the commencement move will strongly depend on the study that
was conducted, as we see in the examples in Table 12.2.

table 12.2. Examples of three grammatical structures used in Coh-Metrix studies

Grammatical structure   Text                                                  Authors
Past tense              In this study, we analyzed three corpora of science   McCarthy, Lehenbauer,
                        journal abstracts written by American, British,       et al. (2007)
                        or Japanese scientists.
Present perfect         Using the computational tool Coh-Metrix, this         Crossley & McNamara
                        study has demonstrated that many properties           (2008)
                        of both simplified and authentic texts . . .
Present tense           The findings from these studies indicate that         Crossley & McNamara
                        argumentative essays judged to be of higher           (2011)
                        quality by expert human raters are more
                        linguistically sophisticated, but at the same
                        time contain fewer cohesive devices to facilitate
                        text comprehension.

At this point we know that the commencement move requires a research
verb (e.g., “assess”) and that the commencement move probably needs to
be written in the past tense (so, if the research verb is “assess,” then the
form of the verb will be “assessed”). To understand what else needs to be
in the move, we should consult the research question. Let’s look at two
examples of Coh-Metrix research questions that were first presented in
Chapter 7.
1. Bruss et al. (2004): Has the language used in scientific texts changed
over the last 200 years?
2. Louwerse et al. (2004): Can Coh-Metrix distinguish spoken English
from written English?
Given Michelle Bruss and colleagues’ research question, we can infer that their
study was an assessment of the language used in scientific texts over the last
200 years. Therefore, we can also say that Michelle Bruss and colleagues assessed
the language used in scientific texts over the last 200 years.
Given Max Louwerse and colleagues’ research question, we can infer that
their study was an examination of whether Coh-Metrix can distinguish spoken
English from written English. Therefore, we can also say that they examined
whether Coh-Metrix could distinguish spoken English from written English.
Frozen Expressions. As always, a move has associated frozen expressions.
The most common frozen expression associated with the commencement
move is “In this study . . .”. Of course, the word “study” may change depending
on what the researchers view their undertaking to best represent (e.g., “study,”
“chapter,” “dissertation,” “project”). For simplicity, we refer to words like
“study” and “chapter” and the entire family of undertaking words as research
nouns. As such, our frozen expression for the commencement move can be
stated as “In this” + [research noun].
Putting all the pieces together, our model for the commencement move is:
• “In this” + [research noun] (e.g., study) +
• [Agent] (e.g., I or we) +
• [Research verb] (e.g., assessed) +
• [Research question] (e.g., Can Coh-Metrix distinguish spoken English
from written English?)
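
The four slots of the model above can be read as a simple template. The Python sketch below fills that template; it is an illustration only, and the function and parameter names are hypothetical. It assumes that the research question has already been rephrased by hand as a statement (for example, as a “whether” clause), because the sketch does not attempt that rephrasing.

```python
def commencement_move(research_noun, agent, research_verb, research_clause):
    """Assemble a commencement move from the four slots of the model.

    research_clause is the research question already rephrased as a
    statement (for example, a "whether ..." clause).
    """
    return f"In this {research_noun}, {agent} {research_verb} {research_clause}."


sentence = commencement_move(
    research_noun="study",
    agent="we",
    research_verb="examined",
    research_clause="whether Coh-Metrix could distinguish spoken English from written English",
)
print(sentence)
```

Swapping in a different research noun (e.g., “chapter”) or research verb (e.g., “assessed”) yields the other variations of the move.
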
Testing the Model. To test our model for the commencement move, let’s look
at four more Coh-Metrix research questions, all first introduced in Chapter 8.
3. McNamara et al. (2011) asked: Does world knowledge affect young
readers’ comprehension?
4. Ozuru et al. (2007) asked: Does the passage (more so than the question)
explain the difficulty in standardized reading tests?
5. Best et al. (2004) asked: Do the effects of reading skills depend on the
genre of the text?
6. McCarthy et al. (2007) asked: Can Coh-Metrix replicate human ability
to recognize genre at the sub-sentential level?
As we see in Table 12.3, applying our model to these questions makes for
perfectly good commencement moves.

table 12.3. Six examples of the commencement move using the commencement model

Example  First Author        Text
1        Michelle Bruss      In this study, we assessed whether the language used in
                             scientific texts has changed over the last 200 years.
2        Max Louwerse        In this study, we demonstrated that Coh-Metrix could
                             distinguish spoken English from written English.
3        Danielle McNamara   In this study, we assessed whether world knowledge affects
                             young readers’ comprehension.
4        Yasuhiro Ozuru      In this study, we examined whether the reading passage
                             (more so than the associated questions) explains the
                             difficulty in standardized reading tests.
5        Rachel Best         In this study, we investigated whether reader skill sets (such
                             as world knowledge and text decoding skills) apply
                             differently depending on the genre of the reading text.
6        Philip McCarthy     In this study, we assessed whether Coh-Metrix could
                             replicate human ability to recognize genre at the
                             sub-sentential level.

Exposition Move
As we saw earlier, the commencement move of many Coh-Metrix studies has
taken the form of how the research was conducted, why the research was
conducted, or what the results of the research were. To simplify matters, we
have recommended that the commencement move take the form of address-
ing what we did (where “what we did” is a modified version of the research
question). This recommendation means we can now inform the readership
of the other three common forms of opening: how the research was
conducted, why the research was conducted, and what the results of the
research were. For simplicity’s sake, we refer to these three elements as the
method-purpose-results elements of the exposition move. We have already
seen some examples of these elements, but now let’s look a little more closely
at them.
Method Element. Starting with the Method element, the following text
comes from McCarthy et al. (2008, p. 251): “Our corpus . . . was formed from a
subset of 100-sentence self-explanations from a recent iSTART experiment.”
This statement briefly explains the composition of the corpus; however,
the authors did not explain what was done to the corpus (e.g., how it was
measured or how it was analyzed). Readers are left to presume that either the
composition of the corpus is of greater importance than the analysis, or that
the analysis is given elsewhere in the discussion.
The most probable reasons for the Method element in the aforementioned
example being so short are that (1) the element isn’t required at all, so researchers
frequently highlight only the part to which they wish to draw attention; and
(2) many papers have size restrictions, and reminding people of information
rather than providing new information can seem wasteful.
Purpose Element. Turning now to the Purpose element, the following text
comes from McNamara et al. (2010, p. 315): “There is a need in discourse
psychology for computational techniques to analyze text on levels of cohesion
and text difficulty, particularly because discourse psychologists increasingly
use longer, naturalistic texts from real-world sources.” This statement briefly
explains the reason for conducting the research (i.e., the purpose). The state-
ment takes the classic form of “this is important . . . because . . .”. The purpose
element is not common in the summary part of the discussion. Instead, it may
turn up in the implications move or the wind-up move (discussed later).
Results Element. And finally, we have the Results element. To examine this
element, let’s look at an extract from Roscoe et al. (2011, p. 285):
Linguistic analyses of introduction, body, and conclusion paragraphs using
Coh-Metrix revealed several properties associated with paragraph quality. Some
features were common across all types: length, Givenness of information, and
vocabulary. Not surprisingly, paragraphs that were longer received higher ratings,
perhaps because they contained more elaborated arguments or evidence. Better
paragraphs also contained more given information, maintaining cohesion and
comprehensibility of ideas. Lastly, several measures of lexical sophistication were
predictive of paragraph quality, such as word frequency, hypernymy, and lexical
diversity. Paragraphs received higher scores when the writers displayed a deeper
and more varied choice of vocabulary. These results mimic those reported by
McNamara et al. (2010) regarding the entire essays.

The most notable feature of the extract, as compared to the previous elements,
is that it is long. Of course, the length stems from the fact that there is more
than one result that needs to be highlighted. A second notable feature is that
although the extract is “results,” there aren’t any numerals or statistics. Thus,
the results element is written in very general terms and writers can even get
away with a few examples of terms like more and greater without having to
add p-values (see Chapter 11).
Finally, note that the last sentence of the extract is less a statement of a
result and more a statement of implication. We deal with implications in
the denouement phase (discussed later in the chapter), so for now it is enough
to know that results elements can effectively end with a statement of the
implications of the results.
Frozen Expressions. Let’s take a moment to look at a few frozen expres-
sions that are common in the exposition move. The frozen expression “In
sum” is a useful way of joining together several smaller results into one big
picture. For example, McNamara et al. (2010, p. 76) write: “In sum, the results
of this study indicate that more-skilled writers use more sophisticated
language.” Obviously, what preceded this statement were several individual
results, each of which associated more sophisticated language with the
more-skilled writers.
A second common example of a frozen expression associated with the
exposition move is “our results suggested.” This expression is a very simple
way to highlight to readers that the results element is about to follow. The
word “suggested” is of great importance here and was discussed in Chapter 11.
The point is that no result is the final word, so hedging is always the path of
least resistance.

Denouement Phase
The word “denouement” (DAY-NOO-MAWN) is French in origin and means
“the unraveling of the knot.” Many people would argue that the denouement is
the most important part of the Discussion section, serving to situate the result
of the study into the theoretical framework. In literature, movies, and drama
of any kind, the denouement is that part of the discourse in which all that is
unknown is made known, and in the Discussion section of a research paper it
functions the same way: it shows how the mystery of the event (the
experiment) can be explained in terms of what we already know and agree on
about the world (the theoretical framework).
This unraveling of the knot should be taken seriously, because by this stage
of the paper, all that has so far been presented are facts and figures that
any reasonably well-trained algorithm could produce. Indeed, the software
SCIgen (en.wikipedia.org/wiki/SCIgen) does exactly that by using moves and
frozen expressions, not unlike those described here, to generate nonsense
science papers that seem (to many people) to be just like the real thing. The
point here is that the researchers themselves have to unravel the knot; they
cannot rely heavily on moves and frozen expressions and instead they
must present a plausible explanation as to the interpretation of the result
and its subsequent implications. These two italicized words (interpretation
and implication) are key to this explanation, and they form the moves that
constitute the denouement phase.

Interpretations Move
To better understand the purpose of the interpretation move, we can turn to
the literary phrase of a “willing suspension of disbelief.” This phrase, given
to us by Samuel Taylor Coleridge, informs us that an audience (readership,
discourse community) is willing to be persuaded, even of the most incredible
of things, like a flying man, or beaming people from one place to another, or
even such nonsense as politicians putting aside their ideological differences in
order to serve the greater good. But audiences won’t believe just anything.
There are limits to what people will believe. And those limits are reached
when elements such as consistency and reason fail to be maintained.
As an example of a willing suspension of disbelief, let’s consider the movie
Superman. More specifically, let’s consider one much-talked-about part of the
Ilya Salkind Superman series: while it was perfectly OK for
Superman to fly, it was not OK for him to fly so fast that he was
able to reverse time (which is what he did in order to bring Lois Lane back
from the dead). The difference between these two aspects (a flying man and
reversing time) is that Superman’s ability to fly is explained in numerous
places and at numerous times by the fact that he is from Krypton: a planet
with far more evolved people, and far greater gravity (thus Superman’s ability
to fly is consistent, and it is reasoned). But reversing time is quite different.
Never before had Superman done this (so it is not consistent), and at no point
is it explained how he managed to do this (so it is not reasoned).
Applying the suspension of disbelief as a heuristic for the denouement, we
can say the following: So long as you are consistent in terms of the theory (all
the stuff that has come before) and so long as you provide plenty of reasoning
(the results of your study), your audience will probably believe that Superman
can fly. However, to the extent that you depart from the story that has brought
you to this point (the theoretical framework) and to the extent that you make
claims without sufficient evidence (overreaching with your results), you will
be asking your audience to believe that Superman can alter time – and that
kind of thing your audience will not stand for. In sum, the heuristic of willing
suspension of disbelief reminds us that we have to keep to the story, and we
have to support our claims with reason, but outside of that we have relatively
little to constrain us. And it is in this space and with this power that we define
what our study means.
Coh-Metrix Examples of the Interpretation Move. Let’s now look at some
Coh-Metrix-related examples of the interpretation move (see Table 12.4).

Table 12.4 Three Coh-Metrix studies featuring interpretation moves

Example | Text | Authors
1 | The most plausible explanation for this result is the contrast between Biber’s focus on the linguistic features operating at the word level and our study which included a much wider range of language and discourse characteristics that we have called cohesion. | Louwerse et al. (2004)
2 | The finding that the effect of SERT training emerged one week after training is encouraging because it suggests that students remember and use the strategies beyond the time of training. | O’Reilly, Best, & McNamara (2004)
3 | This difference may result from the tendency in simplified texts to avoid developing and linking ideas with the more complex connectives, such as modifiers and logical connectors, and to depend instead on the more common connectives such as and, or, and but. | Crossley et al. (2007)

In Example 1, the result of Louwerse and colleagues’ paper is “interpreted”
as evidence that cohesion measures differ from linguistic measures.
This conclusion is drawn from the fact that although the same texts were used
in both studies, those texts yielded different results depending on the method
used to measure them.
In Example 2, O’Reilly and his colleagues interpret their result as evidence
that their intervention (i.e., SERT training) is not transitory. This conclusion
is drawn from the fact that similar results were found at both the time of
training and one week after training was completed.
In Example 3, Crossley and his colleagues had previously explained that the
results showed a “difference” between the two text types studied (i.e., sim-
plified texts and authentic texts). The authors then try to explain why this
difference may have occurred.
Note that none of the interpretations are offered as a “proof” or a “claim of
fact.” Instead, the interpretations are offered only as a reasonable explanation
of how the results came to be what they are and how they fit into existing
theory. In that sense we can see that the authors have kept to the theoretical
storyline and supported any claims with appropriate reasoning.
Although the interpretations offered at the beginning of the section help
explain this critical feature of a discussion, we must admit that it is generally
quite difficult to precisely demarcate the end of the results element of the
exposition move and the beginning of the interpretation move. In most studies,
authors will have blended into their narrative the interpretations, results,
implications, and many other elements. As such, our point is simply that an
interpretation should be there, and not that a single piece of text needs to be
reserved for its presentation.
Frozen Expressions. As always, there are some frozen expressions that
may help authors when writing the interpretation move.
Taken as a whole. This phrase is often useful for bundling up an array of
results before laying down an interpretation. The phrase also seems to have
the effect of lessening the impact of the weaker results. For example, we might
say: “Taken as a whole, the Allies fought an effective campaign during the
Second World War.” It is hard to see how anyone would disagree with such
a statement, and yet it neatly glosses over facts such as the Holocaust, the
millions of Allied and civilian deaths, and such military failures as Pearl
Harbor, the defense of the Philippines, or the opening U.S. military campaign
in Africa.
The result is encouraging. When results are clearly not all you would have
dreamed, but there is at least one avenue of hope, we can claim a result to be
“encouraging.” Encouraging is used in science in much the same way that it is
used in politics and sports. For example, when unemployment is going up,
but not by as much as it was the previous month, we often hear it described as
encouraging. And if a team loses a closely contested game in overtime, as
opposed to the previous weeks in which it had been completely dominated by
opponents, again we hear the word “encouraging.” In science, encouraging
does not have to be quite so dramatic, and the word may sometimes be used in
a purely positive sense; however, sometimes, “the result is encouraging” is
simply an effective way to say that many of the results were anything but
encouraging.
Suggests. We have discussed hedging in a variety of places (see Chapters 8
and 11), so we shall not go into it in detail here. All that we really need to know
is that verbs such as “suggests” are commonplace, as are noun phrases such
as “a reasonable explanation” and “plausible account” and modal verbs such
as “may” and “could.”

Implications Move
Let’s recap. We have reminded the readers as to what our research study
was about; we have stated the main findings; and we have offered a plausible
interpretation of the results. Our next major task is to explain the implications
of the interpretations.
Two reasonable questions to ask at this time are “implications for what?”
and “implications for whom?” The what would be the theoretical framework:
Writers need to explain such elements as whether the results appear to
support (or not support) the current framework, in what ways they support
(or don’t support) the framework, and what is likely to happen if the frame-
work assimilates the findings of the current study. In turn, the whom would
generally refer to teachers, materials designers, industry, and also other
researchers: Writers need to explain how the results might affect material
production and material usage, in what ways the results affect that material,
and what is likely to happen if the material producers and users assimilate the
findings of the current study.
In terms of writing up the implications, it is fair to say that the implications
move is seldom a single stretch of text. Instead, it is more often the case that
implications tend to pop up around findings and interpretations and wher-
ever else is relevant (recall that the Discussion section is quite fluid). As such,
it is difficult to offer obvious examples of implications paragraphs. Despite
this difficulty, we have selected the following Coh-Metrix extracts to show
how implications are included in paragraphs, and how a variety of frozen
expressions can help foreground the implications.
Coh-Metrix Examples of the Implications Move. As we discussed in
Chapter 8, any experiment of any value is related to some kind of theoretical
framework. And as we discussed earlier in this chapter, one of the major
purposes of the Discussion section is to articulate how the results of the
experiment inform that theoretical framework (i.e., what are the implications?).
The implications we are concerned with are typically whether the results add
to (i.e., support) our current understanding of the theoretical framework or
whether they in some way challenge (i.e., contrast with) the understanding of
the framework.
Because supporting and contrasting are two of the most obvious ways that
the implications of the study can be foregrounded, it is not surprising that a
frozen expression is associated with these conclusive terms in the form This
study supports X and This study contrasts with X. In the example that follows,
the first sentence of the McCarthy et al. (2008) text is an interpretation (a
plausible explanation of a finding). The second sentence is an implication
because it relates the interpretation to the theoretical framework: “[O]ur con-
clusion from Experiment 2 was that both humans and a computational model
could distinguish topic sentences from non-topic sentences in a context free
study. This evidence supports the Free Model of topic sentencehood” (p. 660).
Supporting some element of a theoretical framework is all well and good;
however, when there is some conflict between a study’s findings and those of
another study, it is prudent to highlight the inconsistency with some degree
of caution. The expression “This evidence contrasts with . . .” is generally
accepted as a gentle way of pointing out an inconsistency, and it is preferred
to such terms as “contradicts,” “demonstrates the error,” or “falsifies.” With
this in mind, consider the following example from O’Reilly and McNamara
(2007, p. 126). The extract is an interpretation until the underlined sentence
beginning “This result contrasts with . . .”

[B]ecause we used the same texts as those in McNamara (2001, 2004), we predicted
that, overall, readers would have difficulty understanding the material. McNamara
(2001) argued that the difficulty of the text impeded readers’ ability to develop a
situation model of the material, and because so few readers were able to develop a
coherent situation model of the text, the default representation for the reader was the
textbase. This notion was supported by a difference of over 50% correct comparing
participants’ scores on text-based questions and bridging-inference questions. This
result contrasts with previous studies in which the overall performance for text-
based and bridging-inference questions was, on average, only 3% higher for text-
based questions as compared to bridging-inference questions (McNamara &
Kintsch, 1996, Experiments 1 and 2; McNamara et al., 1996, Experiment 2; emphasis
added).

Frozen Expressions. Once again, we have several frozen expressions that may
be helpful in directing the readership toward the implications of the study.

Table 12.5 Examples of implication frozen expressions

Text | Authors
It thus expands our understanding of the ways in which different types of reader interpret sentences in the comprehension process. | Best, Ozuru, & McNamara (2004)
The results of this analysis . . . suggest that authentic texts are significantly more likely than simplified texts to contain causal verbs and particles. Therefore, they are possibly better at demonstrating cause-and-effect relationships and developing plot lines and themes than are simplified texts. This finding supports many of the criticisms that have been leveled against simplified texts by proponents of authentic texts, including claims that simplified texts exhibit stilted and unnatural language, do not demonstrate natural cause-and-effect relationships, and do not develop plots and ideas sufficiently. | Crossley et al. (2007)
These results suggest that the first half of sentences alone contains sufficient domain characteristics for skilled readers to begin the process of activating knowledge of text structure: a process which facilitates comprehension. Such research may lead to better understanding of how knowledge is represented and subsequently activated. | McCarthy et al. (2007)
This finding supports many of the criticisms that have been leveled against simplified texts by proponents of authentic texts, including claims that simplified texts exhibit stilted and unnatural language, do not demonstrate natural cause-and-effect relationships, and do not develop plots and ideas sufficiently. | Crossley et al. (2007)

For the implications move, we find that the associated frozen expressions
generally take the form of a word or phrase that signals a transition in the
text from an interpretation toward an imminent implication. Common
examples of these terms include the adverbials “thus,” “hence,” “therefore,”
“as such,” “along these lines,” “consequently,” and “correspondingly.” Noun
phrases are also common, with examples including “this research,” “these
processes,” and “this analysis.” In general, readers’ comprehension is likely to
be facilitated by these expressions, because explicit transitionals require less
inferencing on the part of the reader.
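For readers who like to check their own drafts, the list of adverbials above can be turned into a simple search. What follows is only a minimal sketch in Python, and it is not part of Coh-Metrix: the function name find_transitions, the whole-word matching strategy, and the sample passage (stitched together from two of the extracts quoted in Table 12.5) are all our own assumptions for illustration.

```python
import re
from collections import Counter

# Transition adverbials listed in the paragraph above. Treating them as
# whole-word, case-insensitive matches is an assumption of this sketch.
TRANSITIONS = [
    "thus", "hence", "therefore", "as such",
    "along these lines", "consequently", "correspondingly",
]


def find_transitions(text):
    """Count how often each transition expression occurs in `text`."""
    counts = Counter()
    for phrase in TRANSITIONS:
        # \b keeps "thus" from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[phrase] = len(hits)
    return counts


# Hypothetical sample: two sentences quoted in Table 12.5, joined together.
sample = (
    "It thus expands our understanding of the ways in which different types "
    "of reader interpret sentences in the comprehension process. Therefore, "
    "they are possibly better at demonstrating cause-and-effect relationships."
)

print(find_transitions(sample))
```

On this sample the sketch reports one occurrence each of “thus” and “therefore.” A count of this kind says only that a transition signal is present; whether an implication actually follows is still a judgment for the writer.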
In Table 12.5 we have provided several examples of frozen expressions
associated with the implications move. The first example comes from Best
et al. (2004). Here the authors use the word “thus” to indicate that the discourse
is transitioning from interpretations to implications. The second example
comes from Crossley et al. (2007). Here the authors begin with a single opening
sentence that serves as an interpretation of a wide array of results from the
study. The remainder of the paragraph is dedicated to implications, wherein the
use of the word “therefore” serves much the same purpose as the word “thus.”
Note also the use of the frozen expression “this finding,” which, as we discussed
earlier, is accompanied by the word “supports.” The third example, from
McCarthy et al. (2007), uses the noun phrase “such research” to indicate a
forthcoming implication. And finally, the fourth example, from Crossley et al.
(2007), uses the noun phrase “this finding,” which again indicates a forthcoming
implication.

Acknowledgments Phase
No project that claims to have made “valuable findings” can be simultane-
ously the “final word” on the subject. At the very least, those findings must be
open to scrutiny, and any conclusions based on those findings need to be
open to challenge. But long before any of that business can take place, the
researchers themselves must evaluate their own work: acknowledging prob-
lems, concerns, or shortcomings within the current study, and acknowledging
the long road ahead. These acknowledgments highlight the two major moves
of the acknowledgments phase: the limitations and the future research. Note
that these two moves may also be combined into a single hybrid move.
Table 12.6 provides several examples of the acknowledgment moves, with
each discussed in detail over the forthcoming related sections.

Limitations Move
All studies have limitations: no corpus can account for every possible text; no
experiment can control for every variable; and no collection of indices can
ever provide more than an approximation of a construct. But the good news is
that all (most) reviewers know this, and they understand that the researchers’
requirement is to make a “good-faith effort” to provide results that reflect the
real world, and not to cover every possible angle of every possible eventuality.
If it were otherwise, no one would ever publish anything.
Perfection may not be compulsory, but there are still lines in the sand, gray
areas, and debatable points. Moreover, a research project can often start off
with indisputable data but end up with an analysis that is anything but
indisputable. For example, from a perfectly good corpus the researchers may
have detected an unusual phenomenon. The researchers wish to investigate this
phenomenon more closely; to do so, however, their number of items becomes
so small that statistical analysis reaches those gray areas of acceptability.
Naturally, the thing to do is go and find more items, but more items aren’t
always available, and time and cost constraints will also enter the equation. At
times like these, the researcher has to make choices: either wait until sufficient
data becomes available or acknowledge the limitations of the current analysis.

Table 12.6 Examples of limitations moves, future research moves, and hybrids

Move | Text | Authors
Limitations | While this study has important implications for differences between L1 and L2 lexical proficiency, the use of computational tools, and statistical analyses, it does have limitations. First, the study . . . | Crossley & McNamara (2009)
Limitations | Although this study produced some significant and potentially important findings, there were limitations. The variety of texts . . . could have produced some of the puzzling results, such as the . . . clausal diversity discrepancy. These results could have been a result of using non-authentic texts that were designed for a specific purpose. Because these texts had to have a high frequency of the desired grammatical clause or lexical item, they may have been purposely designed to be less challenging by being less grammatically or lexically diverse. | Healy, Weintraub, McCarthy, Hall, & McNamara (2009)
Limitations | Results may have . . . been influenced by limitations in the corpora. The lengths of the individual songs may have affected the results. Most of the text files consisted of fewer than 200 words, and some of the computational algorithms that we used are more reliable with larger text samples. In addition, our corpus consisted of sixteen songwriters. This rather low number is due to our constraints of carefully matching the suicidal and non-suicidal song-writers. Nonetheless, these sample size constraints may account for the limited effect sizes. | Lightman et al. (2007)
Future | Future research will focus on developing a range of textual signatures beyond the abstract comparisons outlined in this chapter. Specifically, comparisons of section parts from the perspective of the introduction, methods, results, and discussions sections need to be examined. | McCarthy, Briner, Rus, & McNamara (2007)
Future | In future research, we will seek to better assess the parameters of the measures discussed in this study. That is, certain measures are geared more to evaluate certain categories of similarities over others. As such, we want to assign confidence values to measures so as to better assess the accuracy of our models. | McCarthy, Rus, et al. (2007)
Future | We believe that future research should . . . match the reader type to text type at a much more fine grained level using various global/local attributes of the texts. We also believe that future versions of [the software] should take into account both reader and text characteristics . . . | Best, Ozuru, & McNamara (2004)
Future | Future accounts of comprehension may need to turn to theories of word and sentence understanding, and purely connectionist architectures (e.g., recurrent networks) in order to fully account for comprehension. | McNamara & Magliano (2009)
Hybrid | Since this study used a relatively narrow field of register (antitrust/competition), our findings do not necessarily generalize to all legal areas. As such, future research will compare more registers (such as legislation, court transcripts, and “boilerplate” documents) to determine if there are underlying, consistent differences in the British and American languages of law. Future analysis must also consider to what degree these differences in language variety extend to other genres such as narrative and expository. | Hall et al. (2007)
Hybrid | Like all studies . . . this study does have limitations. Although it is a longitudinal study and fits the parameters of the type of study that is needed in the field . . . it examined only six learners over the course of a year. Even though this number is sufficient for the analyses conducted in terms of statistical power, it is arguable whether the sample is large enough to be representative . . . These are limitations that future studies should address. | Crossley et al. (2008)

At this point, we come to a useful distinction between a published article
and published proceedings (and we can include in the latter such work as
term papers and even theses and dissertations). A proceedings paper is a work
in progress. It is not complete; the authors are not presenting it as complete,
and reviewers should not read it as being complete. As such, it has limitations.
Journal articles, on the other hand, can be called complete. Of course,
being complete doesn’t mean that the line of research is at an end, but an
article does present itself as a particular element of the research being at an
end. As such, it should not have (serious) limitations.
Limitations may well be “reasonable,” but just as no one likes to draw
attention to their smelly feet or the tattoo of their ex-lover, so it is that authors
don’t like to point out the weaknesses of their own study. Instead, like an
unwanted child, limitations are often pushed out and dispersed into the fabric
of the Discussion with the hope that no one will pay them too much attention.
This grudging offering makes examples of a full-blown limitations paragraph
exceptionally rare. As such, our own examples of limitations are somewhat
“limited.”
The first extract we shall discuss (see Table 12.6) is by Crossley and McNamara (2009).
The text is a rare example of an explicit acknowledgment. Specifically, it states
that the study “does have limitations.” Having made that statement, the
authors go on to list the concerns they have with the analysis. Such openness
should perhaps be more common; however, even when we do see transparent
acknowledgment, we still see some kind of positive extenuation, as with the
opening clause that highlights the study’s importance.
The second example of limitations (see Table 12.6) is from Healy,
Weintraub, McCarthy, Hall, and McNamara (2009). Here again, we see that
limitations are blended in with a positive extenuation. Specifically, the
authors use an introductory clause that implies this study is a good one, and
only then do they give a list of the problems.
When results are weak, it’s not unusual (nor unreasonable) to be suspicious
of the data. But even if the data is suspect, the authors still have to explain
their suspicions, and they also have to explain why (given those suspicions)
they still went ahead with the study. In the third example (see Table 12.6),
Lightman et al. (2007b) had studied the lyrics of suicidal songwriters.
Unfortunately (or rather fortunately – from the perspective of the uniqueness
of human life), the project was limited by too few artists deciding to end their
own lives, and by those artists who did end it all not being sufficiently verbose.
Thus, the authors had to explain these extenuations in their limitations move.
Some people may argue that just as bad workmen blame their tools, so too
do bad scientists blame their data; however, as we discussed in Chapter 9, we
have to start somewhere (even with poor data), and showing our results (even
if they’re bad) and discussing our thoughts on why they are bad are more
likely to lead us to long-term success than simply ignoring issues or dumping
all our analyses in the trash.

Future Research Move


Authors may not want to parade their studies’ limitations, but when it comes
to future research, they’re much less coy: Discussion sections are replete with
instances of “where we go from here,” “what we’d like to do next,” and “what
the field should be considering.” The purpose of the future research move is to
let other researchers know about your plans and perhaps offer those other
researchers some avenues of pursuit. On a grander scale, the future research
move may also be seen as the authors’ vision of how the theoretical frame-
work might expand.
The first two examples of future research extracts (see Table 12.6) come
from studies that are near the beginning of their particular line of research.
Consequently, the future research move highlights relatively modest develop-
ments, respectively concerning expanding the data (McCarthy, Briner et al.,
2007) and tweaking the algorithms (McCarthy, Rus et al., 2007).
The second two examples of future research extracts (Best, Ozuru, &
McNamara, 2004 and McNamara & Magliano, 2009) come from projects
that are much more developed. From these extracts we can see that future
research is directed much more specifically toward the theoretical framework.

Making “Limitations” and “Future Research” into a Single Hybrid Move
Having shown that there are two acknowledgment moves, we also need to
show that these two moves might just be a single move looked at from different
perspectives. That is, limitations are simply areas of future research, and future
research is often nothing more than existing limitations.
The “limitations = future research” acknowledgment move is commonplace
for two major reasons. First, it’s simply true: Limitations really are future
research. Second, as we mentioned, authors are much more comfortable talking
about future research than limitations. As an author, you want to avoid talking
the reviewers out of publishing your paper by stressing its weaknesses. So,
serving up limitations as future research avoids highlighting the possible weaknesses
that reviewers might otherwise dwell on.
Two examples of this hybrid form are presented in Table 12.6. In the Hall
et al. (2007) extract, the researchers begin with the limitation, which they
present as a cause and effect (see the first sentence of the extract). The authors
then move directly to explain how this limitation forms the springboard for
their next analysis (see the second sentence). The subsequent example is
much more complex. The authors (Crossley et al., 2008) begin with a simple
acknowledgment that no study is perfect. They then proceed to their first of
three extenuations before finally admitting a limitation. This limitation is
then immediately followed by a second extenuation before there is an impli-
cation (which again contains an extenuation). Finally, the authors present
their solution to the limitation, which is, of course, future research.
The comments in the preceding paragraphs may seem like we are making
a joke at the authors’ expense; however, in most cases we are actually the
authors ourselves. But in any case, the point of importance here is that today’s
limitations are tomorrow’s publications, and that there is no shame in
acknowledging that. This having been said, drawing a spotlight to our least
favorable attribute is probably a little too altruistic. As such, we recommend
that beginning researchers think carefully about the limitations of their studies,
and what may mitigate those limitations, and present the collected evidence
positively as a course for the future.

Frozen Expressions
As ever, a number of frozen expressions have evolved as part of the
acknowledgments move. Some of these expressions are listed below:

Although/While this study offers important . . . X potential concerns need to be
mentioned.
Where X is any number.
Although/While the results of this study are encouraging, future research
needs to X
Where X addresses the limitations.
Although/While + [positive aspect of the study], we advise some caution with
the interpretations of [any result that may be described as “a bit of a stretch”].
Future research X consider
Where X is any modal verb (e.g., must, has to, needs to, might, etc.)
This study X limited by [ + extenuation]
Where X puts a limit on the word “limited” (e.g., might be, is somewhat,
arguably, etc.)
Although/While X, the results produced here offer an important and
exciting . . . .
Where X is an acknowledgment of the limitation
Although/While [acknowledgment of limitation], the results produced here
contribute to the field of X by Y.
Where X is the field in general or a particular subfield being highlighted
(e.g., the conference being applied to); and where Y is the interpretation and/or
implication of the findings.

closure phase
Who’s going to read your paper? Well, apart from your family and friends,
your audience is likely to be made up of reviewers, researchers, and profes-
sors. All of these people – except for your family and friends, who are most
likely positively biased – have two things in common: (1) they are all subject to
limited time, energy, and enthusiasm; and (2) they are all going to grade your
work. With these points in mind, it is well to remember that by the time your
readers have reached the final passage of your magnum opus, they’ll have had
to trawl through an ocean of facts, figures, and frameworks and, therefore,
they may be a little tired. But tired or not, your readers will probably choose to
evaluate your work, and to do so they will have to gather their thoughts as to
what the paper was really about and whether the effort they have just put
in was worthwhile. As such, this is the point of the paper where the writer is
advised to serve up a take-home message that is brief, memorable, and
satisfies the reader’s need for closure.
The closure phase features two moves: the wind-up and the pitch. Ideally,
closure is captured in a single paragraph, beginning with the wind-up and
ending with the pitch (which is the very last sentence of the paragraph). The
purpose of the wind-up element is to focus the reader on the “right” con-
clusion (i.e., the interpretation and implications of the study according to the
authors). The purpose of the pitch is to make that conclusion indelible.

the wind-up move


As the names may suggest, it is difficult to separate the wind-up move from
the pitch move. For this reason, in this section we focus on the wind-up, but
we will not (and cannot) ignore the pitch. Also in this section we describe and
discuss the wind-up move from more of a critical perspective. This strategy
reflects three facts about the closure move. First, excellent examples of the
move are not common. Second, the move probably deserves more attention
than it has been given. And third, the move is far from easy, demanding a
great deal of the “creativity” that has often been mentioned in this chapter.
To close this section, two examples of closure moves are provided from
McCarthy, Renner et al. (2008) and McCarthy and McNamara (2007). The
abridged paragraphs along with corresponding critiques are provided in
Tables 12.7 and 12.8.

Table 12.7 Example 3 of a Closure Move by McCarthy, Renner, et al. (2008)

Text: In this study, . . .
Commentary: Like “in conclusion,” the phrase “in this study” signals that the
paper is transitioning to wrapping up.

Text: our interest in topic sentencehood identification was directed at better
evaluation of text structure in order to more effectively match text to reader.
Commentary: A restatement of the purpose of the study.

Text: Given that topic sentences are more likely to provide assistance to low
skilled/low-knowledge readers, and given that such readers would probably
benefit more from ideal type topic sentences,
Commentary: The authors then offer two restatements of assumptions from the
theoretical framework. Although probably necessary to include this information,
the clauses are both dependent, and require the reader to hold a significant
amount of information in short-term memory before arriving at the main clause.

Text: then the Free Model of topic sentencehood introduced here offers systems
such as Coh-Metrix the opportunity to better assess texts and better fulfill the
Coh-Metrix goal of optimally matching text to readers.
Commentary: A reasonable and accurate conclusion is given; however, this final
sentence is a massive 59 words long, with no fewer than 32 words occurring in
the pitch move. As such, this take-home message will require a truck and trailer.

the pitch move


One strategy to consider when trying to compose the last sentence of a
Discussion is to consider the function of the pitch. For example, you might
want to end the manuscript with a vision, a plan, a conclusion, or a goal. In
Table 12.9, six pitches are presented, each with a different function. While none
of the examples are perfect pitches (for example, some are a little too long to be
memorable), all of them provide the reader with an indelible impression of the
authors’ intent. Naturally, not all readers will agree as to the impact of these
example pitches, but having a strategy with which to form a pitch move may be
helpful.

Table 12.8 Example 5 of a Closure Move by McCarthy and McNamara (2007)

Text: While much work remains to be done,
Commentary: At the time this section of the chapter was finally approved
(August 2012), a Google search of “while much work remains to be done”
provided no fewer than 108,000 hits. Removing the word “work” provided
214,000 hits. Replacing “while” with “although” provided 152,000 hits with
“work” included and 244,000 hits with “work” removed. In short, this
expression is so commonly used that it should not be thought of as a frozen
expression and would be better thought of as a cliché. The point is, of course
much work remains to be done! No reader needs to be reminded of this.
Instead, the readers simply need to be told what work is planned, and those
plans should be written in the appropriate acknowledgment section.

Text: our study demonstrates that
Commentary: The authors introduce a summary statement. Note how the
paragraph would have read perfectly well without the previous cliché.

Text: genre recognition at the sub-sentential level is possible.
Commentary: A brief and effective statement of the study’s achievement.

Text: Such recognition might provide a signature of reading ability, and as a
consequence, a method of assessing reading ability. The major results of this
study certainly provide sufficient initial evidence that such an approach is
viable and that this paradigm can be further explored as an assessment of
reading skill.
Commentary: A reasonable summary of the implications.

Text: Furthermore, there have been no previous investigations of how much
text is required to recognize genre. This study indicates that very little text is
actually required and that readers most likely activate information about text
structure very early in the reading process.
Commentary: The final two sentences demonstrate how the study helps develop
the theoretical framework. This is an effective strategy, although it is difficult
to process the final sentence as a pure pitch move. Consequently, the last two
sentences run together, meaning that they lose some impact in terms of being
memorable.

Table 12.9 Six examples of pitches

Pitch: A vision
Text: While such a study cannot hope to completely level the playing field on
which non-native speakers of English are forced to compete, it does at least
offer some hope that computational analyses (such as those produced by
Coh-Metrix and the Gramulator) will better facilitate those whose careers
depend on written production in a foreign language.
Author: McCarthy, Hall, Duran, Doiuchi, Duncan, Fujiwara, & McNamara (2009)

Pitch: A plan
Text: Thus, our task becomes the identification of these features and the
derivation of computational algorithms that accurately model them.
Author: Crossley & McNamara (2010)

Pitch: An implication
Text: Such research designs allow for the study not only of polysemous sense
relations in natural language data but also for the study of other sense
relations and depth of knowledge in L2 lexical development.
Author: Crossley, Salsbury, & McNamara (2010a)

Pitch: A summary
Text: We have shown that deception is a feature of language that is
identifiable through many variables, established that Coh-Metrix is a
computational system that can identify deception, and revealed that there is
insight to gain by comparing computational NLP tools.
Author: Duran, Hall, McCarthy, & McNamara (2010)

Pitch: A conclusion
Text: Better, more complete, models of semantics are likely to emerge by
measuring multiple levels of meaning.
Author: McNamara (2011)

Pitch: A goal
Text: Through the freewriting and other strategy modules in W-Pal, we hope
to scaffold students toward building better writing skills.
Author: Weston, Crossley, & McNamara (2010)

a model discussion
This chapter has been long. So it may be useful to briefly provide a model of
how the Discussion section might fit together.
• In this study, we assessed [whatever our research question was]. In order
  to address [whatever we were addressing], we [whatever we did to address
  it]. Our findings suggest [whatever they suggest]. The study is important
  because [why it is important].
• Collectively/In sum/Broadly speaking/Taken as a whole, our results
  should be interpreted to mean [something].
• Our findings support/contrast with [whoever and whatever they support
  and whoever and whatever they contrast with].
• The implications of our findings
  ◦ raise questions as to [whatever they raise questions as to]
  ◦ indicate [whatever they indicate]
  ◦ may mean [whatever they may mean]
  ◦ provide evidence of/for [whatever they provide evidence of or for]
• Although our study provided [something positive], there are issues as to
  [whatever there are issues of]. Future research needs to address [whatever it
  needs to address].
• In conclusion, our study [what the study did in terms of the research
  question, especially as it relates to the theoretical framework]. [Zinger pitch
  in terms of some identifiable function.]
Using the information provided in this chapter, along with the immediately
preceding model, Table 12.10 provides a complete Discussion section based on
the Elevator Pitch that was provided in Chapter 7.

Table 12.10 A model of the discussion section by sequential position, paragraph
position, discussion phase, discussion move, and element of move

Sequence: 1; Paragraph: 1; Phase: Summary; Move: commencement; Element: /
Text: In this study, we examined whether the language of news reports
became more complex when reporting global issues as opposed to local issues.

Sequence: 2; Paragraph: 1; Phase: Summary; Move: exposition; Element: method
Text: Our study comprised two contrasting newspaper corpora: one
concerning local issues and one concerning global issues. All texts were
processed through Coh-Metrix, with the results being assessed in a series of
t-tests.

Sequence: 3; Paragraph: 1; Phase: Summary; Move: exposition; Element: purpose
Text: The study is important because anyone needing to learn how to
communicate effectively (or how to understand what makes effective
communication) needs to understand how the features of language can differ
between contrasting registers, and why these differences are present.

Sequence: 4; Paragraph: 1; Phase: Summary; Move: exposition; Element: result
Text: Our results suggested that the language of news reports becomes
more complex when reporting global issues. Specifically, global news reports
were significantly lower in terms of situation model cohesion and syntactic
ease. In addition, global news reports demonstrated significantly higher
lexical diversity values, meaning that a greater range of vocabulary was
deployed across the texts. The result for narrativity was not significant.

Sequence: 5; Paragraph: 2; Phase: Denouement; Move: interpretation; Element: /
Text: A plausible reason for these results is that any reporting of global
news is likely to be an important story, and therefore one that is difficult to
explain. This complexity may be reflected in the language selected by writers.
If this writing is more complex then it is possible that writers either don’t
realize the complexity, or don’t appreciate the complexity. Whatever is the
case, there is little evidence in this study to suggest that writers are using
facilitative language. This having been said, we must acknowledge that
facilitative language (i.e., language of higher cohesion) is typically longer than
its less cohesive counterpart (McNamara et al., 2010). As such, space
requirements may simply be prohibitive to a structure like newspapers.

Sequence: 6; Paragraph: 3; Phase: Denouement; Move: implications; Element: /
Text: The findings of this study contrast with previous research (e.g., by
researchers such as Graesser, Clark, McNamara, Swales, or Kintsch) inasmuch
as the newspaper texts appear to be back-to-front in terms of cohesion. That
is, theory suggests that background knowledge, schemas, and expectations of
shared experience need to be established in order to increase the likelihood of
comprehension, and that explicit cohesion at the level of the text might
facilitate this goal. As such, the more complex global news story may require
more facilitative language, whereas the local news can assume some degree of
common ground. In the event, the findings suggest a simplification for local
news and a less cohesive text for global news. Assuming comprehension is the
goal of the newspaper (which is reasonable), these results have important
implications because they suggest that reporters could possibly better serve
their readership with an adjustment in their levels of cohesion.

Sequence: 7; Paragraph: 4; Phase: Acknowledgements; Move: limitations / future research; Element: /
Text: Although the findings of this study offer important insight into
perception of local and global issues, we advise some caution with the
interpretations of these results until further research can be conducted on this
complex issue. For instance, future research must consider to what degree a
newspaper is a “learning text,” and to what degree it can be compared to
something like a more standard high-school text. Such information will better
inform us as to expected comprehension levels and expected Coh-Metrix
values of the corpora. Further, issues such as the reporter type and the
audience type need to be considered. That is, do readers process newspaper
text from non-native English speaking countries in a similar way to how native
English newspapers are processed? In short, this study is somewhat limited by
the difficulty it has in establishing a sufficiently wide number of baselines
against which to better understand the findings of this study. To be sure, these
baselines will be helpful in future research; however, until that can be
achieved, the results produced here offer an important and exciting avenue of
pursuit.

Sequence: 8; Paragraph: 5; Phase: Closure; Move: wind up; Element: /
Text: We write news and we read news because we want to understand our
world: both the world close to us, and the world far away. How this news is
reported is just as important as what is reported because our comprehension
of the news dictates its value. This study demonstrates that reports of local
events are textually different from reports of global events and, more
importantly, that complex events might be associated with less facilitative
language. The results here cannot yet supply evidence that adding cohesion to
news text would be beneficial to news comprehension; future research will
need to address that issue. However, what this study does provide is evidence
that Coh-Metrix analysis of news text can detect levels of potentially
beneficial lexical features.

Sequence: 9; Paragraph: 5; Phase: Closure; Move: pitch; Element: /
Text: Consequently, we have the intriguing possibility that Coh-Metrix
analysis might provide for greater comprehension of one of the world’s most
widely circulated information materials: the news.

and finally
Experience tells us that the Discussion section may well receive less of the
author’s attention than any other section. Quite often
the Discussion section will not even be included when a student submits a
draft for review. Instead, a note will be attached along the lines of “I’ll fill in
the Discussion later.” Even for this book, there was some debate as to whether
a whole chapter on the Discussion section was necessary. In short, the
Discussion is often left as the project’s afterthought.
The Discussion section is probably an unloved child because most people
feel it is merely a summary. In other words, the Discussion section doesn’t
contribute anything new. However, as we have seen here, a Discussion
requires highly creative thinking, arguably more so than in any other part
of the paper. For these reasons, we advise beginning researchers to take the
Discussion section very seriously and to set aside a suitable amount of time
and energy for its development. Remember, the Discussion section is the last
piece of the paper that will be read, so it is the piece that is most likely to be in
a reader’s mind as the paper is evaluated. Given this, a carefully planned and
presented Discussion section is highly likely to serve the goals of the paper
well.

Concluding Remarks

Our hope in this book has been to provide readers with a coherent description
of Coh-Metrix and how to make use of it. Coh-Metrix has changed our lives
in terms of how we conduct research: from the way we ask questions to the
way we answer them. Our understanding of text, including natural language,
discourse, and linguistics, has grown exponentially as we have developed
Coh-Metrix and explored language using our tools. It has opened doors we
never dreamed existed.
To us, using Coh-Metrix is like using the Internet. That is, just as we can now
type pretty much any question into a search engine and expect to get an
actionable answer, so too can we ask Coh-Metrix to transform our vast
quantities of data into output that answers a world of questions about
language. But of course, there are certainly limits to Coh-Metrix 3.0. First,
although we have explored hundreds of indices in this project, Coh-Metrix
3.0 only includes a subset of these indices. Nonetheless, we have attempted
to include what we consider to be the most important and valid indices
among the entire array. Second, Coh-Metrix includes a wide variety of
indices, but most of these are related to text difficulty, and our particular
focus has been on measures related to cohesion. As such, Coh-Metrix
cannot answer every question about language. Third, our motto in the
Coh-Metrix project has been to explore the “low-hanging fruit.” The indices
we provide in Coh-Metrix tend not to involve highly complex computa-
tional linguistic algorithms. We have avoided algorithms that are computa-
tionally expensive because of the need to process text and provide results
relatively quickly. That said, the Coh-Metrix variables that we have included
are the potential building blocks of far more sophisticated assessments that
we will continue to develop in the next phases of the Coh-Metrix project.
Despite these limitations, Coh-Metrix provides a gold mine of information
about text – and all of it in one tool. We know from the use of the past
Coh-Metrix versions that it has been useful to thousands of other researchers
as well. We hope that its value to others continues to grow.
As we move forward, our research turns to a number of related topics. One
objective is to make available the Coh-Metrix Text Easability Principal
Component scores to educators. This objective is important because we
seek to provide online technology that teachers can use to obtain powerful
yet interpretable information about individual text characteristics, allowing
teachers to select texts that optimize their pedagogical goals and their stu-
dents’ needs. Thus, the objective is to move beyond goals to match reader to
text, as adopted by many readability programs, and to move toward helping
teachers better understand the nature of text so as to empower their instruc-
tional practices. We have developed prototype tools to provide Coh-Metrix
text easability component scores. Links to these tools are currently available at
http://www.cohmetrix.com/. These tools are intended to provide educators
with information about the ease of text. For example, the tool provided at
http://coh-metrix.commoncoretera.com provides additional features such as
a library of texts aligned with the Common Core State Standards.
A second focus is on developing indices related to writing. This work
has been spurred by Institute of Education Sciences grants (IES R305A080589;
R305A09623; R305A120707) to develop and assess a writing strategy tutoring
system called the Writing Pal. The Writing Pal provides writing strategy
instruction, game-based practice, and practice writing persuasive essays.
Part of our efforts in developing this system has focused on creating
algorithms to provide feedback on the essays that students write. Coh-
Metrix provided our starting point in this algorithm development process.
As described in Chapter 6, Coh-Metrix has gone a long way in helping us
develop the Writing Pal feedback algorithms and furthering our under-
standing of writing. However, the focus of Coh-Metrix is primarily on text
difficulty rather than on rhetorical features of language. Indeed, our inves-
tigations of essay writing have indicated that higher-quality essays are
characterized by more challenging and sophisticated text rather than text
with high referential cohesion. A recent goal has been to develop additional
measures of writing quality. Our efforts so far in this regard have been
fruitful, indicating that various measures of semantic coherence and rhet-
orical cues are important components of higher-quality essays. Our ulti-
mate objective is to provide a tool to researchers and educators to assess
writing quality on multiple dimensions.
A third recent focus has been on language across cultures. Researchers in
other countries have frequently inquired whether it would be possible to
develop a Coh-Metrix for their language. This would be entirely possible,
but only if their languages had corresponding computational linguistics
modules, such as WordNet, syntactic parsers, ratings of words on various
psychological dimensions, LSA spaces, and so on. We have taken initial steps
in building LSA spaces and analyzing cohesion on a recent National Science
Foundation grant (NSF 0904909) that analyzes texts in Arabic, Chinese, and
Spanish. More specifically, we have analyzed the cohesion of the language of
leaders in countries that speak these languages, such as Hosni Mubarak, Mao
Zedong, and Fidel Castro. We find that the cohesion of the speeches of these
leaders is systematically related to historical events and the decades of their
leadership. Once again, we find cohesion to be an important manifestation of
the mind.
A fourth focus is to expand beyond Coh-Metrix to the level of qualitative
output. By qualitative we mean a textual analysis that provides verbal output
in addition to (or complementary to) the quantitative output that has been
the primary focus. For example, given a text that has relatively low cohesion, a
qualitative analysis would provide information on the consequences of
the cohesion gaps and specify the linguistic components contributing to the
particular levels of cohesion. We have begun these efforts in the Coh-Metrix
text easability project where we provide a relatively simple qualitative inter-
pretation of the quantitative output. Our future work will focus on providing
more complex qualitative interpretations that convey interdependencies in
text. For example, as discussed in Chapter 5, the level of cohesion in one
context will have different consequences in another. Low cohesion has differ-
ent implications in the context of narrative text than it does in the context of
expository text. Similarly, a preponderance of unfamiliar words has a very
different implication in the context of highly cohesive text with simple syntax
than it does in the context of syntactically complex, low-cohesion text.
Understanding and conveying these complex interdependencies was the
initial and overarching goal of Coh-Metrix. Our quantitative focus in
Coh-Metrix was a clearly necessary first step, but the addition of qualitative
analyses provides exciting new developments that we expect to noticeably
enhance the insights that Coh-Metrix provides.
While these are our most recent objectives related to Coh-Metrix, our
research evolves and morphs quickly and often. Our ability to foretell even
our immediate futures is strikingly poor. These four recent research and
development goals are highly likely to evolve and change, perhaps even before
this book is in print. Indeed, technological research in general, and more
specifically corpus analysis using automated tools such as Coh-Metrix, results
in rapidly evolving research areas. One reason is that the findings are
shared with a quick turnover rate (e.g., using outlets such as proceedings) so
that other researchers can benefit from them, but also so that findings,
developments, and discoveries are still current when they are disseminated.
It is virtually impossible to keep up with the rapid pace of technological
advances. During the last decade, the capabilities of technology have grown
exponentially. Indeed, we expect the opportunities from this growth to
provide exciting new adventures that we couldn’t possibly foretell today.
We look forward to the next decade, and the decades following, of Coh-
Metrix and its progeny. We hope you enjoy it too!
Finally, we have some concluding remarks reserved solely for our student
readers.

some do’s and don’ts

When Sending a Draft of Your Work to Your Professor


DO press the spell check button before submitting the paper to your profes-
sor. Yes, this is a waste of a precious few seconds of your invaluable life; and,
yes, your professor was put on this Earth for the sole purpose of spotting your
reworkings of Webster’s. Nonetheless, it would be nice if you would help the dear old
professor by making your contribution minimally cryptic.
DO use a style guide before submitting the paper to your professor.
Whether it be the APA guide, the MLA guide, the CMS guide, or the New
York City Subway System guide, it is important that your paper meet a
recognized form of consistency. And yes, we know that the rules of language
are descriptive, ever in a state of flux, and incapable of being universally
agreed on; however, do try to remember that your professor is made of
impenetrable, immovable, and incomprehensible prescriptive concrete (as
are conferences, journals, and dissertation committees).
DON’T litter your paper with semicolons. You do not know how to use
them; no one does. Kurt Vonnegut Jr. wrote 12 novels without ever using a
single one, so keep yours down to a sensible number.
Especially after dependent clauses, DO include commas. Admittedly, this is
menial work for your minions (a.k.a. professors) to complete, and they should
be grateful for the opportunity of making some kind of meaningful
contribution to your objet d’art. However, if you could find the odd second or two
to look up comma usage, it would be greatly appreciated.
DO include examples. The phrases “for example,” “for instance,” and “such
as” are a godsend for those trying to unravel the curio that is your last paragraph.
DO make sure you submit to your professor the very latest draft of your
work. Although your professors clearly do live to spend their weekends and
evenings trying to help you develop a career, they actually also have the occa-
sional life of their own. And when they receive an e-mail saying, “Thanks for the
comments but I actually sent you the wrong draft – the right draft is now
attached,” they are likely to descend into a hitherto unimaginable outpouring of
spit-infested meltdown. If you do suddenly realize the error of your ways, you
are advised to withdraw from the course, and possibly from the country.

When Receiving a Draft of Your Work Back from Your Professor


DON’T just press “accept all.” Actually, physically look at the changes made in
your paper, with your eyes. The point is that you do not make the same mistake
again. If you’ve ever wondered why your lazy old uncaring and thoughtless
professors take two to three weeks to return your work, it is because they are
writing the same comments for the same things for the thousandth time.
DON’T just read the comments – act on them. It is fine if you disagree with
some comments, but don’t just delete them. Presumably the people who make
comments do so because they happen to think their points are important. Yes,
admittedly, they only have hundreds of publications to their name and
thousands of years of experience behind them, whereas you have really cool
hair and a neighbor in a rock band, but it is just possible that their comments
may have some degree of value. As such, talk to the person who made the
comment, try to establish where the confusion is, and have it agreed and
sorted out before the next draft is submitted.
DO know that your professor is not the human embodiment of pure evil
(for the most part). When your paper is returned to you drenched in track
changes and with more commentary than a cable news show on a presidential
scandal, it is simply your professor trying to help. Every professor has sat
where you are sitting and done what you are doing. And you, in turn, will
have your day. The professor’s response is not about power, revenge, or
humiliation – it is simply trying to get you up to the standard that you
yourself have opted to strive for. A good research paper is the culmination
of a long and often painful journey. If it were easy to do, everyone would be
doing it, and you’d be selling hamburgers (not that there’s anything wrong
with that!).

References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge,
MA: MIT Press.
Allen, J. (1995). Natural language understanding. Redwood City, CA: Benjamin/Cummings.
Allen, J. F. (2009). Word senses, semantic roles and entailment. 5th International
Conference on Generative Approaches to the Lexicon, September 17–19, 2009. Pisa, Italy.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database
(CD-ROM). Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Beck, I., McKeown, M. G., & Kucan, L. (2002). Bringing words to life: Robust vocabulary
development. New York: Guilford Press.
Beck, I. L., McKeown, M. G., Omanson, R. C., & Pople, M. T. (1984). Improving the
comprehensibility of stories: The effects of revisions that improve coherence. Reading
Research Quarterly, 19, 263–277.
Beck, I. L., McKeown, M. G., Sinatra, G. M., & Loxterman, J. A. (1991). Revising social
studies text from a text-processing perspective: Evidence of improved comprehensi-
bility. Reading Research Quarterly, 27, 251–276.
Bell, C., McCarthy, P. M., & McNamara, D. S. (2012). Using LIWC and Coh-Metrix to
investigate gender differences in linguistic styles. In P. M. McCarthy & C. Boonthum-
Denecke (Eds.), Applied natural language processing and content analysis: Identification,
investigation, and resolution (pp. 545–556). Hershey, PA: IGI Global.
Best, R., Ozuru, Y., & McNamara, D. S. (2004). Self-explaining science texts: Strategies,
knowledge, and reading skill. In Y. B. Kafai, W. A. Sandoval, N. Enyedy, A. S. Nixon, &
F. Herrera (Eds.), Proceedings of the Sixth International Conference of the Learning
Sciences: Embracing Diversity in the Learning Sciences (pp. 89–96). Mahwah, NJ: Erlbaum.
Best, R. M., Floyd, R. G., & McNamara, D. S. (2008). Differential competencies contri-
buting to children’s comprehension of narrative and expository texts. Reading
Psychology, 29, 137–164.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, D. (1993). Register variation and corpus design, computational linguistics. Cambridge:
Cambridge University Press.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language
structure and use. Cambridge: Cambridge University Press.

Boonthum-Denecke, C., McCarthy, P. M., & Lamkin, T. (Eds.). (2012). Cross-disciplinary
advances in applied natural language processing: Issues and approaches. Hershey, PA:
IGI Global.
Bormuth, J. R. (1971). Development of standards of readability: Toward a rational crite-
rion of passage performance. Final report, U.S. Office of Education, Project No. 9–0237.
Chicago: University of Chicago.
Brill, E. (1995). Transformation-based error-driven learning and natural language
processing: A case study in part-of-speech tagging. Computational Linguistics, 21,
543–566.
Britton, B. K., & Gulgoz, S. (1991). Using Kintsch’s computational model to improve
instructional text: Effects of repairing inference calls on recall and cognitive structures.
Journal of Educational Psychology, 83, 329–345.
Brooks, C., & Warren, R. P. (1972). Modern rhetoric. New York: Harcourt Brace Jovanovich.
Brun, C., Ehrmann, M., & Jacquet, G. (2007). A hybrid system for named entity
metonymy recognition. Proceedings of the 4th International Workshop on Semantic
Evaluations (ACL-SemEval) (pp. 23–24). Prague, Czech Republic.
Bruner, J. (1986). Actual minds, possible worlds. Cambridge, MA: Harvard University
Press.
Bruss, M., Albers, M., & McNamara, D. S. (2004). Changes in scientific articles over two
hundred years: A Coh-Metrix analysis. Proceedings of the ACM 22nd International
Conference on Computer Documentation (pp. 104–109). New York: ACM Press.
Cai, Z., McNamara, D. S., Louwerse, M., Hu, X., Rowe, M., & Graesser, A. C. (2004). NLS:
Non-latent similarity algorithm. In K. Forbus, D. Gentner, & T. Regier (Eds.),
Proceedings of the 26th Annual Cognitive Science Society (pp. 180–185). Mahwah, NJ:
Erlbaum.
Cain, K., & Nash, H. M. (2011). The influence of connectives on young readers’ process-
ing and comprehension of text. Journal of Educational Psychology, 103(2), 429–441.
Carrell, P. (1982). Cohesion is not coherence. TESOL Quarterly, 16, 479–488.
Cataldo, M. G., & Oakhill, J. (2000). The effect of text organization (original vs. scrambled)
on readers’ ability to search for information. Journal of Educational Psychology, 92,
791–799.
Charniak, E. (2000). A maximum-entropy-inspired parser. Proceedings of the First
Conference on North American Chapter of the Association for Computational Linguistics
(pp. 132–139). San Francisco: Morgan Kaufmann Publishers.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: The MIT Press.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Clark, H. H., & Clark, E. V. (1977). Psychology and language. New York: Harcourt Brace
Jovanovich.
Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13,
259–294.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,
NJ: Erlbaum.
Coltheart, M. (1981). The MRC Psycholinguistic Database. Quarterly Journal of
Experimental Psychology, 33A, 497–505.
Connor, C. M., Morrison, F. J., Fishman, B. J., Schatschneider, C., & Underwood, P. (2007).
The early years: Algorithm-guided individualized reading instruction. Science, 315(5811),
464–465.
Conrad, F. G., & Schober, M. F. (Eds.). (2007). Envisioning the survey interview of the
future. New York: Wiley.
Crismore, A., Markkanen, R., & Steffensen, M. S. (1993). Metadiscourse in persuasive
writing: A study of texts written by American and Finnish university students. Written
Communication, 10, 39–71.
Crossley, S. A., Allen, D., & McNamara, D. S. (2011). Text readability and intuitive sim-
plification: A comparison of readability formulas. Reading in a Foreign Language, 23,
84–102.
Crossley, S. A., Allen, D., & McNamara, D. S. (2012). Text simplification and comprehensible
input: A case for an intuitive approach. Language Teaching Research, 16,
89–108.
Crossley, S. A., Dufty, D. F., McCarthy, P. M., & McNamara, D. S. (2007). Toward a new
readability: A mixed model approach. In D. S. McNamara & G. Trafton (Eds.),
Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 197–202).
Austin, TX: Cognitive Science Society.
Crossley, S. A., Greenfield, J., & McNamara, D. S. (2008). Assessing text readability using
psycholinguistic indices. TESOL Quarterly, 42, 475–493.
Crossley, S. A., Louwerse, M., McCarthy, P. M., & McNamara, D. S. (2007). A linguistic
analysis of simplified and authentic texts. Modern Language Journal, 91, 15–30.
Crossley, S. A., McCarthy, P. M., & McNamara, D. S. (2007). Discriminating between
second language learning text-types. In D. Wilson & G. Sutcliffe (Eds.), Proceedings of
the 20th International Florida Artificial Intelligence Research Society Conference
(pp. 205–210). Menlo Park, California: The AAAI Press.
Crossley, S. A., & McNamara, D. S. (2008). Assessing second language reading texts at
the intermediate level: An approximate replication of Crossley, Louwerse, McCarthy,
and McNamara (2007). Language Teaching, 41, 409–429.
Crossley, S. A., & McNamara, D. S. (2009). Computational assessment of lexical differ-
ences in L1 and L2 writing. Journal of Second Language Writing, 18, 119–135.
Crossley, S. A., & McNamara, D. S. (2010). Cohesion, coherence, and expert evaluations of
writing proficiency. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd
Annual Conference of the Cognitive Science Society (pp. 984–989). Austin, TX: Cognitive
Science Society.
Crossley, S. A., & McNamara, D. S. (2011a). Text coherence and judgments of essay
quality: Models of quality and coherence. In L. Carlson, C. Hoelscher, & T. F. Shipley
(Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society
(pp. 1236–1241). Austin, TX: Cognitive Science Society.
Crossley, S. A., & McNamara, D. S. (2011b). Understanding expert ratings of essay
quality: Coh-Metrix analyses of first and second language writing. International
Journal of Continuing Engineering Education and Life-Long Learning, 21, 170–191.
Crossley, S. A., & McNamara, D. S. (2012a). Detecting the first language of second language
writers using automated indices of cohesion, lexical sophistication, syntactic complexity
and conceptual knowledge. In S. Jarvis & S. A. Crossley (Eds.), Approaching language
transfer through text classification: Explorations in the detection-based approach
(pp. 106–126). Bristol, UK: Multilingual Matters.
Crossley, S. A., & McNamara, D. S. (2012b). Interlanguage Talk: A computational anal-
ysis of non-native speakers’ lexical production and exposure. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing and content
analysis: Identification, investigation, and resolution (pp. 425–437). Hershey, PA: IGI
Global.
Crossley, S. A., Roscoe, R., Graesser, A., & McNamara, D. S. (2011). Predicting human
scores of essay quality using computational indices of linguistic and textual features.
In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Proceedings of the 15th International
Conference on Artificial Intelligence in Education. (pp. 438–440). Auckland, New
Zealand: AIED.
Crossley, S. A., Salsbury, T., McCarthy, P. M., & McNamara, D. S. (2008). LSA as a measure
of coherence in second language natural discourse. In V. Sloutsky, B. Love, & K. McRae
(Eds.), Proceedings of the 30th annual conference of the Cognitive Science Society
(pp. 1906–1911). Washington, DC: Cognitive Science Society.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2009). Measuring L2 lexical growth
using hypernymic relationships. Language Learning, 59, 307–334.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010a). The development of polysemy
and frequency use in English second language speakers. Language Learning, 60,
573–605.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010b). The development of semantic
relations in second language speakers: A case for Latent Semantic Analysis. Vigo
International Journal of Applied Linguistics, 7, 55–74.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010c). The role of lexical cohesive
devices in triggering negotiations for meaning. Issues in Applied Linguistics, 18,
55–80.
Crossley, S. A., Weston, J., McLain Sullivan, S. T., & McNamara, D. S. (2011). The
development of writing proficiency as a function of grade level: A linguistic analysis.
Written Communication, 28, 282–311.
Defense Advanced Research Projects Agency (DARPA) (1995). Proceedings of the Sixth
Message Understanding Conference (MUC-6). San Francisco: Morgan Kaufmann
Publishers.
Day, R. S. (2006). Comprehension of prescription drug information: Overview of a
research program. In Proceedings of the American Association for Artificial
Intelligence, Argumentation for Consumer Healthcare. Retrieved September 16, 2013,
from http://www.aaai.org/Library/Symposia/Spring/2006/ss06-01-005.php
Dell, G., McKoon, G., & Ratcliff, R. (1983). The activation of antecedent information
during the processing of anaphoric reference in reading. Journal of Verbal Learning
and Verbal Behavior, 22, 121–132.
Dempsey, K. B., McCarthy, P. M., & McNamara, D. S. (2007). Using phrasal verbs as an
index to distinguish text genres. In D. Wilson & G. Sutcliffe (Eds.), Proceedings of the
20th International Florida Artificial Intelligence Research Society Conference
(pp. 217–222). Menlo Park, CA: The AAAI Press.
Dufty, D. F., Graesser, A. C., Louwerse, M., & McNamara, D. S. (2006). Assigning grade
level to textbooks: Is it just readability? In R. Sun & N. Miyake (Eds.), Proceedings of
the 28th Annual Conference of the Cognitive Science Society (pp. 1251–1256). Austin,
TX: Cognitive Science Society.
Dufty, D. F., McNamara, D., Louwerse, M., Cai, Z., & Graesser, A. C. (2004). Automatic
evaluation of aspects of document quality. In S. Tilley & S. Huang (Eds.), Proceedings
of the 22nd Annual International Conference on Design of Communication: the
Engineering of Quality Documentation (pp. 14–16). New York: ACM Press.
Duncan, B., & Hall, C. (2009). A Coh-Metrix analysis of variation among biomedical
abstracts. In Florida Artificial Intelligence Research Society Conference (pp. 237–242).
Menlo Park, CA: The AAAI Press.
Duran, N., Bellissens, C., Taylor, R., & McNamara, D. S. (2007). Qualifying text difficulty
with automated indices of cohesion and semantics. In D. S. McNamara & G. Trafton
(Eds.), Proceedings of the 29th Annual Meeting of the Cognitive Science Society
(pp. 233–238). Austin, TX: Cognitive Science Society.
Duran, N. D., Hall, C., McCarthy, P. M., & McNamara, D. S. (2010). The linguistic
correlates of conversational deception: Comparing natural language processing tech-
nologies. Applied Psycholinguistics, 31, 439–462.
Duran, N. D., McCarthy, P. M., Graesser, A. C., & McNamara, D. S. (2006). Using Coh-
Metrix temporal indices to predict psychological measures of time. In R. Sun &
N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science
Society (pp. 190–195). Austin, TX: Cognitive Science Society.
Duran, N. D., McCarthy, P. M., Graesser, A. C., & McNamara, D. S. (2007). Using
temporal cohesion to predict temporal coherence in narrative and expository texts.
Behavior Research Methods, 39, 212–223.
Duran, N. D., & McNamara, D. S. (2006, July). It’s about time: Discriminating differences
in temporality between genres. Poster presented at the 16th Annual Meeting of the
Society for Text and Discourse, Minneapolis, MN.
Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database [CD-ROM].
Cambridge, MA: MIT Press.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.
Freedman, A., & Pringle, I. (1980). Writing in the college years: Some indices of growth.
College Composition and Communication, 31, 311–324.
Garnham, A., Oakhill, J., & Johnson-Laird, P. N. (1982). Referential continuity and the
coherence of discourse. Cognition, 11, 29–46.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ:
Erlbaum.
Gernsbacher, M. A., & Givón, T. (Eds.). (1995). Coherence in spontaneous text. Amsterdam:
Benjamins.
Gildea, D. (2001). Corpus variation and parser performance. In D. Yarowsky (Ed.),
Proceedings of the 2001 Conference on Empirical Methods in Natural Language
Processing (pp. 167–202). Pittsburgh, PA: NAACL.
Gilhooly, K. J., & Logie, R. H. (1980). Age of acquisition, imagery, concreteness, familiarity,
and ambiguity measures for 1944 words. Behavior Research Methods and
Instrumentation, 12, 395–427.
Givón, T. (1995). Functionalism and grammar. Philadelphia: John Benjamins.
Graesser, A. C. (1981). Prose comprehension beyond the word. New York: Springer-Verlag.
Graesser, A. C., Cai, Z., Louwerse, M., & Daniel, F. (2006). Question Understanding Aid
(QUAID): A web facility that helps survey methodologists improve the comprehen-
sibility of questions. Public Opinion Quarterly, 70, 3–22.
Graesser, A. C., Chipman, P., Haynes, B. C., & Olney, A. (2005). AutoTutor: An intelligent
tutoring system with mixed-initiative dialogue. IEEE Transactions on Education,
48, 612–618.
Graesser, A. C., Dowell, N., & Moldovan, C. (2011). A computer’s understanding of
literature. Scientific Studies of Literature, 1, 24–33.
Graesser, A. C., Gernsbacher, M. A., & Goldman, S. R. (Eds.). (2003). Handbook of
discourse processes. Mahwah, NJ: Lawrence Erlbaum.
Graesser, A. C., & Hemphill, D. (1991). Question answering in the context of scientific
mechanisms. Journal of Memory and Language, 30, 186–209.
Graesser, A. C., Hoffman, N. L., & Clark, L. F. (1980). Structural components of reading
time. Journal of Verbal Learning and Verbal Behavior, 19, 135–151.
Graesser, A. C., Jeon, M., Cai, Z., & McNamara, D. S. (2008). Automatic analyses of
language, discourse, and situation models. In J. Auracher & W. van Peer (Eds.), New
beginnings in literary studies (pp. 72–88). Cambridge: Cambridge Scholars Publishing.
Graesser, A. C., Jeon, M., & Dufty, D. (2008). Agent technologies designed to facilitate
interactive knowledge construction. Discourse Processes, 45, 298–322.
Graesser, A. C., Jeon, M., Yang, Y., & Cai, Z. (2007). Discourse cohesion in text and
tutorial dialogue. Information Design Journal, 15, 199–213.
Graesser, A. C., & McNamara, D. S. (2011). Computational analyses of multilevel dis-
course comprehension. Topics in Cognitive Science, 3, 371–398.
Graesser, A. C., McNamara, D. S., & Kulikowich, J. (2011). Coh-Metrix: Providing multi-
level analyses of text characteristics. Educational Researcher, 40, 223–234.
Graesser, A. C., McNamara, D. S., & Louwerse, M. M. (2003). What do readers need to
learn in order to process coherence relations in narrative and expository text? In
A. P. Sweet & C. E. Snow (Eds.), Rethinking reading comprehension (pp. 82–98). New
York: Guilford Press.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix:
Analysis of text on cohesion and language. Behavior Research Methods, Instruments,
and Computers, 36, 193–202.
Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997). Discourse comprehension. Annual
Review of Psychology, 48, 163–189.
Graesser, A. C., Olde, B., & Klettke, B. (2002). How does the mind construct and
represent stories? In M. C. Green, J. J. Strange, & T. C. Brock (Eds.), Narrative impact:
Social and cognitive foundations (pp. 231–263). Mahwah, NJ: Lawrence Erlbaum.
Graesser, A. C., & Ottati, V. (1996). Why stories? Some evidence, questions, and chal-
lenges. In R. S. Wyer (Ed.), Knowledge and memory: The real story (pp. 121–132).
Hillsdale, NJ: Erlbaum.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during
narrative text comprehension. Psychological Review, 101, 371–395.
Greenfield, G. (1999). Classic readability formulas in an EFL context: Are they valid for
Japanese speakers? Unpublished doctoral dissertation, Temple University, Philadelphia,
PA, United States (University Microfilms No. 99–38670).
Haberlandt, K., & Graesser, A. C. (1985). Component processes in text comprehension and
some of their interactions. Journal of Experimental Psychology: General, 114, 357–374.
Hall, C., McCarthy, P. M., Lewis, G. A., Lee, D. S., & McNamara, D. S. (2007). A Coh-
Metrix assessment of American and English/Welsh Legal English. Coyote Papers:
Psycholinguistic and Computational Perspectives. University of Arizona Working
Papers in Linguistics, 15, 40–54.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hancock, J. T., Curry, L. E., Goorha, S., & Woodworth, M. (2007). On lying and being
lied to: A linguistic analysis of deception in computer-mediated communication.
Discourse Processes, 45, 1–23.
Haviland, S. E., & Clark, H. H. (1974). What’s new? Acquiring new information as
a process in comprehension. Journal of Verbal Learning and Verbal Behavior, 13,
515–521.
Healy, S. L., Weintraub, J. D., McCarthy, P. M., Hall, C., & McNamara, D. S. (2009).
Assessment of LDAT as a grammatical diversity assessment tool. In C. H. Lane &
H. W. Guesgen (Eds.), Proceedings of the 22nd International Florida Artificial Intelligence
Research Society (FLAIRS) International Conference (pp. 249–253). Menlo Park, CA: The
AAAI Press.
Hempelmann, C. F., Dufty, D., McCarthy, P., Graesser, A. C., Cai, Z., & McNamara, D. S.
(2005). Using LSA to automatically identify givenness and newness of noun-phrases
in written discourse. In B. Bara (Ed.), Proceedings of the 27th Annual Meeting of the
Cognitive Science Society (pp. 941–946). Mahwah, NJ: Erlbaum.
Hempelmann, C. F., Rus, V., Graesser, A. C., & McNamara, D. S. (2006). Evaluating state-
of-the-art treebank-style parsers for Coh-Metrix and other learning technology envi-
ronments. Natural Language Engineering, 12, 131–144.
Herskovits, A. (1998). Schematization. In P. Olivier & K. P. Gapp (Eds.), Representation
and processing of spatial expressions (pp. 149–162). Mahwah, NJ: Lawrence Erlbaum
Associates.
Hu, X., Cai, Z., Louwerse, M. M., Olney, A. M., Penumatsa, P., & Graesser, A. C. (2003). A
revised algorithm for Latent Semantic Analysis. Proceedings of the 2003 International
Joint Conference on Artificial Intelligence (pp. 1489–1491). San Francisco: Morgan
Kaufmann.
Huot, B. (1996). Toward a new theory of writing assessment. College Composition and
Communication, 47, 549–566.
Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity.
Language Testing, 19, 57–84.
Jurafsky, D., & Martin, J. (2008). Speech and language processing. Englewood, NJ:
Prentice Hall.
Just, M. A., & Carpenter, P. A. (1971). Comprehension of negation with quantification.
Journal of Verbal Learning and Verbal Behavior, 12, 21–31.
Just, M. A., & Carpenter, P. A. (1987). The psychology of reading and language compre-
hension. Boston: Allyn & Bacon.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual
differences in working memory. Psychological Review, 99, 122–149.
Kallet, H. (2004). How to write the methods section of a research paper. Respiratory Care
Services, 49, 1229–1232.
Kalyuga, S. (2012). Cognitive load aspects of text processing. In C. Boonthum-Denecke,
P. McCarthy, & T. Lamkin (Eds.), Cross-disciplinary advances in applied natural
language processing: Issues and approaches (pp. 114–132). Hershey, PA: Information
Science Reference.
Kamil, M. L., Pearson, D., Moje, E. B., & Afflerbach, P. (Eds.). (2010). Handbook of
reading research (Vol. 4). New York: Routledge.
Keenan, J. M., Betjemann, R. S., & Olson, R. K. (2008). Reading comprehension tests vary
in the skills they assess: Differential dependence on decoding and oral comprehension.
Scientific Studies of Reading, 12, 281–300.
Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological
Review, 88, 197–227.
Kieras, D. E. (1978). Good and bad structure in simple paragraphs: Effects on apparent
theme, reading time, and recall. Journal of Verbal Learning and Verbal Behavior, 17,
13–28.
Kincaid, J., Fishburne, R., Rogers, R., & Chissom, B. (1975). Derivation of new readability
formulas for navy enlisted personnel. Branch Report 8–75. Millington, TN: Chief of
Naval Training.
King, M., & Rentel, V. (1979). Toward a theory of early writing development. Research in
the Teaching of English, 13, 243–255.
Kinnear, P. R., & Gray, C. D. (2008). SPSS 15 made simple. New York: Psychology Press.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A
construction-integration model. Psychological Review, 95, 163–182.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge: Cambridge
University Press.
Kintsch, W., & Keenan, J. (1973). Reading rate and retention as a function of the number
of propositions in the base structure of sentences. Cognitive Psychology, 5, 257–274.
Kintsch, W., Kozminsky, E., Streby, W. J., McKoon, G., & Keenan, J. M. (1975).
Comprehension and recall of text as a function of content variables. Journal of
Verbal Learning and Verbal Behavior, 14, 196–214.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and
production. Psychological Review, 85, 363–394.
Kireyev, K., & Landauer, T. (2011). Word maturity: Computational modeling of word
knowledge. In Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies (pp. 299–308). Portland,
OR: Association for Computational Linguistics.
Klare, G. R. (1974–1975). Assessing readability. Reading Research Quarterly, 10, 62–102.
Klein, W. (1994). Time in language. London: Routledge.
Koslin, B. I., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading. New
York: College Entrance Examination Board.
Lamkin, T. A., & McCarthy, P. M. (2012). The hierarchy of detective fiction. In
C. Murray & P. M. McCarthy (Eds.), Proceedings of the 24th International Florida
Artificial Intelligence Research Society Conference (pp. 257–262). Menlo Park, CA: The
AAAI Press.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent
semantic analysis theory of acquisition, induction, and representation of knowledge.
Psychological Review, 104, 211–240.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment.
Assessment in Education: Principles, Policy & Practice, 10, 295–308.
Landauer, T., McNamara, D. S., Dennis, S., & Kintsch, W. (Eds.). (2007). Handbook of
latent semantic analysis. Mahwah, NJ: Erlbaum.
Lappin, S., & Leass, H. J. (1994). An algorithm for pronominal coreference resolution.
Computational Linguistics, 20, 535–561.
Leahey, T. H., & Harris, R. J. (1997). Learning and cognition (4th ed.). Saddle River, NJ:
Prentice Hall.
Leech, N. L., Barrett, K. C., & Morgan, G. A. (2008). SPSS for intermediate statistics: Use
and interpretation. Mahwah, NJ: Lawrence Erlbaum Associates.
Lehnert, W. G., & Ringle, M. H. (Eds.). (1982). Strategies for natural language processing.
Hillsdale, NJ: Lawrence Erlbaum.
Lenat, D. B. (1995). CYC: A large-scale investment in knowledge infrastructure.
Communications of the ACM, 38, 32–38.
Lightman, E. J., McCarthy, P. M., Dufty, D. F., & McNamara, D. S. (2007a). The struc-
tural organization of high school educational texts. In D. Wilson & G. Sutcliffe (Eds.),
Proceedings of the 20th International Florida Artificial Intelligence Research Society
Conference (pp. 235–240). Menlo Park, California: The AAAI Press.
Lightman, E. J., McCarthy, P. M., Dufty, D. F., & McNamara, D. S. (2007b). Using
computational text analysis tools to compare the lyrics of suicidal and non-suicidal
songwriters. In D. S. McNamara & G. Trafton (Eds.), Proceedings of the 29th Annual
Conference of the Cognitive Science Society (pp. 1217–1222). Austin, TX: Cognitive
Science Society.
Linderholm, T., Everson, M. G., Van Den Broek, P., Mischinski, M., Crittenden, A., &
Samuels, J. (2000). Effects of causal text revisions on more- and less-skilled readers’
comprehension of easy and difficult texts. Cognition and Instruction, 18, 525–556.
Long, D. L., Oppy, B. J., & Seely, M. R. (1994). Individual differences in the time course of
inferential processing. Journal of Experimental Psychology, 20, 1245–1470.
Long, M., & Ross, S. (1993). Modifications that preserve language and content. In
M. L. Tickoo (Ed.), Simplification: Theory and application (pp. 29–52). Singapore:
SEAMEO Regional Language Center.
Longo, B. (1994). The role of metadiscourse in persuasion. Technical Communication, 41,
348–352.
Lorch, R. F., Jr., Lorch, E. P., & Morgan, A. M. (1987). Task effects and individual
differences in on-line processing of the topic structure of a text. Discourse
Processes, 24, 350–362.
Lorch, R. F., Jr., & O’Brien, E. J. (1995). Sources of coherence in reading. Hillsdale, NJ:
Erlbaum.
Louwerse, M. M. (2001). An analytic and cognitive parameterization of coherence
relations. Cognitive Linguistics, 12, 291–315.
Louwerse, M. M., McCarthy, P. M., McNamara, D. S., & Graesser, A. C. (2004). Variation
in language and cohesion across written and spoken registers. In K. D. Forbus,
D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the
Cognitive Science Society. Mahwah, NJ: Erlbaum.
Loxterman, J. A., Beck, I. L., & McKeown, M. G. (1994). The effects of thinking aloud
during reading on students’ comprehension of more or less coherent text. Reading
Research Quarterly, 29, 353–367.
Magliano, J. P., Millis, K. K., The RSAT Development Team, Levinstein, I., & Boonthum,
C. (2011). Assessing comprehension during reading with the Reading Strategy Assessment
Tool (RSAT). Metacognition and Learning, 6, 131–154.
Magliano, J. P., Wiemer-Hastings, K., Millis, K. K., Muñoz, B. D., & McNamara, D. S.
(2002). Using latent semantic analysis to assess reader strategies. Behavior Research
Methods, Instruments, & Computers, 34, 181–188.
Mann, W. C., & Thompson, S. (1988). Rhetorical Structure Theory: Toward a functional
theory of text organization. Text, 8, 243–281.
Marcu, D. (2000). The theory and practice of discourse parsing and summarization.
Cambridge, MA: MIT Press.
Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus
of English: The Penn Treebank. Computational Linguistics, 19, 313–330.
McCarthy, P. M., & Boonthum-Denecke, C. (Eds.). (2012). Applied natural language
processing and content analysis: Identification, investigation, and resolution. Hershey,
PA: IGI Global.
McCarthy, P. M., Briner, S. W., Rus, V., & McNamara, D. S. (2007). Textual signatures:
Identifying text-types using latent semantic analysis to measure the cohesion of text
structures. In A. Kao & S. Poteet (Eds.), Natural language processing and text mining
(pp. 107–122). London: Springer-Verlag UK.
McCarthy, P. M., Dufty, D., Hempelmann, C., Cai, Z., Graesser, A. C., & McNamara, D. S.
(2012). Newness and givenness of information: Automated identification in written
discourse. In P. M. McCarthy & C. Boonthum (Eds.), Applied natural language
processing and content analysis: Identification, investigation, and resolution (pp. 457–478).
Hershey, PA: IGI Global.
McCarthy, P. M., Hall, C., Duran, N. D., Doiuchi, M., Duncan, B., Fujiwara, Y., &
McNamara, D. S. (2009). A computational analysis of journal abstracts written by
Japanese, American, and British scientists. The ESPecialist, 30, 141–173.
McCarthy, P. M., Guess, R. H., & McNamara, D. S. (2009). The components of
paraphrase evaluations. Behavior Research Methods, 41, 682–690.
McCarthy, P. M., & Jarvis, S. (2007). A theoretical and empirical evaluation of vocd.
Language Testing, 24, 459–488.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of
sophisticated approaches to lexical diversity assessment. Behavior Research Methods,
42, 381–392.
McCarthy, P. M., & Jarvis, S. (2013). From intrinsic to extrinsic issues of lexical diversity
assessment: An ecological validation study. In S. Jarvis & M. Daller (Eds.), Vocabulary
knowledge: Human ratings and automated measures (pp. 45–78). Amsterdam: Benjamins.
McCarthy, P. M., Lehenbauer, B. M., Hall, C., Duran, N. D., Fujiwara, Y., & McNamara, D. S.
(2007). A Coh-Metrix analysis of discourse variation in the texts of Japanese, American,
and British scientists. Foreign Languages for Specific Purposes, 6, 46–77.
McCarthy, P. M., Lewis, G. A., Dufty, D. F., & McNamara, D. S. (2006). Analyzing
writing styles with Coh-Metrix. In Proceedings of the Florida Artificial Intelligence
Research Society International Conference (pp. 764–769). Menlo Park, CA: AAAI
Press.
McCarthy, P. M., & McNamara, D. S. (2007). Are seven words all we need? Recognizing
genre at the sub-sentential level. In D. S. McNamara & G. Trafton (Eds.),
Proceedings of the 29th Annual Conference of the Cognitive Science Society (pp. 1295–
1300). Austin, TX: Cognitive Science Society.
McCarthy, P. M., Myers, J. C., Briner, S. W., Graesser, A. C., & McNamara, D. S. (2009).
Are three words all we need? A psychological and computational study of sub-
sentential genre recognition. Journal for Language Technology and Computational
Linguistics, 24, 23–55.
McCarthy, P. M., Renner, A. M., Duncan, M. G., Duran, N. D., Lightman, E. J., &
McNamara, D. S. (2008). Identifying topic sentencehood. Behavior Research
Methods, 40, 647–664.
McCarthy, P. M., Rus, V., Crossley, S. A., Bigham, S. C., Graesser, A. C., & McNamara, D. S.
(2007). Assessing entailer with a corpus of natural language. In D. Wilson & G. Sutcliffe
(Eds.), Proceedings of the 20th International Florida Artificial Intelligence Research
Society Conference (pp. 247–252). Menlo Park, CA: The AAAI Press.
McCarthy, P. M., Rus, V., Crossley, S. A., Graesser, A. C., & McNamara, D. S. (2008).
Assessing forward-, reverse-, and average-entailment indices on natural language
input from the intelligent tutoring system, iSTART. In D. Wilson & G. Sutcliffe
(Eds.), Proceedings of the 21st International Florida Artificial Intelligence Research
Society (FLAIRS) Conference (pp. 165–170). Menlo Park, CA: The AAAI Press.
McCarthy, P. M., Watanabe, S., & Lamkin, T. A. (2012). The Gramulator: A tool to
identify differential linguistic features of correlative text types. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing and content
analysis: Identification, investigation, and resolution (pp. 312–333). Hershey, PA: IGI Global.
McCutchen, D. (1986). Domain knowledge and linguistic knowledge in the development
of writing ability. Journal of Memory and Language, 25, 431–444.
McCutchen, D., & Perfetti, C. A. (1982). Coherence and connectedness in the
development of discourse production. Text, 2, 113–139.
McNamara, D. S. (1997). Comprehension skill: A knowledge-based account. In
M. G. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference
of the Cognitive Science Society (pp. 508–513). Hillsdale, NJ: Erlbaum.
McNamara, D. S. (2001). Reading both high-coherence and low-coherence texts: Effects
of text sequence and prior knowledge. Canadian Journal of Experimental Psychology,
55, 51–62.
McNamara, D. S. (2004). SERT: Self-explanation reading training. Discourse Processes,
38, 1–30.
McNamara, D. S. (2011). Computational methods to extract meaning from text and
advance theories of human cognition. Topics in Cognitive Science, 2, 1–15.
McNamara, D. S., Boonthum, C., Levinstein, I. B., & Millis, K. (2007). Evaluating self-
explanations in iSTART: Comparing word-based and LSA algorithms. In T. Landauer,
D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis
(pp. 227–241). Mahwah, NJ: Erlbaum.
McNamara, D. S., Cai, Z., & Louwerse, M. M. (2007). Optimizing LSA measures of
cohesion. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.),
Handbook of latent semantic analysis (pp. 379–400). Mahwah, NJ: Erlbaum.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of
writing quality. Written Communication, 27, 57–86.
McNamara, D. S., Crossley, S. A., & Roscoe, R. D. (2013). Natural language processing
in an intelligent writing strategy tutoring system. Behavior Research Methods, 45,
499–515.
McNamara, D. S., & Dempsey, K. (2011). Reader expectations of question formats and
difficulty: Targeting the zone. In M. McCrudden, J. Magliano, & G. Schraw (Eds.),
Text relevance and learning from text (pp. 321–352). Charlotte, NC: Information Age
Publishing.
McNamara, D. S., & Graesser, A. C. (2012). Coh-Metrix: An automated tool for
theoretical and applied natural language processing. In P. M. McCarthy & C. Boonthum
(Eds.), Applied natural language processing and content analysis: Identification,
investigation, and resolution (pp. 188–205). Hershey, PA: IGI Global.
McNamara, D. S., Graesser, A. C., & Louwerse, M. M. (2012). Sources of text difficulty:
Across genres and grades. In J. P. Sabatini, E. Albro, & T. O’Reilly (Eds.), Measuring
up: Advances in how we assess reading ability (pp. 89–116). Plymouth, UK: Rowman &
Littlefield Education.
McNamara, D. S., & Kintsch, W. (1996). Learning from text: Effects of prior knowledge
and text coherence. Discourse Processes, 22, 247–287.
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always
better? Text coherence, background knowledge, and levels of understanding in
learning from text. Cognition and Instruction, 14, 1–43.
McNamara, D. S., Louwerse, M. M., McCarthy, P. M., & Graesser, A. C. (2010). Coh-
Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47, 292–330.
McNamara, D. S., & Magliano, J. P. (2009). Towards a comprehensive model of
comprehension. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 51,
pp. 297–384). New York: Elsevier Science.
McNamara, D. S., & McDaniel, M. (2004). Suppressing irrelevant information:
Knowledge activation or inhibition? Journal of Experimental Psychology: Learning,
Memory, and Cognition, 30, 465–482.
McNamara, D. S., Ozuru, Y., & Floyd, R. G. (2011). Comprehension challenges in the
fourth grade: The roles of text cohesion, text genre, and readers’ prior knowledge.
International Electronic Journal of Elementary Education, 4, 229–257.
McNamara, D. S., Ozuru, Y., Graesser, A. C., & Louwerse, M. (2006). Validating Coh-
Metrix. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the
Cognitive Science Society (pp. 573–578). Austin, TX: Cognitive Science Society.
McNamara, D. S., Raine, R., Roscoe, R., Crossley, S., Jackson, G. T., Dai, J., Cai, Z.,
Renner, A., Brandon, R., Weston, J., Dempsey, K., Lam, D., Sullivan, S., Kim, L.,
Rus, V., Floyd, R., McCarthy, P. M., & Graesser, A. C. (2012). The Writing-Pal: Natural
language algorithms to support intelligent tutoring on writing strategies. In
P. M. McCarthy & C. Boonthum (Eds.), Applied natural language processing and
content analysis: Identification, investigation, and resolution (pp. 298–311). Hershey,
PA: IGI Global.
Meadows, M., & Billington, L. (2005). A review of the literature on marking reliability.
London: National Assessment Agency.
Meichenbaum, D., & Biemiller, A. (1998). Nurturing independent learners: Helping
students take charge of their learning. Cambridge, MA: Brookline Books.
Meyer, B. J. F. (1975). The organization of prose and its effect on memory. New York:
Elsevier.
Meyer, B. J. F., & Wijekumar, K. (2007). Web-based tutoring of the structure strategy:
Theoretical background, design, and findings. In D. S. McNamara (Ed.), Reading
comprehension strategies: Theories, interventions, and technologies (pp. 347–375).
Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to
WordNet: An on-line lexical database. Journal of Lexicography, 3, 235–244.
Miller, J. R., & Kintsch, W. (1980). Readability and recall of short prose passages: A
theoretical analysis. Journal of Experimental Psychology: Human Learning and
Memory, 6, 335–354.
Millis, K., Graesser, A. C., & Haberlandt, K. (1993). The impact of connectives on
memory for expository texts. Applied Cognitive Psychology, 7, 317–340.
Millis, K., Magliano, J., Wiemer-Hastings, K., Todaro, S., & McNamara, D. S. (2007).
Assessing and improving comprehension with Latent Semantic Analysis. In
T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent
semantic analysis (pp. 207–225). Mahwah, NJ: Erlbaum.
Min, H. C., & McCarthy, P. M. (2010). Identifying varietals in the discourse of American
and Korean scientists: A contrastive corpus analysis using the gramulator. In
H. W. Guesgen & C. Murray (Eds.), Proceedings of the 23rd International Florida
Artificial Intelligence Research Society Conference (pp. 247–252). Menlo Park, CA: The
AAAI Press.
Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2012). Measures of text difficulty: Testing
their predictive value for grade levels and student performance. New York: Student
Achievement Partners.
Oakhill, J., & Cain, K. (2007). Issues of causality in children’s reading comprehension. In
K. Cain & J. Oakhill (Eds.), Cognitive bases of children’s language comprehension
difficulties. New York: Guilford.
Oakhill, J., Yuill, N., & Donaldson, M. L. (1990). Understanding of causal expressions in
skilled and less skilled text comprehenders. British Journal of Developmental Psychology,
8, 401–410.
Oakhill, J. V. (1984). Inferential and memory skills in children’s comprehension of
stories. British Journal of Educational Psychology, 54, 31–39.
Oakhill, J. V., & Yuill, N. M. (1996). Higher order factors in comprehension disability:
Processes and remediation. In C. Cornoldi & J. V. Oakhill (Eds.), Reading
comprehension difficulties: Processes and remediation (pp. 69–93). Mahwah, NJ: Lawrence
Erlbaum Associates.
O’Brien, E. J., Rizzella, M. L., Albrecht, J. E., & Halleran, J. G. (1998). Updating a situation
model: A memory-based text processing view. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 24, 1200–1210.
O’Reilly, T., Best, R., & McNamara, D. S. (2004). Self-explanation reading training: Effects
for low-knowledge readers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of
the 26th Annual Conference of the Cognitive Science Society (pp. 1053–1058). Mahwah,
NJ: Erlbaum.
O’Reilly, T., & McNamara, D. S. (2007). Reversing the reverse cohesion effect: Good
texts can be better for strategic, high-knowledge readers. Discourse Processes, 43,
121–152.
Ozuru, Y., Best, R., Bell, C., Witherspoon, A., & McNamara, D. S. (2007). Influence of
question format and text availability on assessment of expository text comprehension.
Cognition and Instruction, 25, 399–438.
Ozuru, Y., Briner, S., Best, R., & McNamara, D. S. (2010). Contributions of self-
explanation to comprehension of high and low cohesion texts. Discourse Processes,
47, 641–667.
Ozuru, Y., Dempsey, K., & McNamara, D. S. (2009). Prior knowledge, reading skill,
and text cohesion in the comprehension of science texts. Learning and Instruction, 19,
228–242.
Ozuru, Y., Rowe, M., O’Reilly, T., & McNamara, D. S. (2008). Where’s the difficulty in
standardized reading tests: The passage or the question? Behavior Research Methods,
40, 1001–1015.
Page, E. B., & Petersen, N. S. (1995). The computer moves into essay grading: Updating
the ancient test. Phi Delta Kappan, 76, 561–565.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery and
meaningfulness values for 925 words. Journal of Experimental Psychology Monograph Supplement,
76 (3, Part 2).
Palmer, M., Kingsbury, P., & Gildea, D. (2005). The Proposition Bank: An annotated
corpus of semantic roles. Computational Linguistics, 31, 71–106.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word
Count: LIWC 2007. Austin, TX: LIWC.net.
Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The
development and psychometric properties of LIWC2007. Austin, TX: LIWC.net.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count
(LIWC) (Version LIWC2001) [Computer software]. Mahwah, NJ: Erlbaum.
Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us.
London: Bloomsbury Press.
Pentimonti, J. M., Zucker, T. A., Justice, L. M., & Kaderavek, J. N. (2010). Informational
text use in preschool classroom read-alouds. The Reading Teacher, 63, 656–665.
Perfetti, C. A. (2007). Reading ability: Lexical quality to comprehension. Scientific Studies
of Reading, 11, 357–383.
Perfetti, C. A., Landi, N., & Oakhill, J. (2005). The acquisition of reading comprehension skill. In
M. J. Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 227–247).
Oxford: Blackwell.
Pickering, M., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue.
Behavioral and Brain Sciences, 27, 169–226.
Popken, R. (1991). A study of topic sentence use in technical writing. The Technical
Writing Teacher, 18, 49–58.
Prince, E. F. (1981). Toward a taxonomy of given-new information. In P. Cole (Ed.),
Radical pragmatics (pp. 223–255). New York: Academic Press.
Rapp, D. N., van den Broek, P., McMaster, K. L., Kendeou, P., & Espin, C. A. (2007).
Higher-order comprehension processes in struggling readers: A perspective for
research and intervention. Scientific Studies of Reading, 11, 289–312.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124, 372–422.
Rayner, K., Foorman, B., Perfetti, C., Pesetsky, D., & Seidenberg, M. (2001). How
psychological science informs the teaching of reading. Psychological Science in the
Public Interest, 2(2), 31–74.
Renner, A., McCarthy, P. M., Boonthum-Denecke, C., & McNamara, D. S. (2012).
Maximizing ANLP evaluation: Harmonizing flawed input. In P. M. McCarthy &
C. Boonthum-Denecke (Eds.), Applied natural language processing and content
analysis: Identification, investigation, and resolution (pp. 438–456). Hershey, PA: IGI
Global.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure
of categories. Cognitive Psychology, 7, 573–605.
Roscoe, R. D., Crossley, S. A., Weston, J. L., & McNamara, D. S. (2011). Automated
assessment of paragraph quality: Introductions, body, and conclusion paragraphs.
In R. C. Murray & P. M. McCarthy (Eds.), Proceedings of the 24th International
Florida Artificial Intelligence Research Society (FLAIRS) Conference (pp. 281–286).
Menlo Park, CA: AAAI Press.
Rowe, M., & McNamara, D. S. (2008). Inhibition needs no negativity: Negativity links in the
construction-integration model. In V. Sloutsky, B. Love, & K. McRae (Eds.), Proceedings
of the 30th Annual Conference of the Cognitive Science Society (pp. 1777–1782).
Washington, DC: Cognitive Science Society.
Rubin, D. C. (1995). Introduction. In D. C. Rubin (Ed.), Remembering our past: Studies in
autobiographical memory (pp. 1–15). New York: Cambridge University Press.
Rufenacht, R. M., McCarthy, P. M., & Lamkin, T. M. (2011). Fairy tales and ESL texts: An
analysis of linguistic features using the gramulator. In R. C. Murray & P. M. McCarthy
(Eds.), Proceedings of the 24th International Florida Artificial Intelligence Research
Society (FLAIRS) Conference (pp. 287–292). Menlo Park, CA: AAAI Press.
Rus, V. (2004). A first exercise for evaluating logic form identification systems. In
Proceedings Third International Workshop on the Evaluation of Systems for the
Semantic Analysis of Text (SENSEVAL-3), at the Association of Computational
Linguistics Annual Meeting, July. Barcelona, Spain: ACL.
Rus, V., McCarthy, P. M., McNamara, D. S., & Graesser, A. C. (2008). A study of textual
entailment. International Journal on Artificial Intelligence Tools, 17, 659–685.
Sanacore, J., & Palumbo, A. (2009). Understanding the fourth-grade slump: Our point of
view. Educational Forum, 73, 67–74.
Sanders, T. J. M. (1997). Semantic and pragmatic sources of coherence: On the
categorization of coherence relations in context. Discourse Processes, 24, 119–147.
Sanders, T. J. M., & Noordman, L. G. M. (2000). The role of coherence relations and their
linguistic markers in text processing. Discourse Processes, 29, 37–60.
Sanders, T. J. M., Spooren, W. P. M., & Noordman, L. G. M. (1992). Toward a taxonomy
of coherence relations. Discourse Processes, 15, 1–35.
Sanford, A. J., & Emmott, C. (2012). Mind, brain and narrative. Cambridge: Cambridge
University Press.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An
inquiry into human knowledge structures. Hillsdale, NJ: Erlbaum.
Sekine, S., & Grishman, R. (1995). A corpus-based probabilistic grammar with only
two nonterminals. In Fourth International Workshop on Parsing Technologies
(pp. 260–270). Prague/Karlovy Vary, Czech Republic.
Shanahan, T., Kamil, M. L., & Tobin, A. W. (1982). Cloze as a measure of intersentential
comprehension. Reading Research Quarterly, 17, 229–255.
Singer, M., & Leon, J. (2007). Psychological studies of higher language processes:
Behavioral and empirical approaches. In F. Schmalhofer & C. Perfetti (Eds.), Higher
level language processes in the brain: Inference and comprehension processes (pp. 9–25).
Mahwah, NJ: Lawrence Erlbaum.
Singer, M., & Ritchot, K. F. M. (1996). The role of working memory capacity and
knowledge access in text inference processing. Memory & Cognition, 24, 733–743.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard
University Press.
Snow, C. (2002). Reading for understanding: Toward an R&D program in reading
comprehension. Santa Monica, CA: RAND Corporation.
Spivey, M., McRae, K., & Joanisse, M. (Eds.). (2012). The Cambridge handbook of
psycholinguistics. Cambridge: Cambridge University Press.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of
individual differences in the acquisition of literacy. Reading Research Quarterly, 21(4),
360–406.
Stenner, A. J. (1996). Measuring reading comprehension with the Lexile framework.
Presented at the California Comparability Symposium, October, Durham, NC.
Retrieved January 30, 2006 from http://www.lexile.com/DesktopDefault.aspx?view=re.
Swales, J. (1981). Aspects of article introductions. Birmingham, UK: The University of
Aston, Language Studies Unit.
Swales, J. (1990). Genre analysis. Cambridge: Cambridge University Press.
Sweet, A. P., & Snow, C. E. (Eds.). (2003). Rethinking reading comprehension. New York:
Guilford.
Tannen, D. (1982). Oral and literate strategies in spoken and written narratives.
Language, 58, 1–21.
Templin, M. (1957). Certain language skills in children: Their development and
interrelationships. Minneapolis: The University of Minnesota Press.
ter Meulen, A. G. B. (1995). Representing time in natural language: The dynamic
interpretation of tense and aspect. Cambridge, MA: The MIT Press.
Toglia, M. P., & Battig, W. R. (1978). Handbook of semantic word norms. New York:
Lawrence Erlbaum.
Tonjes, M. J., Ray, W., & Zintz, M. V. (1999). Integrated content literacy. New York: The
McGraw-Hill Publishers.
Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of
narrative events. Journal of Memory and Language, 24, 612–630.
Tweissi, A. I. (1998). The effects of the amount and the type of simplification on foreign
language reading comprehension. Reading in a Foreign Language, 11, 191–206.
U.S. Air Force Reserve Officers’ Training Corps. (1985). U.S. air power: Key to deterrence.
Montgomery, AL: U.S. Air Force.
Van den Broek, P., Rapp, D. N., & Kendeou, P. (2005). Integrating memory-based and
constructionist approaches in accounts of reading comprehension. Discourse Processes,
39, 299–316.
Van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York:
Academic Press.
Vande Kopple, W. J. (1985). Some exploratory discourse on metadiscourse. College
Composition and Communication, 36, 82–93.
VanLehn, K., Graesser, A. C., Jackson, G. T., Jordan, P., Olney, A., & Rose, C. P. (2007).
When are tutorial dialogues more effective than reading? Cognitive Science, 31, 3–62.
Weston, J., Crossley, S. A., & McNamara, D. S. (2010). Towards a computational
assessment of freewriting quality. In H. W. Guesgen & C. Murray (Eds.), Proceedings of the
23rd International Florida Artificial Intelligence Research Society (FLAIRS) Conference
(pp. 283–288). Menlo Park, CA: The AAAI Press.
Whitney, P., Ritchie, B. G., & Clark, M. B. (1991). Working-memory capacity and the use
of elaborative inferences in text comprehension. Discourse Processes, 14, 133–145.
Williams, P. J. (2007). Literacy in the curriculum: Integrating text structure and content
area instruction. In D. S. McNamara (Ed.), Reading comprehension strategies:
Theories, interventions, and technologies (pp. 199–219). Mahwah, NJ: Erlbaum.
Winograd, T. (1983). Language as a cognitive process: Syntax. Reading, MA: Addison-Wesley.
Wittgenstein, L. (1953). Philosophical investigations. London: Blackwell.
Yano, Y., Long, M., & Ross, S. (1994). Effects of simplified and elaborated texts on foreign
language reading comprehension. Language Learning, 44(2), 189–219.
Yuill, N., & Oakhill, J. (1988). Understanding of anaphoric relations in skilled and less
skilled comprehenders. British Journal of Psychology, 79, 173–186.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Reading, MA:
Addison-Wesley.
Zwaan, R. A. (1994). Effect of genre expectations on text comprehension. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 20, 920–933.
Zwaan, R. A., Magliano, J. P., & Graesser, A. C. (1995). Dimensions of situation model
construction in narrative comprehension. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 21, 386–397.
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension
and memory. Psychological Bulletin, 123, 162–185.

Appendix A: Coh-Metrix 3.0 Indices

This appendix provides the list of indices in Coh-Metrix Version 3.0. The first column
provides the label that appears in the output in the current version. The second column
provides the label used in prior versions of Coh-Metrix. The third column provides a
short description of the index.

     Label in      Label in
     Version 3.x   Version 2.x   Description

Descriptive
1    DESPC         READNP        Paragraph count, number of paragraphs
2    DESSC         READNS        Sentence count, number of sentences
3    DESWC         READNW        Word count, number of words
4    DESPL         READAPL       Paragraph length, number of sentences, mean
5    DESPLd        n/a           Paragraph length, number of sentences, standard deviation
6    DESSL         READASL       Sentence length, number of words, mean
7    DESSLd        n/a           Sentence length, number of words, standard deviation
8    DESWLsy       READASW       Word length, number of syllables, mean
9    DESWLsyd      n/a           Word length, number of syllables, standard deviation
10   DESWLlt       n/a           Word length, number of letters, mean
11   DESWLltd      n/a           Word length, number of letters, standard deviation

Text Easability Principal Component Scores
12   PCNARz        n/a           Text Easability PC Narrativity, z score
13   PCNARp        n/a           Text Easability PC Narrativity, percentile
14   PCSYNz        n/a           Text Easability PC Syntactic simplicity, z score
15   PCSYNp        n/a           Text Easability PC Syntactic simplicity, percentile
16   PCCNCz        n/a           Text Easability PC Word concreteness, z score
17   PCCNCp        n/a           Text Easability PC Word concreteness, percentile
18   PCREFz        n/a           Text Easability PC Referential cohesion, z score
19   PCREFp        n/a           Text Easability PC Referential cohesion, percentile
20   PCDCz         n/a           Text Easability PC Deep cohesion, z score
21   PCDCp         n/a           Text Easability PC Deep cohesion, percentile
22   PCVERBz       n/a           Text Easability PC Verb cohesion, z score
23   PCVERBp       n/a           Text Easability PC Verb cohesion, percentile
24   PCCONNz       n/a           Text Easability PC Connectivity, z score
25   PCCONNp       n/a           Text Easability PC Connectivity, percentile
26   PCTEMPz       n/a           Text Easability PC Temporality, z score
27   PCTEMPp       n/a           Text Easability PC Temporality, percentile

Referential Cohesion
28   CRFNO1        CRFBN1um      Noun overlap, adjacent sentences, binary, mean
29   CRFAO1        CRFBA1um      Argument overlap, adjacent sentences, binary, mean
30   CRFSO1        CRFBS1um      Stem overlap, adjacent sentences, binary, mean
31   CRFNOa        CRFBNaum      Noun overlap, all sentences, binary, mean
32   CRFAOa        CRFBAaum      Argument overlap, all sentences, binary, mean
33   CRFSOa        CRFBSaum      Stem overlap, all sentences, binary, mean
34   CRFCWO1       CRFPC1um      Content word overlap, adjacent sentences, proportional, mean
35   CRFCWO1d      n/a           Content word overlap, adjacent sentences, proportional, standard deviation
36   CRFCWOa       CRFPCaum      Content word overlap, all sentences, proportional, mean
37   CRFCWOad      n/a           Content word overlap, all sentences, proportional, standard deviation

LSA
38   LSASS1        LSAassa       LSA overlap, adjacent sentences, mean
39   LSASS1d       LSAassd       LSA overlap, adjacent sentences, standard deviation
40   LSASSp        LSApssa       LSA overlap, all sentences in paragraph, mean
41   LSASSpd       LSApssd       LSA overlap, all sentences in paragraph, standard deviation
42   LSAPP1        LSAppa        LSA overlap, adjacent paragraphs, mean
43   LSAPP1d       LSAppd        LSA overlap, adjacent paragraphs, standard deviation
44   LSAGN         LSAGN         LSA given/new, sentences, mean
45   LSAGNd        n/a           LSA given/new, sentences, standard deviation

Lexical Diversity
46   LDTTRc        TYPTOKc       Lexical diversity, type-token ratio, content word lemmas
47   LDTTRa        n/a           Lexical diversity, type-token ratio, all words
48   LDMTLDa       LEXDIVTD      Lexical diversity, MTLD, all words
49   LDVOCDa       LEXDIVVD      Lexical diversity, VOCD, all words

Connectives
50   CNCAll        CONi          All connectives incidence
51   CNCCaus       CONCAUSi      Causal connectives incidence
52   CNCLogic      CONLOGi       Logical connectives incidence
53   CNCADC        CONADVCONi    Adversative and contrastive connectives incidence
54   CNCTemp       CONTEMPi      Temporal connectives incidence
55   CNCTempx      CONTEMPEXi    Expanded temporal connectives incidence
56   CNCAdd        CONADDi       Additive connectives incidence
57   CNCPos        n/a           Positive connectives incidence
58   CNCNeg        n/a           Negative connectives incidence

Situation Model
59   SMCAUSv       CAUSV         Causal verb incidence
60   SMCAUSvp      CAUSVP        Causal verbs and causal particles incidence
61   SMINTEp       INTEi         Intentional verbs incidence
62   SMCAUSr       CAUSC         Ratio of causal particles to causal verbs
63   SMINTEr       INTEC         Ratio of intentional particles to intentional verbs
64   SMCAUSlsa     CAUSLSA       LSA verb overlap
65   SMCAUSwn      CAUSWN        WordNet verb overlap
66   SMTEMP        TEMPta        Temporal cohesion, tense and aspect repetition, mean

Syntactic Complexity
67   SYNLE         SYNLE         Left embeddedness, words before main verb, mean
68   SYNNP         SYNNP         Number of modifiers per noun phrase, mean
69   SYNMEDpos     MEDwtm        Minimal Edit Distance, part of speech
70   SYNMEDwrd     MEDawm        Minimal Edit Distance, all words
71   SYNMEDlem     MEDalm        Minimal Edit Distance, lemmas
72   SYNSTRUTa     STRUTa        Sentence syntax similarity, adjacent sentences, mean
73   SYNSTRUTt     STRUTt        Sentence syntax similarity, all combinations, across paragraphs, mean

Syntactic Pattern Density
74   DRNP          n/a           Noun phrase density, incidence
75   DRVP          n/a           Verb phrase density, incidence
76   DRAP          n/a           Adverbial phrase density, incidence
77   DRPP          n/a           Preposition phrase density, incidence
78   DRPVAL        AGLSPSVi      Agentless passive voice density, incidence
79   DRNEG         DENNEGi       Negation density, incidence
80   DRGERUND      GERUNDi       Gerund density, incidence
81   DRINF         INFi          Infinitive density, incidence

Word Information
82   WRDNOUN       NOUNi         Noun incidence
83   WRDVERB       VERBi         Verb incidence
84   WRDADJ        ADJi          Adjective incidence
85   WRDADV        ADVi          Adverb incidence
86   WRDPRO        DENPRPi       Pronoun incidence
87   WRDPRP1s      n/a           First-person singular pronoun incidence
88   WRDPRP1p      n/a           First-person plural pronoun incidence
89   WRDPRP2       PRO2i         Second-person pronoun incidence
90   WRDPRP3s      n/a           Third-person singular pronoun incidence
91   WRDPRP3p      n/a           Third-person plural pronoun incidence
92   WRDFRQc       FRCLacwm      CELEX word frequency for content words, mean
93   WRDFRQa       FRCLaewm      CELEX Log frequency for all words, mean
94   WRDFRQmc      FRCLmcsm      CELEX Log minimum frequency for content words, mean
95   WRDAOAc       WRDAacwm      Age of acquisition for content words, mean
96   WRDFAMc       WRDFacwm      Familiarity for content words, mean
97   WRDCNCc       WRDCacwm      Concreteness for content words, mean
98   WRDIMGc       WRDIacwm      Imagability for content words, mean
99   WRDMEAc       WRDMacwm      Meaningfulness, Colorado norms, content words, mean
100  WRDPOLc       POLm          Polysemy for content words, mean
101  WRDHYPn       HYNOUNaw      Hypernymy for nouns, mean
102  WRDHYPv       HYVERBaw      Hypernymy for verbs, mean
103  WRDHYPnv      HYPm          Hypernymy for nouns and verbs, mean

Readability
104  RDFRE         READFRE       Flesch Reading Ease
105  RDFKGL        READFKGL      Flesch-Kincaid Grade Level
106  RDL2          L2            Coh-Metrix L2 Readability
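
Researchers who hold output from an earlier release may wish to relabel it so that it matches the Version 3.x labels listed above. The following Python sketch illustrates one way this might be done for a handful of the label pairs from the table. The names V2_TO_V3 and relabel, and the sample values, are hypothetical; they are not part of Coh-Metrix.

```python
# Illustrative subset of the Version 2.x -> Version 3.x label pairs from the table.
# The dictionary name and the helper below are hypothetical, not part of Coh-Metrix.
V2_TO_V3 = {
    "READNP": "DESPC",
    "READNS": "DESSC",
    "READNW": "DESWC",
    "TYPTOKc": "LDTTRc",
    "CONCAUSi": "CNCCaus",
    "READFRE": "RDFRE",
    "READFKGL": "RDFKGL",
}


def relabel(row):
    """Return a copy of `row` (label -> value) using Version 3.x labels.

    Labels that are not in the mapping (for example LSAGN, which is the
    same in both versions) are kept unchanged.
    """
    return {V2_TO_V3.get(label, label): value for label, value in row.items()}


# Made-up values, used only to show the relabeling.
old_row = {"READNP": 4, "READNS": 21, "READFRE": 62.3, "LSAGN": 0.31}
new_row = relabel(old_row)
```

Indices marked n/a in the Version 2.x column have no earlier counterpart, so a full mapping would simply omit them.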

Appendix B: Coh-Metrix Indices Norms

This appendix provides norms for the indices described in Chapters 4 and 5. To create
these norms, we analyzed a subset of a large corpus of texts created by Touchstone
Applied Science Associates (TASA), Inc. The total TASA corpus includes 9 genres
consisting of 119,627 paragraphs taken from 37,651 samples. The norms are provided for
the three largest genres represented in TASA: language arts, social studies, and science.
To build them, we randomly chose 100 passages for each combination of the 3 genres and 13
grade levels, for a total of 3,900 passages.
Grade level in the TASA corpus is indexed by the Degrees of Reading Power (DRP;
Koslin et al., 1987), which is a readability measure that includes word- and sentence-level
characteristics. As can be observed in the Readability rows of the norms tables below, DRP
is highly correlated with the Flesch Reading Ease and Flesch-Kincaid Grade Level
measures of readability.
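The two Flesch measures named here are linear functions of average sentence length and average syllables per word. As a rough illustration, the sketch below applies the standard published formulas to the Language Arts K-1 means for DESSL and DESWLsy from the norms tables. The function names are ours, and the results should only approximate the tabled RDFRE and RDFKGL means, because Coh-Metrix's own sentence and syllable counting may differ from this back-of-the-envelope calculation.

```python
def flesch_reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Standard Flesch Reading Ease formula (higher scores mean easier text)."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence: float, syllables_per_word: float) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate U.S. grade level)."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Language Arts, K-1 band: mean DESSL and mean DESWLsy copied from the norms tables.
dessl, deswlsy = 8.601, 1.205

fre = flesch_reading_ease(dessl, deswlsy)
fkgl = flesch_kincaid_grade(dessl, deswlsy)
print(f"Flesch Reading Ease  ~ {fre:.1f} (tabled RDFRE mean: 95.495)")
print(f"Flesch-Kincaid Grade ~ {fkgl:.1f} (tabled RDFKGL mean: 1.941)")
```

Both computed values land within about one point of the tabled means, which is the level of agreement one would expect from plugging band-level averages into the formulas.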
To simplify the data analysis and presentation, DRP levels were translated to their
corresponding grade-level estimates and then collapsed according to the grade bands
used within the Common Core State Standards: grades K to 1, 2 to 3, 4 to 5, 6 to 8, 9 to
10, and 11 and higher. Each grade level within each genre was represented by 100
passages. Because the Common Core grade bands include different numbers of grade
levels per band (e.g., 2–3 includes two grades, 6–8 includes three grades), there are
different numbers of passages represented in each grade band. The average DRP
value and the range of DRP values for each grade band are provided in
Table B.1.
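The passage counts per grade band follow directly from the sampling design. A minimal sketch of the arithmetic, assuming (as the N column of Table B.1 implies, though the text does not state it) that the six bands span 1, 2, 2, 3, 2, and 3 of the 13 grade levels, respectively:

```python
PASSAGES_PER_GENRE_AND_GRADE = 100
GENRES = ("language arts", "social studies", "science")

# Number of grade levels per band. These counts are inferred from the N column
# of Table B.1; they are an assumption, not something the text states directly.
GRADES_PER_BAND = {"K-1": 1, "2-3": 2, "4-5": 2, "6-8": 3, "9-10": 2, "11-CCR": 3}

passages_per_band = {
    band: n_grades * PASSAGES_PER_GENRE_AND_GRADE * len(GENRES)
    for band, n_grades in GRADES_PER_BAND.items()
}

for band, n in passages_per_band.items():
    print(f"{band:>6}: {n} passages ({n // len(GENRES)} per genre)")
print("total:", sum(passages_per_band.values()))
```

Under this assumption each genre table summarizes 100 to 300 passages per band, so the means for the narrower bands rest on fewer texts than those for the wider ones.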
The majority of the values provided in the norms below can be used as
comparisons to other corpora. However, some of the indices are provided solely to
describe the corpus. These descriptive indices are not intended to
indicate normative values that generalize to other text corpora. For example,
the passages in TASA all consist of one paragraph because paragraph breaks are not
marked in the TASA corpus. Hence, the paragraph count (i.e., DESPC) in the
norms tables is 1. The standard deviation of the paragraph length (i.e., DESPLd) is 0
because this measure captures the variation in paragraph length, in number of
sentences, across paragraphs (and there is only one paragraph in each text). The
average number of words and sentences (i.e., DESWC, DESSC) describes the corpus
but does not provide a normative value, because the length of the texts was kept
relatively constant within the TASA corpus. However, the remaining indices provide
a normative value that can be used to compare other texts in the corresponding
genre.
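One way to use these norms is to express a new text's index value as a standardized distance from the mean of the matching genre and grade band. The sketch below is an illustration, not a procedure prescribed by the authors: the `NORMS` dictionary holds a few (mean, SD) pairs copied from the Language Arts tables, while the function name and the sample value of 27.5 are hypothetical.

```python
# (mean, SD) pairs copied from the Language Arts norms; keys are (index, grade band).
NORMS = {
    ("DESSL", "6-8"): (19.937, 6.676),
    ("RDFKGL", "6-8"): (8.381, 2.233),
    ("LDMTLDa", "6-8"): (88.354, 25.309),
}


def z_against_norm(index: str, band: str, value: float) -> float:
    """Return how many norm SDs `value` lies above (+) or below (-) the norm mean."""
    mean, sd = NORMS[(index, band)]
    if sd == 0:
        # Corpus-descriptive indices such as DESPC or DESPLd have SD = 0
        # in these norms and are not meant to be used normatively.
        raise ValueError(f"{index} has zero SD in the norms; it is descriptive only")
    return (value - mean) / sd


# Hypothetical text whose mean sentence length is 27.5 words.
z = z_against_norm("DESSL", "6-8", 27.5)
print(f"DESSL z = {z:+.2f} relative to Language Arts, grades 6-8")
```

A result a little above +1 would indicate sentences noticeably longer than is typical for that genre and band, though still within the spread seen in the corpus.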

Language Arts

K-1 2–3 4–5 6–8 9–10 11-CCR


Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD
Descriptive
DESPC 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
DESSC 34.640 6.792 26.820 5.573 20.935 4.940 15.923 4.509 13.875 3.871 13.203 3.670
DESWC 284.760 23.162 290.700 22.346 283.850 23.070 289.330 25.575 289.760 24.249 295.907 26.446
DESPL 34.640 6.792 26.820 5.573 20.935 4.940 15.923 4.509 13.875 3.871 13.203 3.670
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 8.601 1.600 11.375 2.368 14.522 4.421 19.937 6.676 23.002 8.395 24.764 9.406
DESSLd 4.785 1.443 6.516 2.075 8.584 6.329 11.405 5.380 13.674 12.062 13.143 8.233
DESWLsy 1.205 0.061 1.270 0.055 1.320 0.068 1.378 0.068 1.435 0.063 1.546 0.092
DESWLsyd 0.470 0.095 0.555 0.080 0.619 0.101 0.685 0.079 0.756 0.079 0.871 0.103
DESWLlt 3.789 0.201 3.994 0.163 4.159 0.191 4.337 0.188 4.484 0.167 4.763 0.223
DESWLltd 1.730 0.220 1.929 0.185 2.075 0.214 2.242 0.183 2.377 0.173 2.615 0.209
Text Easability Principal Component Scores
PCNARz 1.368 0.574 1.164 0.618 0.745 0.773 0.446 0.714 0.250 0.632 −0.232 0.677
PCNARp 88.175 10.284 83.843 13.577 72.196 21.756 64.119 22.022 58.457 21.305 41.649 21.476
PCSYNz 1.625 0.670 0.891 0.634 0.297 0.755 −0.416 0.882 −0.720 0.848 −0.701 0.946
PCSYNp 91.153 9.522 77.387 16.676 59.784 23.071 38.152 24.265 29.343 21.547 31.250 22.614
PCCNCz 0.205 0.939 0.560 0.863 0.830 1.071 0.883 0.958 0.752 0.944 0.391 1.079
PCCNCp 55.749 27.500 66.449 24.680 71.996 26.876 74.252 24.359 70.562 25.400 59.456 29.013
PCREFz 0.044 0.959 −0.254 0.822 −0.390 0.816 −0.337 0.851 −0.378 0.793 −0.338 0.882
PCREFp 48.809 26.453 41.112 24.837 37.331 24.426 38.894 25.089 37.872 25.042 38.669 26.079
PCDCz −0.007 0.922 0.075 0.762 0.073 0.968 0.171 0.914 0.254 0.969 0.286 1.012
PCDCp 47.978 24.830 51.923 24.310 50.981 27.508 54.417 27.033 56.209 27.069 57.590 27.945
PCVERBz −0.024 0.854 −0.374 0.870 −0.089 0.938 −0.222 0.971 −0.294 0.965 −0.631 0.901
PCVERBp 49.733 26.128 38.730 25.442 46.428 27.910 43.596 28.543 42.111 27.689 31.619 24.405
PCCONNz −1.458 1.303 −2.083 1.279 −2.239 1.268 −2.455 1.262 −2.503 1.333 −2.399 1.230
PCCONNp 18.803 21.303 9.055 14.398 7.915 14.561 5.698 10.984 5.530 11.092 6.157 12.481
PCTEMPz 0.066 0.654 0.011 0.800 −0.034 0.989 −0.073 1.064 0.030 1.189 −0.032 1.118
PCTEMPp 52.650 21.834 51.020 25.011 50.525 28.250 49.177 29.389 52.743 31.570 50.784 29.938
Referential Cohesion
CRFNO1 0.149 0.134 0.162 0.133 0.182 0.151 0.225 0.172 0.246 0.165 0.303 0.201
CRFAO1 0.349 0.157 0.413 0.171 0.454 0.184 0.524 0.199 0.537 0.210 0.552 0.223
CRFSO1 0.168 0.143 0.191 0.143 0.222 0.170 0.289 0.198 0.328 0.198 0.414 0.230
CRFNOa 0.127 0.090 0.131 0.089 0.143 0.099 0.180 0.126 0.199 0.122 0.243 0.147
CRFAOa 0.275 0.116 0.339 0.142 0.362 0.149 0.427 0.180 0.443 0.183 0.456 0.204
CRFSOa 0.148 0.103 0.156 0.099 0.175 0.116 0.232 0.146 0.269 0.153 0.344 0.176
CRFCWO1 0.108 0.054 0.101 0.047 0.094 0.043 0.095 0.047 0.090 0.040 0.087 0.047
CRFCWO1d 0.143 0.039 0.125 0.036 0.113 0.034 0.099 0.037 0.089 0.032 0.084 0.035
CRFCWOa 0.083 0.035 0.077 0.032 0.071 0.030 0.072 0.033 0.068 0.029 0.067 0.037
CRFCWOad 0.133 0.028 0.112 0.019 0.100 0.021 0.089 0.024 0.080 0.019 0.076 0.023
LSA
LSASS1 0.220 0.091 0.232 0.083 0.250 0.092 0.302 0.099 0.334 0.117 0.379 0.100
LSASS1d 0.192 0.045 0.184 0.040 0.171 0.047 0.170 0.047 0.167 0.049 0.167 0.048
LSASSp 0.179 0.079 0.190 0.070 0.207 0.085 0.262 0.094 0.305 0.119 0.345 0.103
LSASSpd 0.188 0.036 0.176 0.034 0.164 0.037 0.164 0.036 0.164 0.037 0.163 0.038
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.380 0.060 0.352 0.042 0.343 0.053 0.348 0.050 0.358 0.056 0.374 0.049
LSAGNd 0.154 0.025 0.141 0.024 0.139 0.026 0.144 0.027 0.153 0.029 0.158 0.028

Lexical Diversity
LDTTRc 0.623 0.117 0.731 0.073 0.773 0.076 0.813 0.069 0.828 0.064 0.822 0.075
LDTTRa 0.460 0.072 0.521 0.049 0.548 0.049 0.568 0.043 0.582 0.044 0.581 0.048
LDMTLDa 60.090 19.372 76.500 22.232 82.511 24.296 88.354 25.309 95.296 25.515 94.914 27.858
LDVOCDa 73.046 20.551 87.097 20.668 90.344 21.384 91.741 20.729 94.064 19.323 93.553 20.263
Connectives
CNCAll 71.718 20.376 81.029 21.149 85.096 19.794 90.798 20.343 91.531 21.506 92.230 19.732
CNCCaus 19.564 11.450 19.730 9.578 19.886 10.761 21.003 9.589 22.830 10.172 24.596 11.061
CNCLogic 30.224 13.516 31.674 11.816 31.685 13.714 32.959 13.104 34.657 12.604 35.772 14.091
CNCADC 9.961 7.049 13.531 8.346 14.391 8.677 15.676 9.045 17.494 9.472 17.710 9.147
CNCTemp 19.152 11.858 20.625 10.014 20.647 11.790 21.766 9.687 20.100 9.705 19.467 9.656
CNCTempx 15.043 10.243 16.112 10.605 17.994 10.557 17.122 10.341 17.245 10.234 16.028 9.761
CNCAdd 37.158 15.511 43.945 14.980 45.453 15.327 49.345 14.983 50.120 15.974 49.906 14.787
CNCPos 66.102 19.949 72.767 19.937 74.704 19.291 78.699 19.547 78.614 19.900 78.575 19.267
CNCNeg 7.765 6.385 9.706 6.672 10.671 7.046 11.711 7.627 12.847 8.108 13.625 8.233
Situation Model
SMCAUSv 52.750 18.394 44.199 13.131 36.328 12.953 27.130 11.755 22.740 10.847 23.172 9.161
SMCAUSvp 61.127 19.923 53.469 13.750 44.633 13.791 36.104 12.926 32.486 12.063 32.783 10.589
SMINTEp 56.429 18.098 41.033 13.971 30.114 12.533 21.366 10.013 17.901 9.464 16.464 8.398
SMCAUSr 0.167 0.156 0.218 0.181 0.248 0.248 0.376 0.553 0.473 0.493 0.452 0.502
SMINTEr 0.336 0.249 0.433 0.297 0.639 0.537 0.919 0.771 1.138 0.884 1.249 1.057
SMCAUSlsa 0.082 0.024 0.071 0.023 0.077 0.034 0.080 0.032 0.083 0.036 0.087 0.037
SMCAUSwn 0.602 0.088 0.566 0.090 0.577 0.095 0.572 0.093 0.569 0.084 0.537 0.093
SMTEMP 0.851 0.061 0.841 0.077 0.833 0.097 0.821 0.106 0.825 0.115 0.820 0.111
Syntactic Complexity
SYNLE 2.163 0.707 2.593 0.773 3.229 1.242 4.078 1.700 4.644 2.335 5.512 2.430
SYNNP 0.565 0.144 0.623 0.137 0.730 0.166 0.821 0.149 0.877 0.160 0.936 0.164
SYNMEDpos 0.703 0.057 0.698 0.048 0.680 0.047 0.668 0.047 0.665 0.050 0.643 0.048
SYNMEDwrd 0.906 0.047 0.913 0.035 0.906 0.032 0.902 0.029 0.900 0.026 0.891 0.028
SYNMEDlem 0.882 0.052 0.889 0.041 0.885 0.035 0.882 0.032 0.882 0.028 0.873 0.031
SYNSTRUTa 0.172 0.059 0.143 0.036 0.121 0.037 0.097 0.035 0.086 0.031 0.087 0.032
SYNSTRUTt 0.159 0.045 0.134 0.032 0.114 0.029 0.089 0.027 0.083 0.024 0.081 0.024
Syntactic Pattern Density
DRNP 353.241 25.341 352.136 25.748 352.915 29.344 355.756 31.572 363.273 31.000 366.610 32.600
DRVP 264.580 29.825 252.577 31.921 229.998 35.829 214.462 35.386 199.327 32.115 191.868 38.489
DRAP 40.308 16.165 42.571 15.109 37.937 14.678 36.662 13.863 35.631 13.605 31.178 12.754
DRPP 74.397 19.640 85.912 18.751 100.214 21.102 109.790 22.740 115.670 20.955 123.168 21.929
DRPVAL 0.874 1.638 1.862 2.442 2.563 3.498 3.242 3.092 2.969 2.607 4.479 3.438
DRNEG 18.421 10.519 14.917 9.221 12.333 8.728 9.475 7.265 9.343 7.239 8.178 6.264
DRGERUND 7.297 4.945 9.008 5.595 8.642 4.995 9.082 5.421 8.838 5.130 9.022 5.110
DRINF 8.392 4.934 8.808 4.742 8.215 4.445 7.641 5.047 7.143 4.410 7.679 5.010
Word Information
WRDNOUN 210.219 36.902 214.872 36.543 226.645 43.466 230.869 37.516 240.713 35.191 256.079 39.605
WRDVERB 172.875 24.359 161.881 24.171 150.317 22.721 140.766 20.991 134.166 21.520 124.386 21.432
WRDADJ 53.907 19.192 57.607 17.886 66.806 21.064 76.646 20.967 83.810 23.646 91.914 21.640
WRDADV 69.431 23.138 68.531 22.846 62.670 21.274 59.978 19.873 58.900 19.306 54.634 18.949
WRDPRO 131.679 34.332 126.184 31.796 105.848 35.768 91.207 33.823 83.173 29.407 64.285 29.125
WRDPRP1s 35.083 28.790 29.780 28.337 18.946 25.020 15.573 22.917 10.791 18.383 5.478 13.913
WRDPRP1p 8.493 12.401 8.126 14.481 4.954 8.650 4.640 10.894 4.526 9.046 4.873 11.016
WRDPRP2 19.669 15.595 15.295 18.160 10.413 13.583 8.519 17.033 7.034 14.519 7.185 16.385
WRDPRP3s 43.289 30.308 47.940 30.440 45.865 32.239 38.140 30.650 37.031 29.535 23.508 25.622
WRDPRP3p 9.403 10.644 10.525 11.166 10.621 10.609 11.249 11.948 12.332 13.016 12.206 12.865

WRDFRQc 2.522 0.153 2.398 0.153 2.339 0.168 2.235 0.159 2.208 0.139 2.114 0.143
WRDFRQa 3.135 0.107 3.090 0.093 3.089 0.095 3.055 0.098 3.046 0.086 2.989 0.087
WRDFRQmc 1.711 0.291 1.536 0.342 1.415 0.331 1.130 0.429 1.076 0.490 0.930 0.489
WRDAOAc 256.837 26.219 273.226 24.087 288.266 27.708 309.533 29.478 325.367 30.105 356.050 34.868
WRDFAMc 583.866 6.242 578.780 8.419 576.096 7.960 571.920 8.365 570.105 8.352 564.820 9.003
WRDCNCc 400.119 26.115 401.601 25.872 404.363 31.319 399.461 29.030 393.433 28.040 384.911 32.791
WRDIMGc 430.002 24.017 431.360 23.252 435.387 29.029 431.485 26.385 427.273 25.145 417.412 29.335
WRDMEAc 432.977 13.939 432.909 12.050 435.929 14.786 432.973 14.200 433.259 13.704 429.408 15.955
WRDPOLc 4.642 0.514 4.386 0.441 4.217 0.402 4.107 0.382 3.964 0.379 3.765 0.401
WRDHYPn 6.179 0.850 6.264 0.789 6.314 0.682 6.378 0.622 6.266 0.615 6.373 0.602
WRDHYPv 1.672 0.159 1.667 0.162 1.652 0.170 1.650 0.177 1.631 0.170 1.644 0.189
WRDHYPnv 1.469 0.261 1.511 0.245 1.570 0.254 1.606 0.234 1.624 0.206 1.726 0.230
Readability
RDFRE 95.495 3.854 87.917 3.890 80.502 5.292 70.209 5.873 62.299 7.797 51.092 9.258
RDFKGL 1.941 0.838 3.796 0.775 5.610 1.494 8.381 2.233 10.242 3.012 12.240 3.315
RDL2 27.133 6.216 22.239 4.978 19.238 4.755 15.467 5.032 13.967 4.103 11.808 5.045
Social Studies

K-1 2–3 4–5 6–8 9–10 11-CCR


Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD
Descriptive
DESPC 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
DESSC 36.070 7.938 27.515 4.254 23.585 3.932 19.390 3.872 17.200 3.512 15.590 3.204
DESWC 276.070 22.543 277.890 22.558 277.120 23.867 283.140 24.360 291.920 24.648 300.000 23.085
DESPL 36.070 7.938 27.515 4.254 23.585 3.932 19.390 3.872 17.200 3.512 15.590 3.204
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 7.983 1.488 10.340 1.470 12.081 2.008 15.316 3.704 18.040 5.580 20.338 5.229
DESSLd 3.361 1.132 4.247 1.531 4.964 1.905 6.681 3.172 8.070 3.895 9.375 3.970
DESWLsy 1.255 0.068 1.327 0.065 1.395 0.072 1.479 0.079 1.508 0.076 1.623 0.102
DESWLsyd 0.531 0.094 0.612 0.083 0.693 0.094 0.780 0.093 0.817 0.085 0.936 0.101
DESWLlt 3.967 0.205 4.190 0.180 4.379 0.185 4.587 0.203 4.647 0.199 4.930 0.257
DESWLltd 1.809 0.172 1.962 0.161 2.115 0.190 2.327 0.186 2.424 0.182 2.700 0.225
Text Easability Principal Component Scores
PCNARz 0.567 0.847 0.085 0.696 −0.247 0.660 −0.501 0.704 −0.535 0.639 −0.742 0.572
PCNARp 66.349 24.184 52.386 22.572 41.410 21.806 33.426 21.237 31.753 20.201 25.892 17.196
PCSYNz 1.604 0.623 1.152 0.533 0.811 0.616 0.401 0.710 0.049 0.789 −0.101 0.746
PCSYNp 91.492 8.278 84.402 11.144 75.412 16.879 63.186 22.124 52.366 24.017 47.311 22.974
PCCNCz 0.450 0.860 0.739 0.854 0.829 0.901 0.533 0.962 0.456 0.980 0.034 0.964
PCCNCp 62.647 25.222 71.680 23.749 73.278 23.832 65.566 26.855 62.945 27.651 51.251 27.792
PCREFz 0.253 0.978 0.128 0.947 −0.089 0.826 −0.267 0.808 −0.147 0.864 −0.310 0.855
PCREFp 55.911 28.598 52.252 27.285 46.381 25.298 41.257 24.713 44.600 26.443 39.602 25.268

PCDCz −0.421 0.929 −0.035 0.979 −0.002 0.906 0.239 1.096 0.346 1.028 0.366 0.911
PCDCp 37.816 27.999 48.207 29.270 49.182 27.069 55.288 29.598 58.051 27.735 60.029 25.962
PCVERBz 1.082 1.324 0.891 0.968 0.749 0.900 0.396 1.105 0.164 0.954 −0.289 0.977
PCVERBp 73.150 27.136 73.746 22.196 71.206 24.386 59.775 29.570 54.634 28.212 41.409 27.942
PCCONNz −1.069 1.122 −1.615 1.142 −1.811 1.211 −2.067 1.302 −2.019 1.363 −2.254 1.276
PCCONNp 23.971 24.281 14.426 19.824 11.934 16.561 9.271 14.590 10.987 15.854 7.839 14.163
PCTEMPz 0.095 0.706 0.016 0.858 −0.027 0.972 0.114 0.935 −0.008 1.047 −0.154 1.085
PCTEMPp 53.207 23.365 51.220 26.453 50.261 28.317 53.983 28.019 50.949 29.482 47.270 30.021
Referential Cohesion
CRFNO1 0.226 0.162 0.298 0.179 0.325 0.172 0.351 0.174 0.397 0.186 0.399 0.197
CRFAO1 0.437 0.153 0.475 0.170 0.483 0.172 0.496 0.174 0.537 0.186 0.527 0.194
CRFSO1 0.280 0.193 0.364 0.193 0.411 0.190 0.456 0.193 0.501 0.195 0.523 0.212
CRFNOa 0.144 0.093 0.207 0.121 0.215 0.113 0.240 0.123 0.281 0.147 0.289 0.146
CRFAOa 0.294 0.109 0.339 0.132 0.340 0.146 0.354 0.150 0.398 0.167 0.399 0.157
CRFSOa 0.179 0.111 0.262 0.139 0.277 0.132 0.326 0.150 0.381 0.166 0.405 0.168
CRFCWO1 0.141 0.068 0.127 0.058 0.113 0.047 0.100 0.040 0.102 0.045 0.092 0.045
CRFCWO1d 0.163 0.050 0.139 0.038 0.126 0.034 0.110 0.030 0.105 0.033 0.095 0.034
CRFCWOa 0.089 0.037 0.082 0.036 0.074 0.033 0.066 0.027 0.070 0.032 0.064 0.028
CRFCWOad 0.141 0.034 0.120 0.026 0.107 0.023 0.094 0.022 0.090 0.022 0.083 0.021
LSA
LSASS1 0.264 0.090 0.296 0.099 0.315 0.094 0.344 0.107 0.360 0.100 0.382 0.107
LSASS1d 0.206 0.041 0.198 0.040 0.191 0.039 0.182 0.039 0.175 0.040 0.164 0.039
LSASSp 0.156 0.053 0.202 0.076 0.229 0.083 0.277 0.105 0.300 0.098 0.332 0.109
LSASSpd 0.179 0.033 0.180 0.034 0.180 0.034 0.173 0.033 0.166 0.031 0.159 0.030
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.377 0.054 0.376 0.056 0.374 0.050 0.374 0.057 0.376 0.050 0.382 0.053
LSAGNd 0.153 0.026 0.144 0.023 0.141 0.021 0.141 0.021 0.144 0.022 0.145 0.023
Lexical Diversity
LDTTRc 0.635 0.109 0.669 0.094 0.706 0.080 0.738 0.074 0.750 0.075 0.768 0.071
LDTTRa 0.473 0.075 0.497 0.062 0.523 0.053 0.544 0.051 0.546 0.054 0.558 0.048
LDMTLDa 54.345 24.124 59.491 20.692 66.751 21.020 75.340 22.556 77.985 23.133 84.314 24.050
LDVOCDa 69.449 22.970 72.753 19.683 77.440 18.942 82.238 20.288 81.764 19.591 87.326 19.731
Connectives
CNCAll 58.392 22.014 70.728 21.355 76.186 20.233 84.591 21.073 86.130 21.215 90.993 18.121
CNCCaus 17.730 11.448 21.273 11.873 21.854 10.673 24.530 12.556 26.200 11.606 26.776 10.524
CNCLogic 24.637 12.772 29.832 13.176 30.388 12.899 34.090 15.468 36.058 15.587 37.279 14.150
CNCADC 9.107 7.201 11.755 8.533 12.552 7.841 15.300 9.537 15.875 10.126 17.618 9.610
CNCTemp 12.035 9.772 14.929 10.194 16.065 9.549 17.775 9.822 18.087 9.025 18.169 9.035
CNCTempx 17.186 12.035 18.521 11.410 17.821 11.082 18.467 11.393 18.193 9.807 17.083 9.492
CNCAdd 30.570 13.200 37.075 12.994 40.490 13.892 44.441 14.422 44.462 14.981 48.488 14.460
CNCPos 52.524 20.862 62.688 19.735 66.544 18.965 72.794 18.701 74.129 19.404 77.561 16.614
CNCNeg 7.062 6.246 8.885 6.990 9.683 6.756 11.577 8.475 12.077 9.190 13.429 8.296
Situation Model
SMCAUSv 61.365 20.088 50.642 12.846 44.915 13.983 37.219 12.685 32.569 12.356 29.043 10.936
SMCAUSvp 69.113 20.467 59.100 14.680 53.736 15.325 46.898 14.893 42.641 13.929 38.772 12.597
SMINTEp 47.001 18.187 35.157 15.729 29.545 13.520 23.608 11.600 19.900 10.299 18.227 9.953

SMCAUSr 0.138 0.152 0.165 0.142 0.207 0.216 0.275 0.287 0.331 0.320 0.346 0.291
SMINTEr 0.350 0.249 0.614 0.502 0.663 0.416 0.972 0.847 1.151 0.802 1.268 1.061
SMCAUSlsa 0.121 0.058 0.110 0.044 0.105 0.046 0.102 0.049 0.099 0.045 0.097 0.040
SMCAUSwn 0.665 0.115 0.637 0.105 0.628 0.087 0.607 0.094 0.579 0.099 0.553 0.096
SMTEMP 0.869 0.073 0.857 0.083 0.849 0.090 0.853 0.090 0.838 0.099 0.818 0.105
Syntactic Complexity
SYNLE 1.951 0.578 2.734 0.718 3.299 0.879 4.240 1.174 4.844 1.351 5.608 2.259
SYNNP 0.630 0.152 0.747 0.153 0.820 0.147 0.899 0.161 0.926 0.157 0.960 0.153
SYNMEDpos 0.650 0.066 0.647 0.047 0.641 0.049 0.638 0.039 0.629 0.040 0.628 0.045
SYNMEDwrd 0.876 0.061 0.877 0.045 0.883 0.036 0.888 0.030 0.880 0.030 0.883 0.033
SYNMEDlem 0.840 0.069 0.846 0.049 0.854 0.040 0.862 0.031 0.856 0.031 0.861 0.034
SYNSTRUTa 0.220 0.066 0.183 0.045 0.160 0.044 0.135 0.039 0.121 0.036 0.107 0.036
SYNSTRUTt 0.186 0.050 0.160 0.036 0.143 0.034 0.128 0.033 0.112 0.033 0.100 0.029
Syntactic Pattern Density
DRNP 376.886 35.318 376.609 34.887 383.136 30.244 383.272 35.736 382.043 33.942 375.983 36.490
DRVP 232.749 42.527 222.847 40.522 201.074 36.206 190.151 41.273 188.737 40.653 186.081 39.188
DRAP 32.169 15.158 28.840 13.381 27.278 11.722 26.956 11.667 26.601 11.597 28.050 12.394
DRPP 92.603 26.049 105.813 26.246 118.513 20.893 123.142 23.135 125.957 23.317 128.927 23.647
DRPVAL 2.877 2.789 4.954 4.374 5.382 4.275 5.369 3.916 5.494 4.265 5.555 4.357
DRNEG 9.902 10.422 7.939 7.977 6.574 6.369 6.288 6.653 7.083 6.330 7.163 6.070
DRGERUND 4.560 4.348 4.641 3.743 5.193 4.000 5.778 4.641 5.898 4.603 6.831 4.544
DRINF 7.306 4.810 9.081 5.682 7.927 5.069 7.915 5.049 7.929 5.048 8.549 5.148
Word Information
WRDNOUN 251.523 48.531 267.986 41.900 279.999 36.277 281.971 41.589 279.454 38.987 279.553 38.174
WRDVERB 140.897 30.124 136.423 31.019 128.984 27.816 124.428 28.567 123.826 24.977 119.149 22.076
WRDADJ 61.441 21.994 71.183 22.908 80.723 25.257 91.124 26.149 91.040 24.166 99.109 25.138
WRDADV 52.279 25.096 48.251 19.409 43.362 15.897 44.283 17.965 45.003 17.571 47.179 18.207
WRDPRO 104.482 41.070 73.265 34.878 59.541 29.932 48.619 29.483 44.589 26.160 39.247 22.219
WRDPRP1s 17.842 32.371 5.380 13.603 3.190 11.449 1.915 6.725 1.818 6.221 1.467 5.729
WRDPRP1p 12.147 16.461 6.581 11.847 5.022 10.555 3.802 9.495 3.213 8.146 4.440 9.077
WRDPRP2 18.873 22.570 13.661 19.926 9.684 21.016 4.931 12.184 4.356 11.807 2.281 7.772
WRDPRP3s 22.336 28.217 17.996 25.936 13.081 18.815 12.667 18.976 12.758 18.809 9.674 14.766
WRDPRP3p 19.225 18.635 17.870 15.489 18.358 17.276 16.123 14.541 11.857 11.082 12.691 12.710
WRDFRQc 2.545 0.155 2.441 0.149 2.370 0.150 2.282 0.154 2.230 0.142 2.149 0.145
WRDFRQa 3.152 0.085 3.127 0.084 3.107 0.093 3.073 0.102 3.057 0.104 2.993 0.106
WRDFRQmc 1.727 0.239 1.498 0.244 1.415 0.309 1.223 0.402 1.116 0.458 0.980 0.384
WRDAOAc 277.278 25.974 297.128 28.422 315.551 31.601 341.214 30.303 354.961 30.564 381.515 31.295
WRDFAMc 583.750 7.528 579.332 7.508 574.291 8.897 569.452 8.817 566.657 8.834 563.451 10.140
WRDCNCc 407.502 24.920 407.562 25.021 408.246 26.648 396.036 28.948 392.308 28.606 378.074 26.879
WRDIMGc 436.973 22.560 437.829 21.671 439.071 22.895 427.854 26.085 424.426 25.038 410.346 24.994
WRDMEAc 442.245 15.125 443.975 13.702 444.783 13.287 438.801 15.023 435.847 17.297 430.164 17.090
WRDPOLc 4.663 0.476 4.518 0.458 4.262 0.444 4.025 0.472 3.945 0.404 3.800 0.422
WRDHYPn 6.060 0.720 6.033 0.652 5.859 0.763 5.934 0.776 6.117 0.707 6.314 0.686
WRDHYPv 1.566 0.166 1.546 0.166 1.563 0.174 1.581 0.195 1.604 0.186 1.626 0.209
WRDHYPnv 1.621 0.264 1.720 0.255 1.716 0.217 1.739 0.246 1.782 0.239 1.843 0.260
Readability
RDFRE 92.393 5.350 84.142 5.036 76.644 5.519 66.234 5.375 61.055 5.698 49.059 9.598
RDFKGL 2.317 0.911 4.079 0.733 5.556 0.870 7.802 1.158 9.194 1.815 11.430 2.240
RDL2 32.381 8.481 27.016 5.956 23.300 5.394 19.139 4.947 17.209 4.737 14.039 4.552
Science

K-1 2–3 4–5 6–8 9–10 11-CCR


Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD
Descriptive
DESPC 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
DESSC 36.220 7.450 29.655 4.859 25.410 3.915 21.747 3.993 20.300 4.211 17.193 3.674
DESWC 275.220 16.508 278.070 21.510 273.175 17.885 277.537 19.984 280.800 22.096 287.700 22.881
DESPL 36.220 7.450 29.655 4.859 25.410 3.915 21.747 3.993 20.300 4.211 17.193 3.674
DESPLd 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DESSL 7.884 1.397 9.612 1.485 11.032 1.658 13.259 2.577 14.519 3.241 17.715 4.541
DESSLd 3.020 0.988 3.549 0.930 4.344 1.290 5.376 1.739 5.905 2.295 7.624 3.228
DESWLsy 1.224 0.050 1.293 0.059 1.369 0.069 1.460 0.071 1.518 0.069 1.617 0.097
DESWLsyd 0.487 0.078 0.575 0.086 0.680 0.104 0.761 0.094 0.826 0.082 0.923 0.108
DESWLlt 3.990 0.168 4.162 0.155 4.323 0.181 4.540 0.178 4.681 0.190 4.873 0.248
DESWLltd 1.712 0.178 1.875 0.182 2.120 0.213 2.312 0.191 2.454 0.172 2.662 0.219
Text Easability Principal Component Scores
PCNARz 0.505 0.700 0.096 0.675 −0.255 0.568 −0.550 0.596 −0.724 0.529 −0.959 0.521
PCNARp 65.737 21.564 52.473 21.926 40.811 19.308 31.458 19.066 25.996 15.956 19.716 13.919
PCSYNz 1.844 0.715 1.482 0.626 1.236 0.587 0.885 0.679 0.718 0.739 0.309 0.771
PCSYNp 93.516 8.342 89.560 10.278 85.697 11.654 76.742 18.014 71.898 20.423 59.820 23.839
PCCNCz 0.751 1.024 0.870 0.941 0.826 0.921 0.632 0.958 0.488 0.973 0.053 0.938
PCCNCp 70.805 25.489 74.441 23.220 73.087 23.862 67.847 25.795 64.372 26.565 50.665 28.309
PCREFz 0.947 0.923 0.938 0.806 0.810 0.900 0.557 0.949 0.405 0.980 0.444 1.011
PCREFp 75.220 21.344 76.715 19.316 72.707 23.590 65.528 25.724 60.585 27.897 61.826 27.935
PCDCz −0.368 0.875 0.023 0.917 0.155 0.920 0.222 0.873 0.166 0.953 0.214 0.957
PCDCp 38.052 26.286 49.119 26.881 53.715 27.233 55.915 26.460 53.581 26.370 54.898 27.163
PCVERBz 0.832 0.928 0.485 0.808 0.347 0.876 0.027 0.831 −0.113 0.876 −0.494 0.914
PCVERBp 73.537 23.465 64.341 23.442 59.863 25.269 50.723 25.649 46.396 27.163 35.142 25.627
PCCONNz −1.361 1.318 −1.712 1.268 −1.916 1.335 −2.076 1.372 −2.031 1.325 −1.989 1.260
PCCONNp 21.021 23.716 14.153 18.214 12.374 17.597 10.441 16.931 10.335 16.929 10.775 17.118
PCTEMPz −0.154 0.720 −0.172 0.837 −0.148 0.900 −0.144 0.943 −0.276 1.137 −0.021 1.053
PCTEMPp 45.097 23.640 45.724 25.206 46.161 27.327 46.597 27.496 45.171 29.609 50.612 30.055
Referential Cohesion
CRFNO1 0.313 0.154 0.414 0.168 0.464 0.172 0.499 0.179 0.495 0.189 0.528 0.200
CRFAO1 0.528 0.161 0.600 0.153 0.610 0.163 0.624 0.164 0.601 0.174 0.646 0.181
CRFSO1 0.378 0.182 0.491 0.178 0.557 0.180 0.596 0.178 0.583 0.191 0.653 0.192
CRFNOa 0.191 0.105 0.260 0.126 0.294 0.126 0.323 0.149 0.338 0.154 0.370 0.162
CRFAOa 0.375 0.150 0.421 0.135 0.431 0.146 0.434 0.153 0.434 0.156 0.477 0.175
CRFSOa 0.252 0.135 0.330 0.139 0.382 0.150 0.415 0.154 0.421 0.161 0.493 0.174
CRFCWO1 0.180 0.066 0.177 0.056 0.170 0.057 0.151 0.057 0.138 0.055 0.133 0.059
CRFCWO1d 0.190 0.041 0.168 0.032 0.163 0.033 0.141 0.036 0.133 0.035 0.122 0.041
CRFCWOa 0.112 0.045 0.110 0.040 0.102 0.038 0.092 0.039 0.089 0.037 0.086 0.039
CRFCWOad 0.158 0.030 0.144 0.022 0.136 0.024 0.120 0.026 0.115 0.025 0.105 0.028
LSA
LSASS1 0.327 0.089 0.373 0.098 0.391 0.101 0.409 0.109 0.412 0.111 0.465 0.124
LSASS1d 0.227 0.038 0.219 0.034 0.217 0.037 0.208 0.042 0.197 0.044 0.185 0.047
LSASSp 0.205 0.073 0.252 0.092 0.275 0.103 0.310 0.108 0.323 0.113 0.394 0.132
LSASSpd 0.190 0.029 0.196 0.031 0.198 0.033 0.195 0.035 0.188 0.036 0.182 0.039
LSAPP1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAPP1d 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
LSAGN 0.413 0.049 0.421 0.052 0.419 0.057 0.416 0.058 0.413 0.061 0.430 0.069
LSAGNd 0.155 0.020 0.150 0.019 0.154 0.019 0.155 0.023 0.154 0.025 0.160 0.025

Lexical Diversity
LDTTRc 0.573 0.083 0.600 0.079 0.622 0.083 0.660 0.085 0.677 0.082 0.693 0.091
LDTTRa 0.427 0.054 0.453 0.050 0.473 0.054 0.500 0.056 0.509 0.059 0.517 0.062
LDMTLDa 42.003 14.061 47.259 13.335 50.508 14.491 60.219 20.326 64.567 22.525 67.781 23.384
LDVOCDa 58.811 17.974 63.544 16.570 67.023 16.656 73.977 22.106 74.525 20.588 76.040 21.928
Connectives
CNCAll 61.240 22.355 71.864 21.075 75.838 21.236 80.439 20.504 80.821 21.126 82.993 19.682
CNCCaus 16.616 11.832 21.045 12.954 23.136 11.189 23.393 10.053 23.362 11.194 25.732 11.322
CNCLogic 23.482 14.076 29.202 13.145 32.107 14.812 34.261 13.565 34.051 14.484 35.846 14.105
CNCADC 11.583 9.569 12.505 8.095 14.528 9.634 15.493 9.286 14.651 9.550 16.111 9.221
CNCTemp 14.001 10.562 15.859 10.103 15.728 11.393 16.778 9.666 17.080 10.698 16.619 8.742
CNCTempx 10.491 11.556 11.992 11.850 12.346 10.948 12.082 10.105 12.910 11.448 12.599 9.878
CNCAdd 30.396 14.332 34.893 14.780 37.389 15.078 40.608 15.876 42.023 15.437 42.843 14.158
CNCPos 52.508 19.499 62.084 19.816 64.591 19.458 68.542 18.549 69.903 18.726 70.818 18.139
CNCNeg 9.052 8.366 9.851 6.728 11.649 8.639 12.069 8.182 11.303 8.505 12.562 8.198
Situation Model
SMCAUSv 80.537 26.527 65.375 16.421 56.290 15.282 46.796 14.999 42.447 14.776 35.392 12.810
SMCAUSvp 90.447 25.793 77.056 19.399 68.493 18.587 58.267 16.947 53.433 17.243 47.354 15.434
SMINTEp 41.198 21.419 31.305 13.618 27.159 12.447 22.367 11.673 20.296 10.111 17.278 9.644
SMCAUSr 0.137 0.148 0.179 0.146 0.212 0.152 0.251 0.181 0.261 0.205 0.343 0.257
SMINTEr 0.424 0.430 0.610 0.525 0.741 0.565 0.899 0.949 0.893 0.841 1.072 0.823
SMCAUSlsa 0.112 0.034 0.111 0.036 0.112 0.037 0.114 0.039 0.115 0.040 0.122 0.050
SMCAUSwn 0.632 0.087 0.617 0.096 0.609 0.096 0.589 0.087 0.566 0.087 0.545 0.093
SMTEMP 0.852 0.076 0.843 0.079 0.841 0.085 0.835 0.090 0.819 0.110 0.835 0.101
Syntactic Complexity
SYNLE 1.843 0.718 2.567 0.790 3.038 0.813 3.864 1.187 4.367 1.409 5.070 1.561
SYNNP 0.650 0.167 0.729 0.174 0.825 0.161 0.882 0.178 0.920 0.166 0.990 0.161
SYNMEDpos 0.637 0.053 0.630 0.044 0.631 0.045 0.624 0.043 0.619 0.047 0.612 0.043
SYNMEDwrd 0.848 0.051 0.855 0.041 0.857 0.036 0.866 0.037 0.870 0.035 0.869 0.037
SYNMEDlem 0.812 0.055 0.817 0.043 0.820 0.040 0.833 0.041 0.839 0.039 0.841 0.039
SYNSTRUTa 0.214 0.061 0.190 0.045 0.169 0.045 0.150 0.040 0.145 0.042 0.120 0.038
SYNSTRUTt 0.168 0.040 0.156 0.038 0.143 0.035 0.133 0.033 0.131 0.033 0.111 0.030
Syntactic Pattern Density
DRNP 352.756 31.437 365.765 32.152 365.788 32.321 369.404 34.689 372.343 32.442 376.769 32.061
DRVP 248.430 46.815 231.734 42.833 222.430 37.745 208.325 37.778 203.136 35.739 187.587 32.070
DRAP 28.826 15.049 31.717 15.218 26.094 13.636 27.242 12.589 25.157 11.471 25.998 11.863
DRPP 85.102 25.395 98.117 25.188 102.450 22.239 114.323 23.615 119.598 23.745 127.057 21.393
DRPVAL 2.885 3.613 5.517 5.442 7.555 5.935 8.240 5.593 7.890 5.360 8.914 5.672
DRNEG 9.466 8.764 7.942 6.924 7.087 6.477 6.644 6.482 5.758 5.711 5.267 4.980
DRGERUND 5.485 5.544 5.533 4.956 6.127 4.932 6.209 4.609 7.142 4.934 6.366 4.779
DRINF 8.203 5.814 7.741 5.773 7.166 5.374 6.967 4.964 6.810 4.891 6.026 4.052
Word Information
WRDNOUN 238.129 45.681 260.970 44.323 272.192 41.299 283.527 40.363 285.882 43.436 290.676 36.160
WRDVERB 143.105 32.688 131.910 29.860 127.329 23.216 120.759 24.827 120.481 23.051 111.054 20.907
WRDADJ 63.722 25.663 65.988 22.732 74.060 23.462 81.881 23.873 90.459 24.846 98.167 24.938
WRDADV 45.789 23.589 48.947 19.425 43.719 20.045 45.224 17.898 42.583 18.595 43.377 17.738
WRDPRO 103.954 44.096 77.585 40.100 61.412 30.452 45.706 27.594 38.556 25.624 30.543 21.270
WRDPRP1s 5.022 19.190 1.473 7.621 0.308 2.149 1.034 5.086 0.348 3.760 0.314 1.953
WRDPRP1p 5.286 11.065 3.268 9.462 2.660 5.823 3.983 9.466 4.301 11.262 4.361 8.776
WRDPRP2 49.982 42.539 37.614 37.498 29.070 30.319 15.759 22.096 10.868 18.686 4.949 13.807
WRDPRP3s 11.569 23.661 7.274 16.673 3.656 11.044 3.491 9.663 4.238 10.942 3.816 11.135

WRDPRP3p 15.911 14.703 13.695 14.294 11.673 12.259 10.375 11.306 9.195 11.858 7.765 8.318
WRDFRQc 2.476 0.169 2.368 0.161 2.287 0.155 2.199 0.156 2.150 0.155 2.062 0.139
WRDFRQa 3.089 0.096 3.046 0.094 3.019 0.090 2.975 0.102 2.961 0.105 2.927 0.102
WRDFRQmc 1.687 0.242 1.448 0.256 1.311 0.244 1.166 0.254 1.060 0.249 0.913 0.326
WRDAOAc 264.503 27.001 288.838 31.288 306.917 30.729 326.958 33.093 341.231 31.113 363.769 31.012
WRDFAMc 584.050 7.625 578.728 8.821 575.049 8.419 571.466 9.109 569.298 9.276 563.479 10.348
WRDCNCc 411.898 37.944 415.776 33.342 416.006 31.143 409.929 31.523 404.657 32.706 392.882 29.889
WRDIMGc 437.035 30.840 439.250 27.163 437.664 25.952 431.475 26.907 427.115 27.848 415.133 24.421
WRDMEAc 441.454 15.135 438.805 15.434 435.548 16.660 431.930 15.050 430.667 15.242 424.332 16.467
WRDPOLc 5.048 0.580 4.830 0.589 4.682 0.571 4.335 0.459 4.225 0.467 3.929 0.418
WRDHYPn 6.574 0.604 6.595 0.619 6.625 0.555 6.489 0.546 6.530 0.493 6.397 0.554
WRDHYPv 1.581 0.174 1.576 0.169 1.546 0.177 1.542 0.155 1.538 0.161 1.526 0.173
WRDHYPnv 1.698 0.272 1.833 0.271 1.890 0.235 1.912 0.231 1.934 0.246 1.925 0.228
Readability
RDFRE 94.959 3.638 87.751 4.716 79.853 5.336 69.956 5.206 63.774 4.398 52.164 8.890
RDFKGL 1.926 0.735 3.400 0.737 4.848 0.783 6.777 0.914 7.946 0.942 10.352 1.974
RDL2 32.462 7.265 28.470 5.953 25.014 5.553 20.866 5.245 18.776 5.122 15.066 5.066

Table B.1. The TASA passages were categorized into grade bands using DRP scores.
This table provides the number of passages included within each grade band, the
mean and standard deviation for the DRP scores for each set of passages, and the
minimum and maximum cutoff DRP scores used to define the grade bands.

Grade Band N Mean DRP Std. Deviation Minimum DRP Maximum DRP
K-1 300 43.2465 2.33841 35.00 45.99
2–3 600 48.8362 1.45713 46.00 50.99
4–5 600 53.3161 1.44334 51.00 55.99
6–8 900 59.1749 1.34791 56.00 60.99
9–10 600 62.2777 0.90323 61.00 63.99
11-CCR 900 67.4324 3.10350 64.00 85.80
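The cutoffs in Table B.1 amount to a simple lookup from DRP score to grade band. A minimal sketch, assuming each band starts at its tabled minimum DRP and runs up to the next band's minimum; the function name is ours, and scores below 35.00 or above 85.80 are rejected because they fall outside the range observed in the sampled passages:

```python
from bisect import bisect_right

# Lower DRP cutoff of each grade band, from the "Minimum DRP" column of Table B.1.
BAND_MINIMA = [35.00, 46.00, 51.00, 56.00, 61.00, 64.00]
BAND_LABELS = ["K-1", "2-3", "4-5", "6-8", "9-10", "11-CCR"]
MAX_OBSERVED_DRP = 85.80  # largest DRP value among the sampled passages


def grade_band(drp: float) -> str:
    """Map a DRP score to its grade band using the Table B.1 cutoffs."""
    if not BAND_MINIMA[0] <= drp <= MAX_OBSERVED_DRP:
        raise ValueError(f"DRP {drp} is outside the range covered by Table B.1")
    # bisect_right counts how many band minima are <= drp; the last of those
    # minima belongs to the band that contains the score.
    return BAND_LABELS[bisect_right(BAND_MINIMA, drp) - 1]


for score in (43.2, 50.99, 56.0, 67.4):
    print(score, "->", grade_band(score))
```

Treating the upper bound of each band as the next band's minimum avoids gaps between, for example, 45.99 and 46.00 when a score carries more than two decimal places.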