A Comparison of Online and Offline Measures of Good-Enough Processing in Garden-Path Sentences

Language, Cognition and Neuroscience
ISSN: 2327-3798 (Print) 2327-3801 (Online) Journal homepage: http://www.tandfonline.com/loi/plcp21
Zhiying Qian, Susan Garnsey & Kiel Christianson
To cite this article: Zhiying Qian, Susan Garnsey & Kiel Christianson (2017): A comparison of
online and offline measures of good-enough processing in garden-path sentences, Language,
Cognition and Neuroscience, DOI: 10.1080/23273798.2017.1379606
To link to this article: http://dx.doi.org/10.1080/23273798.2017.1379606
Published online: 22 Sep 2017.
LANGUAGE, COGNITION AND NEUROSCIENCE, 2017
https://doi.org/10.1080/23273798.2017.1379606
REGULAR ARTICLE
Zhiying Qian (a,d), Susan Garnsey (b,d) and Kiel Christianson (c,d)
a Department of East Asian Languages and Cultures, University of Illinois at Urbana-Champaign, Urbana, IL, USA
b Department of Psychology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
c Department of Educational Psychology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
d Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ABSTRACT
In two self-paced reading and one ERP experiments, this study tested the good-enough processing account, which states that readers sometimes misinterpret sentences like While the man hunted the deer ran into the woods because they fail to fully revise the syntactic structure [Christianson, K., Hollingworth, A., Halliwell, J. F., & Ferreira, F. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42, 368–407. doi:10.1006/cogp.2001.0752]. Such an account predicts more evidence of reanalysis at the disambiguation on correctly- than incorrectly-answered trials. Experiment 1, which asked Did the man hunt the deer? and Experiment 2, which asked Did the sentence explicitly say that the man hunted the deer? showed no difference in reading time between trials with correct and incorrect responses. Experiment 3 found the amplitude of P600 was unrelated to comprehension accuracy. These results converged to suggest that failure to reanalyse ambiguous sentences is not the primary reason for misinterpretation. Three norming studies revealed instead response accuracy was influenced by likelihood of events described in the sentences and questions.

ARTICLE HISTORY
Received 15 December 2016
Accepted 29 August 2017

KEYWORDS
Garden-path sentences; good-enough processing; reanalysis; P600; plausibility
CONTACT Zhiying Qian audreyqzy@gmail.com
© 2017 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

When people read sentences such as example (1) below, they typically slow down at the main clause verb ran, presumably because they initially interpret the noun phrase the deer that was brown and graceful as the object of the subordinate clause verb hunted. On that analysis, the main clause verb ran appears to lack a subject, so reanalysis is triggered to make the deer its subject. Successful reanalysis includes both detaching the deer from the object role of the subordinate verb hunted and attaching it instead as the subject of the main clause verb ran, since English does not allow a noun to fill both syntactic roles simultaneously in sentences like these. Garden-path sentences such as (1) have been studied extensively in the psycholinguistic literature (Ferreira & Henderson, 1990; Pickering & Traxler, 2003; Pickering, Traxler, & Crocker, 2000; Trueswell, Tanenhaus, & Kello, 1993) with the goal of distinguishing among different theories of sentence processing.

(1) While the man hunted the deer that was brown and graceful ran into the woods.
(2) Did the man hunt the deer?
(3) Did the deer run into the woods?

Traditional sentence processing models disagree on the timing of the parser’s use of non-syntactic information to constrain the building of syntactic structure, but they all assume that the parser eventually reaches the correct interpretation that is consistent with the linguistic input, at least most of the time. Most parsing models would explain the garden-path effect in sentences such as (1) as the result of an initial decision to attach the noun immediately following a potentially transitive verb as its object. The models differ, however, about the nature and timing of effects of factors such as the likelihood that an optionally transitive verb is being used transitively, or the plausibility of the particular noun as an object of the particular verb (Ferreira & Clifton, 1986; Frazier & Fodor, 1978; Frazier & Rayner, 1982; Garnsey, Pearlmutter, Myers, & Lotocky, 1997; MacDonald, Pearlmutter, & Seidenberg, 1994; McRae, Spivey-Knowlton, & Tanenhaus, 1998; Rayner, Carlson, & Frazier, 1983; Trueswell, Tanenhaus, & Garnsey, 1994; Trueswell et al., 1993).

The assumption that people eventually get to the correct interpretation most of the time was questioned by Christianson and colleagues (Christianson, Hollingworth, Halliwell, & Ferreira, 2001; Ferreira, Christianson, & Hollingworth, 2001), who demonstrated that readers
do not always reach the correct interpretation of garden-path sentences. After reading sentences like (1), readers often incorrectly respond Yes to questions such as (2) while also correctly responding Yes to questions such as (3). Thus, they seem to interpret the sentence as meaning both that the man hunted the deer and that the deer ran into the woods, although this interpretation is not licensed by the syntax (Christianson, 2008; Christianson et al., 2001; Christianson, Williams, Zacks, & Ferreira, 2006; Ferreira et al., 2001). The error rate for questions such as (2) was over 70% in Christianson et al. (2001), and that pattern has been replicated in follow-up studies (Christianson et al., 2006; Ferreira & Patson, 2007). The high error rate for questions such as (2), together with the low error rate for questions such as (3), led Christianson and colleagues to conclude that reanalysis processes succeeded in attaching the deer as subject of the main clause verb ran, but not in removing it from the direct object role in the subordinate clause. They argued that incomplete reanalysis allowed the interpretation derived from the initial misparse (i.e. the man hunted the deer) to linger and influence question responses.

One criticism of interpreting the high error rate to comprehension questions such as (2) as reflecting lingering initial misinterpretation is that readers might have answered the questions based on inferences that could naturally be drawn from the sentences. This possibility is supported by the high error rate to questions following unambiguous versions of the sentences, which are disambiguated with a comma after the subordinate clause verb as in (4) below, or by swapping the order of the two clauses as in (5) below.

(4) While the man hunted, the deer that was brown and graceful ran into the woods.
(5) The deer that was brown and graceful ran into the woods while the man hunted.

In Christianson et al. (2001), the error rate to questions such as (2) was over 70% after reading sentences such as (1), and as high as 42% and 51% after reading unambiguous versions like (4) and (5), respectively. The 20%–30% difference in question response accuracy between ambiguous and unambiguous sentence versions can be ascribed to garden-pathing, but the still-high error rate for unambiguous sentences suggests that there were also other reasons for incorrect question responses, such as the inference that the deer that ran into the woods was most likely the same deer that the man was hunting.

In order to focus more directly on syntactic reanalysis and lessen the influence of inference on response rates, Christianson et al. (2001) included sentences and questions such as (6) and (7).

(6) While Anna dressed the baby that was cute and small spit up on the bed.
(7) Did Anna dress the baby?

These sentences used subordinate verbs that differed in an important way from those used in sentences such as (1). The subordinate verbs in sentences such as (1) were optionally transitive verbs (OPT), which can be used either transitively or intransitively. When they are intransitive, it must be inferred whether they have an object and what that object is. In contrast, the subordinate verbs in sentences such as (6) were reflexive absolute transitive (RAT) verbs. Such verbs can also be used transitively or intransitively, but when they are intransitive, their subject is also their object (hence the label “reflexive”). Successful reanalysis of (6) leads to the interpretation that Anna dressed herself, whereas successful reanalysis of (1) leaves unspecified what the man hunted. Christianson et al. (2001) found that readers made about 20% fewer errors after reading temporarily ambiguous sentences with RAT verbs than after temporarily ambiguous sentences with OPT verbs, which could be ascribed to the absence of inferences drawn about what the unspecified object was for sentences with RAT verbs. Readers did still make more errors after ambiguous (50%) than after unambiguous sentences (10%) with RAT verbs, however, suggesting that they were sometimes garden-pathed by the ambiguous versions.

A possible alternative explanation for the incorrect question responses found by Christianson and colleagues is that they were due to a reactivation of the initial misinterpretation by the questions themselves, which directly probed the misinterpretation (Nakamura & Arai, 2016; Sturt, 2007; van Gompel, Pickering, Pearson, & Jacob, 2006). On this view, the reason there were more incorrect responses after ambiguous sentences was that the surface form of the question was more similar to the ambiguous sentences than to the unambiguous sentences, since the latter had an additional comma.

To address this concern, several studies have either changed the form of the questions to make them less likely to reactivate the initial misinterpretation (Nakamura & Arai, 2016) or used more indirect and implicit measures to examine whether the initial misinterpretation lingers, including measuring reading times for words later in the sentence that are incompatible with the initial misinterpretation (Jacob & Felser, 2016; Sturt, 2007), syntactic priming (van Gompel et al., 2006),
grammaticality judgments (Lau & Ferreira, 2005), processing of newly-learned structures (Kaschak & Glenberg, 2004), paraphrasing (Patson, Darowski, Moon, & Ferreira, 2009), processing of subsequent sentences (Slattery, Sturt, Christianson, Yoshida, & Ferreira, 2013), and sentence-picture matching (Malyutina & den Ouden, 2016). For example, Patson et al. (2009) asked people to paraphrase sentences they had just read and found that they produced more paraphrases that retained the meaning of the initial misanalysis (e.g. The man hunted the deer and it ran into the woods) after reading ambiguous than unambiguous sentences, suggesting that the meaning of initial misparse lingered. van Gompel et al. (2006) asked participants to read sentences that were either ambiguous or disambiguated with a comma and then to complete a subsequent sentence fragment (While the doctor was visiti … ). Participants produced more transitive completion structures following ambiguous than unambiguous sentences, which was interpreted as showing that even after reanalysis was conducted, the initially incorrect direct object interpretation remained activated and was thus available to prime the structure produced in the subsequent sentence completions. Slattery et al. (2013) had participants read sentences such as (8) and (9) while having their eye movements monitored. They reported longer reading times at himself in (8) in the ambiguous and plausible condition (hid the jewel) than the ambiguous and implausible condition (hid the guard) and a gender mismatch effect at himself in (9) in both ambiguous and unambiguous conditions. These findings showed that even after reanalysis was completed, the parser still failed to erase the semantics derived from the initial misparse (See Sturt, 2007, for similar conclusion using a different structure).

(8) While the thief hid (,) {the jewel/the guard} from the gallery could be seen on the security camera. The thief hid himself in a place where the cameras couldn’t see him.
(9) After the bank manager telephoned(,) David’s {father/mother} grew worried and gave himself approximately five days to reply.

Using more implicit measures, the studies described above provide evidence that lingering misinterpretation is unlikely to be just an artifact of explicit comprehension questions. Christianson et al. and Ferreira et al. concluded that the ambiguous noun is successfully attached to the main clause after reanalysis is performed because readers respond accurately to the question Did the deer run into the woods?. However, it seems that the interpretation of the temporarily ambiguous noun as the direct object of the subordinate clause verb appears to remain, given that readers make frequent errors answering the question Did the man hunt the deer?.

Effects of lingering misinterpretation have also been found in sentences with other types of syntactic ambiguity (Christianson & Luke, 2011; Kaschak & Glenberg, 2004; Lau & Ferreira, 2005; Nakamura & Arai, 2016; Sturt, 2007). Lingering misinterpretation appears to be typical rather than a special property of the direct object/main clause ambiguity in sentences such as (1). For instance, Sturt (2007) constructed sentences with direct object/sentential complement temporary ambiguity such as those in (10) below, in which the garden-path is easier to recover from than direct object/main clause ambiguities as in (1) above (Grodner, Gibson, Argaman, & Babyonyshev, 2003; Sturt, Pickering, & Crocker, 1999).

(10) a. The explorers found the South Pole was actually right at their feet.
     b. The explorers found the South Pole was impossible to reach.

The underlined segment of the sentences in (10) above was either consistent (a) or inconsistent (b) with the initial misinterpretation. In this study, an ambiguity effect was found to be localised to the disambiguating verb (was), which Sturt (2007) interpreted as indicating that reanalysis was completed quickly. Crucially, despite that, readers still read final segments that conflicted in meaning with the initial misanalysis more slowly than those that did not, suggesting that the initial misinterpretation lingered even though reanalysis was completed.

Even misinterpretation that is activated only very briefly before being abandoned can persist and affect the processing of what follows (Kaschak & Glenberg, 2004; Lau & Ferreira, 2005). Kaschak and Glenberg (2004) trained one group of participants with sentences that contained a novel-to-them structure as in (11) below, while a control group instead read sentences with a familiar structure as in (12). In (11), cleaned could be analyzed temporarily as a modifier as in The wood floor needs cleaned corners, while the same analysis was not possible in (12). In a subsequent testing session, both groups read sentences like (13), in which there were participial adjective modifiers (cooked vegetables). The group that had been exposed to the novel construction read cooked faster than the group that had not, suggesting that when cleaned was misanalysed as a participial modifier in at least some trials in the training session, the misanalysed structure remained activated and facilitated the modifier reading of cooked in (13).
(11) The wood floor needs cleaned before our parents get here.
(12) The wood floor needs to be cleaned before our parents get here.
(13) The meal needs cooked vegetables so the guests will be happy.

While there is now some consensus that interpretation from an initial misparse lingers, there is not consensus about what causes it. Lingering misinterpretation has been ascribed to shallow syntactic processing (Clahsen & Felser, 2006; Frisson, 2009) or underspecified syntactic structure built by the parser (Ferreira, Bailey, & Ferraro, 2002; Sanford & Sturt, 2002; Swets, Desmet, Clifton, & Ferreira, 2008), memory traces left from the process of computing the initial parse (Kaschak & Glenberg, 2004), shallow and underspecified semantic processing (Barton & Sanford, 1993), and fast-decaying syntactic structure (Sachs, 1967; Sturt, 2007).

Several processing accounts, however, have ascribed the lingering misinterpretation to incomplete reanalysis, including the Attach Anyway and Adjust Principle (Fodor & Inoue, 1998), lexically guided tree-adjoining grammar (Ferreira, Lau, & Bailey, 2004; Lau & Ferreira, 2005; Slattery et al., 2013) and the Good-Enough Processing Account (Christianson, 2016; Christianson et al., 2001; Ferreira et al., 2001; Karimi & Ferreira, 2016). According to the Attach Anyway and Adjust Principle, the parser attaches every incoming word into the existing structure even if such integration results in syntactic incompatibility. When syntactically illicit structure results, the parser starts to revise the structure step by step in a backward manner. In sentences such as example (1), ran is initially analyzed as the matrix verb although it lacks a subject. The parser then revises the already-built structure by stealing the deer from the subordinate clause and attaching it to the main clause. It then proceeds to reinterpret hunted as an intransitive verb. However, reanalysis may cease before it is completed, resulting in what Fodor and Inoue (1998) call the Thematic Overlay Effect, which has the deer remaining both as the patient of hunted and the agent of ran. Similarly, Ferreira and colleagues’ lexically guided tree-adjoining grammar (LTAG) account proposes that the correct structure built after reanalysis is overlaid onto the initial incorrect structure because the initial incorrect structure has not decayed in memory. The not-yet-decayed incorrect structure competes with the correct structure to influence the processing of subsequent sentences until decay is completed. This process results in a “tree-splicing” structure that has the correct structure spliced onto the initial incorrect structure (Christianson et al., 2001).

Most relevant to the present study is the Good-Enough Processing Account (Christianson et al., 2001; Ferreira et al., 2001, 2002; Ferreira & Patson, 2007; Karimi & Ferreira, 2016), which states that when the interpretation derived from the initial misanalysis is sensible, the parser does not bother to fully reanalyse the structure even though later information is syntactically incompatible with it. The Good-Enough Processing Account assumes a dual-pathways processing model, in which a semantic processing route and a morphosyntactic processing route operate independently and each outputs its own interpretation. When the interpretations delivered by the two routes fail to converge, the parser reconciles them, resulting in a final interpretation that is not completely faithful to the linguistic input. In the case of garden-path sentences such as (1), the sensible meaning derived from the initial misanalysis cancels out the need for computing detailed structure via the morphosyntactic processing route, leading to incomplete syntactic reanalysis and the resulting lingering misinterpretation from the initial misparse.

In a series of experiments, Ferreira and colleagues demonstrated that the parser sometimes opts for the interpretation derived from the semantic heuristics, especially when the syntactic algorithm is demanding and the syntactically licensed interpretation is implausible (Christianson, Luke, & Ferreira, 2010; Ferreira, 2003; Karimi & Ferreira, 2016). In Ferreira (2003), participants listened to sentences such as (14)-(17), and then answered questions about the agent and patient roles of these sentences.

(14) The dog bit the man.
(15) The man bit the dog.
(16) The man was bitten by the dog.
(17) The dog was bitten by the man.

Participants made errors to implausible passives (The dog was bitten by the man.), but not plausible and implausible actives and plausible passives. Most of the errors involved flipping the thematic roles. In English, the NVN word order usually maps onto Agent-Verb-Patient thematic roles. In the case of implausible passives, the word-order heuristic delivers an analysis with the dog being the agent and the man being the patient, which is in conflict with the output from the syntactic processing route. Because NVN word-order is a very powerful heuristic and the nouns fit well with its usual thematic role assignments, the NVN heuristic overrides the interpretation from the syntactic route, resulting in misinterpretation. In processing sentences like (17), outputs from the syntactic route that are not consistent with world-knowledge are “normalized” by the
plausibility heuristics to make the sentence sensible (Bever, 1970; Ferreira, 2003; Townsend & Bever, 2001).

Evidence from electrophysiological studies has also suggested that the semantic processing route sometimes cancels out or wins over the syntactic processing route (Kuperberg, 2007). Kim and Osterhout (2005)’s participants read sentences like The hearty meal was devouring the kids. Since a meal cannot devour something, devouring should elicit an N400 effect, which is an ERP component that indexes how easily a word’s meaning can be integrated with context (Kutas & Federmeier, 2011; Kutas & Hillyard, 1980). However, the response to devouring showed instead an effect on the P600 component, which is usually elicited by syntactic violations (Hagoort, Brown, & Groothusen, 1993; Osterhout & Holcomb, 1992), despite the fact that there was no syntactic violation or ambiguity in the sentence. Such “semantic P600” effects have typically been found with role-reversal sentences (Kolk, Chwilla, van Herten, & Oor, 2003; Kuperberg, Caplan, Sitnikova, Eddy, & Holcomb, 2006; van Herten, Chwilla, & Kolk, 2006; van Herten, Kolk, & Chwilla, 2005; see Kuperberg, 2007 for a review). The semantic P600 effect provides evidence for dual syntactic and semantic processing routes. More importantly, the absence of the N400 effect suggests that information derived from the semantic route can be strong enough to cause the parser to “normalize” the syntax to make it consistent with the semantics (Bornkessel-Schlesewsky & Schlesewsky, 2008; Kim & Osterhout, 2005; Kuperberg, 2007).

The existing literature on Good-Enough Processing has proposed two mechanisms that could account for lingering misinterpretation. The first is that the semantics of the initial misinterpretation cancels out the need to fully reanalyse the syntactic structure, resulting in incomplete reanalysis and lingering misinterpretation (Christianson et al., 2001; Ferreira et al., 2001). The second mechanism is that reanalysis is completed, but the interpretations from both the initial analysis and reanalysis co-exist (Ferreira, 2003; Ferreira et al., 2004; Slattery et al., 2013). In other words, according to the Good-Enough Processing Account, if either reanalysis of the syntactic structure is incomplete, or syntactic reanalysis succeeds but both analyses remain, then the initial misinterpretation can linger. We will refer to the first mechanism as the “Incomplete Reanalysis” version and the second mechanism as the “Lingering Misinterpretation” version of the Good-Enough Processing Account.

The Incomplete Reanalysis version of the model makes the prediction that the more evidence there is of reanalysis during the processing of garden-path sentences, the more likely it should be that reanalysis is successfully completed and thus the less the initial misinterpretation should linger. The present study aims to test that prediction, using both reading time and ERP measures obtained while people read garden-path sentences as indices of the amount of time and effort spent on reanalysis, as is widely assumed about such measures. Assuming that responses to comprehension questions directly probing the initial misinterpretation reflect whether or not it persists beyond the end of the sentence, then the Incomplete Reanalysis Account predicts that there should be more evidence of reanalysis during the sentence for trials with correctly answered questions than for those with incorrectly answered questions. If longer reading time at the disambiguating verb reflects more time spent on reanalysis and thus a higher likelihood of its successful completion, then longer reading times should predict fewer incorrect responses to questions that probe the initial misinterpretation. This prediction has been tested in children by Wonnacott, Joseph, Adelman, and Nation (2016) and for a different kind of ambiguous structure by Christianson and Luke (2011). Neither found a reliable relationship between reading time measures at critical sentence regions and subsequent responses to comprehension questions. However, neither had a good chance of finding such a relationship, given how few trials per condition were used in their studies – six in Christianson and Luke’s study and two in Wonnacott et al.’s study. Once reading times were conditionalized on question responses, very few trials remained per cell. Thus, it is important to test the predicted relationship between critical-region reading times and subsequent question responses in a study with more power to detect such a relationship. It is also desirable to test the predicted relationship between reading times and question responses for the same kind of sentences whose processing the Good Enough Processing Account was initially developed to account for.

There is a similar prediction to be made about the P600 component in ERP waveforms, which is associated with syntactic processing, including detection and reanalysis of syntactic anomaly (Friederici, Mecklinger, Spencer, Steinhauer, & Donchin, 2001; Frisch, Schlesewsky, Saddy, & Alpermann, 2002; Gouvea, Phillips, Kazanina, & Poeppel, 2010; Hagoort et al., 1993; Osterhout, Holcomb, & Swinney, 1994), as well as other aspects of syntactic integration in sentences (Gouvea et al., 2010; Kaan, Harris, Gibson, & Holcomb, 2000). P600 amplitude has been found to be larger when reanalysis is more difficult (Osterhout et al., 1994) and also when attention is called to syntactic problems in sentences (Hahne & Friederici, 2002). If larger P600 amplitude at the disambiguating verb in ambiguous sentences indexes more complete reanalysis, then larger P600 amplitude should predict fewer incorrect
responses to questions that probe the initial misinterpretation. Following this logic, the studies presented here compare either reading times (Experiments 1 and 2) or P600 amplitudes (Experiment 3) at the disambiguating verb between trials with correct and incorrect responses to questions that directly probe the initial misinterpretation, to test the Incomplete-Reanalysis explanation for lingering misinterpretation. Slower reading times and larger P600 amplitudes for ambiguous sentences on trials where the probe questions are answered correctly than for sentences on trials where the probe questions are answered incorrectly would support such an explanation. Another possible outcome, however, is that slower reading times and/or larger P600 amplitude may simply indicate the amount of difficulty understanding the sentence rather than specifically the likelihood that it has been successfully reanalysed. In that case, the meaning of such measures in the literature more generally might need to be reconsidered.

The severity of garden-pathing and the likelihood of recovery from garden-pathing have been found to be affected by the distance between the temporarily ambiguous noun and the point of disambiguation, presumably because the parser assigns thematic roles at the heads of phrases (Ferreira & Henderson, 1991, 1998; Frazier & Clifton, 1998; Tabor & Hutchins, 2004; Van Dyke & Lewis, 2003; Warner & Glass, 1987). Readers judge the sentence While the man hunted the deer that was brown and graceful ran into the woods as less acceptable than While the man hunted the brown and graceful deer ran into the woods, probably because in the version with post-noun modification the parser has been committed for a longer time to the incorrect direct object analysis of deer by the time the disambiguating verb is reached, and therefore has more trouble abandoning it (Ferreira & Henderson, 1991, 1998). Perhaps reflecting that commitment, readers make more errors on comprehension questions that target the misinterpretation following sentences with post-noun-modification (the deer that was brown and graceful) than those with pre-noun-modification (the brown and graceful deer) (Christianson et al., 2001). Because the primary goal of the studies presented here was to compare trials with correct and incorrect post-sentence question responses, it was crucial to obtain enough trials with each type of response. Toward that goal, the sentences in our studies used post-noun modification.

2. Experiment 1

The Incomplete-Reanalysis explanation of the Good-Enough Processing Account predicts that there should be more evidence of reanalysis in ambiguous garden-path sentences whose post-sentence questions are answered correctly than in those whose questions are answered incorrectly. Experiment 1 tested this prediction in a self-paced reading experiment using garden-path sentences with both OPT verbs (e.g. hunted) and RAT verbs (e.g. dressed). We examined the relation between reading times at the disambiguating verb and accuracy on comprehension questions that probed the lingering misinterpretation (e.g. Did the man hunt the deer?).

2.1. Method

2.1.1. Participants

Thirty-two undergraduate students (12 males; mean age 18.5; range 18–21) at the University of Illinois at Urbana-Champaign participated in Experiment 1. All were native speakers of English, had normal or corrected-to-normal vision and gave written informed consent.

2.1.2. Materials and design

Experimental sentences consisted of 40 sets of sentences with OPT verbs and 24 sets of sentences with RAT verbs, with each set including an ambiguous version and a comma-disambiguated unambiguous version, as illustrated below in (18) and (19). In all sentences, the ambiguous noun was followed by a relative clause that comprised two adjectives (e.g. that was brown and graceful). Across the experiment, each OPT verb was used in just one item set and each RAT verb was used in two item sets, because there are fewer RAT verbs than OPT verbs in English. All sentences with OPT verbs and half of the sentences with RAT verbs were taken from Christianson et al. (2001). The other half of the sentences with RAT verbs were newly constructed. In Christianson et al. (2001), the relativizer for all sentences was that. In the present study, who was used instead for human ambiguous nouns (e.g. the baby who was cute and small) because it sounds more natural.

(18) OPT verb:
     Ambiguous: While the man hunted the deer that was brown and graceful ran into the woods.
     Unambiguous: While the man hunted, the deer that was brown and graceful ran into the woods.
     Question: Did the man hunt the deer?
(19) RAT verb:
     Ambiguous: While Anna dressed the baby who was cute and small spit up on the bed.
     Unambiguous: While Anna dressed, the baby who was cute and small spit up on the bed.
     Question: Did Anna dress the baby?
Critical sentences were distributed across two lists using a Latin square design, so that each participant saw only one version from each item set and saw equal numbers of sentences in each condition. Each sentence was followed by a comprehension question that directly probed the misinterpretation.

Ninety-two distractors were added to each list for a total of 156 trials/list. There were three types of distractors: (1) unambiguous sentences with subordinate-matrix clause order (e.g. While Jennifer held the cigar that was aged and expensive she told bad jokes; 40 sentences); (2) unambiguous sentences with matrix-subordinate clause order (e.g. The mother comforted the toddler who was chubby and scared while the clown handed him a balloon; 40 sentences); and (3) ambiguous and unambiguous versions of sentences using reciprocal verbs such as met, which are similar to RAT verbs in that their subject is also their object when no other object is specified [e.g. As Jane and Mary met(,) the men from Florida drove past them; 12 items]. Comprehension questions to the first two types of distractors asked about the content of various parts of the sentences, and questions to the third type of distractors asked about the initial misinterpretation (e.g. Did Mary meet the men?). Answers to the first two types of distractors were half Yes half No across the experiment. All sentences were pseudo-randomised with the constraint that no two experimental items appeared consecutively and were presented to all participants in the same order across all lists.

2.1.3. Procedures

Participants sat in a dimly lit and sound-attenuated booth in front of a 23-inch LCD monitor. To make presentation mode comparable for the self-paced reading and ERP experiments, sentences were presented one word at a time in white 26-point Arial font on a black background in the centre of the screen. Each trial began with a “Ready” prompt that stayed on the screen for one second. Each time participants pressed a button on a Cedrus-830 response box, the current word was replaced by the next one in the centre of the screen. Following each sentence, a comprehension question was presented for two seconds, followed by a screen displaying, for example, “Left: Yes Right: No” choices for a maximum of four seconds. Participants pressed one of the two mouse buttons to indicate their responses. The positions of Yes and No choices on the screen were counterbalanced. Feedback about question accuracy was not given, but a “Too Slow” message appeared when participants did not respond within four seconds. A total of 156 sentences were divided into four blocks of 39 sentences each, with each block containing the same number of items of each condition. Participants took a short break after each block. A practice block of seven trials was added at the beginning. The entire experiment took approximately 40 min to complete.

2.2. Results

Comprehension accuracy for distractor trials was used to determine whether participants were paying attention. All participants were above 80% correct (range 80%–97%, mean 90%), indicating that they were attending. Thus, all participants’ data were included in the analyses.

Mixed effects models were used in R (R Development Core Team, 2008) to analyze question response accuracy (logit mixed-effect models), question response times, and reading times for critical sentence regions (linear mixed-effects models), with subjects and items as random effects and ambiguity as a fixed effect (Jaeger, 2008). In the analyses of question response accuracy, reading time at the disambiguating region and its interactions with ambiguity were also included as fixed effects. For all analyses, the initial model included a maximal random effects structure with random intercepts and random slopes for all fixed effects for both subjects and items (Barr, Levy, Scheepers, & Tily, 2013). The final models reported here are the most complex models that converged. For the logit mixed-effect models used to analyze question response accuracy, beta estimates, standard errors, and z- and p-values for fixed effects are reported. For the linear mixed-effect models used to analyze question response times and sentence reading times, beta estimates, standard errors and t-values are reported, with t > 2 being interpreted as significant. Items with OPT verbs and RAT verbs were analyzed separately.

Figure 1. Error rates for post-sentence question responses in Experiment 1. Error bars in all figures show standard errors corrected for the within-subjects design.
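One common way to obtain standard errors corrected for a within-subjects design is to remove between-participant variability before computing the error term. The following is a minimal Python sketch of that general idea, assuming the normalisation approach usually attributed to Cousineau (2005) together with the correction factor proposed by Morey (2008). The function name `within_subject_se`, the data layout, and the example values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def within_subject_se(data):
    """Return a corrected standard error for each condition.

    `data` maps participant id -> {condition: that participant's mean}.
    Every participant must have a value for every condition.
    """
    conditions = sorted(next(iter(data.values())))
    n_conditions = len(conditions)
    n_participants = len(data)
    grand_mean = mean([p[c] for p in data.values() for c in conditions])

    # Normalisation step: centre each participant on the grand mean,
    # which removes overall differences between participants.
    normalised = {c: [] for c in conditions}
    for scores in data.values():
        participant_mean = mean([scores[c] for c in conditions])
        for c in conditions:
            normalised[c].append(scores[c] - participant_mean + grand_mean)

    # Correction factor for the bias introduced by the normalisation.
    correction = math.sqrt(n_conditions / (n_conditions - 1))
    return {
        c: correction * stdev(values) / math.sqrt(n_participants)
        for c, values in normalised.items()
    }


# Hypothetical error proportions for three participants (illustration only).
example = {
    "p01": {"ambiguous": 0.60, "unambiguous": 0.40},
    "p02": {"ambiguous": 0.80, "unambiguous": 0.50},
    "p03": {"ambiguous": 0.40, "unambiguous": 0.30},
}
corrected = within_subject_se(example)
uncorrected = stdev([p["ambiguous"] for p in example.values()]) / math.sqrt(len(example))
```

In this toy data set the corrected standard error for the ambiguous condition is smaller than the uncorrected between-participant value, because the participants differ more in their overall error rates than in the size of their ambiguity effect.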
2.2.1. Question responses

Response accuracy and response times for the questions are shown in Figures 1 and 2 below, separately for the two types of verbs. (In all figures, standard error bars have been adjusted for the within-subjects design; Morey, 2008; see also Cousineau, 2005; Loftus & Masson, 1994). Logit mixed-effect analysis showed that there were more erroneous Yes responses for ambiguous than unambiguous sentences for both OPT verbs (16%; β = 1.1, SE = 0.23, z = 4.73, p < .001) and RAT verbs (25%; β = 1.87, SE = 0.38, z = 4.93, p < .001). These error rates are very similar to those found in previous studies using these kinds of sentences and questions (e.g. Christianson et al., 2001).

Linear mixed-effect analysis of question response times revealed an interaction between ambiguity and response accuracy for both verb types (OPT: β = 159.98, SE = 72.66, t = 2.20; RAT: β = 207.58, SE = 92.01, t = 2.26), which arose because correct responses were non-reliably slower than incorrect ones for ambiguous items, while for unambiguous items there was a non-reliable difference in the opposite direction (all t’s < 2).

Figure 2. Question response time in Experiment 1 separately by question response accuracy. Percentages reflect the unequal numbers of trials contributing to the bars for correct and incorrect question responses.

2.2.2. Sentence reading times

Reading times were analyzed at two sentence regions: 1) the disambiguating region, consisting of the disambiguating verb (e.g. ran) and the word following it, and 2) the post-disambiguating region, consisting of the 1–3 words following the disambiguating region through the end of the sentence. The post-disambiguating region was analyzed to address the possibility that reanalysis effects might spill over onto subsequent words, as often happens with self-paced reading times.

Prior to analysis, word-by-word reading times that were faster than 100 ms or slower than 2000 ms were excluded, leading to a loss of 0.5% of the data. Reading times were also excluded from further analysis for sentences after which participants failed to respond to the comprehension question within four seconds, eliminating 2% of the data. Reading times above or below 2.5 standard deviations (SD) from the mean were replaced by the 2.5 SD cut-off value for each participant, affecting 3% of the data. To remove individual differences in reading speed, statistical results reported below were based on length-corrected residual reading times computed separately for each participant by entering their reading times for every word in all sentences (including distractors) into a regression equation that took reading time as the dependent variable and word length as the independent variable, and then subtracting the predicted reading times from the actual reading times (Ferreira & Clifton, 1986; Trueswell et al., 1994). For ease of exposition, however, the graphs show trimmed reading times before the length-correction procedure, which in all cases showed the same pattern of results as the length-corrected times.

Figure 3. Reading time at the disambiguating region in Experiment 1 separately by question accuracy.

Figure 3 below shows reading times on the disambiguating region broken down by the accuracy of responses to the subsequent questions. For sentences with OPT verbs, the disambiguating region was read 30 ms slower in ambiguous than in unambiguous sentences (β = 29.99, SE = 8.53, t = 3.52), and the same pattern was observed for sentences with RAT verbs, whose disambiguating region was read 28 ms slower in ambiguous than in unambiguous sentences (β = 26.67, SE = 10.11, t = 2.64). Thus, there was a typical ambiguity effect in the disambiguating-region reading times, suggesting that readers were garden-pathed in the ambiguous sentences. The post-disambiguating region, however,
showed no effect of ambiguity for either kind of verb (t’s ≤ 1), suggesting that ambiguity effects did not spill over to the following region.

Given the primary question addressed here, which is whether spending more time on the disambiguating region of garden-path sentences led to more accuracy in responding to questions that probed the initial misinterpretation, what was evaluated in the following statistical analyses was whether disambiguating-region reading times affected subsequent question accuracy. Results showed that for both OPT and RAT verbs, neither the disambiguating-region reading times (z’s < 1, p’s > .1) nor their interaction with ambiguity (z’s < 1, p’s > .1) reliably predicted question responses. The absence of any reliable effects of disambiguating-region reading time on question response accuracy indicates that question responses were not determined by the amount of time readers spent processing the disambiguating region.

2.3. Discussion

The results of Experiment 1 provided no evidence that more time spent reading the disambiguating region led to more correct question responses at the end of the sentence. This pattern of results is rather surprising, given the common assumption in the sentence comprehension literature that slowing down on the disambiguating region of a garden-path sentence indicates not just the detection of a problem but also time spent fixing it, which should then influence responses to questions that specifically probe whether the sentence was successfully reanalysed.

Why did reading times at the disambiguating region not predict question response accuracy? One possibility mentioned earlier is that erroneous responses to the comprehension questions are probably at least sometimes based on inferences that can easily be drawn from the sentence content, rather than just on sentence structure. When a man, a deer, and hunting are mentioned, the natural inference is that the deer is what the man was hunting even though deer is not hunted’s direct object in the sentence structure. Even if they have fully revised the sentence structure, people could answer the questions based on such inferences, with the result that any relationship between the amount of time spent reading the disambiguation region and question responses would be diluted. Indeed, the 51% error rate for questions after unambiguous sentences with OPT verbs provides an estimate of the likelihood of such inferences, since there would seem to be no other reason for responding incorrectly to those. The lower error rate (29%) for questions following unambiguous sentences with RAT verbs is consistent with that interpretation because sentences with those verbs are much less susceptible to the same kind of inference. When a sentence containing a RAT verb does not mention an object, its subject automatically becomes its object, whereas for sentences with OPT verbs, figuring out their unmentioned object requires inference.

Reading times for the region following the disambiguation showed no effect of ambiguity, suggesting that readers were done with whatever additional processing was triggered at the disambiguation in ambiguous sentences. This pattern is consistent with results of eye-tracking studies using similar kinds of sentences (Slattery et al., 2013; Sturt, 2007). It is possible that readers may sometimes successfully complete revising the structure before moving on from the disambiguation, but other times they may never complete a successful revision but move on anyway.

To address whether incorrect responses to questions probing the revision of garden-path sentences are based on easily drawn inferences, it is important to try to discourage such responses. Toward that goal, the questions were revised in Experiment 2 to ask what the sentence explicitly said. For example, for the deer-hunter example the question became Did the sentence explicitly say that the man hunted the deer? The idea was that discouraging responses based on inferences might allow a cleaner relationship between disambiguating-region reading times and question response accuracy to emerge.

3. Experiment 2

Experiment 2 differed from Experiment 1 only in the type of question asked after each sentence. In Experiment 1, non-explicit questions such as Did the man hunt the deer? were asked, while in Experiment 2, explicit questions such as Did the sentence explicitly say that the man hunted the deer? were asked to try to reduce effects of inference.

3.1. Method

3.1.1. Participants

Forty undergraduate students (16 males; mean age 20; range 18–25) at the University of Illinois at Urbana-Champaign participated in Experiment 2. All were native speakers of English, had normal or corrected-to-normal vision, gave written informed consent, and received course credit for taking part.

3.1.2. Materials and design

The same critical sentences were used as in Experiment 1, distributed across two lists according to a Latin
square design. The number of distractors was increased from 92 in Experiment 1 to 120 here, of the following types: (1) 40 ambiguous sentences in which the noun immediately following a verb turns out to be the subject of an embedded sentential complement rather than the direct object of the main clause, along with their unambiguous versions (e.g. The naïve girl believed (that) the urban myth could teach her the real history); (2) 50 sentences with matrix-subordinate clause order in which the noun immediately following the main clause verb is its direct object (e.g. The union leader implied the raise when he met with strikers); (3) 20 sentences with subordinate-matrix clause order like the experimental items, but containing both a direct object and a main clause subject (e.g. While Janis watched the fish she cleaned the tank); and (4) 10 unambiguous sentences with matrix-subordinate clause order (e.g. The mother served the broccoli while the kids banged the table). Distractor types 2–4 were added so that the overall proportion of trials on which the noun immediately following a verb turned out to be its direct object, rather than needing to be reanalysed as the subject of a subsequent clause, was higher than in Experiment 1. (Sentences of distractor type 1 were items for another experiment, not reported here.) For distractor types 2–4, the explicit question targeted various parts of the sentences. Correct answers to those distractors were half Yes half No. All sentences were pseudo-randomised with the constraint that no two critical sentences appeared consecutively. Participants saw items in the same order in all lists. A total of 184 trials were divided into four blocks with 46 sentences each, with each block containing the same number of items of each condition.

3.1.3. Procedures

Procedures in Experiment 2 were the same as in Experiment 1, except that the Yes response option always appeared on the left and the No option on the right, while their positions were randomised in Experiment 1. This change led to shorter question response times overall in Experiment 2 than in Experiment 1.

3.2. Results

Average accuracy for questions following distractors was 84% (range: 70%–97%), which was slightly lower than in Experiment 1. Answering the explicit questions was apparently harder, presumably because it sometimes required participants to suppress their natural inferences. There were three participants who made over 25% errors to distractor items, but the results reported below include them because analyses with and without them yielded the same pattern of results. (All effects were slightly larger when they were excluded.)

3.2.1. Question responses

The most striking difference between the results of Experiments 1 and 2 was a drop in the overall error rate in question responses from 50% in Experiment 1 to 30% in Experiment 2. Using explicit questions apparently succeeded, at least to some extent, in encouraging participants to respond based on what they understood the sentence to say had happened, rather than on inferences they could easily draw from the sentences. Response accuracy and response times for the questions are shown in Figures 4 and 5 below, separately for the two types of verbs.

Figure 4. Error rates for post-sentence question responses in Experiment 2.

Figure 5. [...] numbers of trials contributing to the bars for correct and incorrect question responses.

Compared to Experiment 1, question response error rates decreased for both ambiguous and unambiguous sentences with both verb types, but did so especially for unambiguous sentences. Similar to the Experiment
1 results, logit mixed-effect models revealed reliably more incorrect responses to questions after ambiguous than unambiguous sentences for both verb types (OPT: 29%; β = 2.24, SE = 0.18, z = 12.25, p < .001; RAT: 28%; β = 2.18, SE = 0.60, z = 3.66, p < .001), as in Experiment 1.

Question response times in Experiment 2 were about 250 ms faster overall than in Experiment 1 because the Yes and No response options were always presented in the same positions on the screen in Experiment 2, while their positions were randomised in Experiment 1. As in Experiment 1, there was a reliable interaction between ambiguity and response accuracy for question response times for both verb types (OPT: β = 235.81, SE = 89.69, t = 2.63; RAT: β = 461.39, SE = 90.94, t = 5.07), but it was in many ways opposite the interaction observed in Experiment 1. In Experiment 2 for both verb types, response times were slower for incorrect than correct responses, though reliably so only for the unambiguous sentences (OPT: 277 ms; β = 208.79, SE = 65.92, t = 3.17; RAT: 430 ms; β = 402.67, SE = 68.60, t = 5.87; ambiguous condition ts < 2).

3.2.2. Sentence reading times

Data trimming and analyses procedures were the same as for Experiment 1. Removing word-by-word reading times faster than 100 ms or slower than 2000 ms led to the loss of 1% of the data. Removing reading times for trials on which participants failed to respond to the comprehension question by the deadline eliminated 0.2% of the data. Replacing reading times that were above or below 2.5 SD away from the mean with the cut-off values for each participant affected 3% of the data.

Reading times at the disambiguating region broken down by question response accuracy are illustrated in Figure 6 below. Disambiguating region reading times showed a bigger effect of ambiguity than in Experiment 1 for both types of verbs, with 50 ms longer reading times for ambiguous than unambiguous sentences with OPT verbs (β = 53.99, SE = 9.75, t = 5.54) and 55 ms longer for ambiguous than unambiguous sentences with RAT verbs (β = 58.62, SE = 10.58, t = 5.54), compared to 30 and 28 ms in Experiment 1. The larger ambiguity effect suggests that the explicit questions led people to read the ambiguous sentences more carefully.

Figure 6. Reading time at the disambiguating region in Experiment 2 separately by question accuracy. Percentages reflect the unequal numbers of trials contributing to the bars for correct and incorrect question responses.

The main question in these studies concerns the relationship between disambiguating-region reading times and post-sentence question responses, and Experiment 2 did find such a relationship, while Experiment 1 did not. In Experiment 2, there was an effect of disambiguating-region reading time on question response accuracy in sentences with OPT verbs (β = 0.29, SE = 0.08, z = 3.40, p < .001), with incorrect question responses in both the ambiguous and unambiguous conditions associated with longer reading times, as shown in Figure 6. (In Experiment 1, there was a non-reliable trend in the same direction.) However, reading times at the disambiguating region of items with RAT verbs were not reliably related to question response accuracy (β = 0.06, SE = 0.13, z = 0.45, p > .05), although they showed the same numeric trend as items with OPT verbs. Crucially, for both verb types, there was still no interaction between ambiguity and disambiguating-region reading time on question response accuracy (OPT: β = 0.21, SE = 0.16, z = 1.27, p > .05; RAT: β = 0.01, SE = 0.25, z = 0.03, p > .05).

Another way that the results of Experiments 1 and 2 differed was that the effect of ambiguity on reading times persisted into the post-disambiguating region for both verb types in Experiment 2 (OPT: 19 ms; β = 22.25, SE = 10.61, t = 2.10; RAT: 30 ms; β = 35.08, SE = 14.41, t = 2.43). Thus, the explicit questions seem to have led to more persistence of the ambiguity effect in reading times. However, for both verb types, the reading times at the post-disambiguating region still had no effect on question response accuracy (ps > .05).

3.3. Discussion

There were several important differences between the results of Experiments 1 and 2. First, the overall error rate in question responses decreased substantially, from 50% in Experiment 1 to 30% in Experiment 2, suggesting that the explicit questions had the desired effect of reducing responses based on easily-drawn inferences. The decrease was bigger for unambiguous sentences, leading to a bigger effect of ambiguity on question response accuracy in Experiment 2. There was
also a bigger effect of ambiguity on reading times at both the disambiguating and post-disambiguating regions in Experiment 2 than in Experiment 1. The explicit questions clearly led people to both read the sentences more carefully and rely more on what the sentence actually said had happened in responding to the questions. Both Christianson and Luke (2011) and Swets et al. (2008) have shown similar effects of question type on both online reading times and offline comprehension accuracy. However, in spite of this, there was still very little relationship between reading times at the disambiguating region and question responses. It is not that there was no relationship at all between reading time and question response accuracy in Experiment 2, as was the case in Experiment 1. In Experiment 2, reading time on the disambiguating region did reliably predict question response accuracy, but the direction of the effect was opposite that predicted by the Incomplete Reanalysis version of the Good-Enough Processing Account. Instead of being more likely to answer the question correctly when they spent longer reading the disambiguating region, which might index more work done to reanalyse the garden path, people were actually less likely to respond correctly on trials where they spent longer on the disambiguation, suggesting that they could have been more confused overall on those trials. Furthermore, in none of the analyses of question response accuracy in either study has there been any interaction between ambiguity and reading time at the disambiguation, which is what should happen according to the Incomplete Reanalysis version of the Good-Enough Processing Account. Time spent reading the disambiguating region specifically in the ambiguous sentences is what should reflect amount of reanalysis work, which should lead to an interaction between ambiguity and reading time in predicting question response accuracy, but there was no hint of any such interaction in either study. Thus, there is no evidence from the reading time studies to support a claim that people should be more likely to respond to the questions correctly if they spend more time reanalysing garden path sentences, even in Experiment 2 where responding based on easily-drawn inferences was reduced.

This line of reasoning assumes that time spent reading the disambiguating region indexes the amount of work done to reanalyse garden path sentences. It is clear, though, that reading times are influenced by many other factors in addition to reanalysis. The fact that longer reading times at the disambiguation were associated with more errors in the question responses in Experiment 2 suggests that longer reading times may sometimes index less success in coming to an interpretation of the sentence. Perhaps when readers slowed down at the disambiguation, it was at least sometimes because they had detected that something was wrong but couldn’t figure out how to fix it. Such an explanation should predict longer reading times on the post-disambiguating region as well and that was true in Experiment 2 but not in Experiment 1.

Because reading times on the disambiguating region of garden-path sentences are likely to be influenced by factors other than reanalysis, it is worth testing the Good-Enough Processing Account using a different kind of measure that may be less influenced by the many factors that affect reading times. A good candidate is the P600 component of the event-related brain potential (ERP), which is believed to be more specifically responsive to structure processing in sentences.

4. Experiment 3

Event-related brain potentials (ERPs) may provide a more focused tool for examining the predictions of the Good-Enough Processing Account. In particular, the P600 component may be useful because it is believed to specifically index structure processing. In sentences like the ones used in Experiments 1 and 2, P600 should be elicited by the disambiguating verb, and its amplitude may be related to the amount of work required to reanalyse the garden path.

There is currently some controversy regarding what the P600 component indexes, but all of the accounts involve structural processing of some kind. The P600 has been interpreted as reflecting syntactic reanalysis of garden-path sentences (Osterhout & Holcomb, 1992, 1993, see also Osterhout et al., 1994), repair of syntactic violations in sentences (Friederici, 1998), and syntactic integration in structurally complex sentences (Kaan et al., 2000). Interpreting the processes underlying different language-sensitive ERP components has been complicated by recent findings of P600 effects when N400 effects were expected instead. The studies finding these “semantic P600” effects have all used sentences in which there are subject and object nouns that would be plausible arguments for the verb, but they appear in the wrong position or with the wrong morphosyntactic markers for the role that would be plausible for them (Kolk et al., 2003; Kuperberg et al., 2006; van Herten et al., 2006, 2005). Although the semantic P600 results have raised very interesting questions about the interplay of meaning and structure in sentence comprehension, all of the accounts agree that P600 reflects something about determining and using sentence structure as part of interpreting a sentence (Gouvea et al., 2010; Hagoort et al., 1993; Hahne & Friederici, 1999; Kaan & Swaab, 2003; Kuperberg, 2007; Osterhout &
Holcomb, 1992). Thus, P600 effects at the disambiguating word in garden-path sentences may provide a more specific measure of the work required to successfully reanalyse garden-path sentences, and thus might better predict responses to questions specifically probing whether reanalysis succeeded.

In Experiment 3, we made use of the P600 component to try to specifically examine the relationship between the amount of syntactic reanalysis work at the disambiguating verb and the likelihood of incorrect question responses. The prediction was that bigger P600 at the disambiguating verb should be associated with more correct responses after the sentences.

4.1. Method

4.1.1. Participants

Participants were 64 undergraduate students at the University of Illinois at Urbana-Champaign. All were native speakers of English, were strongly right-handed as assessed by the Edinburgh inventory (Oldfield, 1971), had normal or corrected-to-normal vision, and reported no neurological or psychiatric disorders. All gave written informed consent and received course credits or payment for taking part. Data from six participants were excluded from analysis due to technical problems with data collection or excessive loss of trials to artifacts. Four additional participants were excluded from data analyses because their response accuracies to distractors were below 75%. Data analyses were conducted on the remaining 54 participants (26 males, mean age 19.3; range 18–22).

4.1.2. Materials and design

Critical sentences in Experiment 3 were exactly the same as in Experiments 1 and 2, and the distractors were those from Experiment 2. The questions asked at the end of the sentences were the non-explicit versions, such as Did the man hunt the deer?, as in Experiment 1. Sentences were distributed over two lists according to a Latin square design, and were presented to all participants in the same order as in Experiment 2.

4.1.3. Procedures

Participants were seated comfortably in a dimly lit and sound-attenuated booth in front of a 23-inch LCD monitor. Each trial began with a fixation point, which stayed in the centre of the screen for 500 ms. Because eye movements cause artifacts that contaminate the EEG signal, sentences were presented word-by-word at the centre of the screen in 26-point white Arial font on a black background, at a rate of 400 ms per word (300 ms text, 100 ms blank screen).

After each sentence, a comprehension question was presented (e.g. Did the man hunt the deer?), with Yes and No response choices presented below the comprehension question on the screen. Responses were indicated by pressing one of two buttons on a Cedrus RB-830 response box, and as in Experiment 2 (but not Experiment 1), the response options appeared in the same locations on the screen throughout the session (Yes choices were presented on the left and No choices on the right). A “Too Slow” warning was presented if no response was made within four seconds. Feedback was not given regarding response accuracy. Stimulus presentation was controlled by the Presentation® software package. Each list was divided into four blocks. Participants were given a short break after each block and were instructed to try to minimise blinking and body movement during the presentation of the sentences. They were encouraged to blink between trials when necessary. A practice block of five trials was given at the beginning. The recording session lasted about 45 min and the entire session lasted approximately two hours.

4.1.4. EEG recording and data analysis

Continuous EEG was recorded from 27 Ag/AgCl sintered electrodes placed in an elastic cap (EasyCap, 10–10 system; Chatrian, Lettich, & Nelson, 1985), referenced online to the left mastoid and re-referenced offline to the average of left and right mastoids: midline: Fz, Cz, Pz; lateral: AF3/4, F3/4, F7/8, FT7/8, FC3/4, C3/4, T3/4, CP3/4, T5/T6, P3/4, P5/6, PO7/8. Eye blinks and eye movements were detected with electrodes above and beneath the right eye and at the outer canthi of both eyes. EEG and EOG recordings were amplified by a Grass Model 12 amplifier and sampled at a frequency of 200 Hz. A .01–30 Hz analog bandpass filter was applied during online recording. Impedances were maintained below 5 kΩ.

Epochs were extracted from the continuous waveforms from 100 ms before the onset of the disambiguating verb through 1100 ms later. Trials contaminated with artifacts during this epoch were rejected using the ERPLAB toolbox (Lopez-Calderon & Luck, 2014). Blinks and eye movements were detected using a moving window peak-to-peak function on the EOG channels, and non-ocular artifacts were identified using the same moving window peak-to-peak function applied to the EEG channels, with individualised thresholds determined by visual inspection of each participant’s data. Data were excluded from further analyses if artifact rejection led to a loss of over 30% of the data in any of the conditions. Epochs contaminated with artifacts were discarded, leading to an average loss of 11.5% of the data, which
did not differ across conditions (OPT: ambiguous 11.5%, unambiguous 11.6%; RAT: ambiguous 10.9%, unambiguous 12.0%).

Mean amplitudes were calculated for each channel in each condition for each participant for the 600–900 ms time window to capture the P600 component, and were submitted to two sets of repeated-measures analyses of variance. One set of analyses included all lateral electrodes and another included just midline electrodes. The ANOVA including all lateral electrodes had four within-subject factors: two levels of ambiguity (Ambiguous, Unambiguous), two levels of question response accuracy (Correct, Incorrect), three levels of electrode site anteriority (Frontal, Central, Posterior) and two levels of electrode site laterality (Left, Right). The ANOVA including just midline electrodes (Fz, Cz, Pz) consisted of the same within-subject factors except that there was no laterality factor. When interactions with electrode site in the omnibus ANOVAs motivated further analyses, analyses were conducted on six regions of interest (ROIs), each comprising four electrodes: left anterior (AF3, F3, F7, FT7), right anterior (AF4, F4, F8, FT8), left central (FC3, C3, CP3, T3), right central (FC4, C4, CP4, T4), left posterior (P3, T5, P5, PO7) and right posterior (P4, T6, P6, PO8). When there were no interactions involving the laterality factor, it was collapsed over and analyses were conducted on three ROIs: anterior (AF3, F3, F7, FT7, AF4, F4, F8, FT8), central (FC3, C3, CP3, T3, FC4, C4, CP4, T4) and posterior (P3, T5, P5, PO7, P4, T6, P6, PO8). Analyses within ROIs included two within-subject factors: two levels of ambiguity and two levels of question accuracy. The Greenhouse-Geisser correction was applied wherever necessary to correct for violations of sphericity (Greenhouse & Geisser, 1959). Corrected p-values and original degrees of freedom are reported. Grand average ERPs were digitally low-pass filtered at 10 Hz to smooth the waveforms for display, but analyses were performed before such filtering was applied.

ERP data were analyzed using analysis of variance (ANOVA) rather than mixed effects models for purely pragmatic reasons. The EEGLAB and ERPLAB analysis software packages assume that what will be submitted to statistical analyses is subject/condition means rather than individual trials, which is consistent with ANOVA but not with mixed effects models, since those are typically done on individual trials. It is not impossible to use mixed effects models to analyze single-trial ERP data, but it is substantially more difficult to get the data into the required form, so that task has been postponed for now.

4.2. Results

4.2.1. Question responses
The average comprehension accuracy for distractors was 91%, suggesting that participants were attending to the sentences. The analysis procedures for question response accuracy and response time were the same as in Experiments 1 and 2. Response accuracy and response time for the questions are shown in Figures 7 and 8 below. As in Experiments 1 and 2, there were more incorrect responses for the ambiguous than unambiguous sentences for both verb types (OPT: 18%; β = 0.85, SE = 0.21, z = 4.06, p < .001; RAT: 19%; β = 1.19, SE = 0.23, z = 5.22, p < .001). The overall error rate was similar to Experiment 1, which used the same non-explicit questions.

Question response times were 595 ms slower overall than in Experiment 1 and 925 ms slower overall than in Experiment 2 because of differences in the timing of the presentation of the Yes/No response options on the
screen. In Experiment 3, the question and the response options appeared simultaneously, so response times included both reading the question and responding to it. In contrast, in Experiments 1 and 2 the response options appeared 2 sec after the question appeared, so response times did not include time spent reading the question. As in Experiments 1 and 2, there was an interaction between ambiguity and response accuracy for question response times for both verb types (OPT: β = 145.91, SE = 53.32, t = 2.74; RAT: β = 149.07, SE = 73.72, t = 2.02). The interaction here was similar to that found in Experiment 2, with incorrect responses slower than correct ones, especially for unambiguous items.

Figure 7. Error rates for post-sentence question responses in Experiment 3.

… numbers of trials contributing to the bars of correct and incorrect question responses.

4.2.2. ERPs
ERP results are reported below only for the items with OPT verbs because not enough participants had enough trials available in every cell for a meaningful analysis of items with RAT verbs. This was more of a problem for items with RAT verbs than for those with OPT verbs because there were both fewer trials/condition for RAT verbs (12 vs 20) and fewer incorrect responses for items with RAT verbs. Only 23% of the items with RAT verbs were responded to incorrectly, so many participants had very few or even no trials contributing to that cell.¹

Visual inspection revealed that the waveforms for the disambiguating verb (e.g. ran) in items with OPT verbs were more positive during the P600 time window (600–900 ms) in the ambiguous than the unambiguous condition, as illustrated for sentences with OPT verbs in Figure 9. As is typical for P600, the effect was centroparietally distributed. ANOVAs over all lateral electrodes revealed a main effect of ambiguity, F(1,53) = 4.45, p < .05, and an interaction between ambiguity and the anteriority of electrode location, F(2,106) = 11.61, p < .001, which arose because the P600 effect was significant at central sites, F(1,53) = 6.58, p = .01, and posterior sites, F(1,53) = 15.91, p < .001, but not at frontal sites, F < 1. ANOVAs over the midline electrodes showed the same pattern, with a main effect of ambiguity, F(1,53) = 5.90, p < .05, and an interaction between ambiguity and anteriority, F(2,106) = 8.18, p < .001, because the P600 effect was significant at Cz, F(1,53) = 6.72, p = .01, and Pz, F(1,53) = 11.77, p = .001, but not at Fz, F < 1.

When the waveforms were broken down by question response accuracy, visual inspection revealed that the P600 ambiguity effect for sentences with correct responses did not appear to differ from that for sentences with incorrect responses, as shown in Figure 9B and C (ambiguous correct mean voltage: 1.35 μV; ambiguous incorrect: 1.28 μV; unambiguous correct: 0.48 μV; unambiguous incorrect: 0.46 μV). This observation was confirmed by statistical analysis. In the ANOVAs over all lateral electrodes, over the midline electrodes, and over just the centroparietal electrodes where P600 effects tend to be maximal, there were no main effects nor interactions involving question response accuracy, all Fs < 1. Additional analyses conducted at narrower time windows of 600–700, 700–800, and 800–900 ms, which were done in the hope that finer-grained analyses would reveal a relationship between the amplitudes of the P600 at the disambiguation and question response accuracy, showed the same results. The P600 ambiguity effect was significant at the central and posterior regions (Fs > 4.79, ps < .03), but not at the frontal region, Fs < 1, and the P600 effect did not interact with question response accuracy in any analysis (Fs < 3.18, ps > .08).

Visual inspection also revealed that there was a Left Anterior Negativity (LAN) effect, with trials followed by incorrect responses being more negative than those followed by correct responses in the left hemisphere between 400 and 600 ms (see Figure 10). The ANOVA over all lateral electrodes showed a marginally significant main effect of response accuracy, F(1,53) = 3.56, p = .06, and a marginally significant interaction between response accuracy and laterality, F(1,53) = 3.92, p = .052, which arose because the response accuracy effect was significant at the left anterior region, F(1,53) = 4.23, p < .05, and the left central region, F(1,53) = 6.02, p < .05, but not at other regions, Fs < 1.95, ps > .17. Crucially, there was no interaction between ambiguity and question response accuracy, Fs < 1, suggesting that whatever mechanism the LAN reflected that led to correct vs. incorrect question responses had nothing to do with ambiguity. Thus there was no evidence that when readers answered the question following a garden-path sentence correctly, it was because they had successfully reanalysed the sentence structure by working hard at the disambiguating verb.

4.3. Discussion
The results of Experiment 3 indicated that in sentences with OPT verbs, the disambiguating verb (e.g. ran) in the ambiguous condition triggered a larger P600 than in the unambiguous condition, suggesting that syntactic reanalysis took place at the disambiguation. However, there was no relationship between the amplitude of the P600 effect at the disambiguation and responses to the post-sentence questions, consistent with the reading time data in Experiments 1 and 2. If P600 amplitude indexes reanalysis effort and question response accuracy reflects whether or not the initial misinterpretation was revised at the disambiguation, this result is inconsistent with the Incomplete Reanalysis explanation
of lingering misinterpretation in the Good-Enough Processing Account, just as the reading time data in Experiments 1 and 2 were.

Figure 9. (A) Grand average ERPs for the disambiguating verb at all electrodes in ambiguous and unambiguous sentences with OPT verbs in Experiment 3, baselined on 100 ms before the onset of the disambiguating verb. Y-axis position indicates onset of the disambiguating verb. ERPs averaged across electrodes in the centroparietal region with (B) correct question responses and (C) incorrect question responses.

It is not yet clear what the LAN component indexes, since a wide range of phenomena have been found to elicit it, including morphosyntactic agreement violations (Coulson, King, & Kutas, 1998; Gunter, Friederici, & Schriefers, 2000; Osterhout & Mobley, 1995), word concreteness and imageability (Gullick, Mitra, & Coch, 2013; Holcomb, Kounios, Anderson, & West, 1999; Kounios & Holcomb, 1994; Lee & Federmeier, 2008; Zhang, Guo, Ding, & Wang, 2006), working memory load related to complex structure (King & Kutas, 1995; Kluender & Kutas, 1993; Weckerly & Kutas, 1999), “frame-shifting” in processing non-literal language (Coulson & Kutas, 2001), and lexical (Lee & Federmeier, 2009; Wlotko & Federmeier, 2011, 2012) or referential (Nieuwland, Otten, & Van Berkum, 2007) ambiguity. The interpretation of LAN has become more complicated recently because it appears that an anterior negativity can sometimes result from superposition of partially overlapping N400 and P600 effects that cancel each other out to varying degrees at different scalp sites, depending on the amplitude of each component (Tanner, 2015). In ambiguous sentences with OPT verbs, when it becomes clear at the disambiguating verb (ran) that the deer has to be its subject, that leaves hunted with no specified object. Under these circumstances where the sentence does not say what the man hunted, those trials that received a more thorough analysis of the plausibility of the deer as the object of hunted, given that no other object is available, led to an increase in amplitude of the N400 component, which could emerge as a LAN when the N400 was cancelled out by the P600 at the centroparietal region. Another possibility is that the LAN effect in Experiment 3 reflected the effort of retrieving the deer and hunted from working memory so that plausibility could be evaluated.
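The superposition account can be made concrete with a toy calculation. The sketch below uses purely hypothetical amplitudes: the site names (F3, C3, P3) come from the recording montage, but the microvolt values are invented for illustration and are not estimates from this study. It shows how a broadly distributed N400 and a posterior-maximal P600 that overlap in time can cancel at posterior sites while leaving a net negativity at anterior sites.

```python
# Toy illustration of component superposition (the account attributed to Tanner, 2015).
# All amplitudes are hypothetical values in microvolts, chosen only to show the logic.

# Assumed N400 effect: negative-going and broadly distributed over the scalp.
n400 = {"F3": -1.0, "C3": -1.5, "P3": -1.5}

# Assumed P600 effect: positive-going and largest at posterior sites.
p600 = {"F3": 0.0, "C3": 1.0, "P3": 1.5}


def net_effect(neg, pos):
    """Sum two temporally overlapping components site by site."""
    return {site: neg[site] + pos[site] for site in neg}


observed = net_effect(n400, p600)
for site, amp in observed.items():
    print(f"{site}: {amp:+.1f} uV")
```

With these assumed values the summed effect is −1.0 μV at F3, −0.5 μV at C3 and 0 μV at P3, so the only visible negativity is anterior even though the underlying N400 is largest centrally and posteriorly.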
Figure 10. Grand average ERPs for the disambiguating verb at all electrodes in ambiguous and unambiguous sentences with OPT verbs in Experiment 3, separately by question response accuracy and baselined on 100 ms before the onset of the disambiguating verb. Y-axis position indicates onset of the disambiguating verb.
The above interpretation of the LAN is consistent with a finding that has recently been reported at a conference but not yet published. Oines and Kim (2014) asked participants to read role-reversal sentences that typically elicit the “semantic P600” effect, which was introduced briefly earlier. Sentences like The hearty meal was devouring … would be expected to elicit an N400 effect at devouring because it is nonsensical to say that a meal is devouring something, but a P600 effect has been observed instead. Oines and Kim asked participants to perform one of two tasks while reading these types of sentences. In the structural repair task, they were asked to figure out how to fix the structure of the sentences so that they made sense, while in the semantic integration task, they were asked to try very hard to figure out the meanings of the sentences, given their structure. The structural repair group showed a P600 effect while the “semantic integration” group instead showed a LAN. The LAN was interpreted as reflecting the need to retrieve word order information from working memory, since that is what determines the role meal plays in the devouring event. In Oines and Kim’s study, the LAN was elicited when word order was a crucial factor in determining the role of the noun with respect to the verb. Similarly, in Experiment 3, LAN was elicited when the ambiguous noun and the subordinate verb were retrieved from working memory so that responses to comprehension questions could be decided on. This retrieval process was needed for answering questions following both ambiguous and unambiguous sentences, and thus there was no difference in the amplitude of the LAN between ambiguous and unambiguous versions. The direction of the LAN effect was consistent with reading time data in Experiment 2, with longer reading time or bigger LAN at the disambiguating verb leading to more incorrect responses to comprehension questions.

In summary, the results of Experiments 1, 2, and 3 all converge to show that measures that have long been assumed to index the amount of work done to reanalyse garden-path sentences at their disambiguation do not predict the accuracy of responses to post-sentence questions that specifically probe the success of such reanalysis, even when answering the questions based on inference was reduced in Experiment 2 and when what has been believed to be a more specific measure of structural reanalysis, the P600, was used in Experiment 3.

5. Experiment 4

Experiments 1–3 converged to show that unsuccessful or incomplete reanalysis is not the primary reason for incorrect responses to post-sentence questions that are specifically intended to probe reanalysis success. The results confirm two previous reports of no relationship between online reading time measures and question
responses (Christianson & Luke, 2011; Wonnacott et al., 2016), and extend them by 1) bringing more power to bear in the search for such a relationship, specifically for the type of sentence used most widely in the studies leading to the Good Enough Processing Account, 2) modifying the questions to try to eliminate inference-based responses, and 3) using the P600 ERP component to try to restrict the online measure more specifically to syntactic reanalysis. None of those modifications led to the emergence of a relationship between the online processing measures and question responses. The obvious next question is what does determine the responses to such questions? Given the fundamental role that incorrect responses to such questions have played in the development of the Good Enough Processing Account, it is an important question.

In an early study contributing to the development of the Good Enough Processing Account, Christianson et al. (2001) found that question responses were affected by plausibility in addition to ambiguity and ambiguous-region length. They manipulated the plausibility of the critical noun as the direct object of the subordinate clause verb by using different sentence completions, as illustrated in (20) below. A deer that is pacing in a zoo is less plausible as the object of hunted than one that is running into the woods, and people answered Yes to Did the man hunt the deer? 7%–13% less often after the pacing-in-the-zoo version, though they still said Yes 31%–44% of the time.

(20) Plausible: While the man hunted the deer that was brown and graceful ran into the woods.
     Implausible: While the man hunted the deer that was brown and graceful paced in the zoo.

The authors argued that people were more likely to pursue reanalysis when the initial interpretation became implausible, which in turn led to more correct question responses. However, another possibility is that people were less likely to answer Yes to a question about an implausible event regardless of whether they had successfully reanalysed the sentence. In an eyetracking study, Jacob and Felser (2016; following Sturt, 2007) took the plausibility manipulation a step further by constructing sentence continuations whose meaning made it impossible, rather than just unlikely, for the critical noun to remain a direct object, as in While the gentleman was eating(,) the burgers that were really huge were still being heated in the microwave. Despite the impossibility of eating burgers that are still in the microwave, native English speakers answered Yes to Did the gentleman eat the burgers? approximately 40% of the time (similar to the result in Christianson et al., 2001, for implausible sentences).

Although Jacob and Felser measured both reading times and question responses, they did not attempt to relate them to one another trial-by-trial, probably because 5 trials/condition would not allow it. Nakamura and Arai (2016) also measured both sentence reading times and question responses but also did not attempt to relate them trial-by-trial, again probably because 6–8 trials/condition precluded doing so. Their study investigated native Japanese speakers reading Japanese sentences containing locally ambiguous relative clauses, which are similar to the English sentences with RAT verbs in that there is no ambiguity about which nouns fill which roles by the end of the relative clause. They also presented post-sentence questions in a form that was less likely to reactivate the initial misinterpretation. Their goal was to test whether Japanese speakers retained an initial interpretation after it was contradicted by subsequent words, even though there was no remaining ambiguity about which nouns were the objects of the verbs and thus no need to infer an unspecified argument. They found evidence that the initial misinterpretation did linger and argued that previous results in English are not entirely due to inferences drawn in the absence of a specified object or to reactivation of the initial misinterpretation by the question.

Another aspect of Nakamura and Arai’s study is particularly relevant for ours, which is that they normed the plausibility of their items and examined the effect of item-by-item variation in plausibility on both reading times and question responses. In a meta-analysis across two self-paced reading studies, they found reliable graded effects of plausibility on reading times at the critical sentence region and also a marginal effect of plausibility ratings on question responses, with items whose initial misinterpretation was rated as more plausible more likely to elicit incorrect Yes responses. That is, when the plausibility of the initial misinterpretation was higher, people were more likely to retain it. Although plausibility was not explicitly manipulated in our studies, it became apparent across the experiments that some items rarely elicited incorrect Yes responses (e.g. The question Did the caricaturist draw the child? after While the caricaturist drew the child who was freckled and talkative stood on the sidewalk was responded to incorrectly only 27% of the time), while others frequently did so (e.g. The question Did the skipper sail the boat? after While the skipper sailed the boat that was small and leaky veered off course was responded to incorrectly 87% of the time). Thus, it seemed that sentences varied in how much they led people to think that an event
had been described in which the temporarily ambiguous noun still played the role of the theme of the subordinate clause verb even though it had turned out not to be its direct object in the sentence structure. Experiment 4 attempted to assess that for the whole sentence and Experiment 5 attempted to do so for particular subcomponents of the sentence. In Experiment 4, the experimental sentences used in Experiments 1–3 were presented as whole sentences followed by a question asking participants to judge how likely it was that the temporarily ambiguous noun was the direct object of the verb in the event described in the sentence. So, after reading While the man hunted the deer that was brown and graceful ran into the woods, participants were asked How likely is it that the man hunted the deer?

5.1. Method

5.1.1. Participants
Fifty undergraduate and graduate students (28 males; mean age 20; range 18–28) at the University of Illinois participated in Experiment 4. All were native speakers of English, had normal or corrected-to-normal vision, gave written informed consent, and received course credit or payment for taking part.

5.1.2. Materials and procedures
Materials were the same ambiguous and unambiguous sentences with OPT and RAT verbs that were used in Experiments 1–3. Ambiguous and unambiguous versions of each item were distributed across two lists according to a Latin square design, so that no participant saw both versions of the same sentence.

Whole sentences were presented on the computer screen. Following each sentence, participants were asked to give a percentage rating from 0% to 100% to questions such as How likely is it that the man hunted the deer? Sentence order was randomised for each participant. Item-by-item mean likelihood ratings were obtained by averaging across participants and those were then entered as a fixed effect into new logit mixed effect models reanalysing the results of Experiments 1–3 to determine whether they predicted the question response accuracy in those studies.

5.2. Results

Statistical analysis of the mean likelihood ratings themselves for each item averaged across all participants showed a main effect of verb type, F(1, 124) = 88.46, p < .001, with items with OPT verbs rated 25% more likely than those with RAT verbs (OPT 69%, RAT 46%; F(1, 124) = 48.27, p < .001). Ambiguous sentences were also rated as 17% more likely than unambiguous sentences (Ambiguous 69%, Unambiguous 52%), and there was also an interaction between ambiguity and verb type, F(1, 124) = 8.67, p < .01, because the difference between ambiguous and unambiguous sentences was bigger for items with RAT verbs (26%) than for those with OPT verbs (11%, OPT: Ambiguous 75% vs Unambiguous 64%; RAT: Ambiguous 59% vs Unambiguous 33%).

Logit mixed-effect models were used to evaluate the relationship between likelihood ratings and question responses in Experiments 1–3, by including likelihood rating as a fixed effect. In addition, since the analysis of the ratings showed that they were affected by ambiguity, and question responses were also affected by ambiguity in Experiments 1–3, ambiguity was included as another fixed effect in the models so that the relationship between likelihood ratings and question responses could be evaluated separately from the effect that ambiguity had on both of them. The initial models all included the interaction between ambiguity and likelihood rating, but because this interaction was not significant for any of the three experiments, it was removed from the models. The results showed overall that likelihood ratings were reliably related to question responses, such that questions after items that were rated more likely were also more likely to be given incorrect Yes responses. Analyses were conducted separately for items with OPT and RAT verbs, but the verb types are shown collapsed together in Figure 11 for each experiment because effects were generally the same for both verb types.

For the Experiment 1 results, question responses for items with both verb types were reliably predicted by both ambiguity (OPT: β = .78, SE = .24, z = 3.23, p = .001; RAT: β = 1.09, SE = .52, z = 2.07, p < .05) and likelihood ratings (OPT: β = .39, SE = .11, z = 3.45, p < .001; RAT: β = 1.09, SE = .52, z = 2.07, p < .05), with ambiguous items and items with higher likelihood ratings both more likely to receive incorrect Yes responses, as shown in Figure 11. The results for Experiment 2 showed a similar pattern, with more incorrect Yes responses for ambiguous items for both verb types (OPT: β = 1.71, SE = .58, z = 2.95, p < .01; RAT: β = 1.70, SE = .74, z = 2.31, p < .05) and for items with higher likelihood ratings for OPT verbs (β = .02, SE = .01, z = 2.38, p < .05), but the effect of likelihood ratings was not reliable for items with RAT verbs (β = .02, SE = .02, z = 1.42, p > .1). In Experiment 3, there were again more incorrect Yes responses both for ambiguous items with both verb types (OPT: β = .49, SE = .23, z = 2.16, p = .03; RAT: β = .65, SE = .33, z = 1.95, p = .05) and for items with higher likelihood ratings for both verb types (OPT: β = .03, SE = .01, z = 3.71, p < .001; RAT: β = .02, SE = .01, z = 2.12, p < .05).
Figure 11. Scatterplots showing the relationship between the percentage of incorrect question responses an item received and the item-by-item likelihood ratings in Experiments 1, 2, and 3, collapsing over items with OPT and RAT verbs.
Overall, results for items with both OPT and RAT verbs indicated that ambiguity and the likelihood ratings had separable effects on how readers answered the questions after the sentences. Most importantly, although ambiguity affected both the likelihood ratings themselves and the question responses, there were still effects of likelihood ratings once ambiguity effects were taken into account.

It is interesting to note that the effect of likelihood rating was smaller for question responses in Experiment 2, when inference-drawing was discouraged by asking explicit questions, as can be seen in Figure 11. We interpreted the decrease in incorrect question responses in Experiment 2 as due to a reduction in the number of question responses based on easily drawn inferences. Since the likelihood ratings obtained in Experiment 4 specifically assessed the likelihood of drawing the inferences we believed to be influencing question responses in the other studies, it is not surprising that the relationship between the ratings and question responses was much weaker when inference-drawing was minimised in Experiment 2. The failure of likelihood ratings to reach reliability in predicting question responses for items with RAT verbs in Experiment 2 was likely due to a combination of the explicit questions used in Experiment 2 and the properties of RAT verbs, which also make inference-drawing less likely because their subject automatically becomes their object when no other object is mentioned. Thus, the rated likelihood that Anna dressed the baby is less likely to influence responses to the question Did the sentence explicitly say that Anna dressed the baby?

Additional analyses were conducted to determine whether the reading times at the disambiguating region in Experiments 1 and 2 and the P600 and LAN amplitudes in Experiment 3 were affected by the same factors that determined the likelihood ratings. For items with both OPT and RAT verbs in Experiment 1, which used non-explicit questions, likelihood ratings did not correlate with residual reading times at the disambiguation (OPT: r = .02; RAT: r = .04; ps > .1). In Experiment 2, however, when explicit questions were used, there were very small but reliable correlations between likelihood ratings and residual reading times (OPT: r = .07, t = 2.34, df = 1193, p < .05; RAT: r = .07, t = 2.00, df = 853, p < .05). In Experiment 3, which used the
same non-explicit questions as in Experiment 1, item-by-item P600 amplitude at the centroparietal region did not correlate with likelihood ratings for either type of verb (OPT: r = −.07; RAT: r = .07; ps > .1). However, there was a marginally significant correlation between likelihood ratings and amplitude of the LAN at left anterior and left central regions for sentences with OPT verbs (r = −.21, t = −1.89, df = 78, p = .06). Further analysis showed that this correlation reached significance in unambiguous sentences (r = −.31, t = −2.01, df = 38, p = .05). The more likely it was for an event to happen, the more negative the LAN component was, which confirmed our earlier interpretation of the LAN as indexing the amount of effort spent on retrieving the ambiguous noun the deer and the subordinate clause verb hunted from working memory in order to evaluate the plausibility of the ambiguous noun as the direct object of the subordinate verb. No such correlation was found with RAT items (p > .1), probably because the small number of RAT sentences was insufficient to average out the amount of noise that is commonly present in ERP studies.

Overall, correlational analyses showed that there was either a very small (rs ≤ .07) or no relation between likelihood ratings and disambiguating region reading times/P600 amplitude. This is consistent with the finding across experiments that likelihood ratings predicted question response accuracy in most cases but reading times at the disambiguating verb did not. The correlation between the amplitude of the LAN and likelihood ratings suggests that event likelihood is indeed taken into account during on-line processing of these sentences. Thus, likelihood ratings of the events described in the sentences were a better predictor of question responses than reading times at the disambiguating verb were. Again, it is interesting to note the small but reliable relationship between the ratings and reading times at the disambiguation in Experiment 2. It seems that when participants knew an explicit question was coming, the more likely the event described in the sentence was, the more time they spent reading the disambiguating region that told them they should answer No.

To summarise, Experiment 4 was conducted to investigate whether or not the likelihood of the events described in the garden-path sentences predicted question accuracy. The goal was to try to determine whether item-specific properties would predict question responses better than the online processing measures at the disambiguating region did. In retrospect, it should not be surprising that the likelihood ratings from Experiment 4 were so successful at predicting the question responses in Experiments 1–3, as in all cases the question was asked after the sentence was read, and the likelihood rating questions were actually quite similar to the post-sentence questions used in the other studies. The likelihood questions in Experiment 4 (How likely is it that the man hunted the deer?) basically asked for a graded response to almost the same questions that asked for a binary response in Experiments 1 and 3 (Did the man hunt the deer?). In Experiment 5, an attempt was made to evaluate the likelihood of particular subcomponents of the sentences without participants ever reading the whole sentence.

6. Experiment 5

In Experiment 4, likelihood ratings were given after the sentences were read. In Experiment 5, questions were asked about particular parts of the sentence without the whole sentence ever being seen. The idea was to examine how particular sentence constituents might have contributed to question responses in Experiments 1–3 separately from the effect of reading the whole sentence and being garden-pathed when it was ambiguous. Experiment 5a attempted to examine the effect of the relative clause that was brown and graceful from While the man hunted the deer that was brown and graceful ran into the woods, since it could influence how likely the deer is to be hunted. For instance, a deer that is cute and little might be less likely to be hunted. Experiment 5b examined the effect of the main clause the deer ran into the woods, since it could also influence the likelihood of the event. For example, as described above, a hunter would be less likely to hunt a deer that is pacing in a zoo (Christianson et al., 2001).

6.1. Method

6.1.1. Materials and procedures
Two norming studies were conducted in Experiment 5. In both of them, sentence components were rated. The first of these norming studies (5a) asked participants to give a percentage rating to How likely is it that a man would hunt a deer that was brown and graceful? This task will be called Modifier norming. In 5b, participants were asked to rate How likely is it that a man would hunt a deer that ran into the woods?, which will be called Main Clause norming. Item-by-item mean likelihood ratings were obtained by averaging across all subjects and were entered into logit mixed-effect models together to see if they predicted question response accuracy in Experiments 1–3.

6.1.2. Participants
Thirty undergraduate students (19 males; mean age 20; range 18–22) participated in 5a and another 32 (10 males;
mean age 19; range 18–23) participated in 5b. All were recruited from the University of Illinois at Urbana-Champaign and were native speakers of English. They gave written informed consent and received course credit for taking part.
6.2. Results
On average, the likelihood ratings for OPT items were 62% (SD = 26%) and for RAT items 59% (SD = 27%) in the Modifier norming, and 56% (SD = 29%) for OPT items and 55% (SD = 30%) for RAT items in the Main Clause norming.
As in Experiment 4, ambiguity was included in all statistical models to evaluate whether there were any effects of the Modifier and Main Clause likelihood ratings over and above the effect of ambiguity on question response accuracy in the other studies. For the Experiment 1 results, neither the Modifier nor the Main Clause ratings predicted question response accuracy for sentences with either OPT or RAT verbs (all ps > .1). For the Experiment 2 results, the Modifier ratings predicted question response accuracy only for the sentences with RAT verbs (β = .45, SE = .21, z = 2.10, p = .04), with events rated as more likely leading to more incorrect Yes responses to the questions. The Modifier ratings did not, however, similarly predict response accuracy for items with OPT verbs, and the Main Clause ratings did not predict accuracy in sentences with either type of verb (all ps > .1). For the Experiment 3 results, the only effect was that the Main Clause ratings predicted question response accuracy for items with OPT verbs only (β = .02, SE = .01, z = 2.03, p = .04). Overall, there was little or no influence of either of these likelihood ratings on question response accuracy in Experiments 1–3.
The goal of collecting Modifier and Main Clause likelihood ratings was to elucidate the factors contributing to the responses to the post-sentence questions in Experiments 1–3, but it is also interesting to examine whether these ratings predicted disambiguation region reading times in Experiments 1 and 2. Because the main clause had not yet been seen at the disambiguating region in those studies, its ratings should have no effect on reading times at that point in the sentence, but the modifiers had been seen by then. In both Experiments 1 and 2, there were small correlations between the Modifier likelihood ratings and reading times, such that the disambiguating region was read more slowly in items with higher Modifier ratings. In Experiment 1, those correlations were reliable for items with both OPT and RAT verbs (OPT: r = .16, t = 5.39, df = 1181, p < .0001; RAT: r = .13, t = 3.50, df = 752, p < .001), and in Experiment 2 they were marginal for items with both verb types (OPT: r = .05, t = 1.70, df = 1193, p = .09; RAT: r = .06, t = 1.75, df = 853, p = .08). No correlation was found in Experiment 3 with either type of verb (ps > .5). Thus, the more likely the people in the Modifier norming study found it that a man would hunt a deer that was brown and graceful, the more different participants in Experiments 1 and 2 slowed down at the disambiguation, which was where they discovered that the man might not be hunting the deer after all. That is, the more plausible one group of people found the deer plus its modifying relative clause as the object of hunting, the more difficult other groups of people found it to read words contradicting that state of affairs. In other words, when the deer plus its modifying relative clause was more plausible as the object of hunting, it was harder for the participants in Experiments 1 and 2 to revise that interpretation, which should make the interpretation more likely to linger and influence responses to the post-sentence questions. However, that turned out not to be the case – question responses were generally not predicted by either the Modifier or Main Clause ratings.
Since neither the modifying relative clauses nor the main clauses from the original sentences affected question response accuracy, the effect of the likelihood ratings obtained in Experiment 4 on question accuracy in Experiments 1 and 2 must be due to the likelihood of the entire event described in the original sentence. The more likely an event was, the more likely the interpretation of the noun as the subordinate clause object tended to linger, which is similar to the marginal effect of plausibility on question responses found by Nakamura and Arai (2016) for a different kind of locally ambiguous sentence in Japanese.
7. General discussion
The Good-Enough Processing Account proposed two possible explanations for why people incorrectly respond Yes to questions like Did the man hunt the deer? after sentences like While the man hunted the deer that was brown and graceful ran into the woods. According to the Incomplete Reanalysis (Christianson et al., 2001; Ferreira et al., 2001) version of the Good-Enough Account, the initial misinterpretation lingers because the parser fails to completely reanalyse the syntactic structure of garden-path sentences, resulting in the ambiguous noun staying in the direct object role in the subordinate clause. The alternative is the Lingering Misinterpretation (Ferreira et al., 2004) version, in which reanalysis is completed, but the results of both analyses persist. The present studies aimed specifically to test the
Incomplete Reanalysis version, which predicts that more reanalysis work at the disambiguation should lead to fewer incorrect Yes responses to questions probing the initial misinterpretation. Incomplete Reanalysis was thus tested by using self-paced reading times and ERP responses at the disambiguating region as measures indexing the amount of reanalysis work done at the disambiguation, and comparing those between trials followed by correct responses and those followed by incorrect responses.
Two self-paced reading and one ERP experiment were conducted with two types of post-sentence questions. The non-explicit questions used in Experiments 1 and 3 were the same questions that had been used in several previous studies, asking whether the temporarily ambiguous noun was the object of the subordinate clause verb (Did the man hunt the deer?/Did Anna dress the baby?). The explicit questions used in Experiment 2 more specifically targeted the true content of the sentence (Did the sentence explicitly say that the man hunted the deer?/Did the sentence explicitly say that Anna dressed the baby?). The goal of the explicit questions was to discourage participants from answering based on inferences they could easily draw from the sentences, i.e. that the deer was most likely what the man hunted even though the sentence did not actually say that. The idea was that explicit questions might lead to a cleaner relationship between the online processing measures at the disambiguation and question responses because question responses should be less likely to be influenced by inference. The explicit questions apparently did succeed in discouraging inference-based responses, as the number of incorrect question responses to both types of sentences decreased substantially. Nevertheless, question responses were still not predictable from the reading times at the disambiguating region for sentences with either type of verb. In the few instances where there was any relationship between the online measures and question responses, it was in the opposite direction from that predicted by the Incomplete Reanalysis version of the Good-Enough Processing Account. The predicted direction was that slower reading times and larger P600 amplitudes should reflect more reanalysis work and that this should lead to more correct question responses, but instead slower reading times and larger P600 amplitudes tended to be associated with more incorrect question responses, suggesting more confusion in general on those trials. Thus, there was no evidence in any of the studies that trials that elicited correct No responses to the questions were the ones on which participants did the work necessary to fully reanalyse the sentences.
The findings across our experiments converged to show that Incomplete Reanalysis might not be the primary reason for lingering misinterpretation. A likelihood rating task was done in Experiments 4 and 5 to explore whether the likelihood of the events described in the sentences could explain question response accuracy. Results showed that, for sentences with both types of verbs, the more likely the event described by the initial misinterpretation, the more likely readers were to answer the questions incorrectly. Thus, event likelihood was a better predictor of response accuracy than reading times or P600 amplitudes at the disambiguating region. The predictions made by the Incomplete Reanalysis version of the Good-Enough Processing Account were therefore not borne out in the present studies.
The logic underlying the present studies depends on the common assumption that reading times and P600 amplitudes at the disambiguating region of garden-path sentences at least partially index reanalysis. However, our results suggest that slower reading times and larger P600s at least sometimes indicate continuing confusion rather than successful recovery from such confusion, as it was trials with incorrect question responses that tended to have longer reading times or larger P600 amplitudes at the disambiguating region. This result may be analogous to the “labor-in-vain” effect in the memory literature (Nelson & Leonesio, 1988), which has been discussed in the language processing literature by Stine-Morrow and colleagues (e.g. Payne & Stine-Morrow, 2016; Stine-Morrow et al., 2010). This raises the issue that reading times and ERP measures taken at the disambiguating region in garden-path sentences may generally need to be reinterpreted.
Our results also suggest that questions intended to specifically probe whether an initial misanalysis of a garden-path sentence is successfully revised may not actually be the best source of evidence, which is ironic given that the Good-Enough Processing Account was initially developed specifically to account for the high number of errors in response to such questions. Given the fundamental role that incorrect responses to such questions have played in the development of the Good-Enough Processing Account, this is an important point. However, evidence in support of the Good-Enough Processing Account has accumulated from other sources using more implicit measures, including syntactic priming (Christianson et al., 2010; van Gompel et al., 2006), processing of a subsequent sentence (Slattery et al., 2013), translation (Lim & Christianson, 2013a, 2013b), paraphrasing (Patson et al., 2009), responses to different types of questions (Christianson & Luke, 2011), and sentence-picture matching (Malyutina & den Ouden, 2016). Nevertheless, the nature of the relationship between probe type and the representation of the sentence content in memory remains largely unexamined,
despite the fact that explicit comprehension questions are the most commonly used tool for measuring text comprehension in many fields, including education.
The likelihood of the events described in the sentences with both OPT and RAT verbs in our studies appeared to be a stronger predictor of question responses than online processing measures at the disambiguation, which is consistent with previous studies finding effects of plausibility on the likelihood of a lingering misinterpretation. Event likelihood, as measured in our experiments, is comparable to the plausibility heuristic examined in previous Good-Enough-related work (e.g. Christianson et al., 2001, 2006; Nakamura & Arai, 2016). For instance, Slattery et al. (2013) found that an initial misinterpretation lingered only when the ambiguous noun was a plausible direct object for the subordinate clause verb, as in While the thief hid the jewel from the gallery could be seen on the security camera. There was no evidence of lingering misinterpretation when the ambiguous noun and the subordinate verb formed an implausible interpretation, as in While the thief hid the guard from the gallery could be seen on the security camera. Experiments 1a and 1b in Christianson et al. (2001) also explicitly tested the effect of plausibility on lingering misinterpretation. The results of those experiments showed that implausible (short) garden-paths and non-garden paths alike were responded to with error rates as low as 10% (Experiment 1b). The important difference between the previous and present work is that earlier explorations of plausibility (or event likelihood) were not linked to online measures of processing. Consequently, the plausibility effects detailed in the earlier work were interpreted as having short-circuited or truncated the reanalysis process. As shown by Slattery et al. (2013) and the current work, however, there is abundant online evidence that in fact readers usually do attempt to deal with the error signal within the garden-path structure. Despite this effort, though, accuracy for offline comprehension probes remains low. Given that the comprehension question itself is as likely or plausible as the sentence about which it is being asked, and that it too is read using whatever mechanisms are employed during the reading of the sentence, it follows that response patterns should be related to the plausibility of both the sentence and the comprehension question. When a plausible event interpretation is initially generated by a partial parse of a garden-path sentence, and that interpretation is then re-activated by an equally plausible comprehension question, the result appears to be error-prone responses due to competition between the revised interpretation and the lingering original incorrect interpretation. Although the present study did not directly manipulate the plausibility of the initial misinterpretation, the likelihood ratings indicate that the likelihood that a misinterpretation lingers is influenced by the likelihood of events described in it. Furthermore, this likelihood might be modulated by more global cues distributed across the entire sentence (e.g. While the man hunted the deer paced in the zoo) or by more local cues, for example the subcategorization requirements of the temporarily ambiguous verb (e.g. While the man hunted the plane flew over the woods; den Ouden, Dickey, Anderson, & Christianson, 2016). Note that comprehension questions analogous to the ones used here for the locally implausible sentence (i.e. Did the man hunt the plane?) would be predicted to have a relatively higher accuracy rate. Given that den Ouden et al. (2016) used fMRI to show distinct brain activation patterns for garden-path sentences with local vs. global plausibility cues, further work is required to better understand how qualitatively different types of likelihood or plausibility might be related to, and affect accuracy rates in response to, offline measures of comprehension.
In conclusion, our studies did not support the Incomplete Reanalysis version of the Good-Enough Processing Account, which they were specifically designed to test. They do not, however, rule out the Lingering Misinterpretation version of the Good-Enough Processing Account, which awaits further testing. The fact that the rated likelihood of the events described in a sentence was the strongest predictor of the responses to questions following those sentences raises the issue that responses to such questions are determined by a variety of factors other than those our study was designed to test. Finally, our results further suggest that the processing difficulty indexed by reading times and ERPs at the disambiguating words in garden-path sentences is not always a sign of successful reanalysis, consistent with the general tenets of the Good-Enough Processing Account.
Note
1. A similar problem with response-contingent reading times would have arisen in Experiments 1 and 2 if ANOVAs had been used to analyze them, but the mixed-effect models that were used to analyze reading times can handle situations with missing data.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was funded by the Dissertation Completion Fellowship and the Network for Neuro-Cultures Graduate Training
Fellowship from the Graduate College, University of Illinois at Urbana-Champaign to Zhiying Qian.
References
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278. doi:10.1016/j.jml.2012.11.001
Barton, S. B., & Sanford, A. J. (1993). A case study of anomaly detection: Shallow semantic processing and cohesion establishment. Memory & Cognition, 21, 477–487.
Bever, T. G. (1970). The cognitive basis for linguistic structures. Cognition and the Development of Language, 279, 1–61.
Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2008). An alternative perspective on “semantic P600” effects in language comprehension. Brain Research Reviews, 59, 55–73. doi:10.1016/j.brainresrev.2008.05.003
Chatrian, G. E., Lettich, E., & Nelson, P. L. (1985). Ten percent electrode system for topographic studies of spontaneous and evoked EEG activities. American Journal of EEG Technology, 25, 83–92.
Christianson, K. (2008). Sensitivity to syntactic changes in garden path sentences. Journal of Psycholinguistic Research, 37, 391–403.
Christianson, K. (2016). When language comprehension goes wrong for the right reasons: Good-enough, underspecified, or shallow language processing. The Quarterly Journal of Experimental Psychology, 69, 817–828. doi:10.1080/17470218.2015.1134603
Christianson, K., Hollingworth, A., Halliwell, J. F., & Ferreira, F. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42, 368–407. doi:10.1006/cogp.2001.0752
Christianson, K., & Luke, S. G. (2011). Context strengthens initial misinterpretations of text. Scientific Studies of Reading, 15, 136–166. doi:10.1080/10888431003636787
Christianson, K., Luke, S. G., & Ferreira, F. (2010). Effects of plausibility on structural priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 538–544.
Christianson, K., Williams, C. C., Zacks, R. T., & Ferreira, F. (2006). Younger and older adults’ “good-enough” interpretations of garden-path sentences. Discourse Processes, 42, 205–238. doi:10.1207/s15326950dp4202_6
Clahsen, H., & Felser, C. (2006). Grammatical processing in language learners. Applied Psycholinguistics, 27, 3–42. doi:10.1017/S0142716406060024
Coulson, S., King, J. W., & Kutas, M. (1998). Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes, 13, 21–58. doi:10.1080/016909698386582
Coulson, S., & Kutas, M. (2001). Getting it: Human event-related brain response to jokes in good and poor comprehenders. Neuroscience Letters, 316, 71–74. doi:10.1016/S0304-3940(01)02387-4
Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson’s method. Tutorials in Quantitative Methods for Psychology, 1, 42–45.
den Ouden, D. B., Dickey, M. W., Anderson, C., & Christianson, K. (2016). Neural correlates of early-closure garden-path processing: Effects of prosody and plausibility. The Quarterly Journal of Experimental Psychology, 69, 926–949. doi:10.1080/17470218.2015.1028416
Ferreira, F. (2003). The misinterpretation of noncanonical sentences. Cognitive Psychology, 47, 164–203. doi:10.1016/S0010-0285(03)00005-7
Ferreira, F., Bailey, K. G. D., & Ferraro, V. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11, 11–15.
Ferreira, F., Christianson, K., & Hollingworth, A. (2001). Misinterpretations of garden-path sentences: Implications for models of sentence processing and reanalysis. Journal of Psycholinguistic Research, 30, 3–20.
Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348–368.
Ferreira, F., & Henderson, J. M. (1990). Use of verb information in syntactic parsing: Evidence from eye movements and word-by-word self-paced reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 555–568.
Ferreira, F., & Henderson, J. M. (1991). Recovery from misanalyses of garden-path sentences. Journal of Memory and Language, 30, 725–745.
Ferreira, F., & Henderson, J. M. (1998). Syntactic reanalysis, thematic processing, and sentence comprehension. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 73–100). Dordrecht: Kluwer Academic.
Ferreira, F., Lau, E. F., & Bailey, K. G. D. (2004). Disfluencies, language comprehension, and tree adjoining grammars. Cognitive Science, 28, 721–749. doi:10.1016/j.cogsci.2003.10.006
Ferreira, F., & Patson, N. D. (2007). The “good enough” approach to language comprehension. Language and Linguistics Compass, 1, 71–83. doi:10.1111/j.1749-818X.2007.00007.x
Fodor, J. D., & Inoue, A. (1998). Attach anyway. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (Vol. 21, pp. 101–141). Dordrecht: Kluwer.
Frazier, L., & Clifton, C. (1998). Sentence reanalysis and visibility. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 143–176). Dordrecht: Kluwer Academic.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6, 291–325.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178–210.
Friederici, A. D. (1998). The neurobiology of language processing. In A. D. Friederici (Ed.), Language comprehension: A biological perspective (pp. 263–301). Berlin: Springer.
Friederici, A. D., Mecklinger, A., Spencer, K. M., Steinhauer, K., & Donchin, E. (2001). Syntactic parsing preferences and their on-line revisions: A spatio-temporal analysis of event-related brain potentials. Cognitive Brain Research, 11, 305–323. doi:10.1016/S0926-6410(00)00065-3
Frisch, S., Schlesewsky, M., Saddy, D., & Alpermann, A. (2002). The P600 as an indicator of syntactic ambiguity. Cognition, 85, B83–B92. doi:10.1016/S0010-0277(02)00126-9
Frisson, S. (2009). Semantic underspecification in language processing. Language and Linguistics Compass, 3, 111–127. doi:10.1111/j.1749-818X.2008.00104.x
Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M. A. (1997). The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language, 37, 58–93.
Gouvea, A. C., Phillips, C., Kazanina, N., & Poeppel, D. (2010). The linguistic processes underlying the P600. Language and Cognitive Processes, 25, 149–188. doi:10.1080/01690960902965951
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.
Grodner, D., Gibson, E., Argaman, V., & Babyonyshev, M. (2003). Against repair-based reanalysis in sentence comprehension. Journal of Psycholinguistic Research, 32, 141–166.
Gullick, M. M., Mitra, P., & Coch, D. (2013). Imagining the truth and the moon: An electrophysiological study of abstract and concrete word processing. Psychophysiology, 50, 431–440. doi:10.1111/psyp.12033
Gunter, T. C., Friederici, A. D., & Schriefers, H. (2000). Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction. Journal of Cognitive Neuroscience, 12, 556–568.
Hagoort, P., Brown, C., & Groothusen, J. (1993). The syntactic positive shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes, 8, 439–483.
Hahne, A., & Friederici, A. D. (1999). Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11, 194–205.
Hahne, A., & Friederici, A. D. (2002). Differential task effects on semantic and syntactic processes as revealed by ERPs. Cognitive Brain Research, 13, 339–356. doi:10.1016/S0926-6410(01)00127-6
Holcomb, P. J., Kounios, J., Anderson, J. E., & West, W. C. (1999). Dual-coding, context-availability, and concreteness effects in sentence comprehension: An electrophysiological investigation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 721–742.
Jacob, G., & Felser, C. (2016). Reanalysis and semantic persistence in native and non-native garden-path recovery. The Quarterly Journal of Experimental Psychology, 69, 907–925. doi:10.1080/17470218.2014.984231
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434–446. doi:10.1016/j.jml.2007.11.007
Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15, 159–201. doi:10.1080/016909600386084
Kaan, E., & Swaab, T. (2003). Repair, revision, and complexity in syntactic analysis: An electrophysiological differentiation. Journal of Cognitive Neuroscience, 15, 98–110. doi:10.1162/089892903321107855
Karimi, H., & Ferreira, F. (2016). Good-enough linguistic representations and online cognitive equilibrium in language processing. The Quarterly Journal of Experimental Psychology, 69, 1013–1040. doi:10.1080/17470218.2015.1053951
Kaschak, M. P., & Glenberg, A. M. (2004). This construction needs learned. Journal of Experimental Psychology: General, 133, 450–467.
Kim, A., & Osterhout, L. (2005). The independence of combinatory semantic processing: Evidence from event-related potentials. Journal of Memory and Language, 52, 205–225. doi:10.1016/j.jml.2004.10.002
King, J., & Kutas, M. (1995). Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience, 7, 376–395.
Kluender, R., & Kutas, M. (1993). Bridging the gap: Evidence from ERPs on the processing of unbounded dependencies. Journal of Cognitive Neuroscience, 5, 196–214.
Kolk, H. H. J., Chwilla, D. J., van Herten, M., & Oor, P. J. W. (2003). Structure and limited capacity in verbal working memory: A study with event-related potentials. Brain and Language, 85, 1–36. doi:10.1016/S0093-934X(02)00548-5
Kounios, J., & Holcomb, P. J. (1994). Concreteness effects in semantic processing: ERP evidence supporting dual-coding theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 804–823.
Kuperberg, G. R. (2007). Neural mechanisms of language comprehension: Challenges to syntax. Brain Research, 1146, 23–49. doi:10.1016/j.brainres.2006.12.063
Kuperberg, G. R., Caplan, D., Sitnikova, T., Eddy, M., & Holcomb, P. J. (2006). Neural correlates of processing syntactic, semantic, and thematic relationships in sentences. Language and Cognitive Processes, 21, 489–530. doi:10.1080/01690960500094279
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647. doi:10.1146/annurev.psych.093008.131123
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Lau, E. F., & Ferreira, F. (2005). Lingering effects of disfluent material on comprehension of garden path sentences. Language and Cognitive Processes, 20, 633–666. doi:10.1080/01690960444000142
Lee, C.-L., & Federmeier, K. D. (2008). To watch, to see, and to differ: An event-related potential study of concreteness effects as a function of word class and lexical ambiguity. Brain and Language, 104, 145–158. doi:10.1016/j.bandl.2007.06.002
Lee, C.-L., & Federmeier, K. D. (2009). Wave-ering: An ERP study of syntactic and semantic context effects on ambiguity resolution for noun/verb homographs. Journal of Memory and Language, 61, 538–555. doi:10.1016/j.jml.2009.08.003
Lim, J.-H., & Christianson, K. (2013a). Integrating meaning and structure in L1-L2 and L2-L1 translations. Second Language Acquisition, 29, 233–256.
Lim, J.-H., & Christianson, K. (2013b). Second language sentence processing in reading for comprehension and translation. Bilingualism: Language and Cognition, 16, 518–537.
Loftus, G. R., & Masson, M. E. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.
Lopez-Calderon, J., & Luck, S. J. (2014). ERPLAB: An open-source toolbox for the analysis of event-related potentials. Frontiers in Human Neuroscience, 8, 734. doi:10.3389/fnhum.2014.00213
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703.
Malyutina, S., & den Ouden, D.-B. (2016). What is it that lingers? Garden-path (mis)interpretations in younger and older adults. The Quarterly Journal of Experimental Psychology, 69, 880–906. doi:10.1080/17470218.2015.1045530
McRae, K., Spivey-Knowlton, M. J., & Tanenhaus, M. K. (1998). R Development Core Team. (2008). R: A language and environ-
Modeling the influence of thematic fit (and other con- ment for statistical computing. Vienna: R Foundation for
straints) in on-line sentence comprehension. Journal of Statistical Computing.
Memory and Language, 38, 283–312. Sachs, J. S. (1967). Recognition memory for syntactic and semantic
Morey, R. D. (2008). Confidence intervals from normalized data: aspects of connected discourse. Perception &
A correction to Cousineau (2005). Tutorials in Quantitative Psychophysics, 2, 437–442.
Methods for Psychology, 4, 61–64. Sanford, A. J., & Sturt, P. (2002). Depth of processing in language
Nakamura, C., & Arai, M. (2016). Persistence of initial misanalysis comprehension: Not noticing the evidence. Trends in
with no referential ambiguity. Cognitive Science, 40(4), 909– Cognitive Sciences, 6, 382–386. doi:10.1016/S1364-6613
940. doi:10.1111/cogs.12266 (02)01958-7
Nelson, T. O., & Leonesio, R. J. (1988). Allocation of self-paced Slattery, T. J., Sturt, P., Christianson, K., Yoshida, M., & Ferreira, F.
study time and the “labor-in-vain effect”. Journal of (2013). Lingering misinterpretations of garden path sen-
Experimental Psychology: Learning, Memory, and Cognition, tences arise from flawed semantic processing. Journal of
14, 676–686. Memory and Language, 69, 104–120. doi:10.1016/j.jml.2013.
Nieuwland, M. S., Otten, M., & Van Berkum, J. J. A. (2007). 04.001
Who are you talking about? Tracking discourse-level Stine-Morrow, E. A. L., Shake, M. C., Miles, J. R., Lee, K., Gao, X., &
referential processing with event-related brain potentials. McConkie, G. W. (2010). Pay now or pay later: Aging and the
Journal of Cognitive Neuroscience, 19, 228–236. doi:10.1162/ role of boundary salience in self-regulation of conceptual
jocn.2007.19.2.228 integration in sentence processing. Psychology and Aging,

Oines, L., & Kim, A. (2014). Integrate or repair? ERP responses to 25, 168–176.
semantic anomalies depend on choice of processing strategy. Sturt, P. (2007). Semantic re-interpretation and garden path
Paper presented at the Architectures and Mechanisms for recovery. Cognition, 105, 477–488. doi:10.1016/j.cognition.
Language Processing (AMLaP) Conference, Edinburgh. 2006.10.009
Oldfield, R. C. (1971). The assessment and analysis of handed- Sturt, P., Pickering, M. J., & Crocker, M. W. (1999). Structural
ness: The Edinburgh inventory. Neuropsychologia, 9, 97–113. change and reanalysis difficulty in language comprehension.
Osterhout, L., & Holcomb, P. J. (1992). Event-related brain Journal of Memory and Language, 40, 136–150.
potentials elicited by syntactic anomaly. Journal of Memory Swets, B., Desmet, T., Clifton, C., & Ferreira, F. (2008).
and Language, 31, 785–806. doi:10.1016/0749-596X Underspecification of syntactic ambiguities: Evidence from
(92)90039-Z self-paced reading. Memory & Cognition, 36, 201–216.
Osterhout, L., & Holcomb, P. J. (1993). Event-related potentials Tabor, W., & Hutchins, S. (2004). Evidence for self-organized sen-
and syntactic anomaly: Evidence of anomaly detection tence processing: Digging-in effects. Journal of Experimental
during the perception of continuous speech. Language and Psychology: Learning, Memory, and Cognition, 30, 431–450.
Cognitive Processes, 8, 413–437. Tanner, D. (2015). On the left anterior negativity (LAN) in elec-
Osterhout, L., Holcomb, P. J., & Swinney, D. A. (1994). Brain trophysiological studies of morphosyntactic agreement.
potentials elicited by garden-path sentences: Evidence of Cortex, 66, 149–155. doi:10.1016/j.cortex.2014.04.007
the application of verb information during parsing. Journal Townsend, D. J., & Bever, T. G. (2001). Sentence comprehension:
of Experimental Psychology: Learning, Memory, and The integration of habits and rules. Cambridge, MA: MIT Press.
Cognition, 20, 786–803. Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994).
Osterhout, L., & Mobley, L. A. (1995). Event-related brain poten- Semantic influences on parsing: Use of thematic role infor-
tials elicited by failure to agree. Journal of Memory and mation in syntactic ambiguity resolution. Journal of
Language, 34, 739–773. Memory and Language, 33, 285–318.
Patson, N. D., Darowski, E. S., Moon, N., & Ferreira, F. (2009). Trueswell, J. C., Tanenhaus, M. K., & Kello, C. (1993). Verb-specific
Lingering misinterpretations in garden-path sentences: constraints in sentence processing: Separating effects of
Evidence from a paraphrasing task. Journal of Experimental lexical preference from garden-paths. Journal of
Psychology: Learning, Memory, and Cognition, 35, 280–285. Experimental Psychology: Learning, Memory, and Cognition,
Payne, B. R., & Stine-Morrow, E. A. L. (2016). Risk for mild cogni- 19, 528–553.
tive impairment is associated with semantic integration def- Van Dyke, J. A., & Lewis, R. L. (2003). Distinguishing effects of
icits in sentence processing and memory. The Journals of structure and decay on attachment and repair: A cue-
Gerontology Series B: Psychological Sciences and Social based parsing account of recovery from misanalyzed ambi-
Sciences, 71, 243–253. doi:10.1093/geronb/gbu103 guities. Journal of Memory and Language, 49, 285–316.
Pickering, M. J., & Traxler, M. J. (2003). Evidence against the use doi:10.1016/S0749-596X(03)00081-0
of subcategorisation frequency in the processing of van Gompel, R. P. G., Pickering, M. J., Pearson, J., & Jacob, G.
unbounded dependencies. Language and Cognitive (2006). The activation of inappropriate analyses in garden-
Processes, 18, 469–503. doi:10.1080/01690960344000017 path sentences: Evidence from structural priming. Journal
Pickering, M. J., Traxler, M. J., & Crocker, M. W. (2000). Ambiguity of Memory and Language, 55, 335–362. doi:10.1016/j.jml.
resolution in sentence processing: Evidence against fre- 2006.06.004
quency-based accounts. Journal of Memory and Language, van Herten, M., Chwilla, D. J., & Kolk, H. H. J. (2006). When heur-
43, 447–475. doi:10.1006/jmla.2000.2708 istics clash with parsing routines: ERP evidence for conflict
Rayner, K., Carlson, M., & Frazier, L. (1983). The interaction of monitoring in sentence perception. Journal of Cognitive
syntax and semantics during sentence processing: Eye Neuroscience, 18, 1181–1197. doi:10.1162/jocn.2006.18.7.1181
movements in the analysis of semantically biased sentences. van Herten, M., Kolk, H. H. J., & Chwilla, D. J. (2005). An ERP study
Journal of Verbal Learning and Verbal Behavior, 22, 358–374. of P600 effects elicited by semantic anomalies. Cognitive
28 Z. QIAN ET AL.
Brain Research, 22, 241–255. doi:10.1016/j.cogbrainres.2004. Wlotko, E. W., & Federmeier, K. D. (2012). So that’s what you
09.002 mean! Event-related potentials reveal multiple aspects of
Warner, J., & Glass, A. L. (1987). Context and distance-to-disam- context use during construction of message-level meaning.
biguation effects in ambiguity resolution: Evidence from NeuroImage, 62, 356–366. doi:10.1016/j.neuroimage.2012.
grammaticality judgments of garden path sentences. 04.054
Journal of Memory and Language, 26, 714–738. doi:10.1016/ Wonnacott, E., Joseph, H. S. S. L., Adelman, J. S., & Nation, K.
0749-596X(87)90111-2 (2016). Is children’s reading “good enough”? Links between
Weckerly, J., & Kutas, M. (1999). An electrophysiological analysis online processing and comprehension as children read syn-
of animacy effects in the processing of object relative sen- tactically ambiguous sentences. The Quarterly Journal of
tences. Psychophysiology, 36, 559–570. Experimental Psychology, 69, 855–879. doi:10.1080/
Wlotko, E., & Federmeier, K. D. (2011). Flexible implementation of 17470218.2015.1011176
anticipatory language comprehension mechanisms. Paper pre- Zhang, Q., Guo, C.-Y., Ding, J.-H., & Wang, Z.-Y. (2006).
sented at the Cognitive Neuroscience Society Meeting, Concreteness effects in the processing of Chinese words.
San Francisco, CA. Brain and Language, 96, 59–68. doi:10.1016/j.bandl.2005.04.004
