Generative Perspectives
Edited by
Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
Great Clarendon Street, Oxford ox2 6dp
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© 2006 organization and editorial matter Gisbert Fanselow,
Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
© 2006 the chapters their various authors
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 2006
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain
on acid-free paper by
Biddles Ltd. www.biddles.co.uk
ISBN 0-19-927479-7  978-0-19-927479-6
1 3 5 7 9 10 8 6 4 2
Contents
1 Gradience in Grammar 1
Gisbert Fanselow, Caroline Féry, Ralf Vogel,
and Matthias Schlesewsky
References 359
Index of Languages 395
Index of Subjects 397
Index of Names 400
Notes on Contributors
Adam Albright received his BA in linguistics from Cornell University in 1996 and his
Ph.D. in linguistics from UCLA in 2002. He was a Faculty Fellow at UC Santa Cruz
from 2002 to 2004, and is currently an Assistant Professor at MIT. His research
interests include phonology, morphology, and learnability, with an emphasis on
using computational modelling and experimental techniques to investigate issues in
phonological theory.
Paul Boersma is Professor of Phonetic Sciences at the University of Amsterdam. He
works on constraint-based models of bidirectional phonology and phonetics and its
acquisition and evolution. His other interests include the history of Limburgian tones
and the development of Praat, a computer program for speech analysis and manipu-
lation.
Ina Bornkessel graduated from the University of Potsdam with a ‘Diplom’ (MA-
equivalent) in general linguistics in 2001. In her Ph.D. research (completed in 2002 at
the Max Planck Institute of Cognitive Neuroscience/University of Potsdam), she
developed a neurocognitive model of real-time argument comprehension, which is
still undergoing further development and is now being tested in a number of
typologically different languages. Ina Bornkessel is currently the head of the
Independent Junior Research Group Neurotypology at the Max Planck Institute for
Human Cognitive and Brain Sciences in Leipzig.
Abigail C. Cohn is an Associate Professor in Linguistics at Cornell University, Ithaca,
NY, where her research interests include phonology, phonetics, and their interactions.
She has focused on the sound systems of a number of languages of Indonesia, as well
as English and French. She received her Ph.D. in Linguistics at UCLA.
Leonie Cornips is Senior Researcher at the Department of Language Variation of the
Meertens Institute (Royal Netherlands Academy of Arts and Sciences) and head of
the department from 1 January 2006. Her dissertation (1994, Dutch Linguistics,
University of Amsterdam) was about syntactic variation in a regional Dutch variety
(Heerlen Dutch). Recently, she was responsible for the methodology of the Syntactic
Atlas of the Dutch Dialects-project. Further, she examines non-standard Dutch
varieties from both a sociolinguistic and generative perspective.
Matthew W. Crocker (Ph.D. 1992, Edinburgh) is Professor of Psycholinguistics at
Saarland University, having previously been a lecturer and research fellow at the
University of Edinburgh. His current research exploits eye-tracking methods and
computational modelling to investigate adaptive mechanisms in human language
comprehension, such as the use of prior linguistic experience and immediate visual
context.
Gisbert Fanselow received his Ph.D. in Passau (1985) and is currently Professor
of Syntax at the University of Potsdam. Current research interests include word
order, discontinuous arguments, wh-movement, and empirical methods in syntax
research.
Janet Dean Fodor has a BA in psychology and philosophy (Oxford University 1964)
and a Ph.D. in linguistics (MIT 1970). She is Distinguished Professor of Linguistics,
Graduate Center, City University of New York and President of the Linguistic Society
of America since 1997. Her research interests are human sentence processing, espe-
cially prosodic influences and garden path reanalysis; and language learnability
theory, especially modelling syntactic parameter setting.
Stefan Frisch studied psychology, philosophy, and linguistics (at the University
of Heidelberg and the Free University Berlin). He was a research assistant at the
Max-Planck Institute of Human Cognitive and Brain Sciences, Leipzig and at
the University of Potsdam, where he got his Ph.D. in 2000. He is now a research
assistant at the Day-Care Clinic of Cognitive Neurology, University of Leipzig.
Stefan A. Frisch received his Ph.D. in linguistics from Northwestern University, and
is currently Assistant Professor in the Department of Communication Sciences and
Disorders at the University of South Florida. He specializes in corpus studies of
phonotactic patterns, experiments on the acceptability of novel word stimuli, and
the phonetic study of phonological speech errors.
bilinguals, the interfaces between syntax and other domains, the psychology of
linguistic intuitions, and the cognitive neuroscience of the bilingual brain.
Ruben Stoel received a Ph.D. from Leiden University in 2005. He is currently a
research assistant at the University of Leiden. His interests include intonation, infor-
mation structure, and the languages of Indonesia.
Ralf Vogel is Assistant Professor at the University of Bielefeld, having received a
Ph.D. from Humboldt University Berlin in 1998. His research agenda involves the
syntax of the Germanic languages, the development of Optimality Theory syntax,
both formally and empirically, including interdisciplinary interaction with computer
scientists and psychologists. The development and exploration of empirical methods
in grammar research has become a strong focus of his work.
1
Gradience in Grammar
Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
Throughout this book, we will see that frequency plays a crucial role in
patterns of gradience. Frequency in phonology has also been examined from a
different perspective, namely from the point of view of phonotactic patterns.
Frisch (1996) and Frisch et al. (1997) model a gradient constraint combination
to account for the phonotactics of the verbal roots in Arabic. In their chapter
‘Linguistic and Metalinguistic Tasks in Phonology: Methods and Findings’,
Stefan A. Frisch and Adrienne M. Stearns demonstrate that probabilistic and
gradient phonological patterns are part of the knowledge of a language in
general. Evidence for this thesis comes from a variety of sources, including
psycholinguistic experiments using metalinguistic and language processing
tasks, as well as studies of language corpora. These results support theories
that information about phonological pattern frequency is encoded at the
processing and production levels of linguistic representation.
Frisch and Stearns’s chapter first reviews the methodologies that have been
used in phonological studies employing metalinguistic phonological judge-
ments, primarily in the case of phonotactics. These studies have found that
native speaker judgements closely reflect the phonotactic patterns of
language. Direct measures include well-formedness judgements, such as
acceptability judgements and wordlikeness judgements (Frisch et al. 2000),
morphophonological knowledge (Zuraw 2000), influence of transitional
probabilities on wordlikeness judgements for novel words (Hay et al. 2004),
distance of novel CCVC words as measured by a phoneme substitution score
(Greenberg and Jenkins 1964), and measures of similarity between words.
Indirect measures reflect the grammatical linguistic knowledge through
linguistic performance and thus provide evidence for the psychological reality of
gradient phonological patterns. They include elicitation of novel forms (wug
tests), analysis of lexical distributions and of large corpora in general, as well
as analysis of confusability in perception and production. These last tests
show that lexical neighbourhood and phonotactic probability affect speech
production.
Frisch and Stearns’s case study shows that sonority restrictions in consonant
clusters are gradient, the cross-linguistic preference being for onset consonant
clusters that have a large sonority difference. Quantitative language patterns
for thirty-seven languages were measured and compared to attested clusters.
Metalinguistic judgements of wordlikeness were also gathered for English and
compared to the attested and possible clusters, the results again providing
evidence for the psychological reality of gradient patterns in phonology. Mean
wordlikeness judgements correlated significantly with the type frequency of
the CC sequences contained in the novel words.
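The correlation just described can be sketched computationally. The lexicon, novel words, and mean ratings below are invented toy data standing in for the actual materials, and the two-letter onset extraction is a deliberate simplification of phonological CC clusters.

```python
from collections import Counter

def cc_type_frequencies(lexicon):
    """Count how many lexicon entries begin with each two-consonant onset."""
    return Counter(word[:2] for word in lexicon)

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented toy lexicon and invented mean wordlikeness ratings (1-6 scale).
lexicon = ["blip", "blot", "bled", "brim", "brag", "stop", "stem", "stun", "snow"]
onsets = {"blick": "bl", "brick": "br", "stick": "st", "snick": "sn", "bnick": "bn"}
ratings = {"blick": 5.1, "brick": 4.6, "stick": 5.0, "snick": 3.2, "bnick": 1.4}

freqs = cc_type_frequencies(lexicon)
xs = [freqs[onset] for onset in onsets.values()]
ys = [ratings[word] for word in onsets]
print(round(pearson(xs, ys), 2))  # → 0.97: ratings track CC type frequency
```

On this constructed data the unattested onset (bn) gets both zero type frequency and the lowest rating, which is exactly the gradient pattern the chapter reports for real judgement data.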
The authors do not provide a grammatical model for their data. They even
conjecture that it is unclear whether a distinct phonological grammar is
required above and beyond what is necessary to explain patterns of phono-
logical processing. Given the grounding of gradient phonological patterns in
lexical distributions, they propose that exemplar models, based on frequency
information and probabilities, explain generalization-based behaviour as a
reflex of the collective activation of exemplars that are similar along some
phonological dimension, rendering abstract representations obsolete.
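As a rough illustration of that exemplar idea, a novel form can be scored by summing similarity-weighted activations over stored exemplars. The similarity measure here (edit distance over letters) and all the data are invented simplifications, not the authors' actual model.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance, a crude stand-in for phonological dissimilarity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def exemplar_score(novel, exemplars, decay=1.0):
    """Wordlikeness as the collective activation of stored exemplars,
    each weighted by its similarity to the novel form."""
    return sum(math.exp(-decay * edit_distance(novel, e)) for e in exemplars)

exemplars = ["bling", "bring", "sting", "string", "thing"]
# A form close to many exemplars outscores one close to none.
print(exemplar_score("spling", exemplars) > exemplar_score("bzarn", exemplars))  # → True
```

The point of the sketch is that no abstract phonotactic rule is consulted: the gradient score falls out of aggregate similarity to stored forms alone.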
The contributions by Boersma and by Albright and Hayes propose anchor-
ing the correlation between gradience and frequency in grammar. They use
the Gradual Learning Algorithm (GLA) developed by Boersma (1998a) and
Boersma and Hayes (2001), a stochastic model of Optimality Theory. In GLA,
the variation comes from the possibility of a reordering of two or more
constraints in the hierarchy, expressed by the overlap of the constraints’
ranges. In addition, the constraints have different distances to their
neighbours. The likelihood of a reordering is thus not a function of the rank in the
hierarchy, but rather of the stipulated distance between the constraints, which
is encoded in the grammar by assigning numerical values to the constraints
which determine their rank and their distance at the same time. Boersma and
Hayes’s model thus allows us to deal with error variation as a source of
gradience in a language particular way.
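The evaluation-time mechanism can be sketched as follows: each constraint carries a numerical ranking value, Gaussian noise is added at every evaluation, and the chance that two constraints swap falls off with the distance between their values. The constraint names and values below are invented for illustration.

```python
import random

def sample_ranking(values, noise_sd=2.0):
    """Draw one evaluation-time ranking: perturb each constraint's ranking
    value with Gaussian noise, then sort from highest to lowest."""
    noisy = {c: v + random.gauss(0.0, noise_sd) for c, v in values.items()}
    return sorted(noisy, key=noisy.get, reverse=True)

# Invented constraint values: one close pair and one distant pair.
values = {"Faith": 100.0, "Markedness": 98.0, "Junk": 80.0}

random.seed(1)
trials = 10_000
close = far = 0
for _ in range(trials):
    order = sample_ranking(values)
    close += order.index("Markedness") < order.index("Faith")  # close pair: swaps often
    far += order.index("Junk") < order.index("Faith")          # distant pair: almost never
print(close / trials, far / trials)  # roughly 0.24 and 0.0
```

Because the noise is resampled on every evaluation, the grammar produces stable variation rates: the 2-point pair reranks on roughly a quarter of evaluations, while the 20-point pair behaves as categorically ranked.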
Adam Albright and Bruce Hayes’s chapter ‘Modelling Productivity with the
Gradual Learning Algorithm: The Problem of Accidentally Exceptionless
Generalizations’ addresses the modelling of gradient data in inflectional
paradigms. Related to this is an empirical question about productivity:
when language learners are confronted with new data, what weight do they
assign to accuracy versus generality? This problem arises in relationship to
accidentally true or small-scale generalizations. These kinds of generalizations
are confined to a small set of forms and correlate with unproductivity. This is
a classic problem of inductive learning algorithms which are restricted to a
training set: when confronted with new data, they might fail to make the right
predictions. In a standard Optimality Theory approach, constraints deduced
from the training set apply to the forms used for learning, but unfortunately
they make wrong predictions for new forms. Reliability of rules or
constraints, that is, how much of the input data they cover and how many
exceptions they involve, is not the right property to remedy this problem.
Generality may make better predictions, especially in the case of optionality
between two forms.
Children acquiring English, for instance, are confronted with several
answers as to how to form the past tense, as exemplified by wing ~ winged,
wring ~ wrung, and sing ~ sang, which are attested English forms. A subset
of such verbs, composed of dig, cling, fling, and sling, builds its past tense
with [ʌ]. Albright and Hayes (2003) find that for a newly coined
verb like spling, English speakers rate the past tense forms splung and splinged
nearly equally high. Their conclusion is that general rules, like ‘form the past tense
with -ed ’ are so general that they sometimes compete with exceptionless
particular rules.
Navajo sibilant harmony in affixation, the data set discussed in this
chapter, exhibits a similar, although attested, optionality. If a stem begins
with a [–anterior] sibilant ([č, čʼ, čʰ, š, ž]), the s-perfective prefix [ši] is
attached. If the stem contains no sibilant, the prefix [si] is the chosen form.
If there is a [–anterior] sibilant later in the stem, both [si] and [ši] are possible.
Albright and Hayes’s learning system is not able to cope with this pattern. The
problem is that in addition to general and useful constraints, the system also
generates junk constraints which apply without exception to a small number of
forms, but which make incorrect predictions for new forms. To remedy the
problem they rely on the Gradual Learning Algorithm (Boersma and Hayes
2001), which assumes a stochastic version of OT. Each pair of constraints is not
strictly ranked, but rather assigned a probability index. The solution they
propose is to provide each constraint with a generality index. Each rule is
provided with a ranking index correlating with generality: the more general the
constraint (in terms of the absolute number of forms which fulfil it), the
higher it is ranked in the initial ranking. The junk constraints are ranked very
low in the initial ranking and have no chance to attain a high ranking, even
though they are never violated by the data of the learning set.
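In outline, that initialization step might look like the sketch below. The constraint inventory and form counts are invented, and in the actual GLA the ranking values are then adjusted incrementally during learning rather than fixed once.

```python
import math

def initial_ranking_values(scopes, base=100.0, scale=10.0):
    """Give each constraint an initial ranking value that grows with its
    generality (the number of training forms it applies to), so that
    accidentally exceptionless 'junk' constraints start low."""
    return {name: base + scale * math.log10(scope) for name, scope in scopes.items()}

# Invented constraints with the number of forms each one covers.
scopes = {
    "past = stem + -ed": 4000,       # very general
    "-ing verbs take [ʌ]": 6,        # small but genuine generalization
    "junk constraint": 2,            # accidentally exceptionless
}
values = initial_ranking_values(scopes)
top = max(values, key=values.get)
print(top)  # → past = stem + -ed
```

A junk constraint covering two forms starts so far below the broad rules that, even while never being violated, it cannot climb high enough during learning to distort predictions for new forms.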
Turning now to Paul Boersma’s chapter ‘Prototypicality Judgements as
Inverted Perception’, it must first be noted that his goal is very different
from that of the preceding chapter. Boersma presents an account of gradience
effects in prototypicality tasks as compared to phoneme production tasks
with the example of the vowel /i/. A prototype is more peripheral than
the modal auditory form (the form they hear) in the listeners’ language
environment, including the forms produced by the listeners themselves. The
difference between the best prototype /i/ and the best articulated vowel [i] is
implemented as a difference in the value of F1 and shows a discrepancy of
50 Hz between the two tasks. Boersma provides a model of production and
comprehension which implements the main difference between the tasks as
the presence of the articulatory constraints in the production task and their
absence in the prototypicality task. More specifically, he proposes augmenting
There are reasons for considering this negligence unfortunate. Some key
domains of syntax show gradience to a considerable degree. The subjacency
phenomena, superiority effects, and word order restrictions figure
prominently in this respect. This high degree of gradience often makes it unclear
what the data really are, and syntactic practice does not follow the golden
rule of formulating theories on the basis of uncontroversial data only (and
have the theory then decide the status of the unclear cases). We believe that
theories formulated on the basis of clear-cut data only would not really
be interesting in many fields of syntax, so it is necessary to make the
‘problematic’ data less controversial, that is, to formulate a theory of
gradience in syntax.
There are two types of approaches to syntactic gradience as a property of
grammar. Chomsky (1955) allows an interpretation in which the gradience is
coded as a property of the grammatical rules or principles. Simplifying his
idea a bit, one can say that full grammaticality is determined by a set of very
detailed, high precision rules. If a sentence is in line with these, it has the
highest degree of acceptability. For deviant sentences, we can determine the
amount (and kind) of information we would have to eliminate from the high
precision, full detail rules in order to make the deviant sentence fit the rule.
The less we have to eliminate, the less unacceptable the sentence is. Müller
(1999) makes a related proposal. He refines standard OT syntax with the
concept of subhierarchies composed of certain principles that are inserted
into the hierarchy of the other syntactic constraints. For ‘grammaticality’, it
only matters whether at least one of the principles of the subhierarchy is
fulfilled, but a structure is ‘unmarked’ only if it is the optimal structural
candidate with respect to all of the principles in the subhierarchy.
In the other tradition, represented, for example, by Suppes (1970), the rules
and constraints of the grammar are directly linked to numerical values (not
unlike the variable rules in phonology).
In his chapter ‘Linear Optimality Theory as a Model of Gradience in
Grammar’, Frank Keller introduces a theory of the second type—Linear
Optimality Theory (LOT). In contrast to standard OT approaches, LOT is
designed to model gradient acceptability judgement data. The author argues
that the necessity for such an approach results from the observation that
gradience in judgement data has different properties from gradience in corpus
data and that, therefore, both types of gradience should be modelled inde-
pendently. The basic idea of the model is represented in two hypotheses which
are formulated (a) with respect to the relative ranking of the constraints and
(b) regarding the cumulativity of constraint violations. Whereas the former
states that the numeric weight of a constraint is correlated with the reduction
The example that illustrates the first claim is case conflicts in German free
relative constructions (FRs). FRs without case conflict receive higher
acceptability in acceptability judgement experiments and are more frequent in
corpora than those with a case conflict. But the kind of conflict is also crucial:
conflicting FRs in which the oblique case dative is suppressed are judged as
less acceptable than those in which the structural case nominative is
suppressed. Vogel demonstrates that a standard optimality theoretic grammar is
already well-suited to predict these results, if one of its core features, the
central role of markedness constraints, is exploited in the right way. Vogel
further argues against the application of stochastic optimality theory in
syntax, as proposed, for instance, in Bresnan et al. (2001). The relative
frequencies of two alternative syntactic expressions in a corpus not only
reflect how often one structure wins over the other but also how often the
competition itself takes place, which here means how often a particular
semantic content is chosen as input for an OT competition. If the influence
of this latter factor is not neutralized, as in the model by Bresnan et al. (2001),
then properties of the world become properties of the grammar, an unwel-
come result. Vogel further provides evidence against another claim by Bresnan
et al., which has become famous as the ‘stochastic generalization’: categorical
contrasts in one language show up as tendencies in other languages. Typolo-
gically, FR structures are less common than semantically equivalent correla-
tive structures. The straightforward explanation for this observation can be
given in OT terms: FRs are more marked than correlatives. Nevertheless, a
corpus study shows that in unproblematic cases like non-conXicting nom-
inative FRs, FRs are much more frequent than correlatives in German. Vogel
argues that corpus frequency is biased by a stylistic preference to avoid over-
correct expressions which contain more redundant material than necessary,
primarily function words. Including such a stylistic preference into an OT
grammar in the form of a universal constraint would lead to incorrect
typological predictions. Vogel opts for the careful use of a multiplicity of
empirical methods in grammar research in order to avoid such method-
induced artefacts.
While these contributions highlight how syntactic principles can be made
responsible for gradience, several of the other factors leading to gradience are
discussed in detail in the following papers. That context and information
structure are relevant for acceptability has often been noted. This aspect is
addressed by Nomi Erteschik-Shir. Her ‘What’s What?’ discusses syntactic
phenomena which have been argued to lead to gradient acceptability in the
literature, namely the extraction of wh-phrases out of syntactic islands, as
well as several instances of so-called superiority violations, where multiple
wh-phrases within one clause appear in non-default order (e.g., *What did
who say?). The acceptability of the wh-extraction in ‘Who did John
say/?mumble/*lisp that he had seen?’ seems to depend on the choice of the matrix
verb. Previous accounts of this contrast explained it by assigning a different
syntactic status to the subordinate clause depending on the verb, leading to
stronger and weaker extraction islands. Erteschik-Shir shows that this analysis
is unable to explain why the acceptability of the clauses improves when the
offending matrix verb has been introduced in the preceding context. She
argues that the possibility of extraction is dependent on the verb being
unfocused. The difference between semantically light verbs like ‘say’ and
heavier ones like ‘mumble’ and ‘lisp’ is that the latter are focused by default
while the former is not. Erteschik-Shir develops a model of the interaction
between syntax and information structure to account for this observation.
The central idea in her proposal is that only the focus domain can be the
source of syntactic extraction. If the main verb is focused, the subordinate
clause is defocused and thus opaque for extraction. Erteschik-Shir’s account
of superiority and exceptions from it (*What did who read? versus What did
which boy read?) also refers to the information structural implications of these
structures. Crossing movement does not induce a superiority violation if the
fronted wh-phrase is discourse-linked and thus topical. Another crucial factor
is that the in-situ wh-phrase is topical, which leads to focusing of the
complement, out of which extraction becomes possible. Erteschik-Shir’s
explanation for the degraded acceptability of these structures lies in her
view of elicitation methods. Usually, such structures are presented to inform-
ants without contexts. The degraded structures rely on particular information
structural conditions which are harder for the informants to accommodate
than the default readings. As this is a matter of imagination, Erteschik-Shir
also predicts that informants will differ in their acceptance of the structures in
question. Overall, the account of syntactic gradience oVered here is process-
ing-oriented, in the sense that it is not the grammar itself that produces
gradience, but the inference of the information structural implications of an
expression in the course of parsing and interpretation.
In her chapter ‘Gradedness and Optionality in Mature and Developing
Grammars’, Antonella Sorace argues that residual optionality, which she
considers the source of gradience effects, occurs only in interface areas of
the competence and not in purely syntactic domains. In that sense, her
approach is quite in line with what Erteschik-Shir proposes. Sorace asks (a)
whether gradedness can be modelled inside or outside of speakers’ grammat-
ical representations, and (b) whether all interfaces between syntax and other
domains of linguistics are equally susceptible to gradedness and optionality.
the crucial factor being that the shorter phrase is preferred to be adjacent to
the verb. An additional factor is whether (only) one of the two constituents is
in a dependency relation with the verb. This factor strengthens the weight
effect if the selected phrase is shorter, but weakens it if it is longer. Hawkins
also suggests that MiD has shaped grammars and the evolution of grammat-
ical conventions, according to the performance-grammar correspondence
hypothesis: syntactic structures have been conventionalized in proportion to
their degree of preference in performance, as evidenced by patterns of selec-
tion in corpora and by ease of processing in performance. Hawkins further
argues that his account is superior to an alternative approach like stochastic
Optimality Theory because it does not mix grammatical constraints with
processing constraints, as a stochastic OT approach would have to do.
In their chapter ‘Effects of Processing Difficulty on Judgements of
Acceptability’, Gisbert Fanselow and Stefan Frisch present experimental data
highlighting an unexpected effect of processing on acceptability. Typically, it is
assumed that processing difficulties reduce the acceptability of sentences.
Fanselow and Frisch report the results of experiments suggesting that pro-
cessing problems may make sentences appear more acceptable than they
should be on the basis of their grammatical properties. This is the case
when the sentence involves a local ambiguity that is initially compatible
with an acceptable interpretation of the sentence material, but which is later
disambiguated towards an ungrammatical interpretation. The findings
support the view that acceptability judgements not only reflect the outcome
of the final computation, but also intermediate processing steps.
Matthias Schlesewsky, Ina Bornkessel, and Brian McElree examine the
nature of acceptability judgements from the perspective of online language
comprehension in ‘Decomposing Gradience: Quantitative versus Qualitative
Distinctions’. By means of three experimental methods with varying degrees
of temporal resolution (speeded acceptability judgements, event-related brain
potentials, and speed-accuracy trade-off), the authors track the development
of speakers’ judgements over time, thereby showing that relative differences in
acceptability between sentence structures stem from a multidimensional
interaction between time-sensitive and time-insensitive factors. More
specifically, the findings suggest that increased processing effort arising during the
comprehension process may be reflected in acceptability decreases even when
judgements are given without time pressure. In addition, the use of event-
related brain potentials as a multidimensional measurement technique reveals
that quantitative variations in acceptability may stem from underlying
differences that are qualitative in nature. On the basis of these findings, the authors
argue that gradience in linguistic judgements can only be fully described when
all component parts of the judgement process, that is, both its quantitative
and its qualitative aspects, are taken into account.
What conclusions should be drawn from the insight that gradience results
from domains such as processing difficulty or information structure is the
topic of Eric Reuland’s chapter. In ‘Gradedness: Interpretive Dependencies
and Beyond’ he defends a classic generative conception of grammar that
contains only categorical rules and concepts. He identifies a number of
grammar-external sources of gradience as it is frequently observed in empir-
ical linguistic studies. The language that should be modelled by grammarians,
according to Reuland, is the language of the idealized speaker/hearer of
Chomsky (1965). In this Chomskyan idealization, most factors which are
crucial for gradience are abstracted away from. Among such factors, Reuland
identifies the non-discreteness of certain aspects of the linguistic sign, for
instance the intonation contours which are used to express particular seman-
tic and syntactic features of clauses, like focus or interrogativity. Reuland
argues that it is only the means by which these features are expressed which
are non-discrete, not the features themselves. But only the latter are subject to
the theory of grammar. Reuland further separates differences in language,
which do not exist despite the preference for one or the other expression
within the (idealized) speech community, from differences in socio-cultural
conventions, which may exist, but are irrelevant for the study of grammar.
Nevertheless, non-discrete phenomena are expected to occur where grammar
leaves open space for certain choices, for instance in the way the subcompo-
nents of grammar interact. Another source of gradience is variation in
acceptability judgements within a speech community, as dialectal or
idiolectal variation, and even within speakers, using different idiolects on
different occasions, or as the effect of uncertainty in a judgement task. Apart from these
extra-grammatical explanations for gradience, Reuland also sees grammar
itself as a possible source of gradience. Current models of grammar include
a number of subcomponents, each of which has its own rules and constraints,
some perhaps violable, which interact in a non-trivial way. Any theory of
language, Reuland concludes, that involves a further articulation into such
subsystems is in principle well equipped to deal with ‘degrees’ of well- or
ill-formedness. Reuland exemplifies his position with a comparative case
study of the syntax of pronouns, mainly in Dutch and English. He shows
that the syntactic properties of reflexives and pronominals depend on a
number of further morphosyntactic properties of the languages in question,
among which are the inventory of pronouns in the language, richness of case,
the possibility of preposition stranding, the mode of reflexive marking on
verbs, the organization of the syntax–semantics interface in thematic
2.1 Introduction
In this chapter,1 I consider the status of gradient phonology, that is, phono-
logical patterns best characterized in terms of continuous variables. I explore
some possible ways in which gradience might exist in the phonology,
considering the various aspects of phonology: contrast, phonotactics,
morphophonemics, and allophony. A fuller understanding of the status of
gradience in the phonology has broader implications for our understanding
of the nature of the linguistic grammar in the domain of sound patterns and
their physical realizations. In the introduction, I consider why there might be
gradience in the phonology (Section 2.1.1). I then brieXy discuss the nature of
phonology versus phonetics (Section 2.1.2).
1 A number of the ideas discussed in this chapter were developed in discussions in my graduate
seminars at Cornell, Spring 2004 and Spring 2005. Some of these ideas were also presented in colloquia
at the Universities of Buffalo and Cornell. Thanks to all of the participants in these fora for their
insightful comments and questions. Special thanks to Mary Beckman, Jim Scobbie, and an anonymous
reviewer for very helpful reviews of an earlier draft, as well as Johanna Brugman, Marc Brunelle, Ioana
Chitoran, Nick Clements, Caroline Féry, Lisa Lavoie, Amanda Miller, and Draga Zec for their comments.
26 The Nature of Gradience
For recent discussions of this consensus view, see for example Keating (1996);
Cohn (1998); Ladd (2003), also individual contributions in Burton-Roberts
et al. (2000) and Hume and Johnson (2001). See also Cohn (2003) for a fuller
discussion of the nature of phonology and phonetics and their relationship.
For the sake of concreteness, consider an example of phonological patterns
and their corresponding phonetic realization that are consistent with the
correlations in (2.1). In Figure 2.1, we see representative examples of the
patterns of nasal airflow in French and English (as discussed in Cohn 1990,
1993). Nasal airflow is taken here as the realization of the feature Nasal.
In the case of a nasal vowel in French, here exemplified in the form daim
‘deer’ [dɛ̃] (Figure 2.1a), there is almost no nasal airflow on [d] and there is
significant airflow throughout the [ɛ̃]. Here we observe plateaus corresponding
to the phonological specifications, connected by a rapid transition. In
English, on the other hand, during a vowel preceding a nasal consonant, such
as [ɛ] in den [dɛn] (Figure 2.1b), there is a gradient pattern—or a cline—
following the oral [d] and preceding the nasal [n] (which are characterized by
the absence and presence of nasal airflow respectively). This is quite different
from the pattern of nasalization observed on the vowel in cases like sent [sɛ̃t]
(Figure 2.1c), in which case the vowel is argued to be phonologically nasalized
(due to the deletion of the following /n/) and we observe a plateau of nasal
airflow during the vowel, similar to the pattern seen in French.

Figure 2.1. Examples of nasal airflow in French and English following Cohn (1990,
1993): (a) French daim [dɛ̃]; (b) English den [dɛn]; (c) English sent [sɛ̃t]. [Airflow
traces not reproduced; each panel marks −N/+N spans against a 100 ms scale.]

The observed
differences between French and English relate quite directly to the fact that
French has nasal vowels, but English does not.
If the correlations in (2.1) are correct, we expect to find categorical
phonology, but not gradient phonology, and gradient, but not categorical,
phonetics. Recent work calls this conclusion into question. In particular, it is
evidence suggesting that there is gradience in phonology that has led some
to question whether phonetics and phonology are distinct. Pierrehumbert
et al. (2000) state the question in the following way:

this assertion [that the relationship of quantitative to qualitative knowledge is modular]
is problematic because it forces us to draw the line somewhere between the two
modules. Unfortunately there is no place that the line can be cogently drawn. . . . In
short, knowledge of sound structure appears to be spread along a continuum. Fine-grained
knowledge of continuous variation tends to lie at the phonetic end. Knowledge
of lexical contrasts and alternations tend to be more granular. (Pierrehumbert
et al. 2000: 287)
Let us consider the background of this issue in a bit more depth. Growing out
of Pierrehumbert’s (1980) study of English intonation, gradient phonetic
patterns are understood as resulting from phonetic implementation, through
a mapping of categorical elements to continuous events. Under the particular
view developed there, termed generative phonetics, these gradient patterns are
the result of interpolation through phonologically unspecified domains.
Keating (1988) and Cohn (1990) extend this approach to the segmental
domain, arguing that phenomena such as long-distance pharyngealization
and nasalization can be understood in these terms as well. For example, the
cline in nasal airflow seen in the vowel [ɛ] in [dɛn] in Figure 2.1b is interpreted
as resulting from phonetic interpolation through a phonologically unspecified
span.
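The interpolation idea can be given a minimal numerical sketch (my own illustration, not an implementation from Keating or Cohn): each segment either carries a categorical [nasal] value (0.0 or 1.0) or is phonologically unspecified (None), and the phonetic component fills unspecified spans by linear interpolation between the flanking specified values. English den, with an unspecified vowel, then comes out as a cline, while French daim, with a specified [+nasal] vowel, comes out as plateaus joined by a rapid transition.

```python
def interpolate(specs, steps=4):
    """Map per-segment [nasal] specifications (1.0, 0.0, or None for
    phonologically unspecified) to a sampled 'airflow' contour.
    Specified segments surface as flat plateaus; unspecified runs are
    filled by linear interpolation between flanking specified values."""
    vals = list(specs)
    # boundary assumption (mine): utterance edges count as oral, i.e. 0.0
    if vals[0] is None:
        vals[0] = 0.0
    if vals[-1] is None:
        vals[-1] = 0.0
    contour = []
    i = 0
    while i < len(vals):
        if vals[i] is not None:
            # specified segment: plateau at its categorical value
            contour.extend([vals[i]] * steps)
            i += 1
        else:
            # unspecified run: find its right edge, interpolate across it
            j = i
            while vals[j] is None:
                j += 1
            lo, hi = vals[i - 1], vals[j]
            n_samples = (j - i) * steps
            for k in range(1, n_samples + 1):
                contour.append(lo + (hi - lo) * k / (n_samples + 1))
            i = j
    return contour

# English den /d ɛ n/: vowel unspecified -> gradient cline through the vowel
cline = interpolate([0.0, None, 1.0])
# French daim /d ɛ̃/: vowel specified [+nasal] -> plateau after a transition
plateau = interpolate([0.0, 1.0])
```

The toy model reproduces the qualitative contrast in Figure 2.1: the cline rises monotonically through the unspecified vowel, whereas the specified vowel holds a flat plateau.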
The phonology, then, is understood as the domain of discrete, qualitative
patterns and the phonetics as the domain of the continuous, quantitative
realization of those patterns. Intrinsic to this view is the idea that lexical
entries and phonological patterns are represented in terms of distinctive
features, taken to be abstract properties, albeit defined phonetically. These
are then interpreted in a phonetic component, distinct from the phonological
one. I refer to this as a mapping approach. A modular mapping approach has
been the dominant paradigm for the phonology–phonetics interface since the
1980s and has greatly advanced our understanding of phonological patterns
and their realization. Such results are seen most concretely in the success of
many speech-synthesis-by-rule systems, in their modelling of both segmental
and suprasegmental properties of sound systems. (See Klatt 1987 for a review.)
Is there Gradient Phonology? 29
the realization of contrastive nasal vowels, there is nasal airflow resulting from
the contrast and also from coarticulatory patterns, seen, for example, in the
transition between oral vowels and nasal consonants. Both aspects need to be
modelled. In the case of contextual nasalization in English, there are both
long-distance and more local effects seen in the physical patterns of nasal
airflow that need to be accounted for.
The question of whether phonology and phonetics should be understood
as distinct modules needs to be approached as an empirical question. What
sort of approach gives us the best fit for the range of more categorical versus
more gradient phenomena?
There are clearly some grey areas—notably gradient phonology. Yet it is
important to realize that just because it is difficult to know exactly where to
draw the line (cf. Pierrehumbert et al. 2000), this does not mean there are not
two separate domains of sound structure. The fact that it is difficult to draw a
line follows in part from the conception of phonologization (Hyman 1976),
whereby over time low-level phonetic details are enhanced to become
phonological patterns. Phonologization by its very nature may result in
indeterminate cases. As phonetic details are being enhanced, it will be difficult
at certain stages to say that a particular pattern is ‘phonetic’ while another is
‘phonological’. It has been suggested, for example, that vowel lengthening before
voiced sounds in English is currently in this in-between state. The difficulty
of drawing a line also relates to the sense in which categoriality can only be
understood in both rather abstract and language-specific terms.
Recent work suggests that phonology and phonetics are not the same thing,
but that the distinction might be more porous than assumed under strict
modularity (e.g. Pierrehumbert 2002 and Scobbie 2004). Pierrehumbert
(2002: 103) states: ‘categorical aspects of phonological competence are
embedded in less categorical aspects, rather than modularized in a conventional
fashion.’ We return below to the nature of the relationship between
phonology and phonetics, as the status of gradient phonology plays a crucial
role in this question.
In order to investigate gradience in phonology, we need a clearer
understanding of what we mean by gradience, and we need to consider how it
might be manifested in different aspects of the phonology. I turn to these
questions in the next section.
from one continuous variable to another, that is, a slope. (In linguistic usage,
we use the form gradience as a noun and gradient as an adjective.) It has also
shifted to mean the continuous nature of a single variable.2 Thus we need to
be clear about which sense of gradient we are talking about. Discrete is often
equated with categorical, and continuous with gradient (although there may
be gradient patterns that are discrete). We need to consider both the question
of what is gradient and that of what is continuous.
The terms gradient and gradience have been used in a number of different
ways in the recent phonetic and phonological literature. To think more
systematically about the nature of gradience in phonology, we need to
tease apart these different usages (Section 2.2.1) before considering how
these senses might apply to different aspects of what is understood to be
phonology—that is, contrast (Section 2.2.2), phonotactics (Section 2.2.3),
and alternations, both morphophonemics (Section 2.2.4) and allophony
(Section 2.2.5).
2.2.2 Contrast
Fundamental to a phonological system is the idea of lexical contrast: some
phonetic differences in the acoustic signal result in two distinct lexical items,
that is, minimal pairs. This is also the basis upon which inventories of sounds
are defined. The term contrast is used in two rather different senses: underlying
or lexical contrast, and surface contrast, that is, identifiable phonetic
differences independent of meaning. The question of surface contrast
sometimes arises when comparisons are made between phonological categories
across languages. It also often arises in the discussion of phonological
alternations that affect lexical contrasts in terms of neutralization or near-neutralization.
Cases of complete phonological neutralization should result in
no cues to underlying differences or contrast. Yet many cases of what are
claimed to be complete neutralization exhibit subtle phonetic cues that
differentiate between surface forms. (For a recent discussion and review of
such cases involving final devoicing, see Warner et al. 2004.) Under one
interpretation, such cases can be understood as gradient realization of
contrast. Due to space limitations, I do not pursue the issue of near-neutralization
here.
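The notion of minimal pair lends itself to a simple operational sketch. The toy function below (my own illustration; the transcriptions are deliberately simplified) treats two words as a minimal pair when their forms are equal in length and differ in exactly one segment, which is how surface contrasts such as thigh/thy are established.

```python
from itertools import combinations

def minimal_pairs(lexicon):
    """lexicon: dict mapping word -> tuple of phones.
    Two words form a minimal pair if their forms have equal length
    and differ in exactly one segment; each such pair witnesses a
    contrast between the two differing phones."""
    pairs = []
    for (w1, f1), (w2, f2) in combinations(sorted(lexicon.items()), 2):
        if len(f1) == len(f2):
            diffs = [(a, b) for a, b in zip(f1, f2) if a != b]
            if len(diffs) == 1:
                pairs.append((w1, w2, diffs[0]))
    return pairs

# toy English fragment (simplified transcriptions, my own):
lex = {
    'thigh': ('θ', 'aɪ'),
    'thy':   ('ð', 'aɪ'),
    'den':   ('d', 'ɛ', 'n'),
    'ten':   ('t', 'ɛ', 'n'),
}
pairs = minimal_pairs(lex)
# thigh/thy witnesses the (low functional load) θ/ð contrast;
# den/ten witnesses the (robust) d/t contrast.
```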
We might wonder if contrast is all or nothing, or whether it too might be
gradient in the sense of exhibiting gradient well-formedness. Within generative
grammar, we understand contrast in absolute terms. Two sounds are
either in contrast or they are not. Many contrasts are very robust. Yet contrast
can also be much more specific or limited. (See Ladd 2003 for a discussion of
some such cases.) There are certain sounds that contrast in some positions,
but not others (that is, positional neutralization). For example, even for
speakers who maintain an /a/–/ɔ/ contrast in American English, this contrast
holds only before coronals and in open syllables. What is the nature of the
realization of these sounds before non-coronals? Do speakers produce the
‘same’ vowel in fog and frog? There are also some sounds that contrast in all
positions in the word, but where the functional load of the contrast is very
limited, such as in the case of /θ/ versus /ð/ in English (thigh vs. thy, ether vs.
either, Beth vs. eth, that is [ð]). Is contrast realized the same way in these cases
2.2.3 Phonotactics
A second aspect of sound systems widely understood to constitute part of
phonology is allowable sound combinations or sequences—phonotactics.
Some aspects of phonotactics appear to be deWned by segmental context,
especially immediately preceding and following elements; some aspects are
deWned by prosodic position, often best characterized in terms of syllable
structure; and some aspects are deWned by morpheme- or word-position.
Under many approaches to phonology, phonotactic patterns are understood
to be categorical in nature. Particular combinations of sounds are understood
to be either well-formed or ill-formed. Following most generative approaches
to phonology, both rule-based and constraint-based, phonotactic patterns are
captured with the same formal mechanisms as phonological alternations.
Typically, phonotactic and allophonic patterns closely parallel each other,
providing the motivation for such uniWed treatments. It is argued that distinct
treatments would result in a ‘duplication’ problem (e.g. Kenstowicz and
Kisseberth 1977).
Recent work by a wide range of scholars (e.g. Pierrehumbert 1994, Vitevitch
et al. 1997, Frisch 2000, Bybee 2001, and Hay et al. 2003) suggests that
phonotactic patterns can be gradient, in the sense that they do not always
hold 100 per cent of the time. Phonotactic patterns may reflect the stochastic
nature of the lexicon, and speaker/hearers are able to make judgements about
the relative well-formedness of phonotactic patterns.
As an example, consider the phonotactics of medial English clusters, as
analysed by Pierrehumbert (1994). Pierrehumbert asks how we can account
for the distribution of medial clusters, that is, the fact that certain consonant
sequences are well-formed but others are not, for example /mpr/ and
/ndr/ but not */rpm/ or */rdn/. A generative phonology approach predicts:
medial clusters = possible codas + possible onsets. A stochastic syllable
grammar makes different predictions: ‘the likelihood of medial clusters
derived from the independent likelihoods of the component codas and onsets’
(1994: 174) and ‘The combination of a low-frequency coda and a low-frequency
onset is expected to be a low-frequency occurrence’ (1994: 169).
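The two predictions can be contrasted in a short sketch (the probabilities below are invented for illustration; Pierrehumbert’s own figures come from dictionary counts). Under the categorical view, a medial cluster is well-formed just in case some split of it parses as a possible coda followed by a possible onset; under the stochastic view, its expected likelihood is the product of the independent coda and onset likelihoods, so a low-frequency coda plus a low-frequency onset yields a very low-frequency cluster.

```python
# Toy probabilities (invented for illustration, not Pierrehumbert's counts);
# an entry of 0.0 or absence from the table means 'not a possible coda/onset'.
coda_p  = {'m': 0.20, 'n': 0.30, 'r': 0.25, 'nd': 0.05}
onset_p = {'pr': 0.10, 'dr': 0.08, 'pm': 0.0, 'dn': 0.0}

def parses(cluster):
    """All (coda, onset) splits of a medial cluster string."""
    return [(cluster[:i], cluster[i:]) for i in range(1, len(cluster))]

def categorical_ok(cluster):
    # generative-phonology prediction: licit iff some split is a
    # possible coda followed by a possible onset
    return any(coda_p.get(c, 0) > 0 and onset_p.get(o, 0) > 0
               for c, o in parses(cluster))

def stochastic_likelihood(cluster):
    # stochastic-grammar prediction: likelihood of the best split,
    # as the product of independent coda and onset likelihoods
    return max(coda_p.get(c, 0) * onset_p.get(o, 0)
               for c, o in parses(cluster))

# /mpr/ parses as coda m + onset pr and gets a nonzero likelihood;
# */rpm/ has no licit split, so both approaches rule it out.
```

The difference between the two views shows up not at the extremes but in the middle: the categorical grammar draws a line between licit and illicit, while the stochastic grammar additionally grades the licit clusters by expected frequency.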
Pierrehumbert carried out a systematic analysis of a dictionary and found
knowledge is not tied to frequency and indeed is true abstraction across the
lexicon; that is, there is phonological knowledge independent of statistical
generalizations across the lexicon. ‘In light of such results, I will assume,
following mainstream thought in linguistics, that an abstract phonological
level is to be distinguished from the lexicon proper’ (2003: 191). This suggests
that we have access to both fine-grained and coarse-grained levels of
knowledge and that they co-exist (see Beckman 2003 and Beckman et al. 2004). We
would predict a (negative) correlation between the degree of gradience and
the level of abstraction.
2.2.5 Allophony
The final aspect of phonology is allophony. Based on the definitions of SPE,
allophony is understood to be part of phonology, due to its language-specific
nature. There has been much discussion in the literature about whether
allophony is necessarily categorical in nature or whether there are gradient
aspects of allophony. There are also many cases of what was understood as
allophony in categorical terms that have been shown, based on instrumental
studies, to be gradient. This is the case of anticipatory nasalization in English
discussed in Cohn (1990, 1993) and the case of velarization of [l] in English as
discussed by Sproat and Fujimura (1993). Such cases raise three issues.
Figure 2.2. (a) Continuum between phonetics and phonology (x-axis) and fine-grained
and granular (y-axis) dimensions of speech; (b) distribution of data, modular
approach; (c) distribution of data, unidimensional approach. [Panel graphics not
reproduced; each panel’s axes run from phonetics to phonology and from fine-grained
to granular.]
Gradedness: Interpretive
Dependencies and Beyond
ERIC REULAND
3.1 Introduction
During the last decades it has been a recurrent question whether the
dichotomy between grammatical and ungrammatical, or well-formed and
ill-formed, would not be better understood as a gradient property (cf.
Chomsky’s (1965) discussion of degrees of grammaticality).1 If so, one may
well ask whether gradedness is not an even more fundamental property of
linguistic notions. The following statement in the announcement of the
conference from which this book originated presupposes an affirmative
answer, and extends it to linguistic objects themselves, making the suitability
to account for gradedness into a test for linguistic theories: ‘The kind of
grammar typically employed in theoretical linguistics is not particularly
suited to cope with a widespread property of linguistic objects: gradedness.’2
This statement implies that we should strive for theories that capture gradedness.
To evaluate it one must address the question of what ‘gradedness’ as a
property of linguistic objects really is. The issue is important. But it is also
susceptible to misunderstandings. My first goal will be to show that gradedness
is not a unified phenomenon. Some of its manifestations pertain to
language use rather than to grammar per se. Understanding gradedness may
therefore help us shed light on the division of labour among the systems
underlying language and its use. Showing that this is the case will be the
second goal of this contribution.
1 This material was presented at the ‘Gradedness conference’ organized at the University of
Potsdam 21–23 October 2002. I am very grateful to the organizers, in particular Gisbert Fanselow,
for creating such a stimulating event. I would like to thank the audience and the two reviewers of the
written version for their very helpful comments. Of course, I am responsible for any remaining
mistakes.
2 This statement is taken from the material distributed at the conference.
This quote does not state that the speech community is homogeneous, nor
that one should not study the nature of variation between speech communities.
It also does not claim that intuitive judgements of speakers about the
grammaticality of utterances are categorical and stable. In fact one should not
expect them to be. Linguistic data are like any empirical data. Whether one
takes standard acceptability judgements of various kinds, truth-value
judgement tasks, and picture identification tasks on the one hand, or eye-tracking
data and neurophysiological responses in brain tissue on the other, they all have
the ‘ugly’ properties of raw empirical data in every field. Depending on the
nature of the test, it may be more or less straightforward to draw conclusions
about grammaticality, the temporal or spatial organization of brain processes,
etc. from such data. In fact, Chomsky says no more than that the ‘study of
language is no different from empirical investigation of other complex
phenomena’, and that we should make suitable idealizations in order to make
effective empirical investigation feasible at all. For anyone who can see that
watching a tennis match is not the best starting point if you want to begin
to understand the laws of motion, Chomsky’s point should be pretty
straightforward.
Gradedness: Interpretive Dependencies and Beyond 47
The issue is how the differences along those dimensions affect the
transmission of information to other parts of the system. As we all may
know from inspecting our watches, a digital system can mimic an analogue
process and an analogue system can mimic a digital process, and in fact all
watches are based on conversions from one type of process to another. In
the brain it need not be different. And even if at some level the brain
architecture had connectionist-type properties, this would not prevent it
from emulating symbolic/discrete operations. So, quite apart from properties
of the brain architecture, whether the system underlying language is best
conceived as analogue or discrete/digital is an independent empirical issue.
If so, there is no escape from approaching the issue along the lines of any
rational empirical inquiry. All models of language that come anywhere near
minimal empirical coverage are discrete. There are no analogue models to
compare them with (see Reuland (2000) for some discussion). Yet, there are
some potentially analogue processes in language:
- use of pitch and stress as properties of form representing attitudes
regarding the message;
- use of pitch and stress representing the relative importance of parts of the
message.
These, however, typically involve indexical relationships, such as the contours
of intonation that one may use for various degrees of wonder or surprise, or
the heaviness of the stress or the height of the pitch of an expression, where
the relative position on a scale in some dimension of properties of the signal
reflects the relative position on a scale of importance, surprise, etc., of what it
expresses. One need not doubt that the intensity of the emotions involved is
expressed by properties of the linguistic signal in an essentially analogue
manner. But this type of import of the signal must be carefully distinguished
from properties of the signal that serve as grammatical markings in the
sentence. For instance, question intonation is the realization of a grammatical
morpheme. One expression cannot be more or less a question than another
one. So, the grammatical import of question intonation is as discrete as the
presence or absence of a segmental question morpheme. As shown by studies
of the expression of focus (Reinhart 1996), the role of intonation in focus
marking is discrete as well.3 There is no gradedness of focus-hood as a
Unlike what was thought in the 1970s, there is no clear split along regional
dialect lines. Rather, there is variation at the individual level. Note, however,
that this is not a matter of real gradedness. Individual speakers are quite
consistent in their judgements. Hence, no insight would be gained by treating
this as a ‘graded’ property of Dutch. The same holds true of the variation
regarding the that-trace filter in American English. Variation of this kind is
easily handled by the one theoretical tool late GB or current minimalist theory
has available to capture variation, namely variability in the feature composition
of functional elements. The contrast between (3.1a) and (3.1b) reflects a
(3.4) which candidates from every class did some teacher tell every colleague
that most mother’s favourites refused to acknowledge before starting to
support

Carrying out complex computations with interacting quantifiers may easily
lead to the exhaustion of processing resources. Overflow of working memory
leads to guessing patterns. It is well known that speakers differ in their
processing resources (for instance, Just and Carpenter 1987; Gathercole and
Baddeley 1993). So, varying availability of resources may lead to differential
patterns of processing breakdown. Simply speaking, one speaker may start
guessing at a point of complexity where another speaker is still able to process
the sentence according to the linguistic rules. This would yield differences in
observable behaviour that do have gradient properties, but that are again
grammar-external, and that it would be a mistake to encode in the
grammatical system.
Note that even simple sentences may sometimes be mind-boggling, as in
determining whether every mother’s child adores her allows every mother to
bind her or not.
Of course, the ranking of processes in terms of resource requirements has a
clear theoretical interest, since it sheds light on the overall interaction of
processes within the language system, but it is entirely independent of the
issue of gradedness of grammar.
sensori-motor system or the grammatical system. For instance, Dutch dat and
Frisian dat are both complementizers; they differ slightly in the way the a is
realized, more back in the case of Frisian. In Frisian, but not in Dutch, dat
carries a grammatical instruction letting it cliticize to wh-words (actually,
it does not matter whether this is a property of dat, or of the element in
spec-CP). This property can be dissociated from its pronunciation, witness
the fact that some Frisian speakers have this feature optionally in their Dutch,
and use cliticization together with the Dutch pronunciation of the a.
Optionality means that the mental lexicon of such speakers contains
the two variants, both being accessible in the ‘Dutch mode’. If so, there is no
reason that this cannot be generalized to other cases of micro-variation. If the
mental lexicon may contain close variants of one lexical item, one may expect
retrieval of one or the other to be subject to chance. Of course, this does not
mean that the phenomenon of variation and the mechanisms behind it are
uninteresting. It does mean that the concept of gradedness does not
necessarily help in understanding it.
Interpretation being dependent on perspective is another possible source of
variation (as in the Necker cube: which edge of the cube is in front?). Consider
the following contrast from Pollard and Sag (1992):

(3.5) a. Johni was going to get even with Mary. That picture of himselfi in
the paper would really annoy her, as would the other stunts he had
planned.
b. *Mary was quite taken aback by the publicity Johni was receiving.
That picture of himselfi in the paper had really annoyed her, and
there was not much she could do about it.

Example (3.6) allows for two discourse construals: one in which the viewpoint
is Mary’s throughout, another in which the perspective shifts to that of
John as soon as John is introduced. Clearly, judgements will be influenced by
the ease with which the perspective shift is carried out. So, we have gradedness
in some sense, but it is irrelevant for the system, since the relevant factor is still
discrete. The judgement is determined by whether the shift in viewpoint is
accessed in actual performance.
[Schema not reproduced: the lexicon feeds the computational system, which
connects a PF-interface and a C–I-interface (the interpretive system), the
latter feeding inferences.]
4 Take, for instance, the rules computing entailments. A sentence such as DPplur elected Y does not
entail that z elected Y, for z ∈ ‖DPplur‖, whereas DPplur voted for Y does entail that z voted for Y, for
z ∈ ‖DPplur‖. This is reflected in the contrast between we elected me and ??we voted for me. The latter is
ill-formed since it entails the instantiation I (λx (x voted for x)), which is reflexive, but not reflexive-marked
(Reinhart and Reuland 1993).
shows that the computational system can read these properties only in so far as
they can be encoded as combinations (clusters) of two features: [±c(ause)]
and [±m(ental)]. Anything else is inaccessible to CHL. The coding of the
concept subsequently determines how each cluster is linked to an argument.
Note furthermore that an interface must itself contain operations. That is, it
must be a component sui generis. For instance, syntactic constituents do not
always coincide with articulatory units. Furthermore, in preparation for
instruction to the sensori-motor system, hierarchical structure is broken
down and linearized. On the meaning side, it is also well known that for
storage in memory much structural information is obliterated. As Chomsky
(1995) notes, at the PF side such operations violate inclusiveness. It seems fair
to assume that the same holds for the interpretive system at the C–I interface.
Given this schema, more can be ‘wrong’ with an expression than just a
violation (crash) of the derivation in CHL. Differences in the ‘degree of
ungrammaticality’ may well arise from a combination of violations in different
components: PF, lexicon, narrow syntax, interpretive system, interpretation
itself. Any theory of language with a further articulation into subsystems is in
principle well equipped to deal with ‘degrees’ of well- or ill-formedness.
that more finely grained analyses make it possible to connect the behaviour of
anaphoric elements to the mechanisms of the grammar, and to explain
variation from details in their feature make-up, or from differences in their
environment. Current research warrants the conclusion that the computational
system underlying binding operates as discretely and categorically as it
does in other areas of grammar. However, just as in the cases discussed in
Sections 3.1 and 3.2, the computational system does not determine all aspects
of interpretation; where it does not, systems of use kick in, evoking the air of
flexibility observed.
Much work on binding and anaphora so far is characterized by two
assumptions:
1. all binding dependencies are syntactically encoded (by indices or
equivalent);
2. all bindable elements are inherently characterized as pronominals or
anaphors (simplex, complex, or clitic).
These assumptions are equally characteristic of extant OT approaches to
binding. So, one finds different rankings of binding constraints on pronominals
or subtypes of anaphors, across environments and languages. In
Reuland (2001) I argued that both assumptions are false. The argument
against (1) rests in part on the inclusiveness condition, and hence is theory-internal,
but should be taken seriously in any theory that strives for
parsimony. Clearly, indices are not morphosyntactic objects. No language
expresses indices morphologically. Thus, if syntax is, indeed, the component
operating on morphosyntactic objects, it has no place for indices. External
validation of the claim that syntax has no place for indices rests on the
dissociation this predicts between dependencies that can be syntactically
encoded and dependencies that cannot be. In so far as such dissociations
can be found, they constitute independent evidence (see Reuland 2003 for
discussion).
The argument against (2) is largely theory-independent. It essentially rests
on the observation that there are so many instances of free ‘anaphors’ and
locally bound ‘pronominals’ that one would have to resort to massive lexical
ambiguity in order to ‘maintain order’. Instead, we can really understand the
binding behaviour of pronouns (to use a cover term for anaphors and
pronominals) if we consider their inherent features, the way these features
are accessed by the computational system, and general principles governing
the division of labour between the components of the language system, in
particular CHL, logical syntax, and discourse principles as part of the
interpretive system.
By way of illustration I will briefly consider three cases: (a) free anaphors
(often referred to as ‘logophors’); (b) reflexives in PPs; and (c) local binding
of pronominals.
5 The notion ‘logical syntax’ is to be distinguished from ‘logical form’. Logical form is the output of
the computational system (CHL); the operations yielding logical form are subject to the inclusiveness
condition. Logical syntax is the first step in interpretation, with operations that can translate
pronominals as variables, can raise a subject leaving a lambda expression, etc., thus not obeying
inclusiveness. For a discussion of logical syntax, see Reinhart (2000a) and references cited there.
6 I am grateful to an anonymous reviewer for pointing out a problem in the original formulation.
In the present version I state more clearly the empirical assumptions on which the explanation rests.
Note that the problem of keeping arguments apart for purposes of theta-role assignment does not
come up in the case of two different arguments, as in (i) (abstracting away from QR/λ-abstraction):

(i) (1) [VP j [V′ V m]] → (2) [VP V m j] (–/→ (3) [VP V m/j])

The objects remain distinct. There is no reason that theta-relations established in the configuration in
(i.1) would not be preserved in (i.2). Hence, the issue of (i.3) does not arise. Note that theta-roles are
not syntactic features that modify the representation of the element they are assigned to. That is, they
are not comparable to case features. This leaves no room for the alternative in (ii), where xθ1 and xθ2 are
distinguishable objects by virtue of having a different ‘θ-feature’ composition:

(ii) (1) [VP xθ1 [V′ V xθ2]] → (2) [VP V xθ1 xθ2] (–/→ (3) [VP V xθ1/xθ2])

Rather, with theta-assignment spelled out, but reading xθ1 correctly as ‘x as x is being assigned the role θ1’,
we get (iii), which reflects the problem discussed in the main text:

(iii) (1) [VP xθ1 [V′ V xθ2]] → (2) ([VP V ‘‘x x’’θ1/θ2]) → (3) *[VP V xθ1/θ2]
SELF now marks the predicate as reXexive, and indeed nothing resists a
reXexive interpretation of (3.15). Consider now (3.13b), repeated as (3.16):
(3.16) a. *Max boasted that [the queen invited himself for a drink]
b. *Max boasted that [the queen SELF-invited [him[-]] for a drink]
SELF attaches to the verb, marks it as reflexive, but due to the feature
mismatch between the queen and him it cannot be interpreted as reflexive,
and the sentence is ruled out. Consider next (3.13a), repeated as (3.17):
(3.17) a. Max boasted that
[the queen invited [Mary and himself ] for a drink]
b. Max boasted that
[the queen SELF-invited [Mary and [him[-]]] for a drink]
SELF also has another use, namely as an 'intensifier'. Hence, its inter-
pretation reflects that property precisely in those cases where its use is not
regulated by the structural requirements of grammar (as we saw in the case
of (3.5) and (3.6)). Note that we do not find the same pattern in Dutch.
This is because himself differs from zichzelf in features. The zich in zichzelf is
not specified for number and gender. We know descriptively that zich must
be bound in the domain of the first finite clause. So, any theory assigning
zich a minimal finite clause as its binding domain will predict that Max is
inaccessible as an antecedent to zich in both (3.14a) and (3.14b) (see
Reinhart and Reuland 1991 for references, and Reuland 2001 for an execution
in current theory without indices). So, the contrast between Dutch and
English follows from independent differences in feature content between
himself and zichzelf.
Why must anaphors be bound? The fact that himself must be bound only
where it is a syntactic argument of the predicate, and is exempted where it is
not, already shows that this is too crude a question. Icelandic sig, too, may be
exempted from a syntactic binding requirement, albeit in a different environment,
namely the subjunctive.7 Similar effects show up in many other languages.
Hence, there can be no absolute necessity for anaphors to be bound.
Reuland (2001) shows that anaphors have to be bound only if they can
get a free ride on a syntactic process enabling the dependency between
the anaphor and its antecedent to be syntactically encoded. In the case
of anaphors such as sig or zich, a syntactic dependency can be formed as in (3.18):
(3.18) DP --R1-- I --R2-- V --R3-- pronoun
One, surely, could not seriously claim that Dutch speakers have a different
pragmatics, or that trots means something really different from fier. A clearly
syntactic factor can be identified, however. As is well known, Dutch has
preposition stranding, but French does not. Whatever the precise implementation
of preposition stranding, it must involve some relation between P and
the selecting predicate head that obtains in stranding languages like Dutch
and does not in French. Let us assume for concreteness' sake that this relation
is ‘allows reanalysis’. Thus, P reanalyses with the selecting head in Dutch, not
in French (following Kayne 1981). We will be assuming that reanalysis is
represented in logical syntax. If so, in all cases in (3.21) we have (3.22) as a
logical syntax representation:
(3.22) DP [V [P pro]] → . . . [V-P] . . . → DP (λx ([V-P] x x))
We can see now that at the logical syntax level we have a formally reflexive
predicate. Such a predicate must be licensed, which explains the presence of
SELF in all cases.
In French there is no V-P reanalysis. Hence, we obtain (3.23):
(3.23) DP [V [P pro]] → DP (λx (V x [P x]))
Here, translating into logical syntax does not result in a formally reflexive
predicate. This entails that no formal licensing is required.8 Hence it is indeed
expectations, or other non-grammatical factors, that may determine whether a
focus marker like même is required. In a nutshell, we see how in one language
a grammatical computation may cover an interpretive contrast that shows up
in another.
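The contrast between (3.22) and (3.23) can be made explicit by spelling out the reduction steps; the following is a sketch of the intended interpretation (the reduction is my illustration, with dp standing for the interpretation of the subject DP):

```latex
% Dutch: reanalysis yields one predicate with two identical arguments
\mathrm{DP}\,(\lambda x.\,[V\text{-}P]\,x\,x)
  \;\Rightarrow\; [V\text{-}P](\mathit{dp},\mathit{dp})
  \quad\text{(formally reflexive: licensing by SELF required)}

% French: P retains its own argument slot, so no single reflexive predicate
\mathrm{DP}\,(\lambda x.\,V\,x\,[P\,x])
  \;\Rightarrow\; V(\mathit{dp},[P\,\mathit{dp}])
  \quad\text{(not formally reflexive: no formal licensing needed)}
```

In the first case both argument slots of the reanalysed predicate are filled by the same variable, which is what makes the predicate formally reflexive.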
8 Note that by itself même is quite different from SELF (for instance, in French there is no *même-
admiration along the lines of English self-admiration, etc.).
9 The occasional claim that Frisian does not have specialized anaphors is reminiscent of the claim
that Old English does not have them. For Frisian the claim is incorrect. For Old English I consider the
claim inconclusive. Going over Old English texts, it is striking that all the cases that are usually
adduced in support of the claim are in fact cases that would make perfectly acceptable Frisian. What
one would need are clear cases of sentences with predicates such as hate or admire to settle the point.
10 Note that him is indeed a fully-fledged pronominal that can be used as freely as English him or
Dutch hem.
In (3.25a) and (3.25b), using hem instead of zich only violates the principle
that it is more economical to have an anaphor than a pronominal. In (3.25c),
however, hem violates two principles: economy and the principle that reflexivity
be licensed. And indeed, we do find different degrees of ill-formedness
reflecting the number of violations.
4.1 Introduction
Phonologists have begun to consider the importance of gradient phonological
patterns to theories of phonological competence (e.g. Anttila 1997; Frisch
1996; Hayes and MacEachern 1998; Ringen and Heinamaki 1999; Pierrehumbert
1994; Zuraw 2000; and for broader applications in linguistics see the
chapters in Bod et al. 2003 and Bybee and Hopper 2001). Gradient patterns
have been discovered in both internal evidence (the set of existing forms in a
language) and external evidence (the use and processing of existing and
novel forms in linguistic and metalinguistic tasks). The study of gradient
phonological patterns is a new and promising frontier for research in
linguistics. Grammatical theories that incorporate gradient patterns provide
a means to bridge the divide between competence and performance, most
directly in the case of the interface between phonetics and phonology
(Pierrehumbert 2002, 2003) and between theoretical linguistics and sociolinguistics
(e.g. Mendoza-Denton et al. 2003). In addition, linguistic theories that
incorporate gradient patterns are needed to unify theoretical linguistics
with other areas of cognition where probabilistic patterns are the rule rather
than the exception.
A variety of methodologies have been used in these studies, and an overview
of these methodologies is the primary focus of this paper. The methodologies
that are reviewed either attempt to directly assess phonological
knowledge (well-formedness tasks and similarity tasks) or indirectly reflect
phonological knowledge through performance (elicitation of novel forms,
corpus studies of language use, errors in production and perception). A case
Linguistic and Metalinguistic Tasks in Phonology 71
more likely for /p t s/ and less likely for /d g/. Zuraw (2000) used elicitation
tasks to explore the productivity of this process, and she also used a
well-formedness task where participants judged the acceptability of nasal
substitution constructions using novel stems in combination with common
prefixes. She compared wordlikeness judgements for the same novel stem in
forms with and without nasal substitution, and found that ratings were higher
for nasal substitution in cases where nasal substitution was more common in
the lexicon (e.g. /p t s/ onsets to the novel words).
Hay et al. (2004) examined the influence of transitional probabilities on
wordlikeness judgements for novel words containing medial nasal-obstruent
clusters (e.g. strimpy, strinsy, strinpy). Overall, they found that wordlikeness
judgements reflected the probability of the nasal-obstruent consonant cluster.
However, they found surprisingly high wordlikeness for novel words when the
consonant cluster was unattested (zero probability) in monomorphemic
words. They hypothesized that these high judgements resulted from
participants' analyses of these novel words as multimorphemic. This hypothesis was
supported in additional experiments where subjects were asked to make a
forced-choice decision between two novel words as to which was more
morphologically complex. Participants were more likely to judge words
with low-probability internal transitions as morphologically complex,
demonstrating that participants considered multiple analyses of the forms they
were given. Hay et al. (2004) proposed that participants would assign the
most probable parse to forms they encountered (see also Hay 2003).
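Transitional probabilities of the kind Hay et al. (2004) appeal to can be estimated from type counts over a lexicon. The sketch below is illustrative only: the toy word list and the maximum-likelihood estimate of P(next segment | current segment) are my assumptions, not the procedure of the original study.

```python
from collections import Counter

def transitional_probs(lexicon):
    """Estimate P(next segment | current segment) by maximum likelihood
    from type counts over a list of words (one character per segment)."""
    bigrams = Counter()
    firsts = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            bigrams[(a, b)] += 1
            firsts[a] += 1          # times `a` occurs with a following segment
    return {pair: n / firsts[pair[0]] for pair, n in bigrams.items()}

# Toy "monomorphemic lexicon" (hypothetical, not real English counts).
lexicon = ["limp", "lamp", "ramp", "hint", "mint", "pins"]
probs = transitional_probs(lexicon)
# /mp/ follows /m/ in 3 of the 4 words where /m/ is non-final,
# so probs[('m', 'p')] == 0.75; an unattested cluster gets no entry at all,
# corresponding to the zero-probability clusters discussed above.
```

A cluster absent from the dictionary (zero probability) simply has no key, which is the lexical situation that produced the unexpectedly high judgements Hay et al. attribute to multimorphemic parses.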
The findings of Hay et al. (2004) highlight the importance of careful design
and analysis of stimuli in experiments using metalinguistic tasks. Although
these tasks are meant to tap 'directly' into phonological knowledge, the
judgements that are given are only as good as the stimuli that are presented
to participants, and there is no guarantee that the strategy employed by
participants in the task matches the expectations of the experimenter. Another
example of this problem appears in the case study in this chapter, where
perceptual limitations of the participants resulted in unexpected judgements
for nonword forms containing presumably illegal onset consonant clusters.
4.2.1.2 Distance from English Greenberg and Jenkins (1964) is the earliest
study known to the authors that examined explicit phonological judgements
for novel word forms in a psycholinguistic experiment. They created novel
CCVC words that varied in their ‘distance from English’ as measured by a
phoneme substitution score. For each novel word, one point was added to its
phoneme substitution score for every position or combination of positions for
which phoneme substitution could make a word. For example, for the novel
word /druk/ there is no phoneme substitution that changes the first phoneme
to create a word. However, if the first and second phonemes are replaced, a
word can be created (e.g. /fluk/). For every novel word, substitution of all four
phonemes can make a word, so each novel word had a minimum score of one.
Greenberg and Jenkins (1964) compared the cumulative edit distance against
participants’ judgements when asked to rate the novel words for their ‘distance
from English’ using an unbounded magnitude estimation task. In this task,
participants are asked for a number for ‘distance from English’ based on their
intuition, with no constraint given to them on the range of numbers to be
used. Greenberg and Jenkins (1964) found a strong correlation between the
edit distance measure and participants’ judgements of ‘distance from English’.
They also found similar results when they used the same stimuli in a
wordlikeness judgement task where participants rated the words on an
11-point scale. Given that the data from a wordlikeness task is simpler to
collect and analyse, there appears to be no particular advantage to collecting
distance judgements in the study of phonology.
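The substitution score described above can be approximated as follows: for each real word of the same length, record the set of positions at which it differs from the novel form, and award one point per distinct non-empty set. This is a reconstruction from the description in the text, with a hypothetical toy lexicon, not Greenberg and Jenkins's original materials.

```python
def substitution_score(novel, lexicon):
    """One point for each distinct set of positions at which phoneme
    substitution turns `novel` into an existing word of the same length."""
    diff_sets = set()
    for word in lexicon:
        if len(word) != len(novel) or word == novel:
            continue
        # Positions where the real word differs from the novel form.
        diffs = frozenset(
            i for i, (a, b) in enumerate(zip(novel, word)) if a != b
        )
        diff_sets.add(diffs)
    return len(diff_sets)

# Toy lexicon: /fluk/ differs from /druk/ at positions {0, 1},
# /drik/ at {2}, /trak/ at {0, 2} -- three distinct sets, score 3.
score = substitution_score("druk", ["fluk", "drik", "trak"])
```

Higher scores mean the novel form is closer to English, since more substitution patterns land on existing words; substituting all positions always yields some word, which is why every item in the original study scored at least one.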
4.2.3 Summary
Studies of language patterns, language processing, and metalinguistic
judgements have found substantial evidence that probabilistic phonological
patterns are part of the knowledge of language possessed by speakers. These
patterns are reflected in the frequency of usage of phonological forms, the ease
of processing of phonological forms, and gradience in metalinguistic
judgements about phonological forms. In the next section, a case study is presented
that demonstrates probabilistic patterns in the cross-linguistic use of
consonant clusters and gradience in metalinguistic judgements about novel words
with consonant clusters that reflects the cluster's probability.
Attested 11 11 4 1
Possible 12 16 6 2
% Attested 92% 69% 67% 50%
that occur moderately frequently (/kw, dr, sl, fr/) or very frequently (/pl, sp, fl,
gr, pr/) in the lexicon. The nonword list was analysed to avoid potential
confounding factors for wordlikeness judgements, such as violating a
phonotactic constraint somewhere other than in the onset. Rime statistics
were also compiled to examine the effects of rime frequency on judgements
(as in Kessler and Treiman 1997). After discarding items that were not suitable,
115 novel words remained to be used in the experiment. The nonwords were
recorded as spoken by the first author using digital recording equipment.
4.3.2.2 Participants Thirty-five undergraduate students in an introductory
communication sciences and disorders course participated in the experiment.
Subjects were between 19 and 45 years of age, and three males and thirty-two
females participated. All participants were monolingual native speakers of
American English and reported no past speech or hearing disorders.
4.3.2.4 Results The data were analysed based on the mean rating given to
each stimulus word across subjects. Three instances where a participant gave
no response to the stimulus were discarded; otherwise, all data were analysed.
Correlations were examined for the type frequency of onset CC, C1, C2, rime,
nucleus, and coda. As expected, mean wordlikeness judgements correlated
significantly with the type frequency of the CC sequences contained in the
novel words. The CC frequency was the strongest predictor of how the
participants judged wordlikeness (r = 0.38). The frequency of the nucleus
was also correlated with the participants' judgements of wordlikeness (r = 0.19).
In a regression model of the wordlikeness data using these two factors, both
factors were found to be significant (CC: t(112) = 4.1, p < .001; Nucleus:
t(112) = 2.0, p < .05).
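The analysis pipeline reported here, Pearson correlations followed by a two-predictor linear regression with 112 residual degrees of freedom (115 items minus 3 parameters), can be sketched with synthetic data. The data and effect sizes below are fabricated for illustration; only the structure of the analysis mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 115                                   # one mean rating per novel word
cc_freq = rng.normal(size=n)              # simulated log CC type frequency
nuc_freq = rng.normal(size=n)             # simulated log nucleus frequency
rating = 0.4 * cc_freq + 0.2 * nuc_freq + rng.normal(size=n)

# Simple correlations of each predictor with the mean ratings.
r_cc = np.corrcoef(cc_freq, rating)[0, 1]
r_nuc = np.corrcoef(nuc_freq, rating)[0, 1]

# Two-predictor OLS regression; t statistics are on t(n - 3) = t(112).
X = np.column_stack([np.ones(n), cc_freq, nuc_freq])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
resid = rating - X @ beta
dof = n - X.shape[1]                      # 112, as in the text
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se                       # intercept, CC, nucleus
```

The degrees of freedom fall out of the design exactly as reported: 115 stimulus means and 3 estimated parameters leave t(112) for each coefficient.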
The amount of variance explained by the CC frequency and the nucleus
frequency is relatively small (cf. Munson 2001; Hay et al. 2004). Unexpectedly,
we found that participant judgements of the novel words containing CCs that
do not occur in English were fairly high. Possible explanations for this finding
[Figure 4.1 appears here: a scatter plot of mean rating (y-axis, 1–5) against log CC type frequency (x-axis, 1–1000), with points labelled by onset cluster, including /fl, fr, gr, kw, sp, θr, sn, pl, dr, dw, sl, sf, gw/.]
Figure 4.1. Mean wordlikeness rating for occurring consonant clusters in English
(averaged across stimuli and subjects) by consonant cluster type frequency
4.4 Implications
Phonology provides an ideal domain in which to examine gradient patterns
because there is a rich natural database of phonological forms: the mental
lexicon. A growing number of studies using linguistic and metalinguistic
evidence have shown that phonological structure can be derived, at least in
part, from an analysis of patterns in the lexicon (e.g. Bybee 2001; Coleman and
Pierrehumbert 1997; Frisch et al. 2004; Kessler and Treiman 1997; and see
especially Hay 2003). In addition, it has been proposed that the organization
and processing of phonological forms in the lexicon is a functional influence
on the phonology, and that gradient phonological patterns quantitatively
reflect the difficulty or ease of processing of a phonological form (Berg 1998;
Frisch 1996, 2000, 2004).
For example, the storage of a lexical item must include temporal information,
as distinct orderings of phonemes create distinct lexical items (e.g. /tAp/
is different from /pAt/ in English). The temporal order within a lexical item is
reflected in some models of speech perception, such as the Cohort model and
its descendants (e.g. Marslen-Wilson 1987), where lexical items compete first
on the basis of their initial phonemes, and then later on the basis of later
phonemes. Sevald and Dell (1994) found influences of temporal order on
speech production. They had participants produce nonsense sequences of
words and found that participants had more difficulty producing sequences
where words shared initial phonemes. Production was facilitated for words
that shared final phonemes (in comparison to words without shared
phonemes). In general, models of lexical activation and access predict that lexical
access is most vulnerable to competition between words for initial phonemes
and less vulnerable for later phonemes, as the initial portions of the accessed
word severely restrict the remaining possibilities. This temporal asymmetry
can be reflected quantitatively in functionally grounded phonological
constraints. For example, phonotactic consonant co-occurrence constraints in
Arabic are stronger word-initially than later within the word (Frisch 2000). This
asymmetry is compatible with the claimed grounding of this co-occurrence
constraint in lexical access (Frisch 2004).
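The cohort-style narrowing described above can be sketched directly: the candidate set shrinks as each successive segment arrives, so early segments prune far more competitors than late ones. The mini-lexicon is hypothetical.

```python
def cohort_stages(target, lexicon):
    """Candidate set remaining after each successive segment of `target`,
    in the spirit of cohort-style incremental lexical access."""
    return [
        {w for w in lexicon if w.startswith(target[:i])}
        for i in range(1, len(target) + 1)
    ]

# After /c/ four candidates survive; after /ca/ three; after /cat/ only one.
lexicon = {"cap", "cat", "can", "cot", "dog"}
stages = cohort_stages("cat", lexicon)
```

The monotone shrinkage of the candidate set is what makes initial segments carry most of the discriminative load, matching the temporal asymmetry in the constraints discussed above.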
It has also been demonstrated in a wide range of studies that the phonological
lexicon is organized as a similarity space (Treisman 1978; Luce and
Pisoni 1998). The organization of the lexicon as a similarity space is reflected
in processing differences based on activation and competition of words that
share sub-word phonotactic constituents. The impact of similarity-based
organization is most clearly reflected in cases of analogical processes between
4.5 Summary
Studies of gradience in phonology using linguistic and metalinguistic data
have revealed a much closer connection between phonological grammar and
the mental lexicon. These new dimensions of phonological variation could
not have been discovered without corpus methods and data from groups of
participants in psycholinguistic experiments. While the range of patterns that
have been studied is still quite limited, the presence of gradient phonological
constraints demonstrates that phonological knowledge goes beyond a
categorical symbolic representation of possible forms in a language. In order to
accommodate the broader scope of phonological generalizations, models of
grammar will have to become more like models of other cognitive domains,
which have long recognized and debated the nature of frequency and
similarity effects for mental representation and processing. The study of
phonology may provide a unique contribution in addressing these more general
questions in cognition, for two reasons. The scope of phonological variability
is bounded by relatively well-understood mechanisms of speech perception
and speech production. Also, phonological categories and phonological
patterns provide a sufficiently rich and intricate variety of alternatives that the
full complexity of cognitive processes can be explored.
5
5.1 Introduction
Non-standard varieties under investigation, such as dialects throughout
Europe, challenge research on the phenomenon of micro-variation in
two ways.1 Within the framework of generative grammar, the linguist studies
the universal properties of human language in order to find out the
patterns, loci, and limits of syntactic variation. Language is viewed essentially
as an abstraction, more specifically as a psychological construct (I-language)
that refers primarily to differences between individual grammars within a
homogeneous speech environment, that is to say, without consideration of
stylistic, geographic, and social variation. Given this objective, a suitable
investigative tool is the use of intuitions or native-speaker introspection, an
abstraction that is quite normal within the scientific enterprise. Frequently,
however, there are no sufficiently detailed descriptions available of syntactic
phenomena that are of theoretical interest for investigating micro-variation in
and between closely related non-standard varieties in a large geographical area
(cf. Barbiers et al. 2002). Consequently, a complication emerges in that the
linguist has to collect his own relevant data from speakers of local dialects who
are non-linguists. The elicitation of speaker introspection often calls for a
design of experiments in the form of acceptability judgements when the
linguist has to elicit intuitions from these speakers (Cornips and Poletto 2005).
1 I would like to thank two anonymous reviewers for their valuable comments. Of course, all usual
disclaimers apply.
Moreover, the standard variety may strongly interfere with local dialect
varieties in some parts of Europe, so that there is no clear-cut distinction
between the standard and the local dialect. In this contact setting—a so-called
intermediate speech repertoire (cf. Auer 2000)—the speakers of local dialects
may assign all possible syntactic variants, that is, dialect, standard, and
emerging intermediate variants, to their local dialect. Consequently, clear-cut
judgements between the local dialect and the standard variety are not attainable
at all. This is, among other factors, of crucial importance for understanding
the phenomenon of gradedness in acceptability judgements.
This chapter is organized as follows. In the second part it is proposed that
acceptability judgements do not offer a direct window into an individual's
competence. The third part discusses an intermediate speech repertoire that is
present in the province of Limburg in the Netherlands. In this repertoire some
constructions are difficult to elicit. Finally, acceptability judgements that are
given by local dialect speakers in the same area are discussed. Using data from
reflexive impersonal passives, reflexive ergatives, and inalienable possession
constructions, it is argued that the occurrence of intermediate variants and
the variation at the level of the individual speaker is not brought about by
specific task effects but is due to the induced language contact effects between
the standard variety and the local dialects.
because of the intrusions of numerous other factors' (cf. Gervain 2002). 'The
intrusion of numerous other factors' may lead to a crucial mismatch between
the acceptability judgements of a construction and its use in everyday speech.
One of these factors is that in giving acceptability judgements people tend to
go by prescriptive grammar (what they learned at school, for instance) rather
than by what they actually say (cf. Altenberg and Vago 2002). This is consistent
with sociolinguistic research showing that prescriptive grammars usually equal
standard varieties that are considered more 'correct' or have more prestige
than the vernacular forms speakers actually use. Moreover, strong sociolinguistic
evidence shows that a speaker may judge a certain form to be completely
unacceptable but can, nevertheless, be recorded using it freely in
everyday conversation (Labov 1972, 1994, 1996: 78). One way to diminish the
prescriptive knowledge effect is to ask for indirect comparative acceptability
judgements. Rather than eliciting direct intuitions with the formula 'Do you
judge X a grammatical/better sentence than Y?', speakers can be asked the
more indirect 'Which variant, Y or X, do you consider to be the most or the
least common one in your local dialect?' Relative judgements can be administered
by asking the speakers to indicate how uncommon or how common (for
example, represented by the highest/lowest value on a several-point scale,
respectively) the variant is in their local dialect. Psychometric research
shows that subjects are much more reliable at comparative, as opposed
to independent, ratings (cf. Schütze 1996: 79 and references cited there). These
findings indicate that relative acceptability is an inevitable part of the
speaker's judgements.
Relative acceptability is without doubt brought about by the complex
relationship between I-language and E-language phenomena. The opposition
between these two types of phenomena is not necessarily as watertight as is often
claimed in the literature. Muysken (2000: 41–3) argues that the cognitive
abilities which shape the I-language determine the constraints on forms
found in the E-language, and that it is the norms created within E-language
as a social construct that make the I-language coherent. One example of this
complex relationship may be that in a large geographical area two or more
dialects may share almost all of their grammar (the more objective perspective)
but are perceived as different language varieties by their speakers (the more
subjective perspective). This can be due to the fact that dialects may differ
rather strongly in their vocalism and that non-linguists are very sensitive to
the quality of vowels. The perceived differences between dialects may be
associated with different identities and vice versa. Another consequence
may be that speakers actually believe that the norm created within their speech
community (E-language; the more subjective perspective) reflects their grammar
3 However, the plain impersonal passive is also grammatical in the local dialect of Heerlen. In that
case it has no reflexive interpretation: 'One is washing (clothes).'
The construction with the reflexive zich is fully ungrammatical in the standard
variety but optional in the local dialect. The written questionnaire of the
Dutch Syntactic Atlas project shows that in only two out of thirty-five possible
locations in the province of Limburg and its immediate surroundings is an
answer with the reflexive given. Obviously, the interference with the
standard variety is so strong that the reflexive is not present in the answers.
It seems that only a very good subject can provide optional structures or all
the possibilities that come to his mind.
[Map: the Netherlands (with Amsterdam marked) and Germany, with Heerlen indicated in the south-east.]
In the local dialect of Heerlen, the reflexive zich has to be present in (5.5b) and
it may be present in (5.5c). Further, it also arises in some more inchoative
constructions based on transitive verbs such as veranderen 'change', krullen 'curl',
and buigen 'bend'. All these inchoative constructions are fully ungrammatical
with a reflexive in the standard variety. Heerlen Dutch, however, as a second
standard variety in the area, has regularized the presence of the reflexive
throughout the whole verb class. It has an optional reflexive zich in the
construction in (5.5c) and also in (5.6), which are ungrammatical in the local
dialect (and in the standard variety) (HD = Heerlen Dutch):4
4 In Cornips and Hulk (1996), it is argued that the Heerlen Dutch constructions in (5.6) are ergative
intransitive counterparts of transitive change-of-state verbs with a causer as external argument.
Further, it is shown that in constructions such as (5.4) and (5.6), the reflexive marker zich acts as an
aspectual marker, that is: the aspectual focus is on the end-point of the event.
5 Blancquaert et al. (1962) denote the following translation in the local dialect of Kerkrade (in
Dutch orthography):
(i) De schipper likte zich zijn lippen af
The skipper licked refl his lips part.
‘The skipper licked his lips.’
6 The constructions in (5.12) with a PP are more accepted in standard Dutch than the double object
constructions (see Broekhuis and Cornips 1997 for more details about this type of construction).
In (5.13) the semi-copula krijgen 'get' cannot assign dative case to the possessor
hij 'he', which is therefore nominative (cf. Broekhuis and Cornips 1994).
However, the spontaneous speech data example in (5.13b) shows that the
possessor is also realized by means of the reflexive zich, which is fully
ungrammatical in the local dialect (as in standard Dutch):
(5.13) a. HD Hij krijgt de handen vies
he gets the hands dirty
‘His hands are dirty.’
b. HD Die heeft zich enorm op z’n donder gekregen
(33: dhr Quint)
he has refl enormously a beating got
‘He gets hell.’
Let us now compare (5.14) with the spontaneous speech data example in (5.15)
that occurs very infrequently in the corpus:7
(5.15) HD Ze slaan mekaar[pl] niet meteen de koppen[pl] in (5: Stef)
they hit each other not at once the heads in
'They don't hit each other immediately.'
Apparently, both a distributive interpretation and the property of grammatical
number may no longer be characterizing properties of the intermediate forms
in Heerlen Dutch, that is: the inalienable argument koppen ‘heads’ is plural
although the number of the body-parts kop ‘head’ per individual is limited
to one.
Finally, the dative construction cannot be modified by just any attributive
adjective, whereas there is no such restriction in the possessive pronoun
constructions, as exemplified in (5.16a) and (5.16b), respectively (Vergnaud
and Zubizarreta 1992):
(5.16) a. HD *Ik was hem[dat] de vieze buik
I wash him the dirty stomach
'I am washing his dirty stomach.'
b. HD Ik was zijn vieze buik.
I wash his dirty stomach
'I am washing his dirty stomach.'
I asked speakers of Heerlen Dutch to tell a short story containing the elements
vuil 'dirty' and handen 'hands'. In addition to (5.16b), they realize the
intermediate variant in (5.17), which lacks, due to the presence of the possessive
pronoun, any restriction on the presence of the adjective:
(5.17) HD Hij wast hem[dat] zijn vuile handen.
He washes him his dirty hands
'He is washing his dirty hands.'
Further, a major important aspect of emerging intermediate variants such as
the ones described above is that optionality arises. Hence, a major characteristic
of the dative inalienable possession construction is that the [spec TP]
subject, or the agent, cannot enter into a possessive relation with the direct
7 One reviewer points out that the German counterpart of (5.15) with a plural inalienable argument
is grammatical, whereas the German counterpart of (5.14a) with a singular inalienable argument is
ungrammatical. However, (5.14a) is fully grammatical both in the local dialect of Heerlen and Heerlen
Dutch.
object (or prepositional complement), not even if the indirect object is absent,
as illustrated in (5.18a). Thus, a possessive relation between the subject and the
direct object can only be expressed indirectly, namely by inserting a dative NP
or a reflexive zich, as in (5.18b), respectively (see Broekhuis and Cornips 1994;
Vergnaud and Zubizarreta 1992).
(5.18) a. Hdial/SD *Hij_i wast de handen_i
b. Hdial/?*SD Hij_i wast zich_i de handen.
he washes refl the hands
'He is washing his hands.'
However, in all the intermediate variants described so far in which the
possessive relation is expressed by the possessive pronoun, the dative object
or reflexive referring to a possessor is optional. Importantly, all constructions
in (5.19), in contrast to the double object constructions, involve idiomatic
readings:8
(5.19) a. HD ze zeuren (je[dat]) van alles naar je hoofd toe (19: Cor)
they nag you everything to your head part
'They are nagging you about everything.'
b. HD die had (je[dat]) zo bij je neus staan (35: dhr Berk)
they had you right away with your nose stand
'They stand in front of you right away.'
c. HD Die heeft (zich) enorm op z'n donder gekregen (33: dhr Quint)
he has refl enormously a beating got
'He gets hell.'
Taken together, these intermediate variants are of extreme importance with
respect to the locus of syntactic variation, that is: whether the primitive of
variation is located outside or inside the grammatical system. It becomes
obvious that the facts in Heerlen Dutch indicate that syntactic variation can
no longer be exhaustively described by binary settings or different values of a
parameter (Cornips 1998). In an intermediate speech repertoire, this concept
is a very problematic one and must be open to discussion. Different
alternatives are possible, but there are no satisfying answers yet. A more recent
alternative is to argue that, from a minimalist point of view, lexical elements
8 It might be the case that, eventually, the optional dative object will disappear or that it will gain
emphatic meaning.
Intermediate Syntactic Variants 97
9 However, even if the syntactic variants are analysed as coming into existence as a result of two
competing grammars (cf. Kroch 1989), then some lexical elements must still bear uninterpretable and
interpretable features as well in order to account for the syntactic alternatives.
The recording of the first session between the standard Dutch-speaking
fieldworker and the local 'assistant interviewer' translating standard Dutch
into his own dialect shows that the definite article in his translation is absent:
that is, the proper names Wim and Els show up without it, as illustrated in
(5.21). These sentences were elicited in order to investigate the order in the
verbal cluster (right periphery):
(5.21) 1st session (dialect–standard)
Ø Wim dach dat ich Ø Els han geprobeerd e kado te geve
Wim thought that I Els have tried a present to give
‘Wim thought I tried to give a present to Els.’
In the same interview session, the 'assistant interviewer' shows in another sentence that he may or may not use the definite article, resulting in der Wim and Ø Els respectively in his local dialect. Note that the definite determiner precedes the subject DP whereas it is absent in front of the object DP:
(5.22) 1st session (dialect–standard)
Der Wim dach dat ich Ø Els e boek han will geve
det Wim thought that I Els a book have will give
‘Wim thought I wanted to give a book to Els.’
In the second session, however, in which the 'assistant interviewer' exclusively interviews the other dialect speaker in the local dialect, the latter utters the definite article with both the subject and the object DP, as 'required':
(5.23) 2nd session (dialect–dialect)
Der Wim menet dat ich et Els e boek probeerd ha kado te geve.
‘Wim thought I tried to give a book to Els.’
Other indications of easy switching between the base dialect and standard Dutch can be found in (5.24). The infinitival complementizer has the form om in standard Dutch and voor in the local dialect (see Cornips 1996 for more details). In the first session, the 'assistant interviewer', in interaction with the standard Dutch speaking fieldworker, uses the standard Dutch complementizer om, whereas in the second session the dialect speaker utters voor, as illustrated in (5.24a) and (5.24b), respectively. Moreover, note that in the first session the proper name Wendy again lacks the definite article, whereas it is present in the second session, as presented in (5.24a) and (5.24b), respectively:
(5.24) a. Ø Wendy probeerdet om ginne pien te doe. 1st session (dialect–standard)
Example (5.26) and Figure 5.2 (on page 102) reveal the translations of (5.25)
into the local dialect:
(5.26) 'translations'
                                  location
       a. standard variant       Beek           he hät zien hanj geweschen
       b. standard variant       Eijgelshoven   He had sieng heng gewesche
       c. standard variant       Maastricht     heer heet zien han gewasse
       d. standard variant       Vaals          Hae hat zieng heng jewaesje
       e. standard variant       Waubach        Heë hat zieng heng gewessje
                                                he has his hands washed
       f. intermediate variant   Eijgelshoven   Heë had zieg sieng heng gewesje
       g. intermediate variant   Valkenburg     Hae haet zich zien heng gewesje
       h. intermediate variant   Spekholzerhei  hea hat ziech zien heng jewesche
                                                he has REFL his hands washed
       i. dialect variant        Simpelveld     hea hat zich de heng gewesche
       j. dialect variant        Waubach        Hea haa zich de heng gewèsje
                                                he has REFL the hands washed
To begin with, the responses show that standard variants, dialect variants, and intermediate variants all occur among the answers. Further, all deviations from the input, that is, the intermediate and dialect variants in (5.26f,g,h) and (5.26i,j) respectively, provide strong evidence that these variants are in the grammar of the speaker (Carden 1976). Moreover, variation arises within a local dialect, as is the case in the spontaneous speech data of Heerlen Dutch, which is a regional standard variety. Thus, the two respondents in the locations of Eijgelshoven and Waubach give different responses. The former displays both the standard and the intermediate variant, in (5.26b) and (5.26f) respectively, whereas the latter yields the standard and the dialect variant, in (5.26e) and (5.26j) respectively. Finally, the majority of the respondents copy the standard Dutch variant into their local dialect, as illustrated in (5.26a–e).10 In order to control for this task effect, we also administered this type of construction in the oral acceptability task (see below).
Taken together, the translations provide evidence that (a) the standard
variety strongly interferes with the local dialect variety, (b) intermediate
variants arise, and (c) in this part of the province of Limburg syntactic
features from the local dialects and standard Dutch exist in a continuum
both in a regional standard variety and in the local dialects (see also (5.11)).
10 Maastricht, in the western part of Limburg, denoted the standard variant in 1962 (cf. Blancquaert et al. 1962). The translations in the atlas of Blancquaert seem to suggest that the dative inalienable possessive construction is spoken more in the eastern part of Limburg, i.e. Heerlen and surroundings.
Figure 5.3. Grid of the oral interviews in Heerlen and neighbouring locations (© Meertens Instituut)
standard variety in shifting towards the more prestigious variety in their
response, as is the case in the written elicitation task.
What is more, the majority of the respondents (6 out of 8, or 75 per cent) show an implicational pattern: they copy the standard Dutch variant in the first session (the dialect–standard repertoire), whereas they use the intermediate or the local dialect variant in the second session, the dialect–dialect repertoire, as illustrated in (5.27).
(5.27) Location of Vaals:
       Assistant interviewer:
       a. Instruction: Oversetz: 'Hij heeft zijn handen gewassen.'
          'Translate'            he has his hands washed
       b. 1st session (standard–dialect): 'Her had zien heng gewasse'
          he has his hands washed
       c. 2nd session (dialect–dialect): 'Her had sich sien heng gewasse.'
          he has refl his hands washed
       Assistant interviewer:
       d. 'Komt disse satz ook veur?' Her had zich de heng gewasse.
          'Do you also encounter this variant?' he has refl the hands washed
       e. Answer: 'ja'
          'yes'
Again, the interaction in (5.27) reveals that the speech repertoire in Limburg is a continuous one in which the distinction between standard and dialect varieties is blurred. Consequently, the dialect speaker judges all possible variants as acceptable: the standard possessive pronoun (5.27b), the dialect dative construction (5.27d,e), and the intermediate variant (5.27c). Further evidence comes from the fact that 9 out of 12 speakers (75 per cent) accept both the dative possessive construction and the intermediate form. Strikingly, 6 out of 12 speakers (50 per cent) state that all forms in (5.27) are acceptable in the local dialect. Two of them give relative judgements without being asked: one considered (5.27c) slightly more acceptable than (5.27d), the other considered (5.27d) slightly more acceptable than (5.27c).
This small case study indicates (a) extensive variation at the level of the individual speaker, such that half of the speakers show all the possible syntactic alternatives that exist at the level of their community, and (b) the existence of intermediate variants to such an extent that the distinction between the local dialect and the standard variety is blurred. This result is attested in spontaneous speech and in both the written and the oral elicitation data, so we can exclude the possibility that it is primarily due to task effects. In this intermediate speech repertoire the occurrence of intermediate variants is inevitable in the process of vertical standard–dialect and horizontal dialect–dialect convergence. These findings call into question the central sociolinguistic proposal that only phonology is a marker of local identity whereas syntax is a marker of cohesion in large geographical areas. Further, syntactic elicitation shows that speakers of local dialects are no longer able to reject syntactic variants as fully ungrammatical, even if (a) these are emerging intermediate variants and (b) they did not originally belong to their local dialect variety. Consequently, relative acceptability is the result.
5.5 Conclusion
In this paper, I have discussed a so-called intermediate speech repertoire that results from a contact situation between standard Dutch, a regional Dutch variety (Heerlen Dutch), and local dialects in the southern part of the province of Limburg in the south of the Netherlands. This speech repertoire reveals syntactic differences along a continuum to such an extent that the distinction between the local dialect and the standard variety is blurred. It has been demonstrated that in this speech repertoire clear-cut judgements are not attainable at all. Case studies have shown that speakers in this area are not able to judge syntactic features as fully grammatical or ungrammatical. Instead, all variants heard in the community, for example standard,
6
Gradedness and Optionality in Mature and Developing Grammars
ANTONELLA SORACE
6.1 Introduction
This paper focuses on specific patterns of gradedness and optionality in the grammar of three types of speakers: monolingual native speakers, speakers whose native language (L1) has been influenced by a second language (L2), and very fluent non-native speakers. It is shown that in all these cases gradedness is manifested in areas of grammar that are at the interface between the syntax and other cognitive domains.
First, evidence is reviewed on the split intransitivity hierarchy (Sorace 2000b, 2003a), indicating that not only auxiliary selection but also a number of syntactic manifestations of split intransitivity in Italian and other languages are lexically constrained by aspectual properties. These constructions tend to show gradedness in native intuitions that cannot easily be accommodated by current models of the lexicon–syntax interface. Moreover, the mappings between lexical properties and unaccusative/unergative syntax are developmentally unstable, whereas the unaccusative/unergative distinction itself is robust and unproblematic in acquisition.
Second, it is shown that residual optionality, with its entailed gradedness effects, occurs only in interface areas of the competence of near-native speakers, and not in purely syntactic domains. Sorace (2003b) indicates that the interpretable discourse features responsible for the distribution of overt and null subject pronouns are problematic in the L2 steady state of L1 English learners of Italian, whereas the non-interpretable syntactic features related to the null subject parameter are completely acquired.
Third, it is argued that the differentiation between narrow computational syntactic properties and interface properties is also relevant in other domains of language development, such as L1 attrition due to prolonged exposure to a
Gradedness and Optionality in Mature and Developing Grammars 107
second language (Sorace 2000b; Tsimpli et al. 2004; Montrul 2004). A clear parallelism exists between the end-state knowledge of English near-native speakers of Italian and the native knowledge of Italian advanced speakers of English under attrition, with respect to null/overt subjects and pre/postverbal subjects. In both cases, the speakers' grammar is/remains that of a null-subject language: for example, null subjects, when they are used, are used in the appropriate contexts, that is, when there is a topic shift. The purely syntactic features of grammar responsible for the licensing of null subjects are not affected by attrition.
The generalization seems to be that constructions that belong to the syntax proper are resistant to gradedness in native grammars; are fully acquired in L2 acquisition; and are retained in L1 attrition. In contrast, constructions that require the interfacing of syntactic knowledge with knowledge from other domains are subject to gradedness effects; present residual optionality in L2; and exhibit emergent optionality in L1 attrition.
The question of how to interpret this generalization, however, is still open. There are (at least) two issues for further research. First, there is a lack of clarity about the nature of the different interfaces. Are all interfaces equally susceptible to gradedness and optionality? Second, is gradedness inside or outside the speakers' grammatical representations? The available evidence is in fact compatible both with the hypothesis that gradedness is at the level of knowledge representations and with the alternative hypothesis that it arises at the level of processing. Possible approaches to these open issues are outlined.
either approach.1 Instead, she proposes that intransitive verbs are organized along a hierarchy (the split intransitivity hierarchy (SIH), originally called the auxiliary selection hierarchy (ASH)) defined primarily by aspectual notions (telicity/atelicity), and secondarily by the degree of agentivity of the verb (Figure 6.1).
The SIH is therefore an empirical generalization that identifies the notion of 'telic change' at the core of unaccusativity and that of 'atelic non-motional
1 The study of optionality and gradedness at interfaces can be measured experimentally with behavioural techniques that are able to capture subtle differences in speakers' performance. The informal elicitation techniques traditionally used in linguistics and language development research (such as binary or n-point acceptability judgement tests) are unlikely to be reliable for such data, because they can measure only very broad distinctions and typically yield ordinal scales, in which the distance between points cannot be evaluated (Sorace 1996). A suitable experimental paradigm that has gained ground in recent years is magnitude estimation (ME), a technique standardly applied in psychophysics to measure judgements of sensory stimuli (Stevens 1975). The magnitude estimation procedure requires subjects to estimate the perceived magnitude of physical stimuli by assigning values on an interval scale (e.g. numbers or line lengths) proportional to stimulus magnitude. Highly reliable judgements can be achieved in this way for a whole range of sensory modalities, such as brightness, loudness, or tactile stimulation (see Stevens 1975 for an overview).
The ME paradigm has been extended successfully to the psychosocial domain (see Lodge 1981 for a survey), and recently Bard et al. (1996), Cowart (1997), and Sorace (1992) showed that it may be applied to judgements of linguistic acceptability. Unlike the n-point scales conventionally employed in the study of psychological intuition, ME allows us to treat linguistic acceptability as a continuum and directly measures acceptability differences between stimuli. Because ME is based on the concept of proportionality, the resulting data are on an interval scale, which can therefore be analysed by means of parametric statistical tests. ME has been shown to provide fine-grained measurements of linguistic acceptability, which are robust enough to yield statistically significant results, while being highly replicable both within and across speakers. ME has been applied successfully to phenomena such as auxiliary selection (Bard et al. 1996; Sorace 1992, 1993a, 1993b; Keller and Sorace 2003), binding (Cowart 1997; Keller and Asudeh 2001), resumptive pronouns (Alexopoulou and Keller 2003; McDaniel and Cowart 1999), that-trace effects (Cowart 1997), extraction (Cowart 1997), and word order (Keller and Alexopoulou 2001; Keller 2000b).
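Because ME ratings are proportional judgements relative to a subject-chosen reference value, they are conventionally rescaled before analysis. The sketch below shows one common treatment, dividing each rating by the subject's own modulus and log-transforming; the subjects, moduli, and ratings are invented for illustration and are not data from the studies cited in the footnote.

```python
import math

# Hypothetical ME judgements from two subjects: each rated the same
# four sentences relative to a reference sentence (the "modulus")
# to which they assigned an arbitrary number of their own choosing.
subjects = {
    "S1": {"modulus": 50.0, "ratings": [100.0, 80.0, 25.0, 10.0]},
    "S2": {"modulus": 10.0, "ratings": [22.0, 15.0, 5.0, 2.0]},
}

def normalize(modulus, ratings):
    # Express each rating as a log ratio to the subject's modulus.
    # Because ME judgements are proportional, log ratios place all
    # subjects on a comparable interval scale, which is what makes
    # parametric statistical tests applicable.
    return [math.log(r / modulus) for r in ratings]

normalized = {name: normalize(d["modulus"], d["ratings"])
              for name, d in subjects.items()}

for name, values in normalized.items():
    print(name, [round(v, 2) for v in values])
```

Despite the very different raw scales the two subjects used, their normalized values are directly comparable, e.g. a sentence rated at twice the modulus maps to log 2 for both.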
activity' at the core of unergativity. The closer to the core a verb is, the more determinate its syntactic status as either unaccusative or unergative. Verbs that are stative and non-agentive are the most indeterminate. Sensitivity to contextual or compositional factors correlates with the distance of a verb from the core. Thus, the ASH helps to account both for variability and for consistency in the behaviour of intransitive verbs. In contrast to the constructionist view, where context is always critical, the ASH account predicts that core verbs have syntactic behaviour that is insensitive to non-lexical properties contributed by the sentence predicate. On the other hand, peripheral verbs, which are neither telic nor agentive, do seem to behave according to the constructionist observation, with syntactic behaviour depending on the properties of the predicate in which they appear.
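The idea that a verb's syntactic determinacy falls off with its distance from the two cores can be given a schematic rendering. The class labels below follow the hierarchy as usually presented in Sorace's work, but the numeric distance measure is purely an illustrative assumption, not part of the SIH itself.

```python
# Schematic encoding of the split intransitivity hierarchy (SIH):
# verb classes ordered from the unaccusative core (telic change of
# location) to the unergative core (atelic non-motional activity).
SIH = [
    "change of location",                # unaccusative core
    "change of state",
    "continuation of state",
    "existence of state",                # stative, non-agentive: most indeterminate
    "uncontrolled process",
    "motional controlled process",
    "non-motional controlled process",   # unergative core
]

def determinacy(verb_class):
    """Distance from the nearer core: 0 = maximally determinate
    syntactic behaviour, larger = more context-sensitive.
    (Illustrative metric, not a formal model of the SIH.)"""
    i = SIH.index(verb_class)
    return min(i, len(SIH) - 1 - i)

print(determinacy("change of location"))   # core verb class
print(determinacy("existence of state"))   # peripheral verb class
```

On this toy metric, core classes at either end score 0 (insensitive to non-lexical properties), while the stative middle of the hierarchy scores highest, mirroring the text's claim that stative, non-agentive verbs are the most indeterminate.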
The SIH substantiates the intuition that, within their respective classes, some verbs are 'more unaccusative' and 'more unergative' than others (Legendre, Miyata, and Smolensky 1991). Crucially, however, this does not mean that unaccusativity or unergativity are inherently gradient notions, or that the distinction is exclusively semantic, but rather that some verbs allow only one type of syntactic projection whereas other verbs are compatible with different projections to variable degrees. This is the reason why any approach that focuses exclusively on the syntactic or on the semantic side of split intransitivity is ultimately bound to provide only a very partial picture of the phenomena in this domain. While no formal model yet exists that can comprehensively account for the SIH, the SIH has given a new impetus to the search for such a model. Theoretical research inspired by the SIH has in fact been developed within different frameworks and for different languages (e.g. Bentley and Eyrthórsson 2004; Cennamo and Sorace in press; Keller and Sorace 2003; Legendre in press; Legendre and Sorace 2003; Randall in press; Mateu 2003; among others).
Developmental evidence for the SIH comes from research on second language acquisition (Montrul 2004, in press; Duffield 2003) and on first language attrition (Montrul in press).
Core verbs are the first ones to be acquired with the correct auxiliary in both first and second language acquisition. Data from the acquisition of Italian as a non-native language show that the syntactic properties of auxiliary selection are acquired earlier with core verbs and then gradually extended to more peripheral verb types (Sorace 1993a, 1995), although L2 learners do not attain the same gradient intuitions as those displayed by native Italians. Moreover, Italian learners of French find it more difficult to acquire avoir as the auxiliary for verbs closer to the core than for peripheral verbs (Sorace
1993b, 1995), and do not completely overcome this difficulty even at the advanced level. A study by Montrul (in press) confirms this pattern for L2 learners of Spanish, who have determinate intuitions on the syntactic correlates of split intransitivity in this language, but only with core verbs.
These developmental regularities suggest two things. First, the acquisition of the syntax of unaccusatives crucially depends on the internalization both of the hierarchical ordering of meaning components and of the lexicon–syntax mapping system instantiated by the target language. The pattern uncovered by these data is consistent with an enriched constructional model, equipped with a checking mechanism that is sensitive to the degree of lexical specification of verbs and rules out impossible mappings (see Mateu 2003). As it is the position of verbs on the ASH, rather than their frequency, which determines the order of acquisition, it seems that L2 learners do rather more than engage in the kind of statistical learning envisaged by a basic constructional model. Second, and more generally, there are two sides to the split intransitivity question: a syntactic side (the structural configuration that determines unaccusativity or unergativity) and a lexicon–syntax interface side (the mapping system that decides the syntactic behaviour of any given verb). Gradedness and indeterminacy in native grammars, as well as learning difficulties and residual problems in non-native grammars, tend to be situated on the interface side: the syntactic distinction itself is categorically stable.
2 Optionality is regarded here as the pre-condition for gradedness: the term refers to the co-existence in the same grammar of two alternative ways of expressing the same semantic content, of which one appears to be preferred over the other by the speaker in production and comprehension, creating gradedness effects (Sorace 2000b).
3 The few existing studies on near-native L2 grammars point to a similar split between purely
syntactic constraints, which are completely acquired, and interpretive conditions on the syntax, which
may or may not be acquired. See Sorace (2003b, in press) for details.
this detail lies in the fact that the options favoured by near-native
speakers are (strongly) dispreferred by natives, but they are not illicit in
their grammar.
6.3.2 L1 attrition
There is evidence that the same pattern of asymmetric optionality is exhibited by native speakers of null subject languages who have had prolonged exposure to English. Research on changes due to attrition from another language (Sorace 2000a; Tsimpli et al. 2004) indicates that native Italians who are near-native speakers of English exhibit the same pattern of optionality as the English near-native speakers of Italian described above: these speakers overgeneralize overt subjects and preverbal subjects to contexts which require a null subject or a postverbal subject. The reverse pattern is not found.
It is worth noting that the phenomenon is found both in production and in comprehension.
For example, in the forward anaphora sentence in (6.8b), speakers under attrition are significantly more likely than monolingual Italians to judge the overt pronoun as coreferential with the matrix subject 'Maria'; however, the null pronoun in (6.8a) is correctly interpreted as referring to the matrix subject. These speakers are also more likely to produce sentences such as (6.9a) regardless of whether the subject is definite or indefinite, whereas monolingual speakers would prefer a postverbal subject, as in (6.9b), particularly when the subject is indefinite.
(6.8) a. Mentre attraversa la strada, Maria saluta la sua amica
while pro is crossing the street, Maria greets her friend
b. Mentre LEI attraversa la strada, Maria saluta la sua amica
while she is crossing the street, Maria greets her friend
4 Other cases of selective attrition at interfaces are discussed in Montrul (2002) with respect to the tense/aspect domain in Spanish; Polinsky (1995) with respect to the distinction between reflexive and possessive pronouns in Russian; and Gürel (2004) on pronominals in Turkish.
5 Studies on bilingual first language acquisition converge with the results of research on L2 acquisition and L1 attrition. The syntax–pragmatics interface has been identified as a locus of cross-linguistic influence between the bilingual child's syntactic systems (Müller and Hulk 2001). Bilingual children who simultaneously acquire a null-subject language and a non-null-subject language overproduce overt subjects in the null-subject language (see Serratrice 2004 on Italian–English bilinguals; Paradis and Navarro 2003 on Spanish–English bilinguals; Schmitz 2003 on Italian–German bilinguals). Thus, crosslinguistic effects obtain only from the non-null-subject language to the null-subject language and never in the other direction, regardless of dominance.
6 At first sight, it may appear as if the generalization just presented contradicts decades of L2 acquisition research. In particular, early research showed that semantically more transparent properties are easier to learn than more abstract syntactic properties that do not correspond in any clear way to semantic notions (see e.g. Kellerman 1987). Moreover, studies of the 'basic variety' argued that early interlanguage grammars favour semantic and pragmatic principles of utterance organization (Klein and Perdue 1997). However, the argument here is NOT that syntactic aspects are easier than semantic aspects, but that aspects of grammar that require not only syntactic knowledge, but also the integration of syntactic knowledge with knowledge from other domains, are acquired late, or possibly never completely acquired, by L2 learners.
7 The result is the use of focus in situ, namely an L1-based strategy that is more economical because it involves a DP-internal focus position (like the one overtly manifested in a sentence such as 'John himself sneezed'). It is worth noting that L1 French speakers of L2 Italian often use clefting in the same context (Leonini and Belletti 2004), which is an alternative way of activating the VP-periphery (as shown in the example below) and is widely available in French.
(i) Ce . . . [Top [Foc [Top [VP être [SC Jean [CP qui a éternué]]]]]]
8 Avrutin (2004) goes a step further and regards 'discourse' as 'a computational system (my emphasis) that operates on non-syntactic symbols and is responsible for establishing referential dependencies, encoding concepts such as ''old'' and ''new'' information, determining topics, introducing discourse presuppositions, etc.' Investigating the interface between syntax and discourse necessarily requires going beyond 'narrow syntax'.
9 As pointed out by a referee, the interface with discourse conditions obviously affects other aspects of pronominal use in English, such as the distribution of stress.
10 A similar argument is developed by Rizzi (2002), who accounts for the presence of null subjects
in early child English grammars by assuming that this is an option structurally available to the child,
which also happens to be favoured in terms of limited processing resources.
11 One should not lose sight of the fact that these difficulties are resolved in ways that betray the influence of universal factors. Optionality favours the retention and occasional surfacing of unmarked options which are subject to fewer constraints, consistent with typological trends (see Bresnan 2000).
12 The extension of overt subject pronouns to null subject contexts is attested in another situation in which knowledge of English is not a factor. Bini (1993) shows that Spanish learners of Italian up to an intermediate proficiency level use significantly more overt subjects than monolingual Italians and monolingual Spanish speakers. Since the two languages are essentially identical with respect to both the syntactic licensing of null subjects and the pragmatic conditions on the distribution of pronominal forms, L1 influence is not a relevant factor here. This pattern is therefore likely to be due exclusively to coordination difficulties leading to the use of overt subjects as a default option.
13 Clearly the quantitative factor is also a function of age of first exposure: it cannot be considered in absolute terms. Thus, an L2 speaker may have been exposed to the language for many decades and still exhibit non-native behaviour compared to a younger native speaker who has been exposed to the language for a shorter time, but since birth.
14 Variation at interfaces may be regarded as the motor of diachronic change, because it is at this level that 'imperfect acquisition' from one generation to the next is likely to begin. Sprouse and Vance's (1999) study of the loss of null subjects in French indicates that language contact created a situation of prolonged optionality, that is, competition between forms that make the same contribution to semantic interpretation, during which the null-subject option became progressively dispreferred in favour of the overt-subject option because the latter is less ambiguous in processing terms. An analogous situation is experienced by the native Italian speaker after prolonged exposure to English: this speaker will be exposed both to null pronouns referring to a topic antecedent (in Italian) and to overt pronouns referring to a topic antecedent (in English, and also in the Italian of other native speakers in the same situation). Optionality, and competition between functionally equivalent forms, is therefore as relevant in this situation as in diachronic change.
The diachronic loss of auxiliary choice in Romance languages may also be traced as beginning from non-core verbs and gradually extending to core verbs (Sorace 2000b; Legendre and Sorace 2003).
6.5 Conclusions
To conclude, I have presented evidence of gradedness and optionality in native and non-native grammars whose locus seems to be the interface between syntactic and other cognitive domains. There are two potential explanations for these patterns. One involves underspecification at the level of knowledge representations, arising from the interaction of syntax with other cognitive domains such as lexical semantics and discourse pragmatics. The other involves insufficient processing resources for the coordination of different types of knowledge. Furthermore, there are different kinds of interfaces, not all of which are susceptible to gradedness effects in either stable or developing grammars. Behavioural and neuropsychological evidence suggests that syntactic processes are less automatic in L2 speakers than in L1 speakers, which in turn may increase coordination difficulties. L2 speakers may also have inadequate resources to carry out the right amount of grammatical processing required by on-line language use, independently of their syntactic knowledge representations. The processing and the representational explanations, however, do not necessarily exclude each other, and indeed seem to work in tandem, particularly in the case of bilingual speakers. Furthermore, syntactic representations and processing abilities may be differentially affected over time by quantitative and qualitative changes occurring in the input to which speakers are exposed. Future research is needed to ascertain the plausibility, and work out the details, of a unified account of gradedness and optionality in native and non-native grammars.
7
Decomposing Gradience:
Quantitative versus Qualitative Distinctions
MATTHIAS SCHLESEWSKY, INA BORNKESSEL, AND BRIAN MCELREE
7.1 Introduction
Psycho- and neurolinguistic research within the last three decades has shown that speaker judgements are subject to a great deal of variability. Thus, speakers do not judge as equally acceptable all sentences of a given language that are assumed to be grammatical from a theoretical perspective. Likewise, ungrammatical sentences may also vary in acceptability in rating studies conducted with native speakers. These findings stand in stark contrast to the classical perspective that grammaticality is categorical, in that a sentence is either fully grammatical or fully ungrammatical with respect to a particular grammar.
This apparent contradiction has, essentially, been approached from two different directions. On the one hand, it has been proposed that judgement variability, or gradience, results from extra-grammatical 'performance factors' and that it therefore has an origin distinct from linguistic 'competence' (Chomsky 1965). Alternatively, the gradience of linguistic intuitions has been described in terms of the varying markedness of the structures in question. Rather than appealing to variation in grammaticality, this latter approach introduces and appeals to an additional grammar-internal dimension. The idea that structures can vary in acceptability for grammar-internal reasons has found expression in the use of question marks, hash marks, and the like to describe the perceived deviation from the endpoints of the grammaticality scale.
Importantly, it must be kept in mind that judgements of acceptability, whether they are binary judgements or numerical ratings, represent
Quantitative versus Qualitative Distinctions 125
(7.1) a. Dann hat der Lehrer dem Jungen den Brief gegeben.
then has [the teacher]NOM [the boy]DAT [the letter]ACC given
‘Then the teacher gave the letter to the boy.’
b. ??Dann hat dem Jungen den Brief der Lehrer gegeben.
then has [the boy]DAT [the letter]ACC [the teacher]NOM given
‘Then the teacher gave the letter to the boy.’
c. *Dann hat der Lehrer gegeben dem Jungen den Brief.
then has [the teacher]NOM given [the boy]DAT [the letter]ACC
3 Note that this essentially amounts to the same level of performance that a non-human primate
might be expected to produce when confronted with the same sentences and two alternative push-
buttons. As such, chance-level acceptability defies interpretation.
Thus, we must examine how the judgement ‘emerges’ from the on-line
comprehension process.
Figure 7.2. Grand average ERPs for object- and subject-initial structures at
the position of the disambiguating clause-final verb (onset at the vertical bar) for
sentences with accusative (A) and dative verbs (B). Negativity is plotted upwards. The
data are from Bornkessel et al. (2004a)
accusative structures gives rise to a P600 effect as discussed above, the revision
towards a dative-initial word order elicits a centro-parietal negativity between
approximately 300 and 500 ms post onset of the disambiguating element
(N400; Bornkessel et al. 2004a; Schlesewsky and Bornkessel 2004). The
difference between the two effects is shown in Figure 7.2.
In accordance with standard views on the interpretation of ERP component
differences (e.g. Coles and Rugg 1995), we may conclude from this distinction that
reanalysis towards an object-initial order engages qualitatively different processing
mechanisms in accusative and dative structures. The processes in question, which
may be thought to encompass both conflict detection and conflict resolution,
therefore reflect underlyingly different ways of resolving a superficially similar
problem (i.e. the correction of an initially preferred subject-initial analysis).
Before turning to the question of whether the ERP difference between
reanalyses towards accusative and dative-initial orders may be seen as a
correlate of the different acceptability patterns for the two types of object
cases—and thereby a source of gradience in this respect—we shall first
examine a second exception to the generalizations in (7.3), namely the
behaviour of dative object-experiencer verbs (e.g. gefallen, ‘to be appealing
to’). As briefly discussed above, this verb class is characterized by an ‘inverse
linking’ between the case/grammatical function hierarchy and the thematic
hierarchy: the thematically higher-ranking experiencer bears dative case,
while the lower-ranking stimulus is marked with nominative case. In the
theoretical syntactic literature, it has often been assumed that these verbs
are associated with a dative-before-nominative base order, which comes
about when the lexical argument hierarchy (experiencer > stimulus) is
mapped onto an asymmetric syntactic structure (e.g. Bierwisch 1988; Wun-
derlich 1997, 2003; Haider 1993; Haider and Rosengren 2003; Fanselow 2000).
These properties of the object-experiencer class lead to an interesting con-
stellation for argument order reanalysis, which may again be illustrated using
the sentence fragment in (7.4). As with the cases discussed above, the com-
prehension system initially assigns a subject-initial analysis to the input
fragment in (7.4). Again, when this fragment is completed by a dative
object-experiencer verb that does not agree with the first argument, reanalysis
towards an object-initial order must be initiated. However, in contrast to the
structures previously discussed, here the verb provides lexical information in
support of the target structure, specifically an argument hierarchy in which
the dative outranks the nominative. The ERP differences between reanalyses
initiated by dative object-experiencer verbs and dative active verbs (which
project a ‘canonical’ argument hierarchy) are shown in Figure 7.3 (Bornkessel
et al. 2004a).
132 The Nature of Gradience
Figure 7.3. Grand average ERPs for object- and subject-initial structures at the
position of the disambiguating clause-final verb (onset at the vertical bar) for sen-
tences with dative active (A) and dative object-experiencer verbs (B). Negativity is
plotted upwards. The data are from Bornkessel et al. (2004a)
Figure 7.4. SAT functions for object- and subject-initial structures with (dative)
active and (dative) object-experiencer verbs. The data are from Bornkessel et al.
(2004a)
to each of the four conditions and distinct intercepts (δ) to the subject-initial
and object-initial conditions, respectively.
(Eq. 1) d′(t) = λ(1 − e^(−β(t−δ))) for t > δ, otherwise 0
The intercept difference between subject-initial and object-initial structures,
with a longer intercept for object-initial structures, indicates that the final
analysis of the object-initial sentences takes longer to compute than the
final analysis of their subject-initial counterparts. This is the characteristic
pattern predicted for a reanalysis operation: as reanalysis requires additional
computational operations, the correct analysis of a structure requiring
reanalysis should be reached more slowly than the correct analysis of an
analogous structure not requiring reanalysis. The dynamics (intercept)
difference occurs in exactly the same conditions as the N400 effect in the
ERP experiment.
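The exponential approach to asymptote described by (Eq. 1) can be sketched numerically as follows; the parameter values below are purely illustrative, not the fitted estimates from the study.

```python
import math

def sat_dprime(t, lam, beta, delta):
    """Exponential SAT function: d'(t) = lam * (1 - exp(-beta * (t - delta)))
    for t > delta, otherwise 0. lam = asymptote (terminal accuracy),
    beta = rate of approach, delta = intercept (departure from chance)."""
    if t <= delta:
        return 0.0
    return lam * (1.0 - math.exp(-beta * (t - delta)))

# Before the intercept, accuracy is at chance (d' = 0) ...
assert sat_dprime(0.4, lam=4.0, beta=2.0, delta=0.5) == 0.0
# ... afterwards it rises monotonically toward the asymptote lam.
assert sat_dprime(1.0, lam=4.0, beta=2.0, delta=0.5) < sat_dprime(3.0, lam=4.0, beta=2.0, delta=0.5)
assert abs(sat_dprime(50.0, lam=4.0, beta=2.0, delta=0.5) - 4.0) < 1e-9
```

A longer intercept δ or a slower rate β shifts or flattens the curve without changing the asymptote, which is the sense in which ‘dynamics’ differences are distinct from asymptotic differences.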
The asymptotic differences appear to result from two sources. First, the
object-initial structures are generally associated with lower asymptotes than
the subject-initial controls. This difference likely reflects a decrease in accept-
ability resulting from the reanalysis operation required to interpret the
object-initial sentences. A principled explanation for this pattern, one that
is consistent with the concomitant dynamics differences, is that, on a certain
proportion of trials, the processing system fails to recover from the initial
misanalysis, thus engendering lower asymptotic performance for an initially
ambiguous object-initial structure as compared to a subject-initial structure.
More interesting, perhaps, are the differences in asymptote between the two
object-initial conditions: here, the sentences with object-experiencer verbs
were associated with a reliably higher asymptote than those with active verbs.
This difference may directly reflect the differences in the accessibility of the
object-initial structure required for a successful reanalysis. Whereas the active
verbs provide no specific lexical information in favour of such a structure, the
object-experiencer verbs are lexically associated with an argument hierarchy
calling for precisely this type of ordering. Thus, while a garden path results for
both verb types, the object-experiencer verbs provide a lexical cue that aids
the conflict resolution. Again, the correspondence to the ERP data is clear: the
higher asymptote for the object-initial structures with object-experiencer
verbs—which we have interpreted as arising from the higher accessibility of
the object-initial structure in these cases—corresponds to the reduced N400
for this condition, which also reflects a reduction of the reanalysis cost.
Two conclusions concerning acceptability patterns follow from this analysis.
First, despite the presence of an almost identical processing conflict in
both cases, dative-initial structures with object-experiencer verbs are more
acceptable than those with active verbs because only the former are lexically
associated with an object-initial word order. Secondly, however, even the
presence of an object-experiencer verb can never fully compensate for the cost
of reanalysis, as evidenced by the fact that an initially ambiguous dative-initial
structure never outstrips its nominative-initial counterpart in terms of
acceptability. From a surface perspective, therefore, the observed acceptability
patterns are the result of a complex interaction between different factors. The
observed gradience does not result from uncertainty in the judgements, but
rather from interactions between the different operations that lead to the final
intuition concerning acceptability.
Having traced the emergence of the acceptability judgements for the two
types of dative structures, a natural next step is to apply the same
logic to the difference between accusative and dative structures and thus to
examine whether similar types of parallels between the on-line and off-line
findings are evident in these cases.
Recall that, while the reanalysis of a dative structure generally engenders an
N400 effect in ERP terms, reanalysis towards an accusative-initial order has
been shown to reliably elicit a P600 effect. In addition, the surface acceptabil-
ity is much lower for the accusative than for the dative-initial structures.
How might these findings be related? Essentially, the different ERP patterns
suggest that the two types of reanalysis take place not only in a qualitatively
different manner, but also in different phases of processing: while the N400 is
observable between approximately 300 and 500 ms after the onset of a critical
word, the time range of the P600 effect is between approximately 600 and
900 ms. Thus, the reanalysis of an accusative structure appears to be a later
process than the reanalysis of a dative structure and, in terms of the SAT
methodology, we might therefore expect to observe larger dynamics differ-
ences between subject- and object-initial accusative sentences than in the
analogous contrast for dative sentences. As discussed above, dynamics differ-
ences can subsequently lead to differences in terminal (asymptotic) accept-
ability and the distinction between the dative and the accusative structures
might therefore also—at least in part—stem from a dynamic source.
The difference between subject- and object-initial dative and accusative
structures as shown in an SAT paradigm is shown in Figures 7.5.a and 7.5.b
(Bornkessel et al. submitted).
As the accusative and dative sentences were presented in a between-subjects
design, model fitting was carried out separately for the two sentence types.
While the accusative structures were best fit by a 2λ–2β–2δ model (adjusted
R² = .994), the best fit for the dative structures was 1λ–1β–2δ (adjusted
R² = .990). Estimates of the composite dynamics (intercept + rate) were
Figure 7.5. SAT functions for object- and subject-initial structures with accusative
(a) and dative verbs (b). The data are from Bornkessel et al. (submitted)
computed for each condition using the formula 1/β + δ, which provides a
measure of the mean time required for the SAT function to reach the
asymptote. The dynamics difference between the two accusative structures
was estimated to be 588 ms (4430 ms for object- versus 3842 ms for subject-
initial sentences), while the difference for the dative sentences was estimated
at 332 ms (3623 ms versus 3291 ms). Thus, while both dative- and accusative-
initial sentences show slower dynamics than their subject-initial counterparts,
the dynamics difference between the subject- and object-initial accusative
structures is approximately 250 ms larger than that between the dative
structures. This finding therefore supports the hypothesis that the large
[Figure: illustrative SAT functions for two conditions (A and B). Upper panel,
proportional dynamics: the functions reach a given proportion of their asymptote
at the same time. Lower panel, disproportional dynamics: the functions reach a
given proportion of their asymptote at different times.]
can be interpreted more quickly than another, the SAT functions will differ in
rate, intercept, or some combination of the two parameters. This follows from
the fact that the SAT rate and intercept are determined by the underlying
finishing time distribution for the processes that are necessary to accomplish
the task. The time to compute an interpretation will vary across trials and
materials, yielding a distribution of finishing times. Intuitively, the SAT
intercept corresponds to the minimum of the finishing time distribution,
and the SAT rate is determined by the variance of the distribution. Panel (b)
depicts a case where the functions differ in rate of approach to asymptote,
leading to disproportional dynamics; the functions reach a given proportion
of their asymptote at different times.
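The intuition that the intercept tracks the minimum of the finishing-time distribution while the rate tracks its variance can be illustrated with a small simulation; the shifted-exponential distribution and all parameter values here are our own illustrative assumptions, not a model from the study.

```python
import random
import statistics

random.seed(1)

def finishing_times(minimum, spread, n=10_000):
    """Sample hypothetical per-trial finishing times (seconds):
    a fixed minimum plus exponentially distributed extra time."""
    return [minimum + random.expovariate(1.0 / spread) for _ in range(n)]

fast = finishing_times(minimum=0.8, spread=0.3)  # early minimum, low variance
slow = finishing_times(minimum=1.2, spread=0.6)  # later minimum, high variance

# The SAT intercept corresponds to the minimum of the distribution ...
assert min(fast) < min(slow)
# ... while the SAT rate is governed by its variance: more variable finishing
# times mean a slower, more gradual approach to asymptote.
assert statistics.variance(fast) < statistics.variance(slow)
```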
Dynamics (rate and/or intercept) differences are independent of potential
asymptotic variation. Readers may be less likely to compute an interpretation
for one structure or may find that interpretation less acceptable (e.g. less
plausible) than another; however, they may not require additional time to
compute that interpretation (McElree 1993, 2000; McElree et al. 2003; McElree
and Nordlie 1999).
Part II
Gradience in Phonology
8
Gradient Perception of Intonation
8.1 Introduction
Many phonologists associate the term ‘gradience’ with the distinction
between phonology—which is supposed to be categorical—and phonetics—
which is supposed to be gradient (see Cohn, this volume, for a review of the
issues associated with this distinction).1 In recent years, a different role for
gradience in phonology has emerged: the well-formedness of phonological
structures has been found to be highly gradient in a way that correlates with
their frequency. In their chapter, Frisch and Stearns (this volume) show that
phonotactic patterns, like consonant clusters and other segment sequences, as
well as morphophonology, word-likeness, etc. are gradient in this way.
Examination of large corpora provides a reliable measure of relative frequency.
Crucially, the less frequent sequences are felt by speakers to be less prototyp-
ical exemplars of their category. In grammaticality judgement tasks, word-
likeness tasks, assessment of novel words, etc., less frequent items are likely to
get lower grades than more frequent ones. In short, speakers reproduce in
their judgements the pattern of relative frequency that they encounter in their
linguistic environment. In light of this well-documented (see Frisch and
Stearns, this volume and references cited there), but controversial result, the
question has arisen for some phonologists as to the need of a grammar
operating with abstract phonological categories, like features and phonemes.
In their opinion, if phonotactic distribution is learnable by executing prob-
abilistic generalizations over corpora, the only knowledge we need in order to
1 A pilot experiment for this paper was presented at the Potsdam Gradience Conference in October
2002 and some of the results discussed here were presented at the Syntax and Beyond Workshop in
Leipzig in August 2003. Thanks are due to two anonymous reviewers, as well as to Gisbert Fanselow
and Ede Zimmermann for helpful comments. Thanks are also due to Frank Kügler for speaking the
experimental sentences, and to Daniela Berger, Laura Herbst, and Anja Mietz for technical support.
Nobody except for the authors can be held responsible for shortcomings.
146 Gradience in Phonology
elaborate ‘grammars’ may turn out to be a stochastic one. But before we can
take a stand on this important issue in a competent way, we need to be well-
informed on other aspects of the phonology as well.
In this chapter, we take a first step and investigate the question of whether
intonational contours are gradient in the same way that segment sequences
are. Is it the case that more frequent tonal patterns are more acceptable than
less frequent ones? We use the term gradience in the sense of gradient
acceptability.
Unfortunately, for a number of reasons, large corpora—at least in their
present state—are useless for the study of tonal pattern frequencies. One of
the reasons relates to the analysis and annotation of tonal patterns. Scholars
not only disagree on the kinds of categories entering intonation studies but
also on the annotation for ‘rising contour’, ‘fall–rise’, etc. Melodies—like
Gussenhoven’s (1984, 2004) nuclear contours or the British school’s ‘heads’
and ‘nuclei’—may well exist as independent linguistic elements, but they are
not transcribed uniformly. Even though autosegmental-metrical representa-
tions of tonal contours, like ToBI (Beckman and Ayers-Elam 1993; Jun 2005)
are evolving to become a standard in intonation studies, they are not suffi-
ciently represented in corpora. Most large corpora consist of written material
anyway, and those which contain spoken material generally only display
segmental transcription rather than tonal.
In short, the development of corpora which are annotated in a conven-
tional way for intonation patterns is an aim for the future, but as of now, it is
simply not available for German.
As a result, we must rely on the intuition of speakers. The questions we
address in this chapter are: Which tonal contours are accepted most? Which
are less accepted? We will see that the question must be made precise in the
following way: given a certain syntactic structure, is there a contour which is
accepted in the largest set of contexts? And this is related to the question of
pitch accent location. Which constituents are expected to be accented?
Which accent structure is the least marked, in the sense of being accepted
in the greatest number of contexts? Are some accent patterns (tonal
patterns) ‘unmarked’ (more frequent, acquired earlier, but also accepted
more easily) in the same sense as consonant clusters or other segment
sequences are?
Below, we present the results of a perception study bearing on tonal
contours. But before we turn to the experiment, we first sum up some relevant
issues in the research on prosody and situate our research in this broader
context.
Gradient Perception of Intonation 147
RUDERER bringen immer BOOTE mit
L*H Hp H*L Li
Figure 8.1. Pitch track of Ruderer bringen immer Boote mit. ‘Oarsmen always bring
boats.’
[Figure 8.2. Pitch track of RUDERER bringen immer Boote mit, with a single
accent (H*L Li).]
2 Some have found categories in the domain of pitch accent realization; for example Pierrehumbert
and Steele (1989) or Ladd and Morton (1987).
3 Birch and Clifton’s results also indicate that a single accent on the verb is readily accepted in a
context eliciting broad focus (78 percent ‘yes’ responses). The only situation where speakers accepted a pair less
(with 54 percent, not between 71 and 84 percent as in the other pairs) was when the context was
eliciting a narrow focus on the verb, and the answer had a single accent on the argument (QB/R2).
Much more subtle is the question of whether prosody can help with the
sentence in (8.5). In one reading, it is the woman who lives in Georgia, and
in the other reading, her daughter. The phrasing, in the form of a Phono-
logical Phrase boundary, is roughly the same in both readings. Nevertheless, it
is possible to vary the quantity and the excursion of the boundary tone in
such a way that the preference for one or the other reading is favoured.
8.4 Experiment
8.4.1 Background
The experiment reported in this section was intended to elucidate the
question formulated above: How gradient are tonal contours? We wanted to
understand what triggers broad acceptance for intonational patterns. To this
aim, we used three different kinds of sentences, which were inserted in
different discourse contexts, and cross-spliced. If an effect was to be found,
we expected it to be of the following kind: the unmarked tonal contours
should be generally better tolerated than the marked ones.
The hypothesis can be formulated as in (8.6).
8.4.2 Material
Three different kinds of sentences served as our experimental material: six
short sentences, six long sentences, and three sentences with ambiguous scope
of negation and quantifier. Every sentence was inserted in three or four
matching contexts (see below). In (8.7) to (8.9), an example for each sentence
is given along with their contexts. The remaining sentences are listed in the
appendix.
8.4.3 Subjects
Four non-overlapping groups of fifteen subjects (altogether sixty students at
the University of Potsdam) took part in the experiment. They were native
speakers of Standard German and had no known hearing or speech deficit. All
were paid or acquired credit points for their participation in the experiment.
Two groups judged the sentences on a scale of 1 (very bad) to 8 (perfect), and
two groups judged the same sentences in a categorical way: acceptable (yes) or
non-acceptable (no). All sixty informants evaluated the scope sentences. In
addition, the first and third groups also judged the short sentences, while the
second and fourth groups judged the long sentences, thus thirty matching
sentences plus sixty-eight non-matching ones each.
8.4.4 Procedure
The subjects were seated in a quiet room and given a presentation created
with the DMDX experiment generator software developed by K. and J. Forster at the
University of Arizona. The experimenter left the subject alone in the room
after brief initial instructions as to beginning and ending the session. The
subjects worked through the DMDX presentation in a self-paced manner. It
led them through a set of worded instructions, practice utterances, and finally
the experiment itself, consisting of 102 target sentences. No fillers were
inserted, but three practice sentences started the experiment. This experiment
was itself included in a set of experiments in which the subjects performed
different tasks: production of read material, and dialogues. The instructions
made it clear that the aim of the experiment was to test the intonation and
stress structure of the sentences, and not their meaning or syntax. The stimuli
were presented auditorily only: pairs of context and stimulus sentences were
presented sequentially. The subject first heard a context and, after hitting the
return key, the test sentence. The task consisted in judging the adequacy of
the intonation of the sentence in the given context. Every recorded sentence
of the groups of short and long sentences was presented nine times, in three
different intonational and stress patterns, and each of these patterns in
three different contexts. The scope sentences were presented sixteen times
each, in all possible variants.
The sentences were presented in a different randomized order for each
subject. The set-up and the instructions included the option of repeating the
context–stimulus pair for a particular sentence. Most subjects made occa-
sional use of this possibility. Only the last repetition was included in the
calculation of the reaction time (see Section 8.4.9).
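These counts are internally consistent with the 102 target sentences mentioned above, assuming the sixteen scope variants arise from four intonation patterns crossed with four contexts; a quick check:

```python
# 6 short (or long) sentences x 3 intonation patterns x 3 contexts
short_or_long_trials = 6 * 3 * 3   # = 54
# 3 scope sentences x 4 intonation patterns x 4 contexts (assumed 4 x 4 = 16)
scope_trials = 3 * 4 * 4           # = 48
assert short_or_long_trials + scope_trials == 102
```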
[Figure 8.3. Pitch track of Ruderer bringen immer BOOTE mit (L*H H*L Li).]
Figure 8.3 displays a narrow focus on the object. The subject Ruderer
has a rising prenuclear pitch accent with a much smaller excursion than
in the unmarked topic-focus configuration. The object Boote carries the
high-pitched nuclear accent.
Table 8.1. Short sentences: mean judgement scores (on a scale from 1 to 8)
Figure 8.4. Mean acceptability scores for short sentences (scale answers)
Figure 8.5. Mean acceptability scores for long sentences (scale answers)
Let us now relate our findings to those described in Section 8.2. First, the
scores for matching context-intonation pairs were higher than for non-
matching pairs. Second, a missing nuclear accent and an added nuclear accent
triggered lower scores than sentences with the expected accentuation. The
same was true for both a missing prenuclear accent and an added prenuclear
accent. As described by Hruska et al. (2001), adding a prenuclear accent on the
subject in a situation where only a nuclear accent on the object is expected
obtained higher scores than other non-matching pairs. In the same way,
Gussenhoven, as well as Clifton and Birch, also found that an added pre-
nuclear accent delivers better judgements than an added nuclear accent.
5 As a generalization, the negation may have wider scope when both the quantifier and the negation
(or the negated constituent) are accented. This generalization holds only for this type of construction,
but not for other sentences with inverted scope, such as those with two quantifiers discussed in Krifka
(1998).
[Figure 8.6. Pitch tracks of Beide Autos sind nicht beschädigt worden (‘Both cars
have not been damaged’) in four accentuation variants; panel (a) corresponds to
the context ‘Two’.]
6 Krifka (1998) explains scope inversion of sentences with two quantifiers by allowing movement of
accented constituents at the syntactic component of the grammar. Both topicalized and focused
constituents have to be pre-verbal at some stages of the derivation in order to get stress.
Table 8.3. Scope sentences: mean judgement scores (on a scale from 1 to 8)
Figure 8.7. Mean acceptability scores for scope sentences (scale answers)
in both sentences, fitting both contexts requiring two accents. The same
cannot be said for the realizations with one accent since the accent elicited
in each case is at a different place. However, the FN sentences, with a late
accent, elicited better scores in a non-matching environment than the FQ
sentences with an early accent. The highly marked prosodic pattern found in
FQ sentences obtained poor scores in all non-matching contexts, and the best
results in the matching context.
To sum up the results obtained for these sentences, it can be observed that
the interchangeability of contexts and intonation patterns is higher in these
sentences than in the short and long sentences. We attribute this pattern of
acceptability to the fact that the scope structure of these sentences, complex
and subject to different interpretations, renders the accent patterns less rigid.
Another interpretation could be that speakers were more focused on
understanding the scopal relationships and were thus less sensitive to slight
variations in the tonal structure of the sentences they heard.
8.5 Conclusion
This chapter has investigated the gradient nature of the acceptability of
intonation patterns in German declarative sentences. Three kinds of sentences
elicited in different information structural contexts were cross-spliced and
informants were asked to judge the acceptability of context-target pairs. The
clearest results were obtained for the short sentences, although the long
sentences delivered comparable results. Finally, the tonal patterns of scope
sentences were much more difficult to interpret, because the scope behaviour
of the negation and the quantifier was variable, depending on the accent
structure of these sentences. For all sentences, we found that a prosody with
two accents got better scores than a prosody with only one accent, and that a
contour with a late accent was better accepted in non-matching environ-
ments. We dubbed the prosody with two accents, acceptable in a broad focus
context or in a topic-focus context, UPS, for ‘unmarked prosodic structure’,
and we observe that this contour is accepted in a non-matching context more
readily than contours with only one accent, especially when this single accent
is located early in the sentence.
The results of the short and long sentences, and, to a lesser extent, those of
the scope sentences, point to a good correlation between context and prosodic
structure. Speakers and hearers do use prosodic information such as presence
versus absence of pitch accents, their form, and the phrasing to assess the
well-formedness of context-target sentence pairs, and they do so consistently.
Their performance is ameliorated when the syntactic and semantic structure
of the sentence is very simple. It can safely be claimed that in German,
information structure plays an important role in the processing of prosody,
whereas it has been shown for syntax that word order alone, presented in
written form, does not have the same effect (see for instance Schlesewsky,
Bornkessel, and McElree, this volume, and references cited there). The con-
clusion one could tentatively draw from this diVerence is that intonation
encodes information structure better than syntax.
An interesting result is that in all three experiments the scores obtained for
the two groups of subjects (scale and yes–no answers) were similar. In other
words, the same gradient results can be obtained by using either gradient or
non-gradient judgements. This is remarkable since the cognitive task executed
in both groups was different. It could have been the case that in a sentence
with a high score of acceptability the rating by scale would have been gradient,
but the yes–no judgement categorical. However, if the groups of informants
are large enough, ‘intolerant’ subjects compensate for the degree of insecurity
that remains in subjects asked to give a judgement on a scale.
Although we offer no analysis of how our gradient data can be accounted
for in a formal grammar, we conclude with the observation that a categorical
grammar will not be adequate. Speakers are more or less confident in their
judgements, and gradiently accept sentences intended to express a different
information structure, depending on whether the sentences have a similar
accent pattern. A gradient grammar, like stochastic OT, which uses overlap-
ping constraints, can account much better for the observed variability. This is,
however, a subject for future research.
8.6 Appendix
Short sentences (three contexts)
1. Maler bringen immer Bilder mit. ‘Painters always bring pictures.’
2. Lehrer bringen immer Hefte mit. ‘Teachers always bring notebooks.’
3. Sänger bringen immer Trommeln mit. ‘Singers always bring drums.’
4. Ruderer bringen immer Boote mit. ‘Oarsmen always bring boats.’
5. Geiger bringen immer Platten mit. ‘Violinists always bring records.’
6. Schüler bringen immer Stifte mit. ‘Students always bring pens.’
Long sentences (three contexts)
7. Passagiere nach Rom nehmen meistens den späten Flug.
‘Passengers to Rome mostly take the late flight.’
8. Reisende nach Mailand fahren oft mit dem schnellen Bus.
‘Travelers to Milan often travel with the express bus.’
9. Autofahrer nach Griechenland nehmen immer den kürzesten Weg.
‘Car drivers to Greece always take the shortest road.’
10. Schiffe nach Sardinien fahren meistens mit voller Ladung.
‘Ships to Sardinia mostly sail with a full cargo.’
11. Züge nach England fahren oft mit rasantem Tempo.
‘Trains to England often ride at full speed.’
12. Trekker nach Katmandu reisen meistens mit vollem Rucksack.
‘Trekkers to Katmandu mostly travel with a full backpack.’
Variable scope sentences (four contexts)
13. Alle Generäle sind nicht loyal. ‘All generals are not loyal.’
14. Beide Autos sind nicht beschädigt worden. ‘Both cars have not been
damaged.’
15. Viele Gäste sind nicht gekommen. ‘Many guests did not come.’
9
Prototypicality Judgements as
Inverted Perception
PAUL BOERSMA
In recent work (Boersma and Hayes 2001), Stochastic Optimality Theory has
been used to model grammaticality judgements in exactly the same way as
corpus frequencies are modelled, namely as the result of noisy evaluation of
constraints ranked along a continuous scale. It has been observed, however, that
grammaticality judgements do not necessarily reflect relative corpus fre-
quencies: it is possible that structure A is judged as more grammatical than a
competing structure B, whereas at the same time structure B occurs more often
in actual language data than structure A. The present chapter addresses one of
these observations, namely the finding that ‘ideal’ forms found in experiments
on prototypicality judgements often turn out to be peripheral within the corpus
distribution of their grammatical category (Johnson, Flemming, and Wright
1993). At first sight one might expect that Stochastic Optimality Theory
will have trouble handling such observed discrepancies. The present chapter,
however, shows that a bidirectional model of phonetic perception and produc-
tion (Boersma 2005) solves the paradox. In that model, corpus frequency
reflects the production process, whereas prototypicality judgements naturally
derive from a simpler process, namely the inverted perception process.
9.1.1 Why the /i/ prototype effect is a problem for linguistic models
The /i/ prototype effect has consequences for models of phonological grammar. The commonly assumed three-level grammar model, for instance, has trouble accounting for it. In this model, the phonology module maps an abstract underlying form (UF), for instance the lexical vowel |i|, to an equally discrete abstract surface form (SF), for instance /i/, and the phonetic implementation module subsequently maps this phonological SF to a continuous overt phonetic form (OF), which has auditory correlates, such as a value of the first formant, and articulatory correlates, such as a certain tongue height and shape. Such a grammar model can thus be abbreviated as UF→SF→OF.
The experimental prototypicality judgement task described above involves a mapping from the phonological surface form /i/ to an overt auditory first formant value, that is an SF→OF mapping. In the three-level grammar model, therefore, the natural way to account for this task is to assume that it shares the SF→OF mapping with the phonetic implementation process. If so, corpus frequencies (which result from phonetic implementation) should be the same as grammaticality judgements (whose best result is the prototype). Given that the two turn out to be different, Johnson et al. (1993) found the UF→SF→OF model wanting and proposed the model UF→SF→HyperOF→OF, where the additional intermediate representation HyperOF is a ‘hyperarticulated’ phonetic target. The prototypicality task, then, was proposed to tap HyperOF, whereas corpus frequencies reflect OF. The
present paper shows, however, that if one distinguishes between articulatory and auditory representations at OF, the two tasks (production and prototypicality) involve different mappings, and the /i/ prototypicality effect arises automatically without invoking the additional machinery of an extra intermediate representation and an extra processing stratum.
9.2.1 A grammar model with two phonological and two phonetic representations
The grammar model presented in Figure 9.1 is the Optimality-Theoretic
model of ‘phonology and phonetics in parallel’ (Boersma 2005).
Figure 9.1 shows the four relevant representations and their connections. There
are two separate phonetic forms: the auditory form (AudF) appears because it is
the input to comprehension, and the articulatory form (ArtF) appears because it
is the output of production. ArtF occurs below AudF because 9-month-old
children can perceive sounds that they have no idea how to produce (for an
overview, see Jusczyk 1997); at this age, therefore, there has to be a connection
from AudF to SF (or even to UF, once the lexicon starts to be built up) that cannot
pass through ArtF; Figure 9.1 generalizes this for speakers of any age.
UF    ← lexical constraints
  |  faithfulness constraints
SF    ← structural constraints
  |  cue constraints
AudF  ← auditory constraints?
  |  sensorimotor constraints
ArtF  ← articulatory constraints
Figure 9.1. The grammar model underlying bidirectional phonology and phonetics
170 Gradience in Phonology
(Escudero and Boersma 2003, 2004), and the output (SF) is evaluated with the
familiar structural constraints known since the earliest Optimality Theory
(OT; Prince and Smolensky 1993). This AudF→SF mapping is language-specific, and several aspects of it have been modelled in OT: categorizing
auditory features to phonemes and autosegments (Boersma 1997, 1998a et
seq.; Escudero and Boersma 2003, 2004), and building metrical foot structure
(Tesar 1997; Tesar and Smolensky 2000; Apoussidou and Boersma 2004).
The second mapping in comprehension, shown at the top left in Figure 9.2,
is that from SF to UF and can be called recognition, word recognition, or lexical
access. In this mapping, the relation between input (SF) and output (UF) is
evaluated with faithfulness constraints such as those familiar from two-level
OT (McCarthy and Prince 1995), and the output (UF) is evaluated with lexical
access constraints (Boersma 2001).
Boersma (2005) proposes that in contradistinction to comprehension, pro-
duction (shown at the right in Figure 9.2) consists of one single mapping from
UF to ArtF, without stopping at any intermediate form as is done in compre-
hension. In travelling from UF to ArtF, the two representations SF and AudF
are necessarily visited, so the production process must evaluate triplets of { SF,
AudF, ArtF } in parallel. As can be seen from Figure 9.1, the evaluation of these
triplets must be done with faithfulness constraints, structural constraints, cue
constraints, sensorimotor constraints (which express the speaker’s knowledge
of how to pronounce a target auditory form, and of what a given articulation
will sound like), and articulatory constraints (which express minimization of
effort; see Boersma 1998a, Kirchner 1998). According to Boersma (2005), the
point of regarding phonological and phonetic production as parallel processes
is that this can explain how discrete phonological decisions at SF can be
influenced by gradient phonetic considerations such as salient auditory cues
at AudF (e.g. Steriade 1995) and articulatory effort at ArtF (e.g. Kirchner 1998).
Comprehension        Production
     UF                   UF
     ↑                    ↓
     SF                   SF
     ↑                    ↓
    AudF                 AudF
                          ↓
    ArtF                 ArtF
Figure 9.2. The linguistic task of the listener, and that of the speaker
Figure 9.3. [Diagram: the mappings involved in the experimental tasks; only the representations necessarily activated in each task are evaluated]
1 Unless the very point of the experiment is to investigate the influence of the lexicon on prelexical
perception. By the way, if such influences turn out to exist (e.g. Ganong 1980; Samuel 1981), the
comprehension model in Figure 9.2 will have to be modified in such a way that perception and
recognition work in parallel. In that case, however, the phoneme categorization task will still look like
that in Figure 9.3, and the results of the present paper will still be valid.
to look at auditory events that are likely to be perceived as /i/ or as one of its
neighbours in the vowel space, such as /I/ or /e/. Thus, the auditory form
(AudF) is a combination of an F1 and an F2 value, and the surface form (SF) is a
vowel segment such as /i/ or /e/. This section shows how the AudF!SF
mapping is handled with an Optimality-Theoretic grammar that contains
cue constraints, which evaluate the relation between AudF and SF, and struc-
tural constraints, which evaluate the output representation SF.
For simplicity I discuss the example of a language with three vowels, /a/, /e/,
and /i/, in which the only auditory distinction between these vowels lies in
their F1 values. Suppose that the speakers realize these three vowels most often
with F1 values of 700 Hz, 500 Hz, and 300 Hz, respectively, but that they also
vary in their realizations. If this variation can be modelled with Gaussian
curves with standard deviations of 60 Hz, the distributions of the speakers’
productions will look as in Figure 9.4.
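This production setup is easy to sketch numerically. The means (700, 500, 300 Hz) and the 60 Hz standard deviation come from the text above; the sample size and random seed are arbitrary choices for illustration.

```python
import random

# F1 production targets (Hz) and shared standard deviation, as in the text.
MEANS = {"a": 700.0, "e": 500.0, "i": 300.0}
SD = 60.0

def sample_f1(vowel, n, rng):
    """Draw n F1 values from the Gaussian production distribution of a vowel."""
    return [rng.gauss(MEANS[vowel], SD) for _ in range(n)]

rng = random.Random(1)
samples = sample_f1("i", 10000, rng)
mean_f1 = sum(samples) / len(samples)
# mean_f1 lies close to the 300 Hz production target for /i/.
```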
Now how do listeners classify incoming F1 values, that is, to which of the
three categories /a/, /e/, and /i/ do they map a certain incoming F1 value x?
This mapping can be handled by a family of negatively formulated
Optimality-Theoretic cue constraints, which can be expressed as ‘if the
auditory form contains an F1 of x Hz, the corresponding vowel in the surface
form should not be y’ (Escudero and Boersma 2003, 2004).2 These cue
constraints exist for all F1 values between 100 and 900 Hz and for all three
vowels. Examples are given in (9.1).
2 There are two reasons for the negative formulation of these constraints. First, a positive formu-
lation would simply not work in the case of the integration of multiple auditory cues (Boersma and
Escudero 2004). Second, the negative formulation allows these constraints to be used in both
directions (comprehension and production), as is most clearly shown by the fact that they can be
formulated symmetrically as *[x]/y/ (Boersma 2005). The former reason is not relevant for the present
paper, but the latter is, because the same cue constraints are used in the next two sections.
Tableau 9.1.

             320 Hz   380 Hz   460 Hz   320 Hz   460 Hz   380 Hz   380 Hz   320 Hz   460 Hz
  [380 Hz]   not /a/  not /a/  not /i/  not /e/  not /a/  not /i/  not /e/  not /i/  not /e/
  (UF = |i|)
     /a/               *!
  ☞  /e/                                                            ←*
  ✓  /i/                                                   *!→
3 Auditory constraints, if they exist, evaluate the input and cannot therefore distinguish between
the candidates.
If the lexicon now tells the learner that she should have perceived /i/ instead
of /e/, she will regard this as the correct adult SF, as indicated by the check
mark in Tableau 9.1. According to the gradual learning algorithm for stochas-
tic OT (Boersma 1997, Boersma and Hayes 2001), the learner will take action
by raising the ranking value of all the constraints that prefer the adult form /i/
to her own form /e/ (here only ‘380 Hz is not /e/’) and by lowering the
ranking value of all the constraints that prefer /e/ to /i/ (here only ‘380 Hz
is not /i/’). These rerankings are indicated by the arrows in Tableau 9.1.
To see what kind of final perception behaviour this procedure leads to, I ran
a computer simulation analogous to the one by Boersma (1997). A virtual
learner has 243 constraints (F1 values from 100 to 900 Hz in steps of 10 Hz, for
all three vowel categories), all with the same initial ranking value of 100.0. The
learner then hears 10 million F1 values randomly drawn from the distributions
in Figure 9.4, with an equal probability of one in three for each vowel. She is
subjected to the learning procedure exemplified in Tableau 9.1, with full
knowledge of the lexical form, with an evaluation noise of 2.0, and with a
plasticity (the amount by which ranking values rise or fall when a learning step
is taken) of 0.01. The result is shown in Figure 9.5.
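The simulation can be sketched as follows. The constraint inventory, evaluation noise, and error-driven reranking follow the description above; to keep the sketch fast it uses far fewer tokens than the 10 million of the text and a correspondingly larger plasticity, so its rankings are only qualitatively comparable to Figure 9.5.

```python
import random

rng = random.Random(42)

VOWELS = ("a", "e", "i")
MEANS = {"a": 700.0, "e": 500.0, "i": 300.0}
SD = 60.0
BINS = range(100, 901, 10)   # F1 values from 100 to 900 Hz in steps of 10 Hz
NOISE = 2.0                  # evaluation noise, as in the text
PLASTICITY = 0.1             # larger than the text's 0.01, to compensate
N_TOKENS = 50000             # for using far fewer tokens than 10 million

# One cue constraint 'an F1 of x Hz is not /v/' per (bin, vowel): 81 * 3 = 243,
# all starting at the same initial ranking value of 100.0.
ranking = {(f1, v): 100.0 for f1 in BINS for v in VOWELS}

def to_bin(f1):
    return min(max(round(f1 / 10) * 10, 100), 900)

def perceive(f1_bin):
    """Noisy evaluation: each candidate vowel violates only its own cue
    constraint, so the winner is the vowel whose constraint has the lowest
    disharmony at evaluation time."""
    return min(VOWELS, key=lambda v: ranking[(f1_bin, v)] + rng.gauss(0.0, NOISE))

for _ in range(N_TOKENS):
    intended = rng.choice(VOWELS)                 # equal probability of 1/3
    f1 = to_bin(rng.gauss(MEANS[intended], SD))
    winner = perceive(f1)
    if winner != intended:                        # error-driven learning step
        ranking[(f1, winner)] += PLASTICITY       # 'not /winner/' (prefers adult form) rises
        ranking[(f1, intended)] -= PLASTICITY     # 'not /intended/' (prefers learner's form) falls
```

After training, the constraint curves separate in the way the text describes: at 300 Hz the /i/ constraint is lowest, at 500 Hz the /e/ constraint, at 700 Hz the /a/ constraint.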
The figure is to be read as follows. F1 values below 400 Hz will mostly be
perceived as /i/, since in that region the constraint ‘an F1 of x Hz is not /i/’ (the
solid curve) is ranked lower than the constraints ‘an F1 of x Hz is not /e/’ (the
dashed curve) and ‘an F1 of x Hz is not /a/’ (the dotted curve). Likewise, F1
values above 600 Hz will mostly be perceived as /a/, and values between 400
and 600 Hz mostly as /e/. For every F1 value the figure shows us not only the
most often perceived category but also the degree of variation. Around
400 Hz, /i/ and /e/ perceptions are equally likely. Below 400 Hz it becomes
more likely that the listener will perceive /i/, and increasingly so when the
Figure 9.5. The final ranking of ‘an F1 of x Hz is not /vowel/’, for the vowels /i/ (solid
curve), /e/ (dashed curve), and /a/ (dotted curve)
distance between the curves for /i/ and /e/ increases. This distance is largest for
F1 values around 250 Hz, where there are 99.8 per cent /i/ perceptions and
only 0.1 per cent perceptions of /e/ and /a/ each. Below 250 Hz, the curves
approach each other again, leading to more variation in categorization.
A detailed explanation of the shapes of the curves in terms of properties of
the gradual learning algorithm (approximate probability matching between
250 and 750 Hz, and low corpus frequencies around 100 and 900 Hz) is
provided at the end of the next section, where the shapes of the curves are
related to the behaviour of prototypicality judges.
Input: /i/. Constraint ranking, highest to lowest (each of the form ‘x Hz is not /i/’):
320 Hz >> 310 Hz >> 170 Hz >> 180 Hz >> 300 Hz >> 190 Hz >> 290 Hz >> 200 Hz >>
280 Hz >> 210 Hz >> 270 Hz >> 230 Hz >> 220 Hz >> 240 Hz >> 260 Hz >> 250 Hz

Each candidate [x Hz] violates only its own cue constraint ‘x Hz is not /i/’, so every
candidate from [170 Hz] to [320 Hz] incurs a fatal violation (*!) except the winner:

  ☞ [250 Hz]   *   (on the lowest-ranked constraint, ‘250 Hz is not /i/’)
Figure 9.4. [The production distributions of /i/, /e/, and /a/: probability density against F1]
The conclusion is that if the prototypicality task uses the same constraint
ranking as phoneme categorization, auditorily peripheral segments will be
judged best if their auditory values are extreme, because cue constraints have
automatically been ranked lower for extreme auditory values than for more
central auditory values. The question that remains is: how has the /i/ curve in
Figure 9.5 become lower at 250 Hz than at 300 Hz? The answer given by
Boersma (1997) is the probability-matching property of the Gradual Learning
Algorithm: the ultimate vertical distance between the /i/ and /e/ curves for a given F1 is determined (after learning from a sufficient amount of data) by the probability that that F1 reflects an intended /i/ rather than an intended /e/; given that an F1 of 250 Hz has a smaller probability of having been intended as /e/ than an F1 of 300 Hz, the vertical distance between the /i/ and /e/ curves grows to be larger at 250 Hz than at 300 Hz, provided that the learner is given sufficient data. With the Gradual Learning Algorithm and enough input, the prototypicality judge will automatically come to choose the F1 token that is least likely to be perceived as anything other than /i/.4
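The decision rule just described can be written down directly: given learned ranking values, the prototype is the F1 token whose own cue constraint lies furthest below the best competing constraint. The ranking values below are invented for illustration, shaped like the /i/ region of Figure 9.5.

```python
# Illustrative ranking values for 'x Hz is not /v/' (invented, but mimicking
# Figure 9.5: the /i/ curve dips furthest below the others at 250 Hz).
ranking = {
    200: {"i": 97.0, "e": 104.0, "a": 105.0},
    250: {"i": 92.0, "e": 105.0, "a": 106.0},
    300: {"i": 93.0, "e": 103.0, "a": 106.0},
    350: {"i": 96.0, "e": 101.0, "a": 106.0},
}

def prototype(ranking, vowel):
    """Return the F1 whose constraint for `vowel` is ranked furthest below the
    best rival constraint: the token least likely to be perceived as anything
    other than `vowel`."""
    def margin(f1):
        best_rival = min(r for v, r in ranking[f1].items() if v != vowel)
        return best_rival - ranking[f1][vowel]
    return max(ranking, key=margin)

best = prototype(ranking, "i")  # → 250, not the 300 Hz production mode
```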
There are two reasons why the prototype does not have an even lower F1 than 250 Hz. The first reason, which can be illustrated with the simulation, is that there are simply not enough F1 values of, say, 200 Hz to allow the learner to reach the final state of a wide separation between the /i/ and /e/ curves; for the simulated learner, Figure 9.5 shows that even 10 million inputs did not suffice. The second reason, not illustrated by the simulation, is that in reality the F1 values are not unbounded. Very low F1 values are likely to be perceived as an approximant, fricative, or stop rather than as /i/. Even within the vowel
4 This goal of choosing the least confusing token was proposed by Lacerda (1997) as the driving
force behind the prototypicality judgement. He did not propose an underlying mechanism, though.
See also Section 9.4.
space, this effect can be seen at the other end of the continuum: one would
think that the best token for /a/ would have an extremely high F1, but in reality
an F1 of, say, 3000 Hz will be perceived as /i/, because the listener will
reinterpret it as an F2 with a missing F1.
Tableau 9.3. [The production tableau: the cue constraints ‘x Hz is not /i/’ interleaved with articulatory constraints against peripheral F1 values]

The result of the ranking in Tableau 9.3 is that the auditory-articulatory production process, unlike the prototypicality task, is held back from peripheral F1 values by the articulatory constraints.
The result of the phoneme production task is thus very different from that of the prototypicality task. The difference between the two tasks can be reduced to the presence of the articulatory constraints in the production task and their absence in the prototypicality task.
9.5 Conclusion
The present paper has offered formal explanations for two facts, namely that a
prototype (by being less confusable) is more peripheral than the modal
auditory form in the listener’s language environment, and that a prototype
(by not being limited by articulatory restrictions) is more peripheral than the
modal auditory form that the listener herself will produce. Given the repre-
sentation-and-constraints model of Figure 9.1, the only assumptions that led
to these formal explanations were that representations are evaluated only if
they are necessarily activated (Figure 9.3) and that in production processes
(Figure 9.2, right; Figure 9.3, right) the output representations are evaluated in parallel.
10.1 Introduction
Many cases of gradient intuitions reflect conflicting patterns in the data that a child receives during language acquisition.1 An area in which learners frequently face conflicting data is inflectional morphology, where different words often follow different patterns. Thus, for English past tenses, we have wing ~ winged (the most common pattern in the language), wring ~ wrung (a widespread [ɪ] ~ [ʌ] pattern), and sing ~ sang (a less common [ɪ] ~ [æ] pattern). In cases where all of these patterns could apply, such as the novel verb spling, the conflict between them leads English speakers to entertain multiple possibilities, with competing outcomes falling along a gradient scale of intermediate well-formedness (Bybee and Moder 1983; Prasada and Pinker 1993; Albright and Hayes 2003).
In order to get a more precise means of investigating this kind of gradience,
we have over the past few years developed and implemented a formal
model for the acquisition of inflectional paradigms. An earlier version of
our model is described in Albright and Hayes (2002), and its application to
various empirical problems is laid out in Albright et al. (2001), Albright
(2002), and Albright and Hayes (2003). Our model abstracts morphological
and phonological generalizations from representative learning data and uses
1 For helpful comments and advice we would like to thank Paul Boersma, Junko Ito, Armin Mester,
Jaye Padgett, Hubert Truckenbrodt, the editors, and our two reviewers, absolving them for any
shortcomings.
them to construct a stochastic grammar that can generate multiple forms for
novel stems like spling. The model is tested by comparing its ‘intuitions’,
which are usually gradient, against human judgements for the same forms.
In modelling gradient productivity of morphological processes, we have
focused on the reliability of the generalizations: how much of the input data
do they cover, and how many exceptions do they involve? In general, greater
productivity is correlated with greater reliability, while generalizations cover-
ing few forms or entailing many exceptions are relatively unproductive. For
English past tenses, most generalizations have exceptions, so finding the productive patterns requires finding the generalizations with the fewest exceptions. Intermediate degrees of well-formedness arise when the generalizations covering different patterns suffer from different numbers of exceptions.
The phenomenon of gradient well-formedness shows that speakers do not
require rules or constraints to be exceptionless; when the evidence conflicts, they
are willing to use less than perfect generalizations. One would expect, however,
that when gradience is observed, more reliable generalizations should be
favoured over less reliable ones. In this article, we show that, surprisingly, this
is not always the case. In particular, we find that there may exist generalizations that are exceptionless and well-instantiated, but are nonetheless either completely invalid, or are valued below other, less reliable generalizations.
The existence of exceptionless, but unproductive patterns is a challenge for
current approaches to gradient productivity, which generally attempt to extend
patterns in proportion to their strength in the lexicon. We offer a solution for
one class of these problems, based on the optimality-theoretic principle of
constraint conXict and employing the Gradual Learning Algorithm (Boersma
1997; Boersma and Hayes 2001). In the final section of the paper we return to our
earlier work on gradience and discuss the implications of our present findings.
2 We have rendered all transcriptions (including Sapir and Hoijer’s) in near-IPA, except that we use
[č čh č’ š ž] for [tʃ tʃh tʃ’ ʃ ʒ] in order to depict the class of nonanterior sibilants more saliently.
The Gradual Learning Algorithm 187
d. [kàn] [sı̀-kàn]
e. [k’àz] [sı̀-k’àz]
f. [khéškã̀:] [šì-khéškã̀:], [sì-khéškã̀:]
g. [sı́:æ] [sı̀-sı́:æ]
h. [thã̀š] [šì-thã̀š], [sì-thã̀š]
i. [thó:ʔ] [sì-thó:ʔ]
j. [t¸é:ž] [šı̀-t¸é:ž], [sı̀-t¸é:ž]
Where free variation occurs, the learner is provided with one copy of each variant; thus, for (10.3f) both [khéškã̀:] [šì-khéškã̀:] and [khéškã̀:] [sì-khéškã̀:] are provided.
The goal of learning is to determine which environments require [sı̀-],
which require [šı̀-], and which allow both. Learning involves generalizing
bottom-up from the lexicon, using a procedure described below. Generaliza-
tion creates a large number of candidate environments; an evaluation metric
is later employed to determine how these environments should be employed
in the final grammar.
c. [khéškã̀:] [sì-khéškã̀:]
c. [khéškã̀:] [šì-khéškã̀:]
e. [thã̀š] [sì-thã̀š]
e. [thã̀š] [šì-thã̀š]
f. [k’àz] [sì-k’àz]
g. [sí:æ] [sì-sí:æ]
h. [thó:ʔ] [sì-thó:ʔ]
b.   ∅ → šì / [ ___ thã̀š ]
  +  ∅ → šì / [ ___ t¸é:ž ]
  =  ∅ → šì / [ ___ [−sonorant, −continuant, +anterior] [+syllabic, −high, −round]
                    [−sonorant, +continuant, −anterior, +strident] ]
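The ‘+/=’ step above, which combines two specific rules into one by keeping only shared feature values, can be sketched with feature bundles as dictionaries. The bundles below are simplified stand-ins, not a full Navajo feature analysis.

```python
def generalize(seg_a, seg_b):
    """Minimal generalization of two aligned segments: retain exactly the
    feature values on which the two segments agree."""
    return {f: v for f, v in seg_a.items() if seg_b.get(f) == v}

# Simplified bundles for the stem-final sibilants of the two compared forms:
# voiceless š versus voiced ž, both nonanterior stridents.
sh = {"sonorant": "-", "continuant": "+", "anterior": "-", "strident": "+", "voice": "-"}
zh = {"sonorant": "-", "continuant": "+", "anterior": "-", "strident": "+", "voice": "+"}

shared = generalize(sh, zh)
# The conflicting [voice] value drops out, leaving the final matrix of the
# schema: {-sonorant, +continuant, -anterior, +strident}.
```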
In this particular case, the forms being compared are quite similar, so determining which segment should be compared with which is unproblematic. But for forms of different lengths, such as [čhò:jìn] and [č’ìæ] above, this is a harder question.4 We adopt an approach that lines up the segments that are most similar to one another. For instance, (10.7) gives an intuitively reasonable alignment for [čhò:jìn] and [č’ìæ]:
(10.7)  čh  ò:  j  ì  n
        |          |  |
        č’         ì  æ
4 The issue did not arise in an earlier version of our model (Albright and Hayes 2002), which did
not aspire to learn non-local environments, and thus could use simple edge-in alignment.
     ∅ → šì / [ ___ khéškã̀: ]
  +  ∅ → šì / [ ___ thã̀š ]
  =  ∅ → šì / [ ___ [−sonorant, −continuant, +spread gl., −constr. gl.]
                    [+syllabic, −high, −round] š (k) (ã̀:) ]

     ∅ → šì / [ ___ [−sonorant, −continuant, +spread gl., −constr. gl.]
                    [+syllabic, −high, −round] š ([+seg])* ]
set, keeping only those rules that perform best.5 Generalization terminates
when no new ‘keeper’ rules are found.
We find that this procedure, applied to a representative set of words,
discovers the environment of non-local sibilant harmony after only a few
steps. One path to the correct environment is shown in (10.9):
(10.9) [a sequence of pairwise generalizations whose final step is]

        ∅ → šì / [ ___ ([+seg])* [−anterior] ([+seg])* ]
The result can be read: ‘Prefix [šì-] to a stem consisting of any number of
segments followed by a nonanterior segment, followed by any number of
segments.’ (Note that [–anterior] segments in Navajo are necessarily sibilant.)
In more standard notation, one could replace ([+seg])* with a free variable X,
and follow the standard assumption that non-adjacency to the distal word
edge need not be specified, as in (10.10):
(10.10) ∅ → šì- / ___ X [−anterior]
We emphasize that at this stage, the system is only generating hypotheses. The
task of using these hypotheses to construct the final grammar is taken up in
Section 10.5.
5 Specifically: (a) for each word in the training set, we keep the most reliable rule (in the sense of
Albright and Hayes 2002) that derives it; (b) for each change, we keep the rule that derives more forms
than any other.
morphology (Boersma 1998b; Russell 1999; Burzio 2002; MacBride 2004). In this
approach, rule (10.11a) is reconstrued as a constraint: ‘Use [šı̀-] / ___ [–anterior]
to form the s-perfective.’ This constraint is violated by forms that begin with a
nonanterior segment, but use something other than [šı̀-] to form the s-perfect-
ive. The basic idea is illustrated below, using hypothetical monosyllabic roots:
(10.12)
  Morphological base   Candidates that obey            Candidates that violate
                       Use [šì-] / ___ [−anterior]     Use [šì-] / ___ [−anterior]
  [šáp]                [šì-šáp]                        *[sì-šáp], *[mù-šáp], etc.
  [táp]                all                             none
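A minimal sketch of such a Use-constraint as a violation-counting function; the nonanterior class is reduced here to the two letters that happen to occur in the illustration.

```python
# Stand-in segment class: treat only 'š' and 'č' as nonanterior.
NONANTERIOR = {"š", "č"}

def use_shi_before_nonanterior(base, prefix):
    """'Use [šì-] / ___ [−anterior]': one violation when the base begins with
    a nonanterior segment but a different prefix forms the s-perfective;
    vacuously satisfied otherwise."""
    return 1 if base[0] in NONANTERIOR and prefix != "šì-" else 0

v1 = use_shi_before_nonanterior("šáp", "šì-")   # obeys the constraint
v2 = use_shi_before_nonanterior("šáp", "sì-")   # violates it, cf. *[sì-šáp]
v3 = use_shi_before_nonanterior("táp", "sì-")   # [táp] satisfies it vacuously
```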
It is straightforward to rank these constraints in a way that yields the target
pattern, as (10.13) and (10.14) show:
(10.13) Use [šì-] / ___ [−ant] >> { Use [sì-] / ___ X, Use [šì-] / ___ X [−ant] } >> all others

(10.14) a.
  /sì-cìd/     Use [šì-]/___[−ant]   Use [šì-]/___X[−ant]   Use [sì-]/___X
  ☞ šì-cìd                                                   *
    sì-cìd          *!                     *
For (10.14b), the free ranking of Use [šı̀-] / ___ X [–ant] and Use [sı̀-] / ___ X
produces multiple winners generated in free variation (Anttila 1997).
predictions for forms outside the learning data, such as the legal but non-
existing stem /čálá/ (10.16).
(10.16) Use [sì-] / ___ ([−round])* [+anterior, +continuant] ([−consonantal])* ]

        (matching, for instance, sì-čálá)
If ranked high enough, this constraint will have the detrimental effect of preventing [šì-čálá] from being generated consistently. We will call such inappropriate generalizations ‘junk’ constraints.
One possible response is to say that the learning method is simply too
liberal, allowing too many generalizations to be projected from the learning
data. We acknowledge this as a possibility, and we have experimented with
various ways to restrict the algorithm to more sensible generalizations. Yet we
are attracted to the idea that constraint learning could be simplified—and rely on fewer a priori assumptions—by letting constraints be generated rather freely and excluding the bad ones with an effective evaluation metric. Below,
we lay out such a metric, which employs the Gradual Learning Algorithm.7
7 A reviewer points out that another approach to weeding out unwanted generalizations is to train
the model on various subsets of the data, keeping only those generalizations that are found in all
training sets (cross-validation). Although this technique could potentially eliminate unwanted gener-
alizations (since each subset contains a potentially diVerent set of such generalizations), it could not
absolutely guarantee that they would not be discovered independently in each subset. Given that such
constraints make fatal empirical predictions, we seek a technique that reliably demotes them, should
they arise.
selection point is adopted for each constraint, taken from a Gaussian probability distribution with a standard deviation fixed for all constraints. The
constraints are sorted by their selection points, and the winning candidate is
determined on the basis of this ranking. Since pairwise ranking probabilities
are determined by the ranking values,8 they are guaranteed to be mutually
consistent.
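The evaluation procedure just described is easy to state as code. The grammar below is a toy example: with a 20-unit ranking difference and a standard deviation of 2.0, ranking reversals are vanishingly rare.

```python
import random

def evaluate(candidates, ranking_values, sd=2.0, rng=random):
    """One stochastic-OT evaluation: draw a selection point for each constraint
    from a Gaussian around its ranking value, sort the constraints by selection
    point, and pick the winner by ordinary OT evaluation.
    `candidates` maps candidate name -> {constraint: violation count}."""
    points = {c: rv + rng.gauss(0.0, sd) for c, rv in ranking_values.items()}
    order = sorted(points, key=points.get, reverse=True)  # highest-ranked first
    def profile(cand):
        return [candidates[cand].get(c, 0) for c in order]
    return min(candidates, key=profile)  # lexicographic comparison = OT eval

# Toy grammar: C1 well above C2, so the candidate violating only C2
# should win nearly every evaluation.
ranking_values = {"C1": 110.0, "C2": 90.0}
candidates = {"cand_a": {"C1": 1}, "cand_b": {"C2": 1}}

rng = random.Random(0)
wins_b = sum(evaluate(candidates, ranking_values, rng=rng) == "cand_b"
             for _ in range(1000))
```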
8 A spreadsheet giving the function that maps ranking value differences to pairwise probabilities is
posted at http://www.linguistics.ucla.edu/people/hayes/GLA/.
The idea, then, is to assign the constraints initial ranking values that
reflect their generality, with more general constraints on top. If the scheme
works, all the data will be explained by the most general applicable con-
straints, and the others will remain so low that they never play a role in
selecting output forms.
In order to ensure that differences in initial rankings are large enough to
make a difference, the generality values from (10.17) were rescaled to cover a
huge probability range, using the formula in (10.19):
[Diagram: generality values (.3–.8) rescaled to initial ranking values (150–400); after 100,000 training cycles, Use [sì-] /___ [−ant] (generality .339) ends with a ranking value of about 150.9]
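The rescaling formula (10.19) itself did not survive extraction here, but the recoverable value pairs (.8 → 400, .7 → 350, …, .3 → 150) are consistent with a simple linear rescaling; the following is therefore an assumed reconstruction, not the authors' stated formula.

```python
def initial_ranking(generality, scale=500.0):
    """Assumed linear rescaling of a constraint's generality (0..1) to an
    initial ranking value, so that more general constraints start higher."""
    return scale * generality

# Value pairs recoverable from the diagram in the text:
pairs = [(0.8, 400.0), (0.7, 350.0), (0.6, 300.0),
         (0.5, 250.0), (0.4, 200.0), (0.3, 150.0)]
```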
The final grammar is depicted schematically in (10.21), where the arrows show the probabilities that one constraint will outrank the other. When the difference in ranking value exceeds about 10, the probability that the ranking will hold is essentially 1 (strict ranking).
(10.21) Use [šì-] /___ [−ant]   (ranking value 514.9)
This approach yields the desired grammar: all of the junk constraints (not just
(10.15)) are ranked safely below the top three.
The procedure works because the GLA is error-driven. Thus, if junk
constraints start low, they stay there, since the general constraint that does
the same work has a head start and averts any errors that would promote the
junk constraints. Good constraints with specific contexts, on the other hand,
like ‘Use [šı̀-] /___ [–ant]’, are also nongeneral—but appropriately so. Even if
they start low, they are crucial in averting errors like *[sı̀-šáp], and thus they
are soon promoted by the GLA to the top of the grammar.
We find, then, that a preference for more general statements in grammar
induction is not merely an aesthetic bias; it is, in fact, a necessary criterion in
distinguishing plausible hypotheses from those which are implausible, but
coincidentally hold true in the learning sample.
learned by our model, on the other hand, treats harmony as allomorphy ([sì-] versus [šì-]), and cannot capture root-internal harmony effects. Thus, it may be objected that the model has missed the essential nature of harmony.
In this connection, we note first that harmony is often observed primarily through affix allomorphy—either because there is no root-internal restriction, or because the effect is weaker within roots, admitting greater exceptionality. For these cases, allomorphy may be the only appropriate analysis. For arguments that root-internal and affixal harmony often require separate analyses, see Kiparsky (1968).
More generally, however, there still remains the question of how to unify knowledge about allomorphy and root-internal phonotactics. Even when affixes and roots show the same harmony patterns, we believe that understanding the distribution of affix allomorphs could constitute an important first step in learning the more general process, provided there is some way of bootstrapping from constraints on particular morphemes to more general constraints on the distribution of speech sounds. We leave this as a problem for future work.
9 A related problem, in which overly broad generalizations appear exaggeratedly accurate because
they contain a consistent subset, is discussed in Albright and Hayes (2002).
10 We restrict our discussion to phonological patterns; for discussion of patterns based possibly on
semantic, rather than phonological similarities, see Ramscar (2002). In principle, the approach
described here could be easily extended to include constraints that refer to other kinds of information;
it is an empirical question what properties allomorphy rules may refer to.
The problem is that our algorithm can often find environments for these minor changes that are exceptionless. For example, the exceptionless minor change in (10.22) covers the four verbs dig, cling, fling, and sling.11
(10.22)  ɪ → ʌ / X [+coronal, +anterior, +voice] ___ [+dorsal, +voice]   [+past]
The GLA, when comparing an exceptionless constraint against a more general
constraint that suffers from exceptions, always ranks the exceptionless constraint categorically above the general one. For cases like Navajo, where the
special constraint was (10.11a) and the general constraint was (10.11c),
the default constraint for [sı̀-], this ranking is entirely correct, capturing the
special/default relationship. But when exceptionless (10.22) is ranked categor-
ically above the constraints specifying the regular ending for English (such as
Use [-d]), the prediction is that novel verbs matching the context of (10.22)
should be exclusively irregular (i.e. blig → blug, not *bligged). There is
evidence that this prediction is wrong, from wug tests on forms that match
(10.22). For instance, the wug test reported in Albright and Hayes (2003)
yielded the following judgements (scale: 1 worst, 7 best):
(10.23)  Present stem        Choice for past     Rating
  a. blig [blɪg]             blug [blʌg]         4.17
                             bligged [blɪgd]     5.67
  b. spling [splɪŋ]          splung [splʌŋ]      5.45
                             splinged [splɪŋd]   4.36
The regular forms are almost as good as, or better than, the forms derived by the
exceptionless rule.
We infer that numbers matter: a poorly attested perfect generalization such
as (10.22) is not necessarily taken more seriously than a broadly attested
imperfect generalization such as Use [-d]. For Navajo, strict ranking is
appropriate, since the special-environment constraint (10.11a) that must out-
rank the default (10.11c) is robustly attested in the language. In the English
case, the special-environment constraint is also exceptionless, but is attested
in only four verbs, yet the GLA—in either version—ranks it on top of the
grammar, just as in Navajo.
11 This is the largest set of ɪ → ʌ verbs that yields an exceptionless generalization. There are other
subsets, such as cling, fling, and sling, that also lead to exceptionless generalizations, and these are also
generated by our model. The problem that we discuss below would arise no matter which set is
selected, and would not be solved by trying to, for example, exclude dig from consideration.
The Gradual Learning Algorithm 203
It can now be seen why in our earlier work we avoided constraint interaction
and employed reliability scores instead. With reliability scores, it is simple to
impose a penalty on forms derived by rules supported by few data—following
Mikheev (1997), we used a statistical lower confidence limit on reliability.
Thus, for a wug form like blig, two rules of comparable value compete: the
regular rule (has exceptions, but vast in scope) versus (10.22) (no exceptions,
but tiny in scope). Ambivalence between the two is a natural consequence.
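The penalty on poorly supported rules can be sketched as follows. This is an illustrative reconstruction, not the authors' exact formula: the smoothing adjustment and the 75% confidence level (z = 0.674) are assumptions in the general spirit of Mikheev (1997), and the counts for the regular rule are invented.

```python
import math

def adjusted_lower_confidence(hits: int, scope: int, z: float = 0.674) -> float:
    """Lower confidence limit on a rule's reliability (hits/scope).

    Smooth the raw reliability so that it never reaches exactly 1.0,
    then subtract z standard errors; rules supported by few forms are
    thereby penalized.  z = 0.674 corresponds to a one-sided 75%
    confidence level (an illustrative choice)."""
    p = (hits + 0.5) / (scope + 1.0)
    se = math.sqrt(p * (1.0 - p) / (scope + 1.0))
    return p - z * se

# The exceptionless but tiny rule (10.22): 4 hits in a scope of 4 forms.
minor = adjusted_lower_confidence(4, 4)        # roughly 0.81
# The regular rule: vast scope, some exceptions (hypothetical counts).
regular = adjusted_lower_confidence(4000, 4200)  # roughly 0.95

# The broadly attested imperfect rule outscores the perfect 4-verb rule,
# so ambivalence between blug and bligged falls out naturally.
assert regular > minor
```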
If reliability statistics are not the right answer to this problem, what is? It
seems that the basic idea that rules based on fewer forms should be downgraded
is sound. But the downgrade need not be carried out based on reliability
scores—it might also be made part of the constraint ranking process. In
particular, we propose that the basic principles of the GLA be supplemented
with biases that exert a downward force on morphological constraints that are
supported by few data, using statistical smoothing or discounting.
As of this writing we do not have a complete solution, but we have
experimented with a form of absolute discounting (Ney et al. 1994), implemented
as follows: for each constraint C, we add to the learning data an
artificial datum that violates C and obeys every other constraint with which C
is in conflict. Under this scheme, if C (say, (10.22) above) is supported by just
four forms, then an artificially added candidate would have a major effect in
downgrading its ranking. But if C is supported by thousands of forms (for
example, the constraint for a regular mapping), then the artificially added
candidate would be negligible in its effect.
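The core arithmetic of this scheme can be sketched in a few lines; the count of 4,000 regular forms is a hypothetical figure for illustration. Adding a single artificial violating datum makes a constraint's effective success rate n/(n + 1), a large penalty when n = 4 and a negligible one when n is in the thousands.

```python
def discounted_success_rate(supporting_forms: int) -> float:
    """Effective success rate of a morphological constraint after a
    single artificial datum violating it is added to the learning data
    (absolute discounting in the spirit of Ney et al. 1994)."""
    return supporting_forms / (supporting_forms + 1.0)

# Constraint (10.22): supported only by dig, cling, fling, and sling.
minor = discounted_success_rate(4)       # 0.8  -- a major downgrade
# Use [-d]: supported by thousands of regulars (hypothetical count).
regular = discounted_success_rate(4000)  # ~0.99975 -- negligible effect

assert minor == 0.8
assert regular > 0.999
```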
We found that when we implemented this approach, it yielded reasonable
results for the English scenario just outlined: in a limited simulation consist-
ing of the regulars in Albright and Hayes (2003) plus just the four irregulars
covered by (10.22), regular splinged was a viable competitor with splung, and
the relationships among the competing regular allomorphs remained essen-
tially unchanged.
There are many ways that small-scale generalizations could be downgraded
during learning. We emphasize that the development of a well-motivated
algorithm for this problem involves not just issues of computation, but an
empirical question about productivity: when real language learners confront
the data, what are the relative weights that they place on accuracy versus size
of generalization? Both experimental and modelling work will be needed to
answer these questions.12
12 An unresolved question that we cannot address here is whether a bias for generality can be
applied to all types of phonological constraints, or just those that govern allomorph distribution. It is
worth noting that for certain other types of constraints, such as faithfulness constraints, it has been
argued that specific constraints must have higher initial rankings than more general ones (Smith
2000). At present, we restrict our claim to morphological constraints of the form ‘Use X’.
10.14 Conclusion
The comparison of English and Navajo illustrates an important problem in
the study of gradient well-formedness in phonology. On the one hand, there
are cases such as English past tenses, in which the learner is confronted with
many competing patterns and must trust some generalizations despite some
exceptions. In such cases, gradient well-formedness is rampant, and the
model must retain generalizations with varying degrees of reliability. On the
other hand, there are cases such as Navajo sibilant harmony, in which
competition is confined to particular contexts, and the learner has many
exceptionless generalizations to choose from. In these cases, the challenge is
for the model to choose the ‘correct’ exceptionless patterns, and refrain from
selecting an analysis that predicts greater variation than is found in the target
language.
We seek to develop a model that can handle all configurations of gradience
and categoricalness, and we believe the key lies in the trade-off between
reliability and generality. We have shown here how our previous approach
to the problem was insufficient, and proposed a new approach using the GLA,
modified to favour more general constraints. The precise details of how
generality is calculated, and how severe the bias must be, are left as a matter
for future research.
Part III
Gradience in Syntax
11
11.1 Introduction
This paper presents a set of corpus data from English, a head-initial language,
and some additional data from Japanese, a head-final language, showing clear
selection preferences among competing structures. The structures involve the
positioning of complements and adjuncts relative to the verb, and the preferences
range from highly productive to unattested (despite being grammatical).
These ‘gradedness effects’ point to a principle of efficiency in performance,
minimize domains (MiD). Specifically, this chapter argues for the following:
(11.1) a. MiD predicts gradedness by defining the syntactic and semantic
relations holding between categories, by enumerating the surface
structure domains in which these relations can be processed, and
by ranking common domains in competing structures according
to their degree of minimization. Relative weightings and cumulative
effects among different syntactic and semantic relations are
explained in this way.
b. The same minimization preferences can be found in the preferred
grammatical conventions of different language types, and a
performance–grammar correspondence hypothesis is proposed.
c. Principles of performance can predict what purely grammatical
principles of ordering can only stipulate, and can explain exceptions
to grammatical principles. A model that appears to capture
the desired performance–grammar correspondence, stochastic
1 This paper is dedicated to Günter Rohdenburg on the occasion of his 65th birthday. Günter’s
work discovering patterns of preference in English performance (see e.g. Rohdenburg 1996) has been
inspirational to me and to many others over many years.
(11.4) Dependency
Two categories A and B are in a relation of dependency iff the
processing of B requires access to A for the assignment of syntactic
or semantic properties to B with respect to which B is zero-specified
or ambiguously or polysemously specified.
Theta-role assignment to an NP by reference to a verb can serve as an example
of a dependency of B on A. Co-indexation of a reflexive anaphor by reference
to an antecedent, and gap processing in relation to a filler, are others. Some
dependencies between A and B also involve combination (theta-role assignments,
for example), others do not.
A ‘domain’, as this term is used in this context, is defined in (11.5):
(11.5) Domain
A combinatorial or dependency domain consists of the smallest con-
nected sequence of terminal elements and their associated syntactic
and semantic properties that must be processed for production and/or
recognition of the combinatorial or dependency relation in question.
The domain sufficient for recognition of the VP and its three immediate
constituents (V, PP1, PP2) is shown in bold in the following sentence (cf. 11.3.1
below): the old lady counted on him in her retirement.
One prediction made by MiD (11.2) that will be significant here involves the
preferred adjacency of some categories versus others to a head of phrase:
(11.6) Adjacency to heads
Given a phrase {H, {X, Y}}, H a head category and X and Y phrases
that are potentially adjacent to H, then the more combinatorial and
dependency relations whose processing domains can be minimized
when X is adjacent to H, and the greater the minimization difference
between adjacent X and adjacent Y in each domain, the more H and X
will be adjacent.
Table 11.1. (columns ordered by increasing PP length difference)
[V PP1 PP2]  60% (58)  86% (108)  94% (31)  99% (68)
[V PP2 PP1]  40% (38)  14% (17)    6% (2)    1% (1)
PP2 = longer PP; PP1 = shorter PP
Proportion of short–long to long–short given as a percentage; actual numbers of sequences in parentheses
An additional 71 sequences had PPs of equal length (total n = 394)
Source: Hawkins 2000: 237
length difference were ordered short before long (265/323), the short PP being
adjacent to V, and the degree of the weight difference correlated precisely with
the degree of preference for the short before long order, as shown in Table 11.1.
As length differences increase, the efficiency (and IC-to-word ratio) of the
long-before-short structure (11.7b) decreases relative to (11.7a), and (11.7a) is
increasingly preferred, and predicted, by MiD (11.2).
The data of Table 11.1 are from (written) production. Similar preferences have
been elicited in production experiments by Stallings (1998) and Stallings et al.
(1998). Domain minimization can be argued to be beneficial for the speaker,
therefore, and not just an accommodation to the hearer’s parsing needs (cf.
Hawkins 2004). I shall accordingly relabel a CRD as a phrasal combination
domain (PCD), making it compatible with production and comprehension.
(11.8) Phrasal combination domain (PCD)
The PCD for a mother node M and its I(mmediate) C(onstituents)
consists of the smallest string of terminal elements (plus all M-
dominated non-terminals over the terminals) on the basis of which
the processor can construct M and its ICs.
EIC can be generalized to make it compatible with both as follows:
(11.9) Early immediate constituents (EIC)
The human processor prefers linear orders that minimize PCDs (by
maximizing their IC-to-word ratios), in proportion to the minimization
difference between competing orders.
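The IC-to-word ratio at the heart of EIC can be illustrated with a small calculation. This is a simplified sketch under the assumption that each phrase is constructed by a single word on its left periphery (head-initial) or right periphery (head-final); the word counts are invented.

```python
def ic_to_word_ratio(ic_lengths, head_final=False):
    """IC-to-word ratio of the PCD for a mother node whose immediate
    constituents have the given lengths (in words), in surface order.

    Head-initial: the PCD spans everything up to and including the
    first word of the last IC (which constructs that IC on its left).
    Head-final: mirror image -- the PCD spans from the last word of
    the first IC through the final, mother-constructing head."""
    if head_final:
        domain_words = 1 + sum(ic_lengths[1:])
    else:
        domain_words = sum(ic_lengths[:-1]) + 1
    return len(ic_lengths) / domain_words

# Head-initial English VP [V PP1 PP2]: V = 1 word, short PP = 2, long PP = 4.
short_long = ic_to_word_ratio([1, 2, 4])   # 3 ICs / 4 words = 0.75
long_short = ic_to_word_ratio([1, 4, 2])   # 3 ICs / 6 words = 0.5
assert short_long > long_short             # EIC prefers short before long

# Head-final Japanese VP [PP PP V]: long before short is now optimal.
assert ic_to_word_ratio([4, 2, 1], head_final=True) > \
       ic_to_word_ratio([2, 4, 1], head_final=True)
```

The same function thus yields the mirror-image predictions for the two language types discussed below.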
11.3.1.2 Head-final structures   For head-final languages EIC predicts a
mirror-image preference. Postposing a heavy NP or PP to the right in
English shortens PCDs. Preposing heavy constituents in Japanese has the
same effect, since the relevant constructing categories (V for VP, P for PP,
etc.) are now on the right (which is abbreviated here as VPm, PPm, etc.). In a
structure like [{1PPm, 2PPm} V] the PCD for VP will proceed from the first
6 For quantification of this Japanese preposing preference in relation to EIC, cf. Hawkins (1994: 80–1).
7 The Japanese corpus analysed by Kaoru Horie consisted of 150 pages of written Japanese
summarized in Hawkins (1994: 142), and of three distinct texts.
Gradedness in the Processing of Syntax and Semantics 213
Table 11.2. (columns ordered by increasing IC length difference)
[2ICm 1ICm V]  66% (59)  72% (21)  83% (20)  91% (10)
[1ICm 2ICm V]  34% (30)  28% (8)   17% (4)    9% (1)
NPo = direct object NP with accusative case particle o
PPm = PP constructed on its right periphery by a P(ostposition)
ICm = either NPo or PPm
2IC = longer IC; 1IC = shorter IC
Proportion of long–short to short–long orders given as a percentage; actual numbers of sequences in parentheses
An additional 91 sequences had ICs of equal length (total n = 244)
Source: Hawkins 1994: 152; data collected by Kaoru Horie
The preference for long before short in Japanese is not predicted by current
models of language production, which are heavily influenced by English-type
postposing effects. Yet it points to the same minimization preference for PCDs
that we saw in head-initial languages. For example, according to the incremental
parallel formulator of De Smedt (1994), syntactic segments are assembled
incrementally into a whole sentence structure, following message
generation within a conceptualizer. Short constituents can be formulated
with greater speed in the race between parallel processes and should accordingly
be generated first, before heavy phrases.
The theory of EIC, by contrast, predicts that short phrases will be formulated
first only in head-initial languages, and it defines a general preference for
minimal PCDs in all languages. The result: heavy ICs to the left and short ICs
to the right in head-final languages.
8 See Schütze and Gibson (1999) and Manning (2003) for useful discussion of the complement/
adjunct distinction in processing and grammar.
Recall that 82 per cent had a shorter PP adjacent to V and preceding a longer
one in Table 11.1. For PPs that were both shorter and lexically dependent, the
adjacency rate to V was almost perfect, at 96 per cent (102/106). This com-
bined adjacency effect was statistically significantly higher than for lexical
dependency and EIC alone.
The processing of a lexical combination evidently prefers a minimal
domain, just as the processing of phrasal combinations does. This can be
explained as follows. Any separation of count and on his son, and of wait
and for his son, delays recognition of the lexical co-occurrence frame intended
for these predicates by the speaker and delays assignment of the verb’s
combinatorial and dependent properties. A verb can be, and typically is,
associated with several lexical co-occurrence frames, all of which may be
activated when the verb is processed (cf. Swinney 1979; MacDonald et al.
1994).
Accompanying PPs will select between them, and in the case of verbs like
count they will resolve a semantic garden path. For dependent prepositions,
increasing separation from the verb expands the domain and working mem-
ory demands that are required for processing of the preposition.
We can define a lexical domain as follows:
(11.11) Lexical domain (LD)9
The LD for assignment of a lexically listed property P to a lexical item L
consists of the smallest string of terminal elements (plus their associ-
ated syntactic and semantic properties) on the basis of which the
processor can assign P to L.
9 I make fairly standard assumptions here about the properties that are listed in the lexicon.
They include: the syntactic category or categories of L (noun, verb, preposition, etc.); the syntactic
co-occurrence frame(s) of L, i.e. its ‘strict subcategorization’ requirements of Chomsky (1965) (e.g. V
may be intransitive or transitive, if intransitive it may require an obligatory PP headed by a particular
preposition, or there may be an optionally co-occurring PP headed by a particular preposition, etc.);
‘selectional restrictions’ imposed by L, Chomsky (1965) (e.g. drink requires an animate subject and
liquid object); syntactic and semantic properties assigned to the complements of L (e.g. the theta-role
assigned to a direct object NP by V); the different range of meanings assignable to L with respect to which
L is ambiguous or polysemous (the different senses of count and follow and run); and frequent
collocations of forms, whether ‘transparent’ or ‘opaque’ in Wasow’s (1997, 2002) sense.
10 In corresponding tables cited in Hawkins (2000, 2001) I included five additional sequences,
making 211 in all, in which both PPs were interdependent with V, but one involved more dependencies
than the other. I have excluded these five here, resulting in a total of 206 sequences, in all of which one
PP is completely independent while the other PP is interdependent with V by at least one entailment
test.
Table 11.4. (ordering of dependent and independent PPs after V in English)
[V Pd Pi]   7% (2)   33% (6)   74% (17)  83% (24)  92% (23)  96% (49)  100% (30)
[V Pi Pd]  93% (28)  67% (12)  26% (6)   17% (5)    8% (2)    4% (2)     0% (0)
Pd = the PP that is interdependent with V by one or both entailment tests
Pi = the PP that is independent of V by both entailment tests
Proportion of adjacent V-Pd to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 2000: 247
For the Japanese data of Table 11.2, I predict a similar preference for complements
and other lexically co-occurring items adjacent to the verb, and a
similar (but again mirror-image) interaction with the long-before-short
weight preference. A transitive verb contracts more syntactic and semantic
relations with a direct object NP as a second argument or complement than it
does with a PP, many or most of which will be adjuncts rather than complements.
Hence a preference for NP adjacency is predicted, even when the NP is
longer than the PP, though this preference should decline with increasing
(relative) heaviness of the NP and with increasing EIC pressure in favour of
long before short phrases. This is confirmed in Table 11.5 where NP-V
adjacency stands at 69 per cent overall (169/244) and is as high as 62 per
cent for NPs longer than PP by 1–2 words and 50 per cent for NPs longer by
3–4 words, that is with short PPs before long NPs. Only for 5+ word differentials
is NP-V adjacency avoided in favour of a majority (79 per cent) of long
NPs before short PPs.
When EIC and complement adjacency reinforce each other in favour of
[PPm NPo V] in the right-hand columns, the result is significantly higher NP
Table 11.5. (columns: NPo longer by 5+ / 3–4 / 1–2 words; equal weight; PPm longer by 1–2 / 3–4 / 5+ words)
[PPm NPo V]  21% (3)   50% (5)   62% (18)  66% (60)  80% (48)  84% (26)  100% (9)
[NPo PPm V]  79% (11)  50% (5)   38% (11)  34% (31)  20% (12)  16% (5)    0% (0)
NPo = see Table 11.2
PPm = see Table 11.2
Proportion of adjacent NPo-V to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 1994: 152; data collected by Kaoru Horie
adjacency (of 80 per cent, 84 per cent and 100 per cent). When weights are
equal there is a strong (66 per cent) NP adjacency preference defined by the
complement processing preference alone. And when EIC and complement
adjacency are opposed in the left-hand columns, the results are split, as we
have seen, and EIC applies in proportion to its degree of preference. Table 11.5
is the mirror-image of Table 11.4 with respect to the interaction between EIC
and lexical domain processing.
One further prediction that remains to be tested on Japanese involves the
PP-V adjacencies, especially those in the right-hand columns in which adja-
cency is not predicted by weight. These adjacencies should be motivated by
strong lexical dependencies, that is they should be lexical complements or
collocations in Wasow’s (1997, 2002) sense, and more fine-tuned testing needs
to be conducted in order to distinguish diVerent PP types here.
11 See Hawkins (2004) for further illustration and testing of these total domain differential
predictions in a variety of other structural types.
12 The quantitative data in (11.16) are taken from Matthew Dryer’s sample, measuring languages
rather than genera (see Dryer 1992, Hawkins 1994: 257). The quantitative data in (11.17) come from
Hawkins (1983, 1994: 259).
Vennemann (1974), Lehmann (1978), Hawkins (1983), and Travis (1984, 1989).
The two language types are mirror images of one another, and EIC provides
an explanation: both (a) and (b) are optimally efficient.
Grammatical conventions also reveal a preference for orderings in propor-
tion to the number of combinatorial and dependency relations whose process-
ing domains can be minimized (recall (11.6)). Complements prefer adjacency to
heads over adjuncts in the basic ordering rules of numerous phrases in English
and other languages and are generated in a position adjacent to the head in the
phrase structure grammars of Jackendoff (1977) and Pollard and Sag (1987).
Tomlin’s (1986) verb object bonding principle supports this. Verbs and direct
objects are regularly adjacent across languages and there are languages in which
it is impossible or highly dispreferred for adjuncts to intervene between a verbal
head and its subcategorized object complement.
The basic reason I oVer is that complements also prefer adjacency over
adjuncts in performance (cf. 11.3.3), and the explanation for this, in turn, is
that there are more combinatorial and/or dependency relations linking com-
plements to their heads than link adjuncts to their heads. Complements are
listed in a lexical co-occurrence frame defined by, and activated in on-line
processing by, a specific head such as a verb and processing this co-occurrence
favours a minimal lexical domain (11.11). There are more productive relations
of semantic and syntactic interdependency between heads and complements
than between heads and adjuncts. A direct object receives its theta-role from
the transitive verb, and so on.
These considerations suggest that domain minimization has also shaped
grammars and the evolution of grammatical conventions, according to the
following hypothesis:
(11.18) Performance-grammar correspondence hypothesis (PGCH)
Grammars have conventionalized syntactic structures in proportion
to their degree of preference in performance, as evidenced by pat-
terns of selection in corpora and by ease of processing in performance.
It follows from the PGCH that performance principles can often explain what
purely grammatical models can only stipulate, in this context adjacency
effects and the head ordering parameter. Significantly, they can also explain
exceptions to these stipulations, as well as many grammatically unpredicted
regularities. For example, Dryer (1992) has shown that there are systematic
exceptions to Greenberg’s correlations ((11.16)/(11.17)) and to consistent head
ordering when the non-head is a single-word item, for example an adjective
modifying a noun (yellow book). Many otherwise head-initial languages have
non-initial heads here (English), while many otherwise head-final languages
have noun before adjective (Basque). But when the non-head is a branching
phrase, there are good correlations with the predominant head ordering
position. EIC can explain this asymmetry.
When a head category like N (book) has a branching phrasal sister like
Possp {of, the professor} within NP, the distance from N to the head category P
or V that constructs the next higher phrase, PP or VP respectively, will be long
when head orderings are inconsistent, see, for example, (11.17c) and (11.17d). If
the intervening category is a non-branching single word, then the difference
between pp[P [Adj N]np] and pp[P np[N Adj]] is small, only one word.
Hence the MiD preference for noun initiality (and for noun finality in
postpositional languages) is significantly less than it is for intervening branching
sisters, and either less head ordering consistency or no consistency is
predicted. When there is just a one-word difference between competing
domains in performance, in for example Table 11.1, both ordering options
are generally productive, and so too in grammars.
Many such universals can be predicted from performance preferences,
including structured hierarchies of centre-embedded constituents and of
filler-gap dependencies, markedness hierarchies, symmetries versus asymmetries,
and many morphosyntactic regularities (Hawkins 1994, 1999, 2001, 2003,
2004).
A model of grammar that seems ideally suited to capturing this performance-grammar
correspondence is S(tochastic) O(ptimality) T(heory), cf.
Bresnan et al. (2001), Manning (2003). These authors point to the performance
preference for first and second person subjects in English (I was hit by the
bus) over third person subjects (the bus hit me), which has been conventionalized
into an actual grammaticality distinction in the Salish language Lummi.
SOT models this by building performance preferences directly into the grammar
as a probability ranking relative to other OT constraints. For English
there is a partial overlap with other ranked constraints, and non-first person
subjects can surface as grammatical. In Lummi there is no such overlap and
sentences corresponding to the bus hit me are not generated. In the process,
however, SOT stipulates a stochastic distribution between constraint rankings
within the grammar, based on observed frequencies in performance.
We could formulate a similar stochastic model for the phrase structure
adjacencies and lexical co-occurrences of this paper. But there are good
reasons not to do so.
First, SOT would then stipulate what is predictable from the performance
principle of MiD (11.2).
Second, the grammatical type of the syntactic and semantic relation in
question does not enable us to predict the outcome of the constraint
interaction. What matters is the size of the domain that a given relation
happens to require in a given sentence and its degree of minimization over
a competitor. One and the same grammatical relation can have different
strengths in different sentences (as a function of the weight differences
between sisters, for example). And phrasal combination processing can be a
stronger force for adjacency than lexical dependency in some sentences, but
weaker in others. In other words, it is processing, not grammar, that makes
predictions for performance, and it would be unrevealing to model this as an
unexplained stochastic distribution in a grammar, when there is a principled
account in terms of MiD.
And third, I would argue that performance preferences have no place in a
grammar anyway, whose primary function is to describe the grammatical
conventions of the relevant language. To do so is to conflate, and confuse,
explanatory questions of grammatical evolution and synchronic questions
of grammaticality prediction. The soft constraint/hard constraint insight is
an important one, and it fits well with the PGCH (11.18), but hard constraints
can be explained without putting soft constraints into the same grammar,
and the soft constraints require a processing explanation, not a grammatical
one.
11.5 Conclusions
The data considered in this paper lead to the conclusions summarized in (11.1).
First, there are clear preferences among competing and grammatically
permitted structures in the corpus data of English (Tables 11.1, 11.3, and 11.4)
and Japanese (Tables 11.2 and 11.5). These preferences constitute a set of
‘gradedness effects’ and they can be argued to result from minimization
differences in processing domains, cf. minimize domains (11.2).
MiD defines a cumulative benefit for minimality when the same terminal
elements participate in several processing domains. The English intransitive
verbs and PPs that contract relations of both phrasal sisterhood and of lexical
combination exhibit 96 per cent adjacency, those that involve only one or the
other relation exhibit significantly less (cf. 11.3.3). The relative strengths in
these cases reflect the degree of minimization distinguishing competing
orders in the processing of each relation. These cumulative effects are captured
in a quantitative metric that measures total domain differentials (or
TDDs) across structures (11.12). The metric measures domain sizes in words,
but could easily be adapted to quantify a more inclusive node count, or a
count in terms of phrasal nodes only.
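The TDD metric can be sketched as follows. This is a simplified illustration, not the exact formulation of (11.12): the relation names and domain sizes for the two competing orders of counted on him in her retirement are invented for the example.

```python
def total_domain_differential(domains_a, domains_b):
    """Total domain differential (TDD) between two competing orders.

    Each argument maps a processing relation (phrasal combination,
    lexical dependency, ...) to the size in words of its domain under
    that order.  A positive result means order B has the smaller total
    and is preferred, in proportion to the differential."""
    return sum(domains_a.values()) - sum(domains_b.values())

# Hypothetical domain sizes: dependent PP adjacent to V vs. separated.
adjacent  = {"PCD(VP)": 4, "LD(count-on)": 2}
separated = {"PCD(VP)": 6, "LD(count-on)": 5}

tdd = total_domain_differential(separated, adjacent)
assert tdd == 5  # both relations cumulatively favour the adjacent order
```

Replacing word counts with node counts would only require changing the values stored for each relation, not the metric itself.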
13 See Hawkins (2003, 2004) for detailed discussion of frequency distributions and their grammat-
ical counterparts in numerous areas, in terms of minimize domains (11.2), in conjunction with two
further principles of efficiency, minimize forms, and maximize on-line processing.
frequencies may not be predictable, but you will not know this if you look
only at the grammar of the preference (first and second person subjects are
preferred over third persons, etc.) instead of its processing. Should such
distributions turn out not to be predictable from efficiency principles of
performance, they should still not be included in a grammar, if their stochas-
tic ranking is not explainable by grammatical principles either, which it
almost certainly will not be.
Fourth and finally, we need a genuine theory of performance and of the
human processing architecture from which frequencies can be derived, and I
have argued (Hawkins 2004) that we do not yet have the kind of general
architecture that we need. MiD is an organizing principle with some
predictiveness, but it too must be derivable from this architecture. We also
need the best model of grammatical description we can get, incorporating
relevant conventions in languages that have them, and defining the differences
in grammaticality between languages in the best possible way. The further
ingredient that is ultimately needed in the explanatory package is a diachronic
model of adaptation and change, of the type outlined in Haspelmath (1999)
and Kirby (1999).
12
12.1 Introduction
Gradience in language comprehension can manifest itself in a variety of ways,
and can have various sources.1 Based on theoretical and empirical
results, one possible way of classifying such phenomena is whether they
arise from the grammaticality of a sentence, perhaps reflecting the relative
importance of various syntactic constraints, or arise from processing, namely
the mechanisms which exploit our syntactic knowledge for incrementally
recovering the structure of a given sentence. Most of the chapters in this
volume are concerned with the former: how to characterize and explain the
gradient grammaticality of a given utterance, as measured, for example, by
judgements concerning acceptability. While the study of gradient grammat-
icality has a long history in the generative tradition (Chomsky 1964, 1975),
recent approaches such as the minimalist programme (Chomsky 1995) do not
explicitly allow for gradience as part of the grammar.
In this chapter, we more closely consider the phenomena of gradient per-
formance: how can we explain the variation in processing difficulty, as reflected
for example in word-by-word reading times? Psycholinguistic research has
identified two key sources of processing difficulty in sentence comprehension:
local ambiguity and processing load. In the case of local, or temporary ambigu-
ity, there is abundant evidence that people adopt some preferred interpretation
immediately, rather than delaying interpretation. Should the corresponding
1 The authors would like to thank the volume editors, the anonymous reviewers, and also Marshall
Mayberry for their helpful comments. The authors gratefully acknowledge the support of the German
Research Foundation (DFG) through SFB 378 Project ‘Alpha’ awarded to the first author, and an
Emmy Noether fellowship awarded to the second author.
frequencies, but also keeps track of structural frequencies. This view, known
as the tuning hypothesis, states that the human parser deals with ambiguity by
initially selecting the syntactic analysis that has worked most frequently in the
past (see Figure 12.1).
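The tuning hypothesis amounts to a simple decision rule, which can be sketched as follows. The frequency counts are invented for illustration; real exposure-based tallies would come from a corpus or from a reader's accumulated experience.

```python
from collections import Counter

# Hypothetical running tallies of how often each relative clause
# attachment has been the correct analysis in past experience.
experience = {
    "English": Counter({"low": 140, "high": 60}),
    "Spanish": Counter({"high": 120, "low": 80}),
}

def initial_attachment(language: str) -> str:
    """Tuning hypothesis: initially select the structural analysis
    that has worked most frequently in the past."""
    return experience[language].most_common(1)[0][0]

assert initial_attachment("English") == "low"   # RC modifies 'the actress'
assert initial_attachment("Spanish") == "high"  # RC modifies 'the servant'
```

On this view, the brief exposure manipulation of Cuetos et al. (1996) simply shifts the relevant tallies, and with them the initial preference.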
The fundamental question that underlies both lexical and structural
experience models is the grain problem: What is the level of granularity at
which the human sentence processor ‘keeps track’ of frequencies? Does it
count lexical frequencies or structural frequencies (or both), or perhaps
frequencies at an intermediate level, such as the frequencies of individual
phrase structure rules? The latter assumption underlies a number of
experience-based models that rest on probabilistic context-free
[Figure 12.1: parse tree for the sentence Someone shot the servant of the actress who . . . , showing the relative clause attaching either high, to the servant (the Spanish preference), or low, to the actress (the English preference)]
Figure 12.1 Evidence from relative clause (RC) attachment ambiguity has been
taken to support an experience-based treatment of structural disambiguation. Such
constructions are interesting because they do not hinge on lexical preferences. When
reading sentences containing the ambiguity depicted above, English subjects demon-
strate a preference for low-attachment (where the actress will be further described by
the RC who . . . ), while Spanish subjects, presented with equivalent Spanish sentences,
prefer high-attachment (where the RC concerns the servant) (Cuetos and Mitchell
1988). The Tuning Hypothesis was proposed to account for these findings (Brysbaert
and Mitchell 1996; Mitchell et al. 1996), claiming that initial attachment preferences
should be resolved according to the more frequent structural configuration. Later
experiments further tested the hypothesis, examining subjects' preferences before and
after a two-week period in which exposure to high- or low-attachment examples was
increased. The findings confirmed that even this brief period of variation in 'experience'
influenced the attachment preferences as predicted (Cuetos et al. 1996).
grammars (see Figure 12.2 for details). Furthermore, at the lexical level, are
frame frequencies for verb forms counted separately (e.g. know, knew, knows,
. . . ) or are they combined into a set of total frequencies for the verb’s base
form (the lemma KNOW) (Roland and Jurafsky 2002)?
The probability of a parse tree t is computed as the product of the probabilities of the rules used to derive it, P(t) = ∏r∈R P(r), where R is the set of all rules applied in generating the parse tree t. It has been
suggested that the probability of a grammar rule models how easily this rule
can be accessed by the human sentence processor (Jurafsky 1996). Structures
with greater overall probability should be easier to construct, and therefore
preferred in cases of ambiguity. As an example, consider the PCFG in Figure
12.2(a). This grammar generates two parses for the sentence John hit the
man with the book. The first parse t1 attaches the prepositional phrase with the
book to the noun phrase (low attachment), see Figure 12.2(b). The PCFG
assigns t1 the following probability, computed as the product of the probabil-
ities of the rules used in this parse:
(12.4) P(t1) = 1.0 × 0.2 × 0.7 × 1.0 × 0.2 × 0.6 × 1.0 × 1.0 × 0.5 × 1.0 × 0.6 × 1.0 × 0.5 = 0.00252
The alternative parse t2, with the prepositional phrase attached to the verb
phrase (high attachment, see Figure 12.2(c)), has the following probability:
(12.5) P(t2) = 1.0 × 0.2 × 0.3 × 0.7 × 1.0 × 1.0 × 0.6 × 1.0 × 0.6 × 1.0 × 0.5 × 1.0 × 0.5 = 0.00378
Under the assumption that the probability of a parse is a measure of processing
effort, we predict that t2 (high attachment) is easier to process than t1, as it
has a higher probability.
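As a concrete sketch of this computation, the two parse probabilities can be reproduced in a few lines of Python; the rule inventory below is reconstructed from the probabilities annotated on the trees in Figure 12.2, so the exact rule list is an assumption of this illustration.

```python
from math import prod

# Rule probabilities of the PCFG in Figure 12.2(a), reconstructed from the
# probabilities annotated on the parse trees.
P = {
    "S -> NP VP": 1.0, "NP -> John": 0.2, "VP -> V NP": 0.7,
    "VP -> VP PP": 0.3, "NP -> NP PP": 0.2, "NP -> Det N": 0.6,
    "PP -> P NP": 1.0, "Det -> the": 1.0, "N -> man": 0.5,
    "N -> book": 0.5, "V -> hit": 1.0, "P -> with": 1.0,
}

def parse_prob(rules):
    """P(t): the product of the probabilities of the rules used in t."""
    return prod(P[r] for r in rules)

# t1: the PP modifies the object NP (low attachment, Figure 12.2(b)).
t1 = ["S -> NP VP", "NP -> John", "VP -> V NP", "V -> hit", "NP -> NP PP",
      "NP -> Det N", "Det -> the", "N -> man", "PP -> P NP", "P -> with",
      "NP -> Det N", "Det -> the", "N -> book"]
# t2: the PP modifies the VP (high attachment, Figure 12.2(c)).
t2 = ["S -> NP VP", "NP -> John", "VP -> VP PP", "VP -> V NP", "V -> hit",
      "NP -> Det N", "Det -> the", "N -> man", "PP -> P NP", "P -> with",
      "NP -> Det N", "Det -> the", "N -> book"]

print(round(parse_prob(t1), 5), round(parse_prob(t2), 5))  # 0.00252 0.00378
```

The two products match (12.4) and (12.5), so the high-attachment parse t2 comes out as the more probable one.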
In applying PCFGs to the problem of human sentence processing, an
important additional property must be taken into account: incrementality.
That is, people face a local ambiguity as soon as they hear the fragment John
hit the man with . . . and must decide which of the two possible structures is
[Figure 12.2(a): the rule table of the PCFG, including S → NP VP 1.0, NP → Det N 0.6, V → hit 1.0, . . . ]
Figure 12.2 An example of the parse trees generated by a probabilistic context-free
grammar (PCFG). (a) The rules of a simple PCFG with associated rule application
probabilities. (b) and (c) The two parse trees generated by the PCFG in (a) for the
sentence John hit the man with the book.
Probabilistic Grammars in Language Processing 235
[Figure 12.2(b): parse tree t1, with the PP with the book attached to the object NP (low attachment); (c): parse tree t2, with the PP attached to the VP (high attachment); each node is annotated with the probability of the rule expanding it]
to be preferred. This entails that the parser must be able to compute prefix
probabilities for sentence-initial substrings, as the basis for comparing alternative
(partial) parses. Existing models provide a range of techniques for computing
and comparing such parse probabilities incrementally (Brants and Crocker
2000; Hale 2001; Jurafsky 1996). For the example in Figure 12.2, however, the
preference for t2 would be predicted even before the final NP is processed,
since the probability of that NP is the same for both structures.
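This incremental comparison can be sketched as follows; the factor lists are the rule probabilities needed to cover the prefix under each structural hypothesis, reconstructed from Figure 12.2 and thus an assumption of the sketch.

```python
from math import prod

# Probabilities of the rules needed to cover the prefix
# "John hit the man with ..." under each structural hypothesis.
low  = [1.0, 0.2, 0.7, 1.0, 0.2, 0.6, 1.0, 0.5, 1.0, 1.0]  # t1: NP attachment
high = [1.0, 0.2, 0.3, 0.7, 1.0, 0.6, 1.0, 0.5, 1.0, 1.0]  # t2: VP attachment

p_low, p_high = prod(low), prod(high)
print(round(p_low, 4), round(p_high, 4))  # 0.0084 0.0126

# Both hypotheses complete the prefix with the same final NP "the book"
# (probability 0.6 * 1.0 * 0.5 = 0.3), so t2 is already preferred at "with".
assert round(p_low * 0.3, 5) == 0.00252
assert round(p_high * 0.3, 5) == 0.00378
```

At the word with, the high-attachment hypothesis already has the larger prefix probability, which is the sense in which the preference is fixed before the final NP arrives.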
Note that the move from CFGs to PCFGs also raises a number of other
computational problems, such as the problem of efficiently computing the
most probable parse for a given input sentence. Existing parsing schemes can
be adapted to PCFGs, including shift-reduce parsing (Briscoe and Carroll
1993) and left-corner parsing (Stolcke 1995). These approaches all use the basic
Viterbi algorithm (Viterbi 1967) for efficiently computing the best parse
generated by a PCFG for a given sentence.
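As a sketch of such a scheme, a probabilistic CKY parser with Viterbi (max) scoring recovers the best parse of the running example; the grammar is the PCFG of Figure 12.2, reconstructed here and assumed to be in the binary form CKY requires.

```python
# A compact probabilistic CKY parser with Viterbi scoring (a sketch, not a
# reimplementation of any of the cited systems).

lexical = {  # terminal rules: word -> [(category, probability)]
    "John": [("NP", 0.2)], "hit": [("V", 1.0)], "the": [("Det", 1.0)],
    "man": [("N", 0.5)], "book": [("N", 0.5)], "with": [("P", 1.0)],
}
binary = [  # binary rules: (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0), ("VP", "V", "NP", 0.7), ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.2), ("NP", "Det", "N", 0.6), ("PP", "P", "NP", 1.0),
]

def viterbi_parse(words):
    """Return the probability of the best S parse spanning the whole input."""
    n = len(words)
    # chart[i][j] maps a category to its best probability over words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat, p in lexical[w]:
            chart[i][i + 1][cat] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point
                for parent, left, right, p in binary:
                    if left in chart[i][k] and right in chart[k][j]:
                        cand = p * chart[i][k][left] * chart[k][j][right]
                        if cand > chart[i][j].get(parent, 0.0):
                            chart[i][j][parent] = cand
    return chart[0][n].get("S", 0.0)

best = viterbi_parse("John hit the man with the book".split())
print(round(best, 5))  # 0.00378: the high-attachment parse t2
```

Keeping only the best-scoring analysis per category and span is what makes the search efficient: the chart has O(n²) cells, and the maximization discards suboptimal subtrees early.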
This issue has been addressed by Corley and Crocker’s (2000) broad
coverage model of lexical category disambiguation. Their approach uses a
bigram model to incrementally compute the probability that a string of words
w0 . . . wn has the part of speech sequence t0 . . . tn as follows:
(12.6) P(t0 . . . tn, w0 . . . wn) ≈ ∏i=1..n P(wi | ti) P(ti | ti−1)
Here, P(wi | ti) is the conditional probability of word wi given the part of
speech ti, and P(ti | ti−1) is the probability of ti given the previous part of
speech ti−1. This model capitalizes on the insight that many syntactic ambiguities
have a lexical basis, as in (12.7):
(12.7) The warehouse prices/makes —.
These fragments are ambiguous between a reading in which prices or makes is
the main verb and one in which it is part of a compound noun. After being trained on a large
corpus, the model predicts the most likely part of speech for prices, correctly
accounting for the fact that people understand prices as a noun, but makes as a
verb (Crocker and Corley 2002; Frazier and Rayner 1987; MacDonald 1993).
Not only does the model account for a range of disambiguation preferences
rooted in lexical category ambiguity, it also explains why, in general, people
are highly accurate in resolving such ambiguities.
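The model in (12.6) can be sketched as follows; every probability in this example is invented purely for illustration (no corpus estimates are implied), chosen only so that the lexical bias of prices and makes differs:

```python
# Bigram part-of-speech model in the spirit of (12.6). All probabilities
# below are made up for illustration, not trained values.
P_word_given_tag = {
    ("the", "Det"): 0.5, ("warehouse", "N"): 0.0004,
    ("prices", "N"): 0.001, ("prices", "V"): 0.0005,
    ("makes", "N"): 0.0002, ("makes", "V"): 0.002,
}
P_tag_given_prev = {
    ("Det", "<s>"): 0.3, ("N", "Det"): 0.6, ("N", "N"): 0.2, ("V", "N"): 0.3,
}

def seq_prob(words, tags):
    """Joint probability of a tagged string: product of P(w|t) * P(t|t_prev)."""
    p = 1.0
    for i, (w, t) in enumerate(zip(words, tags)):
        prev = "<s>" if i == 0 else tags[i - 1]
        p *= P_word_given_tag.get((w, t), 0.0) * P_tag_given_prev.get((t, prev), 0.0)
    return p

prices = ["the", "warehouse", "prices"]
makes = ["the", "warehouse", "makes"]
noun, verb = ["Det", "N", "N"], ["Det", "N", "V"]

print(seq_prob(prices, noun) > seq_prob(prices, verb))  # True: noun reading
print(seq_prob(makes, verb) > seq_prob(makes, noun))    # True: verb reading
```

With these toy numbers the model prefers the compound-noun tagging for prices and the main-verb tagging for makes, which is the qualitative pattern the text describes.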
More recent work on broad coverage parsing models has extended this
approach to full syntactic processing based on PCFGs (Crocker and Brants
2000). This research demonstrates that when such models are trained on
large corpora, they are not only able to account for human disambiguation
behaviour, but they are also able to maintain high overall accuracy under
strict memory and incremental processing restrictions (Brants and Crocker
2000).
Finally, it is important to stress that the kind of probabilistic models we
outline here emphasizes lexical and syntactic information in estimating the
probability of a parse structure. To the extent that a PCFG is lexicalized, with
the head of each phrase being projected upwards to phrasal nodes (Collins
1999), some semantic information may also be implicitly represented in the
form of word co-occurrences (e.g. head–head co-occurrences). In addition to
being incomplete models of interpretation, such lexical dependency probabil-
ities are poor at modelling the likelihood of plausible but improbable struc-
tures. Probabilistic parsers in their current form are therefore only
appropriate for modelling syntactic processing preferences. Probabilistic
models of human semantic interpretation and plausibility remain a largely
unexplored area of research.
likelihood of being correct, namely the higher relative frequency. One well-
studied ambiguity is prepositional phrase attachment:
(12.8) John hit the man [PP with the book ].
Numerous on-line experimental studies have shown an overall preference for
high attachment, that is for the association of the PP with the verb (e.g. as the
instrument of hit) (Ferreira and Clifton 1986; Rayner et al. 1983). Corpus
analyses, however, reveal that low attachment (e.g. interpreting the PP as a
modifier of the man) is about twice as frequent as attachment to the verb
(Hindle and Rooth 1993). Such evidence presents a challenge for accounts
relying exclusively on structural frequencies, but may be accounted for by
lexical preferences for specific verbs (Taraban and McClelland 1988). Another
problem for structural tuning comes from three-site relative clause attach-
ments analogous to that in Figure 12.1, but containing an additional NP
attachment site:
(12.9) [high The friend ] of [mid the servant ] of [low the actress ] [RC who was
on the balcony ] died.
While corpus analyses suggest a preference for low > middle > high attachment
(although such structures are rather rare), experimental evidence suggests
an initial preference for low > high > middle (with middle being in fact
very difficult) (Gibson et al. 1996a, 1996b). A related study investigating noun
phrase conjunction ambiguities (instead of relative clauses) in such three-site
configurations revealed a similar asymmetry between corpus frequency and
human preferences (Gibson and Schütze 1999).
Finally, there is recent evidence against lexical verb frame preferences:
(12.10) The athlete realized [S [NP her shoes/goals ] were out of reach ].
Reading-time studies have shown an initial preference for interpreting her
goals as a direct object (Pickering et al. 2000), even when the verb is more
likely to be followed by a sentence complement (see also Sturt et al. 2001, for
evidence against the use of such frame preferences in reanalysis). These
findings might be taken as positive support for the tuning hypothesis, since
object complements are more frequent than sentential complements overall
(i.e. independent of the verb). Pickering et al. (2000), building on previous
theoretical work (Chater et al. 1998), suggest that the parser may in fact still be
using an experience-based metric, but not one which maximizes likelihood
alone.
of the second stage (hence there is no cumulativity, and only a small number
of optimal output forms can occur).
12.5 Conclusion
There is clear evidence for the role of lexical frequency effects in human
sentence processing, particularly in determining lexical category and verb
frame preferences. Since many syntactic ambiguities are ultimately lexically
based, direct evidence for purely structural frequency effects, as predicted by
the tuning hypothesis, remains scarce (Jurafsky 2002).
Probabilistic accounts offer natural explanations for lexical and structural
frequency effects, and a means for integrating the two using lexicalized
techniques that exist in computational linguistics (e.g. Carroll and Rooth
1998; Charniak 2000; Collins 1999). Probabilistic models also offer good
scalability and a transparent representation of symbolic structures and their
likelihood. Furthermore, they provide an inherently gradient characterization
of sentence likelihood, and the relative likelihood of alternative interpret-
ations, promising the possibility of developing truly quantitative accounts of
experimental data.
More generally, however, experience-based models not only offer an
account of specific empirical facts, but can more generally be viewed as
rational (Anderson 1990). That is, their behaviour typically resolves ambigu-
ity in a manner that has worked well before, maximizing the likelihood of
correctly understanding ambiguous utterances. This is consistent with the
suggestion that human linguistic performance is indeed highly adapted to
its environment and to the task of rapidly and correctly understanding language
(Chater et al. 1998; Crocker to appear). It is important to note, however, that
such adaptation based on linguistic experience does not necessitate mechan-
isms which are strictly based on frequency-based estimations of likelihood
(Pickering et al. 2000). Furthermore, different kinds and grains of frequencies
may interact or be combined in complex ways (McRae et al. 1998).
It must be remembered, however, that experience is not the sole determin-
ant of ambiguity resolution behaviour (Gibson and Pearlmutter 1998). Not
only are people clearly sensitive to immediate linguistic and visual context
(Tanenhaus et al. 1995), some parsing behaviours are almost certainly deter-
mined by alternative processing considerations, such as working memory
limitations (Gibson 1998). Any complete account of gradience in sentence
processing must explain how frequency of experience, linguistic and
non-linguistic knowledge, and cognitive limitations are manifest in the
mechanisms of the human sentence processor.
13.1 Introduction
Markedness plays a central role in optimality theoretic grammars in the form
of violable well-formedness constraints.1 Grammaticality is understood as
optimality relative to a constraint hierarchy composed of markedness constraints,
which evaluate different aspects of well-formedness, and faithfulness
constraints, which determine, by their definition and rank, which aspects of
markedness are tolerated and which are not: grammaticality is dependent on
and derived from markedness.
An optimality grammar is an input–output mapping: marked features of
the input have a chance to appear in the output if they are protected by highly
ranked faithfulness constraints. Optimal expressions might differ in their
markedness, which is reflected in the different constraint violation profiles
that these expressions are assigned by the grammar.
This chapter argues that violation profiles can be used to predict contrasts
among expressions in empirical investigations, and that markedness is the
grammar-internal correlate of (some) phenomena of gradedness that we
1 I want to thank my collaborators Stefan Frisch, Jutta Boethke, and Marco Zugck, without whom
the empirical research presented in this paper would not have been undertaken. For fruitful comments
and helpful suggestions I further thank Gisbert Fanselow, Caroline Féry, Doug Saddy, Joanna
Blaszczak, Arthur Stepanov, and the audiences of presentations of parts of this work at the Potsdam
University and the workshop on Empirical Syntax/WOTS 8 at the ZAS Berlin, August 2004. This work
has been supported by a grant from the Deutsche Forschungsgemeinschaft for the research group
‘Conflicting Rules in Language and Cognition’, FOR-375/2-A3.
Degraded Acceptability and Markedness in Syntax 247
normative decision. She could as well have proposed 2.0 or 2.5 as the boundary.
How can such a decision be justified independently?
An answer to this question requires a theory of acceptability judgements.
Theoretical linguists rarely explicate their point of view on this. Interpreting
the ‘?’ as uncertainty could simplify the problems somewhat, as this allows the
assumption of a categorical grammar.
But we would still have to exclude that the gradedness that we observe
results from inherent properties of the grammar, instead of being the result of
‘random noise’. If, on the other hand, phenomena of gradedness are system-
atically correlated with grammatical properties, then the whole categorical
view on grammar is called into question. I think that this is indeed the case.
More recent variants of ‘explanations’ in terms of non-grammatical factors
attribute variation and gradedness in grammaticality judgements to ‘per-
formance’. Abney (1996) remarked that such a line of argumentation takes
the division between competence and performance more seriously than it
should be taken:
Dividing the human language capacity into grammar and processor is only a manner
of speaking, a way of dividing things up for theoretical convenience. It is naive to
expect the logical grammar/processor division to correspond to any meaningful
physiological division—say, two physically separate neuronal assemblies, one func-
tioning as a store of grammar rules and the other as an active device that accesses the
grammar-rule store in the course of its operation. And even if we did believe in a
physiological division between grammar and processor, we have no evidence at all to
support that belief; it is not a distinction with any empirical content. (Abney 1996: 12)
• Categorical linguistic theories claim too much. They place a hard categorical
boundary of grammaticality where really there is a fuzzy edge, determined by
many conflicting constraints and issues of conventionality versus human creativity. [ . . . ]
• Categorical linguistic theories explain too little. They say nothing at all about the
soft constraints that explain how people choose to say things (or how they
choose to understand them). (Manning 2003: 296–7)
the results of such empirical studies only rarely find their way into the
grammar-theoretical work of generative syntacticians.
2 The case required by the matrix verb appears slanted and attached to the FR in the glosses.
3 For the sake of completeness, I will briefly describe the experiment design: each of the 24
participants—students of the University of Potsdam—saw eight items of each of the conditions.
Test items were FRs with the four possible case patterns with nominative and dative. The experiment
included four further conditions, which will be introduced later, so the experiment had eight test
conditions altogether. The test items of this experiment were randomized and mixed with the test
items of three other experiments, which served as distractor items. The sentences were presented
visually on a computer screen, one word at a time, each word for 400 ms. Subjects were asked to give
a grammaticality judgement by pressing one of two buttons for grammatical/ungrammatical, within a
time window of 2,500 ms.
Table 13.1. Acceptability rates for the structures in (13.1) and (13.2) in the
experiment by Boethke (2005)
I want to emphasize that this experiment led to gradient acceptability (see below) without asking for it.
In questionnaire studies with multi-valued scales and experiments based on magnitude estimation,
gradience is already part of the experimental design. One could argue that subjects only give gradient
judgements because they have been offered this option. In the experiment described here, the
gradience results from intra- and inter-speaker variation among the test conditions in repeated
measuring.
4 Featherston (to appear) provides more arguments in favour of this position.
A theory of grammar that has the potential to deal with gradedness more
successfully is optimality theory (Prince and Smolensky 1993). It departs in a
number of ways from classical generative grammar. It is constraint-based,
which is not strikingly different, but the constraints are ranked and violable.
Different structures have different violation profiles.
One important departure from traditional grammars is that the grammat-
icality of an expression cannot be determined for that expression in isolation.
An expression is grammatical if it is optimal. And it is optimal if it performs
better on the constraint hierarchy than all possible alternative expressions in a
competition for the expression of a particular underlying input form.
OT thus determines grammaticality in a relational manner. This is remin-
iscent of what is done in the empirical investigations described above. It
should be possible to systematically relate observed gradedness to relative
optimality of violation profiles.5
OT is based on two types of constraints, markedness and faithfulness
constraints. Markedness constraints evaluate intrinsic properties of candidates,
while faithfulness constraints evaluate how similar candidates are to a
given input. As there are infinitely many possible input specifications, there is
an equally rich set of competitions. Grammatical expressions are those
that win, that is are optimal, in at least one of these competitions.
Candidates which are good at markedness, that is relatively unmarked
candidates, are not as dependent on the assistance of faithfulness constraints
as relatively marked candidates. This is schematically illustrated in Tables 13.2
and 13.3.
Table 13.2. Markedness ranked above faithfulness (M1 ≫ M2 ≫ F)

    /cand1/   M1   M2   F            /cand2/   M1   M2   F
  + cand1          *               + cand1          *    *
    cand2     *!        *            cand2     *!

Table 13.3. Faithfulness ranked above markedness (F ≫ M1 ≫ M2)

    /cand1/   F    M1   M2           /cand2/   F    M1   M2
  + cand1               *              cand1   *!        *
    cand2     *!   *                 + cand2        *

M1, M2: markedness constraints; F: faithfulness constraint; /cand1/, /cand2/: input specifications; cand1, cand2:
output candidates; * = constraint violation; *! = fatal violation; + = winning candidate
5 The first author who explored this feature of OT systematically was Frank Keller (Keller 2000b,
and further work). See below for a brief discussion of his approach.
[Table: violation profiles of the four structures, from least to most marked: 1. (13.1a) —; 2. (13.2a) *; 3. (13.2b) *; 4. (13.1b) * * *]
Groos and van Riemsdijk (1981), Pittner (1991), and Vogel (2001) offer three
different approaches to case conflicts in German FRs. Interestingly, these
authors also differ in the grammaticality judgements they report. They agree,
however, that the two patterns in (13.4) are grammatical. Example (13.4a)
is a so-called ‘matching’ FR: both verbs assign the same case, accusative, and no
conflict arises. In (13.4b), two different cases are assigned, nominative and
accusative, but the wh-pronoun ‘was’ is ambiguous between these two cases, so the
FR is matching at the surface, and this is obviously sufficient.
(13.4) a. Ich lade ein, wen ich treffe
I invite [who-acc I-nom meet]-acc
‘I’ll invite whoever I meet’
b. Ich kaufe was mir gefällt
I buy [what-nom me-dat pleases]-acc
‘I’ll buy whatever pleases me’
While (13.5a, 13.5b) are both grammatical in Vogel’s (2001) dialect ‘German A’,
Pittner only classifies (13.5a) as grammatical (Vogel’s (2001) ‘German B’). Both
patterns in (13.5) are classified as ungrammatical by Groos and van Riemsdijk
(Vogel’s (2001) ‘German C’).7
(13.5) a. Ich lade ein, wem ich begegne
I invite [who-dat I-nom meet]-acc
‘I’ll invite whoever I meet’
b. Ich lade ein, wer mir begegnet
I invite [who-nom me-dat meets]-acc
‘I’ll invite whoever meets me’
6 Note also that, strictly speaking, we have no evidence for the contrast between (13.2a) and (13.2b),
because their acceptability rates (71% versus 62%) did not differ to a statistically significant degree. If
we interpret this result such that the two structures are equally marked, then RC and S<O would have
to be ranked on a par in order to mirror this in our model.
7 Note that (13.4b) and (13.5b) do not differ in the case conflict configuration. The wh-pronoun
‘was’ is ambiguous for nominative and accusative. It is therefore the correct realization for both of
these cases, and the case conflict is, obviously, resolved.
Pittner offers an explanation for the contrast she sees in (13.5) in terms of the
case hierarchy ‘nominative < accusative < dative, genitive, PP’: a case may
only be suppressed in favour of another case that is higher on the case
hierarchy; in particular, accusative can be suppressed in favour of dative,
but not in favour of nominative. In Vogel (2001), I capture this with the
following OT constraint:
(13.6) Realize Case (relativized) (RCr):
An assigned case must be realized morphologically by its case
morphology or that of a case that is higher on the case hierarchy.
I assume a further constraint that we may informally call ‘1To1’ here (cf. Vogel
2001):
(13.7) 1To1:
Case assigners and case assignees are in 1-to-1 correspondence.
The high rank of this constraint has the effect that FRs are disallowed, and lose
against an unfaithful candidate. In German, this unfaithful winner is a
structure that I call ‘correlative’ (CORR):
(13.8) Wer uns hilft, dem werden wir vertrauen
Who-nom us-dat helps that one-dat will we-nom trust
Here, the case conflict is avoided by the insertion of an additional resumptive
pronoun (‘dem’). The differences between the judgements given in the three
papers can be described in terms of OT grammars that use the same hierarchy
of markedness constraints and differ only in the rank of faithfulness (see
Table 13.5).
If the rank of F is not absolutely determined, but allowed to vary between
RO and 1To1, then there is no need to assume that varying judgements result
from different grammars, as long as variation and gradedness are based on the
same hierarchy of markedness constraints. Faithfulness can be interpreted as a
‘floating constraint’ in the sense of Reynolds (1994) and Nagy and Reynolds
(1997).
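A floating constraint can be given a minimal computational sketch: the faithfulness constraint is re-docked at a random position within its range on each evaluation, while the markedness subranking stays fixed. The constraint names below are placeholders, not the constraints of Table 13.5.

```python
# Sketch of a floating constraint: F docks at a random position within a
# fixed markedness subhierarchy, so different evaluations (or speakers) may
# use different total rankings over the same markedness hierarchy.
import random

markedness = ["M1", "M2", "M3"]  # fixed subranking, highest first

def sample_ranking(rng):
    """Insert F anywhere below the top markedness constraint."""
    ranking = markedness[:]
    ranking.insert(rng.randint(1, len(ranking)), "F")
    return ranking

rng = random.Random(0)
attested = {tuple(sample_ranking(rng)) for _ in range(50)}
for r in sorted(attested):
    print(r)  # the three total rankings the float makes available
```

Each docking site corresponds to one grammar variant, which is how a single markedness hierarchy with a floating F can yield the kind of inter-speaker variation discussed above.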
Floating constraints are ranked within a particular range in the constraint
hierarchy. They are exceptional: constraints in general do not float. The
Table 13.5. Different rankings of faithfulness among identical subrankings of
markedness yield the three variants of German reported in the literature
13.4 Markedness in OT
Markedness constraints do most of the crucial work in OT grammars. One
might object that markedness is only a reflection of typicality (just as one
anonymous reviewer did): a certain expression is degraded in acceptability
only because it is less frequently used or less prototypical. This objection does
not carry over to the phenomenon we are exploring here, case conflicts in
argument FRs. Most German native speakers agree that the following
grammaticality contrast holds:
(13.9) a. Ich besuche, [fr wem ich vertraue ]
I visit-[acc] who-dat I trust
b. *Ich vertraue, [fr wen ich besuche ]
I trust-[dat] who-acc I visit
8 To me, this formulation even has the flavour of a logical contradiction. Ungrammatical structures
can by definition not be better than other structures.
This is not the case in the proposal that I developed above. I only use
the constraint types that are already there, markedness and faithfulness
constraints. Faithfulness plays a crucial role in selecting the winners of single
competitions, but cannot, by definition, play a role in the relative comparison
of these winners, as they are winners for different inputs. Müller, on the
contrary, selects the constraints that are responsible for gradedness in an
ad hoc manner from the set of markedness constraints.
In a similar vein, Keller (2001) and Büring (2001) propose differences
among markedness constraints. Roughly speaking, they should be distinguished
by the effect of their violation. Irrespective of their rank in the
constraint hierarchy, markedness constraints are claimed to differ in whether
their violation leads to ungrammaticality or only to degraded acceptability.
These three authors have in common that they propose that markedness in
the traditional sense must be added to the OT model as a further dimension of
constraint violation. They did not find a way of accounting for it within
standard OT. This is surprising insofar as the traditional conception of
markedness is the core of OT. However, I think that I showed a way out of
this dilemma that can do without these complications.
Consequently, CORR structures are less marked than FR structures.9 But how
can an FR be grammatical at all, then? Simply because we specified in the
input that we want an FR structure, and highly ranked faithfulness rules out
the less marked CORR candidate—but only in this particular competition!
The CORR candidate still wins the competition where CORR is specified in
the input. The CORR structure performs worse than the FR structure in one
competition, but better in the other one. On which of these two contradicting
competitions shall we now base our empirical predictions?
A competition in an OT model is a purely technical device which should
not be identified with a comparison in a psycholinguistic experiment. The
only possible way to derive empirical predictions from a standard OT model,
also for the comparison of ungrammatical structures, seems to me to be the
meta-comparison for markedness sketched above, which abstracts away from
single competitions, and therefore from faithfulness.
A powerful enhancement of OT that tries to relate grammar theory and
empirical linguistics is stochastic optimality theory which will be discussed in
the next section.
9 This is also reflected in the typology of these two constructions. To the best of my knowledge, the
languages which have FR constructions are a proper subset of those that have CORR constructions, as
I also illustrated in my earlier work, cf. Vogel (2002).
Bresnan et al. (2001) show that Stochastic OT ‘can provide an explicit and
unifying theoretical framework for these phenomena in syntax.’ The frequen-
cies of active and passive are interpreted to correspond to the probabilities of
being the optimal output in a stochastic OT evaluation.
The most important constraints used in that account are *Obl1,2,
which is ranked highest and bans by-phrases with first and second person;
*SPt, which bans patients from being subjects, that is, penalizes passives; *S3,
which penalizes third person subjects; and *SAg, which penalizes agents as
subjects. The latter two constraints are ranked on a par and overlap a bit
10 In Table 13.6, the description of the Action is to be read as follows: ‘1,2 → 3’ means that a first or
second person agent acts upon a third person patient.
11 In Lummi, sentences with first or second person objects and third person subjects are ungrammatical.
Likewise, passive is excluded if the agent is first or second person.
12 The effect described here can be achieved without stochastic enhancements, just by exploiting the
violation profiles in the way illustrated in Section 13.3.
with the higher ranked *SPt, which in turn overlaps a bit with the higher
ranked *Obl1,2.
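The mechanics of such a stochastic evaluation can be sketched as follows; the ranking values and the noise width are invented for illustration and are not the values estimated by Bresnan et al. (2001):

```python
# Sketch of a stochastic OT evaluation: every constraint has a ranking value
# on a continuous scale, Gaussian noise is added at evaluation time, and the
# candidates are compared on the resulting noisy ranking. Overlapping
# distributions (here *SPt vs *S3/*SAg) can swap ranks, so the dispreferred
# output wins with some probability.
import random

RANKS = {"*Obl12": 100.0, "*SPt": 96.0, "*S3": 94.0, "*SAg": 94.0}
NOISE = 2.0  # standard deviation of the evaluation noise

def evaluate(candidates, rng):
    """candidates: {name: set of violated constraints}. One noisy evaluation."""
    noisy = {c: v + rng.gauss(0.0, NOISE) for c, v in RANKS.items()}
    order = sorted(RANKS, key=lambda c: -noisy[c])  # high to low
    def profile(name):
        return tuple(1 if c in candidates[name] else 0 for c in order)
    return min(candidates, key=profile)  # lexicographically best profile wins

# Third person agent acting on a third person patient: the active candidate
# violates *S3 and *SAg, the passive candidate violates *SPt and *S3.
cands = {"active": {"*S3", "*SAg"}, "passive": {"*SPt", "*S3"}}
rng = random.Random(1)
rate = sum(evaluate(cands, rng) == "active" for _ in range(10000)) / 10000
print(rate)  # active wins most evaluations; passive surfaces occasionally
```

Because both candidates violate *S3, the decision reduces to the noisy contest between *SPt and *SAg, and the win rate of the active candidate can be read as the predicted corpus frequency of actives in this configuration.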
The rarity of passives with first or second person subjects is mirrored by the
high rank of *Obl1,2. Is it really the case that the rarity of passives with first and
second person by-phrases is the result of a grammatical constraint, or is it not
rather the result of the rarity of the communicative situation in which such a
passive would be appropriate? Not all instances of infrequency have a grammatical
cause. It seems that a constraint system that is designed to directly
derive frequency patterns runs the risk of interpreting properties of
the ‘world’ as properties of the grammar. I will discuss this problem in more
detail below.13
Table 13.7. Mean acceptabilities for FR and CORR in different case configurations
(in %)

FR 87, CORR 95 | FR 71, CORR 91 | FR 62, CORR 92 | FR 17, CORR 90
13 See also Boersma (this volume) for more discussion of problems of this kind.
14 The abbreviations for the case patterns here and below have the following logic: in ‘case1-case2’,
case1 is the case of the wh-pronoun, case2 is the case assigned to the FR by the matrix verb.
context are too good already, and so there might in fact be a difference, but it
cannot be detected with this method.
Secondly, the contrast between the FRs in the contexts dat-dat and dat-nom
was not significant either, contrary to all other contrasts. This is perhaps
due to an equal rank of the constraints RC and S<O. Both of these seem to be
minor problems.
However, we also carried out a corpus study on the same structures, and
this study yielded different results precisely in these two problematic cases
(Vogel and Zugck 2003). We used the ‘COSMAS II’ corpus of written German
of the Institut für Deutsche Sprache (IDS) Mannheim. Samples of 500
randomly chosen sentences containing the wh-pronouns ‘wer’ and ‘wem’
were generated; the FR uses of these instances were sorted out
and counted. The results which are relevant for our discussion are shown in
Table 13.8.
Roughly 90 per cent of the found items in the nom-nom context were FRs.
This is remarkably different from the result of the acceptability judgement
experiment, where CORR had a slightly higher rate, but the difference to FR
was not significant. We therefore would have expected equal frequencies for
the two structures at best, but not such a high preference for the more marked
FR. The second difference concerns the contrast between the dat-dat and the
dat-nom context: FR is used significantly less often in the dat-dat context. In
the experiment, these FRs have a higher acceptability rate, although this was
again not statistically significant. A formulation of this problem in terms of
standard OT requires the following steps. First, we need a new constraint:
(13.10) Avoid Redundancy (*Red)
Avoid meaningless elements that have a purely grammatical purpose
(so-called ‘function words’).
This constraint favours FR over CORR structures because of the additional
resumptive pronoun in CORR, a pure function word without contribution to
the meaning of the clause. But typologically, the inclusion of this constraint
Degraded Acceptability and Markedness in Syntax 263
would predict the existence of languages that have FRs, but no CORRs. This
prediction seems to be false (cf. Vogel 2001, 2002).
Depending on how we interpret the results, *Red would either have to be
ranked lower than 1To1 in grammaticality judgements, because FRs are judged
as worse than CORR in the experiment, or equally with 1To1, because this
tendency was not significant for the nom-nom context. For ease of presentation,
we deliberately decide to give a clear ranking, and assume that the
observed contrast was only accidentally not significant. Because FR is
more frequent only in the nom-nom context, the effects of *Red must be
restricted to that context. We do this by adding a constraint conjunction of
1To1 and S<O, '1To1 & S<O':
(13.11) 1To1 & S<O
No simultaneous violation of 1To1 and S<O.
This constraint should be ranked above 1To1 in order to take effect
independently of that constraint. Clause-initial FRs which are not the subject
of the main clause violate this constraint, and therefore cannot profit from the
effects of the lower-ranked *Red. The same holds for FRs which violate RC.
Conjoined constraints should be ranked higher than their constituent
constraints; hence, 1To1 & S<O should also be ranked higher than S<O. In
fact, the constraint can fully take over the job of S<O. So we will rank 1To1 &
S<O in place of S<O, which will be ranked lowest.
We can now state the constraint rankings that model the results of the
experiment and the corpus study. The two methods differ in two rerankings,
which were marked with frames in (13.12):
(13.12) Judgement ranking
RO RCr RC 1To1&S<O >> 1To1 *Red S<O
Corpus ranking
RO RCr 1To1&S<O RC >> *Red 1To1 S<O
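The competition under the two rankings in (13.12) can be sketched as a small strict-domination evaluator. The violation profiles below are hypothetical, chosen only to illustrate how reranking 1To1 and *Red reverses the FR/CORR preference; they are not taken from the chapter's tableaux.

```python
# Strict-domination OT evaluation: candidates are compared by their
# violation counts on the highest-ranked constraint where they differ.
# The violation profiles here are HYPOTHETICAL, for illustration only:
# FR violates 1To1 (no separate correlate clause), CORR violates *Red
# (a redundant resumptive pronoun).

def eval_ot(ranking, candidates):
    """Return the candidate whose violation vector, ordered by the
    ranking, is lexicographically smallest."""
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

candidates = {"FR": {"1To1": 1}, "CORR": {"*Red": 1}}

judgement_ranking = ["RO", "RCr", "RC", "1To1&S<O", "1To1", "*Red", "S<O"]
corpus_ranking    = ["RO", "RCr", "1To1&S<O", "RC", "*Red", "1To1", "S<O"]

print(eval_ot(judgement_ranking, candidates))  # CORR: 1To1 outranks *Red
print(eval_ot(corpus_ranking, candidates))     # FR: *Red outranks 1To1
```

Under the judgement ranking the FR's 1To1 violation is fatal; under the corpus ranking the CORR's *Red violation is, which is exactly the reversal the two data sources suggest.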
How can we account for these contradictory rankings with a single OT
grammar? We might argue that the correlative structure is easier to parse,
and hence preferred in the experiment, but avoided in production, because it
is, so to speak, 'over-correct'. Indeed, the most plausible reason why the CORR
structure is avoided in the nom-nom context is that the resumptive pronoun
appears totally superfluous:
(13.13) Wer Hunger hat, (der) soll etwas essen
[Who-nom hunger has]-nom (the one-nom) shall something eat
'Whoever is hungry should eat something'
low frequency of the passive would then not be the result of the passive being a
rare winner, but of the passive being rarely chosen as input. This is an entirely
different issue. The reason why the passive is more rarely chosen is, of course,
its higher markedness. But such an explanation makes no intrinsic claim about
the grammaticality status of alternative structures, as the stochastic
evaluation in the model used by Bresnan et al. does.
In the same way, our constraint *Red could be seen as a criterion for the
choice of particular inputs, but not as a constraint that evaluates the
candidates for this input.
In fact, if we reconsider their statistically significant finding that passives
are even less frequent with third person patients, and do not occur at all with
first and second person agents, we see that we cannot tell what this
significance is evidence for. Under the assumption that subjects are more likely
to be topics, and that first and second person referents are also more likely to
be topics, this finding could simply be due to the fact that contexts where first
and second person have a lower information-structural status are extremely rare.
A stochastic OT grammar based on this finding would mistake a property of the
environment within which a grammar is applied for a grammatical constraint.
Constraints on grammar and constraints of grammar should not be
confused.
That it is necessary to distinguish these two different explanations for the
rarity of structures can also be demonstrated with the result of another corpus
study that we undertook (Vogel et al. in preparation). We again counted free
relative clauses in German in the COSMAS II corpus, this time with the
neuter wh-pronoun 'was' ('what'). This pronoun has the same form for both
nominative and accusative, with the effect that even those speakers who do
not tolerate FRs with case conflicts judge such FRs with 'was' as grammatical.
A typical contrast is the one in (13.15):
(13.15) Grammaticality contrast for some German speakers (cf. Pittner 1991)
a. Ich kaufe was mir gefällt
I buy [what-nom me-dat pleases]-acc
b. *Ich lade ein wer mir gefällt
I invite [who-nom me-dat pleases]-acc
The FRs found in the randomly selected sample of 500 sentences were
counted for FR and CORR structures in the four possible combinations of
nominative and accusative, where the surface form 'was' matches both case
requirements. The results are displayed in Table 13.9.
We see that the case configuration has no influence on the relative
distribution of FR and CORR. It is about two-thirds to one-third throughout. The
Table 13.9. Frequencies of German FR and CORR with the pronoun ‘was’ in a
sample of 500 sentences with ‘was’
relative infrequency of the CORR structure that we found with 'wer' in the
nom-nom context is observed here again. Furthermore, the two contexts with
conflicting case requirements are totally neutralized in their effects on the
choice of construction. The case conflict no longer seems to exist if
the pronoun is homophonous for the two conflicting case features.
Compare these findings with the figures that we present in Vogel and Zugck
(2003) for the FRs with the animate wh-pronouns 'wer' (nominative) and
'wen' (accusative), repeated in Table 13.10.16
Only the nom-nom case pattern shows a preference for FR with animate wh-
pronouns. That we find the same with all FRs with 'was', irrespective of the
case pattern, shows that the matching effect is indeed a surface phenomenon.
However, we made a second observation which is perhaps rather unexpected.
It concerns the relative frequency of the contexts themselves. Under the
assumption that nominative is more frequent than accusative in finite clauses
16 The two samples also differ in the syntactic positions of the FRs that have been counted. 'was'
also serves as a relative pronoun in German, and it was therefore possible to include headed relative
clauses which are semantically equivalent to an FR (as in 'everything that . . .') in the statistics, and,
likewise, clause-final FRs. The studies with 'wer' and 'wen' only counted clause-initial FRs and CORRs.
Table 13.11. Distribution of nominative and accusative as matrix verb case and FR case

             matrix verb case   FR case
nominative          81             69
accusative          50             62
in general,17 we expect nom-nom to be the most frequent pattern and acc-acc
the least frequent, while acc-nom and nom-acc should be equal. This is
not the case in the 'was' sample. The context acc-acc is about as frequent as
acc-nom, and nom-acc has the lowest frequency. The distribution of nominative
and accusative as matrix verb case and FR case is listed in Table 13.11.
To calculate our expectations for the distribution of the case patterns, let us
take the figures we find for the matrix verbs as the base.18 Table 13.12 lists the
expected values and the actual findings.
The departures from the expected values for the nom-acc and acc-acc
patterns are statistically significant. This result is in line with the relative
markedness these patterns are assigned by the grammar. In FRs with the nom-
acc pattern, a case that is higher on the case hierarchy is suppressed in favour
of a lower one, a highly marked situation. FRs with the acc-acc
pattern are much less problematic: both cases match, so there is no
case conflict. That this structure has a higher frequency is therefore expected
under the assumption that grammatical markedness drives frequency.
The two results of this study together show, on the one hand, that the case
configuration does not decrease the preference for the FR if the form of the FR
17 All finite verbs that assign accusative also assign nominative, but there are many verbs which do
not assign accusative. Independently of verb frames, all clauses in German must have a subject, i.e. a
nominative.
18 The calculation for the four contexts is: nom-nom = 81 × 81 = 6561; acc-nom, nom-acc = 81 × 50
= 4050; acc-acc = 50 × 50 = 2500. The relative frequencies in per cent are then calculated relative to
the sum of these values: 6561 + 4050 + 4050 + 2500 = 17161. These are used in Table 13.12.
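The expectation calculation in footnote 18 can be reproduced directly. This is a sketch; the base counts 81 (nominative) and 50 (accusative) are the matrix-verb figures from Table 13.11.

```python
# Expected relative frequencies of the four case patterns, taking the
# matrix-verb case counts (nominative 81, accusative 50) as the base,
# as in footnote 18.

base = {"nom": 81, "acc": 50}
patterns = [("nom", "nom"), ("acc", "nom"), ("nom", "acc"), ("acc", "acc")]

# Product of the base counts for each pattern: 6561, 4050, 4050, 2500.
raw = {p: base[p[0]] * base[p[1]] for p in patterns}
total = sum(raw.values())                       # 6561 + 4050 + 4050 + 2500 = 17161

# Percentages relative to the sum, as used in Table 13.12.
expected_pct = {p: 100 * v / total for p, v in raw.items()}

print(total)
for p, pct in expected_pct.items():
    print(p, round(pct, 1))
```

This yields roughly 38.2% for nom-nom, 23.6% each for acc-nom and nom-acc, and 14.6% for acc-acc.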
pronoun fits both case requirements. The conflict is resolved at the surface,
and this is sufficient.
On the other hand, we observe that potentially problematic configurations,
like the nom-acc pattern, are significantly less frequent than we expect them
to be. The conclusion must be that such patterns tend to be avoided as inputs
already, even where they turn out to be unproblematic in practice.
The case patterns are crucially dependent on the lexical material that is
chosen, in particular, the case requirements of the chosen verbs. But the
choice of lexical material is not subject to a standard OT competition. It is
given in the input, and the input is presupposed for an OT competition.
Markedness is thus demonstrated to guide not only the choice of how things
are expressed (as FR or CORR), but also the choice of what is to be expressed
(which combination of verbs with which case patterns is chosen or avoided).
13.7 Summary
The conceptual problem behind the traditional competence/performance
distinction does not go away, even if we abandon its original Chomskyan
formulation. It returns as the question of the relation between the model
of the grammar and the results of empirical investigations: the question of
empirical testing and verification.
Markedness can be seen as a correlate of observed gradedness within the
theory of grammar. Optimality theory, being based on markedness, is therefore
a promising framework for the task of bridging the gap between the model and
the empirical world. However, this task requires not only a model of grammar,
but also a theory of the methods chosen in empirical investigations
and of how their results are interpreted, and a theory of how to derive
predictions for these particular empirical investigations from the model.
Stochastic optimality theory is one possible way of deriving empirical
predictions from an OT model. However, I hope to have shown that it is
not enough to take frequency distributions and relative acceptabilities at face
value and simply construct some stochastic OT model that fits the facts. These
facts first of all need to be interpreted, and the factors that the grammar has
to account for must be sorted out from those about which grammar should
have nothing to say. This task, to my mind, is more complicated than the
picture that a simplistic application of (not only) stochastic OT might draw.
14
14.1 Introduction
This paper provides an overview of linear optimality theory (LOT), a variant
of optimality theory initially proposed by Keller (2000b) to model gradient
linguistic data. It is important to note that LOT is a framework designed to
account for gradient judgement data; as has been argued elsewhere in this
volume (Crocker and Keller), gradience in processing data and in corpus data
has different properties from gradience in judgement data, and it is unlikely
that the two types of gradience can be accounted for in a single, unified
framework.
The remainder of the paper is structured as follows. In Section 14.2, we
summarize the empirical properties of gradient judgements that motivate the
design of LOT. Section 14.3 defines the components of an LOT grammar and
introduces the LOT notions of constraint competition and optimality. Based
on this, ranking argumentation is defined, an algorithm for computing
constraint ranks is introduced, and a measure of model fit in LOT is defined.
Finally, Section 14.4 provides a comparison with other variants of OT,
particularly with standard OT and with harmonic grammar. This section also
contains a survey of more recent developments, such as probabilistic OT and
variants of OT based on maximum entropy models.
1 An anonymous reviewer points out that cumulativity could also be implemented using the
mechanism of local constraint conjunction used in standard OT, which restricts cumulativity to
particular local domains. Local conjunction has the advantage that the occurrence of cumulative
effects remains under the control of the linguist: a local conjunction must be defined explicitly.
Based on definitions (14.3) and (14.4), the harmony of a structure can now be
defined using a simple linear model:
(14.5) Harmony
Let <C,w> be a grammar signature. Then the harmony H(S) of a
candidate structure S with a violation profile v(S,Ci) is given in
(14.6).
(14.6) H(S) = -Σi w(Ci)v(S,Ci)
Equation (14.6) states that the harmony of a structure is the negation of
the weighted sum of the constraint violations that the structure incurs.
Intuitively, the harmony of a structure describes its degree of well-formedness
relative to a given set of constraints. This notion corresponds closely to the
definition of harmony assumed in standard OT (Prince and Smolensky 1997:
1607) or in harmonic grammar (Smolensky et al. 1992: 14).
The assumption is that all constraint weights are positive, that is, that
wi ≥ 0 for all i. This means that only constraint violations influence the
harmony of a structure. Constraint satisfactions will not change the
harmony of the structure (including cases where a constraint is vacuously
satisfied because it is not applicable). This assumption is in accordance with
Keller's (2000b) experimental results, in which only constraint violations
were found to affect acceptability. This will be discussed further in Section
14.4.2.
(14.7) Grammaticality
Let S1 and S2 be candidate structures in the candidate set R. Then S1
is more grammatical than S2 if H(S1) > H(S2). This can be
abbreviated as S1 > S2.2
A crucial difference between harmony and grammaticality follows from
definition (14.7). Harmony is an absolute notion that describes the overall
well-formedness of a structure. Grammaticality, on the other hand, describes
the relative ill-formedness of a structure compared with another structure.
While it is possible to compare the harmony of two structures across candidate
sets, the notion of grammaticality is only well-defined for two structures that
belong to the same candidate set (i.e. that share the same input). Therefore,
definition (14.7) (and the subsequent definition (14.8)) provides a relative
notion of well-formedness, in line with the optimality-theoretic tradition.
Based on the definition of grammaticality in (14.7), we can define the
optimal structure in a candidate set as the one with the highest relative
grammaticality.
(14.8) Optimality
A structure Sopt is optimal in a candidate set R if Sopt > S for every
S ∈ R.
A notion of constraint rank can readily be defined in LOT based on the
relative weight of two constraints (see also the terminological note on ranks
versus weights in Section 14.3.1 above):
(14.9) Constraint rank
A constraint C1 outranks a constraint C2 if w(C1) > w(C2). This can
be abbreviated as C1 ≫ C2.
In what follows, we illustrate the definitions of harmony, grammaticality,
and optimality. Consider an example grammar with the constraints C1, C2,
and C3, and the constraint weights given in Table 14.1. This table also
specifies an example candidate set S1, . . . , S4 and gives the violation
profiles for these candidates. The harmony of each of these structures can be
computed based on definition (14.5).
The structure S3 maximizes harmony, that is, it incurs the least serious
violation profile. It is therefore the optimal structure in the candidate set,
that is to say, it is more grammatical than all other candidate structures. The
structures S1 and S4 are both less grammatical than S3. S1 and S4 receive
the same harmony scores, but for different reasons; S4 because it incurs a
2 This usage differs from standard OT usage, where harmonic ordering is denoted by '≻', not '>'.
        C1   C2   C3
w(C)     4    3    1    H(S)
S1            *    *     -4
S2            *    **    -5
S3                 *     -1
S4       *               -4
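The harmony computation for this candidate set can be replicated in a few lines. This is a sketch using the weights and violation profiles from Table 14.1; the dictionary encoding of the profiles is ours.

```python
# Computing harmony H(S) = -sum_i w(C_i) * v(S, C_i) for the candidate
# set in Table 14.1, and picking the optimal (most harmonic) structure.

weights = {"C1": 4, "C2": 3, "C3": 1}

violations = {           # violation profiles from Table 14.1
    "S1": {"C2": 1, "C3": 1},
    "S2": {"C2": 1, "C3": 2},
    "S3": {"C3": 1},
    "S4": {"C1": 1},
}

def harmony(profile):
    """Negated weighted sum of constraint violations, as in (14.6)."""
    return -sum(weights[c] * n for c, n in profile.items())

H = {s: harmony(p) for s, p in violations.items()}
optimal = max(H, key=H.get)

print(H)        # {'S1': -4, 'S2': -5, 'S3': -1, 'S4': -4}
print(optimal)  # S3
```

Note that S1 and S4 tie at -4 for different reasons: S1 through two mild violations (C2, C3), S4 through a single violation of the heavily weighted C1.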
Hence the naive extension of standard OT fails to account for the ganging-up
effects that were observed experimentally.
/input/ C3 C1 C2 Freq./Accept.
S1 * 3
S2 * * 2
S3 * 1
Source: Keller and Asudeh (2002)
/input/ C1 C2 Freq./Accept.
S1 * 4
S2 ** 3
S3 *** 2
S4 * 1
Source: Keller and Asudeh (2002)
3 More precisely, it is the POT probability of winning, averaged over all pairwise comparisons, but
this difference is irrelevant here.
Linear Optimality Theory as a Model 285
/input/ C3 C1 C2 Freq./Accept.
S1 * 2
S2 * * 1
S3 * 1
which is state of the art in computational linguistics (e.g. Abney 1997; Berger
et al. 1996). In maximum entropy OT (MOT) as formulated by Jäger (2004),
the probability of a candidate structure (i.e. of an input–output pair (i,o)) is
defined as:
(14.17) PR(o|i) = (1/ZR(i)) exp(-Σj rj cj(i,o))
Here, rj denotes the numeric rank of constraint j, while R denotes the ranking
vector, that is, the set of ranks of all constraints. The function cj(i,o) returns
the number of violations of constraint j incurred by the input–output pair (i,o).
ZR(i) is a normalization factor.
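As a rough illustration of the MOT probability in (14.17), the normalized distribution over a candidate set can be computed as follows. The ranks and violation counts here are invented for the sketch; they are not data from the text.

```python
import math

# Sketch of the MOT probability: P_R(o|i) is proportional to
# exp(-sum_j r_j * c_j(i,o)), normalized by Z_R(i), the sum of these
# scores over all candidate outputs for the input. Ranks and violation
# counts below are invented.

ranks = [3.0, 1.5]          # numeric ranks r_j of two constraints
candidates = {              # c_j(i,o): violation counts per candidate output
    "o1": [0, 1],
    "o2": [1, 0],
    "o3": [1, 1],
}

scores = {o: math.exp(-sum(r * c for r, c in zip(ranks, cs)))
          for o, cs in candidates.items()}
Z = sum(scores.values())    # normalization factor Z_R(i)
probs = {o: s / Z for o, s in scores.items()}

print(probs)                # probabilities sum to 1; o1 is most probable
```

Computing Z requires enumerating every candidate output, which is exactly the step that the text notes is non-trivial in the general case.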
The model defined in (14.17) can be regarded as an extension of LOT as
introduced in Section 14.3.1. It is standard practice in the literature on
gradient grammaticality to model not raw acceptability scores, but log-
transformed, normalized acceptability data (Keller 2000b). This can be
made explicit by exponentiating the harmony in (14.6) (and renaming the
variable i to j). The resulting formula is then equivalent to (14.18).
(14.18) H(S) = exp(-Σj w(Cj)v(S,Cj))
A comparison of (14.17) and (14.18) shows that the two models have a parallel
structure: w(Cj) = rj and v(S,Cj) = cj(i,o) (the input–output structure of the
candidates is implicit in (14.18)). Both models are instances of a more general
family of models referred to as log-linear models. There is, however, a crucial
difference between the MOT definition in (14.17) and the LOT definition in
(14.18). Equation (14.18) does not include the normalization factor ZR(i),
which means that (14.18) does not express a valid probability distribution.
The normalization factor is not trivial to compute, as it involves summing
over all possible output forms o (see Goldwater and Johnson 2003, and Jäger
2004, for details). This is the reason why LOT can assume a simple learning
algorithm based on least-squares estimation, while MOT has to rely on
learning algorithms for maximum entropy models, such as generalized
iterative scaling or improved iterative scaling (Berger et al. 1996). Another
crucial difference between MOT and LOT (pointed out by Goldwater and Johnson
2003) is that MOT is designed to be trained on corpus data, while LOT is
designed to be trained on acceptability judgement data.
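The least-squares estimation mentioned above can be sketched for a toy two-constraint grammar. The violation profiles and harmony scores below are invented so that an exact solution exists; this is not Keller's actual training data or algorithm implementation.

```python
# Minimal sketch of LOT-style least-squares weight estimation: given
# violation profiles v_sj and observed (log-transformed) harmony scores
# h_s, find weights w_j minimizing sum_s (h_s + sum_j w_j * v_sj)^2.
# Two constraints, solved via the 2x2 normal equations; data invented.

V = [(1, 0), (0, 1), (1, 1), (2, 1)]   # violation profiles v_sj
h = [-2.0, -1.0, -3.0, -5.0]           # observed harmony scores h_s

# Normal equations A w = b for the model h_s = -sum_j w_j * v_sj.
A = [[sum(v[i] * v[j] for v in V) for j in range(2)] for i in range(2)]
b = [-sum(h[s] * V[s][i] for s in range(len(V))) for i in range(2)]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
w1 = (b[0] * A[1][1] - b[1] * A[0][1]) / det
w2 = (A[0][0] * b[1] - A[1][0] * b[0]) / det

print(w1, w2)   # recovers w = (2.0, 1.0), the weights that generated h
```

Because no normalization over candidate outputs is involved, this reduces to ordinary linear regression, which is what makes LOT's training so much simpler than the iterative scaling procedures required for MOT.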
14.5 Conclusions
This paper introduced linear optimality theory (LOT) as a model of gradient
grammaticality. Although this model borrows central concepts (such as
1 We want to thank Caroline Féry, Heiner Drenhaus, Matthias Schlesewsky, Ralf Vogel, Thomas
Weskott, and an anonymous referee for helpful comments and critical discussion, and Jutta Boethke,
Jörg Didakowski, Ewa Trutkowski, Julia Vogel, Nikolaus Werner, Nora Winter, and Katrin Wrede for
technical support. The research reported here was supported by DFG-grant FOR375.
292 Gradience in Wh-Movement Constructions
Fanselow et al. (1999) interpret this result in terms of memory cost: a fronted
object wh-phrase must be stored in memory during the parsing process up to
the point where an object position can be postulated. In an SOV language
such as German, this means that the object must be kept in memory until the
subject has been recognized. This account is in line with recent ERP research.
King and Kutas (1995) found a sustained anterior negativity for the processing
of English object relative clauses (as compared to subject relative clauses),
which Müller et al. (1997) relate to memory. Felser et al. (2003), Fiebach et al.
(2002), and Matzke et al. (2002) found a sustained LAN in the processing of
German object-initial wh-questions and declaratives, which is again attributed
to the memory load caused by the preposed object. The claim that
object-initial structures involve a processing difficulty is thus well supported.
It is natural to hold this processing difficulty responsible for the reduced
acceptability of sentences such as (15.2b).
Subjacency violations as in (15.4) constitute another domain in which
processing difficulty reduces acceptability. Kluender and Kutas (1993) argue
that syntactic islands arise at 'processing bottlenecks', when the processing
demands of a long-distance dependency at the clause boundary add to the
processing demands of who or whether. This processing problem is reflected in
dramatically reduced acceptability.
15.3.3 Experiment 1a
15.3.3.1 Materials Experimental items had the form exemplified in (15.9).
In a sentence with a pronominal subject preceded by the verb and followed by
an adverb, an object NP was split such that the left part (LP) preceded the
verb, while the right part (RP) was clause-final. The LP could consist of a
single noun (simple) (15.9a, 15.9b, 15.9e, 15.9f), or of a noun preceded by an
adjective (like alten, old) agreeing with the noun (15.9c, 15.9d, 15.9g, 15.9h).
The LP and RP appeared in either singular (sg) or plural (pl) form (see
below).
(15.9) a. Professor kennt sie leider keinen simple_sg_sg
professor.sg knows she unfortunately no.sg
b. Professoren kennt sie leider keine simple_pl_pl
professor.pl knows she unfortunately no.pl
c. Alten Professor kennt sie leider keinen complex_sg_sg
old.sg professor.sg knows she unfortunately no.sg
d. Alte Professoren kennt sie leider keine complex_pl_pl
old.pl professor.pl knows she unfortunately no.pl
e. Professor kennt sie leider keine simple_sg_pl
professor.sg knows she unfortunately no.pl
f. Professoren kennt sie leider keinen simple_pl_sg
professor.pl knows she unfortunately no.sg
Effects of Processing Difficulty on Judgements 297
[Figure 15.1. Mean acceptability ratings per condition in experiment 1a, by LP complexity (simple vs. complex) and number pattern (SG/PL, match vs. mismatch); ratings on a 1–6 scale, with two conditions around 4.9 and the remaining conditions between 2.56 and 3.33.]
15.3.4 Experiment 1b
15.3.4.1 Materials The six conditions of experiment 1b are exemplified in
(15.10a) to (15.10f). All nouns were ambiguous with respect to number. The LP
of a DNP which consisted of just a noun was consequently number-ambiguous
as well (15.10a, 15.10b). The addition of a number-marked adjective
disambiguated the LP towards a singular (15.10c, 15.10d) or plural (15.10e,
15.10f) interpretation. The RP of the DNP was either singular (15.10a, 15.10c,
15.10e) or plural (15.10b, 15.10d, 15.10f).
(15.10) a. Koffer hatte er leider keinen amb_sg
suitcase.amb had he unfortunately no.sg
b. Koffer hatte er leider keine amb_pl
suitcase.amb had he unfortunately no.pl
c. Roten Koffer hatte er leider keinen sg_sg
red.sg suitcase had he unfortunately no.sg
d. Roten Koffer hatte er leider keine sg_pl
red.sg suitcase had he unfortunately no.pl
e. Rote Koffer hatte er leider keinen pl_sg
red.pl suitcase had he unfortunately no.sg
f. Rote Koffer hatte er leider keine pl_pl
red.pl suitcase had he unfortunately no.pl
15.3.4.2 Method There were four items per condition. Experiment 1b was
included in the same questionnaire as experiment 1a (see above). Each
participant saw 12 experimental items (2 per condition), 74 unrelated and 16
related distractor items (items of experiment 1a), plus 4 fillers. A larger set of
96 sentences (16 sets of identical lexical material in each of the 6 conditions)
was created and assigned to 8 between-subjects versions in such a way that no
subject saw identical lexical material in more than one sentence.
15.3.4.3 Results Figure 15.2 shows the mean acceptability ratings per
condition for all forty subjects.
We computed an ANOVA with the factors LP NUMBER (number of the left
part: ambiguous versus singular versus plural) and RP NUMBER (number of the
right part: singular versus plural). We found a main effect of LP NUMBER
(F1(2,78) = 30.82, p < .001), which was due to the fact that LP singulars were
less acceptable than both LP plurals (F1(1,39) = 31.69, p < .001) and ambiguous
LPs (F1(1,39) = 51.36, p < .001). However, ambiguous and plural LPs did
not differ from one another (F1(1,39) = 1.51, p = .34). Furthermore, there was
a main effect of RP NUMBER (F1(1,39) = 24.67, p < .001), which was due to the
fact that RP plurals were more acceptable than RP singulars. We also found an
interaction between the two factors (F1(2,78) = 13.77, p < .001). Resolving this
interaction by the factor RP NUMBER, we found a main effect of LP NUMBER
for both RP singulars (F1(1,39) = 6.66, p < .01) and RP plurals (F1(1,39) = 33.37,
p < .001). Within RP singulars, ambiguous LPs were better than both singulars
(F1(1,39) = 14.38, p < .001) and plurals (F1(1,39) = 7.61, p < .01), whereas
within RP plurals, ambiguous LPs were better than singulars (F1(1,39) = 43.60,
p < .001), but equally acceptable as LP plurals (F < 1).
[Figure 15.2. Mean acceptability ratings per condition in experiment 1b, by LP number (ambiguous / singular / plural) and RP number (SG vs. PL); ratings on a 1–6 scale, ranging from 2.7 to 4.97.]
represented in Figure 15.2 are compatible with the view that intermediate
acceptability assessments (in our case, concerning Singularity) influence
global acceptability: the option of a plural interpretation for a locally
ambiguous noun leads to a positive local acceptability value, because
Singularity appears fulfilled. This positive local assessment contributes to the
global acceptability of DNPs even when the plural interpretation is later
abandoned because a singular right part is detected. In contrast to
grammaticality, global acceptability depends not only on the final structural
analysis, but also on the acceptability of intermediate analysis steps.
This acceptability pattern can also be found with professional linguists.
They are not immune to such 'spillover' effects increasing global acceptability,
as survey 1c revealed.
15.3.5 Survey 1c
By e-mail, we asked more than sixty linguists (nearly all syntacticians) with
German as their native language for judgements of sixteen DNP constructions,
among them the items (15.11a, 15.11b), illustrating DNPs with a singular
RP and a number-ambiguous (15.11a) or singular (15.11b) LP, constructed as in
experiment 1b.
(15.11) a. Koffer hat er keinen zweiten
suitcase.amb has he no.sg second.sg
b. Roten Koffer hat er keinen zweiten
red.sg suitcase has he no.sg second.sg
Of the remaining fourteen items, eight were DNP constructions with singular
LP and RP, one was a DNP with plural LP and RP, and four DNPs had a plural
LP but checked for different grammatical parameters. There was a further
item with an ambiguous LP. No distractor items were used, in order to
increase the likelihood of a reply. Forty-five linguists responded by sorting
the sentences into the categories '*', '?', and 'well-formed'. The results are
summarized in Figure 15.3, showing the number of participants choosing a
particular grade.
[Figure 15.3. Number of the forty-five responding linguists assigning each grade ('okay', '?', 'out') to items (15.11a), (15.11b), and (15.12).]
ambiguous item (15.11a) was different. Only fourteen of the forty-five linguists
rejected this sentence. A statistical comparison between the number of
rejections of (15.11a) versus (15.11b) revealed a significant difference
(χ² = 5.82, p < .05). The result shows that local ambiguities can improve
acceptability not only in the context of fast responses given by experimental
subjects when filling in a questionnaire. The effect is also visible in the more
considered judgements of professional syntacticians and other linguists.
When two singular NPs are coordinated by oder 'or', choosing plural
agreement for the verb is not (necessarily) semantically justified. Still, when
one searches the web, plural agreement, as in (15.14), is one of the frequent
options.
(15.14) Wer weiss, wie er oder ich in zwei Jahren denken
who knows how he or I in two years think.3pl
'Who knows how he or I will think in two years' time'
Of the first twenty-five occurrences of er oder ich 'he or I' found by Google on
the German pages of the internet for which verbal agreement could be
determined (included in the first 180 total hits for er oder ich), fourteen had
a plural verb and eleven a singular one. However, the plural is chosen less
often when the addition of entweder 'either' comes close to forcing the
exclusive interpretation of oder. Among the first twenty-five occurrences of
entweder er oder ich 'either he or I' found by Google on the German pages of
the internet for which verbal agreement could be determined (included in the
first 100 total hits for the construction), only five were constructed with a
plural verb.2
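The counts reported above (er oder ich: 14 plural vs. 11 singular; entweder er oder ich: 5 plural vs. 20 singular) can be compared with a 2x2 chi-square test. This calculation is our illustration of how such counts would be tested, not a computation the authors report.

```python
# Pearson chi-square for a 2x2 contingency table [[a, b], [c, d]],
# applied to the web counts above: rows are the two search patterns,
# columns are plural vs. singular verb agreement.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no
    continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# er oder ich: 14 plural, 11 singular; entweder er oder ich: 5 plural,
# 20 singular.
chi2 = chi_square_2x2(14, 11, 5, 20)

print(round(chi2, 2))   # 6.88, above the 3.84 cutoff for p < .05 (df = 1)
```

The drop in plural agreement under entweder thus comes out as significant on these counts, consistent with the text's claim that forcing the exclusive reading of oder disfavours the plural.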
When one looks more closely at the data extracted from the web showing
singular agreement, an interesting pattern emerges. Of the thirty-one
examples, twenty-two involved a verb which was morphologically ambiguous
between a 1st and a 3rd person interpretation (this is true of past tense verbs,
modal verbs, and a few lexical exceptions), and only nine bore an unambiguous
person feature (present tense verbs apart from the exceptions mentioned),
with a strong bias for 3rd person (7 of 9). This is in line with
intuitions. Neither of the two verbal forms of schlafen 'sleep' sounds really
acceptable in the present tense, in which the forms of the 1st (15.15c) and 3rd
person (15.15a) are distinguished morphologically. Examples (15.15b) and
(15.15d) involve verb forms that are morphologically ambiguous, and sound
much better.
(15.15) a. er oder ich schläft ER, UNA
he or I sleep.3sg
b. er oder ich schlief ER, AMB
he or I slept.amb
c. er oder ich schlafe
he or I sleep.1sg
d. er oder ich darf schlafen
he or I may.amb sleep
2 The websearch was done on 26 January 2005 at 7pm GMT.
[Figure 15.4. Mean acceptability ratings in experiment 2a (1–7 scale) by pronoun order (ER vs. ICH first) and verb form: UNA 3.96/4.07, AMB 4.51/4.41.]
ambiguous structures (15.15b, 15.16b) (AMB) were rated better (4.5) than
unambiguous ones (15.15a, 15.16a) (UNA) (4.0) (F1(1,47) = 6.79, p < .05;
F2(1,15) = 50.74, p < .001). There was no interaction between the two factors
(F1(1,47) = 1.1, p = .30; F2 < 1).
15.3.7.4 Discussion The order in which er and ich appeared in the
experimental items had no effect on acceptability. In this respect, experiment
2a is comparable to the results of Timmermans et al. (2004) involving and-
coordination. The morphological ambiguity of the verb exerted an effect on
acceptability, in the expected direction: whenever the morphological shape of
the verb fits the person specification of both pronouns because of the verbal
ambiguity, acceptability increases. This ambiguity effect is in line with our
expectations. The acceptability of a sentence depends on whether the verb
agrees with the subject. In the unambiguous conditions, the verb visibly
disagrees with one of the two pronouns. In (15.15b, 15.16b), however, the
ambiguous verb appears to meet the requirements of both pronouns (but
only relative to different interpretations of the verb), which makes a local
perception of acceptability possible. The computations for pairwise agreement
between the verb and the two pronouns yield positive results, which has a
positive effect on global acceptability, even though the two pairwise agreement
computations cannot be integrated, since they work with different
interpretations of the ambiguous verb.
One might object that the difference between the ambiguous and the
unambiguous condition might also be explained in terms of grammatical
well-formedness. The ambiguous verb form might have an underspecified
grammatical representation, viz. [singular, –2nd person], which is
grammatically compatible with both a 1st and a 3rd person subject. In contrast, the
features of the unambiguous 3rd person form clash with those of the 1st
person pronoun. Thus, the higher acceptability of the ambiguous forms
might only reflect the absence of a feature clash.
Such an account would leave it open, however, why the sentences with
ambiguous verb forms are not rated as fully grammatical, as they should be if
no feature clash were involved. We also tested the plausibility of this
alternative explanation in experiment 2b, in which we investigated the
acceptability of sentences in which er 'he' and ihr 'you (plural)' were conjoined by
or. In the regular present tense paradigm, 3rd person singular and 2nd person
plural forms fall together. There is no simple way in which this ambiguity
can be recast as underspecification.3 If underspecification rather than local
3 In a paper written after the completion of the present article, Müller (2005) offers an
underspecification analysis for (15.17a, 15.17b) within a distributed-morphology model, however.
ambiguity were responsible for the findings in experiment 2a, there should be
no benefit in acceptability in experiment 2b resulting from the use of
homophonous forms.
[Figure: mean acceptability ratings (1–7 scale) by pronoun order (ER, IHR) and verb ambiguity; AMB: 4.64 (ER), 4.52 (IHR); UNA: 3.42 (ER), 3.33 (IHR)]
15.3.10.3 Results and discussion As figure 15.6 shows, fronted verb phrases
that include a direct object are more acceptable than those that include a
fronted subject (F1(1,47) = 34.74, p < .001; F2(1,15) = 37.26, p < .001).
Contrary to our expectation, there was no main effect of ambiguity
(F1 < 1, F2 < 1) and no interaction between the two factors
(F1(1,47) = 2.69, p = .11; F2 < 1). We used the same material in a speeded
acceptability rating experiment.
[Figure 15.6: mean acceptability ratings (1–7 scale) for VPs with a fronted subject (SUB) vs. a fronted object (OBJ); SUB: 3.7/3.8; OBJ: 4.9/5.1 (AMB vs. UNA)]
[Figure: percentage of 'acceptable' judgements in the speeded task; VP with SU: 51.44/65.63; VP with OB: 94.71/96.15 (AMB vs. UNA)]
[Figure: mean acceptability ratings; Unambiguous: 3.15; Ambiguous: 4.7]
also Fanselow et al. to appear). The fairly low acceptability value for the
unambiguous wh-condition constitutes clear evidence for this. Furthermore,
as in the preceding experiments, acceptability is affected by the presence of a
local ambiguity in a significant way: if the sentence to be judged can
temporarily be analysed as involving short movement, its acceptability goes up in
quite a dramatic way.
The initial segment of (15.21a) is locally ambiguous in more than one way.
In addition to the possibility of interpreting was as a matrix clause object or
an argument of the complement clause, was also allows for a temporary
analysis as a wh-scope-marker in the German 'partial movement construction'
illustrated in (15.23).
(15.23) was denkst Du wen Maria einlädt
what think you who.acc Mary invites
‘who do you think that Mary invites?’
Therefore, we only know that the local ambiguity of (15.21a) increases its
acceptability, but we cannot decide whether this increase is really due to the
short versus long movement ambiguity.
The relative clause subexperiment avoids this problem. In the grammatical
context in which they appear in (15.22), the crucial elements die and der
can only be analysed as relative pronouns. The only ambiguity is a structural
one: long versus short movement of the relative pronoun. When the
relative pronoun is temporarily compatible with a short movement
interpretation, the structure is more acceptable than when the case of the
relative pronoun clashes with the requirements of the matrix clause
(F1(1,47) = 8.28, p < .01; F2(1,7) = 3.73, p = .10).
Both subexperiments thus show the expected influence of the local
ambiguity on global acceptability: long-distance movement structures are
Effects of Processing Difficulty on Judgements 315
[Figure: mean acceptability ratings (1–7 scale); Unambiguous: 4.17; Ambiguous: 4.76]
15.5 Conclusions
The experiments reported in this paper have shown that the presence of a
local ambiguity influences the overall acceptability of a sentence. If our
interpretation of the results is correct, there is a spillover from the acceptability
of the initial analysis of a locally ambiguous structure to the global
acceptability of the complete construction. Structures violating some constraint
may appear more acceptable if their parsing involves an intermediate
analysis in which the crucial constraint seems fulfilled. Similar effects show up
in further constructions, such as free relative clauses (see Vogel et al., in
preparation).
At the theoretical level, several issues arise. First, the factors under which
local ambiguities increase acceptability need to be identified. Secondly,
means will have to be developed by which we can distinguish mitigating
effects of local ambiguities from a situation in which the grammar accepts a
feature clash in case it has no morphological consequences. Thus, in
contrast to what we investigated in experiment 2, plural NP coordinations
such as the one in (15.24) that involve 1st and 3rd person NPs seem fully
acceptable, although they should involve a clash of person features. Perhaps
the different status of (15.24) and the structures we studied in experiment 2 is
caused by the fact that 1st and 3rd person plural verb forms are always
identical in German, whereas the syncretisms studied above are confined to
certain verb classes or certain tense forms. Similarly, the case clash for was in
4 Kaufen assigns accusative case to was, while the matrix predicate requires nominative case.
16
What’s What?
N O M I E RT E S C H I K - S H I R
1 Thanks to Gisbert Fanselow and an anonymous reviewer for their comments and to
Tova Rapoport, Sofie Raviv, and the audience of the 'Conference on Gradedness' at the University
of Potsdam for their feedback. This work is partially supported by Israel Science Foundation Grant
#1012/03.
The hierarchy in (16.4) (Lasnik and Saito 1992: 88) illustrates that the strength
of subjacency can be seen as depending on the number of barriers crossed. In
the last example of the three, subjacency is doubly violated; in the others it is
only singly violated.
3 Timmermans et al. (2004) argue that agreement involves both a syntactic procedure and a
conceptual-semantic procedure which affects person agreement with Dutch and German coordinated
elements which differ in person features. The former, according to these authors, 'hardly ever derails'.
This is what I have in mind here. The fact that nonsyntactic procedures are also involved in certain
agreement configurations is irrelevant to the point I'm making here.
Example (16.11a) is perfect in Danish and sentences of this sort are common.
Example (16.11b) is surprisingly good in English in view of the fact that it
violates the complex NP constraint, yet it is not considered perfect by speakers
of English. In Erteschik-Shir (1982) I offer more comparative data and
illustrate that the acceptability squish in English is exactly the same as in
Danish, yet all the examples in English are judged to be somewhat worse than
their Danish counterparts.
In Erteschik-Shir (1997), I introduce a theory of IS, f(ocus)-structure
theory. F-structure is geared to interact with syntax, phonology, and semantics
and is therefore viewed as an integral part of grammar. Here I argue that
this approach predicts gradience effects of various kinds. In Section 16.2, I
map out the theory of f-structure. Section 16.3 demonstrates the f-structure
constraint on extraction. In Section 16.4, I show that the same constraint
which accounts for extraction also accounts for Superiority in English and the
concomitant gradience effects. In Section 16.5, I extend this account to explain
different superiority effects in Hebrew, German, and Danish. Section 16.6
provides a conclusion.
4 In Erteschik-Shir (1997) I assume that the output of syntax is freely annotated for topic and focus
features. In Erteschik-Shir (2003), I introduce topic-focus features at initial merge on a par with
φ-features in order to abide by the inclusiveness principle. The issue of how top/foc features are
introduced into the grammar is immaterial to the topic of this paper.
What’s What? 323
8 Sentences uttered out-of-the-blue are contextually linked to the here-and-now of the discourse. I
argue in Erteschik-Shir (1997) that such sentences are to be analysed as all-focus predicated of a 'stage'
topic. The sentence It is raining, for example, has such a stage topic and is therefore evaluated with
respect to the here-and-now. All-focus sentences also have a canonical f-structure in which the (covert)
topic precedes the focus.
9 I have included only those aspects of f-structure strictly needed for the discussion in this chapter.
See Erteschik-Shir (1997) for a more complete introduction to f-structure theory.
Let us first examine how the constraint applies to the graded extraction facts
in (16.6)–(16.8). In Erteschik-Shir and Rapoport (in preparation), we offer a
lexical analysis of verbs in terms of meaning components. We claim that verbs of
speaking have a Manner (M) meaning-component. M-components are
interpreted as adverbial modifiers, which normally attract focus. The M-component
of 'light' manner-of-speaking verbs such as say is light; that is, there is no
adverbial modification, and the verb cannot be focused. M-components can be
defocused contextually, enabling focus on the subordinate clause, which then
meets the requirement on extraction, since, according to the subject constraint,
the dependent (the trace) must be contained in the focus domain. It follows that,
out of context, only that-clauses under say allow extraction. All the other
manner-of-speaking verbs require some sort of contextualization in order for
the adverbial element of the verb to be defocused, thus allowing the subordinate
clause to be focused. Extraction is judged acceptable in these cases to the extent
that the context enables such a focus assignment.
The subject constraint, which constrains dependencies according to
whether the syntactic structure and the f-structure are aligned in a certain
way, can, in cases such as this one, generate graded results. This is not always
the case. Extraction out of sentential subjects is always ungrammatical and
cannot be contextually ameliorated. Example (16.17) gives the f-structure
assigned to such a case:
(16.17) *Who is [that John likes t]top [interesting]foc
In order to comply with the subject constraint, the subject, in this case a
sentential one, must be assigned topic. Since dependents must be in the focus
domain, they cannot be identified within topics and extraction will always be
blocked. Although the subject constraint involves f-structure, it does not
necessarily render graded results. This is because the constraint involves not
only f-structure but also the alignment of f-structure with syntactic structure.
Sentential subjects are absolute islands because they are both IS topics and
syntactic subjects.
16.4 Superiority
Superiority effects are graded, as the examples in (16.18) show:
16.5.1 Hebrew
The first observation concerning Hebrew is that although topicalization may
result in OSV, superiority violations are licensed only in the order OVS, as
shown in (16.28a) and (16.28b) from Fanselow (2004):
(16.28) a. ma kana mi?
what bought who
b. *ma mi kana?
c. mi kana ma?
Example (16.28a) is only licensed in a d-linked context in which a set of goods
is contextually specified, and (16.28c) requires a d-linked context in which a
set of buyers is contextually specified. D-linking is not employed in Hebrew
as a way to avoid double ID as it is in English.11 The fronted wh-phrase
therefore does not form an I-dependency with its trace. It follows that only
one I-dependency is at work in Hebrew multiple wh-questions, namely the
one that renders the paired reading:
(16.29) a. mi kana ma
|________|
I-dependency
10 Triple dependencies are not derivable in this framework, a desirable result since they do not
render an optimal output.
11 There is no parallel to a 'which-phrase' in Hebrew. 'eize X' is best paraphrased as 'what X'.
b. ma kana mi
|________|
I-dependency
I conclude that the subject constraint is not operative in Hebrew as it is in
English. This conclusion is also supported by the fact that adding a third
wh-phrase not only does not help, as it does in English, but is blocked in all cases:
(16.30) a. *mi kana ma eifo?
b. *ma kana mi eifo?
c. *ma mi kana eifo?
The subject constraint constrains I-dependencies to the canonical f-structure
of a particular language. In English, the canonical f-structure is one in which
syntactic structure and f-structure are aligned. The fact that the OVS and SVO
orders of (16.28a) and (16.28c) are equally good in Hebrew and that the OSV
order of (16.28b) is ruled out may mean that it is the OSV word order which is
the culprit. The difference between OSV and OVS in Hebrew is associated with
the function of the subject when the object is fronted. When it is interpreted as
a topic, it is placed preverbally, and when it is focused, it is placed after the verb.
The examples in (16.31)–(16.33) demonstrate that this is the case:
(16.31) a. et hasefer moshe kana.12
the-book Moshe bought
b. et hasefer kana moshe.
(16.32) a. *et hasefer yeled exad kana
the-book boy one bought
'Some boy bought the book.'
b. et hasefer kana yeled exad.
(16.33) a. et hasefer hu kana.
the-book he bought
b. *et hasefer kana hu
Example (16.31) shows that a definite subject which can function as both a topic
and a focus can occur both preverbally and postverbally. Example (16.32) shows
that an indefinite subject which cannot be interpreted as a topic is restricted to
the postverbal position. Example (16.33), in turn, shows that a subject pronoun,
which must be interpreted as a topic, can only occur preverbally. Examples
(16.31a) and (16.33a) also require contextualization in view of the fact that both
12 'et' marks definite objects. mi (= 'who') in object position is most naturally marked with 'et'
whereas ma (= 'what') is not. I do not have an explanation for this distinction.
the topicalized object and the preverbal subject are interpreted as topics. Since
every sentence requires a focus, this forces the verb to be focused, or else one of
the arguments must be interpreted contrastively. In either case the f-structure is
marked. To complete our investigation of the unmarked f-structure in Hebrew,
we must also examine the untopicalized cases:
(16.34) a. moshe kana et hasefer/sefer
Moshe bought the-book/(a) book
b. ?yeled exad kana et hasefer
boy one bought the book
The most natural f-structure of (16.34a) is one in which the subject is the topic
and the VP or object is focused. Example (16.34b), with the definite object
interpreted as a topic, is marked.13 The results of both orders are schematized
in (16.35):
(16.35) a. *Otop Sfoc V
b. ?Otop Stop V
c. Otop V Sfoc
d. *Otop V Stop
e. Stop V Ofoc
f. ?Sfoc V Otop
Examples (16.35c) and (16.35e) are the only unmarked cases. I conclude that the
unmarked focus structure in Hebrew is one in which the topic precedes the
verb and the focus follows it. Hebrew dependencies therefore do not depend
on the syntactic structure of the sentence, but only on the linear order of topic
and focus with respect to the verb. The (subject) constraint on I-dependencies
which applies in Hebrew is shown in (16.36):
(16.36) An I-dependency can occur only in a canonical f-structure:
Xtop V [ . . . Y . . . ]foc
The constraint in (16.36) correctly rules out (16.28b) and predicts that both (16.28a)
and (16.28c) are restricted to d-linked contexts (the initial wh-phrase must be
a topic).
15 Topicalization is licensed in subordinate clauses under a few bridge-verbs such as think. In such
cases the syntactically subordinate clause functions as a main clause.
16 Hebrew is like Danish in this respect. Since Hebrew is not a scrambling language, this is what is
predicted. Since English is not a scrambling language, English should also exhibit a difference between
main and subordinate clauses. This is not the case:
(i) Which book did which boy buy?
(ii) I don't know which book which boy bought.
The difference between main and subordinate clauses in Danish arises because only in the former is
f-structure marked by word order. English main clauses do not differ from subordinate clauses in this
way. This may explain why no difference in superiority effects between main and subordinate clauses
can be detected.
difference between Hebrew and Danish is the preference for overtly d-linked
wh-phrases.
What is common to the languages examined here is the need for d-linking
of at least one of the wh-phrases in multiple wh-questions. That is why such
questions are always sensitive to context and therefore exhibit gradience.
Variation among languages follows from three parameters: the canonical
f-structure, the availability of topicalization and scrambling processes, and
the array of wh-phrases available in a particular language. As I have shown
here, all three must be taken into account in order to predict the
cross-linguistic distribution of superiority effects.
17.1 Introduction
It appears that there is a rebellion in the making against the intuitive
judgements of syntacticians as a privileged database for the development of
syntactic theory.1 Such intuitions may be deemed inadequate because they
are not sufficiently representative of the language community at large. The
judgements are generally few and not statistically validated, and they are made
by sophisticated people who are not at all typical users of the language.
Linguists are attuned to subtle syntactic distinctions, about which they have
theories. However, our concern in this paper is with the opposite problem:
that even the most sophisticated judges may occasionally miss a theoretically
significant fact about well-formedness.
In the 1970s it was observed that in order to make a judgement of syntactic
well-formedness one must sometimes be creative. It was noted that some
sentences, such as (17.1), are perfectly acceptable in a suitable discourse
context, and completely unacceptable otherwise (e.g. as the initial sentence
of a conversation; see Morgan 1973).
(17.1) Kissinger thinks bananas.
Context: What did Nixon have for breakfast today?
Given the context, almost everyone judges sentence (17.1) to be well-formed.
But not everyone is good at thinking up such a context when none is
1 This work is a revised and extended version of Kitagawa and Fodor (2003). We are indebted to
Yuki Hirose and Erika Troseth who were primarily responsible for the running of the experiments we
report here, and to Dianne Bradley for her supervision of the data analysis. We are also grateful to the
following people for their valuable comments: Leslie Gabriele, Satoshi Tomioka, three anonymous
reviewers, and the participants of Japanese/Korean Linguistics 12, the DGfS Workshop on Empirical
Methods in Syntactic Research, and seminars at Indiana University and CUNY Graduate Center. This
work has been supported in part by RUGS Grant-in-Aid of Research from Indiana University.
Prosodic Influence on Syntactic Judgements 337
second clause, despite the nominative 'we'. If so, they might very well arrive at
the peripheral-gap analysis and judge 'we' to be morphosyntactically
incorrect on that basis. It might occur to some readers to try out another way of
reading the sentence, but it also might not. The standard orthography does
not mark the prosodic features required for gapping; they are not in the
stimulus, but must be supplied by the reader—if the reader thinks to do so.
Thus, grammaticality judgements on written sentences may make it appear
that clause-internal gapping is syntactically unacceptable, even if in fact the
only problem is a prosodic 'garden path' in reading such sentences. The way to
find out is to present them auditorily, spoken with the highly marked prosody
appropriate for clause-internal gapping, so that their syntactic status can be
judged without interference from prosodic problems. The outcome of such a
test might still be mixed, of course, if indeed not everyone accepts (this kind
of) non-peripheral gapping, but at least it would be a veridical outcome, a
proper basis for building a theory of the syntactic constraints on ellipsis.
The general hypothesis that we will defend here is that any construction
which requires a non-default prosody is vulnerable to misjudgements of
syntactic well-formedness when it is read, not heard.2 It might be thought
that reading—especially silent reading—is immune to prosodic influences,
but recent psycholinguistic findings suggest that this is not so. Sentence
parsing data for languages as diverse as Japanese and Croatian are explicable
in terms of the Implicit Prosody Hypothesis (Fodor 2002a, 2002b): 'In silent
reading, a default prosodic contour is projected onto the stimulus. Other
things being equal, the parser favors the syntactic analysis associated with the
most natural (default) prosodic contour for the construction.' In other words,
prosody is always present in the processing of language, whether by ear or
by eye. And because prosodic structure and syntactic structure are tightly
related (Selkirk 2000), prosody needs to be under the control of the linguist
who solicits syntactic judgements, not left to the imagination of those
who are giving the judgements. At least this is so for any construction that
requires a non-default prosodic contour which readers may not be inclined to
assign to it.
We illustrate the importance of this methodological moral by considering
a variety of complex wh-constructions in Japanese. In previous work we
have argued that disagreements that have arisen concerning the syntactic
2 There is a fine line between cases in which a prosodic contour helps a listener arrive at the
intended syntactic analysis, and cases in which a particular prosodic contour is obligatory for the
syntactic construction in question. The examples we discuss in this paper are of the latter kind, we
believe. But as the syntax–phonology interface continues to be explored, this is a distinction that
deserves considerably more attention.
3 The ambiguity of some complementizers will be important to the discussion below. For clarity, we
note here that both -ka and -no are ambiguous. -ka can function as a wh-scope marker, COMPWH, in
any clause, or as COMPWHETHER in subordinate clauses, and as a yes/no question marker Q in matrix
clauses. -no can be an interrogative complementizer only in matrix clauses, where it can function
either as COMPWH or as Q. For most speakers, -kadooka is unambiguously COMPWHETHER, although
a few speakers can also interpret -kadooka as a wh-scope marker (COMPWH) in a subordinate clause.
1987, among others). It has been widely, although not universally, maintained
that subjacency is not applicable to covert (LF) operations. Thus it would
clarify the universal status of locality principles in syntax if this were also the
case in Japanese. This is why it is important to determine whether sentences of
this form (i.e. structure (17.4) where the subordinate complementizer is not a
wh-scope marker) are or are not grammatical. We will argue that they are, and
that contrary judgements are due to failure to assign the necessary prosodic
contour.
Example (17.5) raises a different theoretical issue, concerning the relation
between surface position and scope at LF. Note first that the long-distance
scrambling in (17.5) is widely agreed to be grammatical, even when the
COMPSubord is -kadooka ('whether') or -ka ('whether'). Thus, subjacency
does not block scrambling (overt movement) from out of a wh-complement
in Japanese (Saito 1985).4 What needs to be resolved is the possible LF scope
interpretations of a wh-XP that has been scrambled into a higher clause. Does
it have matrix scope, or subordinate scope, or is it ambiguous between the
two? When a wh-XP has undergone overt wh-movement into a higher clause
in a language like English, matrix scope is the only possible interpretation. But
unlike overt wh-movement, long-distance scrambling in Japanese generally
forces a 'radically' reconstructed interpretation; that is, a long-distance
scrambled item is interpreted as if it had never been moved. (Saito 1989 describes
this as scrambling having been 'undone' at LF; Ueyama 1998 argues that
long-distance scrambling applies at PF.) If this holds for the scrambling of a
wh-phrase, then subordinate scope should be acceptable in the configuration
(17.5). However, there has been disagreement on this point. We will maintain
that subordinate scope is indeed syntactically and semantically acceptable,
and judgements to the contrary are most likely due to a clash between the
prosody that is required for the subordinate scope interpretation and the
default prosody that a reader might assign.
Thus, our general claim is that syntactic and semantic principles permit
both interpretations for both constructions (17.4) and (17.5) (given
appropriate complementizers), but that they must meet additional conditions on their
PFs in order to be fully acceptable (see Deguchi and Kitagawa 2002 for
details). We discuss the subjacency issue (relevant to construction (17.4)) in
Section 17.3.1, and the reconstruction issue (relevant to construction (17.5)) in
Section 17.3.2.
4 Saito argued, however, that subjacency does block overt scrambling out of a complex NP, and
out of an adjunct. This discrepancy, which Saito did not resolve, remains an open issue to be
investigated.
17.3.1 Wh-in-situ
First we illustrate Deguchi and Kitagawa's observation for wh-in-situ. In
(17.6) and (17.7) we show a pair of examples which differ with respect to
wh-scope, as determined by their selection of complementizers. In both
examples the wh-phrase dare-ni ('who-DAT') is in situ and there is no
wh-island, so there is no issue of a subjacency violation. What is of interest here is
the relation between wh-scope and the prosodic contour. (In all examples
below, bold capitals denote an emphatic accent; shading indicates the domain
of eradication; accent marks indicate lexical accents that are unreduced; and ↑
indicates a final interrogative rise.)
5 In this chapter we retain the term 'eradication' used in our earlier papers, but we would emphasize
that it is not intended to imply total erasure of lexical accents. Rather, there is a post-focal reduction of
the phonetic realization of accents, probably as a secondary effect of the general compression of the
pitch range and amplitude in the post-focal domain. See Ishihara (2003) and Kitagawa (2006), where
we substitute the term post-focal reduction. Also, we note that the utterance-final rise that is
characteristic of a matrix question overrules eradication on the sentence-final matrix COMPWH. The
prosodic descriptions given here should be construed as referring to standard (Tokyo) Japanese;
there is apparently some regional variability.
(17.6) Short-EPD
#Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-ka] ímademo sirabeteteiru.
Police-top Mary-nom that-night who-dat called-compwh even.now investigating
'The police are still investigating who Mary called that night.'
(17.7) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-to] imademo kangaeteiru-no↑?
Police-top Mary-nom that-night who-dat called-compthat even.now think-compwh
'Who do the police still think that Mary called that night?'
(17.8) Short-EPD
#Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-kadooka] ímademo sirabeteteiru-no?
Police-top Mary-nom that-night who-dat called-compwhether even.now investigating-q
a. 'Who1 is such that the police are still investigating
[whether Mary called him/her1 that night]?'
b. 'Are the police still investigating [whether Mary called
who that night]?'
6 Although this is generally true, Satoshi Tomioka notes (p.c.) that certain expressive modes (e.g. a
strong expression of surprise) can disturb the prosody-scope correlation for long-EPD. This
phenomenon needs further investigation. See also Hirotani (2003) for psycholinguistic data on the perception
of long-EPD utterances.
(17.9) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-kadooka] imademo sirabeteteiru-no
Police-top Mary-nom that-night who-dat called-compwhether even.now investigating-compwh
'Who1 is such that the police are still investigating [whether Mary called him/her1 that
night]?'
Pronounced with long-EPD, (17.9) is acceptable and has matrix scope
interpretation of the wh-phrase. Sentence (17.8) with short-EPD is not acceptable. It
may be rejected on one of two grounds, as indicated in (a) and (b). Either a
hearer attempts to interpret (17.8) with matrix wh-scope as in translation
(17.8a), and would then judge the prosody to be inappropriate; or (17.8) is
interpreted with subordinate wh-scope as in translation (17.8b), in line with the
prosody, and the subordinate complementizer -kadooka ('whether') would be
judged ungrammatical since it cannot be a wh-scope marker. As noted,
however, there are some speakers who are able to interpret -kadooka as a
wh-scope-marker, and for them (17.8) is acceptable with subordinate scope, as expected.
The fact that (17.9) is acceptable shows that matrix wh-scope is available
when the sentence is pronounced with long-EPD. Thus it is evident that
subjacency does not block scope extraction from a -kadooka clause. The
unacceptability of (17.8) therefore cannot be due to subjacency. Only an
approach that incorporates prosody can account for the contrast between
the two examples.
The confusion about the applicability of subjacency in Japanese is thus
resolved. When appropriate prosody is supplied, grammaticality judgements
show no effect of subjacency on the interpretation of wh-in-situ.7 The variable
judgements reported in the literature are explicable on the assumption that
when no prosody is explicitly provided, readers project their own prosodic
contour. A reader of (17.8)/(17.9) who happened to project long-EPD would
find the sentence acceptable on the matrix scope reading represented in (17.9).
A reader who happened to project short-EPD would in effect be judging
(17.8), and would be likely to find it unacceptable on the matrix scope reading
(and also the subordinate scope reading). This judgement could create the
impression that subjacency is at work. As we discuss below, there are reasons
why readers might be more inclined to project short-EPD than long-EPD for
7 See Deguchi and Kitagawa (2002) for evidence that long-EPD is not an exceptional prosody which
permits scope extraction out of wh-islands by overriding subjacency.
17.3.2 Long-distance-scrambled wh
The other data disagreement which needs to be resolved with respect to
Japanese wh-constructions concerns the scope interpretation of a wh-XP
that has undergone long-distance scrambling out of a subordinate clause.
This was schematized in (17.5), repeated here, and is exemplified in (17.10).
(17.5) Long-distance scrambling:
[ wh-XPi . . . [ . . . ti . . . COMPSubord ] . . . COMPMatrix ]
8 In Kitagawa and Fodor (2003) we noted two additional factors that could inhibit acceptance of
matrix scope for wh-in-situ: semantic/pragmatic complexity (the elaborate discourse presuppositions
that must be satisfied); and processing load (added cost of computing the extended dependency
between the embedded wh-phrase and a scope marker in the matrix clause). It seems quite likely that
these conspire with the default prosody to create difficulty with the matrix scope reading. However, we
will not discuss those factors here, because they cannot account for judgements on the wh-scrambling
examples that we examine in the next section.
Prosodic Influence on Syntactic Judgements 345
9 If the XP were scrambled to a position between any overt matrix items and the first overt element
of the subordinate clause, no matrix item would be trapped. The resulting sentence would be
ambiguous between local scrambling within the subordinate clause, and long-distance scrambling
into the matrix clause, so it would provide no overt evidence that the scrambled phrase is located in
the matrix clause in the surface form. In that case the example would not be useful for studying the
prosodic and/or semantic effects of long-distance scrambling. Thus: any sentence that could be used to
obtain informants’ judgements on the acceptability of subordinate clause scope for a long-distance
scrambled wh would necessarily exhibit the entrapment which we argue favours long-EPD and hence
matrix scope.
eradication proceeds from the focused wh-phrase through to the end of the
clause which is its scope. In the case of short-EPD, this will be from the
surface position of the wh-XP to the end of the subordinate clause. Thus,
the matrix topic John-wa in (17.10) will have its accent eradicated even
though it is not in the intended syntactic/semantic scope of the wh-XP.
This is represented in (17.11a).
(17.11) a. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagátteiru-no↑
           what-ACC John-TOP Mary-NOM ate-COMP-WH want.to.know-Q
           ‘Does John want to know what Mary ate?’
        b. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagatteiru-no↑
           what-ACC John-TOP Mary-NOM ate-COMP-WHETHER want.to.know-COMP-WH
           *‘What does John want to know whether Mary ate?’
           (i.e. ‘Whati is such that John wants to know whether Mary ate iti?’)
10 We noted above that semantic and processing factors may reinforce the prosodic default in the
case of wh-in-situ. However, those factors would favour subordinate scope for scrambled wh as well as
for wh-in-situ, as explained in Kitagawa and Fodor (2003). Thus, only the prosodic explanation makes
the correct prediction for both contexts: a preference for subordinate scope for wh-in-situ and a
preference for matrix scope for long-distance scrambled wh.
overt prosody, hearers (and even speakers!) sometimes complain that they can
accept the subordinate scope interpretation only by somehow disregarding or
‘marginalizing’ the intervening matrix constituent. This is interesting. It can
explain why subordinate scope is not always felt to be fully acceptable even
with overt short-EPD, and it is exactly as could be expected given that the
intrusion of this matrix constituent in the subordinate clause eradication
domain is what disfavours the otherwise preferred short-EPD.
The general conclusion is clear: when overt prosody is present, listeners
can be expected to favour the syntactic structure congruent with the prosody
and judge the sentence accordingly. When no overt prosody is in the input,
as in reading, perceivers make their judgements on the basis of whatever
prosodic contour they have projected. This is a function of various principles,
some concerning the prosody–syntax interface, others motivated by purely
phonological concerns (e.g. rhythmicity) which in principle should be irrele-
vant to syntax. However, a reader may proceed as if the mentally projected
prosody had been part of the input, and then judge the syntactic well-
formedness of the sentence on that basis. Although some astute informants
may seek out alternative analyses, there is no compelling reason for them
to do so, especially as the request for an acceptability judgement implies—
contrary to the expectation in normal sentence processing for comprehen-
sion—that failure to Wnd an acceptable analysis is a legitimate possibility.
Therefore, any sentence (or interpretation of an ambiguous sentence) whose
required prosodic contour does not conform to general prosodic patterns
in the language is in danger of being judged ungrammatical in
reading, although perceived as grammatical if spoken with appropriate
prosody.
11 We set aside here studies whose primary focus is judgements by second language learners; see
Murphy (1997) and references there. Murphy found for English and French sentences that subjects
(both native and L2 speakers) were less accurate with auditory presentation than with visual presen-
tation, especially with regard to rejecting subjacency violations and other ungrammatical examples (cf.
Hill’s observation noted below).
12 Schütze also mentions an early and perhaps not entirely serious exploration by Hill (1961) of ten
example sentences, eight of them from Chomsky (1957), judged by ten informants. For instance, the
sentence I saw a fragile of was accepted in written form by only three of the ten informants. In spoken
form, with primary stress and sentence-final intonation on the word of, it was subsequently accepted
by three of the seven who had previously rejected it. Some comments (e.g. ‘What’s an of?’) revealed
that accepters had construed of as a noun. Hill concluded, as we have done, that ‘intonation-pattern
influences acceptance or rejection.’ However, his main concern, unlike ours, was over-acceptance of
spoken examples. He warned that ‘If the intonation is right, at least enough normal speakers will react
to the sentence as grammatical though of unknown meaning, to prevent convergent rejection.’ Our
experimental data (see below) also reveal some tendency to over-accept items that are ungrammatical
but pronounced in a plausible-sounding fashion, but we show that this can be minimized by
simultaneous visual and auditory presentation.
purposes, no exact comparison can be made of the results for the reading
condition and the listening condition, because there were other differences of
method between the two experiments.
In the listening test, the sentences were spoken with appropriate prosody:
long-EPD for wh-in-situ examples such as (17.12), and short-EPD for fronted-
13 An extra declarative clause was added in the sentences of type (17.13), structurally intermediate
between the lowest clause, in which the wh-XP originated, and the highest clause, into which it was
scrambled. The purpose of this was to prevent readers, at the point at which they encounter the -ka,
from easily scanning the remainder of the sentence to see that no other possible scope marker is
present. If they had at that point detected the absence of a scope marker in the matrix clause, they
would inevitably have adopted a subordinate scope reading, and that would have inactivated any
possible preference for the long-EPD/matrix scope reading.
wh examples such as (17.13). In the reading test, prosody was not mentioned,
so readers were free to assign either prosody (or none at all). Our hypothesis
that short-EPD is the default for wh-in-situ examples, and long-EPD the
default for fronted-wh examples, predicted that the experimental sentences
would be rejected more often when presented in written form than when
spoken with appropriate contours.
We conducted a comparable experiment in English in order to provide
some benchmarks for the Japanese study. The English experiment was con-
ducted by Fodor with Erika Troseth, Yukiko Koizumi, and Eva Fernández. The
target materials were of two types. One was ‘not-because’ sentences such as
(17.14), with potentially ambiguous scope that was disambiguated by a nega-
tive polarity item in the because-clause, which would be ungrammatical unless
that clause were within the scope of the negation.
(17.14) Marvin didn’t leave the meeting early because he was mad at anyone.
The second type of target sentence consisted of a complex NP (a head noun
modified by a PP) and a relative clause (RC) as in (17.15), which was poten-
tially ambiguous between high attachment to the head noun or low attach-
ment to the noun inside the PP, but was disambiguated by number agreement
toward high attachment.
(17.15) Martha called the assistant of the surgeons who was monitoring the
progress of the baby.
For both of these constructions, as in the Japanese experiment, the disam-
biguation was toward an interpretation which has been claimed to require a
non-default prosody.
For the not-because construction, Frazier and Clifton (1996) obtained
experimental results for written materials indicating that the preferred inter-
pretation has narrow-scope negation, that is the because-clause is outside the
scope of the negation. (Unlike (17.14), their sentences had no negative polarity
item forcing the wide-scope negation reading.) That the dispreferred wide-
scope-negation reading needs a special intonation contour is noted by
Hirschberg and Avesani (2000). In their study, subjects read aloud context-
ually disambiguated examples, and the recordings were acoustically analysed.
The finding was that the intonation contours for the (preferred) narrow-
scope-negation ‘usually exhibit major or minor prosodic phrase boundaries
before the subordinate conjunction’ and ‘usually were falling contours’. These
are typical features of multi-clause sentences without negation. By contrast,
Hirschberg and Avesani noted that the intonation contours for the (dis-
preferred) wide-scope-negation ‘rarely contain internal phrase boundaries’
17.4.2.3 Results Acceptance rates (as percentages) are shown in Figures 17.1
to 17.5. What follows is a brief review of the experimental findings. We regard
these results as preliminary, and plan to follow them up with more extensive
studies, but we believe there are already outcomes of interest here, which we
hope will encourage comparable studies on other constructions and in other
languages.
Key to figures: In all the figures below, the percentage acceptance rates for
target sentences (of each type named) are represented by horizontal stripes.
The grammatical filler sentences that are related to the targets are represented
by vertical stripes, and the ungrammatical fillers related to the targets are
represented by dots. The assorted (unrelated) fillers are shown separately at
the right.
In the Japanese data we see, as predicted, that the target sentences were
accepted more often in listening than in reading (see the central bars for wh-
in-situ and for matrix-scramble, across the two presentation conditions). The
difference is not large but it is statistically significant (p < .01). Relatively
speaking, the results are very clear: in the reading condition, the targets are
intermediate in judged acceptability between their matched grammatical
[Figures: percentage acceptance rates (0–100) for wh-in-situ, matrix-scramble, and assorted fillers, one panel per presentation condition.]
fillers and matched ungrammatical fillers, but in the listening condition they
draw significantly closer to the grammatical fillers, supporting the hypothesis
that the grammar does indeed license them, although only with a very
particular prosody.
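The significance test behind the reported p < .01 is not described in detail. As an illustration only, a comparison of acceptance counts across the two presentation conditions could be run as a 2×2 chi-square test of independence; the counts below are invented for the sketch, not the data from this experiment.

```python
# Illustrative only: a 2x2 chi-square test of independence comparing
# acceptance counts in a reading vs. a listening condition.
# The counts are invented for the sketch, not the experimental data.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table
                 accepted  rejected
    reading         a         b
    listening       c         d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # expected counts under the hypothesis of no condition effect
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 40/100 targets accepted in reading, 62/100 in listening.
stat = chi_square_2x2(40, 60, 62, 38)
print(round(stat, 2))  # 9.68; with 1 df, values above 6.63 correspond to p < .01
```

With real data one would of course test over matched counts per subject and item; the sketch shows only the shape of the comparison.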
Aspects of the Japanese data that need to be checked in continuing research
include the relatively poor rate of acceptance in reading for the matrix-
scramble filler sentences,14 and the lowered acceptance of all grammatical
14 This result may dissolve in a larger-scale study. It was due here to only one of the four grammatical
filler sentences related to the matrix-scramble experimental sentences. Unlike the other three,
which were close to 100% acceptance, this sentence was accepted at an approximately 50% level. This
[Figures: percentage acceptance rates (0–100) for not-because, RC-attachment, and assorted fillers; three panels, one per presentation condition.]
For the not-because sentences, acceptance was extremely low in the reading
condition, little better than for the matched ungrammatical fillers. In the
listening condition there was a striking increase in acceptance for these sen-
tences. It did not rise above 50 per cent, even with the appropriate prosody as
described by Hirschberg and Avesani (2000). The reason for this was apparent
in subjects’ comments on the materials after the experiment: it was often
remarked that some sentences were acceptable except for being incomplete.
In particular, the continuation rise at the end of the not-because sentences
apparently signalled that another clause should follow, to provide the real
reason for the event in question (e.g. Marvin didn’t leave the meeting early
because he was mad at anyone; he left early because he had to pick up his children
from school.)15 This sense of incompleteness clearly cannot be ascribed in the
listening condition to failure to assign a suitable prosodic contour. So it can be
regarded as a genuine syntactic/semantic verdict on these sentences. Thus this
is another case in which auditory presentation aVords a clearer view of the
syntactic/semantic status of the sentences in question. It seems that not-
because sentences with wide-scope negation stand in need of an appropriate
following discourse context—just as some other sentence types (such as (17.1)
above) stand in need of an appropriate preceding discourse context.
The RC-attachment sentences, on the other hand, showed essentially no
benefit from auditory presentation. Acceptance in the reading condition was
15 We have found that a suitable preceding context can obviate the need for the final rise, and with
it the associated expectation of a continuation. For example, a final fundamental frequency fall on at
anyone is quite natural in: I have no idea what was going on that afternoon, but there’s one thing I do
know: Marvin did not leave the meeting early because he was mad at anyone. However, it is still essential
that there be no intonation phrase boundary between the not and the because-clause.
already quite high and it did not increase significantly in the listening
condition. This could indicate that the prosodic explanation for the trend
toward low RC-attachment in English is invalid. But equally, it might show
only that this experimental protocol is not sufficiently discriminating to reveal
the advantage of the appropriate prosody in this case where the difference is
quite subtle. The familiar preference of approximately 60 per cent for low RC-
attachment with written input is for fully ambiguous examples. For sentences
in which the ambiguity is subsequently disambiguated (e.g. by number
agreement, as in the present experiment), subjects may be able to recover
quite efficiently from this mild first-pass preference once the disambiguating
information is encountered. (See Bader 1998 and Hirose 2003 for data on
prosodic influences on garden-path recovery in German and Japanese
respectively.) In short: the present results for relative clause attachment do
not contradict standard findings, although they also do not definitively
support a prosody-based preference for low RC attachment in English read-
ing. If prosody is the source of this preference, this experimental paradigm is
not the way to show it. This is an informative contrast with the case of the not-
because sentences, for which intuitive judgements are sharper and for which
the prosodic cues in spoken sentences had a significant effect in this experi-
mental setting.
An unwelcome outcome of the English study is that greater acceptance of the
target sentences in the listening condition is accompanied by greater accept-
ance of the related ungrammatical filler sentences. It is conceivable, therefore,
that these findings are of no more interest than the discovery that inattentive
subjects can be taken in by a plausible prosodic contour applied to an ungram-
matical sentence, as Hill (1961) suggested (see footnote 12). However, it seems
unlikely that this is all that underlies the considerable difference between
reading and listening for the not-because sentences. A plausible alternative
explanation is that listening imposes its own demands on perceivers, which
may offset its advantages. Although auditory input provides informants with
additional linguistically relevant information in the form of a prosodic
contour, it also requires the hearer to perceive the words accurately and hold
the sentence in working memory without the opportunity for either look-
ahead or review. Our methodology provided no independent assessment of
whether errors of perception were more frequent for auditory than for visual
input. It seems likely that this was so (although the converse might be the case
for poor readers), since the distinction between grammatical and ungrammat-
ical sentences often rested on a minor morphophonological contrast. In the
English RC sentences the disambiguation turned on a singular versus plural
verb, for example walk versus walks, which could have been misheard.
17.5 Conclusion
These experimental findings, although modest as yet, support the general
moral that we were tempted to draw on the basis of informal judgements of
written and spoken sentences. That is: acceptability judgements on written
sentences are not purely syntax-driven; they are not free of prosody even
though no prosody is present in the stimulus. This has a practical conse-
quence for the conduct of syntactic research: more widespread use needs to be
made of spoken sentences for obtaining syntactic well-formedness judge-
ments. The ideal mode of presentation, as we have seen, provides both written
and auditory versions of the sentence (e.g. in a PowerPoint file), to minimize
perceptual and memory errors while making sure that the sentence is being
judged on the basis of the prosody intended. We are sympathetic to the fact
that this methodological conclusion entails more work for syntacticians
(Cowart 1997: 64, warns that auditory presentation is ‘time-consuming to
prepare and execute’), but it is essential nonetheless, at least for sentences
whose prosody is suspected of being out of the ordinary in any way.
References
Abney, S. (1996) ‘Statistical methods and linguistics’, in J. Klavans and P. Resnik (eds),
The Balancing Act: Combining Symbolic and Statistical Approaches to Language.
Cambridge, MA: MIT Press, pp. 1–26.
—— (1997) ‘Stochastic attribute-value grammars’, Computational Linguistics 23(4):
597–618.
Albright, A. (2002) ‘Islands of reliability for regular morphology: Evidence from
Italian’, Language 78: 684–709.
—— and Hayes, B. (2002) ‘Modeling English past tense intuitions with minimal
generalization’, in M. Maxwell (ed.), Proceedings of the 2002 Workshop on Morpho-
logical Learning. Philadelphia: Association for Computational Linguistics.
—— and Hayes, B. (2003) ‘Rules vs. analogy in English past tenses: A computational/
experimental study’, Cognition 90: 119–61.
——, Andrade, A., and Hayes, B. (2001) ‘Segmental environments of Spanish diph-
thongization’, UCLA Working Papers in Linguistics 7: 117–51.
Alexopoulou, T. and Keller, F. (2003) ‘Linguistic complexity, locality and resumption’,
in Proceedings of the 22nd West Coast Conference on Formal Linguistics. Somerville,
MA: Cascadilla Press, pp. 15–28.
Altenberg, E. P. and Vago, R. M. ms. (2002) ‘The role of grammaticality judgments in
investigating first language attrition: A cross-disciplinary perspective’, paper pre-
sented at International Conference on First Language Attrition: Interdisciplinary
Perspectives on Methodological Issues. Free University, Amsterdam, 22–24 August.
Queens College, City University of New York.
Altmann, G. T. M. (1998) ‘Ambiguity in sentence processing’, Trends in Cognitive
Sciences 2: 146–52.
Andersen, T. (1991) ‘Subject and topic in Dinka’, Studies in Linguistics 15(2): 265–94.
Anderson, J. R. (1990) The Adaptive Character of Thought. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anttila, A. (2002) ‘Variation and phonological theory’, in J. Chambers, P. Trudgill, and
N. Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford:
Blackwell, pp. 206–43.
Anttila, A. (1997) ‘Deriving variation from grammar’, in F. Hinskens, R. van Hout, and
L. Wetzels (eds.), Variation, Change and Phonological Theory. Amsterdam: John
Benjamins, pp. 35–68.
Apoussidou, D. and Boersma, P. (2004) ‘Comparing two optimality-theoretic learning
algorithms for Latin stress’, WCCFL 23: 29–42.
Ariel, M. (1990) Accessing Noun Phrase Antecedents. London: Routledge.
360 References
Beckman, M., Munson, B., and Edwards, J. (2004) ‘Vocabulary growth and develop-
mental expansion of types of phonological knowledge’, LabPhon 9, pre-conference
draft.
beim Graben, P., Saddy, J. D., Schlesewsky, M., and Kurths, J. (2000) ‘Symbolic
dynamics of event-related brain potentials’, Physical Review E 62: 5518–41.
Belletti, A. (2004) ‘Aspects of the low IP area’, in L. Rizzi (ed.), The Structure of CP and
IP. The Cartography of Syntactic Structures, Volume 2. Oxford: Oxford University
Press, pp. 16–51.
——, Bennati, E., and Sorace, A. (2005) ‘Revisiting the null subject parameter from
an L2 developmental perspective’, paper presented at the XXXI Conference on
Generative Grammar, Rome, February 2005.
Bentley, D. and Eythórsson, T. (2004) ‘Auxiliary selection and the semantics of
unaccusativity’, Lingua 114: 447–71.
Benua, L. (1998) ‘Transderivational Identity’, Ph.D. thesis, University of Massachusetts.
Berent, I., Pinker, S., and Shimron, J. (1999) ‘Default nominal inflection in Hebrew:
Evidence for mental variables’, Cognition 72: 1–44.
Berg, T. (1998) Linguistic Structure and Change: An Explanation from Language
Processing. Oxford: Clarendon Press.
Berger, A., Della Pietra, S., and Della Pietra, V. (1996) ‘A maximum entropy approach
to natural language processing’, Computational Linguistics 22(1): 39–71.
Berko, J. (1958) ‘The child’s learning of English morphology’, Word 14: 150–77.
Bever, T. G. (1970) ‘The cognitive basis for linguistic structures’, in J. R. Hayes (ed.),
Cognition and the Development of Language. New York: John Wiley.
Bierwisch, M. (1968) ‘Two critical problems in accent rules’, Journal of Linguistics 4: 173–8.
—— (1988) ‘On the grammar of local prepositions’, in M. Bierwisch, W. Motsch, and
I. Zimmermann (eds), Syntax, Semantik und Lexikon (= Studia Grammatica XXIX).
Berlin: Akademie Verlag, pp. 1–65.
Bini, M. (1993) ‘La adquisición del italiano: más allá de las propiedades sintácticas del
parámetro pro-drop’, in J. Liceras (ed.), La lingüística y el análisis de los sistemas no
nativos. Ottawa: Doverhouse, pp. 126–39.
Birch, S. and Clifton, C. (1995) ‘Focus, accent, and argument structure: Effects on
language comprehension’, Language and Speech 38: 365–91.
Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford: Oxford Uni-
versity Press.
Blancquaert, E., Claessens, J., Goffin, W., and Stevens, A. (eds) (1962) Reeks Neder-
landse Dialectatlassen: Dialectatlas van Belgisch-Limburg en Zuid-Nederlands Lim-
burg, 8. Antwerpen: De Sikkel.
Blevins, J. (2004) Evolutionary Phonology: The Emergence of Sound Patterns. Cam-
bridge: Cambridge University Press.
Bod, R. (1998) Beyond Grammar: An Experience-Based Theory of Language. Stanford,
CA: Center for the Study of Language and Information.
——, Hay, J., and Jannedy, S. (2003) Probabilistic Linguistics. Cambridge, MA: MIT
Press.
Chomsky, N. (1986) Knowledge of Language. Its Nature, Origin, and Use. New York/
Westport/London: Praeger.
—— (1995) The Minimalist Program. Cambridge, MA: MIT Press.
—— and Halle, M. (1968) The Sound Pattern of English. New York: Harper and Row.
—— and Miller, G. A. (1963) ‘Introduction to the formal analysis of natural lan-
guages’, in R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of Mathematical
Psychology, volume II. New York: John Wiley.
Christiansen, M. H. and Chater, N. (1999) ‘Connectionist natural language processing:
The state of the art’, Cognitive Science 23: 417–37.
—— and Chater, N. (2001) ‘Connectionist psycholinguistics: Capturing the empirical
data’, Trends in Cognitive Sciences 5: 82–8.
Cinque, G. (1990) Types of A’-Dependencies. Cambridge, MA: MIT Press.
—— (1993) ‘A null theory of phrase and compound stress’, Linguistic Inquiry 24:
239–97.
Clahsen, H. and Felser, C. (in press) ‘Grammatical processing in language learners’, to
appear in Applied Psycholinguistics.
Clements, G. N. (1992) ‘Phonological primes: Gestures or features?’, Working Papers of
the Cornell Phonetics Laboratory 7: 1–15.
Coetzee, A. (2004) ‘What it Means to be a Loser: Non-Optimal Candidates in
Optimality Theory’, Ph.D. thesis, University of Massachusetts.
Cohn, A. (1990) ‘Phonetic and Phonological Rules of Nasalization’, Ph.D. thesis,
UCLA, distributed as UCLA Working Papers in Phonetics 76.
—— (1993) ‘Nasalization in English: Phonology or phonetics’, Phonology 10: 43–81.
—— (1998) ‘The phonetics-phonology interface revisited: Where’s phonetics?’, Texas
Linguistic Forum 41: 25–40.
—— (2003) ‘Phonetics in phonology and phonology in phonetics’, paper presented at
11th Manchester Phonology Meeting, Manchester, UK.
——, Brugman, J., Clifford, C., and Joseph, A. (2005) ‘Phonetic duration of English
homophones: An investigation of lexical frequency effects’, presented at LSA, 79th
meeting, Oakland, CA.
Coleman, J. and Pierrehumbert, J. B. (1997) ‘Stochastic phonological grammars and
acceptability’, in Computational Phonology: Third Meeting of the ACL Special Interest
Group in Computational Phonology. Somerset, NJ: Association for Computational
Linguistics, 49–56.
Coles, M. G. H. and Rugg, M. D. (1995) ‘Event-related brain potentials: An introduc-
tion’, in M. D. Rugg and M. G. H. Coles (eds), Electrophysiology of Mind: Event-
Related Brain Potentials and Cognition. Oxford, UK: Oxford University Press,
pp. 1–26.
Collins, M. (1999) ‘Head-Driven Statistical Models for Natural Language Parsing’,
Ph.D. thesis, University of Pennsylvania, Philadelphia, PA.
Connine, C. M., Ferreira, F., Jones, C., Clifton, C., and Frazier, L. (1984) ‘Verb frame
preferences: Descriptive norms’, Journal of Psycholinguistic Research 13: 307–19.
Davis, S. and Baertsch, K. (2005) ‘The diachronic link between onset clusters and
codas’, in Proceedings of the Annual Meeting of the Berkeley Linguistics Society,
BLS 31.
De Smedt, K. J. M. J. (1994) ‘Parallelism in incremental sentence generation’, in G.
Adriaens and U. Hahn (eds), Parallelism in Natural Language Processing. New
Jersey: Ablex.
De Vincenzi, M. (1991) Syntactic Parsing Strategies in Italian. Dordrecht: Kluwer
Academic Publishers.
Deguchi, M. and Kitagawa, Y. (2002) ‘Prosody and Wh-questions’, in M. Hirotani
(ed.), Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguis-
tic Society, pp. 73–92.
Dell, G. S. (1986) ‘A spreading activation theory of retrieval in sentence production’,
Psychological Review 93: 283–321.
Diesing, M. (1992) Indefinites. Cambridge, MA: MIT Press.
Dryer, M. S. (1992) ‘The Greenbergian word order correlations’, Language 68: 81–138.
Duffield, N. (2003) ‘Measures of competent gradedness’, in R. van Hout, A. Hulk,
F. Kuiken, and R. Towel (eds), The Interface between Syntax and the Lexicon in
Second Language Acquisition. Amsterdam: John Benjamins.
Duffy, S. A., Morris, R. K., and Rayner, K. (1988) ‘Lexical ambiguity and fixation times
in reading’, Journal of Memory and Language 27: 429–46.
Elman, J. L. (1991) ‘Distributed representations, simple recurrent networks and gram-
matical structure’, Machine Learning 9: 195–225.
—— (1993) ‘Learning and development in neural networks: The importance of
starting small’, Cognition 48: 71–99.
Erteschik-Shir, N. (1973) ‘On the Nature of Island Constraints’, Ph.D. thesis, MIT.
—— (1982) ‘Extractability in Danish and the pragmatic principle of dominance’, in E.
Engdahl and E. Ejerhed (eds), Readings on Unbounded Dependencies in Scandi-
navian Languages. Sweden: Umeå.
—— (1986) ‘Wh-questions and focus’, Linguistics and Philosophy 9: 117–49.
—— (1997) The Dynamics of Focus Structure. Cambridge: Cambridge University
Press.
—— (1999) ‘Focus structure theory and intonation’, Language and Speech 42(2–3):
209–27.
—— (2003) ‘The syntax, phonology and interpretation of the information structure
primitives topic and focus’, talk presented at GLOW workshop: Information struc-
ture in generative theory vs. pragmatics, The University of Lund, Sweden.
—— and Lappin, S. (1983) ‘Dominance and extraction: A reply to A. Grosu’, Theor-
etical Linguistics 10: 81–96.
—— and Rapoport, T. R. (to appear) The Atoms of Meaning: Interpreting Verb
Projections. Ben Gurion University.
Escudero, P. and Boersma, P. (2003) ‘Modelling the perceptual development of
phonological contrasts with optimality theory and the gradual learning algorithm’,
——, Schlesewsky, M., Bornkessel, I., and Friederici, A. D. (2004) ‘Distinct neural
correlates of legal and illegal word order variations in German: How can fMRI
inform cognitive models of sentence processing’, in M. Carreiras and C. Clifton, Jr.
(eds), The On-line Study of Sentence Comprehension. New York: Psychology Press,
pp. 357–70.
Filiaci, F. (2003) ‘The Acquisition of Null and Overt Subjects by English-Near-Native
Speakers of Italian’, M.Sc. thesis, University of Edinburgh.
Fischer, S. (2004) ‘Optimal binding’, Natural Language and Linguistic Theory 22:
481–526.
Flemming, E. (2001) ‘Scalar and categorical phenomena in a unified model of
phonetics and phonology’, Phonology 18: 7–44.
Fodor, J. D. (1998) ‘Learning to parse?’, Journal of Psycholinguistic Research 27: 285–319.
—— (2002a) ‘Prosodic disambiguation in silent reading’, in M. Hirotani (ed.),
Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguistic
Society, pp. 113–37.
—— (2002b) ‘Psycholinguistics cannot escape prosody’, Proceedings of the Speech
Prosody 2002 Conference, Aix-en-Provence, pp. 83–8.
—— and Frazier, L. (1978) ‘The sausage machine: A new two-stage parsing model’,
Cognition 6: 291–325.
Ford, M., Bresnan, J., and Kaplan, R. M. (1982) ‘A competence-based theory of
syntactic closure’, in J. Bresnan (ed.), The Mental Representation of Grammatical
Relations, Cambridge, MA: MIT Press, pp. 727–96.
Francis, N., Kucera, H., and Mackie, A. (1982) Frequency Analysis of English Usage:
Lexicon and Grammar. Boston: Houghton Mifflin.
Frazier, L. (1978) ‘On Comprehending Sentences: Syntactic Parsing Strategies’, Ph.D.
thesis, University of Connecticut.
—— (1987) ‘Syntactic processing: Evidence from Dutch’, Natural Language and
Linguistic Theory 5: 519–59.
—— and Clifton, C. (1996) Construal. Cambridge, MA: MIT Press.
—— and d’Arcais, G. F. (1989) ‘Filler-driven parsing: A study of gap-filling in Dutch’,
Journal of Memory and Language 28: 331–44.
—— and Rayner, K. (1987) ‘Resolution of syntactic category ambiguities: Eye move-
ments in parsing lexically ambiguous sentences’, Journal of Memory and Language
26: 505–26.
Frieda, E. M., Walley, A. C., Flege, J. E., and Sloane, M. E. (2000) ‘Adults’ perception
and production of the English vowel /i/’, Journal of Speech, Language, and Hearing
Research 43: 129–43.
Friederici, A. D. (2002) ‘Towards a neural basis of auditory sentence processing’,
Trends in Cognitive Sciences 6: 78–84.
—— and Mecklinger, A. (1996) ‘Syntactic parsing as revealed by brain responses: First
pass and second pass parsing processes’, Journal of Psycholinguistic Research 25:
157–76.
Frisch, S., Schlesewsky, M., Saddy, D., and Alpermann, A. (2001) ‘Why syntactic
ambiguity is costly after all: Reading time and ERP evidence’, presented at AMLaP
2001, Saarbrücken.
——, Schlesewsky, M., Saddy, D., and Alpermann, A. (2002) ‘The P600 as an indica-
tor of syntactic ambiguity’, Cognition 85: B83–B92.
Frisch, S. A. (1996) ‘Similarity and Frequency in Phonology’, Ph.D. thesis, North-
western University.
—— (2000) ‘Temporally organized lexical representations as phonological units’, in
M. B. Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in
Laboratory Phonology V. Cambridge: Cambridge University Press, pp. 283–98.
—— (2004) ‘Language processing and OCP effects’, in B. Hayes, R. Kirchner, and D.
Steriade (eds), Phonetically-Based Phonology. Cambridge: Cambridge University
Press, pp. 346–71.
—— and Zawaydeh, B. A. (2001) ‘The psychological reality of OCP-place in Arabic’,
Language 77: 91–106.
——, Broe, M., and Pierrehumbert, J. (1997) ‘Similarity and phonotactics in Arabic’.
MS, Indiana University and Northwestern University.
——, Large, N. R., and Pisoni, D. B. (2000) ‘Perception of wordlikeness: Effects of
segment probability and length on the processing of nonwords’, Journal of Memory
and Language 42: 481–96.
——, Large, N., Zawaydeh, B., and Pisoni, D. (2001) ‘Emergent phonotactic general-
izations’, in J. L. Bybee and P. Hopper (eds), Frequency and the Emergence of
Linguistic Structure, Amsterdam: John Benjamins, pp. 159–80.
——, Pierrehumbert, J. B., and Broe, M. B. (2004) ‘Similarity avoidance and the
OCP’, Natural Language and Linguistic Theory 22: 179–228.
Ganong, W. F. III (1980) ‘Phonetic categorization in auditory word perception’,
Journal of Experimental Psychology: Human Perception and Performance 6: 110–25.
Garnsey, S. M. (1993) ‘Event-related brain potentials in the study of language: An
introduction’, Language and Cognitive Processes 8: 337–56.
——, Pearlmutter, N. J., Myers, E. M., and Lotocky, M. A. (1997) ‘The contributions
of verb bias and plausibility to the comprehension of temporarily ambiguous
sentences’, Journal of Memory and Language 37: 58–93.
Gathercole, S. and Baddeley, A. (1993) Working Memory and Language (Essays in
Cognitive Psychology). Hove: Lawrence Erlbaum.
Gervain, J. (2002) ‘Linguistic Methodology and Microvariation in Language: The Case
of Operator-Raising in Hungarian’, unpublished M.A. thesis, Dept. of Linguistics,
University of Szeged.
Gibson, E. (1998) ‘Linguistic complexity: Locality of syntactic dependencies’, Cogni-
tion 68: 1–76.
—— and Pearlmutter, N. J. (1998) ‘Constraints on sentence comprehension’, Trends in
Cognitive Sciences 2: 262–8.
—— and Schütze, C. T. (1999) ‘Disambiguation preferences in noun phrase conjunc-
tion do not mirror corpus frequency’, Journal of Memory and Language 40: 263–79.
Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., and Hickok, G. (1996a) ‘Cross-
linguistic attachment preferences: Evidence from English and Spanish’, Cognition
59: 23–59.
——, Schütze, C. T., and Salomon, A. (1996b) ‘The Relationship between the Fre-
quency and the Processing Complexity of Linguistic Structure’, Journal of Psycho-
linguistic Research 25: 59–92.
Godfrey, J. J., Holliman, E. C., and McDaniel, J. (1992) ‘SWITCHBOARD: Telephone
speech corpus for research and development’, in IEEE International Conference on
Acoustics, Speech and Signal Processing 1992, pp. 517–20.
Goldinger, S. D. (2000) ‘The role of perceptual episodes in lexical processing’, in
A. Cutler, J. M. McQueen, and R. Zondervan (eds), Proceedings of SWAP (Spoken
Word Access Processes), Nijmegen: Max Planck Institute for Psycholinguistics,
pp. 155–9.
Goldstone, R., Medin, D., and Gentner, D. (1991) ‘Relational similarity and the non-
independence of features in similarity judgments’, Cognitive Psychology 23: 222–62.
Goldwater, S. and Johnson, M. (2003) ‘Learning OT constraint rankings using a
maximum entropy model’, in J. Spenader, A. Eriksson, and Ö. Dahl (eds), Proceed-
ings of the Stockholm Workshop on Variation within Optimality Theory, Stockholm
University, pp. 111–20.
Grabe, E. (1998) ‘Comparative Intonational Phonology: English and German’, Ph.D.
thesis, Universiteit Nijmegen.
Greenberg, J. H. (1963) ‘Some universals of grammar with particular reference to the
order of meaningful elements’, in J. H. Greenberg (ed.), Universals of Language,
Cambridge, MA: MIT Press, pp. 73–113.
—— and Jenkins, J. J. (1964) ‘Studies in the psychological correlates of the sound
system of American English’, Word 20: 157–77.
Grice, M., Baumann, S., and Benzmüller, R. (2003) ‘German intonation in autoseg-
mental phonology’, in S.-A. Jun (ed.), Prosodic Typology. Oxford: Oxford University
Press.
Grimshaw, J. (1997) ‘Projection, heads, and optimality’, Linguistic Inquiry 28:
373–422.
—— and Samek-Lodovici, V. (1998) ‘Optimal subjects and subject universals’, in P.
Barbosa, D. Fox, P. Hangstrom, M. McGinnis, and D. Pesetsky (eds), Is the Best
Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press,
pp. 193–219.
Grodzinsky, Y. and Reinhart, T. (1993) ‘The innateness of binding and coreference’,
Linguistic Inquiry 24: 69–101.
Groos, A. and Riemsdijk, H. van (1981) ‘Matching effects with free relatives:
A parameter of core grammar’, in A. Belletti, L. Brandi, and L. Rizzi (eds), Theories
of Markedness in Generative Grammar. Pisa: Scuola Normale Superiore di Pisa,
pp. 171–216.
Grosjean, F. (1980) ‘Spoken word recognition processes and the gating paradigm’,
Perception and Psychophysics 28: 267–83.
Lasnik, H. and Saito, M. (1992) Move α: Conditions on Its Application and Output.
Cambridge, MA: MIT Press.
Lavoie, L. (1996) ‘Lexical frequency effects on the duration of schwa-resonant sequences
in American English’, poster presented at LabPhon 5, Chicago, IL, June 1996.
—— (2002) ‘Some influences on the realization of for and four in American English’,
Journal of the International Phonetic Association 32: 175–202.
Legendre, G. (in press) ‘Optimizing auxiliary selection in Romance’, to appear in
R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary Selection. Amsterdam:
John Benjamins.
—— and Sorace, A. (2003) ‘Auxiliaires et intransitivité en français et dans les
langues romanes’, in D. Godard (ed.), Les langues romanes; problèmes de la phrase
simple. Paris: Editions du CNRS, pp. 185–234.
——, Miyata, Y., and Smolensky, P. (1990a) ‘Harmonic grammar—A formal multi-
level connectionist theory of linguistic well-formedness: Theoretical foundations’,
in Proceedings of the Twelfth Annual Conference of the Cognitive Science Society.
Cambridge, MA: Lawrence Erlbaum, pp. 388–95.
——, Miyata, Y., and Smolensky, P. (1990b) ‘Harmonic grammar—A formal multi-
level connectionist theory of linguistic well-formedness: An application’, in
Proceedings of the Twelfth Annual Conference of the Cognitive Science Society.
Cambridge, MA: Lawrence Erlbaum, pp. 884–91.
——, Miyata, Y., and Smolensky, P. (1991) ‘Unifying syntactic and semantic
approaches to unaccusativity: A connectionist approach’, Proceedings of the 17th
Annual Meeting of the Berkeley Linguistics Society. Berkeley: Berkeley Linguistics
Society, pp. 156–67.
Lehiste, I. (1973) ‘Phonetic disambiguation of syntactic ambiguity’, Glossa 7: 107–22.
Lehmann, W. P. (1978) ‘The great underlying ground-plans’, in W. P. Lehmann (ed.),
Syntactic Typology: Studies in the Phenomenology of Language. Austin: University of
Texas Press, pp. 3–55.
Leonini, C. and Belletti, A. (2004) ‘Subject inversion in L2 Italian’, in S. Foster-Cohen,
M. Sharwood Smith, A. Sorace, and M. Ota (eds), EUROSLA Yearbook 4: 95–118.
Levin, B. and Rappaport Hovav, M. (1995) Unaccusativity: At the Syntax–Lexical
Semantics Interface. Cambridge, MA: MIT Press.
Levin, B. and Rappaport Hovav, M. (1996) ‘From lexical semantics to argument
realization’, MS, Northwestern University and Bar-Ilan University.
Lewis, R. (1993) ‘An Architecturally-Based Theory of Human Sentence Comprehen-
sion’, Ph.D. thesis, Carnegie Mellon University.
Li, C. and Thompson, S. (1976) ‘Subject and topic: A new typology’, in C. Li (ed.),
Subject and Topic. New York: Academic Press, pp. 457–89.
Liberman, M. and Pierrehumbert, J. (1984) ‘Intonational invariance under changes in
pitch range and length’, in M. Aronoff and R. T. Oehrle (eds), Language Sound
Structure. Cambridge, MA: MIT Press, pp. 157–233.
Liceras, J., Valenzuela, E., and Díaz, L. (1999) ‘L1/L2 Spanish Grammars and the
Pragmatic Deficit Hypothesis’, Second Language Research 15: 161–90.
Lindblom, B. (1990) ‘Models of phonetic variation and selection’, PERILUS 11: 65–100.
Lodge, M. (1981) Magnitude Scaling: Quantitative Measurement of Opinions. Beverly
Hills, CA: Sage Publications.
Lohse, B., Hawkins, J. A., and Wasow, T. (2004) ‘Domain minimization in English
verb-particle constructions’, Language 80: 238–61.
Lovrič, N. (2003) ‘Implicit Prosody in Silent Reading: Relative Clause Attachment in
Croatian’, Ph.D. thesis, CUNY Graduate Center.
Luce, P. A. and Large, N. (2001) ‘Phonotactics, neighborhood density, and entropy in
spoken word recognition’, Language and Cognitive Processes 16: 565–81.
—— and Pisoni, D. B. (1998) ‘Recognizing spoken words: The neighborhood activa-
tion model’, Ear and Hearing 19: 1–36.
MacBride, A. (2004) ‘A Constraint-Based Approach to Morphology’, Ph.D. thesis,
UCLA, http://www.linguistics.ucla.edu/faciliti/diss.htm.
MacDonald, M. C. (1993) ‘The interaction of lexical and syntactic ambiguity’, Journal
of Memory and Language 32: 692–715.
—— (1994) ‘Probabilistic constraints and syntactic ambiguity resolution’, Language
and Cognitive Processes 9: 157–201.
——, Pearlmutter, N. J., and Seidenberg, M. S. (1994) ‘Lexical nature of syntactic
ambiguity resolution’, Psychological Review 101: 676–703.
Manning, C. D. (2003) ‘Probabilistic syntax’, in R. Bod, J. Hay, and S. Jannedy (eds),
Probabilistic Linguistics. Cambridge, MA: MIT Press, pp. 289–341.
—— and Schütze, H. (1999) Foundations of Statistical Natural Language Processing.
Cambridge, MA: MIT Press.
Marantz, A. (2000) Class notes, MIT, Cambridge, MA.
Marcus, M. P. (1980) A Theory of Syntactic Recognition for Natural Language.
Cambridge, MA: MIT Press.
Marks, L. E. (1965) ‘Psychological Investigations of Semi-Grammaticalness in
English’, Ph.D. thesis, Harvard University.
—— (1967) ‘Judgments of grammaticalness of some English sentences and semi-
sentences’, American Journal of Psychology 80: 196–204.
Marslen-Wilson, W. (1987) ‘Functional parallelism in spoken word-recognition’, in
U. Frauenfelder and L. Tyler (eds), Spoken Word Recognition. Cambridge, MA: MIT
Press, pp. 71–102.
Mateu, J. (2003) ‘Digitizing the syntax–semantics interface. The case of aux-selection
in Italian and French’, MS, Universitat Autònoma de Barcelona.
Matzke, M., Mai, H., Nager, W., Rüsseler, J., and Münte, T. F. (2002) ‘The cost of freedom:
An ERP-study of non-canonical sentences’, Clinical Neurophysiology 113: 844–52.
Maynell, L. A. (1999) ‘Effect of pitch accent placement on resolving relative clause
ambiguity in English’, poster presented at the 12th Annual CUNY Conference on
Human Sentence Processing, New York, March.
McCarthy, J. (2003) ‘OT constraints are categorical’, Phonology 20: 75–138.
—— and Prince, A. (1993) ‘Generalized alignment’, in G. Booij and J. van Marle (eds),
Morphology Yearbook 1993. Dordrecht: Kluwer, pp. 79–153.
Mitchell, D. C., Cuetos, F., Corley, M. M. B., and Brysbaert, M. (1996) ‘Exposure based
models of human parsing: Evidence for the use of coarse-grained (nonlexical)
statistical records’, Journal of Psycholinguistic Research 24: 469–88.
Montrul, S. (2002) ‘Incomplete acquisition and attrition of Spanish tense/aspect
distinctions in adult bilinguals’, Bilingualism: Language and Cognition 5: 39–68.
—— (2004) ‘Subject and object expression in Spanish heritage speakers: A case of
morphosyntactic convergence’, Bilingualism: Language and Cognition 7(2): 125–42.
—— (in press) ‘Second language acquisition and first language loss in adult early
bilinguals: Exploring some differences and similarities’, to appear in Second Lan-
guage Research.
Moreton, E. (2002) ‘Structural constraints in the perception of English stop-sonorant
clusters’, Cognition 84: 55–71.
Morgan, J. L. (1973) ‘Sentence fragments and the notion ‘‘Sentence’’ ’, in B. B. Kachru,
R. B. Lees, Y. Malkiel, A. Pietrangeli, and S. Saporta (eds), Issues in Linguistics:
Papers in Honor of Henry and Renee Kahane. Urbana, IL: University of Illinois Press.
Müller, G. (1999) ‘Optimality, markedness, and word order in German’, Linguistics
37(5): 777–818.
—— (2005) ‘Subanalyse verbaler Flexionsmarker’, MS, Universität Leipzig.
Müller, H. M., King, J. W., and Kutas, M. (1997) ‘Event-related potentials elicited by
spoken relative clauses’, Cognitive Brain Research 5: 193–203.
Müller, N. and Hulk, A. (2001) ‘Crosslinguistic influence in bilingual language acqui-
sition: Italian and French as recipient languages’, Bilingualism: Language and
Cognition 4: 1–22.
Müller, S. (2004) ‘Complex NPs, subjacency, and extraposition’, Snippets, Issue 8.
Munson, B. (2001) ‘Phonological pattern frequency and speech production in adults
and children’, Journal of Speech, Language, and Hearing Research 44: 778–92.
Murphy, V. A. (1997) ‘The effect of modality on a grammaticality judgment task’,
Second Language Research 13: 34–65.
Muysken, P. (2000) Bilingual Speech. A Typology of Code-Mixing. Cambridge:
Cambridge University Press.
Nagy, N. and Reynolds, B. (1997) ‘Optimality theory and variable word-final deletion
in Faetar’, Language Variation and Change 9: 37–55.
Narayanan, S. and Jurafsky, D. (1998) ‘Bayesian models of human sentence processing’,
in M. A. Gernsbacher and S. J. Derry (eds), Proceedings of the 20th Annual
Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum
Associates.
Nespor, M. and Vogel, I. (1986) Prosodic Phonology. Dordrecht: Foris.
Ney, H., Essen, U., and Kneser, R. (1994) ‘On structuring probabilistic dependencies in
stochastic language modeling’, Computer Speech and Language 8: 1–28.
Nooteboom, S. G. and Kruyt, J. G. (1987) ‘Accents, focus distribution, and the
perceived distribution of given and new information: An experiment’, Journal of
the Acoustical Society of America 82: 1512–24.
Nusbaum, H. C., Pisoni, D. B., and Davis, C. K. (1984) ‘Sizing up the Hoosier mental
lexicon: Measuring the familiarity of 20,000 words’, Research on Speech Perception,
Progress Report 10. Bloomington: Speech Research Laboratory, Indiana University,
pp. 357–76.
Ohala, J. J. (1992) ‘Alternatives to the sonority hierarchy for explaining the shape of
morphemes’, Papers from the Parasession on the Syllable. Chicago: Chicago
Linguistic Society, pp. 319–38.
Osterhout, L. and Holcomb, P. J. (1992) ‘Event-related brain potentials elicited by
syntactic anomaly’, Journal of Memory and Language 31: 785–804.
Paolillo, J. C. (1997) ‘Sinhala diglossia: Discrete or continuous variation?’, Language in
Society 26(2): 269–96.
Paradis, J. and Navarro, S. (2003) ‘Subject realization and cross-linguistic interference
in the bilingual acquisition of Spanish and English: What is the role of input?’,
Journal of Child Language 30: 1–23.
Pechmann, T., Uszkoreit, H., Engelkamp, J., and Zerbst, D. (1996) ‘Wortstellung im
deutschen Mittelfeld. Linguistische Theorie und psycholinguistische Evidenz’, in
C. Habel, S. Kanngießer, and G. Rickheit (eds), Perspektiven der Kognitiven Linguistik.
Modelle und Methoden. Opladen: Westdeutscher Verlag, pp. 257–99.
Perlmutter, D. (1978) ‘Impersonal passives and the unaccusative hypothesis’, Berkeley
Linguistics Society 4: 126–70.
Pesetsky, D. (1987) ‘Wh-in situ: Movement and unselective binding’, in E. Reuland and
A. T. Meulen (eds), The Representation of (in)Definiteness. Cambridge, MA: MIT
Press, pp. 98–129.
Peters, J. (2005) Intonatorische Variation im Deutschen. Studien zu ausgewählten
Regionalsprachen. Habilitation thesis, University of Potsdam.
Pickering, M. J., Traxler, M. J., and Crocker, M. W. (2000) ‘Ambiguity resolution in
sentence processing: Evidence against frequency-based accounts’, Journal of Mem-
ory and Language 43: 447–75.
Pierrehumbert, J. (1980) ‘The Phonology and Phonetics of English Intonation’, Ph.D.
thesis, MIT.
—— (1994) ‘Syllable structure and word structure: A study of triconsonantal clusters
in English’, in P. Keating (ed.), Phonological Structure and Phonetic Form: Papers in
Laboratory Phonology III. Cambridge: Cambridge University Press, pp. 168–88.
—— (2001) ‘Stochastic phonology’, GLOT 5 No. 6: 195–207.
—— (2002) ‘Word-specific phonetics’, in C. Gussenhoven and N. Warner (eds),
Laboratory Phonology 7. Berlin: Mouton de Gruyter, pp. 101–39.
—— (2003) ‘Probabilistic phonology: Discrimination and robustness’, in R. Bod,
J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press,
pp. 177–228.
—— and Steele, S. (1989) ‘Categories of tonal alignment in English’, Phonetica 46:
181–96.
——, Beckman, M. E. and Ladd, D. R. (2000) ‘Conceptual foundations in phonology
as a laboratory science’, in N. Burton-Roberts, P. Carr, and G. Docherty (eds),
Phonological Knowledge: Conceptual and Empirical Issues. New York: Oxford Uni-
versity Press, pp. 273–304.
Pinker, S. (1999) Words and Rules: The Ingredients of Language. New York: Basic Books.
—— and Prince, A. (1988) ‘On language and connectionism: Analysis of a parallel
distributed processing model of language acquisition’, Cognition 28: 73–193.
Pitt, M. A. and McQueen, J. M. (1998) ‘Is compensation for coarticulation mediated
by the lexicon?’, Journal of Memory and Language 39: 347–70.
Pittner, K. (1991) ‘Freie Relativsätze und die Kasushierarchie’, in E. Feldbusch (ed.),
Neue Fragen der Linguistik. Tübingen: Niemeyer, pp. 341–7.
Polinsky, M. (1995) ‘American Russian: Language loss meets language acquisition’,
in W. Browne, E. Dornish, N. Kondrashova and D. Zec (eds), Annual Workshop on
Formal Approaches to Slavic Linguistics. Ann Arbor: Michigan Slavic Publications,
pp. 371–406.
Pollard, C. and Sag, I. A. (1987) Information-Based Syntax and Semantics, Vol. 1:
Fundamentals. Stanford, CA: CSLI (CSLI Lecture Notes 13).
—— and Sag, I. A. (1992) ‘Anaphors in English and the scope of the binding theory’,
Linguistic Inquiry 23: 261–305.
Prasada, S. and Pinker, S. (1993) ‘Generalization of regular and irregular morpho-
logical patterns’, Language and Cognitive Processes 8: 1–56.
Prévost, P. and White, L. (2000) ‘Missing surface inflection or impairment in second
language acquisition? Evidence from tense and agreement’, Second Language
Research 16: 103–33.
Prince, A. and Smolensky, P. (1993) ‘Optimality theory: Constraint interaction in
generative grammar’. Technical Report TR-2, Rutgers University Center for Cogni-
tive Science. Published as Prince and Smolensky (2004).
—— and Smolensky, P. (1997) ‘Optimality: From neural networks to universal gram-
mar’, Science 275: 1604–10.
—— and Smolensky, P. (2004) Optimality Theory: Constraint Interaction in Generative
Grammar. Oxford: Blackwell.
Pritchett, B. L. (1992) Grammatical Competence and Parsing Performance. Chicago:
University of Chicago Press.
Ramscar, M. (2002) ‘The role of meaning in inflection: Why the past tense does not
require a rule’, Cognitive Psychology 45: 45–94.
Randall, J. (in press) ‘Features and linking rules: A parametric account of auxiliary
selection’, to appear in R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary
Selection. Amsterdam: John Benjamins.
Rayner, K., Carlson, M., and Frazier, L. (1983) ‘Interaction of syntax and semantics
during sentence processing: Eye movements in the analysis of semantically biased
sentences’, Journal of Verbal Learning and Verbal Behavior 22: 358–74.
Reinhart, T. (1981) ‘Pragmatics and linguistics: An analysis of sentence topics’, Philo-
sophica 27: 53–94.
Vennemann, T. (1974) ‘Theoretical word order studies: Results and problems’, Papiere
zur Linguistik 7: 5–25.
Vergnaud, J. R. and Zubizarreta, M. L. (1992) ‘The definite determiner and the
inalienable constructions in French and English’, Linguistic Inquiry 23: 592–652.
Vetter, H. J., Volovecky, J., and Howell, R. W. (1979) ‘Judgments of grammaticalness:
A partial replication and extension’, Journal of Psycholinguistic Research 8: 567–83.
Viterbi, A. J. (1967) ‘Error bounds for convolutional codes and an asymptotically
optimal decoding algorithm’, IEEE Transactions on Information Theory 13: 260–9.
Vitevitch, M. and Luce, P. (1998) ‘When words compete: Levels of processing in
perception of spoken words’, Psychological Science 9: 325–9.
——, Luce, P., Charles-Luce, J., and Kemmerer, D. (1997) ‘Phonotactics and syllable
stress: Implications for the processing of spoken nonsense words’, Language and
Speech 40: 47–62.
Vitz, P. C. and Winkler, B. S. (1973) ‘Predicting judged similarity of sound of English
words’, Journal of Verbal Learning and Verbal Behavior 12: 373–88.
Vogel, R. (2001) ‘Case conflict in German free relative constructions. An optimality
theoretic treatment’, in G. Müller and W. Sternefeld (eds), Competition in Syntax
(Studies in Generative Grammar 49). Berlin and New York: de Gruyter,
pp. 341–75.
—— (2002) ‘Free relative constructions in OT syntax’, in G. Fanselow and C. Féry
(eds), Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology,
and Phonology (Linguistische Berichte Sonderheft 11). Hamburg: Helmut Buske
Verlag, pp. 119–62.
—— (2003a) ‘Remarks on the architecture of OT syntax’, in R. Blutner and H. Zeevat
(eds), Optimality Theory and Pragmatics. Houndmills, Basingstoke, Hampshire,
England: Palgrave Macmillan, pp. 211–27.
—— (2003b) ‘Surface matters. Case conflict in free relative constructions and case
theory’, in E. Brandner and H. Zinsmeister (eds), New Perspectives on Case Theory.
Stanford: CSLI Publications, pp. 269–99.
—— (2004) ‘Correspondence in OT syntax and minimal link effects’, in A. Stepanov,
G. Fanselow, and R. Vogel (eds), Minimality Effects in Syntax. Berlin: Mouton de
Gruyter, pp. 401–41.
—— and Frisch, S. (2003) ‘The resolution of case conflicts. A pilot study’, in S. Fischer,
R. van de Vijver, and R. Vogel (eds), Experimental Studies in Linguistics 1
(Linguistics in Potsdam 21). Potsdam: Institute of Linguistics, University of
Potsdam, pp. 91–103.
—— and Zugck, M. (2003) ‘Counting markedness. A corpus investigation on German
free relative constructions’, in S. Fischer, R. van de Vijver, and R. Vogel (eds),
Experimental Studies in Linguistics 1 (Linguistics in Potsdam 21). Potsdam:
Institute of Linguistics, University of Potsdam, pp. 105–22.
——, Frisch, S., and Zugck, M. (in preparation) ‘Case matching. An empirical study.’
MS, University of Potsdam. To appear in Linguistics in Potsdam.
Warner, N., Jongman, A., Sereno, J., and Kemps, R. (2004) ‘Incomplete neutralization
and other sub-phonemic durational differences in production and perception:
Evidence from Dutch’, Journal of Phonetics 32: 251–76.
Warren, P., Grabe, E., and Nolan, F. (1995) ‘Prosody, phonology and parsing in closure
ambiguities’, Language and Cognitive Processes 10: 457–86.
Wasow, T. (1997) ‘Remarks on grammatical weight’, Language Variation and Change
9: 81–105.
—— (2002) Postverbal Behavior. Stanford, CA: CSLI Publications.
Welby, P. (2003) ‘Effects of pitch accent position, type and status on focus projection’,
Language and Speech 46: 53–8.
White, L. (2003) Second Language Acquisition and Universal Grammar. Cambridge:
Cambridge University Press.
Wickelgren, W. A. (1977) ‘Speed-accuracy tradeoff and information processing
dynamics’, Acta Psychologica 41: 67–85.
Wiltschko, M. (1998) ‘Superiority in German’, in E. Curtis, J. Lyle, and G. Webster
(eds), WCCFL 16: The Proceedings of the Sixteenth West Coast Conference on Formal
Linguistics. Stanford: CSLI, pp. 431–45.
Withgott, M. (1983) ‘Segmental Evidence for Phonological Constituents’, Ph.D. thesis,
University of Texas, Austin.
Wright, R. (1996) ‘Consonant Clusters and Cue Preservation’, Ph.D. thesis, University
of California, Los Angeles.
Wunderlich, D. (1997) ‘Cause and the structure of verbs’, Linguistic Inquiry 28: 27–68.
—— (2003) ‘Optimal case patterns: German and Icelandic compared’, in E. Brandner
and H. Zinsmeister (eds), New Perspectives on Case Theory. Stanford: CSLI Publi-
cations, pp. 329–65.
Yamashita, H. (2002) ‘Scrambled sentences in Japanese: Linguistic properties and
motivations for production’, Text 22(4): 597–633.
—— and Chang, F. (2001) ‘ ‘‘Long before short’’ preference in the production of a
head-final language’, Cognition 81: B45–B55.
Young, R. W., Morgan Sr., W., and Midgette, S. (1992) Analytical Lexicon of Navajo.
Albuquerque: University of New Mexico Press.
Zec, D. (2002) ‘On the prosodic status of function words’, Working Papers of the
Cornell Phonetics Laboratory 14: 206–48.
Zribi-Hertz, A. (1989) ‘A-type binding and narrative point of view’, Language
65: 695–727.
Zsiga, E. (2000) ‘Phonetic alignment constraints: consonant overlap and palataliza-
tion in English and Russian’, Journal of Phonetics 28: 69–102.
Zue, V. and Laferriere, M. (1979) ‘Acoustic study of medial /t, d/ in American English’,
Journal of the Acoustical Society of America 66: 1039–50.
Zuraw, K. R. (2000) ‘Patterned Exceptions in Phonology’, Ph.D. thesis, University of
California, Los Angeles.
Index of Languages

Spanish 41, 111, 115, 121, 171, 231–2, 352
Sundanese 37
Swedish 247, 333
Tagalog 72
Takelma 78–9
Telugu 78–9
Terena 78–9
Thai 78–9
Totonaco 78–9
Tsou 78–9
Turkish 115
Wichita 78–9
Index of Names

Pollard 53, 61, 222
Prasada 185, 201
Prévost 119
Prince 169, 201, 252, 274, 277, 280
Pritchett 232
Ramscar 201
Randall 110
Rapoport 325
Rappaport Hovav 108
Rayner 237, 239
Reinhart 48, 52, 55, 61–3, 67, 322–3
Reiss 43, 200
Reuland 48, 57–8, 60–4, 67
Reynolds 255
Riehl 38
Riemsdijk, van 250, 254–5, 295
Riezler 241
Ringen 70, 75
Rizzi 112, 117, 120
Röder 125, 128
Roland 230
Rooth 239, 242, 244
Rosenbach 284–5
Rosengren 131
Ross 86, 317–18
Rugg 131, 139
Rumelhart 47, 281
Russell 193
Sabourin 120
Sag 53, 61, 222
Saito 318, 340
Samek-Lodovici 112, 147–8
Samuel 171
Sankoff 6
Sapir 186–7, 192
Sarle 282
Schafer 151
Schladt 59, 66
Schlesewsky 128, 130–1, 165, 292, 294
Schmerling 147
Schriefers 130
Schütze 137, 214, 234, 239, 337, 348–9
Schütze 86–7
Schwarzschild 148
Scobbie 30, 32
Selkirk 148, 338
Sendlmeier 74
Serratrice 115, 121
Sevald 82
Skut 242
Smedt, De 213
Smith 76
Smolensky 110, 169, 252, 274, 277, 280–1, 283
Snyder 293
Sorace 106–13, 116, 118, 120–2, 270–1, 277
Speer 151
Sproat 39
Sprouse 122
Stallings 211, 225
Starke 112
Stearns 145
Stechow, von 147
Steele 149
Steriade 29, 36–9, 42, 83, 170
Sternefeld 249
Stevens 109
Stolcke 235
Stowell 64
Strawson 322
Sturt 230, 240
Suppes 14
Swerts 151
Swinney 215
Takahashi 344
Tanenhaus 230, 233, 245
Taraban 239
Tesar 169, 283
Thompson 323
Thráinsson 63
Timberlake 7
Timmermans 302, 305, 321
Tomlin 222