
Gradience in Grammar



Gradience in Grammar

Generative Perspectives

Edited by
Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky

Great Clarendon Street, Oxford ox2 6dp
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© 2006 organization and editorial matter Gisbert Fanselow,
Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
© 2006 the chapters their various authors
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 2006
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain
on acid-free paper by
Biddles Ltd. www.biddles.co.uk
ISBN 0–19–927479–7 978–0–19–927479–6
1 3 5 7 9 10 8 6 4 2
Contents

Notes on Contributors vii

1 Gradience in Grammar 1
Gisbert Fanselow, Caroline Féry, Ralf Vogel,
and Matthias Schlesewsky

Part I The Nature of Gradience 23


2 Is there Gradient Phonology? 25
Abigail C. Cohn
3 Gradedness: Interpretive Dependencies and Beyond 45
Eric Reuland
4 Linguistic and Metalinguistic Tasks in Phonology:
Methods and Findings 70
Stefan A. Frisch and Adrienne M. Stearns
5 Intermediate Syntactic Variants in a Dialect-
Standard Speech Repertoire and Relative Acceptability 85
Leonie Cornips
6 Gradedness and Optionality in Mature and
Developing Grammars 106
Antonella Sorace
7 Decomposing Gradience: Quantitative versus
Qualitative Distinctions 124
Matthias Schlesewsky, Ina Bornkessel, and Brian McElree

Part II Gradience in Phonology 143


8 Gradient Perception of Intonation 145
Caroline Féry and Ruben Stoel
9 Prototypicality Judgements as Inverted Perception 167
Paul Boersma
10 Modelling Productivity with the Gradual Learning Algorithm:
The Problem of Accidentally Exceptionless Generalizations 185
Adam Albright and Bruce Hayes

Part III Gradience in Syntax 205


11 Gradedness as Relative Efficiency in the Processing
of Syntax and Semantics 207
John A. Hawkins
12 Probabilistic Grammars as Models of Gradience in
Language Processing 227
Matthew W. Crocker and Frank Keller
13 Degraded Acceptability and Markedness in Syntax,
and the Stochastic Interpretation of Optimality Theory 246
Ralf Vogel
14 Linear Optimality Theory as a Model of Gradience
in Grammar 270
Frank Keller

Part IV Gradience in Wh-Movement Constructions 289


15 Effects of Processing Difficulty on Judgements of Acceptability 291
Gisbert Fanselow and Stefan Frisch
16 What’s What? 317
Nomi Erteschik-Shir
17 Prosodic Influence on Syntactic Judgements 336
Yoshihisa Kitagawa and Janet Dean Fodor

References 359
Index of Languages 395
Index of Subjects 397
Index of Names 400
Notes on Contributors

Adam Albright received his BA in linguistics from Cornell University in 1996 and his
Ph.D. in linguistics from UCLA in 2002. He was a Faculty Fellow at UC Santa Cruz
from 2002 to 2004, and is currently an Assistant Professor at MIT. His research
interests include phonology, morphology, and learnability, with an emphasis on
using computational modelling and experimental techniques to investigate issues in
phonological theory.
Paul Boersma is Professor of Phonetic Sciences at the University of Amsterdam. He
works on constraint-based models of bidirectional phonology and phonetics and its
acquisition and evolution. His other interests include the history of Limburgian tones
and the development of Praat, a computer program for speech analysis and manipu-
lation.
Ina Bornkessel graduated from the University of Potsdam with a ‘Diplom’ (MA-
equivalent) in general linguistics in 2001. In her Ph.D. research (completed in 2002 at
the Max Planck Institute of Cognitive Neuroscience/University of Potsdam), she
developed a neurocognitive model of real-time argument comprehension, which is
still undergoing further development and is now being tested in a number of
typologically different languages. Ina Bornkessel is currently the head of the
Independent Junior Research Group Neurotypology at the Max Planck Institute for
Human Cognitive and Brain Sciences in Leipzig.
Abigail C. Cohn is an Associate Professor in Linguistics at Cornell University, Ithaca,
NY, where her research interests include phonology, phonetics, and their interactions.
She has focused on the sound systems of a number of languages of Indonesia, as well
as English and French. She received her Ph.D. in Linguistics at UCLA.
Leonie Cornips is Senior Researcher at the Department of Language Variation of the
Meertens Institute (Royal Netherlands Academy of Arts and Sciences) and head of
the department from 1 January 2006. Her dissertation (1994, Dutch Linguistics,
University of Amsterdam) was about syntactic variation in a regional Dutch variety
(Heerlen Dutch). Recently, she was responsible for the methodology of the Syntactic
Atlas of the Dutch Dialects project. Further, she examines non-standard Dutch
varieties from both a sociolinguistic and generative perspective.
Matthew W. Crocker (Ph.D. 1992, Edinburgh) is Professor of Psycholinguistics at
Saarland University, having previously been a lecturer and research fellow at the
University of Edinburgh. His current research exploits eye-tracking methods and
computational modelling to investigate adaptive mechanisms in human language
comprehension, such as the use of prior linguistic experience and immediate visual
context.

Nomi Erteschik-Shir is Associate Professor of Linguistics at Ben-Gurion University,
Israel. Her research has concentrated on the focus–syntax–phonology interface. She is
the author of The Dynamics of Focus Structure (Cambridge University Press, 1997),
co-editor of The Syntax of Aspect (Oxford University Press, 2005) and is currently at
work on a volume on the syntax–information structure interface, to be published by
Oxford University Press.

Gisbert Fanselow received his Ph.D. in Passau (1985) and is currently Professor
of Syntax at the University of Potsdam. Current research interests include word
order, discontinuous arguments, wh-movement, and empirical methods in syntax
research.

Caroline Féry is Professor of Grammar Theory (Phonology) at the University of
Potsdam. Her research bears on different aspects of phonological theory, as well as
on interface issues in which prosody plays the main role. She received her Ph.D. in
Konstanz and her Habilitation in Tübingen. In recent years she has been involved in a
large project on information structure in typological comparison.

Janet Dean Fodor has a BA in psychology and philosophy (Oxford University 1964)
and a Ph.D. in linguistics (MIT 1970). She is Distinguished Professor of Linguistics,
Graduate Center, City University of New York and President of the Linguistic Society
of America since 1997. Her research interests are human sentence processing, espe-
cially prosodic influences and garden path reanalysis; and language learnability
theory, especially modelling syntactic parameter setting.
Stefan Frisch studied psychology, philosophy, and linguistics (at the University
of Heidelberg and the Free University Berlin). He was a research assistant at the
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, and at
the University of Potsdam, where he received his Ph.D. in 2000. He is now a research
assistant at the Day-Care Clinic of Cognitive Neurology, University of Leipzig.
Stefan A. Frisch received his Ph.D. in linguistics from Northwestern University, and
is currently Assistant Professor in the Department of Communication Sciences and
Disorders at the University of South Florida. He specializes in corpus studies of
phonotactic patterns, experiments on the acceptability of novel word stimuli, and
the phonetic study of phonological speech errors.

John A. Hawkins received his Ph.D. in linguistics from Cambridge University
in 1975. He has held permanent positions at the University of Essex, the Max-
Planck-Institute for Psycholinguistics, and the University of Southern California.
He currently holds a chair at Cambridge University. His present research interests
include language typology, processing, and grammar from a psycholinguistic
perspective.
Bruce Hayes is Professor of Linguistics at the University of California, Los Angeles.
He has published extensively in the fields of phonology and metrics, and is the author
of Metrical Stress Theory: Principles and Case Studies.
Frank Keller is a Lecturer in the School of Informatics at the University of
Edinburgh, where his research interests include sentence processing, linguistic
judgements, and cognitive modelling. Before joining the School of Informatics, he
worked as a postdoc at Saarland University, having obtained a Ph.D. in cognitive
science from Edinburgh.
Yoshihisa Kitagawa is Associate Professor of Linguistics, Indiana University and
has a Ph.D. in linguistics (University of Massachusetts at Amherst 1986). Current
research interests are: information structure and syntax; the influence of prosody,
processing, and pragmatics on grammaticality judgements; refinement of the Min-
imalist programme; interpretation of multiple-Wh-questions; economy in ellipsis;
and anaphora.
Brian McElree received his Ph.D. in psychology in 1990 from Columbia University,
where he studied psycholinguistics with Tom Bever and human memory with Barbara
Dosher. He is currently Professor of Psychology at New York University. His research
focuses on the cognitive structures and processes that enable language comprehen-
sion, as well as more general issues concerning basic mechanisms in human memory
and attention.
Eric Reuland, who received his Ph.D. at Groningen in 1979, is Professor of
Linguistics at Utrecht University. He conducts his research at the Utrecht Institute
of Linguistics OTS. He is currently programme director of the Research Master
Linguistics and European editor of Linguistic Inquiry. His research focuses on the
ways anaphoric dependencies are linguistically represented, and includes the relation
between grammatical and neuro-cognitive architecture.
Matthias Schlesewsky has a ‘Diplom’ in chemistry (MSc equivalent) from the
University of Potsdam (1992). He subsequently moved to the field of theoretical
linguistics, in which he obtained his Ph.D. in 1997 (Potsdam) with a dissertation
on the processing of morphological case in German. From 1997 to 2002, he was a
research assistant at the Department of Linguistics of the University of Potsdam,
before becoming an Assistant Professor (‘Juniorprofessor’) of Neurolinguistics at the
Philipps University Marburg in 2002. As documented by a wide range of international
publications, his research interests focus primarily on the real-time comprehension of
morphological case and arguments and its neurophysiological and neuroanatomical
correlates.
Antonella Sorace is Professor of Developmental Linguistics at the University of
Edinburgh. The common thread of her research is an interest in the developmental,
synchronic, and experimental aspects of variation in language. Topics that she has
investigated include grammatical representations and processing in early and late
bilinguals, the interfaces between syntax and other domains, the psychology of
linguistic intuitions, and the cognitive neuroscience of the bilingual brain.
Ruben Stoel received a Ph.D. from Leiden University in 2005. He is currently a
research assistant at the University of Leiden. His interests include intonation, infor-
mation structure, and the languages of Indonesia.
Ralf Vogel is Assistant Professor at the University of Bielefeld, having received a
Ph.D. from Humboldt University Berlin in 1998. His research agenda involves the
syntax of the Germanic languages, the development of Optimality Theory syntax,
both formally and empirically, including interdisciplinary interaction with computer
scientists and psychologists. The development and exploration of empirical methods
in grammar research has become a strong focus of his work.
1

Gradience in Grammar
Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky

1.1 Introductory remarks


Gradience has become a topic to which more and more linguists are turning
their attention. One can attribute this increased interest to a variety of
different factors, but a growing methodological awareness certainly plays a
role, as do the dramatically improved research possibilities in various
domains such as the handling of very large corpora. However, applying these
new methods borrowed from neighbouring disciplines such as psychology,
sociology, or computer science rarely yields the kind of clear-cut categorical
distinctions that most grammatical theories seem to work with. While the
increase of interest in gradience may be a fairly recent phenomenon, reflections
on gradience can even be found in the very first treatise on generative
grammar, viz. Chomsky (1955): Different types of violations of syntactic
principles, Chomsky observes, do not always lead to the same perception of
ill-formedness. Sentences in which the basic laws of phrase structure are not
respected (Man the saw cat a, Geese live in happily) appear much worse than
those which merely violate selectional restrictions (John frightens sincerity).
Such impressionistic judgements have been confirmed in controlled experiments
(see Marks 1965), suggesting that there are indeed degrees of
grammaticality, and Chomsky (1955) integrated an analysis of degrees of
grammaticality in his early grammatical models.
The term gradience was introduced by Bolinger (1961a, 1961b), the first
detailed work on the topic. Bolinger argued that, in contrast to what the
structuralist tradition claimed and what the structuralist methodology
implied, linguistic categories have blurred edges more often than not, and that
apparently clear-cut categories often have to be replaced by non-discrete scales.
Bolinger identified gradient phenomena in various domains of grammar,
such as semantic ambiguities, syntactic blends, and phonological entities,
including intensity and length, among others.
Such gradience in phonology can be illustrated by syllable structure
constraints. How acceptable are the syllables pleal or plill? Are they possible
English words? They violate a constraint which requires that the second
segment of a complex onset should not be identical to the coda consonant
(Davis and Baertsch 2005). On the other hand, plea, pea, lea, peal, eel, ill, and
pill are English words, and nothing prohibits CCVC syllables in English. For
this reason, pleal and plill are better syllables than the sequence tnoplt, which
violates several well-formedness restrictions on the English syllable.
With this simple example, we have already defined three degrees of acceptability
of monosyllabic words for English: perfect (and attested), not so good
(and unattested), and unacceptable (and unattested). This three-way acceptability
scale can be further refined: tnopt, however badly ill-formed, is slightly
better than tnoplt, since at least it does not violate the sonority hierarchy in
addition to containing a prohibited onset. Among attested syllables, more fine-grained
distinctions in wordlikeness exist as well: the attested words mentioned above are
better than equally attested words like Knut or sour with a more marked
syllable structure. In Knut, the onset [kn] is otherwise not attested in English,
and in sour, the sequence of a diphthong plus an [r] is marginal.
The present book aims to represent the state of the art in dealing with
gradient phenomena in the formal aspects of language, with a clearly visible
emphasis on phonological and syntactic issues. The gradient data discussed in
this book come from a wide array of phenomena: formant values, segmental
and allophonic variations, morphological productivity, and tone and stress
patterns, as well as word order variation, question formation, case matching
in free relative clauses, and binding facts. The variety of empirical domains
addressed in this book is matched by a similar richness of factors discussed in
the chapters that might be made responsible for the gradient rather than
categorical properties of the constructions in question: frequency of occurrence
figures prominently, in particular in the phonological contributions,
but the impact of processing difficulty, the production–perception distinction,
and the fit into context and information structure are considered as well,
among other topics. Several papers also explore the representation and
explanation of gradience within grammar itself—following a line of research
opened by the quantitative studies of Labov.
What makes dealing with gradience quite difficult is that it is linked to a
number of central (and often somewhat controversial) issues in the theory of
grammar and our understanding of language. Linguistic objects such as words
or sentences may sound acceptable to various degrees, and one can collect
data on acceptability by asking speakers of a language for judgements on some
scale, but all that we arrive at by this procedure is the (relative) mean
acceptability of a particular linguistic object among the speakers of a
language, dialect, or, at least, the participants of the judgement study. In a
strict sense, the object generated by the experiment (viz. mean acceptability)
thus no longer necessarily belongs to the scope of a generative grammar if
such a grammar is meant to represent a psychological state of a native speaker,
as Chomsky claims. A certain mean acceptability value n may arise because
nearly all speakers consulted find the sentence acceptable to degree n, or
because the mean of widely diverging judgements of the participants equals
n, and the behaviour of the individual speaker may show high or low variance
as well. Is the gradience we observe thus a property of the mentally
represented grammar, or does it reflect variation among speakers?
If the latter is the case, our descriptions will have to cover geographical,
social, and also temporal dimensions, to the extent that variation and
language change are related. These issues have been discussed in detail in
phonology, as we will see below. One crucial question is whether different
dialects and registers should be described independently of each other (and
whether they are represented independently of each other in our brains); another
question is whether it is justifiable at all to talk of clearly separate dialects
or sociolects.
The problem of intermediate varieties is addressed by Leonie Cornips in
her contribution ‘Intermediate Syntactic Variants in a Dialect-Standard
Speech Repertoire and Relative Acceptability’. She describes how intermediate
language varieties with their own syntactic characteristics emerge from a
language contact situation which is typical for speakers in European societies,
namely the contact between a regional dialect and the standard language
variety spoken in the respective country. An intermediate variety is a variant
of the standard language that is characteristic of a particular region. Cornips
shows that in the intermediate variant Heerlen Dutch the inalienable posses-
sion construction has syntactic characteristics which can be found neither in
the local dialect nor in standard Dutch. Speakers can effortlessly shift between
the three variants, the regional Heerlen dialect, Standard Dutch, and Heerlen
Dutch. However, Cornips and her colleagues found that it is no longer
possible for these speakers to give clear-cut judgements about the local dialect.
For instance, speakers tend to attribute to their local dialect all versions of the
inalienable possession construction which are possible in the three varieties.
Cornips argues that the intermediate variants form a continuum with the
standard and local dialect varieties, which has arisen due to geographic,
stylistic, and social factors. As a consequence, speakers can only make relative
judgements by comparing variants of a particular form. Gradient acceptability
is here the result of uncertainty about their own dialects on the part of the
speakers.
The study of aspects of variation thus leads us back to the question of what
gradience means in terms of an individual speaker’s grammar. Controversial
issues may arise in two further domains here. First, if one accepts that data
such as response frequencies in a forced choice acceptability experiment are
relevant for a linguist identifying the grammar of a language, the question
arises as to why perception data should have a privileged status. The grammar,
so one can argue, should also have something to say about production facts,
such as frequencies in controlled production experiments, or frequencies in
text corpora. The contributions by Boersma, Crocker and Keller, and Vogel,
discussed in detail below, address the issue of how to cope with situations in
which gradience as determined by production facts (corpus frequency) does
not go hand in hand with gradience as measured in perception. In general,
while there are undeniable positive correlations between frequency and gra-
dient acceptability (see below), there are also clear cases where the two aspects
are not in agreement, so that the relation between theories that are built on
frequency data and those that rely on acceptability judgements is not always
obvious. It would seem that phonological theories attribute a greater role to
frequency effects than syntactic theories, which may be related to the fact that
phonology is more concerned with stored and storable items than syntax. But
the issues arising here are far from being resolved.
In any event, a narrow interpretation of the scope of generative grammar
again implies that corpus frequency data are not the kind of object a grammar
can explain, since corpus frequencies do not reflect a mental state of a speaker
whose internal grammar one wants to describe. Of course, concrete corpora
are shaped by a large number of linguistically irrelevant factors (among them,
What are people interested in talking about?), and they are further influenced
by a set of cognitive factors (relative production difficulty) that one does not
necessarily have to cover in a grammar. However, the same is true for
perception data, since, for example, processing difficulty influences acceptability
judgements as well (see, e.g., Fanselow and Frisch, Ch. 15 this volume).
In principle, everyone probably agrees that there are no privileged data, but in
practice grammatical models differ quite substantially as to the type of data
they are designed to capture.
The second aspect is much more controversial. As mentioned above, many
factors influence the relative acceptability and the relative frequency of a
linguistic item. When we develop a model for gradience, we must take all of
them into account. The controversy, which comes in many different guises
such as phonetics versus phonology, grammaticality versus acceptability,
competence versus performance, is whether it makes sense to keep at least
some of these factors (say, working memory) outside of what one specifies in
one’s grammar, and if so, whether one can keep all factors that introduce
gradient properties external to grammar. The answers which can be found in
the literature range from the claim that grammar itself is gradience-free to the
position that the questions addressed here make no sense at all because their
presuppositions are not fulfilled.

1.2 Theories of gradience in phonology


Just like its structuralist predecessor, generative phonology set out with the
ideal of formulating an essentially categorical model. The aim of feature
theory, segment inventories, and, of course, of rules and derivations has
been to provide clear-cut categories such as the following: a system for
describing all phonemes of all languages and a system of ordered rules that
derive completely specified surface forms, not available for further variation.
The re-write rules of Chomsky and Halle (1968) were conceived for categor-
ical outputs, which implies that variations within a language were incompat-
ible with the purely categorical approach encompassed in the generative
format. It was generally accepted among generativists that phonology is
categorical and phonetics gradient.
In her chapter ‘Is there Gradient Phonology?’, Abigail Cohn discusses this
division between categorical phonology and gradient phonetics, and asks
where the line between the two modules is to be drawn. The answer to this
question proves to be more difficult than previous generative phonology
suggested. Cohn’s main point is that there are grey areas in which sound
patterns may be explained in terms of gradience or of categoricity, so that it is
difficult to separate phonology clearly from phonetics. In other words, phonology,
even if obviously categorical in some of its parts, also makes use of
gradient patterns. She defines the term gradience: (a) as a change in phonetic
space and time; (b) in the sense of variability (also in a diachronic dimension);
and (c) in the sense of gradient well-formedness, concentrating on
the first and marginally on the third interpretation. The first problem she
addresses in her paper concerns contrast. She asks the question of whether
contrast may be gradient, a situation which may arise when two phonemes
contrast in some positions but not in others. Second, she looks at phonotactic
generalizations, like Pierrehumbert’s generalizations on medial consonant
clusters discussed below, which can also be considered gradient. The third and
last question concentrates on alternations, divided into morphophonemic
and allophonic ones. Steriade’s phonetic paradigm uniformity effects and
Bybee’s frequency effect in allophony are addressed in some detail. The
well-foundedness of Steriade’s claims that paradigms retain some phonetic
properties of their stem or of some specific inflected form (leading to over- or
underapplication of phonological processes) is questioned. Similarly, Bybee’s
suggestion that more frequent words are shorter and phonologically reduced
as compared to less frequent ones is also scrutinized. Cohn observes that both
effects may be less pervasive than claimed by their proponents. The conclusion
of the paper is that phonology is both gradient and categorical, but that
phonology is not to be confounded with phonetics: both are separate modules
of the study of language.
Returning to the second sense of gradience in Cohn’s list, one observes that
in sociolinguistic phonology, the discrete nature of transformational rules was
questioned very early. Labov (1969), Cedergren and Sankoff (1974), and
Sankoff and Labov (1979) proposed accounting for variation in spontaneous
utterances by adding weighted contextual factors to rules (‘variable rules’).
The treatment of t,d deletion in South Harlem English (Labov et al. 1968) was
a seminal study in this domain (see also Fasold 1991), and we will use this
example to show how variation and gradience are inherent to the phono-
logical part of grammar.
The introduction of variable rules into linguistic theory was severely
criticized, mainly because of its alleged illicit blurring between different
theoretical levels (see as an example Kaye and McDaniel 1978). In a series of
studies, Bybee (Hooper 1976; Bybee 1994, 2000a, 2000b, 2001) has quantified
t,d deletion in Standard English, among other lenition and reduction pro-
cesses, and she shows convincingly that this process is an on-going diachronic
change, agreeing in this with Labov (1994). Diachronic change seems to
always be preceded by a phase of synchronic variation—although the reverse
is not true, as will be shown below. Studying synchronic variation helps us to
understand better how language changes diachronically, and why historical
changes happen at all. Sociolinguists like Labov have been interested in
external factors—social class, ethnicity, gender, age, and so on—which intro-
duce variation into synchronic language. Other linguists have concentrated
on the internal factors that trigger change, historical or not, an aspect of this
line of research which is relevant for the issues in this book. Kiparsky (1982)
and Guy (1980, 1981), for instance, use the framework of Lexical Phonology,
which posits that morphology is organized in several derivational levels, each
of them with their own phonological rules. They show that there is a strong
correlation between the morphological structure and the rate of t,d deletion,
and they use categorical aspects of the model to express the variability of
deletion. An additional factor for variation comes from frequency. Bybee
(2000a, 2003) finds a significant effect of frequency on the rate of deletion.
Also among past tense verbs, there is an effect of frequency, since high
frequency verbs delete their final coronal stop more often than low frequency
ones. These results are largely confirmed by Jurafsky et al. (2001) in a study
using the Switchboard Corpus (Godfrey et al. 1992), a corpus of telephone
conversations between monolingual American English speakers. They find
that high frequency words delete final t or d twice as frequently as low
frequency words. Jurafsky and colleagues are interested in the fact that
words which are strongly related to each other (like grand piano) or which
are predictable from their neighbours, as for example in collocations, are
more likely to be phonologically reduced.
On the basis of all these facts, Bybee (2003) presents a differentiated
account of t,d deletion. Whereas Labov (1994) regards the process as phonemic,
in other words as a process in which a phoneme is always deleted
entirely (categorically deleted), lexical diffusion is for her both lexically and
phonetically gradient. This means that the coronal stop is not abruptly
deleted but is lenited first. Eventually, a segment which is lenited more and
more may disappear altogether (see also Kirchner 2001 for an account of
lenition in Optimality Theory). Bybee (2003) uses Timberlake’s (1977) insight
on the distinction between uniform and alternating environments to explain
asymmetries in the pattern of t,d deletion, such as the lower rate of
deletion in the regular past tense morpheme. The past tense morpheme t,d
has an alternating environment both in the preceding and in the following
segment. It could be that the environments retarding deletion (preceding
vowel, as in played for instance, as contrasted with an obstruent, as in jumped,
missed, or rubbed) have an overall effect on the pattern. Gradually, the more
frequent occurrences impose their phonetic structure in more and more
contexts, and this explains why words that occur in the context for a change
more frequently undergo the change at a faster rate than those that occur less
frequently in the appropriate context. In words like different or grand, t,d are
always in the right context for deletion, given the syllable structure of the
word in isolation, but the past tense morpheme is more often in a context
where deletion does not trigger a better syllable structure. In the case of
different or grand, the effects of the change are represented lexically before
those in the case of the past tense morpheme.
We have presented the t,d deletion facts in some detail because they illustrate the state of the art in phonological gradience: variation and change are not external to the grammar and lexicon but inherent to them, and the diffusion
is not the result of random variation but rather stems from reduction
processes that occur in the normal automation of motor activity. Frequently
used segment sequences are easier to articulate because the neuromotor
activity controlling them is automatic. According to Lindblom (1990), lenition is due to hypoarticulation. He claims that speakers undershoot phonetic targets, but only to the point at which their utterances are still recoverable. Moreover, frequent words are more often in unstressed positions, which are associated with less articulatory effort. A frequent word is often used several
times in a discourse and this pre-mentioning increases its hypoarticulation
even more.
Pierrehumbert (2001, 2002) proposes a model to explain the pattern of
change, couched in the exemplar or episodic theory, originally a cognitive
theory of perception and categorization. In the linguistic extension of the
theory (Johnson 1997; Goldinger 2000), the mental lexicon consists of stored
episodes, and not of abstract units, as has been assumed in generative
phonology. During perception, a large number of traces residing in memory
are categorized, and activated on the basis of what is heard. Traces cluster in
categories as a function of their similarity. Moreover, frequent words leave
more traces than infrequent ones. As a new word is encountered, it is
categorized as a function of its similarity to existing exemplars, according to
probabilistic computation. If one category is more probable than its com-
petitors, the new item is categorized as an exemplar of this category, and in
case of ambiguity, the more frequent label wins. Pierrehumbert interprets this
theory as implicit knowledge about the probabilistic distribution of phono-
logical elements, organized as a cognitive map. Frequency is not directly
encoded but is just an artefact of the number of relevant traces. A word
which is heard frequently possesses more traces and thus is activated more
strongly than a rare word. A positive aspect of this theory is that it explains
the phonetic details that speakers of a certain language have to know in order
to master the articulatory subtleties. Languages differ in their vocalic
distribution for instance, and if speakers just have access to rough universal
categories, as has been assumed in the generative approaches to phonology,
this fact is difficult to understand. If their phonemic knowledge is based on
real acoustic traces of words pronounced, then the native speaker competence
can be understood as the accumulation of the large number of memory
experiences.
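The categorization mechanism just described can be sketched in a few lines of code (an editorial toy illustration, not taken from Pierrehumbert's chapter: the category labels, F1 values, and the Gaussian similarity kernel are all invented for the example):

```python
import math
from collections import defaultdict

def categorize(token_f1, exemplars, bandwidth=50.0):
    """Assign a new auditory token (here reduced to an F1 value in Hz)
    to the category whose stored traces give it the highest summed,
    similarity-weighted activation. Frequency is not encoded directly:
    a frequent category simply contributes more traces to the sum."""
    activation = defaultdict(float)
    for category, trace_f1 in exemplars:
        # Similarity decays with auditory distance (Gaussian kernel).
        activation[category] += math.exp(-((token_f1 - trace_f1) / bandwidth) ** 2)
    return max(activation, key=activation.get)

# A toy exemplar cloud: /i/ is frequent (four traces), /e/ is rare (one).
cloud = [("i", 300), ("i", 310), ("i", 320), ("i", 330), ("e", 400)]
```

On this toy cloud, an ambiguous token at 360 Hz is pulled towards the frequent label /i/, while a token at 400 Hz is still categorized as /e/: in case of ambiguity, the label with more traces wins, just as described above.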
As an explanation of historical change, the perceptual memories of
the lenited word forms may increase incrementally. High frequency words,
which are lenited for the reasons mentioned above, are heard more often than
low frequency ones, shifting the direction of historical change even more.
Throughout this book, we will see that frequency plays a crucial role in
patterns of gradience. Frequency in phonology has also been examined from a
different perspective, namely from the point of view of phonotactic patterns.
Frisch (1996) and Frisch et al. (1997) model a gradient constraint combination
to account for the phonotactics of the verbal roots in Arabic. In their chapter
‘Linguistic and Metalinguistic Tasks in Phonology: Methods and Findings’,
Stefan A. Frisch and Adrienne M. Stearns demonstrate that probabilistic and
gradient phonological patterns are part of the knowledge of a language in
general. Evidence for this thesis comes from a variety of sources, including
psycholinguistic experiments using metalinguistic and language processing
tasks, as well as studies of language corpora. These results support theories
that information about phonological pattern frequency is encoded at the
processing and production levels of linguistic representation.
Frisch and Stearns’s chapter first reviews the methodologies that have been used in phonological studies employing metalinguistic phonological judgements, primarily in the case of phonotactics. These studies have found that native speaker judgements closely reflect the phonotactic patterns of language. Direct measures include well-formedness judgements, such as acceptability judgements and wordlikeness judgements (Frisch et al. 2000), morphophonological knowledge (Zuraw 2000), influence of transitional probabilities on wordlikeness judgements for novel words (Hay et al. 2004),
distance of novel CCVC words as measured by a phoneme substitution score
(Greenberg and Jenkins 1964), and measures of similarity between words.
Indirect measures reflect the grammatical linguistic knowledge through linguistic performance and thus provide evidence for the psychological reality of
gradient phonological patterns. They include elicitation of novel forms (wug
tests), analysis of lexical distributions and of large corpora in general, as well
as analysis of confusability in perception and production. These last tests
show that lexical neighbourhood and phonotactic probability affect speech
production.
Frisch and Stearns’s case study shows that sonority restrictions in consonant clusters are gradient, the cross-linguistic preference being for onset consonant clusters that have a large sonority difference. Quantitative language patterns for thirty-seven languages were measured and compared to attested clusters. Metalinguistic judgements of wordlikeness were also gathered for English and compared to the attested and possible clusters, the results again providing evidence for the psychological reality of gradient patterns in phonology. Mean wordlikeness judgements correlated significantly with the type frequency of
the CC sequences contained in the novel words.
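The type-frequency statistic at issue can be illustrated with a small sketch (an editorial toy, not Frisch and Stearns's actual method; the mini-lexicon and the purely orthographic treatment of onsets are invented for the example):

```python
from collections import Counter

def onset_counts(lexicon):
    """Type frequency of word-initial consonant sequences in a toy
    lexicon: each word type contributes once, which is the statistic
    that wordlikeness judgements are claimed to track."""
    vowels = set("aeiou")
    def onset(word):
        i = 0
        while i < len(word) and word[i] not in vowels:
            i += 1
        return word[:i]
    return Counter(onset(w) for w in lexicon)

# Invented orthographic mini-lexicon, for illustration only.
lexicon = ["blip", "blast", "bland", "brick", "drip", "snack"]
freq = onset_counts(lexicon)
```

A novel word retaining a frequent onset (say 'blorch', onset 'bl', type frequency 3 in this mini-lexicon) is predicted to be rated more wordlike than one with a rare or unattested onset ('snorch' versus 'bnorch').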
The authors do not provide a grammatical model for their data. They even
conjecture that it is unclear whether a distinct phonological grammar is
required above and beyond what is necessary to explain patterns of phono-
logical processing. Given the grounding of gradient phonological patterns in
lexical distributions, they propose that exemplar models, based on frequency
information and probabilities, explain generalization-based behaviour as a
reflex of the collective activation of exemplars that are similar along some
phonological dimension, rendering abstract representations obsolete.
The contributions by Boersma and by Albright and Hayes propose anchor-
ing the correlation between gradience and frequency in grammar. They use
the Gradual Learning Algorithm (GLA) developed by Boersma (1998a) and
Boersma and Hayes (2001), a stochastic model of Optimality Theory. In GLA,
the variation comes from the possibility of a reordering of two or more
constraints in the hierarchy, expressed by the overlapping of the constraints’ ranges. In addition, the constraints have different distances to their neighbours. The likelihood of a reordering is thus not a function of the rank in the
hierarchy, but rather of the stipulated distance between the constraints, which
is encoded in the grammar by assigning numerical values to the constraints
which determine their rank and their distance at the same time. Boersma and
Hayes’s model thus allows us to deal with error variation as a source of
gradience in a language particular way.
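The evaluation step of such a stochastic grammar can be sketched as follows (a toy illustration under the usual GLA assumptions of a continuous ranking scale and Gaussian evaluation noise; the constraint names and ranking values are invented):

```python
import random

def draw_ranking(ranking_values, noise=2.0):
    """ranking_values maps each constraint to its value on a continuous
    ranking scale. At evaluation time, Gaussian noise is added to every
    value (the selection point), and the constraints are ranked by the
    result. Constraints with nearby values can swap order between
    evaluations; distant ones practically never do."""
    points = {c: v + random.gauss(0.0, noise) for c, v in ranking_values.items()}
    return sorted(points, key=points.get, reverse=True)

# Invented grammar: FAITH and NOCODA are close, MAX-LOW is far below.
values = {"FAITH": 100.0, "NOCODA": 98.0, "MAX-LOW": 80.0}
random.seed(1)
top = [draw_ranking(values)[0] for _ in range(10000)]
faith_share = top.count("FAITH") / len(top)
```

With these values, FAITH outranks NOCODA in roughly three quarters of evaluations, while MAX-LOW, twenty units below, practically never surfaces on top: the probability of a reordering depends on the distance between constraints, not on their position in the hierarchy.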
Adam Albright and Bruce Hayes’s chapter ‘Modelling Productivity with the
Gradual Learning Algorithm: The Problem of Accidentally Exceptionless
Generalizations’ addresses the modelling of gradient data in inflectional paradigms. Related to this is an empirical question about productivity: when language learners are confronted with new data, what weight do they assign to accuracy versus generality? This problem arises in relationship to accidentally true or small-scale generalizations. These kinds of generalizations are confined to a small set of forms and correlate with unproductivity. This is
a classic problem of inductive learning algorithms which are restricted to a
training set: when confronted with new data, they might fail to make the right
predictions. In a standard Optimality Theory approach, constraints deduced
from the training set apply to the forms used for learning, but unfortunately
they make wrong predictions for new forms. Reliability of rules or
constraints, that is, how much of the input data they cover and how many
exceptions they involve, is not the right property to remedy this problem.
Generality may make better predictions, especially in the case of optionality
between two forms.
Children acquiring English, for instance, are confronted with several
answers as to how to form the past tense, as exemplified by wing ∼ winged, wring ∼ wrung, and sing ∼ sang, which are attested English forms. A subset of such verbs, composed of dig, cling, fling, and sling, forms its past tense with [ʌ] (dug, clung, flung, slung). Albright and Hayes (2003) find that for a newly coined
verb like spling, English speakers rate past tense splung and splinged nearly
equivalently high. Their conclusion is that general rules, like ‘form past tense
with -ed ’ are so general that they sometimes compete with exceptionless
particular rules.
Navajo sibilant harmony in affixation, the data set discussed in this chapter, exhibits a similar, although attested, optionality. If a stem begins with a [–anterior] sibilant ([č, č’, čʰ, š, ž]), the s-perfective prefix [ši] is attached. If the stem contains no sibilant, the prefix [si] is the chosen form. If there is a [–anterior] sibilant later in the stem, both [si] and [ši] are possible.
Albright and Hayes’s learning system is not able to cope with this pattern. The
problem is that in addition to general and useful constraints, the system also
generates junk constraints which apply without exception to a small number of
forms, but which make incorrect predictions for new forms. To remedy the
problem they rely on the Gradual Learning Algorithm (Boersma and Hayes
2001), which assumes a stochastic version of OT. Each pair of constraints is not
strictly ranked, but rather assigned a probability index. The solution they
propose is to provide each constraint with a generality index. Each rule is
provided with a ranking index correlating with generality: the more general the constraint (in terms of the absolute number of forms which fulfil it), the
higher it is ranked in the initial ranking. The junk constraints are ranked very
low in the initial ranking and have no chance to attain a high ranking, even
though they are never violated by the data of the learning set.
Turning now to Paul Boersma’s chapter ‘Prototypicality Judgements as Inverted Perception’, it must first be noticed that his goal is very different from that of the preceding chapter. Boersma presents an account of gradience effects in prototypicality tasks as compared to phoneme production tasks with the example of the vowel /i/. A prototype is more peripheral than the modal auditory form (the form they hear) in the listeners’ language environment, including the forms produced by the listeners themselves. The difference between the best prototype /i/ and the best articulated vowel [i] is implemented as a difference in the value of F1 and shows a discrepancy of 50 Hz between the two tasks. Boersma provides a model of production and comprehension which implements the main difference between the tasks in the presence of the articulatory constraints in the production task and their absence in the prototypicality task. More specifically, he proposes augmenting
the usual three-level grammar model (Underlying Form (UF) → Surface Form (SF) → Overt Form (OF)) with articulatory and auditory representations at OF (ArtF and AudF). Production consists of a single mapping from
UF to ArtF, with obligatory considerations of SF and AudF. Comprehension,
on the other hand, takes place in two separate steps: one process is called
perception (AudF is mapped to SF) and is equivalent to the prelexical
perception of McQueen and Cutler (1997), and the other process is recogni-
tion or lexical access (SF to UF). Boersma’s formal analysis is couched in a
stochastic OT approach of the kind he has proposed in previous work (see
above). In the case of perception (prototypicality task), auditory (sensori-
motor) events are relevant. The mapping of an incoming F1 to a certain vowel
is expressed by a series of negatively formulated cue constraints of the form
‘an F1 of 340 Hz is not /i/.’ The ranking of all such constraints determines for a
specific value of F1 whether it is to be interpreted as an /i/, an /e/ or any other
vowel. In the production task, both auditory and articulatory constraints are
active. Auditory cue constraints are the same as before, in the same ranking,
but now, articulatory constraints expressed as degrees of articulatory precision are also involved. Too much effort is avoided, which explains why the phoneme production task delivers a different candidate from the prototypicality task.
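The perception step can be sketched as a miniature OT evaluation (an editorial toy, not Boersma's actual tableaux; the ranking of the cue constraints is invented):

```python
def perceive(f1_hz, cue_ranking):
    """cue_ranking lists cue constraints from highest- to lowest-ranked,
    each a pair (f1, vowel) read as 'an F1 of f1 Hz is not /vowel/'.
    For an incoming F1, every candidate vowel violates exactly one
    relevant constraint; the optimal percept is the vowel whose violated
    constraint is ranked lowest, i.e. the cheapest violation."""
    candidates = [vowel for f1, vowel in cue_ranking if f1 == f1_hz]
    return candidates[-1]  # lowest-ranked prohibition wins

# Invented ranking for one incoming value: '340 Hz is not /a/' >>
# '340 Hz is not /e/' >> '340 Hz is not /i/', so 340 Hz is heard as /i/.
ranking = [(340, "a"), (340, "e"), (340, "i")]
```

Reversing the ranking would make the same auditory event come out as /a/, which is the sense in which the constraint hierarchy, rather than the signal alone, determines the percept.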
We have discussed a number of approaches and papers that point at a
correlation between frequency patterns and acceptability. The question arises
as to whether this correlation is also attested in other parts of the phonology.
In their chapter ‘Gradient Perception of Intonation’, Caroline Féry and Ruben
Stoel address tonal contours as gradient phonological objects. Since tone
patterns only exist in their association with texts, they conducted an experi-
ment in which sentences realized with different (marked and less marked)
tonal patterns were presented to subjects in an acceptability rating task. The
results of the experiment pointed to the existence of more generally accepted
contours. These are the intonational patterns found in a large number of
contexts. These contours are those which were originally produced as an
answer to a question asking for a wide-focused (all-new) configuration, or
for a topic-focus pattern. A tonal pattern corresponding to narrow focus on
an early constituent was clearly deviant in most cases, and was attributed a
low grade in most contexts except for the matching one. Intermediate grades
were also obtained, thus revealing that tonal patterns are gradient objects. An
additional result was that the more syntactically and semantically complex the
sentences are, the less clear-cut the results. An interesting point mentioned in
the chapter is that the results obtained by means of a scale are indistinguish-
able from those obtained by categorical judgements. All in all, the paper’s
conclusion is that intonational contours are gradient objects, as far as their
acceptability is concerned, in the same way as segment clusters or word orders
are, and that the acceptability of tonal contours correlates with frequency.
This remark may point to the conclusion that what looks idiosyncratic at first
glance may turn out to be the product of experience after all. What is heard
more often is felt to be more acceptable.
In their chapter ‘Prosodic Influence on Syntactic Judgements’, Yoshihisa
Kitagawa and Janet Dean Fodor build a bridge from phonology to syntax.
Their point of departure is that a construction which requires a non-default
prosody is vulnerable to misjudgements of syntactic well-formedness when it
is read, and not heard. Acceptability judgements on written sentences are not
purely syntax-driven; they are not free of prosody even though no prosody is
present in the stimulus. As the basis of their observations, they presuppose the
Implicit Prosody Hypothesis (Fodor 2002b), which claims that readers project
a default prosody onto the read sentences. They elicited grammaticality
judgements on both Japanese and English sentences requiring a marked
prosody in order to be grammatical. The Japanese target items were instances
of constructions with wh-in-situ and long-distance scrambled wh. Each was
disambiguated by its combination of matrix and subordinate complement-
izers toward what has been reported to be its less preferred scope interpret-
ation: (a) subordinate wh-in-situ with forced matrix scope, and (b) wh
scrambled from the subordinate clause into the matrix clause, with forced
subordinate scope. The results of the experiments revealed that the target
sentences were accepted more often in the listening task than in the reading
task. The English sentences consisted of negative polarity items (NPIs) in
high/low attachment. In the first case, the prosody has to be strongly marked, whereas in the second case, the difference in attachment barely elicits a difference in prosody. Subjects accepted the NPI sentences more often when
listening to them than when reading them. The best scores were obtained by a
combination of reading and listening.

1.3 Theories of gradience in syntax


There is some irony in the fact (mentioned above) that the first work in generative syntax developed a model for degrees of grammaticality, while gradience never played a crucial role later in what may be called mainstream (generative) syntax. Leading current syntactic models such as Minimalism, OT, OT-LFG, HPSG, or categorial grammar seem uninterested in gradience, at least as evidenced by the ‘standard references’ to these models, and this was not much different ten years ago.
There are reasons for considering this negligence unfortunate. Some key
domains of syntax show gradience to a considerable degree. The subjacency
phenomena, superiority effects, and word order restrictions figure prominently in this respect. This high degree of gradience often makes it unclear
what the data really are, and syntactic practice does not follow the golden
rule of formulating theories on the basis of uncontroversial data only (and
have the theory then decide the status of the unclear cases). We believe that
theories formulated on the basis of clear-cut data only would not really
be interesting in many fields of syntax, so it is necessary to make the
‘problematic’ data less controversial, that is, to formulate a theory of
gradience in syntax.
There are two types of approaches to syntactic gradience as a property of
grammar. Chomsky (1955) allows an interpretation in which the gradience is
coded as a property of the grammatical rules or principles. Simplifying his
idea a bit, one can say that full grammaticality is determined by a set of very
detailed, high precision rules. If a sentence is in line with these, it has the
highest degree of acceptability. For deviant sentences, we can determine the
amount (and kind) of information we would have to eliminate from the high
precision, full detail rules in order to make the deviant sentence fit the rule.
The less we have to eliminate, the less unacceptable the sentence is. Müller
(1999) makes a related proposal. He refines standard OT syntax with the
concept of subhierarchies composed of certain principles that are inserted
into the hierarchy of the other syntactic constraints. For ‘grammaticality’, it
only matters whether at least one of the principles of the subhierarchy is
fulfilled, but a structure is ‘unmarked’ only if it is the optimal structural
candidate with respect to all of the principles in the subhierarchy.
In the other tradition, represented, for example, by Suppes (1970), the rules
and constraints of the grammar are directly linked to numerical values (not
unlike the variable rules in phonology).
In his chapter ‘Linear Optimality Theory as a Model of Gradience in
Grammar’, Frank Keller introduces a theory of the second type—Linear
Optimality Theory (LOT). In contrast to standard OT approaches, LOT is
designed to model gradient acceptability judgement data. The author argues
that the necessity for such an approach results from the observation that
gradience in judgement data has different properties from gradience in corpus
data and that, therefore, both types of gradience should be modelled inde-
pendently. The basic idea of the model is represented in two hypotheses which
are formulated (a) with respect to the relative ranking of the constraints and
(b) regarding the cumulativity of constraint violations. Whereas the former
states that the numeric weight of a constraint is correlated with the reduction
in acceptability to which it leads, the latter assumes that multiple constraint
violations are linearly cumulated. By means of a comparison of LOT and other
optimality theoretic models, such as Harmonic Grammar or Probabilistic
Optimality Theory, the author demonstrates the advantages of LOT in mod-
elling relative grammaticality and the corresponding judgements that include
optimal as well as suboptimal candidates within one ranking. Keller therefore
presents a necessary prerequisite for a successful understanding and model-
ling of gradient grammaticality from a formal perspective.
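The two hypotheses can be made concrete in a few lines (a sketch of the linear scoring idea only, with invented constraint names and weights, not Keller's fitted values):

```python
def lot_acceptability(violations, weights):
    """Linear Optimality Theory in one line: a candidate's degradation
    is the weighted linear sum of its constraint violations, so its
    relative acceptability is the negative of that sum. A constraint's
    numeric weight encodes how much acceptability one violation costs."""
    return -sum(weights[c] * n for c, n in violations.items())

weights = {"AGR": 3.0, "INV": 1.5}              # invented weights
fully_acceptable = lot_acceptability({}, weights)
mildly_degraded = lot_acceptability({"INV": 1}, weights)
# Cumulativity: two violations of the weak constraint cost as much
# as one violation of the strong one.
doubly_degraded = lot_acceptability({"INV": 2}, weights)
strongly_degraded = lot_acceptability({"AGR": 1}, weights)
```

The resulting scores order optimal and suboptimal candidates on a single scale, which is exactly what lets the model predict graded rather than binary judgements.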
The chapter by Matthew Crocker and Frank Keller, ‘Probabilistic Gram-
mars as Models of Gradience in Language Processing’, is also concerned with
the functioning of grammars of the second type, but they focus on gradience
in human sentence processing, which, as the authors argue, can be under-
stood as variation in processing diYculty (or garden path strength). Based on
a number of empirical Wndings, for example modulation of relative clause
attachment preferences via a short training period, they present an account of
this phenomenon in terms of experience-based behaviour. From this perspec-
tive, the interpretation of a sentence is a function of prior relevant experience,
with a ‘positive’ experience supporting a speciWc interpretation and suppress-
ing alternative ones.
In addition to the experimental results, the concept is motivated by other
theoretical and probabilistically driven approaches in psycholinguistics and
cognitive science in general. The authors also discuss the fine-grained influences and the scope of lexical and structural frequencies during the
incremental interpretation of sentences. Most importantly for the main
focus of the current volume, (a) they claim that there is no straightforward
relationship between the frequency of a sentence type and its acceptability and
(b) this observation leads to the conclusion that a sentence-final judgement can only be derived as a result of an interaction of the frequency of experience with respect to this specific construction, more general linguistic knowledge,
and cognitive constraints.
The chapter ‘Degraded Acceptability and Markedness in Syntax, and the Stochastic Interpretation of Optimality Theory’ by Ralf Vogel represents a grammatical theory of the first of the two types introduced above. Vogel
makes two central claims:

(i) gradience in syntax is an intrinsic feature of grammar. There are cases
of gradience which directly result from the interaction of the rules and
constraints that make up the grammar;
(ii) there is no need to import a quantitative dimension into grammar in
order to model gradient grammaticality.
The example that illustrates the first claim is case conflicts in German free relative constructions (FRs). FRs without case conflict receive higher acceptability in acceptability judgement experiments and are more frequent in corpora than those with a case conflict. But the kind of conflict is also crucial: conflicting FRs in which the oblique case dative is suppressed are judged as
less acceptable than those in which the structural case nominative is sup-
pressed. Vogel demonstrates that a standard optimality theoretic grammar is
already well-suited to predict these results, if one of its core features, the
central role of markedness constraints, is exploited in the right way. Vogel
further argues against the application of stochastic optimality theory in
syntax, as proposed, for instance, in Bresnan et al. (2001). The relative
frequencies of two alternative syntactic expressions in a corpus not only
reXect how often one structure wins over the other but also how often the
competition itself takes place, which here means how often a particular
semantic content is chosen as input for an OT competition. If the influence
of this latter factor is not neutralized, as in the model by Bresnan et al. (2001),
then properties of the world become properties of the grammar, an unwel-
come result. Vogel further provides evidence against another claim by Bresnan
et al., which has become famous as the ‘stochastic generalization’: categorical
contrasts in one language show up as tendencies in other languages. Typolo-
gically, FR structures are less common than semantically equivalent correla-
tive structures. The straightforward explanation for this observation can be
given in OT terms: FRs are more marked than correlatives. Nevertheless, a
corpus study shows that in unproblematic cases like non-conXicting nom-
inative FRs, FRs are much more frequent than correlatives in German. Vogel
argues that corpus frequency is biased by a stylistic preference to avoid over-
correct expressions which contain more redundant material than necessary,
primarily function words. Including such a stylistic preference into an OT
grammar in the form of a universal constraint would lead to incorrect
typological predictions. Vogel opts for the careful use of a multiplicity of
empirical methods in grammar research in order to avoid such method-
induced artefacts.
While these contributions highlight how syntactic principles can be made
responsible for gradience, several of the other factors leading to gradience are
discussed in detail in the following papers. That context and information
structure are relevant for acceptability has often been noted. This aspect is
addressed by Nomi Erteschik-Shir. Her ‘What’s What?’ discusses syntactic
phenomena which have been argued to lead to gradient acceptability in the
literature, namely the extraction of wh-phrases out of syntactic islands, as
well as several instances of so-called superiority violations, where multiple
wh-phrases within one clause appear in non-default order (e.g., *What did who say?). The acceptability of the wh-extraction in ‘Who did John say/?
mumble/*lisp that he had seen?’ seems to depend on the choice of the matrix
verb. Previous accounts of this contrast explained it by assigning a different
syntactic status to the subordinate clause depending on the verb, leading to
stronger and weaker extraction islands. Erteschik-Shir shows that this analysis
is unable to explain why the acceptability of the clauses improves when the
offending matrix verb has been introduced in the preceding context. She
argues that the possibility of extraction is dependent on the verb being
unfocused. The difference between semantically light verbs like ‘say’ and
heavier ones like ‘mumble’ and ‘lisp’ is that the latter are focused by default
while the former is not. Erteschik-Shir develops a model of the interaction
between syntax and information structure to account for this observation.
The central idea in her proposal is that only the focus domain can be the
source of syntactic extraction. If the main verb is focused, the subordinate
clause is defocused and thus opaque for extraction. Erteschik-Shir’s account
of superiority and exceptions from it (*What did who read? versus What did
which boy read?) also refers to the information structural implications of these
structures. Crossing movement does not induce a superiority violation if the
fronted wh-phrase is discourse-linked and thus topical. Another crucial factor
is that the in-situ wh-phrase is topical, which leads to focusing of the
complement, out of which extraction becomes possible. Erteschik-Shir’s
explanation for the degraded acceptability of these structures lies in her
view of elicitation methods. Usually, such structures are presented to inform-
ants without contexts. The degraded structures rely on particular information
structural conditions which are harder for the informants to accommodate
than the default readings. As this is a matter of imagination, Erteschik-Shir
also predicts that informants will differ in their acceptance of the structures in question. Overall, the account of syntactic gradience offered here is processing-oriented, in the sense that it is not the grammar itself that produces
gradience, but the inference of the information structural implications of an
expression in the course of parsing and interpretation.
In her chapter ‘Gradedness and Optionality in Mature and Developing
Grammars’, Antonella Sorace argues that residual optionality, which she
considers the source of gradience effects, occurs only in interface areas of
the competence and not in purely syntactic domains. In that sense, her
approach is quite in line with what Erteschik-Shir proposes. Sorace asks (a)
whether gradedness can be modelled inside or outside of speakers’ grammat-
ical representations, and (b) whether all interfaces between syntax and other
domains of linguistics are equally susceptible to gradedness and optionality.
The underlying hypothesis for such an approach can be formulated in the
following way: structures requiring the integration of syntactic knowledge
and knowledge from other domains are more complex than structures
requiring syntactic knowledge only. From this perspective, it can be argued that
complex structures may lead to gradedness and variation in native grammars,
may pose residual difficulties to near-native L2 speakers, and may pose emerging difficulties to L1 speakers experiencing attrition from a second
language because of their increasingly frequent failure to coordinate/integrate
different types of knowledge.
Sorace presents empirical support from all of these domains within the
context of null-subject constructions, post-verbal subject constructions, and
split-intransitivity in Italian. Based on these phenomena she assumes that—at
the moment—gradedness can best be modelled at the interface between
syntax and discourse, without excluding the very likely possibility that there
are additional interface representations. The internal structure of the interface
representations as well as their accessibility during language comprehension
in native and non-native grammars is a subject for further research about
gradedness and optionality.
The role of processing in the sense of syntactic structure building for
acceptability is the topic of the contributions by Hawkins and by Fanselow
and Frisch. John Hawkins’ contribution, ‘Gradedness as Relative Efficiency in
the Processing of Syntax and Semantics’, deals with the results of several
corpus studies concerning the positioning of complements and adjuncts
relative to the verb in English and Japanese. The two languages show clear
selection preferences among competing structures which range from highly
productive to unattested (despite being grammatical). Hawkins gives an
explanation for the gradience in these data in terms of a principle of processing efficiency, Minimize Domains (MiD). This principle states that the
human processor prefers to minimize the connected sequences of linguistic
forms and their conventionally associated syntactic and semantic properties.
This preference is of variable degree, depending on the relations whose
domains can be minimized in competing structures. In the paper, the MiD
principle is used to explain weight eVects among multiple constituents fol-
lowing the verb in an English clause. In general, between two NPs or PPs
following the verb, it can be observed that the shorter NP/PP comes Wrst. This
preference is stronger the more the two phrases diVer in size. The eVect of
MiD is that it tends to minimize the distance between the verb and the head of
the non-adjacent phrase, thus favouring an order in which the shorter post-
verbal constituent precedes the longer one. In Japanese, a head-Wnal language,
we observe the reverse situation: the longer phrase precedes the shorter one,
we observe the reverse situation: the longer phrase precedes the shorter one, the crucial factor being that the shorter phrase is preferred to be adjacent to the verb. An additional factor is whether (only) one of the two constituents is in a dependency relation with the verb. This factor strengthens the weight effect if the selected phrase is shorter, but weakens it if it is longer. Hawkins also suggests that MiD has shaped grammars and the evolution of grammatical conventions, according to the performance-grammar correspondence hypothesis: syntactic structures have been conventionalized in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in performance. Hawkins further argues that his account is superior to an alternative approach like stochastic Optimality Theory because it does not mix grammatical constraints with processing constraints, as a stochastic OT approach would have to do.
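The minimization logic behind MiD can be sketched as a toy calculation. This is an illustration, not Hawkins' own formalization: phrase weight is simply measured in words, and connecting the verb with both phrases is assumed to cost the verb itself, the whole verb-adjacent phrase, and one head word of the non-adjacent phrase.

```python
def mid_domains(len_a, len_b):
    """Size of the domain linking a verb with two dependent phrases,
    for each ordering: the verb (1 word), the whole phrase adjacent
    to the verb, and the head word of the non-adjacent phrase (1 word)."""
    domain = lambda adjacent_len: 1 + adjacent_len + 1
    return {"shorter_adjacent": domain(min(len_a, len_b)),
            "longer_adjacent": domain(max(len_a, len_b))}

# Two phrases of 2 and 5 words: placing the shorter one next to the
# verb yields the smaller domain, and the advantage grows with the
# length differential -- a graded, not categorical, weight effect.
d = mid_domains(2, 5)
advantage = d["longer_adjacent"] - d["shorter_adjacent"]
```

In head-initial English the verb-adjacent slot is the first postverbal position, so the minimal domain corresponds to short-before-long; in head-final Japanese it is the immediately preverbal position, so the same calculation favours long-before-short.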
In their chapter 'Effects of Processing Difficulty on Judgements of Acceptability', Gisbert Fanselow and Stefan Frisch present experimental data highlighting an unexpected effect of processing on acceptability. Typically, it is assumed that processing difficulties reduce the acceptability of sentences. Fanselow and Frisch report the results of experiments suggesting that processing problems may make sentences appear more acceptable than they should be on the basis of their grammatical properties. This is the case when the sentence involves a local ambiguity that is initially compatible with an acceptable interpretation of the sentence material, but which is later disambiguated towards an ungrammatical interpretation. The findings support the view that acceptability judgements not only reflect the outcome of the final computation, but also intermediate processing steps.
Matthias Schlesewsky, Ina Bornkessel, and Brian McElree examine the nature of acceptability judgements from the perspective of online language comprehension in 'Decomposing Gradience: Quantitative versus Qualitative Distinctions'. By means of three experimental methods with varying degrees of temporal resolution (speeded acceptability judgements, event-related brain potentials, and speed-accuracy trade-off), the authors track the development of speakers' judgements over time, thereby showing that relative differences in acceptability between sentence structures stem from a multidimensional interaction between time-sensitive and time-insensitive factors. More specifically, the findings suggest that increased processing effort arising during the comprehension process may be reflected in acceptability decreases even when judgements are given without time pressure. In addition, the use of event-related brain potentials as a multidimensional measurement technique reveals that quantitative variations in acceptability may stem from underlying differences that are qualitative in nature. On the basis of these findings, the authors argue that gradience in linguistic judgements can only be fully described when
all component parts of the judgement process, that is, both its quantitative
and its qualitative aspects, are taken into account.
What conclusions should be drawn from the insight that gradience results from domains such as processing difficulty or information structure is the topic of Eric Reuland's chapter. In 'Gradedness: Interpretive Dependencies and Beyond' he defends a classic generative conception of grammar that contains only categorical rules and concepts. He identifies a number of grammar-external sources of gradience as it is frequently observed in empirical linguistic studies. The language that should be modelled by grammarians, according to Reuland, is the language of the idealized speaker/hearer of Chomsky (1965). In this Chomskyan idealization, most factors which are crucial for gradience are abstracted away from. Among such factors, Reuland identifies the non-discreteness of certain aspects of the linguistic sign, for instance the intonation contours which are used to express particular semantic and syntactic features of clauses, like focus or interrogativity. Reuland argues that it is only the means by which these features are expressed which are non-discrete, not the features themselves. But only the latter are subject to the theory of grammar. Reuland further separates differences in language, which do not exist despite the preference for one or the other expression within the (idealized) speech community, from differences in socio-cultural conventions, which may exist, but are irrelevant for the study of grammar. Nevertheless, non-discrete phenomena are expected to occur where grammar leaves open space for certain choices, for instance in the way the subcomponents of grammar interact. Another source of gradience is variation in acceptability judgements within a speech community, as dialectal or idiolectal variation, and even within speakers, using different idiolects on different occasions, or as the effect of uncertainty in a judgement task. Apart from these extra-grammatical explanations for gradience, Reuland also sees grammar itself as a possible source of gradience. Current models of grammar include a number of subcomponents, each of which has its own rules and constraints, some perhaps violable, which interact in a non-trivial way. Any theory of language, Reuland concludes, that involves a further articulation into such subsystems is in principle well equipped to deal with 'degrees' of well- or ill-formedness. Reuland exemplifies his position with a comparative case study of the syntax of pronouns, mainly in Dutch and English. He shows that the syntactic properties of reflexives and pronominals depend on a number of further morphosyntactic properties of the languages in question, among which are the inventory of pronouns in the language, richness of case, the possibility of preposition stranding, the mode of reflexive marking on verbs, the organization of the syntax–semantics interface in thematic
interpretation, and pragmatics. These factors interact non-trivially; constraint violation cannot always be avoided, and thus leads to degraded acceptability. Admitting this, in Reuland's view, in no way requires abandoning the categorical conception of grammar.
We have organized this book into four parts. The initial chapters have a certain emphasis on clarifying the nature of gradience as such, and give answers to the question of 'What is gradience?' from the perspectives of phonology (Cohn; Frisch and Stearns), generative syntax (Reuland), psycholinguistics (Schlesewsky, Bornkessel, and McElree; Sorace) and sociolinguistics (Cornips).

The following two parts address specific issues in phonology (Boersma; Albright and Hayes; Féry and Stoel) and syntax (Crocker and Keller; Hawkins; Keller; Vogel). The contributions to the final part of the book have in common that they look at a specific construction, namely long movement, from different methodological backgrounds (Erteschik-Shir; Fanselow and Frisch; Kitagawa and Fodor).
Part I
The Nature of Gradience
2

Is there Gradient Phonology?


ABIGAIL C. COHN

2.1 Introduction
In this chapter,1 I consider the status of gradient phonology, that is, phonological patterns best characterized in terms of continuous variables. I explore some possible ways in which gradience might exist in the phonology, considering the various aspects of phonology: contrast, phonotactics, morphophonemics, and allophony. A fuller understanding of the status of gradience in the phonology has broader implications for our understanding of the nature of the linguistic grammar in the domain of sound patterns and their physical realizations. In the introduction, I consider why there might be gradience in the phonology (Section 2.1.1). I then briefly discuss the nature of phonology versus phonetics (Section 2.1.2).

2.1.1 Is there gradient phonology?


Phonology is most basically a system of contrasts, crucial to the conveyance of linguistic meaning. This suggests that phonology is in some sense 'categorical'. Central to most formal models of phonology is a characterization of minimally contrasting sound 'units' (whether in featural, segmental, or gestural terms) that form the building blocks of meaningful linguistic units. In what ways is phonology categorical—mirroring its function as defining contrast, and to what degree is phonology inherently gradient in its representation, production, perception, acquisition, social realization, and change over time?

1 A number of the ideas discussed in this chapter were developed in discussions in my graduate seminars at Cornell, Spring 2004 and Spring 2005. Some of these ideas were also presented in colloquia at the Universities of Buffalo and Cornell. Thanks to all of the participants in these fora for their insightful comments and questions. Special thanks to Mary Beckman, Jim Scobbie, and an anonymous reviewer for very helpful reviews of an earlier draft, as well as Johanna Brugman, Marc Brunelle, Ioana Chitoran, Nick Clements, Caroline Féry, Lisa Lavoie, Amanda Miller, and Draga Zec for their comments.

• The physical realization of sounds, understood (at least intuitively) as abstract units, is continuous in time and space, with the relationship between the specific acoustic cues and abstract contrasts often being difficult to identify.

• One crucial aspect of the acquisition of a sound system is understanding how phonetic differences are marshalled into defining abstract categories.

• Intraspeaker and interspeaker variation signal speaker identity, community identity, and attitude, while simultaneously conveying linguistic meaning through minimally contrasting elements.

• The results of many diachronic changes, understood to be 'regular sound change' in the Neogrammarian sense, are categorical, yet how do changes come about? Are the changes themselves categorical and abrupt or do the changes in progress exhibit gradience and gradual lexical diffusion?
A modular view of grammar such as that espoused by Chomsky and Halle (1968, SPE) frames our modelling of more categorical and more gradient aspects of such phenomena as belonging to distinct modules (e.g. phonology versus phonetics). While SPE-style models of sound systems have achieved tremendous results in the description and understanding of human language, strict modularity imposes divisions, since each and every pattern is defined as either X or Y (e.g. phonological or phonetic). Yet along any dimension that might have quite distinct endpoints, there is a grey area. For example, what is the status of vowel length before voiced sounds in English, bead [bi:d] versus beat [bit]? The difference is greater than that observed in many other languages (Keating 1985), but does it count as phonological?

Bearing in mind how a modular approach leads to a particular interpretation of the issues, I consider the relationship between phonology and phonetics before exploring the question of gradience in phonology.

2.1.2 The nature of phonetics versus phonology


A widely held hypothesis is that phonology is the domain of abstract patterns understood to be discrete and categorical, and phonetics is the domain of the quantitative realization of those patterns in time and space. These relationships are sketched out in (2.1).

(2.1) The relationship between phonology and phonetics:
      phonology = discrete, categorical
      phonetics = continuous, gradient

For recent discussions of this consensus view, see for example Keating (1996);
Cohn (1998); Ladd (2003), also individual contributions in Burton-Roberts
et al. (2000) and Hume and Johnson (2001). See also Cohn (2003) for a fuller
discussion of the nature of phonology and phonetics and their relationship.
For the sake of concreteness, consider an example of phonological patterns and their corresponding phonetic realization that are consistent with the correlations in (2.1). In Figure 2.1, we see representative examples of the patterns of nasal airflow in French and English (as discussed in Cohn 1990, 1993). Nasal airflow is taken here as the realization of the feature Nasal.

In the case of a nasal vowel in French, here exemplified in the form daim 'deer' [dɛ̃] (Figure 2.1a), there is almost no nasal airflow on [d] and there is significant airflow throughout the [ɛ̃]. Here we observe plateaus corresponding to the phonological specifications, connected by a rapid transition. In English on the other hand, during a vowel preceding a nasal consonant, such as [ɛ] in den [dɛn] (Figure 2.1b), there is a gradient pattern—or a cline—following the oral [d] and preceding the nasal [n] (which are characterized by the absence and presence of nasal airflow respectively). This is quite different from the pattern of nasalization observed on the vowel in cases like sent [sɛ̃t] (Figure 2.1c), in which case the vowel is argued to be phonologically nasalized (due to the deletion of the following /n/) and we observe a plateau of nasal airflow during the vowel, similar to the pattern seen in French.

Figure 2.1. Examples of nasal airflow in French and English following Cohn (1990, 1993): (a) French daim 'deer' /dɛ̃/; (b) English den /dɛn/; (c) English sent /sɛ̃t/

The observed differences between French and English relate quite directly to the fact that French has nasal vowels, but English does not.
If the correlations in (2.1) are correct, we expect to find categorical phonology, but not gradient phonology, and gradient, but not categorical, phonetics. Recent work calls into question this conclusion. In particular, it is evidence suggesting that there is gradience in phonology that has led some to question whether phonetics and phonology are distinct. Pierrehumbert et al. (2000) state the question in the following way:

this assertion [that the relationship of quantitative to qualitative knowledge is modular] is problematic because it forces us to draw the line somewhere between the two modules. Unfortunately there is no place that the line can be cogently drawn. . . . In short, knowledge of sound structure appears to be spread along a continuum. Fine-grained knowledge of continuous variation tends to lie at the phonetic end. Knowledge of lexical contrasts and alternations tend to be more granular. (Pierrehumbert et al. 2000: 287)
Let us consider the background of this issue in a bit more depth. Growing out of Pierrehumbert's (1980) study of English intonation, gradient phonetic patterns are understood as resulting from phonetic implementation, through a mapping of categorical elements to continuous events. Under the particular view developed there, termed generative phonetics, these gradient patterns are the result of interpolation through phonologically unspecified domains. Keating (1988) and Cohn (1990) extend this approach to the segmental domain, arguing that phenomena such as long distance pharyngealization and nasalization can be understood in these terms as well. For example, the cline in nasal airflow seen in the vowel [ɛ] in [dɛn] in Figure 2.1b is interpreted as resulting from phonetic interpolation through a phonologically unspecified span.
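The plateau-versus-cline distinction can be sketched numerically. This is a minimal illustration, not the actual implementation model: each segment either carries a [nasal] target (0 or 1) or is left unspecified (None), and unspecified spans are filled by linear interpolation between the flanking targets.

```python
def nasal_trajectory(targets, samples_per_segment=4):
    """Phonetic implementation sketch: segments specified for [nasal]
    surface as plateaus at their target value; a segment unspecified
    for [nasal] (None) is filled by linear interpolation between its
    neighbours' targets, yielding a cline."""
    out = []
    for i, t in enumerate(targets):
        if t is not None:
            out += [float(t)] * samples_per_segment
        else:
            prev = next(x for x in reversed(targets[:i]) if x is not None)
            nxt = next(x for x in targets[i + 1:] if x is not None)
            step = (nxt - prev) / (samples_per_segment + 1)
            out += [prev + step * (k + 1) for k in range(samples_per_segment)]
    return out

# English den /d V n/: the vowel is unspecified -> a rising cline.
den = nasal_trajectory([0, None, 1])
# A phonologically nasalized vowel after an oral stop -> a plateau.
daim = nasal_trajectory([0, 1])
```

The same machinery thus yields the French-type plateau when the vowel is specified [+nasal] and the English-type cline when it is unspecified, which is the core claim of the interpolation account.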
The phonology, then, is understood as the domain of discrete, qualitative
patterns and the phonetics as the domain of the continuous, quantitative
realization of those patterns. Intrinsic to this view is the idea that lexical
entries and phonological patterns are represented in terms of distinctive
features, taken to be abstract properties, albeit defined phonetically. These
are then interpreted in a phonetic component, distinct from the phonological
one. I refer to this as a mapping approach. A modular mapping approach has
been the dominant paradigm to the phonology–phonetics interface since the
1980s and has greatly advanced our understanding of phonological patterns
and their realization. Such results are seen most concretely in the success of
many speech-synthesis-by-rule systems both in their modelling of segmental
and suprasegmental properties of sound systems. (See Klatt 1987 for a review.)

An alternative to the types of approaches that assume that phonology and phonetics are distinct and that there is a mapping between these two modules or domains are approaches that assume that phonology and phonetics are one and the same thing, understood and modelled with the same formal mechanisms, what I term a unidimensional approach. A seminal approach in this regard is the theory of Articulatory Phonology, developed by Browman and Goldstein (1992 and work cited therein), where it is argued that the domains that are often understood as phonology and phonetics respectively can both be modelled with the same formalisms as constellations of gestures. Under this view, phonetics and phonology are not distinct and the apparent differences might arise through certain (never explicitly specified) constraints on the phonology. This gestural approach has served as fertile ground for advancing our understanding of phonology as resulting at least in part from gestural coordination. However, there are criticisms of this approach as a comprehensive theory of phonology, including arguments that Articulatory Phonology greatly overgenerates possible patterns of contrast. (See commentaries by Clements 1992 and Steriade 1990.)
More recently, there is a significant group of researchers (e.g. Flemming 2001; Kirchner 2001; Steriade 2001; see also Hayes et al. 2004) working within constraint-based frameworks, pursuing the view that there is not a distinction between constraints that manipulate phonological categories and those that determine fine details of the representation. This then is another type of unidimensional approach that assumes no formally distinct representations or mechanisms for phonology and phonetics. One type of argument in favour of this approach is that it offers a direct account of naturalness in phonology. However, the strength of this argument depends on one's view about the source(s) of naturalness in language. (See Blevins 2004 for extensive discussion of this issue.)
Such unidimensional views of phonology and phonetics also need to offer an account of not only what I term 'phonetics in phonology', but also of the relationship between phonological units and physical realities—'phonology in phonetics'. (See Cohn 2003 for a discussion of the distinct ways that phonology and phonetics interact with each other.) Independent of the account of naturalness, the question of whether one can adequately model the way that the phonetics acts on phonology still remains. Both Zsiga (2000) and Cohn (1998) have argued that such unidimensional approaches do not offer an adequate account. As documented by Cohn (1998), this is commonly seen in 'phonetic doublets', cases where similar but distinct effects of both a categorical and gradient nature are observed in the same language. These sorts of effects can be seen in the case of nasalization discussed above. In French, in
the realization of contrastive nasal vowels, there is nasal airflow resulting from the contrast and also from coarticulatory patterns, seen, for example, in the transition between oral vowels and nasal consonants. Both aspects need to be modelled. In the case of contextual nasalization in English, there are both long distance and more local effects seen in the physical patterns of nasal airflow that need to be accounted for.
The question of whether phonology and phonetics should be understood
as distinct modules needs to be approached as an empirical question. What
sort of approach gives us the best fit for the range of more categorical versus
more gradient phenomena?
There are clearly some grey areas—notably gradient phonology. Yet it is important to realize that just because it is difficult to know exactly where to draw the line (cf. Pierrehumbert et al. 2000), this does not mean there are not two separate domains of sound structure. The fact that it is difficult to draw a line follows in part from the conception of phonologization (Hyman 1976), whereby over time low-level phonetic details are enhanced to become phonological patterns. Phonologization by its very nature may result in indeterminate cases. As phonetic details are being enhanced, it will be difficult at certain stages to say that a particular pattern is 'phonetic' while another is 'phonological'. It has been suggested, for example, that vowel lengthening before voiced sounds in English is currently in this in-between state. The difficulty of drawing a line also relates to the sense in which categoriality can only be understood in both rather abstract and language-specific terms.
Recent work suggests that phonology and phonetics are not the same thing, but that the distinction might be more porous than assumed following strict modularity (e.g. Pierrehumbert 2002 and Scobbie 2004). Pierrehumbert (2002: 103) states: 'categorical aspects of phonological competence are embedded in less categorical aspects, rather than modularized in a conventional fashion.' We return below to the nature of the relationship between phonology and phonetics, as the status of gradient phonology plays a crucial role in this question.
In order to investigate gradience in phonology, we need a clearer understanding of what we mean by gradience and we need to consider how it might be manifested in different aspects of the phonology. I turn to these questions in the next section.

2.2 Aspects of gradience


Most basically, we understand gradient and gradience in opposition to categorical and categoriality. A gradient (n.) in its original sense is a mapping
from one continuous variable to another, that is, a slope. (In linguistic usage,
we use the form gradience as a noun and gradient as an adjective.) It has also
shifted to mean the continuous nature of a single variable.2 Thus we need to
be clear on which sense of gradient we are talking about. Discrete is often
equated with categorical and continuous with gradient (although there may
be gradient patterns that are discrete). We need to consider both the question
of what is gradient, as well as what is continuous.
The terms gradient and gradience have been used in a number of different ways in the recent phonetic and phonological literature. To think more systematically about the nature of gradience in phonology, we need to tease apart these different usages (Section 2.2.1) before considering how these senses might apply to different aspects of what is understood to be phonology—that is, contrast (Section 2.2.2), phonotactics (Section 2.2.3), and alternations, both morphophonemics (Section 2.2.4) and allophony (Section 2.2.5).

2.2.1 Different uses of the term gradience


When we talk about sound patterns, there are at least three senses of gradience
that have been prevalent in the recent literature—temporal/spatial gradience,
variability, and gradient well-formedness.3
2.2.1.1 Temporal/spatial gradience In work on the phonetic implementation
of phonology, gradient/gradience is used in the sense of change in phonetic
space through time. This is the sense of gradient versus categorical seen in the
example of the realization of nasalization shown in Figure 2.1. In this case,
there is a change in the amount of nasal airflow (space) through time,
characterized as being a cline, distinct from more plateau-like cases (argued
to obtain in cases of contrast). This is what I take to be the primary sense of
gradience versus categoriality as it applies to the domain of sound patterns
and their realization.
2.2.1.2 Variability The term gradience is also often used to refer to variable
realizations or outcomes of sound patterns, understood as unpredictable or as
stemming from various sociolinguistic and performance factors. We might
understand this as gradience in the sense of gradience across tokens.

2 Thanks to Mary Beckman (p.c.) for clarifying this question of usage.


3 There is an additional use of the term gradient in the recent phonological literature. Within
Optimality Theory, gradient has also been used to refer to constraint satisfaction (e.g. McCarthy and
Prince 1993; McCarthy 2003), where more violations of a particular constraint are worse than a single
violation. This is different from the other senses discussed here and will not come into play in the
present discussion.

Variability is sometimes understood in phonological terms as optional rule application, or freely ranked constraints, or as 'co-phonologies'. These patterns have sometimes been modelled in statistical or stochastic terms. There are also approaches that model these factors directly as sociolinguistic or stylistic markers. (See Anttila 2002 and Coetzee 2004 for discussion of recent approaches to modelling phonological variation.) Both variability and gradience in phonetic implementation are pervasive in phonetic patterns and both must ultimately be understood for a full understanding of phonology and its realization.
What we sometimes interpret as variability may in fact result from methodological approaches that are not fine-tuned enough in their characterization of conditioning factors or prosodic context. For example, the realization of the contrast between so-called 'voiced' and 'voiceless' stops in English is highly dependent on segmental context, position in the word, position in the utterance, location relative to stress, etc. The nature of contrast may also vary systematically by speaker (Scobbie 2004). If these factors are not taken into consideration, one would conclude that there is enormous variability in the realization of these contrasts, while in fact much of the variation is systematic.
It is not necessarily the case that temporal/spatial gradience and variability
go hand in hand. In fact, there are well documented cases where they do not,
that is, cases of variability that involve quite distinct categorical realizations.
For example, this is the case with the allophones of /t/ and /d/ in English as
documented by Zue and Laferriere (1979). There are also patterns of temporal/
spatial gradience that are highly systematic, as numerous studies of coarticu-
lation and phonetic implementation show.
These issues are also closely related to the question of sources of diachronic change and the issue of whether change is gradual. The nature of variation as it is manifested in the social system and its relationship to diachronic change are very important issues, but not ones that I pursue here. (See work by Labov, Scobbie, Bybee, and Kiparsky for recent discussions.)

2.2.1.3 Gradient well-formedness There is gradience across the lexicon, or statistical knowledge, as documented in recent work by Pierrehumbert, Frisch, and others. (See Frisch 2000 for a review and Bod et al. 2003 for recent discussion.) Here we talk about gradient well-formedness, the idea that speaker/hearers make relative judgements about the well-formedness of various sound structures. In the case of phonotactics, this is understood as resulting from stochastic generalizations across the lexicon. Such gradient well-formedness judgements are observed in other aspects of
the phonology, as well as other domains including both morphology and syntax. (See other chapters, this volume.) In such cases, it is the judgement about well-formedness or grammaticality that is gradient, not a physical event in time and space such as in the first sense.

We turn now to the question of how gradience might be manifested in the different facets of phonology, focusing primarily on temporal/spatial gradience and gradient well-formedness.

2.2.2 Contrast
Fundamental to a phonological system is the idea of lexical contrast: some phonetic differences in the acoustic signal result in two distinct lexical items, that is, minimal pairs. This is also the basis upon which inventories of sounds are defined. The term contrast is used in two rather different senses: underlying or lexical contrast, and surface contrast, that is, identifiable phonetic differences independent of meaning. The question of surface contrast sometimes arises when comparisons are made between phonological categories across languages. It also often arises in the discussion of phonological alternations that affect lexical contrasts in terms of neutralization or near-neutralization. Cases of complete phonological neutralization should result in no cues to underlying differences or contrast. Yet many cases of what are claimed to be complete neutralization exhibit subtle phonetic cues that differentiate between surface forms. (For a recent discussion and review of such cases involving final devoicing, see Warner et al. 2004). Under one interpretation, such cases can be understood as gradient realization of contrast. Due to space limitations, I do not pursue the issue of near-neutralization here.
We might wonder if contrast is all or nothing, or whether it too might be gradient in the sense of exhibiting gradient well-formedness. Within generative grammar, we understand contrast in absolute terms. Two sounds are either in contrast or they are not. Many contrasts are very robust. Yet, contrast can also be much more specific or limited. (See Ladd 2003 for a discussion of some such cases.) There are certain sounds that contrast in some positions, but not others (that is, positional neutralization). For example, even for speakers who maintain an /a/–/ɔ/ contrast in American English, this contrast holds only before coronals and in open syllables. What is the nature of realization of these sounds before non-coronals? Do speakers produce the 'same' vowel in fog and frog? There are also some sounds that contrast in all positions in the word, but where the functional load of the contrast is very limited, such as in the case of /θ/ versus /ð/ in English (thigh vs. thy, ether vs. either, Beth vs. eth, that is [ð]). Is contrast realized the same way in these cases
as in the more robust cases? Or should contrast also be understood as a gradient property? I will not pursue this question here, but it might well be that contrast is more gradient in nature than often assumed and so robustness of contrast might well prove to be an interesting area for investigation. Lexical neighbourhood effects as well as phonological conditioning might both come into play.

2.2.3 Phonotactics
A second aspect of sound systems widely understood to constitute part of phonology is allowable sound combinations or sequences—phonotactics. Some aspects of phonotactics appear to be defined by segmental context, especially immediately preceding and following elements; some aspects are defined by prosodic position, often best characterized in terms of syllable structure; and some aspects are defined by morpheme- or word-position. Under many approaches to phonology, phonotactic patterns are understood to be categorical in nature. Particular combinations of sounds are understood to be either well-formed or ill-formed. Following most generative approaches to phonology, both rule-based and constraint-based, phonotactic patterns are captured with the same formal mechanisms as phonological alternations. Typically, phonotactic and allophonic patterns closely parallel each other, providing the motivation for such unified treatments. It is argued that distinct treatments would result in a 'duplication' problem (e.g. Kenstowicz and Kisseberth 1977).
Recent work by a wide range of scholars (e.g. Pierrehumbert 1994, Vitevitch et al. 1997, Frisch 2000, Bybee 2001, and Hay et al. 2003) suggests that phonotactic patterns can be gradient, in the sense that they do not always hold 100 per cent of the time. Phonotactic patterns may reflect the stochastic nature of the lexicon, and speaker/hearers are able to make judgements about the relative well-formedness of phonotactic patterns.

As an example, consider the phonotactics of medial English clusters, as analysed by Pierrehumbert (1994). Pierrehumbert asks how we can account for the distribution of medial clusters, that is, the fact that certain consonant sequences are well-formed but others are not, for example /mpr/, /ndr/ but not */rpm/ or */rdn/. A generative phonology approach predicts: medial clusters = possible codas + possible onsets. A stochastic syllable grammar, by contrast, makes different predictions: 'the likelihood of medial clusters derived from the independent likelihoods of the component codas and onsets' (1994: 174) and 'The combination of a low-frequency coda and a low-frequency onset is expected to be a low-frequency occurrence' (1994: 169).
Pierrehumbert carried out a systematic analysis of a dictionary and found roughly fifty monomorphemic medial clusters. In the same dictionary, there were 147 possible codas and 129 possible onsets. If these were freely combining, there would be predicted to be 18,963 medial clusters. With some expected restrictions, Pierrehumbert concludes that we would still expect approximately 8,708. Pierrehumbert observes 'It turned out that almost all the occurring triconsonantal clusters were among the 200 most likely combinations, and that a stochastic interpretation of syllable grammar effectively ruled out a huge number of possible clusters, eliminating the need for many idiosyncratic constraints in the grammar' (1994: 169). Pierrehumbert then discusses the systematic restrictions that play a role in determining the particular fifty or so medial combinations that are attested among the 200 most likely. She concludes that a stochastic syllable grammar understood in the context of certain more traditional sorts of phonological constraints accounts for the observed patterns.
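The core of such a stochastic syllable grammar can be sketched in a few lines. The coda and onset probabilities below are invented toy values for illustration, not Pierrehumbert's dictionary counts; the point is only the scoring scheme, in which a medial cluster's expected likelihood is the product of the independent likelihoods of its component coda and onset:

```python
# Toy sketch of a stochastic syllable grammar in the spirit of
# Pierrehumbert (1994): the expected likelihood of a medial cluster
# is the product of the independent likelihoods of its coda and onset.
# The probabilities below are invented for illustration only.

coda_prob = {"m": 0.20, "n": 0.25, "r": 0.15, "mp": 0.05, "nd": 0.04}
onset_prob = {"r": 0.18, "dr": 0.03, "pr": 0.02, "n": 0.22, "m": 0.20}

def cluster_likelihood(coda, onset):
    """Expected likelihood of a medial cluster = P(coda) * P(onset)."""
    return coda_prob.get(coda, 0.0) * onset_prob.get(onset, 0.0)

def ranked_clusters():
    """All coda+onset combinations, most likely first."""
    combos = [(c + o, cluster_likelihood(c, o))
              for c in coda_prob for o in onset_prob]
    return sorted(combos, key=lambda pair: pair[1], reverse=True)

# A low-frequency coda plus a low-frequency onset is predicted to be
# a low-frequency (and so possibly unattested) medial cluster:
assert cluster_likelihood("mp", "r") > cluster_likelihood("mp", "pr")
```

Ranking all combinations this way is what allows the model to say that the attested clusters sit among the most likely combinations, without listing idiosyncratic constraints against each unattested one.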
Recent work in psycholinguistics shows that speakers have access in at least some situations to very fine details, including both speaker-specific and situation-specific information. (See Beckman 2003 and Pierrehumbert 2003 for reviews and discussion of this body of work.) Thus, it is not that surprising that speakers are sensitive to degrees of well-formedness in phonotactic patterns and that these parallel in some cases distributions in the lexicon.

This leads us to two important issues. First, are phonotactic patterns and other aspects of phonology (contrast, morphophonemics, and allophony) as closely associated with each other as has been assumed in the generative phonological literature? Perhaps, while similar and in some cases overlapping, phonotactics and other aspects of phonological patterning are not necessarily the same thing. This suggests that the standard generative phonology approach is reductionist in that it collapses distributional generalizations across the lexicon with other aspects of what is understood to be phonology. Second, evidence suggests that we have access to finer details in at least some situations/tasks, and some of these finer details may play a role in characterizing lexical entries. Thus, it cannot be, as is often assumed following theories of underspecification in generative phonology, that lexical representations consist only of highly sparse contrastive information (e.g. pit /pit/, spit /spit/). We will not reach insightful conclusions about the nature of phonology if we just assume that lexical representations capture only contrast. These two widely held assumptions of generative phonology need to be revisited.
However, there are two important caveats on the other side. Just because we are sensitive to finer details does not mean that we cannot abstract across the lexicon. To assume that we do not is to fall prey to the duplication problem from the other side. Pierrehumbert (2003) argues that some phonotactic knowledge is not tied to frequency and indeed is true abstraction across the lexicon; that is, there is phonological knowledge independent of statistical generalizations across the lexicon. 'In light of such results, I will assume, following mainstream thought in linguistics, that an abstract phonological level is to be distinguished from the lexicon proper.' (2003: 191). This suggests that we have access to both fine-grained and coarse-grained levels of knowledge and that they co-exist (see Beckman 2003 and Beckman et al. 2004). We would predict a (negative) correlation between the degree of gradience and the level of abstraction.

2.2.4 Alternations (morphophonemics)

In many ways, the core phenomena understood to constitute phonology are alternations. The most canonical types are morphophonemic alternations, where the surface form of a morpheme is systematically conditioned by phonological context. Alternation is also used to refer to allophonic alternation, where particular phones are in complementary distribution and are thus argued to be variants of the same underlying phoneme. Positional allophones are argued to alternate in their distribution based on phonological context. We consider morphophonemic alternations in this subsection and allophony in Section 2.2.5.

Assuming we can draw appropriate boundaries (delineating the cases that are phonologically conditioned, productive, and not morpheme-specific), morphophonemic alternations are at the very core of what most phonologists think of as phonology. Most alternations are understood to be quite categorical in nature, often involving the substitution of distinct sounds in particular environments. Following a Lexical Phonology approach (e.g. Kiparsky 1982), such alternations are understood to be part of the lexical phonology and are assumed to respect structure preservation. If these sorts of cases are shown to involve gradience, this would strike at the core of our understanding of the phonology, since these are the least disputable candidates for 'being phonology'.
A widely cited claim arguing for gradience in phonology is that made by Steriade (2000). Parallel to phonological paradigm uniformity effects, which are taken to account for some 'cyclic' effects (e.g. Benua 1998; Kenstowicz 2002), Steriade argues that there are phonetic paradigm uniformity effects, where non-contrastive phonetic details may be marshalled to indicate morphological relatedness.

Consider first a canonical example of so-called paradigm uniformity effects. Many languages show overapplication or underapplication of phonological patterns that result in phonological similarity within morphologically related forms, despite the absence of the relevant phonological conditioning context. For example, in Sundanese there is a general pattern of vowel nasalization, whereby vowels become nasalized after a nasal consonant, unless blocked by a non-nasal supra-laryngeal consonant (Robins 1957). This is exemplified in (2.2a). There is overapplication of nasalization in infixed forms indicating plurality or distributedness (2.2b).
(2.2) Nasalization in Sundanese (Cohn 1990)
      a. /ɲiar/   [ɲĩãr]   'seek' (active)
         /niis/   [nĩʔĩs]  'relax in a cool place' (active)
         /ŋatur/  [ŋãtur]  'arrange' (active)
         /ŋuliat/ [ŋũliat] 'stretch' (active)
      b. Singular                           Plural
         /ɲiar/ [ɲĩãr] 'seek' (active)      /ɲ=al=iar/ [ɲãlĩãr]
         /niis/ [nĩʔĩs] 'relax' (active)    /n=ar=iis/ [nãrĩʔĩs]

In derivational approaches, this overapplication follows from a cyclic analysis, where vowel nasalization reapplies after infixation (e.g. Cohn 1990). However, such a solution is not available in non-serial approaches such as most Optimality Theoretic approaches. One account within Optimality Theory is that such patterns result from Output–Output constraints, reflecting the morphological relationships between words (Benua 1998). Such phonological parallels are enforced by paradigm uniformity.
Steriade (2000) argues that not only phonological properties (those that are potentially contrastive) show such effects, but that 'paradigmatic uniformity is enforced through conditions that govern both phonological features and properties presently classified as phonetic detail, such as non-contrastive degrees in the duration of consonant constrictions, non-contrastive details in the implementation of the voicing contrast, and degree of gestural overlap.' (2000: 314). She then goes on to say that 'There is a larger agenda behind this argument: the distinction between phonetic and phonological features is not conducive to progress and cannot be coherently enforced.' (2000: 314)

This very strong claim rests on two cases. The first case is schwa deletion in French, where paradigm uniformity is argued to be responsible for the subtle differences between forms such as pas d'rôle 'no role' and pas drôle 'not funny', where the syllable-initial character of [ʁ] is maintained in the first case, despite the deletion of schwa. The second is flapping in American English: the observation (made by Withgott 1983 and others) that in some cases where the phonological environment is met for flapping, flapping does not occur is argued to be due to subphonemic paradigm uniformity, for example cápiDalist : cápiDal, but mìlitarístic : mílitary.
In Steriade's argument concerning flapping there are two crucial assumptions. First, 'we suggest that PU [paradigm uniformity] (STRESS) should characterize not only stress identity between syllables but also the use of individual stress correlates (such as duration, pitch accents, vowel quality) to flag the stress profile of the lexical item to be accessed.' (2000: 321). In effect, what Steriade hopes to conclude—that non-contrastive details can drive paradigm uniformity—becomes a working assumption, making the argument circular. Second, 'The difference between [ɾ] and [t]/[d] is a function of closure duration . . . The extra-short duration of [ɾ] is a candidate for a never-contrastive property' (2000: 322). In fact, there are a number of other candidates for the difference between flap and [d/t], some of which are contrastive properties (such as sonority). Steriade conducted an acoustic study of twelve speakers uttering one repetition each of several pairs of words, with judgements based on impressionistic listening (which turns out to be rather unreliable in identifying flapping). Based on the results of the study, she concludes that PU (stress: duration) is responsible for observed base–derivative correspondence.
However, a recent experiment by Riehl (2003a, 2003b), designed to replicate Steriade's finding, calls into question Steriade's (2000) conclusions about the nature of the paradigm uniformity effect. Riehl recorded six speakers, with twelve repetitions of each form, using similar pairs to those in Steriade's study. She undertook an acoustic analysis of the data (including measures of closure duration, VOT, presence or absence of burst, and voicing duration during closure) and also a systematic perceptual classification by three listeners, in order to compare the perception and consistency of perception with the acoustic realization.

In Riehl's data, there were four relevant pairs of forms where phonologically one might expect a flap in one case and a stop in the other, as in the examples studied by Steriade. There were 24 possible cases of paradigm uniformity (4 forms × 6 speakers) where 12/12 forms could have shown both flaps or both stops. Since there was quite a bit of variation in the data, Riehl counted either 12/12 or 11/12 cases with the same allophone as showing 'uniformity'. Out of the 24 cases, there were 7 that showed uniformity or near uniformity and 17 with variation within forms and within pairs. Thus the case for paradigm uniformity was weak at best. In cases of variation, stops were usually produced earlier in the recordings, flaps later, arguably showing a shift from more formal to more casual speech (highlighting the importance of looking at multiple repetitions). Moreover, Riehl found that the coding of particular tokens as flaps or stops was not as straightforward as often assumed, and that the perception of flaps correlated best with VOT, not closure duration.
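Riehl's counting criterion can be sketched as a simple tally. The function name and token labels below are hypothetical, intended only to illustrate the 11-of-12 threshold, not Riehl's actual data:

```python
# Sketch of Riehl's uniformity criterion: a speaker-by-form cell
# (12 repetitions) counts as 'uniform' if at least 11 of the 12
# tokens show the same allophone. Token labels here are invented.

def is_uniform(tokens, threshold=11):
    """True if some single allophone accounts for >= threshold tokens."""
    return max(tokens.count(a) for a in set(tokens)) >= threshold

cell_flap_heavy = ["flap"] * 11 + ["stop"]   # 11/12 -> counts as uniform
cell_mixed = ["flap"] * 7 + ["stop"] * 5     # 7/12  -> counts as variable

assert is_uniform(cell_flap_heavy)
assert not is_uniform(cell_mixed)
```

Applying this test to each of the 24 speaker-by-form cells is what yields the 7-uniform versus 17-variable split reported above.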
This does not mean that there is no morphological influence on flapping, but suggests that the pattern may not be that strong. There is also a lack of compelling evidence to show that these effects are best understood as subphonemic paradigm uniformity. Steriade's conclusions regarding French schwa are also not that secure. It is not clear whether these effects are really what we understand to be paradigm uniformity; rather, this interpretation seems to be driven by Steriade's assumption that phonology and phonetics are not distinct. (Barnes and Kavitskaya 2002 also question Steriade's conclusions in the case of schwa deletion in French.) Does this mean that there are no gradient effects in the domain of morphophonemics? A more convincing case of morphological influences on phonetic realization may be the degree of glottalization in English correlating with degree of morphological decompositionality, for example realign versus realize, as discussed by Pierrehumbert (2002): 'The model predicts in particular the existence of cases in which relationship of phonetic outcomes to morphological relatedness is gradient.' (2002: 132). The question is how close the correlation between morphological decompositionality and phonetic realization is, and how best to model this correlation. I fully agree with Pierrehumbert that 'More large-scale experiments are needed to evaluate this prediction.' (2002: 132)

2.2.5 Allophony

The final aspect of phonology is allophony. Based on the definitions of SPE, allophony is understood to be part of phonology, due to its language-specific nature. There has been much discussion in the literature about whether allophony is necessarily categorical in nature or whether there are gradient aspects of allophony. There are also many cases of what was understood as allophony in categorical terms that have been shown, based on instrumental studies, to be gradient. This is the case of anticipatory nasalization in English discussed in Cohn (1990, 1993) and the case of velarization of [l] in English as discussed by Sproat and Fujimura (1993). Such cases raise three issues.

1. Based on impressionistic description, much work on allophony suggests that allophony is quite categorical in nature. Yet both the tools we use (careful listening) and the symbols available to us (phonetic transcription, which is discrete in nature) bias our understanding of these patterns as categorical.
2. There has been a wide body of work arguing for a rethinking of the SPE definition of what is phonology and what is phonetics. Much work has identified the language-specific nature of phonetic patterns (e.g. Chen 1970; Keating 1985; Cohn 1990; Kingston and Diehl 1994), leading to a rethinking of where we draw the boundary between phonetics and phonology. Under these approaches many cases that have been thought of as phonological have been reanalysed as phonetic.
3. This still leaves us with the question of where to draw the line and whether we should draw a line. We return to this question in Section 2.3.

Gradience in allophony has also been argued for in a rather different sense. Bybee (2001) and Jurafsky et al. (2001), among others, argue that lexical (token) frequency affects allophony, in the sense that more frequent words are observed to be shorter and phonologically reduced. Bybee (2001 and earlier work) has argued that what is understood as allophony in generative phonology cannot follow from general rules or constraints, because there are frequency effects on the realization of non-contrastive properties. If what we think is allophony falls along a continuum rather than in two or three discrete categories, and if there is a strong correlation between the realization of a particular non-contrastive property and frequencies of particular lexical items in the lexicon, then this would be difficult to model in standard generative phonological models.

One widely cited case in this regard is schwa deletion in the context of a resonant in English. Bybee (2001), citing an earlier study based on speaker self-characterization (Hooper 1976, 1978, a.k.a. Bybee), argues that it is not the case that schwa is either deleted or present, but rather that there are degrees of shortening. She observes impressionistically a continuum from [Ø], to syllabic resonants, to schwa + resonant, for example every [Ø], memory [r̩], mammary [ər] (where a syllabic resonant is thought to be shorter than a schwa plus resonant). It is further argued that these different realizations correlate with lexical (token) frequency; that is, there is complete deletion in the most common forms, schwa plus a resonant in the least frequent forms, and syllabic resonants in the cases which fall in between. This is understood to follow from the view that sound change is lexically and phonetically gradual, so that 'schwa deletion' is farther along in high-frequency words.
Lavoie (1996) tried to replicate Bybee's finding with a more systematic study including instrumental data. Her study included acoustic measurements of multiple repetitions of near-minimal triplets, sets that were similar in their phonological structure and differed in relative frequency both within the sets and in terms of absolute frequency across the data set, based on frequency from Francis et al. (1982). Crucially, when frequency was plotted against duration, no correlation was found. Rather, there was a robust subpattern of complete deletion of schwa in many forms independent of lexical frequency, and there was variation in duration independent of lexical frequency. Thus schwa deletion in English does not provide the kind of evidence that Bybee suggests for allophony being driven by lexical token frequency. (The other cases widely discussed by Bybee in this regard, such as aspiration of /s/ in Spanish and /t, d/ deletion in English, also warrant careful reconsideration.)
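Lavoie's test amounts to correlating token frequency with measured duration. A minimal sketch, using invented measurements: Bybee's hypothesis would predict a strongly negative correlation, and Lavoie's point is that the observed correlation was absent.

```python
# Sketch of the frequency-vs-duration test: compute a Pearson
# correlation between lexical (token) frequency and measured schwa
# duration. The measurements below are invented for illustration.

from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

log_frequency = [1.2, 2.5, 3.1, 3.8, 4.6]      # hypothetical log counts
duration_ms = [48.0, 55.0, 42.0, 57.0, 46.0]   # hypothetical durations

r = pearson_r(log_frequency, duration_ms)
# Under the frequency-reduction hypothesis r should be strongly
# negative; a value near zero is what Lavoie in fact reports.
```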
We need to consider the question of whether there are cases where gradient phonological patterns correlate with lexical (token) frequency. The short answer is yes, but the best documented cases in this regard are of a very different sort from those mentioned above. When token frequency differences correlate with function versus content word differences, frequency in such cases indeed has a major effect on the realization of sound patterns. Function words show much more reduced and variable realization than content words. See for example Lavoie's (2002) study of the reduction and phonetic realization of for versus four, and Jurafsky et al.'s (2001) study of both reduction and variability in function words. Bybee assumes that it is token frequency that differentiates function words and content words, yet these frequency effects can also be understood to follow from the prosodic difference between content and function words. (For recent discussion of the prosodic structure of lexical versus functional categories, see e.g. Zec 2002.)
Unequivocal support for Bybee's claim would come from duration differences strongly correlated with token frequency differences found within the same lexical category, with appropriate controls for discourse context, priming effects, and so forth. Cohn et al. (2005) investigate the phonetic durations of heterographic pairs of homophonous English nouns that differ in token frequency. Homophonous pairs were grouped into three categories based on the magnitude of the frequency difference between the members of each pair, as determined by relative frequencies in five large corpora. This included Large Difference pairs (e.g. time ~ thyme, way ~ whey), Medium Difference pairs (e.g. pain ~ pane, gate ~ gait), and little or No Difference pairs (e.g. son ~ sun, peace ~ piece). Four native speakers of American English participated in two experiments. In the first experiment, the speakers were recorded reading four repetitions of a randomized list of the target words in a frame sentence. In the second experiment, a subset of these words was read in composed sentences with controlled prosodic structures. The phonetic duration of each target word was then measured in Praat, and the ratio more frequent/less frequent was calculated for each repetition of each pair. If the hypothesis that greater frequency leads to shorter duration is correct, then these ratios should systematically fall below one for the Large Difference and Medium Difference pairs, while those for the little or No Difference group should be approximately one. No systematic differences were found for individual speakers or across speakers in either the frame sentences or the composed sentences. The lack of positive correlation between duration and token frequency calls into question the hypothesis that greater frequency leads to shorter duration. These results are interesting in light of Jurafsky's (2003) observation that evidence for frequency effects is better documented in comprehension than production. On the production side, effects are much more robustly documented for latency in lexical access than in phonetic duration differences. These results and observations highlight the need for a better understanding of the locus of frequency effects in the lexicon and in speech production.
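The ratio analysis in this experiment can be sketched in a few lines; the durations below are invented for illustration and are not Cohn et al.'s measurements:

```python
# Sketch of the ratio analysis for homophone pairs: for each
# repetition, divide the duration of the more frequent member by
# that of the less frequent one. If frequency shortens words,
# these ratios should fall systematically below 1. The durations
# below are invented for illustration.

def duration_ratios(more_frequent_ms, less_frequent_ms):
    """Per-repetition ratios: more-frequent / less-frequent."""
    return [m / l for m, l in zip(more_frequent_ms, less_frequent_ms)]

def mean(xs):
    return sum(xs) / len(xs)

time_ms = [312.0, 305.0, 298.0, 321.0]    # hypothetical 'time' tokens
thyme_ms = [308.0, 310.0, 301.0, 317.0]   # hypothetical 'thyme' tokens

ratios = duration_ratios(time_ms, thyme_ms)
# A mean ratio near 1, as reported for all three difference groups,
# argues against the frequency-shortening hypothesis for these pairs.
```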

2.3 Conclusions and implications

Having considered the evidence for three cases of gradience in the phonology in Sections 2.2.3–5, we now return to the broader question: Is there gradience in the phonology? Not surprisingly, the answer seems to be yes and no. It depends on what we mean by gradience and it depends on which facets of the phonology we consider. The clearest evidence for gradience among the cases that we considered is gradient well-formedness, as documented in the case of phonotactics. It was less clear that there was a convincing empirical basis for the specific claims made by Steriade in terms of subphonemic effects in paradigm uniformity and those made by Bybee regarding frequency effects in allophony. However, the shakiness of the specific cases does not answer the question of whether there is gradience in phonology in the areas of morphophonology and allophony. In both cases, the conclusion about whether there is gradience in the phonology depends in part on the definition of phonology and on how we understand phonology in relationship to phonetics.

This leads us back to the question, discussed in Section 2.1.2, whether phonetics and phonology are distinct domains. A modular view of the grammar necessarily leads us to a mapping approach between phonology and phonetics. On the other hand, focusing primarily on the grey area, cases that are particularly difficult to classify, and defining similarity as 'duplication' lead us to a unidimensional view.
Let us return to the observation by Pierrehumbert et al. (2000) that knowledge of sound structure falls along a continuum, with more fine-grained knowledge tending to lie at the phonetic end and lexical contrast and alternations being more granular. This continuum is schematized in Figure 2.2a, with phonetics versus phonology on the x-axis and degree of granularity on the y-axis. Consider the schematic distribution of the data: the modular approach suggests a distribution such as that in Figure 2.2b, with little or no grey area. The unidimensional approach suggests a distribution such as that in Figure 2.2c, with little correlation between the two dimensions. Yet the data clearly fall somewhere between these two views. How can we understand and model this distribution?

Figure 2.2. (a) Continuum between phonetics and phonology (x-axis) and fine-grained and granular (y-axis) dimensions of speech; (b) distribution of data, modular approach; (c) distribution of data, unidimensional approach
Two methodological issues contribute to the perceived cost of 'duplication' and to the tendency to avoid duplication through reductionism. The first is the nature of modularity. Hale and Reiss (2000: 162) state: 'The modular approach to linguistics, and to science in general, requires that we both model the interactions between related domains, and also sharply delineate one domain from another.' But we need to ask: Is there strict modularity? Does modularity entail sharp delineation? Could there be modularity that is not rigid? The lack of strict modularity is implicit in approaches that understand the relationships between linguistic domains through interfaces. If we do not subscribe to strict modularity between phonology and phonetics and between phonology and the lexicon, then it becomes an empirical question whether drawing a distinction is useful. Does a division of labour contribute to both descriptive adequacy and explanatory adequacy?

The second is the status of Occam's Razor, or the principle of parsimony. Perhaps Occam's Razor does not hold as strongly as we believe. There is redundancy in language. Redundancy is widely observed in the domain of phonetics in terms of multiple and varied cues to the realization of particular phonological structures. Even cases of what we understand to be a straightforward phonological contrast may involve multiple cues. Evidence suggests that lexical representations include multiple levels of detail, including the kind of sparse abstract representations widely assumed in generative phonology and much more fine-grained levels of detail. (See Beckman et al. 2004 for discussion and a specific proposal in this regard.) Not only is there redundancy within domains, but there appears to be redundancy across domains, so 'duplication' is not a problem, but in fact an intrinsic characteristic of language. Increasingly there is agreement that unidimensional or reductionist views are not sufficient (see Pierrehumbert 2001: 196). Attempting to understand sound structure in only abstract categorical terms or in only the gradient details, or trying to understand the nature of the lexicon in exactly the same terms in which we try to understand phonology, is insufficient.
In conclusion, the relationship between phonetics and phonology is a multifaceted one. It reflects phonetic constraints that have shaped synchronic phonological systems through historical change over time. Synchronically, phonological systems emerge as a balance between the various demands placed on the system, but the evidence suggests that phonology cannot be reduced to the sum of these influences. Phonetics and phonology also need to be understood in relationship to the lexicon. There are parallels and overlaps between these three areas, but none of them is properly reduced to or contained in the others. Language patterns are fundamentally fluid. There is evidence of phonologization, grammaticalization, lexicalization, and so forth. Similar patterns can be observed across these domains. To reach a fuller understanding of the workings of the sound system and the lexicon, we need to be willing to reconsider widely held assumptions and ask in an empirically based way what the connection is between these domains of the linguistic system.
3

Gradedness: Interpretive Dependencies and Beyond

ERIC REULAND

3.1 Introduction
During the last decades it has been a recurrent theme whether or not the dichotomy between grammatical versus ungrammatical, or well-formed versus ill-formed, should not be better understood as a gradient property (cf. Chomsky's (1965) discussion of degrees of grammaticality).1 If so, one may well ask whether gradedness is not an even more fundamental property of linguistic notions. The following statement in the announcement of the conference from which this book originated presupposes an affirmative answer, and extends it to linguistic objects themselves, making the suitability to account for gradedness into a test for linguistic theories: 'The kind of grammar typically employed in theoretical linguistics is not particularly suited to cope with a widespread property of linguistic objects: gradedness.'2 This statement implies that we should strive for theories that capture gradedness. To evaluate it one must address the question of what 'gradedness' as a property of linguistic objects really is. The issue is important. But it is also susceptible to misunderstandings. My first goal will be to show that gradedness is not a unified phenomenon. Some of its manifestations pertain to language use rather than to grammar per se. Understanding gradedness may therefore help us shed light on the division of labour among the systems underlying language and its use. Showing that this is the case will be the second goal of this contribution.

1 This material was presented at the 'Gradedness conference' organized at the University of Potsdam, 21–23 October 2002. I am very grateful to the organizers, in particular Gisbert Fanselow, for creating such a stimulating event. I would like to thank the audience and the two reviewers of the written version for their very helpful comments. Of course, I am responsible for any remaining mistakes.
2 This statement is taken from the material distributed at the conference.
46 The Nature of Gradience

The introductory material to this book connects the discussion of


gradedness with the notion of idealization in grammar. As the text puts it:
The formulation of grammatical models is often guided by at least two idealizations:
the speech community is homogeneous with respect to the grammar it uses
(no variation), and the intuitive judgements of the speakers about the grammaticality
of utterances are categorical and stable (no gradedness). There is a growing conviction
among linguists with diVerent theoretical orientations that essential progress could be
made even in the classical domains of grammar if these idealizations were given up.

This assessment as such may well be correct, although it pertains to the


sociology of the Weld rather than to linguistic theorizing itself. However, it
is important to see that the idealizations that are being given up do not reXect
the idealization underlying the competence-performance distinction formu-
lated in Chomsky (1965):
Linguistic theory is concerned primarily with an ideal speaker–listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. . . . To study actual linguistic performance, we must consider the interaction of a variety of factors of which the underlying competence of the speaker–hearer is only one. (Chomsky 1965: 3)

This quote does not state that the speech community is homogeneous, nor that one should not study the nature of variation between speech communities. It also does not claim that intuitive judgements of the speakers about the grammaticality of utterances are categorical and stable. In fact one should not expect them to be. Linguistic data are like any empirical data. Whether one takes standard acceptability judgements of various kinds, truth value judgement tasks, picture identification tasks on the one hand, or eye-tracking data and neuro-physiological responses in brain tissue on the other, they all have the ‘ugly’ properties of raw empirical data in every field. Depending on the nature of the test, it may be more or less straightforward to draw conclusions about grammaticality, temporal or spatial organization of brain processes, etc. from such data. In fact, Chomsky says no more than that the ‘study of language is no different from empirical investigation of other complex phenomena’, and that we should make suitable idealizations in order to make effective empirical investigation feasible at all. For anyone who can see that watching a tennis match is not the best starting point for getting an understanding of the laws of motion, Chomsky’s point should be pretty straightforward.
Gradedness: Interpretive Dependencies and Beyond 47

Gradedness in linguistics may come from at least four sources:


1. linguistic processes could have an analogue character as opposed to being discrete;
2. properties of languages may be tendency based versus system governed;
3. in certain domains one may find individual variability (variability both across individuals and within individuals at different occasions, in judgement or in production/perception);
4. linguists may find systematic distinctions in acceptability judgements on a scale.
In the next section I will discuss these in turn.

3.2 Categorizing gradedness


3.2.1 Gradedness I: the analogue–discrete distinction
Much of our current thinking about language is intimately connected to what is called the computational theory of the mind. Language is a system allowing us to compute the systematic correlations between forms and their interpretations. Right from the beginning, generative models of language were based on computations using discrete algebras, reflecting properties of sets of strings defined over finite alphabets. The question of whether a certain step in a computation is admissible or not can only be answered with yes or no; the question of whether a certain string belongs to the language generated by a certain grammar admits only three answers: yes, no, or undecidable. It makes no sense to say that a certain string approximates membership of a language, just as the model has no room for saying that one item is slightly more of a member of the alphabet than another. The question is whether this discrete model of linguistic computations may be fundamentally wrong.
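The yes/no character of membership can be made concrete with a toy recognizer; the grammar below (aⁿbⁿ over the alphabet {a, b}) is an illustrative assumption, not an example from the text:

```python
# Membership in the language generated by a grammar is discrete:
# a recognizer for a toy grammar generating a^n b^n answers yes
# or no, never "almost a member".

def in_language(s):
    """True iff s is a^n b^n for some n >= 0."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

print(in_language("aabb"))   # True
print(in_language("aab"))    # False: no degree of near-membership
```

Whatever grammar one substitutes, the recognizer's answer is categorical; gradedness has no foothold at this level of description.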
One could argue that biological processes are fundamentally analogue in nature, and that neural systems may encode differences in information states by differences in degree of excitation. Such a position appears to be implied by much work in connectionist modelling of language, since Rumelhart et al. (1986b).
However, the situation is not that simple. As argued by Marantz (2000) one cannot maintain that biological systems are analogue: ‘Genetics, at the genetic coding level, is essentially digital (consider the four ‘‘letters’’ of the genetic code). In addition, neurons are ‘‘digital’’—there are no percentages of a neuron; you have a cell or you don’t. Neuron firing can be conceived of as digital or analogue.’ That is, whether or not a neuron fires is a yes–no issue; but different firings may vary along certain physical dimensions.

The issue is how the differences along those dimensions affect the transmission of information to other parts of the system. As we all may know from inspecting our watches, a digital system can mimic an analogue process and an analogue system can mimic a digital process, and in fact all watches are based on conversions from one type of process to another. In the brain it need not be different. And even if at some level the brain architecture had connectionist-type properties, this would not prevent it from emulating symbolic/discrete operations. So, regardless of properties of the brain architecture, whether the system underlying language is best conceived as analogue or discrete/digital is an independent empirical issue.
If so, there is no escape from approaching the issue along the lines of any
rational empirical inquiry. All models of language that come anywhere near a
minimal empirical coverage are discrete. There are no analogue models to
compare them with (see Reuland (2000) for some discussion). Yet, there are
some potentially analogue processes in language:
. use of pitch and stress as properties of form representing attitudes regarding the message;
. use of pitch and stress representing relative importance of parts of the message.
These, however, involve typically indexical relationships, such as the contours of intonation that one may use for various degrees of wonder or surprise; or the heaviness of the stress, or the height of the pitch of an expression, where the relative position on a scale in some dimension of properties of the signal reflects the relative position on a scale of importance, surprise, etc., of what it expresses. One need not doubt that the intensity of the emotions involved is expressed by properties of the linguistic signal in an essentially analogue manner. But this type of import of the signal must be carefully distinguished from properties of the signal that serve as grammatical markings in the sentence. For instance, question intonation is the realization of a grammatical morpheme. One expression cannot be more or less a question than another one. So, the grammatical import of question intonation is as discrete as the presence or absence of a segmental question morpheme. As shown by studies of the expression of focus (Reinhart 1996), the role of intonation in focus marking is discrete as well.3 There is no gradedness of focus-hood as a

3 In Reinhart’s system focus is determined by stress and two rules.
(i) Focus rule: the focus of IP is a(ny) constituent containing the main stress of IP as determined by the (general) stress rule.
(ii) Marked focus rule: relocate the main stress on a constituent you want to be the focus.

function of gradedness of the stress. In a nutshell, there is ‘gradedness’ in the domain of prosody, but the relevance of this gradedness to the grammar is null.

3.2.2 Gradedness II: tendency and system


Another face of gradedness we find in the contrast between tendency based and system governed properties of language(s). Two issues arise: What kind of phenomena do we have in mind when we speak of tendencies in the study of language? And what is the status of whatever represents them in our linguistic descriptions? Sometimes the issue is clear. There are striking differences between written cultures as to the average sentence length in their typical novels. A cursory examination of a standard nineteenth-century Russian novel, or a German philosophical treatise, will show that their average sentence length in words will surpass that of a modern American novel. But individual samples may show substantial variation. Clearly, a matter of tendency rather than law. What underlies such a difference? Few people will claim that differences in this dimension are differences in language. Rather they will be regarded as differences in socio-cultural conventions. One may expect convergence between language communities as the relevant conventions do; eventually, stratification along genres instead of ‘languages’, etc. Thus, average sentence length is a gradable property of texts. But its source is conventions of language use. At the other extreme, we find properties such as the position of the article in a DP, or the position of the finite verb in the clause. No serious linguist would be content claiming that English has a tendency to put articles before the noun, and Bulgarian a tendency to put them after, or that Dutch has just tendencies to put the finite verb in second position in declarative root clauses, and to put it after the direct object in subordinate clauses.
It is easy to think of other cases where system and use based accounts might conceivably compete. Take, for instance, the cross-linguistic variation that is found in the expression of agents in passive. The standard view is that such differences are categorical, determined by the precise mechanics used to ‘suppress’ the agent role. Instead one could imagine a usage based approach, based on cultural variation in the role assigned to efficiency of communication. For instance, certain communities might entertain the conversational convention ‘if an argument is so unimportant that you consider demoting it, better go all the way and omit it entirely’. A similar line could be attempted with respect to word order and its variations. Of course, the question is how to evaluate such positions.

As long as we do not have effective analogue models of language, discussion of such alternatives remains moot. We cannot evaluate analogue versus algebraic accounts of a certain phenomenon if the analogue version does not exist. Does this mean that discrete theories rule out that we find any phenomena only involving tendencies? In fact not. And we can go a bit further. From the discrete/grammatical perspective, one expects to find tendencies precisely where principles of grammar leave choices open. Just as there are no linguistic principles determining what one will say, there are no linguistic principles determining how one will say it. I will return to this below.

3.2.3 Gradedness III: variability


It is a common experience for linguists that speakers of the same language show variation in their judgements on one and the same sentence, or that even one and the same speaker gives varying judgements at different occasions. Note that the second variability may also show up as the first, but for the discussion we will keep them separate. The question is what kind of theoretical conclusions one should draw from such variation. Do messy, variable data show that ‘gradedness’ is a concept that should be incorporated into linguistic theorizing? I will discuss some examples that illustrate the issue.

3.2.3.1 Inter-subject variability One relevant type of variation is the inter-subject variation illustrated by the contrast between varieties of Dutch that do or do not require the expletive er in sentences like (3.1):
(3.1) a. A: Wie denk je dat *(er) komt
b. B: Wie denk je dat (er) komt
who think you that (there) comes

Unlike what was thought in the 1970s, there is no clear split along regional dialect lines. Rather there is variation at the individual level. Note, however, that it is not a matter of real gradedness. Individual speakers are quite consistent in their judgement. Hence, no insight would be gained by treating this as a ‘graded’ property of Dutch. The same holds true of the variation regarding the that-trace filter in American English. Variation of this kind is easily handled by the one theoretical tool late GB or current minimalist theory has available to capture variation, namely variability in feature composition of functional elements. The contrast between (3.1a) and (3.1b) reflects a
general contrast between Dutch speakers concerning the licensing of non-argument null-subjects, ultimately reducible to micro-variation in the feature composition of T.
Another instructive case is variation in operator raising in Hungarian, as in
(3.2) (Gervain 2002):
(3.2) KÉT FIÚT mondtál hogy jön/jönnek
two boys.sg.acc said.2sg that come.3sg/come.3pl
‘You said that two boys would come.’
The literature on operator raising in Hungarian gives conflicting judgements on case (nominative or accusative on the focused phrase) and agreement or anti-agreement on the downstairs verb. Gervain succeeds in showing that there actually is systematic variation between two basic dialects (but without regional basis). One ‘dialect’ allows both nominative and accusative on the focused phrase, but rejects anti-agreement on the embedded verb; the other prefers matrix accusative, but accepts both agreement and anti-agreement. The former employs a movement strategy, the other a resumptive strategy. The source of the patterns can ultimately be reduced to a difference in the feature composition of the complementizer hogy ‘that’, one feature set allowing an operator phrase to pass along by movement, the other blocking it. This case is instructive since it shows that the right response to messy data is further investigation rather than being content with just stating the facts.
There are other sources of inter-subject variability. Some speakers may be daily users of words that other speakers do not know, or definitely find weird when they hear them. Such variation follows from differences in experience, but crucially, resides in the conceptual rather than the grammatical system.
Variability in judgement may also ensue from the interaction of computationally complex operations. In fact this type of variation may occasionally cross the border from inter-subject to intra-subject variability. Even in simple interactions world knowledge may be necessary to bring out object wide scope readings as in (3.3b) versus (3.3a):
(3.3) a. some student admired every teacher
b. some glass hit every mirror
It is a safe bet that in on-line processing tasks few people will manage to get all the quantificational interdependencies in a sentence of the following type, and even on paper I have my doubts:

(3.4) which candidates from every class did some teacher tell every colleague
that most mother’s favourites refused to acknowledge before starting to
support
Carrying out complex computations with interacting quantifiers may easily lead to the exhaustion of processing resources. Overflow of working memory leads to guessing patterns. It is well-known that speakers differ in their processing resources (for instance, Just and Carpenter 1987; Gathercole and Baddeley 1993). So, varying availability of resources may lead to differential patterns of processing breakdown. Simply speaking, one speaker may start guessing at a point of complexity where another speaker is still able to process the sentence according to the linguistic rules. This would yield differences in observable behaviour that do have gradient properties, but that are again grammar external, and that it would in fact be a mistake to encode in the grammatical system.
Note that even simple sentences may sometimes be mind-boggling, as in determining whether every mother’s child adores her allows every mother to bind her or not.
Of course, the ranking of processes in terms of resource requirements has a
clear theoretical interest, since it sheds light on the overall interaction of
processes within the language system, but it is entirely independent of the
issue of gradedness of grammar.

3.2.3.2 Intra-subject variability As we all know, speakers of a language may not always be able to give clear-cut categorical judgements of the grammatical status of sentences. Actually, we find this in two forms: (a) a speaker of a language may give different judgements on different occasions; (b) a speaker of a language may express uncertainty on one occasion. Once more, the issue is not so much whether such variability occurs, as what it means for the (theory of) grammar. And, again, the variability does not necessarily mean very much for our conception of grammar given what we know about possible sources. If the interpretation of a sentence exerts a demand on processing resources to the point of overflow, variability is what we would expect. Depending on the level of sophistication, guessing may either result in (a), as it does with children on condition B tasks (Grodzinsky and Reinhart 1993), or in (b). For another source of variation, note that people are perfectly able to master more than one variant of a language, for instance in the form of different registers, or a standard language and one or more dialects. This means that the mental lexicon must be able to host entries that are almost identical; just marginally different in some of their instructions to the
sensori-motor system or the grammatical system. For instance Dutch dat and Frisian dat are both complementizers; they differ slightly in the way the a is realized, more back in the case of Frisian. In Frisian, but not in Dutch, dat carries a grammatical instruction letting it cliticize to wh-words (actually, it does not matter whether this is a property of dat, or of the element in spec-CP). This property can be dissociated from its pronunciation, witness the fact that some Frisian speakers have this feature optionally in their Dutch, and use cliticization together with the Dutch pronunciation of the a. Optionality means that the mental lexicon of such speakers does contain the two variants, both being accessible in the ‘Dutch mode’. If so, there is no reason that this cannot be generalized to other cases of micro-variation. If the mental lexicon may contain close variants of one lexical item, one may expect retrieval of one or the other to be subject to chance. Of course, this does not mean that the phenomenon of variation and the mechanisms behind it are uninteresting. It does mean that the concept of gradedness does not necessarily help understanding it.
Interpretation being dependent on perspective is another possible source of
variation (as in the Necker cube: which edge of the cube is in front). Consider
the following contrast from Pollard and Sag (1992):
(3.5) a. Johni was going to get even with Mary. That picture of himselfi in
the paper would really annoy her, as would the other stunts he had
planned.
b. *Mary was quite taken aback by the publicity Johni was receiving.
That picture of himselfi in the paper had really annoyed her, and
there was not much she could do about it.

There is a clear difference in well-formedness between these two discourses. Yet, structurally the position of the anaphor himself is identical in both cases. The only relevant contrast is in the discourse status of the antecedent. In (3.5a) John’s viewpoint is taken, in (3.5b) Mary’s. And, as noted by Pollard and Sag, the interpretation of anaphors that are exempt from a structural binding requirement (such as the himself in a picture NP) is determined by viewpoint. Hence, in (3.5b) John does not yield a proper discourse antecedent for himself. The following variant will probably give much messier results:
results:
(3.6) (?) Mary was quite taken aback by the publicity. Johni was getting the
upper hand. That picture of himselfi in the paper had really annoyed
her, and there was not much she could do about it.

Example (3.6) allows for two discourse construals: one in which the viewpoint is Mary’s throughout, another in which the perspective shifts to that of John as soon as John is introduced. Clearly, judgements will be influenced by the ease with which the perspective shift is carried out. So, we have gradedness in some sense, but it is irrelevant for the system, since the relevant factor is still discrete. The judgement is determined by whether the shift in viewpoint is accessed in actual performance.

3.2.4 Gradedness IV: acceptability or grammaticality judgements on a scale


As noted above, Chomsky (1965) introduced the notion ‘degree of grammaticality’, indicating that such degrees should reflect the nature of violations of different subsystems of the grammar and the way these violations add up. They were expressed in a metric such that a violation of strict subcategorization as in *John arrived Mary comes out as worse than a violation of a selection restriction, as in ??sincerity hated John. Clearly, over the years not much progress has been made. In practice, researchers have been content impressionistically labelling violations from **, via *, *?, to ?? and ?. The question is, does this reflect on grammatical theory or just on grammatical practice?
For a GB-style syntax it is easily seen that there is no problem in principle. For any full representation of a sentence, that is, with indices, traces, cases, theta-roles, etc., it is easy to count the number of violations and devise a metric based on that number. Of course, one would need to decide whether violations of selection requirements count as violations of grammar or not, or whether all types of violations get the same value in the metric, but that’s a matter of implementation.
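Such a counting metric is easy to sketch; the violation types, weights, and label thresholds below are purely illustrative assumptions, not part of any proposed grammar:

```python
# A toy violation metric: each annotated violation type gets an
# illustrative weight, and a representation's total score maps to
# a familiar impressionistic label. Weights and cutoffs are assumed.

WEIGHTS = {
    "subcategorization": 4,   # e.g. *John arrived Mary
    "selection": 2,           # e.g. ??sincerity hated John
}

def score(violations):
    """Sum the weights of the violations found in a representation."""
    return sum(WEIGHTS[v] for v in violations)

def label(violations):
    """Map a numeric score onto the conventional markings."""
    s = score(violations)
    if s == 0:
        return ""      # well-formed
    if s <= 2:
        return "??"
    return "*"

print(label([]))                      # → ''
print(label(["selection"]))           # → '??'
print(label(["subcategorization"]))   # → '*'
```

On this toy metric a subcategorization violation outranks a selection violation, mirroring the *John arrived Mary versus ??sincerity hated John contrast discussed above.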
A minimalist style grammar also allows a metric for violations. It requires looking at a potential derivation from the outside and assessing what would have happened if it had not crashed. More precisely, on the basis of each standard minimalist grammar G one can define a derivative grammar G’ such that derivational steps that crash in G are defined in G’ and associated with a marker of ungrammaticality. It is trivial to define a metric over such markings.
A minimalist style grammar is, in fact, even better equipped for expressing differences in well-formedness than a GB-style grammar. Whereas a GB-style grammar does not make clear-cut distinctions between processes that belong to syntax proper, or to syntax external components such as semantics, pragmatics, or conceptual structure (since there aren’t many restrictions on the potential encoding mechanisms), a minimalist style grammar draws a

[Figure 3.1 shows the lexicon feeding CHL, with the PF-interface linking CHL to the sensori-motor system, and the C-I-interface (the interpretive system, with inferences) linking it to the language of thought.]

Figure 3.1. A minimalist organization of the language system

fundamental line around the computational system of human language CHL (narrow syntax). It allows for a minimal set of operations (Merge, Attract, Check, Delete (up to recoverability)), defined over a purely morphosyntactic vocabulary, and triggered by the grammatical feature composition of lexical items. Possible derivations are restricted by the inclusiveness condition, which states that no elements can be introduced in the course of the derivation that were not part of the initial numeration (no lambdas, indices, traces, etc.). Clearly, the language system as a whole must allow for operations that do not obey inclusiveness. Otherwise, semantic distinctions like distributive versus collective could only be annotated, not represented.4
Let us place this issue in the broader perspective of the minimalist organization of the language system, as in Figure 3.1.
Following Chomsky (1995) and subsequent work, CHL mediates between the sensori-motor system and the language of thought. The lexicon is a set of fixed triples <p,g,l>, with p instructions to the sensori-motor system, l instructions to the language of thought, and g instructions to the computation as such. In this schema the interfaces play a crucial role, since they determine which of the properties of the systems outside CHL are legible for the purpose of computation. What is legible may in fact be only a very impoverished representation of a rich structure. For instance, verbs may have a very rich conceptual structure reflected in their thematic properties. Reinhart (2000b, 2003)

4 Take, for instance, the rules computing entailments. A sentence such as DPplur elected Y does not entail that z elected Y, for z ∈ ‖DPplur‖, whereas DPplur voted for Y does entail that z voted for Y, for z ∈ ‖DPplur‖. This is reflected in the contrast between we elected me and ??we voted for me. The latter is ill-formed since it entails the instantiation I (λx (x voted for x)), which is reflexive, but not reflexive-marked (Reinhart and Reuland 1993).

shows that the computational system can read these properties only in so far as they can be encoded as combinations (clusters) of two features: [+c(ause)] and [+m(ental)]. Anything else is inaccessible to CHL. The coding of the concept subsequently determines how each cluster is linked to an argument. Note furthermore that an interface must itself contain operations. That is, it must be a component sui generis. For instance, syntactic constituents do not always coincide with articulatory units. Furthermore, in preparation for instruction to the sensori-motor system, hierarchical structure is broken down and linearized. On the meaning side, it is also well-known that for storage in memory much structural information is obliterated. As Chomsky (1995) notes, at the PF side such operations violate inclusiveness. It seems fair to assume that the same holds for the interpretive system at the C–I interface.
Given this schema, more can be ‘wrong’ with an expression than just a violation (crash) of the derivation in CHL. Differences in the ‘degree of ungrammaticality’ may well arise by a combination of violations in different components: PF, lexicon, narrow syntax, interpretive system, interpretation itself. Any theory of language with a further articulation into subsystems is in principle well equipped to deal with ‘degrees’ of well- or ill-formedness.

3.3 Issues in binding and co-reference


The domain of anaphora is well-suited for an illustration of these issues. The canonical binding theory (Chomsky 1981) represents a strictly categorical approach. However, in many languages anaphoric systems are more complex than the canonical binding theory anticipated. Moreover, not all cases where an anaphoric element is co-valued with an antecedent are really binding relations. Limitations of space prevent me from even remotely doing justice to the discussion over the last decades. Instead I will focus on one basic issue. Going over the literature one will find that sometimes judgements are clearly categorical, while in other cases we find judgements that are far more ‘soft’ and ‘variable’. In one and the same domain, languages may in fact vary, even when they are closely related. Below I will discuss some concrete examples. Recently, this property of anaphora has been taken as evidence that a categorical approach to anaphora must be replaced by an approach in which binding principles are replaced by soft, violable constraints along the lines proposed in optimality theoretic approaches to grammar. A useful discussion along these lines is presented in Fischer (2004).
Yet, such a conclusion is not warranted. ‘Flexible’ approaches of various
kinds may seem attractive at a coarsely grained level of analysis, taking
observational entities at face value. The attraction disappears if one sees
that more finely grained analyses make it possible to connect the behaviour of anaphoric elements to the mechanisms of the grammar, and to explain variation from details in their feature make-up, or from differences in their environment. Current research warrants the conclusion that the computational system underlying binding operates as discretely and categorically as it does in other areas of grammar. However, just as in the cases discussed in Sections 3.1 and 3.2, the computational system does not determine all aspects of interpretation; where it does not, systems of use kick in, evoking the air of flexibility observed.
Much work on binding and anaphora so far is characterized by two
assumptions:
1. all binding dependencies are syntactically encoded (by indices or equivalent);
2. all bindable elements are inherently characterized as pronominals or
anaphors (simplex, complex, or clitic).
These assumptions are equally characteristic of extant OT-approaches to binding. So, one finds different rankings of binding constraints on pronominals or subtypes of anaphors, across environments and languages. In Reuland (2001) I argued that both assumptions are false. The argument against (1) rests in part on the inclusiveness condition, and hence is theory-internal, but should be taken seriously in any theory that strives for parsimony. Clearly, indices are not morphosyntactic objects. No language expresses indices morphologically. Thus, if syntax is, indeed, the component operating on morphosyntactic objects, it has no place for indices. External validation of the claim that syntax has no place for indices rests on the dissociation this predicts between dependencies that can be syntactically encoded and dependencies that cannot be. In so far as such dissociations can be found, they constitute independent evidence (see Reuland 2003 for discussion).
The argument against (2) is largely theory-independent. It essentially rests on the observation that there are so many instances of free ‘anaphors’ and locally bound ‘pronominals’ that one would have to resort to massive lexical ambiguity in order to ‘maintain order’. Instead, we can really understand the binding behaviour of pronouns (to use a cover term for anaphors and pronominals) if we consider their inherent features, the way these features are accessed by the computational system, and general principles governing the division of labour between the components of the language system, in particular CHL, logical syntax, and discourse principles as part of the interpretive system.

By way of illustration I will briefly consider three cases: (a) free anaphors (often referred to as ‘logophors’); (b) reflexives in PPs; and (c) local binding of pronominals.

3.3.1 Free anaphors


There are two main issues any theory of binding must address:
1. Why is it that cross-linguistically reflexivity must be licensed?
2. Why is it that certain expressions, ‘anaphors’, must be bound?
Question (1) covers a substantial part of Condition B of the canonical binding theory, question (2) reflects the canonical Condition A. Consider first question (1). More concretely it asks: given that pronominals can generally be bound by DP, why can we not simply use expressions of the form John admires him, or more generally, DP V pronominal, to express that the person who John admires is John? Simply invoking avoidance of ambiguity does not help. There are lots of cases where pronominals are used ambiguously, and no special marking is introduced. An ambiguity story cannot be complete without explaining why in all the other cases there is no marking. Moreover, languages differ as to where they require special marking (we will discuss one such case below). Again, an avoid-ambiguity story would not lead us to expect such variation.
As I argued in Reuland (2001) the essence of the answer to (1) is that reflexivization involves the identification of variables in the logical syntax representation of a predicate.5 Thus, limiting ourselves for simplicity’s sake to binary predicates, reflexivization effectively reduces a relation to a property, as in (3.7):
(3.7) λx λy (P x y) → λx (P x x)
This leads to a theta-violation since the computational system cannot distinguish these two tokens of x as different objects (occurrences). At the C–I interface syntactic structure is broken down and recoverable only in so far as reflected in conceptual structure. This is parallel to what happens at the PF-interface; here syntactic structure is broken down as well, and recoverable in so far as reflected in properties of the signal. Specifically, following the line of Chomsky (1995), a category such as V’, needed to express the specifier–complement asymmetry in [VP Spec [V’ V Comp] ], is not a term, hence not

5 The notion ‘logical syntax’ is to be distinguished from ‘logical form’. Logical form is the output of
the computational system (CHL); the operations yielding logical form are subject to the inclusiveness
condition. Logical syntax is the Wrst step in interpretation, with operations that can translate
pronominals as variables, can raise a subject leaving a lambda expression, etc., thus not obeying
inclusiveness. For a discussion of logical syntax, see Reinhart (2000a) and references cited there.
Gradedness: Interpretive Dependencies and Beyond 59

visible to the interpretive system. Therefore, it is not translatable at the C–I


interface, and the hierarchical information it contributes is lost. What about
order? Syntax proper only expresses hierarchy, but no order. Order is imposed
under realization by spell-out systems. As a consequence, the computational
system cannot distinguish the two tokens of x in (3.7) on our mental ‘scratch
paper’ by order. Hence, translating DP V pronominal at the C–I interface
involves the steps in (3.8):
(3.8) [VP x [V’ V x ]] ! ([VP V ‘‘x x’’ ]) ! *[VP V x]
1 2 3
The second step with the two tokens of x in ‘x x’ is virtual, hence it is put in
brackets. It is virtual, since with the breakdown of structure, and given the
absence of order, it has no status in the computation: eliminate V’ and what
you get is the third stage.
The transition from (3.8.1) to (3.8.3) does not change the arity of V itself. It
is still a 2-place predicate, but in (3.7)/(3.8.3) it sees only one object as its
argument. As a consequence, one theta-role cannot be assigned. Under
standard assumptions about theta-role discharge a theta-violation ensues.
(Alternatively, two roles are assigned to the same argument with the same
result.)6
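The arity reduction in (3.7) and the collapse of the two argument slots can be caricatured in a few lines of Python. This is purely illustrative and not part of the chapter's formal apparatus; the predicate `admire` and the function `reflexivize` are my own labels.

```python
# A curried two-place predicate: admire(x)(y) stands for "x admires y".
def admire(x):
    return lambda y: (x, "admires", y)

# (3.7): λx λy (P x y) -> λx (P x x).
# Identifying the two variables turns a relation into a property.
def reflexivize(P):
    return lambda x: P(x)(x)

self_admire = reflexivize(admire)
print(self_admire("John"))  # ('John', 'admires', 'John')

# The derived property exposes only one argument slot, although the
# underlying relation is still two-place: this is the configuration in
# which one theta-role can no longer be assigned to a distinct object.
```

The sketch only mimics the variable identification itself; it does not, of course, model why the computational system cannot tell the two tokens of x apart.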
6 I am grateful to an anonymous reviewer for pointing out a problem in the original formulation.
In the present version I state more clearly the empirical assumptions on which the explanation rests.
Note that the problem of keeping arguments apart for purposes of theta-role assignment does not
come up in the case of two different arguments, as in (i) (abstracting away from QR/λ-abstraction):

(i) [VP j [V′ V m]] → ([VP V m j]) (–/→ [VP V m/j])
          1                2              3

The objects remain distinct. There is no reason that theta-relations established in the configuration in
(i.1) would not be preserved in (i.2). Hence, the issue of (i.3) does not arise. Note that theta-roles are
not syntactic features that modify the representation of the element they are assigned to. That is, they
are not comparable to case features. This leaves no room for the alternative in (ii), where xθ1 and xθ2 are
distinguishable objects by virtue of having a different 'θ-feature' composition.

(ii) [VP xθ1 [V′ V xθ2]] → [VP V xθ1 xθ2] –/→ [VP V xθ1/θ2]
           1                     2                3

Rather, with theta-assignment spelled out, but reading xθ1 correctly as 'x as being assigned the role θ1',
we get (iii), which reflects the problem discussed in the main text:

(iii) [VP xθ1 [V′ V xθ2]] → ([VP V "x x"θ1/θ2]) → *[VP V xθ1/θ2]
            1                      2                  3

60 The Nature of Gradience

Languages employ a variety of means to obviate the problem of (3.8) when
expressing reflexive relations (Schladt 2000). They may mark the verb, they
may put the pronominal inside a PP, they may double the pronominal, add a
body-part, a focus-marker, etc. Here I will limit discussion to SELF-marking
in languages such as English and Dutch. Briefly, the minimum contribution
that SELF must make in order to make a reflexive interpretation possible is to
induce sufficient structure. That is, in a structure such as (3.9) the two
arguments of P are distinct.
(3.9) λx (P x [x SELF])
This is independent of other effects SELF may have on interpretation. The
semantics of SELF only has to meet one condition: [x SELF] should get an
interpretation that is compatible with the fact that the whole predicate has to
be used reflexively. That implies that whatever interpretation [x SELF] gets
should be sufficiently close to whatever will be the interpretation of x. This is
expressed in (3.10):

(3.10) λx (P x f(x)), with f a function that for each x yields a value that can
stand proxy for x
It is this property of SELF that gives rise to the statue reading discussed by
Jackendoff (1992). A statue reading also shows up in Dutch, as is illustrated in
(3.11b), where zichzelf, despite being 'reflexive', refers to the Queen's statue
rather than to the Queen herself (Reuland 2001). Importantly, the statue
reading depends on the presence of SELF. If we replace zichzelf by zich, which
is allowed in this environment, the statue reading disappears. So, zich
expresses identity, while zichzelf stands proxy for its antecedent.
(3.11) ‘Madame Tussaud’s’-context:
Consider the following example in Dutch: De koningin liep
bij Madame Tussaud’s binnen. Ze keek in een spiegel en
a. ze zag zich in een griezelige hoek staan
b. ze zag zichzelf in een griezelige hoek staan
Translation: The queen walked into Madame Tussaud’s. She looked
in a mirror and
a. she saw SE in a creepy corner stand
b. she saw herself in a creepy corner stand
Interpretations: (a) zich = the queen: the queen saw herself;
(b) zichzelf = the queen's statue: the queen saw her statue.
The difficulty for our computational system to deal with different tokens of
indiscernibles is, in a nutshell, the reason why reflexivity must be licensed. In
other languages a 'protecting element' can behave quite differently from
English himself. In Malayalam, for instance, the licensing anaphor does not
need to be locally bound at all (Jayaseelan 1997). Compare the sentences in
(3.12):

(3.12) a. raaman_i tan-ne_i *(tanne) sneehikkunnu
'Raman SE-acc self loves'
'Raman loves himself'
b. raaman_i wicaariccu [penkuttikal tan-ne_i tanne sneehikkunnu enn@]
'Raman thought [girls SE-acc self love Comp]'
'Raman thought that the girls loved him(self).'
Example (3.12a) is a simple transitive clause. A simple pronominal in object
position cannot be bound by the subject. A reflexive reading requires the
complex anaphor tan-ne_i tanne. In (3.12b) tan-ne_i tanne is put in an embedded
clause with a plural subject, which is not a possible binder for the
anaphor. In English the result would be ill-formed. In Malayalam, however,
the matrix subject raaman is available as a binder for the anaphor. Thus, the
presence of tanne licenses but does not enforce reflexivity. Note that tan-ne
tanne is essentially a doubled pronoun, and tanne need not be identified with
SELF. Hence a difference in behaviour is not unexpected.
As illustrated in (3.5a), in English himself need not always be locally bound
either. Reinhart and Reuland (1991, 1993) and Pollard and Sag (1992) present
extensive discussion of such cases. Example (3.13) presents an illustrative
minimal pair.

(3.13) a. Max boasted that the queen invited Mary and him/himself for a
drink
b. Max boasted that the queen invited him/*himself for a drink

In (3.13a) the reflexive can have a long-distance antecedent; in (3.13b) it
cannot. In their Dutch counterparts we see a different pattern: in both
cases, long-distance binding of zichzelf is impossible.
(3.14) a. Max pochte dat de koningin Marie en hem/*zichzelf voor een
drankje had uitgenodigd
b. Max pochte dat de koningin hem/*zichzelf voor een drankje had
uitgenodigd

In Dutch the canonical counterpart of himself is zichzelf. The question is why
an LD antecedent is possible in (3.13a) and not in (3.13b), and why it is never
possible with zichzelf.
If we only consider superficial properties of himself and zichzelf, no non-
stipulative answer will be found. However, Reinhart and Reuland (1991)
propose a more fine-grained analysis (also assumed in Reinhart and Reuland
1993). As we saw, a reflexive predicate must be licensed. In Dutch and English,
the licenser is SELF. Unlike Malayalam tan-ne tanne, a SELF-anaphor
obligatorily marks a predicate reflexive if it is a syntactic argument of the latter
(Reinhart and Reuland 1993: a reflexive-marked syntactic predicate
is reflexive). In (3.13b) this requirement causes a clash: the predicate
cannot be reflexive due to the feature mismatch between the queen and
himself. Hence (3.13b) is ill-formed. In (3.13a) himself is properly contained
in a syntactic argument of invite, hence it does not impose a reflexive
interpretation.
But why does the analysis have to refer to the notion syntactic argument/
predicate? This falls into place if we understand reflexive marking to be a real
operation in the computational system. We know independently that coordinate
structures or adjuncts resist certain operations that complements easily
allow. Movement is a case in point, as illustrated by the coordinate structure
constraint. Assume now that there is a dependency between SELF and the
verb that is sensitive to the same constraints as movement. For short, let us
assume it is movement and that SELF-movement is really triggered by a
property of the verb. Its effect in canonical cases is represented in (3.15):

(3.15) a. John admires himself
b. John SELF-admires [him [-]]

SELF now marks the predicate as reflexive, and indeed nothing resists a
reflexive interpretation of (3.15). Consider now (3.13b), repeated as (3.16):
(3.16) a. *Max boasted that [the queen invited himself for a drink]
b. *Max boasted that [the queen SELF-invited [him[-]] for a drink]

SELF attaches to the verb, marks it as reflexive, but due to the feature
mismatch between the queen and him it cannot be interpreted as reflexive,
and the sentence is ruled out. Consider next (3.13a), repeated as (3.17):
(3.17) a. Max boasted that
[the queen invited [Mary and himself ] for a drink]
b. Max boasted that
[the queen SELF-invited [Mary and [him[-]]] for a drink]

In this case SELF-movement is ruled out by the coordinate structure constraint.
As a consequence, the syntactic predicate of invite is not reflexive-marked,
and no requirement for a reflexive interpretation is imposed. Hence,
syntax says nothing about the way in which himself is interpreted, leaving its
interpretation open to other components of the language system.
SELF also has another use, namely as an 'intensifier'. Hence, its interpretation
reflects that property precisely in those cases where its use is not
regulated by the structural requirements of grammar (as we saw in the case
of (3.5) and (3.6)). Note that we do not find the same pattern in Dutch.
This is because himself differs from zichzelf in features. The zich in zichzelf is
not specified for number and gender. We know descriptively that zich must
be bound in the domain of the first finite clause. So, any theory assigning
zich a minimal finite clause as its binding domain will predict that Max is
inaccessible as an antecedent to zich in both (3.14a) and (3.14b) (see
Reinhart and Reuland 1991 for references, and Reuland 2001 for an execution
in current theory without indices). So, the contrast between Dutch and
English follows from independent differences in feature content between
himself and zichzelf.
Why must anaphors be bound? The fact that himself must be bound only
where it is a syntactic argument of the predicate, and is exempted where it is
not, already shows that this is too crude a question. Icelandic sig, too, may be
exempted from a syntactic binding requirement, albeit in a different environment,
namely subjunctive.7 Similar effects show up in many other languages.
Hence, there can be no absolute necessity for anaphors to be bound.
Reuland (2001) shows that anaphors have to be bound only if they can
get a free ride on a syntactic process enabling the dependency between
the anaphor and its antecedent to be syntactically encoded. In the case
of anaphors such as sig or zich, a syntactic dependency can be formed as in (3.18):
(3.18)
        DP    I    V    pronoun
          R1     R2     R3

R3 is the dependency between the object pronoun and V, which is realized by
structural case. R1 reflects the agreement between DP and inflection. R2
represents the dependency between verb and inflection (assumed to also be
present if I is on an auxiliary). All three dependencies are syntactic and
independent of binding. Dependencies can be composed. Composition
of R1-R2-R3 yields a potential dependency between pronoun and DP. For
reasons discussed in Reuland (2001), this dependency can only be formed

effectively if the pronoun lacks a specification for grammatical number. Thus,
zich allows it (lacking a number specification); its pronominal counterpart
hem 'him' does not. Complementarity between pronominals and anaphors
follows not from any local prohibition on binding pronominals, but from a
very general principle concerning the division of labour between components
of the language system: in order to interpret zich as a variable bound by the
antecedent, syntactic processes suffice. The dependency between a pronominal
and its antecedent cannot be established in the syntax (grammatical
number prevents this), but requires that part of the computation be carried
out via the C–I interface. Switching between components incurs a cost, so
using zich is less costly than using a pronominal instead.

7 See the example in (i):

(i) María var alltaf svo andstyggileg. Þegar Olafur_j kæmi segði hún sér_i/*j áreiðanlega að fara . . .
(Thráinsson 1991)
Mary was always so nasty. When Olaf would come, she would certainly tell himself [the person
whose thoughts are being presented, not Olaf] to leave
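The logic of (3.18), where a pronoun–antecedent dependency is available only through the composition of independently motivated syntactic links, can be sketched as follows. The feature inventories and the blocking condition are deliberate simplifications of mine, not Reuland's formal system:

```python
# The chain of syntactic dependencies in (3.18):
# R1 (DP-I agreement), R2 (I-V), R3 (V-pronoun, realized by structural case).
CHAIN = [("DP", "I"), ("I", "V"), ("V", "pronoun")]

def compose(chain):
    # Dependencies compose when each link ends where the next one starts.
    start, end = chain[0]
    for a, b in chain[1:]:
        assert end == a, "links do not connect"
        end = b
    return (start, end)

# Simplified feature matrices: zich lacks a number specification, hem has one.
ZICH = {"person": 3}
HEM = {"person": 3, "number": "sg"}

def syntactically_encodable(pronoun):
    # A full number specification blocks the composed dependency,
    # forcing the binding computation across the C-I interface instead.
    return compose(CHAIN) == ("DP", "pronoun") and "number" not in pronoun

print(syntactically_encodable(ZICH))  # True
print(syntactically_encodable(HEM))   # False
```

The sketch encodes only the division of labour just described: zich can be linked to its antecedent within syntax, while hem must be handled by a different, more costly component.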
In the case of SELF the trigger must be different. Dutch zelf does indeed
have a relevant property which so far has gone unnoticed. Consider the
following contrast between zich and zichzelf in Dutch, which shows up with
verbs allowing either. One such verb is verdedigen 'defend'. Suppose a group of
soldiers has been given the assignment to occupy a hill, and subsequently the
enemy attacks. After the battle a number of situations can obtain, two of
which can be described as follows: (a) the soldiers kept the hill, but at the cost
of most of their lives; (b) the soldiers lost the hill, but they all stayed alive. In the
first case one can properly say (3.19a), but not (3.19b). In the second case one
can say either:
(3.19) a. De soldaten verdedigden zich met succes
The soldiers defended ‘them’ successfully
b. De soldaten verdedigden zichzelf met succes
The soldiers defended themselves successfully
Zichzelf has a distributive reading (each of the soldiers must have defended
himself successfully), whereas zich is collective. If the verbal projection has a
position to mark distributivity (Stowell and Beghelli 1997), this is sufficient
to warrant attraction of SELF. If so, we have an independent account of the
fact that SELF is attracted to the verb, an account of the meaning contrast in
(3.19), and an account that accommodates the fact that not all licensers are
attracted and that reflexivity is not always enforced.
This approach predicts correlations that go beyond being an anaphor or
being a pronominal. For instance, given that in Malayalam reflexivity is not
enforced by tan-ne tanne, one predicts that it does not mark any special
property of the verb. Whether this prediction is in fact correct is a matter of
further research.
3.3.2 Reflexives in PPs

The following contrast between French (Zribi-Hertz 1989) and Dutch illustrates
how a small variation in grammar, independent of binding, may have
a significant effect on binding.
(3.20) a. Jean est fier de lui/lui-même
Jean is proud of him/himself
b. Jean est jaloux de *lui/lui-même
Jean is jealous of him/himself
c. Jean bavarde avec *lui/lui-même
Jean chats with him/himself
d. Jean parle de lui/lui-même
Jean talks (of) him/himself
This pattern is sometimes taken to be a problem for a configurational
approach to binding, and Condition B in particular. Configurationally,
(3.20a) and (3.20b) are identical, and the same holds for (3.20c) and (3.20d).
If so, how can a pronominal be allowed in the one case, and not in the other?
Alternatively, it is argued, these examples show that the selection of anaphors
is sensitive to semantic-pragmatic conditions: how 'expected', or 'normal', is
the reflexivity of the relation expressed by the predicate? Let us assume the
relevant factor in French is indeed semantic-pragmatic. Yet this cannot be all
there is to it, since in the corresponding paradigm in Dutch all cases have the
same status, and in all of them a complex anaphor is required.
(3.21) a. Jan is trots op zichzelf/*zich
Jan is proud of him/himself
b. Jan is jaloers op zichzelf/*zich
Jan is jealous of him/himself
c. Jan spot met zichzelf/*zich
Jan mocks (of) him/himself
d. Jan praat over zichzelf/*zich
Jan talks (of) him/himself

One surely could not seriously claim that Dutch speakers have a different
pragmatics, or that trots means something really different from fier. A clearly
syntactic factor can be identified, however. As is well known, Dutch has
preposition stranding, but French does not. Whatever the precise implementation
of preposition stranding, it must involve some relation between P and
the selecting predicate head that obtains in stranding languages like Dutch
and does not in French. Let us assume for concreteness' sake that this relation
is 'allows reanalysis'. Thus, P reanalyses with the selecting head in Dutch, not
in French (following Kayne 1981). We will be assuming that reanalysis is
represented in logical syntax. If so, in all cases in (3.21) we have (3.22) as a
logical syntax representation:

(3.22) DP [V [P pro]] → . . . [V-P] . . . → DP (λx ([V-P] x x))

We can see now that at the logical syntax level we have a formally reflexive
predicate. Such a predicate must be licensed, which explains the presence of
SELF in all cases.
In French there is no V-P reanalysis. Hence, we obtain (3.23):
(3.23) DP [V [P pro]] → DP (λx (V x [P x]))

Here, translating into logical syntax does not result in a formally reflexive
predicate. This entails that no formal licensing is required.8 Hence it is indeed
expectations, or other non-grammatical factors, that may determine whether a
focus marker like même is required. In a nutshell, we see how in one language
a grammatical computation may cover an interpretive contrast that shows up
in another.

3.3.3 Locally bound pronominals

Certain languages allow pronominals to be locally bound. Frisian is a case in
point. Frisian is instructive, since the literature occasionally reports that
Frisian has no specialized anaphors (Schladt 2000). This is in fact incorrect,
since in Frisian, as in Dutch, reflexivity must be licensed by adding sels to the
pronominal, yielding himsels.9 A rule of thumb is that Frisian has the bare
pronominal him where Dutch has the simplex anaphor zich.10 So, in Frisian
one finds the pattern in (3.24):
(3.24) a. Jan wasket him
John washes himself
b. Jan fielde [him fuortglieden]
John felt himself slip away

c. Jan bewûndere him*(sels)
John admired himself

8 Note that by itself même is quite different from SELF (for instance, in French there is no *même-admiration
along the lines of English self-admiration, etc.).
9 The occasional claim that Frisian does not have specialized anaphors is reminiscent of the claim
that Old English does not have them. For Frisian the claim is incorrect. For Old English I consider the
claim inconclusive. Going over Old English texts, it is striking that all the cases that are usually
adduced in support of the claim are in fact cases that would make perfectly acceptable Frisian. What
one would need are clear cases of sentences with predicates such as hate or admire to settle the point.
10 Note that him is indeed a fully-fledged pronominal that can be used as freely as English him or
Dutch hem.
This is one of the cases where re-ranking of constraints might descriptively
work. However, Reuland and Reinhart (1995) present independent evidence
that pronominals such as him in Frisian are under-specified for structural
case. Fully explaining the consequences of this difference in feature specification
would lead us beyond the scope of this article. For current purposes a
fairly crude suggestion suffices: whereas zich's under-specification for number
makes it possible for Dutch zich to form a syntactic dependency with its
antecedent along the lines indicated in (3.18), the case system in Frisian does
not enable an element in the position of the pronoun in (3.18) to enter a
syntactic dependency with the antecedent; that is, the link R3 is not effective
for encoding in Frisian. This goes beyond just saying that Frisian happens to
lack a zich-type anaphor: even if Frisian had a zich-type anaphor, it would not
be more economical than using him. In any case, the case correlation shows
the importance of considering the fine grain of grammar for understanding
the syntactic processes of binding.
As discussed in Reuland and Reinhart (1995), German dialects also exhibit
interesting variation in the local binding of pronominals. This variation is
case-related as well, although so far a detailed analysis has not been given. Yet
any theory that fails to take into account that it is case that links DPs, and
therefore also anaphors and pronominals, to the fabric of the sentential
structure is at risk of missing the source of the variation.
This paves the way for discussing an instance of grammar-based gradedness
in Dutch. Since Dutch has a three-way system, with a simplex anaphor, a
complex anaphor, and a pronominal, the Dutch counterpart of (3.24) has a
couple more options than Frisian:

(3.25) a. Jan wast zich/¹hem
John washes himself
b. Jan voelde [zich/¹hem wegglijden]
John felt himself slip away
c. Jan bewonderde zichzelf/¹zich/²hem
John admired himself

In (3.25a) and (3.25b), using hem instead of zich violates only the principle
that it is more economical to have an anaphor than a pronominal. In (3.25c),
however, hem violates two principles: economy and the principle that reflexivity
be licensed. And indeed, we do find different degrees of ill-formedness,
reflecting the number of violations.
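The cumulative effect described for (3.25) can be made concrete with a small violation-counting sketch. The constraint names and the counting scheme are my own shorthand for the two principles just mentioned, not a worked-out competition model:

```python
# Count violated principles for object forms of a Dutch verb.
# 'licensing_needed' is True for verbs like bewonderen 'admire' (3.25c),
# False for wassen 'wash' and voelen 'feel' (3.25a, b).
def violations(form, licensing_needed):
    violated = []
    if form == "hem":
        # Economy: an anaphor is cheaper than a pronominal.
        violated.append("economy")
    if licensing_needed and form != "zichzelf":
        # Reflexivity of the predicate must be licensed (by SELF).
        violated.append("license-reflexivity")
    return violated

print(violations("hem", False))  # ['economy']
print(violations("zich", True))  # ['license-reflexivity']
print(violations("hem", True))   # ['economy', 'license-reflexivity']
```

The more violations, the more degraded the form: one for hem in (3.25a, b), two in (3.25c), matching the reported difference in degrees of ill-formedness.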
3.3.4 Variability and gradedness in binding relations

It is time for a summary of how real or apparent variation in binding arises.
The collective–distributive contrast in the interpretation of zich versus
zichzelf is a potential source of variation in judgements. Unless collectivity
versus distributivity is systematically controlled for, an investigator may have
the impression of finding arbitrary intra- and inter-subject variation where in
fact there is underlying systematicity.
In assessing binding into PPs, one must distinguish between categorical,
grammatical factors and non-categorical, extra-grammatical factors. In
Dutch PPs of the type Jan praat over zichzelf 'Jan talks of himself', grammatical
factors overrule possible sources of variation from discourse: no variation
between complex and simplex anaphors obtains. However, in French Jean
parle de lui/lui-même 'Jean talks of him/himself' there are two options, since
the grammar leaves the choice of strategy open. Judgements will therefore be
open to variation governed by intra-subject shifts in perspective, like Necker-cube
effects in visual perception. Something similar obtains in Dutch locative
PPs, where the grammar leaves open the choice between zich and hem;
consequently, grammatically they are in free variation. Yet the choice is not
arbitrary: informally speaking, Jan keek onder zich 'Jan looked under SE' is
presented from Jan's perspective, whereas Jan keek onder hem 'Jan looked
under him' is presented from the speaker's perspective. Subtle though this is,
it is another factor to be controlled for.
As the contrasts in (3.5) and (3.6) showed, perspective, with its concomitant
variability, is also involved in the licensing of exempt anaphors. Extra-grammatical
factors become visible where the categorical distinctions of
grammar leave interpretive options open.
I presented case as a factor in the local binding of pronominals. In
languages without a strong morphology, case by itself is a low-level variable
factor in the grammar, and a typical area in which we find dialectal variation
and variation across registers in language. Slight variations in the case system
may have high-level effects on the binding possibilities of pronominals, as may
other types of variation in feature composition. In addition, we saw that
wherever binding phenomena are determined by the interaction between
several conditions, gradedness phenomena can also be explained from the
satisfaction of some, but not all, conditions.
3.4 By way of conclusion

I realize that the discussion in this article implies that no easy explanations are
forthcoming in linguistics. Being a pronominal or being an anaphor are no
longer characterizations that can be taken at face value. What are the
components of a pronominal or anaphoric expression? What grammatical
components of a pronominal or anaphoric expression? What grammatical
features do they have? What are the properties of the case system? How is
agreement eVected? How are prepositions and verbs related? What lexical
operations does a language have? What morphological operations are avail-
able to manipulate the argument structure of verbs? The answers to such
questions are needed in order to assess whether putative instances of tenden-
cies, or grading, are really in the purview of grammar.
If certain issues of language have to remain outside our grammars, this
simply reXects the fact that, whatever grammar does, it certainly does not tell
us all of how to say what we want to say.
4

Linguistic and Metalinguistic Tasks
in Phonology: Methods and Findings

STEFAN A. FRISCH AND ADRIENNE M. STEARNS

4.1 Introduction
Phonologists have begun to consider the importance of gradient phonological
patterns to theories of phonological competence (e.g. Anttila 1997; Frisch
1996; Hayes and MacEachern 1998; Ringen and Heinamaki 1999; Pierrehumbert
1994; Zuraw 2000; and for broader applications in linguistics see the
chapters in Bod et al. 2003 and Bybee and Hopper 2001). Gradient patterns
have been discovered in both internal evidence (the set of existing forms in a
language) and external evidence (the use and processing of existing and
novel forms in linguistic and metalinguistic tasks). The study of gradient
phonological patterns is a new and promising frontier for research in
linguistics. Grammatical theories that incorporate gradient patterns provide
a means to bridge the divide between competence and performance, most
directly in the case of the interface between phonetics and phonology
(Pierrehumbert 2002, 2003) and between theoretical linguistics and sociolinguistics
(e.g. Mendoza-Denton et al. 2003). In addition, linguistic theories that
incorporate gradient patterns are needed to unify theoretical linguistics
with other areas of cognition, where probabilistic patterns are the rule rather
than the exception.
A variety of methodologies have been used in these studies, and an overview
of these methodologies is the primary focus of this paper. The methodologies
that are reviewed either attempt to directly assess phonological
knowledge (well-formedness tasks and similarity tasks) or indirectly reflect
phonological knowledge through performance (elicitation of novel forms,
corpus studies of language use, errors in production and perception). A case
study demonstrating an approach that combines both internal and external
evidence is also presented.

4.2 Approaches to phonological judgements (data sources)

The evidence for gradient patterns as part of the phonological knowledge of a
language comes from a variety of sources, including psycholinguistic experiments
using metalinguistic and language-processing tasks, and studies of
language corpora. In this section, these methods are briefly reviewed. While
the focus of this chapter is on tasks involving metalinguistic judgements,
additional sources of evidence that indirectly reflect linguistic knowledge,
such as corpus studies, are also reviewed. When these different methodological
approaches are used together, strong evidence for the relevance of
gradient phonological patterns can be obtained. For example, lexical corpus
studies have revealed numerous systematic statistical patterns in the phonotactics
of a variety of languages. If metalinguistic tasks in which participants
make judgements about novel forms that share those same patterns demonstrate
that participants make probabilistic generalizations, then it must be the
case that those statistical patterns are part of the phonological knowledge of
the participants.

4.2.1 Direct measures

Direct measures of phonological judgements are tasks that explicitly ask
participants to make a judgement about a lexical item or phonological form.
These tasks probe the explicitly available linguistic knowledge of participants.

4.2.1.1 Well-formedness judgements The most commonly used judgement
tasks are well-formedness judgement tasks, which attempt to directly probe
the grammaticality of phonological forms. Two variants of the well-formedness
judgement task are the acceptability task and the wordlikeness task.
In the acceptability task, the participant is presented with a novel phonological
form and given two choices: acceptable/possible or unacceptable/impossible.
This task is equivalent to the prototypical grammaticality judgement task used
elsewhere in linguistics. While the acceptability task might, at first glance,
appear to reduce the likelihood of collecting gradient data, gradience is still
commonly found in acceptability judgement tasks, in the form of variation in
the response to a stimulus item across participants. For example, Coleman and
Pierrehumbert (1997) collected acceptability judgements for novel multisyllabic
words containing illegal onset consonant clusters. They found considerable
variability across participants in the acceptability of these clusters, and this
variability was predicted in part by the probability of the rest of the novel word.
An illegal cluster in a high-probability context was accepted by more
participants than an illegal cluster in a low-probability context.
The wordlikeness task attempts to measure gradient well-formedness
within individual participants. Participants are asked to judge the degree to
which a novel word could be a word in their language. In Frisch et al. (2000)
and Frisch and Zawaydeh (2001), these judgements were on a one-to-seven
scale, where 'one' represents a word that could not possibly be a word in
the language and 'seven' represents a word that could easily be a word in the
language. Frisch et al. (2000) and Frisch and Zawaydeh (2001) collected
wordlikeness judgements for novel multisyllabic words in English and Arabic
respectively. As in the Coleman and Pierrehumbert (1997) study, one factor
that was shown to strongly affect wordlikeness is the aggregate probability of
the constituents in the novel word. An example of this task applied to novel
monosyllabic words is given in the case study section of this chapter.
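The notion of an aggregate probability of constituents can be illustrated with a toy computation in the spirit of Coleman and Pierrehumbert (1997). The mini-lexicon and the onset–rhyme parse below are invented for illustration; the actual studies estimate such probabilities from a full dictionary:

```python
from collections import Counter

# Invented mini-lexicon of monosyllables, parsed as (onset, rhyme).
LEXICON = [("str", "ip"), ("str", "ing"), ("sk", "ip"), ("b", "ing"), ("b", "ip")]

onsets = Counter(o for o, _ in LEXICON)
rhymes = Counter(r for _, r in LEXICON)
N = len(LEXICON)

def aggregate_probability(onset, rhyme):
    # Product of constituent probabilities; zero for unattested constituents.
    return (onsets[onset] / N) * (rhymes[rhyme] / N)

# A novel word built from frequent constituents scores higher:
print(round(aggregate_probability("str", "ing"), 4))  # 0.16
print(round(aggregate_probability("sk", "ing"), 4))   # 0.08
print(round(aggregate_probability("zv", "ing"), 4))   # 0.0 (unattested onset)
```

On this view, participants' graded wordlikeness ratings are expected to track such aggregate scores, higher-probability novel forms receiving higher ratings.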
Frisch et al. (2000) collected both acceptability judgements and wordlike-
ness judgements for the same novel word stimuli in English. In a comparison
of the two sets of data, they found that participants appeared to perform a
similar judgement task whether they were allowed just two options (accept-
able/unacceptable) or seven (1–7). In particular, Frisch et al. (2000) found that
the ‘unacceptable’ option was used in the acceptability task more frequently
than the rating of ‘one’ in the wordlikeness task. Thus it appeared that
participants applied the acceptable/unacceptable rating like a wordlikeness
task on a scale from ‘one’ to ‘two’. Overall, they found that phonotactic
probability predicted participant judgements in both tasks, and that partici-
pant judgements were similar across the two tasks. Given that acceptability
and wordlikeness tasks generated similar results, but the wordlikeness task
allows participants to express more subtle distinctions between forms,
researchers interested in gradient phonological patterns would likely benefit
from using the wordlikeness task.
Well-formedness judgement tasks have also been used to probe mor-
phophonological knowledge. Zuraw (2000) investigated nasal substitution
in Tagalog. Nasal substitution in Tagalog is a phonological process where
a stem initial obstruent consonant is replaced by a homorganic nasal.
For example, the stem /kamkam/ ‘usurpation’ surfaces with initial /ŋ/ in
/mapaŋamkam/ ‘rapacious’. This process is lexically gradient, occurring with
some stems but not others. The likelihood that a stem participates in nasal
substitution depends on the obstruent at the onset of the stem. Substitution is
Linguistic and Metalinguistic Tasks in Phonology 73

more likely for /p t s/ and less likely for /d g/. Zuraw (2000) used elicitation
tasks to explore the productivity of this process, and she also used a well-
formedness task where participants judged the acceptability of nasal substi-
tution constructions using novel stems in combination with common
prefixes. She compared wordlikeness judgements for the same novel stem in
forms with and without nasal substitution, and found that ratings were higher
for nasal substitution in cases where nasal substitution was more common in
the lexicon (e.g. /p t s/ onsets to the novel words).
Hay et al. (2004) examined the influence of transitional probabilities on
wordlikeness judgements for novel words containing medial nasal-obstruent
clusters (e.g. strimpy, strinsy, strinpy). Overall, they found that wordlikeness
judgements reflected the probability of the nasal-obstruent consonant cluster.
However, they found surprisingly high wordlikeness for novel words when the
consonant cluster was unattested (zero probability) in monomorphemic
words. They hypothesized that these high judgements resulted from partici-
pants’ analyses of these novel words as multimorphemic. This hypothesis was
supported in additional experiments where subjects were asked to make a
forced choice decision between two novel words as to which was more
morphologically complex. Participants were more likely to judge words
with low probability internal transitions as morphologically complex, dem-
onstrating that participants considered multiple analyses of the forms they
were given. Hay et al. (2004) proposed that participants would assign the
most probable parse to forms they encountered (see also Hay 2003).
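The transitional-probability predictor Hay et al. manipulated can be sketched as a conditional type frequency. The cluster counts below are invented for illustration; only the computation itself is the point:

```python
from collections import Counter

# Counts of medial nasal-obstruent clusters in a toy "monomorphemic"
# word list; the numbers are invented for illustration.
CLUSTERS = ["mp"] * 3 + ["ns"] * 2 + ["nt"] * 4

first_c = Counter(c[0] for c in CLUSTERS)
pair_c = Counter(CLUSTERS)

def transitional_prob(cluster):
    """P(second consonant | first consonant) within a medial cluster."""
    denom = first_c[cluster[0]]
    return pair_c[cluster] / denom if denom else 0.0

# /mp/ is well attested after /m/, /ns/ less so after /n/; /np/ is
# unattested here, so its zero probability invites a morphologically
# complex parse (strin+py rather than monomorphemic strinpy).
assert transitional_prob("mp") == 1.0
assert transitional_prob("np") == 0.0
```

A zero (or near-zero) transition inside a putative monomorpheme is exactly the signal that makes a multimorphemic parse the more probable analysis.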
The findings of Hay et al. (2004) highlight the importance of careful design
and analysis of stimuli in experiments using metalinguistic tasks. Although
these tasks are meant to tap ‘directly’ into phonological knowledge, the
judgements that are given are only as good as the stimuli that are presented
to participants and there is no guarantee that the strategy employed by
participants in the task matches the expectations of the experimenter. Another
example of this problem appears in the case study in this chapter, where
perceptual limitations of the participants resulted in unexpected judgements
for nonword forms containing presumably illegal onset consonant clusters.

4.2.1.2 Distance from English Greenberg and Jenkins (1964) is the earliest
study known to the authors that examined explicit phonological judgements
for novel word forms in a psycholinguistic experiment. They created novel
CCVC words that varied in their ‘distance from English’ as measured by a
phoneme substitution score. For each novel word, one point was added to its
phoneme substitution score for every position or combination of positions for
which phoneme substitution could make a word. For example, for the novel
word /druk/ there is no phoneme substitution that changes the first phoneme
to create a word. However, if the first and second phonemes are replaced, a
word can be created (e.g. /fluk/). For every novel word, substitution of all four
phonemes can make a word, so each novel word had a minimum score of one.
Greenberg and Jenkins (1964) compared the cumulative edit distance against
participants’ judgements when asked to rate the novel words for their ‘distance
from English’ using an unbounded magnitude estimation task. In this task,
participants are asked for a number for ‘distance from English’ based on their
intuition, with no constraint given to them on the range of numbers to be
used. Greenberg and Jenkins (1964) found a strong correlation between the
edit distance measure and participants’ judgements of ‘distance from English’.
They also found similar results when they used the same stimuli in a
wordlikeness judgement task where participants rated the words on an
11-point scale. Given that the data from a wordlikeness task are simpler to
collect and analyse, there appears to be no particular advantage to collecting
distance judgements in the study of phonology.
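The scoring procedure can be made concrete. The sketch below computes a phoneme substitution score against a stand-in lexicon (phonemes written as letters for simplicity); the word list is hypothetical, not Greenberg and Jenkins's:

```python
from itertools import combinations

# Stand-in lexicon (phonemes written as letters); not Greenberg and
# Jenkins's actual word list.
LEXICON = {"fluk", "drip", "drum", "spat"}

def substitution_score(nonword):
    """One point for each set of positions such that substituting phonemes
    at exactly those positions (and nowhere else) yields a real word."""
    positions = range(len(nonword))
    score = 0
    for k in range(1, len(nonword) + 1):
        for combo in combinations(positions, k):
            combo = set(combo)
            if any(len(w) == len(nonword)
                   and all((w[i] != nonword[i]) == (i in combo)
                           for i in positions)
                   for w in LEXICON):
                score += 1
    return score

# For /druk/ in this toy lexicon: {3} -> drum, {2,3} -> drip,
# {0,1} -> fluk, {0,1,2,3} -> spat; no single substitution of the
# first phoneme alone makes a word.
assert substitution_score("druk") == 4
```

With a real dictionary the all-positions substitution virtually always succeeds, which is why every nonword in the original study had a minimum score of one.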

4.2.1.3 Similarity between words Another phonological judgement task is
to ask participants to judge the similarity between pairs of real words or novel
words (Vitz and Winkler 1973; Sendlmeier 1987; Hahn and Bailey 2003). This
task has been used to investigate the dimensions of word structure that are
most salient to participants when comparing two words. Presumably, these
same dimensions of word structure would be the most salient to participants
in making well-formedness judgements which, in some sense, involve
comparing a word to the entire lexicon. These tasks have found that word
initial constituents and stressed constituents have greater impact on similarity
judgements than other constituents. However, it is well-known from the
literature on similarity judgements in cognitive science that similarity
comparisons are context dependent (Goldstone et al. 1991). Thus the most
salient factors for a pairwise comparison of words might not be the most
salient factors when a novel word is compared to the language as a whole. This
is a research area where very little direct work has so far been done (see Hahn
and Bailey 2003). Frisch et al. (2000) analysed a variety of predictors for their
well-formedness judgements and found some evidence that word initial
constituents and stressed constituents had a greater role in predicting
well-formedness, at least in the case of longer novel words. Thus, it appears
that similarity judgements are a tool that can investigate the same dimensions
of lexical structure and organization from a different perspective.
If a well-defined relationship between well-formedness and similarity
judgements can be found, then that would suggest more explicitly that the
well-formedness task is somehow grounded in a similarity judgement to one
or more existing lexical items.

4.2.2 Indirect measures


Indirect measures of well-formedness reflect grammatical linguistic know-
ledge through linguistic performance. Indirect measures have provided evi-
dence for the psychological reality of gradient phonological patterns, as
several studies have shown participant performance to be influenced by the
phonological probability of novel words or sub-word constituents.
4.2.2.1 Elicitation of novel forms The traditional linguistic approach to
exploring competence experimentally is to elicit examples that demonstrate
the linguistic process of interest. This procedure has been extended to
elicitations that involve nonsense forms, demonstrating unequivocally that a
process is productive. This type of experiment is sometimes referred to as
‘wug-testing’, in honour of one of the first uses of this technique by Berko
(1958) to explore children’s knowledge of plural word formation using
creatures with novel names (e.g. ‘this is a wug, and here comes another one,
so now there are two of them; there are now two ____’). Elicitation of novel
forms can be used to examine gradient phonological patterns. Variation
across participants in their performance in an elicitation task provides
evidence for an underlying gradient constraint. Alternatively, if participants
are given a number of stimuli of roughly the same phonological form to
process, variation within an individual can be observed (Zuraw 2000).
4.2.2.2 Analysis of lexical distributions Linguistic corpora have also been
used as a source of external evidence. Generally, these corpora have resulted
from large, organized research projects such as CELEX (Burnage 1990) or the
Switchboard corpus (Godfrey et al. 1992). Recently, however, web searches
have been used to generate corpora from the large amount of material posted
on the internet (e.g. Zuraw 2000). Written corpora may be more or less useful
for phonological study, depending on the phenomenon to be examined and
the transparency of the writing system for the language involved. Written
corpora have shown the greatest potential in the study of morphophonology.
For example, Anttila (1997) and Ringen and Heinamaki (1999) examined the
quantitative pattern of vowel harmony in Finnish suffixes using large written
corpora. They found that harmony is variable, with the likelihood of harmony
depending on the distance between the harmony trigger and target, and also
the prosodic prominence of the trigger. In a subsequent psycholinguistic task,
Ringen and Heinamaki (1999) found that type frequencies from their corpus
predicted well-formedness judgements of native speakers.
4.2.2.3 Confusability in perception and production Gradient well-formedness of phonological forms is also connected to speech perception
and production processes. It is well documented that word perception is
influenced by the number of phonologically similar words in the lexicon,
known as the size of the word’s lexical neighbourhood (e.g. Luce and Pisoni
1998). However, lexical neighbourhood sizes are correlated with phonological
probability and the relationship between these two measures has yet to be
clearly established (Bailey and Hahn 2001; Luce and Large 2001). Overall,
words that share sub-word constituents such as onsets, rimes, or nuclei with
many other words are subject to competition during spoken word
recognition. In addition, generalizations about more and less frequent sub-
word sequences bias phonetic perception in favour of more frequent
sequences (Pitt and McQueen 1998; Moreton 2002). Together, these findings
suggest that lexical organization for language processing utilizes those sub-
word phonotactic constituents.
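Lexical neighbourhood size in the Luce and Pisoni sense is conventionally the number of words one phoneme substitution, deletion, or addition away. A minimal sketch over a toy word list (letters standing in for phonemes):

```python
def neighbors(word, lexicon):
    """Words differing from `word` by one substitution, deletion, or addition."""
    out = set()
    for w in lexicon:
        if w == word:
            continue
        if len(w) == len(word):
            # Substitution neighbour: exactly one mismatched position.
            if sum(a != b for a, b in zip(w, word)) == 1:
                out.add(w)
        elif abs(len(w) - len(word)) == 1:
            # Addition/deletion neighbour: deleting one phoneme of the
            # longer string yields the shorter one.
            shorter, longer = sorted((w, word), key=len)
            if any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer))):
                out.add(w)
    return out

lexicon = {"cat", "bat", "cut", "cast", "at", "dog", "cats"}
assert neighbors("cat", lexicon) == {"bat", "cut", "cast", "at", "cats"}
```

Computing both neighbourhood size and constituent probabilities over the same word list is one way to probe the correlation between the two measures noted above.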
Similar eVects of phonotactic probability have been demonstrated in
speech production. For example, Munson (2001) probed the relationship
between phonological pattern frequency and repetition of nonwords by adults
and children. The nonwords created for Munson’s study used consonant
clusters embedded in a two-syllable construct (CVCCVC). The stimuli were
first tested to determine if the frequency of constituents affected wordlikeness
ratings by participants. The wordlikeness ratings were highly correlated with
the phonological pattern frequency of the sound sequences within the novel
words. Munson then used the same stimuli to study the ability of children and
adults to accurately repeat nonwords and found that subjects were less
accurate in their repetition of nonwords that contained less frequent phon-
eme sequences. In addition, Munson (2001) found there was greater variabil-
ity within participants’ productions for low probability items versus high
probability items. Similar speech production eVects have been found else-
where for monolingual adults (e.g. Vitevitch and Luce 1998) and second
language learners (e.g. Smith et al. 1969), although with additional compli-
cations that will not be discussed here. These results further support theories
that information about phonological pattern frequency is encoded at the
processing and production levels of linguistic representation.

4.2.3 Summary
Studies of language patterns, language processing, and metalinguistic judge-
ments have found substantial evidence that probabilistic phonological pat-
terns are part of the knowledge of language possessed by speakers. These
patterns are reflected in the frequency of usage of phonological forms, the ease
of processing of phonological forms, and gradience in metalinguistic judge-
ments about phonological forms. In the next section, a case study is presented
that demonstrates probabilistic patterns in the cross-linguistic use of conson-
ant clusters and gradience in metalinguistic judgements about novel words
with consonant clusters that reflects the cluster’s probability.

4.3 Case study


The example case study presented here focuses on initial consonant clusters.
In a linguistic corpus study, the frequency of word onset consonant
clusters across and within languages is examined. Cross-linguistic patterns
of consonant clustering are reflected in the likelihood of use of a consonant
cluster type within a language. The pattern appears to be gradient, and based
on the scalar property of sonority. In an experimental study, wordlikeness for
novel words based on the type frequency of the onset cluster is examined.
Wordlikeness judgements are affected by consonant cluster frequency, show-
ing that native speakers are sensitive to consonant cluster frequency, and thus
are capable of encoding a gradient onset consonant cluster constraint.

4.3.1 Cross-linguistic patterns in onset clusters


It has been claimed that consonant cluster combinations are restricted within
and across languages by sonority sequencing. Sonority is a property of seg-
ments roughly corresponding to the degree of vocal tract opening or acoustic
intensity (Hooper 1976). Stop consonants have the lowest sonority and liquids
and semi-vowels have the highest sonority. Analyses of sonority sequencing
constraints have generally assumed that there is a parametric restriction on the
degree of sonority difference required in a language, where larger sonority
differences are allowed and smaller sonority differences are not allowed
(e.g. Kenstowicz 1994). The current study shows that sonority restrictions are
gradient, rather than absolute. A closer examination of sonority sequencing
for onsets presented in this section shows that the scalar property of sonority is
reflected quantitatively in the frequency of occurrence of consonant cluster
types across languages. In addition, the data support an analysis of sonority
constraints based on ‘sonority modulation’ (Ohala 1992; Wright 1996).
The cross-linguistic preference is for onset consonant clusters that have a
large sonority difference, whether rising or falling. Consonant clusters with
large sonority differences occur more frequently than consonant clusters
with small sonority differences across a wide range of languages.
Evidence for gradient sonority modulation constraints can be found by
looking at the distribution of consonant clusters within a language. Although
a particular degree of sonority modulation may be permitted in a language, it
is usually the case that not every conceivable cluster for a particular sonority
difference is attested. For example, consider word final liquid+C clusters in
English, as in pert, purse, fern, furl. Table 4.1 shows the number of attested
word final liquid+C cluster types, the number that are theoretically possible
given the segment inventory of English, and the percentage of possible clusters
that are attested, for the four different levels of sonority difference. For
example, attested cluster types include /rt/, /rs/, /rn/, and /rl/ as demonstrated
in the examples above, as well as /lt/ (e.g. pelt, wilt, salt), /lb/ (bulb), /lv/ (delve,
valve), /lm/ (elm, psalm). Unattested clusters include /lg/, /rŋ/, and /lr/. The
number of possible consonant clusters is determined by the size of the
segment inventory. For example, English has three nasal consonants and
two liquids, so there are six theoretically possible nasal-liquid combinations.
Notice in Table 4.1 that as the sonority difference between C and the preced-
ing liquid decreases, the relative number of attested clusters decreases quan-
titatively, even when the possible number of cluster combinations is taken
into account.
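The ‘possible’ row of Table 4.1 follows arithmetically from inventory sizes, as the nasal example in the text shows (2 liquids × 3 nasals = 6). A sketch, where the stop and fricative inventory sizes (6 and 8) and the exclusion of geminate liquid+liquid pairs are assumptions chosen here to reproduce the table:

```python
# Inventory sizes per manner class for the second consonant C. The
# liquid count (2) and nasal count (3) come from the text; the stop (6)
# and fricative (8) counts are assumed. For liquid+liquid, each liquid
# can combine only with the *other* liquid (geminates excluded), hence
# 1 available partner per liquid.
LIQUIDS = 2
SECOND_C = {"stop": 6, "fricative": 8, "nasal": 3, "liquid": 1}

possible = {m: LIQUIDS * n for m, n in SECOND_C.items()}
attested = {"stop": 11, "fricative": 11, "nasal": 4, "liquid": 1}
pct = {m: round(100 * attested[m] / possible[m]) for m in possible}

# These reproduce the Possible and % Attested rows of Table 4.1.
assert possible == {"stop": 12, "fricative": 16, "nasal": 6, "liquid": 2}
assert pct == {"stop": 92, "fricative": 69, "nasal": 67, "liquid": 50}
```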
Similar quantitative analyses of permissible consonant clusters were con-
ducted for onset clusters in a sample of thirty-seven languages from a variety
of language families. In this analysis, attested and possible cluster types were
determined for all combinations of two stop, fricative, nasal, and liquid
consonants as word onsets. The languages analysed were Abun, Aguatec
(Mixtec), Albanian, Amuesha, Chatino, Chinantec (Quioptepec), Chinantec
(Usila), Chontal (Hokan), Chukchee, Coeur d’Alene, Cuicatec, Dakota
(Yankton), English, Greek, Huichol, Hungarian, Ioway-Oto, Italian, Keresan,
Khasi, Koryak, Kutenai, Mazatec, Norwegian, Osage, Otomi (Mazahua),
Otomi (Temoayan), Pame, Portuguese, Romanian, Takelma, Telugu, Terena,
Thai, Totonaco, Tsou, and Wichita. For each language, a four-by-four table
similar to Table 4.1 was created with the percentage of attested clusters for

Table 4.1. Word final liquid+C clusters in English

C Stop Fricative Nasal Liquid

Attested 11 11 4 1
Possible 12 16 6 2
% Attested 92% 69% 67% 50%
each combination of consonant type (e.g. stop-stop, stop-fricative, fricative-stop, etc.).
Each four-by-four table was then analysed to determine whether larger
sonority differences were reflected in greater numbers of attested clusters
while smaller sonority differences were reflected in fewer numbers of attested
clusters. For each four-by-four table, there are twenty-four such comparisons
that could be made (e.g. stop-stop to stop-fricative, stop-stop to stop-nasal,
stop-stop to fricative-stop, etc.). The mean across languages was 74 per cent of
comparisons supporting sonority modulation (e.g. fricative-stop and stop-
fricative more frequent than stop-stop and fricative-fricative). In other words,
possible clusters with large sonority differences are more frequently attested
than possible clusters with small sonority differences cross-linguistically. For
these onset clusters, sonority modulation appeared to be equally robust
for rises toward the syllable nucleus (75 per cent) as for falls toward the
nucleus (73 per cent). The minimum for any particular language was 50
per cent of comparisons supporting sonority modulation (and so 50 per cent
not supporting it) and the maximum was 100 per cent of comparisons
supporting sonority modulation. Interestingly, non-modulation was never
more common than sonority modulation for any of the thirty-seven languages,
suggesting that sonority modulation is a universal gradient cross-linguistic
constraint.
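The within-language analysis can be sketched as follows. Since the chapter does not fully specify which twenty-four comparisons were made, this generic variant compares every pair of cells with unequal sonority differences; the four-point sonority scale and the toy data are illustrative assumptions:

```python
from itertools import combinations

# An illustrative sonority ranking (after Hooper 1976).
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}
MANNERS = list(SONORITY)

def sonority_diff(cell):
    c1, c2 = cell
    return abs(SONORITY[c2] - SONORITY[c1])

def modulation_support(pct_attested):
    """Share of cell pairs with unequal sonority differences in which the
    larger-difference onset type has the higher percentage attested.

    `pct_attested[(c1, c2)]` is the % of possible c1+c2 onsets attested."""
    cells = [(a, b) for a in MANNERS for b in MANNERS]
    support = total = 0
    for x, y in combinations(cells, 2):
        dx, dy = sonority_diff(x), sonority_diff(y)
        if dx == dy:
            continue
        total += 1
        hi, lo = (x, y) if dx > dy else (y, x)
        support += pct_attested[hi] > pct_attested[lo]
    return support / total

# A toy language whose attestation rates track sonority difference
# exactly supports modulation in 100% of comparisons.
toy = {(a, b): 25 * sonority_diff((a, b)) for a in MANNERS for b in MANNERS}
assert modulation_support(toy) == 1.0
```

Running this per language and averaging the returned proportions parallels the 74 per cent figure reported above, modulo the exact comparison set used in the original analysis.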

4.3.2 Wordlikeness judgements


Given the apparent presence of quantitative sonority modulation constraints,
the next question to be examined is whether these quantitative constraints are
part of the synchronic grammar of a native speaker. In this section, an
experiment is presented that demonstrates that onset consonant cluster
frequencies influence well-formedness judgements of native speakers of Eng-
lish. In this experiment, native speakers of English rated novel monosyllabic
words with onset consonant clusters for wordlikeness, as in previous studies
(e.g. Frisch et al. 2000).
4.3.2.1 Stimuli The stimulus list was created by randomly matching twenty
different onset consonant clusters with varying frequencies of occurrence in
the lexicon (type frequency) to rimes selected from the mid-range of rime
frequencies. The occurrence frequency of each constituent was obtained from
a computerized American English dictionary (Nusbaum et al. 1984). Among
the consonant clusters were four clusters that do not occur in English (/sr, tl,
dl, θl/) and the remaining sixteen ranged in frequency from clusters that
occur very rarely (/gw, sf, dw/) or somewhat rarely (/tw, sm, θr, sn/) to clusters
that occur moderately frequently (/kw, dr, sl, fr/) or very frequently (/pl, sp, fl,
gr, pr/) in the lexicon. The nonword list was analysed to avoid potential
confounding factors for wordlikeness judgements such as violating a
phonotactic constraint somewhere other than in the onset. Rime statistics
were also compiled to examine the effects of rime frequency on judgements
(as in Kessler and Treiman 1997). After discarding items that were not suitable,
115 novel words remained to be used in the experiment. The nonwords were
recorded as spoken by the first author using digital recording equipment.
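Type frequency of an onset cluster is simply the number of dictionary entries beginning with it. A sketch over a toy orthographic word list (the real counts came from phonemic transcriptions in the Nusbaum et al. (1984) dictionary):

```python
from collections import Counter

# Toy orthographic word list standing in for a phonemically transcribed
# dictionary; entries and counts are illustrative only.
DICTIONARY = ["plan", "play", "split", "spit", "spin", "grab", "dwell", "tree"]

VOWELS = set("aeiou")

def onset(word):
    """Longest word-initial consonant sequence."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i]

# Type frequency of each two-consonant onset: how many distinct entries
# begin with it ("split" has a CCC onset and is excluded here).
cc_freq = Counter(onset(w) for w in DICTIONARY if len(onset(w)) == 2)
assert cc_freq == {"pl": 2, "sp": 2, "gr": 1, "dw": 1, "tr": 1}
```

Type frequency (how many distinct words contain a constituent) rather than token frequency (how often those words are used) is the measure at issue throughout this chapter.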
4.3.2.2 Participants Thirty-five undergraduate students in an introductory
communication sciences and disorders course participated in the experiment.
Subjects were between 19 and 45 years of age, and three males and thirty-two
females participated. All participants were monolingual native speakers of
American English and reported no past speech or hearing disorders.

4.3.2.3 Procedure The experiment was conducted using ECOS/Win
experiment software. Participants were seated at individual carrels for the
experiment in groups of one to four. Subjects listened to each of the stimuli
presented one at a time through headphones at a comfortable listening level.
The computer screen displayed a rating range between 1 (not at all like
English) and 7 (very much like English) and the participants gave their
responses by clicking with a mouse on the button that corresponded with
their rating. The total experiment required approximately 15 minutes of the
participants’ time to complete.

4.3.2.4 Results The data were analysed based on the mean rating given to
each stimulus word across subjects. Three instances where a participant gave
no response to the stimulus were discarded; otherwise, all data were analysed.
Correlations were examined for the type frequency of onset CC, C1, C2, rime,
nucleus, and coda. As expected, mean wordlikeness judgements correlated
significantly with the type frequency of the CC sequences contained in the
novel words. The CC frequency was the strongest predictor of how the
participants judged wordlikeness (r = 0.38). The frequency of the nucleus
was also correlated with the participant’s judgement of wordlikeness (r = 0.19).
In a regression model of the wordlikeness data using these two factors, both
factors were found to be significant (CC: t(112) = 4.1; p < .001; Nucleus:
t(112) = 2.0; p < .05).
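The two-factor analysis can be sketched with ordinary least squares. The data below are synthetic stand-ins generated with assumed coefficients, not the experimental ratings; only the form of the model is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 115 items: log CC type frequency and a
# scaled nucleus frequency as predictors of mean wordlikeness rating.
n = 115
log_cc = rng.uniform(0, 3, n)     # log10 CC type frequency (assumed range)
nucleus = rng.uniform(0, 1, n)    # scaled nucleus frequency (assumed)
rating = 2.0 + 0.8 * log_cc + 0.5 * nucleus + rng.normal(0, 0.3, n)

# Ordinary least squares: rating ~ intercept + log_cc + nucleus.
X = np.column_stack([np.ones(n), log_cc, nucleus])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)

# The recovered coefficients should sit close to the generating values.
assert abs(beta[1] - 0.8) < 0.2
assert abs(beta[2] - 0.5) < 0.5
```

With real ratings one would additionally compute t-statistics for each coefficient (e.g. via statsmodels), which is what the t(112) values above report.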
The amount of variance explained by the CC frequency and the nucleus
frequency is relatively small (cf. Munson 2001; Hay et al. 2004). Unexpectedly,
we found that participant judgements of the novel words containing CC that
do not occur in English were fairly high. Possible explanations for this finding
are being investigated. Based on preliminary data collected to date, it appears
that participants do not consistently perceive the illegal CC that is presented,
but instead regularly perceive a similar sounding CC that is allowed in English
(e.g. /tl/ → /pl/). As mentioned in the discussion of Hay et al. (2004),
experiment participants assigned a more probable parse to the unattested
sequences present in the stimuli.
Setting aside the unattested clusters, the pattern for the remaining clus-
ters is as expected. Mean ratings for the attested consonant clusters
used in the experiment are shown in Figure 4.1 with a best-fit line for the
effect of log CC type frequency in these clusters. The data clearly show that
the CC frequency was a strong indicator of the subjects’ ratings of stimulus
words.
Overall, then, this experiment demonstrates that English speakers are
sensitive to the frequency of occurrence of onset consonant clusters, and
thus have learned the patterns in the lexicon that would reXect a universal
sonority modulation constraint. Thus it seems possible that English speakers
(and presumably speakers of other languages) could learn a sonority modu-
lation constraint.

[Figure 4.1 here: scatter plot of mean wordlikeness rating (vertical axis) against log CC type frequency (horizontal axis, 1–1000, log scale), with labelled points for the attested clusters and a best-fit line.]

Figure 4.1. Mean wordlikeness rating for occurring consonant clusters in English
(averaged across stimuli and subjects) by consonant cluster type frequency
4.4 Implications
Phonology provides an ideal domain in which to examine gradient patterns
because there is a rich natural database of phonological forms: the mental
lexicon. A growing number of studies using linguistic and metalinguistic
evidence have shown that phonological structure can be derived, at least in
part, from an analysis of patterns in the lexicon (e.g. Bybee 2001; Coleman and
Pierrehumbert 1997; Frisch et al. 2004; Kessler and Treiman 1997; and see
especially Hay 2003). In addition, it has been proposed that the organization
and processing of phonological forms in the lexicon is a functional influence
on the phonology, and that gradient phonological patterns quantitatively
reflect the difficulty or ease of processing of a phonological form (Berg 1998;
Frisch 1996, 2000, 2004).
For example, the storage of a lexical item must include temporal informa-
tion, as distinct orderings of phonemes create distinct lexical items (e.g. /tɑp/
is different from /pɑt/ in English). The temporal order within a lexical item is
reflected in some models of speech perception, such as the Cohort model and
its descendants (e.g. Marslen-Wilson 1987), where lexical items compete first
on the basis of their initial phonemes, and then later on the basis of later
phonemes. Sevald and Dell (1994) found influences of temporal order on
speech production. They had participants produce nonsense sequences of
words and found that participants had more difficulty producing sequences
where words shared initial phonemes. Production was facilitated for words
that shared final phonemes (in comparison to words without shared phon-
emes). In general, models of lexical activation and access predict that lexical
access is most vulnerable to competition between words for initial phonemes
and less vulnerable for later phonemes, as the initial portions of the accessed
word severely restrict the remaining possibilities. This temporal asymmetry
can be reflected quantitatively in functionally grounded phonological con-
straints. For example, phonotactic consonant co-occurrence constraints in
Arabic are stronger word initially than later within the word (Frisch 2000). This
asymmetry is compatible with the claimed grounding of this co-occurrence
constraint in lexical access (Frisch 2004).
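The temporal asymmetry can be illustrated with a minimal Cohort-style candidate set: the set of words compatible with the input shrinks as successive phonemes arrive, so early phonemes do the most pruning and early overlap creates the most competition. The word list is illustrative:

```python
def cohort(prefix, lexicon):
    """Candidate words still consistent with the phonemes heard so far."""
    return {w for w in lexicon if w.startswith(prefix)}

LEXICON = {"cat", "cap", "can", "cast", "dog"}

# Candidate-set size as successive phonemes of "cat" arrive: the first
# phoneme prunes the most; the final phoneme leaves a unique candidate.
sizes = [len(cohort("cat"[:i], LEXICON)) for i in range(4)]
assert sizes == [5, 4, 4, 1]
```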
It has also been demonstrated in a wide range of studies that the phono-
logical lexicon is organized as a similarity space (Treisman 1978; Luce and
Pisoni 1998). The organization of the lexicon as a similarity space is reflected
in processing differences based on activation and competition of words that
share sub-word phonotactic constituents. The impact of similarity-based
organization is most clearly reflected in cases of analogical processes between
phonologically similar words (Bybee 2001; Skousen et al. 2002). In related
work, similarity has also been used to explain limitations on phonological
processes. For example, Steriade (2000, 2001) claims that grammar is con-
strained to create maximally similar underlying and derived forms where
similarity is defined over a perceptual lexical space.
Given the grounding of gradient phonological patterns in lexical distri-
butions, it is unclear whether a distinct phonological grammar (phono-
logical competence) is required above and beyond what is necessary to
explain patterns of phonological processing (phonological performance).
Ultimately, this is an empirical question. Arguments for and against a
distinct phonological grammar have been made (e.g. Pierrehumbert 2003;
Moreton 2002). However, different models of phonological processing and
the lexicon make different predictions about the nature of phonological
patterns. Traditionally, models of language processing hypothesize that ab-
stract linguistic constituents, such as phonemes, onsets, rimes, and syllables,
are used in speech perception, spoken word recognition, speech production,
and lexical organization. These models assume that, at some stage of pro-
cessing, an abstract representation is involved that is independent of specific
instances (e.g. Dell 1986; Berent et al. 1999). This type of model uses
generalizations that are naturally akin to the types of representations
found in phonological grammars. These types of representations would
encode gradient phonological patterns as frequency counts or probabilities
for the symbols.
Recently, an alternative processing model based on instance-specific exem-
plars has received empirical support in language processing (e.g. Johnson
1997; Pierrehumbert 2002). Exemplar models can still explain generalization-
based behaviour as a reXex of the collective activation of exemplars that are
similar along some phonological dimension. Thus, abstract phonological
categories may not be needed, and consequently a grammar containing
abstract phonological rules may be an illusion resulting from the stable
behaviour that follows from the activation of a large number of overlapping
exemplars. Exemplar models of language represent the frequency information
in gradient phonological constraints in a completely straightforward and
transparent way. The frequency information is a direct consequence of fre-
quency of exposure and use.
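An exemplar computation of the sort this passage describes can be sketched as summed similarity-weighted activation, in the style of the Generalized Context Model; the exponential similarity function, the edit-distance metric, and the exemplar list are standard illustrative choices, not the chapter's own model:

```python
from math import exp

def edit_distance(a, b):
    """Levenshtein distance between two phoneme strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[-1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def activation(novel, exemplars, s=1.0):
    """Summed exponential similarity of a novel form to all stored exemplars."""
    return sum(exp(-s * edit_distance(novel, e)) for e in exemplars)

exemplars = ["flip", "flap", "slip", "slap", "trip"]
# A novel form near a dense exemplar cloud is more strongly activated
# than a distant one: generalization without any explicit phonotactic rule.
assert activation("flop", exemplars) > activation("zzzz", exemplars)
```

Frequency effects fall out for free here: adding more tokens of a form to the exemplar list directly raises the activation of its neighbours.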
The symbolic and exemplar representation systems are not necessarily
mutually exclusive. It is also possible that both abstract categories and
instance-specific exemplars are part of a person’s phonological knowledge,
creating a phonological system with redundancy and with competing analyses
of the same phonological form (cf. Bod 1998; Pierrehumbert 2003).
4.5 Summary
Studies of gradience in phonology using linguistic and metalinguistic data
have revealed a much closer connection between phonological grammar and
the mental lexicon. These new dimensions of phonological variation could
not have been discovered without corpus methods and data from groups of
participants in psycholinguistic experiments. While the range of patterns that
have been studied is still quite limited, the presence of gradient phonological
constraints demonstrates that phonological knowledge goes beyond a cat-
egorical symbolic representation of possible forms in a language. In order to
accommodate the broader scope of phonological generalizations, models of
grammar will have to become more like models of other cognitive domains,
which have long recognized and debated the nature of frequency and simi-
larity effects for mental representation and processing. The study of phon-
ology may provide a unique contribution in addressing these more general
questions in cognition for two reasons. The scope of phonological variability
is bounded by relatively well-understood mechanisms of speech perception
and speech production. Also, phonological categories and phonological pat-
terns provide a sufficiently rich and intricate variety of alternatives that the
full complexity of cognitive processes can be explored.
5

Intermediate Syntactic Variants


in a Dialect-Standard Speech
Repertoire and Relative
Acceptability
LEONIE CORNIPS

5.1 Introduction
Non-standard varieties, such as dialects throughout Europe, which are under
investigation challenge research about the phenomenon of micro-variation in
two ways.1 Within the framework of generative grammar, the linguist studies
the universal properties of the human language in order to Wnd out the
patterns, loci, and limits of syntactic variation. Language is viewed essentially
as an abstraction, more speciWcally, as a psychological construct (I-language)
that refers primarily to diVerences between individual grammars within a
homogeneous speech environment, that is to say, without consideration of
stylistic, geographic, and social variation. Given this objective, a suitable
investigative tool is the use of intuitions or native-speaker introspection, an
abstraction that is quite normal within the scientiWc enterprise. Frequently,
however, there are no suYciently detailed descriptions available of syntactic
phenomena that are of theoretical interest for investigating micro-variation in
and between closely related non-standard varieties in a large geographical area
(cf. Barbiers et al. 2002). Subsequently, a complication emerges in that the
linguist has to collect his own relevant data from speakers of local dialects who
are non-linguists. The elicitation of speaker introspection often calls for a
design of experiments in the form of acceptability judgements when the
linguist has to elicit intuitions from these speakers (Cornips and Poletto 2005).

1 I would like to thank two anonymous reviewers for their valuable comments. Of course, all the
usual disclaimers apply.

Moreover, the standard variety may strongly interfere with local dialect
varieties in some parts of Europe, so that there is no clear-cut distinction
between the standard and the local dialect. In this contact setting—a so-called
intermediate speech repertoire (cf. Auer 2000)—speakers of local dialects
may attribute all possible syntactic variants, that is dialect, standard, and
emerging intermediate variants, to their local dialect. Consequently, clear-cut
judgements between the local dialect and the standard variety are not attainable
at all. This is, among other factors, of crucial importance for understanding
the phenomenon of gradedness in acceptability judgements.
This chapter is organized as follows. In the second part it is proposed that
acceptability judgements do not offer a direct window into an individual's
competence. The third part discusses an intermediate speech repertoire that is
present in the province of Limburg in the Netherlands. In this repertoire some
constructions are difficult to elicit. Finally, acceptability judgements given
by local dialect speakers in the same area are discussed. Using data from
reflexive impersonal passives, reflexive ergatives, and inalienable possession
constructions, it is argued that the occurrence of intermediate variants and
the variation at the level of the individual speaker is not brought about by
specific task effects but is due to induced language contact effects between
the standard variety and the local dialects.

5.2 Relative acceptability


Bard et al. (1996: 33) discuss the important three-way distinction among
grammaticality, a characteristic of the linguistic stimulus itself; acceptability, a
characteristic of the stimulus as perceived by a speaker; and an acceptability
judgement, which is the speaker's response to the linguist's enquiries. These
authors note that relative grammaticality is an inherent feature of the grammar
whereas relative acceptability reflects gradience in acceptability judgements.
The former has a controversial status, since it is not entirely clear how
to deal with relative grammaticality in formal theory. According to Schütze,
the best-known proponents of the view that grammaticality occurs on a
continuum are Ross, Lakoff, and their followers in the late 1960s and early
1970s (see Schütze 1996: 62, 63 for more detail).
With respect to acceptability judgements, every elicitation situation is artificial:
the speaker is being asked for a sort of behaviour that, at least on the face of
it, is entirely different from everyday conversation (cf. Schütze 1996: 3). Moreover,
Chomsky (1986: 36) argues that: 'In general, informant judgments do
not reflect the structure of the language directly; judgments of acceptability,
for example, may fail to provide direct evidence as to grammatical status
because of the intrusions of numerous other factors' (cf. Gervain 2002). 'The
intrusion of numerous other factors' may lead to a crucial mismatch between
the acceptability judgements of a construction and its use in everyday speech.
One of these factors is that in giving acceptability judgements people tend to
go by prescriptive grammar (what they learned at school, for instance) rather
than by what they actually say (cf. Altenberg and Vago 2002). This is consistent
with the sociolinguistic finding that prescriptive grammars usually reflect
standard varieties, which are considered more 'correct' or have more prestige
than the vernacular forms speakers actually use. Moreover, strong sociolinguistic
evidence shows that a speaker may judge a certain form to be completely
unacceptable but can, nevertheless, be recorded using it freely in
everyday conversation (Labov 1972, 1994, 1996: 78). One way to diminish the
prescriptive knowledge effect is to ask for indirect comparative acceptability
judgements. Rather than eliciting direct intuitions by the formula 'Do you
judge X a grammatical/better sentence than Y?', speakers can be asked the
more indirect 'Which variant, Y or X, do you consider to be the most or the
least common one in your local dialect?' Relative judgements can be administered
by asking the speakers to indicate how uncommon or how common (for
example, represented by the highest or lowest value on a several-point scale,
respectively) the variant is in their local dialect. Psychometric research
shows that subjects are much more reliable on comparative, as opposed
to independent, ratings (cf. Schütze 1996: 79 and references cited there). These
findings indicate that relative acceptability is an inevitable part of the
speaker's judgements.
Relative acceptability is without doubt brought about by the complex
relationship between I-language and E-language phenomena. The opposition
between these two types of phenomena is not necessarily as watertight as is often
claimed in the literature. Muysken (2000: 41–3) argues that the cognitive
abilities which shape the I-language determine the constraints on forms
found in the E-language, and that it is the norms created within E-language
as a social construct that make the I-language coherent. One example of this
complex relationship may be that in a large geographical area two or more
dialects may share almost all of their grammar (more objective perspective)
but are perceived as different language varieties by their speakers (more
subjective perspective). This can be due to the fact that dialects may differ
rather strongly in their vocalism and that non-linguists are very sensitive to
the quality of vowels. The perceived differences between dialects may be
associated with different identities and vice versa. Another consequence
may be that speakers actually believe that the norm created within their speech
community (E-language; more subjective perspective) reflects their grammar
(I-language; more objective perspective). For example, in the Dutch Syntactic
Atlas project (acronym SAND) we asked dialect speakers to 'translate' standard
Dutch verbal clusters containing three verbs, as exemplified in (5.1), into
their local dialect (cf. Cornips and Jongenburger 2001).2
(5.1) Ik weet dat Jan hard moet kunnen werken
I know that Jan hard must can work
Quite a number of speakers told us that their dialects are simpler or more
informal than standard Dutch. Therefore, sentences such as (5.1) are judged as
ungrammatical or less grammatical than sentences containing two-verb clusters.
However, there is not one Dutch dialect attested that excludes constructions
as in (5.1). So, it is important to realize that the use of the dialect and the
standard variety may be triggered by stylistic or social factors (in a specific
setting, with specific interlocutors) if these varieties constitute a continuum.
Taking this into account minimizes the risk that we obtain information
about the prescriptive norms of the standard (or prestigious or formal)
variety while our intention is rather to question speakers about their dialect
(or vernacular non-standard) forms.

5.3 The intermediate speech repertoire


The so-called intermediate speech repertoire (cf. Auer 2000), as exemplified in
the southeastern part of the Netherlands (province of Limburg), is presumably
the most widespread in Europe today. In this repertoire, there is a structural
or genetic relationship between the standard variety and the dialects (cf. Auer
2005). The influence of the standard variety on the dialects is quite manifest.
There is no longer a clear-cut separation between the varieties, that is to
say, speakers can change their way of speaking without a clear and abrupt
point of transition between these varieties. This is of crucial importance to
understanding relative acceptability.
In general, syntactic elicitation presents no difficulties if structures are
grammatical in the standard variety and ungrammatical in the dialects (Cornips
and Poletto 2005). The speakers usually reject these constructions, either
by providing a grammatical alternative or by simply not being able to
translate the sentence, hence showing non-response. For instance, in the local
dialect of Heerlen (a city in the south of the province of Limburg in the
Netherlands, see Figure 5.1), negation agreement in (5.2a) is ungrammatical.

2 More information about the SAND project can be found at: http://www.meertens.nl/projecten/sand/sandeng.html
The local dialect speaker easily provides an alternative, as shown in (5.2b)
(taken from the Dutch Syntactic Atlas project):
(5.2) Instruction: ‘Translate into your local dialect’
a. Er wil niemand niet dansen
expl will no one not dance
‘No one wants to dance.’/ ‘Everyone wants to dance.’
Translation:
b. Gene wilt danse
no one wants dance
‘No one wants to dance.’
Second, speakers may provide syntactic features that are obligatory in the
local dialect even if the same phenomenon is banned from the standard
variety. In this case speakers seem to be able to distinguish whether a given
construction is grammatical without interference from prescriptive norms.
There is a very sharp contrast between the dialect of Heerlen and standard
Dutch regarding the grammaticality of the impersonal passive with a
reflexive. The local dialect speakers in Heerlen were asked
whether they encounter the variant in (5.3a) in their local dialect. This variant
is fully ungrammatical in the standard variety. The majority of the subjects (16
out of 24, 67 per cent) provide an affirmative answer, which is confirmed by
their translation, as exemplified in (5.3b):3
(5.3) Local dialect of Heerlen
Instruction: ‘Do you encounter this variant’
a. Er wordt zich gewassen
thereexpl is refl washed
Answer: ‘Yes’ and translation
b. 't weëd zich gewessje
thereexpl is refl washed
‘One is washing himself.’
Finally, in the case that the phenomenon is optional, speakers tend to reproduce
the standard variety, because this is nonetheless grammatical in their
dialect (Cornips and Poletto 2005). This issue is nicely illustrated by responses
to constructions in which an aspectual reflexive zich (an optional element) is
offered to the Limburg dialect speakers, as illustrated in (5.4) (cf. Cornips 1998):

3 However, the plain impersonal passive is also grammatical in the local dialect of Heerlen. In that
case it has no reflexive interpretation: 'One is washing (clothes).'
(5.4) d'r Jan had zich in twieë minute e beeke gedrònke
the Jan had refl in two minutes a small beer drunk

The construction with the reflexive zich is fully ungrammatical in the standard
variety but optional in the local dialect. The written questionnaire of the
Dutch Syntactic Atlas project shows that in only two out of thirty-five possible
locations in the province of Limburg and its immediate surroundings an
answer with the reflexive is given. Obviously, the interference from the
standard variety is so strong that the reflexive is not present in the answers.
It seems that only a very good subject can provide optional structures or all
the possibilities that come to his mind.

5.3.1 Heerlen Dutch as an emerging intermediate regional standard variety


Without any doubt, every emerging intermediate speech repertoire is a
result of an induced language contact situation and/or processes of standard–dialect
and/or dialect–dialect convergence, that is to say, vertical
and horizontal levelling, respectively (cf. Cornips and Corrigan 2005). The
emergence of intermediate variants, which may result in a regional variety as a
second standard in the area, is crucial to understanding the phenomenon of
syntactic variation within the speech community and at the individual
speaker level.
A good example of an intermediate variety due to language contact effects
is Heerlen Dutch (Cornips 1998). Heerlen Dutch is a regional standard Dutch
variety in the Netherlands. Heerlen is a town of 90,000 inhabitants, situated in
Limburg, a province in the southeast of the Netherlands near the Belgian and
German borders (see Figure 5.1).
As already discussed above, in the local dialect of Heerlen reflexives may
occur in a much wider range of constructions than in standard Dutch. An
example is the appearance of the reflexive zich in inchoative constructions. In
standard Dutch (SD) the appearance of zich in inchoative verb constructions is
far from regular. Zich is required in (5.5a), is optional in (5.5b), and obligatorily
absent in (5.5c) (cf. Everaert 1986: 83; SD = standard Dutch):
(5.5) a. SD Het gerucht verspreidt *(zich)
the rumour spreads refl
b. SD De suiker lost (zich) op
the sugar dissolves refl part.
c. SD De boter smelt (*zich)
the butter melts refl
[Map of the Netherlands showing Amsterdam, the German and Belgian (Flemish) borders, and the location of Heerlen]

Figure 5.1. The location of Heerlen in the province of Limburg

In the local dialect of Heerlen, the reflexive zich has to be present in (5.5b) and
it may be present in (5.5c). Further, it also arises in some more inchoative
constructions based on transitive verbs such as veranderen 'change', krullen 'curl',
and buigen 'bend'. All these inchoative constructions are fully ungrammatical
with a reflexive in the standard variety. Heerlen Dutch, however, as a second
standard variety in the area, has regularized the presence of the reflexive
throughout the whole verb class. It has an optional reflexive zich in the
construction in (5.5c) and also in (5.6), which are ungrammatical in the local
dialect (and in the standard variety) (HD = Heerlen Dutch):4
4 In Cornips and Hulk (1996), it is argued that the Heerlen Dutch constructions in (5.6) are ergative
intransitive counterparts of transitive change-of-state verbs with a causer as external argument.
Further, it is shown that in constructions such as (5.4) and (5.6), the reflexive marker zich acts as an
aspectual marker, that is: the aspectual focus is on the end-point of the event.
(5.6) a. HD Dit vlees bederft zich
this meat spoils refl
b. HD De jurk sleept zich over de vloer
the dress drags refl over the floor
c. HD Het glas breekt zich
the glass breaks refl
It is important to note that optionality arises as an induced contact outcome
(see also Section 5.3.2 below).
Another example of an emerging variant in Heerlen Dutch concerns the so-called
dative inalienable possession construction, in which the referent of
the dative object is the possessor of the inalienable body-part(s) denoted by
the direct object. Importantly, all (old) monographs of the Heerlen dialect and
written dialect literature (see Blancquaert et al. 1962; Jongeneel 1884; Kessels
1883) show that in the local dialect the DP referring to the body-part(s), such as
handen 'hands' in (5.7), is headed by the definite determiner de 'the'.
The possessive dative construction expressing inalienable possession is abundantly
used in the eastern dialect varieties of Dutch, although extremely rare in
standard Dutch (cf. Broekhuis and Cornips 1994; Cornips 1998). The inalienable
possession construction, as far as it is possible at all, has an idiomatic reading in
standard Dutch that is completely absent in this regional Dutch variety and in
the dialects of Limburg (Hdial = Heerlen dialect):
(5.7) Hdial /?*SD Ik was Jandat./hemdat. de handen.
I wash Jan/him the hands
‘I am washing Jan’s/his hands.’
Hence, in the standard variety the inalienable possession relation must be
expressed by means of a possessive pronoun, namely zijn 'his', as illustrated in
(5.8). The construction in (5.8) is in turn rare in the local dialect of Heerlen:
(5.8) SD/?*Hdial Ik was zijn/Jans handen.
I wash his/Jan's hands
'I am washing his/Jan's hands.'
Nowadays, Heerlen Dutch involves a large spectrum of intermediate lects
varying between the local dialect and standard Dutch. As a result, in Heerlen
Dutch we find both the inalienable dative construction in (5.7) and the
possessive pronoun construction in (5.8). Thus, the syntactic variation within
a regional variety such as Heerlen Dutch corresponds to cross-linguistic differences
between English and French, as in (5.9) and (5.10), respectively (Cornips 1998;
Vergnaud and Zubizarreta 1992):
(5.9) a. Eng *I am washing himdat. the hands.
b. Eng I am washing his hands.
(5.10) a. Fr *Je lave ses mains. (*inalienable reading)
b. Fr Je luidat. lave les mains.
Moreover, in spontaneous speech data—although rarely—we even find intermediate
forms such as the dative object in combination with the possessive
pronoun, as in (5.11b) (cf. Cornips 1998; HD = Heerlen Dutch):
(5.11) a. HD Ik was hemdat. de buik.
I wash him the stomach
b. HD Ik was hemdat. zijn buik. (intermediate form)
I wash him his stomach
c. HD Ik was zijn buik.
I wash his stomach
It is important to point out that this intermediate variant was already present in
1962 in Kerkrade, a community neighbouring Heerlen.5 Obviously,
the inalienable possessive construction is not a binary variable, since it allows
more than two variants, as illustrated in (5.11). Spontaneous speech data
containing intermediate variants are presented in (5.12) (see also (5.19) in
Section 5.3.2; Cornips 1998). In (5.12) the inalienable possession relation is
expressed both by means of a possessive dative je 'you' and by means of the
possessive pronoun je 'your'. Note, however, that in all these examples the DP
referring to an inalienable body-part is the complement of a PP and not a direct object:6
(5.12) a. HD want ze zeuren jedat van alles naar je
because they nag you everything to your
hoofd toe (19: Cor)
head part
‘They are nagging you about everything.’
b. HD die had jedat zo bij je neus staan
they had you right away with your nose stand
(35: dhr Berk)
‘They stand in front of you right away.’

5 Blancquaert et al. (1962) record the following translation in the local dialect of Kerkrade (in
Dutch orthography):
(i) De schipper likte zich zijn lippen af
The skipper licked refl his lips part.
‘The skipper licked his lips.’
6 The constructions in (5.12) with a PP are more accepted in standard Dutch than the double object
constructions (see Broekhuis and Cornips 1997 for more details about this type of construction).
In (5.13) the semi-copula krijgen 'get' cannot assign dative case to the possessor
hij 'he', which is therefore nominative (cf. Broekhuis and Cornips 1994).
However, the spontaneous speech example in (5.13b) shows that the
possessor can also be realized by means of the reflexive zich, which is fully
ungrammatical in the local dialect (as in standard Dutch):
(5.13) a. HD Hij krijgt de handen vies
he gets the hands dirty
‘His hands are dirty.’
b. HD Die heeft zich enorm op z’n donder gekregen
(33: dhr Quint)
he has refl enormously a beating got
‘He gets hell.’

5.3.2 The lack of characteristic properties of the dative inalienable possession construction
Interestingly, it is not only the case that intermediate variants emerge in an
intermediate speech repertoire; they also lack characteristic properties
(Vergnaud and Zubizarreta 1992: 598) of the 'original' dative inalienable
possession construction. Two important properties are the strictly distributive
interpretation and grammatical number. The former refers to the
presence of a plural possessor combined with a singular inalienable
argument, as in (5.14a). If that is the case, the referent of the inalienable
argument is nevertheless interpreted as referring to more than one body-part.
The latter property refers to the fact that inalienable arguments are
obligatorily singular when referring to body-parts of which the number per
individual is limited to one, such as 'head', regardless of whether they have a
plural possessor or not. This is illustrated by the grammaticality contrast
between (5.14b) and (5.14c) (note that there is no idiomatic reading
involved):
(5.14) a. Ik was hundat./3pl het hoofd (i.e. more ‘heads’ involved)
I wash them the head
‘I am washing their heads.’
b. *Ik was hundat./3pl de hoofdenpl
I wash them the heads
‘I am washing their heads.’
c. Ik was hundat./3pl de handenpl
I wash them the hands
‘I am washing (both) their hands.’
Let us now compare (5.14) with the spontaneous speech data example in (5.15)
that occurs very infrequently in the corpus:7
(5.15) HD Ze slaan mekaarpl niet meteen de koppenpl in
(5: Stef)
they hit each other not at once the heads in
‘They don’t hit each other immediately.’
Apparently, a distributive interpretation and grammatical number
may no longer be characterizing properties of the intermediate forms
in Heerlen Dutch, that is: the inalienable argument koppen 'heads' is plural
although the number of the body-part kop 'head' per individual is limited
to one.
Finally, the dative construction cannot be modified by just any attributive
adjective, whereas there is no such restriction in the possessive pronoun
construction, as exemplified in (5.16a) and (5.16b), respectively (Vergnaud
and Zubizarreta 1992):
(5.16) a. HD *Ik was hemdat. de vieze buik
I wash him the dirty stomach
‘I am washing his dirty stomach.’
b. HD Ik was zijn vieze buik.
I wash his dirty stomach
‘I am washing his dirty stomach.’
I asked speakers of Heerlen Dutch to tell a short story containing the elements
vuil 'dirty' and handen 'hands'. In addition to (5.16b), they produce the
intermediate variant in (5.17), which, due to the presence of the possessive
pronoun, lacks any restriction on the presence of the adjective:
(5.17) HD Hij wast hemdat. zijn vuile handen.
He washes him his dirty hands
‘He is washing his dirty hands.’
Further, a major aspect of emerging intermediate variants such as
the ones described above is that optionality arises. A central characteristic
of the dative inalienable possession construction is that the [spec TP]-subject,
or the agent, cannot enter into a possessive relation with the direct
object (or prepositional complement), not even if the indirect object is absent,
as illustrated in (5.18a). Thus, a possessive relation between the subject and the
direct object can only be expressed indirectly, namely by inserting a dative NP
or a reflexive zich, as in (5.18b) (see Broekhuis and Cornips 1994;
Vergnaud and Zubizarreta 1992).
(5.18) a. Hdial /SD *Hiji wast de handeni
b. Hdial /?*SD Hiji wast zichi de handen.
he washes refl the hands
‘He is washing his hands.’
However, in all the intermediate variants described so far in which the
possessive relation is expressed by the possessive pronoun, the dative object
or reflexive referring to a possessor is optional. Importantly, all constructions
in (5.19), in contrast to the double object constructions, involve idiomatic
readings:8
(5.19) a. HD ze zeuren (jedat) van alles naar je hoofd toe (19: Cor)
they nag you everything to your head part
‘They are nagging you about everything.’
b. HD die had (jedat) zo bij je neus staan
they had you right away with your nose stand
(35: dhr Berk)
‘They stand in front of you right away.’
c. HD Die heeft (zich) enorm op z’n donder gekregen
(33: dhr Quint)
he has refl enormously a beating got
‘He gets hell.’
Taken together, these intermediate variants are of extreme importance with
respect to the locus of syntactic variation, that is: whether the primitive of
variation is located outside or inside the grammatical system. It becomes
obvious that the facts in Heerlen Dutch indicate that syntactic variation can
no longer be exhaustively described by binary settings or different values of a
parameter (Cornips 1998). In an intermediate speech repertoire, this concept
is a very problematic one and must be open to discussion. Different alternatives
are possible, but there are no satisfying answers yet. A more recent
alternative is to argue that, from a minimalist point of view, lexical elements

8 It might be the case that, eventually, the optional dative object will disappear or that it will gain
emphatic meaning.
show minimal morphosyntactic differences, more specifically, in whether they
bear uninterpretable or interpretable features interacting with general principles.9
Another alternative is to place the notion of choice between syntactic variants
inside the grammatical system. Henry (1995) shows that imperatives in Belfast
English allow optional raising of the verb to C, and that inversion in embedded
questions may or may not occur. She accounts for this optionality by arguing that a
functional category such as C has both strong and weak features, instead of
different settings of a parameter. In sum, analyses differ with respect to the
locus of syntactic variation (grammar versus lexicon) and whether individuals
may have one or two (or more) grammars responsible for the various
syntactic variants.

5.4 Acceptability judgements in an intermediate speech repertoire


The coming into existence of intermediate variants is of crucial importance
for understanding the phenomena of relative acceptability and perhaps
relative grammaticality. Indeed, variationist studies have convincingly
shown that individual speakers do not show all the possible alternatives that
exist at the level of their community (Henry 2002; Cornips 1998). So, the
behaviour of individual speakers with respect to acceptability judgements
cannot be interpreted without knowledge of the community pattern. An
individual speaker thus has passive knowledge of more possible syntactic
alternatives than he actually uses, due to the fact that these possible alternatives,
that is standard, dialect, and emerging intermediate variants, can be
heard daily in his community. The intermediate variants form a continuum
with the standard and local dialect varieties. This continuum arises not only
from a geographic perspective but also from a stylistic perspective (for example
the use of dialect and standard features in a more informal and formal setting,
respectively) and a social perspective (age, gender, ethnicity, levels of education,
and occupation of the speaker). I propose that a speaker may
no longer be able to judge syntactic features as fully grammatical or
ungrammatical. Instead, it is very likely that, due to the effects of the
standard–dialect contact situation, the speaker can only make relative judgements
by comparing those variants. Further, it might be the case that this
gradience in acceptability judgements partly arises due to the relative

9 However, even if the syntactic variants are analysed as coming into existence as a result of two
competing grammars (cf. Kroch 1989), some lexical elements must still bear uninterpretable and
interpretable features as well in order to account for the syntactic alternatives.
grammaticality of the intermediate variants in the community, namely the
fact that intermediate variants no longer possess characterizing properties,
as discussed in Section 5.3.2.
In the previous section, a regional standard variety was discussed. Let us
now consider the local dialects in the same area. These dialects were
investigated in the Dutch Syntactic Atlas project. The methodological design
of the Dutch Syntactic Atlas project consisted of two phases, written
elicitation and oral elicitation. The oral acceptability judgement tasks were
administered in dialect rather than in the standard variety or some regiolect,
in order to avoid accommodation, that is, adjustment from the dialect in the
direction of the standard-like varieties (cf. Cornips and Poletto 2005). In
the phase of oral elicitation, 250 locations were selected throughout the
Netherlands. We had a major problem in doing the fieldwork, since the large
majority of the fieldworkers and Ph.D. students speak only the standard
variety. For this reason we had to ask for the assistance of another
dialect speaker from the same community speaking the same variety in
order to be able to interview the subject in his own dialect. The fieldworker
(speaking only standard Dutch) trained a local dialect speaker as an
'assistant interviewer'. This 'assistant interviewer' was asked to translate
a standard Dutch structured elicitation task into his or her local dialect.
These translations were recorded. In a second session these recordings were
played to the second local dialect speaker. In this session, the entire
conversation was restricted to the two dialect speakers and the fieldworker
did not interfere.

5.4.1 Oral elicitation: the local dialects


Two small case studies convincingly show how easily speakers switch between
the (base) dialect and the standard variety in an oral task in the
southern part of the province of Limburg, where an intermediate speech
repertoire exists. One of the locations involved in the project was Nieuwenhagen
(Landgraaf), a very small 'rural' village in the environs of Heerlen. In
the local dialect of Nieuwenhagen, proper names are obligatorily preceded
by the definite determiner et or der 'the', depending on whether the proper
name refers to a female or a male, respectively. The presence of the definite
determiner preceding a proper name, as in (5.20), is fully ungrammatical in
standard Dutch:
(5.20) et Marie / der Jan is krank
det Mary /det Jan is ill
The recording of the first session between the standard Dutch speaking
fieldworker and the local 'assistant interviewer' translating standard Dutch
into his own dialect shows that the definite article is absent in his translation:
that is, the proper names Wim and Els show up without it, as illustrated in
(5.21). These sentences were elicited in order to investigate the order in the
verbal cluster (right periphery):
(5.21) 1st session (dialect–standard)
Ø Wim dach dat ich Ø Els han geprobeerd e kado te geve
Wim thought that I Els have tried a present to give
‘Wim thought I tried to give a present to Els.’
In the same interview session, the 'assistant interviewer' shows in another
sentence that he may or may not use the definite article, resulting in der Wim
and Ø Els respectively in his local dialect. Note that the definite determiner
precedes the subject DP whereas it is absent in front of the object DP:
(5.22) 1st session (dialect–standard)
Der Wim dach dat ich Ø Els e boek han will geve
det Wim thought that I Els a book have will give
‘Wim thought I wanted to give a book to Els.’
In the second session, however, in which the 'assistant interviewer' exclusively
interviews the other dialect speaker in the local dialect, the latter utters the
definite article both with the subject and the object DP, as 'required':
(5.23) 2nd session (dialect–dialect)
Der Wim menet dat ich et Els e boek probeerd ha kado te geve.
‘Wim thought I tried to give a book to Els.’
Other indications of easy switching between the base dialect and standard Dutch can be found in (5.24). The infinitival complementizer has the form om and voor in standard Dutch and the local dialect, respectively (see Cornips 1996 for more details). In the first session, the ‘assistant interviewer’ in interaction with the standard Dutch speaking fieldworker uses the standard Dutch complementizer om, whereas in the second session the dialect speaker utters voor, as illustrated in (5.24a) and (5.24b), respectively. Moreover, note that in the first session the proper name Wendy again lacks the definite article, whereas it is present in the second session, as presented in (5.24a) and (5.24b), respectively:
(5.24) a. Ø Wendy probeerdet om ginne pien te doe. 1st session (dialect–standard)

b. Et Wendy hat geprobeerd voor ginne pien te doe. 2nd session (dialect–dialect)
‘Wendy tried not to hurt anyone.’
It is important to note that these functional elements were not explicitly mentioned to the dialect speakers as features we were interested in. From the above, it is obvious that in this linguistic repertoire, speakers can adjust to the standard variety (and surrounding varieties) without noticeable effort. This might be due to speakers’ (un)conscious awareness of the social diagnosticity of syntactic features, namely that features belonging to the domain of standard Dutch are the prestige variants (Cornips 1996). It is for this reason that training interviewers who are native speakers of the local dialect is necessary, although every design has to take into account that standard, non-standard, and intermediate variants represent the daily speech situation, that is: syntactic features from the local dialects and standard Dutch appear in a continuum and have become continuous (cf. Paolillo 1997).
The emergence of new intermediate syntactic variants, too, points in a direction in which it is no longer possible to make a clear-cut distinction between the standard variety and the local dialects from a syntactic point of view. In contrast, as already noted in the introduction, the experience during fieldwork is that the Limburgian speakers perceive the local dialect and the standard variety as two different varieties and associate them with different identities, although they share almost all of their grammar.

5.4.2 Written and oral acceptability judgements in the local dialect


In this section, written acceptability judgements about the inalienable possession construction are discussed. The first step in the design of the Dutch Syntactic Atlas project was an extensive written questionnaire containing 424 questions (including sub-questions and remarks to be made by the respondents) that was sent out to 850 respondents and a number of additional informants in Belgium (Cornips and Jongenburger 2001). The grid of the written questionnaires of the Dutch Syntactic Atlas project includes, among others, ten neighbouring villages of Heerlen. In this questionnaire, local dialect speakers were offered the possessive pronoun construction, as in (5.9), repeated here for convenience as (5.25):
(5.25) instruction ‘Translate into your local dialect’
Ik was zijn handen.
I wash his hands
‘I am washing his hands.’

Example (5.26) and Figure 5.2 (on page 102) reveal the translations of (5.25)
into the local dialect:
(5.26) ‘translations’
                               location
 a. standard variant           Beek            he hät zien hanj geweschen
 b. standard variant           Eijgelshoven    He had sieng heng gewesche
 c. standard variant           Maastricht      heer heet zien han gewasse
 d. standard variant           Vaals           Hae hat zieng heng jewaesje
 e. standard variant           Waubach         Heë hat zieng heng gewessje
                                               he has his hands washed
 f. intermediate variant       Eijgelshoven    Heë had zieg sieng heng gewesje
 g. intermediate variant       Valkenburg      Hae haet zich zien heng gewesje
 h. intermediate variant       Spekholzerhei   hea hat ziech zien heng jewesche
                                               he has REFL his hands washed
 i. dialect variant            Simpelveld      hea hat zich de heng gewesche
 j. dialect variant            Waubach         Hea haa zich de heng gewèsje
                                               he has REFL the hands washed
To begin with, the responses show that standard variants, dialect variants, and intermediate variants are among the answers. Further, all deviations from the input, for example the intermediate and dialect variants in (5.26f,g,h) and (5.26i,j) respectively, provide strong evidence that these variants are in the grammar of the speaker (Carden 1976). Moreover, variation arises within a local dialect, as is the case in the spontaneous speech data of Heerlen Dutch, which is a regional standard variety. Thus, two respondents, in the locations of Eijgelshoven and Waubach, give different responses. The former displays both the standard and the intermediate variant, in (5.26b) and (5.26f) respectively, whereas the latter yields the standard and the dialect variant, in (5.26e) and (5.26j) respectively. Finally, the majority of the respondents copy the standard Dutch variant into their local dialect, as illustrated in (5.26a–e).10 In order to control for this task effect, we also administered this type of construction in the oral acceptability task (see below).
Taken together, the translations provide evidence that (a) the standard
variety strongly interferes with the local dialect variety, (b) intermediate
variants arise, and (c) in this part of the province of Limburg syntactic
features from the local dialects and standard Dutch exist in a continuum
both in a regional standard variety and in the local dialects (see also (5.11)).

10 Maastricht, in the western part of Limburg, denoted the standard variant in 1962 (cf. Blancquaert et al. 1962). The translations in the atlas of Blancquaert seem to suggest that the dative inalienable possessive construction is spoken more in the eastern part of Limburg, i.e. Heerlen and surroundings.

Figure 5.2. Possible inalienable possession constructions as revealed by the responses to the written questionnaire in ten surrounding locations of Heerlen: possessive pronoun (5), intermediate forms (3), dative inalienable possession (2)

Similar to the written translation task, the standard Dutch possessive pronoun construction in (5.25) above was offered in the oral elicitation task, which was the second step in the Dutch Syntactic Atlas project. The locations in the neighbouring villages of Heerlen where oral fieldwork was conducted are presented in Figure 5.3.
In the first section, that is the standard–dialect interaction (see 5.3.2), the assistant interviewers were asked to translate (5.25) into their local dialect. Only 4 out of 12 respondents (33 per cent) immediately translate (5.25) into the dialect variant, which is the dative inalienable possession construction. On the other hand, the majority of the assistant interviewers (8 out of 12, 67 per cent) simply copy the possessive pronoun construction into their local dialect. Hence, these respondents show interference from the standard variety, shifting towards the more prestigious variety in their response, as is the case in the written elicitation task.

Figure 5.3. Grid of the oral interviews in Heerlen and neighbouring locations (© Meertens Inst)
What is more, the majority of these respondents (6 out of 8, 75 per cent) show an implicational pattern: they copy the standard Dutch variant in the first session (the dialect–standard repertoire) whereas they use the intermediate or the local dialect variant in the second session, the dialect–dialect repertoire, as illustrated in (5.27).
(5.27) Location of Vaals:
 Assistant interviewer:
 a. Instruction:   Oversetz: ‘Hij heeft zijn handen gewassen.’
    ‘Translate’    he has his hands washed
 b. 1st session (standard–dialect):   ‘Her had zien heng gewasse’
                                      he has his hands washed
 c. 2nd session (dialect–dialect):    ‘Her had sich sien heng gewasse.’
                                      he has REFL his hands washed
 Assistant interviewer:
 d. ‘Komt disse satz ook veur? Her had zich de heng gewasse.’
    ‘Do you also encounter this variant?’ He has REFL the hands washed
 e. Answer: ‘ja’
            ‘yes’

Again, the interaction in (5.27) reveals that the speech repertoire in Limburg is a continuous one in which the distinction between standard and dialect varieties is blurred. Consequently, the dialect speaker judges all possible variants, that is to say the standard possessive pronoun (5.27b), the dialect dative construction (5.27d,e), and the intermediate variant (5.27c), as acceptable. More evidence is provided by the fact that 9 out of 12 speakers (75 per cent) accept both the dative possessive construction and the intermediate form. Strikingly, 6 out of 12 speakers (50 per cent) state that all forms in (5.27) are acceptable in the local dialect. Two of them gave relative judgements without being asked: one considered (5.27c) slightly more acceptable than (5.27d); the other considered (5.27d) slightly more acceptable than (5.27c).
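The implicational pattern described above can be checked mechanically: acceptances form a Guttman-style implicational scale if no speaker accepts a variant lower on the ordering while rejecting a higher one. The following is a minimal sketch; the speaker and variant labels are hypothetical, and the chapter itself reports acceptance counts rather than a formal scaling analysis.

```python
def forms_implicational_scale(judgements, variant_order):
    """Return True if every speaker's set of accepted variants is a
    'prefix' of variant_order: once a variant is rejected, all variants
    further down the ordering are rejected too (a Guttman-style
    implicational pattern)."""
    for accepted in judgements.values():
        seen_gap = False
        for variant in variant_order:
            if variant in accepted:
                if seen_gap:  # acceptance after a rejection breaks the scale
                    return False
            else:
                seen_gap = True
    return True

# Hypothetical judgement data in the spirit of (5.27): every speaker who
# accepts the dialect dative variant also accepts the intermediate and
# standard variants.
order = ["standard possessive", "intermediate", "dialect dative"]
data = {
    "speaker_1": {"standard possessive", "intermediate", "dialect dative"},
    "speaker_2": {"standard possessive", "intermediate"},
    "speaker_3": {"standard possessive"},
}
scalable = forms_implicational_scale(data, order)  # True for this data
```

A single counterexample, such as a speaker accepting the dialect dative while rejecting the intermediate form, makes the function return False.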
This small case study indicates (a) extensive variation at the level of the individual speaker, such that half of the speakers show all possible syntactic alternatives that exist at the level of their community, and (b) the existence of intermediate variants to such an extent that it blurs the distinction between the local dialect and the standard variety. This result is attested in spontaneous speech, and in both the written and oral elicitation data, so we can exclude the possibility that it is primarily due to task effects. In this intermediate speech repertoire the occurrence of intermediate variants is inevitable in the process of vertical standard–dialect and horizontal dialect–dialect convergence. These findings call into question the central sociolinguistic proposal that only phonology is a marker of local identity whereas syntax is a marker of cohesion across large geographical areas. Further, syntactic elicitation shows that speakers of local dialects are no longer able to reject syntactic variants as fully ungrammatical even if (a) these concern emerging intermediate variants and (b) they did not originally belong to their local dialect variety. Consequently, relative acceptability is the result.

5.5 Conclusion
In this paper, I have discussed a so-called intermediate speech repertoire that results from a contact situation between standard Dutch, a regional Dutch variety (Heerlen Dutch), and local dialects in the southern part of the province of Limburg in the south of the Netherlands. This speech repertoire reveals syntactic differences along a continuum, to such an extent that the distinction between the local dialect and the standard variety is blurred. It is demonstrated that in this speech repertoire clear-cut judgements are not attainable at all. Using case studies, it has been shown that speakers in this area are not able to judge syntactic features as fully grammatical or ungrammatical. Instead, all variants heard in the community (standard, dialect, and intermediate variants) are considered acceptable. Moreover, it may be argued that in this speech repertoire dialect and standard varieties form a continuum also beyond the geographic level, that is to say, a continuum from a stylistic and social variation perspective. Consequently, the findings of these case studies can be generalized beyond geographical variation. Moreover, these case studies show that syntax may also be a marker of local identity.
6

Gradedness and Optionality in Mature and Developing Grammars

ANTONELLA SORACE

6.1 Introduction
This paper focuses on specific patterns of gradedness and optionality in the grammar of three types of speakers: monolingual native speakers, speakers whose native language (L1) has been influenced by a second language (L2), and very fluent non-native speakers. It is shown that in all these cases gradedness is manifested in areas of grammar that are at the interface between syntax and other cognitive domains.
First, evidence is reviewed on the split intransitivity hierarchy (Sorace
2000b, 2003a), indicating that not only auxiliary selection but also a number
of syntactic manifestations of split intransitivity in Italian and other languages
are lexically constrained by aspectual properties. These constructions tend to
show gradedness in native intuitions that cannot easily be accommodated by
current models of the lexicon–syntax interface. Moreover, the mappings
between lexical properties and unaccusative/unergative syntax are developmentally unstable, whereas the unaccusative/unergative distinction itself is
robust and unproblematic in acquisition.
Second, it is shown that residual optionality, with its entailed gradedness effects, occurs only in interface areas of the competence of near-native speakers, and not in purely syntactic domains. Sorace (2003b) indicates that the interpretable discourse features responsible for the distribution of overt and null subject pronouns are problematic in the L2 steady state of L1 English learners of Italian, whereas the non-interpretable syntactic features related to the null subject parameter are completely acquired.
Third, it is argued that the differentiation between narrow computational syntactic properties and interface properties is also relevant in other domains of language development, such as L1 attrition due to prolonged exposure to a second language (Sorace 2000b; Tsimpli et al. 2004; Montrul 2004). A clear parallelism exists between the end-state knowledge of English near-native speakers of Italian and the native knowledge of Italian advanced speakers of English under attrition with respect to null/overt subjects and pre/postverbal subjects. In both cases, the speakers’ grammar is/remains a null-subject grammar: for example null subjects, when they are used, are used in the appropriate contexts, that is when there is a topic shift. The purely syntactic features of grammar responsible for the licensing of null subjects are not affected by attrition.
The generalization seems to be that constructions that belong to the syntax proper are resilient to gradedness in native grammars, are fully acquired in L2 acquisition, and are retained in L1 attrition. In contrast, constructions that require the interface of syntactic knowledge with knowledge from other domains are subject to gradedness effects, present residual optionality in L2, and exhibit emergent optionality in L1 attrition.
The question of how to interpret this generalization, however, is still open. There are (at least) two issues for further research. First, there is a lack of clarity about the nature of the different interfaces. Are all interfaces equally susceptible to gradedness and optionality? Second, is gradedness inside or outside the speakers’ grammatical representations? The available evidence is in fact compatible both with the hypothesis that gradedness is at the level of knowledge representations and with the alternative hypothesis that it arises at the level of processing. Possible approaches to these open issues are outlined.

6.2 The syntax–lexicon interface in native grammars: split intransitivity
According to the unaccusative hypothesis (Perlmutter 1978; Burzio 1986), there are two types of intransitive verbs, unaccusative and unergative, with distinct syntactic properties. The essential insight (variously expressed by different syntactic theories) is that the subject of unaccusative verbs is syntactically comparable to the object of a transitive verb, while the subject of an unergative verb is a true subject. Evidence for the distinction is both syntactic and semantic: in several European languages unaccusative verbs generally select BE as a perfective auxiliary while unergative verbs select HAVE; semantically, the subject of unaccusative verbs tends to be a patient while that of unergative verbs is an agent. However, it has proved difficult to fit many verbs unambiguously into one class or the other. On the one hand, there are verbs
that do not satisfy unaccusativity diagnostics in consistent ways, both within and across languages: so blush is unaccusative in Italian (arrossire, selecting BE) but unergative in Dutch (blozen, selecting HAVE); fiorire ‘blossom’ can take either HAVE or BE. On the other hand, there are verbs that can display either unaccusative or unergative syntax depending on the characteristics of the predicate: for example, all verbs of manner of motion (e.g. swim) select HAVE in Dutch and German when they denote a process but take BE in the presence of a prepositional phrase denoting a destination; verbs of emission (e.g. rumble; see Levin and Rappaport Hovav 1995 for extensive discussion) are unergative in their default case but in some languages may exhibit unaccusative behaviour when they receive a telic interpretation.
One of the main challenges opened up by the unaccusative hypothesis is therefore how to account for the variable behaviour of verbs. A great deal of research in the last ten years has been devoted to explaining the complex mappings between a lexical-semantic level of representation and the level of syntactic structure. This effort has taken two broad and seemingly incompatible directions. Theories of argument structure (which, following Levin and Rappaport Hovav (1996), may be termed ‘projectionist’) assume that the verb’s lexical entry contains the necessary specification for the mapping of arguments onto syntactic positions. This approach posits fine-grained distinctions in lexical-semantic representations, singles out the syntactically relevant lexical-semantic components in different languages, and identifies a set of linking rules that deterministically project lexical-semantic features onto syntactic positions, hence determining the unaccusative or unergative status of verbs. The second direction—named ‘constructional’ by Levin and Rappaport Hovav (1996)—empties the lexical entries of verbs of any syntactic specification and makes semantic interpretation directly dependent on the syntactic configurations in which the verb can appear. If verbs are thus not tied to deterministic linking rules but have freedom of mapping, unaccusativity or unergativity become by-products of the verb’s compatibility with particular syntactic configurations, instead of inherent lexical properties of verbs (Borer 1994).
However, both the projectionist and the constructional solutions to the lexicon–syntax puzzle have limitations; the most relevant is that the former allows for too little variation, because of the deterministic nature of its linking rules, whereas the latter allows too much variation, because of the lack of a mechanism that rules out impossible mappings. These limitations have been highlighted in particular by Sorace (1992, 1993a, 1993b, 1995, 2000b, 2003a), who has shown that there is systematic variation that cannot be explained by either approach.1 Instead, she proposes that intransitive verbs are organized along a hierarchy (the split intransitivity hierarchy (SIH), originally called the auxiliary selection hierarchy (ASH)) defined primarily by aspectual notions (telicity/atelicity), and secondarily by the degree of agentivity of the verb (Figure 6.1).

CHANGE OF LOCATION >        Categorical unaccusative syntax
CHANGE OF STATE >
CONTINUATION OF STATE >
EXISTENCE OF STATE >
UNCONTROLLED PROCESS >
MOTIONAL PROCESS >
NON-MOTIONAL PROCESS        Categorical unergative syntax

Figure 6.1. The split intransitivity hierarchy
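For concreteness, the ordering in Figure 6.1 can be rendered as a small data structure. The verb-class labels follow the figure; the numeric ‘distance from core’ is an expository assumption of this sketch, not part of Sorace’s proposal.

```python
# The split intransitivity hierarchy (Figure 6.1) as an ordered list,
# from the unaccusative core (telic change) to the unergative core
# (atelic non-motional activity).
SIH = [
    "change of location",     # categorical unaccusative syntax
    "change of state",
    "continuation of state",
    "existence of state",
    "uncontrolled process",
    "motional process",
    "non-motional process",   # categorical unergative syntax
]

def distance_from_core(verb_class):
    """Distance of a verb class from the nearer categorical core.

    On the SIH account, the larger this distance, the more a verb's
    syntactic behaviour is expected to depend on compositional or
    contextual properties of the predicate (the numeric scale itself
    is an illustrative assumption)."""
    i = SIH.index(verb_class)
    return min(i, len(SIH) - 1 - i)
```

On this rendering, distance_from_core("existence of state") is maximal, matching the observation below that stative, non-agentive verbs are the most indeterminate.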
The SIH is therefore an empirical generalization that identifies the notion of ‘telic change’ at the core of unaccusativity and that of ‘atelic non-motional activity’ at the core of unergativity. The closer to the core a verb is, the more determinate its syntactic status as either unaccusative or unergative. Verbs that are stative and non-agentive are the most indeterminate. Sensitivity to contextual or compositional factors correlates with the distance of a verb from the core. Thus, the ASH helps to account both for variability and for consistency in the behaviour of intransitive verbs. In contrast to the constructionist view, where context is always critical, the ASH account prescribes that core verbs have syntactic behaviour that is insensitive to non-lexical properties contributed by the sentence predicate. On the other hand, peripheral verbs, which are neither telic nor agentive, do seem to behave according to the constructionist observation, with syntactic behaviour depending on the properties of the predicate in which they appear.

1 Optionality and gradedness at interfaces can be measured experimentally with behavioural techniques that are able to capture subtle differences in speakers’ performance. The informal elicitation techniques traditionally used in linguistics and language development research (such as binary or n-point acceptability judgement tests) are unlikely to be reliable for such data, because they can measure only very broad distinctions and typically yield ordinal scales, in which the distance between points cannot be evaluated (Sorace 1996). A suitable experimental paradigm that has gained ground in recent years is magnitude estimation (ME), a technique standardly applied in psychophysics to measure judgements of sensory stimuli (Stevens 1975). The magnitude estimation procedure requires subjects to estimate the perceived magnitude of physical stimuli by assigning values on an interval scale (e.g. numbers or line lengths) proportional to stimulus magnitude. Highly reliable judgements can be achieved in this way for a whole range of sensory modalities, such as brightness, loudness, or tactile stimulation (see Stevens 1975 for an overview).
The ME paradigm has been extended successfully to the psychosocial domain (see Lodge 1981 for a survey) and recently Bard et al. (1996), Cowart (1997), and Sorace (1992) showed that it may be applied to judgements of linguistic acceptability. Unlike the n-point scales conventionally employed in the study of psychological intuition, ME allows us to treat linguistic acceptability as a continuum and directly measures acceptability differences between stimuli. Because ME is based on the concept of proportionality, the resulting data are on an interval scale, which can therefore be analysed by means of parametric statistical tests. ME has been shown to provide fine-grained measurements of linguistic acceptability, which are robust enough to yield statistically significant results, while being highly replicable both within and across speakers. ME has been applied successfully to phenomena such as auxiliary selection (Bard et al. 1996; Sorace 1992, 1993a, 1993b; Keller and Sorace 2003), binding (Cowart 1997; Keller and Asudeh 2001), resumptive pronouns (Alexopoulou and Keller 2003; McDaniel and Cowart 1999), that-trace effects (Cowart 1997), extraction (Cowart 1997), and word order (Keller and Alexopoulou 2001; Keller 2000b).
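The step that makes ME data amenable to parametric tests can be sketched as follows: each subject’s ratio-scaled ratings are log-transformed and then normalized within the subject. The sentence labels and numbers below are hypothetical, and within-subject z-scoring is one conventional normalization, not the only one in use.

```python
import math
import statistics

def normalize_me_scores(raw_scores):
    """Log-transform one subject's magnitude estimation ratings and
    z-score them within the subject, removing differences in the
    number range each subject happens to use.

    raw_scores: dict mapping sentence labels to positive ME ratings."""
    logs = {s: math.log(v) for s, v in raw_scores.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.stdev(logs.values())
    return {s: (x - mean) / sd for s, x in logs.items()}

# Hypothetical ratings from one subject (labels and values invented):
subject = {
    "core_unaccusative_with_BE": 100.0,
    "peripheral_verb_with_BE": 40.0,
    "core_unaccusative_with_HAVE": 10.0,
}
z = normalize_me_scores(subject)
```

After normalization the scores from different subjects are on a comparable interval scale, so acceptability differences between conditions can be analysed with parametric statistics, as the footnote above describes.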
The SIH substantiates the intuition that, within their respective classes, some verbs are ‘more unaccusative’ and ‘more unergative’ than others (Legendre, Miyata, and Smolensky 1991). Crucially, however, this does not mean that unaccusativity or unergativity are inherently gradient notions, or that the distinction is exclusively semantic, but rather that some verbs allow only one type of syntactic projection whereas other verbs are compatible with different projections to variable degrees. This is the reason why any approach that focuses exclusively on the syntactic or on the semantic side of split intransitivity is ultimately bound to provide only a very partial picture of the phenomena in this domain. While no formal model yet exists that can comprehensively account for the SIH, the SIH has given a new impetus to the search for such a model. Theoretical research inspired by the SIH has in fact been developed within different frameworks and for different languages (e.g. Bentley and Eythórsson 2004; Cennamo and Sorace in press; Keller and Sorace 2003; Legendre in press; Legendre and Sorace 2003; Randall in press; Mateu 2003; among others).
Developmental evidence for the SIH comes from research on second language acquisition (Montrul 2004, in press; Duffield 2003) and on first language attrition (Montrul in press).
Core verbs are the first to be acquired with the correct auxiliary in both first and second language acquisition. Data from the acquisition of Italian as a non-native language show that the syntactic properties of auxiliary selection are acquired earlier with core verbs and then gradually extended to more peripheral verb types (Sorace 1993a, 1995), although L2 learners do not attain the same gradient intuitions as those displayed by native Italians. Moreover, Italian learners of French find it more difficult to acquire avoir as the auxiliary for verbs closer to the core than for peripheral verbs (Sorace 1993b, 1995), and do not completely overcome this difficulty even at the advanced level. A study by Montrul (in press) confirms this pattern for L2 learners of Spanish, who have determinate intuitions on the syntactic correlates of split intransitivity in this language, but only with core verbs.
These developmental regularities suggest two things. First, the acquisition of the syntax of unaccusatives crucially depends on the internalization of both the hierarchical ordering of meaning components and the lexicon–syntax mapping system instantiated by the target language. The pattern uncovered by these data is consistent with an enriched constructional model, equipped with a checking mechanism that is sensitive to the degree of lexical specification of verbs and rules out impossible mappings (see Mateu 2003). As it is the position of verbs on the ASH, rather than their frequency, which determines the order of acquisition, it seems that L2 learners do rather more than engage in the kind of statistical learning envisaged by a basic constructional model. Second, and more generally, there are two sides to the split intransitivity question: a syntactic side (the structural configuration that determines unaccusativity or unergativity) and a lexicon–syntax interface side (the mapping system that decides the syntactic behaviour of any given verb). Gradedness and indeterminacy in native grammars, as well as learning difficulties and residual problems in non-native grammars, tend to be situated on the interface side: the syntactic distinction itself is categorically stable.

6.3 The syntax–discourse interface in language development


Developmental research points to the conclusion that the same areas of grammar appear to be unstable in other domains of language development and change, regardless of the circumstances in which development takes place. Areas of grammar that have been found to be particularly vulnerable to variable crosslinguistic influence in different bilingual populations are those that involve the coordination of syntax and discourse. The obvious questions are why this should be so, whether crosslinguistic influence is the only cause of these phenomena, and whether the source of crosslinguistic influence is the same for each bilingual group. Before addressing these questions, the convergence between two developmental domains—L2 acquisition and L1 attrition—will be briefly illustrated.

6.3.1 Endstate grammars


One of the characteristics of L2 advanced grammars that has received attention recently is residual optionality, that is unsystematic L1 effects surfacing in L2 speakers’ production.2 A much-discussed case is subject realization in null-subject languages spoken by non-native speakers. It is well established that null subjects in these languages are syntactically licensed but their distribution is governed by discourse-pragmatic features (Rizzi 1982; Cardinaletti and Starke 1999). In Italian, a typical agreement-licensed null-subject language, sentences such as (6.1a) are possible, whereas the equivalent sentence in English, (6.1b), is not:
(6.1) a. È partito
is-3rd SG gone.
b. *Is gone.
Moreover, in Italian the choice between a null and an overt subject is conditioned by pragmatic factors, such as the [topic-shift] and the [focus] feature (Grimshaw and Samek-Lodovici 1998). Thus in (6.2), an overt pronoun lui in the subordinate clause can be co-referential with the complement Pietro, or with an extralinguistic referent, but not with the matrix subject Gianni. In contrast, a null pronoun in the same context signals co-referentiality with the topic Gianni.
(6.2) Gianni_i ha salutato Pietro_k quando pro_i / lui_*i/k/j l’ha visto.
Gianni has greeted Pietro when pro / he him-saw
‘Gianni greeted Pietro when he saw him.’
A characteristic feature of English near-native speakers of Italian (Sorace
2000a, 2003b; Filiaci 2003; Belletti et al. 2005) is that they optionally produce
(6.3b), where a monolingual Italian speaker would produce (6.3c).
(6.3) a. Perchè Maria è andata via?
why Maria is gone away?
b. (perchè) lei ha trovato un altro lavoro.
(because) she has found another job
c. (perchè) ___ ha trovato un altro lavoro.
(because) has found another job
In contrast, the same speakers never produce a null pronoun when there is a
shift of topic, as in (6.4b), or when the subject is contrastive, as in (6.5b).
(6.4) a. Perchè Maria non ha parlato con nessuno?
b. Perchè *Ø (= Gianni) non l’ha neanche guardata
because Ø (= Gianni) didn’t even look at her

2 Optionality is regarded here as the pre-condition for gradedness: the term refers to the co-existence in the same grammar of two alternative ways of expressing the same semantic content, of which one appears to be preferred over the other by the speaker in production and comprehension, creating gradedness effects (Sorace 2000b).

(6.5) a. Maria ha detto che andava da Paolo?
Maria has said that was going to Paolo’s?
b. *No, Ø (= Paolo) ha detto che andava da lei
No, Ø said that he was going to her.
A similar pattern obtains for the position of subjects with respect to the verb.
In answer to an all-focus question, such as ‘what happened’, L1 English near-
native speakers of Italian optionally place the subject in preverbal position
(6.6b), whereas native Italians would naturally place it after the verb (6.6c).
This also happens in a narrow-focus context (6.7b), in which Italian requires the focused subject to be in postverbal position (6.7c).
(6.6) a. Che cosa è successo? ‘What happened?’
b. Gianni è partito.
Gianni is-3s left
c. È partito Gianni
is-3s left Gianni
(6.7) a. Chi ha starnutito? ‘Who sneezed?’
b. Gianni ha starnutito
Gianni has-3s sneezed
c. (Ha starnutito) Gianni
Has-3s sneezed Gianni
These patterns are noticeably asymmetric: near-native speakers of Italian overgeneralize overt subject pronouns and preverbal subjects to contexts which would require null subjects and postverbal subjects in native Italian, but they do not do the reverse, namely they do not extend null and postverbal subjects to inappropriate contexts. In fact, when they use null pronouns and postverbal subjects, they use them correctly. These speakers therefore have acquired a null-subject grammar. The optionality in their grammar does not affect the syntactic licensing of null subjects, but is at the level of the discourse conditions on the distribution of pronominals and on the placement of subjects.3
It is worth noting that although the behaviour of native speakers is statistically different from that of near-natives, it is not categorical. In a small number of cases, native speakers also over-produce overt subjects and postverbal subjects in inappropriate discourse contexts. The significance of this detail lies in the fact that the options favoured by near-native speakers are (strongly) dispreferred by natives, but they are not illicit in their grammar.

3 The few existing studies on near-native L2 grammars point to a similar split between purely syntactic constraints, which are completely acquired, and interpretive conditions on the syntax, which may or may not be acquired. See Sorace (2003b, in press) for details.

6.3.2 L1 attrition
There is evidence that the same pattern of asymmetric optionality is
exhibited by native speakers of null subject languages who have had pro-
longed exposure to English. Research on changes due to attrition from
another language (Sorace 2000a, Tsimpli et al. 2004) indicates that native
Italians who are near-native speakers of English exhibit a pattern of optionality identical to that of the English near-native speakers of Italian described above: these speakers overgeneralize overt subjects and preverbal subjects to
contexts which require a null subject or a postverbal subject. The reverse
pattern is not found.
It is worth noting that the phenomenon is found both in production and in
comprehension.
For example, in the forward anaphora sentences in (6.8b), speakers under attrition are significantly more likely than monolingual Italians to judge the overt pronoun as coreferential with the matrix subject 'Maria'; however, the null pronoun in (6.8a) is correctly interpreted as referring to the matrix subject. These speakers are also more likely to produce sentences such as (6.9a) regardless of whether the subject is definite or indefinite, whereas monolingual speakers would prefer a postverbal subject, as in (6.9b), particularly when the subject is indefinite.
(6.8) a. Mentre attraversa la strada, Maria saluta la sua amica
while pro is crossing the street, Maria greets her friend
b. Mentre LEI attraversa la strada, Maria saluta la sua amica
while she is crossing the street, Maria greets her friend

(6.9) a. Hai sentito che un palazzo/il palazzo dell'ONU è crollato?
         Have you heard that a building/the UN building collapsed?
      b. Hai sentito che è crollato un palazzo/il palazzo dell'ONU?
         Have you heard that is collapsed a building/the UN building?

Thus, there is a parallelism between the end-state knowledge of English near-native speakers of Italian and the native knowledge of Italian near-native speakers of English under attrition with respect to null/overt subjects and pre/postverbal subjects: the speakers' grammar is and remains a null-subject language. The computational features of grammar responsible for the licensing of null subjects are acquired completely, and are not affected by attrition.4
More generally, changes induced by attrition in individual speakers primarily affect morphosyntactic features that are interpretable at the interface with conceptual systems. The affected features may become unspecified, giving rise to optionality. Thus, attrition is expected to affect the use of overt subjects in L2 Italian (given that this is regulated by the interpretable [topic-shift] and [focus] features). If these features become unspecified, overt subjects in Italian under attrition are not necessarily being used or interpreted as shifted topics or foci.5
The lexicon–syntax interface conditions governing the syntactic behaviour of intransitive verbs are also vulnerable to attrition. Montrul's (in press) study on generational attrition in Spanish heritage speakers found that attrition affects the mappings of individual verbs onto unaccusative/unergative syntax: these speakers do not show a sensitivity to different subclasses represented on the SIH, and their determinate intuitions are restricted to core verbs. Montrul's study shows that this is the same pattern obtained for L2 learners of Spanish. Once again, both bilingual groups have a robust syntactic representation of split intransitivity, but exhibit instability with respect to the lexicon–syntax interface conditions regulating the distribution of verbs into one syntactic class or the other.

6.4 Interpreting gradedness and optionality


At this point we need a generalization that describes these converging patterns of results. A first approximation might be the following:

4 Other cases of selective attrition at interfaces are discussed in Montrul (2002) with respect to the tense/aspect domain in Spanish; Polinsky (1995) with respect to the distinction between reflexive and possessive pronouns in Russian; Gürel (2004) on pronominals in Turkish.
5 Studies on bilingual first language acquisition converge with the results of research on L2 acquisition and L1 attrition. The syntax–pragmatics interface has been identified as a locus of crosslinguistic influence between the bilingual child's syntactic systems (Müller and Hulk 2001). Bilingual children who simultaneously acquire a null-subject language and a non-null-subject language overproduce overt subjects in the null-subject language (see Serratrice 2004 on Italian–English bilinguals; Paradis and Navarro 2003 on Spanish–English bilinguals; Schmitz 2003 on Italian–German bilinguals). Thus, crosslinguistic effects obtain only from the non-null-subject language to the null-subject language and never in the other direction, regardless of dominance.

(6.10) 'Narrow' versus 'Interface' syntax:
       • Non-interpretable features that are internal to the computational system of syntax proper and drive syntactic derivations are categorical in native grammars; are acquired successfully by adult L2 learners; and are retained in the initial stages of individual attrition.
       • Interpretable features that 'exploit' syntactic options and belong to the interface between syntax and other domains, such as the lexicon, discourse, or pragmatics, may exhibit gradedness in native grammars; may present residual optionality in near-native grammars, due to the influence of the native language even at the most advanced competence stage; and are vulnerable to change in individual attrition.6
This generalization, which is compatible with theoretical assumptions in the minimalist programme (Chomsky 1995), assumes the existence of different 'layers' of syntactic knowledge and places these phenomena at the level of syntactic representations: hence, within the speaker's grammatical competence. Interpretable features at interfaces are more vulnerable to underspecification, in both native and non-native grammars, and are therefore more prone to gradedness and optionality (Sorace and Keller 2005). This is the analysis adopted by Tsimpli et al. (2004) for L1 attrition: no evidence of attrition is found in the parameterization of purely formal syntactic features, whereas attrition is evident with respect to the distribution of subjects in appropriate pragmatic contexts, which is regulated by interpretable interface features.
A similar differentiation between narrow syntax and interface properties may be found in the domain of split intransitivity. The unaccusative–unergative distinction is a syntactically represented, potentially universal property, and it belongs to narrow syntax. As argued in much recent research, the syntactic configuration that determines the unaccusativity of verbs contains a telicity aspectual projection (Borer 1994; van Hout 2000). Core unaccusative verbs are inherently lexically specified for telicity and categorically project onto the unaccusative configuration: they are determinate, acquirable, and stable. Knowledge of split intransitivity, however, also involves mastery of the behaviour of non-core verbs and their compositional interpretation in the predicates in which they appear: this is acquired gradually through exposure to particular verbs in specific aspectual contexts; gives rise to variable intuitions; and is unstable in a situation of attrition.

6 At first sight, it may appear as if the generalization just presented contradicts decades of L2 acquisition research. In particular, early research showed that semantically more transparent properties are easier to learn than more abstract syntactic properties that do not correspond in any clear way to semantic notions (see e.g. Kellerman 1987). Moreover, studies of the 'basic variety' argued that early interlanguage grammars favour semantic and pragmatic principles of utterance organization (Klein and Perdue 1997). However, the argument here is NOT that syntactic aspects are easier than semantic aspects, but that aspects of grammar that require not only syntactic knowledge, but also the integration of syntactic knowledge with knowledge from other domains are late acquired, or possibly never completely acquired by L2 learners.
Recent proposals in syntactic theory further refine the distinction between narrow syntax and interface properties. The distinction between formal licensing of a null pro and the discourse-related conditions on postverbal subjects, as well as on the use/interpretation of subject pronouns, is highlighted by the 'cartographic' theoretical framework (Belletti 2004; Rizzi 2004). It is assumed within this theory that the low part of the clause contains discourse-related positions, labelled 'Topic' and 'Focus', which constitute a clause-internal VP periphery. Postverbal subjects fill one of these dedicated positions according to their interpretation in discourse contexts. For example, a sentence such as (6.11b) is associated with the representation in (6.12), where the subject fills the specifier of the (new information) low Focus projection and is therefore interpreted as conveying new information; the preverbal subject position is occupied by pro:
(6.11) a. Chi ha starnutito?
b. Ha starnutito Gianni
(6.12) [CP . . . [TP pro . . . ha starnutito . . . [TopP [FocP Gianni [TopP [VP . . . ]]]]]]
According to this position, the formal syntactic licensing of pro is a necessary, but not sufficient condition for VS, since the postverbal subject also requires activation of the VP periphery. The experimental data illustrated earlier indicate that it is precisely this further condition that remains problematic in near-native L2 speakers of Italian: these speakers often fail to activate the VP-internal focus position required by focalization in Italian.7
The picture is further complicated by the existence of different phenomena that involve an interface between syntax and discourse. Do they all present gradedness and optionality? At a developmental level, there are theoretical and empirical arguments in favour of a distinction between discourse-related phenomena that are also relevant to LF, and those that are relevant only to the syntax–discourse interface. LF-relevant phenomena may pose developmental problems at intermediate stages, but they are ultimately acquired; LF-irrelevant phenomena raise persistent problems at all stages. Moreover, the former normally determine grammaticality effects, whereas degrees of preference are associated with the latter. For example, syntactic focusing in languages such as Hungarian and Greek involves the formation of an operator-variable structure at LF (cf. Kiss 1998; Szendroi 2004), which causes verb-raising to C/F and associated grammaticality effects, as opposed to degrees of preference. Focus movement of non-subject arguments as well as adverbs and participles is unproblematic both in advanced L2 speakers of Greek and in native Greek speakers in an attrition situation. In contrast, both groups of speakers exhibit more variable performance on overt subject pronouns, a discourse-related phenomenon (Tsimpli and Sorace 2005). The differences observed between these structures may be due to the fact that discourse, in the sense of pragmatic conditions on the distribution of subject pronouns, is outside grammar proper, whereas the LF-interface is affected by modular computations within the language system. Even when L2 speakers attain native-like knowledge of properties relevant to LF representations, optionality and crosslinguistic effects remain possible at the discourse level where pragmatic and processing constraints affect L2 use.

7 The result is the use of focus in-situ, namely an L1-based strategy that is more economical because it involves a DP-internal focus position (as the one overtly manifested in a sentence like 'John himself sneezed'). It is worth noticing that L1 French speakers of L2 Italian often use clefting in the same context (Leonini and Belletti 2004), which is an alternative way of activating the VP-periphery (as shown in the example below) and is widely available in French.

(i) Ce . . . [Top [Foc [Top [VP être [sc Jean [CP qui a éternué]]]]]]
These theoretical refinements have begun to unravel the complexity of the factors that determine instability at interfaces. In doing so, however, they have also magnified the fundamental ambiguity of the notion of interface and thus the difficulty of establishing the origins of interface instability. Do interfaces give rise to indeterminacy at the representational level, or is gradedness a phenomenon external to syntactic representations? The ambiguity is apparent in many recent studies. For example, Jakubowicz (2000) argues for the relevance of the notion of syntactic complexity in research on early normal and SLI child grammars, claiming that: (a) constructions requiring the integration of syntactic knowledge and knowledge from other domains are more complex than constructions requiring syntactic knowledge only; and (b) a syntactic operation is less complex if it is obligatorily required in every sentence; it is more complex if it is present only in some sentences because of semantic or pragmatic choices. The felicitous use of complex constructions, according to this definition, demands the simultaneous mastery of both the morphosyntactic properties of given constructions and of the discourse conditions governing their distribution and use.8 But is 'complexity' related to problems internal to the speaker's representation of syntactic knowledge, or are these problems external to these representations, resulting from processing difficulties in integrating knowledge from different domains?

8 Avrutin (2004) goes a step further and regards 'discourse' as 'a computational system (my emphasis) that operates on non-syntactic symbols and is responsible for establishing referential dependencies, encoding concepts such as "old" and "new" information, determining topics, introducing discourse presuppositions, etc'. Investigating the interface between syntax and discourse necessarily requires going beyond 'narrow syntax'.
L2 studies on other potentially problematic interfaces (e.g. the syntax–morphology interface: Lardiere 1998; Prévost and White 2000; White 2003) point towards the latter explanation, suggesting that persistent problems with inflectional morphology in endstate grammars may in fact be 'surface' problems related to the retrieval of the correct morphological exponents for abstract syntactic features. The fact that learners' problems tend to be with missing inflection, as opposed to wrong inflection, suggests the existence of difficulties at the level of access to knowledge, rather than with knowledge itself, which lead to the optional use of 'default' underspecified forms.
The choice of referential pronouns in Italian qualifies as complex, since it demands the simultaneous mastery of both morphosyntactic properties and discourse conditions. In contrast, referential subject pronouns in English are less complex because there is no choice of different forms that is conditioned by discourse factors.9 Similar differentiations in terms of complexity can be made for some manifestations of split intransitivity. Perfective auxiliaries in Italian are more complex than in English, because only Italian requires a choice of auxiliaries governed by lexical-semantic and aspectual features of the verb. Auxiliary choice in Italian is also more complex than in French, because in French the only verbs that take BE are those inherently specified for telicity, and therefore there is no auxiliary selection dependent on the evaluation of the properties of the predicate. Within Italian, auxiliary selection with core verbs is less complex than with non-core verbs, because selection with the latter has to take into account both the properties of the verb and other characteristics of the predicate.
It is therefore possible to propose an alternative generalization on the
nature of interfaces:
(6.13) Processing complexity
       • Structures requiring the integration of syntactic knowledge and knowledge from other domains are more complex than structures requiring syntactic knowledge only.
       • Complex structures may present gradedness and variation in native grammars; may pose residual difficulties to near-native L2 speakers; may pose emerging difficulties to L1 speakers experiencing attrition from a second language because of increasingly frequent failure to coordinate/integrate different types of knowledge.

9 As pointed out by a referee, the interface with discourse conditions obviously affects other aspects of pronominal use in English, such as the distribution of stress.
This hypothesis finds a fertile testing ground in recent research on L2 processing. For L2 speakers, recent evidence from on-line psycholinguistic (Felser et al. 2003; Kilborn 1992) and neuroscience experiments (particularly ERPs; see Hahne and Friederici 2001; Sabourin 2003) indicates that syntactic processing (i.e. access to 'Narrow Syntax') continues to be less than optimally efficient in non-native speakers even at advanced levels.
If syntactic processing is less efficient in L2 speakers than in L1 speakers, the coordination of syntax with other domains is affected because speakers have insufficient processing resources to carry it out. When coordination fails, speakers resort to the most 'economical' option. Crosslinguistic influence from English may thus not be the only cause of the over-use of overt subject pronouns in Italian–English bilinguals. Rather, this behaviour is favoured by two concomitant factors: on the one hand, the availability of the English option, which is economical in processing terms; on the other, the speakers' sub-optimal processing resources.10
Factors related to inadequate parsing resources also figure prominently in a recent proposal on the nature of language learners' grammatical processing by Clahsen and Felser (in press). Accounting for L2 speakers' divergent behaviour, according to this proposal, does not necessarily involve positing 'representational deficits': L2 speakers can, and indeed do, attain target representations of the L2, but may compute incomplete ('shallow') syntactic parses in comprehension. Such shallow processing is often accompanied by reliance—or over-reliance—on lexical, semantic, and pragmatic information, which can lead to seemingly trouble-free comprehension in ordinary communication.
If the notion of shallow processing is extended to production (see Sorace, in press, for discussion), one may plausibly assume that shallow processing may result in non-native speakers' lack of activation of the VP periphery in narrow focus contexts. Interpreting these phenomena in the light of Clahsen and Felser's hypothesis allows us to identify their source in the persistence of an L1-based discourse 'prominent' strategy, employed to compensate for the failure to compute the required L2 syntactic representation, despite the potential grammatical availability of the latter. In comprehension, shallow processing may similarly involve the optional lack of activation of the VP periphery, which is necessary to the reading of the postverbal subject as carrying focus on new information. The result may be, for example, an 'old information' interpretation of an indefinite postverbal DP.

10 A similar argument is developed by Rizzi (2002), who accounts for the presence of null subjects in early child English grammars by assuming that this is an option structurally available to the child, which also happens to be favoured in terms of limited processing resources.
Analogous considerations may be extended to the different distribution of overt subject pronouns in near-native Italian. The non-native production and interpretation of overt subject pronouns may be the result of shallow processing of the interface mapping governing the use of overt subjects (e.g. the obligatory presence of the feature 'topic shift'; see Tsimpli et al. 2004) and the consequent assimilation of strong Italian pronouns to the corresponding weak English pronouns, which—unlike Italian overt pronouns—can refer to topic antecedents. The strategy used in these circumstances would be different from the use of overt pronouns in a default form to relieve processing overload due, for example, to insufficient knowledge of (or access to) agreement inflection (Bini 1993; Liceras et al. 1999; Sorace in press).
This account crucially involves the optionality of shallow processing, that is, the L2 speakers' ability to perform full processing, at least at the near-native level. Shallow processing, in this sense, would be a relief strategy that is available to all speakers but is relied on especially by bilingual speakers.11 For this reason, native speakers should not be immune from occasional interface coordination difficulties, for example in situations of competing processing demands.
Indeed, a study by Serratrice (2004) shows that older monolingual Italian
children (aged 8+) overproduce overt referential subjects, although not to the
same extent as English–Italian bilingual children. This seems to suggest that
the ‘interface’ conditions relating subject pronouns to discourse factors are late
acquired because they are more demanding. As already mentioned, even Italian
adult monolingual control groups in bilingual studies (Tsimpli et al. 2004;
Filiaci 2003; Belletti et al. 2005) do not show categorically correct behaviour
with respect to subject pronouns; they (sporadically) make unidirectional
mistakes that always involve the inappropriate use of overt subjects in contexts
that would favour null subjects.12

11 One should not lose sight of the fact that these difficulties are resolved in ways that betray the influence of universal factors. Optionality favours the retention and occasional surfacing of unmarked options which are subject to fewer constraints, consistent with typological trends (see Bresnan 2000).
12 The extension of overt subject pronouns to null subject contexts is attested in another situation in which knowledge of English is not a factor. Bini (1993) shows that Spanish learners of Italian up to an intermediate proficiency level use significantly more overt subjects than monolingual Italians and monolingual Spanish speakers. Since the two languages are essentially identical with respect to both the syntactic licensing of null subjects and the pragmatic conditions on the distribution of pronominal forms, L1 influence is not a relevant factor here. This pattern is therefore likely to be due exclusively to coordination difficulties leading to the use of overt subjects as a default option.

6.4.1 The role of external ‘destabilizing’ factors


Finally, the quantitative and qualitative characteristics of the input to which speakers are exposed may play a role in accounting for the instability found at interfaces in different speaker populations. The quantitative factor is evident in bilingual use. What L2 near-native speakers and L1 speakers under attrition have in common is the fact that their total exposure to the language is reduced compared to that of monolingual speakers: in the case of L2 speakers, because they started the process of L2 acquisition in adulthood; in the case of L1 speakers under attrition, because they are no longer exposed to the L1 in a continuous way.13 Qualitative differences may be less obvious but are equally relevant: both the near-native speakers of Italian and the native Italians experiencing attrition are likely to receive input from native Italians in a situation of attrition and from other non-native Italian speakers; they may be exposed to non-native Italian from their spouse, and to 'bilingual' Italian from their children. Thus, these speakers receive qualitatively different input that is consistent with, and reinforces, their own grammar.
Gradedness and indeterminacy in split intransitivity are also fed and maintained, both in native and non-native speakers, by the input, which is categorical and uniform (and therefore rich in terms of frequency) for core verbs and variable for non-core verbs.14
It is intriguing to ask exactly what 'destabilizing' effects may be brought about by the quantitative and qualitative differences in the input to which speakers are exposed, and how grammars are affected in different ways. One possible hypothesis is that quantitative differences are likely to affect processing abilities, because speakers have fewer opportunities to integrate syntax and other cognitive domains in interpretation and production; qualitative differences, on the other hand, may affect representations, because speakers would receive insufficient evidence for interface mappings. Generally, it seems that exposure to consistent input up to a certain threshold level is necessary both for acquiring and maintaining an efficient syntactic system.

13 Clearly the quantitative factor is also a function of age of first exposure: it cannot be considered in absolute terms. Thus, an L2 speaker may have been exposed to the language for many decades and still exhibit non-native behaviour compared to a younger native speaker who has been exposed to the language for a shorter time, but since birth.

14 Variation at interfaces may be regarded as the motor of diachronic change, because it is at this level that 'imperfect acquisition' from one generation to the next is likely to begin. Sprouse and Vance's (1999) study of the loss of null subjects in French indicates that language contact created a situation of prolonged optionality, that is, competition between forms that make the same contribution to semantic interpretation, during which the null-subject option became progressively dispreferred in favour of the overt-subject option because it is the less ambiguous in processing terms. An analogous situation is experienced by the native Italian speaker after prolonged exposure to English: this speaker will be exposed both to null pronouns referring to a topic antecedent (in Italian) and to overt pronouns referring to a topic antecedent (in English, and also in the Italian of other native speakers in the same situation). Optionality, and competition of functionally equivalent forms, is therefore as relevant in this situation as in diachronic change.
The diachronic loss of auxiliary choice in Romance languages may also be traced as beginning from non-core verbs and gradually extending to core verbs (Sorace 2000b; Legendre and Sorace 2003).

6.5 Conclusions
To conclude, I have presented evidence of gradedness and optionality in native and non-native grammars whose locus seems to be the interface between syntactic and other cognitive domains. There are two potential explanations for these patterns. One involves underspecification at the level of knowledge representations that involves the interaction of syntax and other cognitive domains, such as lexical-semantics and discourse-pragmatics. The other involves insufficient processing resources for the coordination of different types of knowledge. Furthermore, there are different kinds of interfaces, not all of which are susceptible to gradedness effects either in stable or in developing grammars. Behavioural and neuropsychological evidence suggests that syntactic processes are less automatic in L2 speakers than in L1 speakers, which in turn may increase coordination difficulties. L2 speakers may also have inadequate resources to carry out the right amount of grammatical processing required by on-line language use, independently of their syntactic knowledge representations. The processing and the representational explanations, however, do not necessarily exclude each other, and indeed seem to work in tandem, particularly in the case of bilingual speakers. Furthermore, syntactic representations and processing abilities may be differentially affected over time by quantitative and qualitative changes occurring in the input to which speakers are exposed. Future research is needed to ascertain the plausibility, and work out the details, of a unified account of gradedness and optionality in native and non-native grammars.
7

Decomposing Gradience: Quantitative versus Qualitative Distinctions

MATTHIAS SCHLESEWSKY, INA BORNKESSEL AND BRIAN MCELREE

7.1 Introduction
Psycho- and neurolinguistic research within the last three decades has shown that speaker judgements are subject to a great deal of variability. Thus, speakers do not judge as equally acceptable all sentences of a given language that are assumed to be grammatical from a theoretical perspective. Likewise, ungrammatical sentences may also vary in acceptability in rating studies conducted with native speakers. These findings stand in stark contrast to the classical perspective that grammaticality is categorical, in that a sentence is either fully grammatical or fully ungrammatical with respect to a particular grammar.
This apparent contradiction has, essentially, been approached from two different directions. On the one hand, it has been proposed that judgement variability—or gradience—results from extra-grammatical 'performance factors' and that it therefore has an origin distinct from linguistic 'competence' (Chomsky 1965). Alternatively, the gradience of linguistic intuitions has been described in terms of varying markedness of the structures in question. Rather than appealing to variation in grammaticality, this latter approach introduces and appeals to an additional grammar-internal dimension. The idea that structures can vary in acceptability for grammar-internal reasons has found expression in the use of question marks, hash marks, and the like to describe the perceived deviation from the endpoints of the grammaticality scale.
Importantly, it must be kept in mind that judgements of acceptability—whether they are binary judgements or numerical ratings—represent unidimensional assessments of what is inherently a multidimensional signal. In essence, intuitions of acceptability reflect the endpoint of a complex sequence of processes underlying sentence comprehension or production. Consequently, it is possible that two sentence structures judged to be equally unacceptable may be unacceptable for rather different reasons. Consider, for example, the following three German examples:

(7.1) a. Dann hat der Lehrer dem Jungen den Brief gegeben.
then has [the teacher]NOM [the boy]DAT [the letter]ACC given
‘Then the teacher gave the letter to the boy.’
b. ??Dann hat dem Jungen den Brief der Lehrer gegeben.
then has [the boy]DAT [the letter]ACC [the teacher]NOM given
‘Then the teacher gave the letter to the boy.’
c. *Dann hat der Lehrer gegeben dem Jungen den Brief.
then has [the teacher]NOM given [the boy]DAT [the letter]ACC

Example (7.1a) illustrates the canonical argument order in German: nominative > dative > accusative. In (7.1b), two argument permutations have resulted in the order dative > accusative > nominative. Argument serializations of this type are typically analysed as grammatical, but are highly marked. Example (7.1c), by contrast, is ungrammatical because of the positioning of the participle, which should be clause-final. Interestingly, structures such as (7.1b) and (7.1c) are consistently judged to be equally (un-)acceptable in rating studies of various types, including questionnaire studies (Pechmann et al. 1996) and speeded acceptability ratings (Röder et al. 2002; Fiebach et al. 2004). Thus, it is not possible to discriminate between the two sentence types by relying on linguistic intuitions alone.
Fortunately, however, other measures can effectively discriminate between the structures in (7.1). In a recent study using functional magnetic resonance imaging (fMRI) to map the brain areas involved in the processing of sentences such as (7.1), Fiebach et al. (2004) showed that the observed pattern of acceptability can be traced back to two distinct sources of neural activation (Figure 7.1). Whereas complex grammatical sentences (e.g. 7.1b) gave rise to an enhanced activation in Broca’s area (the pars opercularis of the left inferior frontal gyrus, BA 44), ungrammatical structures (e.g. 7.1c) engendered activation in the posterior deep frontal operculum. The data reported by Fiebach et al. (2004) thus provide a compelling demonstration that overt judgements of sentence acceptability (or grammaticality) may not provide an adequate means of determining the underlying differences in acceptability of various sentence structures.

[Figure: schematic brain view marking a ‘scrambling effect’ and an ‘ungrammaticality effect’ in distinct regions.]
Figure 7.1. A schematic illustration of the activations elicited by the complexity (scrambling) and the grammaticality manipulation in Fiebach et al. (2004)

In this chapter, we draw upon studies of word order variation in German to examine how linguistic judgements emerge from the real-time comprehension processes. On the basis of a number of empirical observations, we argue that gradient data need not be interpreted as evidence against categorical grammars. Rather, gradience can arise from a complex interaction between grammar-internal requirements, processing mechanisms, general cognitive constraints1 and the environment within which the judgement task is performed. We begin by describing the critical phenomena from a behavioural perspective (Section 7.2), before turning to experimental methods providing more fine-grained data (event-related brain potentials, ERPs; speed-accuracy trade-off, SAT) (Section 7.3).

7.2 The phenomenon: argument order in German


Argument order variations in German are typically classified along several dimensions. First, a permuted argument may occupy either the sentence-initial position (the Vorfeld) or reside in a clause-medial position (in the Mittelfeld).2 Secondly, the type of permuted argument (wh-phrase, pronoun, etc.) is also of crucial importance. In this way, four permutation types are distinguished: topicalization (7.2a), wh-movement (7.2b), scrambling (7.2c), and pronoun ‘movement’ (7.2d).
1 It has been shown, for example, that the final interpretation of a sentence may vary interindividually as a function of general cognitive capacity. Thus, researchers have distinguished between fast and slow comprehenders (e.g. Mecklinger et al. 1995), good and poor comprehenders (e.g. King and Kutas 1995), high and low verbal working memory capacity as measured by the reading span test (King and Just 1991), and individual alpha frequency (Bornkessel et al. 2004b). However, a discussion of these factors is beyond the scope of this chapter.
2 The Mittelfeld is the region of the German clause that is delimited to the left by a complementizer (subordinate clauses) or finite verb in second position (main clauses) and to the right by a clause-final participle or particle.

(7.2) a. Topicalization (Vorfeld, -wh)


Den Arzt hat wahrscheinlich der Journalist
[the doctor]ACC has probably [the journalist]NOM
eingeladen.
invited
‘The journalist most likely invited the doctor.’
b. Wh-movement (Vorfeld, +wh)
Welchen Arzt hat wahrscheinlich der Journalist
[which doctor]ACC has probably [the journalist]NOM
eingeladen?
invited
‘Which doctor did the journalist most likely invite?’
c. Scrambling (Mittelfeld, non-pronominal)
Wahrscheinlich hat den Arzt der Journalist
probably has [the doctor]ACC [the journalist]NOM
eingeladen.
invited
‘The journalist most likely invited the doctor.’
d. Pronoun ‘movement’ (Mittelfeld, pronominal)
Wahrscheinlich hat ihn der Journalist eingeladen.
probably has [him]ACC [the journalist]NOM invited
‘The journalist most likely invited him.’
In addition to these four theoretically motivated permutation types, psycho- and neurolinguistic studies implicate an additional dimension, namely whether the permuted argument is unambiguously case marked (e.g. den Arzt, ‘[the (male) doctor]ACC’) or case ambiguous (e.g. die Ärztin, ‘[the (female) doctor]NOM/ACC’). From a comprehension perspective, the difference between unambiguous and ambiguous case marking lies in the fact that the former immediately signals the presence of an argument order variation, while the latter does not. Empirical evidence indicates that, when faced with an ambiguity, German speakers initially pursue a strategy in which the first ambiguous argument is analysed as the subject of the clause (e.g. Frazier and d’Arcais 1989; de Vincenzi 1991; Bader and Meng 1999). Only when information contradicting this analysis is encountered is a reanalysis of the clause initiated. In this way, the processes leading to the recognition of an argument order variation differ qualitatively in unambiguous and ambiguous situations.
The various types of argument order variations in German have been subject to a number of empirical investigations using different types of acceptability measurements. From these findings, the four central generalizations in (7.3) emerge.

(7.3) Generalizations with regard to the acceptability of argument order variations in German
(i) object-initial sentences are generally less acceptable than their subject-initial counterparts (Krems 1984; Hemforth 1993);
(ii) acceptability decreases with an increasing number of permutations (Pechmann et al. 1996; Röder et al. 2000);
(iii) the acceptability of object-initial sentences decreases when the permuted object is case ambiguous (Meng and Bader 2000a);
(iv) the acceptability difference between object- and subject-initial structures varies according to the following hierarchy: scrambling > topicalization > wh-movement > pronoun movement (Bader and Meng 1999).
The four generalizations summarized in (7.3) interact to produce the overt acceptability pattern seen in German argument-order variations, giving what appears to be a highly gradient set of linguistic intuitions. To cite just one example, Meng and Bader (2000b) report a 49 percent acceptability rate for the scrambling of ambiguous accusative objects. As participants were asked to provide yes–no judgements, this amounts to chance-level performance.3
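The sense in which a 49 percent yes-rate amounts to chance performance can be made concrete with a simple binomial sketch. The trial count below is hypothetical (the actual number of trials in the Meng and Bader study is not reported here):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k 'yes' responses in n unbiased yes-no trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With a hypothetical 100 judgement trials, 49 'acceptable' responses are
# well within what an unbiased coin would produce.
n, k = 100, 49
p_at_most_k = sum(binom_pmf(i, n) for i in range(k + 1))
print(round(p_at_most_k, 3))  # ≈ 0.46: no evidence of departure from chance
```

In other words, a rater answering at random would produce 49 or fewer acceptances almost half the time, which is why such a rate defies interpretation as a graded judgement.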
However, the generalizations in (7.3) emerged almost exclusively from
studies on the permutation of accusative objects in transitive sentences. By
contrast, when the relative ordering between dative- and nominative-marked
arguments is examined, at least two interesting exceptions to this general
pattern become apparent. First, the severe drop in acceptability for scrambled
(transitive) objects is attenuated when the object bears dative case marking:
Schlesewsky and Bornkessel (2003) report an 86 percent acceptability rate for
initially ambiguous dative-initial structures similar to those engendering a
49 percent acceptability rate for accusative-initial structures in the Meng and
Bader (2000b) study. Secondly, in sentences with dative object-experiencer
verbs—which project an argument hierarchy in which the dative-marked
experiencer outranks the nominative-marked stimulus—the acceptability
decrease for object-initial orders is neutralized or even shows a tendency to reverse
(Schlesewsky and Bornkessel 2003; see Table 7.1).
Dramatic differences of this sort call for an explanation. We believe that an adequate explanation requires, as a first step, an accurate, fine-grained characterization of the output signal that constitutes an acceptability judgement.

3 Note that this essentially amounts to the same level of performance that a non-human primate might be expected to produce when confronted with the same sentences and two alternative push-buttons. As such, chance-level acceptability defies interpretation.

Table 7.1. Acceptability ratings for locally ambiguous subject- and object-initial sentences with dative active and dative object-experiencer verbs

Sentence type                               Mean acceptability in %
Subject-first, active verb                  92.3
Subject-first, object-experiencer verb      84.6
Object-first, active verb                   86.5
Object-first, object-experiencer verb       91.7

Source: Schlesewsky and Bornkessel 2003

Thus, we must examine how the judgement ‘emerges’ from the on-line
comprehension process.

7.3 Sources of gradience


A natural first step in tracing the emergence of a linguistic judgement lies in the examination of the comprehension system’s initial response to the variation under consideration, that is, for present purposes, to the argument order permutation. How does the system react when it encounters an object before a subject, and is there evidence for an immediate differentiation between the different permutation types? A methodological approach optimally suited to answering this question is the measurement of event-related brain potentials (ERPs; see Appendix 1). Because of their very high temporal resolution and their multi-dimensional characterization of neuronal activity, ERPs allow for an exquisitely precise differentiation of various cognitive processes, and, not surprisingly, many researchers have capitalized upon these properties to explore language processing.
Most of the argument order variations discussed above have been subjected to examination using ERPs, but we restrict our discussion to order permutations in dative constructions and how these contrast with those in accusative structures. Moreover, we will focus primarily on findings for initially ambiguous structures, as these reveal the influence of processing considerations on acceptability ratings most clearly. From the perspective of a strong competence versus performance distinction, these structures might be considered a ‘worst case scenario’ and are as such well suited to examining the limitations of supposedly time-insensitive linguistic judgements.
The exploration of ambiguous sentences provides fruitful ground for investigating these issues. Consider, for example, the sentence fragment in (7.4):

(7.4) . . . dass Dietmar Physiotherapeutinnen . . .


. . . that DietmarNOM/ACC/DAT physiotherapistsNOM/ACC/DAT
When confronted with an input such as (7.4), the processing system initially analyses the first argument Dietmar as the subject of the clause (e.g. Hemforth 1993; Schriefers et al. 1995; Bader and Meng 1999; Schlesewsky et al. 2000). Accordingly, the second argument Physiotherapeutinnen—which does not contradict the initial assignment—is analysed as an object. If, however, the clause is completed by a plural verb such as beunruhigen (‘to disquiet’), the supposed subject of the clause no longer agrees with the finite verb. Thus, a reanalysis towards an object-initial order must be initiated in order for a correct interpretation to be attained. In terms of ERP measures, reanalyses are typically associated with a late (approximately 600–900 ms) positivity with a posterior distribution (P600; e.g. Osterhout and Holcomb 1992). Indeed, this component has also been observed for the reanalysis of argument order in German, for example in wh-questions (beim Graben et al. 2000), topicalizations (Frisch et al. 2002), and scrambled constructions (Friederici and Mecklinger 1996). Note, however, that all of these studies manipulated the word order of accusative structures only.
However, a qualitative difference emerges when the reanalysis towards a dative-initial order is examined and compared to the reanalysis towards an accusative-initial order in an otherwise identical sentence (i.e. completing sentence fragments as in (7.4) with either an accusative or a dative verb, e.g. besuchen (‘to visit’) versus danken (‘to thank’)). While the reanalysis in accusative structures gives rise to a P600 effect as discussed above, the revision towards a dative-initial word order elicits a centro-parietal negativity between approximately 300 and 500 ms post onset of the disambiguating element (N400; Bornkessel et al. 2004a; Schlesewsky and Bornkessel 2004). The difference between the two effects is shown in Figure 7.2.

[Figure: grand average ERPs at electrode CP3, showing a P600 for accusatives (a) and an N400 for datives (b).]
Figure 7.2. Grand average ERPs for object- and subject-initial structures at the position of the disambiguating clause-final verb (onset at the vertical bar) for sentences with accusative (A) and dative verbs (B). Negativity is plotted upwards. The data are from Bornkessel et al. (2004a)
In accordance with standard views on the interpretation of ERP component differences (e.g. Coles and Rugg 1995), we may conclude from this distinction that reanalysis towards an object-initial order engages qualitatively different processing mechanisms in accusative and dative structures. The processes in question, which may be thought to encompass both conflict detection and conflict resolution, therefore reflect underlyingly different ways of resolving a superficially similar problem (i.e. the correction of an initially preferred subject-initial analysis).
Before turning to the question of whether the ERP difference between reanalyses towards accusative- and dative-initial orders may be seen as a correlate of the different acceptability patterns for the two types of object cases—and thereby a source of gradience in this respect—we shall first examine a second exception to the generalizations in (7.3), namely the behaviour of dative object-experiencer verbs (e.g. gefallen, ‘to be appealing to’). As briefly discussed above, this verb class is characterized by an ‘inverse linking’ between the case/grammatical function hierarchy and the thematic hierarchy: the thematically higher-ranking experiencer bears dative case, while the lower-ranking stimulus is marked with nominative case. In the theoretical syntactic literature, it has often been assumed that these verbs are associated with a dative-before-nominative base order, which comes about when the lexical argument hierarchy (experiencer > stimulus) is mapped onto an asymmetric syntactic structure (e.g. Bierwisch 1988; Wunderlich 1997, 2003; Haider 1993; Haider and Rosengren 2003; Fanselow 2000).
These properties of the object-experiencer class lead to an interesting constellation for argument order reanalysis, which may again be illustrated using the sentence fragment in (7.4). As with the cases discussed above, the comprehension system initially assigns a subject-initial analysis to the input fragment in (7.4). Again, when this fragment is completed by a dative object-experiencer verb that does not agree with the first argument, reanalysis towards an object-initial order must be initiated. However, in contrast to the structures previously discussed, here the verb provides lexical information in support of the target structure, specifically an argument hierarchy in which the dative outranks the nominative. The ERP differences between reanalyses initiated by dative object-experiencer verbs and dative active verbs (which project a ‘canonical’ argument hierarchy) are shown in Figure 7.3 (Bornkessel et al. 2004a).

[Figure: grand average ERPs at electrode P4, showing an N400 effect (0.350–0.550 s) for both verb types.]
Figure 7.3. Grand average ERPs for object- and subject-initial structures at the position of the disambiguating clause-final verb (onset at the vertical bar) for sentences with dative active (A) and dative object-experiencer verbs (B). Negativity is plotted upwards. The data are from Bornkessel et al. (2004a)

As Figure 7.3 shows, reanalyses initiated by a dative object-experiencer verb also engender an N400 component, rather than a P600. However, the N400 effect is less pronounced than for the analogous structures with dative active verbs. The difference between the two types of dative constructions is therefore quantitative rather than qualitative in nature. This suggests that reanalysis is more effortful in the case of dative active verbs, but that the same underlying processes may be assumed to take place with both verb classes.
To a large extent, the ERP patterns mirror the acceptability judgements described above. First, there is a general difference between dative- and accusative-initial sentences: the former are not only more acceptable than the latter, they also engage qualitatively different processing mechanisms in reanalysis. Secondly, there is also a difference within the dative verbs themselves, such that reanalysis towards a dative-initial order is less costly when it is triggered by an object-experiencer rather than an active verb. Nonetheless, reanalyses with both dative verb types appear to proceed in a qualitatively similar manner.
However, despite this strong convergence of measures, it is unrealistic to expect a one-to-one mapping between the ERP data and the acceptability ratings. Indeed, not all the differences found in ERP measures are expressed in overt judgements. For example, the disadvantage for the object-initial structures is measurable in ERPs even with dative object-experiencer verbs, while the difference between the two word orders is no longer visible in the acceptability rates. In order to precisely predict the relationship between the two types of measures, we must fully understand how an overt judgement ‘emerges’ from the comprehension process. This requires tracing the development of the judgement from the point at which the problem is detected to later points when the system has settled on a final assessment of the acceptability of the structure.
The speed–accuracy trade-off procedure (SAT; see Appendix 2) is one experimental method that allows for an examination of how a linguistic judgement develops over time. This method traces the emergence of an acceptability judgement from its beginnings (i.e. from the point at which the judgement departs from chance level) up to a terminal point (i.e. a point at which the judgement no longer changes even with functionally unlimited processing time). Under the assumption that ERPs characterize the processing conflict and its resolution, while time-insensitive linguistic judgements reflect the endpoint of a multidimensional set of processing mechanisms, SAT procedures provide a bridge between the two measures.
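The accuracy measure plotted in SAT studies is standardly d′ from signal detection theory, which combines correct acceptances of acceptable sentences (hits) with incorrect acceptances of unacceptable ones (false alarms). A minimal sketch of this standard computation (our illustration; the exact scoring procedure of the studies cited here may differ):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: d' = z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A rater who accepts 90% of acceptable and 20% of unacceptable sentences
# discriminates well; identical rates for both mean chance performance.
good = d_prime(0.90, 0.20)   # ≈ 2.12
chance = d_prime(0.50, 0.50) # 0.0
```

On this scale, a judgement that has not yet departed from chance sits at d′ = 0, and the asymptote of an SAT function is the terminal discrimination level.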
Let us first consider the SAT results for reanalysis towards a dative-initial order in sentences with dative active and dative object-experiencer verbs. The SAT functions for the four critical conditions are shown in Figure 7.4 (Bornkessel et al. 2004a).
[Figure: SAT functions plotting accuracy (d′) against processing time (lag plus latency) in seconds for the four conditions: subject-object and object-subject orders with active and object-experiencer verbs.]
Figure 7.4. SAT functions for object- and subject-initial structures with (dative) active and (dative) object-experiencer verbs. The data are from Bornkessel et al. (2004a)

The SAT data shown in Figure 7.4 were best fit with an exponential approach to a limit function (Eq. 1) that assigned a distinct asymptote (λ) to each of the four conditions and distinct intercepts (δ) to the subject-initial and object-initial conditions, respectively.

(Eq. 1) d′(t) = λ(1 − e^(−β(t − δ))) for t > δ, otherwise 0
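The role of each parameter in (Eq. 1) can be made concrete by evaluating the function directly. The parameter values below are invented for illustration only (the fitted values are not reported here); the object-initial condition is given a lower asymptote λ and a longer intercept δ:

```python
import math

def sat_accuracy(t, lam, beta, delta):
    """Eq. 1: d'(t) = lam * (1 - exp(-beta * (t - delta))) for t > delta, else 0."""
    if t <= delta:
        return 0.0
    return lam * (1.0 - math.exp(-beta * (t - delta)))

# Hypothetical parameters (asymptote lam in d' units, rate beta in 1/s,
# intercept delta in s) -- not the fitted values from Bornkessel et al.
subject_initial = dict(lam=4.0, beta=1.5, delta=0.5)
object_initial = dict(lam=3.3, beta=1.5, delta=0.8)

# Early in processing the object-initial curve lags behind (dynamics);
# with unlimited time it levels off lower (asymptote).
for t in (0.6, 1.0, 2.0, 6.0):
    print(t, sat_accuracy(t, **subject_initial), sat_accuracy(t, **object_initial))
```

With these settings, the object-initial curve both rises later and levels off lower, the two signatures (dynamics versus asymptote) that the discussion below keeps apart.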
The intercept difference between subject-initial and object-initial structures, with a longer intercept for object-initial structures, indicates that the final analysis of the object-initial sentences takes longer to compute than the final analysis of their subject-initial counterparts. This is the characteristic pattern predicted for a reanalysis operation: as reanalysis requires additional computational operations, the correct analysis of a structure requiring reanalysis should be reached more slowly than the correct analysis of an analogous structure not requiring reanalysis. The dynamics (intercept) difference occurs in exactly the same conditions as the N400 effect in the ERP experiment.
The asymptotic differences appear to result from two sources. First, the object-initial structures are generally associated with lower asymptotes than the subject-initial controls. This difference likely reflects a decrease in acceptability resulting from the reanalysis operation required to interpret the object-initial sentences. A principled explanation for this pattern, one that is consistent with the concomitant dynamics differences, is that, on a certain proportion of trials, the processing system fails to recover from the initial misanalysis, thus engendering lower asymptotic performance for an initially ambiguous object-initial structure as compared to a subject-initial structure.
More interesting, perhaps, are the differences in asymptote between the two object-initial conditions: here, the sentences with object-experiencer verbs were associated with a reliably higher asymptote than those with active verbs. This difference may directly reflect the differences in the accessibility of the object-initial structure required for a successful reanalysis. Whereas the active verbs provide no specific lexical information in favour of such a structure, the object-experiencer verbs are lexically associated with an argument hierarchy calling for precisely this type of ordering. Thus, while a garden path results for both verb types, the object-experiencer verbs provide a lexical cue that aids the conflict resolution. Again, the correspondence to the ERP data is clear: the higher asymptote for the object-initial structures with object-experiencer verbs—which we have interpreted as arising from the higher accessibility of the object-initial structure in these cases—corresponds to the reduced N400 for this condition, which also reflects a reduction of the reanalysis cost.
Two conclusions concerning acceptability patterns follow from this analysis. First, despite the presence of an almost identical processing conflict in both cases, dative-initial structures with object-experiencer verbs are more acceptable than those with active verbs because only the former are lexically associated with an object-initial word order. Secondly, however, even the presence of an object-experiencer verb can never fully compensate for the cost of reanalysis, as evidenced by the fact that an initially ambiguous dative-initial structure never outstrips its nominative-initial counterpart in terms of acceptability. From a surface perspective, therefore, the observed acceptability patterns are the result of a complex interaction between different factors. The observed gradience does not result from uncertainty in the judgements, but rather from interactions between the different operations that lead to the final intuition concerning acceptability.
Having traced the emergence of the acceptability judgements for the two types of dative structures, a natural next step is to apply the same logic to the difference between accusative and dative structures, and thus to examine whether similar parallels between the on-line and off-line findings are evident in these cases.
Recall that, while the reanalysis of a dative structure generally engenders an N400 effect in ERP terms, reanalysis towards an accusative-initial order has been shown to reliably elicit a P600 effect. In addition, the surface acceptability is much lower for the accusative- than for the dative-initial structures. How might these findings be related? Essentially, the different ERP patterns suggest that the two types of reanalysis take place not only in a qualitatively different manner, but also in different phases of processing: while the N400 is observable between approximately 300 and 500 ms after the onset of a critical word, the time range of the P600 effect is between approximately 600 and 900 ms. Thus, the reanalysis of an accusative structure appears to be a later process than the reanalysis of a dative structure and, in terms of the SAT methodology, we might therefore expect to observe larger dynamics differences between subject- and object-initial accusative sentences than in the analogous contrast for dative sentences. As discussed above, dynamics differences can subsequently lead to differences in terminal (asymptotic) acceptability, and the distinction between the dative and the accusative structures might therefore also—at least in part—stem from a dynamic source.
The differences between subject- and object-initial dative and accusative structures in an SAT paradigm are shown in Figures 7.5a and 7.5b (Bornkessel et al. submitted).
[Figure: SAT functions plotting accuracy in d′ units against processing time in seconds for subject- and object-initial orders; panel (a) accusatives, panel (b) datives.]
Figure 7.5. SAT functions for object- and subject-initial structures with accusative (a) and dative verbs (b). The data are from Bornkessel et al. (submitted)

As the accusative and dative sentences were presented in a between-subjects design, model fitting was carried out separately for the two sentence types. While the accusative structures were best fit by a 2λ–2β–2δ model (adjusted R² = .994), the best fit for the dative structures was 1λ–1β–2δ (adjusted R² = .990). Estimates of the composite dynamics (intercept + rate) were computed for each condition using the formula δ + 1/β, which provides a measure of the mean time required for the SAT function to reach the asymptote. The dynamics difference between the two accusative structures was estimated to be 588 ms (4430 ms for object- versus 3842 ms for subject-initial sentences), while the difference for the dative sentences was estimated at 332 ms (3623 ms versus 3291 ms). Thus, while both dative- and accusative-initial sentences show slower dynamics than their subject-initial counterparts, the dynamics difference between the subject- and object-initial accusative structures is approximately 250 ms larger than that between the dative
structures. This finding therefore supports the hypothesis that the large acceptability disadvantage for the initially ambiguous accusative-initial structures—and the corresponding asymptote differences for the accusative sentences—results to a large extent from the highly pronounced dynamics difference between these structures and the corresponding subject-initial sentences. In other words, the likelihood that the correct analysis fails to be computed is much higher in the accusative-initial sentences because the computational operations required to obtain this analysis are much more complex for this sentence type. Consequently, accusative-initial sentences are rejected as unacceptable in a higher proportion of trials, thereby yielding a lower acceptability rate/asymptote.
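The dynamics comparison rests on simple arithmetic over the composite estimates; a quick check using only the values reported above:

```python
# Composite dynamics (mean time to reach asymptote, in ms) as reported
# for Bornkessel et al. (submitted).
accusative = {"subject_initial": 3842, "object_initial": 4430}
dative = {"subject_initial": 3291, "object_initial": 3623}

# Reanalysis cost in dynamics terms: object-initial minus subject-initial.
acc_cost = accusative["object_initial"] - accusative["subject_initial"]  # 588 ms
dat_cost = dative["object_initial"] - dative["subject_initial"]          # 332 ms

# The accusative reanalysis is slower by roughly a quarter of a second.
extra_cost = acc_cost - dat_cost  # 256 ms, "approximately 250 ms" in the text
print(acc_cost, dat_cost, extra_cost)
```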
Again, SAT provides a principled means of establishing the correspondence between the ERP data and the acceptability ratings. The reanalysis mechanisms that are reflected in an N400 effect—those that enable reanalysis towards a dative-initial structure—are also associated with a smaller dynamics increase than those reflected in a P600 component—those that enable reanalysis towards an accusative-initial structure. Therefore, we might speculate that the difference in the underlying neural processes, which is reflected in the different ERP components, gives rise to the concomitant differences in SAT dynamics and, thereby, to the differences in surface acceptability.
If both of the SAT studies discussed here are considered together, an interesting difference between the two becomes apparent. In the first experiment, in which only dative structures were examined in a design identical to that used to obtain the acceptability rates in Table 7.1, there were reliable asymptote differences between dative- and nominative-initial structures with dative active verbs. In the second study, there were comparable dynamics differences, but the asymptote difference—although apparent in visual inspection—failed to significantly improve the model fit. How can we account for this variation, or inter-experimental gradience? Assuming that the asymptotic acceptability measured using SAT reflects the endpoint of processing and, thereby, the time-independent acceptability of a given structure, one plausible explanation appears to lie in the different experimental environments in which the structures were presented. It is well known that sentence judgements are influenced by various factors including context, filler sentence type, etc. (Bard et al. 1996; Schütze 1996). One crucial difference between the two SAT studies is that dative object-experiencer verbs were only included in the first experiment. It may therefore be the case that the acceptability ratings for the object-initial dative active sentences arise not only from a contrast with the corresponding subject-initial sentences but also with the object-initial sentences with experiencer verbs. In the face of more acceptable object-initial structures, the acceptability disadvantage for the object-initial active sentences may be ‘overestimated’. If true, this observation suggests that terminal acceptability may result from the interaction of a variety of different factors. Whatever the source of this discrepancy, it serves to highlight the highly variable nature of acceptability judgements, and to contrast these measures with those that more directly reflect intrinsic properties of the underlying processing mechanisms, such as the dynamics difference between object- and subject-initial structures, which may be assumed to be more stable and less subject to environmental influences.

7.4 Final remarks


In this chapter, we have attempted to show how linguistic judgements arise from different facets of language comprehension. In particular, our data suggest the following caveats concerning the interpretation of acceptability judgements:
1. One-dimensional judgements often result from the interaction of a variety of factors, and hence are inherently multi-dimensional.
2. Superficially similar judgements may stem from qualitatively different sources, which should be disentangled. Differences that may appear quantitative, for example as different ‘strengths’ on a single dimension, may nevertheless have qualitatively different origins.
3. Acceptability decreases may be dynamic or non-dynamic in nature.
Concerning gradience in linguistic judgements, these findings indicate that a considerable amount of variation in judgements may be accounted for by carefully considering factors that interact to produce the end state that constitutes an acceptability judgement. The question thus arises whether gradience should indeed be attributed to linguistic competence, or whether it is better described as a product of the language in use, that is of processing mechanisms—which may or may not be language specific—and of general cognitive factors (e.g. working memory, see footnote 1). From our perspective, the burden of the evidence rests with the advocates of gradient grammaticality, for it appears very difficult to mount a convincing argument in favour of grammar-internal gradience on the basis of acceptability judgements alone. Thus, when all possible alternative sources of surface gradience are considered, a categorical grammar still appears to be the simplest and therefore most appealing means of accounting for the data.
Quantitative versus Qualitative Distinctions 139

7.5 Appendix 1. Event-related brain potentials (ERPs)


Event-related brain potentials (ERPs) are small changes in the spontaneous
electrical activity of the brain, which occur in response to sensory or cognitive
stimuli and which may be measured non-invasively by means of electrodes
applied to the scalp. The high temporal resolution of ERP measures is of
particular importance for the examination of language comprehension.
Furthermore, ERP patterns (‘components’) are characterizable in terms of the
following parameters: polarity (negative versus positive); topography (at
which electrode sites an effect is visible); latency (the time at which the
effect is visible relative to the onset of a critical stimulus); and amplitude
(the ‘strength’ of an effect). While a number of language-related ERP
components have been identified (cf., for example, Friederici 2002), we will not
introduce these here for the sake of brevity. For a more detailed description of
the ERP methodology and how it has been applied to psycholinguistic
domains of investigation, the reader is referred to the overviews presented
in Coles and Rugg (1995), Garnsey (1993), and Kutas and Van Petten (1994).

[Figure not reproduced: the ongoing EEG is passed through an amplifier and a
signal averager, time-locked to successive auditory stimuli (S) presented about
one second apart; the resulting auditory event-related potential, plotted from
−6 to +6 µV over 0–1000 ms after stimulus onset, shows the ELAN, P200, N400,
and P600 components.]

Figure 7.6. Schematic illustration of the ERP methodology


140 The Nature of Gradience

The ERP methodology only provides relative measures; that is, an effect
always results from the comparison of a critical condition with a minimally
differing control condition. For example, at the position of socks in He
spread the warm bread with socks in comparison to the position of butter
in He spread the warm bread with butter, a negativity with a centro-parietal
distribution and a maximum at 400 ms post critical word onset (N400) is
observable (Kutas and Hillyard 1980). Thus, in the experiments presented
here, we always compare the response to a critical condition with that to a
control condition at a particular (critical) position in the sentence.
A schematic illustration of the ERP methodology is shown in Figure 7.6.

7.6 Appendix 2. Speed–accuracy trade-off (SAT)


Reading time (eye-movement tracking or self-paced) procedures are often
used as a natural and unintrusive measure of processing time. However, these
measures do not provide an estimate of the likelihood that readers have
successfully processed a sentence and, conversely, do not provide a direct
estimate of the time it takes to compute an interpretation. A reading time
difference can reflect the time needed to compute a particular interpretation,
but it can also reflect the likelihood that readers can compute that
interpretation or how plausible readers find the resulting interpretation (McElree
1993, 2000; McElree and Griffith 1995, 1998; McElree and Nordlie 1999;
McElree et al. 2003). A standard solution to this problem is to derive a full
time–course function that measures how the accuracy of processing varies
with processing time (Wickelgren 1977). The response-signal, speed–accuracy
trade-off (SAT) procedure provides the required conjoint measures of
processing speed and accuracy.
The response-signal speed–accuracy trade-off task requires subjects to
make their judgement of acceptability at particular times. This serves to
chart the full time–course of processing, measuring when discrimination
departs from a chance level, the rate at which discrimination grows as a
function of processing, and the asymptotic level of discrimination accuracy
reached with (functionally) unlimited processing time. Figure 7.7 presents
illustrative SAT functions derived from this procedure. The accuracy of
discriminating acceptable from unacceptable sentences is measured in d’
units (the z-transform of the probability of correctly accepting an acceptable
sentence minus the z-transform of the probability of falsely accepting an
unacceptable sentence). Typical SAT functions display three distinct phases:
a period of chance performance (d’ = 0), followed by a period of increasing
accuracy, followed by an asymptotic period where further processing does not
improve performance. In a sentence acceptability task, the SAT asymptote
provides a measure of the probability (across trials and materials) that readers
arrive at an interpretation sufficient to support an ‘acceptable’ response. If
two conditions differ in asymptote, as illustrated in Panel (a), it indicates that
they differ in the likelihood that a meaningful interpretation can be computed
or in overall acceptability/plausibility of the respective interpretation.
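As a side note, the d’ measure described above is straightforward to compute. The Python sketch below uses invented hit and false-alarm rates purely for illustration:

```python
from statistics import NormalDist

def d_prime(p_hit: float, p_fa: float) -> float:
    """d' = z(P(accept | acceptable)) - z(P(accept | unacceptable)),
    where z is the inverse of the standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(p_hit) - z(p_fa)

# Invented rates: 85% correct acceptances vs. 20% false acceptances.
print(round(d_prime(0.85, 0.20), 2))  # → 1.88
# Chance performance (hit rate equals false-alarm rate) gives d' = 0.
print(d_prime(0.50, 0.50))  # → 0.0
```

Because both probabilities are z-transformed, d’ separates discrimination ability from any overall bias towards answering ‘acceptable’.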
The point at which accuracy departs from the chance level (the intercept of
the function) and the rate at which accuracy grows over processing time are
joint measures of the underlying speed of processing. If one type of structure
can be interpreted more quickly than another, the SAT functions will differ in
rate, intercept, or some combination of the two parameters. This follows from
the fact that the SAT rate and intercept are determined by the underlying
finishing time distribution for the processes that are necessary to accomplish
the task. The time to compute an interpretation will vary across trials and
materials, yielding a distribution of finishing times. Intuitively, the SAT
intercept corresponds to the minimum of the finishing time distribution,
and the SAT rate is determined by the variance of the distribution. Panel (b)
depicts a case where the functions differ in rate of approach to asymptote,
leading to disproportional dynamics; the functions reach a given proportion
of their asymptote at different times.

[Figure not reproduced: two panels plotting accuracy (d’ units, 0.0–2.0)
against processing time (response time, 0.0–4.0 seconds) for Conditions A
and B. Panel (a), probability of computing an acceptable interpretation:
proportional dynamics, i.e. the functions reach a given proportion of their
asymptote at the same time. Panel (b), speed of computing an acceptable
interpretation: disproportional dynamics, i.e. the functions reach a given
proportion of their asymptote at different times.]

Figure 7.7. Illustrative SAT functions
Dynamics (rate and/or intercept) differences are independent of potential
asymptotic variation. Readers may be less likely to compute an interpretation
for one structure or may find that interpretation less acceptable (e.g. less
plausible) than another; however, they may not require additional time to
compute that interpretation (McElree 1993, 2000; McElree et al. 2003; McElree
and Nordlie 1999).
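The three SAT parameters just discussed (asymptote, rate, and intercept) are commonly tied together by a shifted-exponential function of processing time, as in Wickelgren (1977). The Python sketch below, with invented parameter values, illustrates why two conditions that share rate and intercept show proportional dynamics whatever their asymptotes:

```python
import math

def sat(t: float, lam: float, beta: float, delta: float) -> float:
    """Shifted-exponential SAT function:
    d'(t) = lam * (1 - exp(-beta * (t - delta))) for t > delta, else 0,
    with lam = asymptote, beta = rate, delta = intercept."""
    return lam * (1.0 - math.exp(-beta * (t - delta))) if t > delta else 0.0

def time_to_proportion(p: float, beta: float, delta: float) -> float:
    """Time at which the curve reaches proportion p of its asymptote.
    The asymptote lam cancels out: this time depends only on rate and intercept."""
    return delta - math.log(1.0 - p) / beta

# Conditions A and B share dynamics (beta = 3.0/s, delta = 0.4 s) but differ
# in asymptote (2.0 vs. 1.5 d' units): both reach half of their own asymptote
# at the same moment -- proportional dynamics, as in Panel (a) of Figure 7.7.
t_half = time_to_proportion(0.5, beta=3.0, delta=0.4)
print(round(sat(t_half, 2.0, 3.0, 0.4) / 2.0, 3))  # → 0.5
print(round(sat(t_half, 1.5, 3.0, 0.4) / 1.5, 3))  # → 0.5
```

On this model, a difference in lam alone shifts the asymptote without affecting when the curves reach a given proportion of it; disproportional dynamics, as in Panel (b), require a difference in beta and/or delta.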
Part II
Gradience in Phonology
8

Gradient Perception of Intonation


C A RO L I N E F É RY A N D RU B E N S TO E L

8.1 Introduction
Many phonologists associate the term ‘gradience’ with the distinction
between phonology—which is supposed to be categorical—and phonetics—
which is supposed to be gradient (see Cohn, this volume, for a review of the
issues associated with this distinction).1 In recent years, a different role for
gradience in phonology has emerged: the well-formedness of phonological
structures has been found to be highly gradient in a way that correlates with
their frequency. In their chapter, Frisch and Stearns (this volume) show that
phonotactic patterns, like consonant clusters and other segment sequences, as
well as morphophonology, word-likeness, etc. are gradient in this way. The
examination of large corpora is a reliable indicator of relative frequency.
Crucially, the less frequent sequences are felt by speakers to be less prototypical
exemplars of their category. In grammaticality judgement tasks,
word-likeness tasks, assessment of novel words, etc., less frequent items are likely to
get lower grades than more frequent ones. In short, speakers reproduce in
their judgements the pattern of relative frequency that they encounter in their
linguistic environment. In light of this well-documented (see Frisch and
Stearns, this volume and references cited there), but controversial result, the
question has arisen for some phonologists as to the need of a grammar
operating with abstract phonological categories, like features and phonemes.
In their opinion, if phonotactic distribution is learnable by executing
probabilistic generalizations over corpora, the only knowledge we need in order to

1 A pilot experiment for this paper was presented at the Potsdam Gradience Conference in October
2002 and some of the results discussed here were presented at the Syntax and Beyond Workshop in
Leipzig in August 2003. Thanks are due to two anonymous reviewers, as well as to Gisbert Fanselow
and Ede Zimmermann for helpful comments. Thanks are also due to Frank Kügler for speaking the
experimental sentences, and to Daniela Berger, Laura Herbst, and Anja Mietz for technical support.
Nobody except for the authors can be held responsible for shortcomings.

elaborate ‘grammars’ may turn out to be a stochastic one. But before we can
take a stand on this important issue in a competent way, we need to be well-
informed on other aspects of the phonology as well.
In this chapter, we take a first step and investigate the question of whether
intonational contours are gradient in the same way that segment sequences
are. Is it the case that more frequent tonal patterns are more acceptable than
less frequent ones? We use the term gradience in the sense of gradient
acceptability.
Unfortunately, for a number of reasons, large corpora—at least in their
present state—are useless for the study of tonal pattern frequencies. One of
the reasons relates to the analysis and annotation of tonal patterns. Scholars
not only disagree on the kinds of categories entering intonation studies but
also on the annotation for ‘rising contour’, ‘fall–rise’, etc. Melodies—like
Gussenhoven’s (1984, 2004) nuclear contours or the British school’s ‘heads’
and ‘nuclei’—may well exist as independent linguistic elements, but they are
not transcribed uniformly. Even though autosegmental-metrical representations
of tonal contours, like ToBI (Beckman and Ayers-Elam 1993; Jun 2005),
are evolving to become a standard in intonation studies, they are not
sufficiently represented in corpora. Most large corpora consist of written material
anyway, and those which contain spoken material generally only display
segmental transcription rather than tonal.
In short, the development of corpora annotated in a conventional way for
intonation patterns is an aim for the future; as of now, such a resource is
simply not available for German.
As a result, we must rely on the intuition of speakers. The questions we
address in this chapter are: Which tonal contours are accepted most? Which
are less accepted? We will see that the question must be made precise in the
following way: given a certain syntactic structure, is there a contour which is
accepted in the largest set of contexts? And this is related to the question of
pitch accent location. Which constituents are expected to be accented?
Which accent structure is the least marked, in the sense of being accepted
in the greatest number of contexts? Are some accent patterns (tonal
patterns) ‘unmarked’ (more frequent, acquired earlier, but also accepted
more easily) in the same sense as consonant clusters or other segment
sequences are?
Below, we present the results of a perception study bearing on tonal
contours. But before we turn to the experiment, we first sum up some relevant
issues in the research on prosody and situate our research in this broader
context.

8.2 Prosody and intonation


Prosody plays a crucial role in communication. To begin with, we partition
our utterances in prosodic chunks, like phonological phrases and intonation
phrases, which correspond to syntactic constituents (Nespor and Vogel 1986;
Truckenbrodt 1999) or information structural blocks (Vallduví 1992). These
phrases, which help both speakers and hearers structure the discourse, are
signalled phonetically by boundary tones, segmental lengthening, or some
other phonological cues. A second factor playing a role in phonological
patterning is the distribution and form of pitch accents, associated with
prominent syllables. A syllable may be prominent if it is the bearer of the
lexical stress of a word or of a larger constituent which is itself prominent. A
speaker may decide to speak about some object in her surroundings or an
object she knows about, and decide to focus on one property of this object. Or
she may answer a question asked by a protagonist because she feels she has to
deliver some bit of information. In other words, prominence may be assigned
to some linguistic constituents because of contextual or cognitive reasons
(Bolinger 1972). The other reason to assign a pitch accent to a syllable is purely
grammatical. An internal argument of a German predicate + argument
complex, for example, may receive a pitch accent, and the verb may be
unaccented. Still, the whole phrase may be prominent (see Bierwisch 1968;
Schmerling 1976; Gussenhoven 1983, 1992; von Stechow and Uhmann 1986;
Cinque 1993; Féry and Samek-Lodovici 2006, among others).
In Standard German, nuclear accents (the final or most prominent accents
of an intonation phrase) are either bitonally falling, HL, or rising, LH,
whereas prenuclear accents can be rising or falling as well or monotonally
high (H) or low (L) (see Féry 1993; Grabe 1998; Grice et al. 2003; Peters 2005
for phonological studies of intonation of standard German). Prosodic phrases
may be terminated with a boundary tone, which is written with a subscripted
P for a phonological phrase, and a subscripted I for an intonation phrase
(following Hayes and Lahiri’s 1991 notation). For the sake of illustration, two
pitch tracks of a sentence used in the experiments described below are shown
with their phonological tone structure. The first pitch track, in Figure 8.1, is
equivalent to a wide-focused realization with two pitch accents, a rising one
on the subject Ruderer, and a falling one on the object Boote. The verb, adverb,
and particle mit are unstressed. This realization may be dubbed ‘unmarked
prosodic structure’ (UPS, see below). It is expected to be the most frequent
one, and thus, the most widely accepted pattern for such a declarative
sentence. In German, a topic-focus realization, in which the subject is
topicalized and the remainder of the sentence is focused, is identical to a
wide-focused realization.

[Figure not reproduced: pitch track, 50–250 Hz over 1.67 s, of RUDERER
bringen immer BOOTE mit, with a rising accent (L*H) on the subject followed
by a phrase boundary tone (Hp), and a falling accent (H*L) on the object
followed by a low intonation phrase boundary (Li).]

Figure 8.1. Pitch track of Ruderer bringen immer Boote mit. ‘Oarsmen always bring
boats.’
The second pitch track (Figure 8.2) shows a marked pattern, with just one
pitch accent located early in the sentence. This kind of pattern is expected to
be confined to special contexts, in particular those eliciting a narrow focus on
the subject.

[Figure not reproduced: pitch track, 50–250 Hz over 1.73 s, of RUDERER
bringen immer Boote mit, with a single falling accent (H*L) on the subject and
a low intonation phrase boundary (Li).]

Figure 8.2. Pitch track of Ruderer bringen immer Boote mit

It is not possible to investigate the gradience of tonal patterns out of
context. Tonal patterns do not exist as pure melodies: they need to be
interpreted as linguistic units, thus as pitch accents or as boundary tones.
This can only happen when tonal excursions are associated with text.
Moreover, tonal contours are more or less marked only when they are associated
with specific locations in a sentence, since accent locations are dependent on
syntax and information structure. We introduce ‘focus projection’ briefly in
the next section, but have no space to develop all arguments for this
phenomenon (see Selkirk 1995; Schwarzschild 1999; Féry and Samek-Lodovici
2006 among others). We propose the concept of ‘Unmarked Prosodic
Structure’ (UPS, Féry 2005) as the intonation used when the sentence is realized in
a whole-focused environment. It refers to the phrasing and the tonal contour
projected when the speakers have no clue about the context. Unmarked
Prosodic Structure relies solely on the syntactic structure. A tonal contour
compatible with unmarked prosody is expected to be acceptable in more
environments than other, more marked contours.

8.3 Previous studies on gradient tone perception


Few studies, if any, have explicitly addressed the gradience of intonational
contours, so we cannot base our work on a rich empirical basis. There are,
however, quite a number of studies investigating the question of categories in
intonational morphemes, which have found more or less gradient accents or
boundaries.2 The most relevant studies for our aim have looked at the
adequacy of pitch accent patterns in some specific contexts.
The issue of the location of pitch accents and their role for the focus
structure has been investigated for English by Gussenhoven (1983) and Birch
and Clifton (1995), among others, who examine the role of prenuclear accents
on the verb in a VP consisting of a verb plus an argument (or, in Gussenhoven's
case, an adjunct). Gussenhoven’s (1983) sentence accent assignment
rules (SAAR) predict that in a focused predicate argument complex, only the
argument needs to be stressed, but that a prenuclear accent can be added

2 Some have found categories in the domain of pitch accent realization; for example Pierrehumbert
and Steele (1989) or Ladd and Morton (1987).

freely on a verb without impairing processing. In a verbal phrase, by contrast,


both the verb and the adjunct need to be stressed. Gussenhoven himself finds
confirmation of this prediction in experimental work. In mini-dialogues such
as (8.1), there is a difference between the focus structure of the sentences
answering (8.1a) and (8.1b). In (8.1a), the whole VP share a flat is focused,
whereas in (8.1b) only the direct object is focused, the difference being elicited
by the preceding question. The same kind of contrast is obtained in the
dialogues in (8.2), which contain a verb followed by an adjunct.
(8.1) Verb and argument
a. C: Do you live by yourself?
b. C: I hate sharing things, don’t you?
c. U: I share a flat. (the whole VP or the argument
NP is focused)
(8.2) Verb and adjunct
a. C: Where will you be in January?
b. C: Where will you be skiing?
c. U: We will be skiing in Scotland. (the whole VP or the adjunct
PP is focused)
Gussenhoven cross-spliced questions and answers, spoken by native speakers,
so as to obtain both answers in both contexts. Subjects then had the task of
deciding which of the two answers was the more appropriate response to the
preceding question. Gussenhoven found that the presence of an accent on the
verb in addition to the expected accent on the object in (8.1) does not change
the acceptability of the pitch accent structure, and that this held in both
narrow and broad focused contexts. The speakers did not do better than by
chance when required to choose between the two contexts on the basis of such
an accent pattern. But in (8.2), the absence of a stress on the verb in (8.2a) was
an indicator that the verb had to be given (and thus not focused), so that the
speakers did better than in the predicate-argument condition in the same
task. The reliability of the accent on the verb in deciding for the wide-focus
context depended gradiently on the number of unstressed syllables
intervening between the two accents.
Birch and Clifton (1995) conducted similar experiments, but obtained
slightly different results. They also prepared matched and mismatched pairs
of questions and answers. An example of a dialogue set is reproduced in (8.3).
Only the pairs QA/R1 and QB/R3 match perfectly; all the others are predicted to
be more or less deviant along the same lines as those just explained, although
the authors acknowledge that QA/R2 could be as good as QA/R1 if SAAR
make the right predictions.

(8.3) a. Questions QA: Isn’t Kerry pretty smart?


QB: Isn’t Kerry good at math?
b. Responses R1: Yes, she TEACHES MATH.
R2: Yes, she teaches MATH.
R3: Yes, she TEACHES math.
In judgement and decision tasks, Birch and Clifton found that as an answer to
question QA, speakers prefer R1, with two accents, over R2, with just one
accent on the argument NP. The difference was small but significant. And
unsurprisingly, R3 was by far the preferred answer to QB. All other pairs
obtained poorer scores. In a second experiment, speakers had to decide how
well the pairs made sense. In this case, the results for QA were similar to those
of Gussenhoven: there was no difference between a sentence with two accents
(R1) and a sentence with just one accent on the argument (R2).3
These results, as well as other perception experiments bearing on the
location of pitch accents conducted for Dutch (Nooteboom and Kruyt 1987;
Krahmer and Swerts 2001) and for German (Hruska et al. 2001) show that, for
these three languages at least, a prenuclear accent is readily acceptable, but
that a postnuclear one is less easily accepted and that accents on narrowly
focused items in an otherwise non-nuclear position are more readily
perceived than accents on words accented by default in their unmarked accent
pattern.
Nooteboom and Kruyt (1987) rightly explain the acceptability of a prenuclear
accent in terms of topicalizing or thematicizing the bearer of such an
accent, and observe that a sentence with a supplementary prenuclear accent
can get an interpretation in which the prenuclear accent is information
structurally prominent.
In psycholinguistic experiments studying the role of prosody in
disambiguating syntactic structures (see for instance Lehiste 1973; Kjelgaard and
Speer 1999; Schafer et al. 2000), garden path sentences or sentences with an
ambiguous late or early closure/attachment have been tested. These
experiments deliver gradient results correlating with the strength and the location of
boundaries. Comparing the two realizations of the sentences in (8.4), there is
no doubt that intonation can disambiguate the readings. Example (8.4a) is
realized as one Intonation Phrase, but in (8.4b), an Intonation Phrase
boundary is located after heiratet, which is then understood as an intransitive verb.

3 Birch and Clifton’s results also indicate that a single accent on the verb is readily accepted in a
context eliciting broad focus (78 per cent ‘yes’ responses). The only situation where speakers accepted a
pair less often (with 54 per cent, rather than between 71 and 84 per cent as for the other pairs) was when
the context elicited a narrow focus on the verb and the answer had a single accent on the argument (QB/R2).

Much more subtle is the question of whether prosody can help with the
sentence in (8.5). In one reading, it is the woman who lives in Georgia, and
in the other reading, her daughter. The phrasing, in the form of a
Phonological Phrase boundary, is roughly the same in both readings. Nevertheless, it
is possible to vary the quantity and the excursion of the boundary tone in
such a way that the preference for one or the other reading is favoured.

(8.4) a.  L*H    L*H     H*L   LI
         [Maria heiratet Martin nicht]I
         ‘Mary does not marry Martin.’
      b.  L*H  H*L  LI    L*H  H*L  LI
         [Maria heiratet]I [Martin nicht]I
         ‘Mary gets married. Martin does not.’
(8.5) [[Ich treffe mich heute]P [mit der Tochter der Frau]P]I [[die in
Georgien lebt]P]I
‘I am meeting today with the daughter of the woman who lives in
Georgia.’

We are only marginally interested in syntactic disambiguation in this chapter.


Rather, our experiment aimed at testing the gradience of German
intonational structures. This experiment differs from the ones conducted by
Gussenhoven and by Birch and Clifton in a crucial way: several parameters
were systematically varied: sentence type, context, and tonal contours. We
were explicitly interested in finding out whether some kinds of intonation
patterns are more acceptable than others and whether gradience can be
observed in the domain of tonal contours.

8.4 Experiment
8.4.1 Background
The experiment reported in this section was intended to elucidate the
question formulated above: How gradient are tonal contours? We wanted to
understand what triggers broad acceptance for intonational patterns. To this
aim, we used three diVerent kinds of sentences, which were inserted in
diVerent discourse contexts, and cross-spliced. If an eVect was to be found,
we expected it to be of the following kind: the unmarked tonal contours
should be generally better tolerated than the marked ones.
The hypothesis can be formulated as in (8.6).

(8.6) Unmarked Prosodic Structure (UPS) Hypothesis


An unmarked prosodic structure, i.e. a prosodic structure adequate in a
broad focus environment, is readily accepted. It can be inserted
successfully in more environments than a marked prosodic structure,
which is appropriate in a restricted number of contexts only.
The topic-focus realization that we used in our experiment has the same contour
as a broad focus one. Both have a rising pitch accent on the subject, and a
falling accent on the focused word (the ‘focus exponent’). We chose a topic-
focus environment instead of a broad focus one because of the slightly clearer
accent pattern produced with a topic and a focus. Even though we did not
include a broad focus context in our experiment, we are confident that the
pattern we call TF would get high scores in it.

8.4.2 Material
Three different kinds of sentences served as our experimental material: six
short sentences, six long sentences, and three sentences with ambiguous scope
of negation and quantifier. Every sentence was inserted in three or four
matching contexts (see below). In (8.7) to (8.9), an example of each sentence
type is given along with its contexts. The remaining sentences are listed in the
appendix.

(8.7) Short sentences


Maler bringen immer Bilder mit.
Painters bring always pictures with
a. Narrow focus on the subject (NFS): Tom hat mir erzählt, dass
Fotografen unserer Nachbarin immer Bilder mitbringen. Aber das
stimmt nicht:
‘Tom told me that photographers always bring pictures to our
neighbour. But this is not true:’
b. Narrow focus on the object (NFO): Angeblich bringen Maler unserer
Nachbarin immer Bücher mit. Aber das stimmt nicht:
‘It is said that painters always bring books to our neighbour. But
this is not true:’
c. Topic-focus (TF): Meine Nachbarin schmeißt oft große Partys, dafür
bekommt sie aber auch viele Geschenke. Regisseure schenken ihr
Filme, Schriftsteller Bücher und . . .
‘My neighbour often throws big parties, and therefore she also gets
lots of presents. Movie directors give her movies, writers give her
books and . . .’

(8.8) Long sentences


Passagiere nach Rom nehmen meistens den späten Flug.4
Passengers to Rome take mostly the late flight
a. Narrow focus on the subject (NFS): Angeblich nehmen die Leute
nach Athen meistens den späten Flug. Aber das stimmt nicht:
‘It is said that the people (flying) to Athens mostly take the late
flight, but this is not true:’
b. Narrow focus on the object (NFO): Mona sagt, dass Passagiere nach
Rom meistens die frühe Maschine nehmen. Aber das stimmt nicht:
‘Mona says that passengers to Rome mostly take the early flight,
but this is not true:’
c. Topic-focus (TF): Pendler, die ziemlich weit von zuhause arbeiten,
haben oft ähnliche Angewohnheiten. Geschäftsleute Richtung Paris
fahren oft mit dem Auto, Reisende nach London nehmen den Zug
aus Calais und . . .
‘Commuters who work far away from home often have similar
habits. Business people who go to Paris often take their car,
travellers to London take the train from Calais and . . .’
(8.9) Quantifier-negation sentences
Beide Autos sind nicht beschädigt worden.
Both cars were not damaged
a. Two foci (‘two’): Es wäre schlimm gewesen, wenn Karl bei dem
Unwetter seinen Jaguar und seinen Porsche auf einmal verloren
hätte, aber glücklicherweise war es nicht so.
‘It would have been too bad if Charles had lost both his Jaguar and
his Porsche because of the bad weather, but fortunately this was
not the case.’
b. Narrow focus on the quantifier (FQ): Ist nur Peters Auto nicht
beschädigt worden? Nein, . . .
‘Has only Peter’s car not been damaged? No, . . .’
c. Narrow focus on the negation (FN): Ich habe gesehen, dass Deine
beiden Autos seit Wochen in der Garage stehen. Sind sie bei dem
Unfall beschädigt worden?—Nein, ich habe Dir doch schon gesagt:
‘I have seen that both your cars have been sitting in the garage for
ages. Were they damaged in the accident?—No, I already told you,
. . .’
4 As Ede Zimmermann (p.c.) observes, it is not undisputed whether there is a structural ambiguity
between the temporal and the quantificational reading of meistens. We suspect that, even if confirmed,
this ambiguity played no role in the experimental results.

d. Topic-focus (TF): Bei dem Unfall ist verschiedenes passiert. Drei


Fahrräder sind jetzt Schrott, ein Fußgänger ist im Krankenhaus, aber
bei den Autos, die dabei involviert waren, war es nicht dramatisch:
‘Several things happened at the accident. Three bikes are now
ruined, a pedestrian is at the hospital, but nothing dramatic
happened to the cars involved:’
Contexts and stimulus sentences were spoken by a trained speaker and recorded
in a sound-proof booth on a DAT recorder. The speaker was instructed to
speak naturally, at a normal tempo. He read the context–target pairs at once,
first the context and then the stimulus sentence. There were 48 matching
pairs for the three experiments altogether (six short sentences, six long
sentences, and three quantifier-negation sentences in their contexts, thus 18
+ 18 + 12 pairs). All pitch accents of a specific type were realized similarly (see
Figures 8.1 to 8.3 for illustrations), and controlled carefully with the help of the
speech analysis program PRAAT. Several recording sessions were necessary.
The sentences were evaluated by three independent trained phonologists as to
their naturalness. Context sentences and stimulus sentences were digitized
into individual sound files, ready to be cross-spliced. No manipulation
whatever was undertaken, in order not to endanger the naturalness. We
prepared 36 non-matching pairs for the short sentences, 36 for the long
sentences, and 32 for the scope sentences, a total of 104 non-matching pairs.
The sentences to be evaluated thus consisted of 48 matching and 104
non-matching pairs, an overall total of 152 pairs.

8.4.3 Subjects
Four non-overlapping groups of fifteen subjects (altogether sixty students at
the University of Potsdam) took part in the experiment. They were native
speakers of Standard German and had no known hearing or speech deficit. All
were paid or acquired credit points for their participation in the experiment.
Two groups judged the sentences on a scale of 1 (very bad) to 8 (perfect), and
two groups judged the same sentences in a categorical way: acceptable (yes) or
non-acceptable (no). All sixty informants evaluated the scope sentences. In
addition, the Wrst and third groups also judged the short sentences, while the
second and fourth groups judged the long sentences, thus thirty matching
sentences plus sixty-eight non-matching ones each.

8.4.4 Procedure
The subjects sat in a quiet room in front of a presentation created with the
DMDX experiment-generator software developed by K. and J. Forster at the
University of Arizona. The experimenter left the subject alone in the room
after brief initial instructions on beginning and ending the session. The
subjects worked through the DMDX presentation in a self-paced manner. It
led them through a set of worded instructions, practice utterances, and finally
the experiment itself, consisting of 102 target sentences. No fillers were
inserted, but three practice sentences started the experiment. This experiment
was itself included in a set of experiments in which the subjects performed
different tasks: production of read material, and dialogues. The instructions
made it clear that the aim of the experiment was to test the intonation and
stress structure of the sentences, and not their meaning or syntax. The stimuli
were presented auditorily only: pairs of context and stimulus sentences were
presented sequentially. The subject heard first a context and, after hitting the
return key, the test sentence. The task consisted of judging the adequacy of
the intonation of the sentence in the given context. Every recorded sentence
of the groups of short and long sentences was presented nine times, in three
different intonational and stress patterns, and each of these patterns in
three different contexts. The scope sentences were presented sixteen times
each, in all possible variants.
The sentences were presented in a different randomized order for each
subject. The set-up and the instructions included the option of repeating the
context–stimulus pair for a particular sentence. Most subjects made occasional
use of this possibility. Only the last repetition was included in the
calculation of the reaction time (see Section 8.4.9).

8.4.5 Short and long sentences


There were six short sentences like the one illustrated in (8.7), consisting of
a simple subject (an animate noun in the plural), a verb (mitbringen 'bring'), an
adverb (immer 'always'), and a simple object (an inanimate noun in the plural).
The separable but unstressed particle mit was located at the end of the
sentence, resulting in a non-final object. The sentences were inserted in
three different contexts inducing the following information structures:
narrow corrective focus on the subject (NFS), see Figure 8.2; narrow
corrective focus on the object (NFO), see Figure 8.3; and topic-focus (TF),
the unmarked prosodic structure, see Figure 8.1. The sentences with narrow
focus were elicited by replacing a pre-mentioned element with another one.
Our decision to use a corrective narrow focus was driven by the intention
to have a very clear accentual structure. A topic-focus was elicited by
pre-mentioning some pairs of elements with the same structure as the
tested sentence.
[Pitch track (50–250 Hz) of Ruderer bringen immer BOOTE mit, annotated with a
rising accent (L*H) on the subject, a falling accent (H*L) on the object, and a
final low boundary tone (Li); total duration about 1.75 s]
Figure 8.3. Pitch track of Ruderer bringen immer Boote mit

Figure 8.3 displays a narrow focus on the object. The subject Ruderer
has a rising prenuclear pitch accent with a much smaller excursion than
in the unmarked topic-focus configuration. The object Boote carries the
high-pitched nuclear accent.

8.4.6 Results and discussion


Table 8.1 displays the data for the first group of subjects, who had to give scalar
judgements. Each cell shows the mean score of the six sentences having the
same context–intonation pair. The second group of subjects judged the same
sentences in a categorical way, and the mean scores for these subjects are given
in Table 8.2. The correlation between the mean scores in Tables 8.1 and 8.2 is

Table 8.1. Short sentences: mean judgement scores (on a scale from 1 to 8)

Context/intonation    NFS    NFO    TF
NFS                   7.7    1.5    2.0
NFO                   2.0    7.2    5.9
TF                    2.0    3.7    6.8
All contexts          3.9    4.1    4.9
Table 8.2. Short sentences: mean judgement scores (categorical)

Context/intonation    NFS     NFO     TF
NFS                   0.92    0.18    0.11
NFO                   0.22    0.89    0.66
TF                    0.07    0.32    0.87
All contexts          0.40    0.46    0.54

[Line graph: mean judgement scores (scale 1–8) for the three intonation
patterns (NFS, NFO, TF), with one line per context (NFS, NFO, TF)]
Figure 8.4. Mean acceptability scores for short sentences (scale answers)

almost perfect (Pearson's product-moment correlation = 0.984, p < 0.001).
The interaction between context and intonation is displayed graphically in
Figure 8.4. It presents the results of only the first group (i.e. scale answers), but
a graph of the second group would look very similar due to the strong
association between the two groups.
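The near-perfect association between the scalar and the categorical means can be checked directly from the nine context–intonation cell means of Tables 8.1 and 8.2 (excluding the 'All contexts' row). The following Python sketch uses only the rounded values as published, so the result comes out marginally different from the 0.984 computed from the raw data:

```python
# Cell means read off Tables 8.1 (scalar, 1-8) and 8.2 (categorical,
# proportion of "yes" answers): rows NFS/NFO/TF crossed with the three
# intonation patterns NFS/NFO/TF.
scalar      = [7.7, 1.5, 2.0,   2.0, 7.2, 5.9,   2.0, 3.7, 6.8]
categorical = [0.92, 0.18, 0.11,   0.22, 0.89, 0.66,   0.07, 0.32, 0.87]

def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(scalar, categorical)  # about 0.98 with the rounded cell means
```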
All patterns were accepted best in their own matching context. The unmarked
TF tonal contour, corresponding to the UPS, was also readily accepted
in the NFO context, a result corresponding to our expectations. NFO had one
pitch accent on the object and a reduced prenuclear accent on the subject. It
thus looked more like the TF (the realization of the UPS) than the NFS with
only one pitch accent on the subject. NFO got intermediate scores in the TF
context. The slight inadequacy that our informants felt can safely be attributed
to the lack of a topical accent on the subject. By contrast, NFS was accepted in
its matching context, but rejected in non-matching contexts.
Gradient judgements were obtained in two different ways: either directly,
by letting the informants give their own gradient responses, or indirectly, by
counting categorical responses. The very high correlation between the two
groups of means suggests that it does not matter which method is used, as
both give very similar results. As will be shown, this correlation recurred
for all sentence types.
In the six longer sentences, one of which is illustrated in (8.8), the subject
and the object were syntactically more complex. We decided to include both
short and long sentences in our experiment in order to verify the influence of
length and complexity on the perception of tonal patterns. The distinction
between the two kinds of sentences, however, turned out to be minimal, as
one can see from a comparison between Figure 8.4 and Figure 8.5.
The only difference between these sentences and the short ones worth
mentioning is that in the TF context, both NFS and NFO were now better
tolerated. We do not have any explanation for the slightly better acceptance of
the absence of a late accent in a TF context. As an explanation for the better
acceptance of NFO in the TF context, we suggest that the difference between
weak and strong prenuclear accents may be harder to perceive when the
sentence is longer.
Here, too, a very high correlation between the two groups of subjects was
found, suggesting once more that scalar and categorical methods are
equally good for obtaining gradient judgements.

[Line graph: mean judgement scores (scale 1–8) for the three intonation
patterns (NFS, NFO, TF), with one line per context (NFS, NFO, TF)]
Figure 8.5. Mean acceptability scores for long sentences (scale answers)
Let us now relate our findings to those described in Section 8.2. First, the
scores for matching context–intonation pairs were higher than for non-matching
pairs. Second, a missing nuclear accent and an added nuclear accent
triggered lower scores than the expected accentuation did. The
same was true for both a missing prenuclear accent and an added prenuclear
accent. As described by Hruska et al. (2001), adding a prenuclear accent on the
subject in a situation where only a nuclear accent on the object is expected
obtained higher scores than other non-matching pairs. In the same way,
Gussenhoven, as well as Clifton and Birch, found that an added prenuclear
accent is judged better than an added nuclear accent.

8.4.7 Scope sentences


The sentences in the third experiment, one of which is illustrated in (8.9),
consist of a subject made up of a quantifier and a noun, an auxiliary, the
negation nicht, and a past participle or an adjective (below called 'the predicate'),
and are characterized by variable scope of the negation and of the
quantifier. Four contexts were constructed, as illustrated in (8.9): first, a
context eliciting two accents, one on the quantifier and one on the negation
(called 'two' in the following); the second context elicits a narrow focus on
the quantifier (FQ); the third context a narrow focus on the negation (FN);
and the last context was a topic-focus one, eliciting two accents again, one on
the quantifier, as in 'two', and the second one on the predicate (TF). All four
contours are illustrated for example (8.9) in Figure 8.6.
The syntactic structure of the sentences in this experiment is simple, but
their semantic structure is not. First, the negation can have scope over the
quantifier or, vice versa, the quantifier can have scope over the negation. In
the experiment, one context called unambiguously for wide scope of the
negation ('not both cars . . .'), and one unambiguously for wide scope of
the quantifier ('for both cars, it is not the case that . . .'). The first case ('two'
context in (8.9)) is triggered by double accentuation on the quantifier and the
negation, and the second case (FQ context in (8.9)) comes with a single accent
on the quantifier.5 It is assumed here that the scope inversion reading elicited
by the 'two' context can be explained by general properties of topicalization,
visible in languages with resumptive pronouns. The topicalized quantifier in
the sentences under consideration is in a position of extraposition to the left,

5 As a generalization, the negation may have wider scope when both the quantifier and the negation
(or the negated constituent) are accented. This generalization holds only for this type of construction,
but not for other sentences with inverted scope, such as those with two quantifiers discussed in Krifka
(1998).
[Four pitch tracks (50–250 Hz, roughly 2 s each) of Beide Autos sind nicht
beschädigt worden, annotated with L*H, H*L, Hp, and Li tones: (a) context
'two', with accents on BEIDE and NICHT; (b) context 'FQ', with a single
accent on BEIDE; (c) context 'FN', with a single accent on NICHT;
(d) context 'TF', with accents on BEIDE and BESCHÄDIGT]
Figure 8.6. Four realizations of (8.9)

but is nevertheless interpreted to be in the scope of the negation (see also
Höhle 1991). All authors who have studied the scope inversion phenomenon
in German (Höhle 1991; Jacobs 1997; Büring 1997; Krifka 1998) have insisted
on the necessity of a rise–fall contour to obtain the intended interpretation, and
this is the contour which was produced by our speaker as well. Crucially, an
independent phonological phrase is formed which contains the topicalized
constituent, separate from the main clause. In a realization with only one
accent on the quantifier, by contrast, both the quantifier and the negation are
interpreted in situ and, consequently, the quantifier has wide scope over the
negation.6 Prosodically, the quantifier cannot be interpreted as being topicalized
because it carries the focal accent of the sentence. In our experiment, the
context eliciting this accent pattern was one in which the quantifier was
contrastively accented.
The other two patterns, a single accent on the negation (FN) and a double
accent on the quantifier and on the predicate (TF), do not evoke clear scopal
relationships. A unique accent on the negation contradicts the preceding
sentence. In the experimental sentences, the predicate had been stressed in
the preceding matching context. However, it was not possible to unambiguously
reconstruct the context from the negated sentence alone. An accent on
the quantifier, the noun, or the predicate changes the pragmatics of the
sentence, but in the realization with a single accent on the negation, these
differences are cancelled. The hypothesis was thus that an accent on the
negation would be tolerated in a variety of contexts.
The TF context, with accents both on the NP containing the quantifier and
on the predicate, is similar to the 'two' context. It can also have different
readings, one being that the predicate is contrasted. Inverted scope is not
impossible in this case either.
To sum up, a realization with a single accent (especially when the accent is
on the quantifier) seems to be more marked than a realization with two
accents, in the sense that it is adequate in fewer contexts. With the third
experiment, we wanted to verify this hypothesis.

8.4.8 Results and discussion


Tables 8.3 and 8.4 as well as Figure 8.7 present the mean values for both scalar
and categorical judgements. Once again, the correlation between the two
groups of means is almost perfect (Pearson's product-moment correlation
= 0.973, p < 0.001).
The results are not as clear-cut as for the short and long sentences. For the
'two', FN, and FQ contexts, the matching pairs obtained better scores
than the other ones. It is also noticeable that the TF and 'two' contexts are
nearly interchangeable. This can be attributed to the presence of two accents

6 Krifka (1998) explains scope inversion of sentences with two quantifiers by allowing movement of
accented constituents in the syntactic component of the grammar. Both topicalized and focused
constituents have to be pre-verbal at some stage of the derivation in order to get stress.
Table 8.3. Scope sentences: mean judgement scores (on a scale from 1 to 8)

Context/intonation    two    FQ     FN     TF
two                   6.1    3.6    5.1    6.1
FQ                    3.7    7.0    3.2    3.4
FN                    5.4    3.1    6.5    5.3
TF                    5.4    3.6    4.7    5.8
All contexts          5.1    4.3    4.9    5.1

Table 8.4. Scope sentences: mean judgement scores (categorical)

Context/intonation    two     FQ      FN      TF
two                   0.73    0.27    0.64    0.71
FQ                    0.32    0.90    0.26    0.36
FN                    0.62    0.18    0.90    0.57
TF                    0.76    0.39    0.54    0.72
All contexts          0.61    0.43    0.59    0.59

[Line graph: mean judgement scores (scale 1–8) for the four intonation
patterns (two, FQ, FN, TF), with one line per context (two, FQ, FN, TF)]
Figure 8.7. Mean acceptability scores for scope sentences (scale answers)
in both sentences, fitting both contexts requiring two accents. The same
cannot be said for the realizations with one accent, since the accent elicited
in each case is in a different place. However, the FN sentences, with a late
accent, elicited better scores in a non-matching environment than the FQ
sentences, with an early accent. The highly marked prosodic pattern found in
the FQ sentences obtained poor scores in all non-matching contexts, and the best
results in the matching context.
To sum up the results obtained for these sentences, it can be observed that
the interchangeability of contexts and intonation patterns is higher in these
sentences than in the short and long sentences. We explain this pattern of
acceptability by the fact that the scope structure of these sentences, complex
and subject to different interpretations, renders the accent patterns less rigid.
Another interpretation could be that listeners were concentrating on
understanding the scopal relationships and were thus less sensitive to slight
variations in the tonal structure of the sentences they heard.

8.4.9 Reaction times


Additional information on the cognitive cost of the task was gathered by
measuring reaction times. Table 8.5 shows that it took more time to process
the long sentences and the scope sentences than the short ones. It can also be
observed that making a decision on a scale takes more time than making a
categorical decision (except for the long sentences, where no difference could
be observed). We could not find any correlation between the number of keys
available for responding and the reaction times, neither in the scalar decision
task when comparing the subjects who used all keys with those using only four
to six keys (out of the eight at their disposal), nor between the two tasks in
comparison. In other words, it is not the case that using eight keys instead of
two increases the time it takes to make a decision. We conclude that the
increase in reaction time that we observe is truly due to an increase in
cognitive complexity.

Table 8.5. Mean reaction times

              Short sentences        Long sentences         Scope sentences
Scale         4.2 s                  5.0 s                  5.0 s
              (sd = 2.24; N = 810)   (sd = 2.36; N = 810)   (sd = 2.83; N = 1,440)
Categorical   3.7 s                  5.0 s                  4.7 s
              (sd = 2.22; N = 810)   (sd = 2.27; N = 810)   (sd = 2.80; N = 1,440)
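For illustration, the scale/categorical difference for the short sentences can be roughly assessed from the summary statistics in Table 8.5 alone, using Welch's t statistic. This is only a sketch: the chapter reports no such test, and treating the 810 trials per cell as independent observations ignores their nesting within subjects, so the effective sample size is overstated.

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic for two independent groups, from summary statistics."""
    return (m1 - m2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Short sentences (Table 8.5): scale 4.2 s (sd 2.24, N 810)
# versus categorical 3.7 s (sd 2.22, N 810).
t = welch_t(4.2, 2.24, 810, 3.7, 2.22, 810)  # about 4.5
```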
8.5 Conclusion
This chapter has investigated the gradient nature of the acceptability of
intonation patterns in German declarative sentences. Three kinds of sentences
elicited in different information-structural contexts were cross-spliced, and
informants were asked to judge the acceptability of context–target pairs. The
clearest results were obtained for the short sentences, although the long
sentences delivered comparable results. Finally, the tonal patterns of the scope
sentences were much more difficult to interpret, because the scope behaviour
of the negation and the quantifier was variable, depending on the accent
structure of these sentences. For all sentences, we found that a prosody with
two accents got better scores than a prosody with only one accent, and that a
contour with a late accent was better accepted in non-matching environments.
We dubbed the prosody with two accents, acceptable in a broad-focus
context or in a topic-focus context, the UPS, for 'unmarked prosodic structure',
and we observe that this contour is accepted in a non-matching context more
readily than contours with only one accent, especially when this single accent
is located early in the sentence.
The results of the short and long sentences, and, to a lesser extent, those of
the scope sentences, point to a good correlation between context and prosodic
structure. Speakers and hearers do use prosodic information, such as the presence
versus absence of pitch accents, their form, and the phrasing, to assess the
well-formedness of context–target sentence pairs, and they do so consistently.
Their performance improves when the syntactic and semantic structure
of the sentence is very simple. It can safely be claimed that in German,
information structure plays an important role in the processing of prosody,
whereas it has been shown for syntax that word order alone, presented in
written form, does not have the same effect (see for instance Schlesewsky,
Bornkessel, and McElree, this volume, and the references cited there). The
conclusion one could tentatively draw from this difference is that intonation
encodes information structure better than syntax does.
An interesting result is that in all three experiments the scores obtained for
the two groups of subjects (scale and yes–no answers) were similar. In other
words, the same gradient results can be obtained by using either gradient or
non-gradient judgements. This is remarkable, since the cognitive task executed
by the two groups was different. It could have been the case that, for a sentence
with a high acceptability score, the scale ratings would have been gradient
but the yes–no judgements categorical. However, if the groups of informants
are large enough, 'intolerant' subjects compensate for the degree of insecurity
that remains in subjects asked to give a judgement on a scale.
Although we offer no analysis of how our gradient data can be accounted
for in a formal grammar, we conclude with the observation that a categorical
grammar will not be adequate. Speakers are more or less confident in their
judgements, and gradiently accept sentences intended to express a different
information structure, depending on whether the sentences have a similar
accent pattern. A gradient grammar, like stochastic OT, which uses overlapping
constraints, can account much better for the observed variability. This is,
however, a subject for future research.

8.6 Appendix
Short sentences (three contexts)
1. Maler bringen immer Bilder mit. 'Painters always bring pictures.'
2. Lehrer bringen immer Hefte mit. 'Teachers always bring notebooks.'
3. Sänger bringen immer Trommeln mit. 'Singers always bring drums.'
4. Ruderer bringen immer Boote mit. 'Oarsmen always bring boats.'
5. Geiger bringen immer Platten mit. 'Violinists always bring records.'
6. Schüler bringen immer Stifte mit. 'Students always bring pens.'
Long sentences (three contexts)
7. Passagiere nach Rom nehmen meistens den späten Flug.
'Passengers to Rome usually take the late flight.'
8. Reisende nach Mailand fahren oft mit dem schnellen Bus.
'Travelers to Milan often travel with the express bus.'
9. Autofahrer nach Griechenland nehmen immer den kürzesten Weg.
'Car drivers to Greece always take the shortest road.'
10. Schiffe nach Sardinien fahren meistens mit voller Ladung.
'Ships to Sardinia mostly sail with a full cargo.'
11. Züge nach England fahren oft mit rasantem Tempo.
'Trains to England often ride at full speed.'
12. Trekker nach Katmandu reisen meistens mit vollem Rucksack.
'Trekkers to Katmandu mostly travel with a full backpack.'
Variable scope sentences (four contexts)
13. Alle Generäle sind nicht loyal. 'All generals are not loyal.'
14. Beide Autos sind nicht beschädigt worden. 'Both cars have not been
damaged.'
15. Viele Gäste sind nicht gekommen. 'Many guests did not come.'
9

Prototypicality Judgements as
Inverted Perception

PAUL BOERSMA

In recent work (Boersma and Hayes 2001), Stochastic Optimality Theory has
been used to model grammaticality judgements in exactly the same way as
corpus frequencies are modelled, namely as the result of noisy evaluation of
constraints ranked along a continuous scale. It has been observed, however, that
grammaticality judgements do not necessarily reflect relative corpus
frequencies: it is possible that structure A is judged as more grammatical than a
competing structure B, whereas at the same time structure B occurs more often
in actual language data than structure A. The present chapter addresses one of
these observations, namely the finding that 'ideal' forms found in experiments
on prototypicality judgements often turn out to be peripheral within the corpus
distribution of their grammatical category (Johnson, Flemming, and Wright
1993). At first sight one might expect that Stochastic Optimality Theory
will have trouble handling such observed discrepancies. The present chapter,
however, shows that a bidirectional model of phonetic perception and production
(Boersma 2005) solves the paradox. In that model, corpus frequency
reflects the production process, whereas prototypicality judgements naturally
derive from a simpler process, namely the inverted perception process.
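The noisy-evaluation mechanism referred to here can be sketched in a few lines of Python. The constraint names, ranking values, and candidate violation profiles below are invented for illustration; the evaluation noise (a Gaussian with standard deviation 2.0, the value used by Boersma and Hayes) occasionally reverses the ranking of two closely ranked constraints, so that the usually losing candidate surfaces in a minority of evaluations, yielding a gradient output distribution:

```python
import random

def stochastic_eval(ranking_values, candidates, noise_sd=2.0):
    """One evaluation in Stochastic OT: add Gaussian noise to every
    constraint's ranking value, rank the constraints by the resulting
    selection points, and let strict domination pick the winner."""
    order = sorted(ranking_values,
                   key=lambda c: ranking_values[c] + random.gauss(0, noise_sd),
                   reverse=True)
    pool = list(candidates)
    for constraint in order:
        fewest = min(candidates[cand][constraint] for cand in pool)
        pool = [cand for cand in pool if candidates[cand][constraint] == fewest]
        if len(pool) == 1:
            break
    return pool[0]

# Two hypothetical constraints only three ranking units apart: candidate A
# usually wins, but B surfaces in roughly 15 per cent of evaluations.
ranking = {"C1": 100.0, "C2": 97.0}
cands = {"A": {"C1": 0, "C2": 1}, "B": {"C1": 1, "C2": 0}}
wins_b = sum(stochastic_eval(ranking, cands) == "B" for _ in range(10000))
```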

9.1 The /i/ prototype effect: prototypes are peripheral

A notorious example of the difference between grammaticality judgements
and corpus frequencies is the '/i/ prototype effect' in phonology: if the
experimenter asks a subject to choose the most /i/-like vowel from among a
set of tokens that vary in their spectral properties, the subject tends to choose
a very peripheral token, i.e. one with a very low first formant (e.g. 250 Hz)
and a very high second formant (Johnson et al. 1993; Frieda et al. 2000). In
actual speech, less extreme formant values (e.g. an F1 of 300 Hz) are much
more common. Apparently, then, the token that the subject prefers is much
more /i/-like than the average realization of the vowel /i/ is.
9.1.1 Why the /i/ prototype effect is a problem for linguistic models
The /i/ prototype effect has consequences for models of phonological grammar.
The commonly assumed three-level grammar model, for instance,
has trouble accounting for it. In this model, the phonology module maps
an abstract underlying form (UF), for instance the lexical vowel |i|, to an
equally discrete abstract surface form (SF), for instance /i/, and the phonetic
implementation module subsequently maps this phonological SF to a continuous
overt phonetic form (OF), which has auditory correlates, such as a
value of the first formant, and articulatory correlates, such as a certain tongue
height and shape. Such a grammar model can thus be abbreviated as
UF→SF→OF.
The experimental prototypicality judgement task described above involves
a mapping from the phonological surface form /i/ to an overt auditory first
formant value, that is, an SF→OF mapping. In the three-level grammar
model, therefore, the natural way to account for this task is to assume that
it shares the SF→OF mapping with the phonetic implementation process. If
so, corpus frequencies (which result from phonetic implementation)
should be the same as grammaticality judgements (whose best result is the
prototype). Given that the two turn out to be different, Johnson et al.
(1993) found the UF→SF→OF model wanting and proposed the model
UF→SF→HyperOF→OF, where the additional intermediate representation
HyperOF is a 'hyperarticulated' phonetic target. The prototypicality task, then,
was proposed to tap HyperOF, whereas corpus frequencies reflect OF. The
present paper shows, however, that if one distinguishes between articulatory
and auditory representations at OF, the two tasks (production and prototypicality)
involve different mappings, and the /i/ prototype effect arises
automatically without invoking the additional machinery of an extra intermediate
representation and an extra processing stratum.

9.2 A bidirectional constraint-based explanation of the /i/ prototype effect

This section presents a simple constraint-based model of the phonological
grammar and of five phonological processes that are defined on this grammar.
The account leads to an informal explanation for the /i/ prototype effect.

9.2.1 A grammar model with two phonological and two phonetic representations
The grammar model presented in Figure 9.1 is the Optimality-Theoretic
model of ‘phonology and phonetics in parallel’ (Boersma 2005).
Figure 9.1 shows the four relevant representations and their connections. There
are two separate phonetic forms: the auditory form (AudF) appears because it is
the input to comprehension, and the articulatory form (ArtF) appears because it
is the output of production. ArtF occurs below AudF because 9-month-old
children can perceive sounds that they have no idea how to produce (for an
overview, see Jusczyk 1997); at this age, therefore, there has to be a connection
from AudF to SF (or even to UF, once the lexicon starts to be built up) that cannot
pass through ArtF; Figure 9.1 generalizes this for speakers of any age.

9.2.2 Linguistic processes are defined on the grammar

Figure 9.1 is not a processing model. Rather, linguistic and paralinguistic tasks
have to be defined as processes that travel the representations in Figure 9.1 and
are evaluated by the constraints that are visited on the way. Normal language
use consists of two linguistic tasks: that of the listener (comprehension) and
that of the speaker (production). This section describes the implementation of
these two linguistic tasks; three paralinguistic tasks (including the prototypicality
task), which can be regarded as simplified versions of the linguistic
tasks, are described in the next section.
Boersma (2005) proposes that in the model of Figure 9.1 the linguistic
task of comprehension is implemented as two consecutive mappings (cf.
McQueen and Cutler 1997), as shown on the left in Figure 9.2.
The first mapping in comprehension is perception (also called prelexical
perception or phonetic parsing). In general, perception is the process that maps
continuous sensory information onto a more abstract mental representation.
In phonology, perception is the process that maps continuous auditory (and
sometimes visual) information, that is AudF, onto a discrete phonological
surface representation, that is SF. The shortest route from AudF to SF in
Figure 9.1 determines what constraints evaluate this mapping: the relation
between input (AudF) and output (SF) is evaluated with cue constraints
[Diagram: four representations in a vertical chain, each evaluated by its own
constraint family, with further families on the connections between levels:
  UF (lexical constraints)
    | faithfulness constraints
  SF (structural constraints)
    | cue constraints
  AudF (auditory constraints?)
    | sensorimotor constraints
  ArtF (articulatory constraints)]
Figure 9.1. The grammar model underlying bidirectional phonology and phonetics
(Escudero and Boersma 2003, 2004), and the output (SF) is evaluated with the
familiar structural constraints known since the earliest Optimality Theory
(OT; Prince and Smolensky 1993). This AudF→SF mapping is language-specific,
and several aspects of it have been modelled in OT: categorizing
auditory features into phonemes and autosegments (Boersma 1997, 1998a et
seq.; Escudero and Boersma 2003, 2004), and building metrical foot structure
(Tesar 1997; Tesar and Smolensky 2000; Apoussidou and Boersma 2004).
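The strict-domination evaluation of such an AudF→SF mapping can be sketched as follows. The two cue constraints, their violation profiles, and their ranking are hypothetical, chosen only to show the mechanism for a token with an F1 of 360 Hz (an example used later in this chapter); "/I/" here stands in for the lax vowel, and with the opposite ranking the same token would be categorized as /i/.

```python
# Hypothetical cue constraints of the form "an F1 of 360 Hz should not be
# perceived as category X", ranked from high to low, plus the violations
# each candidate surface form incurs on them.
constraints = ["*[F1=360]/i/", "*[F1=360]/I/"]
violations = {
    "/i/": {"*[F1=360]/i/": 1, "*[F1=360]/I/": 0},
    "/I/": {"*[F1=360]/i/": 0, "*[F1=360]/I/": 1},
}

def perceive(constraints, violations):
    """Strict domination: the highest-ranked constraint that distinguishes
    the remaining candidates decides the perceived surface form."""
    pool = list(violations)
    for c in constraints:
        fewest = min(violations[cand][c] for cand in pool)
        pool = [cand for cand in pool if violations[cand][c] == fewest]
        if len(pool) == 1:
            break
    return pool[0]

winner = perceive(constraints, violations)  # "/I/" under this ranking
```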
The second mapping in comprehension, shown at the top left in Figure 9.2,
is that from SF to UF, and can be called recognition, word recognition, or lexical
access. In this mapping, the relation between input (SF) and output (UF) is
evaluated with faithfulness constraints such as those familiar from two-level
OT (McCarthy and Prince 1995), and the output (UF) is evaluated with lexical
access constraints (Boersma 2001).
Boersma (2005) proposes that, in contradistinction to comprehension, production
(shown at the right in Figure 9.2) consists of one single mapping from
UF to ArtF, without stopping at any intermediate form as is done in comprehension.
In travelling from UF to ArtF, the two representations SF and AudF
are necessarily visited, so the production process must evaluate triplets of { SF,
AudF, ArtF } in parallel. As can be seen from Figure 9.1, the evaluation of these
triplets must be done with faithfulness constraints, structural constraints, cue
constraints, sensorimotor constraints (which express the speaker's knowledge
of how to pronounce a target auditory form, and of what a given articulation
will sound like), and articulatory constraints (which express minimization of
effort; see Boersma 1998a; Kirchner 1998). According to Boersma (2005), the
point of regarding phonological and phonetic production as parallel processes
is that this can explain how discrete phonological decisions at SF can be
influenced by gradient phonetic considerations such as salient auditory cues
at AudF (e.g. Steriade 1995) and articulatory effort at ArtF (e.g. Kirchner 1998).

[Diagram: two panels showing the representations UF, SF, AudF, and ArtF. In
the comprehension panel, arrows run upward from AudF via SF to UF; in the
production panel, a single arrow runs from UF down through SF and AudF
to ArtF]
Figure 9.2. The linguistic task of the listener, and that of the speaker
[Diagram: three panels over the representations UF, SF, AudF, and ArtF:
phoneme categorization maps AudF to SF; prototypicality maps SF to AudF;
phoneme production maps SF via AudF to ArtF]
Figure 9.3. Three laboratory tasks

9.2.3 Experimental tasks are paralinguistic processes


Experimental tasks in the phonetics laboratory are often designed to reXect
only a part of one of the linguistic processes shown in Figure 9.2. The present
section addresses the three tasks that are relevant for explaining the /i/
prototype eVect, namely the phoneme categorization task, the phoneme pro-
duction task, and the phoneme prototypicality task.
In the experimental task of phoneme categorization the participant is asked
to classify a given stimulus as one of the phonemes of her language, for instance
to classify a synthetic vowel with a known F1 of 360 Hz as either the vowel /i/ or
the vowel /ɪ/. Such an experiment tends to be set up in such a way that the
influence of the lexicon is minimized, for instance by presenting the response
categories as semantically empty vowel symbols (e.g. "a", "e", "i", "o", "u" for
Spanish listeners) or as equally accessible lexical items (e.g. "ship" and "sheep"
for English listeners).1 In the former case, UF may not be accessed at all; in the
latter case, the SF→UF mapping may be equally easy for both categories; in
both cases, the influence of the lexicon may be ignored, so that the task can be
abbreviated as in Figure 9.3 (left). The only constraints that are relevant for this
mapping are the cue constraints and the structural constraints.
In the experimental task of phoneme production the participant is asked to
pronounce either a nonsense word or a word with "no" phonology, that is
where SF is identical to UF (no faithfulness violations), such as English hid or
heed. In both cases, the influence of the lexicon can again be ignored, so the
task can be abbreviated as in Figure 9.3 (right). The relevant constraints will

1 Unless the very point of the experiment is to investigate the influence of the lexicon on prelexical
perception. By the way, if such influences turn out to exist (e.g. Ganong 1980; Samuel 1981), the
comprehension model in Figure 9.2 will have to be modified in such a way that perception and
recognition work in parallel. In that case, however, the phoneme categorization task will still look like
that in Figure 9.3, and the results of the present paper will still be valid.
172 Gradience in Phonology

be the cue constraints, the sensorimotor constraints, and the articulatory
constraints.
In the experimental task of prototypicality judgements the participant is
given an SF, as in the phoneme production task, and asked to choose an
AudF, similar to those in the phoneme categorization task. Since this task
involves neither the lexicon nor any actual articulation, it can be abbreviated
as Figure 9.3 (middle). The only relevant constraints are the cue constraints (if
auditory constraints do not exist).

9.2.4 The informal explanation


The fact that the prototypicality task yields a different result than the phoneme
production task can be attributed to the difference between the relevant
two processes in Figure 9.3: in the production task, constraints on articulatory
effort do play a role, in the prototypicality task they do not. This is a robust
effect that seems to withstand conscious manipulation: even if listeners are
asked to choose the auditory form that they would say themselves, they
respond with the prototype, not with the form they would really produce
(Johnson et al. 1993).
The result of the involvement of articulatory constraints in the phoneme
production task is that peripheral tokens of /i/ may be ruled out because they
are too effortful, for example because they require too much articulatory
precision, whereas tokens closer to the easiest vowel articulation, perhaps [ə],
do not violate any high-ranked articulatory constraints.

9.3 A formalization in Optimality Theory


While the explanation presented informally in Section 9.2.4 would work for any
constraint-based theory of bidirectional phonology and phonetics in parallel,
this chapter formally shows that it works for the particular constraint-based
framework of Stochastic Optimality Theory. The point of this exercise is not
only to provide a rigorous illustrative example, so as to achieve descriptive
adequacy, but also to propose an explanation of the acquisition of the relevant
part of the grammar in terms of an initial state and a learning path, so as to
achieve explanatory adequacy and to show that the resulting grammar is stable.

9.3.1 Formalizing phoneme categorization and its acquisition


As seen in Figure 9.3 (left), phoneme categorization can be seen as involving
prelexical perception only, that is as a mapping from an auditory form to a
phonological surface form. For the case of the /i/ prototype effect, it is relevant

Figure 9.4. Production distributions of three vowels (probability density of /i/, /e/, and /a/ plotted against F1 from 100 to 900 Hz)

to look at auditory events that are likely to be perceived as /i/ or as one of its
neighbours in the vowel space, such as /ɪ/ or /e/. Thus, the auditory form
(AudF) is a combination of an F1 and an F2 value, and the surface form (SF) is a
vowel segment such as /i/ or /e/. This section shows how the AudF→SF
mapping is handled with an Optimality-Theoretic grammar that contains
cue constraints, which evaluate the relation between AudF and SF, and struc-
tural constraints, which evaluate the output representation SF.
For simplicity I discuss the example of a language with three vowels, /a/, /e/,
and /i/, in which the only auditory distinction between these vowels lies in
their F1 values. Suppose that the speakers realize these three vowels most often
with F1 values of 700 Hz, 500 Hz, and 300 Hz, respectively, but that they also
vary in their realizations. If this variation can be modelled with Gaussian
curves with standard deviations of 60 Hz, the distributions of the speakers’
productions will look as in Figure 9.4.
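The assumed environmental distributions are easy to write down numerically. The sketch below is my own Python rendering of them (the means, common standard deviation, and equal vowel probabilities are the ones stated in the text; the function names are invented):

```python
import random

# Modal F1 values (Hz) and common standard deviation, as assumed in the text
MEANS = {"a": 700.0, "e": 500.0, "i": 300.0}
SD = 60.0

def sample_f1(vowel, rng=random):
    """Draw one F1 token for the given vowel from its Gaussian distribution."""
    return rng.gauss(MEANS[vowel], SD)

def sample_environment(n, rng=random):
    """Draw n (vowel, F1) pairs, each vowel equally likely (p = 1/3)."""
    return [(v, sample_f1(v, rng)) for v in (rng.choice("aei") for _ in range(n))]
```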
Now how do listeners classify incoming F1 values, that is to which of the
three categories /a/, /e/, and /i/ do they map a certain incoming F1 value x?
This mapping can be handled by a family of negatively formulated
Optimality-Theoretic cue constraints, which can be expressed as ‘if the
auditory form contains an F1 of x Hz, the corresponding vowel in the surface
form should not be y’ (Escudero and Boersma 2003, 2004).2 These cue
constraints exist for all F1 values between 100 and 900 Hz and for all three
vowels. Examples are given in (9.1).

2 There are two reasons for the negative formulation of these constraints. First, a positive formu-
lation would simply not work in the case of the integration of multiple auditory cues (Boersma and
Escudero 2004). Second, the negative formulation allows these constraints to be used in both
directions (comprehension and production), as is most clearly shown by the fact that they can be
formulated symmetrically as *[x]/y/ (Boersma 2005). The former reason is not relevant for the present
paper, but the second is, because the same cue constraints are used in the next two sections.

(9.1) Cue constraints for mapping F1 values to vowel categories:


‘an F1 of 340 Hz is not /a/’
‘an F1 of 340 Hz is not /e/’
‘an F1 of 340 Hz is not /i/’
‘an F1 of 539 Hz is not /i/’
The second type of constraints involved in the AudF→SF mapping are the
structural constraints that evaluate the output SF. In the present case, they
could be something like */a/, */e/, and */i/. I assume that all three vowels are
equally perfectly licit phonemes of the language, so that these constraints
must be ranked low. I ignore them in the rest of this paper, so that phoneme
categorization is handled solely by the cue constraints.3
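Since only the cue constraints decide, categorizing an incoming F1 reduces to choosing the vowel whose constraint 'an F1 of x Hz is not /vowel/' carries the lowest ranking value once stochastic evaluation noise is added. A minimal sketch of that idea (the data layout and function name are mine, not Boersma's):

```python
import random

def perceive(f1, ranking, noise=2.0, rng=random):
    """Map an F1 value to the vowel whose cue constraint '[f1] is not /v/'
    has the lowest ranking value after Gaussian evaluation noise is added."""
    disharmonies = {v: ranking[(f1, v)] + rng.gauss(0.0, noise)
                    for v in ("a", "e", "i")}
    return min(disharmonies, key=disharmonies.get)
```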
The ranking of the cue constraints results from lexicon-driven perceptual
learning (Boersma 1997, 1998a; Escudero and Boersma 2003, 2004): the learner
hears an auditory event drawn from the environmental distributions in
Figure 9.4 and classifies it as a certain vowel, and the lexicon subsequently
tells her which vowel category she should have perceived. This type of learning
assumes that the acquisition process contains a period in which the listener
already knows that the language has three vowel categories and in which all
her lexical representations are already correct. If such a learner misclassifies a
speaker's intended /pit/ as /pet/, her lexicon, which contains the underlying
form |pit|, will tell her that she should have perceived /pit/ instead. When
detecting an error in this way, the learner will take action by changing the
ranking of some constraints. Suppose that at some point during acquisition
some of the constraints are ranked as in Tableau 9.1. The learner will then
perceive an incoming F1 of 380 Hz as the vowel /e/, as indicated by the
pointing finger in Tableau 9.1. We can also read from Tableau 9.1 that
320 Hz will be perceived as /i/, and 460 Hz as /e/.

Tableau 9.1. Learning to perceive vowel height

Input: [380 Hz] (UF = |i|). Constraints, ranked from high to low: '320 Hz is not /a/' » '380 Hz is not /a/' » '460 Hz is not /i/' » '320 Hz is not /e/' » '460 Hz is not /a/' » '380 Hz is not /i/' » '380 Hz is not /e/' » '320 Hz is not /i/' » '460 Hz is not /e/'.

  /a/  violates '380 Hz is not /a/' (fatal, *!)
  /e/  violates '380 Hz is not /e/' (the learner's output, marked by the pointing finger; this constraint is raised, ←)
  /i/  violates '380 Hz is not /i/' (fatal, *!; the correct adult form, marked by the check mark; this constraint is lowered, →)

3 Auditory constraints, if they exist, evaluate the input and cannot therefore distinguish between
the candidates.

If the lexicon now tells the learner that she should have perceived /i/ instead
of /e/, she will regard this as the correct adult SF, as indicated by the check
mark in Tableau 9.1. According to the gradual learning algorithm for stochas-
tic OT (Boersma 1997, Boersma and Hayes 2001), the learner will take action
by raising the ranking value of all the constraints that prefer the adult form /i/
to her own form /e/ (here only ‘380 Hz is not /e/’) and by lowering the
ranking value of all the constraints that prefer /e/ to /i/ (here only ‘380 Hz
is not /i/’). These rerankings are indicated by the arrows in Tableau 9.1.
To see what kind of final perception behaviour this procedure leads to, I ran
a computer simulation analogous to the one by Boersma (1997). A virtual
learner has 243 constraints (F1 values from 100 to 900 Hz in steps of 10 Hz, for
all three vowel categories), all with the same initial ranking value of 100.0. The
learner then hears 10 million F1 values randomly drawn from the distributions
in Figure 9.4, with an equal probability of one in three for each vowel. She is
subjected to the learning procedure exemplified in Tableau 9.1, with full
knowledge of the lexical form, with an evaluation noise of 2.0, and with a
plasticity (the amount by which ranking values rise or fall when a learning step
is taken) of 0.01. The result is shown in Figure 9.5.
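A scaled-down version of this simulation can be written directly from the description above. The code below is my own condensation, not the original implementation: it uses far fewer inputs than 10 million, and it rounds each token to the 10 Hz constraint grid:

```python
import random

MEANS = {"a": 700.0, "e": 500.0, "i": 300.0}
SD, NOISE, PLASTICITY = 60.0, 2.0, 0.01
F1_GRID = range(100, 901, 10)  # 81 F1 values x 3 vowels = 243 constraints

def simulate(n_inputs, seed=1):
    rng = random.Random(seed)
    # all 243 cue constraints start at the same ranking value of 100.0
    ranking = {(f1, v): 100.0 for f1 in F1_GRID for v in MEANS}
    for _ in range(n_inputs):
        intended = rng.choice("aei")  # p = 1/3 for each vowel
        f1 = int(min(max(round(rng.gauss(MEANS[intended], SD), -1), 100), 900))
        # perceive: vowel whose constraint '[f1] is not /v/' is ranked lowest
        perceived = min(MEANS, key=lambda v: ranking[(f1, v)] + rng.gauss(0, NOISE))
        if perceived != intended:  # the lexicon flags the error
            ranking[(f1, perceived)] += PLASTICITY
            ranking[(f1, intended)] -= PLASTICITY
    return ranking
```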
The figure is to be read as follows. F1 values below 400 Hz will mostly be
perceived as /i/, since in that region the constraint 'an F1 of x Hz is not /i/' (the
solid curve) is ranked lower than the constraints 'an F1 of x Hz is not /e/' (the
dashed curve) and 'an F1 of x Hz is not /a/' (the dotted curve). Likewise, F1
values above 600 Hz will mostly be perceived as /a/, and values between 400
and 600 Hz mostly as /e/. For every F1 value the figure shows us not only the
most often perceived category but also the degree of variation. Around
400 Hz, /i/ and /e/ perceptions are equally likely. Below 400 Hz it becomes
more likely that the listener will perceive /i/, and increasingly so when the

Figure 9.5. The final ranking of 'an F1 of x Hz is not /vowel/', for the vowels /i/ (solid curve), /e/ (dashed curve), and /a/ (dotted curve) (ranking values roughly 90 to 110, plotted against F1 from 100 to 900 Hz)

distance between the curves for /i/ and /e/ increases. This distance is largest for
F1 values around 250 Hz, where there are 99.8 per cent /i/ perceptions and
only 0.1 per cent perceptions of /e/ and /a/ each. Below 250 Hz, the curves
approach each other again, leading to more variation in categorization.
A detailed explanation of the shapes of the curves in terms of properties of
the gradual learning algorithm (approximate probability matching between
250 and 750 Hz, and low corpus frequencies around 100 and 900 Hz) is
provided at the end of the next section, where the shapes of the curves are
related to the behaviour of prototypicality judges.

9.3.2 Formalizing the prototypicality task


As seen in Figure 9.3 (middle), the prototypicality task can be seen as a
mapping from a phonological surface form to an auditory form, without the
involvement of an articulatory form. This section shows how this SF→AudF
mapping is handled with the same optimality-theoretic cue constraints as
phoneme categorization. From Figure 9.1, we can see that auditory constraints
might be involved in evaluating the output of this mapping, but given that we
do not know whether such constraints (against loud and unpleasant noises?)
are relevant at all for phonology, I ignore them here.
The mapping from SF to AudF in the prototypicality task is thus entirely
handled by the cue constraints. For the listener simulated in the previous section,
these constraints are ranked as in Figure 9.5. The ranking of the constraints for /i/
has been copied from the solid curve in Figure 9.5 to the top row in Tableau 9.2.
In Figure 9.5, for instance, the bottom of the /i/ curve lies at an F1 of 250 Hz.
In Tableau 9.2 this is reflected by the bottom ranking of '250 Hz is not /i/'. In
Tableau 9.2 we also see that as the F1 goes up or down from 250 Hz, the constraint
against perceiving this F1 as /i/ becomes higher ranked, just as in Figure 9.5.
With the ranking shown in Figure 9.5 and Tableau 9.2, and with zero
evaluation noise, the listener will choose an F1 of 250 Hz as the optimal
value for /i/. This is more peripheral (more towards the edge of the F1
continuum) than the most often heard /i/, which has an F1 of 300 Hz
according to Figure 9.4. The size of the effect (50 Hz) is comparable to the
effect found by Johnson et al. (1993) and Frieda et al. (2000). Of course, this
simulated value of 50 Hz depends on several assumptions, such as an initial
equal ranking for all the constraints, which is probably unrealistic (for a more
realistic proposal based on a period of distributional learning before lexicon-
driven learning, see Boersma et al. 2003). The parameter that determines the
size of the effect in the present simulation is the standard deviation of the F1
values in the environmental distribution in Figure 9.4, which was 60 Hz. With
different standard deviations, a different effect size is expected.

Tableau 9.2. The auditory F1 value that gives the best /i/

Input: /i/. The cue constraints 'x Hz is not /i/' are ranked, from high to low, for x = 320, 310, 170, 180, 300, 190, 290, 200, 280, 210, 270, 230, 220, 240, 260, 250 Hz. Each candidate [x Hz] (x = 170, 180, ..., 320) violates only its own constraint 'x Hz is not /i/'; for every candidate except [250 Hz] this violation is fatal (*!), so ☞ [250 Hz], which violates only the lowest-ranked constraint, is the winner.

The result of 250 Hz in Tableau 9.2 was based on a categorical ranking of
the constraints. In the presence of evaluation noise the outcome will vary. If
the evaluation noise is 2.0, i.e. the same as during the learning procedure of
the previous section, the outcome for the listener of Figure 9.5 and Tableau 9.2
will vary as in Figure 9.6, which was assembled by computing the outcomes of
100,000 tableaus like Tableau 9.2.
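The assembly of Figure 9.6 can be sketched as repeated noisy evaluation of the same cue constraints in the inverted direction. In the sketch below (function names and data layout are mine), `ranking` stands for learned values like those in Figure 9.5:

```python
import random

def best_token(vowel, ranking, f1_grid, noise=2.0, rng=random):
    """Inverted perception: choose the auditory F1 whose cue constraint
    '[f1] is not /vowel/' has the lowest ranking value after noise."""
    return min(f1_grid, key=lambda f1: ranking[(f1, vowel)] + rng.gauss(0.0, noise))

def prototype_distribution(vowel, ranking, f1_grid, n=100_000, noise=2.0, seed=0):
    """Repeat the noisy evaluation n times and count how often each F1 wins."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        f1 = best_token(vowel, ranking, f1_grid, noise, rng)
        counts[f1] = counts.get(f1, 0) + 1
    return counts
```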
The differences between environmental F1 values and prototypicality
judgements seen when comparing Figures 9.4 and 9.6 are very similar to the
production/perception differences in the experiments by Johnson et al. (1993)
and Frieda et al. (2000).

Figure 9.6. Prototypicality distributions for the three vowels (probability density of /i/, /e/, and /a/ plotted against F1 from 100 to 900 Hz)

The conclusion is that if the prototypicality task uses the same constraint
ranking as phoneme categorization, auditorily peripheral segments will be
judged best if their auditory values are extreme, because cue constraints have
automatically been ranked lower for extreme auditory values than for more
central auditory values. The question that remains is: how has the /i/ curve in
Figure 9.5 become lower at 250 Hz than at 300 Hz? The answer given by
Boersma (1997) is the probability-matching property of the Gradual Learning
Algorithm: the ultimate vertical distance between the /i/ and /e/ curves for a
given F1 is determined (after learning from a sufficient amount of data) by the
probability that that F1 reflects an intended /i/ rather than an intended /e/;
given that an F1 of 250 Hz has a smaller probability of having been intended as
/e/ than an F1 of 300 Hz, the vertical distance between the /i/ and /e/ curves
grows to be larger at 250 Hz than at 300 Hz, providing that the learner is given
sufficient data. With the Gradual Learning Algorithm and enough input, the
prototypicality judge will automatically come to choose the F1 token that is
least likely to be perceived as anything else than /i/.4
There are two reasons why the prototype does not have an even lower F1
than 250 Hz. The first reason, which can be illustrated with the simulation, is
that there are simply not enough F1 values of, say, 200 Hz to allow the learner
to reach the final state of a wide separation between the /i/ and /e/ curves; for
the simulated learner, Figure 9.5 shows that even 10 million inputs did not
suffice. The second reason, not illustrated by the simulation, is that in reality
the F1 values are not unbounded. Very low F1 values are likely to be perceived
as an approximant, fricative, or stop rather than as /i/. Even within the vowel

4 This goal of choosing the least confusing token was proposed by Lacerda (1997) as the driving
force behind the prototypicality judgement. He did not propose an underlying mechanism, though.
See also Section 9.4.

space, this effect can be seen at the other end of the continuum: one would
think that the best token for /a/ would have an extremely high F1, but in reality
an F1 of, say, 3000 Hz will be perceived as /i/, because the listener will
reinterpret it as an F2 with a missing F1.

9.3.3 Formalizing phoneme production


Now that we have seen how inverted perception accounts for the /i/ prototype
effect, we still have to see how it is possible that the same peripheral values are
not used in the phoneme production task. Presumably, after all, the learner as
a speaker will grow to match the modal F1 value of 300 Hz that she finds in
her environment (if sound change can be ignored).
The answer is shown in Figure 9.3 (right): the phoneme production task
takes an SF as its input (as does the prototypicality task), but has to generate
both an auditory form and an articulatory form as its output. Similarly to the
prototypicality task, the production process will have to take into account the
cue constraints, but unlike the prototypicality task, the production process
will also have to take into account sensorimotor constraints and articulatory
constraints.
Tableau 9.3 shows how the phonological surface form /i/ is produced
phonetically. The cue constraints are still ranked exactly as in Tableau 9.2.
Every candidate cell, however, now contains a pair of phonetic representa-
tions: articulatory and auditory. The articulatory part of each candidate
shows the gestures needed for articulating [i]-like sounds. For simplicity I
assume that the main issue is the precision with which the tongue has to be
bulged towards the palate, and that more precision yields lower F1 values, for
example a precision of ‘26’ yields an F1 of 240 Hz whereas a precision of ‘17’
yields an F1 of 330 Hz. These precision values are evaluated by articulatory
constraints that are ranked by the amount of effort involved, i.e. the
constraint 'the precision should not be greater than 26' has to outrank
the constraint 'the precision should not be greater than 17'.
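With a fixed relation between articulation and audition, production can be sketched as a strict-domination evaluation over (F1, precision) pairs. The sketch below is mine: it hard-codes the sensorimotor relation prec = (500 - F1)/10 implied by the worked examples in the text, and the even articulatory thresholds that appear in Tableau 9.3:

```python
ART_THRESHOLDS = (16, 18, 20, 22, 24, 26)  # '*prec > q'; higher q = more effort

def ot_winner(candidates, violations, ranking):
    """Strict-domination evaluation: the best candidate is the one whose
    violated constraints, read from the highest-ranked down, are least bad."""
    def profile(cand):
        return sorted((ranking[c] for c in violations(cand)), reverse=True)
    return min(candidates, key=profile)

def produce(cue_ranking, art_ranking):
    """Choose an (F1, precision) pair for /i/; prec = (500 - F1)/10 encodes
    the fixed sensorimotor relation assumed in the text."""
    candidates = [(f1, (500 - f1) // 10) for f1 in range(170, 321, 10)]
    ranking = {("cue", f1): v for f1, v in cue_ranking.items()}
    ranking.update({("art", q): v for q, v in art_ranking.items()})
    def violations(cand):
        f1, prec = cand
        # each candidate violates its own cue constraint plus every
        # articulatory constraint whose threshold its precision exceeds
        return [("cue", f1)] + [("art", q) for q in ART_THRESHOLDS if prec > q]
    return ot_winner(candidates, violations, ranking)
```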
The sensorimotor constraints are missing from Tableau 9.3. This is because
for purposes of simplicity I assume here that the relation between articulatory
and auditory form is fixed, that is the speaker has a fully proficient view of what
any articulation will sound like and of how any auditory event can be imple-
mented articulatorily. The candidate [240 Hz]Aud [prec=26]Art, for instance,
occurs in Tableau 9.3 because it only violates the low-ranked sensori-
motor constraint *[240 Hz]Aud [prec=26]Art, whereas candidates like
[240 Hz]Aud [prec=22]Art and [270 Hz]Aud [prec=26]Art violate the high-
ranked sensorimotor constraints *[240 Hz]Aud [prec=22]Art and
*[270 Hz]Aud [prec=26]Art and are therefore ignored in Tableau 9.3. By making
this simplification, we can regard the relationship between AudF and ArtF as
fixed, so that only the cue constraints and the articulatory constraints determine
the speaker's behaviour.
The result of the ranking in Tableau 9.3 is that the auditory-articulatory
pair [F1 = 300 Hz]Aud [prec=20]Art wins. Forms with a lower F1 are too
effortful, whereas forms with a higher F1 are too confusable.

Tableau 9.3. Producing vowel height

Input: /i/. The articulatory constraints 'the precision should not be greater than q' are interleaved with the cue constraints, from high to low: 'prec > 26' » '320 Hz is not /i/' » 'prec > 24' » '310 Hz is not /i/' » 'prec > 22' » '170 Hz is not /i/' » 'prec > 20' » '180 Hz is not /i/' » 'prec > 18' » '300 Hz is not /i/' » 'prec > 16' » '190 Hz is not /i/' » '290 Hz is not /i/' » '200 Hz is not /i/' » '280 Hz is not /i/' » '210 Hz is not /i/' » '270 Hz is not /i/' » '230 Hz is not /i/' » '220 Hz is not /i/' » '240 Hz is not /i/' » '260 Hz is not /i/' » '250 Hz is not /i/'.

The candidates pair an auditory F1 with the articulatory precision that produces it: [170 Hz]Aud [prec=33]Art, [180 Hz]Aud [prec=32]Art, [190 Hz]Aud [prec=31]Art, [200 Hz]Aud [prec=30]Art, [210 Hz]Aud [prec=29]Art, [220 Hz]Aud [prec=28]Art, [230 Hz]Aud [prec=27]Art, [240 Hz]Aud [prec=26]Art, [250 Hz]Aud [prec=25]Art, [260 Hz]Aud [prec=24]Art, [270 Hz]Aud [prec=23]Art, [280 Hz]Aud [prec=22]Art, [290 Hz]Aud [prec=21]Art, [300 Hz]Aud [prec=20]Art, [310 Hz]Aud [prec=19]Art, [320 Hz]Aud [prec=18]Art. Each candidate violates its own cue constraint plus every articulatory constraint whose threshold its precision exceeds. Candidates with an F1 below 300 Hz incur a fatal articulatory violation (*!), and candidates with an F1 above 300 Hz incur a fatal cue violation (*!), so ☞ [300 Hz]Aud [prec=20]Art is the winner.

The result of the phoneme production task is thus very different from that
of the prototypicality task. The difference between the two tasks can be
reduced to the presence of the articulatory constraints in the production
task and their absence in the prototypicality task.

9.3.4 The formal explanation for the /i/ prototype effect


With the tables and simulation in Sections 9.3.1 to 9.3.3 the /i/ prototype effect
can now be explained in more formal terms than in Section 9.2.4. The
simulation in Section 9.3.1 explains the fact that the F1 in the prototypicality
task was 50 Hz lower than the modal F1 in the learner's environment, while
the difference between the tables in Section 9.3.2 and Section 9.3.3 explains the
fact that the F1 in the production task was 50 Hz higher than in the proto-
typicality task. One can say that the prototypicality effect is –50 Hz and the
articulatory effect is +50 Hz, and that the two effects cancel out.
The fact that the two effects cancel out in the example of Sections 9.3.1 to
9.3.3 is due to my arbitrary choices for the ranking of the articulatory
constraints in Tableau 9.3. Had I ranked these constraints higher, the candi-
date [310 Hz] might have won in Tableau 9.3; had I ranked them lower, the
candidate [290 Hz] might have won. Either alternative would have led to a
predicted shift in F1 from one generation of speakers to the next. The actual
ranking in Tableau 9.3 was chosen in such a way that the two effects cancel out
exactly, so that the production distributions stay stable over the generations.
To sum up: three F1 values for /i/ have been addressed: the modal F1 of the
first generation, the prototypical F1 for the second generation, and the modal
F1 of the second generation. If the prototypicality and articulatory effects
cancel out (as they do in reality, if there is no sound change), the first and
third of these F1 values will be identical, and the prototype F1 will be the odd
one out and the most conspicuous to researchers. Its difference from the two
modal F1 values has been accounted for in Sections 9.3.2 and 9.3.3, respectively.

9.3.5 Stability over generations: a coincidence?


The really surprising observation is now no longer the fact that the proto-
typicality task leads to a different F1 than the modal F1 produced by the first
and second generations, but the fact that the modal F1 of the second gener-
ation is identical to that of the first.
The question of the stability of the production distribution could be
answered with the help of Kirchner's (1998: 288) proposal that the ranking
of articulatory constraints is fixed, namely simply a function of articulatory
effort alone. Imagine that this fixed ranking is the one in Tableau 9.3, but that

a learner is confronted with an environmental distribution with a modal F1 of
280 instead of 300 Hz. The F1 of the prototype /i/ will shift down, but not by
20 Hz, because the influence of /e/ tokens decreases; it may shift from 250 to,
say, 235 Hz. The modal produced F1 will also shift, but not by 15 Hz, because
the articulatory constraints do not shift; it may shift to, say, 290 Hz. Within
one generation, therefore, the modal F1 for /i/ will rise from 280 to 290 Hz,
and in a couple of generations more it will be very close to 300 Hz. An
analogous thing will happen if the environmental distribution has a modal
F1 of 320 Hz: it will move to 300 Hz in three generations or so. Given a fixed
ranking of the articulatory constraints, therefore, every language will reach
the same equilibrium: 300 Hz is the only stable F1 value possible, as long as
everything else remains equal to the case of Figure 9.4. Other possible
explanations for cross-generational stability involve the various learning
algorithms for production in the parallel phonological-phonetic model
(Boersma 2005), but these are far outside the scope of the present paper.
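The convergence argument can be caricatured numerically: if each generation's produced modal F1 moves only part of the way from its environment's value towards the equilibrium fixed by the articulatory ranking, the value approaches 300 Hz geometrically. The step size of 0.5 below is an illustrative assumption of mine, chosen to match the 280-to-290 Hz example in the text:

```python
def next_generation(modal_f1, equilibrium=300.0, step=0.5):
    """One generation: the produced modal F1 moves a fraction `step`
    of the way from the environment's value towards the equilibrium."""
    return modal_f1 + step * (equilibrium - modal_f1)

def generations(start, n):
    """Iterate the cross-generational map n times, keeping all values."""
    values = [start]
    for _ in range(n):
        values.append(next_generation(values[-1]))
    return values
```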

9.4 Comparison with earlier accounts


Tableaus 9.2 and 9.3 automatically predict that, if the child is given enough
time to learn even the rare overt forms, the best auditory form is one that is
less likely to be perceived as /e/ than the modal F1 value for /i/ is, and that
articulatory constraints lead to a higher F1 value in production. This can be
explained within grammar models in which ArtF can influence AudF, because
in such models the resulting AudF will be different according to whether an
ArtF has to be evaluated (as in the phoneme production task) or not (as in the
prototypicality task). Such models include the one exemplified in the present
paper, namely Boersma's (2005) parallel model of phonology and phonetics,
where production is modelled as UF → { SF, AudF, ArtF }, but they also
include Boersma's (1998) earlier listener-oriented grammar model, where
production is modelled as UF → (ArtF→AudF→SF). They do not include
forward modular models of production of the type UF → SF → AudF → ArtF,
because in such serial models articulatory restrictions cannot influence the
auditory form.
The prototypicality proposal by Johnson et al. (1993) is an example of a
serial production model. Presumably, their model would abbreviate the pro-
totypicality task as SF → HyperOF, and the phoneme production task as SF
→ HyperOF → OF. Since the presence versus absence of the 'later' represen-
tation OF has no way of influencing the form of the 'earlier' representation
HyperOF, this 'hyperarticulated phonetic target' has to contain a peripheral F1
value of 250 Hz that is independent from the experimental task. The authors

provide no conclusive independent evidence for the existence of such a
representation, whereas the representations AudF and ArtF proposed in the
present paper are independently needed to serve as the input to comprehen-
sion and the output of production.
The prototypicality proposal by Frieda et al. (2000) invokes an extra
representation as well, namely the 'prototype'. For the existence of this level
of representation Kuhl (1991) gave some independent evidence, namely the
'perceptual magnet' effect. However, this effect can be explained without
invoking prototypes. This has first been shown for lexically labelled exemplars
by Lacerda (1995), and even in models of pure distributional learning without
lexical labels, perceptual warping automatically emerges as an epiphenom-
enon, as has been shown for neural maps by Guenther and Gjaja (1996) and
for Optimality Theory by Boersma et al. (2003). With Occam's razor, explan-
ations without these poorly supported prototypes have to be preferred.
The prototypicality proposal by Lacerda (1997) derives goodness judge-
ments from the activations of categories in an exemplar model of phonology.
However, the auditory token that generates the highest activation for /i/ is still
the modal auditory form; the best prototype is only derived indirectly as the
auditory form that has the highest activation for /i/ relative to its activation
for other vowel categories. This proposal thus does choose the least confus-
able auditory form, but does not provide an automatic underlying mechan-
ism such as the one that necessarily follows from the task in Figure 9.3
(middle).
Finally, the results derived in this paper could equally well have been
derived by formalizing the grammar and task models in Figures 9.1 to 9.3
within a framework that does not rely on constraint ranking but on
constraint weight addition, such as Harmony Grammar (Legendre et al.
1990a, 1990b).

9.5 Conclusion
The present paper has offered formal explanations for two facts, namely that a
prototype (by being less confusable) is more peripheral than the modal
auditory form in the listener's language environment, and that a prototype
(by not being limited by articulatory restrictions) is more peripheral than the
modal auditory form that the listener herself will produce. Given the repre-
sentation-and-constraints model of Figure 9.1, the only assumptions that led
to these formal explanations were that representations are evaluated only if
they are necessarily activated (Figure 9.3) and that in production processes
(Figure 9.2, right; Figure 9.3, right) the output representations are evaluated

in parallel, so that 'later' representations can influence 'earlier' representa-
tions.
The explanations provided here for a phonetic example may well extend to
other areas of linguistics. The place where grammaticality judgements have
most often been investigated is that of syntactic theory. One can imagine that
the corpus frequency of constructions that are informed by speaker-based
requirements at Phonetic Form is greater than would be expected on the basis
of grammaticality judgements in a laboratory reading task, which may only
activate Logical Form. This, however, is worth a separate investigation.
10

Modelling Productivity with the Gradual Learning Algorithm: The Problem of Accidentally Exceptionless Generalizations

ADAM ALBRIGHT AND BRUCE HAYES

10.1 Introduction
Many cases of gradient intuitions reflect conflicting patterns in the data that a
child receives during language acquisition.1 An area in which learners fre-
quently face conflicting data is inflectional morphology, where different
words often follow different patterns. Thus, for English past tenses, we have
wing ~ winged (the most common pattern in the language), wring ~ wrung
(a widespread [ɪ] ~ [ʌ] pattern), and sing ~ sang (a less common [ɪ] ~ [æ]
pattern). In cases where all of these patterns could apply, such as the novel
verb spling, the conflict between them leads English speakers to entertain
multiple possibilities, with competing outcomes falling along a gradient scale
of intermediate well-formedness (Bybee and Moder 1983; Prasada and Pinker
1993; Albright and Hayes 2003).
In order to get a more precise means of investigating this kind of gradience,
we have over the past few years developed and implemented a formal
model for the acquisition of inflectional paradigms. An earlier version of
our model is described in Albright and Hayes (2002), and its application to
various empirical problems is laid out in Albright et al. (2001), Albright
(2002), and Albright and Hayes (2003). Our model abstracts morphological
and phonological generalizations from representative learning data and uses
1 For helpful comments and advice we would like to thank Paul Boersma, Junko Ito, Armin Mester,
Jaye Padgett, Hubert Truckenbrodt, the editors, and our two reviewers, absolving them for any
shortcomings.
186 Gradience in Phonology

them to construct a stochastic grammar that can generate multiple forms for
novel stems like spling. The model is tested by comparing its 'intuitions',
which are usually gradient, against human judgements for the same forms.

In modelling gradient productivity of morphological processes, we have
focused on the reliability of the generalizations: how much of the input data
do they cover, and how many exceptions do they involve? In general, greater
productivity is correlated with greater reliability, while generalizations covering
few forms or entailing many exceptions are relatively unproductive. For
English past tenses, most generalizations have exceptions, so finding the
productive patterns requires finding the generalizations with the fewest
exceptions. Intermediate degrees of well-formedness arise when the generalizations
covering different patterns suffer from different numbers of exceptions.

The phenomenon of gradient well-formedness shows that speakers do not
require rules or constraints to be exceptionless; when the evidence conflicts, they
are willing to use less than perfect generalizations. One would expect, however,
that when gradience is observed, more reliable generalizations should be
favoured over less reliable ones. In this article, we show that, surprisingly, this
is not always the case. In particular, we find that there may exist generalizations
that are exceptionless and well-instantiated, but are nonetheless either completely
invalid, or are valued below other, less reliable generalizations.

The existence of exceptionless but unproductive patterns is a challenge for
current approaches to gradient productivity, which generally attempt to extend
patterns in proportion to their strength in the lexicon. We offer a solution for
one class of these problems, based on the optimality-theoretic principle of
constraint conflict and employing the Gradual Learning Algorithm (Boersma
1997; Boersma and Hayes 2001). In the final section of the paper we return to our
earlier work on gradience and discuss the implications of our present findings.

10.2 Navajo sibilant harmony


The problem of exceptionless but unproductive generalizations arose in our
efforts to extend our model to learn non-local rule environments. The first
example we discuss comes from sibilant harmony in Navajo, a process
described in Sapir and Hoijer (1967).

Sibilant harmony can be illustrated by examining the allomorphs of the
s-perfective prefix. This prefix is realized as shown in (10.1) (examples from
Sapir and Hoijer):2

2 We have rendered all transcriptions (including Sapir and Hoijer's) in near-IPA, except that we use
[č čʰ č' š ž] for [tʃ tʃʰ tʃ' ʃ ʒ] in order to depict the class of nonanterior sibilants more saliently.
The Gradual Learning Algorithm 187

(10.1) a. [šì-] if the first segment of the stem is a [–anterior]
          sibilant ([č, č', čʰ, š, ž]), for example in [šì-čʰìt]
          'he is stooping over'
       b. [šì-] or [sì-] if somewhere later in the stem there is a [–anterior] sibilant,
          as in [šì-tʰé:ž] ~ [sì-tʰé:ž] 'they two are lying'
          (free variation)
       c. [sì-] otherwise, as in [sì-tʰį́] 'he is lying'3
A fully realistic simulation of the acquisition of Navajo sibilant harmony
would require a large corpus of Navajo verb stems, along with their
s-perfectives. Lacking such a corpus, we performed idealized simulations
using an artificial language based on Navajo: we selected whole Navajo
words (rather than stems) at random from the electronic version of Young
et al.'s dictionary (1992), and constructed s-perfective forms for them by
attaching [šì-] or [sì-] according to the pattern described in (10.1).

10.3 The learning model


Our learning system employs some basic assumptions about representations
and rule schemata. We assume that words are represented as sequences of
phonemes, each consisting of a bundle of features, as in Chomsky and Halle
(1968). Rules and constraints employ feature matrices that describe
natural classes, as well as variables permitting the expression of non-local
environments: ([+F]) designates a single skippable segment of the type [+F],
while ([+F])* designates any number of skippable [+F] segments. Thus, the
environment in (10.2):

(10.2) / ___ ([+seg])* [–anterior]

can be read 'where a non-anterior segment follows somewhere later in the
word' ([+seg] denotes the entire class of segments).
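The skippable-segment notation lends itself to a simple recursive matcher. The sketch below is our illustration, not the authors' implementation: segments are reduced to small feature dictionaries, and the environment (10.2) is encoded as a list of terms.

```python
# Illustrative sketch (not the authors' implementation): matching an
# environment like "/ ___ ([+seg])* [-anterior]" against a word, with
# segments represented as dictionaries of feature values.

def matches(segment, matrix):
    """A segment satisfies a feature matrix if it agrees on every feature."""
    return all(segment.get(f) == v for f, v in matrix.items())

def env_applies(word, environment):
    """Check whether the environment (a list of terms) is satisfied starting
    at the left edge of the word (the prefix slot).  A term is either
    ('one', matrix) for a single required segment, or ('star', matrix) for
    any number of skippable segments of that class."""
    def match_from(i, terms):
        if not terms:
            return True
        kind, matrix = terms[0]
        if kind == 'one':
            return (i < len(word) and matches(word[i], matrix)
                    and match_from(i + 1, terms[1:]))
        # 'star': either stop skipping, or skip one more segment of the class
        if match_from(i, terms[1:]):
            return True
        return (i < len(word) and matches(word[i], matrix)
                and match_from(i + 1, terms))
    return match_from(0, environment)

# Toy transcription: each segment carries only the features needed here.
def seg(anterior):
    return {'seg': True, 'anterior': anterior}

# Environment (10.2): / ___ ([+seg])* [-anterior]
env = [('star', {'seg': True}), ('one', {'anterior': False})]

word_with_late_sibilant = [seg(True), seg(True), seg(False)]  # cf. [thé:ž]
word_without = [seg(True), seg(True)]                         # cf. [kàn]

print(env_applies(word_with_late_sibilant, env))  # True
print(env_applies(word_without, env))             # False
```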
The model is given a list of pairs, consisting of bases and inflected forms.
For our synthetic version of Navajo, such a list would be partially represented
by (10.3):

(10.3) a. [pà:ʔ]     [sì-pà:ʔ]
       b. [č'ìɬ]     [šì-č'ìɬ]
       c. [čʰò:jìn]  [šì-čʰò:jìn]
       d. [kàn]      [sì-kàn]
       e. [k'àz]     [sì-k'àz]
       f. [kʰéšką̀:]  [šì-kʰéšką̀:], [sì-kʰéšką̀:]
       g. [sí:ɬ]     [sì-sí:ɬ]
       h. [tʰą̀š]     [šì-tʰą̀š], [sì-tʰą̀š]
       i. [tʰó:ʔ]    [sì-tʰó:ʔ]
       j. [tɬé:ž]    [šì-tɬé:ž], [sì-tɬé:ž]

Where free variation occurs, the learner is provided with one copy of each
variant; thus, for (10.3f) both [kʰéšką̀:] ~ [šì-kʰéšką̀:] and [kʰéšką̀:] ~
[sì-kʰéšką̀:] are provided.

3 Sapir and Hoijer specifically say (1967: 14–15): 'Assimilation nearly always occurs when the two
consonants are close together (e.g. šì-čà:ʔ, from sì-čà:ʔ "a mass lies"); . . . but it occurs less often when
the two consonants are at a greater distance.'
The goal of learning is to determine which environments require [sì-],
which require [šì-], and which allow both. Learning involves generalizing
bottom-up from the lexicon, using a procedure described below. Generalization
creates a large number of candidate environments; an evaluation metric
is later employed to determine how these environments should be employed
in the final grammar.

(10.4)  I. PREFIX [sì-]                  II. PREFIX [šì-]

   a. [pà:ʔ]     [sì-pà:ʔ]          a. [čʰò:jìn]  [šì-čʰò:jìn]
   b. [kàn]      [sì-kàn]           b. [č'ìɬ]     [šì-č'ìɬ]
   c. [kʰéšką̀:]  [sì-kʰéšką̀:]       c. [kʰéšką̀:]  [šì-kʰéšką̀:]
   d. [tɬé:ž]    [sì-tɬé:ž]         d. [tɬé:ž]    [šì-tɬé:ž]
   e. [tʰą̀š]     [sì-tʰą̀š]          e. [tʰą̀š]     [šì-tʰą̀š]
   f. [k'àz]     [sì-k'àz]
   g. [sí:ɬ]     [sì-sí:ɬ]
   h. [tʰó:ʔ]    [sì-tʰó:ʔ]

Learning begins by parsing forms into their component morphemes and
grouping them by the morphological change they involve. The data in (10.3)
exhibit two changes, as shown in (10.4); rows c–e of each column are the cases of free
variation.

For each change, the system creates hypotheses about which elements in the
environment crucially condition the change. It begins by treating each pair as
a 'word-specific rule', separating out the changing part from the invariant
part. Thus, the first three [šì-] forms in (10.4) would be construed as in (10.5):

(10.5) a. Ø → šì / [ ___ čʰò:jìn]
       b. Ø → šì / [ ___ č'ìɬ]
       c. Ø → šì / [ ___ kʰéšką̀:]

Next, the system compares pairs of rules that have the same change (e.g. both
attach [šì-]), and extracts what their environments have in common to form a
generalized rule. Thus, given the word-specific rules in (10.6a), the system
collapses them together using features, as in (10.6b).
(10.6) a. Ø → šì / [ ___ tʰą̀š]
          Ø → šì / [ ___ tɬé:ž]

       b.   Ø → šì / [ ___ tʰ ą̀  š ]
          + Ø → šì / [ ___ tɬ é: ž ]
          = Ø → šì / [ ___ [−sonorant, −continuant, +anterior]
                          [+syllabic, −high, −round]
                          [−sonorant, +continuant, −anterior, +strident] ]

In this particular case, the forms being compared are quite similar, so
determining which segment should be compared with which is unproblematic.
But for forms of different lengths, such as [čʰò:jìn] and [č'ìɬ] above, this
is a harder question.4 We adopt an approach that lines up the segments that
are most similar to each other. For instance, (10.7) gives an intuitively
reasonable alignment for [čʰò:jìn] and [č'ìɬ]:

(10.7)  čʰ  ò:  j  ì  n
        |          |  |
        č'         ì  ɬ

4 The issue did not arise in an earlier version of our model (Albright and Hayes 2002), which did
not aspire to learn non-local environments, and thus could use simple edge-in alignment.

Good alignments have two properties: they match phonetically similar
segments such as [čʰ] and [č'], and they avoid leaving too many segments
unmatched. To evaluate the similarity of segments, we employ the
similarity metric from Frisch et al. (2004). To guarantee an optimal
pairing, we use a cost-minimizing string alignment algorithm (described in
Kruskal 1999) that efficiently searches all possible alignments for best total
similarity.
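The alignment step can be illustrated with a standard dynamic-programming sketch in the spirit of Kruskal (1999); the similarity values below are invented stand-ins for illustration, not the actual Frisch et al. (2004) metric.

```python
# A minimal cost-minimizing alignment (cf. Kruskal 1999).  The similarity
# function is a toy stand-in: the model itself uses Frisch et al. (2004).

def align(a, b, similarity, skip_cost=1.0):
    """Return (total cost, matched pairs) for a minimum-cost alignment.
    Matching two segments costs (1 - similarity); leaving a segment
    unmatched costs skip_cost."""
    n, m = len(a), len(b)
    INF = float('inf')
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    back = {}
    for i in range(n + 1):               # relax forward in topological order
        for j in range(m + 1):
            if i < n and cost[i][j] + skip_cost < cost[i + 1][j]:
                cost[i + 1][j] = cost[i][j] + skip_cost
                back[(i + 1, j)] = (i, j, None)      # skip a[i]
            if j < m and cost[i][j] + skip_cost < cost[i][j + 1]:
                cost[i][j + 1] = cost[i][j] + skip_cost
                back[(i, j + 1)] = (i, j, None)      # skip b[j]
            if i < n and j < m:
                c = cost[i][j] + (1.0 - similarity(a[i], b[j]))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = c
                    back[(i + 1, j + 1)] = (i, j, (a[i], b[j]))  # match
    # Trace back the matched pairs.
    pairs, ij = [], (n, m)
    while ij != (0, 0):
        pi, pj, pair = back[ij]
        if pair:
            pairs.append(pair)
        ij = (pi, pj)
    return cost[n][m], list(reversed(pairs))

# Invented similarities for the alignment in (10.7).
sim_table = {('čʰ', "č'"): 0.9, ('ì', 'ì'): 1.0, ('n', 'ɬ'): 0.6}
sim = lambda x, y: sim_table.get((x, y), sim_table.get((y, x), 0.1))

total, pairs = align(['čʰ', 'ò:', 'j', 'ì', 'n'], ["č'", 'ì', 'ɬ'], sim)
print(pairs)  # [('čʰ', "č'"), ('ì', 'ì'), ('n', 'ɬ')]
```

With these toy values, the optimal alignment matches the three similar segments and skips [ò:] and [j], exactly the alignment shown in (10.7).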
Seen in detail, the process of collapsing rules is based on three principles,
illustrated in (10.8) with the collapsing of the rules Ø → šì / [ ___ kʰéšką̀:]
and Ø → šì / [ ___ tʰą̀š].

(10.8) 1. Shared material is collapsed using the feature system.
       2. Unmatched material is designated as optional, notated with
          parentheses.
       3. Sequential optional elements are collapsed into a single variable,
          encompassing all of their shared features (e.g. ([+F])*).

           Ø → šì / [ ___ kʰ é  š ką̀: ]
         + Ø → šì / [ ___ tʰ ą̀  š ]

         = Ø → šì / [ ___ [−sonorant, −continuant, +spread gl., −constr. gl.]
                         [+syllabic, −high, −round] š (k) (ą̀:) ]     (steps 1–2)

         = Ø → šì / [ ___ [−sonorant, −continuant, +spread gl., −constr. gl.]
                         [+syllabic, −high, −round] š ([+seg])* ]    (step 3)

Paired feature matrices are collapsed by constructing a new matrix containing
all of their shared features (step 1). Next, any material in one rule that is
unmatched in the other is designated as optional, represented by parentheses
(step 2). Finally, sequences of consecutive optional elements are collapsed
together into a single expression of the form (F)*, where F is the smallest
natural class containing all of the collapsed optional elements (step 3).
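The three steps can be sketched directly; the feature names and the handling of optional runs below are our illustrative assumptions, not the authors' code.

```python
# Sketch of the three collapsing principles, assuming segments are given
# directly as feature dictionaries.  Feature names are illustrative.

def shared_features(seg1, seg2):
    """Step 1: keep only the feature values the two segments share."""
    return {f: v for f, v in seg1.items() if seg2.get(f) == v}

def collapse(aligned):
    """Collapse an aligned pair of environments.  'aligned' is a list of
    (x, y) pairs, where an unmatched position has None on one side.
    Returns a list of terms: ('one', matrix) or ('star', matrix)."""
    terms = []
    for x, y in aligned:
        if x is not None and y is not None:
            terms.append(('one', shared_features(x, y)))      # step 1
        else:
            terms.append(('opt', x if x is not None else y))  # step 2
    # Step 3: merge runs of consecutive optional elements into a single
    # ('star', F) term, F being the features shared by all of them.
    merged = []
    for kind, matrix in terms:
        if kind == 'opt' and merged and merged[-1][0] == 'star':
            prev = merged[-1][1]
            merged[-1] = ('star', {f: v for f, v in prev.items()
                                   if matrix.get(f) == v})
        elif kind == 'opt':
            merged.append(('star', dict(matrix)))
        else:
            merged.append((kind, matrix))
    return merged

# [kʰ é š ką̀:] aligned with [tʰ ą̀ š]; the final k and ą̀: are unmatched.
kh = {'son': False, 'cont': False, 'sg': True}
th = {'son': False, 'cont': False, 'sg': True}
e  = {'syll': True, 'high': False, 'round': False}
a  = {'syll': True, 'high': False, 'round': False}
sh = {'son': False, 'cont': True, 'ant': False}
k  = {'son': False, 'cont': False, 'sg': False, 'seg': True}
a2 = {'syll': True, 'high': False, 'round': False, 'seg': True}

result = collapse([(kh, th), (e, a), (sh, sh), (k, None), (a2, None)])
print(result[-1])  # ('star', {'seg': True}) — the trailing (k)(ą̀:) merged
```

Intersecting the features of the trailing optional run leaves only [+seg], yielding the ([+seg])* term of step 3.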
The process is iterated, generalizing the new rule with the other words in
the learning data; the resulting rules are further generalized, and so on. Due to
memory limitations, it is necessary periodically to trim back the hypothesis
set, keeping only those rules that perform best.5 Generalization terminates
when no new 'keeper' rules are found.

We find that this procedure, applied to a representative set of words,
discovers the environment of non-local sibilant harmony after only a few
steps. One path to the correct environment is shown in (10.9):

(10.9)  [čʰò:jìn] + [č'ìɬ]:
          Ø → šì / [ ___ [−continuant, −anterior, −nasal]
                        ([+son, −cons])* ì [−syllabic, +anterior] ]
        collapsed with [čʰìtí]:
          Ø → šì / [ ___ [−continuant, −anterior] ([+seg])* ]
        collapsed with [žì:ɬ]:
          Ø → šì / [ ___ [−anterior] ([+seg])* ]
        collapsed with [tɬíwóžì:ɬpáhí]:
          Ø → šì- / [ ___ ([+seg])* [−anterior] ([+seg])* ]

The result can be read: 'Prefix [šì-] to a stem consisting of any number of
segments followed by a nonanterior segment, followed by any number of
segments.' (Note that [–anterior] segments in Navajo are necessarily sibilant.)
In more standard notation, one could replace ([+seg])* with a free variable X,
and follow the standard assumption that non-adjacency to the distal word
edge need not be specified, as in (10.10):

(10.10) Ø → šì- / ___ X [–anterior]

We emphasize that at this stage, the system is only generating hypotheses. The
task of using these hypotheses to construct the final grammar is taken up in
Section 10.5.

5 Specifically: (a) for each word in the training set, we keep the most reliable rule (in the sense of
Albright and Hayes 2002) that derives it; (b) for each change, we keep the rule that derives more forms
than any other.

10.4 Testing the approach: a simulation


We now show that, given representative learning data, the system just
described can discover the rule environments needed for Navajo sibilant
harmony. As noted above, our learning simulation involved artificial Navajo
s-perfectives, created by attaching appropriate prefix allomorphs to whole
Navajo words (as opposed to stems). Selecting 200 words at random,6 we
attached prefixes to the bases as follows, following Sapir and Hoijer's characterization:
(a) if the base began with a nonanterior sibilant, we prefixed [šì-]
(there were nineteen of these in the learning set); (b) if the base contained but
did not begin with a nonanterior sibilant, we made two copies, one prefixed
with [šì-], the other with [sì-] (thirty-seven of each); (c) we prefixed [sì-] to
the remaining 144 bases.
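The data-construction recipe in (a)–(c) is simple enough to sketch directly; the word list below is a toy stand-in for the random dictionary sample.

```python
# The data-construction recipe above, as a sketch.  Words are lists of
# segments; 'NONANTERIOR' marks the class [č, č', čʰ, š, ž].

NONANTERIOR = {'č', "č'", 'čʰ', 'š', 'ž'}

def s_perfectives(word):
    """Return the (base, prefixed form) pairs for one word, following
    Sapir and Hoijer's characterization."""
    if word[0] in NONANTERIOR:                    # (a) initial nonanterior
        return [(word, ['šì'] + word)]
    if any(seg in NONANTERIOR for seg in word):   # (b) later nonanterior
        return [(word, ['šì'] + word), (word, ['sì'] + word)]
    return [(word, ['sì'] + word)]                # (c) elsewhere: [sì-]

# Toy word list standing in for the random dictionary sample:
words = [['š', 'á', 'p'],        # begins with a nonanterior sibilant
         ['t', 'é:', 'ž'],       # contains a later nonanterior sibilant
         ['k', 'à', 'n']]        # contains none

training = [pair for w in words for pair in s_perfectives(w)]
for base, form in training:
    print(''.join(base), '->', form[0] + '-' + ''.join(form[1:]))
```

The middle word yields two training pairs (one per variant), mirroring the treatment of free variation described in Section 10.3.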
Running the algorithm just described, we found that among the 92 environments
it learned, three were of particular interest: the environment for
obligatory local harmony, (10.11a); the environment that licenses distal
harmony, (10.11b), which includes local harmony as a special case; and the
vacuous 'environment' specifying the default allomorph [sì-], (10.11c).

(10.11) a. Obligatory local harmony
           Ø → [šì-] / ___ [–anterior]
        b. Optional distal harmony (= (10.10))
           Ø → [šì-] / ___ X [–anterior]
        c. Default [sì-]
           Ø → [sì-] / ___ X

The remaining eighty-nine environments are discussed below.

10.5 Forming a grammar


These environments can be incorporated into an effective grammar by treating
them not as rules, as just given, but rather as optimality-theoretic constraints of
morphology (Boersma 1998b; Russell 1999; Burzio 2002; MacBride 2004). In this
approach, rule (10.11a) is reconstrued as a constraint: 'Use [šì-] / ___ [–anterior]
to form the s-perfective.' This constraint is violated by forms that begin with a
nonanterior segment, but use something other than [šì-] to form the s-perfective.
The basic idea is illustrated below, using hypothetical monosyllabic roots:

6 From the entire database of 3,023 words, we selected 2,000 words at random, dividing this set into
ten batches of 200 words each. To confirm the generality of our result, we repeated our simulation on
each 200-word training sample. Due to space considerations, we report here the results of only one of
the ten trials; the remaining nine were essentially the same in that they all discovered the environments
in (10.11). The primary difference between trials was the precise formulation of the other, unwanted
constraints (Section 10.6), but in every case, such constraints were correctly ranked below the crucial
constraints, as in (10.13).
(10.12)  Morphological   Candidates that obey           Candidates that violate
         base            Use [šì-] / ___ [–anterior]    Use [šì-] / ___ [–anterior]

         [šáp]           [šì-šáp]                       *[sì-šáp], *[mù-šáp], etc.
         [táp]           all                            none
It is straightforward to rank these constraints in a way that yields the target
pattern, as (10.13) and (10.14) show:

(10.13) Use [šì-] / ___ [−ant]  >>  { Use [sì-] / ___ X , Use [šì-] / ___ X [−ant] }  >>  all others
        (the two bracketed constraints ranked in free variation)

(10.14) a.
        /sì-čìd/      Use [šì-]/___[–ant]   Use [šì-]/___X [–ant]   Use [sì-]/___X
        ☞ šì-čìd                                                     *
          sì-čìd       *!                    *

        b.
        /sì-tɬé:ž/    Use [šì-]/___[–ant]   Use [šì-]/___X [–ant]   Use [sì-]/___X
        ☞ šì-tɬé:ž                                                   *
        ☞ sì-tɬé:ž                           *

For (10.14b), the free ranking of Use [šì-] / ___ X [–ant] and Use [sì-] / ___ X
produces multiple winners generated in free variation (Anttila 1997).

10.6 Unwanted constraints


The eighty-nine constraints not discussed so far consist largely of complicated
generalizations that happen to hold true of the learning data. One example is
shown in (10.15):

(10.15) Use sì- / ___ ([–round])* [+anterior, +continuant] ([–consonantal])* ]

As it happens, this constraint works for all thirty-seven forms that meet its
description in the learning data. However, it makes profoundly incorrect
predictions for forms outside the learning data, such as the legal but non-existing
stem /čálá/ (10.16).

(10.16)  Use sì- / ___ ([−round])*  [+anterior, +continuant]  ([−consonantal])* ]

              sì-       č á          l                          á

If ranked high enough, this constraint will have the detrimental effect of
preventing [šì-čálá] from being generated consistently. We will call such
inappropriate generalizations 'junk' constraints.

One possible response is to say that the learning method is simply too
liberal, allowing too many generalizations to be projected from the learning
data. We acknowledge this as a possibility, and we have experimented with
various ways to restrict the algorithm to more sensible generalizations. Yet we
are attracted to the idea that constraint learning could be simplified, relying
on fewer a priori assumptions, by letting constraints be generated rather
freely and excluding the bad ones with an effective evaluation metric. Below,
we lay out such a metric, which employs the Gradual Learning Algorithm.7

10.7 The Gradual Learning Algorithm


The Gradual Learning Algorithm (GLA: Boersma 1997; Boersma and Hayes
2001) can rank constraints in a way that derives free variation and matches the
frequencies of the learning data. The GLA assumes a stochastic version of
optimality theory, whereby each pair of constraints {A, B} is assigned not a
strict ranking, but rather a probability: ‘A dominates B with probability P.’
Thus, the free ranking given in (10.13) above would be captured by assigning
the constraints Use [sı̀-] / ___ X and Use [šı̀-] / ___ X [–ant] a 50–50 ranking
probability.
Any such theory needs a method to ensure that the pairwise probabilities
assigned to the constraints are mutually consistent. In the GLA, this is done by
arranging the constraints along a numerical scale, assigning each constraint a
ranking value. On any particular occasion that the grammar is used, a
selection point is adopted for each constraint, taken from a Gaussian probability
distribution with a standard deviation fixed for all constraints. The
constraints are sorted by their selection points, and the winning candidate is
determined on the basis of this ranking. Since pairwise ranking probabilities
are determined by the ranking values,8 they are guaranteed to be mutually
consistent.

7 A reviewer points out that another approach to weeding out unwanted generalizations is to train
the model on various subsets of the data, keeping only those generalizations that are found in all
training sets (cross-validation). Although this technique could potentially eliminate unwanted generalizations
(since each subset contains a potentially different set of such generalizations), it could not
absolutely guarantee that they would not be discovered independently in each subset. Given that such
constraints make fatal empirical predictions, we seek a technique that reliably demotes them, should
they arise.

10.8 The need for generality


Let us now consider the application of the GLA to Navajo. Naively, one might
hope that when the constraints are submitted to the GLA, the junk will settle
to the bottom. However, what one actually finds is that the junk constraints
get ranked high. Although Use [šì-] / ___ [–ant] does indeed get ranked on
top, the crucial constraints Use [šì-] / ___ X [–ant] and Use [sì-] / ___ X are
swamped by higher-ranking junk constraints, and rendered largely ineffective.
The result is a grammar that performs quite well on the training data
(producing something close to the right output frequencies for every stem),
but fails grossly in generating novel forms. The frequencies generated for
novel forms are determined by the number of high-ranking junk constraints
that happen to fit them, and do not respect the distribution in (10.11).

The problem is a classic one in inductive learning theory. If a learning
algorithm excessively tailors its behaviour to the training set, it may learn a
patchwork of small generalizations that collectively cover the learning data.
This does not suffice to cover new forms, which, after all, is the main
purpose of having a grammar in the first place!

Why does the GLA fail? The reason is that it demotes constraints only when
they prefer losing candidates. But within the learning data, the junk constraints
generally prefer only winners; that is precisely why they emerged
from the inductive generalization phase of learning. Accidentally true generalizations
thus defeat the GLA as it currently stands. What is needed is a way
for the GLA to distinguish accidentally true generalizations from linguistically
significant generalizations.

10.9 Initial rankings based on generality


Boersma (1998) suggested that for morphology, initial rankings should be
based on generality: the more general the constraint, the higher it is
ranked before learning takes place. It turns out that this insight is the key
to solving the Navajo problem. What is needed, however, is a way to
characterize generality in numerical terms. There are various possible
approaches; Chomsky and Halle (1968), for example, propose counting
symbols (fewer symbols = greater generality). Here, we adopt an empirical
criterion: a morphological constraint is maximally general if it encompasses
all of the forms that exhibit its structural change. We use the fraction in
(10.17):

(10.17)        number of forms that a constraint applies to
        ----------------------------------------------------------------------
        total number of forms exhibiting the change that the constraint requires

8 A spreadsheet giving the function that maps ranking value differences to pairwise probabilities is
posted at http://www.linguistics.ucla.edu/people/hayes/GLA/.

In the 200-word Navajo simulation discussed above, some representative
generality values are shown in (10.18).

(10.18) Constraint                       Relevant   Forms with       Generality
                                         forms      this change
        Use [šì-] / ___ [–anterior]      19         56 [šì-] forms   .339
        Use [šì-] / ___ X [–anterior]    56         56 [šì-] forms   1
        Use [sì-] / ___ X                181        181 [sì-] forms  1
        Constraint (10.15)
        ('junk' constraint)              37         181 [sì-] forms  .204

The idea, then, is to assign the constraints initial ranking values that
reflect their generality, with more general constraints on top. If the scheme
works, all the data will be explained by the most general applicable constraints,
and the others will remain so low that they never play a role in
selecting output forms.

In order to ensure that differences in initial rankings are large enough to
make a difference, the generality values from (10.17) were rescaled to cover a
large range, using the formula in (10.19):

(10.19) For each constraint c:

        initial ranking value_c = 500 × (Generality_c − Generality_min) /
                                        (Generality_max − Generality_min)

where Generality_min is the generality of the least general constraint, and
Generality_max is the generality of the most general constraint.
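Both formulas are easy to state in code. The counts below are those reported in (10.18); note that in the actual run the minimum and maximum are taken over all ninety-two constraints, so the resulting initial values differ from those shown later in (10.20).

```python
# The generality measure (10.17) and the rescaling (10.19), as a sketch.
# Counts are those of the 200-word simulation in (10.18); min and max
# generality here range over these four constraints only.

def generality(forms_covered, forms_with_change):
    """(10.17): coverage divided by the total forms showing the change."""
    return forms_covered / forms_with_change

def initial_ranking_values(generalities):
    """(10.19): rescale generalities onto a 0-500 initial ranking scale."""
    g_min = min(generalities.values())
    g_max = max(generalities.values())
    return {c: 500 * (g - g_min) / (g_max - g_min)
            for c, g in generalities.items()}

g = {'USE šì/__[-ant]':  generality(19, 56),    # .339
     'USE šì/__X[-ant]': generality(56, 56),    # 1
     'USE sì/__X':       generality(181, 181),  # 1
     'junk (10.15)':     generality(37, 181)}   # .204

print(initial_ranking_values(g))
```

The two fully general constraints start at 500; the junk constraint, being the least general here, starts at the bottom of the scale.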

10.10 Employing generality in a learning simulation


We implemented this scheme and ran it multiple times on the Navajo
pseudodata described above. For one representative run, it caused the relevant
constraints (including just one representative 'junk' constraint, (10.15))
to be ranked as follows:
(10.20) Constraint                    Generality   Initial    Final ranking
                                                   ranking    (after 100,000
                                                              training cycles)
        Use [šì-] / ___ X [−ant]      1            500        499.9
        Use [sì-] / ___ X             1            500        500.1
        Use [šì-] / ___ [−ant]        .339         150.9      514.9
        'Junk' constraint (10.15)     .204         79.7       19.2

The final grammar is depicted schematically in (10.21), where the arrows show
the probabilities that one constraint will outrank the other. When the difference
in ranking value exceeds about 10, the probability that the ranking will
hold is essentially 1 (strict ranking).
(10.21)  Use [šì-] / ___ [−ant]                            514.9
             (undominated: local harmony)
           | outranks with probability 1
         Use [sì-] / ___ X                                 500.1
         Use [šì-] / ___ X [−anterior]                     499.9
             (each outranks the other with probability .5:
              free variation for non-local harmony)
           | outranks with probability 1
         Use sì- / ___ ([−round])* [+anterior, +continuant]
                       ([−consonantal])* ]                 19.2
             (potentially harmful constraints like (10.15)
              safely outranked)

This approach yields the desired grammar: all of the junk constraints (not just
(10.15)) are ranked safely below the top three.

The procedure works because the GLA is error-driven. Thus, if junk
constraints start low, they stay there, since the general constraint that does
the same work has a head start and averts any errors that would promote the
junk constraints. Good constraints with specific contexts, on the other hand,
like 'Use [šì-] / ___ [–ant]', are also nongeneral, but appropriately so. Even if
they start low, they are crucial in averting errors like *[sì-šáp], and thus they
are soon promoted by the GLA to the top of the grammar.

We find, then, that a preference for more general statements in grammar
induction is not merely an aesthetic bias; it is, in fact, a necessary criterion in
distinguishing plausible hypotheses from those which are implausible, but
coincidentally hold true in the learning sample.

10.11 Analytic discussion


While the Navajo simulation offers a degree of realism in the complexity of
the constraints learned, hand analysis of simpler cases helps in understanding
why the simulation worked, and ensures that the result is a general one.

To this end, we reduce Navajo to three constraints, renamed as follows: (1)
Use [sì-], which we will call Default; (2) the special-context Use [šì-] / ___ X
[–ant], which we will call Contextual [šì-]; and (3) the accidentally
exceptionless (10.15), which we will call Accidental [sì-]. Accidental [sì-]
is exceptionless because the relevant forms in the training data happen not to
contain non-anterior sibilants.
Suppose first that all harmony is optional (50/50 variation). Using the
normal GLA, all constraints start out with equal ranking values, set conventionally
at 100. The constraints Contextual [šì-] and Default should be
ranked in a tie to match the 50/50 variation. During learning (see Boersma
and Hayes 2001: 51–4), these two constraints vacillate slightly as the GLA seeks
a frequency match, but end up very close to their original value of 100.
Accidental [sì-] will remain at exactly 100, since the GLA is error-driven
and none of the three constraints favours an incorrect output for the training
data that match Accidental [sì-] (Default and Accidental [sì-] both
prefer [sì-], which is correct; and Contextual [šì-] never matches these
forms). Thus, all three constraints are ranked at or near 100. This grammar is
incorrect; when faced with novel forms like (10.16) that match all three
constraints, Contextual [šì-] competes against two equally ranked antagonists,
deriving [šì-] only a third of the time instead of half.

Initial rankings based on generality (Section 10.9) correct this problem.
Given that Default and Contextual [šì-] cover all [sì-] and [šì-] forms
respectively, they will be assigned initial ranking values of 500. Define the
critical distance C as the minimum difference between two constraints that is
needed to model strict ranking. (Informal trials suggest that a value of about
10.5, which creates a ranking probability of .9999, is sufficient.) It is virtually
certain that the initial ranking value for Accidental [sì-] will be far below
500 − C, because accidentally true constraints cannot have extremely high
generality, other than through an unlikely accident of the learning data.
Ranking proceeds as before, with Default and Contextual [šì-] staying
around 500 and Accidental [sì-] staying where it began. The resulting
grammar correctly derives 50/50 variation, because Accidental [sì-] is too
low to be active.

Now consider what happens when the data involve no free variation; that is,
[šì-] is the outcome wherever Contextual [šì-] is applicable. When initial
rankings are all equal, [šì-] forms will cause Contextual [šì-] to rise and
Default to fall, with their difference ultimately reaching C (Contextual
[šì-]: 500 + C/2; Default: 500 − C/2). Just as before, Accidental [sì-] will
remain ranked where it started, at 500. The difference of C/2 between Contextual
[šì-] and Accidental [sì-], assuming C = 10.5, will be 5.25, which
means that when the grammar is applied to novel forms matching both
constraints, [sì-] outputs will be derived about 3 per cent of the time. This
seems unacceptable, given that the target language has no free variation.
Again, the incorrect result is avoided under the initial-ranking scheme of
Section 10.9, provided that Accidental [sì-] is initially ranked at or lower
than 500 − C/2, which is almost certain to be the case.

In summary, schematized analysis suggests that the Navajo result is not
peculiar to this case. The effect of accidentally true generalizations is strongest
when optionality is involved, but they pose a threat even in its absence. Initial
rankings based on generality avoid the problem by keeping such constraints a
critical distance lower than the default, so they can never affect the outcome.

10.12 The realism of the simulation


In this section we address two possible objections to our model.

10.12.1 Phonological rules versus allomorph distribution


Navajo sibilant harmony is typically analysed as a phonological process,
spreading [–anterior] from right to left within a certain domain. The grammar
learned by our model, on the other hand, treats harmony as allomorphy ([sì-]
versus [šì-]), and cannot capture root-internal harmony effects. Thus, it may
be objected that the model has missed the essential nature of harmony.
In this connection, we note first that harmony is often observed primarily
through affix allomorphy, either because there is no root-internal
restriction, or because the effect is weaker within roots, admitting greater
exceptionality. For these cases, allomorphy may be the only appropriate
analysis. For arguments that root-internal and affixal harmony often require
separate analyses, see Kiparsky (1968).

More generally, however, there still remains the question of how to unify
knowledge about allomorphy and root-internal phonotactics. Even when
affixes and roots show the same harmony patterns, we believe that understanding
the distribution of affix allomorphs could constitute an important
first step in learning the more general process, provided there is some way of
bootstrapping from constraints on particular morphemes to more general
constraints on the distribution of speech sounds. We leave this as a problem
for future work.

10.12.2 Should arbitrary constraints be generated at all?


Another possible objection is that a less powerful generalization procedure
would never have posited constraints like (10.15) in the first place. Indeed, if
all constraints come from universal grammar (that is, are innate), the need to
trim back absurd ones would never arise. Against this objection can be cited
work suggesting that environments sometimes really are complex and synchronically
arbitrary (Bach and Harms 1972; Hale and Reiss 1998; Hayes 1999;
Blevins 2004). For instance, in examining patterns of English past tenses, we
found that all verbs ending in voiceless fricatives are regular, and that speakers
are tacitly aware of this generalization (Albright and Hayes 2003). Not only
are such patterns arbitrary, but they can also be rather complex (see also
Bybee and Moder 1983). Regardless of whether such generalizations are
learned or innate, it seems likely that any model powerful enough to handle
the full range of attested patterns will need a mechanism to sift through large
numbers of possibly irrelevant hypotheses.

10.13 Modelling gradient productivity: the fate of reliability metrics


As noted above, one of our long-term goals is to understand how gradient
productivity arises when the learner confronts conflicting data. The results
above challenge our earlier views, and in this section we lay out ways in which
our previous approach might be revised.

Earlier versions of our model evaluated contexts according to their accuracy,
or reliability, defined as the ratio of the number of forms a rule derives
correctly, divided by the total number of forms to which the rule is applicable.
We have found in many cases that we could model native speaker intuitions of
novel forms by using the reliability of the best rule that derives them (adjusted
slightly, in a way to be mentioned below). However, the results of our Navajo
simulations show that accuracy alone is not an adequate criterion for evaluation,
since assiduous rule discovery can sometimes find accidentally true
(and thus perfectly accurate) generalizations which nonetheless lead to disaster
if trusted. The Navajo example illustrates why it is not enough to
evaluate the accuracy of each generalization independently; we must also
consider whether generalizations cover forms that are better handled by a
different generalization.9
Another possible failing of the reliability approach is that it is ill-suited to
capture special case/‘elsewhere’ relations (Kiparsky 1982). The environment
for [šì-] in Navajo is difficult to express by itself, but easy as the complement
set of the [sì-] environments. In optimality theory, ‘elsewhere’ is simply the
result of constraint ranking: a context-sensitive constraint outranks the
default. Unfortunately for the reliability-based approach, default environments
such as (10.11c) often have fairly high reliability (181/237 in
this case)—but that does not mean that they should be applied in the
special-allomorph context (e.g. of (10.11a)).
In light of this, it is worth considering why we adopted reliability scores in
the first place. Ironically, the reason likewise involved accidentally-true
generalizations, but of a different kind.
One of the phenomena that compelled us to use reliability scores was the
existence of small-scale patterns for irregulars, seen, for example, in English
past tenses. As Pinker and Prince (1988) point out, when a system includes
irregular forms, they characteristically are not arbitrary exceptions, but fall
into patterns, e.g. cling ~ clung, fling ~ flung, swing ~ swung. These patterns
have some degree of productivity, as shown by historical change (Pinker 1999)
and ‘wug’ (nonce-word) testing (Bybee and Moder 1983; Prasada and Pinker
1993; Albright and Hayes 2003).10

9 A related problem, in which overly broad generalizations appear exaggeratedly accurate because
they contain a consistent subset, is discussed in Albright and Hayes (2002).
10 We restrict our discussion to phonological patterns; for discussion of patterns possibly based on
semantic, rather than phonological, similarities, see Ramscar (2002). In principle, the approach
described here could easily be extended to include constraints that refer to other kinds of information;
it is an empirical question what properties allomorphy rules may refer to.
202 Gradience in Phonology

The problem is that our algorithm can often find environments for these
minor changes that are exceptionless. For example, the exceptionless minor
change in (10.22) covers the four verbs dig, cling, fling, and sling.11
(10.22)  ɪ → ʌ / X [+cor, +ant, +voice] ___ [+dorsal, +voice] ] [+past]
The GLA, when comparing an exceptionless constraint against a more general
constraint that suffers from exceptions, always ranks the exceptionless constraint
categorically above the general one. For cases like Navajo, where the
special constraint was (10.11a) and the general constraint was (10.11c),
the default constraint for [sì-], this ranking is entirely correct, capturing the
special/default relationship. But when exceptionless (10.22) is ranked categorically
above the constraints specifying the regular ending for English (such as
Use [-d]), the prediction is that novel verbs matching the context of (10.22)
should be exclusively irregular (i.e. blig → blug, not *bligged). There is
evidence that this prediction is wrong, from wug tests on forms that match
(10.22). For instance, the wug test reported in Albright and Hayes (2003)
yielded the following judgements (scale: 1 worst, 7 best):
(10.23)  Present stem        Choice for past      Rating
         a. blig [blɪg]      blug [blʌg]          4.17
                             bligged [blɪgd]      5.67
         b. spling [splɪŋ]   splung [splʌŋ]       5.45
                             splinged [splɪŋd]    4.36
The regular forms are almost as good as, or better than, the forms derived by
the exceptionless rule.
We infer that numbers matter: a poorly attested perfect generalization such
as (10.22) is not necessarily taken more seriously than a broadly attested
imperfect generalization such as Use [-d]. For Navajo, strict ranking is
appropriate, since the special-environment constraint (10.11a) that must outrank
the default (10.11c) is robustly attested in the language. In the English
case, the special-environment constraint is also exceptionless, but is attested
in only four verbs, yet the GLA—in either version—ranks it on top of the
grammar, just as in Navajo.
11 This is the largest set of ɪ → ʌ verbs that yields an exceptionless generalization. There are other
subsets, such as cling, fling, and sling, that also lead to exceptionless generalizations, and these are also
generated by our model. The problem that we discuss below would arise no matter which set is
selected, and would not be solved by trying to, for example, exclude dig from consideration.

It can now be seen why in our earlier work we avoided constraint interaction
and employed reliability scores instead. With reliability scores, it is simple to
impose a penalty on forms derived by rules supported by few data—following
Mikheev (1997), we used a statistical lower confidence limit on reliability.
Thus, for a wug form like blig, two rules of comparable value compete: the
regular rule (has exceptions, but vast in scope) versus (10.22) (no exceptions,
but tiny in scope). Ambivalence between the two is a natural consequence.
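The effect of such a lower confidence limit can be sketched as follows. This is an illustration only: the smoothing step and the z value here are our own illustrative choices, not the exact statistic of the model described above.

```python
import math

def reliability(hits, scope):
    """Raw reliability: forms a rule derives correctly / forms it applies to."""
    return hits / scope

def adjusted_reliability(hits, scope, z=0.875):
    """Lower confidence limit on reliability (normal approximation).
    Rules supported by few forms are penalized more heavily than broad ones."""
    p = (hits + 0.5) / (scope + 1.0)       # smoothed estimate, keeps p < 1
    se = math.sqrt(p * (1 - p) / scope)    # standard error of a proportion
    return p - z * se

# A perfect but tiny rule (4/4, like (10.22)) ends up scoring below a
# broad rule with some exceptions (like the regular ending):
tiny = adjusted_reliability(4, 4)          # raw reliability 1.0
broad = adjusted_reliability(4000, 4400)   # raw reliability ~0.91
```

With these illustrative numbers the broad imperfect rule outscores the tiny perfect one, which is the ambivalence the text describes.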
If reliability statistics are not the right answer to this problem, what is? It
seems that the basic idea that rules based on fewer forms should be downgraded
is sound. But the downgrade need not be carried out based on reliability
scores—it might also be made part of the constraint ranking process. In
particular, we propose that the basic principles of the GLA be supplemented
with biases that exert a downward force on morphological constraints that are
supported by few data, using statistical smoothing or discounting.
As of this writing we do not have a complete solution, but we have
experimented with a form of absolute discounting (Ney et al. 1994), implemented
as follows: for each constraint C, we add to the learning data an
artificial datum that violates C and obeys every other constraint with which C
is in conflict. Under this scheme, if C (say, (10.22) above) is supported by just
four forms, then an artificially-added candidate would have a major effect in
downgrading its ranking. But if C is supported by thousands of forms (for
example, the constraint for a regular mapping), then the artificially added
candidate would be negligible in its effect.
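The intended asymmetry can be illustrated with a toy computation. This proportion is only a sketch of the discounting idea; in the actual GLA the artificial datum participates in stochastic constraint ranking rather than in a simple ratio.

```python
def support_after_discounting(n_supporting):
    """Fraction of C's learning data that still favours C after one
    artificial counterexample is added (toy illustration of the idea
    that a single added datum matters only for sparsely supported C)."""
    return n_supporting / (n_supporting + 1)

small = support_after_discounting(4)     # a tiny rule is pulled down sharply
large = support_after_discounting(4000)  # a broad rule barely moves
```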
We found that when we implemented this approach, it yielded reasonable
results for the English scenario just outlined: in a limited simulation consisting
of the regulars in Albright and Hayes (2003) plus just the four irregulars
covered by (10.22), regular splinged was a viable competitor with splung, and
the relationships among the competing regular allomorphs remained essentially
unchanged.
There are many ways that small-scale generalizations could be downgraded
during learning. We emphasize that the development of a well-motivated
algorithm for this problem involves not just issues of computation, but an
empirical question about productivity: when real language learners confront
the data, what are the relative weights that they place on accuracy versus size
of generalization? Both experimental and modelling work will be needed to
answer these questions.12

12 An unresolved question that we cannot address here is whether a bias for generality can be
applied to all types of phonological constraints, or just those that govern allomorph distribution. It is
worth noting that for certain other types of constraints, such as faithfulness constraints, it has been
argued that specific constraints must have higher initial rankings than more general ones (Smith
2000). At present, we restrict our claim to morphological constraints of the form ‘Use X’.

10.14 Conclusion
The comparison of English and Navajo illustrates an important problem in
the study of gradient well-formedness in phonology. On the one hand, there
are cases such as English past tenses, in which the learner is confronted with
many competing patterns and must trust some generalizations despite some
exceptions. In such cases, gradient well-formedness is rampant, and the
model must retain generalizations with varying degrees of reliability. On the
other hand, there are cases such as Navajo sibilant harmony, in which
competition is confined to particular contexts, and the learner has many
exceptionless generalizations to choose from. In these cases, the challenge is
for the model to choose the ‘correct’ exceptionless patterns, and refrain from
selecting an analysis that predicts greater variation than is found in the target
language.
We seek to develop a model that can handle all configurations of gradience
and categoricalness, and we believe the key lies in the trade-off between
reliability and generality. We have shown here how our previous approach
to the problem was insufficient, and proposed a new approach using the GLA,
modified to favour more general constraints. The precise details of how
generality is calculated, and how severe the bias must be, are left as a matter
for future research.
Part III
Gradience in Syntax
11

Gradedness as Relative Efficiency in the Processing of Syntax and Semantics1

JOHN A. HAWKINS

11.1 Introduction
This paper presents a set of corpus data from English, a head-initial language,
and some additional data from Japanese, a head-final language, showing clear
selection preferences among competing structures. The structures involve the
positioning of complements and adjuncts relative to the verb, and the preferences
range from highly productive to unattested (despite being grammatical).
These ‘gradedness effects’ point to a principle of efficiency in performance,
minimize domains (MiD). Specifically, this chapter argues for the following:
(11.1) a. MiD predicts gradedness by defining the syntactic and semantic
relations holding between categories, by enumerating the surface
structure domains in which these relations can be processed, and
by ranking common domains in competing structures according
to their degree of minimization. Relative weightings and cumulative
effects among different syntactic and semantic relations are
explained in this way.
b. The same minimization preferences can be found in the preferred
grammatical conventions of different language types, and a performance–grammar
correspondence hypothesis is proposed.
c. Principles of performance can predict what purely grammatical
principles of ordering can only stipulate, and can explain exceptions
to grammatical principles. A model that appears to capture
the desired performance–grammar correspondence, stochastic
1 This paper is dedicated to Günter Rohdenburg on the occasion of his 65th birthday. Günter’s
work discovering patterns of preference in English performance (see e.g. Rohdenburg 1996) has been
inspirational to me and to many others over many years.
208 Gradience in Syntax

optimality theory, has the reverse problem: it stipulates and fails
to predict the performance data.
d. What is needed in this and other grammatical areas is: a predictive
and explanatory model of performance; an adequate description
of the grammatical conventions that have been shaped by
performance preferences; and a diachronic model of adaptation
and change.
The order of presentation is as follows. In Section 11.2 I define the principle of
minimize domains. Section 11.3 tests this principle on postverbal PPs in
English and on preverbal NPs and PPs in Japanese and introduces a metric
for quantifying multiple constraints and their interaction. Section 11.4 discusses
minimize domains in grammars and in cross-linguistic variation, and
Section 11.5 summarizes the conclusions.

11.2 Minimize domains


This principle is defined in (11.2):
(11.2) Minimize domains (MiD)
The human processor prefers to minimize the connected sequences of
linguistic forms and their conventionally associated syntactic and
semantic properties in which relations of combination and/or dependency
are processed. The degree of this preference is proportional to the
number of relations whose domains can be minimized in competing
sequences or structures, and to the extent of the minimization difference
in each domain.
A relation of combination is defined here as follows.
(11.3) Combination
Two categories A and B are in a relation of combination iff they occur
within the same mother phrase and maximal projections (phrasal
combination), or if they occur within the same lexical co-occurrence
frame (lexical combination).
Clever is in combination with student in clever student, since both are in the
same mother phrase (NP), read combines with the book in the VP read the
book, and so on. These phrasal combinations are defined by general phrase
structure rules. Subject and object arguments of a verb are in lexical combination
with that verb and with one another, and more generally the ‘complements’
of a verb are listed alongside that verb in its lexical entry.
For dependency I propose what is ultimately a processing definition:
Gradedness in the Processing of Syntax and Semantics 209

(11.4) Dependency
Two categories A and B are in a relation of dependency iff the
processing of B requires access to A for the assignment of syntactic
or semantic properties to B with respect to which B is zero-specified
or ambiguously or polysemously specified.
Theta-role assignment to an NP by reference to a verb can serve as an example
of a dependency of B on A. Co-indexation of a reflexive anaphor by reference
to an antecedent, and gap processing in relation to a filler, are others. Some
dependencies between A and B also involve combination (theta-role assignments,
for example), others do not.
A ‘domain’, as this term is used in this context, is defined in (11.5):
(11.5) Domain
A combinatorial or dependency domain consists of the smallest connected
sequence of terminal elements and their associated syntactic
and semantic properties that must be processed for production and/or
recognition of the combinatorial or dependency relation in question.
The domain sufficient for recognition of the VP and its three immediate
constituents (V, PP1, PP2) is shown in bold in the following sentence (cf. 11.3.1
below): the old lady counted on him in her retirement.
One prediction made by MiD (11.2) that will be significant here involves the
preferred adjacency of some categories versus others to a head of phrase:
(11.6) Adjacency to heads
Given a phrase {H, {X, Y}}, H a head category and X and Y phrases
that are potentially adjacent to H, then the more combinatorial and
dependency relations whose processing domains can be minimized
when X is adjacent to H, and the greater the minimization difference
between adjacent X and adjacent Y in each domain, the more H and X
will be adjacent.

11.3 Verbal complements and adjuncts


In a head-initial language like English there is a clear preference for short
phrases to precede longer ones, the short ones being adjacent to the relevant
head. This ‘weight effect’ has been analysed in Hawkins (1994, 1998, 2001) in
terms of the efficiency with which phrasal combinations can be parsed (the
theory of early immediate constituents, or EIC). This theory is now subsumed
under the more general theory of MiD (11.2).2
2 See also Gibson’s (1998) theory of ‘locality of syntactic dependencies’, which makes similar
predictions to minimize domains (11.2). See Hawkins (2004) for discussion of some differences
between the two approaches.

11.3.1 Relative weight


The immediate constituents (ICs) of a phrase can typically be recognized on
the basis of less than all the words dominated by that phrase, and some
orderings reduce the number of words needed to recognize these ICs, result-
ing in faster phrase structure recognition.
11.3.1.1 Head-initial structures Compare the alternations in (11.7a) and
(11.7b) involving two PPs following an intransitive verb (PP2>PP1 in
numbers of words):
(11.7) a. The man vp[waited pp1[for his son] pp2[in the cold but not
1 2 3 4 5
unpleasant wind]]
b. The man vp[waited pp2[in the cold but not unpleasant wind]
1 2 3 4 5 6 7 8
pp1[for his son] ]
9
The parser can construct the three ICs of VP, V, PP1, and PP2, in a domain of
five connected words in (11.7a), compared with nine in (11.7b), assuming that
head categories like P (or other constructing categories) immediately project
to mother nodes such as PP and render them predictable on-line.3 The greater
efficiency of (11.7a), in which the same structure can be derived from less
input, can then be captured in terms of its higher IC-to-word ratio within a
VP constituent recognition domain (CRD).4
(11.7’) a. VP: IC-to-word ratio = 3/5 or 60%
b. VP: IC-to-word ratio = 3/9 or 33%
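The ratios in (11.7’) can be computed mechanically. The following sketch (illustrative code, not part of the original analysis) assumes, as in the text, that each IC of a head-initial phrase is constructed by its first word:

```python
def pcd_words(ics):
    """Size of the constituent recognition domain for a head-initial phrase:
    every word of each IC except the last, plus the first word of the last
    IC (the category that constructs it, e.g. V or P)."""
    return sum(len(ic) for ic in ics[:-1]) + 1

def ic_to_word_ratio(ics):
    return len(ics) / pcd_words(ics)

# The VP of (11.7a) and (11.7b): V, PP1, PP2 in the two orders
short_before_long = [["waited"], ["for", "his", "son"],
                     ["in", "the", "cold", "but", "not", "unpleasant", "wind"]]
long_before_short = [["waited"],
                     ["in", "the", "cold", "but", "not", "unpleasant", "wind"],
                     ["for", "his", "son"]]

r_short = ic_to_word_ratio(short_before_long)  # 3/5 = 0.60, as in (11.7'a)
r_long = ic_to_word_ratio(long_before_short)   # 3/9 = 0.33, as in (11.7'b)
```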
Hawkins (2000) analysed a set of English structures taken from a corpus in
which exactly two PPs followed an intransitive verb and in which the PPs were
permutable with truth-conditional equivalence, that is the speaker had a
choice as in (11.7a) and (11.7b).5 Overall 82 per cent of the sequences with a
3 For detailed discussion of node construction and an axiom of constructability for phrase
structure, see Hawkins (1993) and (1994: ch. 6). See also Kimball (1973) for an early formulation of
the basic insight about phrasal node construction in parsing (his principle of New Nodes).
4 IC-to-word ratios are simplified procedures for quantifying what is technically an IC-to-nonIC
ratio, which measures the ratio of ICs to all other terminal and non-terminal nodes in the domain. For
explicit comparison of the two metrics, see Hawkins (1994: 69–83). For empirical testing of word-based
and (structural) node-based complexity metrics and a demonstration that the two are highly intercorrelated,
cf. Wasow (1997, 2002). See Lohse et al. (2004: 241) for a summary of, and references to,
some of the issues that arise in actually defining the ‘weight’ of a constituent.
5 The corpus of Hawkins (2000) consisted of 500 pages of written English (200 pages of fiction, 225
pages of non-fiction and 75 pages of a diary) and of eight distinct texts.

Table 11.1. English prepositional phrase orderings by relative weight

n = 323       PP2 > PP1 by 1 word   by 2–4      by 5–6     by 7+

[V PP1 PP2]   60% (58)              86% (108)   94% (31)   99% (68)
[V PP2 PP1]   40% (38)              14% (17)    6% (2)     1% (1)

PP2 = longer PP; PP1 = shorter PP
Proportion of short–long to long–short given as a percentage; actual numbers of sequences in parentheses
An additional 71 sequences had PPs of equal length (total n = 394)
Source: Hawkins 2000: 237

length difference were ordered short before long (265/323), the short PP being
adjacent to V, and the degree of the weight difference correlated precisely with
the degree of preference for the short before long order, as shown in Table 11.1.
As length differences increase, the efficiency (and IC-to-word ratio) of the
long-before-short structure (11.7b) decreases relative to (11.7a), and (11.7a) is
increasingly preferred, and predicted, by MiD (11.2).
The data of Table 11.1 are from (written) production. Similar preferences have
been elicited in production experiments by Stallings (1998) and Stallings et al.
(1998). Domain minimization can be argued to be beneficial for the speaker,
therefore, and not just an accommodation to the hearer’s parsing needs (cf.
Hawkins 2004). I shall accordingly relabel a CRD as a phrasal combination
domain (PCD), making it compatible with production and comprehension.
(11.8) Phrasal combination domain (PCD)
The PCD for a mother node M and its I(mmediate) C(onstituents)
consists of the smallest string of terminal elements (plus all
M-dominated non-terminals over the terminals) on the basis of which
the processor can construct M and its ICs.
EIC can be generalized to make it compatible with both as follows:
(11.9) Early immediate constituents (EIC)
The human processor prefers linear orders that minimize PCDs (by
maximizing their IC-to-word ratios), in proportion to the minimization
difference between competing orders.
11.3.1.2 Head-final structures For head-final languages EIC predicts a
mirror-image preference. Postposing a heavy NP or PP to the right in
English shortens PCDs. Preposing heavy constituents in Japanese has the
same effect, since the relevant constructing categories (V for VP, P for PP,
etc.) are now on the right (which is abbreviated here as VPm, PPm, etc.). In a
structure like [{1PPm, 2PPm} V] the PCD for VP will proceed from the first
P(ostposition) encountered to the verb, and will be smaller if the shorter
1PPm is adjacent to the verb than if the longer 2PPm is adjacent. The
preferred pattern overall should be long before short in Japanese, therefore,
and the degree of this preference should increase with increasing weight
differentials. In this way the time course from recognition of the first PP to
the VP-final verb will be faster than if the long PP is adjacent to V, following
the shorter PP.6
Consider some illustrative data collected by Kaoru Horie and involving
orderings of [{NPo, PPm} V], where NPo stands for a direct object NP
containing an accusative case particle o, in combination with a postpositional
phrase, that is with right-peripheral construction of PP by P (PPm).7 I assume
here that the o is the constructing category for this case-marked NP, paralleling
the final postposition in PP and the final V in VP, and that the processing
of VP proceeds bottom-up in an order such as [PPm NPo V]: daughter
constituents of PPm are processed before the PP itself is recognized (by
projection from P). The distance and time course from the first
constructing category (P or o) to V is then shorter when the longer
phrase precedes the shorter one. That is, [PPm NPo V] is preferred when
PPm>NPo, and [NPo PPm V] is preferred when NPo>PPm. An example of
the relevant sentence type is given in (11.10):
(11.10) Japanese
a. Tanaka ga [[Hanako kara] [sono hon o] katta]
Tanaka NOM Hanako from that book ACC bought
‘Tanako bought that book from Hanako’
b. Tanaka ga [[sono hon o] [Hanako kara] katta]
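The mirror-image computation for (11.10) can be sketched in the same way (illustrative code; the particle o is counted as a word here purely for illustration):

```python
def pcd_words_head_final(ics):
    """PCD for a head-final phrase: from the first constructing category
    encountered (the final word of the first IC, e.g. P or the particle o)
    through the final verb."""
    return 1 + sum(len(ic) for ic in ics[1:])

# (11.10b): longer NPo first, vs. (11.10a): shorter PPm first
long_first = [["sono", "hon", "o"], ["Hanako", "kara"], ["katta"]]
short_first = [["Hanako", "kara"], ["sono", "hon", "o"], ["katta"]]

d_long = pcd_words_head_final(long_first)    # 4 words: o Hanako kara katta
d_short = pcd_words_head_final(short_first)  # 5 words: kara sono hon o katta
```

Long-before-short yields the smaller domain (3 ICs in 4 words rather than 5), which is the preference the corpus counts below confirm.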
Table 11.2 presents the relative weights of the two non-subject phrases and
their correlated orderings. NPo and PPm are collapsed together for present
purposes. The data for individual [PPm NPo V] versus [NPo PPm V] orders
are presented in Table 11.5. Table 11.2 reveals that each additional word of
relative weight results in a higher proportion of long before short orders,
mirroring the short before long preference of Table 11.1. The overall preference
for long before short in Table 11.2 is 72 per cent (110/153) to 28 per cent short
before long (43/153). This long before short effect in Japanese has been
replicated in Yamashita and Chang (2001) and in Yamashita (2002). It can
also be seen in the widespread preposing preference for subordinate clauses
with final complementizers in this and other head-final languages.

6 For quantification of this Japanese preposing preference in relation to EIC, cf. Hawkins (1994: 80–1).
7 The Japanese corpus analysed by Kaoru Horie consisted of 150 pages of written Japanese
summarized in Hawkins (1994: 142), and of three distinct texts.

Table 11.2. Japanese NPo and PPm orderings by relative weight

n = 153         2ICm>1ICm by 1–2 words   by 3–4     by 5–8     by 9+

[2ICm 1ICm V]   66% (59)                 72% (21)   83% (20)   91% (10)
[1ICm 2ICm V]   34% (30)                 28% (8)    17% (4)    9% (1)

NPo = direct object NP with accusative case particle o
PPm = PP constructed on its right periphery by a P(ostposition)
ICm = either NPo or PPm
2IC = longer IC; 1IC = shorter IC
Proportion of long–short to short–long orders given as a percentage; actual numbers of sequences in
parentheses
An additional 91 sequences had ICs of equal length (total n = 244)
Source: Hawkins 1994: 152; data collected by Kaoru Horie

The preference for long before short in Japanese is not predicted by current
models of language production, which are heavily influenced by English-type
postposing effects. Yet it points to the same minimization preference for PCDs
that we saw in head-initial languages. For example, according to the incremental
parallel formulator of De Smedt (1994), syntactic segments are assembled
incrementally into a whole sentence structure, following message
generation within a conceptualizer. Short constituents can be formulated
with greater speed in the race between parallel processes and should accordingly
be generated first, before heavy phrases.
The theory of EIC, by contrast, predicts that short phrases will be formulated
first only in head-initial languages, and it defines a general preference for
minimal PCDs in all languages. The result: heavy ICs to the left and short ICs
to the right in head-final languages.

11.3.2 Complements versus adjuncts


The adjacency hypothesis of (11.6) predicts a preference for adjacency that is
proportional to the number of combinatorial and dependency relations
whose domains can be minimized in competing orders, and in proportion
to the extent of the minimization difference in each domain. This can be
tested by reexamining the data from the last section to see whether the
processing of additional relations between sisters and head has the predicted
effect on adjacency. Some of these data went against weight alone and had
non-minimal PCDs. Such orders deserve special scrutiny, since they are
predicted here to involve some other syntactic or semantic link whose processing
prefers a minimal domain.
PPs with intransitive verbs in English exhibit considerable variation with
respect to their precise relation to the verb. An important distinction that is
commonly drawn in the theoretical literature is between ‘complement’ PPs,
which are lexically listed alongside the verbs that govern them, and ‘adjunct’
PPs, which are not so listed and which are positioned by general syntactic
rules. The PP for John is a complement in wait for John, whereas in the evening
is an adjunct in wait in the evening.
The problem with this distinction is that there are several more primitive
properties that underlie it, and there are examples that fall in between.8 Some
complements are obligatorily required, others are optional, like adjuncts.
A transitive verb requires its complement object NP, and an intransitive
verb like depend requires a co-occurring PP headed by on (I depended on
John; contrast *I depended). The intransitive wait, on the other hand, is
grammatical without its PP complement (cf. I waited for John, and I waited).
The intransitive count is also grammatical without its PP complement headed
by on (I counted), but the meaning is different from that which is assigned in
the presence of the complement (I counted on John). The interpretation of the
preposition may also depend on the verb, even when the verb’s meaning is not
dependent on the preposition.
There are only a few intransitive verbs like depend that require a co-occurring
PP for grammaticality, but many like count on or wait for that involve a
dependency, so we might use dependency as the major criterion for distinguishing
complements from adjuncts. I will propose two dependency tests
here. They provide what is arguably a sufficient criterion for complementhood
and for the co-occurrence of a PP in the lexical co-occurrence frame of
an intransitive verb (cf. Hawkins 2000).
One test, the verb entailment test, asked: does [V, {PP1, PP2}] entail V alone
or does V have a meaning dependent on either PP1 or PP2? For example, if the
man waited for his son in the early morning is true, then it is also true that the
man waited, and so the former entails the latter. But the man counted on his
son in his old age does not entail the man counted. Another test, the pro-verb
entailment test, asked: Can V be replaced by some general pro-verb or does
one of the PPs require that particular V for its interpretation? For example, the
boy played on the playground entails the boy did something on the playground,
but the boy depended on his father does not entail the boy did something on his
father.
The results of applying these tests to the data of Table 11.1 were as follows.
When there was a lexical-semantic dependency between V and just one of the
PPs by one or both tests, then 73 per cent (151/206) had that PP adjacent to V.

8 See Schütze and Gibson (1999) and Manning (2003) for useful discussion of the complement/
adjunct distinction in processing and grammar.

Recall that 82 per cent had a shorter PP adjacent to V and preceding a longer
one in Table 11.1. For PPs that were both shorter and lexically dependent, the
adjacency rate to V was almost perfect, at 96 per cent (102/106). This combined
adjacency effect was statistically significantly higher than for lexical
dependency and EIC alone.
The processing of a lexical combination evidently prefers a minimal
domain, just as the processing of phrasal combinations does. This can be
explained as follows. Any separation of count and on his son, and of wait
and for his son, delays recognition of the lexical co-occurrence frame intended
for these predicates by the speaker and delays assignment of the verb’s
combinatorial and dependent properties. A verb can be, and typically is,
associated with several lexical co-occurrence frames, all of which may be
activated when the verb is processed (cf. Swinney 1979; MacDonald et al.
1994).
Accompanying PPs will select between them, and in the case of verbs like
count they will resolve a semantic garden path. For dependent prepositions,
increasing separation from the verb expands the domain and working memory
demands that are required for processing of the preposition.
We can define a lexical domain as follows:
(11.11) Lexical domain (LD)9
The LD for assignment of a lexically listed property P to a lexical item L
consists of the smallest string of terminal elements (plus their associated
syntactic and semantic properties) on the basis of which the
processor can assign P to L.

11.3.3 The interaction of LD and PCD processing


When lexical processing and phrase structure processing reinforce one
another, that is when the lexically listed PP is also the shorter PP, we have
seen a 96 per cent adjacency to the verb. When the two processing domains
pull in different directions, that is when the complement PP is longer, we
expect variation. What is predicted is a stronger effect within each domain in

9 I make fairly standard assumptions here about the properties that are listed in the lexicon.
They include: the syntactic category or categories of L (noun, verb, preposition, etc.); the syntactic
co-occurrence frame(s) of L, i.e. its ‘strict subcategorization’ requirements of Chomsky (1965) (e.g. V
may be intransitive or transitive; if intransitive it may require an obligatory PP headed by a particular
preposition, or there may be an optionally co-occurring PP headed by a particular preposition, etc.);
‘selectional restrictions’ imposed by L, Chomsky (1965) (e.g. drink requires an animate subject and
liquid object); syntactic and semantic properties assigned to the complements of L (e.g. the theta-role
assigned to a direct object NP by V); the different range of meanings assignable to L with respect to which
L is ambiguous or polysemous (the different senses of count and follow and run); and frequent
collocations of forms, whether ‘transparent’ or ‘opaque’ in Wasow’s (1997, 2002) sense.

proportion to the minimization difference between competing sequences. For
phrasal combinations this will be a function of the weight difference between
the two PPs, as measured by EIC ratios. For lexical domains it will be a
function of the absolute size of any independent PP (Pi) that intervenes
between the verb and an interdependent or matching PP. Let us abbreviate
a PP judged interdependent by one or both entailment tests as Pd. If Pi is a
short two-word phrase, the difference between [V Pd Pi] and [V Pi Pd] will be
just two words. But as Pi gains in size, the processing domain for the lexical
dependency between V and Pd grows, and the minimization preference for
[V Pd Pi] grows accordingly.
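The arithmetic in this paragraph can be sketched in a few lines of code. This is a simplified illustration of ours, not Hawkins' own formalization; the function names and the convention of counting the verb as one word are assumptions:

```python
# Lexical domain (LD) size in words, from the verb V through the end of the
# dependent PP (Pd). An intervening independent PP (Pi) lengthens the LD of
# [V Pi Pd] by its full size, so the minimization preference for [V Pd Pi]
# grows with the absolute size of Pi.

def lexical_domain(order, pd_len, pi_len):
    """Words spanned from V up to and including Pd (V counted as 1 word)."""
    if order == "V Pd Pi":       # Pd adjacent to V: Pi lies outside the LD
        return 1 + pd_len
    if order == "V Pi Pd":       # Pi intervenes and is included in the LD
        return 1 + pi_len + pd_len
    raise ValueError(f"unknown order: {order}")

def ld_differential(pd_len, pi_len):
    """Extra words the non-adjacent order costs for lexical processing."""
    return (lexical_domain("V Pi Pd", pd_len, pi_len)
            - lexical_domain("V Pd Pi", pd_len, pi_len))

# A two-word Pi costs two extra words; a six-word Pi costs six.
assert ld_differential(pd_len=3, pi_len=2) == 2
assert ld_differential(pd_len=3, pi_len=6) == 6
```

The differential always comes out as the size of Pi itself, which is why Table 11.3 sorts the data by the absolute length of the intervening phrase.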
EIC's graded weight preferences and predictions were confirmed in the data
of Table 11.1. For lexical dependency it is indeed the absolute size of the
potentially intervening Pi that determines the degree of the adjacency preference
between V and Pd, as shown in Table 11.3. As Pi grows in size, its
adjacency to V declines.10
Multiple preferences therefore have an additive adjacency effect by increasing
the number of processing domains that prefer minimization. They can
also result in exceptions to each preference when they pull in different
directions. Most of the fifty-eight exceptional long-before-short sequences
in Table 11.1 do indeed involve a dependency between V and the longer PP
(Hawkins 2000), applying in proportion to the kind of domain minimization
preference shown in Table 11.3. Conversely, V and Pd can be pulled apart by
EIC, in proportion to the weight difference between Pd and Pi. This is shown
in Table 11.4.
When Pi > Pd and both weight (minimal PCDs) and lexical dependency
prefer Pd adjacent to V, there is almost exceptionless adjacency (in the right-hand
columns). When weights are equal and exert no preference, there is a
strong (83 per cent) lexical dependency effect. When the two preferences
conflict and the dependent Pd is longer than Pi (in the left-hand columns),
EIC asserts itself in proportion to its degree of preference: for one-word
differentials lexical dependency claims the majority (74 per cent) adjacent to
V; for 2–4 word differentials the short-before-long preference wins by 67
per cent to 33 per cent; and for 5+ word differentials it wins by a larger margin
of 93 per cent to 7 per cent. EIC therefore asserts itself in proportion to its
degree of preference for minimal PCDs.

10 In corresponding tables cited in Hawkins (2000, 2001) I included five additional sequences,
making 211 in all, in which both PPs were interdependent with V, but one involved more dependencies
than the other. I have excluded these five here, resulting in a total of 206 sequences, in all of which one
PP is completely independent while the other PP is interdependent with V by at least one entailment
test.
Gradedness in the Processing of Syntax and Semantics 217

Table 11.3. English lexically dependent prepositional phrase orderings

n = 206      Pi = 2–3 words   4–5        6–7       8+

[V Pd Pi]    59% (54)         71% (39)   93% (26)  100% (32)
[V Pi Pd]    41% (37)         29% (16)   7% (2)    0% (0)

Pd = the PP that is interdependent with V by one or both entailment tests
Pi = the PP that is independent of V by both entailment tests
Proportion of adjacent V-Pd to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins' 2000 data

Table 11.4. Weight and lexical dependency in English prepositional phrase orderings

             Pd > Pi by                      Pd = Pi   Pi > Pd by
n = 206      5+        2–4       1                     1         2–4       5+

[V Pd Pi]    7% (2)    33% (6)   74% (17)    83% (24)  92% (23)  96% (49)  100% (30)
[V Pi Pd]    93% (28)  67% (12)  26% (6)     17% (5)   8% (2)    4% (2)    0% (0)

Pd = the PP that is interdependent with V by one or both entailment tests
Pi = the PP that is independent of V by both entailment tests
Proportion of adjacent V-Pd to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 2000: 247

For the Japanese data of Table 11.2, I predict a similar preference for complements
and other lexically co-occurring items adjacent to the verb, and a
similar (but again mirror-image) interaction with the long-before-short
weight preference. A transitive verb contracts more syntactic and semantic
relations with a direct object NP as a second argument or complement than it
does with a PP, many or most of which will be adjuncts rather than complements.
Hence a preference for NP adjacency is predicted, even when the NP is
longer than the PP, though this preference should decline with increasing
(relative) heaviness of the NP and with increasing EIC pressure in favour of
long before short phrases. This is confirmed in Table 11.5, where NP-V
adjacency stands at 69 per cent overall (169/244) and is as high as 62 per
cent for NPs longer than the PP by 1–2 words and 50 per cent for NPs longer by
3–4 words, that is, with short PPs before long NPs. Only for 5+ word differentials
is NP-V adjacency avoided in favour of a majority (79 per cent) of long
NPs before short PPs.
When EIC and complement adjacency reinforce each other in favour of
[PPm NPo V] in the right-hand columns, the result is significantly higher NP

Table 11.5. Weight and direct object adjacency in Japanese

             NPo > PPm by                    NPo = PPm  PPm > NPo by
n = 244      5+        3–4       1–2                    1–2       3–8       9+

[PPm NPo V]  21% (3)   50% (5)   62% (18)    66% (60)   80% (48)  84% (26)  100% (9)
[NPo PPm V]  79% (11)  50% (5)   38% (11)    34% (31)   20% (12)  16% (5)   0% (0)

NPo = see Table 11.2
PPm = see Table 11.2
Proportion of adjacent NPo-V to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 1994: 152; data collected by Kaoru Horie

adjacency (of 80 per cent, 84 per cent, and 100 per cent). When weights are
equal there is a strong (66 per cent) NP adjacency preference defined by the
complement processing preference alone. And when EIC and complement
adjacency are opposed in the left-hand columns, the results are split, as we
have seen, and EIC applies in proportion to its degree of preference. Table 11.5
is the mirror image of Table 11.4 with respect to the interaction between EIC
and lexical domain processing.
One further prediction that remains to be tested on Japanese involves the
PP-V adjacencies, especially those in the right-hand columns in which adjacency
is not predicted by weight. These adjacencies should be motivated by
strong lexical dependencies; that is, they should be lexical complements or
collocations in Wasow's (1997, 2002) sense, and more fine-tuned testing needs
to be conducted in order to distinguish different PP types here.

11.3.4 Total domain differentials

The existence of multiple domains for the processing of syntactic and semantic
properties requires a metric that can assess their combined effect within a
given structure. The principle of MiD (11.2) predicts tighter adjacency, the
more combinatorial and dependency relations there are, and in proportion to
the differences between competing domain sizes in the processing of each
relation. But can we make any more absolute predictions for when, for
example, the [V Pi Pd] variant will actually be selected over [V Pd Pi]?
The data in the left-hand columns of Table 11.4 suggest at first that weight is
a stronger preference than lexical dependency, since the latter claims a
majority (of 74 per cent) only when the resulting long-before-short ordering
involves a small one-word difference; for all other relative weights of 2+
words the majority of orders are short-before-long, in accordance with EIC,
with the dependent Pd non-adjacent to V.

But there is another possibility. Every PP, whether Pd or Pi, is at least
two words in length. The reason why a relative weight difference of one
word cannot generally assert itself in favour of the normally preferred short-before-long
order, when the longer PP is a Pd, could be that the size of
an intervening Pi will be longer (at 2+ words) than the relative weight
difference. Hence the LD differential would always exceed the PCD differential
when the latter stands at one word. As relative weights increase, the PCD
totals will gradually equal or exceed the absolute size of the Pi, and the overall
efficiency of the sequence will shift in favour of relative weight and minimal
PCDs.
In other words, we can make predictions for the relative strength of these
two factors, the syntactic (phrasal combination) and the lexical, by measuring
their respective domain minimizations in terms of words. We can then
formulate a selection prediction based on an assessment of the degree of
minimization that can be accomplished in each domain within each competing
sequence. Whichever sequence has the highest overall minimization will
be more efficient and should be the one selected. This approach makes the
(probably incorrect) assumption that word minimizations in phrasal combination
domains and in lexical domains count equally. But in the absence of
a good theory motivating why one should be stronger or weaker than the
other, it is worth exploring how far we can go without additional assumptions
and stipulations. In the event that a principled reason can eventually be given
for why word minimizations in one domain should exert a stronger influence
than those in another, this can be factored into the predictions. In the
meantime I shall run the tests assuming equality between different domains.
Let us define a total domain differential (TDD) as in (11.12) and an
associated prediction for performance in (11.13):
(11.12) Total domain differential (TDD)
        = the collective minimization difference between two competing
        sequences, measured in words and calculated on the basis of the
        phrasal combination domains, lexical domains, or other domains
        required for processing the syntactic or semantic relations within
        these sequences.
(11.13) TDD performance prediction
        Sequences with the highest collective minimization differences will be
        those that are preferably selected, in proportion to their relative TDDs.
For the data of Table 11.4 we are dealing with phrasal combination domains
(EIC effects) and lexical domains (established on the basis of the entailment
tests for lexical dependency). The TDD predictions can be set out as follows:

(11.14) TDD Predictions for Table 11.4
        For Pi > Pd    Only [V Pd Pi] preferred
           [V Pd Pi]   Both PCDs and LDs prefer
           [V Pi Pd]   Neither prefers
        For Pi = Pd    No PCD preference
           [V Pd Pi]   LD prefers (in proportion to Pi size)
           [V Pi Pd]   LD disprefers (in proportion to Pi size)
        For Pd > Pi    PCD and LD conflict
           [V Pd Pi]   LD preference ≥ PCD preference (i.e. the size of
                       Pi ≥ the weight difference)
           [V Pi Pd]   PCD preference ≥ LD preference (i.e. the weight
                       difference ≥ the size of Pi)
These predictions are straightforward when Pi > Pd and when weights are equal,
since a Pd adjacent to V is always preferred. But the two processing domains
compete when Pd is the longer phrase, in examples such as count [on the lost son
of my past] [in old age] versus count [in old age] [on the lost son of my past]. Pd has
seven words here and Pi three. The weight difference is four, and the absolute size
of Pi is three. This weight difference exceeds the size of Pi, so short before long
is predicted, that is [V Pi Pd]. When the weight difference is less than the size of
Pi, then [V Pd Pi] is preferred, for example count [on him] [in old age] (weight
difference = 1, Pi = 3). With weight differences equal to Pi, both orders are possible.
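The selection logic of these worked examples can be sketched as a small comparison function. The code is ours; it assumes, with the text, that word minimizations in phrasal and lexical domains count equally:

```python
# TDD-style order selection for [V Pd Pi] versus [V Pi Pd] (sketch).
# PCD minimization favours short-before-long in proportion to the weight
# difference between the two PPs; LD minimization favours Pd adjacent to V
# in proportion to the size of the intervening Pi.

def tdd_preference(pd_len, pi_len):
    """Return the order with the higher total minimization, or 'either'."""
    weight_diff = pd_len - pi_len      # > 0 means Pd is the longer PP
    if weight_diff <= 0:
        return "V Pd Pi"               # PCD and LD agree (or LD decides alone)
    if weight_diff > pi_len:           # PCD saving exceeds LD saving
        return "V Pi Pd"
    if weight_diff < pi_len:           # LD saving exceeds PCD saving
        return "V Pd Pi"
    return "either"                    # the two savings cancel out

# count [on the lost son of my past] [in old age]: Pd = 7, Pi = 3;
# the weight difference (4) exceeds the size of Pi (3).
assert tdd_preference(7, 3) == "V Pi Pd"
# count [on him] [in old age]: Pd = 2, Pi = 3; both domains prefer Pd first.
assert tdd_preference(2, 3) == "V Pd Pi"
```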
The results are set out in (11.15). I first give the total correct for the 206
sequences of Table 11.4 (a full 90 per cent), along with figures from the
remaining 10 per cent that come within one word per domain of being correct.
I then give separate figures for the conflict cases in which Pd > Pi. The success
rate here is 83 per cent, and this jumps to 97 per cent for sequences that come
within one word per domain of the preferred TDD.
(11.15) Results for Table 11.4
        Total with preferred orders = 90 per cent (185/206)
        Additional total within 1 word per domain of preferred TDD
        = 95 per cent (196/206)
        Total correct in conflict cases = 83 per cent (59/71)
        Additional total within 1 word per domain of preferred TDD
        = 97 per cent (69/71)
These figures provide encouraging support for this predictive multi-domain
approach to adjacency and relative ordering.11

11 See Hawkins (2004) for further illustration and testing of these total domain differential
predictions in a variety of other structural types.

11.4 Minimal domains in grammars

Grammatical conventions across languages reveal the same degrees of preference
for minimal phrasal combination domains that we saw in the performance
data of Section 11.3.1. For example, the Greenbergian word order
correlations show that the adjacency of lexical head categories is massively
preferred over their non-adjacency (Greenberg 1963; Hawkins 1983; Dryer
1992). EIC predicts these correlations (cf. Hawkins 1990, 1994). Two of them
are presented in (11.16) and (11.17), with IC-to-word ratios given for each
order in (11.16).12 Example (11.16) shows a correlation between verb-initial
order and prepositions, and between verb-final order and postpositions (i.e. phrases
corresponding to [went [to the store]] versus [[the store to] went]). Example
(11.17) shows one between prepositions and nouns preceding possessive
(genitive) phrases, and between postpositions and nouns following them (corresponding
to [in [books of my professor]] versus [[my professor of books] in]).
(11.16) a. vp[V pp[P NP]] = 161 (41%)       b. [[NP P]pp V]vp = 204 (52%)
           IC-to-word: 2/2 = 100%              IC-to-word: 2/2 = 100%
        c. vp[V [NP P]pp] = 18 (5%)         d. [pp[P NP] V]vp = 6 (2%)
           IC-to-word: 2/4 = 50%               IC-to-word: 2/4 = 50%
        Assume: V = 1 word; P = 1; NP = 2
        EIC-preferred (16a) + (16b) = 365/389 (94%)
(11.17) a. pp[P np[N Possp]] = 134 (40%)    b. [[Possp N]np P]pp = 177 (53%)
        c. pp[P [Possp N]np] = 14 (4%)      d. [np[N Possp] P]pp = 11 (3%)
        EIC-preferred (17a) + (17b) = 311/336 (93%)
The adjacency of V and P, and of P and N, guarantees the shortest possible
domain for the recognition and production of the two ICs in question (V and
PP within VP, P and NP within PP). Two adjacent words suffice, hence the 100
per cent IC-to-word ratios. In the non-adjacent domains of the (c) and (d)
orders, ratios are significantly lower and exemplifying languages are significantly
fewer. The preferred (a) and (b) structures collectively account for
94 per cent and 93 per cent of all languages respectively.
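The ratios in (11.16) follow directly from the assumed weights; as a minimal sketch (our formalization, with the function name assumed):

```python
# EIC's IC-to-word ratio: immediate constituents (ICs) recognized per word
# in the phrasal combination domain, which runs from the word constructing
# the mother phrase to the word constructing its last IC.

def ic_to_word_ratio(num_ics, domain_words):
    return num_ics / domain_words

# Weights as in (11.16): V = 1 word, P = 1, NP = 2.
# vp[V pp[P NP]]: V and P adjacent, so VP's two ICs are recognized in 2 words.
assert ic_to_word_ratio(2, 2) == 1.0     # 100%, orders (11.16a, b)
# vp[V [NP P]pp]: the 2-word NP precedes P, so the domain spans 4 words.
assert ic_to_word_ratio(2, 4) == 0.5     # 50%, orders (11.16c, d)
```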
Patterns like these have motivated the head-initial (or VO) and head-final
(OV) parameter in both typological and generative research; see, for example,

12 The quantitative data in (11.16) are taken from Matthew Dryer’s sample, measuring languages
rather than genera (see Dryer 1992, Hawkins 1994: 257). The quantitative data in (11.17) come from
Hawkins (1983, 1994: 259).

Vennemann (1974), Lehmann (1978), Hawkins (1983), and Travis (1984, 1989).
The two language types are mirror images of one another, and EIC provides
an explanation: both (a) and (b) are optimally efficient.
Grammatical conventions also reveal a preference for orderings in proportion
to the number of combinatorial and dependency relations whose processing
domains can be minimized (recall (11.6)). Complements prefer adjacency to
heads over adjuncts in the basic ordering rules of numerous phrases in English
and other languages, and are generated in a position adjacent to the head in the
phrase structure grammars of Jackendoff (1977) and Pollard and Sag (1987).
Tomlin's (1986) verb object bonding principle supports this. Verbs and direct
objects are regularly adjacent across languages, and there are languages in which
it is impossible or highly dispreferred for adjuncts to intervene between a verbal
head and its subcategorized object complement.
The basic reason I offer is that complements also prefer adjacency over
adjuncts in performance (cf. 11.3.3), and the explanation for this, in turn, is
that there are more combinatorial and/or dependency relations linking complements
to their heads than link adjuncts to their heads. Complements are
listed in a lexical co-occurrence frame defined by, and activated in on-line
processing by, a specific head such as a verb, and processing this co-occurrence
favours a minimal lexical domain (11.11). There are more productive relations
of semantic and syntactic interdependency between heads and complements
than between heads and adjuncts. A direct object receives its theta-role from
the transitive verb, and so on.
These considerations suggest that domain minimization has also shaped
grammars and the evolution of grammatical conventions, according to the
following hypothesis:
(11.18) Performance-grammar correspondence hypothesis (PGCH)
        Grammars have conventionalized syntactic structures in proportion
        to their degree of preference in performance, as evidenced by patterns
        of selection in corpora and by ease of processing in performance.
It follows from the PGCH that performance principles can often explain what
purely grammatical models can only stipulate, in this context adjacency
effects and the head ordering parameter. Significantly, they can also explain
exceptions to these stipulations, as well as many grammatically unpredicted
regularities. For example, Dryer (1992) has shown that there are systematic
exceptions to Greenberg's correlations ((11.16)/(11.17)) and to consistent head
ordering when the non-head is a single-word item, for example an adjective
modifying a noun (yellow book). Many otherwise head-initial languages have
non-initial heads here (English), while many otherwise head-final languages

have noun before adjective (Basque). But when the non-head is a branching
phrase, there are good correlations with the predominant head ordering
position. EIC can explain this asymmetry.
When a head category like N (book) has a branching phrasal sister like
Possp (of the professor) within NP, the distance from N to the head category P
or V that constructs the next higher phrase, PP or VP respectively, will be long
when head orderings are inconsistent; see, for example, (11.17c) and (11.17d). If
the intervening category is a non-branching single word, then the difference
between pp[P [Adj N]np] and pp[P np[N Adj]] is small, only one word.
Hence the MiD preference for noun initiality (and for noun finality in
postpositional languages) is significantly less than it is for intervening branching
sisters, and either less head ordering consistency or no consistency is
predicted. When there is just a one-word difference between competing
domains in performance, in for example Table 11.1, both ordering options
are generally productive, and so too in grammars.
Many such universals can be predicted from performance preferences,
including structured hierarchies of centre-embedded constituents and of
filler-gap dependencies, markedness hierarchies, symmetries versus asymmetries,
and many morphosyntactic regularities (Hawkins 1994, 1999, 2001, 2003,
2004).
A model of grammar that seems ideally suited to capturing this performance-grammar
correspondence is S(tochastic) O(ptimality) T(heory); cf.
Bresnan et al. (2001) and Manning (2003). These authors point to the performance
preference for first and second person subjects in English (I was hit by the
bus) over third person subjects (the bus hit me), which has been conventionalized
into an actual grammaticality distinction in the Salish language Lummi.
SOT models this by building performance preferences directly into the grammar
as a probability ranking relative to other OT constraints. For English
there is a partial overlap with other ranked constraints, and non-first person
subjects can surface as grammatical. In Lummi there is no such overlap, and
sentences corresponding to the bus hit me are not generated. In the process,
however, SOT stipulates a stochastic distribution between constraint rankings
within the grammar, based on observed frequencies in performance.
reasons not to do so.
First, SOT would then stipulate what is predictable from the performance
principle of MiD (11.2).
Second, the grammatical type of the syntactic and semantic relation in
question does not enable us to predict the outcome of the constraint

interaction. What matters is the size of the domain that a given relation
happens to require in a given sentence and its degree of minimization over
a competitor. One and the same grammatical relation can have different
strengths in different sentences (as a function of the weight differences
between sisters, for example). And phrasal combination processing can be a
stronger force for adjacency than lexical dependency in some sentences, but
weaker in others. In other words, it is processing, not grammar, that makes
predictions for performance, and it would be unrevealing to model this as an
unexplained stochastic distribution in a grammar when there is a principled
account in terms of MiD.
And third, I would argue that performance preferences have no place in a
grammar anyway, whose primary function is to describe the grammatical
conventions of the relevant language. To include them is to conflate, and confuse,
explanatory questions of grammatical evolution and synchronic questions
of grammaticality prediction. The soft constraint/hard constraint insight is
an important one, and it fits well with the PGCH (11.18), but hard constraints
can be explained without putting soft constraints into the same grammar,
and the soft constraints require a processing explanation, not a grammatical
one.

11.5 Conclusions
The data considered in this paper lead to the conclusions summarized in (11.1).
First, there are clear preferences among competing and grammatically
permitted structures in the corpus data of English (Tables 11.1, 11.3, and 11.4)
and Japanese (Tables 11.2 and 11.5). These preferences constitute a set of
'gradedness effects', and they can be argued to result from minimization
differences in processing domains; cf. minimize domains (11.2).
MiD defines a cumulative benefit for minimality when the same terminal
elements participate in several processing domains. The English intransitive
verbs and PPs that contract relations of both phrasal sisterhood and of lexical
combination exhibit 96 per cent adjacency; those that involve only one or the
other relation exhibit significantly less (cf. 11.3.3). The relative strengths in
these cases reflect the degree of minimization distinguishing competing
orders in the processing of each relation. These cumulative effects are captured
in a quantitative metric that measures total domain differentials (or
TDDs) across structures (11.12). The metric measures domain sizes in words,
but could easily be adapted to quantify a more inclusive node count, or a
count in terms of phrasal nodes only.

Experimental findings are generally correlated with corpus frequencies for
these ordering phenomena; see, for example, Stallings (1998), Wasow (2002),
and Yamashita and Chang (2001) for weight effects. Acceptability intuitions,
on the other hand, appear capable of detecting only the extremes of preference
and dispreference. My own judgements on English could not have
produced these preferences in advance, and disagreements among linguists
outside the extremes of clear grammaticality and ungrammaticality are
legion.
Second, domain minimization preferences in performance have also
shaped grammars and the evolution of grammatical conventions, according
to the performance-grammar correspondence hypothesis (11.18). Computer
simulations of the evolution of some of these conventions out of performance
preferences can be found in Kirby (1999).
Third, this kind of 'efficiency-based' theory of performance can give a
principled explanation for grammatical principles and universals, such as
head ordering, subjacency hierarchies, and numerous other phenomena. It
can explain exceptions to these stipulations, such as the absence of consistent
head ordering with single-word sisters of heads. And it can give a theoretical
reason for expecting some typological variants rather than others, by observing
patterns of preference in languages with variation, like the postverbal
orderings of English and the preverbal orderings of Japanese, and formulating
performance principles in conjunction with the PGCH.
I have argued that S(tochastic) O(ptimality) T(heory), despite its attractive
premise that hard constraints mirror soft constraints, stipulates the performance
preferences that need to be explained, and since they are not explained,
the explanation for grammars and grammatical variation is weakened. The
frequency data of this paper are patterned and principled. It remains to be
seen whether other stochastic distributions can be similarly explained, or
whether there are simply certain 'conventions of use' in different speech
communities that have to be learned.13 The person ranking constraint of
Bresnan et al. (2001) and Manning (2003) correlates with, and suggests an
explanation in terms of, degrees of accessibility (Ariel 1990) in noun phrase
references in performance.
Interesting test cases here would involve languages with similar typologies
but different distributions for, for example, passive versus active, or different
scrambling or extraposition frequencies. Relative or absolute values for these

13 See Hawkins (2003, 2004) for detailed discussion of frequency distributions and their grammatical
counterparts in numerous areas, in terms of minimize domains (11.2), in conjunction with two
further principles of efficiency: minimize forms and maximize on-line processing.

frequencies may not be predictable, but you will not know this if you look
only at the grammar of the preference (first and second person subjects are
preferred over third persons, etc.) instead of its processing. Should such
distributions turn out not to be predictable from efficiency principles of
performance, they should still not be included in a grammar, if their stochastic
ranking is not explainable by grammatical principles either, which it
almost certainly will not be.
Fourth and finally, we need a genuine theory of performance and of the
human processing architecture from which frequencies can be derived, and I
have argued (Hawkins 2004) that we do not yet have the kind of general
architecture that we need. MiD is an organizing principle with some predictiveness,
but it too must be derivable from this architecture. We also
need the best model of grammatical description we can get, incorporating
relevant conventions in languages that have them, and defining the differences
in grammaticality between languages in the best possible way. The further
ingredient that is ultimately needed in the explanatory package is a diachronic
model of adaptation and change, of the type outlined in Haspelmath (1999)
and Kirby (1999).
12

Probabilistic Grammars as Models of Gradience in Language Processing

MATTHEW W. CROCKER AND FRANK KELLER

12.1 Introduction
Gradience in language comprehension can be manifest in a variety of ways
and can have various sources of origin.1 Based on theoretical and empirical
results, one possible way of classifying such phenomena is whether they
arise from the grammaticality of a sentence, perhaps reflecting the relative
importance of various syntactic constraints, or arise from processing, namely
the mechanisms which exploit our syntactic knowledge for incrementally
recovering the structure of a given sentence. Most of the chapters in this
volume are concerned with the former: how to characterize and explain the
gradient grammaticality of a given utterance, as measured, for example, by
judgements concerning acceptability. While the study of gradient grammat-
icality has a long history in the generative tradition (Chomsky 1964, 1975),
recent approaches such as the minimalist programme (Chomsky 1995) do not
explicitly allow for gradience as part of the grammar.
In this chapter, we more closely consider the phenomena of gradient per-
formance: how can we explain the variation in processing difficulty, as reflected
for example in word-by-word reading times? Psycholinguistic research has
identified two key sources of processing difficulty in sentence comprehension:
local ambiguity and processing load. In the case of local, or temporary ambigu-
ity, there is abundant evidence that people adopt some preferred interpretation
immediately, rather than delaying interpretation. Should the corresponding
1 The authors would like to thank the volume editors, the anonymous reviewers, and also Marshall
Mayberry for their helpful comments. The authors gratefully acknowledge the support of the German
Research Foundation (DFG) through SFB 378 Project ‘Alpha’ awarded to the first author, and an
Emmy Noether fellowship awarded to the second author.

syntactic analysis be disconfirmed by the sentence's continuation, reanalysis is
necessary, and is believed to be an important contributor to observable difficulties
in processing. It is also the case, however, that processing difficulties are
found in completely unambiguous utterances, such as centre-embedded structures.
One explanation of such effects is that, despite being both grammatical
and unambiguous, such sentences require more cognitive processing resources
(such as working memory) than are available.
While these phenomena have been well studied, both empirically and theoret-
ically, there has been little attempt to model relative processing difficulty: why
some sentences are more difficult than others, and precisely how difficult they are.
Quantitative models, which can predict real-valued behavioural measures are
even less common. We argue, however, that one relatively new class of models
offers considerable promise in addressing this issue. The common distinguishing
feature of the models we discuss here is that they are experience-based. The central
idea behind experience-based models is that the mechanisms which people use
to arrive at an incremental interpretation of a sentence are crucially dependent on
relevant prior experience. Generally speaking, interpretations which are sup-
ported by our prior experience are preferred to those which are not. Furthermore,
since experience is generally encoded in models as some form of relative likeli-
hood, or activation, it is possible for models to generate real-valued, graded
predictions about the processing difficulty of a particular sentence.
We begin by reviewing some of the key psycholinguistic evidence motivating
the need for experience-based mechanisms, before turning to a discussion of
recent models. We focus our attention here on probabilistic models of human
sentence processing, which attempt to assign a probability to a given sentence,
as well as to alternative parse interpretations for that sentence. Finally, we will
discuss the relationship between probabilistic models of performance (gradi-
ent processing complexity), and probabilistic models of competence (gradient
grammaticality). A crucial consequence of the view we propose is that the
likelihood of a (partial) structure is only meaningful relative to the likelihood
of competing (partial) structures, and does not provide an independently
useful characterization of the grammaticality of the alternatives. Thus we
argue that a probabilistic characterization of gradient grammaticality should
be quite different from a probabilistic performance model.
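To make this idea concrete before turning to the evidence, a toy probabilistic context-free grammar can assign probabilities to competing parses. The rules, probabilities, and example below are our own illustration, not taken from any model discussed in this chapter:

```python
# Toy PCFG sketch: rule probabilities stand in for linguistic experience,
# and a parse's probability is the product of its rule probabilities.
from functools import reduce

rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")):  0.7,   # NP-complement frame, assumed more frequent
    ("VP", ("V", "S")):   0.3,   # S-complement frame, assumed less frequent
}

def parse_probability(derivation):
    """Multiply the probabilities of the rules used in a derivation."""
    return reduce(lambda p, rule: p * rules[rule], derivation, 1.0)

np_parse = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]
s_parse  = [("S", ("NP", "VP")), ("VP", ("V", "S"))]

# The NP-frame analysis wins; crucially, its probability matters only
# relative to the competing parse, not as an absolute grammaticality score.
assert parse_probability(np_parse) > parse_probability(s_parse)
```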

12.2 The role of experience in sentence processing


People are continually faced with the problem of resolving the ambiguities
that occur in the language they hear and read (Altmann 1998). Computational
theories of human language comprehension therefore place much emphasis
Probabilistic Grammars in Language Processing 229

on the algorithms for constructing syntactic and semantic interpretations,


and the strategies for deciding among alternatives, when more than one
interpretation is possible (Crocker 1999). The fact that people understand
language incrementally, integrating each word into their interpretation of the
sentence as it is encountered, means that people are often forced to resolve
ambiguities before they have heard the entire utterance. While it is clear that
many kinds of information are involved in ambiguity resolution (Gibson and
Pearlmutter 1998), much attention has recently been paid to the role of
linguistic experience. That is to say, to what extent do the mechanisms
underlying human language comprehension rely on previous linguistic
encounters to guide them in resolving an ambiguity they currently face?
During his or her lifetime, the speaker of a language accrues linguistic
experience. Certain lexical items are encountered more often than others,
some syntactic structures are used more frequently, and ambiguities are often
resolved in a particular manner. In lexical processing, for example, the
influence of experience is clear: high frequency words are recognized more
quickly than low frequency ones (Grosjean 1980), syntactically ambiguous
words are initially perceived as having their most likely part of speech
(Crocker and Corley 2002), and semantically ambiguous words are associated
with their more frequent sense (Duffy et al. 1988).
Broadly, we define a speaker’s linguistic experience with a given linguistic
entity as the number of times the speaker has encountered this entity in the
past. Accurately measuring someone’s linguistic experience would (in the
limit) require a record of all the text or speech that person has ever been
exposed to. The impracticality of this has led to alternative proposals for
approximating linguistic experience, such as norming experiments or corpus
studies.
Verb frames are an instance of linguistic experience whose influence on
sentence processing has been researched extensively in the literature. The
frames of a verb determine the syntactic complements it can occur with.
For example, the verb know can appear with a sentential complement
(S frame) or with a noun phrase complement (NP frame). Norming studies
can be conducted in which subjects are presented with fragments such as
(12.1) and complete them to form full sentences.
(12.1) The teacher knew —.
Subjects might complete the fragment using the answer (NP frame) or the
answer was false (S frame). Verb frame frequencies can then be estimated as
the frequencies with which subjects use the S frame or the NP frame (Garnsey
et al. 1997). An alternative to the use of completion frequencies is the use of
frequencies obtained in a free production task, where subjects are presented
only with a verb, and are asked to produce a sentence incorporating this verb
(Connine et al. 1984).
An alternative technique is to extract frequency information from a corpus,
a large electronic collection of linguistic material. A balanced corpus (Burnard
1995; Francis et al. 1982), which contains representative samples of both text
and speech in a broad range of genres and styles, is often assumed to provide
an approximation of human linguistic experience. In our examples, all
instances of know could be extracted from a corpus, counting how often the
verb occurs with the NP and the S frame.
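In computational terms, estimating frame frequencies from completion or corpus data reduces to counting and normalizing. The following sketch illustrates the idea; the `frame_frequencies` helper and the toy counts are invented for illustration, standing in for real corpus extraction or norming data:

```python
from collections import Counter

def frame_frequencies(observations):
    """Estimate relative frame frequencies from (verb, frame) observations,
    e.g. one pair per corpus occurrence or per completion in a norming study."""
    pair_counts = Counter(observations)
    verb_totals = Counter(verb for verb, _ in observations)
    return {(verb, frame): count / verb_totals[verb]
            for (verb, frame), count in pair_counts.items()}

# Toy data: six NP-frame and four S-frame occurrences of "know".
data = [("know", "NP")] * 6 + [("know", "S")] * 4
probs = frame_frequencies(data)
print(probs[("know", "NP")])  # 0.6
print(probs[("know", "S")])   # 0.4
```
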
Additionally, however, there is the issue of how experience is manifest in the
syntactic processing mechanism. A simple frequentist approach would mean
that all our experience has equal weight, whether an instance of exposure
occurred ten seconds ago, or ten years ago. This is true for the kinds of
probabilistic models we discuss here. Thus an interesting difference between
corpus estimates and norming studies is that the former approximates the
experience presented to a speaker, while the latter reflects the influence of that
experience on a speaker’s preferences. Results in the literature broadly indicate
that frame frequencies obtained from corpora and norming studies are reliably
correlated (Lapata et al. 2001; Sturt et al. 1999). It should be borne in mind,
however, that corpus frequencies vary as a function of the genre of the corpus
(Roland and Jurafsky (1998) compared text and speech corpora), and that verb
senses also play a role (Roland and Jurafsky 2002).
Once language experience has been measured using norming or corpus
studies, the next step is to investigate how the human language processor uses
experience to resolve ambiguities in real time. A number of studies have
demonstrated the importance of lexical frequencies. These frequencies can
be categorical (e.g. the most frequent part of speech for an ambiguous word,
Crocker and Corley 2002), morphological (e.g. the tendency of a verb to occur
in a particular tense, Trueswell 1996), syntactic (e.g. the tendency of a verb to
occur with a particular frame, as discussed above, Ford et al. 1982; Garnsey
et al. 1997; Trueswell et al. 1993), or semantic (e.g. the tendency of a noun to
occur as the object of a particular verb, Garnsey et al. 1997; McRae et al. 1998;
Pickering et al. 2000). It has been generally argued that these different types of
lexical frequencies form a set of interacting constraints that determine the
preferred parse for a given sentence (MacDonald 1994; MacDonald et al. 1994;
Trueswell and Tanenhaus 1994).
Other researchers (Brysbaert and Mitchell 1996; Mitchell et al. 1996) have
taken the stronger view that the human parser not only makes use of lexical
frequencies, but also keeps track of structural frequencies. This view, known
as the tuning hypothesis, states that the human parser deals with ambiguity by
initially selecting the syntactic analysis that has worked most frequently in the
past (see Figure 12.1).
The fundamental question that underlies both lexical and structural
experience models is the grain problem: What is the level of granularity at
which the human sentence processor ‘keeps track’ of frequencies? Does it
count lexical frequencies or structural frequencies (or both), or perhaps
frequencies at an intermediate level, such as the frequencies of individual
phrase structure rules? The latter assumption underlies a number of
experience-based models that are based on probabilistic context free

[Parse tree: Someone shot the servant of the actress who . . . ; the relative
clause (RC) can attach either to the higher NP the servant (preferred in
Spanish) or to the lower NP the actress (preferred in English).]

Figure 12.1 Evidence from relative clause (RC) attachment ambiguity has been
taken to support an experience-based treatment of structural disambiguation. Such
constructions are interesting because they do not hinge on lexical preferences. When
reading sentences containing the ambiguity depicted above, English subjects demon-
strate a preference for low-attachment (where the actress will be further described by
the RC who . . . ), while Spanish subjects, presented with equivalent Spanish sentences,
prefer high-attachment (where the RC concerns the servant) (Cuetos and Mitchell
1988). The Tuning Hypothesis was proposed to account for these findings (Brysbaert
and Mitchell 1996; Mitchell et al. 1996), claiming that initial attachment preferences
should be resolved according to the more frequent structural configuration. Later
experiments further tested the hypothesis, examining subjects’ preferences before and
after a period of two weeks in which exposure to high or low examples was increased.
The findings confirmed that even this brief period of variation in ‘experience’
influenced the attachment preferences as predicted (Cuetos et al. 1996)
grammars (see Figure 12.2 for details). Furthermore, at the lexical level, are
frame frequencies for verb forms counted separately (e.g. know, knew, knows,
. . . ) or are they combined into a set of total frequencies for the verb’s base
form (the lemma KNOW) (Roland and Jurafsky 2002)?

12.3 Probabilistic models of sentence processing


Theories of human syntactic processing have traditionally down-played the
importance of frequency (Fodor and Frazier 1978; Marcus 1980; Pritchett
1992), focusing rather on the characterization of more general, sometimes
language universal, processing mechanisms (Crocker 1996). An increasing
number of models, however, incorporate aspects of linguistic experience in
some form or other. This is conceptually attractive, as an emphasis on
experience may help to explain some of the rather striking, yet often unad-
dressed, properties of human sentence processing:
• Efficiency: The use of experience-based heuristics, such as choosing the
reading that was correct most often in the past, helps explain rapid and
seemingly effortless processing, despite massive ambiguity.
• Coverage: In considering the full breadth of what occurs in linguistic
experience, processing models will be driven to cover more linguistic
phenomena, and may look quite different from the toy models which are
usually developed.
• Performance: Wide-coverage experience-based models can offer an
explanation of how people rapidly and accurately understand most of
the language they encounter, while also explaining the kinds of pathologies
which have been the focus of most experimental and modelling research.
• Robustness: Human language processing is robust to slips of the tongue,
disfluencies, and minor ungrammaticalities. The probabilistic mechanisms
typically associated with experience-based models can often provide
sensible interpretations even in the face of such noise.
• Adaptation: The human language processor is finely tuned to the
linguistic environment it inhabits. This adaptation is naturally explained if
processing mechanisms are the product of learning from experience.
Approaches in the literature differ substantially in how they exploit linguistic
experience. Some simply permit heterogeneous linguistic constraints to have
‘weights’ which are determined by frequency (MacDonald et al. 1994; Tanen-
haus et al. 2000), others provide probabilistic models of lexical and syntactic
processing (Crocker and Brants 2000; Jurafsky 1996), while connectionist
models present yet a further paradigm for modelling experience (Christiansen
and Chater 1999, 2001; Elman 1991, 1993).
Crucially, however, whether experience is encoded via frequencies, prob-
abilities, or some notion of activation, all these approaches share the idea that
sentences and their interpretations will be associated with some real-valued
measure of goodness: namely how likely or plausible an interpretation is,
based on our prior experience. The appeal of probabilistic models is that they
acquire their parameters from data in their environment, offering a transpar-
ent relationship between linguistic experience and a model’s behaviour. The
probabilities receive a cognitive interpretation; typically a high probability is
assumed to correlate with a low processing effort. This suggests that the
human sentence processor will prefer the structure with the lowest processing
effort when faced with a syntactic ambiguity (see Figure 12.1 for an example).
Before considering probabilistic models of human processing in more detail,
we first quickly summarize the ideas that underlie probabilistic parsing.

12.3.1 Probabilistic grammars and parsing


A probabilistic grammar consists of a set of symbolic rules (e.g. context free
grammar rules) annotated with rule application probabilities. These prob-
abilities can then be combined to compute the overall probability of a
sentence, or for a particular syntactic analysis of a sentence. The rule prob-
abilities are typically derived from a corpus—a large, annotated collection of
text or speech. In cognitive terms, the corpus can be regarded as an approxi-
mation of the language experience of the user; the probabilities are a reflection
of language use; that is, they provide a model of linguistic performance.
Many probabilistic models of human sentence processing are based on the
framework of probabilistic context free grammars (PCFGs, see Manning and
Schütze 1999, for an overview). PCFGs augment standard context free grammars
by annotating grammar rules with rule probabilities. A rule probability expresses
the likelihood of the left-hand side of the rule expanding to its right-hand side. As
an example, consider the rule VP → V NP in Figure 12.2(a). This rule says that a
verb phrase expands to a verb followed by a noun phrase with a probability of 0.7.
In a PCFG, the probabilities of all rules with the same left-hand side have to
sum to one:

(12.2) ∀i: Σ_j P(N^i → ζ_j) = 1

where P(N^i → ζ_j) is the probability of a rule with the left-hand side N^i and
the right-hand side ζ_j. For example, in Figure 12.2(a) the two rules VP → V NP
and VP → VP PP share the same left-hand side (VP), so their probabilities
sum to one.
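This properness constraint can be checked mechanically. The sketch below encodes the PCFG of Figure 12.2(a) as (left-hand side, right-hand side, probability) triples (an illustrative representation, not a standard format) and verifies that the probabilities for each left-hand side sum to one:

```python
from collections import defaultdict

# The PCFG of Figure 12.2(a) as (lhs, rhs, probability) triples.
rules = [
    ("S", ("NP", "VP"), 1.0), ("PP", ("P", "NP"), 1.0),
    ("VP", ("V", "NP"), 0.7), ("VP", ("VP", "PP"), 0.3),
    ("NP", ("Det", "N"), 0.6), ("NP", ("NP", "PP"), 0.2),
    ("NP", ("John",), 0.2), ("V", ("hit",), 1.0),
    ("N", ("man",), 0.5), ("N", ("book",), 0.5),
    ("P", ("with",), 1.0), ("Det", ("the",), 1.0),
]

def check_proper(rules):
    """Constraint (12.2): rule probabilities sharing a left-hand side
    must sum to one (up to floating-point tolerance)."""
    totals = defaultdict(float)
    for lhs, _, p in rules:
        totals[lhs] += p
    return {lhs: abs(total - 1.0) < 1e-9 for lhs, total in totals.items()}

print(check_proper(rules))  # every left-hand side maps to True
```
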
234 Gradience in Syntax

The probability of a parse tree generated by a PCFG is computed as the


product of its rule probabilities:
(12.3) P(t) = ∏_{(N→ζ) ∈ R} P(N → ζ)

where R is the set of all rules applied in generating the parse tree t. It has been
suggested that the probability of a grammar rule models how easily this rule
can be accessed by the human sentence processor (Jurafsky 1996). Structures
with greater overall probability should be easier to construct, and therefore
preferred in cases of ambiguity. As an example consider the PCFG in Figure
12.2(a). This grammar generates two parses for the sentence John hit the
man with the book. The first parse t1 attaches the prepositional phrase with the
book to the noun phrase (low attachment), see Figure 12.2(b). The PCFG
assigns t1 the following probability, computed as the product of the probabil-
ities of the rules used in this parse:
(12.4) P(t1) = 1.0 × 0.2 × 0.7 × 1.0 × 0.2 × 0.6 × 1.0 × 1.0 × 0.5 × 1.0 × 0.6 × 1.0 × 0.5 = 0.00252
The alternative parse t2 , with the prepositional phrase attached to the verb
phrase (high attachment, see Figure 12.2(c)) has the following probability:
(12.5) P(t2) = 1.0 × 0.2 × 0.3 × 0.7 × 1.0 × 1.0 × 0.6 × 1.0 × 0.6 × 1.0 × 0.5 × 1.0 × 0.5 = 0.00378
Under the assumption that the probability of a parse is a measure of process-
ing effort, we predict that t2 (high attachment) is easier to process than t1 , as it
has a higher probability.
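The two products in (12.4) and (12.5) are easily verified; the following lines (a sketch; `math.prod` requires Python 3.8 or later) multiply out the rule probabilities of each parse:

```python
from math import prod

# Rule probabilities used in the two parses of "John hit the man with
# the book" under the PCFG of Figure 12.2(a).
t1_low = [1.0, 0.2, 0.7, 1.0, 0.2, 0.6, 1.0, 1.0, 0.5, 1.0, 0.6, 1.0, 0.5]
t2_high = [1.0, 0.2, 0.3, 0.7, 1.0, 1.0, 0.6, 1.0, 0.6, 1.0, 0.5, 1.0, 0.5]

p1, p2 = prod(t1_low), prod(t2_high)
print(f"P(t1) = {p1:.5f}")  # P(t1) = 0.00252
print(f"P(t2) = {p2:.5f}")  # P(t2) = 0.00378
assert p2 > p1  # high attachment is predicted to be easier
```
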
In applying PCFGs to the problem of human sentence processing, an
important additional property must be taken into account: incrementality.
That is, people face a local ambiguity as soon as they hear the fragment John
hit the man with . . . and must decide which of the two possible structures is

(a)
S → NP VP   1.0     NP → Det N   0.6     V → hit    1.0
PP → P NP   1.0     NP → NP PP   0.2     N → man    0.5
VP → V NP   0.7     NP → John    0.2     N → book   0.5
VP → VP PP  0.3     P → with     1.0     Det → the  1.0

Figure 12.2 An example of the parse trees generated by a probabilistic context free
grammar (PCFG). (a) The rules of a simple PCFG with associated rule application
probabilities. (b) and (c) The two parse trees generated by the PCFG in (a) for the
sentence John hit the man with the book.
(b)
[S 1.0 [NP 0.2 John] [VP 0.7 [V 1.0 hit] [NP 0.2 [NP 0.6 [Det 1.0 the] [N 0.5 man]] [PP 1.0 [P 1.0 with] [NP 0.6 [Det 1.0 the] [N 0.5 book]]]]]]

(c)
[S 1.0 [NP 0.2 John] [VP 0.3 [VP 0.7 [V 1.0 hit] [NP 0.6 [Det 1.0 the] [N 0.5 man]]] [PP 1.0 [P 1.0 with] [NP 0.6 [Det 1.0 the] [N 0.5 book]]]]]

Figure 12.2. Contd. The two parse trees are given in labelled bracket notation,
with each node's rule probability shown after its category label.


to be preferred. This entails that the parser is able to compute prefix prob-
abilities for sentence initial substrings, as the basis for comparing alternative
(partial) parses. Existing models provide a range of techniques for computing
and comparing such parse probabilities incrementally (Brants and Crocker
2000; Hale 2001; Jurafsky 1996). For the example in Figure 12.2, however, the
preference for t2 would be predicted even before the final NP is processed,
since the probability of that NP is the same for both structures.
Note that the move from CFGs to PCFGs also raises a number of other
computational problems, such as the problem of efficiently computing the
most probable parse for a given input sentence. Existing parsing schemes can
be adapted to PCFGs, including shift-reduce parsing (Briscoe and Carroll
1993) and left-corner parsing (Stolcke 1995). These approaches all use the basic
Viterbi algorithm (Viterbi 1967) for efficiently computing the best parse
generated by a PCFG for a given sentence.
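To make the Viterbi idea concrete, here is a minimal CKY-style chart parser for the grammar of Figure 12.2(a). This is an illustrative sketch, not any of the cited implementations; the names `lexical`, `binary`, and `viterbi_parse` are ours. Each chart cell stores, for every category, the probability of the best subtree spanning those words:

```python
from collections import defaultdict

# The PCFG of Figure 12.2(a), split into lexical and binary rules
# (the grammar is already in the binary form CKY requires).
lexical = {"John": [("NP", 0.2)], "hit": [("V", 1.0)], "the": [("Det", 1.0)],
           "man": [("N", 0.5)], "book": [("N", 0.5)], "with": [("P", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 0.7),
          ("VP", "VP", "PP", 0.3), ("NP", "Det", "N", 0.6),
          ("NP", "NP", "PP", 0.2), ("PP", "P", "NP", 1.0)]

def viterbi_parse(words):
    """Return the probability of the best S-rooted parse of `words`."""
    n = len(words)
    chart = defaultdict(dict)  # chart[i, j][cat] = best probability
    for i, w in enumerate(words):
        for cat, p in lexical.get(w, []):
            chart[i, i + 1][cat] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for a, b, c, p in binary:
                    if b in chart[i, k] and c in chart[k, j]:
                        score = p * chart[i, k][b] * chart[k, j][c]
                        if score > chart[i, j].get(a, 0.0):
                            chart[i, j][a] = score
    return chart[0, n].get("S", 0.0)

best = viterbi_parse("John hit the man with the book".split())
print(round(best, 5))  # 0.00378: the high-attachment parse wins
```

The chart keeps only the highest-scoring analysis per category and span, which is exactly the dynamic-programming step that makes Viterbi search efficient.
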

12.3.2 Probabilistic models of human behaviour


Jurafsky (1996) suggests using Bayes’ rule to combine structural probabilities
generated by a probabilistic context free grammar with other probabilistic
information. The model therefore integrates multiple sources of experience
into a single, mathematically founded framework. As an example consider
again the fragment in (12.1). When a speaker reads or hears know, he or she has
the choice between two syntactic readings, involving either an S complement
or an NP complement.
Jurafsky’s model computes the probabilities of these two readings based on
two sources of information: the overall structural probability of the S reading
and the NP reading, and the lexical probability of the verb know occurring
with an S or an NP frame. The structural probability of a reading is inde-
pendent of the particular verb involved; the frame probability, however, varies
with the verb. This predicts that in some cases lexical probabilities can
override structural probabilities.
Jurafsky’s model is able to account for a range of parsing preferences
reported in the psycholinguistic literature. However, it might be criticized
for its limited coverage, that is for the fact that it uses only a small lexicon and
grammar, manually designed to account for a handful of example sentences.
In the computational linguistics literature, on the other hand, broad coverage
parsers are available that compute a syntactic structure for arbitrary corpus
sentences with an accuracy of about 90 per cent (Charniak 2000). Psycholin-
guistic models should aim for similar coverage, which is clearly part of human
linguistic performance.
This issue has been addressed by Corley and Crocker’s (2000) broad
coverage model of lexical category disambiguation. Their approach uses a
bigram model to incrementally compute the probability that a string of words
w_0 … w_n has the part of speech sequence t_0 … t_n as follows:

(12.6) P(t_0 … t_n, w_0 … w_n) ≈ ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1})

Here, P(w_i | t_i) is the conditional probability of word w_i given the part of
speech t_i, and P(t_i | t_{i-1}) is the probability of t_i given the previous part of
speech t_{i-1}. This model capitalizes on the insight that many syntactic
ambiguities have a lexical basis, as in (12.7):
(12.7) The warehouse prices/makes —.
These fragments are ambiguous between a reading in which prices or makes is
the main verb or part of a compound noun. After being trained on a large
corpus, the model predicts the most likely part of speech for prices, correctly
accounting for the fact that people understand prices as a noun, but makes as a
verb (Crocker and Corley 2002; Frazier and Rayner 1987; MacDonald 1993).
Not only does the model account for a range of disambiguation preferences
rooted in lexical category ambiguity, it also explains why, in general, people
are highly accurate in resolving such ambiguities.
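A toy instantiation of this disambiguation under the bigram model in (12.6) might look as follows. The emission and transition probabilities below are invented for illustration; a real model estimates them from a large tagged corpus:

```python
# Invented toy parameters: P(word | tag) emissions and P(tag | previous
# tag) transitions. Not corpus estimates.
emit = {("prices", "N"): 0.7, ("prices", "V"): 0.3,
        ("makes", "N"): 0.1, ("makes", "V"): 0.9,
        ("warehouse", "N"): 1.0, ("the", "Det"): 1.0}
trans = {("Det", "N"): 0.8, ("N", "N"): 0.3, ("N", "V"): 0.5}

def sequence_prob(words, tags):
    """Score a tag sequence as in (12.6): emission for the first word,
    then transition times emission for each subsequent word."""
    p = emit.get((words[0], tags[0]), 0.0)
    for i in range(1, len(words)):
        p *= trans.get((tags[i - 1], tags[i]), 0.0) * \
             emit.get((words[i], tags[i]), 0.0)
    return p

# "prices" is preferred as a noun (compound-noun reading) ...
print(sequence_prob(["the", "warehouse", "prices"], ["Det", "N", "N"]) >
      sequence_prob(["the", "warehouse", "prices"], ["Det", "N", "V"]))  # True
# ... while "makes" is preferred as a main verb.
print(sequence_prob(["the", "warehouse", "makes"], ["Det", "N", "V"]) >
      sequence_prob(["the", "warehouse", "makes"], ["Det", "N", "N"]))   # True
```
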
More recent work on broad coverage parsing models has extended this
approach to full syntactic processing based on PCFGs (Crocker and Brants
2000). This research demonstrates that when such models are trained on
large corpora, they are not only able to account for human disambiguation
behaviour, but they are also able to maintain high overall accuracy under
strict memory and incremental processing restrictions (Brants and Crocker
2000).
Finally, it is important to stress that the kind of probabilistic models we
outline here emphasizes lexical and syntactic information in estimating the
probability of a parse structure. To the extent that a PCFG is lexicalized, with
the head of each phrase being projected upwards to phrasal nodes (Collins
1999), some semantic information may also be implicitly represented in the
form of word co-occurrences (e.g. head–head co-occurrences). In addition to
being incomplete models of interpretation, such lexical dependency probabil-
ities are poor at modelling the likelihood of plausible but improbable struc-
tures. Probabilistic parsers in their current form are therefore only
appropriate for modelling syntactic processing preferences. Probabilistic
models of human semantic interpretation and plausibility remain a largely
unexplored area of research.

12.3.3 Towards quantitative models of performance


So far, probabilistic models of sentence processing have only been used to
account for qualitative data about human sentence processing (e.g. to predict
whether a garden path occurs). By quantifying the likelihood of competing
structural alternatives, however, such models in principle offer hope for more
quantitative accounts of gradient behavioural data (e.g. to predict the strength
of a garden path). In general terms, this would entail that the probability
assigned to a syntactic structure is to be interpreted as a measure of the degree
of processing difficulty triggered by this structure. Gradient processing diffi-
culty in human sentence comprehension can be determined experimentally,
for example by recording reading times in self-paced reading studies or eye-
tracking experiments. An evaluation of a probabilistic model should therefore
be conducted by correlating the probability predicted by the model for a given
structure with reading times (and other indices of processing difficulty).
This new way of evaluating processing models raises a number of questions.
Most importantly, an explicit linking hypothesis is required, specifying which
quantity computed by the model would be expected to correlate with human
processing data. One possible measure of processing difficulty would be the
probability ratio of alternative analyses (Jurafsky 1996). That is, in addition to
predicting the highest probability parse to be the easiest, we might expect the
cost of switching to a less preferred parse to be correlated with the probability
ratio of the preferred parse with respect to the alternative.
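Under this linking hypothesis, the strength of a garden path would be estimated from the probability ratio; a minimal sketch (the `reanalysis_cost` name is ours), using the toy parse probabilities from (12.4) and (12.5):

```python
def reanalysis_cost(p_preferred, p_alternative):
    """One candidate linking hypothesis: the cost of switching away from
    the preferred parse grows with the probability ratio of the preferred
    parse to the alternative."""
    return p_preferred / p_alternative

# Predicted cost of abandoning the high-attachment parse (12.5) for the
# low-attachment one (12.4); yields a ratio of 1.5.
print(reanalysis_cost(0.00378, 0.00252))
```
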
Hale (2003) suggests an alternative, proposing that the word by word
processing complexity is dominated by the amount of information the
word contributes concerning the syntactic structure, as measured by entropy
reduction. Hale’s model is thus in stark contrast with the previous probabil-
istic parsing accounts, in that he does not assume that switching from a
preferred parse to an alternative is the primary determinant of processing
cost. To date, Hale’s model has been evaluated on rather different kinds of
structures than the probabilistic parsers discussed above. Reconciliation of the
probabilistic disambiguation versus entropy reduction approaches—and
their ability to qualitatively model reading time data—remains an interesting
area for future research.
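Entropy reduction can be sketched as follows. This is a deliberate simplification: Hale's proposal computes the entropy of the distribution over complete derivations licensed by the grammar, whereas here two hypothetical parse distributions before and after a word stand in for that computation:

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical distributions over complete analyses before and after a
# word is read; the word eliminates parse_c and sharpens the rest.
before = {"parse_a": 0.5, "parse_b": 0.3, "parse_c": 0.2}
after = {"parse_a": 0.9, "parse_b": 0.1}
reduction = max(0.0, entropy(before) - entropy(after))
print(round(reduction, 3))  # about 1.016 bits of predicted difficulty
```

A word that rules out many analyses produces a large entropy drop, and hence high predicted difficulty, even if no reanalysis in the disambiguation sense is involved.
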

12.3.4 Evidence against likelihood in sentence processing


Experience-based models often assume some frequency-based ambiguity
resolution mechanism, preferring the interpretation which has the highest
likelihood of being correct, namely the higher relative frequency. One well-
studied ambiguity is prepositional phrase attachment:
(12.8) John hit the man [PP with the book ].
Numerous on-line experimental studies have shown an overall preference for
high attachment, that is for the association of the PP with the verb (e.g. as the
instrument of hit) (Ferreira and Clifton 1986; Rayner et al. 1983). Corpus
analyses, however, reveal that low attachment (e.g. interpreting the PP as a
modifier of the man) is about twice as frequent as attachment to the verb
(Hindle and Rooth 1993). Such evidence presents a challenge for accounts
relying exclusively on structural frequencies, but may be accounted for by
lexical preferences for specific verbs (Taraban and McClelland 1988). Another
problem for structural tuning comes from three-site relative clause attach-
ments analogous to that in Figure 12.1, but containing an additional NP
attachment site:
(12.9) [high The friend] of [mid the servant] of [low the actress] [RC who was
on the balcony] died.
While corpus analyses suggest a preference for low > middle > high attach-
ment (although such structures are rather rare), experimental evidence sug-
gests an initial preference for low > high > middle (with middle being in fact
very difficult) (Gibson et al. 1996a, 1996b). A related study investigating noun
phrase conjunction ambiguities (instead of relative clauses) for such three-site
configurations revealed a similar asymmetry between corpus frequency and
human preferences (Gibson and Schütze 1999).
Finally, there is recent evidence against lexical verb frame preferences:
(12.10) The athlete realized [S [NP her shoes/goals ] were out of reach ].
Reading time studies have shown an initial preference for interpreting her
goals as a direct object (Pickering et al. 2000), even when the verb is more
likely to be followed by a sentence complement (see also Sturt et al. 2001, for
evidence against the use of such frame preferences in reanalysis). These
findings might be taken as positive support for the tuning hypothesis, since
object complements are more frequent than sentential complements overall
(i.e. independent of the verb). Pickering et al. (2000), building on previous
theoretical work (Chater et al. 1998), suggest that the parser may in fact still be
using an experience-based metric, but not one which maximizes likelihood
alone.

12.4 Probabilistic models of gradient grammaticality


As argued in detail in the previous section, probabilistic grammars can be
used to construct plausible models of human language processing, based on
the observation that the disambiguation decisions of the human parser are
guided by experience. This raises the question whether experience-based
models can also be developed for other forms of linguistic behaviour, such
as gradient grammaticality judgements. This issue will be discussed in this
section.

12.4.1 Probabilities versus degrees of grammaticality


We might want to conjecture that probabilistic models such as PCFGs can be
adapted so as to account for gradient grammaticality, with probabilities being
reinterpreted as degrees of grammaticality. The underlying assumption of
such an approach is that language experience (approximated by the frequen-
cies in a balanced corpus) not only determines disambiguation behaviour, but
also determines (or at least influences) the way speakers make grammaticality
judgements. The simplest model would be one where the probability of a
syntactic structure (as estimated from a corpus) is directly correlated with its
degree of grammaticality. This means that a speaker, when required to make a
grammaticality judgement for a given structure, will draw on his or her
experience with this structure to make this judgement. Manning (2003)
outlines a probabilistic model of gradient grammaticality that comes close
to this view. (However, he also acknowledges that such a model would have to
take the context of an utterance into account, so as to factor out linguistically
irrelevant factors, including world knowledge.)
Other authors take a more sceptical view of the relationship between
probability and grammaticality. Keller (2000b), for instance, argues that the
degree of grammaticality of a structure and its probability of occurrence in a
corpus are two distinct concepts, and it seems unlikely they can both be
modelled in the same probabilistic framework. A related point of view is put
forward by Abney (1996), who states that ‘[w]e must also distinguish degrees
of grammaticality, and indeed, global goodness, from the probability of
producing a sentence. Measures of goodness and probability are mathemat-
ically similar enhancements to algebraic grammars, but goodness alone does
not determine probability. For example, for an infinite language, probability
must ultimately decrease with length, though arbitrary long sentences may be
perfectly good’ (Abney 1996: 14). He also gives a number of examples for
sentences that have very improbable, but perfectly grammatical readings.
A similar point is made by Culy (1998), who argues that the statistical
distribution of a construction does not bear on the question of whether it is
grammatical or not.
Riezler (1996) agrees that probabilities and degrees of grammaticality are to
be treated as separate concepts. He makes this point by arguing that, if one
takes the notion of degree of grammaticality seriously for probabilistic gram-
mars, there is no sensible application to the central problem of ambiguity
resolution any more. A probabilistic grammar model cannot be trained so
that the numeric value assigned to a structure can function both as a well-
formedness score (degree of grammaticality) and as a probability to be used
for ambiguity resolution.
Keller and Asudeh (2002) present a similar argument in the context of
optimality theory (OT). They point out that if an OT grammar were to model
both corpus frequencies and degrees of grammaticality, then this would entail
that the grammar incorporates both performance constraints (accounting for
frequency effects) and competence constraints (accounting for grammatical-
ity effects). This is highly undesirable in an OT setting, as it allows the
crosslinguistic re-ranking of performance and competence constraints.
Hence such a combined competence/performance grammar predicts that
crosslinguistic differences can be caused by performance factors (e.g. memory
limitations). Clearly, this is a counterintuitive consequence.
A further objection to a PCFG approach to gradient grammaticality is
that assigning probabilities to gradient structures requires the grammar to
contain rules used in ‘ungrammatical’ structures. It might not be plausible to
assume that such rules are part of the mental grammar of a speaker. However,
any realistic grammar of naturally occurring language (i.e. a grammar that
covers a wide range of constructions, genres, domains, and modalities) has
to contain a large number of low-frequency rules anyway, simply in order
to achieve broad coverage and robustness. We can therefore assume that
these rules are also being used to generate structures with a low degree of
grammaticality.

12.4.2 Probabilistic grammars and gradient acceptability data


The previous section reviewed a number of arguments regarding the rela-
tionship between probabilities (derived from corpora) and degrees of gram-
maticality. However, none of the authors cited offers any experimental results
(or corpus data) to support their position; the discussion remains purely
conceptual. A number of empirical studies have recently become available to
shed light on the relationship between probability and grammaticality.
Keller (2003) studies the probability/grammaticality distinction based on a
set of gradient acceptability judgements for word order variation in German.
The data underlying this study were gathered by Keller (2000a), who used an
experimental design that crossed the factors verb order (initial or final),
complement order (subject first or object first), pronominalization, and
context (null context, all focus, subject focus, and object focus context).
Eight lexicalizations of each of the orders were judged by a total of fifty-one
native speakers using a magnitude estimation paradigm (Bard et al. 1996). The
results show that all of the experimental factors have a significant effect on
judged acceptability, with the effects of complement order and pronominali-
zation modulated by context. A related experiment is reported by Keller
(2000b), who uses ditransitive verbs (i.e. complement orders including an
indirect object) instead of transitive ones.
Keller (2003) conducts a modelling study using the materials of Keller
(2000a) and Keller (2000b), based on the syntactically annotated Negra
corpus (Skut et al. 1997). He trains a probabilistic context-free grammar on
Negra and demonstrates that the sentence probabilities predicted by this
model correlate significantly with acceptability scores measured experimen-
tally. Keller (2003) also shows that the correlation is higher if a more sophis-
ticated lexicalized grammar model (Carroll and Rooth 1998) is used.
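As an informal illustration of what such a model computes: a PCFG assigns a sentence the probability of its derivation, the product of the probabilities of the rules used. Sentence (log) probabilities of this kind are what get correlated with acceptability scores. The grammar and derivation below are invented toy examples, not the Negra-trained model of Keller (2003):

```python
import math

# Toy PCFG (rule -> probability). Purely illustrative: NOT the Negra-trained
# grammar of Keller (2003), just a sketch of what such a model computes.
pcfg = {
    ("S",   ("NP", "VP")): 1.0,
    ("NP",  ("she",)):     0.6,
    ("NP",  ("Det", "N")): 0.4,
    ("VP",  ("V", "NP")):  0.7,
    ("VP",  ("V",)):       0.3,
    ("Det", ("the",)):     1.0,
    ("N",   ("dog",)):     1.0,
    ("V",   ("saw",)):     1.0,
}

def derivation_logprob(rules):
    """Log probability of a derivation: the sum of log rule probabilities
    (equivalently, the log of the product of the rule probabilities)."""
    return sum(math.log(pcfg[r]) for r in rules)

# Derivation of "she saw the dog":
deriv = [
    ("S",   ("NP", "VP")),
    ("NP",  ("she",)),
    ("VP",  ("V", "NP")),
    ("V",   ("saw",)),
    ("NP",  ("Det", "N")),
    ("Det", ("the",)),
    ("N",   ("dog",)),
]
logp = derivation_logprob(deriv)
print(round(logp, 3))  # -1.784, i.e. ln(1.0 * 0.6 * 0.7 * 1.0 * 0.4 * 1.0 * 1.0)
```

In a modelling study of this kind, log probabilities computed for each experimental item would then be correlated (e.g. by Pearson's r) with the mean acceptability scores.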
This result is not incompatible with the claim that there is a divergence
between the degree of acceptability of a sentence and its probability of
occurrence, as discussed in the previous section. The highest correlation
Keller (2003) reports is .64, which corresponds to 40 per cent of the variance
accounted for. However, this is achieved on a data set (experiment 1) which
contains a contrast between verb final (fully grammatical) and verb initial
(fully ungrammatical) sentences; it is not surprising that a PCFG trained on a
corpus of fully grammatical structures (but not on ungrammatical ones) can
make this distinction and thus achieves a fairly high correlation. On a corpus
of only verb final structures that show relatively small differences in accept-
ability (experiment 2), a much lower (though still significant) correlation of
.23 is achieved. This means that the PCFG only models 5 per cent of the
variance. In other words, Keller’s (2003) results indicate that the degree of
grammaticality of a sentence is largely determined by factors other than its
probability of occurrence (at least as modelled by a PCFG).
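The 'variance accounted for' figures follow from squaring the correlation coefficient (the coefficient of determination, r squared). A quick check of the two values reported:

```python
# Proportion of variance accounted for is the square of the correlation r.
for r in (0.64, 0.23):
    print(f"r = {r}: {r**2:.1%} of variance explained")
# r = 0.64: 41.0% of variance explained
# r = 0.23: 5.3% of variance explained
```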
A related result is reported by Kempen and Harbusch (2004), who again
deal with word order variation in German. They compare twenty-four word
orders obtained by scrambling the arguments of ditransitive verbs (all
possible argument permutations, with zero or one of the arguments pronominalized).
Frequencies were obtained for these twenty-four orders from two
written corpora and one spoken corpus and compared against gradient
grammaticality judgements from Keller’s (2000b) study. The results are sur-
prising in that they show that there is much less word order variation than
expected; just four orders account for the vast majority of corpus instances.
Furthermore, Kempen and Harbusch (2004) demonstrate what they term the
frequency–grammaticality gap: all the word orders that occur in the corpus
are judged as highly grammatical, but some word orders that never occur in
the corpus nevertheless receive grammaticality judgements in the medium
range. This result is consistent with Keller’s (2003) finding: it confirms that
there is only an imperfect match between the frequency of a structure and its
degree of grammaticality (as judged by a native speaker). Kempen and
Harbusch (2004) explain the frequency–grammaticality gap in terms of
sentence production: they postulate a canonical rule that governs word
order during sentence production. The judgement patterns can then be
explained with the additional assumption that the participants in a gram-
maticality judgement task estimate how plausible a given word order is as the
outcome of incremental sentence production (governed by the canonical
rule).
Featherston (2004) presents another set of data that sheds light on the
relationship between corpus frequency and grammaticality. The linguistic
phenomenon he investigates is object co-reference for pronouns and reflexives
in German (comparing a total of sixteen co-reference structures, e.g. ihn_i
ihm_i ‘him.ACC him.DAT’, ihn_i sich_i ‘him.ACC REFL.DAT’). In a corpus study,
Featherston (2004) finds that only one of these sixteen co-reference structures
is reasonably frequent; all other structures occur once or zero times in the
corpus. Experimentally obtained grammaticality data show that the most
frequent structure is also the one with the highest degree of grammaticality.
However, there is a large number of structures that also receive high (or
medium) grammaticality judgements, even though they are completely
absent in the corpus. This result is fully compatible with the frequency–
grammaticality gap diagnosed by Kempen and Harbusch (2004). Like them,
Featherston (2004) provides an explanation in terms of sentence production,
but one that assumes a two-stage architecture. The first stage involves the
cumulative application of linguistic constraints, the second stage involves
the competitive selection of a surface string. Grammaticality judgements are
made based on the output of the first stage (hence constraint violations are
cumulative, and there are multiple output forms with a similar degree of
grammaticality). Corpus data, on the other hand, are produced as the output
of the second stage (hence there is no cumulativity, and only a small number
of optimal output forms can occur).

12.5 Conclusion
There is clear evidence for the role of lexical frequency effects in human
sentence processing, particularly in determining lexical category and verb
frame preferences. Since many syntactic ambiguities are ultimately lexically
based, direct evidence for purely structural frequency effects, as predicted by
the tuning hypothesis, remains scarce (Jurafsky 2002).
Probabilistic accounts offer natural explanations for lexical and structural
frequency effects, and a means for integrating the two using lexicalized
techniques that exist in computational linguistics (e.g. Carroll and Rooth
1998; Charniak 2000; Collins 1999). Probabilistic models also offer good
scalability and a transparent representation of symbolic structures and their
likelihood. Furthermore, they provide an inherently gradient characterization
of sentence likelihood, and the relative likelihood of alternative interpret-
ations, promising the possibility of developing truly quantitative accounts of
experimental data.
More generally, however, experience-based models not only offer an
account of specific empirical facts, but can more generally be viewed as
rational (Anderson 1990). That is, their behaviour typically resolves ambigu-
ity in a manner that has worked well before, maximizing the likelihood of
correctly understanding ambiguous utterances. This is consistent with the
suggestion that human linguistic performance is indeed highly adapted to
its environment and to the task of rapidly and correctly understanding language
(Chater et al. 1998; Crocker to appear). It is important to note, however, that
such adaptation based on linguistic experience does not necessitate mechan-
isms which are strictly based on frequency-based estimations of likelihood
(Pickering et al. 2000). Furthermore, different kinds and grains of frequencies
may interact or be combined in complex ways (McRae et al. 1998).
It must be remembered, however, that experience is not the sole determin-
ant of ambiguity resolution behaviour (Gibson and Pearlmutter 1998). Not
only are people clearly sensitive to immediate linguistic and visual context
(Tanenhaus et al. 1995), some parsing behaviours are almost certainly deter-
mined by alternative processing considerations, such as working memory
limitations (Gibson 1998). Any complete account of gradience in sentence
processing must explain how frequency of experience, linguistic and
non-linguistic knowledge, and cognitive limitations are manifest in the
mechanisms of the human sentence processor.
An even greater challenge to the experience-based view is presented by
gradient grammaticality judgements. A series of studies is now available that
compares corpus frequencies and gradient judgements for a number of
linguistic phenomena (Featherston 2004; Keller 2003; Kempen and Harbusch
2004). These studies indicate that there is no straightforward relationship
between the frequency of a structure and its degree of grammaticality, which
indicates that not only experience, but also a range of processing mechanisms
(most likely pertaining to sentence production) have to be invoked in order to
obtain a plausible account of gradient grammaticality data.
13
Degraded Acceptability and Markedness in Syntax, and the Stochastic Interpretation of Optimality Theory
RALF VOGEL

13.1 Introduction
Markedness plays a central role in optimality theoretic grammars in the form
of violable well-formedness constraints.1 Grammaticality is understood as
optimality relative to a constraint hierarchy composed of markedness constraints,
which evaluate different aspects of well-formedness, and faithfulness
constraints, which determine, by their definition and rank, which aspects of
markedness are tolerated, and which are not: grammaticality is dependent on
and derived from markedness.
An optimality grammar is an input–output mapping: marked features of
the input have a chance to appear in the output, if they are protected by highly
ranked faithfulness constraints. Optimal expressions might differ in their
markedness, which is reflected in the different constraint violation profiles
that these expressions are assigned by the grammar.
This chapter argues that violation profiles can be used to predict contrasts
among expressions in empirical investigations, and that markedness is the
grammar-internal correlate of (some) phenomena of gradedness that we

1 I want to thank my collaborators Stefan Frisch, Jutta Boethke, and Marco Zugck, without whom
the empirical research presented in this paper would not have been undertaken. For fruitful comments
and helpful suggestions I further thank Gisbert Fanselow, Caroline Féry, Doug Saddy, Joanna
Blaszczak, Arthur Stepanov, and the audiences of presentations of parts of this work at the Potsdam
University and the workshop on Empirical Syntax/WOTS 8 at the ZAS Berlin, August 2004. This work
has been supported by a grant from the Deutsche Forschungsgemeinschaft for the research group
‘Conflicting Rules in Language and Cognition’, FOR-375/2-A3.
experience in empirical studies. Contrary to other empirically oriented work
within optimality theory (OT), I claim that a standard OT grammar is already
well-suited for reflecting gradedness.
Section 13.2 reviews the discussion about gradedness and categoricity
within the tradition of generative grammar, especially in its relevance for
the competence/performance distinction. Section 13.3 introduces a particular
case of syntactic markedness: case conflicts in argument free relative constructions.
I present data from an experimental study and show how an OT
grammar interpreted in the way sketched above can predict the observed
gradient data. In Section 13.4, I compare this account with two alternative
ways of dealing with markedness within OT. Section 13.5 discusses stochastic
OT, an enhancement of standard OT, especially designed to deal with results
from more advanced empirical investigations.
A stochastic component as part of an OT syntax grammar is, on the one
hand, not necessary to derive empirical predictions. On the other hand, the
stochastic OT model, as it is applied to syntactic problems, has one serious
shortcoming: in stochastic OT, an expression is the winner of a competition
with a certain probability only. However, corpus frequencies not
only mirror how often a candidate wins, but also the frequency of the
competition itself. A rare structure might be a frequent winner of a rare
competition, or a rare winner of a frequent competition. The underlying
grammars in these seemingly indistinguishable cases would be radically
different.
I show in Section 13.6, with the results of a corpus study on German free
relative constructions, that these two kinds of frequencies can indeed be
observed, how they can be distinguished, and that they are both driven by
markedness, which should, therefore, be defined in an independent way. This
is also necessary to avoid the pitfalls of contradictory results from different
empirical methods. All methods create their own artefacts, and these should
not enter the grammar.

13.2 Gradedness and categoricity in generative syntax


Josefsson (2003) reports a survey that she did with about thirty Swedish native
speakers on the possibility of pronominal object shift in Swedish. She gave her
informants a five-point scale with the values ‘o.k. – ? – ?? – ?* – *’. For the
statistical analysis, she correlated the judgements with natural numbers ranging
from ‘o.k.’ = 4 to ‘*’ = 0. Josefsson further assumed that grammatical
sentences have at least an average acceptability value of 1.5. This decision is not
of particular importance in her analysis. However, it appears to be a purely
normative decision. She could as well have proposed 2.0 or 2.5 as the boundary.
How can such a decision be justified independently?
An answer to this question requires a theory of acceptability judgements.
Theoretical linguists rarely explicate their point of view on this. Interpreting
the ‘?’ as uncertainty could simplify the problems somewhat, as this allows the
assumption of a categorical grammar.
But we would still have to exclude that the gradedness that we observe
results from inherent properties of the grammar, instead of being the result of
‘random noise’. If, on the other hand, phenomena of gradedness are system-
atically correlated with grammatical properties, then the whole categorical
view on grammar is called into question. I think that this is indeed the case.
More recent variants of ‘explanations’ in terms of non-grammatical factors
attribute variation and gradedness in grammaticality judgements to ‘per-
formance’. Abney (1996) remarked that such a line of argumentation takes
the division between competence and performance more seriously than it
should be taken:

Dividing the human language capacity into grammar and processor is only a manner
of speaking, a way of dividing things up for theoretical convenience. It is naive to
expect the logical grammar/processor division to correspond to any meaningful
physiological division—say, two physically separate neuronal assemblies, one func-
tioning as a store of grammar rules and the other as an active device that accesses the
grammar-rule store in the course of its operation. And even if we did believe in a
physiological division between grammar and processor, we have no evidence at all to
support that belief; it is not a distinction with any empirical content. (Abney 1996: 12)

Gradedness can even be used as a criterion for determining whether a


constraint belongs to competence or performance: constraints that belong
to performance cause degraded acceptability, rather than plain ungrammat-
icality. This would immunize the competence/performance distinction
against any empirical counter-evidence, which would make even clearer that
the distinction is only made for theoretical convenience.
Manning (2003) argues along very much the same lines. Emphasizing, like
Abney, that the generative grammarian discourse centred around the notion
of competence is very limited in its scope, he calls for the application of
probabilistic methods in syntax:

Formal linguistics has traditionally equated structuredness with homogeneity . . . ,


and it has tried too hard to maintain categoricity by such devices as appeal to an
idealised speaker/hearer. . . . The motivation for probabilistic models in syntax comes
from two sides:
• Categorical linguistic theories claim too much. They place a hard categorical
boundary of grammaticality where really there is a fuzzy edge, determined by
many conflicting constraints and issues of conventionality versus human creativity.
[ . . . ]
• Categorical linguistic theories explain too little. They say nothing at all about the
soft constraints that explain how people choose to say things (or how they
choose to understand them). (Manning 2003: 296–7)

Sternefeld (2001) provides further arguments against the traditional competence/performance
distinction within generative grammar. One of the issues
he discusses is that structures with central embeddings are very often
degraded in their acceptability. The problem for the competence/performance
distinction is: why should the computational system of I-language, the competence
system, be able to produce structures that the parser is unable to
compute efficiently? Why is it impossible for the parser to use the computational
system of I-language in processing?
With Kolb (1997), Sternefeld claims that the description of I-language as a
‘computational system’ makes it impossible to distinguish it theoretically and
empirically from the ‘processing system’, that is performance. Both have the
same ontological status as generative, procedural systems. Sternefeld therefore
proposes with Kolb that competence should be understood as a declarative
axiomatic system, comparable to formal logics. Computational procedures,
however abstractly they may be conceived, are then part of the performance
system. A derivation can be seen as a proof for a particular structure, inter-
preted as a theorem of the algebraic system of I-language. The performance
system, however, includes not only these derivational procedures, but also, for
instance, all the psychological restrictions that are known to influence linguistic
behaviour, and anything else that is usually subsumed under the term
‘performance’.
A research programme that restricts itself to an investigation of competence
in this sense would not be able to formulate anything of empirical relevance.
In other words: the linguists’ focus of interest is and has always been
performance. Abney, Manning, and Sternefeld each argue from different
perspectives for abandoning the competence/performance distinction in
the traditional sense. In particular, they all show that it is useless for the
investigation of a number of empirical phenomena, including gradient
acceptability.
Perhaps the difficulty of relating the numerical, statistical results of psycholinguistic
experiments, corpus studies, or other more advanced empirical
methods to a categorical understanding of grammaticality is the reason why
the results of such empirical studies only rarely find their way into the
grammar-theoretical work of generative syntacticians.

13.3 Markedness in syntax


An important feature of many empirical methods is their relational way of
gathering data about linguistic structures. A typical design for a psycholinguistic
experiment uses minimal pairs. An example is the pair in (13.1): free
relative clauses (FR) are clauses that stand for non-clausal constituents. They
have the syntax of relative clauses, but lack a head noun. The initial wh-pronouns
of argument FRs are sensitive to the case requirements of both the
FR-internal verb and the matrix verb. When the two cases differ, we observe a
conflict: one of the two cases cannot be realized. This leads to ungrammaticality
in (13.1b):2
(13.1) Case matching in argument free relative clauses in German:
a. Wer uns hilft, wird uns vertrauen
[Who-nom us-dat helps]-nom will us-dat trust
‘Whoever helps us will trust us’
b. *Wer uns hilft, werden wir vertrauen
[Who-nom us-dat helps]-dat will we-nom trust
‘Whoever helps us, we will trust him’
Experiments usually test for contrasts between minimally different expressions.
In our example, the theory of case matching in argument free relative
clauses (Groos and van Riemsdijk 1981; Pittner 1991; Vogel 2001) is confirmed
if (13.1b) is judged as grammatical less often than (13.1a) to a statistically
significant degree. This is indeed the result of a speeded grammaticality
judgement experiment by Boethke (2005): structure (13.1b) was judged
acceptable significantly less often than (13.1a).3
This result is unproblematic for a categorical grammar. However, the
experiment contained two further conditions:

2 The case required by the matrix verb appears slanted and attached to the FR in the glosses.
3 For the sake of completeness, I will briefly describe the experiment design: each of the 24
participants—students of the University of Potsdam—saw eight items of each of the conditions.
Test items were FRs with the four possible case patterns with nominative and dative. The experiment
included four further conditions which will be introduced later—so the experiment had eight test
conditions altogether. The test items of this experiment have been randomized and mixed with the test
items of three other experiments which served as distractor items. The sentences have been presented
visually word by word on a computer screen, one word at a time, each word was presented for 400 ms.
Subjects were asked to give a grammaticality judgement by pressing one of two buttons for gram-
matical/ungrammatical, within a time window of 2,500 ms.
Table 13.1. Acceptability rates for the structures in (13.1) and (13.2) in the
experiment by Boethke (2005)

                          Case required by matrix verb:
Case of wh-phrase:        nominative         dative
nominative                87% (13.1a)        17% (13.1b)
dative                    62% (13.2b)        71% (13.2a)

(13.2) a. Wem wir helfen, werden wir vertrauen


[Who-dat we-nom help ]-dat will we-nom trust
‘Whoever we help, we will trust him’
b. Wem wir helfen, wird uns vertrauen
[Who-dat we-nom help ]-nom will us-dat trust
‘Whoever we help, he will trust us’
The acceptability rates for these two structures fall between those for the two
structures in (13.1). All contrasts except for the one between (13.2a) and (13.2b)
were statistically significant (see Table 13.1).
A categorical grammar has the problem of mapping this observation onto its
dichotomous grammaticality scale. How can we independently justify where
we draw the boundary? If we state that only (13.1b) is ungrammatical, then we
state that the observed contrast between (13.2b) and (13.1b) is crucial, but all
the others are not. Likewise, if we treat (13.2b) as grammatical, we ignore the
contrast between (13.2a) and (13.1b). No matter how we decide, the difficult
task is finding arguments for our decision to ignore some contrasts while
using others. But most importantly, there is no way of accounting for all
contrasts with the grammatical/ungrammatical dichotomy only.
This shows that the decision between a categorical or a gradient conception
of grammaticality is also an empirical matter.4 If empirical methods show an
intermediate acceptability status under such controlled conditions, it is very
likely that the factor that caused this intermediate status is grammar-internal.
At least, this should be the null assumption.

I want to emphasize that this experiment led to gradient acceptability (see below) without asking for it.
In questionnaire studies with multi-valued scales and experiments based on magnitude estimation
gradience is already part of the experimental design. One could argue that subjects only give gradient
judgements, because they have been offered this option. In the experiment described here, the
gradience results from intra- and inter-speaker variation among the test conditions in repeated
measuring.
4 Featherston (to appear) provides more arguments in favour of this position.
A theory of grammar that has the potential to deal with gradedness more
successfully is optimality theory (Prince and Smolensky 1993). It departs in a
number of ways from classical generative grammar. It is constraint-based,
which is not strikingly different, but the constraints are ranked and violable.
Different structures have different violation profiles.
One important departure from traditional grammars is that the grammaticality
of an expression cannot be determined for that expression in isolation.
An expression is grammatical if it is optimal. And it is optimal if it performs
better on the constraint hierarchy than all possible alternative expressions in a
competition for the expression of a particular underlying input form.
OT thus determines grammaticality in a relational manner. This is reminiscent
of what is done in the empirical investigations described above. It
should be possible to systematically relate observed gradedness to the relative
optimality of violation profiles.5
OT is based on two types of constraints, markedness and faithfulness
constraints. Markedness constraints evaluate intrinsic properties of candidates,
while faithfulness constraints evaluate how similar candidates are to a
given input. As there are infinitely many possible input specifications, there is
an equally rich set of competitions. Grammatical expressions are those
that win, that is, are optimal, in at least one of these competitions.
Candidates which are good at markedness, that is, relatively unmarked
candidates, are not as dependent on the assistance of faithfulness constraints
as relatively marked candidates are. This is schematically illustrated in Tables 13.2
and 13.3.

Table 13.2. Grammar with low ranked faithfulness

‘cand1’      M1   M2   F        ‘cand2’      M1   M2   F
+ cand1           *             + cand1           *    *
  cand2      *!        *          cand2      *!

Table 13.3. Grammar with highly ranked faithfulness

‘cand1’      F    M1   M2       ‘cand2’      F    M1   M2
+ cand1                *          cand1      *!        *
  cand2      *!   *             + cand2           *

M1, M2: markedness constraints; F: faithfulness constraint; ‘cand1’, ‘cand2’: input specifications; cand1, cand2:
output candidates; * = constraint violation; *! = fatal violation; + = winning candidate

5 The first author who explored this feature of OT systematically was Frank Keller (Keller 2000b,
and further work). See below for a brief discussion of his approach.
Candidate cand1 performs better than cand2 in the hierarchy of markedness
constraints ‘M1 ≫ M2’. Therefore, we can say that cand1 is less marked than
cand2 in the language at issue. This does not tell us anything about the
grammaticality of cand2, however. But we know that cand1 is grammatical,
irrespective of the grammaticality of cand2—provided, as we assume for the
sake of the example, that there are no further constraints and candidates to
consider. The faithfulness constraint F, if ranked low, cannot assist candidate
cand2, and so cand1 wins the competitions for both inputs ‘cand1’ and ‘cand2’.
Highly ranked faithfulness gives higher priority to input preservation, and
therefore cand2 wins its own competition.
Irrespective of the fact that both cand1 and cand2 are grammatical under
highly ranked faithfulness, we can still derive that cand2 is the more marked
structure from the violation profiles of the two structures, when we abstract
away from particular inputs, that is, leave out the faithfulness constraints.
An OT grammar, interpreted this way, not only tells us whether a structure is
grammatical, it also determines its relative markedness compared to other
structures. This second property is particularly interesting for the predictability
of gradience. Markedness can be seen as the correlate of gradience
within the OT grammar. Because markedness is one of the key concepts of
OT, nothing substantial needs to be added to account for gradience. In our
abstract example, the prediction would be that the less marked cand1 receives
higher acceptability, is easier to process, is more frequent, etc.
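Under strict constraint domination, this evaluation amounts to a lexicographic comparison of violation counts ordered by rank. A minimal sketch (the constraint names and violation profiles follow the abstract example of Tables 13.2 and 13.3; the code itself is purely illustrative):

```python
# OT evaluation as lexicographic comparison of violation profiles.
RANKING = ["M1", "M2", "F"]   # M1 >> M2 >> F (low ranked faithfulness)

def profile(violations, ranking):
    """Map a {constraint: count} dict to a tuple ordered by rank."""
    return tuple(violations.get(c, 0) for c in ranking)

def optimal(candidates, ranking):
    """The winner has the lexicographically smallest violation profile."""
    return min(candidates, key=lambda name: profile(candidates[name], ranking))

# Input 'cand2' under low ranked faithfulness (Table 13.2, right tableau):
# the faithful candidate cand2 fatally violates high-ranked M1, so the
# unfaithful but less marked cand1 wins.
competition = {
    "cand1": {"M2": 1, "F": 1},   # unmarked, but unfaithful to input 'cand2'
    "cand2": {"M1": 1},           # faithful, but marked
}
print(optimal(competition, RANKING))   # -> cand1

# Relative markedness: order candidates on the markedness constraints only,
# abstracting away from faithfulness.
markedness_only = ["M1", "M2"]
order = sorted(competition, key=lambda n: profile(competition[n], markedness_only))
print(order)                           # -> ['cand1', 'cand2']
```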
I will illustrate this with the linguistic example in (13.1) and (13.2). Simplifying
my own account (Vogel 2001, 2002, 2003b), we can assume the following
constraints to distinguish the four structures:
(13.3) a. Realize Case (RC):
An assigned case must be realized morphologically.
b. Realize Oblique (RO):
An assigned oblique case (e.g. dative) must be realized
morphologically.
c. Subject precedes Object (S<O):
The subject precedes the object(s) of its clause.
Under the ranking illustrated in Table 13.4, the violation profiles are such that
the relative markedness of the candidates matches the relative acceptabilities
given in Table 13.1.6
6 For the relative markedness of these four candidates the constraint RO is unnecessary. That it is
the constraint that excludes a candidate can be seen from the fact that (13.1b) does not improve if the FR is
postponed, which avoids a violation of S<O:
(i) *Wir werden vertrauen, wer uns hilft.
we-nom will trust [who-nom us-dat helps]-dat
‘We will trust whoever helps us’
Table 13.4. Comparison for the relative markedness of the structures in (13.1)
and (13.2)

Rank   Candidate   RO   RC   S<O
1.     (13.1a)
2.     (13.2a)               *
3.     (13.2b)          *
4.     (13.1b)     *    *    *

Groos and van Riemsdijk (1981), Pittner (1991), and Vogel (2001) offer three
different approaches to case conflicts in German FRs. Interestingly, these
authors also differ in the grammaticality judgements they report. In particular,
they agree that the two patterns in (13.4) are grammatical. Example (13.4a)
is a so-called ‘matching’ FR: both verbs assign the same case, accusative, and no
conflict arises. In (13.4b), two different cases are assigned, nominative and
accusative, but the wh-pronoun ‘was’ is ambiguous between these two cases, so the
FR is matching at the surface, and this is obviously sufficient.
(13.4) a. Ich lade ein, wen ich treffe
I invite [who-acc I-nom meet]-acc
‘I’ll invite whoever I meet’
b. Ich kaufe was mir gefällt
I buy [what-nom me-dat pleases]-acc
‘I’ll buy whatever pleases me’
While (13.5a, 13.5b) are both grammatical in Vogel’s (2001) dialect ‘German A’,
Pittner only classifies (13.5a) as grammatical (Vogel’s (2001) ‘German B’). Both
patterns in (13.5) are classified as ungrammatical by Groos and van Riemsdijk
(Vogel’s (2001) ‘German C’).7
(13.5) a. Ich lade ein, wem ich begegne
I invite [who-dat I-nom meet]-acc
‘I’ll invite whoever I meet’
b. Ich lade ein, wer mir begegnet
I invite [who-nom me-dat meets]-acc
‘I’ll invite whoever meets me’
Note also that, strictly speaking, we have no evidence for the contrast between (13.2a) and (13.2b),
because their acceptability rates (71% versus 62%) did not differ to a statistically significant degree. If
we interpret this result such that the two structures are equally marked, then RC and S<O would have
to be ranked on a par in order to mirror this in our model.
7 Note that (13.4b) and (13.5b) do not differ in the case conflict configuration. The wh-pronoun
‘was’ is ambiguous for nominative and accusative. It is therefore the correct realization for both of
these cases, and the case conflict is, obviously, resolved.
Pittner offers an explanation for the contrast she sees in (13.5) in terms of the
case hierarchy ‘nominative < accusative < dative, genitive, PP’: a case may
only be suppressed in favour of another case that is higher on the case
hierarchy; in particular, accusative can be suppressed in favour of dative,
but not in favour of nominative. In Vogel (2001), I capture this with the
following OT constraint:
(13.6) Realize Case (relativized) (RCr):
An assigned case must be realized morphologically by its case
morphology or that of a case that is higher on the case hierarchy.
I assume a further constraint that we may informally call ‘1To1’ here (cf. Vogel
2001):
(13.7) 1To1:
Case assigners and case assignees are in 1-to-1 correspondence.
The high rank of this constraint has the effect that FRs are disallowed, and lose
against an unfaithful candidate. In German, this unfaithful winner is a
structure that I call ‘correlative’ (CORR):
(13.8) Wer uns hilft, dem werden wir vertrauen
Who-nom us-dat helps that one-dat will we-nom trust
Here, the case conflict is avoided by the insertion of an additional resumptive
pronoun (‘dem’). The differences between the judgements given in the three
papers can be described in terms of OT grammars that use the same hierarchy
of markedness constraints, and differ only in the rank of faithfulness (see
Table 13.5).
If the rank of F is not absolutely determined, but allowed to vary between
RO and 1To1, then there is no need to assume that varying judgements result
from different grammars, as long as variation and gradedness are based on the
same hierarchy of markedness constraints. Faithfulness can be interpreted as a
'floating constraint' in the sense of Reynolds (1994) and Nagy and Reynolds
(1997).
Floating constraints are ranked within a particular range in the constraint
hierarchy. They are exceptional: constraints in general do not float. The
Table 13.5. Different rankings of faithfulness among identical subrankings of
markedness yield the three variants of German reported in the literature

RO > F > RCr > RC > 1To1     Vogel (2001), German A
RO > RCr > F > RC > 1To1     Pittner (1991)
RO > RCr > RC > F > 1To1     Groos and van Riemsdijk (1981)
256 Gradience in Syntax

motivation for the introduction of floating constraints is the same kind of
problem observed here: variation within a speech community, or within a
family of closely related dialects.
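For concreteness, the selection mechanism behind such rankings can be sketched in a few lines of Python: under a given ranking, the winner is the candidate whose violation profile is lexicographically smallest, reading constraints from highest-ranked to lowest. The two rankings are taken from Table 13.5; the violation profiles are invented for illustration and are not the chapter's actual tableaux.

```python
# Minimal sketch of classical OT evaluation: the winner is the candidate
# whose violation profile is lexicographically smallest under the ranking.
def evaluate(ranking, candidates):
    """ranking: list of constraint names, highest-ranked first.
    candidates: dict mapping candidate name -> {constraint: violations}."""
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

german_a = ["RO", "F", "RCr", "RC", "1To1"]   # Vogel (2001), German A
pittner  = ["RO", "RCr", "F", "RC", "1To1"]   # Pittner (1991)

# Hypothetical profiles for a case-conflicting FR input: the FR candidate
# violates RCr and 1To1, while the CORR candidate is merely unfaithful (F).
candidates = {"FR": {"RCr": 1, "1To1": 1}, "CORR": {"F": 1}}

print(evaluate(german_a, candidates))   # FR survives while F outranks RCr
print(evaluate(pittner, candidates))    # CORR wins once RCr outranks F
```

Reranking F below RCr is thus enough to flip the winner, which is all that distinguishes the grammars in Table 13.5.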
In our example, the variable rank of F can be interpreted as reflecting the
individually varying level of 'error tolerance' within a speech community. In
stochastic optimality theory, all constraints occupy a particular rank only
with a certain probability, and can potentially be floating. It therefore
provides even more flexibility to make adequate predictions for empirical
investigations. This approach is the topic of Section 13.5.

13.4 Markedness in OT
Markedness constraints do most of the crucial work in OT grammars. One
might object that markedness is only a reflection of typicality (just as one
anonymous reviewer did): a certain expression is degraded in acceptability
only because it is less frequently used or less prototypical. This objection does
not carry over to the phenomenon we are exploring here, case conflicts in
argument FRs. Most German native speakers agree that the following
grammaticality contrast holds:
(13.9) a. Ich besuche, [FR wem ich vertraue]
I visit-[acc] who-dat I trust
b. *Ich vertraue, [FR wen ich besuche]
I trust-[dat] who-acc I visit

This contrast could be confirmed very clearly in a speeded grammaticality
judgement experiment (Vogel and Frisch 2003). In a corpus investigation,
however, neither one of these structures could be found in our samples (Vogel
and Zugck 2003). The contrast in (13.9) is certainly not a typicality contrast:
the structures are too rare. In our studies, argument FRs without case
conflicts turned out to be both more frequent and more likely to be accepted
than conflicting structures such as those in (13.9). The source for the observed
contrasts lies in the expressions under examination: case conflicts are
problematic in themselves, and some conflicts are more problematic than others,
but they are a material property of the expressions, and thus represent
grammatical markedness in the very traditional sense.
Markedness shows up in two ways in an OT model, and these need to be
distinguished. First, every candidate, even the winner of an OT competition,
can violate markedness constraints. A comparison of the constraint violation
profiles only of structures that are optimal in some competition, as sketched
in Table 13.4, results in a relative markedness ranking of grammatical
structures. This is all we need to predict the relative frequencies of different
structures in a corpus, as ungrammatical structures do not occur in even
very large corpora to a measurable degree. It can be used in the same way to
predict relative acceptabilities in experiments. That an OT markedness
grammar outputs the correct relative acceptabilities/frequencies in such a
comparison could be a criterion for the empirical adequacy of a model.
Secondly, suboptimal structures, the losers of single OT competitions, are
more marked than the winners of these competitions. Many of these output
candidates do not win in any competition of a language. Of course, it is the
nature of OT that suboptimal structures also differ in their violation profiles.
In the same way as with winning structures, the profiles can be used to predict
the results of experiments which use these expressions. Keller (2000b, 2001)
relates suboptimality to degraded acceptability.

13.4.1 Relative markedness of winners


An understanding of markedness in the first sense underlies the common-sense
usage of this term in the linguistic literature. Expressions are classified as
grammatical, ungrammatical, and 'marked', which usually means, informally
speaking, 'not ungrammatical, but not perfect either', but rarely ever
'ungrammatical, but better than other ungrammatical structures'.8
The possibility of a comparison of all winners in the OT grammar of a
particular language with respect to their relative markedness is an important
feature that distinguishes OT from ordinary models of generative grammar.
There, all grammatical structures are equal in the sense that the only criterion
for grammaticality is the possibility of assigning them a well-formed
structural description. This notion of 'well-formedness' is not abandoned in OT.
All winning structures of single competitions are well-formed in this sense.
But the winning structures are not equal. They are assigned different violation
profiles by the OT grammar, and these, in principle, are accessible for
comparison. The result of such a comparison is a scale of relative markedness
which should ideally conform to the gradedness that we observe.
The first to exploit this idea to account for gradedness in syntax, as
far as I know, was Müller (1999). However, much of the crucial work in his
proposal is done by a subsystem of constraints which works differently in
accounting for grammaticality than in accounting for degraded acceptability.
That is, he uses slightly different grammars for the two tasks.

8 To me, this formulation even has the flavour of a logical contradiction: ungrammatical structures
can, by definition, not be better than other structures.

This is not the case in the proposal that I developed above. I only use
the constraint types that are already there, markedness and faithfulness
constraints. Faithfulness plays a crucial role in selecting the winners of single
competitions, but cannot, by definition, play a role in the relative comparison
of these winners, as they are winners for different inputs. Müller, by
contrast, selects the constraints that are responsible for gradedness in an
ad hoc manner from the set of markedness constraints.
In a similar vein, Keller (2001) and Büring (2001) propose differences
among markedness constraints. Roughly speaking, they should be
distinguished by the effect of their violation. Irrespective of their rank in the
constraint hierarchy, markedness constraints are claimed to differ in whether
their violation leads to ungrammaticality or only to degraded acceptability.
These three authors have in common that they propose that markedness in
the traditional sense must be added to the OT model as a further dimension of
constraint violation. They did not find a way of accounting for it within
standard OT. This is surprising insofar as the traditional conception of
markedness is the core of OT. However, I think I have shown a way out of
this dilemma that can do without these complications.

13.4.2 Markedness as suboptimality


Markedness in the second sense that I mentioned above, as an artefact of the
OT model, is a much more problematic concept, and one might wonder
whether it has or should have any empirical consequences. A single OT
competition only knows winners and losers. Müller (1999) already argues
against the conception of suboptimality proposed by Keller (cf. Keller 2000):
in many OT analyses, the second best candidate is simply the candidate that is
excluded last, and very often this candidate is plainly ungrammatical and
much worse than other candidates which have been excluded earlier.
Take the case of a candidate cand1 that is excluded early in competition A
only because of highly ranked faithfulness, but wins another competition B
that has the appropriate input. Such a structure would certainly be judged
better in an experiment than a candidate cand2 that is excluded late in both
competitions, but does not win any competition in the language at hand, and
is therefore ungrammatical.
Because of faithfulness, structures are assigned different violation profiles
in different competitions. Being suboptimal in one competition does not
mean being suboptimal in all competitions. Consider our example of free
relative constructions. As I showed above, a correlative structure such as the
one in (13.8) avoids the case conflict with an additional resumptive pronoun.
Therefore, CORR structures do not violate constraints on case realization.
Consequently, CORR structures are less marked than FR structures.9 But how
can an FR be grammatical at all, then? Simply because we specified in the
input that we want an FR structure, and highly ranked faithfulness rules out
the less marked CORR candidate, but only in this particular competition!
The CORR candidate still wins the competition where CORR is specified in
the input. The CORR structure performs worse than the FR structure in one
competition, but better in the other one. On which of these two contradicting
competitions shall we now base our empirical predictions?
A competition in an OT model is a purely technical device which should
not be identified with a comparison in a psycholinguistic experiment. The
only possible way to derive empirical predictions from a standard OT model,
also for the comparison of ungrammatical structures, seems to me to be the
meta-comparison for markedness sketched above, which abstracts away from
single competitions, and therefore from faithfulness.
A powerful enhancement of OT that tries to relate grammar theory and
empirical linguistics is stochastic optimality theory, which will be discussed in
the next section.

13.5 Stochastic optimality theory: how to make grammar fit observations
Stochastic OT has been developed by Boersma (1998b) and Boersma and
Hayes (2001). The most important difference from classical OT is that
constraints are ordered on an infinite numerical scale of 'strictness'. The relative
rank of constraints is expressed by their distance on this scale, rather than
simply by domination. The 'rank' of a constraint, furthermore, is not a fixed
value, but a probabilistic distribution. A constraint has a particular rank only
with a particular probability. At evaluation time, a certain amount of noise is
added; the probabilistic distributions of two constraints might overlap, and
the grammar can have different rankings at different times, although these
rankings might differ in their probabilities.
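This evaluation regime can be sketched as follows. The mean ranks, the noise standard deviation, and the violation profiles below are invented for illustration (the constraint names anticipate the voice example discussed shortly); they are not the values of any published analysis.

```python
# Sketch of stochastic OT evaluation: each constraint has a mean rank on a
# continuous scale; Gaussian noise is added at evaluation time, so
# constraints with overlapping distributions can swap order occasionally.
import random

def sample_ranking(means, sd=2.0, rng=random):
    noisy = {c: r + rng.gauss(0.0, sd) for c, r in means.items()}
    return sorted(noisy, key=noisy.get, reverse=True)  # highest rank first

def evaluate(ranking, candidates):
    return min(candidates,
               key=lambda n: tuple(candidates[n].get(c, 0) for c in ranking))

# Made-up mean ranks: *SPt outranks *SAg on average, but their noisy
# distributions overlap, so *SAg occasionally ends up on top.
means = {"*Obl1,2": 100.0, "*SPt": 97.0, "*S3": 92.0, "*SAg": 92.0}
candidates = {
    "active":  {"*SAg": 1, "*S3": 1},   # hypothetical 3 -> 3 clause
    "passive": {"*SPt": 1, "*S3": 1},
}

rng = random.Random(1)
trials = 10_000
passives = sum(
    evaluate(sample_ranking(means, rng=rng), candidates) == "passive"
    for _ in range(trials))
print(passives / trials)  # a small but non-zero passive rate
```

A categorical grammar corresponds to the limiting case where the distributions are too far apart ever to overlap in practice.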
The body of work on stochastic OT in syntax is still rather small, and most
of it has been carried out by Joan Bresnan and her group at Stanford
University. Let me introduce only one example. Bresnan et al. (2001) study
the influence of person features of agent and patient on the choice of voice in
English and Lummi. They analysed the parsed SWITCHBOARD corpus, a

9 This is also reflected in the typology of these two constructions. To the best of my knowledge, the
languages which have FR constructions are a proper subset of those that have CORR constructions, as
I also illustrated in my earlier work, cf. Vogel (2002).

Table 13.6. English person/role by voice (full passives) in the parsed
SWITCHBOARD corpus, from Bresnan et al. (2001)

Action        #Active   #Passive   Active %   Passive %
1,2 → 1,2         179          0      100.0         0.0
1,2 → 3         6,246          0      100.0         0.0
3 → 3           3,110         39       98.8         1.2
3 → 1,2           472         14       97.1         2.9

database of spontaneous telephone conversations spoken by over 500 American
English speakers. The analysis revealed the absence of full passives (with
by-phrases) when the agent of the transitive verb is first or second person,
whereas full passives with third person agents did occur, albeit in small numbers.
This difference, although numerically small, is statistically significant. Table
13.6 displays their figures.10
English exhibits as a tendency what a language like Lummi has as a
categorical rule: passives are avoided for structures with first and second
person agents, and they are more likely to occur with first and second person
patients than with third person patients.11 Observations of this sort are
evidence for a position that unites functional and formal linguistics within
optimality theory under the slogan of the 'stochastic generalization':12
The same categorical phenomena which are attributed to hard grammatical
constraints in some languages continue to show up as statistical preferences in other
languages, motivating a grammatical model that can account for soft constraints.
(Bresnan et al. 2001: 29)

Bresnan et al. (2001) show that stochastic OT 'can provide an explicit and
unifying theoretical framework for these phenomena in syntax'. The
frequencies of active and passive are interpreted as corresponding to the
probabilities of being the optimal output in a stochastic OT evaluation.
The most important constraints used in that account are *Obl1,2,
which is ranked highest and bans by-phrases with first and second person;
*SPt, which bans patients from being subjects, that is, penalizes passives; *S3,
which penalizes third person subjects; and *SAg, which penalizes agents as
subjects. The latter two constraints are ranked on a par and overlap a bit

10 In Table 13.6, the description of the Action is to be read as follows: '1,2 → 3' means that a first or
second person agent acts upon a third person patient.
11 In Lummi, sentences with first or second person objects and third person subjects are
ungrammatical. Likewise, passive is excluded if the agent is first or second person.
12 The effect described here can be achieved without stochastic enhancements, just by exploiting the
violation profiles in the way illustrated in Section 13.3.
with the higher ranked *SPt, which in turn overlaps a bit with the higher
ranked *Obl1,2.
The rarity of passives with first or second person agents is mirrored by the
high rank of *Obl1,2. Is it really the case that the rarity of passives with first and
second person by-phrases is the result of a grammatical constraint, or is it not
rather the result of the rarity of the communicative situations in which such a
passive would be appropriate? Not all instances of infrequency have a
grammatical cause. It seems that a constraint system that is designed to directly
derive frequency patterns runs into the danger of interpreting properties of
the 'world' as properties of the grammar. I will discuss this problem in more
detail below.13

13.6 A case study—continued


One problem for stochastic optimality theory that has often been noticed (cf.
Boersma, this volume) is that different tasks seem to require different
stochastic OT grammars. In particular, corpus frequencies and relative
acceptabilities might not always go hand in hand.
Our studies on German argument free relative clauses, introduced briefly
above, are another case in point. The experiment by Boethke (2005) included
eight conditions altogether: four FRs in the four different case configurations,
and their correlative counterparts.14
One prediction is that the correlative structures have a higher acceptability
rate than their FR counterparts, because they avoid case realization problems
with the additional resumptive pronoun. This expectation is met, as can be
seen in Table 13.7. All contrasts are statistically significant, except for the least
problematic context, nom-nom. This could be due to the fact that FRs in this
Table 13.7. Mean acceptabilities for FR and CORR in different case configurations
(in %)

nom-nom       dat-dat       dat-nom       nom-dat
FR   CORR     FR   CORR     FR   CORR     FR   CORR
87    95      71    91      62    92      17    90

13 See also Boersma (this volume) for more discussion of problems of this kind.
14 The abbreviations for the case patterns here and below have the following logic: in ‘case1-case2’,
case1 is the case of the wh-pronoun, case2 is the case assigned to the FR by the matrix verb.

Table 13.8. Results of a corpus investigation (Vogel and Zugck 2003)

Case pattern        FR            CORR
nom-nom       274 (89.8%)    31 (10.2%)
dat-dat         1 (5.6%)     17 (94.4%)
dat-nom        33 (34.4%)    63 (65.6%)
nom-dat         0 (0%)        5 (100%)

context are too good already, and so there might in fact be a difference, but it
cannot be detected with this method.
Secondly, the contrast between the FRs in the contexts dat-dat and
dat-nom was not significant either, contrary to all other contrasts. This is perhaps
due to an equal rank of the constraints RC and S<O. Both of these seem to be
minor problems.
However, we also carried out a corpus study on the same structures, and
this study yielded different results precisely in these two problematic cases
(Vogel and Zugck 2003). We used the 'COSMAS II' corpus of written German
of the Institut für Deutsche Sprache (IDS) Mannheim. Samples of 500
randomly chosen sentences containing the wh-pronouns 'wer' and 'wem'
were generated; the FR uses among these instances were sorted out
and counted. The results relevant to our discussion are shown in
Table 13.8.
Roughly 90 per cent of the items found in the nom-nom context were FRs.
This is remarkably different from the result of the acceptability judgement
experiment, where CORR had a slightly higher rate, but the difference from FR
was not significant. We therefore would have expected equal frequencies for
the two structures at best, but not such a high preference for the more marked
FR. The second difference concerns the contrast between the dat-dat and the
dat-nom context: FR is used significantly less often in the dat-dat context. In
the experiment, these FRs have a higher acceptability rate, although this was
again not statistically significant. A formulation of this problem in terms of
standard OT requires the following steps. First, we need a new constraint:
(13.10) Avoid Redundancy (*Red)
Avoid meaningless elements that have a purely grammatical purpose
(so-called ‘function words’).
This constraint favours FR over CORR structures because of the additional
resumptive pronoun in CORR, a pure function word without contribution to
the meaning of the clause. But typologically, the inclusion of this constraint
would predict the existence of languages that have FRs, but no CORRs. This
prediction seems to be false (cf. Vogel 2001, 2002).
Depending on how we interpret the results, *Red would either have to be
ranked lower than 1To1 in grammaticality judgements, because FRs are judged
as worse than CORR in the experiment, or equal with 1To1, because this
tendency was not significant for the nom-nom context. For ease of
presentation, we deliberately decide to give a clear ranking, and assume that the
observed contrast was only accidentally not significant. Because FR is only
more frequent in the nom-nom context, the effects of *Red must be restricted
to that context. We do this by adding a constraint conjunction of 1To1 and
S<O, '1To1 & S<O':
(13.11) 1To1 & S<O
No simultaneous violation of 1To1 and S<O.
This constraint should be ranked above 1To1 in order to take effect
independently of that constraint. Clause-initial FRs which are not the subject
of the main clause violate this constraint, and therefore cannot profit from the
effects of the lower ranked *Red. The same holds for FRs which violate RC.
Conjoined constraints should be ranked higher than their constituent
constraints; hence, 1To1 & S<O should also be ranked higher than S<O. In
fact, the conjoined constraint can fully take over the job of S<O. So we will rank
1To1 & S<O in place of S<O, which will be ranked lowest.
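The working of the conjunction in (13.11) can be sketched as a derived constraint whose violation count is the number of joint violations of its conjuncts. The candidate profiles below are hypothetical illustrations, not the chapter's actual tableaux.

```python
# Sketch of a local constraint conjunction (cf. 13.11): the conjoined
# constraint 1To1 & S<O is violated only where both conjuncts are
# violated by the same candidate.
def conjoin(c1, c2):
    """Return a function counting joint violations of c1 and c2."""
    return lambda profile: min(profile.get(c1, 0), profile.get(c2, 0))

conj = conjoin("1To1", "S<O")

fr_non_subject = {"1To1": 1, "S<O": 1}  # clause-initial non-subject FR
fr_subject     = {"1To1": 1}            # nom-nom FR: subject of main clause

print(conj(fr_non_subject))  # 1: both conjuncts violated
print(conj(fr_subject))      # 0: only 1To1 violated
```

Only the subject FR thus escapes the conjoined constraint, which is what restricts the effect of *Red to the nom-nom context.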
We can now state the constraint rankings that model the results of the
experiment and the corpus study. The two methods differ in two rerankings,
which affect the constraints RC and *Red:
(13.12) Judgement ranking
RO ≫ RCr ≫ RC ≫ 1To1&S<O ≫ 1To1 ≫ *Red ≫ S<O
Corpus ranking
RO ≫ RCr ≫ 1To1&S<O ≫ RC ≫ *Red ≫ 1To1 ≫ S<O
How can we account for these contradictory rankings with a single OT
grammar? We might argue that the correlative structure is easier to parse,
hence preferred in the experiment, but that it is avoided in production because
it is, so to speak, 'over-correct'. Indeed, the most plausible reason why the CORR
structure is avoided in the nom-nom context is that the resumptive pronoun
appears totally superfluous:
(13.13) Wer Hunger hat, (der) soll etwas essen
[Who-nom hunger has]-nom (the one-nom) shall something eat
'Whoever is hungry shall eat something'

The FR is the subject of the main clause in (13.13): it is clause-initial, and
therefore occupies the typical subject position, and the FR pronoun bears
nominative, the case of subjects. These two hints already suggest the correct
analysis to the parser; the resumptive pronoun 'der' provides no new
information.
In our corpus study (Vogel and Zugck 2003), we also counted the average
length of the FR in the FR and CORR structures we found in the nom-nom
context. The result, which was highly significant, was that the average number of
words between the FR pronoun and the first word of the main clause is 6.02 in
the case of the FR, and 12.04 in the case of CORR.
This can be seen as additional evidence for the redundancy theory: the
longer the FR, the harder it is to keep track of the first word of the FR, that is,
the more advantageous it is to double it with a resumptive pronoun.
However, Boersma's (this volume) line of reasoning implies for our case
that the CORR structure is easier to comprehend in principle. More
generally, the more function words a sentence contains, the easier it can
be comprehended. I doubt that this is correct. When we try to understand a
clause we are first of all interested in the content words. Function words are
meaningless, by definition, and it appears much more likely to me that an
'inflation' of function words makes comprehension more difficult rather
than easier.
The advantage of the CORR structures in the experiment could be more
task-specific: it is easier to judge their grammaticality, precisely because there
is less implicit grammatical information than in the FR structures. This is
rewarded in the experiments. The participants might also be more accurate in
their judgements.
The experimental test sentences were presented visually, word by
word. Only one word at a time was displayed. This is different from the
'presentation mode' of a newspaper text, where the full text is always available
to the reader. It might be a consequence of such an experimental design that
constraints that evaluate the morphological properties of words play a more
important role than they 'usually' do. This could be responsible for the
task-specific 'lifting' of RC and 1To1 (constraints which evaluate case
morphology) that we observe in the grammaticality judgement task. If this
explanation is on the right track, then the variation in constraint ranking is
systematic, not probabilistic: it affects only constraints which are particularly
useful or useless in the task at issue.
The above discussion might suggest that corpus data are more trustworthy,
or more realistic than experimental data. That is, the constraint ranking that
we need for the corpus would be the ‘real’ grammar ranking, and the
judgement ranking is derived from it. A couple of objections against such a
point of view have to be made.
First, the constraint *Red is problematic. As already mentioned above,
assuming that it is universal and freely rerankable would predict that there is a
language that has only FRs, but not CORR structures. Such a language has not
been attested thus far. Secondly, it is, in general, not the case that redundancy of
the kind observed here, as an extension of syntactic structure, leads to
ungrammaticality. The transformation of an ordinary question into a cleft question is
another instance of such an extension. It has no effect on grammaticality:
(13.14) a. What do you want to buy?
b. What is it that you want to buy?
Example (13.14b) expresses the same as (13.14a); it only does so in a more
complicated, perhaps less elegant way. It clearly violates *Red more often.
Thirdly, *Red is sort of counterproductive in grammaticality judgements.
Here, we prefer redundancy. Thus, this constraint very much looks like a
task-specific constraint, which should perhaps not be included in the
constraint hierarchy of 'grammar proper'.
Consider the study by Bresnan et al. (2001) again. A full passive is chosen
extremely rarely in general in spoken English. Does this mean that the full passive
is judged as ungrammatical by speakers 97 per cent of the time, as their
stochastic OT account suggests? Corpus frequencies reflect preferences for
the use of particular structures, and so does the corpus ranking proposed in
(13.12). Frequencies reflect the grammar itself only in an indirect way. The
rarity (not: absence!) of a structure S in a corpus could be given at least three
different explanations from the perspective of Optimality Theory:
1. S is the rare winner of a frequent OT competition.
2. S is the frequent winner of a rare OT competition.
3. S is over-correct, and avoided for stylistic reasons.
Bresnan et al. seem to treat all infrequency in terms of explanation 1. They use
a model for OT syntax, where in fact passive and active always compete within
the single competition for the expression of a particular meaning. This has the
somewhat counterintuitive consequence that passives are suboptimal, and
hence ungrammatical, most of the time.
An alternative model that uses faithfulness constraints and a syntactic
specification for active/passive in the input could first of all derive that a
passive wins in a competition where passive is specified in the input.15 The

15 I argued for such a version of OT recently (Vogel 2003a, 2004).


low frequency of the passive would then not be the result of the passive being a
rare winner, but of the passive being rarely chosen as input. This is a totally
different issue. The reason why passive is more rarely chosen is, of course, its
higher markedness. But such an explanation makes no intrinsic claim about
the grammaticality status of alternative structures, as the stochastic
evaluation in the model used by Bresnan et al. does.
In the same way, our constraint *Red could be seen as a criterion for the
choice of particular inputs, but not as a constraint that evaluates the
candidates for this input.
In fact, if we reconsider their statistically significant finding that passives
are even less frequent with third person patients, and do not occur at all with
first and second person agents, we see that we cannot tell what this
significance is evidence for: under the assumption that subjects are more likely to be
topics, and that first and second person are more likely to be topics, too, this
finding could simply be due to the fact that the contexts where first and
second person have a lower information structural status are extremely rare.
A stochastic OT grammar based on this finding would mistake a property of the
environment within which a grammar is applied for a grammatical
constraint. Constraints on grammar and constraints of grammar should not be
confused.
That it is necessary to distinguish these two different explanations for the
rarity of structures can also be demonstrated with the result of another corpus
study that we undertook (Vogel et al. in preparation). We again counted free
relative clauses in German in the COSMAS II corpus, this time with the
neuter wh-pronoun 'was' ('what'). This pronoun is the same for both
nominative and accusative, which has the effect that even those speakers who do
not tolerate FRs with case conflicts judge such FRs with 'was' as grammatical.
A typical contrast is the one in (13.15):
(13.15) Grammaticality contrast for some German speakers (cf. Pittner 1991)
a. Ich kaufe was mir gefällt
I buy [what-nom me-dat pleases]-acc
b. *Ich lade ein wer mir gefällt
I invite [who-nom me-dat pleases]-acc
The FRs found in the randomly selected sample of 500 sentences were
classified as FR or CORR structures in the four possible combinations of
nominative and accusative, where the surface form 'was' matches both case
requirements. The results are displayed in Table 13.9.
We see that the case configuration has no influence on the relative
distribution of FR and CORR. It is about two-thirds to one-third throughout. The

Table 13.9. Frequencies of German FR and CORR with the pronoun 'was' in a
sample of 500 sentences with 'was'

FR case   Matrix verb case       FR           CORR          Sum
nom       nom                34 (66.7%)   17 (33.3%)    51 (38.9%)
acc       nom                21 (70.0%)    9 (30.0%)    30 (22.9%)
nom       acc                12 (66.7%)    6 (33.3%)    18 (13.7%)
acc       acc                21 (65.6%)   11 (34.4%)    32 (24.4%)
Sum                          88           43           131

Source: Vogel et al. (in preparation)

Table 13.10. Frequencies of German FR and CORR with the pronouns
'wer' and 'wen' in two samples of 500 sentences each

FR case   Matrix verb case       FR            CORR
nom       nom               274 (89.8%)    31 (10.2%)
acc       nom                 5 (25.0%)    15 (75.0%)
nom       acc                 0 (0%)        2 (100.0%)
acc       acc                 1 (20.0%)     4 (80.0%)

Source: Vogel and Zugck (2003)

relative infrequency of the CORR structure that we found with 'wer' in the
nom-nom context is observed here again. Furthermore, the two contexts with
conflicting case requirements are totally neutralized in their effects on the
choice of the construction. The case conflict does not seem to exist anymore if
the pronoun is homophonous for the two conflicting case features.
Compare these findings with the figures that we present in Vogel and Zugck
(2003) for the FRs with the animate wh-pronouns 'wer' (nominative) and
'wen' (accusative), repeated in Table 13.10.16
Only the nom-nom case pattern has a preference for FR with animate
wh-pronouns. That we find the same with all FRs with 'was' irrespective of the
case pattern shows that the matching effect is indeed a surface phenomenon.
However, we made a second observation which is perhaps rather
unexpected. It concerns the relative frequency of the contexts themselves. Under the
assumption that nominative is more frequent than accusative in finite clauses
16 The two samples also differ in the syntactic positions of the FRs that have been counted. 'was'
also serves as relative pronoun in German, and it therefore was possible to include headed relative
clauses which are semantically equivalent to an FR (as in 'everything that . . .') in the statistics, and,
likewise, clause-final FRs. The studies with 'wer' and 'wen' only counted clause-initial FRs and CORRs.

Table 13.11. Cases assigned by matrix verb to the FR and by FR
verb to the FR pronoun in a sample of 131 FRs with 'was'

Case          Assigned by matrix verb    Assigned to FR pronoun
nominative              81                         69
accusative              50                         62

Table 13.12. Expected and found distribution of case
configurations in a sample of 131 sentences with FRs with 'was'

FR case   Matrix verb case    Expected        Found
nom       nom               50.0 (38.2%)    51 (38.9%)
acc       nom               30.9 (23.6%)    30 (22.9%)
nom       acc               30.9 (23.6%)    18 (13.7%)
acc       acc               19.1 (14.6%)    32 (24.4%)

in general,17 we expect that nom-nom is the most frequent pattern and
acc-acc the least frequent, while acc-nom and nom-acc should be equal. This is
not the case in the 'was' sample. The context acc-acc is about as frequent as
acc-nom, and nom-acc has the lowest frequency. The distribution of nominative
and accusative as matrix verb case and FR case is listed in Table 13.11.
To calculate our expectations for the distribution of the case patterns, let us
take the figures we find for the matrix verbs as the base.18 Table 13.12 lists the
expected values, and the actual findings.
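The expected values in Table 13.12 can be recomputed directly from the matrix-verb counts of Table 13.11 (81 nominatives, 50 accusatives), following the calculation in footnote 18; the code below is merely a check of that arithmetic.

```python
# Recomputing the expected case-pattern distribution of Table 13.12 from
# the matrix-verb case counts (Table 13.11): each pattern's weight is the
# product of the two case counts, normalized over all four patterns.
w = {"nom": 81, "acc": 50}   # matrix-verb case counts, used for both slots
patterns = [("nom", "nom"), ("acc", "nom"), ("nom", "acc"), ("acc", "acc")]

weights = {p: w[p[0]] * w[p[1]] for p in patterns}
total = sum(weights.values())   # 6561 + 4050 + 4050 + 2500 = 17161

for fr_case, matrix_case in patterns:
    share = weights[(fr_case, matrix_case)] / total
    print(f"{fr_case}-{matrix_case}: {131 * share:.1f} ({100 * share:.1f}%)")
```

The resulting counts and percentages agree with the 'Expected' column of Table 13.12 to within rounding.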
The departures from the expected values for the nom-acc and acc-acc
patterns are statistically significant. This result is in line with the relative
markedness these patterns are assigned by the grammar. In FRs with the
nom-acc pattern, a case that is higher on the case hierarchy is suppressed in favour
of a lower one: this is a highly marked situation. FRs with the acc-acc
pattern are much less problematic, because both cases match; there is no
case conflict. That this structure has a higher frequency is therefore expected
under the assumption that grammatical markedness drives frequency.
The two results of this study together show on the one hand that the case configuration does not decrease the preference for the FR if the form of the FR
17 All finite verbs that assign accusative also assign nominative, but there are many verbs which do not assign accusative. Independently of verb frames, all clauses must have a subject, i.e. a nominative, in German.
18 The calculation for the four contexts is: nom-nom = 81 × 81 = 6561; acc-nom, nom-acc = 81 × 50 = 4050; acc-acc = 50 × 50 = 2500. The relative frequencies in per cent are then calculated relative to the sum of these values: 6561 + 4050 + 4050 + 2500 = 17161. These are used in Table 13.12.
Degraded Acceptability and Markedness in Syntax 269

pronoun fits both case requirements. The conflict is resolved at the surface, and this is sufficient.
On the other hand, we observe that potentially problematic configurations, like the nom-acc pattern, are significantly less frequent than we expect them to be. The conclusion must be that such patterns tend to be avoided as inputs already—even where they turn out to be unproblematic in practice.
The case patterns are crucially dependent on the lexical material that is
chosen, in particular, the case requirements of the chosen verbs. But the
choice of lexical material is not subject to a standard OT competition. It is
given in the input, and the input is presupposed for an OT competition.
Markedness is thus demonstrated to guide not only the choice of how things
are expressed (as FR or CORR), but also of what is to be expressed (which
combination of verbs with which case patterns is chosen/avoided).

13.7 Summary
The conceptual problem behind the traditional competence/performance
distinction does not go away, even if we abandon its original Chomskyan
formulation. It returns as the question about the relation between the model
of the grammar and the results of empirical investigations—the question of
empirical testing and verification.
Markedness can be seen as a correlate of observed gradedness within the theory of grammar. Optimality theory, being based on markedness, is therefore a promising framework for the task of bridging the gap between model and empirical world. However, this task requires not only a model of grammar, but also a theory of the methods that are chosen in empirical investigations and how their results are interpreted, and a theory of how to derive predictions for these particular empirical investigations from the model.
Stochastic optimality theory is one possible way of deriving empirical predictions from an OT model. However, I hope to have shown that it is not enough to take frequency distributions and relative acceptabilities at face value, and simply construct some stochastic OT model that fits the facts. These facts first of all need to be interpreted, and those factors that the grammar has to account for must be sorted out from those about which grammar should have nothing to say. This task, to my mind, is more complicated than the picture that a simplistic application of (not only) stochastic OT might draw.
14

Linear Optimality Theory as a
Model of Gradience in Grammar
FRANK KELLER

14.1 Introduction
This paper provides an overview of linear optimality theory (LOT), a variant of optimality theory initially proposed by Keller (2000b) to model gradient linguistic data. It is important to note that LOT is a framework designed to account for gradient judgement data; as has been argued elsewhere in this volume (Crocker and Keller), gradience in processing data and in corpus data has different properties from gradience in judgement data, and it is unlikely that the two types of gradience can be accounted for in a single, unified framework.
The remainder of the paper is structured as follows. In Section 14.2, we summarize the empirical properties of gradient judgements that motivate the design of LOT. Section 14.3 defines the components of an LOT grammar, and introduces the LOT notions of constraint competition and optimality. Based on this, ranking argumentation is defined, an algorithm for computing constraint ranks is introduced, and a measure of model fit in LOT is defined. Finally, Section 14.4 provides a comparison with other variants of OT, particularly with standard OT and with harmonic grammar. This section also contains a survey of more recent developments, such as probabilistic OT and variants of OT based on maximum entropy models.

14.2 Empirical properties of gradient judgements


Reviewing experimental data covering a range of syntactic phenomena in several languages, Sorace and Keller (2005) identify a number of sources of gradience in grammar. The two central experimental findings according to Sorace and Keller (2005) are that constraints are ranked and that
constraint violations are significantly more unacceptable than others. Cumulativity means that multiple constraint violations are significantly more unacceptable than single violations. These properties seem to be fundamental to the explanation of gradient linguistic judgements and therefore should form the basis of a model of gradience in grammar. Cumulativity also accounts for the ganging up of constraints: multiple violations of lower-ranked constraints can be as unacceptable as a single violation of a higher-ranked constraint. Experimental results reported by Keller (2000b) show that a ganging-up effect can be observed for constraints on word order, extraction, and gapping.
Sorace and Keller (2005) list a range of other properties of gradient data: context effects, crosslinguistic effects, and developmental optionality. They claim that these properties make it possible to classify linguistic constraints into soft and hard constraints. While this is an interesting claim, it seems to us more controversial than cumulativity and ranking, which seem to be more generally accepted properties of gradient data. As LOT only relies on cumulativity and ranking, we will not discuss the other properties here.

14.3 Linear optimality theory


Linear optimality theory as proposed by Keller (2000b) is a model of gradience that makes predictions about the relative grammaticality of linguistic structures. It builds on core concepts from optimality theory, a framework that is attractive for this purpose as it is equipped with a notion of competition that makes it possible to formalize the interaction of linguistic constraints. Furthermore, OT provides a notion of constraint ranking that makes it possible to account for the fact that constraints differ in strength, that is that some constraints are more important than others for the overall well-formedness of a given linguistic structure.
Although LOT borrows central concepts (such as constraint ranking and competition) from optimality theory, it differs in two crucial respects from existing OT-based accounts. First, it relies on the assumption that constraint ranks are represented as sets of numeric weights, instead of as partial orders. Secondly, it assumes that the grammaticality of a given structure is proportional to the sum of the weights of the constraints it violates. This means that OT’s notion of strict domination is replaced with a linear constraint combination scheme (hence the name linear optimality theory).1

1 An anonymous reviewer points out that cumulativity could also be implemented using the mechanism of local constraint conjunction used in standard OT, which restricts cumulativity to particular local domains. Local conjunction has the advantage that the occurrence of cumulative effects is still under the control of the linguist: a local conjunction must be defined explicitly.

Only a limited number of components of the OT architecture are affected by the switch to LOT. The changes concern only HEval, the function that evaluates the harmony of a candidate, and Rank, the ranking component. LOT does not affect assumptions concerning the input and the generation function Gen, the two components of an OT grammar that determine which structures compete with each other. Also the constraint component Con, that is the formal apparatus for representing constraints and candidates, is unaffected. The LOT approach is neutral in these respects, and compatible with the diverse assumptions put forward in the OT literature.
However, LOT’s versions of HEval and Rank entail changes in the way the optimal candidate is computed, as well as requiring a new type of ranking argumentation, that is a method for establishing constraint ranks from a set of linguistic examples. It will be shown that this type of ranking argumentation is considerably simpler than the one classically assumed in OT. Also, well-understood algorithms exist for automating this type of ranking argumentation.

14.3.1 Violation profiles and harmony


The most prominent pattern in the experimental data presented by Keller (2000b) is the cumulativity of constraint violations, that is the fact that the degree of unacceptability of a structure increases with the number of constraint violations it incurs. Cumulativity was in evidence in data on extraction, binding, gapping, and word order. Keller (2000b) also shows that the cumulativity effect extends from multiple violations of different constraints to multiple violations of the same constraint.
The other pervasive pattern in Keller’s (2000b) data is the ranking of constraints, that is the fact that constraint violations differ in the degree of unacceptability they cause. Constraint ranking was observed in data on extraction, binding, gapping, and word order.
The LOT model of gradient grammaticality derives from these two fundamental findings about constraint cumulativity and constraint ranking. Two hypotheses implement these two results. The first hypothesis deals with constraint ranking:
(14.1) Ranking hypothesis
The ranking of linguistic constraints can be implemented by annotating
each constraint with a numeric weight representing the reduction in
acceptability caused by a violation of this constraint.
Note that this notion of constraint ranks as numeric weights is more general
than the notion of ranks standardly assumed in optimality theory. Standard
OT formulates constraint ranks as binary ordering statements of the form
C1 ≫ C2, meaning that constraint C1 is ranked higher than the constraint C2. Such statements do not make any assumptions regarding how much higher the ranking of C1 is compared to the ranking of C2. Such information is only available once we adopt a numeric concept of constraint ranking.
In the remainder of this paper, we will adopt the following terminological convention. The term constraint weight will be used to refer to the numeric annotation that our model assigns to a constraint. The term constraint rank will be employed to refer to the relative weight of two constraints in our model: we say that a constraint outranks another constraint if it has a greater weight (see also definition (14.9) below). This usage is justified by the fact that standard OT ranks (i.e. constraint orderings) are a special case of ranks as defined in linear optimality theory (this will be shown in Section 14.4.1).
Once numeric constraint weights have been postulated, the overall acceptability of a structure can be computed based on the weights of the constraints that the structure violates. We will assume that simple summation is sufficient to compute the degree of acceptability of a structure from the weights of the constraints that the structure violates. This will account straightforwardly for the cumulativity of constraint violations observed experimentally. Keller (2000b) demonstrates that this approach achieves a good model fit on his experimental data.
To account for the cumulativity of constraint weights, LOT formulates the linearity hypothesis in (14.2):
(14.2) Linearity hypothesis
The cumulativity of constraint violations can be implemented by assuming that the grammaticality of a structure is proportional to the weighted sum of the constraint violations it incurs, where the weights correspond to constraint ranks.
The hypotheses in (14.1) and (14.2) can be made explicit by formulating a numeric model that relates constraint ranks and degrees of grammaticality. This relies on the notion of a grammar signature, which specifies the constraint set and the associated weights for a grammar. (Note that this definition, and all subsequent ones, are independent of the formulation of the constraints proper; the LOT account is one of constraint interaction, not of actual linguistic constraints.)
(14.3) Grammar signature
A grammar signature is a tuple ⟨C, w⟩ where C = {C1, C2, . . . , Cn} is the constraint set, and w(Ci) is a function that maps a constraint Ci ∈ C on its constraint weight wi.

Relative to a grammar signature, a given candidate structure has a constraint violation profile as defined in (14.4). The violation profile specifies which constraints are violated by the structure and how often. This is a useful auxiliary notion that will be relied on in further definitions.

(14.4) Violation profile
Given a constraint set C = {C1, C2, . . . , Cn}, the violation profile of a candidate structure S is the function v(S, Ci) that maps S on the number of violations of the constraint Ci ∈ C incurred by S.

Based on definitions (14.3) and (14.4), the harmony of a structure can now be defined using a simple linear model:

(14.5) Harmony
Let ⟨C, w⟩ be a grammar signature. Then the harmony H(S) of a candidate structure S with a violation profile v(S, Ci) is given in (14.6).

(14.6) H(S) = −Σi w(Ci) v(S, Ci)

Equation (14.6) states that the harmony of a structure is the negation of the weighted sum of the constraint violations that the structure incurs. Intuitively, the harmony of a structure describes its degree of well-formedness relative to a given set of constraints. This notion corresponds closely to the definition of harmony assumed in standard OT (Prince and Smolensky 1997: 1607) or harmonic grammar (Smolensky et al. 1992: 14).
The assumption is that all constraint weights are positive, that is that wi ≥ 0 for all i. This means that only constraint violations influence the harmony of a structure. Constraint satisfactions will not change the harmony of the structure (including cases where a constraint is vacuously satisfied because it is not applicable). This assumption is in accordance with Keller’s (2000b) experimental results, in which only constraint violations were found to affect acceptability. This will be discussed further in Section 14.4.2.
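Definitions (14.3) to (14.6) translate almost directly into code. The following sketch is my own illustration (the dict-based representation is an assumption on my part; the weights are those of Table 14.1 below):

```python
# A grammar signature as a mapping from constraints to weights, and harmony
# per (14.6): the negated weighted sum of the violations a structure incurs.
weights = {"C1": 4, "C2": 3, "C3": 1}

def harmony(profile, weights):
    """H(S) = -sum_i w(Ci) * v(S, Ci), with profile as the map v(S, .)."""
    return -sum(weights[c] * n for c, n in profile.items())

print(harmony({"C2": 1, "C3": 2}, weights))  # one C2 and two C3 violations: -5
```

Note that a constraint absent from the profile (satisfied, or vacuously satisfied) simply contributes nothing to the sum, in keeping with the positive-weights assumption above.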

14.3.2 Constraint competition and optimality


Based on the deWnitions of violation proWle and harmony proposed in the
preceding section, LOT’s notion of grammaticality can now be speciWed.
Grammaticality is computed in terms of the relative harmony of two candi-
dates in the same candidate set:

(14.7) Grammaticality
Let S1 and S2 be candidate structures in the candidate set R. Then S1 is more grammatical than S2 if H(S1) > H(S2). This can be abbreviated as S1 > S2.2
A crucial difference between harmony and grammaticality follows from definition (14.7). Harmony is an absolute notion that describes the overall well-formedness of a structure. Grammaticality, on the other hand, describes the relative ill-formedness of a structure compared with another structure. While it is possible to compare the harmony of two structures across candidate sets, the notion of grammaticality is only well-defined for two structures that belong to the same candidate set (i.e. share the same input). Therefore, definition (14.7) (and the subsequent definition (14.8)) provide a relative notion of well-formedness, in line with the optimality theoretic tradition.
Based on the definition of grammaticality in (14.7), we can define the optimal structure in a candidate set as the one with the highest relative grammaticality.
(14.8) Optimality
A structure Sopt is optimal in a candidate set R if Sopt > S for every S ∈ R.
A notion of constraint rank can readily be defined in LOT based on the relative weight of two constraints (see also the terminological note on ranks versus weights in Section 14.3.1 above):
(14.9) Constraint rank
A constraint C1 outranks a constraint C2 if w(C1) > w(C2). This can be abbreviated as C1 ≫ C2.
In what follows, we will illustrate the definitions for harmony, grammaticality, and optimality. Consider an example grammar with the constraints C1, C2, and C3, and the constraint weights given in Table 14.1. This table also specifies an example candidate set S1, . . . , S4 and gives the violation profiles for these candidates. The harmony for each of these structures can be computed based on definition (14.5).
The structure S3 maximizes harmony, that is it incurs the least serious violation profile. It is therefore the optimal structure in the candidate set, that is to say, it is more grammatical than all other candidate structures. The structures S1 and S4 are both less grammatical than S3: S1 and S4 receive the same harmony scores, but for different reasons; S4 because it incurs a
2 This usage differs from the standard OT usage, where harmonic ordering is denoted by ‘≻’, not ‘>’.

Table 14.1. Example violation profiles and harmony scores

        C1    C2    C3
w(C)    4     3     1     H(S)

S1            *     *     −4
S2            *     **    −5
S3                  *     −1
S4      *                 −4

high-ranked violation of C1, S1 because it accumulates violations of C2 and C3. The structure S2 is less grammatical than S1, as it incurs an additional violation of C3. In total, we obtain the following grammaticality hierarchy: S3 > {S1, S4} > S2.
This example illustrates the three central properties of constraint interaction that were identified in Section 14.2. The first property is the ranking of constraints. S3 incurs a violation of C3, while S4 incurs a violation of C1. That S3 is more grammatical than S4 is accounted for by the fact that C1 has a higher weight than C3, that is the ranking C1 ≫ C3 holds. This is a situation that was observed many times in the experimental data presented by Keller (2000b).
Furthermore, the example illustrates how the cumulativity of constraint violations is modelled. S1 incurs single violations of C2 and C3. The structure S2 also incurs a single violation of C2, but a double violation of C3. As a consequence, S1 is more grammatical than S2. Cumulativity effects such as these were encountered frequently in Keller’s (2000b) experimental data.
Finally, Table 14.1 illustrates the ganging-up of constraint violations. The structures S1 and S4 have different constraint profiles: S4 violates the constraint C1, while S1 violates the two constraints C2 and C3, which are both lower ranked than C1. However, S1 and S4 are equally grammatical because the two constraints C2 and C3 gang up against C1, leading to the same harmony score in both structures. Again, this empirical pattern is in evidence in Keller’s (2000b) experimental data.
Note that standard optimality theoretic evaluation of the candidate set in Table 14.1 leads to a different harmonic ordering: S3 ≻ S1 ≻ S2 ≻ S4. If we assume a naive extension of standard OT, then this order corresponds to the grammaticality order of the candidates. The naive extension assumes the strict domination of constraints, and therefore fails to model ganging-up effects. Under this approach, there is no possibility for a joint violation of C2 and C3 to be as serious as a single violation of C1, due to the ranking C1 ≫ C2 ≫ C3.
Hence the naive extension of standard OT fails to account for the ganging-up effects that were observed experimentally.
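The evaluation of Table 14.1 can also be checked mechanically. The sketch below (my own illustration, reusing the table's weights and violation profiles) recovers the hierarchy S3 > {S1, S4} > S2, including the ganging-up tie between S1 and S4:

```python
# Weights and violation profiles from Table 14.1; per definition (14.8),
# the optimal candidate maximizes harmony within its candidate set.
weights = {"C1": 4, "C2": 3, "C3": 1}
candidates = {
    "S1": {"C2": 1, "C3": 1},
    "S2": {"C2": 1, "C3": 2},
    "S3": {"C3": 1},
    "S4": {"C1": 1},
}

def harmony(profile):
    return -sum(weights[c] * n for c, n in profile.items())

scores = {name: harmony(profile) for name, profile in candidates.items()}
optimal = max(scores, key=scores.get)
print(optimal, scores)  # S3 is optimal; S1 and S4 tie at -4 (ganging up)
```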

14.3.3 Ranking argumentation and parameter estimation


Optimality theory employs so-called ranking arguments to establish constraint rankings from data. A ranking argument refers to a set of candidate structures with a certain constraint violation profile, and derives a constraint ranking from this profile. This can be illustrated by the following example: assume that two structures S1 and S2 have the same constraint profile, with the following exception: S1 violates constraint C1, but satisfies C2. Structure S2, on the other hand, violates constraint C2, but satisfies C1. If S1 is acceptable but S2 is unacceptable, then we can conclude that the ranking C2 ≫ C1 holds (see Prince and Smolensky 1993: 106).
In the general case, the fact that S1 is acceptable but S2 is unacceptable entails that each constraint violated by S1 is outranked by at least one constraint violated by S2. (See Hayes 1997 for a more extensive discussion of the inference patterns involved in ranking argumentation in standard OT.)
The LOT approach allows a form of ranking argumentation that relies on gradient acceptability data instead of the binary acceptability judgements used in standard OT. A ranking argument in linear optimality theory can be constructed based on the difference in acceptability between two structures in the same candidate set, using the following definition:
(14.10) Ranking argument
Let S1 and S2 be candidate structures in the candidate set R with the acceptability difference ΔH. Then the equation in (14.11) holds.
(14.11) H(S1) − H(S2) = ΔH
This definition assumes that the difference in harmony between S1 and S2 is accounted for by ΔH, the acceptability difference between the two structures. ΔH can be observed empirically, and measured, for instance, using magnitude estimation judgements (Sorace and Keller 2005). Drawing on the definition of harmony in (14.5), equation (14.11) can be transformed to:

(14.12) −Σi w(Ci)(v(S1, Ci) − v(S2, Ci)) = ΔH
This assumes that S1 and S2 have the violation profiles v(S1) and v(S2) and are evaluated relative to the grammar signature ⟨C, w⟩.
Typically, a single ranking argument is not enough to rank the constraints of a given grammar. Rather, we need to accumulate a sufficiently large set of ranking arguments, based on which we can then deduce the constraint
hierarchy of the grammar. To obtain a maximally informative set of ranking arguments, we take all the candidate structures in a given candidate set and compute a ranking argument for each pair of candidates, using definition (14.12).
The number of ranking arguments that a set of k candidates yields is given in (14.13); note that this is simply the number of all unordered pairs that can be generated from a set of k elements.

(14.13) n = (k² − k)/2
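As a quick sanity check of (14.13): the four candidates of Table 14.1 yield (4² − 4)/2 = 6 ranking arguments, which matches an explicit enumeration of the unordered pairs using the standard library:

```python
from itertools import combinations

k = 4  # size of the candidate set in Table 14.1
n_args = (k * k - k) // 2
pairs = list(combinations(["S1", "S2", "S3", "S4"], 2))
print(n_args, pairs)  # 6 pairs, from (S1, S2) through (S3, S4)
```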
Now we are faced with the task of computing the constraint weights of a grammar from a set of ranking arguments. This problem can be solved by regarding the set of ranking arguments as a system of linear equations. The solution for this system of equations will then provide a set of constraint weights for the grammar. This idea is best illustrated using an example. We consider the candidate set in Table 14.1 and determine all ranking arguments generated by this candidate set (here wi is used as a shorthand for w(Ci), the weight of constraint Ci):

(14.14) S1 − S2: −(0w1 + 1w2 + 1w3 − 0w1 − 1w2 − 2w3) = ((−4) − (−5)) = 1
        S1 − S3: −(0w1 + 1w2 + 1w3 − 0w1 − 0w2 − 1w3) = ((−4) − (−1)) = −3
        S1 − S4: −(0w1 + 1w2 + 1w3 − 1w1 − 0w2 − 0w3) = ((−4) − (−4)) = 0
        S2 − S3: −(0w1 + 1w2 + 2w3 − 0w1 − 0w2 − 1w3) = ((−5) − (−1)) = −4
        S2 − S4: −(0w1 + 1w2 + 2w3 − 1w1 − 0w2 − 0w3) = ((−5) − (−4)) = −1
        S3 − S4: −(0w1 + 0w2 + 1w3 − 1w1 − 0w2 − 0w3) = ((−1) − (−4)) = 3

This system of linear equations can be simplified to:

(14.15) w3 = 1
        w2 = 3
        w2 + w3 − w1 = 0
        w2 + w3 = 4
        w2 + 2w3 − w1 = 1
        w1 − w3 = 3

We have therefore determined that w2 = 3 and w3 = 1. The value of w1 can easily be obtained from any of the remaining equations: w1 = w2 + w3 = 4.
This example demonstrates how a system of linear equations that follows from a set of ranking arguments can be solved by hand. However, such a manual approach is not practical for large systems of equations as they occur in realistic ranking argumentation. Typically, we will be faced with a large set
of ranking arguments, generated by a candidate set with many structures, or by several candidate sets.
There are a number of standard algorithms for solving systems of linear equations, which can be utilized for automatically determining the constraint weights from a set of ranking arguments. One example is Gaussian elimination, an algorithm which delivers an exact solution of a system of linear equations (if there is one). If we are dealing with experimental data, then the set of ranking arguments derived from a given data set will often result in an inconsistent set of linear equations, which means that Gaussian elimination is not applicable. In such a case, the algorithm of choice is least square estimation (LSE), a method for solving a system of linear equations even if the system is inconsistent. This means that LSE enables us to estimate the constraint weights of an LOT grammar if there is no set of weights that satisfies all the ranking arguments exactly (in contrast to Gaussian elimination). LSE will find an approximate set of constraint weights that maximizes the fit with the experimentally determined acceptability scores. A more detailed explanation of LSE and its application to LOT is provided by Keller (2000b).
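The whole procedure can be sketched end to end on the example system in (14.14). The code below is a stdlib-only illustration (the function names are my own): it forms the normal equations A^T A w = A^T b and solves them by Gaussian elimination, which yields the least-squares estimate of the weights; since this particular system happens to be consistent, the estimate reproduces the weights of Table 14.1 exactly.

```python
# Least square estimation of LOT constraint weights from ranking arguments,
# illustrated on the six equations in (14.14). Stdlib-only sketch.

def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def least_squares(rows, targets):
    """Solve the normal equations A^T A w = A^T b for the weight vector w."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(ata, atb)

# Each row holds the coefficients of (w1, w2, w3) on the left-hand side of a
# ranking argument in (14.14), i.e. -(v(Sa, Ci) - v(Sb, Ci)); the targets are
# the observed acceptability differences.
rows = [
    [0, 0, 1],    # S1 - S2
    [0, -1, 0],   # S1 - S3
    [1, -1, -1],  # S1 - S4
    [0, -1, -1],  # S2 - S3
    [1, -1, -2],  # S2 - S4
    [1, 0, -1],   # S3 - S4
]
targets = [1, -3, 0, -4, -1, 3]
print([round(w, 6) for w in least_squares(rows, targets)])  # [4.0, 3.0, 1.0]
```

With noisy judgement data the system would typically be inconsistent, and the same normal-equations step would then return the best-fitting rather than an exact solution.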

14.4 Comparison with other optimality theoretic approaches


14.4.1 Standard optimality theory
Linear optimality theory preserves key concepts of standard optimality theory. This includes the fact that constraints are violable, even in an optimal structure. As in standard OT, LOT avails itself of a notion of constraint ranking to resolve constraint conflicts; LOT’s notion of ranking is quantified, that is richer than the one in standard OT. The second core OT concept inherited by LOT is constraint competition. The optimality of a candidate cannot be determined in isolation, but only relative to other candidates it competes with. Furthermore, LOT uses ranking arguments in a similar way to standard OT. Such ranking arguments work in a competitive fashion, that is based on the comparison of the relative grammaticality of two structures in the same candidate set. As in standard OT, a comparison of structures across candidate sets is not well-defined; two structures only compete against each other if they share the same input.
The crucial difference between LOT and standard OT is the fact that in LOT, constraint ranks are implemented as numeric weights and a straightforward linear constraint combination scheme is assumed. Standard optimality theory can then be regarded as a special case of LOT, where the constraint weights are chosen in an exponential fashion so as to achieve strict domination (see the subset theorem in (14.16)). The extension of standard OT to LOT is crucial in
accounting for the cumulativity of constraint violations. The linear constraint combination scheme also greatly simplifies the task of determining a constraint hierarchy from a given data set. This problem simply reduces to solving a system of linear equations, a well-understood mathematical problem for which a set of standard algorithms exists (see Section 14.3.3).
Another advantage is that LOT naturally accounts for optionality, that is for cases where more than one candidate is optimal. Under the linearity hypothesis, this simply means that the two candidates have the same harmony score. Such a situation can arise if the two candidates have the same violation profile, or if they have different violation profiles, but the weighted sum of the violations is the same in both cases. No special mechanisms for dealing with constraint ties are required in linear OT. This is an advantage over standard OT, where the modelling of optionality is less straightforward (see Asudeh 2001 for a discussion).
An OT grammar can be formulated as a weighted grammar if the constraint weights are chosen in an exponential fashion, so that strict domination of constraints is assured. This observation is due to Prince and Smolensky (1993: 200) and also applies to linear optimality theory. Therefore, the theorem in (14.16) holds (the reader is referred to Keller (2000b) for a proof).

(14.16) Subset theorem
A standard optimality theory grammar G with the constraint set C = {C1, C2, . . . , Cn} and the ranking Cn ≫ Cn−1 ≫ . . . ≫ C1 can be expressed as a linear optimality theory grammar G′ with the signature ⟨C, w⟩ and the weight function w(Ci) = b^i, where b − 1 is an upper bound for multiple constraint violations in G.
Note that the subset theorem holds only if there is an upper bound b − 1 that limits the number of multiple constraint violations that the grammar G allows. Such an upper bound exists if we assume that the number of violations incurred by each structure generated by G is finite. This might not be true for all OT constraint systems.
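The exponential weighting in (14.16) can be illustrated concretely. The sketch below is my own illustration of the theorem's premise, not Keller's proof; b = 10 is an assumed bound, so no constraint may be violated more than b − 1 = 9 times:

```python
# Exponential weights w(Ci) = b**i simulate strict domination provided that
# no constraint is violated b or more times (b = 10 is an assumed bound).
b = 10

def harmony(profile):
    # profile[i] holds the violations of C_{i+1}, which carries weight b**(i+1)
    return -sum(b ** (i + 1) * n for i, n in enumerate(profile))

# Even b - 1 = 9 violations of the lower-ranked C1 weigh less than a single
# violation of the higher-ranked C2, as strict domination requires:
print(harmony([9, 0]) > harmony([0, 1]))  # True
```

With a tenth violation of C1 the two profiles would tie, which is why the bound on violation counts is essential to the theorem.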

14.4.2 Harmonic grammar


Harmonic grammar (Legendre et al. 1990a, 1990b, 1991; Smolensky et al. 1992, 1993) is the predecessor of OT that builds on the assumption that constraints are annotated with numeric weights (instead of just being rank-ordered as in standard OT). Harmonic grammar (HG) can be implemented in a hybrid connectionist-symbolic architecture and has been applied successfully to gradient data by Legendre et al. (1990a, 1990b). As Prince and Smolensky
(1993: 200) point out, ‘Optimality Theory . . . represents a very specialized kind of Harmonic Grammar, with exponential weighting of the constraints’. Linear optimality theory is similar to HG in that it assumes constraints that are annotated with numeric weights, and that the harmony of a structure is computed as the linear combination of the weights of the constraints it violates. There are, however, two differences between LOT and HG: (a) LOT only models constraint violations, while HG models both violations and satisfactions; and (b) LOT uses standard least square estimation to determine constraint weights, while HG requires more powerful training algorithms such as backpropagation. We will discuss each of these differences in turn.
LOT requires that all constraint weights have the same sign (only positive weights are allowed, see Section 14.3.1). This amounts to the claim that only constraint violations (but not constraint satisfactions) play a role in determining the grammaticality of a structure. In HG, in contrast, arbitrary constraint weights are possible, that is constraint satisfactions (as well as violations) can influence the harmony of a structure. This means that HG allows a grammar to be defined that contains a constraint C with the weight w and a constraint C′ that is the negation of C and has the weight −w. In such a grammar, both the violations and the satisfactions of C influence the harmony of a structure.
The issue of positive weights has important repercussions for the relationship between standard OT and LOT: Keller (2000b) proves a superset theorem that states that an arbitrary LOT grammar can be simulated by a standard OT grammar with stratified hierarchies. The proof crucially relies on the assumption that all constraint weights are of the same sign. Stratified hierarchies allow us to simulate the addition of constraint violations (they correspond to multiple violations in standard OT), but they do not allow us to simulate the subtraction of constraint violations (which would be required by constraints that increase harmony). This means that the superset theorem does not hold for grammars that have both positive and negative constraint weights, as they are possible in harmonic grammar.
The second difference between HG and LOT concerns parameter estimation. An HG model can be implemented as a connectionist network, and the parameters of the model (the constraint weights) can be estimated using standard connectionist training algorithms. An example is provided by the HG model of unaccusativity/unergativity in French presented by Legendre et al. (1990a, 1990b) and Smolensky et al. (1992). This model is implemented as a multilayer perceptron and trained using the backpropagation algorithm (Rumelhart et al. 1986a).
282 Gradience in Syntax

It is well-known that many connectionist models have an equivalent in conventional statistical techniques for function approximation. Multilayer perceptrons, for instance, correspond to a family of non-linear statistical models, as shown by Sarle (1994). (Which non-linear model a given perceptron corresponds to depends on its architecture, in particular the number and size of the hidden layers.) The parameters of a multilayer perceptron are typically estimated using backpropagation or similar training algorithms. On the other hand, a single-layer perceptron (i.e. a perceptron without hidden layers) corresponds to multiple linear regression, a standard statistical technique for approximating a linear function of multiple variables. The parameters (of both a single-layer perceptron and a linear regression model) can be computed using least square estimation (Bishop 1995). This technique can also be used for parameter estimation for LOT models (see Section 14.3.3). Note that LOT can be conceived of as a variant of multiple linear regression. The difference between LOT and conventional multiple linear regression is that parameter estimation is not carried out directly on the data to be accounted for (the acceptability judgements); rather, a preprocessing step is carried out on the judgement data to compute a set of ranking arguments, which then form the input for the regression.
To summarize, the crucial difference between HG and LOT is that HG is a non-linear function approximator, while LOT is a linear function approximator, that is a variant of linear regression. This means that a different set of parameter estimation algorithms is appropriate for HG and LOT, respectively.
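The linear combination scheme, and the least-squares estimation it licenses, can be illustrated with a small sketch (hypothetical violation profiles and acceptability scores, not data from this chapter; LOT proper first derives ranking arguments from the judgements, as discussed above):

```python
import numpy as np

# Hypothetical violation profiles: rows = candidate structures,
# columns = constraints C1, C2, C3; entries = number of violations.
violations = np.array([
    [0.0, 0.0, 1.0],   # S1
    [0.0, 1.0, 1.0],   # S2
    [1.0, 0.0, 0.0],   # S3
    [0.0, 0.0, 0.0],   # S4 (violation-free)
])
# Hypothetical acceptability scores for the same four candidates.
scores = np.array([3.0, 2.0, 1.0, 4.0])

# Acceptability = baseline minus the weighted sum of violations,
# so the weights are recoverable by ordinary least squares.
design = np.column_stack([np.ones(len(violations)), -violations])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
baseline, weights = coef[0], coef[1:]

def harmony(profile):
    """Predicted acceptability of a candidate with the given violation profile."""
    return baseline - weights @ profile
```

Because only violations enter the sum, and with one sign, satisfying a constraint can never lift a structure above the violation-free baseline; this is exactly the restriction that HG drops.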

14.4.3 Probabilistic optimality theory


Boersma and Hayes (2001) propose a probabilistic variant of optimality theory (POT) that is designed to account for gradience both in corpus frequencies and in acceptability judgements. POT stipulates a continuous scale of constraint strictness. Constraints are annotated with numerical strictness values; if a constraint C1 has a higher strictness value than a constraint C2, then C1 outranks C2. Boersma and Hayes (2001) assume probabilistic constraint evaluation, which means that at evaluation time, a small amount of random noise is added to the strictness value of a constraint. As a consequence, re-rankings of constraints are possible if the amount of noise added to the strictness values exceeds the distance between the constraints on the strictness scale.
For instance, assume that two constraints C1 and C2 are ranked C1 ≫ C2, selecting the structure S1 as optimal for a given input. Under Boersma and Hayes’ (2001) approach, a re-ranking of C1 and C2 can occur at evaluation time, resulting in the opposite ranking C2 ≫ C1. This re-ranking might result
in an alternative optimal candidate S2. The probability of the re-ranking that makes S2 optimal depends on the distance between C1 and C2 on the strictness scale (and on the amount of noise added to the strictness values). The re-ranking probability is assumed to predict the degree of grammaticality of S2. The more probable the re-ranking C2 ≫ C1, the higher the degree of grammaticality of S2; if the rankings C1 ≫ C2 and C2 ≫ C1 are equally probable, then S1 and S2 are equally grammatical.
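This evaluation procedure is easy to simulate. The sketch below (hypothetical strictness values; the noise standard deviation of 2.0 follows Boersma and Hayes’ usual evaluation noise, but any value illustrates the point) estimates the re-ranking probability by Monte Carlo sampling:

```python
import random

def rerank_probability(strictness1, strictness2, noise_sd=2.0,
                       trials=100_000, seed=1):
    """Estimate the probability that C2 ends up outranking C1 at
    evaluation time, after Gaussian noise is added to both values."""
    rng = random.Random(seed)
    flips = sum(
        strictness1 + rng.gauss(0, noise_sd) < strictness2 + rng.gauss(0, noise_sd)
        for _ in range(trials)
    )
    return flips / trials
```

With a large distance between the strictness values the estimate approaches 0 (categorical ranking); as the distance shrinks it approaches 0.5, and the two rankings become equally probable.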
The POT framework comes with its own learning theory in the form of the
gradual learning algorithm (Boersma 1998a, 2000; Boersma and Hayes 2001).
This algorithm is a generalization of Tesar and Smolensky’s (1998) constraint
demotion algorithm in that it performs constraint promotion as well as
demotion. The gradual learning algorithm incrementally adjusts the strictness
values of the constraints in the grammar to match the frequencies of the
candidate structures in the training data. The fact that the algorithm relies on
gradual changes makes it robust to noise, which is an attractive property from
a language acquisition point of view.
There are, however, a number of problems with the POT approach. As Keller and Asudeh (2002) point out, POT cannot model cases of harmonic bounding, as illustrated in Table 14.2: candidate S2 is harmonically bounded by candidate S1, which means that there is no re-ranking of the constraints that would make S2 optimal. As S2 can never be optimal, its frequency or acceptability is predicted to be zero (i.e. no other candidate can be worse, even if it violates additional constraints). An example where this is clearly incorrect is S3 in Table 14.2, which violates a higher ranked constraint and is less acceptable (or less frequent) than S2.
A second problem with POT identified by Keller and Asudeh (2002) is
cumulativity. This can be illustrated with respect to Table 14.3: here, candidate
S1 violates constraint C2 once and is more acceptable than S2 , which violates
C2 twice. S2 in turn is more acceptable than S3 , which violates C2 three times.
A model based on constraint re-ranking cannot account for this, as a

Table 14.2. Data that cannot be modelled in probabilistic OT (hypothetical frequencies or acceptability scores)

  /input/    C3    C1    C2    Freq./Accept.
  S1                     *     3
  S2               *     *     2
  S3         *                 1

  Source: Keller and Asudeh (2002)
Table 14.3. Data that cannot be modelled in probabilistic OT (hypothetical frequencies or acceptability scores)

  /input/    C1    C2     Freq./Accept.
  S1               *      4
  S2               **     3
  S3               ***    2
  S4         *            1

  Source: Keller and Asudeh (2002)

re-ranking of C2 will not change the outcome of the competition between S1, S2, and S3. Essentially, this is a special case of harmonic bounding involving only one constraint.
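Both failures can be checked mechanically: in POT, a candidate can only receive a non-zero frequency or acceptability if some total ranking makes it optimal. The brute-force sketch below (standard OT evaluation over all constraint permutations; the violation profiles are the hypothetical ones from the tables) makes this concrete:

```python
from itertools import permutations

def winners(ranking, candidates):
    """Standard OT evaluation: filter the candidate set constraint by
    constraint in ranking order, keeping the least-violating candidates."""
    pool = list(candidates)
    for c in ranking:
        fewest = min(candidates[s][c] for s in pool)
        pool = [s for s in pool if candidates[s][c] == fewest]
    return pool

def can_ever_win(target, candidates):
    """True if at least one total ranking of the constraints makes
    `target` optimal, i.e. target is not harmonically bounded."""
    n = len(next(iter(candidates.values())))
    return any(target in winners(r, candidates) for r in permutations(range(n)))
```

For the profiles of Table 14.2 (columns ordered C3, C1, C2) the check confirms that S2 can never win, and likewise for the cumulatively violating S2 and S3 of Table 14.3: no re-ranking gives them a non-zero share.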
There is considerable evidence that configurations such as the ones illustrated in Tables 14.2 and 14.3 occur in real data. Keller (2000b) reports acceptability judgement data for word order variation in German that instantiates both patterns. Guy and Boberg’s (1997) frequency data for coronal stop deletion in English instantiates the cumulative pattern in Table 14.3. Jäger and Rosenbach (2004) show that cumulativity is instantiated in both frequency data and acceptability data on genitive formation in English. None of these data sets can be modelled by POT, and thus they constitute serious counterexamples to this approach. In linear optimality theory, on the other hand, such cases are completely unproblematic, due to the linear combination scheme assumed in this framework.
In a recent paper, Boersma (2004) acknowledges that cases of harmonic bounding and cumulativity as illustrated in Tables 14.2 and 14.3 pose a problem for POT. In response to this, he proposes a variant of POT, which we will call POT’. In POT’, the acceptability of a candidate S is determined by carrying out a pairwise comparison between S and each of the other candidates in the candidate set; the acceptability of S then corresponds to the percentage of comparisons that S wins.3 As an example, consider Table 14.2. Here, S1 wins against S2 and S3, hence its acceptability value is 2/2 = 100%. S2 wins against S3 but loses against S1, so its acceptability is 1/2 = 50%. S3 loses against both candidates, and thus receives an acceptability value of 0%.
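Setting aside the evaluation noise (see footnote 3), the pairwise scheme can be sketched as follows; a fixed constraint ranking stands in for the averaged POT comparisons, and the violation profiles are hypothetical:

```python
def beats(a, b, ranking, candidates):
    """a beats b if a has fewer violations on the highest ranked
    constraint on which the two candidates differ."""
    for c in ranking:
        if candidates[a][c] != candidates[b][c]:
            return candidates[a][c] < candidates[b][c]
    return False

def pot_prime_acceptability(s, ranking, candidates):
    """Share of pairwise comparisons that candidate s wins."""
    rivals = [r for r in candidates if r != s]
    return sum(beats(s, r, ranking, candidates) for r in rivals) / len(rivals)
```

With the profiles of Table 14.2 (columns ordered C3, C1, C2) and the ranking C3 ≫ C1 ≫ C2, this reproduces the 100%, 50%, and 0% values given in the text.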
In POT’, the relative grammaticality of a candidate corresponds to its
optimality theoretic rank in the candidate set. This is not a new idea; in fact

3 More precisely, it is the POT probability of winning, averaged over all pairwise comparisons, but this difference is irrelevant here.
Table 14.4. Data that cannot be modelled in POT’ (hypothetical frequencies or acceptability scores)

  /input/    C3    C1    C2    Freq./Accept.
  S1                     *     2
  S2               *     *     1
  S3         *                 1

it is equivalent to the definition of relative grammaticality in terms of suboptimality, initially proposed by Keller (1997). The only difference is that in POT’, suboptimality is determined based on a POT notion of harmony, instead of using the standard OT notion of harmony, as assumed by Keller (1997). However, there are a number of conceptual problems with this proposal (which carry over to POT’), discussed in detail by Müller (1999) and Keller (2000b).
In addition to that, there are empirical problems with the POT’ approach. POT’ correctly predicts the relative acceptability of the example in Table 14.2 (as outlined above). However, other counterexamples can be constructed easily if we assume ganging-up effects. In Table 14.4, the combined violation of C1 and C2 is as serious as the single violation of C3, which means that the candidates S2 and S3 are equally grammatical. Such a situation cannot be modelled in POT’, as S2 will win against S3 (because C3 outranks C1), hence is predicted to be more grammatical than S3. As discussed in Section 14.2, ganging-up effects occur in experimental data, and thus pose a real problem for POT’.
In contrast to POT and POT’, LOT can model ganging-up effects straightforwardly, as illustrated in Section 14.3.2. This is not surprising: the weights in LOT grammars are estimated so that they correspond in a linear fashion to the acceptability scores of the candidates in the training data. The strictness bands in POT (and POT’) grammars, on the other hand, are estimated to match the frequencies of candidates in the training data; it is not obvious why such a model should correctly predict acceptability scores, given that it is trained on a different type of data.
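The ganging-up pattern of Table 14.4 is directly expressible with linear weights. In the hypothetical weight assignment below, w(C3) equals w(C1) + w(C2), so a joint violation of C1 and C2 is exactly as costly as a single violation of C3:

```python
weights = {"C1": 1.0, "C2": 1.0, "C3": 2.0}  # hypothetical LOT weights

def cost(violated):
    """Summed weight of the violated constraints; higher = less acceptable."""
    return sum(weights[c] for c in violated)

s1 = cost(["C2"])        # lone violation of C2: mildest
s2 = cost(["C1", "C2"])  # C1 and C2 gang up
s3 = cost(["C3"])        # single violation of the heavy constraint
```

On these weights, s2 and s3 come out equally ungrammatical while s1 stays better than both, which is the pattern that defeats POT’.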

14.4.4 Maximum entropy models


The problems with POT have led a number of authors to propose alternative
ways of dealing with gradience in OT. Goldwater and Johnson (2003), Jäger
(2004), and Jäger and Rosenbach (2004) propose a probabilistic variant of OT
based on the machine learning framework of maximum entropy models,
which is state of the art in computational linguistics (e.g. Abney 1997; Berger et al. 1996). In maximum entropy OT (MOT) as formulated by Jäger (2004), the probability of a candidate structure (i.e. of an input–output pair (o,i)) is defined as:

(14.17)  PR(o|i) = (1/ZR(i)) · exp( Σj rj cj(i,o) )

Here, rj denotes the numeric rank of constraint j, while R denotes the ranking vector, that is the set of ranks of all constraints. The function cj(i,o) returns the number of violations of constraint j incurred by input–output pair (i,o). ZR(i) is a normalization factor.
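A direct implementation of (14.17) is short. In the sketch below the ranks are chosen negative, so that each additional violation lowers a candidate's probability (sign conventions for the exponent differ across formulations of maximum entropy OT); the ranks and violation profiles are made up:

```python
from math import exp

def mot_probabilities(ranks, violation_profiles):
    """Probability of each output candidate for one input: exponentiate
    the weighted violation sums and divide by the normalization factor Z."""
    scores = [exp(sum(r * c for r, c in zip(ranks, profile)))
              for profile in violation_profiles]
    z = sum(scores)  # Z_R(i): sum over all output candidates
    return [score / z for score in scores]
```

Computing Z is trivial here only because the toy candidate set is finite and small; in realistic grammars the sum over all possible output forms is the expensive step.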
The model defined in (14.17) can be regarded as an extension of LOT as introduced in Section 14.3.1. It is standard practice in the literature on gradient grammaticality to model not raw acceptability scores, but log-transformed, normalized acceptability data (Keller 2000b). This can be made explicit by log-transforming the left-hand side of (14.6) (and dropping the minus and renaming the variable i to j). The resulting formula is then equivalent to (14.18).

(14.18)  H(S) = exp( Σj w(Cj) v(S,Cj) )
A comparison of (14.17) and (14.18) shows that the two models have a parallel structure: w(Cj) = rj and v(S,Cj) = cj(i,o) (the input–output structure of the candidates is implicit in (14.18)). Both models are instances of a more general family of models referred to as log-linear models. There is, however, a crucial difference between the MOT definition in (14.17) and the LOT definition in (14.18). Equation (14.18) does not include the normalization factor ZR(i), which means that (14.18) does not express a valid probability distribution. The normalization factor is not trivial to compute, as it involves summing over all possible output forms o (see Goldwater and Johnson 2003, and Jäger 2004, for details). This is the reason why LOT assumes a simple learning algorithm based on least square estimation, while MOT has to rely on learning algorithms for maximum entropy models, such as generalized iterative scaling, or improved iterative scaling (Berger et al. 1996). Another crucial difference between MOT and LOT (pointed out by Goldwater and Johnson 2003) is that MOT is designed to be trained on corpus data, while LOT is designed to be trained on acceptability judgement data.

14.5 Conclusions
This paper introduced linear optimality theory (LOT) as a model of gradient
grammaticality. Although this model borrows central concepts (such as
constraint ranking and competition) from optimality theory, it differs in two crucial respects from standard OT. First, LOT assumes that constraint ranks are represented as numeric weights (this feature is shared with probabilistic OT and maximum entropy OT, see Sections 14.4.3 and 14.4.4). Secondly, LOT assumes that the grammaticality of a given structure is proportional to the sum of the weights of the constraints it violates, which means that OT’s notion of strict domination is replaced with a linear constraint combination scheme (this feature is shared with maximum entropy OT, see Section 14.4.4).
We also outlined a learning algorithm for LOT (see Section 14.3.3). This algorithm takes as its input a grammar (i.e. a set of linguistic constraints) and a training set, based on which it estimates the weights of the constraints in the grammar. The training set is a collection of candidate structures, with the violation profile and the grammaticality score for each structure specified. Note that LOT is not intended as a model of human language acquisition: it cannot be assumed that the learner has access to training data that are annotated with acceptability scores. The sole purpose of the LOT learning algorithm is to perform parameter fitting for LOT grammars, that is to determine an optimal set of constraint weights for a given data set.
LOT is able to account for the properties of gradient structures discussed in
Section 14.2. Constraint ranking is modelled by the fact that LOT annotates
constraints with numeric weights representing the contribution of a con-
straint to the unacceptability of a structure. Cumulativity is modelled by the
assumption that the degree of ungrammaticality of a structure is computed as
the sum of the weights of the constraints the structure violates. Once ranking
and cumulativity are assumed as part of the LOT model, other properties of
gradient linguistic judgements follow without further stipulations.
Part IV
Gradience in Wh-Movement Constructions
15

Effects of Processing Difficulty on Judgements of Acceptability

GISBERT FANSELOW AND STEFAN FRISCH

15.1 Introduction and overview


There is a certain tension between the role which acceptability judgements play in linguistics and the level of their scientific underpinning.1 Judgements of grammaticality form the empirical basis of generative syntax, but little is known about the processes underlying their formation and the factors different from grammar contributing to them.
This paper illuminates the impact of processing difficulty on acceptability. Section 15.2 reviews evidence showing that parsing problems often reduce acceptability. That processing difficulty may increase acceptability is less obvious, but this possibility is nevertheless borne out, as Section 15.3 shows, which reports several experiments dealing with locally ambiguous sentences involving discontinuous NPs, NP-coordination, and VP-preposing. The preferred interpretation of a locally ambiguous construction can have a positive influence on the global acceptability of a sentence even when this reading is later abandoned. Our experiments focusing on long wh-movement in Section 15.4 confirm the existence of the positive effect of local ambiguities in a domain that goes beyond mere syntactic feature differences. The global acceptability of a sentence is thus influenced by local acceptability perceptions during the parsing process.

15.2 Decreased acceptability caused by processing problems


Generative syntax subscribes to two fundamental convictions: the notions
of grammaticality and acceptability must be kept apart, and grammatical

1 We want to thank Caroline Féry, Heiner Drenhaus, Matthias Schlesewsky, Ralf Vogel, Thomas
Weskott, and an anonymous referee for helpful comments and critical discussion, and Jutta Boethke,
Jörg Didakowski, Ewa Trutkowski, Julia Vogel, Nikolaus Werner, Nora Winter, and Katrin Wrede for
technical support. The research reported here was supported by DFG-grant FOR375.
sentences may be unacceptable because of the processing difficulty they involve (Chomsky 1957; Chomsky and Miller 1963). The latter is exemplified by multiple centre embeddings such as (15.1). Their acceptability decreases with the number of self-embeddings, yet they are constructed according to the principles of English grammar, and should therefore be grammatical. Processing explanations for their low acceptability seem well-motivated, since they fit into theories of language processing (see Lewis 1993).
(15.1) the man who the woman who the mosquito bit loves kicked the
horse
Strong garden path sentences such as the horse raced past the barn fell (Bever 1970) illustrate the same point: sentences not violating any of the constructional principles of English may have properties that make it close to impossible for human parsing routines to identify their correct grammatical analysis. This renders them unacceptable.
While it seems uncontroversial that effects of strong processing problems should not be explained as violations of grammatical principles, the interpretation of milder parsing difficulties is less uniform. Consider the fronting of objects in free word order languages such as German. Experimental studies have revealed that object-initial sentences such as (15.2b) are less acceptable than their subject-initial counterparts (15.2a) (Bader and Meng 1999; Featherston 2005; Keller 2000a).
(15.2) a. der Tiger hat den Löwen gejagt
the.nom tiger has the.acc lion.acc chased
b. den Löwen hat der Tiger gejagt
‘the tiger has chased the lion’
Keller (2000a) and Müller (1999) make syntactic constraints responsible for the lower acceptability of object-initial sentences, but a different explanation is at hand: object-initial sentences are more difficult to parse than subject-initial ones, and this may render them less acceptable. Processing difficulties of object-initial structures have been documented since Krems (1984), see also Hemforth (1993), Meng (1998), Schlesewsky et al. (2000), among others. Their low acceptability can be explained in terms of this additional processing load, and the latter can be shown to be grammar-independent.
In a self-paced reading study, Fanselow et al. (1999) compared the processing of embedded German subject-initial, object-initial, and yes–no-questions. They found an increase in reading times for the object-initial condition (15.3b), beginning with the wh-phrase and ending at the position of the second NP (= the subject, in the object-initial condition).
(15.3) es ist egal
‘it does not matter’
a. wer vermutlich glücklicherweise den Mann erkannte
who.nom presumably fortunately the.acc man recognized
‘who fortunately presumably recognized the man’
b. wen vermutlich glücklicherweise der Mann erkannte
who.acc presumably fortunately the.nom man recognized
‘who the man presumably fortunately recognized’
c. ob vermutlich glücklicherweise der Mann den Dekan erkannte
whether presumably fortunately the.nom man the.acc dean recognized
‘if the man presumably fortunately recognized the dean’

Fanselow et al. (1999) interpret this result in terms of memory cost: a fronted
object wh-phrase must be stored in memory during the parse process up to
the point where an object position can be postulated. In an SOV-language
such as German, this means that the object must be memorized until the
subject has been recognized. This account is in line with recent ERP research.
King and Kutas (1995) found a sustained anterior negativity for the processing
of English object relative clauses (as compared to subject relative clauses),
which Müller et al. (1997) relate to memory. Felser et al. (2003), Fiebach et al.
(2002), and Matzke et al. (2002) found a sustained LAN in the processing of
German object-initial wh-questions and declaratives, which is again attrib-
uted to the memory load coming from the preposed object. The claim that
object-initial structures involve a processing diYculty is thus well supported.
It is natural to make this processing diYculty responsible for the reduced
acceptability of sentences such as (15.2b).
Subjacency violations as in (15.4) constitute another domain in which processing difficulty reduces acceptability. Kluender and Kutas (1993) argue that syntactic islands arise at ‘processing bottlenecks’ when the processing demands of a long distance dependency at the clause boundary add up on the processing demands of who or whether. This processing problem is reflected in dramatically reduced acceptability.

(15.4) ??what do you wonder who has bought


Processing accounts of the wh-island condition furthermore allow us to understand satiation (Snyder 2000) and training effects (Fanselow et al. to appear) that are characteristic of wh-island violations: repeated exposure facilitates the processing of sentences such as (15.4), and renders them more acceptable.
Processing difficulty reduces acceptability in further areas. Müller (2004) shows that the low acceptability of CP-extrapositions from certain attachment sites follows from attachment preferences, and does not reflect low grammaticality. Experimental evidence (Featherston 2005) suggests that the acceptability of subject relative clauses involving a locally ambiguous relative pronoun decreases with an increase of the length of the ambiguous region. This may be explained in terms of the processing difficulties associated with locally ambiguous arguments (Frisch et al. 2001, 2002).

15.3 Increased acceptability linked to processing problems


15.3.1 General remarks
Processing difficulty can reduce the acceptability of a sentence. In principle, the reverse might also exist: some processing difficulty makes a sentence with low grammaticality fairly acceptable. For example, this should be the case when the factor making the structure ungrammatical is difficult to detect. Marks (1965: 7) shows that the position of a grammatical violation correlates with the degree of (un-)acceptability: violations coming early as in boy the hit the ball are less acceptable than late violations as in the boy hit ball the. Meng and Bader (2000b) and Schlesewsky et al. (2003) (among others) found chance performance (rather than outright rejection) in speeded acceptability rating tasks for ungrammatical transitive sentences such as (15.5) containing illegitimate combinations of two nominative NPs. Schlesewsky et al. explain such results with the assumption that the case marking of NPs tends to be overlooked in nominative-initial sentences.
(15.5) *welcher Gärtner sah der Jäger
which.nom gardener saw the.nom hunter
The experiments reported here were carried out in order to systematically investigate such positive effects of processing difficulties on acceptability. In particular, we expected a mitigating influence of local ambiguities. That parsing problems can reduce the global acceptability of a sentence suggests that it not only reflects properties of the final analysis, but also the ‘local acceptability’ of intermediate processing steps. When these intermediate steps have ‘better’ grammatical properties than the final analysis of the string, one should expect that global acceptability is increased by the relatively well-formed intermediate parsing steps. In particular, we studied discontinuous NPs (experiment 1), subject verb agreement (experiment 2), VP-preposing (experiment 3), and long distance movement (experiment 4).
15.3.2 Experiment 1: Discontinuous noun phrases


NPs can be serialized discontinuously in German, as illustrated in (15.6c). See Fanselow (1988), Fanselow and Ćavar (2002), and Riemsdijk (1989) for different analyses of discontinuous NPs (DNP), and Bader and Frazier (2005) for offline experiments involving DNP.
(15.6) a. er liest [NP viele Bücher]
he reads many books
b. Viele Bücher liest er
many books reads he
c. Bücher liest er viele
books reads he many
German DNP are subject to two grammatical constraints on number (cf. Fanselow and Ćavar 2002): an agreement constraint, and a ban against singular count nouns appearing in the construction. Apart from a few exceptional constellations, DNPs are grammatical only if the two parts agree in number (the Agreement constraint). While such a constraint holds for DNPs in many languages, German is exceptional in the other respect, viz. in disallowing singular DNPs for count nouns, as the contrast between (15.7a) and (15.7b) illustrates (the Singularity constraint). The constraint derives from a general requirement that articles may be absent in German in partial and complete NPs only if the NP is headed by a plural or mass noun. Some dialects repair the ungrammaticality of (15.7b) by ‘regenerating’ (Riemsdijk 1989) an article in the left part of the NP as shown in (15.7c); in other dialects, there is no grammatical way of expressing what (15.7b) tries to convey. For exceptions to these generalizations, see Fanselow and Ćavar (2002) and van Hoof (1997).
(15.7) a. alte Professoren liebt sie keine
old.pl professors.pl loves she no.pl
‘she loves no old professors’
b. *alten Professor liebt sie keinen
old.sg professor.sg loves she no.sg
‘she loves no old professor’
c. einen alten Professor liebt sie keinen
an old professor loves she no
Many German nouns such as Koffer ‘suitcase’ do not distinguish singular and plural morphologically for nominative and accusative case. The left periphery of the DNP (15.8) is therefore (locally) compatible with a plural interpretation, which is excluded when the singular determiner keinen is processed. Up to this point, however, the phonetic string allows an analysis in which Singularity is not violated. Introspection suggests that this local number ambiguity increases acceptability as compared to other singular DNP: (15.8) sounds better than (15.7b).
(15.8) Koffer hat sie keinen
suitcase.ambiguous has she no.singular
Experiment 1 tested several hypotheses on German DNP. Experiment 1a investigated whether DNP with matching number are more acceptable than those without (Agreement), and whether singular DNP are less acceptable than plural ones (Singularity). Experiment 1b addressed the question of whether local ambiguities of number increase the acceptability of singular DNP. Experiment 1b required that we compare DNP with and without adjectives contained in their left part. In experiment 1a, we therefore also tested whether the presence of an adjective has an influence on the acceptability of DNP.

15.3.3 Experiment 1a
15.3.3.1 Materials Experimental items had the form exemplified in (15.9). In a sentence with a pronominal subject preceded by the verb and followed by an adverb, an object NP was split such that the left part (LP) preceded the verb, while the right part (RP) was clause final. The LP could consist of a single noun (simple) (15.9a, 15.9b, 15.9e, 15.9f), or of a noun preceded by an adjective (like alten, old) agreeing with the noun (15.9c, 15.9d, 15.9g, 15.9h). The LP and RP appeared in either singular (sg) or plural (pl) form (see below).
(15.9) a. Professor kennt sie leider keinen simple_sg_sg
professor.sg knows she unfortunately no.sg
b. Professoren kennt sie leider keine simple_pl_pl
professor.pl knows she unfortunately no.pl
c. Alten Professor kennt sie leider keinen complex_sg_sg
old.sg professor.sg knows she unfortunately no.sg
d. Alte Professoren kennt sie leider keine complex_pl_pl
old.pl professor.pl knows she unfortunately no.pl
e. Professor kennt sie leider keine simple_sg_pl
professor.sg knows she unfortunately no.pl
f. Professoren kennt sie leider keinen simple_pl_sg
professor.pl knows she unfortunately no.sg
g. Alten Professor kennt sie leider keine complex_sg_pl
old.sg professor.sg knows she unfortunately no.pl
h. Alte Professoren kennt sie leider keinen complex_pl_sg
old.pl professor.pl knows she unfortunately no.sg

15.3.3.2 Method Forty students of the University of Potsdam participated. They were paid for their participation, or received course credits. Participants rated 106 sentences in pseudo-randomized order for acceptability on a six point scale (1 = ‘very good’, 6 = ‘very bad’). There were four items per condition. Each participant saw 16 experimental items (2 per condition), 74 unrelated and 16 related distractor items (items of experiment 1b plus 4 fillers). A larger set of 128 sentences (16 sets of identical lexical material in each of the 8 conditions) was created and assigned to 8 between subjects versions in such a way that no subject saw identical lexical material in more than one sentence.
In the other experiments, we used a rating scale different from the one in experiment 1. In order to increase the comparability of the results, we will use transformed values for mean ratings in this results section: the ratings on the ‘1 = best/6 = worst’ scale are mapped to their equivalent on the ‘1 = worst/7 = best’ scale used later (using the equation: transformed_value = 8 − (real_value + (real_value − 1)/5)).
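The transformation is a linear rescaling of the six-point scale onto the seven-point scale, with the order of the scale reversed. In code, the stated mapping is simply:

```python
def transform(real_value):
    """Map a rating on the 1 = best .. 6 = worst scale onto the
    1 = worst .. 7 = best scale used in the later experiments."""
    return 8 - (real_value + (real_value - 1) / 5)
```

The endpoints map as expected: a 1 (‘very good’) becomes 7, a 6 (‘very bad’) becomes 1, and intermediate mean ratings rescale linearly in between.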
15.3.3.3 Results Figure 15.1 shows the mean acceptability ratings per
condition for all forty subjects.
In an ANOVA with the factors MATCH (number match between LP and RP), NUMBER (number of LP: singular versus plural) and COMPLEXITY (with versus without adjective in LP), we found a main effect of MATCH (F1(1,39) = 35.02, p < .001) due to higher acceptabilities in matching compared

[Figure 15.1. Results of Experiment 1a: mean acceptability ratings (transformed 1–7 scale) for the conditions SG/Match, SG/Mismatch, PL/Match, and PL/Mismatch, with simple and complex left parts. Values shown: 4.97, 4.9, 3.23, 3.33, 2.98, 2.92, 2.7, 2.56.]


to mismatching conditions. Furthermore, there was a main effect of NUMBER (F1(1,39) = 49.33, p < .001) because LP plurals were more acceptable than LP singulars. There was no main effect of complexity (F < 1). In addition, there was an interaction MATCH × NUMBER (F1(1,39) = 21.72, p < .001). Resolving this interaction by the factor NUMBER revealed a significant advantage for matching (compared to mismatching) number for LP plurals (F1(1,39) = 39.10, p < .001), but only a marginal one for LP singulars (F1(1,39) = 3.71, p = .06). No further interaction reached significance.
15.3.3.4 Discussion Experiment 1a confirms the constraints Agreement and Singularity borrowed from the literature. DNP are not acceptable when the number of the LP and of the RP of the DNP do not match (15.9e–15.9h). Furthermore, only plural (15.9b, 15.9d) but not singular DNP (15.9a, 15.9c) are acceptable (if the construction is formed with a countable noun).
In line with our expectations, the presence of an adjective in the LP of a DNP had no influence on the acceptability of the construction. The addition of an adjective could thus be employed as a disambiguating device in experiment 1b.

15.3.4 Experiment 1b
15.3.4.1 Materials The six conditions of experiment 1b are exemplified in (15.10a) to (15.10f). All nouns were ambiguous with respect to number. The LP of DNP which just consisted of a noun was consequently number-ambiguous as well (15.10a, 15.10b). The addition of a number-marked adjective disambiguated the LP towards a singular (15.10c, 15.10d) or plural (15.10e, 15.10f) interpretation. The RP of the DNP was either singular (15.10a, 15.10c, 15.10e) or plural (15.10b, 15.10d, 15.10f).
(15.10) a. Koffer hatte er leider keinen amb_sg
suitcase.amb had he unfortunately no.sg
b. Koffer hatte er leider keine amb_pl
suitcase.amb had he unfortunately no.pl
c. Roten Koffer hatte er leider keinen sg_sg
red.sg suitcase had he unfortunately no.sg
d. Roten Koffer hatte er leider keine sg_pl
red.sg suitcase had he unfortunately no.pl
e. Rote Koffer hatte er leider keinen pl_sg
red.pl suitcase had he unfortunately no.sg
f. Rote Koffer hatte er leider keine pl_pl
red.pl suitcase had he unfortunately no.pl
Effects of Processing Difficulty on Judgements 299

15.3.4.2 Method There were four items per condition. Experiment 1b was
included in the same questionnaire as experiment 1a (see above). Each
participant saw 12 experimental items (2 per condition), 74 unrelated and 16
related distractor items (items of experiment 1a) plus 4 fillers. A larger set of
96 sentences (16 sets of identical lexical material in each of the 6 conditions)
was created and assigned to 8 between-subjects versions in such a way that no
subject saw identical lexical material in more than one sentence.
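The assignment just described can be reproduced with a simple rotation scheme. The scheme below is our reconstruction — the chapter does not spell out the algorithm actually used — but it satisfies the stated constraints: each of the 8 versions contains 12 experimental sentences, 2 per condition, and no version presents the same lexical material twice.

```python
from collections import Counter

N_ITEMS, N_CONDITIONS, N_VERSIONS = 16, 6, 8

# Hypothetical rotation scheme: the version that receives item i in
# condition c is (i + c) mod 8.  Since c only runs over 0..5, each item
# is absent from exactly two of the eight versions.
versions = {k: [] for k in range(N_VERSIONS)}
for i in range(N_ITEMS):
    for c in range(N_CONDITIONS):
        versions[(i + c) % N_VERSIONS].append((i, c))

for sentences in versions.values():
    assert len(sentences) == 12                       # 12 experimental items per subject
    items = [i for i, _ in sentences]
    assert len(items) == len(set(items))              # no lexical set seen twice
    per_condition = Counter(c for _, c in sentences)
    assert all(per_condition[c] == 2 for c in range(N_CONDITIONS))  # 2 per condition
```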
15.3.4.3 Results Figure 15.2 shows the mean acceptability ratings per
condition for all forty subjects.
We computed an ANOVA with the factors LP NUMBER (number of left
part: ambiguous versus singular versus plural) and RP NUMBER (number of
right part: singular versus plural). We found a main effect of LP NUMBER
(F1(2,78) = 30.82, p < .001) which was due to the fact that LP singulars were
less acceptable than both LP plurals (F1(1,39) = 31.69, p < .001) and ambiguous
LP (F1(1,39) = 51.36, p < .001). However, ambiguous and plural LP did
not differ from one another (F1(1,39) = 1.51, p = .34). Furthermore, there was
a main effect of RP NUMBER (F1(1,39) = 24.67, p < .001) which was due to the
fact that RP plurals were more acceptable than RP singulars. We also found an
interaction between both factors (F1(2,78) = 13.77, p < .001). Resolving this
interaction by the factor RP NUMBER, we found a main effect of LP NUMBER
for both RP singulars (F1(1,39) = 6.66, p < .01) and RP plurals (F1(1,39) = 33.37,
p < .001). Within RP singulars, ambiguous LP were better than both singulars
(F1(1,39) = 14.38, p < .001) and plurals (F1(1,39) = 7.61, p < .01), whereas
within RP plurals, ambiguous LP were better than singulars (F1(1,39) = 43.60,
p < .001), but equally acceptable as LP plurals (F < 1).

Figure 15.2. Results of Experiment 1b: mean acceptability ratings (1 = worst, 7 = best) by LP number (ambiguous, singular, plural) and RP number.

                 RP singular   RP plural
  LP ambiguous      3.87          4.78
  LP singular       2.98          3.23
  LP plural         2.70          4.97



15.3.4.4 Discussion When we confine our attention to DNPs with unambiguous
LPs, experiment 1b is in line with what we saw in experiment 1a.
German DNP are subject to the Singularity constraint ruling out (15.10c,
15.10d), so that only DNP with a plural LP can be acceptable. Furthermore, the
Agreement constraint imposes a further restriction on acceptable DNP: the
RP must be plural as well (15.10f).
The results for the ambiguous LP conditions reveal more interesting facts,
in particular when the RP is singular. Let us, however, consider ambiguous
LPs with a plural RP first. When the right part of the DNP disambiguates the
DNP towards a plural interpretation (15.10b), sentences beginning with an
ambiguous left part are as acceptable as sentences with a plural left part
(15.10f). This is not surprising: the human parser must interpret the morphologically
ambiguous LP of the DNP as plural, since the Singularity constraint
against singular DNP cannot be fulfilled otherwise. A right part of the
DNP with a plural marking constitutes no reason for abandoning this plural
hypothesis.
Interestingly, however, ambiguous LP are more acceptable than both singular
and plural items when the RP bears a singular marking (15.10a). This is
in line with the intuitive assessment of such structures mentioned at the
outset, and it confirms our expectation that the presence of a local ambiguity
can increase the acceptability of a sentence.
How does the ambiguity effect in DNP with singular RPs come about? Note
first that the presence of an adjective in the unambiguous LPs and its absence
in the ambiguous LPs cannot be made responsible for the results, since the
complexity of the LP has not had any effect on acceptability in experiment 1a.
Rather, we can link the positive effect of local ambiguity to the processing
difficulty that arguably arises when a locally ambiguous item figures differently
in the computations related to two (or more) different constraints. That
the DNPs with an ambiguous LP are better than those with unambiguously
singular LPs (15.10c) follows straightforwardly from the fact that the ambiguous
item can be (temporarily) interpreted as plural. Thus, the Singularity
constraint banning singular DNP can be considered fulfilled when the
ambiguous item is processed (see above). That the ambiguous LPs are also
better than DNP with plural left parts (15.10e) in turn seems to be related to
the fact that the ambiguous item can also be interpreted as a singular, so that
the Agreement requirement can also be taken to be fulfilled. There are two
mitigating effects of local ambiguity, then, but they are based on two incompatible
interpretations of the ambiguous noun.
Experiment 1b has thus confirmed the expectation that the presence of a
local ambiguity can increase global acceptability. In particular, the results

represented in Figure 15.2 are compatible with the view that intermediate
acceptability assessments (in our case: concerning Singularity) influence
global acceptability: the option of a plural interpretation for a locally
ambiguous noun leads to a positive local acceptability value, because
Singularity appears fulfilled. This positive local assessment contributes to the
global acceptability of DNPs even when the plural interpretation is later
abandoned because a singular right part is detected. In contrast to grammaticality,
global acceptability does not only depend on the final structural
analysis, but also on the acceptability of intermediate analysis steps.
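This view can be made concrete with a toy sketch (ours, not an implementation from the chapter): each part of the DNP contributes a set of available number readings, a constraint counts as locally satisfied if some reading satisfies it, and — exactly as in the Singularity/Agreement case just discussed — the two checks may rely on incompatible readings.

```python
def score(lp_readings, rp_readings):
    """Toy acceptability score: number of constraint checks that succeed
    under at least one available number reading ('sg'/'pl')."""
    singularity = 'pl' in lp_readings            # a plural reading lets Singularity pass locally
    agreement = bool(lp_readings & rp_readings)  # some reading pair makes LP and RP agree
    return int(singularity) + int(agreement)

# With a singular right part, an ambiguous left part passes both checks,
# though via incompatible readings (plural for Singularity, singular for
# Agreement); each unambiguous left part fails one of them:
assert score({'sg', 'pl'}, {'sg'}) > score({'sg'}, {'sg'})
assert score({'sg', 'pl'}, {'sg'}) > score({'pl'}, {'sg'})
# With a plural right part, ambiguous and plural left parts tie:
assert score({'sg', 'pl'}, {'pl'}) == score({'pl'}, {'pl'})
```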
This acceptability pattern can also be found with professional linguists.
They are not immune to such ‘spillover’ effects increasing global acceptability,
as survey 1c has revealed.

15.3.5 Survey 1c
By e-mail, we asked more than sixty linguists (nearly all syntacticians) with
German as their native language for judgements of sixteen DNP constructions,
among them the items (15.11a, 15.11b) illustrating DNP with a singular
RP and a number-ambiguous (15.11a) or singular (15.11b) LP constructed as in
experiment 1b.
(15.11) a. Koffer hat er keinen zweiten
suitcase.amb has he no.sg second.sg
b. Roten Koffer hat er keinen zweiten
red.sg suitcase has he no.sg second.sg
Of the remaining fourteen items, eight were DNP constructions with singular
LP and RP, one was a DNP with plural LP and RP, and four DNP had a plural
LP but checked for different grammatical parameters. There was a further
item with an ambiguous LP. No distractor items were used in order to
increase the likelihood of a reply. Forty-five linguists responded by sorting
the sentences into the categories ‘*’, ‘?’, and ‘well-formed’. The results are
summarized in Figure 15.3, showing the number of participants choosing a
particular grade.

(15.12) Professoren kennt sie zwei
professor.pl knows she two
As Figure 15.3 indicates, sentence (15.12) beginning with an unambiguous plural
item was accepted by nearly all participants. Two-thirds of the participants
rejected sentences that began with an unambiguous singular DNP (15.11b).
Both results are in line with the constraint *Singularity. The reaction to the

Figure 15.3. Survey 1c: number of the forty-five linguists choosing each grade.

             okay    ?    out
  (15.11a)    20    11    14
  (15.11b)     9     6    30
  (15.12)     40     5     0

ambiguous item (15.11a) was different. Only fourteen of the forty-five linguists
rejected this sentence. A statistical comparison between the number of rejections
in (15.11a) versus (15.11b) revealed a significant difference (χ² = 5.82,
p < .05). The result shows that local ambiguities can improve acceptability
not only in the context of fast responses given by experimental subjects when
filling in a questionnaire. The effect is also visible in the more reflected
judgements of professional syntacticians and other linguists.
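For a 2×2 table of counts, the Pearson chi-square statistic can be computed as below. This is a generic sketch: since the same forty-five linguists judged both sentences, the reported value of 5.82 may rest on a different tabulation of the paired responses (e.g. a test on discordant pairs), so the function is not guaranteed to reproduce that exact figure.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rejections vs non-rejections for (15.11a) and (15.11b),
# taken from Figure 15.3: 14 of 45 versus 30 of 45.
x2 = chi_square_2x2(14, 31, 30, 15)
assert x2 > 3.84  # significant at p < .05 for df = 1
```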

15.3.6 Experiment 2: Disjunctive coordination


Experiment 2 was carried out with two goals: first, we wanted to demonstrate
a positive effect of processing difficulty in a domain other than DNP. A second
purpose was to test whether local ambiguities can increase acceptability in
syntactic constellations that do not completely fit into the ‘classical’ preferred
reading/reanalysis constellation. We looked at a construction involving a
syntactically unsolvable agreement problem.
Subject–verb agreement is mandatory in German. Coordinated subjects
may thus lead to expressivity problems when the language offers no rules for
computing the person–number features of the coordinated NPs. When two
NPs are coordinated by and, plural agreement of the verb seems justified on
semantic grounds, but there are no parallel conceptual arguments for picking
any of two different values of the person feature. Timmermans et al. (2004)
found a preference for choosing 3rd person agreement rather than 2nd person
agreement for subjects consisting of a 2nd and 3rd person noun phrase
coordinated by und ‘and’. The order of the two NPs within the coordination
seemed unimportant.

(15.13) a. weil du und er gehen
because you and he go.3pl
b. weil du und er geht
because you and he go.2pl

When two singular NPs are coordinated by oder ‘or’, choosing plural agreement
for the verb is not (necessarily) semantically justified. Still, when one
searches the web, plural agreement, as in (15.14), is one of the frequent
options.
(15.14) Wer weiss, wie er oder ich in zwei Jahren denken
who knows how he or I in two years think.3pl
‘who knows what I or he will think in two years’ time’
Of the first twenty-five occurrences of er oder ich ‘he or I’ found by Google in
the German pages of the internet for which verbal agreement could be
determined (included in the first 180 total hits for er oder ich), fourteen had
a plural verb, and eleven a singular one. However, the plural is less often
chosen when the addition of entweder ‘either’ comes close to forcing the
exclusive interpretation of oder. Among the first twenty-five occurrences of
entweder er oder ich ‘either he or I’ found by Google in the German pages of
the internet for which verbal agreement could be determined (included in the
first 100 total hits for the construction), only five were constructed with a
plural verb.2
When one looks at the data extracted from the web showing singular
agreement more closely, an interesting pattern emerges. Of the thirty-one
examples, twenty-two involved a verb which was morphologically ambiguous
between a 1st and 3rd person interpretation (this is true of past tense verbs,
modal verbs, and a few lexical exceptions), and only nine bore an unambiguous
person feature (present tense verbs apart from the exceptions mentioned),
with a strong bias for 3rd person (7 of 9). This is in line with
intuitions. Neither of the two verbal forms of schlafen ‘sleep’ sounds really
acceptable in the present tense, in which the forms of 1st (15.15c) and 3rd
person (15.15a) are distinguished morphologically. Examples (15.15b) and
(15.15d) involve verb forms that are morphologically ambiguous, and sound
much better.
(15.15) a. er oder ich schläft ER, UNA
he or I sleep.3sg
b. er oder ich schlief ER, AMB
he or I slept.amb
c. er oder ich schlafe
he or I sleep.1sg
d. er oder ich darf schlafen
he or I may.amb sleep
2 The websearch was done on 26 January 2005 at 7pm GMT.

We conducted two questionnaire studies in order to test whether the use of
person-ambiguous verbs increases acceptability.

15.3.7 Description of experiment 2a


15.3.7.1 Materials The four conditions of experiment 2a are exemplified in
(15.15a–b) and (15.16). The experimental items began with an NP consisting of
the pronouns ich ‘I’ and er ‘he’ conjoined by oder ‘or’. In the ER-initial
condition, er came first (15.15); in the ICH-initial condition, the NP began
with ich (15.16). The verb form was always 3rd person singular. The verb could,
however, either allow an additional 1st person singular interpretation
(AMBiguous condition, 15.15b, 15.16b) or be confined to the 3rd person reading
(UNAmbiguous condition).
(15.16) a. ich oder er schläft ICH, UNA
I or he sleeps.3sg
b. ich oder er schlief ICH, AMB
I or he slept.amb
15.3.7.2 Method Forty-eight students of the University of Potsdam
participated. They were paid for their participation, or received course
credits. Participants rated 120 sentences for acceptability, on a seven-point
scale (1 = very bad, 7 = very good). There were 16 experimental items (4 items/
condition) in a within-subject design, and 104 items not related to the
experiment.
15.3.7.3 Results Figure 15.4 represents mean judgements of acceptability in
experiment 2a. The mean acceptability of structures beginning with er
(ER-initial, (15.15)) was 4.23, and was statistically indistinguishable from the 4.24
mean acceptability of sentences beginning with ich (ICH-initial, (15.16))
(F1 < 1, F2 < 1). However, there was a significant effect of ambiguity: the

Figure 15.4. Experiment 2a: mean acceptability ratings (1 = worst, 7 = best).

          UNA    AMB
  ER      3.96   4.51
  ICH     4.07   4.41



ambiguous structures (15.15b, 15.16b) (AMB) were rated better (4.5) than
unambiguous ones (15.15a, 15.16a) (UNA) (4.0) (F1(1,47) = 6.79, p < .05;
F2(1,15) = 50.74, p < .001). There was no interaction between both factors
(F1(1,47) = 1.1, p = .30, F2 < 1).
15.3.7.4 Discussion The order in which er and ich appeared in the
experimental items had no effect on acceptability. In this respect, experiment
2a is comparable to the results of Timmermans et al. (2004) involving
and-coordination. The morphological ambiguity of the verb exerted an effect on
acceptability, in the expected direction: whenever the morphological shape of
the verb fits the person specification of both pronouns because of the verbal
ambiguity, acceptability increases. This ambiguity effect is in line with our
expectations. The acceptability of a sentence depends on whether the verb
agrees with the subject. In the unambiguous conditions, the verb visibly
disagrees with one of the two pronouns. In (15.15b, 15.16b), however, the
ambiguous verb appears to meet the requirements of both pronouns (but
only relative to different interpretations of the verb), which makes a local
perception of acceptability possible. The computations for pairwise agreement
between the verb and the two pronouns yield positive results, which has a
positive effect on global acceptability even though the two pairwise agreement
computations cannot be integrated, since they work with different
interpretations of the ambiguous verb.
One might object that the difference between the ambiguous and the
unambiguous condition might also be explained in terms of grammatical
well-formedness. The ambiguous verb form might have an underspecified
grammatical representation, viz. [singular, –2nd person], which is grammatically
compatible with both a 1st and a 3rd person subject. In contrast, the
features of the unambiguous 3rd person form clash with those of the 1st
person pronoun. Thus, the higher acceptability of the ambiguous forms
might only reflect the absence of a feature clash.
Such an account would leave it open, however, why the sentences with
ambiguous verb forms are not rated as fully grammatical, as they should be if
no feature clash were involved. We also tested the plausibility of this
alternative explanation in experiment 2b, in which we investigated the
acceptability of sentences in which er ‘he’ and ihr ‘you, plural’ were conjoined by
oder ‘or’. In the regular present tense paradigm, 3rd person singular and 2nd person
plural forms fall together. There is no simple way in which this ambiguity
can be recast as underspecification.3 If underspecification rather than local

3 In a paper written after the completion of the present article, Müller (2005) offers an
underspecification analysis for (15.17a, 15.17b) within a distributed-morphology model, however.

ambiguity was responsible for the findings in experiment 2a, there should be
no benefit in acceptability in experiment 2b resulting from the use of
homophonous forms.
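The contrast between the two accounts can be illustrated schematically (our sketch, with deliberately simplified feature bundles): a set of fully specified readings can be recast as a single underspecified bundle only if it equals the full cross-product of the feature values it contains. The 1sg/3sg homophony of past-tense forms like schlief passes this test; the 3sg/2pl homophony of present-tense kommt does not.

```python
from itertools import product

def is_underspecifiable(readings):
    """True iff a set of fully specified (person, number) readings can be
    captured by one underspecified bundle, i.e. iff the set equals the full
    cross-product of the person and number values occurring in it."""
    persons = {p for p, _ in readings}
    numbers = {n for _, n in readings}
    return readings == set(product(persons, numbers))

schlief = {(1, 'sg'), (3, 'sg')}   # past tense 1sg/3sg homophony
kommt = {(3, 'sg'), (2, 'pl')}     # present tense 3sg/2pl homophony

assert is_underspecifiable(schlief)    # e.g. [singular, -2nd person]
assert not is_underspecifiable(kommt)  # would wrongly include 3pl and 2sg
```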

15.3.8 Description of experiment 2b


15.3.8.1 Materials The four conditions of experiment 2b are exemplified in
(15.17). The experimental items began with an NP consisting of the pronouns
er ‘he’ and ihr ‘you, plural’ conjoined by oder ‘or’. In the ER-initial condition,
er came first (15.17a, 15.17c); in the IHR-initial condition, the NP began with
ihr (15.17b, 15.17d). The verb form was always 2nd person plural. In the
UNAmbiguous condition (15.17c, 15.17d), the verb appeared in past tense, in
which it is distinct from the 3rd person singular form. In the AMBiguous
condition, the present tense was used. Such verbs allow an additional 3rd
person reading.
(15.17) a. er oder ihr kommt verspätet zu dem Treffen ER, AMB
he or you come late to the meeting
b. ihr oder er kommt verspätet zu dem Treffen IHR, AMB
c. er oder ihr kamt verspätet zu dem Treffen ER, UNA
he or you came late to the meeting
d. ihr oder er kamt verspätet zu dem Treffen IHR, UNA
15.3.8.2 Method Thirty-two students of the University of Potsdam
participated. They were paid for their participation, or received course
credits. Participants rated 96 sentences for acceptability, on a seven-point
scale (1 = very bad, 7 = very good). There were 16 experimental items (4 items/
condition) in a within-subject design, and 80 items not related to the
experiment.
15.3.8.3 Results Figure 15.5 represents mean judgements of acceptability
in experiment 2b. The mean acceptability of structures beginning with er
(ER-initial, 15.17a, 15.17c) was 4.03, and was statistically indistinguishable
from the 3.92 mean acceptability of sentences beginning with ihr
(IHR-initial, 15.17b, 15.17d) (F1 < 1, F2 < 1). However, there was a
significant effect of ambiguity: the ambiguous structures (15.17a, 15.17b)
(AMB) were rated better (4.58) than unambiguous ones (15.17c, 15.17d)
(UNA) (3.38) (F1(1,31) = 28.65, p < .001; F2(1,15) = 22.26, p < .001). There
was no interaction between both factors (F1 < 1, F2 < 1).
15.3.8.4 Discussion In line with previous findings, the order of the
pronouns had no effect on acceptability. Acceptability was, rather,

Figure 15.5. Experiment 2b: mean acceptability ratings (1 = worst, 7 = best).

          UNA    AMB
  ER      3.42   4.64
  IHR     3.33   4.52

influenced by local ambiguity again. Structures with a visible clash between
the 3rd person pronoun and the 2nd person verb form (15.17c, 15.17d) were less
acceptable than sentences in which the ambiguity of the verb form made it
seem compatible with both the 3rd singular and the 2nd person plural
pronoun. To the extent that this particular constellation of features is
difficult to represent in terms of some (plausible) underspecification of the
grammatical features of verbs like kommt, we would possess an additional
type of evidence for the claim that the acceptability difference found in
experiment 2b is not due to a difference in grammaticality between the two
conditions.
For the record, it may be added that we found a similar ambiguity effect
for verb stems ending in –s in a further experiment. For such verbs (like reis-,
‘travel’), the absence of geminate consonants in German implies that the
addition of the 3rd person singular –t ending has the same outcome as
the addition of the 2nd person singular –st ending, viz. reist. Sentences
such as (15.18a) with such ambiguous verb forms were again rated better
(4.80 versus 4.41 on our 7-point scale) than those involving the unambiguous
past tense form (15.18b) by 32 subjects in an experimental design
identical to the one of experiment 2b (F1(1,31) = 6.58, p < .05; F2(1,15) =
4.38, p = .05).

(15.18) a. er oder du reist nach Amerika
he or you travel to America
b. er oder du reiste nach Amerika
he or you travelled to America

Local ambiguity thus seems to increase acceptability irrespective of the source
of the ambiguity and the particular feature combinations involved.

15.3.9 Experiment 3: Fronted verb phrases


Experiment 3 investigated whether case ambiguities can also increase acceptability.
German VPs can appear in the position immediately preceding the
finite verb in main clauses, as (15.19a) illustrates. Such structures are grammatical
when the NP in the fronted VP is an (underlying) object of the verb.
The inclusion of an (underlying) subject (as in 15.19b) is taken to be much less
acceptable.

(15.19) a. [VP einen Jungen geküsst] hat sie nicht
a.acc boy.acc kissed has she not
‘she has not kissed a boy’
b. ??ein Junge geküsst hat sie nicht
a.nom boy.nom kissed has her not
‘a boy has not kissed her’
Feminine and neuter nouns do not distinguish morphologically between
nominative and accusative. Unlike (15.19a–b), (15.19c–d) with feminine Frau
involve a local ambiguity of the sentence initial NP.

(15.19) c. [VP eine Frau geküsst] hat er nicht
a.amb woman kissed has he.nom not
d. [VP eine Frau geküsst] hat ihn nicht
a.amb woman kissed has him.acc not
The grammatical restriction against the inclusion of subjects in preposed VPs
implies a parsing preference for initially analysing eine Frau as the object of
the verb geküsst in (15.19c–d). This analysis can be maintained in structures
such as (15.19c) in which the second NP is nominative, but it must be
abandoned in (15.19d) when the pronominal NP is parsed, because it bears
accusative case, which identifies it as the object. Example (15.19d) should be
less unacceptable than (15.19b) if global acceptability is influenced by temporary
acceptability values: unlike (15.19b), (15.19d) initially appears to respect
the ban against the inclusion of subjects in fronted VPs. We tested this
prediction in a questionnaire and a speeded acceptability rating experiment.

15.3.10 Description of experiment 3a


15.3.10.1 Material The experimental items had the structure illustrated in
(15.20).

(15.20) Fronted VP = ambiguous subject + verb
a. Ein schlaues Mädchen geküsst hat ihn noch nie
a.amb clever.amb girl kissed has him not yet
‘a clever girl has not yet kissed him’
Fronted VP = ambiguous object + verb
b. Ein schlaues Mädchen geküsst hat er noch nie
a.amb clever.amb girl kissed has he not yet
‘he has not yet kissed a clever girl’
Fronted VP = unambiguous subject + verb
c. Ein junger Mann besucht hatte ihn erst gestern
a.nom young man visited had him only yesterday
‘a young man visited him only yesterday’
Fronted VP = unambiguous object + verb
d. Einen jungen Mann besucht hatte er erst gestern
a.acc young man visited had he only yesterday
‘he visited a young man only yesterday’

All experimental items involved a preposed VP. The NP in this VP
could either bear overt case morphology (unambiguous condition,
15.20c–d) or be unmarked for case (locally ambiguous condition, 15.20a–b).
The second NP in the sentence was a pronoun bearing the case not realized
by the first NP (unambiguous condition), or which disambiguated the
initial NP in the ambiguous condition towards an object or subject
reading. A set of 64 sentences (16 sets of identical lexical material in each of
the 4 conditions) was created and assigned to 4 between-subjects versions in
such a way that no subject saw identical lexical material in more than one
sentence.

15.3.10.2 Method Forty-eight students of the University of Potsdam
participated. They were paid for their participation, or received course
credits. The 16 experimental items (4 per condition) were among the
distractors of experiment 2a.

15.3.10.3 Results and discussion As Figure 15.6 shows, fronted verb phrases
that include a direct object are more acceptable than those that include a
fronted subject (F1(1,47) = 34.74, p < .001, F2(1,15) = 37.26, p < .001).
Contrary to our expectation, there was no main effect of ambiguity
(F1 < 1, F2 < 1) and no interaction between both factors
(F1(1,47) = 2.69, p = .11, F2 < 1). We used the same material in a speeded
acceptability rating experiment.

Figure 15.6. Experiment 3a (questionnaire): mean acceptability ratings (1 = worst, 7 = best).

          AMB    UNA
  SUB     3.8    3.7
  OBJ     4.9    5.1

15.3.11 Description of experiment 3b


15.3.11.1 Material The experimental items were the ones used in
experiment 3a.
15.3.11.2 Method Twenty-six students were paid for their participation in a
speeded acceptability judgement task. There were 64 experimental sentences
(16 per condition), and 160 unrelated filler sentences. After a set of 16 training
sentences (4 in each of the critical conditions), the sentences of the
experiment were randomly presented word by word. Every word appeared
in the centre of a screen for 400 ms (plus 100 ms ISI). 500 ms after the last
word of each sentence, subjects had to judge its well-formedness within a
maximal interval of 3000 ms by pressing one of two buttons. 1000 ms after
their response, the next trial began.
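The trial timing can be summarized as a schedule (a schematic reconstruction of the trial structure, not the original presentation software; we assume the 500 ms interval is measured from the offset of the last word):

```python
# Presentation constants from the method description:
WORD_MS, ISI_MS = 400, 100       # word on screen, then blank interval
PRE_RESPONSE_MS = 500            # delay before the judgement prompt
RESPONSE_DEADLINE_MS = 3000      # maximal judgement interval
INTER_TRIAL_MS = 1000            # pause after the response

def trial_schedule(words):
    """Return (word, onset_ms, offset_ms) for each word, plus the onset of
    the judgement prompt (assumed: 500 ms after the last word's offset)."""
    schedule, t = [], 0
    for word in words:
        schedule.append((word, t, t + WORD_MS))
        t += WORD_MS + ISI_MS
    prompt_onset = (t - ISI_MS) + PRE_RESPONSE_MS
    return schedule, prompt_onset

schedule, prompt = trial_schedule(['Ein', 'junger', 'Mann', 'besucht', 'hatte', 'ihn'])
assert schedule[0] == ('Ein', 0, 400)
assert prompt == schedule[-1][2] + PRE_RESPONSE_MS
```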

Figure 15.7. Experiment 3b (speeded rating): percentage of trials rated acceptable.

                AMB      UNA
  VP with SU    65.63    51.44
  VP with OB    94.71    96.15



15.3.11.3 Results The results of experiment 3b are represented in Figure 15.7.
There was a main effect of the grammatical function of the NP in the fronted
VP: Fronted VPs that include an object were rated acceptable in 89.5 per cent
of cases, contrasting with 54.6 per cent for VPs including a subject
(F1(1,25) = 85.65, p < .0001, F2(1,15) = 127.7, p < .001). Items in which the
NP in VP was case-ambiguous were rated as acceptable more often (75.11 per
cent) than items in which there was no ambiguity (69.0 per cent)
(F1(1,25) = 6.98, p < .05, F2(1,15) = 14.87, p < .05). The interaction between
the grammatical function of the NP in VP and its ambiguity was also significant
(F1(1,25) = 10.25, p < .01, F2(1,15) = 29.14, p < .001): for VPs including
objects, there was no ambiguity-related difference (ambiguous: 94.7
per cent; unambiguous: 96.1 per cent). VP-fronting that pied-pipes the subject
was considered acceptable in 65.6 per cent of the trials when the subject was
not overtly case marked, but only 51.4 per cent of the trials were rated as
acceptable when the subject-NP bore an unambiguous case marking.
15.3.11.4 Discussion Experiments 3a and 3b show that there is a syntactic
restriction against underlying subjects appearing in fronted VPs. They also
show that local case ambiguities do not reduce acceptability. Furthermore,
experiment 3b confirms our expectation that local ambiguities may increase
acceptability: the temporary fulfilment of the constraint blocking subjects in
preposed VPs in structures such as (15.20a) seems to render such structures
more acceptable than examples such as (15.20c), in which the violation of the
anti-subject restriction is obvious from the beginning. Experiment 3b is thus
in line with experiments 1 and 2.
However, locally ambiguous and unambiguous structures were equally
acceptable in experiment 3a. We can only offer some speculations about the
reasons for this difference from the other experiments. The resolution of the
local ambiguity in experiment 3 affects the assignment of grammatical functions
and the interpretation of the sentence, while the number and person
ambiguities had no such effect in experiments 1 and 2. The need for revising
an initial interpretation may have negative consequences for acceptability that
override positive effects of local ambiguity (experiment 3a). If this reduction
of acceptability takes place in a time window later than the one used in
experiment 3b, we would understand why the speeded acceptability rating
task shows a positive impact of local ambiguity on acceptability.

15.4 Structural ambiguities


In experiments 1–3, the positive results of early computations increased the
global acceptability of a sentence even when the outcome of these early

computations had to be revised later. The experiments dealt with constructions
in which some crucial item was morphologically ambiguous. In experiment
4, we investigated whether structural ambiguities in which
morphological facts play no (decisive) role can also lead to increased
acceptability.
Experiment 4 focused on structures that Kvam (1983) had used in his
informal studies on long wh-movement constructions in German. He
observed that the acceptability of such sentences depends on whether the
grammatical features of the phrase having undergone long movement also
match requirements imposed by the matrix verb on its arguments. Example
(15.21) illustrates two structures in which the subject of a complement clause
has been moved into the matrix clause. Example (15.21a) is more acceptable
than (15.21b). When (15.21a) is parsed, was ‘what’ locally allows an analysis as
an object of the matrix clause, while wer ‘who’ could neither function as the
subject nor as the object of the matrix clause.
(15.21) a. was denken Sie, dass die Entwicklung beeinflusst hat
what think you that the development influenced has
b. wer denken Sie, dass die Entwicklung beeinflusst hat
who.sg think.pl you that the development influenced has
‘who/what do you think influenced the development’
The same logic underlies the contrast between the relative clauses in (15.22), in
which the relative pronoun is extracted from an infinitival complement
clause. In the more acceptable (15.22a), the relative pronoun die also fits the
accusative case requirement of the predicate embedding the infinitive. In the
less acceptable (15.22b), the dative case of the relative pronoun clashes with
the case requirement of the embedding predicate.
(15.22) a. Eine Kerze, die er für gut hielt, dem Ludwig zu weihen
a candle which he for good held the.dat Ludwig to dedicate
‘a candle which he considered good to dedicate to Ludwig’
b. Eine Frau, der er für angemessen hielt, ein Geschenk zu geben
a woman who.dat he for appropriate held a present to give

Such contrasts in acceptability are predicted by the hypothesis pursued here.
The relatively low acceptability of the b-examples reflects the decrease in
acceptability that long distance movement always seems to come along
with. Probably, this decrease is due to the processing problems which long
distance movement creates. The a-examples, on the other hand, involve a
local ambiguity: temporarily, the a-examples can be interpreted by the human
parser as involving short distance movement only. Short distance movement

is more acceptable than long distance movement. If the global acceptability
of a clause reflects the status of intermediate parsing steps, long distance
movement constructions that temporarily allow a short distance analysis
(15.21a, 15.22a) should be more acceptable than those long distance movement
constructions that do not involve such an ambiguity (15.21b, 15.22b).

15.4.1 Description of experiment 4


15.4.1.1 Material Experiment 4 consists of two subexperiments, one for
wh-questions, the other for relative clauses. In the question subexperiment,
the eight experimental items (4 per condition) had the structure illustrated in
(15.21). The subject of a dass ‘that’-complement clause is moved into the
matrix clause, consisting of a plural matrix verb and a pronominal subject. In
the unambiguous wh-condition (15.21b), the subject extracted from the
complement clause is nominative wer ‘who’. Because of its case, wer allows
no intermediate analysis as the object of the matrix clause. Since it does not
agree with the plural verb, wer can also not be analysed as the matrix subject.
In the ambiguous wh-condition (15.21a), the moved wh-pronoun was ‘what’ is
case-ambiguous. In its accusative interpretation, it could figure as the object of
the matrix clause; in its (eventually mandatory) nominative interpretation, it
is the subject of the complement clause.
The eight items of the relative clause subexperiment were constructed as
illustrated in (15.22). A relative pronoun is extracted from an inWnitival
complement clause. In the unambiguous relative clause condition (15.22b),
the dative case of the relative pronoun does not match the case requirements
of the predicate embedding the inWnitive. In the locally ambiguous relative
clause condition (15.22a), the accusative relative pronoun is compatible with
the case requirements of the embedding predicate.

15.4.1.2 Method Forty-eight students of the University of Potsdam


participated. They were paid for their participation, or received course
credits. The sixteen experimental items were among the distractor items of
experiment 2a.

15.4.1.3 Results and discussion Figure 15.8 graphically represents the


mean acceptability of locally ambiguous and unambiguous wh-questions.
The acceptability of the locally ambiguous question is much higher
than that of the unambiguous construction (F1(1,47) = 30.05, p < .001;
F2(1,7) = 62.50, p < .001).
The wh-subexperiment of experiment 4 confirms that long distance
wh-movement is not fully acceptable for speakers of Northern German (see
314 Gradience in Wh-Movement Constructions

Figure 15.8. Experiment 4 – wh-questions (mean acceptability: unambiguous 3.15, ambiguous 4.7)

also Fanselow et al. to appear). The fairly low acceptability value for the
unambiguous wh-condition constitutes clear evidence for this. Furthermore,
as in the preceding experiments, acceptability is affected by the presence of a
local ambiguity in a significant way: if the sentence to be judged can temporarily
be analysed as involving short movement, its acceptability goes up
quite dramatically.
The initial segment of (15.21a) is locally ambiguous in more than one way.
In addition to the possibility of interpreting was as a matrix clause object or
an argument of the complement clause, was also allows for a temporary
analysis as a wh-scope-marker in the German ‘partial movement
construction’ illustrated in (15.23).
(15.23) was denkst Du wen Maria einlädt
what think you who.acc Mary invites
‘who do you think that Mary invites?’
Therefore, we only know that the local ambiguity of (15.21a) increases its
acceptability, but we cannot decide whether this increase is really due to the
short versus long movement ambiguity.
The relative clause subexperiment avoids this problem. In the grammatical
context in which they appear in (15.22), the crucial elements die and der
can only be analysed as relative pronouns. The only ambiguity is a structural
one: long versus short movement of the relative pronoun. When the
relative pronoun is temporarily compatible with a short movement
interpretation, the structure is more acceptable than when the case of the
relative pronoun clashes with the requirements of the matrix clause
(F1(1,47) = 8.28, p < .01; F2(1,7) = 3.73, p = .10).
Both subexperiments thus show the expected influence of the local
ambiguity on global acceptability: long distance movement structures are

Figure 15.9. Experiment 4 – relative clauses (mean acceptability: unambiguous 4.17, ambiguous 4.76)

perceived as more acceptable when the wh-phrase/the relative pronoun
involves a local ambiguity. The final subexperiment revealed that such effects
show up even when the only ambiguity involved is the one between short and
long movement.

15.5 Conclusions
The experiments reported in this paper have shown that the presence of a
local ambiguity influences the overall acceptability of a sentence. If our
interpretation of the results is correct, there is a spillover from the
acceptability of the initial analysis of a locally ambiguous structure to the global
acceptability of the complete construction. Structures violating some
constraint may appear more acceptable if their parsing involves an intermediate
analysis in which the crucial constraint seems fulfilled. Similar effects show up
in further constructions, such as free relative clauses (see Vogel et al. in
preparation).
At the theoretical level, several issues arise. First, the factors need to be
identified under which local ambiguities increase acceptability. Secondly,
means will have to be developed by which we can distinguish mitigating
effects of local ambiguities from a situation in which the grammar accepts a
feature clash in case it has no morphological consequences. Thus, in
contrast to what we investigated in experiment 2, plural NP coordinations
such as the one in (15.24) that involve 1st and 3rd person NPs seem fully
acceptable although they should involve a clash of person features. Perhaps
the different status of (15.24) and the structures we studied in experiment 2 is
caused by the fact that 1st and 3rd person plural verb forms are always
identical in German, whereas the syncretisms studied above are confined to
certain verb classes, or certain tense forms. Similarly, the case clash for was in

(15.25)4 has no negative effect on acceptability at all, in contrast to what
happens in (15.26), and this is certainly due to the fact that inanimate
pronouns never make an overt distinction between nominative and accusative
case.
(15.24) entweder wir oder die Brasilianer gewinnen das Spiel
either we or the Brazilians win the game
(15.25) was Du kaufst ist zu teuer
what you buy is too expensive
(15.26) wer/wen Du mitbringst ist zu schüchtern
who you bring is too shy
‘the person who you bring is too shy’
At a more practical level, our results certainly suggest that sentences involving
a local ambiguity should be avoided when one tries to assess the acceptability
of a construction one is interested in.

4 Kaufen assigns accusative case to was, while the matrix predicate requires nominative case.
16

What’s What?
NOMI ERTESCHIK-SHIR

16.1 What is the status of gradience?


My purpose in this chapter is to demonstrate that the source of graded
acceptability judgements cannot be purely syntactic. Instead, such data are
predicted by information structure (IS) constraints.1
Since the early days of generative grammar, it has been observed that
acceptability often patterns as squishes (e.g. Ross 1971). Ross’ explanation was
couched in terms of the strength of the transformation, the strength of the
construction (island), and the strength of the language. Danish was therefore
considered to be a ‘strong’ language because it allowed extraction out of
relative clauses, which were graded as ‘strong’ islands, by a ‘weak’
transformation such as wh-movement:
(16.1) Hvad for en slags is er der mange børn der kan li?
what kind of ice cream are there many children who like
The idea that gradience is the result of the ‘strength’ of the processes (or
constraints) involved has survived till today, particularly in some versions of
optimality theory. Less sophisticated attempts at this type of theory were
made in the 1990s: for example, the Empty Category Principle (ECP) was
considered to be a stronger constraint and subjacency was considered to be
a weak constraint. The violation of both these constraints together was
predicted to result in a stronger grammaticality infraction than the violation
of just one. This provided an explanation for the distinction between the
examples in (16.2) and (16.3):

1 Thanks to Gisbert Fanselow and an anonymous reviewer for their comments and to
Tova Rapoport, Sofie Raviv, and the audience of the ‘Conference on Gradedness’ at the University
of Potsdam for their feedback. This work is partially supported by Israel Science Foundation Grant
#1012/03.

(16.2) a. ?This is the guy that I don’t know whether to invite t


b. ?This is the guy that I don’t know whether I should ask t to come
to the party.
c. ?This is the guy that I asked whether Peter had seen t at the party.
(16.3) a. *This is the guy that I don’t know whether t should be asked to
come to the party.
b. *This is the guy that I asked whether t had seen Peter.

The hierarchy in (16.4) (Lasnik and Saito 1992: 88) illustrates that the strength
of subjacency can be seen as depending on the number of barriers crossed. In
the last example of the three, subjacency is doubly violated; in the others it is
only singly violated.

(16.4) a. ??What did you wonder whether John bought?


b. ?*Where did you wonder what John put?
c. *Where did you see the book which John put?
This shows that in principle it is possible to have, as the output of the syntax,
sentences of various levels of acceptability. The number of constraints
violated, as well as the strength of the relevant constraint/s, will render
different outputs.
However, as observed already in Ross (1971), the extraction data are
complicated by the fact that not all processes of extraction render the
same results. The sentences in (16.4a) and (16.4b) are worse than the ones
in (16.2). An attempt to explain why wh-movement can render worse
outputs than relativization is to be found in Cinque (1990). Cinque
demonstrates that extracted phrases which can be interpreted as being d-linked
render superior results to those in which the extracted phrases cannot be
interpreted in this way. The type of extraction illustrated in (16.2) is more readily
interpreted as being d-linked than simple wh-movement. D-linked
wh-movement also improves the examples in (16.4a) and (16.4b), as shown in (16.5):
(16.5) a. ?Which book did you wonder whether John bought?
b. ??Which place did you wonder what John put?
D-linking depends on whether a contextual referent for the wh-phrase is
available. Hence in a context in which a set of relevant books (16.5a) or a set
of relevant places (16.5b) is available (16.4a) and (16.4b) should be as good as
the examples in (16.5). Cinque builds this notion of referentiality into his
syntax and predicts that when the context provides the required referent,
the extraction should be perfectly acceptable. What is missing in Cinque’s

approach is an explanation of why referentiality should interact with syntactic


constraints, such as subjacency, in this way.
On the basis of squishy data of this type, I argued in Erteschik-Shir (1973)
that extraction is completely determined by IS constraints, in particular that
only focus domains are transparent for purposes of extraction. The intuition
behind this idea was that potential focus domains are processed differently
from non-focus domains in that gaps are only visible in the former.2 In view
of the fact that the availability of focus domains depends on context, the results
will be graded according to the discourse into which the target sentence is
embedded. Example (16.6) provides an illustration:
(16.6) a. Who did John say that he had seen?
b. ?Who did John mumble that he had seen?
c. *Who did John lisp that he had seen?
Example (16.6b) is improved in a context in which ‘mumbling’ has been
mentioned (e.g. following ‘At our meetings everyone always mumbles’).
Example (16.6c) is acceptable in a context in which it is known that John
lisps. This is because such a context enables the main verb to be defocused and
consequently enables the subordinate that-clause to be focused. If intuitions
are elicited out of context, judgements for sentences of this kind will depend
on whatever context the informant happens to come up with. The examples
in (16.7) illustrate other kinds of contextual factors that interact with
focus assignment and the concomitant acceptability judgements.
(16.7) a. ??What did the paper editorialize that the minister had done?
b. *What did you animadvert that he had done?
Example (16.7a) would sound much better if uttered by a member of an
editorial board, and (16.7b) probably can’t be contextually improved due to
the fact that highly infrequent items such as animadvert are necessarily
focused. Contrastive contexts also interact with extraction judgements:
(16.8) a. ?Who did John SAY that he had seen? [= contrastive]
b. Who did JOHN say that he had seen? [= contrastive]
Contrastive focus on the main verb, as in (16.8a), or on another constituent of
the main clause, as in (16.8b), does not preclude focus on the subordinate
clause. Therefore these sentences are fine with contrastive interpretations. The
reason (16.8a) is slightly more degraded than (16.8b) is because it is harder to
construe a likely context for it.
2 See Erteschik-Shir and Lappin (1983: 87) for this proposal which also provides an explanation for
why resumptive pronouns salvage islands.

These examples illustrate that the positive response of informants is
conditional on their ability to contextualize in such a way that the clause from
which extraction has occurred is interpreted as a focus domain. In view of the
fact that informants differ with respect to the contexts they are able to
construct, the results across informants are predicted to be non-uniform.
A number of different syntactic solutions have been suggested over the
years to account for such squishes in grammaticality. This type of solution
does not, however, explain the gradience of the output, nor does it explain the
contextual effects. Speakers’ judgements with respect to data of the kind
illustrated in (16.6)–(16.8) are rarely stable: differences are found across
speakers and sometimes the responses of the same speaker change. This
type of instability occurs whenever grammaticality is context dependent
because the judgements in such cases are also context dependent. Therefore,
if a sentence of this type is presented to an informant out of context, it is
judged good to the extent that the speaker can imagine a context in which the
verb is defocused. The lowest grade will be assigned to a sentence for which
the particular informant does not come up with a context which improves it.
No syntactic account of data of the type in (16.6)–(16.8), even if it can predict
gradience, will be able to predict the contexts which improve acceptability.
Syntactic constraints will therefore always fail empirically.
Let us examine if this is indeed the case: it has been suggested that the
that-clauses following verbs of manner-of-speaking, other than say, are adjuncts
rather than complements (e.g. Baltin 1982), and adjuncts are, of course,
islands. Extraction is therefore predicted to be blocked. Formulating the
constraint on extraction out of that-clauses as a syntactic constraint on
extraction out of adjuncts predicts that, with a particular verb, extraction
will either be perfectly good, or totally bad. Moreover, it will not allow for the
influence of context in improving extractability.
Supporting the ‘adjunct’ analysis is the fact that the that-clauses which are
argued to be adjuncts are optional:
(16.9) John mumbled/lisped/*said.
This correlation, however, also follows from an analysis in terms of IS: the
verbs that require a complement are light verbs which do not provide a focus.
Sentences with such verbs without a complement are ruled out because they
do not contain an informational focus, a minimal requirement for any
sentence.
Another reason that the oddness of the sentences in (16.6)–(16.8) cannot be
due to a syntactic constraint is that not only must a level of acceptability be
assigned, but an account of the context dependency of the grammaticality

judgements must also be given. Such an account is provided by a theory of IS


which accounts for the contextual properties of sentences.
I conclude that any phenomenon which varies with context among and
across speakers cannot receive a syntactic account. An account in terms of IS
is geared to predict this type of variation. It follows that syntactic constraints
will always render ungraded results. A violation of a syntactic constraint will
therefore be ungrammatical, a violation of an IS constraint will be open to
contextual variation and will therefore result in gradience. There will be no
weak syntactic constraints, only strong ones. Examples of violations of such
‘real’ syntactic constraints are shown in (16.10):
(16.10) a. *John eat soup. (agreement)3
b. *Eats John soup? (do)
c. *John likes he. (case)
If we adopt the proposal that violations of syntactic constraints cannot be
graded, whereas violations of IS constraints can be, we can employ the
presence of context-sensitive grammaticality squishes as a diagnostic for
whether a syntactic or an IS constraint is involved: whenever context interacts
with acceptability, the constraint cannot be syntactic. The answer to the
question posed in the title is therefore that IS constraints can generate graded
output whereas syntactic constraints cannot.
It is often assumed that IS constraints must be universal since they are
based on universal concepts such as topic and focus. This is only partially true
and depends on how the particular language codes these basic concepts.
Danish is a language in which topics are fronted whenever possible. In
English, however, topics are generally interpreted in situ. This difference
triggers a difference in the application of island constraints in the two
languages. Compare, for example, the following case of extraction out of a
relative clause in the two languages (both are licensed by the IS constraint in
(16.16) below):
(16.11) a. Den slags is er der mange der kan li.
This kind icecream are there many who like
‘This kind of icecream there are many who like.’
b. ?This is the kind of icecream that there are many people who like.

3 Timmermans et al. (2004) argue that agreement involves both a syntactic procedure and a
conceptual-semantic procedure which affects person agreement with Dutch and German coordinated
elements which differ in person features. The former, according to these authors, ‘hardly ever derails’.
This is what I have in mind here. The fact that nonsyntactic procedures are also involved in certain
agreement configurations is irrelevant to the point I’m making here.

Example (16.11a) is perfect in Danish and sentences of this sort are common.
Example (16.11b) is surprisingly good in English in view of the fact that it
violates the complex NP constraint, yet it is not considered perfect by speakers
of English. In Erteschik-Shir (1982) I offer more comparative data and
illustrate that the acceptability squish in English is exactly the same as in
Danish, yet all the examples in English are judged to be somewhat worse than
their Danish counterparts.
In Erteschik-Shir (1997), I introduce a theory of IS, f(ocus)-structure
theory. F-structure is geared to interact with syntax, phonology, and
semantics and is therefore viewed as an integral part of grammar. Here I argue that
this approach predicts gradience effects of various kinds. In Section 16.2, I
map out the theory of f-structure. Section 16.3 demonstrates the f-structure
constraint on extraction. In Section 16.4, I show that the same constraint
which accounts for extraction also accounts for Superiority in English and the
concomitant gradience effects. In Section 16.5, I extend this account to explain
different superiority effects in Hebrew, German, and Danish. Section 16.6
provides a conclusion.

16.2 Introduction to f(ocus)-structure theory


The primitives of f-structure are topic and focus.4 These features are legible to
both interfaces: in PF this shows up across languages as intonational marking
as well as f-structure motivated displacements. At the interpretative interface
it allows for the calculation of truth values. Following Strawson (1964) and
Reinhart (1981), I define topics as the ‘address’ in a file system under which
sentences are evaluated (Erteschik-Shir 1997). If truth values are calculated
with respect to the topic, it also follows that every sentence must have a topic.
Topics are selected from the set of referents previously introduced in the
discourse, which correspond to cards in the common ground. Topics are
therefore necessarily specific: they identify an element in the common ground
that the sentence is about.
Whereas different types of focus have been defined in the literature (e.g.
information focus, contrastive focus, broad focus, narrow focus), I propose
only one type of focus which functions to introduce or activate discourse
referents. The different types of focus are derived in this framework by

4 In Erteschik-Shir (1997) I assume that the output of syntax is freely annotated for topic and focus
features. In Erteschik-Shir (2003), I introduce topic-focus features at initial merge on a par with
φ-features in order to abide by the inclusiveness principle. The issue of how top/foc features are
introduced into the grammar is immaterial to the topic of this paper.

allowing for multiple topic-focus assignments. As an illustration, examine the
contrastive f-structure in (16.12):

(16.12) Itop read {a BOOKfoc, a magazine}top foc, (not a magazine)

In cases of contrast, a contrast set with two members is either discoursally
available or else it is accommodated. In (16.12) ‘a book and a magazine’ form
such a set. In view of the fact that this set is discoursally available, it provides a
topic (as indicated by the top marking on the curly brackets). One of the
members of this set is focused (in this case ‘a book’) and in this way, the set is
partitioned, excluding the non-focused member of the set from the assertion.
Since foci are stressed in English, stress is assigned to the contrasted
element.5,6 Contrastive elements are thus marked as both topic (the
discourse-available pair) and as focus (the selected element). Such an element can play
the role of a focus in the sentence as a whole as it does in (16.12), and it can
also function as a main topic, forming a contrastive topic; this is illustrated in
(16.13):

(16.13) {TOMfoc, Bill}top top is handsome.

Example (16.13) asserts that Tom (and not Bill) is handsome.


Not all f-structure assignments are equally good. Example (16.14) illustrates
a well-known asymmetry: objects are harder to interpret as topics than
subjects (in languages with fixed word order and no morphological marking
of top/foc):7
(16.14) Tell me about John:
a. He is in love with Mary.
b. ??Mary is in love with him.
In view of the fact that this constraint figures prominently in languages such
as English which have fixed word order, I propose that the reason for this
asymmetry is that there is a preference for aligning f-structure with syntactic
structure. The alignment is shown in (16.15):
5 See Erteschik-Shir (1997, 1999) for an account of intonation in which f-structure provides the
input to a stress rule which assigns stress to foci.
6 Example (16.12) also illustrates that not all constituents of a sentence need be assigned either top
or foc features. Here the verb is assigned neither and the sentence could provide an answer to a
question such as ‘Did you read a book or a magazine?’
7 See, among others, Li and Thompson (1976); Reinhart (1981); Andersen (1991); and Lambrecht
(1994).

(16.15) Canonical f-structure:


SUBJECTtop [ . . . X . . . ]foc
In other words, an unmarked f-structure is one in which syntactic structure is
isomorphic with f-structure: either the subject is the topic and the VP is the
focus or there is a stage topic and the remaining sentence is the focus.8 It
follows that a marked f-structure is one in which an object is the topic.
This section provided a brief outline of the discourse properties of
f-structure.9 I now turn to f-structure constraints on syntax, constraints
which provide a graded output.

16.3 F-structure constraints


As pointed out above, only focus domains are transparent for purposes of
extraction. Erteschik-Shir (1997) argues that this constraint on extraction
falls under a more general constraint on ‘I(dentificational)-dependencies’,
which include anaphora, wh-trace dependencies, multiple wh-dependencies,
negation and focus of negation, and copular sentences. What all these
dependencies have in common is that the dependent is identified in the
construction, either by its antecedent or by an operator. The constraint which
governs I-dependencies is the Subject Constraint:
(16.16) An I-dependency can occur only in a canonical f-structure:
SUBJECTtop [ . . . X . . . ]foc
                 |
            I-dependency
I-dependencies are thus restricted to f-structures in which the subject is the
topic and the dependent is contained in the focus. The intuition behind this
constraint is that dependents must be identified and that a canonical
f-structure, in which f-structure and syntactic structure are aligned, enables
the processing of this identification. In the case of wh-traces, for example, the
trace must be identified with the fronted wh-phrase. The proposed constraint
restricts such identification to canonical f-structures. The constraint is thus
couched in processing terms in which f-structure plays a critical role.

8 Sentences uttered out-of-the-blue are contextually linked to the here-and-now of the discourse. I
argue in Erteschik-Shir (1997) that such sentences are to be analysed as all-focus predicated of a ‘stage’
topic. The sentence It is raining, for example, has such a stage topic and is therefore evaluated with
respect to the here-and-now. All-focus sentences also have a canonical f-structure in which the (covert)
topic precedes the focus.
9 I have included only those aspects of f-structure strictly needed for the discussion in this chapter.
See Erteschik-Shir (1997) for a more complete introduction to f-structure theory.

Let us first examine how the constraint applies to the graded extraction facts
in (16.6)–(16.8). In Erteschik-Shir and Rapoport (in preparation), we offer a
lexical analysis of verbs in terms of meaning components. We claim that verbs of
speaking have a Manner (M) meaning-component. M-components are
interpreted as adverbial modifiers, which normally attract focus. The M-component
of ‘light’ manner-of-speaking verbs such as say is light, that is, there is no
adverbial modification, and the verb cannot be focused. M-components can be
defocused contextually, enabling focus on the subordinate clause, which then
meets the requirement on extraction, since, according to the subject constraint,
the dependent (the trace) must be contained in the focus domain. It follows that,
out of context, only that-clauses under say allow extraction. All the other
manner-of-speaking verbs require some sort of contextualization in order for
the adverbial element of the verb to be defocused, thus allowing the subordinate
clause to be focused. Extraction is judged acceptable in these cases to the extent
that the context enables such a focus assignment.
The subject constraint, which constrains dependencies according to
whether the syntactic structure and the f-structure are aligned in a certain
way, can, in cases such as this one, generate graded results. This is not always
the case. Extraction out of sentential subjects is always ungrammatical and
cannot be contextually ameliorated. Example (16.17) gives the f-structure
assigned to such a case:
(16.17) *Who is [that John likes t]top [interesting]foc
In order to comply with the subject constraint, the subject, in this case a
sentential one, must be assigned topic. Since dependents must be in the focus
domain, they cannot be identified within topics, and extraction will always be
blocked. Although the subject constraint involves f-structure, it does not
necessarily render graded results. This is because the constraint involves not
only f-structure but also the alignment of f-structure with syntactic structure.
Sentential subjects are absolute islands because they are both IS topics and
syntactic subjects.

16.4 Superiority
Superiority effects are graded as the examples in (16.18) show:

(16.18) a. Who ate what?


b. *What did who eat?
c. Which boy read which of the books?
d. Which of the books did which boy read?

e. ?What did which boy read?


f. ?*Which of the books did who read?
Superiority therefore provides a good test case to demonstrate how gradience
is predicted by f-structure theoretical constraints.
The answer to a multiple wh-question forms a paired list, as demonstrated
in (16.19):
(16.19) Q: Who read what?
A: John read the Odyssey and Peter read Daniel Deronda.
Such an answer can be viewed as ‘identifying’ each object (answer to what)
with one of the subjects (answer to who). In this sense the multiple
wh-question itself forms an I-dependency in which one wh-phrase is dependent
on the other.
Superiority effects are the result of two I-dependencies in the same
structure:
(16.20) *What did who read t
One I-dependency is between the fronted wh-phrase and its trace. The other
one is between the two wh-phrases. As (16.20) illustrates, the dependent is
identified in two different dependencies at once. This results in an
interpretative clash, thus blocking the processing of the sentence.
The subject constraint is not violated, however, since the subject wh-phrase
can be assigned topic (the question ranges over a discourse-specified set; it is
d-linked) and the trace can be analysed as within the focus domain. Since it is
not f-structure assignment which rules out the sentence, context should not
have an effect on cases of superiority. This prediction is false, however, as
shown by the following well-known example:
(16.21) I know that we need to install transistor A, transistor B, and
transistor C, and I know that these three holes are for transistors,
but I’ll be damned if I can figure out from the instructions where
what goes! (Pesetsky 1987, from Bolinger 1978)
The answer to this puzzle lies in a proper understanding of the distinction
between d-linked and non-d-linked questions, however not the one proposed
in Pesetsky (1987; see Erteschik-Shir 1986, 1997). Examples (16.22a) and
(16.22b) illustrate a non-d-linked and a d-linked question, respectively, and
the f-structure of each one:

(16.22) a. What did you choose?
[What] did youtop [choose t]foc
b. Which book did you choose?
[Which book]top [did you choose t]foc
In the non-d-linked question in (16.22a), the fronted wh-phrase and its trace
form an I-dependency and the trace is interpreted as an anaphor. Such a
question must therefore conform to the Subject Constraint. In (16.22b),
however, the fronted wh-phrase functions as a topic in that it ranges over a
contextually available set (of books). The trace can therefore be interpreted on
a par with a coreferent pronoun, since the set over which it ranges is
discoursally available. Since no I-dependency is defined, the subject constraint
is not invoked; hence no superiority effects are predicted with which-phrases,
which must be interpreted as d-linked. Questions with simple wh-phrases can
be interpreted as being d-linked if the context provides a set over which they
must range. That is why superiority violations such as (16.21) can be
contextually ameliorated. They are always degraded, however. The reason is that both
wh-phrases have to be interpreted as topics, as shown in (16.23):

(16.23) [Where]top [[what]top [goes t]foc]foc
                       |_________|
                      I-dependency
The subject wh-phrase forms an I-dependency with the trace in order to
render the pair-list reading. The subject constraint on I-dependencies requires
the subject to be a topic. The fronted wh-phrase must be interpreted as a topic
because otherwise it will form an I-dependency with the trace which will then
be doubly identified as in (16.21). Bolinger’s detailed context allows for such
an interpretation. The question will be viewed as degraded depending on whether
the context forces a topic reading on both wh-phrases or not. Note that both
wh-phrases must be interpreted as d-linked. Which-phrases are necessarily
d-linked. Therefore, multiple wh-questions involving only which-phrases are
perfect, as shown in (16.18d). When only one of the wh-phrases is a which-
phrase, the other depends on context to receive a d-linked interpretation. This
is why (16.18e) and (16.18f) are degraded.
The examples in (16.24) provide further evidence for the analysis of superiority
effects proposed here. They illustrate that superiority effects also arise in
single wh-questions when the subject is a nonspecific indefinite, that is a
subject which cannot be interpreted as a topic:
(16.24) a. *What did a boy find?
b. (?)Which book did a boy find?
328 Gradience in Wh- Movement Constructions

c. What did a certain boy find? (16.18e)
d. What did a BOY find?
e. What do boys like?
Example (16.24a) violates the Subject Constraint because the subject cannot
be interpreted as a topic. In (16.24b), the fronted wh-phrase is d-linked and
therefore does not form an I-dependency with its trace. It is degraded on a par
with a sentence with an indefinite subject and a definite object as in (16.25):
(16.25) a. (?)A boy found the book.
b. A BOY found the book.
Example (16.25a) is degraded because it is a non-canonical f-structure (cf.
(16.15)) in which the object is the topic. Note that contrastive stress on the
subject as in (16.25b) enables its interpretation as a topic, rendering a canonical
f-structure. Examples (16.24c), (16.24d), and (16.24e) do not violate the
subject constraint because specific, contrastive, and generic indefinite subjects
are interpretable as topics.
Kayne’s (1984) facts in (16.26) and (16.27) show that, surprisingly, an extra
wh-phrase improves superiority violations:
(16.26) What did [who]top [hide t where]foc
        |______________|
        |__________________|
(16.27) Who knows what [who]top [saw t]foc
                       |______________|
                       |_____________|
This is because the extra wh-phrase makes it possible to circumvent doubly
identifying the trace. In (16.26), for example, the fronted wh-phrase forms an
I-dependency with the trace. This dependency is licensed by the subject
constraint since the subject is interpreted as a topic and the trace is embedded
in the focus. Another I-dependency is formed between the two remaining
wh-phrases. This I-dependency is also licensed by the subject constraint since the
dependent is embedded in the focus. The presence of the extra wh-phrase
enables the formation of two separate I-dependencies without forcing a
double identification of the trace as in the classic case in (16.20). This is
how the extra wh-phrase saves the construction.
Although Kayne-type questions are an improvement on the classical case,
they are still quite degraded. There are two reasons for this. First, the subject
wh-phrase has to be contextualized as ranging over a topic set (due to the
subject constraint). Second, the integration of the two separate dependencies
poses a heavy processing load: one I-dependency in (16.26) is between who
and where, allowing for the pair-list interpretation of these two wh-phrases.
However, in order to process the question the fronted wh-phrase what must
also be accommodated so that the interpretation of the question is that it asks
for a ‘triple’ list-reading.10
The account of superiority effects proposed here thus affords an explanation
of when context can improve acceptability and when it cannot and
predicts the fine distinctions in acceptability evident in the English data.

16.5 Superiority in other languages

The account of the observed gradience in the English superiority data extends
to other languages, once the nature of their canonical f-structures is determined.
This section discusses Hebrew, German, and Danish data and shows
that superiority effects are determined by the same considerations as in
English. Differences are due to variation in the application of the subject
constraint, which is in turn determined by the particular canonical focus
structure of the language in question.

16.5.1 Hebrew
The first observation concerning Hebrew is that although topicalization may
result in OSV, superiority violations are licensed only in the order OVS, as
shown in (16.28a) and (16.28b) from Fanselow (2004):
(16.28) a. ma kana mi?
what bought who
b. *ma mi kana?
c. mi kana ma?
Example (16.28a) is only licensed in a d-linked context in which a set of goods
is contextually specified and (16.28c) requires a d-linked context in which a
set of buyers is contextually specified. D-linking is not employed in Hebrew
as a way to avoid double ID as it is in English.11 The fronted wh-phrase
therefore does not form an I-dependency with its trace. It follows that only
one I-dependency is at work in Hebrew multiple wh-questions, namely the
one that renders the paired reading:
(16.29) a. mi kana ma
|________|
I-dependency
10 Triple dependencies are not derivable in this framework, a desirable result since they do not
render an optimal output.
11 There is no parallel to a ‘which-phrase’ in Hebrew. ‘eize X’ is best paraphrased as ‘what X’.
b. ma kana mi
|________|
I-dependency
I conclude that the subject constraint is not operative in Hebrew as it is in
English. This conclusion is also supported by the fact that adding a third wh-
phrase not only does not help, as it does in English, but is blocked in all cases:
(16.30) a. *mi kana ma eifo?
b. *ma kana mi eifo?
c. *ma mi kana eifo?
The subject constraint restricts I-dependencies to the canonical f-structure
of a particular language. In English, the canonical f-structure is one in which
syntactic structure and f-structure are aligned. The fact that the OVS and SVO
orders of (16.28a) and (16.28c) are equally good in Hebrew and that the OSV
order of (16.28b) is ruled out may mean that it is the OSV word order which is
the culprit. The difference between OSV and OVS in Hebrew is associated with
the function of the subject when the object is fronted. When it is interpreted as
a topic, it is placed preverbally and when it is focused, it is placed after the verb.
The examples in (16.31)–(16.33) demonstrate that this is the case:
(16.31) a. et hasefer moshe kana.12
the-book Moshe bought
b. et hasefer kana moshe.
(16.32) a. *et hasefer yeled exad kana
the-book boy one bought
‘Some boy bought the book.’
b. et hasefer kana yeled exad.
(16.33) a. et hasefer hu kana.
the-book he bought
b. *et hasefer kana hu
Example (16.31) shows that a definite subject which can function as both a topic
and a focus can occur both preverbally and postverbally. Example (16.32) shows
that an indefinite subject which cannot be interpreted as a topic is restricted to
the postverbal position. Example (16.33), in turn, shows that a subject pronoun,
which must be interpreted as a topic, can only occur preverbally. Examples
(16.31a) and (16.33a) also require contextualization in view of the fact that both

12 ‘et’ marks definite objects. mi (= ‘who’) in object position is most naturally marked with ‘et’
whereas ma (= ‘what’) is not. I do not have an explanation for this distinction.
the topicalized object and the preverbal subject are interpreted as topics. Since
every sentence requires a focus, this forces the verb to be focused or else one of
the arguments must be interpreted contrastively. In either case the f-structure is
marked. To complete our investigation of the unmarked f-structure in Hebrew,
we must also examine the untopicalized cases:
(16.34) a. moshe kana et hasefer/sefer
Moshe bought the-book/(a) book
b. ?yeled exad kana et hasefer
boy one bought the book
The most natural f-structure of (16.34a) is one in which the subject is the topic
and the VP or object is focused. Example (16.34b) with the definite object
interpreted as a topic is marked.13 The results of both orders are schematized
in (16.35):
(16.35) a. *Otop Sfoc V
b. ?Otop Stop V
c. Otop V Sfoc
d. *Otop V Stop
e. Stop V Ofoc
f. ?Sfoc V Otop
Examples (16.35c) and (16.35e) are the only unmarked cases. I conclude that the
unmarked focus structure in Hebrew is one in which the topic precedes the
verb and the focus follows it. Hebrew dependencies therefore do not depend
on the syntactic structure of the sentence, but only on the linear order of topic
and focus with respect to the verb. The (subject) constraint on I-dependencies
which applies in Hebrew is shown in (16.36):
(16.36) An I-dependency can occur only in a canonical f-structure:
Xtop V [ . . . Y . . . ]foc
Example (16.36) correctly rules out (16.28b) and predicts that both (16.28a)
and (16.28c) are restricted to d-linked contexts (the initial wh-phrase must be
a topic).

13 Example (i), in which both arguments are indefinite, is interpreted as all-focus:


(i) yeled exad kana sefer
boy one bought book
‘Some boy bought a book.’
In this chapter all-focus sentences are ignored. For an account of such sentences within f-structure
theory, see Erteschik-Shir (1997).
I conclude that multiple wh-questions in Hebrew are governed by the same
considerations as they are in English. Differences between the two languages
follow from their different canonical f-structures.
16.5.2 German
According to many authors, German lacks superiority effects. Wiltschko (1998)
not only argues that this is not the case, but also explains why German
superiority effects have been overlooked. One of the reasons she offers is that
controlling for d-linking is difficult since ‘discourse-related contrasts are often
rather subtle’ (1998: 443). Along these lines, Featherston (2005) performed an
experiment in which informants were asked to grade the data according to an
open-ended scale. His results showed that superiority effects are ‘robustly
active’ in German. It turns out, then, that German does not differ significantly
from English in this respect. Fanselow (2004), although aware of Featherston’s
results, still distinguishes the status of English and German with respect to
superiority effects. Fanselow points out that in German the superiority effect
does appear when the subject wh-phrase is in Spec,IP (his (35)):
(16.37) a. wann hat’s wer gesehen
when has it who seen
b. ?*wann hat wer’s gesehen
‘who saw it when?’
In (16.37a) the subject follows the object clitic, indicating its VP-internal
position. In (16.37b), it precedes the object clitic and so must be outside the
VP. These data are reminiscent of the Hebrew facts just discussed: German
subjects in Spec,IP must be interpreted as topics, whereas VP-internal subjects
are interpreted as foci.
D-linking is also required, as noted by Wiltschko. Fanselow (2004) gives the
following illustration (his (42)):

(16.38) wir haben bereits herausgefunden
we have already found out
a. wer jemanden gestern anrief, und wer nicht
who.nom someone.acc yesterday called and who.nom not
b. wen jemand gestern anrief, und wen nicht
who.acc someone.nom yesterday called and who.acc not
Aber wir sind nicht eher zufrieden, bis wir auch wissen
But we are not earlier content until we also know
a’. wer WEN angerufen hat
who.nom who.acc called has
b’. wen WER angerufen hat
According to Fanselow, OSV order is licensed only if the object is discourse-linked,
but SOV order is also allowed in an out-of-the-blue multiple wh-question
(his (43)):
(16.39) Erzähl mir was über die Party.
‘Tell me something about the party.’
a. Wer hat wen getroffen?
who.nom has who.acc met?
b. ??Wen hat wer getroffen?
Fanselow’s example cannot, however, be considered out-of-the-blue. A party
necessarily involves a set of participants. These are what the wh-phrases range
over in the questions following the initial sentence. Since both wh-phrases
range over the same set of party-participants, they are equivalent. Example
(16.39a), in which no reordering has occurred, is therefore preferred.
From this data I gather that the German canonical f-structure is similar to
the one proposed for Hebrew, with only one small difference: German, too,
requires that the first argument be the topic and the second be the focus, yet
the status of the subject is determined differently: German subjects are
interpreted as foci when they are VP-internal, and as topics when they are
not, as shown in (16.37). The position of the subject is transparent only in the
presence of adverbials or other elements that mark the VP boundary.14 In
many of the examples in which such elements are absent, the linear position of
the subject wh-phrase gives no clue as to its syntactic position. In those cases,
the subject will be interpreted according to contextual clues. An I-dependency
is licensed between two wh-phrases in German when the first one is interpreted
as a topic and the second as a focus.
16.5.3 Danish
According to Fanselow (2004), Swedish does not exhibit superiority effects
(his (12)):
(16.40) Vad koepte vem
what bought who
In Danish, the same question is degraded:
(16.41) a. Hvem købte hvad?
who bought what
b. ?Hvad købte hvem?
what bought who

14 See Diesing (1992) for this effect.


Overt d-linking significantly improves the question:
(16.42) Hvilken bog købte hvilken pige?
Which book bought which girl?
Danish may have a preference for overtly marking d-linked wh-phrases
instead of just depending on contextual clues. Danish is like English in this
respect, except that the preference in English is even stronger. Danish differs
from English in that superiority effects in subordinate clauses are not ameliorated
by overtly d-linked wh-phrases:

(16.43) a. *Jeg ved ikke hvad hvem købte
I know not what who bought
b. *Jeg ved ikke hvilken bog hvilken pige købte
I know not which book which girl bought
Danish generally marks the topic by fronting it to sentence-initial position.
This is also the case if the topic is located in the subordinate clause. Topicalization
within a subordinate clause is therefore excluded.15 It follows that
whereas word order may signal the f-structure of the main clause, the order
within subordinate clauses does not. This is the explanation I propose for the
different behaviour of Danish main and subordinate clauses with respect to
superiority effects. Scrambling languages such as German differ: scrambling
positions topics outside the VP in subordinate clauses as well as main clauses.
No difference between main and subordinate clauses is predicted in scrambling
languages. This prediction is borne out for German.16 Fanselow (2004)
rejects the idea that the availability of scrambling is what explains the lack of
superiority effects because there are non-scrambling languages which also
lack superiority effects. I would not be surprised if non-scrambling languages
exhibit the same difference between main and subordinate clauses as Danish.
Since the verb in Danish main clauses must appear in second position, the
canonical f-structure is identical to the one proposed for Hebrew. The only

15 Topicalization is licensed in subordinate clauses under a few bridge-verbs such as think. In such
cases the syntactically subordinate clause functions as a main clause.
16 Hebrew is like Danish in this respect. Since Hebrew is not a scrambling language, this is what is
predicted. Since English is not a scrambling language, English should also exhibit a difference between
main and subordinate clauses. This is not the case:
(i) Which book did which boy buy?
(ii) I don’t know which book which boy bought.
The difference between main and subordinate clauses in Danish arises because only in the former is
f-structure marked by word order. English main clauses do not differ from subordinate clauses in this
way. This may explain why no difference in superiority effects between main and subordinate clauses
can be detected.
difference between Hebrew and Danish is the preference for overtly d-linked
wh-phrases.
What is common to the languages examined here is the need for d-linking
of at least one of the wh-phrases in multiple wh-questions. That is why such
questions are always sensitive to context and therefore exhibit gradience.
Variation among languages follows from three parameters: the canonical
f-structure, the availability of topicalization and scrambling processes, and
the array of wh-phrases available in a particular language. As I have shown
here, all three must be taken into account in order to predict the
cross-linguistic distribution of superiority effects.

16.6 What’s what?


In this paper, I have shown that f-structure constraints, which are sensitive to
context, generally result in gradient output. Speaker judgements, which are
generally solicited out of context, depend on how likely it is for a given
informant to contextualize the test sentence appropriately. This will be hard
if the required f-structure is marked or if accommodation is necessary. The
ability of speakers to contextualize appropriately will also vary. It follows that
gradience within and across speakers is to be found whenever grammaticality
is constrained by f-structure principles.
I expect that the (subject) constraint on I-dependencies is universal and
that its raison d’être is to enable the processing of the dependency. Sentences
which exhibit a canonical f-structure are easy to process because they do not
require complex contextualization. Dependencies also impose a processing
burden. They are therefore restricted to structures which impose only a
minimal processing burden themselves. Language variation follows from
differences in canonical f-structure.
The answer to my question ‘What’s what?’ is that gradience can only result
when f-structure is involved. Violations of syntactic constraints necessarily
cause strong grammaticality infractions, thus resulting in ungrammatical
sentences. It follows that context-sensitive grammaticality squishes provide
a diagnostic for whether a syntactic or a focus-structure constraint is involved:
whenever context interacts with acceptability, the constraint cannot be syn-
tactic. There is therefore no need for ‘weighted constraints’ in syntactic
theory.
17

Prosodic Influence on Syntactic Judgements

YOSHIHISA KITAGAWA AND JANET DEAN FODOR

17.1 Introduction
It appears that there is a rebellion in the making, against the intuitive
judgements of syntacticians as a privileged database for the development of
syntactic theory.1 Such intuitions may be deemed inadequate because they
are not sufficiently representative of the language community at large. The
judgements are generally few and not statistically validated, and they are made
by sophisticated people who are not at all typical users of the language.
Linguists are attuned to subtle syntactic distinctions, about which they have
theories. However, our concern in this paper is with the opposite problem:
that even the most sophisticated judges may occasionally miss a theoretically
significant fact about well-formedness.
In the 1970s it was observed that in order to make a judgement of syntactic
well-formedness one must sometimes be creative. It was noted that some
sentences, such as (17.1), are perfectly acceptable in a suitable discourse
context, and completely unacceptable otherwise (e.g. as the initial sentence
of a conversation; see Morgan 1973).
(17.1) Kissinger thinks bananas.
Context: What did Nixon have for breakfast today?
Given the context, almost everyone judges sentence (17.1) to be well-formed.
But not everyone is good at thinking up such a context when none is
1 This work is a revised and extended version of Kitagawa and Fodor (2003). We are indebted to
Yuki Hirose and Erika Troseth who were primarily responsible for the running of the experiments we
report here, and to Dianne Bradley for her supervision of the data analysis. We are also grateful to the
following people for their valuable comments: Leslie Gabriele, Satoshi Tomioka, three anonymous
reviewers, and the participants of Japanese/Korean Linguistics 12, the DGfS Workshop on Empirical
Methods in Syntactic Research, and seminars at Indiana University and CUNY Graduate Center. This
work has been supported in part by RUGS Grant-in-Aid of Research from Indiana University.
Prosodic Influence on Syntactic Judgements 337

provided. That is not a part of normal language use. Hence out-of-context


judgements are more variable, since they depend on the happenstance of what
might or might not spring to the mind of the person making the judgement
(see Schütze 1996: sect. 5.3.1.) Our thesis is that prosodic creativity is also
sometimes required in judging syntactic well-formedness when sentences are
presented visually, that is when no prosodic contour is supplied.
Consider sentence (17.2), which is modelled on gapping examples from
Hankamer (1973). If (17.2) is read with a nondescript sort of prosody—the
more or less steady fundamental frequency declination characteristic of
unemphatic declarative sentences in English—it is likely to be understood
as in (17.2a) rather than (17.2b).
(17.2) Jane took the children to the circus, and her grandparents to the ballgame.
a. . . . and Jane took her grandparents to the ballgame.
b. . . . and Jane’s grandparents took the children to the ballgame.

If construed as (17.2a), ‘her grandparents’ in (17.2) is the object of the second


clause, in which the subject and verb have been elided; this is clause-peripheral
gapping. If construed as (17.2b), ‘her grandparents’ in (17.2) is the subject of
the second clause, in which the verb and the object have been elided; this is
clause-internal gapping. It demands a very distinctive prosodic contour,
which readers are unlikely to assign to the word string (17.2) in the absence
of any speciWc indication to do so. It requires paired contrastive accents on
Jane and her grandparents, and on circus and ballgame, defocusing of the
children, and a signiWcant pause between the NP and the PP in the second
clause. (See Carlson 2001 for relevant experimental data.)
In a language with overt case marking, a sentence such as (17.2) would not
be syntactically ambiguous. The (17.2b) analysis could be forced by nominative
case marking on the ‘grandparents’ noun phrase. Case marking is not
robust in English, but for English speakers who still command a reliable
nominative/accusative distinction, the sentence (17.3a) can only be understood
as peripheral gapping with ‘us grandparents’ as object, and (17.3b) can
only be understood as clause-internal gapping with ‘we grandparents’ as
subject.
(17.3) a. Jane took the children to the circus, and us grandparents to the ballgame.
b. Jane took the children to the circus, and we grandparents to the ballgame.

If these sentences are presented in written form, (17.3a) is very likely to be


accepted as well-formed but (17.3b) may receive more mixed reactions.
Readers are most likely to begin reading (17.3b) with the default prosody,
and they may then be inclined to continue that prosodic contour through the
second clause, despite the nominative ‘we’. If so, they might very well arrive at
the peripheral-gap analysis and judge ‘we’ to be morphosyntactically incorrect
on that basis. It might occur to some readers to try out another way of
reading the sentence, but it also might not. The standard orthography does
not mark the prosodic features required for gapping; they are not in the
stimulus, but must be supplied by the reader—if the reader thinks to do so.
Thus, grammaticality judgements on written sentences may make it appear
that clause-internal gapping is syntactically unacceptable, even if in fact the
only problem is a prosodic ‘garden path’ in reading such sentences. The way to
Wnd out is to present them auditorily, spoken with the highly marked prosody
appropriate for clause-internal gapping, so that their syntactic status can be
judged without interference from prosodic problems. The outcome of such a
test might still be mixed, of course, if indeed not everyone accepts (this kind
of) non-peripheral gapping, but at least it would be a veridical outcome, a
proper basis for building a theory of the syntactic constraints on ellipsis.
The general hypothesis that we will defend here is that any construction
which requires a non-default prosody is vulnerable to misjudgements of
syntactic well-formedness when it is read, not heard.2 It might be thought
that reading—especially silent reading—is immune to prosodic influences,
but recent psycholinguistic findings suggest that this is not so. Sentence
parsing data for languages as diverse as Japanese and Croatian are explicable
in terms of the Implicit Prosody Hypothesis (Fodor 2002a, 2002b): ‘In silent
reading, a default prosodic contour is projected onto the stimulus. Other
things being equal, the parser favors the syntactic analysis associated with the
most natural (default) prosodic contour for the construction.’ In other words,
prosody is always present in the processing of language, whether by ear or
by eye. And because prosodic structure and syntactic structure are tightly
related (Selkirk 2000), prosody needs to be under the control of the linguist
who solicits syntactic judgements, not left to the imagination of those
who are giving the judgements. At least this is so for any construction that
requires a non-default prosodic contour which readers may not be inclined to
assign to it.
We illustrate the importance of this methodological moral by considering
a variety of complex wh-constructions in Japanese. In previous work we
have argued that disagreements that have arisen concerning the syntactic

2 There is a fine line between cases in which a prosodic contour helps a listener arrive at the
intended syntactic analysis, and cases in which a particular prosodic contour is obligatory for the
syntactic construction in question. The examples we discuss in this paper are of the latter kind, we
believe. But as the syntax–phonology interface continues to be explored, this is a distinction that
deserves considerably more attention.
well-formedness of some of these constructions can be laid at the door of the
non-default prosodic contours that they need to be assigned.

17.2 Japanese wh-constructions: syntactic and semantic issues


The constructions of interest are shown schematically in (17.4) and (17.5). For
syntactic/semantic theory, the issues they raise are: (a) whether subjacency
blocks operations establishing LF scope in Japanese; (b) whether (overt) long-
distance scrambling of a wh-phrase in Japanese permits scope reconstruction
at LF. For many years there was no consensus on these matters in the
literature.
(17.4) Wh-in-situ:
[ ------ [------ wh-XP ------ COMPSubord ] ------ COMPMatrix ]
(17.5) Long-distance scrambling:
[ wh-XPi ------ [------ ti ------ COMPSubord ] ------ COMPMatrix ]
Consider first the situation in (17.4), in which a wh-phrase in a subordinate
clause has not moved overtly. This wh-phrase could have matrix scope if
appropriate covert operations are permitted: either movement of a wh-phrase
to its scope position at LF, with or without movement of an empty operator at
S-structure, or operator-variable binding of a wh-in-situ by an appropriate
COMP. If subjacency were applicable to such scope-determining operations,
it would prevent matrix scope when the subordinate clause is a wh-island, for
example when the subordinate clause complementizer is -kadooka (‘whether’)
or -ka (‘whether’). The scope of a wh-XP in Japanese must be marked by a
clause-final COMPwh (-ka or -no).3 Thus, if the matrix clause complementizer
were -no (scope-marker), and the subordinate clause complementizer
were -kadooka, the sentence would be ungrammatical if subjacency applies to
covert operations in Japanese: subordinate scope would be impossible for lack
of a subordinate scope-marker, and matrix scope would be impossible be-
cause of subjacency.
The applicability of subjacency to wh-in-situ constructions has significant
theoretical ramifications (see discussion in Kuno 1973, Huang 1982, Pesetsky

3 The ambiguity of some complementizers will be important to the discussion below. For clarity, we
note here that both -ka and -no are ambiguous. -ka can function as a wh-scope marker, COMPWH, in
any clause, or as COMPWHETHER in subordinate clauses, and as a yes/no question marker Q in matrix
clauses. -no can be an interrogative complementizer only in matrix clauses, where it can function
either as COMPWH or as Q. For most speakers, -kadooka is unambiguously COMPWHETHER, although
a few speakers can also interpret -kadooka as a wh-scope marker (COMPWH) in a subordinate clause.
1987, among others). It has been widely, although not universally, maintained
that subjacency is not applicable to covert (LF) operations. Thus it would
clarify the universal status of locality principles in syntax if this were also the
case in Japanese. This is why it is important to determine whether sentences of
this form (i.e. structure (17.4) where the subordinate complementizer is not a
wh-scope marker) are or are not grammatical. We will argue that they are, and
that contrary judgements are due to failure to assign the necessary prosodic
contour.
Example (17.5) raises a different theoretical issue, concerning the relation
between surface position and scope at LF. Note first that the long-distance
scrambling in (17.5) is widely agreed to be grammatical, even when the
COMPSubord is -kadooka (‘whether’) or -ka (‘whether’). Thus, subjacency
does not block scrambling (overt movement) from out of a wh-complement
in Japanese (Saito 1985).4 What needs to be resolved is the possible LF scope
interpretations of a wh-XP that has been scrambled into a higher clause. Does
it have matrix scope, or subordinate scope, or is it ambiguous between the
two? When a wh-XP has undergone overt wh-movement into a higher clause
in a language like English, matrix scope is the only possible interpretation. But
unlike overt wh-movement, long-distance scrambling in Japanese generally
forces a ‘radically’ reconstructed interpretation, that is, a long-distance scram-
bled item is interpreted as if it had never been moved. (Saito 1989 describes
this as scrambling having been ‘undone’ at LF; Ueyama 1998 argues that
long-distance scrambling applies at PF.) If this holds for the scrambling of a
wh-phrase, then subordinate scope should be acceptable in the configuration
(17.5). However, there has been disagreement on this point. We will maintain
that subordinate scope is indeed syntactically and semantically acceptable,
and judgements to the contrary are most likely due to a clash between the
prosody that is required for the subordinate scope interpretation and the
default prosody that a reader might assign.
Thus, our general claim is that syntactic and semantic principles permit
both interpretations for both constructions (17.4) and (17.5) (given appropri-
ate complementizers), but that they must meet additional conditions on their
PFs in order to be fully acceptable (see Deguchi and Kitagawa 2002 for
details). We discuss the subjacency issue (relevant to construction (17.4)) in
Section 17.3.1, and the reconstruction issue (relevant to construction (17.5)) in
Section 17.3.2.

4 Saito argued, however, that subjacency does block overt scrambling out of a complex NP, and
out of an adjunct. This discrepancy, which Saito did not resolve, remains an open issue to be
investigated.
Prosodic Influence on Syntactic Judgements 341

17.3 The prosody of wh-constructions in Japanese


Japanese wh-questions have a characteristic prosodic contour, called em-
phatic prosody (EPD) by Deguchi and Kitagawa (2002). The wh-XP is
prosodically focused, and everything else in the clause which is its scope is
de-focused. That is, there is an emphatic accent on the wh-item, and then
post-focal ‘eradication’ (compression of pitch and amplitude range, virtually
suppressing lexical and phrasal pitch accents) up to the end of the wh-
scope.5 Importantly, this means that there is a correlation between the extent
of the prosodic eradication and the extent of the syntactic/semantic scope of
the wh-phrase. Subordinate wh-scope (¼ indirect wh-question) is associated
with what Deguchi and Kitagawa called short-EPD, that is EPD which ends
at the COMPWH of the subordinate clause. Matrix wh-scope (= direct wh-
question) is associated with what Deguchi and Kitagawa called long-EPD,
that is EPD which extends to the matrix COMPWH at the end of the
utterance. (See Ishihara (2002) for a similar observation and see Hirotani
(2003) for discussion of the role of prosodic boundaries in demarcating the
wh-scope domain.) This is the case for all wh-constructions, regardless of
whether the wh-phrase is moved or in situ, and whether or not it is inside a
potential island.

17.3.1 Wh-in-situ
First we illustrate Deguchi and Kitagawa’s observation for wh-in-situ. In
(17.6) and (17.7) we show a pair of examples which differ with respect to
wh-scope, as determined by their selection of complementizers. In both
examples the wh-phrase dare-ni (‘who-DAT’) is in situ and there is no wh-
island, so there is no issue of a subjacency violation. What is of interest here is
the relation between wh-scope and the prosodic contour. (In all examples
below, bold capitals denote an emphatic accent; shading indicates the domain
of eradication, accent marks indicate lexical accents that are unreduced, and "
indicates a final interrogative rise.)

5 In this chapter we retain the term 'eradication' used in our earlier papers, but we would emphasize
that it is not intended to imply total erasure of lexical accents. Rather, there is a post-focal reduction of
the phonetic realization of accents, probably as a secondary effect of the general compression of the
pitch range and amplitude in the post-focal domain. See Ishihara (2003) and Kitagawa (2006), where
we substitute the term post-focal reduction. Also, we note that the utterance-final rise that is
characteristic of a matrix question overrules eradication on the sentence-final matrix COMPWH. The
prosodic descriptions given here should be construed as referring to standard (Tokyo) Japanese;
there is apparently some regional variability.
342 Gradience in Wh-Movement Constructions

(17.6) Short-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-ka] ímademo sirabeteteiru.
Police-top Mary-nom that-night who-dat called-compwh even.now investigating
'The police are still investigating who Mary called that night.'

(17.7) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-to] imademo kangaeteiru-no"?
Police-top Mary-nom that-night who-dat called-compthat even.now think-compwh
'Who do the police still think that Mary called that night?'

When we gather acceptability judgements on these sentences we present them
in spoken form with either the short-EPD or the long-EPD prosodic pattern.
With the contours shown here the sentences are judged acceptable. If the two
contours are exchanged, the sentences are judged to be extremely unnatural.6
(See Section 17.4 for some related experimental data.)
Now let us consider examples that are similar to (17.6) and (17.7) but have a
different selection of complementizers. In (17.8) and (17.9) the wh-phrase is
in situ inside a wh-complement clause. The word strings are identical here, so
(17.8) and (17.9) are lexically and structurally identical; only the prosodic
contour differs between them. Observe that in both cases the complementizers
(subordinate -kadooka, matrix -no) are compatible only with matrix
scope. We thus predict that (17.8) with short-EPD will be judged unaccept-
able, while (17.9) with long-EPD will be judged acceptable. And this is indeed
what informants’ judgements reveal when sentences are presented auditorily,
with prosodic properties controlled. (We use # below to denote a sentence
that is unacceptable with the indicated prosody.)

(17.8) Short-EPD
#Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-kadooka] ímademo sirabeteteiru-no?
Police-top Mary-nom that-night who-dat called-compwhether even.now investigating-q
a. 'Who1 is such that the police are still investigating
[whether Mary called him/her1 that night]?'
b. 'Are the police still investigating [whether Mary called
who that night]?'

6 Although this is generally true, Satoshi Tomioka notes (p.c.) that certain expressive modes (e.g. a
strong expression of surprise) can disturb the prosody-scope correlation for long-EPD. This phenom-
enon needs further investigation. See also Hirotani (2003) for psycholinguistic data on the perception
of long-EPD utterances.

(17.9) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-kadooka] imademo sirabeteteiru-no"?
Police-top Mary-nom that-night who-dat called-compwhether even.now investigating-compwh
'Who1 is such that the police are still investigating [whether Mary called him/her1 that
night]?'

Pronounced with long-EPD, (17.9) is acceptable and has matrix scope inter-
pretation of the wh-phrase. Sentence (17.8) with short-EPD is not acceptable. It
may be rejected on one of two grounds, as indicated in (a) and (b). Either a
hearer attempts to interpret (17.8) with matrix wh-scope as in translation
(17.8a), and would then judge the prosody to be inappropriate; or (17.8) is
interpreted with subordinate wh-scope as in translation (17.8b), in line with the
prosody, and the subordinate complementizer -kadooka (‘whether’) would be
judged ungrammatical since it cannot be a wh-scope marker. As noted, how-
ever, there are some speakers who are able to interpret -kadooka as a wh-scope-
marker, and for them (17.8) is acceptable with subordinate scope, as expected.
The fact that (17.9) is acceptable shows that matrix wh-scope is available
when the sentence is pronounced with long-EPD. Thus it is evident that
subjacency does not block scope extraction from a -kadooka clause. The
unacceptability of (17.8) therefore cannot be due to subjacency. Only an
approach that incorporates prosody can account for the contrast between
the two examples.
The confusion about the applicability of subjacency in Japanese is thus
resolved. When appropriate prosody is supplied, grammaticality judgements
show no effect of subjacency on the interpretation of wh-in-situ.7 The variable
judgements reported in the literature are explicable on the assumption that
when no prosody is explicitly provided, readers project their own prosodic
contour. A reader of (17.8)/(17.9) who happened to project long-EPD would
find the sentence acceptable on the matrix scope reading represented in (17.9).
A reader who happened to project short-EPD would in effect be judging
(17.8), and would be likely to find it unacceptable on the matrix scope reading
(and also the subordinate scope reading). This judgement could create the
impression that subjacency is at work. As we discuss below, there are reasons
why readers might be more inclined to project short-EPD than long-EPD for

7 See Deguchi and Kitagawa (2002) for evidence that long-EPD is not an exceptional prosody which
permits scope extraction out of wh-islands by overriding subjacency.

wh-in-situ examples. If this is so (i.e. if short-EPD is the default prosody for
this construction), it would encourage the misreading of this word string as
(17.8) rather than as (17.9), and so would tilt readers toward a negative
judgement.8

17.3.2 Long-distance-scrambled wh
The other data disagreement which needs to be resolved with respect to
Japanese wh-constructions concerns the scope interpretation of a wh-XP
that has undergone long-distance scrambling out of a subordinate clause.
This was schematized in (17.5), repeated here, and is exemplified in (17.10).
(17.5) Long-distance scrambling:
[ wh-XPi . . . [ . . . ti . . . COMPSubord ] . . . COMPMatrix ]

(17.10) Nani1-o John-wa [ Mary-ga t1 tabeta-ka ] siritagatteiru-no"?
what-acc John-top Mary-nom ate-compwhether/wh wants.to.know-compwh/-q
a. ‘Does John want to know what Mary ate?’
b. *‘What does John want to know whether Mary ate?’
(i.e. ‘Whati is such that John wants to know whether Mary ate iti?’)

As noted earlier, there is no evidence of any subjacency restriction on
overt long-distance scrambling in this construction: a wh-phrase can be
freely scrambled even out of a wh-island. But there has been disagreement
in the literature concerning the LF scope of a long-distance-scrambled
wh-XP. If scrambling of a wh-XP is subject to obligatory (or ‘radical’)
reconstruction at LF, the scrambled wh-phrase in (17.10) would have to be
interpreted in its underlying position, that is with the same scope possibil-
ities as for an in-situ wh-phrase. We observed above that wh-in-situ can be
interpreted with either subordinate-clause scope or matrix-clause scope,
although with a preference for the former in reading, when prosody is
not pinned down. However, Takahashi (1993) claimed to the contrary that
only matrix wh-scope (i.e. interpretation (17.10b)) is acceptable in this
construction.

8 In Kitagawa and Fodor (2003) we noted two additional factors that could inhibit acceptance of
matrix scope for wh-in-situ: semantic/pragmatic complexity (the elaborate discourse presuppositions
that must be satisfied); and processing load (added cost of computing the extended dependency
between the embedded wh-phrase and a scope marker in the matrix clause). It seems quite likely that
these conspire with the default prosody to create diYculty with the matrix scope reading. However, we
will not discuss those factors here, because they cannot account for judgements on the wh-scrambling
examples that we examine in the next section.

Unacceptability of the subordinate scope interpretation (17.10a) does not
follow from subjacency or from any other familiar syntactic constraint. In
order to account for it, Takahashi was driven to assume that sentences such as
(17.10) are derived not by long-distance scrambling but by overt wh-
movement, which (unlike scrambling) would not be ‘undone’ (i.e. would
not be radically reconstructed) at LF. Although a clever notion, this does not
mesh well with other observations about scrambling in Japanese and also in
Korean (e.g. Kim 2000). That it is not the right approach is underscored by
the observation (Deguchi and Kitagawa 2002) that when short-EPD is overtly
supplied in spoken sentences, many speakers accept subordinate scope in
examples such as (17.10). That is, (17.10a) is acceptable with short-EPD,
although not with long-EPD—although informants often sense a lingering
awkwardness in (17.10a), for which we offer an explanation below. The mixed
opinions on (17.10a) thus fall into place on an account that respects prosodic
as well as syntactic constraints. The correlation of prosody and scope in
informants’ judgements of spoken sentences is exactly as in the other
examples noted above: short-EPD renders subordinate scope acceptable and
blocks matrix scope, while with long-EPD matrix scope is acceptable and
subordinate scope is not.
Note, however, that to explain why it is (17.10a) rather than (17.10b) that
raises disagreement when prosody is not speciWed, the prosodic account
would have to assume that long-EPD is the prosody that readers naturally
project onto the word string. However, we saw above that readers must prefer
short-EPD if prosody is to provide an explanation for the mixed judgements
on wh-in-situ. Apparently the phonological default flips between wh-in-situ
constructions and wh-scrambled constructions. In the next section we con-
sider why this would be so.

17.3.3 Which prosody is the default?


Many, perhaps most, judgements of syntactic well-formedness reported in the
literature are made on written examples. No doubt this is largely for reasons of
convenience, but perhaps also the intention is to exclude phonological factors
from the judgement so that it can be a pure reflection of syntactic structure.
However, if the implicit prosody hypothesis (Section 17.1) is correct, this is an
unrealistic goal. Phonological factors cannot be excluded, because default
prosody intrudes when no prosody is specified in the input. Thus judgements on
visually presented sentences are not prosody-free: the sentences are judged as
if spoken with default prosody. To provide a full explanation of why certain
scope interpretations of Japanese wh-constructions tend to be disfavoured in

reading, we need prosodic theory to make predictions as to which prosody is the
default for which construction. In particular, the observed preference for
subordinate scope for wh-in-situ would be explained if readers tend to assign short-
EPD rather than long-EPD to wh-in-situ constructions; and the observed
preference for matrix clause scope for long-distance scrambled wh would be
explained if readers tend to assign long-EPD in preference to short-EPD to
scrambled wh constructions. Although it may have the flavour of a
contradiction, this is in fact exactly what would be expected. Our proposal is that
competition among various constraints at the PF interface yields a different
prosodic default for scrambled wh than for wh-in-situ.
In our previous work we have argued that short-EPD is phonologically
more natural than long-EPD because the latter creates a long string of
rhythmically and tonally undifferentiated material, which is generally
dispreferred in natural language (see Selkirk 1984; Kubozono 1993). This implies
that even where a grammar insists on prosodic eradication, the shorter it can
be, the better it is. In support of this, Kitagawa and Fodor (2003) presented
examples indicating that a sentence becomes progressively less natural as the
extent of EPD is increased by adding extra material even within a single-clause
construction. Comparable sentences but without wh and hence without EPD
do not degrade in the same manner; thus the effect is apparently prosodic. A
similar distaste for lengthy stretches of deprosodified material can be observed
in English right dislocation constructions. The dislocated phrase requires
prosodic eradication yet eradication that extends over more than a few
words is disfavoured. This creates a clash which makes an example such as I
really hated it, that fish that Mary tried to persuade me to eat at the French
restaurant last night stylistically awkward.
A different sort of clash occurs when a long-distance-scrambled wh is to
be pronounced with short-EPD in order to give it subordinate clause scope,
as in (17.10a) (repeated below as (17.11a) with its prosody indicated).
Although short-EPD is generally preferred, in the scrambled construction
it traps an element of the matrix clause (John-wa; ‘John-TOP’ in (17.10))
between the scrambled XP and the rest of the subordinate clause.9 Prosodic

9 If the XP were scrambled to a position between any overt matrix items and the first overt element
of the subordinate clause, no matrix item would be trapped. The resulting sentence would be
ambiguous between local scrambling within the subordinate clause, and long-distance scrambling
into the matrix clause, so it would provide no overt evidence that the scrambled phrase is located in
the matrix clause in the surface form. In that case the example would not be useful for studying the
prosodic and/or semantic effects of long-distance scrambling. Thus: any sentence that could be used to
obtain informants’ judgements on the acceptability of subordinate clause scope for a long-distance
scrambled wh would necessarily exhibit the entrapment which we argue favours long-EPD and hence
matrix scope.

eradication proceeds from the focused wh phrase through to the end of the
clause which is its scope. In the case of short-EPD, this will be from the
surface position of the wh-XP to the end of the subordinate clause. Thus,
the matrix topic John-wa in (17.10) will have its accent eradicated even
though it is not in the intended syntactic/semantic scope of the wh-XP.
This is represented in (17.11a).

(17.11) a. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagátteiru-no"
what-acc John-top Mary-nom ate-compwh want.to.know-q
'Does John want to know what Mary ate?'
b. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagatteiru-no"
what-acc John-top Mary-nom ate-compwhether want.to.know-compwh
*'What does John want to know whether Mary ate?'
(i.e. 'Whati is such that John wants to know whether Mary ate iti?')

Short-EPD as in (17.11a) is dispreferred. There is a mismatch between the
inclusion of John-wa within the EPD domain, and the ending of the EPD at
the subordinate COMP. This offends a very general preference for congruence
between syntactic and prosodic structure, which encourages perceivers to
assume a simple transparent relationship between prosody and syntax wher-
ever possible. Thus we expect a preference for material in the prosodic
eradication domain of a Japanese wh construction to be construed as being
in the syntactic scope domain also. In the present case this preference for
congruence can be satisfied only if the prosody assigned is long-EPD,
which extends through both clauses, as in (17.11b). If (17.10) were presented
in writing, a reader assigning implicit prosody, and having necessarily eradi-
cated the accent in John-wa, would be likely to continue the eradication
through the rest of the clause that includes John-wa. The result would be
long-EPD, favouring a matrix scope interpretation. The advantage of long-
EPD over short-EPD with respect to congruence for long-distance scrambled
wh might outweigh the fact that long expanses of EPD are generally dis-
preferred.10
In support of this account of long-EPD as the preferred prosody for a
scrambled-wh construction, we note that even for spoken sentences with

10 We noted above that semantic and processing factors may reinforce the prosodic default in the
case of wh-in-situ. However, those factors would favour subordinate scope for scrambled wh as well as
for wh-in-situ, as explained in Kitagawa and Fodor (2003). Thus, only the prosodic explanation makes
the correct prediction for both contexts: a preference for subordinate scope for wh-in-situ and a
preference for matrix scope for long-distance scrambled wh.

overt prosody, hearers (and even speakers!) sometimes complain that they can
accept the subordinate scope interpretation only by somehow disregarding or
‘marginalizing’ the intervening matrix constituent. This is interesting. It can
explain why subordinate scope is not always felt to be fully acceptable even
with overt short-EPD, and it is exactly as could be expected given that the
intrusion of this matrix constituent in the subordinate clause eradication
domain is what disfavours the otherwise preferred short-EPD.
The general conclusion is clear: when overt prosody is present, listeners
can be expected to favour the syntactic structure congruent with the prosody
and judge the sentence accordingly. When no overt prosody is in the input,
as in reading, perceivers make their judgements on the basis of whatever
prosodic contour they have projected. This is a function of various principles,
some concerning the prosody–syntax interface, others motivated by purely
phonological concerns (e.g. rhythmicity) which in principle should be irrele-
vant to syntax. However, a reader may proceed as if the mentally projected
prosody had been part of the input, and then judge the syntactic well-
formedness of the sentence on that basis. Although some astute informants
may seek out alternative analyses, there is no compelling reason for them
to do so, especially as the request for an acceptability judgement implies—
contrary to the expectation in normal sentence processing for
comprehension—that failure to find an acceptable analysis is a legitimate possibility.
Therefore, any sentence (or interpretation of an ambiguous sentence) whose
required prosodic contour does not conform to general prosodic patterns
in the language is in danger of being judged ungrammatical in
reading, although perceived as grammatical if spoken with appropriate
prosody.

17.4 Judgements for written and spoken sentences


17.4.1 Previous research
When we began this work on wh-scope interpretation in Japanese, we took it
for granted that the relevance of prosody to acceptability, for some construc-
tions at least, would be a familiar point and that the recent wave of psycho-
linguistic experiments on grammaticality judgements would have produced
plenty of data in support of it. But we scoured the literature, most notably the
volumes by Schütze (1996) and Cowart (1997), and found very few reports of
grammaticality judgements on spoken sentences. Comments on using speech
input for grammaticality judgements mostly concern differences in register
between spoken and written language. Cowart also notes some practical

disadvantages of spoken input.11 Schütze cites a study by Vetter et al. (1979),
which compared written presentation and auditory presentation, the latter
with normal or monotone intonation. The sentence materials were diverse
and the results concerning prosody were mixed; normal intonation had an
effect in some cases only. The details may repay further investigation, but the
sentence materials were not designed in a way that could shed light on our
hypothesis that auditory presentation should aid judgements primarily for
sentences needing non-default prosody.12
The only other study we know of that tested identical sentence materials in
written and spoken form is by Keller and Alexopoulou (2001) on Greek word
order, accent placement, and focus. This is a substantial investigation of six
different word orders in declarative sentences, each in five different question
contexts establishing a discourse focus. In the spoken sentences, accent pos-
ition was also systematically varied. The magnitude estimation method (see
Bard et al. 1996) was used to elicit judgements of ‘linguistic acceptability’, a
term which was intentionally not defined for the participants. The results and
conclusions are of considerable interest but are too numerous to review here. It
is worth noting, however, that Keller and Alexopoulou underscore the
significant contribution of prosody to acceptability for sentences involving focus,
even in a language with considerable freedom of word order such as Greek.
They write: ‘English relies on accent placement and only rarely on syntax . . .
for discourse purposes. On the other hand, the literature on free word order
languages . . . has emphasized the role of word order . . . We found that, at
least in Greek, word order . . . plays only a secondary role in marking infor-
mation structure; word order preferences can be overridden by phonological
constraints’ (Keller and Alexopoulou 2001: 359–60). Unfortunately for present

11 We set aside here studies whose primary focus is judgements by second language learners; see
Murphy (1997) and references there. Murphy found for English and French sentences that subjects
(both native and L2 speakers) were less accurate with auditory presentation than with visual presen-
tation, especially with regard to rejecting subjacency violations and other ungrammatical examples (cf.
Hill’s observation noted below).
12 Schütze also mentions an early and perhaps not entirely serious exploration by Hill (1961) of ten
example sentences, eight of them from Chomsky (1957), judged by ten informants. For instance, the
sentence I saw a fragile of was accepted in written form by only three of the ten informants. In spoken
form, with primary stress and sentence-final intonation on the word of, it was subsequently accepted
by three of the seven who had previously rejected it. Some comments (e.g. ‘What’s an of?’) revealed
that accepters had construed of as a noun. Hill concluded, as we have done, that ‘intonation-pattern
influences acceptance or rejection.' However, his main concern, unlike ours, was over-acceptance of
spoken examples. He warned that ‘If the intonation is right, at least enough normal speakers will react
to the sentence as grammatical though of unknown meaning, to prevent convergent rejection.’ Our
experimental data (see below) also reveal some tendency to over-accept items that are ungrammatical
but pronounced in a plausible-sounding fashion, but we show that this can be minimized by
simultaneous visual and auditory presentation.

purposes, no exact comparison can be made of the results for the reading
condition and the listening condition, because there were other differences of
method between the two experiments.

17.4.2 Experimental findings: Japanese and English


17.4.2.1 Materials Since the relevance of prosody to acceptability had not
previously been broadly tested, we conducted an experiment on the two
Japanese wh-constructions discussed above, with a related experiment on
two constructions in English for purposes of comparison. In all four cases
the target constructions were hypothesized to be fully acceptable only if
assigned a non-default prosody (explicitly or implicitly). Our prediction
was that they would be accepted more often when presented auditorily with
appropriate prosody (the listening condition) than when presented visually
without prosody (the reading condition).
The Japanese experiment was conducted by Kitagawa and Yuki Hirose. The
target items were instances of constructions (17.4) and (17.5) above, with wh-
in-situ and long-distance scrambled wh respectively. Each was disambiguated
by its combination of matrix and subordinate complementizers toward what
has been reported to be its less preferred scope interpretation: (a) subordinate
wh-in-situ with forced matrix scope as in (17.12) below; (b) wh scrambled
from the subordinate clause into the matrix clause, with forced subordinate
scope as in (17.13).13
(17.12) Kimi-wa Kyooko-ga hontoowa [ dare-o aisiteita-to ] imademo omotteiru-no?
you-top Kyooko-nom in.reality who-acc love-compthat even-now thinking-compwh
‘Who do you still think that Kyoko in fact loves?’
(17.13) Nani1-o aitu-wa [[Tieko-ga t1 kakusiteiru-ka] boku-ga sitteiru-to ]
what-acc that.guy-top Tieko-nom hiding-compwh I-nom know-compthat
omotteiru-rasii-yo.
thinking-seems-affirm
‘That guy seems to think that I know what Chieko is hiding.’

In the listening test, the sentences were spoken with appropriate prosody:
long-EPD for wh-in-situ examples such as (17.12), and short-EPD for fronted-

13 An extra declarative clause was added in the sentences of type (17.13), structurally intermediate
between the lowest clause, in which the wh-XP originated, and the highest clause, into which it was
scrambled. The purpose of this was to prevent readers, at the point at which they encounter the -ka,
from easily scanning the remainder of the sentence to see that no other possible scope marker is
present. If they had at that point detected the absence of a scope marker in the matrix clause, they
would inevitably have adopted a subordinate scope reading, and that would have inactivated any
possible preference for the long-EPD/matrix scope reading.

wh examples such as (17.13). In the reading test, prosody was not mentioned,
so readers were free to assign either prosody (or none at all). Our hypothesis
that short-EPD is the default for wh-in-situ examples, and long-EPD the
default for fronted-wh examples, predicted that the experimental sentences
would be rejected more often when presented in written form than when
spoken with appropriate contours.
We conducted a comparable experiment in English in order to provide
some benchmarks for the Japanese study. The English experiment was con-
ducted by Fodor with Erika Troseth, Yukiko Koizumi, and Eva Fernández. The
target materials were of two types. One was ‘not-because’ sentences such as
(17.14), with potentially ambiguous scope that was disambiguated by a nega-
tive polarity item in the because-clause, which would be ungrammatical unless
that clause were within the scope of the negation.
(17.14) Marvin didn’t leave the meeting early because he was mad at anyone.
The second type of target sentence consisted of a complex NP (a head noun
modiWed by a PP) and a relative clause (RC) as in (17.15), which was poten-
tially ambiguous between high attachment to the head noun or low attach-
ment to the noun inside the PP, but was disambiguated by number agreement
toward high attachment.
(17.15) Martha called the assistant of the surgeons who was monitoring the
progress of the baby.
For both of these constructions, as in the Japanese experiment, the disam-
biguation was toward an interpretation which has been claimed to require a
non-default prosody.
For the not-because construction, Frazier and Clifton (1996) obtained
experimental results for written materials indicating that the preferred inter-
pretation has narrow-scope negation, that is the because-clause is outside the
scope of the negation. (Unlike (17.14), their sentences had no negative polarity
item forcing the wide-scope negation reading.) That the dispreferred wide-
scope-negation reading needs a special intonation contour is noted by
Hirschberg and Avesani (2000). In their study, subjects read aloud context-
ually disambiguated examples, and the recordings were acoustically analysed.
The finding was that the intonation contours for the (preferred) narrow-
scope-negation ‘usually exhibit major or minor prosodic phrase boundaries
before the subordinate conjunction’ and ‘usually were falling contours’. These
are typical features of multi-clause sentences without negation. By contrast,
Hirschberg and Avesani noted that the intonation contours for the (dis-
preferred) wide-scope-negation ‘rarely contain internal phrase boundaries’

and ‘often end in a ‘‘continuation rise’’.’ This prosody—especially the


sentence-Wnal rise—is generally perceived to be highly marked for English.
In our listening test this marked prosody was used. Thus, we predicted
that the sentences would be perceived with wide-scope negation, which
would license the negative polarity item, so that the sentences would be
judged grammatical. If instead, readers assigned the default prosody without
these marked features, they might not spot the wide-scope-negation
interpretation, and the negative polarity item would then seem to be
ungrammatical.
For the RC construction in (17.15), experimental results by Cuetos and
Mitchell (1988) have shown that the low-attachment reading is mildly pre-
ferred for ambiguous examples in English (although the opposite is true in
Spanish). It has been suggested (Fodor 1998, 2002a, 2002b) that this is for
prosodic reasons. It has been shown (Maynell 1999; Lovrič 2003) that a
prosodic boundary before an RC promotes high attachment; but English
(unlike Spanish) often has no prosodic break at the beginning of an RC. If
English readers tend to assign a contour with no pre-RC break, that would
encourage the low attachment analysis, so the verb in the RC in the experi-
mental sentences would appear to have incorrect number agreement, and a
judgement of ungrammaticality would ensue. In our listening test, we used
the marked prosody with a prosodic break at the pre-RC position, to encour-
age high attachment.
The two English constructions tested in this experiment are useful because
they differ considerably with respect to the degree of markedness of their less
preferred prosodic contour: for (17.14) it is extreme; for (17.15) it is very slight.
We chose these two constructions in the hope that they would allow us to
bracket the sensitivity of the reading-versus-listening comparison, providing
useful baselines for future research. We predicted considerably lower accept-
ance rates in reading than in listening for the wide-scope interpretation of the
not-because construction, but a much smaller difference, if any, for the high-
attachment RC construction. The Japanese wh constructions were expected to
fall between these end-points.

17.4.2.2 Method and presentation In both experiments there were twelve of each
of the two types of target sentence, and for each target type there were also
twelve filler sentences (four grammatical, eight ungrammatical) for comparison
with the targets; these 'related fillers' were superficially similar to the targets in
structure but did not contain the critical ambiguity disambiguated to its
non-preferred reading. In both experiments, the targets and their related fillers
were presented in pseudo-random order among forty assorted filler sentences
(twenty grammatical, twenty ungrammatical) with completely different
structures.

Prosodic Influence on Syntactic Judgements 353
One group of subjects saw all sentences on a computer screen, one whole
sentence at a time, with a timed exposure (nine seconds per sentence in the
Japanese study; twelve seconds per sentence in the English one), and read
them silently. Another group heard sound files of the same sentences, spoken
with appropriate prosody by an instructed native speaker. For the English
materials, there were twelve seconds between the onsets of successive spoken
sentences as for the written sentences (although none of the spoken sentences
occupied the full twelve seconds). For the Japanese spoken materials, the
presentation time was from five to seven seconds, tailored to the length
of the sentence. For English only, there was a third group of subjects who
heard the sound files simultaneously with visual presentation, for twelve
seconds per sentence. Subjects in both experiments, thirteen in each presen-
tation condition, were college students, native speakers of the language of the
experimental materials. They made rapid grammaticality judgements by
circling ‘YES’ or ‘NO’ (‘HAI’ or ‘IIE’ in the Japanese experiment) on a written
response sheet. They were then allowed to revise this initial judgement if they
wished to. This revision opportunity was afforded in order to prevent
excessively thoughtful (slow) initial responses. In fact there were few revisions and
we report only the initial judgements here.
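The materials design just described can be made concrete with a small sketch. This is our own illustration, not the authors' actual stimulus-preparation procedure: the item labels are invented, and a seeded shuffle stands in for the 'pseudo-random order' (a real experimental list would normally also respect ordering constraints not specified here).

```python
import random

def build_presentation_list(target_types, seed=1):
    """One experiment's materials: for each target type, twelve targets and
    twelve related fillers (four grammatical, eight ungrammatical), presented
    among forty assorted fillers (twenty grammatical, twenty ungrammatical)."""
    items = []
    for ttype in target_types:
        items += [(ttype, "target")] * 12
        items += [(ttype, "related-filler-gram")] * 4
        items += [(ttype, "related-filler-ungram")] * 8
    items += [("assorted", "filler-gram")] * 20
    items += [("assorted", "filler-ungram")] * 20
    random.Random(seed).shuffle(items)  # stands in for pseudo-random ordering
    return items

trials = build_presentation_list(["wh-in-situ", "matrix-scramble"])
print(len(trials))  # 88 trials: 24 targets + 24 related fillers + 40 assorted
```

The same counts apply to the English experiment, with not-because and RC-attachment as the two target types.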

17.4.2.3 Results Acceptance rates (as percentages) are shown in Figures 17.1
to 17.5. What follows is a brief review of the experimental findings. We regard
these results as preliminary, and plan to follow them up with more extensive
studies, but we believe there are already outcomes of interest here, which we
hope will encourage comparable studies on other constructions and in other
languages.
Key to figures: In all the figures below, the percentage acceptance rates for
target sentences (of each type named) are represented by horizontal stripes.
The grammatical filler sentences that are related to the targets are represented
by vertical stripes, and the ungrammatical fillers related to the targets are
represented by dots. The assorted (unrelated) fillers are shown separately at
the right.
In the Japanese data we see, as predicted, that the target sentences were
accepted more often in listening than in reading (see the central bars for wh-
in-situ and for matrix-scramble, across the two presentation conditions). The
difference is not large but it is statistically significant (p < .01). Relatively
speaking, the results are very clear: in the reading condition, the targets are
intermediate in judged acceptability between their matched grammatical
Figure 17.1. Japanese reading, percent acceptance (conditions: wh-in-situ, matrix-scramble, assorted fillers)

Figure 17.2. Japanese listening, percent acceptance (conditions: wh-in-situ, matrix-scramble, assorted fillers)

fillers and matched ungrammatical fillers, but in the listening condition they
draw significantly closer to the grammatical fillers, supporting the hypothesis
that the grammar does indeed license them, although only with a very
particular prosody.
Aspects of the Japanese data that need to be checked in continuing research
include the relatively poor rate of acceptance in reading for the matrix-scramble
filler sentences,14 and the lowered acceptance of all grammatical fillers in the
listening condition. The general reduction in discrimination of grammatical
versus ungrammatical filler items in listening is observed also in the English
study and its cause is considered below.

14 This result may dissolve in a larger-scale study. It was due here to only one of the four grammatical
filler sentences related to the matrix-scramble experimental sentences. Unlike the other three, which
were close to 100% acceptance, this sentence was accepted at an approximately 50% level. This one
example had a matrix-scrambled/matrix-interpreted wh-phrase in a construction with three clauses, in
which there were two intervening non-wh complementizers between the overt wh-phrase and its
ultimate wh-scope marker. It is possible that in this multi-clause structure, the dispreference for very
long-EPD outweighed the preference for syntax–prosody congruence, creating an apparent
ungrammaticality.

We turn now to the English data which show, as anticipated, that the benefit
of spoken input depends on how marked the non-preferred prosody is.

Figure 17.3. English reading, percent acceptance (conditions: not-because, RC-attachment, assorted fillers)

Figure 17.4. English listening, percent acceptance (conditions: not-because, RC-attachment, assorted fillers)
Figure 17.5. English simultaneous reading and listening, percent acceptance (conditions: not-because, RC-attachment, assorted fillers)

For the not-because sentences, acceptance was extremely low in the reading
condition, little better than for the matched ungrammatical fillers. In the
listening condition there was a striking increase in acceptance for these
sentences. It did not rise above 50 per cent, even with the appropriate prosody as
described by Hirschberg and Avesani (2000). The reason for this was apparent
in subjects’ comments on the materials after the experiment: it was often
remarked that some sentences were acceptable except for being incomplete.
In particular, the continuation rise at the end of the not-because sentences
apparently signalled that another clause should follow, to provide the real
reason for the event in question (e.g. Marvin didn’t leave the meeting early
because he was mad at anyone; he left early because he had to pick up his children
from school.)15 This sense of incompleteness clearly cannot be ascribed in the
listening condition to failure to assign a suitable prosodic contour. So it can be
regarded as a genuine syntactic/semantic verdict on these sentences. Thus this
is another case in which auditory presentation affords a clearer view of the
syntactic/semantic status of the sentences in question. It seems that not-
because sentences with wide-scope negation stand in need of an appropriate
following discourse context—just as some other sentence types (such as (17.1)
above) stand in need of an appropriate preceding discourse context.
15 We have found that a suitable preceding context can obviate the need for the final rise, and with
it the associated expectation of a continuation. For example, a final fundamental frequency fall on at
anyone is quite natural in: I have no idea what was going on that afternoon, but there's one thing I do
know: Marvin did not leave the meeting early because he was mad at anyone. However, it is still essential
that there be no intonation phrase boundary between the not and the because-clause.

The RC-attachment sentences, on the other hand, showed essentially no
benefit from auditory presentation. Acceptance in the reading condition was
already quite high and it did not increase significantly in the listening
condition. This could indicate that the prosodic explanation for the trend
toward low RC-attachment in English is invalid. But equally, it might show
only that this experimental protocol is not sufficiently discriminating to reveal
the advantage of the appropriate prosody in this case where the difference is
quite subtle. The familiar preference of approximately 60 per cent for low
RC-attachment with written input is for fully ambiguous examples. For
sentences in which the ambiguity is subsequently disambiguated (e.g. by
number agreement, as in the present experiment), subjects may be able to
recover quite efficiently from this mild first-pass preference once the
disambiguating information is encountered. (See Bader 1998 and Hirose 2003
for data on prosodic influences on garden-path recovery in German and
Japanese respectively.) In short: the present results for relative clause
attachment do not contradict standard findings, although they also do not
definitively support a prosody-based preference for low RC attachment in
English reading. If prosody is the source of this preference, this experimental
paradigm is not the way to show it. This is an informative contrast with the
case of the not-because sentences, for which intuitive judgements are sharper
and for which the prosodic cues in spoken sentences had a significant effect in
this experimental setting.
An unwelcome outcome of the English study is that greater acceptance of the
target sentences in the listening condition is accompanied by greater
acceptance of the related ungrammatical filler sentences. It is conceivable,
therefore, that these findings are of no more interest than the discovery that
inattentive subjects can be taken in by a plausible prosodic contour applied to
an ungrammatical sentence as Hill (1961) suggested (see footnote 12). However,
it seems unlikely that this is all that underlies the considerable difference
between reading and listening for the not-because sentences. A plausible
alternative explanation is that listening imposes its own demands on
perceivers, which may offset its advantages. Although auditory input provides
informants with additional linguistically relevant information in the form of a
prosodic contour, it also requires the hearer to perceive the words accurately
and hold the sentence in working memory without the opportunity for either
look-ahead or review. Our methodology provided no independent assessment
of whether errors of perception were more frequent for auditory than for
visual input. It seems likely that this was so (although the converse might be
the case for poor readers), since the distinction between grammatical and
ungrammatical sentences often rested on a minor morphophonological
contrast. In the English RC sentences the disambiguation turned on a singular
versus plural verb, for example walk versus walks, which could have been
misheard.
Although it may have a natural explanation, the 'Hill effect' is a potential
disadvantage of auditory presentation for the purposes of obtaining reliable
syntactic judgements, since it decreases the discrimination between
grammatical and ungrammatical items. To the extent that it is due to
persuasiveness of the prosodic contour, it cannot easily be factored out. But
problems of auditory perceptibility and memory can be eliminated by
presenting the sentence in written form while it is being heard. For the English
sentences, the results for simultaneous visual and auditory presentation (see
Figure 17.5) show that the mis-acceptance of ungrammatical sentences is
substantially reduced, while the grammatical sentences are relatively unaffected
or even improved. Thus, it appears that combined visual and auditory
presentation optimizes both factors: perceptual accuracy and short-term
memory are relieved of pressure, while the extra information in the auditory
stimulus eliminates the need for prosodic creativity in reading sentences that
require a non-default contour. Combined visual and auditory presentation will
therefore be our next step in investigating the Japanese materials.
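All of the comparisons in this section reduce to differences between proportions of 'YES' judgements across presentation conditions, so the computation involved can be sketched briefly. This is purely illustrative: the chapter does not say which statistical test produced its p < .01 result, and the counts below are hypothetical, not the experiment's data.

```python
import math

def acceptance_rate(yes, n):
    """Percentage acceptance: 'YES' judgements out of n judgements."""
    return 100.0 * yes / n

def two_proportion_z(yes1, n1, yes2, n2):
    """Two-tailed two-proportion z-test, e.g. reading vs. listening acceptance."""
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p_value

# Hypothetical counts: 13 subjects x 12 targets = 156 judgements per condition.
z, p = two_proportion_z(70, 156, 95, 156)
print(acceptance_rate(70, 156), acceptance_rate(95, 156), p < 0.01)
```

A by-subjects analysis (treating each subject's acceptance rate as one observation) would be the more conservative choice in practice, since judgements from the same subject are not independent.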

17.5 Conclusion
These experimental findings, although modest as yet, support the general
moral that we were tempted to draw on the basis of informal judgements of
written and spoken sentences. That is: acceptability judgements on written
sentences are not purely syntax-driven; they are not free of prosody even
though no prosody is present in the stimulus. This has a practical consequence
for the conduct of syntactic research: more widespread use needs to be made
of spoken sentences for obtaining syntactic well-formedness judgements. The
ideal mode of presentation, as we have seen, provides both written and
auditory versions of the sentence (e.g. in a PowerPoint file), to minimize
perceptual and memory errors while making sure that the sentence is being
judged on the basis of the prosody intended. We are sympathetic to the fact
that this methodological conclusion entails more work for syntacticians
(Cowart 1997: 64, warns that auditory presentation is 'time-consuming to
prepare and execute'), but it is essential nonetheless, at least for sentences
whose prosody is suspected of being out of the ordinary in any way.
References

Abney, S. (1996) ‘Statistical methods and linguistics’, in J. Klavans and P. Resnik (eds),
The Balancing Act: Combining Symbolic and Statistical Approaches to Language.
Cambridge, MA: MIT Press, pp. 1–26.
—— (1997) ‘Stochastic attribute-value grammars’, Computational Linguistics 23(4):
597–618.
Albright, A. (2002) ‘Islands of reliability for regular morphology: Evidence from
Italian’, Language 78: 684–709.
—— and Hayes, B. (2002) ‘Modeling English past tense intuitions with minimal
generalization’, in M. Maxwell (ed.), Proceedings of the 2002 Workshop on Morpho-
logical Learning. Philadelphia: Association for Computational Linguistics.
—— and Hayes, B. (2003) ‘Rules vs. analogy in English past tenses: A computational/
experimental study’, Cognition 90: 119–61.
——, Andrade, A., and Hayes, B. (2001) ‘Segmental environments of Spanish diph-
thongization’, UCLA Working Papers in Linguistics 7: 117–51.
Alexopoulou, T. and Keller, F. (2003) ‘Linguistic complexity, locality and resumption’,
in Proceedings of the 22nd West Coast Conference on Formal Linguistics. Somerville,
MA: Cascadilla Press, pp. 15–28.
Altenberg, E. P. and Vago, R. M. ms. (2002) ‘The role of grammaticality judgments in
investigating first language attrition: A cross-disciplinary perspective’, paper pre-
sented at International Conference on First Language Attrition: Interdisciplinary
Perspectives on Methodological Issues. Free University, Amsterdam, 22–24 August.
Queens College and University of New York.
Altmann, G. T. M. (1998) ‘Ambiguity in sentence processing’, Trends in Cognitive
Sciences 2: 146–52.
Andersen, T. (1991) ‘Subject and topic in Dinka’, Studies in Linguistics 15(2): 265–94.
Anderson, J. R. (1990) The Adaptive Character of Thought. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anttila, A. (2002) 'Variation and phonological theory', in J. Chambers, P. Trudgill, and
N. Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford:
Blackwell, pp. 206–43.
Anttila, A. (1997) ‘Deriving variation from grammar’, in F. Hinskens, R. van Hout, and
L. Wetzels (eds.), Variation, Change and Phonological Theory. Amsterdam: John
Benjamins, pp. 35–68.
Apoussidou, D. and Boersma, P. (2004) ‘Comparing two optimality-theoretic learning
algorithms for Latin stress’, WCCFL 23: 29–42.
Ariel, M. (1990) Accessing Noun Phrase Antecedents. London: Routledge.
Asudeh, A. (2001) 'Linking, optionality, and ambiguity in Marathi', in P. Sells (ed.),
Formal and Empirical Issues in Optimality-Theoretic Syntax. Stanford, CA: CSLI
Publications.
Auer, P. (2000) ‘A European perspective on social dialectology’, talk presented at First
International Conference on Language Variation in Europe (ICLaVE1). Barcelona, 1 July.
—— (2005) ‘Europe’s sociolinguistic unity, or: a typology of European dialect/stand-
ard constellations’, in N. Delbecque, J. van der Auwera, D. Geeraerts (eds.), Per-
spectives on Variation. Sociolinguistic, Historical, Comparative. Trends in Linguistics.
Studies and Monographs 163: Mouton de Gruyter, pp. 8–42.
Avrutin, S. (2004) ‘Beyond narrow syntax’, in L. Jenkins (ed.), Variation and Universals
in Biolinguistics. Amsterdam: Elsevier.
Bach, E. and Harms, R. (1972) ‘How do languages get crazy rules?’, in R. Stockwell and
R. Macauley (eds), Linguistic Change and Generative Theory. Bloomington, Indiana:
Indiana University Press, pp. 1–21.
Bader, M. (1996) Sprachverstehen. Syntax und Prosodie beim Lesen. Opladen: West-
deutscher Verlag.
—— (1998) ‘Prosodic influences on reading syntactically ambiguous sentences’, in
J. D. Fodor and F. Ferreira (eds), Reanalysis in Sentence Processing. Dordrecht:
Kluwer, pp. 1–46.
—— and Frazier, L. (2005) ‘Interpretation of leftward-moved constituents: Process-
ing topicalizations in German’, Linguistics 43(1): 49–87.
—— and Meng, M. (1999) ‘Subject–object ambiguities in German embedded clauses:
An across-the-board comparison’, Journal of Psycholinguistic Research 28: 121–43.
Bailey, T. M. and Hahn, U. (2001) ‘Determinants of wordlikeness: Phonotactics or
lexical neighborhoods?', Journal of Memory and Language 43: 568–91.
Baltin, M. R. (1982) ‘A landing site theory of movement rules’, Linguistic Inquiry 13(1):
2–38.
Barbiers, S., Cornips, L., and van der Kleij, S. (eds) (2002) Syntactic Microvariation.
Electronic publication of the Meertens Instituut. http://www.meertens.knaw.nl/
projecten/sand/synmic/
Bard, E. G., Robertson, D., and Sorace, A. (1996) ‘Magnitude estimation of linguistic
acceptability’, Language 72: 32–68.
Barnes, J. and Kavitskaya, D. (2002) ‘Phonetic analogy and schwa deletion in French’,
presented at the Berkeley Linguistic Society.
Baum, L. E. (1972) ‘An inequality and associated maximization technique in statistical
estimation for probabilistic functions of Markov processes’, Inequalities 3: 1–8.
Beckman, M. (1996) ‘The parsing of prosody’, Language and Cognitive Processes 11:
17–67.
—— (2003) ‘Input representations (inside the mind and out)’, in M. Tsujimura
and G. Garding (eds), WCCFL 22 Proceedings. Somerville, MA: Cascadilla Press,
pp. 101–25.
Beckman, M. E. and Ayers-Elam, G. (1993) Guidelines for ToBI Labelling. http://
www.ohio-state.edu/research/phonetics/E_ToBI/singer_tobi.html.
Beckman, M., Munson, B., and Edwards, J. (2004) ‘Vocabulary growth and develop-
mental expansion of types of phonological knowledge’, LabPhon 9, pre-conference
draft.
beim Graben, P., Saddy, J. D., Schlesewsky, M., and Kurths, J. (2000) ‘Symbolic
dynamics of event-related brain potentials’, Physical Review E 62: 5518–41.
Belletti, A. (2004) ‘Aspects of the low IP area’, in L. Rizzi (ed.), The Structure of CP and
IP. The Cartography of Syntactic Structures, Volume 2. Oxford: Oxford University
Press, pp. 16–51.
——, Bennati, E., and Sorace, A. (2005) ‘Revisiting the null subject parameter from
an L2 developmental perspective’, paper presented at the XXXI Conference on
Generative Grammar, Rome, February 2005.
Bentley, D. and Eythórsson, T. (2004) ‘Auxiliary selection and the semantics of
unaccusativity’, Lingua 114: 447–71.
Benua, L. (1998) ‘Transderivational Identity’, Ph.D. thesis, University of Massachusetts.
Berent, I., Pinker, S., and Shimron, J. (1999) ‘Default nominal inflection in Hebrew:
Evidence for mental variables’, Cognition 72: 1–44.
Berg, T. (1998) Linguistic Structure and Change: An Explanation from Language
Processing. Oxford: Clarendon Press.
Berger, A., Della Pietra, S., and Della Pietra,V. (1996) ‘A maximum entropy approach
to natural language processing’, Computational Linguistics 22(1): 39–71.
Berko, J. (1958) ‘The child’s learning of English morphology’, Word 14: 150–77.
Bever, T. G. (1970) ‘The cognitive basis for linguistic structures’, in J. R. Hayes (ed.),
Cognition and the Development of Language. New York: John Wiley.
Bierwisch, M. (1968) ‘Two critical problems in accent rules’, Journal of Linguistics 4: 173–8.
—— (1988) ‘On the grammar of local prepositions’, in M. Bierwisch, W. Motsch, and
I. Zimmermann (eds), Syntax, Semantik und Lexicon (= Studia Grammatica XXIX).
Berlin: Akademie Verlag, pp. 1–65.
Bini, M. (1993) 'La adquisición del italiano: más allá de las propiedades sintácticas del
parámetro pro-drop’, in J. Liceras (ed.), La linguistica y el analisis de los sistemas no
nativos. Ottawa: Doverhouse, pp. 126–39.
Birch, S. and Clifton, C. (1995) ‘Focus, accent, and argument structure: Effects on
language comprehension’, Language and Speech 38: 365–91.
Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford: Oxford Uni-
versity Press.
Blancquaert, E., Claessens, J., Goffin, W., and Stevens, A. (eds) (1962) Reeks Neder-
landse Dialectatlassen: Dialectatlas van Belgisch-Limburg en Zuid-Nederlands Lim-
burg, 8. Antwerpen: De Sikkel.
Blevins, J. (2004) Evolutionary Phonology: The Emergence of Sound Patterns. Cam-
bridge: Cambridge University Press.
Bod, R. (1998) Beyond Grammar: An Experience-Based Theory of Language. Stanford,
CA: Center for the Study of Language and Information.
——, Hay, J., and Jannedy, S. (2003) Probabilistic Linguistics. Cambridge, MA: MIT
Press.
Boersma, P. (1997) 'How we learn variation, optionality, and probability', Proceedings
of the Institute of Phonetic Sciences of the University of Amsterdam 21: 43–58.
—— (1998a) ‘Functional Phonology: Formalizing the Interactions between Articula-
tory and Perceptual Drives’, Ph.D. thesis, University of Amsterdam.
—— (1998b) ‘Typology and acquisition in functional and arbitrary phonology’, MS,
University of Amsterdam. http://www.fon.hum.uva.nl/paul/papers/typ_acq.pdf.
—— (2000) ‘Learning a grammar in functional phonology’, in J. Dekkers, F. van der
Leeuw, and J. van de Weijer (eds), Optimality Theory: Phonology, Syntax, and
Acquisition. Oxford: Oxford University Press, pp. 465–523.
—— (2001) ‘Phonology-semantics interaction in OT, and its acquisition’, in
R. Kirchner, W. Wikeley, and J. Pater (eds), Papers in Experimental and Theoretical
Linguistics, Vol. 6. Edmonton: University of Alberta, pp. 24–35.
—— (2004) ‘A stochastic OT account of paralinguistic tasks such as grammaticality
and prototypicality judgments’, unpublished manuscript, Rutgers Optimality Arch-
ive no. 648-0304.
—— (2005) ‘Some listener-oriented accounts of hache aspiré in French’, MS, Univer-
sity of Amsterdam. Rutgers Optimality Archive 730. http://roa.rutgers.edu
—— and Escudero, P. (2004) ‘Learning to perceive a smaller L2 vowel inventory: An
optimality theory account’, Rutgers Optimality Archive 684.
—— and Hayes, B. (2001) ‘Empirical tests of the gradual learning algorithm’, Lin-
guistic Inquiry 32(1): 45–86.
——, Escudero, P., and Hayes, R. (2003) ‘Learning abstract phonological from
auditory phonetic categories: An integrated model for the acquisition of lan-
guage-specific sound categories’, Proceedings of the 15th International Congress of
Phonetic Sciences, 1013–16.
Boethke, J. (2005) ‘Kasus im Deutschen: Eine empirische Studie am Beispiel freier
Relativsätze’, Diploma thesis, Institute of Linguistics, University of Potsdam.
Bolinger, D. L. (1961a) Generality, Gradience and the All-or-None. The Hague: Mouton.
—— (1961b) ‘Syntactic blends and other matters’, Language 37: 366–81.
—— (1972) ‘Accent is predictable (If you’re a mind-reader)’, Language 48: 633–44.
—— (1978) ‘Asking more than one thing at a time’, in H. Hiz (ed.), Questions.
Dordrecht: Reidel.
Borer, H. (1994) ‘The projection of arguments’, in E. Benedicto and J. Runner (eds),
Functional Projections. University of Massachusetts, Amherst: Occasional Papers 17.
Bornkessel, I., Schlesewsky, M., McElree, B., and Friederici, A. D. (2004a) ‘Multi-
dimensional contributions to garden-path strength: Dissociating phrase structure
from case marking’, Journal of Memory and Language 51: 495–522.
——, Fiebach, C. J., Friederici, A. D., and Schlesewsky, M. (2004b) ‘Capacity recon-
sidered: Interindividual differences in language comprehension and individual
alpha frequency’, Experimental Psychology 51: 279–89.
——, McElree, B., and Schlesewsky, M. (submitted). ‘On the time course of reanaly-
sis: The dynamics of verb-type effects’.
Brants, T. and Crocker, M. W. (2000) 'Probabilistic parsing and psychological
plausibility', in Proceedings of the 18th International Conference on Computational
Linguistics. Saarbrücken/Luxembourg/Nancy.
Bresnan, J. (2000) ‘The emergence of the unmarked pronoun’, in G. Legendre,
J. Grimshaw, and S. Vikner (eds), Optimality-Theoretic Syntax. Cambridge, MA:
MIT Press, pp. 113–42.
——, Dingare, S., and Manning, C. D. (2001) ‘Soft constraints mirror hard con-
straints; Voice and person in English and Lummi’, in M. Butt and T. Holloway King
(eds), Proceedings of the LFG01 Conference. Stanford University, Stanford: CSLI
Publications, pp. 13–32.
Briscoe, T. and Carroll, J. (1993) ‘Generalised probabilistic LR parsing for unification-
based grammars’, Computational Linguistics 19: 25–60.
Broekhuis, H. and Cornips, L. (1994) ‘Undative constructions’, Linguistics 32(2): 173–89.
—— and Cornips, L. (1997) ‘Inalienable possession in locational constructions’,
Lingua 101: 185–209.
Browman, C. and Goldstein, L. (1992) ‘Articulatory phonology: An overview’,
Phonetica 49: 155–80.
Brysbaert, M. and Mitchell, D. C. (1996) ‘Modifier attachment in sentence parsing:
Evidence from Dutch’, Quarterly Journal of Experimental Psychology 49A: 664–95.
Büring, D. (1997) The Meaning of Topic and Focus – The 59th Street Bridge Accent.
London: Routledge.
—— (2001) ‘Let’s phrase it!—Focus, word order, and prosodic phrasing in German
double object constructions’, in G. Müller and W. Sternefeld (eds), ‘Competition in
Syntax’, No. 49 in Studies in Generative Grammar. Berlin and New York: de Gruyter,
pp. 101–37.
Burnage, G. (1990) CELEX—A Guide for Users. Nijmegen: Centre for Lexical Infor-
mation, University of Nijmegen.
Burnard, L. (1995) Users Guide for the British National Corpus. British National Corpus
Consortium, Oxford University Computing Service.
Burton-Roberts, N., Carr, P., and Docherty, G. (2000) Phonological Knowledge: Con-
ceptual and Empirical Issues. New York: Oxford University Press.
Burzio, L. (1986) Italian Syntax: A Government-Binding Approach. Dordrecht: Foris.
—— (2002) ‘Surface-to-surface morphology: When your representations turn into
constraints’, in P. Boucher (ed.), Many Morphologies. Somerville, MA: Cascadilla
Press, pp. 142–77.
Bybee, J. L. (1994) ‘A view of phonology from a cognitive and functional perspective’,
Cognitive Linguistics 5(4): 285–305.
—— (2000a) ‘Lexicalization of sound change and alternating environments’, in M. B.
Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in Labora-
tory Phonology V. Cambridge: Cambridge University Press, pp. 250–69.
—— (2000b) ‘The phonology of the lexicon: Evidence from lexical diffusion’, in M.
Barlow and S. Kemmer (eds), Usage-Based Models of Language. Palo Alto: CSLI
Publications.
Bybee, J. L. (2001) Phonology and Language Use. Cambridge: Cambridge University
Press.
—— (2003) ‘Mechanisms of change in grammaticalization: The role of frequency’, in
R. D. Janda and B. D. Joseph (eds), Handbook of Historical Linguistics. Oxford:
Blackwell.
—— and Hopper, P. (eds) (2001) Frequency and the Emergence of Linguistic Structure.
Amsterdam: John Benjamins.
—— and Moder, C. L. (1983) ‘Morphological classes as natural categories’, Language
59: 251–70.
Carden, G. (1976) ‘Syntactic and semantic data: Replication results’, Language in
Society 5: 99–104.
Cardinaletti, A. and Starke, M. (1999) ‘The typology of structural deficiency. A case
study of the three classes of pronouns’, in H. van Riemsdijk (ed.), Clitics in the
Languages of Europe, Vol. 8 of Language Typology. Berlin: Mouton de Gruyter.
Carlson, K. (2001) ‘The effects of parallelism and prosody in the processing of gapping
structures’, Language and Speech 44: 1–26.
Carroll, G. and Rooth, M. (1998) ‘Valence induction with a head-lexicalized PCFG’, in
Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Granada, pp. 36–45.
Cedergren, H. J. and Sankoff, D. (1974) ‘Variable rules: Performance as a statistical
reflection of competence’, Language: Journal of the Linguistic Society of America 50:
333–55.
Cennamo, M. and Sorace, A. (in press) ‘Unaccusativity at the syntax-lexicon interface:
Evidence from Paduan’, to appear in R. Aranovich (ed.), Cross-linguistic Perspectives
on Auxiliary Selection. Amsterdam: John Benjamins.
Charniak, E. (2000) ‘A maximum-entropy-inspired parser’, in Proceedings of the 1st
Conference of the North American Chapter of the Association for Computational
Linguistics. Seattle, WA, pp. 132–9.
Chater, N., Crocker, M., and Pickering, M. (1998) ‘The rational analysis of inquiry:
The case for parsing’, in N. Chater and M. Oaksford (eds), Rational Models of
Cognition. Oxford: Oxford University Press, pp. 44–468.
Chen, M. (1970) ‘Vowel length variation as a function of the voicing of consonant
environment’, Phonetica 22: 129–59.
Chomsky, N. (1955) The Logical Structure of Linguistic Theory. Published (in part) as
Chomsky (1975).
—— (1957) Syntactic Structures. The Hague: Mouton.
—— (1964) ‘Degrees of grammaticalness’, in J. A. Fodor and J. J. Katz (eds), The
Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs, NJ:
Prentice-Hall, pp. 384–9.
—— (1965) Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
—— (1975) The Logical Structure of Linguistic Theory. New York: Plenum Press.
—— (1981) Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1986) Knowledge of Language. Its Nature, Origin, and Use. New York/
Westport/London: Praeger.
—— (1995) The Minimalist Program. Cambridge, MA: MIT Press.
—— and Halle, M. (1968) The Sound Pattern of English. New York: Harper and Row.
—— and Miller, G. A. (1963) ‘Introduction to the formal analysis of natural lan-
guages’, in R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of Mathematical
Psychology, volume II. New York: John Wiley.
Christiansen, M. H. and Chater, N. (1999) ‘Connectionist natural language processing:
The state of the art’, Cognitive Science 23: 417–37.
—— and Chater, N. (2001) ‘Connectionist psycholinguistics: Capturing the empirical
data’, Trends in Cognitive Sciences 5: 82–8.
Cinque, G. (1990) Types of A’-Dependencies. Cambridge, MA: MIT Press.
—— (1993) ‘A null theory of phrase and compound stress’, Linguistic Inquiry 24:
239–97.
Clahsen, H. and Felser, C. (in press) ‘Grammatical processing in language learners’, to
appear in Applied Psycholinguistics.
Clements, G. N. (1992) ‘Phonological primes: Gestures or features?’, Working Papers of
the Cornell Phonetics Laboratory 7: 1–15.
Coetzee, A. (2004) ‘What it Means to be a Loser: Non-Optimal Candidates in
Optimality Theory’, Ph.D. thesis, University of Massachusetts.
Cohn, A. (1990) ‘Phonetic and Phonological Rules of Nasalization’, Ph.D. thesis,
UCLA, distributed as UCLA Working Papers in Phonetics 76.
—— (1993) ‘Nasalization in English: Phonology or phonetics’, Phonology 10: 43–81.
—— (1998) ‘The phonetics-phonology interface revisited: Where’s phonetics?’, Texas
Linguistic Forum 41: 25–40.
—— (2003) ‘Phonetics in phonology and phonology in phonetics’, paper presented at
11th Manchester Phonology Meeting, Manchester, UK.
——, Brugman, J., Clifford, C., and Joseph, A. (2005) ‘Phonetic duration of English
homophones: An investigation of lexical frequency effects’, presented at LSA, 79th
meeting, Oakland, CA.
Coleman, J. and Pierrehumbert, J. B. (1997) ‘Stochastic phonological grammars and
acceptability’, in Computational Phonology: Third Meeting of the ACL Special Interest
Group in Computational Phonology. Somerset, NJ: Association for Computational
Linguistics, 49–56.
Coles, M. G. H. and Rugg, M. D. (1995) ‘Event-related brain potentials: An introduc-
tion’, in M. D. Rugg and M. G. H. Coles (eds), Electrophysiology of Mind: Event-
Related Brain Potentials and Cognition. Oxford, UK: Oxford University Press,
pp. 1–26.
Collins, M. (1999) ‘Head-Driven Statistical Models for Natural Language Parsing’,
Ph.D. thesis, University of Pennsylvania, Philadelphia, PA.
Connine, C. M., Ferreira, F., Jones, C., Clifton, C., and Frazier, L. (1984) ‘Verb frame
preferences: Descriptive norms’, Journal of Psycholinguistic Research 13: 307–19.
Corley, S. and Crocker, M. (2000) ‘The modular statistical hypothesis: Exploring
lexical category ambiguity’, in M. Crocker, M. Pickering, and C. Clifton (eds),
Architectures and Mechanisms for Language Processing. Cambridge: Cambridge
University Press.
Cornips, L. (1996) ‘Social stratification, linguistic constraints and inherent variability
in Heerlen Dutch: The use of the complementizers om/voor’, in J. Arnold et al.
(eds), Sociolinguistic Variation: Data, Theory, and Analysis. Selected papers from
NWAVE-23. Stanford, CA: CSLI Publications, pp. 453–67.
—— (1998) ‘Syntactic variation, parameters and their social distribution’, Language
Variation and Change 10(1): 1–21.
—— and Corrigan, K. (2005) ‘Convergence and divergence in grammar’, in P. Auer, F.
Hinskens, and P. Kerswill (eds), The Convergence and Divergence of Dialects in
Contemporary Societies. Cambridge: Cambridge University Press.
—— and Hulk, A. (1996) ‘Ergative reflexives in Heerlen Dutch and French’, Studia
Linguistica 50(1): 1–21.
—— and Jongenburger, W. (2001) ‘Elicitation techniques in a Dutch syntactic dialect
atlas project’, in H. Broekhuis and T. van der Wouden (eds), Linguistics in The
Netherlands 2001, 18. Amsterdam/Philadelphia: John Benjamins, pp. 57–69.
—— and Poletto, C. (2005) ‘On standardising syntactic elicitation techniques. Part I’,
Lingua 115(7): 939–57.
Cowart, W. (1997) Experimental Syntax: Applying Objective Methods to Sentence
Judgments. Thousand Oaks, CA: Sage Publications.
Crocker, M. (1996) Computational Psycholinguistics: An Interdisciplinary Approach to
the Study of Language. Dordrecht: Kluwer.
—— (1999) ‘Mechanisms for sentence processing’, in S. Garrod and M. Pickering
(eds), Language Processing. London: Psychology Press.
—— (to appear) ‘Rational models of comprehension: Addressing the performance
paradox’, in A. Cutler (ed.), Twenty-First Century Psycholinguistics: Four Corner-
stones. Hillsdale: Lawrence Erlbaum.
—— and Brants, T. (2000) ‘Wide-coverage probabilistic sentence processing’, Journal
of Psycholinguistic Research 29: 647–69.
—— and Corley, S. (2002) ‘Modular architectures and statistical mechanisms: The
case from lexical category disambiguation’, in P. Merlo and S. Stevenson (eds), The
Lexical Basis of Sentence Processing: Formal, Computational, and Experimental
Issues. Amsterdam: John Benjamins, pp. 157–80.
Cuetos, F. and Mitchell, D. C. (1988) ‘Cross-linguistic differences in parsing: Restric-
tions on the late closure strategy in Spanish’, Cognition 30: 73–105.
——, Mitchell, D. C., and Corley, M. M. B. (1996) ‘Parsing in different languages’, in
M. Carreiras, J. Garcı́a-Albea, and N. Sabastián-Gallés (eds), Language Processing in
Spanish. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 145–89.
Culy, C. (1998) ‘Statistical distribution and the grammatical/ungrammatical distinc-
tion’, Grammars 1: 1–19.
Davis, S. and Baertsch, K. (2005) ‘The diachronic link between onset clusters and
codas’, in Proceedings of the Annual Meeting of the Berkeley Linguistics Society,
BLS 31.
De Smedt, K. J. M. J. (1994) ‘Parallelism in incremental sentence generation’, in G.
Adriaens and U. Hahn (eds), Parallelism in Natural Language Processing. New
Jersey: Ablex.
De Vincenzi, M. (1991) Syntactic Parsing Strategies in Italian. Dordrecht: Kluwer
Academic Publishers.
Deguchi, M. and Kitagawa, Y. (2002) ‘Prosody and Wh-questions’, in M. Hirotani
(ed.), Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguis-
tic Society, pp. 73–92.
Dell, G. S. (1986) ‘A spreading activation theory of retrieval in sentence production’,
Psychological Review 93: 283–321.
Diesing, M. (1992) Indefinites. Cambridge, MA: MIT Press.
Dryer, M. S. (1992) ‘The Greenbergian word order correlations’, Language 68: 81–138.
Duffield, N. (2003) ‘Measures of competent gradedness’, in R. van Hout, A. Hulk,
F. Kuiken, and R. Towell (eds), The Interface between Syntax and the Lexicon in
Second Language Acquisition. Amsterdam: John Benjamins.
Duffy, S. A., Morris, R. K., and Rayner, K. (1988) ‘Lexical ambiguity and fixation times
in reading’, Journal of Memory and Language 27: 429–46.
Elman, J. L. (1991) ‘Distributed representations, simple recurrent networks and gram-
matical structure’, Machine Learning 9: 195–225.
—— (1993) ‘Learning and development in neural networks: The importance of
starting small’, Cognition 48: 71–99.
Erteschik-Shir, N. (1973) ‘On the Nature of Island Constraints’, Ph.D. thesis, MIT.
—— (1982) ‘Extractability in Danish and the pragmatic principle of dominance’, in E.
Engdahl and E. Ejerhed (eds), Readings on Unbounded Dependencies in Scandi-
navian Languages. Umeå, Sweden.
—— (1986) ‘Wh-questions and focus’, Linguistics and Philosophy 9: 117–49.
—— (1997) The Dynamics of Focus Structure. Cambridge: Cambridge University
Press.
—— (1999) ‘Focus structure theory and intonation’, Language and Speech 42(2–3):
209–27.
—— (2003) ‘The syntax, phonology and interpretation of the information structure
primitives topic and focus’, talk presented at GLOW workshop: Information struc-
ture in generative theory vs. pragmatics, The University of Lund, Sweden.
—— and Lappin, S. (1983) ‘Dominance and extraction: A reply to A. Grosu’, Theor-
etical Linguistics 10: 81–96.
—— and Rapoport, T. R. (to appear) The Atoms of Meaning: Interpreting Verb
Projections. Ben Gurion University.
Escudero, P. and Boersma, P. (2003) ‘Modelling the perceptual development of
phonological contrasts with optimality theory and the gradual learning algorithm’,
in S. Arunachalam, E. Kaiser, and A. Williams (eds), Proceedings of the 25th Annual
Penn Linguistics Colloquium. Penn Working Papers in Linguistics 8(1): 71–85.
Escudero, P. and Boersma, P. (2004) ‘Bridging the gap between L2 speech perception
research and phonological theory’, Studies in Second Language Acquisition 26: 551–85.
Everaert, M. (1986) The Syntax of Reflexivization. Dordrecht: Foris.
Fanselow, G. (1988) ‘Aufspaltung von NP und das Problem der ‘‘freien’’ Wortstellung’,
Linguistische Berichte 114: 91–113.
—— (2000) ‘Optimal exceptions’, in B. Stiebels and D. Wunderlich (eds), Lexicon in
Focus. Berlin: Akademie Verlag, pp. 173–209.
—— (2004) ‘The MLC and derivational economy’, in A. Stepanov, G. Fanselow, and
R. Vogel (eds), Minimality Effects in Syntax. Berlin: Mouton de Gruyter.
—— and Ćavar, D. (2002) ‘Distributed deletion’, in A. Alexiadou (ed.), Theoretical
Approaches to Universals. Amsterdam: Benjamins, pp. 65–107.
——, Kliegl, R., and Schlesewsky, M. (1999) ‘Processing difficulty and principles of
grammar’, in S. Kemper and R. Kliegl (eds), Constraints on Language. Aging,
Grammar, and Memory. Boston: Kluwer, pp. 171–201.
——, Kliegl, R., and Schlesewsky, M. (in preparation) ‘Syntactic variation in German
Wh-questions’, to appear in Linguistic Variation Yearbook 2005.
Fasold, R. (1991) ‘The quiet demise of variable rules’, American Speech 66: 3–21.
Featherston, S. (2004) ‘The decathlon model of empirical syntax’, in S. Kepser and
M. Reis (eds), Linguistic Evidence. Berlin: Mouton de Gruyter.
—— (2005) ‘Universals and grammaticality: Wh-constraints in German and English’,
Linguistics 43(4).
—— (to appear) ‘Universals and the counter-example model: Evidence from Wh-
constraints in German’, MS, University of Tübingen.
Felser, C., Clahsen, H., and Münte, T. (2003) ‘Storage and integration in the process-
ing of filler-gap dependencies: An ERP study of topicalization and Wh-movement
in German’, Brain and Language 87: 345–54.
——, Roberts, L., Marinis, T., and Gross, R. (2003) ‘The processing of ambiguous
sentences by first and second language learners of English’, Applied Psycholinguistics
24: 453–89.
Ferreira, F. and Clifton, C. (1986) ‘The independence of syntactic processing’, Journal
of Memory and Language 25: 348–68.
Féry, C. (1993) German Intonational Patterns. Tübingen: Niemeyer.
—— (2005) ‘Laute und leise Prosodie’, in H. Blühdorn (ed.), Text-Verstehen. Gram-
matik und darüber hinaus. 41. IDS-Jahrbuch 2005. Berlin: Mouton De Gruyter,
pp. 162–81.
—— and Samek-Lodovici, V. (2006) ‘Focus projection and prosodic prominence in
nested foci’, Language 82(1): 131–50.
Fiebach, C., Schlesewsky, M., and Friederici, A. (2002) ‘Separating syntactic memory
costs and syntactic integration costs during parsing: The processing of German
Wh-questions’, Journal of Memory and Language 47: 250–72.
——, Schlesewsky, M., Bornkessel, I., and Friederici, A. D. (2004) ‘Distinct neural
correlates of legal and illegal word order variations in German: How can fMRI
inform cognitive models of sentence processing?’, in M. Carreiras and C. Clifton, Jr.
(eds), The On-line Study of Sentence Comprehension. New York: Psychology Press,
pp. 357–70.
Filiaci, F. (2003) ‘The Acquisition of Null and Overt Subjects by English-Near-Native
Speakers of Italian’, M.Sc. thesis, University of Edinburgh.
Fischer, S. (2004) ‘Optimal binding’, Natural Language and Linguistic Theory 22:
481–526.
Flemming, E. (2001) ‘Scalar and categorical phenomena in a unified model of
phonetics and phonology’, Phonology 18: 7–44.
Fodor, J. D. (1998) ‘Learning to parse?’, Journal of Psycholinguistic Research 27: 285–319.
—— (2002a) ‘Prosodic disambiguation in silent reading’, in M. Hirotani (ed.),
Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguistic
Society, pp. 113–37.
—— (2002b) ‘Psycholinguistics cannot escape prosody’, Proceedings of the Speech
Prosody 2002 Conference, Aix-en-Provence, pp. 83–8.
—— and Frazier, L. (1978) ‘The sausage machine: A new two-stage parsing model’,
Cognition 6: 291–325.
Ford, M., Bresnan, J., and Kaplan, R. M. (1982) ‘A competence-based theory of
syntactic closure’, in J. Bresnan (ed.), The Mental Representation of Grammatical
Relations, Cambridge, MA: MIT Press, pp. 727–96.
Francis, N., Kucera, H., and Mackie, A. (1982) Frequency Analysis of English Usage:
Lexicon and Grammar. Boston: Houghton Mifflin.
Frazier, L. (1978) ‘On Comprehending Sentences: Syntactic Parsing Strategies’, Ph.D.
thesis, University of Connecticut.
—— (1987) ‘Syntactic processing: Evidence from Dutch’, Natural Language and
Linguistic Theory 5: 519–59.
—— and Clifton, C. (1996) Construal. Cambridge, MA: MIT Press.
—— and d’Arcais, G. F. (1989) ‘Filler-driven parsing: A study of gap-filling in Dutch’,
Journal of Memory and Language 28: 331–44.
—— and Rayner, K. (1987) ‘Resolution of syntactic category ambiguities: Eye move-
ments in parsing lexically ambiguous sentences’, Journal of Memory and Language
26: 505–26.
Frieda, E. M., Walley, A. C., Flege, J. E., and Sloane, M. E. (2000) ‘Adults’ perception
and production of the English vowel /i/’, Journal of Speech, Language, and Hearing
Research 43: 129–43.
Friederici, A. D. (2002) ‘Towards a neural basis of auditory sentence processing’,
Trends in Cognitive Sciences 6: 78–84.
—— and Mecklinger, A. (1996) ‘Syntactic parsing as revealed by brain responses: First
pass and second pass parsing processes’, Journal of Psycholinguistic Research 25:
157–76.
Frisch, S., Schlesewsky, M., Saddy, D., and Alpermann, A. (2001) ‘Why syntactic ambi-
guity is costly after all. Reading time and ERP evidence’, AMLaP Saarbrücken 2001.
——, Schlesewsky, M., Saddy, D., and Alpermann, A. (2002) ‘The P600 as an indica-
tor of syntactic ambiguity’, Cognition 85: B83–B92.
Frisch, S. A. (1996) ‘Similarity and Frequency in Phonology’, Ph.D. thesis, North-
western University.
—— (2000) ‘Temporally organized lexical representations as phonological units’, in
M. B. Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in
Laboratory Phonology V. Cambridge: Cambridge University Press, pp. 283–98.
—— (2004) ‘Language processing and OCP effects’, in B. Hayes, R. Kirchner, and D.
Steriade (eds), Phonetically-Based Phonology. Cambridge: Cambridge University
Press, pp. 346–71.
—— and Zawaydeh, B. A. (2001) ‘The psychological reality of OCP-place in Arabic’,
Language 77: 91–106.
——, Broe, M., and Pierrehumbert, J. (1997) ‘Similarity and phonotactics in Arabic’.
MS, Indiana University and Northwestern University.
——, Large, N. R., and Pisoni, D. B. (2000) ‘Perception of wordlikeness: Effects of
segment probability and length on the processing of nonwords’, Journal of Memory
and Language 42: 481–96.
——, Large, N., Zawaydeh, B., and Pisoni, D. (2001) ‘Emergent phonotactic general-
izations’, in J. L. Bybee and P. Hopper (eds), Frequency and the Emergence of
Linguistic Structure, Amsterdam: John Benjamins, pp. 159–80.
——, Pierrehumbert, J. B., and Broe, M. B. (2004) ‘Similarity avoidance and the
OCP’, Natural Language and Linguistic Theory 22: 179–228.
Ganong, W. F. III (1980) ‘Phonetic categorization in auditory word perception’,
Journal of Experimental Psychology: Human Perception and Performance 6: 110–25.
Garnsey, S. M. (1993) ‘Event-related brain potentials in the study of language: An
introduction’, Language and Cognitive Processes 8: 337–56.
——, Pearlmutter, N. J., Myers, E. M., and Lotocky, M. A. (1997) ‘The contributions
of verb bias and plausibility to the comprehension of temporarily ambiguous
sentences’, Journal of Memory and Language 37: 58–93.
Gathercole, S. and Baddeley, A. (1993) Working Memory and Language. Essays in
Cognitive Psychology. Hove: Lawrence Erlbaum.
Gervain, J. (2002) ‘Linguistic Methodology and Microvariation in Language: The Case
of Operator-Raising in Hungarian’, unpublished M.A. thesis, Dept. of Linguistics,
University of Szeged.
Gibson, E. (1998) ‘Linguistic complexity: Locality of syntactic dependencies’, Cogni-
tion 68: 1–76.
—— and Pearlmutter, N. J. (1998) ‘Constraints on sentence comprehension’, Trends in
Cognitive Sciences 2: 262–8.
—— and Schütze, C. T. (1999) ‘Disambiguation preferences in noun phrase conjunc-
tion do not mirror corpus frequency’, Journal of Memory and Language 40: 263–79.
Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., and Hickok, G. (1996a) ‘Cross-
linguistic attachment preferences: Evidence from English and Spanish’, Cognition
59: 23–59.
——, Schütze, C. T., and Salomon, A. (1996b) ‘The Relationship between the Fre-
quency and the Processing Complexity of Linguistic Structure’, Journal of Psycho-
linguistic Research 25: 59–92.
Godfrey, J. J., Holliman, E. C., and McDaniel, J. (1992) ‘SWITCHBOARD: Telephone
speech corpus for research and development’, in IEEE International Conference on
Acoustics, Speech and Signal Processing 1992, pp. 517–20.
Goldinger, S. D. (2000) ‘The role of perceptual episodes in lexical processing’, in
A. Cutler, J. M. McQueen, and R. Zondervan (eds), Proceedings of SWAP (Spoken
Word Access Processes), Nijmegen: Max Planck Institute for Psycholinguistics,
pp. 155–9.
Goldstone, R., Medin, D., and Gentner, D. (1991) ‘Relational similarity and the non-
independence of features in similarity judgments’, Cognitive Psychology 23: 222–62.
Goldwater, S. and Johnson, M. (2003) ‘Learning OT constraint rankings using a
maximum entropy model’, in J. Spenader, A. Eriksson, and Ö. Dahl (eds), Proceed-
ings of the Stockholm Workshop on Variation within Optimality Theory, Stockholm
University, pp. 111–20.
Grabe, E. (1998) ‘Comparative Intonational Phonology: English and German’, Ph.D.
thesis, Universiteit Nijmegen.
Greenberg, J. H. (1963) ‘Some universals of grammar with particular reference to the
order of meaningful elements’, in J. H. Greenberg (ed.), Universals of Language,
Cambridge, MA: MIT Press, pp. 73–113.
—— and Jenkins, J. J. (1964) ‘Studies in the psychological correlates of the sound
system of American English’, Word 20: 157–77.
Grice, M., Baumann, S., and Benzmüller, R. (2003) ‘German intonation in autoseg-
mental phonology’, in S.-A. Jun (ed.), Prosodic Typology. Oxford: Oxford University
Press.
Grimshaw, J. (1997) ‘Projection, heads, and optimality’, Linguistic Inquiry 28:
373–422.
—— and Samek-Lodovici, V. (1998) ‘Optimal subjects and subject universals’, in P.
Barbosa, D. Fox, P. Hangstrom, M. McGinnis, and D. Pesetsky (eds), Is the Best
Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press,
pp. 193–219.
Grodzinsky, Y. and Reinhart, T. (1993) ‘The innateness of binding and coreference’,
Linguistic Inquiry 24: 69–101.
Groos, A. and H. van Riemsdijk (1981) ‘Matching effects with free relatives:
A parameter of core grammar’, in A. Belletti, L. Brandi, and L. Rizzi (eds), Theories
of Markedness in Generative Grammar. Pisa: Scuola Normale Superiore di Pisa,
pp. 171–216.
Grosjean, F. (1980) ‘Spoken word recognition processes and the Gating paradigm’,
Perception and Psychophysics 28: 267–83.
Guenther, F. H. and Gjaja, M. N. (1996) ‘The perceptual magnet effect as an emergent
property of neural map formation’, JASA 100: 1111–21.
Gürel, A. (2004) ‘Selectivity in L2-induced L1 attrition: A psycholinguistic account’,
Journal of Neurolinguistics 17: 53–78.
Gussenhoven, C. (1983) ‘Testing the reality of focus domains’, Language and Speech 26:
61–80.
—— (1984) On the Grammar and Semantics of Sentence Accents. Dordrecht: Foris.
—— (1992) ‘Sentence accents and argument structure’ in I. Roca (ed.), Thematic
Structure. Its Role in Grammar. Berlin: Foris, pp. 79–106.
—— (2004) The Phonology of Tone and Intonation. Cambridge: Cambridge University
Press.
Guy, G. (1980) ‘Variation in the group and the individual’, in W. Labov (ed.), Locating
Language in Time and Space. New York: Academic Press, pp. 1–36.
—— (1981) Linguistic Variation in Brazilian Portuguese: Aspects of the Phonology,
Syntax, and Language History, Ph.D. thesis, University of Pennsylvania.
—— and Boberg, C. (1997) ‘Inherent variability and the obligatory contour principle’,
Language Variation and Change 9: 149–64.
Hahn, U. and Bailey, T. M. (2003) ‘What makes words sound similar?’, MS, Cardiff
University.
Hahne, A. and Friederici, A. (2001) ‘Processing a second language: Late learners’
comprehension mechanisms as revealed by event-related brain potentials’, Bilin-
gualism: Language and Cognition 4: 123–41.
Haider, H. (1993) Deutsche Syntax, generativ. Tübingen: Narr.
—— and Rosengren, I. (2003) ‘Scrambling: Nontriggered chain formation in OV
languages’, Journal of Germanic Linguistics 15: 203–67.
Hale, J. (2001) ‘A probabilistic Earley parser as a psycholinguistic model’ in Proceed-
ings of the 2nd Conference of the North American Chapter of the Association for
Computational Linguistics, Pittsburgh, PA.
—— (2003) ‘The information conveyed by words’, Journal of Psycholinguistic Research
32: 101–22.
Hale, M. and Reiss, C. (1998) ‘Formal and empirical arguments concerning phono-
logical acquisition’, Linguistic Inquiry 29: 656–83.
—— and Reiss, C. (2000) ‘Phonology as cognition’, in N. Burton-Roberts, P. Carr,
and G. Docherty (eds), Phonological Knowledge: Conceptual and Empirical Issues.
New York: Oxford University Press, pp. 161–84.
Hankamer, J. (1973) ‘Unacceptable ambiguity’, Linguistic Inquiry 4: 17–68.
Haspelmath, M. (1999) ‘Optimality and diachronic adaptation’, Zeitschrift für Sprach-
wissenschaft 18: 180–205.
Hawkins, J. A. (1983) Word Order Universals. New York: Academic Press.
—— (1990) ‘A parsing theory of word order universals’, Linguistic Inquiry 21: 223–61.
—— (1994) A Performance Theory of Order and Constituency. Cambridge: Cambridge
University Press.
Hawkins, J. A. (1998) ‘Some issues in a performance theory of word order’, in
A. Siewierska (ed.), Constituent Order in the Languages of Europe. Berlin: de
Gruyter, pp. 729–81.
—— (1999) ‘Processing complexity and filler-gap dependencies across grammars’,
Language 75: 244–85.
—— (2000) ‘The relative ordering of prepositional phrases in English: Going beyond
manner–place–time’, Language Variation and Change 11: 231–66.
—— (2001) ‘Why are categories adjacent?’, Journal of Linguistics 37: 1–34.
—— (2003) ‘Efficiency and complexity in Grammars: Three general principles’, in
M. Polinsky and J. Moore (eds), Explanation in Linguistics. Stanford University,
Stanford: CSLI Publications.
—— (2004) Efficiency and Complexity in Grammars. Oxford: Oxford University
Press.
Hay, J. (2003) Causes and Consequences of Word Structure. New York: Routledge.
——, Pierrehumbert, J. B., and Beckman, M. B. (2004) ‘Speech perception, wellformed-
ness, and the statistics of the lexicon’, in J. Local, R. Ogden, and R. Temple (eds), Papers
in Laboratory Phonology VI. Cambridge: Cambridge University Press, pp. 58–74.
Hayes, B. (1997) ‘Four rules of inference for ranking argumentation’, MS, Department
of Linguistics, University of California, Los Angeles.
—— (1999) ‘Phonological restructuring in Yidiɲ and its theoretical consequences’, in
B. Hermans and M. van Oostendorp (eds), The Derivational Residue in Phonological
Optimality Theory. Amsterdam: John Benjamins, pp. 175–205.
—— and Lahiri, A. (1991) ‘Bengali intonational phonology’, Natural Language and
Linguistic Theory 9: 47–96.
—— and MacEachern, M. (1998) ‘Quatrain form in English folk verse’, Language 74:
473–507.
——, Kirchner, R., and Steriade, D. (2004) Phonetically Based Phonology. Cambridge:
Cambridge University Press.
Hemforth, B. (1993) Kognitives Parsing: Repräsentation und Verarbeitung gramma-
tischen Wissens. Sankt Augustin: Infix.
Henry, A. (1995) Belfast English and Standard English: Dialect Variation and Parameter
Setting. Oxford: Oxford University Press.
—— (2002) ‘Variation and syntactic theory’, in J. K. Chambers, P. Trudgill, and N.
Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford:
Blackwell, pp. 267–83.
Hill, A. A. (1961) ‘Grammaticality’, Word 17: 1–10.
Hindle, D. and Rooth, M. (1993) ‘Structural ambiguity and lexical relations’, Compu-
tational Linguistics 19: 103–20.
Hirose, Y. (2003) ‘Recycling prosodic boundaries’, Journal of Psycholinguistic Research
32: 167–95.
Hirotani, M. (2003) ‘Prosodic effects on the interpretation of Japanese Wh-questions’,
Alonso-Ovalle, L. (ed.), University of Massachusetts Occasional Papers in Linguistics
27—On Semantic Processing, pp. 117–37.
Hirschberg, J. and Avesani, C. (2000) ‘Prosodic disambiguation in English and Italian’,
in A. Botinis (ed.), Intonation. Dordrecht: Kluwer Academic Publishers.
Höhle, T. (1982) ‘Explikation für ‘‘normale Betonung’’ und ‘‘normale Wortstellung’’ ’,
in Abraham, W. (ed.), Satzglieder im Deutschen: Vorschläge zur syntaktischen,
semantischen und pragmatischen Fundierung. Tübingen: Narr, pp. 75–153.
—— (1991) ‘On reconstruction and coordination’, in H. Haider and K. Netter (eds),
Representation and Derivation in the Theory of Grammar. Dordrecht: Reidel.
—— (1992) ‘Über Verum-Fokus im Deutschen’, in J. Jacobs (ed.), Informationsstruk-
tur und Grammatik (= Linguistische Berichte, Sonderheft 4): 112–41.
Hooper, J. B. (1976) ‘Word frequency in lexical diffusion and the source of morpho-
phonological change’, in W. Christie (ed.), Current Progress in Historical Linguistics.
Amsterdam: North Holland, pp. 95–105.
—— (1978) ‘Constraints on schwa deletion in American English’, in J. Fisiak (ed.),
Recent Developments in Historical Phonology. The Hague: Mouton, pp. 183–207.
Hruska, C., Alter, K., Steinhauer, K., and Steube, A. (2001) ‘Misleading dialogues:
Human’s brain reaction to prosodic information’, in C. Cave, I. Guaitella, and S.
Santi (eds), Orality and Gesture. Interactions et comportements multimodaux dans la
communication. Paris: L’Harmattan, pp. 425–30.
Huang, C.-T. J. (1982) ‘Logical Relations in Chinese and the Theory of Grammar’,
Ph.D. thesis, Massachusetts Institute of Technology.
Hume, E. and Johnson, K. (2001) The Role of Speech Perception in Phonology. San
Diego: Academic Press.
Hyman, L. (1976) ‘Phonologization’, in A. Juilland (ed.), Linguistic Studies Offered to
Joseph Greenberg. Vol. 2, Saratoga: Anma Libri, pp. 407–18.
Ishihara, S. (2002) ‘Invisible but audible Wh-scope marking: Wh-constructions and
deaccenting in Japanese’, Proceedings of the Twenty-first West Coast Conference on
Formal Linguistics, pp. 180–93.
—— (2003) ‘Intonation and Interface Conditions’, Ph.D. thesis, Massachusetts,
Institute of Technology.
Jackendoff, R. (1977) X-bar Syntax: A Study of Phrase Structure. Cambridge, MA: MIT
Press.
—— (1992) ‘Mme. Tussaud meets the Binding Theory’, Natural Language and Lin-
guistic Theory 10: 1–33.
Jacobs, J. (1997) ‘I-Topikalisierung’, Linguistische Berichte 168: 91–133.
Jäger, G. (2004) ‘Maximum entropy models and stochastic optimality theory’, MS,
University of Potsdam.
—— and Rosenbach, A. (2004) ‘The winner takes it all—almost: Cumulativity in
grammatical variation’, MS, University of Potsdam and University of Düsseldorf.
Jakubowicz, C. (2000) ‘Functional categories in (ab)normal language acquisition’, MS,
Université Paris 5.
Jannedy, S. (2003) ‘Hat Patterns and Double Peaks: The Phonetics and Psycholinguis-
tics of Broad versus Late Narrow versus Double Focus Intonations’, Ph.D. thesis,
The Ohio State University.
Jayaseelan, K. A. (1997) ‘Anaphors as pronouns’, Studia Linguistica 51(2): 186–234.
Johnson, K. (1997) ‘Speech perception without speaker normalization: An exemplar
model’, in K. Johnson and J. W. Mullennix (eds), Talker Variability in Speech
Processing. San Diego: Academic Press, pp. 145–65.
——, Flemming, E., and Wright, R. (1993) ‘The hyperspace effect: Phonetic targets are
hyperarticulated’, Language 69: 505–28.
Jongeneel, J. (1884) Dorpsspraak van Heerle vormenleer en woordenboek. Heerlen: Van
Hooren 1980.
Josefsson, G. (2003) ‘Four myths about object shift in Swedish—and the truth . . .’, in
L.-O. Delsing (ed.), Grammar in Focus II. Festschrift for Christer Platzack. Lund:
Wallin + Dalholm, pp. 199–207.
Jun, S. (ed.) (2005) Prosodic Typology. The Phonology of Intonation and Phrasing.
Oxford: Oxford University Press.
Jurafsky, D. (1996) ‘A probabilistic model of lexical and syntactic access and disam-
biguation’, Cognitive Science 20: 137–94.
—— (2003) ‘Probabilistic modeling in psycholinguistics: Linguistic comprehension
and production’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic Linguistics.
Cambridge, MA: MIT Press, pp. 39–95.
——, Bell, A., Gregory, M., and Raymond, W. D. (2001) ‘Probabilistic relations
between words: Evidence from reduction in lexical production’, in J. Bybee and
P. Hopper (eds), Frequency and the Emergence of Linguistic Structure. Amsterdam:
John Benjamins.
Jusczyk, P. (1997) The Discovery of Spoken Language. Cambridge, MA: MIT Press.
Just, M. A. and Carpenter, P. A. (1987) The Psychology of Reading and Language
Comprehension. Boston, London, Sydney, Toronto: Allyn and Bacon Inc.
Kay, P. and McDaniel, C. K. (1978) ‘The linguistic significance of the meanings of basic
color terms’, Language 54: 610–46.
Kayne, R. (1981) ‘On certain differences between French and English’, Linguistic
Inquiry 12: 349–71.
Kayne, R. S. (1984) Connectedness and Binary Branching. Dordrecht: Foris.
Keating, P. (1985) ‘Universal phonetics and the organization of grammars’, in
V. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged. Orlando:
Academic Press, pp. 115–32.
—— (1988) ‘The window model of coarticulation: Articulatory evidence’, UCLA
Working Papers in Phonetics 69: 3–29.
—— (1996) ‘The phonology–phonetics interface’, UCLA Working Papers in Phonetics
92: 45–60.
Keller, F. (1997) ‘Extraction, gradedness, and optimality’, in A. Dimitriadis, L. Siegel,
C. Surek-Clark, and A. Williams (eds), Proceedings of the 21st Annual Penn
Linguistics Colloquium, no. 4.2 in Penn Working Papers in Linguistics, Department
of Linguistics, University of Pennsylvania, pp. 169–86.
—— (2000a) ‘Evaluating competition-based models of word order’, in
L. R. Gleitman and A. K. Joshi (eds), Proceedings of the 22nd Annual Conference
of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates,
pp. 747–52.
Keller, F. (2000b) ‘Gradience in Grammar: Experimental and Computational Aspects
of Degrees of Grammaticality’, Ph.D. thesis, University of Edinburgh.
—— (2001) ‘Experimental evidence for constraint competition in gapping construc-
tions’, in G. Müller and W. Sternefeld (eds), ‘Competition in syntax’, No. 49 in
Studies in Generative Grammar. Berlin and New York: de Gruyter, pp. 211–48.
—— (2003) ‘A probabilistic parser as a model of global processing difficulty’, in
R. Alterman and D. Kirsh (eds), Proceedings of the 25th Annual Conference of the
Cognitive Science Society. Boston, pp. 646–51.
—— and Alexopoulou, T. (2001) ‘Phonology competes with syntax: Experimental
evidence for the interaction of word order and accent placement in the realization
of information structure’, Cognition 79: 301–72.
—— and Asudeh, A. (2001) ‘Constraints on linguistic coreference: Structural vs.
pragmatic factors’, in J. D. Moore and K. Stenning (eds), Proceedings of the 23rd
Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum
Associates, pp. 483–8.
—— and Asudeh, A. (2002) ‘Probabilistic learning algorithms and optimality theory’,
Linguistic Inquiry 33(2): 225–44.
—— and Sorace, A. (2003) ‘Gradient auxiliary selection and impersonal passiviza-
tion in German: An experimental investigation’, Journal of Linguistics 39(1):
57–108.
Kellerman, E. (1987) ‘Aspects of Transferability in Second Language Acquisition’, Ph.D.
thesis, University of Nijmegen.
Kempen, G. and Harbusch, K. (2004) ‘Why grammaticality judgments allow more
word order freedom than speaking and writing: A corpus study into argument
linearization in the midfield of German subordinate clauses’, in S. Kepser and
M. Reis (eds), Linguistic Evidence. Berlin: Mouton de Gruyter.
Kenstowicz, M. (1994) Phonology in Generative Grammar. Cambridge: Blackwell.
—— (2002) ‘Paradigmatic uniformity and contrast’, MIT Working Papers in Linguis-
tics 42: Phonological Answers (and Their Corresponding Questions).
—— and Kisseberth, C. (1977) Topics in Phonological Theory. New York: Academic
Press.
Kessels, M. J. H. (1883) Der koehp va Hehle. Ee Hehlisj vertelsel. Heerlen: Uitgeverij
Winants.
Kessler, B. and Treiman, R. (1997) ‘Syllable structure and the distribution of phonemes
in English syllables’, Journal of Memory and Language 37: 295–311.
Kilborn, K. (1992) ‘On-line integration of grammatical information in a second
language’, in R. J. Harris (ed.), Cognitive Processing in Bilinguals. Amsterdam:
Elsevier Science.
Kim, A.-R. (2000) ‘A Derivational Quantification of ‘‘WH-Phrase’’ ’, Ph.D. thesis,
Indiana University.
Kimball, J. (1973) ‘Seven principles of surface structure parsing in natural language’, Cognition 2: 15–47.
King, J. and Just, M. (1991) ‘Individual differences in syntactic processing: The role of
working memory’, Journal of Memory and Language 30: 580–602.
—— and Kutas, M. (1995) ‘Who did what and when? Using clause- and word-related
ERPs to monitor working memory usage in reading’, Journal of Cognitive Neuro-
science 7: 378–97.
Kingston, J. and Diehl, R. (1994) ‘Phonetic knowledge’, Language 70: 419–54.
Kiparsky, P. (1968) ‘How abstract is phonology?’, Bloomington: Indiana University
Linguistics Club. Reprinted 1982 in P. Kiparsky, Explanation in Phonology.
Dordrecht: Foris, pp. 119–63.
—— (1982) ‘Lexical morphology and phonology’, in The Linguistics Society of Korea
(ed.), Linguistics in the Morning Calm: Selected Papers from SICOL-1981. Seoul:
Hanshin Publishing Co., pp. 3–91.
Kirby, S. (1999) Function, Selection and Innateness: The Emergence of Language
Universals. Oxford: Oxford University Press.
Kirchner, R. (1998) ‘Lenition in Phonetically-Based Optimality Theory’, Ph.D. thesis,
UCLA.
—— (2001) An Effort-Based Approach to Consonant Lenition. New York, NY: Routle-
dge. (1998 UCLA Ph.D. thesis).
Kiss, K. (1998) ‘Identificational focus versus information focus’, Language 74: 245–73.
Kitagawa, Y. (2006) ‘Wh-Scope puzzles’, Proceedings of the Thirty-fifth Annual Meeting
of the North-Eastern Linguistic Society. Connecticut, 22 October 2004.
—— and Fodor, J. D. (2003) ‘Default prosody explains neglected syntactic analyses of
Japanese’, Japanese/Korean Linguistics 12: 267–79.
Kjelgaard, M. M. and Speer, S. R. (1999) ‘Prosodic facilitation and interference in the
resolution of temporary syntactic closure ambiguity’, Journal of Memory and
Language 40: 153–94.
Klatt, D. (1987) ‘Review of text-to-speech conversion for English’, JASA 82(3): 737–93.
Klein, W. and Perdue, C. (1997) ‘The basic variety (or: Couldn’t natural languages be
much simpler?)’, Second Language Research 13: 301–47.
Kluender, R. and Kutas, M. (1993) ‘Bridging the gap: Evidence from ERPs on
the processing of unbounded dependencies’, Journal of Cognitive Neuroscience
5: 196–214.
Kolb, H.-P. (1997) ‘Is I-language a generative procedure?’, in ‘GB-blues: Two essays’,
No. 110 in Arbeitspapiere des Sonderforschungsbereichs 340, Tübingen: University of
Tübingen, pp. 1–14.
Krahmer, E. and Swerts, M. (2001) ‘On the alleged existence of contrastive accents’,
Speech Communication 34: 391–405.
Krems, J. (1984) Erwartungsgeleitete Sprachverarbeitung. Frankfurt/Main: Lang.
Krifka, M. (1998) ‘Scope inversion under the rise–fall contour in German’, Linguistic
Inquiry 29: 75–112.
Kroch, A. S. (1989) ‘Reflexes of grammar in patterns of language change’, Language Variation and Change 1: 199–244.
Kruskal, J. (1999) ‘An overview of sequence comparison’, in D. Sankoff and J. Kruskal
(eds), Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison, 2nd edn. Reading, MA: Addison-Wesley, pp. 1–44.
Kubozono, H. (1993) The Organization of Japanese Prosody. Tokyo: Kurosio Publishers.
Kuhl, P. K. (1991) ‘Human adults and human infants show a ‘‘perceptual magnet effect’’ for the prototypes of speech categories, monkeys do not’, Perception and
Psychophysics 50: 93–107.
Kuno, S. (1973) The Structure of the Japanese Language. Cambridge, MA: MIT Press.
Kutas, M. and Hillyard, S. A. (1980) ‘Reading senseless sentences: Brain potentials
reflect semantic incongruity’, Science 207: 203–5.
—— and Van Petten, C. (1994) ‘Psycholinguistics electrified: Event-related brain
potential investigations’, in M. Gernsbacher (ed.), Handbook of Psycholinguistics.
New York: Academic Press, pp. 83–143.
Kvam, S. (1983) Linksverschachtelung im Deutschen und Norwegischen. Tübingen:
Niemeyer.
Labov, W. (1969) ‘Contraction, deletion, and inherent variability of the English
copula’, Language 45: 715–62.
—— (1972) Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
—— (1980) Locating Language in Time and Space. New York: Academic Press.
—— (1994) Principles of Linguistic Change. Internal Factors. Oxford: Blackwell.
—— (1996) ‘When intuitions fail’, Papers from the 32nd Regional Meeting of the
Chicago Linguistics Society 32: 76–106.
——, Cohen, P., Robins, C., and Lewis, J. (1968) A Study of the Non-Standard English
of Negro and Puerto Rican Speakers in New York City. Philadelphia: US Regional
Survey.
Lacerda, F. (1995) ‘The perceptual magnet effect: An emergent consequence of exem-
plar-based phonetic memory’, Proceedings of the XIIIth International Congress of
Phonetic Sciences 2: 140–7.
—— (1997) ‘Distributed memory representations generate the perceptual-magnet
effect’, MS, Institute of Linguistics, Stockholm University.
Ladd, D. R. (2003) ‘ ‘‘Distinctive phones’’ in surface representation’, written version of
paper presented at LabPhon 8, to appear in the Proceedings.
—— and Morton, R. (1987) ‘The perception of intonational emphasis: Continuous or
categorical?’ Journal of Phonetics 25: 313–42.
Lambrecht, K. (1994) Information Structure and Sentence Form. Cambridge:
Cambridge University Press.
Lapata, M., Keller, F., and Schulte im Walde, S. (2001) ‘Verb frame frequency as a
predictor of verb bias’, Journal of Psycholinguistic Research 30: 419–35.
Lardiere, D. (1998) ‘Case and tense in the ‘fossilized’ steady state’, Second Language
Research 14: 1–26.
Lasnik, H. and Saito, M. (1992) Move a, Conditions on its Application and Output.
Cambridge, MA: MIT Press.
Lavoie, L. (1996) ‘Lexical frequency effects on the duration of schwa-resonant sequences
in American English’, poster presented at LabPhon 5, Chicago, IL, June 1996.
—— (2002) ‘Some influences on the realization of for and four in American English’,
JIPA 32: 175–202.
Legendre, G. (in press) ‘Optimizing auxiliary selection in Romance’, to appear in
R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary Selection. Amsterdam:
John Benjamins.
Legendre, G. and Sorace, A. (2003) ‘Auxiliaires et intransitivité en français et dans les
langues romanes’, in D. Godard (ed.), Les langues romanes; problèmes de la phrase
simple. Paris: Editions du CNRS, pp. 185–234.
——, Miyata, Y., and Smolensky, P. (1990a) ‘Harmonic grammar—a formal multi-
level connectionist theory of linguistic well-formedness: Theoretical foundations’,
in Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA: Lawrence Erlbaum, pp. 388–95.
——, Miyata, Y., and Smolensky, P. (1990b) ‘Harmonic grammar—A formal multi-
level connectionist theory of linguistic well-formedness: An application’, in Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA: Lawrence Erlbaum, pp. 884–91.
——, Miyata, Y., and Smolensky, P. (1991) ‘Unifying syntactic and semantic ap-
proaches to unaccusativity: A connectionist approach’, Proceedings of the 17th
Annual Meeting of the Berkeley Linguistic Society. Berkeley: Berkeley Linguistic
Society, pp. 156–67.
Lehiste, I. (1973) ‘Phonetic disambiguation of syntactic ambiguity’, Glossa 7: 107–22.
Lehmann, W. P. (1978) ‘The great underlying ground-plans’, in W. P. Lehmann (ed.),
Syntactic Typology: Studies in the Phenomenology of Language. Austin: University of
Texas Press, pp. 3–55.
Leonini, C. and Belletti, A. (2004) ‘Subject inversion in L2 Italian’, in S. Foster-Cohen,
M. Sharwood Smith, A. Sorace, and M. Ota (eds), Eurosla Workbook 4: 95–118.
Levin, B. and Rappaport Hovav, M. (1995) Unaccusativity at the Syntax–Semantics Interface. Cambridge, MA: MIT Press.
Levin, B. and Rappaport Hovav, M. (1996) ‘From lexical semantics to argument
realization’, MS, Northwestern University and Bar-Ilan University.
Lewis, R. (1993) ‘An Architecturally-Based Theory of Human Sentence Comprehen-
sion’, Ph.D. thesis, Carnegie Mellon University.
Li, C. and Thompson, S. (1976) ‘Subject and topic: A new typology’, in C. Li (ed.),
Subject and Topic. New York: Academic Press, pp. 457–89.
Liberman, M. and Pierrehumbert, J. (1984) ‘Intonational invariance under changes in
pitch range and length’, in M. Aronoff and R. T. Oehrle (eds), Language Sound
Structure. Cambridge, MA: MIT Press, pp. 157–233.
Liceras, J., Valenzuela, E., and Díaz, L. (1999) ‘L1/L2 Spanish grammars and the Pragmatic Deficit Hypothesis’, Second Language Research 15: 161–90.
Lindblom, B. (1990) ‘Models of phonetic variation and selection’, PERILUS 11: 65–100.
Lodge, M. (1981) Magnitude Scaling: Quantitative Measurement of Opinions. Beverley
Hills, CA: Sage Publications.
Lohse, B., Hawkins, J. A., and Wasow, T. (2004) ‘Domain minimization in English
verb-particle constructions’, Language 80: 238–61.
Lovrič, N. (2003) ‘Implicit Prosody in Silent Reading: Relative Clause Attachment in
Croatian’, Ph.D. thesis, CUNY Graduate Center.
Luce, P. A. and Large, N. (2001) ‘Phonotactics, neighborhood density, and entropy in
spoken word recognition’, Language and Cognitive Processes 16: 565–81.
—— and Pisoni, D. B. (1998) ‘Recognizing spoken words: The neighborhood activa-
tion model’, Ear and Hearing 19: 1–36.
MacBride, A. (2004) ‘A Constraint-Based Approach to Morphology’, Ph.D. thesis,
UCLA, http://www.linguistics.ucla.edu/faciliti/diss.htm.
MacDonald, M. C. (1993) ‘The interaction of lexical and syntactic ambiguity’, Journal
of Memory and Language 32: 692–715.
—— (1994) ‘Probabilistic constraints and syntactic ambiguity resolution’, Language
and Cognitive Processes 9: 157–201.
——, Pearlmutter, N. J., and Seidenberg, M. S. (1994) ‘Lexical nature of syntactic
ambiguity resolution’, Psychological Review 101: 676–703.
Manning, C. D. (2003) ‘Probabilistic syntax’, in R. Bod, J. Hay, and S. Jannedy (eds),
Probabilistic Linguistics. Cambridge, MA: MIT Press, pp. 289–341.
—— and Schütze, H. (1999) Foundations of Statistical Natural Language Processing.
Cambridge, MA: MIT Press.
Marantz, A. (2000) Class Notes. Cambridge, MA: MIT Press.
Marcus, M. P. (1980) A Theory of Syntactic Recognition for Natural Language.
Cambridge, MA: MIT Press.
Marks, L. E. (1965) ‘Psychological Investigations of Semi-Grammaticalness in English’, Ph.D. thesis, Harvard University.
—— (1967) ‘Judgments of grammaticalness of some English sentences and semi-
sentences’, American Journal of Psychology 20: 196–204.
Marslen-Wilson, W. (1987) ‘Functional parallelism in spoken word-recognition’, in
U. Frauenfelder and L. Tyler (eds), Spoken Word Recognition. Cambridge, MA: MIT
Press, pp. 71–102.
Mateu, J. (2003) ‘Digitizing the syntax–semantics interface. The case of aux-selection
in Italian and French’, MS, Universitat Autònoma de Barcelona.
Matzke, M., Mai, H., Nager, W., Rüsseler, J., and Münte, T. F. (2002) ‘The cost of freedom:
An ERP-study of non-canonical sentences’, Clinical Neurophysiology 113: 844–52.
Maynell, L. A. (1999) ‘Effect of pitch accent placement on resolving relative clause
ambiguity in English’, The 12th Annual CUNY Conference on Human Sentence
Processing (Poster). New York, March.
McCarthy, J. (2003) ‘OT constraints are categorical’, Phonology 20: 75–138.
—— and Prince, A. (1993) ‘Generalized alignment’, in G. Booij and J. van Marle (eds),
Morphology Yearbook 1993. Dordrecht: Kluwer, pp. 79–153.
—— and Prince, A. (1995) ‘Faithfulness and reduplicative identity’, in J. Beckman, L. W. Dickey, and S. Urbanczyk (eds), Papers in Optimality Theory. University of
Massachusetts Occasional Papers 18. Amherst, MA: Graduate Linguistic Student
Association, pp. 249–384.
McDaniel, D. and Cowart, W. (1999) ‘Experimental evidence of a minimalist account
of English resumptive pronouns’, Cognition 70: B15–B24.
McElree, B. (1993) ‘The locus of lexical preference effects in sentence comprehension’,
Journal of Memory and Language 32: 536–71.
—— (2000) ‘Sentence comprehension is mediated by content-addressable memory
structures’, Journal of Psycholinguistic Research 29: 111–23.
—— and Griffith, T. (1995) ‘Syntactic and thematic processing in sentence compre-
hension’, Journal of Experimental Psychology: Learning, Memory and Cognition 21:
134–57.
—— and Griffith, T. (1998) ‘Structural and lexical effects on filling gaps during
sentence processing: A time-course analysis’, Journal of Experimental Psychology:
Learning, Memory, and Cognition 24: 432–60.
—— and Nordlie, J. (1999) ‘Literal and figurative interpretations are computed in
equal time’, Psychonomic Bulletin and Review 6: 486–94.
——, Foraker, S., and Dyer, L. (2003) ‘Memory structures that subserve sentence
comprehension’, Journal of Memory and Language 48: 67–91.
McQueen, J. M. and Cutler, A. (1997) ‘Cognitive processes in speech perception’, in
W. J. Hardcastle and J. Laver (eds), The Handbook of Phonetic Sciences. Oxford:
Blackwell, pp. 566–85.
McRae, K., Spivey-Knowlton, M. J., and Tanenhaus, M. K. (1998) ‘Modeling the
influence of thematic fit (and other constraints) in on-line sentence comprehen-
sion’, Journal of Memory and Language 38: 283–312.
Mecklinger, A., Schriefers, H., Steinhauer, K., and Friederici, A. D. (1995) ‘Processing
relative clauses varying on syntactic and semantic dimensions: An analysis with
event-related potentials’, Memory and Cognition 23: 477–94.
Mendoza-Denton, N., Hay, J., and Jannedy, S. (2003) ‘Probabilistic sociolinguistics:
Beyond variable rules’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic
Linguistics. Cambridge, MA: MIT Press.
Meng, M. (1998) Kognitive Sprachverarbeitung. Rekonstruktion syntaktischer Strukturen
beim Lesen. Wiesbaden: Deutscher Universitätsverlag.
—— and Bader, M. (2000a) ‘Mode of disambiguation and garden path strength:
An investigation of subject–object ambiguities in German’, Language and Speech
43: 43–74.
—— and Bader, M. (2000b) ‘Ungrammaticality detection and garden path strength:
evidence for serial parsing’, Language and Cognitive Processes 15(6): 615–66.
Mikheev, A. (1997) ‘Automatic rule induction for unknown-word guessing’, Compu-
tational Linguistics 23: 405–23.
Mitchell, D. C., Cuetos, F., Corley, M. M. B., and Brysbaert, M. (1996) ‘Exposure based
models of human parsing: Evidence for the use of coarse-grained (nonlexical)
statistical records’, Journal of Psycholinguistic Research 24: 469–88.
Montrul, S. (2002) ‘Incomplete acquisition and attrition of Spanish tense/aspect
distinctions in adult bilinguals’, Bilingualism: Language and Cognition 5: 39–68.
—— (2004) ‘Subject and object expression in Spanish heritage speakers: A case of
morphosyntactic convergence’, Bilingualism: Language and Cognition 7(2): 125–42.
—— (in press) ‘Second language acquisition and first language loss in adult early
bilinguals: Exploring some differences and similarities’, to appear in Second Lan-
guage Research.
Moreton, E. (2002) ‘Structural constraints in the perception of English stop-sonorant
clusters’, Cognition 84: 55–71.
Morgan, J. L. (1973) ‘Sentence fragments and the notion ‘‘Sentence’’ ’, in B. B. Kachru,
R. B. Lees, Y. Malkiel, A. Pietrangeli, and S. Saporta (eds), Issues in Linguistics:
Papers in Honor of Henry and Renee Kahane. Urbana, IL: University of Illinois Press.
Müller, G. (1999) ‘Optimality, markedness, and word order in German’, Linguistics
37(5): 777–818.
—— (2005) ‘Subanalyse verbaler Flexionsmarker’, MS, Universität Leipzig.
Müller, H. M., King, J. W., and Kutas, M. (1997) ‘Event-related potentials elicited by
spoken relative clauses’, Cognitive Brain Research 5: 193–203.
Müller, N. and Hulk, A. (2001) ‘Crosslinguistic influence in bilingual language acqui-
sition: Italian and French as recipient languages’, Bilingualism: Language and
Cognition 4: 1–22.
Müller, S. (2004) ‘Complex NPs, subjacency, and extraposition’, Snippets, Issue 8.
Munson, B. (2001) ‘Phonological pattern frequency and speech production in adults
and children’, Journal of Speech, Language, and Hearing Research 44: 778–92.
Murphy, V. A. (1997) ‘The effect of modality on a grammaticality judgment task’,
Second Language Research 13: 34–65.
Muysken, P. (2000) Bilingual Speech. A Typology of Code-Mixing. Cambridge:
Cambridge University Press.
Nagy, N. and Reynolds, B. (1997) ‘Optimality theory and variable word-final deletion
in Faetar’, Language Variation and Change 9: 37–55.
Narayanan, S. and Jurafsky, D. (1998) ‘Bayesian models of human sentence processing’,
in M. A. Gernsbacher and S. J. Derry (eds), Proceedings of the 20th Annual
Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum
Associates.
Nespor, M. and Vogel, I. (1986) Prosodic Phonology. Dordrecht: Foris.
Ney, H., Essen, U., and Kneser, R. (1994) ‘On structuring probabilistic dependencies in
stochastic language modeling’, Computer Speech and Language 8: 1–28.
Nooteboom, S. G. and Kruyt, J. G. (1987) ‘Accents, focus distribution, and the
perceived distribution of given and new information: An experiment’, Journal of
the Acoustical Society of America 82: 1512–24.
Nusbaum, H. C., Pisoni, D. B., and Davis, C. K. (1984) ‘Sizing up the Hoosier mental
lexicon: Measuring the familiarity of 20,000 words’, Research on Speech Perception,
Progress Report 10. Bloomington: Speech Research Laboratory, Indiana University,
pp. 357–76.
Ohala, J. J. (1992) ‘Alternatives to the sonority hierarchy for explaining the shape of
morphemes’, Papers from the Parasession on the Syllable. Chicago: Chicago Linguistic Society, pp. 319–38.
Osterhout, L. and Holcomb, P. J. (1992) ‘Event-related brain potentials elicited by
syntactic anomaly’, Journal of Memory and Language 31: 785–804.
Paolillo, J. C. (1997) ‘Sinhala diglossia: Discrete or continuous variation?’, Language in
Society 26(2): 269–96.
Paradis, J. and Navarro, S. (2003) ‘Subject realization and cross-linguistic interference
in the bilingual acquisition of Spanish and English: What is the role of input?’,
Journal of Child Language 30: 1–23.
Pechmann, T., Uszkoreit, H., Engelkamp, J., and Zerbst, D. (1996) ‘Wortstellung im
deutschen Mittelfeld. Linguistische Theorie und psycholinguistische Evidenz’, in
C. Habel, S. Kanngießer, and G. Rickheit (eds), Perspektiven der Kognitiven Linguistik.
Modelle und Methoden. Opladen: Westdeutscher Verlag, pp. 257–99.
Perlmutter, D. (1978) ‘Impersonal passives and the unaccusative hypothesis’, Berkeley
Linguistic Society 4: 126–70.
Pesetsky, D. (1987) ‘Wh-in situ: Movement and unselective binding’, in E. Reuland and
A. T. Meulen (eds), The Representation of (in)Definiteness. Cambridge, MA: MIT
Press, pp. 98–129.
Peters, J. (2005) Intonatorische Variation im Deutschen. Studien zu ausgewählten
Regionalsprachen. Habilitation thesis. University of Potsdam.
Pickering, M. J., Traxler, M. J., and Crocker, M. W. (2000) ‘Ambiguity resolution in
sentence processing: Evidence against frequency-based accounts’, Journal of Mem-
ory and Language 43: 447–75.
Pierrehumbert, J. (1980) ‘The Phonology and Phonetics of English Intonation’, Ph.D.
thesis, MIT.
—— (1994) ‘Syllable structure and word structure: A study of triconsonantal clusters
in English’, in P. Keating (ed.), Phonological Structure and Phonetic Form: Papers in
Laboratory Phonology III. Cambridge: Cambridge University Press, pp. 168–88.
—— (2001) ‘Stochastic phonology’, GLOT International 5(6): 195–207.
—— (2002) ‘Word-specific phonetics’, in C. Gussenhoven and N. Warner (eds),
Laboratory Phonology 7. Berlin: Mouton de Gruyter, pp. 101–39.
—— (2003) ‘Probabilistic phonology: Discrimination and robustness’, in R. Bod,
J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press,
pp. 177–228.
—— and Steele, S. (1989) ‘Categories of tonal alignment in English’, Phonetica 46:
181–96.
——, Beckman, M. E. and Ladd, D. R. (2000) ‘Conceptual foundations in phonology
as a laboratory science’, in N. Burton-Roberts, P. Carr, and G. Docherty (eds),
Phonological Knowledge: Conceptual and Empirical Issues. New York: Oxford Uni-
versity Press, pp. 273–304.
Pinker, S. (1999) Words and Rules: The Ingredients of Language. New York: Basic Books.
—— and Prince, A. (1988) ‘On language and connectionism: Analysis of a parallel
distributed processing model of language acquisition’, Cognition 28: 73–193.
Pitt, M. A. and McQueen, J. M. (1998) ‘Is compensation for coarticulation mediated
by the lexicon?’, Journal of Memory and Language 39: 347–70.
Pittner, K. (1991) ‘Freie Relativsätze und die Kasushierarchie’, in E. Feldbusch (ed.),
Neue Fragen der Linguistik. Tübingen: Niemeyer, pp. 341–7.
Polinsky, M. (1995) ‘American Russian: Language loss meets language acquisition’,
in W. Browne, E. Dornish, N. Kondrashova and D. Zec (eds), Annual Workshop on
Formal Approaches to Slavic Linguistics. Ann Arbor: Michigan Slavic Publications,
pp. 371–406.
Pollard, C. and Sag, I. A. (1987) Information-Based Syntax and Semantics, Vol. 1: Fundamentals. Stanford, CA: CSLI (CSLI Lecture Notes No. 13).
—— and Sag, I. A. (1992) ‘Anaphors in English and the scope of the binding theory’,
Linguistic Inquiry 23: 261–305.
Prasada, S. and Pinker, S. (1993) ‘Generalization of regular and irregular morpho-
logical patterns’, Language and Cognitive Processes 8: 1–56.
Prévost, P. and White, L. (2000) ‘Missing surface inflection or impairment in second
language acquisition? Evidence from tense and agreement’, Second Language
Research 16: 103–33.
Prince, A. and Smolensky, P. (1993) ‘Optimality theory: Constraint interaction in
generative grammar’. Technical Report TR-2, Rutgers University Center for Cogni-
tive Science. Published as Prince and Smolensky (2004).
—— and Smolensky, P. (1997) ‘Optimality: From neural networks to universal gram-
mar’, Science 275: 1604–10.
—— and Smolensky, P. (2004) Optimality Theory: Constraint Interaction in Generative Grammar. Oxford: Blackwell.
Pritchett, B. L. (1992) Grammatical Competence and Parsing Performance. Chicago:
University of Chicago Press.
Ramscar, M. (2002) ‘The role of meaning in inflection: Why the past tense does not
require a rule’, Cognitive Psychology 45: 45–94.
Randall, J. (in press) ‘Features and linking rules: A parametric account of auxiliary
selection’, to appear in R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary
Selection. Amsterdam: John Benjamins.
Rayner, K., Carlson, M., and Frazier, L. (1983) ‘Interaction of syntax and semantics
during sentence processing: Eye movements in the analysis of semantically biased
sentences’, Journal of Verbal Learning and Verbal Behavior 22: 358–74.
Reinhart, T. (1981) ‘Pragmatics and linguistics: An analysis of sentence topics’, Philo-
sophica 27: 53–94.
—— (1996) ‘Interface economy—focus and markedness’, in C. Wilder, H. M. Gaertner, and M. Bierwisch (eds), The Role of Economy Principles in Linguistic Theory. Berlin: Akademie Verlag.
—— (2000a) ‘Strategies of anaphora resolution’, in H. Bennis, M. Everaert, and
E. Reuland (eds), Interface Strategies. Amsterdam: Royal Netherlands Academy of
Arts and Sciences, pp. 295–324.
—— (2000b) ‘The theta system: Syntactic realization of verbal concepts’, OTS working
paper in linguistics 00, 01/TL, Utrecht Institute of Linguistics, OTS.
—— (2003) ‘The theta system—an overview’, Theoretical Linguistics 28(3).
—— and Reuland, E. (1991) ‘Anaphors and logophors: An argument structure per-
spective’, in J. Koster and E. Reuland (eds), Long Distance Anaphora. Cambridge:
Cambridge University Press, pp. 283–321.
—— and Reuland, E. (1993) ‘Reflexivity’, Linguistic Inquiry 24: 657–720.
Reuland, E. (2000) ‘The fine structure of grammar: Anaphoric relations’, in
Z. Frajzyngier and T. Curl (eds), Reflexives: Forms and Functions. Amsterdam:
John Benjamins, pp. 1–40.
—— (2001) ‘Primitives of binding’, Linguistic Inquiry 32(2): 439–92.
—— (2003) ‘State-of-the-article. Anaphoric dependencies: A window into the archi-
tecture of the language system’, GLOT International 7(1/2): 2–25.
—— and Reinhart, T. (1995) ‘Pronouns, anaphors and case’, in H. Haider, S. Olsen,
and S. Vikner (eds), Studies in Comparative Germanic Syntax. Dordrecht: Kluwer,
pp. 241–69.
Reynolds, B. (1994) ‘Variation and Phonological Theory’, Ph.D. thesis, University of
Pennsylvania.
Riehl, A. (2003a) ‘American English flapping: Evidence against paradigm uniformity
with phonetic features’, Proceedings of the 15th International Congress of Phonetic
Sciences, 2753–6.
—— (2003b) ‘American English flapping: Perceptual and acoustic evidence against
paradigm uniformity with phonetic features’, Working Papers of the Cornell Phonetics
Laboratory 15: 271–337.
Riemsdijk, H. van (1989) ‘Movement and regeneration’, in P. Benincà (ed.), Dialectal
Variation and the Theory of Grammar. Dordrecht: Foris, pp. 105–36.
Riezler, S. (1996) ‘Quantitative constraint logic programming for weighted grammar
applications’, in Proceedings of the 1st Conference on Logical Aspects of Computational
Linguistics. Berlin: Springer.
Ringen, C. and Heinamaki, O. (1999) ‘Variation in Finnish vowel harmony: An OT
account’, Natural Language and Linguistic Theory 17: 303–37.
Rizzi, L. (1982) Italian Syntax. Dordrecht: Foris.
—— (2002) ‘On the grammatical basis of language development: A case study’, MS,
University of Siena.
—— (2004) The Structure of CP and IP. The Cartography of Syntactic Structures,
Volume 2. Oxford: Oxford University Press.
Robins, R. H. (1957) ‘Vowel nasality in Sundanese: A phonological and grammatical study’, Studies in Linguistics (special volume of the Philological Society). Oxford:
Basil Blackwell, pp. 87–103.
Röder, B., Schicke, T., Stock, O., Heberer, G., and Rösler, F. (2000) ‘Word order effects
in German sentences and German pseudo-word sentences’, Zeitschrift für Sprache
und Kognition 19: 31–7.
——, Stock, O., Neville, H., Bien, S., and Rösler, F. (2002) ‘Brain activation modu-
lated by the comprehension of normal and pseudo-word sentences of different
processing demands: A functional magnetic resonance imaging study’, NeuroImage
15: 1003–14.
Rohdenburg, G. (1996) ‘Cognitive complexity and grammatical explicitness in
English’, Cognitive Linguistics 7: 149–82.
Roland, D. and Jurafsky, D. (1998) ‘How verb subcategorization frequencies are
affected by corpus choice’, in Proceedings of the 17th International Conference on
Computational Linguistics and 36th Annual Meeting of the Association for Compu-
tational Linguistics. Montréal, pp. 1122–28.
—— and Jurafsky, D. (2002) ‘Verb sense and verb subcategorization probabilities’, in
P. Merlo and S. Stevenson (eds), The Lexical Basis of Sentence Processing: Formal,
Computational, and Experimental Issues. Amsterdam: John Benjamins, pp. 325–46.
Ross, J. R. (1971) ‘Variable Strength’, MS, MIT.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986a) ‘Learning internal
representations by error propagation’, in Parallel Distributed Processing: Explor-
ations in the Microstructure of Cognition, vol. 1: Foundations. Cambridge, MA:
MIT Press, pp. 318–62.
——, McClelland, J. L., and the PDP Research Group (1986b) Parallel Distributed
Processing. Explorations in the Microstructure of Cognition. Cambridge, MA: MIT
Press.
Russell, K. (1999) ‘MOT: Sketch of an optimality theoretic approach to morphology’,
MS, http://www.umanitoba.ca/linguistics/russell/.
Sabourin, L. (2003) ‘Grammatical Gender and Second Language Processing’, Ph.D.
thesis, University of Groningen.
Saito, M. (1985) ‘Some Asymmetries in Japanese and Their Theoretical Consequences’,
Ph.D. thesis, MIT.
—— (1989) ‘Scrambling as semantically vacuous A’-movement’, in M. Baltin and
A. Kroch (eds), Alternative Conceptions of Phrase Structure. Chicago: University of
Chicago Press.
Samuel, A. G. (1981) ‘The role of bottom-up confirmation in the phonemic-restor-
ation illusion’, Journal of Experimental Psychology: Human Perception and Perform-
ance 7: 1124–31.
Sankoff, D. and Labov, W. (1979) ‘On the uses of variable rules’, Language in Society
8: 189–222.
Sapir, E. and Hoijer, H. (1967) The Phonology and Morphology of the Navajo Language.
Berkeley: University of California Press.
Sarle, W. S. (1994) ‘Neural networks and statistical models’, in Proceedings of the 19th Annual SAS Users Group International Conference. Cary, NC: SAS Institute,
pp. 1538–50.
Schafer, A. J. (1997) ‘Prosodic Parsing: The Role of Prosody in Sentence Comprehen-
sion’, Ph.D. thesis, Amherst, MA: University of Massachusetts.
——, Carlson, K., Clifton, C., and Frazier, L. (2000) ‘Focus and the interpretation of
pitch accent: Disambiguating embedded questions’, Language and Speech 43: 75–105.
Schladt, M. (2000) ‘The typology and grammaticalization of reflexives’, in
Z. Frajzyngier and T. Curl (eds), Reflexives: Forms and Functions. Amsterdam:
John Benjamins.
Schlesewsky, M. and Bornkessel, I. (2003) ‘Ungrammaticality detection and garden
path strength: A commentary on Meng and Bader’s (2000) evidence for serial
parsing’, Language and Cognitive Processes 18: 299–311.
—— and Bornkessel, I. (2004) ‘On incremental interpretation: Degrees of meaning
accessed during sentence comprehension’, Lingua 114: 1213–34.
——, Fanselow, G., Kliegl, R., and Krems, J. (2000) ‘The subject-preference in the
processing of locally ambiguous Wh-questions in German’, in B. Hemforth and
L. Konieczny (eds), German Sentence Processing. Dordrecht: Kluwer, pp. 65–93.
——, Fanselow, G., and Frisch, S. (2003) ‘Case as a trigger for reanalysis—some
arguments from the processing of double case ungrammaticalities in German’,
MS, University of Potsdam.
Schmerling, S. (1976) Aspects of English Sentence Stress. Austin: University of Texas
Press.
Schmitz, K. (2003) ‘Subject omission and realization in German–Italian bilingual
children’, MS, University of Hamburg.
Schriefers, H., Friederici, A. D., and Kühn, K. (1995) ‘The processing of locally
ambiguous relative clauses in German’, Journal of Memory and Language
34: 499–520.
Schütze, C. T. (1996) The Empirical Base of Linguistics. Grammaticality Judgments and
Linguistic Methodology. Chicago: The University of Chicago Press.
—— and Gibson, E. (1999) ‘Argumenthood and English prepositional phrase attach-
ment’, Journal of Memory and Language 40: 409–31.
Schwarzschild, R. (1999) ‘GIVENness, AvoidF and other constraints on the placement
of accent’, Natural Language Semantics 7: 141–77.
Scobbie, J. (2004) ‘Flexibility in the face of incompatible English VOT systems’, written
version of Lab Phon 8 paper.
Selkirk, E. (1984) Phonology and Syntax. The Relation between Sound and Structure.
Cambridge, MA: MIT Press.
—— (1995) ‘Sentence prosody: Intonation, stress and phrasing’, in J. Goldsmith (ed.),
Handbook of Phonological Theory. Cambridge, MA: Blackwell, pp. 550–69.
—— (2000) ‘The interaction of constraints on prosodic phrasing’, in M. Horne (ed.),
Prosody: Theory and Experiment. Amsterdam: Kluwer, pp. 231–62.
Sendlmeier, W. F. (1987) ‘Auditive judgments of word similarity’, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 40: 538–46.
Serratrice, L. (2004) ‘Anaphoric interpretation of null and overt pronominal subjects
in bilingual and monolingual Italian acquisition’, MS, University of Manchester.
——, Sorace, A., and Paoli, S. (2004) ‘Transfer at the syntax–pragmatics interface:
Subjects and objects in Italian–English bilingual and monolingual acquisition’,
Bilingualism: Language and Cognition 7: 183–207.
Sevald, C. and Dell, G. S. (1994) ‘The sequential cueing effect in speech production’,
Cognition 53: 91–127.
Skousen, R., Lonsdale, D., and Parkinson, D. B. (2002) Analogical Modeling: An
Exemplar-Based Approach to Language. Amsterdam: John Benjamins.
Skut, W., Krenn, B., Brants, T., and Uszkoreit, H. (1997) ‘An annotation scheme for
free word order languages’, in Proceedings of the 5th Conference on Applied Natural
Language Processing. Washington, DC.
Smith, J. (2000) ‘Positional faithfulness and learnability in optimality theory’, in
R. Daly and A. Riehl (eds), Proceedings of ESCOL 99. Ithaca: CLC Publications,
pp. 203–14.
Smith, O. W., Koutstaa, C. W., and Kepke, A. N. (1969) ‘Relation of language distance
to learning to pronounce Greenberg and Jenkins List-1 CCVCs’, Perceptual and
Motor Skills 29: 187.
Smolensky, P., Legendre, G., and Miyata, Y. (1992) ‘Principles for an integrated
connectionist/symbolic theory of higher cognition’, Report CU-CS-600–92, Com-
puter Science Department, University of Colorado at Boulder.
——, Legendre, G., and Miyata, Y. (1993) ‘Integrating connectionist and symbolic
computation for the theory of language’, Current Science 64: 381–91.
Snyder, W. (2000) ‘An experimental investigation of syntactic satiation effects’,
Linguistic Inquiry 31: 575–82.
Sorace, A. (1992) ‘Lexical Conditions on Syntactic Knowledge: Auxiliary Selection in
Native and Non-Native Grammars of Italian’, Ph.D. thesis, University of Edinburgh.
—— (1993a) ‘Incomplete vs. divergent representations of unaccusativity in non-
native grammars of Italian’, Second Language Research 9: 22–47.
—— (1993b) ‘Unaccusativity and auxiliary choice in non-native grammars of Italian
and French: asymmetries and predictable indeterminacy’, Journal of French Lan-
guage Studies 3: 71–93.
—— (1995) ‘Acquiring argument structures in a second language: The unaccusative/
unergative distinction’, in L. Eubank, L. Selinker, and M. Sharwood Smith (eds),
The Current State of the Interlanguage. Amsterdam: John Benjamins, pp. 153–75.
—— (1996) ‘The use of acceptability judgments in second language acquisition
research’, in T. Bhatia and W. Ritchie (eds), Handbook of Second Language Acqui-
sition. San Diego: Academic Press.
—— (2000a) ‘Syntactic optionality in L2 acquisition’, Second Language Research 16:
93–102.
Sorace, A. (2000b) ‘Gradients in auxiliary selection with intransitive verbs’, Language
76: 859–90.
—— (2003a) ‘Gradedness at the lexicon–syntax interface: Evidence from auxiliary
selection and implications for unaccusativity’, in A. Alexiadou, E. Anagnostopou-
lou, and M. Everaert (eds), The Unaccusativity Puzzle: Explorations in the Syntax–
Lexicon Interface. Oxford: Oxford University Press, pp. 243–68.
—— (2003b) ‘Near-nativeness’, in M. Long and C. Doughty (eds), Handbook of
Second Language Acquisition. Oxford: Blackwell, pp. 130–51.
—— (2005) ‘Syntactic optionality at interfaces’, in L. Cornips and K. Corrigan (eds),
Syntax and Variation: Reconciling the Biological and the Social. Amsterdam: John
Benjamins, pp. 46–111.
—— (in press) ‘Possible manifestations of ‘‘shallow processing’’ in advanced L2
speakers’, to appear in Applied Psycholinguistics.
—— and Keller, F. (2005) ‘Gradedness in linguistic data’, Lingua 115: 1497–1524.
Speer, S., Warren, P., and Schafer, A. (2003) ‘Intonation and sentence processing’, in
Proceedings of the International Congress of Phonetic Sciences 15. Barcelona: 95–106.
Sproat, R. and Fujimura, O. (1993) ‘Allophonic variation in English /l/ and its
implications for phonetic implementation’, Journal of Phonetics 21: 291–311.
Sprouse, R. A. and Vance, B. (1999) ‘An explanation for the decline of null pronouns in
certain Germanic and Romance languages’, in M. DeGraff (ed.), Language Creation
and Language Change: Creolization, Diachrony, and Development. Cambridge, MA:
MIT Press, pp. 257–84.
Stallings, L. M. (1998) ‘Evaluating Heaviness: Relative Weight in the Spoken Produc-
tion of Heavy-NP Shift’, Ph.D. thesis, University of Southern California.
——, MacDonald, M., and O’Seaghda, P. (1998) ‘Phrasal ordering constraints in
sentence production: Phrase length and verb disposition in heavy-NP shift’, Journal
of Memory and Language 39: 392–417.
Stechow, A. v. and Uhmann, S. (1986) ‘Some remarks on focus projection’, in
W. Abraham and S. de Meij (eds), Topic, Focus and Configurationality. Amsterdam/
Philadelphia: John Benjamins, pp. 295–320.
Steinhauer, K. (2000) ‘Hirnphysiologische Korrelate prosodischer Satzverarbeitung
bei gesprochener und geschriebener Sprache’, MPI series in cognitive neuroscience 18.
Steriade, D. (1990) ‘Gestures and autosegments: Comments on Browman and Gold-
stein’s Paper’, in J. Kingston and M. Beckman (eds), Papers in Laboratory Phonology
II: Between the Grammar and Physics in Speech. Cambridge: Cambridge University
Press, pp. 382–97.
—— (1995) ‘Positional neutralization’, unfinished MS, UCLA.
—— (2000) ‘Paradigm uniformity and the phonetics/phonology boundary’, in M.
Broe, and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in Labora-
tory Phonology V. Cambridge: Cambridge University Press, pp. 313–35.
—— (2001) ‘Directional asymmetries in place assimilation: A perceptual account’, in
E. Hume and K. Johnson (eds), The Role of Speech Perception in Phonology. New
York: Academic Press, pp. 219–50.
Sternefeld, W. (2001) ‘Grammatikalität und Sprachvermögen. Anmerkungen zum
Induktionsproblem in der Syntax’, in J. Bayer and C. Römer (eds), Von der Philo-
logie zur Grammatiktheorie: Peter Suchsland zum 65. Geburtstag. Tübingen:
Niemeyer, pp. 15–44.
Stevens, S. S. (1957) ‘On the psychophysical law’, Psychological Review 64: 153–81.
Stolcke, A. (1995) ‘An efficient probabilistic context-free parsing algorithm that
computes prefix probabilities’, Computational Linguistics 21: 165–201.
Stowell, T. and Beghelli, F. (1997) ‘Distributivity and negation’, in A. Szabolcsi (ed.),
Ways of Scope Taking. Dordrecht: Kluwer, pp. 71–107.
Strawson, P. F. (1964) ‘Identifying reference and truth-values’, Theoria 30: 96–118.
Sturt, P., Pickering, M. J., and Crocker, M. W. (1999) ‘Structural change and
reanalysis difficulty in language comprehension’, Journal of Memory and Language
40: 136–50.
——, Pickering, M. J., Scheepers, C., and Crocker, M. W. (2001) ‘The preservation of
structure in language comprehension: Is reanalysis the last resort?’ Journal of
Memory and Language 45: 283–307.
Suppes, P. (1970) ‘Probabilistic grammars’, Synthese 22: 95–116.
Swinney, D. A. (1979) ‘Lexical access during sentence comprehension: (Re)consid-
eration of context effects’, Journal of Verbal Learning and Verbal Behavior 18:
645–60.
Szendroi, K. (2004) ‘Focus and the interaction between syntax and pragmatics’, Lingua
114: 229–54.
Takahashi, D. (1993) ‘Movement of Wh-phrases in Japanese’, Natural Language and
Linguistic Theory 11: 655–78.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. (1995)
‘Integration of visual and linguistic information in spoken language comprehen-
sion’, Science 268: 1632–4.
——, Spivey-Knowlton, M. J., and Hanna, J. E. (2000) ‘Modelling discourse context
effects: A multiple constraints approach’, in M. Crocker, M. Pickering, and C.
Clifton (eds), Architectures and Mechanisms for Language Processing. Cambridge:
Cambridge University Press, pp. 90–118.
Taraban, R. and McClelland, J. L. (1988) ‘Constituent attachment and thematic role
assignment in sentence processing: Influences of content-based expectation’,
Journal of Memory and Language 27: 597–632.
Tesar, B. (1997) ‘An iterative strategy for learning metrical stress in optimality theory’,
in E. Hughes, M. Hughes, and A. Greenhill (eds), Proceedings of the 21st Annual
Boston University Conference on Language Development. Somerville, MA: Cascadilla,
pp. 615–26.
—— and Smolensky, P. (1998) ‘Learnability in optimality theory’, Linguistic Inquiry
29(2): 229–68.
—— and Smolensky, P. (2000) Learnability in Optimality Theory. Cambridge, MA:
MIT Press.
Thráinsson, H. (1991) ‘Long-distance reflexives and the typology of NPs’, in J. Koster
and E. Reuland (eds), Long-Distance Anaphora. Cambridge: Cambridge University
Press, pp. 49–76.
Timberlake, A. (1977) ‘Reanalysis and actualization in syntactic changes’, Linguistic
Inquiry 8: 141–77.
Timmermans, M., Schriefers, H., Dijkstra, T., and Haverkorth, M. (2004) ‘Disagree-
ment on agreement: Person agreement between coordinated subjects and verbs in
Dutch and German’, Linguistics 42: 905–29.
Tomlin, R. S. (1986) Basic Word Order: Functional Principles. London: Routledge
(Croom Helm).
Travis, L. (1984) ‘Parameters and Effects of Word Order Variation’, Ph.D. thesis,
Department of Linguistics, MIT.
—— (1989) ‘Parameters of phrase structure’, in M. R. Baltin, and A. S. Kroch (eds),
Alternative Conceptions of Phrase Structure. Chicago: The University of Chicago
Press.
Treisman, M. (1978) ‘Space or lexicon? The word frequency effect and the error
frequency effect’, Journal of Verbal Learning and Verbal Behavior 17: 37–59.
Truckenbrodt, H. (1999) ‘On the relation between syntactic phrases and phonological
phrases’, Linguistic Inquiry 30: 219–55.
Trueswell, J. C. (1996) ‘The role of lexical frequency in syntactic ambiguity resolution’,
Journal of Memory and Language 35: 566–85.
—— and Tanenhaus, M. K. (1994) ‘Toward a lexicalist framework for constraint-
based syntactic ambiguity resolution’, in C. Clifton, L. Frazier, and K. Rayner (eds),
Perspectives on Sentence Processing. Hillsdale, NJ: Lawrence Erlbaum Associates,
pp. 155–79.
——, Tanenhaus, M. K., and Kello, C. (1993) ‘Verb-specific constraints in sentence
processing: Separating effects of lexical preference from gardenpaths’, Journal of
Experimental Psychology: Learning, Memory, and Cognition 19: 528–53.
Tsimpli, I. and Sorace, A. (2005) ‘Differentiating ‘‘interfaces’’: L2 performance in
syntax/semantics and syntax/discourse phenomena’, MS, University of Thessaloniki
and University of Edinburgh.
——, Sorace, A., Heycock, C., and Filiaci, F. (2004) ‘First language attrition and
syntactic subjects: A study of Greek and Italian near-native speakers of English’,
International Journal of Bilingualism 8: 257–77.
Ueyama, A. (1998) ‘Two Types of Dependency’, Ph.D. thesis, University of Southern
California.
Vallduví, E. (1992) The Informational Component. New York: Garland.
Van Hoof, H. (1997) ‘On split topicalization and ellipsis’, Technical Report. 112,
Arbeitspapiere des Sonderforschungsbereichs 340, Tübingen.
Van Hout, A. (2000) ‘Event semantics in the lexicon–syntax interface: Verb frame
alternations in Dutch and their acquisition’, in C. Tenny and J. Pustejovsky (eds),
Events as Grammatical Objects. Stanford: CSLI, pp. 239–82.
Vennemann, T. (1974) ‘Theoretical word order studies: Results and problems’, Papiere
zur Linguistik 7: 5–25.
Vergnaud, J. R. and Zubizarreta, M. L. (1992) ‘The definite determiner and the
inalienable constructions in French and English’, Linguistic Inquiry 23: 592–652.
Vetter, H. J., Volovecky, J., and Howell, R. W. (1979) ‘Judgments of grammaticalness:
A partial replication and extension’, Journal of Psycholinguistic Research 8: 567–83.
Viterbi, A. J. (1967) ‘Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm’, IEEE Transactions on Information Theory 13: 260–9.
Vitevitch, M. and Luce, P. (1998) ‘When words compete: Levels of processing in
perception of spoken words’, Psychological Science 9: 325–9.
——, Luce, P., Charles-Luce, J., and Kemmerer, D. (1997) ‘Phonotactics and syllable
stress: Implications for the processing of spoken nonsense words’, Language and
Speech 40: 47–62.
Vitz, P. C. and Winkler, B. S. (1973) ‘Predicting judged similarity of sound of English
words’, Journal of Verbal Learning and Verbal Behavior 12: 373–88.
Vogel, R. (2001) ‘Case conflict in German free relative constructions. An optimality
theoretic treatment’, in G. Müller and W. Sternefeld (eds), ‘Competition in
syntax’, No. 49 in Studies in Generative Grammar. Berlin and New York: de
Gruyter, pp. 341–75.
—— (2002) ‘Free relative constructions in OT syntax’, in G. Fanselow and C. Féry
(eds), ‘Resolving conflicts in grammars: Optimality theory in syntax, morphology,
and phonology,’ in Linguistische Berichte Sonderheft 11, Hamburg: Helmut Buske
Verlag, pp. 119–62.
—— (2003a) ‘Remarks on the architecture of OT syntax’, in R. Blutner and H. Zeevat
(eds), Optimality Theory and Pragmatics. Houndmills, Basingstoke, Hampshire,
England: Palgrave Macmillan, pp. 211–27.
—— (2003b) ‘Surface matters. Case conflict in free relative constructions and case
theory’, in E. Brandner and H. Zinsmeister (eds), New Perspectives on Case Theory.
Stanford: CSLI Publications, pp. 269–99.
—— (2004) ‘Correspondence in OT syntax and minimal link effects’, in A. Stepanov,
G. Fanselow, and R. Vogel (eds), Minimality Effects in Syntax. Berlin: Mouton de
Gruyter, pp. 401–41.
—— and Frisch, S. (2003) ‘The resolution of case conflicts. A pilot study’, in S. Fischer,
R. van de Vijver, and R. Vogel (eds), Experimental Studies in Linguistics 1, vol. 21 of
Linguistics in Potsdam. Institute of Linguistics, Potsdam: University of Potsdam,
pp. 91–103.
—— and Zugck, M. (2003) ‘Counting markedness. A corpus investigation on Ger-
man free relative constructions’, in S. Fischer, R. van de Vijver, and R. Vogel (eds),
Experimental Studies in Linguistics 1, vol. 21 of Linguistics in Potsdam. Institute of
Linguistics, Potsdam: University of Potsdam, pp. 105–22.
——, Frisch, S., and Zugck, M. (in preparation) ‘Case matching. An empirical study.’
MS, University of Potsdam. To appear in Linguistics in Potsdam.
Warner, N., Jongman, A., Sereno, J., and Kemps, R. (2004) ‘Incomplete neutralization
and other sub-phonemic durational differences in production and perception:
Evidence from Dutch’, Journal of Phonetics 32: 251–76.
Warren, P., Grabe, E., and Nolan, F. (1995) ‘Prosody, phonology and parsing in closure
ambiguities’, Language and Cognitive Processes 10: 457–86.
Wasow, T. (1997) ‘Remarks on grammatical weight’, Language Variation and Change
9: 81–105.
—— (2002) Postverbal Behavior. Stanford: CSLI Publications.
Welby, P. (2003) ‘Effects of pitch accent position, type and status on focus projection’,
Language and Speech 46: 53–8.
White, L. (2003) Second Language Acquisition and Universal Grammar. Cambridge:
Cambridge University Press.
Wickelgren, W. A. (1977) ‘Speed-accuracy tradeoff and information processing
dynamics’, Acta Psychologica 41: 67–85.
Wiltschko, M. (1998) ‘Superiority in German’, in E. Curtis, J. Lyle, and G. Webster
(eds), WCCFL 16: The Proceedings of the Sixteenth West Coast Conference on Formal
Linguistics. Stanford: CSLI, pp. 431–45.
Withgott, M. (1983) ‘Segmental Evidence for Phonological Constituents’, Ph.D. thesis,
University of Texas, Austin.
Wright, R. (1996) ‘Consonant Clusters and Cue Preservation’, Ph.D. thesis, University
of California, Los Angeles.
Wunderlich, D. (1997) ‘Cause and the structure of verbs’, Linguistic Inquiry 28: 27–68.
—— (2003) ‘Optimal case patterns: German and Icelandic compared’, in E. Brandner
and H. Zinsmeister (eds), New Perspectives on Case Theory. Stanford: CSLI Publi-
cations, pp. 329–65.
Yamashita, H. (2002) ‘Scrambled sentences in Japanese: Linguistic properties and
motivations for production’, Text 22(4): 597–633.
—— and Chang, F. (2001) ‘ ‘‘Long before short’’ preference in the production of a
head-final language’, Cognition 81: B45–B55.
Young, R. W., Morgan Sr., W., and Midgette, S. (1992) Analytical Lexicon of Navajo.
Albuquerque: University of New Mexico Press.
Zec, D. (2002) ‘On the prosodic status of function words’, Working Papers of the
Cornell Phonetics Laboratory 14: 206–48.
Zribi-Hertz, A. (1989) ‘A-type binding and narrative point of view’, Language
65: 695–727.
Zsiga, E. (2000) ‘Phonetic alignment constraints: consonant overlap and palataliza-
tion in English and Russian’, Journal of Phonetics 28: 69–102.
Zue, V. and Laferriere, M. (1979) ‘Acoustic study of medial /t, d/ in American English’,
Journal of the Acoustical Society of America 66: 1039–50.
Zuraw, K. R. (2000) ‘Patterned Exceptions in Phonology’, Ph.D. thesis, University of
California, Los Angeles.
Index of Languages

Abun 78–9
Aguatec 78–9
Albanian 78–9
Amuesha 78–9
Arabic 9
Basque 223
Bulgarian 49
Chatino 78–9
Chinantec 78–9
Chontal 78–9
Chukchee 78–9
Coeur d’Alene 78–9
Croatian 338
Cuicatec 78–9
Dakota 78–9
Danish 317, 321–2, 333–5
Dutch 20, 49–51, 53, 59, 61, 63–8, 88–105, 108, 151, 321–2
English 2, 6–11, 13, 18, 26–8, 30, 32–4, 37–41, 49–54, 59–61, 63, 78–81, 92–3, 106–7, 112–15, 119–22, 149–52, 171, 185–6, 200–3, 207, 209–11, 222–5, 231–2, 259–60, 265, 284, 292, 321, 334, 336–7, 340, 349–53, 355–8
Finnish 75
French 27–30, 37, 39, 65–6, 68, 92–3, 110–11, 117, 119, 122, 281, 349
Frisian 53, 66–7
German 16, 20, 49, 67, 95, 108, 115, 125–8, 130, 146–8, 151–5, 161, 165–6, 242–3, 247, 249, 254–6, 261–4, 266–8, 283, 292–309, 312–15, 321, 332–4, 357
Greek 78–9, 118, 349
Hebrew 329–32, 334
Huichol 78–9
Hungarian 51, 78–9, 118
Icelandic 63
Ioway-Oto 78–9
Irish 97
Italian 18, 78–9, 106–8, 110–15, 117, 119–22
Japanese 13, 18–19, 207, 211–13, 217, 225, 338–48, 352–4, 357–8
Keresan 78–9
Khasi 78–9
Korean 345
Koryak 78–9
Lummi 259–60
Malayalam 60–1, 64
Mazatec 78–9
Navajo 11, 186, 191–3, 195–204
Norwegian 78–9
Osage 78–9
Otomi 78–9
Pame 78–9
Portuguese 78–9
Romanian 78–9
Russian 49, 115
Spanish 41, 111, 115, 121, 171, 231–2, 352
Sundanese 37
Swedish 247, 333
Tagalog 72
Takelma 78–9
Telugu 78–9
Terena 78–9
Thai 78–9
Totonaco 78–9
Tsou 78–9
Turkish 115
Wichita 78–9
Index of Subjects

adjacency hypothesis 209, 217–18, 220–2
alignment
  generally 189–90
  string alignment algorithm 190–1
allophony 39
ambiguous case marking 308–11
ambiguous verb form 302–8
anaphora 53, 56–8, 60–8
argument order variation (in German) 125–8
argument order permutation types
  generally 127–9
  pronoun ‘movement’ 127
  scrambling 127
  topicalization 127
  wh-movement 127
argument order reanalysis 130–8
argument structure theory
  constructional theory 108–9
  projectionist theory 108
  split intransitivity hierarchy (SIH) 109–11, 115–19, 122
attachment 231–2, 239
canonical binding theory 56–8
combination 208
confusability testing 76
constituent recognition domain (CRD) 210–11
constraints
  1to1 255, 263–4
  1to1 & S<O 263
  agreement constraint 296, 300
  articulatory constraints 170–2, 179–83
  auditory constraints 171
  avoid redundancy (*Red) 262–3, 266
  constraint ranking 173–82, 257–9, 262–4, 270–2
  constraint ranking value 194–7, 199, 201
  constraint violations 270–2
  cue constraints 170–1, 173–4, 176–80
  faithfulness constraints 170, 252–5, 258–9, 265
  floating constraint 255–6
  information structure constraints (IS) 317, 319–21
  junk constraints 195, 197–9
  markedness constraints 252–9
  morphology constraints 193
  realize case (RC) 253–4, 262–4
  realize case (relativized) (RCr) 255
  realize oblique (RO) 253–5
  sensorimotor constraints 170–1, 179
  singularity constraint 295–6, 298, 300
  sonority modulation constraint 79–81
  structural constraints 170, 173
  subject constraint 324–5, 327–30
  subject precedes object (S<O) 253–4, 262–3
  syllable structure constraints 2
contrast 33–4
correlative 255, 261–9
dative inalienable possession construction 92–7, 100–2, 104–5
dative object-experiencer verbs 131–8
deletion
  schwa deletion 37, 39–41
  t,d deletion 6–8
dependency 209
discontinuous NPs 295–302
disjunctive coordination 302–8
‘distance from English’ task 73–4
d-linking 317, 327, 329–35
domain 209
duration 40–1
Dutch Syntactic Atlas Project 88–90, 98, 102
early immediate constituents (EIC) 209, 211–13, 215–19, 221–3
Empty Category Principle (ECP) 317
emphatic prosody (EPD) 341–8, 350–1, 355
episodic theory, see exemplar theory
event-related brain potentials (ERP) 129–42
exemplar models 83
exemplar theory 8
f-(ocus)-structure theory 322–6, 328, 330–5
flapping 37–9
focus domains 319
focus projection 148
forward anaphora sentence 114
free relative clauses 250–1, 254, 258–9, 261–9
frequency-grammaticality gap 243–5
Gaussian Elimination 279
Gradual Learning Algorithm 10, 178, 186, 194–8, 202–4, 283
Greenbergian word order correlations 221–2
Harmony Grammar (HG) 280–2
human parsing routines 292
hyperarticulation 8
/i/ prototype effect 167, 172–9
i-(dentificational) dependency 324, 326–31, 333
IC-to-word ratio 210–11, 221
immediate constituents (IC) 209
implicit prosody hypothesis 337–8, 345
incrementality 235
intermediate speech repertoire 86, 88, 90, 93–101, 104–5
intonational contours 146, 149, 152–61
inverted perception 176–9
L1 attrition 111–15, 120, 122–3
L2 acquisition 111–15, 119–20, 122–3
learning model 187–91, 194, 197, 203
least square estimation (LSE) 279
lexical access 82
lexical distribution analysis 75
lexical domain (LD) 215–16, 218–19, 222
LF interface 117–18
linear optimality theory (LOT)
  constraint weight in LOT 273
  constraint ranking in LOT 273–6
  cumulativity in constraint violations 272–80
  generally 270–2
  grammar signature 273
  grammaticality 275–6
  harmony 274–6
  learning algorithm for LOT 277–9
  linearity hypothesis 273
  optimality 275–6
  ranking argument 277–9
  ranking hypothesis 272
  subset theorem 280
  violation profile 274
linguistic experience 228–33, 240, 245
long distance movement 312–15
long-distance scrambling 339–40, 344–7
magnitude estimation (ME) 109
manner (m) meaning-component 325
mapping approach 28–9, 31
markedness 246–7, 256–9, 266, 269
maximum entropy models 285–6
minimalist style grammar
  generally 54–6, 116
  PF-interface 55–6, 58
  CI-Interface 55–6, 58–9, 64
minimize domains (MiD) 207–9, 211, 215, 218–19, 222–6
Mittelfeld 126
morphophonemic alternations 36
nasalization 27–8, 30, 37, 39
neutralization 33
operator raising 51
optimality theory 56–7
paradigm uniformity effects 36–9
partial movement construction 314
passive voice 260–1, 265–6
‘perceptual magnet’ effect 183
performance-grammar correspondence hypothesis (PGCH) 222–5
phonologization 30
‘phonology and phonetics in parallel’ model 168
phonotactics 34–6
phrasal combination domain (PCD) 211, 213, 216, 219, 224
pitch accent 147–9, 157–61
probabilistic context free grammars (PCFG) 234–5, 237, 240–3
probabilistic models 228, 232–44, 248, 259
probabilistic optimality theory (POT) 282–5
processing difficulties 228, 292–4
pro-verb entailment test 214
redundancy 43–4, 264–5
reflexivity 56–68
scope inversion 160–4
scrambling 334–5
segment sequences 146
SELF-marking 59–64
sentence accent assignment rules (SAAR) 149–50
sentence length 49
sibilant harmony 186–7, 192–3, 199–204
similarity judgements 74–5
sonority 77–9
speed-accuracy trade-off procedure (SAT) 133, 135–7, 140–2
stochastic generalization 260
stochastic optimality theory (SOT) 223, 256, 259–61, 266, 269
structural ambiguity 311–15
subjacency 317–18, 339–41, 344
subject
  null subject 113–15, 120
  overt subject 113–15, 118, 120–1
  postverbal subject 113–17, 121
  preverbal subject 113–17
subject verb agreement 302–8
superiority 325–9, 332–5
syntactic processing mechanism 230
three-level grammar model 167–72
total domain differential (TDD) 219–20, 224–5
tuning hypothesis 231, 240
unaccusative hypothesis 107–11
unidimensional approach 29
unmarked prosodic structure (UPS) 147–8, 153, 158
velarization 39
verb entailment test 214
verb order variation 242
Vorfeld 126
VP-preposing 308–11
‘weight effect’ 209, 218, 220, 225
well-formedness judgements
  acceptability task 71–3
  wordlikeness task 72–4, 79–81
wide-scope negation 352, 356
‘wug-testing’ 75
Index of Names

Abney 241, 248, 286
Albright 11, 185, 189, 191, 200–3
Alexopoulou 109, 349
Altenberg 87
Altmann 228
Andersen 323
Anderson 244
Anttila 32, 70, 75, 193
Apoussidou 169
Ariel 225
Asudeh 241, 280, 283
Auer 86, 88
Avesani 351, 356
Ayers-Elan 146
Bach 200
Baddeley 52
Bader 127–8, 130–2, 292, 294–5, 357
Baertsch 2
Bailey 74, 76
Baltin 320
Barbier 85
Bard 86, 109, 137, 242, 349
Barnes 39
Bayes 235
Beckman 35–6, 44, 146
Beghelli 64
Belletti 112, 117, 121
Bentley 110
Benua 36–7
Berent 83
Berger 286
Berko 75
Bever 292
Bierwisch 131, 147
Bini 121
Birch 149–52, 160
Bishop 282
Blancquaert 92–3, 101
Blevins 29, 200
Boberg 284
Bod 32, 70, 83
Boersma 10–11, 168–70, 173–6, 178, 181, 186, 193–5, 198, 259, 261, 282–4
Boethke 250–1, 261
Bolinger 1, 326–7
Borer 108, 116
Bornkessel 126, 128, 133, 135–6, 165
Brants 233, 235, 237–8
Bresnan 16, 121, 223, 225, 259–60, 265–6
Briscoe 235
Broekhuis 92–4, 96
Browman 29
Brysbaert 230–2
Büring 161
Burnage 75
Burnard 230
Burton-Roberts 27
Burzio 107, 193
Bybee 6–7, 32, 34, 41–2, 70, 82–3, 185, 200
Carden 101
Cardinaletti 112
Carlson 337
Carpenter 52
Carroll 235, 242, 244
Cavar 295
Cedergren 6
Cennamo 110
Chang 212, 225
Charniak 237, 244
Chater 233, 240, 244
Chen 39
Chomsky 1, 5, 14, 20, 26, 45–6, 54–6, 58–9, 86, 116, 124, 187, 196, 215, 227, 292, 349
Christiansen 233
Cinque 147, 318
Clahsen 120
Clements 29
Clifton 149–52, 160, 239, 351
Cocker 270
Coetzee 32
Cohn 27–9, 37, 39, 41, 145
Coleman 71–2, 82
Coles 131, 139
Collins 237, 244
Corley 229–30, 237
Cornips 85, 88–94, 96–100
Corrigan 90
Cowart 109, 348, 358
Crocker 229–30, 232–3, 235, 237–8, 244
Cuetos 231–2, 352
Culy 241
Cutler 12, 169
d’Arcais 127
Davis 2
Deguchi 340–1, 343, 345
Dell 82, 83
Diehl 39
Diesing 333
Dingare 265–6
Dryer 221
Duffield 110
Duffy 229
Elman 232
Erteschik-Shir 319, 322, 324–5, 331
Escudero 169, 173–4
Eythórsson 110
Fanselow 131, 292–5, 314, 332–4
Fasold 6
Featherston 243–5, 251, 292, 294, 332
Felser 120, 293
Ferreira 239
Féry 147–8
Fiebach 125–6, 293
Filiaci 112, 121
Fischer 56
Flemming 29
Fodor 13, 232, 336, 338, 344, 346, 352
Francis 40, 230
Frazier 127, 232, 237, 295, 351
Frieda 167, 176, 183
Friederici 120, 130, 139
Frisch, S. A. 9, 32, 34, 70, 72, 74, 79, 82, 130, 145, 256
Frisch, S. 294
Fujimura 39
Ganong 171
Garnsey 139, 229–30
Gathercole 52
Gervain 51, 87
Gibson 209, 229, 239, 245
Gjaja 183
Godfrey 7
Godfrey 75
Goldinger 8
Goldstein 29
Goldstone 74
Goldwater 285–6
Grabe 147
Graben, beim 130
Greenberg 9, 73–4, 221–2
Grice 147
Griffith 140
Grimshaw 112
Grodzinsky 52
Groos 250, 254–5
Grosjean 229
Guenther 183
Gürel 115
Gussenhoven 146–7, 149–52, 160
Guy 6, 284
Hahn 74, 76
Haider 131
Hale 43, 200, 235, 238–9
Halle 5, 26, 187, 196
Hankamer 337
Harbusch 243–5
Harms 200
Haspelmath 226
Hawkins 209–14, 216–18, 221–3, 225–6
Hay 9, 34, 72, 80–2
Hayes 10–11, 29, 70, 147, 185–6, 189, 191, 194, 198, 200–3, 259, 277, 282–3
Heinamaki 70, 75
Hemforth 128, 130, 292
Henry 97
Hill 349, 357
Hillyard 140
Hindle 239
Hirose 357
Hirotani 341–2
Hirschberg 351, 356
Höhle 101
Hoijer 186–7, 192
Hooper 6, 40, 70, 77
Hout, van 116
Hruska 151, 160
Huang 339
Hulk 91, 115
Hume 27
Hyman 30
Ishihara 341
Jackendoff 60, 222
Jäger 284–6
Jakubowicz 118
Jayaseelan 60
Jenkins 9, 73–4
Johnson 8, 27, 83, 167–8, 172, 176–7, 181
Jongenburger 88, 100
Jongeneel 92
Josefsson 247
Jun 146
Jurafsky 7, 40–2, 230, 233–5, 237–8, 244
Just 126
Kaoru Horie 212–13
Kavitskaya 39
Kaye 6
Kayne 68, 328
Keating 26–8, 39
Keller 109, 110, 116, 240–3, 245, 252, 257–8, 270–2, 274, 276–7, 279–81, 283–6, 292, 349
Kellerman 116
Kempen 243–5
Kenstowicz 32, 36, 77
Kessels 92
Kessler 80, 82
Kilborn 120
Kim 345
Kimball 210
King 126, 293
Kingston 39
Kiparsky 6, 32, 36, 200–1
Kirby 225–6
Kirchner 7, 29, 170, 181
Kisseberth 32, 36
Kitagawa 336, 340–1, 343–6, 350
Kjelgaard 151
Klatt 28
Klein 116
Kluender 293
Kolb 249
Krahmer 151
Krems 128, 292
Krifka 160–2
Kroch 97
Kruyt 151
Kubozono 346
Kuhl 183
Kuno 338
Kutas 126, 139–40, 293
Kvam 312
Labov 2, 6–7, 32, 87
Lacerda 178, 183
Ladd 27, 33, 149
Laferriere 32
Lahiri 147
Lakoff 86
Lambrecht 323
Lapata 230
Lappin 319
Lardiere 119
Large 76
Lasnik 318
Lavoie 40–1
Legendre 110, 122, 183, 280–1
Lehiste 151
Lehmann 222
Leonini 117
Levin 108
Liceras 121
Lindblom 8
Lohse 210
Lovric 352
Luce 76, 82
MacBride 193
McCarthy 169
McClelland 239
McDaniel 6, 109
MacDonald 215, 230, 233, 237
MacEachern 70
McElree 140, 142, 165
McQueen 12, 76, 169
McRae 230, 244
Manning 214, 223, 225, 234, 240, 248–9, 265–6
Marantz 47
Marcus 232
Marks 1, 294
Marslen-Wilson 82
Mateu 110–11
Matzke 293
Maynell 352
Mecklinger 126, 130
Mendoza-Denton 70
Meng 127–8, 130, 292
Mikheev 202
Miller 292
Mitchell 230–2, 352
Miyata 110
Moder 185, 200
Montrul 107, 110–11, 115
Moreton 76, 83
Morgan 336
Morton 149
Müller 14, 115, 257–8, 285, 292–3, 305
Munson 76, 80
Murphy 349
Muysken 87
Nagy 255
Navarro 115
Nespor 147
Ney 203
Nooteboom 151
Nordlie 140, 142
Nussbaum 79
Ohala 77
Paolillo 100
Paradis 115
Pearlmutter 229, 245
Pechmann 125, 128
Perdue 116
Perlmutter 107
Pesetsky 326, 339
Peters 147
Petten, van 139
Pickering 230, 240, 244
Pierrehumbert 8, 28–30, 34–6, 39, 42, 44, 70–2, 82–3, 149
Pinker 185, 201
Pisoni 76, 82
Pitt 76
Pittner 250, 254–5, 266
Poletto 85, 88–9, 98
Polinsky 115
Pollard 53, 61, 222
Prasada 185, 201
Prévost 119
Prince 169, 201, 252, 274, 277, 280
Pritchett 232
Ramscar 201
Randall 110
Rapoport 325
Rappaport Hovav 108
Rayner 237, 239
Reinhart 48, 52, 55, 61–3, 67, 322–3
Reiss 43, 200
Reuland 48, 57–8, 60–4, 67
Reynolds 255
Riehl 38
Riemsdijk, van 250, 254–5, 295
Riezler 241
Ringen 70, 75
Rizzi 112, 117, 120
Röder 125, 128
Roland 230
Rooth 239, 242, 244
Rosenbach 284–5
Rosengren 131
Ross 86, 317–18
Rugg 131, 139
Rumelhart 47, 281
Russell 193
Sabourin 120
Sag 53, 61, 222
Saito 318, 340
Samek-Lodovici 112, 147–8
Samuel 171
Sankoff 6
Sapir 186–7, 192
Sarle 282
Schafer 151
Schladt 59, 66
Schlesewsky 128, 130–1, 165, 292, 294
Schmerling 147
Schriefers 130
Schütze 137, 214, 234, 239, 337, 348–9
Schütze 86–7
Schwarzschild 148
Scobbie 30, 32
Selkirk 148, 338
Sendlmeier 74
Serratrice 115, 121
Sevald 82
Skut 242
Smedt, De 213
Smith 76
Smolensky 110, 169, 252, 274, 277, 280–1, 283
Snyder 293
Sorace 106–13, 116, 118, 120–2, 270–1, 277
Speer 151
Sproat 39
Sprouse 122
Stallings 211, 225
Starke 112
Stearns 145
Stechow, von 147
Steele 149
Steriade 29, 36–9, 42, 83, 170
Sternefeld 249
Stevens 109
Stolcke 235
Stowell 64
Strawson 322
Sturt 230, 240
Suppes 14
Swerts 151
Swinney 215
Takahashi 344
Tanenhaus 230, 233, 245
Taraban 239
Tesar 169, 283
Thompson 323
Thráinsson 63
Timberlake 7
Timmermans 302, 305, 321
Tomlin 222
Travis 222
Treiman 80, 82
Treisman 82
Trueswell 230
Tsimpli 107, 114, 116, 118, 121
Ueyama 340
Vago 87
Vallduví 147
Vance 122
Vennemann 222
Vergnaud 92, 94–6
Vetter 349
Vicenzi 127
Viterbi 235
Vitevitch 32, 76
Vitz 74
Vogel 147, 250, 253–6, 259, 262–7, 315
Warner 33
Wasow 210, 215, 218, 225
White 119
Wickelgren 140
Wiltschko 332
Winkler 74
Withgott 37
Wright 77
Wunderlich 131
Yamashita 212, 225
Yuki Hirose 350
Zawaydeh 72
Zribi-Hertz 65
Zsiga 29
Zubizarreta 92, 94–6
Zue 32
Zugck 256, 262, 264, 267
Zuraw 9, 70, 72–3, 75
