This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.
In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION
A Dissertation Presented
by
ALFRED ELLIOTT MORETON
DOCTOR OF PHILOSOPHY
May 2002
Department of Linguistics
UMI Number: 3056262
Copyright 2002 by
Moreton, Alfred Elliott
UMI Microform 3056262
Copyright 2002 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
© Copyright by Elliott Moreton 2002
All rights reserved
ACKNOWLEDGEMENTS
I had a lot of help with this. John Kingston trained me up from nothing and saw
me through the whole thesis with more patience than I really deserved. He has always
been willing to discuss an idea, demonstrate a technique, debug a stimulus set, or go over
a draft. This, plus his eye for logical flaws and tenacious memory for obscure journal
articles, were indispensable to the writing of this thesis. So were the advice and
encouragement of the other two committee members, Lyn Frazier and Chuck Clifton,
and of those who provided invaluable aid by inviting me to the NTT Basic Research
Labs and supervising
my work there. John McCarthy was only tangentially involved in the present work, but
I'm going to thank him anyway because his classes and seminars are among the most
fascinating experiences I've ever had. Parts of Chapter 4 benefited from the comments of
two anonymous Cognition reviewers. Johns Hopkins University has sheltered me while I
completed my revisions. This research was paid for in part by the U.S. National Science
Foundation, the U.S. National Institutes of Health, and the Nippon Telegraph and
Telephone Company.
Kathy Adamczyk and Lynne Ballard rescued me from many a disaster.
Dissertating students love company, and I was fortunate to have good company in
Schwartz, my housemates Joe Eskinazi, Eva Juarros, and Janina Rado, my labmate
Cecilia Kirk, and my just plain mates Andre Isaak and Caroline Jones.
Special thanks are owed to Earl Gaddis, Virginia van Scoy, and the Northampton Group
of the Boston Branch of the Royal Scottish Country Dance Society for six years of
dancing.
Finally, I would like to thank my parents for their love and encouragement, and
for acting like this was all perfectly normal. This thesis is dedicated to them.
ABSTRACT
MAY 2002
the expectation that the stimulus is an utterance in the perceiver's language, with a
Optimality Theory, is used to select among competing candidate parses of the acoustic
perceptual effects from the lexicon, and a statistical theory based on transitional
probabilities.
illegal in the language, (2) that the dispreference for illegal configurations is far stronger
than that for configurations which are legal but have zero frequency, and (3) that it is due
to a response dependency, rather than to auditory or other stimulus factors, and cannot be
that (1) the lexical stratum membership of nonsense words can produce a phonotactic
perceptual effect, (2) that the triggering and target segments can be up to three segments
distant, and (3) that the stratum-phonotactic effect is larger than a word-superiority effect
These results are shown to be consistent with the grammar-based model, but
inconsistent with the two grammarless alternatives. Analysis of the three models reveals
that the shortcomings of the alternatives are due to their inability to abstract over phoneme
classes and larger linguistic structures. It is concluded that the mechanisms of speech
perception have access to the phonological grammar.
TABLE OF CONTENTS
Page
ACKNOWLEDGEMENTS..................................................................................................... iv
ABSTRACT..............................................................................................................................vi
LIST OF TABLES..................................................................................................................xiv
CHAPTER
1. INTRODUCTION................................................................................................................. 1
2. PHONOLOGICAL PRELIMINARIES........................................................................... 11
2.1. Introduction................................................................................................................ 11
2.2. Inventory and phonotactics in Optimality Theory.................................................11
2.3. Inventory and phonotactics of English syllable onsets............................................14
2.3.1. Explicanda........................................................................................................ 15
2.3.2. Analysis.............................................................................................................18
2.3.2.1. Representations...................................................................................... 18
2.3.2.2. CV syllables........................................................................................... 28
2.3.2.2.6. The stop-affricate-fricative series................................................37
2.3.2.2.7. Constraint lattice...........................................................41
2.3.2.3. *[sɹ]......................................................................... 42
2.3.2.4. *[tl].......................................................................... 46
2.3.2.5. ??[pw]......................................................................48
2.4. Summary................................................................................................................... 51
3.1. Introduction................................................................................................................53
3.2. TRACE (McClelland & Elman 1986).....................................................................53
3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)............... 61
3.3.2.1. Context.................................................................................................... 66
3.3.2.2. Database.................................................................................................. 69
3.3.2.3. Decision rule.......................................................... 72
3.5. Summary..................................................................................................................100
3.6. Appendix: Computing frequencies...................................................................... 102
4. EMPIRICAL TESTS.........................................................................................................108
4.1. Introduction...............................................................................................................108
4.2. Experiment I: Sequence frequency and the phonotactics of word-final lax
vowels................................................................................................................. 110
4.2.1. Rationale.........................................................................................................110
4.2.2. Design............................................................................................................. 113
4.2.3. Predictions..................................................................................................... 119
4.2.4. Methods............................................................................................................137
4.2.5. Results..............................................................................................................139
4.2.6. Discussion........................................................................................................ 143
4.3.1. Rationale...........................................................................................................143
4.3.2. Design...............................................................................................................144
4.3.3. Predictions....................................................................................................... 147
4.2.3.2.2. SC-1............................................................................153
4.4.3.1.1. [_fkous] stimuli...........................................................164
4.4.3.1.2. [_vnAm] stimuli...........................................................168
4.4.3.1.3. Expected and actual TRACE predictions..................................170
4.5.1. Rationale..........................................................................................................186
4.5.2. Design.............................................................................................................. 187
4.5.3. Predictions...................................................................................................... 187
4.6.1. Rationale.........................................................................................................208
4.6.2. Design............................................................................................................. 208
4.6.3. Predictions..................................................................................................... 208
4.6.4. Methods..........................................................................................................208
4.6.5. Results............................................................................................................212
4.6.6. Discussion...................................................................................................... 215
4.7.3.1. Design....................................................................................................224
4.7.3.2. Methods................................................................ 225
4.7.3.3. Results and discussion......................................................................... 226
4.7.4.1. Design................................................................................................... 228
4.7.4.2. Predictions........................................................................................... 230
4.7.4.3. Methods................................................................234
4.7.4.4. Results.................................................................. 235
4.7.4.5. Discussion............................................................................................ 238
4.8. Summary..................................................................................................................239
4.9. Appendix: Synthesis parameters for the stimuli of Experiments 4 and 5 .........240
5. CONCLUSIONS...................................................................................................... 248
BIBLIOGRAPHY...........................................................................................................253
LIST OF TABLES
Table Page
3.2. Probability that a given diphone will be followed by a given segment
(extract from complete table)..............................................................................64
3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n=1...... 71
3.14. Triphone frequencies for the stimuli of Pitt & McQueen (1998)...................84
3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998). 88
4.2. Change of lax to tense vowels when made final by truncation....................... 112
4.9. Featural parameters of the four original TRACE vowels (McClelland &
Elman 1986)....................................................................................... 124
4.32. Frequency of the syllables in the stimuli for Experiment 2 ............................. 146
4.33. Word-initial occurrences of the critical syllables from Experiment 2 ............ 148
4.51. Words beginning with the critical onsets in the lexicon used for the
TRACE simulation of Experiment 3 ................................................................. 165
4.60. Diphone frequencies for the stimuli of Experiment 3......................................171
4.72. Mean percent "p" response, all intermediate [...vnAm] stimuli....................... 180
4.73. Differences in mean "p" response, pairwise by subject, [...vnAm] stimuli... 180
4.76. Mean percent "p" response, all intermediate [...fkous] stimuli....................... 184
4.95. Errors in production of the initial stop in [bl pl tw] onsets by
English-learning children in Iowa and Nebraska (Smit 1993)..................... 218
4.102. Stimuli for Experiment 6b.................................................................234
4.104. Constant synthesis parameters which were identical for the "b" and "d"
arrays of Experiments 4 and 5 ........................................................................... 240
4.105. Time-varying synthesis parameters common to the "b" and "d" arrays of
Experiments 4 and 5............................................................................ 241
4.106. Synthesis parameters for the "b" array of Experiments 4 and 5 .............. 243
4.107. Synthesis parameters for the "d" array of Experiments 4 and 5..................... 245
LIST OF FIGURES
Figure Page
4.12. Results of the TRACE simulation for the input [salleiX]-] ...........................128
4.45. Identification curves for the stimuli of Experiment 2, pooled across 7
listeners................................................................................................................157
4.86. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on
the "g"/"d" judgment.......................................................................... 199
4.87. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on
the "g"/"b" judgment.......................................................................... 200
4.92. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"b"/"d" judgment, for the CCV stimuli.............................................213
4.93. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"b"/"d" judgment, for the VCCV stimuli......................................... 214
4.103. Boundary between [a] and [a:], averaged across 21 listeners....................... 236
CHAPTER 1
INTRODUCTION
This dissertation proposes that listeners use their phonological grammar to parse the
phonological structure of the speech signal. This theory is tested empirically against the
rival claims of two other models to explain the same phenomena: TRACE, which uses
lexical knowledge, and the MERGE transitional-probability theory, which uses segment-
string frequency.
The empirical domain is phonotactic grammaticality. Languages place tight restrictions
on how their segmental inventories can combine into larger units such as syllables,
morphemes, or words, and speakers are sensitive to these restrictions. The
possible and impossible combinations are the phonotactics of the language.
Phonotactic effects turn up in many places. They appear as systematic gaps in the
distribution of sounds in a speech corpus (e.g., Harris 1951, Lamontagne 1993) - what in
Chapter 2 are called phonological gaps. Phonotactics can drive synchronic phonological
alternations, such as that between American English [t] and [ɾ], which are conditioned by the
neighboring segments (e.g., Prince & Smolensky 1993). A foreign word can undergo
sound changes when it is borrowed that adapt it to the phonotactics of the borrowing
language.
Native speakers share intuitions about the phonological grammaticality in their own
language of novel phoneme strings (Greenberg & Jenkins 1964, Scholes 1966). English
listeners can also accurately judge the relative frequency of non-English consonant clusters
in the languages of the world (Pertz & Bever 1975). A language's phonotactic constraints
are respected by its speakers’ slips of the tongue (Fromkin 1971) and ear (Sapir 1933;
Brown & Hildum 1956; Halle, Segui, Frauenfelder, & Meunier 1998), and have been
shown to influence perception, requiring, for instance, stronger acoustic evidence to
believe that they have heard
an illegal stimulus than a legal one. Moreover, illegality measured one way (e.g., off-line
intuitive-goodness judgments) tends to agree with illegality measured in other ways.
What is at issue is the nature of that knowledge, and of its interaction with the
perceptual process. This dissertation examines three proposals. Each will be examined
chiefly in light of its account of the phonotactic effect on phoneme identification: when
a stimulus contains a phoneme which is acoustically ambiguous between one which is legal
in that context and one which is illegal, listeners' reports are biased towards the legal
phoneme.
The claim which I will advance, elaborate, and defend is the following:
(1)
Speech input is parsed prelexically to a featural or phonemic surface
representation. When acoustic evidence in the incoming speech stream
supports more than one phonological parse, the competing parses are scored
with respect to the ranked active constraints of the speaker's grammar, and
the more harmonic candidate parse is processed first.
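The selection procedure in (1) can be illustrated with a minimal sketch. The constraint definitions and toy candidates below are hypothetical placeholders, not the analysis developed in Chapter 2; the point is only the strict-domination comparison between candidate parses.

```python
# Illustrative sketch of (1): competing candidate parses are scored against a
# ranked constraint hierarchy, and the more harmonic parse wins. Constraints
# and violation counts here are invented for illustration.

from typing import Callable, List

# A "constraint" is a function from a parse (a phoneme string) to a number
# of violations; list order encodes the ranking (higher-ranked first).
Constraint = Callable[[str], int]

def star_tl(parse: str) -> int:
    """Penalize a [tl] onset (two coronal non-continuants)."""
    return 1 if parse.startswith("tl") else 0

def faith(parse: str) -> int:
    """Dummy lower-ranked constraint: no violations in this toy."""
    return 0

RANKING: List[Constraint] = [star_tl, faith]

def more_harmonic(a: str, b: str) -> str:
    """Return the parse preferred by the ranked hierarchy.

    Constraints are consulted in ranking order; the first constraint that
    distinguishes the two candidates decides (strict domination).
    """
    for constraint in RANKING:
        va, vb = constraint(a), constraint(b)
        if va != vb:
            return a if va < vb else b
    return a  # tie: no constraint distinguishes the candidates

# An ambiguous initial consonant supports both a [tl] and a [pl] parse;
# the grammar favors the legal one.
print(more_harmonic("tli", "pli"))  # -> pli
```

Under strict domination, a single violation of the top-ranked constraint cannot be outweighed by any number of lower-ranked violations, which is why the loop returns at the first disagreement.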
This is a way of allowing performance mechanisms to use linguistic competence, by
setting up a perceptual bias against parses which are disfavored by the grammar. It
therefore incorporates into the performance theory the traditional linguistic view that the
wellformedness of an utterance depends on whether it fulfills specific formal requirements
- whether it meets the structural description
of a set of abstract grammatical rules. In this view, an ambiguous phoneme generates two
(or more) parses. If one is legal in context and the other is not, perception will favor the
legal parse. This is the principal theoretical contribution offered by this dissertation.
Quite different in vision are the two rival theories, TRACE (McClelland & Elman
1986) and the MERGE transitional-probability (TP) theory (Pitt & McQueen 1998). These
theories derive their phonotactic knowledge from
the lexicon1. Illegality is the extreme low end of a frequency continuum, and its effects are
effects of frequency. Where these two theories differ is in how they implement frequency
effects.
In TRACE, phonotactic effects emerge as side effects of the word-recognition process. A stimulus will activate
phoneme units, which in turn can produce certain levels of activation in a word unit,
depending on the degree to which the stimulus resembles the word represented by that unit.
A stimulus containing a phoneme ambiguous between a legal and an illegal one will partially
activate some words containing the legal phoneme, but none containing the illegal one.
Activation spreading down from the word units to the phoneme units will increase the
activation of the unit corresponding to the legal phoneme, which will laterally inhibit the
illegal phoneme unit. The result is a perceptual bias towards the legal phoneme.
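The feedback loop just described can be caricatured in a few lines. This is a drastically simplified, hypothetical sketch: the toy lexicon, weights, and update rule are invented for illustration and bear no relation to the actual TRACE parameters of McClelland & Elman (1986).

```python
# Toy sketch of lexical feedback: bottom-up activation of two competing
# phoneme units, top-down support from word units, and lateral inhibition.
# Every parameter here is an illustrative assumption, not TRACE's.

LEXICON = ["plight", "please", "plea"]  # words with the legal onset [pl];
                                        # no English word begins with [tl]

def perceive(bottom_up_t: float, bottom_up_p: float, cycles: int = 10):
    """Return final activations of the competing /t/ and /p/ units."""
    act = {"t": bottom_up_t, "p": bottom_up_p}
    for _ in range(cycles):
        # Word units are activated only by phonemes they contain: every
        # lexical item supports /p/ in onset position, none supports /t/.
        word_support = {"t": 0.0, "p": 0.1 * len(LEXICON) * act["p"]}
        # Top-down feedback plus lateral inhibition between the two units.
        new_t = act["t"] + word_support["t"] - 0.2 * act["p"]
        new_p = act["p"] + word_support["p"] - 0.2 * act["t"]
        act = {"t": max(new_t, 0.0), "p": max(new_p, 0.0)}
    return act

# A perfectly ambiguous segment (equal bottom-up support) drifts toward /p/,
# the phoneme with lexical support: a perceptual bias without any explicit
# phonotactic constraint.
final = perceive(0.5, 0.5)
print(final["p"] > final["t"])  # -> True
```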
The MERGE TP theory builds on the MERGE model of Norris et al. (2000). In MERGE TP, low-level (pre-lexical) perceptual
1 Transitional probabilities are assigned to a pre-lexical module in MERGE, but the probabilities
themselves are computed over the lexicon.
mechanisms keep track of the frequencies with which different phoneme sequences occur.
An ambiguous segment and its surrounding context could be interpreted as either of two
sequences, but perception will tend to favor the more frequent possibility - that is, it will
choose the phoneme that, on the basis of past experience, is more likely in the given context.
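The transitional-probability mechanism can be sketched as follows. The toy corpus and the single-segment left context are illustrative assumptions, not the parameters of MERGE TP, which are left open by its authors.

```python
# Minimal sketch of the transitional-probability idea: estimate
# P(next segment | preceding segment) by relative frequency over a toy
# corpus of transcribed words, then resolve an ambiguous segment in favor
# of the more probable continuation. Corpus and context size are invented.

from collections import Counter

CORPUS = ["plait", "plan", "play", "top", "tap"]  # toy transcriptions

def transitional_probability(context: str, segment: str) -> float:
    """P(segment | context), estimated by relative frequency in CORPUS."""
    bigram = Counter()
    unigram = Counter()
    for word in CORPUS:
        padded = "#" + word  # '#' marks the word boundary
        for left, right in zip(padded, padded[1:]):
            bigram[(left, right)] += 1
            unigram[left] += 1
    if unigram[context] == 0:
        return 0.0
    return bigram[(context, segment)] / unigram[context]

def resolve(context: str, candidates: list) -> str:
    """Pick the candidate segment most likely after the given context."""
    return max(candidates, key=lambda seg: transitional_probability(context, seg))

# An ambiguous [l]~[r] segment after [p] is resolved toward the attested
# continuation in this corpus.
print(resolve("p", ["l", "r"]))  # -> l
```

Note that the choice of context window is doing all the work here: a one-segment window captures only strictly adjacent dependencies, which is exactly the limitation examined in Chapter 3.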
Both of these models have demonstrated success in accounting for some of the core
phonotactic perceptual phenomena. However, I will argue that neither one is adequate, for
reasons crucially connected with their lack of access to abstract grammatical knowledge.
Both make predictions that are not borne out, and fail to predict phenomena that occur.
The traditional account treats the phonotactic phenomena as a single
syndrome, with a single underlying cause, and identifies that cause with listeners' knowledge
of the sound pattern of their language in the form of phonotactic constraints against
particular combinations of sounds (e.g., Shibatani 1973). Speech tasks make the speaker or
listener assign a linguistic parse to the stimulus; parses which are nonexistent or highly
marked are disfavored.
Perhaps speakers merely know that some sequences are common and others are rare or
absent. Rarity biases phoneme identification
(Newman et al. 1997; Pitt & McQueen 1998) in much the same way as phonotactic
illegality. Rarity also speeds "no" responses in lexical decision and slows same-different
judgments of nonwords (Vitevitch & Luce 1998). What linguists have described as a
categorical contrast between possible and impossible sequences can instead be interpreted
as the difference between low frequency and zero frequency.
Statistical models differ in which statistics they use and how they use them. In the
perceptual model TRACE (McClelland & Elman 1986), the rarity of particular sound
sequences emerges from querying the lexicon. One component of the MERGE model of perceptual decision-making
(Norris et al. 2000) keeps track of phoneme-to-phoneme transitional probabilities, which are
used without reference to the lexicon. The current version of the Neighborhood Activation
Model (Luce 1986; Luce & Pisoni 1988; Vitevitch & Luce 1998, 1999) combines
several such statistics. A theory based on expected
occurrence frequency has been put forward by Frisch, Broe, & Pierrehumbert (1995).
The theoretically most attractive aspect of statistical models is their account of how
phonotactic knowledge is acquired: the learner gets it for free in the course of learning
its vocabulary.
On the other hand, they do not explain three phenomena that led people to posit
grammar in the first place.
1. A learner must at first be able to acquire any language at all. The lexical and statistical mechanisms only
distinguish favored from disfavored sound patterns within a language, after the lexicon has
been learned or the statistical patterns have been analyzed. Yet English listeners can
accurately judge the relative frequency of non-English consonant clusters in the languages
of the world (Pertz & Bever 1975). The cross-linguistic commonness or rarity of different
sound patterns is not explained by these theories, nor is the way in which the processes
found in one language resemble those found in others. These facts are the province of
linguistic theories, which have evolved a wide array of conceptual tools for this purpose:
Articulatory Phonology, universals (Greenberg 1964), feature geometry (Clements 1985),
natural classes (Chomsky & Halle 1968).
2. The alternations induced by phonotactics are categorical rather than gradient, and
systematic rather than arbitrary. For example, the phonotactics of Standard German forbid
word-final [b d g v z]; in that environment, they turn into [p t k f s], despite their differing
frequencies. The frequency difference between (common) word-final [t] and (zero-
frequency) word-final [d] is much greater than that between (uncommon) word-final [p] and
(zero-frequency) word-final [b], yet German speakers "repair" the illegal final voiced
obstruents to the same extent in both cases. And the repair is not to turn the illegal
obstruents into the most frequent legal obstruent, but into the corresponding legal
obstruent.
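The categorical, correspondence-preserving character of the repair can be made concrete in a toy mapping (a sketch only; the ASCII spellings below stand in for IPA transcriptions):

```python
# German final devoicing as a categorical mapping: each illegal word-final
# obstruent is repaired to its own voiceless counterpart, not to the single
# most frequent legal obstruent, and frequency plays no role.

DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def repair_final(word: str) -> str:
    """Devoice a word-final obstruent if it is illegal in that position."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word

# Common and rare illegal finals are repaired alike, each to its own
# voiceless counterpart.
print(repair_final("tag"))   # -> tak
print(repair_final("grab"))  # -> grap
```

A purely frequency-driven mechanism has no reason to preserve the place and manner of the offending segment in this way; the systematicity of the mapping is what the grammatical account captures.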
3. Phonotactic regularities extend to novel
morphemes or even nonce forms, and exceptions to regular patterns are less likely to occur
as morpheme frequency decreases. These features suggest that the regularity is distinct
from the forms it applies to, rather than emergent from them.
These failures stem from the models' lack of
linguistic analysis, which prevents them from abstracting the empirically correct
generalizations. The only phonological domain above the level of the phoneme which is
recognized by either model is the word.
Neither represents the phonotactically crucial domain of the syllable, or any of its
constituents such as the onset and rime. Both are incapable of abstracting over features: All
patterns are represented as phoneme, sequence, or word frequencies. More
abstract properties which influence phonotactics, such as part of speech or lexical stratum,
are not encoded anywhere. The result is that the dependencies these theories represent do
not correspond to the ones which are linguistically and perceptually relevant.
Neither theory distinguishes phonological gaps (sequences which are
systematically prohibited) from lexical gaps (sequences which are permitted, but missing
from the lexicon through historical accident). If illegality and frequency are the same thing,
then zero-frequency sequences should be equally illegal regardless of why they are missing.
Nor do the theories distinguish phonotactically relevant from
irrelevant context. For example: The nonword [tli] is illegal in English. The illegality of
the [1] in that context is due entirely to the context on its left - the word boundary and [t],
which create an illegal sequence of two coronal non-continuants in a syllable onset. The [i]
has nothing to do with the phonotactic unacceptability of the string; [tli] and [tla] are both
illegal. TRACE and MERGE TP are blind to this fact. Each applies a fixed "context" to
every phenomenon. The relevant context in TRACE is the entire nonword; that in MERGE
TP is a fixed window of neighboring segments. The empirical question is whether listeners
can distinguish phonological gaps
from lexical gaps. Evidence will be presented to show that they can: that phonological gaps
are stronger than lexical gaps, and that phonotactically relevant context is more influential
than irrelevant context.
Chapter 2 presents the grammar
which it is proposed that performance mechanisms have access to. It first discusses the
OT phonological model, then
reviews the distinction between lexical and phonological gaps, and between phonotactically
relevant and irrelevant context. Two particularly prominent phonotactic gaps in English
syllable onsets - [tl] and [sɹ] - are shown to be phonological rather than lexical gaps, and
a grammatical analysis of each is given.
Chapter 3 introduces the three theoretical contenders, TRACE, MERGE TP, and the
OT grammatical theory. The rationale for each is discussed and the existing empirical
evidence is reviewed.
The precise workings of the MERGE TP theory have not yet been explained by its
authors; several free parameters must be fixed before the theory makes testable predictions.
The most important of these parameters is the specific nature of the phonological context:
How many segment positions are included, and how do the left and right contexts interact?
It will be shown that there is no choice of context that can account for the data cited by the
MERGE TP authors in support of the theory. If the context is chosen so as to cover any
one part of the data, the theory makes incorrect predictions about the rest. On the chance
that some of the contradictory data might be artifactual, two contexts are chosen for testing.
The experiments' tactical focus is on the distinction between phonotactically relevant and
irrelevant context. Experiments 2 and 3
replicate Experiment 1 with initial [pw], considered phonotactically illegal by the TRACE
authors on statistical grounds (McClelland & Elman 1986), but merely "marginal" by
phonologists on the basis of intuition, distribution, and history (Hultzen 1965, Wooley
1970, Catford 1988, Hammond 1999). No effect is found, despite the strong statistical
biases against [pw]. Experiment 3 directly compares the bias against [pw] with that against
the much more illegal, but statistically very similar, [tl], and finds a much stronger bias
against the latter. Manipulations of phonotactically relevant context are found to have the
predicted effect, while manipulations of phonotactically irrelevant
context have no effect. These findings are argued to favor the grammar-based processing
model.
Where previous work in this field, including Experiments 1-3, has used stimulus
units to measure the dependent and independent variables, Experiments 4 and 5 used a
technique which allows the effect of one response on another to be measured when judging
a CC cluster in which both C’s are ambiguous (Nearey 1990). This allows bias effects to be
disentangled from stimulus factors and hence measured with greater accuracy. In
Experiment 4, the bias against [bw] is compared with that against the much more illegal, but
statistically very similar, [dl]. A strong bias against [dl] is found, but none against [bw].
Experiment 5 was designed to insure that the results of Experiments 2, 3, and 4 were not
caused by compensation for coarticulation.
Experiments 6a and 6b exploit the stratified nature of the Japanese lexicon, in which
each word belongs to one of four classes with its own syndrome of phonological
properties. One stratum forbids word-final [a:], while another, Foreign, permits it.
The [a]-[a:] boundary is measured in carrier words whose other properties cue one
stratum or the other; it shifts when non-Foreign cues are
compared to Foreign cues - an effect which is expected and necessary in the grammar-
based processing model. The MERGE TP model cannot account for this effect directly,
since some of the phonotactically effective context is too far away from the ambiguous
segment for the model to capture the dependency. The results can only be accommodated
by ad hoc amendments to the model.
The phonotactic boundary shift is larger and more robust than a word-superiority
effect obtained with the same listeners and paradigm in the control Experiment 6a.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Chapter 5, finally, sums up the claims, arguments, and data presented in earlier
chapters, and situates them in the larger research context. Problems and opportunities for
the theory of grammar in speech perception are discussed, and areas of future research
delineated.
CHAPTER 2
PHONOLOGICAL PRELIMINARIES
2.1. Introduction
This chapter has two principal aims. The first is to introduce the Optimality Theory (OT) model of phonology assumed in this study. The second is to lay out the facts about English syllable onsets that will be used in later chapters. A distinction is drawn between productive phonological gaps and nonproductive lexical gaps in the syllable inventory. Two examples of phonological gaps ([tl] and [sɹ]) and one example of a lexical gap ([pw]) in the English syllable onset inventory are discussed, and the grammatical status of each is established.
No spoken language uses all of the segments known to linguistics; each is limited to only a comparatively small inventory (Maddieson 1984, §1.2). Sounds in the inventory do not combine at random to form larger units, but are restricted to a small phonotactically legal subset of the possible combinations. In the OT model, underlying representations, drawn from the lexicon, are inputs to the grammar. The output of the grammar is the surface representation (Figure 2.1).
Figure 2.1. Architecture of the OT phonological model. The LEXICON emits an /underlying representation/; the GRAMMAR selects, from among the CANDIDATE OUTPUTS ([surface representation 1], [surface representation 2], ...), a single [surface representation].
Under the principle of Richness of the Base (Prince & Smolensky 1993, §9.3), the lexicon and the grammar function as independent modules. All they have in common is a representational protocol: The output of the lexicon and the input to the grammar are made of the same representational elements (features, etc.) put together in the same way. Aside from this restriction, the lexicon can, in principle, emit any representation, and the grammar must be able to deal with whatever it emits.

Since the set of output candidates includes, at the very least, a fully faithful candidate identical to the input (and is generally held to include all of the possible inputs), the
grammar acts as a filter: Some of the inputs from the lexicon result in outputs that are identical to them, while others are altered.

When we observe that a particular segment (or larger configuration) [Y] is missing from the surface representations of a language, there are therefore two possible accounts: Either no underlying representation /X/ can surface as [Y] because of the filtering action of the grammar, or else there is in principle such an /X/, but by historical accident no one happens to have coined or borrowed a word containing it. In the first case we are dealing with a phonological gap, in the second with a lexical gap.
OT is hardly the first theory to make this distinction, and its practical test of what is
or is not grammatical remains the same as that of its predecessors: productivity. If native
speakers can readily accept and produce an unattested segment in different environments,
treating it phonologically and phonetically like a word of their language, then the gap is
accidental, and is modelled as a lexical gap. On the other hand, if speakers are consistently unable to produce the segment without alteration and without great effort, then the gap is phonological, and is modelled in the grammar.
The distinction between phonological and lexical gaps is similar to, but not quite the
same as, that between systematic and accidental gaps. A gap is systematic if it is part of a
pattern of gaps; it is accidental if it is isolated. When the aim is to describe the sound
pattern of a language with maximum compactness and elegance, it is usual to put the
systematic gaps in the grammar and leave the accidental ones out. Starting with the same
language, we can arrive at different grammars depending on which criterion we follow, since
the systematic gaps are not necessarily productive. (See, for example, the discussion of
English initial [ʃl] or [pw bw mw] in §3.1.) Because our psychological claim is one of productivity, the phonological/lexical distinction is the one that will matter in what follows. I will call the architecture in which the grammar passes or alters whatever the lexicon emits the source/filter model.1
1 The term "source/filter model" is an analogy with the source/filter model of vocal-tract acoustics, in which the larynx is a sound source whose output is filtered by the rest of the vocal tract. Whatever the larynx emits, the rest of the vocal tract has to deal with it, and will produce some output.
Ideally, an OT grammar for the phonological inventory of a language should make
the correct filtering predictions. That is, given the entire set of representationally possible
underlying representations, it should produce all and only the productive surface forms.
However, this clear theoretical distinction between legal and illegal is often difficult to apply in practice:

    The sixteen collated studies list a total of 107 possible onset clusters, of which there is agreement on only 30, considerably fewer than a third, leaving the rest in dispute. The discrepancy is even more striking for coda clusters. The same studies explicitly list or imply well over 500 clusters that are theoretically possible in syllable codas, of which there is agreement on only 19, fewer than 4 percent (p. 208).

Some of the disagreement can be traced to differences in the choice of phonological domain (syllable or word), and so on, but there is a certain
irreducible gradience, a lack of perfectly sharp demarcation between the "legal" and "illegal"
sets. It is agreed that [tl] is an illegal onset and [kw] a legal one, but there is no such
uniformity of judgment about [vl] or [pw] - they are felt to lie somewhere in between. The
problem of gradient illegality is a difficult one for Optimality Theory, and one which we will return to below. Experimental work on the perception of phonotactics has concentrated on the restrictions on place of articulation in English syllable onsets (Massaro & Cohen 1983, Pitt 1998, Moreton 1999),
most often the bans on *[sɹ] and *[tl dl]. The place-of-articulation restrictions are a good choice because they are strongly productive, because ambiguous stimuli are straightforward to synthesize, and because the place features of the critical consonants can be manipulated independently.

Section 2.3 develops an analysis of the stops, affricates, and fricatives found in the onset of a CV syllable, using typologically motivated constraints. This is then extended to the onset of C[ɹ w l]V syllables to account for the phonotactics of *[sɹ], *[tl], and ?[pw]. To anticipate: I will model the *[sɹ] gap as a special case of a general process spreading anteriority and the *[tl dl] gap as a special case of a general ban on homorganic obstruent sequences. The ?[pw] gap, though systematic as a special case of a general ban on homorganic consonant sequences, is not productive and will not be encoded in the grammar.

§ 2.3.1 lays out the data; § 2.3.2 gives the analysis. § 2.3.2.1 describes the feature system used in this model, based on Hall's (1997) variant of the now-standard Sagey (1990) articulator features. § 2.3.2.2 analyzes CV syllables, § 2.3.2.3 discusses *[sɹ], and § 2.3.2.4 analyzes *[tl].
2.3.1. Explicanda
1 The other principal restriction is that onsets have to rise in sonority. In theory, manipulating the sonority of either C in a CC onset cluster should affect perception of the sonority of the other C. However, sonority is not a distinctive feature. Two segments which differ in sonority differ in many other linguistically relevant ways as well, making the stimuli hard to construct and the results hard to interpret.
5 The C[j] onsets I do not discuss, because they have not been used experimentally.
Table 2.1. C[ɹ w l] onsets of American English

                                      Hultzen  Woolley  Catford  Hammond
pɹ bɹ    prove; brew                    ✓✓       ✓✓       ✓✓       ✓✓
pw bw    pueblo; bwana                  ✓?       ??       ??
pl bl    plant; blame                   ✓✓       ✓✓       ✓✓       ✓✓
tɹ dɹ    tread; dread                   ✓✓       ✓✓       ✓✓       ✓✓
tʃɹ dʒɹ
tw dw    twine; dwindle                 ✓✓       ✓✓       ✓✓       ✓?
tl dl                                   ••       ••       ••       ••
kɹ gɹ    crack; grid                    ✓✓       ✓✓       ✓✓       ✓✓
kw gw    quit; Gwen, guava              ✓?       ✓✓       ✓✓       ✓?
kl gl    clean; gleam                   ✓✓       ✓✓       ✓✓       ✓✓
θɹ ðɹ    threw                          ✓•
θw ðw    thwart                         ✓•       ?•
θl ðl
sɹ zɹ                                   ••       ••       ••       ••
sw zw    sweet; Zwicker                 ✓?       ✓•       ✓•
ʃɹ ʒɹ    shred; -                       ✓•       ✓•       ✓•
ʃw ʒw    Schwinn, Schwarzenegger; -     ?•       ✓•       —        ••
ʃl ʒl    schlock; -                     ?•       ✓•       —        ?•

Note: (✓) means the author included the onset. (?) means the author included it, but marked it as marginal. (%) means it was marked as normal for some dialects. (•) means it was not included.
This list is intended to include all and only C[ɹ w l] onsets which can be produced without alteration and without special effort by speakers of American English. Clusters that are obviously non-native have been included as long as they occur in familiar, easily pronounceable words (e.g., bwana, pueblo).4 I am not sure whether the unattested onsets [ʒɹ ʒw ʒl] (italicized in Table 2.1) are a lexical or phonological gap; given the rarity of initial [ʒ] in English, it is dangerous to infer anything from their absence alone. I will take them to be of the same grammaticality as their [ʃ]-initial counterparts.

The transcription in Table 2.1 is a broad one. The finer phonetic details, which are not at issue here, are discussed where they become relevant.

4 Compare news broadcasters' fluent pronunciations of zloty, Norman Schwarzkopf, Vladimir Putin with their awkward Chechnya and Srebrenica. The productivity of the syllable-initial [nj] and [sɹ] gaps for these trained speakers is clearly audible.
2.3.2. Analysis
Two generalizations are immediately clear from Table 2.1. First, the C in the C[ɹ w l] cluster is itself never a [ɹ w l], but is always something of lower sonority. This is a special case of a general fact, true across languages, about syllable onsets - that sonority rises over the course of the onset (Clements 1990).5 Second, there is no difference in place restrictions between the voiced and voiceless members of each pair.

Since both of these issues are irrelevant to the question of place restrictions, we can simplify our task by ignoring them. Henceforth we will only consider voiceless Cs, letting them do double duty for their voiced counterparts, and ignore candidates with flat or falling sonority.
2.3.2.1. Representations

The representations used here are couched in the feature system proposed by Hall (1997), which is in turn a modification of the Sagey (1990) system. The full feature tree is shown in (2.2):
5 English, like a number of other languages, allows [s] and perhaps [ʃ] to occur out of the expected sonority sequence (e.g., spit, stick, skip, square; shtik). This is a vexed question which I will not discuss. It has been suggested that the [s]C sequence is a complex segment like a reverse affricate (Hayes 1980, Lamontagne 1993).
(2.2) Feature tree

+Root
    +Manner
        continuant
        consonantal
        sonorant
        strident
        +lateral
    +Laryngeal
        spread glottis
        constricted glottis
        voiced
    +Supralaryngeal
        +Velum
            nasal
        +Place
            +Labial
                +round
            +Coronal
                anterior
                distributed
                back
            +Dorsal
                back
                high
                low

Note: Features marked '+' are privative; others are equipollent.
The most notable difference between this and the familiar Sagey (1990) system is that [back], normally a dependent of the Dorsal articulator, is here also a dependent of the Coronal node, with the stipulation that [+back] requires [+Dor]. The innovation is Hall's (1997) solution to a problem in the original system: that palatalization could not be modelled as feature spreading onto [+Cor] segments. Segments which triggered palatalization, usually front vowels, were [+Dor -back], but the [-back] could not be spread to a preceding [+Cor] segment, since [+Cor] could not support it (see Sagey 1990: §3.4.2.2). Hall argues that the palatalization feature, whatever it is, must be a child of both the Coronal and Dorsal articulator nodes, since it can be spread to both [s] and [x]. The segments triggering palatalization or resulting from it are, he says, all [-back].
Allowing both tongue nodes to sponsor [-back] captures the physical link between the blade and the front of the tongue body. There are other ways of guaranteeing that [ɹ] can spread something to the Coronal node and [w] cannot; I have chosen this one because it requires the least departure from the standard system.
I have also simplified Hall's feature tree by leaving out his Peripheral node, which
came below Place and above {Labial, Dorsal}, by replacing the Laryngeal features [stiff]
and [slack] with [voiced], and by omitting [rhotic] in favor of [+high, +low] (see below).
The Tongue Root node has been removed; I will ignore the complexities of uvular,
pharyngeal, and laryngeal consonants (McCarthy 1991). None of these changes is crucial
to the analysis.
With two exceptions, all features in this system are either privative or equipollent. A
privative feature is either present or it is not. An equipollent feature is either [+F] or [—F],
but not both. If a feature is present in a representation, then all equipollent children of that
feature have to be present as well, with either + or - specification. That is, an equipollent
feature can be absent from the representation of a segment only if the feature's parent is also
absent. A segment consisting only of the features [+Root +Laryngeal] is possible, but one consisting only of [+Root +Manner] is not, since the equipollent children of [+Manner] would be missing.
The two exceptions are [cont] and [strident]. Affricates are analyzed as [-cont
+cont] (Sagey 1990:§3.3.4.2). The feature [strident] is an equipollent child of the Manner
node, but it is only present when the segment is a fricative or affricate, and only for [+Lab]
or [+Cor] segments.
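The privative/equipollent well-formedness conditions can be stated procedurally. The sketch below is illustrative only; the tree fragment and segment encodings are simplified stand-ins for the full system in (2.2).

```python
# Minimal check of the privative/equipollent rule: a present parent
# requires all its equipollent children to be specified (+ or -), and
# an equipollent child may appear only under its parent.
# Simplified fragment of the feature tree, not the full system.
EQUIPOLLENT_CHILDREN = {
    "+Manner":  ["continuant", "consonantal", "sonorant"],
    "+Coronal": ["anterior", "distributed"],
}
PARENT_OF = {child: parent
             for parent, kids in EQUIPOLLENT_CHILDREN.items()
             for child in kids}

def well_formed(segment):
    """segment: dict mapping privative features to True and
    equipollent features to '+' or '-'."""
    for parent, kids in EQUIPOLLENT_CHILDREN.items():
        if parent in segment:                       # parent present ->
            for kid in kids:                        # children must be specified
                if segment.get(kid) not in ("+", "-"):
                    return False
    for feat in segment:
        if feat in PARENT_OF and PARENT_OF[feat] not in segment:
            return False                            # child without its parent
    return True

# [t]-like fragment: Manner with all children specified -> well-formed
t_like = {"+Manner": True, "continuant": "-", "consonantal": "+", "sonorant": "-"}
# Manner present but [sonorant] unspecified -> ill-formed
bad = {"+Manner": True, "continuant": "-", "consonantal": "+"}
print(well_formed(t_like), well_formed(bad))   # True False
```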
Finally, I have left the privative [+lateral] under the Manner node because it behaves
like a Manner feature in not spreading. The other obvious option is to put it under [+Cor],
since nearly all known laterals are coronal and they occur at all four coronal places of
articulation (McCarthy 1988, Hall 1997: §A.2.3.2). There are lateral fricatives, lateral flaps,
and (most commonly) lateral affricates. None is found in English. Evidence for how they pattern would bear on the placement of [+lateral], but none is available here.6

In the source/filter model, the grammar is responsible for explaining the badness of a great many candidates for the C in a C[ɹ w l] onset. Here, the critical candidates are the oral stops, affricates, and fricatives at every place of articulation. Their representations are shown in Tables 2.3 and 2.4.
Table 2.3. Obstruent manner features

          stop   affricate   fricative
cont       -       -, +         +
cons       +       +            +
son        -       -            -
6 For an alternative view of the feature specifications for liquids that does not use [lateral], see Walsh Dickey (1997).
Table 2.4. Obstruent place features

a.
                    Bilabial   Labiodental   Dental   Alveolar   Retroflex
+Lab                   +           +
  round
+Cor                                            +         +          +
  ant                                           +         +          -
  dist                                         (+)        -          -

b.
                    Palatoalveolar   Palatal   Velar
+Cor                      +
  ant                     -
  dist                    +
back (+Cor/+Dor)          -             -        +
+Dor                      +             +        +
  high                    +             +        +
  low                     -             -        -
The palatoalveolars and alveolopalatals are not represented differently here: I accept Hall's arguments (1997: §2.5.2) that the two should not have different features, since no language has two contrasting segments distinguished only by that place difference.
Table 2.5 shows the IPA symbols for every combination of the manners from Table
2.3 with the places from Table 2.4, together with stridency values for the fricatives and
affricates.
Table 2.5. Representation of consonants

Place                            Stop   Affricate   Fricative   strident
Labial                            p       pɸ           ɸ           -
Labiodental                       p       pf           f           +
Dental/Interdental                t̪       t̪θ           θ           -
Alveolar                          t       ts           s           +
Retroflex                         ʈ       ʈʂ           ʂ           +
Palatoalveolar, alveolopalatal    t̠       tʃ, tɕ       ʃ, ɕ        +
Palatal                           c       cç           ç           -
Velar                             k       kx           x           -
These are the low-sonority voiceless segments which this system is capable of representing. In our source/filter model, the lexicon can emit any of them. Since most of these segments do not and cannot occur in English, we will have to build a grammar which filters the impossible ones out.
2.3.2.1.2. Features of [ɹ w l]

Our task in this section is simply to describe the surface features of American English [ɹ w l]. We will not explain why these segments, rather than other sonorants, should be in the American English inventory. I adopt the analysis of Kahn (1980), who makes [ɹ w] glides (semivowels) and [l] a sonorant consonant.
Guenter (2000) summarizes the arguments that American English [ɹ w] are glides as follows: (1) They are phonetically central approximants. (2) They restrict the set of vowels that can precede them. (3) Each has a stressed syllabic version with which it alternates. (4) They cannot occur after tautosyllabic diphthongs (Cohn & Lavoie 2000). (5) Flaps occur after them. (6) Final [t d] cannot be deleted after them. These statements are in general not true of [l].7 To these we can add the observation of Espy-Wilson (1992) that [l] is frequently produced with a spectral discontinuity, while [ɹ w] are not.

6 I do not know anything about the stridency of lateral fricatives. In the absence of better information, I will assume that they are as strident as the corresponding non-lateral fricatives.
Kingston (p.c.) points out that stops are often intruded between [l] and a following lingual fricative: pulse [pʌlts], filth [fɪltθ]. The same phenomenon occurs with the other class of high-sonority consonants, the nasals: warmth [wɔɹmpθ], chance [tʃænts]. It does not occur with the glides [ɹ w].
We will model [ɹ w] as glides - that is, as vowels syllabified into a syllable onset, having the same features as the syllabic [ɹ̩ u] (Hall 1997:135, Rosenthall 1997). The manner features are shown in Table 2.6 and the place features in Table 2.7:

Table 2.6. Manner features for [ɹ w]

              ɹ    w
continuant    +    +
consonantal   -    -
sonorant      +    +
strident
7 However, Guenter did find that 15 of his 16 informants had an [l] that satisfied (4), and many had one that satisfied (3); he interprets this as evidence of language change in the direction of a glided [l].
Table 2.7. Place features for [ɹ w]

                    ɹ        w
+Lab                +        +
  round             +        +
+Cor                +
  ant               -
  dist              - or +
back (+Cor/+Dor)    -        +
+Dor                +        +
  high              +        +
  low               +        -
The manner features are standard, as are the place features for [w].9 Those for [ɹ] require justification.
Delattre and Freeman (1968) made X-ray films, with synchronized spectrograms, of 46 speakers from various parts of the United States. They found a wide variety of [ɹ] articulations, which sounded very similar. All speakers, in all syllable positions, make a constriction in the pharynx about halfway between the glottis and the uvula. They also make a constriction somewhere in the oral cavity between the corner of the alveolar ridge and the beginning of the soft palate, using the dorsum, blade, or tip of the tongue - in onsets, always the blade or tip. The lips are rounded (most strongly in the onset of a stressed syllable). Similar results were obtained in MRI and palatographic studies of 4 speakers by Alwan et al. (1997).
8 Hall argues that both [ɹ ɹ̩] are actually [+Cor] (1997: §§1.2.6, 4.4).
In vowels, a pharyngeal constriction is the articulatory correlate of [+low], and a close approximation of the tongue body to the palate is the correlate of [+high]. The two constrictions of American English [ɹ] are therefore [+low +high].10 The formal advantages are clear: we are rid of the [+rhotic] feature (which needed the same kind of co-occurrence stipulations as [strident]).

In the studies of Delattre and Freeman (1968) and Alwan et al. (1997), the tongue tip or blade participated in all versions of the onset glide [ɹ], indicating that the glide was coronal.12 The position of the constriction ranged from prepalatal to postpalatal. We can therefore assign it either value of [distributed]; since the choice is not crucial to the analysis, I will favor my own speech and pick [+dist]. The lip rounding, finally, makes onset [ɹ] [+Lab +round].
For [l], we use the features of Tables 2.8 and 2.9:

Table 2.8. Manner features for [l]

continuant    -
consonantal   +
sonorant      +
strident
+lateral      +
9 Delattre & Freeman's Figure 1, a gallery of X-ray tracings, shows this very clearly. Their "Type 4" syllabic [ɹ̩] is particularly striking - the tongue has two humps, one in the middle pharynx and one under the hard palate, with a deep indentation between them.
11 The oral constriction in [ɹ] has also been analyzed as the implementation of a [coronal] feature (Walsh Dickey 1997).
12 The nuclear [ɹ̩] had a coronal component in only one of its five manifestations (there was much more variety between speakers in non-initial position), with the blade approaching the rear of the hard palate (Delattre & Freeman's Figure 1, Type 5). It seems that coronal articulations are obligatory in syllable onsets, but (for most speakers) prohibited in syllable nuclei.
Table 2.9. Place features for [l]

+Lab
  round
+Cor       +
  ant      +
  dist     -
back       +
+Dor       +
  high     +
  low      -
The manner features are standard except for [-cont]. It is a matter of debate whether [l] is continuant phonetically or phonologically.
The double articulation of [1] has been shown by Sproat & Fujimura (1993). Their
X-ray microbeam data, from four speakers of American English, found both a dorsal and an
apical gesture, whose relative timing varied depending on prosodic position. The apical
gesture we model as [+Cor +ant -dist]. The MRI and palatographic study of Narayanan et
al. (1997) confirmed the double gesture, and showed that the apical gesture contacted the
alveolar ridge along the midline in both onset and coda [l].
2.3.2.2. CV syllables

The surface obstruent inventory of English CV syllables can be described as follows:
1. Two classes do not occur at all: the retroflexes and the palatals. Both of these are repaired to other places of articulation.

Table 2.10 shows the repair which I assume is made to each of the impermissible segments: A box encloses each group of underlying segments that map onto the same surface segment.
Table 2.10. Repairs to the obstruent inventory in CV syllables

Place                Stop   Affricate   Fricative   strident
Labial                p       pɸ           ɸ           -
Labiodental           p       pf           f           +
Dental/Interdental    t̪       t̪θ           θ           -
Alveolar              t       ts           s           +
Retroflex             ʈ       ʈʂ           ʂ           +
Palatal               c       cç           ç           -
Velar                 k       kx           x           -

Note: The white segments are found in surface CV environments. Underlying gray segments are mapped to the white segment in the same enclosing box.
The grammar I propose will be quite complex, with 12 constraints ranked in 9 strata. I will first describe and justify the constraints, then present ranking arguments for them.
2.3.2.2.1. Undominated faithfulness constraints

None of the repairs shown in Table 2.10 involves deleting the offending segment. I assume an undominated constraint against deletion:

(2.11) MAXSEG
Every segment of the input has a correspondent in the output.
No repair involves changing the major articulator: Labials are changed to labials, coronals to coronals, and dorsals to dorsals. This can be modelled with an IDENT constraint:

(2.12) IDENT[PLACE]
If an underlying and a surface segment are in correspondence, they share the same major articulator.
2.3.2.2.2. Coronal places of articulation

Of the four coronal stop places, only the alveolar is used in English CVs. The lack of retroflex and dental stops can be seen as the result of high-ranked markedness constraints which are well motivated typologically. In Maddieson's genetically and geographically balanced sample of 317 languages, over 99%
had a dental or alveolar stop, while only 11.4% had a retroflex stop (1984: §2.4). In the
same sample, 266 languages (84%) had a non-retroflex voiceless fricative, while only 17
(5.4%) had a retroflex voiceless fricative. For voiced fricatives the numbers were 96 (30%)
and 3 (1.0%) respectively (1984: Table 3.2). The markedness of retroflexes can be
modelled by a markedness constraint *RET, which awards one mark for each segment that
is [-ant, -dist].
(2.13) *RET
*[-ant, -dist]
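As a quick check on the arithmetic of the proportions quoted above (the counts are those cited from Maddieson 1984; the script itself is a throwaway sketch):

```python
# Verifying the quoted percentages against the raw counts,
# out of Maddieson's 317-language sample.
SAMPLE = 317
counts = {
    "non-retroflex voiceless fricative": 266,   # quoted as 84%
    "retroflex voiceless fricative":      17,   # quoted as 5.4%
    "non-retroflex voiced fricative":     96,   # quoted as 30%
    "retroflex voiced fricative":          3,   # quoted as 1.0%
}
for label, n in counts.items():
    print(f"{label}: {100 * n / SAMPLE:.1f}%")
```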
It is unusual for a language to contrast dental and alveolar place; the [+anterior] stops are either laminal or apical. The dental stops [t̪ d̪] I assume are ruled out by a blanket constraint against the dental place of articulation, operative also in other languages which favor the alveolar place:

(2.14) *DENTAL
*[+Cor +ant +dist]
On the basis of loan-word phonology, I will assume that both retroflexes and dentals are repaired to alveolars. Alveolars remain alveolar, and palato-alveolars remain palato-alveolar (though, for reasons discussed in the next section, palato-alveolar stops gain a [+cont] specification and surface as affricates).

Table 2.15. Repair of retroflexes to alveolars (Yule & Burnell 1886, American Heritage Dictionary 2000)
Table 2.16. Repair of dentals to alveolars (American Heritage Dictionary 2000)
The repairs involve changing the [anterior] and [distributed] specifications. The problem is how to ensure that palato-alveolar inputs, and only palato-alveolar inputs, surface as palato-alveolar outputs. The solution is at hand: Under the Hall (1997) feature system, the palato-alveolars are the only coronals which are also dorsal. Changing a palato-alveolar to an alveolar, or vice versa, therefore violates the undominated IDENT[PLACE]. Since *RET and *DENTAL force retroflexes and dentals to change, while IDENT[PLACE] prevents them from becoming palato-alveolar, they surface as alveolars:
(2.17) IDENT[PLACE] » *RET, *DENTAL » IDENT[ANT], IDENT[DIST]

              IDENT[PLACE]  *RET  *DENTAL  IDENT[ANT]  IDENT[DIST]
/ɖ/    [ɖ]                   *!
    -> [d]                                     *
       [d̪]                          *!         *            *
       [ḏ]13        *!                                      *
/d̪/    [ɖ]                   *!                *            *
    -> [d]                                                  *
       [d̪]                          *!
       [ḏ]          *!
/d/    [ɖ]                   *!                *
    -> [d]
       [d̪]                          *!                      *
       [ḏ]          *!
/ḏ/    [ɖ]          *!
       [d]          *!
       [d̪]          *!
    -> [ḏ]

13 [ḏ] represents a [-ant, +dist] (palatoalveolar) stop, the stop corresponding to the affricate [tʃ].
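The evaluation in (2.17) can be checked mechanically. The encoding below (ASCII stand-ins D, d_, dj for the retroflex, dental, and palato-alveolar stops) is invented for this sketch; the constraints follow (2.11)-(2.17), restricted to the features at issue.

```python
# Feature bundles for the four coronal stops of (2.17):
# (major articulator set, anterior, distributed)
SEGS = {
    "d":  ({"Cor"},        "+", "-"),   # alveolar
    "d_": ({"Cor"},        "+", "+"),   # dental
    "D":  ({"Cor"},        "-", "-"),   # retroflex
    "dj": ({"Cor", "Dor"}, "-", "+"),   # palato-alveolar
}

def ident_place(ur, sr): return int(SEGS[ur][0] != SEGS[sr][0])
def star_ret(ur, sr):    return int(SEGS[sr][1:] == ("-", "-"))
def star_dental(ur, sr): return int(SEGS[sr][1:] == ("+", "+"))
def ident_ant(ur, sr):   return int(SEGS[ur][1] != SEGS[sr][1])
def ident_dist(ur, sr):  return int(SEGS[ur][2] != SEGS[sr][2])

RANKING = [ident_place, star_ret, star_dental, ident_ant, ident_dist]

def winner(ur):
    """Pick the candidate with the least violation profile."""
    return min(SEGS, key=lambda sr: tuple(c(ur, sr) for c in RANKING))

print({ur: winner(ur) for ur in SEGS})
# retroflex and dental inputs surface as alveolar "d";
# the palato-alveolar input survives as "dj"
```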
2.3.2.2.3. The persistence of [θ]

The fricatives [θ ð] are the only dental segments in English CVs, and the only [-strident] coronals. Two constraints work against them. One, discussed in the previous section, is *DENTAL, which militates against all dental articulations. The other is a bias against nonstrident fricatives: the English fricative inventory is rich in [+strident] fricatives ([f s ʃ]) and poor in [-strident] ones (only the comparatively rare [θ]). The situation is the same in most languages. The three most common fricative places (voiced or voiceless) are, in descending order, [s], [ʃ], and [f]. The first nonstrident fricative, in fourth place, is [x], which is half as common as [ʃ] and more than twice as common as any other nonstrident fricative (Maddieson 1984: Table 3.2). We can capture this with a markedness constraint:
(2.18) *[-STRIDENT]
*[-strident]

I will analyze the persistence of [θ] in English as preservation of the salient acoustic contrast between the [-strident] [θ] and the other, [+strident], coronal fricatives14:

(2.19) MAX[-STRIDENT]/COR
An underlying [-strident] coronal must correspond to a surface [-strident] coronal.

14 Since stops are not specified for [strident], this constraint also keeps [θ ð] from turning into stops.
(2.20) MAX[-STRIDENT]/COR » *[-STRIDENT]

/θik/         MAX[-STRID]/COR  *DENTAL  *[-STRIDENT]  IDENT[DIST]
-> [θik]                          *           *
   [sik]            *!                                     *
Since all the other [-strident]s (the labials and dorsals) are still able to change to less marked [+strident] segments, they do so:

(2.21)
/ɸækt/        *[-STRIDENT]   IDENT[STRID]
   [ɸækt]          *!
-> [fækt]                          *

This leaves the dental fricatives as the only possible dentals and the only possible non-strident fricatives.
2.3.2.2.4. Dorsal places of articulation: palatals and velars

Of the dorsal places, English CVs have only velar articulations.15 Palatals are repaired to velars. Typologically, palatals are much rarer than velars: Maddieson found palatal stops in only 18.6% of his sample, though over 99% had a velar stop (1984: §2.4). A voiceless palatal fricative occurred in only 16 of the languages, or 5.0%, and a voiced palatal fricative was found in only 7, or 2.2%. By way of comparison, voiceless and voiced palatoalveolar fricatives turned up in 146 (46%) and 51 (16%) (Maddieson 1984: Table
15 The allophonically palatalized velars found before front vowels, as in key, are not as far fronted as phonemic palatals in languages that have them (Keating & Lahiri 1993). We regard them here as velars.
3.2). We capture this with another context-free markedness constraint, *PAL, which gives a mark to each [+cons, +Dor, -back] segment:

(2.22) *PAL
*[+cons, +Dor, -back]

This constraint will cause underlying palatals to become velars (violating low-ranked faithfulness to [back]):

(2.23)
/ca/          IDENT[PLACE]   *PAL   IDENT[BACK]
   [ca]                       *!
-> [ka]                                  *
   [ta]            *!
Since palato-alveolars are also dorsal and [-back], they meet the structural description of *PAL. However, IDENT[PLACE] protects them from losing their [Cor] specification:

(2.24)
/ḏa/          IDENT[PLACE]   *PAL   IDENT[BACK]
   [ca]            *!          *
   [ka]            *!                    *
-> [ḏa]                        *
2.3.2.2.5. Labial places of articulation: bilabials and labiodentals

Since no language is known to contrast bilabial and labiodental stops, they are not given different representations here; a single labial stop serves for both. Bilabial and labiodental fricatives are featurally distinct, the bilabials being [-strident] while the labiodentals are [+strident]. The result of this, as we saw in (2.21), is that underlying bilabial fricatives surface as labiodentals.
2.3.2.2.6. Affricates

It is very common for a language to have affricates at all and only those places where it has no stops. The most common pattern - found by Maddieson in 86 out of 317 languages, or 27% - is the English one of stops at the labial/alveolar/velar places and affricates at the palatoalveolar place (Maddieson 1984: §2.5). The effect of this is to disperse the [-cont] segments as widely as possible in articulatory and acoustic space, with one segment being made by the lips, one by the tongue tip, one by the tongue blade, and one by the dorsum.
Affricates are, typologically, more marked than fricatives, which are more marked than stops. Every one of the languages in Maddieson's sample had stops, and most had two stop series. All of the 451 languages in the UPSID database have stops; 413 have fricatives. The palato-alveolar place, however, is more hospitable to affricates than to stops. Maddieson's Tables 2.5 and 2.8 make this clear:
Table 2.25. Number of languages with stops at given places in the sample of Maddieson (1984: Table 2.5)

Table 2.26. Frequency of the most common affricates in the sample of Maddieson (1984: Table 2.8)
At the palato-alveolar place, and only there, the usual position in a stop series is filled with an affricate. Affricates are more marked than stops, except at the palato-alveolar place, where the opposite is true. The affricate, it appears, is the unmarked [-cont] obstruent at that place. I express this with a biconditional markedness constraint:

(2.27) AFFR/PALAL
An obstruent should be an affricate if and only if it is palato-alveolar.

13 Maddieson's * indicates "dental or alveolar (combined)". The non-IPA symbols are Maddieson's.
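Because (2.27) is a biconditional, it penalizes both directions of mismatch. A minimal sketch, using an invented (manner, place) encoding purely for illustration:

```python
# AFFR/PALAL: an obstruent is penalized if it is an affricate at a
# non-palato-alveolar place, or a non-affricate obstruent at the
# palato-alveolar place. Encoding is a toy (manner, place) pair.
def affr_palal(segment):
    manner, place = segment
    is_affricate = (manner == "affricate")
    is_palatoalveolar = (place == "palatoalveolar")
    return int(is_affricate != is_palatoalveolar)   # XOR = biconditional violated

print(affr_palal(("affricate", "palatoalveolar")))  # 0: [tʃ] satisfies it
print(affr_palal(("affricate", "labial")))          # 1: *[pf]
print(affr_palal(("stop", "palatoalveolar")))       # 1: palato-alveolar stop
print(affr_palal(("stop", "alveolar")))             # 0: [t]
```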
In English, illegal labial and alveolar affricates are repaired by converting them into fricatives: pfennig [f], tsunami [s], Zeitgeist [z], czar [z] (Jones 1997). The illegal palatoalveolar stops are repaired by affricating them: Magyar [dʒ]. The two processes are summarized in (2.28):

(2.28)
Labial     pf -> f
Alveolar   ts -> s
In the labial and alveolar cases, the marked affricate is deaffricated to a fricative by deleting [-cont], rather than to a stop by deleting [+cont]. Some faithfulness constraint must be blocking the deletion of [+cont] but not of [-cont]. We will take it to be MAX[+CONT], which gives a mark to each corresponding segment pair where the underlying segment has [+cont] but the surface segment does not:

(2.29) MAX[+CONT]
An underlying [+cont] segment must correspond to a surface [+cont] segment.

(2.30) MAX[-CONT]
An underlying [-cont] segment must correspond to a surface [-cont] segment.
(2.31) MAX[+CONT], AFFR/PALAL » MAX[-CONT]

/pfenik/      MAX[+CONT]   AFFR/PALAL   MAX[-CONT]
   [pfenik]                     *!
   [penik]        *!
-> [fenik]                                    *
The palato-alveolar stops change manner rather than place of articulation so as not to violate the undominated IDENT[PLACE]. (The palato-alveolar place is the only one which is both coronal and dorsal, so that any change of place changes a major articulator.) They affricate instead:

(2.32)
/maḏar/       IDENT[PLACE]   AFFR/PALAL   MAX[-CONT]
   [madar]         *!
   [maḏar]                        *!
-> [madʒar]
   [maʒar]                                      *!

A dorsal fricative, finally, cannot survive as a fricative, being ruled out by *PAL and *[-STRID]; it hardens to a stop instead:
(2.33) *PAL, *[-STRID] » MAX[+CONT]

     /aiçman/   *PAL   *[-STRID] | MAX[+CONT]
     [aiçman]    *!       *      |
     [aixman]             *!     |
     [aicman]    *!              |
  →  [aikman]                    |    *
The grammar we have established has the rankings shown in (2.34).

(2.34)  *[-STRIDENT], *DENTAL » AFFR/PALAL » MAX[-CONT]
2.3.2.3. *[sɹ]
With the basic inventory accounted for, we are now ready to turn to the first of the
two onset conditions used in the experiments. Table 2.10, repeated here, can be compared
with Table 2.35, which shows the observed inventory and repairs in C[ɹ]V syllables.
Place               Stop   Affricate   Fricative   [strident]
Labial              p      pɸ          ɸ           -
Labiodental         p̪      pf          f           +
Dental/Interdental  t̪      t̪θ          θ           -
Alveolar            t      ts          s           +
Retroflex           ʈ      ʈʂ          ʂ           +
Palatal             c      cç          ç           -
Velar               k      kx          x           -
Note: The white segments are found in surface CV environments. Underlying gray
segments are mapped to the white segment in the same enclosing box.
Table 2.35. English surface obstruent inventory in C[ɹ]V syllables
[Table: as Table 2.10, but with the four coronal places merged into two groups;
all coronals except [θ] surface with the minor place features [-ant +dist -back].]
Note: The white segments are found in surface C[ɹ]V environments. Underlying gray
segments are mapped to the white segment in the same enclosing box.
The only differences between permissible Cs in CV and C[ɹ]V syllables are in the
coronals. Four formerly separate groups have been merged into two, so that all coronals
(except [θ], immune as usual) have the same minor place features: [-ant +dist -back].
Since these are exactly the coronal features of [ɹ], the merger is naturally understood as
14 The phonetic effect of [ɹ] on a preceding /t/ is variously described by different authors. I discuss here the
dialect of Hammond (1999:101) and myself, in which the pre-[ɹ] /t/ has a distributed palato-alveolar
articulation, [t̠]. Others, such as Olive et al. (1993), say that the articulation is an apical, retroflexed [ʈ].
Given the wide variation in how speakers articulate [ɹ], the difference may be due to the spread of different
features: the first [ɹ] being [-ant, +dist] and the second being [-ant, -dist]. If so, one would expect that [ʈɹ]
speakers would have a retroflex, rather than a palato-alveolar, articulation for /sɹ/, so that it would be
pronounced [ʂɹ]. The acoustic contrast between [t̠] and [ʈ], or between [ʃ] and [ʂ], will in any case be
difficult to hear before [ɹ] owing to the muffling effect of lip rounding and the lowering of formant
frequencies.
(2.35) SPREAD[COR]
Neighboring [+cor] segments should have the same value of [ant], [dist],
and [back].
violated since coronals either lack a [back] node entirely or are [-back]. Since [θ] is
(2.36) a.
     /sɹebɹenitsa/   *[-STRIDENT]   SPREAD[COR]
     [sɹebɹenitsa]                      *!
  →  [ʃɹebɹenitsa]        *             *

b.
     /θɹed/          *[-STRIDENT]   SPREAD[COR]
  →  [θɹed]                             *
     [ʃɹed]               *!            *
The illegality of [sɹ] arises from its failure to obey SPREAD[COR]: The [s] should
A conceivable repair to [sɹ] which is not in fact used is epenthesis; Sri Lanka does
not become [səɹi] Lanka the way that tmesis becomes [təmisis]. This shows that the anti-
(2.37)
     /sɹi/     SPREAD[COR]   DEP | IDENT
     [sɹi]         *!            |
  →  [ʃɹi]                       |   *
     [səɹi]                  *!  |
The grammar is shown in (2.38) (lines indicate only those rankings proven above):

(2.38)
     AFFR/PALAL
         |
     MAX[-CONT]
2.3.2.4. *[tl]
The second commonly-used environment is the C[1]V syllable. Here, all of the
coronals are excluded from appearing in C position except the strident fricatives.
Place               Stop   Affricate   Fricative   [strident]
Labial              p      pɸ          ɸ           -
Labiodental         p̪      pf          f           +
Dental/Interdental  t̪      t̪θ          θ           -
Alveolar            t      ts          s           +
Retroflex           ʈ      ʈʂ          ʂ           +
Palatal             c      cç          ç           -
Velar               k      kx          x           -
Note: The white segments are found in surface CV environments. Underlying gray
segments are mapped to the white segment in the same enclosing box. The repair for [tl] is
shown as [k], but this is not known with certainty.
The illegality of [tl dl] in an onset can be linked to the fact that both are coronal and
[-cont], through the Relativized Obligatory Contour Principle (Selkirk 1991, Padgett 1991):
verb root may contain two successive coronals, but only if they disagree in continuancy. Thus
/sVtVq/ is permissible, but /sVθVq/ or /tVdVq/ is not (Yip 1989). However, there are no
Segments which are too different in stricture do not interact in place. (In feature-geometric
The effect of the Relativized OCP can be seen diachronically in English, in the
progressive loss of the coronal [j] in onsets after coronals and before [u]. It was first lost
from [ɹju], as in rude, rule, then from [lju] as in lute, dilute, and is now being dropped from
[nju] (news), [sju] (suit), and (in the most advanced dialects) [dju] and [tju] (duke, tune)
(Trudgill 1999:56-59). The more similar the preceding consonant is to [j] in stricture, the
We can see the ban on [tl dl] as a similar phenomenon. English allows [pl bl kl ɡl],
where the place of articulation differs between stop and liquid even though both are
noncontinuant. It allows [tɹ dɹ] and [sl], where both segments are coronal but only one is a
non-continuant. What is forbidden is two successive consonants with the same articulator
and the same value of [cont]: [tl dl].18 This is modelled as one of the family of Relativized
OCP constraints:
18 English does have [sl], [sn], and [st] onsets, which appear to violate the ban on [coronal][coronal]
sequences. However, initial [s] is exceptional in another respect: Unlike all other consonants, it can
precede a less sonorous segment. In fact, [s] can be added to any legal onset except a fricative or affricate,
and all three-consonant onsets are so formed. The [s] neutralizes the [voice] contrast in a following stop,
and palatalizes to [ʃ] before [ɹ], but otherwise does not interact with the rest of the syllable. These facts are
ordinarily analyzed by positing a reserved structural slot for [s] at the left margin of the syllable, outside of
the onset (e.g., Kenstowicz 1994:258; Borowsky 1986:175-179). This account is corroborated by the
coronal fricative [θ], which cannot occupy the [s] slot and thus is subject to the [coronal][coronal] ban:
[θl], [θn], and [θt] are impossible onsets.
There is little evidence as to the nature of the repair to the illegal sequence. It is
possible that /tl/ is repaired by epenthesis, or by making the [l] syllabic, and thereby
separating its coronal articulation from the [t] (Sproat & Fujimura 1993). Another
possibility is that the /t/ is realized as [k], as in the attested pronunciation [kliŋɡit] for
Tlingit.19 It has been shown that French listeners strongly tend to misperceive the illegal [tl]
     /tliŋɡit/    OCP(CONT,PL)   DEPSEG   IDENT[PLACE]
     [tliŋɡit]        *!
     [təliŋɡit]                    *!
  →  [kliŋɡit]                                 *

     /tliŋɡit/    OCP(CONT,PL)   IDENT[PLACE]   DEPSEG
     [tliŋɡit]        *!
  →  [təliŋɡit]                                    *
     [kliŋɡit]                        *!
2.3.2.5. ??[pw]
The phonological status of initial [pw] in American English has never been fully
clarified. The [pw bw mw] onsets are often described as marginally acceptable by
permissible English onset; Catford (1988) and Hammond (1999) consider it marginal, like
For/pw/ the example I have long used is puissant, attested for 1450 and
occurring once per million words (1/M20) or within the first 20,000 words of
the language, but pueblo (1818) is more frequent (2/M including the place
name) and is usually cited in whatever lists include this item. Both words
are pronounced as indicated, although they do have alternative
pronunciations not pertinent to our list. The word bwana, included by Hill
and others, is rare, but /bw-/ is frequent in Buenos Aires (1/M) in both
American English and RP (Hultzen 1965:12).
Woolley points out that low frequency cannot be the sole criterion of phonotactic
badness:
Initial /pw, bw, zw, mw/ pose a more difficult problem. As Hultzen has
shown, puissant, dating from 1450, can hardly be rejected. To appeal to the
low frequency of occurrence of these clusters in order to reject them would
be to lose the natively English initial /0w/ as well (1970:74).
More modern frequency counts show that initial [θw] is more common than [pw]
(Celex combined written and spoken, EFW.CD/EPW.CD: 6 per million vs. 1 per million;
Francis-Kucera: 4 per million versus 0 per million), but the point is well taken. Hammond
(1999) considers initial [pw] and [bw] to be of the same degree of marginality as [dw]
(Celex has only dwarf, dwell, dwindle, and derivatives) and [θw] (Celex has only thwack,
thwart).
There are two reasons to think that in English, less weight is laid on the [pw bw] ban
than the [tl dl] one. First, English [ɹ] is labial (Delattre & Freeman 1968), so the legal,
frequent onsets [pɹ bɹ fɹ] violate the same constraint as [bw]. Second, the ban on same-
place CC sequences is, cross-linguistically, stronger the more similar the two Cs are in
sonority (Selkirk 1988; Padgett 1991). Since [l] is less sonorous than [w] (Kahn 1980;
20 Frequency counts in the quoted passage are from Thorndike & Lorge (1944).
Guenter 2000), the [dl] sequence is closer in sonority than [bw] and hence a worse
structural violation.
If we assume that [pw bw mw] are actually illegal in English (that they are
phonological rather than lexical gaps), then the illegality must be due to some active
markedness constraint. It has been proposed by Clements & Keyser (1983) that English
actively prohibits [labial][labial] sequences in the syllable onset. Again we have an effect of
the OCP:
(2.44) OCP(LAB)
Adjacent labials are prohibited.
Since [p b] and [w] have different values of [cont], this constraint must be an
unrelativized OCP constraint, unlike the OCP(CONT, PL) forbidding [tl dl]. In order for
the constraint to be active, it must be that an underlying
[labial][labial] sequence is realized in some other way, repairing the violation. Already we
can see that all is not well with this analysis: What is the repair? Jones (1997) gives
faithful pronunciations for most word-initial [labial][labial] sequences, but some have
The repairs are unsystematic and look suspiciously like spelling pronunciations - as
differing frequency of the sequences contained in them, even when the sequences are
phonotactically legal and attested (Coleman & Pierrehumbert 1997; Frisch et al. 2000). The
phonological gaps. If [pw] is legal in English, but very rare, it may be judged unacceptable
on the grounds of low phonotactic probability alone, even though [pw] words are possible.
The onsets [tl] and [sɹ] will be judged worse, being both illegal and rare. This would
account for the pattern of judgments and attestations reported in the phonological literature.
2.4. Summary
same grammar. Systematic, productive gaps in the inventory and in the set of
phonotactically permissible combinations arise from the filtering effect of the grammar on
the unconstrained set of possible inputs. A productive, phonological gap in the set of
observed surface forms is one which could not be the output of any conceivable input from
the lexicon. A nonproductive, lexical gap is one which could be filled by some input from
the lexicon, but where the necessary input (through historical accident) is not a lexical item.
Two highly productive gaps in the set of English syllable onsets were discussed:
That on initial [sɹ], and that on initial [tl]. Both were shown to be special cases of
and the Obligatory Contour Principle (OCP), drive similar processes in many non-English
languages. The gaps in the English syllable inventory were analyzed as phonological gaps,
A systematic gap, but of doubtful productivity, was also analyzed: The partial ban
however, the lack of productivity suggests that this constraint is not ranked high enough to
CHAPTER 3
3.1. Introduction
This chapter discusses three models of phoneme perception: the TRACE model
(which puts all phonotactic effects in the lexicon), the transitional-probability model of Pitt
and McQueen (1998) (which assigns them to a statistically sensitive prelexical module), and
lexicon can directly influence the perception of phonemes. In TRACE, a fully or partially
effects on speech perception are taken to arise from lack of lexical support for non
The TRACE network is described by McClelland & Elman (1986); I will briefly
summarize what they say, but for full details the reader is referred to the original paper.
Each unit is a "detector" representing a hypothesis about the utterance — that it begins with
a voiced sound, that it begins with a [j], that it contains the word "yard", and so forth. The
detectors are organized into three layers, corresponding to features, phonemes, and words.
The "activation level" of a unit is a nonnegative number which varies over time in response
to the unit's inputs. It tells how much credence the model puts in that hypothesis at the
moment.
Figure 3.1. The TRACE model of McClelland and Elman (1986)
[Diagram: three layers of detectors, for words, phonemes (e.g., [x]), and features (e.g., [-voice]).]
A unit receives input from all other units with which it is connected. The input
which Unit A contributes to Unit B depends on Unit A's activation and another parameter,
the "strength" of the A-to-B connection. All connections go both ways, so a large positive
strength means A and B strongly excite each other, while a large negative strength means
they strongly inhibit each other. Units on the same level inhibit each other, so that more
confidence in the "plastic" hypothesis means less confidence in the "lap" hypothesis (and
vice versa). Connections between levels are excitatory, so that more confidence in "plaid"
means more confidence that the word starts with [p] (and vice versa). The strength of the
connections is set by the experimenters; TRACE is not a learning model (McClelland &
Elman 1986).
At the very bottom of the model are the acoustic feature detectors, which receive
inputs not only from other units but from outside the model. Each detector is responsible
for a particular feature ([voice], [acute], etc.) at a particular time in the utterance. As the
utterance unfolds moment by moment, the feature detectors register its acoustic properties
and adjust their activation levels accordingly. Activation spreads upwards through the
network. It also spreads back down from the word detectors to the phoneme detectors, and
from them to the feature detectors. Meanwhile, the units on each level are trying to inhibit
each other.
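These dynamics (between-level excitation, within-level inhibition, bounded activations) can be sketched in miniature. The toy network below is written for this exposition; its units, connection weights, and update rule are simplified assumptions, not McClelland & Elman's actual parameters:

```python
# Minimal interactive-activation sketch: two word units (PLASTIC, LAP)
# compete via lateral inhibition, while a phoneme unit [p] excites, and
# is excited by, the word containing it. Activations are clamped to
# [0, 1]: positive net input moves a unit toward its ceiling, negative
# net input toward its floor, so units near the bounds respond less.

def step(act, weights, external):
    new = {}
    for unit in act:
        net = external.get(unit, 0.0)
        for (src, dst), w in weights.items():
            if dst == unit:
                net += w * act[src]
        if net > 0:
            new[unit] = act[unit] + net * (1.0 - act[unit])
        else:
            new[unit] = act[unit] + net * act[unit]
        new[unit] = min(1.0, max(0.0, new[unit]))
    return new

act = {"p": 0.0, "PLASTIC": 0.0, "LAP": 0.0}
weights = {
    ("p", "PLASTIC"): 0.4, ("PLASTIC", "p"): 0.4,       # between-level excitation
    ("PLASTIC", "LAP"): -0.3, ("LAP", "PLASTIC"): -0.3, # lateral inhibition
}
external = {"p": 0.3, "LAP": 0.1}  # acoustic evidence favours [p]
for _ in range(20):
    act = step(act, weights, external)
print(sorted(act.items()))
```

After a few cycles PLASTIC, fed by [p], pulls ahead and suppresses LAP despite LAP's small acoustic support, illustrating how top-down and lateral connections can override bottom-up evidence.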
TRACE assumes that the units are open to conscious introspection: To detect X, the
listener uses the X detector unit. Responses to a phoneme-monitoring task, for instance,
depend on the activation levels of the phoneme units. Responses to a word-recognition task
depend on the activation levels of the word units. Because activation spreads downwards
and inhibition spreads sideways, a unit's activation depends not just on the acoustical
configuration which it is nominally supposed to detect, but on the state of the rest of the
network. Under the right circumstances, the result can be strong activation (inhibition) of
the X detector despite the absence (presence) of evidence for X in the acoustic signal — a
perceptual illusion.
TRACE puts phonotactic illegality in the lexicon. Legal and illegal sequences are
processed differently because the legal ones receive support from lexical items containing
them, while the illegal ones do not (since, by definition, they do not appear in any words) -
that is, instead of punishing illegality, TRACE rewards legality. TRACE thus cannot
distinguish illegal sequences from other sequences of zero frequency. Any behavioral
differences between processing of zero-frequency legal sequences and illegal sequences (if
they can be shown to exist) would have to be explained by something outside of the
TRACE system.
3.2.2. Lexical effects on phoneme perception
The lexicon can certainly influence performance on tasks that are intended to tap
phoneme perception, lending credence to the TRACE approach. Evidence comes from four
major paradigms:
Phoneme detection (Foss 1969). Subjects listen to each stimulus and respond "yes"
if it contains a target phoneme specified in advance (e.g., by a
letter). The usual dependent measure is RT for correct detections; error rates are < 10% and
not useful.
Phoneme categorization. The
stimulus is acoustically ambiguous (e.g., between [bin] and [pin]); subjects are asked which
one it sounds more like. Dependent measures vary; a common one is the point (e.g., on the
VOT continuum) at which both judgments are equally likely. RT is also measured, and
Phonemic restoration. Part of the stimulus
has been either replaced by noise or obscured by noise, and the subject has to say which.
Dependent measures are signal-detection-theoretic d-primes and betas. The effects are
robust; performance is not improved by 10,000 trials of practice, nor by any but the most
Shadowing (Cherry 1953). Subject hears speech over headphones, and has to
repeat it in as close to real time as possible. Various dependent measures evaluate how well
activation or inhibition from the word units to the phoneme units (McClelland & Elman
1986). Phonetic data extracted from the speech stream is only one influence on the
phoneme units; it can be drowned out by powerful signals from above which bias a
phoneme unit so strongly in one direction that conflicting information from below is not
enough to offset it. The phoneme unit's activation level is trapped between a fixed minimum
and maximum, and becomes less responsive to its inputs the closer its activation level is to
the floor or ceiling; hence, strong excitation or inhibition from above can also reduce a
The three lexical factors known to influence phoneme tasks are lexicality, frequency,
and uniqueness point.
Lexicality and frequency. Effects are rather fragile for many paradigms. The only
consistently robust one is the Ganong effect: when a stimulus is ambiguous
between a word and a nonword owing to ambiguity in one phoneme, the phoneme tends to
be heard so that it makes the word (Ganong 1980, Fox 1984, Connine & Clifton 1987,
McQueen 1991, Pitt & Samuel 1993; not replicated by Burton, Baum, & Blumstein 1989).
The same effect was observed in shadowing by Marslen-Wilson (1984): Subjects "fluently
restored" mispronounced words, apparently without noticing the discrepancy (i.e., with no
effect on shadowing latency). There is at least one report that a one-phoneme ambiguity
between a common and a rare word tends to be resolved in favor of the common word.
However, the effect can be reversed (to favor the rarer word) by setting up the experiment
so that the less common word tends to be the right answer (Connine, Titone, & Wang
1993).
Phoneme targets may be detected faster in words (Rubin, Turvey, & van Gelder
1976, Cutler, Mehler, Norris, & Segui 1987; not replicated by Foss, Harwood, & Blank
1980, Frauenfelder, Segui, & Dijkstra 1990). Word initial phonemes may be quicker to
detect in common than rare words (Morton & Long 1976, Dell & Newman 1980; not
replicated by Segui & Frauenfelder 1986). Phoneme restoration may be stronger in words
than in nonwords (Samuel 1981a, 1996; not replicated by Samuel 1987), and in common
downward spread of lexical activation. The ambiguous acoustic stimulus excites, say, [t]
and [d] equally in the context yar_. Since the YARD unit is somewhat activated by the
context, it contributes activation to [d]; lacking stimulation from a *YART, [t] is overtaken
by [d]. The YARD and [d] units keep exciting each other and inhibiting [t] until [t] is
completely overwhelmed. A stimulus ambiguous between two nonwords, like sirt and sird,
would favor no word node over any other, and would be decided on the basis of the acoustic
evidence (McClelland & Elman 1986). Similar reasoning would apply if YART were a real
activation spreading causes the activation of the relevant phoneme unit to reach response
criterion sooner.
from the phoneme) decreases later in real words but not nonwords (Marslen-Wilson 1984,
Frauenfelder, Segui & Dijkstra 1990, Wurm & Samuel 1997); effect size varies from -30
ms to -300 ms. Phoneme restoration is stronger late in words with early uniqueness points
than late in words with late uniqueness points (Samuel 1987). Shadowers restore
mispronunciations more later in the word (Marslen-Wilson & Welsh 1978). Response
time to reject a nonword is a fixed amount, measured from the earliest point where the
Strong support for the COHORT word-recognition model comes from these effects,
which are hard to explain in other theories, and it is a great virtue of TRACE that its
connectionist architecture reproduces them. Simulations show that an active word unit is
quickly extinguished when mismatching acoustic information comes in, provided that a
better-matching word unit is present (McClelland & Elman 1986). Shortly after the
uniqueness point, only the matching word unit is still active, strongly inhibiting all rivals and
exciting phoneme units consistent with it (which in turn inhibit phoneme units that are
not). Late-arriving acoustic information
will have a hard time changing the network's mind about the word or the phonemic code.
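The uniqueness point invoked above can be computed directly from a lexicon. The sketch below is written for this exposition; its tiny word list is invented, and a word's segments are treated as letters for simplicity:

```python
# Uniqueness point: the earliest position at which a word's initial
# substring matches no other word in the lexicon (1-based index).

def uniqueness_point(word, lexicon):
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if not any(w.startswith(prefix) for w in others):
            return i
    return None  # the whole word is a prefix of another lexical item

lexicon = ["trespass", "tress", "treble", "trend"]
print(uniqueness_point("treble", lexicon))    # -> 4 ("treb" is unique)
print(uniqueness_point("trespass", lexicon))  # -> 5 ("tresp" is unique)
```

A cohort-style recognizer can commit to a candidate as soon as this point is passed, which is why effects time-locked to the uniqueness point are taken as evidence for COHORT-like dynamics.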
3.2.3. Phonotactic effects on phoneme perception
In the previous section we saw that a listener's performance on a phonemic task can
be influenced by the actual words of their language. Evidence also
exists that it can be influenced by their knowledge of the possible words of their
language.
Massaro and Cohen (1983) created segments ambiguous between [ɹ] and [l] by
varying F3, and asked subjects to judge them in the contexts [t_i], [p_i], [v_i], and [s_i]. In
English, only [ɹ] is permissible after [t], only [l] is permissible after [s], both can follow [p],
and neither can follow [v]. The ambiguous segments were most likely to be judged [ɹ] in
[t_i], less likely in [p_i], less likely still in [v_i] and [s_i] (as shown in their Figure 3.1).
Despite the lexical confounds — [tɹi], [pɹi], and [pli] are words — the evidence of [v_i] and
[s_i] suggests that judgments are altered by people's knowledge that [sɹi] can't be English:
The larger number of [ɹ] judgments after [v] than [s] cannot be due to acoustic-phonetic
factors — if anything, the labial [v] should make the following F3 sound higher and the
ambiguous segment more [l]-like — nor to lexical ones, since [sli] is not a word - which
leaves phonotactics.
The TRACE model has a good explanation for how phonotactics exerts this
influence. McClelland & Elman (1986) found that the ambiguous stimulus [s?i] partially
activates similar words in the lexicon of their simulated word recognizer. Since the lexicon
contains only phonotactically permissible words, the units for sleep, sleet, and so on
become active, feeding excitation back to [l], but no countervailing sreep or sreet assists [ɹ].
The amount of acoustic support that [l] needs to reach criterion is thus reduced in the
context [s_i] compared to a neutral context like [v_i] in which there are no lexical items
3.2.4. Empirical shortcomings of TRACE
Since TRACE models many different things, it has been criticized on many different
perception, see McQueen, Norris, & Cutler (1999). Most important, for our study, is that
the phonotactic effect is usually larger and more robust than the lexical effect. This is very
The evanescence of lexical effects came up in §2.2. In a lengthy study, Cutler et al.
(1987) found that they could make word-superiority effects come and go by boring the
listeners less or more. A varied stimulus set, containing mono- and disyllables, got the
lexical effect; a monotonous one did not. They concluded that the lexical effect was not
automatic, but depended on listeners' allocation of attention between the lexical and the
prelexical levels of representation. For a detailed review of such results from several
Phonotactic effects, by contrast, are robust and not affected by stimulus monotony.
The original Massaro & Cohen (1983) experiment got very large effects with a
monosyllabic stimulus set repeated for 1120 trials (total over two days). Pitt (1998) got
several large phonotactic effects with monosyllabic stimuli. Moreton & Amano (1999)
directly compared the effect of lexical status on the Japanese vowel-length boundary with
that of phonotactics using the same subjects and paradigm. They found a large phonotactic
In other words, manipulations that make the lexical effect go away can still leave a
phonotactic effect. This is a problem for any theory which, like TRACE, denies a
1 McClelland and Elman (1986) report an experiment which suggests that the lexical effect and phonotactic
effects can be superimposed. They compared listeners' judgments of a segment [?] between [b] and [d] in the
contexts _windle, _wiffle, and _wacelet. The highest rate of "d" response was obtained in the _windle
context, where dwindle is a word; an intermediate rate was found in _wiffle, where neither endpoint is a
word; and the lowest rate was found in _wacelet, where neither endpoint is a word but bwacelet is very
In TRACE, the phonotactic effect could be stronger because it combines the effects
of many lexical items, while the lexical-superiority effect depends on a single item.
However, the TRACE authors have shown that in fact the lexical-superiority effect is
stronger: When the network is presented with an ambiguous phoneme between [p] and [t]
in the context [_ɹuli], it is classified as [t], with the lexical influence of truly winning out over
3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)2
segments. Different segment sequences, such as diphones or triphones, occur with different
frequencies. A model which is sensitive to these frequencies can compare the statistical
Such statistical information can also in principle be used to find word boundaries, and serve
There is evidence from various sources that listeners are sensitive to sequence
sequences are rated as "more English-like" by native speakers than those containing low-
probability sequences, and that, when subjects are asked to construct portmanteaus by
blending two nonwords, low-frequency sequences tend to be broken up more often than
high-frequency ones. Frisch et al. (2000) showed that listeners' "wordlikeness" judgments
were very strongly affected by the frequency of legal phoneme sequences contained in the
stimulus. Vitevitch et al. (1997) found that nonwords containing frequent sequences were
similar to one. They interpreted this as a lexical effect superimposed on a phonotactic bias against *[bw] -
i.e., even though all the contexts were phonotactically biased, a lexical effect was still obtained.
Since all of the contexts started with _w, the experiment did not demonstrate a phonotactic bias;
its presence was simply assumed. As shown in Chapter 4, English listeners' bias against [bw] is weak if it
exists at all. Hence, the experiment may just have measured an isolated lexical effect. It could still be true
that a really strong phonotactic effect, like the bias against initial [dl], would swamp any lexical effect.
2 The MERGE framework is described most completely in Norris et al. (2000). The first discussion of it
in connection with transitional probabilities is Pitt and McQueen (1998).
rated as "more English-like" and were repeated faster in a single-word shadowing task than
language even if the linguistic input is an unattended background stimulus (Saffran et al.
Some evidence that the sequences are encoded separately from the lexicon comes
from work by Vitevitch & Luce (1998, Experiment 1). They constructed lists of disyllabic
English words and nonwords which varied in sequence frequency, so that some items were
"high probability" and some were "low probability". When pairs of nonwords were
presented, high-probability pairs were responded to faster
than low-probability pairs. When pairs of words were presented, however, the pattern was
reversed: faster for low-probability, slower for high-probability. The authors' interpretation
is that high-probability words and nonwords are both facilitated by the frequency of their
sublexical sequences, but the high-probability words are more strongly inhibited by
competition from their many lexical neighbors. Further evidence is provided by Pitt &
McQueen (1998), who found that a lexically driven bias in the identification of an ambiguous fricative
(the Ganong effect) did not induce compensation for coarticulation in a following
Where TRACE seeks to explain phonotactic illegality as a gap in the lexicon, the
probabilistic theory treats it as a limiting case of rarity: a
phonotactically illegal configuration is one which has zero frequency (Pitt & McQueen
1998:349). Like the TRACE account, a probabilistic theory predicts (1) that illegal
sequences are only slightly different from rare sequences, and (2) that all zero-frequency
theory, namely, that of Pitt & McQueen (1998), because it is the one which was designed
for problems of ambiguous phoneme perception. The authors actually define a class of
probabilistic theories, rather than a specific one: certain manipulable parameters are left
unfixed. This section will try to narrow down the range of possible implementations on the
basis of existing data, so that the remaining possibilities can be tested experimentally.
The rest of this section is organized as follows: §3.3.1. illustrates the functional
utility of statistical knowledge. §3.3.2. describes the range of possible probabilistic theories
in the Pitt-McQueen class. §3.3.3. discusses these theories with respect to the existing data.
Transitional probabilities, even for very short sequences, greatly constrain the hypothesis space which
the listener must search. To illustrate this, let us consider a model whose task is to listen to
a list of isolated words drawn from the Celex English lemma database. The words occur
with their Celex spoken corpus frequencies. Every so often, one of the words is truncated at
a random location at least n segments into the word, and the model is asked to predict the
next segment (word boundaries are counted as segments). For this task, the model has
available only a table of transitional probabilities: for each string of n segments, it knows
the likelihood that the n+1st will be [a], [t], etc. An example, for n = 2, is shown in Table
3.2:
Table 3.2. Probability that a given diphone will be followed by a given segment (extract
from complete table)

Context   Next segment   Probability
[wa]      n              0.97
[wa]      r              0.03
[wa]      s              0.00
[vz]      )              0.99
[vz]      d              0.01
[væ]      k              0.07
[væ]      l              0.62
[væ]      m              0.01
[væ]      n              0.25
Note: "(" and ")" mark word boundaries.
To predict which segment will follow a given context, the model's best strategy is to
always guess the segment that is most frequent after that context, since that maximizes its
expected success rate. To estimate the success of this strategy, words were randomly chosen from the Celex wordforms database (EPW.CD) according to
their frequency (combined written and spoken, which is Field 3 of EPW.CD). Initial and
final word-boundary markers were added to each word's segmental representation. From
each representation, a substring of length n + 1 was chosen at random (if the word was long
enough), and the model was asked to guess the last segment on the basis of its likelihood
given the first n. The model was credited with a correct guess if the final segment of the
substring was the best guess according to the guessing strategy (i.e., if the actual last
segment was the likeliest). The simulation3 was run for approximately 100,000 trials for each context size n:
Table 3.3. Results of the simulation: Success rate as a function of context size
n   Success rate
1   0.378
2   0.630
3   0.783
That is, knowing only the last two segments, the model will predict the next one
correctly nearly two-thirds of the time - with zero acoustic information and zero lexical
information. This is only a thought experiment, but it is close enough to both the lab and
real life to show that even a small amount of probability knowledge can be used to very
great advantage.
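The thought experiment above can be sketched in a few lines of Python. The lexicon below is an invented, tiny stand-in for the Celex database (words as segment strings, with "(" and ")" as the boundary markers of Table 3.2), so the numbers will not reproduce Table 3.3; only the procedure is the same.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for the Celex wordform list: (segment string, corpus frequency).
# "(" and ")" mark initial and final word boundaries.
LEXICON = [("(kaet)", 40), ("(kaets)", 10), ("(kamel)", 5),
           ("(taep)", 20), ("(taeps)", 8), ("(til)", 15)]

def train(lexicon, n):
    """For each n-segment context, count how often each segment follows it."""
    table = defaultdict(Counter)
    for word, freq in lexicon:
        for i in range(len(word) - n):
            table[word[i:i + n]][word[i + n]] += freq
    return table

def simulate(lexicon, n, trials=10_000, seed=0):
    """Truncate random words and guess the next segment by highest TP."""
    rng = random.Random(seed)
    table = train(lexicon, n)
    words, weights = zip(*lexicon)
    hits = total = 0
    while total < trials:
        word = rng.choices(words, weights=weights)[0]
        if len(word) < n + 1:
            continue  # word too short to supply n segments of context
        i = rng.randrange(len(word) - n)  # random truncation point
        guess = table[word[i:i + n]].most_common(1)[0][0]
        hits += (guess == word[i + n])
        total += 1
    return hits / total

for n in (1, 2, 3):
    print(n, round(simulate(LEXICON, n), 3))
```

With the real database and ~100,000 trials per context size, this procedure is what produced the success rates in Table 3.3.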
The probabilistic theory of Pitt & McQueen (1998) is, essentially, that prelexical
mechanisms are sensitive to sequence frequencies (in the equivalent form of transitional
probabilities), and that, when acoustic evidence is inconclusive, perception favors the more
likely option. This prelexical probabilistic module, along with the Shortlist model of word
recognition (Norris et al. 2000), forms part of the MERGE model, a theory of phoneme-processing tasks in which the output of prelexical phonemic processing is merged with lexical knowledge only at a late decision stage. TP effects, in this model, occur very early, and are separate from lexical effects. The relative contributions of the two can be distinguished by their effects on phoneme perception. The major adjustable parameters include: (1) location and size of the context, (2) the database from which the probabilities are computed, and (3) the guessing
strategy. This section describes the possible parameter settings, and the theories resulting
from them.
3.3.2.1. Context
Suppose the listener encounters an ambiguous segment "?", which could be either x or y, in a context A_B. How can statistical
knowledge about the frequencies of AxB and AyB be used to disambiguate it?
One way is to directly compare the likelihood of x and y in the context. The
decision depends on the conditional probabilities P(x | A_B) and P(y | A_B), where
(3.4)
(a) P(x | A_B) = F(AxB) / F(A_B)
(b) P(y | A_B) = F(AyB) / F(A_B)
Since F(string) is the frequency with which that string occurs in the database (the
listener’s experience), all the model has to do in order to make its decision is to compare
F(AxB) with F(AyB). It consults its table4 of (2n+1)-phone frequencies, where n is the
length of A and B,5 retrieves the two relevant frequencies, and hands them over to the
decision rule. This kind of context I will call surrounding context of order n.
A second possibility is to treat the left and right contexts separately. The decision
depends on the conditional probabilities P(x | A_), P(y | A_), P(x | _B), and P(y | _B), which
reduce to the frequencies F(Ax), F(Ay), F(xB), and F(yB). This I will call independent
neighboring context of order n (again assuming simplistically that A and B have equal length).
The predictive difference between these two context types is that surrounding
context (SC) can take advantage of statistical dependencies between A and B, while independent neighboring context (INC) cannot.
For example, English, like most languages, requires sonority to rise in syllable
onsets but not in codas. As a result, it lacks sequences like [tdt], [pdp], [fvp], etc., since no
matter what precedes or follows such a sequence, there is no legitimate syllabification - the
consonant in the middle is higher in sonority than either of its neighbors, but not high
enough that it can serve as a syllable nucleus itself. A model using SC of order 1 will note
the gaps in its table of 3-phone frequencies. A model using INC of order 1, though, will
miss these gaps, since each of the sub-sequences [td], [dt], etc. does in fact occur. Given a
segment ambiguous between [l] and [n] in the context [m_z], the SC-1 model will favor [l],
since [mlz] is attested (e.g., in camels), while [mnz] isn't (at least, not for speakers who
lack syllabic [n] in lemons). The INC-1 model will note only the low nonzero frequency of
[mn] (e.g., damnation, amnesia), and the high frequencies of [mz], [ml] and [lz], and treat
4 I say "table", but that is only one notational variant. They can also be viewed as sublexical-sequence
detector units whose resting activation or excitability depends on frequency. A proposal along these lines is
Vitevich and Luce (1999).
5 For theoretical simplicity's sake I assume symmetry; A and B have the same length. This might be wrong.
[n] as no worse in [m_z] than in [m_ei]. Which of these models is closer to what people actually do is an empirical question.
A further consideration, less significant conceptually, is the size of the n-phone tables. As the size of the context
increases, so does the number of phoneme sequences whose frequencies the prelexical
module has to keep track of. Their number quickly approaches the size of the lexicon:6
Table 3.1. Number of distinct n-phones in the English lexicon, as a function of n

n   Number of n-phones
2   1,395
3   11,961
4   35,732
Note: Counts were made from the set of Celex wordforms occurring at least once per
million words. Initial and final word boundaries were counted as phonemes. For
comparison, the number of lemmas in Celex is 52,447.
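The counting method of footnote 6 can be sketched as follows. The three-word lexicon here is invented for illustration (the counts in the table above come from the real Celex wordform set):

```python
def n_phones(word, n):
    """All length-n substrings of a word with boundary markers added
    (footnote 6: bat [baet] yields the four 2-phones #b, bae, aet, t#)."""
    segs = ["#"] + word + ["#"]
    return {tuple(segs[i:i + n]) for i in range(len(segs) - n + 1)}

# Toy lexicon: each word is a list of segments.
lexicon = [["b", "ae", "t"], ["b", "ae", "d"], ["k", "ae", "t"]]
for n in (2, 3, 4):
    distinct = set().union(*(n_phones(w, n) for w in lexicon))
    print(n, len(distinct))  # 2 8 / 3 7 / 4 6
```

Even in this toy case the 4-phone inventory is nearly one entry per word plus one, illustrating how the table comes to duplicate the lexicon as n grows.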
The MERGE TP models are intended to contrast with TRACE by keeping the
lexicon out of the early stages of speech perception. As the n-phone tables grow, this
difference becomes blurred: the tables incorporate not only the entire lexicon of length n or
less, but fragments of many larger words, and the two theories come to make more and
more similar predictions. These considerations argue for the INC theories over the SC
theories, since the former use shorter n-phones to describe the same-sized context.
Pitt & McQueen (1998)’s experiments use a preceding context of length 1, but the
authors discuss evidence that a preceding context of length 3 may be needed. Since all of
the stimuli they discuss had the same following context (silence), they did not need to go
into their claims about following context. I will assume that they are considering one of
three theories of context: SC-1, INC-1, or INC-3. (See discussion below, §3.3.3.)
6 The single word bat [baet], for example, contributes four 2-phones: #b, bae, aet, and t#.
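The difference between the two context types can be made concrete with a small sketch. The frequency counts below are invented for illustration; real values would come from n-phone tables compiled from a dictionary such as Celex.

```python
from collections import Counter

# Hypothetical per-million counts for the [m_z] example of §3.3.2.1.
F3 = Counter({"mlz": 12, "mnz": 0})                    # 3-phones (SC-1)
F2 = Counter({"ml": 30, "mn": 8, "lz": 40, "nz": 25})  # 2-phones (INC-1)

def sc1_score(x, left, right):
    """SC-1: frequency of the whole A x B triphone."""
    return F3[left + x + right]

def inc1_score(x, left, right):
    """INC-1: left and right diphone frequencies combined independently."""
    return F2[left + x] * F2[x + right]

# SC-1 sees the gap [mnz] directly and favors [l]:
print(sc1_score("l", "m", "z"), sc1_score("n", "m", "z"))    # 12 0
# INC-1 sees only that [mn] and [nz] each occur, so [n] gets a nonzero score:
print(inc1_score("l", "m", "z"), inc1_score("n", "m", "z"))  # 1200 200
```

Multiplying the two diphone counts is just one way an INC model might combine them; the point is only that no combination of nonzero diphone frequencies can reproduce the categorical gap that SC-1 detects.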
3.3.2.2. Database
What corpus are the n-phone tables based on? This is both a theoretical and a practical question.
One possibility is that the n-phone frequencies are computed from the stored items
in the lexicon. Each word contributes its n-phones, weighted according to the word's
frequency. The lexicon does not directly participate in speech perception, but contributes
off-line by updating the n-phone tables which the early perceptual mechanisms can consult.
A second possibility is that the n-phone frequencies are computed directly from the
incoming speech stream - computed by the same mechanisms that later consult them,
without any participation from the lexicon. This is more in keeping with the spirit of the Pitt
& McQueen (1998) model, a strictly bottom-up theory which aims to block the lexicon
from interfering with the early stages of perception. Such theories use a corpus database.
The empirical difference between the two is that the lexical database respects
morphological word boundaries, while the corpus database does not. The reason is that
morphological word boundaries are represented explicitly in the lexicon, but not in a
segmental analysis of the speech stream7. The effect on n-phone statistics can be
substantial. For instance, geminates are very rare in the English lexicon but occur freely in
running speech across word boundaries. The ready availability of on-line dictionaries makes lexical-database
statistics easy to compute, while a lack of accessible on-line phonetic corpora makes the
7 Morphological word boundaries have many indirect correlates in the surface-level phonetic analysis of the
speech stream. Since prosodic boundaries tend to be aligned with them, they often correlate with fortition
(Fougeron & Keating 1997). As "prominent" positions, they tend to support phonological contrasts not
available elsewhere (Beckman 1998, Smith 1999), and to undergo prominence-enhancing phonological
processes (Smith forthcoming). None of these correlates, however, allows morphological word boundaries
to be unambiguously located in the speech stream pre-lexically. If the n-phone tables are compiled
prelexically, no character corresponding to a morphological word boundary will be present in them.
(The same problem was faced by post-Bloomfieldian structuralist linguistic theory, which
demanded that grammatical analysis proceed from lower to higher levels. Harris (1951) suggested a
statistical solution: Morph boundaries occur where the unpredictability of the next phoneme reaches a peak.
This solution, like the TP theory, makes indirect use of lexical and grammatical information. See
Newmeyer (1986:7-9) for a discussion.)
corpus-database statistics hard to compute. The standard practice in the field is therefore to
use an on-line dictionary and tacitly assume that the difference is negligible until proven
otherwise. Since I can't tell what the corpus-database statistics predict, I will have to ignore them here.
Most of the frequency counts which Pitt and McQueen relied on were computed for
American English pronunciations, with frequencies apparently reckoned from the million-
word written American English corpus of Kucera and Francis (1967). Celex's 18-million-
word corpus is much larger, separates written from spoken English, and distinguishes
different inflected forms of the same word, but its British pronunciations are a drawback
when working with American English speakers. The result is some uncertainty about what
the TPs really are, and hence about what a TP-based theory would actually predict. For
instance, if one wants to reckon the probability that the segment following a given vowel will
be [s] or [ʃ] - a crucial case in the study of Pitt and McQueen (1998) - one can get three
rather different estimates, depending on the source (Table 3.6 gives the counts for the relevant
length-2 sequences). The absolute magnitudes may differ by a factor of three, but the relative magnitudes largely agree.
For safety's sake I will give frequencies using both an American English frequency
dictionary similar to Pitt and McQueen's and the Celex corpus. Details of how these
Table 3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n = 1
Probability
Pr([is] | [i_])   0.004   0.002   0.001
Pr([iʃ] | [i_])   0.002   0.001   0.001
8 Computed from the same database used by Pitt & McQueen, but apparently using a somewhat different counting method.
To compare the conditional probability P(x | A_B) with P(y | A_B) is simply to compare the
frequency of AxB with that of AyB, since P(x | A_B) = (frequency of AxB)/(frequency of
the A_B environment) and P(y | A_B) = (frequency of AyB)/(frequency of the A_B
environment). I will therefore report only the AxB and AyB frequency counts.
3.3.2.3. Decision rule
Once the statistical information has been used to estimate the probability that a
particular segment in the context A_B is x or y, the model then has to choose one of the two.
How?
The general form of the decision rule must specify the probability that the model
guesses x rather than y given a stimulus A ?B. The decision rule has to take into account at
least two things: 1. the acoustic composition of the ambiguous segment (how close it is to x
and to y), and 2. the statistics of the language. Of the productions the listener encounters, some are clear; some are garbled in various ways. The listener may at first parse a given
production "wrongly" (i.e., not as the speaker intended it), but usually the correct
interpretation becomes clear shortly as the listener recognizes the speaker’s intended
message. The listener therefore has the feedback needed to optimize the decision rule by
adjusting its parameters. We will suppose that they do this, with the goal of maximizing the probability of a correct parse.
Each AxB or AyB stimulus puts the listener in a particular internal state. Which
internal state will depend not only on the speaker's intent, but on the garbling and on the
perceptual noise added by the listener's auditory system. Under the TP hypothesis, the
listener's response is determined by their perceptual state and by the distributional statistics
of their language. For the sake of illustration, let's assume the SC-1 statistics.
Suppose the listener, having heard a particular stimulus A ?B (intended by the
speaker as AxB or AyB), is now in State Z, a state which can lead to a response of "x"
(with some probability qx) or of "y" (with probability qy = 1 - qx):

(3.7)
qx = P("x" response | Z),  qy = P("y" response | Z) = 1 - qx

By Bayes's Theorem,

(3.8)
P(AxB | Z) = P(Z | AxB) · P(AxB) / [P(Z | AxB) · P(AxB) + P(Z | AyB) · P(AyB)]

Writing rx = P(Z | AxB), ry = P(Z | AyB), px = P(AxB), and py = P(AyB), the probability that the response from State Z is correct is

(3.9)
PC(Z) = [rx·px / (rx·px + ry·py)] · qx + [ry·py / (rx·px + ry·py)] · qy
What choice of qx, our only free parameter, maximizes our chance of guessing correctly? Since PC(Z) is linear in qx,9 it follows in
general that from any given internal state, the optimal choice is either always "x" or always
"y". The choice depends on the r and p parameters: if rx·px > ry·py, then "x" is the best choice; otherwise, "y" is.
In the language of Signal Detection Theory (Green & Swets 1966, Macmillan &
Creelman 1991), Choice Theory (Luce 1963), or Generalized Recognition Theory (Ashby
& Townsend 1986), the percept evoked by a stimulus is a point in a multidimensional perceptual space. ("State Z" is one such point.) Following the reasoning described above,
the space is partitioned into regions, and all points in the same region lead to the same
response. To get optimal performance, each region must contain only points where rx·px >
ry·py, or only points where rx·px < ry·py, so the boundaries must be drawn so that rx·px
= ry·py for points Z on the boundary (Macmillan & Creelman 1991); i.e., so that rx/ry =
py/px, or
(3.10)
P(Z | AxB) / P(Z | AyB) = P(AyB) / P(AxB)
If the a-priori probability ratio (the right-hand side) changes, as when A_B is
replaced by a different phonological environment C_D in which y is less likely, then the
boundary must move in order to keep the likelihood ratio (the left-hand side) equal to it.
9 PC(Z) is linear in qx, so its maximum must be at the smallest or largest possible value of qx, i.e., 0 or 1.
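A minimal numerical sketch of this decision rule may help. The r and p values below are invented: rx and ry stand for the likelihoods P(Z | AxB) and P(Z | AyB), and px, py for the prior probabilities of the two stimuli.

```python
def posterior_x(rx, ry, px, py):
    """Equation (3.8): P(AxB | Z) by Bayes's Theorem."""
    return (rx * px) / (rx * px + ry * py)

def optimal_qx(rx, ry, px, py):
    """PC(Z) is linear in qx (footnote 9), so the optimum is all-or-none:
    respond "x" from state Z exactly when rx*px > ry*py."""
    return 1.0 if rx * px > ry * py else 0.0

# Equal likelihoods from state Z, but x three times as probable a priori:
print(posterior_x(0.5, 0.5, 0.75, 0.25))  # 0.75
print(optimal_qx(0.5, 0.5, 0.75, 0.25))   # 1.0
```

As (3.10) requires, any change in the prior ratio py/px moves the set of states for which optimal_qx returns 1.0, i.e., moves the response boundary.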
The consequences for a typical perceptual experiment are illustrated in (10). The
plane is perceptual space, with x and y the idealized perceptual representations of the
endpoint stimuli - "idealized", because in fact perceptual noise causes each presentation of a
stimulus to evoke a slightly different percept. The irregular line of dots shows the idealized
locations of the intermediate stimuli. Following the optimal response strategy, listeners
respond "x" when the percept is on one side of the boundary and "y" when it is on the
other, hence, what will be observed in the experiment is that the responses cross over from
mostly "x" to mostly "y" where the line of stimuli intersects the boundary. The difference
in boundary location between the A_B and C_D contexts causes a corresponding shift in the crossover point.
[Figure: the response boundary in perceptual space for the A_B and C_D contexts]
The listener thus optimizes performance by changing their willingness to respond
"x" in accordance with the ratio of the probabilities of x and y in the context. This leads to
an important conclusion: The effect of context on the location of the "x'Vy" response
boundary depends on the ratio of the probabilities of x and y in that context, and not on their
difference.
For example: Suppose AxB, AyB, and CxD all occur 1,000 times per million words,
while CyD occurs 901 times per million words. The x/y ratio in A_B is 1, while that in C_D
is about 1.1. If a shift in the "x"/"y" boundary between A_B and C_D is found
experimentally, we would expect an even larger shift between A'_B' and C'_D', where A'xB',
A'yB', and C'xD' occur 100 times per million words and C'yD' occurs 1 time per million
words (giving x/y ratios of 1 in A'_B' and 100 in C'_D'). Though the frequency
differences are the same (99 per million in both cases), it is the ratios that matter.
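The arithmetic of the example can be checked directly (frequencies per million words, as above):

```python
# Frequencies per million words from the worked example.
freq = {("A_B", "x"): 1000, ("A_B", "y"): 1000,
        ("C_D", "x"): 1000, ("C_D", "y"): 901,
        ("A'_B'", "x"): 100, ("A'_B'", "y"): 100,
        ("C'_D'", "x"): 100, ("C'_D'", "y"): 1}

def xy_ratio(ctx):
    """A-priori x/y likelihood ratio in a context; this, not the frequency
    difference, is what optimal criterion placement tracks."""
    return freq[(ctx, "x")] / freq[(ctx, "y")]

# Same frequency difference (99 per million) in both context pairs...
assert freq[("C_D", "x")] - freq[("C_D", "y")] == 99
assert freq[("C'_D'", "x")] - freq[("C'_D'", "y")] == 99
# ...but very different ratio changes, hence very different predicted shifts:
print(round(xy_ratio("C_D") / xy_ratio("A_B"), 2))  # 1.11
print(xy_ratio("C'_D'") / xy_ratio("A'_B'"))        # 100.0
```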
It is very unlikely that listeners follow the optimal strategy to the letter, which would
entail disregarding acoustic evidence of an event of zero probability. Under the optimal
strategy, no amount of acoustic clarity could make a zero-probability sequence heard. In practice there is
probably a limit to how far the criterion can be shifted, so that larger and larger a-priori
probability ratios only increase the bias up to a point. Very infrequent sequences are
presumably still perceptible, given clear enough acoustics.
TP effects must also be distinguished from compensatory phonetic effects in phoneme perception. Mann & Repp (1981) showed that a segment
ambiguous between [t] and [k] tends to be heard as [t] after [ʃ] and as [k] after [s]. The
effect evidently arises at an early (low) level of processing, either because the perceptual
system is compensating for expected coarticulatory effects, or because the low-frequency [ʃ]
makes the next segment sound higher by contrast while the high-frequency [s] makes it
sound lower (Kluender & Lotto 1994). Elman and McClelland (1988) used neither [ʃ] nor
[s], but a segment [?] acoustically in between them. When [?] followed Christma_, it acted like [s], shifting judgments of the following [t]-[k]
(tapes-capes) continuum. When [?] followed fooli_, Spani_, or Engli_, it acted like [ʃ].
They concluded that lexical activation was spreading down to the phoneme level to favor [ʃ]
or [s], as the case might be, which then had its ordinary phonetic effect on the following
segment.
Pitt and McQueen (1998) argued that early phoneme processing was immune from
lexical effects, and that the Elman and McClelland results could be accounted for if low-level
perception were instead biased by the transitional probabilities of the preceding segmental context. When [ʃ] was more likely, [?] produced more [t]
responses to the following [t]-[k] continuum; when [s] was more likely, [?] produced more
[k] responses.
Where Elman and McClelland had asked only for judgments of the [t]-[k]
continuum, Pitt and McQueen (1998, Experiment 1) also asked for judgments of [?]. The
[?] was presented in two pairs of biasing contexts. One pair, [dʒu_] and [bu_], were
lexically biased towards [s] (juice) and [ʃ] (bush), but the TP from the vowel to [s] and to [ʃ]
was the same for both. The other pair, [di_] and [nei_], were lexically unbiased (since [dis],
[diʃ], [neis], and [neiʃ] are all nonwords), but differed in the TPs from the vowel to the
following fricative. The relevant statistics are shown in Table 3.6, repeated here for
convenience:
Table 3.6. Transitional probabilities for the stimuli of Pitt and McQueen (1998), n=l
Probability
Pr([us] | [u_])     0.019   0.021   0.008
Pr([uʃ] | [u_])     0.010   0.009   0.009
Pr([is] | [i_])     0.004   0.002   0.001
Pr([iʃ] | [i_])     0.002   0.001   0.001
Pr([eiʃ] | [ei_])   0.020   0.021   0.018
10 Computed from the same database used by Pitt & McQueen, but apparently using a somewhat different
counting method.
Pitt and McQueen found a lexical and a TP influence on [?] report, but only a TP
influence on [t]-[k] report, suggesting that TPs were influencing early phonetic processing,
with the lexical effect only emerging at a later stage.11 At an early, prelexical stage of
phonetic processing, the ambiguous fricative would be disambiguated using TPs, and,
having been classified as [s] or [ʃ], would so affect the perception of the following [t] or [k].
Later on, after lexical access, the lexicon could affect listeners' report of the fricative, but not
of the stop.
Does the experimental evidence from these two studies rule out any of the possible
TP-based theories described in the last section? Are the left and right contexts independent
(INC model), or are they treated as a single unit (SC model)? Since following context was
not varied in these studies (it was always a [t]-[k] or [d]-[g] continuum before [ei]), they do
not distinguish INC from SC. However, they do throw some light on the question of how
much preceding context matters.
The most conservative hypothesis, which is used by Pitt and McQueen (1998)
through most of their paper, is that only the immediately preceding segment matters: the
decision between [s] and [ʃ] after [WXYZ_] depends only on the frequencies of [Zs] and
[Zʃ].
McClelland and Elman had considered this account of their results:
11 The latter half of this squares with the findings of Fox (1984), who reported that lexical influences on
ambiguous-phoneme perception turn up only among the responses with long RTs.
12 Sic; an apparent typo for [la].
They dismiss it in view of the results of their Experiment 3, in which a large effect
was found when the contexts were fooY? and ridicuY?, where "Y" was a CV sequence
Context fully three segments away is affecting the ambiguous fricative so strongly
that it in turn affects perception of the following stop. The TRACE authors argue, in effect,
that expanding the TP context to be that big makes the TP theory practically lexical, by
including whole words in the table of sequence frequencies (e.g., all words in the lexicon which
are four segments long or shorter). At any rate, a single segment of preceding context is not
enough.
Pitt and McQueen reply that the last vowels in "foo_" and "ridicu_" are not in fact
identical; the one they take to be [u] and the other to be [ʊ]. If we consider the frequencies
of the four sequences [ulis], [uliʃ], [ʊləs], and [ʊləʃ], then the TPs favor [ʃ] after [ul_] and
[s] after [ʊl_].13
/ulɪʃ/ occurs in words like coolish, foolish, and ghoulishness. Celex shows
that this string occurs about 26 times per million words, /ulɪs/ and /uləs/
occur less than once per million words, and /uləʃ/ does not occur at all. So
/ʃ/ is much more likely given /ulV?/. The opposite bias operates after /ʊ/.
The string /ʊləs/ is quite common, in words like incredulously, ridiculous,
13 Ridiculous is transcribed by the Francis-Kucera dictionary as ridic[jʊ]lous; Jones (1997) records both this
and ridic[jə]lous. Some speakers may have this latter pronunciation. CELEX estimates /ələs/ to occur
about 56 times per million words (combined spoken and written), and /ələʃ/ to occur not at all; /əlɪs/
occurs 350 times, and /əlɪʃ/ 25. Following Pitt & McQueen's reasoning, [s] is still more likely after
/əlV?/.
and stimulus. The CELEX estimate is 86 times per million words. /ʊlɪs/
also occurs (in words like oculist and somnambulist, 9 times per million),
but /ʊləʃ/ and /ʊlɪʃ/ never occur. So /s/ is much more likely after /ʊlV?/.
(Pitt & McQueen 1998:365)
This counterproposal does not necessarily require the listener to keep track of 4-
phones, of which there are at least 35,732 (see Table 3.1). Perhaps what has happened in
McClelland and Elman's experiment is a statistical chain reaction. Suppose the listener
maintains a 3-phone table (at least 11,961 entries). When the ambiguous [i]/[ə] vowel is
encountered after [ul_] or [ʊl_], it is disambiguated using 3-phones. The restored [li_] or [lə_] then provides the context for the [s]/[ʃ] decision.
The statistics of English permit this. Table 3.12 shows the relevant Celex counts for
the [i]/[ə] decision, which favor [i] after [ul_] and [ə] after [ʊl_]. (The [i] counts in Celex
are too high for American English, since word-final unstressed [i], as in marry, is
pronounced [ɪ] in Southern English dialects (Trudgill 1999). I have corrected them in the
table by subtracting the number of word-final occurrences in each context.) Table 3.13
shows the counts for the [s]/[ʃ] decision, which strongly favor [s] after [lə_], but are nearly
even after [li_].
Table 3.12. Triphone frequencies for sequences ending in [i]/[a] in the stimuli of
McClelland and Elman (1988)
Francis/Kucera
[ula] 32 33 12 15
14 This American English dictionary does not contain the word foolish.
Table 3.13. Triphone frequencies for sequences ending in [s]/[f] in the stimuli of
McClelland and Elman (1988)
Francis/Kucera
12 11 14 0
A 3-phone model would, however, now make the wrong prediction about Pitt and McQueen's Experiments 1-3, since now
[dʒu_] and [bu_] have 100% TP biases towards [s] and [ʃ] respectively (see Table 3.14).
This should have produced a TP effect, but did not. Even worse, the [di_] and [mi_]
contexts have zero triphone counts for both fricatives:
Table 3.14. Triphone frequencies for the stimuli of Pitt and McQueen (1998)
Francis/Kucera
[dʒus]  28  30  4  17
[dʒuʃ]   0   0  0   0
[bus]    0   0  0   0
[buʃ]   74  79  7  24
[dis]    0   0  0   0
[diʃ]    0   0  0   0
[neis]  14  15  6   2
[mis]    0   0  0   0
[miʃ]    0   0  0   0
A context size of 1 segment is too small to account for the Elman & McClelland
(1988) results. A context size of 2 segments can handle those, but not the Pitt & McQueen
(1998) results. Larger contexts do not solve this latter problem (since the juice/bush stimuli
are only three segments long), and in any case lead to a duplication of the lexicon at a
prelexical level.
TRACE can explain this disparity, as noted in this connection by Samuel (2000).
Lexical effects in TRACE increase over time, and are greatest at the end of long words,
because the word nodes take time to reach full activation and are more active the more phoneme
nodes are feeding into them. The ambiguous fricatives in both experiments came at the end
of a word, but the McClelland and Elman words were much longer than the Pitt and
McQueen stimuli ([fuli_] and [ɹɪdɪkjʊlə_] versus [dʒu_] and [bu_]).
In this experiment, Pitt and McQueen not only failed to find a lexical effect with the
contexts jui_ and bu_, they succeeded in getting a TP effect with the contexts mee_ and
nay_, both of which make nonwords no matter which way the ambiguous fricative is
interpreted. Tables 3.15 and 3.16 show the cohorts at the time the ambiguous fricative
appears. It is clear that effects were found in all and only those cases where, in at least one
of the paired stimulus contexts, the active cohort strongly favored [s] or [ʃ] at the time the
ambiguous fricative appeared.
Table 3.15. Cohorts at the appearance of the ambiguous fricative in the experiment of
McClelland and Elman (1988, Experiment 3)
Table 3.16. Cohorts at the appearance of the ambiguous fricative in the experiment of Pitt
and McQueen (1998, Experiment 3)
juicy 2
bushels 2
bushes 1
mee_ [mi_]  — 0  — 0
nay_ [nei_] — 0  nation 45
nations 43
nationwide 3
nationwide 1
No matter what context we choose, there is empirical data which the TP theory will
not cover. Our choice of which version to test will have to be based on other grounds.
There are two good theoretical reasons to choose a one-segment context for the present
study, and a third practical reason. First, we hope to equate "zero-frequency" and
"phonotactically illegal". This is plausible for sequences of length 2, but not for those of
length 4 - in the latter case, it leads to the claim that any 4-segment word which does not
already exist, such as [ʊləʃ], is illegal. Second, the MERGE TP theory contrasts with
TRACE in excluding the lexicon from prelexical phonetic processing. As the size of the
context increases, so does the number of phoneme sequences whose frequencies the
prelexical module has to keep track of. Their number quickly approaches the size of the
lexicon. Finally, as a practical matter, long contexts make the frequency counts harder to compute.
There is also empirical evidence supporting the one-segment context theory. Pitt
(1998) presented a liquid ambiguous between [ɹ] and [l] in the contexts [d_ae], [g_ae], [t_ae], [b_ae], and [s_ae], and measured listeners' [ɹ] and [l] judgments. He
found a strong [ɹ] report bias (compared to the baseline [b_ae]) in [t_ae], a weaker one in
[g_ae] and [d_ae], and the weakest in [s_ae].
Absolute per-million frequencies of [l] and [ɹ] after each of the initial consonants are
shown in Table 3.17. The ratio of these yields the a priori likelihood that an unknown liquid in
that context will be an [l]. As we saw in §3.3.2.3, it is this ratio which, when the listener
uses an optimal guessing strategy, predicts the size of the response bias. The order of
effects predicted by the likelihood ratio is exactly the order of effects found by Pitt15:
15 Pitt himself interpreted these results as contradicting the probabilistic account of the phonotactic effect.
This is because he assumed listeners were using a suboptimal guessing strategy. Rather than the likelihood
ratio, he took the predictor of statistically-induced bias in favor of a given cluster to be the sum of the
logarithms of the individual frequencies of the words in which it occurs.
Table 3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998)
Frequency Empirically
[tɹ] 8468
[dɹ] 2020
[gɹ] 3845
[sɹ] 13
What is needed is a list of the length-3 sequences occurring in the English lexicon, including their empirical frequencies.
We will estimate those from the Celex statistics on British English, checking them against
the Francis-Kucera statistics on American English, using the procedure described in the previous section.
2. When a segment acoustically ambiguous between x and y is presented in the
context A_B, it will tend to be parsed as whichever of the two is more frequent in that context.
The difference in rate of "x" report between the context A_B and the context C_D will
depend on the relative likelihood of x and y in those contexts. The influence of statistics
3. The relative likelihood of x and y in A_B and C_D can be computed in either of
two ways.
4. The TP effect happens very early, certainly prelexically. However, tasks that
directly tap phoneme perception (such as the syllable and phoneme judgment tasks used by
Massaro & Cohen (1983)) can be responded to on the basis of either a prelexical phonetic
representation, or on the basis of one retrieved from the lexicon after word recognition,
A final possibility is that phonotactic regularities are not emergent, but fundamental;
that the mechanisms of speech perception have access to the possible, as well as the actual,
phonological configurations of their language, and are able to apply that knowledge in perception.
The chief point at issue between the TRACE and MERGE TP theories on the one
hand and a grammatical theory on the other is the status of zero-frequency phoneme
sequences. TRACE and MERGE TP treat all such gaps alike: The model simply notes the
phonological gaps (configurations which cannot occur) and mere lexical gaps
That there is such a difference is the central claim we will test here. There are many
different ways to implement a theory of grammar in speech perception. The main model
parameters are the specific grammar to be used (§3.4.1.) and the rule for using it to decide
presented here uses the grammatical framework of phonological Optimality Theory (Prince
& Smolensky 1993), entailing a decision model in which multiple candidate parses are
entertained in parallel.
any procedure which correctly separates the productive gaps from the non-productive ones.
I have chosen phonological Optimality Theory (Prince & Smolensky 1993, McCarthy &
Prince 1995).
Surface representations can be compared for markedness by scoring them with respect to
emergent phenomenon.
A configuration is illegal in an OT grammar if it is never in the output, regardless of what the input is. This is
reflected in the grammar by having a markedness constraint against the illegal configuration
dominate all of the faithfulness constraints which aim to preserve it, so that an input
containing the illegal configuration will be realized without it. The grammar may contain
many markedness constraints which do not dominate the relevant faithfulness constraints
and hence do not trigger repairs; configurations violating only such constraints are not
An example is the case of *PAL, the constraint which forbids palatal consonants:

(3.18)
    /ca/        *PAL    IDENT[BACK]
      [ca]       *!
    → [ka]                  *
A constraint C is said to be active for a given input when some
candidate is eliminated by C (Prince & Smolensky 1993, Chapter 5). In (3.18), for
example, *PAL is active for the input /ca/, because it is there that [ca] is eliminated.
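The notion of constraint activity can be made concrete in a small Python sketch (an illustration, not the dissertation's implementation), which evaluates tableau (3.18) and reports which constraint eliminates each losing candidate:

```python
# Ranked constraints, highest first, as violation-counting functions.
# The inputs and candidates follow tableau (3.18).
def star_pal(output):          # *PAL: one mark per palatal consonant
    return output.count("c")

def ident_back(output, inp):   # IDENT[BACK]: mark if backness changed
    return int(output[0] != inp[0])

def evaluate(inp, candidates):
    """Return the winner and, for each loser, the constraint that
    eliminated it: the highest-ranked constraint on which the loser
    does worse than the winner."""
    profile = lambda o: (star_pal(o), ident_back(o, inp))
    # Lexicographic comparison of violation profiles = strict domination.
    winner = min(candidates, key=profile)
    names = ("*PAL", "IDENT[BACK]")
    eliminated_by = {}
    for cand in candidates:
        if cand == winner:
            continue
        for name, (w, c) in zip(names, zip(profile(winner), profile(cand))):
            if c > w:
                eliminated_by[cand] = name
                break
    return winner, eliminated_by

winner, blame = evaluate("ca", ["ca", "ka"])
# winner is [ka]; *PAL is active for /ca/ because it eliminates [ca].
```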
In a given grammar, some constraints are never active for any input. An example in
English is *VOICE]σ, the constraint against voiced coda obstruents.
Voicelessness is obligatory for coda obstruents in many languages, including the standard
varieties of Russian, Polish, German, and Turkish. Illegal coda clusters are repaired in
order to satisfy *VOICE]σ. (Hence, *VOICE]σ is active for some inputs in those languages
- it is the constraint which eliminates the candidate outputs with voiced coda obstruents.)
In English, however, *VOICE]σ is ranked too low to have such an effect. English
tolerates voiced coda obstruents (tub, leave, brag, etc.). No candidate is ever eliminated by
(3.19) IDENT » *VOICE]σ (for instance)

    → [liv]    *
      [lif]    *!
like the general markedness of palatals, are crucial to any grammatical model of language,
but are outside the scope of statistically-based models such as TRACE and MERGE TP.
Within a given framework, there are at least as many different grammars as there are
languages, and for each language, perhaps as many as there are linguists. The predictions
of the perceptual model depend on which one is selected. If experiment falsifies these
predictions, the problem may lie with either the perceptual theory itself or with the
grammatical analysis of the given language (just as, if experiment falsifies a probabilistic
theory, the problem may be due to the perceptual theory itself, or to faulty frequency
counts).
perceptual effects are insensitive to the choice of a specific analysis. The lack of onset [tl
dl] clusters, for instance, is a good choice, because those onsets are so robustly illegal that
any analysis must exclude them. In an OT framework, that means they must be ruled out by some markedness constraint, which ipso
facto is active. Even if our analysis in Chapter 2 has pointed the finger at the wrong
markedness constraint, any alternative analysis will have a different one from which the
include, at a minimum, the ones which are by all measures productive: those that are not
naturally violated (having no lexical exceptions) and that speakers cannot be induced to
violate (either because the banned configurations trigger repairs, or because the speaker
simply cannot pronounce them without great effort). For more detailed discussion, see
§2.2.
Linguistic effects on speech perception come about because language limits the set of
available parses. The listener constructs a phonological parse at two levels of representation,
corresponding to OT's underlying /UR/ and surface [SR]. The [SR] is computed from the
acoustic signal, while the /UR/ is retrieved from the lexicon. Different (/UR/, [SR]) pairs
compete to account for the observed signal, with the grammar as referee: The (/UR/, [SR])
pairs are scored by the hierarchy of markedness and faithfulness constraints of the
language, and perception favors the most harmonic pair. Thus, the OT grammar does the
same job in speech perception that it does in linguistic theory: It compares (/UR/, [SR]) pairs.
For example, in Pitt's (1998) replication of the experiments of Massaro & Cohen
(1983), using nonword stimuli, there are no /UR/s to deal with, so the issue is decided by
    UR = •            OCP(CONT, COR)   SPREAD[COR]
    a. (•, [bɹæ])
    b. (•, [blæ])
(3.21) [l] illegal ⇒ [ɹ] bias

    UR = •             OCP(CONT, COR)   SPREAD[COR]
    a. → (•, [tɹs])
    b.   (•, [tls])    *!

    UR = •             OCP(CONT, COR)   SPREAD[COR]
    a.   (•, [sɹs])    *!
    b. → (•, [sls])
In this model, the incoming acoustic signal is first transduced into one or more
[SR]s by the component which in Figure 3.23 is labelled "Phonetic Parser"; it could also be called a
"Feature Extractor". Given a speech stimulus, it produces a set of [SR]s consistent with that
stimulus.
Figure 3.23. Architecture of an OT-grammatical-based parsing model. [Diagram: the
acoustic signal enters the Phonetic parser, which emits [SR] parse(s); these are evaluated
by the OT grammar.]
I assume that under normal laboratory conditions, with a short stimulus clearly
spoken, the Phonetic Parser will emit a single [SR]. Two or more [SR]s can be coaxed out
will evoke [l]; for a given stimulus level, there is a certain probability of getting [ɹ], a certain
probability of getting [l], a certain probability of getting both, and a certain probability of
getting neither. These probabilities change depending on the acoustic constitution of the
stimulus.
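These outcome probabilities can be sketched as follows; the logistic shape and the fixed rates for "both" and "neither" are illustrative assumptions, not claims from the experiments:

```python
import math

def parse_probabilities(stimulus_level, both=0.1, neither=0.05):
    """Probability of each [SR] set emitted by the Phonetic Parser for a
    stimulus on an acoustic continuum from [l] (0.0) to [r] (1.0).
    The logistic shape and the fixed 'both'/'neither' rates are
    illustrative assumptions, not estimates from data."""
    p_r_given_single = 1.0 / (1.0 + math.exp(-10.0 * (stimulus_level - 0.5)))
    p_single = 1.0 - both - neither
    return {
        "r": p_single * p_r_given_single,
        "l": p_single * (1.0 - p_r_given_single),
        "both": both,
        "neither": neither,
    }

probs = parse_probabilities(0.5)
# At the continuum midpoint, the [r] and [l] outcomes are equally likely.
```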
that syllables can be incorporated into a prelexical representation, since nonsense words,
which lack a lexical representation, can be syllabified in off-line judgment tasks. The
question is whether the syllabic structure is automatically computed as part of the parsing
process. There is evidence that it is. Syllable boundaries are needed for segmentation and
lexical access, so they have to be marked in the input to the lexical-access stage. In on-line
word-spotting tasks, English listeners are better at finding a word boundary when it is
aligned with the left-hand boundary of a stressed syllable (McQueen et al. 1994, Cutler &
Norris 1988), which suggests that the input is parsed exhaustively into feet. A word
boundary is harder to find if a syllable boundary drawn at that point would create a
phonotactically impossible syllable (e.g., spotting apple in fapple) (Norris et al. 1997,
McQueen et al. 1998). Abstract grammatical preferences for maximal onsets, and for
syllabifying intervocalic consonants with the more stressed vowel, are also detectable in
word-spotting - generalizations which only make sense when stated in terms of syllables
(Kirk 2001).16
As in the Shortlist model of Norris (1994), lexical entries are activated by an [SR]
which is sufficiently similar to them. I will assume that "sufficient" similarity is determined
by the neighborhood metric; a lexical competitor is any lexical item whose /UR/ can be
obtained from one of the active [SR]s by a one-segment insertion, deletion, or replacement.
Competition between all of the active (/UR/, [SR]) pairs then takes place through the
grammar.17,18
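The competitor criterion amounts to an edit distance of at most one over segment strings. A Python sketch (an illustration, treating one character as one segment):

```python
def is_competitor(ur, sr):
    """True if the lexical /UR/ is within one segment insertion,
    deletion, or replacement of the surface [SR] (edit distance <= 1).
    Each character stands in for one segment."""
    if ur == sr:
        return True
    if abs(len(ur) - len(sr)) > 1:
        return False
    if len(ur) == len(sr):                      # one replacement?
        return sum(a != b for a, b in zip(ur, sr)) == 1
    short, long_ = sorted((ur, sr), key=len)    # one insertion/deletion?
    for i in range(len(long_)):
        if long_[:i] + long_[i + 1:] == short:
            return True
    return False
```

Any lexical item passing this test for some active [SR] would enter the competition; items two or more segments away would not.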
16 Evidence against syllabification comes chiefly from sequence-monitoring tasks. These are known to be
sensitive to syllabification in certain languages. For example, Mehler, Dommergues, Frauenfelder, and
Segui (1981) found that French listeners detected a CV or CVC target faster when it exactly matched the
first syllable of the stimulus word - ba being detected faster in ba.lance than in bal.con, but bal being
detected faster in bal.con than in ba.lance. English speakers show no such difference, whether tested with
English (ba/bal in bal.cony and the ambisyllabic ba.lance/bal.ance) or with the original French materials
(Cutler, Mehler, Norris, & Segui 1986). The authors interpreted this to mean that English listeners do not
use on-line syllabification to segment speech, even in cases like bal.cony where the syllabification is
unambiguous. However, Kirk (2001) argues instead that, owing to the effects of stress on syllabification,
the first syllable of balcony is balc, and hence that neither target matched a syllable. For an extensive
critical review of the evidence for and against on-line syllabification in English and other languages, see Kirk
(2001, Chapter 2).
17 Although this description is confined to the single-word stimuli actually used in the experiments, it can
be extended to longer utterances in a straightforward way. The Phonetic Parser emits one or more candidate
[SR]s, as before. Candidate word /UR/s are activated by sufficiently similar substrings of the [SR]s;
candidate utterance /UR/s are the concatenations of nonoverlapping word /UR/s. These utterance (/UR/,
[SR]) pairs then compete as before. This allows the theory to capture word segmentation and inter-word
sandhi phenomena.
18 For bilingual listeners, this model assumes that the grammar is selected before the utterance is parsed
phonologically. It is conceivable that bilingual listeners parse an incoming utterance in both languages
simultaneously, and semantically interpret it in whichever language yields the more harmonic phonological
parse. However, existing studies of language identification suggest that infant and adult listeners use
rhythmic characteristics, rather than inventory or phonotactics (Stockmal, Moates, & Bond 2000; Nazzi,
Jusczyk, & Johnson 2000). Languages with similar rhythm can be discriminated by infants (e.g., Catalan
and Spanish, Bosch & Sebastián-Gallés 1997), but it is not known what information is used.
It is at this point natural to posit that the grammar scores each pair and chooses the
most harmonic. This theory is elegant but fatally flawed. The root of the problem is OT's principle of "Strict
Domination", which states that if Constraint A is ranked above Constraint B, then violating
Constraint A even once is less harmonic than violating Constraint B any number of times.
Strict domination is the only means which OT affords to represent one constraint's primacy
over another. Violating A cannot be just a little bit worse than violating B; it must be either
infinitely worse, equally bad, or infinitely better (Prince & Smolensky 1993:78).
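Strict domination amounts to lexicographic comparison of violation profiles, highest-ranked constraint first. A minimal Python illustration (not the dissertation's code):

```python
# A candidate's violation profile lists one count per constraint,
# highest-ranked first. Under strict domination, profiles are compared
# lexicographically: one violation of a dominating constraint outweighs
# any number of violations of dominated constraints.
def more_harmonic(profile_a, profile_b):
    """True if profile_a is strictly more harmonic than profile_b."""
    return profile_a < profile_b   # Python tuples compare lexicographically

# One violation of top-ranked A (1, 0) loses to a hundred violations
# of lower-ranked B (0, 100):
result = more_harmonic((0, 100), (1, 0))
```

There is no way, in this scheme, for violating A to be merely "somewhat" worse than violating B.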
One consequent prediction is that inactive markedness constraints should have just
as much perceptual force as active ones. To account for
patterns in the sound inventories of languages, OT posits certain universally fixed rankings
- e.g., that labial and dorsal articulations are universally more marked than coronal
articulations. In the grammar of English, which allows both labials and coronals, neither the
anti-labial constraint nor the anti-coronal constraint is active; however, they are still in the
grammar and one still dominates the other. A stimulus which is ambiguous between [ba]
and [da] therefore has two interpretations, one of which violates the constraint against
labials, the other of which violates the lower-ranked constraint against coronals. Since
labials are by hypothesis infinitely worse than coronals, perception should strongly favor
[da] over [ba]. This does not seem to always be the case: For example, when Luce (1986,
Ch. 3) presented listeners with a balanced set of CVC nonsense words in noise at a +5 dB
signal-to-noise ratio, he found that final [b] was reported as [d] 27 times out of 150, while
markedness differences, we must stipulate that inactive markedness constraints carry little if
19 Syllable-initially, [b] was reported as [d] far more often than the reverse: 39 times versus 4 at a +5 dB
signal-to-noise ratio, 23 times versus 4 at -5 dB. Even syllable-final [b] was reported as [d] 15 times at a
-5 dB ratio, versus 6 times the other way around. It may be that the low markedness, or perhaps the high
frequency, of [d] is having some sort of effect. This effect is, however, not as overwhelming as expected,
and may in any case be due to the spectral quality of the noise used (white Gaussian noise up to 4.8 kHz),
which is similar to the diffuse-rising spectrum of alveolar plosive bursts (Blumstein & Stevens 1979).
any perceptual weight. That is, the (/UR/, [SR]) pairs are evaluated only by that part of the
constraint hierarchy which ranks above the highest-ranked inactive markedness constraint.
A second problem that this proposal runs into is the same one which bedeviled
TRACE: the inability to adjust the influence of the lexicon based on attentional factors.
Lexical effects, such as the Ganong effect, would in this theory be captured by faithfulness
constraints:
    a.   (/tæsk/, [dæsk])   *!
    b. → (/tæsk/, [tæsk])
(3.25) [d] word, [t] nonword ⇒ [d] bias

                            ID-VOICE
    b.   (/dæʃ/, [tæʃ])     *!
The violation, and hence the predicted bias, is just as large regardless of how much
attention the listener allocates to the lexicon, contrary to the findings of Cutler et al. (1987).
The conflict disappears if the data reflect averaging over a large number of trials. Suppose that on each trial, the listener either
"attends to the lexicon" - i.e., insists on a parse with a /UR/ - or does not, on a trial-by-trial
basis. The task manipulations used by Cutler et al. (1987) can be seen as changing the
probability, rather than the extent, of the listener's attention to the lexicon on each trial, and
hence the number of lexically-biased responses which were averaged into the data.
(3.26)
    a. If attending to the lexicon, and if the stimulus is close enough to a
    real word to activate some /UR/s, choose the (/UR/, [SR]) pair which scores
    best on the active constraints.
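The probability-rather-than-extent idea can be sketched as a mixture over trials (the attention probabilities and the all-or-none per-trial bias are hypothetical values, chosen only to illustrate the arithmetic):

```python
def expected_lexical_bias(p_attend, bias_when_attending=1.0):
    """Average lexical bias over many trials when the listener consults
    the lexicon with probability p_attend per trial and shows a fixed
    all-or-none bias on attending trials. Both parameters are
    hypothetical illustration values."""
    return p_attend * bias_when_attending

# A task manipulation that raises attention from 0.2 to 0.8 of trials
# quadruples the averaged lexical bias, with no change in any one trial:
low = expected_lexical_bias(0.2)
high = expected_lexical_bias(0.8)
```

The per-trial violation (and hence per-trial bias) stays constant; only the proportion of lexically-biased trials entering the average changes.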
As in the Race model of Cutler et al. (1987), responses to phoneme tasks can be
based on either the computed [SR] or the retrieved /UR/, with task constraints dictating
which is favored in each case. There is laboratory evidence for the existence of both levels of representation.
Xu (1991) showed that Mandarin Chinese speakers had poorer recall for written
lists of rhyming morphemes when the list elements shared the same tone than when they
differed in tone. Speakers were then asked to perform the same task with lists constructed
so that the first two items had the same surface tone but different underlying tones, and
performance was compared with lists in which the first two items had different surface and
underlying tones. Performance was worse on lists of the first sort, suggesting that the
On the other hand, Lahiri & Marslen-Wilson (1991), using a gating task, found that
English listeners took vowel nasalization as a sign of an upcoming nasal consonant, since English vowels are
not inherently nasalized, but become so in a nasal phonetic context. Bengali listeners, on the
other hand, speak a language which has both inherently (i.e., contrastively) nasalized vowels
and contextually nasalized ones. They treated nasalization as potentially
underlying (i.e., did not take it as a sign that a nasal consonant was coming up) until they
actually heard the beginnings of the nasal consonant. This suggests that the Bengali lexicon
lexical representation and surface phonetic representation which the gating task (an
inherently lexical task) revealed. It further indicates that the Bengali speakers were
choosing the more faithful (/UR/, [SR]) pair in which the underlying and surface vowel had
the same degree of nasalization over the less faithful pair in which they differed, as we
would expect.20
3.5. Summary
This chapter has presented three very different theories of phonotactics in speech
perception.
the three theories, it makes the smallest demands on the learner, requiring knowledge only
of the lexicon. Phonotactic effects are viewed as diluted lexical effects, in which permitted
defeat competing illegal candidates via lateral inhibition at the phoneme level.
short phoneme sequences. The theory requires knowledge of the lexicon and of a set of
attested phoneme sequences, which may be quite large but can be acquired straightforwardly
through observation. Phonotactic effects are taken to occur at a pre-lexical level, with rare
20 If the OT interpretation of Lahiri & Marslen-Wilson's results is correct, it is empirical evidence against
the OT principle of Lexicon Optimization (Prince & Smolensky 1993, Inkelas 1994). Lexicon
Optimization is a means of dealing with the many-to-one nature of the OT grammatical model, which can
map several /UR/s to the same [SR]. In acquiring the lexicon, it is asserted, the /UR/ which is chosen is
the one to which the observed [SR] is most faithful.
We would be led to expect Bengali speakers to represent surface [CṼN] words as underlying
/CṼN/, which map to the same output more faithfully than an underlying /CVN/ would. We would
therefore expect a gating stimulus of the form [CṼ...] to often be completed with an N, i.e., matched to a
word whose underlying representation is /CṼN/. Instead, they were overwhelmingly matched to words
whose underlying representation was /CVN/, suggesting that the surface [CṼN] words are underlyingly
/CVN/. The study's finding that speakers apparently lexicalize surface contextually nasalized vowels as
underlying non-nasalized vowels indicates they are not using Lexicon Optimization.
The OT grammatical model sees perceptual phonotactic effects as a consequence of
the limited range of parses available in the language, and the listener's bias towards a parsed
percept. The implementation used here requires knowledge of the lexicon, and of a set of
constraints. The number of constraints needed is probably not very large (a grammar of the
syllable onsets of English, in Chapter 2, needed well under 20), and the correct ranking is
provably learnable (Tesar & Smolensky 1995); however, their provenance is unclear. They
are normally taken to be innate, since the patterns they represent occur world-wide.21
Phonotactic effects are assumed to occur at a prelexical level, the level of surface
Each of these theories suffers from empirical drawbacks in one domain or another.
TRACE has difficulty explaining why phonotactic effects are more robust than lexical
effects. The MERGE TP model cannot be pinned down on precisely which phoneme
sequences are perceptually relevant; different choices leave different lab results unexplained.
The OT grammatical model accounts for effects of illegality, but not the apparent (usually
facilitatory) effects of high phoneme-sequence frequency or a dense lexical
neighborhood (Newman, Sawusch, & Luce 1997; Pitt & McQueen 1998 Exp. 4).
The drawbacks of one model are, naturally, the advantages of the others. TRACE is
the most parsimonious: since sound-meaning relations are arbitrary, the lexicon must be learned in any theory. TRACE
says that only the lexicon must be learned, and that apparent effects of grammatical
regularity are really emergent properties of lexical interaction. MERGE TP is only slightly
less parsimonious - only the lexicon must be learned, but the relevant regularities have to be
actively abstracted from it by the probability-tracking system. Though both theories require
innate structure in the perceptual system, neither requires detailed innate knowledge the way
21 Moreover, since the constraints are violable and do get violated, they cannot be individually inferred from
the speech corpus by any simple mechanism - especially the markedness constraints, being prohibitions for
which no positive evidence can exist. (Naturally, a linguistically more sophisticated mechanism could take
advantage of alternations to deduce abstract underlying forms and the markedness constraints necessary to
cause the alternations.)
the OT grammatical theory does. As a practical matter, it is also easier to make predictions
from TRACE and MERGE TP than from any grammatical theory, since less analytic depth
is required.
In CHAPTER 4, our focus will be on the interesting claim, put forth by the TRACE
and MERGE TP theories and denied by the OT grammatical theory, that phonotactic
illegality is equivalent to zero frequency. The claim is interesting because it suggests that
phonology, at least in perception, is considerably simpler than many linguists have hitherto
supposed, with consequences for models of phonological acquisition.
All frequency counts were made from the Celex lexical database (Baayen et al.
1995). This is based on a corpus of 16.6 million words of written English and about
800,000 words of spoken English. Most of the corpus is from British sources, and the
Celex provides two ASCII phonetic transcription systems. I used the one found in
Field 7 of the file EPW.CD. Variant pronunciations are given for some words, but I always
Celex gives frequency counts by "lemma" (i.e., citation form, with know, knows,
knew, and knowing all lumped together) and by "wordform" (i.e., counting inflected forms
categories are counted separately (e.g., link noun and link verb). I used the wordform
database (except where otherwise noted), specifically, the files EPW.CD (the
Frequencies are counted separately for the written and spoken corpora. A
"combined" frequency count is also given; since most of the corpus is written, the
"combined" frequency is usually very close to the written frequency. I have used the
spoken counts except where otherwise noted (non-spoken frequencies are used only for
compatibility with counts based on the Francis-Kucera (1967) written-corpus norms). All
counts are from the Celex per-million-words estimates (combined, Field 6; written, Field 9;
The Celex transcription system marks syllable boundaries and includes stress marks
The scripts used to create and process the frequency counts are appended.
#!/usr/local/bin/perl
# make_ngram_table
$n = $ARGV[0];
$phon_db = '/tmp/Celex/EPW.CD';
$freq_db = '/tmp/Celex/EFW.CD';
open (PHON, "< $phon_db") || die "Couldn't open $phon_db";
open (FREQ, "< $freq_db") || die "Couldn't open $freq_db";
($segment_pron = $pron) =~ tr/'"\-//d;   # strip stress and syllable marks
#!/usr/local/bin/perl
# make_TP_table
# Input format is
# <X1...X(n-1)Xn> <combined freq> <written freq> <spoken freq> etc.
# Output format is
# <X1...X(n-1)> <Xn> <P(Xn | X1...X(n-1)), combined> <same, written> etc.
# Compute transition probabilities conditional on X1...X(n-1).
foreach $i (0..$#freqs) {
    $TPs[$i] = '(none)';
    next unless $context_freqs[$i]{$context};   # avoid /0 errors
    $TPs[$i] = sprintf "%6.3f",
        ($ngram_freqs[$i]{"$context$lastseg"} / $context_freqs[$i]{$context});
}
print "$context\t$lastseg\t";
print join "\t", @TPs;
print "\n";
}
(3.29) Script for finding the active cohort following a given phonological string
#!/usr/local/bin/perl
# cohort
($segment_pron = $pron) =~ tr/'"\-//d;   # strip stress and syllable marks
# Is it in the cohort?
next unless ($segment_pron =~ /^\Q$beginning\E/);
# Yes - print
printf "%s ", $beginning;
printf "%6d ", $freq_comb_perM;
printf "%6d ", $freq_writ_perM;
printf "%6d\t", $freq_spok_perM;
printf "%s\t", $freq_orth;
printf "%s\n", $segment_pron;
}
#!/usr/local/bin/perl
# simulated_guess
# Count frequencies
# Print guessing strategy
foreach $context (keys %cfreq) {
    print "$context $best_guess{$context}\n";
}
print "\n";
# Simulate experiment
srand (time());
$CELEX_SIZE = 18000000;
$TRIALS = 100000;
$seg_pron = '';
unless ($seg_pron) {
    ($seg_pron = $pron) =~ tr/'"\-//d;   # strip stress and syllable marks
    $seg_pron = "(" . $seg_pron . ")";
    last TRIAL if (length ($seg_pron) < $n);
}
print "$orth $i $seg_pron ";
}
}
CHAPTER 4
EMPIRICAL TESTS
4.1. Introduction
phonotactic effects in phoneme tasks. TRACE (McClelland & Elman 1986) holds that
phoneme perception at the very lowest level is directly influenced by the downward spread
of activation from the lexicon. The MERGE TP theory (Pitt & McQueen 1998) assigns the
implemented using Optimality Theory (Prince & Smolensky 1993), attributes phonotactic
effects to the restrictions placed by the sound pattern of the language on the set of available
parses.
grammatical theories, shows that the size of the phonotactic boundary shift is not modulated
which takes phonological overlap to be the source of all phonotactic effects. It is expected
relevant context is too small to include the region manipulated to produce overlap.
such effect was found. This is unexpected under the TRACE and MERGE TP theories,
since the statistical properties of [pw] are very similar to those of [tl] and [sɹ] (the clusters
used by Massaro & Cohen 1983). A natural explanation in the OT grammatical theory is
that [pw], although rare in English, is not illegal - it does not violate an active markedness
constraint.
effectiveness of the markedness of [pw] and [tl]. The results indicated that [pw] was in fact
less disfavored than [tl]. This result is expected in the OT grammatical theory, but is quite
modulating effect of the degree of phonological overlap with existing words, contrary to the
predictions of TRACE.
Listeners heard stop-sonorant clusters in which both consonants were ambiguous, and
judged both. The effect of the stop judgment on the sonorant judgment was assessed
bias when all acoustic factors were completely fixed. The sonorant in both experiments was
an "l"-"w" scale.
Experiment 4, using CCV stimuli, found that a "d"-or-"g" decision affected the odds
of an "l" response, with "d" making "l" less likely, while a "b"-or-"g" decision had no
effect. This confirms that the results of Experiments 2 and 3 (the smaller bias against [pw]
than [tl]) were not due simply to closer perceptual spacing of the stimuli at the labial end of
the scale. The existence of a response dependency is inconsistent with the TRACE
response mechanism; the larger effect of the "d"-or-"g" decision is unexpected in the INC-1
Experiment 5 compared the effect of a "d"-or-"b" decision in CCV stimuli with that
in VCCV stimuli. There was a strong effect in the CCV condition, but none in the VCCV
condition, indicating that the weakness of the ban on [pw bw] found in Experiments 2-4 was
not due to compensation for coarticulation, and suggesting that the parser determines
Experiments 6ab examined the perceptual effects of an abstract morpho-
phonological property, lexical stratum membership. Differences in
stratum membership were found to cause the phonotactics of the particular stratum to be
imposed upon ambiguous stimuli, causing a perceptual boundary shift. This has a natural
account in the grammatical theory, where lexical stratum is a necessary theoretical entity. It
is unexpected in TRACE and the MERGE TP theory, for both of which a division of the
lexicon into strata is unmotivated. It is shown that the stratum-phonotactic effect cannot be
emergent in TRACE, because it is weaker than a lexical effect obtained with the same
subjects and paradigm. It cannot be emergent in the MERGE TP theory either, because
perception of the ambiguous segment is influenced by other segments which are too far
The results are argued to support the OT grammatical theory over TRACE and
MERGE TP.
4.2. Experiment 1: Word-final lax vowels
4.2.1. Rationale
If phonotactic effects are really lexical effects, as claimed by TRACE, then their size
should be modulated by the same factors that control the size of lexical effects: Similarity
to existing words, and number of similar words. If that is so, a nonword that is similar to
many frequent words should induce stronger phonotactic effects than one that is similar to
For this, we can exploit the phonology of the English lax vowels. The lax vowels, [ɪ
ɛ æ ʌ ʊ], form a separate system from the other "tense" vowels of English, both
Like all American English vowels, the lax vowels are somewhat diphthongal, but
where the tense vowels are peripheralizing diphthongs, the lax vowels are centralizing; that
is, the offset of a tense vowel is further from schwa than the onset, but the offset of a lax
vowel is closer to schwa than the onset. The lax vowels are also somewhat shorter and less
Phonologically, the lax vowels are distributed differently from the tense vowels. The
    Tense         Lax
    i    he       ɪ    -
    eɪ   hay      ɛ    -
    ɑ    pa       æ    -
    ɔ    paw      ʊ    -
    oʊ   hoe      ʌ    -
Not only do lax vowels not occur there, they can not occur there. The intuitive
badness of such nonwords as [hɪ hɛ hæ hʊ hʌ] is quite strong.1 That the gap is
phonological rather than lexical is illustrated by the change of lax to tense vowels when
Table 4.2. Change of lax to tense vowels when made final by truncation
    del[ɪ]catessen    del[i]
    Un[ɪ]versity      Un[i]
    D[ɪ]rdre          D[i]d[i]
TRACE ought to be able to model the lack of word-final lax vowels. The
phonotactic ban should emerge from the large population of words ending in tense vowels
and the nonexistence of any words ending in lax vowels. Activation spreading from the
tense-final words should shift the phonotactic boundary on a word-final [i]-[ɪ] continuum,
However, since it is similarity to real words that produces the lexical activation, the
size of the shift should be larger when the rest of the phonological context (i.e., material
besides the [i]-[ɪ] itself and the immediately following segment or boundary) is similar to
more existing words. TRACE does not distinguish between parts of the stimulus that are
1 Lax vowels can occur syllable-finally in onomatopoeia: bleah [blɛ], baa [bæ:] (sound made by a sheep).
Marginal phonology is often found in this domain; e.g., boing [bɔɪŋ] (sound made by a spring), which has
both a diphthong before [ŋ] and a non-coronal after [ɔɪ].
relevant to the phonotactic generalization, and parts which are not. Everything in the
stimulus counts.
The MERGE TP theory focuses on a smaller part of the stimulus. One version,
which we called INC-1 in §3.2.3.3, considers separately the ambiguous segment and the
immediately preceding segment on the one hand, and the ambiguous segment and the
immediately following segment or boundary on the other. A phonotactic boundary shift is
predicted to arise because of the rarity of the [lax vowel]-[word boundary] sequence. Another
version, which we called SC-1, considers the preceding, ambiguous, and following segments
as a unit. For any choice of preceding segment, SC-1 also predicts a phonotactic boundary shift.
Stimulus context more than one segment away from the ambiguous vowel does not enter
into either theory's predictions.
The OT grammatical theory attributes the gap to a grammatical ban on word-final lax
vowels in English. The markedness constraint against them, which I will call *LAX]σ, is
able to trigger repairs, and hence dominates the faithfulness constraint IDENT-V. Since
*LAX]σ is an active constraint (in the sense of §3.4.2), it is expected to penalize any parse
containing a word-final lax vowel, creating, in ambiguous cases, a bias towards parses with a
tense vowel. The constraints apply equally to all phonological configurations meeting their
structural description. Hence, this theory predicts that only phonological context directly
involved in the phonotactic prohibition (the vowel and the immediately following segment
or boundary) will contribute to the boundary shift.
4.2.2. Design
The aim was to test the prediction of TRACE that the size of the phonotactic effect is
determined by the similarity of the stimulus to words in the lexicon. Listeners judged the
middle 5 steps of a 7-step continuum from the tense [i] to the lax [ɪ] in each of 16 carrier
contexts (Table 4.3).
The phonotactic legality of the [ɪ] endpoint could, as Table 4.3 shows, be varied by leaving
the final syllable open or closing it with [dʒ]. Since the segment preceding the ambiguous
vowel was always [ɹ], the TP theories (both INC-1 and SC-1) and the OT grammatical
theory expect only the open-closed manipulation to affect the location of the [i]-[ɪ]
boundary.
Similarity of the stimulus to other words in the lexicon was varied by manipulating
the voicing of the consonant preceding the [ɹ]: When the consonant was [g], the stimulus
was closer to more words than when it was [k]. Table 4.4, extracted from Celex's wordform
database (EPW.CD and EFW.CD), shows the English words ending in each of the eight
final syllables used in this experiment.² Celex's British English transcriptions have been
converted to American English.³
2 For this experiment, counts were computed over wordforms rather than lemmas because the relevant
dependency (between a vowel and a word boundary) is affected by inflection. In other experiments, which
used initial clusters, the lemma and wordform frequencies are the same.
3 Celex has final [ɪ] for angry, hungry, kukri, and mimicry, and, in general, for final unstressed /i/
elsewhere. Jones (1997) gives [i] as both the BrEng and AmEng pronunciation of all of these words (except
kukri, which isn't in that dictionary).
Table 4.3. Phonotactics of stimuli for Experiment 1

Carrier		[i]	[ɪ]
[zʌlgɹ_]	✓	✗
[sʌlgɹ_]	✓	✗
[pʌlgɹ_]	✓	✗
[tʌlgɹ_]	✓	✗
[zʌlkɹ_]	✓	✗
[sʌlkɹ_]	✓	✗
[pʌlkɹ_]	✓	✗
[tʌlkɹ_]	✓	✗
[zʌlgɹ_dʒ]	✓	✓
[sʌlgɹ_dʒ]	✓	✓
[pʌlgɹ_dʒ]	✓	✓
[tʌlgɹ_dʒ]	✓	✓
[zʌlkɹ_dʒ]	✓	✓
[sʌlkɹ_dʒ]	✓	✓
[pʌlkɹ_dʒ]	✓	✓
[tʌlkɹ_dʒ]	✓	✓
Table 4.4. Frequency of the syllables in stimuli for Experiment 1

[gɹi]⁴
angry 65 68 19
hungry 34 36 3
agree 20 16 75
bachelor's degree 0 0 0
disagree 2 2 14
filigree 1 1 2
first-degree 1 1 1
pedigree 2 2 2
second-degree 1 1 0
agree 20 16 75
disagree 2 2 14
agree 20 16 75
disagree 2 2 14
agree 20 16 75
disagree 2 2 14

[gɹɪ]
(no words)

[kɹi]
4 Words with these onsets were extracted from Celex using the trohoc script. See the appendix to
Chapter 3 for the script and details of how frequency counts were made.
Frequency (per million words)
decree 0 0 0
scree 1 1 0
decree 0 0 0
decree 0 0 0
decree 0 0 0

[kɹɪ]
(no words)

[gɹidʒ]
(no words)

[gɹɪdʒ]
(no words)

[kɹidʒ]
(no words)

[kɹɪdʒ]
(no words)

NOTE: Some forms appear more than once, because they are homophonous but
morphologically different: (to) agree, (I) agree, (we) agree, etc. Celex divides form
frequency equally among the homophones (Burnage 1995).
Table 4.5 shows the total word-final frequencies of each critical syllable:
Table 4.5. Effect of the [k]/[g] manipulation on the frequency of the word-final syllables in
the stimuli of Experiment 1

[counts garbled in this copy: word-final [gɹi] is frequent; [kɹi] is rare; [gɹɪ], [kɹɪ], and the
four closed syllables [gɹidʒ], [gɹɪdʒ], [kɹidʒ], [kɹɪdʒ] do not occur at all]
The closed syllables are very infrequent — they are not represented in Celex at all.
Both [i] and [ɪ] are phonotactically permissible in them, since they are closed. These
syllables provide a baseline context in which to assess the statistical effects, if any, of the
[k]/[g] manipulation.
The legal open syllable [gɹi] is far more frequent than the illegal [gɹɪ], especially in
word-final position. TRACE expects the large frequency difference to produce a boundary
shift, as the many words containing final [gɹi] feed activation down to support [i] over [ɪ].
The frequency difference between [kɹi] and [kɹɪ] is much smaller, leading to a smaller
predicted boundary shift. The MERGE TP and OT grammatical theories treat the [g]/[k]
manipulation as irrelevant - it lies outside the statistically relevant context in MERGE TP,
and outside the structural description of *LAX]σ in the OT grammatical theory - so both
predict boundary shifts of the same size in the [g] and [k] conditions.
4.2.3. Predictions
This section spells out in more detail the predictions of the three theories.

4.2.3.1. TRACE

It is notoriously hard to predict how a network will behave - they are just too
complicated. To check that TRACE really did make the predictions outlined in the last
section, the network was simulated directly.
The TRACE architecture has many adjustable parameters, controlling things like the
speed with which activation spreads and the relative weight given to the different node
layers. The first step in the simulation was to get the right parameters and replicate the
results of McClelland and Elman (1986). After some trial and error, the settings in Table
4.6 were arrived at.⁵
5 Thanks are due to Jeff Elman, one of TRACE's creators, for sharing his software.
Table 4.6. Parameter settings for the TRACE simulation (all experiments)

alpha[FP]	0.02	feature-to-phoneme excitation
alpha[PF]	0.00	phoneme-to-feature excitation
alpha[PFC]	0.00	feature-to-phoneme coarticulation
[remaining rows garbled in this copy]
The first check was of the network's behavior in a configuration analogous to the
Massaro and Cohen (1983) experiment. The network was given as input the string /sLi/,
where /L/ was featurally ambiguous between /r/ and /l/. After 51 cycles, the network display
corresponded almost exactly to McClelland and Elman's (1986) Figure 7, Panel 3, with /l/
about twice as active as /r/ among the phoneme nodes and sleep and sleet the most active
words:
Figure 4.7. Results of the TRACE simulation replicating Figure 7 of McClelland and
Elman (1986)

[TRACE activation display at Cycle 51: among the word units, slip (sleep) and slit (sleet)
are the most active (31); among the phoneme units, /l/ is about twice as active as /r/]
The horizontal axis represents time; the words and phonemes are arrayed along the
vertical axis. TRACE has a separate unit for each word or phoneme at each time cycle
(corresponding to the hypothesis that the utterance contains that word or phoneme
beginning at that time cycle). The activation level of each unit at the present time, Cycle 51,
is shown by a number if it is greater than zero. So, for instance, the network at this moment
is confident, to a degree of 68 (arbitrary measurement units out of 99), that there is an [s] at
time slice 12, and to a degree of 31 that the signal contains the word sleep, beginning at time
slice 12.
As a second check, the network was asked to process the input /Tluli/, where /T/ is
ambiguous between the phonotactically legal /p/ and the illegal, but lexically favored, /t/
(supported by the nearby word truly). By Cycle 54, we see an output almost identical to
McClelland and Elman's (1986) Figure 8, Panel 3: /truli/ is the leading word, and the
leading candidate for the cluster is the unphonotactic [tl] - the [t] being the lexically favored
disambiguation of the ambiguous 'p/t', and the [l] being the phonetically supported parse
of the liquid.
Figure 4.8. Results of the TRACE simulation replicating Figure 8 of McClelland and
Elman (1986)

[TRACE activation display at Cycle 54: truli is the most active word (45); the phoneme
units favor [t] and [l] as the parse of the initial cluster]
The original TRACE has only four vowels: /i a u ʌ/. These are distinguished one
from another by the features DIFfuse (evidently F2-F1: 8=high, 1=low), ACUte (evidently
F2: 8=front, 1=back), and POWer (8 for [i a u], 7 for [ʌ], which covers both [ʌ] and [ə]).
Table 4.9. Featural parameters of the four original TRACE vowels (McClelland & Elman
1986)

	DIF	ACU	POW
i	8	8	8
a	2	1	8
u	6	2	8
ʌ	5	1	7
A new phoneme was added, corresponding to IPA [ɪ], along with a new ambiguous
phoneme [X], whose feature values lie midway between [i] and [ɪ]:

Table 4.10. Featural parameters of the new vowels [ɪ] and [X]

	DIF	ACU	POW
i	8	8	8
X	8	7	8
ɪ	8	6	8
In tests with the lexicon turned off, /X/ was found to activate /i/ and /ɪ/ equally.
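The construction of the ambiguous vowel can be sketched as follows. This is an illustrative reconstruction (variable names are mine, not TRACE's), using the feature values of Table 4.10:

```python
# Sketch (not TRACE's actual code): the ambiguous vowel [X] is given feature
# values midway between [i] and the new lax [I] on the three TRACE vowel features.

vowel_i = {"DIF": 8, "ACU": 8, "POW": 8}   # tense [i]
vowel_I = {"DIF": 8, "ACU": 6, "POW": 8}   # new lax [I]

# Average each feature; only ACUteness distinguishes the two vowels.
vowel_X = {f: (vowel_i[f] + vowel_I[f]) / 2 for f in vowel_i}
# vowel_X == {"DIF": 8.0, "ACU": 7.0, "POW": 8.0}, matching the [X] row of Table 4.10
```

Since the two vowels differ only in ACUteness, the averaged [X] is acoustically "between" them on exactly one dimension.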
The original TRACE lexicons only included words whose vowels were drawn from
the set /i a u ʌ/. To create a new lexicon including /ɪ/-bearing words, I took the CELEX
English lemma pronunciation file EPL.CD, merged it with the CELEX English lemma
frequency file EFL.CD, and culled therefrom the words having all of the following
properties:
1. They contained only the phonemes in the modified TRACE (the original
complement + [ɪ]), with TRACE [A] corresponding to CELEX [V] and [@] (IPA [ʌ] and
[ə]).
2. Since TRACE does not support the [dʒ] phoneme, it was simulated with [S].
CELEX words containing [ʃ] were excluded; those containing [dʒ] were included, with the
[dʒ] recoded as [S].
3. They occurred at least 16.7 times per million in the combined spoken and written
corpus. (McClelland and Elman's frequency cutoff was higher, but I had to go lower so
that this lexicon would be of similar size to their lexicon slex.)
The stress and syllabification marks were stripped, the phoneme codes were
converted, and homophones were collapsed into a single entry, with the frequencies added
together. Celex's British-English coding of final [ɪ], as in angry, was converted to the
American English [i]. BrEng [ɑː] was converted into AmEng [æ].
The resulting lexicon contained 241 lemmas (making it about the same size as the
original slex, which has 213). Of them, 5 words had /gɹi/, 3 of them finally: agree, degree,
disagree, Greek, and Greece/grease, while 1 word had /kɹi/, nonfinally: creep. It contained
no words with [gɹidʒ] or [kɹidʒ]. There were no words with /gɹɪ/, but 5 had /kɹɪ/ (e.g.,
cricket).
TRACE can be made sensitive to word frequency. Since McClelland and Elman
(1986) did not use this feature in their simulations, I did not use it in this one.
Simulations with this lexicon showed that TRACE did not distinguish between the
open- and closed-syllable contexts at all. There was no difference in activation level
between [i] and [ɪ], regardless of whether the vowel was word-final or not, because TRACE
was insufficiently sensitive to silence as a word-boundary cue. To remedy this, the TRACE
symbol for silence was added to the end of each word in the lexicon, where it acted as
another phoneme.
To establish a baseline free of lexical bias, the network was probed with the
closed-syllable stimuli [sʌlgɹXdʒ] and [sʌlkɹXdʒ]. Here we
expect little lexical influence, and we expect the [X] to be equally ambiguous in both
contexts. As a measure of ambiguity we will use the difference between [i] and [ɪ]
activation on Cycle 75 (about one syllable's length after the stimulus offset on Cycle 54).
This point was chosen because trial showed that no major changes in the relative activation
levels of the phoneme units happened later; rather, after Cycle 75, overall network activation
simply decayed.
In the [gɹ_dʒ] condition, [i] leads [ɪ] by 43 to 36; in the [kɹ_dʒ] condition, [ɪ] leads
[i] by 40 to 34. This represents a modest but definite bias, due mostly to the influence of
Greek, grease/Greece, and greet. The [ɪ] interpretation is supported in both cases by the
lexical item ridge, but in the [gɹ_dʒ] condition ridge's activation is reduced by inhibition
from the more active [gɹ]-words.
We now make the same comparison for the critical open-syllable cases [sʌlgɹX] and
[sʌlkɹX].
Figure 4.11. Results of the TRACE simulation for the input [sʌlgɹXdʒ]

[TRACE activation display at Cycle 75: [i] activation 43, [ɪ] activation 36; Greek,
grease/Greece, and greet among the active words]
Figure 4.12. Results of the TRACE simulation for the input [sʌlkɹXdʒ]

[TRACE activation display at Cycle 75: [ɪ] activation 40, [i] activation 34; ridge among
the active words]
Figure 4.13. Results of the TRACE simulation for the input [sʌlgɹX]

[TRACE activation display at Cycle 75: [i] activation 52, [ɪ] activation 26; agree and
other gr- words among the active words]
Figure 4.14. Results of the TRACE simulation for the input [sʌlkɹX]

[TRACE activation display at Cycle 75: [i] activation 42, [ɪ] activation 33]
TRACE favors [i] over [ɪ] by 52 to 26 in the [gɹ_] condition (thanks to the support
of agree), and by 42 to 33 in the [kɹ_] condition. There is a phonotactic bias towards [i] in
both cases, but it is much larger in the [gɹ_] than the [kɹ_] condition, as we had supposed, owing
to the lack of words ending in [kɹi]. If we take the difference in activation level between the
[ɪ] and [i] units as our predictor of effect size, we expect the proportions of [ɪ]
responses evoked by the same ambiguous vowel in the different contexts to be ordered as
follows: the most [ɪ] responses in [kɹ_dʒ] (difference = +6), then [gɹ_dʒ] (−7) and [kɹ_]
(−9), and the fewest in [gɹ_] (−26).
What has happened is that words containing a non-final [gɹi], [kɹi], [gɹɪ], or [kɹɪ]
become partially activated, and provide the same amount of top-down support, regardless of
whether the final syllable of the stimulus is open or closed. However, the open-syllable
condition also allows the population of similar [gɹi]-final words to assist [i]. (In this
example, with a restricted lexicon, that population is limited to agree, but as we have seen,
there are many more.) Since there is no comparable population of [kɹi]-final words, [i]
receives less assistance in the [kɹ_] condition than in the [gɹ_] condition.
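The predicted ordering can be read off mechanically from the Cycle-75 activations. The following sketch (context labels are mine) merely restates that arithmetic:

```python
# Sketch: ranking the four contexts by the predictor act([I]) - act([i]),
# using the Cycle-75 activation levels from the simulations reported above.

activations = {              # context: (activation of [i], activation of [I])
    "gr_dZ": (43, 36),
    "kr_dZ": (34, 40),
    "gr_":   (52, 26),
    "kr_":   (42, 33),
}

# A larger difference predicts a higher proportion of [I] responses.
diff = {ctx: lax - tense for ctx, (tense, lax) in activations.items()}
ranking = sorted(diff, key=diff.get, reverse=True)
# ranking == ["kr_dZ", "gr_dZ", "kr_", "gr_"]  with diffs +6, -7, -9, -26
```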
4.2.3.2. MERGE TP
There are two versions of the MERGE TP theory to be considered: INC-1, which
treats preceding and following context separately, and SC-1, which treats them together. As
we discussed in §1.3, the theories' predictions are made using different decision variables.
When a segment ambiguous between x and y is judged in the context A_B, the boundary
location is predicted from transitional probabilities estimated from a corpus.
(4.16) SC-1 theory (surrounding context of one segment each way)

In this experiment, the context A is [ɹ], and the context B is either the word
boundary [#] or [dʒ]. Context counts were made from the Celex wordforms database.
[ɹi] and [ɹɪ]. All words coded as containing [ɹi] or [ɹɪ] were extracted. There were
4117 words with [ɹi], occurring 124902 times in the 18-million-word corpus, and 12150
words with [ɹɪ], occurring 493281 times. To correct for Celex's coding of American English
final unstressed [i] as [ɪ] (as in angry), the 918 words ending in [ɹɪ], occurring 122617
times, were transferred to the [ɹi] group. The resulting frequency counts were

(4.17)
	Words	Occurrences (18-million-word corpus)
[ɹi]	5035	247519
[ɹɪ]	11232	370664

Since [ɹ] occurs 108263 times per million words in Celex (EPW.CD, combined spoken and
written), the probabilities are
(4.18)
Pr([ɹi] | [ɹ_]) = 0.127
Pr([ɹɪ] | [ɹ_]) = 0.190

[i#] and [ɪ#]. The latter does not occur at all in American English. To estimate the
frequency of the former, all words in Celex (wordforms, EPW.CD) coded as ending in [i]
or [ɪ] were extracted, and the [ɪ]-final words were recoded as [i] (to correct for Celex's
British-English transcription of words such as angry). A total of 7611 words were found,
with a total frequency of 2422130 in the 18-million-word corpus, or 134563 per million.
Since the word-boundary [#] occurs one million times per million words, the probabilities
are
(4.19)
Pr([i#] | [_#]) = 0.135
Pr([ɪ#] | [_#]) = 0.000
[idʒ] and [ɪdʒ]. Following the same procedure, we find 133 words with [idʒ],
occurring 5965 times in the 18-million-word corpus or 332 times per million words, and
1340 words with [ɪdʒ], occurring 62103 times in 18 million words or 3450 times per million
words. Since [dʒ] occurs non-initially 12483 times per million words, the probabilities are
(4.20)
Pr([idʒ] | [_dʒ]) = 0.027
Pr([ɪdʒ] | [_dʒ]) = 0.276
(4.21)
The INC-1 theory predicts that [i] will be strongly favored in the open-syllable
context, and that [ɪ] will be favored in the closed-syllable context. If the closed-syllable
context is taken as a baseline, we should observe a strong phonotactic shift in favor of [i]
when in the open-syllable context. The rate of [ɪ] responses across the continuum should
therefore be lower there.
[ɹi#] and [ɹɪ#]. The latter does not occur in American English. The former is found
in 12249 words in Celex wordforms (EPW.CD) (coded as [ɹi#] in 99 cases like agree, and
as [ɹɪ#] in 12150 cases like angry) with a total frequency of 32795 occurrences in the 18-
million-word corpus, or 7952 times per million. Since sequences of the form [ɹX#] occur
11409 times per million words, the probabilities are
(4.22)
Pr([ɹi#] | [ɹ_#]) = 0.697
Pr([ɹɪ#] | [ɹ_#]) = 0.000
[ɹidʒ] and [ɹɪdʒ]. The former is found in 53 wordforms with a total frequency of 94
per million; the latter in 399 wordforms with a total frequency of 956 per million. Since
there are 1581 occurrences of [ɹXdʒ] per million words, the probabilities are
(4.23)
Pr([ɹidʒ] | [ɹ_dʒ]) = 0.059
Pr([ɹɪdʒ] | [ɹ_dʒ]) = 0.605
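The conditional probabilities in (4.17)-(4.23) all have the same shape: the per-million frequency of the specific sequence divided by the per-million frequency of its context. A minimal sketch (the function name is mine):

```python
# Sketch: a transitional probability is the per-million frequency of the
# specific sequence divided by the per-million frequency of its context.

def transitional_prob(seq_per_million, context_per_million):
    """Pr(sequence | context), estimated from corpus frequencies."""
    return seq_per_million / context_per_million

# The [r_dZ] case of (4.23): [ridZ] occurs 94 and [rIdZ] 956 times per
# million, in a context [r X dZ] occurring 1581 times per million.
p_tense = transitional_prob(94, 1581)    # ~0.059
p_lax   = transitional_prob(956, 1581)   # ~0.605
```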
Here again, the theory predicts a strong shift in favor of [i] in the open-syllable
context, and a shift in the other direction in the closed-syllable context. If the closed-
syllable context is taken as a baseline, we expect a large boundary shift in favor of [i] in the
open-syllable context. The rate of [ɪ] responses across the continuum is expected to be
correspondingly lower there.
4.2.3.3. The OT grammatical theory

When a stimulus is presented which is ambiguous between, e.g., [sʌlgɹi] and
[sʌlgɹɪ], the Phonetic Parser may emit the parse [sʌl.gɹi], the parse [sʌl.gɹɪ], or both, with
probability depending on how close the stimulus is acoustically to [i] or [ɪ]. If both parses
are emitted, they will be scored with respect to the active constraints of English, and the
more harmonic one will be processed first.
Since the grammar of English forbids final lax vowels such as [ɪ], there must be an
active markedness constraint against them:

(4.24) *LAX]σ: No lax vowel at the end of a syllable.

For our hypothetical example, this will be the only active constraint that
distinguishes the two parses:

UR = •	*LAX]σ
a. (•, [sʌlgɹi])
b. (•, [sʌlgɹɪ])	*!
Since the more harmonic parse is processed first, responses will tend to be based on
the [i] parse, creating a bias towards [i] responses. Since the bias is caused by *LAX]σ, it will
be present to the same degree in any stimulus meeting that constraint's structural description
- specifically, to the same degree in the [gɹ_] contexts as in the [kɹ_] contexts.
In the closed-syllable conditions, where both [i] and [ɪ] are legal, no active
markedness constraints are violated by either parse, so no bias should be observed - and,
in particular, no difference between the [g] and [k] conditions.
4.2.4. Methods
Paradigm. The task was an AXB judgment. Listeners heard one endpoint of a
continuum, then an intermediate stimulus, then the other endpoint, and judged whether the
intermediate stimulus (X) sounded more like A or more like B. Response was by button
press. Every AXB was also presented as BXA to counterbalance for primacy, recency, and
handedness effects.
Stimuli. The A and B stimuli were synthetic disyllabic nonwords, stressed on the
second syllable, which differed in one segment — the initial segment for fillers, the vowel of
the second syllable for critical items. Between each A and B there were five intermediate X
stimuli, separated from each other and the endpoints by equal steps, making a 7-step
continuum in all.⁶ Synthesizer parameters for the stimuli can be found in the appendix.
Stimuli were of high quality and sounded very similar to natural speech.
Figures 4.26 and 4.27 show how the A-to-B stimulus scales were constructed.
Every possible combination of the bracketed options was used to give a total of 32 filler
scales and 32 critical scales. Since there were 2 endpoints and 5 intermediate points for
each scale, the experiment required synthesizing 448 nonwords. This was done using
homebrew software that constructed the intermediate points from the endpoints by linear
interpolation.
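The interpolation scheme can be sketched as follows. This is a minimal reconstruction, not the actual homebrew software, and the parameter values are hypothetical:

```python
# Sketch: each intermediate stimulus is a linear interpolation, in synthesizer
# parameter space, between the A and B endpoints of a 7-step scale.

def interpolate(a, b, step, n_steps=7):
    """Parameter vector for stimulus `step` (1 = endpoint A, n_steps = endpoint B)."""
    t = (step - 1) / (n_steps - 1)
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Hypothetical one-parameter example: an F2 target of 2200 Hz for the [i]
# endpoint against 1900 Hz for the [I] endpoint; step 4 is the scale midpoint.
print(interpolate([2200.0], [1900.0], step=4))   # [2050.0]
```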
6 This is considerably fewer intermediate steps than were used by Massaro and Cohen (1983) or Pitt
(1998). The disyllabic stimulus words, and the three-stimulus AXB trials, used in the present experiment
made each trial much longer than the simple monosyllabic X stimuli of those authors. Use of more
intermediate steps in this experiment would have led to an impractically long experimental session. In the
event, acceptable psychometric functions and high significance levels were obtained with 5 intermediate
steps.
Figure 4.26. Schema for the filler stimuli of Experiment 1

	{p-t, s-z} + ʌl + {g, k} + ɹ + {i, ɪ} + {∅, dʒ}

Figure 4.27. Schema for the critical stimuli of Experiment 1

	{p, t, s, z} + ʌl + {g, k} + ɹ + (i-ɪ) + {∅, dʒ}
The sound [dʒ] was chosen because, being palatoalveolar, it has little coarticulatory
interaction with, or acoustic effect on, the high front vowels [i] and [ɪ]. The synthesis
parameters for the [dʒ] were adjusted so that the transition from [i] to [dʒ] and that from [ɪ]
to [dʒ] sounded equally natural.⁷ Trials were thus of the form [pʌlkɹi]-X3-[pʌlkɹɪ] or
[zʌlgɹidʒ]-Xi-[zʌlgɹɪdʒ]. Because the A and B stimuli on any given
7 A pilot experiment used final syllables that were closed with [b] instead of [dʒ], and that began with [b] or
[g] rather than [k] or [g]. Identification of ambiguous tokens turned out to depend mostly on how many
[b]s were in the stimulus: the more [b]s, the more [i]-like the vowel sounded. This was probably an effect
of compensation for (expected) coarticulation — listeners expected the vowel formants to be lowered by
trial could differ in initial consonant or final vowel, subjects never knew where the difference
was until they heard the X stimulus. This, together with the instruction to compare whole
words and the greater variety of word-initial than word-final differences, was intended to
distribute their attention more evenly over the word and discourage the strategy of listening
only to the final vowel in critical trials. The hope was to produce stronger TRACE-type
lexical effects, if any were to be had.
Subjects. All subjects reported having normal hearing and being native speakers of
American English. They were recruited by poster and paid for their participation. They were
naive to the purpose of the experiment.
Procedure. Subjects were tested four at a time in a quiet room. AXB stimuli were
low-pass filtered at 4.133 kHz (down about 80 dB at 5 kHz), amplified, and presented with
brief silences between the end of A and the beginning of X, and between the end of X and
the beginning of B. Subjects were told that they would hear three "words", that the middle
word had been digitally synthesized to be acoustically in between the first and last word,
and that they were to judge as quickly and accurately as they could whether it sounded more
like the first word or the last word. Response was by button press — the leftmost button on
the response box for the first word, the rightmost for the last word. After the last subject
had responded, or 2.5 s had passed, there was a pause of 2.5 s, followed by the next trial.
Each of the 320 different trials was presented twice. The experiment lasted 2 hours, broken
by a 5-minute break halfway through.
4.2.5. Results
One subject, who in all four open-syllable conditions (gɹi/gɹɪ/kɹi/kɹɪ) gave fewer
than 75% [ɪ] judgments at position 1 and more than 25% [ɪ] judgments at position 5, was
labialization, and "corrected" them by, in effect, adding a few tens of Hz to the F2s of stimuli with [b]s in
excluded from analysis. All other subjects were very consistent at the extreme positions.
Their identification curves are shown in Figure 4.28 (averaged over all subjects). Half of the
data from three subjects (consisting of one presentation of each trial) was lost through
experimenter error.
Figure 4.28. Identification curves for the stimuli of Experiment 1, pooled across 14
listeners

[plot: % [ɪ] response (0-100) as a function of intermediate stimulus number (1-5), one
curve per carrier condition]
For a test statistic, we used each subject's mean % [ɪ] responses across all five
intermediate stimuli.
them. Result: [b] shifted the vowel judgments in favor of the more fronted [i].
Table 4.29. Mean % [ɪ] response, all intermediate stimuli

The order of [ɪ] response rates is [gɹ_dʒ] = [kɹ_dʒ] > [gɹ_] = [kɹ_], precisely as
predicted by the MERGE TP and OT grammatical theories, and very different from the
predictions of TRACE.
The confidence intervals are wide because there is a great deal of individual variation
between subjects in their overall [ɪ] report. To reduce this, the results were submitted to a
paired-sample t-test. We have three degrees of freedom, so we can test three differences: 1.
between [gɹ_dʒ] and [kɹ_dʒ], predicted by MERGE TP and OT/grammar to be zero and by
TRACE to be negative; 2. between [gɹ_dʒ] and [gɹ_], to see whether the paradigm is
sensitive enough to find a phonotactic effect; 3. between the shift from [gɹ_dʒ] to [gɹ_] and
that from [kɹ_dʒ] to [kɹ_]. This last is the crucial comparison; TRACE predicts that the
difference will be positive (that the shift will be larger for the g syllables), while the other
two models predict that it will be zero. The numbers are given in Table 4.30.
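The paired-sample t statistic used here is the standard one. A self-contained sketch follows; the per-subject scores are illustrative numbers of mine, not the real data of Table 4.30:

```python
# Sketch: paired-sample t statistic, i.e. the mean of the by-subject
# differences divided by the standard error of that mean.
from math import sqrt

def paired_t(xs, ys):
    d = [x - y for x, y in zip(xs, ys)]                  # by-subject differences
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)    # sample variance of d
    return mean / sqrt(var / n)

# Illustrative (made-up) per-subject %-[I] scores in two conditions:
closed = [52.0, 48.0, 55.0, 50.0]
open_  = [40.0, 39.0, 43.0, 41.0]
t = paired_t(closed, open_)   # positive t: fewer [I] responses in open syllables
```

Pairing by subject removes the large between-subject differences in overall [ɪ] report, which is exactly why it is used here.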
Table 4.30. Differences in mean "ɪ" response, pairwise by subject

We find no difference, or at best a very small one, between the two closed syllables,
confirming that they may be used as a neutral baseline - as predicted by MERGE TP and
the OT grammatical theory.
The CIs for the differences between the open and closed syllables exclude zero (in
fact, they exclude zero even at a 99% confidence level; one-tailed t-test, t(0.01, 14) = 2.624,
p < 0.001). In 13 out of the 14 valid subjects, judgments shifted towards [i] on both the g
continuum and the k continuum. We have thus replicated the Massaro and Cohen (1983)
phonotactic effect with new stimuli.
Moreover, the effects seem to be the same irrespective of how many lexical items are
similar to the legal nonword; the difference between the effect in the common g syllables
and the rare k ones is close to zero. The subjects' numerical differences clustered around
zero, split evenly between positive and negative (there was no sign that they divided into two
groups of responders).
TRACE's only explanation is lexical activation spreading from partially overlapping words.
But a drastic reduction in the number and frequency of those words (when the [g] was
replaced by [k]) produced no reduction in the size of the effect.
4.2.6. Discussion
The results are clear-cut: They are very much as predicted by the MERGE TP and
the OT grammatical theory, but very different from our expectations under TRACE.
compared to closed ones. TRACE could only explain this by activation-spreading from
similar lexical items. However, a large difference between conditions in the number (and
frequency) of those items, caused by the [k]/[g] manipulation, produced equally large shifts
in both conditions. This result is not a "null effect". It is two positive effects - one of
which was expected under all theories, and the other of which was not expected under
TRACE.
The MERGE TP theory and the OT grammatical theory were both able to explain
the observed facts, because both of them focused on the crucial, systematic phonotactic
difference between the conditions and ignored incidental variation elsewhere in the stimulus.
TRACE, because it can't ignore anything, failed to predict an effect that was actually
observed.
Had we used a MERGE TP theory with a context window
greater than one, as considered by Pitt and McQueen (1998: Note 2), we would have
erroneously predicted the same pattern of results as TRACE, for the same reason.

4.3. Experiment 2

4.3.1. Rationale
Experiment 1 showed that the size of the phonotactic
effect is unaffected by phonological context which does not directly participate in the
phonotactic pattern. To check that this finding was not an artifact of the acoustic
constitution of the stimuli, or of some idiosyncrasy of the lexicon, it was decided to replicate
it with different materials. The phonotactic gap exploited was the lack of syllable-initial
[pw] discussed in §1.3.2.5. Listeners were to categorize a word-initial consonant
ambiguous between [p] and [k].
Given the low frequency of [pw] and high frequency of [kw], both TRACE and the
MERGE TP theory predict that fewer [p] responses will be given before [_w]. If the size of
the boundary shift is modulated by the vowel following the [_w], this would favor TRACE,
which is sensitive to the entire carrier stimulus, over MERGE TP, which considers only the
immediately adjacent context.
As we saw in §2.3.2.5, the lack of [pw] is a lexical rather than a phonological gap.
Since no active markedness constraint forbids [pw], the OT grammatical theory predicts no
boundary shift.
4.3.2. Design
Table 4.31. Phonotactics of the stimuli for Experiment 2

[table garbled in this copy: eight carriers combining the glides [w]/[ɹ] with the vowels
[i]/[æ] in a fixed frame; the [p] endpoint is phonotactically marginal before [w] but legal
before [ɹ], while [k] is legal in all eight carriers]
The phonotactic marginality of the [p] endpoint could, as Table 4.31 shows, be
varied by changing the following glide from [w] to [ɹ]. Since the ambiguous consonant was
always preceded by the same thing (silence), the TP theories (both INC-1 and SC-1) expect
only the [w]/[ɹ] manipulation to affect the location of the [p]-[k] boundary.
Similarity of the stimulus to other words in the lexicon was varied by manipulating
the quality of the vowel following the glide: When the vowel was [i], the stimulus was
closer to more words than when it was [æ]. Table 4.32 counts the words beginning with
each of the stop-glide-vowel sequences used in this experiment. The cohorts are given in
the appendix to this chapter. (These cohorts, unlike those in the previous experiment, were
extracted from the American English Kucera-Francis database rather than from Celex, even
though Celex is much more complete, because there are two different phonemes in British
Sequence  Cohort size  Frequency
[kwi]  24  113
[pwi]  1  0
[kwæ]  3  10
[pwæ]  0  0
[kɹi]  44  418
[pɹi]  74  772
[kɹæ]  21  77
[pɹæ]  14  139
Both [p] and [k] are roughly equally frequent in the [_ɹi] and [_ɹæ] contexts,
providing a statistically neutral baseline. Only [k] is found in the [_wi] and [_wæ] contexts
(except for the very infrequent puissance, probably not known to most of the listeners), but
it is much more common before [_wi] than before [_wæ]. By the same logic as in
Experiment 1, TRACE predicts more lexical activation in the [_wi] stimuli than in the [_wæ]
ones, and hence a larger bias towards [k]. The MERGE TP and OT grammatical theory
predict that the vowel will make no difference - MERGE TP because it is outside the
statistically relevant context, and the OT grammatical theory because no active markedness
constraint is violated.
4.3.3. Predictions
4.3.3.1. TRACE

A [w] was added to the TRACE phoneme inventory; this was done by modifying the featural parameters for [u]. The glide was
A new ambiguous phoneme [Y] between [p] and [k] was constructed by averaging
the feature values of the original [p] and [k]. The new [Y] was not quite in between them;
when run with the lexicon turned off, TRACE tended to favor the [p] interpretation. The
lexicon was constructed from the American English Kucera-Francis database. All words were extracted which met the following criteria:
1. They contained only phonemes which were in the new TRACE inventory. Since
TRACE does not support [æ], words with [a] were eliminated, and words with [æ] were
included, with the [æ] recoded as "a". Since TRACE does not support [f], words with [S]
were eliminated, and words with [f] were included, with the [f] recoded as [S].
The latter two criteria were imposed to keep the lexicon to a size comparable with
that used by McClelland and Elman (1986). This procedure resulted in 494 lexical items.
The critical experimental sequences occurring in the lexicon are shown in Table 4.33:
Sequence  Words
[pwi]  (none)
[kwæ]  quack
[pwæ]  (none)
[pɹæ]  (none)
In order that the simulated lexicon should better approximate the real one, the words
practise8 and practically were added so that the [pɹæ] cell would not be empty. These
words had been excluded by the lexicon-constructing procedure because practise had zero
frequency and practically had a syllabic [l] in the 4-syllable pronunciation favored by
Kucera and Francis. They were added as [pɹæktis] and the 3-syllable [pɹæktikli].
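The selection-and-recoding procedure can be sketched as follows. The inventory, the single-character phoneme coding (with "A" standing in for [æ]), and the function name are illustrative assumptions, not the actual implementation:

```python
# Sketch of the lexicon-construction procedure: keep only words whose
# phonemes fit an (assumed) TRACE inventory after the recodings described
# in the text, and drop zero-frequency words (cf. the exclusion of practise).
TRACE_PHONEMES = set("pbtdkgsSrlaiuw^")  # assumed single-character coding
RECODE = {"A": "a",   # "A" stands in for [ae], recoded as "a"
          "f": "S"}   # [f] recoded as [S]

def build_lexicon(entries):
    """entries: {word_as_phoneme_string: frequency} -> {word: recoded form}."""
    lexicon = {}
    for word, freq in entries.items():
        if freq == 0:
            continue                      # zero-frequency words excluded
        recoded = "".join(RECODE.get(ph, ph) for ph in word)
        if set(recoded) <= TRACE_PHONEMES:
            lexicon[word] = recoded       # keep only fully recodable words
    return lexicon

print(build_lexicon({"kwAk": 10, "fit": 5, "xyz": 3, "tap": 0}))
# {'kwAk': 'kwak', 'fit': 'Sit'}
```

Note that, as in the text, removing the words that genuinely contain the recoding target (here [S]) keeps the recoding from creating spurious homophones.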
Pre-testing showed that, because of the close similarity between the [u] and [i]
phonemes in TRACE, the stimulus [pɹifkʌs] produced very strong activation of the word
proof, which soon came to dominate the pattern of activation. These two phonemes are not
easily confusable in actual speech; Luce's confusion matrices (1986: Table 3.8) show that,
even at a -5 dB signal-to-noise ratio, [i] was heard as [u] only 3.3% of the time. Judging
this to be an undesirable artifact of TRACE's small feature set and my choice of [i]
The simulation was run using the inputs [Ywæfkʌs], [Ywifkʌs], [Yɹæfkʌs], and
[Yɹifkʌs]. As before, the measure of predicted effect size was taken to be the difference in
activation between the [p] and [k] units at Cycle 75. These are shown in Table 4.34:
Table 4.34. Results of the TRACE simulation of Experiment 2: Activation levels at Cycle 75

Stimulus  [p]  [k]  [p] - [k]
[Ywifkʌs]  21  46  -25
[Ywæfkʌs]  23  42  -19
[Yɹifkʌs]  36  37  -1
[Yɹæfkʌs]  26  40  -16
As we expected, the [_wi] context produces a higher level of [k] activation than the
[_wæ] context, and a larger difference between [p] and [k] activation levels. The [_ɹi]
context is very nearly neutral between [p] and [k]. The [_ɹæ] context produces an
unexpectedly strong bias towards [k], which on closer inspection proves to be emanating
from the highly active word craft. This is no artifact; [kɹæfkʌs] contains a sequence
differing from craft in only one feature, and that late in the word where acoustic mismatches
are least inhibitory. TRACE's predictions of the rate of [p] response across the whole
continuum are therefore [_ɹi] > [_ɹæ] = [_wæ] > [_wi]. The relatively low frequency of
craft may increase the [p] bias before [_ɹæ] in actual practice, leading to a predicted order
[_ɹi] > [_ɹæ] > [_wæ] > [_wi], but in any case we do expect [_wæ] > [_wi] - that is, that the
magnitude of the phonotactic shift away from the illegal *[pw] sequence will be modulated by the following vowel.
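Given activation levels like those in Table 4.34, the predicted ordering is just the contexts sorted by the [p] - [k] activation difference; a minimal sketch (the ASCII context labels are mine):

```python
# Contexts ranked by predicted [p] bias: the difference between the [p] and
# [k] unit activations at Cycle 75, as reported in Table 4.34.
activation = {  # context: ([p] activation, [k] activation)
    '_wi': (21, 46), '_wae': (23, 42), '_ri': (36, 37), '_rae': (26, 40),
}
difference = {ctx: p - k for ctx, (p, k) in activation.items()}
ranking = sorted(difference, key=difference.get, reverse=True)
print(ranking)  # ['_ri', '_rae', '_wae', '_wi']
```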
4.3.3.2. MERGE TP
The theory comes in two versions, depending on what is taken to be the relevant phonological context: the INC-1 version,
which uses the segment preceding and the segment following the ambiguous segment, taken
separately, and the SC-1 version, which takes context to consist of the preceding and
following segment as a unit. Both of these theories are crucially distinct from TRACE in
that the first vowel of the stimulus lies outside the relevant context.
4.3.3.2.1. INC-1 context

Frequency counts derived from the American English Kucera-Francis database are shown in Table 4.35:
9 It is not clear how differing activation levels in TRACE units are to be mapped onto different
boundaries, or what constitutes a "large" difference. An estimate is provided by the simulation of the
Massaro-Cohen r/l effect in McClelland and Elman (1986), and replicated in §1.3.2.1, which found a 20-
Table 4.35: Diphone frequencies for the stimuli in Experiment 2

Diphone  Frequency
[#p]  23245
[#k]  26201
[pw]  27
[kw]  2480
[pɹ]  8525
[kɹ]  3018
Since the word-boundary symbol [#] occurs, by definition, one million times per
million words,

(4.36)
Pr([p] | [#_]) = 23245/1000000 = 0.0232
Pr([k] | [#_]) = 26201/1000000 = 0.0262
Initial [w] occurs 43594 times per million words, and [w] in general 59750 times, so non-initial [w] occurs 16156 times per million words. Hence
point difference between the activation of the [ɹ] and [l] units after [s]. A difference of this magnitude
should correspond to an effect of substantial size.
(4.37)
Pr([p] | [_w]) = 27/16156 = 0.0017
Pr([k] | [_w]) = 2480/16156 = 0.1535
Initial [ɹ] occurs 14864 times per million words, and [ɹ] in general 118503 times, so non-initial [ɹ] occurs 103639 times per million words. Hence
(4.38)
Pr([p] | [_ɹ]) = 8525/103639 = 0.0823
Pr([k] | [_ɹ]) = 3018/103639 = 0.0291
(4.39)
Pr([p] | [#_]) × Pr([p] | [_w]) = 0.0232 × 0.0017
Pr([k] | [#_]) × Pr([k] | [_w]) = 0.0262 × 0.1535
Pr([p] | [#_]) × Pr([p] | [_ɹ]) = 0.0232 × 0.0823
Pr([k] | [#_]) × Pr([k] | [_ɹ]) = 0.0262 × 0.0291
A [p] is 103 times less likely than a [k] in [#_w], but 2.5 times more likely in [#_ɹ].
The INC-1 theory therefore predicts a strong bias against [p] in the [#_w] context
compared to the [#_ɹ] context. The order of [p] report across the continuum is expected to
be [#_ɹi] = [#_ɹæ] > [#_wi] = [#_wæ], with no effect of the following vowel, which lies outside the relevant context.
4.3.3.2.2. SC-1 context
In this version of the MERGE TP theory, the left and right contexts are treated as a
single unit. Relevant frequency counts from the American English Kucera-Francis database are:

Sequence  Frequency
[#pw]  0
[#kw]  1172
[#pɹ]  6858
[#kɹ]  1487
Since sequences of the form [#_w] occur about 12704 times per million words, and those of the form [#_ɹ] about 39037 times,

(4.41)
Pr([p] | [#_w]) = 0/12704 = 0
Pr([k] | [#_w]) = 1172/12704 = 0.0922
Pr([p] | [#_ɹ]) = 6858/39037 = 0.1757
Pr([k] | [#_ɹ]) = 1487/39037 = 0.0381
Again, we expect a sizable bias against [p] in the [#_w] environment compared to
the [#_ɹ] environment, since [p] is infinitely less frequent than [k] in [#_w], but is 4.6 times
more frequent in [#_ɹ]. The following vowel, being outside the relevant context, is again
predicted to make no difference, so that the predicted order of [p] response across the continuum is [#_ɹi] = [#_ɹæ] > [#_wi] = [#_wæ].
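The corresponding SC-1 computation, with the frame treated as a unit (again, the names are mine):

```python
# SC-1 estimate: left and right contexts form a single triphone frame [#_G].
# Triphone counts per million words (Kucera-Francis), as given in the text.
frame_total = {'w': 12704, 'r': 39037}        # all [#Cw], [#Cr] sequences
triphone = {('p', 'w'): 0,    ('k', 'w'): 1172,
            ('p', 'r'): 6858, ('k', 'r'): 1487}

def sc1_prob(stop, glide):
    """Pr(stop | [#_glide]) with the frame treated as a unit."""
    return triphone[(stop, glide)] / frame_total[glide]

print(sc1_prob('p', 'w'))                          # 0.0: [pw] unattested
print(sc1_prob('p', 'r') / sc1_prob('k', 'r'))     # ~4.6: [p] favored
```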
4.3.3.3. OT grammatical theory

As we saw, [pw] onsets violate only OCP[LAB], which is too low-ranked to actually ban them. It does not dominate any faithfulness constraint, and so can have no effect on
perception.

UR = •  (faithfulness)  OCP[LAB]
a. (•, [kwifkous])
b. (•, [pwifkous])  .  *
However, it is also possible that listeners' grammars differ: some rank OCP[LAB]
high enough to make [pw] illegal, while others do not. For these listeners, the predicted
order of [p] response rates is [#_ɹi] = [#_ɹæ] > [#_wi] = [#_wæ], just as in the MERGE TP
theory. The vowel plays no role, because it is outside the structural description of
OCP[LAB]. The effect will be attenuated by averaging these listeners in with the others, but
no other pattern of results should occur besides the null one and [#_ɹi] = [#_ɹæ] > [#_wi] =
[#_wæ].
4.3.4. Methods
The same methods were used as in Experiment 1, except for the stimuli, which were
[Schematic of the stimulus construction: tokens from the [p]-[k] continuum spliced onto the glide-vowel carriers to form the A, X, and B stimuli.]
The [p]-[k] continuum was made by contracting the bandwidth of a burst centered
at 875 Hz from 1000 Hz wide to 100 Hz wide, so that a diffuse burst became a compact one.
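A sketch of how such a bandwidth continuum could be laid out; the number of steps and the linear spacing are assumptions for illustration, not the author's recipe:

```python
# Burst-bandwidth continuum sketch: a burst centered at 875 Hz whose width
# shrinks from 1000 Hz (diffuse, [p]-like) to 100 Hz (compact, [k]-like).
# Seven linearly spaced steps are assumed here for illustration.
CENTER, WIDE, NARROW, N_STEPS = 875, 1000, 100, 7
widths = [WIDE - i * (WIDE - NARROW) / (N_STEPS - 1) for i in range(N_STEPS)]
bands = [(CENTER - w / 2, CENTER + w / 2) for w in widths]  # band edges in Hz
print(bands[0], bands[-1])
```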
As in Experiment 1, there could be an ambiguous phoneme at the beginning or end
of the stimulus. On any AXB trial, listeners could not know where it was until they had
To minimize the chance that they were familiar with [pw] onsets from foreign-language
study, only listeners who had not studied French or Spanish were allowed to participate.
4.3.5. Results
One listener did not reach criterion performance on the endpoint stimuli and
was discarded, leaving 7. A total of 3840 trials were collected, of which 100 were
discarded (for pressing an unassigned button, or having an RT above 1500 ms). Results for
each of the 4 conditions are shown in Figure 4.45 and Table 4.46. No trace of a
Figure 4.45. Identification curves for the stimuli of Experiment 2, pooled across 7 listeners
[Percent "p" response (0-90%) plotted against intermediate stimulus number (1-5).]
4.3.6. Discussion
constraints seem to have an effect here. This is a null result (and one based on a small
sample); drawing conclusions from it is problematic. The only theory that directly predicted
a null result was the OT grammatical theory, but might there have been confounds that
One possibility is that the stimuli were ill-chosen. It might be that categorical
perception of the initial stop stimuli left little time for the development of lexical effects.
However, a follow-up experiment in which the stop was part of the context and the glide was
varied from [w] to [ɹ] was attempted, and likewise failed to find any effect: averaged across
seven listeners, the percentage of "r" response was 58.1 after [k_] and 55.9 after [p_].
the glide counteracting the phonotactic effect. However, the two best-known of these
effects, auditory contrast and compensation for coarticulation, are both expected to assist the
phonotactic effect. Auditory contrast would make the bursts sound higher before [w], and
hence produce fewer [p] responses there. Compensation for coarticulation would likewise
attribute some of the labiality of the burst to anticipatory rounding before a following [w],
making the burst have to be more labial in order to sound like [p] and again reducing [p]
responses before [w]. This is what was found by Bailey, Summerfield, and Dorman (1977,
as cited in Repp 1982), who presented a [b]-[d] continuum before front and back vowels
and found that the vowel with the lower F2 produced more [b] responses.
Experiment 3 was therefore designed to directly compare the phonotactic badness of [pw] with that of [tl], a configuration known to
be illegal enough to cause boundary shifts on an [ɹ]-[l] continuum (Massaro & Cohen
1983, Pitt 1998). Since the statistics of [pw] and [tl] are very similar, any finding of a
difference between them in phonotactic efficacy would be strong evidence for the OT
4.4.1. Rationale
Although [pw] is a marked onset in English, it is not as marked as those which have
hitherto been shown to cause phonotactic boundary shifts. In §1.3.2.5, [pw] was analyzed
as a lexical rather than a phonological gap, violating only the low-ranked OCP[Lab ]. On
the other hand, it is agreed by all authorities that initial [tl] is not permitted in English. The
boundary between [ɹ] and [l] is closer to [l] after [t_] than after [p_] (Massaro & Cohen
1983, Pitt 1998). In Chapter 2, §3.2.5, we have analyzed this as a consequence of a general
prohibition on successive same-syllable [-cont] segments using the same major articulator, OCP(PL, CONT).
By presenting the same [p]-[t] continuum before [w], [ɹ], and [l], it was hoped that
we would be able to compare the size of the phonotactic boundary shift produced by [w] with that produced by [l].
4.4.2. Design
Listeners were presented with a 7-step [p]-[t] continuum (two endpoints and five
intermediate stimuli), combined with the following carriers:

Carrier  [p]  [t]
[_wifkoʊs]  ?  ✓
[_wivnʌm]  ?  ✓
[_wæfkoʊs]  ?  ✓
[_wævnʌm]  ?  ✓
[_ɹifkoʊs]  ✓  ✓
[_ɹivnʌm]  ✓  ✓
[_ɹæfkoʊs]  ✓  ✓
[_ɹævnʌm]  ✓  ✓
[_lifkoʊs]  ✓  ✗
[_livnʌm]  ✓  ✗
[_læfkoʊs]  ✓  ✗
[_lævnʌm]  ✓  ✗
The phonotactic status of the endpoints could be varied by manipulating the glide.
An [l] made [t] illegal (a phonological gap); a [w] made [p] marginal (legal, but infrequent). To
MERGE TP, [pw] and [tl] are both disfavored, since both sequences are of near-zero
probability. To the OT grammatical theory, only [tl] is illegal (ruled out by an active
markedness constraint). Hence MERGE TP predicts that both [w] and [1] contexts will
shift the [p]-[t] boundary, in different directions, compared to the [ɹ] baseline. The OT
grammatical theory, on the other hand, predicts that the [l] context will cause a much larger shift than the [w] context.
As before, the MERGE TP and OT grammatical theories predict that the vowel of
the initial syllable, [i] or [æ], will have no effect on the boundary location, being outside of
the statistically or phonologically relevant context. TRACE, on the other hand, expects the
choice of vowel to contribute to the effect: since [plæ] is a much more frequent word onset
than [pli], the shift should be larger before [læ] than [li]; and since [twi] is more frequent
than [twæ], the shift (in the other direction) should be larger before [wi] than before [wæ].
Cohort sizes and frequency counts illustrating this are shown in Table 4.48; again, the counts come from the American English Kucera-Francis database.
Table 4.48. Frequency statistics for the stimuli of Experiment 3

Sequence  Cohort size  Frequency
[twi]  10  34
[pwi]  1  0
[twæ]  1  0
[pwæ]  0  0
[tɹi]  29  128
[pɹi]  74  772
[tɹæ]  79  600
[pɹæ]  14  139
[tli]  0  0
[pli]  2  1
[tlæ]  0  0
[plæ]  33  548
4.4.3. Predictions
4.4.3.1. TRACE

These stimuli were chosen in the expectation that the TRACE response would be
determined chiefly by the word onsets, from stop through vowel. The rate of [p] response
before [_w], [_l], and [_ɹ] was expected to be determined largely by the relative frequencies
(4.49)
[p] responses: [_l] > [_ɹ] > [_w]
since [pi] will activate a much larger cohort than [tl], and [tw] than [pw].
The cohort sizes, and activation strengths, were expected to be modulated by the
vowel. For example, an initial [pl] will activate a cohort of words. Table 4.48 shows that if
the following vowel is [i], only a couple of rare words will receive support from that [i],
while the rest will tend to be deactivated by the mismatch. If the vowel is [æ], on the other
hand, a larger set of words will be supported and further activated, and correspondingly
fewer will be deactivated. Hence [p] should receive more support from [plæ] than from
[pli]. If this is what happens, TRACE should make the following predictions:
(4.50)
[p] responses: [_læ] > [_li]; [_wæ] > [_wi]

A new ambiguous phoneme [T] was constructed between [p] and [t], which activates both equally. This was used to construct the simulated
ambiguous stimuli.
Simulations for the [_fkoʊs] and [_vnʌm] stimuli were made with slightly different
lexicons, one of which included words with [f] and the other of which included words with
[v]. These simulations are discussed separately in 4.4.3.1.1. and 4.4.3.1.2.; the results are
The same lexicon was used as in the simulation of Experiment 2. Words in the
lexicon which began with the critical onsets are shown in Table 4.51. This approximated
Table 4.51. Words beginning with the critical onsets in the lexicon used for the TRACE
simulation of Experiment 3

Onset  Words
[twi]  twist
[pwi]  -
[twæ]  -
[pwæ]  -
[tli]  -
[pli]  -
[tlæ]  -
When the simulation was first run, it was found that every context produced an
extremely strong [t] bias (the ratio of activation levels at Cycle 75 being about 70 to 10).
This was because a large number of short words ending in [t] were slightly activated by the
initial ambiguous segment, and remained slightly active throughout the simulation, giving [t]
a large cumulative advantage regardless of the following segments. This artifactual effect
completely swamped any influence of the remainder of the stimulus. To circumvent this
problem, the lexicon was edited, and word-final [p] and [t] were recoded as [b] and [d], so
When the simulation was run again, it turned out that TRACE did not predict any
phonotactic effect in this experiment. The biases before [_w], [_ɹ], and [_l] were of
comparable size, as were the activation levels of the [p] and [t] units:
Table 4.53. Results of the TRACE simulation of Experiment 3: Activation levels at Cycle 75

Stimulus  [p]  [t]  [p] - [t]
[Twifkʌs]  38  32  6
[Twæfkʌs]  44  27  17
[Tɹifkʌs]  42  29  13
[Tɹæfkʌs]  36  39  -3
[Tlifkʌs]  38  31  7
[Tlæfkʌs]  39  30  9
Why is the bias in the pre-[_l] cases so small? The TRACE displays show that the
network is extremely good at spotting words, or parts of words, inside other words.
When the [Tlifkʌs] stimulus is presented, the most activated word units on Cycle 75
are leaf (37), plea (32), and subtly (23). The first of these is neutral between [p] and [t].
The other two, plea and subtly, urge in opposite directions and cancel each other out. The
lexicon contains no [ph]-initial words which would decide the issue in favor of [p], so, as we
When the [Tlæfkʌs] stimulus is presented, by far the most active unit on Cycle 75 is
laugh (46). It becomes active early, and is strong enough to inhibit other word candidates,
such as placid and plastic, which we had counted on to produce a larger [p] bias.
If the differences are taken as predictors of the rate of [p] report across the
continuum, then the expected order of effects is [_wæ] > [_ɹi] > [_læ] = [_li] = [_wi] >
[_ɹæ]. The average difference before [_w] is 11.5; before [_ɹ], 5; and before [_l], 8.
TRACE predicts
(4.54)
[_w] > [_l] > [_ɹ]
and
(4.55)
For this simulation, a lexicon was selected using almost the same procedure as in
Experiment 2. The procedures were the same except that where the Experiment 2 lexicon
included words with [f] (recoded as [S]), this one included words with [v] (recoded as [j]).
Pretesting showed that prove and approve tended to dominate responses in the pre-[_ɹ]
conditions, as proof had in Experiment 2, so they were removed on the same grounds: that
[i] and [æ] are not actually very confusable with [u]. As with the simulation in the previous
section, word-final [p] and [t] in the lexicon were replaced with [b] and [d]. The resulting
lexicon contained the same set of words with the critical onsets as in the simulation for the
Table 4.56. Results of the TRACE simulation of Experiment 3: Activation levels at Cycle 75

Stimulus  [p]  [t]  [p] - [t]
[Twivnʌm]  41  29  12
[Twævnʌm]  48  20  28
[Tɹivnʌm]  49  26  23
[Tɹævnʌm]  38  35  3
[Tlivnʌm]  43  30  13
[Tlævnʌm]  44  25  19
Again, the size of the difference in each case is determined by one or two lexical
items. For the [Tlivnʌm] stimulus, there is some activation from plea (32) and pleaded (23)
in support of [p], which is reduced somewhat by subtly (16). The larger [p] bias for the
[Tlævnʌm] stimulus is due to placid (66). The great difference in bias between [Tɹivnʌm]
The predicted rates of [p] report are [_wæ] > [_ɹi] > [_læ] > [_li] > [_wi] > [_ɹæ].
The average values of the differences are: [_w], 20; [_ɹ], 13; [_l], 16, so TRACE predicts
(4.57)
[_w] > [_l] > [_ɹ]
(4.58)
Our expectations of what TRACE would do are only partially supported by actual
simulations, which revealed TRACE's extreme sensitivity to individual lexical items. The
expected phonotactic effect, [_w] > [_ɹ] > [_l], was not supported by the simulation. In both
the [_fkoʊs] and [_vnʌm] contexts, [_w] produced the largest [p] bias when it had been
expected to produce the smallest, and [_l], expected to cause the largest, never did. The
cohort of words activated by the initial stop-glide sequence did not long remain active.
When the vowel arrived, it deactivated the great majority of the cohort members, reducing
On the other hand, the effect of the following vowel was pretty much as expected,
because the stop-glide-vowel sequence was three segments long - long enough to mismatch
all but one or two lexical items, and to activate those few sufficiently to dominate the lexicon and
decide the outcome. The phonotactic effect in TRACE does not come from partial activation
of many lexical items, but from high activation of one or two. We therefore expect
(4.59)
4.4.3.2. MERGE TP
Frequency counts from the American English Kucera-Francis database are shown in
Table 4.60. The "forbidden" [pw] and [tl] occur with fair frequency within words (lapwing,
potlid).

Table 4.60. Diphone frequencies for the stimuli of Experiment 3

Diphone  Frequency
[#p]  23245
[#t]  41840
[pw]  27
[tw]  996
[pɹ]  8525
[tɹ]  8468
[pl]  3589
[tl]  220
Hence

(4.61)
Pr([p] | [#_]) = 23245/1000000 = 0.0232
Pr([t] | [#_]) = 41840/1000000 = 0.0418
Two-phoneme sequences whose second member is [w] occur 16156 times per
million words, so

(4.62)
Pr([p] | [_w]) = 27/16156 = 0.0017
Pr([t] | [_w]) = 996/16156 = 0.0616
Two-phoneme sequences whose second member is [ɹ] occur 103639 times per million words, so

(4.63)
Pr([p] | [_ɹ]) = 8525/103639 = 0.0823
Pr([t] | [_ɹ]) = 8468/103639 = 0.0817
Two-phoneme sequences whose second member is [l] occur 57018 times per million
words, so

(4.64)
Pr([p] | [_l]) = 3589/57018 = 0.0630
Pr([t] | [_l]) = 220/57018 = 0.0039
(4.65)
[0.0418 × 0.0616] / [0.0232 × 0.0017] ≈ 66
[0.0232 × 0.0630] / [0.0418 × 0.0039] ≈ 9
The [_ɹ] context is nearly unbiased between the two stops. In the [_w] context, [t] is
expected to be about 66 times as likely as [p], while in the [_l] context, [p] is expected to be about 9 times as likely as [t].
The INC-1 version of the MERGE TP theory estimates the phonotactic badness of
[pw] as much greater than that of [tl], because [tl] is of far greater probability than [pw]
conditional on the glide. In absolute terms, [tl] is actually about eight times more frequent
than [pw]. Because the left and right contexts contribute independently to the theory's
probability estimates, it overestimates the rate at which [tl] will occur word-initially.
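The INC-1 figures just cited can be sketched from the counts in Table 4.60; the function and variable names are mine:

```python
# INC-1 odds for Experiment 3: the left (#) and right (glide) contexts
# contribute independently. Counts per million words from Table 4.60;
# the glide totals are those computed in the text.
N = 1_000_000
initial = {'p': 23245, 't': 41840}                 # [#p], [#t]
diphone = {('p', 'w'): 27,   ('t', 'w'): 996,
           ('p', 'r'): 8525, ('t', 'r'): 8468,
           ('p', 'l'): 3589, ('t', 'l'): 220}
second_member = {'w': 16156, 'r': 103639, 'l': 57018}  # [_G] totals

def odds_p_vs_t(glide):
    """Odds of [p] against [t] in the frame [#_glide] under INC-1."""
    p = (initial['p'] / N) * (diphone[('p', glide)] / second_member[glide])
    t = (initial['t'] / N) * (diphone[('t', glide)] / second_member[glide])
    return p / t

print(round(1 / odds_p_vs_t('w')))  # 66: [t] strongly favored before [w]
print(round(odds_p_vs_t('l')))      # 9: [p] strongly favored before [l]
```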
Again, MERGE TP INC-1 does not expect anything else in the stimulus context, in
particular the vowel following the glide, to influence the size of the boundary shift.
4.4.3.2.2. SC-1 context

Here, the left and right contexts are treated as a single unit. The relevant sequence
frequencies, from the American English Kucera-Francis database, are shown in Table 4.66:
Table 4.66. Triphone frequencies for the stimuli of Experiment 3

Triphone  Frequency
[#pw]  0
[#tw]  251
[#pɹ]  6858
[#tɹ]  2625
[#pl]  1981
[#tl]  0
Since sequences of the form [#_w] occur about 12704 times per million words,
those of the form [#_ɹ] occur about 39037 times, and those of the form [#_l] occur about
15459 times,

(4.67)
Pr([p] | [#_w]) = 0/12704 = 0
Pr([t] | [#_w]) = 251/12704 = 0.0198
Pr([p] | [#_ɹ]) = 6858/39037 = 0.1757
Pr([t] | [#_ɹ]) = 2625/39037 = 0.0672
Pr([p] | [#_l]) = 1981/15459 = 0.1281
Pr([t] | [#_l]) = 0/15459 = 0
In the [#_ɹ] context, the bias is slight - [p] is only about 2.6 times as likely as [t].
However, in the [#_w] context, [p] is infinitely less likely than [t], while in the [#_l] context, [t] is infinitely less likely than [p].
Under SC-1, we thus expect a large boundary shift, in opposite directions, in both
the [#_w] and the [#_l] contexts.10 Again, the following vowel is outside the relevant
then we do not expect the [#_w] condition to produce any kind of boundary shift - it should
be indistinguishable from the [#_ɹ] condition. On the other hand, the highly illegal [tl] is
ruled out by an active constraint, OCP(PL, CONT), and we expect a sizable boundary shift
in the [#_l] condition. The following vowel is again outside the structural description of
(4.69) [p] marked, but not illegal => small if any [t] bias
10 We are assuming here that the magnitude of the boundary shift is controlled by the ratio of the
probabilities, for reasons described in §1.3.2.3. If it is controlled by the difference (a poorer guessing
strategy), then the [#_ɹ] and [#_l] contexts are about equally discouraging to [t], which certainly does not
reflect native speakers' intuitions about phonotactic permissibility.
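The footnote's contrast between the two guessing strategies can be made concrete from the Table 4.66 counts (the dictionary names are mine):

```python
# Footnote 10's contrast, sketched: by probability *ratios*, the [#_r] and
# [#_l] frames differ sharply in how much they favor [p] over [t]; by raw
# probability *differences*, they look about equally discouraging to [t].
frames = {  # frame: ([#pG] count, [#tG] count, total [#CG] count) per million
    '#_w': (0, 251, 12704),
    '#_r': (6858, 2625, 39037),
    '#_l': (1981, 0, 15459),
}
diffs, ratios = {}, {}
for frame, (p, t, total) in frames.items():
    pr_p, pr_t = p / total, t / total
    diffs[frame] = pr_p - pr_t                              # difference strategy
    ratios[frame] = pr_p / pr_t if pr_t else float('inf')   # ratio strategy
print(diffs)
print(ratios)
```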
4.4.4. Methods
Unlike the stimuli in Experiments 1 and 2, this set was constructed from natural
speech. Several dozen tokens of each carrier were recorded by the experimenter (a native
speaker of American English) as a list of isolated words. The carriers were recorded
without an initial stop. One token of each was selected on the basis of the experimenter's
judgment of clarity and of uniformity of speaking rate. The most important criterion, and a
difficult one to meet, was that the initial glides not be confusable with each other.
Each initial stop consisted of a burst plus aspiration. The continuum was created by varying the F2 onset from 1000 Hz
to 2000 Hz in equal steps. The burst was kept very short to prevent it from sounding like
[k]. After the onset, F2 continued in a straight line towards 1000 Hz as the aspiration faded
out.
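The seven F2-onset values implied by this description can be laid out directly; equal linear spacing is what the text states, the rounding is mine:

```python
# F2-onset continuum: endpoints at 1000 Hz ([p]-like) and 2000 Hz
# ([t]-like), with the five intermediate stimuli at equal linear steps.
steps = [1000 + i * (2000 - 1000) / 6 for i in range(7)]
print([round(f, 1) for f in steps])
# [1000.0, 1166.7, 1333.3, 1500.0, 1666.7, 1833.3, 2000.0]
```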
The presentation paradigm was the same as that used in Experiments 1 and 2. The
only difference was that in this experiment, the ambiguous segment only ever came at the
beginning of the stimulus. The ends of the stimuli did vary from trial to trial, but were
4.4.5. Results
An unexpected finding was that the "irrelevant" filler context, [_vnʌm] and [_fkoʊs],
actually influenced the location of the [p]-[t] boundary in the baseline pre-[_ɹ] condition.
For this reason, I will discuss the [_vnʌm] and [_fkoʊs] results separately.
Psychometric functions are shown in Figures 4.70 and 4.71. It is clear that the
listeners were perceiving the continuum as a smooth transition from [p] to [t]. Figure 4.70
compares the baseline conditions (before [_ɹ]) with the critical condition before [_l]; Figure
4.71, with the critical condition before [_w]. One can see that the pre-[_l] responses are
considerably more favorable to [p] than the pre-[_ɹ] and pre-[_w] responses. These latter
are virtually indistinguishable from each other. The identity of the following vowel does not
To assess the statistical significance of the phonotactic shift, the overall rate of [p]
response was calculated separately for each subject in each condition (based on a maximum
difference between the rate in the baseline condition and those in each of the two critical
conditions was computed for each listener, and these differences were averaged across all 12
listeners to estimate effect size. Results are shown graphically in Figures 4.70 and 4.71, and numerically in Tables 4.72 and 4.73.
Figure 4.70. Identification curves for the [...vnʌm] stimuli of Experiment 3, pooled across
12 listeners, comparing the [_l] condition with the [_ɹ] baseline
[Percent "p" response (20-100%) against intermediate stimulus number (1-5), for the conditions [_livnʌm], [_lævnʌm], [_ɹivnʌm], and [_ɹævnʌm].]
Figure 4.71. Identification curves for the [...vnʌm] stimuli of Experiment 3, pooled across
12 listeners, comparing the [_w] condition with the [_ɹ] baseline
[Percent "p" response (20-100%) against intermediate stimulus number (1-5), for the conditions [_wivnʌm], [_wævnʌm], [_ɹivnʌm], and [_ɹævnʌm].]
Table 4.72. Mean percent "p" response, all intermediate [...vnʌm] stimuli

Table 4.73. Differences in mean "p" response, pairwise by subject, [...vnʌm] stimuli

[_wi] - [_ɹi]  3.93  16.9  [-6.83, 14.7]
The difference between the [_l] and [_ɹ] conditions is positive, as expected, and the
95% confidence interval around each of the means excludes zero. The differences in the
[_li] and [_læ] cases are not distinguishable from each other. On the other hand, there is no
reliable difference between the [_w] conditions and the [_ɹ] baseline, and the nonsignificant
These results are just what we would expect under the OT grammatical theory: A
large boundary shift caused by an active markedness constraint, and none by an inactive
one. They strengthen our suspicion that the lack of a perceptual bias against [pw] in
TRACE's prediction of more [p] responses before [_wæ] than [_wi], and before
[_læ] than [_li], is not borne out; if anything, the reverse has occurred, though the trend is
highly non-significant. There is a trend in the expected direction of more [p] responses
The other half of the stimuli proved more problematic, and are harder to interpret in
any theory. Psychometric functions are shown in Figures 4.74 and 4.75. The most striking
difference is the very large number of [p] responses before [_ɹæ], which does not fall below
50% until the very last step of the continuum. This context in fact produces more [p]
responses than any other in the experiment, despite the lack of phonotactic or statistical bias.
None of the theories discussed here has an explanation for this. I can only conclude
that it is an acoustic artifact of poor stimulus quality. The [ɹ] in this stimulus is only about
10-15 ms long, less than half as long as the other [ɹ]s, putting the burst closer to the vowel
nucleus. This could have caused an auditory contrast effect, making the burst sound lower
The lack of a reliable [_ɹæ] baseline makes the [_æfkoʊs] responses difficult to
analyze. The [_ifkoʊs] data, shown in the next two figures, are consistent with the pattern
found with the [_vnʌm] stimuli, though they do not provide especially strong support:
Figure 4.74. Identification curves for the [...fkoUs] stimuli of Experiment 3, pooled across
12 listeners, comparing the [_l] condition with the [_ɹ] baseline. (Axes: intermediate
stimulus number against percent "p" response, with curves for [_lifkoUs], [_laefkoUs],
[_ɹifkoUs], and [_ɹaefkoUs].)
Figure 4.75. Identification curves for the [...fkoUs] stimuli of Experiment 3, pooled across
12 listeners, comparing the [_w] condition with the [_ɹ] baseline. (Axes: intermediate
stimulus number against percent "p" response, with curves for [_wifkoUs], [_waefkoUs],
[_ɹifkoUs], and [_ɹaefkoUs].)
Table 4.76. Mean percent "p" response, all intermediate [...fkoUs] stimuli
Table 4.77. Differences in mean "p" response, pairwise by subject, [...fkoUs] stimuli, [i]
condition only
The [_l] effect just misses significance, while the [_w] effect achieves it but is
numerically smaller. Again, TRACE's prediction of more [p] responses before [_wae] than
[_wi], and before [_lae] than [_li], is not supported. The prediction of more [p] responses
rests here on nonsignificant differences, from which we can conclude nothing.
4.4.6. Discussion
These results confirm our suspicion from Experiment 2 that the ban on initial [pw]
is much weaker than the other phonotactic prohibitions which have been the object of
perceptual experiments.
The sequences [pw] and [tl] differ in their ability to cause a phonotactic boundary
shift, despite their very similar statistical properties. This is unexpected under either version
of the MERGE TP theory. Overall, the
results of this experiment contradict the proposal that phonotactic illegality is equivalent to
zero frequency. The results are especially disappointing for the INC-1 version of the
MERGE TP theory, which predicted that [_w] would cause a larger shift than [_1].
TRACE's prediction that the following vowel will have a very strong influence is
also not borne out by the data. Since a boundary shift was obtained, TRACE can only
explain it as a consequence of lexical activation spreading, in which case the following vowel
ought to have had an effect. The failure to find one calls TRACE's explanation into
question.
4.5. Experiment 4: Sequence frequency and the relative phonotactic badness of [bw] and [dl]
4.5.1. Rationale
In Experiments 2 and 3, the phonological context was
manipulated as the independent variable, and its effect on the boundary location was
measured as the dependent variable. For example, in Experiment 3, a [p]-[t] continuum was
judged in the contexts [_w], [_ɹ], and [_l], and [_l] was found to cause a larger shift
(compared to the baseline [_ɹ] context) than [_w]. This is consistent with the prediction that
the more illegal [tl] onset should be more dispreferred than the less illegal [pw] onset.
However, the finding could also be artifactual. This is because the [_1] and [_w]
contexts are expected to shift the boundary in opposite directions. Different-sized shifts
indicate different-sized phonotactic biases, but they could also simply reflect a closer
perceptual spacing of the stimuli at one end of the [p]-[t] continuum. If, for instance, stimuli
1, 2, and 3 are very similar, while 3, 4, and 5 are very different, then a boundary shift from 3
towards 1 will span more stimulus units than an equally large perceptual shift from 3
towards 5 (the shift being measured in stimulus units).
It is also possible that the shifts were due to low-level auditory interactions. This
suspicion is supported by the finding that the
filler context ([_fkoUs] versus [_vnum]) interacted with the other stimulus variables.
Perhaps the stimuli of Experiment 3, based on natural tokens, did not provide sufficient
acoustic control.
Experiment 4 was designed to eliminate these problems. The technique used here is
to measure the effect of one response on another: Listeners judged a CC cluster in which
both Cs were ambiguous, and the dependent measure was the effect of their decision about
the first C (“g” vs. “d”, or “g” vs. “b”) on their decision about the second (“1” vs.
“w”) (Nearey 1990). By so doing, one can control stimulus factors completely: The
dependence between stop and sonorant judgments can be measured separately for each
individual stimulus.
A further check on Experiments 2 and 3 is provided by replacing [p] and [t] with [b]
and [d], replicating the original experiment with stimuli that are different segmentally but the
same phonotactically.
4.5.2. Design
The aim was to measure the dependence of "l"/"w" judgments on "g"/"d" and
"g"/"b" judgments in English CCV syllables. All listeners were tested on two separate
stimulus sets, an array of stimuli ambiguous among [glae gwae dlae dwae] and one
ambiguous among [glae gwae blae bwae], and classified each stimulus as one of those four
categories. The dependence between the stop and sonorant judgments was quantified as the
change in the log-odds ratio of "l" versus "w" responses conditional on the stop judgment (see
below, §4.5.5.).
4.5.3. Predictions
In experiments with only one ambiguous segment, TRACE predictions are derived
by assuming that a larger activation level for, say, [l] than [ɹ] means a greater likelihood of
an "l" response, the response probability being given by the choice rule

(4.78)    p_j = S_j / Σ_i S_i

where

(4.79)    S_j = exp(k a_j)

a_j being the activation level of unit j as j ranges over members of the alternative set, and k
a free scaling constant. A further assumption is needed when a two-segment
response is called for, since TRACE has no units corresponding to two-segment sequences.
The simplest assumption would be that the probability of, e.g., a "bl" response is the
product of the probabilities of a "b" and an "l" response. A stimulus which gets "b"
judgments 25% of the time and "l" judgments 60% of the time should be classified as "bl"
15% of the time. This would be in keeping with the principle that the units represent
hypotheses about the input, and their activation levels represent the strengths of these
hypotheses: The network’s confidence that the input contains a "b" at time 26 is completely
captured by the activation levels of the phoneme units for time slice 26. Under this
interpretation, any response dependency between the stop judgment and the sonorant
judgment is unexpected.
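The choice rule and the independence assumption can be sketched in Python; the activation and probability values below are hypothetical, chosen only to reproduce the 25% x 60% = 15% example:

```python
import math

def luce_probabilities(activations, k=1.0):
    # Choice rule of (4.78)-(4.79): P(j) = exp(k * a_j) / sum_i exp(k * a_i),
    # with i ranging over the alternative set.
    strengths = {resp: math.exp(k * a) for resp, a in activations.items()}
    total = sum(strengths.values())
    return {resp: s / total for resp, s in strengths.items()}

# Independence assumption for two-segment responses: P("bl") = P("b") * P("l").
p_stop_b = 0.25   # hypothetical probability of a "b" judgment
p_son_l = 0.60    # hypothetical probability of an "l" judgment
p_bl = p_stop_b * p_son_l   # = 0.15, the "bl" rate predicted in the text
```

Under this interpretation, the "bl" rate is fully determined by the separate "b" and "l" rates, and it is precisely this independence that the experiment tests.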
4.5.3.2. MERGE TP
Frequency counts from the American English Kucera-Francis database are shown in
Table 4.80. The "forbidden" [bw] and [dl] occur with fair frequency within words
(subway, badly).
Table 4.80. Diphone frequencies for the stimuli of Experiment 4
[bw] 7
[dw] 26
[gw] 148
[bl] 2407
[dl] 275
[gl] 625
Nonfinal [b] occurs 49,772 times per million words; nonfinal [d], 32,251 times;
(4.81)    P(C2 | C1) = freq([C1C2]) / freq(nonfinal [C1])
The right-hand context is irrelevant, being in every case the same ([ae]). In the
baseline context [g_], an [l] is 4.23 times as likely as a [w]. After [b_], [l] is 344 times as
likely as [w]; hence, the "b"/"g" decision changes the expected odds of an [l] by a factor of
81. After [d_], [w] is 10.6 times as likely as [l], so the "d"/"g" decision changes the
expected odds by a factor of about 45.
The INC-1 theory therefore predicts that both the "b"/"g" and "d"/"g" decisions will
affect the likelihood of an "l" response, with "b" favoring "l" responses and "d" favoring
"w". The effects are predicted to be of similar magnitude, with the "b"/"g" effect perhaps
somewhat larger.
In the SC-1 theory, what matters is the relative frequencies of the three-phoneme
strings with [l] and [w] in the middle. These frequencies are

[bwae] 0
[blae] 270
[dwae] 0
[dlae] 0
[gwae] 0
[glae] 202
Since [w] and [l] have the same frequency in [d_ae], SC-1 predicts no net bias in
favor of either. Meanwhile, the [b_ae] and [g_ae] contexts have almost identical statistics:
no [w]s and a couple of hundred [l]s. These two contexts are therefore predicted to have
similar effects, biasing responses strongly towards "l". The "b"/"g" decision is thus
predicted to have little or no effect on the likelihood of an "l" response, while the "d"/"g"
decision ought to have a considerable effect, with "d" leading to more "w" responses.
4.5.3.3. OT grammatical theory
Both [dl] and [bw] are unattested as syllable onsets in English, as shown in Table 1.
Nonetheless, as Table 4.83 shows, [dl] is commonly classified as "impossible", while [bw]
is "marginal" at worst (Hultzen 1965; Wooley 1970; Catford 1988; Hammond 1999; see
also §2.3.2. above).
There are coherent structural grounds for this difference. Both clusters violate a
cross-linguistically widespread constraint against successive consonants with the same place
of articulation in the same unit - here, the syllable onset (McCarthy 1988; Padgett 1991).
The [dl] onset has two successive coronals, while the [bw] onset has two labials.
As discussed in §2.3.2.5. above, English grammar is less hostile to [pw bw] than to
[tl dl]. English [ɹ] is labial (Delattre & Freeman 1968), so the legal, frequent onsets [bɹ pɹ]
violate the same OCP constraint as [bw]. Moreover, the OCP is, cross-linguistically,
stronger the more similar the two Cs are in sonority (Selkirk 1988; Padgett 1991). Since [1]
is less sonorous than [w] (Kahn 1980; Guenter 2000), the [dl] sequence is closer in
sonority than [bw] and hence a worse structural violation.11 The OT grammatical theory
therefore predicts a larger perceptual bias against [dl] than against [bw], as shown in (4.84):
11 For a discussion of [sl], [sn], and [st] onsets, see above, §2.3.2.4.
Table 4.83. Frequency of occurrence of the clusters of Experiment 4 as onsets in English

              Word-initial        Syllable-initial
Labial
  bw          0        0          0        0
Coronal
  dw          10       983        16       1003
  dl          0        0          0        0
Dorsal
  gw          6        172        59       4834
When the stop could be either [g] or [d], the decision about the stop has major
consequences for the decision about the sonorant: a "d" decision means that the sonorant
cannot be parsed as [l] without violating an active markedness constraint. When the stop
could be either [g] or [b], though, the stop decision carries less weight, since the sonorant
can still be parsed as either [l] or [w] without violating an active markedness constraint.
The OT grammatical theory therefore predicts that the "g"/"d" decision will affect
the likelihood of an "l" response, with "d" leading to fewer "l"s, while the "g"/"b" decision
will have little or no effect.
(4.84) [dlae] is more disfavored than [bwae]
a. (•, [bwae])   *
b. (•, [blae])
e. (•, [dwae])
f. (•, [dlae])   *
g. (•, [gwae])
h. (•, [glae])
4.5.4. Methods
Stimuli were synthetic CCV monosyllables. The V, following Pitt (1998), was
always [ae] in order to make all stimuli nonwords. The second C ranged from [l] to [w]; the
first, from [g] to [d] or from [g] to [b]. To prevent listeners from memorizing the individual
stimuli, a large stimulus set was used (Crowther & Mann 1994): six steps along each
continuum, making 36 stimuli in each array. Stimuli were identified by a 2-digit code. The
first digit specified position on the stop continuum ('0' = most b- or d-like, '5' = most
g-like); the second, position on the sonorant continuum ('0' = most l-like, '5' = most w-like).
Care was taken to make the stimuli acoustically and articulatorily plausible, and to
insure that ambiguous segments were heard as one of the intended phonemes. Synthesizer
parameters are shown in Figure 4.19; only differences between the endpoints are discussed
in the text.
(Figure panels (a)-(f) plot the synthesis parameters against time in ms: the amplitudes AV
and AH; the pole and zero FTP and FTZ; formant frequencies; formant bandwidths,
including B2F; the frication amplitudes A2F, AB, and TL; and F0.)
The following parameters, which were the same for all stimuli and did not vary
across time, are not shown: GH 50, OQ 30, F6 4900, B6 100, F5 4300, B5 300, F4 3250,
FL 20. The aspiration control AH was turned on during the last part of the vowel to match
The [l] and [w] endpoints were made following the acoustic theory of those
segments in Stevens (1999:513-555). The [l] endpoint had a low F2 and high F3,
corresponding to an elevated tongue dorsum, and a pole-zero pair in the vicinity of F4,
corresponding to the cavity above the tongue blade (Stevens 1999:545). At the [w]
endpoint, the pole-zero pair was absent, and all formants above F2 were attenuated to
simulate the low-pass filtering effect of a labial constriction. F1 and F2 were even lower
than at the [l] endpoint.
The stop endpoints differed only in F2 onset, bandwidth of frication at F2, and
amplitude of the F2 and wide-band frication components. The [g] had a low F2 onset and a
compact burst spectrum, with energy concentrated near F2, while [b] and [d] had diffuse
burst spectra (Blumstein & Stevens 1979). The [b] had the same low F2 onset as [g], while
[d] had a higher onset than [b] and less energy in the F2 region. The stop transitioned into
the following sonorant.
Parameter values were adjusted to make even the endpoint stimuli slightly ambiguous.
Interpolation was linear except for the bandwidth of F2 frication, which was interpolated
along an exponential curve of the form B2F = A exp(Br), where r went from 0 at the [g]
endpoint to 1 at the other endpoint.
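A sketch of that interpolation scheme; the constants A and B, and the assumption of six evenly spaced steps, are placeholders, since the actual synthesis values are not given here:

```python
import math

def b2f(a_const, b_const, r):
    # Exponential interpolation of the F2 frication bandwidth:
    # B2F = A * exp(B * r), with r = 0 at the [g] endpoint and
    # (on the reading adopted here) r = 1 at the other endpoint.
    return a_const * math.exp(b_const * r)

# Six continuum steps at r = 0, 0.2, ..., 1.0 (assumed even spacing).
A, B = 100.0, math.log(5.0)          # placeholder endpoint constants
steps = [b2f(A, B, i / 5) for i in range(6)]
# The bandwidth rises smoothly from A at the [g] end to 5*A at the other end.
```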
Stimuli were synthesized using the cascade branch of the SENSYN terminal
analogue synthesizer (Klatt 1980) with 16-bit resolution, a 16-kHz sampling rate, and a
2-ms frame length. Six formants were used, but only the lowest two varied. Stimuli were
low-pass filtered with a sharp cutoff at 5512 Hz. All parameters are given in tabular format
This procedure yielded two 36-element stimulus arrays: one ambiguous between
[glae gwae dlae dwae] (the "d array"), and one between [glae gwae blae bwae] (the "b
array"). Pretesting with 32 listeners showed that the stimuli sounded natural and were
heard as the intended categories. The sixteen listeners in the main experiment had no
speech or hearing deficits, and were naive to the purpose of the experiment.
Listeners were tested individually in a soundproof booth (IAC Model 401A) during
two 15-minute blocks separated by a 5-minute break. Eight of the listeners heard the "d"
array in the first block, while the other 8 heard the "b" array first. In each block, the 36
stimuli of one array were presented in random order.
The listener responded by pressing one of four buttons labelled "dw dl gw gl" or "bw bl
gw gl". The order of buttons was rearranged between listeners. One second after the
response, the next syllable was presented. Ten responses were collected for each stimulus.
Each block was preceded by a short practice. Each of the four most extreme stimuli
(at the corners of the array) was presented three times, for a total of 12 stimuli, in random
order, and judged by the listener as in the main experiment. No feedback was provided.
The practice was repeated until the listener had used all four responses (accurately or not).
For each stimulus S, the 160 responses from all listeners were pooled to estimate the
likelihood that that stimulus would be put into each of the four categories. The statistic of
interest is how the listeners' "l"/"w" judgment on a particular stimulus S is affected by
their decision about the stop. The "l"/"w" judgment was quantified as the
log-transformed odds ratio of "l" versus "w" responses. This was calculated separately for
the two stop responses, as shown in Figure 4.86. If the stop decision had no effect on the
sonorant decision, then all of the points in Figure 4.86 would lie on the line y = x.
Figure 4.86. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"g"/"d" judgment. Each point represents 16 listeners' pooled responses to one stimulus.
(Stimulus codes are explained in the text.)
(Axes: ln(P("gl")/P("gw")) against ln(P("dl")/P("dw")), with each point labelled by its
stimulus code.)
Figure 4.87. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"g"/"b" judgment. Each point represents 16 listeners' pooled responses to one stimulus.
(Stimulus codes are explained in the text.)
(Axes: ln(P("gl")/P("gw")) against ln(P("bl")/P("bw")), with each point labelled by its
stimulus code.)
For example, suppose S is a stimulus from the "d" array which was judged as
"gl" 36 times, "gw" 35 times, "dl" 18 times, and "dw" 67 times. When the stop was
identified as "g", the sonorant was equally likely to be classified as "l" or "w":
ln(36/35) = 0.028. When the
stop was identified as "d", the sonorant was more likely to be called "w" than "l":
ln(18/67) = -1.310. The
phonotactic bias is the difference d: the log of the "l"/"w" odds ratio contingent on a
"g" decision minus that contingent on a "d" decision, here 0.028 - (-1.310) = 1.338.
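The per-stimulus statistic can be computed directly from the pooled counts. This reproduces the worked example; the third decimal place differs slightly from the printed 1.338 because the printed intermediate value -1.310 is rounded:

```python
import math

def phonotactic_bias_d(n_gl, n_gw, n_xl, n_xw):
    # d = ln(odds of "l" given a "g" response)
    #   - ln(odds of "l" given the other stop response).
    return math.log(n_gl / n_gw) - math.log(n_xl / n_xw)

# Worked example from the text: "gl" 36, "gw" 35, "dl" 18, "dw" 67.
d = phonotactic_bias_d(36, 35, 18, 67)   # about 1.34
```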
For each array, D = mean d over all stimuli was computed. In the "d" array, D =
1.224, indicating that a "d" judgment reduced the odds of "l" by a factor of exp(1.224) =
3.40. In the "b" array, D was 0.4762, an unexpected result, since it means that a "b"
judgment, far from reducing the odds of a "w", actually increased them by a factor of
exp(0.4762) = 1.61.
Because D bears a complex
relation to the individual subject data and is drawn from an unknown distribution, the
appropriate statistical test is the non-parametric bootstrap (Efron & Gong 1983; Efron &
Tibshirani 1993). The null hypothesis H0: D = 0 was tested against the two-sided
alternative H1: D ≠ 0 using the sensitive procedure recommended by Hall and Wilson
(1991). For each array, B = 10,000 bootstrap resamples were drawn and used to find D_α
such that Pr*(|D* − D| > D_α) = α. For the "d" array, D.05 = 0.3986 and D.01 = 0.5238.
Both are much less than D = 1.224, allowing rejection of H0 at the 99% confidence level.
For the "b" array, D.05 = 0.4856 and D.01 = 0.6103; hence, the empirical D of 0.4762 barely
fails to reach significance at the 95% level.
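A sketch of the bootstrap test as described, following the Hall & Wilson (1991) recommendation of resampling the centered statistic; the per-stimulus d values below are hypothetical, since the individual values are not reported:

```python
import random

def bootstrap_critical_value(d_values, alpha, n_resamples=10_000, seed=0):
    # Two-sided bootstrap test of H0: D = 0, where D = mean(d). Resampling
    # is done on the centered statistic |D* - D_hat| (Hall & Wilson 1991);
    # H0 is rejected at level alpha when |D_hat| exceeds the returned D_alpha.
    rng = random.Random(seed)
    n = len(d_values)
    d_hat = sum(d_values) / n
    deviations = sorted(
        abs(sum(rng.choice(d_values) for _ in range(n)) / n - d_hat)
        for _ in range(n_resamples)
    )
    return deviations[int((1 - alpha) * n_resamples)]

# Hypothetical per-stimulus d values, for illustration only.
d_values = [1.3, 0.9, 1.6, 1.1, 0.8, 1.4, 1.2, 1.5, 0.7, 1.0]
d_alpha = bootstrap_critical_value(d_values, 0.05, n_resamples=2000)
reject = abs(sum(d_values) / len(d_values)) > d_alpha
```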
Although both [dlae] and [bwae] are unattested in English, a significant phonotactic
bias was found only against [dlae].
This is inconsistent with TRACE, which predicted that the stop and sonorant
judgments would be independent for each stimulus (i.e., that all the points in Figures 4.86
and 4.87 would lie on the line y = x). It is inconsistent with the INC-1 theory, which
predicted that the "b"/"g" decision would affect the "l"/"w" decision at least as strongly as
the "d"/"g" decision.
It is consistent with the SC-1 and OT grammatical theories, both of which predicted
a dependency in the "d" array but not in the "b" array.
Some of the listeners reported up
to nine years of exposure to a language with [bw] or [pw] onsets. Could this have allowed
them to build perceptual units for these un-English clusters? Each listener's total number
of "bw" responses was therefore compared with his or her years of exposure to such languages as
Mandarin Chinese (see scatterplot in Figure 4.88). Longer exposure led to slightly fewer
"bw" responses. The trend was weak (R-squared = 0.201) and due mostly to Listener 11,
who had no exposure and a very high rate of "bw" response. When this listener was excluded,
the trend vanished (R-squared = 0.117). Foreign-language experience does not, therefore,
account for the observed "bw" responses.
(Figure 4.88: scatterplot of each listener's total number of "bw" responses against years of
exposure.)
There is another interpretation of TRACE, under which it may make the correct
prediction. The simulations reported above are deterministic: every time
TRACE is exposed to a given simulated input, it settles into the same state. An actual
biological system would not do that, owing to uncontrolled random factors which differ
from trial to trial (such as perceptual noise or pre-activation of lexical items through priming).
A fixed stimulus would therefore evoke a range of activation levels, leading to a range of response
probabilities; this is a standard assumption of signal detection theory (e.g., Macmillan &
Creelman 1991). These could interact through the network mechanism to produce a
dependency between stop and sonorant decisions for an acoustically fixed stimulus.
For example, a stop that on the average is exactly halfway, perceptually, between [d]
and [g] would sometimes activate the [d] unit a bit more and sometimes activate the [g] unit
a bit more. Of the "d" reports, a disproportionate number would come from the trials on
which the stimulus activated the [d] unit more. On these more [d]-like trials, the [d]-initial
cohort would gain an early advantage and combine to excite the [d] unit even more, leading
to inhibition of the [g] unit and hence of the [g]-initial cohort. The [dw]-initial words would
then feed activation down to the [w] unit. The [l] unit would receive no corresponding
support, the [gl] words being dormant and the [dl] words nonexistent, resulting in an
increased likelihood of responding "w". Hence, "d" responses would tend to be associated
with "w" responses: a dependency between the stop and sonorant decisions.
To assess this dependency, we need to manipulate the activation levels of the stop
units and measure the effect on the activation levels of the sonorant units. This can be done
by simply giving the network unambiguous stops together with ambiguous sonorants.
An ambiguous [l/w] phoneme was created for the TRACE input. Simple parameter
averaging of [l] and [w] produced a segment which stimulated no unit strongly. Parameters
were modified by trial and error until a segment had been attained that excited both the [l]
and [w] units nearly equally, without exciting the [ɹ] unit.12 There was a slight initial bias
12 The parameters are: POW 3, VOC 3, DIF 7, ACU 5, GRD 4, VOI 1, BUR 0. It was also necessary to
replace the [S] unit with a copy of the [s] unit, because otherwise the ambiguous segment activated it.
towards [l], which increased over time to a factor of 2 at Cycle 72. Since this effect was
present in all conditions of the simulation, it should not affect the pattern of predictions.
As in the simulation for
Experiment 3, it was found that the initial (b/g) or (d/g) segment excited a great many words
ending in [b d g]. To avoid this artifact, the same procedure was followed as before: final
[b d g] were replaced by [p t k]. The words guano and dwarf were added, to insure that the
lexicon had at least one word beginning with each attested onset.13 The resulting lexicon
is shown in Table 4.89:
Table 4.89. Lexicon for TRACE simulation: Words with [b/d/g]+[w/l] onsets
Onset Words
[gw] guano
[bw] (none)
[dw] dwarf
[dl] (none)
For this simulation, it is important that the stop and the sonorant interact only
through activation spreading in the network, rather than through coarticulation. The event
being simulated is the repeated presentation of a single stimulus; the effect of interest is the
13 They were adapted to the TRACE phoneme inventory as dwarp and gwadu.
influence of the stop decision on the sonorant decision. We cannot manipulate the stop
decision directly, so we must do it by actually manipulating the stop in the simulated input.
However, the features of that simulated stop will be realized by TRACE not only at the time
slice of the stop, but for several time slices on either side, with the result that an initial [b]
will pre-activate [w] for reasons which have nothing to do with phonotactics. Manipulating
the stop does not just manipulate the activation input to the stop units, but also that to the
sonorant units. To prevent this, the TRACE program was altered so that the acoustic
features did not spread more than three time slices on either side of their center, preventing
overlap entirely.
The "l"/"w" decision was quantified by applying the choice rule (4.78)-(4.79) to the
sonorant units:

(4.90)    ln(P("l") / P("w")) = k(l − w)

where l and w are the activation levels of the [l] and [w] units. That is, the log odds ratio of
the "l"/"w" decision is proportional to the difference in activation levels between the [l] and
[w] units.
Given the input [g?ae], [l] and [w] start out equally active. By Cycle 45, glad, glass,
and glue are pulling ahead of guano, and [l] has overtaken [w]. By Cycle 75, [l] is three
times as active (39 to 14), and [w] is extinguished by Cycle 96. The log odds ratio at Cycle
75 is therefore 25k.
With [b?ae] as input, the same thing happens, only faster, since there are more words
beginning with [bl] but no countervailing [bw]-initial words. By Cycle 75, the activation
levels for [l] and [w] are 49 and 4, and [w] is deactivated on Cycle 81. The log odds ratio at
Cycle 75 is 45k.
With [d?ae] as input, [l] and [w] are neck-and-neck for a long time. Dwarf
gradually gains ground, but is held in check by other weakly-activated [d]-initial words like
do, D, and deal. On Cycle 75 [w] is still only half again as active as [l] (32 to 20), and [l]
does not go extinct until Cycle 105. The log odds ratio at Cycle 75 is -12k.
The effect of the "g"/"b" decision on the Cycle 75 log odds ratio is therefore to shift
it by 20k, while that of the "g"/"d" decision is to shift it by -37k. TRACE therefore predicts
that the "g"/"d" decision will have a larger effect on the "l"/"w" decision than will the
"g"/"b" decision.
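Equation (4.90), applied to the Cycle-75 activation levels reported above (with k = 1, so the results are in units of k), reproduces these shifts; note that -12k minus 25k gives a "g"/"d" shift of -37k:

```python
def log_odds_l_vs_w(act_l, act_w, k=1.0):
    # Equation (4.90): ln(P("l") / P("w")) = k * (l - w).
    return k * (act_l - act_w)

# Cycle-75 activation levels reported in the text, in units of k (k = 1).
ratios = {"g": log_odds_l_vs_w(39, 14),   # [g?ae] input:  25
          "b": log_odds_l_vs_w(49, 4),    # [b?ae] input:  45
          "d": log_odds_l_vs_w(20, 32)}   # [d?ae] input: -12

b_shift = ratios["b"] - ratios["g"]   # +20: effect of the "g"/"b" decision
d_shift = ratios["d"] - ratios["g"]   # -37: effect of the "g"/"d" decision
```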
The success of this simulation is due to the fact that TRACE implements
phonotactic biases as lexical-frequency ones. Here, the difference in frequency between [gl] (glad, glass, glue) and [gw]
(only guano) was enough to make the [g_ae] context favor [l]. In this respect it behaved
more like [b_ae] and less like [d_ae], reducing the effect of the "b"/"g" decision and
increasing that of the "d"/"g" decision. That is to say, the [g_ae] context did not provide a
neutral baseline.
All this is taking the most favorable possible view of the feasibility of modelling
response variability in TRACE: any such
scheme will require quite a lot of noise. In the above simulation, the stop units were
supplied with enough "noise" to completely disambiguate them: the simulated listener, on
that trial, heard an ambiguous segment as a perfect [g], [d], or [b]. A more realistic, i.e.
smaller, amount of noise would cause smaller variation in
stop-unit activation levels for a fixed stimulus, which would in turn cause smaller variation
(from presentation to presentation of the same stimulus) in both stop and sonorant response
probabilities. It seems implausible that the covariation between stop and sonorant responses
could then be as large as the dependency observed here.
4.6. Experiment 5: Phonotactics and syllabification
4.6.1. Rationale
The dependency found in Experiment 4 might instead reflect perceptual compensation for
coarticulation (Mann 1980, Mann & Repp 1981, Mann 1986). Suppose an ambiguous stop
in the 'd' array is perceived as [d]. Since the F2s in the 'd' array range from slightly below
that of a [d] to slightly above that of a [b], they are all lower than the typical [d]. A stop
heard as [d] thus has an atypically low F2 for a [d]. Some of this lowness may be attributed
to labialization from a following [w], causing more "dw" and fewer "dl" responses.
Because [b] and [g] have similar F2s, the compensation effect may be smaller in the "b"
array.
This account can be tested by manipulating the stimuli to alter their phonotactics
while leaving their coarticulatory properties intact. As pointed out by Pitt (1998), a cluster
which is illegal in an onset may become legal if split by a syllable boundary. A structural
account predicts less bias against "dl" responses in [aedlae] than in [dlae], because [aedlae]
allows the legal parse [aed.lae]. A compensation account predicts the bias will persist, as
compensation has a strong effect across syllable boundaries (Mann 1980; Mann 1986;
Elman & McClelland 1988; Pitt & McQueen 1998), is unaffected by perceived
syllabification, and is only slightly reduced, if at all, by a preceding vowel context (Mann &
Repp 1981).
4.6.2. Design
The design was based on that of Experiment 4. Two 6-by-6 arrays of CCV and
VCCV stimuli were constructed, ambiguous between [dwae dlae bwae blae] or [aedwae aedlae
aebwae aeblae]. Both [dlae] and [bwae] were included to maximize the expected phonotactic
effect.
4.6.3. Predictions
TRACE predicts that the stop and sonorant judgments will be independent, both in the
VCCV and the CCV condition. Neither the INC-1 nor the SC-1 theory can "see" the
prepended vowel, so both predict that adding the vowel will not affect the stop-sonorant
dependency.
The OT grammatical theory, in contrast, predicts that the dependency will be
reduced in the VCCV condition compared to the CCV condition. This is because the
VCCV stimuli can be syllabified as VC.CV. If the stop is classified as "d", then "1" will be
disfavored in the CCV condition because [dlae] violates OCP(CONT, PL); however, in the
VCCV condition, an "1" response will still be possible if the input is parsed as VC.CV.
4.6.4. Methods
A 6-by-6 array of CCV stimuli was
constructed, ambiguous between [dwae dlae bwae blae]. A 6-by-6 array of VCCV stimuli
was made by adding a 300 ms [ae] to each of the CCV stimuli. This [ae] used the same
parameters as the final [ae], except that F0 began higher (120 Hz). Transition to the stop
took 40 ms. A 40-ms voiced closure preceded the release; though short, this proved to be
sufficient.
Figure 4.91. Synthesis parameters for the stimuli of Experiment 5
(Panels (a)-(f) plot the synthesis parameters against time in ms: the amplitudes AV, AF,
and AH; the pole and zero FTP and FTZ; formant bandwidths, including B2, B4, and B2F;
the frication amplitudes A2F, AB, and TL; and F0.)
Eighteen different members of the same population as in Experiment 1 participated
for psychology course credit. Two were dropped because their native language was not
English.
The only difference from Experiment 4 was that all listeners were tested on the
VCCV block first and the CCV block second, to avoid priming a V.CCV syllabification.
4.6.5. Results
Results are shown in Figures 4.92 and 4.93. As in Experiment 4, bias appears as
displacement from the line y = x. The displacement is greater and more consistent in the
CCV than the VCCV condition. The test statistic was again D, the log of the "l"/"w"
odds ratio contingent on a "d" decision minus that contingent on a "b" decision, averaged
over all stimuli in the array.
Figure 4.92. Log odds ratios for the "l"/"w" judgment in Experiment 5, contingent on the
"b"/"d" judgment, for the CCV stimuli. Each point represents 16 listeners' pooled
responses to one stimulus. (Stimulus codes are explained in the text.)
(CCV stimuli. Axes: ln(P("dl")/P("dw")) against ln(P("bl")/P("bw")), with each point
labelled by its stimulus code.)
Figure 4.93. Log odds ratios for the "l"/"w" judgment in Experiment 5, contingent on the
"b"/"d" judgment, for the VCCV stimuli. Each point represents 16 listeners' pooled
responses to one stimulus. (Stimulus codes are explained in the text.)
VCCV stimuli
200
50
22 00
32
43
53
24
54
-2 jOO
04 45
55
In (P(dl)/P(<!w))
For the CCV array, D is 1.0505; for the VCCV array, it is 0.0648. The same nonparametric bootstrap procedure was used to test significance. For the CCV array, the bootstrap critical values were D̂.05 = 0.4370 and D̂.01 = 0.5685, confirming a phonotactic effect. For the VCCV array, the effect did not approach significance: D̂.05 = 0.4362 and D̂.01 = 0.6269.
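As a concrete illustration, the computation of D and of its bootstrap critical values can be sketched in Python. The function names, the add-0.5 smoothing of zero cells, and the particular resampling scheme (reassorting the pooled sonorant responses at random across the stop responses within each stimulus, to simulate independence of the two judgments) are assumptions of this sketch, not necessarily the exact procedure used here:

```python
import numpy as np

def log_odds(l_count, w_count, eps=0.5):
    """Log odds of an "l" over a "w" response, with add-0.5 smoothing
    so that zero cells do not produce infinities (an assumption of
    this sketch)."""
    return np.log((l_count + eps) / (w_count + eps))

def D_statistic(counts):
    """D: mean over stimuli of ln(P(l|d)/P(w|d)) - ln(P(l|b)/P(w|b)).
    `counts` is a list of per-stimulus response tallies with keys
    'dl', 'dw', 'bl', 'bw'."""
    diffs = [log_odds(c['dl'], c['dw']) - log_odds(c['bl'], c['bw'])
             for c in counts]
    return float(np.mean(diffs))

def bootstrap_critical(counts, alpha=0.05, n_boot=2000, seed=0):
    """Critical value of D under the null hypothesis that the stop and
    sonorant judgments are independent: within each stimulus, the
    sonorant responses are reassorted at random across the stop
    responses, and D is recomputed."""
    rng = np.random.default_rng(seed)
    null_ds = []
    for _ in range(n_boot):
        resampled = []
        for c in counts:
            stops = ['d'] * (c['dl'] + c['dw']) + ['b'] * (c['bl'] + c['bw'])
            sons = ['l'] * (c['dl'] + c['bl']) + ['w'] * (c['dw'] + c['bw'])
            sons = rng.permutation(sons)
            tally = {'dl': 0, 'dw': 0, 'bl': 0, 'bw': 0}
            for s, t in zip(stops, sons):
                tally[s + t] += 1
            resampled.append(tally)
        null_ds.append(D_statistic(resampled))
    return float(np.quantile(null_ds, 1.0 - alpha))
```

An observed D exceeding the (1 - α) quantile of the null distribution counts as evidence of a dependency between the two judgments.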
The results indicate that the dependency was eliminated by the availability of a legal parse: The prepended vowel made a difference. This is unexpected under TRACE and the INC-1 version of MERGE TP.
4.6.6. Discussion
Experiment 4 found a perceptual bias against [dlæ], but none against [bwæ]. The INC-1 version of MERGE TP predicted otherwise, because [dl] and [bw] are both unattested as English syllable onsets. Since listeners' experience of both onsets is identical, that experience cannot explain the difference in performance. TRACE incorrectly predicted no dependency at all between the stop and sonorant decisions. Only the SC-1 version of MERGE TP and the OT grammatical theory correctly predicted the pattern of results. The bias was not due to auditory factors, since bias was measured separately for each stimulus; rather, it reflected a dependency between the stop and sonorant responses. Experiment 5 confirmed that this dependency was not compensation for coarticulation, because it could be eliminated by prepending a vowel that made an alternative, legal parse available.
It may be objected that listeners' experience of [dl tl] and [bw pw] is not in fact identical. The zero count for [bw pw] comes from a British English corpus, and may understate the experience of university-aged speakers in the United States, who are likely to have encountered [bw pw] in foreign place names such as Buenos Aires, southwestern U.S. place names such as Pueblo, or occasional loans like puissant or the colloquial bueno. At a conversational speaking rate of 150 words per minute (Venkatagiri), the corpus represents perhaps one to three years of a person's combined input and output. A word occurring less frequently than once in one to three years could escape the corpus—though an 18-year-old participant in these experiments might have heard it 18 times or more, providing enough experience of [bw pw] onsets to remove the perceptual bias against it. This Undetected Frequency Difference (UFD) Hypothesis might, then, be correct.
As has already been pointed out, listeners’ acceptance of [bw] was not increased by
up to 9 years of explicit training in languages in which [bw pw] onsets are common. It was
argued above that this is a ceiling effect; acceptability of [bw pw] cannot be increased by
training because the sequences are legal in English. If instead the UFD Hypothesis is
correct, then the whole of the gain in acceptability must be caused by the exposure to the
first few tokens, with subsequent training having no effect. Hence, it should take exposure
to only a small number of tokens to make any sequence legal. But speakers persist in
treating some sequences as illegal, even after considerable training (Dupoux et al. 1999;
Polivanov 1931).
In support of the UFD Hypothesis, it may be replied that the listeners were exposed
to the undetected low-frequency [bw pw] as children, but received foreign-language training
as adults, after the critical period for accentless acquisition. It is certainly true that infants as
young as 9 months are already sensitive to the sound pattern of their language (Jusczyk,
Friederici, Wessels, Svenkerud, & Jusczyk 1993; Friederici & Wessels 1993; Jusczyk et al.
1994). However, adults can learn phonotactic patterns even without explicit training (Dell, Reed, Adams, and Meyer, 2000). Moreover, a dispreference for [dl tl] compared to [bw pw] is found in children who were unlikely to have been exposed to [bw pw] onsets. As shown in Table 4.94, the midwestern U.S. states of Iowa and Nebraska had few Spanish- or French-speaking residents in 1990 and almost no place names beginning with [bw pw].
Table 4.94. Demographics and occurrences of [bw pw]-initial place names in Iowa and Nebraska (U.S. Census Bureau 1990; DeLorme 1998, 2000)
[Table data not legible in this copy; columns: Iowa, Nebraska.]
Note: The proportion of the overall U.S. population of Hispanic origin in 1990 was 8.99%. "Buena Vista", in Iowa, is locally pronounced with initial [bw] (Buena Vista College Library staff, personal communication, 2001).
In a study of 1,049 children in Iowa and Nebraska between 2 and 9 years of age, Smit (1993) systematically elicited productions of most of the English word-initial clusters, including [bl pl] and [tw]. The [tw] cluster was sometimes produced as [bw] or [pw], but the [bl] and [pl] clusters almost never became [dl] or [tl], as shown in Table 4.95.14 This indicates that [d t] are more disfavored before [l] than [b p] are before [w].
14 Similarly, these children also sometimes produce [bl pl] as [bw pw], with no corresponding tendency to turn [tw] into [tl]. Aversion to [tl] may be a contributing factor, but we cannot be sure, because they tend to replace [l] with [w] in all environments.
Table 4.95. Errors in production of the initial stop in [bl pl tw] onsets by English-learning children in Iowa and Nebraska (Smit 1993)

[tw-] (twins)
  2:0-3:0    f, b; p
  3:6-5:6    p, k, d; f, int
  6:0-9:0    —
[pl-] (plate)
  2:0-3:0    —
  3:6-5:0    —; b, t
  5:6-7:0    —
  8:0-9:0    —
[bl-] (block)
  2:0-3:0    —
  3:6-5:0    —
  5:6-7:0    —
  8:0-9:0    —

Note: "Occasional" means "[u]sed by a few groups in an age range with a frequency of 4-10%, or by most groups in that age range at frequencies of 1-4%"; "rare" means "[o]ccurs with a frequency of less than 3%, and only in a few groups in an age range" (Smit 1993, p. 947). This table includes all errors made by the 1,049 children in the study. "int" = interdental.
The asymmetry is present at the earliest ages tested — before one would expect
most Iowan or Nebraskan children to have had much exposure to Spanish place names.
The UFD Hypothesis can therefore only be defended if the perceptual effects of frequency
are due chiefly to a very few tokens experienced very early in life. If so, it is an interesting
new finding, with many consequences. It implies that, contrary to TRACE, the many words
learned after early childhood contribute little to the phonotactic frequency effect. It predicts large individual variation in phonotactics (since the individual is generalizing from a small sample of the adult language, which will necessarily differ more between individuals than a large sample would). Finally, it suggests that even large corpora of adult language are inadequate as estimates of the input from which phonotactic knowledge is learned.
These findings are consistent with a model in which the decision between competing parses is guided by the structural constraints of the perceiver's language - here, the ban on [dl] onsets. In Experiment 4, where syllabification was fixed by the acoustic cues, the choice was between competing CCV parses. The "dl" responses were reduced because a "dl" response could only be supported by the structurally disfavored [dlæ] parse. In Experiment 5, where both segmental identity and syllabification were ambiguous, "dl" responses could be supported by the legal [æd.læ] parse, and the bias disappeared.
The findings of Pitt (1998, Experiment 2) may be reinterpreted in the same way: "l" responses to an [l]-[ɹ] continuum were reduced, relative to a baseline, in the context [mæt_æ], but not in [mæd_æ]. Strong aspiration on the [t] provided an unambiguous cue to V.CCV syllabification (Kirk 2001), allowing only the parses [mæ.tɹæ] and the illegal [mæ.tlæ]. The [mæd_æ] context allowed VC.CV syllabification and thus the legal "l" parse [mæd.læ]. This suggests that prosodic and segmental parse decisions are made in parallel, with the candidate parses representing both phonemes and syllabification: [mæd.læ], [mæ.dlæ], [mæd.ɹæ], and [mæ.dɹæ]. The chosen prelexical parse thus provides the essential information for word segmentation and lexical access. Phonotactically impossible parses, such as those with vowelless syllables or illegal onset clusters, are inhibited, leading to the Possible Word Constraint effects observed by Norris, McQueen, Cutler, and Butterfield (1997).
4.7. Experiment 6: Phonotactics of the Japanese lexical strata15
Experiments 1-3 all examined the issue of whether a perceptual boundary could be shifted by phonological context that was remote from the ambiguous segment. That it could be so shifted would have been predicted by any version of the TP theory which used contexts longer than a single phoneme. Results were uniformly negative: The identity of a segment outside the phonotactically relevant context did not affect the boundary.
In these experiments, the relevant context was quite small, including only the ambiguous segment and one adjacent to it. The INC-1 and SC-1 versions of the TP theory therefore took only the relevant context into account in making their predictions. Where the grammatical theory said that remote context was ineffective because it was irrelevant, the TP theories said it was ineffective because it fell outside their one-segment window.
This section presents evidence that remote context can have a phonotactic effect when it is (despite its remoteness) phonotactically relevant. It will be shown that Japanese listeners' categorization of an ambiguous vowel in a nonword can be shifted by context which is remote from the ambiguous vowel, but which is important in determining the lexical stratum membership of the nonword. This can be taken as further evidence against the INC-1 and SC-1 versions of the MERGE TP theory. The effect of stratum phonotactics is found to be numerically larger and statistically more robust than a word-superiority effect obtained with the same listeners and paradigm, contrary to the predictions of TRACE.
15 The work in this section was performed with the guidance and support of Shigeaki Amano at the NTT Basic Research Laboratories in Atsugi, Japan.
The finding of a stratum-phonotactic effect is also strong evidence in favor of the OT grammatical theory, which is the only one of the three theories in which the concept of a lexical stratum can be naturally represented.
Items in the Japanese lexicon can be divided into four classes, called "strata", which differ in their morphological and phonotactic characteristics (Shibatani 1990; Tateishi 1990; Martin 1952; Ito & Mester 1994, 1995). First, morphemes combine preferentially with other morphemes of the same stratum. For example, the two morphemes [dai] (Sino-Japanese) and [o:] (Yamato) both mean 'big'. The former combines with Sino-Japanese [moɴ] 'gate' to give [daimoɴ] 'main gate of a Buddhist temple', the latter with Yamato [te] 'hand' to give [o:te] 'main gate of a castle'. Since all of the verbal and adjectival inflectional endings are Yamato morphemes, only Yamato items undergo inflection.16
16 A few exceptions have been noted: [demorɯ] 'to demonstrate (politically)' consists of a Foreign stem spliced onto a Yamato ending (Sato 1983).
Table 4.96. The lexical strata of Japanese
[Table largely illegible in this copy; one surviving example: Foreign [katarogɯ] 'catalogue'.]
Second, the strata are subject to different sets of phonotactic restrictions.17 A Sino-Japanese morpheme, for example, cannot be more than two moras long, and the second mora must be either a single vowel, or one of [tɕi tsɯ ki kɯ]. A Yamato morpheme cannot begin with [r], while a Mimetic morpheme can have [r] initially or medially but not both (Tateishi 1990). Single [p], voiced geminates, and voiceless post-nasal obstruents are likewise restricted to particular strata.
17 There are also commonly supposed to be productive phonological processes which operate only in certain strata. However, attempts to demonstrate their productivity experimentally through native-speaker "goodness" judgments have tended to show the opposite. For rendaku, see Suzuki, Maye, and Ohno (2000); for the phonology of verbal affixes, see Vance (1987, 1991). This is a common result of such studies: See Hsieh (1976) for non-productivity of Taiwanese tone sandhi, Zimmer (1969) for non-productivity of a Turkish labiality-spreading rule.
Vowel length is distinctive in Japanese. Each of the five short vowels [i e a o ɯ] has a long counterpart. Final [a] is found in all strata, while final [a:] is found only in Foreign words.
The strictness of the phonotactic restrictions is a graded phenomenon: strictest for the indigenous Yamato stratum, less so for the newer Sino-Japanese words, and most permissive for the recent Foreign loans (Ito & Mester 1994, 1995). Some sequences, though theoretically permitted in the Foreign stratum, are virtually never actually found there (Moreton et al. 1998). For instance, the palatalized onsets [rʲ] and [hʲ] are almost entirely confined to the Sino-Japanese stratum. Consequently, a word containing [rʲ] or [hʲ] is almost certain to be Sino-Japanese, and hence to lack [a:], while a word containing singleton [p] or [ɸa] is almost certain to be Foreign, and hence to permit it. Carrier contexts containing different stratum cues can be constructed, and the [a]-[a:] boundary measured, in order to detect whether stratum membership can influence boundary location.
In order to estimate the size of a pure lexical effect on the long-short vowel boundary, a word-superiority experiment (Experiment 6a) was run first.
4.7.3.1. Design
Three pairs of words were selected. One member of each pair ended in a short vowel and the other in the long version of the same vowel. If a word ended in a short vowel, then making the vowel long produced a nonword, and vice versa. Both words in each pair were in the same stratum, and were in or above the 96th percentile in rated familiarity when presented aurally (Amano & Kondo in press). Both members of each pair had the same pitch-accent pattern.
Each word pair provided one context which was expected to lexically bias perception in favor of the short vowel, and one which was expected to bias in favor of the long vowel. It was expected that there would be a bias to report [a:] in the context [posut_], since [posuta:] is a real word while [posuta] is not, and that there would be a bias to report [a] in the context [pasut_], since [pasuta] is a real word while [pasuta:] is not.
4.7.3.2. Methods
A male native speaker of Japanese recorded several tokens of each word in isolation.
These were digitized at 48 kHz with 16-bit resolution, downsampled to 44.1 kHz, and
normalized for peak amplitude. Using a waveform editor, a single token each of the final
syllables— [ro:], [ta:], and [go:]—was excised and cross-spliced with each of the initial
syllables to make three pairs of stimuli with each pair ending in the same long vowel. The
length of the vowel was adjusted to be as close as possible to 250 ms by repeating medial pitch periods.
Twenty-four native speakers of Japanese served as subjects, recruited in the Tokyo area, with equal numbers of each sex. Subjects reported normal speech and hearing. On each trial, two response buttons were displayed on the screen. The buttons were labelled in the katakana syllabary (used primarily for writing Foreign loanwords); one showed the stimulus word with a short final vowel, the other with a long one (e.g., "ringo" and "ringoo"). Subjects were asked to choose the button which better matched what they had heard. The next stimulus followed the response.
To vary the length of the final vowel, a half-wave raised cosine filter was applied.
This caused the final vowel to be gradually reduced to zero amplitude during the 50 ms
following the specified starting point of the filter, providing a natural-sounding end to the
vowel.
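The amplitude manipulation can be sketched in Python as follows; the function name, sampling rate, and array interface are illustrative assumptions of this sketch, not the actual waveform-editing software used:

```python
import numpy as np

def apply_offset_ramp(signal, cutoff_ms, ramp_ms=50.0, fs=44100):
    """Fade `signal` to silence with a half raised-cosine ramp:
    unity gain up to `cutoff_ms`, then a smooth fall to zero over
    `ramp_ms`, giving a natural-sounding vowel offset."""
    out = np.asarray(signal, dtype=float).copy()
    start = int(round(cutoff_ms * fs / 1000.0))
    n_ramp = int(round(ramp_ms * fs / 1000.0))
    if start >= len(out):
        return out  # cutoff beyond the end: nothing to do
    end = min(start + n_ramp, len(out))
    t = np.arange(end - start)
    # half raised cosine: gain 1 at ramp onset, 0 at ramp offset
    out[start:end] *= 0.5 * (1.0 + np.cos(np.pi * t / n_ramp))
    out[end:] = 0.0
    return out
```

Moving `cutoff_ms` along the final vowel yields the continuum of effective vowel durations.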
The boundary between the long and short vowel was established using the adaptive
method PEST (Taylor & Creelman 1967). One of the 6 stimuli was randomly selected, and
presented with either the longest or the shortest possible final vowel. On the next trial, the
opposite endpoint was presented. Thereafter, the stimulus presented depended on the subject's previous responses, converging on the boundary. The series ended when the step size was reduced to 1/256 of the continuum—i.e., about 1 ms—or after 80 trials. To reduce the dependence of each trial on the previous one, two PEST series for the same context were interleaved with each other, switching back and forth randomly. Once both series finished, the screen cleared and a button appeared with the message "Click to go on to the next sound". This procedure took
Three subjects were excluded from the analysis for failure to converge after 80 trials on 6 or more series (in this or the following experiment combined). The remaining 21 listeners converged on 96% of all series after an average of 20.4 trials each.
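The logic of an interleaved adaptive series can be illustrated with the following simplified Python sketch. It is not a full implementation of PEST, whose step-size rules are based on sequential Wald tests (Taylor & Creelman 1967); it simply halves the step on each response reversal, interleaves two series, and stops when the step falls below 1/256 of the continuum or after 80 trials. All names and parameter values are my own:

```python
import random

class Staircase:
    """One adaptive series: steps toward the boundary, halving the
    step on each response reversal (a simplification of PEST)."""
    def __init__(self, lo, hi, start_at_top):
        self.lo, self.hi = lo, hi
        self.x = hi if start_at_top else lo
        self.step = (hi - lo) / 2.0
        self.min_step = (hi - lo) / 256.0   # convergence criterion
        self.last = None
        self.trials = 0

    @property
    def done(self):
        return self.step < self.min_step or self.trials >= 80

    def update(self, heard_long):
        # Move down after a "long" response, up after a "short" one;
        # halve the step whenever the response direction reverses.
        self.trials += 1
        if self.last is not None and heard_long != self.last:
            self.step /= 2.0
        self.last = heard_long
        self.x += -self.step if heard_long else self.step
        self.x = max(self.lo, min(self.hi, self.x))

def run_interleaved(respond, lo=80.0, hi=330.0, seed=0):
    """Interleave two series (one from each endpoint), picking one at
    random on each trial; return the mean of their final levels."""
    rng = random.Random(seed)
    series = [Staircase(lo, hi, False), Staircase(lo, hi, True)]
    while not all(s.done for s in series):
        live = [s for s in series if not s.done]
        s = rng.choice(live)
        s.update(respond(s.x))
    return sum(s.x for s in series) / 2.0
```

With a deterministic responder whose boundary is at 200 ms, both series converge to within a few milliseconds of 200.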
For each subject, an [a]-[a:] or [o]-[o:] boundary was computed for each series. The
boundary was the stimulus that would have been presented by PEST if there had been one
more trial. Since two interleaved series were presented for each carrier context, the subject’s
boundary for that stimulus was taken as the average of the two.
For each subject and each pair of words, we calculated the difference between the
boundary in the long-bias context and that in the short-bias context. Results are shown in
Table 4.98.
Table 4.98. Boundary difference in long- and short-biased contexts (in milliseconds), Experiment 6a (N = 21 Ss)
[Table data not legible in this copy.]
There were large amounts of individual variation in the differences, leading to large variances and low significance levels. The difference was significantly different from zero in the posut-/pasut- context (one-tailed t-test, t(20) = 2.086, p < 0.05). The syuug-/ring- difference just misses significance at the 5% level (t(20) = 1.725). No effect could be found in the third context. The significant difference is a lexical effect of the kind demonstrated in perception by Ganong (1980). A robust word-superiority effect, however, was not found. The weak showing of the lexical effect is all the more striking when compared with the results of the next experiment, which asked whether stratum phonotactics can shift the phonetic boundary between two sounds—specifically, whether the different phonotactics of the Foreign and Sino-Japanese strata can shift the [a]-[a:] boundary.
The idea was to present an [a]-[a:] continuum at the end of a carrier word containing
two consonants, C and C′. If C and C′ are sounds that occur almost exclusively in Sino-Japanese words, then they do not naturally co-occur with [a:]. We therefore expected that listeners would require exceptionally strong acoustic evidence (i.e., a longer vowel) before reporting [a:]. On the other hand, if C and C′ are sounds found only in Foreign words, then it is unsurprising if the word contains [a:], and we expected in this case that listeners would be readier to report the long vowel. In other words, we expected that the [a]-[a:] boundary would be shifted towards [a:] when C and C′ were highly valid cues for Sino-Japanese, and that it would be shifted towards [a] when they were highly valid cues for Foreign.
4.7.4.1. Design
Stratum membership was estimated as in Moreton & Amano (1999), by exploiting a feature of the Japanese writing system: the lexical stratum of a word is reflected in its orthography. Foreign words are written in the katakana syllabary. Chinese characters are used to write both Sino-Japanese and Yamato words, but the dictionary indicates which pronunciations of a given character are Sino-Japanese and which are Yamato. There are some exceptions, but they are not numerous. (Details of the classification procedure are given there.)
Cues to stratum membership were chosen on the basis of these statistics. For the Foreign stratum we chose as C the singleton [p] (i.e., [p] neither geminated nor preceded by a nasal coda) and as C′ the voiceless bilabial fricative [ɸ], which outside of the Foreign stratum is found only before [ɯ]. For the Sino-Japanese stratum we chose [rʲ] and [hʲ]. For neutral contexts we chose [r] and [t]. The statistics are shown in Table 4.99:
Table 4.99. Validity of cues to stratum membership: Number of nouns in database belonging to each stratum containing the given cues

                          Foreign   Sino-Japanese   Other
[ɸ] before [i e a o]        214            0          16
[rʲ]                         19          959         109
[hʲ]                          7          273          38

Note: "Other" includes the Mimetic and Yamato strata, compounds containing words from more than one stratum, and words which the orthographic algorithm could not classify. Nouns make up 82% of the database.
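Reading Table 4.99 with the column order Foreign, Sino-Japanese, Other (an assumption of this sketch), the validity of each cue follows directly from the counts:

```python
# Cue validities implied by the counts in Table 4.99: the probability
# that a noun containing a given cue belongs to each stratum.  The
# column order (Foreign, Sino-Japanese, Other) is my reading of the
# table, not part of the original database documentation.
COUNTS = {
    'ɸ before [i e a o]': (214, 0, 16),
    'rʲ': (19, 959, 109),
    'hʲ': (7, 273, 38),
}

def validity(cue):
    """Return P(stratum | cue) for nouns in the database."""
    foreign, sj, other = COUNTS[cue]
    total = foreign + sj + other
    return {'Foreign': foreign / total,
            'Sino-Japanese': sj / total,
            'Other': other / total}
```

On these counts, [ɸ] before a non-high vowel never occurs in a Sino-Japanese noun, while [rʲ] and [hʲ] occur overwhelmingly in Sino-Japanese nouns.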
The consonant cues were embedded factorially in the template [CoC′_] to produce nine carrier contexts.18
18 None of the stimuli could be interpreted as Yamato items. Each stimulus began with either [p], [r], or [rʲ], none of which is permitted word-initially in Yamato.
4.7.4.2. Predictions
4.7.4.2.1. TRACE
There is nothing in TRACE that would allow it to represent the concept of stratum directly. Stratum effects, if there are any, must emerge from statistical properties of the lexicon. However, we can infer something about its predictions based on its behavior in previous simulations.
In each of the phonotactic simulations we have run, the advantage for one phoneme over another has come, not from slight activation of many lexical items, but from the moderate incomplete activation of one or two lexical items that are very similar to the ambiguous stimulus. Their activation is strong enough to extinguish the other word candidates.
A word-superiority effect differs from a phonotactic effect only in that a single word matches the stimulus closely enough to become fully active. The phonotactic effect is therefore predicted to be smaller than that of an outright lexical effect. We expect a smaller effect in this experiment than in Experiment 6a.
4.7.4.2.2. MERGE TP
The INC-1 and SC-1 versions of the MERGE TP theory predict only an effect of the proximate (C′) context, not of the remote (C) context, because they do not keep track of dependencies more than a single segment away from the ambiguous segment. If an effect of C is found, it will constitute positive evidence either that the contexts must be larger, or that the TP theory is wrong.
4.7.4.2.3. OT grammatical theory
OT analyses of the Japanese lexical strata are in agreement that the differences between strata lie in faithfulness rather than markedness. One proposal is that there are separate grammars for the separate strata, which differ in how the faithfulness constraints are ranked. Lexical items are evaluated with respect to their stratum's grammar (Ito & Mester 1995). An alternative account is that there is only one grammar, but that it contains duplicate sets of faithfulness constraints, which are stratum-specific. Each set applies only to lexical items which belong to one stratum. Different rankings of each set result in different phonotactic restrictions for each stratum.
The choice is not a crucial one for the present experiment; the reasoning is nearly the same in either case. For expository purposes I will adopt the second proposal, stratum-specific faithfulness, which is more parsimonious (preserving the principle of a single grammar with a single fixed ranking) and offers a learning algorithm, driven by ranking paradoxes. Its principal empirical advantage is its ability to deal with "hybrid" compounds - compound words containing words from more than one stratum (such as a compound meaning 'researcher'), where each member of the compound obeys the phonotactic restrictions of its own stratum.
The rarity of final [a:] outside the Foreign stratum can be modelled by ranking the Sino-Japanese IDENT[LENGTH] faithfulness constraint below *[a:]. Its legality in Foreign items follows from ranking the Foreign IDENT[LENGTH] constraint above *[a:], as in (4.100):
(4.100) IDENT[LENGTH]For » *[a:] » IDENT[LENGTH]SJ

/denɕa:/SJ:
   a.   [denɕa:]       *[a:]: *!
   b. ☞ [denɕa]        IDENT[LENGTH]SJ: *

/konpʲɯ:ta:/For:
   c. ☞ [konpʲɯ:ta:]   *[a:]: *
   d.   [konpʲɯ:ta]    IDENT[LENGTH]For: *!
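The ranking in (4.100) can be illustrated with a toy Python evaluator. All names here are my own, the candidate sets are supplied by hand rather than generated, and stratum-specific faithfulness is implemented simply by making each IDENT[LENGTH] constraint vacuous for inputs outside its stratum:

```python
# A toy evaluator for the ranking in (4.100):
#   IDENT[LENGTH]_For >> *[a:] >> IDENT[LENGTH]_SJ

def ident_length(input_form, candidate, stratum, which):
    """Stratum-specific IDENT[LENGTH]: one violation if final vowel
    length was changed and the input belongs to stratum `which`."""
    if stratum != which:
        return 0
    return int(input_form.endswith('a:') != candidate.endswith('a:'))

def star_long_a(input_form, candidate, stratum):
    """Markedness *[a:]: one violation for a final long [a:]."""
    return int(candidate.endswith('a:'))

RANKING = [
    lambda i, c, s: ident_length(i, c, s, 'Foreign'),  # IDENT[LENGTH]_For
    star_long_a,                                       # *[a:]
    lambda i, c, s: ident_length(i, c, s, 'SJ'),       # IDENT[LENGTH]_SJ
]

def optimal(input_form, stratum, candidates):
    """Pick the candidate whose violation profile is lexicographically
    best, comparing the highest-ranked constraint first."""
    return min(candidates,
               key=lambda c: [con(input_form, c, stratum) for con in RANKING])
```

For a Sino-Japanese input ending in [a:], the evaluator selects the shortened candidate, while the corresponding Foreign input surfaces faithfully, as in the tableau.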
The predictions made by this grammar in the grammatical model hinge crucially on
the concept of "active constraint", which, as discussed in §3.4.2., is any constraint which, for
some input /, eliminates a candidate output from consideration. In order to use grammatical
knowledge, the parser must decide which markedness constraints are active. In order to do
that, it must decide which stratum it is dealing with. The listener must covertly choose a
stratum within which to interpret the stimulus; that is, the linguistic parse which the listener
attempts to assign to the stimulus includes a stratum classification. The active constraints
are determined relative to the stratum's faithfulness constraints. If the stratum chosen is
Foreign, then *[a:] is inactive and produces no bias. If it is Sino-Japanese, then *[a:] is active, as illustrated in (4.101):
(4.101)
   a. [poɸa]For
   b. [poɸa:]For       *[a:]: * (inactive)
   c. [rʲohʲa]SJ
   d. [rʲohʲa:]SJ      *[a:]: * (active)
In this view, the Foreign cues can provide conclusive evidence of Foreign stratum membership, since [ɸ] before a non-high vowel is barred from Sino-Japanese by a constraint which is active with respect to Sino-Japanese. The reverse is not true: Although the Sino-Japanese cues are rare in Foreign words, they are not actually illegal there. Hence, any of our stimuli, with either vowel length, has a legitimate parse as a Foreign word. Those containing Foreign cues cannot be parsed as Sino-Japanese; those lacking Foreign cues can be parsed as Sino-Japanese.
There should thus be only two degrees of [a] bias: a lesser one in words containing Foreign cues, and a greater one in words lacking them. If C is a Foreign cue, then manipulating C′ should have no effect, while if C is not a Foreign cue, then manipulating C′ should have an effect (since making C′ Foreign will reduce the [a] bias). The same is of course true with C and C′ exchanged. For example, the stimulus contexts [poɸ_] and [rʲoɸ_] are predicted to have the same effect, since the presence of [ɸa] cues Foreign stratum membership regardless of the initial consonant. Both should produce more "aa" judgments than contexts lacking Foreign cues. This will show up statistically as an interaction between the C and C′ effects.
4.7.4.3. Methods
The procedure was almost identical to that of Experiment 6a. The same speaker (naive to the purpose of the experiment) was recorded on digital audio tape producing the cues embedded in the context [CoC′a:], with a high pitch on the [o] and a low one on the [a:]. This accent pattern was chosen because it is common in both the Sino-Japanese and Foreign strata, and because it rules out the possibility of a word boundary inside the stimulus.
These were digitized and normalized as before. Using a waveform editor, single tokens of [po], [ro], and [rʲo] were selected and excised. One of the [o]s was chosen and spliced in to replace the original [o] of the other two, resulting in [po], [ro], and [rʲo] tokens with acoustically identical vowels.
In this experiment, unlike in Experiment 6a, it was necessary to manipulate the initial consonant of the final syllable; hence, an [a:] token had to be created which could be spliced onto each of the three possible syllable onsets. A single token of [a:] was selected and lengthened to 250 ms by repeating medial pitch periods. One token each of the speaker's original [ɸa:], [ta:], and [hʲa:] was chosen, and truncated to the first zero crossing following the fourth pitch period of the vowel (early in the steady state), leaving [ɸa-], [ta-], and [hʲa-]. The 250-ms [a:] token was spliced onto the end of each one to produce [ɸa:], [ta:], and [hʲa:] tokens with identical vowels.
The subjects and procedure were as in Experiment 6a. A short break separated the
two experiments, which, together with the post-experiment questionnaire, lasted about two
hours.
At the end of the experiment, subjects received written questionnaires. They listened
as many times as they wished to the longest [a:] and the shortest [a] in each carrier context
(in a random order for each subject) and transcribed the resulting pseudo-word in katakana,
then answered questions about it, including whether they knew it as a real word of Japanese.
4.7.4.4. Results
The same subjects were excluded as in Experiment 6a. For each valid subject, an [a]-[a:] boundary was computed for each series as in Experiment 6a. The boundary for each context was taken as the average of its two interleaved series.
Figure 4.103. Boundary between [a] and [a:], averaged across 21 listeners. [Plot of [a]-[a:] boundary (ms) by carrier context, not reproduced.]
The boundary stimulus tends to become longer as the consonantal cues go from Foreign to Neutral to Sino-Japanese. The manipulations of both C and C′ had very significant effects on the boundary location. The C manipulation caused a 7.25-ms shift (F(2,40) = 6.473, p < 0.01); the C′ manipulation caused a 12.6-ms shift (F(2,40) = 11.529, p < 0.01). Their interaction did not reach significance.
The effect obtained in this experiment is considerably larger and more robust than that of Experiment 6a. There, the only significant effect was a 12.21-ms word-superiority
effect on the location of an [a]-[a:] boundary. This is about the same size as the effect of C′ alone in this experiment. TRACE would have predicted the lexical effect to be stronger.
Questionnaire results confirmed that the subjects heard the stimuli correctly; no
stimulus was misheard by more than 3 subjects. One stimulus, [pota], was judged to be a
real word by 15 subjects (it is part of the reduplicated Mimetic potapota). One other was
judged a real word by 2 subjects, and 5 others by 1 subject. Lexical bias, if any were
present, should have given [pota] an especially strong [a] bias. Its boundary is in fact unremarkable.
The INC-1 and SC-1 versions of the transitional-probability theory do not fare well either. They can account for the effects of C′, but not for those of C, which is fully three segments away from the ambiguous vowel. The C effect is smaller than the C′ effect, which suggests that a transitional-probability effect of the segment neighboring the ambiguous segment is being added on top of some other effect correlated with stratum.
These results are also challenging for the OT grammatical model. It is clear that stratum phonotactics are playing a role, so faithfulness constraints are involved. However, the lack of interaction between the C and C′ effects is unexpected. All stimuli containing at least one Foreign cue should be classified as Foreign and should all behave alike, with the same low level of [a] bias. Instead, the [a] bias decreases with the number of Foreign cues present.
Furthermore, [rʲ] and [hʲ] seem to act as cues to membership in the Sino-Japanese stratum, since they produce more [a] bias than the Foreign or Neutral cues. This is surprising, because no grammatical constraint enforces their link to membership in the Sino-Japanese stratum. The [rʲ] and [hʲ] sounds are statistically rare in
the Foreign stratum, but not grammatically illegal there. Listeners appear to be sensitive to
the statistical link, which the OT grammatical theory does not capture.
4.7.4.5. Discussion
These results make three points. First, they found a phonotactic effect that was substantially larger than any of three lexical effects obtained with the same listeners and paradigm - a point difficult to explain in TRACE.
Second, they showed that phonological context which is remote from the ambiguous segment can nonetheless affect its categorization when it is phonotactically relevant.
Third, they demonstrated that the lexical strata of Japanese are not just descriptive constructs, but play an active role in perception. The distinction among strata (as a primitive of the theory) is natural in OT phonology, but is unmotivated (or motivated only post hoc) in TRACE and MERGE TP.
These results leave a problem for the grammatical theory in the gradience of the cues. As the cues become more Sino-Japanese-like, the [a] bias increases; as they become more Foreign-like, it decreases. The perceived stratum of a stimulus apparently depends on the number and type of stratum cues present in it. A stimulus with two Sino-Japanese cues is more likely to be perceived as Sino-Japanese, and hence more likely to invoke the *[a:] constraint and produce a boundary shift, than a stimulus with only one.
Such gradient classification cannot be carried out by the grammar, since the grammar does not represent the concept of "Sino-Japanese cue" - only that of "Foreign cue". Stratum classification might take place as part of the process of
incorporating a new word into the lexicon, through comparison with existing lexical items.
Since listeners in this experiment heard each stimulus an average of 44 times in succession,
they had concentrated experience with each one, and ample time for slower off-line
processes to unfold.
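The gradient pattern just described (classification probability rising with the number of stratum cues) can be illustrated with a toy model. The logistic form, the function name, and the parameter values below are illustrative assumptions, not a model proposed in the text.

```python
# Toy illustration: if each stratum cue independently nudges the odds that a
# stimulus is classified as Sino-Japanese, then classification probability,
# and hence the chance of invoking the *[a:] constraint, grows with the
# number of cues present.
import math

def p_sino_japanese(n_cues, base_logodds=-1.0, logodds_per_cue=1.2):
    """Logistic probability of Sino-Japanese classification given n stratum cues.
    Parameter values are arbitrary, chosen only for illustration."""
    return 1.0 / (1.0 + math.exp(-(base_logodds + logodds_per_cue * n_cues)))

for n in range(3):
    print(n, round(p_sino_japanese(n), 3))
```

On this sketch, a stimulus with two cues is classified Sino-Japanese more often than one with a single cue, matching the observed monotonic growth of the [a] bias.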
4.8. Summary
The experiments of this chapter showed that the phonotactic boundary shift occurred in
contexts which made one endpoint illegal, but not in contexts which merely made one
endpoint extremely rare. This result is consistent with the OT grammatical theory, but
not with MERGE TP or TRACE. These same experiments also showed that the boundary-
shift effect is not modulated by phonological context which is outside the structural
description of the relevant constraint. This result is consistent with the grammatical
theory and with MERGE TP, but not with TRACE.
Experiment 4 replicated the effects of Experiments 2 and 3 with voiced rather than
voiceless stops, and showed that the boundary shift was due to a dependency between stop
and sonorant responses, rather than to any auditory effect of one consonant on the other.
Experiment 5 confirmed that this dependency was not compensation for coarticulation, and
showed that the phonotactic effect in CCV stimuli can be abolished by prepending another
vowel to provide an alternative parse in which the banned cluster becomes legal. This
indicates that the parser makes syllabification and segmental-identity choices in parallel. If
segmental identity were decided first, then the presence or absence of an initial V would
make no difference. If syllabification were decided first, then phonotactics would not be
able to influence segmental identity (Treiman & Zukowski 1990; Treiman & Danis 1988,
Kirk 2001).
Experiment 6 showed an effect of the phonotactics of the lexical strata of Japanese, a
variable which exists only in the OT grammatical theory. Stratum effects must be taken
as emergent statistical phenomena in the
other two theories, but this they cannot be: the stratum effect was found to be larger than a word-superiority effect obtained with the same listeners.
The phonotactic boundary shift was influenced by segmental context fully three segments
away from the ambiguous segment, too far away for the MERGE TP theory to capture the
dependency.
Taken together, these results provide substantial support for the hypothesis that phonological grammar is used in speech perception.
Table 4.104. Constant synthesis parameters which were identical for the "b" and "d" arrays
of Experiments 4 and 5

Parameter Value
UI 2
RS 1776
SR 16000
FL 20
OQ 30
GV 60
GH 50
DU 700
F6 4900
B6 100
F5 4300
B5 300
F4 3250
F3 2500
FTP 3800
Table 4.105. Time-varying synthesis parameters common to the "b" and "d" arrays of
Experiments 4 and 5

Parameter Time, ms Value
AV 0 0
AV 25 0
AV 30 40
AV 70 40
AV 75 57
AV 225 60
AV 425 55
AV 475 50
AV 525 43
AV 575 0
AV 700 0
TL 0 30
TL 60 30
TL 65 10
TL 75 10
TL 115 (10 * w + 0 * l)
TL 145 (20 * w + 0 * l)
TL 175 (10 * w + 0 * l)
TL 225 0
TL 700 0
F0 0 1000
F0 75 1000
F0 225 1100
F0 425 1000
F0 475 900
F0 700 900
AH 0 0
AH 25 0
AH 60 0
AH 65 72
AH 70 70
AH 90 65
AH 115 0
AH 175 0
AH 225 0
AH 275 56
AH 345 60
AH 425 58
AH 475 55
AH 525 53
AH 575 0
AH 700 0
AF 0 0
AF 60 0
AF 65 57
AF 70 55
AF 75 0
AF 700 0
Table 4.106. Synthesis parameters for the "b" array of Experiments 4 and 5

Parameter Time, ms Value
AB — (24 * g + 44 * b)
A2F — 66
FTZ 0 3800
FTZ 75 3800
B4 0 800
B4 75 800
B4 275 200
B4 700 200
B3 0 700
B3 75 700
B3 275 150
B3 700 150
F2 0 st_targ
F2 75 st_targ
F2 140 gl_targ
F2 150 gl_targ
F2 275 1600
F2 700 1600
B2 0 90
B2 115 90
B2 175 (270 * w + 90 * l)
B2 225 (270 * w + 90 * l)
B2 275 90
B2 700 90
F1 0 180
F1 75 180
F1 85 200
F1 225 700
F1 345 780
F1 700 780
B1 0 250
B1 75 250
B1 95 140
B1 175 140
B1 225 80
B1 700 80
Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
Hz; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable st_targ, the
stop target F2, is 800 Hz; gl_targ, the glide target F2, is equal to 675 Hz * w + 900 Hz * l.
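The interpolation formulas in the table notes can be made concrete with a short sketch. It assumes, beyond what the tables state, that the continuum weights are complementary (w + l = 1 for the glide continuum, g + d = 1 for the "d"-array stop continuum); the function names are mine, not the synthesis scripts'.

```python
# Sketch of the F2-target formulas from Tables 4.106-4.107, assuming that the
# indicator variables are complementary interpolation weights along each
# stimulus continuum (an assumption, not stated in the tables themselves).

def gl_targ(w):
    """Glide target F2 in Hz: 675 Hz * w + 900 Hz * l, with l = 1 - w."""
    l = 1.0 - w
    return 675.0 * w + 900.0 * l

def st_targ_d(g):
    """Stop target F2 in Hz for the "d" array: 800 Hz * g + 1400 Hz * d,
    with d = 1 - g. (In the "b" array, st_targ is fixed at 800 Hz.)"""
    d = 1.0 - g
    return 800.0 * g + 1400.0 * d

# Endpoints recover the pure [w], [l], [g], and [d] targets; intermediate
# weights yield the ambiguous continuum steps.
print(gl_targ(1.0), gl_targ(0.0), st_targ_d(1.0), st_targ_d(0.0))
```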
Table 4.107. Synthesis parameters for the "d" array of Experiments 4 and 5

Parameter Time, ms Value
AB — (24 * g + 44 * d)
A2F — (66 * g + 60 * d)
FTZ 0 3800
FTZ 75 3800
B4 0 800
B4 75 800
B4 275 200
B4 700 200
B3 0 700
B3 75 700
B3 275 150
B3 700 150
F2 0 st_targ
F2 75 st_targ
F2 140 gl_targ
F2 150 gl_targ
F2 275 1600
F2 700 1600
B2 0 90
B2 115 90
B2 175 (270 * w + 90 * l)
B2 225 (270 * w + 90 * l)
B2 275 90
B2 700 90
F1 0 180
F1 75 180
F1 85 200
F1 225 700
F1 345 780
F1 700 780
B1 0 250
B1 75 250
B1 95 140
B1 175 140
B1 225 80
B1 700 80
Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
Hz; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable st_targ, the
stop target F2, is equal to 800 Hz * g + 1400 Hz * d; gl_targ, the glide target F2, is equal
to 675 Hz * w + 900 Hz * l.
CHAPTER 5
CONCLUSIONS
It is well established that speech perception is shaped by the perceiver's language. This
finding cuts across every level of language organization: phoneme inventory (e.g.,
Miyawaki, Strange, Verbrugge, Liberman, Jenkins, & Fujimura, 1975), phonotactics (e.g.,
Brown & Hildum, 1956), the lexicon (e.g., Ganong, 1980), and beyond.
The preceding chapters have offered an explicit theory of how the mechanisms of
speech perception can use grammatical knowledge of the phonology of the stimulus
language to arrive at a phonological parse of the input by choosing from a set of candidate
parses consistent with the acoustic signal. I have argued that such a theory is necessary if the full range of phonotactic effects on perception is to be accounted for.
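The parse-selection idea described in this paragraph can be illustrated with a minimal toy evaluator. This is an illustration only, not the dissertation's implementation: the candidate strings, constraint definitions, and function names are invented for the example, and strict domination is modeled as lexicographic comparison of violation counts.

```python
# Minimal sketch of perception as OT-style candidate selection: each candidate
# parse consistent with the signal is scored by a ranked list of constraints,
# and the most harmonic candidate (fewest violations, highest-ranked first) wins.

def most_harmonic(candidates, constraints):
    """candidates: parse strings; constraints: violation-counting functions,
    highest-ranked first. Lexicographic comparison of the violation profiles
    implements strict domination."""
    return min(candidates, key=lambda c: [con(c) for con in constraints])

# Toy example: a markedness constraint against [tl] onsets dominating
# faithfulness to an ambiguous, /t/-like first segment.
no_tl = lambda parse: 1 if parse.startswith("tl") else 0
faith_t = lambda parse: 0 if parse.startswith("t") else 1

print(most_harmonic(["tli", "kli"], [no_tl, faith_t]))  # "kli"
```

Reranking the two constraints reverses the outcome, which is the sense in which the grammar, rather than the signal alone, decides among candidate parses.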
Three principal objections were raised against the statistical theories: their inability
to distinguish between relevant and irrelevant context, their lack of sufficiently rich
phonological structure, and their inability to recognize natural classes. At the root of all
three is precisely the feature that makes statistical theories so attractive: the minimal
nature of their representations.
The statistical theories, including TRACE and MERGE TP, can be characterized as
"unit models" because they attribute perceptual preference for, e.g., [tr] over [tl] to the
listener’s differing experience of the specific phonological units [tr] and [tl]: One is an
attested onset and the other is not (Halle et al. 1998, Pitt 1998), one is common and the
other is rare (Massaro & Cohen 1983, Pitt & McQueen 1998), one is supported by many
lexical items which contain it and the other is not (McClelland & Elman 1986). The a
priori plausibility of unit models comes from the pervasiveness of unit-frequency and
lexicality effects in language (e.g., Vitevich & Luce 1999, Jusczyk, Luce, & Charles-Luce
1994, Frisch, Large, & Pisoni 2000, Hay, Pierrehumbert, & Beckman in press; Ganong
1980; Samuel 1981, Fox 1984), combined with the minimal nature of the representations
they posit - phonemes and words, both of which are needed in any theory. The weakness
of unit models is not conceptual but empirical: The phonological knowledge used in speech
processing is more complex than can be accommodated in such a simple architecture. The
experiments of Chapter 4 were designed to exploit this weakness, in order to argue that a
grammatical component is needed as well. A phonological pattern typically applies to a
natural class in a structurally defined environment (e.g., final devoicing applying to all
obstruents at the end of all syllables). Different rules have different environments.
However, the unit models afford only one environment - the word for TRACE, the
fixed-length phoneme string for MERGE TP - and are forced to detect within it
dependencies that have nothing to do with the actual phonological pattern. Experiments
1-6 showed that in fact, when probabilities are equated, phonologically relevant variation
has a much stronger effect on perception than phonologically irrelevant variation.
This was particularly clear in the case of Experiment 6, where the magnitude of a
word-superiority effect was compared directly with that of a stratal phonotactic effect and
found to be much smaller. In order to account for the phonotactic effect (which reflected a
dependency between the ambiguous phoneme and one three phonemes previous to it), a unit
model would have to use such a large environment that it would also have to represent,
equally strongly, the dependency between the first three segments of any word and the
fourth.
Lack of sufficiently rich phonological structure. Moreover, since the unit models do
not represent syllabification, they could not predict the effects of syllable structure found by
Pitt (1998) and in Experiment 5. TRACE, whose phoneme-decision process considers each
phoneme unit in isolation, cannot represent the phonotactic dependency between two
segments at all. Neither can MERGE TP, which only represents statistical dependencies
between one specific phoneme sequence and another. For this theory, [ki] and [sa] are
not two instances of the pattern "CV syllable", but two unrelated phoneme strings. This
renders the theory unable to recognize
natural classes. Experiments 2-4 indicated that English listeners' experience of the common
onsets [br pr] legitimizes the rare or nonexistent [labial][labial] onsets [bw pw], but
MERGE TP cannot make the connection. (TRACE's featural level could in principle allow
it to capture this generalization, if there were a way of representing syllable structure.)
The inability to relate one phoneme string to another exacerbates the problems of
irrelevant context and impoverished structure: generalization across related strings would
allow irrelevant factors to be averaged away and lead to the induction of more structured
representations.
The OT grammatical theory performed well in all of these tests, predicting shifts
when there should have been shifts and no shifts when there should not have been any. The
good performance was not due to the specific choice of Optimality Theory - a similar
theory could in principle have been constructed around any descriptively adequate grammar
- but to the fact that grammatical theory more accurately describes the categories and
processes of language. TRACE and MERGE TP both propose, in essence, that the
representations and rules active in on-line speech perception are very different from those
inferred from typological study of the structure of human languages. Any attempt to
elaborate the architecture of either theory to capture more sophisticated linguistic concepts
(e.g., by adding a layer of syllable units to TRACE) will amount to building grammar into
them. Since their chief conceptual appeal is their promise to explain apparent grammatical effects without recourse to a grammar, such elaboration would undermine their motivation.
There remains some evidence for transitional-probability effects on ambiguous-
phoneme perception which are not captured in the OT grammatical theory - the findings of
Pitt (1998) on English onset clusters - and the effects of lexicality are thoroughly
documented (Ganong 1980; Samuel 1981, Fox 1984, Elman & McClelland 1988, etc.).
Given the pervasive nature of frequency and practice effects in all cognitive domains, and
their narrow stimulus-specificity (e.g., Klapp et al. 1991, Logan 1988a, 1988b), there can be
no doubt that unit-based processes play a prominent role in perception. However, the
evidence of this study suggests that they are considerably weaker than the structural,
grammatical ones.
A way of integrating the two kinds of process is suggested by two proposals of
Boersma (1998) and Boersma and Hayes (2001): the continuous ranking scale and stochastic
evaluation. In standard Optimality Theory, each constraint stands in a fixed
relation to all other constraints, determined by its place in the hierarchy: C1 dominates C2,
or is dominated by C2, or is ranked in the same stratum as C2, with no other possibility. In
Boersma and Hayes's proposal, each constraint is associated with a range of positions on
the real line, which may overlap with the ranges of other constraints. When the grammar is
given an input to evaluate, the position of each constraint is fixed probabilistically at some
point in its range, yielding a standard ranking. If the range of C1 is centered above that of
C2, but overlaps it, then most of the time C1 will be fixed above C2, but sometimes C1 will
be low in its range, C2 will be high in its, and the result will be that C2 dominates C1. The
grammar therefore does not always give the same output for a given input; different
constraints are active from one use of the grammar to the next. The further apart the centers
of the C1 and C2 ranges are, the likelier C1 is to dominate C2 on any given use of the
grammar; hence, the likelier the corresponding grammatical process is to occur. The overlap
between the C1 and C2 ranges depends on the frequency with which the C1 » C2 and the
C2 » C1 rankings have been reinforced during learning.
In perception, this could cause phonotactic prohibitions to appear and disappear from
trial to trial: a configuration rejected when the markedness constraint M dominates the
faithfulness constraint F (and is therefore active) would be accepted when the reverse is
true. Averaged over a large number of trials, the listener's dispreference for the
configuration would depend on the amount of overlap between the M and F ranges. In this
way, stronger and weaker phonotactic bans would correspond to smaller and greater degrees
of overlap.
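The stochastic-evaluation mechanism just described lends itself to a short simulation. The sketch below is an illustration, not Boersma and Hayes's algorithm as such: the center values, noise standard deviation, and function name are arbitrary assumptions, with an independent Gaussian perturbation drawn for each constraint on every evaluation.

```python
# Sketch of stochastic evaluation: each constraint's ranking value is perturbed
# by Gaussian noise on every use of the grammar, so a markedness constraint M
# sometimes outranks faithfulness F and sometimes does not. The probability
# that M dominates F grows with the distance between the two range centers.
import random

def p_m_dominates_f(center_m, center_f, sd=2.0, trials=20000, seed=1):
    """Fraction of evaluations on which M's perturbed value exceeds F's."""
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(trials)
        if rng.gauss(center_m, sd) > rng.gauss(center_f, sd)
    )
    return wins / trials

# Widely separated centers: M nearly always active; coincident centers: the
# prohibition appears on about half the trials.
print(p_m_dominates_f(110.0, 100.0))
print(p_m_dominates_f(100.0, 100.0))
```

Averaged over trials, the simulated dispreference tracks the overlap between the M and F ranges, which is the sense in which stronger and weaker phonotactic bans correspond to smaller and greater overlap.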
Several predictions of this view remain to be tested. Listeners' segmentation of
continuous speech is demonstrably sensitive to the grammar of phonotactics (Norris et al.
1997, Kirk 2001). This can be seen as selection of a grammatically more harmonic prosodic
parse over a less harmonic one. Effects of faithfulness should be apparent in word
recognition and similarity judgment: a nonword which is unfaithful to a word only on a
low-ranked constraint should activate the word more strongly than a nonword which is
unfaithful to it on a high-ranked constraint. Such studies offer a test, not merely of the
influence of grammar, but of the specific conception of it put forward by Optimality
Theory and its stochastic extensions.
BIBLIOGRAPHY
Archangeli, D., & Pulleyblank, D. (1994). Grounded phonology. Cambridge, MA: MIT Press.

Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology 38:423-466.

Baayen, R., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (CD-ROM). Philadelphia: Linguistic Data Consortium.

Bailey, P. J., Summerfield, Q., & Dorman, M. (1977). On the identification of sine-wave analogues of certain speech sounds. Haskins Laboratories Status Report on Speech Research 51/52:1-25.

Boersma, P. (1998). Functional phonology: Formalizing the interactions between articulatory and perceptual drives. Ph.D. dissertation, University of Amsterdam.

Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32:45-86.

Brown, C., & Matthews, J. (2001). When intake exceeds input: Language-specific perceptual illusions induced by L1 prosodic constraints. Proceedings of the Third International Symposium on Bilingualism, Bristol, U.K., April 18-20, 2001.

Brown, R. W., & Hildum, D. C. (1956). Expectancy and the perception of syllables. Language 32:411-419.

Burnage, G. (1995). The CELEX lexical database. Release 2. Centre for Lexical Information; Max Planck Institute for Psycholinguistics, The Netherlands.

Cherry, E. C. (1953). Some experiments on the recognition of speech with one and two ears. Journal of the Acoustical Society of America 23:975-979.

Chomsky, N., & Halle, M. (1968). The sound pattern of English. Cambridge, MA: MIT Press.
Clements, G. N., & Keyser, S. J. (1983). CV phonology. Cambridge, MA: MIT Press.

Connine, C. M., & Clifton, C. (1987). Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance 13(2):291-299.

Connine, C. M., Titone, D., & Wang, J. (1993). Auditory word recognition: Extrinsic and intrinsic effects of word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition 19:81-94.

Crowther, C. S., & Mann, V. A. (1994). Use of vocalic cues to consonant voicing and native language background: The influence of experimental design. Perception and Psychophysics 55(5):513-525.

Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14:113-121.

Cutler, A., Mehler, J., Norris, D., & Segui, J. (1986). The syllable's differing role in the segmentation of French and English. Journal of Memory and Language 25:385-400.

Cutler, A., Mehler, J., Norris, D., & Segui, J. (1987). Phoneme identification and the lexicon. Cognitive Psychology 19:141-177.

Delattre, P., & Freeman, D. C. (1968). A dialect study of American R's by X-ray motion picture. Linguistics 44:29-68.

Dell, G. S., & Newman, J. E. (1980). Detecting phonemes in fluent speech. Journal of Verbal Learning and Verbal Behavior 19:608-623.

Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, and Cognition 26(6):1355-1367.

DeLorme Publishing Company (1998). Iowa atlas and gazetteer. Yarmouth, Maine: DeLorme.

Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J. (1999). Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance 25(6):1568-1578.

Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37(1):36-48.

Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
Foss, D. J. (1969). Decision processes during sentence comprehension: Effects of lexical item difficulty and position upon decision times. Journal of Verbal Learning and Verbal Behavior 8:457-462.

Foss, D. J., Harwood, D. A., & Blank, M. A. (1980). Deciphering decoding decisions: Data and devices. In R. A. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ: Erlbaum.

Frauenfelder, U. H., Segui, J., & Dijkstra, T. (1990). Lexical effects in phonemic processing: Facilitory or inhibitory? Journal of Experimental Psychology: Human Perception and Performance 16(1):77-91.

Frisch, S. A., Large, N. R., & Pisoni, D. B. (2000). Perception of wordlikeness: Effects of segment probability and length on the processing of nonwords. Journal of Memory and Language 42:481-496.

Frisch, S., Broe, M., & Pierrehumbert, J. (1995). The role of similarity in phonology: Explaining OCP-Place. Proceedings of the 13th International Conference of the Phonetic Sciences 3:544-547.

Fukazawa, H., Kitahara, M., & Ota, M. (1998). Lexical stratification and ranking invariance in constraint-based grammars. In M. C. Gruber, D. Higgins, K. S. Olson, & T. Wysocki (Eds.), Proceedings of the Chicago Linguistics Society 34-2: The Panels (pp. 47-62). Chicago: Chicago Linguistics Society.

Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance 6(1):110-125.

Garnes, S., & Bond, Z. S. (1975). Slips of the ear: Errors in perception of casual speech. Papers from the 11th Regional Meeting of the Chicago Linguistics Society, pp. 214-225. Chicago, Illinois: Chicago Linguistic Society.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Greenberg, J. H., & Jenkins, J. J. (1964). Studies in the psychological correlates of the sound system of American English. Word 20:157-177.

Guenter, J. (2000). What is /l/? Proceedings of the 26th Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley, February 18-21.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics 47:757-762.

Hall, T. A. (1997). The phonology of coronals. Amsterdam Studies in the Theory and History of Linguistic Science, Series IV: Current Issues in Linguistic Theory, Vol. 149. Amsterdam: Benjamins.

Halle, P. A., Segui, J., Frauenfelder, U., & Meunier, C. (1998). Processing of illegal consonant clusters: A case of perceptual assimilation? Journal of Experimental Psychology: Human Perception and Performance 24(2):592-608.
Hay, J., Pierrehumbert, J., & Beckman, M. (in press). Speech perception, well-formedness, and the statistics of the lexicon. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in laboratory phonology VI. Cambridge, U.K.: Cambridge University Press.

Hsieh, H.-I. (1976). On the unreality of some phonological rules. Lingua 38:1-19.

Ito, J., & Mester, R. A. (1995). The core-periphery structure of the lexicon and constraints on reranking. In J. Beckman, S. Urbanczyk, & L. Walsh (Eds.), University of Massachusetts occasional papers in linguistics [UMOP] 18: Papers in Optimality Theory (pp. 181-209). Amherst: GLSA.

Jaeger, J., Lockwood, A., Kemmerer, D., Van Valin, R., & Khalak, H. (1996). A positron-emission-tomographic study of regular and irregular verb morphology in English. Language 72(3):451-497.

Jakobson, R., Fant, G. M., & Halle, M. (1952). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT Press.

Jones, D. (1997). English pronouncing dictionary, 15th ed. (P. Roach & J. Hartman, eds.). Cambridge, UK: Cambridge University Press.

Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language 33:630-645.

Klapp, S. T., Boches, C. A., Trabert, M. L., & Logan, G. D. (1991). Automatizing alphabet arithmetic: II. Are there practice effects after automaticity is achieved? Journal of Experimental Psychology: Learning, Memory, and Cognition 17(2):196-209.

Kluender, K. R., & Lotto, A. J. (1994). Effects of first formant onset frequency on [-voice] judgments result from general auditory processes not specific to humans. Journal of the Acoustical Society of America 95(2):1044-1052.

Lahiri, A., & Marslen-Wilson, W. (1991). The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition 38:245-294.
Lamontagne, G. (1993). Syllabification and consonant cooccurrence conditions. Ph.D. dissertation, University of Massachusetts, Amherst.

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination of speech sounds within and across phonemic boundaries. Journal of Experimental Psychology 53:358-368.

Lindsay, P. H., & Norman, D. A. (1977). Human information processing. New York: Academic Press.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19:1-36.

Macmillan, N. A., & Creelman, C. D. (1991). Signal detection theory: A user's guide. Cambridge, UK: Cambridge University Press.

Mann, V. A. (1986). Distinguishing universal and language-dependent levels of speech perception: Evidence from Japanese listeners' perception of English "l" and "r". Cognition 24(3):169-196.

Mann, V. A., & Repp, B. (1981). Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America 69(2):548-558.

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology 10:29-63.

Massaro, D. W., & Cohen, M. (1983). Phonological context in speech perception. Perception and Psychophysics 34:338-348.
McQueen, J. M. (1991). The influence of the lexicon on phonetic categorization: Stimulus quality in word-final ambiguity. Journal of Experimental Psychology: Human Perception and Performance 17:433-443.

McQueen, J. M., Norris, D., & Cutler, A. (1994). Competition in spoken word recognition: Spotting words in other words. Journal of Experimental Psychology: Learning, Memory, and Cognition 20:621-638.

McQueen, J. M., Norris, D., & Cutler, A. (1999). Lexical influence in phonetic decision making: Evidence from subcategorical mismatches. Journal of Experimental Psychology: Human Perception and Performance 25(5):1363-1389.

Mehler, J., Dommergues, J. Y., Frauenfelder, U., & Segui, J. (1981). The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior 20:298-305.

Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A. M., Jenkins, J. J., & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics 18:331-340.

Moreton, E., & Amano, S. (1999). Phonotactics in the perception of Japanese vowel length: Evidence for long-distance dependencies. Paper presented at Eurospeech 1999, Budapest.

Moreton, E., Amano, S., & Kondo, T. (1998). Statistical phonotactics of Japanese: Transitional probabilities within the word. Transactions of the Technical Committee on Psychological Acoustics, Acoustical Society of Japan, H-98-120.

Morton, J., & Long, J. (1976). Effect of word transitional probability on phoneme identification. Journal of Verbal Learning and Verbal Behavior 15:43-51.
Nazzi, T., Jusczyk, P. W., & Johnson, E. K. (2000). Language discrimination by English-learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language 43:1-19.

Nearey, T. M., & Assmann, P. F. (1986). Modeling the role of inherent spectral change in vowel identification. Journal of the Acoustical Society of America 80:1297-1308.

Newman, R. S., Sawusch, J. R., & Luce, P. A. (1997). Lexical neighborhood effects in phonetic processing. Journal of Experimental Psychology: Human Perception and Performance 23:873-889.

Norris, D., McQueen, J. M., & Cutler, A. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology 34(3):191-243.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23(3):299-325.

Pitt, M. A., & Samuel, A. G. (1993). An empirical and meta-analytic evaluation of the phoneme identification task. Journal of Experimental Psychology: Human Perception and Performance 19(4):699-725.

Polivanov, E. (1931). La perception des sons d'une langue étrangère. Travaux du Cercle linguistique de Prague 4:79-86.

Repp, B. H. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin 92(1):81-110.

Rubin, P., Turvey, M. T., & van Gelder, P. (1976). Initial phonemes are detected faster in spoken words than in spoken nonwords. Perception and Psychophysics 19:394-398.

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language 35(4):606-621.
Samuel, A. G. (1987). Lexical uniqueness effects on phonemic restoration. Journal of Memory and Language 26:36-56.

Segui, J., & Frauenfelder, U. (1986). The effect of lexical constraints upon speech recognition. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive capabilities (pp. 795-808). Amsterdam: Elsevier.

Selkirk, E. O. (1988). Dependency, place, and the notion "tier". Ms., Department of Linguistics, University of Massachusetts, Amherst.

Smith, R. C., & Dixon, T. P. (1971). Frequency and the judged familiarity of meaningful words. Journal of Experimental Psychology 88(2):279-281.

Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics 21:291-311.

Stockmal, V., Moates, D. R., & Bond, Z. S. (2000). Same talker, different language. Applied Psycholinguistics 21:383-393.

Suzuki, K., Maye, J., & Ohno, K. (2000). On the productivity of lexical stratification in Japanese. Paper presented at the annual meeting of the Linguistic Society of America, Chicago.

Thorndike, E. L., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teachers College, Columbia University.

Treiman, R., Kessler, B., Knewasser, S., Tincoff, R., & Bowman, M. (1996 [2000]). English speakers' sensitivity to phonotactic patterns. In M. Broe & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition and the lexicon (pp. 269-283). Cambridge, UK: Cambridge University Press.

Treiman, R., Gross, J., & Cwikiel-Glavin, A. (1992). The syllabification of /s/ clusters in English. Journal of Phonetics 20:383-402.

Tyler, L. K., & Wessels, J. (1985). Is gating an on-line task? Evidence from naming latency data. Perception and Psychophysics 38(3):217-222.

Vitevich, M. S., & Luce, P. A. (1998). When words compete: Levels of processing in spoken word recognition. Psychological Science 9:325-329.

Vitevich, M. S., Luce, P. A., Charles-Luce, J., & Kemmerer, D. (1997). Phonotactics and syllable stress: Implications for the processing of spoken nonsense words. Language and Speech 40(1):47-62.

Wall, L., Christiansen, T., & Schwartz, R. L. (1996). Programming Perl, 2nd ed. Cambridge, MA: O'Reilly.

Wurm, L. H., & Samuel, A. G. (1997). Lexical inhibition and attentional allocation during speech perception: Evidence from phoneme monitoring. Journal of Memory and Language 36:165-187.