Moreton, Alfred Elliott Phonological Grammar in Speech

INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, som e thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.
The quality of this reproduction is dependent upon the quality of the

copy submitted. Broken or indistinct print, colored or poor quality illustrations
and photographs, print bleedthrough, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by

sectioning the original, beginning at the upper left-hand comer and continuing
from left to right in equal sections with small overlaps.
Photographs included in the original manuscript have been reproduced

xerographically in this copy. Higher quality 6” x 9" black and white
photographic prints are available for any photographs or illustrations appearing
in this copy for an additional charge. Contact UMI directly to order.
ProQuest Information and Learning

300 North Zeeb Road. Ann Arbor, Ml 48106-1346 USA
800-521-0600
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION
A Dissertation Presented
by
ALFRED ELLIOTT MORETON
Submitted to the Graduate School of the

University of Massachusetts, Amherst, in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
May 2002
Department of Linguistics
UMI Number: 3056262
Copyright 2002 by
Moreton, Alfred Elliott
All rights reserved.
___ ®
UMI
UMI Microform 3056262
Copyright 2002 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company

300 North Zeeb Road
P.O. Box 1346
Ann Arbor, Ml 48106-1346
© Copyright by Elliott Moreton 2002
All rights reserved
A Dissertation Presented
by
ALFRED ELLIOTT MORETON
Approved as to style and content by:
John
Lyn Frazier, Member
larles E. Clifton, Member
Elisabeth O. Selkirk, Department Head

Department of Linguistics
ACKNOWLEDGEMENTS
I had a lot of help with this. John Kingston trained me up from nothing and saw
me through the whole thesis with more patience than I really deserved. He has always
been willing to discuss an idea, demonstrate a technique, debug a stimulus set, or go over
a draft. This, plus his eye for logical flaws and tenacious memory for obscure journal
articles, were indispensable to the writing of this thesis. So were the advice and
encouragement of the other two committee members, Lyn Frazier and Chuck Clifton,
who were my connection to the wider world of psycholinguistics. Shigeaki Amano
provided invaluable aid by inviting me to the NTT Basic Research Labs and supervising
my work there. John McCarthy was only tangentially involved in the present work, but
I'm going to thank him anyway because his classes and seminars are among the most
fascinating experiences I've ever had. Parts of Chapter 4 benefited from the comments of
two anonymous Cognition reviewers. Johns Hopkins University has sheltered me while I
completed my revisions. This research was paid for in part by the U.S. National Science
Foundation, the U.S. National Institutes of Health, and the Nippon Telephone and
Telegraph Company.
Kathy Adamczyk and Lynne Ballard rescued me from many a disaster with their
quick thinking and uncanny influence over the university administration.
iv
Dissertating students love company, and I was fortunate to have good company in
my classmates Isadora Cohen, Kiyomi Kusumoto, Junko Shimoyama, and Bernhard
Schwartz, my housemates Joe Eskinazi, Eva Juarros and Janina Rado, my labmate
Cecilia Kirk, and my just plain mates Andre Isaak, Caroline Jones, and especially
Jennifer Smith, a stalwart comrade-in-arms throughout our common dissertating time.
Special thanks are owed to Earl Gaddis, Virginia van Scoy, and the Northampton Group
of the Boston Branch of the Royal Scottish Country Dance Society for six years of
wonderful music, dancing, and comradeship.
Finally, I would like to thank my parents for their love and encouragement, and
for acting like this was all perfectly normal. This thesis is dedicated to them.
ABSTRACT
MAY 2002
ALFRED ELLIOTT MORETON, B. A., SWARTHMORE COLLEGE
Ph.D., UNIVERSITY OF MASSACHUSETTS, AMHERST
Directed by: Professor John Kingston
This dissertation investigates the ways in which speech perception is guided by
the expectation that the stimulus is an utterance in the perceiver's language, with a
particular focus on how phonotactics affects the interpretation of acoustically ambiguous
segments. A model is proposed in which phonological grammar, expressed here as a
system of ranked and violable constraints within the framework of phonological
Optimality Theory, is used to select among competing candidate parses of the acoustic
input. This grammar-based theory is contrasted with two grammarless alternative
accounts of perception: the connectionist network TRACE, which derives phonotactic
perceptual effects from the lexicon, and a statistical theory based on transitional
probabilities.
Experimental evidence is presented to show (1) that English listeners'judgments
of vowels and of consonant clusters disfavor configurations which are grammatically
illegal in the language, (2) that the dispreference for illegal configurations is far stronger
vi
than that for configurations which are legal but have zero frequency, and (3) that it is due
to a response dependency, rather than to auditory or other stimulus factors, and cannot be
explained by foreign-language exposure. Two experiments with Japanese listeners find
that (1) the lexical stratum membership of nonsense words can produce a phonotactic
perceptual effect, (2) that the triggering and target segments can be up to three segments
distant, and (3) that the stratum-phonotactic effect is larger than a word-superiority effect
obtained with the same listeners and paradigm.
These results are shown to be consistent with the grammar-based model, but
inconsistent with the two grammarless alternatives. Analysis of the three models reveals
that the shortcomings of the alternatives is due to their inability to abstract over phoneme
classes and larger linguistic structures. It is concluded that the mechanisms of speech
perception have access to a full-fledged phonological competence.
vii
Reproduced with permission o f the copyright owner. Further reproduction prohibited without permission.
TABLE OF CONTENTS
Page
ACKNOWLEDGEMENTS..................................................................................................... iv
ABSTRACT..............................................................................................................................vi
LIST OF TABLES..................................................................................................................xiv
LIST OF FIGURES................................................................................................................ xix
CHAPTER
1. INTRODUCTION................................................................................................................. 1
2. PHONOLOGICAL PRELIMINARIES........................................................................... 11
2.1. Introduction................................................................................................................ 11
2.2. Inventory and phonotactics in Optimality Theory.................................................11
2.3. Inventory and phonotactics of English syllable onsets............................................14
2.3.1. Explicanda........................................................................................................ 15
2.3.2. Analysis.............................................................................................................18
2.3.2.1. Representations...................................................................................... 18
2.3.2.1.1. Consonant features........................................................................21

2.3.2.1.2. Features of [j w 1].........................................................................24
2.3.2.2. CV syllables........................................................................................... 28
2.3.2.3.1. Undominated faithfulness constraints......................................... 30

2.3.2.3.2. Coronal stop places: Dental, retroflex, alveolar, and palato-
alveolar................................................................................................30
2.3.2.3.3. The persistence of [0 ] .................................................................. 34
2.3.2.3.4. Dorsal places of articulation:Palatals and velars....................... 35
2.3.2.3.5. Labial places of articulation:Bilabials and labiodentals 37
viii
2.3.2.3.6. The stop-affricate-fricative series................................................37
2.3.2.3.7. Constraint lattice...........................................................................41
2.3.2.3. *[sI]......................................................................................................... 42
2.3.2.4. *[tl].......................................................................................................... 46
2.3.2.5. ??[pw]......................................................................................................48
2.4. Summary................................................................................................................... 51
3. THEORIES OF PHONOTACTIC EFFECTS IN SPEECH PERCEPTION............... 53
3.1. Introduction................................................................................................................53
3.2. TRACE (McClelland & Elman 1986).....................................................................53
3.2.1. How TRACE works........................................................................................ 53

3.2.2. Lexical effects on phoneme perception......................................................... 56
3.2.3. Phonotactic effects on phoneme perception..................................................59
3.2.4. Empirical shortcomings of TRACE...............................................................60
3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)............... 61
3.3.1. Simulation: Success of statistical predictions.............................................. 63

3.3.2. Probabilistic theories of speech perception...................................................65
3.3.2.1. Context.................................................................................................... 66
3.3.2.2. Database.................................................................................................. 69
3.3.2.3. Decision ru le.......................................................................................... 72
3.3.3. Statistical context effects on phoneme perception........................................76
3.4. A grammar-based account....................................................................................... 89
3.4.1. Choice of grammatical theory........................................................................90
3.4.1.1. Grammatical framework........................................................................90

3.4.1.2. Particular grammar................................................................................ 92
3.4.2. Decision mechanism.......................................................................................93
ix
3.5. Summary..................................................................................................................100
3.6. Appendix: Computing frequencies...................................................................... 102
4. EMPIRICAL TESTS.........................................................................................................108
4.1. Introduction...............................................................................................................108
4.2. Experiment I: Sequence frequency and the phonotactics of word-final lax
vowels................................................................................................................. 110
4.2.1. Rationale.........................................................................................................110
4.2.2. Design............................................................................................................. 113
4.2.3. Predictions..................................................................................................... 119
4.2.3.1. TRACE simulation................................................................................119
4.2.3.1.1. Calibration and replication of the original TRACE results.... 119

4.2.3.1.2.Simulation of the present experiment.......................................... 123
4.2.3.2. MERGE T P ...........................................................................................131
4.2.3.21. INC-1............................................................................................ 132

4.2.3.2.2. SC-1............................................................................................. 134
4.2.3.3. OT grammatical theory......................................................................... 135
4.2.4. Methods............................................................................................................137
4.2.5. Results..............................................................................................................139
4.2.6. Discussion........................................................................................................ 143
4.3. Experiment 2: Sequence frequency and word-initial [pw] clusters.................... 143
4.3.1. Rationale...........................................................................................................143
4.3.2. Design...............................................................................................................144
4.3.3. Predictions....................................................................................................... 147
4.2.3.1. TRACE simulation.............................................................................. 147

4.2.3.2. MERGE T P ...........................................................................................150
4.2.3.2.I. INC-1........................................................................................... 150
4.2.3.2.2. SC -1............................................................................................153
4.2.3.3. OT grammatical theory.......................................................................154
4.3.4. Methods........................................................................................................ 155

4.3.5. Results...........................................................................................................156
4.3.6. Discussion..................................................................................................... 158
4.4. Experiment 3: Sequence frequency and the relative phonotactic badness

of [pw] and [tl] onsets........................................................................................ 159
4.4.1. Rationale......................................................................................................... 159

4.4.2. Design..............................................................................................................160
4.4.3. Predictions...................................................................................................... 163
4.4.4.1. TRACE simulation...............................................................................163
4.4.3.1.1. [_fkous]stimuli...........................................................................164
4.4.3.1.2. [_vnAm]stimuli...........................................................................168
4.4.3.1.3. Expected and actual TRACE predictions..................................170
4.4.4.2. MERGE T P ..........................................................................................171
4.2.3.2.1. IN C -1.......................................................................................... 171

4.4.3.2.2. SC -1............................................................................................ 173
4.4.3.3. OT grammatical theory....................................................................... 175
4.4.4. Methods........................................................................................................... 176

4.4.5. Results............................................................................................................. 176
4.4.5.1. [_ v n A tn ] stimuli.................................................................................... 177

4.4.5.2. [_fkous] stimuli.................................................................................... 181
4.4.6. Discussion....................................................................................................... 185
4.5. Experiment 4: Sequence frequency and the relative phonotactic badness

of [bw] and [dl] onsets: Interaction of response variables............................186
xi
4.5.1. Rationale..........................................................................................................186
4.5.2. Design.............................................................................................................. 187
4.5.3. Predictions...................................................................................................... 187
4.5.3.1. TRACE simulation............................................................................... 187

4.5.3.2. MERGE T P ........................................................................................... 188
4.5.3.2.1. IN C-1........................................................................................... 188

4.5.3.2.2. SC -1..............................................................................................190
4.5.3.3. OT grammatical theory.........................................................................191
4.5.4. Methods........................................................................................................... 193

4.5.5. Results and discussion....................................................................................198
4.5.6. Foreign-language exposure.......................................................................... 201
4.5.7. A note on TRACE..........................................................................................202
4.6. Experiment 5: Phonotactics and syllabification.................................................. 207
4.6.1. Rationale.........................................................................................................208
4.6.2. Design............................................................................................................. 208
4.6.3. Predictions..................................................................................................... 208
4.6.4. Methods..........................................................................................................208
4.6.5. Results............................................................................................................212
4.6.6. Discussion...................................................................................................... 215
4.7. Experiment 6: Phonotactics of the Japanese lexical strata................................. 220
4.7.1. Influence of remote context......................................................................... 220

4.7.2. Rationale: Thelexical strata of Japanese................................................... 221
4.7.3. Experiment 6a: Word-superiority effect......................................................223
4.7.3.1. Design....................................................................................................224
4.7.3.2. M ethods................................................................................................ 225
4.7.3.3. Results and discussion......................................................................... 226
4.7.4. Experiment 6b: Lexical stratum phonotactics........................................... 227
xii
4.7.4.1. Design................................................................................................... 228
4.7.4.2. Predictions........................................................................................... 230
4.7.4.2.1. TRACE........................................................................................ 230

4.7.4.2.2. MERGE T P ................................................................................ 230
4.7.4.2.3. OT grammatical theory.............................................................. 231
4.7.4.3. M ethods................................................................................................234
4.7.4.4. R esults.................................................................................................. 235
4.7.4.5. Discussion............................................................................................ 238
4.8. Summary..................................................................................................................239
4.9. Appendix: Synthesis parameters for the stimuli of Experiments 4 and 5 .........240
5. CONCLUSIONS...................................................................................................... 248
BIBLIOGRAPHY...........................................................................................................253
xiii
LIST OF TABLES
Table Page
2.2. C[i w 1] onsets of American English.................................................................... 16
2.3. Obstruent manner features.................................................................................... 21
2.4. Obstruent place features.........................................................................................22
2.5. Representation of consonants................................................................................ 24
2.6. Manner features for [j w ].......................................................................................25
2.7. Place features for [i w ] .......................................................................................... 26
2.8. Manner features for [1]........................................................................................... 27
2.9. Place features for [1]................................................................................................28
2.10. English surface obstruent inventory in CV syllables...........................................29
2.15. Repair of retroflexes to alveolars (Yule & Bumell 1886, American

Heritage Dictionary 2000)...................................................................................31
2.16. Repair of dentals to alveolars (American Heritage Dictionary 2000).............. 32
2.25. Number of languages with stops at given places in the sample of

Maddieson (1984:Table 2 .5 )............................................................................... 38
2.26. Frequency of the most common affricates in the sample of Maddieson

(1984:Table 2.8)....................................................................................................38
2.28. Labial, alveolar, and palatoalveolar series of American English..................... 39
2.35. English surface obstruent inventory in C[j]V syllables..................................... 43
2.39. English surface obstruent inventory in C[1]V syllables..................................... 46
xiv
3.2. Probability that a given diphone will be followed by a given segment
(extract from complete table)..............................................................................64
3.3. Results of the simulation: Success rate as a function of context size............... 65
3.5. Attested English phoneme sequences of lengths 2,3, and 4 .............................. 68
3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n= 1... 71
3.12. Triphone frequencies for sequences ending in [i]/[o] in the stimuli of

McClelland and Elman (1988)............................................................................82
3.13. Triphone frequencies for sequences ending in [s]/[f] in the stimuli of

McClelland and Elman (1988)............................................................................83
3.14. Triphone frequencies for the stimuli of McQueen and Pitt (1998)...................84
3.15. Cohorts at the appearance of the ambiguous fricative in the experiment of

McClelland and Elman (1988, Experiment 3)....................................................85
3.16. Cohorts at the appearance of the ambiguous fricative in the experiment

of Pitt and McQueen (1998, Experiment 3 ).......................................................86
3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998). 88
4.1. Distribution of tense and lax vowels in American English............................ 111
4.2. Change of lax to tense vowels when made final by truncation....................... 112
4.3. Phonotactics of stimuli for Experiment 1.........................................................115
4.4. Frequency of the syllables in stimuli for Experiment 1................................. 116
4.5. Effect of the [k]/[g] manipulation on the frequency of the word-final

syllables in the stimuli of Experiment 1.......................................................... 118
4.6. Parameter settings for the TRACE simulation (all experiments)....................120
xv
4.9. Featural parameters of the four original TRACE vowels (McClelland &
Elman 1988)....................................................................................................... 124
4.10. Featural parameters of the new vowels [I] and [X ]...........................................124
4.17. Diphone frequencies for the stimuli of Experiment 1....................................... 132
4.29. Mean % [I] response, all intermediate stimuli.................................................. 141
4.30. Differences in mean "I" response, pairwise by subject.....................................142
4.31. Phonotactics of the stimuli for Experiment 2 .................................................... 145
4.32. Frequency of the syllables in the stimuli for Experiment 2 ............................. 146
4.33. Word-initial occurrences of the critical syllables from Experiment 2 ............ 148
4.34. Results of the TRACE simulation of Experiment 2: Activation levels at

Cycle 7 5 ............................................................................................................... 149
4.35. Diphone frequencies for the stimuli in Experiment 2 ....................................... 151
4.40: Triphone frequencies for the stimuli of Experiment 2......................................153
4.46. Mean % [p] response, all intermediate stimuli.................................................. 157
4.47. Phonotactics of the stimuli for Experiment 3 .................................................... 160
4.48. Frequency statistics for the stimuli of Experiment 3 ........................................ 162
4.51. Words beginning with the critical onsets in the lexicon used for the
TRACE simulation of Experiment 3 ................................................................. 165

Cycle 7 5 ................................................................................................................166

Cycle 7 5 ................................................................................................................ 169
xvi
4.60. Diphone frequencies for the stimuli of Experiment 3......................................171
4.66. Triphone frequencies for the stimuli of Experiment 3 .................................... 174
4.72. Mean percent "p" response, all intermediate [...vnAm] stimuli....................... 180
4.73. Differences in mean "p" response, pairwise by subject, [...vnAm] stimuli... 180
4.76. Mean percent "p" response, all intermediate [...fkous] stimuli....................... 184
4.77. Differences in mean "p" response, pairwise by subject, [...fkous] stimuli,

[i] condition on ly 184 ■
4.80. Diphone frequencies for the stimuli of Experiment 4 ..................................... 189
4.82. Triphone frequencies for the stimuli of Experiment 4 .................................... 190
4.83. Frequency of occurrence of the clusters of Experiment 4 as onsets in

English.................................................................................................................192
4.89. Lexicon for TRACE simulation: Words with [b/d/g]+[w/l] onsets................204
4.94. Demographics and occurrences of [bw pw]-initial place names in Iowa

and Nebraska (U.S. Census Bureau, 1990; DeLorme 1998; 2000)................217
4.95. Errors in production of the initial stop in [bl pi tw] onsets by English-
leaming children in Iowa and Nebraska (Smit 1993)..................................... 218
4.96. The lexical strata of Japanese.............................................................................222
4.97. Stimulus words used in Experiment 6a............................................................. 224
4.98. Boundary difference in long- and short-biased contexts (in milliseconds),

Experiment 6a (N = 21 Ss).................................................................................227
4.99. Validity of cues to stratum membership: Number of nouns in database

belonging to each stratum containing the given cues..................................... 229
xvii
4.102. Stimuli for Experiment 6 b .................................................................................234
4.104. Constant synthesis parameters which were identical for the "b" and "d"
arrays of Experiments 4 and 5 ........................................................................... 240
4.105. Time-varying synthesis parameters common to the "b” and "d" arrays of
Experiments 4 and 5............................................................................................ 241
4.106. Synthesis parameters for the "b" array of Experiments 4 and 5 .............. 243
4.107. Synthesis parametersfor the "d" array of Experiments 4 and 5 ..................... 245
xviii
LIST OF FIGURES
Figure Page
2.1. Architecture of the OT phonological model........................................................... 12
3.1. The TRACE model of McClelland and Elman (1986).......................................... 54
3.11. Boundary shift in perceptual space....................................................................... 75
3.23. Architecture of an OT-grammatical-based parsing model.................................. 95
4.7. Results of the TRACE simulation replicating Figure 7 of McClelland and

Elman (1986)...................................................................................................... 121
4.8. Results of the TRACE simulation replicating Figure 8 of McClelland and

Elman (1988)...................................................................................................... 123
4 .1 1. Results of the TRACE simulation for the input [salgjXfl.......................... 127
4.12. Results of the TRACE simulation for the input [salleiX]-] ...........................128
4.13. Results of the TRACE simulation for the input [solgjX]............................129
4.14. Results of the TRACE simulation for the input [salluX]............................130
4.26. Schema for the filler stimuli of Experiment 1.....................................................138
4.27. Schema for the critical stimuli of Experiment....1...............................................138
4.28. Identification curves for the stimuli of Experiment 1, pooled across 14

listeners................................................................................................................140
4.43. Schema for the critical stimuli of Experiment 2...............................................155
4.44. Schema for the Filler stimuli of Experiment 2 .....................................................155
xix
4.45. Identification curves for the stimuli of Experiment 2, pooled across 7
listeners................................................................................................................157
4.70. Identification curves for the [...vnAmJ stimuli of Experiment 3, pooled

across 12 listeners, comparing the [_1] condition with the [_ j ] baseline 178
4.71. Identification curves for the [...vnAm] stimuli of Experiment 3, pooled

across 12 listeners, comparing the [_w] condition with the [_ j ] baseline.... 179
4.74. Identification curves for the [...fkoUs] stimuli of Experiment 3, pooled

across 12 listeners, comparing the [_1] condition with the [_ j ] baseline 182
4.75. Identification curves for the [...fkous] stimuli of Experiment 3, pooled

across 12 listeners, comparing the [_w] condition with the [_ j ] baseline .... 183
4.85. Synthesis parameters for the stimuli of Experiment 4 .......................................194
4.86. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on
the "g’V'd" judgm ent.......................................................................................... 199
4.87. Log odds ratios for the ”17"w" judgment in Experiment 4, contingent on
the "g’7"b" judgm ent.......................................................................................... 200
4.88. Total number of "bw" responses in Experiment 4 as a function of individual

listeners' exposure to languages containing [bw] or [pw] onsets (French,
Mandarin Chinese, or Spanish)........................................................................ 202
4.91. Synthesis parameters for the stimuli of Experiment 5 ...................................... 209
4.92. Log odds ratios for the 'T7"w" judgment in Experiment 4, contingent on the
»b"/"d» judgment, for the CCV stimuli.............................................................213
4.93. Log odds ratios for the 'T7"w" judgment in Experiment 4, contingent on the
"b'7"d" judgment, for the VCCV stimuli......................................................... 214
4.103. Boundary between [a] and [a:], averaged across 21 listeners....................... 236
xx
CHAPTER 1
INTRODUCTION
This dissertation investigates the role of phonological knowledge in speech
perception. It proposes a theory of performance which makes use of grammatical
competence - specifically, competence expressed in terms of the ranked and violable
constraints of phonological Optimality Theory - to weigh competing hypotheses about the
phonological structure of the speech signal. This theory is tested empirically against the
rival claims of two other models to explain the same phenomena: TRACE, which uses
lexical knowledge, and the MERGE transitional-probability theory, which uses segment-
string frequency.
The phonological phenomenon with which we are concerned here is phonotactic
grammaticality. Languages place tight restrictions on how their segmental inventories can
combine into larger units such a syllables, morphemes, or words, and speakers are sensitive
to these restrictions in a number of ways. The phonologically systematic patterns of
possible and impossible combinations are the phonotactics of the language. Phonotactics
causes redundancy, predictability in speech, which the mechanisms of speech perception
could in principle exploit.
Phonotactic effects turn up in many places. They appear as systematic gaps in the
distribution of sounds in a speech corpus (e.g., Harris 1951, Lamontagne 1993) - what in
Chapter 2 are called phonological gaps. Phonotactics can drive synchronic phonological
alternations, such as that between American English [t] and [rj, which are conditioned by the
neighboring segments (e.g., Prince & Smolensky 1993). A foreign word can undergo
sound changes when it is borrowed that adapt it to the phonotactics of the borrowing
language (ltd & Mester, 1994).
Native speakers share intuitions about the phonological grammaticality in their own
language of novel phoneme strings (Greenberg & Jenkins 1964, Scholes 1966). English
listeners can also accurately judge the relative frequency o f non-English consonant clusters
in the languages of the world (Pertz & Bever 1975). A language's phonotactic constraints
are respected by its speakers’ slips of the tongue (Fromkin 1971) and ear (Sapir 1933;
Brown & Hildum 1956; Halle, Segui, Frauenfelder, & Meunier 1998), and have been
shown to influence speakers' perceptions of phonetically ambiguous segments (Massaro &
Cohen 1983; Pitt 1998).
In sum, speakers tend to reject phonotactically illegal stimuli in production and
perception, requiring, for instance, stronger acoustic evidence to believe that they have heard
an illegal stimulus than a legal one. Moreover, illegality measured one way (e.g., off-line
intuitive-goodness judgments) tends to agree with illegality measured in other ways (e.g.,
ambiguous-segment perception). Some sort of language-specific knowledge is being
brought to bear on all o f these.
What is at issue is the nature of that knowledge, and of its interaction with the
mechanisms of language performance. We will be investigating three very different
proposals. Each will be examined chiefly in light of its account of the phonotactic effect on
ambiguous-phoneme perception, a question directly and explicitly addressed by all three: If
a stimulus contains a phoneme which is acoustically ambiguous between one which is legal
in that context and one which is illegal, listeners' reports are biased towards the legal
interpretation compared to their report of the same ambiguous phoneme presented in a
neutral context (Massaro & Cohen 1983).
The claim which I will advance, elaborate, and defend is the following:
( 1)
Speech input is parsed prelexically to a featural or phonemic surface
representation. When acoustic evidence in the incoming speech stream
supports more than one phonological parse, the competing parses are scored
with respect to the ranked active constraints of the speaker's grammar, and
the more harmonic candidate parse is processed first.
This is a way of allowing performance mechanisms to use linguistic competence, by
setting up a perceptual bias against parses which are disfavored by the grammar. It
therefore incorporates into the performance theory the traditional linguistic view that the
difference between a grammatical and an ungrammatical utterance hinges on whether the
utterance fulfills specific formal requirements - whether it meets the structural description
of a set of abstract grammatical rules. In this view, an ambiguous phoneme generates two
(or more) parses. If one is legal in context and the other is not, perception will favor the
legal parse. This is the principal theoretical contribution offered by this dissertation: An
account of how phonological grammar can be used in a parsing theory.
Quite different in vision are the two rival theories, TRACE (McClelland & Elman
1986) and the MERGE transitional-probability (TP) theory (Pitt & McQueen 1998). These
regard phonotactic illegality as a very concrete phenomenon, equivalent to non-occurrence in
the lexicon1. Illegality is the extreme low end of a frequency continuum, and its effects are
effects of frequency. Where these two theories differ is in how they implement frequency
effects.
TRACE is a connectionist model of word recognition, within which phonotactic
effects emerge as side effects of the word-recognition process. A stimulus will activate
phoneme units, which in turn can produce certain levels of activation in a word unit,
depending on the degree to which the stimulus resembles the word represented by that unit.
A stimulus containing a phoneme ambiguous between a legal and an illegal one will partially
activate some words containing the legal phoneme, but none containing the illegal one.
Activation spreading down from the word units to the phoneme units will increase the
activation of the unit corresponding to the legal phoneme, which will laterally inhibit the
illegal phoneme unit. The result is a perceptual bias towards the legal phoneme.
The MERGE TP theory is associated with a model of phonemic processing, the
MERGE model of Norris et al. (2000). In MERGE TP, low-level (pre-Iexical) perceptual
1 Transitional probabilities are assigned to a pre-Iexical m odule in MERGE, but the probabilities
themselves are computed over the lexicon.
mechanisms keep track of the frequencies with which different phoneme sequences occur.
An ambiguous segment and its surrounding context could be interpreted as either of two
sequences, but perception will tend to favor the more frequent possibility - that is, it will
choose the phoneme that, on the basis of past experience, is more likely in the given context.
Both of these models have demonstrated success in accounting for some of the core
phonotactic perceptual phenomena. However, I will argue that neither one is adequate, for
reasons crucially connected with their lack of access to abstract grammatical knowledge.
Both make predictions that are not borne out, and fail to predict phenomena that occur.
The grammatical approach to phonotactics treats these phonotactic effects as a
syndrome, with a single underlying cause, and identifies that cause with listeners' knowledge
of the sound pattern of their language in the form of phonotactic constraints against
particular combinations of sounds (e.g., Shibatani 1973). Speech tasks make the speaker or
listener assign a linguistic parse to the stimulus; parses which are nonexistent or highly
marked in the language will naturally be disfavored.
However, most of the "phonotactic" effects are open to another interpretation:
Perhaps speakers merely know that some sequences are common and others are rare or
nonexistent. The statistical rarity of particular phoneme sequences affects intuitive
"possible-word" judgments (Treiman et al. 1996) and ambiguous phoneme perception
(Newman et al. 1997; Pitt & McQueen 1998) in much the same way as phonotactic
illegality. Rarity also speeds "no" responses in lexical decision and slows same-different
judgments of nonwords (Vitevich & Luce 1998). What linguists have described as a
categorical contrast between possible and impossible sequences can instead be interpreted
as a frequency continuum, with the "impossible" sequences at the irreducible minimum of
zero frequency.
Statistical models differ in which statistics they use and how they use them. In the
perceptual model TRACE (McClelland & Elman 1986), the rarity of particular sound
sequences is encoded in knowledge of words, and statistical knowledge is retrieved by
querying the lexicon. One component of the MERGE model of perceptual decision-making
(Norris et al. 2000) keeps track of phoneme-to-phoneme transitional probabilities, which are
used without reference to the lexicon. The current version of the Neighborhood Activation
Model (Luce 1986; Luce & Pisoni 1988; Vitevich & Luce 1998, 1999) combines
knowledge of sublexical sequence frequencies with knowledge of lexical frequencies and
neighborhoods. A statistically-based model of the acceptability-judgment task using co
occurrence frequency has been put forward by Frisch, Broe, & Pierrehumbert (1995).
The theoretically most attractive aspect of statistical models is their account of how
speakers learn the phonotactics of their language - en passant, as a by-product of learning
its vocabulary.
On the other hand, they do not explain three phenomena that led people to posit
grammars in the first place.
1. If phonotactics is learned simply by learning or hearing words, speakers should
be able to acquire any language at all. The lexical and statistical mechanisms only
distinguish favored from disfavored sound patterns within a language, after the lexicon has
been learned or the statistical patterns have been analyzed. Yet English listeners can
accurately judge the relative frequency of non-English consonant clusters in the languages
of the world (Pertz & Bever 1975). The cross-linguistic commonness or rarity of different
classes of segments, sequences, or processes, is not addressed by statistical learning
theories, nor is the way in which the processes found in one language resemble those found
in others. Capturing these patterns is a central concern of grammatical models of language,
which have evolved a wide array of conceptual tools for this purpose; Articulatory
grounding of constraints (Archangeli & Pulleyblank 1994), implicational markedness
(Greenberg 1964), feature geometry (Clements 1985), natural classes (Chomsky & Halle
1968), and many more.
2. The alternations induced by phonotactics are categorical rather than gradient, and
systematic rather than arbitrary. For example, the phonotactics of Standard German forbid
word-final [b d g v z]; in that environment, they turn into [p t k f s], despite their differing
frequencies. The frequency difference between (common) word-final [t] and (zero-
frequency) word-final [a] is much greater than that between (uncommon) word-final [p] and
(zero-frequency) word-final [b], yet German speakers "repair" the illegal final voiced
obstruents to the same extent in both cases. And the repair is not to turn the illegal
obstruents into the most frequent legal obstruent, but into the corresponding legal
obstruent.
3. Phonological alternations occur even if the utterance consists of very rare
morphemes or even nonce forms, and exceptions to regular patterns are less likely to occur
as morpheme frequency decreases. These features suggest that the regularity is distinct
from the forms it applies to, rather than emergent from them.
The weakness of both non-grammatical theories is the superficiality of their
linguistic analysis, which prevents them from abstracting the empirically correct
generalizations about legal and illegal sequences.
TRACE and MERGE TP offer extremely simple theories of phonological
representation. Phonemes are unstructured lists of features, as in Jakobson et al. (1952).
The only phonological domain above the level of the phoneme which is recognized by
TRACE is the word, while MERGE TP also recognizes 2- or 3-phoneme sequences.
Neither represents the phonotactically crucial domain of the syllable, or any of its
constituents such as the onset and rime. Both are incapable of abstracting over features: All
patterns are represented at the level of the phoneme, sequence, or word frequencies. More
abstract properties which influence phonotactics, such as part of speech or lexical stratum,
are not encoded anywhere. The result is that the dependencies these theories represent do
not correspond to the ones which are linguistically and perceptually relevant.
These models cannot distinguish phonological gaps (sequences which are
systematically prohibited) from lexical gaps (sequences which are permitted, but missing
from the lexicon through historical accident). If illegality and frequency are the same thing,
then zero-frequency sequences should be equally illegal regardless of why they are illegal.
They cannot distinguish phonotactically relevant context from phonotactically
irrelevant context. For example: The nonword [tli] is illegal in English. The illegality of
the [1] in that context is due entirely to the context on its left - the word boundary and [t],
which create an illegal sequence of two coronal non-continuants in a syllable onset. The [i]
has nothing to do with the phonotactic unacceptability of the string; [tli] and [tla] are both
illegal. TRACE and MERGE TP are blind to this fact. Each applies a fixed "context" to
every phenomenon. The relevant context in TRACE is the entire nonword; that in MERGE
TP is the neighboring phonemes. Both theories therefore incorrectly overestimate the
perceptual influence of incidental context.
It is an empirical question whether listeners process phonological gaps differently
from lexical gaps. Evidence will be presented to show that they can: that phonological gaps
are stronger than lexical gaps, and that phonotactically relevant context is more influential
than phonotactically irrelevant context.
The organization of the dissertation is as follows:
Chapter 2 discusses the phonological background of the theory - the grammar
which it is proposed that performance mechanisms have access to. It first discusses the
Optimality-Theoretic approach to phonotactic grammar as a filter on the lexicon, going on to
review the distinction between lexical and phonological gaps, and between phonotactically
relevant and irrelevant context. Two particularly prominent phonotactic gaps in English
syllable onsets - [tl] and [s j ] - are shown to be phonological rather than lexical gaps, and
are analyzed as special cases of more general prohibitions.
Chapter 3 introduces the three theoretical contenders, TRACE, MERGE TP, and the
OT grammatical theory. The rationale for each is discussed and the existing empirical
evidence weighed, with each theory's account of it presented.
The precise workings of the MERGE TP theory have not yet been explained by its
adherents. Much of Chapter 3 is devoted to considering the different design parameters of
a theory of transitional probabilities in perception and choosing which possibilities to test.
The most important of these parameters is the specific nature of the phonological context:
How many segment positions are included, and how do the left and right contexts interact?
It will be shown that there is no choice of context that can account for the data cited by the
MERGE TP authors in support of the theory. If the context is chosen so as to cover any
one part of the data, the theory makes incorrect predictions about the rest. On the chance
that some of the contradictory data might be artifactual, two contexts are chosen for testing
as the most plausible and interesting.
New empirical evidence bearing on these theories is reported in Chapter 4. The
tactical focus is on the distinction between phonotactically relevant and irrelevant context,
and on that between phonological and lexical gaps.
Experiments 1-5 build on previous psycholinguistic research on the phonotactic
perceptual effect of English syllable structure.
Experiment 1 demonstrates an effect of phonotactically relevant context, but not of
irrelevant context, on perception of an [i]-[i] continuum by American English speakers,
exploiting the phonotactic illegality of word-final lax vowels. Experiment 2 attempts to
replicate Experiment 1 with initial [pw], considered phonotactically illegal by the TRACE
authors on statistical grounds (McClelland & Elman 1986), but merely "marginal" by
phonoiogists on the basis of intuition, distribution, and history (Hultzen 1965, Wooley
1970, Catford 1988, Hammond 1999). No effect is found, despite the strong statistical
biases against [pw]. Experiment 3 directly compares the bias against [pw] with that against
the much more illegal, but statistically very similar, [tl], and finds a much stronger bias
against the latter. Manipulations of phonotactically relevant context are found to have the
effect predicted by the grammar-based theory, while those of phonotactically irrelevant
context have no effect. These findings are argued to favor the grammar-based processing
theory over TRACE and MERGE TP.
Where previous work in this field, including Experiments 1-3, has used stimulus
units to measure the dependent and independent variables. Experiments 4 and 5 used a
technique which allows the effect of one response on another to be measured when judging
a CC cluster in which both C’s are ambiguous (Nearey 1990). This allows bias effects to be
disentangled from stimulus factors and hence measured with greater accuracy. In
Experiment 4, the bias against [bw] is compared with that against the much more illegal, but
statistically very similar, [dl]. A strong bias against [dl] is found, but none against [bw],
corroborating the findings of Experiments 2 and 3. Experiment 5 is a control experiment to
insure that the results of Experiments 2,3, and 4 were not caused by compensation for
coarticulation (Mann 1980).
Experiments 6 a and 6 b exploit the stratified nature of the Japanese lexicon, in which
each word belongs to one of four classes with its own syndrome of phonological,
morphological, and etymological properties. One of these strata, Sino-Japanese, forbids
final [a:], while another, Foreign, permits it. The [a]—[a:] boundary is measured in carrier
nonwords containing phonological cues to membership in one stratum or the other.
In Experiment 6 b, Sino-Japanese cues are found to bias perception against [a:] as
compared to Foreign cues - an effect which is expected and necessary in the grammar-
based processing model. The MERGE TP model cannot account for this effect directly,
since some of the phonotactically effective context is too far away from the ambiguous
segment for the model to capture the dependency. The results can only be accommodated
in that model through ad hoc revisions.
The phonotactic boundary shift is larger and more robust than a word-superiority
effect obtained with the same listeners and paradigm in the control Experiment 6 a. This is
unexpected under TRACE, which models phonotactic effects as word-superiority effects.
Chapter 5, finally, sums up the claims, arguments, and data presented in earlier
chapters, and situates them in the larger research context. Problems and opportunities for
the theory of grammar in speech perception are discussed, and areas of future research
delineated.
10
CHAPTER 2
PHONOLOGICAL PRELIMINARIES
2.1. Introduction
This chapter has two principal aims. The first is to discuss the Optimality Theory
view of surface phonotactics in general; the second is to present a specific OT analysis of
facts about English syllable onsets that will be used in later chapters. A distinction is drawn
between productive phonological gaps and nonproductive lexical gaps in the syllable
inventory. Two examples of phonological gaps ([tl] and [s j ]) and one example of a lexical
gap ([pw]) in the English syllable onset inventory are discussed, and the grammatical
groundwork laid for the perceptual studies of later chapters.
2.2. Inventory and phonotactics in Optimality Theory
No spoken language uses all of the segments known to linguistics; each is limited to
only a comparatively small inventory (Maddieson 1984, §1.2). Sounds in the inventory do
not combine at random to form larger units, but are restricted to a small phonotactically
permissible subset of the logically possible combinations.
The OT account of this is shown schematically in Figure 2.1 below. Underlying
representations, drawn from the lexicon, are inputs to the grammar. The output of the
grammar is the observable set of surface forms.
11
Figure 2.1. Architecture of the OT phonological model
LEXICON
/underlying
representation/
CANDIDATE OUTPUTS
[surface representation 1]
[surface representation 2] [surface
GRAMMAR
representation]
Under the principle of Richness o f the Base (Prince & Smolensky 1993, §9.3), the
lexicon and the grammar function as independent modules. AH they have in common is a
representational protocol: The output of the lexicon and the input to the grammar are made
of the same representational elements (features, etc.) put together in the same way. Aside
from this restriction, the lexicon can, in principle, emit any representation, and the grammar
has to deal with it.
Since the set of output candidates includes, at the very least, a fully faithful candidate
identical to the input (and is generally held to include all of the possible inputs), the
12
grammar acts as a filter: Some of the inputs from the lexicon result in outputs that are
identical to them: others do not. 1
When we observe that a particular segment (or larger configuration) [Y] is missing
from the surface representations of a language, there are therefore two possible accounts:
Either no underlying representation /XI can surface as [Y] because of the filtering action of
the grammar, or else there is in principle such an IXJ, but by historical accident no one
happens to have coined or borrowed a word containing it. In the first case we are dealing
with a phonological gap; in the latter, with a lexical one.
OT is hardly the first theory to make this distinction, and its practical test of what is
or is not grammatical remains the same as that of its predecessors: productivity. If native
speakers can readily accept and produce an unattested segment in different environments,
treating it phonologically and phonetically like a word of their language, then the gap is
accidental, and is modelled as a lexical gap. On the other hand, if speakers are consistently
unable to produce the segment without alteration and without great effort, then the gap is
phonological, and is modelled in OT as a filtering effect of the grammar.
The distinction between phonological and lexical gaps is similar to, but not quite the
same as, that between systematic and accidental gaps. A gap is systematic if it is part of a
pattern of gaps; it is accidental if it is isolated. When the aim is to describe the sound
pattern of a language with maximum compactness and elegance, it is usual to put the
systematic gaps in the grammar and leave the accidental ones out. Starting with the same
language, we can arrive at different grammars depending on which criterion we follow, since
the systematic gaps are not necessarily productive. (See, for example, the discussion of
English initial [$1] or [pw bw mw] in §3.1.) Because our psychological claim is
specifically about productive phonology, we will use the phonological/lexical criterion in
constructing our grammar.
1 The term "source/filter model" is an analogy with the source/filter model o f vocal-tract acoustics, in
which the larynx is a sound source whose output is filtered by the rest o f the vocal tract. W hatever the
larynx emits, the rest o f the vocal tract has to deal with it, and will produce som e output.
13
Ideally, an OT grammar for the phonological inventory of a language should make
the correct filtering predictions. That is, given the entire set of representationally possible
underlying representations, it should produce all and only the productive surface forms.
However, this clear theoretical distinction between legal and illegal is often difficult
to implement in practice. Algeo (1978), for instance, reviewed 16 "typical" studies of
English consonant clusters.
The sixteen collated studies list a total of 107 possible onset clusters, of
which there is agreement on only 30, considerably fewer than a third, leaving
77 onset clusters that are rejected by one or more studies.... The
discrepancy is even more striking for coda clusters. The same studies
explicitly list or imply well over 500 clusters that are theoretically possible in
syllable codas, of which there is agreement on only 19, fewer than 4 percent
(p. 208).
Many of the discrepancies are caused by methodological differences - in selection
of materials, in choice of transcription, in level of representation (surface or underlying), in
choice of phonological domain (syllable or word), and so on, but there is a certain
irreducible gradience, a lack of perfectly sharp demarcation between the "legal" and "illegal"
sets. It is agreed that [tl] is an illegal onset and [kw] a legal one, but there is no such
uniformity of judgment about [vl] or [pw] - they are felt to lie somewhere in between. The
problem of gradient illegality is a difficult one for Optimality Theory, and one which we will
return to in our discussion (below, §2.3.2.4) of the [pw bw mw] onsets.
2.3. Inventory and phonotactics of English syllable onsets
Most experiments on phonotactics have exploited the restrictions on place of
articulation in English syllable onsets (Massaro & Cohen 1983, Pitt 1998, Moreton 1999),
14
most often the bans on *[sj] and *[tl dl]. The place-of-articulation restrictions are a good
choice because they are strongly productive, because ambiguous stimuli are straightforward
to synthesize, and because the place features of the critical consonants can be manipulated
without changing other linguistic variables.2

For these reasons we will concentrate on a grammar of the C [j w 1] onsets of
American English3. We begin by constructing a source-filter account of the stops,
affricates, and fricatives found in the onset of a CV syllable, using typologically motivated
constraints. This is then extended to the onset of C[j w 1]V syllables to account for the
phonotactics of *[sj], *[tl], and ?[pw]. To anticipate: I will model the *[sj ] gap as a special
case of a general process spreading anteriority and the *[tl dl] gap as a special case of a
general ban on homorganic obstruent sequences. The ?[pw] gap, though systematic as a
special case of a general ban on homorganic consonant sequences, is not productive and will
be modelled as a lexical gap.
§ 2.3.1 lays out the data; § 2.3.2 gives the analysis. § 2.3.2.1 describes the feature
system used in this model, based on Hall's (1997) variant of the now-standard Sagey (1990)
articulator features. § 2.3.2.2. analyzes CV syllables, § 2.3.2.3. discusses *[sr], § 2.3.2.4.
discusses *[tl], § 2.3.2.5. discusses ?[pw], A summary is given in §3.3.
2.3.1. Explicanda
Their disagreements about other clusters notwithstanding, linguists are in fair

agreement on the general outlines of the American English C [j w 1] onsets. Table 2.1
shows the onsets which I will treat as the productive ones.
1 The other principal restriction is that onsets have to rise in sonority. In theory, manipulating (he
sonority o f either C in a CC onset cluster should affect perception o f the sonority o f the other C. However,
sonority is not a distinctive feature. Two segments which differ in sonority differ in many other
linguistically relevant ways as well, making the stimuli hard to construct and the results hard to interpret.
5 The C(j] onsets I do not discuss, because they have not been used experimentally.
15
Table 2.2. C [j w I] onsets of American English
Hultzen Woolley Catford Hammon
Onset Examples 1965 1970 1988 1999
pa hi prove; brew vv vv vv vv
pw bw pueblo; bwana V? ?? 7?
pi bl plant; blame vv vv vv vv
ti di tread; dread vv vv vv vv
tji dji
tw dw twine; dwindle vv vv vv V?
tl dl •• •• •• ••
ki gj crack; grid vv vv vv vv
kw gw quit; Gwen, guava V? vv vv V?
kl gl clean; gleam vv vv vv vv
& VJ free; Wronskian, vroom V? vv

fw vw •• •• •• ••
fl vl flea; Vladimir >/• V?
0j 5j threw v»
0w dw thwart V* ?•
61 dl
SJ Zi •• M M ••
sw zw sweet; Zwicker V? V* V*
16
Hultzen Woolley Catford Hammond
Onset Examples 1965 1970 1988 1999
si zl slot; zloty >/• V?
X* J-' shred; - V* V* V*
Jw JW Schwinn, ?• v. — ••
Schwarzenegger; -
J1 5l schlock; - ?• V* — ?•
author included it, but marked it as marginal. (%) means it was marked as normal for some
dialects. (•) means it was not included.
This list is intended to include all and only C[i w 1] onsets which can be produced
without alteration and without special effort by speakers of American English. Clusters that
are obviously non-native have been included as long as they occur in familiar, easily
pronounceable names (Schwarzkopf, Zwicker, Vladimir) or loan words (zloty, guava,
pueblo)4. I am not sure whether the unattested onsets [3 r 3 W 3 I] (italicized in Table 2.1)
are a lexical or phonological gap; given the rarity of initial [3 ] in English, it is dangerous to
infer anything from their absence alone. I will take them to be of the same grammaticality
as their voiced counterparts.
The transcription in Table 2.1 is a broad one. The finer phonetic details, which are
crucial to this analysis, will be discussed below.
4 Compare news broadcasters' fluent pronunciations of zloty, Norman Schwarzkopf, Vladimir Putin with
their awkward Chechnya and Srebrenica. The productivity o f the syllable-initial [nj] and [s j ] gaps for these
trained speakers is clearly audible.
17
2.3.2. Analysis
Two generalizations are immediately clear from Table 2.1. First, the C in the C[j w
I] cluster is itself never a [j w I], but is always something of lower sonority. This is a
special case of a general fact, true across languages, about syllable onsets - that sonority
rises over the course of the onset (Clements 1990).5 Second, there is no difference between
the behavior of unvoiced C and its voiced counterpart.
Since both of these issues are irrelevant to the question of place restrictions, we can
simplify our task by ignoring them. Henceforth we will only consider voiceless Cs, letting
them do double duty for their voiced counterparts, and ignore candidates with flat or falling
sonority (assumed to be ruled out by very high-ranked markedness constraints).
2 3.2.1. Representations
I adopt the representational system below. It is a slightly simplified version of the
system proposed by Hall (1997), which is in turn a modification of the Sagey (1990)
feature geometry based on active articulators.
5 English, like a number o f other languages, allows [s] and perhaps (J] to occur out o f the expected
sonority sequence (e.g., spit, stick, skip, square; shtik). This is a vexed question which I will not discuss.
It has been suggested that the [s]C sequence is a complex segment like a reverse affricate (Hayes 1980.
Lamontagne 1993).
18
(2.2) Feature tree
+Root
+Manner
continuant
consonantal
sonorant
strident
♦lateral
+Laryngeal
spread glottis
constricted glottis
voiced
+Supralaryngeal
♦Velum
nasal
+PIace
+Labial
+round
+Coronal
anterior
distributed
back
+Dorsal
back
high
low
Note: Features marked'+’ are privative; others are equipollent.
The most notable difference between this and the familiar Sagey (1990) system is
that [back], normally a dependent of the Dorsal articulator, is here also a dependent of the
Coronal node too, with the stipulation that [+back] requires [+Dor]. The innovation is
Hall's (1997) solution to a problem in the original system: that palatalization could not be
straightforwardly modelled as feature spreading when the palatalized consonant was
[+Cor], Segments which triggered palatalization, usually front vowels, were [+Dor -back],
but the [-back] could not be spread to a preceding [+Cor] segment, since [+Cor] could not
support it (see Sagey 1990: §3.4.2.2). Hall argues that the palatalization feature, whatever it
is, must be a child of both the Coronal and Dorsal articulator nodes, since it can be spread to
both [s] and [x]. The segments triggering palatalization or resulting from it are, he says, all
characterized by a fronted tongue body, which is the articulatory correlate of [-back].
19
Allowing both tongue nodes to sponsor [-back] captures the physical link between the
lamina and the forepart of the dorsum (Hall 1997:§2.7.2).
It is crucial to the analysis to have some representational scheme under which (j j ]
can spread something to the Coronal node and [w] cannot; I have chosen this one because
of Hall's detailed treatment of the various places of articulation.
I have also simplified Hall's feature tree by leaving out his Peripheral node, which
came below Place and above {Labial, Dorsal}, by replacing the Laryngeal features [stiff]
and [slack] with [voiced], and by omitting [rhotic] in favor of [+high, +low] (see below).
The Tongue Root node has been removed; I will ignore the complexities of uvular,
pharyngeal, and laryngeal consonants (McCarthy 1991). None of these changes is crucial
to the analysis.
With two exceptions, all features in this system are either privative or equipollent. A
privative feature is either present or it is not. An equipollent feature is either [+F] or [—F],
but not both. If a feature is present in a representation, then all equipollent children of that
feature have to be present as well, with either + or - specification. That is, an equipollent
feature can be absent from the representation of a segment only if the feature's parent is also
absent. A segment consisting only of the features [+Root +Laryngeal] is possible, but one
which is [+Cor] must be either [+ant] or [-ant].
The two exceptions are [cont] and [strident]. Affricates are analyzed as [-cont
+cont] (Sagey 1990:§3.3.4.2). The feature [strident] is an equipollent child of the Manner
node, but it is only present when the segment is a fricative or affricate, and only for [+Lab]
or [+Cor] segments.
Finally, I have left the privative [+lateral] under the Manner node because it behaves
like a Manner feature in not spreading. The other obvious option is to put it under [+Cor],
since nearly all known laterals are coronal and they occur at all four coronal places of
articulation (McCarthy 1988, Hall 1997: §A.2.3.2). There are lateral fricatives, lateral flaps,
20
and (most commonly) lateral affricates. None is found in English. Evidence for how they
are repaired is lacking, so will not discuss them here.6
2.3.2.L1. Consonant features
The source-fiJter model is responsible for explaining the badness of a great many
candidates for the C in a C[j w 1] onset. Here, the critical candidates are the oral stops,
affricates, and fricatives at every place of articulation. Their representations are shown in
Tables 2.3 and 2.4.
Table 2.3. Obstruent manner features
Manner of articulation_________________
Manner feature_________ Oral stop___________ Affricate___________ Fricative
cont - + and - +
cons + + +
son - - -
_______strident_____________ (never)_____________ (some)_____________(some)
6 For an alternative view o f the feature specifications for liquids that does not use [lateral 1. see W alsh
Dickey (1997).
21
Table 2.4. Obstruent place features
a.
Place of articulation
Palatoalveolar, Palatal Velar
Place features alveolopalatal
+Lab +
round
+Cor + + +
ant + + -
dist (+) -
+Cor/+Dor
back
+Dor
high
low
22
b.
Place of articulation
Palatoalveolar, Palatal Velar
Place features alveolopalatal
+Lab
round
+Cor +
ant -
dist +
+Cor/ +Dor
back - - +
+Dor + +
high + +
low - -
The IPA distinguishes palatoalveolar from alveolopalatal, at least for fricatives. I
accept Hall's arguments (1997: §2.5.2) that the two should not have different features, sir.ce
no language has two contrasting segments distinguished only by that place difference.
Table 2.5 shows the IPA symbols for every combination of the manners from Table
2.3 with the places from Table 2.4, together with stridency values for the fricatives and
affricates.
23
Table 2.5. Representation of consonants
Manner
Place Stop Affricate Fricative [strident] 7
Labial P p 4» ♦ -
Labiodental P pf f +
Dental/Interdental t J0 e -
Alveolar t ts s +
Retroflex t (5 § +
Palatoalveolar, alveolo c
t tf, tQ J. e +
palatal
Palatal c eg 5 -
Velar k kx x -
These are the low-sonority voiceless segments which this system is capable of
representing. In our source-filter model, the lexicon can emit any of them. Since most of
these segments do not and cannot occur in English, we will have to build a grammar which
deletes the un-English ones or converts them into English segments.
2.3.2.I.2. Features of [j w I]
Our task in this section is simply to describe the surface features of American
English [j w 1], We will not explain why these segments, rather than other sonorants,
should be in the American English inventory. I adopt the analysis of Kahn (1980), who
makes [j w] glides (semivowels) and [I] a sonorant consonant.
Guenter (2000) summarizes the arguments that American English [j w] are glides
as follows: (1) They are phonetically central approximants. (2) They restrict the set of
6 I do not know anything about the stridency o f lateral fricatives. In the absence o f better information, I
will assume that they are as strident as the corresponding non-lateral fricatives.
24
vowels that can precede them. (3) Each has a stressed syllabic version with which it
alternates. (4) They cannot occur after tautosyllabic diphthongs (Cohn & Lavoie 2000).
(5) Flaps occur after them. (6 ) Final [t d] cannot be deleted after them. These statements
are in general not true of [I].8 To these we can add the observation of Espy-Wilson (1992)
that [I] is frequently produced with a spectral discontinuity, while [j w ] are not.
Kingston (p.c.) points out that stops are often intruded between [1] and a following
lingual fricative: pulse [p\lts], filth [filtO]. The same phenomenon occurs with the other
class of high-sonority consonants, the nasals: warmth [woimpG], chance [tjxnts]. It does
not happen after [j j w ].
We will model [j w] as glides - that is, as vowels syllabified into a syllable onset,
having the same features as the syllabic [j u ] (Hall 1997:135, Rosenthal! 1997). The
proposed feature system is shown in Tables 2.6 and 2.7.
Table 2.6. Manner features for [j w]
Manner feature M [w]
continuant + +
consonantal - -
sonorant + +
strident
7 However. Guenter did And that 15 o f his 16 informants had an [I] that satisfied (4), and many had one that
satisfied (3); he interprets this as evidence o f language change in the direction o f a glided [I].
25
Table 2.7. Place features for [j w ]
Place feature M [w]
+Lab + +
round + +
+Cor +
ant -
dist - or +
+Cor/ +Dor
back + +
+Dor + +
high + +
low + -
The manner features are standard, as are the place features for [ w ]9 . Those for [ j]
require some justification.
Delattre and Freeman (1968) made X-ray Films, with synchronized spectrograms, of
46 speakers from various parts of the United States. They found a wide variety of [j]
articulations, which sounded very similar. All speakers, in all syllable positions, make a
constriction in the pharynx about halfway between the glottis and the uvula. They also
make a constriction somewhere in the oral cavity between the comer of the alveolar ridge
and the beginning of the soft palate, using the dorsum, blade, or tip of the tongue - in
onsets, always the blade or tip. The lips are rounded (most strongly in the onset of a
stressed syllable). Similar results were obtained in MRI and palatographic studies of 4
speakers by Alwan et al. (1997).
* Hall argues that both [j j J are actually [+Cor] (I997:§§ 1.2.6, 4.4).
26
In vowels, a pharyngeal constriction is the articulatory correlate of [+low]; a close
oral constriction, that of [+high]. From an articulatory-phonetic standpoint, all varieties of
American English [j ] are therefore [+low +high]10. The formal advantages are clear: we
are rid of the [+rhotic] feature (which needed the same kind of co-occurrence stipulations as
[+lateral]), and we no longer have to stipulate that [+high +low] is impossible."
In the studies of Delattre and Freeman (1968) and Alwan et al. (1997), the tongue
tip or blade participated in all versions of the onset glide [j ], indicating that the glide was
coronal12. The position of the constriction ranged from prepalatal to postpalatal. We can
model it as [+Cor -ant] and either [+dist] or [—

dist] depending on the speaker. Since the
choice is not crucial to the analysis, I will favor my own speech and pick [+dist].
Finally, lip-rounding in all positions requires that [j ] be modelled as [+Lab
+round].
For [1], we use the features of Tables 2.8 and 2.9:
Table 2.8. Manner features for [1]
Manner feature [1]
continuant -
consonantal +
sonorant +
strident
-t-laterai +
9 Delattre & Freeman's Figure 1, a gallery o f X-ray tracings, shows this very clearly. Their "Type 4"
syllabic [j ] is particularly striking - the tongue has two humps, one in the m iddle pharynx and one under
the hard palate, with a deep indentation between them.
11 The oral constriction in [j ] has also been analyzed as the implementation o f a [coronal] feature (Walsh
Dickey 1997).
10 The nuclear [j] had a coronal component in only one of its five manifestations (there was much more
variety between speakers in non-initial position), with the blade approaching the rear o f the hard palate
(Delattre & Freeman's Figure 1, Type S). It seems that coronal articulations are obligatory in syllable
onsets, but (for most speakers) prohibited in syllable nuclei.
27
Table 2.9. Place features for [I]
Place feature [I]
+Lab
round
+Cor +
ant +
dist -
+Cor/ +Dor
back +
+Dor +
high +
low —
The manner features are standard except for [-cont]. It is a matter of debate whether
[1] is continuant phonetically or phonologically.
The double articulation of [1] has been shown by Sproat & Fujimura (1993). Their
X-ray microbeam data, from four speakers of American English, found both a dorsal and an
apical gesture, whose relative timing varied depending on prosodic position. The apical
gesture we model as [+Cor +ant -dist]. The MRI and palatographic study of Narayanan et
al. (1997) confirmed the double gesture, and showed that the apical gesture contacted the
alveolar ridge along the midline in both onset and coda [ 1].
2.3.2.2. CV syllables
The inventory of consonants in C onsets, shown in Table 2.9, can be summed up as
follows:
28
I. Two classes do not occur at all: the retroflexes and the palatals. Both of these
are rare (marked) places of articulation cross-linguistically. 2. There is a stop series
labial-alveolar-velar beside a single palatoalveolar affricate. This is a single [-cont -son]
series, labial-alveolar-palatoalveolar-velar, with affricate manner obligatory at the
palatoalveolar place and forbidden elsewhere. 3. There is a fricative series
labial-dental-alveolar-palatoalveolar. Velar fricatives are forbidden.
Table 2.10 shows the repair which I assume is made to each of the impermissible
segments: A box encloses each group of underlying segments that map onto the same
surface segment.
Table 2.10. English surface obstruent inventory in CV syllables
Place of articulation Stop Affricate Fricative [strident]
Labial P P* * -
Labiodental P Pf f +
Dental/Interdental t e _
fi
Alveolar t Is s +
Retroflex t - g +
Palatoalveolar, alveolopalatal t tf, tq S, c +
Palatal c c5 S -
Velar k kx X -
Note: The white segments are found in surface CV environments. Underlying gray
segments are mapped to the white segment in the same enclosing box.
The process involves 24 underlying-to-surface mappings, and the grammar I
propose will be quite complex, with 12 constraints ranked in 9 strata. I will first describe
and justify the constraints, then present ranking arguments for them.
29
2.3.2.3.1. Undominated faithfulness constraints
None of the repairs shown in Table 2.10 involve deleting the offending segment
entirely. This is very naturally modelled as an all-dominating faithfulness constraint against
deletion:
(2.11) MAXSEG
Every segment of the input has a correspondent in the output.
No repair involves changing the major articulator Labials are changed to labials,
coronals to coronals, and dorsals to dorsals. This can be modelled with an IDENT
constraint:
(2.12) Id en t [P l a c e ]
If an underlying and a surface segment are in correspondence, they share the
same major articulator.
2.3.2.3.2. Coronal stop places: dental, retroflex, alveolar, and palato-alveolar
Of the four coronal stop places, only the alveolar is used in English CVs. Instead of
a stop, the palato-alveolar place has an affricate.
The lack o f retroflex and dental stops can be seen as the result of high-ranked
markedness constraints against them, constraints whose existence can be justified
typologically.
Retroflexes are banned in many languages besides American English. In
Maddieson's genetically and geographically balanced sample of 317 languages, over 99%
had a dental or alveolar stop, while only 11.4% had a retroflex stop (1984: §2.4). In the
same sample, 266 languages (84%) had a non-retroflex voiceless fricative, while only 17
(5.4%) had a retroflex voiceless fricative. For voiced fricatives the numbers were 96 (30%)
and 3 (1.0%) respectively (1984: Table 3.2). The markedness of retroflexes can be
30
modelled by a markedness constraint *RET, which awards one mark for each segment that
is [-ant, -dist].
(2.13) *RET
*[-ant, -dist]
It is unusual for a language to contrast dental and alveolar place; the [+anterior]
stops are either laminal or apical. The dental stops [t d] I assume are ruled out by a blanket
constraint against the dental place of articulation, operative in other languages which favor
alveolar over dental place.
(2.14) *DENTAL
*[+Cor +ant +dist]
On the basis of loan-word phonology, I will assume that both retroflexes and
dentals are repaired to alveolars. Alveolars remain alveolar, and palato-alveolars remain
palato-alveolar (though, for reasons discussed in the next section, palato-alveolar stops gain
a [+cont] specification to become affricates).
Table 2.15. R epar of retroflexes to alveolars (Yule & Bumell 1886, American Heritage
Dictionary 2000)
Source language Original form English
Tamil [kattumaram] catamaran
Hindi [([akait] dacoit
Hindi [patti:] puttee
Hindi [topi:] topee
Hindi [tamtam] tom-tom
Hindi [lu:t] loot
31
Table 2.16. Repair of dentals to alveolars (American Heritage Dictionary 2000)
Source language Original form English
French [debakl] debacle
French [tul5] Toulon
Russian [tokamak] tokamak
The repairs involve changing the [anterior] and [distributed] specifications. The
problem is how to insure that palato-alveolar inputs, and only palato-alveolar inputs, surface
as palato-alveolar outputs. The solution is to hand: Under the Hall (1997) feature system,
palato-alveolars are both coronal and dorsal. Changing a non-palato-alveolar to a palato
alveolar, or vice versa, therefore violates the undominated lDENT[PLACE]. Since *R e t and
*D ental force retroflexes and dentals to change, while IDENT[PLACE] prevents them from
becoming non-coronals or palato-alveolars, their only recourse is to become alveolars by
changing their values of [ant] or [dist].
32
(2.17) I d e n t [P l a c e ) » *R e t , *D e n t a l » Id e n t [a n t ), I d e n t [d is t ].
IDENT
[P l a c e ] *R et *D e n t a l IDENT[ANT] IDENT[DIST]
Wl15 *!
-> [d] *
MJ *! * *
[d3] *! *
Id / [d] *! *
-> [d]
MJ *! *
[d3] *! * *
*! * *
/«¥ Ml
-> [d] *
M) *!
[d3] *! *
/d/ [d] *! *
[d] *! * *
MJ *! *
-> [d3]
13 [d] represents a [-ant, + d ist| (palatoalveolar) stop, the stop corresponding to the affricate [tj).
2.3.2.3.3. The persistence of [6]
The fricatives [0 d] are the only dental segments in English CVs, and the only
[-strident) fricatives. As such, they seem to be resistant to two markedness constraints.
One, discussed in the previous section, is *DENTAL, which militates against all
dental articulations.
The other is a constraint against non-strident fricatives. English is rich in
[+strident] fricatives ([f s J]) and poor in [-strident] ones (only the comparatively rare [0 ]).
The situation is the same in most languages. The three most common fricative places
(voiced or voiceless) are, in descending order, [s]/[s], [fl, and [f]. The first nonstrident
fricative, in fourth place, is [x], which is half as common as [f| and more than twice as
common as any other nonstrident fricative (Maddieson 1984: Table 3.2). We can capture
the markedness of nonstridents directly as a constraint against [-strident]:
(2 .1 8 ) * [-STRIDENT]
I will analyze the persistence of [0] in English as preservation of the salient acoustic
contrast between the [-strident] [0 ] and the other, [+strident], coronal fricatives14:
(2 .1 9 ) M a x [ - s t r i d e n t ]/C o r
An underlying [-strident] coronal must correspond to a surface [-strident]
coronal.
11 Since stops are not specified for [strident], this constraint also keeps [t0 6] from turning into stops.
34
(2.20) MAX[-STRIDENT]/COR » *[—STRIDENT]
/0 ik / M a x [ - S t r i- *D ental * [—STRIDENT] id e n t [D is t ]
dent] / C or
[Oik] * *
[sik] *! *
Since all the other [-strident]s (the labials and dorsals) are still able to change to less
marked segments, they will do so, while the dentals cannot:
( 2 .21 )
M a x [-S t r id e n t ] * [-S T R ID E N T ] (lower-ranked

/$aet/ /C o r faithfulness)
[fact] *!
[fact] *
This leaves the dental fricatives as the only possible dentals and the only possible
non-strident fricatives.
2.3.2.3.4. Dorsal places o f articulation: palatals and velars
English CVs have only velar articulations. 15 Palatals are repaired to velars.
Palatals are rare cross-linguistically. Maddieson found palatal or palatoalveolar
stops in only 18.6% of his sample, though over 99% had a velar stop (1984: §2.4). A
voiceless palatal fricative occurred in only 16 of the languages, or 5.0%, and a voiced palatal
fricative was found in only 7, or 2.2%. By way of comparison, voiceless and voiced
palatoalveolar fricatives turned up in 146 (46%) and 51 (16%) (Maddieson 1984: Table
12 The allophonically palatalized velars found before front vowels, as in key, are not as far fronted as
phonemic palatals in languages that have them (Keating & Lahiri 1993). W e regard them here as velars.
35
3.2). We capture this with another context-free markedness constraint, *PAL, which gives a
mark to each [+Dor -back] consonant.
(2.22) *PAL
*[+cons, +Dor, -back]
This constraint will cause underlying palatals to become velars (violating low-ranked
IDENT[BACK], but satisfying high-ranked IDENT[PLACE]):
(2.23) IDENT[PLACE] » *PAL » IDENT[BACK]
/ca/ iDENTfPLACE] *PAL IDENT[BACK]
[ca] *!
-> [ka] *
[ta] *!
Since palato-alveolars are also dorsal and [-back], they meet the structural
description of *PAL. However, lDENT[PLACE] protects them from losing their [Cor]
specification:
(2.24)
/tja/ Ident [Pl a c e ] *PAL lDENT[BACK]
[ca] *! *
[ka] *! *
-* m *
36
2.3.2.3.5. Labial places of articulation: bilabials and labiodentals
Since no language is known to contrast bilabial and labiodental stops, they are not
distinguished in the Hall (1997) feature system.
Bilabial and labiodental fricatives are featurally distinct, the bilabials being
[-strident] while the labiodentals are [+strident]. The result of this, as we saw in (2.21), is
that [$] is converted to [f| in order to satisfy *[—STRIDENT].
2.3.2.3.6. The stop-affricate-fricative series
It is very common for languages to have affricates at all and only those places where
it has no stops. The most common pattern - found by Maddieson in 8 6 out of 317
languages, or 27% - is the English one of stops at the labial/alveolar/velar places and
affricates at the palatoalveolar place (Maddieson 1984: § 2.5). The effect of this is to
disperse the [-cont] segments as widely as possible in articulatory and acoustic space, with
one segment being made by the lips, one by the tongue tip, one by the tongue blade, and one
by the dorsum.
Affricates are, typologically, more marked than fricatives, which are more marked
than stops. Every one of the languages in Maddieson's sample had stops, and most had two
stop series. All of the 451 languages in the UPSID database have stops; 413 have
fricatives; only 300 have affricates.
However, it seems that the palatoalveolar place of articulation is more hospitable to
affricates than to stops. Maddieson's Tables 2.5 and 2.8 make this clear:
37
Table 2.25. Number of languages with stops at given places in the sample of Maddieson
(1984:TabIe 2.5)
Palatal or
Dental or palato Labial-
Bilabial alveolar alveolar Retroflex Velar Uvular velar
No. of
languages 314 316 59 36 315 47 20
Percent 99.1% 99.7% 18.6% 11.4% 99.4% 14.8% 6.3%
Table 2.26. Frequency of the most common affricates in the sample of Maddieson
(1984:Table 2.8)
Voicing Dental/alveolar16 Palato-alveolar
Plain voiceless /*ts/ 95 AS/ 141
Aspirated voiceless /*tsty 33 /tSh/ 42
Plain voiced /*dz/ 30 /dZ/ 80
The cross-linguistic pattern is that found in English, where the palato-alveolar
position in a stop series is filled with an affricate. Affricates are more marked than stops,
except at the palato-alveolar place, where the opposite is true. The affricate, it appears, is the
stop of the palato-alveolar place. This tendency can be formalized as a markedness
constraint:
(2.27) AFFR/PALAL
An obstruent should be an affricate if and only if it is palato-alveolar.
13 Maddieson's * indicates "dental o r alveolar (combined)". The non-IPA symbols are Maddieson's.
38
In English, illegal labial and alveolar affricates are repaired by converting them into
fricatives: pfennig [fj, tsunami [s], Zeitgeist [z], czar [z] (Jones 1997). The illegal
palatoalveolar stops are repaired by affricating them: Magyar [d3 J. The two processes are
shown in Table 2.28:
Table 2.28. Labial, alveolar, and palatoalveolar series of American English
Place of articulation [-cont] [-cont +cont] [+cont]
Labial P pf -> f
Alveolar t ts -» s
Palatoalveolar ______ it______ f

Note: The grayed segments are absent from the surface inventory. The repair to each is
indicated by the arrow.
In the labial and alveolar cases, the marked affricate is deaffricated to a fricative by
deleting [-cont], rather than to a stop by deleting [+cont]. Some faithfulness constraint
must be blocking the deletion of [+cont] but not of [-cont]. We will take it to be
MAX[+CONT], which gives a mark to each corresponding segment pair where the
underlying segment has [+cont] but the surface segment does not:
(2.29) MAX[+CONT]
An underlying [+cont] segment must correspond to a surface [-fcont]
segment.
Since the non-palato-alveolar affricates lose their [-cont] specification in order to
satisfy AFFR/Pal AL, it must be ranked above Ma x [-CONT]:
(2.30) MAX[-CONT]
An underlying[-cont] segment must correspond to a surface [-cont]
segment.
39
(2 .3 1 ) M a x [+ c o n t ], a f f r /P a l a l » M a x [-c o n t ]
/pfemk/ M a x [+c o n t ] AFFR/PALAL M a x [- c o n t ]
[pfenik] *1
[penik] *!
—> [fenik] | *
The palato-alveolar stops change manner rather than place of articulation so as not to
violate the undominated lDENT[PLACE]. (The palato-alveolar place is the only one which is
both coronal and dorsal, so that any change o f place changes a major articulator.) They
become affricates, rather than fricatives, to avoid a gratuitous Ma x [-CONT) violation.
(2.32)
/m adar/
c I d en t [Pl a c e ] A ffr /P a l Al Max [- c o n t ]
[madar] *!
[madar] *!
—» [m ad 3 ar]
[m a 3 ar] *!
Max[+CONT] is violated when [9 ] or [x] is repaired to [k], so it must be dominated
by *PAL, *[-STRlD].
40
(2 .3 3 ) *PAL, *[-STRID] » M a x [+ c o n t ]
/airm an / *P a l *[—STRID] M a x [+c o n t ]
[airm an] *! *
[aixman] *!
[aicman] *!
—» [aikman] *
2.3.2.3.7. Consirainl lattice
The grammar we have established has the rankings shown in (2.34). The topmost
stratum consists of unviolated faithfulness constraints. Lines represent rankings established
by direct comparison in ranking arguments.
(2.34)
M a x Seg , Id e n t [Pl a c e ] M a x [ - strid en t ]/C or
* [-STRIDENT] *DENTAL
Iden t [B ack ] M ax [+c o n t ]I d en t [ant ], Id en t [dist ]
A ffr /P a l A l
M a x [- c o n t ]
41
2.3.2.3. * [ s j J
With the basic inventory accounted for, we are now ready to turn to the first of the
two onset conditions used in the experiments. Table 2.10, repeated here, can be compared
with Table 2.35, which shows the observed inventory and repairs in C[j ]V syllables.
Table 2.10. English surface obstruent inventory in CV syllables
Labial P * -
Labiodental S f +
Dental/Interdental t JB 0
Alveolar t ts s +
Retroflex t t§ § +
Palatoalveolar, alveolopalatal t tf, tc £ c +
Palatal c c5 9 -
Velar k kx X —
42
Table 2.35. English surface obstruent inventory in C[j ]V syllables
Labial P t -
Labiodental 6 Pf f +
Dental/Interdental t
w% JO 0 ___
Alveolar t s +
Retroflex t r ■' 1 i +
Palatoalveolar, aiveolopalatal - P : tf, tc +
Palatal c c$ 9 -
Velar k kx X —
The only differences between permissible Cs in CV and C[j]V syllables are in the
coronals. Four formerly separate groups have been merged into two, so that all coronals
(except [0 ], immune as usual) have the same minor place features: [-ant +dist -back].
Since these are exactly the coronal features of [i], the merger is naturally understood as
place assimilation via spreading of the Coronal node. 17
14 The phonetic effect o f [j ] on preceding ft/ is variously described by different authors. I discuss here the
dialect of Hammond (1999:101) and myself, in which the p re-fi] /t/ have a distributed palato-alveolar
articulation, [tfl. Others, such as Olive et al. (1993), say that the articulation is an apical, retroflexed [(].
Given the wide variation in how speakers articulate [j ], the difference may be due to (he spread o f different
features: the first [j ] being (-an t, -Klist] and the second being [-ant, -dist]. If so, one would expect that (t-i]
speakers would have a retroflex, rather than a palato-alveolar, articulation for /jj/, so that it would be
pronounced [§j|. The acoustic contrast between [t] and [[J, o r between [S] and [§), will in any case be
difficult to hear before [j ] ow ing to the m uffling effect o f lip rounding and the lowering o f form ant
frequencies.
43
(2.35) SPREAD[COR]
Neighboring [+Cor] segments should have the same value of [antj, [dist],
and [back],
SPREAD[COR] must dominate the faithfulness constraints against spreading the
minor coronal features: IDENT[ANT] and IDENT[DIST]. No IDENT constraint on [back] is
violated since coronals either lack a [back] node entirely or are [-back]. Since [0] is
immune, SPREAD[COR] must be dominated by MAX[-STRIDENT]/COR.
(2.36) MAX[-STRIDENT]/COR » SPREAD[COR] » IDENT[ANT], lDENT[DIST]
a.
/sjEbjenitsa/ M ax S pr e a d IDENT[ANT] IDENT[DIST]
[-STRIDENT] [COR]
/ C or
[sjebienitsa] *!
(jjsbjenitsa] * *
b.
/ 0 -ied/ M ax SPREAD IDENT[ANT] IDENT[DIST]
[-STRIDENT] [COR]
/COR
[0 jed] *
Cfjed] *! *
The illegality of [sj ] arises from its failure to obey SPREAD[COR]: The [s] should
be retracted to [J], but is not.
44
A conceivable repair to [ s j] which is not in fact used is epenthesis; Sri Lanka does
not become [soji] Lanka the way that tmesis becomes [tomisis]. This shows that the anti-
epenthesis constraint DEPSEG dominates IDENT[ANT] and IDENT[DIST]:
(2.37)
/sji/ S prea d D ep Seg IDENT[ANT] lDENT[DIST]
[COR]
[sji] *!
m * *
[saii] *!
The grammar is shown in (2.38) (lines indicate only those rankings proven above):
(2.38)
M a x Se g , Id e n t [Pla ce ], *Re t , Ma x [- s t r id e n t ]/C or
*R] [-STRIDENT] *DENTAL SPREAD[COR] DEPSEG
IDENT[BACK] MAX [+CONT] IDENT[ANT], lDENT[DIST]
Affr /P al Al
M ax [- c o n t ]
45
2.3.2A. *[tl]
The second commonly-used environment is the C[1]V syllable. Here, all of the
coronals are excluded from appearing in C position except the strident fricatives.
Table 2.39. English surface obstruent inventory in C[l]V syllables
Labial P p$ ♦ -
Labiodental P Pf f +
Dental/Interdental t JB e -
Alveolar t ts s +
Retroflex t \§ § +
Palatoalveolar, alveolopalatal t tl, tc J. 9 +
Palatal c c9 9 -
Velar k kx X -
segments are mapped to the white segment in the same enclosing box. The repair for [tl] is
shown as [k], but this is not known with certainty.
The illegality of [tl dl] in an onset can be linked to the fact that both are coronal and
[-cont], through the Relativized Obligatory Contour Principle (Selkirk 1991, Padgett 1991):
(2.40) Relativized OCP (Selkirk 1991)

G H
| | where G and H share property F, and are F-wise
*F F adjacent.
The Relativized OCP is a feature-geometric constraint designed to account for root
co-occurrence restrictions in which the effects of place of articulation similarity are
modulated by stricture features. In Modem Standard Arabic, for example, a triconsonantal
46
verb root may contain two successive coronals, but only if they disagree in continuity. Thus
/sVtVq/ is permissible, but /sV0Vq/ or /tVdVq/ is not (Yip 1989). However, there are no
general restrictions on co-occurrence of two coronals, or on co-occurrence of two [+cont]
or [-cont] consonants; what is discouraged is similarity on both dimensions at once.
Segments which are too different in stricture do not interact in place. (In feature-geometric
terms, stricture similarity is expressed as adjacency on an autosegmental tier.)
The effect of the Relativized OCP can be seen diachronically in English, in the
progressive loss of the coronal [j] in onsets after coronals and before [u]. It was first lost
from [tju], as in rude, rule, then from [lju] as in lute, dilute, and is now being dropped from
[nju] (news), [sju] (suit), and (in the most advanced dialects) [dju] and [tju] (duke, tune)
(Trudgill 1999:56-59). The more similar the preceding consonant is to [j] in stricture, the
earlier it was lost.
We can see the ban on [tl dl] as a similar phenomenon. English allows [pi bl kl gl],
where the place of articulation differs between stop and liquid even though both are
noncontinuant. It allows [ti dj] and [si], where both segments are coronal but only one is a
non-continuant. What is forbidden is two successive consonants with the same articulator
and the same value of [cont]: [tl dl ] . 18 This is modelled as one of the family of Relativized
OCP constraints:
(2.41) OCP(CONT, PL)

Adjacent consonants using the same articulator are forbidden if they share
the same value of [cont].
18 English does have [si], [sn], and fstj onsets, which appear to violate the ban on [coronaI][coronal]
sequences. However, initial [s] is exceptional in another respect: Unlike all other consonants, it can
precede a less sonorous segment. In fact, [s] can be added to any legal onset except a fricative or affricate,
and all three-consonant onsets are so formed. The [s] neutralizes the [voice] contrast in a following stop,
and palatalizes to [f] before [j ], but otherwise does not interact with the rest o f the syllable. These facts are
ordinarily analyzed by positing a reserved structural slot for (s| at the left margin o f the syllable, outside o f
the onset (e.g., Kenstowicz 1994:258; Borowsky 1986:175-179). This account is corroborated by the
coronal fricative [6), which cannot occupy the [s] slot and thus is subject to the [coronal][coronal] ban:
[61], [6n], and [6t] are impossible onsets.
47
There is little evidence as to the nature of the repair to the illegal sequence. It is
possible that /tl/ is repaired by epenthesis, or by making the [I] syllabic, and thereby
separating its coronal articulation from the [t] (Sproat & Fujimura 1993). Another
possibility is that the IV is realized as [k], as in the attested pronunciation [kliqgit] for
Tlingit'9. It has been shown that French listeners strongly tend to misperceive the illegal [tl]
as [kl] (Halle etal. 1998).
The place-dissimilation repair would require the anti-epenthesis constraint DEPSEG
to dominate lDENT[PLACE]. The epenthesis repair would require the opposite.
(2.42) OCP(CONT, PL), DEPSEG » lDENT[PLACE]
/tlirjgit/ OCP(CONT,PL) D epSeg I d e n t [P l a c e ]
[tlirjgit] *!
[taliqgitj *!
-> [kliggit] *
(2.43) OCP(CONT, PL), lDENT[PLACE] » DEPSEG
/tlirjgit/ OCP(CONT,PL) I d e n t [P l a c e ] D e p Se g
[tliggit] *!
-> [taliqgit] *
[klnjgit] *!
2.3.2.S. ??[pw]
The phonological status of initial [pw] in American English has never been fully
clarified. The [pw bw mw] onsets are often described as marginally acceptable by
English-speaking linguists. Hultzen (1965) and Wooley (1970) consider [pwj a
19 The pronunciation [kliqkit] is deem ed "correct" by the OED.
48
permissible English onset; Catford (1988) and Hammond (1999) consider it marginal, like
the initial [3 ] of genre or the initial [v j ] of vroom.
For/pw/ the example I have long used is puissant, attested for 1450 and
occurring once per million words (1/M20) or within the first 20,000 words of
the language, but pueblo (1818) is more frequent (2/M including the place
name) and is usually cited in whatever lists include this item. Both words
are pronounced as indicated, although they do have alternative
pronunciations not pertinent to our list. The word bwana, included by Hill
and others, is rare, but /bw-/ is frequent in Buenos Aires (1/M) in both
American English and RP (Hultzen 1965:12).
Wooiey points out that low frequency cannot be the sole criterion of phonotactic
badness:
Initial /pw, bw, zw, mw/ pose a more difficult problem. As Hultzen has
shown, puissant, dating from 1450, can hardly be rejected. To appeal to the
low frequency of occurrence of these clusters in order to reject them would
be to lose the natively English initial /0w/ as well (1970:74).
More modem frequency counts show that initial [0w] is more common than [pw]
(Celex combined written and spoken, EFW.CD/EPW.CD: 6 per million vs. I per million;
Francis-Kucera: 4 per million versus 0 per million), but the point is well taken. Hammond
(1999) considers initial [pw] and [bw] to be of the same degree of marginality as [dw]
(Celex has only dwarf, dwell, dwindle, and derivatives) and [0w] (Celex has only thwack,
thwart).
There are two reasons to think that in English, less weight is laid on the [pw bw] ban
than the [tl dl] one. First, English [r] is labial (Delattre & Freeman 1968), so the legal,
frequent onsets [pj bj fi] violate the same constraint as [bw]. Second, the ban on same-
place CC sequences is, cross-linguistically, stronger the more similar the two Cs are in
sonority (Selkirk 1988; Padgett 1991). Since [1] is less sonorous than [w] (Kahn 1980;
20 Frequency counts in the quoted passage are from T horndike & Lorge (1944).
49
Guenter 2000), the [dl] sequence is closer in sonority than [bw] and hence a worse
structural violation.
If we assume that [pw bw mw] are actually illegal in English (that they are
phonological rather than lexical gaps), then the illegality must be due to some active
markedness constraint. It has been proposed by Clements & Keyser (1983) that English
actively prohibits [labial][labial] sequences in the syllable onset. Again we have an effect of
the Obligatory Contour Principle, forbidding adjacent identical segments in a particular
domain (here, the syllable onset).
(2.44) OCP(LAB)
Adjacent labials are prohibited.
Since [p b] and [w] have different values of [cont], this constraint must be an
unrelativized OCP constraint, unlike the OCP(C0 NT, PL) forbidding [tl dl]. In order for
OCP(Lab) to be active, it must dominate some markedness constraint, so that an underlying
[Iabial][!abial] sequence is realized in some other way, repairing the violation. Already we
can see that all is not well with this analysis: What is the repair? Jones (1997) gives
faithful pronunciations for most word-initial [labial][labial] sequences, but some have
alternate pronunciations in which the sequence is repaired:
(2.45) English pronunciations of [pw bw mw]-initial loans (Jones 1997)

p u e b lo [puebloo]
p u is s a n t [pjuusont]
P u e r to ( R ic o ) [pours]
p o ig n a n t [poijiont]
B u e n o s ( A ir e s ) [boonos]
m o ir e [mouei]
M o iv r e [moivoj]
50
The repairs are unsystematic and look suspiciously like spelling pronunciations - as
if the disappearing [labial][labial] sequence were a victim of grapheme-to-phoneme
conversion rules rather than synchronic phonological grammar.
Intuitive judgments of "wordlikeness" have been shown to be strongly influenced by
differing frequency of the sequences contained in them, even when the sequences are
phonotactically legal and attested (Coleman & Pierrehumbert 1997; Frisch et al. 2000). The
intuitive-wordlikeness-judgment task will therefore be sensitive to lexical as well as
phonological gaps. If [pw] is legal in English, but very rare, it may be judged unacceptable
on the grounds of low phonotactic probability alone, even though [pw] words are possible.
The onsets [tl] and [ s j ] will be judged worse, being both illegal and rare. This would
account for the pattern of judgments and attestations reported in the phonological literature.
I will therefore analyze [pw] as a lexical rather than a phonological gap.
2.4. Summary
Optimality Theory views both inventory and phonotactics as consequences of the
same grammar. Systematic, productive gaps in the inventory and in the set of
phonotactically permissible combinations arise from the filtering effect of the grammar on
the unconstrained set of possible inputs. A productive, phonological gap in the set of
observed surface forms is one which could not be the output of any conceivable input from
the lexicon. A nonproductive, lexical gap is one which could be filled by some input from
the lexicon, but where the necessary input (through historical accident) is not a lexical item.
Two highly productive gaps in the set of English syllable onsets were discussed:
That on initial [ s j ], and that on initial [ t l] . Both were shown to be special cases of
systematic prohibitions involving neighboring coronals. These phenomena, assimilation
and the Obligatory Contour Principle (OCP), drive similar processes in many non-English
languages. The gaps in the English syllable inventory were analyzed as phonological gaps,
resulting from the filtering action of the grammar.
51
A systematic gap, but of doubtful productivity, was also analyzed: The partial ban
on 6 ' [pwj. The systematicity demands an OCP constraint on neighboring labials;
however, the lack of productivity suggests that this constraint is not ranked high enough to
be active. The gap is therefore treated as lexical.
52
CHAPTER 3
THEORIES OF PHONOTACTIC EFFECTS IN SPEECH PERCEPTION
3.1. Introduction
This chapter discusses three models of phoneme perception: the TRACE model
(which puts all phonotactic effects in the lexicon), the transitional-probability model of Pitt
and McQueen (1998) (which assigns them to a statistically sensitive prelexical module), and
a perceptual model based on Optimality-Theoretic grammar.
3.2. TRACE (McClelland & Elman 1986)
TRACE is a connectionist model of word and phoneme recognition in which the
lexicon can directly influence the perception of phonemes. In TRACE, a fully or partially
activated word candidate provides support for phonemic candidates which is
indistinguishable from the support provided by incoming acoustic information. Phonotactic
effects on speech perception are taken to arise from lack of lexical support for non
occurring phoneme sequences.
3.2.1. How TRACE works
The TRACE network is described by McClelland & Elman (1986); I will briefly
summarize what they say, but for full details the reader is referred to the original paper.
Each unit is a "detector" representing a hypothesis about the utterance — that it begins with
a voiced sound, that it begins with a (j], that it contains the word "yard", and so forth. The
detectors are organized into three layers, corresponding to features, phonemes, and words.
The "activation level" of a unit is a nonnegative number which varies over time in response
to the unit's inputs. It tells how much credence the model puts in that hypothesis at the
moment.
53
Figure 3.1. The TRACE model of McClelland and Elman (1986)
/plxstlk/ /Ixp/ Lexicon
Phonemes
[x]
Features
-voice
etc.
fft+ t It Acoustic signal
A unit receives input from all other units with which it is connected. The input
which Unit A contributes to Unit B depends on Unit A's activation and another parameter,
the "strength" of the A-to-B connection. All connections go both ways, so a large positive
strength means A and B strongly excite each other, while a large negative strength means
they strongly inhibit each other. Units on the same level inhibit each other, so that more
confidence in the "plastic" hypothesis means less confidence in the "lap" hypothesis (and
vice versa). Connections between levels are excitatory, so that more confidence in "plaid"
means more confidence that the word starts with [p] (and vice versa). The strength of the
connections is set by the experimenters; TRACE is not a learning model (McClelland &
Elman 1986).
At the very bottom of the model are the acoustic feature detectors, which receive
inputs not only from other units but from outside the model. Each detector is responsible
for a particular feature ([voice], [acute], etc.) at a particular time in the utterance. As the
utterance unfolds moment by moment, the feature detectors register its acoustic properties
54
and adjust their activation levels accordingly. Activation spreads upwards through the
network. It also spreads back down from the word detectors to the phoneme detectors, and
from them to the feature detectors. Meanwhile, the units on each level are trying to inhibit
each other.
TRACE assumes that the units are open to conscious introspection: To detect X, the
listener uses the X detector unit. Responses to a phoneme-monitoring task, for instance,
depend on the activation levels of the phoneme units. Responses to a word-recognition task
depend on the activation levels of the word units. Because activation spreads downwards
and inhibition spreads sideways, a unit's activation depends not just on the acoustical
configuration which it is nominally supposed to detect, but on the state of the rest of the
network. Under the right circumstances, the result can be strong activation (inhibition) of
the X detector despite the absence (presence) of evidence for X in the acoustic signal — a
perceptual illusion.
TRACE puts phonotactic illegality in the lexicon. Legal and illegal sequences are
processed differently because the legal ones receive support from lexical items containing
them, while the illegal ones do not (since, by definition, they do not appear in any words) -
that is, instead of punishing illegality, TRACE rewards legality. TRACE thus cannot
distinguish illegal sequences from other sequences of zero frequency. Any behavioral
differences between processing of zero-frequency legal sequences and illegal sequences (if
they can be shown to exist) would have to be explained by something outside of the
TRACE system.
55
3.2.2. Lexical effects on phoneme perception
The lexicon can certainly influence performance on tasks that are intended to tap
phoneme perception, lending credence to the TRACE approach. Evidence comes from four
major paradigms:
Phoneme detection (Foss 1969). Subjects listen to each stimulus and respond "yes”
or "no" depending on whether it has or lacks a particular sound (usually specified as a
letter). The usual dependent measure is RT for correct detections; error rates are < 10% and
not useful.
Phoneme categorization (Liberman, Harris, Hoffman & Griffith 1957). The
stimulus is acoustically ambiguous (e.g., between [bin] and [pin]); subjects are asked which
one it sounds more like. Dependent measures vary; a common one is the point (e.g., on the
VOT continuum) at which both judgments are equally likely. RT is also measured, and
tends to peak at the category boundary.
Phoneme restoration (Samuel 1981ab). One phoneme of the stimulus (non)word
has been either replaced by noise or obscured by noise, and the subject has to say which.
Dependent measures are signal-detection-theoretic d-primes and betas. The effects are
robust; performance is not improved by 1 0 ,0 0 0 trials of practice, nor by any but the most
explicit preview cuing (Samuel 1991).
Shadowing (Cherry 1953). Subject hears speech over headphones, and has to
repeat it in as close to real time as possible. Various dependent measures evaluate how well
mispronunciations were detected.
TRACE attributes lexical effects on phoneme perception to downward spreading of
activation or inhibition from the word units to the phoneme units (McClelland & Elman
1986). Phonetic data extracted from the speech stream is only one influence on the
phoneme units; it can be drowned out by powerful signals from above which bias a
phoneme unit so strongly in one direction that conflicting information from below is not
56
enough to offset it. The phoneme unit's activation level is trapped between a fixed minimum
and maximum, and becomes less responsive to its inputs the closer its activation level is to
the floor or ceiling; hence, strong excitation or inhibition from above can also reduce a
phoneme detector's sensitivity to acoustic features.
The three lexical factors known to influence phoneme tasks are lexicality, frequency,
and uniqueness point (UP).
Lexicality and frequency. Effects are rather fragile for many paradigms. The only
reliable one is the Ganong effect in phonetic categorization: If a stimulus is ambiguous
between a word and a nonword owing to ambiguity in one phoneme, the phoneme tends to
be heard so that it makes the word (Ganong 1980, Fox 1984, Connine & Clifton 1987,
McQueen 1991, Pitt & Samuel 1993; not replicated by Burton, Baum, & Blumstein 1989).
The same effect was observed in shadowing by Marslen-Wilson (1984): Subjects "fluently
restored" mispronounced words, apparently without noticing the discrepancy (i.e., with no
effect on shadowing latency). There is at least one report that a one-phoneme ambiguity
between a common and a rare word tends to be resolved in favor of the common word.
However, the effect can be reversed (to favor the rarer word) by setting up the experiment
so that the less common word tends to be the right answer (Connine, Titone, & Wang
1993).
Phoneme targets may be detected faster in words (Rubin, Turvey, & van Gelder
1976, Cutler, Mehler, Norris, & Segui 1987; not replicated by Foss, Harwood, & Blank
1980, Frauenfelder, Segui, & Dijkstra 1990). Word initial phonemes may be quicker to
detect in common than rare words (Morton & Long 1976, Dell & Newman 1980; not
replicated by Segui & Frauenfelder 1986). Phoneme restoration may be stronger in words
than in nonwords (Samuel 1981a, 1996; not replicated by Samuel 1987), and in common
than rare words (Samuel 1981a; not replicated by Samuel 1981b).
TRACE attributes the word-superiority and shadowing-correction effects to
downward spread of lexical activation. The ambiguous acoustic stimulus excites, say, [t]
57
and [d] equally in the context yar_. Since the YARD unit is somewhat activated by the
context, it contributes activation to [d]; lacking stimulation from a *YART, [t] is overtaken
by [d]. The YARD and [d] units keep exciting each other and inhibiting [t] until [t] is
completely overwhelmed. A stimulus ambiguous between two nonwords, like sirt and sird,
would favor no word node over any other, and would be decided on the basis of the acoustic
evidence (McClellan & Elman 1986). Similar reasoning would apply if YART were a real
word, but much less frequent than YARD.
The possible effects on detection are accounted for by TRACE: Top-down
activation spreading causes the activation of the relevant phoneme unit to reach response
criterion sooner.
Uniqueness point (Marslen-Wilson 1984). RT to detect a phoneme (measured
from the phoneme) decreases later in real words but not nonwords (Marslen-Wilson 1984,
Frauenfelder, Segui & Dijkstra 1990, Wurm & Samuel 1997); effect size varies from -30
ms to -300 ms. Phoneme restoration is stronger late in words with early uniqueness points
than late in words with late uniqueness points (Samuel 1987). Shadowers restore
mispronunciations more later in the word (Marslen-Wilson & Welsh 1978). Response
time to reject a nonword is a fixed amount, measured from the earliest point where the
stimulus differs from all words in the dictionary (Marslen-Wilson 1984).
Strong support for the COHORT word-recognition model comes from these effects,
which are hard to explain in other theories, and it is a great virtue of TRACE that the
connectionist architecture is able to do that. Simulations show that an active word unit is
quickly extinguished when mismatching acoustic information comes in, provided that a
better-matching word unit is present (McClelland & Elman 1986). Shortly after the
uniqueness point, only the matching word unit is still active, strongly inhibiting all rivals and
exciting phoneme units consistent with it (which in turn inhibit phoneme units that are
inconsistent). If mismatching phonetic information comes in after the uniqueness point, it
will have a hard time changing the network's mind about the word or the phonemic code.
58
3.2.3. Phonotactic effects on phoneme perception
In the previous section we saw that a listener's performance on a phonemic task can
be influenced by their knowledge of the real words of their language. Interestingly,
evidence exists that it can be influenced by their knowledge of the possible words of their
language.
Massaro and Cohen (1983) created segments ambiguous between [j ] and [1] by
varying F3, and asked subjects to judge them in the contexts [ tj] , [p j], [v_i], and [s_i]. In
English, only [i] is permissible after [t], only [1] is permissible after [s], both can follow [p],
and neither can follow [v]. The ambiguous segments were most likely to be judged [j ] in
[tj], less likely in [p_i], less likely still in [v_i] and [ s j] (as shown in their Figure 3.1).
Despite the lexical confounds — [tii], [pii], and [pli] are words — the evidence of [v j] and
[sj] suggests that judgments are altered by people's knowledge that [sji] can't be English:
The larger number of [i] judgments after [v] than [s] cannot be due to acoustic-phonetic
factors — if anything, the labial [v] should make the following F3 sound higher and the
ambiguous segment more [l]-like — nor to lexical ones, since [sli] is not a word - which
leaves phonotactics.
The TRACE model has a good explanation for how phonotactics exerts this
influence. McClelland & Elman (1986) found that the ambiguous stimulus [s?i] partially
activates similar words in the lexicon of their simulated word recognizer. Since the lexicon
contains only phonotactically permissible words, the units for sleep, sleet, and so on
become active, feeding excitation back to [1], but no countervailing sreep or sreet assists [j ].
The amount of acoustic support that [1] needs to reach criterion is thus reduced in the
context [ s j] compared to a neutral context like [v_i] in which there are no lexical items
similar enough to be activated. A phonotactic effect is achieved without phonotactic rules.
59
3.2.4. Empirical shortcomings o f TRACE
Since TRACE models many different things, it has been criticized on many different
grounds. For a comprehensive review of its shortcomings as a model of phoneme
perception, see McQueen, Norris, & Cutler (1999). Most important, for our study, is that
the phonotactic effect is usually larger and more robust than the lexical effect. This is very
unexpected in a theory which takes phonotactic effects to be diluted lexical effects.
The evanescence of lexical effects came up in §2.2. In a lengthy study, Cutler et al.
(1987) found that they could make word-superiority effects come and go by boring the
listeners less or more. A varied stimulus set, containing mono- and disyllables, got the
lexical effect; a monotonous one did not. They concluded that the lexical effect was not
automatic, but depended on listeners' allocation of attention between the lexical and the
prelexical levels'of representation. For a detailed review of such results from several
paradigms, see McQueen et al. (1999).
Phonotactic effects, by contrast, are robust and not affected by stimulus monotony.
The original Massaro & Cohen (1983) experiment got very large effects with a
monosyllabic stimulus set repeated for 1120 trials (total over two days). Pitt (1998) got
several large phonotactic effects with monosyllabic stimuli. Moreton & Amano (1999)
directly compared the effect of lexical status on the Japanese vowel-length boundary with
that of phonotactics using the same subjects and paradigm. They found a large phonotactic
effect but barely any lexical effect.
In other words, manipulations that make the lexical effect go away can still leave a
phonotactic effect. This is a problem for any theory which, like TRACE, denies a
distinction between lexical and phonological gaps. 1
1 M cClelland and Elman (1986) report an experiment which suggests that the lexical effect and phonotactic
effects can be superimposed. They compared listeners'judgments o f a segment [?] between [b] and [d] in the
contexts jwindle, _wiffle, and _wacelet. The highest rate of "d" response was obtained in the _windle
context, where dwindle is a word; an intermediate rate was found in jwiffle, where neither endpoint is a
word; and the lowest rate was found in _wacelet, where neither endpoint is a word but bwacelet is very
60
In TRACE, the phonotactic effect could be stronger because it combines the effects
of many lexical items, while the lexical-superiority effect depends on a single item.
However, the TRACE authors have shown that in fact the lexical-superiority effect is
stronger: When the network is presented with an ambiguous phoneme between [p] and [t]
in the context [_luli], it is classified as [t], with the lexical influence of truly winning out over
the phonotactic badness of [tl] (McClelland & Elman 1986).
3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)2
Another potentially exploitable redundancy in speech is the statistical distribution of
segments. Different segment sequences, such as diphones or triphones, occur with different
frequencies. A model which is sensitive to these frequencies can compare the statistical
plausibility of alternative parses of ambiguous speech input in order to disambiguate it.
Such statistical information can also in principle be used to find word boundaries, and serve
as the basis for possible-word judgments.
There is evidence from various sources that listeners are sensitive to sequence
frequency. Treiman et al. (1996) found that nonwords containing high-probability
sequences are rated as "more English-like" by native speakers than those containing low-
probability sequences, and that, when subjects are asked to construct portmanteaus by
blending two nonwords, low-frequency sequences tend to be broken up more often than
high-frequency ones. Frisch et al. (2000) showed that listeners' "wordlikeness" judgments
were very strongly affected by the frequency of legal phoneme sequences contained in the
stimulus. Vitevich et al. (1997) found that nonwords containing frequent sequences were
similar to one. They interpreted this as a lexical effect superim posed on a phonotactic bias against *[bw) -
i.e., even though all the contexts were phonotactically biased, a lexical effect was still obtained.
Since all o f the contexts started with _w, the experim ent did not demonstrate a phonotactic bias;
its presence was sim ply assum ed. As shown in Chapter 4, E nglish listeners' bias against [bw] is weak if it
exists at all. Hence, the experim ent may just have measured an isolated lexical effect. It could still be true
that a really strong phonotactic effect, like the bias against inital [d l], would swamp any lexical effect.
2 The M ERGE fram ew ork is described most completely in N orris et al. (2000). The first discussion o f it
in connection with transitional probabilities is Pitt and M cQueen (1998).
61
rated as "more English-like" and were repeated faster in a single-word shadowing task than
nonwords containing rare sequences. English listeners learning an artificial language
develop statistical sensitivity to the different probabilities of sound sequences in that
language even if the linguistic input is an unattended background stimulus (Saffran et al.
1996, 1997; Aslin et al. 1998).
Some evidence that the sequences are encoded separately from the lexicon comes
from work by Vitevich & Luce (1998, Experiment 1). They constructed lists of disyllabic
English words and nonwords which varied in sequence frequency, so that some items were
"high probability" and some were "low probability". When pairs of nonwords were
presented for same-different judgments, listeners responded faster to high-probability pairs
than low-probability pairs. When pairs of words were presented, however, the pattern was
reversed: faster for low-probability, slower for high-probability. The authors' interpretation
is that high-probability words and nonwords are both facilitated by the frequency of their
sublexical sequences, but the high-probability words are more strongly inhibited by
competition from their many lexical neighbors. Further evidence is provided by Pitt &
McQueen (1998), in which an ambiguous fricative disambiguated by lexical information
(the Ganong effect) did not induce compensation for coarticulation in a following
ambiguous stop, while a fricative disambiguated by diphone-frequency information did.
Where TRACE seeks to explain phonotactic illegality as a gap in the lexicon, the
probabilistic theories seek to explain it as a gap in the set of attested sequences: A
phonotactically illegal configuration is one which has zero frequency (Pitt & McQueen
1998:349). Like the TRACE account, a probabilistic theory predicts (1) that illegal
sequences are only slightly different from rare sequences, and (2 ) that all zero-frequency
sequences (of the relevant length) are equally illegal.
In this study, we will focus on one particular implementation of a probabilistic
theory, namely, that of Pitt & McQueen (1998), because it is the one which was designed
for problems of ambiguous phoneme perception. The authors actually define a class of
62
probabilistic theories, rather than a specific one: certain manipulable parameters are left
unfixed. This section will try to narrow down the range of possible implementations on the
basis of existing data, so that the remaining possibilities can be tested experimentally.
The rest of this section is organized as follows: §3.3.1. illustrates the functional
utility of statistical knowledge. §3.3.2. describes the range of possible probabilistic theories
in the Pitt-McQueen class. §3.3.3. discusses these theories with respect to the existing data
on ambiguous-phoneme perception, eliminates some of them, and defines the specific
models which we will test.
3.3.1. Simulation: Success of statistical predictions
The functional motivation for probabilistic speech perception is clean Sequence
probabilities, even for very short sequences, greatly constrain the hypothesis space which
the listener must search. To illustrate this, let us consider a model whose task is to listen to
a list of isolated words drawn from the Celex English lemma database. The words occur
with their Celex spoken corpus frequencies. Every so often, one of the words is truncated at
a random location at least n segments into the word, and the model is asked to predict the
next segment (word boundaries are counted as segments). For this task, the model has
available only a table of transitional probabilities: for each string of n segments, it knows
the likelihood that the n+ /st will be [a], [t], etc. An example, for n = 2, is shown in Table
3.2:
63
Table 3.2. Probability that a given diphone will be followed by a given segment (extract
from complete table)
Preceding context Next segment Probability
WA n 0.97
WA r 0.03
WA s 0 .0 0
vz ) 0.99
vz d 0 .0 1
vae 9 0 .0 0
vae 9 0 .0 0
vae k 0.07
vae 1 0.62
vae m 0 .0 1
vae n 0.25
V T . . It/M l It \ M I ■
Note: "(" an d ")" mark word boundaries.
To predict which segment will follow a given context, the model's best strategy is to
always guess the segment that is most frequent after that context, since that maximizes its
chance of guessing right.
As an illustration, a simulation of this hypothetical experiment was run in which
words were randomly chosen from the Celex wordforms database (EPW.CD) according to
64
their frequency (combined written and spoken, which is Field 3 of EPW.CD). Initial and
final word-boundary markers were added to each word's segmental representation. From
each representation, a substring of length n + 1 was chosen at random (if the word was long
enough), and the model was asked to guess the last segment on the basis of its likelihood
given the first n. The model was credited with a correct guess if the final segment of the
substring was the best guess according to the guessing strategy (i.e., if the actual last
segment was the likeliest). The simulation3 was run for approximately 100,000 trials for
each of n = 1,2, and 3. It did rather well:
Table 3.3. Results of the simulation: Success rate as a function of context size
Size of preceding context (n) Model's success rate
I 0.378
2 0.630
3 0.783
That is, knowing only the last two segments, the model will predict the next one
correctly nearly two-thirds of the time - with zero acoustic information and zero lexical
information. This is only a thought experiment, but it is close enough to both the lab and
real life to show that even a small amount of probability knowledge can be used to very
great advantage.
3.3.2. Probabilistic theories o f speech perception
The probabilistic theory of Pitt & McQueen (1998) is, essentially, that prelexical
mechanisms are sensitive to sequence frequencies (in the equivalent form of transitional
probabilities), and that, when acoustic evidence is inconclusive, perception favors the more
likely option. This prelexical probabilistic module, along with the Shortlist model of word
3 The script s i m u l a t e d _ g u e s s is included in the appendix to this chapter.
65
recognition (Norris et al. 2000), forms part of the MERGE model, a theory of phoneme-
processing tasks in which the output of prelexical phonemic processing, along with lexical
information, is used in making phoneme-based decisions (Norris et al. 2000). Probabilistic
effects, in this model, occur very early, and are separate from lexical effects. The relative
contribution of each to phoneme responses is determined by attentional weighting, which in
turn is determined by task variables.
There are several different ways to make a theory of sequence-probability influence
on phoneme perception. The major adjustable parameters include: (1) location and size of
the context, (2 ) the database from which the probabilities are computed, and (3) the guessing
strategy. This section describes the possible parameter settings, and the theories resulting
from them.
3.3.2.1. Context
In a typical perception experiment, the listener is confronted with an acoustically
ambiguous segment "?", which could be either x or y, in a context A_B. How can statistical
knowledge about the frequencies of AtB and AyB be used to disambiguate it?
One way is to directly compare the likelihood of x and y in the context. The
decision depends on the conditional probabilities P(.v | A_B) and P(y | A_B), where
(3.4)
F(AxB)
P(x\A_B) = (a)
F(A_B)
F(AyB)
P( y \ A _ B ) = (b)
F( A _ B )
66
Since F(string) is the frequency with which that string occurs in the database (the
listener’s experience), all the model has to do in order to make its decision is to compare
F(Atfl) with F(Ayfl). It consults its table4 of (2n+l)-phone frequencies, where n is the
length of A and B,5 retrieves the two relevant frequencies, and hands them over to the
decision rule. This kind of context I will call surrounding context o f order n.
A second possibility is to treat the left and right contexts separately. The decision
depends on the conditional probabilities P(jc | A_), P(y | A_), P(x | _B), and P(y | _B), which
reduce to the frequencies F(Ax), F(Ay), F(xfl), and F(yB). This I will call independent
neighboring context o f order n (again assuming simplistically that A and B have equal
lengths). A table of (n+l)-phone frequencies is consulted.
The predictive difference between these two context types is that surrounding
context (SC) can take advantage of statistical dependencies between A and B, while
independent neighboring context (INC) cannot.
For example, English, like most languages, requires sonority to rise in syllable
onsets but not in codas. As a result, it lacks sequences like [tdt], [pdp], [fvp], etc., since no
matter what precedes or follows such a sequence, there is no legitimate syllabification - the
consonant in the middle is higher in sonority than either of its neighbors, but not high
enough that it can serve as a syllable nucleus itself. A model using SC of order 1 will note
the gaps in its table of 3-phone frequencies. A model using INC of order 1, though, will
miss these gaps, since each of the sub-sequences [td], [dt], etc. does in fact occur. Given a
segment ambiguous between [1] and [n] in the context [m_z], the SC-1 model will favor [I],
since [mlz] is attested (e.g., in camels), while [mnz] isn't (at least, not for speakers who
lack syllabic [n] in lemons). The INC-1 model will note only the low nonzero frequency of
[mn] (e.g., damnation, amnesia), and the high frequencies of [mz], [ml] and [Iz], and treat
4 I say "table", but that is only one notational variant. They can also be viewed as sublexical-sequence
detector units whose resting activation or excitability depends on frequency. A proposal along these lines is
Vitevich and Luce (1999).
5 For theoretical simplicity's sake I assume symmetry; A and B have the sam e length. This might be
wrong.
67
[n] as no worse in [m_z] than in [m_ei]. Which of these models is closer to what people
do is of course a question for the laboratory.
Another difference between SC and INC, of no consequence predictively but
significant conceptually, is the size of the n-phone tables. As the size of the context
increases, so does the number of phoneme sequences whose frequencies the prelexical
module has to keep track of. Their number quickly approaches the size of the lexicon:6
Table 3.5. Attested English phoneme sequences of lengths 2,3, and 4
Sequence length Number of sequences
2 1,395
3 11,961
4 35, 732
Note: Counts were made from the set of Celex wordforms occurring at least once per
million words. Initial and final word boundaries were counted as phonemes. For
comparison, the number of lemmas in Celex is 52,447.
The MERGE TP models are intended to contrast with TRACE by keeping the
lexicon out of the early stages of speech perception. As the rt-phone tables grow, this
difference becomes blurred: the tables incorporate not only the entire lexicon of length n or
less, but fragments of many larger words, and the two theories come to make more and
more similar predictions. These considerations argue for the INC theories over the SC
theories, since the former use shorter n-phones to describe the same-sized context.
Pitt & McQueen (1998)’s experiments use a preceding context of length 1, but the
authors discuss evidence that a preceding context of length 3 may be needed. Since all of
the stimuli they discuss had the same following context (silence), they did not need to go
into their claims about following context. I will assume that they are considering one of
three theories of context: SC-1, INC-1, or INC-3. (See discussion below, §3.3.3.)
6 The single word bat [baetl, for example, contributes four 2-phones: #b, bae, aet, and t#.
68
3.3.2.2. Database
What corpus are the n-phone tables based on? This is both a theoretical and a
practical problem. There are two principal options.
One possibility is that the n-phone frequencies are computed from the stored items
in the lexicon. Each word contributes its n-phones, weighted according to the word's
frequency. The lexicon does not directly participate in speech perception, but contributes
off-line by updating the n-phone tables which the early perceptual mechanisms can consult.
Such theories use a lexical database.
A second possibility is that the /i-phone frequencies are computed directly from the
incoming speech stream - computed by the same mechanisms that later consult them,
without any participation from the lexicon. This is more in keeping with the spirit of the Pitt
6 McQueen (1998) model, a strictly bottom-up theory which aims to block the lexicon
from interfering with the early stages of perception. Such theories use a corpus database.
The empirical difference between the two is that the lexical database respects
morphological word boundaries, while the corpus database does not. The reason is that
morphological word boundaries are represented explicitly in the lexicon, but not in a
segmental analysis of the speech stream7. The effect on n-phone statistics can be
substantial. For instance, geminates are very rare in the English lexicon but occur freely in
running speech (That trite talk keeps Sid dozing).
The practical difference is that on-line dictionaries make the lexical-database
statistics easy to compute, while a lack of accessible on-line phonetic corpora makes the
7 M orphological word boundaries have many indirect correlates in the surface-level phonetic analysis o f the
speech stream . Since prosodic boundaries tend to be aligned with them, they often correlate with fortition
(Fougeron & Keating 1997). As "prominent” positions, they tend to support phonological contrasts not
available elsewhere (Beckman 1998, Smith 1999), and to undergo prominence-enhancing phonological
processes (Smith forthcoming). None o f these correlates, however, allows morphological word boundaries
to be unam biguously located in the speech stream pre-lexically. If the n-phone tables are compiled
prelexically, no character corresponding to a morphological word boundary will be present in them.
(The same problem was faced by post-Bloom fieldian structuralist linguistic theory, which
dem anded that grammatical analysis proceed from low er to higher levels. Harris (1951) suggested a
statistical solution: Morph boundaries occur where the unpredictability of the next phoneme reaches a peak.
T his solution, like the TP theory, makes indirect use o f lexical and grammatical information. See
Newm eyer (1986:7-9) for a discussion.)
69
corpus-database statistics hard to compute. The standard practice in the field is therefore to
use an on-line dictionary and tacitly assume that the difference is negligible until proven
otherwise. Since I can’t tell what the corpus-database statistics predict, I will have to ignore
that theory and focus on the lexical-database theory.
Most of the frequency counts which Pitt and McQueen relied on were computed for
American English pronunciations, with frequencies apparently reckoned from the million-
word written American English corpus of Kucera and Francis (1967). Celex's 18-million-
word corpus is much larger, separates written from spoken English, and distinguishes
different inflected forms of the same word, but its British pronunciations are a drawback
when working with American English speakers. The result is some uncertainty about what
the TPs really are, and hence about what a TP-based theory would actually predict. For
instance, if one wants to reckon the probability that the segment following a given vowel will
be [s] or Jj) - a crucial case in the study of Pitt and McQueen (1998) - one can get three
different estimates for each:
Agreement is tolerably good if we stick to a one-segment context (i.e., a table of
length-2 sequences). The absolute magnitudes may differ by a factor of three, but the
different methods generally agree as to which segment each context favors.
For safety's sake I will give frequencies using both an American English frequency
dictionary similar to Pitt and McQueen's and the Celex corpus. Details of how these
frequencies are computed will be found in the appendix to this chapter.
70
Table 3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n = 1
Probability
Pitt & McQueen Celex: written and Francis-Kucera:
(1998, Table 2): spoken Br. Eng. written Am. Eng .8
Transition written Am. Eng.
Pr ( [us] | [uJ ) 0.019 0 .0 2 1 0.008
Pr ( [uj] | [u_]) 0 .0 1 0 0.009 0.009
Pr ( [us] | [u _ ]) 0.004 0 .0 0 2 0 .0 0 1
Pr 07 u j) 0.004 0.013 0.005
P r ( [ J s ] |[ j J ) 0.058 0.115 0.163
P r ( [ i f l |[ J J ) 0.007 0.009 0.007
Pr ( [eis] | [eiJ ) 0.064 0.026 0.069
Pr ( [eij] | [ e i j ) 0.139 0.043 0.133
Pr ( [is] | [i_ ]) 0 .0 2 1 0.017 0.023
P r ( [ i J ] |[ i J ) 0 .0 0 2 0 .0 0 1 0 .0 0 1
Pr ( [ip] | [i_ ]) 0 .0 2 0 0 .0 2 1 0.018
Pr ( [it] | [i_ ]) 0.025 0.026 0.025
Pr ( [eip] | [eiJ ) 0.015 0.008 0.017
Pr ( [eit] | [eiJ ) 0.151 0.054 0.123
8 Computed from the same database used by Pitt & M cQueen, but apparently using a somewhat different
counting method.
71
To compare the conditional probability P(x | A_B) with P(y | A_B) - that is, the
relative chances of finding x or y in a given environment - we need only compare the
frequency of AxB with that of AyB, since P(x | A_B) = (frequency of AxB)/(frequency of
the A_B environment) and P(y | A_B) = (frequency of AyB)/(frequency of the A_B
environment). I will therefore report only the AxB and AyB frequency counts.
3.3.2.3. Decision rule
Once the statistical information has been used to estimate the probability that a
particular segment in the context A_B is x or y, the model then has to choose one of the two.
How?
The general form of the decision rule must specify the probability that the model
guesses x rather than y given a stimulus A ?B. The decision rule has to take into account at
least two things: 1. the acoustic composition of the ambiguous segment (how close it is to x
ory), and 2. the TP statistics.
In everyday life, listeners are constantly confronted with productions of x and y.
Some are clear; some are garbled in various ways. The listener may at first parse a given
production "wrongly" (i.e., not as the speaker intended it), but usually the correct
interpretation becomes clear shortly as the listener recognizes the speaker’s intended
message. The listener therefore has the feedback needed to optimize the decision rule by
adjusting its parameters. We will suppose that they do this, with the goal of maximizing
their likelihood of correctly restoring the intended stimulus.
Each AxB or AyB stimulus puts the listener in a particular internal state. Which
internal state will depend not only on the speaker's intent, but on the garbling and on the
perceptual noise added by the listener's auditory system. Under the TP hypothesis, the
listener's response is determined by their perceptual state and by the distributional statistics
of their language. For the sake of illustration, let's assume the SC-1 statistics.
72
Suppose the listener, having heard a particular stimulus A ?B (intended by the
speaker as AxB or AyB), is now in State Z, a state which can lead to a response of "x"
The likelihood of correctly guessing the intended message is
(3.7)
PC{Z) = P(AxB | Z) • P( guess x \ Z ) + P(AyB \ Z) • P(guess y | Z)
By Bayes's Theorem,
(3.8)
P (Z | AxB)* P{AxB)
P(AxB \ Z) =
P(Z | AxB) * P(AxB) + P(Z | AyB) • P(AyB)
_________ P(Z\ AyB)*P(AyB)_________

P(AvB\ Z) =
PiZ. | AxB) • P(AxB) + P{Z | AyB) * P(AyB)
Letting rx = P(Z | speaker said AxB), px = P(speaker said AxB), qx = P(guess
Z), and similarly for y, we get
(3.9)
PC(Z) = r^ — qx + — ^ ---- qv
r <P< + ryP, rxPl + ryPy
rxPx + ryPy rxPx + ryPy
73
What choice of qx, our only free parameter, maximizes our chance of guessing
correctly? Clearly9, either qx = 0 or qx = 1. Since Z was arbitrarily chosen, it is true in
general that from any given internal state, the optimal choice is either always "x" or always
”y". The choice depends on the r and p parameters: if rxpx >rypy, then "x" is the best
guess; if the reverse, then "y".
In the language of Signal Detection Theory (Green & Swets 1966, Macmillan &
Creelman 1991), Choice Theory (Luce 1963), or Generalized Recognition Theory (Ashby
& Maddox 1994), an acoustic stimulus evokes an internal representation as a point in a
perceptual space. ("State Z" is one such point.) Following the reasoning described above,
the space is partitioned into regions, and all points in the same region lead to the same
response. To get optimal performance, each region must contain only points where rj<px >
rypy , or only points where rxpx < rypy , so the boundaries must be drawn so that rxp x
—r\Py for points Z on the boundary (Macmillan & Creelman 1991); i.e., so that rx/ry =
Py/px, or
(3.10)
P(Z\ AyB) P{AyB)

P(Z | AxB) ~ P(AxB)
If the a-priori probability ratio (the right-hand side) changes, as when A_B is
replaced by a different phonological environment C_D in which y is less likely, then the
boundary must move in order to keep the likelihood ratio (the left-hand side) equal to it.
9 PC(Z) is linear in qx , so its maximum must be at the sm allest or largest possible value o f qx , i.e., 0 or
1.
74
The consequences for a typical perceptual experiment are illustrated in (10). The
plane is perceptual space, with x and y the idealized perceptual representations of the
endpoint stimuli - "idealized", because in fact perceptual noise causes each presentation of a
stimulus to evoke a slightly different percept. The irregular line of dots shows the idealized
locations of the intermediate stimuli. Following the optimal response strategy, listeners
respond "x" when the percept is on one side of the boundary and "y" when it is on the
other, hence, what will be observed in the experiment is that the responses cross over from
mostly "x" to mostly "y" where the line of stimuli intersects the boundary. The difference
in boundary location between the A_B and C_D contexts causes a corresponding shift in the
location of the "x'V'y" crossover point.
Figure 3.11. Boundary shift in perceptual space
A_B C_D
75
The listener thus optimizes performance by changing their willingness to respond
"x" in accordance with the ratio of the probabilities of x and y in the context. This leads to
an important conclusion: The effect of context on the location of the "x'Vy" response
boundary depends on the ratio of the probabilities of x and y in that context, and not on their
difference.
For example: Suppose AxB, AyB, and CxD all occur 1,000 times per million words,
while CyD occurs 901 times per million words. The x/y ratio in AJB is 1, while that in CyD
is about 1.1. If a shift in the "x’V'y" boundary between A_B and C_D is found
experimentally, we would expect an even larger shift between A'JB'and C'_D\ where A ’x B \
A'yB', and C'xD' occur 100 times per million words and C'yD' occurs 1 time per million
words (giving x/y ratios of 1 in A'JB' and 100 in C'_Dr). Though the frequency
differences are the same (99 per million in both cases), it is the ratios that matter.
It is very unlikely that listeners follow the optimal strategy to the letter, which would
entail disregarding acoustic evidence of an event of zero probability. Under the optimal
strategy, a sequence of probability 0 is infinitely unlikelier than a sequence of positive
probability; hence, phonotactically illegal stimuli should always be as legal. There is
probably a limit to how far the criterion can be shifted, so that larger and larger a-priori
probability ratios only increase the bias up to a point. Very infrequent sequences are
therefore expected to behave similarly to absolutely non-occurring ones.
3.3.3. Statistical context effects on phoneme perception
The MERGE TP theory is intended to explain an interaction between lexical and
phonetic effects in phoneme perception. Mann & Repp (1981) showed that a segment
ambiguous between [t] and [k] tends to be heard as [t] after [J] and as [k] after [s]. The
effect evidently arises at an early (low) level of processing, either because the perceptual
system is compensating for expected coarticulatory effects, or because the low-frequency [J]
makes the next segment sound higher by contrast while the high-frequency [s] makes it
76
sound lower (Kluender & Lotto 1994). Elman and McClelland (1988) used neither [f] nor
[s], but a segment [?] acoustically in between them. When [?] followed Christma_,
r id ic u lo u or copiou_, it acted like [s] in its effect on perception of a following [t]-[k]
(tapes-capes) continuum. When [?] followed fooli_, Spani_, or Engli_, it acted like [J].
They concluded that lexical activation was spreading down to the phoneme level to favor tf]
or [s], as the case might be, which then had its ordinary phonetic effect on the following
segment.
Pitt and McQueen (1998) argued that early phoneme processing was immune from
lexical effects, and that the Elman and McClelland results could be accounted for if low-
level (prelexical) phonetic processes were sensitive to segment-to-segment TPs: the

ambiguous [?] was behaving like (fl or [s] depending on which was more likely to follow
the preceding segmental context. When [J] was more likely, [?] produced more [t]
responses to the following [t]-[k] continuum: when [s] was more likely, [?] produced more
[k] responses.
Where Elman and McClelland had asked only for judgments of the [t]-[k]
continuum, Pitt and McQueen (1998, Experiment 1) also asked for judgments of [?]. The
[?] was presented in two pairs of biasing contexts. One pair, [d3 U_] and [bu_], were
lexically biased towards [s] (juice) and (XI (bush), but the TP from the vowel to [s] and to (XI
was the same for both. The other pair, [di_] and [nei ], were lexically unbiased (since [dis],
[dijj, [nets], and [neijj are all nonwords), but differed in the TPs from the vowel to the
following fricative. The relevant statistics are shown in Table 3.6, repeated here for
convenience:
77
Table 3.6. Transitional probabilities for the stimuli of Pitt and McQueen (1998), n=l
Probability
Pitt & McQueen Celex: written and Francis-Kucera:
(1998, Table 2): spoken Br. Eng. written Am. Eng. 10
Transition written Am. Eng.
P r ( [ u s ] |[ u J ) 0.019 0 .0 2 1 0.008
P r ( [ u J l |[ u J ) 0 .0 1 0 0.009 0.009
P r ( [ u s ] |[ u J ) 0.004 0 .0 0 2 0 .0 0 1
P r ( [ u J ] |[ u J ) 0.004 0.013 0.005
Pr ( [is] | [j _] ) 0.058 0.115 0.163
P r ( [ i f ] |[ i J ) 0.007 0.009 0.007
Pr ( [ets] | [et_]) 0.064 0.026 0.069
P r ( [ e i J ] |[ e i J ) 0.139 0.043 0.133
Pr ( [is] | [i_]) 0 .0 2 1 0.017 0.023
P r ( [ i f ] |[ U ) 0 .0 0 2 0 .0 0 1 0 .0 0 1
P r ([ ip ]|[ L J ) 0 .0 2 0 0 .0 2 1 0.018
Pr ( [it] | [i_]) 0.025 0.026 0.025
Pr ( [eip] | [ei_J ) 0.015 0.008 0.017
P r ( [ e i t ] |[ e i J ) 0.151 0.054 0.123
10 Computed from the same database used by Pitt & McQueen, but apparently using a somewhat different
counting method.
78
Pitt and McQueen found a lexical and a TP influence on [?] report, but only a TP
influence on [t]-[k] report, suggesting that TPs were influencing early phonetic processing,
with the lexical effect only emerging at a later stage. 11 At an early, prelexical stage of
phonetic processing, the ambiguous fricative would be disambiguated using TPs, and,
having been classified as [s] or tf], would so affect the perception of the following [t] or [k].
Later on, after lexical access, the lexicon could affect listeners' report of the fricative, but not
of the stop.
Does the experimental evidence from these two studies rule out any of the possible
TP-based theories described in the last section? Are the left and right contexts independent
(INC model), or are they treated as a single unit (SC model)? Since following context was
not varied in these studies (it was always a [t]-[k] or [d]-[g] continuum before [ei]), they do
not distinguish INC from SC. However, they do throw some light on the question of how
much preceding context has an effect.
The most conservative hypothesis, which is used by Pitt and McQueen (1998)
through most of their paper, is that only the immediately preceding segment matters: the
decision between [s] and [f] after [WXYZ_] depends only on the frequencies of [Zs] and
m
McClelland and Elman had considered this account of their results:
One might have proposed that simple phoneme-to-phoneme sequential

constraints are such that they would lead subjects to predict that the final
phoneme in "Spanish" was an /J7 but the final phoneme in "ridiculous" was
an /s/, quite apart from specific lexical factors; it may be that [nl_] is more
often completed with tf], while [I? _ ] 12 is more often competed with [s]
(1988:158).
11 The latter half o f this squares with the Findings o f Fox (1984), who reported that lexical influences on
ambiguous-phoneme perception turn up only among the responses with long RTs.
12 Sic; an apparent typo for [la].
79
They dismiss it in view of the results of their Experiment 3, in which a large effect
was found when the contexts werefooY? and riciicuY?, where ”Y" was a CV sequence
intermediate between [li] and [Is]:
However, in the syllable-replaced condition, the context in the replaced items

is actually the same for three phonemes before the final fricative; the vowel
in "foo_,” and the last vowel in "ridicu_" are the same vowel, though they
may have slightly different acoustic realizations due to coaiticulation, and the
next two sounds in the two contexts are both acoustically and phonetically
identical in the syllable-replaced stimuli. Thus, any differential prediction of
the identity of the final fricative would have to be based on ”f_" vs. "ridic_,"
and thus would seem to be attributable to knowledge that is specific to the
particular lexical items involved (1988:158-159).
Context fully three segments away is affecting the ambiguous fricative so strongly
that it in turn affects perception of the following stop. The TRACE authors argue, in effect,
that expanding the TP context to be that big makes the TP theory practically lexical, by
including whole words in table of sequence frequencies (e.g., all words in the lexicon which
are four segments long or shorter). At any rate, a single segment of preceding context is not
enough.
Pitt and McQueen reply that the last vowels in "foo_" and "ridicu_" are not in fact
identical; the one they take to be [u] and the other to be [u]. If we consider the frequencies
of the four sequences [ulis], [uhjl, [ulis], and [ulij], then the TPs favor [is] after [ul_] and
[sj] after [ u l j . 13
/ulij/ occurs in words like coolish, foolish, and ghoulishness. Celex shows
that this string occurs about 26 times per million words, /ulis/ and /ulos/
occur less than once per million words, and /ulojV does not occur at all. So
t y is much more likely given /uIV?/. The opposite bias operates after/U/.
The string /Utas/ is quite common, in words like incredulously, ridiculous,
13 Ridiculous is transcribed by the Francis-Kucera dictionary as ridik{jo]lous; Jones (1997) records both this
and ri</ic[ja]/u5. Some speakers may have this latter pronunciation. CELEX estim ates /alas/ to occur
about 56 times per million words (combined spoken and w ritten), and /alaJV to occur not at all; /alts/
occurs 350 times, and /altJ-/ 25. Following Pitt & McQueen's reasoning, [s] is still more likely after
/a IV?/.
80
and stimulus. The CELEX estimate is 8 6 times per million words. /Ulis/
also occurs (in words like oculist and somnambulist, 9 times per million),
but /ulaJV and /ulif/ never occur. So /s/ is much more likely after AJIV?/.
(Pitt & McQueen 1998:365)
This counterproposal does not necessarily require the listener to keep track of 4-
phones, of which there are at least 35,732 (see Table 3.1). Perhaps what has happened in
McClelland and Elman's experiment is a statistical chain reaction. Suppose the listener
maintains a 3-phone table (at least 11,961 entries). When the ambiguous [i]/[a] vowel is
encountered after [ul_J or [ul_], it is disambiguated using 3-phones. The restored [li_] or
[ la J context is then used to decide, statistically, between [s] and [fl.
The statistics of English permit this. Table 3.12 shows the relevant Celex counts for
the [i]/[a] decision, which favor [i] after [ul_] and [a] after [ul_]. (The [i] counts in Celex
are too high for American English, since word-final unstressed [i], as in marry, is
pronounced [I] in Southern English dialects (Trudgill 1999). I have corrected them in the
table by subtracting the number of word-final occurrences in each context.) Table 3.13
shows the counts for the [s]/[J] decision, which strongly favor [s] after [la_], but are nearly
neutral after [li_].
81
Table 3.12. Triphone frequencies for sequences ending in [i]/[a] in the stimuli of
McClelland and Elman (1988)
Frequency per million words, Celex Frequency per
EFW.CD/EPW.CD million words,
Francis/Kucera
3-phone Combined Written Spoken
[uli] (raw) 151 161 76 o 14
[uli#] (word-final) -72 -7 8 -24

[uli] (corrected) 79 83 52
[ula] 32 33 12 15
[uli] (raw) 361 374 219 I
[uli#] (word-final) -264 -273 -171
[uli] (corrected) 97 101 48

[ula] 939 902 1342 175
14 This American English dictionary does not contain the word foolish.
82
Table 3.13. Triphone frequencies for sequences ending in [s]/[f] in the stimuli of
McClelland and Elman (1988)
Francis/Kucera
[Its] 1445 1473 916 416
m 707 695 826 362
[las] 617 612 674 270
12 11 14 0
However, if the TP context is extended to include the preceding two segments, we
now make the wrong prediction about Pitt and McQueen’s Experiments 1-3, since now
[d3 U_J and [bu_] have 100% TP biases towards [s] and [J] respectively (see Table 3.14).
This should have produced a TP effect, but did not. Even worse, the [di_] and [mi_]
contexts, which produced a large effect in their Experiment 3, are unbiased.
83
Table 3.14. Triphone frequencies for the stimuli of McQueen and Pitt (1998)
Francis/Kucera
[d3 us] 28 30 4 17
0 0 0 0
[bus] 0 0 0 0
[buj] 74 79 7 24
[dis] 0 0 0 0
[dif] 0 0 0 0
[neis] 14 15 6 2
[neij] 543 558 394 486
[mis] 0 0 0 0
[mil] 0 0 0 0
A context size of 1 segment is too small to account for the Elman & McClelland
(1988) results. A context size of 2 segments can handle those, but not the Pitt & McQueen
(1998) results. Larger contexts do not solve this latter problem (since the juice/bush stimuli
are only three segments long), and in any case lead to a duplication of the lexicon at a
prelexical level.
84
TRACE can explain this disparity, as noted in this connection by Samuel (2000).
Lexical effects in TRACE increase over time, and are greatest at the end of long words,
because the word nodes take time to reach activation and arc more active the more phoneme
nodes are feeding into them. The ambiguous fricatives in both experiments came at the end
of a word, but the McClelland and Elman words were much longer than the Pitt and
McQueen stimuli ([fuli_] and [iidikjuld_J versus [d3 u_] and [bu_J).
In this experiment, Pitt and McQueen not only failed to find a lexical effect with the
contexts jui_ and bu_, they succeeded in getting a TP effect with the contexts mee_ and
nay_, both of which make nonwords no matter which way the ambiguous fricative is
interpreted. Tables 3.15 and 3.16 show the cohorts at the time the ambiguous fricative
appears. It is clear that effects were found in ail and only those cases where, in at least one
of the paired stimulus contexts, the active cohort strongly favored [s] or [f] at the time the
ambiguous fricative appeared.
Table 3.15. Cohorts at the appearance of the ambiguous fricative in the experiment of
McClelland and Elman (1988, Experiment 3)
Continued with [s] Continued with []"]
Preceding context Words Frequency Words Frequency
ridiculou_ [Jidikjuloj ridiculous 36 (none) 0
fooli_ [fulij (none) 0 foolish 11

foolishly 2
85
Table 3.16. Cohorts at the appearance of the ambiguous fricative in the experiment of Pitt
and McQueen (1998, Experiment 3)
Continued with [s] Continued with [fl
Preceding context Words Frequency Words Frequency
jui_ [d3 U_] juice 2 0
juicy 2
bu_ [bu_] 0 bush 4
bushels 2
bushes 1
mee_ [ m ij — 0 — 0
nay_ [ n e ij - 0 nation 45
nations 43
nationwide 3
nationwide 1
No matter what context we choose, there is empirical data which the TP theory will
not cover. Our choice of which version to test will have to be based on other grounds.
There are two good theoretical reasons to choose a one-segment context for the present
study, and a third practical reason. First, we hope to equate "zero-frequency" and
"phonotactically illegal". This is plausible for sequences of length 2, but not for those of
length 4 - in the latter case, it leads to the claim that any 4-segment word which does not
already exist, such as [uloj], is illegal. Second, the MERGE TP theory contrasts with
TRACE in excluding the lexicon from prelexical phonetic processing. As the size of the
86
context increases, so does the number of phoneme sequences whose frequencies the
prelexicai module has to keep track of. Their number quickly approaches the size of the
lexicon. Finally, as a practical matter, long contexts make the frequency counts harder to
replicate, since each count is based on a smaller sample.
There is also empirical evidence supporting the one-segment context theory. Pitt
(1988) undertook a replication and extension of the Massaro-Cohen (1983) experiments.
He presented an [j]-[1] continuum to American English listeners in the synthetic contexts
[d_ae], [g_ae], [t_ae], [bjeJ, and [s_ae], and measured listeners' [j ] and [1] judgments. He
found a strong [j ] report bias (compared to the baseline [b_ae]) in [t_ae], a weaker one in
[d_ae], none in [g_ae], and a strong [1] bias in [s_ae].
Absolute per-million frequencies of [1] and [j ] after each of the initial consonants are
shown in (3.17). The ratio of these yields the a priori likelihood that an unknown liquid in
that context will be an [I], As we saw in §3.3.2.3, it is this ratio which, when the listener
uses an optimal guessing strategy, predicts the size of the response bias. The order of
effects predicted by the likelihood ratio is exactly the order of effects found by Pitt15:
>s Pitt himself interpreted these results as contradicting the probabilistic account o f the phonotactic effect.
This is because he assumed listeners were using a suboptimal guessing strategy. Rather than the likelihood
ratio, he took the predictor o f statistically-induced bias in favor o f a given cluster to be the sum o f the
logarithm o f the individual frequencies o f the words in which it occurrs.
87
Table 3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998)
Frequency Empirically
(Francis- Ratio, measured bias
Sequence Kucera) F([1])/F([j ]) Statistical bias (Pitt 1998)
[ tl] 220 0.026 Strong [i] Strong [j]
[ti] 8468
[d l] 275 0.136 [i] Weak [j]
[d i] 2020
[gl] 626 0.163 [J] None
[g j] 3845
[b l] 2407 0.992 None (Baseline)

[bj] 2426
[s i] 815 62.7 Strong [I] Strong [1]
[s j ] 13
For the present study, therefore, we will pursue a probabilistic theory of
phonotactics in perception which makes the following claims:
1. The mechanisms of speech perception have access to a table of length-2 or
length-3 sequences occurring in the English lexicon, including their empirical frequencies.
We will estimate those from the Celex statistics on British English, checking them against
the Francis-Kucera statistics on American English, using the procedure described in the
Appendix to this chapter.
88
2. When an acoustically ambiguous segment between x and y is presented in the
context it will tend to be parsed as the one which is more frequent in that context.
The difference in rate of "x" report between the context A_B and the context C_D will
depend on the relative likelihood of x and y in those contexts. The influence of statistics
will be greatest where the acoustic ambiguity is greatest.
3. The relative likelihood of x and y in A_B and C_D can be computed in either of
two ways.
a. (INC-1 theory): Pr (x | A_) * Pr (x | _B) compared to Pr (y | A_) * Pr (y | _B).
b. (SC-1 theory): Pr (x | A_B) compared to Pr (y | A_B).
4. The TP effect happens very early, certainly prelexically. However, tasks that
directly tap phoneme perception (such as the syllable and phoneme judgment tasks used by
Massaro & Cohen (1983)) can be responded to on the basis of either a prelexical phonetic
representation, or on the basis of one retrieved from the lexicon after word recognition,
following the MERGE proposal of Norris et al. (2000).
3.4. A grammar-based account
A final possibility is that phonotactic regularities are not emergent, but fundamental;
that the mechanisms of speech perception have access to the possible, as well as the actual,
phonological configurations of their language, and are able to apply that knowledge in
perceptual tasks to constrain the hypothesis space.
The chief point at issue between the TRACE and MERGE TP theories on the one
hand and a grammatical theory on the other is the status of zero-frequency phoneme
sequences. TRACE and MERGE TP treat ail such gaps alike: The model simply notes the
non-occurrence of a particular sequence, and favors occurring sequences over it. A
grammar-based theory can draw the distinction, discussed in §2 . 1 , between true
phonological gaps (configurations which cannot occur) and mere lexical gaps
(configurations which happen not to have occurred).
89
That there is such a difference is the central claim we will test here. There are many
different ways to implement a theory of grammar in speech perception. The main model
parameters are the specific grammar to be used (§3.4.1.) and the rule for using it to decide
between alternative interpretations of an ambiguous phoneme (§3.4.2.). The model
presented here uses the grammatical framework of phonological Optimality Theory (Prince
& Smolensky 1993). entailing a decision model in which multiple candidate parses are
entertained in parallel.
3.4.1. Choice of grammatical theory
3.4.1.1. Grammatical framework
A grammar-based theory of phoneme perception could in principle be built around
any procedure which correctly separates the productive gaps from the non-productive ones.
I have chosen phonological Optimality Theory (Prince & Smolensky 1993, McCarthy &
Prince 1995).
OT is particularly well-suited to phonotactic modelling because phonotactic
markedness is a theoretical primitive in OT, embodied in the ranked markedncss constraints.
Surface representations can be compared for markedness by scoring them with respect to
those constraints. This contrasts with rule-based theories, in which markedness is an
emergent phenomenon.
Markedness, however, is not the same thing as illegality. A configuration is illegal
in an OT grammar if it is never in the output, regardless of what the input is. This is
reflected in the grammar by having a markedness constraint against the illegal configuration
dominate all of the faithfulness constraints which aim to preserve it, so that an input
containing the illegal configuration will be realized without it. The grammar may contain
many markedness constraints which do not dominate the relevant faithfulness constraints
and hence do not trigger repairs; configurations violating only such constraints are not
illegal (though they may be marked in other ways).
90
An example is the case of *PAL, the constraint which forbids palatal consonants
(§2.3.2.3.4). It dominates the faithfulness constraint lDENT[BACK], and hence is able to
compel violations of it:
(3.18) *PAL » IDENT[BACK]
/ca/ *P a l IDENT[BACK]
[ca] *!
-> [ka] *
A constraint C in a grammar is said to be active for an input i if at least one
candidate is eliminated by C (Prince & Smolensky 1993, Chapter 5). In (3.18), for
example, * P al is active for the input /ca/, because it is there that [ca] is eliminated.
In a given grammar, some constraints are never active for any input. An example in
English is *VoiCE](T, forbidding voiced obstruents in syllable codas (McCarthy 1998).
Voicelessness is obligatory for coda obstruents in many languages, including the standard
varieties of Russian, Polish, German, and Turkish. Illegal coda clusters are repaired
phonologically by devoicing, indicating that faithfulness constraints are being violated in
order to satisfy *VoiCE]o. (Hence, *VoiCE]o is active for some inputs in those languages
- it is the constraint which eliminates the candidate outputs with voiced coda obstruents.)
In English, however, *VoiCE]o is ranked too low to have such an effect. English
tolerates voiced coda obstruents (tub, leave, brag, etc.). No candidate is ever eliminated by
*VO!CE]a, which, therefore, is inactive in English.
91
(3.19) IDENT (for instance)» *V oiCE]a
/liv/ IDENT *VoiCE]a
-> [liv] *
[lifl *!
The general unmarkedness of voiceless codas in acquisition and cross-Iinguistically,
like the general markedness of palatals, are crucial to any grammatical model of language,
but are outside the scope of statistically-based models such as TRACE and MERGE TP.
3.4.1.2. Particular grammar
Within a given framework, there are at least as many different grammars as there are
languages, and for each language, perhaps as many as there are linguists. The predictions
of the perceptual model depend on which one is selected. If experiment falsifies these
predictions, the problem may lie with either the perceptual theory itself or with the
grammatical analysis of the given language (just as, if experiment falsifies a probabilistic
theory, the problem may be due to the perceptual theory itself, or to faulty frequency
counts).
It is therefore important to start by examining phenomena whose predicted
perceptual effects are insensitive to the choice of a specific analysis. The lack of onset [tl
dl] clusters, for instance, is a good choice, because those onsets are so robustly illegal that
any theory of English grammar (Optimality-Theoretic or not) has to ban them. In an OT
framework, that means they must be ruled out by some markedness constraint, which ipso
facto is active. Even if our analysis in Chapter 2 has pointed the finger at the wrong
markedness constraint, any alternative analysis will have a different one from which the
same perceptual consequences will follow.
Thus, we expect the perceptually influential phonotactic constraints of a language to
include, at a minimum, the ones which are by all measures productive: those that are not
92
naturally violated (having no lexical exceptions) and that speakers cannot be induced to
violate (either because the banned configurations trigger repairs, or because the speaker
simply cannot pronounce them without great effort). For more detailed discussion, see
§ 2 .2 .
3.4.2. Decision mechanism
The most straightforward adaptation of OT to speech perception is essentially this:
Linguistic effects on speech perception come about because language limits the set of
available parses. The listener constructs a phonological parse at two levels of representation,
corresponding to OT's underlying /UR/ and surface [SR]. The [SR] is computed from the
acoustic signal, while the /UR/ is retrieved from the lexicon. Different (/UR/, [SR]) pairs
compete to account for the observed signal, with the grammar as referee: The (/UR/, [SR])
pairs are scored by the hierarchy of markedness and faithfulness constraints of the
language, and perception favors the most harmonic pair. Thus, the OT grammar does the
same job in speech perception that it does in linguistic theory: It compares (/UR/, [SR])
pairs and picks the most harmonic.
For example, in Pitt (1998)'s replication of the experiments of Massaro & Cohen
(1983), using nonword stimuli, there are no /UR/s to deal with, so the issue is decided by
the markedness constraints:
(3.20) Both endpoints legal => no grammatical bias
UR = • I OCP(CONT, COR) S pr ea d [C o r ]
a. (•, [hiae]) 1
b. (•, [blae]) |
93
(3.21) [I] illegal => [j ] bias
UR = • OCP( co n t , C or ) S prea d [C o r ]
a. (•, [Us])
b. (•, [tls]) *!
( 3 .2 2 ) [ j ] ille g a l = > [1] b ia s
UR = • OCP( co n t , C or ) S p r ea d [C o r ]
a. (•, [s j s ]) *!
b. (•, [sis])
In this model, the incoming acoustic signal is first transduced into one or more
surface phonetic representations, or [SR]s. The transducing mechanism is a black-box
component which in Figure 3.3 is labelled "Phonetic Parser"; it could also be called a
"Feature Extractor". Given a speech stimulus, it produces a set of [SRJs consistent with that
stimulus.
94
Figure 3.23. Architecture of an OT-grammatical-based parsing model
/UR1/ /UR2/ /UR3/ ... /URn/ Lexicon
OT grammar
[SR] Parse(s)
Phonetic
parser
ftr f+- Acoustic signal
I assume that under normal laboratory conditions, with a short stimulus clearly
spoken, the Phonetic Parser will emit a single [SR]. Two or more [SR]s can be coaxed out
of it by presenting an acoustically ambiguous stimulus. The likelihood that a stimulus

between, say, [i] and [I] will evoke [j ] is assumed to be independent of the likelihood that it
will evoke [1]; for a given stimulus level, there is a certain probability of getting [j ], a certain
probability of getting [1], a certain probability of getting both, and a certain probability of
getting neither. These probabilities change depending on the acoustic constitution of the
stimulus.
The candidate [SR]s are assumed to represent syllabification. It is not in dispute
that syllables can be incorporated into a prelexical representation, since nonsense words,
which lack a lexical representation, can be syllabified in off-line judgment tasks. The
question is whether the syllabic structure is automatically computed as part of the parsing
process. There is evidence that it is. Syllable boundaries are needed for segmentation and
lexical access, so they have to be marked in the input to the lexical-access stage. In on-line
95
word-spotting tasks, English listeners are better at finding a word boundary when it is
aligned with the left-hand boundary of a stressed syllable (McQueen et al. 1994, Cutler &
Norris 1988), which suggests that the input is parsed exhaustively into feet. A word
boundary is harder to find if a syllable boundary drawn at that point would create a
phonotactically impossible syllable (e.g., spotting apple in fapple) (Norris et al. 1997,
McQueen et al. 1998). Abstract grammatical preferences for maximal onsets, and for
syllabifying intervocalic consonants with the more stressed vowel, are also detectable in
word-spotting - generalizations which only make sense when stated in terms of syllables
(Kirk 2001) . ‘6
As in the Shortlist model of Norris (1994), lexical entries are activated by an [SR]
which is sufficiently similar to them. I will assume that "sufficient" similarity is determined
by the neighborhood metric; a lexical competitor is any lexical item whose /UR/ can be
obtained from one of the active [SR]s by a one-segment insertion, deletion, or replacement.
Competition between all of the active (/UR/, [SR]) pairs then takes place through the
grammar. 1718
16 Evidence against syllabification com es chiefly from sequence-monitoring tasks. These are known to be
sensitive to syllabification in certain languages. For example, M ehler, Dommergues, Frauenfelder, and
Segui (1981) found that French listeners detected a CV or CVC target faster when it exactly matched the
first syllable o f the stimulus word - ba being detected faster in ba.lance than in bal.con, but bal being
detected faster in bal.conthan in ba.lance. English speakers show no such difference, whether tested with
English (ba/bal in bal.cony and the ambisyllabic ba.lance/bal.ance) or with the original French materials
(Cutler, Mehler, Norris, & Segui 1986). The authors interpreted this to mean that English listeners do not
use on-line syllabification to segm ent speech, even in cases like bal.cony where the syllabification is
unambiguous. However, K irk (2001) argues instead that, owing to the effects o f stress on syllabification,
the first syllable o f balcony is bale, and hence that neither target m atched a syllable. For an extensive
critical review o f the evidence for and against on-line syllabification in English and other languges, see Kirk
(2001, Chapter 2).
17 Although this description is confined to the single-word stimuli actually used in the experiments, it can
be extended to longer utterances in a straightforward way. The Phonetic Parser emits one or more candidate
[SRjs, as before. Candidate word /UR/s are activated by sufficiently sim ilar substrings of the [SR]s;
candidate utterance /UR/s are the concatenations o f nonoveriapping word /UR/s. These utterance (/UR/,
[SR]) pairs then compete as before. This allows the theory to capture word segmentation and inter-word
sandhi phenomena.
18 For bilingual listeners, this model assumes that the grammar is selected before the utterance is parsed
phonologically. It is conceivable that bilingual listeners parse an incom ing utterance in both languages
simultaneously, and sem antically interpret it in whichever language yields the more harmonic phonological
parse. However, existing studies o f language identification suggest that infant and adult listeners use
rhythmic characteristics, rather than inventory or phonotactics (Stockm al, Moates, & Bond 2000; Nazzi,
Jusczyk, & Johnson, 2000). Languages with similar rhythm can be discrim inated by infants (e.g., Catalan
and Spanish, Bosch & Sebastian-G ali£s 1997), but it is not known what information is used.
96
It is at this point natural to posit that the grammar scores each pair and chooses the
most harmonic. This theory is elegant but fatally flawed. The root of the problem is that
such a literal implementation of OT inherits the OT principle called "Strictness of
Domination", which states that if Constraint A is ranked above Constraint B, then violating
Constraint A even once is less harmonic than violating Constraint B any number of times.
Strict domination is the only means which OT affords to represent one constraint's primacy
over another. Violating A cannot be just a little bit worse than violating B; it must be either
infinitely worse, equally bad, or infinitely better (Prince & Smolensky 1993:78).
One consequent prediction is that inactive markedness constraints should have just
as big an impact on perception as active ones. In order to account for cross-linguistic
patterns in the sound inventories of languages, OT posits certain universally fixed rankings
- e.g., that labial and dorsal articulations are universally more marked than coronal
articulations. In the grammar of English, which allows both labials and coronals, neither the
anti-labial constraint nor the anti-coronal constraint is active; however, they are still in the
grammar and one still dominates the other. A stimulus which is ambiguous between [ba]
and [da] therefore has two interpretations, one of which violates the constraint against
labials, the other of which violates the lower-ranked constraint against coronals. Since
labials are by hypothesis infinitely worse than coronals, perception should strongly favor
[da] over [ba]. This does not seem to always be the case: For example, when Luce (1986,
Ch. 3) presented listeners with a balanced set of CVC nonsense words in noise at a +5 dB
signal-to-noise ratio, he found that final [b] was reported as [d] 27 times out of 150, while
final [d] was reported as [b] 24 times. 19
In order to make the necessary distinction between significant and insignificant
markedness differences, we must stipulate that inactive markedness constraints carry little if
19 Syllable-initially, [b] was reported as [d] far more often than the reverse: 39 times versus 4 at a +5 dB
signal-to-noise ratio, 23 times versus 4 at -5 dB. Even syllable-final [b] was reported as [d] 15 times at a
-5 dB ratio, versus 6 times the other way around. It m ay be that the low markedness, or perhaps the high
frequency, o f [d] is having some sort o f effect. This effect is, however, not as overwhelming as expected,
and may in any case be due to the spectral quality o f the noise used (white Gaussian noise up to 4.8 kHz),
which is sim ilar to the diffuse-rising spectrum o f alveolar plosive bursts (Blumstein & Stevens 1979).
97
any perceptual weight. That is, the (/UR/, [SR]) pairs are evaluated only by that part of the
constraint hierarchy which ranks above the highest-ranked inactive markedness constraint.
A second problem that this proposal runs into is the same one which bedeviled
TRACE: the inability to adjust the influence of the lexicon based on attentional factors.
Lexical effects, such as the Ganong effect, would in this theory be captured by faithfulness
constraints:
(3.24) [d] nonword, [t] word => [t] bias
UR = /taesk/ ID-V oice
a. (/taesk/, [daesk]) *!
b. =>(/taesk/, [taesk])
(3.25) [d] word, [t] nonword => [d] bias
ID-Voice
II
a. => (/daej/, [daej])
b. (/daej/, [taej]) *!
The violation, and hence the predicted bias, is just as large regardless of how much
attention the listener allocates to the lexicon, contrary to the findings of Cutler et al. (1987).
This is only an apparent problem. All experimental results are obtained by
averaging over a large number of trials. Suppose that on each trial, the listener either
"attends to the lexicon" - i.e., insists on a parse with an /SR/ - or does not, on a trial-by-trial
basis. The task manipulations used by Cutler et al. (1987) can be seen as changing the
probability, rather than the extent, of the listener's attention to the lexicon on each trial, and
hence the number of lexically-biased responses which were averaged into the data.
The decision rule we must adopt is therefore:
98
(3 .2 6 )
a. If attending to the lexicon, and if the stimulus is close enough to a
real word to activate some /SR/s, choose the (/UR/, [SR]) pair which scores
best on the active constraints.
b. Otherwise, choose the ([SR]) which scores best on the active

markedness constraints (the faithfulness constraints have no function in the
absence of a /UR/).
c. Ties are broken randomly.
As in the Race model of Cutler et al. (1987), responses to phoneme tasks can be
based on either the computed [SR] or the retrieved /UR/, with task constraints dictating
which is favored in each case. There is laboratory evidence for the existence of both levels
and for attentional effects on them.
Xu (1991) showed that Mandarin Chinese speakers had poorer recall for written
lists of rhyming morphemes when the list elements shared the same tone than when they
differed in tone. Speakers were then asked to perform the same task with lists constructed
so that the first two items had the same surface tone but different underlying tones, and
performance was compared with lists in which the first two items had different surface and
underlying tones. Performance was worse on lists of the first sort, suggesting that the
short-term memory representation in this task was in terms of [SR]s.
On the other hand, Lahiri & Marslen-Wilson (1991), using a gating task, found that
listeners interpreted vowel nasalization differently depending on their native language:
English listeners took it as a sign of an upcoming nasal consonant, since English vowels are
not inherently nasalized, but become so in a nasal phonetic context. Bengali listeners, on the
other hand, speak a language which has both inherently (i.e., contrastively) nasalized vowels
and contextually nasalized vowels. They overwhelmingly interpreted vowel nasalization as
underlying (i.e., did not take it as a sign that a nasal consonant was coming up) until they
actually heard the beginnings of the nasal consonant. This suggests that the Bengali lexicon
represents contextually nasalized vowels as not nasalized, showing a difference between
99
lexical representation and surface phonetic representation which the gating task (an
inherently lexical task) revealed. It further indicates that the Bengali speakers were
choosing the more faithful (/UR/, [SR]) pair in which the underlying and surface vowel had
the same degree of nasalization over the less faithful pair in which they differed, as we
would expect.20
3.5. Summary
This chapter has presented three very different theories of phonotactics in speech
perception.
The TRACE model sees phonotactics as an effect of similarity to lexical items. Of
the three theories, it makes the smallest demands on the learner, requiring knowledge only
of the lexicon. Phonotactic effects are viewed as diluted lexical effects, in which permitted
configurations are supported by partially-overlapping lexical items, which allows them to
defeat competing illegal candidates via lateral inhibition at the phoneme level.
The MERGE TP model attributes phonotactic effects to differing frequencies of
short phoneme sequences. The theory requires knowledge of the lexicon and of a set of
attested phoneme sequences, which may be quite large but can be acquired straightforwardly
through observation. Phonotactic effects are taken to occur at a pre-lexical level, with rare
sequences being perceptually disfavored.
20 If the OT interpretation o f Lahiri & M arslen-W ilson’s results is correct, it is empirical evidence against
the OT principle o f Lexicon Optimization (Prince & Smolensky 1993, Inkelas 1994). Lexicon
Optimization is a means o f dealing with the source-filler nature o f the O T grammatical m odel, which can
map several /UR/s to the sam e [SR]. In acquiring the lexicon, it is asserted, the /U R/ which is chosen is
the one to which the observed [SR] is most faithful.
We would be led to expect Bengali speakers to represent surface [CVnN] words as underlying
/CV nN/, which map to the sam e output more faithfully than an underlying /CVN/ would. W e would
therefore expect a gating stim ulus o f the form [CVn ...] to often be com pleted with an N, i.e., matched to a
word whose underlying representation is /CV nN/. Instead, they were overwhelmingly matched to words
whose underlying representation was /CVKT/, suggesting that the surface [CVnN] words are underlyingly
/CVN/. The study's finding that speakers apparently lexicalize surface contextually nasalized vowels as
underlying non-nasalized vowels indicates they are not using Lexicon Optimization.
100
The OT grammatical model sees perceptual phonotactic effects as a consequence of
the limited range of parses available in the language, and the listener's bias towards a parsed
percept. The implementation used here requires knowledge of the lexicon, and of a set of
constraints. The number of constraints needed is probably not very large (a grammar of the
syllable onsets of English, in Chapter 2, needed well under 20), and the correct ranking is
provably leamable (Tesar & Smolensky 1995); however, their provenance is unclear. They
are normally taken to be innate, since the patterns they represent occur world-wide.21
Phonotactic effects are assumed to occur at a prelexical level, the level of surface
representations, with banned sequences being perceptually disfavored.
Each of these theories suffers from empirical drawbacks in one domain or another.
TRACE has difficulty explaining why phonotactic effects are more robust than lexical
effects. The MERGE TP model cannot be pinned down on precisely which phoneme
sequences are perceptually relevant; different choices leave different lab results unexplained.
The OT grammatical model accounts for effects of illegality, but not the apparent (usually
very small, but definitely detectable) effects of sequence frequency or as lexical
neighborhood (Newman, Sawusch, & Luce 1997; Pitt & McQueen 1998 Exp. 4).
The drawbacks of one model are, naturally, the advantages of the others. TRACE is
theoretically attractive because it offers an extremely parsimonious learning model. Because
sound-meaning relations are arbitrary, the lexicon must be learned in any theory. TRACE
says that only the lexicon must be learned, and that apparent effects of grammatical
regularity are really emergent properties of lexical interaction. MERGE TP is only slightly
less parsimonious - only the lexicon must be learned, but the relevant regularities have to be
actively abstracted from it by the probability-tracking system. Though both theories require
innate structure in the perceptual system, neither requires detailed innate knowledge the way
21 Moreover, since the constraints are violable and do get violated, they cannot be individually inferred from
the speech corpus by any simple m echanism — especially the markedness constraints, being prohibitions for
which no positive evidence can exist. (Naturally, a linguistically more sophisticated mechanism could take
advantage o f alternations to deduce abstract underlying forms and the markedness constraints necessary to
cause the alternations.)
101
the OT grammatical theory does. As a practical matter, it is also easier to make predictions
from TRACE and MERGE TP than from any grammatical theory, since less analytic depth
is required.
In CHAPTER 4, our focus will be on the interesting claim, put forth by the TRACE
and MERGE TP theories and denied by the OT grammatical theory, that phonotactic
illegality is equivalent to zero frequency. The claim is interesting because it suggests that
phonology, at least in perception, is considerably simpler than many linguists have hitherto
supposed, and offers a means of circumventing the difficult problem of grammar
acquisition.
3.6. Appendix: Computing frequencies
All frequency counts were made from the Celex lexical database (Baayen et al.
1995). This is based on a corpus of 16.6 million words of written English and about
800,000 words o f spoken English. Most of the corpus is from British sources, and the
phonetic transcriptions are British.
Celex provides two ASCII phonetic transcription systems. I used the one found in
Field 7 of the file EPW.CD. Variant pronunciations are given for some words, but I always
used only the first pronunciation listed.
Celex gives frequency counts by "lemma" (i.e., citation form, with know, knows,
knew, and knowing all lumped together) and by "wordform" (i.e., counting inflected forms
separately). In both cases, homophonous words belonging to different grammatical
categories are counted separately (e.g., link noun and link verb). I used the wordform
database (except where otherwise noted), specifically, the files EPW.CD (the
pronunciations) and EFW.CD (the frequencies).
Frequencies are counted separately for the written and spoken corpora. A
"combined" frequency count is also given; since most of the corpus is written, the
"combined" frequency is usually very close to the written frequency. I have used the
102
spoken counts except where otherwise noted (non-spoken frequencies are used only for
compatibility with counts based on the Francis-Kucera (1967) written-corpus norms). All
counts are from the Celex per-million-words estimates (combined, Field 6 ; written, Field 9;
spoken, Field 12; of EFW.CD).
The Celex transcription system marks syllable boundaries and includes stress marks
for primary and secondary stress. These were removed.
The scripts used to create and process the frequency counts are appended.
(3.27) Script for counting frequency of length-n sequences
# !/usr/local/bin/perl
# make_ngram_table
# usage: make_ngram_table <n>
# where <n> = # of segments per gram
# Each word is enclosed in wd boundary markers"(" and")", which

# count as phonemes.
$n = $ARGV[0];
$phon_db = '/tmp/Celex/EPW.CD';
$freq_db = Vtmp/Celex/EFW.CD’;
open (PHON, "< $phon_db") || die "Couldn't open $phon_db";
open (FREQ, "< $freq_db") || die "Couldn't open $freq_db";
while ( ($phon_buf = <PHON>) && ($freq_buf = <FR EQ » ) {
# read a record from EPW.CD and EFW.CD
($phon_IDnum, $phon_orth, $phon_freq_comb, $foo, $foo, $foo, $pron)
split AV, $phon_buf;
($freq_IDnum, $freq_orth, $foo,

$freq_comb, $freq_comb_dev, $freq_comb_perM, $freq_comb_loglO,
$freq_writ, $freq_writ_perM, $freq_writ_loglO,
$freq_spok, $freq_spok_perM, $freq_spok_loglO
) = split AV, $freq_buf;
( ($phon_IDnum = $freq_IDnum) && ($phon_orth eq $freq_orth)) ||

die "$phon_db mismatches $freq_db:\n$phon_buf$freq_buf';
# Purge pronunciation of non-segmental characters
103
($segment_pron = $pron) =~ trA-VYV/d;
# Find the n-grams and count their frequencies

@segments = ( (split ", $segment_pron),')’ );
foreach $i (0 ..($#segments - $n + 1 )) {
$ngram = join ", @segments [$i..($i + $n - I)];
$freq_comb_perMs {Sngram} += $freq_comb_perM;

$freq_writ_perMs {$ngram} += $freq_writ_perM;
$freq_spok_perMs {Sngram} += $freq_spok_perM;
}
# Print out the ngrams and their frequencies

foreach $ngram (keys %freq_comb_perMs) {
printf "%s\t%5d\t%5d\t%5d\n",
Sngram,
$freq_comb_perMs {Sngram},
Sfreq_writ_perMs {Sngram},
Sfreq_spok_perMs {Sngram};
I
(3.28) Script for turning those counts into TPs
#!/usr/local/bin/perl
# make_TP_table
# Given a list of n-grams and their frequencies, computes transitional

# probabilities from XlX2...X(n-l) to Xn.
# Input format is
# <Xl...X(n-l)Xn> <combined freq> <written freq> <spoken freq> etc.
# Output format is
# <Xl...X(n-l)> <Xn> <P(Xn | Xl...Xn-l), combined> <same, written>etc.
while (Sbuf = <STDIN>) {
(Sngram, @freqs) = split As+/, Sbuf;
@segments = sp lit", Sngram;

Slastseg = pop @segments;
Scontext = join ", @segments;
# Count occurrences of each context Xl...X(n-l), and of each

#ngram Xl...Xn.
foreach Si (0 ..$#freqs) {
$context_freqs [$i] {"Scontext"} += Sfreqs [Si];
$ngram_freqs [$i] {"$context$lastseg"} += Sfreqs [$i];
104
# Compute transition probabilities conditional on Xl...X(n-l).
foreach $ngram (keys %{$ngram_freqs[0 ]}) {
@segments = split", $ngram;

$lastseg = pop @segments;
Scontext = join", @segments;
foreach $i (0 ..$#freqs) {
$TPs [$i] = '(none)';
next unless $context_freqs [$i] {"Scontext"}; # avoid /0 errors
STPs [$i] = sprintf "% 6.3f’,
($ngram_freqs [Si] {"ScontextSlastseg"} / $context_freqs [Si]
{Scontext});
}
print "$context\t$lastseg\t";
print join "\t", @TPs;
print "\n";
}
(3.29) Script for finding the active cohort following a given phonological string
#!/usr/local/bin/perl
# cohort
# Given a phoneme string, find all words in Celex EPW.CD/EFW.CD

# which begin with that string. Print each word and its per-
# million frequencies.
# Usage: cohort <string>
Sbeginning = shift @ARGV;
open (PHON, "cat /tmp/Celr;x/EPW.CD |") || die "Couldn’t open

EPW.CD";
open (FREQ, "cat /tmp/Celex/EFW.CD |") || die "Couldn't open EFW.CD";
while (($phon_buf = <PHON>) && ($freq_buf = <FREQ>)) {

(Sphon_fDnum, $phon_orth, $phon_freq_comb, Sfoo, Sfoo, Sfoo,
Spron) =
split AV, $phon_buf;
(SfreqJD num , $freq_orth, Sfoo,

$freq_comb, $freq_comb_dev, $freq_comb_perM, Sfreq_comb_loglO,
$freq_writ, Sfreq_writ_perM, Sfreq_writ_loglO,
$freq_spok, $freq_spok_perM, $freq_spok_IoglO
) = split AV, $freq_buf;
( ($phon_IDnum = SfreqJD num ) && ($phon_orth eq $freq_orth)) ||

die "$phon_db mismatches $freq_db:\n$phon_buf$freq_buf’;
# Purge pronunciation of non-segmental characters
105
($segment_pron = $pron) =~ trA-\Y//d;
# Is it in the cohort?
next unless ($segment_pron =~ /A\Q$beginning\E/);
# Yes —print
printf "%s ”, Sbeginning;
printf "%6 d ", $freq_comb_perM;
printf "%6 d ", $freq_writ_perM;
printf "%6 d\t", Sfreq_spok_perM;
printf ”%s\t", $freq_orth;
printf "%s\n", $segment_pron;
)
(3.30) Script for simulating statistically-based guessing
# !/usr/local/bin/per!
# simulated_guess
# Simulated experiment, illustrating the usefulness of TPs. A subject

# hears a corpus of English (a list of words, each word selected from
# Celex such that the English vocabulary occurs with its empirical
# frequency). At random, infrequent intervals, a word is truncated
# after at least (n-1) segments. The listener predicts the next one
# by consulting a table of n-grams, and guessing that the next segment
# will be whatever is most likely to follow the last (n- 1 ) segments
# of the stimulus.
# Input is output of make_ngram_table, awked to: <gram> <freq>
# Output is the expected proportion of trials on which the subject

# guesses correctly.
# Count frequencies
while (Sbuf = <STDIN>) {

(Sgram, Sfreq) = split As+/, Sbuf;
@segs = split", Sgram;

Snextone = pop @segs;
Scontext = join ", @segs;
Stotfreq += Sfreq; # frequency with which ngrams occur

Scfreq {Scontext} += Sfreq; # freq of ngrams starting with this (n- 1 )
gram
Sgfreq {Sgram} += Sfreq; # frequency of this ngram
# Keep track of likeliest ngram beginning with each (n-I)gram

$current_best = $best_guess {Scontext};
if (Sfreq > Sgfreq {"$context$current_best"}) {
$best_guess {Scontext} = Snextone;
}
}
$n = length (Sgram);
106
# Print guessing strategy
foreach Scontext (keys %cfreq) {
print "Scontext $best_guess{ Scontext }\n”;
}
print "\n";
# Simulate experiment
srand (time());
$CELEX_SIZE = 18000000;
STRIALS = 100000;
open (EPW, "cat/tmp/Celex/EPW.CD | ") || die "Couldn’t open EPW";

while (Sbuf = <EPW>) {
(Sfoo, Sorth, $comb_freq, Sfoo, Sfoo, Sfoo, Spron) = split AV, Sbuf;
next unless $comb_freq;
$seg_pron = ";
TRIAL: for (Si = 1; Si <= $comb_freq; $i++) {
# Each word gets as many lottery tickets as its frequency,

# and each ticket has $TRIALS/$CELEX_SIZE chance to win. This
# insures that any given word has its natural probability of
# being used on any given trial, and that the expected # of
# trials is STRIALS.
$r = int (rand ($CELEX_SIZE));

if ( $r >= ($CELEX_SIZE - STRIALS)) {
unless ($seg_pron) {
($seg_pron = Spron) =~ trAY\-//d;
$seg_pron = "(" . $seg_pron .")";
last TRIAL if (length ($seg_pron) < $n);
}
print "Sorth Si $seg_pron ";
$ngram_start = int (rand (length ($seg_pron) - $n + 1 ));

Sngram = substr ($seg_pron, $ngram_start, $n);
Scontext = substr (Sngram, 0, $n-l);
Snextone = substr (Sngram, - 1 );
print "Scontext $nextone\n";
$total_trials++;
$correct_trials++ if (Snextone eq $best_guess {Scontext});
}
}
}
print "$n: $correct_trials right out of $total_trials: ";

printf "%6.3f\n", $correct_trials/$total_trials;
107
CHAPTER 4
EMPIRICAL TESTS
4.1. Introduction
The previous chapter described three contrasting theories of the origin of
phonotactic effects in phoneme tasks. TRACE (McClelland & Elman 1986) holds that
phoneme perception at the very lowest level is directly influenced by the downward spread
of activation from the lexicon. The MERGE TP theory (Pitt & McQueen 1998) assigns the
phonotactic effects to a prelexicaJ level of processing which is sensitive to the frequencies of
sub-word phoneme sequences. A performance theory that uses grammatical knowledge,
implemented using Optimality Theory (Prince & Smolensky 1993), attributes phonotactic
effects to the restrictions placed by the sound pattern of the language on the set of available
parses.
In this chapter, we will present empirical evidence bearing on these theories,
focusing on how each theory distinguishes "legal" from "illegal" sequences.
Experiment 1, aimed at distinguishing TRACE from the MERGE TP and OT
grammatical theories, shows that the size of the phonotactic boundary shift is not modulated
by degree of phonological overlap with existing words. This is unexpected in TRACE,
which takes phonological overlap to be the source of all phonotactic effects. It is expected
in MERGE TP and the OT grammatical theory - in both of which the phonotactically
relevant context is too small to include the region manipulated to produce overlap.
Experiment 2, intended as a replication of Experiment 1 with very different stimuli,
attempted to find a phonotactic effect of the markedness of word-initial [pw] in English. No
such effect was found. This is unexpected under the TRACE and MERGE TP theories,
since the statistical properties of [pw] are very similar to those of [tl] and [s j] (the clusters
used by Massaro & Cohen 1983). A natural explanation in the OT grammatical theory is
108
that [pw], although rare in English, is not illegal - it does not violate an active markedness
constraint.
Experiment 3 was therefore performed to compare directly the phonotactic
effectiveness of the markedness of [pw] and [tl]. The results indicated that [pw] was in fact
less disfavored than [tl]. This result is expected in the OT grammatical theory, but is quite
unexpected in MERGE TP. Experiment 3 also replicated Experiment 1 in finding no
modulating effect of the degree of phonological overlap with existing words, contrary to the
predictions of TRACE.
Where Experiments 1-3 looked at the effect of stimulus variables on response
variables, Experiments 4 and 5 investigated the dependency between different responses.
Listeners heard stop-sonorant clusters in which both consonants were ambiguous, and
judged both. The effect of the stop judgment on the sonorant judgment was assessed
separately for each individual stimulus in a 6 x6 array, providing a measure of phonotactic
bias when all acoustic factors were completely fixed. The sonorant in both experiments was
an "l"-"w" scale.
Experiment 4, using CCV stimuli, found that a "d"-or-"g" decision affected the odds
of an "I" response, with ”d" making "1" less likely, while a "b"-or-"g" decision had no
effect. This confirms that the results of Experiments 2 and 3 (the smaller bias against [pw]
than [tl]) were not due simply to closer perceptual spacing of the stimuli at the labial end of
the scale. The existence of a response dependency is inconsistent with the TRACE
response mechanism; the larger effect of the "d"-or-"g" decision is unexpected in the INC-1
version of MERGE TP.
Experiment 5 compared the effect of a "d"-or-"b" decision in CCV stimuli with that
in VCCV stimuli. There was a strong effect in the CCV condition, but none in the VCCV
condition, indicating that the weakness of the ban on [pw bw] found in Experiments 2-4 was
not due to compensation for coarticulation, and suggesting that the parser determines
segmental identity and syllabification in parallel.
109
Experiments 6 ab examined the perceptual effects of an abstract morpho-
phonological variable, lexical stratum membership in Japanese. Phonological cues to
stratum membership were found to cause the phonotactics of the particular stratum to be
imposed upon ambiguous stimuli, causing a perceptual boundary shift. This has a natural
account in the grammatical theory, where lexical stratum is a necessary theoretical entity. It
is unexpected in TRACE and the MERGE TP theory, for both of which a division of the
lexicon into strata is unmotivated. It is shown that the stratum-phonotactic effect cannot be
emergent in TRACE, because it is weaker than a lexical effect obtained with the same
subjects and paradigm. It cannot be emergent in the MERGE TP theory either, because
perception of the ambiguous segment is influenced by other segments which are too far
away for MERGE TP to connect them.
The results are argued to support the OT grammatical theory over TRACE and
MERGE TP.
4.2. Experiment 1: Sequence frequency and the phonotactics of word-final lax
vowels
4.2.1. Rationale
If phonotactic effects are really lexical effects, as claimed by TRACE, then their size
should be modulated by the same factors that control the size of lexical effects: Similarity
to existing words, and number of similar words. If that is so, a nonword that is similar to
many frequent words should induce stronger phonotactic effects than one that is similar to
few words and rare.
For this, we can exploit the phonology of the English lax vowels. The lax vowels, [t
e ae a o], form a separate system from the other "tense" vowels of English, both
phonetically and phonologically (see, e.g., Ladefoged 1993:86-88).
110
Like all American English vowels, the lax vowels are somewhat diphthongal, but
where the tense vowels are peripheralizing diphthongs, the lax vowels are centralizing; that
is, the offset of a tense vowel is further from schwa than the onset, but the offset of a lax
vowel is closer to schwa than the onset. The lax vowels are also somewhat shorter and less
strongly diphthongized (Nearey & Assmann 1986; DiBenedetto 1989).
Phonologically, the lax vowels are distributed differently from the tense vowels. The
lax vowels do not occur in word-final open syllables;
Table 4.1. Distribution of tense and lax vowels in American English
Tense vowels Lax vowels Open syllables
i he
-
i
ei hay
e -
ae -
a pa
0 paw
ou hoe
u -
A -
111
Not only do lax vowels not occur there, they can not occur there. The intuitive
badness of such nonwords as [hi he hae hu hA] is quite strong1. That the gap is
phonological rather than lexical is illustrated by the change of lax to tense vowels when
words are coined by truncation:
Table 4.2. Change of lax to tense vowels when made final by truncation
Base Truncated form
del[i\catessen del[i]
Mun[i\cipal Transportation Authority Mun[\]
Un[i]versity Un[ i]
D[l]rdre D\\]d[i)
TRACE ought to be able to model the lack of word-final lax vowels. The
phonotactic ban should emerge from the large population of words ending in tense vowels
and the nonexistence of any words ending in lax vowels. Activation spreading from the
tense-final words should shift the phonotactic boundary on a word-final [i]-[i] continuum,
compared to a baseline condition where the [i]-[i] continuum is not word-final.
However, since it is similarity to real words that produces the lexical activation, the
size of the shift should be larger when the rest of the phonological context (i.e., material
besides the [i]-[i] itself and the immediately following segment or boundary) is similar to
more existing words. TRACE does not distinguish between parts of the stimulus that are
1 Lax vowels can occur syllable-fmally in onomatopoea: bleah [bis], baa [bee:] (sound made by a sheep).
M arginal phonology is often found in this dom ain; e.g., boing [boir>] (sound made by a spring), which has
both a diphthong before [q] and a non-coronal after [01 ].
112
relevant to the phonotactic generalization, and parts which are not. Everything in the
stimulus can contribute through lexical activation.
The MERGE TP theory focuses on a smaller part of the stimulus. One version,
which we called INC-1 in §3.2.3.3, considers separately the ambiguous segment and the
immediately preceding segment on the one hand, and the ambiguous segment and the
immediately following segment on the other. A phonotactic boundary shift is predicted to
arise because of the rarity of the [lax vowel]-[word boundary] sequence. Another version,
which we called SC-1, considers the preceding, ambiguous, and following segments as a
unit. For any choice of preceding segment, SC-1 also predicts a phonotactic boundary shift,
owing to the rarity of the [preceding segment]-[lax vowel]-[word boundary] sequence.
Stimulus context more than one segment away from the ambiguous vowel does not enter
into the calculations and should not affect judgments.
The OT grammatical theory relies on the grammatical illegality of syliable-final lax
vowels in English. The markedness constraint against them, which I will call *Lax ]o , is
able to trigger repairs, and hence dominates the faithfulness constraint IDENT-V. Since
*Lax]<j is an active constraint (in the sense of §3.4.2), it is expected to penalize any parse
of the stimulus which postulates a syllable-final lax vowel, producing, in acoustically
ambiguous cases, a bias towards parses with a tense vowel. The constraints apply equally to
all phonological configurations meeting their structural description. Hence, this theory
predicts that only phonological context directly involved in the phonotactic prohibition (the
vowel and the immediately following segment or boundary) will contribute to the
phonotactic boundary shift.
4.2.2. Design
The aim was to test the prediction of TRACE that the size of the phonotactic effect is
determined by the similarity of the stimulus to words in the lexicon. Listeners judged the
113
middle 5 steps of a 7-step continuum from the tense [i] to the lax [i] in each of 16 carrier
contexts:
The phonotactic legality of the [i] endpoint could, as 4.3 shows, be varied by leaving
the final syllable open or closing it with [d3 ]. Since the segment preceding the ambiguous
vowel was always [j ], the TP theories (both INC-1 and SC-1) and the OT grammatical
theory expect only the open-closed manipulation to affect the location of the [i]-[i]
boundary.
Similarity of the stimulus to other words in the lexicon was varied by manipulating
the voicing of the consonant preceding the [j]: When the consonant was [g], the stimulus
was closer to more words than when it was [k]. Table 4.4, extracted from Celex's wordform
database (EPW.CD and EFW.CD), shows the English words ending in each of the eight
final syllables used in this experiment.2 Celex's British English transcriptions have been
Americanized in four cases by changing word-final [i]s to [i]s. 3
2 For this experiment, counts were computed over wordforms rather than lemmas because the relevant
dependency (between a vowel and a word boundary) is affected by inflection. In other experiments, which
used initial clusters, the lemma and wordform frequencies are the same.
3 Celex has final [i] for angry, hungry, kukri, and mimicry, and, in general, for final unstressed IU
elsewhere. Jones (1997) gives [i] as both the BrEng and AmEng pronunciation o f all o f these words (except
kukri, which isn't in that dictionary).
114
Table 4.3. Phonotactics of stimuli for Experiment 1
C a rrie r
[i] [I ]
[z o lg jJ V X
M g jJ V X
[p a lg jJ V X
M g iJ V X
[z o lk i_ J V X
[s o lk ij V X
[p o lk jJ V X
[to lk ij V X
[z a lg j_ d 3 ] V V
[s a lg i_ d 3 ] V V
[p o lg j_ d 3 ] V V
[to lg i_ d 3 ] V V
[ Z 3 lk i_ d 3 ]
V V
[s o lk j_ d 3 ] V V
[p o lk j_ d 3 ] V V
[ta lk i_ d 3 ] V V
Table 4.4. Frequency of the syllables in stimuli for Experiment 1
Frequency (per million words)
Word Combined Written Spoken
[gji]4
angry 65 68 19
hungry 34 36 3
agree 20 16 75
bachelor's degree 0 0 0
degree 105 100 170
disagree 2 2 14
filigree 1 1 2
first-degree 1 1 1
pedigree 2 2 2
second-degree 1 1 0
agree 20 16 75
disagree 2 2 14
agree 20 16 75
disagree 2 2 14
agree 20 16 75
disagree 2 2 14
[gnl
decree 0
4 Words with these onsets were extracted from Celex using the t r o h o c script. See the appendix to
Chapter 3 for the script and details o f how frequency counts were made.
116
Word Combined Written Spoken
decree 0 0 0
scree 1 1 0
decree 0 0 0
decree 0 0 0
decree 0 0 0
[kn]
[gjid3 ]
[gjid3 ]
[kiid3 ]
[kjid3]
NOTE: Some forms appear more than once, because they are homophonous but
morphologically different: (to) agree, (I) agree, (we) agree, etc. Celex divides form
frequency equally among the homophones (Bumage 1995).
Table 4.5 shows the total word-final frequencies of each critical syllable:
117
Table 4.5. Effect of the [k]/[g] manipulation on the frequency of the word-final syllables in
the stimuli of Experiment 1
Final syllable Combined Written Spoken
297 281 553

[gj»]
0 0 0
[gji]
6 6 0
[kii]
[kn] 0 0 0
0 0 0
[giid3]
0 0 0
[gnd3]
[Iuid3 ] 0 0 0
[kiid3 ] 0 0 0
The closed syllables are very infrequent — they are not represented in Celex at all.
Both [i] and [I] are phonotacticaliy permissible in them, since they are closed. These
syllables provide a baseline context in which to assess the statistical effects, if any, of the
preceding [gj] or [ki] context on the [i]-[i] boundary location.
The legal open syllable [gii] is far more frequent than the illegal [gji], especially in
word-final position. TRACE expects the large frequency difference to produce a boundary
shift, as the many words containing final [gii] feed activation down to support [i] over [i].
118
The frequency difference between [kji] and [ku] is much smaller, leading to a smaller
predicted boundary shift. The MERGE TP and OT grammatical theories treat the [g]/[k]
manipulation as irrelevant - it lies outside the statistically relevant context in MERGE TP,
and outside the structural description of *L ax ]<j in the OT grammatical theory - so both
theories expect equally large shifts in either case.
4.2.3. Predictions
This section spells out in more detail the predictions of the three theories, with
special attention to TRACE.
4.2.3.1. TRACE simulation
It is notoriously hard to predict how a network will behave - they are just too
complicated. To check that TRACE really did make the predictions outlined in the last
section, a simulation was run.5 It confirmed our expectations.
4.2.3.1.1. Calibration and replication of the original TRACE results
The TRACE architecture has many adjustable variables, controlling things like the
speed with which activation spreads and the relative weight given to the different node
layers. The first step in the simulation was to get the right parameters and replicate the
results of McClelland and Elman (1986). After some trial and error, the following were
selected (based mostly on McClelland and Elman's Table 2):
Thanks are due to Jeff Elman, one o f TRACE'S creators, for sharing his software.
119
Table 4.6. Parameter settings for the TRACE simulation (all experiments)
Parameter Value Function
max 1.0 0 Maximum activation level of
a unit
min -0.30 Minimum activation level of
a unit
imax 3.00 Maximum incoming
activation to a unit
fscale 0 .0 0 Turns off effects of word
frequency
alpha[IF| 1 .0 0 Input-to-feature gain
alpha[FF] 0.04 Feature-to-feature inhibition
alpha[FP] 0 .0 2 Feature-to-phoneme
excitation
alpha[PP] 0.04 Phoneme-to-phoneme
inhibition
alpha[PW] 0.05 Phoneme-to-word excitation
alpha[WW] 0.03 Word-to-word inhibition
alpha[WP] 0.03 Word-to-phoneme excitation
alpha[PF] 0 .0 0 Phoneme-to-feature
excitation
alphafPFC] 0 .0 0 Feature-to-phoneme
coaiticulation
These led to a near-perfect replication of McClelland and Elman's simulation of the
Massaro and Cohen (1983) experiment. The network was given as input the string /sLi/,
120
where !\J was featurally ambiguous between 111 and /!/. After 51 cycles, the network display
corresponded almost exactly to McClelland and Elman's (1986) Figure 7, Panel 3, with /l/
about twice as active as 111 among the phoneme nodes and sleep and sleet the most active
words:
Figure 4.7. Results of the TRACE simulation replicating Figure 7 of McClelland and
Elman (1986)
CYCLE 51
sis
sil
sit
si 8 24
sid
sik
slip 31 14
slit 31 14
su
~s
15 36 36 17
6 8 26
20
38
10 33 67 67 33 10
The horizontal axis represents time; the words and phonemes are arrayed along the
vertical axis. TRACE has a separate unit for each word or phoneme at each time cycle
121
(corresponding to the hypothesis that the utterance contains that word or phoneme
beginning at that time cycle). The activation level of each unit at the present time. Cycle 51,
is shown by a number if it is greater than zero. So, for instance, the network at this moment
is confident, to a degree of 6 8 (arbitrary measurement units out of 99), that there is an [s] at
time slice 12, and to a degree of 31 that the signal contains the word sleep, beginning at time
slice 12.
As a second check, the network was asked to process the input /Tluli/, where /T/ is
ambiguous between the phonotactically legal /p/ and the illegal, but lexically favored, /t/
(supported by the nearby word truly). By Cycle 54, we see an output almost identical to
McClelland and Elman (1986)’s Figure 8 , Panel 3: /truli/ is the leading word, and the
leading candidate for the cluster is the unphonotactic [tl] - the [t] being the lexically favored
disambiguation of the ambiguous ‘p /t\ and the [I] being the phonetically supported parse of
the unambiguous input.
122
Figure 4.8. Results of the TRACE simulation replicating Figure 8 of McClelland and
Elman (1988)
CYCLE 54
30
it
lid
lig
lip
lis t
ru l 15
tr u p
tru
tru li 45 20
16 37 37 16
42
48
41 28
57
20
51 17
25
10 32 67
4.2.3.1.2. Simulation of the present experiment
The original TRACE has only four vowels: (i a u A). These are distinguished one
from another by the features DIFfuse (evidently F2-F1: 8=high, l=low), ACUte (evidently
F2: 8=front, l=back), and POWer (8 for [i a u], 7 for [A] (which is both [a] and [a])).
123
Table 4.9. Featural parameters of the four original TRACE vowels (McClelland & Elman
1988)
Vowel DIF ACU POW
i 8 8 8
a 2 1 8
u 6 2 8
A 5 1 7
A new phoneme was added, corresponding to IPA [i], and a new ambiguous
phoneme between [i] and [t]:
Table 4.10. Featural parameters of the new vowels [1] and [X]
Vowel DIF ACU POW
i 8 8 8
X 8 7 8
I 8 6 8
In tests with the lexicon turned off, IX/ was found to activate /i/ and IV equally.
The original TRACE lexicons only included words whose vowels were drawn from
the set /i a u A/. To create a new lexicon including /I/-bearing words, I took the CELEX
English lemma pronunciation Tile EPL.CD, merged it with the CELEX English lemma
frequency file EFL.CD, and culled therefrom the words having all of the following
properties:
124
1. They contained only the phonemes in the modified TRACE (the original
complement + [I], with TRACE [A] corresponding to CELEX [V] and [@] (IPA [a ] and
[3 ])-
2. Since TRACE does not support the [d3 ] phoneme, it was simulated with [S],
CELEX words containing |J] were excluded. Those containing [d3 ] were included, with the
[d3 ] recoded as [S],
3. They occurred at least 16.7 times per million in the combined spoken and written
corpus of CELEX. (McClelland and Elman used a 20-per-million Kucera-Francis cutoff,
but I had to go lower so that this lexicon would be of similar size to their lexicon s le x ) .
The stress and syllabification marks were stripped, the phoneme codes were
converted, and homophones were collapsed into a single entry, with the frequencies added
together. Celex's British-English coding of final [i], as in angry, was converted to the
American English [i]. BrEng [a:] was converted into AmEng [at].
The resulting lexicon contained 241 lemmas (making it about the same size as the
original slex, which has 213). Of them, 5 words had /gji/, 3 of them finally: agree, degree,
disagree, Greek, and Greece/grease, while 1 word had /k_ii/, nonfinally: creep. It contained
no words with [giid 3 ] or [kiid 3 ]. There were no words with /gji/, but 5 had /kii/: cricket,
critic, secret, secretly, and script.
TRACE can be made sensitive to word frequency. Since McClelland and Elman
(1986) did not use this feature in their simulations, I did not use it in this one.
Simulations with this lexicon showed that TRACE did not distinguish between the
open- and closed-syllable contexts at all. There was no difference in activation level
between [i] and P], regardless of whether the vowel was word-final or not, because TRACE
125
was insufficiently sensitive to silence as a word-boundary cue. To remedy this, the TRACE
symbol for silence was added to the end of each word in the lexicon, where it acted as
another phoneme.
To establish a baseline free of lexical bias, the network was probed with the
nonwords [salgixf] and [sdlkiXJ]. Because of their sparse lexical neigbhorhoods, we
expect little lexical influence, and we expect the [X] to be equally ambiguous in both
contexts. As a measure of ambiguity we will use the difference between [i] and [I]
activation on Cycle 75 (about one syllable's length after the stimulus offset on Cycle 54).
This point was chosen because trial showed that no major changes in the relative activation
levels of the phoneme units happened later; rather, after Cycle 75, overall network activation
tended to decay towards zero.
In the [gj_d3 ] condition, [i] leads [I] by 43 to 36; in the [kj_d3 ] condition, [1] leads
[i] by 40 to 34. This represents a modest but definite bias, due mostly to the influence of
Greek, grease/Greece, and greet. The [I] interpretation is supported in both cases by the
lexical item ridge, but in the [gj_d3 ] condition ridge's activation is reduced by inhibition
from Greek, grease/Greece, and greet.
We now make the same comparison for the critical cases [salgjX] and [salgjX],
where a phonotactic bias is expected.
126
Figure 4.11. Results of the TRACE simulation for the input [sslgjXf]
CYCLE 75
42 45 48 51 54 57 60 63 66
S i-
Su-
d- 31
g rik -
g ris - 38 20
g rit-
l- 31
rlS - 29 31
s llk -
s il-
22
63
59
63
60 19
43
36
61
22 61
127
Figure 4.12. Results of the TRACE simulation for the input [solkiXfl
CYCLE 75
II-
It-
S i- 10 16 10
Su- 10 16 10
d- 31
l- 31
k rip - 20
k ru d -
rlS - 3 6 39
rid -
s llk - 42
61 20
61
61 21
34
40
60
22 61
128
Figure 4.13. Results of the TRACE simulation for the input [sslgjX]
CYCLE 75
d- 31
g rik - 29
g ris - 29
g rit- 29
31
k ru -
li-
rlS -
rid -
s llk -
s il-
23
64
61
60 20
22 61
129
Figure 4.14. Results of the TRACE simulation for the input [sollciX]
CYCLE 75
27 30 33 36 39 42 45 48 51 54 57 60 63 66
II-
It-
d- 31 16 15 19 19 20 13
l- 31 18 30 15 19 19 20 13
it-
k rip - 29
k ru - 28 27
k ru d -
li-
rlS -
rid -
sllk- 38
70
61
59
61 25
42
33
60
22 61
TRACE favors [i] over [I] by 52 to 26 in the [gj_] condition (thanks to the support
of agree), and by 42 to 33 in the [kj_J. There is a phonotactic bias towards [i] in both
cases, but it is much larger in the [g jJ than the [kj_] condition, as we had supposed, owing
130
to the lack of words ending in [kj_]. If we take the difference in activation level between the
[i] and [I] units as our predictor of effect size, we expect that the proportions of [I]
responses evoked by the same ambiguous vowel in the different contexts to be ordered as
follows: The most [I] responses in [ki_d 3 ] (difference = +6), then [gj_d3 ] and [kj_] (-7
and -9), then [ g jJ (-26).
What has happened is that words containing a non-final [gji], [kii], [gn], and [kn]
become partially activated, and provide the same amount of top-down support, regardless of
whether the final syllable of the stimulus is open or closed. However, the open-syllable
condition also allows the population of similar [gji]-final words to assist [i]. (In this
example, with a restricted lexicon, that population is limited to agree, but as we have seen,
there are many more.) Since there is no comparable population of [kji]-final words, [i]
receives less assistance in the [lu_] condition than in the [gj_] condition.
4.2.3.2. MERGE TP
There are two versions of the MERGE TP theory to be considered: INC-1, which
treats preceding and following context separately, and SC-1, which treats them together. As
we discussed in 1.3, the theories' predictions are made using different decision variables.
When a segment ambiguous between x and y is judged in the context A_B, the boundary
shift will depend on:
(4.15) INC-1 theory (independent neighboring contexts of one segment)
Pr (x | A_) * Pr (x | _B) compared to Pr (y | A_) * Pr (y | _B)
131
(4.16) SC-1 theory (surrounding context of one segment each way)
Pr (x | A_B) compared to Pr (y | A_B)
In this experiment, the context A is [j ], and the context B is either the word
boundary [#] or [d3 J. Context counts were made from the Celex wordforms database
(EPW.CD) using the combined written and spoken frequencies.
Both versions, in this experiment, make the same predictions.
4.2.3.2.I. INC-1 context
fril and fnl. All words coded as containing [ji] or [it] were extracted. There were
4117 words with [ii], occurring 124902 times in the 18-million-word corpus, and 12150
words with [j i ], occuring 493281 times. To correct for Celex's coding of American English
final unstressed [i] as [i] (as in angry), the 918 words ending in [j i ], occurring 122617
times, were transferred to the [ji] group. The resulting frequency counts were
Table 4.17. Diphone frequencies for the stimuli of Experiment 1
Frequency Frequency
Sequence Number of words (18 Mwd corpus) (per million)
5035 247519 13751

[Ji]
11232 370644 20591

[Ji]
Since [j ] occurs 108263 times per million words in Celex (EPW.CD, combined
written and spoken), the probabilities are
132
(4.18)
Pr ([i] | [i_D = 0.127

Pr ([i] | [J_D = 0.190
fi#] and fi#). The latter does not occur at all in American English. To estimate the
frequency of the former, all words in Celex (wordforms, EPW.CD) coded as ending in [i]
or [i] were extracted, and the [i]-final words were recoded as [i] (to correct for Celex's
British-English transcription of words such as angry). A total of 7611 words were found,
with a total frequency of 2422130 in the 18-million-word corpus, or 134563 per million.
Since the word-boundary [#] occurs one million times per million words, the probabilities
are
(4.19)
Pr([i] | [_#]) = 0.135

Pr,^] | [_#]) = 0.000
fidxl and fidx!. Following the same procedure, we find 133 words with [id3 ],
occuring 5965 times in the 18-million-word corpus or 332 times per millon words, and
1340 words with [id3 ], occuring 62103 times in 18 million words or 3450 times per million
words. Since [d3 ] occurs non-initially 12483 times per million words, the probabilities are
133
(4.20)
Pr ([i] | [_d3]) = 0.027

P r([i]|[_ d 3]) = 0.276
Thus, the decision variables for the INC-1 theory are
(4.21)
Pr ([i] | [J_])* Pr ([i] | [_#]) = 0.127 * 0.135 = 0.017

Pr ([i] | [ i j ) * Pr ([i] | [_#]) = 0.190 * 0.000 = 0.000
Pr ([i] | [J_])* Pr ([i] | [_d3]) = 0.127 * 0.027 = 0.0034

Pr ([i] | [ iJ ) * Pr ([i] | [_d3]) = 0.190 * 0.276 = 0.052
The INC-1 theory predicts that [i] will be strongly favored in the open-syllable
context, and that [ij will be favored in the closed-syllable context. If the closed-syllable
context is taken as a baseline, we should observe a strong phonotactic shift in favor of [i]
when in the open-syllable context. The rate of [I] responses across the continuum should
be ordered [gj_d3] = [kj_d3] > [gj_] = [ki_].
4.2.3.2.2. SC-1 context
fri#] and [j i # ] . The latter does not occur in American English. The former is found
in 12249 words in Celex wordforms (EPW.CD) (coded as [ji#] in 99 cases like agree, and
as [ji#] in 12150 cases like angry) with a total frequency of 32795 occurrences in the 18-
million-word corpus, or 7952 times per million. Since sequences of the form [j X#] occur
11407 times per million words, the probabilities are
134
(4.22)
Pr ([J i# ] | [J _ # ]) = 0.697
Pr ([ j i # 1 | [J _ # ]) = 0.000
fjid-^1 and fiithl. The former is found in 53 wordforms with a total frequency of 94
per million; the latter in 399 wordforms with a total frequency of 956 per million. Since
there are 1581 occurrences of [iXd3] per million words, the probabilities are
(4.23)
Pr ( [ J id 3 ] | [i_d3]) = 0.059
P r ( [ n d 3 ] |[ J _ d 3 ]) = 0.605
Here again, the theory predicts a strong shift in favor of [i] in the open-syllable
context, and a shift in the other direction in the closed-syllable context. If the closed-
syllable context is taken as a baseline, we expect a large boundary shift in favor of [i] in the
open-syllable context. The rate of [I] responses across the continuum is expected to be
ordered [gj_d3] = [kj_d3] > [gj_] = [k jJ .
4.2.3.3. OT grammatical theory
When a stimulus is presented which is ambiguous between, e.g., [sA lgri] and
[sAlgri], the Phonetic Parser may emit the parse [sA l.g ri], the parse [sA i.g ri], or both, with
probability depending on how close the stimulus is acoustically to [i] or [i]. If both parses
are emitted, they will be scored with respect to the active constraints of English, and the
more harmonic one will be processed first.
135
Since the grammar o f English forbids final lax vowels such as [I], there must be an
active markedness constraint against it, which we may call *LAX)<j:
(4 .2 4 ) *LAX]<j
Award one mark for every lax vowel in an open syllable.
For our hypothetical example, this will be the only active constraint which
distinguishes the two parses:
(4.25) [i] illegal => [i] bias
UR = • La x Jc
a. (•, [SAlgri])
b. (•, [sA lg n ]) *!
Since the more harmonic parse is processed first, responses will tend to be based on
the [i] parse, creating a bias towards [i] response. Since the bias is caused by Lax ]<j , it will
be present to the same degree in any stimulus meeting that constraint's structural description
- specifically, to the same degree in the [gr_] contexts as in the [kr_] contexts.
In the closed-syllable conditions, where both [i] and [I] are legal, no active
markedness constraints are violated by either parse, so no bias should be observed - and,
naturally, the [g]/[k] manipulation should again have no effect.
The predicted order o f the proportion of [I] responses is therefore [gj_d3 ] =
[kj_d 3 ] > [gj_] = [kj_], just as in the MERGE TP theory.
136
4.2.4. Methods
Paradigm. The task was an AXB judgment. Listeners heard one endpoint of a
continuum, then an intermediate stimulus, then the other endpoint, and judged whether the
intermediate stimulus (X) sounded more like A or more like B. Response was by button
press. Every AXB was also presented as BXA to counterbalance for primacy, recency, and
handedness effects.
Stimuli The A and B stimuli were synthetic disyllabic nonwords, stressed on the
second syllable, which differed in one segment — the initial segment for fillers, the vowel of
the second syllable for critical items. Between each A and B there were five intermediate X
stimuli, separated from each other and the endpoints by equal steps, making a 7-step
continuum in all6. Synthesizer parameters for the stimuli can be found in the appendix.
Stimuli were of high quality and sounded very similar to natural speech.
Figures 4.26 and 4.27 show how the A-to-B stimulus scales were constructed.
Every possible combination of the bracketed options was used to give a total of 32 filler
scales and 32 critical scales. Since there were 2 endpoints and 5 intermediate points for
each scale, the experiment required synthesizing 448 nonwords. This was done using the
SENSYN implementation of the Klatt terminal-analogue synthesizer, augmented by
homebrew software that constructed the intermediate points from the endpoints by linear
interpolation.
6 This is considerably fewer interm ediate steps than were used by Massaro and Cohen (1983) or Pitt
(1998). The disyllabic stimulus words, and the three-stimulus AXB trials, used in the present experiment
made each trial much longer than the sim ple monosyllabic X stim uli o f those authors. Use o f more
intermediate steps in this experim ent would have led to an impractically long experimental session. In the
event, acceptable psychometric functions and high significance levels were obtained with 5 intermediate
steps.
137
Figure 4.26. Schema for the filler stimuli of Experiment 1
A—B
p—t g i 0
+ + + r + +
a1
s—z k I
d3
Figure 4.27. Schema for the critical stimuli of Experiment 1.
A-B
P
t g 0
s + + + r + i-i +
a!
z k <*3
The sound [d3 ] was chosen because, being palatoalveolar, it has little coarticulatory
interaction with, or acoustic effect on, the high front vowels [i] and [I]. The synthesis
parameters for the [d3 ] were adjusted so that the transition from [i] to [d3 ] and that from [I]
to [d3 ] involved acoustic changes o f roughly equal magnitude.7
Typical trials were: Fillers[sAlgji]-X2-[ZAlgji], [pAlgiid ]-X -[Ulgjid ]; critical

3 5 3
[pAlkji]-X3 -[pAlkn], [zAlgjid3 ]-X i-[zAlgiid3 ]. Because the A and B stimuli on any given
7 A pilot experiment used final syllables that were closed with [b] instead o f Ij], and that began with [b] or
[gl rather than [k| or (g). Identification o f ambiguous tokens turned out to depend mostly on how many
[b |s were in the stimulus: The more [bis, the more [il-like the vowel sounded. This was probably an effect
o f compensation for (expected) coarticulation —listeners expected the vowel formants to be lowered by
138
trial could differ in initial consonant or final vowel, subjects never knew where the difference
was until they heard the X trial. This, together with the instruction to compare whole words
and the greater variety of word-initial than word-final differences, was intended to distribute
their attention more evenly over the word and discourage the strategy of listening only to the
final vowel in critical trials. The hope was to produce stronger TRACE-type lexical effects,
if any were there to be had, by encouraging higher-level processing of the stimuli.
Subjects were 15 young adults living in Western Massachusetts. All of them
reported having normal hearing and being native speakers of American English. They were
recruited by poster and paid for their participation. They were naive to the purpose of the
experiment.
Procedure. Subjects were tested four at a time in a quiet room. AXB stimuli were
low-pass filtered at 4.133kHz (down about 80dB at 5kHz), amplified, and presented over
Sennheiser TDH-49 headphones at a listener-selected volume level. One second elapsed
between the end of A and beginning of X, and between the end of X and beginning of B.
Subjects were told that they would hear three "words", that the middle word had been
digitally synthesized to be acoustically in between the first and last word, and that they were
to judge as quickly and accurately as they could whether it sounded more like the first word
or the last word. Response was by button press — the leftmost button on the response box
for the first word, the rightmost for the last word. After the last subject had responded, or
2.5 s had passed, there was a pause of 2.5 s, followed by the next trial. Each of the 320
different trials was presented twice. The experiment lasted 2 hours, broken by a 5-minute
break, a 15-minute break, and another 5-minute break.
4.2.5. Results
One subject who, in all four open-syllable conditions (gri/grl/kri/krl) gave fewer
than 75% [I] judgments at position 1 and more than 25% [i] judgments at position 5, was
labialization, and "corrected" them by, in effect, adding a few tens o f Hz to the F2s o f stimuli with [bjs in
139
exculded from analysis. All other subjects were very consistent at the extreme positions.
Their identification curves are shown in Figure 4.28 (average over all subjects). Half of the
data from three subjects (consisting of one presentation of each trial) was lost through
experimenter error.
Figure 4.28. Identification curves for the stimuli of Experiment 1, pooled across 14
listeners
100
90
80
70
60
% "l" kr d Z
50
response
40
30
20
10
0
1 2 3 4 5
Intermediate stimulus number
For a test statistic, we used each subject's mean % [I] responses across all five
intermediate stimuli in each condition. This was assumed to be normally distributed, an
assumption which was confirmed by a normal probability plot.
them . Result: [b] shifted the vowel judgm ents in favor o f the more fronted [i].
140
Table 4.29. Mean % [I] response, all intermediate stimuli
95% confidence interval
Context Mean SD 00.025, 14 = 2.160)
53.4 7.50 [48".9 57.9J

[gj_d3]
[kj_d3] 53.3 7.82 [48.6 58.0]
41.8 7.55 [37.3 46.3]

[ g jJ
[ k jJ 41.9 6.07 [38.3 45.5]
The order of [I] response rates is [gj_d3 ] = [kj_d 3 ] > [gj_] = [ki_], precisely as
predicted by the MERGE TP and OT grammatical theories, and very different from the
[kj_d3 ] > [gj_d3 ] =[k_i_] > [gi_J predicted by TRACE.
The confidence intervals are wide because there is a great deal of individual variation
between subjects in their overall [I] report. To reduce this, the results are submitted to a
paired sample /-test. We have three degrees of freedom, so we can test three differences: 1.
between [gj_d3 ] and [kj_d 3 ], predicted by MERGE TP and OT/grammar to be zero and by
TRACE to be negative; 2. between [gi_d3 ] and [gi_J, to see whether the paradigm is
sensitive enough to find a phonotactic effect; 3. between the shift from [gj_d3 ] to [gj_J and
that from [kj_d3 ] to [kj_]. This last is the crucial comparison; TRACE predicts that the
difference will be positive (that the shift will be larger for the g syllables), while the other
two models predict that it will be zero. The numbers are given in Table 4.30.
141
Table 4.30. Differences in mean "I" response, pairwise by subject
Context Mean SD 00.025, 14 = 2.160)
[gj_d3H kJ_d3J 0.086 3.824 [-2.21 2.386]
[gj_d 3 ]-[gi_] 11.6 7.16 [7.30 18.9]
([gj_d3]-fgj_])— -0.17 3.36 [-1.66 2.17]

([k i_ d 3 ]-[k jJ)
We find no difference, or at best a very small one, between the two closed syllables,
confirming that they may be used as a neutral baseline - as predicted by MERGE TP and
OT/grammar, but not by TRACE.
The CIs for the differences between the open and closed syllables excludes zero (in
fact, it excludes zero even at a 99% confidence level (one-tailed t test, /0.01, 14 = 2.624, p <
0.001). In 13 out of the 14 valid subjects, judgments shifted towards [I] on both the g
continuum and the k continuum. We have thus replicated the Massaro and Cohen (1983)
effect: Perception of ambiguous segments is influenced by phonotactics.
Moreover, the effects seem to be the same irrespective of how many lexical items are
similar to the legal nonword; the difference between the effect in the common g syllables
and the rare k ones is close to zero. The subjects’ numerical differences clustered around
zero, split evenly between positive and negative (there was no sign that they divided into two
groups of responders).
This is highly unexpected in TRACE. Since we did find a phonotactic shift,
TRACE'S only explanation is lexical activation spreading from partially overlapping words.
But a drastic reduction in the number and frequency of those words (when the [g] was
changed to [k]) did not reduce the shift at all.
142
4.2.6. Discussion
The results are clear-cut: They are very much as predicted by the MERGE TP and
the OT grammatical theory, but very different from our expectations under TRACE.
We observed a strong phonotactic effect: [I] was disfavored in open syllables
compared to closed ones. TRACE could only explain this by activation-spreading from
similar lexical items. However, a large difference between conditions in the number (and
frequency) of those items, caused by the [k]/[g] manipulation, produced equally large shifts
in both conditions. This result is not a "null effect". It is two positive effects - one of
which was expected under all theories, and the other of which was not expected under
TRACE.
The MERGE TP theory and the OT grammatical theory were both able to explain
the observed facts, because both of them focused on the crucial, systematic phonotactic
difference between the conditions and ignored incidental variation elsewhere in the stimulus.
TRACE, because it can't ignore anything, failed to predict an effect that was actually
observed.
If we had computed transitional probabilities based on preceding contexts of length
greater than one, as considered by Pitt and McQueen (1998: Note 2), we would have
erroneously predicted the same pattern of results as TRACE, for the same reason. These
results therefore argue against a TP theory using larger contexts.
4.3. Experiment 2: Sequence frequency and word-initial [pw] clusters
4.3.1. Rationale
The results of Experiment I suggested that the size of a phonotactic boundary-shift
effect is unaffected by phonological context which does not directly participate in the
phonotactic pattern. To check that this finding was not an artifact of the acoustic
143
constitution of the stimuli, or of some idiosyncrasy of the lexicon, it was decided to replicate
the experiment with radically different stimuli.
Where Experiment 1 used an ambiguous vowel at the end of a nonword, this
experiment used an ambiguous consonant at the beginning. The inventory gap to be
exploited was the lack of syllable-initial [pw] discussed in §1.3.2.5. Listeners were to be
presented with a [p ]-[k ] continuum before [_w] and before [_ j ].
Given the low frequency of [pw] and high frequency of [kw], both TRACE and the
MERGE TP theory predict fewer [p] responses would be given before [_w]. If the size of
the boundary shift is modulated by the vowel following the [_w], this would favor TRACE,
which is sensitive to the entire carrier stimulus, over MERGE TP, which only considers the
context immediately adjacent to the ambiguous segment.
As we saw in §2.3.2.5, the lack of [pw] is a lexical rather than a phonological gap.
Since no active markedness constraint forbids [pw], the OT grammatical theory predicts no
boundary shift.
4.3.2. Design
Listeners were presented with a 7-step continuum (endpoints and 5 intermediate
steps) from [p] to [k] in the following contexts:
144
Table 4.31. Phonotactics of the stimuli for Experiment 2
Carrier
[Pi [kj
Lwifhet] V. A
Lwifnetfl V. V
Lwaefiist] V. V
Lwaefhetfl 7? V
Lnfnet] V V
Lufnetfl V V
[_jaefhet] V V
[_j2efnetf] V V
The phonotactic marginality of the [p] endpoint could, as Table 4.31 shows, be
varied by changing the following glide from [w ] to [ j ]. Since the ambiguous consonant was
always preceded by the same thing (silence), the TP theories (both INC-1 and SC-1) expect
only the [w ]/[i] manipulation to affect the location of the [p]-[k] boundary.
Similarity of the stimulus to other words in the lexicon was varied by manipulating
the quality of the vowel following the glide: When the vowel was [i], the stimulus was
closer to more words than when it was [»]. Table 4.32 counts the words beginning with
each of the stop-glide-vowel sequences used in this experiment. The cohorts are given in
the appendix to this chapter. (These cohorts, unlike those in the previous experiment, were
145
extracted from the American English Kucera-Francis database rather than from Celex, even
though Celex is much more complete, because there are two different phonemes in British
English corresponding to American English [xj.)
Table 4.32. Frequency of the syllables in the stimuli for Experiment 2
Total frequency (per million
Onset Number of words words, written)
[kwi] 24 113
[pwi] 1 0
[kwte] 3 10
[pws] 0 0
[kill 44 418
74 772
[pn]
[kias] 21 77
[pis] 14 139
Both [p] and [k] are roughly equally frequent in the [_j i ] and [_jse] contexts,
providing a statistically neutral baseline. Only [k] is found in the [_wi] and [_wae] contexts
(except for the very infrequent puissance, probably not known to most of the listeners), but
it is much more common before [_wi] than before [_wae]. By the same logic as in
Experiment 1, TRACE predicts more lexical activation in the [_wi] stimuli than in the [_wse]
146
ones, and hence a larger bias towards [k]. The MERGE TP and OT grammatical theory
predict that the vowel will make no difference - MERGE TP because it is outside the
statistically relevant context, and the OT grammatical theory because no active markedness
constraint is violated.
4.3.3. Predictions
In order to model this experiment, it was necessary to add a new phoneme to
TRACE: [w]. This was done by modifying the featural parameters for [u]. The glide was
made by reducing the vowel's POW(er) and VOC(alicness) specifications from 8 to 6,
leaving the other parameters unaltered.
A new ambiguous phoneme [Y] between [p] and [k] was constructed by averaging
the feature values of the original [p] and [k]. The new [Y] was not quite in between them;
when run with the lexicon turned off, TRACE tended to favor the [p] interpretation. The
difference in activation levels started out at 0 and increased to 3 after 5 1 cycles.
A new lexicon was constructed, based on the American English Kucera-Francis
database. All words were extracted which met the following criteria:
1. They contained only phonemes which were in the new TRACE inventory. Since
TRACE does not support [ae], words with [a] were eliminated, and words with [ae] were
included, with the [ae] recoded as "a". Since TRACE does not support [f], words with [S]
were eliminated, and words with [f] were included, with the [f] recoded as [S],
2. They occurred at least 5 times per million words in Kucera-Francis.
3. They were at most 9 phonemes long.
147
The latter two criteria were imposed to keep the lexicon to a size comparable with
that used by McClelland and Elman (1986). This procedure resulted in 494 lexical items.
The critical experimental sequences occurring in the lexicon are shown in Table 4.33:
Table 4.33. Word-initial occurrences of the critical syllables from Experiment 2
Sequence Words
[kwi] quick, quill, queer, quit
[pwi] (none)
[kwae] quack
[pwae] (none)
[kn] crib, crisp, critic, script
predict, precarious, pretty

[p-n]
[kiae] craft, crack, scrap
[pjtel (none)
In order that the simulated lexicon should better approximate the real one, the words
practise8 and practically were added so that the [piae] cell would not be empty. These
words had been excluded by the lexicon-constructing procedure because practise had zero
frequency and practically had a syllabic [1] in the 4-syllable pronunciation favored by
Kucera and Francis. They were added as [pjaektis] and the 3-syllable [pjaektikli].
8 The verb, so spelled in the American English on-line dictionary.
148
Pre-testing showed that, because of the close similarity between the [u] and [I]
phonemes in TRACE, the stimulus [pjifkAs] produced very strong activation of the word
proof, which soon came to dominate the pattern of activtion. These two phonemes are not
easily confusable in actual speech; Luce's confusion matrices (1986:Table 3.8) show that,
even at a -5 dB signal-to-noise ratio, [I] was heard as [u] only 3.3% of the time. Judging
this to be an undesirable artifact of TRACE'S small feature set and my choice of [I]
parameters, I removed proof from the lexicon.
The simulation was run using the inputs [Y w a e fk A s], [Y w iA c a s], [YiaefkAs], and
[YnfkAs]. As before, the measure of predicted effect size was taken to be the difference in
activation between the [p] and [kj units at Cycle 75. These are shown in Table 4.34:
Table 4.34. Results of the TRACE simulation of Experiment 2: Activation levels at Cycle
75
Stimulus [p] activation [k] activation Difference
[YwiQcas] 21 46 -25
[YwaefkAs] 23 42 -19
[Yji Acas] 36 37 -1
[YjaefkAs] 26 40 -16
As we expected, the [_wi] context produces a higher level of [k] activation than the
[_wae] context, and a larger difference between [p] and [k] activation levels. The [_ ji]
context is very nearly neutral betwen [p] and [k]. The [_Jae] context produces an
149
unexpectedly strong bias towards [k], which on closer inspection proves to be emanating
from the highly active word craft. This is no artifact; [kixfkAs] contains a sequence
differing from craft in only one feature, and that late in the word where acoustic mismatches
are least inhibitory. TRACE'S predictions of the rate of [p] response across the whole
continuum are therefore [_n] > [ _ J x ] = [_wae] > [_wi] The relatively low frequency of
craft may increase the [p] bias before H ® ] in actual practice, leading to a predicted order
[_ n ]58 L-i®] > [_wae] > [_wi], but in any case we do expect [_W®1 > L ^ l - that is, that the
magnitude of the phonotactic shift away from the illegal *[pw] sequence will be modulated
by the following vowel.9
4.3.3.2. MERGE TP
As in Experiment 1, we distinguish two versions of the MERGE TP theory
depending on what is taken to be the relevant phonological context: The INC-1 version,
which uses the segment preceding and the segment following the ambiguous segment, taken
separately, and the SC-1 version, which takes context to consist of the preceding and
following segment as a unit. Both of these theories are crucially distinct from TRACE in
that the first vowel of the stimulus lies outside the relevant context.
Frequency counts derived from the American English Kucera-Francis database are
shown in Table 4.35:
9 It is not clear how differing activation levels in TRA CE units are to be mapped onto different
boundaries, or what constitutes a "large" difference. An estim ate is provided by the sim ulation o f the
Massaro-Cohen r/l effect in McClelland and Elm an (1986), and replicated in § 1.3.2.1, which found a 20-
150
Table 4.35: Diphone frequencies for the stimuli in Experiment 2
Sequence Frequency (per million words)
[#p] 23245
[#k] 26201
[pw] 27
[kw] 2480
8525
[pi]
[ki] 3018
Since the word-boundary symbol [#] occurs, by definition, one million times per
million words,
(4.36)
Pr([p] | [ # J ) = 0.0232
Pr ([k] | [ # J ) = 0.0262
Initial [w] occurs 43594 times per million words, and [w] in general 59750, so non
initial [w] occurs 16156 times per million words. Hence,
point difference between the activation o f the [j ] and [I] units after [s]. A difference o f this magnitude
should correspond to an effect of substantial size.
151
(4.37)
Pr([p] | [_w]) = 27/16156 = 0.00167

Pr ([k] | [_w]) = 2480/16156 = 0.153
Initial [j ] occurs 14864 times per million words, and [i] in general 118503, so non
initial [i] occurs 103639 times per million words. Hence,
(4.38)
Pr ([p] | U l ) = 8525/103693 = 0.0822

Pr ([k] | [_ J ]) = 3018/103693 = 0.0291
Thus, the decision variables for the INC-1 theory are
(4.39)
P r([p ]|I [#_]) * Pr ([p] I [_w]) = 0.0232 * 0.00167 = 0.0000387

P r([k ]| I [#_]) * Pr ([k] | [_w ]) = 0.0262 * 0.153 = 0.00401
P r([p ]||[ # J ) * P r ( [ p J | | [ _ j ]) = 0.0232 * 0.0822 = 0.00191

P r([k ]| [#_]) * Pr ([k] |[ [_j]) = 0.0262 * 0.0291 = 0.000762
A [p] is 103 times less likely than a [k] in [#_w], but 2.5 times more likely in [# _ j ].
The INC-1 theory therefore predicts a strong bias against [p] in the [#_w] context
compared to the [# _ j ] context. The order of [p] report across the continuum is expected to
b e [ # _ n ] = [#_Jae] > [# _ w i] = [# _ w a e], s in c e th e fo llo w in g v o w e l lie s o u ts id e th e re le v a n t
c o n te x t.
152
4.3.3.2.2. SC-1 context
In this version of the MERGE TP theory, the left and right contexts are treated as a
single unit. Relevant frequency counts from the American English Kucera-Francis database
are shown in Table 4.40:
Table 4.40: Triphone frequencies for the stimuli of Experiment 2
[#pw] 0
[#kw] 1172
[#pj] 6858
[#ki] 1487
Since sequences of the form [#_w] occur about 12704 times per million words, and
those of the form [#_i] occur about 39037 times,
(4.41)
Pr ([p] | [#_w]) = 0/12704 = 0.000

Pr ([k] | [#_w]) = 1172/12704 = 0.0923
Pr ([p] | [#_J]) = 6858/39037 = 0.176

Pr ([k] | [#_j]) = 1487/39037 = 0.0381
Again, we expect a sizable bias against [p] in the [#_w] environment compared to
the [#_j ] environment, since [p] is infinitely less frequent than [k] in [#_w], but is 4.6 times
153
more frequent in The following vowel, being outside the relevant context, is again
predicted to make no difference, so that the predicted order of [p] response across the
continuum is [# _ j i ] = [#_J£e] > [#_wi] = [# _ w te ].
4.3.3.3 OT grammatical theory

As discussed in §1.3.2.5, the [pw bw] onsets violate a markedness constraint which
is too low-ranked to actually ban them, OCP[Lab ]. It does not dominate any faithfulness
constraints, and hence, in the OT grammatical theory, is not expected to influence
perception.
(4.42) [p] illegal => [k] bias
UR = • | (faithfulness) OCP[LAB]
a. (•, [kwifkous])
*
b. (•, [pwifkous])
The prediction is therefore a null result; any sign of an effect is unexpected.
However, it is also possible that listeners' grammars differ: some rank OCP[LAB]
high enough to make [pw] illegal, while others do not. For these listeners, the predicted
order of [p] response rates is [# _ j i ] = [# _ J a e ] > [#_wtj = [#_wae], just as in the MERGE TP
theory. The vowel plays no role, because it is outside the structural description of
OCPfLAB], The effect will be attenuated by averaging these listeners in with the others, but
no other pattern of results should occur besides the null one and [#_Ji] = [#_Jae] > [#_wi] =
[#_wae],
154
4.3.4. Methods
The same methods were used as in Experiment 1, except for the stimuli, which were
synthesized from the following template:
Figure 4.43. Schema for the critical stimuli of Experiment 2
A—B
I t
p -k + + + +
w fits
ae
tf
Figure 4.44. Schema for the filler stimuli of Experiment 2

A—B
I
P
+ w + + fhe + Hf
X
k
The [p]-[k] continuum was made by contracting the bandwidth of a burst centered
at 875 Hz from 1000 Hz wide to 100 Hz wide, so that a diffuse burst became a compact one
with the same center frequency.
155
As in Experiment 1, there could be an ambiguous phoneme at the beginning or end
of the stimulus. On any AXB trial, listeners could not know where it was until they had
processed the X stimulus.
Eight University of Massachusetts undergraduates participated in the experiment.
To minimize the chance that they were familiar with [pw] onsets from foreign-language
study, only listeners who had not studied French or Spanish were allowed to participate.
4.3.5. Results
One listener's data did not reach criterion performance on the endpoint stimuli and
was discarded, leaving 7. A total of of 3840 trials were collected, of which 100 were
discarded (for pressing an unassigned button, or having an RT above 1500ms). Results for
each of the 4 conditions are shown in Figure 4.45 and Table 4.46. No trace of a
phonotactic boundary shift was detected.
156
Figure 4.45. Identification curves for the stimuli of Experiment 2, pooled across 7 listeners
90
80
70
60
50
O/ nmh
70 p
response
40
30
20
10
0
1 2 3 4 5
Table 4.46. Mean % [p] response, all intermediate stimuli
Condition % [pj response SD 00.025,7 = 2.365)
[# _ W I] 56.2 4.6 [45.3 67.1)
[#_wae] 53.4 4.3 [43.2 63.6]
53.3 4.8 [41.9 64.7]

[#_Ji]
[#_jae] 55.6 4.3 [45.4 65.8]
157
4.3.6. Discussion
Neither lexical statistics, transitional probablities, nor low-ranked markedness
constraints seem to have an effect here. This is a null result (and one based on a small
sample); drawing conclusions from it is problematic. The only theory that directly predicted
a null result was the OT grammatical theory, but might there have been confounds that
vitiated the design?
One possiblity is that the stimuli were ill-chosen. It might be that categorical
perception of the initial stop stimuli left little time for the development of lexical effects.
However, a follow-up experiment in which the stop was part of the context and the glide was
varied from [w ] to [j ] was attempted, and likewise failed to find any effect: Averaged across
seven listeners, the percentage of "r" response was 58.1 after [k_] and 55.9 after [p_].
Another possibility is a low-level acoustic-phonetic interaction between the stop and
the glide counteracting the phonotactic effect. However, the two best-known of these
effects, auditory contrast and compensation for coarticulation, are both expected to assist the
phonotactic effect. Auditory contrast would make the bursts sound higher before [w], and
hence produce fewer [p] responses there. Compensation for coarticulation would likewise
attribute some of the labiality of the burst to anticipatory rounding before a following [w],
making the burst have to be more labial in order to sound like [p] and again reducing [p]
responses before [w]. This is what was found by Bailey, Summerfield, and Dorman (1977,
as cited in Repp 1982), who presented a [b]-[d] continuum before front and back vowels
and found that the vowel with the lower F2 produced more [b] responses.
As a more rigorous test of the OT grammatical explanation, it was decided to
directly compare the phonotactic badness of [pw] with that of [tl], a configuration known to
158
be illegal enough to cause boundary shifts on an [j ]-[1 ] continuum (Massaro & Cohen
1983, Pitt 1998). Since the statistics of [pw] and [tl] are very similar, any finding of a
difference between them in phonotactic efficacy would be strong evidence for the OT
grammatical theory over TRACE and the MERGE TP theories.
4.4. Experiment 3: Sequence frequency and the relative phonotactic badness of
[pw] and [tl] onsets
4.4.1. Rationale
Although [pw] is a marked onset in English, it is not as marked as those which have
hitherto been shown to cause phonotactic boundary shifts. In §1.3.2.5, [pw] was analyzed
as a lexical rather than a phonological gap, violating only the low-ranked OCP[Lab ]. On
the other hand, it is agreed by all authorities that initial [tl] is not permitted in English. The
perceptual strength of this prohibition has been experimentally demonstrated: The
boundary between [j ] and [1] is closer to [1] after [t_ ] than after [p _ ] (Massaro & Cohen
1983, Pitt 1998). In Chapter 2, §3.2.5, we have analyzed this as a consequence of a general
prohibition on successive same-syllable [-cont] segments using the same major articulator,
expressed as a markedness constraint OCP(CONT, PL).
By presenting the same [p]-[t] continuum before [w], [j], and [I], it was hoped that
we would be able to compare the size of the phonotactic boundary shift produced by [w]
with that produced by [I].
159
4.4.2. Design
Listeners were presented with a 7-step [p]—[t] continuum (two endpoints and five
intermediate steps) in the environments
Table 4.47. Phonotactics of the stimuli for Experiment 3
Carrier [t]
[Pi
[_wifkoos]
? V
Lw ivnam ]
7 V
Lwaefkous]
7 V
[_waevnam]
9 V
[_Jifkous] V V
[_Jivnam] V V
[_jaefkous] V V
[_jaevnam] V V
Llifkous] V X
[_livnam] V X
[_laefkous] V X
[Jaevnam ] V X
160
The phonotactic status of the endpoints could be varied by manipulating the glide.
An [1] made [t] illegal (a phonological gap); a [w] made [pj marginal (legal, but infrequent -
a lexical gap); a [i] made both legal.
Here, the MERGE TP and OT grammatical theories make different predictions. To
MERGE TP, [pw] and [tl] are both disfavored, since both sequences are of near-zero
probability. To the OT grammatical theory, only [tl] is illegal (ruled out by an active
markedness constraint). Hence MERGE TP predicts that both [w] and [1] contexts will
shift the [p]-[t] boundary, in different directions, compared to the [j ] baseline. The OT
grammatical theory, on the other hand, predicts that the [1] context will cause a much larger
shift (if any) than the [w] context.
As before, the MERGE TP and OT grammatical theories predict that the vowel of
the initial syllable, [i] or [ae], will have no effect on the boundary location, being outside of
the statistically or phonologically relevant context. TRACE, on the other hand, expects the
choice of vowel to contribute to the effect: since [plae] is a much more frequent word onset
than [pli], the shift should be larger before [lae] than [li]; and since [twi] is more frequent
than [twae], the shift (in the other direction) should be larger before [wi] than before [was].
Cohort sizes and frequency counts illustrating this are shown in Table 4.18; again,
they are taken from the American English Kucera-Francis dictionary.
161
Table 4.48. Frequency statistics for the stimuli of Experiment 3
Number of words beginning Total frequency (per million
Onset with that onset words, written)
[twi] 10 34
[pwi] 1 0
[twae] 1 0
[pwae] 0 0
[tn] 29 128
74 772
[pji]
[tiae] 79 600
[pjae] 14 139
[till 0 0
2 1
[ph]
[tlx] 0 0
[plae] 33 548
4.4.3. Predictions
These stimuli were chosen in the expectation that the TRACE response would be
determined chiefly by the word onsets, from stop through vowel. The rate of [p] response
before [_w], L-0, and [J] was expected to be determined largely by the relative frequencies
of [p] compared to those of [t] in each environment:
(4.49)
U ] > [_J] > Lw]
since [pi] will activate a much larger cohort than [tl], and [tw] than [pw].
The cohort sizes, and activation strengths, were expected to be modulated by the
vowel. For example, an initial [pi] will activate a cohort of words. Table 4.48 shows that if
the following vowel is [i], only a couple of rare words will receive support from that [i],
while the rest will tend to be deactivated by the mismatch. If the vowel is [ae], on the other
hand, a larger set of words will be supported and further activated, and correspondingly
fewer will be deactivated. Hence [p] should receive more support from [plae] than from
[pit]. If this is what happens, TRACE should make the following predictions:
163
(4.50)
[_waej > [_wi]

[_Ji) = L j« I, or perhaps [_ j i ] > [_J®]
[J * ] > Lli]
The TRACE distribution is supplied with an ambiguous stimulus phoneme [T] in
between [p] and [t], which activates both equally. This was used to construct the simulated
ambiguous stimuli.
Simulations for the [_fltous] and [_vnom] stimuli were made with slightly different
lexicons, one of which included words with [f] and the other of which included words with
[v]. These simulations are discussed separately in 4.4.3.1.1. and 4.4.3.1.2.; the results are
then compared with the expected predictions in 4.4.3.1.3.
4.4.3.1.1. [_fkous] stimuli
The same lexicon was used as in the simulation of Experiment 2. Words in the
lexicon which began with the critical onsets are shown in Table 4.51. This approximated
the distribution found in the full lexicon, as shown in Table 4.48.
164
Table 4.51. Words beginning with the critical onsets in the lexicon used for the TRACE
simulation of Experiment 3
Onset Words
[twi) twist
[pwi] -
[twae] -
[pwae] -
[tn] trig, trick
predict, precarious, pretty

[pn]
[tiae] traffic, track, tract, trap, traps
[pi«e] practice, practically
[tli] -
-
[pli]
[tlae] -
[plae] placid, plastic
When the simulation was first run, it was found that every context produced an
extremely strong [t] bias (the ratio of activation levels at Cycle 75 being about 70 to 10).
This was because a large number of short words ending in [t] were slightly activated by the
165
initial ambiguous segment, and remained slightly active throughout the simulation, giving [t]
a large cumulative advantage regardless of the following segments. This artifactual effect
completely swamped any influence of the remainder of the stimulus. To circumvent this
problem, the lexicon was edited, and word-final [p] and [t] were recoded as [b] and [g], so
that the ambiguous [T] segment would not activate them.
When the simulation was run again, it turned out that TRACE did not predict any
phonotactic effect in this experiment. The biases before [_WL L JL and [J ] were of
comparable size, as were the activation levels of the [p] and [t] units:
75
Stimulus [p] activation [t] activation Difference
[TwifkAs] 38 32 6
[TwaefkAs] 44 27 17
[TjifkAs] 42 29 13
[TjrefkAs] 36 39 -3
[TlifkAs] 38 31 7
[TlaefkAs] 39 30 9
166
Why is the bias in the pre-[_l] cases so small? The TRACE displays show that the
network is extremely good at spotting words, or parts of words, inside other words.
When the [TlifkAs] stimulus is presented, the most activated word units on Cycle 75
are leaf (37), plea (32), and subtly (23). The first of these is neutral between [p] and [tj.
The other two, plea and subtly, urge in opposite directions and cancel each other out. The
lexicon contains no [ph]-initial words which would decide the issue in favor of [p], so, as we
expected, the phonotactic effect is small if it exists at all.
When the [TlaefkAs] stimulus is presented, by far the most active unit on Cycle 75 is
laugh (46). It becomes active early, and is strong enough to inhibit other word candidates,
such as placid and plastic, which we had counted on to produce a larger [p] bias.
If the differences are taken as predictors of the rate of [pj report across the
continuum, then the expected order of effects is [_wae] > [_ j i ] > [_lae] = [_li] = [_wi] >
[_jae]. The average difference before [_w] is 11.5; before [_ j ], 5; and before [_1], 8.
TRACE predicts
(4.54)
[ _ w ] > [ J ] > L i]
and
167
(4.55)
[_wae] > [_wi]

[_ J i] > [ _ J * ]
[Jae] = Lli] (or perhaps [_lse] > [_li])
These are nearly identical to the predictions made by TRACE in (4.50).
4.4.3.1.2. LvnAm] stimuli
For this simulation, a lexicon was selected using almost the same procedure as in
Experiment 2. The procedures were the same except that where the Experiment 2 lexicon
included words with [f] (recoded as [fl), this one included words with [v] (recoded as [j]).
Pretesting showed that prove and approve tended to dominate responses in the pre-[_j]
conditions, as proof had in Experiment 2, so they were removed on the same grounds: that
[i] and [as] are not actually very confusable with [u]. As with the simulation in the previous
section, word-final [p] and [t] in the lexicon were replaced with [b] and [d]. The resulting
lexicon contained the same set of words with the critical onsets as in the simulation for the
V .
[_fkous] stimuli (Table 4.52).
Results of the simulation are shown in Table 4.56:
168
75
Stimulus [p] activation [t] activation Difference
[TwivnAm] 41 29 12
[TwaevnAm] 48 20 28
[TjivnAm] 49 26 23
38 35 3
[TiaevnAm)
[TlrvnAm] 43 30 13
[TiaevnAm] 44 25 19
Again, the size of the difference in each case is determined by one or two lexical
items. For the [TlivnAm] stimulus, there is some activation from plea (32) and pleaded (23)
in support of [p], which is reduced somewhat by subtly (16). The larger [p] bias for the
[TiaevnAm] stimulus is due to placid (66). The great difference in bias between [TuvnAm]
and [TiaevnAm] is caused by [TiivnAin]'s activation o f previous (51).
The predicted rates of [p] report are [_wae] > [_ j i ] > [Jae] > [_wi] > [_h] > [jtas].
The average values of the differences are: [_w], 20; [_ j ], 13; [_1], 16, so TRACE predicts
169
(4.57)
[_ w ]> [_ l]> [_ i]
(4.58)
[_wae] > [_wi]

L ji] > Li®]
L M > L I i]
4.4.3.I.3. Expected and actual TRACE predictions
Our expectations of what TRACE would do are only partially supported by actual
simulations, which revealed TRACE'S extreme sensitivity to individual lexical items. The
expected phonotactic effect, [_w] > [_ j ] > [_1], was not supported by the simulation. In both
the [_fkous] and [_vnAm] contexts, [_w] produced the largest [pi bias when it had been
expected to produce the smallest, and LU. expected to cause the largest, never did. The
cohort of words activated by the initial stop-glide sequence did not long remain active.
When the vowel arrived, it deactivated the great majority of the cohort members, reducing
their influence on the stop judgment.
On the other hand, the effect of the following vowel was pretty much as expected,
because the stop-glide-vowel sequence was three segments long— long enough to mismatch
all but one or two lexical items and activate them sufficiently to dominate the lexicon and
decide the outcome. The phonotactic effect in TRACE does not come from partial activation
of many lexical items, but from high activation of one or two. We therefore expect
(4.59)
[_wae] > [_wi]

L j i ] > L j ®]
[_lae] > LIi]
170
4A.3.2. MERGE TP
Frequency counts from the American English Kucera-Francis database are shown in
Table 4.60. The "forbidden” [pw] and [tl] occur with fair frequency within words (lapwing,
potlid).
23245
[#p]
[#t] 41840
[pw] 27
[tw] 996
8525
[pj]
[ti] 8468
3589
[pH
[tl] 220
171
Hence
(4.61)
P r([p ]| [#_]) = 0.0232

Pr ([t] | [#_]) = 0.0418
Two-phoneme sequences whose second member is [w] occur 16156 times per
million words, so
(4.62)
Pr ([pw] | [_w]) = 27/16156 = 0.00167

Pr ([tw] | [_w]) = 996/16156 = 0.0616
Two-phoneme sequences whose second member is [j ] occur 103639 times per
million words, so
(4.63)
Pr ([pi] | [_j]) = 8525/103639 = 0.0823

Pr ([ti] | [_ j ]) = 8468/103649 = 0.0817
Two-phoneme sequence whose second member is [1] occur 57018 times per million
words, so
(4.64)
Pr ([pi] | [J ]) = 3589/57018 = 0.0629

Pr ([tl] | [_!]) = 220/57018 = 0.00386
The predictive variables for the INC-1 theory are therefore
172
(4.65)
Pr ([p] | [#_]) * Pr ([pw] | [_w]) = 0.0232 * 0.00167 = 0.0000387

Pr ([t] | [ # J ) * Pr ([tw] | [_w]) = 0.0418 * 0.0616 = 0.00257
Pr U p] | [#_]) * Pr ([p j] | [_J]) = 0.0232 * 0.0823 = 0.00191

Pr ([t] | [#_]) * Pr ([ti] | [_J]) = 0.0418 * 0.0817 = 0.00342
Pr U p] | [#_]) * Pr ([p i] | LH ) = 0.0232 * 0.0629 = 0.00146

Pr ([t] | [ # J ) * Pr ([tl] | [_1]) = 0.0418 * 0.00386 = 0.000161
The [_ j ] context is nearly unbiased between the two stops. In the [_ w ] context, [t] is
expected to be about 66 times as likely as [p], while in the [_1] context, [p] is expected to be
about 9.1 times as likely as [t].
The INC-1 version of the MERGE TP theory estimates the phonotactic badness of
[pw] as much greater than that of [tl], because [tl] is of far greater probability than [pw]
conditional on the glide. In absolute terms, [tl] is actually about eight times more frequent
than [pw]. Because the left and right contexts contribute independently to the theory's
probability estimates, it overestimates the rate at which [tl] will occur word-initially.
Again, MERGE TP INC-1 does not expect anything else in the stimulus context, in
particular the vowel following the glide, to influence the size of the boundary shift.
4.4.3.2.2. SC-1 context
Here, the left and right contexts are treated as a single unit. The relevant sequence
frequencies, from the American English Kucera-Francis database, are shown in Table 4.66:
173
Table 4.66. Triphone frequencies for the stimuli of Experiment 3
[#pw] 0
[#tw] 251
[#pj] 6858
[#ti] 2625
[#pl] 1981
[#tl] 0
Since sequences of the form [#_w] occur about 12704 times per million words.
those of the form [#_j ] occur about 39037 times, and those of the form [#_1] occur about
15459 times,
(4.67)
Pr ([p] | [#_w]) = 0/12704 = 0.000

P r([tl|[# _ w ]) = 251/12804 = 0.0196
Pr Op] | [#_J]) = 6858/39037 = 0.176

Pr ([t] | [#_j]) = 2625/39037 = 0.0672
Pr ([p] | [# 1]) = 1981/15459 = 0.128

Pr ([t]| [#_!]) = 0/15459 = 0.000
\nA
174
In the [#_j] context, the bias is slight - [p] is only about 2.6 times as likely as [t].
However, in the [#_w] context, [p] is infinitely less likely than [t], while in the [#_l] context,
[t] is infinitely less likely than [p].
Under SC-1, we thus expect a large boundary shift, in opposite directions, in both
the [#_w] and the [#_1] contexts10. Again, the following vowel is outside the relevant
context and is not expected to contribute to the effect.

If [pw] is not actually forbidden by any active markedness constraint in English,
then we do not expect the [#_w] condition to produce any kind of boundary shift - it should
be indistinguishable from the [#_j ] condition. On the other hand, the highly illegal [tl] is
ruled out by an active constraint, OCP (PL, CONT), and we expect a sizable boundary shift
in the [#_1] condition. The following vowel is again outside the structural description of
OCP (PL, CONT) and is not expected to have an effect.
(4.68) [t] illegal => [p] bias
UR = • OCP (PL, CONT) (faithfulness) *LABLAB

a. (•, [p ia e v n A m ])
*!
b. (•, [tKevnAm])
(4.69) [p] marked, but not illegal => small if any [t] bias
UR = • OCP (PL, CONT) (faithfulness) ♦l a b l a b

a. (•, [pwaevnAm]) *!
b. (•, [twaevnAm])
10 We are assuming here that the magnitude o f the boundary shift is controlled by the ratio o f the
probabilities, for reasons described in §1.3.2.3. If it is controlled by the difference (a poorer guessing
strategy), then the [#_ j ] and [#J1 contexts are about equally discouraging to [t], which certainly does not
reflect native speakers' intuitions about phonotactic permissibility.
175
4.4.4. Methods
Unlike the stimuli in Experiments I and 2, this set was constructed from natural
speech. Several dozen tokens of each carrier were recorded by the experimenter (a native
speaker of American English) as a list of isolated words. The carriers were recorded
without an initial stop. One token of each was selected on the basis of the experimenter's
judgment of clarity and of uniformity of speaking rate. The most important criterion, and a
difficult one to meet, was that the initial glides not be confusable with each other.
To these carriers was prepended a 7-step synthetic [p]-[t] continuum, consisting of
a burst plus aspiration. The continuum was created by varying the F2 onset from 1000 Hz
to 2000 Hz in equal steps. The burst was kept very short to prevent it from sounding like
[k]. After the onset, F2 continued in a straight line towards 1000 Hz as the aspiration faded
out.
The presentation paradigm was the same as that used in Experiments 1 and 2. The
only difference was that in this experiment, the ambiguous segment only ever came at the
beginning of the stimulus. The ends of the stimuli did vary from trial to trial, but were
always the same within each trial.
Nineteen University of Massachusetts students participated. None reported any
history of hearing disorders.
4.4.5. Results
Of the nineteen listeners, twelve performed accurately enough on the continuum
extrema to be included in the analysis. Each contributed twelve responses to each
ambiguous stimulus, making 720 responses per subject.
176
An unexpected finding was that the "irrelevant" filler context, LvnAm] and [_fkoos],
actually influenced the location of the [p]-[t] boundary in the baseline pre-[_j] condition.
For this reason, I will discuss the [_vnAm] and [_fkous] results separately.
4.4.5.1. [_vnAm] stimuli
Psychometric functions are shown in Figures 4.70 and 4.71. It is clear that the
listeners were perceiving the continuum as a smooth transition from [p] to [t]. Figure 4.70
compares the baseline conditions (before [_ j ]) with the critical condition before [_1]; Figure
4.71, with the critical condition before [_w]. One can see that the pre-[_l] responses are
considerably more favorable to [p] than the pre-[_j] and pre-[_w] responses. These latter
are virtually indistinguishable from each other. The identity of the following vowel does not
seem to have a strong effect on the response curve.
To assess the statistical significance of the phonotactic shift, the overall rate of [p]
response was calculated separately for each subject in each condition (based on a maximum
of 60 responses to the whole continuum in each condition). A planned contrast, the
difference between the rate in the baseline condition and those in each of the two critical
conditions was computed for each listener, and these rates were averaged across all 12
listeners to estimate effect size. Results are shown graphically in Figures 4.70 and 4.71,
and in tabular format in Tables 4.72 and 4.73:
177
Figure 4.70. Identification curves for the [...vnAm] stimuli of Experiment 3, pooled
12 listeners, comparing the [_1] condition with the [_J] baseline
100
90
80
60
•_llvnAm
% "p" ■JaevnAm
50
response ■_rlvnAm
■ raevnAm
40
20
1 2 3 4 5
178
Figure 4.71. Identification curves for the [...vnAm] stimuli of Experiment 3, pooled
12 listeners, comparing the [_w] condition with the [_j ] baseline
100
90
80
70
60
_wlvnAm
% "p" _waevnAm
50
response _rlvnAm
40
raevnAm
30
20
1 2 3 4 5
179
Table 4.72. Mean percent "p" response, all intermediate [...vnAm] stimuli
Context Mean SD 00.025, 12 = 2.179)
Lwi] 48.4 9.02 [42.7 54.1]
[_wae] 45.6 10.9 [38.7 52.5]
44.5 14.3 [35.4 53.6]

Lit]
L i* ] 42.4 9.8 [36.2 52.2]
58.5 11.5 [51.2 65.8

Lli]
LM 56.9 13.5 [48.3 70.4]
Table 4.73. Differences in mean "p" response, pairwise by subject, [...vnAm] stimuli
Difference Mean SD (t0.025, 12 = 2.179)
[Ji]-L ii] 14.0 13.3 [5.53 22.5]
Llae]-LJae] 14.5 11.3 [7.35 21.7]
[_ w i ] - U i ]
3.93 16.9 [-6.83 14.7]
[_wre]-Llae] 3.15 10.2 [-3.32 9.62]
The difference between the [_J] and [ _ j ] conditions is positive, as expected, and the
95% confidence interval around each of the means excludes zero. The differences in the
[_li] and [_lae] cases are not distinguishable from each other. On the other hand, there is no
reliable difference between the [_w] conditions and the [_•*] baseline, and the nonsignificant
trend actually goes in the wrong direction.
These results are just what we would expect under the OT grammatical theory: A
large boundary shift caused by an active markedness constraint, and none by an inactive
one. They strengthen our suspicion that the lack of a perceptual bias against [pw] in
Experiment 2 was a real phenomenon, not just an effect of an insensitive paradigm.
TRACE'S prediction of more [p] responses before [_wae] than [_wi], and before
Llae] than [_li], is not borne out; if anything, the reverse has occurred, though the trend is
highly non-significant. There is a trend in the expected direction of more [p] responses
before [_ ji] than [_jae], but again it is non-significant.
4.4.5.2. [_fkous] stimuli
The other half of the stimuli proved more problematic, and are harder to interpret in
any theory. Psychometric functions are shown in Figure 4.17. The most striking
difference is the very large number of [p] responses before [_ J a e ], which does not fall below
50% until the very last step of the continuum. This context in fact produces more [p]
responses than any other in the experiment, despite the lack of phonotactic or statistical bias.
None of the theories discussed here has an explanation for this. I can only conclude
that it is an acoustic artifact of poor stimulus quality. The [i] in this stimulus is only about
10-15 ms long, less than half as long as the other [ j ] s , putting the burst closer to the vowel
181
nucleus. This could have caused an auditory contrast effect, making the burst sound lower
by comparison with the higher F2 of the vowel.
The lack of a reliable [_iae] baseline makes the [_aefkows] responses difficult to
analyze. The [_ifkows] data, shown in the next two figures, is consistent with the pattern
found with the LvnAm] stimuli, though it does not provide especially strong support:
Figure 4.74. Identification curves for the [...fkoUs] stimuli of Experiment 3, pooled across
12 listeners, comparing the [_1] condition with the |_ j ] baseline
100
IlfkoUs
0/
70
nn N
p laefkolls
resp o n se rlfkolls
- X — _raefkolls
0
2 3 4 5
182
Figure 4.75. Identification curves for the [...fkous] stimuli of Experiment 3, pooled
12 listeners, comparing the [_w] condition with the [_j ] baseline
100
60 -■ .wlfkoUs
% “p“ .waefkoUs
50 -■
resp o n se .rifkolls
.raefkolls
30 ■■
20 - ■
2 3 4
183
Table 4.76. Mean percent "p" response, all intermediate [...fkous] stimuli
Context Mean SD (tO.025, 12 = 2.179)
Lwi] 51.1 8.79 [45.5 59.9]
Lwae] 45.6 9.31 [39.7 51.5]
55.3 9.66 [49.2 61.4]

L-n]
L-iae] 65.8 10.2 [59.3 72.3]
60.8 8.14 [55.6 70.0]

Lit]
U s] 60.3 14.0 [51.4 69.2]
Table 4.77. Differences in mean "p" response, pairwise by subject, [...fkous] stimuli, [i]
condition only
Difference Mean SD (t0.025, 12 = 2.179)
5.41 11.27 [-0.25 11.1]

L li]-L n ]
Lwi]-[_-n] -4.23 5.61 [-7.79 -0.67]
The LI] effect just misses significance, while the [_w l effect achieves it but is
numerically smaller. Again, TRACE'S prediction of more [p] responses before [_wae] than
184
[_wi], and before [_lae] than [_Ii], is not supported. The prediction of more [p] responses
b e fo re [_ j i ] th a n [_Jae] is c o n tra d ic te d , b u t o n ly b y th e a n o m a lo u s [_Jzeflcous] c o n te x t, from
w h ic h w e c a n c o n c lu d e n o th in g .
4.4.6. Discussion
These results confirm our suspicion from Experiment 2 that the ban on initial [pw]
is much weaker than the other phonotactic prohibitions which have been the object of
perceptual experiments.
The sequences [pw] and [tl], despite their comparable statistical properties, differ in
their ability to cause a phonotactic boundary shift. This is unexpected under either version
of the MERGE TP theory, since they have very similar statistical properties. Overall, the
results of this experiment contradict the proposal that phonotactic illegality is equivalent to
zero frequency. The results are especially disappointing for the INC-1 version of the
MERGE TP theory, which predicted that [_w] would cause a larger shift than [_1].
TRACE'S prediction that the following vowel will have a very strong influence is
also not borne out by the data. Since a boundary shift was obtained, TRACE can only
explain it as a consequence of lexical activation spreading, in which case the following vowel
ought to have had an effect. The failure to find one calls TRACE'S explanation into
question.
185
4.5. Experiment 4: Sequence frequency and the relative phonotactic badness of
[bw] and [dl] onsets: Interaction of response variables
4.5.1. Rationale
Experiments 1 through 3 measured effects in stimulus units: Phonetic context was
manipulated as the independent variable, and its effect on the boundary location was
measured as the dependent variable. For example, in Experiment 3, a [p]-[t] continuum was
judged in the contexts [_wl> LrL and [_1], and [_1] was found to cause a larger shift
(compared to the baseline [_j ] context) than [_w]. This is consistent with the prediction that
the more illegal [tl] onset should be more dispreferred than the less illegal [pw] onset.
However, the finding could also be artifactual. This is because the [_1] and [_w]
contexts are expected to shift the boundary in opposite directions. Different-sized shifts
indicate different-sized phonotactic biases, but they could also simply reflect a closer
perceptual spacing of the stimuli at one end of the [p]-[t] continuum. If, for instance, stimuli
1, 2, and 3 are very similar, while 3,4, and 5 are very different, then a boundary shift from 3
to 1 (2 stimulus units) might actually be smaller perceptually than a shift from 3 to 4 (1
stimulus unit).
It is also possible that the shifts were due to low-level auditory interactions.
Certainly something unexpected was happening in Experiment 3, where the "irrelevant"
filler context ([_fkows] versus [_vnum]) interacted with the other stimulus variables.
Perhaps the stimuli of Experiment 3, based on natural tokens, did not provide sufficient
control on irrelevant parameters.
Experiment 4 was designed to eliminate these problems. The technique used here is
to measure the effect of one response on another: Listeners judged a CC cluster in which
both Cs were ambiguous, and the dependent measure was the effect of their decision about
186
the first C (“g” vs. “d”, or “g” vs. “b”) on their decision about the second (“1” vs.
“w”) (Nearey 1990). By so doing, one can control stimulus factors completely: The
dependence between stop and sonorant judgments can be measured separately for each
individual stimulus.
A further check on Experiments 2 and 3 is provided by replacing [p] and [t] with [b]
and [d], replicating the original experiment with stimuli that are different segmentally but the
same phonotactically.
4.5.2. Design
The aim was to measure the dependence of ‘T7”w” judgments on “g’T ’d” and
“g”/”b” judgments in English CCV syllables. All listeners were tested on two separate
stimulus sets: an array of stimuli ambiguous among [glre gwae dire dwre], and one
ambiguous among [glre gwre blre bwre], and classified each one as one of those four
stimuli. The dependence between the stop and sonorant judgments was quantified as the
change in the log-odds ratio of an TV 'w" response conditional on the stop judgment (see
below, §4.5.5.).
4.5.3. Predictions
In experiments with only one ambiguous segment, TRACE predictions are derived
by assuming that a larger activation level for, say, [I] than [j ] means a greater likelihood of
classifying the stimulus as "1". The likelihood of giving response R, is given by
(4.78)
2 >
187
where
(4.79)
S; = c\p(ka])
a; being the activation level of unit j as j ranges over members of the alternative s e t, and k
being a constant (McClelland & Elman 1986).
It is not immediately obvious how to derive predictions when a two-segment
response is called for, since TRACE has no units corresponding to two-segment sequences.
The simplest assumption would be that the probability of, e.g, a "bl" response is the
product of the probabilities of a ”b" and an "I" response. A stimulus which gets "b"
judgments 25% of the time and "1" judgments 60% of the time should be classified as "bl"
15% of the time. This would be in keeping with the principle that the units represent
hypotheses about the input, and their activation levels represent the strengths of these
hypotheses: The network’s confidence that the input contains a "b" at time 26 is completely
captured by the activation levels of the phoneme units for time slice 26. Under this
interpretation, any response dependency between the stop judgment and the sonorant
judgment for a fixed stimulus is unexpected.
4J5.3.2. MERGE TP
Frequency counts from the American English Kucera-Francis database are shown in
Table 4.80. The "forbidden" [bw] and [dl] occur with fair frequency within words
(subway, badly).
188
[bw] 7
[dw] 26
[gw] 148
[bl] 2407
[dl] 275
[gl] 625
Nonfinal [b] occurs 49,772 times per million words; nonfinal [d], 32,251 times;
nonfinal [g], 17,586 times. Hence
(4.81)
P r ([w] | [b J) = 7/49772 = 0.000141

P r ([I] | [b_]) = 2407/49772 = 0.0484
P r ([w] | [d J) = 26/32251 = 0.00806

P r ([I] | [d_]> = 275/32241 = 0.00353
Pr ([w] | [g_]) = 148/17586 = 0.00842

Pr ([1] | [g_]) = 626/17586 = 0.0356
The right-hand context is irrelevant, being in every case the same ([ae]). In the
baseline context [g_], an [I] is 4.23 times as likely as a [w]. After [b_J, [I] is 344 times as
likely as [w]; hence, the "b'V'g" decision changes the expected odds of an [1] by a factor of
81. After [d_], [w] is 10.6 times as likely as [1], so the "d'V'g" decision changes the
expected odds of an [1] by a factor of 45.
189
The INC-1 theory therefore predicts that both the "b'V’g" and "d’V'g" decisions will
affect the likelihood of an "1" response, with "b" favoring'T' responses and "d" favoring
"w". The effects are predicted to be of similar magnitude, with the "b'V'g" effect perhaps
larger than the "d’V'g" effect.
4i.3.2.2. SC-1 context
In the SC-1 theory, what matters is the relative frequencies of the three-phoneme
strings with [1] and [w] in the middle. These frequencies are
Table 4.82. Triphone frequencies for the stimuli of Experiment 4
[bwre] 0
[hire] 270
[dwre] 0
[dlae] 0
[gv/ae] 0
[glae] 202
Since [w] and [1] have the same frequency in [d_sej, SC-1 predicts no net bias in
favor of either. Meanwhile, the [b_ae] and [g_re] contexts have almost identical statistics -
no [w]s and a couple of hundred [l]s. These two contexts are therefore predicted to have
similar effects, biasing responses strongly towards "w". The "b'Vg" decision is thus
predicted to have little or no effect on the likelihood of an "I" response, while the "d'V'g"
decision ought to have a considerable effect, with "d" leading to more "w" responses.
190
Both [dl] and [bw] are unattested as syllable onsets in English, as shown in Table 1.
Nonetheless, as Table 4.83 shows, [dl] is commonly classified as “impossible”, while [bw]
is “marginal” at worst (Hultzen 1965; Wooley 1970; Catford 1988; Hammond 1999; see
§§2.3.2.4 and 23.2.5 above).
There are coherent structural grounds for this difference. Both clusters violate a
cross-linguistically widespread constraint against successive consonants with the same place
of articulation in the same unit - here, the syllable onset (McCarthy 1988; Padgett 1991).
The [dl] onset has two successive coronals, while the [bw] onset has two labials.
As discussed in §2.3.2.5. above, English grammar is less hostile to [pw bw] than to
[tl dl]. English [j ] is labial (Delattre & Freeman 1968), so the legal, frequent onsets [br pj]
violate the same OCP constraint as [bw]. Moreover, the OCP is, cross-linguistically,
stronger the more similar the two Cs are in sonority (Selkirk 1988; Padgett 1991). Since [1]
is less sonorous than [w] (Kahn 1980; Guenter 2000), the [dl] sequence is closer in
sonority than [bw] and hence a worse structural violation.11 The OT grammatical theory
therefore predicts a larger perceptual bias against [dl] than against [bw], as shown in (4.84):
For a discussion o f [si], [sn], and [st] onsets, see above. §2.3.2.4.
191
Table 4.83. Frequency of occurrence of the clusters of Experiment 4 as onsets in English
Position
Word-initial Syllable-initial
Cluster By types By tokens By types By tokens
Labial
bw 0 0 0 0
bl 389 27948 890 69100
Coronal
dw 10 983 16 1003
dl 0 0 0 0
Dorsal
gw 6 172 59 4834
gl 148 12644 292 19001
Note: Values represent occurrences in the 18.5-million-word London-Lund corpus of

written and spoken British English, using the principal pronunciation of each entry in the
CELEX EPL.CD lemma database (Baayen, Piepenbrock, & Gulikers 1995). Phrasal entries
(e.g., black-and-blue) were counted as single words.
When the stop could be either [g] or [d], the decision about the stop has major
consequences for the decision about the sonorant - a "d" decision means that the sonorant
cannot be parsed as [I] without violating an active markedness constraint. When the stop
could be either [g] or [b], though, the stop decision carries less weight, since the sonorant
can still be parsed as either [I] or [w] without violating an active markedness constraint.
The OT grammatical theory therefore predicts that the "g'V'd" decision will affect
the likelihood of an "1" response, with "d" leading to fewer 'T’s, while the "g'V'b" decision
will have little or no effect.
192
(4.84) [dlae] is more disfavored than [bwae]
UR —• OCP(CONT, PL) (faithfulness) OCP[LAB]
*
a. (•, [bwae])
b. (•, [blae])
c. (•, [gwae]) •• .-•-- V

• -.. -
d. (•, [glae]) •V'lV r-
e. (•, [dwae])
r .'. v;
*
f. (•, [dlae])
0*
g (•, [gwte])
h. (*, [glae])
4.5.4. Methods
Stimuli were synthetic CCV monosyllables. The V, following Pitt (1998), was
always [*] in order to make all stimuli nonwords. The second C ranged from [1] to [w]; the
first, from [g] to [d] or from [g] to [b]. To prevent listeners from memorizing the individual
stimuli, a large stimulus set was used (Crowther & Mann 1994): six steps along each
continuum, making 36 stimuli in each array. Stimuli were identified by a 2-digit code. The
first digit specified position on the stop continuum (‘0’ = most b- or d-like, ‘5’ = most g-
like); the second, position on the sonorant continuum (‘0’ = most 1-like, ‘5’ = most w-like).
Care was taken to make the stimuli acoustically and articulatorily plausible, and to
insure that ambiguous segments were heard as one of the intended phonemes. Synthesizer
193
parameters are shown in Figure 4.19; only differences between the endpoints are discussed
in the text.
Figure 4.85. Synthesis parameters for the stimuli of Experiment 4
(a)
AV
o
TJ
AH
000
000 20000
time (ms)
(b)
FTP
FTZ
300000
000
time (ms)
194
(C)
73)00 -
* SOOOO
■o
5
■o
2 9 ) jQ0
01
— I— 1— — I—
OO) 20000 40000 80000
time (ms)
(d)
73000
N
b/d B2F
X
2SOJOO
B2F
ooo
ooo
time (ms)
195
(e)
b/d A2F
A2F
AB
2500
b/d AB
iW
OjOO
TL
OjOO
time (ms)
(f)
MOO jOO
o FO
x
73000
N
X
>•
v
c
3
«r
2SD00
000
000 20000
tim« (ms)
196
The following parameters, which were the same for all stimuli and did not vary
across time, are not shown: GH 50, OQ 30, F6 4900, B6 100, F5 4300, B5 300, F4 3250,
FL 20. The aspiration control AH was turned on during the last part of the vowel to match
the breathy offsets of the natural model tokens.
The [I] and [w] endpoints were made following the acoustic theory of those
segments in Stevens (1999:513-555). The [1] endpoint had a low F2 and high F3,
corresponding to an elevated tongue dorsum, and by a pole-zero pair in the vicinity of F4,
corresponding to the cavity above the tongue blade (Stevens 1999:545). At the [w]
endpoint, the pole-zero pair was absent, and all formants above F2 were attenuated to
simulate the low-pass filtering effect of a labial constriction. FI and F2 were even lower
than in [1], another correlate of labiality.
The stop endpoints differed only in F2 onset, bandwidth of frication at F2, and
amplitude of the F2 and wide-band frication components. The [g] had a low F2 onset and a
compact burst spectrum, with energy concentrated near F2, while [b] and [d] had diffuse
burst spectra (Blumstein & Stevens 1979). The [b] had the same low F2 onset as [g], while
[d] had a higher onset than [b] and less energy in the F2 region. The stop transitioned into
the sonorant over 65 ms.
Parameter values were adjusted to make even the endpoint stimuli slightly
ambiguous. Intermediate steps were made by interpolating the synthesizer parameters.
Interpolation was linear except for the bandwidth of F2 frication, which was interpolated
along an exponential curve of the form B2F = AeBr, where r went from 0 at the [g] endpoint
to 1 at the [b] and [d] endpoints.
Stimuli were synthesized using the cascade branch of the SENSYN terminal
analogue synthesizer (Klatt 1980) with 16-bit resolution, a 16-kHz sampling rate, and a 2-
ms frame length. Six formants were used, but only the lowest two varied. Stimuli were
low-pass filtered with a sharp cutoff at 5512 Hz. All parameters are given in tabular format
in the Appendix to this chapter.
197
This procedure yielded two 36-element stimulus arrays: one ambiguous between
[glie gwse dlae dwae] (the “d array”), and one between [glae gwre blae bwie] (the “b
array”). Pretesting with 32 listeners showed that the stimuli sounded natural, were
ambiguous, and were only heard as one of the intended syllables.
Seventeen undergraduate native speakers of American English participated in this
experiment as part of a course requirement in introductory psychology. All reported no
speech or hearing deficits, and were naive to the purpose of the experiment. One listener
was dropped for inability to do the practice, leaving 16 valid subjects.
Listeners were tested individually in a soundproof booth (LAC Model 401 A) during
two 15-minute blocks separated by a 5-minute break. Eight of the listeners heard the “d”
array in the first block, while the other 8 heard the “b” array first. In each block, the 36
syllables were presented in pseudorandom order through Sennheiser EH-1430 headphones.
The listener responded by pressing one of four buttons labelled “dw dl gw gl” or “bw bl
gw gl”. The order of buttons was rearranged between listeners. One second after the
response, the next syllable was presented. Ten responses were collected for each stimulus.
Each block was preceded by a short practice. Each of the four most extreme stimuli
(at the comers of the array) was presented three times, for a total of 12 stimuli, in random
order, and judged by the listener as in the main experiment. No feedback was provided.
The practice was repeated until the listener had used all four responses (accurately or not).
4.5.5. Results and discussion
For each stimulus S, the 160 responses from all listeners were pooled to estimate the
likelihood that that stimulus would be put into each of the four categories. The statistic of
interest is how the listeners’ “IT ’w” judgment on a particular stimulus S is affected by
their decision about the stop. The “l”/”w” judgment was quantified as the log-
transformed odds ratio of “I” versus “w” responses. This was calculated separately for
the two stop responses, as shown in Figure 2. If the stop decision had no effect on the
198
decision, then all of the points in Figure 2 would lie on the line v = x.
Displacement from this line indicates phonotactic bias.
Figure 4.86. Log odds ratios for the "IT w " judgment in Experiment 4, contingent on the
"g'T'd” judgment. Each point represents 16 listeners' pooled responses to one stimulus.
(Stimulus codes are explained in the text.)
200 - + —
OL 42
OjOO
54
45
C
•200 ■+-----24---- f
OjOO
in (P("ar)/P("gw"))
199
Figure 4.87. Log odds ratios for the ’TY'w" judgment in Experiment 4, contingent on the
"g'Y'b” judgment. Each point represents 16 listeners' pooled responses to one stimulus.
(Stimulus codes are explained in the text.)
30
20
OjOO
03
13
a. 60
‘24
C
35 -25
54
-2D 0 OjOO 200
in (P("gl")/P("gw-))
For example, suppose S is a stimulus from the “d” array which was judged as
“gl” 36 times, “gw” 35 times, “dl” 18 times, and “dw” 67 times. When the stop was
identified as “g”, the sonorant was equally likely to be classified as “1” or “w”: In (
P(“gl”|S)/P(“gw”|S )) = In (36/35) = 0.028, plotted on the .t-axis in Figure 2. When the
stop was identified as “d”, the sonorant was more likely to be called “w” than “1”: In (
P(“dl”|5)/P(“dw”|5 ) ) = In (18/67) = -1.310, plotted on the y axis. The measure of
phonotactic bias is the difference d: the log of the “l”/”w” odds ratio contingent on a
“g” decision minus that contingent on a “d” decision, here 0.028 - (-1.310) = 1.338.
For each array, D = mean d over all stimuli was computed. In the “d” array, D =
1.224, indicating that a “d” judgment reduced the odds of “1” by a factor of exp( 1.224) =
3.40. In the “b” array, D was 0.4762 - an unexpected result, since it means that a “b”
200
judgment, far from reducing the odds of a “w”, actually increased them by a factor of
exp(0.4762)= 1.61.
Because the dependent measure, difference in log-odds ratios, bears a complex
relation to the individual subject data and is drawn from an unknown distribution, the
appropriate statistical test is the non-parametric bootstrap (Efron & Gong 1983, Efron &
Tibshirani 1993). The null hypothesis D = 0 was tested against the two-sided
alternative H,: D * 0 using the sensitive procedure recommended by Hall and Wilson
(1991). For each array, B = 10,000 bootstrap resamples were drawn and used to find Da
such that Pr* (| D* - D\> Da) = a. For the “d” array, Dm = 0.3986 and D0(n = 0.5238.
Both are much less than D = 1.224, allowing rejection of Hq at the 99% confidence level.
For the “b” array, Dm = 0.4856 and Dm= 0.6103; hence, the empirical D of 0.4762 barely
misses significance at the 0.05 level.
Although both [dlx] and [bwae] are unattested in English, a significant phonotactic
bias was found only against [dlae].
This is inconsistent with TRACE, which predicted that the stop and sonorant
judgments would be independent for each stimulus (i.e., that all the points in Figures (4.21)
and (4.22) would like on the liney =x). It is inconsistent with the INC-1 theory, which
predicted that the "b’V’g" decision would affect the "l"/"w" decision at least as strongly as
the "d"/"g" decision did.
It is consistent with the SC-1 and OT grammatical theories, both of which predicted
no effect of "b"/"g" and a large one of "d'V'g".
4.5.6. Foreign-language exposure
An alternative explanation must be considered. Most of the participants had had up
to nine years of exposure to a language with [bw] or [pw] onsets. Could this have allowed
them to build perceptual units for these un-English clusters? Each listener's total number
of "bw" responses was regressed against years of exposure to French, Spanish, or
201
Mandarin Chinese (see scatterplot in Figure 4.88). Longer exposure led to slightly fewer
"bw" responses. The trend was weak (R-squared =0.201) and due mostly to Listener 11,
who no exposure and a very high rate of "bw" response. When this listener was excluded,
the trend vanished (R-squaied = 0.117). Foreign-language experience does not, therefore,
explain the weakness of the bias against [bw].
Figure 4.88. Total number of "bw" responses in Experiment 4 as a function of individual

listeners' exposure to languages containing [bw] or [pw] onsets (French, Mandarin Chinese,
or Spanish)
200J00
tacuoo -
~ WOOD -
300 0 “
Years of exposure
4.5.7. A note on TRACE
There is another interpretation of TRACE, under which it may make the correct
predictions about Experiment 5.
The TRACE network is an idealization of a complex perceptual process. Each time
TRACE is exposed to a given simulated input, it settles into the same state. An actual
202
biological system would not do that, owing to uncontrolled random factors which differ
from trial to trial (such as perceptual noise, pre-activation of lexical items through priming,
or variations in attentional weight placed on different acoustic features of the stimulus). A
fixed stimulus would evoke a range of activation levels, leading to a range of response
probabilities; this is a standard assumption of signal detection theory (e.g., Macmillan &
Creelman 1991). These could interact through the network mechanism to produce a
dependency between stop and sonorant decisions for an acoustically fixed stimulus.
For example, a stop that on the average is exactly halfway, perceptually, between [d]
and [g] would sometimes activate the [d] unit a bit more and sometimes activate the [g] unit
a bit more. Of the "d" reports, a disproportionate number would come from the trials on
which the stimulus activated the [d] unit more. On these more [d]-like trials, the [d]-initial
cohort would gain an early advantage and combine to excite the [d] unit even more, leading
to inhibition of the [g] unit and hence of the [g]-initial cohort. The [dw]-initial words would
then feed activation down to the [w] unit. The [I] unit would receive no corresponding
support - the [gl] words being dormant and the [dl] words nonexistent - resulting in an
increased likelihood to respond "w". Hence, "d" responses would tend to be associated
with "w" responses - a dependency between the stop and sonorant decisions.
To assess this dependency, we need to manipulate the activation levels of the stop
units and measure the effect on the activation levels of the sonorant units. This can be done
by simply giving the network unambiguous stops together with ambiguous sonorants.
An ambiguous [l/w] phoneme was created for the TRACE input. Simple parameter
averaging of [I] and [w] produced a segment which stimulated no unit strongly. Parameters
were modified by trial and error until a segment had been attained that excited both the [1]
and [w] unit nearly equally, without exciting the [j ] unit.12 There was a slight initial bias
12 The parameters are: POW 3, VOC 3, DIF 7, ACU 5, GRD 4, VOI 1, BUR 0. It was also necessary to
replace the [S| unit with a copy o f the [s] unit, because otherwise the ambiguous segment activated it.
203
towards [I], which increased over time to a factor of 2 at Cycle 72. Since this effect was
present in all conditions of the simulation, it should not affect the pattern of predictions.
The same lexicon was used as in the TRACE simulation of Experiment 3. As in
Experiment 3, it was found that the initial (b/g) or (d/g) segment excited a great many words
ending in [b d g]. To avoid this artifact, the same procedure was followed as before: Final
[b d g] were replaced by [p t k]. The words guano and dwarf were added, to insure that the
lexicon had at least one word beginning with each attested onset13. The resulting lexicon
contained the following words with the relevant onsets:
Table 4.89. Lexicon for TRACE simulation: Words with [b/d/g]+[w/l] onsets
Onset Words
[gw] guano
glad, glass, glue

[gl]
[bw] (none)
[bl] blush, blood, blast, black, bleak, blue
[dw] dwarf
[dl] (none)
For this simulation,, it is important that the stop and the sonorant interact only
through activation spreading in the network, rather than through coarticulation. The event
being simulated is the repeated presentation of a single stimulus; the effect of interest is the
13 They were adapted to the TRACE phoneme inventory as dwarp and gwadu.
204
influence of the stop decision on the sonorant decision. We cannot manipulate the stop
decision directly, so we must do it by actually manipulating the stop in the simulated input.
However, the features of that simulated stop will be realized by TRACE not only at the time
slice of the stop, but for several time slices on either side, with the result that an initial [b]
will pre-activate [w] for reasons which have nothing to do with phonotactics. Manipulating
the stop does not just manipulate the activation input to the stop units, but also that to the
sonorant units. To prevent this, the TRACE program was altered so that the acoustic
features did not spread more than three time slices on either side of their center, preventing
overlap entirely.
The predicted log odds ratio follows from
(4.90)
l^p(»j")_ f exp (kl) ^ exp(/W) + exp(fov)

/T V ) exp(kl) + exp(kw) exp(kw) ,
exp(kl)
exp (kw)
= k(l - w)
where / and w are the activation levels of the [I] and [w] units. That is, the log odds ratio of
the "17"w" decision is proportional to the difference in activation levels between the [1] and
[w] units.
Given the input [g?ae], [1] and [w] start out equally active. By Cycle 45, glad, glass,
and glue are pulling ahead of guano, and [1] has overtaken [w]. By Cycle 75, [1] is three
times as active (39 to 14), and [w] is extinguished by Cycle 96. The log odds ratio at Cycle
75 is therefore 25it.
With [b?ze] as input, the same thing happens, only faster since there are more words
beginning with [bl] but no countervailing [bw]-initial words By Cycle 75, the activation
205
levels for [I) and [w] are 49 and 4, and [wj is deactivated on Cycle 81. The log odds ratio at
Cycle 75 is 45k.
With [d?ael as input, [1] and [w] are neck-and-neck for a long time. Dwarf
gradually gains ground, but is held in check by other weakly-activated [d]-initial words like
do, D, and deal. On Cycle 75 [w] is still only half again as active as [1] (32 to 20), and [1]
does not go extinct until Cycle 105. The log odds ratio at Cycle 75 is -12k.
The effect of the "g'V'b” decision on the Cycle 75 log odds ratio is therefore to shift
it by 20k, while that of the "g'V'd" decision is to shift it by -31k. TRACE therefore predicts
that the "g'V'd" decision will have a larger effect on the "l"/"w" decision than will the
"g'V'b" decision - as was in fact observed.
The success of this simulation is due to the fact that TRACE implements
phonotactic bias not by inhibiting low-frequency sequences, but by facilitating high-
frequency ones. Here, the difference in frequency between [gl] (glad, glass, glue) and [gw]
(only guano) was enough to make the [gje] context favor [1]. In this respect it behaved
more like [b_se] and less like [d_re], reducing the effect of the "b'V’g" decision and
increasing that of the "d'V'g" decision. That is to say, the [g_ae] context did not provide a
neutral baseline.
All this is taking the most favorable possible view of the feasibility of modelling
judgment dependencies as a consequence of perceptual noise added to TRACE inputs. This
scheme will require quite a lot of noise. In the above simulation, the stop units were
supplied with enough "noise" to completely disambiguate them - the simulated listener, on
that trial, heard an ambiguous segment as a perfect [g], [d], or [b], A more realistic, i.e.
smaller, estimate of perceptual noise would lead to a correspondingly smaller variation in
stop-unit activation levels for a fixed stimulus, which would in turn cause smaller variation
(from presentation to presentation of the same stimulus) in both stop and sonorant response
probabilities. It seems implausible that the covariation between stop and sonorant responses
would show up as strongly as it did in only 160 trials from 16 listeners.
206
4.6. Experiment 5: Phonotactics and syllabification
4.6.1. Rationale
Another possible source of the effects in Experiment 4 is compensation for
coarticulation (Mann 1980, Mann & Repp 1981, Mann 1986). Suppose an ambiguous stop
in the'd' array is perceived as [d]. Since the F2s in the'd' array range from slightly below
that of a [d] to slightly above that of a [b], they are all lower than the typical [d]. A stop
heard as [d] thus has an atypically low F2 for a [d]. Some of this lowness may be attributed
to labialization from a following [w], causing more “dw” and fewer “dl” responses.
Because [b] and [g] have similar F2s, the compensation effect may be smaller in the “b”
than the “d” array, producing the observed results.
This account can be tested by manipulating the stimuli to alter their phonotactics
while leaving their coarticulatory properties intact. As pointed out by Pitt (1998), a cluster
which is illegal in an onset may become legal if split by a syllable boundary. A structural
account predicts less bias against “dl” responses in [xdlx] than in [dlse], because [aedlae]
allows the legal parse [aed.ls], A compensation account predicts the bias will persist, as
compensation has a strong effect across syllable boundaries (Mann 1980; Mann 1986;
Elman & McClelland 1988; Pitt & McQueen 1998), is unaffected by perceived
syllabification, and is only slightly reduced, if at all, by a preceding vowel context (Mann &
Repp 1981).
207
4.6.2. Design
The design was based on that of Experiment 4. Two 6-by-6 arrays of CCV and
VCCV stimuli were constructed, ambiguous between [dwae dlae bwae blae] or [aedwae aedlaae
aebwae aeblae]. Both [dlae] and [bwae] were included to maximize the expected phonotactic
effect. Listeners were asked to make the same judgments as in Experiment 4.
4.6.3. Predictions
The predictions of TRACE, INC-1, and SC-1 are exactly as in Experiment 4.
TRACE predicts that the stop and sonorant judgments will be independent, both in the
VCCV and the CCV condition. Neither the INC-1 nor the SC-1 context can "see" the
prepended vowel, so they predict that adding the vowel will not affect the stop-sonorant
dependency.
The OT grammatical theory predicts that the stop-sonorant dependency will be
reduced in the VCCV condition compared to the CCV condition. This is because the
VCCV stimuli can be syllabified as VC.CV. If the stop is classified as "d", then "1" will be
disfavored in the CCV condition because [dlae] violates OCP(CONT, PL); however, in the
VCCV condition, an "1" response will still be possible if the input is parsed as VC.CV.
4.6.4. Methods
From the endpoints of Experiment 4, a 6-by-6 array of CCV stimuli was
constructed, ambiguous between [dwae dlae bwae blae]. A 6-by-6 array of VCCV stimuli
was made by adding a 300 ms [ae] to each of the CCV stimuli. This [ae] used the same
parameters as the final [ae], except that FO began higher (120 Hz). Transition to the stop
took 40 ms. A 40-ms voiced closure preceded the release; though short, this proved to be
the most natural-sounding duration. Details are in Figure 4.91.
208
Figure 4.91. Synthesis parameters for the stimuli of Experiment 5
(a)
ffl
•o AV
AF.AH
000
000 10000 20000 30000
tun* (ms)
(b )
FTP
FTZ
time (ms)
209
(C)
03
B2
040
0130
time (ms)
(d)
B4
73)01
B2F
N
X
c
•v
A
00)
MOO) 20000 3000)
tim« (ms)
210
(e)
TSjOO -
A2F
m
■o AB
>
TL
OjOO
time (ms)
(0
o
X
N
X
o
c
•
3
V
•
tz
oa t
toooo 20000
time (ms)
211
Eighteen different members of the same population as in Experiment 1 participated
for psychology course credit. Two were dropped because their native language was not
English, leaving 16 valid subjects.
The only difference from Experiment 4 was that all listeners were tested on the
VCCV block first and the CCV block second, to avoid priming a V.CCV syllabification.
4.6.5. Results
Results are shown in Figures 4.92 and 4.93. As in Experiment 4, bias appears as
displacement from the line y = x. The displacement is greater and more consistent in the
CCV than the VCCV condition. The test statistic was again D, the log of the “I T w ”
odds ratio contingent on a “d” decision minus that contingent on a “b” decision, averaged
across all stimuli.
212
Figure 4.92. Log odds ratios for the "l'V'w" judgment in Experiment 4, contingent on the
"b"/"d" judgment, for the CCV stimuli. Each point represents 16 listeners' pooled
responses to one stimulus. (Stimulus codes are explained in the text.)
CCV stimuli
200
100
52
3 03 02
£ 00
Sl 33
000
34
£ -too ■ + -
05
+ —
000 100 200
In (P(dl)/P(dw))
213
Figure 4.93. Log odds ratios for the TV'w" judgment in Experiment 4, contingent on the
"b'V’d" judgment, for the VCCV stimuli. Each point represents 16 listeners' pooled
responses to one stimulus. (Stimulus codes are explained in the text.)
VCCV stimuli
200
50
22 00
32
43
53
24
54
-2 jOO
04 45
55
000 100 200
In (P(dl)/P(<!w))
For the CCV array, D is 1.0505, for the VCCV array, it is 0.0648. The same
A
nonparametric bootstrap procedure was used to test significance. For the CCV array, Doos
A
= 0.4370 and Dooi = 0.5685, confirming a phonotactic effect. For the VCCV array, the
A A
effect did not approach significance: Doos = 0.4362 and Dooi = 0.6269.
The results indicate that the dependency was eliminated by the availability of a legal
parse: The prepended vowel made a difference. This is unexpected under TRACE, INC-1
or SC-1, but is precisely as predicted by the OT grammatical theory.
214
4.6.6. Discussion
Experiment 4 found a perceptual bias against [dlje], but none against [bwae). The
INC-1 version of MERGE TP predicted otherwise, because [dl] and [bw] are both
unattested as English syllable onsets. Since listeners’ experience of both onsets is identical,
that experience cannot explain the difference in performance. TRACE incorrectly predicted
no dependency at all between the stop and sonorant decisions. Only the SC -1 version of
MERGE TP and the OT grammatical theory correctly predicted the pattern of results.
Foreign-language experience provided no alternative explanation. The difference
was not due to auditory factors, since bias was measured separately for each stimulus;
rather, it reflected a dependency between the stop and sonorant responses. Experiment 5
confirmed that this dependency was not compensation for coarticulation, because it could be
reduced or eliminated by providing a legal parse for the cluster.
It may be objected that listeners' experience of [dl tl] and [bw dw] is not in fact
identical -- that there is a frequency difference, too small to be detected in an 18-million-
word British English corpus, in favor of [bw pw], which university-aged speakers in the
United States are likely to have encountered in foreign place names such as Buenos Aires,
southwestern U.S. place names such as Pueblo, or occasional loans like puissant or the
colloquial bueno. At a conversational speaking rate of 150 words per minute (Venkatagiri
1999), an 18-million-word corpus would represent only 8Sdays of continuous speech, or
perhaps one to three years of a person's combined input and output. A word occurring less
frequently than once in one to three years could escape the corpus —though an 18-year-old
participant in these experiments might have heard it 18 times or more, providing enough
experience of [bw pw] onsets to remove the perceptual bias against it. This Undetected
Frequency Difference (UFD) Hypothesis is a serious objection, but it is unlikely to be
correct.
215
As has already been pointed out, listeners’ acceptance of [bw] was not increased by
up to 9 years of explicit training in languages in which [bw pw] onsets are common. It was
argued above that this is a ceiling effect; acceptability of [bw pw] cannot be increased by
training because the sequences are legal in English. If instead the UFD Hypothesis is
correct, then the whole of the gain in acceptability must be caused by the exposure to the
first few tokens, with subsequent training having no effect. Hence, it should take exposure
to only a small number of tokens to make any sequence legal. But speakers persist in
treating some sequences as illegal, even after considerable training (Dupoux et al. 1999;
Polivanov 1931).
In support of the UFD Hypothesis, it may be replied that the listeners were exposed
to the undetected low-frequency [bw pw] as children, but received foreign-language training
as adults, after the critical period for accentless acquisition. It is certainly true that infants as
young as 9 months are already sensitive to the sound pattern of their language (Jusczyk,
Friederici, Wessels, Svenkerud, & Jusczyk 1993; Friederici & Wessels 1993; Jusczyk et al.
1994). However, adults can leam phonotactic patterns even without explicit training (Dell,
Reed, Adams, and Meyer, 2000). Moreover, a dispreference for [dl tl] compared to [bw pw]
is found in children who were unlikely to have been exposed to [bw pw] onsets. As shown
in Table 4.29, the midwestem U.S. States of Iowa and Nebraska had few Spanish- or
French-speaking residents in 1990 and almost no place names beginning with [bw pw].
216
Table 4.94. Demographics and occurrences of [bw pw]-initial place names in Iowa and
Nebraska (U.S. Census Bureau, 1990; DeLorme 1998; 2000)
Iowa Nebraska
Population of Hispanic origin (1990) 1.18% 2.34%
Language spoken at home (1990)
Spanish or Spanish Creole 1.13% 1.55%
French or French Creole 0.29% 0.26%
Place names beginning with
[bw] (Bue-, Boi-) Buena Vista (town, ---
county, and college)
[pw] (Pue-, Poi-) — —
Note: The proportion of the overall U.S. population of Hispanic origin in 1990 was 8.99%.
"Buena Vista", in Iowa, is locally pronounced with initial [bw] (Buena Vista College Library
staff, personal communication, 2001).
In a study of 1,049 children in Iowa and Nebraska between 2 and 9 years of age,
Smit (1993) systematically elicited productions of most of the English word-initial clusters,
including [bl pi] and [tw]. The [tw] cluster was sometimes produced as [bw] or [pw], but
the [bl] and [pi] clusters almost never became [dl] or [tl], as shown in Table 4.30.14 This
indicates that [d t] are more disfavored before [I] than [b p] are before [w].
14 Similarly, these children also sometimes produce (bl pll as [bw pw), with no corresponding tendency to
turn [tw] into [tl]. Aversion to [tl] may be a contributing factor, but we cannot be sure, because they tend
to replace [I] with [w] in all environments.
217
Table 4.95. Errors in production of the initial stop in [bl pi tw] onsets by English-Ieaming
children in Iowa and Nebraska (Smit 1993)
Onset cluster Error rate category
Age group Occasional Rare
[tw-] (twins)
2:0-3:0 f,b P
3:6—5:6 p, k, d f, int
6:0-9:0 — —
[p H (plate)
2:0-3:0 — —
3:6-5:0 — b, t
5:6-7:0 — —
8:0-9:0 — —
[bl-] (block)
2:0-3:0 — —
3:6—
5:0 — —
5:6-7:0 — —
8:0-9:0 — —
Note: "Occasional" means "[u]sed by a few groups in an age range with a frequency of 4-
10%, or by most groups in that age range at frequencies of 1-4%"; "rare" means "[ojccurs
with a frequency of less than 3%, and only in a few groups in an age range” (Smit 1993, p.
947). This table includes all errors made by the 1,049 children in the study, "int" =
interdental.
The asymmetry is present at the earliest ages tested — before one would expect
most Iowan or Nebraskan children to have had much exposure to Spanish place names.
The UFD Hypothesis can therefore only be defended if the perceptual effects of frequency
are due chiefly to a very few tokens experienced very early in life. If so, it is an interesting
new finding, with many consequences. It implies that, contrary to TRACE, the many words
218
learned after early childhood contribute little to the phonotactic frequency effect. It predicts
large individual variation in phonotactics (since the individual is generalizing from a small
sample of the adult language, which will necessarily differ more between individuals than a
large sample). Finally, it suggests that even large corpora of adult language are inadequate
predictors of phonotactic performance, and that research on probabilistic phonotactics
should focus more on child-directed speech.
These findings are consistent with a model in which the decision between competing
parses is guided by the structural constraints of the perceiver’s language - here, the ban on
[coronal][coronal] onsets. In Experiment 4, where syllabification was fixed by clear
acoustic cues, the choice was between competing CCV parses. The “dl” responses were
reduced because a “dl” response could only be supported by the structurally disfavored
[dire] parse. In Experiment 5, where both segmental identity and syllabficiation were
ambiguous, “dl” responses could be supported by the legal [xd.lx] parse, and the
response bias disappeared.
The findings of Pitt (1998, Experiment 2) may be reinterpreted in the same way:
“1” response to an [1]-[j ] continuum was reduced, relative to a baseline, in the context
[maet_ae], but not in [mxd_ae]. Strong aspiration on the [t] provided an unambiguous cue to
V.CCV syllabification (Kirk 2001), allowing only the parses [mx.traj and the illegal
[mae.tlae], The [maed_x] context allowed VC.CV syllabification and thus the legal “1”
parse [mxd.lx]. This suggests that prosodic and segmental parse decisions are made in
parallel, with the candidate parses representing both phonemes and syllabification:
[msed.lx], [mx.dlx], [mxd.rx], and [mae.drre]. The chosen prelexica! parse thus provides
the essential information for word segmentation and lexical access. Phonotactically
impossible parses, such as those with vowelless syllables or illegal onset clusters, are
inhibited, leading to the Possible Word Constraint effects observed by Norris, McQueen,
and Cutler (1997).
219
4.7. Experiment 6: Phonotactics of the Japanese lexical stratals
4.7.1. Influence of remote context
Experiments 1-3 all examined the issue of whether a perceptual boundary could be
shifted by phonological context that was remote from the ambiguous segment. That it could
be was a main prediction of TRACE, and also of versions of the transitional-probability
theory which used contexts longer than a single phoneme. Results were uniformly negative:
The identity of a segment outside the phonotactically relevant context did not affect
boundary location even when segments within it did.
In these experiments, the relevant context was quite small, including only the
ambiguous segment and one adjacent to it. The INC-1 and SC-1 versions of the TP theory
therefore took only the relevant context into account in making their predictions. Where the
grammatical theory said that remote context was ineffective because it was irrelevant, the TP
theory said it was ineffective because it was remote.
This section presents evidence that remote context can have a phonotactic effect
when it is (despite its remoteness) phonotactically relevant. It will be shown that Japanese
listeners'judgments of vowel length in nonword stimuli are affected by phonological
context which is remote from the ambiguous vowel, but which is important in determining
the lexical stratum membership of the nonword. This can be taken as further evidence
against the INC-1 and SC-1 versions of the MERGE TP theory. The effect of stratum
phonotactics is found to be numerically larger and statistically more robust than a word-
superiority effect obtained with the same listeners and paradigm, contrary to the predictions
of TRACE.
15 The work in this section was perform ed with the guidance and support o f Shigeaki Amano at the NTT
Basic Research Laboratories in Atsugi, Japan.
220
The finding of a stratum-phonotactic effect is also strong evidence in favor of the
OT grammatical theory, which is the only one of the three theories in which the concept of
"lexical stratum" is at all motivated.
4.7.2. Rationale: The lexical strata of Japanese
Items in the Japanese lexicon can be divided into four classes, called "strata".
Words in a given stratum share historical, phonological, morphological, and orthographic
characteristics (Shibatani 1990; Tateishi 1990; Martin 1952; ltd & Mester 1994 1995).
These strata are summarized in Table 4.96.
Synchronically, the strata can be identified in two principal ways.
First, they are morpheme co-occurrence classes: In forming words, morphemes
combine preferentially with other morphemes of the same stratum. For example, the two
morphemes [dai] (Sino-Japanese) and [o:] (Yamato) both mean big'. The former combines
with Sino-Japanese [moq] 'gate' to give [daimoq] 'main gate of a Buddhist temple’, the latter
with Yamato [te] 'hand' to give [o:te] 'main gate of a castle'. Since all of the verbal and
adjectival inflectional endings are Yamato morphemes, only Yamato items undergo
inflection.16
16 A few exceptions have been noted: [dem orui] 'to demonstrate (politically)', consists o f a Foreign stem
spliced onto a Yamato ending (Sato 1983).
221
Table 4.96. The lexical strata of Japanese
Stratum Origin Orthography Examples
Yamato Antiquity "Sense reading" [kami] 'god'

(kun-yomi) of
[yama] 'mountain'
Chinese characters
Mimetic Onomatopoeia Hiragana [potapota] 'drip'
Sino-Japanese Borrowings from "Sound reading" [kazarj] 'volcano'

Chinese (on-yomi) of
[keizai] 'economy'
Chinese characters
Foreign Borrowings from Katakana [maikui]

European languages
'microphone'
[katarogui]
'catalogue'
Second, the strata are subject to different sets of phonotactic restrictions.17 A Sino-
Japanese morpheme, for example, cannot be more than two moras long, and the second
mora must be either a single vowel, or one of [t^i tsui ki kui]. A Yamato morpheme cannot
begin with [r], while a Mimetic morpheme can have [r] initially or medially but not both
(Tateishi 1990). Single [p], voiced geminates, and voiceless post-nasal obstruents are
prohibited in Yamato, but permitted in Mimetic, Foreign, and Sino-Japanese, respectively.
17 There are also commonly supposed to be productive phonological processes which operate only in
certain strata. However, attempts to demonstrate their productivity experimentally through native-speaker
"goodness" judgm ents have tended to show the opposite. For rendaku, see Suzuki, Maye, and Ohno (2000);
for the phonology o f verbal affixes, see Vance (1987, 1991). This is a common result of such studies: See
Hsieh (1976) for non-productivity o f Taiwanese tone sandhi, Zimmer (1969) for non-productivity o f a
Turkish labial ity-spreading rule.
222
Vowel length is distinctive in Japanese. Each of the five short vowels [i e a o ui]
has a long counterpart. Final [a] is found in all strata, while final [a:] is found only Foreign
words.
The absolute phonotactics of the strata can be viewed as a “core-periphery”
phenomenon: strictest for the indigenous Yamato stratum, less so for the newer Sino-
Japanese words, and most permissive for the recent Foreign loans (Ito & Mester 1994,
1995). However, there are many configurations in Sino-Japanese which, though
theoretically permitted in the Foreign stratum, are virtually never actually found there
(Moreton et al. 1998). For instance, the palatalized onsets [r*] and [h*] are almost
nonexistent outside of Sino-Japanese.
The result is to create long-distance phonological dependencies. For example, a
word containing [rj] or [h*] is almost certain to be Sino-Japanese, and hence to lack [a:],
while a word containing singleton [p] or [$a] is almost certain to be Foreign, and hence to
permit it. Carrier contexts containing different stratum cues can be constructed, and the
[a]-[a:J boundary measured, in order to detect whether stratum membership can influence
boundary location.
4.7.3. Experiment 6a: Word-superiority effect
In order to estimate the size of a pure lexical effect on the long-short vowel
boundary location, an experiment like that of Ganong (1980) was performed.
223
4.7.3.1. Design
Three pairs of words were selected. One member of each pair ended in a short
vowel and the other in the long version of the same vowel. If a word ended in a short vowel,
then making the vowel long produced a nonword, and vice versa. Both words in each pair
were in the same stratum, and were in or above the 96th percentile in rated familiarity when
presented aurally (Amano & Kondo in press). Both members of each pair had the same
accent pattern. The words are shown in Table 4.97:
Table 4.97. Stimulus words used in Experiment 6a
Word Stratum Gloss
$oro: Foreign ‘follow-up’
puiro Foreign ‘professional’
posuita: Foreign ‘poster’
pasuita Foreign ‘pasta’
sju:go: Sino-Japanese ‘meeting’
rirjgo Sino-Japanese ‘apple’
Each word pair provided one context which was expected to lexically bias perception
in favor of the short vowel, and one which was expected to bias in favor of the long vowel.
It was expected that there would be a bias to report [a:] in the context [posut_], since
[posuta:] is a real word while [posuta] is not, and that there would be a bias to report [a] in
the context [pasut_), since [pasuta] is a real word while [pasuta:] is not.
224
4.7.3.2. Methods
A male native speaker of Japanese recorded several tokens of each word in isolation.
These were digitized at 48 kHz with 16-bit resolution, downsampled to 44.1 kHz, and
normalized for peak amplitude. Using a waveform editor, a single token each of the final
syllables— [ro:], [ta:], and [go:]—was excised and cross-spliced with each of the initial
syllables to make three pairs of stimuli with each pair ending in the same long vowel. The
length of the vowel was adjusted to be as close as possible 250 ms by repeating medial pitch
periods.
Experimental subjects were 24 native speakers of Japanese, post-secondary students
in the Tokyo area, with equal numbers of each sex. Subjects reported normal speech and
hearing. They were paid for their participation.
Subjects were tested 8 at a time in a sound-treated room in the presence of the
experimenters. Stimuli were presented diotically through Sennheiser headphones at a peak
intensity of 70 dB SPL. Subjects responded by mouse-clicking on one of two buttons
displayed on the screen. The buttons were labelled in the katakanu syllabary (used
primarily for writing Foreign loanwords); one showed the stimulus word with a short final
vowel, the other with a long one (e.g., “ringo” and “ringoo”). Subjects were asked to
choose the button which better matched what they had heard. The next stimulus followed
after 0.8 seconds.
To vary the length of the final vowel, a half-wave raised cosine filter was applied.
This caused the final vowel to be gradually reduced to zero amplitude during the 50 ms
following the specified starting point of the filter, providing a natural-sounding end to the
vowel.
The boundary between the long and short vowel was established using the adaptive
method PEST (Taylor & Creelman 1967). One of the 6 stimuli was randomly selected, and
presented with either the longest or the shortest possible final vowel. On the next trial, the
225
opposite endpoint was presented. Thereafter, the stimulus presented depended on the
subject’s responses to previous stimuli, as specified by the PEST rule, so as to zero in on
the boundary. The series ended when the step size was reduced to 1/256 of the
continuum—i.e., about 1 ms— or after 80 trials. To reduce the dependence of each trial on
the previous one, two PEST series for the same context were interleaved with each other,
switching back and forth randomly. Once both series finished, the screen cleared and a
button appeared with the message “Click to go on to the next sound”. This procedure took
place once for each of the 6 stimuli.
4.7.3.3. Results and discussion
Three subjects were excluded from the analysis for failure to converge after 80 trials
on 6 or more series (in this or the following experiment combined). The remaining 2 1
listeners converged on 96% of all series after an average of 20.4 trials in each.
For each subject, an [a]-[a:] or [o]-[o:] boundary was computed for each series. The
boundary was the stimulus that would have been presented by PEST if there had been one
more trial. Since two interleaved series were presented for each carrier context, the subject’s
boundary for that stimulus was taken as the average of the two.
For each subject and each pair of words, we calculated the difference between the
boundary in the long-bias context and that in the short-bias context. Results are shown in
Table 4.98.
226
Table 4.98. Boundary difference in long- and short-biased contexts (in milliseconds).
Experiment 6a (N = 21 Ss)
Long-bias context Short-bias context Boundary difference Standard error
<jior_ ptur_ -4.22 4.92
posuit_ pasuit_ * 12.21 5.45
sju:g_ 9.04 4.92

ngg_
Note: Asterisk represents 5% significance.
There were large amounts of individual variation in the differences, leading to large
variances and low significance levels. The difference was significantly different from zero
in the posut-Zpasut- context (one-tailed t-test, t(20) = 2.086, p < 0.05). The syuug-Zring-
difference just misses significance at the 5% level (t(20) = 1.725). No effect could be
found for the for-Zpur- pair.
This experiment partially replicated the lexical bias found in ambiguous-phoneme
perception by Ganong (1980). A robust word-superiority effect, however, was not found.
The weak showing of the lexical effect is all the more striking when compared with the
strong results of the next experiment.
4.7.4. Experiment 6b: Lexical stratum phonotactics
We used the Massaro-Cohen paradigm, as in Experiments 1-3, to test whether
stratum phonotactics can shift the phonetic boundary between two sounds—specifically,
whether the different phonotactics of the Foreign and Sino-Japanese strata can shift the
boundary between word-final [a] and [a:].
The idea was to present an [a]-[a:] continuum at the end of a carrier word containing
two consonants, C and C \ If C and C ’ are sounds that occur almost exclusively in Sino-
Japanese words, then they do not naturally co-occur with [a:]. We therefore expected that
listeners would require exceptionally strong acoustic evidence (i.e., a longer vowel) before
227
reporting [a:]. On the other hand, if C and C ’ are sounds found only in Foreign words,
then it is unsurprising if the word contains (a:J, and we expected in this case that listeners
would be readier to report the long vowel. In other words, we expected that the [a]-[a:]
boundary would be shifted towards [a:] when C and C’ were highly valid cues for Sino-
Japanese, and that it would be shifted towards [a] when they are highly valid cues for
Foreign.
4.7.4.1. Design
Cue validity and co-occurrence statistics were computed based on a preliminary
(pre-release) updated version of the Japanese-language database of Amano and Kondo
(1999), by exploiting a feature of the Japanese writing system: the lexical stratum of a word
is reflected in its orthography. Foreign words are written in the katakana syllabary.
Chinese characters are used to write both Sino-Japanese and Yamato words, but the
dictionary indicates which pronunciations of a given character are Sino-Japanese and which
are Yamato. There are some exceptions, but they are not numerous. (Details of the
classification procedure are given in Moreton et al. (1998).)
Cues to stratum membership were chosen on the basis of these statistics. For the
Foreign stratum we chose as C the singleton [p] (i.e., [p] neither geminated nor preceded by
a nasal coda) and as C’ the voiceless bilabial fricative [<|>] (orthographically <f>), which
' outside of the Foreign stratum is found only before [ui]. For the Sino-Japanese stratum we
chose [rj] and [h*]. For neutral contexts we chose [r] and [t]. The statistics are shown in
Table 4.99:
228
Table 4.99. Validity of cues to stratum membership: Number of nouns in database
belonging to each stratum containing the given cues
Stratum
Foreign Sino-Japanese Other
Foreign (F) cues
[p] singleton 812 1 83
[<Mbefore [i e a o] 214 0 16
Neutral (N) cues
2922 2917 6530

[r]
[t] 1683 4068 4336
Sino-Japanese (SJ) cues
W) 19 959 109
[h*] 7 273 38
Note: “Other” includes the Mimetic and Yamato stratum, compounds containing words
from more than one stratum, and words which the orthographic algorithm could not
classify. Nouns make up 82% of the database.
The consonant cues were embedded factorially in the template [CoC'_] to produce
nine carrier contexts, ranging from Foreign/Foreign [po<j>_] to Sino-Japanese/Sino-Japanese
[fJohU18
18 None o f the stimuli could be interpreted as Yamato items. Each stimulus begain with either [p], [r], or
[r1], none o f which is permitted word-initially in Yamato.
229
4.7.4.2. Predictions
4.7.4.2.1. TRACE
There is nothing in TRACE that would allow it to represent the concept of stratum
directly. Stratum effects, if there are any, must emerge from statistical properties of the
lexicon.
It was not possible to perform an actual TRACE simulation of this experiment.
However, we can infer something about its predictions based on its behavior in previous
simulations.
In each of the phonotactic simulations we have run, the advantage for one phoneme
over another has come, not from slight activation of many lexical items, but from the
moderate incomplete activation of one or two lexical items that are very similar to the
ambiguous stimulus. Their activation is strong enough to extinguish the other word
competitors which overlap the stimulus to a smaller degree.
A word-superiority effect differs from a phonotactic effect only in that a single word
candidate is very highly active. As a result, the magnitude of a "phonotactic" effect is
predicted to be smaller than that of an outright lexical effect. We expect a smaller effect in
this experiment than in Experiment 6a.
4.7.4.2.2. MERGE TP
MERGE TP would make lexical stratum effects on perception, if any, an emergent
statistical property of phoneme sequence probabilities.
The INC-1 and SC-1 versions of the MERGE TP theory predict only an effect of
the proximate (C ) context, not of the remote (C) context, because they do not keep track of
dependencies more than a single segment away from the ambiguous segment. If an effect
of C is found, it will constitute positive evidence either that the contexts must be larger, or
that the phonotactic effect is not entirely due to transitional probabilities.
230
4.7A2.3. OT grammatical theory
OT analyses of the Japanese lexical strata are in agreement that the differences
between strata lie in faithfulness rather than markedness. One proposal is that there are
separate grammars for the separate strata, which differ in how the faithfulness constraints
are ranked. Lexical items are evaluated with respect to their stratum's grammar (Ito &
Mester 1995). An alternative account is that there is only one grammar, but that it contains
duplicate sets of faithfulness constraints, which are stratum-specific. Each set applies only
to lexical items which belong to one stratum. Different rankings of each set result in
different input-output mappings for different strata (Fukazawa et al. 1998).
The choice is not a crucial one for the present experiment; the reasoning is nearly
the same in either case. For expository purposes I will adopt the second proposal, stratum-
specific faithfulness constraints. Its principal theoretical advantages are that it is
parsimonious (preserving the principle of a single grammar with a single fixed ranking) and
offers a learning algorithm, driven by ranking paradoxes: Its principal empirical advantage
is its ability to deal with "hybrid" compounds - compound words containing words from
different lexical strata, such as the Yamato/Sino-Japanese [tombokeqkjiu:ka] 'dragonfly
researcher’, where each member of the compound obeys the phonotactic restrictions
appropriate to its stratum.
The illegality of [a:] in Sino-Japanese is reflected in the ranking of the Sino-
Japanese lDENT[LENGTH] faithfulness constraint below *[a:]. Its legality in Foreign items
is captured by ranking the Foreign IDENT[LENGTH] above *[a:].
231
(4 .1 0 0 ) lDENT[LENGTH)FOR » *[a:] » I d e n t [ L e n g t h 1s j
Ident [Len g th Jfo r *[a:l IDENT[LENGTH]SJ
/d e n s W s j
a. [densia:] *!
*
b. [densja]
/konp*uita^For
*
c. [konp’uita:]
d. [konp’uita] *!
The predictions made by this grammar in the grammatical model hinge crucially on
the concept of "active constraint", which, as discussed in §3.4.2., is any constraint which, for
some input /, eliminates a candidate output from consideration. In order to use grammatical
knowledge, the parser must decide which markedness constraints are active. In order to do
that, it must decide which stratum it is dealing with. The listener must covertly choose a
stratum within which to interpret the stimulus; that is, the linguistic parse which the listener
attempts to assign to the stimulus includes a stratum classification. The active constraints
are determined relative to the stratum's faithfulness constraints. If the stratum chosen is
Foreign, then *[a:] is inactive and produces no bias. If it is Sino-Japanese, then *[a:J is
active and produces a bias.
232
(4.101)
/any input/ Ident[Length ]fo r *[a:] IDENT[LENGTH]SJ
a. [po<[>a]For
*
b. [po$a:]For
c. [rjohia]SJ
*
d. [rjohja:]SJ
In this view, the Foreign cues can provide conclusive evidence of Foreign stratum
membership, because they are illegal in Sino-Japanese - they violate a markedness
constraint which is active with respect to Sino-Japanese. The reverse is not true: Although
the Sino-Japanese cues are rare in Foreign words, they are not actually illegal there. Hence,
any of our stimuli, with either vowel length, has a legitimate parse as a Foreign word. Those
containing Foreign cues cannot be parsed as Sino-Japanese; those lacking Foreign cues can
be parsed as Sino-Japanese.
There should thus be only two degrees of [a] bias: a lesser one in words containing
Foreign cues, and a greater degree in words lacking them. If C is a Foreign cue, then
manipulating C' should have no effect, while if C is not a Foreign cue, then manipulating C'
should have an effect (since making C' Foreign will reduce the [a] bias). The same is of
course true with C and C exchanged. For example, the stimulus contexts [po$_] and
[rjoi{»a_] are predicted to have the same effect, since the presence of [ $ a j cues Foreign
stratum membership regardless of the initial consonant. Both should produce more "aa"
judgments than contexts lacking Foreign cues. This will show up statistically as an
interaction between the effects of the C and C' manipulations.
233
4J.4.3. Methods
The procedure was almost identical to that of Experiment 4a. The same speaker
(naive to the purpose of the experiment) was recorded on digital audio tape producing the
cues embedded in the context [CoC’a:], with a high pitch on the [o] and a low one on the
[a:]. This accent pattern was chosen because it is common in both the Sino-Japanese and
Foreign strata, and because it rules out the possibility of a word boundary inside the
stimulus. The 9 possible combinations of C and C ’ yielded the following nonwords:
Table 4.102. Stimuli for Experiment 6b
c Foreign Neutral Sino-Japanese
Foreign [po<(»a:] [pota:] [poh’a:]
Neutral [ro$a:] [rota:] [roWa:]
Sino-Japanese [rjo^a:] [Wota:] [r’oWa:]
These were digitized and normalized as before. Using a waveform editor, single
tokens of [po], [ro], and [rJoJ were selected and excised. One of the [o]s was chosen and
spliced in to replace the original [o] of the other two, resulting in [po], [$o], and [f'o] tokens
with identical vowels.
In this experiment, unlike in Experiment 6a, it was necessary to manipulate the initial
consonant of the final syllable; hence, an [a:] token had to be created which could be spliced
onto each of the three possible syllable onsets. A single token of [a:] was selected and
lengthened to 250 ms by repeating medial pitch periods. One token each of the speaker's
original [$a:], [ta:], and [Wa:] was chosen, and truncated to the first zero crossing following
234
the fourth pitch period of the vowel (early in the steady state), leaving [<|>a-], [ta-], and
[Wa-]. The 250-ms [a:] token was spliced onto the end of each one to produce [<j)a:], [ta:],
and [Wa:] tokens with identical extra-long vowels.
The subjects and procedure were as in Experiment 6a. A short break separated the
two experiments, which, together with the post-experiment questionnaire, lasted about two
hours.
At the end of the experiment, subjects received written questionnaires. They listened
as many times as they wished to the longest [a:] and the shortest [a] in each carrier context
(in a random order for each subject) and transcribed the resulting pseudo-word in katakana,
then answered questions about it, including whether they knew it as a real word of Japanese.
4.7.4.4. Results
The same subjects were excluded as in Experiment 6a. For each valid subject, an
[a]-[a:] boundary was computed for each series as in Experiment 6a. The boundary for
each context, averaged across subjects, is shown in Figure 4.103:
235
Figure 4.103. Boundary between [a] and [a:), averaged across 21 listeners
[ a ] - 1a : ]
boundary
(m s )
120
SJ ryo
N ro-
The boundary stimulus tends to become longer as the consonantal cues go from
Foreign (F) to Sino-Japanese (SJ). A by-subjects two-factor ANOVA shows that
manipulations of both C and C’ had very significant effects on the boundary location. The
C manipulation caused a 7.25-ms shift (F(2,40) = 6.473, p < 0.01); the C ’ manipulation
caused a 12.6-ms shift (F(2,40) = 11.529, p < 0.01). Their interaction did not reach
significance (F(4,80) = 0.714).
The effect obtained in this experiment is considerably larger and more robust than
that of Experiment 6a. There, the only significant effect was a 12.2l-ms word-superiority
236
effect on the location of an [a]-[a:] boundary. This is about the same size as the effect of
C’ alone in this experiment. TRACE would have predicted the lexical effect to be stronger.
Questionnaire results confirmed that the subjects heard the stimuli correctly; no
stimulus was misheard by more than 3 subjects. One stimulus, [pota], was judged to be a
real word by 15 subjects (it is part of the reduplicated Mimetic potapota). One other was
judged a real word by 2 subjects, and 5 others by 1 subject. Lexical bias, if any were
present, should have given [pota] an especially strong [a] bias. Its boundary is in fact
intermediate between those of its neighbors.
The INC-1 and SC-1 versions of the transitional-probability theory do not fare well
either. They can account for the effects of C’, but not for those of C, which is fully three
segments away from the ambiguous vowel. The C effect is smaller than the C' effect, which
could indicate that a transitional-probability effect between C’ and the immediately
neighboring ambiguous segment is being added on top of some other effect correlated with
stratum.
These results are also challenging for the OT grammatical model. It is clear that
stratum phonotactics are playing a role, so faithfulness constraints are involved. However,
the lack of interaction between the C and C' effects is unexpected. All stimuli containing at
least one Foreign cue should be classified as Foreign and should all behave alike, with the
same low level of [a] bias. Instead, the [a] bias decreases with the number of Foreign cues,
and two are more effective than one.
Furthermore, [rj] and [h*] seem to act as cues to membership in the Sino-Japanese
stratum, since they produce more [a] bias than the Foreign or Neutral cues. This is
unexpected in the grammatical theory because there are no grammatical cues to
membership in the Sino-Japanese stratum. The [rj] and [h*] sounds are statistically rare in
237
the Foreign stratum, but not grammatically illegal there. Listeners appear to be sensitive to
the statistical link, which the OT grammatical theory does not capture.
4.7.4.S Discussion
Experiments 6a and 6b have demonstrated three points with some assurance.
First, they found a phonotactic effect that was substantially larger than any of three
lexical effects obtained with the same listeners and paradigm - a point difficult to explain in
TRACE.
Second, they showed that phonological context which is remote from the ambiguous
segment can influence perception. If the effect is caused by perceptual mechanisms
sensitive to phoneme sequence probabilities, they must be sensitive to sequences of an
implausibly great length.
Third, they demonstrated that the lexical strata of Japanese are not just descriptive
constructs, but play an active role in perception. The distinction among strata (as a primitive
rather than emergent phenomenon) is crucial to any grammatical account of Japanese
phonology, but is unmotivated (or motivated only post hoc) in TRACE and MERGE TP.
These results leave a problem for the grammatical theory in the gradience of the
cues. As the cues become more Sino-Japanese-like, the [a] bias increases; as they become
more Foreign-like, it decreases.
It may be that the likelihood of classifying a stimulus in a particular stratum depends
on the number and type of stratum cues present in it. A stimulus with two Sino-Japanese
cues is more likely to be perceived as Sino-Japanese, and hence more likely invoke the *[a:J
constraint and produce a boundary shift, than a stimulus with only one.
This solution requires that stratum classification be performed outside of the
grammar, since the grammar does not represent the concept of "Sino-Japanese cue" - only
that of "Foreign cue”. Stratum classification might take place as part of the process of
238
incorporating a new word into the lexicon, through comparison with existing lexical items.
Since listeners in this experiment heard each stimulus an average of 44 times in succession,
they had concentrated experience with each one, and ample time for slower off-line
processes to unfold.
4.8. Summary
Experiments 1-3 showed that a phonotactic boundary shift could be obtained in
contexts which made one endpoint illegal, but not in contexts which merely made one
endpoint extremely rare. This result is consistent only with the OT grammatical theory, but
not with MERGE TP or TRACE. These same experiments also showed that the boundaiy-
shift effect is not modulated by phonological context which is outside the structural
description of the relevant phonotactic prohibition - a result consistent with the OT
grammatical theory, and with MERGE TP, but not with TRACE.
Experiment 4 replicated the effects of Experiments 2 and 3 with voiced rather than
voiceless stops, and showed that the boundary shift was due to a dependency between stop
and sonorant responses, rather than to any auditory effect of one consonant on the other.
Experiment 5 confirmed that this dependency was not compensation for coarticulation, and
showed that the phonotactic effect in CCV stimuli can be abolished by prepending another
vowel to provide an alternative parse in which the banned cluster becomes legal. This
indicates that the parser makes syllabification and segmental-identity choices in parallel. If
segmental identity were decided first, then the presence or absence of an initial V would
make no difference. If syllabification were decided first, then phonotactics would not be
able to influence segmentation, as it is known to do (Friederici & Wessels 1993; McQueen
1998, Treiman & Zukowski 1990; Treiman & Danis 1988, Kirk 2001).
Experiments 6a and 6b found that the perception can be influenced by the
phonotactics of the lexical strata of Japanese, a variable which exists only in the OT
grammatical theory. Stratum effects must be taken as emergent statistical phenomena in the
239
other two theories, but this they cannot be. The stratum effect was found to be larger than a
word-superiority effect, ruling out TRACE'S contention that it is a word-superiority effect.
The phonotactic boundary shift was influenced by segmental context fully three segments
away from the ambiguous segment, too far away for the MERGE TP theory to capture the
dependency.
Taken together, these results provide substantial support for the hypothesis that
ambiguous-phoneme perception is guided by grammatical knowledge, and that the parser
considers several candidates in parallel when making its decisions.
4.9. Appendix: Synthesis parameters for the stimuli of Experiments 4 and 5
Table 4.104. Constant synthesis parameters which were identical for the "b" and "d" arrays
of Experiments 4 and 5
Parameter Value
UI 2
RS 1776
SR 16000
FL 20
OQ 30
GV 60
GH 50
DU 700
F6 4900
B6 100
F5 4300
B5 300
F4 3250
240
Parameter Value
F3 2500
FTP 3800
Note: UI = update interval (frame length), ms; RS = random-number-generator seed; SR =

sampling rate, Hz; FL = flutter (%F0 variation), OQ = open quotient (% of the glottal cycle
with open glottis), GV = output gain for voicing source, dB; GH = output gain for
aspiration source, dB; DU = duration, ms; FTP = frequency of tracheal pole, Hz.
Table 4.105. Time-varying synthesis parameters common to the "b" and "d" arrays of
Experiments 4 and 5
Parameter Time, ms Value
AV 0 0
AV 25 0
AV 30 40
AV 70 40
AV 75 57
AV 225 60
AV 425 55
AV 475 50
AV 525 43
AV 575 0
AV 700 0
TL 0 30
TL 60 30
TL 65 10
TL 75 10
TL 115 (10 * w + 0 * 1)
TL 145 (20 * w +
241
TL 175 (10 * w + 0 * 1)
TL 225 0
TL 700 0
FO 0 1000
FO 75 1000
FO 225 1100
FO 425 1000
FO 475 900
FO 700 900
AH 0 0
AH 25 0
AH 60 0
AH 65 72
AH 70 70
AH 90 65
AH 115 0
AH 175 0
AH 225 0
AH 275 56
AH 345 60
AH 425 58
AH 475 55
AH 525 53
AH 575 0
AH 700 0
242
AF 0 0
AF 60 0
AF 65 57
AF 70 55
AF 75 0
AF 700 0
Note: AV = amplitude of voicing, dB arbitrary reference; AH = amplitude of aspiration, dB

arbitrary reference; AF = amplitude of frication, dB arbitrary reference; FO = fundamental
frequency, 0.1 Hz; TL = spectral tilt, dB reduction at 3000 Hz.
Table 4.106. Synthesis parameters for the "b" array of Experiments 4 and 5
AB — ( 24 * g + 44 * b)
A2F — 66
B2F — (75 * exp ( log (600/75) * b))
FTZ 0 3800
FTZ 75 3800
FTZ 115 (3800 * w + 3300 * 1)
FTZ 175 (3800 * w + 3300 * 1)
FTZ 275 3800
FTZ 700 3800
B4 0 800
B4 75 800
B4 115 (800 * w + 300 * 1)
B4 175 (800 * w + 300 * 1)
B4 275 200
243
B4 700 200
B3 0 700
B3 75 700
B3 115 (800 * w + 250 * 1)
B3 175 (800 * w + 250 * 1)
B3 275 150
B3 700 150
F2 0 st_targ
F2 75 st_targ
F2 115 (0.20 * st_targ + 0.80 * gl_targ)
F2 140 gLtarg
F2 150 gLtarg
F2 225 (0.33 * gLtarg + 0.67 * 1600)
F2 275 1600
F2 700 1600
B2 0 90
B2 115 90
B2 175 (270 * w + 90 * 1)
B2 225 (270 * w + 90 * 1)
B2 275 90
B2 700 90
FI 0 180
FI 75 180
FI 85 200
FI 140 (270 * w + 320 * 1)
244
FI 150 (270 * w + 320 * 1)
FI 225 700
FI 345 780
FI 700 780
Bl 0 250
Bl 75 250
Bl 95 140
Bl 175 140
Bl 225 80
Bl 700 80
Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
dB; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable st_targ, the
stop target F2, is 800 Hz; gljtarg, the glide target F2, is equal to 675 Hz * w + 900 Hz * /.
Table 4.107. Synthesis parameters for the "d" array of Experiments 4 and 5
AB — ( 24 * g + 44 * d)
A2F — ( 66 * g + 60 * d)
B2F — (75 * exp ( log (600/75) * d))
FTZ 0 3800
FTZ 75 3800
FTZ 115 (3800 * w + 3300 * 1)
FTZ 175 (3800 * w + 3300 * 1)
FTZ 275 3800
FTZ 700 3800
B4 0 800
245
B4 75 800
B4 115 (800 * w + 300 * I)
B4 175 (800 * w + 300 * 1)
B4 275 200
B4 700 200
B3 0 700
B3 75 700
B3 115 (800 * w + 250 * I)
B3 175 (800 * w + 250 * 1)
B3 275 150
B3 700 150
F2 0 st_targ
F2 75 st_targ
F2 115 (0.20 * st_targ + 0.80 * gl_targ)
F2 140 gLtarg
F2 150 gLtarg
F2 225 (0.33 * gLtarg + 0.67 * 1600)
F2 275 1600
F2 700 1600
B2 0 90
B2 115 90
B2 175 (270 * w + 90 * 1)
B2 225 (270 * w + 90 * 1)
B2 275 90
B2 700 90
246
FI 0 180
FI 75 180
FI 85 200
FI 140 (270 * w + 320 * 1)
FI 150 (270 * w + 320 * 1)
FI 225 700
FI 345 780
FI 700 780
Bl 0 250
Bl 75 250
Bl 950 140
Bl 175 140
Bl 225 80
Bl 700 80
Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
Filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
dB; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable stjarg, the
stop target F2, is equal to 800 Hz * g + 1400 Hz * d: g lja r g , the glide target F2, is equal
to 675 Hz * w + 900 Hz * /.
247
CHAPTER 5
CONCLUSIONS
Speech perception is guided by the expectation that the stimulus is an utterance in
the perceiver’s language. This finding cuts across every level of language organization:
phoneme inventory (e.g., Miyawaki, Strange, Veibrugge, Liberman, Jenkins, & Fujimura,
1975), phonotactics (e.g., Brown & Hildum, 1956), the lexicon (e.g., Ganong, 1980), and
syntax (e.g., Games & Bond, 1975).
The preceding chapters have offered an explicit theory of how the mechanisms of
speech perception can use grammatical knowledge of the phonology of the stimulus
language to arrive at a phonological parse of the input by choosing from a set of candidate
parses consistent with the acoustic signal. I have argued that such a theory is necessary if
the observed effects of phonology on perception are to be understood, in view of the
inability of competing statistically-based theories (TRACE and transitional probabilities) to
account for the empirical Findings.
Three principal objections were raised against the statistical theories: their inability
to distinguish between relevant and irrelevant context, their lack of sufficiently rich
phonological structure, and their inability to generalize appropriately to phonological
classes. At the root of all three is precisely the feature that makes statistical theories so
conceptually attractive - their low degree of abstraction.
The statistical theories, including TRACE and MERGE TP, can be characterized as
"unit models" because they attribute perceptual preference for, e.g., [tr] over [tl] to the
listener’s differing experience of the specific phonological units [tr] and [tl]: One is an
attested onset and the other is not (Halle et al. 1998, Pitt 1998), one is common and the
other is rare (Massaro & Cohen 1983, Pitt & McQueen 1998), one is supported by many
lexical items which contain it and the other is not (McClelland & Elman 1986). The a
priori plausibility of unit models comes from the pervasiveness of unit-frequency and
248
lexicalty effects in language (e.g., Vitevich & Luce 1999, Jusczyk, Luce, & Charles-Luce
1994, Frisch, Large, & Pisoni 2000, Hay, Pierrehumbert, & Beckman in press; Ganong
1980; Samuel 1981, Fox 1984), combined with the minimal nature of the representations
they posit - phonemes and words, both of which are needed in any theory. The weakness
of unit models is not conceptual but empirical: The phonological knowledge used in speech
processing is more complex than can be accommodated in such a simple architecture. The
experiments of Chapter 4 were designed to exploit this weakness, in order to argue that a
full-fledged phonological competence must be available to perceptual mechanisms on line.
Inability to distinguish relevant from irrelevant phonological context. Phonological
processes apply to classes of sounds in classes of environments (e.g., a process of
devoicing applying to all obstruents at the end of all syllables). Different rules have
different environments. However, the unit models afford only one environment - the word
for TRACE, the fixed-length phoneme string for MERGE TP - and are forced to detect
linguistically irrelevant accidental correlations involving contextual material which has
nothing to do with the actual phonological pattern. Experiments 1-6 showed that in fact,
when probabilities are equated, phonologically relevant variation has a much stronger
perceptual effect than phonologically irrelevant variation.
This was particularly clear in the case of Experiment 6, where the magnitude of a
word-superiority effect was compared directly with that of a stratal phonotactic effect and
found to be much smaller. In order to account for the phonotactic effect (which reflected a
dependency between the ambiguous phoneme and one three phonemes previous to it), a unit
model would have to use such a large environment that it would have to also represent
equally strongly the dependency between the first three segments of any word and the
fourth - incorrectly predicting a similarly-sized or stronger word-superiority effect.
Lack o f sufficiently rich phonological structure. Moreover, since the unit models do
not represent syllabification, they could not predict the effects of syllable structure found by
Pitt (1998) and in Experiment 5. TRACE, whose phoneme-decision process considers each
249
phoneme unit in isolation, cannot represent the phonotactic dependency between two
phoneme decisions found in Experiments 4 and 5.
Inability to generalize to phonological classes. This problem is acute for MERGE
TP, which only represents statistical dependencies between one specific phoneme sequence
and another. For this theory, [ki] and [saj are not two instances of the pattern "CV
syllable", but two unrelated phoneme strings. This renders the theory unable to recognize
natural classes. Experiments 2-4 indicated that English listeners' experience of the common
[labial][labial] onsets [br pr] legitimizes the rare or nonexistent [bw pw], but MERGE TP
cannot make the connection. (TRACE'S featural level could in principle allow it to capture
this generalization, if there were a way of representing syllable structure.) The inability to
relate one phoneme string to another exacerbates the problems of irrelevant context and
oversimplified phonological structure by preventing the process of comparison which might
allow irrelevant factors to be averaged away and lead to the induction of more structured
representations.
The OT grammatical theory performed well in all of these tests, predicting shifts
when there should have been shifts and no shifts when there should not have been any. The
good performance was not due to the specific choice of Optimality Theory - a similar
theory could in principle have been constructed around any descriptively adequate grammar
- but to the fact that grammatical theory more accurately describes the categories and
processes of language. TRACE and MERGE TP both propose, in essence, that the
representations and rules active in on-line speech perception are very different from those
inferred from typological study of the structure of human languages. Any attempt to
elaborate the architecture of either theory to capture more sophisticated linguistic concepts
(e.g., by adding a layer o f syllable units to TRACE) will amount to building grammar into
them. Since their chief conceptual appeal is their promise to explain apparent grammatical
effects, like phonotactics, as emergent statistical generalizations captured by a simple
learning system, the remedy will require amputation.
250
There remains some evidence for transitional-probability effects on ambiguous-
phoneme perception which are not captured in the OT grammatical theory - the findings of
Pitt (1998) on English onset clusters - and the effects of lexicality are thoroughly
documented (Ganong 1980; Samuel 1981, Fox 1984, Elman & McClelland 1988, etc.).
Given the pervasive nature of frequency and practice effects in all cognitive domains, and
their narrow stimulus-specificity (e.g., Klapp et al. 1991, Logan 1988a, 1988b), there can be
no doubt that unit-based processes play a prominent role in perception. However, the
evidence of this study suggests that they are considerably weaker than the structural
constraints of the listener’s native language.
A mechanism for capturing some of these effects—gradient illegality and
probabilistic phonotactics— in Optimality Theory has been proposed by Boersma (1997,
1998) and Boersma and Hayes (2001): the continuous ranking scale and stochastic
constraint evaluation. In a standard OT grammar each constraint has a fixed dominance
relation to all other constraints, determined by its place in the hierarchy: C l dominates C2
or is dominated by C2, or is ranked in the same stratum as CT, with no other possibility. In
Boersma and Hayes's proposal, each constraint is associated with a range of positions on
the real line, which may overlap with the ranges of other constraints. When the grammar is
given an input to evaluate, the position of each constraint is fixed probabilistically at some
point in its range, yielding a standard ranking. If the range of CI is centered above that of
C2, but overlaps it, then most of the time C l will be fixed above C2, but sometimes Cl will
be low in its range, C2 will be high in its, and the result will be that C2 dominates C l. The
grammar therefore does not always give the same output for a given input; different
constraints are active from one use of the grammar to the next. The further apart the centers
of the C l and C2 ranges are, the likelier C l is to dominate C2 on any given use of the
grammar; hence, the likelier the corresponding grammatical process is to occur. The overlap
between the C l and C2 ranges depends on the frequency with which the C 1 » C 2 and the
C 2 » C I outputs occur in the data to which the learner is exposed.
251
In perception, this could cause phonotactic prohibitions to appear and disappear
probabilistically. A configuration which is ruled out when the markedness constraint M
dominates the faithfulness constraint F (and is therefore active) would be accepted when the
reverse is true. Averaged over a large number of trials, the listener's dispreference for the
configuration would depend on the amount of overlap between the M and F ranges. In this
way, stronger and weaker phonotactic bans would correspond to smaller and greater degrees
of overlap.
Although this dissertation has considered only phonotactic effects of grammar on
ambiguous-phoneme perception, the OT grammatical model predicts that effects will be
pervasive in more naturalistic on-line tasks. Speech segmentation, for instance, is
demonstrably sensitive to the grammar of phonotactics (Norris et al. 1997, Kirk 2001).
This can be seen as selection of a grammatically more harmonic prosodic parse over a less
harmonic one. Effects of faithfulness should be apparent in word recognition and similarity
priming: A nonword which is unfaithful to a real word on a low-ranked faithfulness
constraint should activate the word more strongly than a nonword which is unfaithful to it
on a high-ranked constraint. Such studies offer a test, not merely of the influence of
grammar, but of the specific conception of it put forward by Optimality Theory and
competing theories of linguistic competence.
252
BIBLIOGRAPHY
Algeo, J. (1978). What consonant clusters are possible? Word 29:206-224.
Alwan, A. A., Narayanan, S.S., & Haker, K. (1997). Toward articulatory-acoustic

models for liquid approximants based on MRI and EPG data. Part II. The rhotics.
Journal o f the Acoustical Society o f America 101(2):1078-1089.
Amano, S., & Kondo, T. (1999). Nihongo no goitokusei. Tokyo: Sanseido.
Archangeli, D., & Pulleybiank, D. (1994). Grounded phonology. Cambridge, MA: MIT
Press.
Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and
integrality in speeded classification. Journal o f Mathematical Psychology, 38:423-
466.
Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of probability

statistics by 8-month-old infants. Psychological Science 9(4):32l-324.
Baayen, R., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (CD-
ROM). Philadelphia: Linguistic Data Consortium.
Bailey, P. J., Summerfield, Q., & Dorman, M. (1977). On the identification of sine-
wave analogues of certain speech sounts. Haskins Laboratories Status Report on
Speech Research 51/52:1-25.
Beckman, J. N. (1998). Positional faithfulness. Ph. D. dissertation, University of

Massachusetts, Amherst.
Blumstein, S. E., & Stevens, K. N. (1979). Acoustic invariance in speech production:

Evidence from measurements of the spectral characteristics of stop consonants.
Journal o f the Acoustical Society o f America 66(4): 1001-1017.
Boersma, P. (1997). How we learn variation, optionality, and probability.

Proceedings o f the Institute o f Phonetic Sciences o f the University o f Amsterdam
21:43-58
253
Boersma, P. (1998). Functional phonology: Formalizing the interactions between
articulatory and perceptual drives. Ph.D. dissertation, University of Amsterdam.
Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm.
Linguistic Inquiry 32:45-86.
Borowsky, T. J. (1986). Topics in the lexical phonology o f English. Ph.D. dissertation,

University of Massachusetts, Amherst.
Bosch, L., and Sebastian-Galles, N. (1997). Native-language recognition abilities in 4-

month-old-infants from monolingual and bilingual environments. Cognition 65:33-
69.
Brown, C., & Matthews, J. (2001). When intake exceeds input: Language-specific
perceptual illusions induced by LI prosodic constraints. Proceedings o f the Third
International Symposium on Bilingualism, Bristol, U.K., April 18-20, 2001.
Brown, R. W., & Hildum , D. C. (1956). Expectancy and the perception of syllables.
Language 32:411-419.
Bumage, G. (1995). The CELEX lexical database. Release 2. Centre for Lexical
Information; Max Planck Institute for Psycholinguistics, The Netherlands.
Burton, M. W„ Baum, S. R„ & Blumstein, S. E. (1989). Lexical effects on the phonetic

categorization of speech: The role of acoustic structure. Journal o f Experimental
Psychology: Human Perception and Performance 15(3):567-575.
Catford, J. C. (1988). A Practical Introduction to Phonetics. Oxford, UK: Oxford

University Press.
Cherry, E. C. (1953). Some experiments on the recognition of speech with one and two
ears. Journal o f the Acoustical Society o f America 23:975-979.
Chomsky, N., & Halle, M. (1968). The sound pattern o f English. Cambridge, MA: MIT
Press.
Clements, G. N. (1990). The role of the sonority cycle in core syllabification. In J.

Kingston & M. Beckman (Eds.), Papers in laboratory phonology I: Between the
grammar and physics o f speech (pp. 283-333). New York: Cambridge University
Press.
254
Clements, G. N., & Keyser, S. J. (1983). CVphonology. MIT Press, Cambridge, MA.
Clements, G.N. (1985). The geometry of phonological features. Phonology Yearbook

2:225-252.
Cohn, A., & Lavoie, L. (2000). English vowel-liquid monosyllables: A case of

trimoraic syllables. Poster presented at the Seventh Conference on Laboratory
Phonology, Nijmegen, The Netherlands.
Coleman, J., & Pierrehumbert, J. (1997). Stochastic phonological grammars and

acceptability. In Proceedings o f the 3rd Meeting o f the ACL Special Interest Group
in Computational Phonology (12 July 1997), pp. 49-56. Somerset, NJ: Association
for Computational Linguistics.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed).

Strategies o f information-processing (pp. 151-216). London: Academic Press.
Connine C. M., & Clifton, C. (1987). Interactive use of lexical information in speech
perception. Journal o f Experimental Psychology: Human Perception and
Performance 13(2):291-299.
Connine C. M., Titone, D., & Wang , J. (1993). Auditory word recognition: Extrinsic
and intrinsic effects of word frequency. Journal o f Experimental Psychology:
Learning, Memory, and Cognition 19:81-94.
Crowther, C. S., & Mann, V. A. (1994). Use of vocalic cues to consonant voicing and
native language background: The influence of experimental design. Perception and
Psychophysics 55(5):513-525.
Cutler A., & Norris, D. (1979). Monitoring sentence comprehension. In W. E. Cooper

& E. C. T. Walker (Eds.), Sentence Processing: Psycholinguistic studies presented
to Merrill Garrett (pp. 113-134). Hillsdale, NY: Erlbaum.
Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical
access. Journal o f Experimental Psychology: Human Perception and Performance
14:113-121.
Cutler, A., Mehler, J., Norris, D., & Segui, J. (1986). The syllable's differing role in the
segmentation of French and English. Journal o f Memory and Language 25:385-
400.
255
Cutler, A., Mehler, J., Norris, D., & Segui, J. (1987). Phoneme identification and the
lexicon. Cognitive Psychology 19:141-177.
Delattre, P., & Freeman, D. C. (1968). A dialect study of American R's by X-ray
motion picture. Linguistics 44:29-68.
Dell, G. S., & Newman, J. E. (1980). Detecting phonemes in fluent speech. Journal o f
Verbal Learning and Verbal Behavior 19:608-623.
Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors,
phonotactic constraints, and implicit learning: A study of the role of experience in
language production. Journal o f Experimental Psychology: Learning, Memory, and
Cognition 26(6): 1355-1367.
DeLorme Publishing Company (1998). Iowa atlas and gazetteer. Yarmouth, Maine:
DeLorme.
DeLorme Publishing Company (2000). Nebraska atlas and gazetteer. Yarmouth,

Maine: DeLorme.
Di Benedetto, M. G. (1989). Frequency and time variations of the first formant:

Properties relevant to the perception of vowel height. Journal o f the Acoustical
Society o f America 86:67-77.
Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J. (1999). Epenthetic
vowels in Japanese: A perceptual illusion? Journal o f Experimental Psychology:
Human Perception and Performance 25(6): 1568-1578.
Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and
cross-validation. The American Statistician 37(l):36-48.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York:
Chapman and Hall.
Elman, J. L., & McClelland, J. L. (1988). Cognitive penetration of the mechanisms of

perception: Compensation for coarticulation of lexically restored phonemes.
Journal o f Memory and Language 27:143-165.
Espy-Wilson, C. R. (1992). Acoustic measures for linguistic features distinguishing

the semivowels /w j r Min American English. Journal o f the Acoustical Society o f
America 92(2):736-757.
256
Foss, D. J. (1969). Decision processes during sentence comprehension: Effects of
lexical item difficulty and position upon decision times. Journal o f Verbal
Learning and Verbal Behavior 8:457-462.
Foss, D. J., Harwood, D. A., & Blank, M. A. (1980). Deciphering decoding decisions:
Data and devices. In R. A. Cole (Ed.), Perception and production o f fluent speech.
Hillsdale, NJ: Erlbaum.
Fougeron, C., & Keating, P. A. (1997). "Articulatory strengthening at edges of

prosodic domains" Journal o f the Acoustical Society o f America 101:3728-3740.
Fox, R. A. (1984). Effect of lexical status on phonetic categorization. Journal o f

Experimental Psychology: Human Perception and Performance 610:526-540.
Frauenfelder, U. H., Segui, J., & Dijkstra, T. (1990). Lexical effects in phonemic
processing: Facilitory or inhibitory? Journal o f Experimental Psychology: Human
Perception and Performance 16(1):77-91.
Friederici, A. D., & Wessels, J. M. (1993). Phonotactic knowledge of word boundaries

and its use in infant speech perception. Perception and Psychophysics 54(3):287-
295.
Frisch, S. A., & Zawaydeh, B. A. (2001). The psychological reality of OCP-Place in

Arabic. Language 77( 1):91-106.
Frisch, S. A., Large, N. R., & Pisoni, D. B. (2000). Perception of wordlikeness: Effects
of segment probability and length on the processing of nonwords. Journal o f
Memory and Language 42:481-496.
Frisch, S., Broe, M., & Pierrehumbert, J. (1995). The role of similarity in phonology:
Explaining OCP-Place. Proceedings o f the 13th International Conference o f the
Phonetic Sciences 3:544-547.
Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language

47:27-52.
Fukazawa, H., Kitahara, M„ & Ota, M. (1998). Lexical stratification and ranking
invariance in constraint-based grammars. In M. C. Gruber, D. Higgins, K. S.
Olson, & T. Wysocki (Eds.), Proceedings o f the Chicago Linguistics Society 34-2
The Panels (pp. 47-62). Chicago: Chicago Linguistics Society.
257
Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal o f
Experimental Psychology: Human Perception and Performance 6(1): 110-125.
Games, S., & Bond, Z. S. (1975). Slips of the ear: Errors in perception of casual
speech. Papers from the 11th Regional Meeting o f the Chicago Linguistics Society,
pp. 214-225. Chicago, Illinois: Chicago Linguistic Society.
Goldinger, S. (1997). Words and voices: perception and production in an episodic

lexicon. In K. Johnson & J. Mullenix (Eds.), Talker variability in speech
processing (pp 33-66). San Diego: Academic Press.
Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics. New
York: Wiley.
Greenberg, J. H., & Jenkins, J. J. (1964). Studies in the psychological correlates of the
sound system of American English. Word 20:157-177.
Greenberg, J. H. (1963). Some universals of grammar, with particular reference to the

order of meaningful elements. In Greenberg (Ed.), Language universals (pp. 73-
113). Cambridge, MA: MIT Press.
Guenter, J. (2000). What is /l/? Proceedings o f the 26th Annual Meeting o f the Berkeley
Linguistics Society, University of California, Berkeley, February 18-21.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics 47:757-762.
Hall, T. A. (1997). The phonology o f coronals. Amsterdam Studies in the Theory and
History of Linguistic Science, Series IV: Current Issues in Linguistic Theory, Vol.
149. Amsterdam: Benjamins.
Halle, P. A., Segui, J., Frauenfelder, U., & Meunier, C. (1998). Processing of illegal
consonant clusters: A case of perceptual assimilation? Journal o f Experimental
Psychology: Human Perception and Performance 24(2):592-608.
Hammond, M. (1999). The phonology o f English: A prosodic Optimality-Theoretic

approach. Oxford, UK: Oxford University Press.
Harris, Z. (1951). Methods in structural linguistics. Chicago: University of Chicago

Press.
258
Hay, J., Pierrehumbert, J., & Beckman, M. (in press). Speech perception, well-
formedness, and the statistics of the lexicon. In J. Local, R, Ogden, & R. Temple
(Eds.), Papers in laboratory phonology VI. Cambridge, U.K.: Cambridge
University Press.
Hayes, B. (1980). A metrical theory o f stress rules. Ph.D. dissertation, MIT.
Hsieh, H.-I. (1976). On the unreality of some phonological rules. Lingua 38:1-19.
Hultzen, L. (1965). Consonant clusters in English. American Speech 40:5-19.
Inkelas, S. (1994). The consequences of optimization for underspecification. MS,

Rutgers Optimality Archive, Rutgers University.
Ito, J., & Mester, R. A. (1994). Japanese phonology. In J. A. Goldsmith (Ed.),

Handbook o f phonological theory (pp. 817-838). Cambridge, MA: Blackwell.
Ito, J., & Mester, R. A. (1995). The Core-Periphery Structure of the Lexicon and
Constraints on Reranking. In J. Beckman, S. Urbanczyk, & L. Walsh (Eds.),
University o f Massachusetts occasional papers in linguistics [UMOP] 18: Papers
in Optimality Theory (pp. 181-209). Amherst: GLSA.
Jaeger, J., Lockwood, A., Kemmerer, D., Van Valin, R., & Khalak, H. (1996). A
positron-emission-tomographic study of regular and irregular verb morphology in
English. Language 72(3):451-497.
Jakobson, R., Fant, G. M., & Halle, M. (1952). Preliminaries to speech analysis: The
distinctive features and their correlates. Cambidge, MA: MIT Press.
Johnson, K. (1997). Speech perception without speaker normalization: An exemplar

model. In K. Johnson & J. Mullenix (Eds.), Talker variability in speech processing
(pp. 145-166). San Diego: Academic Press.
Jones, D. (1997). English pronouncing dictionary, 15th ed. (P. Roach & J. Hartman,
eds.). Cambridge, UK: Cambridge University Press.
Jusczyk, P. W„ Friederici, A. D., Wessels, J., Svenkerud, V. Y., & Jusczyk, A. M.

(1993). Infants' sensitivity to the sound pattern of native-language words. Journal
o f Memory and Language 32:402-420.
259
Jusczyk, P. W„ Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to
phonotactic patterns in the native language. Journal o f Memory and Language
33:630-645.
Kahn, D. (1980). Syllable-based generalizations in English phonology. New York:

Garland.
Kenstowicz, M. (1994). Phonology in generative grammar. Cambridge, MA:

Blackwell.
Kirk, C. J. (2000). The effects of stress on the segmentation of continuous speech.

Paper presented at the 75th annual meeting of the Linguistic Society of America,
Washington, DC, January 5th.
Kirk, C. J. (2001). Phonological constraints on the segmentation o f continuous speech.

Ph.D. dissertation, University of Massachusetts, Amherst.
Klapp, S. T., Boches, C. A., Trabert, M. L., & Logan, G. D. (1991). Automatizing
alphabet arithmetic: II. Are there practice effects after automaticity is achieved?
Journal o f Experimental Psychology: Learning, Memory, and Cognition 17(2): 196-
209.
Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal o f the

Acoustical Society o f America 67:971-995.
Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and

lexical access. Journal o f Phonetics 7:279-312.
Kluender, K.R., & Lotto, A. J. (1994). Effects of first formant onset frequency on [-
voice] judgments result from general auditory processes not specific to humans.
Kucera, H, & Francis, W. N. (1967). Computational analysis o f present-day English.

Providence: Brown University Press.
Ladefoged, P. (1993). A course in phonetics. Fort Worth: Harcourt Brace.
Lahiri A., & Marslen-Wilson, W. (1991). The mental representation of lexical form: A
phonological approach to the recognition lexicon. Cognition 38:245-294.
260
Lamontagne, G. (1993). Syllabification and consonant cooccurrence conditions. Ph.D.
dissertation, University of Massachusetts, Amherst.
Liberman, A.M., Harris, K. S., Hoffman, H. S., & Griffith, N. C. (1957). The
discrimination of speech sounds within and across phonemic boundaries. Journal o f
Experimental Psychology 53:358-368.
Lindsay, P. H., & Norman, D. A. (1977). Human information processing. New York:
Academic Press.
Logan, G. D. (1988a). Toward an instance theory of automatization. Psychological

Review 95(4):492-527.
Logan, G. D.( 1988b). Automaticity, resources, and memory: Theoretical controversies

and practical implications. Human Factors 30(5):583-598.
Logan, G. D., & Klapp, S. T . (1991). Automatizing alphabet arithmetic: I. Is extended

practice necessary to produce automaticity? Journal o f Experimental Psychology:
Learning, Memory, & Cognition 17(2): 179-195.
Luce, P. A. (1986). Neighborhoods o f words in the mental lexicon. Technical Report

#6, Speech Research Laboratory, Department of Psychology, Indiana University.
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood
activation model. Ear and Hearing 19:1-36.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush & E.

Galanter (Eds.), Handbook o f Mathematical Psychology, Volume I (pp. 103-189).
New York: Wiley.
Macmillan, N. A., & Creelman, C. D. (1991). Signal detection theory: A user's guide.
Cambridge, UK: Cambridge University Press.
Maddieson, I. (1984). Patterns o f sounds. Cambridge, UK: Cambridge University

Press.
Mann, V. A. (1980). Influence of preceding liquid on stop consonant perception.

Perception and Psychophysics 28(5):407-412.
261
Mann, V. A. (1986). Distinguishing universal and language-dependent levels of speech
perception: Evidence from Japanese listeners’ perception of English “I” and “r”.
Cognition 24(3): 169-196.
Mann, V. A., & Repp, B. (1981). Influence of preceding fricative on stop consonant
perception. Journal o f the Acoustical Society o f America 69(2):548-558.
Marslen-Wilson, W. D. (1984). Function and process in spoken word recognition. In

H. Bouma & D. G. Brouwhuis (Eds.), Attention and performance X: Control of
language processes (pp. 125-149). Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W. D., & Welsh A. (1978). Processing interactions and lexical access
during word recognition in continuous speech. Cognitive Psychology 10:29-63.
Martin, S. E. (1952). Morphophonemics of standard colloquial Japanese. Language

28(2, Part 2): 1-113.
Massaro, D. W., & Cohen M., (1983). Phonological context in speech perception.
Perception and Psychophysics 34:338-348.
McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica 43:84-

108.
McCarthy, J. J. (1991). The phonology of Semitic pharyngeals. MS, University of

McCarthy, J. J. (1998). Morpheme structure constraints and paradigm occultation. In

M.C. Gruber, D. Higgins, K. Olson, & T. Wysocki (Eds.), Papers from the 32nd
Regional Meeting o f the Chicago Linguistic Society: The Panels (pp. 123-150).
Chicago: Chicago Linguistic Society.
McCarthy, J. J., & Prince, A. (1995). Correspondence and reduplicative identity. In J.

N. Beckman, L. Walsh Dickey, & S. Urbanczyk (Eds.), University o f
Massachusetts occasional papers in linguistics [UMOP] 18: Papers in Optimality
Theory (pp. 249-384). Amherst: GLSA.
McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech perception:

The TRACE model. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel
distributed processing. Volume 2 (pp. 58-121). Cambridge, MA: MIT Press.
262
McQueen, J. M. (1991). The influence o f the lexicon on phonetic categorization:
Stimulus quality in word-final ambiguity. Journal o f Experimental Psychology:
Human Perception and Performance 17:433-443.
McQueen, J. M. (1996). Phonetic categorisation. Language and Cognitive Processes

ll(6):655-664.
McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics.

Journal o f Memory and Language39(l):21-46.
McQueen, J. M., Norris, D., .& Cutler, A. (1994). Competition in spoken word
recognition: Spotting words in other words. Journal o f Experimental Psychology:
Learning, Memory, and Cognition 20:621-638.
McQueen, J. M., Norris, D., .& Cutler, A. (1999). Lexical influence in phonetic
decision making: Evidence from subcategorical mismatches. Journal o f
Experimental Psychology: Human Perception and Performance 25(5): 1363-1389.
Mehler, J., Dommergues, J. Y., Frauenfelder, U., & Segui, J. (1981). The syllable's role
in speech segmentation. Journal o f Verbal Learning and Verbal Behavior 20:298-
305.
Miyawaki, K., Strange, W„ Verbrugge, R., Liberman, A. M., Jenkins, J. J., &
Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r]
and [I] by native speakers of Japanese and English. Perception and Psychophysics
18:331-340.
Moreton, E., & Amano, S. (1999). Phonotactics in the perception of Japanese vowel
length: Evidence for long-distance dependencies. Paper presented at Eurospeech
1999, Budapest.
Moreton, E., Amano, S., & Kondo, T. (1998). Statistical phonotactics of Japanese:
transitional probabilities within the word. Transactions o f the Technical Committee
on Psychological Acoustics, Acoustical Society o f Japan, H-98-120.
Morton, J., & Long, J. (1976). Effect of word transitional probability on phoneme
identification. Journal o f Verbal Learning and Verbal Behavior 15:43-51.
Narayanan, S. S., Alwan, A. A., & Haker, K. (1997). Toward articulatory-acoustic

models for liquid approximants based on MR1 and EPG data. Part I. The laterals.
263
Nazzi, T„ Jusczyk, P. W„ & Johnson, E. K. (2000). Language discrimination by
English-leaming 5-month-olds: Effects of rhythm and familiarity. Journal o f
Nearey, T. M. (1990). The segment as a unit of speech perception. Journal o f

Phonetics 18:347-373.
Nearey, T. M., & Assmann, P. F. (1986). Modeling the role of inherent spectral change
in vowel identification .Journal o f the Acoustical Society o f America 80:1297-1308.
Newman, R. S., Sawusch, J. R., & Luce, P. A. (1997). Lexical neighborhood effects in
phonetic processing. Journal o f Experimental Psychology: Human Perception and
Performance 23:873-889.
Newmeyer, F. J. (1986). Linguistic theory in America. Orlando: Academic Press.
Norris, D. (1994). Shortlist: a connectionist model of continuous speech recognition.

Cognition 52:189-234.
Norris, D., McQueen, J. M„ & Cutler, A. (1997). The possible-word constraint in the
segmentation of continuous speech. Cognitive Psychology 34(3): 191-243.
Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech
recognition: Feedback is never necessary. Behavioral and Brain Sciences
23(3):299-325.
Padgett, J. (1991). Stricture in feature geometry. Ph.D. dissertation. University of

Pertz, D. L., & Bever, T. G. (1975). Sensitivity to phonological universals in children

and adolescents. Language 51(1): 149-162.
Pitt, M. A. (1998). Phonological processes and the perception of phonotactically illegal

consonant clusters. Perception and Psychophysics 60:941-951.
Pitt, M. A., & Samuel, A. G. (1993). An empirical and meta-analytic evaluation of the
phoneme identification task. Journal o f Experimental Psychology: Human
Perception and Performance 19(4):699-725.
Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by

the lexicon? Journal o f Memory and Language 39:347-370.
264
Polivanov, E. (1931). La perception des sons d’une langue etrangere. Travawcdu
Cercle linguistique de Prague 4:79-86.
Prince, A., & Smolensky, P. (1993). Optimality’ Theory: Constraint interaction in

generative grammar. MS, Rutgers University.
Repp, B. H. (1982). Phonetic trading relations and context effects: new experimental
evidence for a speech mode of perception. Psychological Bulletin 92(l):81-l 10.
Rosenthall, S. (1997). Vowel/glide alternations in a theory o f constraint interaction.

New York: Garland.
Rubin, D. C. (1976). Frequency of occurrence as a psychophysical continuum.

Perception and Psychophysics 20(5):327-330.
Rubin, P., Turvey, M. T., van Gelder, P. (1976). Inital phonemes are detected faster in
spoken words than in spoken nonwords. Perception and Psychophysics 19:394-
398.
Rumelhart, D. E., Hinton, G. E., Williams, R. J. (1986). Learning internal

representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.),
Parallel distributed processing. Volume 1 (pp. 318-362). Cambridge, MA: MIT
Press.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of
distributional cues. Journal o f Memory and Language 35(4):606-62l.
Saffran, J. R., Newport, E. L., Aslin, R. N., &Tunick, R. A. (1997). Incidental

language learning: listening (and learning) out of the comer of your ear.
Psychological Science 8(2): 101-105.
Sagey, E. (1990). The representations o f features in non-linear phonology: The

articulator node hierarchy. New York: Garland.
Samuel, A. G. (1981a). Phonemic restoration: Insights from a new methodology.

Journal of Experimental Psychology: General 110:474-494.
Samuel, A. G. (1981b). The role of bottom-up confirmation in the phonemic

restoration illusion. Journal o f Experimental Psychology: Human Perception and
Performance 7(5): 1124-1131.
265
Samuel, A. G. (1987). Lexical uniqueness effects on phonemic restoration. Journal o f
Samuel, A. G. (1991). A further examination of attentional effects in the phonemic

restoration illusion. Quarterly Journal o f Experimental Psychology 43A(3):679-
699.
Samuel, A. G. (1996). Phoneme restoration. Language and Cognitive Processes

ll(6):647-653.
Sapir, E. (1933). La Realite psychologique des phonemes. Journal de Psychologie

Normale et Pathologique 30:247-265.
Sato, P. T. (1985). Denominal verbs with -r: A response to de Chene. Japanese

Linguistics 10:149-169.
Scholes, R. J. (1966). Phonotactic grammaticality. The Hague: Mouton.
Segui, J., & Frauenfelder, U. (1986). The effect of lexical constraints upon speech
recognition. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive
capabilities (pp. 795-808). Amsterdam: Elsevier.
Selkirk, E. O. (1988). Dependency, place, and the notion "tier". Ms., Department of
Linguistics, University of Massachusetts, Amherst.
Shibatani, M. (1973). The role of surface phonetic constraints in generative phonology.

Language 48(1):87-106.
Shibatani, M. (1990). The languages o f Japan. Cambridge, UK: Cambridge University

Press.
Smit, A. B. (1993). Phonologic error distributions in the Iowa-Nebraska Articulation

Norms Project: Word-initial consonant clusters. Journal o f Speech and Hearing
Research 36:931-947.
Smith, J. L. (1999). Noun faithfulness and accent in Fukuoka Japanese. In S. Bird, A.

Camie, J. D. Haugen, & P. Norquest, eds., Proceedings o f the West Coast
Conference on Formal Linguistics XVIII. Somerville, MA: Cascadilla Press, 519-
531.
266
Smith, R. C., & Dixon, T. P. (1971). Frequency and the judged familiarity of
meaningful words. Journal o f Experimental Psychology 88(2):279-28l.
Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /!/ and its
implications for phonetic implementation. Journal o f Phonetics 21: 291-311.
Stevens, K. N. (1999). Acoustic phonetics. Cambridge, MA: MIT Press.
Stockmal, V., Moates, D. R., & Bond, Z. S. (2000). Same talker, different language.
Applied Psycholinguistics 21:383-393.
Suzuki, K„ Maye, J., & Ohno, K. (2000). On the productivity of lexical stratification in
Japanese. Paper presented at the annual meeting of the Linguistic Society of
America, Chicago.
Tateishi, K. (1990). Phonology of Sino-Japanese morphemes. In G. Lamontagne & A.

Taub (Eds.), University o f Massachusetts occasional working papers in linguistics
[UMOPJ 13 (pp. 209-235). Amherst, MA: GLSA.
Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability

functions. Journal o f the Acoustical Society o f America 41:782-787.
Tesar, B., & Smolensky, P. (1993). The leamability of Optimality Theory: an

algorithm and some basic complexity results. Technical Report CU-CS-678-93,
Department of Computer Science, University of Colorado at Boulder.
Thorndike, E. L., & Lorge, I. (1944). The teacher's word book o f 30,000 words. New
York : Teachers College, Columbia University.
Treiman, R., Kessler, B., Knewasser, S., Tincoff, R., & Bowman, M. (1996 [2000]).
English speakers' sensitivity to phonotactic patterns. In M. Broe & J. B.
Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition and the
lexicon (pp. 269-283). Cambridge, UK: Cambridge University Press.
Treiman, R. & Zukowski, A. (1990). Toward an understanding of English

syllabification. Journal o f Memory and Language 29:66-85.
Treiman, R., & Danis, C. (1988). Syllabification of intervocalic consonants. Journal o f

267
Treiman, R., Gross, J., & Cwikiel-Glavin, A. (1992). The syllabification of fsf clusters
in English. Journal of Phonetics 20:383-402.
Trudgill, P. (1999). The dialects o f England, 2nd Edition. London: Blackwell.
Tyler, L. K., & Wessels, J. (1985). Is gating an on-line task? Evidence from naming
latency data. Perception and Psychophysics 38(3):217-222.
U. S. Census Bureau (1990). Statistical abstract o f the United States. Washington,

D.C.: U.S. Department of Commerce.
Vance, T. J. (1987). An introduction to Japanese phonology. Albany: State University

of New York Press.
Vance, T. J. (1991). A new experimental study of Japanese verb morphology. Journal

o f Japanese Linguistics 13:145-166.
Venkatagiri, H. S. (1999). Clinical measurement of rate of reading and discourse in

young adults. Journal o f Fluency Disorders 24(3):209-226.
Vitevich, M. S., & Luce, P. A. (1998). When words compete: Levels of processing in
spoken word recognition. Psychological Science 9:325:329.
Vitevich, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood

activation in spoken word recognition. Journal o f Memory and Language 40:374-
408.
Vitevich, M. S., Luce, P. A., Charles-Luce, J., & Kemmerer, D. (1997). Phonotactics
and syllable stress: implications for the processing of spoken nonsense words.
Language and Speech 40(l):47-62.
Wall, L., Christiansen, T., & Schwartz, R. L. (1996). Programming Perl, 2nd. ed.
Cambridge, MA: O'Reilly.
Walsh Dickey, L. (1997). The phonology o f liquids. Ph.D. dissertation, University of

Warren, P., & Marslen-Wilson, W. (1987). Continuous uptake of acoustic cues in

spoken word recognition. Perception and Psychophysics 41(3):262-275.
Wooley, D. E. (1970). Feature redundancy in consonant clusters. Linguistics 64:70-93.
268
Wurm, L. H., & Samuel, A. G. (1997). Lexical inhibition and attentional allocation
during speech perception: Evidence from phoneme monitoring. Journal o f Memory
and Language 36:165-187.
Xu, Y. (1991). Depth of phonological recoding in short-term memory. Memory and

Cognition 19(3):263-273.
Yip, M. (1989). Feature geometry and co-occurrence restrictions. Phonology 6(2):349-

374.
Yule, H., & Bumel, A. C. (1886 [1985]). Hobson-Jobson: A glossary o f colloquial

Anglo-Indian words and phrases, and o f kindred terms, etymological, historical,
geographical, and discursive, 2nd edition. London: Routledge and Kegan Paul.
Zimmer, K. E. (1969). Psychological correlates of some Turkish morpheme structure

conditions. Language 45(2):309-321.
269

Moreton, Alfred Elliott Phonological Grammar in Speech

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Moreton, Alfred Elliott Phonological Grammar in Speech

Uploaded by

Copyright:

Available Formats

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the

Oversize materials (e.g., maps, drawings, charts) are reproduced by

Photographs included in the original manuscript have been reproduced

ProQuest Information and Learning

ALFRED ELLIOTT MORETON

Submitted to the Graduate School of the

All rights reserved.

ProQuest Information and Learning Company

ALFRED ELLIOTT MORETON

Approved as to style and content by:

Lyn Frazier, Member

larles E. Clifton, Member

Elisabeth O. Selkirk, Department Head

who were my connection to the wider world of psycholinguistics. Shigeaki Amano

quick thinking and uncanny influence over the university administration.

my classmates Isadora Cohen, Kiyomi Kusumoto, Junko Shimoyama, and Bernhard

Jennifer Smith, a stalwart comrade-in-arms throughout our common dissertating time.

wonderful music, dancing, and comradeship.

PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION

ALFRED ELLIOTT MORETON, B. A., SWARTHMORE COLLEGE

Ph.D., UNIVERSITY OF MASSACHUSETTS, AMHERST

Directed by: Professor John Kingston

This dissertation investigates the ways in which speech perception is guided by

particular focus on how phonotactics affects the interpretation of acoustically ambiguous

segments. A model is proposed in which phonological grammar, expressed here as a

system of ranked and violable constraints within the framework of phonological

input. This grammar-based theory is contrasted with two grammarless alternative

accounts of perception: the connectionist network TRACE, which derives phonotactic

Experimental evidence is presented to show (1) that English listeners'judgments

of vowels and of consonant clusters disfavor configurations which are grammatically

explained by foreign-language exposure. Two experiments with Japanese listeners find

obtained with the same listeners and paradigm.

perception have access to a full-fledged phonological competence.

LIST OF FIGURES................................................................................................................ xix

2.3.2.1.1. Consonant features........................................................................21

2.3.2.3.1. Undominated faithfulness constraints......................................... 30

3. THEORIES OF PHONOTACTIC EFFECTS IN SPEECH PERCEPTION............... 53

3.2.1. How TRACE works........................................................................................ 53

3.3.1. Simulation: Success of statistical predictions.............................................. 63

3.3.3. Statistical context effects on phoneme perception........................................76

3.4. A grammar-based account....................................................................................... 89

3.4.1. Choice of grammatical theory........................................................................90

3.4.1.1. Grammatical framework........................................................................90

3.4.2. Decision mechanism.......................................................................................93

4.2.3.1. TRACE simulation................................................................................119

4.2.3.1.1. Calibration and replication of the original TRACE results.... 119

4.2.3.2. MERGE T P ...........................................................................................131

4.2.3.21. INC-1............................................................................................ 132

4.2.3.3. OT grammatical theory......................................................................... 135

4.3. Experiment 2: Sequence frequency and word-initial [pw] clusters.................... 143

4.2.3.1. TRACE simulation.............................................................................. 147

4.2.3.2.I. INC-1........................................................................................... 150

4.2.3.3. OT grammatical theory.......................................................................154

4.3.4. Methods........................................................................................................ 155

4.4. Experiment 3: Sequence frequency and the relative phonotactic badness

4.4.1. Rationale......................................................................................................... 159

4.4.4.1. TRACE simulation...............................................................................163

4.4.4.2. MERGE T P ..........................................................................................171

4.2.3.2.1. IN C -1.......................................................................................... 171

4.4.3.3. OT grammatical theory....................................................................... 175

4.4.4. Methods........................................................................................................... 176

4.4.5.1. [_ v n A tn ] stimuli.................................................................................... 177

4.4.6. Discussion....................................................................................................... 185