
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.

The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality illustrations
and photographs, print bleedthrough, substandard margins, and improper
alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and continuing
from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced
xerographically in this copy. Higher quality 6" x 9" black and white
photographic prints are available for any photographs or illustrations appearing
in this copy for an additional charge. Contact UMI directly to order.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION

A Dissertation Presented

by

ALFRED ELLIOTT MORETON

Submitted to the Graduate School of the
University of Massachusetts, Amherst, in partial fulfillment
of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2002

Department of Linguistics

UMI Number: 3056262

Copyright 2002 by
Moreton, Alfred Elliott

All rights reserved.

UMI®
UMI Microform 3056262
Copyright 2002 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company


300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

© Copyright by Elliott Moreton 2002
All rights reserved

PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION

A Dissertation Presented

by

ALFRED ELLIOTT MORETON

Approved as to style and content by:

John Kingston, Chair

Lyn Frazier, Member

Charles E. Clifton, Member

Elisabeth O. Selkirk, Department Head
Department of Linguistics

ACKNOWLEDGEMENTS

I had a lot of help with this. John Kingston trained me up from nothing and saw me through the whole thesis with more patience than I really deserved. He has always been willing to discuss an idea, demonstrate a technique, debug a stimulus set, or go over a draft. This, together with his eye for logical flaws and his tenacious memory for obscure journal articles, was indispensable to the writing of this thesis. So were the advice and encouragement of the other two committee members, Lyn Frazier and Chuck Clifton, who were my connection to the wider world of psycholinguistics. Shigeaki Amano provided invaluable aid by inviting me to the NTT Basic Research Labs and supervising my work there. John McCarthy was only tangentially involved in the present work, but I'm going to thank him anyway, because his classes and seminars are among the most fascinating experiences I've ever had. Parts of Chapter 4 benefited from the comments of two anonymous Cognition reviewers. Johns Hopkins University has sheltered me while I completed my revisions. This research was paid for in part by the U.S. National Science Foundation, the U.S. National Institutes of Health, and the Nippon Telegraph and Telephone Corporation.

Kathy Adamczyk and Lynne Ballard rescued me from many a disaster with their quick thinking and uncanny influence over the university administration.


Dissertating students love company, and I was fortunate to have good company in my classmates Isadora Cohen, Kiyomi Kusumoto, Junko Shimoyama, and Bernhard Schwartz, my housemates Joe Eskinazi, Eva Juarros, and Janina Rado, my labmate Cecilia Kirk, and my just plain mates Andre Isaak, Caroline Jones, and especially Jennifer Smith, a stalwart comrade-in-arms throughout our common dissertating time. Special thanks are owed to Earl Gaddis, Virginia van Scoy, and the Northampton Group of the Boston Branch of the Royal Scottish Country Dance Society for six years of wonderful music, dancing, and comradeship.

Finally, I would like to thank my parents for their love and encouragement, and for acting like this was all perfectly normal. This thesis is dedicated to them.

ABSTRACT

PHONOLOGICAL GRAMMAR IN SPEECH PERCEPTION

MAY 2002

ALFRED ELLIOTT MORETON, B. A., SWARTHMORE COLLEGE

Ph.D., UNIVERSITY OF MASSACHUSETTS, AMHERST

Directed by: Professor John Kingston

This dissertation investigates the ways in which speech perception is guided by the expectation that the stimulus is an utterance in the perceiver's language, with a particular focus on how phonotactics affects the interpretation of acoustically ambiguous segments. A model is proposed in which phonological grammar, expressed here as a system of ranked and violable constraints within the framework of phonological Optimality Theory, is used to select among competing candidate parses of the acoustic input. This grammar-based theory is contrasted with two grammarless alternative accounts of perception: the connectionist network TRACE, which derives phonotactic perceptual effects from the lexicon, and a statistical theory based on transitional probabilities.

Experimental evidence is presented to show (1) that English listeners' judgments of vowels and of consonant clusters disfavor configurations which are grammatically illegal in the language, (2) that the dispreference for illegal configurations is far stronger than that for configurations which are legal but have zero frequency, and (3) that it is due to a response dependency, rather than to auditory or other stimulus factors, and cannot be explained by foreign-language exposure. Two experiments with Japanese listeners find (1) that the lexical stratum membership of nonsense words can produce a phonotactic perceptual effect, (2) that the triggering and target segments can be up to three segments distant, and (3) that the stratum-phonotactic effect is larger than a word-superiority effect obtained with the same listeners and paradigm.

These results are shown to be consistent with the grammar-based model, but inconsistent with the two grammarless alternatives. Analysis of the three models reveals that the shortcomings of the alternatives are due to their inability to abstract over phoneme classes and larger linguistic structures. It is concluded that the mechanisms of speech perception have access to a full-fledged phonological competence.


TABLE OF CONTENTS

Page

ACKNOWLEDGEMENTS..................................................................................................... iv

ABSTRACT..............................................................................................................................vi

LIST OF TABLES..................................................................................................................xiv

LIST OF FIGURES................................................................................................................ xix

CHAPTER

1. INTRODUCTION................................................................................................................. 1

2. PHONOLOGICAL PRELIMINARIES........................................................................... 11

2.1. Introduction................................................................................................................ 11
2.2. Inventory and phonotactics in Optimality Theory.................................................11
2.3. Inventory and phonotactics of English syllable onsets............................................14

2.3.1. Explicanda........................................................................................................ 15
2.3.2. Analysis.............................................................................................................18

2.3.2.1. Representations...................................................................................... 18

2.3.2.1.1. Consonant features........................................................................21
2.3.2.1.2. Features of [ɹ w l].........................................................................24

2.3.2.2. CV syllables........................................................................... 28

2.3.2.2.1. Undominated faithfulness constraints......................................... 30
2.3.2.2.2. Coronal stop places: Dental, retroflex, alveolar, and
palato-alveolar.....................................................................................30
2.3.2.2.3. The persistence of [θ].................................................................. 34
2.3.2.2.4. Dorsal places of articulation: Palatals and velars....................... 35
2.3.2.2.5. Labial places of articulation: Bilabials and labiodentals............37


2.3.2.2.6. The stop-affricate-fricative series................................................37
2.3.2.2.7. Constraint lattice...........................................................................41

2.3.2.3. *[sɹ]......................................................................................... 42
2.3.2.4. *[tl].......................................................................................... 46
2.3.2.5. ??[pw]......................................................................................48

2.4. Summary................................................................................................................... 51

3. THEORIES OF PHONOTACTIC EFFECTS IN SPEECH PERCEPTION............... 53

3.1. Introduction................................................................................................................53
3.2. TRACE (McClelland & Elman 1986).....................................................................53

3.2.1. How TRACE works........................................................................ 53
3.2.2. Lexical effects on phoneme perception......................................... 56
3.2.3. Phonotactic effects on phoneme perception..................................................59
3.2.4. Empirical shortcomings of TRACE...............................................................60

3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)............... 61

3.3.1. Simulation: Success of statistical predictions.............................. 63
3.3.2. Probabilistic theories of speech perception...................................65

3.3.2.1. Context.................................................................................................... 66
3.3.2.2. Database.................................................................................................. 69
3.3.2.3. Decision rule.......................................................................... 72

3.3.3. Statistical context effects on phoneme perception........................................76

3.4. A grammar-based account....................................................................................... 89

3.4.1. Choice of grammatical theory........................................................................90

3.4.1.1. Grammatical framework........................................................90
3.4.1.2. Particular grammar................................................................ 92

3.4.2. Decision mechanism.......................................................................................93


3.5. Summary..................................................................................................................100
3.6. Appendix: Computing frequencies...................................................................... 102

4. EMPIRICAL TESTS.........................................................................................................108

4.1. Introduction...............................................................................................................108
4.2. Experiment I: Sequence frequency and the phonotactics of word-final lax
vowels................................................................................................................. 110

4.2.1. Rationale.........................................................................................................110
4.2.2. Design............................................................................................................. 113
4.2.3. Predictions..................................................................................................... 119

4.2.3.1. TRACE simulation................................................................119

4.2.3.1.1. Calibration and replication of the original TRACE results.... 119
4.2.3.1.2. Simulation of the present experiment........................................ 123

4.2.3.2. MERGE TP...........................................................................131

4.2.3.2.1. INC-1............................................................................................ 132
4.2.3.2.2. SC-1............................................................................................. 134

4.2.3.3. OT grammatical theory......................................................... 135

4.2.4. Methods............................................................................................................137
4.2.5. Results..............................................................................................................139
4.2.6. Discussion........................................................................................................ 143

4.3. Experiment 2: Sequence frequency and word-initial [pw] clusters.................... 143

4.3.1. Rationale...........................................................................................................143
4.3.2. Design...............................................................................................................144
4.3.3. Predictions....................................................................................................... 147

4.3.3.1. TRACE simulation.............................................................. 147
4.3.3.2. MERGE TP...........................................................................150

4.3.3.2.1. INC-1........................................................................................... 150
4.3.3.2.2. SC-1............................................................................................153

4.3.3.3. OT grammatical theory.......................................................154

4.3.4. Methods......................................................................................... 155
4.3.5. Results...........................................................................................156
4.3.6. Discussion..................................................................................................... 158

4.4. Experiment 3: Sequence frequency and the relative phonotactic badness
of [pw] and [tl] onsets........................................................................ 159

4.4.1. Rationale......................................................................................... 159
4.4.2. Design..............................................................................................160
4.4.3. Predictions...................................................................................................... 163

4.4.3.1. TRACE simulation...............................................................163

4.4.3.1.1. [_fkous] stimuli...........................................................................164
4.4.3.1.2. [_vnAm] stimuli...........................................................................168
4.4.3.1.3. Expected and actual TRACE predictions..................................170

4.4.3.2. MERGE TP..........................................................................171

4.4.3.2.1. INC-1.......................................................................................... 171
4.4.3.2.2. SC-1............................................................................................ 173

4.4.3.3. OT grammatical theory....................................................... 175

4.4.4. Methods........................................................................................... 176
4.4.5. Results............................................................................................. 176

4.4.5.1. [_vnAm] stimuli.................................................................... 177
4.4.5.2. [_fkous] stimuli.................................................................... 181

4.4.6. Discussion....................................................................................................... 185

4.5. Experiment 4: Sequence frequency and the relative phonotactic badness
of [bw] and [dl] onsets: Interaction of response variables............................186


4.5.1. Rationale..........................................................................................................186
4.5.2. Design.............................................................................................................. 187
4.5.3. Predictions...................................................................................................... 187

4.5.3.1. TRACE simulation............................................................... 187
4.5.3.2. MERGE TP........................................................................... 188

4.5.3.2.1. INC-1........................................................................................... 188
4.5.3.2.2. SC-1..............................................................................................190

4.5.3.3. OT grammatical theory.........................................................................191

4.5.4. Methods........................................................................................... 193
4.5.5. Results and discussion....................................................................198
4.5.6. Foreign-language exposure.......................................................................... 201
4.5.7. A note on TRACE..........................................................................................202

4.6. Experiment 5: Phonotactics and syllabification.................................................. 207

4.6.1. Rationale.........................................................................................................208
4.6.2. Design............................................................................................................. 208
4.6.3. Predictions..................................................................................................... 208
4.6.4. Methods..........................................................................................................208
4.6.5. Results............................................................................................................212
4.6.6. Discussion...................................................................................................... 215

4.7. Experiment 6: Phonotactics of the Japanese lexical strata................................. 220

4.7.1. Influence of remote context......................................................... 220
4.7.2. Rationale: The lexical strata of Japanese................................... 221
4.7.3. Experiment 6a: Word-superiority effect......................................................223

4.7.3.1. Design....................................................................................................224
4.7.3.2. Methods................................................................................ 225
4.7.3.3. Results and discussion......................................................................... 226

4.7.4. Experiment 6b: Lexical stratum phonotactics........................................... 227


4.7.4.1. Design................................................................................................... 228
4.7.4.2. Predictions........................................................................................... 230

4.7.4.2.1. TRACE........................................................................................ 230
4.7.4.2.2. MERGE TP................................................................................. 230
4.7.4.2.3. OT grammatical theory.............................................................. 231

4.7.4.3. Methods................................................................................234
4.7.4.4. Results.................................................................................. 235
4.7.4.5. Discussion............................................................................................ 238

4.8. Summary..................................................................................................................239
4.9. Appendix: Synthesis parameters for the stimuli of Experiments 4 and 5 .........240

5. CONCLUSIONS...................................................................................................... 248

BIBLIOGRAPHY...........................................................................................................253


LIST OF TABLES

Table Page

2.2. C[ɹ w l] onsets of American English.................................................... 16

2.3. Obstruent manner features.................................................................................... 21

2.4. Obstruent place features.........................................................................................22

2.5. Representation of consonants................................................................................ 24

2.6. Manner features for [ɹ w].......................................................................25

2.7. Place features for [ɹ w].......................................................................... 26

2.8. Manner features for [l]........................................................................... 27

2.9. Place features for [l]................................................................................28

2.10. English surface obstruent inventory in CV syllables...........................................29

2.15. Repair of retroflexes to alveolars (Yule & Burnell 1886, American
Heritage Dictionary 2000)...................................................................31

2.16. Repair of dentals to alveolars (American Heritage Dictionary 2000).............. 32

2.25. Number of languages with stops at given places in the sample of
Maddieson (1984: Table 2.5)............................................................... 38

2.26. Frequency of the most common affricates in the sample of Maddieson
(1984: Table 2.8)....................................................................................38

2.28. Labial, alveolar, and palatoalveolar series of American English..................... 39

2.35. English surface obstruent inventory in C[ɹ]V syllables..................... 43

2.39. English surface obstruent inventory in C[l]V syllables..................... 46


3.2. Probability that a given diphone will be followed by a given segment
(extract from complete table)..............................................................................64

3.3. Results of the simulation: Success rate as a function of context size............... 65

3.5. Attested English phoneme sequences of lengths 2, 3, and 4.............. 68

3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n = 1..... 71

3.12. Triphone frequencies for sequences ending in [i]/[o] in the stimuli of
McClelland and Elman (1988)............................................................82

3.13. Triphone frequencies for sequences ending in [s]/[f] in the stimuli of
McClelland and Elman (1988)............................................................83

3.14. Triphone frequencies for the stimuli of McQueen and Pitt (1998)...................84

3.15. Cohorts at the appearance of the ambiguous fricative in the experiment of
McClelland and Elman (1988, Experiment 3)....................................85

3.16. Cohorts at the appearance of the ambiguous fricative in the experiment
of Pitt and McQueen (1998, Experiment 3)....................................... 86

3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998)..... 88

4.1. Distribution of tense and lax vowels in American English............................ 111

4.2. Change of lax to tense vowels when made final by truncation....................... 112

4.3. Phonotactics of stimuli for Experiment 1.........................................................115

4.4. Frequency of the syllables in stimuli for Experiment 1................................. 116

4.5. Effect of the [k]/[g] manipulation on the frequency of the word-final
syllables in the stimuli of Experiment 1........................................... 118

4.6. Parameter settings for the TRACE simulation (all experiments)....................120


4.9. Featural parameters of the four original TRACE vowels (McClelland &
Elman 1988)....................................................................................................... 124

4.10. Featural parameters of the new vowels [I] and [X]...........................124

4.17. Diphone frequencies for the stimuli of Experiment 1....................................... 132

4.29. Mean % [I] response, all intermediate stimuli.................................................. 141

4.30. Differences in mean "I" response, pairwise by subject.....................................142

4.31. Phonotactics of the stimuli for Experiment 2 .................................................... 145

4.32. Frequency of the syllables in the stimuli for Experiment 2 ............................. 146

4.33. Word-initial occurrences of the critical syllables from Experiment 2 ............ 148

4.34. Results of the TRACE simulation of Experiment 2: Activation levels at
Cycle 75............................................................................................... 149

4.35. Diphone frequencies for the stimuli in Experiment 2 ....................................... 151

4.40. Triphone frequencies for the stimuli of Experiment 2......................153

4.46. Mean % [p] response, all intermediate stimuli.................................................. 157

4.47. Phonotactics of the stimuli for Experiment 3 .................................................... 160

4.48. Frequency statistics for the stimuli of Experiment 3 ........................................ 162

4.51. Words beginning with the critical onsets in the lexicon used for the
TRACE simulation of Experiment 3 ................................................................. 165

4.53. Results of the TRACE simulation of Experiment 3: Activation levels at
Cycle 75................................................................................................166

4.56. Results of the TRACE simulation of Experiment 3: Activation levels at
Cycle 75................................................................................................ 169


4.60. Diphone frequencies for the stimuli of Experiment 3......................................171

4.66. Triphone frequencies for the stimuli of Experiment 3 .................................... 174

4.72. Mean percent "p" response, all intermediate [...vnAm] stimuli....................... 180

4.73. Differences in mean "p" response, pairwise by subject, [...vnAm] stimuli... 180

4.76. Mean percent "p" response, all intermediate [...fkous] stimuli....................... 184

4.77. Differences in mean "p" response, pairwise by subject, [...fkous] stimuli,
[i] condition only................................................................................ 184

4.80. Diphone frequencies for the stimuli of Experiment 4 ..................................... 189

4.82. Triphone frequencies for the stimuli of Experiment 4 .................................... 190

4.83. Frequency of occurrence of the clusters of Experiment 4 as onsets in
English.................................................................................................192

4.89. Lexicon for TRACE simulation: Words with [b/d/g]+[w/l] onsets................204

4.94. Demographics and occurrences of [bw pw]-initial place names in Iowa
and Nebraska (U.S. Census Bureau 1990; DeLorme 1998, 2000)................217

4.95. Errors in production of the initial stop in [bl pl tw] onsets by
English-learning children in Iowa and Nebraska (Smit 1993)..................... 218

4.96. The lexical strata of Japanese.............................................................................222

4.97. Stimulus words used in Experiment 6a............................................................. 224

4.98. Boundary difference in long- and short-biased contexts (in milliseconds),


Experiment 6a (N = 21 Ss).................................................................................227

4.99. Validity of cues to stratum membership: Number of nouns in database


belonging to each stratum containing the given cues..................................... 229

4.102. Stimuli for Experiment 6 b .................................................................................234

4.104. Constant synthesis parameters which were identical for the "b" and "d"
arrays of Experiments 4 and 5 ........................................................................... 240

4.105. Time-varying synthesis parameters common to the "b" and "d" arrays of
Experiments 4 and 5............................................................................................ 241

4.106. Synthesis parameters for the "b" array of Experiments 4 and 5 .............. 243

4.107. Synthesis parameters for the "d" array of Experiments 4 and 5 ..................... 245

LIST OF FIGURES

Figure Page

2.1. Architecture of the OT phonological model........................................................... 12

3.1. The TRACE model of McClelland and Elman (1986).......................................... 54

3.11. Boundary shift in perceptual space....................................................................... 75

3.23. Architecture of an OT-grammatical-based parsing model.................................. 95

4.7. Results of the TRACE simulation replicating Figure 7 of McClelland and


Elman (1986)...................................................................................................... 121

4.8. Results of the TRACE simulation replicating Figure 8 of McClelland and


Elman (1988)...................................................................................................... 123

4.11. Results of the TRACE simulation for the input [salgjXfl.......................... 127

4.12. Results of the TRACE simulation for the input [salleiX]-] ...........................128

4.13. Results of the TRACE simulation for the input [solgjX]............................129

4.14. Results of the TRACE simulation for the input [salluX]............................130

4.26. Schema for the filler stimuli of Experiment 1.....................................................138

4.27. Schema for the critical stimuli of Experiment 1...............................................138

4.28. Identification curves for the stimuli of Experiment 1, pooled across 14


listeners................................................................................................................140

4.43. Schema for the critical stimuli of Experiment 2...............................................155

4.44. Schema for the filler stimuli of Experiment 2 .....................................................155

4.45. Identification curves for the stimuli of Experiment 2, pooled across 7
listeners................................................................................................................157

4.70. Identification curves for the [...vnAm] stimuli of Experiment 3, pooled


across 12 listeners, comparing the [_l] condition with the [_ɹ] baseline.............. 178

4.71. Identification curves for the [...vnAm] stimuli of Experiment 3, pooled


across 12 listeners, comparing the [_w] condition with the [_ɹ] baseline.... 179

4.74. Identification curves for the [...fkoUs] stimuli of Experiment 3, pooled


across 12 listeners, comparing the [_l] condition with the [_ɹ] baseline.............. 182

4.75. Identification curves for the [...fkous] stimuli of Experiment 3, pooled


across 12 listeners, comparing the [_w] condition with the [_ɹ] baseline .... 183

4.85. Synthesis parameters for the stimuli of Experiment 4 .......................................194

4.86. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on
the "g"/"d" judgment.......................................................................................... 199

4.87. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on
the "g"/"b" judgment.......................................................................................... 200

4.88. Total number of "bw" responses in Experiment 4 as a function of individual


listeners' exposure to languages containing [bw] or [pw] onsets (French,
Mandarin Chinese, or Spanish)........................................................................ 202

4.91. Synthesis parameters for the stimuli of Experiment 5 ...................................... 209

4.92. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"b"/"d" judgment, for the CCV stimuli.............................................................213

4.93. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the
"b"/"d" judgment, for the VCCV stimuli......................................................... 214

4.103. Boundary between [a] and [a:], averaged across 21 listeners....................... 236

CHAPTER 1

INTRODUCTION

This dissertation investigates the role of phonological knowledge in speech

perception. It proposes a theory of performance which makes use of grammatical

competence - specifically, competence expressed in terms of the ranked and violable

constraints of phonological Optimality Theory - to weigh competing hypotheses about the

phonological structure of the speech signal. This theory is tested empirically against the

rival claims of two other models to explain the same phenomena: TRACE, which uses

lexical knowledge, and the MERGE transitional-probability theory, which uses segment-

string frequency.

The phonological phenomenon with which we are concerned here is phonotactic

grammaticality. Languages place tight restrictions on how their segmental inventories can

combine into larger units such as syllables, morphemes, or words, and speakers are sensitive

to these restrictions in a number of ways. The phonologically systematic patterns of

possible and impossible combinations are the phonotactics of the language. Phonotactics

causes redundancy, predictability in speech, which the mechanisms of speech perception

could in principle exploit.

Phonotactic effects turn up in many places. They appear as systematic gaps in the

distribution of sounds in a speech corpus (e.g., Harris 1951, Lamontagne 1993) - what in

Chapter 2 are called phonological gaps. Phonotactics can drive synchronic phonological

alternations, such as that between American English [t] and [ɾ], which are conditioned by the

neighboring segments (e.g., Prince & Smolensky 1993). When a foreign word is borrowed, it can undergo sound changes that adapt it to the phonotactics of the borrowing language (Itô & Mester 1994).

Native speakers share intuitions about the phonological grammaticality of novel phoneme strings in their own language (Greenberg & Jenkins 1964, Scholes 1966). English

listeners can also accurately judge the relative frequency of non-English consonant clusters

in the languages of the world (Pertz & Bever 1975). A language's phonotactic constraints

are respected by its speakers’ slips of the tongue (Fromkin 1971) and ear (Sapir 1933;

Brown & Hildum 1956; Halle, Segui, Frauenfelder, & Meunier 1998), and have been

shown to influence speakers' perceptions of phonetically ambiguous segments (Massaro &

Cohen 1983; Pitt 1998).

In sum, speakers tend to reject phonotactically illegal stimuli in production and

perception, requiring, for instance, stronger acoustic evidence to believe that they have heard

an illegal stimulus than a legal one. Moreover, illegality measured one way (e.g., off-line

intuitive-goodness judgments) tends to agree with illegality measured in other ways (e.g.,

ambiguous-segment perception). Some sort of language-specific knowledge is being

brought to bear on all of these.

What is at issue is the nature of that knowledge, and of its interaction with the

mechanisms of language performance. We will be investigating three very different

proposals. Each will be examined chiefly in light of its account of the phonotactic effect on

ambiguous-phoneme perception, a question directly and explicitly addressed by all three: If

a stimulus contains a phoneme which is acoustically ambiguous between one which is legal

in that context and one which is illegal, listeners' reports are biased towards the legal

interpretation compared to their report of the same ambiguous phoneme presented in a

neutral context (Massaro & Cohen 1983).

The claim which I will advance, elaborate, and defend is the following:

( 1)
Speech input is parsed prelexically to a featural or phonemic surface
representation. When acoustic evidence in the incoming speech stream
supports more than one phonological parse, the competing parses are scored
with respect to the ranked active constraints of the speaker's grammar, and
the more harmonic candidate parse is processed first.
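The selection mechanism in (1) can be sketched computationally as lexicographic comparison of violation profiles. The sketch below is illustrative only: the constraint names, candidate labels, and violation counts are hypothetical stand-ins, chosen to show how ranked, violable constraints pick the more harmonic of two competing parses.

```python
# Toy sketch of claim (1): competing phonological parses are scored against
# ranked, violable constraints, and the more harmonic parse wins. Constraint
# names, candidates, and violation counts are hypothetical illustrations.

def harmony_key(violations, ranking):
    """Violation counts ordered from highest- to lowest-ranked constraint;
    tuples that compare smaller (lexicographically) are more harmonic."""
    return tuple(violations.get(c, 0) for c in ranking)

def preferred_parse(parses, ranking):
    """parses: {candidate_label: {constraint_name: violation_count}}."""
    return min(parses, key=lambda p: harmony_key(parses[p], ranking))

# Hypothetical fragment: a markedness constraint against [tl] onsets
# outranks a bias toward the acoustically closer interpretation.
ranking = ["*TL", "FAITH-ACOUSTICS"]
parses = {
    "tlee": {"*TL": 1},              # acoustically closer, but illegal onset
    "klee": {"FAITH-ACOUSTICS": 1},  # legal onset, slightly worse acoustic fit
}
print(preferred_parse(parses, ranking))  # -> klee: the legal parse is processed first
```

Because the comparison is lexicographic, a single violation of a high-ranked constraint outweighs any number of violations of lower-ranked ones, which is the defining property of strict constraint ranking.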

This is a way of allowing performance mechanisms to use linguistic competence, by

setting up a perceptual bias against parses which are disfavored by the grammar. It

therefore incorporates into the performance theory the traditional linguistic view that the

difference between a grammatical and an ungrammatical utterance hinges on whether the

utterance fulfills specific formal requirements - whether it meets the structural description

of a set of abstract grammatical rules. In this view, an ambiguous phoneme generates two

(or more) parses. If one is legal in context and the other is not, perception will favor the

legal parse. This is the principal theoretical contribution offered by this dissertation: An

account of how phonological grammar can be used in a parsing theory.

Quite different in conception are the two rival theories, TRACE (McClelland & Elman

1986) and the MERGE transitional-probability (TP) theory (Pitt & McQueen 1998). These

regard phonotactic illegality as a very concrete phenomenon, equivalent to non-occurrence in

the lexicon1. Illegality is the extreme low end of a frequency continuum, and its effects are

effects of frequency. Where these two theories differ is in how they implement frequency

effects.

TRACE is a connectionist model of word recognition, within which phonotactic

effects emerge as side effects of the word-recognition process. A stimulus will activate

phoneme units, which in turn can produce certain levels of activation in a word unit,

depending on the degree to which the stimulus resembles the word represented by that unit.

A stimulus containing a phoneme ambiguous between a legal and an illegal one will partially

activate some words containing the legal phoneme, but none containing the illegal one.

Activation spreading down from the word units to the phoneme units will increase the

activation of the unit corresponding to the legal phoneme, which will laterally inhibit the

illegal phoneme unit. The result is a perceptual bias towards the legal phoneme.
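The chain of events just described can be sketched as a toy simulation. This is emphatically not the TRACE implementation itself: the two-word lexicon, the letters-for-phonemes transcription, the gain parameters, and the cycle count are all illustrative assumptions, serving only to show how lexical feedback plus lateral inhibition yields a bias toward the legal phoneme.

```python
# Toy simulation of the TRACE-style dynamics described above: an ambiguous
# phoneme partially activates words containing the legal reading, word units
# feed activation back down to their phonemes, and the boosted phoneme unit
# laterally inhibits its rival. Lexicon, transcription, gains, and cycle
# count are illustrative, not TRACE's actual parameters.

lexicon = ["tri", "trap"]           # words with a [tr] onset; none with [tl]
phoneme_act = {"l": 0.5, "r": 0.5}  # ambiguous bottom-up evidence, 50/50

FEEDBACK, INHIBIT = 0.1, 0.2
for cycle in range(5):
    # Bottom-up: each word unit is supported by its component phonemes.
    word_act = {w: sum(phoneme_act.get(p, 0.0) for p in w) / len(w)
                for w in lexicon}
    # Top-down: word units pass activation back to the phonemes they contain.
    for w, a in word_act.items():
        for p in set(w) & set(phoneme_act):
            phoneme_act[p] += FEEDBACK * a
    # Lateral inhibition between the two rival phoneme units.
    l_act, r_act = phoneme_act["l"], phoneme_act["r"]
    phoneme_act["l"] = max(0.0, l_act - INHIBIT * r_act)
    phoneme_act["r"] = max(0.0, r_act - INHIBIT * l_act)

print(phoneme_act["r"] > phoneme_act["l"])  # -> True: bias toward legal [tr]
```

The crucial point is that no phonotactic constraint is stated anywhere in the sketch: the bias emerges solely because the lexicon happens to contain [tr] words and no [tl] words.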

The MERGE TP theory is associated with a model of phonemic processing, the

MERGE model of Norris et al. (2000). In MERGE TP, low-level (pre-lexical) perceptual

1 Transitional probabilities are assigned to a pre-lexical module in MERGE, but the probabilities
themselves are computed over the lexicon.

mechanisms keep track of the frequencies with which different phoneme sequences occur.

An ambiguous segment and its surrounding context could be interpreted as either of two

sequences, but perception will tend to favor the more frequent possibility - that is, it will

choose the phoneme that, on the basis of past experience, is more likely in the given context.
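This frequency-tracking mechanism can be sketched as follows. The toy word list stands in for the listener's accumulated experience, and raw bigram counts stand in for transitional probabilities (with a fixed left context, the most frequent continuation is also the most probable one); all names and data are illustrative.

```python
# Sketch of the MERGE TP idea described above: tallies of phoneme-pair
# frequencies (here over a toy word list standing in for the listener's
# experience) resolve an ambiguous segment toward the likelier sequence.

from collections import Counter

def bigram_counts(words):
    """Tally adjacent phoneme pairs across a list of transcribed words."""
    counts = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            counts[(a, b)] += 1
    return counts

def resolve(prev, candidates, counts):
    """Pick the candidate most frequently observed after `prev`."""
    return max(candidates, key=lambda c: counts[(prev, c)])

counts = bigram_counts(["tri", "trap", "tru", "slip"])  # illustrative lexicon
print(resolve("t", ["l", "r"], counts))  # -> r: [tr] occurs, [tl] never does
```

Note that, like TRACE, the sketch contains no grammatical statement: a zero-frequency sequence is simply the low end of the count scale, which is exactly the equivalence the grammatical theory disputes.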

Both of these models have demonstrated success in accounting for some of the core

phonotactic perceptual phenomena. However, I will argue that neither one is adequate, for

reasons crucially connected with their lack of access to abstract grammatical knowledge.

Both make predictions that are not borne out, and fail to predict phenomena that occur.

The grammatical approach to phonotactics treats these phonotactic effects as a

syndrome, with a single underlying cause, and identifies that cause with listeners' knowledge

of the sound pattern of their language in the form of phonotactic constraints against

particular combinations of sounds (e.g., Shibatani 1973). Speech tasks make the speaker or

listener assign a linguistic parse to the stimulus; parses which are nonexistent or highly

marked in the language will naturally be disfavored.

However, most of the "phonotactic" effects are open to another interpretation:

Perhaps speakers merely know that some sequences are common and others are rare or

nonexistent. The statistical rarity of particular phoneme sequences affects intuitive

"possible-word" judgments (Treiman et al. 1996) and ambiguous phoneme perception

(Newman et al. 1997; Pitt & McQueen 1998) in much the same way as phonotactic

illegality. Rarity also speeds "no" responses in lexical decision and slows same-different

judgments of nonwords (Vitevitch & Luce 1998). What linguists have described as a

categorical contrast between possible and impossible sequences can instead be interpreted

as a frequency continuum, with the "impossible" sequences at the irreducible minimum of

zero frequency.

Statistical models differ in which statistics they use and how they use them. In the

perceptual model TRACE (McClelland & Elman 1986), the rarity of particular sound

sequences is encoded in knowledge of words, and statistical knowledge is retrieved by

querying the lexicon. One component of the MERGE model of perceptual decision-making

(Norris et al. 2000) keeps track of phoneme-to-phoneme transitional probabilities, which are

used without reference to the lexicon. The current version of the Neighborhood Activation

Model (Luce 1986; Luce & Pisoni 1988; Vitevitch & Luce 1998, 1999) combines

knowledge of sublexical sequence frequencies with knowledge of lexical frequencies and

neighborhoods. A statistically-based model of the acceptability-judgment task using co­

occurrence frequency has been put forward by Frisch, Broe, & Pierrehumbert (1995).

The theoretically most attractive aspect of statistical models is their account of how

speakers learn the phonotactics of their language - en passant, as a by-product of learning

its vocabulary.

On the other hand, they do not explain three phenomena that led people to posit

grammars in the first place.

1. If phonotactics is learned simply by learning or hearing words, speakers should

be able to acquire any language at all. The lexical and statistical mechanisms only

distinguish favored from disfavored sound patterns within a language, after the lexicon has

been learned or the statistical patterns have been analyzed. Yet English listeners can

accurately judge the relative frequency of non-English consonant clusters in the languages

of the world (Pertz & Bever 1975). The cross-linguistic commonness or rarity of different

classes of segments, sequences, or processes, is not addressed by statistical learning

theories, nor is the way in which the processes found in one language resemble those found

in others. Capturing these patterns is a central concern of grammatical models of language,

which have evolved a wide array of conceptual tools for this purpose: articulatory

grounding of constraints (Archangeli & Pulleyblank 1994), implicational markedness

(Greenberg 1964), feature geometry (Clements 1985), natural classes (Chomsky & Halle

1968), and many more.

2. The alternations induced by phonotactics are categorical rather than gradient, and

systematic rather than arbitrary. For example, the phonotactics of Standard German forbid

word-final [b d g v z]; in that environment, they turn into [p t k f s], despite their differing

frequencies. The frequency difference between (common) word-final [t] and (zero-

frequency) word-final [d] is much greater than that between (uncommon) word-final [p] and

(zero-frequency) word-final [b], yet German speakers "repair" the illegal final voiced

obstruents to the same extent in both cases. And the repair is not to turn the illegal

obstruents into the most frequent legal obstruent, but into the corresponding legal

obstruent.

3. Phonological alternations occur even if the utterance consists of very rare

morphemes or even nonce forms, and exceptions to regular patterns are less likely to occur

as morpheme frequency decreases. These features suggest that the regularity is distinct

from the forms it applies to, rather than emergent from them.
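The categorical, non-frequency-driven character of the German repair in point 2 can be made concrete in a few lines of code. The orthographic letters below stand in for the IPA symbols in the text, and the example words are illustrative.

```python
# Sketch of the categorical final-devoicing repair described in point 2:
# every illegal word-final voiced obstruent maps to its voiceless
# counterpart, regardless of how frequent either segment is. Letters
# stand in for the IPA symbols [b d g v z] and [p t k f s].

DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def repair_final(word):
    """Devoice a word-final obstruent; leave legal finals unchanged."""
    return word[:-1] + DEVOICE.get(word[-1], word[-1])

print(repair_final("rad"))   # -> rat: [d] becomes its counterpart [t]
print(repair_final("lieb"))  # -> liep: uncommon [p] is still the repair for [b]
print(repair_final("rat"))   # -> rat: already legal, surfaces unchanged
```

A frequency-driven mechanism would have no reason to converge on this mapping: nothing in the counts singles out the voiceless counterpart rather than, say, the most frequent legal obstruent.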

The weakness of both non-grammatical theories is the superficiality of their

linguistic analysis, which prevents them from abstracting the empirically correct

generalizations about legal and illegal sequences.

TRACE and MERGE TP offer extremely simple theories of phonological

representation. Phonemes are unstructured lists of features, as in Jakobson et al. (1952).

The only phonological domain above the level of the phoneme which is recognized by

TRACE is the word, while MERGE TP also recognizes 2- or 3-phoneme sequences.

Neither represents the phonotactically crucial domain of the syllable, or any of its

constituents such as the onset and rime. Both are incapable of abstracting over features: All

patterns are represented at the level of the phoneme, sequence, or word frequencies. More

abstract properties which influence phonotactics, such as part of speech or lexical stratum,

are not encoded anywhere. The result is that the dependencies these theories represent do

not correspond to the ones which are linguistically and perceptually relevant.

These models cannot distinguish phonological gaps (sequences which are

systematically prohibited) from lexical gaps (sequences which are permitted, but missing

from the lexicon through historical accident). If illegality and frequency are the same thing,

then zero-frequency sequences should be equally illegal regardless of why they are illegal.

They cannot distinguish phonotactically relevant context from phonotactically

irrelevant context. For example: The nonword [tli] is illegal in English. The illegality of

the [1] in that context is due entirely to the context on its left - the word boundary and [t],

which create an illegal sequence of two coronal non-continuants in a syllable onset. The [i]

has nothing to do with the phonotactic unacceptability of the string; [tli] and [tla] are both

illegal. TRACE and MERGE TP are blind to this fact. Each applies a fixed "context" to

every phenomenon. The relevant context in TRACE is the entire nonword; that in MERGE

TP is the neighboring phonemes. Both theories therefore incorrectly overestimate the

perceptual influence of incidental context.

It is an empirical question whether listeners process phonological gaps differently

from lexical gaps. Evidence will be presented to show that they can: that phonological gaps

are stronger than lexical gaps, and that phonotactically relevant context is more influential

than phonotactically irrelevant context.

The organization of the dissertation is as follows:

Chapter 2 discusses the phonological background of the theory - the grammar

which it is proposed that performance mechanisms have access to. It first discusses the

Optimality-Theoretic approach to phonotactic grammar as a filter on the lexicon, going on to

review the distinction between lexical and phonological gaps, and between phonotactically

relevant and irrelevant context. Two particularly prominent phonotactic gaps in English

syllable onsets - [tl] and [sɹ] - are shown to be phonological rather than lexical gaps, and

are analyzed as special cases of more general prohibitions.

Chapter 3 introduces the three theoretical contenders, TRACE, MERGE TP, and the

OT grammatical theory. The rationale for each is discussed and the existing empirical

evidence weighed, with each theory's account of it presented.

The precise workings of the MERGE TP theory have not yet been explained by its

adherents. Much of Chapter 3 is devoted to considering the different design parameters of

a theory of transitional probabilities in perception and choosing which possibilities to test.

The most important of these parameters is the specific nature of the phonological context:

How many segment positions are included, and how do the left and right contexts interact?

It will be shown that there is no choice of context that can account for the data cited by the

MERGE TP authors in support of the theory. If the context is chosen so as to cover any

one part of the data, the theory makes incorrect predictions about the rest. On the chance

that some of the contradictory data might be artifactual, two contexts are chosen for testing

as the most plausible and interesting.

New empirical evidence bearing on these theories is reported in Chapter 4. The

tactical focus is on the distinction between phonotactically relevant and irrelevant context,

and on that between phonological and lexical gaps.

Experiments 1-5 build on previous psycholinguistic research on the phonotactic

perceptual effect of English syllable structure.

Experiment 1 demonstrates an effect of phonotactically relevant context, but not of

irrelevant context, on perception of an [i]-[ɪ] continuum by American English speakers,

exploiting the phonotactic illegality of word-final lax vowels. Experiment 2 attempts to

replicate Experiment 1 with initial [pw], considered phonotactically illegal by the TRACE

authors on statistical grounds (McClelland & Elman 1986), but merely "marginal" by

phonologists on the basis of intuition, distribution, and history (Hultzen 1965, Wooley

1970, Catford 1988, Hammond 1999). No effect is found, despite the strong statistical

biases against [pw]. Experiment 3 directly compares the bias against [pw] with that against

the much more illegal, but statistically very similar, [tl], and finds a much stronger bias

against the latter. Manipulations of phonotactically relevant context are found to have the

effect predicted by the grammar-based theory, while those of phonotactically irrelevant

context have no effect. These findings are argued to favor the grammar-based processing

theory over TRACE and MERGE TP.

Where previous work in this field, including Experiments 1-3, has used stimulus

units to measure the dependent and independent variables, Experiments 4 and 5 used a

technique which allows the effect of one response on another to be measured when judging

a CC cluster in which both C’s are ambiguous (Nearey 1990). This allows bias effects to be

disentangled from stimulus factors and hence measured with greater accuracy. In

Experiment 4, the bias against [bw] is compared with that against the much more illegal, but

statistically very similar, [dl]. A strong bias against [dl] is found, but none against [bw],

corroborating the findings of Experiments 2 and 3. Experiment 5 is a control experiment to

ensure that the results of Experiments 2, 3, and 4 were not caused by compensation for

coarticulation (Mann 1980).

Experiments 6a and 6b exploit the stratified nature of the Japanese lexicon, in which

each word belongs to one of four classes with its own syndrome of phonological,

morphological, and etymological properties. One of these strata, Sino-Japanese, forbids

final [a:], while another, Foreign, permits it. The [a]-[a:] boundary is measured in carrier

nonwords containing phonological cues to membership in one stratum or the other.

In Experiment 6b, Sino-Japanese cues are found to bias perception against [a:] as

compared to Foreign cues - an effect which is expected and necessary in the grammar-

based processing model. The MERGE TP model cannot account for this effect directly,

since some of the phonotactically effective context is too far away from the ambiguous

segment for the model to capture the dependency. The results can only be accommodated

in that model through ad hoc revisions.

The phonotactic boundary shift is larger and more robust than a word-superiority

effect obtained with the same listeners and paradigm in the control Experiment 6a. This is

unexpected under TRACE, which models phonotactic effects as word-superiority effects.

Chapter 5, finally, sums up the claims, arguments, and data presented in earlier

chapters, and situates them in the larger research context. Problems and opportunities for

the theory of grammar in speech perception are discussed, and areas of future research

delineated.

CHAPTER 2

PHONOLOGICAL PRELIMINARIES

2.1. Introduction

This chapter has two principal aims. The first is to discuss the Optimality Theory

view of surface phonotactics in general; the second is to present a specific OT analysis of

facts about English syllable onsets that will be used in later chapters. A distinction is drawn

between productive phonological gaps and nonproductive lexical gaps in the syllable
inventory. Two examples of phonological gaps ([tl] and [sɹ]) and one example of a lexical

gap ([pw]) in the English syllable onset inventory are discussed, and the grammatical

groundwork laid for the perceptual studies of later chapters.

2.2. Inventory and phonotactics in Optimality Theory

No spoken language uses all of the segments known to linguistics; each is limited to

only a comparatively small inventory (Maddieson 1984, §1.2). Sounds in the inventory do

not combine at random to form larger units, but are restricted to a small phonotactically

permissible subset of the logically possible combinations.

The OT account of this is shown schematically in Figure 2.1 below. Underlying

representations, drawn from the lexicon, are inputs to the grammar. The output of the

grammar is the observable set of surface forms.

Figure 2.1. Architecture of the OT phonological model

LEXICON

/underlying
representation/

CANDIDATE OUTPUTS

[surface representation 1]
[surface representation 2] [surface
GRAMMAR
representation]

Under the principle of Richness of the Base (Prince & Smolensky 1993, §9.3), the

lexicon and the grammar function as independent modules. All they have in common is a

representational protocol: The output of the lexicon and the input to the grammar are made

of the same representational elements (features, etc.) put together in the same way. Aside

from this restriction, the lexicon can, in principle, emit any representation, and the grammar

has to deal with it.

Since the set of output candidates includes, at the very least, a fully faithful candidate

identical to the input (and is generally held to include all of the possible inputs), the

grammar acts as a filter: Some of the inputs from the lexicon result in outputs that are

identical to them; others do not.1
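This filtering behavior can be sketched computationally. The one-rule "grammar" below is a hypothetical stand-in for EVAL over a full candidate set; it serves only to show how, under Richness of the Base, an unrestricted set of inputs is partitioned into forms that surface faithfully and forms that do not.

```python
# Sketch of the grammar-as-filter architecture of Figure 2.1. The lexicon
# may emit any representation; the grammar maps each input to an output.
# Inputs mapped to themselves surface faithfully; others are "filtered"
# (repaired). The single repair rule is an illustrative stand-in for EVAL.

def grammar(underlying):
    # Hypothetical constraint effect: no [tl] syllable onset;
    # repair to [kl] rather than letting /tl.../ surface.
    if underlying.startswith("tl"):
        return "kl" + underlying[2:]
    return underlying

# Richness of the Base: the grammar must handle any input at all.
rich_base = ["tlip", "klip", "slip"]
surface = {u: grammar(u) for u in rich_base}
faithful = [u for u, s in surface.items() if u == s]
print(surface)   # {'tlip': 'klip', 'klip': 'klip', 'slip': 'slip'}
print(faithful)  # ['klip', 'slip']: only these inputs surface unchanged
```

The observable inventory of the toy language is thus the image of the rich base under the grammar, not the base itself: no stipulation about the lexicon is needed to exclude [tl] onsets from surface forms.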

When we observe that a particular segment (or larger configuration) [Y] is missing

from the surface representations of a language, there are therefore two possible accounts:

Either no underlying representation /X/ can surface as [Y] because of the filtering action of

the grammar, or else there is in principle such an /X/, but by historical accident no one

happens to have coined or borrowed a word containing it. In the first case we are dealing

with a phonological gap; in the latter, with a lexical one.

OT is hardly the first theory to make this distinction, and its practical test of what is

or is not grammatical remains the same as that of its predecessors: productivity. If native

speakers can readily accept and produce an unattested segment in different environments,

treating it phonologically and phonetically like a word of their language, then the gap is

accidental, and is modelled as a lexical gap. On the other hand, if speakers are consistently

unable to produce the segment without alteration and without great effort, then the gap is

phonological, and is modelled in OT as a filtering effect of the grammar.

The distinction between phonological and lexical gaps is similar to, but not quite the

same as, that between systematic and accidental gaps. A gap is systematic if it is part of a

pattern of gaps; it is accidental if it is isolated. When the aim is to describe the sound

pattern of a language with maximum compactness and elegance, it is usual to put the

systematic gaps in the grammar and leave the accidental ones out. Starting with the same

language, we can arrive at different grammars depending on which criterion we follow, since

the systematic gaps are not necessarily productive. (See, for example, the discussion of

English initial [ʃl] or [pw bw mw] in §3.1.) Because our psychological claim is

specifically about productive phonology, we will use the phonological/lexical criterion in

constructing our grammar.

1 The term "source/filter model" is an analogy with the source/filter model of vocal-tract acoustics, in
which the larynx is a sound source whose output is filtered by the rest of the vocal tract. Whatever the
larynx emits, the rest of the vocal tract has to deal with it, and will produce some output.

Ideally, an OT grammar for the phonological inventory of a language should make

the correct filtering predictions. That is, given the entire set of representationally possible

underlying representations, it should produce all and only the productive surface forms.

However, this clear theoretical distinction between legal and illegal is often difficult

to implement in practice. Algeo (1978), for instance, reviewed 16 "typical" studies of

English consonant clusters.

The sixteen collated studies list a total of 107 possible onset clusters, of

which there is agreement on only 30, considerably fewer than a third, leaving

77 onset clusters that are rejected by one or more studies.... The

discrepancy is even more striking for coda clusters. The same studies

explicitly list or imply well over 500 clusters that are theoretically possible in

syllable codas, of which there is agreement on only 19, fewer than 4 percent

(p. 208).

Many of the discrepancies are caused by methodological differences - in selection

of materials, in choice of transcription, in level of representation (surface or underlying), in

choice of phonological domain (syllable or word), and so on - but there is a certain

irreducible gradience, a lack of perfectly sharp demarcation between the "legal" and "illegal"

sets. It is agreed that [tl] is an illegal onset and [kw] a legal one, but there is no such

uniformity of judgment about [vl] or [pw] - they are felt to lie somewhere in between. The

problem of gradient illegality is a difficult one for Optimality Theory, and one which we will

return to in our discussion (below, §2.3.2.4) of the [pw bw mw] onsets.

2.3. Inventory and phonotactics of English syllable onsets

Most experiments on phonotactics have exploited the restrictions on place of

articulation in English syllable onsets (Massaro & Cohen 1983, Pitt 1998, Moreton 1999),

most often the bans on *[sɹ] and *[tl dl]. The place-of-articulation restrictions are a good

choice because they are strongly productive, because ambiguous stimuli are straightforward

to synthesize, and because the place features of the critical consonants can be manipulated

without changing other linguistic variables.2


For these reasons we will concentrate on a grammar of the C[ɹ w l] onsets of

American English.3 We begin by constructing a source-filter account of the stops,

affricates, and fricatives found in the onset of a CV syllable, using typologically motivated

constraints. This is then extended to the onset of C[j w 1]V syllables to account for the

phonotactics of *[sj], *[tl], and ?[pw]. To anticipate: I will model the *[sj ] gap as a special

case of a general process spreading anteriority and the *[tl dl] gap as a special case of a

general ban on homorganic obstruent sequences. The ?[pw] gap, though systematic as a

special case of a general ban on homorganic consonant sequences, is not productive and will

be modelled as a lexical gap.

§ 2.3.1 lays out the data; § 2.3.2 gives the analysis. § 2.3.2.1 describes the feature

system used in this model, based on Hall's (1997) variant of the now-standard Sagey (1990)
articulator features. § 2.3.2.2 analyzes CV syllables, § 2.3.2.3 discusses *[sɹ], § 2.3.2.4

discusses *[tl], § 2.3.2.5 discusses ?[pw]. A summary is given in §3.3.

2.3.1. Explicanda

Their disagreements about other clusters notwithstanding, linguists are in fair


agreement on the general outlines of the American English C[ɹ w l] onsets. Table 2.1

shows the onsets which I will treat as the productive ones.

2 The other principal restriction is that onsets have to rise in sonority. In theory, manipulating the
sonority of either C in a CC onset cluster should affect perception of the sonority of the other C. However,
sonority is not a distinctive feature. Two segments which differ in sonority differ in many other
linguistically relevant ways as well, making the stimuli hard to construct and the results hard to interpret.
3 The C[j] onsets I do not discuss, because they have not been used experimentally.

Table 2.1. C[ɹ w l] onsets of American English

Onset     Examples                     Hultzen 1965   Woolley 1970   Catford 1988   Hammond 1999
pɹ bɹ     prove; brew                  √√             √√             √√             √√
pw bw     pueblo; bwana                √?             ??             ??
pl bl     plant; blame                 √√             √√             √√             √√
tɹ dɹ     tread; dread                 √√             √√             √√             √√
tʃɹ dʒɹ
tw dw     twine; dwindle               √√             √√             √√             √?
tl dl                                  ••             ••             ••             ••
kɹ gɹ     crack; grid                  √√             √√             √√             √√
kw gw     quit; Gwen, guava            √?             √√             √√             √?
kl gl     clean; gleam                 √√             √√             √√             √√
fɹ vɹ     free; Wronskian, vroom       √?             √√
fw vw                                  ••             ••             ••             ••
fl vl     flea; Vladimir               √•             √?
θɹ ðɹ     threw                        √%
θw ðw     thwart                       √•             ?•
θl ðl
sɹ zɹ                                  ••             ••             ••             ••
sw zw     sweet; Zwicker               √?             √•             √•
sl zl     slot; zloty                  √•             √?
ʃɹ ʒɹ     shred; -                     √•             √•             √•
ʃw ʒw     Schwinn, Schwarzenegger; -   ?•             √•             —              ••
ʃl ʒl     schlock; -                   ?•             √•             —              ?•

Note: In each cell, the first mark is for the voiceless onset and the second for its voiced
counterpart. (√) means the author included the onset; (?) means the author included it, but
marked it as marginal. (%) means it was marked as normal for some dialects. (•) means it
was not included.

This list is intended to include all and only C[ɹ w l] onsets which can be produced

without alteration and without special effort by speakers of American English. Clusters that

are obviously non-native have been included as long as they occur in familiar, easily

pronounceable names (Schwarzkopf, Zwicker, Vladimir) or loan words (zloty, guava,

pueblo).4 I am not sure whether the unattested onsets [ʒɹ ʒw ʒl] (italicized in Table 2.1)

are a lexical or phonological gap; given the rarity of initial [ʒ] in English, it is dangerous to

infer anything from their absence alone. I will take them to be of the same grammaticality

as their voiceless counterparts.

The transcription in Table 2.1 is a broad one. The finer phonetic details, which are

crucial to this analysis, will be discussed below.

4 Compare news broadcasters' fluent pronunciations of zloty, Norman Schwarzkopf, Vladimir Putin with
their awkward Chechnya and Srebrenica. The productivity of the syllable-initial [nj] and [sɹ] gaps for these
trained speakers is clearly audible.

2.3.2. Analysis

Two generalizations are immediately clear from Table 2.1. First, the C in the C[ɹ w

l] cluster is itself never a [ɹ w l], but is always something of lower sonority. This is a

special case of a general fact, true across languages, about syllable onsets - that sonority

rises over the course of the onset (Clements 1990).5 Second, there is no difference between

the behavior of unvoiced C and its voiced counterpart.

Since both of these issues are irrelevant to the question of place restrictions, we can

simplify our task by ignoring them. Henceforth we will only consider voiceless Cs, letting

them do double duty for their voiced counterparts, and ignore candidates with flat or falling

sonority (assumed to be ruled out by very high-ranked markedness constraints).

2.3.2.1. Representations

I adopt the representational system below. It is a slightly simplified version of the

system proposed by Hall (1997), which is in turn a modification of the Sagey (1990)

feature geometry based on active articulators.

5 English, like a number of other languages, allows [s] and perhaps [ʃ] to occur out of the expected
sonority sequence (e.g., spit, stick, skip, square; shtik). This is a vexed question which I will not discuss.
It has been suggested that the [s]C sequence is a complex segment like a reverse affricate (Hayes 1980,
Lamontagne 1993).

(2.2) Feature tree

+Root
   +Manner
      continuant
      consonantal
      sonorant
      strident
      +lateral
   +Laryngeal
      spread glottis
      constricted glottis
      voiced
   +Supralaryngeal
      +Velum
         nasal
      +Place
         +Labial
            +round
         +Coronal
            anterior
            distributed
            back
         +Dorsal
            back
            high
            low

Note: Features marked '+' are privative; others are equipollent.

The most notable difference between this and the familiar Sagey (1990) system is

that [back], normally a dependent of the Dorsal articulator, is here a dependent of the

Coronal node as well, with the stipulation that [+back] requires [+Dor]. The innovation is

Hall's (1997) solution to a problem in the original system: that palatalization could not be

straightforwardly modelled as feature spreading when the palatalized consonant was

[+Cor]. Segments which triggered palatalization, usually front vowels, were [+Dor -back],

but the [-back] could not be spread to a preceding [+Cor] segment, since [+Cor] could not

support it (see Sagey 1990: §3.4.2.2). Hall argues that the palatalization feature, whatever it

is, must be a child of both the Coronal and Dorsal articulator nodes, since it can be spread to
both [s] and [x]. The segments triggering palatalization or resulting from it are, he says, all

characterized by a fronted tongue body, which is the articulatory correlate of [-back].

Allowing both tongue nodes to sponsor [-back] captures the physical link between the

lamina and the forepart of the dorsum (Hall 1997:§2.7.2).

It is crucial to the analysis to have some representational scheme under which [j ɹ]

can spread something to the Coronal node and [w] cannot; I have chosen this one because

of Hall's detailed treatment of the various places of articulation.

I have also simplified Hall's feature tree by leaving out his Peripheral node, which

came below Place and above {Labial, Dorsal}, by replacing the Laryngeal features [stiff]

and [slack] with [voiced], and by omitting [rhotic] in favor of [+high, +low] (see below).

The Tongue Root node has been removed; I will ignore the complexities of uvular,

pharyngeal, and laryngeal consonants (McCarthy 1991). None of these changes is crucial

to the analysis.

With two exceptions, all features in this system are either privative or equipollent. A

privative feature is either present or it is not. An equipollent feature is either [+F] or [-F],

but not both. If a feature is present in a representation, then all equipollent children of that

feature have to be present as well, with either + or - specification. That is, an equipollent

feature can be absent from the representation of a segment only if the feature's parent is also

absent. A segment consisting only of the features [+Root +Laryngeal] is possible, but one

which is [+Cor] must be either [+ant] or [-ant].
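The convention can also be stated procedurally. The sketch below is my own illustration with a hypothetical encoding (segments as dictionaries; only the Coronal and Dorsal subtrees are modelled, and the Dorsal copy of [back] is renamed back_dor to keep the two apart):

```python
# Illustrative sketch only: checking the privative/equipollent convention.
# A segment maps feature names to values: True for a present privative
# feature, "+" or "-" for an equipollent one.

EQUIPOLLENT_CHILDREN = {
    "Cor": ["ant", "dist", "back"],       # equipollent children of [+Cor]
    "Dor": ["back_dor", "high", "low"],   # equipollent children of [+Dor]
}

def well_formed(segment):
    for parent, children in EQUIPOLLENT_CHILDREN.items():
        if parent in segment:
            # every equipollent child must carry a + or - value
            if not all(segment.get(c) in ("+", "-") for c in children):
                return False
        else:
            # an equipollent feature may not appear without its parent
            if any(c in segment for c in children):
                return False
    return True

# [+Cor] with no value for [ant] is ill-formed:
print(well_formed({"Cor": True}))
# [+Cor +ant -dist -back] is fine:
print(well_formed({"Cor": True, "ant": "+", "dist": "-", "back": "-"}))
```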

The two exceptions are [cont] and [strident]. Affricates are analyzed as [-cont

+cont] (Sagey 1990:§3.3.4.2). The feature [strident] is an equipollent child of the Manner

node, but it is only present when the segment is a fricative or affricate, and only for [+Lab]

or [+Cor] segments.

Finally, I have left the privative [+lateral] under the Manner node because it behaves

like a Manner feature in not spreading. The other obvious option is to put it under [+Cor],

since nearly all known laterals are coronal and they occur at all four coronal places of

articulation (McCarthy 1988, Hall 1997: §A.2.3.2). There are lateral fricatives, lateral flaps,

and (most commonly) lateral affricates. None is found in English. Evidence for how they

are repaired is lacking, so I will not discuss them here.6

2.3.2.1.1. Consonant features

The source-filter model is responsible for explaining the badness of a great many
candidates for the C in a C[ɹ w l] onset. Here, the critical candidates are the oral stops,

affricates, and fricatives at every place of articulation. Their representations are shown in

Tables 2.3 and 2.4.

Table 2.3. Obstruent manner features

                 Manner of articulation
Manner feature   Oral stop   Affricate   Fricative
cont             -           + and -     +
cons             +           +           +
son              -           -           -
strident         (never)     (some)      (some)

6 For an alternative view of the feature specifications for liquids that does not use [lateral], see Walsh
Dickey (1997).

Table 2.4. Obstruent place features

a.

                 Place of articulation
Place features   Labial   Dental/Interdental   Alveolar   Retroflex
+Lab             +
  round
+Cor                      +                    +          +
  ant                     +                    +          -
  dist                    (+)                  -          -
+Cor/+Dor
  back
+Dor
  high
  low

b.

                 Place of articulation
Place features   Palatoalveolar, alveolopalatal   Palatal   Velar
+Lab
  round
+Cor             +
  ant            -
  dist           +
+Cor/+Dor
  back           -                                -         +
+Dor             +                                +         +
  high           +                                +         +
  low            -                                -         -

The IPA distinguishes palatoalveolar from alveolopalatal, at least for fricatives. I

accept Hall's arguments (1997: §2.5.2) that the two should not have different features, since

no language has two contrasting segments distinguished only by that place difference.

Table 2.5 shows the IPA symbols for every combination of the manners from Table

2.3 with the places from Table 2.4, together with stridency values for the fricatives and

affricates.

Table 2.5. Representation of consonants

Place                            Stop   Affricate   Fricative   [strident]7
Labial                           p      pɸ          ɸ           -
Labiodental                      p      pf          f           +
Dental/Interdental               t̪      t̪θ          θ           -
Alveolar                         t      ts          s           +
Retroflex                        ʈ      ʈʂ          ʂ           +
Palatoalveolar, alveolopalatal   ṯ      tʃ, tɕ      ʃ, ɕ        +
Palatal                          c      cç          ç           -
Velar                            k      kx          x           -

These are the low-sonority voiceless segments which this system is capable of

representing. In our source-filter model, the lexicon can emit any of them. Since most of

these segments do not and cannot occur in English, we will have to build a grammar which

deletes the un-English ones or converts them into English segments.

2.3.2.1.2. Features of [ɹ w l]

Our task in this section is simply to describe the surface features of American
English [ɹ w l]. We will not explain why these segments, rather than other sonorants,

should be in the American English inventory. I adopt the analysis of Kahn (1980), who
makes [ɹ w] glides (semivowels) and [l] a sonorant consonant.

Guenter (2000) summarizes the arguments that American English [ɹ w] are glides

as follows: (1) They are phonetically central approximants. (2) They restrict the set of

7 I do not know anything about the stridency of lateral fricatives. In the absence of better information, I
will assume that they are as strident as the corresponding non-lateral fricatives.

vowels that can precede them. (3) Each has a stressed syllabic version with which it

alternates. (4) They cannot occur after tautosyllabic diphthongs (Cohn & Lavoie 2000).

(5) Flaps occur after them. (6) Final [t d] cannot be deleted after them. These statements

are in general not true of [l].8 To these we can add the observation of Espy-Wilson (1992)

that [l] is frequently produced with a spectral discontinuity, while [ɹ w] are not.

Kingston (p.c.) points out that stops are often intruded between [l] and a following

lingual fricative: pulse [pʌlts], filth [fɪltθ]. The same phenomenon occurs with the other

class of high-sonority consonants, the nasals: warmth [wɔɹmpθ], chance [tʃænts]. It does

not happen after [ɹ j w].

We will model [ɹ w] as glides - that is, as vowels syllabified into a syllable onset,

having the same features as the syllabic [ɚ u] (Hall 1997: 135, Rosenthall 1997). The

proposed feature system is shown in Tables 2.6 and 2.7.

Table 2.6. Manner features for [ɹ w]

Manner feature   [ɹ]   [w]
continuant       +     +
consonantal      -     -
sonorant         +     +
strident

8 However, Guenter did find that 15 of his 16 informants had an [l] that satisfied (4), and many had one that
satisfied (3); he interprets this as evidence of language change in the direction of a glided [l].

Table 2.7. Place features for [ɹ w]

Place feature   [ɹ]      [w]
+Lab            +        +
  round         +        +
+Cor            +
  ant           -
  dist          - or +
+Cor/+Dor
  back          +        +
+Dor            +        +
  high          +        +
  low           +        -

The manner features are standard, as are the place features for [w].9 Those for [ɹ]

require some justification.

Delattre and Freeman (1968) made X-ray films, with synchronized spectrograms, of
46 speakers from various parts of the United States. They found a wide variety of [ɹ]

articulations, which sounded very similar. All speakers, in all syllable positions, make a

constriction in the pharynx about halfway between the glottis and the uvula. They also

make a constriction somewhere in the oral cavity between the corner of the alveolar ridge

and the beginning of the soft palate, using the dorsum, blade, or tip of the tongue - in

onsets, always the blade or tip. The lips are rounded (most strongly in the onset of a

stressed syllable). Similar results were obtained in MRI and palatographic studies of 4

speakers by Alwan et al. (1997).

9 Hall argues that both [j ɹ] are actually [+Cor] (1997: §§1.2.6, 4.4).

In vowels, a pharyngeal constriction is the articulatory correlate of [+low]; a close

oral constriction, that of [+high]. From an articulatory-phonetic standpoint, all varieties of

American English [ɹ] are therefore [+low +high].10 The formal advantages are clear: we

are rid of the [+rhotic] feature (which needed the same kind of co-occurrence stipulations as

[+lateral]), and we no longer have to stipulate that [+high +low] is impossible.11

In the studies of Delattre and Freeman (1968) and Alwan et al. (1997), the tongue

tip or blade participated in all versions of the onset glide [ɹ], indicating that the glide was

coronal.12 The position of the constriction ranged from prepalatal to postpalatal. We can

model it as [+Cor -ant] and either [+dist] or [-dist] depending on the speaker. Since the

choice is not crucial to the analysis, I will favor my own speech and pick [+dist].

Finally, lip-rounding in all positions requires that [ɹ] be modelled as [+Lab

+round].
For [l], we use the features of Tables 2.8 and 2.9:

Table 2.8. Manner features for [l]

Manner feature   [l]
continuant       -
consonantal      +
sonorant         +
strident
+lateral         +

10 Delattre & Freeman's Figure 1, a gallery of X-ray tracings, shows this very clearly. Their "Type 4"
syllabic [ɹ] is particularly striking - the tongue has two humps, one in the middle pharynx and one under
the hard palate, with a deep indentation between them.
11 The oral constriction in [ɹ] has also been analyzed as the implementation of a [coronal] feature (Walsh
Dickey 1997).
12 The nuclear [ɹ] had a coronal component in only one of its five manifestations (there was much more
variety between speakers in non-initial position), with the blade approaching the rear of the hard palate
(Delattre & Freeman's Figure 1, Type 5). It seems that coronal articulations are obligatory in syllable
onsets, but (for most speakers) prohibited in syllable nuclei.

Table 2.9. Place features for [l]

Place feature   [l]
+Lab
  round
+Cor            +
  ant           +
  dist          -
+Cor/+Dor
  back          +
+Dor            +
  high          +
  low           -

The manner features are standard except for [-cont]. It is a matter of debate whether
[l] is continuant phonetically or phonologically.

The double articulation of [l] has been shown by Sproat & Fujimura (1993). Their

X-ray microbeam data, from four speakers of American English, found both a dorsal and an

apical gesture, whose relative timing varied depending on prosodic position. The apical

gesture we model as [+Cor +ant -dist]. The MRI and palatographic study of Narayanan et

al. (1997) confirmed the double gesture, and showed that the apical gesture contacted the

alveolar ridge along the midline in both onset and coda [l].

2.3.2.2. CV syllables

The inventory of consonants in C onsets, shown in Table 2.10, can be summed up as

follows:

1. Two classes do not occur at all: the retroflexes and the palatals. Both of these

are rare (marked) places of articulation cross-linguistically. 2. There is a stop series

labial-alveolar-velar beside a single palatoalveolar affricate. This is a single [-cont -son]

series, labial-alveolar-palatoalveolar-velar, with affricate manner obligatory at the

palatoalveolar place and forbidden elsewhere. 3. There is a fricative series

labial-dental-alveolar-palatoalveolar. Velar fricatives are forbidden.

Table 2.10 shows the repair which I assume is made to each of the impermissible

segments: A box encloses each group of underlying segments that map onto the same

surface segment.

Table 2.10. English surface obstruent inventory in CV syllables

Place of articulation            Stop   Affricate   Fricative   [strident]
Labial                           p      pɸ          ɸ           -
Labiodental                      p      pf          f           +
Dental/Interdental               t̪      t̪θ          θ           -
Alveolar                         t      ts          s           +
Retroflex                        ʈ      ʈʂ          ʂ           +
Palatoalveolar, alveolopalatal   ṯ      tʃ, tɕ      ʃ, ɕ        +
Palatal                          c      cç          ç           -
Velar                            k      kx          x           -

Note: Of these, only [p f θ t s tʃ ʃ k] occur in surface CV environments; each of the
remaining underlying segments is mapped to one of them, as described in the text.

The process involves 24 underlying-to-surface mappings, and the grammar I

propose will be quite complex, with 12 constraints ranked in 9 strata. I will first describe

and justify the constraints, then present ranking arguments for them.

2.3.2.3.1. Undominated faithfulness constraints

None of the repairs shown in Table 2.10 involve deleting the offending segment

entirely. This is very naturally modelled as an all-dominating faithfulness constraint against

deletion:

(2.11) MAXSEG
Every segment of the input has a correspondent in the output.

No repair involves changing the major articulator. Labials are changed to labials,

coronals to coronals, and dorsals to dorsals. This can be modelled with an IDENT

constraint:

(2.12) IDENT[PLACE]
If an underlying and a surface segment are in correspondence, they share the
same major articulator.

2.3.2.3.2. Coronal stop places: dental, retroflex, alveolar, and palato-alveolar

Of the four coronal stop places, only the alveolar is used in English CVs. Instead of

a stop, the palato-alveolar place has an affricate.

The lack of retroflex and dental stops can be seen as the result of high-ranked

markedness constraints against them, constraints whose existence can be justified

typologically.

Retroflexes are banned in many languages besides American English. In

Maddieson's genetically and geographically balanced sample of 317 languages, over 99%

had a dental or alveolar stop, while only 11.4% had a retroflex stop (1984: §2.4). In the

same sample, 266 languages (84%) had a non-retroflex voiceless fricative, while only 17

(5.4%) had a retroflex voiceless fricative. For voiced fricatives the numbers were 96 (30%)

and 3 (1.0%) respectively (1984: Table 3.2). The markedness of retroflexes can be

modelled by a markedness constraint *RET, which awards one mark for each segment that

is [-ant, -dist].

(2.13) *RET
*[-ant, -dist]

It is unusual for a language to contrast dental and alveolar place; the [+anterior]

stops are either laminal or apical. The dental stops [t̪ d̪] I assume are ruled out by a blanket

constraint against the dental place of articulation, operative in other languages which favor

alveolar over dental place.

(2.14) *DENTAL
*[+Cor +ant +dist]

On the basis of loan-word phonology, I will assume that both retroflexes and

dentals are repaired to alveolars. Alveolars remain alveolar, and palato-alveolars remain

palato-alveolar (though, for reasons discussed in the next section, palato-alveolar stops gain

a [+cont] specification to become affricates).

Table 2.15. Repair of retroflexes to alveolars (Yule & Burnell 1886, American Heritage
Dictionary 2000)

Source language   Original form   English
Tamil             [kaʈʈumaram]    catamaran
Hindi             [ɖakait]        dacoit
Hindi             [paʈʈiː]        puttee
Hindi             [ʈopiː]         topee
Hindi             [ʈamʈam]        tom-tom
Hindi             [luːʈ]          loot

Table 2.16. Repair of dentals to alveolars (American Heritage Dictionary 2000)

Source language   Original form   English
French            [d̪ebɑkl]        debacle
French            [t̪ulɔ̃]          Toulon
Russian           [t̪okamak]       tokamak

The repairs involve changing the [anterior] and [distributed] specifications. The

problem is how to insure that palato-alveolar inputs, and only palato-alveolar inputs, surface

as palato-alveolar outputs. The solution is to hand: Under the Hall (1997) feature system,

palato-alveolars are both coronal and dorsal. Changing a non-palato-alveolar to a palato-

alveolar, or vice versa, therefore violates the undominated IDENT[PLACE]. Since *RET and

*DENTAL force retroflexes and dentals to change, while IDENT[PLACE] prevents them from

becoming non-coronals or palato-alveolars, their only recourse is to become alveolars by

changing their values of [ant] or [dist].

(2.17) IDENT[PLACE] » *RET, *DENTAL » IDENT[ANT], IDENT[DIST]

           IDENT[PLACE]   *RET   *DENTAL   IDENT[ANT]   IDENT[DIST]
/ɖ/    [ɖ]                 *!
    -> [d]                                     *
       [d̪]                         *!          *             *
       [ḏ]13       *!                                        *
/d̪/    [d̪]                         *!
    -> [d]                                                   *
       [ɖ]                 *!                  *
       [ḏ]         *!                          *
/d/ -> [d]
       [ɖ]                 *!                  *
       [d̪]                         *!                        *
       [ḏ]         *!                          *             *
/ḏ/    [d]         *!                          *             *
       [ɖ]         *!                                        *
       [d̪]         *!                          *
    -> [ḏ]

13 [ḏ] represents a [-ant, +dist] (palatoalveolar) stop, the stop corresponding to the affricate [tʃ].
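The ranking argument in (2.17) can be checked mechanically. The sketch below is my own illustration (the feature encoding and names are hypothetical); constraints are grouped into the three strata of (2.17), with ties within a stratum summed, and the winner is the candidate with the lexicographically least profile.

```python
# Illustrative sketch only: the coronal-stop repairs of tableau (2.17).
SEGS = {
    "retroflex":      {"ant": "-", "dist": "-", "dorsal": False},
    "dental":         {"ant": "+", "dist": "+", "dorsal": False},
    "alveolar":       {"ant": "+", "dist": "-", "dorsal": False},
    "palatoalveolar": {"ant": "-", "dist": "+", "dorsal": True},
}

def violations(inp, out):
    i, o = SEGS[inp], SEGS[out]
    return (
        # stratum 1: IDENT[PLACE] (major articulator; dorsality as proxy)
        int(i["dorsal"] != o["dorsal"]),
        # stratum 2: *RET and *DENTAL
        int(o["ant"] == "-" and o["dist"] == "-")
        + int(not o["dorsal"] and o["ant"] == "+" and o["dist"] == "+"),
        # stratum 3: IDENT[ANT] and IDENT[DIST]
        int(i["ant"] != o["ant"]) + int(i["dist"] != o["dist"]),
    )

def winner(inp):
    return min(SEGS, key=lambda out: violations(inp, out))

for inp in SEGS:
    print(inp, "->", winner(inp))
```

Retroflexes and dentals come out alveolar, while alveolars and palato-alveolars survive, exactly as in the tableau.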

2.3.2.3.3. The persistence of [θ]

The fricatives [θ ð] are the only dental segments in English CVs, and the only

[-strident] fricatives. As such, they seem to be resistant to two markedness constraints.

One, discussed in the previous section, is *DENTAL, which militates against all

dental articulations.

The other is a constraint against non-strident fricatives. English is rich in

[+strident] fricatives ([f s ʃ]) and poor in [-strident] ones (only the comparatively rare [θ]).

The situation is the same in most languages. The three most common fricative places
(voiced or voiceless) are, in descending order, [s], [ʃ], and [f]. The first nonstrident

fricative, in fourth place, is [x], which is half as common as [f] and more than twice as

common as any other nonstrident fricative (Maddieson 1984: Table 3.2). We can capture

the markedness of nonstridents directly as a constraint against [-strident]:

(2.18) *[-STRIDENT]

I will analyze the persistence of [θ] in English as preservation of the salient acoustic

contrast between the [-strident] [θ] and the other, [+strident], coronal fricatives14:

(2.19) MAX[-STRIDENT]/COR
An underlying [-strident] coronal must correspond to a surface [-strident]
coronal.

14 Since stops are not specified for [strident], this constraint also keeps [θ ð] from turning into stops.

(2.20) MAX[-STRIDENT]/COR » *[-STRIDENT]

/θɪk/        MAX[-STRIDENT]/COR   *DENTAL   *[-STRIDENT]   IDENT[DIST]
  -> [θɪk]                           *           *
     [sɪk]          *!                                          *

Since all the other [-strident]s (the labials and dorsals) are still able to change to less

marked segments, they will do so, while the dentals cannot:

(2.21)

/ɸæt/       MAX[-STRIDENT]/COR   *[-STRIDENT]   (lower-ranked faithfulness)
    [ɸæt]                             *!
 -> [fæt]                                                   *

This leaves the dental fricatives as the only possible dentals and the only possible

non-strident fricatives.
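The asymmetry between (2.20) and (2.21) - [θ] survives while [ɸ] strengthens to [f] - comes down to whether MAX[-STRIDENT]/COR has jurisdiction over the input. A minimal sketch (my own illustration, with a hypothetical (place, stridency) encoding of fricatives):

```python
# Illustrative sketch only: why [-strident] survives on coronals alone.
def eval_fric(inp, candidates):
    def viols(out):
        # MAX[-STRIDENT]/COR: a [-strident] coronal input must stay one
        max_nonstrid_cor = int(inp == ("cor", "-") and out != ("cor", "-"))
        star_nonstrid = int(out[1] == "-")       # *[-STRIDENT]
        ident_strid = int(inp[1] != out[1])      # lower-ranked faithfulness
        return (max_nonstrid_cor, star_nonstrid, ident_strid)
    return min(candidates, key=viols)

# /θ/, a [-strident] coronal, is protected:
print(eval_fric(("cor", "-"), [("cor", "-"), ("cor", "+")]))
# /ɸ/, a [-strident] labial, is not, and becomes strident [f]:
print(eval_fric(("lab", "-"), [("lab", "-"), ("lab", "+")]))
```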

2.3.2.3.4. Dorsal places of articulation: palatals and velars

English CVs have only velar articulations.15 Palatals are repaired to velars.

Palatals are rare cross-linguistically. Maddieson found palatal or palatoalveolar

stops in only 18.6% of his sample, though over 99% had a velar stop (1984: §2.4). A

voiceless palatal fricative occurred in only 16 of the languages, or 5.0%, and a voiced palatal

fricative was found in only 7, or 2.2%. By way of comparison, voiceless and voiced

palatoalveolar fricatives turned up in 146 (46%) and 51 (16%) (Maddieson 1984: Table

15 The allophonically palatalized velars found before front vowels, as in key, are not as far fronted as
phonemic palatals in languages that have them (Keating & Lahiri 1993). We regard them here as velars.

3.2). We capture this with another context-free markedness constraint, *PAL, which gives a

mark to each [+Dor -back] consonant.

(2.22) *PAL
*[+cons, +Dor, -back]

This constraint will cause underlying palatals to become velars (violating low-ranked

IDENT[BACK], but satisfying high-ranked IDENT[PLACE]):

(2.23) IDENT[PLACE] » *PAL » IDENT[BACK]

/ca/        IDENT[PLACE]   *PAL   IDENT[BACK]
    [ca]                    *!
 -> [ka]                             *
    [ta]        *!

Since palato-alveolars are also dorsal and [-back], they meet the structural

description of *PAL. However, IDENT[PLACE] protects them from losing their [Cor]

specification:

(2.24)

/tʃa/       IDENT[PLACE]   *PAL   IDENT[BACK]
    [ca]        *!          *
    [ka]        *!                   *
 -> [tʃa]                   *

2.3.2.3.5. Labial places of articulation: bilabials and labiodentals

Since no language is known to contrast bilabial and labiodental stops, they are not

distinguished in the Hall (1997) feature system.

Bilabial and labiodental fricatives are featurally distinct, the bilabials being

[-strident] while the labiodentals are [+strident]. The result of this, as we saw in (2.21), is

that [ɸ] is converted to [f] in order to satisfy *[-STRIDENT].

2.3.2.3.6. The stop-affricate-fricative series

It is very common for a language to have affricates at all and only those places where

it has no stops. The most common pattern - found by Maddieson in 86 out of 317

languages, or 27% - is the English one of stops at the labial/alveolar/velar places and

affricates at the palatoalveolar place (Maddieson 1984: § 2.5). The effect of this is to

disperse the [-cont] segments as widely as possible in articulatory and acoustic space, with

one segment being made by the lips, one by the tongue tip, one by the tongue blade, and one

by the dorsum.

Affricates are, typologically, more marked than fricatives, which are more marked

than stops. Every one of the languages in Maddieson's sample had stops, and most had two

stop series. All of the 451 languages in the UPSID database have stops; 413 have

fricatives; only 300 have affricates.

However, it seems that the palatoalveolar place of articulation is more hospitable to

affricates than to stops. Maddieson's Tables 2.5 and 2.8 make this clear:

Table 2.25. Number of languages with stops at given places in the sample of Maddieson
(1984: Table 2.5)

Place              Bilabial   Dental or alveolar   Palatal or palato-alveolar   Retroflex   Velar   Uvular   Labial-velar
No. of languages   314        316                  59                           36          315     47       20
Percent            99.1%      99.7%                18.6%                        11.4%       99.4%   14.8%    6.3%

Table 2.26. Frequency of the most common affricates in the sample of Maddieson
(1984: Table 2.8)

Voicing               Dental/alveolar16   Palato-alveolar
Plain voiceless       /*ts/   95          /tS/    141
Aspirated voiceless   /*tsh/  33          /tSh/   42
Plain voiced          /*dz/   30          /dZ/    80

The cross-linguistic pattern is that found in English, where the palato-alveolar

position in a stop series is filled with an affricate. Affricates are more marked than stops,

except at the palato-alveolar place, where the opposite is true. The affricate, it appears, is the

stop of the palato-alveolar place. This tendency can be formalized as a markedness

constraint:

(2.27) AFFR/PALAL
An obstruent should be an affricate if and only if it is palato-alveolar.

16 Maddieson's * indicates "dental or alveolar (combined)". The non-IPA symbols are Maddieson's.

In English, illegal labial and alveolar affricates are repaired by converting them into
fricatives: pfennig [f], tsunami [s], Zeitgeist [z], czar [z] (Jones 1997). The illegal
palatoalveolar stops are repaired by affricating them: Magyar [dʒ]. The two processes are
shown in Table 2.28:

Table 2.28. Labial, alveolar, and palatoalveolar series of American English

Place of articulation   [-cont]   [-cont +cont]   [+cont]
Labial                  p         pf → f          f
Alveolar                t         ts → s          s
Palatoalveolar          t̠ → tʃ    tʃ              ʃ

Note: The segments on the left of an arrow are absent from the surface inventory; the
repair to each is indicated by the arrow.

In the labial and alveolar cases, the marked affricate is deaffricated to a fricative by

deleting [-cont], rather than to a stop by deleting [+cont]. Some faithfulness constraint

must be blocking the deletion of [+cont] but not of [-cont]. We will take it to be

MAX[+CONT], which gives a mark to each corresponding segment pair where the

underlying segment has [+cont] but the surface segment does not:

(2.29) MAX[+CONT]
An underlying [+cont] segment must correspond to a surface [+cont]
segment.

Since the non-palato-alveolar affricates lose their [-cont] specification in order to

satisfy AFFR/PALAL, it must be ranked above MAX[-CONT]:

(2.30) MAX[-CONT]
An underlying [-cont] segment must correspond to a surface [-cont]
segment.

(2.31) MAX[+CONT], AFFR/PALAL » MAX[-CONT]

/pfenik/      MAX[+CONT]   AFFR/PALAL   MAX[-CONT]
[pfenik]                   *!
[penik]       *!
→ [fenik]                               *
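The selection logic of such a tableau can be sketched in a few lines of code: each candidate's violations are gathered into a vector ordered by the constraint ranking, and lexicographic comparison picks the winner. The Python sketch below uses the violation profiles of tableau (2.31); the data structures and names are my own illustration, not part of the analysis.

```python
# Minimal sketch of Optimality-Theoretic evaluation: a candidate is scored
# by its violation counts ordered by constraint ranking; lexicographic
# comparison of these vectors picks the winner.

RANKING = ["Max[+cont]", "Affr/PalAl", "Max[-cont]"]

# Violation profiles for underlying /pfenik/, from tableau (2.31).
candidates = {
    "pfenik": {"Affr/PalAl": 1},   # faithful: non-palatoalveolar affricate
    "penik":  {"Max[+cont]": 1},   # deaffricated to a stop
    "fenik":  {"Max[-cont]": 1},   # deaffricated to a fricative
}

def evaluate(cands, ranking):
    """Return the candidate whose violation vector is lexicographically least."""
    def profile(cand):
        return tuple(cands[cand].get(c, 0) for c in ranking)
    return min(cands, key=profile)

print(evaluate(candidates, RANKING))  # fenik: the fricative repair wins
```

Because tuple comparison in Python is lexicographic, a single violation of a high-ranked constraint outweighs any number of violations of lower-ranked ones, which is exactly the strict-domination semantics of the tableau.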

The palato-alveolar stops change manner rather than place of articulation so as not to
violate the undominated IDENT[PLACE]. (The palato-alveolar place is the only one which is
both coronal and dorsal, so that any change of place changes a major articulator.) They
become affricates, rather than fricatives, to avoid a gratuitous MAX[-CONT] violation.

(2.32)

/mad̠ar/       IDENT[PLACE]   AFFR/PALAL   MAX[-CONT]
[mad̠ar]                      *!
[madar]       *!
→ [madʒar]
[maʒar]                                   *!

MAX[+CONT] is violated when [ç] or [x] is repaired to [k], so it must be dominated
by *PAL, *[-STRID].

(2.33) *PAL, *[-STRID] » MAX[+CONT]

/aiçman/      *PAL   *[-STRID]   MAX[+CONT]
[aiçman]      *!     *
[aixman]             *!
[aicman]      *!                 *
→ [aikman]                       *

2.3.2.3.7. Constraint lattice

The grammar we have established has the rankings shown in (2.34). The topmost

stratum consists of unviolated faithfulness constraints. Lines represent rankings established

by direct comparison in ranking arguments.

(2.34)

MAXSEG, IDENT[PLACE], MAX[-STRIDENT]/COR

*PAL, *[-STRIDENT], *DENTAL

IDENT[BACK], MAX[+CONT], IDENT[ANT], IDENT[DIST]

AFFR/PALAL

MAX[-CONT]

2.3.2.3. *[sɹ]

With the basic inventory accounted for, we are now ready to turn to the first of the

two onset conditions used in the experiments. Table 2.10, repeated here, can be compared

with Table 2.35, which shows the observed inventory and repairs in C[ɹ]V syllables.

Table 2.10. English surface obstruent inventory in CV syllables

Place of articulation            Stop   Affricate   Fricative   [strident]
Labial                           p      pɸ          ɸ           -
Labiodental                      p̪      pf          f           +
Dental/Interdental               t̪      t̪θ          θ           -
Alveolar                         t      ts          s           +
Retroflex                        ʈ      ʈʂ          ʂ           +
Palatoalveolar, alveolopalatal   t̠      tʃ, tɕ      ʃ, ɕ        +
Palatal                          c      cç          ç           -
Velar                            k      kx          x           -

Note: The white segments are found in surface CV environments. Underlying gray
segments are mapped to the white segment in the same enclosing box.

Table 2.35. English surface obstruent inventory in C[ɹ]V syllables

Place of articulation            Stop   Affricate   Fricative   [strident]
Labial                           p      pɸ          ɸ           -
Labiodental                      p̪      pf          f           +
Dental/Interdental               t̪      t̪θ          θ           -
Alveolar                         t      ts          s           +
Retroflex                        ʈ      ʈʂ          ʂ           +
Palatoalveolar, alveolopalatal   t̠      tʃ, tɕ      ʃ, ɕ        +
Palatal                          c      cç          ç           -
Velar                            k      kx          x           -

Note: The white segments are found in surface C[ɹ]V environments. Underlying gray
segments are mapped to the white segment in the same enclosing box.

The only differences between permissible Cs in CV and C[ɹ]V syllables are in the
coronals. Four formerly separate groups have been merged into two, so that all coronals
(except [θ], immune as usual) have the same minor place features: [-ant +dist -back].
Since these are exactly the coronal features of [ɹ], the merger is naturally understood as
place assimilation via spreading of the Coronal node.17

17 The phonetic effect of [ɹ] on preceding /t/ is variously described by different authors. I discuss here the
dialect of Hammond (1999:101) and myself, in which the pre-[ɹ] /t/ has a distributed palato-alveolar
articulation, [t̠]. Others, such as Olive et al. (1993), say that the articulation is an apical, retroflexed [ʈ].
Given the wide variation in how speakers articulate [ɹ], the difference may be due to the spread of different
features: the first [ɹ] being [-ant, +dist] and the second being [-ant, -dist]. If so, one would expect that [ʈɹ]
speakers would have a retroflex, rather than a palato-alveolar, articulation for /ʃɹ/, so that it would be
pronounced [ʂɹ]. The acoustic contrast between [t̠] and [ʈ], or between [ʃ] and [ʂ], will in any case be
difficult to hear before [ɹ] owing to the muffling effect of lip rounding and the lowering of formant
frequencies.

(2.35) SPREAD[COR]
Neighboring [+Cor] segments should have the same value of [ant], [dist],
and [back].

SPREAD[COR] must dominate the faithfulness constraints against spreading the

minor coronal features: IDENT[ANT] and IDENT[DIST]. No IDENT constraint on [back] is
violated since coronals either lack a [back] node entirely or are [-back]. Since [θ] is
immune, SPREAD[COR] must be dominated by MAX[-STRIDENT]/COR.

(2.36) MAX[-STRIDENT]/COR » SPREAD[COR] » IDENT[ANT], IDENT[DIST]

a.

/sɹɛbɹenitsa/      MAX[-STRIDENT]/COR   SPREAD[COR]   IDENT[ANT]   IDENT[DIST]
[sɹɛbɹenitsa]                           *!
→ [ʃɹɛbɹenitsa]                                       *            *

b.

/θɹɛd/             MAX[-STRIDENT]/COR   SPREAD[COR]   IDENT[ANT]   IDENT[DIST]
→ [θɹɛd]                                *
[ʃɹɛd]             *!                                 *

The illegality of [sɹ] arises from its failure to obey SPREAD[COR]: The [s] should
be retracted to [ʃ], but is not.

A conceivable repair to [sɹ] which is not in fact used is epenthesis; Sri Lanka does
not become [səɹi] Lanka the way that tmesis becomes [təmisis]. This shows that the anti-

epenthesis constraint DEPSEG dominates IDENT[ANT] and IDENT[DIST]:

(2.37)

/sɹi/       SPREAD[COR]   DEPSEG   IDENT[ANT]   IDENT[DIST]
[sɹi]       *!
→ [ʃɹi]                            *            *
[səɹi]                    *!

The grammar is shown in (2.38) (lines indicate only those rankings proven above):

(2.38)

MAXSEG, IDENT[PLACE], *RET, MAX[-STRIDENT]/COR

*PAL, *[-STRIDENT], *DENTAL, SPREAD[COR], DEPSEG

IDENT[BACK], MAX[+CONT], IDENT[ANT], IDENT[DIST]

AFFR/PALAL

MAX[-CONT]

2.3.2.4. *[tl]

The second commonly-used environment is the C[1]V syllable. Here, all of the

coronals are excluded from appearing in C position except the strident fricatives.

Table 2.39. English surface obstruent inventory in C[l]V syllables

Place of articulation            Stop   Affricate   Fricative   [strident]
Labial                           p      pɸ          ɸ           -
Labiodental                      p̪      pf          f           +
Dental/Interdental               t̪      t̪θ          θ           -
Alveolar                         t      ts          s           +
Retroflex                        ʈ      ʈʂ          ʂ           +
Palatoalveolar, alveolopalatal   t̠      tʃ, tɕ      ʃ, ɕ        +
Palatal                          c      cç          ç           -
Velar                            k      kx          x           -

Note: The white segments are found in surface C[l]V environments. Underlying gray
segments are mapped to the white segment in the same enclosing box. The repair for [tl] is
shown as [k], but this is not known with certainty.

The illegality of [tl dl] in an onset can be linked to the fact that both are coronal and

[-cont], through the Relativized Obligatory Contour Principle (Selkirk 1991, Padgett 1991):

(2.40) Relativized OCP (Selkirk 1991)

        G   H
        |   |      where G and H share property F, and are F-wise
    *   F   F      adjacent.

The Relativized OCP is a feature-geometric constraint designed to account for root

co-occurrence restrictions in which the effects of place of articulation similarity are

modulated by stricture features. In Modern Standard Arabic, for example, a triconsonantal

verb root may contain two successive coronals, but only if they disagree in continuity. Thus
/sVtVq/ is permissible, but /sVθVq/ or /tVdVq/ is not (Yip 1989). However, there are no

general restrictions on co-occurrence of two coronals, or on co-occurrence of two [+cont]

or [-cont] consonants; what is discouraged is similarity on both dimensions at once.

Segments which are too different in stricture do not interact in place. (In feature-geometric

terms, stricture similarity is expressed as adjacency on an autosegmental tier.)

The effect of the Relativized OCP can be seen diachronically in English, in the

progressive loss of the coronal [j] in onsets after coronals and before [u]. It was first lost
from [ɹju], as in rude, rule, then from [lju] as in lute, dilute, and is now being dropped from
[nju] (news), [sju] (suit), and (in the most advanced dialects) [dju] and [tju] (duke, tune)
(Trudgill 1999:56-59). The more similar the preceding consonant is to [j] in stricture, the
earlier it was lost.

We can see the ban on [tl dl] as a similar phenomenon. English allows [pl bl kl gl],
where the place of articulation differs between stop and liquid even though both are
noncontinuant. It allows [tɹ dɹ] and [sl], where both segments are coronal but only one is a
non-continuant. What is forbidden is two successive consonants with the same articulator
and the same value of [cont]: [tl dl].18 This is modelled as one of the family of Relativized
OCP constraints:

(2.41) OCP(CONT, PL)


Adjacent consonants using the same articulator are forbidden if they share
the same value of [cont].

18 English does have [sl], [sn], and [st] onsets, which appear to violate the ban on [coronal][coronal]
sequences. However, initial [s] is exceptional in another respect: Unlike all other consonants, it can
precede a less sonorous segment. In fact, [s] can be added to any legal onset except a fricative or affricate,
and all three-consonant onsets are so formed. The [s] neutralizes the [voice] contrast in a following stop,
and palatalizes to [ʃ] before [ɹ], but otherwise does not interact with the rest of the syllable. These facts are
ordinarily analyzed by positing a reserved structural slot for [s] at the left margin of the syllable, outside of
the onset (e.g., Kenstowicz 1994:258; Borowsky 1986:175-179). This account is corroborated by the
coronal fricative [θ], which cannot occupy the [s] slot and thus is subject to the [coronal][coronal] ban:
[θl], [θn], and [θt] are impossible onsets.
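The definition in (2.41) can be checked mechanically. The sketch below uses deliberately simplified feature assignments (stops and [l] as [-cont], [r] as a coronal continuant, following the text's assumptions); it is an illustration of the constraint, not a full feature system.

```python
# Sketch of the OCP(CONT, PL) check from (2.41): a CC onset is flagged
# when both consonants use the same articulator AND share a value of
# [cont]. Feature values below are simplified assumptions for illustration.

ARTICULATOR = {"p": "labial", "b": "labial", "t": "coronal", "d": "coronal",
               "s": "coronal", "l": "coronal", "r": "coronal",
               "k": "dorsal", "g": "dorsal", "w": "labial"}

# Stops and [l] are taken as [-cont]; [s], [r], [w] as [+cont].
CONTINUANT = {"p": False, "b": False, "t": False, "d": False,
              "s": True, "l": False, "r": True,
              "k": False, "g": False, "w": True}

def violates_ocp_cont_pl(c1, c2):
    """True if the onset c1+c2 violates OCP(CONT, PL)."""
    return (ARTICULATOR[c1] == ARTICULATOR[c2]
            and CONTINUANT[c1] == CONTINUANT[c2])

for onset in ("tl", "dl", "pl", "kl", "tr", "sl"):
    status = "illegal" if violates_ocp_cont_pl(*onset) else "legal"
    print(onset, status)   # tl, dl come out illegal; pl, kl, tr, sl legal
```

Under these assumptions the check rules out exactly [tl] and [dl] while letting [pl kl] (different articulator) and [tr sl] (different [cont]) through, mirroring the pattern described in the text.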

There is little evidence as to the nature of the repair to the illegal sequence. It is
possible that /tl/ is repaired by epenthesis, or by making the [l] syllabic, and thereby
separating its coronal articulation from the [t] (Sproat & Fujimura 1993). Another
possibility is that the /tl/ is realized as [kl], as in the attested pronunciation [klɪŋgɪt] for
Tlingit19. It has been shown that French listeners strongly tend to misperceive the illegal [tl]
as [kl] (Hallé et al. 1998).

The place-dissimilation repair would require the anti-epenthesis constraint DEPSEG

to dominate IDENT[PLACE]. The epenthesis repair would require the opposite.

(2.42) OCP(CONT, PL), DEPSEG » IDENT[PLACE]

/tlɪŋgɪt/       OCP(CONT,PL)   DEPSEG   IDENT[PLACE]
[tlɪŋgɪt]       *!
[təlɪŋgɪt]                     *!
→ [klɪŋgɪt]                             *

(2.43) OCP(CONT, PL), IDENT[PLACE] » DEPSEG

/tlɪŋgɪt/       OCP(CONT,PL)   IDENT[PLACE]   DEPSEG
[tlɪŋgɪt]       *!
→ [təlɪŋgɪt]                                  *
[klɪŋgɪt]                      *!
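Since the two tableaux differ only in the relative ranking of DEPSEG and IDENT[PLACE], the choice of repair can be demonstrated by reranking the same candidate set. The small sketch below is my own illustration ("N" stands in for engma, and the violation profiles follow the tableaux):

```python
# Reranking demonstration for (2.42)/(2.43): the same candidates for
# underlying /tliNgit/ yield different repairs depending on whether
# DepSeg or Ident[Place] is ranked higher ("N" stands in for engma).

candidates = {
    "tliNgit":  {"OCP(cont,pl)": 1},   # faithful
    "taliNgit": {"DepSeg": 1},         # epenthesis
    "kliNgit":  {"Ident[Place]": 1},   # place dissimilation
}

def winner(cands, ranking):
    # Lexicographic comparison of violation vectors in ranking order.
    return min(cands, key=lambda c: tuple(cands[c].get(k, 0) for k in ranking))

print(winner(candidates, ["OCP(cont,pl)", "DepSeg", "Ident[Place]"]))   # kliNgit
print(winner(candidates, ["OCP(cont,pl)", "Ident[Place]", "DepSeg"]))   # taliNgit
```

The point of the demonstration is that nothing about the candidates changes; only the ranking of the two lower constraints decides between dissimilation and epenthesis.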

2.3.2.5. ??[pw]

The phonological status of initial [pw] in American English has never been fully

clarified. The [pw bw mw] onsets are often described as marginally acceptable by

English-speaking linguists. Hultzen (1965) and Wooley (1970) consider [pw] a

19 The pronunciation [klɪŋkɪt] is deemed "correct" by the OED.

permissible English onset; Catford (1988) and Hammond (1999) consider it marginal, like
the initial [ʒ] of genre or the initial [vɹ] of vroom.

For /pw/ the example I have long used is puissant, attested for 1450 and
occurring once per million words (1/M)20 or within the first 20,000 words of
the language, but pueblo (1818) is more frequent (2/M including the place
name) and is usually cited in whatever lists include this item. Both words
are pronounced as indicated, although they do have alternative
pronunciations not pertinent to our list. The word bwana, included by Hill
and others, is rare, but /bw-/ is frequent in Buenos Aires (1/M) in both
American English and RP (Hultzen 1965:12).

Wooley points out that low frequency cannot be the sole criterion of phonotactic

badness:

Initial /pw, bw, zw, mw/ pose a more difficult problem. As Hultzen has
shown, puissant, dating from 1450, can hardly be rejected. To appeal to the
low frequency of occurrence of these clusters in order to reject them would
be to lose the natively English initial /θw/ as well (1970:74).

More modern frequency counts show that initial [θw] is more common than [pw]
(Celex combined written and spoken, EFW.CD/EPW.CD: 6 per million vs. 1 per million;
Francis-Kucera: 4 per million versus 0 per million), but the point is well taken. Hammond
(1999) considers initial [pw] and [bw] to be of the same degree of marginality as [dw]
(Celex has only dwarf, dwell, dwindle, and derivatives) and [θw] (Celex has only thwack,
thwart).

There are two reasons to think that in English, less weight is laid on the [pw bw] ban

than the [tl dl] one. First, English [ɹ] is labial (Delattre & Freeman 1968), so the legal,
frequent onsets [pɹ bɹ fɹ] violate the same constraint as [bw]. Second, the ban on same-
place CC sequences is, cross-linguistically, stronger the more similar the two Cs are in
sonority (Selkirk 1988; Padgett 1991). Since [l] is less sonorous than [w] (Kahn 1980;

20 Frequency counts in the quoted passage are from Thorndike & Lorge (1944).

Guenter 2000), the [dl] sequence is closer in sonority than [bw] and hence a worse

structural violation.
If we assume that [pw bw mw] are actually illegal in English (that they are

phonological rather than lexical gaps), then the illegality must be due to some active

markedness constraint. It has been proposed by Clements & Keyser (1983) that English

actively prohibits [labial][labial] sequences in the syllable onset. Again we have an effect of

the Obligatory Contour Principle, forbidding adjacent identical segments in a particular

domain (here, the syllable onset).

(2.44) OCP(LAB)
Adjacent labials are prohibited.

Since [p b] and [w] have different values of [cont], this constraint must be an
unrelativized OCP constraint, unlike the OCP(CONT, PL) forbidding [tl dl]. In order for
OCP(LAB) to be active, it must dominate some markedness constraint, so that an underlying
[labial][labial] sequence is realized in some other way, repairing the violation. Already we

can see that all is not well with this analysis: What is the repair? Jones (1997) gives

faithful pronunciations for most word-initial [labial][labial] sequences, but some have

alternate pronunciations in which the sequence is repaired:

(2.45) English pronunciations of [pw bw mw]-initial loans (Jones 1997)

pueblo          [puɛbloʊ]
puissant        [pjuːəsənt]
Puerto (Rico)   [pɔɹtə]
poignant        [pɔɪnjənt]
Buenos (Aires)  [boʊnəs]
moire           [mɔɹeɪ]
Moivre          [mɔɪvəɹ]

The repairs are unsystematic and look suspiciously like spelling pronunciations - as

if the disappearing [labial][labial] sequence were a victim of grapheme-to-phoneme

conversion rules rather than synchronic phonological grammar.

Intuitive judgments of "wordlikeness" have been shown to be strongly influenced by

differing frequency of the sequences contained in them, even when the sequences are

phonotactically legal and attested (Coleman & Pierrehumbert 1997; Frisch et al. 2000). The

intuitive-wordlikeness-judgment task will therefore be sensitive to lexical as well as

phonological gaps. If [pw] is legal in English, but very rare, it may be judged unacceptable

on the grounds of low phonotactic probability alone, even though [pw] words are possible.

The onsets [tl] and [sɹ] will be judged worse, being both illegal and rare. This would

account for the pattern of judgments and attestations reported in the phonological literature.

I will therefore analyze [pw] as a lexical rather than a phonological gap.

2.4. Summary

Optimality Theory views both inventory and phonotactics as consequences of the

same grammar. Systematic, productive gaps in the inventory and in the set of

phonotactically permissible combinations arise from the filtering effect of the grammar on

the unconstrained set of possible inputs. A productive, phonological gap in the set of

observed surface forms is one which could not be the output of any conceivable input from

the lexicon. A nonproductive, lexical gap is one which could be filled by some input from

the lexicon, but where the necessary input (through historical accident) is not a lexical item.

Two highly productive gaps in the set of English syllable onsets were discussed:

That on initial [sɹ], and that on initial [tl]. Both were shown to be special cases of

systematic prohibitions involving neighboring coronals. These phenomena, assimilation

and the Obligatory Contour Principle (OCP), drive similar processes in many non-English

languages. The gaps in the English syllable inventory were analyzed as phonological gaps,

resulting from the filtering action of the grammar.

A systematic gap, but of doubtful productivity, was also analyzed: The partial ban

on [pw]. The systematicity demands an OCP constraint on neighboring labials;

however, the lack of productivity suggests that this constraint is not ranked high enough to

be active. The gap is therefore treated as lexical.

CHAPTER 3

THEORIES OF PHONOTACTIC EFFECTS IN SPEECH PERCEPTION

3.1. Introduction

This chapter discusses three models of phoneme perception: the TRACE model

(which puts all phonotactic effects in the lexicon), the transitional-probability model of Pitt

and McQueen (1998) (which assigns them to a statistically sensitive prelexical module), and

a perceptual model based on Optimality-Theoretic grammar.

3.2. TRACE (McClelland & Elman 1986)

TRACE is a connectionist model of word and phoneme recognition in which the

lexicon can directly influence the perception of phonemes. In TRACE, a fully or partially

activated word candidate provides support for phonemic candidates which is

indistinguishable from the support provided by incoming acoustic information. Phonotactic

effects on speech perception are taken to arise from lack of lexical support for
non-occurring phoneme sequences.

3.2.1. How TRACE works

The TRACE network is described by McClelland & Elman (1986); I will briefly

summarize what they say, but for full details the reader is referred to the original paper.

Each unit is a "detector" representing a hypothesis about the utterance — that it begins with

a voiced sound, that it begins with a [j], that it contains the word "yard", and so forth. The

detectors are organized into three layers, corresponding to features, phonemes, and words.

The "activation level" of a unit is a nonnegative number which varies over time in response

to the unit's inputs. It tells how much credence the model puts in that hypothesis at the

moment.

Figure 3.1. The TRACE model of McClelland and Elman (1986)

[Diagram: three layers of detector units above the incoming acoustic signal: a lexicon
layer (e.g., /plæstɪk/, /læp/), a phoneme layer (e.g., [æ]), and a feature layer (e.g.,
[-voice], etc.).]

A unit receives input from all other units with which it is connected. The input

which Unit A contributes to Unit B depends on Unit A's activation and another parameter,

the "strength" of the A-to-B connection. All connections go both ways, so a large positive

strength means A and B strongly excite each other, while a large negative strength means

they strongly inhibit each other. Units on the same level inhibit each other, so that more

confidence in the "plastic" hypothesis means less confidence in the "lap" hypothesis (and

vice versa). Connections between levels are excitatory, so that more confidence in "plaid"

means more confidence that the word starts with [p] (and vice versa). The strength of the

connections is set by the experimenters; TRACE is not a learning model (McClelland &

Elman 1986).

At the very bottom of the model are the acoustic feature detectors, which receive

inputs not only from other units but from outside the model. Each detector is responsible

for a particular feature ([voice], [acute], etc.) at a particular time in the utterance. As the

utterance unfolds moment by moment, the feature detectors register its acoustic properties

and adjust their activation levels accordingly. Activation spreads upwards through the

network. It also spreads back down from the word detectors to the phoneme detectors, and

from them to the feature detectors. Meanwhile, the units on each level are trying to inhibit

each other.
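The dynamics just described can be caricatured in a few lines of code. This is a toy sketch, not the published TRACE parameterization: the units, weights, and update rate are invented for the example, but the mechanics (symmetric signed connections, within-level inhibition, clipped activation levels) follow the description above.

```python
# Toy interactive-activation sketch in the spirit of TRACE: units pass
# activation along symmetric signed connections; between-level weights
# are excitatory, within-level weights inhibitory; activation is clipped.

def step(acts, weights, inputs, rate=0.1):
    """One synchronous update; weights[frozenset({a, b})] is the strength."""
    new = {}
    for unit, act in acts.items():
        net = inputs.get(unit, 0.0)
        for other, other_act in acts.items():
            if other != unit:
                net += weights.get(frozenset((unit, other)), 0.0) * other_act
        new[unit] = min(1.0, max(0.0, act + rate * net))
    return new

# Two word units inhibit each other; both are excited (one more strongly,
# an arbitrary choice for illustration) by a shared phoneme unit.
weights = {frozenset(("plastic", "lap")): -0.5,
           frozenset(("plastic", "[p]")): 0.4,
           frozenset(("lap", "[p]")): 0.1}
acts = {"plastic": 0.0, "lap": 0.0, "[p]": 0.0}
for _ in range(50):
    acts = step(acts, weights, {"[p]": 1.0})
print(acts["plastic"] > acts["lap"])  # True: the better-supported word wins
```

Even in this caricature, the combination of shared excitation and mutual inhibition produces the winner-take-all behavior that TRACE relies on: once one word unit pulls ahead, it suppresses its rival.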

TRACE assumes that the units are open to conscious introspection: To detect X, the

listener uses the X detector unit. Responses to a phoneme-monitoring task, for instance,

depend on the activation levels of the phoneme units. Responses to a word-recognition task

depend on the activation levels of the word units. Because activation spreads downwards

and inhibition spreads sideways, a unit's activation depends not just on the acoustical

configuration which it is nominally supposed to detect, but on the state of the rest of the

network. Under the right circumstances, the result can be strong activation (inhibition) of

the X detector despite the absence (presence) of evidence for X in the acoustic signal — a

perceptual illusion.

TRACE puts phonotactic illegality in the lexicon. Legal and illegal sequences are

processed differently because the legal ones receive support from lexical items containing

them, while the illegal ones do not (since, by definition, they do not appear in any words) -

that is, instead of punishing illegality, TRACE rewards legality. TRACE thus cannot

distinguish illegal sequences from other sequences of zero frequency. Any behavioral

differences between processing of zero-frequency legal sequences and illegal sequences (if

they can be shown to exist) would have to be explained by something outside of the

TRACE system.

3.2.2. Lexical effects on phoneme perception

The lexicon can certainly influence performance on tasks that are intended to tap

phoneme perception, lending credence to the TRACE approach. Evidence comes from four

major paradigms:

Phoneme detection (Foss 1969). Subjects listen to each stimulus and respond "yes"

or "no" depending on whether it has or lacks a particular sound (usually specified as a

letter). The usual dependent measure is RT for correct detections; error rates are < 10% and

not useful.

Phoneme categorization (Liberman, Harris, Hoffman & Griffith 1957). The

stimulus is acoustically ambiguous (e.g., between [bin] and [pin]); subjects are asked which

one it sounds more like. Dependent measures vary; a common one is the point (e.g., on the

VOT continuum) at which both judgments are equally likely. RT is also measured, and

tends to peak at the category boundary.

Phoneme restoration (Samuel 1981a, b). One phoneme of the stimulus (non)word

has been either replaced by noise or obscured by noise, and the subject has to say which.

Dependent measures are signal-detection-theoretic d-primes and betas. The effects are

robust; performance is not improved by 10,000 trials of practice, nor by any but the most

explicit preview cuing (Samuel 1991).

Shadowing (Cherry 1953). Subject hears speech over headphones, and has to

repeat it in as close to real time as possible. Various dependent measures evaluate how well

mispronunciations were detected.

TRACE attributes lexical effects on phoneme perception to downward spreading of

activation or inhibition from the word units to the phoneme units (McClelland & Elman

1986). Phonetic data extracted from the speech stream is only one influence on the

phoneme units; it can be drowned out by powerful signals from above which bias a

phoneme unit so strongly in one direction that conflicting information from below is not

enough to offset it. The phoneme unit's activation level is trapped between a fixed minimum

and maximum, and becomes less responsive to its inputs the closer its activation level is to

the floor or ceiling; hence, strong excitation or inhibition from above can also reduce a

phoneme detector's sensitivity to acoustic features.

The three lexical factors known to influence phoneme tasks are lexicality, frequency,

and uniqueness point (UP).

Lexicality and frequency. Effects are rather fragile for many paradigms. The only

reliable one is the Ganong effect in phonetic categorization: If a stimulus is ambiguous

between a word and a nonword owing to ambiguity in one phoneme, the phoneme tends to

be heard so that it makes the word (Ganong 1980, Fox 1984, Connine & Clifton 1987,

McQueen 1991, Pitt & Samuel 1993; not replicated by Burton, Baum, & Blumstein 1989).

The same effect was observed in shadowing by Marslen-Wilson (1984): Subjects "fluently

restored" mispronounced words, apparently without noticing the discrepancy (i.e., with no

effect on shadowing latency). There is at least one report that a one-phoneme ambiguity

between a common and a rare word tends to be resolved in favor of the common word.

However, the effect can be reversed (to favor the rarer word) by setting up the experiment

so that the less common word tends to be the right answer (Connine, Titone, & Wang

1993).

Phoneme targets may be detected faster in words (Rubin, Turvey, & van Gelder

1976, Cutler, Mehler, Norris, & Segui 1987; not replicated by Foss, Harwood, & Blank

1980, Frauenfelder, Segui, & Dijkstra 1990). Word initial phonemes may be quicker to

detect in common than rare words (Morton & Long 1976, Dell & Newman 1980; not

replicated by Segui & Frauenfelder 1986). Phoneme restoration may be stronger in words

than in nonwords (Samuel 1981a, 1996; not replicated by Samuel 1987), and in common

than rare words (Samuel 1981a; not replicated by Samuel 1981b).

TRACE attributes the word-superiority and shadowing-correction effects to

downward spread of lexical activation. The ambiguous acoustic stimulus excites, say, [t]

and [d] equally in the context yar_. Since the YARD unit is somewhat activated by the

context, it contributes activation to [d]; lacking stimulation from a *YART, [t] is overtaken

by [d]. The YARD and [d] units keep exciting each other and inhibiting [t] until [t] is

completely overwhelmed. A stimulus ambiguous between two nonwords, like sirt and sird,

would favor no word node over any other, and would be decided on the basis of the acoustic

evidence (McClelland & Elman 1986). Similar reasoning would apply if YART were a real

word, but much less frequent than YARD.

The possible effects on detection are accounted for by TRACE: Top-down

activation spreading causes the activation of the relevant phoneme unit to reach response

criterion sooner.

Uniqueness point (Marslen-Wilson 1984). RT to detect a phoneme (measured

from the phoneme) decreases later in real words but not nonwords (Marslen-Wilson 1984,

Frauenfelder, Segui & Dijkstra 1990, Wurm & Samuel 1997); effect size varies from -30

ms to -300 ms. Phoneme restoration is stronger late in words with early uniqueness points

than late in words with late uniqueness points (Samuel 1987). Shadowers restore
mispronunciations more often later in the word (Marslen-Wilson & Welsh 1978). Response

time to reject a nonword is a fixed amount, measured from the earliest point where the

stimulus differs from all words in the dictionary (Marslen-Wilson 1984).

Strong support for the COHORT word-recognition model comes from these effects,
which are hard to explain in other theories, and it is a great virtue of TRACE that its
connectionist architecture can account for them as well. Simulations show that an active word unit is

quickly extinguished when mismatching acoustic information comes in, provided that a

better-matching word unit is present (McClelland & Elman 1986). Shortly after the

uniqueness point, only the matching word unit is still active, strongly inhibiting all rivals and

exciting phoneme units consistent with it (which in turn inhibit phoneme units that are

inconsistent). If mismatching phonetic information comes in after the uniqueness point, it

will have a hard time changing the network's mind about the word or the phonemic code.

3.2.3. Phonotactic effects on phoneme perception

In the previous section we saw that a listener's performance on a phonemic task can

be influenced by their knowledge of the real words of their language. Interestingly,

evidence exists that it can be influenced by their knowledge of the possible words of their

language.

Massaro and Cohen (1983) created segments ambiguous between [ɹ] and [l] by
varying F3, and asked subjects to judge them in the contexts [t_i], [p_i], [v_i], and [s_i]. In
English, only [ɹ] is permissible after [t], only [l] is permissible after [s], both can follow [p],
and neither can follow [v]. The ambiguous segments were most likely to be judged [ɹ] in
[t_i], less likely in [p_i], less likely still in [v_i] and [s_i] (as shown in their Figure 3.1).
Despite the lexical confounds — [tɹi], [pɹi], and [pli] are words — the evidence of [v_i] and
[s_i] suggests that judgments are altered by people's knowledge that [sɹi] can't be English:
The larger number of [ɹ] judgments after [v] than [s] cannot be due to acoustic-phonetic
factors — if anything, the labial [v] should make the following F3 sound higher and the
ambiguous segment more [l]-like — nor to lexical ones, since [sli] is not a word - which
leaves phonotactics.

The TRACE model has a good explanation for how phonotactics exerts this

influence. McClelland & Elman (1986) found that the ambiguous stimulus [s?i] partially

activates similar words in the lexicon of their simulated word recognizer. Since the lexicon

contains only phonotactically permissible words, the units for sleep, sleet, and so on
become active, feeding excitation back to [l], but no countervailing sreep or sreet assists [ɹ]. The amount of acoustic support that [l] needs to reach criterion is thus reduced in the context [s_i] compared to a neutral context like [v_i] in which there are no lexical items similar enough to be activated. A phonotactic effect is achieved without phonotactic rules.

3.2.4. Empirical shortcomings of TRACE

Since TRACE models many different things, it has been criticized on many different

grounds. For a comprehensive review of its shortcomings as a model of phoneme

perception, see McQueen, Norris, & Cutler (1999). Most important, for our study, is that

the phonotactic effect is usually larger and more robust than the lexical effect. This is very

unexpected in a theory which takes phonotactic effects to be diluted lexical effects.

The evanescence of lexical effects came up in §2.2. In a lengthy study, Cutler et al.

(1987) found that they could make word-superiority effects come and go by boring the

listeners less or more. A varied stimulus set, containing mono- and disyllables, got the

lexical effect; a monotonous one did not. They concluded that the lexical effect was not

automatic, but depended on listeners' allocation of attention between the lexical and the

prelexical levels of representation. For a detailed review of such results from several

paradigms, see McQueen et al. (1999).

Phonotactic effects, by contrast, are robust and not affected by stimulus monotony.

The original Massaro & Cohen (1983) experiment got very large effects with a

monosyllabic stimulus set repeated for 1120 trials (total over two days). Pitt (1998) got

several large phonotactic effects with monosyllabic stimuli. Moreton & Amano (1999)

directly compared the effect of lexical status on the Japanese vowel-length boundary with

that of phonotactics using the same subjects and paradigm. They found a large phonotactic

effect but barely any lexical effect.

In other words, manipulations that make the lexical effect go away can still leave a

phonotactic effect. This is a problem for any theory which, like TRACE, denies a

distinction between lexical and phonological gaps.1

1 McClelland and Elman (1986) report an experiment which suggests that the lexical effect and phonotactic effects can be superimposed. They compared listeners' judgments of a segment [?] between [b] and [d] in the contexts _windle, _wiffle, and _wacelet. The highest rate of "d" response was obtained in the _windle context, where dwindle is a word; an intermediate rate was found in _wiffle, where neither endpoint is a word; and the lowest rate was found in _wacelet, where neither endpoint is a word but bwacelet is very

In TRACE, the phonotactic effect could be stronger because it combines the effects

of many lexical items, while the lexical-superiority effect depends on a single item.

However, the TRACE authors have shown that in fact the lexical-superiority effect is

stronger: When the network is presented with an ambiguous phoneme between [p] and [t]

in the context [_luli], it is classified as [t], with the lexical influence of truly winning out over

the phonotactic badness of [tl] (McClelland & Elman 1986).

3.3. The MERGE Transitional Probability theory (Pitt & McQueen 1998)2

Another potentially exploitable redundancy in speech is the statistical distribution of

segments. Different segment sequences, such as diphones or triphones, occur with different

frequencies. A model which is sensitive to these frequencies can compare the statistical

plausibility of alternative parses of ambiguous speech input in order to disambiguate it.

Such statistical information can also in principle be used to find word boundaries, and serve

as the basis for possible-word judgments.

There is evidence from various sources that listeners are sensitive to sequence

frequency. Treiman et al. (1996) found that nonwords containing high-probability

sequences are rated as "more English-like" by native speakers than those containing low-

probability sequences, and that, when subjects are asked to construct portmanteaus by

blending two nonwords, low-frequency sequences tend to be broken up more often than

high-frequency ones. Frisch et al. (2000) showed that listeners' "wordlikeness" judgments

were very strongly affected by the frequency of legal phoneme sequences contained in the

stimulus. Vitevich et al. (1997) found that nonwords containing frequent sequences were

similar to one. They interpreted this as a lexical effect superimposed on a phonotactic bias against *[bw]; i.e., even though all the contexts were phonotactically biased, a lexical effect was still obtained.
Since all of the contexts started with _w, the experiment did not demonstrate a phonotactic bias; its presence was simply assumed. As shown in Chapter 4, English listeners' bias against [bw] is weak if it exists at all. Hence, the experiment may just have measured an isolated lexical effect. It could still be true that a really strong phonotactic effect, like the bias against initial [dl], would swamp any lexical effect.
2 The MERGE framework is described most completely in Norris et al. (2000). The first discussion of it in connection with transitional probabilities is Pitt and McQueen (1998).

rated as "more English-like" and were repeated faster in a single-word shadowing task than

nonwords containing rare sequences. English listeners learning an artificial language

develop statistical sensitivity to the different probabilities of sound sequences in that

language even if the linguistic input is an unattended background stimulus (Saffran et al.

1996, 1997; Aslin et al. 1998).

Some evidence that the sequences are encoded separately from the lexicon comes

from work by Vitevich & Luce (1998, Experiment 1). They constructed lists of disyllabic

English words and nonwords which varied in sequence frequency, so that some items were

"high probability" and some were "low probability". When pairs of nonwords were

presented for same-different judgments, listeners responded faster to high-probability pairs

than low-probability pairs. When pairs of words were presented, however, the pattern was

reversed: faster for low-probability, slower for high-probability. The authors' interpretation

is that high-probability words and nonwords are both facilitated by the frequency of their

sublexical sequences, but the high-probability words are more strongly inhibited by

competition from their many lexical neighbors. Further evidence is provided by Pitt &

McQueen (1998), in which an ambiguous fricative disambiguated by lexical information

(the Ganong effect) did not induce compensation for coarticulation in a following

ambiguous stop, while a fricative disambiguated by diphone-frequency information did.

Where TRACE seeks to explain phonotactic illegality as a gap in the lexicon, the

probabilistic theories seek to explain it as a gap in the set of attested sequences: A

phonotactically illegal configuration is one which has zero frequency (Pitt & McQueen

1998:349). Like the TRACE account, a probabilistic theory predicts (1) that illegal

sequences are only slightly different from rare sequences, and (2) that all zero-frequency

sequences (of the relevant length) are equally illegal.

In this study, we will focus on one particular implementation of a probabilistic

theory, namely, that of Pitt & McQueen (1998), because it is the one which was designed

for problems of ambiguous phoneme perception. The authors actually define a class of

probabilistic theories, rather than a specific one: certain manipulable parameters are left

unfixed. This section will try to narrow down the range of possible implementations on the

basis of existing data, so that the remaining possibilities can be tested experimentally.

The rest of this section is organized as follows: §3.3.1. illustrates the functional

utility of statistical knowledge. §3.3.2. describes the range of possible probabilistic theories

in the Pitt-McQueen class. §3.3.3. discusses these theories with respect to the existing data

on ambiguous-phoneme perception, eliminates some of them, and defines the specific

models which we will test.

3.3.1. Simulation: Success of statistical predictions

The functional motivation for probabilistic speech perception is clear: Sequence

probabilities, even for very short sequences, greatly constrain the hypothesis space which

the listener must search. To illustrate this, let us consider a model whose task is to listen to

a list of isolated words drawn from the Celex English lemma database. The words occur

with their Celex spoken corpus frequencies. Every so often, one of the words is truncated at

a random location at least n segments into the word, and the model is asked to predict the

next segment (word boundaries are counted as segments). For this task, the model has

available only a table of transitional probabilities: for each string of n segments, it knows

the likelihood that the (n+1)st will be [a], [t], etc. An example, for n = 2, is shown in Table

3.2:

Table 3.2. Probability that a given diphone will be followed by a given segment (extract
from complete table)

    Preceding context    Next segment    Probability
    [wʌ]                 n               0.97
    [wʌ]                 r               0.03
    [wʌ]                 s               0.00
    [vz]                 )               0.99
    [vz]                 d               0.01
    [væ]                 9               0.00
    [væ]                 9               0.00
    [væ]                 k               0.07
    [væ]                 l               0.62
    [væ]                 m               0.01
    [væ]                 n               0.25

Note: "(" and ")" mark word boundaries.

To predict which segment will follow a given context, the model's best strategy is to

always guess the segment that is most frequent after that context, since that maximizes its

chance of guessing right.

As an illustration, a simulation of this hypothetical experiment was run in which

words were randomly chosen from the Celex wordforms database (EPW.CD) according to

their frequency (combined written and spoken, which is Field 3 of EPW.CD). Initial and

final word-boundary markers were added to each word's segmental representation. From

each representation, a substring of length n + 1 was chosen at random (if the word was long

enough), and the model was asked to guess the last segment on the basis of its likelihood

given the first n. The model was credited with a correct guess if the final segment of the

substring was the best guess according to the guessing strategy (i.e., if the actual last

segment was the likeliest). The simulation3 was run for approximately 100,000 trials for

each of n = 1, 2, and 3. It did rather well:

Table 3.3. Results of the simulation: Success rate as a function of context size

    Size of preceding context (n)    Model's success rate
    1                                0.378
    2                                0.630
    3                                0.783

That is, knowing only the last two segments, the model will predict the next one

correctly nearly two-thirds of the time - with zero acoustic information and zero lexical

information. This is only a thought experiment, but it is close enough to both the lab and

real life to show that even a small amount of probability knowledge can be used to very

great advantage.
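The guessing procedure can be re-created in miniature. Celex and the simulated_guess script are not reproduced here, so the lexicon and frequencies below are invented stand-ins; only the logic (frequency-weighted word sampling, random truncation, guessing the modal continuation) follows the description above:

```python
# Miniature version of the truncation-and-guess simulation (toy data only).
import random
from collections import Counter, defaultdict

LEXICON = {"cat": 50, "can": 30, "cap": 10, "tan": 20, "tap": 5, "pat": 15}
N = 2  # size of the preceding context

def ngram_table(lexicon, n):
    """For each length-n context, count frequency-weighted continuations."""
    table = defaultdict(Counter)
    for word, freq in lexicon.items():
        s = "(" + word + ")"               # "(" and ")" mark word boundaries
        for i in range(len(s) - n):
            table[s[i:i+n]][s[i+n]] += freq
    return table

def run(trials=10000, seed=0):
    rng = random.Random(seed)
    table = ngram_table(LEXICON, N)
    words, weights = zip(*LEXICON.items())
    correct = 0
    for _ in range(trials):
        s = "(" + rng.choices(words, weights)[0] + ")"
        i = rng.randrange(len(s) - N)                    # truncation point
        context, actual = s[i:i+N], s[i+N]
        guess = table[context].most_common(1)[0][0]      # modal continuation
        correct += (guess == actual)
    return correct / trials

print(run())  # well above chance, with no acoustic or lexical access at test time
```

The same code structure, run over a full pronunciation lexicon with n = 1, 2, and 3, is what produces success rates of the kind reported in Table 3.3.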

3.3.2. Probabilistic theories of speech perception

The probabilistic theory of Pitt & McQueen (1998) is, essentially, that prelexical

mechanisms are sensitive to sequence frequencies (in the equivalent form of transitional

probabilities), and that, when acoustic evidence is inconclusive, perception favors the more

likely option. This prelexical probabilistic module, along with the Shortlist model of word

3 The script simulated_guess is included in the appendix to this chapter.

recognition (Norris et al. 2000), forms part of the MERGE model, a theory of phoneme-

processing tasks in which the output of prelexical phonemic processing, along with lexical

information, is used in making phoneme-based decisions (Norris et al. 2000). Probabilistic

effects, in this model, occur very early, and are separate from lexical effects. The relative

contribution of each to phoneme responses is determined by attentional weighting, which in

turn is determined by task variables.

There are several different ways to make a theory of sequence-probability influence

on phoneme perception. The major adjustable parameters include: (1) location and size of

the context, (2) the database from which the probabilities are computed, and (3) the guessing

strategy. This section describes the possible parameter settings, and the theories resulting

from them.

3.3.2.1. Context

In a typical perception experiment, the listener is confronted with an acoustically

ambiguous segment "?", which could be either x or y, in a context A_B. How can statistical

knowledge about the frequencies of AxB and AyB be used to disambiguate it?

One way is to directly compare the likelihood of x and y in the context. The

decision depends on the conditional probabilities P(.v | A_B) and P(y | A_B), where

(3.4)

    (a)  P(x | A_B) = F(AxB) / F(A_B)

    (b)  P(y | A_B) = F(AyB) / F(A_B)

Since F(string) is the frequency with which that string occurs in the database (the

listener’s experience), all the model has to do in order to make its decision is to compare

F(AxB) with F(AyB). It consults its table4 of (2n+1)-phone frequencies, where n is the length of A and B,5 retrieves the two relevant frequencies, and hands them over to the decision rule. This kind of context I will call surrounding context of order n.

A second possibility is to treat the left and right contexts separately. The decision

depends on the conditional probabilities P(x | A_), P(y | A_), P(x | _B), and P(y | _B), which reduce to the frequencies F(Ax), F(Ay), F(xB), and F(yB). This I will call independent neighboring context of order n (again assuming simplistically that A and B have equal lengths). A table of (n+1)-phone frequencies is consulted.

The predictive difference between these two context types is that surrounding

context (SC) can take advantage of statistical dependencies between A and B, while

independent neighboring context (INC) cannot.

For example, English, like most languages, requires sonority to rise in syllable

onsets but not in codas. As a result, it lacks sequences like [tdt], [pdp], [fvp], etc., since no

matter what precedes or follows such a sequence, there is no legitimate syllabification - the

consonant in the middle is higher in sonority than either of its neighbors, but not high

enough that it can serve as a syllable nucleus itself. A model using SC of order 1 will note

the gaps in its table of 3-phone frequencies. A model using INC of order 1, though, will

miss these gaps, since each of the sub-sequences [td], [dt], etc. does in fact occur. Given a

segment ambiguous between [l] and [n] in the context [m_z], the SC-1 model will favor [l], since [mlz] is attested (e.g., in camels), while [mnz] isn't (at least, not for speakers who lack syllabic [n] in lemons). The INC-1 model will note only the low nonzero frequency of [mn] (e.g., damnation, amnesia), and the high frequencies of [mz], [ml], and [lz], and treat

4 I say "table", but that is only one notational variant. They can also be viewed as sublexical-sequence
detector units whose resting activation or excitability depends on frequency. A proposal along these lines is
Vitevich and Luce (1999).
5 For theoretical simplicity's sake I assume symmetry: A and B have the same length. This might be wrong.

[n] as no worse in [m_z] than in [m_ei]. Which of these models is closer to what people

do is of course a question for the laboratory.
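The contrast is easy to demonstrate in code. The six-word "corpus" and its rough ASCII transcriptions below are invented for illustration (S stands for [ʃ], O for [ɔ]); the two scoring functions implement SC-1 (one triphone frequency) and INC-1 (a combination of the two diphone frequencies):

```python
# SC-1 vs. INC-1 on a segment ambiguous between [l] and [n] in [m_z].
from collections import Counter

# toy transcriptions: camels, damnation, amnesia, teams, calls, pans
CORPUS = ["kamlz", "damnejSn", "amnizja", "timz", "kOlz", "panz"]

tri = Counter()  # 3-phone table, used by SC-1
bi = Counter()   # 2-phone table, used by INC-1
for w in CORPUS:
    tri.update(w[i:i+3] for i in range(len(w) - 2))
    bi.update(w[i:i+2] for i in range(len(w) - 1))

def sc1_score(a, x, b):
    """Surrounding context of order 1: frequency of the whole triphone AxB."""
    return tri[a + x + b]

def inc1_score(a, x, b):
    """Independent neighboring context of order 1: combine F(Ax) and F(xB)."""
    return bi[a + x] * bi[x + b]

print(sc1_score("m", "l", "z"), sc1_score("m", "n", "z"))    # → 1 0
print(inc1_score("m", "l", "z"), inc1_score("m", "n", "z"))  # → 2 2
```

SC-1 sees the [mnz] gap directly (the triphone has zero frequency), while INC-1, combining the nonzero counts of [mn] and [nz], treats [n] as a live option; this is exactly the dependency between left and right context that only surrounding context can exploit.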

Another difference between SC and INC, of no consequence predictively but

significant conceptually, is the size of the n-phone tables. As the size of the context

increases, so does the number of phoneme sequences whose frequencies the prelexical

module has to keep track of. Their number quickly approaches the size of the lexicon:6

Table 3.5. Attested English phoneme sequences of lengths 2, 3, and 4

    Sequence length    Number of sequences
    2                  1,395
    3                  11,961
    4                  35,732

Note: Counts were made from the set of Celex wordforms occurring at least once per
million words. Initial and final word boundaries were counted as phonemes. For
comparison, the number of lemmas in Celex is 52,447.

The MERGE TP models are intended to contrast with TRACE by keeping the

lexicon out of the early stages of speech perception. As the n-phone tables grow, this

difference becomes blurred: the tables incorporate not only the entire lexicon of length n or

less, but fragments of many larger words, and the two theories come to make more and

more similar predictions. These considerations argue for the INC theories over the SC

theories, since the former use shorter n-phones to describe the same-sized context.

Pitt & McQueen (1998)’s experiments use a preceding context of length 1, but the

authors discuss evidence that a preceding context of length 3 may be needed. Since all of

the stimuli they discuss had the same following context (silence), they did not need to go

into their claims about following context. I will assume that they are considering one of

three theories of context: SC-1, INC-1, or INC-3. (See discussion below, §3.3.3.)

6 The single word bat [bæt], for example, contributes four 2-phones: #b, bæ, æt, and t#.

3.3.2.2. Database

What corpus are the n-phone tables based on? This is both a theoretical and a

practical problem. There are two principal options.

One possibility is that the n-phone frequencies are computed from the stored items

in the lexicon. Each word contributes its n-phones, weighted according to the word's

frequency. The lexicon does not directly participate in speech perception, but contributes

off-line by updating the n-phone tables which the early perceptual mechanisms can consult.

Such theories use a lexical database.

A second possibility is that the n-phone frequencies are computed directly from the

incoming speech stream - computed by the same mechanisms that later consult them,

without any participation from the lexicon. This is more in keeping with the spirit of the Pitt

&amp; McQueen (1998) model, a strictly bottom-up theory which aims to block the lexicon

from interfering with the early stages of perception. Such theories use a corpus database.

The empirical difference between the two is that the lexical database respects

morphological word boundaries, while the corpus database does not. The reason is that

morphological word boundaries are represented explicitly in the lexicon, but not in a

segmental analysis of the speech stream7. The effect on n-phone statistics can be

substantial. For instance, geminates are very rare in the English lexicon but occur freely in

running speech (That trite talk keeps Sid dozing).
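The point can be made concrete with a toy diphone count. The crude ASCII transcriptions below are invented; '#' marks the lexical word boundary, and the "corpus" is just the example sentence run together with no boundary symbols, as a prelexical segmental analysis would deliver it:

```python
# Lexical database vs. corpus database: geminates appear only in the latter.
from collections import Counter

# that trite talk keeps Sid dozing (rough invented transcriptions)
WORDS = ["Dat", "trajt", "tOk", "kips", "sId", "dozIN"]
UTTERANCE = "".join(WORDS)   # running speech: no boundary symbols survive

lex = Counter()              # diphones from the lexicon, with '#' boundaries
for w in WORDS:
    s = "#" + w + "#"
    lex.update(s[i:i+2] for i in range(len(s) - 1))

corpus = Counter(UTTERANCE[i:i+2] for i in range(len(UTTERANCE) - 1))

# geminate diphones created only across word boundaries:
gems = [g for g in ("tt", "kk", "ss", "dd") if corpus[g] and not lex[g]]
print(gems)  # → ['tt', 'kk', 'ss', 'dd']
```

A lexical database assigns all four geminates zero frequency; a corpus database does not, which is precisely the empirical difference between the two options.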

The practical difference is that on-line dictionaries make the lexical-database

statistics easy to compute, while a lack of accessible on-line phonetic corpora makes the

7 Morphological word boundaries have many indirect correlates in the surface-level phonetic analysis of the speech stream. Since prosodic boundaries tend to be aligned with them, they often correlate with fortition (Fougeron & Keating 1997). As "prominent" positions, they tend to support phonological contrasts not available elsewhere (Beckman 1998, Smith 1999), and to undergo prominence-enhancing phonological processes (Smith forthcoming). None of these correlates, however, allows morphological word boundaries to be unambiguously located in the speech stream pre-lexically. If the n-phone tables are compiled prelexically, no character corresponding to a morphological word boundary will be present in them.
(The same problem was faced by post-Bloomfieldian structuralist linguistic theory, which demanded that grammatical analysis proceed from lower to higher levels. Harris (1951) suggested a statistical solution: Morph boundaries occur where the unpredictability of the next phoneme reaches a peak. This solution, like the TP theory, makes indirect use of lexical and grammatical information. See Newmeyer (1986:7-9) for a discussion.)

corpus-database statistics hard to compute. The standard practice in the field is therefore to

use an on-line dictionary and tacitly assume that the difference is negligible until proven

otherwise. Since I can’t tell what the corpus-database statistics predict, I will have to ignore

that theory and focus on the lexical-database theory.

Most of the frequency counts which Pitt and McQueen relied on were computed for

American English pronunciations, with frequencies apparently reckoned from the million-

word written American English corpus of Kucera and Francis (1967). Celex's 18-million-

word corpus is much larger, separates written from spoken English, and distinguishes

different inflected forms of the same word, but its British pronunciations are a drawback

when working with American English speakers. The result is some uncertainty about what

the TPs really are, and hence about what a TP-based theory would actually predict. For

instance, if one wants to reckon the probability that the segment following a given vowel will

be [s] or [ʃ] - a crucial case in the study of Pitt and McQueen (1998) - one can get three

different estimates for each:

Agreement is tolerably good if we stick to a one-segment context (i.e., a table of

length-2 sequences). The absolute magnitudes may differ by a factor of three, but the

different methods generally agree as to which segment each context favors.

For safety's sake I will give frequencies using both an American English frequency

dictionary similar to Pitt and McQueen's and the Celex corpus. Details of how these

frequencies are computed will be found in the appendix to this chapter.

Table 3.6. Transitional probabilities for the stimuli of Pitt & McQueen (1998), n = 1

                                              Probability
    Transition           Pitt & McQueen          Celex: written and      Francis-Kucera:
                         (1998, Table 2):        spoken Br. Eng.         written Am. Eng.8
                         written Am. Eng.
    Pr([us] | [u_])      0.019                   0.021                   0.008
    Pr([uʃ] | [u_])      0.010                   0.009                   0.009
    Pr([ʊs] | [ʊ_])      0.004                   0.002                   0.001
    Pr([ʊʃ] | [ʊ_])      0.004                   0.013                   0.005
    Pr([ɪs] | [ɪ_])      0.058                   0.115                   0.163
    Pr([ɪʃ] | [ɪ_])      0.007                   0.009                   0.007
    Pr([eɪs] | [eɪ_])    0.064                   0.026                   0.069
    Pr([eɪʃ] | [eɪ_])    0.139                   0.043                   0.133
    Pr([is] | [i_])      0.021                   0.017                   0.023
    Pr([iʃ] | [i_])      0.002                   0.001                   0.001
    Pr([ip] | [i_])      0.020                   0.021                   0.018
    Pr([it] | [i_])      0.025                   0.026                   0.025
    Pr([eɪp] | [eɪ_])    0.015                   0.008                   0.017
    Pr([eɪt] | [eɪ_])    0.151                   0.054                   0.123

8 Computed from the same database used by Pitt & McQueen, but apparently using a somewhat different counting method.

To compare the conditional probability P(x | A_B) with P(y | A_B) - that is, the

relative chances of finding x or y in a given environment - we need only compare the

frequency of AxB with that of AyB, since P(x | A_B) = (frequency of AxB)/(frequency of

the A_B environment) and P(y | A_B) = (frequency of AyB)/(frequency of the A_B

environment). I will therefore report only the AxB and AyB frequency counts.

3.3.2.3. Decision rule

Once the statistical information has been used to estimate the probability that a

particular segment in the context A_B is x or y, the model then has to choose one of the two.

How?

The general form of the decision rule must specify the probability that the model

guesses x rather than y given a stimulus A?B. The decision rule has to take into account at least two things: 1. the acoustic composition of the ambiguous segment (how close it is to x or y), and 2. the TP statistics.

In everyday life, listeners are constantly confronted with productions of x and y.

Some are clear; some are garbled in various ways. The listener may at first parse a given

production "wrongly" (i.e., not as the speaker intended it), but usually the correct

interpretation becomes clear shortly as the listener recognizes the speaker’s intended

message. The listener therefore has the feedback needed to optimize the decision rule by

adjusting its parameters. We will suppose that they do this, with the goal of maximizing

their likelihood of correctly restoring the intended stimulus.

Each AxB or AyB stimulus puts the listener in a particular internal state. Which

internal state will depend not only on the speaker's intent, but on the garbling and on the

perceptual noise added by the listener's auditory system. Under the TP hypothesis, the

listener's response is determined by their perceptual state and by the distributional statistics

of their language. For the sake of illustration, let's assume the SC-1 statistics.

Suppose the listener, having heard a particular stimulus A ?B (intended by the

speaker as AxB or AyB), is now in State Z, a state which can lead to a response of "x" or a response of "y".

The likelihood of correctly guessing the intended message is

(3.7)

    PC(Z) = P(AxB | Z) · P(guess "x" | Z) + P(AyB | Z) · P(guess "y" | Z)

By Bayes's Theorem,

(3.8)

    P(AxB | Z) = [P(Z | AxB) · P(AxB)] / [P(Z | AxB) · P(AxB) + P(Z | AyB) · P(AyB)]

    P(AyB | Z) = [P(Z | AyB) · P(AyB)] / [P(Z | AxB) · P(AxB) + P(Z | AyB) · P(AyB)]

Letting rx = P(Z | speaker said AxB), px = P(speaker said AxB), qx = P(guess "x" | Z), and similarly for y, we get

(3.9)

    PC(Z) = [rx·px / (rx·px + ry·py)] · qx + [ry·py / (rx·px + ry·py)] · qy

What choice of qx, our only free parameter, maximizes our chance of guessing correctly? Clearly9, either qx = 0 or qx = 1. Since Z was arbitrarily chosen, it is true in general that from any given internal state, the optimal choice is either always "x" or always "y". The choice depends on the r and p parameters: if rx·px > ry·py, then "x" is the best guess; if the reverse, then "y".

In the language of Signal Detection Theory (Green & Swets 1966, Macmillan &

Creelman 1991), Choice Theory (Luce 1963), or Generalized Recognition Theory (Ashby

& Maddox 1994), an acoustic stimulus evokes an internal representation as a point in a

perceptual space. ("State Z" is one such point.) Following the reasoning described above,

the space is partitioned into regions, and all points in the same region lead to the same
response. To get optimal performance, each region must contain only points where rx·px > ry·py, or only points where rx·px < ry·py, so the boundaries must be drawn so that rx·px = ry·py for points Z on the boundary (Macmillan & Creelman 1991); i.e., so that rx/ry = py/px, or

(3.10)

    P(Z | AxB) / P(Z | AyB) = P(AyB) / P(AxB)

If the a-priori probability ratio (the right-hand side) changes, as when A_B is

replaced by a different phonological environment C_D in which y is less likely, then the

boundary must move in order to keep the likelihood ratio (the left-hand side) equal to it.

9 PC(Z) is linear in qx, so its maximum must be at the smallest or largest possible value of qx, i.e., 0 or 1.

The consequences for a typical perceptual experiment are illustrated in Figure 3.11. The

plane is perceptual space, with x and y the idealized perceptual representations of the

endpoint stimuli - "idealized", because in fact perceptual noise causes each presentation of a

stimulus to evoke a slightly different percept. The irregular line of dots shows the idealized

locations of the intermediate stimuli. Following the optimal response strategy, listeners

respond "x" when the percept is on one side of the boundary and "y" when it is on the

other, hence, what will be observed in the experiment is that the responses cross over from

mostly "x" to mostly "y" where the line of stimuli intersects the boundary. The difference

in boundary location between the A_B and C_D contexts causes a corresponding shift in the

location of the "x"/"y" crossover point.

Figure 3.11. Boundary shift in perceptual space

    [two panels: A_B and C_D]

The listener thus optimizes performance by changing their willingness to respond

"x" in accordance with the ratio of the probabilities of x and y in the context. This leads to

an important conclusion: The effect of context on the location of the "x"/"y" response

boundary depends on the ratio of the probabilities of x and y in that context, and not on their

difference.

For example: Suppose AxB, AyB, and CxD all occur 1,000 times per million words, while CyD occurs 901 times per million words. The x/y ratio in A_B is 1, while that in C_D is about 1.1. If a shift in the "x"/"y" boundary between A_B and C_D is found experimentally, we would expect an even larger shift between A'_B' and C'_D', where A'xB', A'yB', and C'xD' occur 100 times per million words and C'yD' occurs 1 time per million words (giving x/y ratios of 1 in A'_B' and 100 in C'_D'). Though the frequency differences are the same (99 per million in both cases), it is the ratios that matter.
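A quick numerical check shows the same thing, under the simplifying assumption of equal-variance Gaussian percepts on a single dimension (means 0 and 1 and sigma 1 are invented values, not measured ones): the optimal boundary location depends on the logarithm of the prior ratio, not on the frequency difference.

```python
# Optimal "x"/"y" boundary for 1-D equal-variance Gaussian percepts (toy values).
import math

def optimal_boundary(px, py, mu_x=0.0, mu_y=1.0, sigma=1.0):
    """Solve rx*px = ry*py for the percept z, where rx and ry are Gaussian
    likelihoods of z under the x and y percept distributions."""
    return (mu_x + mu_y) / 2 + sigma**2 * math.log(px / py) / (mu_y - mu_x)

print(optimal_boundary(1000, 901))  # ratio ~1.1: boundary barely moves from 0.5
print(optimal_boundary(100, 1))     # ratio 100: large shift, same difference of 99
```

The shift term is sigma^2 * ln(px/py) / (mu_y - mu_x): doubling both frequencies changes nothing, while changing their ratio moves the boundary.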

It is very unlikely that listeners follow the optimal strategy to the letter, which would

entail disregarding acoustic evidence of an event of zero probability. Under the optimal

strategy, a sequence of probability 0 is infinitely unlikelier than a sequence of positive

probability; hence, phonotactically illegal stimuli should always be heard as legal. There is

probably a limit to how far the criterion can be shifted, so that larger and larger a-priori

probability ratios only increase the bias up to a point. Very infrequent sequences are

therefore expected to behave similarly to absolutely non-occurring ones.

3.3.3. Statistical context effects on phoneme perception

The MERGE TP theory is intended to explain an interaction between lexical and

phonetic effects in phoneme perception. Mann & Repp (1981) showed that a segment

ambiguous between [t] and [k] tends to be heard as [t] after [ʃ] and as [k] after [s]. The effect evidently arises at an early (low) level of processing, either because the perceptual system is compensating for expected coarticulatory effects, or because the low-frequency [ʃ] makes the next segment sound higher by contrast while the high-frequency [s] makes it

sound lower (Kluender & Lotto 1994). Elman and McClelland (1988) used neither [ʃ] nor [s], but a segment [?] acoustically in between them. When [?] followed Christma_, ridiculou_, or copiou_, it acted like [s] in its effect on perception of a following [t]-[k] (tapes-capes) continuum. When [?] followed fooli_, Spani_, or Engli_, it acted like [ʃ]. They concluded that lexical activation was spreading down to the phoneme level to favor [ʃ] or [s], as the case might be, which then had its ordinary phonetic effect on the following segment.

Pitt and McQueen (1998) argued that early phoneme processing was immune from lexical effects, and that the Elman and McClelland results could be accounted for if low-level (prelexical) phonetic processes were sensitive to segment-to-segment TPs: the ambiguous [?] was behaving like [ʃ] or [s] depending on which was more likely to follow the preceding segmental context. When [ʃ] was more likely, [?] produced more [t] responses to the following [t]-[k] continuum; when [s] was more likely, [?] produced more [k] responses.

Where Elman and McClelland had asked only for judgments of the [t]-[k] continuum, Pitt and McQueen (1998, Experiment 1) also asked for judgments of [?]. The [?] was presented in two pairs of biasing contexts. One pair, [dʒu_] and [bʊ_], were lexically biased towards [s] (juice) and [ʃ] (bush), but the TP from the vowel to [s] and to [ʃ] was the same for both. The other pair, [dɪ_] and [neɪ_], were lexically unbiased (since [dɪs], [dɪʃ], [neɪs], and [neɪʃ] are all nonwords), but differed in the TPs from the vowel to the following fricative. The relevant statistics are shown in Table 3.6, repeated here for convenience:

Table 3.6. Transitional probabilities for the stimuli of Pitt and McQueen (1998)

                                    Probability
                    Pitt & McQueen       Celex: written and   Francis-Kucera:
                    (1998, Table 2):     spoken Br. Eng.      written Am. Eng.10
Transition          written Am. Eng.
Pr([us] | [u_])     0.019                0.021                0.008
Pr([uʃ] | [u_])     0.010                0.009                0.009
Pr([ʊs] | [ʊ_])     0.004                0.002                0.001
Pr([ʊʃ] | [ʊ_])     0.004                0.013                0.005
Pr([ɪs] | [ɪ_])     0.058                0.115                0.163
Pr([ɪʃ] | [ɪ_])     0.007                0.009                0.007
Pr([eɪs] | [eɪ_])   0.064                0.026                0.069
Pr([eɪʃ] | [eɪ_])   0.139                0.043                0.133
Pr([is] | [i_])     0.021                0.017                0.023
Pr([iʃ] | [i_])     0.002                0.001                0.001
Pr([ip] | [i_])     0.020                0.021                0.018
Pr([it] | [i_])     0.025                0.026                0.025
Pr([eɪp] | [eɪ_])   0.015                0.008                0.017
Pr([eɪt] | [eɪ_])   0.151                0.054                0.123
10 Computed from the same database used by Pitt & McQueen, but apparently using a somewhat different
counting method.

Pitt and McQueen found a lexical and a TP influence on [?] report, but only a TP

influence on [t]-[k] report, suggesting that TPs were influencing early phonetic processing,

with the lexical effect only emerging at a later stage.11 At an early, prelexical stage of phonetic processing, the ambiguous fricative would be disambiguated using TPs, and, having been classified as [s] or [ʃ], would so affect the perception of the following [t] or [k].

Later on, after lexical access, the lexicon could affect listeners' report of the fricative, but not

of the stop.

Does the experimental evidence from these two studies rule out any of the possible

TP-based theories described in the last section? Are the left and right contexts independent

(INC model), or are they treated as a single unit (SC model)? Since following context was

not varied in these studies (it was always a [t]-[k] or [d]-[g] continuum before [ei]), they do

not distinguish INC from SC. However, they do throw some light on the question of how

much preceding context has an effect.

The most conservative hypothesis, which is used by Pitt and McQueen (1998)

through most of their paper, is that only the immediately preceding segment matters: the
decision between [s] and [ʃ] after [WXYZ_] depends only on the frequencies of [Zs] and [Zʃ].
McClelland and Elman had considered this account of their results:

One might have proposed that simple phoneme-to-phoneme sequential
constraints are such that they would lead subjects to predict that the final
phoneme in "Spanish" was an /ʃ/ but the final phoneme in "ridiculous" was
an /s/, quite apart from specific lexical factors; it may be that [nɪ_] is more
often completed with [ʃ], while [l?_]12 is more often completed with [s]
(1988:158).

11 The latter half of this squares with the findings of Fox (1984), who reported that lexical influences on ambiguous-phoneme perception turn up only among the responses with long RTs.
12 Sic; an apparent typo for [lə].

They dismiss it in view of the results of their Experiment 3, in which a large effect was found when the contexts were fooY? and ridicuY?, where "Y" was a CV sequence intermediate between [li] and [lə]:

However, in the syllable-replaced condition, the context in the replaced items
is actually the same for three phonemes before the final fricative; the vowel
in "foo_," and the last vowel in "ridicu_" are the same vowel, though they
may have slightly different acoustic realizations due to coarticulation, and the
next two sounds in the two contexts are both acoustically and phonetically
identical in the syllable-replaced stimuli. Thus, any differential prediction of
the identity of the final fricative would have to be based on "f_" vs. "ridic_,"
and thus would seem to be attributable to knowledge that is specific to the
particular lexical items involved (1988:158-159).

Context fully three segments away is affecting the ambiguous fricative so strongly

that it in turn affects perception of the following stop. The TRACE authors argue, in effect,

that expanding the TP context to be that big makes the TP theory practically lexical, by

including whole words in the table of sequence frequencies (e.g., all words in the lexicon which

are four segments long or shorter). At any rate, a single segment of preceding context is not

enough.

Pitt and McQueen reply that the last vowels in "foo_" and "ridicu_" are not in fact identical; the one they take to be [u] and the other to be [ʊ]. If we consider the frequencies of the four sequences [ulɪs], [ulɪʃ], [ʊləs], and [ʊləʃ], then the TPs favor [ɪʃ] after [ul_] and [əs] after [ʊl_].13

/ulɪʃ/ occurs in words like coolish, foolish, and ghoulishness. Celex shows
that this string occurs about 26 times per million words. /ulɪs/ and /uləs/
occur less than once per million words, and /uləʃ/ does not occur at all. So
/ʃ/ is much more likely given /ulV?/. The opposite bias operates after /ʊ/.
The string /ʊləs/ is quite common, in words like incredulously, ridiculous,

13 Ridiculous is transcribed by the Francis-Kucera dictionary as ridic[jʊ]lous; Jones (1997) records both this and ridic[jə]lous. Some speakers may have this latter pronunciation. CELEX estimates /ələs/ to occur about 56 times per million words (combined spoken and written), and /ələʃ/ to occur not at all; /əlɪs/ occurs 350 times, and /əlɪʃ/ 25. Following Pitt & McQueen's reasoning, [s] is still more likely after /əlV?/.

and stimulus. The CELEX estimate is 86 times per million words. /ʊlɪs/
also occurs (in words like oculist and somnambulist, 9 times per million),
but /ʊləʃ/ and /ʊlɪʃ/ never occur. So /s/ is much more likely after /ʊlV?/.
(Pitt & McQueen 1998:365)

This counterproposal does not necessarily require the listener to keep track of 4-phones, of which there are at least 35,732 (see Table 3.1). Perhaps what has happened in McClelland and Elman's experiment is a statistical chain reaction. Suppose the listener maintains a 3-phone table (at least 11,961 entries). When the ambiguous [ɪ]/[ə] vowel is encountered after [ul_] or [ʊl_], it is disambiguated using 3-phones. The restored [lɪ_] or [lə_] context is then used to decide, statistically, between [s] and [ʃ].
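The chain reaction can be sketched as follows, using the Celex combined counts from Tables 3.12 and 3.13. The symbols 'I', '@', 'S', 'u', and 'U' are ASCII stand-ins for [ɪ], [ə], [ʃ], [u], and [ʊ]; the two-stage procedure is one possible implementation, not a claim from the studies themselves.

```python
# Sketch of the "statistical chain reaction": disambiguate the ambiguous
# vowel from a 3-phone table, then use the restored context to choose the
# fricative. Counts are the Celex combined figures from Tables 3.12-3.13.
triphone = {
    ('u', 'l', 'I'): 79,   ('u', 'l', '@'): 32,    # [ul_]: favors [I]
    ('U', 'l', 'I'): 97,   ('U', 'l', '@'): 939,   # [Ul_]: favors [@]
    ('l', 'I', 's'): 1445, ('l', 'I', 'S'): 707,   # [lI_]: nearly neutral
    ('l', '@', 's'): 617,  ('l', '@', 'S'): 12,    # [l@_]: strongly [s]
}

def restore_vowel(c1, c2):
    """Stage 1: disambiguate the vowel using the 3-phone table."""
    return max(['I', '@'], key=lambda v: triphone[(c1, c2, v)])

def fricative_odds(c1, c2):
    """Stage 2: odds of [s] over [S] given the restored vowel context."""
    v = restore_vowel(c1, c2)
    return triphone[(c2, v, 's')] / triphone[(c2, v, 'S')]

# foolish-type context [ul_]: odds ~2 (weak [s] bias);
# ridiculous-type context [Ul_]: odds ~51 (strong [s] bias).
odds_ful = fricative_odds('u', 'l')
odds_rid = fricative_odds('U', 'l')
```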

The statistics of English permit this. Table 3.12 shows the relevant Celex counts for the [ɪ]/[ə] decision, which favor [ɪ] after [ul_] and [ə] after [ʊl_]. (The [ɪ] counts in Celex are too high for American English, since word-final unstressed [i], as in marry, is pronounced [ɪ] in Southern English dialects (Trudgill 1999). I have corrected them in the table by subtracting the number of word-final occurrences in each context.) Table 3.13 shows the counts for the [s]/[ʃ] decision, which strongly favor [s] after [lə_], but are nearly neutral after [lɪ_].

Table 3.12. Triphone frequencies for sequences ending in [ɪ]/[ə] in the stimuli of
McClelland and Elman (1988)

                         Frequency per million words, Celex    Frequency per
                         EFW.CD/EPW.CD                         million words,
                                                               Francis/Kucera
3-phone                  Combined    Written    Spoken
[ulɪ] (raw)              151         161        76             0 14
[ulɪ#] (word-final)      -72         -78        -24
[ulɪ] (corrected)        79          83         52
[ulə]                    32          33         12             15
[ʊlɪ] (raw)              361         374        219            1
[ʊlɪ#] (word-final)      -264        -273       -171
[ʊlɪ] (corrected)        97          101        48
[ʊlə]                    939         902        1342           175

14 This American English dictionary does not contain the word foolish.

Table 3.13. Triphone frequencies for sequences ending in [s]/[ʃ] in the stimuli of
McClelland and Elman (1988)

                         Frequency per million words, Celex    Frequency per
                         EFW.CD/EPW.CD                         million words,
                                                               Francis/Kucera
3-phone                  Combined    Written    Spoken
[lɪs]                    1445        1473       916            416
[lɪʃ]                    707         695        826            362
[ləs]                    617         612        674            270
[ləʃ]                    12          11         14             0

However, if the TP context is extended to include the preceding two segments, we now make the wrong prediction about Pitt and McQueen's Experiments 1-3, since now [dʒu_] and [bʊ_] have 100% TP biases towards [s] and [ʃ] respectively (see Table 3.14). This should have produced a TP effect, but did not. Even worse, the [dɪ_] and [mi_] contexts, which produced a large effect in their Experiment 3, are unbiased.

Table 3.14. Triphone frequencies for the stimuli of Pitt and McQueen (1998)

                         Frequency per million words, Celex    Frequency per
                         EFW.CD/EPW.CD                         million words,
                                                               Francis/Kucera
3-phone                  Combined    Written    Spoken
[dʒus]                   28          30         4              17
[dʒuʃ]                   0           0          0              0
[bʊs]                    0           0          0              0
[bʊʃ]                    74          79         7              24
[dɪs]                    0           0          0              0
[dɪʃ]                    0           0          0              0
[neɪs]                   14          15         6              2
[neɪʃ]                   543         558        394            486
[mis]                    0           0          0              0
[miʃ]                    0           0          0              0

A context size of 1 segment is too small to account for the Elman & McClelland

(1988) results. A context size of 2 segments can handle those, but not the Pitt & McQueen

(1998) results. Larger contexts do not solve this latter problem (since the juice/bush stimuli

are only three segments long), and in any case lead to a duplication of the lexicon at a

prelexical level.

TRACE can explain this disparity, as noted in this connection by Samuel (2000). Lexical effects in TRACE increase over time, and are greatest at the end of long words, because the word nodes take time to reach activation and are more active the more phoneme nodes are feeding into them. The ambiguous fricatives in both experiments came at the end of a word, but the McClelland and Elman words were much longer than the Pitt and McQueen stimuli ([fulɪ_] and [ɹɪdɪkjulə_] versus [dʒu_] and [bʊ_]).

In this experiment, Pitt and McQueen not only failed to find a lexical effect with the contexts jui_ and bu_, they succeeded in getting a TP effect with the contexts mee_ and nay_, both of which make nonwords no matter which way the ambiguous fricative is interpreted. Tables 3.15 and 3.16 show the cohorts at the time the ambiguous fricative appears. It is clear that effects were found in all and only those cases where, in at least one of the paired stimulus contexts, the active cohort strongly favored [s] or [ʃ] at the time the ambiguous fricative appeared.

Table 3.15. Cohorts at the appearance of the ambiguous fricative in the experiment of
McClelland and Elman (1988, Experiment 3)

                               Continued with [s]        Continued with [ʃ]
Preceding context              Words        Frequency    Words        Frequency
ridiculou_ [ɹɪdɪkjulə_]        ridiculous   36           (none)       0
fooli_ [fulɪ_]                 (none)       0            foolish      11
                                                         foolishly    2

Table 3.16. Cohorts at the appearance of the ambiguous fricative in the experiment of Pitt
and McQueen (1998, Experiment 3)

                               Continued with [s]        Continued with [ʃ]
Preceding context              Words        Frequency    Words        Frequency
jui_ [dʒu_]                    juice        2            —            0
                               juicy        2
bu_ [bʊ_]                      —            0            bush         4
                                                         bushels      2
                                                         bushes       1
mee_ [mi_]                     —            0            —            0
nay_ [neɪ_]                    —            0            nation       45
                                                         nations      43
                                                         nationwide   3
                                                         nationwide   1

No matter what context we choose, there is empirical data which the TP theory will

not cover. Our choice of which version to test will have to be based on other grounds.

There are two good theoretical reasons to choose a one-segment context for the present

study, and a third practical reason. First, we hope to equate "zero-frequency" and

"phonotactically illegal". This is plausible for sequences of length 2, but not for those of

length 4 - in the latter case, it leads to the claim that any 4-segment word which does not already exist, such as [uləʃ], is illegal. Second, the MERGE TP theory contrasts with

TRACE in excluding the lexicon from prelexical phonetic processing. As the size of the

context increases, so does the number of phoneme sequences whose frequencies the

prelexical module has to keep track of. Their number quickly approaches the size of the

lexicon. Finally, as a practical matter, long contexts make the frequency counts harder to

replicate, since each count is based on a smaller sample.

There is also empirical evidence supporting the one-segment context theory. Pitt (1998) undertook a replication and extension of the Massaro-Cohen (1983) experiments. He presented an [ɹ]-[l] continuum to American English listeners in the synthetic contexts [d_æ], [g_æ], [t_æ], [b_æ], and [s_æ], and measured listeners' [ɹ] and [l] judgments. He found a strong [ɹ] report bias (compared to the baseline [b_æ]) in [t_æ], a weaker one in [d_æ], none in [g_æ], and a strong [l] bias in [s_æ].

Absolute per-million frequencies of [l] and [ɹ] after each of the initial consonants are shown in Table 3.17. The ratio of these yields the a priori likelihood that an unknown liquid in that context will be an [l]. As we saw in §3.3.2.3, it is this ratio which, when the listener uses an optimal guessing strategy, predicts the size of the response bias. The order of effects predicted by the likelihood ratio is exactly the order of effects found by Pitt15:

15 Pitt himself interpreted these results as contradicting the probabilistic account of the phonotactic effect. This is because he assumed listeners were using a suboptimal guessing strategy. Rather than the likelihood ratio, he took the predictor of statistically-induced bias in favor of a given cluster to be the sum of the logarithms of the individual frequencies of the words in which it occurs.

Table 3.17. Likelihood ratio as a predictor of the phonotactic bias effects of Pitt (1998)

            Frequency                                          Empirically
            (Francis-     Ratio,                               measured bias
Sequence    Kucera)       F([l])/F([ɹ])    Statistical bias    (Pitt 1998)
[tl]        220           0.026            Strong [ɹ]          Strong [ɹ]
[tɹ]        8468
[dl]        275           0.136            [ɹ]                 Weak [ɹ]
[dɹ]        2020
[gl]        626           0.163            [ɹ]                 None
[gɹ]        3845
[bl]        2407          0.992            None                (Baseline)
[bɹ]        2426
[sl]        815           62.7             Strong [l]          Strong [l]
[sɹ]        13
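The ordering argument can be verified mechanically from the Francis-Kucera counts in Table 3.17. Here 'r' is an ASCII stand-in for [ɹ]; the sorting criterion is simply the likelihood ratio discussed above.

```python
# Recomputing the likelihood ratios of Table 3.17 from the Francis-Kucera
# counts: the ordering of the ratios matches the ordering of Pitt's
# empirically measured biases.
freqs = {
    ('t', 'l'): 220,  ('t', 'r'): 8468,
    ('d', 'l'): 275,  ('d', 'r'): 2020,
    ('g', 'l'): 626,  ('g', 'r'): 3845,
    ('b', 'l'): 2407, ('b', 'r'): 2426,
    ('s', 'l'): 815,  ('s', 'r'): 13,
}

def l_to_r_ratio(onset):
    """F([l]) / F([r]) after the given initial consonant."""
    return freqs[(onset, 'l')] / freqs[(onset, 'r')]

# Sorting by ratio reproduces Pitt's order of effects, from strongest
# [r] bias to strongest [l] bias: t < d < g < b < s.
order = sorted('tdgbs', key=l_to_r_ratio)
```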

For the present study, therefore, we will pursue a probabilistic theory of

phonotactics in perception which makes the following claims:

1. The mechanisms of speech perception have access to a table of length-2 or

length-3 sequences occurring in the English lexicon, including their empirical frequencies.

We will estimate those from the Celex statistics on British English, checking them against

the Francis-Kucera statistics on American English, using the procedure described in the

Appendix to this chapter.

88

Reproduced with permission o f the copyright owner. Further reproduction prohibited without permission.
2. When an acoustically ambiguous segment between x and y is presented in the context A_B, it will tend to be parsed as the one which is more frequent in that context.

The difference in rate of "x" report between the context A_B and the context C_D will

depend on the relative likelihood of x and y in those contexts. The influence of statistics

will be greatest where the acoustic ambiguity is greatest.

3. The relative likelihood of x and y in A_B and C_D can be computed in either of

two ways.

a. (INC-1 theory): Pr (x | A_) * Pr (x | _B) compared to Pr (y | A_) * Pr (y | _B).

b. (SC-1 theory): Pr (x | A_B) compared to Pr (y | A_B).

4. The TP effect happens very early, certainly prelexically. However, tasks that

directly tap phoneme perception (such as the syllable and phoneme judgment tasks used by

Massaro & Cohen (1983)) can be responded to on the basis of either a prelexical phonetic

representation, or on the basis of one retrieved from the lexicon after word recognition,

following the MERGE proposal of Norris et al. (2000).
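The two likelihood computations in claim 3 can be sketched as follows. The probability tables here are invented placeholders, not corpus estimates; the point is only the difference in how the two context halves are combined.

```python
# Sketch of the two context-combination rules in claim 3. The probability
# values below are illustrative placeholders, not corpus estimates.
p_after = {('A', 'x'): 0.02, ('A', 'y'): 0.01}    # Pr(seg | A_)
p_before = {('B', 'x'): 0.03, ('B', 'y'): 0.01}   # Pr(seg | _B)
p_both = {('A', 'B', 'x'): 0.004, ('A', 'B', 'y'): 0.001}  # Pr(seg | A_B)

def inc1_odds(left, right):
    """INC-1: left and right contexts contribute independently."""
    score = lambda seg: p_after[(left, seg)] * p_before[(right, seg)]
    return score('x') / score('y')

def sc1_odds(left, right):
    """SC-1: the two-sided context is treated as a single unit."""
    return p_both[(left, right, 'x')] / p_both[(left, right, 'y')]
```

With these placeholder tables the two rules give different odds (6 vs. 4), which is what makes the INC and SC theories empirically distinguishable in principle.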

3.4. A grammar-based account

A final possibility is that phonotactic regularities are not emergent, but fundamental;

that the mechanisms of speech perception have access to the possible, as well as the actual,

phonological configurations of their language, and are able to apply that knowledge in

perceptual tasks to constrain the hypothesis space.

The chief point at issue between the TRACE and MERGE TP theories on the one

hand and a grammatical theory on the other is the status of zero-frequency phoneme

sequences. TRACE and MERGE TP treat all such gaps alike: The model simply notes the

non-occurrence of a particular sequence, and favors occurring sequences over it. A

grammar-based theory can draw the distinction, discussed in §2.1, between true

phonological gaps (configurations which cannot occur) and mere lexical gaps

(configurations which happen not to have occurred).

That there is such a difference is the central claim we will test here. There are many

different ways to implement a theory of grammar in speech perception. The main model

parameters are the specific grammar to be used (§3.4.1.) and the rule for using it to decide

between alternative interpretations of an ambiguous phoneme (§3.4.2.). The model

presented here uses the grammatical framework of phonological Optimality Theory (Prince

& Smolensky 1993), entailing a decision model in which multiple candidate parses are

entertained in parallel.

3.4.1. Choice of grammatical theory

3.4.1.1. Grammatical framework

A grammar-based theory of phoneme perception could in principle be built around

any procedure which correctly separates the productive gaps from the non-productive ones.

I have chosen phonological Optimality Theory (Prince & Smolensky 1993, McCarthy &

Prince 1995).

OT is particularly well-suited to phonotactic modelling because phonotactic

markedness is a theoretical primitive in OT, embodied in the ranked markedness constraints.

Surface representations can be compared for markedness by scoring them with respect to

those constraints. This contrasts with rule-based theories, in which markedness is an

emergent phenomenon.

Markedness, however, is not the same thing as illegality. A configuration is illegal

in an OT grammar if it is never in the output, regardless of what the input is. This is

reflected in the grammar by having a markedness constraint against the illegal configuration

dominate all of the faithfulness constraints which aim to preserve it, so that an input

containing the illegal configuration will be realized without it. The grammar may contain

many markedness constraints which do not dominate the relevant faithfulness constraints

and hence do not trigger repairs; configurations violating only such constraints are not

illegal (though they may be marked in other ways).

An example is the case of *PAL, the constraint which forbids palatal consonants (§2.3.2.3.4). It dominates the faithfulness constraint IDENT[BACK], and hence is able to compel violations of it:

(3.18) *PAL » IDENT[BACK]

        /ca/       *PAL     IDENT[BACK]
        [ca]       *!
     -> [ka]                *

A constraint C in a grammar is said to be active for an input i if at least one

candidate is eliminated by C (Prince & Smolensky 1993, Chapter 5). In (3.18), for

example, *PAL is active for the input /ca/, because it is there that [ca] is eliminated.

In a given grammar, some constraints are never active for any input. An example in English is *VOICE]σ, forbidding voiced obstruents in syllable codas (McCarthy 1998). Voicelessness is obligatory for coda obstruents in many languages, including the standard varieties of Russian, Polish, German, and Turkish. Illegal coda clusters are repaired phonologically by devoicing, indicating that faithfulness constraints are being violated in order to satisfy *VOICE]σ. (Hence, *VOICE]σ is active for some inputs in those languages - it is the constraint which eliminates the candidate outputs with voiced coda obstruents.) In English, however, *VOICE]σ is ranked too low to have such an effect. English tolerates voiced coda obstruents (tub, leave, brag, etc.). No candidate is ever eliminated by *VOICE]σ, which, therefore, is inactive in English.

(3.19) IDENT (for instance) » *VOICE]σ

        /liv/      IDENT    *VOICE]σ
     -> [liv]               *
        [lif]      *!

The general unmarkedness of voiceless codas in acquisition and cross-linguistically, like the general markedness of palatals, is crucial to any grammatical model of language,

but are outside the scope of statistically-based models such as TRACE and MERGE TP.

3.4.1.2. Particular grammar

Within a given framework, there are at least as many different grammars as there are

languages, and for each language, perhaps as many as there are linguists. The predictions

of the perceptual model depend on which one is selected. If experiment falsifies these

predictions, the problem may lie with either the perceptual theory itself or with the

grammatical analysis of the given language (just as, if experiment falsifies a probabilistic

theory, the problem may be due to the perceptual theory itself, or to faulty frequency

counts).

It is therefore important to start by examining phenomena whose predicted

perceptual effects are insensitive to the choice of a specific analysis. The lack of onset [tl] and [dl] clusters, for instance, is a good choice, because those onsets are so robustly illegal that

any theory of English grammar (Optimality-Theoretic or not) has to ban them. In an OT

framework, that means they must be ruled out by some markedness constraint, which ipso

facto is active. Even if our analysis in Chapter 2 has pointed the finger at the wrong

markedness constraint, any alternative analysis will have a different one from which the

same perceptual consequences will follow.

Thus, we expect the perceptually influential phonotactic constraints of a language to

include, at a minimum, the ones which are by all measures productive: those that are not

naturally violated (having no lexical exceptions) and that speakers cannot be induced to

violate (either because the banned configurations trigger repairs, or because the speaker

simply cannot pronounce them without great effort). For more detailed discussion, see

§2.2.

3.4.2. Decision mechanism

The most straightforward adaptation of OT to speech perception is essentially this:

Linguistic effects on speech perception come about because language limits the set of

available parses. The listener constructs a phonological parse at two levels of representation,

corresponding to OT's underlying /UR/ and surface [SR]. The [SR] is computed from the

acoustic signal, while the /UR/ is retrieved from the lexicon. Different (/UR/, [SR]) pairs

compete to account for the observed signal, with the grammar as referee: The (/UR/, [SR])

pairs are scored by the hierarchy of markedness and faithfulness constraints of the

language, and perception favors the most harmonic pair. Thus, the OT grammar does the

same job in speech perception that it does in linguistic theory: It compares (/UR/, [SR])

pairs and picks the most harmonic.

For example, in Pitt (1998)'s replication of the experiments of Massaro & Cohen

(1983), using nonword stimuli, there are no /UR/s to deal with, so the issue is decided by

the markedness constraints:

(3.20) Both endpoints legal => no grammatical bias

        UR = •           OCP(CONT, COR)    SPREAD[COR]
     a. (•, [bɹæ])
     b. (•, [blæ])
(3.21) [l] illegal => [ɹ] bias

        UR = •           OCP(CONT, COR)    SPREAD[COR]
     a. (•, [tɹæ])
     b. (•, [tlæ])                         *!

(3.22) [ɹ] illegal => [l] bias

        UR = •           OCP(CONT, COR)    SPREAD[COR]
     a. (•, [sɹæ])       *!
     b. (•, [slæ])

In this model, the incoming acoustic signal is first transduced into one or more

surface phonetic representations, or [SR]s. The transducing mechanism is a black-box

component which in Figure 3.23 is labelled "Phonetic Parser"; it could also be called a "Feature Extractor". Given a speech stimulus, it produces a set of [SR]s consistent with that

stimulus.

Figure 3.23. Architecture of an OT-grammar-based parsing model

[Diagram: the acoustic signal feeds the Phonetic Parser, which emits one or more [SR] parses; these activate candidate /UR1/, /UR2/, /UR3/, ..., /URn/ in the Lexicon; the resulting (/UR/, [SR]) pairs compete through the OT grammar.]

I assume that under normal laboratory conditions, with a short stimulus clearly spoken, the Phonetic Parser will emit a single [SR]. Two or more [SR]s can be coaxed out of it by presenting an acoustically ambiguous stimulus. The likelihood that a stimulus between, say, [ɹ] and [l] will evoke [ɹ] is assumed to be independent of the likelihood that it will evoke [l]; for a given stimulus level, there is a certain probability of getting [ɹ], a certain probability of getting [l], a certain probability of getting both, and a certain probability of getting neither. These probabilities change depending on the acoustic constitution of the stimulus.
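The assumed independence can be made concrete in a small sketch. The probability values and the seeded random generator are illustrative assumptions, not parameters from the model.

```python
import random

# Sketch of the assumed Phonetic Parser behavior: for a given stimulus
# level, each candidate percept is evoked independently, so the parser
# may emit one [SR], both, or neither.
def emit_SRs(p_r, p_l, rng=random.Random(0)):
    """Return the set of [SR]s evoked, with independent probabilities."""
    srs = set()
    if rng.random() < p_r:
        srs.add('r')   # stands in for the [r]-percept
    if rng.random() < p_l:
        srs.add('l')   # stands in for the [l]-percept
    return srs

# For a mid-continuum stimulus with p_r = p_l = 0.6, independence gives:
p_both = 0.6 * 0.6      # both percepts evoked: 0.36
p_neither = 0.4 * 0.4   # neither evoked: 0.16
```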

The candidate [SR]s are assumed to represent syllabification. It is not in dispute

that syllables can be incorporated into a prelexical representation, since nonsense words,

which lack a lexical representation, can be syllabified in off-line judgment tasks. The

question is whether the syllabic structure is automatically computed as part of the parsing

process. There is evidence that it is. Syllable boundaries are needed for segmentation and

lexical access, so they have to be marked in the input to the lexical-access stage. In on-line

word-spotting tasks, English listeners are better at finding a word boundary when it is

aligned with the left-hand boundary of a stressed syllable (McQueen et al. 1994, Cutler &

Norris 1988), which suggests that the input is parsed exhaustively into feet. A word

boundary is harder to find if a syllable boundary drawn at that point would create a

phonotactically impossible syllable (e.g., spotting apple in fapple) (Norris et al. 1997,

McQueen et al. 1998). Abstract grammatical preferences for maximal onsets, and for

syllabifying intervocalic consonants with the more stressed vowel, are also detectable in

word-spotting - generalizations which only make sense when stated in terms of syllables

(Kirk 2001).16

As in the Shortlist model of Norris (1994), lexical entries are activated by an [SR]

which is sufficiently similar to them. I will assume that "sufficient" similarity is determined

by the neighborhood metric; a lexical competitor is any lexical item whose /UR/ can be

obtained from one of the active [SR]s by a one-segment insertion, deletion, or replacement.

Competition between all of the active (/UR/, [SR]) pairs then takes place through the

grammar.17,18

16 Evidence against syllabification comes chiefly from sequence-monitoring tasks. These are known to be sensitive to syllabification in certain languages. For example, Mehler, Dommergues, Frauenfelder, and Segui (1981) found that French listeners detected a CV or CVC target faster when it exactly matched the first syllable of the stimulus word - ba being detected faster in ba.lance than in bal.con, but bal being detected faster in bal.con than in ba.lance. English speakers show no such difference, whether tested with English (ba/bal in bal.cony and the ambisyllabic ba.lance/bal.ance) or with the original French materials (Cutler, Mehler, Norris, & Segui 1986). The authors interpreted this to mean that English listeners do not use on-line syllabification to segment speech, even in cases like bal.cony where the syllabification is unambiguous. However, Kirk (2001) argues instead that, owing to the effects of stress on syllabification, the first syllable of balcony is balc, and hence that neither target matched a syllable. For an extensive critical review of the evidence for and against on-line syllabification in English and other languages, see Kirk (2001, Chapter 2).
17 Although this description is confined to the single-word stimuli actually used in the experiments, it can be extended to longer utterances in a straightforward way. The Phonetic Parser emits one or more candidate [SR]s, as before. Candidate word /UR/s are activated by sufficiently similar substrings of the [SR]s; candidate utterance /UR/s are the concatenations of nonoverlapping word /UR/s. These utterance (/UR/, [SR]) pairs then compete as before. This allows the theory to capture word segmentation and inter-word sandhi phenomena.
18 For bilingual listeners, this model assumes that the grammar is selected before the utterance is parsed phonologically. It is conceivable that bilingual listeners parse an incoming utterance in both languages simultaneously, and semantically interpret it in whichever language yields the more harmonic phonological parse. However, existing studies of language identification suggest that infant and adult listeners use rhythmic characteristics, rather than inventory or phonotactics (Stockmal, Moates, & Bond 2000; Nazzi, Jusczyk, & Johnson, 2000). Languages with similar rhythm can be discriminated by infants (e.g., Catalan and Spanish, Bosch & Sebastián-Gallés 1997), but it is not known what information is used.

It is at this point natural to posit that the grammar scores each pair and chooses the most harmonic. This theory is elegant but fatally flawed. The root of the problem is that such a literal implementation of OT inherits the OT principle called "Strictness of Domination", which states that if Constraint A is ranked above Constraint B, then violating Constraint A even once is less harmonic than violating Constraint B any number of times. Strict domination is the only means which OT affords to represent one constraint's primacy over another. Violating A cannot be just a little bit worse than violating B; it must be either infinitely worse, equally bad, or infinitely better (Prince & Smolensky 1993:78).
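Strict domination amounts to lexicographic comparison of violation profiles, which can be sketched as follows (an illustration, not part of the dissertation's apparatus):

```python
# Sketch (not from the dissertation): under strict domination, candidates
# are compared lexicographically on their violation counts, ordered from
# highest- to lowest-ranked constraint, so a single violation of a
# higher-ranked constraint outweighs any number of lower-ranked ones.

def more_harmonic(profile_a, profile_b):
    """Each profile is a tuple of violation counts, highest-ranked
    constraint first. Returns True if A is strictly more harmonic."""
    return profile_a < profile_b  # Python tuples compare lexicographically

# Candidate A: no top-constraint violations but a thousand lower ones.
# Candidate B: one violation of the top constraint, none below.
assert more_harmonic((0, 1000), (1, 0))       # B's single top violation loses
assert not more_harmonic((1, 0), (0, 1000))
```

The tuple comparison makes the "infinitely worse" property concrete: no count in a lower position can ever compensate for a difference in a higher one.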

One consequent prediction is that inactive markedness constraints should have just as big an impact on perception as active ones. In order to account for cross-linguistic patterns in the sound inventories of languages, OT posits certain universally fixed rankings - e.g., that labial and dorsal articulations are universally more marked than coronal articulations. In the grammar of English, which allows both labials and coronals, neither the anti-labial constraint nor the anti-coronal constraint is active; however, they are still in the grammar and one still dominates the other. A stimulus which is ambiguous between [ba] and [da] therefore has two interpretations, one of which violates the constraint against labials, the other of which violates the lower-ranked constraint against coronals. Since labials are by hypothesis infinitely worse than coronals, perception should strongly favor [da] over [ba]. This does not always seem to be the case: For example, when Luce (1986, Ch. 3) presented listeners with a balanced set of CVC nonsense words in noise at a +5 dB signal-to-noise ratio, he found that final [b] was reported as [d] 27 times out of 150, while final [d] was reported as [b] 24 times.19

In order to make the necessary distinction between significant and insignificant markedness differences, we must stipulate that inactive markedness constraints carry little if any perceptual weight. That is, the (/UR/, [SR]) pairs are evaluated only by that part of the constraint hierarchy which ranks above the highest-ranked inactive markedness constraint.

19 Syllable-initially, [b] was reported as [d] far more often than the reverse: 39 times versus 4 at a +5 dB signal-to-noise ratio, 23 times versus 4 at -5 dB. Even syllable-final [b] was reported as [d] 15 times at a -5 dB ratio, versus 6 times the other way around. It may be that the low markedness, or perhaps the high frequency, of [d] is having some sort of effect. This effect is, however, not as overwhelming as expected, and may in any case be due to the spectral quality of the noise used (white Gaussian noise up to 4.8 kHz), which is similar to the diffuse-rising spectrum of alveolar plosive bursts (Blumstein & Stevens 1979).

A second problem that this proposal runs into is the same one which bedeviled TRACE: the inability to adjust the influence of the lexicon based on attentional factors. Lexical effects, such as the Ganong effect, would in this theory be captured by faithfulness constraints:

(3.24) [d] nonword, [t] word => [t] bias

         /UR/ = /tæsk/           | ID-Voice
    a.       (/tæsk/, [dæsk])    |    *!
    b.  =>   (/tæsk/, [tæsk])    |

(3.25) [d] word, [t] nonword => [d] bias

         /UR/ = /dæʃ/            | ID-Voice
    a.  =>   (/dæʃ/, [dæʃ])      |
    b.       (/dæʃ/, [tæʃ])      |    *!
The violation, and hence the predicted bias, is just as large regardless of how much attention the listener allocates to the lexicon, contrary to the findings of Cutler et al. (1987).

This is only an apparent problem. All experimental results are obtained by averaging over a large number of trials. Suppose that on each trial, the listener either "attends to the lexicon" - i.e., insists on a parse with a /UR/ - or does not, on a trial-by-trial basis. The task manipulations used by Cutler et al. (1987) can be seen as changing the probability, rather than the extent, of the listener's attention to the lexicon on each trial, and hence the number of lexically-biased responses which were averaged into the data.
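The point can be illustrated with a toy simulation (the attention probability and the per-trial bias values below are invented for illustration):

```python
import random

# Sketch: attentional manipulations change the PROBABILITY of a
# lexically-biased trial, not the size of the bias on any one trial.
# p_attend and the per-trial bias values are invented for illustration.

def mean_bias(p_attend, trials=100000, seed=1):
    rng = random.Random(seed)
    lexical_bias = 1.0   # bias on a trial where the listener attends to the lexicon
    baseline_bias = 0.0  # bias on a trial where she does not
    total = 0.0
    for _ in range(trials):
        attending = rng.random() < p_attend
        total += lexical_bias if attending else baseline_bias
    return total / trials

# Averaged over many trials, the apparent bias scales with p_attend,
# even though every attended trial carries the same full-sized bias.
assert mean_bias(0.2) < mean_bias(0.8)
```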

The decision rule we must adopt is therefore:

(3.26)
a. If attending to the lexicon, and if the stimulus is close enough to a real word to activate some /UR/s, choose the (/UR/, [SR]) pair which scores best on the active constraints.

b. Otherwise, choose the [SR] which scores best on the active markedness constraints (the faithfulness constraints have no function in the absence of a /UR/).

c. Ties are broken randomly.
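A minimal sketch of this decision rule, with an illustrative violation scorer (the candidate encoding and function names are invented, not the dissertation's implementation):

```python
import random

# Sketch of decision rule (3.26); the (UR, SR) encoding and the external
# violation scorer are illustrative assumptions.

def choose(candidates, active_violations, attending_to_lexicon):
    """candidates: (UR, SR) pairs, UR possibly None.
    active_violations(c): the candidate's violation profile on the active
    constraints, highest-ranked first (lower profiles are more harmonic)."""
    if attending_to_lexicon:
        # (3.26a): insist on a lexically parsed percept.
        pool = [c for c in candidates if c[0] is not None]
    else:
        # (3.26b): with no /UR/, only markedness can apply to the bare [SR].
        pool = [(None, sr) for (_, sr) in candidates]
    best = min(active_violations(c) for c in pool)
    winners = [c for c in pool if active_violations(c) == best]
    return random.choice(winners)  # (3.26c): ties are broken randomly
```

With a scorer that penalizes the unfaithful pair (/tæsk/, [dæsk]), attending to the lexicon selects (/tæsk/, [tæsk]), reproducing the [t] bias of tableau (3.24).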

As in the Race model of Cutler et al. (1987), responses to phoneme tasks can be based on either the computed [SR] or the retrieved /UR/, with task constraints dictating which is favored in each case. There is laboratory evidence for the existence of both levels and for attentional effects on them.

Xu (1991) showed that Mandarin Chinese speakers had poorer recall for written lists of rhyming morphemes when the list elements shared the same tone than when they differed in tone. Speakers were then asked to perform the same task with lists constructed so that the first two items had the same surface tone but different underlying tones, and performance was compared with lists in which the first two items had different surface and underlying tones. Performance was worse on lists of the first sort, suggesting that the short-term memory representation in this task was in terms of [SR]s.

On the other hand, Lahiri & Marslen-Wilson (1991), using a gating task, found that listeners interpreted vowel nasalization differently depending on their native language: English listeners took it as a sign of an upcoming nasal consonant, since English vowels are not inherently nasalized, but become so in a nasal phonetic context. Bengali listeners, on the other hand, speak a language which has both inherently (i.e., contrastively) nasalized vowels and contextually nasalized vowels. They overwhelmingly interpreted vowel nasalization as underlying (i.e., did not take it as a sign that a nasal consonant was coming up) until they actually heard the beginnings of the nasal consonant. This suggests that the Bengali lexicon represents contextually nasalized vowels as not nasalized, showing a difference between lexical representation and surface phonetic representation which the gating task (an inherently lexical task) revealed. It further indicates that the Bengali speakers were choosing the more faithful (/UR/, [SR]) pair in which the underlying and surface vowel had the same degree of nasalization over the less faithful pair in which they differed, as we would expect.20

3.5. Summary

This chapter has presented three very different theories of phonotactics in speech perception.

The TRACE model sees phonotactics as an effect of similarity to lexical items. Of the three theories, it makes the smallest demands on the learner, requiring knowledge only of the lexicon. Phonotactic effects are viewed as diluted lexical effects, in which permitted configurations are supported by partially-overlapping lexical items, which allows them to defeat competing illegal candidates via lateral inhibition at the phoneme level.

The MERGE TP model attributes phonotactic effects to differing frequencies of short phoneme sequences. The theory requires knowledge of the lexicon and of a set of attested phoneme sequences, which may be quite large but can be acquired straightforwardly through observation. Phonotactic effects are taken to occur at a pre-lexical level, with rare sequences being perceptually disfavored.

20 If the OT interpretation of Lahiri & Marslen-Wilson's results is correct, it is empirical evidence against the OT principle of Lexicon Optimization (Prince & Smolensky 1993, Inkelas 1994). Lexicon Optimization is a means of dealing with the source-filter nature of the OT grammatical model, which can map several /UR/s to the same [SR]. In acquiring the lexicon, it is asserted, the /UR/ which is chosen is the one to which the observed [SR] is most faithful.
We would be led to expect Bengali speakers to represent surface [CṼN] words as underlying /CṼN/, which map to the same output more faithfully than an underlying /CVN/ would. We would therefore expect a gating stimulus of the form [CṼ...] to often be completed with an N, i.e., matched to a word whose underlying representation is /CṼN/. Instead, they were overwhelmingly matched to words whose underlying representation was /CVN/, suggesting that the surface [CṼN] words are underlyingly /CVN/. The study's finding that speakers apparently lexicalize surface contextually nasalized vowels as underlying non-nasalized vowels indicates they are not using Lexicon Optimization.

The OT grammatical model sees perceptual phonotactic effects as a consequence of the limited range of parses available in the language, and the listener's bias towards a parsed percept. The implementation used here requires knowledge of the lexicon, and of a set of constraints. The number of constraints needed is probably not very large (a grammar of the syllable onsets of English, in Chapter 2, needed well under 20), and the correct ranking is provably learnable (Tesar & Smolensky 1995); however, their provenance is unclear. They are normally taken to be innate, since the patterns they represent occur world-wide.21 Phonotactic effects are assumed to occur at a prelexical level, the level of surface representations, with banned sequences being perceptually disfavored.

Each of these theories suffers from empirical drawbacks in one domain or another. TRACE has difficulty explaining why phonotactic effects are more robust than lexical effects. The MERGE TP model cannot be pinned down on precisely which phoneme sequences are perceptually relevant; different choices leave different lab results unexplained. The OT grammatical model accounts for effects of illegality, but not the apparent (usually very small, but definitely detectable) effects of sequence frequency or of lexical neighborhood (Newman, Sawusch, & Luce 1997; Pitt & McQueen 1998, Exp. 4).

The drawbacks of one model are, naturally, the advantages of the others. TRACE is theoretically attractive because it offers an extremely parsimonious learning model. Because sound-meaning relations are arbitrary, the lexicon must be learned in any theory. TRACE says that only the lexicon must be learned, and that apparent effects of grammatical regularity are really emergent properties of lexical interaction. MERGE TP is only slightly less parsimonious: only the lexicon must be learned, but the relevant regularities have to be actively abstracted from it by the probability-tracking system. Though both theories require innate structure in the perceptual system, neither requires detailed innate knowledge the way the OT grammatical theory does. As a practical matter, it is also easier to make predictions from TRACE and MERGE TP than from any grammatical theory, since less analytic depth is required.

21 Moreover, since the constraints are violable and do get violated, they cannot be individually inferred from the speech corpus by any simple mechanism, especially the markedness constraints, being prohibitions for which no positive evidence can exist. (Naturally, a linguistically more sophisticated mechanism could take advantage of alternations to deduce abstract underlying forms and the markedness constraints necessary to cause the alternations.)

In CHAPTER 4, our focus will be on the interesting claim, put forth by the TRACE and MERGE TP theories and denied by the OT grammatical theory, that phonotactic illegality is equivalent to zero frequency. The claim is interesting because it suggests that phonology, at least in perception, is considerably simpler than many linguists have hitherto supposed, and offers a means of circumventing the difficult problem of grammar acquisition.

3.6. Appendix: Computing frequencies

All frequency counts were made from the Celex lexical database (Baayen et al. 1995). This is based on a corpus of 16.6 million words of written English and about 800,000 words of spoken English. Most of the corpus is from British sources, and the phonetic transcriptions are British.

Celex provides two ASCII phonetic transcription systems. I used the one found in Field 7 of the file EPW.CD. Variant pronunciations are given for some words, but I always used only the first pronunciation listed.

Celex gives frequency counts by "lemma" (i.e., citation form, with know, knows, knew, and knowing all lumped together) and by "wordform" (i.e., counting inflected forms separately). In both cases, homophonous words belonging to different grammatical categories are counted separately (e.g., link noun and link verb). I used the wordform database (except where otherwise noted), specifically, the files EPW.CD (the pronunciations) and EFW.CD (the frequencies).

Frequencies are counted separately for the written and spoken corpora. A "combined" frequency count is also given; since most of the corpus is written, the "combined" frequency is usually very close to the written frequency. I have used the spoken counts except where otherwise noted (non-spoken frequencies are used only for compatibility with counts based on the Francis-Kucera (1967) written-corpus norms). All counts are from the Celex per-million-words estimates (combined, Field 6; written, Field 9; spoken, Field 12; of EFW.CD).

The Celex transcription system marks syllable boundaries and includes stress marks for primary and secondary stress. These were removed.

The scripts used to create and process the frequency counts are appended.

(3.27) Script for counting frequency of length-n sequences

#!/usr/local/bin/perl
# make_ngram_table
# usage: make_ngram_table <n>
#   where <n> = # of segments per gram
# Each word is enclosed in word-boundary markers "(" and ")", which
# count as phonemes.

$n = $ARGV[0];

$phon_db = '/tmp/Celex/EPW.CD';
$freq_db = '/tmp/Celex/EFW.CD';
open (PHON, "< $phon_db") || die "Couldn't open $phon_db";
open (FREQ, "< $freq_db") || die "Couldn't open $freq_db";

while ( ($phon_buf = <PHON>) && ($freq_buf = <FREQ>) ) {

    # Read a record from EPW.CD and EFW.CD
    ($phon_IDnum, $phon_orth, $phon_freq_comb, $foo, $foo, $foo, $pron) =
        split /\\/, $phon_buf;

    ($freq_IDnum, $freq_orth, $foo,
     $freq_comb, $freq_comb_dev, $freq_comb_perM, $freq_comb_log10,
     $freq_writ, $freq_writ_perM, $freq_writ_log10,
     $freq_spok, $freq_spok_perM, $freq_spok_log10
    ) = split /\\/, $freq_buf;

    ( ($phon_IDnum == $freq_IDnum) && ($phon_orth eq $freq_orth) ) ||
        die "$phon_db mismatches $freq_db:\n$phon_buf$freq_buf";

    # Purge pronunciation of non-segmental characters
    # (stress marks and syllable boundaries)
    ($segment_pron = $pron) =~ tr/'"\-//d;

    # Find the n-grams and count their frequencies
    @segments = ( '(', (split //, $segment_pron), ')' );
    foreach $i (0 .. ($#segments - $n + 1)) {
        $ngram = join '', @segments [$i .. ($i + $n - 1)];

        $freq_comb_perMs {$ngram} += $freq_comb_perM;
        $freq_writ_perMs {$ngram} += $freq_writ_perM;
        $freq_spok_perMs {$ngram} += $freq_spok_perM;
    }
}

# Print out the n-grams and their frequencies
foreach $ngram (keys %freq_comb_perMs) {
    printf "%s\t%5d\t%5d\t%5d\n",
        $ngram,
        $freq_comb_perMs {$ngram},
        $freq_writ_perMs {$ngram},
        $freq_spok_perMs {$ngram};
}
(3.28) Script for turning those counts into TPs

#!/usr/local/bin/perl
# make_TP_table
# Given a list of n-grams and their frequencies, computes transitional
# probabilities from X1X2...X(n-1) to Xn.
#
# Input format is
#   <X1...X(n-1)Xn> <combined freq> <written freq> <spoken freq> etc.
#
# Output format is
#   <X1...X(n-1)> <Xn> <P(Xn | X1...X(n-1)), combined> <same, written> etc.

while ($buf = <STDIN>) {
    ($ngram, @freqs) = split /\s+/, $buf;

    @segments = split //, $ngram;
    $lastseg = pop @segments;
    $context = join '', @segments;

    # Count occurrences of each context X1...X(n-1), and of each
    # n-gram X1...Xn.
    foreach $i (0 .. $#freqs) {
        $context_freqs [$i] {"$context"} += $freqs [$i];
        $ngram_freqs [$i] {"$context$lastseg"} += $freqs [$i];
    }
}

# Compute transition probabilities conditional on X1...X(n-1).
foreach $ngram (keys %{$ngram_freqs[0]}) {

    @segments = split //, $ngram;
    $lastseg = pop @segments;
    $context = join '', @segments;

    foreach $i (0 .. $#freqs) {
        $TPs [$i] = '(none)';
        next unless $context_freqs [$i] {"$context"};   # avoid /0 errors
        $TPs [$i] = sprintf "%6.3f",
            ($ngram_freqs [$i] {"$context$lastseg"} /
             $context_freqs [$i] {"$context"});
    }

    print "$context\t$lastseg\t";
    print join "\t", @TPs;
    print "\n";
}

(3.29) Script for finding the active cohort following a given phonological string

#!/usr/local/bin/perl
# cohort
# Given a phoneme string, find all words in Celex EPW.CD/EFW.CD
# which begin with that string. Print each word and its per-
# million frequencies.
#
# Usage: cohort <string>

$beginning = shift @ARGV;

open (PHON, "cat /tmp/Celex/EPW.CD |") || die "Couldn't open EPW.CD";
open (FREQ, "cat /tmp/Celex/EFW.CD |") || die "Couldn't open EFW.CD";

while ( ($phon_buf = <PHON>) && ($freq_buf = <FREQ>) ) {

    ($phon_IDnum, $phon_orth, $phon_freq_comb, $foo, $foo, $foo, $pron) =
        split /\\/, $phon_buf;

    ($freq_IDnum, $freq_orth, $foo,
     $freq_comb, $freq_comb_dev, $freq_comb_perM, $freq_comb_log10,
     $freq_writ, $freq_writ_perM, $freq_writ_log10,
     $freq_spok, $freq_spok_perM, $freq_spok_log10
    ) = split /\\/, $freq_buf;

    ( ($phon_IDnum == $freq_IDnum) && ($phon_orth eq $freq_orth) ) ||
        die "EPW.CD mismatches EFW.CD:\n$phon_buf$freq_buf";

    # Purge pronunciation of non-segmental characters
    ($segment_pron = $pron) =~ tr/'"\-//d;

    # Is it in the cohort?
    next unless ($segment_pron =~ /^\Q$beginning\E/);

    # Yes -- print
    printf "%s ",   $beginning;
    printf "%6d ",  $freq_comb_perM;
    printf "%6d ",  $freq_writ_perM;
    printf "%6d\t", $freq_spok_perM;
    printf "%s\t",  $freq_orth;
    printf "%s\n",  $segment_pron;
}

(3.30) Script for simulating statistically-based guessing

#!/usr/local/bin/perl
# simulated_guess
# Simulated experiment, illustrating the usefulness of TPs. A subject
# hears a corpus of English (a list of words, each word selected from
# Celex such that the English vocabulary occurs with its empirical
# frequency). At random, infrequent intervals, a word is truncated
# after at least (n-1) segments. The listener predicts the next one
# by consulting a table of n-grams, and guessing that the next segment
# will be whatever is most likely to follow the last (n-1) segments
# of the stimulus.
#
# Input is output of make_ngram_table, awked to: <gram> <freq>
#
# Output is the expected proportion of trials on which the subject
# guesses correctly.

# Count frequencies
while ($buf = <STDIN>) {
    ($gram, $freq) = split /\s+/, $buf;

    @segs = split //, $gram;
    $nextone = pop @segs;
    $context = join '', @segs;

    $totfreq += $freq;           # frequency with which ngrams occur
    $cfreq {$context} += $freq;  # freq of ngrams starting with this (n-1)-gram
    $gfreq {$gram} += $freq;     # frequency of this ngram

    # Keep track of likeliest ngram beginning with each (n-1)-gram
    $current_best = $best_guess {$context};
    if ($freq > $gfreq {"$context$current_best"}) {
        $best_guess {$context} = $nextone;
    }
}
$n = length ($gram);

# Print guessing strategy
foreach $context (keys %cfreq) {
    print "$context $best_guess{$context}\n";
}
print "\n";

# Simulate experiment
srand (time());
$CELEX_SIZE = 18000000;
$TRIALS = 100000;

open (EPW, "cat /tmp/Celex/EPW.CD |") || die "Couldn't open EPW";

while ($buf = <EPW>) {
    ($foo, $orth, $comb_freq, $foo, $foo, $foo, $pron) = split /\\/, $buf;

    next unless $comb_freq;

    $seg_pron = '';

    TRIAL: for ($i = 1; $i <= $comb_freq; $i++) {

        # Each word gets as many lottery tickets as its frequency,
        # and each ticket has $TRIALS/$CELEX_SIZE chance to win. This
        # insures that any given word has its natural probability of
        # being used on any given trial, and that the expected # of
        # trials is $TRIALS.
        $r = int (rand ($CELEX_SIZE));
        if ( $r >= ($CELEX_SIZE - $TRIALS) ) {

            unless ($seg_pron) {
                ($seg_pron = $pron) =~ tr/'"\-//d;
                $seg_pron = "(" . $seg_pron . ")";
                last TRIAL if (length ($seg_pron) < $n);
            }
            print "$orth $i $seg_pron ";

            $ngram_start = int (rand (length ($seg_pron) - $n + 1));
            $ngram = substr ($seg_pron, $ngram_start, $n);
            $context = substr ($ngram, 0, $n-1);
            $nextone = substr ($ngram, -1);
            print "$context $nextone\n";
            $total_trials++;
            $correct_trials++ if ($nextone eq $best_guess {$context});
        }
    }
}

print "$n: $correct_trials right out of $total_trials: ";
printf "%6.3f\n", $correct_trials/$total_trials;

CHAPTER 4

EMPIRICAL TESTS

4.1. Introduction

The previous chapter described three contrasting theories of the origin of phonotactic effects in phoneme tasks. TRACE (McClelland & Elman 1986) holds that phoneme perception at the very lowest level is directly influenced by the downward spread of activation from the lexicon. The MERGE TP theory (Pitt & McQueen 1998) assigns the phonotactic effects to a prelexical level of processing which is sensitive to the frequencies of sub-word phoneme sequences. A performance theory that uses grammatical knowledge, implemented using Optimality Theory (Prince & Smolensky 1993), attributes phonotactic effects to the restrictions placed by the sound pattern of the language on the set of available parses.

In this chapter, we will present empirical evidence bearing on these theories, focusing on how each theory distinguishes "legal" from "illegal" sequences.

Experiment 1, aimed at distinguishing TRACE from the MERGE TP and OT grammatical theories, shows that the size of the phonotactic boundary shift is not modulated by degree of phonological overlap with existing words. This is unexpected in TRACE, which takes phonological overlap to be the source of all phonotactic effects. It is expected in MERGE TP and the OT grammatical theory - in both of which the phonotactically relevant context is too small to include the region manipulated to produce overlap.

Experiment 2, intended as a replication of Experiment 1 with very different stimuli, attempted to find a phonotactic effect of the markedness of word-initial [pw] in English. No such effect was found. This is unexpected under the TRACE and MERGE TP theories, since the statistical properties of [pw] are very similar to those of [tl] and [sɹ] (the clusters used by Massaro & Cohen 1983). A natural explanation in the OT grammatical theory is that [pw], although rare in English, is not illegal - it does not violate an active markedness constraint.

Experiment 3 was therefore performed to compare directly the phonotactic effectiveness of the markedness of [pw] and [tl]. The results indicated that [pw] was in fact less disfavored than [tl]. This result is expected in the OT grammatical theory, but is quite unexpected in MERGE TP. Experiment 3 also replicated Experiment 1 in finding no modulating effect of the degree of phonological overlap with existing words, contrary to the predictions of TRACE.
predictions of TRACE.

Where Experiments 1-3 looked at the effect of stimulus variables on response variables, Experiments 4 and 5 investigated the dependency between different responses. Listeners heard stop-sonorant clusters in which both consonants were ambiguous, and judged both. The effect of the stop judgment on the sonorant judgment was assessed separately for each individual stimulus in a 6x6 array, providing a measure of phonotactic bias when all acoustic factors were completely fixed. The sonorant in both experiments was an "l"-"w" scale.

Experiment 4, using CCV stimuli, found that a "d"-or-"g" decision affected the odds of an "l" response, with "d" making "l" less likely, while a "b"-or-"g" decision had no effect. This confirms that the results of Experiments 2 and 3 (the smaller bias against [pw] than [tl]) were not due simply to closer perceptual spacing of the stimuli at the labial end of the scale. The existence of a response dependency is inconsistent with the TRACE response mechanism; the larger effect of the "d"-or-"g" decision is unexpected in the INC-1 version of MERGE TP.

Experiment 5 compared the effect of a "d"-or-"b" decision in CCV stimuli with that in VCCV stimuli. There was a strong effect in the CCV condition, but none in the VCCV condition, indicating that the weakness of the ban on [pw bw] found in Experiments 2-4 was not due to compensation for coarticulation, and suggesting that the parser determines segmental identity and syllabification in parallel.

Experiments 6ab examined the perceptual effects of an abstract morphophonological variable, lexical stratum membership in Japanese. Phonological cues to stratum membership were found to cause the phonotactics of the particular stratum to be imposed upon ambiguous stimuli, causing a perceptual boundary shift. This has a natural account in the grammatical theory, where lexical stratum is a necessary theoretical entity. It is unexpected in TRACE and the MERGE TP theory, for both of which a division of the lexicon into strata is unmotivated. It is shown that the stratum-phonotactic effect cannot be emergent in TRACE, because it is weaker than a lexical effect obtained with the same subjects and paradigm. It cannot be emergent in the MERGE TP theory either, because perception of the ambiguous segment is influenced by other segments which are too far away for MERGE TP to connect them.

The results are argued to support the OT grammatical theory over TRACE and MERGE TP.

4.2. Experiment 1: Sequence frequency and the phonotactics of word-final lax vowels

4.2.1. Rationale

If phonotactic effects are really lexical effects, as claimed by TRACE, then their size should be modulated by the same factors that control the size of lexical effects: similarity to existing words, and number of similar words. If that is so, a nonword that is similar to many frequent words should induce stronger phonotactic effects than one that is similar to few words and rare.

For this, we can exploit the phonology of the English lax vowels. The lax vowels, [ɪ ɛ æ ʊ ʌ], form a separate system from the other "tense" vowels of English, both phonetically and phonologically (see, e.g., Ladefoged 1993:86-88).

Like all American English vowels, the lax vowels are somewhat diphthongal, but where the tense vowels are peripheralizing diphthongs, the lax vowels are centralizing; that is, the offset of a tense vowel is further from schwa than the onset, but the offset of a lax vowel is closer to schwa than the onset. The lax vowels are also somewhat shorter and less strongly diphthongized (Nearey & Assmann 1986; DiBenedetto 1989).

Phonologically, the lax vowels are distributed differently from the tense vowels. The lax vowels do not occur in word-final open syllables:

Table 4.1. Distribution of tense and lax vowels in American English

    Tense vowels              Lax vowels
    [i]    he                 [ɪ]    -
    [ei]   hay                [ɛ]    -
    [a]    pa                 [æ]    -
    [ɔ]    paw                [ʊ]    -
    [ou]   hoe                [ʌ]    -

    (Examples are words with the vowel in a word-final open syllable;
    "-" indicates that no such word exists.)

Not only do lax vowels not occur there, they cannot occur there. The intuitive badness of such nonwords as [hɪ hɛ hæ hʊ hʌ] is quite strong.1 That the gap is phonological rather than lexical is illustrated by the change of lax to tense vowels when words are coined by truncation:

Table 4.2. Change of lax to tense vowels when made final by truncation

    Base                                        Truncated form
    del[ɪ]catessen                              del[i]
    Mun[ɪ]cipal Transportation Authority        Mun[i]
    Un[ɪ]versity                                Un[i]
    D[ɪ]rdre                                    D[i]d[i]

TRACE ought to be able to model the lack of word-final lax vowels. The phonotactic ban should emerge from the large population of words ending in tense vowels and the nonexistence of any words ending in lax vowels. Activation spreading from the tense-final words should shift the phonotactic boundary on a word-final [i]-[ɪ] continuum, compared to a baseline condition where the [i]-[ɪ] continuum is not word-final.

However, since it is similarity to real words that produces the lexical activation, the size of the shift should be larger when the rest of the phonological context (i.e., material besides the [i]-[ɪ] itself and the immediately following segment or boundary) is similar to more existing words. TRACE does not distinguish between parts of the stimulus that are relevant to the phonotactic generalization, and parts which are not. Everything in the stimulus can contribute through lexical activation.

1 Lax vowels can occur syllable-finally in onomatopoeia: bleah [blɛ], baa [bæ:] (sound made by a sheep). Marginal phonology is often found in this domain; e.g., boing [bɔɪŋ] (sound made by a spring), which has both a diphthong before [ŋ] and a non-coronal after [ɔɪ].

The MERGE TP theory focuses on a smaller part of the stimulus. One version, which we called INC-1 in §3.2.3.3, considers separately the ambiguous segment and the immediately preceding segment on the one hand, and the ambiguous segment and the immediately following segment on the other. A phonotactic boundary shift is predicted to arise because of the rarity of the [lax vowel]-[word boundary] sequence. Another version, which we called SC-1, considers the preceding, ambiguous, and following segments as a unit. For any choice of preceding segment, SC-1 also predicts a phonotactic boundary shift, owing to the rarity of the [preceding segment]-[lax vowel]-[word boundary] sequence. Stimulus context more than one segment away from the ambiguous vowel does not enter into the calculations and should not affect judgments.
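The bigram (INC-1) versus trigram (SC-1) context difference can be illustrated with a toy computation (the mini-corpus and the "#" boundary symbol are invented for illustration):

```python
from collections import Counter

# Sketch of the INC-1 vs. SC-1 context sizes, on an invented mini-corpus;
# '#' marks a word boundary and counts as a segment, as in the Celex
# scripts of the appendix (which use "(" and ")" instead).

corpus = ["#si#", "#ri#", "#rid#"]  # toy one-character-per-segment transcriptions

bigrams = Counter()
trigrams = Counter()
for word in corpus:
    bigrams.update(word[i:i+2] for i in range(len(word) - 1))
    trigrams.update(word[i:i+3] for i in range(len(word) - 2))

# INC-1 scores the ambiguous vowel against each neighbor separately:
inc1_following = bigrams["i#"]   # [vowel]-[word boundary]
# SC-1 takes preceding, ambiguous, and following segments as one unit:
sc1 = trigrams["ri#"]            # [r]-[vowel]-[word boundary]

print(inc1_following, sc1)
```

On this corpus the vowel-boundary bigram is counted twice but the r-vowel-boundary trigram only once; either way, nothing further than one segment from the ambiguous vowel enters the statistic.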

The OT grammatical theory relies on the grammatical illegality of syllable-final lax vowels in English. The markedness constraint against them, which I will call *LAX]σ, is able to trigger repairs, and hence dominates the faithfulness constraint IDENT-V. Since *LAX]σ is an active constraint (in the sense of §3.4.2), it is expected to penalize any parse of the stimulus which postulates a syllable-final lax vowel, producing, in acoustically ambiguous cases, a bias towards parses with a tense vowel. The constraints apply equally to all phonological configurations meeting their structural description. Hence, this theory predicts that only phonological context directly involved in the phonotactic prohibition (the vowel and the immediately following segment or boundary) will contribute to the phonotactic boundary shift.

4.2.2. Design

The aim was to test the prediction of TRACE that the size of the phonotactic effect is

determined by the similarity of the stimulus to words in the lexicon. Listeners judged the

middle 5 steps of a 7-step continuum from the tense [i] to the lax [ɪ] in each of 16 carrier

contexts:

The phonotactic legality of the [ɪ] endpoint could, as Table 4.3 shows, be varied by leaving

the final syllable open or closing it with [dʒ]. Since the segment preceding the ambiguous

vowel was always [ɹ], the TP theories (both INC-1 and SC-1) and the OT grammatical

theory expect only the open-closed manipulation to affect the location of the [i]-[ɪ]

boundary.

Similarity of the stimulus to other words in the lexicon was varied by manipulating

the voicing of the consonant preceding the [ɹ]: When the consonant was [g], the stimulus

was closer to more words than when it was [k]. Table 4.4, extracted from Celex's wordform

database (EPW.CD and EFW.CD), shows the English words ending in each of the eight

final syllables used in this experiment.2 Celex's British English transcriptions have been

Americanized in four cases by changing word-final [ɪ]s to [i]s.3

2 For this experiment, counts were computed over wordforms rather than lemmas because the relevant
dependency (between a vowel and a word boundary) is affected by inflection. In other experiments, which
used initial clusters, the lemma and wordform frequencies are the same.
3 Celex has final [ɪ] for angry, hungry, kukri, and mimicry, and, in general, for final unstressed /ɪ/
elsewhere. Jones (1997) gives [i] as both the BrEng and AmEng pronunciation of all of these words (except
kukri, which isn't in that dictionary).

Table 4.3. Phonotactics of stimuli for Experiment 1

Carrier          [i]    [ɪ]

[zʌlgɹ_]          ✓      ✗
[sʌlgɹ_]          ✓      ✗
[pʌlgɹ_]          ✓      ✗
[tʌlgɹ_]          ✓      ✗
[zʌlkɹ_]          ✓      ✗
[sʌlkɹ_]          ✓      ✗
[pʌlkɹ_]          ✓      ✗
[tʌlkɹ_]          ✓      ✗
[zʌlgɹ_dʒ]        ✓      ✓
[sʌlgɹ_dʒ]        ✓      ✓
[pʌlgɹ_dʒ]        ✓      ✓
[tʌlgɹ_dʒ]        ✓      ✓
[zʌlkɹ_dʒ]        ✓      ✓
[sʌlkɹ_dʒ]        ✓      ✓
[pʌlkɹ_dʒ]        ✓      ✓
[tʌlkɹ_dʒ]        ✓      ✓

Table 4.4. Frequency of the syllables in stimuli for Experiment 1

                        Frequency (per million words)

Word                   Combined   Written   Spoken

[gɹi]4
angry                      65        68        19
hungry                     34        36         3
agree                      20        16        75
bachelor's degree           0         0         0
degree                    105       100       170
disagree                    2         2        14
filigree                    1         1         2
first-degree                1         1         1
pedigree                    2         2         2
second-degree               1         1         0
agree                      20        16        75
disagree                    2         2        14
agree                      20        16        75
disagree                    2         2        14
agree                      20        16        75
disagree                    2         2        14

[gɹɪ]
(no words)

[kɹi]
decree                      0         0         0
decree                      0         0         0
scree                       1         1         0
decree                      0         0         0
decree                      0         0         0
decree                      0         0         0

[kɹɪ]
(no words)

[gɹidʒ]
(no words)

[gɹɪdʒ]
(no words)

[kɹidʒ]
(no words)

[kɹɪdʒ]
(no words)

4 Words with these onsets were extracted from Celex using the trohoc script. See the appendix to
Chapter 3 for the script and details of how frequency counts were made.

NOTE: Some forms appear more than once, because they are homophonous but
morphologically different: (to) agree, (I) agree, (we) agree, etc. Celex divides form
frequency equally among the homophones (Burnage 1995).

Table 4.5 shows the total word-final frequencies of each critical syllable:

Table 4.5. Effect of the [k]/[g] manipulation on the frequency of the word-final syllables in
the stimuli of Experiment 1

                 Frequency (per million words)

Final syllable   Combined   Written   Spoken

[gɹi]               297        281      553
[gɹɪ]                 0          0        0
[kɹi]                 6          6        0
[kɹɪ]                 0          0        0
[gɹidʒ]               0          0        0
[gɹɪdʒ]               0          0        0
[kɹidʒ]               0          0        0
[kɹɪdʒ]               0          0        0

The closed syllables are very infrequent — they are not represented in Celex at all.

Both [i] and [ɪ] are phonotactically permissible in them, since they are closed. These

syllables provide a baseline context in which to assess the statistical effects, if any, of the

preceding [gɹ] or [kɹ] context on the [i]-[ɪ] boundary location.

The legal open syllable [gɹi] is far more frequent than the illegal [gɹɪ], especially in

word-final position. TRACE expects the large frequency difference to produce a boundary

shift, as the many words containing final [gɹi] feed activation down to support [i] over [ɪ].

The frequency difference between [kɹi] and [kɹɪ] is much smaller, leading to a smaller

predicted boundary shift. The MERGE TP and OT grammatical theories treat the [g]/[k]

manipulation as irrelevant - it lies outside the statistically relevant context in MERGE TP,

and outside the structural description of *LAX]σ in the OT grammatical theory - so both

theories expect equally large shifts in either case.

4.2.3. Predictions

This section spells out in more detail the predictions of the three theories, with

special attention to TRACE.

4.2.3.1. TRACE simulation

It is notoriously hard to predict how a network will behave - they are just too

complicated. To check that TRACE really did make the predictions outlined in the last

section, a simulation was run.5 It confirmed our expectations.

4.2.3.1.1. Calibration and replication of the original TRACE results

The TRACE architecture has many adjustable variables, controlling things like the

speed with which activation spreads and the relative weight given to the different node

layers. The first step in the simulation was to get the right parameters and replicate the

results of McClelland and Elman (1986). After some trial and error, the following were

selected (based mostly on McClelland and Elman's Table 2):

5 Thanks are due to Jeff Elman, one of TRACE's creators, for sharing his software.

Table 4.6. Parameter settings for the TRACE simulation (all experiments)

Parameter    Value   Function

max           1.00   Maximum activation level of a unit
min          -0.30   Minimum activation level of a unit
imax          3.00   Maximum incoming activation to a unit
fscale        0.00   Turns off effects of word frequency
alpha[IF]     1.00   Input-to-feature gain
alpha[FF]     0.04   Feature-to-feature inhibition
alpha[FP]     0.02   Feature-to-phoneme excitation
alpha[PP]     0.04   Phoneme-to-phoneme inhibition
alpha[PW]     0.05   Phoneme-to-word excitation
alpha[WW]     0.03   Word-to-word inhibition
alpha[WP]     0.03   Word-to-phoneme excitation
alpha[PF]     0.00   Phoneme-to-feature excitation
alpha[PFC]    0.00   Feature-to-phoneme coarticulation
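These parameters feed an interactive-activation update cycle of the general kind McClelland and Elman describe. The following is a rough single-unit sketch of such a rule, not the actual TRACE code; the max/min constants mirror Table 4.6, while the decay value is an arbitrary placeholder.

```python
# Rough sketch of one interactive-activation update step (illustrative,
# not the real TRACE implementation). Excitatory net input drives a unit
# toward MAX, inhibitory input toward MIN; activation decays toward REST.
MAX, MIN, REST, DECAY = 1.0, -0.3, 0.0, 0.1   # DECAY is a placeholder value

def update(act, net):
    """Return a unit's activation after one cycle, given net input `net`."""
    if net > 0:
        effect = net * (MAX - act)    # excitation, scaled by remaining headroom
    else:
        effect = net * (act - MIN)    # inhibition, scaled by distance to the floor
    return act + effect - DECAY * (act - REST)
```

Iterated over cycles and wired together with the alpha weights above, rules of this shape produce the gradual activation build-up and competition seen in the simulations that follow.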

These led to a near-perfect replication of McClelland and Elman's simulation of the

Massaro and Cohen (1983) experiment. The network was given as input the string /sLi/,

where /L/ was featurally ambiguous between /r/ and /l/. After 51 cycles, the network display

corresponded almost exactly to McClelland and Elman's (1986) Figure 7, Panel 3, with /l/

about twice as active as /r/ among the phoneme nodes and sleep and sleet the most active

words:

Figure 4.7. Results of the TRACE simulation replicating Figure 7 of McClelland and
Elman (1986)

[Network display at Cycle 51: activation levels (out of 99) of word and phoneme units
arrayed against time slices. The word units slip (sleep) and slit (sleet) are the most
active, at 31, and /l/ is about twice as active as /r/ among the phoneme units.]

The horizontal axis represents time; the words and phonemes are arrayed along the

vertical axis. TRACE has a separate unit for each word or phoneme at each time cycle

(corresponding to the hypothesis that the utterance contains that word or phoneme

beginning at that time cycle). The activation level of each unit at the present time, Cycle 51,

is shown by a number if it is greater than zero. So, for instance, the network at this moment

is confident, to a degree of 68 (arbitrary measurement units out of 99), that there is an [s] at

time slice 12, and to a degree of 31 that the signal contains the word sleep, beginning at time

slice 12.

As a second check, the network was asked to process the input /Tluli/, where /T/ is

ambiguous between the phonotactically legal /p/ and the illegal, but lexically favored, /t/

(supported by the nearby word truly). By Cycle 54, we see an output almost identical to

McClelland and Elman's (1986) Figure 8, Panel 3: /truli/ is the leading word, and the

leading candidate for the cluster is the unphonotactic [tl] - the [t] being the lexically favored

disambiguation of the ambiguous 'p/t', and the [l] being the phonetically supported parse of

the unambiguous input.

Figure 4.8. Results of the TRACE simulation replicating Figure 8 of McClelland and
Elman (1986)

[Network display at Cycle 54: truli (truly) is the most active word unit, at 45, and the
leading phoneme candidates spell out the unphonotactic cluster [tl].]

4.2.3.1.2. Simulation of the present experiment

The original TRACE has only four vowels: /i a u A/. These are distinguished one

from another by the features DIFfuse (evidently F2-F1: 8=high, 1=low), ACUte (evidently

F2: 8=front, 1=back), and POWer (8 for [i a u], 7 for [A], which stands for both [ʌ] and [ə]).

Table 4.9. Featural parameters of the four original TRACE vowels (McClelland & Elman
1988)

Vowel   DIF   ACU   POW

i        8     8     8
a        2     1     8
u        6     2     8
A        5     1     7

A new phoneme was added, corresponding to IPA [ɪ], and a new ambiguous

phoneme [X] between [i] and [ɪ]:

Table 4.10. Featural parameters of the new vowels [ɪ] and [X]

Vowel   DIF   ACU   POW

i        8     8     8
X        8     7     8
ɪ        8     6     8

In tests with the lexicon turned off, /X/ was found to activate /i/ and /ɪ/ equally.
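That equal activation is what the feature values lead one to expect: the only feature distinguishing the three vowels is ACU, on which the ambiguous [X] sits exactly midway. A quick check of the Table 4.9-4.10 values:

```python
# Feature vectors (DIF, ACU, POW) from Tables 4.9-4.10.
vowels = {"i": (8, 8, 8), "X": (8, 7, 8), "I": (8, 6, 8)}
acu = {v: feats[1] for v, feats in vowels.items()}

# [X] is equidistant from [i] and [I] on the one distinguishing feature.
assert acu["i"] - acu["X"] == acu["X"] - acu["I"] == 1
```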

The original TRACE lexicons only included words whose vowels were drawn from

the set /i a u A/. To create a new lexicon including /I/-bearing words, I took the CELEX

English lemma pronunciation file EPL.CD, merged it with the CELEX English lemma

frequency file EFL.CD, and culled therefrom the words having all of the following

properties:

1. They contained only the phonemes in the modified TRACE (the original

complement + [I], with TRACE [A] corresponding to CELEX [V] and [@] (IPA [ʌ] and

[ə]).

2. Since TRACE does not support the [dʒ] phoneme, it was simulated with [S].

CELEX words containing [ʃ] were excluded. Those containing [dʒ] were included, with the

[dʒ] recoded as [S].

3. They occurred at least 16.7 times per million in the combined spoken and written

corpus of CELEX. (McClelland and Elman used a 20-per-million Kucera-Francis cutoff,

but I had to go lower so that this lexicon would be of similar size to their lexicon slex).

The stress and syllabification marks were stripped, the phoneme codes were

converted, and homophones were collapsed into a single entry, with the frequencies added

together. Celex's British-English coding of final [ɪ], as in angry, was converted to the

American English [i]. BrEng [ɑː] was converted into AmEng [æ].
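The culling and recoding steps can be sketched as a filter over (transcription, frequency) pairs. This is an illustration only: the one-character code 'G' standing in for [dʒ] is hypothetical, and the real CELEX field layout is not shown.

```python
# Illustrative sketch of steps 1-3 above (phoneme whitelist, [dZ]->[S]
# recoding, 16.7-per-million frequency cutoff). 'G' is a made-up code for [dZ].
ALLOWED = set("pbtdkgszSmnlrwjhiauAI")   # modified TRACE inventory

def admit(phon, per_million, cutoff=16.7):
    """Return the recoded transcription, or None if the word is culled."""
    if "S" in phon:                  # words with a real [S]: excluded
        return None
    phon = phon.replace("G", "S")    # [dZ] simulated as [S]
    if not set(phon) <= ALLOWED:     # any non-TRACE phoneme: excluded
        return None
    if per_million < cutoff:         # below the frequency cutoff
        return None
    return phon

assert admit("rIG", 20.0) == "rIS"   # a frequent [dZ]-final word survives, recoded
assert admit("rIG", 5.0) is None     # an infrequent one is culled
```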

The resulting lexicon contained 241 lemmas (making it about the same size as the

original slex, which has 213). Of them, 5 words had /gɹi/, 3 of them finally: agree, degree,

disagree, Greek, and Greece/grease, while 1 word had /kɹi/, nonfinally: creep. It contained

no words with [gɹidʒ] or [kɹidʒ]. There were no words with /gɹɪ/, but 5 had /kɹɪ/: cricket,

critic, secret, secretly, and script.

TRACE can be made sensitive to word frequency. Since McClelland and Elman

(1986) did not use this feature in their simulations, I did not use it in this one.

Simulations with this lexicon showed that TRACE did not distinguish between the

open- and closed-syllable contexts at all. There was no difference in activation level

between [i] and [ɪ], regardless of whether the vowel was word-final or not, because TRACE

was insufficiently sensitive to silence as a word-boundary cue. To remedy this, the TRACE

symbol for silence was added to the end of each word in the lexicon, where it acted as

another phoneme.

To establish a baseline free of lexical bias, the network was probed with the

nonwords [sʌlgɹXʃ] and [sʌlkɹXʃ]. Because of their sparse lexical neighborhoods, we

expect little lexical influence, and we expect the [X] to be equally ambiguous in both

contexts. As a measure of ambiguity we will use the difference between [i] and [ɪ]

activation on Cycle 75 (about one syllable's length after the stimulus offset on Cycle 54).

This point was chosen because trial runs showed that no major changes in the relative activation

levels of the phoneme units happened later; rather, after Cycle 75, overall network activation

tended to decay towards zero.

In the [gɹ_dʒ] condition, [i] leads [ɪ] by 43 to 36; in the [kɹ_dʒ] condition, [ɪ] leads

[i] by 40 to 34. This represents a modest but definite bias, due mostly to the influence of

Greek, grease/Greece, and greet. The [ɪ] interpretation is supported in both cases by the

lexical item ridge, but in the [gɹ_dʒ] condition ridge's activation is reduced by inhibition

from Greek, grease/Greece, and greet.

We now make the same comparison for the critical cases [sʌlgɹX] and [sʌlkɹX],

where a phonotactic bias is expected.

Figure 4.11. Results of the TRACE simulation for the input [sʌlgɹXʃ]

[Network display at Cycle 75: gris- (grease/Greece) and riS- (ridge) are the most active
word units; among the phoneme units, [i] (43) leads [ɪ] (36).]

Figure 4.12. Results of the TRACE simulation for the input [sʌlkɹXʃ]

[Network display at Cycle 75: riS- (ridge) is among the most active word units; among
the phoneme units, [ɪ] (40) leads [i] (34).]

Figure 4.13. Results of the TRACE simulation for the input [sʌlgɹX]

[Network display at Cycle 75: grik-, gris-, and grit- (Greek, grease/Greece, greet) are
active word units, at 29; among the phoneme units, [i] (52) leads [ɪ] (26).]

Figure 4.14. Results of the TRACE simulation for the input [sʌlkɹX]

[Network display at Cycle 75: krip- (creep) and kru- are the most active word units;
among the phoneme units, [i] (42) leads [ɪ] (33).]

TRACE favors [i] over [ɪ] by 52 to 26 in the [gɹ_] condition (thanks to the support

of agree), and by 42 to 33 in the [kɹ_]. There is a phonotactic bias towards [i] in both

cases, but it is much larger in the [gɹ_] than the [kɹ_] condition, as we had supposed, owing

to the lack of words ending in [kɹ_]. If we take the difference in activation level between the

[i] and [ɪ] units as our predictor of effect size, we expect the proportions of [ɪ]

responses evoked by the same ambiguous vowel in the different contexts to be ordered as

follows: the most [ɪ] responses in [kɹ_dʒ] (difference = +6), then [gɹ_dʒ] and [kɹ_] (-7

and -9), then [gɹ_] (-26).
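The ordering follows mechanically from those activation differences; as a check, with the difference defined as [ɪ] activation minus [i] activation at Cycle 75:

```python
# Cycle-75 activation differences ([I] minus [i]) quoted above.
diff = {"[kr_dZ]": 40 - 34, "[gr_dZ]": 36 - 43, "[kr_]": 33 - 42, "[gr_]": 26 - 52}
order = sorted(diff, key=diff.get, reverse=True)  # most [I]-favoring first
assert order == ["[kr_dZ]", "[gr_dZ]", "[kr_]", "[gr_]"]
```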

What has happened is that words containing non-final [gɹi], [kɹi], [gɹɪ], or [kɹɪ]

become partially activated, and provide the same amount of top-down support, regardless of

whether the final syllable of the stimulus is open or closed. However, the open-syllable

condition also allows the population of similar [gɹi]-final words to assist [i]. (In this

example, with a restricted lexicon, that population is limited to agree, but as we have seen,

there are many more.) Since there is no comparable population of [kɹi]-final words, [i]

receives less assistance in the [kɹ_] condition than in the [gɹ_] condition.

4.2.3.2. MERGE TP

There are two versions of the MERGE TP theory to be considered: INC-1, which

treats preceding and following context separately, and SC-1, which treats them together. As

we discussed in 1.3, the theories' predictions are made using different decision variables.

When a segment ambiguous between x and y is judged in the context A_B, the boundary

shift will depend on:

(4.15) INC-1 theory (independent neighboring contexts of one segment)

Pr (x | A_) × Pr (x | _B) compared to Pr (y | A_) × Pr (y | _B)

(4.16) SC-1 theory (surrounding context of one segment each way)

Pr (x | A_B) compared to Pr (y | A_B)
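The two decision rules can be put schematically as follows, with `pr` standing for any corpus-derived conditional-probability estimator (the context-string keys are illustrative):

```python
# Schematic versions of (4.15) and (4.16); pr(seg, context) is assumed
# to return an estimated conditional probability of `seg` in `context`.
def inc1(pr, x, y, A, B):
    """INC-1: products over the two independent one-segment contexts."""
    return (pr(x, A + "_") * pr(x, "_" + B),
            pr(y, A + "_") * pr(y, "_" + B))

def sc1(pr, x, y, A, B):
    """SC-1: probabilities in the full surrounding context."""
    return pr(x, A + "_" + B), pr(y, A + "_" + B)
```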

In this experiment, the context A is [ɹ], and the context B is either the word

boundary [#] or [dʒ]. Context counts were made from the Celex wordforms database

(EPW.CD) using the combined written and spoken frequencies.

Both versions, in this experiment, make the same predictions.

4.2.3.2.1. INC-1 context

[ɹi] and [ɹɪ]. All words coded as containing [ɹi] or [ɹɪ] were extracted. There were

4117 words with [ɹi], occurring 124902 times in the 18-million-word corpus, and 12150

words with [ɹɪ], occurring 493281 times. To correct for Celex's coding of American English

final unstressed [i] as [ɪ] (as in angry), the 918 words ending in [ɹɪ], occurring 122617

times, were transferred to the [ɹi] group. The resulting frequency counts were

Table 4.17. Diphone frequencies for the stimuli of Experiment 1

Sequence   Number of words   Frequency (18 Mwd corpus)   Frequency (per million)

[ɹi]            5035                247519                     13751
[ɹɪ]           11232                370644                     20591

Since [ɹ] occurs 108263 times per million words in Celex (EPW.CD, combined

written and spoken), the probabilities are

(4.18)

Pr ([i] | [ɹ_]) = 0.127

Pr ([ɪ] | [ɹ_]) = 0.190

[i#] and [ɪ#]. The latter does not occur at all in American English. To estimate the

frequency of the former, all words in Celex (wordforms, EPW.CD) coded as ending in [i]

or [ɪ] were extracted, and the [ɪ]-final words were recoded as [i] (to correct for Celex's

British-English transcription of words such as angry). A total of 7611 words were found,

with a total frequency of 2422130 in the 18-million-word corpus, or 134563 per million.

Since the word-boundary [#] occurs one million times per million words, the probabilities

are

(4.19)

Pr ([i] | [_#]) = 0.135

Pr ([ɪ] | [_#]) = 0.000

[idʒ] and [ɪdʒ]. Following the same procedure, we find 133 words with [idʒ],

occurring 5965 times in the 18-million-word corpus or 332 times per million words, and

1340 words with [ɪdʒ], occurring 62103 times in 18 million words or 3450 times per million

words. Since [dʒ] occurs non-initially 12483 times per million words, the probabilities are

(4.20)

Pr ([i] | [_dʒ]) = 0.027

Pr ([ɪ] | [_dʒ]) = 0.276

Thus, the decision variables for the INC-1 theory are

(4.21)

Pr ([i] | [ɹ_]) × Pr ([i] | [_#]) = 0.127 × 0.135 = 0.017

Pr ([ɪ] | [ɹ_]) × Pr ([ɪ] | [_#]) = 0.190 × 0.000 = 0.000

Pr ([i] | [ɹ_]) × Pr ([i] | [_dʒ]) = 0.127 × 0.027 = 0.0034

Pr ([ɪ] | [ɹ_]) × Pr ([ɪ] | [_dʒ]) = 0.190 × 0.276 = 0.052

The INC-1 theory predicts that [i] will be strongly favored in the open-syllable

context, and that [ɪ] will be favored in the closed-syllable context. If the closed-syllable

context is taken as a baseline, we should observe a strong phonotactic shift in favor of [i]

in the open-syllable context. The rate of [ɪ] responses across the continuum should

be ordered [gɹ_dʒ] = [kɹ_dʒ] > [gɹ_] = [kɹ_].
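These quantities can be recomputed directly from the per-million counts quoted above (small discrepancies with (4.18)-(4.21) are rounding):

```python
# Per-million counts quoted above (Celex EPW.CD, written + spoken).
f_ri, f_rI, f_r = 13751, 20591, 108263            # [ri], [rI], any [r]
f_i_fin, f_I_fin, f_bnd = 134563, 0, 1_000_000    # word-final [i], [I], [#]
f_idZ, f_IdZ, f_dZ = 332, 3450, 12483             # [idZ], [IdZ], non-initial [dZ]

pr_i_r, pr_I_r = f_ri / f_r, f_rI / f_r           # cf. 0.127 and 0.190 in (4.18)
pr_i_fin, pr_I_fin = f_i_fin / f_bnd, f_I_fin / f_bnd
pr_i_dZ, pr_I_dZ = f_idZ / f_dZ, f_IdZ / f_dZ

open_i, open_I = pr_i_r * pr_i_fin, pr_I_r * pr_I_fin      # cf. 0.017 vs 0.000
closed_i, closed_I = pr_i_r * pr_i_dZ, pr_I_r * pr_I_dZ    # cf. 0.0034 vs 0.052

# [i] wins only in the open syllable; [I] wins in the closed one.
assert open_i > open_I and closed_I > closed_i
```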

4.2.3.2.2. SC-1 context

[ɹi#] and [ɹɪ#]. The latter does not occur in American English. The former is found

in 12249 words in Celex wordforms (EPW.CD) (coded as [ɹi#] in 99 cases like agree, and

as [ɹɪ#] in 12150 cases like angry) with a total frequency of 32795 occurrences in the 18-

million-word corpus, or 7952 times per million. Since sequences of the form [ɹX#] occur

11407 times per million words, the probabilities are

(4.22)

Pr ([ɹi#] | [ɹ_#]) = 0.697
Pr ([ɹɪ#] | [ɹ_#]) = 0.000

[ɹidʒ] and [ɹɪdʒ]. The former is found in 53 wordforms with a total frequency of 94

per million; the latter in 399 wordforms with a total frequency of 956 per million. Since

there are 1581 occurrences of [ɹXdʒ] per million words, the probabilities are

(4.23)

Pr ([ɹidʒ] | [ɹ_dʒ]) = 0.059
Pr ([ɹɪdʒ] | [ɹ_dʒ]) = 0.605

Here again, the theory predicts a strong shift in favor of [i] in the open-syllable

context, and a shift in the other direction in the closed-syllable context. If the closed-

syllable context is taken as a baseline, we expect a large boundary shift in favor of [i] in the

open-syllable context. The rate of [I] responses across the continuum is expected to be

ordered [gɹ_dʒ] = [kɹ_dʒ] > [gɹ_] = [kɹ_].
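The SC-1 quantities in (4.22)-(4.23) come out the same way from the counts quoted above:

```python
# SC-1: triphone probabilities from the per-million counts quoted above.
pr_ri_open = 7952 / 11407      # Pr([ri#]  | [r_#]),  cf. 0.697
pr_rI_open = 0 / 11407         # Pr([rI#]  | [r_#]),  0.000
pr_ri_closed = 94 / 1581       # Pr([ridZ] | [r_dZ]), cf. 0.059
pr_rI_closed = 956 / 1581      # Pr([rIdZ] | [r_dZ]), cf. 0.605

# [i] favored in the open syllable, [I] in the closed one.
assert pr_ri_open > pr_rI_open and pr_rI_closed > pr_ri_closed
```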

4.2.3.3. OT grammatical theory

When a stimulus is presented which is ambiguous between, e.g., [sʌlgɹi] and

[sʌlgɹɪ], the Phonetic Parser may emit the parse [sʌl.gɹi], the parse [sʌl.gɹɪ], or both, with

probability depending on how close the stimulus is acoustically to [i] or [ɪ]. If both parses

are emitted, they will be scored with respect to the active constraints of English, and the

more harmonic one will be processed first.

Since the grammar of English forbids final lax vowels such as [ɪ], there must be an

active markedness constraint against it, which we may call *LAX]σ:

(4.24) *LAX]σ

Award one mark for every lax vowel in an open syllable.

For our hypothetical example, this will be the only active constraint which

distinguishes the two parses:

(4.25) [ɪ] illegal => [i] bias

    UR = •                 *LAX]σ

    a. (•, [sʌlgɹi])

    b. (•, [sʌlgɹɪ])       *!

Since the more harmonic parse is processed first, responses will tend to be based on

the [i] parse, creating a bias towards [i] responses. Since the bias is caused by *LAX]σ, it will

be present to the same degree in any stimulus meeting that constraint's structural description

- specifically, to the same degree in the [gɹ_] contexts as in the [kɹ_] contexts.

In the closed-syllable conditions, where both [i] and [ɪ] are legal, no active

markedness constraints are violated by either parse, so no bias should be observed - and,

naturally, the [g]/[k] manipulation should again have no effect.

The predicted order of the proportion of [ɪ] responses is therefore [gɹ_dʒ] =

[kɹ_dʒ] > [gɹ_] = [kɹ_], just as in the MERGE TP theory.
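The comparison the grammar performs can be sketched as a violation count; the syllable encoding and the lax-vowel set here are illustrative simplifications, not a full OT evaluator:

```python
# Sketch: score parses by *LAX]sigma marks and keep the more harmonic one.
# A parse is a list of (nucleus, coda) pairs; an empty coda = open syllable.
LAX = {"I", "U", "E", "ae", "^"}    # illustrative lax-vowel set

def lax_sigma_marks(parse):
    """One mark per lax vowel in an open syllable."""
    return sum(1 for nucleus, coda in parse if nucleus in LAX and not coda)

parse_i = [("^", "l"), ("i", "")]   # [sVl.gri]-style parse with tense [i]
parse_I = [("^", "l"), ("I", "")]   # the same parse with final lax [I]
better = min([parse_i, parse_I], key=lax_sigma_marks)
assert better == parse_i            # the [i] parse incurs no mark
```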

4.2.4. Methods

Paradigm. The task was an AXB judgment. Listeners heard one endpoint of a

continuum, then an intermediate stimulus, then the other endpoint, and judged whether the

intermediate stimulus (X) sounded more like A or more like B. Response was by button

press. Every AXB was also presented as BXA to counterbalance for primacy, recency, and

handedness effects.

Stimuli. The A and B stimuli were synthetic disyllabic nonwords, stressed on the

second syllable, which differed in one segment — the initial segment for fillers, the vowel of

the second syllable for critical items. Between each A and B there were five intermediate X

stimuli, separated from each other and the endpoints by equal steps, making a 7-step

continuum in all.6 Synthesizer parameters for the stimuli can be found in the appendix.

Stimuli were of high quality and sounded very similar to natural speech.

Figures 4.26 and 4.27 show how the A-to-B stimulus scales were constructed.

Every possible combination of the bracketed options was used to give a total of 32 filler

scales and 32 critical scales. Since there were 2 endpoints and 5 intermediate points for

each scale, the experiment required synthesizing 448 nonwords. This was done using the

SENSYN implementation of the Klatt terminal-analogue synthesizer, augmented by

homebrew software that constructed the intermediate points from the endpoints by linear

interpolation.
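The interpolation itself is straightforward; the sketch below shows the idea on a made-up two-parameter frame (the values are placeholders, not the actual Klatt parameters given in the appendix):

```python
# Sketch: a 7-step continuum by linear interpolation between two endpoint
# parameter frames (step 0 = endpoint A, step 6 = endpoint B).
def continuum(track_a, track_b, n_steps=7):
    """Return n_steps frames, including both endpoints, in equal steps."""
    assert len(track_a) == len(track_b)
    return [[a + (b - a) * k / (n_steps - 1) for a, b in zip(track_a, track_b)]
            for k in range(n_steps)]

steps = continuum([250.0, 2300.0], [400.0, 2000.0])   # placeholder F1, F2 targets
assert steps[0] == [250.0, 2300.0] and steps[-1] == [400.0, 2000.0]
```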

6 This is considerably fewer intermediate steps than were used by Massaro and Cohen (1983) or Pitt
(1998). The disyllabic stimulus words, and the three-stimulus AXB trials, used in the present experiment
made each trial much longer than the simple monosyllabic X stimuli of those authors. Use of more
intermediate steps in this experiment would have led to an impractically long experimental session. In the
event, acceptable psychometric functions and high significance levels were obtained with 5 intermediate
steps.

Figure 4.26. Schema for the filler stimuli of Experiment 1

A-B: {p-t | s-z} + ʌl + {g | k} + ɹ + {i | ɪ} + {∅ | dʒ}

Figure 4.27. Schema for the critical stimuli of Experiment 1

A-B: {p | t | s | z} + ʌl + {g | k} + ɹ + (i-ɪ) + {∅ | dʒ}

The sound [dʒ] was chosen because, being palatoalveolar, it has little coarticulatory

interaction with, or acoustic effect on, the high front vowels [i] and [ɪ]. The synthesis

parameters for the [dʒ] were adjusted so that the transition from [i] to [dʒ] and that from [ɪ]

to [dʒ] involved acoustic changes of roughly equal magnitude.7

Typical trials were: fillers [sʌlgɹi]-X2-[zʌlgɹi], [pʌlgɹidʒ]-X5-[tʌlgɹidʒ]; critical

[pʌlkɹi]-X3-[pʌlkɹɪ], [zʌlgɹidʒ]-X1-[zʌlgɹɪdʒ]. Because the A and B stimuli on any given

7 A pilot experiment used final syllables that were closed with [b] instead of [dʒ], and that began with [b] or
[g] rather than [k] or [g]. Identification of ambiguous tokens turned out to depend mostly on how many
[b]s were in the stimulus: The more [b]s, the more [i]-like the vowel sounded. This was probably an effect
of compensation for (expected) coarticulation - listeners expected the vowel formants to be lowered by

trial could differ in initial consonant or final vowel, subjects never knew where the difference

was until they heard the X trial. This, together with the instruction to compare whole words

and the greater variety of word-initial than word-final differences, was intended to distribute

their attention more evenly over the word and discourage the strategy of listening only to the

final vowel in critical trials. The hope was to produce stronger TRACE-type lexical effects,

if any were there to be had, by encouraging higher-level processing of the stimuli.

Subjects were 15 young adults living in Western Massachusetts. All of them

reported having normal hearing and being native speakers of American English. They were

recruited by poster and paid for their participation. They were naive to the purpose of the

experiment.

Procedure. Subjects were tested four at a time in a quiet room. AXB stimuli were

low-pass filtered at 4.133kHz (down about 80dB at 5kHz), amplified, and presented over

Sennheiser TDH-49 headphones at a listener-selected volume level. One second elapsed

between the end of A and beginning of X, and between the end of X and beginning of B.

Subjects were told that they would hear three "words", that the middle word had been

digitally synthesized to be acoustically in between the first and last word, and that they were

to judge as quickly and accurately as they could whether it sounded more like the first word

or the last word. Response was by button press — the leftmost button on the response box

for the first word, the rightmost for the last word. After the last subject had responded, or

2.5 s had passed, there was a pause of 2.5 s, followed by the next trial. Each of the 320

different trials was presented twice. The experiment lasted 2 hours, broken by a 5-minute

break, a 15-minute break, and another 5-minute break.

4.2.5. Results

One subject who, in all four open-syllable conditions (gɹi/gɹɪ/kɹi/kɹɪ) gave fewer

than 75% [ɪ] judgments at position 1 and more than 25% [i] judgments at position 5, was

labialization, and "corrected" them by, in effect, adding a few tens of Hz to the F2s of stimuli with [b]s in

excluded from analysis. All other subjects were very consistent at the extreme positions.

Their identification curves are shown in Figure 4.28 (average over all subjects). Half of the

data from three subjects (consisting of one presentation of each trial) was lost through

experimenter error.

Figure 4.28. Identification curves for the stimuli of Experiment 1, pooled across 14
listeners

[Identification curves: % [ɪ] responses (0-100) plotted against intermediate stimulus
number (1-5), with one curve per carrier context ([gɹ_], [kɹ_], [gɹ_dʒ], [kɹ_dʒ]).]

For a test statistic, we used each subject's mean % [ɪ] responses across all five

intermediate stimuli in each condition. This was assumed to be normally distributed, an

assumption which was confirmed by a normal probability plot.
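For reference, the intervals in the tables below have the usual form mean ± t·SD/√n; the sketch here uses toy numbers, not the experimental data, and assumes the textbook denominator, which may differ in detail from the convention used in the tables.

```python
import math

# Sketch: a two-sided confidence interval for a condition mean,
# mean +/- t * SD / sqrt(n), with t the tabled critical value.
def ci(mean, sd, n, t_crit):
    half = t_crit * sd / math.sqrt(n)
    return mean - half, mean + half

lo, hi = ci(0.0, 2.0, 16, 2.131)   # toy values; t(0.025, 15) = 2.131
```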

them. Result: [b] shifted the vowel judgments in favor of the more fronted [i].

Table 4.29. Mean % [ɪ] response, all intermediate stimuli

Context     Mean    SD     95% confidence interval (t(0.025, 13) = 2.160)

[gɹ_dʒ]     53.4   7.50    [48.9, 57.9]
[kɹ_dʒ]     53.3   7.82    [48.6, 58.0]
[gɹ_]       41.8   7.55    [37.3, 46.3]
[kɹ_]       41.9   6.07    [38.3, 45.5]

The order of [ɪ] response rates is [gɹ_dʒ] = [kɹ_dʒ] > [gɹ_] = [kɹ_], precisely as

predicted by the MERGE TP and OT grammatical theories, and very different from the

[kɹ_dʒ] > [gɹ_dʒ] = [kɹ_] > [gɹ_] predicted by TRACE.

The confidence intervals are wide because there is a great deal of individual variation

between subjects in their overall [ɪ] report. To reduce this, the results are submitted to a

paired-sample t-test. We have three degrees of freedom, so we can test three differences: 1.

between [gɹ_dʒ] and [kɹ_dʒ], predicted by MERGE TP and OT/grammar to be zero and by

TRACE to be negative; 2. between [gɹ_dʒ] and [gɹ_], to see whether the paradigm is

sensitive enough to find a phonotactic effect; 3. between the shift from [gɹ_dʒ] to [gɹ_] and

that from [kɹ_dʒ] to [kɹ_]. This last is the crucial comparison; TRACE predicts that the

difference will be positive (that the shift will be larger for the g syllables), while the other

two models predict that it will be zero. The numbers are given in Table 4.30.

Table 4.30. Differences in mean % [ɪ] response, pairwise by subject

Context                                    Mean     SD      95% confidence interval (t(0.025, 13) = 2.160)

[gɹ_dʒ] - [kɹ_dʒ]                          0.086    3.824   [-2.21, 2.386]
[gɹ_dʒ] - [gɹ_]                            11.6     7.16    [7.30, 15.9]
([gɹ_dʒ] - [gɹ_]) - ([kɹ_dʒ] - [kɹ_])     -0.17     3.36    [-1.66, 2.17]

We find no difference, or at best a very small one, between the two closed syllables,

confirming that they may be used as a neutral baseline - as predicted by MERGE TP and

OT/grammar, but not by TRACE.

The CI for the difference between the open and closed syllables excludes zero (in

fact, it excludes zero even at a 99% confidence level; one-tailed t test, t0.01,14 = 2.624, p <

0.001). In 13 out of the 14 valid subjects, judgments shifted towards [I] on both the g

continuum and the k continuum. We have thus replicated the Massaro and Cohen (1983)

effect: Perception of ambiguous segments is influenced by phonotactics.
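The interval arithmetic behind such a paired comparison can be sketched as follows; the per-subject differences are hypothetical placeholders, and 2.160 is the critical value quoted in the tables above:

```python
import math
import statistics

# Hypothetical per-subject differences in %[I] response between two
# conditions (placeholders, not the dissertation's data).
diffs = [0.4, -1.2, 2.1, 0.8, -0.5, 1.5, -2.0,
         0.3, 1.1, -0.7, 0.9, -1.4, 0.6, -0.8]

n = len(diffs)
mean = statistics.mean(diffs)
sd = statistics.stdev(diffs)           # sample SD, n - 1 denominator
t_crit = 2.160                         # two-tailed 95% critical value from the text
half_width = t_crit * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

# A paired t-test rejects the zero-difference hypothesis exactly when
# this confidence interval excludes zero.
print(f"mean = {mean:.3f}, SD = {sd:.3f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```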

Moreover, the effects seem to be the same irrespective of how many lexical items are

similar to the legal nonword; the difference between the effect in the common g syllables

and the rare k ones is close to zero. The subjects’ numerical differences clustered around

zero, split evenly between positive and negative (there was no sign that they divided into two

groups of responders).

This is highly unexpected in TRACE. Since we did find a phonotactic shift,

TRACE's only explanation is lexical activation spreading from partially overlapping words.

But a drastic reduction in the number and frequency of those words (when the [g] was

changed to [k]) did not reduce the shift at all.

4.2.6. Discussion

The results are clear-cut: They are very much as predicted by the MERGE TP and

the OT grammatical theory, but very different from our expectations under TRACE.

We observed a strong phonotactic effect: [I] was disfavored in open syllables

compared to closed ones. TRACE could only explain this by activation-spreading from

similar lexical items. However, a large difference between conditions in the number (and

frequency) of those items, caused by the [k]/[g] manipulation, produced equally large shifts

in both conditions. This result is not a "null effect". It is two positive effects - one of

which was expected under all theories, and the other of which was not expected under

TRACE.

The MERGE TP theory and the OT grammatical theory were both able to explain

the observed facts, because both of them focused on the crucial, systematic phonotactic

difference between the conditions and ignored incidental variation elsewhere in the stimulus.

TRACE, because it can't ignore anything, failed to predict an effect that was actually

observed.

If we had computed transitional probabilities based on preceding contexts of length

greater than one, as considered by Pitt and McQueen (1998: Note 2), we would have

erroneously predicted the same pattern of results as TRACE, for the same reason. These

results therefore argue against a TP theory using larger contexts.

4.3. Experiment 2: Sequence frequency and word-initial [pw] clusters

4.3.1. Rationale

The results of Experiment 1 suggested that the size of a phonotactic boundary-shift

effect is unaffected by phonological context which does not directly participate in the

phonotactic pattern. To check that this finding was not an artifact of the acoustic

constitution of the stimuli, or of some idiosyncrasy of the lexicon, it was decided to replicate

the experiment with radically different stimuli.

Where Experiment 1 used an ambiguous vowel at the end of a nonword, this

experiment used an ambiguous consonant at the beginning. The inventory gap to be

exploited was the lack of syllable-initial [pw] discussed in §1.3.2.5. Listeners were to be

presented with a [p]-[k] continuum before [_w] and before [_j].

Given the low frequency of [pw] and high frequency of [kw], both TRACE and the

MERGE TP theory predict fewer [p] responses would be given before [_w]. If the size of

the boundary shift is modulated by the vowel following the [_w], this would favor TRACE,

which is sensitive to the entire carrier stimulus, over MERGE TP, which only considers the

context immediately adjacent to the ambiguous segment.

As we saw in §2.3.2.5, the lack of [pw] is a lexical rather than a phonological gap.

Since no active markedness constraint forbids [pw], the OT grammatical theory predicts no

boundary shift.

4.3.2. Design

Listeners were presented with a 7-step continuum (endpoints and 5 intermediate

steps) from [p] to [k] in the following contexts:

Table 4.31. Phonotactics of the stimuli for Experiment 2

Carrier         [p]   [k]
[_wifnet]       ?     ✓
[_wifnetf]      ?     ✓
[_waefnet]      ?     ✓
[_waefnetf]     ?     ✓
[_jifnet]       ✓     ✓
[_jifnetf]      ✓     ✓
[_jaefnet]      ✓     ✓
[_jaefnetf]     ✓     ✓

The phonotactic marginality of the [p] endpoint could, as Table 4.31 shows, be

varied by changing the following glide from [w] to [j]. Since the ambiguous consonant was

always preceded by the same thing (silence), the TP theories (both INC-1 and SC-1) expect

only the [w]/[j] manipulation to affect the location of the [p]-[k] boundary.

Similarity of the stimulus to other words in the lexicon was varied by manipulating

the quality of the vowel following the glide: When the vowel was [i], the stimulus was

closer to more words than when it was [ae]. Table 4.32 counts the words beginning with

each of the stop-glide-vowel sequences used in this experiment. The cohorts are given in

the appendix to this chapter. (These cohorts, unlike those in the previous experiment, were

extracted from the American English Kucera-Francis database rather than from Celex, even

though Celex is much more complete, because there are two different phonemes in British

English corresponding to American English [ae].)

Table 4.32. Frequency of the syllables in the stimuli for Experiment 2

Onset    Number of words   Total frequency (per million words, written)
[kwi]    24                113
[pwi]    1                 0
[kwae]   3                 10
[pwae]   0                 0
[kji]    44                418
[pji]    74                772
[kjae]   21                77
[pjae]   14                139

Both [p] and [k] are roughly equally frequent in the [_ji] and [_jae] contexts,

providing a statistically neutral baseline. Only [k] is found in the [_wi] and [_wae] contexts

(except for the very infrequent puissance, probably not known to most of the listeners), but

it is much more common before [_wi] than before [_wae]. By the same logic as in

Experiment 1, TRACE predicts more lexical activation in the [_wi] stimuli than in the [_wae]

ones, and hence a larger bias towards [k]. The MERGE TP and OT grammatical theories

predict that the vowel will make no difference - MERGE TP because it is outside the

statistically relevant context, and the OT grammatical theory because no active markedness

constraint is violated.

4.3.3. Predictions

4.3.3.1. TRACE simulation

In order to model this experiment, it was necessary to add a new phoneme to

TRACE: [w]. This was done by modifying the featural parameters for [u]. The glide was

made by reducing the vowel's POW(er) and VOC(alicness) specifications from 8 to 6,

leaving the other parameters unaltered.

A new ambiguous phoneme [Y] between [p] and [k] was constructed by averaging

the feature values of the original [p] and [k]. The new [Y] was not quite in between them;

when run with the lexicon turned off, TRACE tended to favor the [p] interpretation. The

difference in activation levels started out at 0 and increased to 3 after 51 cycles.
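The construction of [Y] amounts to feature-wise averaging, which can be sketched as below; the feature names and values are illustrative placeholders, not TRACE's actual parameter file:

```python
# Placeholder feature vectors for [p] and [k]; only the values that
# differ between the two contribute to the ambiguity.
p_features = {"POW": 2, "VOC": 1, "DIF": 8, "ACU": 2, "VOI": 1}
k_features = {"POW": 2, "VOC": 1, "DIF": 2, "ACU": 6, "VOI": 1}

# [Y] takes the mean of each feature value, putting it midway between
# [p] and [k] in feature space.
y_features = {feat: (p_features[feat] + k_features[feat]) / 2
              for feat in p_features}
print(y_features)
```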

A new lexicon was constructed, based on the American English Kucera-Francis

database. All words were extracted which met the following criteria:

1. They contained only phonemes which were in the new TRACE inventory. Since

TRACE does not support [ae], words with [a] were eliminated, and words with [ae] were

included, with the [ae] recoded as "a". Since TRACE does not support [f], words with [S]

were eliminated, and words with [f] were included, with the [f] recoded as [S].

2. They occurred at least 5 times per million words in Kucera-Francis.

3. They were at most 9 phonemes long.

The latter two criteria were imposed to keep the lexicon to a size comparable with

that used by McClelland and Elman (1986). This procedure resulted in 494 lexical items.
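The selection procedure can be sketched as below; the inventory, recodings, and toy entries are illustrative stand-ins, not the actual Kucera-Francis data:

```python
# Placeholder phoneme inventory; the real one is TRACE's.
INVENTORY = set("pbtdkgsSrlaeiunw")

def recode(pron):
    # Per the text: [ae] is recoded as "a" and [f] as "S".
    # In this toy transcription "A" stands in for [ae].
    return pron.replace("A", "a").replace("f", "S")

def keep(pron, freq_per_million):
    pron = recode(pron)
    return (set(pron) <= INVENTORY      # criterion 1: inventory phonemes only
            and freq_per_million >= 5   # criterion 2: at least 5 per million
            and len(pron) <= 9)         # criterion 3: at most 9 phonemes

toy_words = {"krib": 62, "skript": 11, "preti": 97,
             "kwik": 88, "zAdZ": 1}    # made-up frequencies per million
selected = [w for w, f in toy_words.items() if keep(w, f)]
print(selected)
```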

The critical experimental sequences occurring in the lexicon are shown in Table 4.33:

Table 4.33. Word-initial occurrences of the critical syllables from Experiment 2

Sequence   Words
[kwi]      quick, quill, queer, quit
[pwi]      (none)
[kwae]     quack
[pwae]     (none)
[kji]      crib, crisp, critic, script
[pji]      predict, precarious, pretty
[kjae]     craft, crack, scrap
[pjae]     (none)

In order that the simulated lexicon should better approximate the real one, the words

practise8 and practically were added so that the [pjae] cell would not be empty. These

words had been excluded by the lexicon-constructing procedure because practise had zero

frequency and practically had a syllabic [l] in the 4-syllable pronunciation favored by

Kucera and Francis. They were added as [pjaektis] and the 3-syllable [pjaektikli].

8 The verb, so spelled in the American English on-line dictionary.

Pre-testing showed that, because of the close similarity between the [u] and [I]

phonemes in TRACE, the stimulus [pjifkAs] produced very strong activation of the word

proof, which soon came to dominate the pattern of activation. These two phonemes are not

easily confusable in actual speech; Luce's confusion matrices (1986:Table 3.8) show that,

even at a -5 dB signal-to-noise ratio, [I] was heard as [u] only 3.3% of the time. Judging

this to be an undesirable artifact of TRACE's small feature set and my choice of [I]

parameters, I removed proof from the lexicon.

The simulation was run using the inputs [YwaefkAs], [YwifkAs], [YjaefkAs], and

[YjifkAs]. As before, the measure of predicted effect size was taken to be the difference in

activation between the [p] and [k] units at Cycle 75. These are shown in Table 4.34:

Table 4.34. Results of the TRACE simulation of Experiment 2: Activation levels at Cycle 75

Stimulus      [p] activation   [k] activation   Difference
[YwifkAs]     21               46               -25
[YwaefkAs]    23               42               -19
[YjifkAs]     36               37               -1
[YjaefkAs]    26               40               -16

As we expected, the [_wi] context produces a higher level of [k] activation than the

[_wae] context, and a larger difference between [p] and [k] activation levels. The [_ji]

context is very nearly neutral between [p] and [k]. The [_jae] context produces an

unexpectedly strong bias towards [k], which on closer inspection proves to be emanating

from the highly active word craft. This is no artifact; [kjaefkAs] contains a sequence

differing from craft in only one feature, and that late in the word, where acoustic mismatches

are least inhibitory. TRACE's predictions of the rate of [p] response across the whole

continuum are therefore [_ji] > [_jae] = [_wae] > [_wi]. The relatively low frequency of

craft may increase the [p] bias before [_jae] in actual practice, leading to a predicted order

[_ji] ≥ [_jae] > [_wae] > [_wi], but in any case we do expect [_wae] > [_wi] - that is, that the

magnitude of the phonotactic shift away from the illegal *[pw] sequence will be modulated

by the following vowel.9

4.3.3.2. MERGE TP

As in Experiment 1, we distinguish two versions of the MERGE TP theory

depending on what is taken to be the relevant phonological context: The INC-1 version,

which uses the segment preceding and the segment following the ambiguous segment, taken

separately, and the SC-1 version, which takes context to consist of the preceding and

following segment as a unit. Both of these theories are crucially distinct from TRACE in

that the first vowel of the stimulus lies outside the relevant context.

4.3.3.2.1. INC-1 context

Frequency counts derived from the American English Kucera-Francis database are

shown in Table 4.35:

9 It is not clear how differing activation levels in TRACE units are to be mapped onto different
boundaries, or what constitutes a "large" difference. An estimate is provided by the simulation of the
Massaro-Cohen r/l effect in McClelland and Elman (1986), and replicated in §1.3.2.1, which found a 20-

Table 4.35: Diphone frequencies for the stimuli in Experiment 2

Sequence   Frequency (per million words)
[#p]       23245
[#k]       26201
[pw]       27
[kw]       2480
[pj]       8525
[kj]       3018

Since the word-boundary symbol [#] occurs, by definition, one million times per

million words,

(4.36)

Pr([p] | [#_]) = 0.0232
Pr([k] | [#_]) = 0.0262

Initial [w] occurs 43594 times per million words, and [w] in general 59750 times, so

non-initial [w] occurs 16156 times per million words. Hence,

9 (cont.) point difference between the activation of the [j] and [l] units after [s]. A difference of this
magnitude should correspond to an effect of substantial size.

(4.37)

Pr([p] | [_w]) = 27/16156 = 0.00167
Pr([k] | [_w]) = 2480/16156 = 0.153

Initial [j] occurs 14864 times per million words, and [j] in general 118503 times, so

non-initial [j] occurs 103639 times per million words. Hence,

(4.38)

Pr([p] | [_j]) = 8525/103639 = 0.0822
Pr([k] | [_j]) = 3018/103639 = 0.0291

Thus, the decision variables for the INC-1 theory are

(4.39)

Pr([p] | [#_]) * Pr([p] | [_w]) = 0.0232 * 0.00167 = 0.0000387
Pr([k] | [#_]) * Pr([k] | [_w]) = 0.0262 * 0.153 = 0.00401
Pr([p] | [#_]) * Pr([p] | [_j]) = 0.0232 * 0.0822 = 0.00191
Pr([k] | [#_]) * Pr([k] | [_j]) = 0.0262 * 0.0291 = 0.000762

A [p] is 103 times less likely than a [k] in [#_w], but 2.5 times more likely in [#_j].

The INC-1 theory therefore predicts a strong bias against [p] in the [#_w] context

compared to the [#_j] context. The order of [p] report across the continuum is expected to

be [#_ji] = [#_jae] > [#_wi] = [#_wae], since the following vowel lies outside the relevant

context.
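The INC-1 arithmetic above can be reproduced directly from the counts in Table 4.35; a sketch:

```python
# Diphone counts from Table 4.35, per million words.
counts = {"#p": 23245, "#k": 26201, "pw": 27,
          "kw": 2480, "pj": 8525, "kj": 3018}

# Non-initial occurrences of [w] and [j], derived as in the text.
noninitial_w = 59750 - 43594            # = 16156
noninitial_j = 118503 - 14864           # = 103639

# Left-context probabilities: [#] occurs one million times per million
# words, so Pr(x | [#_]) is simply count / 1e6.
p_p = counts["#p"] / 1e6                # 0.0232
p_k = counts["#k"] / 1e6                # 0.0262

# Decision variables: product of left- and right-context probabilities.
dv_p_w = p_p * counts["pw"] / noninitial_w
dv_k_w = p_k * counts["kw"] / noninitial_w
dv_p_j = p_p * counts["pj"] / noninitial_j
dv_k_j = p_k * counts["kj"] / noninitial_j

print(f"[#_w]: [k] is {dv_k_w / dv_p_w:.0f} times as likely as [p]")
print(f"[#_j]: [p] is {dv_p_j / dv_k_j:.1f} times as likely as [k]")
```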

4.3.3.2.2. SC-1 context

In this version of the MERGE TP theory, the left and right contexts are treated as a

single unit. Relevant frequency counts from the American English Kucera-Francis database

are shown in Table 4.40:

Table 4.40: Triphone frequencies for the stimuli of Experiment 2

Sequence   Frequency (per million words)
[#pw]      0
[#kw]      1172
[#pj]      6858
[#kj]      1487

Since sequences of the form [#_w] occur about 12704 times per million words, and

those of the form [#_j] occur about 39037 times,

(4.41)

Pr([p] | [#_w]) = 0/12704 = 0.000
Pr([k] | [#_w]) = 1172/12704 = 0.0923
Pr([p] | [#_j]) = 6858/39037 = 0.176
Pr([k] | [#_j]) = 1487/39037 = 0.0381

Again, we expect a sizable bias against [p] in the [#_w] environment compared to

the [#_j] environment, since [p] is infinitely less frequent than [k] in [#_w], but is 4.6 times

more frequent in [#_j]. The following vowel, being outside the relevant context, is again

predicted to make no difference, so that the predicted order of [p] response across the

continuum is [#_ji] = [#_jae] > [#_wi] = [#_wae].
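The SC-1 arithmetic can likewise be reproduced from Table 4.40 and the context totals just quoted; a sketch:

```python
# Triphone counts from Table 4.40, per million words.
counts = {"#pw": 0, "#kw": 1172, "#pj": 6858, "#kj": 1487}
total_Cw = 12704    # occurrences of [#_w] sequences, from the text
total_Cj = 39037    # occurrences of [#_j] sequences, from the text

# Conditional probability of each stop given its word-initial context.
pr = {seq: n / (total_Cw if seq.endswith("w") else total_Cj)
      for seq, n in counts.items()}

# [p] never occurs in [#_w]; before [_j] it is about 4.6 times as
# frequent as [k].
print(f"Pr([p] | [#_w]) = {pr['#pw']:.3f}")
print(f"Pr([k] | [#_w]) = {pr['#kw']:.4f}")
print(f"[#_j]: [p]/[k] ratio = {pr['#pj'] / pr['#kj']:.1f}")
```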

4.3.3.3. OT grammatical theory

As discussed in §1.3.2.5, the [pw bw] onsets violate a markedness constraint which

is too low-ranked to actually ban them, OCP[LAB]. It does not dominate any faithfulness

constraints, and hence, in the OT grammatical theory, is not expected to influence

perception.

(4.42) [p] illegal => [k] bias

UR = •                 (faithfulness)   OCP[LAB]
a. (•, [kwifkous])
b. (•, [pwifkous])                      *

The prediction is therefore a null result; any sign of an effect is unexpected.

However, it is also possible that listeners' grammars differ: some rank OCP[LAB]

high enough to make [pw] illegal, while others do not. For these listeners, the predicted

order of [p] response rates is [#_ji] = [#_jae] > [#_wi] = [#_wae], just as in the MERGE TP

theory. The vowel plays no role, because it is outside the structural description of

OCP[LAB]. The effect will be attenuated by averaging these listeners in with the others, but

no other pattern of results should occur besides the null one and [#_ji] = [#_jae] > [#_wi] =

[#_wae].

4.3.4. Methods

The same methods were used as in Experiment 1, except for the stimuli, which were

synthesized from the following template:

Figure 4.43. Schema for the critical stimuli of Experiment 2

Figure 4.44. Schema for the filler stimuli of Experiment 2

The [p]-[k] continuum was made by contracting the bandwidth of a burst centered

at 875 Hz from 1000 Hz wide to 100 Hz wide, so that a diffuse burst became a compact one

with the same center frequency.
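The continuum construction can be sketched as below; the linear spacing of the bandwidth steps is an assumption, as the text does not state the step rule:

```python
# Seven bursts centered at 875 Hz, narrowing from a 1000 Hz-wide
# (diffuse, [p]-like) band to a 100 Hz-wide (compact, [k]-like) band.
CENTER_HZ = 875
N_STEPS = 7

bandwidths = [1000 + i * (100 - 1000) / (N_STEPS - 1) for i in range(N_STEPS)]
bands = [(CENTER_HZ - bw / 2, CENTER_HZ + bw / 2) for bw in bandwidths]

for bw, (lo, hi) in zip(bandwidths, bands):
    print(f"bandwidth {bw:6.1f} Hz: {lo:6.1f}-{hi:6.1f} Hz")
```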

As in Experiment 1, there could be an ambiguous phoneme at the beginning or end

of the stimulus. On any AXB trial, listeners could not know where it was until they had

processed the X stimulus.

Eight University of Massachusetts undergraduates participated in the experiment.

To minimize the chance that they were familiar with [pw] onsets from foreign-language

study, only listeners who had not studied French or Spanish were allowed to participate.

4.3.5. Results

One listener's data did not reach criterion performance on the endpoint stimuli and

was discarded, leaving 7. A total of 3840 trials were collected, of which 100 were

discarded (for pressing an unassigned button, or having an RT above 1500ms). Results for

each of the 4 conditions are shown in Figure 4.45 and Table 4.46. No trace of a

phonotactic boundary shift was detected.

Figure 4.45. Identification curves for the stimuli of Experiment 2, pooled across 7 listeners
(% [p] response against intermediate stimulus number)

Table 4.46. Mean % [p] response, all intermediate stimuli

Condition   % [p] response   SD    95% confidence interval (t0.025,7 = 2.365)
[#_wi]      56.2             4.6   [45.3, 67.1]
[#_wae]     53.4             4.3   [43.2, 63.6]
[#_ji]      53.3             4.8   [41.9, 64.7]
[#_jae]     55.6             4.3   [45.4, 65.8]

4.3.6. Discussion

Neither lexical statistics, transitional probabilities, nor low-ranked markedness

constraints seem to have an effect here. This is a null result (and one based on a small

sample); drawing conclusions from it is problematic. The only theory that directly predicted

a null result was the OT grammatical theory, but might there have been confounds that

vitiated the design?

One possibility is that the stimuli were ill-chosen. It might be that categorical

perception of the initial stop stimuli left little time for the development of lexical effects.

However, a follow-up experiment in which the stop was part of the context and the glide was

varied from [w] to [j] was attempted, and likewise failed to find any effect: Averaged across

seven listeners, the percentage of "r" response was 58.1 after [k_] and 55.9 after [p_].

Another possibility is a low-level acoustic-phonetic interaction between the stop and

the glide counteracting the phonotactic effect. However, the two best-known of these

effects, auditory contrast and compensation for coarticulation, are both expected to assist the

phonotactic effect. Auditory contrast would make the bursts sound higher before [w], and

hence produce fewer [p] responses there. Compensation for coarticulation would likewise

attribute some of the labiality of the burst to anticipatory rounding before a following [w],

making the burst have to be more labial in order to sound like [p] and again reducing [p]

responses before [w]. This is what was found by Bailey, Summerfield, and Dorman (1977,

as cited in Repp 1982), who presented a [b]-[d] continuum before front and back vowels

and found that the vowel with the lower F2 produced more [b] responses.

As a more rigorous test of the OT grammatical explanation, it was decided to

directly compare the phonotactic badness of [pw] with that of [tl], a configuration known to

be illegal enough to cause boundary shifts on an [j]-[l] continuum (Massaro & Cohen

1983, Pitt 1998). Since the statistics of [pw] and [tl] are very similar, any finding of a

difference between them in phonotactic efficacy would be strong evidence for the OT

grammatical theory over TRACE and the MERGE TP theories.

4.4. Experiment 3: Sequence frequency and the relative phonotactic badness of

[pw] and [tl] onsets

4.4.1. Rationale

Although [pw] is a marked onset in English, it is not as marked as those which have

hitherto been shown to cause phonotactic boundary shifts. In §1.3.2.5, [pw] was analyzed

as a lexical rather than a phonological gap, violating only the low-ranked OCP[Lab ]. On

the other hand, it is agreed by all authorities that initial [tl] is not permitted in English. The

perceptual strength of this prohibition has been experimentally demonstrated: The

boundary between [j ] and [1] is closer to [1] after [t_ ] than after [p _ ] (Massaro & Cohen

1983, Pitt 1998). In Chapter 2, §3.2.5, we have analyzed this as a consequence of a general

prohibition on successive same-syllable [-cont] segments using the same major articulator,

expressed as a markedness constraint OCP(CONT, PL).

By presenting the same [p]-[t] continuum before [w], [j], and [l], it was hoped that

we would be able to compare the size of the phonotactic boundary shift produced by [w]

with that produced by [l].

4.4.2. Design

Listeners were presented with a 7-step [p]-[t] continuum (two endpoints and five

intermediate steps) in the environments

Table 4.47. Phonotactics of the stimuli for Experiment 3

Carrier          [p]   [t]
[_wifkous]       ?     ✓
[_wivnam]        ?     ✓
[_waefkous]      ?     ✓
[_waevnam]       ?     ✓
[_jifkous]       ✓     ✓
[_jivnam]        ✓     ✓
[_jaefkous]      ✓     ✓
[_jaevnam]       ✓     ✓
[_lifkous]       ✓     ✗
[_livnam]        ✓     ✗
[_laefkous]      ✓     ✗
[_laevnam]       ✓     ✗

The phonotactic status of the endpoints could be varied by manipulating the glide.

An [l] made [t] illegal (a phonological gap); a [w] made [p] marginal (legal, but infrequent -

a lexical gap); a [j] made both legal.

Here, the MERGE TP and OT grammatical theories make different predictions. To

MERGE TP, [pw] and [tl] are both disfavored, since both sequences are of near-zero

probability. To the OT grammatical theory, only [tl] is illegal (ruled out by an active

markedness constraint). Hence MERGE TP predicts that both [w] and [l] contexts will

shift the [p]-[t] boundary, in different directions, compared to the [j] baseline. The OT

grammatical theory, on the other hand, predicts that the [l] context will cause a much larger

shift (if any) than the [w] context.

As before, the MERGE TP and OT grammatical theories predict that the vowel of

the initial syllable, [i] or [ae], will have no effect on the boundary location, being outside of

the statistically or phonologically relevant context. TRACE, on the other hand, expects the

choice of vowel to contribute to the effect: since [plae] is a much more frequent word onset

than [pli], the shift should be larger before [lae] than [li]; and since [twi] is more frequent

than [twae], the shift (in the other direction) should be larger before [wi] than before [wae].

Cohort sizes and frequency counts illustrating this are shown in Table 4.48; again,

they are taken from the American English Kucera-Francis dictionary.

Table 4.48. Frequency statistics for the stimuli of Experiment 3

Onset    Number of words beginning with that onset   Total frequency (per million words, written)
[twi]    10                                          34
[pwi]    1                                           0
[twae]   1                                           0
[pwae]   0                                           0
[tji]    29                                          128
[pji]    74                                          772
[tjae]   79                                          600
[pjae]   14                                          139
[tli]    0                                           0
[pli]    2                                           1
[tlae]   0                                           0
[plae]   33                                          548

4.4.3. Predictions

4.4.3.1. TRACE simulation

These stimuli were chosen in the expectation that the TRACE response would be

determined chiefly by the word onsets, from stop through vowel. The rate of [p] response

before [_w], [_l], and [_j] was expected to be determined largely by the relative frequencies

of [p] compared to those of [t] in each environment:

(4.49)

[_l] > [_j] > [_w]

since [pl] will activate a much larger cohort than [tl], and [tw] than [pw].

The cohort sizes, and activation strengths, were expected to be modulated by the

vowel. For example, an initial [pl] will activate a cohort of words. Table 4.48 shows that if

the following vowel is [i], only a couple of rare words will receive support from that [i],

while the rest will tend to be deactivated by the mismatch. If the vowel is [ae], on the other

hand, a larger set of words will be supported and further activated, and correspondingly

fewer will be deactivated. Hence [p] should receive more support from [plae] than from

[pli]. If this is what happens, TRACE should make the following predictions:

(4.50)

[_wae] > [_wi]
[_ji] = [_jae], or perhaps [_ji] > [_jae]
[_lae] > [_li]

The TRACE distribution is supplied with an ambiguous stimulus phoneme [T] in

between [p] and [t], which activates both equally. This was used to construct the simulated

ambiguous stimuli.

Simulations for the [_fkous] and [_vnAm] stimuli were made with slightly different

lexicons, one of which included words with [f] and the other of which included words with

[v]. These simulations are discussed separately in 4.4.3.1.1. and 4.4.3.1.2.; the results are

then compared with the expected predictions in 4.4.3.1.3.

4.4.3.1.1. [_fkous] stimuli

The same lexicon was used as in the simulation of Experiment 2. Words in the

lexicon which began with the critical onsets are shown in Table 4.51. This approximated

the distribution found in the full lexicon, as shown in Table 4.48.

Table 4.51. Words beginning with the critical onsets in the lexicon used for the TRACE
simulation of Experiment 3

Onset    Words
[twi]    twist
[pwi]    -
[twae]   -
[pwae]   -
[tji]    trig, trick
[pji]    predict, precarious, pretty
[tjae]   traffic, track, tract, trap, traps
[pjae]   practice, practically
[tli]    -
[pli]    -
[tlae]   -
[plae]   placid, plastic

When the simulation was first run, it was found that every context produced an

extremely strong [t] bias (the ratio of activation levels at Cycle 75 being about 70 to 10).

This was because a large number of short words ending in [t] were slightly activated by the

initial ambiguous segment, and remained slightly active throughout the simulation, giving [t]

a large cumulative advantage regardless of the following segments. This artifactual effect

completely swamped any influence of the remainder of the stimulus. To circumvent this

problem, the lexicon was edited, and word-final [p] and [t] were recoded as [b] and [d], so

that the ambiguous [T] segment would not activate them.

When the simulation was run again, it turned out that TRACE did not predict any

phonotactic effect in this experiment. The biases before [_w], [_j], and [_l] were of

comparable size, as were the activation levels of the [p] and [t] units:

Table 4.53. Results of the TRACE simulation of Experiment 3: Activation levels at Cycle 75

Stimulus      [p] activation   [t] activation   Difference
[TwifkAs]     38               32               6
[TwaefkAs]    44               27               17
[TjifkAs]     42               29               13
[TjaefkAs]    36               39               -3
[TlifkAs]     38               31               7
[TlaefkAs]    39               30               9

Why is the bias in the pre-[_l] cases so small? The TRACE displays show that the

network is extremely good at spotting words, or parts of words, inside other words.

When the [TlifkAs] stimulus is presented, the most activated word units on Cycle 75

are leaf (37), plea (32), and subtly (23). The first of these is neutral between [p] and [t].

The other two, plea and subtly, urge in opposite directions and cancel each other out. The

lexicon contains no [pli]-initial words which would decide the issue in favor of [p], so, as we

expected, the phonotactic effect is small if it exists at all.

When the [TlaefkAs] stimulus is presented, by far the most active unit on Cycle 75 is

laugh (46). It becomes active early, and is strong enough to inhibit other word candidates,

such as placid and plastic, which we had counted on to produce a larger [p] bias.

If the differences are taken as predictors of the rate of [p] report across the

continuum, then the expected order of effects is [_wae] > [_ji] > [_lae] = [_li] = [_wi] >

[_jae]. The average difference before [_w] is 11.5; before [_j], 5; and before [_l], 8.

TRACE predicts

(4.54)

[_w] > [_l] > [_j]

and

(4.55)

[_wae] > [_wi]
[_ji] > [_jae]
[_lae] = [_li] (or perhaps [_lae] > [_li])

These are nearly identical to the predictions made by TRACE in (4.50).

4.4.3.1.2. [_vnAm] stimuli

For this simulation, a lexicon was selected using almost the same procedure as in

Experiment 2. The procedures were the same except that where the Experiment 2 lexicon

included words with [f] (recoded as [S]), this one included words with [v] (recoded as [j]).

Pretesting showed that prove and approve tended to dominate responses in the pre-[_j]

conditions, as proof had in Experiment 2, so they were removed on the same grounds: that

[i] and [ae] are not actually very confusable with [u]. As with the simulation in the previous

section, word-final [p] and [t] in the lexicon were replaced with [b] and [d]. The resulting

lexicon contained the same set of words with the critical onsets as in the simulation for the

[_fkous] stimuli (Table 4.51).

Results of the simulation are shown in Table 4.56:

Table 4.56. Results of the TRACE simulation of Experiment 3: Activation levels at Cycle 75

Stimulus      [p] activation   [t] activation   Difference
[TwivnAm]     41               29               12
[TwaevnAm]    48               20               28
[TjivnAm]     49               26               23
[TjaevnAm]    38               35               3
[TlivnAm]     43               30               13
[TlaevnAm]    44               25               19

Again, the size of the difference in each case is determined by one or two lexical

items. For the [TlivnAm] stimulus, there is some activation from plea (32) and pleaded (23)

in support of [p], which is reduced somewhat by subtly (16). The larger [p] bias for the

[TlaevnAm] stimulus is due to placid (66). The great difference in bias between [TjivnAm]

and [TjaevnAm] is caused by [TjivnAm]'s activation of previous (51).

The predicted rates of [p] report are [_wae] > [_ji] > [_lae] > [_wi] > [_li] > [_jae].

The average values of the differences are: [_w], 20; [_j], 13; [_l], 16, so TRACE predicts

(4.57)

[_w] > [_l] > [_ɹ]

(4.58)

[_wæ] > [_wi]
[_ɹi] > [_ɹæ]
[_læ] > [_li]

4.4.3.1.3. Expected and actual TRACE predictions

Our expectations of what TRACE would do are only partially supported by actual simulations, which revealed TRACE's extreme sensitivity to individual lexical items. The expected phonotactic effect, [_w] > [_ɹ] > [_l], was not supported by the simulation. In both the [_fkoʊs] and [_vnʌm] contexts, [_w] produced the largest [p] bias when it had been expected to produce the smallest, and [_l], expected to cause the largest, never did. The cohort of words activated by the initial stop-glide sequence did not long remain active. When the vowel arrived, it deactivated the great majority of the cohort members, reducing their influence on the stop judgment.

On the other hand, the effect of the following vowel was pretty much as expected, because the stop-glide-vowel sequence was three segments long: long enough to mismatch all but one or two lexical items and to activate those items sufficiently to dominate the lexicon and decide the outcome. The phonotactic effect in TRACE does not come from partial activation of many lexical items, but from high activation of one or two. We therefore expect

(4.59)

[_wæ] > [_wi]
[_ɹi] > [_ɹæ]
[_læ] > [_li]

4.4.3.2. MERGE TP

4.4.3.2.1. INC-1 context

Frequency counts from the American English Kucera-Francis database are shown in

Table 4.60. The "forbidden” [pw] and [tl] occur with fair frequency within words (lapwing,

potlid).

Table 4.60. Diphone frequencies for the stimuli of Experiment 3

Sequence   Frequency (per million words)
[#p]       23245
[#t]       41840
[pw]          27
[tw]         996
[pɹ]        8525
[tɹ]        8468
[pl]        3589
[tl]         220

Hence

(4.61)

Pr([p] | [#_]) = 0.0232
Pr([t] | [#_]) = 0.0418

Two-phoneme sequences whose second member is [w] occur 16156 times per million words, so

(4.62)

Pr([pw] | [_w]) = 27/16156 = 0.00167
Pr([tw] | [_w]) = 996/16156 = 0.0616

Two-phoneme sequences whose second member is [ɹ] occur 103639 times per million words, so

(4.63)

Pr([pɹ] | [_ɹ]) = 8525/103639 = 0.0823
Pr([tɹ] | [_ɹ]) = 8468/103639 = 0.0817

Two-phoneme sequences whose second member is [l] occur 57018 times per million words, so

(4.64)

Pr([pl] | [_l]) = 3589/57018 = 0.0629
Pr([tl] | [_l]) = 220/57018 = 0.00386

The predictive variables for the INC-1 theory are therefore

(4.65)

Pr([p] | [#_]) * Pr([pw] | [_w]) = 0.0232 * 0.00167 = 0.0000387
Pr([t] | [#_]) * Pr([tw] | [_w]) = 0.0418 * 0.0616 = 0.00257
Pr([p] | [#_]) * Pr([pɹ] | [_ɹ]) = 0.0232 * 0.0823 = 0.00191
Pr([t] | [#_]) * Pr([tɹ] | [_ɹ]) = 0.0418 * 0.0817 = 0.00342
Pr([p] | [#_]) * Pr([pl] | [_l]) = 0.0232 * 0.0629 = 0.00146
Pr([t] | [#_]) * Pr([tl] | [_l]) = 0.0418 * 0.00386 = 0.000161

The [_ɹ] context is nearly unbiased between the two stops. In the [_w] context, [t] is expected to be about 66 times as likely as [p], while in the [_l] context, [p] is expected to be about 9.1 times as likely as [t].
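The arithmetic in (4.61)-(4.65) can be reproduced mechanically. The sketch below recomputes the INC-1 predictive variables from the Table 4.60 counts; the Python framing and variable names are mine, not part of the original materials, and "r" stands in for the rhotic.

```python
# Recomputing the INC-1 predictive variables of (4.65) from the
# Kucera-Francis diphone counts in Table 4.60 (per million words).
diphones = {"#p": 23245, "#t": 41840,
            "pw": 27, "tw": 996,
            "pr": 8525, "tr": 8468,
            "pl": 3589, "tl": 220}

MILLION = 1_000_000
# Totals of two-phoneme sequences ending in each glide (from the text):
glide_totals = {"w": 16156, "r": 103639, "l": 57018}

def inc1(stop, glide):
    """Pr(stop | #_) * Pr(stop+glide | _glide), as in (4.65)."""
    left = diphones["#" + stop] / MILLION
    right = diphones[stop + glide] / glide_totals[glide]
    return left * right

for glide in "wrl":
    ratio = inc1("t", glide) / inc1("p", glide)
    print(f"[_{glide}]: t/p ratio = {ratio:.2f}")
```

Running this gives a t/p ratio of about 66 before [_w] and about 0.11 before [_l] (that is, [p] about 9.1 times as likely as [t]), matching the figures in the text.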

The INC-1 version of the MERGE TP theory estimates the phonotactic badness of

[pw] as much greater than that of [tl], because [tl] is of far greater probability than [pw]

conditional on the glide. In absolute terms, [tl] is actually about eight times more frequent

than [pw]. Because the left and right contexts contribute independently to the theory's

probability estimates, it overestimates the rate at which [tl] will occur word-initially.

Again, MERGE TP INC-1 does not expect anything else in the stimulus context, in

particular the vowel following the glide, to influence the size of the boundary shift.

4.4.3.2.2. SC-1 context

Here, the left and right contexts are treated as a single unit. The relevant sequence

frequencies, from the American English Kucera-Francis database, are shown in Table 4.66:

Table 4.66. Triphone frequencies for the stimuli of Experiment 3

Sequence   Frequency (per million words)
[#pw]         0
[#tw]       251
[#pɹ]      6858
[#tɹ]      2625
[#pl]      1981
[#tl]         0

Since sequences of the form [#_w] occur about 12704 times per million words, those of the form [#_ɹ] occur about 39037 times, and those of the form [#_l] occur about 15459 times,

(4.67)

Pr([p] | [#_w]) = 0/12704 = 0.000
Pr([t] | [#_w]) = 251/12704 = 0.0198
Pr([p] | [#_ɹ]) = 6858/39037 = 0.176
Pr([t] | [#_ɹ]) = 2625/39037 = 0.0672
Pr([p] | [#_l]) = 1981/15459 = 0.128
Pr([t] | [#_l]) = 0/15459 = 0.000

In the [#_ɹ] context, the bias is slight: [p] is only about 2.6 times as likely as [t]. However, in the [#_w] context, [p] is infinitely less likely than [t], while in the [#_l] context, [t] is infinitely less likely than [p].

Under SC-1, we thus expect a large boundary shift, in opposite directions, in both the [#_w] and the [#_l] contexts10. Again, the following vowel is outside the relevant context and is not expected to contribute to the effect.
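The SC-1 quantities in (4.67), and the ratio-versus-difference point made in the footnote, can be sketched the same way; the counts are those of Table 4.66, and the Python framing ("r" for the rhotic) is mine.

```python
# SC-1 conditions the stop on the whole word-initial frame #_G.
# Triphone counts from Table 4.66 (per million words).
triphones = {("p", "w"): 0, ("t", "w"): 251,
             ("p", "r"): 6858, ("t", "r"): 2625,
             ("p", "l"): 1981, ("t", "l"): 0}
frame_totals = {"w": 12704, "r": 39037, "l": 15459}

def sc1(stop, glide):
    """Pr(stop | #_glide), as in (4.67)."""
    return triphones[(stop, glide)] / frame_totals[glide]

# [p] is about 2.6 times as likely as [t] before the rhotic, but the
# [pw] and [tl] probabilities are exactly zero, so the corresponding
# ratios are infinite; the *differences*, by contrast, are all modest.
print(sc1("p", "r") / sc1("t", "r"))
print(sc1("p", "l") - sc1("t", "l"))
print(sc1("p", "r") - sc1("t", "r"))
```

The last two printed values come out nearly equal, which is the footnote's point: a difference-based guessing strategy would treat the [#_ɹ] and [#_l] contexts as about equally discouraging to [t].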

4.4.3.3. OT grammatical theory


If [pw] is not actually forbidden by any active markedness constraint in English, then we do not expect the [#_w] condition to produce any kind of boundary shift: it should be indistinguishable from the [#_ɹ] condition. On the other hand, the highly illegal [tl] is ruled out by an active constraint, OCP (PL, CONT), and we expect a sizable boundary shift in the [#_l] condition. The following vowel is again outside the structural description of OCP (PL, CONT) and is not expected to have an effect.

(4.68) [t] illegal => [p] bias

UR = •                OCP (PL, CONT)   (faithfulness)   *LABLAB
a. (•, [plævnʌm])
b. (•, [tlævnʌm])           *!

(4.69) [p] marked, but not illegal => small if any [t] bias

UR = •                OCP (PL, CONT)   (faithfulness)   *LABLAB
a. (•, [pwævnʌm])                                          *
b. (•, [twævnʌm])

10 We are assuming here that the magnitude of the boundary shift is controlled by the ratio of the probabilities, for reasons described in §1.3.2.3. If it is controlled by the difference (a poorer guessing strategy), then the [#_ɹ] and [#_l] contexts are about equally discouraging to [t], which certainly does not reflect native speakers' intuitions about phonotactic permissibility.

4.4.4. Methods

Unlike the stimuli in Experiments 1 and 2, this set was constructed from natural

speech. Several dozen tokens of each carrier were recorded by the experimenter (a native

speaker of American English) as a list of isolated words. The carriers were recorded

without an initial stop. One token of each was selected on the basis of the experimenter's

judgment of clarity and of uniformity of speaking rate. The most important criterion, and a

difficult one to meet, was that the initial glides not be confusable with each other.

To these carriers was prepended a 7-step synthetic [p]-[t] continuum, consisting of

a burst plus aspiration. The continuum was created by varying the F2 onset from 1000 Hz

to 2000 Hz in equal steps. The burst was kept very short to prevent it from sounding like

[k]. After the onset, F2 continued in a straight line towards 1000 Hz as the aspiration faded

out.
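As a concrete illustration, the seven F2 onset values implied by the description above (equal steps from 1000 Hz to 2000 Hz) can be listed directly; this sketch is mine, not part of the original stimulus-generation code.

```python
# Seven equal steps from 1000 Hz to 2000 Hz for the F2 onset of the
# synthetic [p]-[t] burst continuum described above.
f2_onsets = [1000 + step * (2000 - 1000) / 6 for step in range(7)]
print([round(f) for f in f2_onsets])   # 1000, 1167, 1333, ..., 2000
```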

The presentation paradigm was the same as that used in Experiments 1 and 2. The

only difference was that in this experiment, the ambiguous segment only ever came at the

beginning of the stimulus. The ends of the stimuli did vary from trial to trial, but were

always the same within each trial.

Nineteen University of Massachusetts students participated. None reported any

history of hearing disorders.

4.4.5. Results

Of the nineteen listeners, twelve performed accurately enough on the continuum

extrema to be included in the analysis. Each contributed twelve responses to each

ambiguous stimulus, making 720 responses per subject.

An unexpected finding was that the "irrelevant" filler context, [_vnʌm] and [_fkoʊs], actually influenced the location of the [p]-[t] boundary in the baseline pre-[_ɹ] condition. For this reason, I will discuss the [_vnʌm] and [_fkoʊs] results separately.

4.4.5.1. [_vnʌm] stimuli

Psychometric functions are shown in Figures 4.70 and 4.71. It is clear that the listeners were perceiving the continuum as a smooth transition from [p] to [t]. Figure 4.70 compares the baseline conditions (before [_ɹ]) with the critical condition before [_l]; Figure 4.71, with the critical condition before [_w]. One can see that the pre-[_l] responses are considerably more favorable to [p] than the pre-[_ɹ] and pre-[_w] responses. These latter are virtually indistinguishable from each other. The identity of the following vowel does not seem to have a strong effect on the response curve.

To assess the statistical significance of the phonotactic shift, the overall rate of [p] response was calculated separately for each subject in each condition (based on a maximum of 60 responses to the whole continuum in each condition). A planned contrast, the difference between the rate in the baseline condition and the rate in each of the two critical conditions, was computed for each listener, and these differences were averaged across all 12 listeners to estimate effect size. Results are shown graphically in Figures 4.70 and 4.71, and in tabular format in Tables 4.72 and 4.73:

Figure 4.70. Identification curves for the [...vnʌm] stimuli of Experiment 3, pooled across 12 listeners, comparing the [_l] condition with the [_ɹ] baseline

[Line plot: percent "p" response (0-100) as a function of intermediate stimulus number (1-5), with one curve each for the [_livnʌm], [_lævnʌm], [_ɹivnʌm], and [_ɹævnʌm] conditions.]

Figure 4.71. Identification curves for the [...vnʌm] stimuli of Experiment 3, pooled across 12 listeners, comparing the [_w] condition with the [_ɹ] baseline

[Line plot: percent "p" response (0-100) as a function of intermediate stimulus number (1-5), with one curve each for the [_wivnʌm], [_wævnʌm], [_ɹivnʌm], and [_ɹævnʌm] conditions.]

Table 4.72. Mean percent "p" response, all intermediate [...vnʌm] stimuli

Context   Mean   SD     95% confidence interval (t0.025, 12 = 2.179)
[_wi]     48.4    9.02  [42.7, 54.1]
[_wæ]     45.6   10.9   [38.7, 52.5]
[_ɹi]     44.5   14.3   [35.4, 53.6]
[_ɹæ]     42.4    9.8   [36.2, 52.2]
[_li]     58.5   11.5   [51.2, 65.8]
[_læ]     56.9   13.5   [48.3, 70.4]

Table 4.73. Differences in mean "p" response, pairwise by subject, [...vnʌm] stimuli

Difference       Mean    SD     95% confidence interval (t0.025, 12 = 2.179)
[_li] - [_ɹi]    14.0    13.3   [5.53, 22.5]
[_læ] - [_ɹæ]    14.5    11.3   [7.35, 21.7]
[_wi] - [_ɹi]     3.93   16.9   [-6.83, 14.7]
[_wæ] - [_ɹæ]     3.15   10.2   [-3.32, 9.62]
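The confidence intervals reported here follow the usual one-sample t formula, using the critical value quoted in the table headers (t = 2.179) and n = 12 listeners. A sketch, with a made-up data vector, assuming that formula:

```python
# 95% confidence interval of the form mean +/- t * SD / sqrt(n), with
# the critical value t = 2.179 from the tables and n = 12 listeners.
# The input vector here is hypothetical.
import math

def ci95(values, t_crit=2.179):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    half = t_crit * sd / math.sqrt(n)
    return mean - half, mean + half

# A per-subject difference whose interval excludes zero counts as a
# reliable phonotactic shift.
```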

The difference between the [_l] and [_ɹ] conditions is positive, as expected, and the 95% confidence interval around each of the means excludes zero. The differences in the [_li] and [_læ] cases are not distinguishable from each other. On the other hand, there is no reliable difference between the [_w] conditions and the [_ɹ] baseline, and the nonsignificant trend actually goes in the wrong direction.

These results are just what we would expect under the OT grammatical theory: a large boundary shift caused by an active markedness constraint, and none by an inactive one. They strengthen our suspicion that the lack of a perceptual bias against [pw] in Experiment 2 was a real phenomenon, not just an effect of an insensitive paradigm.

TRACE's prediction of more [p] responses before [_wæ] than [_wi], and before [_læ] than [_li], is not borne out; if anything, the reverse has occurred, though the trend is highly non-significant. There is a trend in the expected direction of more [p] responses before [_ɹi] than [_ɹæ], but again it is non-significant.

4.4.5.2. [_fkoʊs] stimuli

The other half of the stimuli proved more problematic, and are harder to interpret in any theory. Psychometric functions are shown in Figures 4.74 and 4.75. The most striking difference is the very large number of [p] responses before [_ɹæ], which does not fall below 50% until the very last step of the continuum. This context in fact produces more [p] responses than any other in the experiment, despite the lack of phonotactic or statistical bias.

None of the theories discussed here has an explanation for this. I can only conclude that it is an acoustic artifact of poor stimulus quality. The [ɹ] in this stimulus is only about 10-15 ms long, less than half as long as the other [ɹ]s, putting the burst closer to the vowel

nucleus. This could have caused an auditory contrast effect, making the burst sound lower

by comparison with the higher F2 of the vowel.

The lack of a reliable [_ɹæ] baseline makes the [...æfkoʊs] responses difficult to analyze. The [...ifkoʊs] data, shown in the next two figures, is consistent with the pattern found with the [_vnʌm] stimuli, though it does not provide especially strong support:

Figure 4.74. Identification curves for the [...fkoʊs] stimuli of Experiment 3, pooled across 12 listeners, comparing the [_l] condition with the [_ɹ] baseline

[Line plot: percent "p" response (0-100) as a function of intermediate stimulus number (1-5), with one curve each for the [_lifkoʊs], [_læfkoʊs], [_ɹifkoʊs], and [_ɹæfkoʊs] conditions.]

Figure 4.75. Identification curves for the [...fkoʊs] stimuli of Experiment 3, pooled across 12 listeners, comparing the [_w] condition with the [_ɹ] baseline

[Line plot: percent "p" response (0-100) as a function of intermediate stimulus number (1-5), with one curve each for the [_wifkoʊs], [_wæfkoʊs], [_ɹifkoʊs], and [_ɹæfkoʊs] conditions.]

Table 4.76. Mean percent "p" response, all intermediate [...fkoʊs] stimuli

Context   Mean   SD     95% confidence interval (t0.025, 12 = 2.179)
[_wi]     51.1    8.79  [45.5, 59.9]
[_wæ]     45.6    9.31  [39.7, 51.5]
[_ɹi]     55.3    9.66  [49.2, 61.4]
[_ɹæ]     65.8   10.2   [59.3, 72.3]
[_li]     60.8    8.14  [55.6, 70.0]
[_læ]     60.3   14.0   [51.4, 69.2]

Table 4.77. Differences in mean "p" response, pairwise by subject, [...fkoʊs] stimuli, [i] condition only

Difference       Mean    SD      95% confidence interval (t0.025, 12 = 2.179)
[_li] - [_ɹi]     5.41   11.27   [-0.25, 11.1]
[_wi] - [_ɹi]    -4.23    5.61   [-7.79, -0.67]

The [_l] effect just misses significance, while the [_w] effect achieves it but is numerically smaller. Again, TRACE's prediction of more [p] responses before [_wæ] than [_wi], and before [_læ] than [_li], is not supported. The prediction of more [p] responses before [_ɹi] than [_ɹæ] is contradicted, but only by the anomalous [_ɹæfkoʊs] context, from which we can conclude nothing.

4.4.6. Discussion

These results confirm our suspicion from Experiment 2 that the ban on initial [pw] is much weaker than the other phonotactic prohibitions which have been the object of perceptual experiments.

The sequences [pw] and [tl], despite their comparable statistical properties, differ in their ability to cause a phonotactic boundary shift. This is unexpected under either version of the MERGE TP theory. Overall, the results of this experiment contradict the proposal that phonotactic illegality is equivalent to zero frequency. The results are especially disappointing for the INC-1 version of the MERGE TP theory, which predicted that [_w] would cause a larger shift than [_l].

TRACE's prediction that the following vowel will have a very strong influence is also not borne out by the data. Since a boundary shift was obtained, TRACE can only explain it as a consequence of lexical activation spreading, in which case the following vowel ought to have had an effect. The failure to find one calls TRACE's explanation into question.

4.5. Experiment 4: Sequence frequency and the relative phonotactic badness of

[bw] and [dl] onsets: Interaction of response variables

4.5.1. Rationale

Experiments 1 through 3 measured effects in stimulus units: Phonetic context was

manipulated as the independent variable, and its effect on the boundary location was

measured as the dependent variable. For example, in Experiment 3, a [p]-[t] continuum was

judged in the contexts [_w], [_ɹ], and [_l], and [_l] was found to cause a larger shift (compared to the baseline [_ɹ] context) than [_w]. This is consistent with the prediction that

the more illegal [tl] onset should be more dispreferred than the less illegal [pw] onset.

However, the finding could also be artifactual. This is because the [_1] and [_w]

contexts are expected to shift the boundary in opposite directions. Different-sized shifts

indicate different-sized phonotactic biases, but they could also simply reflect a closer

perceptual spacing of the stimuli at one end of the [p]-[t] continuum. If, for instance, stimuli

1, 2, and 3 are very similar, while 3, 4, and 5 are very different, then a boundary shift from 3

to 1 (2 stimulus units) might actually be smaller perceptually than a shift from 3 to 4 (1

stimulus unit).

It is also possible that the shifts were due to low-level auditory interactions.

Certainly something unexpected was happening in Experiment 3, where the "irrelevant" filler context ([_fkoʊs] versus [_vnʌm]) interacted with the other stimulus variables.

Perhaps the stimuli of Experiment 3, based on natural tokens, did not provide sufficient

control on irrelevant parameters.

Experiment 4 was designed to eliminate these problems. The technique used here is

to measure the effect of one response on another: Listeners judged a CC cluster in which

both Cs were ambiguous, and the dependent measure was the effect of their decision about

the first C ("g" vs. "d", or "g" vs. "b") on their decision about the second ("l" vs.

“w”) (Nearey 1990). By so doing, one can control stimulus factors completely: The

dependence between stop and sonorant judgments can be measured separately for each

individual stimulus.

A further check on Experiments 2 and 3 is provided by replacing [p] and [t] with [b]

and [d], replicating the original experiment with stimuli that are different segmentally but the

same phonotactically.

4.5.2. Design

The aim was to measure the dependence of "l"/"w" judgments on "g"/"d" and "g"/"b" judgments in English CCV syllables. All listeners were tested on two separate stimulus sets: an array of stimuli ambiguous among [glæ gwæ dlæ dwæ], and one ambiguous among [glæ gwæ blæ bwæ], and classified each one as one of those four stimuli. The dependence between the stop and sonorant judgments was quantified as the change in the log-odds ratio of an "l"/"w" response conditional on the stop judgment (see below, §4.5.5.).
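The dependence measure just described can be written out explicitly; the response counts below are invented purely for illustration.

```python
# Change in the log-odds of an "l" response conditional on the stop
# judgment, as described above. The counts are hypothetical.
import math

def log_odds(n_l, n_w):
    """Log-transformed odds ratio of "l" versus "w" responses."""
    return math.log(n_l / n_w)

# Hypothetical pooled responses to one fixed stimulus:
#   trials judged "g": 60 "l" vs 40 "w"
#   trials judged "d": 20 "l" vs 40 "w"
dependence = log_odds(60, 40) - log_odds(20, 40)
# A nonzero value means the stop decision shifted the sonorant decision
# even though the acoustic stimulus was identical.
```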

4.5.3. Predictions

4.5.3.1. TRACE simulation

In experiments with only one ambiguous segment, TRACE predictions are derived by assuming that a larger activation level for, say, [l] than [ɹ] means a greater likelihood of classifying the stimulus as "l". The likelihood of giving response R_i is given by

(4.78)

p(R_i) = S_i / Σ_j S_j

where

(4.79)

S_j = exp(k a_j)

a_j being the activation level of unit j as j ranges over members of the alternative set, and k being a constant (McClelland & Elman 1986).

It is not immediately obvious how to derive predictions when a two-segment response is called for, since TRACE has no units corresponding to two-segment sequences. The simplest assumption would be that the probability of, e.g., a "bl" response is the product of the probabilities of a "b" and an "l" response. A stimulus which gets "b" judgments 25% of the time and "l" judgments 60% of the time should be classified as "bl" 15% of the time. This would be in keeping with the principle that the units represent hypotheses about the input, and their activation levels represent the strengths of these hypotheses: The network's confidence that the input contains a "b" at time 26 is completely captured by the activation levels of the phoneme units for time slice 26. Under this interpretation, any response dependency between the stop judgment and the sonorant judgment for a fixed stimulus is unexpected.
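Under the assumptions just stated, the choice rule in (4.78)-(4.79) and the independence prediction look like this in code; the activation values and the constant k are arbitrary here, since the text does not fix them.

```python
# Luce-style choice rule of (4.78)-(4.79) plus the independence
# assumption for two-segment responses. k is an arbitrary constant.
import math

def choice_probs(activations, k=5.0):
    """p(R_i) = S_i / sum_j S_j, with S_j = exp(k * a_j)."""
    strengths = {r: math.exp(k * a) for r, a in activations.items()}
    total = sum(strengths.values())
    return {r: s / total for r, s in strengths.items()}

stop = choice_probs({"b": 0.2, "g": 0.4})        # hypothetical activations
sonorant = choice_probs({"l": 0.5, "w": 0.3})

# Independence: the probability of a "bl" response is just the product
# of the marginal "b" and "l" probabilities, e.g. 0.25 * 0.60 = 0.15.
p_bl = stop["b"] * sonorant["l"]
```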

4.5.3.2. MERGE TP

4.5.3.2.1. INC-1 context

Frequency counts from the American English Kucera-Francis database are shown in

Table 4.80. The "forbidden" [bw] and [dl] occur with fair frequency within words

(subway, badly).

Table 4.80. Diphone frequencies for the stimuli of Experiment 4

Sequence   Frequency (per million words)
[bw]          7
[dw]         26
[gw]        148
[bl]       2407
[dl]        275
[gl]        625

Nonfinal [b] occurs 49,772 times per million words; nonfinal [d], 32,251 times; nonfinal [g], 17,586 times. Hence

(4.81)

Pr([w] | [b_]) = 7/49772 = 0.000141
Pr([l] | [b_]) = 2407/49772 = 0.0484
Pr([w] | [d_]) = 26/32251 = 0.000806
Pr([l] | [d_]) = 275/32251 = 0.00853
Pr([w] | [g_]) = 148/17586 = 0.00842
Pr([l] | [g_]) = 625/17586 = 0.0355

The right-hand context is irrelevant, being in every case the same ([æ]). In the baseline context [g_], an [l] is 4.23 times as likely as a [w]. After [b_], [l] is 344 times as likely as [w]; hence, the "b"/"g" decision changes the expected odds of an [l] by a factor of 81. After [d_], [w] is 10.6 times as likely as [l], so the "d"/"g" decision changes the expected odds of an [l] by a factor of 45.

The INC-1 theory therefore predicts that both the "b"/"g" and "d"/"g" decisions will affect the likelihood of an "l" response, with "b" favoring "l" and "d" favoring "w". The effects are predicted to be of similar magnitude, with the "b"/"g" effect perhaps larger than the "d"/"g" effect.

4.5.3.2.2. SC-1 context

In the SC-1 theory, what matters is the relative frequencies of the three-phoneme

strings with [1] and [w] in the middle. These frequencies are

Table 4.82. Triphone frequencies for the stimuli of Experiment 4

Sequence   Frequency (per million words)
[bwæ]        0
[blæ]      270
[dwæ]        0
[dlæ]        0
[gwæ]        0
[glæ]      202

Since [w] and [l] have the same frequency in [d_æ], SC-1 predicts no net bias in favor of either. Meanwhile, the [b_æ] and [g_æ] contexts have almost identical statistics: no [w]s and a couple of hundred [l]s. These two contexts are therefore predicted to have similar effects, biasing responses strongly towards "l". The "b"/"g" decision is thus predicted to have little or no effect on the likelihood of an "l" response, while the "d"/"g" decision ought to have a considerable effect, with "d" leading to more "w" responses.

4.5.3.3. OT grammatical theory

Both [dl] and [bw] are unattested as syllable onsets in English, as shown in Table 4.83. Nonetheless, [dl] is commonly classified as "impossible", while [bw] is "marginal" at worst (Hultzen 1965; Wooley 1970; Catford 1988; Hammond 1999; see §§2.3.2.4 and 2.3.2.5 above).

There are coherent structural grounds for this difference. Both clusters violate a cross-linguistically widespread constraint against successive consonants with the same place of articulation in the same unit, here the syllable onset (McCarthy 1988; Padgett 1991). The [dl] onset has two successive coronals, while the [bw] onset has two labials.

As discussed in §2.3.2.5 above, English grammar is less hostile to [pw bw] than to [tl dl]. English [ɹ] is labial (Delattre & Freeman 1968), so the legal, frequent onsets [bɹ pɹ] violate the same OCP constraint as [bw]. Moreover, the OCP is, cross-linguistically, stronger the more similar the two Cs are in sonority (Selkirk 1988; Padgett 1991). Since [l] is less sonorous than [w] (Kahn 1980; Guenter 2000), the [dl] sequence is closer in sonority than [bw] and hence a worse structural violation.11 The OT grammatical theory therefore predicts a larger perceptual bias against [dl] than against [bw], as shown in (4.84):

11 For a discussion of [sl], [sn], and [st] onsets, see §2.3.2.4 above.

Table 4.83. Frequency of occurrence of the clusters of Experiment 4 as onsets in English

                    Word-initial             Syllable-initial
Cluster         By types   By tokens      By types   By tokens
Labial
  bw                 0           0             0           0
  bl               389       27948           890       69100
Coronal
  dw                10         983            16        1003
  dl                 0           0             0           0
Dorsal
  gw                 6         172            59        4834
  gl               148       12644           292       19001

Note: Values represent occurrences in the 18.5-million-word London-Lund corpus of written and spoken British English, using the principal pronunciation of each entry in the CELEX EPL.CD lemma database (Baayen, Piepenbrock, & Gulikers 1995). Phrasal entries (e.g., black-and-blue) were counted as single words.

When the stop could be either [g] or [d], the decision about the stop has major consequences for the decision about the sonorant: a "d" decision means that the sonorant cannot be parsed as [l] without violating an active markedness constraint. When the stop could be either [g] or [b], though, the stop decision carries less weight, since the sonorant can still be parsed as either [l] or [w] without violating an active markedness constraint.

The OT grammatical theory therefore predicts that the "g"/"d" decision will affect the likelihood of an "l" response, with "d" leading to fewer "l"s, while the "g"/"b" decision will have little or no effect.

(4.84) [dlæ] is more disfavored than [bwæ]

UR = •             OCP (CONT, PL)   (faithfulness)   OCP[LAB]
a. (•, [bwæ])                                           *
b. (•, [blæ])
c. (•, [gwæ])
d. (•, [glæ])
e. (•, [dwæ])
f. (•, [dlæ])            *!
g. (•, [gwæ])
h. (•, [glæ])

4.5.4. Methods

Stimuli were synthetic CCV monosyllables. The V, following Pitt (1998), was always [æ] in order to make all stimuli nonwords. The second C ranged from [l] to [w]; the first, from [g] to [d] or from [g] to [b]. To prevent listeners from memorizing the individual stimuli, a large stimulus set was used (Crowther & Mann 1994): six steps along each continuum, making 36 stimuli in each array. Stimuli were identified by a 2-digit code. The first digit specified position on the stop continuum ('0' = most b- or d-like, '5' = most g-like); the second, position on the sonorant continuum ('0' = most l-like, '5' = most w-like).

Care was taken to make the stimuli acoustically and articulatorily plausible, and to

insure that ambiguous segments were heard as one of the intended phonemes. Synthesizer

parameters are shown in Figure 4.85; only differences between the endpoints are discussed in the text.

Figure 4.85. Synthesis parameters for the stimuli of Experiment 4

[Six panels of synthesizer parameter tracks plotted against time (ms), illegible in this reproduction: (a) the amplitudes AV and AH; (b) the frequencies FTP and FTZ; (c) formant frequency tracks; (d) F2 and the F2 frication bandwidth B2F for the [g] and [b]/[d] endpoints; (e) the amplitudes A2F and AB and the tilt TL; (f) F0.]

The following parameters, which were the same for all stimuli and did not vary

across time, are not shown: GH 50, OQ 30, F6 4900, B6 100, F5 4300, B5 300, F4 3250,

FL 20. The aspiration control AH was turned on during the last part of the vowel to match

the breathy offsets of the natural model tokens.

The [l] and [w] endpoints were made following the acoustic theory of those segments in Stevens (1999:513-555). The [l] endpoint had a low F2 and a high F3, corresponding to an elevated tongue dorsum, and a pole-zero pair in the vicinity of F4, corresponding to the cavity above the tongue blade (Stevens 1999:545). At the [w] endpoint, the pole-zero pair was absent, and all formants above F2 were attenuated to simulate the low-pass filtering effect of a labial constriction. F1 and F2 were even lower than in [l], another correlate of labiality.

The stop endpoints differed only in F2 onset, bandwidth of frication at F2, and

amplitude of the F2 and wide-band frication components. The [g] had a low F2 onset and a

compact burst spectrum, with energy concentrated near F2, while [b] and [d] had diffuse

burst spectra (Blumstein & Stevens 1979). The [b] had the same low F2 onset as [g], while

[d] had a higher onset than [b] and less energy in the F2 region. The stop transitioned into

the sonorant over 65 ms.

Parameter values were adjusted to make even the endpoint stimuli slightly ambiguous. Intermediate steps were made by interpolating the synthesizer parameters. Interpolation was linear except for the bandwidth of F2 frication, which was interpolated along an exponential curve of the form B2F = Ae^(Br), where r went from 0 at the [g] endpoint to 1 at the [b] and [d] endpoints.
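The interpolation scheme just described can be sketched as follows; the endpoint values here are placeholders, since the actual parameter tables are given in the appendix to the chapter.

```python
# Linear interpolation for most parameters; exponential interpolation
# B2F = A * exp(B * r) for the F2-frication bandwidth, with r = 0 at
# the [g] endpoint and r = 1 at the [b]/[d] endpoint. The endpoint
# values below are placeholders, not the actual synthesis settings.
import math

def interp_linear(v0, v1, r):
    return v0 + r * (v1 - v0)

def interp_b2f(b2f_g, b2f_bd, r):
    # B2F(0) = A = b2f_g and B2F(1) = A * exp(B) = b2f_bd fix A and B.
    B = math.log(b2f_bd / b2f_g)
    return b2f_g * math.exp(B * r)

# Six steps along the stop continuum, as in the stimulus arrays:
track = [interp_b2f(100.0, 1600.0, step / 5) for step in range(6)]
```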

Stimuli were synthesized using the cascade branch of the SENSYN terminal

analogue synthesizer (Klatt 1980) with 16-bit resolution, a 16-kHz sampling rate, and a 2-

ms frame length. Six formants were used, but only the lowest two varied. Stimuli were

low-pass filtered with a sharp cutoff at 5512 Hz. All parameters are given in tabular format

in the Appendix to this chapter.

This procedure yielded two 36-element stimulus arrays: one ambiguous between

[glae gwae dlae dwae] (the "d array"), and one between [glae gwae blae bwae] (the "b

array”). Pretesting with 32 listeners showed that the stimuli sounded natural, were

ambiguous, and were only heard as one of the intended syllables.

Seventeen undergraduate native speakers of American English participated in this

experiment as part of a course requirement in introductory psychology. All reported no

speech or hearing deficits, and were naive to the purpose of the experiment. One listener

was dropped for inability to do the practice, leaving 16 valid subjects.

Listeners were tested individually in a soundproof booth (IAC Model 401A) during

two 15-minute blocks separated by a 5-minute break. Eight of the listeners heard the “d”

array in the first block, while the other 8 heard the “b” array first. In each block, the 36

syllables were presented in pseudorandom order through Sennheiser EH-1430 headphones.

The listener responded by pressing one of four buttons labelled “dw dl gw gl” or “bw bl

gw gl”. The order of buttons was rearranged between listeners. One second after the

response, the next syllable was presented. Ten responses were collected for each stimulus.

Each block was preceded by a short practice. Each of the four most extreme stimuli

(at the corners of the array) was presented three times, for a total of 12 stimuli, in random

order, and judged by the listener as in the main experiment. No feedback was provided.

The practice was repeated until the listener had used all four responses (accurately or not).

4.5.5. Results and discussion

For each stimulus S, the 160 responses from all listeners were pooled to estimate the

likelihood that that stimulus would be put into each of the four categories. The statistic of

interest is how the listeners' "l"/"w" judgment on a particular stimulus S is affected by their decision about the stop. The "l"/"w" judgment was quantified as the log-transformed odds ratio of "l" versus "w" responses. This was calculated separately for the two stop responses, as shown in Figure 4.86. If the stop decision had no effect on the "l"/"w" decision, then all of the points in Figure 4.86 would lie on the line y = x.

Displacement from this line indicates phonotactic bias.

Figure 4.86. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the "g"/"d" judgment. Each point represents 16 listeners' pooled responses to one stimulus. (Stimulus codes are explained in the text.)

[Axes: x = ln(P("gl")/P("gw")); y = ln(P("dl")/P("dw")).]

Figure 4.87. Log odds ratios for the "l"/"w" judgment in Experiment 4, contingent on the "g"/"b" judgment. Each point represents 16 listeners' pooled responses to one stimulus. (Stimulus codes are explained in the text.)

[Axes: x = ln(P("gl")/P("gw")); y = ln(P("bl")/P("bw")).]

For example, suppose S is a stimulus from the "d" array which was judged as "gl" 36 times, "gw" 35 times, "dl" 18 times, and "dw" 67 times. When the stop was identified as "g", the sonorant was equally likely to be classified as "l" or "w": ln( P("gl"|S)/P("gw"|S) ) = ln(36/35) = 0.028, plotted on the x-axis in Figure 4.86. When the stop was identified as "d", the sonorant was more likely to be called "w" than "l": ln( P("dl"|S)/P("dw"|S) ) = ln(18/67) = -1.310, plotted on the y-axis. The measure of phonotactic bias is the difference d: the log of the "l"/"w" odds ratio contingent on a "g" decision minus that contingent on a "d" decision, here 0.028 - (-1.310) = 1.338.
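The computation in this example can be written out directly from the counts given above (the published value of 1.338 reflects the rounded intermediates; the unrounded difference is about 1.343):

```python
import math

# Pooled responses to one "d"-array stimulus S (counts from the example above)
counts = {"gl": 36, "gw": 35, "dl": 18, "dw": 67}

# Log odds of "l" vs. "w", contingent on each stop decision
logodds_g = math.log(counts["gl"] / counts["gw"])   # approx. 0.028
logodds_d = math.log(counts["dl"] / counts["dw"])   # approx. -1.314

# Phonotactic bias for this stimulus: contingent-on-"g" minus contingent-on-"d"
d = logodds_g - logodds_d                           # approx. 1.343
```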

For each array, D = mean d over all stimuli was computed. In the "d" array, D = 1.224, indicating that a "d" judgment reduced the odds of "l" by a factor of exp(1.224) = 3.40. In the "b" array, D was 0.4762 - an unexpected result, since it means that a "b"

judgment, far from reducing the odds of a “w”, actually increased them by a factor of

exp(0.4762)= 1.61.

Because the dependent measure, difference in log-odds ratios, bears a complex

relation to the individual subject data and is drawn from an unknown distribution, the

appropriate statistical test is the non-parametric bootstrap (Efron & Gong 1983, Efron &

Tibshirani 1993). The null hypothesis H0: D = 0 was tested against the two-sided alternative H1: D ≠ 0 using the sensitive procedure recommended by Hall and Wilson (1991). For each array, B = 10,000 bootstrap resamples were drawn and used to find Dα such that Pr*(|D* - D| > Dα) = α. For the "d" array, D0.05 = 0.3986 and D0.01 = 0.5238. Both are much less than D = 1.224, allowing rejection of H0 at the 99% confidence level. For the "b" array, D0.05 = 0.4856 and D0.01 = 0.6103; hence, the empirical D of 0.4762 barely misses significance at the 0.05 level.
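A minimal sketch of the resampling test, with invented per-stimulus d values standing in for the experimental data; the recentering of each resample statistic on the observed mean follows Hall and Wilson (1991):

```python
import random

def bootstrap_test(d_values, B=10000, seed=0):
    """Two-sided nonparametric bootstrap test of H0: D = 0, with the
    Hall & Wilson (1991) recentering on the observed mean."""
    rng = random.Random(seed)
    n = len(d_values)
    D_hat = sum(d_values) / n
    # Resample stimuli with replacement; record |D* - D_hat| for each resample
    deviations = []
    for _ in range(B):
        resample = [rng.choice(d_values) for _ in range(n)]
        D_star = sum(resample) / n
        deviations.append(abs(D_star - D_hat))
    deviations.sort()

    def D_alpha(alpha):
        # (1 - alpha) quantile of |D* - D_hat|; reject H0 at level alpha
        # if |D_hat - 0| exceeds this value
        return deviations[int((1 - alpha) * B) - 1]

    return D_hat, D_alpha

# Invented per-stimulus biases (not the experimental data)
d_vals = [1.2, 0.9, 1.5, 1.1, 0.8, 1.4, 1.0, 1.3]
D_hat, D_alpha = bootstrap_test(d_vals)
reject_05 = abs(D_hat) > D_alpha(0.05)
```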

Although both [dlae] and [bwae] are unattested in English, a significant phonotactic

bias was found only against [dlae].

This is inconsistent with TRACE, which predicted that the stop and sonorant judgments would be independent for each stimulus (i.e., that all the points in Figures 4.86 and 4.87 would lie on the line y = x). It is inconsistent with the INC-1 theory, which predicted that the "b"/"g" decision would affect the "l"/"w" decision at least as strongly as the "d"/"g" decision did.

It is consistent with the SC-1 and OT grammatical theories, both of which predicted no effect of "b"/"g" and a large one of "d"/"g".

4.5.6. Foreign-language exposure

An alternative explanation must be considered. Most of the participants had had up

to nine years of exposure to a language with [bw] or [pw] onsets. Could this have allowed

them to build perceptual units for these un-English clusters? Each listener's total number

of "bw" responses was regressed against years of exposure to French, Spanish, or Mandarin Chinese (see scatterplot in Figure 4.88). Longer exposure led to slightly fewer "bw" responses. The trend was weak (R-squared = 0.201) and due mostly to Listener 11, who had no exposure and a very high rate of "bw" response. When this listener was excluded, the trend vanished (R-squared = 0.117). Foreign-language experience does not, therefore, explain the weakness of the bias against [bw].
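The leverage of one listener on R-squared can be illustrated with ordinary least squares. The exposure and response values below are invented, with the first listener playing the role of Listener 11 (no exposure, many "bw" responses); the actual per-listener data are not reproduced here.

```python
def r_squared(xs, ys):
    """R^2 of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy ** 2) / (sxx * syy)

# Invented data: years of exposure vs. total "bw" responses;
# the first point is the high-leverage outlier
exposure = [0, 0, 1, 2, 3, 4, 5, 6, 9]
bw_resp  = [180, 60, 70, 55, 65, 50, 60, 55, 58]

r2_all  = r_squared(exposure, bw_resp)          # trend driven by the outlier
r2_excl = r_squared(exposure[1:], bw_resp[1:])  # trend shrinks without it
```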

Figure 4.88. Total number of "bw" responses in Experiment 4 as a function of individual listeners' exposure to languages containing [bw] or [pw] onsets (French, Mandarin Chinese, or Spanish)

[Axes: x = years of exposure; y = number of "bw" responses.]

4.5.7. A note on TRACE

There is another interpretation of TRACE, under which it may make the correct predictions about Experiment 4.

The TRACE network is an idealization of a complex perceptual process. Each time

TRACE is exposed to a given simulated input, it settles into the same state. An actual

biological system would not do that, owing to uncontrolled random factors which differ

from trial to trial (such as perceptual noise, pre-activation of lexical items through priming,

or variations in attentional weight placed on different acoustic features of the stimulus). A

fixed stimulus would evoke a range of activation levels, leading to a range of response

probabilities; this is a standard assumption of signal detection theory (e.g., Macmillan &

Creelman 1991). These could interact through the network mechanism to produce a

dependency between stop and sonorant decisions for an acoustically fixed stimulus.

For example, a stop that on the average is exactly halfway, perceptually, between [d]

and [g] would sometimes activate the [d] unit a bit more and sometimes activate the [g] unit

a bit more. Of the "d" reports, a disproportionate number would come from the trials on

which the stimulus activated the [d] unit more. On these more [d]-like trials, the [d]-initial

cohort would gain an early advantage and combine to excite the [d] unit even more, leading

to inhibition of the [g] unit and hence of the [g]-initial cohort. The [dw]-initial words would

then feed activation down to the [w] unit. The [l] unit would receive no corresponding

support - the [gl] words being dormant and the [dl] words nonexistent - resulting in an

increased likelihood to respond "w". Hence, "d" responses would tend to be associated

with "w" responses - a dependency between the stop and sonorant decisions.

To assess this dependency, we need to manipulate the activation levels of the stop

units and measure the effect on the activation levels of the sonorant units. This can be done

by simply giving the network unambiguous stops together with ambiguous sonorants.

An ambiguous [l/w] phoneme was created for the TRACE input. Simple parameter

averaging of [l] and [w] produced a segment which stimulated no unit strongly. Parameters

were modified by trial and error until a segment had been attained that excited both the [l] and [w] units nearly equally, without exciting the [r] unit.12 There was a slight initial bias

12 The parameters are: POW 3, VOC 3, DIF 7, ACU 5, GRD 4, VOI 1, BUR 0. It was also necessary to replace the [S] unit with a copy of the [s] unit, because otherwise the ambiguous segment activated it.

towards [l], which increased over time to a factor of 2 at Cycle 72. Since this effect was

present in all conditions of the simulation, it should not affect the pattern of predictions.

The same lexicon was used as in the TRACE simulation of Experiment 3. As in

Experiment 3, it was found that the initial (b/g) or (d/g) segment excited a great many words

ending in [b d g]. To avoid this artifact, the same procedure was followed as before: Final

[b d g] were replaced by [p t k]. The words guano and dwarf were added, to insure that the

lexicon had at least one word beginning with each attested onset13. The resulting lexicon

contained the following words with the relevant onsets:

Table 4.89. Lexicon for TRACE simulation: Words with [b/d/g]+[w/l] onsets

Onset   Words
[gw]    guano
[gl]    glad, glass, glue
[bw]    (none)
[bl]    blush, blood, blast, black, bleak, blue
[dw]    dwarf
[dl]    (none)

For this simulation, it is important that the stop and the sonorant interact only

through activation spreading in the network, rather than through coarticulation. The event

being simulated is the repeated presentation of a single stimulus; the effect of interest is the

13 They were adapted to the TRACE phoneme inventory as dwarp and gwadu.

204

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
influence of the stop decision on the sonorant decision. We cannot manipulate the stop

decision directly, so we must do it by actually manipulating the stop in the simulated input.

However, the features of that simulated stop will be realized by TRACE not only at the time

slice of the stop, but for several time slices on either side, with the result that an initial [b]

will pre-activate [w] for reasons which have nothing to do with phonotactics. Manipulating

the stop does not just manipulate the activation input to the stop units, but also that to the

sonorant units. To prevent this, the TRACE program was altered so that the acoustic

features did not spread more than three time slices on either side of their center, preventing

overlap entirely.

The predicted log odds ratio follows from

(4.90)

ln( P("l") / P("w") )
  = ln( [exp(kl) / (exp(kl) + exp(kw))] / [exp(kw) / (exp(kl) + exp(kw))] )
  = ln( exp(kl) / exp(kw) )
  = k(l - w)

where l and w are the activation levels of the [l] and [w] units. That is, the log odds ratio of the "l"/"w" decision is proportional to the difference in activation levels between the [l] and [w] units.
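Equation (4.90) assumes a Luce-style softmax choice rule over the two unit activations; the identity ln(P("l")/P("w")) = k(l - w) can be checked numerically with the Cycle-75 activation levels reported in this section. Here k is a free scaling constant; a small value is used only to keep the exponentials in floating-point range.

```python
import math

def choice_prob(a_l, a_w, k=0.1):
    """Luce/softmax choice rule over the [l] and [w] unit activations."""
    el, ew = math.exp(k * a_l), math.exp(k * a_w)
    return el / (el + ew)

def log_odds(a_l, a_w, k=0.1):
    """Log odds of an "l" response over a "w" response."""
    p = choice_prob(a_l, a_w, k)
    return math.log(p / (1 - p))

# Cycle-75 activation pairs (l, w) for the [g?ae], [b?ae], [d?ae] inputs:
# the log odds ratio equals k * (l - w) exactly, as in (4.90).
for a_l, a_w in [(39, 14), (49, 4), (20, 32)]:
    assert abs(log_odds(a_l, a_w) - 0.1 * (a_l - a_w)) < 1e-9
```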

Given the input [g?ae], [l] and [w] start out equally active. By Cycle 45, glad, glass, and glue are pulling ahead of guano, and [l] has overtaken [w]. By Cycle 75, [l] is three times as active (39 to 14), and [w] is extinguished by Cycle 96. The log odds ratio at Cycle 75 is therefore 25k.

With [b?ae] as input, the same thing happens, only faster, since there are more words beginning with [bl] but no countervailing [bw]-initial words. By Cycle 75, the activation

205

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
levels for [l] and [w] are 49 and 4, and [w] is deactivated on Cycle 81. The log odds ratio at Cycle 75 is 45k.

With [d?ae] as input, [l] and [w] are neck-and-neck for a long time. Dwarf gradually gains ground, but is held in check by other weakly-activated [d]-initial words like do, D, and deal. On Cycle 75 [w] is still only half again as active as [l] (32 to 20), and [l] does not go extinct until Cycle 105. The log odds ratio at Cycle 75 is -12k.

The effect of the "g"/"b" decision on the Cycle 75 log odds ratio is therefore to shift it by 20k, while that of the "g"/"d" decision is to shift it by -37k. TRACE therefore predicts that the "g"/"d" decision will have a larger effect on the "l"/"w" decision than will the "g"/"b" decision - as was in fact observed.

The success of this simulation is due to the fact that TRACE implements phonotactic bias not by inhibiting low-frequency sequences, but by facilitating high-frequency ones. Here, the difference in frequency between [gl] (glad, glass, glue) and [gw] (only guano) was enough to make the [g_ae] context favor [l]. In this respect it behaved more like [b_ae] and less like [d_ae], reducing the effect of the "b"/"g" decision and increasing that of the "d"/"g" decision. That is to say, the [g_ae] context did not provide a neutral baseline.

All this is taking the most favorable possible view of the feasibility of modelling

judgment dependencies as a consequence of perceptual noise added to TRACE inputs. This

scheme will require quite a lot of noise. In the above simulation, the stop units were

supplied with enough "noise" to completely disambiguate them - the simulated listener, on that trial, heard an ambiguous segment as a perfect [g], [d], or [b]. A more realistic, i.e.

smaller, estimate of perceptual noise would lead to a correspondingly smaller variation in

stop-unit activation levels for a fixed stimulus, which would in turn cause smaller variation

(from presentation to presentation of the same stimulus) in both stop and sonorant response

probabilities. It seems implausible that the covariation between stop and sonorant responses

would show up as strongly as it did in only 160 trials from 16 listeners.

4.6. Experiment 5: Phonotactics and syllabification

4.6.1. Rationale

Another possible source of the effects in Experiment 4 is compensation for

coarticulation (Mann 1980, Mann & Repp 1981, Mann 1986). Suppose an ambiguous stop

in the 'd' array is perceived as [d]. Since the F2s in the 'd' array range from slightly below

that of a [d] to slightly above that of a [b], they are all lower than the typical [d]. A stop

heard as [d] thus has an atypically low F2 for a [d]. Some of this lowness may be attributed

to labialization from a following [w], causing more “dw” and fewer “dl” responses.

Because [b] and [g] have similar F2s, the compensation effect may be smaller in the “b”

than the “d” array, producing the observed results.

This account can be tested by manipulating the stimuli to alter their phonotactics

while leaving their coarticulatory properties intact. As pointed out by Pitt (1998), a cluster

which is illegal in an onset may become legal if split by a syllable boundary. A structural

account predicts less bias against "dl" responses in [aedlae] than in [dlae], because [aedlae] allows the legal parse [aed.lae]. A compensation account predicts the bias will persist, as

compensation has a strong effect across syllable boundaries (Mann 1980; Mann 1986;

Elman & McClelland 1988; Pitt & McQueen 1998), is unaffected by perceived

syllabification, and is only slightly reduced, if at all, by a preceding vowel context (Mann &

Repp 1981).

4.6.2. Design

The design was based on that of Experiment 4. Two 6-by-6 arrays of CCV and

VCCV stimuli were constructed, ambiguous between [dwae dlae bwae blae] or [aedwae aedlae

aebwae aeblae]. Both [dlae] and [bwae] were included to maximize the expected phonotactic

effect. Listeners were asked to make the same judgments as in Experiment 4.

4.6.3. Predictions

The predictions of TRACE, INC-1, and SC-1 are exactly as in Experiment 4.

TRACE predicts that the stop and sonorant judgments will be independent, both in the

VCCV and the CCV condition. Neither the INC-1 nor the SC-1 context can "see" the

prepended vowel, so they predict that adding the vowel will not affect the stop-sonorant

dependency.

The OT grammatical theory predicts that the stop-sonorant dependency will be

reduced in the VCCV condition compared to the CCV condition. This is because the

VCCV stimuli can be syllabified as VC.CV. If the stop is classified as "d", then "1" will be

disfavored in the CCV condition because [dlae] violates OCP(CONT, PL); however, in the

VCCV condition, an "1" response will still be possible if the input is parsed as VC.CV.

4.6.4. Methods

From the endpoints of Experiment 4, a 6-by-6 array of CCV stimuli was

constructed, ambiguous between [dwae dlae bwae blae]. A 6-by-6 array of VCCV stimuli

was made by adding a 300 ms [ae] to each of the CCV stimuli. This [ae] used the same

parameters as the final [ae], except that FO began higher (120 Hz). Transition to the stop

took 40 ms. A 40-ms voiced closure preceded the release; though short, this proved to be

the most natural-sounding duration. Details are in Figure 4.91.

Figure 4.91. Synthesis parameters for the stimuli of Experiment 5

[Panels (a)-(f) plot the synthesis parameters as functions of time (ms): source amplitudes (AV, AF, AH); the pole-zero pair (FTP, FTZ); formant bandwidths (B2, B3, B4, B2F); frication amplitudes and spectral tilt (A2F, AB, TL); and the formant frequency tracks.]
Eighteen different members of the same population as in Experiment 1 participated

for psychology course credit. Two were dropped because their native language was not

English, leaving 16 valid subjects.

The only difference from Experiment 4 was that all listeners were tested on the

VCCV block first and the CCV block second, to avoid priming a V.CCV syllabification.

4.6.5. Results

Results are shown in Figures 4.92 and 4.93. As in Experiment 4, bias appears as

displacement from the line y = x. The displacement is greater and more consistent in the

CCV than the VCCV condition. The test statistic was again D, the log of the "l"/"w"

odds ratio contingent on a “d” decision minus that contingent on a “b” decision, averaged

across all stimuli.

Figure 4.92. Log odds ratios for the "l"/"w" judgment in Experiment 5, contingent on the "b"/"d" judgment, for the CCV stimuli. Each point represents 16 listeners' pooled responses to one stimulus. (Stimulus codes are explained in the text.)

[Axes: x = ln(P("dl")/P("dw")); y = ln(P("bl")/P("bw")).]

Figure 4.93. Log odds ratios for the "l"/"w" judgment in Experiment 5, contingent on the "b"/"d" judgment, for the VCCV stimuli. Each point represents 16 listeners' pooled responses to one stimulus. (Stimulus codes are explained in the text.)

[Axes: x = ln(P("dl")/P("dw")); y = ln(P("bl")/P("bw")).]

For the CCV array, D is 1.0505; for the VCCV array, it is 0.0648. The same nonparametric bootstrap procedure was used to test significance. For the CCV array, D0.05 = 0.4370 and D0.01 = 0.5685, confirming a phonotactic effect. For the VCCV array, the effect did not approach significance: D0.05 = 0.4362 and D0.01 = 0.6269.

The results indicate that the dependency was eliminated by the availability of a legal

parse: The prepended vowel made a difference. This is unexpected under TRACE, INC-1

or SC-1, but is precisely as predicted by the OT grammatical theory.

4.6.6. Discussion

Experiment 4 found a perceptual bias against [dlae], but none against [bwae]. The

INC-1 version of MERGE TP predicted otherwise, because [dl] and [bw] are both

unattested as English syllable onsets. Since listeners’ experience of both onsets is identical,

that experience cannot explain the difference in performance. TRACE incorrectly predicted

no dependency at all between the stop and sonorant decisions. Only the SC -1 version of

MERGE TP and the OT grammatical theory correctly predicted the pattern of results.

Foreign-language experience provided no alternative explanation. The difference

was not due to auditory factors, since bias was measured separately for each stimulus;

rather, it reflected a dependency between the stop and sonorant responses. Experiment 5

confirmed that this dependency was not compensation for coarticulation, because it could be

reduced or eliminated by providing a legal parse for the cluster.

It may be objected that listeners' experience of [dl tl] and [bw dw] is not in fact

identical -- that there is a frequency difference, too small to be detected in an 18-million-

word British English corpus, in favor of [bw pw], which university-aged speakers in the

United States are likely to have encountered in foreign place names such as Buenos Aires,

southwestern U.S. place names such as Pueblo, or occasional loans like puissant or the

colloquial bueno. At a conversational speaking rate of 150 words per minute (Venkatagiri

1999), an 18-million-word corpus would represent only 83 days of continuous speech, or

perhaps one to three years of a person's combined input and output. A word occurring less

frequently than once in one to three years could escape the corpus, though an 18-year-old

participant in these experiments might have heard it 18 times or more, providing enough

experience of [bw pw] onsets to remove the perceptual bias against it. This Undetected

Frequency Difference (UFD) Hypothesis is a serious objection, but it is unlikely to be

correct.
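The corpus-duration arithmetic above is easy to verify:

```python
# 18-million-word corpus at a conversational rate of 150 words per minute
words = 18_000_000
wpm = 150

minutes = words / wpm      # 120,000 minutes of continuous speech
days = minutes / 60 / 24   # about 83 days
```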

As has already been pointed out, listeners’ acceptance of [bw] was not increased by

up to 9 years of explicit training in languages in which [bw pw] onsets are common. It was

argued above that this is a ceiling effect; acceptability of [bw pw] cannot be increased by

training because the sequences are legal in English. If instead the UFD Hypothesis is

correct, then the whole of the gain in acceptability must be caused by the exposure to the

first few tokens, with subsequent training having no effect. Hence, it should take exposure

to only a small number of tokens to make any sequence legal. But speakers persist in

treating some sequences as illegal, even after considerable training (Dupoux et al. 1999;

Polivanov 1931).

In support of the UFD Hypothesis, it may be replied that the listeners were exposed

to the undetected low-frequency [bw pw] as children, but received foreign-language training

as adults, after the critical period for accentless acquisition. It is certainly true that infants as

young as 9 months are already sensitive to the sound pattern of their language (Jusczyk,

Friederici, Wessels, Svenkerud, & Jusczyk 1993; Friederici & Wessels 1993; Jusczyk et al.

1994). However, adults can learn phonotactic patterns even without explicit training (Dell,

Reed, Adams, and Meyer, 2000). Moreover, a dispreference for [dl tl] compared to [bw pw]

is found in children who were unlikely to have been exposed to [bw pw] onsets. As shown

in Table 4.94, the midwestern U.S. states of Iowa and Nebraska had few Spanish- or

French-speaking residents in 1990 and almost no place names beginning with [bw pw].

Table 4.94. Demographics and occurrences of [bw pw]-initial place names in Iowa and Nebraska (U.S. Census Bureau, 1990; DeLorme 1998, 2000)

                                        Iowa                  Nebraska
Population of Hispanic origin (1990)    1.18%                 2.34%
Language spoken at home (1990)
  Spanish or Spanish Creole             1.13%                 1.55%
  French or French Creole               0.29%                 0.26%
Place names beginning with
  [bw] (Bue-, Boi-)                     Buena Vista (town,    —
                                        county, and college)
  [pw] (Pue-, Poi-)                     —                     —

Note: The proportion of the overall U.S. population of Hispanic origin in 1990 was 8.99%. "Buena Vista", in Iowa, is locally pronounced with initial [bw] (Buena Vista College Library staff, personal communication, 2001).

In a study of 1,049 children in Iowa and Nebraska between 2 and 9 years of age,

Smit (1993) systematically elicited productions of most of the English word-initial clusters, including [bl pl] and [tw]. The [tw] cluster was sometimes produced as [bw] or [pw], but the [bl] and [pl] clusters almost never became [dl] or [tl], as shown in Table 4.95.14 This indicates that [d t] are more disfavored before [l] than [b p] are before [w].

14 Similarly, these children also sometimes produce [bl pl] as [bw pw], with no corresponding tendency to turn [tw] into [tl]. Aversion to [tl] may be a contributing factor, but we cannot be sure, because they tend to replace [l] with [w] in all environments.

Table 4.95. Errors in production of the initial stop in [bl pl tw] onsets by English-learning children in Iowa and Nebraska (Smit 1993)

Onset cluster    Age group    Error rate category
                              Occasional    Rare
[tw-] (twins)    2:0-3:0      f, b          p
                 3:6-5:6      p, k, d       f, int
                 6:0-9:0      —             —
[pl-] (plate)    2:0-3:0      —             —
                 3:6-5:0      —             b, t
                 5:6-7:0      —             —
                 8:0-9:0      —             —
[bl-] (block)    2:0-3:0      —             —
                 3:6-5:0      —             —
                 5:6-7:0      —             —
                 8:0-9:0      —             —

Note: "Occasional" means "[u]sed by a few groups in an age range with a frequency of 4-10%, or by most groups in that age range at frequencies of 1-4%"; "rare" means "[o]ccurs with a frequency of less than 3%, and only in a few groups in an age range" (Smit 1993, p. 947). This table includes all errors made by the 1,049 children in the study. "int" = interdental.

The asymmetry is present at the earliest ages tested — before one would expect

most Iowan or Nebraskan children to have had much exposure to Spanish place names.

The UFD Hypothesis can therefore only be defended if the perceptual effects of frequency

are due chiefly to a very few tokens experienced very early in life. If so, it is an interesting

new finding, with many consequences. It implies that, contrary to TRACE, the many words

learned after early childhood contribute little to the phonotactic frequency effect. It predicts

large individual variation in phonotactics (since the individual is generalizing from a small

sample of the adult language, which will necessarily differ more between individuals than a

large sample). Finally, it suggests that even large corpora of adult language are inadequate

predictors of phonotactic performance, and that research on probabilistic phonotactics

should focus more on child-directed speech.

These findings are consistent with a model in which the decision between competing

parses is guided by the structural constraints of the perceiver’s language - here, the ban on

[coronal][coronal] onsets. In Experiment 4, where syllabification was fixed by clear

acoustic cues, the choice was between competing CCV parses. The “dl” responses were

reduced because a “dl” response could only be supported by the structurally disfavored

[dire] parse. In Experiment 5, where both segmental identity and syllabficiation were

ambiguous, “dl” responses could be supported by the legal [xd.lx] parse, and the

response bias disappeared.

The findings of Pitt (1998, Experiment 2) may be reinterpreted in the same way: "l" responses to an [l]-[r] continuum were reduced, relative to a baseline, in the context [maet_ae], but not in [maed_ae]. Strong aspiration on the [t] provided an unambiguous cue to V.CCV syllabification (Kirk 2001), allowing only the parses [mae.trae] and the illegal [mae.tlae]. The [maed_ae] context allowed VC.CV syllabification and thus the legal "l" parse [maed.lae]. This suggests that prosodic and segmental parse decisions are made in parallel, with the candidate parses representing both phonemes and syllabification: [maed.lae], [mae.dlae], [maed.rae], and [mae.drae]. The chosen prelexical parse thus provides the essential information for word segmentation and lexical access. Phonotactically impossible parses, such as those with vowelless syllables or illegal onset clusters, are inhibited, leading to the Possible Word Constraint effects observed by Norris, McQueen, and Cutler (1997).

4.7. Experiment 6: Phonotactics of the Japanese lexical strata15

4.7.1. Influence of remote context

Experiments 1-3 all examined the issue of whether a perceptual boundary could be

shifted by phonological context that was remote from the ambiguous segment. That it could

be was a main prediction of TRACE, and also of versions of the transitional-probability

theory which used contexts longer than a single phoneme. Results were uniformly negative:

The identity of a segment outside the phonotactically relevant context did not affect

boundary location even when segments within it did.

In these experiments, the relevant context was quite small, including only the

ambiguous segment and one adjacent to it. The INC-1 and SC-1 versions of the TP theory

therefore took only the relevant context into account in making their predictions. Where the

grammatical theory said that remote context was ineffective because it was irrelevant, the TP

theory said it was ineffective because it was remote.

This section presents evidence that remote context can have a phonotactic effect

when it is (despite its remoteness) phonotactically relevant. It will be shown that Japanese

listeners' judgments of vowel length in nonword stimuli are affected by phonological

context which is remote from the ambiguous vowel, but which is important in determining

the lexical stratum membership of the nonword. This can be taken as further evidence

against the INC-1 and SC-1 versions of the MERGE TP theory. The effect of stratum

phonotactics is found to be numerically larger and statistically more robust than a word-

superiority effect obtained with the same listeners and paradigm, contrary to the predictions

of TRACE.

15 The work in this section was performed with the guidance and support of Shigeaki Amano at the NTT
Basic Research Laboratories in Atsugi, Japan.

The finding of a stratum-phonotactic effect is also strong evidence in favor of the

OT grammatical theory, which is the only one of the three theories in which the concept of

"lexical stratum" is at all motivated.

4.7.2. Rationale: The lexical strata of Japanese

Items in the Japanese lexicon can be divided into four classes, called "strata".

Words in a given stratum share historical, phonological, morphological, and orthographic

characteristics (Shibatani 1990; Tateishi 1990; Martin 1952; Ito & Mester 1994, 1995).

These strata are summarized in Table 4.96.

Synchronically, the strata can be identified in two principal ways.

First, they are morpheme co-occurrence classes: In forming words, morphemes

combine preferentially with other morphemes of the same stratum. For example, the two

morphemes [dai] (Sino-Japanese) and [o:] (Yamato) both mean 'big'. The former combines

with Sino-Japanese [moŋ] 'gate' to give [daimoŋ] 'main gate of a Buddhist temple', the latter

with Yamato [te] 'hand' to give [o:te] 'main gate of a castle'. Since all of the verbal and

adjectival inflectional endings are Yamato morphemes, only Yamato items undergo

inflection.16

16 A few exceptions have been noted: [demorɯ] 'to demonstrate (politically)' consists of a Foreign stem
spliced onto a Yamato ending (Sato 1983).

Table 4.96. The lexical strata of Japanese

Stratum         Origin                  Orthography                    Examples
Yamato          Antiquity               "Sense reading" (kun-yomi)     [kami] 'god'
                                        of Chinese characters          [yama] 'mountain'
Mimetic         Onomatopoeia            Hiragana                       [potapota] 'drip'
Sino-Japanese   Borrowings from         "Sound reading" (on-yomi)      [kazaŋ] 'volcano'
                Chinese                 of Chinese characters          [keizai] 'economy'
Foreign         Borrowings from         Katakana                       [maikɯ] 'microphone'
                European languages                                     [katarogɯ] 'catalogue'

Second, the strata are subject to different sets of phonotactic restrictions.17 A Sino-

Japanese morpheme, for example, cannot be more than two moras long, and the second

mora must be either a single vowel, or one of [tɕi tsɯ ki kɯ]. A Yamato morpheme cannot

begin with [r], while a Mimetic morpheme can have [r] initially or medially but not both

(Tateishi 1990). Single [p], voiced geminates, and voiceless post-nasal obstruents are

prohibited in Yamato, but permitted in Mimetic, Foreign, and Sino-Japanese, respectively.

17 There are also commonly supposed to be productive phonological processes which operate only in
certain strata. However, attempts to demonstrate their productivity experimentally through native-speaker
"goodness" judgments have tended to show the opposite. For rendaku, see Suzuki, Maye, and Ohno (2000);
for the phonology of verbal affixes, see Vance (1987, 1991). This is a common result of such studies: See
Hsieh (1976) for non-productivity of Taiwanese tone sandhi, Zimmer (1969) for non-productivity of a
Turkish labiality-spreading rule.

Vowel length is distinctive in Japanese. Each of the five short vowels [i e a o ɯ]

has a long counterpart. Final [a] is found in all strata, while final [a:] is found only in Foreign

words.

The absolute phonotactics of the strata can be viewed as a “core-periphery”

phenomenon: strictest for the indigenous Yamato stratum, less so for the newer Sino-

Japanese words, and most permissive for the recent Foreign loans (Ito & Mester 1994,

1995). However, there are many configurations in Sino-Japanese which, though

theoretically permitted in the Foreign stratum, are virtually never actually found there

(Moreton et al. 1998). For instance, the palatalized onsets [rʲ] and [hʲ] are almost

nonexistent outside of Sino-Japanese.

The result is to create long-distance phonological dependencies. For example, a

word containing [rʲ] or [hʲ] is almost certain to be Sino-Japanese, and hence to lack [a:],

while a word containing singleton [p] or [ɸa] is almost certain to be Foreign, and hence to

permit it. Carrier contexts containing different stratum cues can be constructed, and the

[a]-[a:] boundary measured, in order to detect whether stratum membership can influence

boundary location.

4.7.3. Experiment 6a: Word-superiority effect

In order to estimate the size of a pure lexical effect on the long-short vowel

boundary location, an experiment like that of Ganong (1980) was performed.

4.7.3.1. Design

Three pairs of words were selected. One member of each pair ended in a short

vowel and the other in the long version of the same vowel. If a word ended in a short vowel,

then making the vowel long produced a nonword, and vice versa. Both words in each pair

were in the same stratum, and were in or above the 96th percentile in rated familiarity when

presented aurally (Amano & Kondo in press). Both members of each pair had the same

accent pattern. The words are shown in Table 4.97:

Table 4.97. Stimulus words used in Experiment 6a

Word        Stratum         Gloss

[ɸoro:]     Foreign         'follow-up'

[pɯro]      Foreign         'professional'

[posɯta:]   Foreign         'poster'

[pasɯta]    Foreign         'pasta'

[ɕɯ:go:]    Sino-Japanese   'meeting'

[riŋgo]     Sino-Japanese   'apple'

Each word pair provided one context which was expected to lexically bias perception

in favor of the short vowel, and one which was expected to bias in favor of the long vowel.

It was expected that there would be a bias to report [a:] in the context [posɯt_], since

[posɯta:] is a real word while [posɯta] is not, and that there would be a bias to report [a] in

the context [pasɯt_], since [pasɯta] is a real word while [pasɯta:] is not.

4.7.3.2. Methods

A male native speaker of Japanese recorded several tokens of each word in isolation.

These were digitized at 48 kHz with 16-bit resolution, downsampled to 44.1 kHz, and

normalized for peak amplitude. Using a waveform editor, a single token each of the final

syllables— [ro:], [ta:], and [go:]—was excised and cross-spliced with each of the initial

syllables to make three pairs of stimuli with each pair ending in the same long vowel. The

length of the vowel was adjusted to be as close as possible to 250 ms by repeating medial pitch

periods.
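The lengthening step can be sketched as follows. The function, the toy sample values, and the assumption that the boundaries of one medial glottal cycle are known in advance are illustrative; this is not the original editing procedure.

```python
def lengthen_vowel(samples, period_start, period_len, target_extra):
    """Lengthen a vowel by repeating one medial pitch period.

    samples: amplitude values; period_start/period_len delimit one
    glottal cycle in the vowel's steady state (assumed to be known).
    target_extra: extra samples wanted, rounded down to a whole
    number of pitch periods so the waveform stays periodic.
    """
    period = samples[period_start:period_start + period_len]
    n_copies = target_extra // period_len
    insert_at = period_start + period_len
    # Splice the copies in immediately after the original cycle.
    return samples[:insert_at] + period * n_copies + samples[insert_at:]

out = lengthen_vowel(list(range(10)), 2, 3, 7)  # inserts two copies of [2, 3, 4]
```

Because whole pitch periods are repeated at period boundaries, the splice introduces no waveform discontinuity.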

Experimental subjects were 24 native speakers of Japanese, post-secondary students

in the Tokyo area, with equal numbers of each sex. Subjects reported normal speech and

hearing. They were paid for their participation.

Subjects were tested 8 at a time in a sound-treated room in the presence of the

experimenters. Stimuli were presented diotically through Sennheiser headphones at a peak

intensity of 70 dB SPL. Subjects responded by mouse-clicking on one of two buttons

displayed on the screen. The buttons were labelled in the katakana syllabary (used

primarily for writing Foreign loanwords); one showed the stimulus word with a short final

vowel, the other with a long one (e.g., “ringo” and “ringoo”). Subjects were asked to

choose the button which better matched what they had heard. The next stimulus followed

after 0.8 seconds.

To vary the length of the final vowel, a half-wave raised cosine filter was applied.

This caused the final vowel to be gradually reduced to zero amplitude during the 50 ms

following the specified starting point of the filter, providing a natural-sounding end to the

vowel.
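The amplitude envelope described here can be sketched as a half-cycle raised-cosine gain ramp; the function name and the sample-domain parameters are invented for illustration.

```python
import math

def apply_offset_ramp(samples, start, ramp_len):
    """Fade a vowel to silence with a half-wave raised-cosine ramp.

    Gain falls smoothly from 1 at `start` to 0 at `start + ramp_len`
    (in the experiment the ramp spanned 50 ms of samples); everything
    after the ramp is zeroed, truncating the vowel without a click.
    """
    out = list(samples)
    for i in range(start, len(out)):
        t = i - start
        if t < ramp_len:
            out[i] *= 0.5 * (1.0 + math.cos(math.pi * t / ramp_len))
        else:
            out[i] = 0.0
    return out
```

Moving `start` earlier or later in the vowel is what generates the effective vowel-length continuum.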

The boundary between the long and short vowel was established using the adaptive

method PEST (Taylor & Creelman 1967). One of the 6 stimuli was randomly selected, and

presented with either the longest or the shortest possible final vowel. On the next trial, the

opposite endpoint was presented. Thereafter, the stimulus presented depended on the

subject’s responses to previous stimuli, as specified by the PEST rule, so as to zero in on

the boundary. The series ended when the step size was reduced to 1/256 of the

continuum—i.e., about 1 ms— or after 80 trials. To reduce the dependence of each trial on

the previous one, two PEST series for the same context were interleaved with each other,

switching back and forth randomly. Once both series finished, the screen cleared and a

button appeared with the message “Click to go on to the next sound”. This procedure took

place once for each of the 6 stimuli.
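The interleaved adaptive tracking can be sketched with a simplified staircase. The real PEST rule (with its sequential test and step-doubling heuristics) is more elaborate; all names and parameters here are illustrative.

```python
import random

def interleaved_staircase(respond, lo, hi, min_step, max_trials=80):
    """Two interleaved series converging on a response boundary.

    respond(level) -> True if the listener reports the "long" vowel.
    Each series starts at an opposite endpoint and halves its step
    on every response reversal, as in a basic staircase.
    """
    series = [{"level": lo, "step": (hi - lo) / 2, "last": None},
              {"level": hi, "step": (hi - lo) / 2, "last": None}]
    for _ in range(max_trials):
        s = random.choice(series)            # switch randomly between series
        r = respond(s["level"])
        if s["last"] is not None and r != s["last"]:
            s["step"] /= 2                   # reversal: halve the step
        # Step toward the boundary: up after "short", down after "long".
        s["level"] += -s["step"] if r else s["step"]
        s["last"] = r
        if all(t["step"] < min_step for t in series):
            break                            # both series have converged
    return sum(t["level"] for t in series) / 2
```

Averaging the two interleaved series, as in the text, reduces the dependence of each trial on its predecessor.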

4.7.3.3. Results and discussion

Three subjects were excluded from the analysis for failure to converge after 80 trials

on 6 or more series (in this or the following experiment combined). The remaining 21

listeners converged on 96% of all series after an average of 20.4 trials in each.

For each subject, an [a]-[a:] or [o]-[o:] boundary was computed for each series. The

boundary was the stimulus that would have been presented by PEST if there had been one

more trial. Since two interleaved series were presented for each carrier context, the subject’s

boundary for that stimulus was taken as the average of the two.

For each subject and each pair of words, we calculated the difference between the

boundary in the long-bias context and that in the short-bias context. Results are shown in

Table 4.98.

Table 4.98. Boundary difference in long- and short-biased contexts (in milliseconds).
Experiment 6a (N = 21 Ss)

Long-bias context   Short-bias context   Boundary difference   Standard error

[ɸor_]              [pɯr_]               -4.22                 4.92

[posɯt_]            [pasɯt_]             12.21*                5.45

[ɕɯ:g_]             [riŋg_]              9.04                  4.92

Note: Asterisk represents 5% significance.

There was substantial individual variation in the differences, leading to large

variances and low significance levels. The difference was significantly different from zero

in the [posɯt_]/[pasɯt_] context (one-tailed t-test, t(20) = 2.086, p < 0.05). The [ɕɯ:g_]/[riŋg_]

difference just misses significance at the 5% level (t(20) = 1.725). No effect could be

found for the [ɸor_]/[pɯr_] pair.

This experiment partially replicated the lexical bias found in ambiguous-phoneme

perception by Ganong (1980). A robust word-superiority effect, however, was not found.

The weak showing of the lexical effect is all the more striking when compared with the

strong results of the next experiment.

4.7.4. Experiment 6b: Lexical stratum phonotactics

We used the Massaro-Cohen paradigm, as in Experiments 1-3, to test whether

stratum phonotactics can shift the phonetic boundary between two sounds—specifically,

whether the different phonotactics of the Foreign and Sino-Japanese strata can shift the

boundary between word-final [a] and [a:].

The idea was to present an [a]-[a:] continuum at the end of a carrier word containing

two consonants, C and C'. If C and C' are sounds that occur almost exclusively in Sino-

Japanese words, then they do not naturally co-occur with [a:]. We therefore expected that

listeners would require exceptionally strong acoustic evidence (i.e., a longer vowel) before

reporting [a:]. On the other hand, if C and C' are sounds found only in Foreign words,

then it is unsurprising if the word contains [a:], and we expected in this case that listeners

would be readier to report the long vowel. In other words, we expected that the [a]-[a:]

boundary would be shifted towards [a:] when C and C’ were highly valid cues for Sino-

Japanese, and that it would be shifted towards [a] when they were highly valid cues for

Foreign.

4.7.4.1. Design

Cue validity and co-occurrence statistics were computed based on a preliminary

(pre-release) updated version of the Japanese-language database of Amano and Kondo

(1999), by exploiting a feature of the Japanese writing system: the lexical stratum of a word

is reflected in its orthography. Foreign words are written in the katakana syllabary.

Chinese characters are used to write both Sino-Japanese and Yamato words, but the

dictionary indicates which pronunciations of a given character are Sino-Japanese and which

are Yamato. There are some exceptions, but they are not numerous. (Details of the

classification procedure are given in Moreton et al. (1998).)
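The orthographic classification can be sketched as below; the entry format (a headword plus an 'on'/'kun' reading tag) is a hypothetical simplification, not the actual database schema or the procedure of Moreton et al. (1998).

```python
def classify_stratum(orthography, reading_type=None):
    """Rough stratum classifier from a word's orthography.

    Katakana spelling signals Foreign; hiragana signals a native
    (Yamato or Mimetic) item; for Chinese-character entries the
    dictionary's reading tag decides: 'on' = Sino-Japanese,
    'kun' = Yamato.  Mixed or untagged entries stay unclassified.
    """
    def in_block(ch, lo, hi):
        return lo <= ord(ch) <= hi

    if orthography and all(in_block(c, 0x30A0, 0x30FF) for c in orthography):
        return "Foreign"            # all katakana
    if orthography and all(in_block(c, 0x3040, 0x309F) for c in orthography):
        return "Yamato/Mimetic"     # all hiragana
    if reading_type == "on":
        return "Sino-Japanese"
    if reading_type == "kun":
        return "Yamato"
    return "Other"

classify_stratum("マイク")      # → 'Foreign'
classify_stratum("火山", "on")  # → 'Sino-Japanese'
```

Counting cue consonants over words classified this way yields validity tables of the kind shown in Table 4.99.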

Cues to stratum membership were chosen on the basis of these statistics. For the

Foreign stratum we chose as C the singleton [p] (i.e., [p] neither geminated nor preceded by

a nasal coda) and as C' the voiceless bilabial fricative [ɸ] (orthographically <f>), which

outside of the Foreign stratum is found only before [ɯ]. For the Sino-Japanese stratum we

chose [rʲ] and [hʲ]. For neutral contexts we chose [r] and [t]. The statistics are shown in

Table 4.99:

Table 4.99. Validity of cues to stratum membership: Number of nouns in database
belonging to each stratum containing the given cues

Stratum

Foreign Sino-Japanese Other

Foreign (F) cues

[p] singleton           812      1      83

[ɸ] before [i e a o]    214      0      16

Neutral (N) cues

[r]                    2922   2917    6530

[t]                    1683   4068    4336

Sino-Japanese (SJ) cues

[rʲ]                     19    959     109

[hʲ]                      7    273      38

Note: “Other” includes the Mimetic and Yamato stratum, compounds containing words
from more than one stratum, and words which the orthographic algorithm could not
classify. Nouns make up 82% of the database.

The consonant cues were embedded factorially in the template [CoC'_] to produce

nine carrier contexts, ranging from Foreign/Foreign [poɸ_] to Sino-Japanese/Sino-Japanese

[rʲohʲ_].18

18 None of the stimuli could be interpreted as Yamato items. Each stimulus began with either [p], [r], or
[rʲ], none of which is permitted word-initially in Yamato.

4.7.4.2. Predictions

4.7.4.2.1. TRACE

There is nothing in TRACE that would allow it to represent the concept of stratum

directly. Stratum effects, if there are any, must emerge from statistical properties of the

lexicon.

It was not possible to perform an actual TRACE simulation of this experiment.

However, we can infer something about its predictions based on its behavior in previous

simulations.

In each of the phonotactic simulations we have run, the advantage for one phoneme

over another has come, not from slight activation of many lexical items, but from the

moderate incomplete activation of one or two lexical items that are very similar to the

ambiguous stimulus. Their activation is strong enough to extinguish the other word

competitors which overlap the stimulus to a smaller degree.

A word-superiority effect differs from a phonotactic effect only in that a single word

candidate is very highly active. As a result, the magnitude of a "phonotactic" effect is

predicted to be smaller than that of an outright lexical effect. We expect a smaller effect in

this experiment than in Experiment 6a.

4.7.4.2.2. MERGE TP

MERGE TP would make lexical stratum effects on perception, if any, an emergent

statistical property of phoneme sequence probabilities.

The INC-1 and SC-1 versions of the MERGE TP theory predict only an effect of

the proximate (C') context, not of the remote (C) context, because they do not keep track of

dependencies more than a single segment away from the ambiguous segment. If an effect

of C is found, it will constitute positive evidence either that the contexts must be larger, or

that the phonotactic effect is not entirely due to transitional probabilities.
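The restriction can be made concrete with a toy scorer: an INC-1-style model sees only the bigram formed with the immediately preceding segment, so the remote consonant C cannot change its score. The counts below are invented, not corpus statistics.

```python
def tp_bias(context, candidate, bigram_counts):
    """Score a candidate segment from its immediate left neighbor only.

    context: preceding segments; only context[-1] is consulted,
    which is exactly the INC-1/SC-1 limitation discussed in the text.
    """
    prev = context[-1]
    row = bigram_counts.get(prev, {})
    total = sum(row.values())
    return row.get(candidate, 0) / total if total else 0.0

counts = {"t": {"a": 80, "a:": 20}}   # toy bigram counts
# Identical scores whether the remote context is [po] or [r'o]:
tp_bias(["p", "o", "t"], "a:", counts) == tp_bias(["r'", "o", "t"], "a:", counts)
```

Any observed effect of C therefore falls outside what this class of model can express.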

4.7.4.2.3. OT grammatical theory

OT analyses of the Japanese lexical strata are in agreement that the differences

between strata lie in faithfulness rather than markedness. One proposal is that there are

separate grammars for the separate strata, which differ in how the faithfulness constraints

are ranked. Lexical items are evaluated with respect to their stratum's grammar (Ito &

Mester 1995). An alternative account is that there is only one grammar, but that it contains

duplicate sets of faithfulness constraints, which are stratum-specific. Each set applies only

to lexical items which belong to one stratum. Different rankings of each set result in

different input-output mappings for different strata (Fukazawa et al. 1998).

The choice is not a crucial one for the present experiment; the reasoning is nearly

the same in either case. For expository purposes I will adopt the second proposal, stratum-

specific faithfulness constraints. Its principal theoretical advantages are that it is

parsimonious (preserving the principle of a single grammar with a single fixed ranking) and

offers a learning algorithm, driven by ranking paradoxes. Its principal empirical advantage

is its ability to deal with "hybrid" compounds - compound words containing words from

different lexical strata, such as the Yamato/Sino-Japanese [tombokeŋkʲɯ:ka] 'dragonfly

researcher’, where each member of the compound obeys the phonotactic restrictions

appropriate to its stratum.

The illegality of [a:] in Sino-Japanese is reflected in the ranking of the Sino-

Japanese IDENT[LENGTH] faithfulness constraint below *[a:]. Its legality in Foreign items

is captured by ranking the Foreign IDENT[LENGTH] above *[a:].

(4.100) IDENT[LENGTH]For » *[a:] » IDENT[LENGTH]SJ

                         IDENT[LENGTH]For   *[a:]   IDENT[LENGTH]SJ
/denɕa:/SJ
a.   [denɕa:]                                *!
b. ☞ [denɕa]                                          *
/konpʲɯ:ta:/For
c. ☞ [konpʲɯ:ta:]                            *
d.   [konpʲɯ:ta]          *!

The predictions made by this grammar in the grammatical model hinge crucially on

the concept of "active constraint", which, as discussed in §3.4.2., is any constraint which, for

some input /, eliminates a candidate output from consideration. In order to use grammatical

knowledge, the parser must decide which markedness constraints are active. In order to do

that, it must decide which stratum it is dealing with. The listener must covertly choose a

stratum within which to interpret the stimulus; that is, the linguistic parse which the listener

attempts to assign to the stimulus includes a stratum classification. The active constraints

are determined relative to the stratum's faithfulness constraints. If the stratum chosen is

Foreign, then *[a:] is inactive and produces no bias. If it is Sino-Japanese, then *[a:] is

active and produces a bias.

(4.101)

/any input/              IDENT[LENGTH]For   *[a:]   IDENT[LENGTH]SJ
a.   [poɸa]For
b.   [poɸa:]For                              *
c.   [rʲohʲa]SJ
d.   [rʲohʲa:]SJ                             *
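The evaluation logic of these tableaux (strict domination with stratum-indexed faithfulness) can be sketched as a tiny evaluator; the segment-list representation and all names are toy simplifications for illustration.

```python
def violations(inp, out, stratum):
    """Profile under IDENT[LENGTH]For >> *[a:] >> IDENT[LENGTH]SJ.

    Ranked violation counts are returned as a tuple, so ordinary
    tuple comparison implements strict constraint domination.
    """
    ident = sum(1 for a, b in zip(inp, out)
                if a in ("a", "a:") and a != b)   # length unfaithfulness
    star_aa = out.count("a:")                      # markedness *[a:]
    ident_for = ident if stratum == "Foreign" else 0
    ident_sj = ident if stratum == "SJ" else 0
    return (ident_for, star_aa, ident_sj)

def optimal(inp, candidates, stratum):
    """The candidate with the least serious violation profile wins."""
    return min(candidates, key=lambda c: violations(inp, c, stratum))

long_form = ["d", "e", "N", "s", "a:"]
short_form = ["d", "e", "N", "s", "a"]
optimal(long_form, [long_form, short_form], "SJ")       # → short_form
optimal(long_form, [long_form, short_form], "Foreign")  # → long_form
```

Because faithfulness is indexed to the stratum, the same markedness constraint *[a:] is decisive for a Sino-Japanese input but harmless for a Foreign one.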

In this view, the Foreign cues can provide conclusive evidence of Foreign stratum

membership, because they are illegal in Sino-Japanese - they violate a markedness

constraint which is active with respect to Sino-Japanese. The reverse is not true: Although

the Sino-Japanese cues are rare in Foreign words, they are not actually illegal there. Hence,

any of our stimuli, with either vowel length, has a legitimate parse as a Foreign word. Those

containing Foreign cues cannot be parsed as Sino-Japanese; those lacking Foreign cues can

be parsed as Sino-Japanese.

There should thus be only two degrees of [a] bias: a lesser one in words containing

Foreign cues, and a greater degree in words lacking them. If C is a Foreign cue, then

manipulating C' should have no effect, while if C is not a Foreign cue, then manipulating C'

should have an effect (since making C' Foreign will reduce the [a] bias). The same is of

course true with C and C' exchanged. For example, the stimulus contexts [poɸ_] and

[rʲoɸ_] are predicted to have the same effect, since the presence of [ɸa] cues Foreign

stratum membership regardless of the initial consonant. Both should produce more "aa"

judgments than contexts lacking Foreign cues. This will show up statistically as an

interaction between the effects of the C and C' manipulations.

4.7.4.3. Methods

The procedure was almost identical to that of Experiment 6a. The same speaker

(naive to the purpose of the experiment) was recorded on digital audio tape producing the

cues embedded in the context [CoC'a:], with a high pitch on the [o] and a low one on the

[a:]. This accent pattern was chosen because it is common in both the Sino-Japanese and

Foreign strata, and because it rules out the possibility of a word boundary inside the

stimulus. The 9 possible combinations of C and C' yielded the following nonwords:

Table 4.102. Stimuli for Experiment 6b

C               C': Foreign   Neutral    Sino-Japanese

Foreign         [poɸa:]       [pota:]    [pohʲa:]

Neutral         [roɸa:]       [rota:]    [rohʲa:]

Sino-Japanese   [rʲoɸa:]      [rʲota:]   [rʲohʲa:]

These were digitized and normalized as before. Using a waveform editor, single

tokens of [po], [ro], and [rʲo] were selected and excised. One of the [o]s was chosen and

spliced in to replace the original [o] of the other two, resulting in [po], [ro], and [rʲo] tokens

with identical vowels.

In this experiment, unlike in Experiment 6a, it was necessary to manipulate the initial

consonant of the final syllable; hence, an [a:] token had to be created which could be spliced

onto each of the three possible syllable onsets. A single token of [a:] was selected and

lengthened to 250 ms by repeating medial pitch periods. One token each of the speaker's

original [ɸa:], [ta:], and [hʲa:] was chosen, and truncated to the first zero crossing following

the fourth pitch period of the vowel (early in the steady state), leaving [ɸa-], [ta-], and

[hʲa-]. The 250-ms [a:] token was spliced onto the end of each one to produce [ɸa:], [ta:],

and [hʲa:] tokens with identical extra-long vowels.

The subjects and procedure were as in Experiment 6a. A short break separated the

two experiments, which, together with the post-experiment questionnaire, lasted about two

hours.

At the end of the experiment, subjects received written questionnaires. They listened

as many times as they wished to the longest [a:] and the shortest [a] in each carrier context

(in a random order for each subject) and transcribed the resulting pseudo-word in katakana,

then answered questions about it, including whether they knew it as a real word of Japanese.

4.7.4.4. Results

The same subjects were excluded as in Experiment 6a. For each valid subject, an

[a]-[a:] boundary was computed for each series as in Experiment 6a. The boundary for

each context, averaged across subjects, is shown in Figure 4.103:

Figure 4.103. Boundary between [a] and [a:], averaged across 21 listeners

[Figure: mean [a]-[a:] boundary (ms), plotted by C cue type (F [po], N [ro], SJ [rʲo]) and C' cue type.]

The boundary stimulus tends to become longer as the consonantal cues go from

Foreign (F) to Sino-Japanese (SJ). A by-subjects two-factor ANOVA shows that

manipulations of both C and C’ had very significant effects on the boundary location. The

C manipulation caused a 7.25-ms shift (F(2,40) = 6.473, p < 0.01); the C' manipulation

caused a 12.6-ms shift (F(2,40) = 11.529, p < 0.01). Their interaction did not reach

significance (F(4,80) = 0.714).

The effect obtained in this experiment is considerably larger and more robust than

that of Experiment 6a. There, the only significant effect was a 12.21-ms word-superiority

effect on the location of an [a]-[a:] boundary. This is about the same size as the effect of

C’ alone in this experiment. TRACE would have predicted the lexical effect to be stronger.

Questionnaire results confirmed that the subjects heard the stimuli correctly; no

stimulus was misheard by more than 3 subjects. One stimulus, [pota], was judged to be a

real word by 15 subjects (it is part of the reduplicated Mimetic potapota). One other was

judged a real word by 2 subjects, and 5 others by 1 subject. Lexical bias, if any were

present, should have given [pota] an especially strong [a] bias. Its boundary is in fact

intermediate between those of its neighbors.

The INC-1 and SC-1 versions of the transitional-probability theory do not fare well

either. They can account for the effects of C’, but not for those of C, which is fully three

segments away from the ambiguous vowel. The C effect is smaller than the C' effect, which

could indicate that a transitional-probability effect between C’ and the immediately

neighboring ambiguous segment is being added on top of some other effect correlated with

stratum.

These results are also challenging for the OT grammatical model. It is clear that

stratum phonotactics are playing a role, so faithfulness constraints are involved. However,

the lack of interaction between the C and C' effects is unexpected. All stimuli containing at

least one Foreign cue should be classified as Foreign and should all behave alike, with the

same low level of [a] bias. Instead, the [a] bias decreases with the number of Foreign cues,

and two are more effective than one.

Furthermore, [rʲ] and [hʲ] seem to act as cues to membership in the Sino-Japanese

stratum, since they produce more [a] bias than the Foreign or Neutral cues. This is

unexpected in the grammatical theory because there are no grammatical cues to

membership in the Sino-Japanese stratum. The [rʲ] and [hʲ] sounds are statistically rare in

the Foreign stratum, but not grammatically illegal there. Listeners appear to be sensitive to

the statistical link, which the OT grammatical theory does not capture.

4.7.4.5. Discussion

Experiments 6a and 6b have demonstrated three points with some assurance.

First, they found a phonotactic effect that was substantially larger than any of three

lexical effects obtained with the same listeners and paradigm - a point difficult to explain in

TRACE.

Second, they showed that phonological context which is remote from the ambiguous

segment can influence perception. If the effect is caused by perceptual mechanisms

sensitive to phoneme sequence probabilities, they must be sensitive to sequences of an

implausibly great length.

Third, they demonstrated that the lexical strata of Japanese are not just descriptive

constructs, but play an active role in perception. The distinction among strata (as a primitive

rather than emergent phenomenon) is crucial to any grammatical account of Japanese

phonology, but is unmotivated (or motivated only post hoc) in TRACE and MERGE TP.

These results leave a problem for the grammatical theory in the gradience of the

cues. As the cues become more Sino-Japanese-like, the [a] bias increases; as they become

more Foreign-like, it decreases.

It may be that the likelihood of classifying a stimulus in a particular stratum depends

on the number and type of stratum cues present in it. A stimulus with two Sino-Japanese

cues is more likely to be perceived as Sino-Japanese, and hence more likely to invoke the *[a:]

constraint and produce a boundary shift, than a stimulus with only one.
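One way to make this gradience concrete is a cue-counting odds model; the weight below is illustrative and not fitted to the data.

```python
def p_sino_japanese(cues, weight=2.0):
    """Graded probability of a Sino-Japanese parse from cue counts.

    Each Sino-Japanese cue multiplies the odds of an SJ classification
    by `weight`; each Foreign cue divides them; Neutral cues do nothing.
    The probability of invoking *[a:] then rises with each SJ cue and
    falls with each Foreign cue, as in the observed boundary shifts.
    """
    odds = 1.0
    for cue in cues:
        if cue == "SJ":
            odds *= weight
        elif cue == "F":
            odds /= weight
    return odds / (1.0 + odds)

[p_sino_japanese(c) for c in (["SJ", "SJ"], ["SJ", "N"], ["N", "N"], ["F", "N"], ["F", "F"])]
# strictly decreasing, from 0.8 down to 0.2
```

Unlike the all-or-nothing grammatical classification, this scheme predicts the additive pattern in which two cues of the same type shift the boundary further than one.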

This solution requires that stratum classification be performed outside of the

grammar, since the grammar does not represent the concept of "Sino-Japanese cue" - only

that of "Foreign cue”. Stratum classification might take place as part of the process of

incorporating a new word into the lexicon, through comparison with existing lexical items.

Since listeners in this experiment heard each stimulus an average of 44 times in succession,

they had concentrated experience with each one, and ample time for slower off-line

processes to unfold.

4.8. Summary

Experiments 1-3 showed that a phonotactic boundary shift could be obtained in

contexts which made one endpoint illegal, but not in contexts which merely made one

endpoint extremely rare. This result is consistent with the OT grammatical theory, but

not with MERGE TP or TRACE. These same experiments also showed that the boundary-

shift effect is not modulated by phonological context which is outside the structural

description of the relevant phonotactic prohibition - a result consistent with the OT

grammatical theory, and with MERGE TP, but not with TRACE.

Experiment 4 replicated the effects of Experiments 2 and 3 with voiced rather than

voiceless stops, and showed that the boundary shift was due to a dependency between stop

and sonorant responses, rather than to any auditory effect of one consonant on the other.

Experiment 5 confirmed that this dependency was not compensation for coarticulation, and

showed that the phonotactic effect in CCV stimuli can be abolished by prepending another

vowel to provide an alternative parse in which the banned cluster becomes legal. This

indicates that the parser makes syllabification and segmental-identity choices in parallel. If

segmental identity were decided first, then the presence or absence of an initial V would

make no difference. If syllabification were decided first, then phonotactics would not be

able to influence segmentation, as it is known to do (Friederici & Wessels 1993; McQueen

1998; Treiman & Zukowski 1990; Treiman & Danis 1988; Kirk 2001).

Experiments 6a and 6b found that perception can be influenced by the

phonotactics of the lexical strata of Japanese, a variable which exists only in the OT

grammatical theory. Stratum effects must be taken as emergent statistical phenomena in the

other two theories, but the present results show that they cannot be. The stratum effect was found to be larger than a

word-superiority effect, ruling out TRACE's contention that it is a word-superiority effect.

The phonotactic boundary shift was influenced by segmental context fully three segments

away from the ambiguous segment, too far away for the MERGE TP theory to capture the

dependency.

Taken together, these results provide substantial support for the hypothesis that

ambiguous-phoneme perception is guided by grammatical knowledge, and that the parser

considers several candidates in parallel when making its decisions.

4.9. Appendix: Synthesis parameters for the stimuli of Experiments 4 and 5

Table 4.104. Constant synthesis parameters which were identical for the "b" and "d" arrays
of Experiments 4 and 5

Parameter Value

UI 2

RS 1776

SR 16000

FL 20

OQ 30

GV 60

GH 50

DU 700

F6 4900

B6 100

F5 4300

B5 300

F4 3250

Parameter Value

F3 2500

FTP 3800

Note: UI = update interval (frame length), ms; RS = random-number-generator seed; SR =
sampling rate, Hz; FL = flutter (% F0 variation); OQ = open quotient (% of the glottal cycle
with open glottis); GV = output gain for voicing source, dB; GH = output gain for
aspiration source, dB; DU = duration, ms; FTP = frequency of tracheal pole, Hz.

Table 4.105. Time-varying synthesis parameters common to the "b" and "d" arrays of
Experiments 4 and 5

Parameter Time, ms Value

AV 0 0

AV 25 0

AV 30 40

AV 70 40

AV 75 57

AV 225 60
AV 425 55

AV 475 50

AV 525 43
AV 575 0
AV 700 0

TL 0 30

TL 60 30

TL 65 10

TL 75 10

TL 115 (10 * w + 0 * l)

TL 145 (20 * w +

TL 175 (10 * w + 0 * l)

TL 225 0

TL 700 0

F0 0 1000

F0 75 1000

F0 225 1100

F0 425 1000

F0 475 900

F0 700 900

AH 0 0

AH 25 0

AH 60 0

AH 65 72

AH 70 70

AH 90 65

AH 115 0

AH 175 0

AH 225 0

AH 275 56

AH 345 60

AH 425 58

AH 475 55

AH 525 53

AH 575 0

AH 700 0

AF 0 0

AF 60 0

AF 65 57

AF 70 55

AF 75 0

AF 700 0

Note: AV = amplitude of voicing, dB, arbitrary reference; AH = amplitude of aspiration, dB,
arbitrary reference; AF = amplitude of frication, dB, arbitrary reference; F0 = fundamental
frequency, in units of 0.1 Hz; TL = spectral tilt, dB reduction at 3000 Hz.

Table 4.106. Synthesis parameters for the "b" array of Experiments 4 and 5

Parameter Time, ms Value

AB — ( 24 * g + 44 * b)

A2F — 66

B2F — (75 * exp ( log (600/75) * b))

FTZ 0 3800

FTZ 75 3800

FTZ 115 (3800 * w + 3300 * l)

FTZ 175 (3800 * w + 3300 * l)

FTZ 275 3800

FTZ 700 3800

B4 0 800

B4 75 800

B4 115 (800 * w + 300 * l)

B4 175 (800 * w + 300 * l)

B4 275 200

B4 700 200

B3 0 700

B3 75 700

B3 115 (800 * w + 250 * l)

B3 175 (800 * w + 250 * l)

B3 275 150

B3 700 150

F2 0 st_targ

F2 75 st_targ

F2 115 (0.20 * st_targ + 0.80 * gl_targ)

F2 140 gl_targ

F2 150 gl_targ

F2 225 (0.33 * gl_targ + 0.67 * 1600)

F2 275 1600

F2 700 1600

B2 0 90

B2 115 90

B2 175 (270 * w + 90 * l)
B2 225 (270 * w + 90 * l)

B2 275 90

B2 700 90

F1 0 180

F1 75 180

F1 85 200

F1 140 (270 * w + 320 * l)

F1 150 (270 * w + 320 * l)

F1 225 700

F1 345 780

F1 700 780

B1 0 250

B1 75 250

B1 95 140

B1 175 140

B1 225 80

B1 700 80

Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
Hz; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable st_targ, the
stop target F2, is 800 Hz; gl_targ, the glide target F2, is equal to 675 Hz * w + 900 Hz * l.
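The interpolation formulas in the note above can be made concrete. The sketch below is an illustration, not code from the dissertation: the helper names are mine, and linear interpolation between breakpoints is an assumption about how the synthesizer fills in frame values. It generates the F2 track of Table 4.106 for a given pair of continuum weights w and l:

```python
# Sketch of one parameter track from Table 4.106 ("b" array).
# w and l (0-1) blend the /w/ and /l/ glide targets, as in the note above.
# Linear interpolation between breakpoints is assumed, not stated.

def glide_target_f2(w: float, l: float) -> float:
    """gl_targ: glide F2 target, Hz (675 Hz * w + 900 Hz * l)."""
    return 675.0 * w + 900.0 * l

def f2_breakpoints(w: float, l: float, st_targ: float = 800.0):
    """(time_ms, value_Hz) breakpoints for F2 from Table 4.106."""
    gl = glide_target_f2(w, l)
    return [
        (0, st_targ),
        (75, st_targ),
        (115, 0.20 * st_targ + 0.80 * gl),
        (140, gl),
        (150, gl),
        (225, 0.33 * gl + 0.67 * 1600.0),
        (275, 1600.0),
        (700, 1600.0),
    ]

def track(breakpoints, t_ms: float) -> float:
    """Value at time t, linearly interpolated between breakpoints."""
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t_ms <= t1:
            return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)
    raise ValueError("time outside 0-700 ms")

f2 = f2_breakpoints(w=1.0, l=0.0)      # pure /w/ endpoint
print(track(f2, 0))    # 800.0 (stop target)
print(track(f2, 145))  # 675.0 (glide target for /w/)
```

The same three functions reproduce any of the other weighted rows (B2, B3, B4, FTZ) by swapping in their breakpoint lists.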

Table 4.107. Synthesis parameters for the "d" array of Experiments 4 and 5

Parameter Time, ms Value

AB — ( 24 * g + 44 * d)

A2F — ( 66 * g + 60 * d)

B2F — (75 * exp ( log (600/75) * d))

FTZ 0 3800

FTZ 75 3800

FTZ 115 (3800 * w + 3300 * l)

FTZ 175 (3800 * w + 3300 * l)

FTZ 275 3800

FTZ 700 3800

B4 0 800

B4 75 800

B4 115 (800 * w + 300 * l)

B4 175 (800 * w + 300 * l)

B4 275 200

B4 700 200

B3 0 700

B3 75 700

B3 115 (800 * w + 250 * l)

B3 175 (800 * w + 250 * l)

B3 275 150

B3 700 150

F2 0 st_targ

F2 75 st_targ

F2 115 (0.20 * st_targ + 0.80 * gl_targ)

F2 140 gl_targ

F2 150 gl_targ

F2 225 (0.33 * gl_targ + 0.67 * 1600)

F2 275 1600
F2 700 1600

B2 0 90

B2 115 90
B2 175 (270 * w + 90 * l)
B2 225 (270 * w + 90 * l)

B2 275 90
B2 700 90

F1 0 180

F1 75 180

F1 85 200

F1 140 (270 * w + 320 * l)

F1 150 (270 * w + 320 * l)

F1 225 700

F1 345 780

F1 700 780

B1 0 250

B1 75 250

B1 95 140

B1 175 140

B1 225 80

B1 700 80

Note: AB = amplitude of the bypass frication filter, dB; A2F = amplitude of the frication
filter centered on F2, dB; B2F = bandwidth of same, Hz; FTZ = frequency of tracheal zero,
Hz; Fi and Bi = frequency and bandwidth of the i-th formant, Hz. The variable st_targ, the
stop target F2, is equal to 800 Hz * g + 1400 Hz * d; gl_targ, the glide target F2, is equal
to 675 Hz * w + 900 Hz * l.

CHAPTER 5

CONCLUSIONS

Speech perception is guided by the expectation that the stimulus is an utterance in

the perceiver’s language. This finding cuts across every level of language organization:

phoneme inventory (e.g., Miyawaki, Strange, Verbrugge, Liberman, Jenkins, & Fujimura,

1975), phonotactics (e.g., Brown & Hildum, 1956), the lexicon (e.g., Ganong, 1980), and

syntax (e.g., Garnes & Bond, 1975).

The preceding chapters have offered an explicit theory of how the mechanisms of

speech perception can use grammatical knowledge of the phonology of the stimulus

language to arrive at a phonological parse of the input by choosing from a set of candidate

parses consistent with the acoustic signal. I have argued that such a theory is necessary if

the observed effects of phonology on perception are to be understood, in view of the

inability of competing statistically-based theories (TRACE and transitional probabilities) to

account for the empirical findings.

Three principal objections were raised against the statistical theories: their inability

to distinguish between relevant and irrelevant context, their lack of sufficiently rich

phonological structure, and their inability to generalize appropriately to phonological

classes. At the root of all three is precisely the feature that makes statistical theories so

conceptually attractive - their low degree of abstraction.

The statistical theories, including TRACE and MERGE TP, can be characterized as

"unit models" because they attribute perceptual preference for, e.g., [tr] over [tl] to the

listener’s differing experience of the specific phonological units [tr] and [tl]: One is an

attested onset and the other is not (Halle et al. 1998, Pitt 1998), one is common and the

other is rare (Massaro & Cohen 1983, Pitt & McQueen 1998), one is supported by many

lexical items which contain it and the other is not (McClelland & Elman 1986). The a

priori plausibility of unit models comes from the pervasiveness of unit-frequency and

lexicality effects in language (e.g., Vitevitch & Luce 1999, Jusczyk, Luce, & Charles-Luce

1994, Frisch, Large, & Pisoni 2000, Hay, Pierrehumbert, & Beckman in press; Ganong

1980; Samuel 1981, Fox 1984), combined with the minimal nature of the representations

they posit - phonemes and words, both of which are needed in any theory. The weakness

of unit models is not conceptual but empirical: The phonological knowledge used in speech

processing is more complex than can be accommodated in such a simple architecture. The

experiments of Chapter 4 were designed to exploit this weakness, in order to argue that a

full-fledged phonological competence must be available to perceptual mechanisms on line.

Inability to distinguish relevant from irrelevant phonological context. Phonological

processes apply to classes of sounds in classes of environments (e.g., a process of

devoicing applying to all obstruents at the end of all syllables). Different rules have

different environments. However, the unit models afford only one environment - the word

for TRACE, the fixed-length phoneme string for MERGE TP - and are forced to detect

linguistically irrelevant accidental correlations involving contextual material which has

nothing to do with the actual phonological pattern. Experiments 1-6 showed that in fact,

when probabilities are equated, phonologically relevant variation has a much stronger

perceptual effect than phonologically irrelevant variation.
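A minimal version of such a unit model makes the point concrete. The sketch below is an illustration, not an implementation of MERGE TP itself; the toy corpus and function names are mine. It estimates bigram transitional probabilities from raw counts, so it credits whatever adjacent material it has seen and assigns zero to any unseen pair, however phonologically natural:

```python
from collections import Counter

# Toy bigram transitional-probability model: P(y | x) from raw counts.
# It has no notion of natural classes or syllable structure; the invented
# corpus below attests [tr] but not [tl], even though [pl] is attested.

corpus = ["tri", "tre", "tin", "pra", "pli", "ple"]

bigrams = Counter()
unigrams = Counter()
for word in corpus:
    for x, y in zip(word, word[1:]):
        bigrams[(x, y)] += 1
        unigrams[x] += 1

def tp(x: str, y: str) -> float:
    """P(y | x); zero for any unseen pair, however natural."""
    return bigrams[(x, y)] / unigrams[x] if unigrams[x] else 0.0

print(tp("t", "r"))  # 2/3: attested, so preferred
print(tp("t", "l"))  # 0.0: unseen, and no generalization from [pl]
```

Enlarging the window only worsens the problem: a trigram or four-gram version multiplies the number of contexts to be counted while still treating each string as an unrelated unit.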

This was particularly clear in the case of Experiment 6, where the magnitude of a

word-superiority effect was compared directly with that of a stratal phonotactic effect and

found to be much smaller. In order to account for the phonotactic effect (which reflected a

dependency between the ambiguous phoneme and one occurring three phonemes before it), a unit

model would have to use such a large environment that it would have to also represent

equally strongly the dependency between the first three segments of any word and the

fourth - incorrectly predicting a similarly-sized or stronger word-superiority effect.

Lack of sufficiently rich phonological structure. Moreover, since the unit models do

not represent syllabification, they could not predict the effects of syllable structure found by

Pitt (1998) and in Experiment 5. TRACE, whose phoneme-decision process considers each

phoneme unit in isolation, cannot represent the phonotactic dependency between two

phoneme decisions found in Experiments 4 and 5.

Inability to generalize to phonological classes. This problem is acute for MERGE

TP, which only represents statistical dependencies between one specific phoneme sequence

and another. For this theory, [ki] and [sa] are not two instances of the pattern "CV

syllable", but two unrelated phoneme strings. This renders the theory unable to recognize

natural classes. Experiments 2-4 indicated that English listeners' experience of the common

[labial][labial] onsets [br pr] legitimizes the rare or nonexistent [bw pw], but MERGE TP

cannot make the connection. (TRACE's featural level could in principle allow it to capture

this generalization, if there were a way of representing syllable structure.) The inability to

relate one phoneme string to another exacerbates the problems of irrelevant context and

oversimplified phonological structure by preventing the process of comparison which might

allow irrelevant factors to be averaged away and lead to the induction of more structured

representations.

The OT grammatical theory performed well in all of these tests, predicting shifts

when there should have been shifts and no shifts when there should not have been any. The

good performance was not due to the specific choice of Optimality Theory - a similar

theory could in principle have been constructed around any descriptively adequate grammar

- but to the fact that grammatical theory more accurately describes the categories and

processes of language. TRACE and MERGE TP both propose, in essence, that the

representations and rules active in on-line speech perception are very different from those

inferred from typological study of the structure of human languages. Any attempt to

elaborate the architecture of either theory to capture more sophisticated linguistic concepts

(e.g., by adding a layer of syllable units to TRACE) will amount to building grammar into

them. Since their chief conceptual appeal is their promise to explain apparent grammatical

effects, like phonotactics, as emergent statistical generalizations captured by a simple

learning system, the remedy will require amputation.

There remains some evidence for transitional-probability effects on ambiguous-

phoneme perception which are not captured in the OT grammatical theory - the findings of

Pitt (1998) on English onset clusters - and the effects of lexicality are thoroughly

documented (Ganong 1980; Samuel 1981, Fox 1984, Elman & McClelland 1988, etc.).

Given the pervasive nature of frequency and practice effects in all cognitive domains, and

their narrow stimulus-specificity (e.g., Klapp et al. 1991, Logan 1988a, 1988b), there can be

no doubt that unit-based processes play a prominent role in perception. However, the

evidence of this study suggests that they are considerably weaker than the structural

constraints of the listener’s native language.

A mechanism for capturing some of these effects—gradient illegality and

probabilistic phonotactics— in Optimality Theory has been proposed by Boersma (1997,

1998) and Boersma and Hayes (2001): the continuous ranking scale and stochastic

constraint evaluation. In a standard OT grammar each constraint has a fixed dominance

relation to all other constraints, determined by its place in the hierarchy: C1 dominates C2

or is dominated by C2, or is ranked in the same stratum as C2, with no other possibility. In

Boersma and Hayes's proposal, each constraint is associated with a range of positions on

the real line, which may overlap with the ranges of other constraints. When the grammar is

given an input to evaluate, the position of each constraint is fixed probabilistically at some

point in its range, yielding a standard ranking. If the range of C1 is centered above that of

C2, but overlaps it, then most of the time C1 will be fixed above C2, but sometimes C1 will

be low in its range, C2 will be high in its, and the result will be that C2 dominates C1. The

grammar therefore does not always give the same output for a given input; different

constraints are active from one use of the grammar to the next. The further apart the centers

of the C1 and C2 ranges are, the likelier C1 is to dominate C2 on any given use of the

grammar; hence, the likelier the corresponding grammatical process is to occur. The overlap

between the C1 and C2 ranges depends on the frequency with which the C1 » C2 and the

C2 » C1 outputs occur in the data to which the learner is exposed.
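The evaluation procedure can be sketched in a few lines. The simulation below is illustrative only: the ranking centers and the Gaussian evaluation noise are assumptions in the spirit of Boersma and Hayes's proposal, not values from their papers. It estimates how often C1 ends up dominating C2, and checks the estimate against the closed-form probability:

```python
import math
import random

# Stochastic evaluation: each constraint has a center on the continuous
# ranking scale; at evaluation time Gaussian noise is added, and the
# sampled values determine a strict total order for that use of the grammar.

random.seed(1)
NOISE_SD = 2.0
centers = {"C1": 100.0, "C2": 98.0}   # C1 centered above C2, ranges overlap

def sample_ranking():
    """One use of the grammar: constraint names, highest-ranked first."""
    noisy = {c: mu + random.gauss(0.0, NOISE_SD) for c, mu in centers.items()}
    return sorted(noisy, key=noisy.get, reverse=True)

trials = 100_000
c1_on_top = sum(sample_ranking()[0] == "C1" for _ in range(trials)) / trials

# Closed form: the difference of two independent Gaussians is Gaussian with
# SD = NOISE_SD * sqrt(2), so P(C1 >> C2) = Phi(diff / (NOISE_SD * sqrt(2)))
# and the erf argument simplifies to diff / (NOISE_SD * 2).
diff = centers["C1"] - centers["C2"]
analytic = 0.5 * (1.0 + math.erf(diff / (NOISE_SD * 2.0)))

print(round(c1_on_top, 3), round(analytic, 3))  # both near 0.76
```

Widening the gap between the centers drives the probability toward 1 (a near-exceptionless process); narrowing it toward 0.5 models free variation.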

In perception, this could cause phonotactic prohibitions to appear and disappear

probabilistically. A configuration which is ruled out when the markedness constraint M

dominates the faithfulness constraint F (and is therefore active) would be accepted when the

reverse is true. Averaged over a large number of trials, the listener's dispreference for the

configuration would depend on the amount of overlap between the M and F ranges. In this

way, stronger and weaker phonotactic bans would correspond to smaller and greater degrees

of overlap.

Although this dissertation has considered only phonotactic effects of grammar on

ambiguous-phoneme perception, the OT grammatical model predicts that effects will be

pervasive in more naturalistic on-line tasks. Speech segmentation, for instance, is

demonstrably sensitive to the grammar of phonotactics (Norris et al. 1997, Kirk 2001).

This can be seen as selection of a grammatically more harmonic prosodic parse over a less

harmonic one. Effects of faithfulness should be apparent in word recognition and similarity

priming: A nonword which is unfaithful to a real word on a low-ranked faithfulness

constraint should activate the word more strongly than a nonword which is unfaithful to it

on a high-ranked constraint. Such studies offer a test, not merely of the influence of

grammar, but of the specific conception of it put forward by Optimality Theory and

competing theories of linguistic competence.

BIBLIOGRAPHY

Algeo, J. (1978). What consonant clusters are possible? Word 29:206-224.

Alwan, A. A., Narayanan, S. S., & Haker, K. (1997). Toward articulatory-acoustic
models for liquid approximants based on MRI and EPG data. Part II. The rhotics.
Journal of the Acoustical Society of America 101(2):1078-1089.

Amano, S., & Kondo, T. (1999). Nihongo no goitokusei. Tokyo: Sanseido.

Archangeli, D., & Pulleyblank, D. (1994). Grounded phonology. Cambridge, MA: MIT
Press.

Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and
integrality in speeded classification. Journal of Mathematical Psychology 38:423-
466.

Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of probability
statistics by 8-month-old infants. Psychological Science 9(4):321-324.

Baayen, R., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (CD-
ROM). Philadelphia: Linguistic Data Consortium.

Bailey, P. J., Summerfield, Q., & Dorman, M. (1977). On the identification of sine-
wave analogues of certain speech sounds. Haskins Laboratories Status Report on
Speech Research 51/52:1-25.

Beckman, J. N. (1998). Positional faithfulness. Ph.D. dissertation, University of
Massachusetts, Amherst.

Blumstein, S. E., & Stevens, K. N. (1979). Acoustic invariance in speech production:
Evidence from measurements of the spectral characteristics of stop consonants.
Journal of the Acoustical Society of America 66(4):1001-1017.

Boersma, P. (1997). How we learn variation, optionality, and probability.
Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam
21:43-58.

253

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Boersma, P. (1998). Functional phonology: Formalizing the interactions between
articulatory and perceptual drives. Ph.D. dissertation, University of Amsterdam.

Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm.
Linguistic Inquiry 32:45-86.

Borowsky, T. J. (1986). Topics in the lexical phonology of English. Ph.D. dissertation,
University of Massachusetts, Amherst.

Bosch, L., & Sebastian-Galles, N. (1997). Native-language recognition abilities in 4-
month-old infants from monolingual and bilingual environments. Cognition 65:33-
69.

Brown, C., & Matthews, J. (2001). When intake exceeds input: Language-specific
perceptual illusions induced by L1 prosodic constraints. Proceedings of the Third
International Symposium on Bilingualism, Bristol, U.K., April 18-20, 2001.

Brown, R. W., & Hildum, D. C. (1956). Expectancy and the perception of syllables.
Language 32:411-419.

Burnage, G. (1995). The CELEX lexical database. Release 2. Centre for Lexical
Information; Max Planck Institute for Psycholinguistics, The Netherlands.

Burton, M. W., Baum, S. R., & Blumstein, S. E. (1989). Lexical effects on the phonetic
categorization of speech: The role of acoustic structure. Journal of Experimental
Psychology: Human Perception and Performance 15(3):567-575.

Catford, J. C. (1988). A practical introduction to phonetics. Oxford, UK: Oxford
University Press.

Cherry, E. C. (1953). Some experiments on the recognition of speech with one and two
ears. Journal of the Acoustical Society of America 25:975-979.

Chomsky, N., & Halle, M. (1968). The sound pattern of English. Cambridge, MA: MIT
Press.

Clements, G. N. (1990). The role of the sonority cycle in core syllabification. In J.
Kingston & M. Beckman (Eds.), Papers in laboratory phonology I: Between the
grammar and physics of speech (pp. 283-333). New York: Cambridge University
Press.

254

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Clements, G. N., & Keyser, S. J. (1983). CV phonology. Cambridge, MA: MIT Press.

Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook
2:225-252.

Cohn, A., & Lavoie, L. (2000). English vowel-liquid monosyllables: A case of
trimoraic syllables. Poster presented at the Seventh Conference on Laboratory
Phonology, Nijmegen, The Netherlands.

Coleman, J., & Pierrehumbert, J. (1997). Stochastic phonological grammars and
acceptability. In Proceedings of the 3rd Meeting of the ACL Special Interest Group
in Computational Phonology (12 July 1997), pp. 49-56. Somerset, NJ: Association
for Computational Linguistics.

Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.),
Strategies of information-processing (pp. 151-216). London: Academic Press.

Connine, C. M., & Clifton, C. (1987). Interactive use of lexical information in speech
perception. Journal of Experimental Psychology: Human Perception and
Performance 13(2):291-299.

Connine, C. M., Titone, D., & Wang, J. (1993). Auditory word recognition: Extrinsic
and intrinsic effects of word frequency. Journal of Experimental Psychology:
Learning, Memory, and Cognition 19:81-94.

Crowther, C. S., & Mann, V. A. (1994). Use of vocalic cues to consonant voicing and
native language background: The influence of experimental design. Perception and
Psychophysics 55(5):513-525.

Cutler, A., & Norris, D. (1979). Monitoring sentence comprehension. In W. E. Cooper
& E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented
to Merrill Garrett (pp. 113-134). Hillsdale, NJ: Erlbaum.

Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical
access. Journal of Experimental Psychology: Human Perception and Performance
14:113-121.

Cutler, A., Mehler, J., Norris, D., & Segui, J. (1986). The syllable's differing role in the
segmentation of French and English. Journal of Memory and Language 25:385-
400.

255

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Cutler, A., Mehler, J., Norris, D., & Segui, J. (1987). Phoneme identification and the
lexicon. Cognitive Psychology 19:141-177.

Delattre, P., & Freeman, D. C. (1968). A dialect study of American R's by X-ray
motion picture. Linguistics 44:29-68.

Dell, G. S., & Newman, J. E. (1980). Detecting phonemes in fluent speech. Journal of
Verbal Learning and Verbal Behavior 19:608-623.

Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors,
phonotactic constraints, and implicit learning: A study of the role of experience in
language production. Journal of Experimental Psychology: Learning, Memory, and
Cognition 26(6):1355-1367.

DeLorme Publishing Company (1998). Iowa atlas and gazetteer. Yarmouth, Maine:
DeLorme.

DeLorme Publishing Company (2000). Nebraska atlas and gazetteer. Yarmouth,
Maine: DeLorme.

Di Benedetto, M. G. (1989). Frequency and time variations of the first formant:
Properties relevant to the perception of vowel height. Journal of the Acoustical
Society of America 86:67-77.

Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J. (1999). Epenthetic
vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology:
Human Perception and Performance 25(6):1568-1578.

Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and
cross-validation. The American Statistician 37(1):36-48.

Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York:
Chapman and Hall.

Elman, J. L., & McClelland, J. L. (1988). Cognitive penetration of the mechanisms of
perception: Compensation for coarticulation of lexically restored phonemes.
Journal of Memory and Language 27:143-165.

Espy-Wilson, C. R. (1992). Acoustic measures for linguistic features distinguishing
the semivowels /w j r l/ in American English. Journal of the Acoustical Society of
America 92(2):736-757.

256

Reproduced with permission o f the copyright owner. Further reproduction prohibited without permission.
Foss, D. J. (1969). Decision processes during sentence comprehension: Effects of
lexical item difficulty and position upon decision times. Journal of Verbal
Learning and Verbal Behavior 8:457-462.

Foss, D. J., Harwood, D. A., & Blank, M. A. (1980). Deciphering decoding decisions:
Data and devices. In R. A. Cole (Ed.), Perception and production of fluent speech.
Hillsdale, NJ: Erlbaum.

Fougeron, C., & Keating, P. A. (1997). Articulatory strengthening at edges of
prosodic domains. Journal of the Acoustical Society of America 101:3728-3740.

Fox, R. A. (1984). Effect of lexical status on phonetic categorization. Journal of
Experimental Psychology: Human Perception and Performance 10:526-540.

Frauenfelder, U. H., Segui, J., & Dijkstra, T. (1990). Lexical effects in phonemic
processing: Facilitatory or inhibitory? Journal of Experimental Psychology: Human
Perception and Performance 16(1):77-91.

Friederici, A. D., & Wessels, J. M. (1993). Phonotactic knowledge of word boundaries
and its use in infant speech perception. Perception and Psychophysics 54(3):287-
295.

Frisch, S. A., & Zawaydeh, B. A. (2001). The psychological reality of OCP-Place in
Arabic. Language 77(1):91-106.

Frisch, S. A., Large, N. R., & Pisoni, D. B. (2000). Perception of wordlikeness: Effects
of segment probability and length on the processing of nonwords. Journal of
Memory and Language 42:481-496.

Frisch, S., Broe, M., & Pierrehumbert, J. (1995). The role of similarity in phonology:
Explaining OCP-Place. Proceedings of the 13th International Conference of the
Phonetic Sciences 3:544-547.

Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language
47:27-52.

Fukazawa, H., Kitahara, M., & Ota, M. (1998). Lexical stratification and ranking
invariance in constraint-based grammars. In M. C. Gruber, D. Higgins, K. S.
Olson, & T. Wysocki (Eds.), Proceedings of the Chicago Linguistics Society 34-2:
The Panels (pp. 47-62). Chicago: Chicago Linguistics Society.

Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of
Experimental Psychology: Human Perception and Performance 6(1):110-125.

Garnes, S., & Bond, Z. S. (1975). Slips of the ear: Errors in perception of casual
speech. Papers from the 11th Regional Meeting of the Chicago Linguistics Society,
pp. 214-225. Chicago, Illinois: Chicago Linguistic Society.

Goldinger, S. (1997). Words and voices: Perception and production in an episodic
lexicon. In K. Johnson & J. Mullenix (Eds.), Talker variability in speech
processing (pp. 33-66). San Diego: Academic Press.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New
York: Wiley.

Greenberg, J. H., & Jenkins, J. J. (1964). Studies in the psychological correlates of the
sound system of American English. Word 20:157-177.

Greenberg, J. H. (1963). Some universals of grammar, with particular reference to the
order of meaningful elements. In J. H. Greenberg (Ed.), Universals of language (pp. 73-
113). Cambridge, MA: MIT Press.

Guenter, J. (2000). What is /l/? Proceedings of the 26th Annual Meeting of the Berkeley
Linguistics Society, University of California, Berkeley, February 18-21.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics 47:757-762.

Hall, T. A. (1997). The phonology of coronals. Amsterdam Studies in the Theory and
History of Linguistic Science, Series IV: Current Issues in Linguistic Theory, Vol.
149. Amsterdam: Benjamins.

Halle, P. A., Segui, J., Frauenfelder, U., & Meunier, C. (1998). Processing of illegal
consonant clusters: A case of perceptual assimilation? Journal of Experimental
Psychology: Human Perception and Performance 24(2):592-608.

Hammond, M. (1999). The phonology of English: A prosodic Optimality-Theoretic
approach. Oxford, UK: Oxford University Press.

Harris, Z. (1951). Methods in structural linguistics. Chicago: University of Chicago
Press.

Hay, J., Pierrehumbert, J., & Beckman, M. (in press). Speech perception, well-
formedness, and the statistics of the lexicon. In J. Local, R. Ogden, & R. Temple
(Eds.), Papers in laboratory phonology VI. Cambridge, U.K.: Cambridge
University Press.

Hayes, B. (1980). A metrical theory of stress rules. Ph.D. dissertation, MIT.

Hsieh, H.-I. (1976). On the unreality of some phonological rules. Lingua 38:1-19.

Hultzen, L. (1965). Consonant clusters in English. American Speech 40:5-19.

Inkelas, S. (1994). The consequences of optimization for underspecification. MS,
Rutgers Optimality Archive, Rutgers University.

Ito, J., & Mester, R. A. (1994). Japanese phonology. In J. A. Goldsmith (Ed.),
Handbook of phonological theory (pp. 817-838). Cambridge, MA: Blackwell.

Ito, J., & Mester, R. A. (1995). The core-periphery structure of the lexicon and
constraints on reranking. In J. Beckman, S. Urbanczyk, & L. Walsh (Eds.),
University of Massachusetts occasional papers in linguistics [UMOP] 18: Papers
in Optimality Theory (pp. 181-209). Amherst: GLSA.

Jaeger, J., Lockwood, A., Kemmerer, D., Van Valin, R., & Khalak, H. (1996). A
positron-emission-tomographic study of regular and irregular verb morphology in
English. Language 72(3):451-497.

Jakobson, R., Fant, G. M., & Halle, M. (1952). Preliminaries to speech analysis: The
distinctive features and their correlates. Cambridge, MA: MIT Press.

Johnson, K. (1997). Speech perception without speaker normalization: An exemplar
model. In K. Johnson & J. Mullenix (Eds.), Talker variability in speech processing
(pp. 145-166). San Diego: Academic Press.

Jones, D. (1997). English pronouncing dictionary, 15th ed. (P. Roach & J. Hartman,
eds.). Cambridge, UK: Cambridge University Press.

Jusczyk, P. W., Friederici, A. D., Wessels, J., Svenkerud, V. Y., & Jusczyk, A. M.
(1993). Infants' sensitivity to the sound pattern of native-language words. Journal
of Memory and Language 32:402-420.

Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to
phonotactic patterns in the native language. Journal of Memory and Language
33:630-645.

Kahn, D. (1980). Syllable-based generalizations in English phonology. New York:
Garland.

Kenstowicz, M. (1994). Phonology in generative grammar. Cambridge, MA:
Blackwell.

Kirk, C. J. (2000). The effects of stress on the segmentation of continuous speech.
Paper presented at the 75th annual meeting of the Linguistic Society of America,
Washington, DC, January 5th.

Kirk, C. J. (2001). Phonological constraints on the segmentation of continuous speech.
Ph.D. dissertation, University of Massachusetts, Amherst.

Klapp, S. T., Boches, C. A., Trabert, M. L., & Logan, G. D. (1991). Automatizing
alphabet arithmetic: II. Are there practice effects after automaticity is achieved?
Journal of Experimental Psychology: Learning, Memory, and Cognition 17(2):196-
209.

Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal of the
Acoustical Society of America 67:971-995.

Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and
lexical access. Journal of Phonetics 7:279-312.

Kluender, K. R., & Lotto, A. J. (1994). Effects of first formant onset frequency on
[-voice] judgments result from general auditory processes not specific to humans.
Journal of the Acoustical Society of America 95(2):1044-1052.

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day English.
Providence: Brown University Press.

Ladefoged, P. (1993). A course in phonetics. Fort Worth: Harcourt Brace.

Lahiri, A., & Marslen-Wilson, W. (1991). The mental representation of lexical form: A
phonological approach to the recognition lexicon. Cognition 38:245-294.

Lamontagne, G. (1993). Syllabification and consonant cooccurrence conditions. Ph.D.
dissertation, University of Massachusetts, Amherst.

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The
discrimination of speech sounds within and across phonemic boundaries. Journal of
Experimental Psychology 54:358-368.

Lindsay, P. H., & Norman, D. A. (1977). Human information processing. New York:
Academic Press.

Logan, G. D. (1988a). Toward an instance theory of automatization. Psychological
Review 95(4):492-527.

Logan, G. D.( 1988b). Automaticity, resources, and memory: Theoretical controversies


and practical implications. Human Factors 30(5):583-598.

Logan, G. D., & Klapp, S. T . (1991). Automatizing alphabet arithmetic: I. Is extended


practice necessary to produce automaticity? Journal o f Experimental Psychology:
Learning, Memory, & Cognition 17(2): 179-195.

Luce, P. A. (1986). Neighborhoods o f words in the mental lexicon. Technical Report


#6, Speech Research Laboratory, Department of Psychology, Indiana University.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood
activation model. Ear and Hearing 19:1-36.

Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush & E.


Galanter (Eds.), Handbook o f Mathematical Psychology, Volume I (pp. 103-189).
New York: Wiley.

Macmillan, N. A., & Creelman, C. D. (1991). Signal detection theory: A user's guide.
Cambridge, UK: Cambridge University Press.

Maddieson, I. (1984). Patterns o f sounds. Cambridge, UK: Cambridge University


Press.

Mann, V. A. (1980). Influence of preceding liquid on stop consonant perception.


Perception and Psychophysics 28(5):407-412.

Mann, V. A. (1986). Distinguishing universal and language-dependent levels of speech perception: Evidence from Japanese listeners' perception of English "l" and "r". Cognition 24(3):169-196.

Mann, V. A., & Repp, B. (1981). Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America 69(2):548-558.

Marslen-Wilson, W. D. (1984). Function and process in spoken word recognition. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X: Control of language processes (pp. 125-149). Hillsdale, NJ: Erlbaum.

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology 10:29-63.

Martin, S. E. (1952). Morphophonemics of standard colloquial Japanese. Language 28(2, Part 2):1-113.

Massaro, D. W., & Cohen, M. (1983). Phonological context in speech perception. Perception and Psychophysics 34:338-348.

McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica 43:84-108.

McCarthy, J. J. (1991). The phonology of Semitic pharyngeals. MS, University of Massachusetts, Amherst.

McCarthy, J. J. (1998). Morpheme structure constraints and paradigm occultation. In M. C. Gruber, D. Higgins, K. Olson, & T. Wysocki (Eds.), Papers from the 32nd Regional Meeting of the Chicago Linguistic Society: The Panels (pp. 123-150). Chicago: Chicago Linguistic Society.

McCarthy, J. J., & Prince, A. (1995). Correspondence and reduplicative identity. In J. N. Beckman, L. Walsh Dickey, & S. Urbanczyk (Eds.), University of Massachusetts occasional papers in linguistics [UMOP] 18: Papers in Optimality Theory (pp. 249-384). Amherst: GLSA.

McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech perception: The TRACE model. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, Volume 2 (pp. 58-121). Cambridge, MA: MIT Press.

McQueen, J. M. (1991). The influence of the lexicon on phonetic categorization: Stimulus quality in word-final ambiguity. Journal of Experimental Psychology: Human Perception and Performance 17:433-443.

McQueen, J. M. (1996). Phonetic categorisation. Language and Cognitive Processes 11(6):655-664.

McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language 39(1):21-46.

McQueen, J. M., Norris, D., & Cutler, A. (1994). Competition in spoken word recognition: Spotting words in other words. Journal of Experimental Psychology: Learning, Memory, and Cognition 20:621-638.

McQueen, J. M., Norris, D., & Cutler, A. (1999). Lexical influence in phonetic decision making: Evidence from subcategorical mismatches. Journal of Experimental Psychology: Human Perception and Performance 25(5):1363-1389.

Mehler, J., Dommergues, J. Y., Frauenfelder, U., & Segui, J. (1981). The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior 20:298-305.

Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A. M., Jenkins, J. J., & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics 18:331-340.

Moreton, E., & Amano, S. (1999). Phonotactics in the perception of Japanese vowel length: Evidence for long-distance dependencies. Paper presented at Eurospeech 1999, Budapest.

Moreton, E., Amano, S., & Kondo, T. (1998). Statistical phonotactics of Japanese: Transitional probabilities within the word. Transactions of the Technical Committee on Psychological Acoustics, Acoustical Society of Japan, H-98-120.

Morton, J., & Long, J. (1976). Effect of word transitional probability on phoneme identification. Journal of Verbal Learning and Verbal Behavior 15:43-51.

Narayanan, S. S., Alwan, A. A., & Haker, K. (1997). Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part I. The laterals. Journal of the Acoustical Society of America 101(2):1064-1077.

Nazzi, T., Jusczyk, P. W., & Johnson, E. K. (2000). Language discrimination by English-learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language 43:1-19.

Nearey, T. M. (1990). The segment as a unit of speech perception. Journal of Phonetics 18:347-373.

Nearey, T. M., & Assmann, P. F. (1986). Modeling the role of inherent spectral change in vowel identification. Journal of the Acoustical Society of America 80:1297-1308.

Newman, R. S., Sawusch, J. R., & Luce, P. A. (1997). Lexical neighborhood effects in phonetic processing. Journal of Experimental Psychology: Human Perception and Performance 23:873-889.

Newmeyer, F. J. (1986). Linguistic theory in America. Orlando: Academic Press.

Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition 52:189-234.

Norris, D., McQueen, J. M., & Cutler, A. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology 34(3):191-243.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23(3):299-325.

Padgett, J. (1991). Stricture in feature geometry. Ph.D. dissertation, University of Massachusetts, Amherst.

Pertz, D. L., & Bever, T. G. (1975). Sensitivity to phonological universals in children and adolescents. Language 51(1):149-162.

Pitt, M. A. (1998). Phonological processes and the perception of phonotactically illegal consonant clusters. Perception and Psychophysics 60:941-951.

Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language 39:347-370.

Pitt, M. A., & Samuel, A. G. (1993). An empirical and meta-analytic evaluation of the phoneme identification task. Journal of Experimental Psychology: Human Perception and Performance 19(4):699-725.

Polivanov, E. (1931). La perception des sons d'une langue étrangère. Travaux du Cercle linguistique de Prague 4:79-86.

Prince, A., & Smolensky, P. (1993). Optimality Theory: Constraint interaction in generative grammar. MS, Rutgers University.

Repp, B. H. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin 92(1):81-110.

Rosenthall, S. (1997). Vowel/glide alternations in a theory of constraint interaction. New York: Garland.

Rubin, D. C. (1976). Frequency of occurrence as a psychophysical continuum. Perception and Psychophysics 20(5):327-330.

Rubin, P., Turvey, M. T., & van Gelder, P. (1976). Initial phonemes are detected faster in spoken words than in spoken nonwords. Perception and Psychophysics 19:394-398.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, Volume 1 (pp. 318-362). Cambridge, MA: MIT Press.

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language 35(4):606-621.

Saffran, J. R., Newport, E. L., Aslin, R. N., & Tunick, R. A. (1997). Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science 8(2):101-105.

Sagey, E. (1990). The representations of features in non-linear phonology: The articulator node hierarchy. New York: Garland.

Samuel, A. G. (1981a). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General 110:474-494.

Samuel, A. G. (1981b). The role of bottom-up confirmation in the phonemic restoration illusion. Journal of Experimental Psychology: Human Perception and Performance 7(5):1124-1131.

Samuel, A. G. (1987). Lexical uniqueness effects on phonemic restoration. Journal of Memory and Language 26:36-56.

Samuel, A. G. (1991). A further examination of attentional effects in the phonemic restoration illusion. Quarterly Journal of Experimental Psychology 43A(3):679-699.

Samuel, A. G. (1996). Phoneme restoration. Language and Cognitive Processes 11(6):647-653.

Sapir, E. (1933). La réalité psychologique des phonèmes. Journal de Psychologie Normale et Pathologique 30:247-265.

Sato, P. T. (1985). Denominal verbs with -r: A response to de Chene. Japanese Linguistics 10:149-169.

Scholes, R. J. (1966). Phonotactic grammaticality. The Hague: Mouton.

Segui, J., & Frauenfelder, U. (1986). The effect of lexical constraints upon speech recognition. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive capabilities (pp. 795-808). Amsterdam: Elsevier.

Selkirk, E. O. (1988). Dependency, place, and the notion "tier". MS, Department of Linguistics, University of Massachusetts, Amherst.

Shibatani, M. (1973). The role of surface phonetic constraints in generative phonology. Language 48(1):87-106.

Shibatani, M. (1990). The languages of Japan. Cambridge, UK: Cambridge University Press.

Smit, A. B. (1993). Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project: Word-initial consonant clusters. Journal of Speech and Hearing Research 36:931-947.

Smith, J. L. (1999). Noun faithfulness and accent in Fukuoka Japanese. In S. Bird, A. Carnie, J. D. Haugen, & P. Norquest (Eds.), Proceedings of the West Coast Conference on Formal Linguistics XVIII (pp. 519-531). Somerville, MA: Cascadilla Press.

Smith, R. C., & Dixon, T. P. (1971). Frequency and the judged familiarity of meaningful words. Journal of Experimental Psychology 88(2):279-281.

Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics 21:291-311.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge, MA: MIT Press.

Stockmal, V., Moates, D. R., & Bond, Z. S. (2000). Same talker, different language. Applied Psycholinguistics 21:383-393.

Suzuki, K., Maye, J., & Ohno, K. (2000). On the productivity of lexical stratification in Japanese. Paper presented at the annual meeting of the Linguistic Society of America, Chicago.

Tateishi, K. (1990). Phonology of Sino-Japanese morphemes. In G. Lamontagne & A. Taub (Eds.), University of Massachusetts occasional working papers in linguistics [UMOP] 13 (pp. 209-235). Amherst, MA: GLSA.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America 41:782-787.

Tesar, B., & Smolensky, P. (1993). The learnability of Optimality Theory: An algorithm and some basic complexity results. Technical Report CU-CS-678-93, Department of Computer Science, University of Colorado at Boulder.

Thorndike, E. L., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teachers College, Columbia University.

Treiman, R., Kessler, B., Knewasser, S., Tincoff, R., & Bowman, M. (1996 [2000]). English speakers' sensitivity to phonotactic patterns. In M. Broe & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition and the lexicon (pp. 269-283). Cambridge, UK: Cambridge University Press.

Treiman, R., & Danis, C. (1988). Syllabification of intervocalic consonants. Journal of Memory and Language 27:87-104.

Treiman, R., & Zukowski, A. (1990). Toward an understanding of English syllabification. Journal of Memory and Language 29:66-85.

Treiman, R., Gross, J., & Cwikiel-Glavin, A. (1992). The syllabification of /s/ clusters in English. Journal of Phonetics 20:383-402.

Trudgill, P. (1999). The dialects of England, 2nd edition. London: Blackwell.

Tyler, L. K., & Wessels, J. (1985). Is gating an on-line task? Evidence from naming latency data. Perception and Psychophysics 38(3):217-222.

U. S. Census Bureau (1990). Statistical abstract of the United States. Washington, D.C.: U.S. Department of Commerce.

Vance, T. J. (1987). An introduction to Japanese phonology. Albany: State University of New York Press.

Vance, T. J. (1991). A new experimental study of Japanese verb morphology. Journal of Japanese Linguistics 13:145-166.

Venkatagiri, H. S. (1999). Clinical measurement of rate of reading and discourse in young adults. Journal of Fluency Disorders 24(3):209-226.

Vitevitch, M. S., & Luce, P. A. (1998). When words compete: Levels of processing in spoken word recognition. Psychological Science 9:325-329.

Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language 40:374-408.

Vitevitch, M. S., Luce, P. A., Charles-Luce, J., & Kemmerer, D. (1997). Phonotactics and syllable stress: Implications for the processing of spoken nonsense words. Language and Speech 40(1):47-62.

Wall, L., Christiansen, T., & Schwartz, R. L. (1996). Programming Perl, 2nd ed. Cambridge, MA: O'Reilly.

Walsh Dickey, L. (1997). The phonology of liquids. Ph.D. dissertation, University of Massachusetts, Amherst.

Warren, P., & Marslen-Wilson, W. (1987). Continuous uptake of acoustic cues in spoken word recognition. Perception and Psychophysics 41(3):262-275.

Wooley, D. E. (1970). Feature redundancy in consonant clusters. Linguistics 64:70-93.

Wurm, L. H., & Samuel, A. G. (1997). Lexical inhibition and attentional allocation during speech perception: Evidence from phoneme monitoring. Journal of Memory and Language 36:165-187.

Xu, Y. (1991). Depth of phonological recoding in short-term memory. Memory and Cognition 19(3):263-273.

Yip, M. (1989). Feature geometry and co-occurrence restrictions. Phonology 6(2):349-374.

Yule, H., & Burnell, A. C. (1886 [1985]). Hobson-Jobson: A glossary of colloquial Anglo-Indian words and phrases, and of kindred terms, etymological, historical, geographical, and discursive, 2nd edition. London: Routledge and Kegan Paul.

Zimmer, K. E. (1969). Psychological correlates of some Turkish morpheme structure conditions. Language 45(2):309-321.
