
CHAPTER 9

READING AND SPEECH PERCEPTION

INTRODUCTION

Humanity excels in its command of language. Indeed, language is of such enormous importance that this chapter and the following two are devoted to it. In this chapter, we consider the basic processes involved in reading words and in recognising spoken words. It often does not matter whether a message is presented to our eyes or to our ears. For example, you would understand the sentence, "You have done exceptionally well in your cognitive psychology examination", in much the same way whether you read or heard it. Thus, many comprehension processes are very similar whether we are reading a text or listening to someone talking.

However, reading and speech perception differ in various ways. In reading, each word can be seen as a whole, whereas a spoken word is spread out in time and is transitory. More importantly, it is much harder to tell where one word ends and the next starts with speech than with text. Speech generally provides a more ambiguous signal than does printed text. For example, when words were spliced out of spoken sentences and presented on their own, they were recognised only half of the time (Lieberman, 1963).

There are other significant differences. The demands on memory are greater when listening to speech than reading a text, because the words already spoken are no longer accessible. So far we have indicated ways in which listening to speech is harder. However, there is one major way in which listening to speech can be easier than reading. Speech often contains prosodic cues (discussed in Chapter 11; see Glossary). Prosodic cues are hints to sentence structure and intended meaning via the speaker's pitch, intonation, stress, and timing (e.g., questions have a rising intonation on the last word in the sentence). In contrast, the main cues to sentence structure specific to text are punctuation marks (e.g., commas, semi-colons). These are often less informative than prosodic cues in speech.

The fact that reading and listening to speech differ considerably can be seen by considering children and brain-damaged patients. Young children often have good comprehension of spoken language, but struggle to read even simple stories. Part of the reason may be that reading is a relatively recent invention in our evolutionary history, and so lacks a genetically programmed specialised processor (McCandliss, Cohen, & Dehaene, 2003). Some adult brain-damaged patients can understand spoken language but cannot read, and others can read perfectly well but cannot understand the spoken word.

Basic processes specific to reading are dealt with first in this chapter. These processes are involved in recognising and reading individual words and in guiding our eye movements during reading. After that, we consider basic processes specific to speech, including those required to divide the speech signal into separate words and to recognise those words.

In Chapter 10, we discuss comprehension processes common to reading and listening. In

contrast to this chapter, the emphasis will be on larger units of language consisting of several sentences. Bear in mind, however, that the processes discussed in this chapter play an important role in our comprehension of texts or long speech utterances.

READING: INTRODUCTION

It is important to study reading because adults without effective reading skills are at a great disadvantage. Thus, we need to understand the processes involved in reading to help poor readers. In addition, reading requires several perceptual and other cognitive processes as well as a good knowledge of language and of grammar. Thus, reading can be regarded as visually guided thinking.

Research methods

Several methods are available for studying reading. These methods have been used extensively in research, and so it is important to understand what they involve as well as their limitations. For example, consider ways of assessing the time taken for word identification or recognition (e.g., deciding a word is familiar; accessing its meaning). The lexical decision task involves deciding rapidly whether a string of letters forms a word. The naming task involves saying a printed word out loud as rapidly as possible. These techniques ensure certain processing has been performed but possess clear limitations. Normal reading times are disrupted by the requirement to respond to the task, and it is hard to know precisely what processes are reflected in lexical decision or naming times.
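To make these tasks concrete, here is a minimal sketch of a lexical decision trial. It is purely illustrative: the stimulus list is invented, and a real experiment would use specialised presentation software with millisecond-accurate display timing rather than console input.

    # Minimal lexical decision loop (illustrative only; stimuli are invented).
    import time

    STIMULI = [("house", True), ("fass", False), ("table", True), ("plib", False)]

    def run_trial(letter_string):
        """Present a letter string and time the word/nonword decision."""
        start = time.perf_counter()
        answer = input(f"{letter_string.upper()} - word? (y/n): ").strip().lower()
        rt_ms = (time.perf_counter() - start) * 1000.0
        return answer == "y", rt_ms

    for string, is_word in STIMULI:
        said_word, rt_ms = run_trial(string)
        outcome = "correct" if said_word == is_word else "error"
        print(f"{string}: {outcome}, RT = {rt_ms:.0f} ms")

The same loop with a speech-onset detector in place of the keypress would implement the naming task.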
Recording eye movements during reading is useful. It provides an unobtrusive and detailed on-line record of attention-related processes. The only important restriction on readers whose eye movements are being recorded is that they must keep their heads fairly still. The main problem is the difficulty of deciding precisely what processing occurs during each fixation (period of time during which the eye remains still).

Balota, Paul, and Spieler (1999) argued that reading involves several kinds of processing: orthography (the spelling of words); phonology (the sound of words); semantics (word meaning); syntax; and higher-level discourse integration. The various tasks differ in the involvement of these kinds of processing:

    In naming, the attentional control system would increase the influence of the computations between orthography and phonology . . . the demands of lexical decision performance might place a high priority on the computations between orthographic and meaning level modules [processors] . . . if the goal . . . is reading comprehension, then attentional control would increase the priority of computations of the syntactic-, meaning-, and discourse-level modules (p. 47).

Thus, performance on naming and lexical decision tasks may not reflect accurately normal reading processes.

KEY TERMS
lexical decision task: a task in which individuals decide as rapidly as possible whether a letter string forms a word.
naming task: a task in which visually presented words are pronounced aloud as rapidly as possible.
orthography: information about the spellings of words.
phonology: information about the sounds of words and parts of words.
semantics: the meaning conveyed by words and sentences.
priming: influencing the processing of (and response to) a target by presenting a stimulus related to it in some way beforehand.

Next, there is priming, in which a prime word is presented very shortly before the target word. The prime word is related to the target word (e.g., in spelling, meaning, or sound). What is of interest is to see the effects of the prime on processing of (and response to) the target word. For example, when reading the

word "clip", do you access information about its pronunciation? We will see shortly that the most likely answer is, "Yes". If the word is preceded by a non-word having identical pronunciation ("klip") presented below the level of conscious awareness, it is processed faster (see Rastle & Brysbaert, 2006, for a review).

Finally, there is brain imaging. In recent years, there has been increasing interest in identifying the brain areas associated with various language processes. Some of the fruits of such research will be discussed in this chapter and the next two.

[Photograph caption: Reading is a complex skill. It involves processing information about word spellings, the sounds of words, and the meanings of words, as well as higher-level comprehension processes.]

Phonological processes in reading

You are currently reading this sentence. Did you access the relevant sounds when identifying the words in the previous sentence? The most common view (e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) is that phonological processing of visual words is relatively slow and inessential for word identification. This view (the weak phonological model) differs from the strong phonological model in which phonology has a much more central role:

    A phonological representation is a necessary product of processing printed words, even though the explicit pronunciation of their phonological structure is not required. Thus, the strong phonological model would predict that phonological processing will be mandatory [obligatory], perhaps automatic (Frost, 1998, p. 76).

Evidence

The assumption that phonological processing is important when identifying words was supported by van Orden (1987). Some of the words he used were homophones (words having one pronunciation but two spellings). Participants made many more errors when asked questions such as, "Is it a flower? ROWS", than when asked, "Is it a flower? ROBS". The problem with "ROWS" is that it is homophonic with "ROSE", which of course is a flower. The participants made errors because they engaged in phonological processing of the words.

We now move on to the notion of phonological neighbourhood. Two words are phonological neighbours if they differ in only one phoneme (e.g., "gate" has "bait" and "get" as neighbours). If phonology is used in visual word recognition, then words with many phonological neighbours should have an advantage. Yates (2005) found support for this assumption using various tasks (e.g., lexical decision; naming). Within sentences, words having many phonological neighbours are fixated for less time than those with few neighbours (Yates, Friend, & Ploetz, 2008).
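The neighbour relation is straightforward to compute. In the sketch below, the miniature pronunciation dictionary is an invented stand-in for a full machine-readable resource, and only same-length substitutions count as neighbours; applying the same logic to letters rather than phonemes yields the orthographic neighbours discussed later in the chapter.

    # Counting phonological neighbours: entries differing by exactly one phoneme.
    # The tiny dictionary is a made-up stand-in for a real pronunciation resource.
    PRONUNCIATIONS = {
        "gate": ["g", "ey", "t"],
        "bait": ["b", "ey", "t"],
        "get":  ["g", "eh", "t"],
        "goat": ["g", "ow", "t"],
        "day":  ["d", "ey"],
    }

    def differ_by_one_phoneme(p1, p2):
        """True if the pronunciations are the same length and mismatch once."""
        if len(p1) != len(p2):
            return False  # substitution-only definition, as in the text
        return sum(a != b for a, b in zip(p1, p2)) == 1

    def phonological_neighbours(word):
        target = PRONUNCIATIONS[word]
        return [other for other, pron in PRONUNCIATIONS.items()
                if other != word and differ_by_one_phoneme(target, pron)]

    print(phonological_neighbours("gate"))  # ['bait', 'get', 'goat']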
KEY TERM
homophones: words having the same pronunciations but that differ in the way they are spelled.

Many researchers have used masked phonological priming to assess the role of phonology in word processing (mentioned earlier). A word (e.g., "clip") is immediately preceded by a phonologically identical non-word prime (e.g., "klip"). This prime is masked and presented very briefly so it is not consciously perceived. Rastle and Brysbaert (2006) carried out a meta-analysis.

Words were processed faster on various tasks (e.g., lexical decision task; naming task) when preceded by such primes than by primes similar to them in terms of spelling but not phonology (e.g., "plip"). These findings strongly imply that phonological processing occurs rapidly and automatically, as predicted by the strong phonological model. However, findings with masked phonological priming do not prove that visual word recognition must depend on prior phonological processing.

In a study on proof-reading and eye movements, Jared, Levy, and Rayner (1999) found that the use of phonology depended on the nature of the words and participants' reading ability. Eye-movement data suggested that phonology was used in accessing the meaning of low-frequency words (those infrequently encountered) but not high-frequency ones. In addition, poor readers were more likely than good ones to access phonology.

Does phonological processing occur before or after a word's meaning has been accessed? In one study (Daneman, Reingold, & Davidson, 1995), readers fixated homophones longer when they were incorrect (e.g., "He was in his stocking feat") than when they were correct (e.g., "He was in his stocking feet"). That would not have happened if the phonological code had been accessed before word meaning. However, there were many backward eye movements (regressions) after incorrect homophones had been fixated. These findings suggest that the phonological code may be accessed after word meaning is accessed.

Reasonably convincing evidence that word meaning can be accessed without access to phonology was reported by Hanley and McDonnell (1997). They studied a patient, PS, who understood the meanings of words while reading even though he could not pronounce them accurately. PS did not even seem to have access to an internal phonological representation of words. He could not gain access to the other meaning of homophones when he saw one of the spellings (e.g., "air"). The fact that PS could give accurate definitions of printed words in spite of his impairments suggests strongly that he had full access to the meanings of words for which he could not supply the appropriate phonology.

One way of finding out when phonological processing occurs is to use event-related potentials (ERPs; see Glossary). When Ashby and Martin (2008) did this, they found that syllable information in visually presented words was processed 250–350 ms after word onset. This is rapidly enough to influence visual word recognition.

Evaluation

Phonological processing typically occurs rapidly and automatically during visual word recognition. Thus, the weak phonological model may have underestimated the importance of phonological processing. As Rastle and Brysbaert (2006) pointed out, the fact that we develop phonological representations years before we learn to read may help to explain why phonology is so important.

What are the limitations of the strong phonological model? There is as yet little compelling evidence that phonological information has to be used in visual word recognition. In several studies (e.g., Hanley & McDonnell, 1997; Jared et al., 1999), evidence of phonological processing was limited or absent. There is also phonological dyslexia (discussed in detail shortly). Phonological dyslexics have great difficulties with phonological processing but can nevertheless read familiar words. This is somewhat puzzling if phonological processing is essential for reading. Even when there is clear evidence of phonological processing, this processing may occur after accessing word meaning (Daneman et al., 1995).

In sum, the strong phonological model is probably too strong. However, phonological processing often plays an important role in visual word recognition even if word recognition can occur in its absence.

WORD RECOGNITION

College students typically read at about 300 words per minute, thus averaging only 200 ms to recognise each word. How long does word

recognition take? That is hard to say, in part because of imprecision about the meaning of "word recognition". The term can refer to deciding that a word is familiar, accessing a word's name, or accessing its meaning. We will see that various estimates of the time taken for word recognition have been produced.

Automatic processing

Rayner and Sereno (1994) argued that word recognition is generally fairly automatic. This makes intuitive sense given that most college students have read between 20 and 70 million words in their lifetimes. It has been argued that automatic processes are unavoidable and unavailable to consciousness (see Chapter 5). Evidence that word identification may be unavoidable in some circumstances comes from the Stroop effect (see Glossary), in which naming the colours in which words are printed is slowed when the words themselves are different colour names (e.g., the word RED printed in green). The Stroop effect suggests that word meaning can be extracted even when people try not to process it. Cheesman and Merikle (1984) found that the Stroop effect could be obtained even when the colour name was presented below the level of conscious awareness. This latter finding suggests that word recognition or identification does not necessarily depend on conscious awareness.

Letter and word processing

It could be argued that the recognition of a word on the printed page involves two successive stages:

(1) Identification of the individual letters in the word.
(2) Word identification.

In fact, however, the notion that letter identification must be complete before word identification can begin is wrong. For example, consider the word superiority effect (Reicher, 1969). A letter string is presented very briefly, followed by a pattern mask. Participants decide which of two letters was presented in a particular position (e.g., the third letter). The word superiority effect is defined by the finding that performance is better when the letter string forms a word than when it does not.

The word superiority effect suggests that information about the word presented can facilitate identification of the letters of that word. However, there is also a pseudoword superiority effect: letters are better recognised when presented in pseudowords (pronounceable nonwords such as "MAVE") than in unpronounceable nonwords (Carr, Davidson, & Hawkins, 1978).

KEY TERMS
word superiority effect: a target letter is more readily detected in a letter string when the string forms a word than when it does not.
pseudoword: a pronounceable nonword (e.g., "tave").

Interactive activation model

McClelland and Rumelhart (1981) proposed an influential interactive activation model of visual word processing to account for the word superiority effect. It was based on the assumption that bottom-up and top-down processes interact (see Figure 9.1):

• There are recognition units at three levels: the feature level at the bottom; the letter level in the middle; and the word level at the top.
• When a feature in a letter is detected (e.g., a vertical line at the right-hand side of a letter), activation goes to all letter units containing that feature (e.g., H, M, N), and inhibition goes to all other letter units.
• Letters are identified at the letter level. When a letter within a word is identified, activation is sent to the word level for all four-letter word units containing that letter in that position within the word, and inhibition is sent to all other word units.

• Words are recognised at the word level. Activated word units increase the level of activation in the letter-level units for the letters forming that word.

Figure 9.1 McClelland and Rumelhart's (1981) interactive activation model of visual word recognition, with excitatory and inhibitory connections linking the feature, letter, and word levels. Adapted from Ellis (1984).

According to the model, top-down processing is involved in the activation and inhibition processes going from the word level to the letter level. The word superiority effect occurs because of top-down influences of the word level on the letter level. Suppose the word SEAT is presented, and participants decide whether the third letter is an A or an N. If the word unit for SEAT is activated at the word level, this will increase activation of the letter A at the letter level and inhibit activation of the letter N, leading to stronger activation of SEAT.

How can the pseudoword superiority effect be explained? When letters are embedded in pronounceable nonwords, there will generally be some overlap of spelling patterns between the pseudoword and genuine words. This overlap can produce additional activation of the letters presented in the pseudoword and lead to the pseudoword superiority effect.
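The model's core loop can be conveyed in a short simulation. The sketch below is illustrative rather than a reimplementation: the word set and parameter values are invented, the feature level is omitted, and word-to-letter inhibition is left out. Even so, it reproduces the logic of the SEAT example: because SEAT is a word, feedback from the word level disambiguates a degraded third letter.

    # Minimal interactive activation loop for four-letter words. Word set,
    # parameters, and simplifications are invented for illustration; they are
    # not McClelland and Rumelhart's own values.
    WORDS = ["seat", "slat", "stem", "step"]

    def cycle(letter_act, word_act, exc=0.10, inh=0.04, decay=0.05):
        """Letters excite consistent words and inhibit the rest; active words
        then feed activation back down to their own letters."""
        new_word = {}
        for word in WORDS:
            net = sum(exc * act if ch == word[pos] else -inh * act
                      for pos in range(4)
                      for ch, act in letter_act[pos].items())
            new_word[word] = min(1.0, max(0.0, (1 - decay) * word_act[word] + net))
        new_letter = [dict(level) for level in letter_act]
        for word, act in new_word.items():
            for pos, ch in enumerate(word):
                boosted = new_letter[pos].get(ch, 0.0) + exc * act  # top-down support
                new_letter[pos][ch] = min(1.0, boosted)
        return new_letter, new_word

    # SEAT presented with a degraded third letter (A and N equally plausible).
    letters = [{"s": 1.0}, {"e": 1.0}, {"a": 0.5, "n": 0.5}, {"t": 1.0}]
    words = {w: 0.0 for w in WORDS}
    for _ in range(10):
        letters, words = cycle(letters, words)
    print(words)       # the SEAT unit dominates the word level
    print(letters[2])  # word-level feedback now favours A over N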
According to the model, the time to identify a word depends in part on its orthographic neighbours, the words that can be formed by changing just one of its letters. Thus, for example, the word "stem" has words including "seem", "step", and "stew" as orthographic neighbours. When a word is presented, these orthographic neighbours become activated and increase the time taken to identify it. Theoretically, this inhibitory effect is especially great when a word's orthographic neighbours are higher in frequency in the language than the word itself. This is because high-frequency words (words encountered frequently in our everyday lives) have greater resting activation levels than low-frequency ones. It has proved very difficult to find this predicted inhibitory effect of higher-frequency neighbours in studies using English words (e.g., Sears, Campbell, & Lupker, 2006). Interestingly, there is much stronger evidence for an inhibitory effect in other languages (e.g., French, Dutch, Spanish; see Sears et al., 2006, for a review). English has many more short words with several higher-frequency neighbours than these other languages. As a result, inhibitory effects in English might make it extremely difficult to identify many low-frequency words.

KEY TERM
orthographic neighbours: with reference to a given word, those other words that can be formed by changing one of its letters.

The model predicts that the word superiority effect should be greater for high-frequency words than for low-frequency ones. The reason is that high-frequency words have a higher resting level of activation and so should generate more top-down activation from the word level to the letter level. In fact, however, the size of the word superiority effect is unaffected by word frequency (Gunther, Gfoerer, & Weiss, 1984).

Evaluation

The interactive activation model has been very influential. It was one of the first examples of how a connectionist processing system (see Chapter 1) can be applied to visual word processing. It apparently accounts for phenomena such as the word superiority effect and the pseudoword superiority effect.

The model was not designed to provide a comprehensive account of word recognition. Accordingly, it is not surprising that it has little to say about various factors that play an important role in word recognition. For example, we have seen that phonological processing is often involved in word recognition, but this is not considered within the model. In addition, the model does not address the role of meaning. As we will see, the meaning of relevant context often influences the early stages of word recognition (e.g., Lucas, 1999; Penolazzi, Hauk, & Pulvermüller, 2007).

Context effects

Is word identification influenced by context? This issue was addressed by Meyer and Schvaneveldt (1971) in a study in which participants decided whether letter strings formed words (lexical decision task). The decision time for a word (e.g., DOCTOR) was shorter when the preceding context or prime was semantically related (e.g., NURSE) than when it was semantically unrelated (e.g., LIBRARY) or there was no prime. This is known as the semantic priming effect.

Why does the semantic priming effect occur? Perhaps the context or priming word automatically activates the stored representations of all words related to it due to massive previous learning. Another possibility is that controlled processes may be involved, with a prime such as NURSE leading participants to expect that a semantically related word will follow.

Neely (1977) distinguished between the above explanations. The priming word was a category name (e.g., "Bird"), followed by a letter string at one of three intervals: 250, 400, or 700 ms. In the key manipulation, participants expected a particular category name would usually be followed by a member of a different pre-specified category (e.g., "Bird" followed by the name of part of a building). There were two kinds of trial with this manipulation:

(1) The category name was followed by a member of a different (but expected) category (e.g., Bird–Window).
(2) The category name was followed by a member of the same (but unexpected) category (e.g., Bird–Magpie).

There were two priming or context effects (see Figure 9.2). First, there was a rapid, automatic effect based only on semantic relatedness. Second, there was a slower-acting attentional effect based only on expectations. Subsequent research has generally confirmed Neely's (1977) findings except that automatic processes can cause inhibitory effects at short intervals (e.g., Antos, 1979).

Figure 9.2 The time course of inhibitory and facilitatory effects of priming as a function of whether or not the target word was related semantically to the prime, and of whether or not the target word belonged to the expected category. Data from Neely (1977).

KEY TERM
semantic priming effect: the finding that word identification is facilitated when there is priming by a semantically related word.
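Neely's two-process account can be expressed as a toy model in which every number is invented and only the qualitative pattern matters: an automatic component tied to relatedness that operates from the shortest interval, and an expectancy component that produces facilitation or inhibition only at longer prime-to-target intervals.

    # Toy two-process priming model (all values invented; qualitative only).
    def priming_effect(soa_ms, related, expected):
        """Net facilitation (+) or inhibition (-) in ms for a prime-target pair."""
        automatic = 25 if related else 0            # fast spreading activation
        attentional = 0
        if soa_ms >= 400:                           # expectancy is slow to develop
            attentional = 30 if expected else -30
        return automatic + attentional

    for soa in (250, 400, 700):
        for related, expected in [(True, True), (False, True),
                                  (True, False), (False, False)]:
            label = (f"{'related' if related else 'unrelated'}/"
                     f"{'expected' if expected else 'unexpected'}")
            print(f"SOA {soa} ms, {label}: {priming_effect(soa, related, expected):+d} ms")

At 250 ms only relatedness matters; by 700 ms expectancy dominates, so an unexpected target is inhibited even when it is semantically related, roughly the pattern in Figure 9.2.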

Do context effects occur before or after the individual has gained access to the internal lexicon (a store containing several kinds of information about words)? In other words, do context effects precede or follow lexical access? Lucas (1999) addressed this issue in a meta-analysis. In most of the studies, each context sentence contained an ambiguous word (e.g., "The man spent the entire day fishing on the bank"). The ambiguous word was immediately followed by a target word on which a naming or lexical decision task was performed. The target word was appropriate (e.g., "river") or inappropriate (e.g., "money") to the meaning of the ambiguous word in the sentence context. Overall, the appropriate interpretation of a word produced more priming than the inappropriate one.

Further support for the notion that context can influence lexical access was reported by Penolazzi et al. (2007) using event-related potentials (ERPs). The target word (shown here in bold) was expected (when "around" was in the sentence) or not expected (when "near" was in the sentence): "He was just around/near the corner." There was a difference in the ERPs within 200 ms of the onset of the target word depending on whether the word was expected or unexpected. The finding that the meaning of the context affected the processing of the target word so rapidly suggests (but does not prove) that context affects lexical access to the target word.

We have seen that context has a rapid impact on processing. However, that does not mean that word meanings inconsistent with the context are always rejected very early on. Chen and Boland (2008) focused on the processing of homophones. They selected homophones having a dominant and a non-dominant meaning (e.g., "flower" is dominant and "flour" is non-dominant). Participants listened to sentences in some of which the context biased the interpretation towards the non-dominant meaning of the homophones. Here is an example:

    The baker had agreed to make several pies for a large event today, so he started by taking out necessary ingredients like milk, eggs, and flour.

At the onset of the homophone at the end of the sentence, participants were presented with four pictures. In the example given, one of the pictures showed flour and another picture showed an object resembling a flower. The participants showed a tendency to fixate the flower-like picture even though the context made it very clear that was not the homophone's intended meaning.

In sum, context often has a rapid influence on word processing. However, this influence is less than total. For example, word meanings that are inappropriate in a given context can be activated when listening to speech or reading (Chen & Boland, 2008).

KEY TERMS
lexicon: a store of detailed information about words, including orthographic, phonological, semantic, and syntactic knowledge.
lexical access: entering the lexicon with its store of detailed information about words.

READING ALOUD

Read out the following words and pseudowords (pronounceable nonwords):

CAT FOG COMB PINT MANTINESS FASS

Hopefully, you found it a simple task even though it involves hidden complexities. For example, how do you know the "b" in "comb" is silent and that "pint" does not rhyme with "hint"? Presumably you have specific information stored in long-term memory about how to pronounce these words. However, this cannot explain your ability to pronounce nonwords such as "mantiness" and "fass". Perhaps pseudowords are pronounced by analogy with real words (e.g., "fass" is pronounced to rhyme with "mass"). Another possibility is that rules governing the translation of letter strings into sounds are used to generate a pronunciation for nonwords.

The above description of the reading of individual words is oversimplified. Studies on brain-damaged patients suggest that there are different reading disorders depending on which parts of the language system are damaged. We turn now to two major theoretical approaches that have considered reading aloud in healthy and brain-damaged individuals. These are the dual-route cascaded model (Coltheart et al., 2001) and the distributed connectionist approach or triangle model (Plaut, McClelland, Seidenberg, & Patterson, 1996).

At the risk of oversimplification, we can identify various key differences between the two approaches as follows. According to the dual-route approach, the processes involved in reading words and nonwords differ from each other. These processes are relatively neat and tidy, and some of them are rule-based. According to the connectionist approach, in contrast, the various processes involved in reading are used more flexibly than assumed within the dual-route model. In crude terms, it is a matter of "all hands to the pump": all the relevant knowledge we possess about word sounds, word spellings, and word meanings is used in parallel whether we are reading words or nonwords.

Dual-route cascaded model

Coltheart and his colleagues have put forward various theories of reading, culminating in their dual-route cascaded model (2001; see Figure 9.3). This model accounts for reading

aloud and for silent reading. There are two main routes between the printed word and speech, both starting with orthographic analysis (used for identifying and grouping letters in printed words). The crucial distinction is between a lexical or dictionary lookup route and a non-lexical route (Route 1), which involves converting letters into sounds. In Figure 9.3, the non-lexical route is Route 1, and the lexical route is divided into two sub-routes (Routes 2 and 3).

Figure 9.3 Basic architecture of the dual-route cascaded model: orthographic analysis of print feeds both the grapheme–phoneme rule system (Route 1) and the orthographic input lexicon, which reaches the phonological output lexicon either via the semantic system (Route 2) or directly (Route 3), converging on the response buffer and speech. Adapted from Coltheart et al. (2001).

It is assumed that healthy individuals use both routes when reading aloud, and that these two routes are not independent in their functioning. However, naming visually presented words typically depends mostly on the lexical route rather than the non-lexical route, because the former route generally operates faster.

It is a cascade model because activation at one level is passed on to the next level before processing at the first level is complete. Cascaded models can be contrasted with thresholded models, in which activation at one level is only passed on to other levels after a given threshold of activation is reached.

Earlier we discussed theoretical approaches differing in the importance they attach to phonological processing in visual word identification. Coltheart et al. (2001) argued for a weak phonological model in which word identification generally does not depend on phonological processing.

Route 1 (grapheme–phoneme conversion)

Route 1 differs from the other routes in using grapheme–phoneme conversion, which involves converting spelling (graphemes) into sound (phonemes). A grapheme is a basic unit of written language and a phoneme is a basic unit of spoken language. According to Coltheart et al. (2001, p. 212), "By the term 'grapheme' we mean a letter or letter sequence that corresponds to a single phoneme, such as the i in pig, the ng in ping, and the igh in high." In their computational model, "For any grapheme, the phoneme assigned to it was the phoneme most commonly associated with that grapheme in the set of English monosyllables that contain that grapheme" (p. 216).
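The flavour of this rule system can be conveyed with a small interpreter. In the sketch below, the rule table is a tiny invented fragment (the model's actual rule set is far larger and partly context-sensitive) and the single-character phoneme codes are ad hoc; the longest-match scan, however, mirrors the grapheme parsing described in the quotation above.

    # Toy grapheme-phoneme conversion (rule table and phoneme codes invented).
    GPC_RULES = [("igh", "I"), ("ng", "N"), ("sh", "S"),
                 ("i", "i"), ("p", "p"), ("g", "g"), ("n", "n"),
                 ("t", "t"), ("a", "a"), ("s", "s"), ("f", "f"), ("h", "h")]

    def gpc_pronounce(letter_string):
        """Scan left to right, always taking the longest matching grapheme."""
        phonemes, i = [], 0
        while i < len(letter_string):
            for grapheme, phoneme in sorted(GPC_RULES, key=lambda r: -len(r[0])):
                if letter_string.startswith(grapheme, i):
                    phonemes.append(phoneme)
                    i += len(grapheme)
                    break
            else:
                raise ValueError(f"no rule for {letter_string[i:]}")
        return phonemes

    print(gpc_pronounce("high"))  # ['h', 'I'] -- "igh" treated as one grapheme
    print(gpc_pronounce("pint"))  # ['p', 'i', 'n', 't'] -- regularised to rhyme with "hint"
    print(gpc_pronounce("fass"))  # a nonword also receives a pronunciation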
If a brain-damaged patient used only Route 1, what would we find? The use of grapheme–phoneme conversion rules should permit accurate pronunciation of words having regular spelling–sound correspondences but not of irregular words not conforming to the conversion rules. For example, if an irregular word such as "pint" has grapheme–phoneme conversion rules applied to it, it should be pronounced to rhyme with "hint". This is known as regularisation. Finally, grapheme–phoneme conversion rules can provide pronunciations of nonwords.

KEY TERMS
cascade model: a model in which information passes from one level to the next before processing is complete at the first level.
surface dyslexia: a condition in which regular words can be read but there is impaired ability to read irregular words.

Patients adhering most closely to exclusive use of Route 1 are surface dyslexics. Surface dyslexia is a condition involving particular problems in reading irregular words. McCarthy and Warrington (1984) studied KT, who had surface dyslexia. He read 100% of nonwords accurately, and 81% of regular words, but was successful with only 41% of irregular words. Over 70% of the errors KT made with irregular words were due to regularisation.

If patients with surface dyslexia exclusively use Route 1, their reading performance should not depend on lexical variables (e.g., word frequency). That is not true of some surface dyslexics. Bub, Cancelliere, and Kertesz (1985) studied MP, who read 85% of irregular high-frequency words accurately but only 40% of low-frequency ones. Her ability to read many irregular words and her superior performance with high-frequency words indicate she could make some use of the lexical route.

According to the model, the main reason patients with surface dyslexia have problems

when reading irregular words is that they rely primarily on Route 1. If they can also make reasonable use of Route 3, then they might be able to read aloud correctly nearly all the words they know in the absence of any knowledge of the meanings of those words stored in the semantic system. Thus, there should not be an association between impaired semantic knowledge and the incidence of surface dyslexia. Woollams, Lambon Ralph, Plaut, and Patterson (2007) studied patients with semantic dementia (see Glossary). This is a condition in which brain damage impairs semantic knowledge (see Chapter 7), but typically has little effect on the orthographic or phonological systems. There was a strong association between impaired semantic knowledge and surface dyslexia among these patients. The implication is that damage to the semantic system is often a major factor in surface dyslexia.

Route 2 (lexicon + semantic knowledge) and Route 3 (lexicon only)

The basic idea behind Route 2 is that representations of thousands of familiar words are stored in an orthographic input lexicon. Visual presentation of a word leads to activation in the orthographic input lexicon. This is followed by obtaining its meaning from the semantic system, after which its sound pattern is generated by the phonological output lexicon. Route 3 also involves the orthographic input and phonological output lexicons, but it bypasses the semantic system.

How could we identify patients using Route 2 or Route 3 but not Route 1? Their intact orthographic input lexicon means they can pronounce familiar words whether regular or irregular. However, their inability to use grapheme–phoneme conversion should mean they find it very hard to pronounce unfamiliar words and nonwords.

Phonological dyslexics fit this predicted pattern fairly well. Phonological dyslexia involves particular problems with reading unfamiliar words and nonwords. The first case of phonological dyslexia reported systematically was RG (Beauvois & Dérouesné, 1979). RG successfully read 100% of real words but only 10% of nonwords. Funnell (1983) studied a patient, WB. His ability to use Route 1 was very limited because he could not produce the sound of any single letters or nonwords. He could read 85% of words, and seemed to do this by using Route 3. He had a poor ability to make semantic judgements about words, suggesting he was bypassing the semantic system when reading words.

According to the dual-route model, phonological dyslexics have specific problems with grapheme–phoneme conversion. However, Coltheart (1996) discussed 18 patients with phonological dyslexia, all of whom had general phonological impairments. Subsequent research has indicated that some phonological dyslexics have impairments as specific as assumed within the dual-route model. Caccappolo-van Vliet, Miozzo, and Stern (2004) studied two phonological dyslexics. IB was a 77-year-old woman who had worked as a secretary, and MO was a 48-year-old male accountant. Both patients showed the typical pattern associated with phonological dyslexia – their performance on reading regular and irregular words exceeded 90% compared to under 60% with nonwords. Crucially, the performance of IB and MO on various phonological tasks (e.g., deciding whether two words rhymed; finding a rhyming word) was intact (above 95%).

KEY TERMS
phonological dyslexia: a condition in which familiar words can be read but there is impaired ability to read unfamiliar words and nonwords.
deep dyslexia: a condition in which reading unfamiliar words is impaired and there are semantic reading errors (e.g., reading "missile" as "rocket").

Deep dyslexia

Deep dyslexia occurs as a result of damage to left-hemisphere brain areas involved in language. Deep dyslexics have particular problems in reading unfamiliar words, and an

inability to read nonwords. However, the most striking symptom is semantic reading errors (e.g., "ship" read as "boat"). Deep dyslexia may result from damage to the grapheme–phoneme conversion and semantic systems. Deep dyslexia resembles a more severe form of phonological dyslexia. Indeed, deep dyslexics showing some recovery of reading skills often become phonological dyslexics (Southwood & Chatterjee, 2001).

Sato, Patterson, Fushimi, Maxim, and Bryan (2008) studied a Japanese woman, YT. She had problems with the Japanese script kana (each symbol represents a syllable) and the Japanese script kanji (each symbol stands for a morpheme, which is the smallest unit of meaning). YT showed deep dyslexia for kanji but phonological dyslexia for kana. Sato et al. concluded that YT's impaired reading performance was due mainly to a general phonological deficit.

The notion that deep dyslexia and phonological dyslexia involve similar underlying mechanisms is an attractive one. Jefferies, Sage, and Lambon Ralph (2007) found that deep dyslexics performed poorly on various phonologically-based tasks (e.g., phoneme addition; phoneme subtraction). They concluded that deep dyslexics have a general phonological impairment, as do phonological dyslexics.

Computational modelling

Coltheart et al. (2001) produced a detailed computational model to test their dual-route cascaded model. They started with 7981 one-syllable words varying in length between one and eight letters. They used McClelland and Rumelhart's (1981) interactive activation model (discussed earlier) as the basis for the orthographic component of their model, and the output or response side of the model derives from the theories of Dell (1986) and Levelt et al. (1999) (see Chapter 11). The pronunciation most activated by processing in the lexical and non-lexical routes is the one determining the naming response.
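Before looking at the evidence, it may help to see the assumed division of labour in miniature. The sketch below is not Coltheart et al.'s implementation: the mini-lexicon, phoneme codes, and one-letter rule table are invented, and the cascaded interplay of the routes is reduced to a simple priority check. Disabling one route or the other reproduces the broad patterns of surface and phonological dyslexia described above.

    # Toy dual-route naming (all entries and codes invented; illustrative only).
    LEXICON = {"pint": "pYnt", "comb": "kOm", "cat": "kat"}  # stored word forms

    def rule_route(string):
        """Stand-in for grapheme-phoneme conversion (one letter, one sound)."""
        table = {"p": "p", "i": "i", "n": "n", "t": "t", "f": "f",
                 "a": "a", "s": "s", "c": "k", "o": "o", "m": "m", "b": "b"}
        return "".join(table[ch] for ch in string)

    def name_aloud(string, lexical_ok=True, nonlexical_ok=True):
        if lexical_ok and string in LEXICON:
            return LEXICON[string]      # familiar words, regular or irregular
        if nonlexical_ok:
            return rule_route(string)   # unfamiliar words and nonwords
        return None                     # no route available

    print(name_aloud("pint"))                       # 'pYnt' (correct irregular form)
    print(name_aloud("pint", lexical_ok=False))     # 'pint' -- regularised, like surface dyslexia
    print(name_aloud("fass"))                       # nonwords handled by the rules
    print(name_aloud("fass", nonlexical_ok=False))  # None -- like phonological dyslexia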
Evidence

Coltheart et al. (2001) presented their computational model with all 7981 words and found that 7898 (99%) were read accurately. When the model was presented with 7000 one-syllable nonwords, it read 98.9% of them correctly.

It follows from the model that we might expect different brain regions to be associated with each route. What has been done in several studies is to compare the brain activation when participants name irregular words and pseudowords (pronounceable nonwords). The assumption is that the lexical route is of primary importance with irregular words, whereas the non-lexical route is used with pseudowords. Seghier, Lee, Schofield, Ellis, and Price (2008) found that the left anterior occipito-temporal region was associated with reading irregular words. In contrast, the left posterior occipito-temporal region was associated with reading pseudowords. These findings are consistent with the notion of separate routes in reading.

Zevin and Balota (2000) argued that the extent to which we use the lexical and non-lexical routes when naming words depends on attentional control. Readers named low-frequency irregular words or pseudowords before naming a target word. They predicted that naming irregular words would cause readers to attend to lexical information, whereas naming pseudowords would lead them to attend to non-lexical information. As predicted, the relative roles of the lexical and non-lexical routes in reading the target word were affected by what had been read previously.

According to the model, regular words (those conforming to the grapheme–phoneme rules in Route 1) can often be named faster than irregular words. According to the distributed connectionist approach (Plaut et al., 1996; discussed shortly), what is important is consistency. Consistent words have letter patterns that are always pronounced the same in all words in which they appear and are assumed to be faster to name than inconsistent words. Irregular words tend to be inconsistent, and so we need to decide whether regularity or consistency is more important. Jared (2002) compared directly the effects of regularity and of consistency on word naming. Her findings were reasonably clear-cut: word naming times

were affected much more by consistency than by regularity (see Figure 9.4). This finding, which is contrary to the dual-route model, has been replicated in other studies (Harley, 2008).

Figure 9.4 Mean naming latencies for high-frequency (HF) and low-frequency (LF) words that were irregular (exception words: EXC) or regular and inconsistent (RI). Mean naming latencies of regular consistent words matched with each of these word types are also shown. The differences between consistent and inconsistent words were much greater than those between regular and irregular words (EXC compared to RI). Reprinted from Jared (2002), Copyright 2002, with permission from Elsevier.

Evaluation

The dual-route cascaded model represents an ambitious attempt to account for basic reading processes in brain-damaged and healthy individuals. Its explanation of reading disorders such as surface dyslexia and phonological dyslexia has been very influential. The model has also proved useful in accounting for the naming and lexical-decision performance of healthy individuals, and has received some support from studies in cognitive neuroscience (e.g., Seghier et al., 2008). Perry, Ziegler, and Zorzi (2007) developed a new connectionist dual process model (the CDP+ model) based in part on the dual-route cascaded model. This new model includes a lexical and a sublexical route, and eliminates some of the problems with the dual-route cascaded model (e.g., its inability to learn; its inability to account for consistency effects).

What are the model's limitations? First, the assumption that the time taken to pronounce a word depends on its regularity rather than its consistency is incorrect (e.g., Glushko, 1979; Jared, 2002). This is serious because the theoretical significance of word regularity follows directly from the central assumption that the non-lexical route uses a grapheme–phoneme rule system.

Second, as Perry et al. (2007, p. 276) pointed out, "A major shortcoming of DRC [dual-route cascaded model] is the absence of learning. DRC is fully hardwired, and the nonlexical route operates with a partially hard-coded set of grapheme–phoneme rules."

Third, the model assumes that only the non-lexical route is involved in pronouncing nonwords. As a consequence, similarities and differences between nonwords and genuine words are irrelevant. In fact, however, we will see shortly that this prediction is incorrect, because consistent nonwords are faster to pronounce than inconsistent ones (Zevin & Seidenberg, 2006).

Fourth, the model assumes that the phonological processing of visually presented words occurs fairly slowly and has relatively little effect on visual word recognition. In fact, however, such phonological processes generally occur rapidly and automatically (Rastle & Brysbaert, 2006).

Fifth, it is assumed that the semantic system can play an important role in reading aloud (i.e., via Route 2). In practice, however, "The semantic system of the model remains unimplemented" (Woollams et al., 2007, p. 317). The reason is that it is assumed within the model that individuals can read all the words they know without accessing the meanings of those words.

Sixth, as Coltheart et al. (2001, p. 236) admitted, "The Chinese, Japanese, and Korean writing systems are structurally so different from the English writing system that a model like the DRC [dual-route cascaded] model would simply not be applicable: for example, monosyllabic nonwords cannot even be written in the Chinese script or in Japanese kanji, so the distinction between a lexical and non-lexical route for reading cannot even arise."

Distributed connectionist approach

Within the dual-route model, it is assumed that pronouncing irregular words and nonwords involves different routes. This contrasts with the connectionist approach pioneered by Seidenberg and McClelland (1989) and developed most notably by Plaut et al. (1996). According to Plaut et al. (p. 58), their approach

    eschews [avoids] separate mechanisms for pronouncing nonwords and exception [irregular] words. Rather, all of the system's knowledge of spelling–sound correspondences is brought to bear in pronouncing all types of letter strings [words and nonwords]. Conflicts among possible alternative pronunciations of a letter string are resolved . . . by co-operative and competitive interactions based on how the letter string relates to all known words and their pronunciations.

Thus, Plaut et al. (1996) assumed that the pronunciation of words and nonwords is based on a highly interactive system.

This general approach is known as the distributed connectionist approach or the triangle model (see Figure 9.5). The three sides of the triangle are orthography (spelling), phonology (sound), and semantics (meaning). There are two routes from spelling to sound: (1) a direct pathway from orthography to phonology; and (2) an indirect pathway from orthography to phonology that proceeds via word meanings.

Figure 9.5 Seidenberg and McClelland's (1989) "triangle model" of word recognition, with context, meaning, orthography (e.g., MAKE), and phonology (e.g., /mAk/) as its components. Implemented pathways are shown in blue. Reproduced with permission from Harm and Seidenberg (2001).

Plaut et al. (1996) argued that words (and nonwords) vary in consistency (the extent to which their pronunciation agrees with those of similarly spelled words). Highly consistent words and nonwords can generally be pronounced faster and more accurately than inconsistent words and nonwords, because more of the available knowledge supports the correct pronunciation of such words. In contrast, the dual-route cascaded model divides words into two categories: words are regular (conforming to grapheme–phoneme rules) or irregular (not conforming to those rules). As we have seen, the evidence favours the notion of consistency over regularity (Jared, 2002).

Plaut et al. (1996) developed a successful simulation of reading performance. Their network learned to pronounce words accurately as connections developed between the visual forms of letters and combinations of letters (grapheme units) and their corresponding phonemes (phoneme units). The network learned via back-propagation, in which the actual outputs or responses of the system are compared against the correct ones (see Chapter 1). The network received prolonged training with 2998 words. At the end of training, the network's performance resembled that of adult readers in various ways (a sketch of such a training loop follows the list):

(1) Inconsistent words took longer to name than consistent ones.
(2) Rare words took longer to name than common ones.
(3) There was an interaction between word frequency and consistency, with the effects of consistency being much greater for rare words than for common ones.

(4) The network pronounced over 90% of nonwords "correctly", which is comparable to adult readers. This is impressive given that the network received no direct training on nonwords.
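The kind of learning involved can be illustrated on a drastically reduced scale. The network below maps orthography to phonology by back-propagation, as in the passage above, but everything else (the four-word corpus, the coding scheme, the layer sizes, and the learning rate) is invented; Plaut et al.'s network was far larger and was trained on 2998 real words.

    # Toy orthography-to-phonology network trained by back-propagation.
    import numpy as np

    rng = np.random.default_rng(0)
    words = ["save", "gave", "cave", "have"]    # "-ave" body: "have" is the exception
    targets = ["sAv", "gAv", "kAv", "hav"]      # ad hoc phoneme codes

    letters = sorted(set("".join(words)))
    phones = sorted(set("".join(targets)))

    def one_hot(s, alphabet):
        vec = np.zeros(len(s) * len(alphabet))
        for i, ch in enumerate(s):
            vec[i * len(alphabet) + alphabet.index(ch)] = 1.0
        return vec

    X = np.array([one_hot(w, letters) for w in words])   # grapheme units
    Y = np.array([one_hot(t, phones) for t in targets])  # phoneme units

    W1 = rng.normal(0.0, 0.5, (X.shape[1], 20))          # letters -> hidden
    W2 = rng.normal(0.0, 0.5, (20, Y.shape[1]))          # hidden -> phonemes
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(3000):
        hidden = sigmoid(X @ W1)
        output = sigmoid(hidden @ W2)
        d_out = (output - Y) * output * (1.0 - output)   # compare with correct output
        d_hid = (d_out @ W2.T) * hidden * (1.0 - hidden)
        W2 -= 0.5 * hidden.T @ d_out                     # propagate the error back
        W1 -= 0.5 * X.T @ d_hid

    print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))    # near-perfect after training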
What role does semantic knowledge of words play in Plaut et al.'s (1996) model? It is assumed that the route from orthography to phonology via meaning is typically slower than the direct route proceeding straight from orthography to phonology. Semantic knowledge is most likely to have an impact for inconsistent words – they take longer to name, and this provides more opportunity for semantic knowledge to have an effect.

Evidence

How does the distributed connectionist approach account for surface dyslexia, phonological dyslexia, and deep dyslexia? It is assumed that surface dyslexia (involving problems in reading irregular or inconsistent words) occurs mainly because of damage to the semantic system. We saw earlier that patients with semantic dementia (which involves extensive damage to the semantic system) generally exhibit the symptoms of surface dyslexia. Plaut et al. (1996) damaged their model to reduce or eliminate the contribution from semantics. The network's reading performance remained very good on regular high- and low-frequency words and on nonwords, worse on irregular high-frequency words, and worst on irregular low-frequency words. This matches the pattern found with surface dyslexics.

It is assumed that phonological dyslexia (involving problems in reading unfamiliar words and nonwords) is due to a general impairment of phonological processing. The evidence is mixed (see earlier discussion). On the one hand, Coltheart (1996) found many cases in which phonological dyslexia was associated with a general phonological impairment. On the other hand, Caccappolo-van Vliet et al. (2004) studied phonological dyslexics whose phonological processing was almost intact. Phonological dyslexics may also suffer from an orthographic impairment in addition to the phonological one. Howard and Best (1996) found that their patient, Melanie-Jane, was better at reading pseudohomophones whose spelling resembled the related word (e.g., "gerl") than those whose spellings did not (e.g., "phocks"). Finally, Nickels, Biedermann, Coltheart, Saunders, and Tree (2008) used a combination of computer modelling and data from phonological dyslexics. No single locus of impairment (e.g., the phonological system) could account for the various impairments found in patients.

What does the model say about deep dyslexia? Earlier we discussed evidence (e.g., Jefferies et al., 2007) suggesting that a general phonological impairment is of major importance in deep dyslexia. Support for this viewpoint was provided by Crisp and Lambon Ralph (2006). They studied patients with deep dyslexia or phonological dyslexia. There was no clear dividing line between the two conditions, with the two groups sharing many symptoms. Patients with both conditions had a severe phonological impairment, but patients with deep dyslexia were more likely than those with phonological dyslexia to have severe semantic impairments as well.

According to the model, semantic factors can be important in reading aloud, especially when the words (or nonwords) are irregular or inconsistent and so are more difficult to read. McKay, Davis, Savage, and Castles (2008) decided to test this prediction directly by training participants to read aloud nonwords (e.g., "bink"). Some of the nonwords had consistent (or expected) pronunciations whereas others had inconsistent pronunciations. The crucial manipulation was that participants learned the meanings of some of these nonwords but not of others.

The findings obtained by McKay et al. (2008) were entirely in line with the model. Reading aloud was faster for nonwords in the semantic condition (learning meanings) than in the non-semantic condition when the nonwords were inconsistent (see Figure 9.6). However, speed of reading aloud was the same in the semantic and non-semantic conditions when the nonwords were consistent.

Figure 9.6 Mean reading latencies in ms for consistent and inconsistent novel words (nonwords) that had been learned with meanings (semantic) or without meanings (nonsemantic). From McKay et al. (2008), Copyright © 2008 American Psychological Association. Reproduced with permission.

According to the triangle model, the time taken to pronounce nonwords should depend on whether they are consistent or not. For example, the word body "–ust" is very consistent because it is always pronounced in the same way in monosyllabic words, and so the nonword "nust" is consistent. In contrast, the word body "–ave" is inconsistent because it is pronounced in different ways in different words (e.g., "save" and "have"), and so the nonword "mave" is inconsistent. The prediction is that inconsistent nonwords will take longer to pronounce. According to the dual-route cascaded model, in contrast, nonwords are pronounced using non-lexical pronunciation rules and so there should be no difference between consistent and inconsistent nonwords.
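Word-body consistency is easy to compute. The sketch below scores a body against a mini-lexicon of invented spelling–sound pairs; real consistency measures use large corpora and typically weight each word by its frequency.

    # Scoring word-body consistency over a toy lexicon (all entries invented).
    LEXICON = {"must": "-ust", "dust": "-ust", "gust": "-ust",  # one sound for "-ust"
               "save": "-Av",  "gave": "-Av",  "have": "-av"}   # two sounds for "-ave"

    def body_consistency(body, lexicon):
        """Proportion of words ending in `body` that take its most common sound."""
        sounds = [s for word, s in lexicon.items() if word.endswith(body)]
        if not sounds:
            return None
        return max(sounds.count(s) for s in set(sounds)) / len(sounds)

    print(body_consistency("ust", LEXICON))  # 1.0 -> the nonword "nust" is consistent
    print(body_consistency("ave", LEXICON))  # ~0.67 -> the nonword "mave" is inconsistent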
The findings are clear-cut. Inconsistent nonwords take longer to pronounce than consistent ones (Glushko, 1979; Zevin & Seidenberg, 2006). Such findings provide support for the triangle model over the dual-route model. Zevin and Seidenberg (2006) obtained further support. According to the dual-route model, the pronunciation rules should generate only one pronunciation for each nonword. According to the triangle model, however, the pronunciations of inconsistent nonwords should be more variable than those of consistent ones, and that is what was found.

Evaluation

The distributed connectionist approach has several successes to its credit. First, the overarching assumption that the orthographic, semantic, and phonological systems are used in parallel in an interactive fashion during reading has received much support. Second, much progress has been made in understanding reading disorders by assuming that a general phonological impairment underlies phonological dyslexia, whereas a semantic impairment underlies surface dyslexia. Third, the assumption that the semantic system is often important in reading aloud appears correct (e.g., McKay et al., 2008). Fourth, the assumption that consistency is more important than word regularity (emphasised within the dual-route cascaded model) in determining the time taken to name words has received strong support. Fifth, the distributed connectionist approach is more successful than the dual-route model in accounting for consistency effects with nonwords and for individual differences in nonword naming (Zevin & Seidenberg, 2006). Sixth, the distributed connectionist approach includes an explicit mechanism to simulate how we learn to pronounce words, whereas the dual-route model has less to say about learning.

What are the triangle model's limitations? First, as Harley (2008) pointed out, connectionist models have tended to focus on the processes involved in reading relatively simple, single-syllable words.

Second, as Plaut et al. (1996, p. 108) admitted, "The nature of processing within the semantic pathway has been characterised in only the coarsest way." However, Harm and Seidenberg (2004) largely filled that gap within the triangle model by implementing its semantic component to map orthography and phonology onto semantics.

Third, the model's explanations of phonological dyslexia and surface dyslexia are

somewhat oversimplified. Phonological dyslexia is supposed to be due to a general phonological impairment, but some phonological dyslexics do not show that general impairment (e.g., Caccappolo-van Vliet et al., 2004; Tree & Kay, 2006). In similar fashion, surface dyslexia is supposed to be due to a general semantic impairment, but this is not always the case (Woollams et al., 2007).

Fourth, we saw earlier that the processes involved in naming words can be influenced by attentional control (Zevin & Balota, 2000). However, this is not a factor explicitly considered within the triangle model.

READING: EYE-MOVEMENT RESEARCH

Eye movements are of fundamental importance to reading. Most of the information that we process from a text at any given moment relates to the word that is currently being fixated, although some information may be processed from other words close to the fixation point.

Our eyes seem to move smoothly across the page while reading. In fact, they actually move in rapid jerks (saccades), as you can see if you look closely at someone else reading. Saccades are ballistic (once initiated, their direction cannot be changed). There are fairly frequent regressions in which the eyes move backwards in the text, accounting for about 10% of all saccades. Saccades take 20–30 ms to complete, and are separated by fixations lasting for 200–250 ms. The length of each saccade is approximately eight letters or spaces. Information is extracted from the text only during each fixation and not during the intervening saccades (Latour, 1962).

The amount of text from which useful information can be obtained in each fixation has been studied using the "moving window" technique (see Rayner & Sereno, 1994). Most of the text is mutilated except for an experimenter-defined area or window surrounding the reader's fixation point. Every time the reader moves his/her eyes, different parts of the text are mutilated to permit normal reading only within the window region. The effects of different-sized windows on reading performance can be compared.
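The logic of the manipulation can be sketched in a few lines. The function below masks every letter outside an asymmetric window around the fixated character; the 4-left/15-right default anticipates the perceptual span estimates reported next, and a real implementation would of course be gaze-contingent, updating the display with each eye movement.

    # Illustrative moving-window mask (window sizes follow the span estimates
    # discussed in the next paragraph; real displays are gaze-contingent).
    def moving_window(text, fixation, left=4, right=15):
        """Leave text readable only within the window; mask other letters,
        preserving spaces so word boundaries remain visible."""
        return "".join(ch if (fixation - left <= i <= fixation + right
                              or ch == " ") else "x"
                       for i, ch in enumerate(text))

    sentence = "Information is extracted from the text only during each fixation"
    print(moving_window(sentence, fixation=20))
    # letters near the fixated character remain; the rest are masked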
Caccappolo-van Vliet et al., 2004; Tree & Kay, The perceptual span (effective field of view)
2006). In similar fashion, surface dyslexia is is affected by the difficulty of the text and print
supposed to be due to a general semantic impair- size. It extends three or four letters to the left
ment, but this is not always the case (Woollams of fixation and up to 15 letters to the right.
et al., 2007). This asymmetry is clearly learned. Readers of
Fourth, we saw earlier that the processes Hebrew, which is read from right to left, show
involved in naming words can be influenced the opposite asymmetry (Pollatsek, Bolozky,
by attentional control (Zevin & Balota, 2000). Well, & Rayner, 1981). The size of the perceptual
However, this is not a factor explicitly con- span means that parafoveal information (from
sidered within the triangle model. the area surrounding the central or foveal
region of high visual acuity) is used in reading.
Convincing evidence comes from use of the
READING: EYE-MOVEMENT boundary technique, in which there is a preview
RESEARCH word just to the right of the point of fixation.
As the reader makes a saccade to this word,
Eye movements are of fundamental importance it changes into the target word, although the
to reading. Most of the information that we reader is unaware of the change. The fixation
process from a text at any given moment relates duration on the target word is less when that
to the word that is currently being fixated, word is the same as the preview word. The
although some information may be processed evidence using this technique suggests that visual
from other words close to the fixation point. and phonological information can be extracted
Our eyes seem to move smoothly across (see Reichle, Pollatsek, Fisher, & Rayner, 1998)
the page while reading. In fact, they actually from parafoveal processing.
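The logic of the moving-window technique is easy to see in code. Here is a minimal Python sketch of ours (the window sizes mirror the asymmetric perceptual span just described; the masking character and example sentence are arbitrary):

```python
def moving_window(text, fixation, left=3, right=14, mask="x"):
    """Mask every letter outside the window around the fixated character,
    leaving spaces intact so word boundaries stay visible."""
    return "".join(
        ch if (fixation - left <= i <= fixation + right or ch == " ") else mask
        for i, ch in enumerate(text)
    )

sentence = "Eye movements are of fundamental importance to reading"
# Fixating the first letter of "fundamental" (index 21):
print(moving_window(sentence, fixation=21))
# -> xxx xxxxxxxxx xxx of fundamental impxxxxxxx xx xxxxxxx
```

Varying left and right in displays like this until reading becomes normal is, in essence, how the span estimates above were obtained.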
KEY TERMS
saccades: fast eye movements that cannot be altered after being initiated.
perceptual span: the effective field of view in reading (letters to the left and right of fixation that can be processed).
spillover effect: any given word is fixated longer during reading when preceded by a rare word rather than a common one.

Readers typically fixate about 80% of content words (nouns, verbs, and adjectives), whereas they fixate only about 20% of function words (articles such as "a" and "the"; conjunctions such as "and", "but", and "or"; and pronouns such as "he", "she", and "they"). Words not fixated tend to be common, short, or predictable. Thus, words easy to process are most likely to be skipped. Finally, there is the spillover effect:
the fixation time on a word is longer when it is preceded by a rare word.

E-Z Reader model

Reichle et al. (1998), Reichle, Rayner, and Pollatsek (2003), and Pollatsek, Reichle, and Rayner (2006) have accounted for the pattern of eye movements in reading in various versions of their E-Z Reader model. The name is a spoof on the title of the movie Easy Rider. However, this is only clear if you know that Z is pronounced "zee" in American English!

How do we use our eyes when reading? The most obvious model assumes that we fixate on a word until we have processed it adequately, after which we immediately fixate the next word until it has been adequately processed. Alas, there are two major problems with such a model. First, it takes 85–200 ms to execute an eye-movement programme. If readers operated according to the simple model described above, they would waste time waiting for their eyes to move to the next word. Second, as we have seen, readers sometimes skip words. It is hard to see how this could happen within the model, because readers would not know anything about the next word until they had fixated it. How, then, could they decide which words to skip?

The E-Z Reader model provides an elegant solution to the above problems. A crucial assumption is that the next eye movement is programmed after only part of the processing of the currently fixated word has occurred. This assumption greatly reduces the time between completion of processing on the current word and movement of the eyes to the next word. There is typically less spare time available with rare words than common ones, and that accounts for the spillover effect described above. If the processing of the next word is completed rapidly enough (e.g., it is highly predictable in the sentence context), it is skipped.

According to the model, readers can attend to two words (the currently fixated one and the next word) during a single fixation. However, it is a serial processing model, meaning that at any given moment only one word is processed. This can be contrasted with parallel processing models such as the SWIFT (Saccade-generation With Inhibition by Foveal Targets) model put forward by Engbert, Longtin, and Kliegl (2002) and Engbert, Nuthmann, Richter, and Kliegl (2005). It is assumed within the SWIFT model that the durations of eye fixations in reading are influenced by the previous and the next word as well as the one currently fixated. As Kliegl (2007) pointed out, the typical perceptual span of about 18 letters is large enough to accommodate all three words (prior, current, and next) provided they are of average length. We will discuss evidence comparing serial and parallel models later.

Here are the major assumptions of the E-Z Reader model:

(1) Readers check the familiarity of the word currently fixated.
(2) Completion of frequency checking of a word (the first stage of lexical access) is the signal to initiate an eye-movement programme.
(3) Readers then engage in the second stage of lexical access (see Glossary), which involves accessing the current word's semantic and phonological forms. This
stage takes longer than the first one.
(4) Completion of the second stage is the signal for a shift of covert (internal) attention to the next word.
(5) Frequency checking and lexical access are completed faster for common words than rare ones (more so for lexical access).
(6) Frequency checking and lexical access are completed faster for predictable than for unpredictable words.

The above theoretical assumptions lead to various predictions (see Figure 9.7). Assumptions (2) and (5) together predict that the time spent fixating common words will be less than that for rare words: this has been found repeatedly. According to the model, readers spend the time between completion of lexical access to one word and the next eye movement in parafoveal processing of the next word. There is less parafoveal processing when the fixated word is rare (see Figure 9.7). Thus, the word following a rare word needs to be fixated longer than the word following a common word (the spillover effect described earlier).

Why are common, predictable, or short words most likely to be skipped or not fixated? A word is skipped when its lexical access has been completed while the current word is being fixated. This is most likely to happen with common, predictable, or short words because lexical access is fastest for these words (assumptions 5 and 6).
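The serial machinery described by assumptions (1)–(6) can be sketched as a simple pipeline. The Python fragment below is our own illustration (the millisecond constants and the linear effects of "rarity" and "predictability" are invented for demonstration, not the model's fitted parameter values):

```python
SACCADE_PROGRAM_MS = 125   # within the 85-200 ms range quoted earlier

def familiarity_check_ms(word):
    # Assumptions (5) and (6): faster for common and predictable words.
    return 80 + 60 * word["rarity"] - 30 * word["predictability"]

def lexical_access_ms(word):
    # The second stage takes longer, and frequency matters even more.
    return 120 + 120 * word["rarity"] - 60 * word["predictability"]

def read(words):
    """Serial E-Z-Reader-style pipeline. Finishing the familiarity check
    launches the saccade programme (assumption 2); finishing lexical access
    shifts covert attention to the next word (assumption 4), which then
    receives parafoveal preview until the eyes move. If the next word's
    lexical access fits into that preview time, the word is skipped."""
    log = []
    for i, word in enumerate(words):
        fixation_ms = familiarity_check_ms(word) + SACCADE_PROGRAM_MS
        preview_ms = max(0.0, fixation_ms - lexical_access_ms(word))
        nxt = words[i + 1] if i + 1 < len(words) else None
        skip_next = nxt is not None and lexical_access_ms(nxt) <= preview_ms
        # For brevity the loop still visits skipped words; a fuller
        # simulation would jump over them.
        log.append((word["text"], round(fixation_ms), skip_next))
    return log

words = [{"text": "the",    "rarity": 0.0, "predictability": 0.9},
         {"text": "man",    "rarity": 0.1, "predictability": 0.8},
         {"text": "yawned", "rarity": 0.7, "predictability": 0.2}]
print(read(words))   # rare words get longer fixations; "man" is skipped
```

The spillover effect falls out naturally from this arrangement: a rare fixated word leaves little preview time, so the following word needs a longer fixation of its own.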
Figure 9.7 The effects of word frequency on eye movements according to the E-Z Reader model: the time between successive eye movements (in ms) as a function of word frequency, marking completion of the familiarity check, completion of lexical access (the start of processing of the next word), and execution of the eye movement. Adapted from Reichle et al. (1998).

Evidence

Reichle et al. (2003) compared 11 models of reading in terms of whether each one could account for each of eight phenomena (e.g., frequency effects; spillover effects; costs of skipping). E-Z Reader accounted for all eight phenomena, whereas eight of the other models accounted for no more than two.

One of the model's main assumptions is that information about word frequency is accessed rapidly during word processing. There is support for that assumption. For example, Sereno, Rayner, and Posner (1998) observed effects of word frequency on event-related potentials (ERPs; see Glossary) within 150 ms.

The model was designed to account for the eye fixations of native English speakers reading English texts. However, English is unusual in some respects (e.g., word order is very important), and it is possible that the reading strategies used by readers of English are not universal. This issue was addressed by Rayner, Li, and Pollatsek (2007), who studied eye movements in Chinese readers reading Chinese text. Chinese differs from English in that it is written without spaces between successive characters and consists of words mostly made up of two characters. However, the pattern of eye movements was similar to that previously found for readers of English.

According to the model, word frequency and word predictability are independent factors determining how long we fixate on a word during reading. However, McDonald and
Shillcock (2003) found that common words were more predictable than rare ones on the basis of the preceding word. When the effects of word frequency and word predictability were disentangled, the effects of word frequency on word fixation time disappeared.

It is assumed within the model that fixations should be shortest when they are on the centre of words rather than towards either end. The reason is that word identification should be easiest when that happens. In fact, fixations tend to be much longer when they are at the centre of words than towards one end (Vitu, McConkie, Kerr, & O'Regan, 2001). Why is this? Some fixations at the end of a word are short because readers decide to make a second fixation closer to the middle of the word to facilitate its identification.

We turn finally to the controversial assumption that words are processed serially (one at a time), which is opposed by advocates of parallel processing models such as SWIFT (Engbert et al., 2002, 2005). We will focus on parafoveal-on-foveal effects – it sounds complicated but simply means that characteristics of the next word influence the fixation duration on the current word. If such effects exist, they suggest that the current and the next word are both processed at the same time. In other words, these effects suggest the existence of parallel processing, which is predicted by the SWIFT model but not by the E-Z Reader model.

The findings are mixed (see Rayner et al. (2007) for a review). However, Kennedy, Pynte, and Ducrot (2002) obtained convincing evidence of parafoveal-on-foveal effects in a methodologically sound study. White (2008) varied the orthographic familiarity and word frequency of the next word. There were no parafoveal-on-foveal effects when word frequency was manipulated and only a very small effect (6 ms) when orthographic familiarity was manipulated. These findings suggest there may be a limited amount of parallel processing involving low-level features (i.e., letters) of the next word, but not lexical features (i.e., word frequency).

KEY TERM
parafoveal-on-foveal effects: the finding that fixation duration on the current word is influenced by characteristics of the next word.

According to the E-Z Reader model, readers fixate and process words in the "correct" order (although occasional words may be skipped). If readers deviate from the "correct" order, it would be expected that they would struggle to make sense of what they are reading. In contrast, a parallel processing model such as SWIFT does not assume that words have to be read in the correct order or that deviation from that order necessarily creates any problems. Kennedy and Pynte (2008) found that readers only rarely read texts in a totally orderly fashion. In addition, there was practically no evidence that a failure to read the words in a text in the correct order caused any significant disruption to processing.

Evaluation

The model has proved very successful. It specifies many of the major factors determining eye movements in reading, and has performed well against rival models. At a very general level, the model has identified close connections between eye fixations and cognitive processes during reading. In addition, the model has identified various factors (e.g., word frequency; word predictability) influencing fixation times.

What are the limitations of the model? First, its emphasis is very much on the early processes involved in reading (e.g., lexical access). As a result, the model has little to say about higher-level processes (e.g., integration of information across the words within a sentence) that are important in reading. Reichle et al. (2003) defended their neglect of higher-level processes as follows: "We posit [assume] that higher-order processes intervene in eye-movement control only when 'something is wrong' and either send a message to stop moving forward or a signal to execute a regression."

Second, doubts have been raised concerning the model's assumptions that attention is allocated in a serial fashion to only one word at
a time and that words are processed in the "correct" order. The existence of parafoveal-on-foveal effects (e.g., Kennedy et al., 2002; White, 2008) suggests that parallel processing can occur, but the effects are generally small. The finding that most readers fail to process the words in a text strictly in the "correct" order is inconsistent with the model.

Third, the emphasis of the model is perhaps too much on explaining eye-movement data rather than other findings on reading. As Sereno, Brewer, and O'Donnell (2003, p. 331) pointed out, "The danger is that in setting out to establish a model of eye-movement control, the result may be a model of eye-movement experiments." What is needed is to integrate the findings from eye-movement studies more closely with general theories of reading.

Fourth, the model attaches great importance to word frequency as a determinant of the length of eye fixations. However, word frequency generally correlates with word predictability, and some evidence (e.g., McDonald & Shillcock, 2003) suggests that word predictability may be more important than word frequency.

LISTENING TO SPEECH

Figure 9.8 The main processes involved in speech perception and comprehension (from the bottom up: auditory input; decoding, which involves selecting speech from the acoustic background and transforming it into an abstract representation; segmentation; word recognition, involving activation of, and competition among, lexical candidates and retrieval of lexical information; utterance interpretation via syntactic analysis and thematic processing; and integration into a discourse model). From Cutler and Clifton (1999) by permission of Oxford University Press.

Understanding speech is much less straightforward than one might imagine. Some idea of the processes involved in listening to speech is provided in Figure 9.8. The first stage involves decoding the auditory signal. As Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967) pointed out, speech can be regarded as a code, and we as listeners possess the key to understanding it. However, before starting to do that, we often need to select out the speech signal from other completely irrelevant auditory input
(e.g., traffic noise). Decoding itself involves extracting discrete elements from the speech signal. Cutler and Clifton (1999, p. 126) provide a good account of what is involved: "Linguists describe speech as a series of phonetic segments; a phonetic segment (phoneme) is simply the smallest unit in terms of which spoken language can be sequentially described. Thus, the word key consists of two segments /ki/, and sea of the two segments /si/; they differ in the first phoneme."

It is generally assumed that the second stage of speech perception involves identifying the syllables contained in the speech signal. However, there is some controversy as to whether the phoneme or the syllable is the basic unit (or building block) in speech perception. Goldinger and Azuma (2003) argued that there is no basic unit of speech perception. Instead, the perceptual unit varies flexibly depending on the precise circumstances. They presented listeners with lists of two-syllable nonwords and asked them to decide whether each nonword contained a target. The target was a phoneme or a syllable. The volunteers who recorded the lists of nonwords were told that phonemes are the basic units of speech perception or that syllables are the basic units. These instructions influenced how they read the nonwords, and this in turn affected the listeners' performance. Listeners detected phoneme targets faster than syllable targets when the speaker believed phonemes are the fundamental units in speech perception. In contrast, they detected syllable targets faster than phoneme targets when the speaker believed syllables are the basic perceptual units. Thus, either phonemes or syllables can form the perceptual units in speech perception.

The third stage of speech perception (word identification) is of particular importance. Some of the main problems in word identification are discussed shortly. However, we will mention one problem here. Most people know tens of thousands of words, but these words (in English at least) are constructed out of only about 35 phonemes. The obvious consequence is that the great majority of spoken words resemble many other words at the phonemic level, and so are hard for listeners to distinguish.

The fourth and fifth stages both emphasise speech comprehension. The focus in the fourth stage is on interpretation of the utterance. This involves constructing a coherent meaning for each sentence on the basis of information about individual words and their order in the sentence. Finally, in the fifth stage, the focus is on integrating the meaning of the current sentence with preceding speech to construct an overall model of the speaker's message.

Speech signal

Useful information about the speech signal has been obtained from the spectrograph. Sound enters this instrument through a microphone, and is then converted into an electrical signal. This signal is fed to a bank of filters selecting narrow-frequency bands. Finally, the spectrograph produces a visible record of the component frequencies of speech over time; this is known as a spectrogram (see Figure 9.9). This provides information about formants, which are frequency bands emphasised by the vocal apparatus when saying a phoneme. Vowels have three formants numbered first, second, and third, starting with the formant of lowest frequency. The sound frequency of vowels is generally lower than that of consonants.

KEY TERMS
phonemes: basic speech sounds conveying meaning.
spectrograph: an instrument used to produce visible records of the sound frequencies in speech.
formants: peaks in the frequencies of speech sounds; revealed by a spectrograph.

Spectrograms may seem to provide an accurate picture of those aspects of the sound wave having the greatest influence on the human auditory system. However, this is not necessarily so. For example, formants look important in a spectrogram, but this does not prove they are of value in human speech perception. Evidence that the spectrogram is of value has been provided by using a pattern
playback or vocoder, which allows the spectrogram to be played back (i.e., reconverted into speech). Liberman, Delattre, and Cooper (1952) constructed "artificial" vowels on the spectrogram based only on the first two formants of each vowel. These vowels were easily identified when played through the vocoder, suggesting that formant information is used to recognise vowels.

Figure 9.9 Spectrogram of the sentence "Joe took father's shoe bench out". From Language Processes by Vivian C. Tartter (1986, p. 210). Reproduced with permission of the author.
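Computationally, nothing about a spectrogram is mysterious. The short Python sketch below is our own illustration (the sampling rate, the synthetic three-component "vowel", and the formant-like frequencies are assumptions chosen only roughly to resemble a vowel); it uses scipy to do what the spectrograph's filter bank does:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                                # sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)              # half a second of synthetic "speech"
# Crude stand-in for a steady vowel: three formant-like components.
wave = (np.sin(2 * np.pi * 710 * t)            # ~ first formant
        + 0.6 * np.sin(2 * np.pi * 1100 * t)   # ~ second formant
        + 0.3 * np.sin(2 * np.pi * 2450 * t))  # ~ third formant

# Like the spectrograph's bank of narrow-frequency filters:
freqs, times, power = spectrogram(wave, fs=fs, nperseg=512)

# The formant-like peaks emerge as the strongest frequency bands.
strongest = freqs[np.argsort(power.mean(axis=1))[-3:]]
print(sorted(strongest))   # ~ [719, 1094, 2438] Hz, given 31.25 Hz bins
```

Plotting power against freqs and times gives the familiar picture in Figure 9.9, with formants visible as dark horizontal bands.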
Problems faced by listeners

Listeners are confronted by several problems when understanding speech:

(1) Language is spoken at about ten phonemes (basic speech sounds) per second, and so requires rapid processing. Amazingly, we can understand speech artificially speeded up to 50–60 sounds or phonemes per second (Werker & Tees, 1992).
(2) There is the segmentation problem, which is the difficulty of separating out or distinguishing words from the pattern of speech sounds. This problem arises because speech typically consists of a continuously changing pattern of sound with few periods of silence. This can make it hard to know when one word ends and the next word begins. Ways in which listeners cope with the segmentation problem are discussed shortly.

KEY TERMS
segmentation problem: the listener's problem of dividing the almost continuous sounds of speech into separate phonemes and words.
co-articulation: the finding that the production of a phoneme is influenced by the production of the previous sound and preparations for the next sound; it provides a useful cue to listeners.
(3) In normal speech, there is co-articulation, which is "the overlapping of adjacent articulations" (Ladefoged, 2001, p. 272). More specifically, the way a phoneme is produced depends on the phonemes preceding and following it. The existence of co-articulation means that the pronunciation of any given phoneme is not invariant, which can create problems for the listener. However, co-articulation means that listeners hearing one phoneme are provided with some information about the surrounding phonemes. For example, "The /b/ phonemes in 'bill', 'bull', and 'bell' are all slightly different acoustically, and tell us about what is coming next" (Harley, 2008, p. 259).
(4) There are significant individual differences from one speaker to the next. For example, speakers vary considerably in their rate of speaking. Sussman, Hoemeke, and Ahmed (1993) asked various speakers to say the same short words starting with a consonant. There were clear differences across speakers in their spectrograms. Wong, Nusbaum, and Small (2004) studied brain activation when listeners were exposed to several speakers or to only one. When exposed to several speakers at different times, listeners had increased attentional processing in the major speech areas (e.g., posterior superior temporal cortex) and in areas associated with attentional shifts (e.g., superior parietal cortex). Thus, listeners respond to the challenge of hearing several different voices by using active attentional and other processes.
(5) Mattys and Liss (2008) pointed out that listeners in everyday life have to contend with degraded speech. For example, there are often other people talking at the same time and/or there are distracting sounds (e.g., noise of traffic or aircraft). It is of some concern that listeners in the laboratory are rarely confronted by these problems in research on speech perception. This led Mattys and Liss (p. 1235) to argue that "laboratory-generated phenomena reflect what the speech perception system can do with highly constrained input."

Listeners often have to contend with degraded speech; for example: interference from a crackly phone line; street noise; or other people nearby talking at the same time. How do we cope with these problems?

We have identified several problems that listeners face when trying to make sense of spoken language. Below we consider some of the main ways in which listeners cope with these problems.

Lip-reading: McGurk effect

Listeners (even those with normal hearing) often make extensive use of lip-reading to provide them with additional information. McGurk and MacDonald (1976) provided a striking demonstration. They prepared a videotape of someone saying "ba" repeatedly. The sound channel then changed so there was a voice saying "ga" repeatedly in synchronisation with lip movements still indicating "ba". Listeners reported hearing "da", a blending of the visual and auditory information. Green, Kuhl, Meltzoff, and Stevens (1991) showed that the so-called McGurk effect is surprisingly robust – they found it even with a female face and a male voice.

It is generally assumed that the McGurk effect depends primarily on bottom-up processes triggered directly by the discrepant visual and auditory signals. If so, the McGurk effect should not be influenced by top-down processes based on listeners' expectations. However, expectations are important. More listeners produced the McGurk effect when the crucial word (based on blending the discrepant visual and auditory cues) was presented in a semantically congruent than a semantically incongruent sentence (Windmann, 2004). Thus, top-down processes play an important role.

Addressing the segmentation problem

Listeners have to divide the speech they hear into its constituent words (i.e., segmentation)
and decide what words are being presented. There has been controversy as to whether segmentation precedes and assists word recognition or whether it is the product of word recognition. We will return to that controversy shortly. Before doing so, we will consider various non-lexical cues used by listeners to facilitate segmentation. First, certain sequences of speech sounds (e.g., <m,r> in English) are never found together within a syllable, and such sequences suggest a likely boundary between words (Dumay, Frauenfelder, & Content, 2002).

Second, Norris, McQueen, Cutler, and Butterfield (1997) argued that segmentation is influenced by the possible-word constraint (e.g., a stretch of speech lacking a vowel is not a possible word). For example, listeners found it hard to identify the word "apple" in "fapple" because the /f/ could not possibly be an English word. In contrast, listeners found it relatively easy to detect the word "apple" in "vuffapple", because "vuff" could conceivably be an English word.

Third, there is stress. In English, the initial syllable of most content words (e.g., nouns, verbs) is typically stressed. When listeners heard strings of words without the stress on the first syllable (e.g., "conduct ascents uphill") presented faintly, they often misheard them (Cutler & Butterfield, 1992). For example, "conduct ascents uphill" was often misperceived as the meaningless "A duck descends some pill."

Fourth, the extent of co-articulation provides a useful cue to word boundaries. As mentioned above, co-articulation can help the listener to anticipate the kind of phoneme that will occur next. Perhaps more importantly, there is generally more co-articulation within words than between them (Byrd & Saltzman, 1998).

Figure 9.10 A hierarchical approach to speech segmentation involving three levels or tiers: lexical cues (sentential context and lexical knowledge) at Tier 1, sub-lexical segmental cues (phonotactics and acoustic-phonetics, including coarticulation and allophony) at Tier 2, and metrical prosody (word stress) at Tier 3. The relative importance of the different types of cue is indicated by the width of the purple triangle. From Mattys et al. (2005). Copyright © 2005 American Psychological Association.

KEY TERM
allophony: an allophone is one of two or more similar sounds belonging to the same phoneme.

Mattys, White, and Melhorn (2005) argued persuasively that we need to go beyond simply describing the effects of individual cues on word segmentation. They put forward a hierarchical approach, according to which there are three main categories of cue: lexical (e.g., syntax, word knowledge); segmental (e.g., coarticulation); and metrical prosody (e.g., word stress) (see Figure 9.10). We prefer to use lexical cues (Tier 1) when all cues are available. When lexical information is lacking or is impoverished, we make use of segmental cues such as co-articulation and allophony (one phoneme may be associated
with two or more similar sounds or allophones) (Tier 2). Harley (2008) gives this example: the phoneme /p/ can be pronounced differently as in "pit" and "spit". Finally, if it is difficult to use Tier 1 or Tier 2 cues, we resort to metrical prosody cues (e.g., stress) (Tier 3).

Why do we generally prefer not to use stress cues? As Mattys et al. (2005) pointed out, stress information is misleading for words in which the initial syllable is not stressed (cf., Cutler & Butterfield, 1992).

There is reasonable support for the above hierarchical approach. Mattys (2004) found that co-articulation (Tier 2) was more useful than stress (Tier 3) for identifying word boundaries when the speech signal was phonetically intact. However, when the speech signal was impoverished so that it was hard to use Tier 1 or Tier 2 cues, stress was more useful than co-articulation. Mattys et al. (2005) found that lexical cues (i.e., word context versus non-word context) were more useful than stress in facilitating word segmentation in a no-noise condition. However, stress was more useful than lexical cues in noise.
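At its core, the tier hierarchy amounts to a simple decision rule: rely on the most informative cue the signal currently supports. The Python sketch below is our own illustration (the quality scores and the 0.5 threshold are invented; a real listener presumably weighs cues continuously rather than switching between them discretely):

```python
def choose_segmentation_cue(lexical_quality, segmental_quality, threshold=0.5):
    """Tier fallback in the spirit of Mattys et al. (2005): lexical cues are
    preferred; degraded input pushes the listener down the hierarchy."""
    if lexical_quality >= threshold:
        return "Tier 1: lexical cues (word knowledge, sentential context)"
    if segmental_quality >= threshold:
        return "Tier 2: segmental cues (co-articulation, allophony, phonotactics)"
    return "Tier 3: metrical prosody (word stress)"

print(choose_segmentation_cue(0.9, 0.9))  # clean speech -> lexical cues
print(choose_segmentation_cue(0.2, 0.7))  # poor lexical information -> segmental cues
print(choose_segmentation_cue(0.1, 0.2))  # heavily degraded signal -> stress
```

The three calls correspond to the three interpretive conditions in Figure 9.10 (optimal, poor lexical, and poor segmental information).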
Categorical perception

Speech perception differs from other kinds of auditory perception. For example, there is a definite left-hemisphere advantage for perception of speech but not other auditory stimuli. There is categorical perception of phonemes: speech stimuli intermediate between two phonemes are typically categorised as one phoneme or the other, and there is an abrupt boundary between phoneme categories. For example, the Japanese language does not distinguish between /l/ and /r/. These sounds belong to the same category for Japanese listeners, and so they find it very hard to discriminate between them (Massaro, 1994).

The existence of categorical perception does not mean we cannot distinguish at all between slightly different sounds assigned to the same phoneme category. Listeners decided faster that two syllables were the same when the sounds were identical than when they were not (Pisoni & Tash, 1974).

Raizada and Poldrack (2007) presented listeners with auditory stimuli ranging along a continuum from the phoneme /ba/ to the phoneme /da/. Two similar stimuli were presented at the same time, and participants decided whether they represented the same phoneme. Listeners were more sensitive to the differences between the stimuli when they straddled the category boundary between /ba/ and /da/. The key finding was that differences in brain activation of the two stimuli being presented were strongly amplified when they were on opposite sides of the category boundary. This amplification effect suggests that categories are important in speech perception.

KEY TERM
categorical perception: perceiving stimuli as belonging to specific categories; found with phonemes.
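Categorical perception can be caricatured with a steep logistic boundary. In the Python sketch below (our own illustration; the boundary location, slope, and stimulus values are arbitrary assumptions), two stimulus pairs are equally far apart physically, yet the pair straddling the boundary is far easier to discriminate, as in Raizada and Poldrack's (2007) findings:

```python
import math

def p_da(stimulus, boundary=0.5, slope=12.0):
    """Probability of labelling a /ba/-/da/ continuum stimulus as /da/
    (0 = clear /ba/, 1 = clear /da/), via a steep logistic boundary."""
    return 1 / (1 + math.exp(-slope * (stimulus - boundary)))

def discriminability(a, b):
    """Crude proxy: a pair is easier to tell apart when the two stimuli
    are likely to receive different category labels."""
    return abs(p_da(a) - p_da(b))

print(discriminability(0.15, 0.30))    # within /ba/: ~0.07
print(discriminability(0.425, 0.575))  # straddling the boundary: ~0.42
```

Within-category differences are compressed and between-category differences expanded, which is essentially what the amplified brain activation described above reflects.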
Context effects: sound identification

Spoken word recognition involves a mixture of bottom-up or data-driven processes triggered by the acoustic signal, and top-down or conceptually driven processes generated from the linguistic context. Finding that the identification of a sound or a word is influenced by the context in which it is presented provides evidence for top-down effects. However, there has been much controversy concerning the interpretation of most context effects. We will consider context effects on the identification of sounds in this section, deferring a discussion of context effects in word identification until later. We start by considering context in the form of an adjacent sound, and then move on to discuss sentential context (i.e., the sentence within which a sound is presented). We will see that the processes underlying different kinds of context effect probably differ.
Lexical identification shift

We have seen that listeners show categorical perception, with speech stimuli intermediate between two phonemes being categorised as one phoneme or the other. Ganong (1980) wondered whether categorical perception of phonemes would be influenced by context. Accordingly, he presented listeners with various sounds ranging between a word (e.g., dash) and a non-word (e.g., tash). There was a context effect – an ambiguous initial phoneme was more likely to be assigned to a given phoneme category when it produced a word than when it did not (the lexical identification shift).
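Ganong's result can be expressed as a shift of the category boundary towards whichever interpretation makes a word. The Python sketch below is our own illustration (the boundary, the shift size, and the 0–1 voicing scale are invented for demonstration):

```python
def classify(voicing, d_makes_word, t_makes_word, boundary=0.5, shift=0.08):
    """Categorise an ambiguous /d/-/t/ onset (0 = clear /d/, 1 = clear /t/).
    The boundary moves towards whichever reading yields a real word."""
    if d_makes_word and not t_makes_word:
        boundary += shift   # more of the continuum is heard as /d/
    elif t_makes_word and not d_makes_word:
        boundary -= shift   # more of the continuum is heard as /t/
    return "/t/" if voicing > boundary else "/d/"

# The same ambiguous onset (0.55) before "-ash": "dash" is a word but
# "tash" is not, so the percept shifts towards the word-forming phoneme.
print(classify(0.55, d_makes_word=True, t_makes_word=False))   # /d/
print(classify(0.55, d_makes_word=False, t_makes_word=True))   # /t/
```

The same acoustic input thus receives different labels depending purely on lexical knowledge, which is the signature of a top-down effect.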
There are at least two possible reasons why context might influence categorical perception. First, context may have a direct influence on perceptual processes. Second, context may influence decision or other processes occurring after the perceptual processes are completed but prior to a response being made. Such processes can be influenced by providing rewards for correct responses and penalties for incorrect ones. Pitt (1995) found that rewards and penalties had no effect on the lexical identification shift, suggesting that it depends on perceptual processes rather than ones occurring subsequently.

Connine (1990) found that the identification of an ambiguous phoneme is influenced by the meaning of the sentence in which it is presented (i.e., by sentential context). However, the way in which this happened differed from the lexical identification shift observed by Ganong (1980). Sentential context did not influence phoneme identification during initial speech perception, but rather affected processes occurring after perception.

In sum, the standard lexical identification shift depends on relatively early perceptual processes. In contrast, the effects of sentence context on the identification of ambiguous phonemes involve later processes following perception.

Phonemic restoration effect

Evidence that top-down processing based on the sentence context can be involved in speech perception was apparently reported by Warren and Warren (1970). They studied the phonemic restoration effect. Listeners heard a sentence in which a small portion had been removed and replaced with a meaningless sound. The sentences used were as follows (the asterisk indicates a deleted portion of the sentence):

• It was found that the *eel was on the axle.
• It was found that the *eel was on the shoe.
• It was found that the *eel was on the table.
• It was found that the *eel was on the orange.

The perception of the crucial element in the sentence (e.g., *eel) was influenced by the sentence context. Participants listening to the first sentence heard "wheel", those listening to the second sentence heard "heel", and those exposed to the third and fourth sentences heard "meal" and "peel", respectively. The crucial auditory stimulus (i.e., "*eel") was always the same, so all that differed was the contextual information.

What causes the phonemic restoration effect? According to Samuel (1997), there are two main possibilities:

(1) There is a direct effect on speech processing (i.e., the missing phoneme is processed almost as if it were present).
(2) There is an indirect effect with listeners guessing the identity of the missing phoneme after basic speech processing has occurred.

KEY TERMS
lexical identification shift: the finding that an ambiguous phoneme tends to be perceived so as to form a word rather than a nonword.
phonemic restoration effect: an illusion in which the listener "perceives" a phoneme that has been deleted from a spoken sentence.
The findings appear somewhat inconsistent. Samuel (e.g., 1981, 1987) added noise to the crucial phoneme or replaced the missing phoneme with noise. If listeners processed the missing phoneme as usual, they would have heard the crucial phoneme plus noise in both conditions. As a result, they would have been unable to tell the difference between the two conditions. In fact, the listeners could readily distinguish between the conditions, suggesting that sentence context affects processing occurring following perception.

Samuel (1997) used a different paradigm in which there was no sentential context. Some listeners repeatedly heard words such as "academic", "confidential", and "psychedelic", all of which have /d/ in the third syllable. The multiple presentations of these words reduce the probability of categorising subsequent sounds as /d/ because of an adaptation effect. In another condition, listeners were initially exposed to the same words with the key phoneme replaced by noise (e.g., aca*emic; confi*ential; psyche*elic). In a different condition, the /d/ phoneme was replaced by silence. Listeners could have guessed the missing phoneme in both conditions. However, perceptual processes could only have been used to identify the missing phoneme in the noise condition.

What did Samuel (1997) find? There was an adaptation effect in the noise condition but not in the silence condition. These findings seem to rule out guessing as an explanation. They suggest that there was a direct effect of lexical or word activation on perceptual processes in the noise condition leading to an adaptation effect.

In sum, it is likely that the processes underlying the phonemic restoration effect vary depending on the precise experimental conditions. More specifically, there is evidence for direct effects (Samuel, 1997) and indirect effects (Samuel, 1981, 1987).

THEORIES OF SPOKEN WORD RECOGNITION

There are several theories of spoken word recognition, three of which are discussed here. We start with a brief account of the motor theory of speech perception originally proposed over 40 years ago. However, our main focus will be on the cohort and TRACE models, both of which have been very influential in recent years. The original cohort model (Marslen-Wilson & Tyler, 1980) emphasised interactions between bottom-up and top-down processes in spoken word recognition. However, Marslen-Wilson (e.g., 1990) subsequently revised his cohort model to increase the emphasis on bottom-up processes driven by the auditory stimulus. In contrast, the TRACE model argues that word recognition involves interactive top-down and bottom-up processes. Thus, a crucial difference is that top-down processes (e.g., context-based effects) play a larger role in the TRACE model than in the cohort model.

Motor theory

Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967) argued that a key issue in speech perception is to explain how listeners perceive words accurately even though the speech signal provides variable information. In their motor theory of speech perception, they proposed that listeners mimic the articulatory movements of the speaker. The motor signal thus produced was claimed to provide much less variable and inconsistent information about what the speaker is saying than the speech signal itself. Thus, our recruitment of the motor system facilitates speech perception.

Evidence

Findings consistent with the motor theory were reported by Dorman, Raphael, and Liberman (1979). A tape was made of the sentence, "Please say shop", and a 50 ms period of silence was inserted between "say" and "shop". As a result, the sentence was misheard as, "Please say chop". Our speech musculature forces us to pause between "say" and "chop" but not between "say" and "shop". Thus, the evidence from internal articulation would favour the wrong interpretation of the last word in the sentence.
Fadiga, Craighero, Buccino, and Rizzolatti (2002) applied transcranial magnetic stimulation (TMS; see Glossary) to the part of the motor cortex controlling tongue movements while Italian participants listened to Italian words. Some of the words (e.g., "terra") required strong tongue movements when pronounced, whereas others (e.g., "baffo") did not. The key finding was that there was greater activation of listeners' tongue muscles when they were presented with words such as "terra" than with words such as "baffo".

Wilson, Saygin, Sereno, and Iacoboni (2004) had their participants say aloud a series of syllables and also listen to syllables. As predicted by the motor theory, the motor area activated when participants were speaking was also activated when they were listening. This activated area was well away from the classical frontal lobe language areas.

The studies discussed so far do not show that activity in motor areas is linked causally to speech perception. This issue was addressed by Meister, Wilson, Deblieck, Wu, and Iacoboni (2007). They applied repetitive transcranial magnetic stimulation (rTMS) to the left premotor cortex while participants performed a phonetic discrimination or tone discrimination task. Only the former task requires language processes. TMS adversely affected performance only on the phonetic discrimination task, which involved discriminating stop consonants in noise. These findings provide reasonable evidence that speech perception is facilitated by recruitment of the motor system.

Evaluation

There has been an accumulation of evidence supporting the motor theory of speech perception in recent years (see reviews by Galantucci, Fowler, and Turvey, 2006, and Iacoboni, 2008). Speech perception is often associated with activation of the motor area, and motor processes can facilitate speech perception. However, we must be careful not to exaggerate the importance of motor processes in speech perception.

What are the limitations of the motor theory? First, the underlying processes are not spelled out. For example, it is not very clear how listeners use auditory information to mimic the speaker's articulatory movements. More generally, the theory doesn't attempt to provide a comprehensive account of speech perception.

Second, many individuals with very severely impaired speech production nevertheless have reasonable speech perception. For example, some patients with Broca's aphasia (see Glossary) have effective destruction of the motor speech system but their ability to perceive speech is essentially intact (Harley, 2008). In addition, some mute individuals can perceive spoken words normally (Lenneberg, 1962). However, the motor theory could account for these findings by assuming the motor movements involved in speech perception are fairly abstract and do not require direct use of the speech musculature (Harley, 2008).

Third, it follows from the theory that infants with extremely limited expertise in articulation of speech should be very poor at speech perception. In fact, however, 6- to 8-month-old infants perform reasonably well on syllable detection tasks (Polka, Rvachew, & Molnar, 2008).

Cohort model

The cohort model was originally put forward by Marslen-Wilson and Tyler (1980), and has been revised several times since then. We will consider some of the major revisions later, but for now we focus on the assumptions of the original version:

• Early in the auditory presentation of a word, words conforming to the sound sequence heard so far become active; this set of words is the "word-initial cohort".
• Words belonging to this cohort are then eliminated if they cease to match further information from the presented word, or because they are inconsistent with the semantic or other context. For example, the words "crocodile" and "crockery" might both belong to a word-initial cohort, with the latter word being excluded when the sound /d/ is heard.
• Processing of the presented word continues until contextual information and information from the word itself are sufficient to eliminate all but one of the words in the word-initial cohort. The uniqueness point is the point at which the initial part of a word is consistent with only one word (see the sketch after this list). However, words can often be recognised earlier than that because of contextual information.
• Various sources of information (e.g., lexical, syntactic, semantic) are processed in parallel. These information sources interact and combine with each other to produce an efficient analysis of spoken language.
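The notions of a word-initial cohort and a uniqueness point are easy to make concrete. In the Python sketch below (our own illustration: letters stand in for phonemes, and the five-word toy lexicon is invented), the cohort shrinks as more of the word is heard:

```python
LEXICON = ["crocodile", "crockery", "crocus", "captain", "capital"]

def cohort(prefix):
    """Word-initial cohort: every word consistent with the input so far.
    (Context would eliminate further candidates, as in the second
    assumption above.)"""
    return [w for w in LEXICON if w.startswith(prefix)]

def uniqueness_point(word):
    """How many segments must be heard before only one candidate remains."""
    for n in range(1, len(word) + 1):
        if len(cohort(word[:n])) == 1:
            return n
    return len(word)

print(cohort("croc"))                 # ['crocodile', 'crockery', 'crocus']
print(cohort("crock"))                # ['crockery']
print(uniqueness_point("crocodile"))  # 5: unique at "croco"
```

In the original model this winnowing is all-or-none; in the revised model discussed below, candidates would instead carry graded activation levels.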
Marslen-Wilson and Tyler tested their theoretical notions in a word-monitoring task in which listeners identified pre-specified target words presented within spoken sentences. There were normal sentences, syntactic sentences (grammatically correct but meaningless), and random sentences (unrelated words). The target was a member of a given category, a word rhyming with a given word, or a word identical to a given word. The dependent variable was the speed with which the target was detected.

According to the original version of the cohort model, sensory information from the target word and contextual information from the rest of the sentence are both used at the same time. As predicted, complete sensory analysis was not needed with adequate contextual information (see Figure 9.11). It was only necessary to listen to the entire word when the sentence context contained no useful syntactic or semantic information (i.e., random condition).

Figure 9.11 Detection times for word targets presented in sentences (mean detection latencies in ms for identical, rhyme, and category targets in normal, syntactic, and random sentences; normal sentences produced the fastest detection and random sentences the slowest). Adapted from Marslen-Wilson and Tyler (1980).

Evidence that the uniqueness point is important in speech perception was reported by Marslen-Wilson (1984). Listeners were presented with words and nonwords and decided on a lexical decision task whether a word had been presented. The key finding related to nonwords. The later the position of the phoneme at which the sound sequence deviated from all English words, the more time the listeners took to make nonword decisions.

O'Rourke and Holcomb (2002) also addressed the assumption that a spoken word is identified when the uniqueness point is reached (i.e., the point at which only one word is consistent with the acoustic signal). Listeners heard spoken words and pseudowords and decided as rapidly as possible whether each stimulus was a word. Some words had an early uniqueness point (average of 427 ms after word onset), whereas others had a late uniqueness point (average of 533 ms after word onset). The N400 (a negative-going wave assessed by
ERPs; see Glossary) was used as a measure of the speed of word processing.

O'Rourke and Holcomb (2002) found that the N400 occurred about 100 ms earlier for words having an early uniqueness point than for those having a late uniqueness point. This is important, because it suggests that the uniqueness point may be significant. The further finding that the N400 typically occurred shortly after the uniqueness point had been reached supports the assumption of cohort theory that spoken word processing is highly efficient.

Radeau, Morais, Mousty, and Bertelson (2000) cast some doubt over the general importance of the uniqueness point. Listeners were presented with French nouns having early or late uniqueness points. The uniqueness point influenced performance when the nouns were presented at a slow rate (2.2 syllables/second) or a medium rate (3.6 syllables/second) but not when presented at a fast rate (5.6 syllables/second). This is somewhat worrying given that the fast rate is close to the typical conversational rate of speaking!

There is considerable emphasis in the cohort model on the notion of competition among candidate words when a listener hears a word. Weber and Cutler (2004) found that such competition can include more words than one might imagine. Dutch students with a good command of the English language identified target pictures corresponding to a spoken English word. Even though the task was in English, the Dutch students activated some Dutch words – they fixated distractor pictures having Dutch names that resembled phonemically the English name of the target picture. Overall, Weber and Cutler's findings revealed that lexical competition was greater in non-native than in native listening.

Undue significance was given to the initial part of the word in the original cohort model. It was assumed that a spoken word will generally not be recognised if its initial phoneme is unclear or ambiguous. Evidence against that assumption has been reported. Frauenfelder, Scholten, and Content (2001) found that French-speaking listeners activated words even when the initial phoneme of spoken words was distorted (e.g., hearing "focabulaire" activated the word "vocabulaire"). However, the listeners took some time to overcome the effects of the mismatch in the initial phoneme. Allopenna, Magnuson, and Tanenhaus (1998) found that the initial phoneme of a spoken word activated other words sharing that phoneme (e.g., the initial sounds of "beaker" caused activation of "beetle"). Somewhat later, there was a weaker tendency for listeners to activate words rhyming with the auditory input (e.g., "beaker" activated "speaker"). The key point in these studies is that some words not sharing an initial phoneme with the auditory input were not totally excluded from the cohort as predicted by the original cohort model.

Revised model

Marslen-Wilson (1990, 1994) revised the cohort model. In the original version, words were either in or out of the word cohort. In the revised version, candidate words vary in their level of activation, and so membership of the word cohort is a matter of degree. Marslen-Wilson (1990) assumed that the word-initial cohort may contain words having similar initial phonemes rather than being limited only to words having the initial phoneme of the presented word.

There is a second major difference between the original and revised versions of cohort theory. In the original version, context influenced word recognition early in processing. In the revised version, the effects of context on word recognition occur only at a fairly late stage of processing. More specifically, context influences only the integration stage at which a selected word is integrated into the evolving representation of the sentence. Thus, the revised cohort model places more emphasis on bottom-up processing than the original version. However, other versions of the model (e.g., Gaskell & Marslen-Wilson, 2002) are less explicit about the late involvement of context in word recognition.

Evidence

The assumption that membership of the word cohort is gradated rather than all-or-none is
clearly superior to the previous assumption that membership is all-or-none. Some research causing problems for the original version of the model (e.g., Allopenna et al., 1998; Frauenfelder et al., 2001) is much more consistent with the revised assumption.

Some of the strongest support for the assumption that context influences only the later stages of word recognition was reported by Zwitserlood (1989). Listeners performed a lexical decision task (deciding whether visually presented letter strings were words) immediately after hearing part of a spoken word. For example, when only "cap___" had been presented, it was consistent with various possible words (e.g., "captain", "capital"). Performance on the lexical decision task was faster when the word on that task was related in meaning to either of the possible words (e.g., "ship" for "captain" and "money" for "capital"). Of greatest importance was what happened when the part word was preceded by a biasing context (e.g., "With dampened spirits the men stood around the grave. They mourned the loss of their captain."). Such context did not prevent the activation of competitor words (e.g., "capital").

So far we have discussed Zwitserlood's (1989) findings when only part of the spoken word was presented. What happened when enough of the word was presented for listeners to be able to guess its identity correctly? According to the revised cohort model, we should find effects of context at this late stage of word processing. That is precisely what Zwitserlood found.

Friedrich and Kotz (2007) carried out a similar study to that of Zwitserlood (1989). They presented sentences ending with incomplete words (e.g., "To light up the dark she needed her can___"). Immediately afterwards, listeners saw a visual word matched to the incomplete word in form and meaning (e.g., "candle"), in meaning only (e.g., "lantern"), in form only (e.g., "candy"), or in neither ("number"). Event-related potentials (ERPs; see Glossary) were recorded to assess the early stages of word processing. There was evidence for a form-based cohort 250 ms after presentation of the visual word, and of a meaning-based cohort 220 ms after presentation. The existence of a form-based cohort means that "candy" was activated even though the context strongly indicated that it was not the correct word. Thus, context did not constrain the words initially processed as predicted by the revised cohort model.

In spite of the above findings, sentence context can influence spoken word processing some time before a word's uniqueness point has been reached. Van Petten, Coulson, Rubin, Plante, and Parks (1999) presented listeners with a spoken sentence frame (e.g., "Sir Lancelot spared the man's life when he begged for _____"), followed after 500 ms by a final word congruent (e.g., "mercy") or incongruent (e.g., "mermaid") with the sentence frame. Van Petten et al. used ERPs to assess processing of the final word. There were significant differences in the N400 (a negative wave occurring about 400 ms after stimulus presentation) to the contextually congruent and incongruent words 200 ms before the uniqueness point was reached. Thus, very strong context influenced spoken word processing earlier than expected within the revised cohort model.

Immediate effects of context on processing of spoken words

One of the most impressive attempts to show that context can have a very rapid effect during speech perception was reported by Magnuson, Tanenhaus, and Aslin (2008). Initially, they taught participants an artificial lexicon consisting of nouns referring to shapes and adjectives referring to textures. After that, they presented visual displays consisting of four objects, and participants were instructed to click on one of the objects (identified as "the (adjective)" or as "the (noun)"). The dependent variable of interest was the eye fixations of participants.
On some trials, the display consisted of four different shapes, and so only a noun was needed to specify uniquely the target object. In other words, the visual context allowed participants to predict that the target would be accurately described just by a noun. On every trial, there was an incorrect competitor word starting with the same sound as the correct word. This competitor was a noun or an adjective. According to the cohort model, this competitor should have been included in the initial cohort regardless of whether it was a noun or an adjective. In contrast, if listeners could use context very rapidly, they would have only included the competitor when it was a noun.

The competitor was considered until 800 ms after word onset (200 ms after word offset) when it was a noun (see Figure 9.12). Dramatically, however, the competitor was eliminated within 200 ms of word onset (or never considered at all) when it was an adjective.

What do these findings mean? They cast considerable doubt on the assumption that context effects occur only after an initial cohort of possible words has been established. If the context allows listeners to predict accurately which words are relevant and which are irrelevant, then the effects of context can occur more rapidly than is assumed by the cohort model. According to Magnuson et al. (2008), delayed effects of context are found when the context only weakly predicts which word is likely to be presented.

Figure 9.12 Eye fixation proportions to noun targets and noun competitors (top figure) and to noun targets and adjective competitors (bottom figure) over time after noun onset. The time after noun onset at which the target attracted significantly more fixations than the competitor occurred much later with a noun than an adjective competitor. Based on data in Magnuson et al. (2008).
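The contrast being tested here can be expressed in a few lines of code. The sketch below is not Magnuson et al.'s (2008) model: the two-word artificial lexicon is invented, and "context" is reduced to an expected part of speech. It simply contrasts forming the full cohort first (with context applied later) against letting context gate cohort membership immediately.

```python
# Invented two-item artificial lexicon: (word, part of speech).
CANDIDATES = [("pibo", "noun"), ("pibu", "adjective")]

def initial_cohort(sounds_so_far, expected_pos=None):
    """Candidates matching the input heard so far; if the display makes
    only one part of speech relevant, context gates the cohort at once."""
    matches = [(w, pos) for w, pos in CANDIDATES
               if w.startswith(sounds_so_far)]
    if expected_pos is None:
        return matches                    # classic cohort: filter later
    return [(w, pos) for w, pos in matches if pos == expected_pos]

print(initial_cohort("pi"))           # both competitors are considered
print(initial_cohort("pi", "noun"))   # adjective competitor never enters
```

On the first call both items enter the cohort and the adjective must be weeded out later; on the second, display-based context excludes it from the start, mirroring the rapid elimination seen in the eye-fixation data.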

Overall evaluation

The theoretical approach represented by the cohort model possesses various strengths. First, the assumption that accurate perception of a spoken word involves processing and rejecting several competitor words is generally correct. However, previous theories had typically paid little or no attention to the existence of substantial competition effects. Second, there is the assumption that the processing of spoken words is sequential and changes considerably during the course of their presentation. The speed with which spoken words are generally identified and the importance of the uniqueness point indicate the importance of sequential processing. Third, the revised version of the model has two advantages over the original version:

(1) The assumption that membership of the word cohort is a matter of degree rather than being all-or-none is more in line with the evidence.
(2) There is more scope for correcting errors within the revised version of the model because words are less likely to be eliminated from the cohort at an early stage.

What are the limitations of the cohort model? First, there is the controversial issue of the involvement of context in auditory word recognition. According to the revised version of the cohort model, contextual factors only exert an influence late in processing at the integration stage. This is by no means the whole story. It may be correct when context only moderately constrains word identity, but strongly constraining context seems to have an impact much earlier in processing (e.g., Magnuson et al., 2008; Van Petten et al., 1999). However, Gaskell and Marslen-Wilson (2002) emphasised the notion of "continuous integration" and so can accommodate the finding that strong context has early effects.

Second, the modifications made to the original version of the model have made it less precise and harder to test. As Massaro (1994, p. 244) pointed out, "These modifications . . . make it more difficult to test against alternative models."

Third, the processes assumed to be involved in processing of speech depend heavily on identification of the starting points of individual words. However, it is not clear within the theory how this is accomplished.

TRACE model

McClelland and Elman (1986) and McClelland (1991) produced a network model of speech perception based on connectionist principles (see Chapter 1). Their TRACE model of speech perception resembles the interactive activation model of visual word recognition put forward by McClelland and Rumelhart (1981; discussed earlier in the chapter). The TRACE model assumes that bottom-up and top-down processes interact flexibly in spoken word recognition. Thus, all sources of information are used at the same time in spoken word recognition.

The TRACE model is based on the following theoretical assumptions:

• There are individual processing units or nodes at three different levels: features (e.g., voicing; manner of production), phonemes, and words.
• Feature nodes are connected to phoneme nodes, and phoneme nodes are connected to word nodes.
• Connections between levels operate in both directions, and are only facilitatory.
• There are connections among units or nodes at the same level; these connections are inhibitory.
• Nodes influence each other in proportion to their activation levels and the strengths of their interconnections.
• As excitation and inhibition spread among nodes, a pattern of activation or trace develops.
• The word recognised or identified by the listener is determined by the activation level of the possible candidate words (see the sketch below).
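These assumptions can be illustrated with a deliberately minimal simulation. The sketch below is not the actual TRACE implementation (which also includes a feature level and time-aligned copies of each unit): it is a two-level toy network with invented weights, rates, and inputs, showing facilitatory between-level connections running in both directions and inhibitory within-level competition.

```python
import numpy as np

# Toy interactive-activation sketch of the assumptions listed above.
# NOT the actual TRACE implementation; all constants are invented.

PHONEMES = ["k", "ae", "t", "p"]
WORDS = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}

# Between-level connections are facilitatory in both directions.
W = np.array([[1.0 if p in phones else 0.0 for p in PHONEMES]
              for phones in WORDS.values()])

def run(bottom_up, steps=30, rate=0.1, inhibition=0.3):
    phon = np.zeros(len(PHONEMES))
    word = np.zeros(len(W))
    for _ in range(steps):
        phon_net = bottom_up + W.T @ word   # input plus top-down feedback
        word_net = W @ phon                 # bottom-up support from phonemes
        # Within-level connections are inhibitory: each unit is suppressed
        # in proportion to the total activation of its competitors.
        phon = np.clip(phon + rate * (phon_net - inhibition * (phon.sum() - phon)), 0, 1)
        word = np.clip(word + rate * (word_net - inhibition * (word.sum() - word)), 0, 1)
    return dict(zip(WORDS, np.round(word, 2)))

# Final phoneme slightly clearer as /t/ than /p/: competition amplifies
# the difference, so "cat" ends up more active than "cap".
print(run(np.array([1.0, 1.0, 0.7, 0.3])))
```

The within-level inhibition is what produces the "winner takes all" dynamics discussed below: a small advantage in bottom-up support is amplified over successive processing cycles.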
The TRACE model assumes that bottom-up and top-down processes interact throughout speech perception. In contrast, most versions of the cohort model assume that top-down processes (e.g., context-based effects) occur relatively late in speech perception. Bottom-up activation proceeds upwards from the feature level to the phoneme level and on to the word level, whereas top-down activation proceeds in the opposite direction from the word level to the phoneme level and on to the feature level.

Evidence

Suppose we asked listeners to detect target phonemes presented in words and nonwords. According to the TRACE model, performance should be better in the word condition. Why is that? In that condition, there would be activation from the word level proceeding to the phoneme level which would facilitate phoneme detection. Mirman, McClelland, Holt, and Magnuson (2008) asked listeners to detect a target phoneme (/t/ or /k/) in words and nonwords. Words were presented on 80% or 20% of the trials.
The argument was that attention to (and activation at) the word level would be greater when most of the auditory stimuli were words, and that this would increase the word superiority effect.

What did Mirman et al. (2008) find? First, the predicted word superiority effect was found in most conditions (see Figure 9.13). Second, the magnitude of the effect was greater when 80% of the auditory stimuli were words than when only 20% were. These findings provide strong evidence for the involvement of top-down processes in speech perception.

Figure 9.13 Mean reaction times (in ms) for recognition of /t/ and /k/ phonemes in words and nonwords when words were presented on a high (80%) or low (20%) proportion of trials. From Mirman et al. (2008). Reprinted with permission of the Cognitive Science Society Inc.

The TRACE model can easily explain the lexical identification shift (Ganong, 1980). In this effect (discussed earlier), there is a bias towards perceiving an ambiguous phoneme so that a word is formed. According to the TRACE model, top-down activation from the word level is responsible for the lexical identification shift.

McClelland, Rumelhart, and the PDP (Parallel Distributed Processing) Research Group (1986) applied the TRACE model to the phenomenon of categorical speech perception discussed earlier. According to the model, the discrimination boundary between phonemes becomes sharper because of mutual inhibition between phoneme units at the phoneme level. These inhibitory processes produce a "winner takes all" situation in which one phoneme becomes increasingly activated while other phonemes are inhibited. McClelland et al.'s computer simulation based on the model successfully produced categorical speech perception.

Norris, McQueen, and Cutler (2003) obtained convincing evidence that phoneme identification can be directly influenced by top-down processing. Listeners were initially presented with words ending in the phoneme /f/ or /s/. For different groups, an ambiguous phoneme equally similar to /f/ and /s/ replaced the final /f/ or /s/ in these words. After that, listeners categorised phonemes presented on their own as /f/ or /s/. Listeners who had heard the ambiguous phonemes in the context of /s/-ending words strongly favoured the /s/ categorisation. In contrast, those who had heard the same phoneme in the context of /f/-ending words favoured the /f/ categorisation. Thus, top-down learning at the word level affected phoneme categorisation as predicted by the TRACE model.

According to the TRACE model, high-frequency words (those often encountered) are processed faster than low-frequency ones partly because they have higher resting activation levels. Word frequency is seen as having an important role in the word-recognition process and should influence even early stages of word processing. Support for these predictions was reported by Dahan, Magnuson, and Tanenhaus (2001) in experiments using eye fixations as a measure of attentional focus. Participants were presented with four pictures (e.g., bench, bed, bell, lobster), three of which had names starting with the same phoneme. They clicked on the picture corresponding to a spoken word (e.g., "bench") while ignoring the related distractors (bed, bell) and the unrelated distractor (lobster). According to the model, more fixations should be directed to the related distractor having a high-frequency name (i.e., bed) than to the one having a low-frequency name (i.e., bell). That was what Dahan et al. found. In addition, frequency influenced eye fixations very early in processing, which is also predicted by the TRACE model.
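The frequency assumption can be captured in a few lines. The sketch below is illustrative only: the frequency counts and the log scaling are invented, not taken from TRACE or from Dahan et al. (2001). It shows how a resting level that grows with word frequency gives a high-frequency word a head start that matters most early in processing.

```python
import math

# Invented frequency counts (per million); the log scaling is a common
# modelling choice, not a value from TRACE or Dahan et al. (2001).
FREQ = {"bed": 127.0, "bell": 16.0}

def resting_activation(word, scale=0.1):
    return scale * math.log(1 + FREQ[word])

def activation(word, bottom_up, steps):
    """Resting level plus identical acoustic support accumulated over time."""
    return resting_activation(word) + bottom_up * steps

# With identical bottom-up support ("be..." matches both), the
# high-frequency "bed" leads from the very first processing steps.
for steps in (1, 5):
    print(steps, {w: round(activation(w, 0.05, steps), 2) for w in FREQ})
```

In this toy version the frequency advantage is a constant head start; in TRACE itself it interacts with between- and within-level dynamics, but the qualitative prediction of early frequency effects on fixations is the same.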

We turn now to research revealing problems with the TRACE model. One serious limitation is that it attaches too much importance to the influence of top-down processes on spoken word recognition. Frauenfelder, Segui, and Dijkstra (1990) gave participants the task of detecting a given phoneme. The key condition was one in which a nonword closely resembling an actual word was presented (e.g., "vocabutaire" instead of "vocabulaire"). According to the model, top-down effects from the word node corresponding to "vocabulaire" should have inhibited the task of identifying the "t" in "vocabutaire". They did not.

The existence of top-down effects depends more on stimulus degradation than predicted by the model. McQueen (1991) presented ambiguous phonemes at the end of stimuli, and participants categorised them. Each ambiguous phoneme could be perceived as completing a word or a nonword. According to the model, top-down effects from the word level should have produced a preference for perceiving the phonemes as completing words. This prediction was confirmed only when the stimulus was degraded. It follows from the TRACE model that the effects should be greater when the stimulus is degraded. However, the absence of effects when the stimulus was not degraded is inconsistent with the model.

Imagine you are listening to words spoken by someone else. Do you think that you would activate the spellings of those words? It seems unlikely that orthography (information about word spellings) is involved in speech perception, and there is no allowance for its involvement in the TRACE model. However, orthography does play a role in speech perception. Perre and Ziegler (2008) gave listeners a lexical decision task (deciding whether auditory stimuli were words or nonwords). The words varied in terms of the consistency between their phonology and their orthography or spelling. This should be irrelevant if orthography isn't involved in speech perception. In fact, however, listeners performed the lexical decision task more slowly when the words were inconsistent than when they were consistent. Event-related potentials (ERPs; see Glossary) indicated that inconsistency between phonology and orthography was detected rapidly (less than 200 ms).

Finally, we consider a study by Davis, Marslen-Wilson, and Gaskell (2002). They challenged the TRACE model's assumption that recognising a spoken word is based on identifying its phonemes. Listeners heard only the first syllable of a word, and decided whether it was the only syllable of a short word (e.g., "cap") or the first syllable of a longer word (e.g., "captain"). The two words between which listeners had to choose were cunningly selected so that the first syllable contained the same phonemes in both words. Since listeners could not use phonemic information to make the correct decision, the task should have been very difficult according to the TRACE model. In fact, however, performance was good. Listeners used non-phonemic information (e.g., small differences in syllable duration) ignored by the TRACE model to discriminate between short and longer words.

Evaluation

The TRACE model has various successes to its credit. First, it provides reasonable accounts of phenomena such as categorical speech recognition, the lexical identification shift, and the word superiority effect in phoneme monitoring. Second, a significant general strength of the model is its assumption that bottom-up and top-down processes both contribute to spoken word recognition, combined with explicit assumptions about the processes involved. Third, the model predicts accurately some of the effects of word frequency on auditory word processing (e.g., Dahan et al., 2001). Fourth, "TRACE . . . copes extremely well with noisy input – which is a considerable advantage given the noise present in natural language" (Harley, 2008, p. 274). Why does TRACE deal well with noisy and degraded speech? TRACE emphasises the role of top-down processes, and such processes become more important when bottom-up processes have to deal with limited stimulus information.

a mispronunciation. According to the model, top-down activation from the word level will generally lead listeners to perceive the word best fitting the presented phonemes rather than the mispronunciation itself. In fact, however, mispronunciations have a strong adverse effect on speech perception (Gaskell & Marslen-Wilson, 1998).

Second, the TRACE model incorporates many different theoretical assumptions, which can be regarded as an advantage in that it allows the model to account for many findings. However, there is a suspicion that it makes the model so flexible that "it can accommodate any result" (Harley, 2008, p. 274).

Third, tests of the model have relied heavily on computer simulations involving a small number of one-syllable words. It is not entirely clear whether the model would perform satisfactorily if applied to the vastly larger vocabularies possessed by most people.

Fourth, the model ignores some factors influencing auditory word recognition. As we have seen, orthographic information plays a significant role in speech perception (Perre & Ziegler, 2008). In addition, non-phonemic information such as syllable duration also helps to determine auditory word perception (Davis et al., 2002).

COGNITIVE NEUROPSYCHOLOGY

We have been focusing mainly on the processes permitting spoken words to be identified, i.e., word recognition. This is significant because word recognition is of vital importance as we strive to understand what the speaker is saying. In this section, we consider the processes involved in the task of repeating a spoken word immediately after hearing it. A major goal of research using this task is to identify some of the main processes involved in speech perception. However, the task also provides useful information about speech production (discussed in Chapter 11).

In spite of the apparent simplicity of the repetition task, many brain-damaged patients experience difficulties with it even though audiometric testing reveals they are not deaf. Detailed analysis of these patients suggests various processes can be used to permit repetition of a spoken word. As we will see, the study of such patients has shed light on issues such as the following: Are the processes involved in repeating spoken words the same for familiar and unfamiliar words? Can spoken words be repeated without accessing their meaning?

Information from brain-damaged patients was used by Ellis and Young (1988) to propose a theoretical account of the processing of spoken words (see Figure 9.14; a more complete figure of the whole language system is provided by Harley, 2008, p. 467). This theoretical account (a framework rather than a complete theory) has five components:

• The auditory analysis system extracts phonemes or other sounds from the speech wave.
• The auditory input lexicon contains information about spoken words known to the listener but not about their meaning.
• Word meanings are stored in the semantic system (cf. semantic memory discussed in Chapter 7).
• The speech output lexicon provides the spoken form of words.
• The phoneme response buffer provides distinctive speech sounds.
• These components can be used in various combinations, so there are several routes between hearing a spoken word and saying it.

The most striking feature of the framework is the assumption that saying a spoken word can be achieved using three different routes varying in terms of which stored information about heard spoken words is accessed. We will consider these three routes after discussing the role of the auditory analysis system in speech perception.

Figure 9.14 Processing and repetition of spoken words. Adapted from Ellis and Young (1988). The original figure shows a heard word entering the auditory analysis system (extracts phonemes or other sounds) and then the auditory input lexicon (recognises familiar spoken words). From there, Route 1 passes through the semantic system (contains word meanings) to the speech output lexicon (stores spoken forms of words), whereas Route 2 runs directly from the input lexicon to the speech output lexicon. Route 3 bypasses the lexicons altogether via acoustic-to-phonological conversion. All routes converge on the phoneme response buffer (provides distinctive speech sounds), which drives speech.

Auditory analysis system

Suppose a patient had damage only to the auditory analysis system, thereby producing a deficit in phonemic processing. Such a patient would have impaired speech perception for words and nonwords, especially those containing phonemes that are hard to discriminate. However, such a patient would have generally intact speech production, reading, and writing, would have normal perception of non-verbal environmental sounds not containing phonemes (e.g., coughs; whistles), and his/her hearing would be unimpaired. The term pure word deafness describes patients with these symptoms. There would be evidence for a double dissociation if we could find patients with impaired perception of non-verbal sounds but intact speech perception. Peretz et al. (1994) reported the case of a patient having a functional impairment limited to perception of music and prosody.

KEY TERM
pure word deafness: a condition in which severely impaired speech perception is combined with good speech production, reading, writing, and perception of non-speech sounds.

A crucial part of the definition of pure word deafness is that auditory perception problems are highly selective to speech and do not apply to non-speech sounds. Many patients seem to display the necessary selectivity. However, Pinard, Chertkow, Black, and Peretz (2002) identified impairments of music perception and/or environmental sound perception in 58 out of 63 patients they reviewed.

Speech perception differs from the perception of most non-speech sounds in that coping with rapid change in auditory stimuli is much more important in the former case. Jörgens et al.

(2008) studied a 71-year-old woman with pure word deafness, who apparently had no problems in identifying environmental sounds in her everyday life. However, when asked to count rapid clicks, she missed most of them. This suggests she had problems in dealing with rapid changes in auditory input. Other patients with pure word deafness have problems in perceiving rapid changes in non-speech sounds with complex pitch patterns (see Martin, 2003). Thus, impaired ability to process rapidly changing auditory stimuli may help to explain the poor speech perception of patients with pure word deafness.

Three-route framework

Unsurprisingly, the most important assumption of the three-route framework is that there are three different ways (or routes) that can be used when individuals process and repeat words they have just heard. As you can see in Figure 9.14, these three routes differ in terms of the number and nature of the processes used by listeners. All three routes involve the auditory analysis system and the phoneme response buffer. Route 1 involves three additional components of the language system (the auditory input lexicon, the semantic system, and the speech output lexicon), Route 2 involves two additional components (the auditory input lexicon and the speech output lexicon), and Route 3 involves an additional rule-based system that converts acoustic information into words that can be spoken. We turn now to a more detailed discussion of each route.

According to the three-route framework, Routes 1 and 2 are designed to be used with familiar words, whereas Route 3 is designed to be used with unfamiliar words and nonwords. When Route 1 is used, a heard word activates relevant stored information about it, including its meaning and its spoken form. Route 2 closely resembles Route 1 except that information about the meaning of heard words is not accessed. As a result, someone using Route 2 would say familiar words accurately but would not know their meaning. Finally, Route 3 involves using rules about the conversion of the acoustic information contained in heard words into the appropriate spoken forms of those words. It is assumed that such conversion processes must be involved to allow listeners to repeat back unfamiliar words and nonwords.
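The division of labour among the three routes can be summarised in a short sketch. The code below is a toy rendering of the framework, not Ellis and Young's (1988) own formalisation: the lexicon contents and the acoustic-to-phonological "rules" are invented placeholders.

```python
# Toy contents standing in for the framework's components (hypothetical).
INPUT_LEXICON = {"cloud", "captain"}                # familiar spoken words
SEMANTIC_SYSTEM = {"cloud": "visible mass of condensed vapour"}  # meanings
OUTPUT_LEXICON = {"cloud": "/klaud/", "captain": "/kaptin/"}     # spoken forms

def repeat_heard_word(heard):
    """Return (spoken form, meaning or None), choosing a route as the
    framework describes."""
    if heard in INPUT_LEXICON:
        meaning = SEMANTIC_SYSTEM.get(heard)
        if meaning is not None:
            return OUTPUT_LEXICON[heard], meaning   # Route 1: via semantics
        return OUTPUT_LEXICON[heard], None          # Route 2: lexicons only
    # Route 3: rule-based acoustic-to-phonological conversion, the only
    # route available for nonwords (placeholder conversion here).
    return "/" + heard + "/", None

print(repeat_heard_word("cloud"))    # Route 1: spoken form plus meaning
print(repeat_heard_word("captain"))  # Route 2: repeats accurately, no meaning
print(repeat_heard_word("blick"))    # Route 3: assembled nonword pronunciation
```

Damage scenarios map onto deletions in this sketch: removing access to the semantic system leaves word repetition intact but meaningless (as in word meaning deafness, discussed next), whereas removing Route 3 selectively impairs nonword repetition.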
Evidence

If patients could use Route 2 but Routes 1 and 3 were severely impaired, they should be able to repeat familiar words but would not understand their meaning (see Figure 9.14). In addition, they should have problems with unfamiliar words and nonwords, because nonwords cannot be dealt with via Route 2. Finally, since such patients would make use of the input lexicon, they should be able to distinguish between words and nonwords.

Patients suffering from word meaning deafness fit the above description. The notion of word meaning deafness has proved controversial and relatively few patients with the condition have been identified. However, a few fairly clear cases have been described. For example, Jacquemot, Dupoux, and Bachoud-Lévi (2007) claimed that a female patient, GGM, had all of the main symptoms of word meaning deafness.

KEY TERM
word meaning deafness: a condition in which there is a selective impairment of the ability to understand spoken (but not written) language.

Franklin, Turner, Ralph, Morris, and Bailey (1996) studied Dr O, who was another clear case of word meaning deafness. He had impaired auditory comprehension but intact written word comprehension. His ability to repeat words was dramatically better than his ability to repeat nonwords (80% versus 7%, respectively). Finally, Dr O had a 94% success rate at distinguishing between words and nonwords.

Dr O seemed to have reasonable access to the input lexicon as shown by his greater ability to repeat words than nonwords, and by his almost perfect ability to distinguish between

words and nonwords. He clearly has some problem relating to the semantic system. However, the semantic system itself does not seem to be damaged, because his ability to understand written words is intact. He probably has damage to parts of Route 1. Tyler and Moss (1997) argued that Dr O might also have problems earlier in processing (e.g., in extracting phonemic features from speech). For example, when he was asked to repeat spoken words as rapidly as possible, he made 25% errors.

According to the theoretical framework, we would expect to find some patients who make use primarily or exclusively of Route 3, which involves converting acoustic information from heard words into the spoken forms of those words. Such patients would be reasonably good at repeating spoken words and nonwords but would have very poor comprehension of these words. Some patients with transcortical sensory aphasia exhibit precisely this pattern of symptoms (Coslett, Roeltgen, Rothi, & Heilman, 1987; Raymer, 2001). These patients typically have poor reading comprehension in addition to impaired auditory comprehension, suggesting they have damage within the semantic system.

Some brain-damaged patients have extensive problems with speech perception and production. For example, patients with deep dysphasia make semantic errors when asked to repeat spoken words by saying words related in meaning to those spoken (e.g., saying "sky" when they hear "cloud"). In addition, they find it harder to repeat abstract words than concrete ones, and have a very poor ability to repeat nonwords.

How can we explain deep dysphasia? With reference to Figure 9.14, it could be argued that none of the routes between heard words and speech is intact. Perhaps there is a severe impairment to the non-lexical route (Route 3) combined with an additional impairment in (or near) the semantic system. Other theorists (e.g., Jefferies et al., 2007) have argued that the central problem in deep dysphasia is a general phonological impairment (i.e., problems in processing word sounds). This leads to semantic errors because it increases patients' reliance on word meaning when repeating spoken words.

KEY TERMS
transcortical sensory aphasia: a disorder in which words can be repeated but there are many problems with language.
deep dysphasia: a condition in which there is poor ability to repeat spoken words and especially nonwords, and there are semantic errors in repeating spoken words.

Jefferies et al. (2007) found that patients with deep dysphasia suffered from poor phonological production on word repetition, reading aloud, and spoken picture naming. As predicted, they also performed very poorly on tasks involving the manipulation of phonology such as the phoneme subtraction task (e.g., remove the initial phoneme from "cat"). Furthermore, they had problems with speech perception, as revealed by their poor performance in deciding whether two words rhymed with each other. In sum, Jefferies et al. provided good support for their phonological impairment hypothesis.

Evaluation

The three-route framework is along the right lines. Patients vary in the precise problems they have with speech perception (and speech production), and some evidence exists for each of the three routes. At the very least, it is clear that repeating spoken words can be achieved in various different ways. Furthermore, conditions such as pure word deafness, word meaning deafness, and transcortical sensory aphasia can readily be related to the framework.

What are the limitations of the framework? First, it is often difficult to decide precisely how patients' symptoms relate to the framework. For example, deep dysphasia can be seen as involving impairments to all three routes or alternatively as mainly reflecting a general phonological impairment. Second, some conditions (e.g., word meaning deafness; auditory phonological agnosia) have only rarely been reported and so their status is questionable.
word meaning when repeating spoken words.
9 READING AND SPEECH PERCEPTION 373

CHAPTER SUMMARY
• Reading: introduction
Several methods are available to study reading. Lexical decision, naming, and priming
tasks have been used to assess word identification. Recording eye movements provides
detailed on-line information, and is unobtrusive. Studies of masked phonological priming
suggest that phonological processing occurs rapidly and automatically in reading. However,
phonological activation is probably not essential for word recognition.

• Word recognition
According to the interactive activation model, bottom-up and top-down processes interact
during word recognition. It seems to account for the word-superiority effect, but ignores
the roles of phonological processing and meaning in word recognition. Sentence context
often has a rapid influence on word processing, but this influence is less than total.

• Reading aloud
According to the dual-route cascaded model, lexical and non-lexical routes are used in
reading words and nonwords. Surface dyslexics rely mainly on the non-lexical route,
whereas phonological dyslexics use mostly the lexical route. The dual-route model
emphasises the importance of word regularity, but consistency is more important. The
model also ignores consistency effects with nonwords and minimises the role of phono-
logical processing. The triangle model consists of orthographic, phonological, and semantic
systems. Surface dyslexia is attributed to damage within the semantic system, whereas
phonological dyslexia stems from a general phonological impairment. Deep dyslexia
involves phonological and semantic impairments. The triangle model has only recently
considered the semantic system in detail, and its accounts of phonological and surface
dyslexia are oversimplified.

• Reading: eye-movement research


According to the E-Z Reader model, the next eye-movement is planned when only part
of the processing of the currently fixated word has occurred. Completion of frequency
checking of a word is the signal to initiate an eye-movement programme, and completion
of lexical access is the signal for a shift of covert attention to the next word. The model
provides a reasonable account of many findings. However, it exaggerates the extent of
serial processing, and mistakenly predicts that readers will read words in the “correct”
order or suffer disruption if they do not.

• Listening to speech
Listeners make use of prosodic cues and lip-reading. Among the problems faced by
listeners are the speed of spoken language, the segmentation problem, co-articulation,
individual differences in speech patterns, and degraded speech. Listeners prefer to use
lexical information to achieve word segmentation, but can also use co-articulation,
allophony, and syllable stress. There is categorical perception of phonemes, but we can
discriminate unconsciously between sounds categorised as the same phoneme. The lexical
identification shift and the phonemic restoration effect show the effects of context on
speech perception.
374 COGNITIVE PSYCHOLOGY: A STUDENT’S HANDBOOK

• Theories of spoken word recognition


According to the motor theory, listeners mimic the articulatory movements of the speaker.
There is reasonable evidence that motor processes can facilitate speech perception. However,
some patients with severely impaired speech production have reasonable speech perception.
Cohort theory is based on the assumption that perceiving a spoken word involves rejecting
competitors in a sequential process. However, contextual factors can influence speech
perception earlier in processing than assumed by the model. The TRACE model is highly
interactive and accounts for several phenomena (e.g., word superiority effect in phoneme
monitoring). However, it exaggerates the importance of top-down effects.

• Cognitive neuropsychology
It has been claimed that there are three routes between sound and speech. Patients with pure
word deafness have problems with speech perception that may be due to impaired phonemic
processing. Patients with word meaning deafness have problems in acoustic-to-phonological
conversion and with using the semantic system. Patients with transcortical sensory aphasia
seem to have damage to the semantic system but can use acoustic-to-phonological conversion.
The central problem in deep dysphasia is a general phonological impairment.

FURTHER READING
• Diehl, R.L., Lotto, A.J., & Holt, L.L. (2004). Speech perception. Annual Review of
Psychology, 55, 149–179. The authors discuss major theoretical perspectives in terms of
their ability to account for key phenomena in speech perception.
• Gaskell, G. (ed.) (2007). Oxford handbook of psycholinguistics. Oxford: Oxford University
Press. This large edited volume contains several chapters dealing with basic processes in
reading and speech perception. This is especially the case with Part 1, which is devoted
to word recognition.
• Harley, T.A. (2008). The psychology of language: From data to theory (3rd ed.). Hove, UK: Psychology Press. Several
chapters (e.g., 6, 7, and 9) of this excellent textbook contain detailed information about
the processes involved in recognising visual and auditory words.
• Pisoni, D.B., & Remez, R.E. (eds.) (2004). The handbook of speech perception. Oxford:
Blackwell. This edited book contains numerous important articles across the entire field
of speech perception.
• Rayner, K., Shen, D., Bai, X., & Yan, G. (eds.) (2009). Cognitive and cultural influences
on eye movements. Hove, UK: Psychology Press. Section 2 of this edited book is devoted
to major contemporary theories of eye movements in reading.
• Smith, F. (2004). Understanding reading: A psycholinguistic analysis of reading and learning
to read. Mahwah, NJ: Lawrence Erlbaum Associates, Inc. This textbook provides a thorough
account of theory and research on reading.
