
Theories of Spoken Word Recognition
THEORIES TO BE DISCUSSED
There are several theories of spoken word recognition, three of which are
discussed here.
We start with a brief account of the motor theory of speech perception
originally proposed over 40 years ago. However, our main focus will be on
the cohort and TRACE models, both of which have been very influential in
recent years.
• Original cohort model (Marslen-Wilson & Tyler, 1980): bottom-up and
top-down processes.
• Revised cohort model (Marslen-Wilson, e.g., 1990): bottom-up processes.
• TRACE model: top-down processes.

Motor theory
Liberman, Cooper, Shankweiler, and
Studdert-Kennedy (1967) argued that
a key issue in speech perception is to
explain how listeners perceive words
accurately even though the speech
signal provides variable information.
In their motor theory of speech
perception, they proposed that
listeners mimic the articulatory
movements of the speaker.

Evidence
• Findings consistent with the motor theory were reported by Dorman,
Raphael, and Liberman (1979). A tape was made of the sentence, “Please say
shop”, and a 50 ms period of silence was inserted between “say” and
“shop”. As a result, the sentence was misheard as, “Please say chop”.
• Fadiga, Craighero, Buccino, and Rizzolatti (2002): the key finding was
that there was greater activation of listeners’ tongue muscles when they
were presented with words such as “terra” than with words such as “baffo”.
• Wilson, Saygin, Sereno, and Iacoboni (2004): as predicted by the motor
theory, the motor area activated when participants were speaking was also
activated when they were listening. This activated area was well away from
the classical frontal lobe language areas.

Limitations
First, the underlying processes are not spelled
out.
Second, many individuals with very severely
impaired speech production nevertheless have
reasonable speech perception.
Third, it follows from the theory that infants
with extremely limited expertise in articulation
of speech should be very poor at speech
perception, yet young infants can discriminate
speech sounds well before they can produce them.

Cohort Model
The cohort model was originally
put forward by Marslen-Wilson
and Tyler (1980), and has been
revised several times since then.

We will consider some of the major revisions later, but for now we focus on
the assumptions of the original version:
• Early in the auditory presentation of a word, words conforming to the
sound sequence heard so far become active; this set of words is the
“word-initial cohort”.
• Words belonging to this cohort are then eliminated if they cease to match
further information from the presented word, or because they are
inconsistent with the semantic or other context. For example, the words
“crocodile” and “crockery” might both belong to a word-initial cohort, with
the latter word being excluded when the sound /d/ is heard.

• Processing of the presented word continues until contextual
information and information from the word itself are sufficient to
eliminate all but one of the words in the word-initial cohort. The
uniqueness point is the point at which the initial part of a word is
consistent with only one word. However, words can often be
recognized earlier than that because of contextual information.
• Various sources of information (e.g., lexical, syntactic, semantic)
are processed in parallel. These information sources interact and
combine with each other to produce an efficient analysis of spoken
language.
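The bottom-up part of the original cohort model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny lexicon and its rough, space-separated phoneme transcriptions are invented for the example, the function names are hypothetical, and elimination by semantic or other context is left out.

```python
# Hypothetical mini-lexicon: word -> rough phoneme transcription (assumed, not
# taken from the slides). Each space-separated token stands for one phoneme.
LEXICON = {
    "crocodile": "k r o k @ d ai l",
    "crockery": "k r o k @ r i",
    "crocus": "k r ou k @ s",
    "crown": "k r au n",
    "cat": "k a t",
}


def cohort_history(spoken, lexicon=LEXICON):
    """Return the surviving cohort after each successive phoneme of `spoken`."""
    target = lexicon[spoken].split()
    cohort = set(lexicon)  # every word is a candidate before any input arrives
    history = []
    for n in range(1, len(target) + 1):
        heard = target[:n]
        # Keep only the words that still match everything heard so far.
        cohort = {w for w in cohort if lexicon[w].split()[:n] == heard}
        history.append(sorted(cohort))
    return history


def uniqueness_point(spoken, lexicon=LEXICON):
    """1-based phoneme position at which only `spoken` remains, else None."""
    for n, cohort in enumerate(cohort_history(spoken, lexicon), start=1):
        if cohort == [spoken]:
            return n
    return None


if __name__ == "__main__":
    for n, cohort in enumerate(cohort_history("crocodile"), start=1):
        print(n, cohort)
    print("uniqueness point:", uniqueness_point("crocodile"))
```

With this toy lexicon, “crockery” stays in the cohort until the sixth phoneme (/d/) is heard, which is the uniqueness point for “crocodile”. Contextual information, which the original model allows to remove candidates earlier, would act as an extra filter on the cohort.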
Revised model
Marslen-Wilson (1990, 1994) revised the cohort model. In
the original version, words were either in or out of the word
cohort. In the revised version, candidate words vary in their
level of activation, and so membership of the word cohort is
a matter of degree. Marslen-Wilson (1990) assumed that the
word-initial cohort may contain words having similar initial
phonemes rather than being limited only to words having the
initial phoneme of the presented word.

Revised model
There is a second major difference between the original and revised
versions of cohort theory. In the original version, context influenced word
recognition early in processing. In the revised version, the effects of
context on word recognition occur only at a fairly late stage of
processing.

Strengths of the Cohort Model
▫ First, the assumption that accurate perception of a spoken word involves
processing and rejecting several competitor words is generally correct.
▫ Second, there is the assumption that the processing of spoken words is
sequential and changes considerably during the course of their
presentation.
▫ Third, the revised version of the model has two advantages over the
original version.

1. The assumption that membership of the word cohort is
a matter of degree rather than being all-or-none is more
in line with the evidence.
2. There is more scope for correcting errors within the
revised version of the model because words are less
likely to be eliminated from the cohort at an early
stage.
What are the limitations of the cohort model?

• First, there is the controversial issue of the involvement
of context in auditory word recognition.
• Second, the modifications made to the original version of
the model have made it less precise and harder to test.
• Third, the processes assumed to be involved in
processing of speech depend heavily on identification of
the starting points of individual words. However, it is not
clear within the theory how this is accomplished.
TRACE MODEL
• McClelland and Elman (1986) and McClelland (1991) produced a
network model of speech perception based on connectionist
principles.
• Their TRACE model of speech perception resembles the interactive
activation model of visual word recognition put forward by
McClelland and Rumelhart (1981; discussed earlier in the chapter).
• The TRACE model assumes that bottom-up and top-down processes
interact flexibly in spoken word recognition. Thus, all sources of
information are used at the same time in spoken word recognition.
The TRACE model is based on the following theoretical assumptions:
• There are individual processing units or nodes at three different levels:
features (e.g., voicing; manner of production), phonemes, and words.
• Feature nodes are connected to phoneme nodes, and phoneme nodes are
connected to word nodes.
• Connections between levels operate in both directions, and are only
facilitatory.

• There are connections among units or nodes at the same
level; these connections are inhibitory.
• Nodes influence each other in proportion to their activation
levels and the strengths of their interconnections.
• As excitation and inhibition spread among nodes, a pattern
of activation or trace develops.
• The word recognised or identified by the listener is
determined by the activation level of the possible candidate
words.
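The assumptions above can be illustrated with a very small interactive-activation sketch. It is not the published TRACE implementation: the feature level and the time-slice structure are omitted, and the update rule, the parameter values (`excite`, `inhibit`, `decay`), the two-word lexicon, and the function names are all assumptions chosen for illustration. The input is ambiguous between /g/ and /k/ in first position and is followed by clear evidence for “ift”.

```python
def update(act, net, decay=0.1):
    """Push activation toward 1 for positive net input and toward 0 for negative."""
    delta = net * (1.0 - act) if net > 0 else net * act
    return min(1.0, max(0.0, act + delta - decay * act))


def run_network(lexicon, evidence, steps=30, excite=0.1, inhibit=0.1):
    """lexicon: word -> list of phonemes; evidence: (position, phoneme) -> strength."""
    # One phoneme node per (position, phoneme) pair that occurs in the lexicon
    # or in the input; one word node per lexicon entry.
    phonemes = {(i, p): 0.0 for segs in lexicon.values() for i, p in enumerate(segs)}
    for key in evidence:
        phonemes.setdefault(key, 0.0)
    words = {w: 0.0 for w in lexicon}

    for _ in range(steps):
        new_phonemes = {}
        for (pos, p), act in phonemes.items():
            bottom_up = evidence.get((pos, p), 0.0)
            # Facilitatory top-down input from words containing this phoneme here.
            top_down = sum(words[w] for w, segs in lexicon.items()
                           if pos < len(segs) and segs[pos] == p)
            # Inhibition from rival phonemes at the same position.
            rivals = sum(a for (q, r), a in phonemes.items() if q == pos and r != p)
            net = bottom_up + excite * top_down - inhibit * rivals
            new_phonemes[(pos, p)] = update(act, net)

        new_words = {}
        for w, segs in lexicon.items():
            # Facilitatory bottom-up input from the word's phonemes.
            support = sum(phonemes[(i, p)] for i, p in enumerate(segs))
            # Inhibition from rival words.
            rivals = sum(a for other, a in words.items() if other != w)
            new_words[w] = update(words[w], excite * support - inhibit * rivals)

        # Synchronous update: both levels change together.
        phonemes, words = new_phonemes, new_words
    return phonemes, words


LEXICON = {"gift": ["g", "i", "f", "t"], "kiss": ["k", "i", "s", "s"]}
EVIDENCE = {(0, "g"): 0.5, (0, "k"): 0.5, (1, "i"): 1.0, (2, "f"): 1.0, (3, "t"): 1.0}

phonemes, words = run_network(LEXICON, EVIDENCE)

if __name__ == "__main__":
    print({w: round(a, 3) for w, a in words.items()})
    print("g:", round(phonemes[(0, "g")], 3), "k:", round(phonemes[(0, "k")], 3))
```

Although /g/ and /k/ receive identical bottom-up evidence, the /g/ node ends up more active because the “gift” word node feeds activation back down to it. This is the kind of top-down influence on phoneme identification that the model is designed to capture.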
The TRACE model assumes that bottom-up
and top-down processes interact throughout
speech perception. In contrast, most versions of
the cohort model assume that top-down processes
(e.g., context-based effects) occur relatively late
in speech perception.

The TRACE model has various successes to its credit.

• First, it provides reasonable accounts of phenomena such as categorical speech
recognition, the lexical identification shift, and the word superiority effect in
phoneme monitoring.
• Second, a significant general strength of the model is its assumption that
bottom-up and top-down processes both contribute to spoken word recognition,
combined with explicit assumptions about the processes involved.
• Third, the model predicts accurately some of the effects of word frequency on
auditory word processing (e.g., Dahan et al., 2001).
• Fourth, “TRACE . . . copes extremely well with noisy input – which is a
considerable advantage given the noise present in natural language.”
Why does TRACE deal well with noisy and degraded speech?
TRACE emphasizes the role of top-down processes, and such processes become
more important when bottom-up processes have to deal with limited stimulus
information.

Limitations of the TRACE Model
• First, and most importantly, the model exaggerates the importance of
top-down effects on speech perception (e.g., Frauenfelder et al., 1990;
McQueen, 1991).
• Second, the TRACE model incorporates many different theoretical
assumptions. This can be regarded as an advantage in that it allows the
model to account for many findings, but the resulting flexibility also makes
the model hard to falsify.
• Third, tests of the model have relied heavily on computer simulations
involving a small number of one-syllable words.
• Fourth, the model ignores some factors influencing auditory word
recognition. As we have seen, orthographic information plays a significant
role in speech perception (Perre & Ziegler, 2008).

Cognitive Neuropsychology
In this section, we consider the processes involved in the task of repeating
a spoken word immediately after hearing it.

• In spite of the apparent simplicity of the repetition task, many
brain-damaged patients experience difficulties with it even though
audiometric testing reveals they are not deaf. Detailed analysis of these
patients suggests various processes can be used to permit repetition of
a spoken word.
• Information from brain-damaged patients was used by Ellis and
Young (1988) to propose a theoretical account of the processing of
spoken words.
This theoretical account has five components:
• The auditory analysis system extracts phonemes or other sounds from
the speech wave.
• The auditory input lexicon contains information about spoken words
known to the listener but not about their meaning.
• Word meanings are stored in the semantic system.
• The speech output lexicon provides the spoken form of words.
• The phoneme response buffer provides distinctive speech sounds.

These components can be used in various combinations so there are
several routes between hearing a spoken word and saying it.
The most striking feature of the framework is the
assumption that saying a spoken word can be achieved
using three different routes varying in terms of which
stored information about heard spoken words is
accessed.

• Auditory analysis system
Suppose a patient had damage only to the auditory analysis system,
thereby producing a deficit in phonemic processing. Such a patient
would have impaired speech perception for words and nonwords,
especially those containing phonemes that are hard to discriminate.
• The term pure word deafness describes patients with these
symptoms. There would be evidence for a double dissociation if we
could find patients with impaired perception of non-verbal sounds but
intact speech perception. Peretz et al. (1994) reported the case of a
patient having a functional impairment limited to perception of music
and prosody.
A crucial part of the definition of pure word deafness
is that auditory perception problems are highly
selective to speech and do not apply to non-speech
sounds. Many patients seem to display the necessary
selectivity. However, Pinard, Chertkow, Black, and
Peretz (2002) identified impairments of music
perception and/or environmental sound perception in
58 out of 63 patients they reviewed.
Three-route framework
Unsurprisingly, the most important assumption of the three-route
framework is that there are three different ways (or routes) that
can be used when individuals process and repeat words they
have just heard. These three routes differ in terms of the number
and nature of the processes used by listeners. All three routes
involve the auditory analysis system and the phoneme response
buffer.

Route 1 involves three additional components of the language
system (the auditory input lexicon, the semantic system, and the
speech output lexicon).
Route 2 involves two additional components (auditory input
lexicon and the speech output lexicon).
Route 3 involves an additional rule-based system that converts
acoustic information into words that can be spoken.
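
As a rough sketch, the three routes can be written down as sets of components, which makes it easy to ask which routes should remain usable when a given component is damaged. The data structure, the function name, and the label "rule-based conversion system" for Route 3's additional component are assumptions made for this illustration; real patients rarely show such clean, single-component damage.

```python
# Components shared by all three routes.
SHARED = {"auditory analysis system", "phoneme response buffer"}

# Components required by each route (labels are illustrative).
ROUTES = {
    1: SHARED | {"auditory input lexicon", "semantic system", "speech output lexicon"},
    2: SHARED | {"auditory input lexicon", "speech output lexicon"},
    3: SHARED | {"rule-based conversion system"},
}


def intact_routes(damaged):
    """Return the routes that use none of the damaged components."""
    damaged = set(damaged)
    return [route for route, parts in ROUTES.items() if not parts & damaged]


if __name__ == "__main__":
    print(intact_routes({"semantic system"}))           # Routes 2 and 3 remain
    print(intact_routes({"auditory input lexicon"}))    # only Route 3 remains
    print(intact_routes({"auditory analysis system"}))  # no route remains
```

On this simplified view, damage to the semantic system alone leaves repetition possible via Routes 2 and 3, whereas damage to the auditory analysis system disrupts all three routes.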
Word meaning deafness is a
condition in which there is a
selective impairment of the ability
to understand spoken (but not
written) language.

The three-route framework is along the right lines.
Patients vary in the precise problems they have with speech
perception (and speech production), and some evidence exists
for each of the three routes. At the very least, it is clear that
repeating spoken words can be achieved in various different
ways. Furthermore, conditions such as pure word deafness,
word meaning deafness and transcortical aphasia can readily
be related to the framework.

Limitations
• First, it is often difficult to decide precisely how patients’ symptoms
relate to the framework.
• Second, some conditions (e.g., word meaning deafness; auditory
phonological agnosia) have only rarely been reported and so their status is
questionable.

Chapter 10

Language Comprehension

In this chapter, we discuss the ways in which phrases,
sentences, and entire stories are processed and
understood during reading and listening.

What is the structure of this chapter? At a general level,
we start by considering comprehension processes at
the level of the sentence and finish by focusing on
comprehension processes with larger units of language
such as complete texts.

Two main levels of analysis in sentence comprehension
• First, there is an analysis of the syntactical (grammatical) structure of
each sentence (parsing).
• Second, there is an analysis of sentence meaning.

Parsing
Parsing is the analysis of the
syntactical or grammatical
structure of sentences.

Much of the research on parsing concerns the
relationship between syntactic and semantic analysis.
There are at least four major possibilities:
(1) Syntactic analysis generally precedes (and
influences) semantic analysis.
(2) Semantic analysis usually occurs prior to syntactic
analysis.
(3) Syntactic and semantic analysis occur at the same
time.
(4) Syntax and semantics are very closely associated,
and have a hand-in-glove relationship (Altmann,
personal communication).

Grammar or syntax
▫ Ideally, a grammar should be able to generate all the permissible
sentences in a given language, while at the same time rejecting all the
unacceptable ones.

Syntactic ambiguity
▫ You might imagine that parsing or assigning grammatical structure to
sentences would be easy.
▫ However, numerous sentences in the English language (e.g., “They are
flying planes”) have an ambiguous grammatical structure.
▫ Some sentences are syntactically ambiguous at the global level, in which
case the whole sentence has two or more possible interpretations. For
example, “They are cooking apples”, is ambiguous because it may or may not
mean that apples are being cooked.

Prosodic Cues
▫ Prosodic cues are features of spoken language such as stress, intonation,
and duration that make it easier for listeners to understand what is being
said.
▫ Prosodic cues are most likely to be used (and are of most value) when
spoken sentences are ambiguous.

Theories of Parsing

Garden-path model
Frazier and Rayner (1982) put forward a two-stage,
garden-path model. It was given that name because
readers or listeners can be misled or “led up the
garden path” by ambiguous sentences such as, “The
horse raced past the barn fell.”
The model is based on the following assumptions:
▫ Only one syntactical structure is initially considered for any sentence.
▫ Meaning is not involved in the selection of the initial syntactical
structure.
▫ The simplest syntactical structure is chosen, making use of two general
principles: minimal attachment and late closure.
▫ According to the principle of minimal attachment, the grammatical
structure producing the fewest nodes (major parts of a sentence such as noun
phrase and verb phrase) is preferred.
▫ The principle of late closure is that new words encountered in a sentence
are attached to the current phrase or clause if grammatically permissible.

• If there is a conflict between the above two principles,
it is resolved in favor of the minimal attachment
principle.

• If the syntactic structure that a reader constructs for a
sentence during the first stage of processing is
incompatible with additional information (e.g.,
semantic) generated by a thematic processor, then there
is a second stage of processing in which the initial
syntactic structure is revised.

The principle of minimal attachment can be illustrated
by the following example taken from Rayner and
Pollatsek (1989). In the sentences, “The girl knew the
answer by heart”, and, “The girl knew the answer was
wrong”, the minimal attachment principle leads to a
grammatical structure in which “the answer” is regarded
as the direct object of the verb “knew”. This is
appropriate only for the first sentence.
The principle of late closure produces the correct
grammatical structure in a sentence such as, “Since
Jay always jogs a mile this seems like a short distance
to him”. However, use of this principle would lead to an
inaccurate syntactical structure in the following sentence:
“Since Jay always jogs a mile seems like a short
distance”. The principle leads “a mile” to be placed in the
preceding phrase rather than at the start of the new phrase.
Of course, there would be less confusion if a comma were
inserted after the word “jogs”. In general, readers are less
misled by garden-path sentences that are punctuated (Hills
& Murray, 2000).
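
The first-stage choice described by the garden-path model can be sketched as a simple selection rule: prefer the candidate structure with the fewest nodes (minimal attachment), and use late closure only to break ties, since minimal attachment wins any conflict. The `Parse` class, the function name, and especially the node counts below are illustrative assumptions, not values from a real grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parse:
    label: str
    node_count: int                    # illustrative number of syntactic nodes
    attaches_to_current_phrase: bool   # True if consistent with late closure


def initial_parse(candidates):
    """Pick the first-stage structure using syntax alone (no meaning involved)."""
    # Sort key: fewest nodes first; among equals, prefer attachment to the
    # current phrase (False sorts before True, hence the negation).
    return min(candidates,
               key=lambda p: (p.node_count, not p.attaches_to_current_phrase))


# "The girl knew the answer ...": two candidate analyses of "the answer".
candidates = [
    Parse("sentence complement", node_count=7, attaches_to_current_phrase=True),
    Parse("direct object", node_count=6, attaches_to_current_phrase=True),
]
choice = initial_parse(candidates)

if __name__ == "__main__":
    print(choice.label)
```

With these assumed counts the direct-object analysis is selected, which fits “The girl knew the answer by heart” but would have to be revised in the second stage for “The girl knew the answer was wrong”.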
The model provides a simple and coherent account of
key processes in sentence processing. There is evidence
indicating that the principles of minimal attachment and
late closure often influence the selection of an initial
syntactic structure for sentences.

Limitations
• First, the assumption that the meanings of words within sentences do not
influence the initial assignment of grammatical structure is inconsistent
with some of the evidence (e.g., Trueswell et al., 1994).
• Second, prior context often seems to influence the interpretation of
sentences much earlier in processing than assumed by the model.
• Third, the notion that the initial choice of grammatical structure depends
only on the principles of minimal attachment and late closure seems too neat
and tidy.
• Fourth, the model does not take account of differences among languages.
• Fifth, it is hard to provide a definitive test of the model.

Constraint-based theories

There are substantial differences between constraint-based
theories and the garden-path model. According to constraint-based
theories, the initial interpretation of a sentence depends on
multiple sources of information (e.g., syntactic, semantic, general
world knowledge) called constraints. These constraints limit the
number of possible interpretations.

According to the theory, the processing system uses four language
characteristics to resolve ambiguities in sentences:

(1) Grammatical knowledge constrains possible sentence interpretations.
(2) The various forms of information associated with any given word are
typically not independent of each other.
(3) A word may be less ambiguous in some ways than in others (e.g.,
ambiguous for tense but not for grammatical category).
(4) The various interpretations permissible according to grammatical
rules generally differ considerably in frequency and probability on the
basis of past experience.

According to the model, the assignment of
syntactic structure to a sentence is influenced
by verb bias. Many verbs can occur within
various syntactic structures, but are found more
often in some syntactic structures than others.

Unrestricted race model

Van Gompel, Pickering, and Traxler (2000)
put forward the unrestricted race model that
combined aspects of the garden-path and
constraint-based models.

Its main assumptions are as follows:
(1) All sources of information (semantic as well as syntactic) are used to
identify a syntactic structure, as is assumed by constraint-based models.
(2) All other possible syntactic structures are ignored unless the favored
syntactic structure is disconfirmed by subsequent information.
(3) If the initially chosen syntactic structure has to be discarded, there is an
extensive process of re-analysis before a different syntactic structure is
chosen. This assumption makes the model similar to the garden-path
model, in that parsing often involves two distinct stages.

Good-enough representations
Nearly all theories of sentence processing (including those we have
discussed) have an important limitation. Such theories are based on the
assumption that the language processor “generates representations of the
linguistic input that are complete, detailed, and accurate” (Ferreira, Bailey, &
Ferraro, 2002, p. 11). An alternative viewpoint is based on the assumption of
“good-enough” representations. According to this viewpoint, the typical goal
of comprehension is “to get a parse of the input that is ‘good enough’ to
generate a response given the current task” (Swets et al., 2008, p. 211).

Cognitive neuroscience

Cognitive neuroscience is making substantial contributions
to our understanding of parsing and sentence
comprehension. Since the precise timing of different
processes is so important, much use has been made of
event-related potentials.

PRAGMATICS
Pragmatics is concerned with practical language use and
comprehension, especially those aspects going beyond the literal
meaning of what is said and taking account of the current social
context. Thus, pragmatics relates to the intended rather than literal
meaning as expressed by speakers and understood by listeners,
and often involves drawing inferences. The literal meaning of a
sentence is often not the one the writer or speaker intended to
communicate.

Theoretical approaches
Much theorizing has focused on figurative language in general and
metaphor in particular. According to the standard pragmatic model
(e.g., Grice, 1975), three stages are involved:
(1) The literal meaning is accessed. For example, the literal meaning of
“David kicked the bucket”, is that David struck a bucket with his
foot.
(2) The reader or listener decides whether the literal meaning makes
sense in the context in which it is read or heard.
(3) If the literal meaning seems inadequate, the reader or listener
searches for a nonliteral meaning that does make sense in the
context.
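
The three stages can be sketched as a short function. This is only an illustration of the serial ordering the standard pragmatic model assumes: the function name, the small store of nonliteral meanings, and the way context is modelled (as a yes/no check supplied by the caller) are all hypothetical.

```python
# Hypothetical store of nonliteral meanings, for illustration only.
NONLITERAL = {"David kicked the bucket": "David died"}


def interpret(utterance, literal_meaning, makes_sense_in_context,
              nonliteral_meanings=NONLITERAL):
    """Return the meaning chosen by a serial, literal-first process."""
    # Stage 1: the literal meaning is accessed first.
    meaning = literal_meaning
    # Stage 2: check whether the literal meaning makes sense in context.
    if makes_sense_in_context(meaning):
        return meaning
    # Stage 3: otherwise search for a nonliteral meaning that fits;
    # fall back on the literal meaning if none is stored.
    return nonliteral_meanings.get(utterance, meaning)


LITERAL = "David struck a bucket with his foot"

# Context in which the literal reading is acceptable (e.g., a farmyard story).
on_farm = interpret("David kicked the bucket", LITERAL, lambda m: True)
# Context in which a reading about buckets makes no sense (e.g., a funeral).
at_funeral = interpret("David kicked the bucket", LITERAL,
                       lambda m: "bucket" not in m)

if __name__ == "__main__":
    print(on_farm)
    print(at_funeral)
```

The key property of the sketch is that the nonliteral meaning is only sought after the literal meaning has been accessed and rejected, which is what makes the model a three-stage account.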
