
WORD PRODUCTION IMPAIRMENTS: EARLY CASES AND EARLY THEORIES

Before discussing recent theoretical innovations in the cognitive neuropsychology of word production, it is first
necessary to present an overview of early cognitive neuropsychological theories, and the empirical studies that
inspired them. In the late 1970s and early 1980s, when the CN approach first emerged, its primary objective was
to map out the major cognitive systems (‘‘components’’) involved in various tasks and their relationships to
each other (for some early examples relating to word production, see Ellis, 1982; McCarthy & Warrington,
1984; Morton, 1980; Morton & Patterson, 1980; for good reviews, see Cuetos, 2003; Nickels, 1997). The
resultant theories were often illustrated diagrammatically, with core components being represented as boxes,
interconnected with appropriately oriented arrows (hence the nickname box-and-arrow theories). One of the
primary sources of evidence that was used to argue for the existence of particular cognitive components was the
behaviour of individuals with aphasia. If two individuals were found to exhibit completely opposite patterns of
performance on two different tasks or skills, this strongly suggested that there were at least two distinguishable
cognitive systems involved in these skills (this approach is known as dissociation logic: for a detailed
discussion, see Shallice, 1988). The emphasis in these early CN approaches was squarely on the coarse-grained
organisation of major cognitive components—that is, their functional architecture. They neither attempted
nor purported to address the specific cognitive operations that supported processing within these components.
These details, it was argued, could be fleshed out at a later date.

In the area of spoken word production, the earliest CN case studies in the 1980s supported the existence of three
crucial cognitive components, each of which could give rise to a different pattern of performance when
impaired. The first pattern of performance is exemplified by the case of JCU (Howard & Orchard-Lisle, 1984)
whose profile is summarised in Table 1. Although JCU had a global aphasia, she could sometimes produce the
correct name for a picture if cued with its first sound (e.g., ‘‘starts with p’’). However, Howard and Orchard-
Lisle (1984) found that JCU was often misled by such a cue if it was inappropriate, particularly if it cued a
semantically related item (e.g., for the target tiger, the cue ‘‘starts with l’’ could elicit ‘‘lion’’). JCU frequently
failed to recognise her errors as incorrect. Perhaps most importantly, JCU made similar types of semantic
confusions in both production and comprehension: for example, when asked to judge whether an auditory word
matched a given picture, she had great difficulty rejecting incorrect, but semantically related, distractors (e.g.,
car vs bike). The pattern of performance shown by JCU and other similar cases was consistent with an inability
to access stored semantic information about words; the fact that both comprehension and production were
affected suggested that the same store was accessed in both modalities. JCU’s frequent semantic errors
suggested that she could sometimes access partial semantic descriptions of words; however, these were
sometimes not sufficiently specified to enable her to discriminate among highly similar items, or to reject
misleading cues. This led to the proposition that a crucial first step in spontaneous word production (and
naming) involved accessing representations within a central, non-modality-specific verbal semantic store (Ellis
& Young, 1988; Hillis, Rapp, Romani, & Caramazza, 1990; Howard & Orchard-Lisle, 1984; Howard &
Patterson, 1992).

The second pattern of performance is illustrated in case EST (Kay & Ellis, 1987). Unlike JCU, EST’s speech
was fluent and informative, although it contained many word-finding pauses and circumlocutory descriptions.
On picture-naming tasks, EST produced a range of different types of errors, including phonemic paraphasias
(e.g., grapes → ‘‘graffs’’), and semantic paraphasias (e.g., axe → ‘‘hammer’’). These errors were more common
on low- than on high-frequency words. Unlike JCU, EST’s auditory comprehension was normal, and he could
not be ‘‘miscued’’ in naming (see Table 1 for a summary of features of EST). These two latter features
suggested EST had no difficulty accessing the semantic representations of words. Nevertheless, he was unable
to consistently access specific information required for the accurate production of words. The phonemic
paraphasias in particular suggested incomplete access to information about the words’ phonological forms.
Cases such as EST’s led to the proposal that there was at least one further major cognitive component involved
in word production. This was conceptualised as a lexical store, separate and distinct from the verbal-semantic
store, which contained information about the phonological forms of words needed for production. This
component was commonly known as the phonological output lexicon (see, e.g., Ellis & Young, 1988; Kay &
Ellis, 1987; Patterson & Shewell, 1987).

A third distinct pattern of performance was described in the case of RL (Caplan, Vanier, & Baker, 1986). RL
spoke fluently and grammatically. His auditory comprehension was normal on simple word–picture matching
tasks. He almost never produced semantic paraphasias. RL’s major difficulty was that he produced a
considerable number of phonemic paraphasias, particularly on longer words. Many of these differed from the
intended word by only one or two phonemes. Unlike EST, the strongest predictor of his errors was not word
frequency, but rather word length. Crucially, RL made phonemic paraphasias not just in spontaneous speech and
picture naming, but also in word repetition and reading, and even in repetition of nonsense words such as
‘‘splent’’. Therefore, RL’s difficulty did not appear to be restricted to tasks that involved semantic and/or lexical
processing, but was evident even on tasks that did not explicitly require access to the mental lexicon. The
problem did not appear to involve actual motor programming of the articulators, because RL’s articulation of
phonological forms—both correct and incorrect—sounded normal (see Table 1 for a summary of RL’s profile).
In the context of existing theoretical frameworks, RL’s profile suggested impairment to a system that utilised a
phonological code, but that operated subsequent to lexical retrieval. This system was commonly characterised as
a buffer store, whose role was to temporarily retain phonological sequences planned for production until they
were ready to be articulated (see e.g., Bisiacchi, Cipolotti, & Denes, 1989; Bub, Black, Howell, & Kertesz,
1987; Morton & Patterson, 1980; Patterson & Shewell, 1987; see also Caramazza, Miceli, & Villa, 1986). This
component was termed the response buffer or the phonological output buffer.

In conclusion, what emerged from these kinds of studies was a framework that identified at least three major
processing components involved in word production. In theoretical frameworks at the time, these three systems
were usually embedded within a more general theory which specified the cognitive components participating in
other types of single word tasks, including word repetition, word comprehension, written word production, and
oral reading. Figure 1 illustrates the most influential theory of this type, based on Patterson and Shewell (1987;
for similar proposals see also Ellis & Young, 1988; Harris & Coltheart, 1986). The three components that were
considered crucial for spontaneous word production are shown in bold. This basic configuration gained wide
acceptance among the aphasia research community, and indeed it is still highly influential today. The key
terminology—terms such as ‘‘phonological output lexicon’’ and ‘‘phonological output buffer’’—is still
frequently used in current research to describe specific patterns of performance (for some recent examples, see
Biran & Friedmann, 2005; Goldrick & Rapp, 2007; Howard & Gatehouse, 2006; Papagno & Girelli, 2005;
Shallice, Rumiati, & Zadini, 2000). This theoretical framework has also been highly influential in assessment
and rehabilitation. It forms the basis of several recent cognitively oriented aphasia assessment batteries—e.g.,
The Psycholinguistic Assessment of Language Processing in Aphasia (PALPA: Kay, Lesser, & Coltheart,
1992); The Comprehensive Aphasia Test (CAT: Swinburn, Porter, & Howard, 2004). The general approach also
inspired new approaches to aphasia therapy, in which treatment is targeted towards the specific cognitive
components hypothesised to be impaired in each particular individual (for reviews see Laine & Martin, 2006;
Nickels & Best, 1996). One of the main reasons for the success of these functional architecture theories was that
they provided a means of describing important differences in the patterns of performance exhibited by different
individuals, many of which were not captured well within more traditional syndrome-based diagnostic schemes.
Also, by allowing for the possibility that a person could potentially suffer from an impairment to more than one
of the major cognitive systems, they offered considerably more flexibility than previous schemes. Another
strength was undoubtedly their conceptual transparency. These theories could be readily represented graphically
using simple box-and-arrow diagrams, which made them not only easy to grasp, but also easy to communicate
to others.

Nevertheless, there are limitations to the use of functional architecture theories. First, the descriptive terms used
in these theories are quite idiosyncratic to the neuropsychological literature, and do not map neatly onto the
terms used in theoretical descriptions of normal word production. Consequently, a heavy reliance on these
frameworks can limit the potential contribution of patient methods to language production research more
generally. A second disadvantage is that, within these types of functional architecture theories, it becomes
deceptively easy to mistake description for explanation. For example, within a functional architecture scheme,
an individual who makes phonological errors when attempting to produce words (particularly rare words) may
be described as being ‘‘unable to access the phonological output lexicon’’. This description forms a useful
shorthand for describing the person’s pattern of performance and a simple way of conceptualising their problem.
However, it does not identify the precise processes that are impaired, and offers few predictions about
performance on other types of words or tasks, or about the kinds of treatments that might be most effective.
Third, although functional architecture theories may appear to provide a useful starting point for theorising,
enabling researchers to then ‘‘fill in the details’’ of the specific impairment through subsequent research, this
enterprise may be more problematic than it might first appear. As will be illustrated below, when researchers
began to consider the relevant cognitive processes at a finer-grained level, the number and function of the
‘‘components’’ themselves began to change dramatically. Some systems that seemed to be required to account
for particular patterns of performance suddenly became unnecessary, and others changed their character
completely. Further, as will be seen below, certain language difficulties that are considered completely unrelated
when viewed in the context of a functional architecture framework began to reveal commonalties when
considered at a finer-grained level. It would appear that the relationship between coarse-grained, functional
architecture type theories and finer-grained theories is not as transparent as it first seems.

The following sections of this article will review more recent approaches to aphasic word production that aim to
go beyond mere characterisation of the major component(s) impaired within a functional architecture type
framework. Many of these have been strongly influenced by theoretical concepts from cognitive psychology and
cognitive science. Therefore, the review will begin by outlining some of the most influential of these concepts.
It will then discuss the various theoretical accounts of aphasic word production that have been inspired by them,
and the new avenues of research that have in turn been stimulated by these accounts.

KEY CONCEPTS FROM THEORIES OF NORMAL WORD PRODUCTION

In the late 1970s and 1980s, an idea that became popular in the cognitive psychology literature was that the
process of word retrieval took place in two major stages. The first of these stages was generally characterised as
a lexical selection stage, and its primary purpose was to identify the most appropriate word within the mental
lexicon to match the desired concept. At this stage, the representations of individual words were not considered
to be specified in terms of their phonological form. It was proposed that information about the selected word’s
sound form was retrieved during the second major stage, the phonological retrieval stage (e.g., Dell, 1986, 1988;
Garrett, 1975; Harley, 1984; Kempen & Huijbers, 1983; Levelt, 1989; MacKay, 1987; Stemberger, 1985). This
basic two-stage concept is illustrated in Figure 2. Individual theories differed as to whether they also included
intermediate stages (for example, ones that decompose words into their morphemes and/or syllables, or ones that
organise phonemes into larger multi-word phrases). Nevertheless, almost all shared the basic idea of two major
steps or stages. Indeed, this two-stage concept remains a central feature of most recent theories of word
production (see, e.g., Caramazza, 1997; Levelt, Roelofs, & Meyer, 1999; Rapp & Goldrick, 2000; Roelofs,
2004; Ruml, Caramazza, Capasso, & Miceli, 2005; Schwartz, Dell, Martin, Gahl, & Sobel, 2006).

Another idea that gained ground in cognitive psychology in the 1980s was the description of particular cognitive
processes in spreading activation terms (e.g., Dell, 1986, 1988; Harley, 1984; Roelofs, 1992, 1997; Stemberger,
1985, 1990). For example, the simple two-stage theory illustrated in Figure 3 describes the major stages of word
production in terms of spreading activation through a very simple neural network consisting of words, their
semantic representations, and their constituent phonemes (the example shown is from Dell & O’Seaghdha,
1991, 1992; for similar proposals see Dell, 1986; Dell, Schwartz, Martin, Saffran, & Gagnon, 1997; Foygel &
Dell, 2000). In this particular theory, the network contains units that correspond to individual lexical items, their
phonemes, and their semantic features. Corresponding units at different levels are connected to one another: for
example, the lexical unit ‘‘cat’’ is connected to the semantic feature units ‘‘has four legs’’, ‘‘has fur’’, and
‘‘goes meow’’, and also to the phonological units /k/, /a/ and /t/.
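The network fragment just described can be written down as a small data structure. The following is a minimal sketch in Python, assuming invented entries for ‘‘dog’’ and ‘‘mat’’ alongside the ‘‘cat’’ example from the text; it is an illustration of the connectivity, not an implementation of any published model.

```python
# A toy fragment of a Dell-style network: each lexical unit is connected
# upwards to its semantic feature units and downwards to its phoneme units.
# Only the "cat" entry comes from the text; the rest is invented.
NETWORK = {
    "cat": {"features": {"has four legs", "has fur", "goes meow"},
            "phonemes": ("k", "a", "t")},
    "dog": {"features": {"has four legs", "has fur", "barks"},
            "phonemes": ("d", "o", "g")},
    "mat": {"features": {"lies on the floor", "is flat"},
            "phonemes": ("m", "a", "t")},
}

def shared_features(word_a, word_b):
    """How many semantic feature units two lexical units both connect to."""
    return len(NETWORK[word_a]["features"] & NETWORK[word_b]["features"])

def shared_phonemes(word_a, word_b):
    """How many phoneme units two lexical units both connect to."""
    return len(set(NETWORK[word_a]["phonemes"]) & set(NETWORK[word_b]["phonemes"]))

print(shared_features("cat", "dog"))  # 2: a semantic neighbour of "cat"
print(shared_phonemes("cat", "mat"))  # 2: a phonological neighbour of "cat"
```

In a network like this, overlap in shared units is what makes ‘‘dog’’ a semantic competitor of ‘‘cat’’, and ‘‘mat’’ a phonological one.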

In this theory, word production proceeds as follows. First, during the lexical selection stage, the semantic
representation of the concept, which is expressed as a pattern of activation across semantic feature units, begins
to transmit activation to lexical units. The lexical units of associated words also receive some activation (for example, if the
target item is a cat, the lexical unit for ‘‘dog’’ will also become activated because of its shared features).
However, the word that contains the greatest number of relevant semantic features will generally receive the
most activation. The spread of activation is continuous, so each lexical unit, once activated, automatically starts
activating its corresponding phonological units. The lexical selection stage is complete when the most highly
activated lexical unit is ‘‘selected’’ for production (at which time it is given an additional activation boost). If
the lexical selection stage goes awry for some reason, an incorrect word may be produced. This will commonly
be a semantically related word because the items become partially activated during the lexical selection stage.
During the phonological retrieval stage, activation spreads from lexical units to their corresponding
phonological units. This process starts even before lexical selection is complete, but gains greater momentum
after the selected lexical unit receives its ‘‘boost’’. The phonological retrieval stage is complete when a
phoneme has been selected for each position in the word. An incorrect selection at this stage will most
commonly lead to a phonological error, in which one or more of the target word’s phonemes are incorrect or missing (e.g., castle → ‘‘cacksel’’).

Spreading activation theories
like these substantially changed the theoretical landscape in the field of normal word production. They led to a
shift in attention away from the nature of the linguistic representations themselves, and towards the processes
that map between the different types of representations. In addition, they led to a change in the way researchers
viewed speech errors. In spreading activation theories, the activation level of any unit reflects the sum of the
activation it receives from all other units—an idea sometimes called activation summation. Consequently, each
unit’s total activation depends crucially on which other units are also activated at the time. This could vary
considerably from task to task—for example, in a task like picture naming, lexical units will receive most of
their activation from their corresponding semantic units, whereas in a task like repetition the lexical unit might
receive additional activation directly from the auditory stimulus itself. This led to a whole new way of thinking
about differences in performance across tasks. The next section discusses how these ideas influenced explanations of aphasic disorders.
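The two-stage procedure walked through above can be sketched in a few lines of Python. This is a deliberately minimal illustration, not the Dell and O’Seaghdha model itself: the two-word lexicon, the activation values, and the size of the selection ‘‘boost’’ are all invented for the example, and real models iterate activation over many time steps rather than summing once.

```python
# A minimal sketch of the two-stage spreading activation procedure.
# Lexicon, feature sets, and boost size are invented for illustration.
LEXICON = {
    "cat": {"features": {"has four legs", "has fur", "goes meow"},
            "phonemes": ("k", "a", "t")},
    "dog": {"features": {"has four legs", "has fur", "barks"},
            "phonemes": ("d", "o", "g")},
}

def produce(target_features, boost=1.0):
    # Stage 1: lexical selection. Each lexical unit sums the activation it
    # receives from the semantic feature units it shares with the target
    # (activation summation), and the most active unit is selected.
    lexical_act = {word: float(len(entry["features"] & target_features))
                   for word, entry in LEXICON.items()}
    selected = max(lexical_act, key=lexical_act.get)
    lexical_act[selected] += boost   # the selected unit gets an extra jolt

    # Stage 2: phonological retrieval. Activation cascades from every active
    # lexical unit to its phoneme units, so a competitor's phonemes are
    # weakly active too, but the selected word's phonemes dominate.
    phoneme_act = {}
    for word, act in lexical_act.items():
        for p in LEXICON[word]["phonemes"]:
            phoneme_act[p] = phoneme_act.get(p, 0.0) + act
    return selected, phoneme_act

word, phoneme_act = produce({"has four legs", "has fur", "goes meow"})
print(word)                                # cat
print(phoneme_act["k"], phoneme_act["d"])  # 4.0 2.0
```

Because activation also reaches the competitor’s phonemes (/d/, /o/, /g/), a noisy version of such a network can occasionally select a wrong phoneme, which is one way spreading activation accounts model phonemic paraphasias.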


The notion of spreading activation remains central to virtually all current theories of normal word production,
despite variation in the numbers and types of units each theory possesses, the way activation flows through the
network, and the extent to which it is modulated by network-external processes. However, recent debates in the
literature on normal word production have centred largely on the question of how activation flows throughout
the network. One hotly debated issue is whether activation in the network is ‘‘gated’’ so that only one lexical
unit substantially activates its corresponding phonological units, or whether activation flows ‘‘in cascade’’ so
that several lexical units substantially activate their phonological units, even before a single lexical item has
been selected (Cutting & Ferreira, 1999; Navarrete & Costa, 2005; Peterson & Savoy, 1998; Rapp & Goldrick,
2000). These two contrasting proposals are illustrated in panels (a) and (b) of Figure 4. The theory of Dell and
colleagues described earlier (Figure 3) allows for some cascade, because activation can start spreading to
phonological units even before lexical selection is complete. However, some theories do not permit any cascade at all (e.g., Levelt et al., 1999; Roelofs, 2004).
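The gated and cascaded alternatives can be contrasted in a small sketch. The two-word lexicon and the activation values below are invented for illustration; the only point is how the flow of activation to phoneme units differs under the two proposals.

```python
# Toy contrast between "gated" and "cascaded" activation flow from the
# lexical level to the phoneme level. Lexicon and values are invented.
PHONEMES = {"cat": ("k", "a", "t"), "dog": ("d", "o", "g")}

def phoneme_activation(lexical_act, selected, cascade):
    """Spread lexical activation to phoneme units.

    cascade=True:  every active lexical unit feeds its phonemes.
    cascade=False: gated flow -- only the selected lexical unit does.
    """
    sources = lexical_act if cascade else {selected: lexical_act[selected]}
    phoneme_act = {}
    for word, act in sources.items():
        for p in PHONEMES[word]:
            phoneme_act[p] = phoneme_act.get(p, 0.0) + act
    return phoneme_act

lexical_act = {"cat": 4.0, "dog": 2.0}   # "cat" selected, "dog" a competitor
gated = phoneme_activation(lexical_act, "cat", cascade=False)
cascaded = phoneme_activation(lexical_act, "cat", cascade=True)
print("d" in gated)      # False: the competitor's phonemes stay silent
print("d" in cascaded)   # True: the competitor's phonemes are also active
```

Under gating, only the winning word’s phonemes ever become active; under cascade, a competitor’s phonemes carry some activation even before selection is settled.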


Another issue concerns whether activation flows only downwards—from semantic to lexical and then to phonological units—or whether it can also feed back from phonological to lexical
units, and from lexical to semantic units. This idea is illustrated in panel (c) of Figure 4. If activation can ‘‘feed
back’’ up the network, then it is possible for events happening during the phonological retrieval stage to
influence events occurring during lexical selection. In the Dell theory discussed earlier (Figure 3) feedback is
permitted, but in many other theories it is not (e.g., Laine, Tikkala & Juhola, 1998; Levelt et al., 1999; Roelofs,
2004; Ruml et al., 2005; for a recent discussion of this issue, see Rapp & Goldrick, 2004; Roelofs, 2004).
Theories that allow for feedback (so-called ‘‘interactive’’ theories) make several important predictions. For
example, in addition to semantic and phonological errors, they predict the occurrence of a third type of error,
called a formal error or a formal paraphasia. In a formal error, a similar-sounding real word is produced instead
of the intended word (e.g., camel → ‘‘candle’’). These errors are predicted because activation feeds back from
phonological to lexical units, and therefore words that are phonologically similar to the target word will tend to
be more activated than unrelated words. They therefore have a greater chance of being selected in place of the
target. There is empirical support for this prediction of interactive theories: In many collections of normal
spontaneous speech errors, formal errors have been found to occur more often than would be expected by
chance alone (Dell & Reich, 1981; Nooteboom, 2005), and they also occur at higher-than-chance rates in
certain kinds of aphasic conditions (see detailed discussion below).

In theories where no feedback is permitted, an explanation can be offered for the occurrence of formal errors,
but it involves making additional assumptions about the way in which errors are monitored and corrected by the
speaker prior to articulation. One explanation that has been offered is that the formal errors themselves are
chance occurrences—that is, they are either unrelated word errors where the word just happens to be
phonologically related to the target, or they are phonological errors where the phoneme string just happens to
correspond to a real word. The reason they occur more often than chance is that speakers are less likely to detect
and correct them than they are other types of errors, because they are so similar in form to the target, and are
also themselves legitimate words (see Levelt et al., 1999; Roelofs, 2004).

The predictions of interactive theories are not limited to the issue of formal errors. These theories also predict
the occurrence of other types of errors, such as ‘‘mixed’’ errors (errors that are both phonologically and
semantically related to the target, such as carrot–cabbage). Several studies have found that these kinds of errors
also occur more frequently than chance in both normal speakers (Dell & Reich, 1981; Harley, 1984) and in
some individuals with aphasia (Blanken, 1990, 1998; Laine & Martin, 1996; Rapp & Goldrick, 2000). In
theories that allow for feedback, these errors occur because a word that is both semantically and phonologically
related to the target receives activation from two sources: It receives top-down activation via the semantic
features it shares with the target, and it receives feedback activation via the phonemes it shares with the target.
In theories that do not allow for feedback, further assumptions must be made in order to explain such errors (for
example, the error-monitoring system has to be attributed with the ability to evaluate semantic as well as
phonological appropriateness). We return to these issues in the discussion of aphasia below.
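The two activation sources behind mixed errors can be made concrete with a toy calculation. The candidate words, overlap counts, and connection weights below are all invented, and real interactive models settle activation over many time steps rather than summing once; the sketch only shows why a doubly related candidate is favoured when feedback exists.

```python
# Toy illustration of why mixed errors are favoured in interactive theories:
# a candidate's activation sums top-down support (shared semantic features)
# and feedback support (shared phonemes). All numbers are invented.
TARGET = "carrot"

# (shared semantic features with target, shared phonemes with target)
CANDIDATES = {
    "cabbage": (3, 2),   # mixed: semantically and phonologically related
    "turnip":  (3, 0),   # purely semantic competitor
    "carat":   (0, 4),   # purely formal competitor
    "tractor": (0, 0),   # unrelated
}

SEMANTIC_WEIGHT = 1.0   # strength of top-down semantic connections
FEEDBACK_WEIGHT = 0.5   # strength of phoneme-to-lexical feedback

def activation(word):
    """Total activation a candidate receives from its two sources."""
    sem, phon = CANDIDATES[word]
    return SEMANTIC_WEIGHT * sem + FEEDBACK_WEIGHT * phon

for word in CANDIDATES:
    print(word, activation(word))
# cabbage 4.0 / turnip 3.0 / carat 2.0 / tractor 0.0
```

Setting FEEDBACK_WEIGHT to zero leaves ‘‘cabbage’’ and ‘‘turnip’’ tied, which is why feedback-free theories need an additional mechanism, such as a lenient pre-articulatory error monitor, to explain the above-chance rate of mixed errors.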
