
This article was downloaded by: [b-on: Biblioteca do conhecimento online UTAD]

On: 11 February 2013, At: 09:56


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer
House, 37-41 Mortimer Street, London W1T 3JH, UK

Cognitive Neuropsychology
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/pcgn20

NORMAL AND IMPAIRED SPELLING IN A CONNECTIONIST DUAL-ROUTE ARCHITECTURE
George Houghton (a) & Marco Zorzi (b)
(a) University of Wales, Bangor, UK
(b) Università di Padova, Italy
Version of record first published: 09 Sep 2010.

To cite this article: George Houghton & Marco Zorzi (2003): NORMAL AND IMPAIRED SPELLING IN A CONNECTIONIST DUAL-ROUTE ARCHITECTURE, Cognitive Neuropsychology, 20:2, 115-162

To link to this article: http://dx.doi.org/10.1080/02643290242000871


COGNITIVE NEUROPSYCHOLOGY, 2003, 20 (2), 115–162


This paper presents a dual-route connectionist model of spelling, in which one route maps directly from sound to spelling (phonemes to graphemes), while in the other route the mapping is mediated by a further level of representation. The direct route is implemented as a two-layer associative network, with syllabically structured phonemic (input) and graphemic (output) representations, which comes to behave as a productive sound-to-spelling conversion mechanism through exposure to a corpus of monosyllabic words. The mediated route is modelled as a frequency-sensitive lexical pathway. Nodes representing more frequent words become activated more rapidly than those of lower-frequency words. Access to both routes occurs in parallel, and the final spelling is determined by the combined output of both routes. We show that the model accounts for a wide range of data from normal spellers (including nonword spelling, the variability in vowel spelling and the effect of surrounding phonological context, and the frequency effect and its interaction with spelling regularity). We also investigate the effect of a selective lesion to the lexical route in which the ceiling of lexical activation is lowered. This manipulation produces a model with surface dysgraphic characteristics, which is tested against data from two impaired subjects. As well as simulating the classic surface dysgraphic profile, including a frequency by regularity interaction, the model exhibits a phenomenon that has only recently been reported, and which provides strong evidence for the idea that multiple routes are active in parallel, and combine to produce the final spelling.

INTRODUCTION

In literate societies, a considerable part of our experience of language involves its written form. For children, learning to read and write is the most important goal of their early school years, and provides the indispensable foundation for all later formal education. Literacy may have effects on other language skills; for instance, there is evidence that awareness of individual phonemes arises as a consequence of learning to read and write in an alphabetic script (Goswami & Bryant, 1990; Morais, Cary, Alegria, & Bertelson, 1979). It is also likely that the acquisition of literacy, and even the nature and regularity of the writing system learned, has implications for brain development. For instance, in a recent neuroimaging study comparing reading in English and Italian, Paulesu et al. (2000) report differences in the distribution of brain activation as a function of the “depth” of the orthography.

Requests for reprints should be addressed to George Houghton, School of Psychology, University of Wales, Bangor, Bangor,
Gwynedd, LL57 2DG, UK (Tel: (0)1248 382692; Fax: (0) 1248 382599; Email: g.houghton@bangor.ac.uk).
We are grateful to Michael Cortese, David Plaut, Brenda Rapp, and Marie Josephe Tainturier for their helpful comments on an
earlier version of this article.

© 2003 Psychology Press Ltd


http://www.tandf.co.uk/journals/pp/02643294.html DOI:10.1080/02643290242000871
HOUGHTON AND ZORZI

It is no surprise, then, that educational, cognitive, and neuro-psychologists have a long-standing interest in the processing of written language. Although more experimental work has been done on reading than on writing, the production of written language, in particular spelling, has not been neglected (Brown & Ellis, 1994; Frith, 1980). However, in the last decade or so the empirical work on reading has been accompanied by a vigorous programme of theoretical work, involving computational modelling of many aspects of normal and impaired reading by a number of different groups (e.g., Brown, 1987; Bullinaria, 1997; Coltheart & Rastle, 1994; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Grainger & Jacobs, 1996; Harm & Seidenberg, 1999; Hinton & Shallice, 1991; McClelland & Rumelhart, 1981; Norris, 1994; Plaut, McClelland, Seidenberg, & Patterson, 1996; Plaut & Shallice, 1993; Seidenberg & McClelland, 1989; Sejnowski & Rosenberg, 1987; Zorzi, 2000; Zorzi, Houghton & Butterworth, 1998a, 1998b). Research on spelling has not so far given rise to a theoretical literature of comparable breadth and depth (see the General Discussion section for review). In this paper, we present the first model of spelling with a scope comparable to that achieved in some of the models of reading.

Spelling and the dual-route framework

In alphabetic scripts, the written symbols are intended to stand for words by virtue of representing their pronunciation; in particular, written graphemes (single letters, or letter combinations such as SH in English, or LL in Welsh) stand for spoken phonemes. This way of representing spoken language was a significant cultural invention, and evolved from earlier logographic scripts in which individual symbols had semantic rather than phonological significance (Gelb, 1952). The advantage of an alphabet lies in its economy and flexibility. No language has more than a few dozen phonemes, compared to thousands of words and morphemes. Thus an alphabetic script allows an entire language to be written with relatively few symbols, and the writing of new words (for instance, foreign loan words and place names) does not require the invention of new symbols.

For the psychologist, one of the most striking features of alphabetic writing is that in principle it provides more than one method by which a word may be spelled (or read). Take the English word “bat.” Intuitively, one could write it correctly either by learning it as a sequence of three letters, or by translating its constituent phonemes /b&t/¹ into letters, /b/ = B, /&/ = A, /t/ = T. In a completely regular, consistent alphabetic script, the second method would always generate the canonical spelling, and many modern orthographies come very close to this ideal (German, Spanish, Welsh, Italian, etc.). Others, however, do not, and considerable attention has been paid to the “quasi-regularity” problem: in English, for instance, sound-to-spelling relationships are basically regular but with many inconsistencies and specific exception words.

The relationship between the pronunciation and the spelling of a word can be used to classify it with respect to how regular and consistent its spelling is relative to the major sound–spelling regularities of the language. A broad but useful classification distinguishes between regular-consistent (RC), regular-inconsistent (RI), and irregular or exception (EXC) words. Words of the first two categories are regularly spelled in the sense that the sounds of the word are represented by graphemes that represent the same sounds in many other words. In the case of RC words, the sound-to-spelling relationship is also consistent, in that the sounds in question are always (or almost always) spelled the same way. For instance, words such as plank, lift, shop, thing, etc. are both regular and consistent. Each phoneme is represented by the grapheme that always represents

¹ Phonemic symbol key (British English pronunciation): Vowels: /A/ as in cAR, /&/ as in cAt, /e/ as in bEt, /I/ as in hIt, /o/ as in hOt, /V/ as in hUt, /i/ as in bEAt, /u/ as in bOOt, /U/ as in pUt, /O/ as in dOOR, /@U/ as in lOAf, /aI/ as in fIle, /eI/ as in wAIt, /oI/ as in sOIl, /3/ as in bURn, /I@/ as in chEER. Consonants: most have standard values, e.g., /d/ as in Door. Also /dZ/ as in Jet, /S/ as in SHed, /tS/ as in CHin, /9/ as in siNG, /T/ as in THin, /D/ as in THat.



CONNECTIONIST DUAL-ROUTE SPELLING

that sound in the context in which it occurs. So all English words that begin with the dental fricative /T/ begin with the letters TH; (virtually) all monosyllabic words containing the short vowel /I/ are spelled with an I. By contrast, regular-inconsistent words contain sounds that may be spelled in more than one way. The major locus of inconsistency in English spelling is in the vowel (see Kessler & Treiman, 2001, for a quantitative analysis). For instance, while words such as feet, corn, paid, etc. show a “regular” spelling pattern, the vowels in them are often represented by other graphemes. So the vowel /i/ of feet is often represented by the grapheme EA (e.g., treat, feat). In British English, the long /O/ of corn could be represented by the grapheme AW (e.g., yawn, lawn, brawn), and so on. Exception, or irregular, words contain at least one very low-frequency phoneme–grapheme relationship, e.g., /eI/ → EA in great, /V/ → O in front, /S/ → CH in chef. It is important to note here that the classification of words in this way is not independent of the direction of the sound–spelling relationship one is considering. Thus some words that are regular and consistent from the point of view of reading (e.g., green could only be pronounced /grin/), are inconsistent with respect to spelling (/grin/ could be spelled grean, by analogy with mean, lean, bean, etc., or possibly even grene, as in scene; see Ziegler, Stone, & Jacobs, 1997).

The standard way of assessing the psychological “strength” of a particular sound–spelling relationship is to test its productivity in reading or spelling phonologically legal nonwords (or pseudowords). Literate adults can easily read and write such words and generally do so in a way that conforms to the dominant regularities of the language. To take an example from reading, the vowel letter I, when followed by a group of consonant letters, is virtually always pronounced as in hint, pick, swim, grist, etc. However, in the exception word pint it corresponds to a diphthong. But the pseudoword WINT is most likely to be pronounced to rhyme with lint rather than with pint. Where there is more inconsistency in a mapping, this often manifests itself as greater variability in the responses to pseudowords. For instance, in a study on nonword spelling by Barry and Seymour (1988), all participants spelled the vowel in the pseudoword /pIv/ with an I. In contrast, the majority of participants spelled the pseudoword /teIn/ in one of two different ways, tane (63%) or tain (37%; the percentages are derived from raw scores reported in Barry & Seymour, 1988, Appendices 1 and 2).

The idea that the sound-to-spelling regularities of an alphabetic script may be separately represented from knowledge of the spelling of individual words has a long history (see, e.g., Baron & Strawson, 1976; Coltheart, 1978, 1985; Forster & Chambers, 1973; Morton, 1980), and has given rise to the “dual-route” framework for both reading and spelling. This is illustrated in Figure 1 for the case of spelling.

One route of this framework, which we will refer to as the lexical route, is postulated to be memory based. Through extensive experience, literate adults come to memorise the spelling of individual words, and these are stored in the orthographic output lexicon (OOL). How this is done, and the nature of the lexical representations formed, is not specified by the framework, and may differ extensively between models that conform to it (see Shallice, Glasspool, & Houghton, 1995, for a comparison between two lexical spelling models). These lexical representations can be accessed in various ways, for instance from the sound of a familiar word, or from its meaning (see Barry, 1994, for discussion of a number of possible

[Figure 1 diagram. Components: Phonological input; Semantics; Phonological input lexicon; Sound–spelling conversion; Orthographic output lexicon; Graphemic output buffer; Handwriting; Typing.]

Figure 1. A typical “dual-route” model for spelling. Dotted arrows show the lexical route, bold arrows the sound–spelling conversion route. A direct input from semantics is also shown.
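As a purely illustrative aside, the division of labour between the two routes in Figure 1 can be sketched in a few lines of Python. This is a toy under stated assumptions, not the model developed in this paper: the two-word lexicon, the one-grapheme-per-phoneme table, and the names LEXICON, PHONEME_TO_GRAPHEME, spell_lexical, spell_sublexical, and spell are all hypothetical, and the lexicon-first control scheme is only one of the possibilities that the framework leaves open. Phonemes are written in the notation of Footnote 1.

```python
# Toy illustration of the two routes in Figure 1.
# All data and names are hypothetical; each route is reduced to a table lookup.

# Lexical route: memorised spellings, a stand-in for the orthographic
# output lexicon (OOL). Keys are phoneme sequences (Footnote 1 notation).
LEXICON = {
    ("b", "&", "t"): "bat",
    ("g", "r", "eI", "t"): "great",  # exception word: /eI/ -> EA
}

# Sound-spelling conversion route: one assumed dominant grapheme per phoneme.
PHONEME_TO_GRAPHEME = {
    "b": "b", "&": "a", "t": "t", "g": "g", "r": "r",
    "eI": "ai", "p": "p", "I": "i", "v": "v",
}


def spell_lexical(phonemes):
    """Return the memorised spelling, or None if the word is not known."""
    return LEXICON.get(tuple(phonemes))


def spell_sublexical(phonemes):
    """Assemble a spelling phoneme by phoneme from the conversion table."""
    return "".join(PHONEME_TO_GRAPHEME[p] for p in phonemes)


def spell(phonemes, lexical_route_intact=True):
    """Assumed control scheme: use the lexicon if possible, else convert."""
    if lexical_route_intact:
        word = spell_lexical(phonemes)
        if word is not None:
            return word
    return spell_sublexical(phonemes)
```

With the lexical route available, spell(["g", "r", "eI", "t"]) gives great; with lexical_route_intact=False the conversion route alone yields the phonologically plausible grait. The nonword /pIv/ is spelled piv either way, since it has no lexical entry.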


routes to the OOL). The second route deals with the sublexical sound–spelling conversion process, and we will generally refer to this as the phonological route (other terms used include the assembly route, sound-to-spelling route, and phoneme–grapheme conversion). Again, the dual-route framework itself does not specify how this route is formed or how it operates, and many possibilities are compatible with the framework. For instance, the analogous route for reading (spelling-to-sound conversion) is modelled by Coltheart and Rastle (1994) as a serial-processing, symbolic production system, and by Zorzi et al. (1998a) as a parallel-processing, associative network (see Zorzi, 2000, for further discussion of serial vs. parallel processing in phonological assembly). A further possible source of variation amongst models concerns the nature of the interactions between the two routes, and control over when each route is to be used.

Despite the many important differences amongst possible dual-route models, there is a central point of agreement regarding the division of labour within the system. The lexical route may be used to spell any learned word, and is crucial at least for the correct spelling of exception words. The phonological route translates the constituent sounds of words into letters on the basis of sublexical sound-to-spelling regularities; hence exception words, by definition, will be regularised (misspelled) by this route. The phonological route is at least used to spell novel, or poorly learned, words. Its contribution to the spelling of known words is not defined by the dual-route framework, and depends on the particular theory. However, it is generally supposed that any plausible mechanism will be capable of generating the correct spelling of at least the regular-consistent words (in languages with very regular orthographies, this set will include most of the words in the language).

Strong support for the basic assumptions of the dual-route framework has been derived from studies of subjects with spelling impairments, and modelling data from such studies is an important aim of this paper. Before describing the model, we briefly review data from neuropsychological studies supportive of the basic functional division discussed here.

Surface dysgraphia

The cognitive operations involved in the process of oral and written spelling have been investigated in a number of neuropsychological studies (see Denes, Cipolotti, & Zorzi, 1999; Shallice, 1988, for reviews). Similarly to the case of reading disorders, the study of acquired spelling disorders (“dysgraphia”) provides strong evidence for a two-process model of spelling. More specifically, the observation of a neuropsychological double dissociation between the ability to spell known words (in particular exception words) and the ability to spell nonwords has been taken as evidence for the existence of functionally independent lexical and sublexical spelling procedures.

First described by Beauvois and Derousné (1981), surface dysgraphia is characterised by poor spelling of words whose spellings cannot be unambiguously derived from their pronunciations. Surface dysgraphic patients are impaired in spelling words with exceptional sound–spelling correspondences, whereas their ability to convert phonemes into graphemes (i.e., nonword spelling) is well preserved. The syndrome was first termed lexical or orthographic agraphia, to indicate that the disorder originated from a deficit involving the lexical spelling procedure. The complementary deficit, named phonological agraphia, was described by Shallice (1981) and is characterised by a dissociation between a good ability to write words (both regular and exception) and a poor performance in writing nonwords to dictation. For example, the patient described by Shallice wrote 94% of words correctly but only 18% of nonwords.

A number of cases of surface dysgraphia have been described (e.g., Baxter & Warrington, 1987; Behrmann, 1987; Behrmann & Bub, 1992; Coltheart & Funnell, 1987; De Partz, Seron, & Van der Linden, 1992; Goodman & Caramazza, 1986; Goodman-Schulman & Caramazza, 1987; Hatfield & Patterson, 1983; Rapcsak, Arthur, & Rubens, 1988; Rapp, Epstein, & Tainturier, 2002; Roeltgen & Heilman, 1984; Weekes & Coltheart, 1996). Most of the patients had stable focal lesions resulting from stroke or brain injury, but the surface dysgraphic pattern has also been observed in


patients with progressive disorders such as Alzheimer’s dementia (e.g., Hughes, Graham, Patterson, & Hodges, 1997; Lambert, Eustache, Viader, Dany, Rioux, & Lechevalier, 1996; Platel et al., 1993; Rapcsak, Arthur, Bliklen, & Rubens, 1989) and semantic dementia (e.g., Graham, Patterson, & Hodges, 1997, 2000). Surface dysgraphia also has a developmental counterpart. A surface dysgraphic pattern has been observed in some developmental cases, usually associated with surface dyslexia (e.g., Castles & Coltheart, 1996; Coltheart, Masterson, Byng, Prior, & Riddoch, 1983; Hanley, Hastie, & Kay, 1992).

Typical spelling errors in surface dysgraphia are phonologically plausible, in the sense that reading aloud the misspelled word will produce the correct phonological form of the target. For instance, the word VEIN might be misspelled as VANE, but both spelt forms have the same pronunciation (/veIn/). However, accuracy of spelling is also affected by frequency: Surface dysgraphic patients show a frequency-by-regularity interaction, that is, their performance on exception words gets worse as the frequency of the words decreases (Beauvois & Derousné, 1981; Behrmann & Bub, 1992; Coltheart & Funnell, 1987; De Partz et al., 1992; Goodman & Caramazza, 1986).

A pure example of acquired surface dysgraphia is MP, a patient studied by Behrmann and Bub (1992). The case of MP is widely known as one of the purest examples of surface dyslexia and has been the subject of several papers (e.g., Behrmann & Bub, 1992; Bub, Cancelliere, & Kertesz, 1985; Patterson & Behrmann, 1997). Interestingly, the surface dysgraphia shown by MP paralleled her surface dyslexic pattern in a striking way. She showed very good spelling of nonwords and regular words, but was poor at spelling irregular words. Furthermore, her performance with irregular words declined as a function of their frequency and almost all of her spelling errors were phonologically plausible (i.e., “regularisations”). We present a detailed simulation of data from MP later in the paper.

Surface dysgraphia is usually associated with lesions in the region of the left angular gyrus (e.g., Beauvois & Derousné, 1981; Behrmann, 1987; Goodman & Caramazza, 1986; Goodman-Schulman & Caramazza, 1987; Hatfield & Patterson, 1983; Penniello et al., 1995; Roeltgen & Heilman, 1984), although other regions (e.g., the left precentral gyrus; Rapcsak et al., 1988) have sometimes been implicated. However, the idea that the functional role of the left angular gyrus is that of the orthographic lexicon (whose impairment would cause surface dysgraphia) is somewhat inconsistent with the results of neuroimaging studies of reading, which have related activation of the left angular gyrus to semantic processing (e.g., Demonet et al., 1992; Price, Moore, Humphreys, & Wise, 1997). Furthermore, the regions associated with orthographic, “word form” processing in neuroimaging studies of reading are more consistent with the lesions that cause letter production problems in spelling (i.e., a peripheral dysgraphia; Ellis, 1988). High-resolution techniques (e.g., fMRI) are needed to investigate whether a whole-word level and a letter/grapheme level of orthographic representation can be separated in an activation study, because the associated regions are likely to be spatially contiguous if not partially overlapping (see N. L. Graham et al., 1997, for a similar argument).

Spelling and semantics. Recent neuropsychological studies have tackled the issue of the relationship between semantic memory impairments and surface dysgraphia. Graham et al. (2000) studied a group of patients with semantic dementia and found that their performance, compared to that of control subjects, was disproportionately affected by regularity and word frequency. Spelling of low-frequency exception words was most impaired, and the majority of errors were phonologically plausible renderings of the target word. This pattern mirrors the neuropsychological association between semantic dementia and surface dyslexia that led Patterson and Hodges (1992) to the hypothesis that correct reading of low-frequency exception words is dependent upon the integrity of semantic representations (see Plaut et al., 1996, for a simulation instantiating this theory). Indeed, many patients presenting with semantic dementia are also surface dyslexic (e.g., Funnell, 1996; Graham, Hodges, & Patterson, 1994; Patterson & Hodges, 1992), but the corresponding dissociation (i.e.,


intact reading in the presence of semantic deficits) has also been observed (e.g., Cipolotti & Warrington, 1995; Lambon Ralph, Ellis, & Franklin, 1995; Schwartz, Saffran, & Marin, 1980). With regard to spelling, Hall and Riddoch (1997) described the case of a patient (KW) who, in spite of impaired auditory comprehension, showed a well-preserved ability to write to dictation words (both regular and irregular) that he could not understand. In patients with dementia of the Alzheimer’s type (DAT), who indeed present with semantic deficits, the finding of surface dysgraphia is not universal, and with disease progression nonphonologically plausible errors increase (see Graham, 2000, for review). Moreover, in DAT patients the effect of spelling regularity is only slightly enhanced in comparison to control subjects (e.g., Glosser, Grugan, & Friedman, 1999a; Glosser, Kohn, Sands, Grugan, & Friedman, 1999b).

One possible explanation for the association between semantic dementia and surface dyslexia/dysgraphia is that it reflects pathological involvement of brain regions that are closely related both functionally and anatomically (also see Cipolotti & Warrington, 1995). In other words, this specific form of cortical degeneration would lead to semantic impairments but also (and perhaps invariably) to the disruption of lexical processing (both orthographic and phonological). This hypothesis would seem to gain support from a recent study that compared the reading performance of patients with different types of dementia (Noble, Glosser, & Grossman, 2000). Despite the presence of a semantic impairment, patients with Alzheimer’s disease, frontotemporal dementia, and progressive nonfluent aphasia did not show a pattern of reading difficulty consistent with surface dyslexia; only the patients with semantic dementia showed the predicted pattern of reading impairment. However, it is worth noting that the disruption of lexical orthographic and phonological processing in semantic dementia appears to be tightly linked to the loss of word meaning, as demonstrated by the correlation across items between comprehension and exception word reading (Funnell, 1996; Graham et al., 1994) and spelling (Macoir & Bernier, 2002) that has been observed in some patients. Funnell proposed that in surface dyslexics a lexical response is preferred over a sublexical regularised response only when vestiges of word meaning remain. Patient EP, who presented with semantic dementia (and surface dyslexia), showed knowledge of both lexically derived and sublexically derived pronunciation in reading exception words (e.g., presented with GLOVE she responded “/gl@Uv/ or /glVv/ ?”; see Funnell, 1996). When all meaning of a given word was lost, the patient selected a sublexical response, even when she had also produced the correct lexical response. In the same vein, Ward, Stott, and Parkin (2000) documented the case of a patient with semantic dementia (SA) who showed the ability to use partial semantic knowledge to constrain his reading and spelling.

Connectionist models of spelling: Single- and dual-route architectures

As noted earlier, the experimental literature on normal and impaired reading has generated an expanding literature in computational modelling, with virtually all current models containing at least some connectionist components. Although spelling has not received the same degree of attention, a number of connectionist models of spelling have been produced, using architectures similar to those of the reading models of Sejnowski and Rosenberg (1987) and Seidenberg and McClelland (1989). For instance, Brown and Loosemore (1994), Bullinaria (1994), and Olson and Caramazza (1994) all describe models of sound–spelling conversion using “single-route,” multilayer networks trained with the backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986; these models are described further in the General Discussion section of this paper). Such networks contain layers of processing units whose activation states represent the inputs and outputs of the cognitive system under study (phonemes and graphemes, respectively, in the case of spelling). Mediating between these levels is another layer of units (“hidden units”) that receive activation via a set of learned weights from the input units, and transmit their responses to the output units via a second set of weights (see Figure 2b). This architecture for mapping between two


levels of representation has become quite common in cognitive modelling, and we will refer to it as the single-route, multilayer (or SR-ML) architecture. As is well known, such multilayer networks have greater representational power than two-layer networks in which the input and output domains are directly connected. As such they can be considered a “generalisation” of the feedforward perceptron of Rosenblatt (1962; see Rumelhart et al., 1986, for discussion). The provision of hidden units in the intermediate layer permits the learning (in principle) of arbitrary nonlinear mappings, including the item-by-item “rote” learning of the stimuli in the training set. If the representations in the hidden layer are distributed to some extent, then it is possible for such a network to learn to respond well to exceptional input–output pairs, while at the same time extracting the kind of statistically reliable relationships that permit a good generalisation to novel

Figure 2. (a) Simplest feedforward network, consisting of two layers of units and connections from input to output units. (b & c) Two generalisations of the two-layer net. (b) Single-route multilayer (SR-ML): Hidden units are added between the input and output units and the direct connections from input to output are removed; (c) Dual-route multilayer (DR-ML): As for the SR-ML net but the direct input-output connections are not removed.
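The difference between panels (a) and (c) of Figure 2 can be made concrete with a small sketch. The example below is an illustration under stated assumptions rather than anything taken from the simulations reported here: it uses the XOR mapping, simple threshold units, and hand-set (not learned) weights, and the names step, two_layer, and dual_route are hypothetical. The direct connections alone capture the regular part of the mapping; adding a single hidden unit that detects the one exceptional input lets the mediated pathway correct the response of the direct pathway.

```python
# Hand-wired threshold networks for XOR (illustrative assumption only;
# weights are set by hand, not learned).

def step(net_input):
    """Threshold unit: outputs 1 when its net input is positive, else 0."""
    return 1 if net_input > 0 else 0


def two_layer(x1, x2):
    """Figure 2a: direct input-output connections only."""
    return step(1.0 * x1 + 1.0 * x2 - 0.5)


def dual_route(x1, x2):
    """Figure 2c: the same direct connections plus one hidden unit."""
    # Mediated pathway: the hidden unit fires only for the input (1, 1).
    hidden = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Direct pathway: identical to the two-layer network above.
    direct = 1.0 * x1 + 1.0 * x2 - 0.5
    # The hidden unit inhibits the output, correcting the direct response.
    return step(direct - 2.0 * hidden)


PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
direct_only = [two_layer(a, b) for a, b in PATTERNS]
both_routes = [dual_route(a, b) for a, b in PATTERNS]
```

Here direct_only is [0, 1, 1, 1], which gets the input (1, 1) wrong, whereas both_routes is [0, 1, 1, 0], the XOR mapping. Removing the hidden unit from dual_route leaves the responses to the other three inputs unchanged.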
stimuli. However, problems for such architectures have arisen in the domain of written-language processing due to the dissociations discussed above between the surface and phonological forms of dyslexia and dysgraphia. Artificial lesioning of SR-ML models has not been found to give rise to such dissociations (Plaut et al., 1996), and careful analytic studies of this architecture have led Bullinaria and Chater (1995) to argue that this is a limitation in principle (for models that map between only two types of information).

The debate between proponents of connectionist and more traditional symbolic (rule-based) models has tended to become focused on the properties of SR-ML networks (Pinker, 1997; Seidenberg & McClelland, 1989). However, we do not believe that any substantive principle of connectionist theory is at stake, should such models be rejected for multiroute alternatives. As noted above, the SR-ML architecture is a generalisation of the basic two-layer perceptron, but it is not the only possible one. To generate the SR-ML network from a two-layer network requires not only the addition of hidden units and the necessary connections, but also the removal of the existing direct connections between the input and output layers. Suppose instead that the hidden units are added but the direct connections are not removed. This produces a network with the same number of layers of units as the SR-ML, but with two pathways from input to output, one direct and the other mediated by hidden units (Figure 2).²

² The dual-route architecture actually appears in Figures 2 and 3 of the seminal article by Rumelhart et al. (1986), which introduced the backpropagation algorithm. In particular the first learned solution to the XOR (exclusive-OR) problem (in their Figure 3, p. 331), is shown for a network with two pathways, direct and mediated, between input and output. From there on, however, the architecture is dropped in favour of the single-route one.

This architecture (call it the dual-route multilayer network, DR-ML) has a number of interesting properties that distinguish it from the SR-ML (Zorzi et al., 1998a, 1998b). For instance, learning can take place in both routes at the same time, but the network tends to partition the learning such that the direct route will learn simple (linear) regularities, while the mediated route “mops up” idiosyncratic (exception) input–output pairs by recognising the exceptional inputs and correcting the regular response produced by the direct route. In this case, the network’s ability to generalise to novel stimuli tends to be concentrated in the direct route. This means that the mediated route (via hidden units) is largely freed from the need to generalise and can, in principle, simply learn the training set on a rote basis. Damage to the two pathways has different effects and double dissociations of the type discussed above can be produced. As noted, the production of such dissociations has proved extremely challenging for the SR-ML paradigm (Bullinaria & Chater, 1995; Plaut et al., 1996).

A connectionist dual-route model of spelling

The aim of this paper is to implement a detailed model of the spelling system within the connectionist DR-ML framework. The model embodies a number of central theoretical claims. While some of these are common to dual-route

selves (rather than, say, in the connections between layers), in terms of the strength of positive feedback a unit provides to itself. The result of this is that more frequently used lexical items become activated more rapidly for a given level of input.

3. The sound–spelling mapping is parallel, associative, and based on experience of whole (real) words. Where the sound–spelling relationships in a language are inconsistent, then the representation of phoneme–grapheme relationships may be one-to-many and many-to-one. The direct (sound–spelling) route of our model produces direct associations between input phonemes and output graphemes. These associations are generated by training on English words, whereby the model attempts to predict the spelling of a word given its pronunciation. The fact that the associations are direct (implemented as a
models, in particular the separation of the whole-
two-layer network), means that, when required to
word (lexical) and subword (sound–spelling)
deal with a large corpus of words (over 3000), the
mappings, others are more specific to our model. As
model encodes only the more common spelling–
an introduction to the model we describe the main
sound relationships, though these may be inconsis-
specific features here:
tent (that is, after training, multiple candidate
1. Both routes are activated in parallel with com- graphemes may be activated by certain phonemes
petitive–cooperative interactions at the grapheme (out- or phoneme combinations). The scarcity of repre-
put) level. We propose that a phonological input to sentation resources in direct mapping means that
the spelling system will, at least for known words, information about individual lexical items experi-
activate both a lexical pathway and a phoneme- enced in training is lost when the set is large.
based sound–spelling conversion process in parallel. 4. Input (phonemic) and output (graphemic) rep-
Both processes produce activation of the same set of resentations are syllabically structured, and complex
grapheme nodes, and the activations are pooled. graphemes (e.g., SH, TH, EE) are locally represented.
Cooperative interactions take place where both We propose that syllabic chunking, at both the
routes agree on a spelling. Where they do not, posi- phonological and orthographic levels, is important
tion-based competition takes place between candi- to the spelling process (a position shared with other
date graphemes. In the absence of any random authors; Caramazza & Miceli, 1990; Treiman &
effects (noise), the grapheme receiving the highest Chafetz, 1987). The output representation is struc-
activation in any position (see below) will be the one tured to use “graphosyllables,” with onset, vowel,
produced. and coda constituents (cf. Plaut et al., 1996), and
2. Lexical representation is “localist” (orthogonal also complex graphemes.
representation of distinct orthographic word forms),
and the rate of activation of lexical nodes depends on The simulations in the paper represent a detailed
their frequency of use. Although it is not a strictly empirical test of these proposals against a range of
required feature of the architecture we use (Zorzi et data from both normal and impaired subjects (sur-
al., 1998a), we model the lexical route as a “localist” face dysgraphics in particular). In the first section,
network, where each orthographic word form is we deal with the sound–spelling mapping in isola-
represented by a single node (connectionist unit, in tion, and investigate the model’s nonword spelling
the implementation). We represent the effect of ability following training on a corpus of monosyl-
word frequency at the level of these units them- labic real words. The model’s output is compared to


data regarding normal performance, and the nonword spelling of a severe surface dysgraphic. Following this we describe the full dual-route model, in which the lexical pathway is implemented, and the mode of interaction of the two routes is specified. After examining the model’s “normal” behaviour, we attempt to simulate in some detail the word spelling of acquired surface dysgraphics. We model the impairment to the lexical route as a lowering of the maximum activation level achievable by lexical nodes. Otherwise the model is unchanged.

A TWO-LAYER NETWORK MODEL OF SOUND-TO-SPELLING CONVERSION

Method: Description of the model3

We begin the discussion of the model by looking at phonological spelling (sound-to-spelling conversion) in isolation from lexical spelling. We first develop what we believe is the simplest trainable mechanism capable of good performance on this task. Our criteria for “good performance” include the following: production of phonologically plausible spellings of nonwords; correct spelling of regular consistent words; regularisation of exception words; and ambiguity over the spelling of regular-inconsistent words (and analogous nonwords).

Representation. The basis of the sound–spelling model is a standard two-layer feedforward network, with complete connectivity between input and output units (all output units receive connections from all input units). The model uses a syllabic representation for both phonological input and orthographic output, with a distinction between onset, vowel, and coda positions. Along with a single vowel position, three onset (pre-vocalic) consonant positions and three coda (post-vocalic) consonant positions are defined, giving a total of seven positions. Within the onset and coda, both phonemes and graphemes are “left-justified”; that is, the first phoneme or grapheme occupies position 1, the second position 2, and so on (Figure 3). This “position-specific” code means that individual graphemes (especially single consonants) are duplicated, e.g., the onset Rs in RATE and GRATE are separate nodes, as are the coda Ts in WET and WEST. That phonological representations are syllabically organised, with a division between onset and rime, is widely accepted (Hartley & Houghton, 1996; Treiman, 1986). The case is perhaps not quite so clear for orthographic representations, but a number of authors have argued for the existence of “graphosyllables” (Caramazza & Miceli, 1990), and for a division between orthographic onset and rime (Treiman & Chafetz, 1987; Treiman, Mullenix, Bijeljac-Babic, & Richmond-Welty, 1995). The current model is consistent with these proposals.

In some pilot work, we used an orthographic representation that lacked complex graphemes (e.g., SH, DD, OU, etc.), only a single letter being activated in any position. This produced a rather low level of plausible spelling (around 70%). Analysis of the errors showed that the model was often producing blends of alternative graphemes for a given sound. For instance, an initial /f/ is usually spelled F, but can also be spelled PH (phrase, phase, etc.). Without grapheme nodes the model could sometimes produce the illegal blend “FH.”

Figure 3. Structure of the orthographic output representation. Seven positions are distinguished, classified as either consonant or vowel positions. All vowel graphemes (apart from final-e) occur in the same position. There are three onset consonant positions and three coda consonant positions. Both onset and coda consonants are “left justified,” i.e., the first consonant goes in the first slot, the second in the second, and so on. (On = Onset, Co = Coda).

3
The model was implemented in Visual Basic V.5, and runs under MS Windows on an IBM-compatible PC. Enquiries regarding
the implementation should be directed by email to the first author at his current address.
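
The seven-position, left-justified code described under Representation can be illustrated with a minimal Python sketch. It is not the authors' Visual Basic implementation: the slot labels (`On1` … `Co3`), the function names, and the assumption that a word arrives already parsed into onset, vowel, and coda graphemes are all hypothetical, and final-e is not handled.

```python
EMPTY = "#"  # marks an unused position
SLOTS = ("On1", "On2", "On3", "V", "Co1", "Co2", "Co3")  # hypothetical slot labels


def to_slots(onset, vowel, coda):
    """Left-justify pre-parsed graphemes into three onset, one vowel, three coda slots."""
    if len(onset) > 3 or len(coda) > 3:
        raise ValueError("at most three onset and three coda graphemes")

    def pad(graphemes):
        # Fill from the first slot onwards; unused slots stay empty.
        return list(graphemes) + [EMPTY] * (3 - len(graphemes))

    return tuple(pad(onset) + [vowel] + pad(coda))


def active_nodes(onset, vowel, coda):
    """Return the position-specific (slot, grapheme) nodes that would be switched on."""
    slots = to_slots(onset, vowel, coda)
    return {(slot, g) for slot, g in zip(SLOTS, slots) if g != EMPTY}


three = to_slots(["TH", "R"], "EE", [])
twelfth = to_slots(["T", "W"], "E", ["L", "F", "TH"])
wet = active_nodes(["W"], "E", ["T"])
west = active_nodes(["W"], "E", ["S", "T"])
```

Because a node is identified by slot and grapheme together, the T of WET (`Co1`) and the T of WEST (`Co2`) come out as different nodes, which is the duplication the text describes.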


This kind of error indicates that complex graphemes should compete for output as single response options. In the above example, the H should only be produced as long as it is part of a sequence “PH,” which competes as a unit with the alternative “F.”

The model was thus revised to incorporate complex graphemes in the orthographic representation (Plaut et al., 1996; Shallice et al., 1995). Graphemes are divided into consonant and vowel graphemes, and all vowel graphemes (with the exception of final-e) occur in the same position. The orthographic representation is shown in Figure 3.

The graphemes used (in addition to the standard single letters) are shown in Table 1. Table 2 shows some examples of the way in which individual words are represented in this scheme. With respect to the vowel graphemes, the letters W and R immediately following the orthographic vowels are considered to form part of the vowel grapheme (the model was trained using British English pronunciation, in which post-vocalic /r/ does not occur).

Table 1. Complex graphemes used in the model’s output representation

Onset   CH GH GN KN PH QU SH TH WH WR
Vowel   AI AIR AR AU AW AY
        EA EAR EAU EE EER EI EIR ER EU EW EY
        IE IER IEU IEW IR
        OA OAR OE OI OO OOR OU OUR OR OW OY
        UA UAR UE UI UR UY
        YE YR
Coda    CH CK DD DG FF GH GHT GN GUE LL
        MB NG PH QUE SH SS TCH TH TT ZZ

Table 2. Examples of the orthographic representation used in the model

          Onset              Coda
Word      1    2    3  Vowel 1    2    3
three     TH   R    #  EE    #    #    #
itch      #    #    #  I     TCH  #    #
hearth    H    #    #  EAR   TH   #    #
twelfth   T    W    #  E     L    F    TH
grown     G    R    #  OW    N    #    #

A # represents an empty position.

The phonological input representation is identical to that used for the output of the reading model by Zorzi et al. (1998a, 1998b). Thus, there are seven phoneme positions defined, one for the vowel and three each for pre-vocalic consonants (the onset) and post-vocalic consonants (the coda). The overall structure of the two-layer model is shown in Figure 4.

Figure 4. Structure of the direct (sound-to-spelling) route of the model. Both phonology and orthography use a syllabic (Onset/Vowel/Coda) representation, and the orthographic representation includes complex graphemes. The figure shows how the input and output are represented for the mapping from /Tr@Ut/ to THROAT.

Training. The network was trained on a set of 3165 uninflected monosyllabic sound–spelling pairs. The majority of the words were extracted from the Oxford Psycholinguistic Database, and these were supplemented by a number of additional words (in particular, we made sure that all the real words required for simulations of experimental data occurred in the training set). Regular inflected words, e.g., tells, called, cats, were not included, though analogous irregular words were, e.g., has, bought, men. Note that the training set does not define a one-to-one mapping, as it contains a number of homophones (e.g., /blu/ → BLEW, BLUE) and homographs (/lid/, /led/ →


LEAD). Prior to learning, the simple orthographic form of each word is parsed (by a computer program) into the syllabically-aligned graphemic representation described above, and associated with the representation of its pronunciation. Presentation of the training set occurred in “epochs,” in which each sound–spelling pair was presented once, and the order of presentation was randomised at the beginning of each epoch (by “shuffling” the training set used in the previous epoch). Frequency of presentation was not manipulated; the extremely short training required by the two-layer network (learning asymptotes after just a few epochs, as in Zorzi et al., 1998a, 1998b) would make a frequency manipulation unrealistic. For each pair, the required phonological input is set up by activating the appropriate phoneme nodes to a value of 1. Activation propagates to units in the grapheme layer in the usual manner: the feedforward input to each grapheme unit is the dot product of the vector of phoneme activations and the vector of input weights to each grapheme unit (since the phoneme units have activations of 1 or 0, this reduces to summing the weights on activated pathways). Grapheme node activations are then calculated as a sigmoidal function f of this input, bounded in the range [0, 1] and with f(0) = 0 (i.e., no input, no output).

The grapheme activation pattern produced in this way is then compared with the target. Nodes that should be on have a target activation of 1, while those that should be off have a target activation of 0. The error for each output node is the difference between its target and its actual activation value. This error is then used to change the weights using the Delta rule error correcting algorithm (Widrow & Hoff, 1960), i.e.,

$$\Delta w_{ij} = \lambda\, a_i\, (t_j - a_j) \qquad (1)$$

where $w_{ij}$ is the weight from phoneme unit $u_i$ to grapheme unit $u_j$; $a_i$ and $a_j$ are the activations of the phoneme and grapheme units respectively; $t_j$ is the target activation for grapheme unit $u_j$; and $\lambda$ is a scaling parameter affecting the size of the weight changes (conventionally known as the “learning rate”). The learning rate was typically set to values between 0.03 and 0.05, and training stopped when the mean error reached asymptote. This was normally after around 15–20 epochs of training.

Finally, weight pruning was used to remove small weights. The delta rule distributes the “blame” for errors over all activated input pathways, and early in training, when errors are the norm, this produces many non-zero weights, which later have little effect on performance. Pruning “cleans up” these weights and simplifies analysis of the network. Pruning takes place at the end of each epoch, and weights whose magnitude falls below some percentage of the magnitude of the largest weight (3%–5% in the present simulations) are set to 0. Note that as training progresses the magnitude of the largest weights tends to increase, and hence the absolute value of the pruning threshold also increases.

Testing. Following training, the model’s performance was investigated in a number of studies reported below. Testing consists of presenting the phonemic representation at the input, propagating the activation to the output units (as described under Training) and then selecting the most active grapheme at each syllabic position. The selection mechanism is postulated to be a form of response competition, and is implemented by lateral inhibition between the nodes within a given position. Each node has inhibitory connections to all other competing nodes, and has a positive feedback connection to itself. After achieving their initial activation levels due to the phonological input (computed as during training), the activation level of orthographic nodes evolves according to the following rule:

$$a_j(t+1) = a_j(t) + F_j^{+}(t) - F_j^{-}(t) \qquad (2)$$

where $a_j$ is the activation level of a unit $j$, $t$ is the time (in discrete time slices, $t = 1, 2, \ldots$), $F_j^{+}$ is the self-feedback from a unit $j$ to itself, and $F_j^{-}$ is the lateral inhibitory input from competing units (those representing graphemes in the same position). The two inputs $F_j^{+}$, $F_j^{-}$ are given by:

$$F_j^{+}(t) = \begin{cases} w^{+} a_j(t), & \text{if } a_j(t) > \theta \\ -\partial\, a_j(t), & \text{otherwise} \end{cases} \qquad (3)$$


$$F_j^{-}(t) = w^{-} \sum_{i \in P,\; i \neq j} a_i(t) \qquad (4)$$

where $w^{+}$ (equation 3) is the positive feedback weight of a node onto itself, and $w^{-}$ (equation 4) is the lateral inhibitory weight. Both weights were set at 0.2. The positive self-feedback only operates if the activation of the node gets above a threshold θ. Below this threshold, the activation decays at a rate proportional to ∂. The threshold for the orthographic vowel (first rime) position was set somewhat lower than for the other positions, reflecting the fact that all words need a vowel grapheme. The threshold is fixed at 0.1 for the vowel position, and at 0.2 for all other positions. Equation 4 expresses the lateral inhibitory input to a node j; the summation is over all other nodes i ≠ j in an orthographic position P. Finally, the activations of output nodes are bounded (clipped) in the range [0, 1].

To summarise: During testing, the two-layer model’s response to a phonological input is generated as follows:

1. The phoneme nodes comprising the input word or nonword are set to an activation value of 1.
2. This activation propagates through the learned weights producing an initial orthographic activation (sigmoidal function of net input).
3. Phonological input then ceases (the word does not stay in the environment) and the activated orthographic nodes obey equations 2–4 until all output nodes reach a steady-state activation of either 1 (fully on) or 0 (fully off).
4. The model’s spelling of the input word is that specified by the set of grapheme nodes with an activation of 1 eventually produced by step 3. A measure of relative “reaction time” to generate a coherent spelling plan is provided by the number of cycles needed for the orthographic nodes to settle down.

Results: Performance of the two-layer model

Training and basic spelling

Two-layer networks trained with the delta rule are bound to reach a global minimum with respect to the total error over the training set. This does not, of course, mean that the total error will be 0 (even less that performance will thereby be optimal with respect to fitting psychological data). However, in terms of simply minimising the global error, the weights produced during learning will be about “as good as it gets” for the given architecture. Unlike the case of models containing hidden units, there is no scope for optimising the model with respect to additional performance criteria; the network cannot be hand-crafted (by varying the number of hidden units) to produce a desired level of generalisation, or otherwise fit known data4.

Nevertheless, there is some room for variation in the model in terms of, for instance, the learning rate used, the pruning threshold, and the length of training. To examine this, we have trained the model many times using different combinations of these learning parameters, within limited ranges. For none of the model behaviours reported below have we observed any major effect of this variation, though the model’s spelling of some stimuli may alter, and overall response times vary slightly. Therefore, in all the simulations reported in the paper, we use the same set of weights in the phonological route. These were produced by training the model for 20 epochs, with a learning rate of 0.03, and a pruning threshold of 5%. At 20 epochs the model’s error curve has been more or less flat for a number of epochs, i.e., further improvement on the training set is negligible. The learning rate is low enough to minimise any potential “recency” effects arising from the last words the network happened to be trained on, but not so low as to prolong training times unnecessarily.
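
The training regime described under Training (delta-rule updates as in equation 1, shuffled epochs, and pruning at the end of each epoch) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the squashing function `max(0, tanh(x))` is only one function that is sigmoidal, bounded in [0, 1], and zero at zero (the text does not give its form); the three-item training set and all function names are hypothetical; and unit labels stand in for the position-specific nodes. The parameter values (20 epochs, learning rate 0.03, pruning threshold 5%) are those reported above.

```python
import math
import random


def squash(x):
    # Assumed form: the text only requires a sigmoid bounded in [0, 1] with f(0) = 0.
    return max(0.0, math.tanh(x))


def activate(weights, phonemes, graphemes):
    # Active phoneme units have activation 1, so the net input is a sum of weights.
    return {g: squash(sum(weights.get((p, g), 0.0) for p in phonemes))
            for g in graphemes}


def prune(weights, fraction):
    # Zero every weight whose magnitude is below `fraction` of the largest magnitude.
    largest = max((abs(w) for w in weights.values()), default=0.0)
    for key, w in weights.items():
        if abs(w) < fraction * largest:
            weights[key] = 0.0


def train(pairs, epochs=20, rate=0.03, prune_fraction=0.05, seed=0):
    rng = random.Random(seed)
    graphemes = sorted({g for _, targets in pairs for g in targets})
    weights = {}
    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)  # reshuffle the previous epoch's order
        for phonemes, targets in order:
            acts = activate(weights, phonemes, graphemes)
            for g in graphemes:
                error = (1.0 if g in targets else 0.0) - acts[g]
                for p in phonemes:
                    # Equation 1 with a_i = 1 for each active phoneme unit.
                    weights[(p, g)] = weights.get((p, g), 0.0) + rate * error
        prune(weights, prune_fraction)  # pruning at the end of every epoch
    return weights, graphemes


# Hypothetical toy set: /f/ is spelled F twice and PH once.
TOY_PAIRS = [
    (("f", "I", "t"), ("F", "I", "T")),
    (("f", "I", "n"), ("F", "I", "N")),
    (("f", "@U", "n"), ("PH", "O", "N", "E")),
]
weights, graphemes = train(TOY_PAIRS)
acts = activate(weights, ("f", "I", "t"), graphemes)
```

On this toy set the dominant spelling F ends up more active than PH for the input /fIt/, while PH remains a learned (weaker) candidate for /f/, which is the kind of one-to-many association the direct route is said to encode.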

4
For instance, Plunkett and Marchman (1991) used 20 hidden units in a three-layer model for learning the past tense of English.
They explain this number as a result of experimentation in which they varied the number of hidden units between 10 and 120. The final
choice of 20 is said to represent a “compromise” between optimal performance and the desire to maximise the network’s ability to
generalise. Similarly, Hahn and Nakisa (2000) describe results for a multilayer perceptron model for producing the plural of German
nouns. They report having experimented with networks with hidden units ranging from 2–1000. However, they only report the actual
results from a single network (the best performing one, with 40 hidden units).
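
The response competition defined by equations 2–4 can likewise be sketched for a single output position. Again this is a hedged illustration rather than the authors' code: the decay rate (0.1), the settling tolerance, and the cycle cap are assumptions, since the text does not report them, and the initial activation of EA is hypothetical (the text gives only the EE value of just over 0.3 for /dris/). The feedback and inhibition weights (0.2) and the vowel threshold (0.1) are taken from the text.

```python
def settle(acts, theta, w_pos=0.2, w_neg=0.2, decay=0.1, eps=1e-3, max_cycles=1000):
    """Iterate equations 2-4 within one position until each node is (near) 0 or 1.

    Returns the final activations and the number of cycles taken, the latter
    serving as the relative "reaction time" measure.
    """
    acts = dict(acts)
    for cycle in range(1, max_cycles + 1):
        total = sum(acts.values())
        updated = {}
        for g, a in acts.items():
            # Equation 3: self-excitation above threshold, decay below it.
            feedback = w_pos * a if a > theta else -decay * a
            # Equation 4: inhibition from all other nodes in the same position.
            inhibition = w_neg * (total - a)
            # Equation 2, with activations clipped to [0, 1].
            updated[g] = min(1.0, max(0.0, a + feedback - inhibition))
        acts = updated  # synchronous update of all nodes
        if all(a <= eps or a >= 1.0 - eps for a in acts.values()):
            return acts, cycle
    return acts, max_cycles


# Vowel position for /dris/: EE starts just over 0.3; the EA value is assumed.
final, cycles = settle({"EA": 0.5, "EE": 0.32}, theta=0.1)

# A position with a single candidate, for comparison.
lone_final, lone_cycles = settle({"EA": 0.5}, theta=0.1)
```

With these values the more active EA node is driven to 1 and EE to 0, and the contested position takes more cycles to settle than the uncontested one, which is the basis of the consistency effect on settling time examined in the later simulations.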


After training, the model’s spelling was first tested on all of the words in the training set, using the procedure set out above. The model spelled about 68% of the words correctly. This compares with a figure of 81% correct in reading a similar (though slightly smaller) training set by the related reading model of Zorzi et al. (1998a). This difference reflects the greater inconsistency of the English sound-to-spelling mapping compared to the reverse. For regular consistent words (plank, match, bill, etc.) the model’s spelling was correct, while exception and inconsistent words were regularised (see Table 3 for examples). This aspect of the model’s performance is investigated more systematically below. Overall, 96% of the words’ spellings were judged to be phonologically plausible, in the sense that the pronunciation of the word could be reliably retrieved from the spelling. Many of the errors involved adding an unlikely final “e” on some words, so for instance, /swIs/ was spelt “swisse”. Although these were scored as errors, we suspect many would be pronounced correctly.

Table 3. Model's spelling of some of the exception words in its training set

Correct   Model    Correct   Model
SOAP      SOPE     NIGHT     NITE
SCHOOL    SCOOL    SWORD     SORD
AISLE     ILE      CHIC      SHEEK
COUGH     COF      LIMB      LIM

Simulation 1: Nonword spelling in surface dysgraphia

Of particular interest is the model’s nonword spelling (generalisation). Since the isolated two-layer model has no lexicon, it cannot spell nonwords by lexical analogy and must use any sound–spelling regularities it has extracted. An interesting test of the model is therefore to compare its nonword spelling with that of a surface dysgraphic subject, as such subjects are the least likely to be able to use lexical information. We would repeat the point made above, that since this route has no hidden units, there is no possibility of varying the number of them until a desired degree of generalisation is obtained.

Table 4 provides a comparison of the model’s spelling of a set of 58 nonwords with those produced by MP, a surface dyslexic and dysgraphic subject who showed near-perfect performance on nonword reading and spelling (Behrmann & Bub, 1992).

The spellings produced by the model are all plausible, and in many cases identical to those produced by MP. One possible criticism is that the model sometimes produces a redundant final-e, e.g., in moope. This is due to the fact that “silent e” spellings, as in late, pile, mole, etc., do not compete with alternative spellings as a single response option. Hence the final-e can be selected irrespective of the chosen vowel (Houghton & Zorzi, 1998). There is evidence that literate adults also produce such spellings. For instance, most subjects in the Barry and Seymour (1988) study spelled the pseudoword /teIn/ either tane or tain. However a few spelled it taine or tayne (redundant final-e). Similarly, the stimulus /pim/, generally spelled peem or peam, was sometimes spelled peeme or peame.

It is noticeable that the model produces spellings identical to those of MP where the sound-to-spelling mapping is highly consistent, e.g., dimp, wush, fent, cham, etc. In the case of inconsistent mappings, the model may generate an alternative acceptable spelling, e.g., dreece vs. dreace, kail vs. caile, caum vs. corm5. In most of these cases, we find that the spellings produced by MP have in fact been activated in the model but have not won the output competition. For instance, Figure 5 shows the initial grapheme activations (prior to response resolution by the lateral inhibition mechanism) generated by

5
The difference between caum (MP) and corm (model) (similarly for bauce vs. borse) is probably due to dialect. MP is American and
we assume would not spell a pseudoword with a post-vocalic R unless there was an /r/ (or rhoticised vowel) in its pronunciation (hence
caum, bauce; cf. Treiman & Barry, 2000). The model, however, is trained on (Southern) British English pronunciation, in which the
dominant spelling of the long vowel /O/ is OR. The model’s spelling is supported by the Barry and Seymour (1988) study, in which
about 80% of the (British) participants produced OR when spelling the nonwords /vOk/ and /nOn/ (see also Treiman & Barry, 2000).


the input /dris/. The grapheme EE (produced by MP) has an initial activation of just over 0.3, which is later suppressed in the model by the more active EA grapheme.

However, this does not mean that the model will always spell /i/ as EA, even in the absence of any random factors (noise). As can be seen in Table 4, the model’s spelling of /i/ actually varies. In the pseudowords /bil/, /ditS/, /flip/ it is spelled EE (beel, deech, fleep), but is spelled EA in /swim/, /bris/, and /plik/ (sweam, breace, pleak). Interestingly, MP shows very similar variation (compare the model’s and MP’s spelling of these words in Table 4). This shows the influence in the model (and presumably in MP) of surrounding phonological context on the selection of inconsistent phoneme–grapheme mappings. We return to the issue of the representation of inconsistent sound-to-spelling relationships, and the effect of phonological context, below.

Table 4. Nonword spelling: comparison of model with surface dyslexic MP (Behrmann & Bub, 1992, Appendix 2)

Word      MP        Model     Word      MP        Model
/bil/     beel      beel      /bl@Um/   blome     blome
/sek/     sek       seck      /kOm/     caum      corm
/mel/     mell      mel       /tS&m/    cham      cham
/sp&l/    spal      spal      /hin/     heen      heen
/ked/     ked       ced       /ditS/    deetch    deech
/fleIt/   flate     flate     /dreIt/   drate     drate
/neld/    neld      neld      /d0ld/    dold      dold
/k@Ub/    cobe      cobe      /fl&m/    flam      flam
/bOs/     bauce     borse     /l&ts/    lats      lats
/dImp/    dimp      dimp      /sVf/     suff      suff
/w@Ul/    wole      wole      /wVS/     wush      wush
/lim/     leem      leam      /dris/    dreece    dreace
/seIt/    sate      sate      /sk3l/    skirl     scurl
/k0dZ/    cudge(a)  codge     /nitS/    neech     neech
/frIm/    frim      frim      /swim/    sweam     sweam
/nIlt/    nilt      nilt      /mup/     moop      moope
/trIst/   trist     trist     /teIz/    taze      tase
/flip/    fleep     fleep     /hoIl/    hoil      hoil
/blim/    bleam     bleam     /bris/    brease    breace
/ged/     ged       ged       /witS/    weech     weech
/rIlt/    rilt      rilt      /plik/    pleak     pleak
/ris/     reese     rease     /hoIs/    hois      hoise
/maIz/    mise      mise      /keIl/    kail      cale
/fent/    fent      fent      /degz/    degs      degs
/rel/     rell      rel       /st0ld/   stold     stold
/pl0k/    pluck(a)  plock     /skIm/    skim      skim
/r0g/     rog       rog       /petS/    petch     petch
/blIk/    blick     blick     /l0t/     lot       lot
/fritS/   freech    freech    /nud/     nood      noode

(a) Denotes response marked as an error by Behrmann and Bub.

Simulation 2: Consistency effects in word spelling

We noted earlier that the model tends to spell regular words correctly and regularises inconsistent and exception words. To make a more systematic comparison between the model’s spelling of consistent and inconsistent words and nonwords, we used word lists produced by Stone, Vanhoy, and Van Orden (1997) based specifically on their sound-to-spelling consistency. Some words that have a consistent pronunciation with respect to their spelling, e.g., “bean”, may be inconsistently spelled, given


the pronunciation, i.e., the syllable /bin/ could be spelt bean, been, or even bene. Words such as plank, march, loft, however, are considered sound-to-spelling consistent, as there is (putatively) no other plausible way to spell them (see Ziegler et al., 1997). If the model has indeed extracted the basic regularities of the sound–spelling mapping for English, without remembering the specific words it was trained on, it should show a clear difference between these sets of words, spelling the consistent words the way they are actually spelt, but producing alternative spellings for more of the inconsistent words.

The model was presented with the phonological forms of two lists of words, consistent and inconsistent, from the Stone et al. (1997) Experiment 1. All the words in both lists occurred in the model’s training set. The model’s spellings of the words were recorded and compared to the actual spelling. The results are shown in Figure 6.

Of the 25 Stone et al. (1997) consistent words, the model spelled 23 of them correctly (92%). The two “errors” were fluke spelled flook, and loaf spelled loafe, both of which we consider phonologically plausible alternatives. Of the 25 inconsistent words, the model spelled only 6 correctly (24%), while all 25 spellings were phonologically plausible. The model thus shows clear sensitivity to the dominant sound–spelling regularities of English when spelling words on which it has been trained. If the word is an inconsistent or an exception word this could in principle act as a source of interference with respect to lexical (case-specific) spelling. This issue is investigated in detail in later simulations.

Figure 5. Initial orthographic response to nonword input /dris/, showing competition between two possible renderings of the vowel /i/. Y-axis shows activation level, x-axis shows grapheme output position. Dark bars represent the most active response in each position; lighter bars competing responses. Competitive interactions within the vowel position lead to the less activated option EE being suppressed. Note also the (subthreshold) activation of S in the first coda position. (On = Onset, Co = Coda).

Figure 6. Comparison of the two-layer model’s spelling of words showing consistent versus inconsistent sound-to-spelling patterns. Graph shows per cent correct vs. incorrect (but plausible) spellings of words of each type. All tested words were in the model’s training set.

Simulations 3 and 4: Reaction times in the assembly route

As shown above, the model can generate alternative spellings where there is phoneme–grapheme inconsistency in the training set. As well as predicting variability in spelling, the time taken by the output mechanism to resolve this response competition can be used to predict reaction time differences as a function of sound–spelling consistency. We can show the effect of consistency on reaction time by looking at the time the model takes to produce a stable output representation in


response to the different types of Stone et al. stimuli. These stimuli also allow us to make a word–nonword comparison orthogonal to consistency. Of course, with respect to spelling real words, we cannot directly compare the model’s RT to human data, as normal participants may not spell words (especially inconsistent and exception words) using an assembly procedure. It is nevertheless of interest to compare the model’s word and nonword spelling to see whether the assembly route shows any kind of lexicality effect; that is, does having been trained on a particular word speed up processing of that word compared to nonwords controlled for consistency?

Simulation 3. We examined the latency times for the four word lists from Stone et al. (1997), Experiment 1. These lists comprise consistent and inconsistent words and nonwords. There are 25 words in each list. The four lists were presented to the model and the time taken (in simulated processing cycles) to produce a stable response to each word was recorded. The means for each condition are shown in Figure 7. The results were analysed with a two-way analysis of variance (ANOVA) in which the factors were lexicality (word vs. nonword), and consistency (consistent vs. inconsistent). A significant main effect of consistency was found: the model takes longer to settle on a spelling for inconsistent stimuli than for consistent ones, F(1, 96) = 18.04, p < .001. However, neither the lexicality factor nor the interaction between the two factors was significant. The absence of a lexicality effect means that, at least in terms of response times, the phonological route does not distinguish between words it has been trained on and novel stimuli. This is an important property of the model which we will return to later in the context of a simulation of surface dysgraphic spelling errors.

Simulation 4. We wished to confirm the consistency effect found for words on which the model has been trained, as this is important for understanding the interaction between the lexical and phonological routes studied later in the paper. For this purpose, we presented the model with sets of regular consistent and exception words taken from the Taraban and McClelland (1987) study of consistency effects in reading. The regular consistent words had to be
altered slightly, as they contained words that were
somewhat inconsistent from a spelling point of
view, e.g., write, which. Some other words were
changed as they were inflected, and hence were not
in the model’s training set. The resulting stimulus
lists used are given in Appendix A. In all there were
48 regular words and 48 exception words (all in the
model’s training set). The model’s spelling of each
word was recorded, along with the time taken (in
number of cycles). Only three regular words were
incorrectly spelled, whereas most of the exception
words were regularised (37/48, 77%). The signifi-
cant consistency effect was also confirmed in the
response latencies. Mean response time for the
regular words was 4.33 cycles, while the exception
words (even though regularised) needed 6.9 cycles,
t(96) = –4.11.
Figure 7. Mean time taken by the model (in processing cycles) to
Simulations 5 and 6: Phoneme–grapheme
produce a stable and unambiguous orthographic response as a
function of stimulus type: Factors are lexicality (word vs. consistency and the effects of context
nonword) and consistency (consistent vs. inconsistent). Only the In Simulation 1, it was noted that the model can
effect of consistency is significant. spell a given vowel sound (e.g., /i/) in more than one

way, apparently dependent on phonological context. On the other hand, the model misspells exception words it has been repeatedly trained on (Simulations 2–4), indicating that it does not use the whole-word context to generate a correct spelling. In such cases it appears to be largely driven by regular phoneme–grapheme correspondences. In this section we investigate these properties in more detail, and simulate nonword spelling data from normal subjects.

Connection weights following training. Before describing the results of the simulations, it is instructive to examine some aspects of the pattern of weights formed by the model as a result of training. Here we confine ourselves to examples that illustrate the way in which the model responds to differences in the degree of consistency of a phoneme–grapheme relationship, and its use of context.

Figures 8 and 9 each show the excitatory connections from one group of phoneme nodes to one group of grapheme nodes. Figure 8 shows the connections from the syllable-initial consonants to the syllable-initial graphemes, and Figure 9 those from the phonological to the orthographic vowels. The relative strength of the connection from a given phoneme to a given grapheme is shown by the size of the filled rectangle in the relevant cell of the matrix.

Figure 8. A subset of the weights developed by the model during learning. The matrix shows the excitatory weights formed between phoneme
nodes in the first onset position (rows) and grapheme nodes in the equivalent orthographic position (columns). Each row and column is
labelled with the associated phoneme and grapheme respectively. The size of a filled rectangle indicates the (relative) strength of the
connection between the nodes in the corresponding row and column. Empty cells have zero weights.


Figure 9. Excitatory weights formed between phonological and orthographic vowel nodes during learning. See Figure 8 for explanation and
comparison.

The initial consonants (Figure 8) show a clear pattern. Most phonemes have a strong excitatory connection to just one grapheme. In a few cases, a phoneme may show a second connection, e.g., initial /k/ has connections to both C and K. It is evident that where a particular phoneme–grapheme mapping is highly regular, and independent of context (e.g., words with an initial /b/ are always spelled with a B), the model simply learns a strong excitatory connection between the relevant (position specific) phoneme and grapheme. Otherwise, few connections develop to such graphemes. Examining the weight matrix as a whole we find that, in the case of initial consonants such as B, P, D, M, etc., the large excitatory weight shown in Figure 8 is the only such connection these graphemes receive. Hence, in these cases the model acts as a simple phoneme–grapheme conversion mechanism.

When phoneme–grapheme relationships are not so straightforward, the model will attempt to use any information from the phonological context which can help it make a correct choice. For instance, after short vowels (/I/, /&/, etc.) a final /k/ is usually spelled CK (SICK, BACK, LOCK, etc.), whereas after long vowels and diphthongs, it is spelled K (WEAK, FORK, CROAK, etc.). The model correctly predicts the spelling of the final /k/ in these words (and also for nonwords derived by changing the initial consonants of the words). This ability must depend on connections from the phonological vowel to the relevant coda consonants. Inspecting the relevant connections (not illustrated here), we find first that the coda /k/ has excitatory weights to both K and CK. Looking at the weights from the vowel phonemes, we find that the set of short vowels have excitatory connections to CK, and strongly inhibit K. The other vowels simply inhibit CK. Hence the /k/ activates both candidates and the vowels select between them.

Turning to the vowel-to-vowel connections (Figure 9), it is immediately apparent that they show much more ambiguity than the consonants.
Most vowels have connections to more than one graphemic vowel and many of the connections are relatively weak (the scaling of the rectangles representing connections in Figures 8 and 9 is the same). It is noticeable, however, that the short vowels (/I/, /e/, /&/, /0/, /V/, /U/) have relatively unambiguous connections, each forming a dominant link to a single grapheme (I, E, A, O, U, OO, respectively). The spelling of these vowels is generally consistent. The long vowels and diphthongs show more ambiguity. For instance, earlier we noted the model’s variability in spelling the long vowel /i/ (EE or EA). In Figure 9 we can see that this vowel has similarly sized connections to EE and EA, with a slight advantage for EE. In such cases, grapheme selection can easily be influenced by the balance of excitatory and inhibitory inputs from the phonological context.

We now describe two simulations aimed at assessing the model’s weighting of vowel spellings, and its use of phonological context to determine which vowel is the most appropriate.

Simulation 5. We can get some measure of the appropriateness of the ambiguous connection patterns formed by the vowels by comparing them with data on variability in vowel spelling in nonwords provided by Barry and Seymour (1988). This study was mainly aimed at investigating lexical priming of vowels in nonword spelling (the spelling of inconsistent nonwords can be affected by lexical primes), but unprimed spellings were collected to provide a baseline. Each vowel was tested using two nonwords (e.g., for the vowel /eI/, the stimuli were /teIn/ and /pleIl/). Appendices 1 and 2 of Barry and Seymour (1988) list all the unprimed spellings produced by their subjects for each nonword, along with the (absolute) number of subjects producing each spelling. From these raw scores (combining the data from the two nonwords for each vowel) we can produce a rank ordering of the frequency of use of a particular vowel spelling in this study. This ordering can be compared with the relative strength of the connections in the model from the same phonological vowels to vowel graphemes (Figure 9).

Of the 13 long vowels and diphthongs represented in the model, the Barry and Seymour (1988) study provides data on the spelling of 9. All 9 showed variability in their spelling. Table 5 shows the graphemes used to spell these vowels in this study, ranked (left to right) according to frequency. (Only the two or three most frequent spellings are listed. These account for the vast majority of the data. Individual subjects occasionally produced spellings that we have not listed.) Alongside the subject spellings, we list the two or three vowel graphemes which receive the strongest inputs from the phonological vowels in the model (Figure 9), in this case ordered according to strength of connection. As can be seen, the variability inherent in the model connectivity closely matches the variability in the subject data. In most cases, the spellings the model activates correspond to those used by the subjects, and the rank orderings are very similar.

Simulation 6. As noted, the major locus of inconsistency in English orthography is in vowel spelling (see Kessler & Treiman, 2001, for a quantitative
Table 5. Ambiguity in vowel spelling by participants and the model (a)

Long vowels                                   Diphthongs
--------------------------------------------  ------------------------------------------------
Vowel  Participants    Model                  Vowel  Participants      Model
/i/    EE > EA         EE > EA                /eI/   AI > A-E > AY     A-E > AY > AI
/u/    OO > U-E        OO > EW > U-E          /aI/   I-E > Y(-E)       I-E > Y(-E)
/O/    OR > AW > AU    OR > AW > AU           /oI/   OI > OY           OI > OY
/3/    UR > ER         UR > IR > ER           /@U/   O-E > OA > OU     O-E > OA > OW
                                              /aU/   OW > OU           OU > OW

(a) Participant spellings are ranked according to frequency of use (data from Barry & Seymour, 1988, Appendices 1 and 2), model spellings by strength of connection from vowel phoneme to graphemes. Spelling with a dash (e.g., O-E) denotes final-E spelling (e.g., POVE).
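
The agreement between the two sets of rankings in Table 5 can be tallied mechanically. The following is a minimal sketch, not part of the original analysis: the rankings are transcribed from Table 5, while the two summary measures (same first choice; participant spellings all present among the model's listed spellings) and the function names are illustrative assumptions.

```python
# Rankings transcribed from Table 5, most frequent / strongest first.
# Each entry maps a vowel to (participant ranking, model ranking).
TABLE_5 = {
    "/i/":  (["EE", "EA"],        ["EE", "EA"]),
    "/u/":  (["OO", "U-E"],       ["OO", "EW", "U-E"]),
    "/O/":  (["OR", "AW", "AU"],  ["OR", "AW", "AU"]),
    "/3/":  (["UR", "ER"],        ["UR", "IR", "ER"]),
    "/eI/": (["AI", "A-E", "AY"], ["A-E", "AY", "AI"]),
    "/aI/": (["I-E", "Y(-E)"],    ["I-E", "Y(-E)"]),
    "/oI/": (["OI", "OY"],        ["OI", "OY"]),
    "/@U/": (["O-E", "OA", "OU"], ["O-E", "OA", "OW"]),
    "/aU/": (["OW", "OU"],        ["OU", "OW"]),
}


def same_top_choice(table):
    """Vowels whose most frequent participant spelling is also the
    spelling with the strongest model connection (hypothetical measure)."""
    return [v for v, (people, model) in table.items() if people[0] == model[0]]


def fully_covered(table):
    """Vowels for which every listed participant spelling also appears
    among the model's listed spellings (hypothetical measure)."""
    return [v for v, (people, model) in table.items() if set(people) <= set(model)]


if __name__ == "__main__":
    print("same first choice:", same_top_choice(TABLE_5))
    print("participant spellings all listed for model:", fully_covered(TABLE_5))
```

On these two illustrative measures the first choices coincide for 7 of the 9 vowels (the exceptions are /eI/ and /aU/), and the participant spellings are all among the model's listed spellings for 8 of the 9 (the exception is /@U/, where participants used OU and the model lists OW).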

demonstration of this). However, statistical analyses of English sound-to-spelling correspondences have shown that the spelling of some inconsistently spelled vowels becomes more predictable when the preceding (onset) or following (coda) consonants are taken into account, though the effect of the coda consonants is considerably larger (Kessler & Treiman, 2001). While we were in the process of revising this article, we became aware of the results of a study by Treiman, Kessler, and Bick (in press, Experiment 1), who tested whether adult spellers showed any sensitivity to the phonological context provided by the coda when spelling monosyllabic nonwords. In this study, for each vowel that was investigated, two different phonological contexts were identified such that a particular spelling, referred to as the “critical spelling,” is used more in one context than the other (in the set of English monomorphemic monosyllables). For instance, for the vowel /e/ (as in PET), the critical spelling is EA, which commonly occurs only in the coda context /_d/ (e.g., HEAD, SPREAD, DREAD, etc.). In other contexts the spelling is consistently E, and this was the spelling expected to be produced in a set of control nonwords (we will refer to this as the “default spelling”, to contrast it with the critical spelling). For each such case, Treiman et al. constructed 10 nonwords with the experimental context (i.e., that associated with the critical spelling) in the coda. Each experimental stimulus was paired with a control stimulus, which was identical except that it lacked the experimental context. Treiman et al. report that the rate of production of the critical spellings was significantly greater in the experimental than in the control contexts, leading the authors to conclude that adult spellers are influenced by the surrounding phonological context when spelling vowels.

To test the model against these results we presented it with the nonword stimuli used in the Treiman et al. (in press) Experiment 1. As well as looking at the actual spellings produced by the model, we also compared the initial levels of activation of both the critical and default spellings in the experimental and control contexts. This is a useful comparison, as it provides a more detailed picture of the relative availability of a given spelling in a given context. It also permits us to perform a statistical analysis by item of the difference between the control and experimental contexts.

In their Experiment 1, Treiman et al. (in press) investigated the effect of coda consonants on the spelling of six vowels /e/, /i/, /U/, /aU/, /@U/, and /aI/. Of these we were able to test the first four in the model, as they involve a simple comparison between different vowel graphemes (see Footnote 6). The results of the simulation and a comparison with the Treiman et al. data are shown in Table 6. The first two rows show the proportion of words produced with the critical spelling in the experimental and control contexts, by both the human subjects and the model (the human data are the means across subjects). As can be seen, for each of the four vowels the model produced more of the critical spellings in the experimental contexts. Interestingly, for three of the four vowels, the model’s rate of production of the critical spelling was quite close to the mean of the subjects. The exception was the vowel /U/ (critical spelling U, expected spelling OO). The model was rather less likely to produce the critical grapheme than the

6 We did not include /@U/ because the comparison is between the critical spelling O and expected spelling O_E. The latter (a silent-E spelling) is not a grapheme in the current model, and we cannot assess its activation level independently of the activation of O in isolation. For the case of /aI/ the experimental context was /_t/ and the critical spelling IGH, as in, e.g., N-IGH-T. That is, Treiman et al. (in press) effectively treat IGH as a vowel grapheme and contrast it with the expected spelling I_E. Our model does not concur with this analysis; instead we parse NIGHT as N-I-GHT, with GHT as a coda grapheme. The control nonwords used by Treiman et al. for this vowel had codas other than /t/, e.g., /draIb/, /gaIf/, and hence the control condition measured the rate at which subjects produced spellings such as DRIGHB and GIGHF. Our model simply predicts that such spellings will never be produced, as they are orthographically impossible for it. In agreement with this, Treiman et al. report that such spellings were never produced by any subject. For the experimental nonwords (with rime /aIt/), the model spelled 1 of the 10 words with IGHT, and activated the GHT grapheme to some degree in every case (which it did not do for the control nonwords). Treiman et al. report a mean proportion of .27 for the critical spelling in the /_t/ context. Hence the model behaviour is consistent with the results for this vowel, although the complete explanation it offers of the difference between the conditions involves the orthographic representation, which is not the issue being addressed here.

Table 6. Results of Simulation 6. Data from Treiman et al. (in press) Experiment 1

Vowel and critical spelling                         /e/ – EA        /i/ – EE        /U/ – U         /aU/ – OW
                                                    Data   Model    Data   Model    Data   Model    Data   Model
--------------------------------------------------------------------------------------------------------------
Proportion of critical spellings,
  experimental nonwords                             .11    .10      .72    .90      .54    .20      .50    .60
Proportion of critical spellings,
  control nonwords                                  .05    .00      .45    .50      .43    .10      .08    .00
Mean proportion of vowel activation due to
  critical spelling, experimental nonwords                 .19             .63             .42             .50
Mean proportion of vowel activation due to
  critical spelling, control nonwords                      .10             .45             .29             .27
Difference in critical vowel activation,
  experimental vs. control, t test by items                t(9) = 2.49     t(9) = 2.95     t(9) = 4.04     t(9) = 4.36
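
Rows 3 to 5 of Table 6 rest on two small computations: the share of activation taken by the critical grapheme, a_crit/(a_crit + a_def), and a by-items t test. The sketch below illustrates both under stated assumptions. The function names are hypothetical, and treating the by-items test as a paired test is an assumption: it is consistent with the reported t(9) for 10 matched experimental/control items per vowel, but the text does not say so explicitly. No model activations are reproduced here.

```python
import math
import statistics


def critical_share(a_crit, a_def):
    """Activation of the critical grapheme as a proportion of the summed
    activation of the critical and default graphemes.
    Undefined (division by zero) if both activations are zero."""
    return a_crit / (a_crit + a_def)


def paired_t(experimental, control):
    """Paired t statistic over matched items (ASSUMED test form).
    Returns (t, degrees of freedom)."""
    diffs = [e - c for e, c in zip(experimental, control)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

For example, a critical grapheme activated at 0.2 alongside a default grapheme at 0.6 (made-up values) has a share of 0.25; feeding ten experimental and ten control shares to `paired_t` yields a statistic with 9 degrees of freedom.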

human subjects, irrespective of the context, preferring OO in nearly every case.

As a more sensitive test of the effect of context, we examined the relative availability of the critical grapheme by comparing its activation to that of the default grapheme for each stimulus. This was done by expressing the activation level of the critical grapheme as a proportion of the summed activation of both the critical and default graphemes; formally, $a_{prop} = a_{crit}/(a_{crit} + a_{def})$, which is equivalent to the Luce choice rule for two responses. The means of this measure for each vowel for the experimental and control nonwords are shown in rows 3 and 4 of Table 6. Row 5 shows the results of t tests by items comparing the two conditions. For every vowel, the relative activation of the critical grapheme is significantly higher in the experimental than in the control context.

Discussion of the phonological-route model

We have shown above that a two-layer network, trained on whole-word sound–spelling pairs using a very general learning rule (Sutton & Barto, 1981), comes to behave like a sublexical sound-to-spelling conversion system. That is, in the process of trying to learn to spell words, it acquires all the functional characteristics traditionally associated with the phonological or assembly route of the dual-route framework. Specifically,

1. Nonwords are spelled plausibly, generally using the most frequent sound–spelling relationships. However, at the level of phoneme–grapheme mappings, the model shows some variability.
2. Regular-consistent words are spelled correctly.
3. Exception words are regularised.

The phonological route shows additional properties that are not necessarily associated with the dual-route framework, in particular,

1. Multiple grapheme candidates can be activated in parallel, resulting in response competition.
2. This causes variation in response latencies, resulting in a consistency effect: inconsistent words and nonwords take longer to spell.
3. Despite the model only being trained on words, there is no effect of lexicality on spelling latencies (and no interaction of lexicality with consistency).
4. Individual phonemes are not always spelled in the same way, but show effects of surrounding context. The model is not functionally equivalent, for example, to a set of context-free phoneme–grapheme mapping rules.

With respect to the three traditional properties of the phonological route, we emphasise that they are not due to any nonstandard mechanisms built into the model. The model is a two-layer feedforward neural network, trained using a general learning rule, one which has found quite wide application in
human and animal learning (Gluck & Bower, 1988a, 1988b; Miller, Barnet, & Grahame, 1995; Shanks, 1991). This model minimises its error on the words it is trained on, but due to limitations inherent in the architecture, it is incapable of reducing the error to 0 on this particular problem. Instead, it can only represent the more statistically reliable sound–spelling regularities. These regularities also turn out to produce good nonword spelling. This is an especially significant result in a two-layer network, because the model cannot be tuned post hoc to produce a desired degree of generalisation, for instance by trial-and-error selection of the number of hidden units (see Footnote 4).

With respect to the second set of properties listed above, the central feature of the model is that it need not generate just a single candidate grapheme for each phoneme, but can produce multiple competing candidates. In addition, the relative strength of these candidates is not fixed, but is affected by surrounding context. This property is crucial to explaining the kind of variability found in subject data on nonword spelling (Barry & Seymour, 1988; Treiman et al., in press), and depends on the weights formed during learning.

Having established the viability of one route of the dual-route multilayer net as a sound-to-spelling conversion mechanism, we now turn to the other route, and to the interaction between the routes at the common output level. This second route enables interactions between sound and spelling to be mediated by another layer of units. In the following we will assume that these units simply learn the patterns they are exposed to, i.e., that they function as an orthographic output lexicon.

A PARALLEL, INTERACTIVE MODEL OF SPELLING

In the simulation of the isolated assembly route described above, there is no temporal dynamic in the way the phonological input initially activates the graphemic output. Phonological activation spreads in a single pass to the grapheme nodes, which then compete to generate an unambiguous spelling plan. This permits the effects of response competition to be studied with a minimum of complicating factors. However, we now want to consider the temporal interaction of this route with output from a lexical procedure. As described in more detail below, we envisage the build-up of lexical activation as being a gradual process; in particular, we will argue that the spelling of high-frequency words becomes available more rapidly than that of low-frequency words (see Zorzi et al., 1998a, for a similar treatment of lexical activation in reading). To model the interaction of the phonological route with this process, we also require its output to build up over time, rather than being instantly available.

Method: Description of the model

We will describe the functioning of the two-route model in three stages: first, the activation of graphemes by the lexical route; second, the activation by the phonological route; and finally, the integration of the two sources of activation.

Dual-route model: Grapheme activation by the lexical route

To model lexical contributions to spelling, we have implemented a simulated orthographic output lexicon (OOL). The OOL itself is conceived of as a competitive, localist network in which each node represents the spelling of a single orthographic word form, and competitive interactions allow just one node to become active when a familiar phonological code is encountered. The implementation of the lexical route involves a number of architectural simplifications compared to box-and-arrow models such as that shown in Figure 1. The main functional property we are interested in implementing is the gradual activation of a single orthographic lexical candidate. In principle this might be due to input from a phonological lexicon, semantics, or both. However, we do not attempt to simulate these processes. In the implemented model, the vector of phoneme activations is simply matched against a stored input weight vector for each orthographic lexical item, and the word producing the best match between the two is selected. If the input word is a nonhomographic homophone (e.g., /bin/, BEEN/
BEAN), then the most frequent spelling is selected (we assume that under normal circumstances semantic context plays a part in disambiguating the input; see Footnote 7). Thus the two important features of this lexical process are:

1. Activation of a selected lexical node is gradual, building to an asymptotic level over a number of time steps.
2. The rate at which a lexical node becomes activated by its adequate stimulus depends on the frequency of the word it represents.

We start with the frequency manipulation.

Word frequency as a dynamic effect. The nodes in the OOL are conceived of as being embedded in a competitive network of the type discussed above for the grapheme output process. In such a network each node has an excitatory feedback loop onto itself, giving it the ability to support its own activation. Typically, the strength of this feedback depends on a parameter (the feedback weight) that is the same for all nodes in the network (see equation 3). In the implementation of the OOL, we allow this feedback weight to vary as a function of word frequency: the more frequent a word is, the stronger the feedback weight. In principle, we believe that such a modulation of a unit’s feedback weight could easily be achieved as part of a competitive learning algorithm. In the implementation we simply derive the feedback weight for each unit from its frequency by a rule, in which a maximum possible value for the weight is multiplied by a sigmoidal function of frequency:

$$w_i^{+} = W_{max} \cdot f(freq_i) \qquad (6)$$

where $w_i^{+}$ is the feedback weight to a lexical unit $u_i$, $W_{max}$ is the maximum possible weight, $freq_i$ is the Kucera and Francis (1967) frequency count of the word, and $f(\cdot)$ is a sigmoidal function in the range [0,1], with $f(0) = 0$. The parameter $W_{max}$ is set to .9 in all the following simulations. The way in which this frequency-dependent parameter affects the “excitability” of a unit is described next.

Activation of the lexical nodes. The input to a lexical node from the input phonemes is assumed to develop gradually, reaching an asymptotic level in a given number of time steps, a process we refer to as “ramping.” The phonological input to a lexical node $u_i$ at time $t$ is given by,

$$Input_i(t) = Phon_{max} \cdot \alpha^{[ramp - t]^{+}} \qquad (7)$$

where $Phon_{max}$ is the maximum possible input, $\alpha$ is a scaling factor, $ramp$ is the time it takes (in discrete cycles) for the input to reach its maximum value, and $[x]^{+} = \max(0, x)$. In all simulations $Phon_{max} = 1$, $\alpha = .8$. The value of the parameter $ramp$ is discussed below.

The activation level $LexA_i$ of a lexical node $u_i$ at time $t$ is then given by the sum of its current input, as defined by equation 7, and a frequency-dependent proportion of its previous activation,

$$LexA_i(t) = Input_i(t) + w_i^{+} \, LexA_i(t - 1) \qquad (8)$$

The value of the feedback weight $w^{+}$ depends on the word frequency as specified in equation 6. Finally, lexical node activations are bounded in [0,1]. The overall result of equations 6–8 is simply that the selected lexical node becomes activated gradually, and that the rate of increase of activation is proportional to its frequency.
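
The ramped lexical dynamics of equations 6–8 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the text specifies only that f is sigmoidal, lies in [0,1], and has f(0) = 0, so the tanh form and its `scale` constant below are hypothetical, as are the function names and the choice to count time steps from 1. W_max, Phon_max, alpha and ramp take the values given in the text.

```python
import math

W_MAX = 0.9      # maximum feedback weight (from the text)
PHON_MAX = 1.0   # maximum phonological input (from the text)
ALPHA = 0.8      # scaling factor (from the text)
RAMP = 10        # cycles for the input to reach its maximum (from the text)


def feedback_weight(freq, scale=100.0):
    """Equation 6. ASSUMPTION: tanh(freq / scale) stands in for the
    unspecified sigmoid f; both the form and `scale` are hypothetical."""
    return W_MAX * math.tanh(freq / scale)


def ramped_input(t, max_input=PHON_MAX):
    """Equation 7: input grows exponentially until t reaches RAMP."""
    return max_input * ALPHA ** max(0, RAMP - t)


def lexical_activation(freq, steps=20):
    """Equation 8, with activation bounded in [0, 1]. Returns the trace
    of activations for t = 1..steps (starting at t = 1 is an assumption)."""
    w = feedback_weight(freq)
    act, trace = 0.0, []
    for t in range(1, steps + 1):
        act = min(1.0, max(0.0, ramped_input(t) + w * act))
        trace.append(act)
    return trace


def cycles_to_ceiling(trace):
    """First time step at which activation reaches 1.0, or None."""
    for t, act in enumerate(trace, start=1):
        if act >= 1.0:
            return t
    return None
```

With these illustrative settings a high-frequency word (e.g., `freq=500`) reaches ceiling several cycles before a low-frequency word (e.g., `freq=1`), which only reaches it once the ramped input itself is maximal.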

7 A further consequence of the simplification of access to the OOL is that partial activation of nodes representing words
phonologically similar to the target does not occur. However, we do not wish to claim that partial activation of OOL nodes is
impossible. As stated, we envisage all lexical nodes as being embedded in a competitive network in that lateral inhibitory interactions
allow the node that best “matches” current inputs to dominate the activation of other nodes, i.e., precisely the kind of dynamics
implemented amongst the grapheme units which allows selection of at most a single response in each syllabic position. While there is
no theoretical difficulty in implementing the competitive selection mechanism amongst the lexical nodes, we felt that the additional
complexity it would add to the implementation, including the need for further free parameters governing the competitive interactions,
would have little benefit with respect to the phenomena we are interested in, as we do not attribute any of them to possible activation
effects of multiple lexical candidates.


Lexical activation of grapheme nodes. A lexical node sends activation to all graphemes that it contains and inhibition to those it doesn’t. Thus the lexical input, LexInput, to a grapheme node $i$ is given by,

$$LexInput_i(t) = w_{lex} \, LexA_j(t) \qquad (10)$$

where LexA is the activation of the selected lexical item (equation 9), and $w_{lex}$ is a weight parameter. For graphemes that are in the word, $w_{lex} = 1.0$; for those which are not, $w_{lex} = -.5$. The positive weight simply transfers the lexical activation directly to the graphemes contained in the word, while the negative weight helps to suppress any inconsistent input from the sound–spelling route. The more rapid activation of higher-frequency words leads to more rapid activation of their constituent graphemes. As noted above, lexical node activation has a maximum value of 1.0. The setting of the excitatory lexicon-to-grapheme weight to unity means that lexical input to grapheme nodes also has a maximum of 1.0.

Dual-route model: Grapheme activation by the phonological route

Grapheme nodes receive input from the phonological route in parallel to the lexical input. The generation of this input is exactly as described in the discussion of the phonological route in isolation, except that it builds up over time. This is achieved in exactly the same way as for the lexical route, viz. a maximum input value for each grapheme node is multiplied by an exponential ramping factor,

$$PhonInput_i(t) = Max_i \cdot \alpha^{[ramp - t]^{+}} \qquad (11)$$

where $PhonInput_i(t)$ is the input to grapheme unit $u_i$ from the phonological route at time $t$, and $Max_i$ is the maximum value of this input (cf. equation 7, for lexical activation). The term $Max_i$ is computed at the outset and is the activation of a grapheme node $i$ (prior to grapheme competition) that would be produced by the phonological route if the input phoneme nodes had their maximum activation of 1.0. Note that the phonological route cannot activate a grapheme node above a value of 1.0 (i.e., $Max_i \le 1$), and hence PhonInput, like LexInput, cannot be greater than 1.0.

Dual-route model: Summation of inputs and grapheme competition

The activation of grapheme nodes follows the same rules as specified earlier for the phonological route in isolation (equations 2–4), except that input is now also received from the lexical route. The activation $GraphA_i$ of a grapheme node is given by,

$$GraphA_i(t) = \delta \, GraphA_i(t - 1) + ExtInput_i(t) - IntInput_i(t) \qquad (12)$$

where $\delta$ (= .6) is a decay parameter; ExtInput (external input) is just the sum of the lexical input (LexInput, equation 10) and the phonological input (PhonInput, equation 11); and IntInput (internal input) is the lateral inhibitory input due to other activated grapheme nodes competing for the same output position (equation 4). As before, grapheme node activations are clipped to remain in the range [0,1].

As discussed above, the input from the two routes is “ramped,” and the time taken (in simulated time steps) for input to reach its maximum value depends on the parameter ramp (equations 7 and 11). In principle, this could be set differently for the two routes, reflecting a greater emphasis on the use of one route over the other. For instance, a large difference between the two values would effectively disable the slower route. In the following simulations, it is assumed to be equal for the two routes (i.e., phonological activation flows into the two routes at the same rate), and is set to a value of 10.

Summary of parameters. The full model with activation ramping contains more parameters than the phonological route in isolation (Table 7). They are set to values that enable the model to produce robust, unambiguous spelling of all “known” words (in particular, of low-frequency exception words) given the fairly strict response criteria specified above (all grapheme nodes having an activation of either 1 or 0). The same parameters are used in all simulations, except in the simulations of surface dysgraphia, where we model the deficit as a lowering of the ceiling of activation for lexical nodes.


Table 7. Parameters used in dual-route simulations

Symbol            Function                                                        Value
Wmax              Maximum strength of lexical node excitatory self-feedback.      .9
                  Modulated by lexical frequency.
ramp              Number of cycles for phonological input to each route to        10
                  reach its maximum value. Set to be equal for the two routes.
Wlex (positive)   Excitatory lexicon-to-grapheme weight.                          1.0
Wlex (negative)   Inhibitory lexicon-to-grapheme weight.                          –.5
w+                Grapheme node self-activation.                                  .6
w-                Grapheme node lateral inhibition.                               –.75
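The following is a minimal sketch, not the authors' implementation, of how equations 11 and 12 might be iterated with the Table 7 values. Several things in it are assumptions: the ramp base α is not given in this section, so the value used is only a placeholder; [x]+ is read as max(x, 0); and the lateral inhibition term stands in for equation 4, which is not reproduced here, and is assumed to be the Table 7 inhibition weight times the summed activation of the competing nodes. All function and field names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    ramp: int = 10         # Table 7: cycles for input to reach its maximum
    delta: float = 0.6     # decay in equation 12 (w+ in Table 7)
    w_inhib: float = 0.75  # magnitude of lateral inhibition (w- = -.75 in Table 7)
    alpha: float = 0.7     # ASSUMPTION: placeholder ramp base, 0 < alpha < 1


def ramp_factor(t: int, p: Params) -> float:
    """alpha ** [ramp - t]+, reading [x]+ as max(x, 0) (an assumption)."""
    return p.alpha ** max(p.ramp - t, 0)


def phon_input(max_i: float, t: int, p: Params) -> float:
    """Equation 11: ramped phonological input to one grapheme node."""
    return max_i * ramp_factor(t, p)


def clip01(x: float) -> float:
    """Keep an activation inside the range [0, 1]."""
    return min(1.0, max(0.0, x))


def step(prev: list[float], ext: list[float], p: Params) -> list[float]:
    """Equation 12 for the grapheme nodes competing for one output position.

    prev holds the activations at t - 1; ext holds each node's external
    input at t (lexical input plus phonological input).
    """
    total = sum(prev)
    new = []
    for a, e in zip(prev, ext):
        int_input = p.w_inhib * (total - a)  # ASSUMED stand-in for equation 4
        new.append(clip01(p.delta * a + e - int_input))
    return new
```

With these definitions the ramp factor rises towards 1.0 and stays there once t reaches ramp, so a node's phonological input levels off at Max_i, and a strongly supported node can drive a weaker competitor in the same position to zero.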

Results: Performance of the dual-route model

Lexical dominance: Spelling exception words
We showed earlier how the isolated sound-to-spelling component of the model regularises inconsistent and exception words on which it has been trained. However, when used in combination with the lexical route just described, this should not cause the model to make large numbers of errors on these words. That is, when investigating the full model, it is best to start with a version that can actually spell irregular words well. This is achieved by the model as described above, with the given parameter setting. In this form, the model correctly spells even low-frequency exception words (the most “difficult” cases), the lexical route being able to dominate the combined inputs from both routes. This dominance is due to a combination of factors.
First, the input from an activated lexical node to each grapheme it contains has an asymptotic value of 1.0. Although the sound–spelling input can also reach this level for particular graphemes, it frequently does not, and in particular it does not do so in cases of sound–spelling inconsistency (e.g., /i/ → EE, EA; see Figure 5). Hence the activation from the lexicon is stronger where sound–spelling relationships are inconsistent. Furthermore, if the word has an inconsistent, rather than truly exceptional, spelling then one of the candidates activated by the phonological route is likely to be the lexically correct one. In this case the lexical spelling will be further reinforced. In combination with the lateral inhibitory interactions, this can suppress lexically incorrect phonological activation. All words benefit from this effect, regardless of frequency.
Second, although the “ramping” parameter is set to be equal for the two routes (i.e., activation spreads at the same rate into the two routes), the use of the frequency-weighted feedback parameter in the lexical route speeds up lexical activation in proportion to the word frequency. Hence the lexicon-to-grapheme input can increase more rapidly than the phonological input, even when both inputs have the same asymptotic value. This advantage is particularly enjoyed by higher-frequency words. As with the first factor, this only causes the suppression of incorrect graphemes by virtue of the lateral inhibitory interactions amongst graphemes.
Finally, lexical nodes also send inhibitory input to graphemes they don’t contain. This feature is distinct from the first two in not depending on competition amongst grapheme nodes to have an effect. In many cases it is fairly redundant, the first two factors being sufficient to establish lexical dominance even when lexical input is purely excitatory (this is easily achieved in the model implementation by setting the lexicon-to-grapheme inhibitory weight to 0). However, it is needed in a few cases where the phonological route activates a grapheme in an output position in which the lexical route activates nothing, in particular a redundant final-e, e.g., “soon” → SOONE. In this case there are no competing graphemes to suppress the activation of the final-e. Ideally, it would be parsimonious to do without this factor, thus leaving response competition as the only mechanism for resolving ambiguous graphemic activation. However, to deal with cases such as SOONE, this would appear to


require the use of “null” grapheme nodes (representing empty positions): A lexically activated null grapheme in the last position would compete with, and suppress, the final-e. However, for consistency of representation, these null nodes would also have to be included in the training and output of the phonological route. This would complicate the model and may have unexpected side effects. Since we currently know of no evidence to support the existence of null grapheme nodes (which can be activated and compete like any other), inhibitory lexical input is used, though its strength is kept as low as possible.
It is important here to emphasise a final point: Although these features result in the dominance of the lexical route in spelling known words, they do not remove all influence of the sound-to-spelling process, and this influence should not be considered to be merely a source of interference. In particular, where the two routes are in agreement, the input is summed, and lexical activation can be strongly reinforced by input from the phonological route. Since most graphemes in the great majority of English words are in regular (or quasiregular) correspondence to the words’ phonemes, the pooling of the two sources of graphemic activation is generally beneficial. That is, phonological input to the normal lexical-spelling process should not be construed as a source of noise that it would be better to do without.

Simulation 7: The frequency-by-regularity interaction
Reading words aloud is known to be affected by the regularity of the words. Irregular words (those containing at least one low-frequency grapheme–phoneme correspondence) have longer reading latencies than regular words (Baron & Strawson, 1976). This effect has been reported to interact with word frequency, such that it is particularly strong for low-frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984; Taraban & McClelland, 1987). Related findings have been reported from acquired surface dyslexic subjects, who can read nonwords well but tend to regularise the pronunciation of exception words. The latter tendency is especially marked for low-frequency exception words (McCarthy & Warrington, 1986; see Zorzi et al., 1998a, for a simulation of surface dyslexia).
Analogous results have been reported for spelling (Kreiner, 1996; Kreiner & Gough, 1990). For instance, Kreiner (1996) used an oral spelling test to examine time taken to spell (polysyllabic) words as a function of their rated familiarity (a variable correlated with frequency) and degree of regularity. Both total spelling time and time to start spelling were measured. Reliable main effects on total spelling time were found for both familiarity and regularity, higher values of either factor producing more rapid spelling. Most importantly, there was a significant interaction between the factors, the low-familiarity words being much more affected by regularity than the high-familiarity words. A similar effect of the interaction was also found for subjects’ time to start spelling.
Kreiner (1996, Experiment 3) reports an analogous pattern of results, including the interaction, when the data were analysed for spelling accuracy. Low-familiarity words were generally spelled worse than high-familiarity words, but the effect was much greater for irregular than for regular words. Results regarding spelling accuracy in normal subjects are reflected in the impaired spelling of subjects with acquired surface dysgraphia. Recently, Rapp et al. (2002) have reported data from surface dysgraphic patient LAT, who achieved 90–98% correct spelling of pseudowords while showing significantly impaired spelling of real words. LAT showed both a frequency effect (high-frequency words spelled better than low-frequency words), and a regularity effect (regular words spelled better than irregular words). Importantly, Rapp et al. report a frequency-by-regularity interaction, such that low-frequency, irregular words were always spelled with lowest accuracy. This pattern clearly indicates an impairment in lexical spelling with spared grapheme–phoneme conversion. This view is reinforced by the observation that the vast majority (90–100%) of LAT’s spelling errors were phonologically plausible.
We first examine the model’s behaviour using a response latency measure. Later we will look at accuracy in the context of simulating the surface


dysgraphic pattern. To examine latency we used four word lists adapted from the Taraban and McClelland (1987) study of reading. These lists (see Appendix A) consist of two types of word stimuli (regular and exception), and two frequency bands (high and low). The same stimuli were used earlier to confirm the consistency effect in the isolated phoneme–grapheme route. In that simulation, word frequency was irrelevant, and the four lists were collapsed into two lists of 48 items of mixed frequency. In this simulation, word frequency is a factor and we first wished to ensure that (for the model) there was no confound between frequency and regularity for the phonological route (i.e., that the two lists of regular words were equally regular, the two lists of exception words equally exceptional; recall that the original word lists were compiled on spelling-to-sound criteria). Thus we first presented all four lists (of 24 items each) to the phonological route in isolation and collected the model’s latencies (as described earlier). No effect of frequency was found for either word type, the model producing almost identical mean latencies for both lists of regular words and for both lists of exception words.
Following this, the same word lists were presented to the full model, and the time taken (in simulated processing cycles) to produce an unambiguous spelling plan was recorded. All the stimuli were spelled correctly, including the low-frequency exception words. Figure 10 shows the mean latencies for the four different conditions. The response latencies were submitted to a two-way ANOVA with factors Word Type (regular or exception) and Frequency (low or high). Significant main effects were found for both factors: Word Type, F(1, 92) = 17.78, p < .001, regular words being faster than exception words; Frequency, F(1, 92) = 152.22, p < .001, high-frequency words being faster than low-frequency words. Importantly, the interaction of Word Type × Frequency was also significant, F(1, 92) = 10.76, p < .01, that is, change in word frequency has a greater effect on the spelling of exception words than on regular words. The model thus shows the basic pattern found in studies of spelling time for polysyllabic words reported by Kreiner (1996).

Figure 10. Effects of word frequency and regularity on the time taken by the dual-route model to produce a consistent, fully activated spelling pattern. Both factors produced significant effects, and the interaction was also significant. The stimulus lists used in the study are given in Appendix A.

Discussion. The finding of a highly significant frequency effect demonstrates that the lexical frequency manipulation in the model (variation in the positive feedback weight on lexical nodes) makes itself felt at the grapheme level, even though grapheme nodes are also receiving input from a phonological route that does not show a frequency effect (indeed, does not show a lexicality effect, as discussed earlier). Similarly, the consistency effect found in the phonological route produces an apparent lexical regularity effect: regular words are spelled significantly faster than irregular ones. The model would not show this effect without the interactive input from the phonological route when spelling known words. Finally, the frequency-by-regularity interaction was also significant: The low frequency of a word causes particular difficulty if the word is also irregularly spelled. For these words, the effect of the relative slowness of the lexical activation is compounded by (1) the lack of support from the phonological route for the exceptional PG mappings in the word, and (2) response competition due to activation of higher frequency mappings. For high-frequency exception words, the lexical route activation is sufficiently rapid that the effects of these two factors are much weaker. In particular, effects of response competition are much reduced, as the low-probability graphemes activated by the lexical route quickly build up a high activation enabling them to inhibit any activation of potentially competing graphemes.

Modelling impaired spelling: Surface dysgraphia
We showed earlier that the isolated phonological route could spell regular words and nonwords well, and consistently regularises exception words. As such it behaves like a very severe surface dysgraphic subject. However, such subjects typically show a preserved ability to spell many exception words, and in particular exhibit a frequency-by-regularity interaction such that their spelling of exception words gets worse as the frequency decreases (e.g., patient MP, Behrmann & Bub, 1992; patient LAT, Rapp et al., 2002). In terms of the current model, this pattern clearly suggests some general weakening of the influence of the lexical route rather than its complete neutralisation. To investigate this possibility we examined the performance of the full model with the output of the lexical pathway weakened.
There are a number of manipulations that may be made to the lexical route which would weaken its influence on the spelling process. For instance, as described above, the rate of lexical activation is partially determined by a ramping parameter, which is set to be equal for both routes in the model. If the value of this parameter is increased for the lexical route, then lexical activation will be slowed relative to the phonological route (see Zorzi et al., 1998a, for an example of such a manipulation in the context of reading). An alternative proposal is that the asymptotic strength of the lexical output has been reduced following neural damage (e.g., due to lowered excitability of the neural structures encoding orthographic lexical knowledge). In this case the ramping parameter is itself unaffected, but normal levels of output are never achieved.
In the following studies we use the latter manipulation to simulate the surface dysgraphic deficit. The activation ceiling is reduced for all lexical nodes equally, irrespective of frequency. Hence, to the extent that the model exhibits the more detailed surface dysgraphic pattern, it is not due to the representation of lower frequency words suffering greater “damage.” Note that this kind of simulated lesion can be easily reconciled with the hypothesis that spelling partly depends upon the integrity of the semantic system (Graham et al., 2000). Weaker activation of the OOL might result from the lack of input from a damaged semantic system, or alternatively from the absence of coherent resonance (via recurrent feedback) between orthographic and semantic representations. The detailed modelling of surface dysgraphia in the context of semantic impairments (as well as its time course due to the progressive nature of the deficit) represents an intriguing challenge but one that is beyond the scope of the present paper.

Surface dysgraphia: Patient MP. Behrmann and Bub (1992) present a fairly detailed analysis of the spelling of a surface dysgraphic (and dyslexic) patient, MP. MP was asked to write to dictation a total of 392 words (mono- and polysyllabic), half of which were regular and half irregular (see Behrmann & Bub, 1992, Appendix 3 for a complete list of the stimuli along with MP’s spelling of them). The assignment of words to these classes was made on the basis of spelling-to-sound regularity, i.e., with respect to reading. Both sets of words were divided into six frequency bands, ranging from less than 10 per million to more than 200 per million. For spelling, MP showed a significant effect of regularity on accuracy, spelling regular words better than irregular words. In addition her spelling of the irregular items showed a marked frequency effect, accuracy decreasing with frequency. No equivalent trend of accuracy against frequency was found for the regular words, resulting in a frequency-by-regularity interaction. Although Behrmann and Bub did not analyse the spelling errors produced by MP any further, it is clear from the results provided in their Appendix 3 that the vast majority of them were phonologically plausible.
In terms of comparing the model with the data from MP, a major problem is that the word lists


include many polysyllabic words, whereas the model is restricted to monosyllables. An additional, less serious, problem results from the use of grapheme–phoneme correspondences to define regularity. Consequently some of the supposedly “regular” words contain rather low-frequency phoneme–grapheme correspondences, e.g., siege, niece, limb, launch, etc. The inclusion of such stimuli in the regular-word set is likely to contaminate the results somewhat. Fortunately, Behrmann and Bub (1992) provide a complete listing of all the stimuli they used, along with MP’s spelling of them. We were therefore able to analyse their data for the monosyllabic words only, and compare this with the performance of the model. Before looking at the performance of the model with an impaired lexical route, we tested the unimpaired model on the stimuli, and analysed the model’s response latencies. As well as providing a predicted “normal profile” for these stimuli, it allows us to examine the frequency-by-regularity effect reported above in more detail.

Simulation 8: Normal performance
As noted above, the polysyllabic words had to be removed from the Behrmann and Bub (1992) stimuli, and this occasioned some additional changes. In particular, the removal of the polysyllabic words led to there being very few stimuli in some conditions compared to others (e.g., 7 against 27), so we supplemented the depleted conditions with more words of the appropriate frequency and word-type to get a better sample with which to test the model. The resulting sets of regular and irregular words are shown in Appendices B and C, along with the mean frequency of each word group. In the end, the monosyllabic stimuli consisted of 216 words.
The 12 lists were presented to the model and the response time for each word was recorded. Parameters were exactly as for the previous simulations. The mean latency for each word list is shown in Figure 11.

Figure 11. Model latencies for regular and irregular words in six different frequency bands (for stimuli, see Appendices B and C).

The results were analysed using ANOVA for two factors, Regularity and Frequency. Both factors produced significant main effects: Regularity, F(1, 215) = 36, p < .0001; Frequency, F(5, 215) = 22.4, p < .0001. The Frequency × Regularity interaction was also significant, F(5, 215) = 2.52, p < .05. This confirms the results reported above using different sets of words. It is clear from the graph that the RT difference between regular and exception words increases as word frequency decreases, and that at the highest frequencies there is no regularity effect. To look at this in more detail, the data were subjected to trend analysis for both factors and the interaction. The two main effects showed significant linear trends, and the only significant trend in the interaction was also linear (t = 2.68, p < .01). Hence the main difference in the two curves is in their slope, the curve for the exception words being significantly steeper than that for regular words.

Simulation 9: Weakening the lexical route
Figure 12 shows the spelling accuracy data (percentage correct) for MP on the monosyllable set, for each of the 12 conditions (6 frequency × 2 regularity levels).
It is immediately apparent that the large effect of regularity reported for the complete set of words (Behrmann & Bub, 1992) is reproduced when the monosyllables are analysed in isolation. In addition, regularity interacts with frequency. As is clearly


shown by the graph in Figure 12, the effect of frequency is very pronounced for irregular words but virtually absent for regular words.

Figure 12. Percentage correct spelling of monosyllabic regular and irregular words by patient MP (data derived from Behrmann & Bub, 1992, Appendix 3) and by the lesioned model. Scores are shown for words from six different frequency bands. The model was fitted to the patient data by finding a value for the strength of the lexical output that produced the same percentage of correct responses on the highest-frequency irregular words (74%).

In comparing the model to such data, it is important to be clear on the aims of the simulation. We wish to demonstrate that a very simple global impairment to the model provides a good quantitative and qualitative fit to the main features of the data. Quantitatively, we take these to be a significant main effect of regularity, and a frequency-by-regularity interaction such that relative frequency has a greater impact on irregular than on regular words. Qualitatively, the kinds of spelling errors produced by the model should also resemble those produced by MP, i.e., be phonologically plausible (a more stringent test of qualitative similarity is described below in relation to data produced by Rapp et al., 2002). It is not an aim of the simulation to obtain a close quantitative fit of the model to every point in the MP monosyllable data. For one thing, some of these individual points (in the subject data) are based on stimulus sets with very few members (e.g., seven) and may not be very reliable. In general, fitting detailed data from just a single subject by searching through a model’s parameter space is not very revealing, unless the parameters can be fixed in some independent way. In this case, we leave the model parameters exactly as described above for modelling normal performance. The only change is that the strength of the output of the lexical pathway is reduced by a constant factor.
The strength of the lexical pathway has to be chosen to reflect the degree of MP’s deficit in some reasonable manner. A good “anchor point” should clearly involve the data most relevant to the description of the condition. Beyond this we suggest that a “minimal impairment” strategy is preferable; that is, the impairment to the model should permit it to match the subject’s best performance (on the relevant condition). This ensures the model is not over-damaged. The key data in this case involve the irregular words, and MP performs best on the highest frequency ones (see footnote 8). Hence we anchored the model to the data by finding a value for the lexical strength which produced an error rate in the model as close as possible to that shown by MP for the highest-frequency irregular words. MP scored 74% correct in this condition. With the lexical strength set to .34, the model produced the same result. The word sets for the other 11 conditions were then presented to the model and the spellings produced were analysed for accuracy.

Footnote 8: An additional significant advantage of this anchor point is that this set contains a relatively large sample of monosyllables, 23.

The results from the model are shown in Figure 12, together with MP’s performance. As can be seen, the model results fit the data fairly well. The regular words show no significant effect of frequency, while the irregular words show an advantage for higher-frequency words.

Simulation 10: MP’s spelling errors
As well as reproducing the quantitative error profile shown above, a model of surface dysgraphia should


show a good qualitative agreement with the kinds of errors participants make. In Table 8 we show the spellings produced by MP and the model for the irregular words from the two lowest-frequency bands. As can be seen, there is quite good agreement between the two sets of responses as to which words are difficult; the model tends to make mistakes where MP does, and gets many of the same words right. The great majority of errors produced by both model and patient are phonologically plausible, and in many cases are identical.

Table 8. Spellings produced by MP and the model for irregular words from the two lowest frequency bands

Frequency 0-9                     Frequency 10-19
Target    MP       Model          Target    MP       Model
womb      woom     woom           tomb      tume     tom
chrome    crome    crome          ache      ake      ake
chef      shef     shef           pint      pinte    ✓
yacht     yaut     yott           shove     ✓        ✓
awe       aw       ✓              glove     ✓        ✓
dove      ✓        ✓              mild      ✓        ✓
hind      hinde    ✓              gauge     gaje     gage
shone     shaun    ✓              deaf      def      def
dread     ✓        ✓              wool      ✓        ✓
sweat     swet     swet           brow      ✓        ✓
yearn     yurn     ✓              plough    ✓        ✓
sieve     sive     sive           soup      ✓        soope
soot      ✓        ✓              sword     sord     ✓
crook     ✓        ✓              dough     doe      ✓
hook      ✓        ✓              ton       tun      tun
brook     ✓        ✓              shoe      ✓        shoo
vow       ✓        ✓
sponge    spunj    spunge
suave     swav     swarve
cough     ✓        cof
trough    ✓        trof
shove     shuv     ✓

✓ = correct spelling.

Simulation 11: Errors and dual-route interactions – data from patient LAT
The current model is parallel and interactive. Both routes receive phonological input in parallel and interactions occur at the graphemic output stage where both cooperation and competition take place. However, it is conceivable that the results discussed above might be shown by a dual-route model with neither of these properties. For instance, it might be proposed (see, e.g., Kreiner, 1992) that access to the lexical route occurs first (and in isolation), the phonological route only coming into play if no lexical candidate is activated (hence no parallelism, and no interaction). In such a scenario, lexical access would be all or nothing, but probabilistic, with the probability of successful access being lower for low-frequency words. The impairment resulting in the surface dysgraphic profile would be due to an overall lowering of the probability of successful lexical access; access to low-frequency lexical items would, on average, fail more often than access to high-frequency items. In the case of regular words, this would not have much effect on errors as the phonological route would generally produce the correct spelling. For irregular words, the more frequent use of the phonological route for low-frequency words would lead to the error profile discussed above.
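The all-or-none alternative just described can be made concrete with a toy sketch. It is only an illustration of that hypothetical noninteractive account, not part of the present model: the function name, the access probabilities, and the example word pair are all assumptions chosen for the demonstration.

```python
import random


def spell_noninteractive(lexical_spelling: str,
                         phonological_spelling: str,
                         p_access: float,
                         rng: random.Random) -> str:
    """All-or-none lexical access with a phonological fallback.

    With probability p_access the stored (lexical) spelling is retrieved
    whole; otherwise the spelling comes entirely from the phonological
    route. The two routes never contribute to the same response.
    """
    if rng.random() < p_access:
        return lexical_spelling
    return phonological_spelling


def error_rate(lexical_spelling: str,
               phonological_spelling: str,
               p_access: float,
               trials: int,
               rng: random.Random) -> float:
    """Proportion of trials on which the response differs from the target."""
    errors = sum(
        spell_noninteractive(lexical_spelling, phonological_spelling,
                             p_access, rng) != lexical_spelling
        for _ in range(trials)
    )
    return errors / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical access probabilities for a high- and a low-frequency
    # irregular word after a global "lesion".
    for label, p in (("high frequency", 0.8), ("low frequency", 0.4)):
        print(label, error_rate("TOUGH", "TUFF", p, 1000, rng))
```

Under this scheme the expected error rate for an irregular word is simply one minus its access probability, and a regular word (whose two spellings coincide) is never misspelled, which reproduces the frequency-by-regularity profile in outline. Every response is either the whole lexical spelling or the whole phonological one.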


Such an account is quite plausible and, given Seeing these results, we examined the errors
the possibility of deliberate, strategic control over produced by the model in simulations of impaired
spelling, possibly correct in some cases. Interest- spelling for evidence of the same effect. Candidate
ingly, though, it makes different predictions than words would need to contain more than one low-
the current model regarding the occurrence of a frequency phoneme–grapheme (PG) mapping.
particular class of error. A noninteractive model Rapp et al. (2002) use matched pseudowords to
will either spell a word lexically (if lexical access is control for the possibility that the phonological
successful) or phonologically (if it is not), and route sometimes produces low-frequency PG
hence does not predict the occurrence of individ- mappings. This is not necessary in the model, as we
ual spelling errors containing a mixture of ele- have access to the output produced by the phono-
Downloaded by [b-on: Biblioteca do conhecimento online UTAD] at 09:56 11 February 2013

ments coming from different routes. While we were in the process of preparing this article, we became aware of the results of a study by Rapp et al. (2002) with an acquired surface dysgraphic patient, LAT. LAT scored 90–98% correct on spelling pseudowords. On real words, he spelled high-frequency words better than low-frequency words, and regular words were spelled better than words containing at least one low-probability phoneme–grapheme mapping. Low-frequency, irregular words were spelled worst of all. Of LAT’s errors, 90–100% were phonologically plausible. The interesting thing about the errors is that they showed clear evidence of integration of both lexical and sublexical information. This can be shown to occur when a word contains two (or more) low-probability phoneme–grapheme mappings. If the phonological route is used alone, then we should expect all mappings in the word to be high-probability. What Rapp et al. found was that, in many of LAT’s phonologically plausible errors, alongside the regularisation there occurred lexically correct, low-probability elements. By comparison, his spelling of matched pseudowords contained significantly fewer such elements, making it unlikely that the sound–spelling process was occasionally contributing low-probability mappings to the spelling of words. For instance, LAT might spell the word “bouquet” (/b@UkeI/) as BOUKET, but would spell the matched pseudoword “louquet” (/l@UkeI/) as LOKAY. In this case, in “bouquet” only the mapping /k/ → QU has been regularised, the remaining lexically correct, low-probability mappings (e.g., /eI/ → ET) have remained. In the case of the matched pseudoword, all the phoneme–grapheme mappings are regular (high probability).

logical route (even if it does not appear in the final output). Each word can thus serve as its own “control.” The model was considered to exhibit the LAT effect in a given word only if the phonological route was trying to regularise more than one low-probability (lexically correct) mapping, but only succeeded on one.

We found a number of examples of the LAT effect in the impaired model’s output, and some are listed in Table 9. For each word, the table shows the target spelling (which also represents the lexical input to the grapheme selection process), the phonological spelling produced by the model (control), and the actual spelling due to the combined influence of the two routes. In the actual spelling, underlined graphemes originate from the phonological route, while graphemes in italic arise from lexical input. These results confirm the proposal of Rapp et al. (2002) that interactive, dual-route models should exhibit these effects.

Table 9. Regularisation errors (model) showing lexically correct, low probability graphemes^a

Input word   Target (lexical spelling)   Phonological route spelling   Actual spelling
/tVf/        TOUGH                       TUFF                          TUGH
/skim/       SCHEME                      SCEAME                        SCEME
/Sik/        CHIC                        SHEEK                         SHIK
/lAf/        LAUGH                       LARF                          LARGH
/deIn/       DEIGN                       DANE                          DEINE
/tu/         TWO                         TOO                           TWOO
/w0t/        WHAT                        WOT                           WAT
/lImf/       LYMPH                       LIMF                          LYMF

^a In the actual spelling, underlined graphemes are regularisations from the phonological route, graphemes in italic are low-probability spellings from the lexical route.
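
The scoring criterion for the LAT effect can be illustrated with a minimal sketch. It is not the authors' implementation: the slot-by-slot grapheme segmentations are assumptions made by hand for three rows of Table 9, and the function names `label_slots` and `shows_lat_effect` are hypothetical. The check is also slightly looser than the criterion stated in the text, requiring at least one regularised and at least one lexically retained mapping among the contested slots.

```python
# Sketch of the LAT-effect criterion applied to three rows of Table 9.
# ASSUMPTION: the grapheme segmentations below are hand-made for
# illustration; the model's own grapheme alignment may differ.

ROWS = {
    # word: (lexical target, phonological route alone, actual output)
    "TOUGH": (["T", "OU", "GH"], ["T", "U", "FF"], ["T", "U", "GH"]),
    "WHAT": (["WH", "A", "T"], ["W", "O", "T"], ["W", "A", "T"]),
    "LYMPH": (["L", "Y", "M", "PH"], ["L", "I", "M", "F"], ["L", "Y", "M", "F"]),
}


def label_slots(target, phonological, actual):
    """Label each slot of the actual spelling by where its grapheme came from."""
    labels = []
    for t, p, a in zip(target, phonological, actual):
        if t == p:
            labels.append("agree")        # both routes propose the same grapheme
        elif a == p:
            labels.append("regularised")  # phonological route won this slot
        elif a == t:
            labels.append("lexical")      # low-probability lexical grapheme survived
        else:
            labels.append("other")
    return labels


def shows_lat_effect(target, phonological, actual):
    """True if the routes disagree on two or more slots and each wins at least one."""
    contested = [x for x in label_slots(target, phonological, actual) if x != "agree"]
    return len(contested) >= 2 and "regularised" in contested and "lexical" in contested


for word, row in ROWS.items():
    print(word, label_slots(*row), shows_lat_effect(*row))
```

A pure regularisation (actual spelling identical to the phonological spelling) or a fully correct spelling would not satisfy the check, since one of the two labels would then be absent.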


LAT errors: Other possible mechanisms. This rather subtle effect provides support for interactivity only if the model is unable to produce it using either of the two routes in isolation. In this case, the impairment to the model would have to be different to that used so far, as reducing or slowing the activation of either route does not introduce graphemes into the output (of the isolated slowed route) that would not otherwise occur. The obvious alternative would be the addition of random noise to grapheme node activations, a manipulation that has been used, for instance, in modelling the error patterns found in graphemic buffer disorder (Glasspool & Houghton, 2002; Houghton et al., 1994).

Addition of random noise to the output of the isolated lexical route would not produce the LAT pattern, because when the lexical route activates the spelling of, say, lymph it does not produce any activation of the grapheme F as an alternative to PH (see Table 9). Although the addition of random noise to all graphemes might occasionally lead to the production of an F in the final position, it would not do so with any greater probability than the production of any other grapheme, e.g., G (lymg). However, as noted above, LAT’s errors are overwhelmingly phonologically plausible, and hence this possibility is ruled out.

The possibility of the model showing the effect due to random noise in the isolated phonological (assembly) route is rather more interesting. Rapp et al. (2002) rule this out in the case of LAT by the use of matched pseudowords as controls, showing that the production of the (lexically correct) low-probability graphemes is much more likely to occur for the word than for the nonword stimuli. If both types of stimuli were being produced by a noisy phonological route, one would not expect to find any such lexicality effect. However, this expectation depends on the assumption that no plausible sound-to-spelling conversion mechanism will show a lexicality effect. This assumption may be incorrect for models such as this, which are trained entirely on whole-word stimuli.

Earlier, we showed that if consistency is controlled, the model’s phonological route spells words and nonwords equally quickly (Simulation 2). In particular, no difference was found between inconsistent words and nonwords, indicating that, on average, they are equivalent in terms of the ambiguity or uncertainty of grapheme activation. Thus, one would not anticipate that the addition of noise to grapheme activations should result in a lexicality effect on the production of low-probability graphemes. However, given that the latency results are based on means, it remains a possibility that some exception words, though regularised, still produce a higher degree of activation of their lexically correct, low-probability graphemes than would matched pseudowords. If this were so, then addition of noise to the grapheme activations could result in a higher rate of production of such graphemes for words than nonwords.

To explicitly test this possibility, we constructed matched pseudowords (by changing either an initial or final consonant phoneme) for each of the eight exception words shown in Table 9. For each pair, a “critical grapheme” was defined, this being the lexically correct, low-probability grapheme produced by the (impaired) dual-route model when attempting to spell the word (the italic graphemes in Table 9, column 4). We presented each stimulus pair (word–pseudoword) to the isolated phonological route and compared the initial grapheme activations (prior to output competition) produced by the elements of each pair.

First we examined whether, for the word stimulus, the critical grapheme received any activation at all. If it does not, then it is no more likely to be produced than any other grapheme, including any of the phonologically implausible options (of which there are many more). In this case no further analysis is needed, as the addition of the amount of noise required to get the critical grapheme produced would also generate many more phonologically implausible spellings (which LAT does not do). If the critical grapheme is activated above baseline, then the likelihood of its being produced by the addition of noise depends on its activation level relative to that of its most active competitor (the inevitable “winner,” when the model is run without noise). In this case we compared this difference with the equivalent one found for the matched pseudoword. If there is no difference between the two cases (word vs. pseudoword), then the level of
noise required to produce the low-probability grapheme for the word stimulus would also cause it to be produced (with the same frequency) by the matched pseudoword, contrary to Rapp et al.’s (2002) findings. The results were unequivocal. For seven out of the eight words, the phonological route produced no activation of the critical grapheme. The one case in which it did was /w0t/ (spelled WOT, lexically what, critical grapheme A, cf. Simulation 6). The difference in initial activation between the O and A (O minus A) was .5. For the matched pseudoword /w0l/ (spelled WOL) the A was also activated and the activation difference between O and A was .53, very close to that of the associated word.

These results demonstrate that it is extremely unlikely that the phonological route of the model will show any significant lexicality effect on the selection of low-probability graphemes. We conclude, then, that the Rapp et al. (2002) data do indeed constitute a stringent test of the interactivity assumption.

GENERAL DISCUSSION

Many cases of human problem solving exhibit a distinction between memory-based and analytic or procedural methods. The memory method recognises previously encountered examples of a problem, and retrieves the solution from memory (if one is available). The analytic method applies a general procedure to the problem, and is especially useful for novel cases. Dual-route models of spelling and reading are specific instances of this general perspective.

While this general view is useful in focusing attention on commonalities across cognitive domains, it may not take us very far when it comes to explaining detailed patterns of data in particular domains, and many different domain-specific models are compatible with it. In this paper we have applied a connectionist version of the dual-route theory to the problem of spelling. The specific form of the model emerges naturally as one solution to the problem of the representational limitations of the simple linear feedforward model. We refer to the architecture embodying this solution as the dual-route multilayer architecture (DR-ML, Figure 2).

One route of the DR-ML (the direct route) is equivalent to the simple two-layer network, and we have shown that this route will come to act like a sublexical sound-to-spelling conversion mechanism (a “phonological” route) even when trained exclusively on whole words. This mechanism regularises irregularly spelled words, while showing excellent generalisation to nonword spelling. It also shows a consistency effect in response latencies, and shows the kind of variability at the level of phoneme–grapheme mappings that is characteristic of the performance of adult spellers of English. As such it seems to have captured something of the (implicit) knowledge underlying such performance, including more subtle aspects such as its context dependence and degrees of certainty. Some of this knowledge is functionally equivalent to a simple verbalisable rule, for instance that the sound /b/ is written B (Figure 8). However, the kind of graded contextual influence shown by the model, for instance in vowel spelling, is less easily expressible in the form of explicit rules. The great attraction of the connectionist framework is that both types of relationship can be represented (and learned) in the same way, as sets of associative weights. The postulation of a competitive output mechanism allows the model to represent inconsistent sound–spelling mappings; the model generates multiple possible spellings in parallel, but eventually settles on the one that is, overall, most consistent with the sound of a word.

In the second route of the model, interactions between sound and spelling are mediated by a further representational level (or levels). In connectionist terms, this translates into at least one additional layer of units lying between the input and output layers. Because of the generalisation ability of the direct route of the model, it is possible for the mediated pathway to operate as a localist lexical route (of the type postulated in many models of speech recognition, e.g., McClelland & Elman, 1986). In this paper, we have assumed that this is the case. Input of a familiar phonological word leads to the activation of an
orthographic lexical node, which in turn activates an associated spelling pattern. We also made the simplifying (and possibly false) assumption that all known words are equally well learned, in the sense that they are equally capable (in principle) of activating the correct orthographic pattern. Lower-frequency words, however, become activated more slowly due to their having a relatively weak ability to support their own activation, resulting in slower activation of grapheme nodes. When the two routes are activated in parallel, this results in a greater influence of the phonological route on the spelling of low-frequency words, though it is still possible for the lexical route to dominate output where the two routes disagree. Nevertheless, the interaction of the two routes produces important results for spelling of known words in terms of response latencies: There is a main effect of regularity, and a significant interaction of frequency with regularity-consistency.

Further effects of the interaction of the two routes only emerge when the lexical route is subject to a lesion in the form of a reduced ceiling of activation. Such a lesion produces a good quantitative and qualitative simulation of the spelling of two surface dysgraphic patients. In particular the model shows an effect that has only recently been reported (Rapp et al., 2002) and that we would not otherwise have thought to look for. As far as we are aware, this is the first time that such a range of data from dysgraphic subjects has been successfully simulated by a computational model.

Relationship of the model to other models of spelling

A number of connectionist models of spelling have been produced previously, mainly inspired by the Seidenberg and McClelland (1989) model of reading, which was most notable for the challenge it threw down to the traditional dual-route framework. In this model (and its successor, Plaut et al., 1996), a single (nonsemantic) route translates between the spelling and the pronunciation of a word or word-like stimulus. The route consists of a three-layer, feedforward connectionist network (trained using backpropagation), with hidden units and without direct connections between the input and output units (Figure 2b). Such a model can produce distributed representations at the hidden level, which allows it to generalise what it has learned to novel stimuli (pseudowords). Thus the functions performed by the traditional lexical and sublexical reading processes are intertwined in a single mechanism.

Brown and Loosemore (1994) describe a connectionist spelling model very similar to the Seidenberg and McClelland (1989) reading model, including the use of highly distributed input (sound) and output (spelling) representations. The model was trained on a small vocabulary of 227 words, the focus of the study being on the rate at which words were learned as a function of their sound–spelling regularity. All results reported from the model are in terms of its “error score” alone. The model does not produce differences in response latency, and no examples are given of its actual outputs.

It was found that regular-consistent words were learned faster than both irregular words and “unusual” words (ones with unique rimes, such as bulb). Similar findings are reported from a study with children in four different age groups. Brown and Loosemore (1994) also produced developmentally “dyslexic” versions of their model by reducing the number of hidden units. These models learned more slowly than the normal model, but still showed the regularity effect. Though no detailed analysis of the normal model’s nonword spelling is provided, when compared to the dyslexic versions it performed better (even when level of regular word spelling was held constant across models). However, judging by Brown and Loosemore’s Figure 16.5, the nonword spelling of their normal model was actually rather poor; its error score on nonwords was more than twice that produced on regular words, and rather higher than that produced by their “severely dyslexic” model on irregular words (Brown & Loosemore, 1994, Figure 16.4a). Thus, although the model showed an advantage for regular word spelling, there is no indication that it abstracted the kind of representation necessary to support good spelling on nonwords. It would therefore not be expected to
exhibit the surface dysgraphic pattern if damaged in some way.

The spelling model of Olson and Caramazza (1994) was also inspired by work on reading, in this case the Sejnowski and Rosenberg (1987) NETtalk model. Like the Brown and Loosemore (1994) model, this is a three-layer feedforward network, trained with the backpropagation algorithm. The model functions serially: The phonemic input is presented by a “moving window”, centred on the current phoneme, which moves from “left to right” as the word is presented. This allows the model to “see” each phoneme in the word in sequence, along with preceding and upcoming phonemic context. At the output level, a single grapheme is activated at each time step, intended to correspond to the phoneme in the central position of the window. Note that, in contrast to the model presented in this paper, this method provides explicit training on phoneme–grapheme relationships, as the model’s “attention” is centred on one phoneme at a time, and the corresponding output grapheme (of the current word) is simultaneously presented in isolation. On this basis, one would anticipate that the Olson and Caramazza model should perform better than the Brown and Loosemore model on nonword spelling. As with the latter model, Olson and Caramazza assess their model exclusively by the errors it makes. It is not clear how the architecture could produce variation in response latency.

The model was trained on corpora of 1000 and 1628 words, depending on the simulation. Both corpora contained both regular and irregular words. The best performance on the training set was reported for the 1628-word model. After 100 epochs it spelled 83% of the words correctly, at which point the error curve appeared to be at asymptote. The authors report various attempts to improve on this level of performance, but they were unsuccessful. Of some importance, then, are the errors the model made, in particular whether they were phonologically plausible. Olson and Caramazza (1994) report that the majority of cases were actually omission errors, defined as the model failing to activate any grapheme above a threshold. Such errors would result in phonologically implausible spellings. However, of the nonomission errors, most were phonologically plausible. On a generalisation test using words the model had not been trained on, the 1628-word model performed quite well, producing 87% plausible spellings.

When damaged by the addition of noise to the weights, the number of errors increased and irregular words were more affected than regular words, but again with many implausible errors. This rather weak regularity effect was confirmed when the damaged model was required to spell untrained words. For a mild lesion, plausible spellings of untrained words fell to around 70%, and with a stronger lesion it fell to about 50%. Olson and Caramazza conclude that, overall, the behaviour of the model provides only a weak approximation to the surface dysgraphic pattern; in both its normal and damaged state, it produces too many implausible spellings to be considered a viable model of subject data.

A similar model, again using the sliding window approach, has been proposed by Bullinaria (1994, 1997). This model was trained on a corpus of 2837 monosyllabic words, presented on the basis of their log frequency, and achieved 100% correct spelling of the corpus after 700 epochs. Like the Brown and Loosemore (1994) model, it learned regular words more easily than exception words. The generalisation performance of the model (nonword spelling) was 88.6% correct, very similar to that achieved by the Olson and Caramazza (1994) model. As with the other models discussed in this section, Bullinaria’s model only produces variation in error scores, not in response latencies. However, Bullinaria (1994) follows Seidenberg and McClelland (1989) in maintaining that error scores in the model should be monotonically related to human response times. On this basis, the model shows a regularity effect (regular vs. exception words), but no frequency effect and no frequency by regularity interaction. Damage to the model (in the form of weight scaling) has the greatest effect on the model’s spelling of exception words, a basic element of the surface dysgraphia. Additional features of surface dysgraphia are not simulated, and the possibility of also observing phonological dysgraphia in the same model is ruled out (Bullinaria, 1994). Bullinaria concludes that the
model suffers from the lack of any kind of lexical route, but that it might serve as a model of the sound-to-spelling route. However, if this were so then it would be a sound-to-spelling model capable (in principle) of learning to spell exception words, which our model cannot do.

In conclusion, previous models of spelling were inspired by the single-route reading models of the 1980s. As might be expected, they suffer from many of the problems these latter models have run into. No such model has so far proved able to simulate the range of data considered in this paper; this is true both of the detailed features and also of a class of data, variation in response latency. None of the models discussed above have any mechanism whereby this can vary.

Limitations of the model

The present model is limited in a number of ways. Below we discuss three areas clearly meriting further investigation: learning; representation of polysyllabic words; and output processes, in particular control of serial order.

Learning. The phonological route of the model is trained, and extracts its sound-to-spelling knowledge from what we consider to be a reasonable sample of English words (all the uninflected monosyllables). The corpus is not “filtered” in any way to bias results in a desired direction, and is of the kind that has been used in other spelling models, and in related work on reading, permitting models to be compared on a consistent basis. However, we would not advance the model as it stands as a model of learning to spell.

One reason is related to this corpus. Children do not learn to read and write from a few exposures (about 20 in the simulations) to the whole set of monosyllabic words of English (many of which are of very low frequency). Rather they begin with a smaller set, which they see many times, and many of which are polysyllabic and/or inflected. A realistic learning model should clearly reflect more closely the stimulus environment of the child learning to spell. That said, judging by the results of the simulations, the model actually does a good job of extracting and representing the sound-to-spelling regularities of English. This is based on the statistical regularities present in its training corpus. Any corpus with the same properties would lead to the same result. Thus to the extent that regularities present in the set of English monosyllables are representative of those in the language more generally, the eventual outcome should be the same. However, the details of the developmental progression might well be affected if the sound-to-spelling relationships of the set of words children are exposed to are not statistically stationary.

A more important limitation to the model is that the lexical route is at present not learned. This affects the development of the phonological route, as it learns with no contribution from a lexical route. If the latter were developing in parallel, then it would be expected to contribute by reducing the error, and this could have a knock-on effect on what the phonological route learns. One possibility is that the lexical route might contribute most to what it is most important for, i.e., spelling irregular and inconsistent words. These possibilities were investigated by Zorzi et al. (1998a) in their reading model (which has the same architecture as this model). The model was trained on the monosyllabic word set (from spelling to sound), with learning taking place in both direct and mediated (hidden unit) pathways at the same time. This version of the model was capable of learning the whole training set, including the exception words. However, it was found that the direct route (when studied in isolation) still behaved like a spelling-to-sound conversion mechanism, and did not acquire lexical properties. The hidden unit pathway behaved more like a (distributed) lexical route. When the model was trained with relatively few hidden units (restricting its capacity to represent the training set), they appeared to dedicate themselves to the exception words, by “correcting” the output of the direct route, which, left to itself, would regularise them. Hence, there is no problem in principle for the DR-ML architecture to learn in both routes simultaneously, and there is evidence that it will tend to “self-modularise” so that the regular, productive, sound-spelling mapping will be carried out by the direct route (the two-layer
phonological route of the current model). Notably, a very similar result has also been obtained in the context of learning the past tenses of English verbs (Thomas & Karmiloff-Smith, in press).

In the simulations of Zorzi et al. (1998a, 1998b), backpropagation was used to train the hidden-unit route. This algorithm tends to generate distributed (i.e., non-orthogonal) representations at the hidden units. The current model, however, simulates the lexical route as a localist network, in which each known word is orthogonally represented. Such representations can be formed in neural networks using, for instance, competitive learning algorithms (Grossberg, 1980; Kohonen, 1984). Our proposal that frequency affects the strength of the feedback weight, by which a representation supports its own activation, is compatible with competitive learning; if each time a node is activated, its feedback loop is strengthened (thus enabling it to fare better in the competition for activation, which is essential to such algorithms), then more frequent words will be more easily activated. Thus there should be no difficulty in principle with adding learning to the lexical route.

A further important learning issue involves the model’s orthographic representation. The model is supplied at the outset with a syllabically aligned graphemic representation, which is of great benefit to its learning of the sound–spelling mapping. While it is known that children have developed syllabic phonology by the time they start to learn to read and write (Treiman, 1994), their orthographic representations must develop as a part of this learning. Hence a plausible developmental model of spelling cannot start out with complex grapheme nodes, and orthographic syllable structure. In some cases children may acquire graphemes by being taught them directly, e.g., TH, SH, CH, etc. As a result these short sequences become chunked, and available for use in representing longer sequences. However, not all orthographic knowledge of this sort is directly taught. Treiman (1993) provides an interesting example of the grapheme CK, which cannot occur syllable-initially in English (though its associated phoneme /k/ can). Treiman found that although first-graders sometimes used this grapheme incorrectly (for instance, spelling bike as BICK), they hardly ever used it at the beginning of a word, only in the middle and at the ends. Most importantly, this positional restriction (and others like it) was not explicitly taught in the school. Hence even children at the beginning of formal instruction in spelling appear to have (implicitly) extracted some representation of the positional availability of graphemes from the structure of the words they see.

The present model ought to be able to accommodate such findings. It never produces CK at the beginning of a word (even early in learning) because it encodes grapheme identity along with position. Since CK never appears in the word initial position, it never co-occurs with an initial /k/. A related developmental account might go as follows. Children first need to develop representations of individual letters for the purpose of writing them. This generally occurs before learning to spell whole words (Treiman, 1994), and the representations formed are consequently independent of letter position. However, with the experience of writing words, it is obvious that letter position must also be encoded in some form (rat, art, and tar are not the same word). With increasing experience, conjunctive representations may develop of “graphemes in positions,” but only for those combinations that are actually encountered. On this account, children’s sensitivity to positional constraints would reflect their ability to produce representations containing an element of serial position.

Representation of polysyllabic words. The model can only handle monosyllabic words. How could it be extended to deal with polysyllabic words? One obvious problem that might arise concerns the model’s positional orthographic representation. The model proposes that (good) spellers have developed conjunctive representations of graphemes in syllabic positions. If literally extended to longer words, this would mean that separate nodes would be needed for, say, all the initial consonants in the 2nd syllable of a word, all the vowels in the 2nd syllable, all final consonants in the 3rd syllable, and so on. Intuitively this is not a very appealing prospect, though given that words are of finite length it is not actually an impossibility.
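
The cost of literally extending position-specific coding to longer words can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than the model's actual inventory: the three slot names, the tiny grapheme lists, and the function `conjunctive_nodes` are hypothetical, and a real inventory would be far larger.

```python
# Toy illustration of conjunctive "grapheme in position" coding.
# ASSUMPTION: slot names and grapheme inventories are invented for
# illustration and are much smaller than any realistic inventory.

SLOT_GRAPHEMES = {
    "onset": ["B", "C", "K", "TH"],   # CK is deliberately absent from onsets
    "vowel": ["A", "E", "EE", "U"],
    "coda": ["B", "CK", "K", "TH"],
}


def conjunctive_nodes(n_syllables):
    """Return one node per (syllable index, slot, grapheme) combination."""
    return {
        (syllable, slot, grapheme)
        for syllable in range(1, n_syllables + 1)
        for slot, graphemes in SLOT_GRAPHEMES.items()
        for grapheme in graphemes
    }


one_syllable = conjunctive_nodes(1)
three_syllables = conjunctive_nodes(3)

# The node set grows in proportion to the number of syllables covered.
print(len(one_syllable), len(three_syllables))

# An onset B in syllable 1 and an onset B in syllable 3 are different nodes,
# so nothing attached to the first is shared with the second.
print((1, "onset", "B") in one_syllable, (3, "onset", "B") in one_syllable)
```

Under this literal scheme the node count scales linearly with the maximum number of syllables, which is finite but duplicates the same grapheme knowledge at every syllable index.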


However, one serious problem would be that learning of the sound-to-spelling relationship for one-syllable words would not generalise to polysyllabic words, because they would use a different set of position-specific grapheme nodes. Knowing that an initial /b/ is spelled B in monosyllabic words would not help the learner spell the last syllable of Beelzebub.

One solution to this problem is to emphasise the serial nature of spelling, and propose that it proceeds on a syllable-by-syllable basis. Thus in the orthographic lexicon, polysyllabic words would be represented hierarchically as a sequence of syllables. Each syllable node would be connected to the graphemes in that particular syllable, and a single set of grapheme nodes could be used to represent all syllables. Spelling the word would involve the syllable nodes becoming active in sequence. For the phonological route the principle would be the same. The phonological form of polysyllabic words (and nonwords) would be presented one syllable at a time, and thus would be spelled by iterating over the process of spelling monosyllabic words.

Hence we propose that spelling polysyllabic words in this model requires the addition of more hierarchical representations and of serial processes. However, even spelling a monosyllabic word is serial at the level of the individual letters, but our model stops at the parallel activation of a set of grapheme nodes representing a spelling “plan.” This raises the issue of how this plan is actually realised as a sequence of actions.

Serial order and the graphemic output buffer. Some of the models discussed above (Bullinaria, 1994; Olson & Caramazza, 1994) present the phonological input serially using the sliding window technique. This leads directly to serial activation of grapheme nodes at the output. If we assume that the activation of each output node is translated into a response as it becomes active (i.e., no “buffering” takes place), then these models do not generate a spelling plan in the way our model does. The advantage of this is that, for spelling, there is no “problem of serial order”: the serial order is in the phonology. An apparent prediction of such a model is that all spelling must therefore be via phonology; in particular, there can be no direct access to the spelling of a word from its meaning (as it contains no serial order information). Most current functional models of spelling reject this position, postulating the existence of a direct route from semantics to the orthographic output lexicon (Figure 1). This view is supported, for instance, by studies of subjects who show better written than oral naming (Bub & Kertesz, 1982). On this account, it is unlikely that the processes controlling serial output in spelling are simple reflexes of those of the speech system.

The idea that sequentially ordered behaviour involves a stage of parallel activation of a set of responses has a long history, and indeed was central to Lashley’s (1951/1960) influential arguments against associative chaining (see Houghton & Hartley, 1996, for discussion). Box-and-arrow models of spelling endorse this view by postulating the existence of a “graphemic output buffer” in which a number of responses are activated before being executed (Morton, 1980; Shallice, 1988). Our model is clearly compatible with this idea. A specific acquired disorder of the graphemic buffer has been identified (Caramazza, Miceli, Villa, & Romani, 1987) and studied in a number of patients (see Shallice et al., 1995, for a review). The impairment mainly involves errors in letter selection and ordering, and is similar for both words and nonwords. Two models of the graphemic buffer and disorders of it have been proposed, the symbolic Multiple Object Spelling (MOS) model of Caramazza and Miceli (1990), and the connectionist Competitive Queuing (CQ) model of Houghton, Glasspool, and Shallice (1994; see also Glasspool & Houghton, 2002). Detailed discussion of these models and the data they are intended to account for is beyond the scope of this article. However, it is important to consider how compatible they are with the proposals made in the current model.

The MOS model is in essence a model of spelling representation (Shallice et al., 1995), and has little to say about serial output processes per se. It proposes that spelling plans in the graphemic buffer include information regarding orthographic syllable structure, the consonant–vowel status of individual letters, and whether a letter is to be doubled. Our model is broadly consistent with these proposals. The main difference appears to be that our

plex grapheme nodes) followed by the “unpacking” of this spelling plan into the activation of a series of individual letters to control the serial behaviour of
model contains complex grapheme nodes, whereas writing (or typing, or oral spelling). This final out-
it is not clear that these are included in the MOS put stage would have the dynamic characteristics
model. In all descriptions we have seen of the model proposed by CQ models, modulated by informa-
(which has not been implemented), syllable nodes tion regarding “graphosyllabic” structure, as pro-
are decomposed directly into individual letters (see, posed by Caramazza and colleagues, and as
e.g., Link & Caramazza, 1994). The importance of implemented in the current model.
complex graphemes in spelling output representa- Although the two-stage idea may seem some-
Downloaded by [b-on: Biblioteca do conhecimento online UTAD] at 09:56 11 February 2013

tions may therefore be a distinct prediction of the what complex, it is inevitable in any model that
present model. activates a group of grapheme nodes (more gener-
The CQ model of Houghton and colleagues ally “response nodes”) in parallel, i.e., any model
(1994) was developed to simulate the serial produc- with the functional equivalent of a response buffer.
tion of letters. When a word is to be spelled, a group Hence, such a two-stage process is also required by
of letter nodes are activated in parallel, but with a many current models of speech production (Dell,
gradient of activation over them such that letters are Schwartz, Martin, Saffran, & Gagnon, 1997; Rapp
more active the sooner they are to be produced. Let- & Goldrick, 2000) in which activation of the pho-
ters compete to be produced depending on their nemes of a syllable occurs in parallel. Clearly, some
activation level. Under disruption (in the form of additional process must convert this parallel activa-
noise added to letter activations) the model has tion into the serial order of speech sounds (see
proved capable of accounting for many basic fea- Hartley & Houghton, 1996). This point applies
tures of the errors found in graphemic buffer disor- also to the output stage of current models of reading
der, such as word length effects, the serial position aloud, which stop at the parallel activation of a set of
curve, and error types (Shallice et al., 1995). In the phonemes in a syllabic frame (Plaut et al., 1996;
first version of the model, a word was represented Zorzi et al., 1998a).
simply as a series of letters, with no further structure
other than a special marking of doubled letters
Conclusions
(geminates). A more recent version by Glasspool
and Houghton (2002) includes a representation of In this paper we have presented the first fully
the consonant–vowel status of letters, but like the implemented dual-route model of spelling. We
MOS model, does not use complex grapheme have argued that this architecture can be derived as
nodes. one solution to the problem of the computational
The CQ model is compatible with the current limitations of the simplest connectionist feed-
model if it is assumed that, following the initial par- forward network. It differs from the more
allel activation of a set of grapheme nodes proposed conventional single-route multilayer network in
here, there is a process that converts this activation maintaining the direct connections between the
into a gradient of activation over a set of letter input and output units. This permits the network
nodes. The first stage of activation would not only to partition its knowledge to some extent. In par-
provide information regarding consonant-vowel ticular we have shown that the direct route, when
status of letters (as required by the model of attempting to learn to spell a representative sample
Glasspool & Houghton, 2002), but also informa- of English words, ends up behaving like a
tion about the position of letters in the graphemic sublexical sound-to-spelling conversion route.
syllable, as required by Caramazza and colleagues’ This route in isolation shows a consistency effect
MOS model. The picture that emerges is therefore on response latency but no lexicality effect. The
of a two-stage output process, consisting of an ini- spellings it produces are virtually always phono-
tial parallel stage (including the activation of com- logically plausible, and the variation and context-

154 COGNITIVE NEUROPSYCHOLOGY, 2003, 20 (2)


CONNECTIONIST DUAL-ROUTE SPELLING

sensitivity it shows matches well with the available Baxter, D. M., & Warrington, E. K. (1987). Ideational
data from adult subjects, both normal and agraphia: A single case study. Journal of Neurology,
impaired. When combined with a frequency-sen- Neurosurgery, and Psychiatry, 49, 369–374.
sitive lexical route, the resulting interactive model Beauvois, M. F., & Derousné, J. (1981). Lexical or ortho-
shows a frequency-by-regularity interaction on graphic agraphia. Brain, 104, 21–49.
Behrmann, M. (1987). The rites of righting writing:
response latencies for spelling words. Weakening
Homophone remediation in acquired dysgraphia.
the lexical route permits a more detailed qualita-
Cognitive Neuropsychology, 4, 365–384.
tive and quantitative simulation of the surface Behrmann, M., & Bub, D. (1992). Surface dylexia and
dysgraphic syndrome than has previously been dysgraphia: Dual routes, single lexicon. Cognitive
achieved. This includes the finding that the model
Downloaded by [b-on: Biblioteca do conhecimento online UTAD] at 09:56 11 February 2013

Neuropsychology, 9, 209–251.
produces spelling errors containing blends of Brown, G. D. A. (1987). Resolving inconsistencies: A
(low-probability) lexical and nonlexical elements, computational model of word naming. Journal of
an aspect of surface dysgraphia that has only Memory and Language, 26, 1–23.
recently been explored (Rapp et al., 2002). Brown, G. D. A., & Ellis, N. C. (Eds.). (1994). Hand-
In conclusion, the model provides strong sup- book of spelling: Theory, process and intervention.
port for a dual-route architecture for spelling, in Chichester, UK: John Wiley.
which there is parallel access to the two routes and Brown, G. D. A., & Loosemore, R. P. W. (1994),
interaction between them at the level at which a Computational approaches to normal and impaired
spelling plan is formed. The sound-to-spelling spelling. In G. D. A. Brown & N. C. Ellis (Eds.),
route need not be specified as such a priori, but will Handbook of spelling: Theory, process and intervention.
emerge from the attempt to predict spelling from (pp. 319–336). Chichester, UK. John Wiley.
Bub, D., Cancelliere, A., & Kertesz, A. (1985). Whole-
sound, as long as the two levels of representation
word and analytic translation of spelling to sound in a
can make direct contact with each other. Implicit
non-semantic reader. In K. E. Patterson, J. C.
knowledge of the sound-to-spelling relationship in
Marshall, & M. Coltheart (Eds.), Surface dyslexia:
English is best characterised as associative, and Neuropsychological and cognitive studies of phonological
based on the extraction of statistical relationships of reading (pp. 15–34). Hove, UK: Lawrence Erlbaum
varying degrees of consistency. Associates Ltd.
Bub, D., & Kertesz, A. (1982). Evidence for lexico-
Manuscript received 16 October 2001 graphic processing in a patient with preserved writ-
Revised manuscript received 28 October 2002 ten over oral single word naming. Brain, 105, 697–
Revised manuscript accepted 14 November 2002
717.
Bullinaria, J. (1994). Connectionist modelling of spell-
ing. In Proceedings of the Sixteenth Annual Conference of
the Cognitive Science Society (pp. 78–83). Hillsdale,
REFERENCES NJ: Lawrence Erlbaum Associates Inc.
Bullinaria, J. (1997). Modelling reading, spelling and
Baron, J., & Strawson, C. (1976). Use of orthographic past tense learning with artificial neural networks.
and word-specific knowledge in reading words aloud. Brain and Language, 59, 236–266.
Journal of Experimental Psychology: Human Perception Bullinaria, J., & Chater, N. (1995). Connectionist
and Performance, 2, 386–392. modelling: Implications for cognitive neuro-
Barry, C. (1994). Spelling routes (or roots or rutes). In G. psychology. Language and Cognitive Processes, 10,
D. A. Brown & N. C. Ellis (Eds.), Handbook of 227–264.
spelling: Theory, process and intervention (pp. 27–49). Caramazza, A., & Miceli, G. (1990). The structure of
Chichester, UK: John Wiley. graphemic representations. Cognition, 37, 243–297.
Barry, C., & Seymour, P. H. K. (1988). Lexical priming Caramazza, A., Miceli, G., Villa, G., & Romani, C.
and sound-to-spelling contingency effects in (1987). The role of the graphemic buffer in spelling:
nonword spelling. Quarterly Journal of Experimental evidence from a case of acquired dysgraphia. Cogni-
Psychology, 40A, 5–40. tion, 26, 59–85.

Castles, A., & Coltheart, M. (1996). Cognitive correlates of developmental surface dyslexia: A single case study. Cognitive Neuropsychology, 13, 25–50.
Cipolotti, L., & Warrington, E. K. (1995). Semantic memory and reading abilities: A case report. Journal of the International Neuropsychological Society, 1, 104–110.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing. London: Academic Press.
Coltheart, M. (1985). Cognitive neuropsychology and the study of reading. In M. I. Posner & O. S. M. Marin (Eds.), Attention and performance XI (pp. 3–37). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Coltheart, M., & Funnell, E. (1987). Reading and writing: One lexicon or two? In A. Allport, D. G. MacKay et al. (Eds.), Language perception and production: Relationships between listening, speaking, reading and writing (pp. 313–339). London: Academic Press.
Coltheart, M., Masterson, J., Byng, S., Prior, M., & Riddoch, J. (1983). Surface dyslexia. Quarterly Journal of Experimental Psychology, 37A, 469–495.
Coltheart, M., & Rastle, K. (1994). Serial processing in reading aloud: Evidence for dual-route models of reading. Journal of Experimental Psychology: Human Perception and Performance, 20, 1197–1211.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204–258.
Dell, G. S., Schwartz, M. F., Martin, N., Saffran, E. M., & Gagnon, D. A. (1997). Lexical access in aphasic and nonaphasic speakers. Psychological Review, 104, 801–838.
Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R. et al. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.
Denes, F., Cipolotti, L., & Zorzi, M. (1999). Acquired dyslexias and dysgraphias. In G. Denes & L. Pizzamiglio (Eds.), Handbook of clinical and experimental neuropsychology (pp. 289–317). Hove, UK: Psychology Press.
De Partz, M.-P., Seron, X., & Van der Linden, M. (1992). Re-education of a surface dysgraphia with a visual imagery strategy. Cognitive Neuropsychology, 9, 369–401.
Ellis, A. (1988). Normal writing processes and peripheral acquired dysgraphias. Language and Cognitive Processes, 3, 99–127.
Forster, K. I., & Chambers, S. (1973). Lexical access and naming time. Journal of Verbal Learning and Verbal Behaviour, 12, 627–635.
Frith, U. (Ed.). (1980). Cognitive processes in spelling. London: Academic Press.
Funnell, E. (1996). Response biases in oral reading: An account of the co-occurrence of surface dyslexia and semantic dementia. Quarterly Journal of Experimental Psychology, 49A, 417–446.
Gelb, I. J. (1952). A study of writing. Chicago: University of Chicago Press.
Glasspool, D., & Houghton, G. (2002). Serial order and consonant–vowel structure in a model of disordered spelling. Manuscript submitted for publication.
Glosser, G., Grugan, P., & Friedman, R. B. (1999a). Comparison of reading and spelling in patients with probable Alzheimer’s disease. Neuropsychology, 13, 350–358.
Glosser, G., Kohn, S. E., Sands, L., Grugan, P. K., & Friedman, R. B. (1999b). Impaired spelling in Alzheimer’s disease: A linguistic deficit? Neuropsychologia, 37, 807–815.
Gluck, M. A., & Bower, G. H. (1988a). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166–195.
Gluck, M. A., & Bower, G. H. (1988b). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227–247.
Goodman, R. A., & Caramazza, A. (1986). Aspects of the spelling process: Evidence from a case of acquired dysgraphia. Language and Cognitive Processes, 1, 263–296.
Goodman-Schulman, R., & Caramazza, A. (1987). Patterns of dysgraphia and the nonlexical spelling process. Cortex, 23, 143–148.
Goswami, U., & Bryant, P. (1990). Phonological skills and learning to read. Hove, UK: Lawrence Erlbaum Associates Ltd.
Graham, N. L. (2000). Dysgraphia in dementia. Neurocase, 6, 365–376.
Graham, N. L., Hodges, J. R., & Patterson, K. (1994). The relationship between comprehension and oral reading in progressive fluent aphasia. Neuropsychologia, 32, 299–316.
Graham, N. L., Patterson, K., & Hodges, J. R. (1997). Progressive dysgraphia: Co-occurrence of central and peripheral impairments. Cognitive Neuropsychology, 14, 975–1005.

Graham, N. L., Patterson, K., & Hodges, J. R. (2000). The impact of semantic memory impairment on spelling: Evidence from semantic dementia. Neuropsychologia, 38, 143–163.
Grainger, J., & Jacobs, A. M. (1996). Orthographic processing in visual word recognition: A multiple read-out model. Psychological Review, 103, 518–565.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1–51.
Hahn, U., & Nakisa, R. C. (2000). German inflection: Single route or dual route? Cognitive Psychology, 41, 313–360.
Hall, D. A., & Riddoch, M. J. (1997). Word meaning deafness: Spelling words that are not understood. Cognitive Neuropsychology, 14, 1131–1164.
Hanley, R., Hastie, K., & Kay, J. (1992). Developmental surface dyslexia and dysgraphia: An orthographic processing impairment. Quarterly Journal of Experimental Psychology, 44A, 285–319.
Harm, M., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.
Hartley, T., & Houghton, G. (1996). A linguistically constrained model of short-term memory for nonwords. Journal of Memory and Language, 35, 1–31.
Hatfield, F. M., & Patterson, K. E. (1983). Phonological spelling. Quarterly Journal of Experimental Psychology, 35A, 451–468.
Hinton, G. E., & Shallice, T. (1989). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74–95.
Houghton, G., Glasspool, D., & Shallice, T. (1994). Spelling and serial recall: Insights from a competitive queuing model. In G. D. A. Brown & N. C. Ellis (Eds.), Handbook of spelling: Theory, process and intervention (pp. 365–404). Chichester, UK: John Wiley.
Houghton, G., & Hartley, T. (1996). Parallel models of serial behaviour: Lashley revisited. Psyche, 2. Retrieved month, day, year, from, http://psyche.cs.monash.edu.au/v2/psyche-2-25-houghton.html
Houghton, G., & Zorzi, M. (1998). A model of the sound–spelling mapping in English and its role in word and nonword spelling. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 490–495). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Hughes, J. C., Graham, N., Patterson, K., & Hodges, J. R. (1997). Dysgraphia in mild dementia of the Alzheimer’s type. Neuropsychologia, 35, 533–545.
Kessler, B., & Treiman, R. (2001). Relationships between sounds and letters in English monosyllables. Journal of Memory and Language, 44, 592–617.
Kohonen, T. (1984). Self-organisation and associative memory. New York: Springer.
Kreiner, D. S. (1992). Reaction time measures of spelling: Testing a two-strategy model of skilled spelling. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 765–776.
Kreiner, D. S. (1996). Effects of word familiarity and phoneme–grapheme polygraphy on oral spelling time and accuracy. The Psychological Record, 46, 49–70.
Kreiner, D. S., & Gough, P. B. (1990). Two ideas about spelling: Rules and word-specific memory. Journal of Memory and Language, 29, 103–118.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lambert, J., Eustache, F., Viader, F., Dary, M., Rioux, P., & Lechevalier, B. (1996). Agraphia in Alzheimer’s disease: An independent lexical impairment. Brain and Language, 53, 222–233.
Lambon Ralph, M., Ellis, A. W., & Franklin, S. (1995). Semantic loss without surface dyslexia. Neurocase, 1, 363–369.
Lashley, K. S. (1951/1960). The problem of serial order in behaviour. In F. Beach, D. Hebb, C. Morgan, & H. Nissen (Eds.), The neuropsychology of Lashley (pp. 506–528). New York: McGraw-Hill.
Link, K., & Caramazza, A. (1994). Orthographic structure and the spelling process: A comparison of different codes. In G. D. A. Brown & N. C. Ellis (Eds.), Handbook of spelling: Theory, process and intervention (pp. 261–294). Chichester, UK: John Wiley.
Macoir, J., & Bernier, J. (2002). Is surface dysgraphia tied to semantic impairment? Evidence from a case of semantic dementia. Brain and Cognition, 48, 452–247.
McCarthy, R., & Warrington, E. K. (1986). Phonological reading: Phenomena and paradoxes. Cortex, 22, 359–380.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375–407.

Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117, 363–386.
Morais, J., Cary, L., Alegria, J., & Bertelson, P. (1979). Does awareness of speech as a sequence of phonemes arise spontaneously? Cognition, 7, 323–331.
Morton, J. (1980). The logogen model and orthographic structure. In U. Frith (Ed.), Cognitive processes in spelling (pp. 117–113). London: Academic Press.
Noble, K., Glosser, G., & Grossman, M. (2000). Oral reading in dementia. Brain and Language, 74, 48–69.
Norris, D. (1994). A quantitative, multiple levels model of reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 20, 1212–1232.
Olson, A., & Caramazza, A. (1994). Representation and connectionist models: The NETspell experience. In G. D. A. Brown & N. C. Ellis (Eds.), Handbook of spelling: Theory, process and intervention (pp. 337–364). Chichester, UK: John Wiley and Sons.
Patterson, K., & Behrmann, M. (1997). Frequency and consistency effects in a pure surface dyslexic patient. Journal of Experimental Psychology: Human Perception and Performance, 23, 1217–1231.
Patterson, K. E., & Hodges, J. R. (1992). Deterioration of word meaning: Implications for reading. Neuropsychologia, 12, 1025–1040.
Paulesu, E., McCrory, E., Fazio, F., Menoncello, L., Brunswick, N., Cappa, S. F., Cotelli, M., Cossu, G., Corte, F., Lorossu, M., Pesenti, S., Gallagher, A., Perani, D., Price, C., Frith, C. D., & Frith, U. (2000). A cultural effect on brain function. Nature Neuroscience, 3, 91–96.
Penniello, M. J., Lambert, J., Eustache, F., et al. (1995). A PET study of the functional neuroanatomy of writing impairment in Alzheimer’s disease. The role of the left supramarginal and left angular gyri. Brain, 118, 697–706.
Pinker, S. (1997). Words and rules in the human brain. Nature, 387, 547–548.
Platel, H., Lambert, J., Eustache, F., Cadet, B., Dary, M., Viader, F., & Lechevalier, B. (1993). Characteristics and evolution of writing impairment in Alzheimer’s disease. Neuropsychologia, 31, 1147–1158.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domain. Psychological Review, 103, 56–115.
Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10, 377–500.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency-effects in a multilayered perceptron: Implications for child language acquisition. Cognition, 38, 43–102.
Price, C. J., Moore, C. J., Humphreys, G. W., & Wise, R. J. S. (1997). Segregating semantic from phonological processing during reading. Journal of Cognitive Neuroscience, 9, 727–733.
Rapcsak, S. Z., Arthur, S. A., Bliklen, D. A., & Rubens, A. B. (1989). Lexical agraphia in Alzheimer’s disease. Archives of Neurology, 46, 65–68.
Rapcsak, S. Z., Arthur, S. A., & Rubens, A. B. (1988). Lexical agraphia from focal lesion of the left precentral gyrus. Neurology, 38, 1119–1123.
Rapp, B., Epstein, C., & Tainturier, M. J. (2002). The integration of information across lexical and sublexical processes in spelling. Cognitive Neuropsychology, 19, 1–29.
Rapp, B., & Goldrick, M. (2000). Discreteness and interactivity in spoken word production. Psychological Review, 107, 460–499.
Roeltgen, D. P., & Heilman, K. M. (1984). Lexical agraphia: Further support for the two-system hypothesis of linguistic agraphia. Brain, 107, 811–827.
Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Exploration in the microstructure of cognition. Volume 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.
Schwartz, M. F., Saffran, E. M., & Marin, O. S. M. (1980). Fractionating the reading process in dementia: Evidence for word-specific print-to-sound associations. In M. Coltheart, K. E. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 259–269). London: Routledge & Kegan Paul.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.
Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behaviour, 23, 383–404.
Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145–168.
Shallice, T. (1981). Phonological agraphia and the lexical route in writing. Brain, 104, 413–429.

Shallice, T. (1988). From neuropsychology to mental structure. Cambridge: Cambridge University Press.
Shallice, T., Glasspool, D. W., & Houghton, G. (1995). Can neuropsychological evidence inform connectionist modelling? Analyses of spelling. Language and Cognitive Processes, 10, 195–225.
Shanks, D. R. (1991). Categorization by a connectionist network. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 433–443.
Stone, G. O., Vanhoy, M., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36, 337–359.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135–170.
Taraban, R., & McClelland, J. L. (1987). Conspiracy effects in word pronunciation. Journal of Memory and Language, 26, 608–631.
Thomas, M., & Karmiloff-Smith, A. (in press). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioural and Brain Sciences.
Treiman, R. (1986). The division between onsets and rimes in English syllables. Journal of Memory and Language, 25, 476–491.
Treiman, R. (1993). Beginning to spell: A study of first grade children. New York: Oxford University Press.
Treiman, R. (1994). Sources of information used by beginning spellers. In G. D. A. Brown & N. C. Ellis (Eds.), Handbook of spelling: Theory, process and intervention (pp. 75–92). Chichester, UK: John Wiley.
Treiman, R., & Barry, C. (2000). Dialect and authography: Some differences between American and British spellers. Journal of Experimental Psychology: Learning, Memory and Cognition, 26, 1423–1430.
Treiman, R., & Chafetz, J. (1987). Are there onset- and rime-like units in printed words? In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading. Hove, UK: Lawrence Erlbaum Associates Ltd.
Treiman, R., Kessler, B., & Bick, S. (in press). Context sensitivity in the spelling of English vowels. Journal of Memory and Language.
Treiman, R., Mullenix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107–136.
Ward, J., Stott, R., & Parkin, A. J. (2000). The role of semantics in reading and spelling: Evidence for the “summation hypothesis”. Neuropsychologia, 38, 1643–1653.
Weekes, B., & Coltheart, M. (1996). Surface dyslexia and surface dysgraphia: Treatment studies and their theoretical implications. Cognitive Neuropsychology, 13, 277–315.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention Record, Part 4 (pp. 96–104).
Ziegler, J. C., Stone, G. O., & Jacobs, A. M. (1997). What is the pronunciation for -ough and the spelling for /u/? A database for computing feedforward and feedback consistency in English. Behavior Research Methods, Instruments, and Computers, 29, 600–618.
Zorzi, M. (2000). Serial processing in reading aloud: No challenge for a parallel model. Journal of Experimental Psychology: Human Perception and Performance, 26, 847–856.
Zorzi, M., Houghton, G., & Butterworth, B. (1998a). Two routes or one in reading aloud? A connectionist dual-process model. Journal of Experimental Psychology: Human Perception and Performance, 24, 1131–1161.
Zorzi, M., Houghton, G., & Butterworth, B. (1998b). The development of spelling–sound relationships in a model of phonological reading. Language and Cognitive Processes, 13, 337–371.


APPENDIX A

Stimuli used in Simulations 3 and 7ᵃ

Regular consistent words    Exception words
------------------------    ------------------------
High        Low             High        Low

best        deft            are         bowl
big         broke           both        broad
came        bap             break       bush
clasp       dig             choose      deaf
dark        dot             come        doll
did         bake            do          flood
got         float           dough       gross
him         grape           done        lose
page        lunch           foot        pear
place       pill            front       phase
soon        pitch           give        pint
still       pump            great       plough
stop        ripe            have        rouse
tell        sank            move        sew
will        slam            pull        shoe
with        slip            put         spook
think       stunt           said        swarm
bring       swore           shall       swamp
cost        trunk           want        touch
felt        wake            watch       wad
large       wax             were        wand
land        weld            what        wash
north       wing            word        wool
press       wit             work        worm

ᵃ The word lists are based on those used in the Taraban and McClelland (1987) study of reading, but the regular consistent words have been adapted somewhat to conform to sound-to-spelling consistency.


APPENDIX B
Lists of irregularly spelled words used in Simulations 8, 9, and 10ᵃ

Word frequency band


—————————————————————————————————————
0-9 10-19 20-49 50-99 100-199 200+

womb tomb debt wild doubt sure


chrome ache scheme laugh blood give
chef glove tour spread stood move
yacht gauge grow height gone come


awe deaf scarce broad heart are
dove wool breath foot month child
shone brow grind key meant kind
hind plough blind cook blue said
dread soup aunt shook talk death
sweat sword guide lose due head
yearn dough breast bought floor friend
sieve ton thread learn dead book
soot shoe wealth build wrote good
crook pint realm knife truth how
hook tough sign road now
brook rough style live could
vow touch should
sponge myth group
suave guard young
cough bread build
trough two
shove both
front

5 12 35 70 156 825
ᵃ Stimuli from word sets used by Behrmann and Bub (1992, Appendix 3). Each column contains words from one frequency band, and the mean frequency of each set is given at the bottom of the column.


APPENDIX C

Lists of regularly spelled words used in Simulations 8, 9, and 10ᵃ

Word frequency band


——————————————————————————————————————–
0-9 10-19 20-49 50-99 100-199 200+

chore curb branch chest charge point


preach hub grave frame leave five
chess pinch globe bond spoke those


chive arch shame list lost more
hant couch slide pitch wish save
punch slice cure goal fear think
sage hint carve proof peace firm
shave grove storm boot pool grove
skate dose milk round soon wait
sane ranch lost air ground reach
hive fist brush wage sound each
dole launch dean nine stand food
mole heap lean strike brown tool
mint teach teach moon shot show
wilt boot mount block hear house
spear glow bound hell green found
breach mouse trap ground
peach switch strip street
niece ditch yard black
broom song time
broon sweet just
coil clean made
rouse part
bounce state
gaze
5 17 44 73 189 530
ᵃ See footnote to Appendix B.
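
As a supplementary illustration of the two-stage output process discussed under "Serial order and the graphemic output buffer", the following minimal Python sketch unpacks a parallel spelling plan of (possibly complex) graphemes into letters, places an activation gradient over those letters, and lets them compete for output under optional noise, in the general manner described for CQ models. It is not the implementation of Houghton, Glasspool, and Shallice (1994), nor part of the model reported here. The function names, the linear gradient, the Gaussian noise, and the example plan are all assumptions made for illustration. Geminate marking, consonant–vowel status, and graphosyllabic structure are not modelled, so under noise the sketch can only produce letter-order errors.

```python
import random


def unpack(plan):
    """Unpack a spelling plan of (possibly multi-letter) graphemes into letters."""
    return [letter for grapheme in plan for letter in grapheme]


def cq_output(plan, noise_sd=0.0, rng=None):
    """Serialise a parallel spelling plan by competitive queuing (illustrative).

    plan     -- list of grapheme strings, e.g. ["b", "r", "ea", "k"] (hypothetical)
    noise_sd -- standard deviation of Gaussian noise added to letter activations;
                0.0 stands for an undisrupted system (assumed noise model)
    rng      -- optional random.Random instance, for reproducible runs
    """
    rng = rng or random.Random()
    letters = unpack(plan)
    n = len(letters)
    # Activation gradient: the sooner a letter is due, the more active it is.
    # A linear gradient is assumed here purely for simplicity.
    gradient = [(n - i) / n for i in range(n)]
    available = set(range(n))
    produced = []
    while available:
        # Every letter not yet produced competes with a noisy activation.
        noisy = {i: gradient[i] + rng.gauss(0.0, noise_sd) for i in available}
        winner = max(noisy, key=noisy.get)
        produced.append(letters[winner])
        available.remove(winner)  # the winning letter is suppressed after output
    return "".join(produced)


if __name__ == "__main__":
    plan = ["b", "r", "ea", "k"]
    print(cq_output(plan))  # no noise: prints "break"
    # With noise, letters may be produced out of order.
    print(cq_output(plan, noise_sd=0.3, rng=random.Random(1)))
```

With `noise_sd=0.0` the gradient alone determines the order, so the plan is produced as intended. Raising `noise_sd` lets neighbouring letters, whose activations are closest, exchange places more readily than distant ones, which is the kind of disruption the text describes as noise added to letter activations.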
