Available online at www.sciencedirect.com

ScienceDirect
Lingua 159 (2015) 70--92
www.elsevier.com/locate/lingua

When in doubt, read the instructions: Orthographic effects in loanword adaptation
Robert Daland a, Mira Oh b,*, Syejeong Kim b
a Department of Linguistics, UCLA, United States
b Department of English, Chonnam National University, Republic of Korea
Received 1 February 2014; received in revised form 25 February 2015; accepted 3 March 2015
Available online 15 April 2015

Abstract
Loanword adaptation has yielded many insights into the relationship between speech perception and the phonological grammar.
Evidence is now mounting that orthographic effects on loanword adaptation may be more prevalent than was once thought (cf. Paradis
and LaCharité, 2002; Vendelin and Peperkamp, 2006), partially obscuring phonological effects. This paper investigates orthographic
effects in the adaptation of vowels of English words loaned into Korean. Experiment I uses information-theoretic statistics, called the
orthographic and perceptual information gains, to estimate a lower bound on the contribution of orthography and perception to vowel
adaptation. The results suggest that orthography contributes more to the adaptation of unstressed vowels, while perception contributes
more to the adaptation of stressed vowels. Experiment II considers the adaptation of the /ɛ/-/æ/ contrast; these vowels have merged
recently in Korean although the orthographic distinction is maintained. The paper concludes by proposing the Perceptual Uncertainty
Hypothesis: source-loan orthographic alignment plays the greatest role in constraining loanword adaptation when phonological parsing in
the borrowing language is underdetermined by perceptual factors alone.
© 2015 Elsevier B.V. All rights reserved.

Keywords: Loanword phonology; Speech perception; Korean; Information theory

1. Introduction

Loanword adaptation provides a unique window onto the relation between speech perception and the phonological
grammar. For example, loanword adaptation data have facilitated the discovery of phonetic factors that contribute to
phoneme identification in native and non-native speech perception (Davidson, 2007; Kang, 2003a,b). Similarly, the desire
to formally account for loanword adaptation has supported key tenets of constraint-based phonology, and spurred formal
innovations pertinent to L1 phonology (Boersma and Hamann, 2009; Shinohara, 2004).
Researchers agree that the adaptation process yields forms which are generally similar to the source form, while also
generally conforming to phonotactic properties of the borrowing language (Iverson and Lee, 2006; Kang, 2010; LaCharité
and Paradis, 2002; Peperkamp et al., 2008; Shinohara, 2004; Silverman, 1992; Yip, 1993). Debate has centered on the
extent to which loanword adaptation reflects direct perception versus phonologically informed adaptation. It now seems
clear that certain aspects of loanword adaptation reflect perceptual distortions that derive from the structure of borrowers’
native language (e.g. Dupoux et al., 1999; Kabak and Idsardi, 2007; Boersma and Hamann, 2009), while other aspects of

* Corresponding author.
E-mail address: mroh@chonnam.ac.kr (M. Oh).

http://dx.doi.org/10.1016/j.lingua.2015.03.002
0024-3841/© 2015 Elsevier B.V. All rights reserved.
loanword adaptation reflect post-perceptual repair of a form that violates L1 phonotactics (Shinohara, 2004; Kabak and
Idsardi, 2007; for an analogous point in L2 phonology see Broselow et al., 1998). Adjudicating this tension has paid off
theoretically by increasing our understanding of how speech perception shapes synchronic phonology, and vice versa.
Much of this progress involved the use of loanword corpora (Kang, 2010; Paradis and LaCharité, 2002; Peperkamp
et al., 2008; Shinohara, 2004; Silverman, 1992; Yip, 1993). Corpus studies are methodologically attractive because
loanword corpora are comparatively easy to collect, analyze, and share between research teams; thus, the time required
to do a study is fairly small, the number of items is large, and replicability is good. On the other hand, the processes by
which loanwords are introduced and standardized are not fully understood, and are likely subject to a wide variety of extra-
perceptual influences. An especially salient source of ‘contamination’ is orthography -- adapters may be guided by
aspects of the source form’s orthographic representation rather than by its phonological (and perceptual) properties.
LaCharité and Paradis (2005) conducted an influential study of French loanword phonology, in which they concluded that
English orthography affected the French adaptation in a negligible proportion of forms. This view appears to reflect the
conventional wisdom, as the majority of papers on loanword adaptation that we are familiar with do not make more than
passing reference to orthography. One notable exception is Vendelin and Peperkamp (2006), who make orthography the
focus of their study. They conducted an ‘‘online adaptation’’ study with French listeners, and found that the inclusion of
English orthography conditioned how French listeners adapted the forms.
This paper follows in the footsteps of Vendelin and Peperkamp by studying the role of orthography in loanword adaptation,
but with a completely different language: Korean. The research question this paper takes up is driven by the authors’ informal
observation that the adaptation of English unstressed vowels into Korean appears to be guided by orthography. Experiment I
is a loanword corpus experiment, designed to assess the relative effects of orthography versus perceptual factors in vowel
adaptation. The methodology uses concepts and statistics from information theory to partially separate orthographic and
perceptual factors. Experiment Ia demonstrates that both orthography and perception play a role in vowel adaptation;
Experiment Ib provides suggestive evidence that the role of orthography is greater for unstressed vowels, while perception
may play a stronger role in the adaptation of stressed vowels. Experiment II considers the adaptation of the English /ɛ/-/æ/
contrast (typically adapted into Korean orthography as ㅔ and ㅐ respectively). The vowels spelled ㅔ and ㅐ underwent a
phonetic merger sometime around the 1950s in the standard dialect of Korean, although the spelling distinction is retained in
the orthography (Hong, 1988; Choi, 2002; Kim, 2000; Chung, 2002). Experiment II, an online adaptation study with nonce
words, shows that Koreans do not exhibit distinct adaptation patterns for English [ɛ]-[æ] when exposed to the auditory forms
alone (suggesting they are unable to distinguish these vowels), but when orthography is included, the adaptation patterns
shift toward what was found in the corpus study. The paper concludes by introducing the Perceptual Uncertainty Hypothesis,
that source-loan orthographic alignment constrains the phonological and orthographic parse assigned to a loan, and that this
constraint is supplementary to perceptual adaptation, so that the strongest orthographic effects will be observed when
perceptual factors alone underdetermine adaptation.

1.1. Conventions

We employ several conventions in this paper. In cases where the language of a form might be ambiguous, we supply
a subscript E after English forms, and a subscript K after Korean forms. We use a tilde-angle bracket sequence to
indicate loanword adaptation, and a hyphen-angle bracket sequence to indicate native phonological mappings, e.g. wordE
~> /wʌtɨ/K --> [wʌdɨ]K. Tense obstruents are denoted with a ‘*’ diacritic. Finally, while the Korean vowels ㅔ and ㅐ are often
transcribed as /e/ and /ɛ/ respectively, we symbolize both as /ɛ/ to reflect the vowel merger (e.g. Hwang and Moon, 2005),
except in Experiment II, where we nonstandardly transcribe ㅔ as /ɛ/ and ㅐ as /æ/. This transcription reflects the preferred
adaptation for the corresponding English vowels; we do this so as to minimize the cognitive load on readers not already
familiar with Korean phonology.

1.2. Korean phonology and orthography

Korean (Ahn, 1998; Sohn, 1999; Shin et al., 2012) includes 7 monophthongal vowels /i, ɨ, u, ɛ, ʌ, o, a/, and various
diphthongs. The sonorant consonant inventory includes front and back glides /j, w/; a single liquid /l/ (realized as a tap [ɾ] in
onset position), and nasals at the major places of articulation /m, n, ŋ/. Stops contrast for four places of articulation (labial,
denti-alveolar, alveo-palatal, and velar); the alveo-palatal stops are affricated. The stops exhibit a typologically unusual
three-way laryngeal contrast: plain/lax /p, t, c, k/, aspirated /pʰ, tʰ, cʰ, kʰ/, and tense/fortis /p*, t*, c*, k*/. There are lax and
tense coronal fricatives /s, s*/ (which alveopalatalize before [j, i], yielding [ɕ, ɕ*]), and a glottal fricative /h/ that is subject to
various lenition and coalescence processes.
The basic syllable structure is (C)(G)V(C). In the coda position, only the liquid, nasals, and plain oral stops may occur, a
restriction that is enforced by active laryngeal neutralization (tense and aspirated stops become plain) and active place/
manner neutralization (alveo-palatal stops and all fricatives become plain denti-alveolar stops). Underlying laryngeal and
manner features of postvocalic consonants are maintained by alternations, as these segments resyllabify when followed
by a vowel-initial morpheme (such as a case particle), as shown in (1):

(1)            /os/ ‘clothes’    /patʰ/ ‘field’    /kaps/ ‘price’
    -LOC       [os-ɛ]            [patʰ-ɛ]          [kap̚s*-ɛ]
    (BARE)     [ot̚]              [pat̚]             [kap̚]
    -‘also’    [ot̚-t*o]          [pat̚-t*o]         [kap̚-t*o]

The stops in coda position in (1) are shown with the ‘unreleased stop’ diacritic, because the phonetic grammar of Korean
requires that coda stops be obligatorily unreleased. This property plays a key role in explaining why vowel epenthesis is the
preferred repair for phonotactically illicit sequences in loanword adaptation (Kang, 2003a,b; Boersma and Hamann, 2009).
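
The coda restriction described before example (1) can be sketched as a simple lookup. The following Python fragment is only an illustration under stated assumptions: the function name `neutralize_coda` and the encoding of segments as strings are hypothetical, the mapping covers a single consonant only (coda clusters such as the one in /kaps/ are not modeled), and the unreleased-stop diacritic is not added to the output.

```python
# Hypothetical sketch of Korean coda neutralization for a single consonant.
# Segment symbols follow the transcription used in the text ('*' = tense).

NEUTRALIZED_STOP = {
    # laryngeal neutralization: aspirated and tense stops become plain
    "p": "p", "pʰ": "p", "p*": "p",
    "t": "t", "tʰ": "t", "t*": "t",
    "k": "k", "kʰ": "k", "k*": "k",
    # place/manner neutralization: alveo-palatal stops and all fricatives
    # become plain denti-alveolar stops
    "c": "t", "cʰ": "t", "c*": "t",
    "s": "t", "s*": "t", "h": "t",
}

# The liquid and the nasals may occur in coda position unchanged.
SONORANT_CODAS = {"l", "m", "n", "ŋ"}


def neutralize_coda(segment: str) -> str:
    """Return the surface coda consonant for an underlying consonant."""
    if segment in SONORANT_CODAS:
        return segment
    if segment in NEUTRALIZED_STOP:
        return NEUTRALIZED_STOP[segment]
    raise ValueError(f"not a possible coda consonant: {segment!r}")


# The bare forms in (1): /os/ and /patʰ/ both surface with a final [t].
print(neutralize_coda("s"), neutralize_coda("tʰ"))
```

Keeping the mapping as data rather than as a chain of conditionals makes it easy to compare against the prose statement of the restriction, one natural class at a time.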
Korean exhibits a variety of active alternations across syllable boundaries. For example, plain stops are obligatorily
tensed after stops, and obligatorily voiced when they occur after a sonorant and before a vowel within a word. As another
example, stop-nasal sequences cannot occur within words, and are repaired, somewhat unusually, by nasalizing the stop.
Both the intervocalic voicing and nasalization processes are evident in ‘Korean (language)’ [hanguŋmal], whose UR is
/hankukmal/ (cf. ‘Korean’ [han], ‘country’ [kuk̚], and ‘speech’ [mal]).
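
The derivation of [hanguŋmal] from /hankukmal/ can be traced with a small rule sketch. This is a simplified illustration, not an analysis from the paper: the function name `surface_form`, the one-symbol-per-segment encoding, and the decision to apply nasalization before voicing are assumptions, and post-stop tensing is left out because it does not apply in this word.

```python
# Hypothetical sketch of two Korean alternations described in the text:
# (a) a stop before a nasal is nasalized; (b) a plain stop is voiced
# after a sonorant and before a vowel. Tensing after stops is omitted.

VOWELS = {"i", "ɨ", "u", "ɛ", "ʌ", "o", "a"}
NASALS = {"m", "n", "ŋ"}
SONORANTS = VOWELS | NASALS | {"j", "w", "l"}

NASALIZED = {"p": "m", "t": "n", "k": "ŋ"}
VOICED = {"p": "b", "t": "d", "c": "dʑ", "k": "g"}


def surface_form(segments):
    """Apply nasalization, then voicing, to a list of segments."""
    nasalized = list(segments)
    for i in range(len(nasalized) - 1):
        if nasalized[i] in NASALIZED and nasalized[i + 1] in NASALS:
            nasalized[i] = NASALIZED[nasalized[i]]

    voiced = list(nasalized)
    for i in range(1, len(nasalized) - 1):
        after_sonorant = nasalized[i - 1] in SONORANTS
        before_vowel = nasalized[i + 1] in VOWELS
        if nasalized[i] in VOICED and after_sonorant and before_vowel:
            voiced[i] = VOICED[nasalized[i]]
    return voiced


print("".join(surface_form(list("hankukmal"))))   # hanguŋmal
print("".join(surface_form(["w", "ʌ", "t", "ɨ"])))  # wʌdɨ
```

The second call reproduces the native mapping /wʌtɨ/ --> [wʌdɨ] used to illustrate the notational conventions.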
Hangeul, the Korean writing system, is nearly morphophonemic and nearly syllabic. The smallest units in the writing
system are called jamo; each jamo represents a distinct phoneme of the language (excepting very recent sound changes,
such as the aforementioned /ɛ/-/æ/ merger). Jamo are organized into blocks, each containing 0--1 prevocalic consonant
graphemes, 1--2 vowel graphemes, and 0--2 postvocalic graphemes. Many morphologically simplex words of Korean are
spelled with a single such block. The writing system is not phonetic, because laryngeal and manner features in the
postvocalic position alternate. That is, the same grapheme may exhibit multiple pronunciations, but most of the
pronunciation variation is predictable from the morphophonological context. Scholars and prescriptive sources have
employed several different romanizations for Korean; common romanizations for monophthongs are given in (2).

(2) Hangeul    IPA          romanizations
    ㅏ         a            <a>, <o>
    ㅣ         i            <i>, <ee>
    ㅡ         ɨ            <eu>
    ㅓ         ʌ            <eo>, <u>
    ㅗ         o            <o>
    ㅜ         u            <oo>, <u>
    ㅐ         /æ/ [ɛ]      <ae> (e.g. noraebang ‘karaoke hall’)
    ㅔ         /ɛ/ [ɛ]      <e>
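
The organization of jamo into blocks can be made concrete with the arithmetic that Unicode uses for precomposed Hangeul syllables. The sketch below is an illustration only and is not part of the paper's method: the function name `compose_block` is hypothetical, and the Unicode encoding differs slightly from the description above, in that a block with no prevocalic consonant is written with the placeholder ㅇ, and two-letter vowel or postvocalic sequences (e.g. ㅄ) are each treated as a single unit.

```python
# Hypothetical sketch: composing a Hangeul block from its jamo using the
# layout of the Unicode Hangul Syllables block, which starts at U+AC00.

# Jamo listed in Unicode order.
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")        # 19 prevocalic
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")    # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 + none


def compose_block(lead: str, vowel: str, tail: str = "") -> str:
    """Return the precomposed syllable for a lead, a vowel, and an optional tail."""
    lead_index = LEADS.index(lead)      # raises ValueError for unknown jamo
    vowel_index = VOWELS.index(vowel)
    tail_index = TAILS.index(tail)
    offset = (lead_index * len(VOWELS) + vowel_index) * len(TAILS) + tail_index
    return chr(0xAC00 + offset)


# The three nouns of example (1), each spelled with a single block.
print(compose_block("ㅇ", "ㅗ", "ㅅ"))   # 'clothes'
print(compose_block("ㅂ", "ㅏ", "ㅌ"))   # 'field'
print(compose_block("ㄱ", "ㅏ", "ㅄ"))   # 'price'
```

Because the spelling keeps the underlying postvocalic consonants (ㅅ, ㅌ, ㅄ), the blocks above are identical whether the word is pronounced in its bare form or before a vowel-initial particle, which is the sense in which the system is morphophonemic rather than phonetic.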

1.3. Previous research on orthographic effects in loanword phonology

Orthographic effects in loanword adaptation receive a casual mention in numerous studies. For example, Dohlus (2005)
claims a role for orthography in Japanese loanword adaptation, but does not give any specific example. As another
example, Kabak (2003) notes that English picnic can be adapted as [pʰiŋnik̚]K even though [pʰikʰɨnik̚]K would be the normal
loanword adaptation, which he attributes to orthographic adaptation. (However, as we will discuss later, the [pʰiŋnik̚]K
adaptation is also predicted if Koreans interpret picnic as morphologically complex; Oh, 2012). More sustained attention to
orthography is given in Smith (2006), who points to and discusses numerous ‘doublets’ which were adapted from English
into Japanese. Smith argues plausibly that in some cases, one form represents an auditory adaptation, while the other is
informed by the orthography, as shown in (3).

(3) auditory process              example      auditory form     orthographic adapt’n
    onset simplification          glycerine    [_ɾi.sɯ.ɾiɴ]      [ɡɯ.ɾi.se.ɾiɴ]
    final coda deletion           jitterbug    [dʒi.ɾɯ.ba_]      [dʒit.taː.baɡ.ɡɯ]
    coda [ŋ] as [ɴ], not [ɴɡɯ]    pudding      [pɯ.ɾiɴ]          [pɯ.diɴ.ɡɯ]

Several experimental studies with French-speaking participants have investigated orthographic effects in loanword
adaptation. Perhaps the most well-known study in this vein was conducted by Vendelin and Peperkamp (2006), who asked
native French speakers to adapt English nonwords in an online adaptation task. Participants were tested in an oral-only or a
written + oral condition. The authors found that vowel adaptation in particular was noticeably different between these
conditions, indicating that orthography conditioned the adaptation when it was present. A similar conclusion was obtained in
Detey and Nespoulous (2008), who investigated the perception of tautosyllabic consonant clusters by Japanese learners of
French. Listeners heard items with initial, medial, and final clusters, e.g. /trosema/, /sematro/, and /semagotr/. The task was a
syllable-counting task (in fact, the authors admit the task was somewhat ambiguous between syllable-counting and mora-
counting, although this only mattered for clusters containing /n/), and the dependent measure was whether speakers
indicated more than 3 syllables. The crucial manipulation was modality: participants first completed a block with auditory
presentation, then a block with simultaneous auditory and visual (orthographic) presentation, then a block with visual
presentation alone. Intriguingly, the authors found the lowest rate of epenthesis in the auditory-only condition and the highest
rate of epenthesis in the visual-only condition. The authors propose that participants listened for actual vowels during the
auditory-only block, but occasionally relied upon their own silent (but nonetheless adapted) pronunciations for the syllable
count. Whether this interpretation is accepted or not, it is indisputable that orthography affected the adaptation process.
The earliest study we know of that quantitatively assesses orthographic effects in Korean loanword adaptation is Jun
(2002). That study is primarily concerned with the role of word-final stop burst releases in loanword epenthesis. As the original
reference is in Korean, we rely here upon Kang’s (2003, ftnt. 6, p. 227) summary: ‘‘Jun (2002) examined adaptation patterns
of English nonce words in a large-scale study involving 260 college students, and found three additional factors affecting the
likelihood of vowel insertion after postvocalic word-final stops; the frequency of vowel insertion was higher when the English
words were presented in written forms than in oral forms, when the final stop in oral inputs were released than unreleased and
when the final syllable was stressed than not.’’ This pattern echoes the finding of Detey and Nespoulous, that written
adaptation actually triggers a higher rate of perceptual epenthesis (although the situation is slightly different in Korean
because it allows word-final stops, so vowel epenthesis is not necessarily forced by native-language phonotactics).
In Lee’s (2009) unpublished dissertation, Chapter 5 (Section 5.3, 77--85) reports a perception experiment in which
listeners are asked to online-adapt sCVC forms like spote (/spot/E). The primary goal of the experiment was to assess
Koreans’ perception of the post-sibilant stop (e.g. the p in spote). The loanword adaptation pattern is that English voiceless
stops are almost categorically adapted as aspirated stops (e.g. /p/E > [pʰ]K). However, some researchers have speculated
that the Korean tense stops are a better phonetic match for English post-sibilant stops, both in terms of VOT and laryngeal
coarticulation on the following vowel (for details see Lee, 2009). If Korean tense stops are a better phonetic match for English
post-sibilant stops, then the consistent adaptation to Korean aspirated stops is evidence of either phonemic adaptation, or
perhaps orthographic adaptation. Lee presented native Korean listeners with 30 sCVC items in both an oral-only block and
oral-plus-written block, with the order counterbalanced across participants. The most frequent adaptation pattern was that in
the oral-only condition, the post-sibilant C was preferentially adapted as a tense consonant, while in the oral-plus-written
condition, it was preferentially adapted as an aspirated consonant. It was unclear whether listeners in the oral-plus-written
block were applying a direct orthographyE-to-phonemeK mapping, or whether they were inferring an English phonemic
category from the English spelling, and then mapping that phoneme. However, what was very clear was that the inclusion of
orthographic information significantly altered the loanword adaptation of post-sibilant stops.
Kang (2009) compares the adaptation of morphemic /z/ (i.e. the plural/possessive morphemes) and non-morphemic /z/
(as well as the adaptation of /s/) in English words loaned into Korean during the 1930s. The morphemic /z/ (plural/
possessive marker) was always adapted as [s] or [s*]. However, a more complex adaptation pattern was evident for non-
morphemic /z/. The majority pattern was adaptation to [c]. But when the /z/ was spelled with an <s> (e.g. pose), the
majority pattern was adaptation to [s] or [s*] -- very comparable to adaptation of /s/. In these cases, the only time the
majority adaptation to [c] occurred was in non-devoicing contexts, when the /z/ occurred after a vowel and before a
sonorant (e.g. Israel, basal). In section 4, Kang summarized her interpretation thus:
Orthography has often been considered an extra-linguistic influence in loanword adaptation that should be factored
out in order to reveal the true linguistic principles underlying loanword adaptation (Paradis and LaCharité, 2008;
Vendelin and Peperkamp, 2006). However, given the fact that orthography often systematically, albeit imperfectly,
correlates with aspects of linguistic structure, such as phonemes, the fact that the adapters relied on the input
language orthography in adaptation reveals what type of linguistic structure the adapters were paying attention to
(Oh and Steriade 2005). They resorted to orthographic cues when they are not certain about the phonemic identity
of the input sounds and in contexts where other correlations between the orthography and the relevant linguistic
structure were often ambiguous, orthography could have led the adapters to a ‘‘wrong’’ adaptation. In this respect,
the orthography effect is not a factor to be ignored, but rather it provides valuable evidence about the nature of the
adaptation itself. The adaptation was sensitive to the phonemic identity of the input sounds and since the adapters
were not necessarily fully competent bilinguals, they may have relied on orthography as a cue to phonemic identity,
even though this often led to wrong conclusions.
We will propose something rather similar in the conclusion of this paper. In any case, there is fairly strong evidence that
orthography does affect loanword adaptation.
However, not every study that has looked for orthographic effects in Korean loanword adaptation has found them.
Kang (2012) investigates the diachronic development of English liquid adaptation into Korean. Recall that the Korean
liquid /l/K is realized as a tap [ɾ]K in onset position, as [l]K in coda, and as [lː]K when geminated. Variability occurs in the
adaptation of English onset [l]E -- it is more frequently adapted as the Korean geminate [lː]K, but in 8--16% of words it is
adapted as the onset tap [ɾ]K (cf. melonE > [mɛɾon]K; MellonE > [mɛlːon]K). Neither adaptation is a perfect phonetic
match; for example, the geminate adaptation preserves laterality but mismatches the source in duration. Oh (2004) found
that the spelling of the onset [l]E (as either <l>E or <ll>E) was strongly predictive of whether the [l]E would be adapted as
onset [ɾ]K or geminate [lː]K, suggesting an orthographic effect on loanword adaptation. However, Kang hypothesized that
many of the putatively orthographic adaptations of singleton <l>E > [ɾ]K actually represented loanwords which were first
borrowed into Japanese, and from there into Korean (melonE > [meɾoɴ]J > [mɛɾon]K). In this case, the rhotic adaptation
is expected because Japanese only has a single, rhotic liquid. The central argument is encapsulated in Kang’s logistic
regression (10; p. 55), which incorporated spelling in the English form (<l> vs. <ll>), word initial vs. medial, cluster vs.
singleton, and whether the word was from a document exhibiting hallmarks of Japanese intermediation. It turned out that
Japanese influence was the strongest predictor of non-geminate adaptation, and word-initiality was the only other
significant predictor. In other words, these factors explained away the putative orthographic influence that had been
reported in earlier studies. Regression (15), dealing with a larger but more etymologically uncertain dataset from the
1930s, exhibits a similar pattern. Kang’s final argument was that the rate of [ɾ] adaptation (the one that is expected from
Japanese loans) was highest from the 1930s, the time period at which Japanese-mediated loanword adaptation was at its
peak. Kang therefore concludes that for this particular case, there is little compelling evidence of orthographic adaptation.
In summary, the study of orthographic effects on loanword adaptation is still relatively new. There is some evidence to
suggest that native language influence on the adaptation (such as vowel epenthesis) is actually stronger when the adapter
is presented with orthography (possibly because this encourages the adapter to engage in L1 phonological encoding).
From the limited evidence reviewed above, one might speculate that orthographic effects in loanword adaptation are fairly
common, having been found in English>French, French>Japanese, and English>Korean loans. However, it is not
the case that orthographic effects are found every time that someone looks for them. The diachronic study of liquid
adaptation by Kang (2012) argued that the putatively orthographic effects reported earlier by Oh (2004) were explained
away when the loanword’s origin was incorporated; that is, most or all of the time that English [l] was adapted as Korean [ɾ],
it was because the loanword came to Korean through Japanese (which does not have an [l]).
This pattern presents an interesting puzzle. If orthographic effects on loanword adaptation are as common as the
review above suggests, why is it that they have not been found earlier? After all, loanword phonology has been an active
area of research for several decades. Conversely, if orthographic effects are so common, why is it that they do not occur in
some cases where they might reasonably be expected? There are (at least) two logical possibilities that can explain this.
The first is that orthographic effects are pervasive throughout an adaptation system but have subtle and/or variable
consequences, so that they can only be detected by large scale statistical studies or carefully targeted experimentation.
The second possibility is that orthographic effects are restricted to particular phonological/perceptual contexts, and
become evident only when these particular contexts are studied in detail. At present, we do not have a predictive theory for
when orthographic effects will occur in loanword adaptation. To move toward such a theory, it is necessary to gather more
data. In the next section, we consider the adaptation of English vowels into Korean, and we will argue that there is
considerable orthographic influence there.

2. The adaptation of English vowels into Korean

In this section, we consider the adaptation of English vowels into Korean. Prior to presenting any data, we give a
‘‘translation table’’, indicating the most common adaptation from English vowels to their Korean correspondents.

English    Korean         English     Korean
ɪ          i              u           u
æ          ɛ (ㅐ)         l̩           ɨ, ʌ, a, ɛ
ɛ          ɛ (ㅔ)         n̩           jʌ, ɨ
o          o              ɒ <Q>       o, a
aɪ         ai             ɑː <#>      a, ɛ (ㅐ)
e          ɛi (ㅔ이)      ɔː <$>      o, ʌ
i          i              ə           ʌ, a, o, ɛ (ㅔ), i, ɨ, jʌ, iʌ, u

(When multiple Korean adaptations occur for the same English vowel, they are listed in decreasing order of frequency;
only adaptations with a frequency of at least 10 are listed. Angle brackets indicate the DISC transcription for low vowels
which contrast in British RP. For adaptations including /ɛ/, the Korean grapheme is indicated; e.g. ㅐ is used for English /ɑː/,
while ㅔ is used to adapt English /e/.) Note that schwa is the most variably adapted vowel -- even more variable than syllabic
consonants (which properly speaking do not actually even possess a vowel). In the next section, we describe the loanword
corpus from which the adaptation counts were drawn, and on which Experiment I is based.

2.1. The NAKL database

The loanword corpus consists of 2462 loanwords derived from a loanword list published by the National Academy of
the Korean Language (NAKL, 1991). The loanwords were collected from six daily newspapers and nine magazines
published in Korea in 1990 (NAKL, 1991: 159--237). Most loanwords are single words but some of them consist of multiple
words. For instance, ‘mineral water’ contains two words, in which case each individual word is taken as a separate datum.
In addition, some words exhibit multiple adaptations. For instance, the final unstressed vowel in ‘mineral’ is adapted in two
ways, [minɛɾʌl]K (미네럴) and [minɛɾal]K (미네랄). Thus the list used here has three forms derived from ‘mineral water’:
[minɛɾʌl]K, [minɛɾal]K, and [wʌtʰʌ]K. About 1800 English pronunciations were supplied via automatic lookup from the
CELEX database using the DISC transcription; the remaining pronunciations were manually supplied by the first author.
Vowel graphemes and phonemes were identified and parsed from the loanword list using a combination of hand-
coding and text processing. Epenthetic vowels were eliminated, leaving 5852 vowels (data points). This paper will use the
following conventions:

• i is an index, representing vowels in the NAKL list
• pᵢ is the phonetic/phonological identity of the source (English) vowel
• oᵢ is the orthographic spelling of the source (English) vowel
• kᵢ is the Korean adaptation of the vowel (ASCII transliteration), and
• dᵢ is the ith observation, the tuple dᵢ = (pᵢ, oᵢ, kᵢ)

To illustrate, suppose that the first four vowels in the NAKL list come from the English word academy. The complete
orthographic form, phonetic form, and Korean form are a₁ca₂de₃my₄ / [ə₁kʰæ₂də₃mi₄] / 아₁카₂데₃미₄, where the
vowels are subscripted for clarity. Then p₁ = [ə] because the phonetic value of the first vowel in academy is schwa;
o₁ = <a> because the first vowel in academy is spelled with the letter <a>; and k₁ = ‘‘a’’ because the Korean adaptation of
the first vowel in academy is ㅏ, and ‘‘a’’ is the transliteration equivalent chosen here for ㅏ. Here are the four observations
derived from academy:

(3) 1. a₁cademy    p₁ = [ə]    o₁ = <a>    k₁ = ‘‘a’’ (transliteration of ㅏ)
    2. aca₂demy    p₂ = [æ]    o₂ = <a>    k₂ = ‘‘a’’
    3. acade₃my    p₃ = [ə]    o₃ = <e>    k₃ = ‘‘E’’
    4. academy₄    p₄ = [i]    o₄ = <y>    k₄ = ‘‘i’’

The transliteration of Korean vowels is 1--1 (i.e., different Korean vowels are mapped to different Latin graphemes), and
was done because it is easier to process ASCII using command-line tools and Python scripting. Note that in most cases
the orthographic correspondent consists of a single letter or a digraph (e.g. the vowel in beach is spelled <ea>; the first
vowel of aerodynamic is spelled <ae>); however, in cases where the English orthography is especially opaque, the
orthographic form included as much context as the authors felt would be helpful for grapheme-to-phoneme mapping, e.g.
the entire -tion affix was included in the relatively few cases where it occurred. (The complete list of orthographic
sequences used is given in Table 4 in the Appendix.)
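
The observation tuples described above map naturally onto a small record type. The following Python sketch is an illustration under stated assumptions rather than the authors' actual scripts: the names `Observation`, `TRANSLIT`, and `make_observations` are hypothetical, and the transliteration table lists only the three equivalents that can be read off the academy example (the full 1--1 table is not given in this section).

```python
from typing import NamedTuple


class Observation(NamedTuple):
    """One data point: a source vowel and its Korean adaptation."""
    p: str  # phonetic/phonological identity of the English vowel
    o: str  # orthographic spelling of the English vowel
    k: str  # Korean adaptation, as an ASCII transliteration


# Assumed partial transliteration table (Hangeul vowel -> ASCII symbol),
# inferred from the academy example only.
TRANSLIT = {"ㅏ": "a", "ㅔ": "E", "ㅣ": "i"}


def make_observations(phones, spellings, korean_vowels):
    """Zip three aligned vowel sequences into a list of observations."""
    if not (len(phones) == len(spellings) == len(korean_vowels)):
        raise ValueError("the three vowel sequences must be aligned")
    return [
        Observation(p, o, TRANSLIT[k])
        for p, o, k in zip(phones, spellings, korean_vowels)
    ]


# The four vowels of "academy", in order.
academy = make_observations(
    phones=["ə", "æ", "ə", "i"],
    spellings=["a", "a", "e", "y"],
    korean_vowels=["ㅏ", "ㅏ", "ㅔ", "ㅣ"],
)
for d in academy:
    print(d)
```

Requiring the three sequences to have equal length reflects the step in which epenthetic vowels are removed first, so that every remaining Korean vowel has an English source vowel to pair with.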

2.2. Some examples of vowel adaptation

Inspecting the loanword corpus informally, one may be struck by how inconsistently the unstressed schwa vowel is
adapted. Several examples are given in (4) where two different schwas are adapted differently:

(4) English      Source SR        Loan SR                     Hangeul
    academy      [əkʰædəmi]       [akʰadɛmi]                  아카데미
    acropolis    [əkʰɹapʰəlɪs]    [akʰɨɾopʰolːisɨ]            아크로폴리스
    aluminum     [əlumɪnəm]       [alːuminjum]                알루미늄
    ballerina    [pæləɹinə]       [palːɛɾina]                 발레리나
    camera       [ˈkʰæməɹə]       [kʰameɾa]                   카메라
    chlorella    [kʰləˈɹɛla]      [kʰɨlːoɾɛlːa]               클로렐라
    emerald      [ˈɛməɹəld]       [ɛmʌɾaldɨ], [ɛmɛɾaldɨ]      에머랄드, 에메랄드
    messiah      [məˈsaɪjə]       [mɛɕia]                     메시아
    opera        [ˈapʰəɹə]        [opʰeɾa]                    오페라

The schwas and their correspondents are underlined in the English orthography. For example, in academy, the first schwa is
adapted into Korean as /a/, while the second schwa is adapted as /ɛ/. Indeed, in all of the examples in (4), the Korean
vowel adaptation corresponds to what might be thought of as the default grapheme-to-phoneme mapping.
Some more examples are shown in (5) to illustrate the adaptation for every English single-grapheme vowel that can
unambiguously correspond to a schwa:

(5) <o>  angstromE > [oŋsɨtʰɨɾoŋ]K        <e>  boomerangE > [pumɛɾaŋ]K
         automaticE > [otʰomætʰik̚]K            aldehydeE > [aldɛhidɨ]K
         benomylE > [pɛnomil]K                 allegoryE > [alːɛgoɾi]K
         alcoholE > [alkʰohol]K
                                          <a>  accordE > [akʰodɨ]K
    <u>  atriumE > [atʰɨɾium]K                 alanineE > [alːanin]K

The graphemes <i> and <y> are omitted, because these graphemes typically correspond either to the diphthong [aɪ]
(I, aldehyde) or an unstressed vowel (automatic and phenyl). In the latter case, the unstressed vowel is phonetically
ambiguous between [ə] and [ɪ], which is therefore less clearly suggestive of an orthographic effect.
Having given numerous examples where the orthography appears to condition the adaptation of schwa, it is worth
noting that there are cases where schwa appears to be adapted based on its sound value only, rather than orthography.
The examples in (6) suggest that word-finally, schwa-rhotic elements (variously analyzed as [əɹ], [ɚ], or [ɹ̩]), schwa-lateral
sequences, and schwa-nasal sequences are mainly adapted with [ʌ], regardless of the vowel’s spelling.

(6) messenger    [mɛɕ*ind͡ʑʌ]K      miniature     [miniʌt͡ɕʰʌ]K
    mirror       [miɾʌ]K           mixer         [mik̚s*ʌ]K
    apparel      [ʌpʰɛɾʌl]K        memorandum    [mɛmoɾɛndʌm]K
    bourbon      [pʌbʌn]K          minimal       [minimʌl]K
    minimum      [minimʌm]K        animation     [ɛnimɛiɕjʌn]K
    bacon        [pɛikʰʌn]K        badminton     [pɛdɨmintʰʌn]K

An example which illustrates the complexity of loanword adaptation is national, which is adapted variably as [nɛɕjʌnʌl],
[nɛɕjʌnal], and [nɛɕjonal] (내셔널, 내셔날, and 내쇼날). The last adaptation apparently reflects a purely orthographic
adaptation, while the first adaptation apparently reflects purely perceptual adaptation; the middle form apparently reflects
perceptual adaptation of the -tion suffix’s vowel, and orthographic adaptation of the -al suffix’s vowel. Note, however, that
while the unstressed vowels in national are adapted variably, the stressed vowel is adapted the same way in all three
variants.
More generally, the examples given above suggest that stressed vowels are adapted somewhat more consistently
than unstressed vowels. This point can be seen more quantitatively in Table 1, which gives adaptation counts from the
loanword corpus.
Rows correspond to the English vowel’s phonological identity; columns correspond to the Korean vowel grapheme
(which corresponds to its pronunciation, except for the /ɛ/–/æ/ merger). Cells indicate the number of times the adaptation
was observed; e.g. the first row corresponds to English schwa, and the first column corresponds to Korean [i], so the top
left cell value of 63 means that English schwa was adapted to Korean /i/ 63 times in this dataset. (The analogous table for
the correspondence between English orthography and the English vowel identity is given as Table 4 in the Appendix.)
The data above offer anecdotal support for this idea, but some more rigorous statistical assessment is called for.
Herein lies the challenge: familiar statistical methods such as ANOVA and linear/logistic regression require either a scalar
(continuous, single-dimensional) response variable, or a binary-valued (single-dimensional) response variable. In the
vowel adaptation data here, it is most plausible to treat the Korean adaptation (k_i) as the response variable, but this
variable is discrete (the vowels cannot be arranged onto a meaningful single-dimensional scale), and the number of
response categories is much larger than 2. Similar remarks apply to both the English phonetic and orthographic vowel
identities (p_i, o_i), which should be regarded as the independent variables. Thus, multinomial regression (the
multi-dimensional variant of logistic regression) is also inappropriate. To test for orthographic versus phonetic effects on
Korean vowel adaptation, this paper turns to statistics derived from information theory.

2.3. Information theory

This subsection describes the fundamental concepts of information theory (Shannon, 1948), beginning with the
entropy, conditional entropy, and mutual information. Then, it is shown how these measures can be applied to determine
upper and lower bounds on the influence of orthography and phonology on loanword adaptation. Readers wishing for
further information are advised to consult a tutorial on information theory in linguistics (Daland, 2014, freely available from
link in references) or other studies where it has been applied (Goldsmith, 1998, 2002).

Table 1
Counts of English vowel to Korean vowel adaptation. Korean vowels (columns) with frequencies less than 15 were omitted for readability.

          i    ʌ    o    a    ɛ    æ   ai   ɛi    u    ɨ   ju   jʌ   au   iʌ   ja   ia   jo   oi   ɛʌ
ə        63  560  127  242   65   15    6    3   11   41    8   18   --   15    4    9    2   --    1
ɪ      1025    2    6    8   63    4    2    2    2    4   --   --   --   --   --   --   --   --   --
æ         8    6   10  176   34  286   --    2   --   --   --   --   --    3    6   --   --   --   --
ɛ        11    5    7    3  385   14    2    1    1    1    1   --   --   --   --   --   --   --    1
o         6    6  304    2   --    3   --   --    1   --   --   --    1   --   --   --    9   --   --
ɒ         2   16  184   61    3   --    4   --    1    2    1   --    1   --    4   --    6   --   --
e         1    1   --   28   19    3    3  189   --    2    1   --   --   --   --   --    1   --   --
i       197   --    7    7   27    2    2    1   --   --    1   --   --    1   --   --   --   --    1
a͡ɪ       26    3    6    7    3   --  196   --   --   --   --   --   --   --    1   --   --   --   --
ɑː        1    1    3  136    3   24    1   --   --    1   --   --   --   --    5   --   --   --   --
u         1    2    5   --    3   --   --   --   74    2   75   --   --   --   --   --   --   --   --
ɔː        2   11  126    3    2   --   --   --   --   --   --   --    3   --   --   --    2   --   --
ʌ         1  121    7    9    2   --    2   --    4   --   --    3   --    1   --   --   --   --   --
l̩         1   30    2   19   11   --   --   --    1   47   --    3   --   --   --   --   --   --   --
n̩         1    6    2    2    1   --   --   --    1   11   --   67   --   --   --   --   --   --   --
i͡ə       12    1    1    1    6    2   --   --   --   --    9    2   --   22   --   11   --   --   --
ɚ         1   54    2    3    3   --   --   --   --   --   --    2   --    1    1   --   --   --   --
a͡o       --    1   --    7   --   --   --   --   --   --   --   --   46   --   --   --   --   --   --
ʊ        --    1   --   --    2   --   --   --   28   --   12   --   --   --   --   --   --   --   --
ɛ͡ə       --   --   --    8    1   --   --    1   --   --   --   --   --   --   --   --   --   --   15
o͡ɪ       --   --    2   --    1   --   --   --   --   --   --   --   --   --   --   --   --   20   --

The entropy of a random variable (RV) is a statistical measurement which characterizes the amount of ‘‘uncertainty’’ as
to the value of the RV on any given trial. Entropy is typically measured in bits, which has the intuitive interpretation that the
uncertainty of an RV is equal to the average number of binary questions that must be asked to determine the value of the
RV (assuming perfect knowledge of the relative probabilities of different outcomes). For example, a fair coin has one bit of
entropy, because exactly one binary question must be asked to determine the value of the coin on any given trial. Having
given the intuitive meaning of entropy, we turn now to a formal definition.
The surprisal of an event v is defined as the negative of the logarithm (base b) of its probability, $\eta(v) = -\log_b \Pr(v)$. In
this paper, the binary base is used throughout; therefore we will use the special notation lg x to represent $\log_2 x$, and the
units of surprisal will be called bits. The concept of surprisal can be informally understood by considering the surprisal of
events at three different levels of certainty. An event v is a foregone conclusion if Pr(v) = 1; in this case the surprisal is 0:
$\eta(v) = -\lg 1 = 0$. In a fair coin, the probability of an h (heads) is 1/2, so the surprisal of an h is 1 bit:
$\eta(h) = -\lg(1/2) = -\lg(2^{-1}) = -(-1) = 1$. If an event v is impossible, its probability is 0, and its surprisal is
infinite/undefined: $\eta(v) = -\lg 0 = \infty$.
Finally we are in a position to define entropy formally: the entropy of a discrete random variable X, written H[X], is the
expectation of its surprisal: $H[X] = E[\eta(X)]$. In other words, entropy is a weighted average of surprisal over the event space
of X, where each event’s surprisal is weighted by the likelihood of the event:

$$H[X] = -\sum_{v \in X} \Pr(v) \lg \Pr(v) \qquad \text{entropy} \quad [1]$$

This concept can be illustrated by applying this definition to the case of a fair coin. The events are h (heads) and t (tails),
each with a probability of 1/2:

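$$\begin{aligned}
H[\text{fair coin}] &= -\bigl(\Pr(h) \lg \Pr(h) + \Pr(t) \lg \Pr(t)\bigr) \\
&= -\bigl(1/2 \times \lg(1/2) + 1/2 \times \lg(1/2)\bigr) \\
&= -\bigl(1/2 \times (-1) + 1/2 \times (-1)\bigr) \\
&= -(-1) = 1
\end{aligned} \qquad [2]$$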

Thus, the entropy of a fair coin is 1 bit, in agreement with the intuitive interpretation offered previously: it takes an average
of one yes-no question to determine the outcome of a fair coin flip.

It follows from Eq. [1] that when there are n outcomes that are all equally likely, the entropy is lg n. As it turns out, this is
the maximum entropy that is possible when there are n distinct outcomes, and this maximum is only obtained when all n
events have equal probability. Put another way, if the random variable is skewed toward certain outcomes, its entropy will
be less than the maximum. This point is illustrated in [3], where the entropy is shown for a set of unfair coins in which the
probability of a heads is steadily decreased toward 0:

Pr(h)    H[coin]      Pr(h)    H[coin]      [3]
1/2      1.0 bits     1/8      .54 bits
1/4      .81 bits     1/16     .34 bits

This point is especially relevant for linguistics, since word frequencies and other linguistic distributions tend to be heavily
skewed (Baayen, 2001).
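The entropy definition in [1] and the unfair-coin values in [3] can be checked with a few lines of Python. This is only a minimal sketch: the function names `entropy` and `coin_entropy` are illustrative choices, not part of the original study.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (Eq. [1])."""
    # Outcomes with zero probability are skipped: p * lg p tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def coin_entropy(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    return entropy([p_heads, 1.0 - p_heads])

# Recompute the entries of [3]: entropy falls as the coin becomes more skewed.
for p in (1 / 2, 1 / 4, 1 / 8, 1 / 16):
    print(f"Pr(h) = {p:<7} H[coin] = {coin_entropy(p):.2f} bits")

# With n equally likely outcomes the entropy reaches its maximum, lg n.
n = 8
uniform = entropy([1.0 / n] * n)
```

Any skew away from the uniform distribution lowers the result below lg n, which is the property the text relies on when discussing heavily skewed linguistic distributions.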
The scientific utility of information theory arises from the power to measure relationships between two or more random
variables. For instance, suppose the following situation:

• It rains 3 days out of 10.
• On days that it will rain, there is an 80% probability that Aunt Eugenia will complain about her aching bones in the
  morning.
• On days that it will not rain, there is a 20% probability that Aunt Eugenia will complain about her aching bones.

If one wishes to have a picnic, it is clearly a good idea to consult Aunt Eugenia first. Her bones are not a perfect indicator of
whether rain will come, but they appear to be significantly better than guessing based on the raw probability of rain. The
concepts of joint entropy, conditional entropy, and mutual information will be illustrated using this example.
The joint distribution of two random variables (X, Y) is the distribution over all possible pairs of events from X and Y. For
illustration, let X be the observable information of whether Aunt Eugenia’s bones hurt, X = hurt or X = ¬hurt, and let Y be
the hidden outcome we wish to make inferences about, Y = rain or Y = ¬rain. The joint distribution consists of all pairs of
outcomes. The probabilities can be calculated given the information above. For instance, the probability of hurt and rain is
determined by multiplying the probability that it will rain by the probability that Aunt Eugenia’s bones will hurt, given that it
will rain: Pr(hurt ∧ rain) = Pr(rain) × Pr(hurt | rain) = 0.3 × 0.8 = 0.24. The full joint distribution is given in [4].

X \ Y     rain     ¬rain    sum      [4]
hurt      0.24     0.14     0.38
¬hurt     0.06     0.56     0.62
sum       0.30     0.70     1.0

The joint entropy, written H[X,Y], is simply the entropy of the joint distribution, i.e. the expected surprisal of every pair of
events. For this Aunt Eugenia example, H[X,Y] = 1.603, considerably less than the 2 bits that would result if the variables
were independent and all events were equiprobable.
From the joint distribution, it is straightforward to calculate the distributions of X and Y alone. For example, the total
probability that Aunt Eugenia’s bones hurt is obtained by summing probabilities across the entire hurt row, i.e. Pr(hurt)
= Pr(hurt ∧ rain) + Pr(hurt ∧ ¬rain) = 0.24 + 0.14 = 0.38. Summing in this way is called marginalization, and the resulting
distributions (X and Y) are called the marginal distributions. For this Aunt Eugenia example, the marginal distribution
entropies are H[X] = 0.958, H[Y] = 0.881.
The conditional entropy of Y given X, written H[Y | X], represents the average amount of uncertainty in the outcome of
Y, given that the value of X is known. In the present case, this represents our level of uncertainty as to whether it will rain,
given that Aunt Eugenia has told us whether her bones hurt. A simple way to calculate the conditional entropy is given in
[5].

$$H[Y \mid X] = H[X,Y] - H[X] \qquad \text{conditional entropy} \quad [5]$$

Since the joint and marginal entropies were calculated above, it is easy to calculate the conditional entropy for the Aunt
Eugenia example: H[Y | X] = 1.603 − 0.958 = 0.645.
It is instructive to compare the conditional entropy of rain (given the state of Aunt Eugenia’s bones) to the raw entropy of
rain. Copying from above, H[Y] = 0.881, while H[Y | X] = 0.645. In other words, our uncertainty about the weather has been
decreased by incorporating our knowledge about Aunt Eugenia’s bones, by about 0.236 (= 0.881 − 0.645) bits. The
quantity H[Y] − H[Y | X] indicates the reduction in our uncertainty as to Y that arises from knowing X. This quantity is called
the mutual information:

$$I[X; Y] = H[Y] - H[Y \mid X] \qquad \text{mutual information} \quad [6]$$

It is straightforward to show that mutual information is symmetric (I[X; Y] = I[Y; X]). In fact, the relationship between joint
entropy, conditional entropy, and mutual information can be expressed algebraically as in [7], and graphically as in Fig. 1.

$$H[X,Y] = H[X \mid Y] + I[X; Y] + H[Y \mid X] \qquad [7]$$

Fig. 1. Venn diagram representing algebraic relations between joint, conditional, and marginal entropies and mutual information.

It follows from Eqs. [5]--[7] that H[X, Y] ≤ H[X] + H[Y]; moreover, strict equality holds if and only if X and Y are independent
(in which case I[X; Y] = 0). Equivalently, H[Y | X] = H[Y] if and only if Y is independent of X. In other words, Y is statistically
independent of X if knowing the value of X provides no reduction in uncertainty as to the value of Y. Therefore, information
theory can be used to test for the degree of (in)dependence between two variables, and this is the property we will exploit
to measure orthographic influence on vowel adaptation in the next section.
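The Aunt Eugenia figures quoted above can be recomputed directly from the three stated probabilities. The following is a minimal sketch; the variable names and the labels "fine" and "dry" (standing in for ¬hurt and ¬rain) are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Stated in the text: Pr(rain) = 0.3, Pr(hurt | rain) = 0.8, Pr(hurt | no rain) = 0.2.
p_rain = 0.3
p_hurt_given = {"rain": 0.8, "dry": 0.2}

# Build the joint distribution in [4]; keys are (X, Y) pairs.
joint = {}
for weather, p_w in (("rain", p_rain), ("dry", 1 - p_rain)):
    joint[("hurt", weather)] = p_w * p_hurt_given[weather]
    joint[("fine", weather)] = p_w * (1 - p_hurt_given[weather])

# Marginalize: sum over the other variable.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

h_xy = entropy(joint.values())   # joint entropy H[X,Y]
h_x = entropy(p_x.values())      # marginal entropy H[X]
h_y = entropy(p_y.values())      # marginal entropy H[Y]
h_y_given_x = h_xy - h_x         # conditional entropy, Eq. [5]
h_x_given_y = h_xy - h_y
mi = h_y - h_y_given_x           # mutual information, Eq. [6]

print(f"H[X,Y]={h_xy:.3f} H[X]={h_x:.3f} H[Y]={h_y:.3f} "
      f"H[Y|X]={h_y_given_x:.3f} I[X;Y]={mi:.3f}")
```

Computing the mutual information from the other direction, H[X] − H[X | Y], gives the same value, and the three pieces on the right-hand side of [7] add up to the joint entropy.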
Note that in the Aunt Eugenia example presented here, each of the RV’s X and Y were binary-valued. Thus, in this
example, there was little special advantage in using information theoretic measures over other measures, such as d′ or
logistic regression. In a more general setting like Korean vowel adaptation, there are many possible inputs and many
possible responses, so d′ and logistic regression cannot be applied straightforwardly. However, the
information-theoretic measures can be applied in exactly the same way as with the Aunt Eugenia example. The next section
shows how they can be applied to determine bounds on the orthographic versus perceptual contributions to loanword
adaptation.

2.4. Applying information theory, part I: Getting a probability distribution

As laid out in the previous section, information theory can be used to quantify relationships between random variables.
The essence of the idea here is to regard Korean vowel adaptation as a random variable K, and to investigate its
relationship to the random variables P and O, representing the perceptual category of the English vowel and its
orthographic representation respectively. Thus, the key to the approach is to cast the data from the NAKL corpus into the
statistical framework of random variables.
Recall that the data consist of vowel tuples d_i = (p_i, o_i, k_i) where p_i represents the perceptual vowel category of the ith
vowel in the English source form, o_i represents the English orthographic representation of the same vowel, and k_i
represents (an ASCII transliteration of) the Korean vowel in the loan form. These d_i values may be regarded as a sample
from a joint distribution, d_i ~ (P, O, K). The most straightforward assumption is that every possible (p, o, k) triple occurs
with its own probability, and that these probabilities are proportional to the sample frequency: Pr(d_i = (p, o, k)) = Fr(p, o, k)/N
where Fr() is the sample frequency and N is the total number of observations. To illustrate this process, let us suppose that
the data consisted of the following:

(7) 1. a1cademy   p1 = [ə], o1 = <a>, k1 = ‘‘a’’
    2. aca2demy   p2 = [æ], o2 = <a>, k2 = ‘‘a’’
    3. acade3my   p3 = [ə], o3 = <e>, k3 = ‘‘E’’
    4. academy4   p4 = [i], o4 = <y>, k4 = ‘‘i’’
    5. ca5mera    p5 = [æ], o5 = <a>, k5 = ‘‘a’’
    6. came6ra    p6 = [ə], o6 = <e>, k6 = ‘‘E’’
    7. camera7    p7 = [ə], o7 = <a>, k7 = ‘‘a’’

Although there are 7 data points, there are only 4 distinct event types, yielding the following:

Fr([ə], <a>, ‘‘a’’) = 2 (d1, d7) Pr([ə], <a>, ‘‘a’’) = 2/7 [8]
Fr([æ], <a>, ‘‘a’’) = 2 (d2, d5) Pr([æ], <a>, ‘‘a’’) = 2/7
Fr([ə], <e>, ‘‘E’’) = 2 (d3, d6) Pr([ə], <e>, ‘‘E’’) = 2/7
Fr([i], <y>, ‘‘i’’) = 1 (d4) Pr([i], <y>, ‘‘i’’) = 1/7

The marginal distributions can be obtained by summing as before; for example the (O, K) distribution can be obtained by
marginalizing over P, yielding Pr(<a>, ‘‘a’’) = 4/7, Pr(<e>, ‘‘E’’) = 2/7, Pr(<y>, ‘‘i’’) = 1/7. This estimation procedure allows
us to infer probability distributions for the joint random variable (P, O, K), which is the essential first step in applying
information theory. Note that this estimation procedure also introduces sampling errors into the estimated distribution,
which artificially inflates the information measures of interest. This issue will be discussed later; for now the relevant point
is that it is possible to approximate the joint distribution (P, O, K) from the NAKL data.
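The estimation step can be sketched in Python using the seven toy observations from (7). The helper names `estimate_joint` and `marginalize` are illustrative assumptions, and exact fractions are used so that the output can be compared directly with [8].

```python
from collections import Counter
from fractions import Fraction

# The seven (p, o, k) triples from example (7): perceptual category,
# English grapheme, and ASCII-transliterated Korean vowel.
data = [
    ("ə", "a", "a"),  # a1cademy
    ("æ", "a", "a"),  # aca2demy
    ("ə", "e", "E"),  # acade3my
    ("i", "y", "i"),  # academy4
    ("æ", "a", "a"),  # ca5mera
    ("ə", "e", "E"),  # came6ra
    ("ə", "a", "a"),  # camera7
]

def estimate_joint(triples):
    """Relative-frequency estimate: Pr(p, o, k) = Fr(p, o, k) / N."""
    counts = Counter(triples)
    n = len(triples)
    return {event: Fraction(count, n) for event, count in counts.items()}

def marginalize(joint, keep):
    """Sum out every position of each event that is not listed in `keep`."""
    marginal = {}
    for event, prob in joint.items():
        key = tuple(event[i] for i in keep)
        marginal[key] = marginal.get(key, Fraction(0)) + prob
    return marginal

joint = estimate_joint(data)
ok_dist = marginalize(joint, keep=(1, 2))  # the (O, K) distribution

for event, prob in sorted(ok_dist.items()):
    print(event, prob)
```

Because the estimate is a plain relative frequency, rare triples receive probabilities based on very few observations, which is the source of the sampling error mentioned above.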

2.5. Applying information theory, part II: The measures of interest

As described in the previous section, the joint distribution (P, O, K) can be inferred from the sample frequencies of each
distinct ( p, o, k) triple in the loanword adaptation dataset. From this distribution, it is possible to calculate various statistics
of interest. It will be shown here that certain mutual informations and mutual information-like statistics represent upper and
lower bounds on the extent of orthographic (and perceptual) influence on Korean vowel adaptation. (As will become
evident throughout the paper, it is not possible to fully deconfound orthographic from perceptual influence, since in many
cases orthography and perception are fully confounded.)

2.5.1. An upper bound on orthographic influence


Let us consider what the value I[K; O] represents in light of the meaning of mutual information. This value indicates the
mutual information of English vowel orthography and the Korean loanword adaptation (without regard to the perceptual
category of the vowel). From the definition in Eq. [6], one way of rewriting this is as H[K] − H[K | O]: the reduction in
uncertainty as to the Korean vowel that is chosen given that the English vowel’s orthographic representation is known. Since the
English orthography is logically prior to the Korean vowel adaptation, it would be tempting to assume that this statistic
measures the causative influence of the English orthography on the Korean vowel adaptation. However, this is fallacious,
since English orthography O is partially confounded with the perceptual vowel category P. That is, the statistic I[K; O]
measures the greatest amount of influence that English vowel orthography could possibly exert on the Korean loanword
adaptation, ignoring any possible perceptual contributions. Therefore, I[K; O] is an upper bound on the influence of O on K.

2.5.2. An upper bound on perceptual influence


By identical reasoning as in the previous paragraph, I[K; P] represents an upper bound on the possible influence of a
vowel’s perceptual identity on its adaptation into Korean.

2.5.3. A lower bound on orthographic influence


The conditional entropy H[K | P] represents the amount of uncertainty that remains in the Korean vowel adaptation,
given the perceptual identity of the vowel to be adapted. In other words, it represents the amount of uncertainty that
remains in K, when as much uncertainty as possible has been explained away by P. Similarly, the conditional entropy
H[K | P,O] represents the amount of uncertainty that remains in K, when both P and O are known. It follows that
H[K | P] ≥ H[K | P,O]: having two pieces of information is better than (or at least as good as) having one, on average. Moreover,
the difference H[K | P] − H[K | P,O] is of interest: this value represents the extra reduction in uncertainty as to K that arises
from knowing O, after all the uncertainty that could be explained by P has already been removed. In other words, this value
represents a lower bound on the orthographic contribution to the loanword adaptation. We will refer to this statistic as the
orthographic information gain. (This concept bears a coarse conceptual and mathematical resemblance to the concept of
functional load, as treated in, e.g. Surendran and Niyogi, 2003).

2.5.4. A lower bound on perceptual influence


By identical reasoning as above, the perceptual information gain H[K | O] − H[K | P,O] represents a lower bound on the
perceptual influence on vowel adaptation. The four bounds are summarized in Eq. [9]:

               orthographic influence       perceptual influence        [9]
lower bound    H[K | P] − H[K | P, O]       H[K | O] − H[K | P, O]
upper bound    I[K; O]                      I[K; P]

The remainder of this section discusses two technical challenges in obtaining/interpreting these values which arise from
finite sample effects.
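The four bounds in [9] can be sketched in Python as follows. This is only an illustration on the toy data from (7), not the authors' implementation; the column constants `P`, `O`, `K` and the function names are assumptions. Each conditional entropy is obtained from joint entropies via Eq. [5], e.g. H[K | P] = H[P,K] − H[P].

```python
import math
from collections import Counter

P, O, K = 0, 1, 2  # column positions within each (p, o, k) triple (assumed layout)

def joint_entropy(rows, columns):
    """Entropy in bits of the empirical distribution over the selected columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def bounds(rows):
    """Sample values of the four bounds in [9], before any chance correction."""
    def h(*cols):
        return joint_entropy(rows, cols)
    h_k_given_p = h(P, K) - h(P)            # H[K | P]
    h_k_given_o = h(O, K) - h(O)            # H[K | O]
    h_k_given_po = h(P, O, K) - h(P, O)     # H[K | P, O]
    return {
        "orth_lower": h_k_given_p - h_k_given_po,  # orthographic information gain
        "orth_upper": h(K) - h_k_given_o,          # I[K; O]
        "perc_lower": h_k_given_o - h_k_given_po,  # perceptual information gain
        "perc_upper": h(K) - h_k_given_p,          # I[K; P]
    }

# Toy data from (7); in this tiny sample K is fully determined by O.
toy = [("ə", "a", "a"), ("æ", "a", "a"), ("ə", "e", "E"), ("i", "y", "i"),
       ("æ", "a", "a"), ("ə", "e", "E"), ("ə", "a", "a")]
b = bounds(toy)
for name, value in b.items():
    print(f"{name}: {value:.3f}")
```

In the toy sample the grapheme alone predicts the Korean vowel, so the perceptual information gain comes out as zero while the orthographic information gain is 4/7 of a bit (the residual uncertainty about schwa once P is known).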

2.6. Applying information theory, part III: Chance (correcting for accidental information)

As noted briefly above, the process of inferring underlying probabilities from sample frequencies introduces sampling
errors. For example, with a fair coin the true probability of a heads is 0.5, but a sample of 100 coin flips might contain 55
heads and 45 tails, yielding modest errors in the inferred probabilities. This kind of sampling error tends to be small
when the number of observations for a category is large in proportion to the total number of observations. Conversely,
sampling error tends to be large when the number of observations for a category is small in both absolute and relative
terms. Now, suppose that two random variables X and Y are actually independent, and we observe a sample from the
joint distribution (X, Y), i.e. a set of data {zi} where each zi = (xi, yi). Vinh et al. (2009) show that the samples will exhibit
‘‘accidental’’ dependencies, even though the RV’s they are drawn from are actually independent. This problem is
exacerbated when the distribution is heavily skewed, and when the number of categories is large; unfortunately this
characterization holds for nearly all linguistic distributions of theoretical interest, and it certainly holds here. For
example, there are over 100 distinct orthographic realizations of English vowels in this corpus, many of which occur
just a few times. The process of inferring the joint distribution from sample frequencies will introduce ‘‘accidental
information’’, over and above any true dependencies in the data, so the upper and lower bounds described in the
previous section cannot be taken at face value. This section describes how the amount of accidental information can
be estimated for each measure.
Let us begin by observing that the true relationships (if any) between English orthography and Korean loanword
adaptation can be destroyed by shuffling the order of the o_i data points (with respect to the corresponding p_i and k_i data).
By calculating probabilities from sample frequencies just as with the real (non-shuffled) data, we obtain a ‘shuffled’ joint
distribution (P, O′, K), indicated with a prime on the shuffled variable. The same informational statistics can be collected
on this data; for example the upper bound on shuffled orthographic influence is I[K; O′]. This value should be low, since the
shuffling process will destroy any true relationships between K and O. However, owing to finite sampling errors, it may be
above zero, even if it is small. In fact, this is the amount of accidental mutual information we can expect in the joint
distribution of K and O, since it is the amount of mutual information we obtained when we caused the two RV’s to be
independent. Since shuffling is inherently a random process, the exact value of I[K; O′] will vary slightly from shuffle to
shuffle. Therefore, the shuffling process is repeated 1000 times, and we take the average, denoted E[I[K; O′]]. The full set
of accidental information measures is given in [10]:

               orthographic influence            perceptual influence              [10]
lower bound    E[H[K | P] − H[K | P, O′]]        E[H[K | O] − H[K | P′, O]]
upper bound    E[I[K; O′]]                       E[I[K; P′]]

Finally, the ‘true’ value of the bound is obtained by beginning with the sample value in [9], and then subtracting out the
corresponding accidental information value in [10]. For example, the true upper bound on orthographic influence is
I[K; O] − E[I[K; O′]].
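The shuffling correction can be sketched for the upper bound on orthographic influence. This is a minimal illustration on toy data rather than the procedure as actually implemented for the corpus; the function names, the fixed random seed, and the toy lists are assumptions.

```python
import math
import random
from collections import Counter

def entropy_of(samples):
    """Entropy in bits of the empirical distribution of a list of observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I[X; Y] = H[X] + H[Y] - H[X,Y], estimated from paired observations."""
    return entropy_of(xs) + entropy_of(ys) - entropy_of(list(zip(xs, ys)))

def accidental_information(xs, ys, n_shuffles=1000, seed=0):
    """Mean mutual information after shuffling xs relative to ys."""
    rng = random.Random(seed)  # fixed seed only so that the sketch is repeatable
    shuffled = list(xs)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any true pairing between xs and ys
        total += mutual_information(shuffled, ys)
    return total / n_shuffles

# Toy orthography and adaptation columns from example (7).
o = ["a", "a", "e", "y", "a", "e", "a"]
k = ["a", "a", "E", "i", "a", "E", "a"]

sample_mi = mutual_information(o, k)      # I[K; O]
chance_mi = accidental_information(o, k)  # E[I[K; O']]
corrected = sample_mi - chance_mi         # chance-corrected upper bound
print(f"sample={sample_mi:.3f} chance={chance_mi:.3f} corrected={corrected:.3f}")
```

With only seven observations the accidental information is a sizeable fraction of the sample value, which illustrates why the correction matters most for small or sparsely populated categories.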

2.7. Experiment Ia: all vowels

The method described in the preceding sections was applied to the NAKL data. Prior to applying any of the
computations, the data were filtered by removing p, o, or k categories that occurred fewer than 5 times in the data set. This
was done because these items were disproportionately likely to contain spelling/transcription errors, and because these
low frequency items are the ones which presumably contribute a disproportionate amount of accidental information. After
removing these infrequent categories, the sample entropies and ‘shuffled’ entropies were calculated. The sample
entropies and shuffled entropy means and ranges are shown in Table 2 (ranges indicate the total range of variation), while
the derived bounds are reported in [11].

Table 2
Information-theoretic measures for Experiment Ia. The left half of the table shows the absolute entropies of
various distributions, defined in the text. The right half of the table shows corresponding conditional entropies.
The chance column represents the mean and range for a Monte Carlo simulation of the null hypothesis, as
discussed in the text. Cells marked with ‘--’ have the same value as in the observed sample.

Distribution    Sample entropy    Shuffled orthography    Shuffled phonology
P, O, K         6.31              8.51 (8.47--8.54)       8.44 (8.42--8.46)
P, O            5.59              7.32 (7.30--7.33)       7.32 (7.30--7.33)
O, K            5.07              7.03 (7.01--7.04)       --
P, K            5.17              --                      7.04 (7.03--7.05)
P               3.71              --                      --
O               3.77              --                      --
K               3.41              --                      --

               orthographic influence     perceptual influence      [11]
               sample − accidental        sample − accidental
lower bound    0.74 − 0.27 = 0.57         0.58 − 0.18 = 0.40
upper bound    2.11 − 0.15 = 1.96         1.85 − 0.08 = 1.79

In short, the fact that the lower bounds in [11] are greater than zero suggests that both orthography and perception
contribute to the adaptation of English vowels in Korean loanwords.
It is natural to wonder whether these results are ‘‘statistically significant’’. For example, is the ‘true’ lower bound on
orthographic influence significantly greater than 0? This question is theoretically significant since one salient null
hypothesis is that orthography has no effect on loanword adaptation at all (i.e. that any apparent influence purely arises
from associations between orthography and perceptual categories). Similarly, is the ‘true’ lower bound on perceptual
influence significantly greater than 0? This question is similarly theoretically significant, because it addresses the opposite
null hypothesis, that there is no perceptual component to loanword adaptation. (For example, Korean loanword adaptation
might rely on learned associations between English graphemes and Korean phonemes.) In the present case, by repeating the
shuffling process 1000 times we obtain not only means but good estimates of the variation on the ‘‘accidental information’’
measures. As shown in Table 2, these bounds are fairly tight, on the order of .03 bits for most variables of interest.
Therefore, the sample orthographic information gain was greater than the accidental orthographic information gain on
1000/1000 trials, and similarly the sample perceptual information gain was greater than the accidental perceptual
information gain on 1000/1000 trials. Thus, the true lower bounds on orthographic and perceptual influence on loanword
adaptation are both greater than 0 with p < 0.001.
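The significance argument can be illustrated with a short Monte Carlo p-value sketch. Two things here are assumptions rather than part of the study: the (r + 1)/(n + 1) convention for turning shuffle counts into a p-value, and the list `null_gains`, which contains made-up placeholder values standing in for 1000 shuffled information gains (it is not the simulation output).

```python
def monte_carlo_p_value(observed, null_values):
    """One-sided Monte Carlo p-value for an observed statistic.

    r is the number of shuffled values at least as large as the observed
    value; the (r + 1) / (n + 1) convention is an assumption of this sketch.
    """
    r = sum(1 for value in null_values if value >= observed)
    return (r + 1) / (len(null_values) + 1)

# Hypothetical placeholder values for 1000 shuffled information gains.
null_gains = [0.26 + 0.00002 * i for i in range(1000)]

# An observed gain that exceeds every shuffled value.
p_clear = monte_carlo_p_value(0.74, null_gains)

# An observed gain that falls inside the shuffled range, for contrast.
p_unclear = monte_carlo_p_value(0.27, null_gains)

print(f"p (exceeds all shuffles) = {p_clear:.4f}")
print(f"p (inside shuffled range) = {p_unclear:.2f}")
```

When the observed value exceeds all 1000 shuffled values, the estimate is 1/1001, which is just under the 0.001 threshold quoted in the text.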

2.8. Experiment Ib: the effect of stress

Experiment Ia was conducted to validate the methodology, and to introduce the measures of interest. Now that the
methodology and measures of interest have been introduced and validated, they can be applied to questions of more
theoretical interest, such as the hypothesis that motivated this line of work. Namely, is the orthographic effect relatively
strong for unstressed vowels, and relatively weak for stressed vowels? Experiment Ib was done using the same dataset
as Experiment Ia, but separating the vowels by stress level (as coded in the CELEX database). Then the same method
was applied separately to each subset. The sample and shuffled entropies are shown in Table 3, while the true bounds
calculations are shown in [12].

              orthographic                               perceptual                                 [12]
              lower               upper                  lower               upper
primary       .39 − .25 = .14     2.13 − .19 = 1.94      .65 − .24 = .41     2.39 − .12 = 2.27
secondary     .88 − .59 = .29     2.13 − .47 = 1.66      .56 − .39 = .17     1.81 − .50 = 1.31
unstressed    1.01 − .30 = .71    2.05 − .21 = 1.84      .30 − .15 = .15     1.34 − .10 = 1.24

As in Experiment Ia, the range of shuffled entropies is small; therefore all lower bounds are significantly greater than 0 with
p < .001. (The range of variation within shuffles was about .02 for this dataset; thus any difference of .04 or bigger is highly
significant, including differences between lower bound values.)

Table 3
Information-theoretic measures for Experiment Ib. The major column division corresponds to stress level (primary, secondary, unstressed). Within
each stress level, the cells indicate the observed value (first subcolumn), predicted value when O is scrambled (second subcolumn), and predicted value
when P is scrambled (third subcolumn). The range is shown for values which change under scrambling (e.g. H[P | K] is constant when O is scrambled).
The ‘chance’ value for orthographic information gain is shown underneath the ‘actual’ value within the same stress group, e.g. the highest
orthographic information gain for primary stressed vowels that is expected by chance is 0.25 + .03, whereas the observed orthographic
information gain for primary stressed vowels is 0.39. Phonological gain is analogous.

           Primary                              Secondary                            Unstressed
           obs'd   O′            P′             obs'd   O′            P′             obs'd   O′            P′
P, O, K    5.7     7.93 ± .04    8.20 ± .03     5.53    6.98 ± .11    6.85 ± .11     5.65    7.58 ± .04    7.02 ± .04
P, O       5.01    7.10 ± .02    7.10 ± .02     4.99    6.15 ± .12    6.14 ± .10     4.95    5.17 ± .03    6.17 ± .03
O, K       4.85    6.79 ± .03    --             4.15    5.81 ± .11    --             4.76    6.60 ± .03    --
P, K       4.85    --            7.12 ± .02     5.06    --            6.37 ± .10     4.3     --            5.54 ± .02
P          3.77    --            --             3.64    --            --             2.59    --            --
O          3.51    --            --             3.05    --            --             3.76    --            --
K          3.47    --            --             3.23    --            --             3.05    --            --

2.9. Summary and discussion

The results of Experiments Ia and Ib demonstrate both orthographic and perceptual effects on the adaptation of
English vowels in Korean loanwords. This conclusion was reached by applying a set of information theoretic measures,
developed in Sections 2.5--2.7, to a corpus of loanwords accumulated by the National Academy of the Korean Language
(NAKL, 1991). Observations consisted of individual vowels and their adaptations -- specifically the perceptual identity of
the vowel (vowel category) p, its orthographic representation in English o, and then the Korean vowel or vowel sequence it
was adapted to k. The resulting triples d_i = (p_i, o_i, k_i) were treated as samples from a joint distribution (P, O, K). This
enabled inferences about the role of perception versus orthography in loanword adaptation. For example the mutual
information I[K; O] is an upper bound on the potential influence of orthography on loanword adaptation, though this value
alone does not indicate the exact degree of orthographic influence since orthography is confounded with English vowel
identity. An additional complication arises from sampling errors in the process of inferring a joint distribution from discrete
data; the ‘‘accidental information’’ that is introduced this way was estimated by a Monte Carlo shuffling process, and
subtracted out to yield the true bounds. In Experiment Ia the method was applied to the entire dataset, while Experiment Ib
applied the same method, but separated by stress level of the English vowel.
The pattern of results was relatively straightforward. First, both the orthographic and perceptual lower bounds were
significantly greater than zero (Experiment Ia). This indicates that orthography and perception both condition the
adaptation of English vowels loaned into Korean. Second, the upper and lower bounds differ systematically by the stress
level. The lower bound on orthographic influence is greatest for unstressed vowels, intermediate for secondary stressed
vowels, and least for primary vowels. Exactly the opposite pattern is shown for the upper and lower bounds on perceptual
influence: these values are highest for primary stressed vowels, lower for secondary stressed vowels, and lowest for
unstressed vowels. Although these bounds cannot fully deconfound orthographic from perceptual factors (since they are
in many cases aligned), they are most consistent with the hypothesis that orthographic influence on vowel adaptation is
strongest in unstressed vowels and weakest in stressed vowels, while perceptual influence exhibits the opposite pattern.
In other words, these results suggest that Korean adapters attend relatively more to perceptual cues when adapting
stressed English vowels, and relatively more to orthography when adapting unstressed English vowels.
The use of information-theoretic statistics in this section was motivated by the ‘broad-brush’ character of the
investigation: more familiar analysis methods such as logistic regression are not appropriate when there is a large number
of response categories. The next section focuses more narrowly on the adaptation of a particular contrast, the English
/ɛ/–/æ/ contrast.

3. The adaptation of /ɛ/ and /æ/

The vowels [ɛ] and [æ] are in contrast in English. Moreover, the English orthography-phonology mapping is relatively
consistent for these two vowels: [ɛ] is usually spelled with an <e>, and [æ] is usually spelled with an <a>, cf. bed, bad.
Historically, Korean had a vowel contrast that is phonetically analogous to the English [ɛ]–[æ] contrast; the vowel
graphemes are written ㅔ and ㅐ. As shown in (8), the preferred adaptation is for English [ɛ] to be adapted with the ㅔ vowel
grapheme, while English [æ] is typically adapted with the ㅐ vowel grapheme.
84 R. Daland et al. / Lingua 159 (2015) 70--92

(8) p o k count example


[æ] <a> (/æ/K) 270 accent, amp1
[æ] <a> (/ɛ/K) 25 badminton, amp2
[ɛ] <a> (/æ/K) 1 paragliding
[ɛ] <e> (/æ/K) 7 credit, jetline
[ɛ] <e> (/ɛ/K) 368 academic, bed

For this reason, we transcribe as /ɛ/K, and as /æ/K.


What makes this adaptation pattern fascinating and relevant to the present paper is that in modern Korean, the vowels
ㅔ and ㅐ have completely merged to [ɛ] (Hong, 1988; Choi, 2002; Kim, 2000; Chung, 2002). The merger appears to have
been largely complete by the mid 1950s, meaning that speakers born after the Korean War do not produce the vowels
differently according to standard acoustic measurements. (NB: In a conference presentation in Seoul on 7/05/2013 at
which the third author was present, Tae Jin Yoon claimed that male speakers over 70 still make this distinction in
production, while female and younger speakers do not.) We are unaware of any perceptual experiments that bear on the
merger (besides Experiment II, to be reported below); however, there are multiple sources of anecdotal data which suggest
the merger is perceptually complete in modern-day speakers as well. In the course of teaching phonetics to Korean
undergraduate English majors, the first author has informally found an abject failure to discriminate the English [ɛ]--[æ]
contrast. Moreover, the second and third authors frequently observe spelling errors between ㅔ and ㅐ in Korean university
students, which is expected only if the orthographic contrast is phonologically arbitrary. In short, there can be no doubt that
ㅔ and ㅐ represent a single phonological category in most present-day Korean speakers.
Therefore, we are faced with the following situation: Koreans cannot distinguish English [ɛ] from English [æ] on
perceptual grounds, but English [ɛ] tends to be adapted with the ㅔ vowel grapheme, formerly representing /ɛ/K, while
English [æ] tends to be adapted with the ㅐ grapheme, formerly representing /æ/K. Prima facie this suggests that
orthography is involved in the adaptation system, since it is otherwise inexplicable how Koreans can distinguish a vowel
contrast that they cannot hear. However, the issue is clouded by the uncertain diachrony of individual loanwords vis-à-vis
the phonological merger. If a loanword was adapted by a pre-merger speaker (that is, a speaker for whom the [ɛ]K--[æ]K
contrast was intact), and the orthography is consistent with the vowel category, then the adaptation could be explained by
either perceptual mapping or orthographic mapping.
There are several ways to deconfound these two explanations. One is to carefully investigate the diachrony of
loanwords, for example by identifying loanwords that were adapted prior to the vowel merger from written sources.
Another way is to examine the current loanword corpus for cases in which orthographic and perceptual adaptations would
yield differing outcomes, and assess the relative proportion of each. A final option is to conduct an online adaptation
study along the lines of Vendelin and Peperkamp (2006). The counts in (8) suggest that the second option is unlikely to be
fruitful -- there simply are not enough cases where the English spelling and phonology are inconsistent to make a useful
statistical comparison. Similarly, because the spelling and phonology are so closely aligned for these vowels, a diachronic
investigation is unlikely to yield a conclusive result -- most of the data will be compatible with either hypothesis. Therefore,
this paper turns to an online adaptation study.
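
The sparsity of diagnostic cases can be read directly off the counts in (8). The following Python sketch is illustrative only: the tuple layout and the ASCII labels ("AE", "EH", "ae_K", "eh_K") are hypothetical shorthand introduced here, not part of the original analysis. It tallies how many items have an English spelling that departs from the usual spelling of the vowel, since only those items could separate the orthographic account from the perceptual one.

```python
# Counts copied from (8): (English vowel, English spelling, Korean adaptation, count).
# The ASCII labels are illustrative shorthand, not the paper's notation.
ROWS = [
    ("AE", "a", "ae_K", 270),
    ("AE", "a", "eh_K", 25),
    ("EH", "a", "ae_K", 1),
    ("EH", "e", "ae_K", 7),
    ("EH", "e", "eh_K", 368),
]

# Usual spelling of each vowel, as stated in the text (<a> for [ae], <e> for [eh]).
USUAL_SPELLING = {"AE": "a", "EH": "e"}

total = sum(count for _, _, _, count in ROWS)

# Only items whose spelling departs from the usual one are diagnostic.
diagnostic = sum(
    count for vowel, spelling, _, count in ROWS
    if USUAL_SPELLING[vowel] != spelling
)

print(f"{diagnostic} of {total} items have a spelling that mismatches the vowel")
```

With so few mismatching items, a corpus-internal comparison has essentially no statistical power, which is the motivation for the experimental approach.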

3.1. Experiment II

Experiment II is closely modeled on the online adaptation study of Vendelin and Peperkamp (2006). In this study,
Korean listeners were presented with English CVC nonwords containing /ɛ/ or /æ/, and spelled correspondingly with <e> or
<a> (e.g. meb /mɛb/). In the oral-only condition, listeners only heard the form (as produced by a native speaker of
American English). In the oral + written condition, listeners both heard the word and saw the English orthography.
Listeners were asked to identify which Korean vowel was the best match for the English sound they heard.

3.1.1. Participants
The listeners were 29 native speakers of Korean, recruited from a large public university in the southwest province of
Korea (Jeolla-do). Listeners were in their early to mid 20s. Their English exposure came primarily from the public
education system (where the emphasis is on reading and writing), supplemented in some cases by hagwon (after-school
schools, which may focus on English, math, or other subjects). At the time of testing, listeners were either taking or had
completed Level 2 of English Composition and Conversation, a mandatory freshman-level course for English majors.

3.1.2. Stimuli
The stimuli included the following minimal quadruples (given below in English orthography for reader ease):

(9)  fap, fep, fip, feap
     hab, heb, hib, heab
     mab, meb, mib, meab

The critical stimuli were the first two words in each row, fap vs. fep, hab vs. heb, and mab vs. meb.
These items were produced by a male native speaker of American English, and digitized at 44.1 kHz. Recording was
done with Praat 5.3.53 (Boersma and Weenink, 2013) on a Sony laptop (model VPCEA36FK) running Windows 7. Prior to
participating in the adaptation study, the participants also completed a vowel production study, whose goal was to verify
that the ㅔ/ㅐ merger was complete in these speakers (since the Jeolla dialect had not been previously tested). The
merger was confirmed; those results are not presented here as the goal of the present study is to test for orthographic
effects in online adaptation. The listeners were also asked to do online adaptation of existing/familiar loanword forms. As it
is unclear how pre-established lexical knowledge interacts with loanword perception, those results are also not presented
here.

3.1.3. Procedure
Participants were seated at a testing booth in a quiet room. The experiment was explained to them in Korean, and the
experiment began. The instructions stated (in Korean) that they were to match the (English) word with the best Korean
sound. Participants were first presented with the oral-only condition. Each item was presented once, auditorily. Four
options were offered: ㅐ (/æ/K), ㅔ (/ɛ/K), ㅏ (/a/K), and ㅣ (/i/K). The items were presented in random order. Upon completing
the oral-only phase, participants began the oral + written condition. The procedure was exactly the same, except that the
orthographic form of the nonword was presented along with the auditory form.

3.1.4. Results
Prior to analysis, results from the non-critical stimuli (tense vs. lax high front vowels) were informally inspected. A very
high percentage of the responses to these items were the Korean vowel /i/, suggesting that all participants were attending
to the task. These items were then discarded, as well as any items where the response was not ㅐ (/æ/K) or ㅔ (/ɛ/K). A
summary of these data is shown in Fig. 2 as a barplot of the percentage of ㅔ (/ɛ/K), broken down by the English vowel and
the presentation mode.

[Figure 2: bar plot. x-axis: English vowel (AE, EH); y-axis: Percent EH (0--100); legend: oral, mixed.]

Fig. 2. Korean vowel grapheme identification in Experiment II, as a function of English vowel quality and oral or oral + written presentation.

The results were submitted to a mixed-effects logistic regression, with (log-odds of) choosing ㅔ (/ɛ/K) as the dependent
variable. The fixed effects included the English vowel identity (/ɛ/ or /æ/), the presentation modality (oral-only or oral
+ written), and an interaction term. The random effects included a random intercept for items, a random intercept for
listeners, and random listener slopes for English vowel identity, presentation modality, and an interaction (i.e. a listener
slope for each fixed effect).
In the oral-only condition, there were two relevant results. First, there was no significant difference between the
adaptation of /ɛ/E and /æ/E (b = 0.3895, z = 0.950, p > 0.3). Second, there was an overall bias toward
the ㅐ (/æ/K) adaptation (b = 1.0541, z = 3.509, p < 0.0005). This statistical finding is entirely consistent with Fig. 2,
which shows the /æ/K adaptation was preferred about 70% of the time, regardless of which English vowel occurred in the
source form.
The oral + written condition differed significantly from the oral-only condition. When the English vowel was /æ/ and
spelled with <a>, the likelihood of an ㅐ (/æ/K) adaptation was higher than the comparable figure in the oral-only condition
(b = 2.5948, z = 2.798, p < 0.01). But when the English vowel was /ɛ/ and spelled with <e>, the likelihood of an ㅔ (/ɛ/K)
adaptation was far higher than the comparable figure in the oral-only condition (b = 5.4644, z = 4.362, p < 0.0001).
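
Because the regression coefficients are reported on the log-odds scale, it can help to convert them to more familiar quantities. The Python sketch below rests on two assumptions that the text does not spell out: that the bias coefficient (b = 1.0541) can be read as the log-odds of the preferred response in the oral-only condition, and that the two oral + written coefficients are differences in log-odds relative to the oral-only condition. Under those assumptions, the inverse logit gives a response probability and exponentiation gives an odds ratio. The function and variable names are illustrative.

```python
import math


def inverse_logit(log_odds):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Coefficients as reported in the Results (log-odds scale).
B_BIAS = 1.0541       # overall response bias, oral-only condition
B_WRITTEN_A = 2.5948  # oral + written vs. oral-only, /ae/ spelled <a>
B_WRITTEN_E = 5.4644  # oral + written vs. oral-only, /eh/ spelled <e>

# Assumption: the bias coefficient is the log-odds of the preferred response.
p_bias = inverse_logit(B_BIAS)

# Assumption: these coefficients are log-odds differences, so exp() is an odds ratio.
odds_ratio_a = math.exp(B_WRITTEN_A)
odds_ratio_e = math.exp(B_WRITTEN_E)

print(f"preferred response probability (oral-only): {p_bias:.3f}")
print(f"odds ratio with orthography, <a> items: {odds_ratio_a:.1f}")
print(f"odds ratio with orthography, <e> items: {odds_ratio_e:.1f}")
```

Under the first assumption, the bias corresponds to a probability of roughly 0.74, in line with the figure of about 70% read off Fig. 2.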

3.2. Summary and discussion

Experiment II presented Korean listeners with English minimal pairs containing either /ɛ/ or /æ/, and participants were
asked to adapt the vowel. In the oral-only condition, there was a global bias toward the ㅐ (/æ/K) adaptation, but listeners did
not exhibit any sensitivity to whether the English word contained /ɛ/ or /æ/. This suggests they could not discriminate the
contrast; at the very least it shows they had not learned the association between the English vowels and the
corresponding Korean graphemes. However, a completely different pattern was found in the oral + written condition.
There, listeners tended to associate /ɛ/E with ㅔ (/ɛ/K), and /æ/E with ㅐ (/æ/K), exactly the pattern that is already established
in existing loanwords. As the main difference in these conditions was the presence/absence of orthography, the results
offer very strong evidence that Korean listeners relied on the English orthography in selecting the Korean vowel
adaptation. The remainder of this section addresses the ㅐ (/æ/K) response bias, and the more basic question of how this
loanword pattern came about in the first place.
As shown in Fig. 2 and summarized in the regression results, Korean listeners did not exhibit differential adaptation of
[æ]E vs. [ɛ]E in the oral-only condition, but they did exhibit a substantial bias toward adapting both [æ]E and [ɛ]E to ㅐ (/æ/K),
in preference to ㅔ (/ɛ/K). While this fact does not contradict the central finding of an orthographic effect, we had no a priori
expectation of such a bias. The most natural explanation for this kind of response bias is that it is rooted in lexical statistics.
For example, it could be that ㅐ (/æ/K) is simply more frequent in Korean than ㅔ (/ɛ/K) is; or that ㅐ (/æ/K) is more frequent
than ㅔ (/ɛ/K) specifically in English loanwords. As it turns out, neither of these is true. From the counts in (8), it is evident
that there is a moderate frequency asymmetry, but it goes in the wrong direction: [ɛ]E/ㅔ (/ɛ/K) is somewhat more frequent in
English loanwords than [æ]E/ㅐ (/æ/K). The relative frequency of these vowels in the Korean language can be inferred from
the Sejong Corpus (Kim and Kang, 1996). These vowels have almost the same frequency, whether
assessed by type (/ɛ/K: 649,385; /æ/K: 643,900) or token (/ɛ/K: 2,341,339; /æ/K: 2,368,042). Therefore, the ㅐ (/æ/K) bias
has no obvious basis in type/token frequencies. As this issue is not central to the main claims of the paper, we will not
consider it further; we turn instead to the question of how this adaptation pattern came to be.
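
The near-equality of the corpus frequencies can be made concrete with a short calculation. This Python sketch uses only the four Sejong Corpus counts quoted above; the dictionary layout and the ASCII labels ("eh_K", "ae_K") are illustrative shorthand. It computes the share of /æ/K among the two vowels by type and by token.

```python
# Sejong Corpus counts as quoted in the text; labels are illustrative shorthand.
SEJONG_COUNTS = {
    "type": {"eh_K": 649_385, "ae_K": 643_900},
    "token": {"eh_K": 2_341_339, "ae_K": 2_368_042},
}


def share(counts, key):
    """Proportion of `key` among all categories in `counts`."""
    return counts[key] / sum(counts.values())


ae_share = {measure: share(counts, "ae_K")
            for measure, counts in SEJONG_COUNTS.items()}

for measure, value in ae_share.items():
    print(f"share of /ae/K by {measure}: {value:.3f}")
```

Both shares fall within a percentage point of one half, far from the roughly 70% preference observed in the oral-only condition.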
As was shown earlier in (8), the existing loanwords exhibit a near-categorical pattern of alignment: English words
containing [ɛ] are almost exceptionlessly spelled with <e> rather than <a>, and adapted to ㅔ (/ɛ/K) with high probability;
English words containing [æ] are almost exceptionlessly spelled with <a> rather than <e>, and adapted to ㅐ (/æ/K) with
high probability. Experiment II, an online adaptation task, showed that in the absence of orthography, modern speakers do
not appear to differentially adapt [æ]E vs. [ɛ]E, but when orthography is given, the adaptation pattern more closely mirrors
the pattern found in existing loanwords. Thus, the evidence is strong that perception plays no role in the ㅔ (/ɛ/K) vs.
ㅐ (/æ/K) distinction for modern speakers. But this does not address the question of how the pattern came to be in existing
loanwords. Was it determined by orthography all along? Or, could it be that perception set up the pattern, and it is now
maintained by orthography?
The present data do not and cannot distinguish these possibilities, so here we can only give informed speculation. As
noted earlier, the phonetic merger of ㅔ (/ɛ/K) and ㅐ (/æ/K) is believed to have reached completion by the mid-1950s,
meaning that speakers who were past adolescence at that time should not exhibit the merger. Owing to the US
involvement in the Korean War, it seems likely that a large influx of English loanwords began in the 1950s, chiefly arising
from young (but still pre-merger) speakers who interacted with US military personnel, such as soldiers and administrative
support staff. Although orthographic influences cannot be ruled out at this period, there is at least a plausible account for
primarily perceptual adaptation ([ɛ]E > /ɛ/K; [æ]E > /æ/K) as sketched in Boersma and Hamann (2009, pp. 10--12). As
the speakers of this generation matured and rose to positions of greater societal influence, they may have been the main
promulgators of English loanwords in the academic and popular press. Because these vowels are spelled consistently as
shown in (8), this would have set up the conditions for post-merger speakers to note the association between English and
Korean orthography (<e> > ㅔ; <a> > ㅐ), despite the fact that these two graphemes represent a single vowel [ɛ]K.
When post-merger speakers wish to adapt a novel English word (that is, one with no conventionalized Korean spelling),
there is no phonological issue: both [ɛ]E and [æ]E are mapped to the (now-)single Korean phonological category /ɛ/K.
However, to include this form in writing, an orthographic form must be selected, which involves spelling /ɛ/K with ㅔ or the
homophonous ㅐ. It is at this point that the learned associations <e> > ㅔ, <a> > ㅐ become relevant: the post-merger
adapter can rely upon these associations to determine the orthographic form for new loanwords containing /ɛ/K. Of course, it
cannot be ruled out that pre-merger adapters were also making use of orthography in this manner, as even passing
familiarity with Hangeul romanization conventions would yield the association ㅔ > <e>, ㅐ > <a> as shown in (1). In
short, the present data are only compatible with orthographic adaptation of these vowels in the present day, but they are
compatible with both orthographic and perceptual adaptation prior to the vowel merger (or some combination).

4. General discussion

This paper has presented two case studies on the adaptation of English vowels in Korean loanwords. The first study, in
Experiments Ia and Ib, focused on the role of orthography in the adaptation of stressed versus unstressed vowels. A
series of information-theoretic statistics were developed which yield upper and lower bounds on the contributions of
orthography versus perception in vowel adaptation. Experiment Ia showed that the lower bound on both orthography and
perception was above zero, implying that both orthography and phonetics/phonology are involved in vowel adaptation.
Experiment Ib replicated Experiment Ia, but separated by stress levels; the results showed a systematic
effect of stress, whereby the upper and lower bounds for orthographic influence were higher for unstressed vowels, while
the upper and lower bounds for perceptual influence were higher for stressed vowels. The second case study investigated
the adaptation of English stressed [ɛ]E and [æ]E, which tend to be adapted consistently as ㅔ (/ɛ/K) and ㅐ (/æ/K)
respectively, despite the fact that these vowels have undergone a complete merger. In an online adaptation study, Korean
listeners did not exhibit distinct adaptation behaviors for [ɛ]E and [æ]E when they heard auditory stimuli without
orthography, but when English orthography was supplied, items spelled with <e> were more likely to be adapted as
ㅔ (/ɛ/K) while items spelled with <a> were more likely to be adapted as ㅐ (/æ/K). These case studies provide very strong
evidence of orthographic influence in loanword adaptation. In the remainder of the General Discussion, we take up the broader implications
of the present findings, and discuss some outstanding questions.

4.1. There is an orthographic effect, and we can measure it

Aside from the empirical contributions of Experiments I and II, this paper makes a novel theoretical and methodological
contribution by developing upper and lower bounds for the influence of various factors in loanword adaptation. In the
present case, the methods were used to test for orthographic versus perceptual influences. However, in principle the
same kind of method can be used to test for other types of influence, with limited adjustments.
This kind of information-theoretic statistic is most appropriate for cases in which a response variable can assume one
of numerous possible values, and there is not a 1--1 mapping from inputs to responses (even if there are tendencies). For
example, in the case of Koreans adapting English vowels, there are more than 8 distinct categories (including diphthongal
adaptations, such as /eI/E > [ei]K), so logistic regression is unsuitable. A property of the method is that it does not provide
an exact measure of influence for the relevant conditioning factors; it can only indicate bounds. However, this is not a
defect of the analysis method; rather, it is determined by the property of the data that the conditioning factors are partially
confounded. For example, even though the English spelling system is known for its lack of transparency, it is still true that
phonotactically typical nonword pronunciations and their spellings enjoy a fairly high degree of mutual predictability (e.g.
Carr, 2013). In the present situation, where the English orthography and English pronunciation often make the same
prediction, it is impossible to fully deconfound their relative influence. Therefore, the type of bounds developed here give
the most honest measure of influence that can be inferred from the kind of corpus data used in Experiment I. To isolate
orthographic effects more precisely, it is necessary to conduct controlled, online adaptation experiments as advocated by
Vendelin and Peperkamp (2006), and as was done in Experiment II.
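
To illustrate why confounded predictors yield only bounds, the following Python sketch computes a generic conditional-entropy information gain. It is not the exact statistic of Experiment I, which is defined earlier in the paper and may differ; it assumes the common formulation in which the gain from a factor is the reduction in the entropy of the Korean outcome once that factor is added to the other. For data it reuses the counts in (8), with illustrative ASCII labels. The function name and column indices are hypothetical.

```python
from collections import Counter
from math import log2

# Counts copied from (8): (English vowel, English spelling, Korean adaptation, count).
# Labels are illustrative shorthand, not the paper's notation.
ROWS = [
    ("AE", "a", "ae_K", 270),
    ("AE", "a", "eh_K", 25),
    ("EH", "a", "ae_K", 1),
    ("EH", "e", "ae_K", 7),
    ("EH", "e", "eh_K", 368),
]
P, O, K = 0, 1, 2  # column indices: phoneme, orthography, Korean outcome


def conditional_entropy(rows, given, target):
    """H(target | given) in bits, from weighted rows whose last field is a count."""
    joint = Counter()
    context = Counter()
    for row in rows:
        count = row[-1]
        key = tuple(row[i] for i in given)
        joint[(key, row[target])] += count
        context[key] += count
    total = sum(context.values())
    return -sum((c / total) * log2(c / context[key])
                for (key, _), c in joint.items())


# Gain from orthography beyond the phoneme, and from the phoneme beyond orthography.
orthographic_gain = (conditional_entropy(ROWS, (P,), K)
                     - conditional_entropy(ROWS, (P, O), K))
perceptual_gain = (conditional_entropy(ROWS, (O,), K)
                   - conditional_entropy(ROWS, (O, P), K))

print(f"orthographic gain: {orthographic_gain:.4f} bits")
print(f"perceptual gain:   {perceptual_gain:.4f} bits")
```

On this small subset both gains come out close to zero, because spelling and vowel identity are almost perfectly aligned: once one is known, the other adds very little. This is the confound described above, and it is why corpus data of this kind can bound but not pinpoint each factor's contribution.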

4.2. When orthographic effects matter: the perceptual uncertainty hypothesis

For researchers whose primary interest in loanword adaptation is the insights it can yield on speech perception,
orthographic effects represent a kind of ‘contamination’. The question is what to do about this kind of contamination. One
possibility is to abandon loanword corpora altogether, as advocated by Vendelin and Peperkamp (2006). Another
possibility is to acknowledge that loanword corpus studies do not constitute a perfect methodology, but to understand the
weaknesses of the methodology so as to more appropriately safeguard against them. We advocate for this latter
possibility. The fields of phonology and speech perception have gained a great deal from loanword corpus studies; it is
important to understand when loanwords are likely or unlikely to evince strong orthographic effects, both for the sake of
future research, and for the purposes of re-evaluating past research in light of what we have learned.
To begin, it must be acknowledged that loanword adaptation is not a unitary phenomenon. Much depends on the
literacy, fluency, purpose, and social context of the adapter at the moment of adaptation, which are in turn affected by the
milieu (cf. Haugen, 1950; Kang, 2010, 2013). For example, although Paradis and LaCharité (2008) maintain that fluent
bilinguals are the drivers of loanword adaptation throughout the history of Quebec French, they find that the rate of
‘phonetic approximation’ is higher for Old Quebec French at a time when there were fewer fluent bilinguals, and became
‘more phonological’ later. Clearly, Old Quebec French adapters were behaving differently in the aggregate than modern
Quebec French adapters, even if there are many points of similarity, and the changes in the adaptation system have
presumably arisen from the closer degree and nature of the language contact there.
One major factor which we suspect will influence the level of orthographic influence in a set of loanwords is the nature
and extent of the foreign/source language education system in the borrowing/target culture. In S. Korea, English
education begins in primary school (approximately 11 years of age) and primarily focuses on reading and writing, rather
than listening and speaking. For instance, an acquaintance of the authors related that when she was first learning English,
she was taught to read but not how to pronounce English words; for a word like final she would supply the pronunciation
[pʰinal] (where the perceptually and phonologically most suitable adaptation is [pʰainal]). This pronunciation can be
understood as matching the English string to the closest Hangeul romanization and reverse-transliterating to arrive at a
written Hangeul form (approximating the Korean UR). It seems to us that scientific/technical terms are especially likely to
be adapted according to this kind of reverse transliteration process, since these are words that adapters are least likely to
have heard pronounced aloud by native speakers, and simultaneously most likely to be written down rather than
pronounced aloud by the adapters themselves. It also seems likely to us that the extent of this kind of orthographic
adaptation of scientific/technical jargon will depend on the relative level of spoken English in the borrowing culture, as well
as on the level of interchange between scholars. Although S. Korea is now the third largest contributor of foreign graduate
students studying in the US, the number of Korean graduate students was much smaller in the past (e.g. according to
http://www.iie.org/Research-and-Publications/Open-Doors/Data/International-Students/Leading-Places-of-Origin/, there were
70,627 Korean graduate students in the US in 2013, but barely half as many in 1999, which was still 8 years after the NAKL
loanword database used here was collected). In short, we suspect that scientific/technical loanwords are more likely to
exhibit orthographic effects in adaptation (a lexical factor), and this is especially likely for cultures in which there is
relatively little academic/literary exchange with the source language, or in which foreign language education
emphasizes literacy to the exclusion of speaking and listening (a cultural factor).
Finally, we suspect there are specifically phonological/perceptual factors that affect the degree of orthographic
influence in loanword adaptation. There is widespread agreement that loanword adaptation tends to select a borrowing
form which maximizes the degree of perceptual or phonological match to the source form (while satisfying borrowing
language phonotactics), but it must also be acknowledged that in many cases it is not possible to achieve a perfect match.
For example, Pierrehumbert (2003) writes ‘‘[L]arge and objective data sets on the quantitative properties and exact
patterns of variation of phonological categories in different languages. . . have revealed that superficially analogous
categories have different quantitative properties in different languages’’. It would not be surprising, therefore, if there exist
cases in which multiple adaptations are roughly equally acceptable in terms of perceptual match. For example, stressed
English vowels are known to vary widely in their acoustic realization, and seminal studies in quantitative phonetics
demonstrated that there are regions of F1--F2 space which are compatible with 2 or more vowel category labels (Peterson
and Barney, 1952; Hillenbrand et al., 1995). Let us suppose that the typical acoustic realization of an English vowel
happens to fall into such an ambiguous region of the Korean vowel space. For example, suppose that the medial vowel in
albatrossE is equally perceptually compatible in this context with both of the Korean vowels [a]K and [ʌ]K. Under these
circumstances, the English orthography would seem to be an entirely reasonable source of authority to appeal to. In the
English spelling system, the English vowel [ʌ]E is often spelled with a <u> (cf. bug), whereas the English vowel [a]E is often
spelled with an <a> (cf. bar). Therefore, the spelling alb<a>tross suggests the [ɛlbatʰɨɾos*ɨ]K adaptation is preferable to
the corresponding adaptation with [ʌ]K.
According to this line of reasoning, orthography is most likely to constrain loanword adaptation in cases when it is not
fully determined by perception/phonology alone. We refer to this as the Perceptual Uncertainty Hypothesis. In the
remainder of this section, we discuss the results of the present study in light of the Perceptual Uncertainty Hypothesis, and
address the potential issue of whether English orthography indicates the underlying vowel qualities of schwas.
As shown in Experiment Ib, the upper and lower bounds on orthographic influence are higher for unstressed vowels
than for stressed vowels, while the opposite pattern is true for perceptual influence, suggesting that orthography plays a
stronger role in the adaptation of unstressed vowels than it does for the adaptation of stressed vowels. For the moment, let
us suppose that English unstressed vowels are on average compatible with a greater number of possible Korean
phonological parses than stressed vowels are. At present, we do not have conclusive evidence that this supposition is
correct. However, if it were correct, then the result we found in Experiment Ib is expected according to the Perceptual
Uncertainty Hypothesis. Moreover, this supposition is at least compatible with what is known about English vowels. For
example, Flemming and Johnson (2007) show that the acoustic realization of the English schwa is especially variable,
spanning nearly the entire F1--F2 space delimited by the average values for the stressed vowels (their Fig. 1). It would be
straightforward to assess how variably Koreans perceive English schwa versus other vowels in terms of Korean
vowels, or to assess the influence of orthography more directly, using an online adaptation experiment.
Experiment II represents an extreme case of perceptual uncertainty -- the Korean writing system makes 2 symbols
available which map onto the same set of acoustic outcomes in context. In this case, the English orthography plays a very
powerful role in deciding which Korean grapheme is used. Of course, this example tells us almost nothing about the
phonology/speech perception interface; but that is the point. In a case where the choice is constrained to two labels that
are fully phonologically and perceptually equivalent, English orthography plays an almost deterministic role in which
Korean label is selected. Thus, the results of Experiment II are fully consistent with the Perceptual Uncertainty Hypothesis.

4.3. What lies beneath [ə]

There is an alternative explanation for the putative orthographic effects identified in Experiment I. At least in the case of
Quebec French adaptation, Paradis and LaCharité (1997, p. 395) write, ‘‘vowels reduced to schwa at the phonetic level in
English. . . are in the vast majority of cases correctly interpreted according to the English lexical vowels in Quebec French
loans.’’ That is, Paradis and LaCharité argue that it is not orthography that determines the adaptation of schwas from
English into French, but rather the schwa’s underlying vowel quality. For example, Paradis and LaCharité might argue that
in the adaptation of camera, Korean speakers adapt the medial vowel as [ɛ] and the final one as [a] because these are the
underlying vowels in the English lexical representation. We find this argument untenable.
The most basic reason is that many of these vowels do not alternate. For example, camera does not readily accept
stress-shifting suffixes such as -ity or -ical; as far as we know, the medial and final vowels of this word always appear as
schwas. Therefore, there is no phonological or phonetic evidence that we are aware of that would enable /ɛ/ to be
recovered as the underlying medial vowel but /a/ as the final one. This holds not only for non-native (e.g. Korean)
speakers, but even for native English speakers. From a phonological learning perspective, the only evidence that the
medial and final ‘‘lexical vowels’’ of camera are /ɛ/ and /a/ is that the former is spelled with <e> and the latter is spelled with
<a>. This is hardly evidence for the lack of an orthographic influence.
Indeed, we do not understand the basis for Paradis and LaCharité’s claim that French vowel adaptations ‘‘are in the
vast majority of cases correctly interpreted according to the English lexical vowels’’. One reason for this is that we do not
see how the underlying vowel identity can be definitively identified in cases like camera above, where the schwa does not
alternate. The other reason is that Paradis and LaCharité do not actually give any data or citation to support this claim; they
also do not describe how they identify the underlying vowel in cases like camera. This is especially problematic because
this is one of the key arguments that Paradis and LaCharité make to support their contention that orthography does not
play a significant role in loanword adaptation. In fact, as we have shown throughout this paper, there are numerous cases
where orthography and phonology make the same prediction. Therefore, even for cases in which the underlying vowel
quality of a surface schwa can be determined (e.g. from a stress alternation), the fact that it is adapted in a way that is
consistent with the underlying vowel does not in any way rule out orthographic adaptation. This can only be determined
from cases in which the phonology and the orthography make differing predictions. Paradis and LaCharité (1997) did not
actually present evidence bearing on this point. In short, the idea that adapters are sensitive to the underlying vowels of
English schwas rather than orthography is untenable, because orthography is the only clue to the underlying vowel in
non-alternating schwas like those in camera. If anything, these items provide evidence for orthographic influence, not against it.

4.4. Conclusions

In this article we have considered two case studies on the adaptation of English vowels in Korean loanwords, both
demonstrating orthographic influences. The first, broad-brush study developed statistical tools from information theory. It
was found that English orthography provides significant predictive power for the Korean vowel that is adapted, over and
above what can be explained by knowing the perceptual/phonological identity of the English vowel alone. The general
pattern of upper and lower bounds is most consistent with the interpretation that orthographic influence is greatest for
unstressed vowels, while perceptual/phonological influence is greatest for primary stressed vowels. The second study
focused more narrowly on the adaptation of English [ɛ] vs. [æ], which are typically adapted into separate Korean
graphemes despite the fact that the corresponding Korean vowels have undergone a complete merger. A corpus study
showed that the adaptations predicted by orthography and the pre-merger phonology are almost fully aligned (which is
compatible with either orthographic adaptation, or primarily pre-merger perceptual adaptation). In an online adaptation
study, Korean listeners followed the corpus pattern when English orthography was presented with the auditory stimulus,
but in the absence of orthography their adaptations of [ɛ] vs. [æ] were not distinct. Thus, both case studies demonstrated
orthographic effects on loanword adaptation in Korean, a novel empirical contribution. Besides the methodological
contribution of the information-theoretic metrics, the paper offered a theoretical proposal, the Perceptual Uncertainty
Hypothesis. The hypothesis holds that orthographic influences in loanword adaptation will be strongest when perceptual
factors underconstrain the adaptation. The discussion section identified other lexical and cultural factors that may
influence the relative amount of orthographic influence in particular loanword adaptation systems.
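
The idea of an orthographic information gain can be illustrated with a minimal sketch. It assumes the gain is estimated as the drop in conditional entropy of the Korean vowel once the English spelling is known in addition to the English vowel; the estimator actually used in Experiment I may differ in its details. The function names and the toy observations are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from math import log2

def conditional_entropy(pairs):
    """Estimate H(Y | X) in bits from a list of (x, y) observations."""
    joint = Counter(pairs)
    context = Counter(x for x, _ in pairs)
    total = len(pairs)
    # Sum over observed (x, y) cells: p(x, y) * log2(1 / p(y | x))
    return sum(n / total * log2(context[x] / n) for (x, _), n in joint.items())

def orthographic_gain(triples):
    """Uncertainty about the Korean vowel removed by adding spelling.

    triples: (english_vowel, english_spelling, korean_vowel) observations.
    """
    vowel_only = [(v, k) for v, _, k in triples]
    vowel_and_spelling = [((v, o), k) for v, o, k in triples]
    return conditional_entropy(vowel_only) - conditional_entropy(vowel_and_spelling)

# Hypothetical toy data: an English schwa spelled <a> or <o>, adapted
# according to its spelling (Korean vowels shown as romanized labels).
toy = [("ə", "a", "a")] * 4 + [("ə", "o", "o")] * 4
gain = orthographic_gain(toy)
print(f"orthographic gain: {gain:.2f} bits")
```

In this toy case the English vowel alone leaves one bit of uncertainty about the Korean vowel, and the spelling removes all of it, so the gain is one bit. With real data the gain would fall between zero and the conditional entropy given the vowel alone.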

Acknowledgments

This work was supported by the National Research Foundation of Korea Grant, funded by the Korean Government
(NRF-2014-1215). We also wish to acknowledge Sharon Peperkamp, Emmanuel Dupoux, and an anonymous reviewer
for useful discussion, commentary, and critique.

Appendix

Table 4
Counts of the correspondence between English vowel and English orthography.

Spelling ə I æ ɛ o ɒ e i a͡I ɑː u ɔː ʌ l̩ n̩ ɜ˞ i͡ə a͡o U ɛ͡ə o͡I

a 334 22 494 8 5 19 181 10 5 92 -- 24 -- 13 1 -- 3 1 -- 9 --
i 56 771 7 9 4 5 1 50 578 -- 1 1 2 -- 1 2 3 -- -- -- --
o 195 5 11 6 291 241 2 5 7 3 8 49 28 1 3 3 1 12 2 -- 2
e 113 132 8 380 -- 8 14 60 3 1 3 3 2 3 -- 1 13 -- 2 -- --
u 53 3 -- 1 2 2 2 1 2 -- 76 1 110 1 -- 2 -- -- 27 -- --
er 259 1 -- 1 -- -- -- -- -- -- -- -- -- -- -- 21 1 -- -- -- --
y 6 147 -- -- -- -- 1 20 36 1 -- -- -- -- -- -- -- -- -- -- --
ar 16 -- 1 1 -- 1 -- -- 1 73 -- 3 1 -- -- -- -- -- -- 3 --
or 44 -- -- 1 2 1 -- -- -- -- -- 41 -- -- -- 8 -- -- -- -- --
ea -- -- -- 18 -- -- 3 36 -- 1 -- -- -- -- -- -- 6 -- -- -- --
ou 2 1 -- -- 3 -- -- -- -- -- 14 2 4 -- -- -- -- 37 -- -- --
tion 6 -- -- -- -- -- -- -- -- -- -- -- -- -- 51 -- -- -- -- -- --
al 2 -- 4 -- -- -- -- -- -- -- -- 1 -- 47 -- -- -- -- -- -- --
ee -- 1 1 -- -- -- 1 44 -- -- -- -- -- -- -- -- 2 -- -- 1 --
oo -- -- -- -- 2 1 -- -- -- -- 31 -- -- -- -- -- -- -- 11 -- --
ai 2 2 -- 1 -- -- 30 -- -- -- -- -- -- -- -- -- -- -- -- 1 --
ia 14 -- -- -- -- -- -- -- -- -- -- -- -- 1 -- -- 13 -- -- -- --
sm 25 1 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
ie -- 5 -- 1 -- -- -- 12 4 -- -- -- 1 -- -- -- -- -- -- -- --
ur 8 -- -- -- -- -- -- -- -- -- -- -- 1 -- -- 13 -- -- -- -- --
au 1 -- -- -- 1 1 1 -- -- -- -- 14 -- -- -- -- -- 1 -- -- --
oa -- -- -- -- 18 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
oi -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 18
ui -- 9 -- -- 1 -- -- 2 1 -- 5 -- -- -- -- -- -- -- -- -- --
ow -- -- -- -- 12 1 -- -- -- -- -- -- -- -- -- -- -- 3 -- -- --
iu 5 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 9 -- -- -- --
ir -- 1 -- -- 1 -- -- -- -- -- -- -- -- -- -- 11 -- -- -- -- --
ei 1 -- -- 1 -- -- 5 3 2 -- -- -- -- -- -- -- -- -- -- -- --
sion -- -- -- -- -- -- -- -- -- -- -- -- -- -- 12 -- -- -- -- -- --
ue 1 -- 1 1 -- -- -- -- -- -- 8 -- -- -- -- -- -- -- -- -- --
ay -- 1 -- 1 -- -- 7 -- 1 -- -- -- -- -- -- -- -- -- -- -- --
ey -- 8 -- -- -- -- 1 1 -- -- -- -- -- -- -- -- -- -- -- -- --
eer -- -- -- -- -- -- -- 1 -- -- -- -- -- -- -- -- 8 -- -- -- --
io 7 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2 -- -- -- --
our 1 -- -- -- -- -- -- -- -- -- -- 4 -- -- -- 4 -- -- -- -- --
air -- -- -- -- -- -- -- 1 -- -- -- -- -- -- -- -- -- -- -- 5 --
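
As a rough illustration of how the counts in Table 4 can be read, the sketch below computes the Shannon entropy of the vowel distribution for three spellings. It assumes that "--" stands for a count of zero, copies only the nonzero cells of the rows for a, ee and oa, and uses a hypothetical helper name (entropy_bits); it is not the authors' analysis code.

```python
from math import log2

def entropy_bits(counts):
    """Shannon entropy (bits) of the distribution implied by raw counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

# Nonzero cells copied from three rows of Table 4 ("--" assumed to mean zero).
row_a = [334, 22, 494, 8, 5, 19, 181, 10, 5, 92, 24, 13, 1, 3, 1, 9]
row_ee = [1, 1, 1, 44, 2, 1]
row_oa = [18]

for spelling, row in [("a", row_a), ("ee", row_ee), ("oa", row_oa)]:
    print(f"<{spelling}>: {entropy_bits(row):.2f} bits over {sum(row)} tokens")
```

A spelling such as oa, which corresponds to a single vowel in these counts, has zero entropy, whereas a spelling such as a is spread over many vowels and is therefore a much weaker cue to vowel identity.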

References

Ahn, S.-C., 1998. An Introduction to Korean Phonology. Hanshin Publishing Company.
Baayen, R.H., 2001. Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht.
Boersma, P., Hamann, S., 2009. Loanword adaptation as first-language phonological perception. In: Calabrese, A., Leo Wetzels, W. (Eds.),
Loanword Phonology. John Benjamins, Amsterdam, pp. 11--58.
Boersma, P., Weenink, D., 2013. Praat: Doing Phonetics by Computer [Computer program]. Version 5.3.53. retrieved approx. (1.07.13) from
http://www.praat.org/
Broselow, E., Chen, S.-I., Wang, C., 1998. The emergence of the unmarked in second language phonology. Stud. Second Lang. Acquis. 20,
261--280.
Carr, P., 2013. English Phonetics and Phonology. Blackwell Publishers.
Choi, H.-W., 2002. A Survey of the Standard Pronunciation (in Korean). National Institute of the Korean Language, Seoul.
Chung, M.-S., 2002. Diachronic changes in Korean evidence by acoustic data. In: Korea University Research Institute of Korean Studies (Eds.),
Acoustic Data and Research in Korean. Wul-in, Seoul, pp. 67--159 (in Korean).
Daland, R., 2014. Tutorial on Information Theory I. Keynote address presented to the 5th International Conference of the Korean Phonology and
Morphology Circle, Jul. 3--5. (e-copy available from author’s website).
Davidson, L., 2007. The relationship between the perception of non-native phonotactics and loanword adaptation. Phonology 24, 261--286.
Detey, S., Nespoulous, J.-L., 2008. Can orthography influence second language syllabic segmentation? Japanese epenthetic vowels and French
consonantal clusters. Lingua 118, 66--81.
Dohlus, K., 2005. Phonetics or phonology: asymmetries in loanword adaptations -- French and German mid front rounded vowels in Japanese.
ZAS Pap. Linguist. 42, 117--135.
Dupoux, E., Hirose, Y., Kakehi, K., Pallier, C., Mehler, J., 1999. Epenthetic vowels in Japanese: a perceptual illusion? J. Exp. Psychol.: Hum.
Percept. Perform. 25, 1568--1578.
Flemming, E., Johnson, S., 2007. Rosa’s roses: reduced vowels in American English. J. Int. Phon. Assoc. 37, 83--96.
Goldsmith, J., 1998. On information theory, entropy, and phonology in the 20th century. In: Paper presented to the Royaumont CTIP II Round
Table on Phonology in the 20th Century.
Goldsmith, J., 2002. Probabilistic models of grammar: phonology as information minimization. Phonol. Stud. 5, 21--46.
Haugen, E., 1950. The analysis of linguistic borrowing. Language 26, 210--231.
Hillenbrand, J., Getty, L., Clark, M., Wheeler, K., 1995. Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97 (5), 3099--3111.
Hong, Y., 1988. A Sociolinguistic Study of Seoul Korean (PhD dissertation). University of Pennsylvania.
Hwang, H., Moon, S., 2005. An acoustic comparative study of Korean / , / and English /ε, æ/ pronounced by Korean young male speakers.
Malsori 56, 29--47.
Iverson, G.K., Lee, A., 2006. Perception of contrast in Korean loanword adaptation. Korean Linguist. 13, 49--87.
Jun, E., 2002. An experimental study of the effect of release of English syllable final stops on vowel epenthesis in English loanwords. Stud. Phon.
Phonol. Morphol. 8, 117--134.
Kabak, B., 2003. The Perceptual Processing of Second Language Consonant Clusters (PhD Dissertation). University of Delaware.
Kabak, B., Idsardi, W.J., 2007. Speech perception is not isomorphic to phonology: the case of perceptual epenthesis. Lang. Speech 50, 23--52.
Kang, Y., 2003a. Perceptual similarity in loanword adaptation: English postvocalic word-final stops in Korean. Phonology 20, 219--273.
http://dx.doi.org/10.1017/S0952675703004524
Kang, Y., 2003b. Sound change affecting noun-final coronal obstruents in Korean. Jpn./Korean Linguist. 12, 128--139.
Kang, Y., 2009. English /z/ in 1930s Korean. In: Potter, D., Storoshenko, D.R. (Eds.), The Simon Fraser University Working Papers in Linguistics
2 (Proceedings of the 2nd International Conference on East Asian Linguistics).
Kang, Y., 2010. The emergence of phonological adaptation from phonetic adaptation: English loanwords in Korean. Phonology 27, 225--253.
http://dx.doi.org/10.1017/S0952675710000114
Kang, Y., 2012. The adaptation of English liquids in Contemporary Korean: a diachronic study. Catalan J. Linguist. 11.
Kang, Y., 2013. Loanwords. Oxford Bibliographies: Linguistics.
Kim, Y.-B., 2000. A diachronic examination of the phonology of Contemporary Korean. In: Hong, J.-S. (Ed.), The Formation and Changes of
Contemporary Korean, vol. 1. Pakiceng, Seoul. pp. 61--86 (in Korean).
Kim, H., Kang, B., 1996. Korea-1 Corpus: design and composition. Korean Linguist. (written in Korean).
LaCharité, D., Paradis, C., 2002. Addressing and disconfirming some predictions of phonetic approximation for loanword adaptation. Langues et
Linguistique 28, 71--91.
LaCharité, D., Paradis, C., 2005. Category preservation and proximity versus phonetic approximation in loanword adaptation. Linguist. Inq. 36 (2),
223--258.
Lee, A., 2009. Korean Loanword Phonology: Perceptual Assimilation and Extraphonological Factors (Unpublished doctoral dissertation).
University of Wisconsin, Milwaukee.
Kwulipkwukeyenkwuwen [The National Academy for the Korean Language], 1991. Oylaye sayong siltay cosa: 1990 nyendo [Survey of the State
of Loanword Usage: 1990]. NAKL, Seoul.
Oh, M., 2004. Phonetic and spelling information in loan adaptation. In: Proceedings of the 2004 Linguistic Society of Korea International
Conference 1 (Forum lectures and workshops), Hanshin Publishing Co, Seoul, pp. 195--204.
Oh, M., 2012. Adaptation of English complex words into Korean. J. East Asian Linguist. 21, 267--304.
Paradis, C., LaCharité, D., 1997. Preservation and minimality in loanword adaptation. J. Linguist. 33, 379--430.
Paradis, C., LaCharité, D., 2002. Addressing and disconfirming some predictions of phonetic approximation for loanword adaptation. Lang.
Linguist. 28, 71--91.
Paradis, C., LaCharité, D., 2008. Apparent phonetic approximation: English loanwords in Old Quebec French. J. Linguist. 44 (1), 87--128.
http://dx.doi.org/10.1017/S0022226707004963
Peperkamp, S., Vendelin, I., Nakamura, K., 2008. On the perceptual origin of loanword adaptations: experimental evidence from Japanese.
Phonology 25, 129--164.
Peterson, G.E., Barney, H.L., 1952. Control methods used in a study of the vowels. J. Acoust. Soc. Am. 24, 175--184.
Pierrehumbert, J., 2003. Phonetic diversity, statistical learning, and acquisition of phonology. Lang. Speech 46 (2--3), 115--154.
Shannon, C.E., 1948. A mathematical theory of communication. Bell Sys. Tech. J. 27, 379--423, 623--656.
Shin, J.-Y., Kiaer, J.-E., Cha, J.-E., 2012. The Sounds of Korean. Cambridge University Press (ISBN 9781107672680).
Shinohara, S., 2004. Emergence of universal grammar in foreign word adaptation. In: Kager, R., Pater, J., Zonneveld, W. (Eds.), Fixing Priorities:
Constraints in Phonological Acquisition. Cambridge University Press, pp. 292--320.
Silverman, D., 1992. Multiple scansions in loanword phonology: evidence from Cantonese. Phonology 9, 298--328.
Smith, J., 2006. Loan phonology is not all perception: evidence from Japanese loan doublets. In: Vance, T.J., Jones, K.A. (Eds.), Japanese/
Korean Linguistics, 14. CSLI, Stanford. pp. 63--74 (Pre-publication version available as Rutgers Optimality Archive #729).
Sohn, H.-M., 1999. The Korean Language. Cambridge University Press, Cambridge.
Surendran, D., Niyogi, P., 2003. Quantifying the functional load of phonemic oppositions, distinctive features, and suprasegmentals. In: Ole, N.T.
(Ed.), Current Trends in the Theory of Linguistic Change. In Commemoration of Eugenio Coseriu (1921--2002). Benjamins, Amsterdam &
Philadelphia.
Vendelin, I., Peperkamp, S., 2006. The influence of orthography on loanword adaptations. Lingua 116, 996--1007.
Vinh, N.X., Epps, J., Bailey, J., 2009. Information theoretic measures for clusterings comparison. In: Meila, M. (Ed.), Proceedings of the 26th
Annual International Conference on Machine Learning -- ICML ’09. ISBN 9781605585161, pp. 2837--2854.
http://dx.doi.org/10.1145/1553374.1553511
Yip, M., 1993. Cantonese loanword phonology and optimality theory. J. East Asian Linguist. 2, 261--291.
