4 Basic English Pronunciation Rules
by User Not Found | Sep 21, 2011
Here we show you several basic English pronunciation rules to help you during your classes at your ESL school and in your own practice time. Make your practice dynamic and effective by looking for new words these rules apply to:

1. Pronunciation of the “Y”
“Y” is pronounced as „ai‟ or „i:‟.
- In one-syllable words, “Y” is pronounced as „ai‟. For example: my, by, fly, shy, sky, dry, cry, fry, and try.
- In two-syllable words, “Y” is pronounced as „i:‟. For example: happy, funny, baby, bony, puppy, party, tiny, city, candy, berry, penny, and turkey.

2. Pronunciation of the “C”
“C” is pronounced as „s‟ or „k‟. For example: city, cider, circle, and country.
- When an “E”, “I”, or “Y” follows the “C”, it is pronounced as „s‟. Good examples are: cellar, center, cent, ice, cycle, cell, cypress, and cyclone.
- When an “O”, “U”, or “A” follows the “C”, it is pronounced as „k‟. Some examples are: cold, country, computer, couple, cup, curb, cut, cap, can, and cat.
Read these sentences aloud and compare both sounds:
- The city is cloudy.
- The center is covered.
- We cycle in the city but drive the car in the country.

3. Pronunciation of the “G”
- When an “E”, “I”, or “Y” follows the “G”, it is pronounced as „ʤ‟. Examples are: gym, giant, gem, gorgeous, and George.
- When a “U” or “A” follows the “G”, the “G” stays hard and the vowel is pronounced as „ʌ‟ or „æ‟. For example: gun, gum, gas, garden, and gap.

4. Pronunciation of vowel letters
- The long “A” and the short “A”, for example: cape and gap.
  -at: bat, cat, hat, fat, sat, rat
  -ad: bad, had, mad, sad
  -ag: tag, wag, rag, bag
  -an: fan, pan, can, ran
  -am: jam, ham, ram, yam
  -ap: map, tap, nap
When the word ends in “E”, the “A” is pronounced as a long vowel. Examples of this are: rake, gate, face, base, cage, wave, and take.
When the word ends in “R”, the “A” sound is as in: tar, jar, car, and far.
- The long “I” and the short “I”.
  -id: bid, kid, lid, did
  -ig: big, rig, wig, pig, dig
  -in: pin, fin, tin, win, bin
  -ip: tip, lip, hip, rip, dip
  -it: kit, hit, fit, sit, pit
When the word ends in “E”, the “I” is pronounced as a long vowel. For example: kite, bike, dime, ride, and vine.
- The long “O” and the short “O”.
  -og: fog, hog, dog, jog, log
  -op: mop, pop, hop, top
  -ot: hot, pot, got, not
  -ob: mob, cob, job, sob
When the word ends in “E”, the “O” is pronounced as a long vowel. For example: rose, pole, and hope.
- The long “U” and the short “U”.
  -up: pup, cup, up
  -ut: rut, hut, cut, nut
  -ub: cub, tub
  -us: bus, pus
  -un: fun, sun, run, bun, gun
  -ug: mug, bug, tug, hug
When the word ends in “E”, the “U” is pronounced as a long vowel. Examples: tune, cube, and cute.
The vowels are "a, e, i, o, and u"; also sometimes "y" and "w". This also includes the diphthongs "oi, oy, ou, ow, au, aw, oo" and many others. The consonants are all the other letters, which stop or limit the flow of air from the throat in speech. They are: "b, c, d, f, g, h, j, k, l, m, n, p, qu, r, s, t, v, w, x, y, z, ch, sh, th, ph, wh, ng, and gh".
1. Sometimes the rules don't work. There are many exceptions in English because of the vastness of the language and the many languages from which it has borrowed. The rules do work, however, in the majority of words.
2. Every syllable in every word must have a vowel. English is a "vocal" language; every word must have a vowel.
3. "C" followed by "e, i or y" usually has the soft sound of "s". Examples: "cyst", "central", and "city".
4. "G" followed by "e, i or y" usually has the soft sound of "j". Examples: "gem", "gym", and "gist".
5. When 2 consonants are joined together and form one new sound, they are a consonant digraph. They count as one sound and one letter and are never separated. Examples: "ch, sh, th, ph and wh".
6. When a syllable ends in a consonant and has only one vowel, that vowel is short. Examples: "fat, bed, fish, spot, luck".
7. When a syllable ends in a silent "e", the silent "e" is a signal that the vowel in front of it is long. Examples: "make, gene, kite, rope, and use".
8. When a syllable has 2 vowels together, the first vowel is usually long and the second is silent. Examples: "pain, eat, boat, res/cue, say, grow". NOTE: Diphthongs don't follow this rule. In a diphthong, the vowels blend together to create a single new sound. The diphthongs are: "oi, oy, ou, ow, au, aw, oo" and many others.
9. When a syllable ends in any vowel and is the only vowel, that vowel is usually long. Examples: "pa/per, me, I, o/pen, u/nit, and my".
10. When a vowel is followed by an "r" in the same syllable, that vowel is "r-controlled". It is neither long nor short. R-controlled "er, ir, and ur" often sound the same (like "er"). Examples: "term, sir, fir, fur, far, for, su/gar, or/der".
Basic Syllable Rules
1. To find the number of syllables:
---count the vowels in the word,
---subtract any silent vowels (like the silent "e" at the end of a word, or the second vowel when two vowels are together in a syllable),
---subtract one vowel from every diphthong (diphthongs only count as one vowel sound),
---the number of vowel sounds left is the same as the number of syllables.
The number of syllables that you hear when you pronounce a word is the same as the number of vowel sounds heard. For example: The word "came" has 2 vowels, but the "e" is silent, leaving one vowel sound and one syllable. The word "outside" has 4 vowels, but the "e" is silent and the "ou" is a diphthong which counts as only one sound, so this word has only two vowel sounds and therefore two syllables.
2. Divide between two middle consonants. Split up words that have two middle consonants. For example: hap/pen, bas/ket, let/ter, sup/per, din/ner, and Den/nis. The only exceptions are the consonant digraphs. Never split up consonant digraphs, as they really represent only one sound. The exceptions are "th", "sh", "ph", "ch", and "wh".
3. Usually divide before a single middle consonant. When there is only one consonant, you usually divide in front of it, as in: "o/pen", "i/tem", "e/vil", and "re/port". The only exceptions are those times when the first syllable has an obvious short sound, as in "cab/in".
4. Divide before the consonant before an "-le" syllable. When you have a word that has the old-style spelling in which the "-le" sounds like "-el", divide before the consonant before the "-le". For example: "a/ble", "fum/ble", "rub/ble", "mum/ble" and "this/tle". The only exceptions to this are "ckle" words like "tick/le".
5. Divide off any compound words, prefixes, suffixes and roots which have vowel sounds. Split off the parts of compound words like "sports/car" and "house/boat". Divide off prefixes, as in "un/happy", "pre/paid", or "re/write". Also divide off suffixes, as in the words "farm/er", "teach/er", "hope/less" and "care/ful". In the word "stop/ping", the suffix is actually "-ping" because this word follows the rule that when you add "-ing" to a word with one syllable, you double the last consonant and add the "-ing".
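Syllable rule 1 (counting vowels) can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function name `count_syllables` is made up here, only the letters a, e, i, o, u are counted as vowels, any two adjacent vowel letters are treated as one sound, and a final "e" after a consonant is treated as silent unless the word ends in consonant + "le". It will miscount many real words.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate following the vowel-counting rule."""
    w = word.lower()
    # Step 1: count the vowel letters.
    count = len(re.findall(r"[aeiou]", w))
    # Step 2: subtract a silent final "e" (consonant + e, as in "came"),
    # but keep the "e" of an "-le" syllable (as in "a/ble").
    silent_e = re.search(r"[^aeiou]e$", w) and not re.search(r"[^aeiou]le$", w)
    if count > 1 and silent_e:
        count -= 1
    # Step 3: two vowels together (a digraph or a diphthong) give one sound.
    count -= len(re.findall(r"[aeiou]{2}", w))
    return max(count, 1)

for word in ["came", "outside", "happen", "boat", "able"]:
    print(word, count_syllables(word))
```

The two worked examples from the text come out as expected: "came" gives one syllable and "outside" gives two.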
Everybody agrees that English spelling is horrible. There have been almost as many proposals for spelling reform as there are rewrites of Esperanto. (Tellingly, there has been precisely one success in each category-- Noah Webster and Ido-- and neither caught on universally.) Most of these proposals spend their energy fixing what isn't broken. For instance, they search hard for clever new ways of spelling the ch sound-- even though ch does the job just fine in hundreds of languages. Or, they insist on 'correcting' the Great Vowel Shift, using Italian values for the vowels.

Whenever the subject comes up, someone is sure to bring up all the words in -ough, or George Bernard Shaw's ghoti-- a word which illustrates only Shaw's wiseacre ignorance. English spelling may be a nightmare, but it does have rules, and by those rules, ghoti can only be pronounced like goatee.

The purpose of this page is to describe those rules-- to explain the system behind English spelling, the rules that tell you how to pronounce a written word correctly over 85% of the time. Many people expect the opposite as well-- to predict the spelling from the pronunciation-- not realizing that few orthographies meet this goal. It's far from true of Spanish, for instance, which is often held up as an example of a good orthography. I stopped fervently admiring Spanish orthography when I saw a sign in a Mexican bakery with about one spelling mistake every third word.

Several different types of people might be interested in this page:
- foreign learners of English
- native speakers who never quite mastered English spelling
- spelling reformers who care to understand the system they want to replace
- linguists interested in how an inadequate alphabet is manhandled to fit an unruly language.
I've also included a sample lexicon and a set of spelling rules which you can use with my Sound Change Applier to automatically derive the pronunciation.
Thanks to Éamonn McManus, Aaron J. Dinkin, Dennis Paul Himes, Geoff Eddy, Hirofumi Nagamura, and John Cowan for useful comments and ideas, which I've tried to incorporate here.
The sounds of General American
If we're discussing spelling, we have to discuss sounds as well; and this means choosing a reference dialect. I'll use my own, of course-- a version of General American that's unexcitingly close to the standard. I'll call it GA below. Here's the vowels and consonants of my dialect. For each I give the IPA, the representation in the eccentric phonemic transcription I use in this document, and a couple of sample words. The IPA is given in Unicode; if it doesn't look right you have a nasty old non-Unicode-compliant browser.
Vowels
IPA   Phoneme   Samples
e     ä         rate
æ     â         rat
i     ë         meet, machine
ɛ     ê         met, dread
aj    ï         bite, cycle
ɪ     î         bit, lick
o     ö         note, sow
a     ô         not, clock
ju    ü         cute, you
ʌ     û         cut, come
u     u         coot
ɔ     ò         caught, dog
ʊ     ù         cook, put
ə     @         above, cynic, until
aw    ôw        crowd, loud
oj    öy        boy, droid
ɚ     @r        search, manor, bird
n̩     @n        button, happen
l̩     @l        battle, final

Consonants
IPA   Phoneme   Samples
p     p         paper
b     b         book
t     t         take
d     d         dead
g     g         get
k     k         cape, talk, quite
m     m         moon
n     n         new
ŋ     ñ         sing, think
f     f         four, physics
v     v         vine
θ     +         thin
ð     +         this
s     s         so
z     z         zoo
ʃ     $         shack
ʒ     $         measure
tʃ    ç         chew
dʒ    j         judge
r     r         ran
l     l         late
h     h         hang
j     y         you, million
w     w         wait, cow
Who cares about dialects?
Ideally you shouldn't have to worry about my dialect at all: you could simply take (say) ê to represent whatever you pronounce as the vowel in met. Unfortunately, English dialects are not uniform enough to share a single phonology. There are many words that are not only pronounced differently in different dialects-- that is, they have a distinct phonetic realization-- but also have their own phonemic representation. Some examples:
- GA is rhotic-- we pronounce the post-vocalic r's-- while other important dialects are not, notably the British standard, RP.
- I distinguish cot and caught, Don and Dawn; these vowels (ô, ò) merge in the US West.
- On the other hand, I merge the vowel sounds in Mary, merry, and marry, which are distinguished in Eastern US dialects and in RP.
- I pronounce w and wh the same.
Spellings are in teal italics; pronunciations are in blue Courier. This convention avoids cluttering the text with brackets and quotation marks. Thus g refers to the letter <g>, while g refers to the sound /g/, and I will write that laugh is pronounced lâf. Linguists can take the 'pronunciations' as phonemic; e.g. I haven't attempted to indicate aspiration, the flapping of medial t and d, the appearance of clear and dark l, etc. I indicate some but not all vowel reductions (basically, those that are reduced in all forms of the morpheme). # represents the beginning or end of a word. For instance, #rh represents an rh that begins a word; g# refers to a final g. Capital letters represent variables; e.g. V represents any vowel.
The computer simulation
Along with this explanatory page, I've put up
- a sample lexicon of over 5000 English words
- a sound change file giving the spelling rules
- sample output from the Sound Change Applier
The lexicon includes the target pronunciation in GA; I modified the program to compare the results of the rule application with the target. The results:
- 3079 (or 59%) of the pronunciations are generated perfectly.
- 4389 (or 85%) are generated perfectly or with only minor errors: vowel length errors, failure to reduce vowels to @, or failure to voice an s.
This is impressive; but it understates the systematicity of English spelling:
- Many of the errors are off in only one segment. (E.g. the rules predict everything about bachelor except the loss of the middle vowel. Shouldn't they get some credit for getting six segments correct?)
- Many of the pronunciations are really predictable using rules beyond the scope of the Sound Change Applier.
- I haven't by any means found every possible rule, or stated them in the best, most general form.
- The worst offenders in the language are already included in the sample; a larger vocabulary would include a higher percentage of well-behaved spellings.
There is a fuller discussion of the mispredictions at the end of the document. The odd phonetic transcription, by the way, derives from the dual need to easily represent sounds both in html and in the sound change file. I'm restricted to characters that html supports; and I can't use capital letters, because I need them for variable definitions in the rules. As a mnemonic, think of the umlauts as colons, so that ö is short for o:, 'long o'. The wacky spellings I used for the vowels, however, are inherent in the logic of English spelling. It would only obscure how the system works if I represented the long and short vowels with IPA forms.
The bulk of this page is basically a human-readable restatement of the rules in the sound change file. The order of the rules is important. The rules can be thought of as a recipe: to pronounce a word, you go down the list of rules, seeing if each one in turn applies, and applying it if it does. The result is sometimes a little backwards in terms of explaining the system, because exceptions come first, before the general rules. That's the best way to teach the computer; but humans tend to do best by learning the most general rule first.

I'll warn you: some of these rules are going to seem mondo obscure. That's because I've tried to find every regularity I could, even if it only explains half a dozen words. The yield of some rules may be small enough that some people would rather just learn the affected words as irregularities. But if anything I'm more interested in the minor regularities; they're puzzles, often unfamiliar ones, and many are the fossils of minor sound changes.

To head off another likely reaction: yes, you can find exceptions to the rules. I'm perfectly aware that ough is not always pronounced ö. The point is, what follows are the default rules that work 85% of the time. Think of ö as the default pronunciation of ough; any other pronunciation of ough is an irregularity.

And finally: I'm aware that some linguists (e.g. Edward Carney) have also worked on these problems; unfortunately, I've only seen their work in summaries. I've tried to be careful and linguistically informed, but I don't claim to have committed a work of scholarship.
English has more phonemes than the alphabet has available symbols; the usual expedient of the orthography for solving this problem is to use digraphs. (Both the problem and the solution are inherited from Latin, which had hardly finished tossing out the Greek letters it didn't think it needed when it started to borrow Greek words that needed them.)
1. Make the following unconditional replacements:
ch   sh   ph   th   qu   wr   wh   xh   rh
ç    $    f    +    kw   r    w    x    r
Before an o, replace wh with h instead: who, whore, whole. If you're one of those fossils who still use a voiceless w or another strange contortion to distinguish wh and w, you'd modify this rule. We can do significantly better than the program if we don't do these substitutions when the digraph spans a morpheme boundary. In other words, we shouldn't do the replacement in compound words like bosshood, flathead, uphill, or perhaps.
We can also do better if we replace ch with k in words of Greek and Hebrew origin-- that is, in two-dollar words like archaism or trochaic or Malachi. The program actually replaces only initial rh, since medial rh is so likely to be found in a compound (and it doesn't occur finally in the sample lexicon). (xh isn't really a digraph; the rule just reflects the fact that an initial h isn't pronounced after a prefix ending in x, as in exhibit.)
2. Replace x with ks; but after e and before another vowel, use gz instead.
(This is not an allophonic rule: compare the near-minimal pair exist and excite.)
3. Ignore apostrophes (can't, cop's, o'clock). Hyphens can however be
treated as word separators (mother-in-law is pronounced like mother in law).
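Rules 1 to 3 can be sketched in Python as below. This is a minimal illustration, not the sound change file itself: the function name `apply_rules_1_to_3` is hypothetical, rh is replaced everywhere rather than only initially, and no attempt is made to detect morpheme boundaries or Greek and Hebrew origins, so the weaknesses described above show up in the output.

```python
import re

# Rule 1: unconditional digraph replacements, in the transcription used here.
DIGRAPHS = {"ch": "ç", "sh": "$", "ph": "f", "th": "+", "qu": "kw",
            "wr": "r", "wh": "w", "xh": "x", "rh": "r"}

def apply_rules_1_to_3(word: str) -> str:
    w = word.lower()
    # Rule 1, special case: wh becomes h before an o (who, whole).
    w = re.sub(r"wh(?=o)", "h", w)
    # Rule 1: replace every digraph in a single left-to-right pass.
    w = re.sub("|".join(DIGRAPHS), lambda m: DIGRAPHS[m.group()], w)
    # Rule 2: x is gz after e and before a vowel, ks otherwise.
    w = re.sub(r"(?<=e)x(?=[aeiou])", "gz", w)
    w = w.replace("x", "ks")
    # Rule 3: ignore apostrophes.
    return w.replace("'", "")

for word in ["chew", "who", "exist", "excite", "exhibit", "can't", "uphill"]:
    print(word, "->", apply_rules_1_to_3(word))
```

Only these three rules are applied, so the remaining letters are left as spelled. The last word shows the morpheme-boundary problem: uphill comes out as ufill because the sketch cannot tell that the p and h belong to different parts of a compound.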
The notorious gh
4. Before a vowel, gh becomes g: ghost = göst.
5. gh turns a preceding single vowel long: right = rït.
6. aught and ought become òt: daughter = dòt@r, sought = sòt.
7. Any other ough becomes ö: dough = dö.
8. Elsewhere, gh is simply dropped: freight = frät.
People usually trot out gh when they bitch about English spelling. The culprit is sound change: gh used to do nicely for the x sound (now usually represented kh when we transcribe foreign words), but the sound disappeared in everything but Scots. It usually went quietly, but sometimes, word-finally (laugh, cough, enough, rough, tough, and not much more) it was transformed to f instead. ough is also notorious, but the usual sound (as seen in rule 7) is ö. Through is a notable exception. Initial gh is sometimes used to keep the g from softening (ghetto); but generally it's a meaningless variant on g, said to be introduced by Dutch typesetters in the early days of printing. In any case it's no problem, since it's always g. This is one reason Shaw's ghoti is such a fraud: initial gh can never be pronounced f.
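The gh rules (4 to 8) make a compact example of why rule order matters: the specific cases have to run before the catch-all that drops gh. The sketch below is an assumption-laden illustration in Python (the name `apply_gh_rules` and the regular expressions are made up here, not taken from the sound change file). Because only these five rules are applied, the other letters are left as spelled, so ghost comes out as gost rather than the final göst.

```python
import re

# Long-vowel symbols of the transcription, used by rule 5.
LONG = {"a": "ä", "e": "ë", "i": "ï", "o": "ö", "u": "ü"}

def apply_gh_rules(word: str) -> str:
    w = word.lower()
    # Rule 4: before a vowel, gh becomes g.
    w = re.sub(r"gh(?=[aeiou])", "g", w)
    # Rule 5: gh turns a preceding single vowel long (and disappears).
    w = re.sub(r"(?<![aeiou])([aeiou])gh", lambda m: LONG[m.group(1)], w)
    # Rule 6: aught and ought become òt.
    w = re.sub(r"(au|ou)ght", "òt", w)
    # Rule 7: any other ough becomes ö.
    w = w.replace("ough", "ö")
    # Rule 8: elsewhere, gh is simply dropped.
    return w.replace("gh", "")

for word in ["ghost", "right", "daughter", "sought", "dough", "freight"]:
    print(word, "->", apply_gh_rules(word))
```

If rule 8 were moved to the top, dough would lose its gh before rule 7 could see the ough, which is the sense in which the exceptions must come first.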
9. In initial gn, kn, mn, pt, ps, tm, pronounce the second letter
only: gnostic = nôstîk, psycho = sïkö, knight = nït. Most of these are Greek borrowings-- Greek is much freer with initial clusters than English is-- but kn derives from Old English.
10. Replace y with ï if it ends a one-syllable word: ply = plï.
11. ey is pronounced ë; ay is ä; and oy is öy: say, monkey boy = sä mûnkë böy.
12. Replace y with i if it's not adjacent to a vowel-- we'll worry later about
how to pronounce the i. Thus, system = sîst@m but you, where the y adjoins a vowel, is yu.
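A hedged Python sketch of rules 10 to 12 follows. The function name `apply_y_rules` is hypothetical, and one interpretation is an assumption made here rather than something the text states: "ends a one-syllable word" is approximated as "the word has no other vowel letter", so that say and boy fall through to rule 11 instead of being caught by rule 10.

```python
import re

# Rule 11: vowel + y digraphs and their values in the transcription.
VOWEL_Y = {"ey": "ë", "ay": "ä", "oy": "öy"}

def apply_y_rules(word: str) -> str:
    w = word.lower()
    # Rule 10 (approximation): final y in a word with no other vowel letter.
    if re.fullmatch(r"[^aeiouy]+y", w):
        return w[:-1] + "ï"
    # Rule 11: ey, ay, oy.
    w = re.sub(r"[eao]y", lambda m: VOWEL_Y[m.group()], w)
    # Rule 12: y not adjacent to a vowel becomes i (its value is decided later).
    return re.sub(r"(?<![aeiouäëö])y(?![aeiou])", "i", w)

for word in ["ply", "say", "monkey", "boy", "system", "you"]:
    print(word, "->", apply_y_rules(word))
```

As with the other sketches, only these rules are applied, so monkey comes out as monkë and system as sistem; the remaining vowels are handled by later rules.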
Simplification of stl
13. The t in stl is lost before a final vowel: bustle = bûs@l, bristly = brîslë.
This could perhaps be generalized; but in slow speech I leave the t in (say) coastline or Christlike. I'm also tempted to generalize to all stops, but the only instance in the sample lexicon is muscle, and it's pretty silly to have a rule that applies to a single word.
(Af)frication before i
14. ci or ti becomes $ before a vowel: gracious = grä$@s, nation = nä$@n.
15. tu becomes çu before a vowel, or before a liquid (r, l) followed by a vowel: mutual = müçu@l, mature = m@çur.
16. s becomes $ (or its voiced counterpart, IPA ʒ, if it's preceded by a vowel):
- before io-- passion = pâ$@n, vision = vî$@n. Note that the i is lost.
- before ur-- assure = @$ur; leisure = lë$@r.
- after k and before a vowel: sexual = sêk$u@l.
At some point English affricated a number of consonants before an i or y that preceded another vowel, including the [y] sound that begins ü. Sometimes the y has been lost since. This process seems to be no longer productive-- compare costume, Casio. (Or is it? In quick speech I do say kôsçùm.)
Rule 14 shows another reason ghoti is a fraud: ti only fricativizes when it's followed by a vowel.
Voicing of s
17. s is voiced between two vowels (amuse, design, prison), except
after a (base, parasite). It's easy to find exceptions to this rule: disagree, opposite, analysis-- there's even words where the rule applies only for verbs (abuse, house). The rule as stated has more successes than failures, and I haven't been able to find merely lexical rules that do much better. A better rule might take the language of origin into account: the voicing tends to occur in French and Latin words (resent, please, reason, miserable), but not if they're from Greek (analysis, isosceles) or more exotic languages (papoose, Osaka). The voicing of s is so almost predictable that there are orthographic conventions (borrowed from French) to indicate that we really do want an s: double the s (cf. Moses vs. mosses), or use c instead (race vs. rase). Annoyingly, there are a few cases of unexpectedly voiced ss (dessert, dissolve). As a corollary of this rule, the American use of -ize for British -ise was unnecessary, although of course it is more foolproof.
You know me, al
18. al is pronounced òl before r, s, m, a dental stop, or final ll: also, already,
wall, bald, although, almost.
19. alk becomes òk, except initially: walk = wòk.
I suspect this is a sound change, obscured by later borrowings like alcohol.
Softening of velars
20. c becomes s before a front vowel, k elsewhere: cell = sêl, acid = âsîd,
but cow = kôw, backer = bâk@r, clear = klër.
21. Similarly, g becomes j before a front
vowel, g elsewhere: gel = jêl, turgid = t@rjîd, but got = gôt, twig = twîg, gleam = glëm.
22. If the g doesn't begin the word, and the triggering e precedes o or a,
the e is lost: changeable = çänj@b@l; dungeon = dûnj@n (but geology = jëôl@jë).
23. Initial gu or final gue is pronounced g: guest = gêst, plague = pläg.
(Medially, it tends to be gw instead: language, anguish.) Front vowels are i and e; note that y was changed to i by rule 12. We owe these rules to a sound change, and not even our own-- it derives from the history of French. The last two rules allow g to be used for two sounds:
ga ge gi go gu   can be written   ga gue gui go gu
ja je ji jo ju   can be written   gea ge gi geo geu.
The inserted e or u are orthographic only; they make sure rule 21 applies or doesn't apply, as desired. In French, there's a parallel with c:
ka ke ki ko ku   can be written   ca que qui co cu
sa se si so su   can be written   cea ce ci ceo ceu (but it's more usual to write ça ce ci ço çu)
but it doesn't work so well in English, since our qu is still kw. The inserted e is found in just a few words (e.g. placeable), due to compounding.
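The two basic softening rules (20 and 21) are simple enough to sketch directly. The Python below is an illustration under assumptions: the name `soften_velars` is made up, rules 22 and 23 are left out, and the input is assumed to have already been through the digraph rule (so ch is gone) and rule 12 (so a vocalic y is already i).

```python
import re

def soften_velars(word: str) -> str:
    w = word.lower()
    # Rule 20: c becomes s before a front vowel (e, i), k elsewhere.
    w = re.sub(r"c(?=[ei])", "s", w)
    w = w.replace("c", "k")
    # Rule 21: g becomes j before a front vowel, and stays g elsewhere.
    return re.sub(r"g(?=[ei])", "j", w)

for word in ["cell", "acid", "cow", "clear", "gel", "turgid", "got", "twig"]:
    print(word, "->", soften_velars(word))
```

The vowels are untouched here, so cell comes out as sell and gel as jel; the long and short vowel rules later turn these into sêl and jêl.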
Untangle reverse-written final liquids
24. le and re (after a consonant, and ending the word) should be
To be precise, they become syllabic consonants: the final sound in bottle is a prolonged dark l. I think this is an allophonic detail, however: if you like, just add a rule at the end to turn all instances of @r into syllabic r.
Short and long vowels
OK, listen up, because these are the two most important rules of English spelling.
25. Vowels are pronounced long before an intervocalic consonant (rate, mete, fine, rote, cute = rät mët fïn röt küt).
26. They're short before two consonants (baffle, held, children, rotten, butler), or before a final consonant (pat, pet, pit, pot, but = pât pêt pît pôt bût).
English has a dozen or so vowel phonemes, and this silly alphabet we inherited from the Romans has just five vowel symbols (y is sometimes used as a vowel, but as we've seen, it pointlessly duplicates i). The five symbols can represent ten sounds, thanks to these rules. Each vowel letter has two basic interpretations, which by convention are called long and short. (Phonetically they're not distinguished by length; tense and lax would be more accurate. But I think the more familiar terms will be more readable, and remind readers that their old English teachers were onto something after all.) In my transcription, long vowels are marked with a diaeresis, since html doesn't supply a macron (äëïöü), and short vowels with a circumflex (âêîôû). Now you can see why I chose those odd representations-- they come from the basic logic of English spelling. (Think of the diaeresis as the IPA : long mark.) Note that the names of the letters A E I O U are simply the 'long' vowels. And where did that come from?
The spelling of the long vowels is the fault of the Great Vowel Shift of early modern times. Middle English pronounced the vowel letters with their 'proper' values, so that (say) mate would have been pronounced môt@. The short vowels are simply laxed versions of the original sounds of the long vowels. ê, for instance, is a lazy version of ä (the original sound of long e)-- closer to the muddy center of the vowel space.
The above rules work in conjunction with rule 54, which means that doubling a consonant changes a medial vowel from long to short: later/latter, Peter/petter, biter/bitter, hoping/hopping, cuter/cutter.
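Rules 25 and 26 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the name `mark_vowel_length` is hypothetical, only the letters a, e, i, o, u are treated as vowels, digraphs and r-coloring are ignored, and the silent e is not removed (that is rule 33), so rate comes out as räte rather than rät.

```python
import re

V = "aeiou"
LONG = dict(zip(V, "äëïöü"))    # long vowels: diaeresis
SHORT = dict(zip(V, "âêîôû"))   # short vowels: circumflex

def mark_vowel_length(word: str) -> str:
    w = word.lower()
    out = []
    for i, ch in enumerate(w):
        rest = w[i + 1:]
        if ch not in V:
            out.append(ch)
        elif re.match(rf"[^{V}][{V}]", rest):
            # Rule 25: long before a single consonant followed by a vowel.
            out.append(LONG[ch])
        elif re.match(rf"[^{V}]([^{V}]|$)", rest):
            # Rule 26: short before two consonants or a final consonant.
            out.append(SHORT[ch])
        else:
            out.append(ch)      # neither rule applies; left as spelled
    return "".join(out)

for word in ["rate", "pat", "cute", "held", "hoping", "hopping"]:
    print(word, "->", mark_vowel_length(word))
```

The last pair shows the doubling trick: hoping gets a long ö, while the doubled p in hopping forces a short ô.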
Exceptions, but general ones
27. Final ind is ïnd, final oss is òs; final og is òg: mind, boss, dog = mïnd bòs dòg.
28. o also becomes ò before f and another consonant
(offer = òf@r, soften = sòf@n).
29. wa is pronounced wô before a dental or alveolar consonant (t d n s): want, wander, swan, Rwanda, swat, wad, wasp; and as wò between w and (t)$: wash, squash, watch = wò$ skwò$
29a. u is pronounced ù before l, or after a labial stop (p b) and before a sibilant (s $ ç): adult, push, butch. (This doesn't apply if the u is long: mule.)

I don't think I ever noticed these generalizations till I started working out the rules for this page. At least some of these, such as 29a, are sound changes from Shakespeare's time. Rules such as 6, 18, 19, 27, 28, and 51 introduce ò, a vowel which (as signalled by the odd diacritic in my transcription) doesn't fit well into English phonology. The fact that a velar occurs in many of the rule conditions suggests that it was originally an allophonic variant of /ô/ and /â/ in this environment-- compare dog, ought, long, walk with dot, out, lot, wad. But it's now phonemic in GA, as can be seen in the minimal triad caught, cot, cat. These rules would have to be modified (and some could be eliminated) in dialects that merge ò and ô. For some speakers, rule 29a only applies after labials, so that pull and dull don't rhyme.
Softening of gn
30. Except before a vowel, the vowel in ign or igm lengthens, and the g is
lost: alignment paradigm = @lïnm@nt,
but igneous = îgnë@s.
31. The g is simply lost in eign: feign = fän.
Handling of -ous
32. Except before a vowel, ous reduces to @s: jealous = jêl@s.
I'm ambivalent about rules that relate to a particular suffix, since arguably the pronunciation is simply a fact about the suffix in the mental lexicon. But a suffix can apply to dozens of words, so there was a large gain from including some such rules in the file. Note the importance of order: this rule has to be ordered before silent e deletion, or it will apply to words like arouse.
Removal of silent e
33. Remove final e: rate mike cute = rät mïk küt (unless it's the only vowel
in the word, as in he). This and rules 25 and 26 (on long and short vowels) are the guts of the English spelling system. They allow the five vowel symbols to represent ten vowel phonemes.
English orthography tends to preserve the spelling of morphemes in derived words, including their final e. The program is too stupid to handle this, since it has no way of recognizing compounds. But of course in words like safety, lovely, changeable, careful, warehouse, jukebox, placement, placeholder the e in the first morpheme should be deleted by this rule. People pay tribute to these rules every time they make up words-- whether for marketing purposes (Nite-Lite, Cold-Eeze, Unix), slang (reefer, dweeb, doofus), a created world (hobbit, Leela, Oz, Alley Oop, Naboo, Mr. Magoo, Morlock), or for borrowings ( thuggee, kangaroo, tycoon, igloo, tepee). Words that don't fit the pattern, like Linux, can cause confusion.
Add shortening; stir
Some vowels that are orthographically long are pronounced short, and frankly I haven't put my finger on the pattern. In the file I did add this rule:
34. Shorten a vowel that precedes a simple, final CV syllable (and is not the
first syllable in the word). This handles words like anomaly, cinema, sanity, biology, century; but it fails on other words, like patina, tuxedo, agora. Obviously the shortened vowels are all unstressed; but the idea here is to predict pronunciations from the spelling, and the spelling doesn't indicate the stress. (We've already removed silent e, so this rule isn't triggered by words like phoneme.) Somewhere I read that long vowels can't occur earlier than the antepenult; but obvious counterexamples are isolating or unification. I'll see if I can improve the generalization, however.
Besides the long/short trick, English expands its repertoire of vowel representations with digraphs. Quite a few of these are redundant, and there are lots of exceptions-- this, and not ch or ough, is the real weak point of English spelling.
35. iV (that is, i plus another vowel) becomes ï@ in the initial syllable: bias,
diagram = bï@s,
36. Exceptions to the following rule:
- Final ow is pronounced ö: slow, rainbow, overthrow.
- oo is pronounced ù before a k: book, crook, look.
- ei is pronounced ë after s: perceive, ceiling, seize.
- ie is pronounced ï finally: dye, necktie.
- oul becomes ù before a final d.
37. Make the following substitutions:
au, aw   ò
ee       ë
ea       ë
ei       ä
eo       ë@
eu, ew   ü
ie       ë
iV       ë@
oa       ö
oe       ö
oo       u
ou, ow   ôw
oi       öy
ua       ü@
ue       u
ui       u
Again, the program is not smart enough to recognize when the digraph spans a morpheme boundary, and thus should be treated as two separate vowels: goer = gö@r, coaxial = köâksë@l. Annoyingly, some of these digraphs have at least two values: cf. wool, fool; mead, dread; fief, friend; reign, seize; ground, group. The values in the
table are those that occur most often. (The alternatives are generally just a step or two apart phonetically, e.g. u/ù, ë/ê, ä/ë.) For ease of exposition I've put the final ie rule here, but it really goes before rule 14 (affrication); otherwise terrible things happen to words like untie.
Those pesky final syllabics
38. Any vowel reduces to @ before final l: battle, final, hovel, evil, symbol.
39. Any short vowel reduces to @ before a final n: human, frighten, cabin,
button. These rules don't apply to monosyllables (pal, can), nor to vowels that have already been assigned a particular value by an earlier rule (e.g. meal to mël by rule 37). These rules could probably be refined; they don't apply to stressed finals, but again, the orthography doesn't indicate stress. You can take @l as a phonemic representation, or add a rule at the end to replace it with vocalic l. Ditto for @n.
40. The following suffixes are reduced as follows:
-able, -ible   @b@l
-lion          ly@n
-nion          ny@n
Again, we really shouldn't have 'rules' for single lexical entries. But these suffixes are common, so the rule has a large yield.
41. A final b or n is not pronounced if preceded by an m: damn bomb = dâm
Final vowel coloration
42. Pronounce any remaining final vowel as follows:
-a   @
-i   ë
-o   ö
-u   u
A final vowel is usually the mark of a foreign word, which is why final vowels tend to have the 'continental' values: sushi, cello, haiku. Earlier borrowings were nativized, meaning that final vowels had to be written as diphthongs (e.g. Munsee, Hindoo). Since final -e is already in use, we used to mark one that was supposed to be pronounced (Chloë = klöë), or, if we were borrowing from French, we retained the accent (café = kâfä). But English seems to be so allergic to diacritics that these helpful conventions have largely been lost.
Vowels before r
r is hell on English vowels; it tends to color the vowels, and in many dialects, disappear. In GA there are 12 monophthongal vowels, but only 6 can appear before r-- ä ë ô ö ò u-- plus @r, which is really just a prolonged vocalic r.
43. An ôw, ô, or ò resulting from the previous rules changes to ö before
an r: course = körs, for = för.
44. war is pronounced wör, except before a vowel: warlock, war, dwarf = wörlôk; and wor is pronounced w@r: word, worst, worry.
45. ê or â before a double r (and ê before ri) become ä: terror, marry, merit = tär@r,
46. â before any other r becomes ô: mark, star = môrk, stôr. 47. ê, î, û before r are reduced to schwa: perk, fir, fur = p@rk, f@r, f@r.
Thanks to the infamous rule 45, I pronounce Mary, merry, marry the same. If you left this rule out, the rules would probably correctly predict the pronunciation of Easterners and Britons who distinguish them.
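Since rules 43-47 are ordered rewrites, they can be sketched as a short list of regular-expression substitutions. This is only an illustration, not the program discussed later on this page: the function name, the rule table, and the intermediate forms fed to it (e.g. mârk for mark, with vowels already assigned by earlier rules) are assumptions, and rule 44 (war/wor) is left out.

```python
import re

# Rules 43, 45, 46 and 47, in the order the text lists them.
# Each pattern acts on an intermediate transcription in which earlier
# rules have already assigned vowel values (an assumption of this sketch).
R_RULES = [
    ("43", r"(?:ôw|ô|ò)(?=r)", "ö"),        # ôw, ô, ò -> ö before r
    ("45", r"[êâ](?=rr)|ê(?=r[iî])", "ä"),  # ê/â before rr, ê before ri -> ä
    ("46", r"â(?=r)", "ô"),                 # â before any other r -> ô
    ("47", r"[êîû](?=r)", "@"),             # ê, î, û before r -> schwa
]


def color_before_r(form):
    """Apply each rule once, in order, to an intermediate form."""
    for _number, pattern, replacement in R_RULES:
        form = re.sub(pattern, replacement, form)
    return form


for word in ["kôwrs", "fôr", "mârk", "stâr", "pêrk", "fîr", "fûr", "mêrît"]:
    print(word, "->", color_before_r(word))
```

The ordering matters: rule 46 creates an ô before r (môrk), but rule 43 has already run by then, so that ô is not turned into ö.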
The velar nasal ng
The careful reader may wonder why ng was not handled earlier, with the other consonantal digraphs. The reason is that orthographically, it acts as a double consonant-- e.g. singer has a short not a long i. But now it's time to handle it. For lack of an eng, I represent the velar nasal as ñ; don't confuse it with a palatalized ny.
48. ng becomes ñg before a liquid (r, l) or semivowel (y, w): angry, England, singular, anguish = äñgrë, îñglând, sîñgül@r, äñgwî$.
49. ng becomes ñ finally, or before another consonant: hung = hûñ, length = läñ+.
50. n becomes ñ before a velar stop (k, g): anger = äñg@r, think = +îñk.
51. ô becomes ò, and â becomes ä before ñ: song = sòñ; hang = häñ.
Note that rule 50 doesn't apply to words like hung, because rule 49 already removed the g in those words. 50 is arguably merely allophonic, but since it's completely consistent I treated it as a spelling rule. You could certainly say that a word like ungrateful 'really' has an underlying /ng/, because it's composed of un plus grateful; then this, as in most languages, will get pronounced ñg. But if you go that route, you can't actually show that English allows /ñg/ as well as /ng/-- how do we know that wrong isn't actually /ròng/, modified by the allophonic rule? The important thing is not to pretend that we have a contrast of /ng/ and /ñg/.
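The interaction between rules 48-51 is easier to see when they are run in order. Here's a minimal sketch, again as ordered regex substitutions; the names, the vowel inventory string, and the intermediate input forms (e.g. hûng, âng@r) are assumptions made for illustration, not part of the original program.

```python
import re

# Vowel symbols used in the transcriptions on this page (assumed inventory).
VOWELS = "aeiouäâëêïîöôòüûù@"

NG_RULES = [
    ("48", r"ng(?=[rlyw])", "ñg"),         # ng -> ñg before a liquid or semivowel
    ("49", rf"ng(?![{VOWELS}])", "ñ"),     # ng -> ñ finally or before a consonant
    ("50", r"n(?=[kg])", "ñ"),             # n -> ñ before a velar stop
    ("51", r"ô(?=ñ)", "ò"),                # ô -> ò before ñ
    ("51", r"â(?=ñ)", "ä"),                # â -> ä before ñ
]


def velar_nasal(form):
    """Apply rules 48-51 once each, in order, to an intermediate form."""
    for _number, pattern, replacement in NG_RULES:
        form = re.sub(pattern, replacement, form)
    return form


for word in ["hûng", "sông", "hâng", "+înk", "âng@r", "ângrë"]:
    print(word, "->", velar_nasal(word))
```

In hûng, rule 49 has already removed the g by the time rule 50 runs, so rule 50 has nothing to match; in âng@r the ng is followed by a vowel, so it survives to rule 50 and comes out as ñg.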
Voicing of s
52. s is voiced finally, after a voiced oral stop: dogs = dògz.
53. It's also voiced before final m: prism = prîzm.
The first of these rules is really morphophonemic: the plural, possessive, and 3p singular inflections of English are spelled s even when, by assimilation, they're pronounced z. This rule is not phonological, as can be seen by a word like chance = çâns; compare fans = fânz.
54. A double consonant is pronounced singly: dinner, buzzard, hassle = dîn@r, bûz@rd, hâs@l.
55. A t disappears before ç, and a d before j: batch = bâç, judge = jûj.
56. An s disappears before $: pressure = prê$r.
Rule 54 works hand in hand with rule 25: a consonant is doubled to show that the preceding vowel is short: redder = rêd@r (compare red, where the d doesn't need to be doubled because a vowel preceding a final consonant is already short). Rule 55 is something of a corollary: to 'double' ç, we write tch rather than chch; and to double a j, we write dg rather than jj or gg. Rule 56 goes with rule 16, which changed s to $ before some instances of u.
Almost but not quite regular
In the rule list there's almost a rule that changes o to û before certain fricatives or nasals. Here's a list of affected words, as well as counterexamples:
_v   affected: above, cover, dove, glove, govern, hovel, hover, love, oven, shovel, of
     counterexamples: clover, prove, drover, jovial, move, novel, over, poverty, proverb, province, sovereign, stove, bovine
_l   affected: color
     counterexamples: apology, polo
_th  affected: other, another, mother, brother, nothing
     counterexamples: both, bother, broth, brothel, cloth, clothes, moth
_n   affected: onion, none, money, monk, monkey, month, wonder, front, son, sponge, honey, Monday, one
     counterexamples: alone, bone, honest, honor, tonight, pond, beyond, conk
_m   affected: come, become, from, some, stomach
     counterexamples: bomb, comb, dome, home, gnome, Mom, whom, womb
Most of these turn out to be due to an orthographic or even a calligraphic rule: medieval English scribes wrote o instead of u before m, n, v, apparently because in the medieval hand, the verticals of the u ran confusingly together with those of the following consonant.
So what's irregular?
The biggest source of errors is what I considered near-misses: instances where the rules get the length of a vowel wrong, or don't predict a reduction to schwa, or don't predict a voiced s. The first two of these are a feature not a bug, since they make word roots recognizable, despite predictable differences in pronunciation. For instance, the root pedant is spelled identically in pedant (pêd@nt) and pedantic (p@dântîk). This underlines the relationship between the two words, despite the fact that neither root vowel is pronounced the same. Similarly, sanity has a short a (sânîtë), although a vowel preceding a single consonant is normally long; this is an 'error', but it keeps the same spelling of the root as in sane. Putting these near-misses aside, my program gets 791 words wrong in a 5180-word sample vocabulary. Many of these are really stupidities of the program, not the language. There are:
188 simple variations of other errors-- e.g. since busy is wrongly predicted to have a ü, so is business
52 borrowings using foreign spelling conventions (e.g. aficionado, bourgeois, cello, stein). Borrowings are common enough in English that writers can learn the patterns for each source language.
18 instances of final -ed taken as êd
45 words (mostly Greek) where ch = k not ç
45 silent e's not recognized as such due to compounding
20 over-enthusiastic vowel reductions (usually due to stress falling where, statistically, it doesn't occur much: amen, violin; or to vowels that unexpectedly don't turn to schwa before r: mirror, sergeant)
6 instances of consonant combinations taken as single sounds despite crossing a morpheme boundary (e.g. dishonor, shepherd)
That leaves about 420 words wrong, less than 10%; the major categories are as follows:
195 misinterpretations of diphthongs; some of these are genuine ambiguities in English spelling (cf. dead, mead, real; die, sieve, science, fief); others are due to insufficient analysis (e.g. poet is mispredicted simply because I didn't provide a rule for oe-- it wasn't worth it, it occurred too rarely in the lexicon)
37 examples of the o to û change discussed above
26 indefensible vowel spellings (e.g. pretty, women, resin, English, lose, swamp, water, bury, lawyer)
17 consonant clusters not simplified enough (e.g. half, folks, listen, mortgage, raspberry)
17 instances of an unexpected (or mispredicted) ò; e.g. cloth, frost, chocolate
18 instances of final -y being ï rather than ë
13 annoying cases where g before a front vowel is hard (e.g. get, give); there are also 4 cases where gg + front vowel was taken incorrectly as gj-- which it should be, dammit (suggest) but often isn't (stagger)
8 instances of an unexpected ù; e.g. put, wolf, woman. (These all begin with labials-- these may be related to rule 29a.)
10 unexpected (af)frications (e.g. educate, ocean, righteous, sure); there's also an instance of an unexpected lack of frication (absurd)
8 more instances of er becoming är (besides those noted in the rules-- e.g. era, there, herald, very)
6 instances of vowels unexpectedly dropping (e.g. bachelor, vegetable, Wednesday)
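The "about 420" and "less than 10%" figures follow directly from the counts quoted above. A quick check, with the variable names and short category labels being mine and only the numbers taken from the text:

```python
# Counts quoted in the text; the dictionary keys are shorthand labels only.
total_wrong = 791
sample_size = 5180

program_artifacts = {
    "variations of other errors": 188,
    "foreign spelling conventions": 52,
    "final -ed taken as êd": 18,
    "ch = k": 45,
    "silent e in compounds": 45,
    "over-enthusiastic reductions": 20,
    "clusters across morpheme boundaries": 6,
}

major_categories = [195, 37, 26, 17, 17, 18, 13, 4, 8, 10, 1, 8, 6]

artifact_total = sum(program_artifacts.values())
remaining = total_wrong - artifact_total

print(artifact_total)                       # 374
print(remaining)                            # 417, i.e. "about 420"
print(f"{remaining / sample_size:.1%}")     # 8.1%
print(sum(major_categories))                # 360 of the 417 are itemized
```

So the itemized major categories cover 360 of the 417 remaining errors; the rest are not broken down in the list.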
Generating spellings from pronunciation
Can you reverse these rules to get instructions on how to spell a word given its pronunciation? Not really, since there are too many alternative spellings. However, the following table can be taken as a first approximation. For each GA phoneme, I list the spellings referred to in the rules above. Caveats:
Remember the long/short vowel rules (25, 26).
  o To ensure a short pronunciation, double the following consonant.
  o To ensure a long pronunciation: at the end of a word, add a silent e; elsewhere in the word, use a diphthong instead.
Remember the softening of velars; see rules 20-23 for a discussion of how to spell s/k/g/j before various vowels.
Parenthesized characters represent the environment where you can use a spelling. Examples:
  o under s, (V)ss(V) means that you can spell it ss between two vowels
  o under ä, a(ng) means that you can spell it a before ng.
# represents the end or beginning of a word:
  o i# under ë means that this spelling occurs word-finally.
ks (or intervocalic gz) can be written x.
It's preferable to spell a word the same way across all morphological changes, even if it means slight violations of the rules (e.g. 'silent final e' in the middle of a word). Likewise: write reduced vowels with the full vowel in a morphologically related word. E.g. the second vowel in parent is e because we have a full ê in parental.
p    p
b    b
t    t
d    d
g    g, gh(i/e/y)
k    k, c(a/o/u), q(u), ck#
m    m
n    n
ñ    ng, n(k,g)
f    f, ph
v    v
+    th
voiced th    th
s    s, (V)ss(V), c(i/e/y), ce(a/o/u)
z    z, (V)s(V)
$    sh, ci(V), ti(V); rule 16 situations: s, ss
zh   s, zh
ç    ch, (doubled) tch, t(u)
j    j, (doubled) dg, g(i/e/y)
y    y; yu can be u
w    w, #wh, u(V)
r    r, #wr, rh
l    l
h    h

ä    a, ay, ai, ei, e(r), a(ng)
â    a
ë    e, ee, ea, ey, (c)ei, e(V), i#, y#
ê    e, ea
ï    i, y, ie, igh, ig(n), i(V)
î    i, y
ö    o, oa, oe, ough, o#, ow#, eau
ô    o, (w)a(n/s/t/d), a(r)
ü    u, eu, ew
û    u
u    oo, ue, ui, u#
ò    au, aw, augh(t), a(l), (w)a(sh,ch), o(ss#, + g#, fC, ng)
ù    oo, u
@    V, a#
ôw   ou, ow
oy   oy, oi

@r   Vr, re#
@n   Vn
@l   Vl, le#
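To see why the table is only a first approximation, it helps to enumerate what it allows. The sketch below is a toy, not a real speller: the dictionary holds just a few rows of the table, keeps only the spellings with no environment attached (so (c)ei, i#, ow# and the like are left out), and ignores the silent-e and doubling conventions. The names SPELLINGS and candidate_spellings are made up for the example.

```python
from itertools import product

# A few rows copied from the table, context-free spellings only.
SPELLINGS = {
    "f": ["f", "ph"],
    "t": ["t"],
    "n": ["n"],
    "ë": ["e", "ee", "ea", "ey"],
    "ö": ["o", "oa", "oe", "ough"],
}


def candidate_spellings(phonemes):
    """Return every spelling obtained by picking one option per phoneme."""
    options = [SPELLINGS[p] for p in phonemes]
    return ["".join(choice) for choice in product(*options)]


candidates = candidate_spellings(["f", "ë", "t"])
print(len(candidates))   # 8
print(candidates)
```

Even for a three-phoneme word the table yields eight candidates, and two of them (feet, feat) are real words; nothing in the pronunciation alone says which one is meant.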
Spelling reform by regularization
You could use the above table as the basis for a really useful and minimal spelling reform. For instance, here's Percy Bysshe Shelley's Ozymandias in regularized spelling. To minimize the barbarity, I exempt one- and two-letter words from reform.

I met a traveller from an anteke land
hu sed: Tue vast and trunkless legs of stone
stand in the desert. Near them, on the sand,
haff sunk, a shattered visage lies, huse frown,
and wrinkled lip, and sneer of cold cummand
tell that its sculptor well those passions read,
which yet remain, stamped on these lifeless things--
the hand that mocked them, and the hart that fed.
And on the peddestal these words are carved:
'My name is Ozzymandias, king of kings!
Look on my works, ye mighty, and despair!'
Nuthing beside remains. Round the decay
of that colossal wreck, boundless and bare,
the lone and levvel sands stretch far away.

Or of course we could just hang it up and use Chinese-style syllabograms instead.
So how horrible is English spelling really?
I doubt that this page will convince anyone that English spelling is a good system. There are too many oddities.
Vowel combinations are a mess-- often the best you can do is give the two most likely sounds (realm, reap), and even those will be overruled in the fairly frequent cases where two vowels really adjoin (reality).
There are too many quirky rules that derive from odd sound changes. We may not be able to get away from the Romance c/g softening or the Great Vowel Shift, but does our spelling need to preserve old forms of feign or walk?
There was a period when busybodies did their best to make English look like Latin. This was bad enough when we distorted perfectly good French loans like dette into debt, but we're also stuck with false etymologies like island (in place of the older, and regular, iland).
And the modern custom of borrowing instead of adapting spellings, though nice for etymology, plays havoc with the orthography, especially as we start to borrow from more exotic languages and forget where they're from. I've heard well-meaning idiots pronouncing a Russian z as ts, as if it were German; and people like to pronounce words like Sarajevo as if they were Spanish. And why spell gyros as if it were classical instead of modern Greek (inviting the pronunciation jïröz in place of yërös)? While we're at it, could we please fix the word ginkgo, which is not only difficult and irregular, but doesn't reflect any proper Japanese word? The Japanese characters (銀杏) can be read two ways: as icho:, they refer to the tree; as ginnan, to the fruit. The second character can be read kyo: in other words, so someone misread the combination as ginkyo:, and someone else mangled this into ginkgo.
What I hope to have shown, however, is that beneath all the pitfalls, there's a rather clever and fairly regular mechanism at work, and one which still gets the vast majority of words pretty much correct. It's not to modern tastes, but by no means as broken as people think.
'woncha' = "won't you". Also: "do you" is often pronounced to sound exactly the same as "Jew".
"Where are theirs?" "They're there." "No, they aren't" "Well that's where they were."
/i:/ here, hear, beer, serene, prenatal, breathe, the (before
vowels), leisure, we, he, she
beat, seat, sheet, receive, brief, pier, fear, seizure, obscene,
/i/ cyclical, bicycle, pretty, forage, pigeon, lettuce, busy,
business, build, Jesus's, mountain, waited, beloved
bit, kit, mint, hill, hymn, women, it, which, av(e)rage,