Random Words
Probability and English ... what a mix!
Random Letters
You would think it was easy to create random words ... just pick letters randomly and put them
together, and voila! a random word.
Well, here are 20 words made that way:
tldkl oewkx dmwol vuptg hvwjk naqid avypr zwtip zgnzs bvdhd
muyfd ighgd xhlng oyecn vjnsl ssjrx gxald tukxj rvfoq yxzxq
It turns out that the words are not only nonsense, but quite hard to pronounce!
(Try saying "tldkl" or "oewkx")
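As a rough illustration of "pick letters randomly and put them together", here is a minimal Python sketch. The function name `random_word` is made up for this example, and it assumes every letter is equally likely, which is what the 20 words above appear to use.

```python
import random
import string

def random_word(length=5, rng=random):
    """Return `length` lowercase letters, each chosen uniformly at random."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

# Twenty 5-letter "words", like the list above (output differs on every run).
words = [random_word() for _ in range(20)]
print(" ".join(words))
```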
Why? Well, English has around 200,000 words (228,000 in the Oxford English Dictionary,
including many words no longer used) ... but how many different words can be made with just 5
letters?
26 × 26 × 26 × 26 × 26 = 11,881,376 possible 5 letter words!
And that is just the 5 letter words ...
Let us guess that there are 40,000 words in English that have 5 letters. So the probability of
making a real word just randomly would be:
40,000 / 11,881,376 = 0.003, or about 0.3% chance
So real words are rare. And we can see that putting random letters together is very unlikely to
produce a real word.
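The arithmetic can be checked with a few lines of Python. The figure of 40,000 real 5-letter words is the guess made above, not a measured count, and the variable names are only illustrative.

```python
ALPHABET_SIZE = 26
WORD_LENGTH = 5

# Every position can hold any of the 26 letters.
possible = ALPHABET_SIZE ** WORD_LENGTH

real_words_guess = 40_000  # the guess from the text, not a dictionary count
probability = real_words_guess / possible

print(f"{possible:,} possible words")             # 11,881,376 possible words
print(f"{probability:.4f} ({probability:.2%})")   # 0.0034 (0.34%)
```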
https://www.mathsisfun.com/data/randomwords.html 1/6
1/16/2017 Random Words
Vowels
We can improve our success by insisting that a word have at least one vowel, since nearly every
word in English has one (except fly, by and a few others). Like this:
ectot gjaqv kuifg vzicu zspsu pdidb wqdis uerrs ucgej okimw
fnevz ewxko ljgew aglgo jpfoq dcytu uwkcj dzioy wekdx xuybk
This is a great improvement. More words can be pronounced.
But there are still lots of strange words like "zspsu" and "xuybk"
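One simple way to insist on a vowel is to keep drawing random words and throw away any that have none. This is only a sketch of that idea (the page does not say how its own list was made), and it treats a, e, i, o and u as the vowels, so "y" does not count. The name `random_word_with_vowel` is made up for this example.

```python
import random
import string

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel

def random_word_with_vowel(length=5, rng=random):
    """Draw uniform random words until one contains at least one vowel."""
    while True:
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if VOWELS & set(word):  # non-empty intersection means a vowel is present
            return word

words = [random_word_with_vowel() for _ in range(20)]
print(" ".join(words))
```

With 21 non-vowels out of 26 letters, roughly a third of 5-letter draws have no vowel and get rejected, so the loop rarely runs more than a few times.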
Letter Frequency
So, our next improvement is to use fewer of the letters like j, x, z and q and more of the letters
like e, t and s.
In fact the frequency of letters in the English Language is well known. Here is how many times
you would expect to see a letter in every 1,000 letters:
a    b    c    d    e    f    g    h    i    j    k    l    m
82   15   28   42   127  22   20   61   70   2    8    40   24

n    o    p    q    r    s    t    u    v    w    x    y    z
67   75   19   1    60   63   90   27   10   24   2    20   1
Can you see that "e" is common, but "z" is rare?
"e" is likely to occur 127 times in every 1,000, or as a ratio 127/1000 = 0.127 (= 12.7%)
"z" is likely to occur only 1 time in every 1,000, or as a ratio 1/1000 = 0.001 (= 0.1%)
So, by selecting letters based on that frequency (a bit like rolling a 1,000-sided die that has
82 faces marked "a", 15 marked "b" ... and only one marked "z"), we can get output like this:
elnao etgov segty laast aessn siuon oenha eaoas ncoot ctwka
dmswo dpuoh eewis ebdni laarm syucs idvos lhina igahh soyie
Still no real words, but some are close. And most of them can be pronounced. (Great names if
you are writing a science fiction novel!)
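The "1,000-sided die" can be imitated in Python with `random.choices`, which accepts a weight for each option. The sketch below copies the counts from the frequency table above. The names `COUNTS` and `weighted_word` are made up for this example.

```python
import random
import string

# Occurrences per 1,000 letters, in order a..z (copied from the table above).
COUNTS = [82, 15, 28, 42, 127, 22, 20, 61, 70, 2, 8, 40, 24,
          67, 75, 19, 1, 60, 63, 90, 27, 10, 24, 2, 20, 1]

def weighted_word(length=5, rng=random):
    """Pick each letter with probability proportional to its count."""
    letters = rng.choices(string.ascii_lowercase, weights=COUNTS, k=length)
    return "".join(letters)

words = [weighted_word() for _ in range(20)]
print(" ".join(words))
```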
But we can do better ...
2-Letter Frequencies
We can take the idea of Letter Frequency one step further by asking
"what is the frequency of letters that follow another letter?"
For example, if we already have a "t", the next letter is very likely to be an "h" (making "th").
In the sample text, "h" occurred 3197 times after a "t" ("th") ... but "b" never followed a "t".
OK, let us start with a "t", and let us say we choose an "h" to make "th". Next we would use
the "h" row to choose another letter (maybe an "e" to make "the"), and so on ... well, here is a
sample:
the cur the bund hof arytowno d sheromasees asemedosouro f
soacthake d imon binofowat oaten d heng wa
The results are remarkable ... nonsense, but almost like some strange language.
In fact we are not just making random words now, we are making random sentences!
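Here is a minimal Python sketch of the idea, assuming the table is built by counting every pair of neighbouring characters in some sample text. Spaces are counted like letters, which is why word breaks can appear in the output. The short `SAMPLE` string and the names `build_table` and `generate` are made up for illustration; a real run would use a much longer text.

```python
import random
from collections import Counter, defaultdict

def build_table(text):
    """Count how often each character follows each other character."""
    table = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        table[current][following] += 1
    return table

def generate(table, start, length, rng=random):
    """Grow a string by repeatedly picking a follower of the last character."""
    out = start
    while len(out) < length:
        row = table.get(out[-1])
        if not row:  # dead end: this character was never followed by anything
            break
        letters = list(row.keys())
        weights = list(row.values())
        out += rng.choices(letters, weights=weights)[0]
    return out

# Hypothetical sample text, far too short for good results.
SAMPLE = "the quick brown fox jumps over the lazy dog and then the other thing"
table = build_table(SAMPLE)
print(table["t"]["h"])            # how many times "h" followed "t"
print(generate(table, "t", 40))   # output differs on every run
```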
3-Letter Frequencies
How do 3-Letter Frequencies work?
Well, say we already have two letters (like "ei") ... we then:
look through the sample text for every time "ei" appears,
randomly choose one of those,
look for the letter following it (possibly "t"),
add the "t" to make "eit",
and start again using "it" (... always the last two letters).
Here is a sample:
Either great into get very deep welled of it it, and
to wondere started into the book about hear!
Now, that looks good! By sampling from a real source we can get good results.
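The steps for 3-letter frequencies can be sketched in Python almost word for word: find every place the last two letters appear, pick one of those places at random, and copy the letter that follows it. The `SAMPLE` text and the names `next_letter` and `generate` are made up for this example, and stopping when no match is found is an assumption of the sketch.

```python
import random

def next_letter(text, context, rng=random):
    """Pick one occurrence of `context` at random and return the letter after it."""
    positions = []
    start = text.find(context)
    while start != -1:
        if start + len(context) < len(text):  # skip a match at the very end
            positions.append(start)
        start = text.find(context, start + 1)
    if not positions:
        return None
    chosen = rng.choice(positions)
    return text[chosen + len(context)]

def generate(text, seed, length, rng=random):
    """Extend `seed` one letter at a time, always using the last len(seed) letters."""
    out = seed
    k = len(seed)
    while len(out) < length:
        letter = next_letter(text, out[-k:], rng)
        if letter is None:  # the current letters never appear with a follower
            break
        out += letter
    return out

# Hypothetical sample text; a real run would use a whole book.
SAMPLE = "either the cat or the dog will eat their dinner, neither will wait"
print(generate(SAMPLE, "ei", 30))  # output differs on every run
```

Because the number of letters looked at comes from the length of the seed, a three-letter seed would look at groups of three letters in the same way.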
4-Letter Frequencies
Using the same method, I used groups of 3 letters to decide on the 4th letter and got:
Either the sides or conversations in time to
happen next. First, she look down mind
5-Letter Frequencies
And with 5-letter frequencies:
There was just in time it all seemed quite natural);
but to take out of time as she had not like to do
Try the method on something from Shakespeare, or on a political speech, and see what it comes up
with ... you could even combine quotes from different authors to see what their children might write.
Copyright © 2013 MathsIsFun.com