
How many words?

It is often said that the English language is particularly rich in vocabulary, but to make
such a statement we need to know what words to count and what counts as a word.

DAVID CRYSTAL

How many words are there in English? And how many of these words does a native speaker know? These apparently simple little questions turn out to be surprisingly complicated. In answer to the first, estimates have been given ranging from half a million to over 2 million. In answer to the second, the estimates have ranged from as low as 10,000 to over ten times that number. People are, it seems, quite happy to drop all kinds of figures into their lectures and publications (see Panel 1). The figures give the impression of great precision - though it should be noted that they are usually accompanied by such emptying expressions as 'approximately', 'on average', or 'it is thought'. Nonetheless, the vagueness does not stop organizations offering courses and exercises (at a price) that will enable readers to 'increase their word power' - without ever providing these readers with the opportunity of discovering what their current word power actually is.

How can we throw light on this apparently confusing area? Let us begin with the question of how many words there are in English - a topic which has attracted almost as many estimates as estimators. The question is complex for two reasons. It partly depends on what you count as an English word, and partly on where you go looking for them.

What counts as a word?

Consider the problems, if someone asked you to count the number of words in English. You would immediately find thousands of cases where you would not be sure whether to count one word or two. In writing, it is often not clear whether something should be written as a single word, as two words, or hyphenated. Is it washing machine or washing-machine? school children or schoolchildren? flower pot, flower-pot or flowerpot? Would you count all the items beginning with foster as new words: foster brother, foster care, foster child, foster father, foster home, etc? Or would you treat them as combinations of old words: foster + brother, foster + care, and so on? This is a big problem for the dictionary-makers, who often reach different conclusions about what should be done.

What would you do with get at, get by, get in, get off, get over, and the dozens of other cases where get is used with an additional word? Would you count get once, for all of these, or would you say that, because these items have different meanings (get at, for example, can mean 'nag'), they should be counted separately? In which case, what about get it?, get your own back, get your act together, and all the other 'idioms'? Would you say that these had to be counted separately too? Would you count kick the bucket (meaning 'die') as three familiar words or as a single idiom? It hardly seems sensible to count the words separately, for kick has nothing to do with moving the foot, nor is bucket a container.

If you let the meaning influence you (as it should), then you will find your word count growing very rapidly indeed. But as soon as you do this, you will start to worry about other meanings, even in single words. Is there a single meaning for high in high tea, high priest and high season? Does the lock on a door have the same basic meaning as the lock on a canal? Should ring (the shape) be kept separate from ring (the sound)? Are such cases 'the same word with different meanings' or 'different words'? These are the daily decisions that any word-counter (or dictionary compiler) must make.

Whose English are we counting?

Sooner or later, the question would arise about the kind of vocabulary to include in your count. There wouldn't be a difficulty if the words were part of standard English - used by educated people throughout the English-speaking world. Obviously these have to be counted. But what about the vast numbers of words which are not found everywhere - which are restricted to a particular country (such as Canada, Britain, India, or Australia), or to a particular part of a country (such as Wales, Yorkshire or Liverpool)? They will include words like stroller (= push-chair) and station (= stock farm) from Australia, bach (= holiday cottage) and pakeha (= white person) from New Zealand, dorp (= village) and indaba (= conference) from South Africa, cwm (= valley) and eisteddfod (= competitive arts festival) from Wales, faucet (= tap) and fall (= autumn) from North America, fortnight (= two weeks) and nappy (= baby wear) from Britain, loch (= lake) and wee (= small) from Scotland, dunny (= money) and duppy (= ghost) from Jamaica, lakh (= a hundred thousand) and crore (= ten million) from India, and many more.

Panel 1: Varying estimates

Shakespeare had one of the largest vocabularies of any English writer, some 30,000 words. (Estimates of an educated person's vocabulary today vary, but it is probably about half this, 15,000.) (Robert McCrum, et al., The Story of English, 1986, p. 102)

He [Shakespeare] has the largest vocabulary of any writer in English, approximately 34,000 words, which is about double what an educated person uses today in their lifetime. (John Barton, in The Story of English, episode 3)

At two years old the average vocabulary is about three hundred words. By the age of five it is about five thousand. By twelve it is about 12,000. And there, for most people, it rests - at the same size repertoire employed by a popular daily newspaper. (Jane Bouttell, The Guardian, 12 August 1986)

Graduates have an average vocabulary of about 23,000 words, fostered, I would contend, by intensive tutoring. (Jane Bouttell, also The Guardian)

ENGLISH TODAY No. 12 - OCTOBER 1987 11

DAVID CRYSTAL read English at University College London, and has since held posts in linguistics at the University College of North Wales, Bangor, and at the University of Reading, where he taught for twenty years. He works currently as a writer, lecturer, and broadcaster on language and linguistics, maintaining his academic links through an honorary professorship in linguistics at Bangor. He is the editor of Linguistics Abstracts and Child Language Teaching and Therapy. Among his recent publications are Listen to Your Child, Who Cares About English Usage?, and Linguistic Encounters with Language Handicap. His most recent book is the Cambridge Encyclopedia of Language.

Regional dialect words have every right to be included in an English vocabulary count. They are English words, after all - even if they are used only in a single locality. But no one knows how many there are. Several big dictionary projects exist, cataloguing the local words used in some of these areas, but in many parts of the world where English is a mother-tongue or second language, there has been little or no research. And the smaller the locality, the greater the problem. Everyone knows that 'local' words exist: 'we have our own word for such-and-such round here'. Local dialect societies sometimes print lists of them, and dialect surveys try to keep records of them. But surveys are lengthy and expensive enterprises, and not many have been completed. As a result, most regional vocabulary - especially that used in cities - is never recorded. There must be thousands of distinctive words inhabiting such areas as Brooklyn, the East End of London, San Francisco, Edinburgh and Liverpool, none of which has ever appeared in any dictionary.

The more colloquial varieties of English - and slang, in particular - also tend to be given inadequate treatment. In dictionary-writing, the tradition has been to take material only from the written language, and this has led to the compilers concentrating on educated, standard forms. They commonly leave out non-standard expressions, such as everyday slang and obscenities, as well as the slang of specific social groups, such as the army, sport, thieves, public school, banking, or medicine. Eric Partridge once devoted a whole dictionary to this world of 'slang and unconventional English'. Some of the words it contained were thought to be so shocking that for several years many libraries banned it from their open shelves!

Keeping track of slang, though, is one of the most difficult tasks in vocabulary study, because it can be so shifting and short-lived. The lifespan of a word or phrase may be only a few years - or even months. The expression might fall out of use in one social group, and reappear some time later in another. Who knows exactly how much use is still made today of such early jazz-world words as groovy, hip, square, solid, cat, and have a ball? Or how much use is made of the new slang terms derived from computers, such as he's integrated (= organised) or she's high res (= very alert, from 'high resolution')? Which words for 'being drunk' are still current: canned, blotto, squiffy, jagged, paralytic, smashed ...? And how do we get at the vast special vocabulary which has now grown up in the drugs world? Word-lovers from time to time make collections, but the feeling always exists that the items listed are only the tip of a huge lexical iceberg.

Some marginal cases

Estimating the vocabulary size of English is further complicated by the existence of hundreds of thousands of uncertain cases - words which you wouldn't feel were part of the 'central' vocabulary of the language. On the other hand, you might well feel unhappy about leaving them out. What would you do with all the abbreviations that exist, for example? A recent dictionary of abbreviated words (the impressive Acronyms, Initialisms & Abbreviations Dictionary, published by the Gale Research Company, 11th edition, 1987) lists over 400,000 entries. It includes old and familiar forms such as flu, hi-fi, deb, FBI, UFO, NATO and BA. There are large numbers of new technical terms, such as VHS (the video system), AIDS, and all the terms from computerspeak (PC, RAM, ROM, BASIC, bit) and space travel (SRB - solid rocket boosters, OMS - orbital manoeuvring system, etc.). And there are thousands of coinages which have a restricted regional currency, such as RAC (= Royal Automobile Club) and AAA (= American Automobile Association), or which reflect local organisations and attitudes - with varying levels of seriousness - such as MADD (= Mothers Against Drunk Driving) and DAMM (= Drinkers Against Mad Mothers).

Because these forms are dependent on 'bigger' words for their existence, you might well decide not to include them in your count. On the other hand, you could argue that they are often more important than the original words - and that the original words may not even be remembered or known (as many people find with such forms as AIDS). Personally, I would include them in my word count - but some dictionaries do not.

There are other marginal cases. What would you do with the names of people, places and things in the world? Should London, Whitehall, Paris, Munich, and Spain be included in your word count? You might think they should - especially knowing that many of these words are different in other languages (such as München and España). However, it isn't usual to include them as part of the vocabulary of English, because the vast majority can appear in any language. Whichever language you speak, if you walk down Pall Mall, you can refer to where you are by using the words Pall Mall in your own language. The old music hall repartee relied on this point:

A: I say, I say, I say. I can speak French.
B: You can speak French? I didn't know that. Let me hear you speak French.
A: Paris, Marseilles, Nice, Calais, Jean-Paul Sartre.

The same applies to the names of people, animals, objects (such as trains and boats), and so on. Proper names aren't part of anyone's language: they are universal. However, it's important to note the usages where these words do take on special meanings - as in Has Whitehall said anything about this? Here, Whitehall means 'the government'; it isn't just a place name. Dictionaries would usually include this kind of usage in their list. But it's not at all clear how many uses of this kind there are.

Fauna and flora present a further type of difficulty. Around a million species of insects have already been described, for example, which means that there must be around a million designations available to enable English-speaking entomologists to talk about their subject. How much of this can be included in our word count? The largest dictionaries already include hundreds of thousands of technical and scientific terms, but none of them includes more than a fraction of the insect names - usually just the most important species. Add this total to that required for birds, fish, and other animals, and the theoretical size of English vocabulary increases enormously.

In the light of these problems, it may not be possible to arrive at a satisfactory total for English vocabulary. But one thing is plain: the entry totals cited for such works as the unabridged Oxford English Dictionary or Webster's Third New International reflect only the core vocabulary, and are a considerable underestimate (see Panel 2). These totals focus on a figure of about half a million. However, if we allow in some of the above categories, this figure must be increased by a factor of three or four. I would never want to go below one million, for an estimate of English vocabulary, and with very little persuasion I would readily accept two.

Panel 2: Lexical coverage of three unabridged US dictionaries

A hint of the extent to which any given dictionary underestimates the total word-stock of English can be obtained from the table below, which lists the bold-face words found as initial items in the entries of three unabridged American dictionaries (variants later in the entry's opening line have been excluded). Of the 48 possible items listed, coverage ranges from 70% to 35%. Only nine words appear in all three dictionaries - less than 20% overlap. This figure is not much increased even if RH's proper names are excluded from consideration. The same story emerges if pairs of dictionaries are compared. There is an overlap of 13 between WIII and RH, of 11 between RH and WBE, and of 10 between WIII and WBE, suggesting that, if this sample is representative, the average overlapping coverage (as defined by headwords) between any two dictionaries might be as low as 25%.

[Table: the bold-face headwords from saba to sabbaticalness, as entered in Webster III (WIII), World Book Encyclopedia (WBE) and Random House (RH). Totals: WIII 34, WBE 22, RH 17.]

How large is your vocabulary?

There seems to be no more agreement about the size of an adult's vocabulary than there is about the total number of words in English. Estimates do indeed vary, as we have seen. Part of the problem, I imagine, is what is meant by 'educated'. But whether we are educated or not, how can we find out the truth of the matter?

We might tape record everything we said and heard for a month, or a year, and keep a record of everything we read and wrote. Then we could tabulate all the words, mark which ones we understood and which we failed to understand, and count up. But life is too short.

An alternative, which can be carried out in a couple of hours, gives a fairly good idea. You take a medium-sized dictionary - one which contains about 100,000 entries - and test your knowledge of a sample of the words it contains. A sample of about 2% of the whole, taken from various sections of the alphabet, gives a reasonable result. In other words, if such a dictionary were 2000 pages long, you would have a sample of 40 pages. Use the following procedure.



• It's wise to break this sample down into a series of selections, say of 5 pages each, from different parts of the dictionary. It wouldn't be sensible to take all 40 pages from the letter U, for instance, as a large number of these words would begin with un-, and this would hardly be typical. On the other hand, prefixes are an important aspect of English word formation, so we mustn't exclude them entirely. Similarly, it would be silly to include a section containing a large number of scientific words (such as the section containing electro-), or rare words (such as those beginning with X).

• One possible sample, which tries to balance various factors of this kind, would take sections of 5 complete pages from each of the following parts of the dictionary: C-, EX-, J-, O-, PL-, SC-, TO- and UN-. These eight sections of 5 pages give the 40-page sample. Begin with the first full page in each case - in other words, don't include the very first page of the C section, if the heading takes up a large part of the page; ignore the first few EX- entries, if they start towards the bottom of a page; and so on.

• Draw up a table of words like the one in Panel 3. On the left-hand side write in the headwords from the dictionary, as they appear. Do not include any parts of words which the dictionary might list, such as caco- or -caine, but do include words with affixes, such as cadetship alongside cadet, even if the former is listed only as -ship within the entry on cadet. In short, include all items in bold face within an entry. Include phrases or idioms (e.g. call the tune). Ignore alternative spellings (e.g. caesarian/cesarean).

• The table has two columns: the first asks you to say whether you think you know the word, from having heard or seen it used; the second whether you think you actually use it yourself in your speech or writing. This is the difference between passive and active vocabulary. Within each column, there are three judgments to be made. For passive vocabulary, you ask 'Do I know the word well? vaguely? or not at all?'. For active vocabulary, you ask 'Do I use the word often? occasionally? or not at all?'. Place a tick in the appropriate column. If you are uncertain, use the final column. You may need to look at the definition or examples given next to the word before you can decide. Ignore the number of meanings the word has: if you know or use the word in any of its meanings, that will do. (Deciding how many meanings of a word you know or use would be another - much vaster - project!)

• When you've finished, add up the ticks in each column, and multiply the total by 50 (if the sample was 2% of the whole). The total in the first column is probably an underestimate of your vocabulary size. And if you take the first two columns together, the total will probably be an overestimate.

Panel 3: Part of one page of the [...] Dictionary of the English Language (90,000+ headwords), scored in two groups of columns: KNOWN (Well, Vaguely, No) and USED (Often, Occasionally, Never). + = known/used.

This procedure of course doesn't allow for people who happen to know a large number of non-standard words that may not be in the dictionary (such as local dialect words). If you are such a person, the figures will have to be adjusted again - but that will be pure guesswork.

Here are the estimates for the first two columns, as filled in by a female office secretary in her 50s:

WORDS KNOWN
Well: 30,050    Vaguely: 8,250    Total: 38,300

WORDS USED
Often: 16,300    Occasionally: 15,200    Total: 31,500

The results are interesting. Note that passive vocabulary is much larger than active. This will always be the case. You will also find that it's easier to make up your mind about the words you definitely know than the words you frequently use.

Even allowing for wishful thinking, sampling bias, and other such factors, it would seem that some of the widely quoted estimates of our vocabulary size are a long way from reality. Comparisons with Shakespeare or other past writers are meaningless, given the enormous increase in English vocabulary since his day. What I would now very much like to know is (a) whether this procedure can be tightened up in some way, or whether a better procedure can be suggested, and (b) what range of totals emerges from people of varying backgrounds and ages.

ET will publish in due course a range of vocabulary estimates from readers who have tried out the procedure for themselves (or, if they prefer, have tried it out on a 'friend'). If you do send in these details, please make sure you include data on age, educational background, and occupation, as well as the dictionary you used. The results will always be interesting, and may be surprising. If nothing else, it can provide you with a good topic for parties. There really isn't a way of capping such observations as 'I have an active vocabulary of approximately 38,600 words'. It will be a safe conversation-stopper - unless, that is, you encounter another ET reader at the same party.
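The scoring step of the procedure (add up the ticks in each column, then multiply by 50 for a 2% sample) can be checked mechanically. What follows is a minimal Python sketch of that arithmetic, assuming a 2000-page dictionary sampled as eight 5-page sections. The names `TickCounts`, `scale` and `estimate` are hypothetical, and the tick counts used (601, 165, 326, 304) are not reported in the article: they were back-calculated from the secretary's published figures by dividing each by 50.

```python
from dataclasses import dataclass

# Hypothetical sampling plan following the article: 5 complete pages
# from each of 8 parts of the dictionary, giving a 40-page sample.
SECTIONS = ["C-", "EX-", "J-", "O-", "PL-", "SC-", "TO-", "UN-"]
PAGES_PER_SECTION = 5
SAMPLE_PAGES = len(SECTIONS) * PAGES_PER_SECTION  # 40


@dataclass
class TickCounts:
    """Raw ticks from the sampled pages, one field per scored column."""
    known_well: int
    known_vaguely: int
    used_often: int
    used_occasionally: int


def scale(ticks: int, dictionary_pages: int) -> int:
    """Scale a sample count up to the whole dictionary, rounding down.

    For a 2000-page dictionary and a 40-page sample this multiplies by 50.
    """
    return ticks * dictionary_pages // SAMPLE_PAGES


def estimate(t: TickCounts, dictionary_pages: int = 2000) -> dict[str, int]:
    """Return the four scaled figures plus the KNOWN and USED totals."""
    known_well = scale(t.known_well, dictionary_pages)
    known_vaguely = scale(t.known_vaguely, dictionary_pages)
    used_often = scale(t.used_often, dictionary_pages)
    used_occasionally = scale(t.used_occasionally, dictionary_pages)
    return {
        "known_well": known_well,  # probably an underestimate of passive vocabulary
        "known_vaguely": known_vaguely,
        "known_total": known_well + known_vaguely,  # probably an overestimate
        "used_often": used_often,
        "used_occasionally": used_occasionally,
        "used_total": used_often + used_occasionally,
    }


# Tick counts back-calculated from the published totals (each divided by 50).
result = estimate(TickCounts(601, 165, 326, 304))
print(result["known_total"], result["used_total"])  # 38300 31500
```

Each column is scaled separately and then summed. Because 2000 is an exact multiple of 40 this gives the same totals as summing first; with other page counts the integer division rounds each figure down, so the totals could differ by a word or two.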
