
Old, L. J. (2003). An analysis of semantic overlap among English prepositions in Roget’s Thesaurus. In P. Saint-Dizier (Ed.), Proceedings of the Association for Computational Linguistics SIG Semantics Conference (ACL-SIGSEM) (pp. 13-19). Toulouse: IRIT.

An Analysis of Semantic Overlap among English Prepositions in Roget’s Thesaurus.

L. John Old
School of Computing, Napier University
10 Colinton Road, Edinburgh, EH10 5DT
E-mail: j.old(at)napier.ac.uk

Abstract. Using prepositions, related words from other parts of speech, and senses listed in
Roget’s Thesaurus, this paper discusses and illustrates the complex relationships between and
among prepositions and other basic parts of speech. The pattern of genus and differentiae
emerging from the complex relationships between words and senses suggests that prepositions
cannot be viewed in isolation, and that a natural, and even optimal, organization of semantics
exists that may explain why current methods of classification and partitioning of words and
senses sometimes result in confusion.

Keywords: Preposition, word class, part of speech, genus and differentiae, lattice, Formal
Concept Analysis, Roget’s Thesaurus

1. Introduction

Prepositions, as a class of words, have been referred to as a closed set. The “set” is the set of words that are eligible to be called prepositions. It is closed probably as a consequence of the fact that the words defined as (or classed as) prepositions describe a limited set of concepts (for example spatial and temporal relations) that don’t change--unless our consensual reality changes.

Prepositions are not, on the other hand, a stable set. The semantics of individual prepositions is mutable across time, and among related languages. Non-standard or idiomatic use of prepositions can become the standard, while the “correct” or traditional usage goes out of fashion. Or not… An educated Scot uses the word outwith (archaic to some¹) where the average English speaker would instead use outside of, or except. Outwith is a perfectly good preposition and unambiguous to its users.

¹ From Middle English, according to Webster’s 3rd Edition, 1965. Though it is not in Roget’s Thesaurus, outwith the law (illegal), is.

An English speaker standing before a house might say that the rear garden is beyond, to the back of, or behind the house, but never after the house; a Dutch speaker would say it is “achter het huis.” Achter means “after.” It has the same Indo-European language root as after, and has the same basic semantics² in both languages. Even though Dutch and English are about as close as any two languages can be without being dialects, this preposition has evolved to be used in different ways.

² It is still acceptable English to say, “Take the first turn right after the set of lights;” or “After you :)”--but we are more likely to use it in its analogous temporal form: “… after 10 o’clock;” or “… after I get up.”

Words that are prepositions do not have a clear semantics even within the same language. Where a teacher speaking American English, referring to a poorly written essay, might tell a student to “do it over,” a British teacher would only ever say, “do it again.” Even prepositions commonly considered synonyms may vary or disagree in the senses they describe. Above can be a synonym of over in the sense of “higher up,” but not in the sense of “across”--one may live across the road or over the road, but not above the road (and still mean the same thing).

Furthermore, there is considerable overlap between the set of words that are called prepositions and words from other word classes (parts of speech). Crystal [1989, p. 92] points out that word classes:

    … are not as nearly homogeneous as the theory implies. Each class has a core of words that behave identically, from a grammatical point of view. But at the “edges” of a class are the more irregular words, some of which may behave like words from other classes.

This paper offers no solutions to this apparent confusion, but only attempts to illustrate it as a natural, and even desirable, feature of prepositions, and of language.

The prepositions used here are drawn from the 411 entries found in Roget’s International Thesaurus [1962]. This is the “American” edition of the thesaurus. Roget’s Thesaurus is used because it groups words of similar meaning together, by part of speech. WordNet [Miller et al., 1993] does the same, and contains a richer set of relations, but does not contain prepositions. The comparisons made here between prepositions and other parts of speech are limited to nouns, verbs, adjectives and adverbs.

2. Overlap among Parts of Speech

Prepositions are a small set compared to other parts of speech. While prepositions are a closed set, nouns are ever increasing as science and technology advance and new words are needed to describe new concepts. Other parts of speech are being added to as well (for example, to be “ENRONed”), though not as rapidly. Table 1 shows the word-count-by-part-of-speech for words in Roget’s Thesaurus.

POS           Count    PCent
Noun          69017    57.4%
Adjective     23171    19.3%
Verb          21368    17.8%
Adverb         6346     5.3%
Preposition     411     0.3%

Table 1. Part of speech count of words in Roget’s Thesaurus

Other lexicons will have different numbers but the distribution will be about the same. In Table 1 words are counted only once per part of speech. The word line, for example, is found as a noun entry in 20 different thesaurus senses but is counted here only once as a noun. The preposition after is just one of the 411 prepositions counted here. It is also counted once under each of the other parts of speech as its 13 senses, or entries, are spread across all five parts of speech.³ The difference between entries and words is that an entry represents one sense-instance of a word, while a word is a particular string of characters. So after, with 13 senses, is represented in Roget’s Thesaurus by 13 entries. Four of those senses are prepositional, so the word after has four prepositional entries.

³ After is found as a synonym of afternoon and evening in one nominal thesaurus sense.

Approximately half of the prepositions found in Roget’s Thesaurus have more than one sense and so are polysemous. Many of those words are elsewhere in the thesaurus classified under different parts of speech. In Figure 1 the percentage of overlap among parts of speech has been illustrated graphically using pie charts. In this case entries were chosen, rather than words. Entries found classified under only one part of speech are ignored here, as they do not contribute to the analysis of overlap between parts of speech, and also because the more than 105,000 unique entries in this category (of the total 200,000 thesaurus entries) would make the overlapping entries for smaller parts of speech invisible. So betwixt, for example, which occurs only as a preposition, is ignored. After, which occurs in all five word classes, is included in the calculations for all five pie charts.

The arrows serve as a rough indicator of the main allegiance owed by a word class to another word class. For example, verbs and nouns share a high percentage of words (77% and 87% respectively), indicated by a thick, double-headed arrow; 47% of prepositions are also adverbs (indicated by a thick arrow) and 32% are also adjectives (indicated by a narrower double-lined arrow); and 57% of adverbs are also adjectives (indicated by a thick arrow). The relative proportions shown here are not normalized numbers for each word class (for example there are many more nouns and verbs than prepositions), but a clear indication, at least, is present in the illustration.

Note that among the different parts of speech only adverbs (that is, 8% of adverbs that occur in other parts of speech) are also found as prepositions in any significant numbers. Those same entries constitute the 48% of entries represented on the Prepositions pie chart labeled “4 48%” (in white).

[Figure 1: five pie charts, one each for Nouns, Verbs, Adjectives, Adverbs and Prepositions, connected by arrows. Key: 1 Nouns, 2 Verbs, 3 Adjectives, 4 Adverbs, 5 Prepositions.]

Figure 1. Percentage overlap between parts of speech in Roget’s Thesaurus

Overlap between prepositions and other parts of speech.

In real numbers, 287 words classified as prepositions in Roget’s Thesaurus are also found in senses other than those classed as prepositional. For example, 33 of these words also occur as nouns⁴. Table 2 shows the actual overlap in terms of word-counts shared (including conjunctions). These overlaps are formed with 198 of the 411 prepositions. There are a further 213 prepositions that do not overlap with any other part of speech.

⁴ An over (Nn) is a cricket term for a period of play--but that sense is not included in this American edition of Roget’s Thesaurus. Examples of verbs are “to further a cause” and “to near a conclusion.”

POS           Overlap
Adverb        137
Adjective      87
Noun           33
Conjunction    18
Verb           12

Table 2. Number of words

3. Part of Speech Overlap for the Preposition Over

In Figure 2 the overlap between prepositions that occur as synonyms of over in various senses with various parts of speech can be seen represented as a “concept lattice” [Wille, 1982]. This forms a kind of topology of over, its senses, and the words that are found accompanying it in those senses--its synonyms. The lattice includes only “shared” synonyms of over--those words that occur with over in more than one sense. As with Figure 1, the words that have been omitted occur in only one part of speech and do not contribute to the connectivity or overlap between parts of speech, or senses. They would however differentiate or discriminate senses which otherwise contain identical sets of words. This is discussed further under the Section, Genus and Differentiae, below.

Figure 2. Lattice showing the topology of relationships between entries, senses and parts of speech for the word over.

A concept lattice is generated automatically from a relation between two sets, objects and attributes. In this example the objects are words from Roget’s Thesaurus while their attributes are the senses of the words. A polysemous word can occur in more than one sense (as several entries) and a sense can contain more than one word; hence the graph structure formed is a lattice, not a tree. The nodes/circles are called concepts and are labeled above by the index numbers of the senses and below by words found in those senses. Index numbers are of the form Category#.Paragraph#.Sense# (for example, 227.40.1 is Category 227, Paragraph 40, Sense 1).

Though a concept is defined as the set of all of its attributes (senses) and all of its objects (words), for economy of representation words and index numbers are used as labels only once. Words label the lowest concept in which they occur and index numbers label the highest concept in which they occur. Thus a lattice is a partial ordering, where concepts higher in the lattice structure are labeled by senses that contain more synonyms, and concepts lower in the lattice are labeled by senses that contain fewer synonyms. Symmetrically, concepts lower in the lattice are labeled by words that have more senses, and concepts higher in the lattice are labeled by words that have fewer senses. No information is lost through this method of labeling only once per word and once per sense--the complete sets of senses and words can be read from the lattice as illustrated in the following examples.

Senses are read off the concept lattice top down. To the top and right of the centre of the lattice can be seen sense 227.40.1, a prepositional sense from Category 227, Covering. This sense of over contains the following set of entries that share more than one sense with over: {on top of, on, upon, above, over, o’er}. These entries can be found on the lattice by following the lines (or links) down from the Covering concept, as follows: the concept below and to the left is labeled with on top of; the concept below and to the middle is labeled with o’er;
following the link down to the right there is a concept labeled with upon and on; and finally, the concept below and linked to both the lower-left and middle concepts (labeled with on top of and o’er), is labeled with above. Together these labels make up the set of shared entries, or synonyms, of over found in Roget’s Thesaurus Category 227, Covering, Paragraph 40, Sense 1.

The four senses labeling the bottom node contain no other entries (besides over) that are found in more than one sense of over. The top node is unlabeled as there is no sense which contains all of the words.

To find the senses of a particular word the lattice is read from the bottom up. So for example the word over, which is found in all senses, labels the lowest concept--all of the senses of over can be found by tracing the lines up (and conversely, all of the senses can be seen to contain the word over by tracing the lines down from them).

Above has six senses shared with over, {36.13.1; 206.24.2; 206.27.4; 227.40.1; 661.27.1; 40.10.1}, three of which are adverbial, one of which is prepositional, and two of which are adjectival. These can be identified and read off the lattice by tracing the lines up from the concept that is labeled with above.

The scope of the concept labeled with above, reading the lattice upwards, is the set of six senses of above; while the scope of the same concept, reading downwards, is the set of words that are contained as synonyms in the two senses that label that concept (over and above). In Formal Concept Analysis [Wille, 1989] the set of objects (the set of words) is called the extent of a concept; and the set of attributes (the set of senses), the intent of the concept.

It is not necessary to navigate the lattice expertly or understand the underlying mathematical formalism. Simply comparing adjacent concepts should convince the reader that this automatically-derived graphic has presented the senses of over in a coherent way, a way which supports Brugman and Lakoff’s [1988] assertion that senses of a word are related and that there are gradual transitions, or transformations, as one navigates from closely to more distantly related senses. Similar lattices can be derived for any word in Roget’s Thesaurus that has senses crossing part of speech boundaries. Figure 3 shows the concept lattice of above, also restricted to synonyms that occur in more than one sense. Six of the seven senses are shared with over (cf. Figure 2). The seventh sense differentiates above from over in this lattice.

The automatically constructed lattices show that many closely related adjectives, adverbs and prepositions may be selected by focusing on a single word, and illustrate the overlap and blending among parts of speech, and among some words. These words are examples of the type described by Crystal [1987] as being at the “edges” of the word classes. They are the glue that ties the senses together, and incidentally, some of the most common (polysemous and high-frequency-usage) words in the thesaurus.
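
The lattice construction described in this section can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the procedure used to draw Figures 2 and 3: it builds a toy formal context (words as objects, senses as attributes) and enumerates its concepts by brute force. The membership of sense 227.40.1, and the fact that over and above share senses 227.40.1, 206.24.2 and 40.10.1, come from the text. Every other membership, the placeholder sense labels hyp.1 and hyp.2, and the function names intent, extent and formal_concepts are invented for the example.

```python
from itertools import combinations

# Toy formal context: word -> set of sense index numbers.
# Grounded in the text: sense 227.40.1 contains all six words, and
# over and above share 227.40.1, 206.24.2 and 40.10.1.
# ASSUMED for illustration: every other membership, and the
# placeholder senses "hyp.1" and "hyp.2".
CONTEXT = {
    "over":      {"227.40.1", "206.24.2", "40.10.1", "hyp.1", "hyp.2"},
    "above":     {"227.40.1", "206.24.2", "40.10.1"},
    "on top of": {"227.40.1", "206.24.2"},
    "o'er":      {"227.40.1", "40.10.1"},
    "upon":      {"227.40.1", "hyp.2"},
    "on":        {"227.40.1", "hyp.2"},
}
ALL_SENSES = frozenset().union(*CONTEXT.values())


def intent(words):
    """Senses shared by every word in `words` (all senses if empty)."""
    shared = set(ALL_SENSES)
    for word in words:
        shared &= CONTEXT[word]
    return frozenset(shared)


def extent(senses):
    """Words that occur in every sense in `senses`."""
    wanted = set(senses)
    return frozenset(w for w, s in CONTEXT.items() if wanted <= s)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every word subset."""
    found = set()
    words = list(CONTEXT)
    for size in range(len(words) + 1):
        for subset in combinations(words, size):
            shared = intent(subset)
            found.add((extent(shared), shared))
    return found


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) downwards.
    for ext, inten in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(ext), "<->", sorted(inten))
```

In this toy context the concept whose extent is {over, above} has the three shared senses as its intent, and over, which occurs in every sense, sits alone in the lowest concept, as in the description of Figure 2. Unlike Figure 2, the top concept here is labeled by 227.40.1, because the toy data contains no word outside that sense. The subset enumeration is exponential in the number of words, so it is suitable only for small examples like this one.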

Figure 3. Lattice of above.

4. Genus and Differentiae

In contrast to Brugman and Lakoff’s “radial category” of senses, there is no central sense evident in the lattice. Nonetheless, the sense with index 40.10.1 from Category 40, Addition, an adverbial sense, shares words with many of the other senses. In the thesaurus it has 37 entries. Of the 37, 24 are words that have more than one part of speech, and 31 are polysemous. Of those with more than one part of speech, 14 double as adjectives, 12 double as prepositions, 4 as nouns⁵, and 3 as verbs. Of the remaining “idiosyncratic” words (single-instance words, omitted from this lattice), additionally, moreover, and furthermore occur in the thesaurus only in this sense--they characterize it, differentiating it from other senses. They are the stripes that separate this tiger from other big cats--they distinguish this sense from other senses.

⁵ In uses such as: the more the merrier; a blast from the past; a movie extra; a real plus.

This sense, along with its idiosyncratic words, and relationships to other senses via those shared words, hints at what is at the core of the prepositional semantics; it illustrates the concept of genus and differentiae used to construct sense-definitions in dictionaries. A simplified dictionary example would be: “A cup is a type of container (genus) that has a handle (differentia number one) and is used for drinking liquids (differentia number two).” As stated earlier, Figure 2 includes only those words that share more than one sense with over. The words that do not share more than one sense with over include the differentiating entries in each of its senses. So the lattice is a kind of “genus” topology, only. The missing words are what facilitate the discrimination of senses from one another in the same way that distinguishing features allow us to recognize and differentiate individual people, and living things are differentiated amongst in biological taxonomies.

Moreover, there is a symmetric organization among words. In the same way that senses can be read down the lattice (their constituent words identified), and words can be read up the lattice (their various senses can be identified), some senses act as differentiators for words and some words act in a “genus” capacity, gluing the senses together.

Perhaps this “genus-differentiae” facet of word-sense organization has implications for the conceptual organization of the brain, but it is beyond the scope of this paper to enlarge on that. It is sufficient to say that the organization seen in the lattice emerges naturally from the data: from the semantic relationships between synonyms, and from the transitional or transformational connections between senses of polysemous words. This organization provides a natural way to arrange information in a fairly optimal fashion--so that the pieces of
information become neither isolated, nor too densely packed.

5. Conclusion

A preposition is a word (or phrase). But in Roget’s Thesaurus that specific word may be represented by many entries under separate prepositional senses. The same word, or string of characters (excluding homographs), may also have one or more entries classified under other, non-prepositional parts of speech. So, to say that over is a preposition is not to exclude it from being any other part of speech. Also, to say that over is a synonym of above is not to say that it is a synonym of above in all senses or, for that matter, for all parts of speech. To say a word “means” something, or “is” a preposition, is misleading. Outside of usage (spoken or written context), the meaning of a word can only be understood in the context of the semantics of all of its senses, synonyms, and parts of speech, together. Despite this apparently overwhelming complexity, senses of words, in context, can be disambiguated⁶ almost instantaneously by native speakers. It may not be “despite of,” but “because of” this complexity that we are able to do it.

⁶ And if not immediately disambiguated, at least identified as congruent with the current context.

References

Brugman C. and Lakoff, G., (1988). Cognitive Topology and Lexical Networks. In Small, S. I., Cottrell, G. W., Tanenhaus, M. K. (Eds.), Lexical Ambiguity Resolution, Morgan Kaufmann.

Crystal, David (1987). The Cambridge Encyclopedia of Language, Cambridge University Press, Cambridge.

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K., and Tengi, R. (1993). Five Papers on WordNet, Technical Report, Princeton University, Princeton, N.J.

Roget's International Thesaurus, 3rd Edition, Berry, L., (Ed.) Thomas Crowell Co., New York, 1962.

Webster’s Third New International Dictionary (unabridged), Gove, P. B. (Ed.), G & C Merriam, Publishers, Springfield, 1965.

Wille, R., (1982). Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts. In I. Rival, (Ed.), Ordered Sets, Reidel, Dordrecht-Boston, pp. 445-470.

Wille, R. (1989). Geometric Representation of Concept Lattices. In Opitz, O. (Ed.), Conceptual and Numerical Analysis of Data, Springer-Verlag, Berlin-Heidelberg.
