
Verbal roots in the Sanskrit Wordnet

Malhar Kulkarni and Pushpak Bhattacharyya

Indian Institute of Technology, Mumbai, India.


malhar@iitb.ac.in
pb@cse.iitb.ac.in
http://www.iitb.ac.in

1 Introduction
Wordnets (WNs) are accepted worldwide as useful lexical tools for Natural
Language Processing (NLP). Projects for building WNs of different languages of
the world have been under way for quite some time.¹ The scenario for Indian
languages is also encouraging. The Indian Institute of Technology Bombay (IITB)
has successfully created WNs for Hindi and Marathi.² The sites for these
resources have received more than 100,000 hits.
The importance of developing a Sanskrit WN (SWN) in the context of Indian
Languages (ILs) cannot be over-emphasised. Languages in India are broadly
categorized into three families, one of which, namely Indo-European, has
Sanskrit as a historically major language. Many modern Indian languages such as
Hindi, Marathi, Bengali, Gujarati, Punjabi and Oriya have a substantial number
of borrowed Sanskrit words. Even the grammars of these languages have categories
of words called tadbhava (generated from Sanskrit) and tatsama (similar to
Sanskrit). It follows that SWN can provide a natural platform for integrating
IL WNs. Several institutes and scholars have been trying to undertake the task
of building SWN with various strategies. Not much of substance, however, is
visible on this front. The main issue regarding the structure of SWN that comes
up in discussion is that, while building the SWN, traditional knowledge bases
(śāstric knowledge) should be used, and one should not blindly follow the
structures of existing WNs, which are based on western concepts.
It is this particular aspect that the present paper aims to study.

2 Main Aim of the Paper


We aim to apply existing theories of two traditional schools, namely Vyākaraṇa
and Navya-Nyāya, to the construction of SWN. It is indeed a great privilege for
us to have theories propounded by these schools as a base, which may not be the
case for other Indian languages. We aim to use the Vaiśeṣika ontology as
developed by Navya-Nyāya on the one hand, and the Kāraka theory as well as the
semantic structure theory developed by the Vyākaraṇa school on the other. Since
the morphology of Sanskrit is very rich, and since the syntax is said to be
embedded in the morphology, morphology has a large influence on all of these
theories. We cannot do away with morphological considerations while building
SWN.

¹ http://www.globalwordnet.org/gwa/wordnet_table.htm
² www.cfilt.iitb.ac.in

Some attempts have been made so far to propose schemes for a Sanskrit WN.
One is that of Behra et al., a Sanskrit WN which had only 22 synsets. S. Mohanty,
K.P. Das Adhikary, P.K. Santi and G.P. Rout [5] presented a structure of a
proposed Sanskrit WN. This was a general structure of limited use. Although it
recognized four types of words in Sanskrit, namely Yaugika, Yogarudha, Rudha and
Yaugika Rudha, it focused entirely on nouns. It also suggested using the
Vaiśeṣika ontology, which is well accepted. It did not, however, take into
consideration the verbal roots, which form the morphological core of the
Sanskrit language and on which a large number of Sanskrit nouns are based. An
effective use of verbal roots would serve a major goal of a WN, namely word
sense disambiguation, for Sanskrit in particular and for other Indian languages
in general.
We here propose the following:

1. A structure based on the verbal roots: We believe we are well supported
here by the traditional school of Vyākaraṇa, which says: sakalaśabdānāṃ
dhātumūlatvāt (Parama-Laghu-Mañjuṣā) ("since all words are derived
from verbal roots").
2. Create synsets of verbal roots and not of verbal forms: The main reason,
among other obvious ones, is the large number of verbal forms, which
can be stored and used with the help of a Morphological Analyser. We have
taken, for example, all the roots meaning gati (movement) from all the
dhātupāṭhas. We note that more than 300 such verbal roots are recorded in
the dhātupāṭhas (a list is attached). They all form members of the synset
of the concept gati/gamana. We propose to have the following features in
SWN:
(i) Semantic Tree- This is useful in order to understand the semantic and
syntactic structure of the verbal root as well as of the nouns that are
generated from it. It will be of the following nature (Figure 1):

Fig. 1. Semantic and Syntactic Structure of Verbal Roots



Table 1. Verbs and Upasargas

verbal sense  verbal roots  upasarga + verbal root  changed verbal sense  related verbal roots
gati          gam           ava+gam                 jñāna                 jñā
                            adhi+gam                                      budh
              hṛ            sam+hṛ                  hanana                han
                            pra+hṛ                                        hiṃs

In this structure, x and y are not the same object, and such roots are called
sakarmaka. Wherever the two are one and the same object, the roots are
called akarmaka. This information will be available from a semantic tree
bank that will be developed for all the synsets of the verbal roots.
(ii) Upasarga and meaning change- It is said that upasargas change the
meaning of the verbal roots:
Upasargeṇa dhātvartho balādanyatra nīyate|
Prahārāhārasaṃhāravihāraparihāravat||
(That is, the meaning of the dhātu is perforce taken elsewhere by the
upasarga; just as in the case of hṛ, when preceded by pra it means to
strike, when preceded by ā it means to eat, when preceded by sam it
means to kill, when preceded by vi it means to enjoy, and when preceded
by pari it means to solve.)
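
The effect of upasargas described above can be pictured as a lookup from an (upasarga, root) pair to a changed sense, and from that sense to the synset of roots that natively carry it. The following is a minimal sketch under stated assumptions, not the SWN implementation: the names `UPASARGA_GLOSS`, `TARGET_CONCEPT`, `ROOT_SYNSETS` and `related_roots` are hypothetical, the glosses are those of the verse, and the concept links are those of Table 1.

```python
# Hypothetical tables; only the data values come from the verse and Table 1.

# English glosses of upasarga + hṛ as given in the verse.
UPASARGA_GLOSS = {
    ("pra", "hṛ"): "to strike",
    ("ā", "hṛ"): "to eat",
    ("sam", "hṛ"): "to kill",
    ("vi", "hṛ"): "to enjoy",
    ("pari", "hṛ"): "to solve",
}

# Changed verbal sense (target concept) as listed in Table 1.
TARGET_CONCEPT = {
    ("ava", "gam"): "jñāna",
    ("adhi", "gam"): "jñāna",
    ("sam", "hṛ"): "hanana",
    ("pra", "hṛ"): "hanana",
}

# Synsets of verbal roots keyed by concept ("related verbal roots" in Table 1).
ROOT_SYNSETS = {
    "jñāna": ["jñā", "budh"],
    "hanana": ["han", "hiṃs"],
}


def related_roots(upasarga, root):
    """Return the roots of the synset that upasarga+root is linked to.

    An empty list means that no link is recorded for this pair.
    """
    concept = TARGET_CONCEPT.get((upasarga, root))
    return ROOT_SYNSETS.get(concept, [])
```

With these tables, `related_roots("ava", "gam")` yields the roots listed under jñāna, while a pair with no recorded concept, such as vi+hṛ, yields an empty list.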
We propose to link the original synsets of the verbal roots with the other synsets
to which a root will logically belong once it is associated with a particular
upasarga. We would also like to store the following information regarding a
verbal root (see Figure 2).
1. Svara is useful for morphological analysis.

Fig. 2. Morphological and other information stored with verbal roots


2. Gaṇa-Vikaraṇa is useful when a root appears in more than two groups
with two or more meanings.
3. Pada is helpful in cases like bhuj.
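
The three kinds of information listed above can be thought of as fields attached to each root entry. The sketch below is only an illustration under assumptions: the class `RootEntry`, the list `ENTRIES` and the function `lookup` are hypothetical names, the svara field is left empty, and the gaṇa and pada values shown for gam and bhuj follow standard grammatical descriptions and should be checked against the dhātupāṭhas before use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RootEntry:
    """One sense of a verbal root with its morphological information."""
    root: str
    sense: str
    svara: Optional[str] = None  # accent information, left unset in this sketch
    gana: Optional[str] = None   # verb class (gaṇa), which fixes the vikaraṇa
    pada: Optional[str] = None   # parasmaipada or ātmanepada


# Illustrative entries (assumed values, not taken from the paper).
ENTRIES = [
    RootEntry("gam", "gati", gana="bhvādi", pada="parasmaipada"),
    RootEntry("bhuj", "protect", gana="rudhādi", pada="parasmaipada"),
    RootEntry("bhuj", "eat, enjoy", gana="rudhādi", pada="ātmanepada"),
]


def lookup(entries: List[RootEntry], root: str,
           pada: Optional[str] = None, gana: Optional[str] = None) -> List[RootEntry]:
    """Return the entries for a root, optionally narrowed by pada and gaṇa."""
    return [
        e for e in entries
        if e.root == root
        and (pada is None or e.pada == pada)
        and (gana is None or e.gana == gana)
    ]
```

Looking up bhuj without a pada returns both senses; supplying the pada observed by a morphological analyser narrows the result to one, which is the kind of disambiguation that item 3 refers to.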
It is essential at this juncture to take a look at how Hindi and Marathi WNs are
built and how verbal roots are treated therein.

3 Hindi and Marathi Wordnets


We have long been engaged in building lexical resources for Indian languages,
with a focus on Hindi and Marathi (http://www.cfilt.iitb.ac.in). The Hindi and
Marathi wordnets [2] and the HVKB [3] have been given special attention. The
wordnets largely follow the design principles of the Princeton WordNet [1] for
English, paying particular attention to language-specific phenomena (such
as complex predicates) whenever they arise.

3.1 Hindi and Marathi Wordnets (HWN and MWN)


The current statistics of HWN and MWN are given in Table 2, alongside those of
other wordnets for comparison:

Table 2. Current status of Wordnets

                       Total Number of Synsets  Total Unique Words
Hindi Wordnet          28,867                   64,725
Marathi Wordnet        11,908                   18,093
WordNet (2.1)          117,597                  155,327
GermaNet (2004)        53,312                   76,563
Multi Word Net (1.39)  32,700                   58,000

We have incorporated a supporting ontology to whose nodes the synsets are
linked; its details are given in Table 3.

Table 3. Details of ontology

Part of speech  Number of nodes
Noun            151
Verb            39
Adjective       35
Adverb          14

While HWN was created from first principles, by looking up the various listed
meanings of words in different dictionaries, MWN has been created derivatively
from HWN. That is, the synsets of HWN are adapted to MWN via addition or
deletion of synonyms in the synset.

Fig. 3. MWN synset creation

Figure 3 shows the creation of the synset for the word peR “tree” in MWN
via addition and deletion of synonyms from HWN. The synset in HWN for this
word is {peR, vriksh, paadap, drum, taru, viTap, ruuksh, ruukh, adhrip, taruvar}
“tree”. MWN deletes {peR, viTap, ruuksh, ruukh, adhrip} and adds {jhaaR} to
it. Thus, the synset for tree in MWN is {jhaaR, vriksh, taruvar, drum, taruu,
paadap} “tree”. Hindi and Marathi being close members of the same language
family, many Hindi words have the same meaning in Marathi. This is especially
so for tatsam words, which are directly borrowed from Sanskrit. The semantic
relations are borrowed directly, thus saving time and effort.
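
The derivation of an MWN synset from an HWN synset can be sketched as a deletion step followed by an addition step. This is a minimal illustration, not the actual tool: the function name `derive_synset` is hypothetical, and placing the added words first is an assumption made only for the example. Note also that the sketch carries taru over unchanged, whereas the MWN synset quoted above spells it taruu.

```python
def derive_synset(source, delete, add):
    """Derive a target-language synset from a source-language synset.

    Words in `delete` are dropped from `source`; words in `add` are placed
    in front of the words that remain. The order of kept words is preserved.
    """
    dropped = set(delete)
    kept = [word for word in source if word not in dropped]
    return list(add) + kept


# The HWN synset for "tree" and the MWN edits described in the text.
hwn_tree = ["peR", "vriksh", "paadap", "drum", "taru",
            "viTap", "ruuksh", "ruukh", "adhrip", "taruvar"]
mwn_tree = derive_synset(
    hwn_tree,
    delete=["peR", "viTap", "ruuksh", "ruukh", "adhrip"],
    add=["jhaaR"],
)
```

Because only the word list is touched, any semantic relations attached to the source synset can be carried over to the derived synset as they are.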

Synsets. The principles of minimality, coverage and replaceability govern the
creation of the synsets:

(i) Minimality: Only the minimal set that uniquely identifies the concept
is used to create the synset, e.g.,
{ghar, kamaraa} (room)
ghar- which is ambiguous- is not by itself sufficient to denote the concept of
a room. The addition of kamaraa to the synset brings out this unique sense.
(ii) Coverage: The synset should contain all the words denoting a concept.
The words are listed in order of (decreasing) frequency of their occurrence
in the corpus.
{ghar, kamaraa, kaksh} (room)
(iii) Replaceability: The words forming the synset should be mutually
replaceable in a specific context. Two synonyms may mutually replace each
other in a context C, if the substitution of the one for the other in C does
not alter the meaning of the sentence. Consider,
{svadesh, ghar} (motherland) – {apanaa desh} (the country where one is born)
amerikaa meN do  saal  bitaane ke baad shyaam svadesh/ghar lauTaa
America  in  two years stay    after   Shyam  motherland   returned
‘Shyam returned to his motherland after spending two years in America.’
The replaceability criterion is observed with respect to synonymy (semantic


properties) and not with respect to the syntactic properties (such as subcate-
gorization). For instance, the two verbs {aanaa, jaananaa} “know” appear in
the same synset for the word know. In Figure 4, the sentence frames show that
while aanaa “know” assigns dative case to the subject NP, jaananaa “know”
assigns nominative case. The two verbs {aanaa, jaananaa} “know” denote the
same concept and each may replace the other in this particular semantic context.

Fig. 4. Sentence frame for “know”

A synset in HWN (and in MWN) consists of the following elements.

A. Synset: {vidyaalay, paaThshaalaa, skuul} (school)


B. Gloss which consists of two parts.
a. The text definition that explains the concept denoted by the synset.
vah sthaan jahaaM praathamik yaa maadhyamik star kii aupachaarik
shikshaa dii jaatii hai
‘The place where formal education for primary or secondary level is
given’
b. A sample sentence that uses the word in a sentence
is vidyaalay meM pahalii se paanchavii tak kii shikshaa dii jaatii hai
‘Education from first to fifth class is given in this school’

The data is stored in the Devanāgarī script in a MySQL database. The part
of speech for each entry is listed in this database. In Figure 5 we provide sample
entries from both HWN and MWN.
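
The elements of a synset entry listed above map naturally onto two relational tables. The following sketch is not the HWN schema, which is not given here: the table and column names are hypothetical, Python's built-in sqlite3 module is used as a stand-in for MySQL so that the example is self-contained, and the words are stored in transliteration rather than in Devanāgarī.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE synset (
    id         INTEGER PRIMARY KEY,
    pos        TEXT NOT NULL,   -- part of speech
    definition TEXT NOT NULL,   -- gloss, part (a): text definition
    example    TEXT NOT NULL    -- gloss, part (b): sample sentence
);
CREATE TABLE synset_word (
    synset_id INTEGER NOT NULL REFERENCES synset(id),
    position  INTEGER NOT NULL, -- order of the word within the synset
    word      TEXT NOT NULL
);
""")


def add_synset(conn, pos, words, definition, example):
    """Insert a synset with its gloss and ordered words; return its id."""
    cur = conn.execute(
        "INSERT INTO synset (pos, definition, example) VALUES (?, ?, ?)",
        (pos, definition, example),
    )
    synset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO synset_word (synset_id, position, word) VALUES (?, ?, ?)",
        [(synset_id, i, word) for i, word in enumerate(words)],
    )
    return synset_id


def words_of(conn, synset_id):
    """Return the words of a synset in their stored order."""
    rows = conn.execute(
        "SELECT word FROM synset_word WHERE synset_id = ? ORDER BY position",
        (synset_id,),
    )
    return [row[0] for row in rows]


school = add_synset(
    conn,
    "noun",
    ["vidyaalay", "paaThshaalaa", "skuul"],
    "vah sthaan jahaaM praathamik yaa maadhyamik star kii "
    "aupachaarik shikshaa dii jaatii hai",
    "is vidyaalay meM pahalii se paanchavii tak kii shikshaa dii jaatii hai",
)
```

Keeping the words in a separate table with a position column preserves the frequency ordering required by the coverage principle.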

Fig. 5. HWN and MWN Sample Entry

Lexical Relations. HWN incorporates commonly used semantic and lexical
relationships along with a few new ones. A brief description follows:

1. Antonymy is a lexical relation indicating ‘opposites’. For instance,
{moTaa, sthuulkaay} ‘fat’ → {patlaa, dublaa} ‘thin’
patlaa (thin) is the antonym of moTaa (fat) and vice versa. The HWN also
indicates the criterion under which the antonymy holds. In the above
example, the antonymy criterion is size. Other criteria are given in Table 4.

Table 4. Criteria for Antonymy

Criterion    Examples                          Gloss
Size         (chhoTaa-badzaa, moTaa-patlaa)    small-big, thick-thin
Quality      (achchhaa-buraa, pyaar-ghriNaa)   good-bad, love-hatred
Gender       (beTaa-beTii, maataa-pitaa)       son-daughter, mother-father
State        (shuruu-ant)                      beginning-end
Personality  (raam-raavaN)                     Rama-Ravana
Direction    (puurv-pashchim, aage-piichhe)    east-west, front-behind
Action       (lenaa-denaa, khariid-bikrii)     take-give, buy-sell
Amount       (kam-jyaadaa, halkaa-bhaarii)     little-much, light-heavy
Place        (duur-paas)                       far-near
Time         (din-raat, subaha-shaam)          day-night, morning-evening

2. Gradation is a lexical relation that represents possible intermediate states
between two antonyms. Figure 6 shows the gradation relation among time
words.

Fig. 6. Gradation relation

3. Hypernymy and Hyponymy encode lexical relations between a more
general term and specific instances of it.
{belpatra, belpattii, bilvapatra} ‘a leaf of a tree named bel’
→ {pattaa, paat, parN, patra, dal} ‘leaf’
Here, belpatra (a leaf of a tree named bel) is a kind of pattaa (leaf). pattaa
(leaf) is the hypernym of belpatra (a leaf of a tree named bel) and belpatra
(a leaf of a tree named bel) is a hyponym of pattaa (leaf).
4. Meronymy and Holonymy express the part-of relationship and its inverse.
{jaR, muul, sor} ‘root’ → {peR, vriksh, paadap, drum} ‘tree’
Here, jaR (root) is a part of peR (tree), which implies that jaR (root) is the
meronym of peR (tree) and peR (tree) is the holonym of jaR (root).
5. Entailment is a semantic relationship between two verbs. Any verb A entails
a verb B, if the meaning of B follows logically and is strictly included in the
meaning of A. This relation is unidirectional. For instance, snoring entails
sleeping, but sleeping does not entail snoring.
{kharraaTaa lenaa, naak bajaanaa} ‘snore’ → {sonaa} ‘sleep’
6. Troponymy is a semantic relation between two verbs when one is a specific
“manner” elaboration of another. For instance,
{dahaaRanaa} ‘to roar ’ is the troponym of {bolanaa} ‘to speak ’
7. Cross-linkage between different parts of speech: The HWN also links
synsets across different parts of speech. These links have not been taken from
the EWN. Links between “nouns” and “verbs” include the following:
(a) Ability link specifies the features inherited by a nominal concept. For
example,
{machlii, macchii, matsya, miin, maahii} ‘fish’ → {tairnaa, pairnaa,
paunrnaa} ‘swim’
(b) Capability link specifies features acquired by a nominal concept. For
example,
{vyakti, maanas} ’person’ → {tairnaa, pairnaa, paunrnaa} ‘swim’
(c) Function link specifies function(s) associated with a nominal concept.
For example,
{adhyaapak, shikshak} ‘teacher ’ → {paRhanaa, shikshaa denaa} ‘teach’
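
The relations enumerated above differ in whether they have an inverse and whether they carry extra information such as the antonymy criterion. A minimal sketch of a relation store that respects these differences follows; the class `RelationGraph`, the table `INVERSE` and the relation labels are hypothetical, and each synset is represented by a single head word for brevity.

```python
from collections import defaultdict

# Relations that come in inverse pairs; antonymy is its own inverse.
# Entailment and troponymy are stored in one direction only.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}


class RelationGraph:
    """Typed, directed links between synsets (named by a head word)."""

    def __init__(self):
        # (source, relation) -> {target: criterion or None}
        self._edges = defaultdict(dict)

    def add(self, source, relation, target, criterion=None):
        """Record that `target` is the `relation` of `source`."""
        self._edges[(source, relation)][target] = criterion
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._edges[(target, inverse)][source] = criterion

    def related(self, source, relation):
        """Return the set of synsets that are the `relation` of `source`."""
        return set(self._edges.get((source, relation), {}))

    def criterion(self, source, relation, target):
        """Return the criterion stored on a link, if any."""
        return self._edges.get((source, relation), {}).get(target)


g = RelationGraph()
g.add("moTaa", "antonym", "patlaa", criterion="size")
g.add("belpatra", "hypernym", "pattaa")        # pattaa is the hypernym of belpatra
g.add("jaR", "holonym", "peR")                 # peR is the holonym of jaR
g.add("kharraaTaa lenaa", "entails", "sonaa")  # unidirectional
g.add("bolanaa", "troponym", "dahaaRanaa")     # dahaaRanaa is a troponym of bolanaa
```

Adding the hypernym link for belpatra automatically records the hyponym link in the other direction, whereas the entailment from snoring to sleeping produces no link from sonaa back to kharraaTaa lenaa. The cross-part-of-speech links (ability, capability, function) could be stored in the same way as further one-directional labels.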
Links between “nouns” and “adjectives” are used to indicate typical properties
of a noun, for example {sher} ‘tiger’ → {maansaahaarii} “carnivorous”. Links
between morphologically derived forms mark the root form from which a particular
word is derived by affixation. For example, {bhaaratiiyataa} “indianness”
is derived from {bhaaratiiya} “Indian” and is linked to it. In Figures 7 and 8
below we show the web interfaces for HWN and MWN, and in Figure 9 the
data entry interface.

Fig. 7. Web interface for Hindi Wordnet

Fig. 8. Web interface for Marathi Wordnet



Fig. 9. HWN data entry interface

4 Conclusion and Future Work

We propose to maintain the core structure of a WN as it is while building the
Sanskrit WN, in the sense that the nodal elements will be synsets, which will be
linked by lexical and semantic relations. What we propose to add is a
language-specific approach, which will include storing information related to
morphology. This way of storing verbal roots will cover almost all the Yaugika
words, as well as some of the Yogarudha words.

References
1. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database. MIT Press (1998)
2. D., C., Bhattacharyya, P.: Creation of English and Hindi verb hierarchies and
their application to Hindi WordNet building and English-Hindi MT. In: Proceedings
of the Second Global Wordnet Conference, Brno, Czech Republic (2004)
3. G.B., P.: The Sanskrit Dhatupathas: A Critical Study. University of Poona, Pune
(1961)
4. G.B., P.: A Concordance of Sanskrit Dhatupathas. Deccan College, Post Graduate
Studies and Research Institute, Pune (1953)
5. S. Mohanty, K.P. Das Adhikary, P.K. Santi, G.R.: Proposed model of Sanskrit
word-net in concept capability of Sanskrit word-net: for convergence of
knowledge-base. In: Convergence 2003 (2003)
6. V.B. Bhagwat: Paramalaghumañjuṣā with Marathi Translation. Dept. of Philosophy,
University of Poona, Pune (2000)
