Verbal Roots Sanskrit Wordnet

Verbal roots in the Sanskrit Wordnet
Malhar Kulkarni and Pushpak Bhattacharyya
Indian Institute of Technology, Mumbai, India.

malhar@iitb.ac.in
pb@cse.iitb.ac.in
http://www.iitb.ac.in
1 Introduction
Wordnets (WN) are accepted worldwide as useful lexical tools for Natural Language Processing (NLP). Projects for building WNs of different languages of the world have been under way for quite some time.¹ The scenario for Indian Languages is also encouraging. The Indian Institute of Technology Bombay (IITB) has successfully created WNs for Hindi and Marathi.² The websites for these resources have received more than 100,000 hits.
In the context of Indian Languages (ILs), the importance of developing a Sanskrit WN (SWN) cannot be overemphasised. Languages in India are broadly categorized into three families, one of which, Indo-European, historically has Sanskrit as a major language. Many modern Indian Languages like Hindi, Marathi, Bengali, Gujarati, Punjabi, Oriya etc. have a substantial number of words borrowed from Sanskrit. Even the grammars of these languages have categories of words called tadbhava (derived from Sanskrit) and tatsama (identical to Sanskrit). It follows that SWN can provide a natural platform for integrating IL WNs. Several institutes and scholars have tried to undertake the task of building SWN with various strategies. Not much of substance, however, is visible on this front. The main issue regarding the structure of SWN that comes up in discussion is that traditional knowledge bases (śāstric knowledge) should be used while building SWN, and that one should not blindly follow the structures of existing WNs, which are based on western concepts.
It is this particular aspect that the present paper aims to study.
2 Main Aim of the Paper

We aim to apply existing theories of two traditional schools, namely Vyākaraṇa and Navya-Nyāya, to the construction of SWN. It is indeed a great privilege for us to have theories propounded by these schools as a foundation, which may not be the case for other Indian Languages. We aim to use the Vaiśeṣika Ontology as developed by Navya-Nyāya on the one hand, and the Kāraka theory as well as the semantic structure theory developed by the Vyākaraṇa school on the other. Since the morphology of Sanskrit is very rich, and its syntax is said to be embedded in its morphology, morphology has a large influence on all of these theories. We cannot do away with morphological considerations while building SWN.
¹ http://www.globalwordnet.org/gwa/wordnet table.htm
² www.cfilt.iitb.ac.in
A few attempts have been made so far to propose schemes for Sanskrit WNs. Behra et al. presented a Sanskrit WN that had only 22 synsets. S. Mohanty, K.P. Das Adhikary, P.K. Santi and G.P. Rout [5] presented the structure of a proposed Sanskrit WN. This was a general structure of limited use. Although it recognized four types of words in Sanskrit, namely Yaugika, Yogarudha, Rudha and Yaugika Rudha, it focused entirely on nouns. It also suggested using the Vaiśeṣika Ontology, which is well accepted. It did not, however, take into consideration the verbal roots, which morphologically form the core of the Sanskrit language and on which a large number of Sanskrit nouns are based. An effective use of verbal roots would serve a major goal of a WN, namely Word Sense Disambiguation, for Sanskrit in particular and for other Indian Languages in general.
We propose the following:
1. A structure based on the verbal roots: We believe we are well supported here by the traditional school of Vyākaraṇa, which says sakalaśabdānāṃ dhātumūlatvāt (Paramalaghumañjūṣā [6]) (since all words are derived from verbal roots).
2. Create synsets of verbal roots and not of verbal forms: The main reason is the large number of verbal forms, which can be generated and analysed with the help of a Morphological Analyser instead of being listed in the WN. As an example, we have taken all the roots meaning gati (movement) from all the dhātupāṭhas. More than 300 Sanskrit verbal roots with this meaning are recorded across the dhātupāṭhas (a list attached). All of them are members of the synset for the concept gati/gamana. We propose to record the following features in SWN:
(i) Semantic Tree: This is useful for understanding the semantic and syntactic structure of the verbal root, as well as of the nouns derived from it. It will be of the following nature (Fig. 1):
Fig. 1. Semantic and Syntactic Structure of Verbal Roots

Table 1. Verbs and Upasargas
Verbal sense | Verbal root | Upasarga + verbal root | Changed verbal sense | Related verbal roots
gati         | gam         | ava+gam                | jñāna                | jñā
             |             | adhi+gam               |                      | budh
             | hṛ          | sam+hṛ                 | hanana               | han
             |             | pra+hṛ                 |                      | hiṃs
In the structure of Fig. 1, when x and y are not the same object, the roots are called sakarmaka (transitive). Wherever the two are one and the same object, the roots are called akarmaka (intransitive). This information will be available from a semantic treebank to be developed for all the synsets of verbal roots.
(ii) Upasarga and meaning change: It is said that upasargas (verbal prefixes) change the meaning of verbal roots:
Upasargeṇa dhātvartho balādanyatra nīyate |
Prahārāhārasaṃhāravihāraparihāravat || (That is, the meaning of the dhātu is perforce taken elsewhere by the upasargas, just as in the case of hṛ: when preceded by pra it means to strike, when preceded by ā it means to eat, when preceded by sam it means to kill, when preceded by vi it means to enjoy, and when preceded by pari it means to solve.)
We propose to link the original synset of a verbal root with the other synsets to which the root logically belongs once it is combined with a particular upasarga (Table 1). We would also like to store the following information regarding a verbal root (see Fig. 2):
Fig. 2. Morphological and other information stored with verbal roots
1. Svara (accent) is useful for Morphological Analysis.
2. Gaṇa-Vikaraṇa (verb class and its class affix) is useful when a root appears in more than two groups (gaṇas) with two or more meanings.
3. Pada is helpful in cases like bhuj, which means 'to protect' in parasmaipada and 'to eat' in ātmanepada.
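The proposals above (synsets of roots rather than of forms, the sakarmaka/akarmaka test of Fig. 1, the upasarga links of Table 1 and the per-root information of Fig. 2) can be pictured as one record per root. The following Python is a minimal sketch under stated assumptions: the names `VerbalRoot`, `UpasargaLink`, `sense_with`, `build_synsets` and `is_sakarmaka` are hypothetical, x and y are treated as opaque identifiers because Fig. 1 is not reproduced here, and only the rows of Table 1 are loaded.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, NamedTuple, Optional, Set


class UpasargaLink(NamedTuple):
    """Where a root lands once an upasarga is attached (one row of Table 1)."""
    changed_sense: str  # concept of the synset the prefixed root belongs to
    related_root: str   # an unprefixed root that already expresses that concept


@dataclass
class VerbalRoot:
    """Hypothetical SWN entry for one dhātu; field names are illustrative only."""
    root: str
    sense: str                      # concept of the root's own synset, e.g. "gati"
    gana: Optional[str] = None      # verb class (gaṇa), which fixes the vikaraṇa
    pada: Optional[str] = None      # parasmaipada / ātmanepada / ubhayapada
    svara: Optional[str] = None     # accent, consumed by a morphological analyser
    upasarga_links: Dict[str, UpasargaLink] = field(default_factory=dict)

    def sense_with(self, upasarga: str) -> Optional[str]:
        """Concept of upasarga + root, or None if no link has been recorded."""
        link = self.upasarga_links.get(upasarga)
        return link.changed_sense if link is not None else None


def build_synsets(roots: Iterable[VerbalRoot]) -> Dict[str, Set[str]]:
    """Group roots (not their inflected forms) into synsets keyed by concept."""
    synsets: Dict[str, Set[str]] = defaultdict(set)
    for entry in roots:
        synsets[entry.sense].add(entry.root)
    return dict(synsets)


def is_sakarmaka(x: str, y: str) -> bool:
    """Fig. 1 test: sakarmaka if x and y are different objects, else akarmaka."""
    return x != y


# The two roots of Table 1 with their upasarga links.
gam = VerbalRoot("gam", "gati", upasarga_links={
    "ava": UpasargaLink("jñāna", "jñā"),
    "adhi": UpasargaLink("jñāna", "budh"),
})
hr = VerbalRoot("hṛ", "gati", upasarga_links={
    "sam": UpasargaLink("hanana", "han"),
    "pra": UpasargaLink("hanana", "hiṃs"),
})

print(build_synsets([gam, hr]))  # one synset, 'gati', holding both roots
print(gam.sense_with("ava"))     # jñāna
print(hr.sense_with("vi"))       # None: vi+hṛ is not recorded in Table 1
```

Keeping the upasarga links on the root, instead of creating a separate entry for every prefixed form, follows the proposal to store roots and leave inflected forms to a Morphological Analyser.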
At this juncture, it is essential to look at how the Hindi and Marathi WNs are built and how verbal roots are treated in them.
3 Hindi and Marathi Wordnets

We have long been engaged in building lexical resources for Indian languages, with a focus on Hindi and Marathi (http://www.cfilt.iitb.ac.in). The Hindi and Marathi wordnets [2] and the HVKB [3] have been given special attention. These wordnets largely follow the design principles of the Princeton WordNet [1] for English, while paying particular attention to language-specific phenomena (such as complex predicates) whenever they arise.
3.1 Hindi and Marathi Wordnets (HWN and MWN)

HWN and MWN have been created with the current statistics given in Table 2, which can be compared with the status of other wordnets:
Table 2. Current status of Wordnets
Wordnet             | Total number of synsets | Total unique words
Hindi Wordnet       | 28,867                  | 64,725
Marathi Wordnet     | 11,908                  | 18,093
WordNet (2.1)       | 117,597                 | 155,327
GermaNet (2004)     | 53,312                  | 76,563
MultiWordNet (1.39) | 32,700                  | 58,000
We have incorporated a supporting ontology, to whose nodes the synsets are linked; its details are given in Table 3.
Table 3. Details of ontology
Part of speech | Number of nodes
Noun           | 151
Verb           | 39
Adjective      | 35
Adverb         | 14
While HWN was created from first principles, by looking up the various meanings of words listed in different dictionaries, MWN has been created derivatively from HWN. That is, the synsets of HWN are adapted to MWN via addition or deletion of synonyms in the synset.
Fig. 3. MWN synset creation
Figure 3 shows the creation of the synset for the word peR “tree” in MWN via addition and deletion of synonyms from HWN. The synset in HWN for this word is {peR, vriksh, paadap, drum, taru, viTap, ruuksh, ruukh, adhrip, taruvar} “tree”. MWN deletes {peR, viTap, ruuksh, ruukh, adhrip} and adds {jhaaR} to it. Thus, the synset for tree in MWN is {jhaaR, vriksh, taruvar, drum, taruu, paadap} “tree”. Because Hindi and Marathi are close members of the same language family, many Hindi words have the same meaning in Marathi. This is especially so for tatsam words, which are borrowed directly from Sanskrit. The semantic relations of HWN are also borrowed directly, thus saving time and effort.
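Deriving an MWN synset from its HWN counterpart amounts to set arithmetic: remove the deleted words, then add the new ones. A minimal Python sketch follows; `adapt_synset` is a hypothetical helper, and the sketch keeps the HWN spelling taru, whereas the MWN synset above lists taruu.

```python
from __future__ import annotations


def adapt_synset(source: set[str], deletions: set[str], additions: set[str]) -> set[str]:
    """Derive a target-language synset from a source synset (HWN -> MWN).

    Words in `deletions` must come from the source synset; `additions` are
    target-language words that the source synset lacks.
    """
    unknown = deletions - source
    if unknown:
        raise ValueError(f"cannot delete words absent from the source: {sorted(unknown)}")
    return (source - deletions) | additions


# HWN synset for "tree", as given in the text.
hwn_tree = {"peR", "vriksh", "paadap", "drum", "taru",
            "viTap", "ruuksh", "ruukh", "adhrip", "taruvar"}

mwn_tree = adapt_synset(
    hwn_tree,
    deletions={"peR", "viTap", "ruuksh", "ruukh", "adhrip"},
    additions={"jhaaR"},
)
print(sorted(mwn_tree))
# ['drum', 'jhaaR', 'paadap', 'taru', 'taruvar', 'vriksh']
```

In a scheme like this the derived synset keeps the identity of the HWN synset it came from, so relations attached to that synset can be reused unchanged.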
Synsets The principles of minimality, coverage and replaceability govern the creation of the synsets:
(i) Minimality: Only the minimal set of words that uniquely identifies the concept is used to create the synset, e.g.,
{ghar, kamaraa} (room)
ghar, which is ambiguous, is not by itself sufficient to denote the concept of a room. Adding kamaraa to the synset brings out this unique sense.
(ii) Coverage: The synset should contain all the words denoting a concept. The words are listed in order of decreasing frequency of occurrence in the corpus.
{ghar, kamaraa, kaksh} (room)
(iii) Replaceability: The words forming the synset should be mutually replaceable in a specific context. Two synonyms may replace each other in a context C if substituting one for the other in C does not alter the meaning of the sentence. Consider
{svadesh, ghar} (motherland): {apanaa desh} (the country where one is born)
amerikaa meN do saal bitaane ke baad shyaam svadesh/ghar lauTaa
America in two years stay after Shyam motherland returned
‘Shyam returned to his motherland after spending two years in America’
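The coverage principle fixes the order of words inside a synset. Assuming corpus frequency is a plain token count, that ordering can be sketched in Python as below; `order_by_frequency` is a hypothetical helper, and the toy corpus is invented purely for illustration, not real frequency data. Multiword members such as apanaa desh would need phrase counts, which this sketch does not attempt.

```python
from collections import Counter
from typing import Iterable, List


def order_by_frequency(synset: Iterable[str], corpus_tokens: Iterable[str]) -> List[str]:
    """List synset members by decreasing corpus frequency (coverage principle).

    Ties are broken alphabetically so the order is reproducible; words absent
    from the corpus get a count of 0 and go last.
    """
    counts = Counter(corpus_tokens)
    return sorted(synset, key=lambda word: (-counts[word], word))


# Toy corpus, made up for illustration only (not real frequency data).
toy_corpus = "ghar kamaraa ghar kaksh ghar kamaraa".split()
print(order_by_frequency({"kaksh", "ghar", "kamaraa"}, toy_corpus))
# ['ghar', 'kamaraa', 'kaksh']
```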
The replaceability criterion is observed with respect to synonymy (semantic properties) and not with respect to syntactic properties (such as subcategorization). For instance, the two verbs {aanaa, jaananaa} “know” appear in the same synset. In Figure 4, the sentence frames show that while aanaa “know” assigns dative case to the subject NP, jaananaa “know” assigns nominative case. The two verbs denote the same concept, and each may replace the other in this particular semantic context.
Fig. 4. Sentence frame for “know”
A synset in HWN (and in MWN) consists of the following elements.
A. Synset: {vidyaalay, paaThshaalaa, skuul} (school)
B. Gloss, which consists of two parts:
a. The text definition that explains the concept denoted by the synset.
vah sthaan jahaaM praathamik yaa maadhyamik star kii aupachaarik shikshaa dii jaatii hai
‘The place where formal education at the primary or secondary level is given’
b. An example sentence that uses a word of the synset
is vidyaalay meM pahalii se paanchavii tak kii shikshaa dii jaatii hai
‘Education from the first to the fifth class is given in this school’
The data are stored in Devanāgarī script in a MySQL database, which also records the part of speech of each entry. Figure 5 shows sample entries from both HWN and MWN.
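The elements of a synset entry (member words, the two-part gloss and the part of speech) map naturally onto a small relational schema. The sketch below uses Python's built-in `sqlite3` module in memory as a stand-in for the MySQL database; the tables `synset` and `synset_word`, the helpers `add_synset` and `synsets_of`, and the use of romanized rather than Devanāgarī text are all assumptions for illustration, not the actual HWN/MWN layout.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; this schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE synset (
    id         INTEGER PRIMARY KEY,
    pos        TEXT NOT NULL,   -- part of speech
    definition TEXT NOT NULL,   -- gloss, part (a)
    example    TEXT NOT NULL    -- gloss, part (b)
);
CREATE TABLE synset_word (
    synset_id  INTEGER NOT NULL REFERENCES synset(id),
    word       TEXT NOT NULL,
    seq        INTEGER NOT NULL,  -- position in the synset (frequency order)
    PRIMARY KEY (synset_id, word)
);
""")


def add_synset(pos, words, definition, example):
    """Insert one synset with its ordered member words; return its id."""
    cur = conn.execute(
        "INSERT INTO synset (pos, definition, example) VALUES (?, ?, ?)",
        (pos, definition, example))
    sid = cur.lastrowid
    conn.executemany(
        "INSERT INTO synset_word (synset_id, word, seq) VALUES (?, ?, ?)",
        [(sid, w, i) for i, w in enumerate(words)])
    return sid


def synsets_of(word):
    """Return (ordered words, definition) for every synset containing word."""
    result = []
    ids = conn.execute(
        "SELECT synset_id FROM synset_word WHERE word = ?", (word,)).fetchall()
    for (sid,) in ids:
        words = [w for (w,) in conn.execute(
            "SELECT word FROM synset_word WHERE synset_id = ? ORDER BY seq", (sid,))]
        (definition,) = conn.execute(
            "SELECT definition FROM synset WHERE id = ?", (sid,)).fetchone()
        result.append((words, definition))
    return result


add_synset(
    "noun",
    ["vidyaalay", "paaThshaalaa", "skuul"],
    "vah sthaan jahaaM praathamik yaa maadhyamik star kii aupachaarik "
    "shikshaa dii jaatii hai",
    "is vidyaalay meM pahalii se paanchavii tak kii shikshaa dii jaatii hai",
)
print(synsets_of("skuul")[0][0])  # ['vidyaalay', 'paaThshaalaa', 'skuul']
```

Storing one row per member word, together with its position, preserves the frequency order required by the coverage principle and lets the same word belong to several synsets.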
Fig. 5. HWN and MWN Sample Entry
Lexical Relations HWN incorporates commonly used semantic and lexical relationships along with a few new ones. A brief description follows:
1. Antonymy is a lexical relation indicating ‘opposites’. For instance,
{moTaa, sthuulkaay} ‘fat’ → {patlaa, dublaa} ‘thin’
patlaa (thin) is the antonym of moTaa (fat) and vice versa. HWN also indicates the criterion under which the antonymy holds. In the above example, the antonymy criterion is size. Other criteria are given in Table 4.
Table 4. Criteria for Antonymy
Criterion   | Examples                      | Gloss
Size        | chhoTaa-badzaa, moTaa-patlaa  | small-big, thick-thin
Quality     | achchhaa-buraa, pyaar-ghriNaa | good-bad, love-hatred
Gender      | beTaa-beTii, maataa-pitaa     | son-daughter, mother-father
State       | shuruu-ant                    | beginning-end
Personality | raam-raavaN                   | Rama-Ravana
Direction   | puurv-pashchim, aage-piichhe  | east-west, front-behind
Action      | lenaa-denaa, khariid-bikrii   | take-give, buy-sell
Amount      | kam-jyaadaa, halkaa-bhaarii   | little-much, light-heavy
Place       | duur-paas                     | far-near
Time        | din-raat, subaha-shaam        | day-night, morning-evening
2. Gradation is a lexical relation that represents possible intermediate states between two antonyms. Figure 6 shows the gradation relation among time words.
Fig. 6. Gradation relation
3. Hypernymy and Hyponymy encode lexical relations between a more general term and the more specific terms that are kinds of it.
{belpatra, belpattii, bilvapatra} ‘a leaf of a tree named bel’ → {pattaa, paat, parN, patra, dal} ‘leaf’
Here, belpatra (a leaf of a tree named bel) is a kind of pattaa (leaf). pattaa (leaf) is the hypernym of belpatra, and belpatra is a hyponym of pattaa.
4. Meronymy and Holonymy express the part-of relationship and its inverse.
{jaR, muul, sor} ‘root’ → {peR, vriksh, paadap, drum} ‘tree’
Here, jaR (root) is a part of peR (tree); hence jaR (root) is the meronym of peR (tree), and peR (tree) is the holonym of jaR (root).
5. Entailment is a semantic relationship between two verbs. A verb A entails a verb B if the meaning of B follows logically from, and is strictly included in, the meaning of A. This relation is unidirectional. For instance, snoring entails sleeping, but sleeping does not entail snoring.
{kharraaTaa lenaa, naak bajaanaa} ‘snore’ → {sonaa} ‘sleep’
6. Troponymy is a semantic relation between two verbs in which one is a specific “manner” elaboration of the other. For instance,
{dahaaRanaa} ‘to roar’ is the troponym of {bolanaa} ‘to speak’
7. Cross-linkage between different parts of speech: HWN also links synsets across different parts of speech. These links have not been taken from EWN. Links between “nouns” and “verbs” include the following:
(a) Ability link specifies the features inherited by a nominal concept. For example,
{machlii, macchii, matsya, miin, maahii} ‘fish’ → {tairnaa, pairnaa, paunrnaa} ‘swim’
(b) Capability link specifies features acquired by a nominal concept. For example,
{vyakti, maanas} ‘person’ → {tairnaa, pairnaa, paunrnaa} ‘swim’
(c) Function link specifies the function(s) associated with a nominal concept. For example,
{adhyaapak, shikshak} ‘teacher’ → {paRhanaa, shikshaa denaa} ‘teach’
Links between “nouns” and “adjectives” are used to indicate typical properties of a noun. For example, {sher} ‘tiger’ → {maansaahaarii} ‘carnivorous’. Links between morphologically derived forms mark the root form from which a particular word is derived by affixation. For example, {bhaaratiiyataa} ‘indianness’ is derived from {bhaaratiiya} ‘Indian’ and is linked to it. Figures 7 and 8 show the web interfaces for HWN and MWN, and Figure 9 shows the data entry interface.
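The relations above differ in direction: hypernymy/hyponymy and meronymy/holonymy are inverses of each other, antonymy is symmetric and carries a criterion, while entailment, troponymy and the cross-POS links are one-way. A minimal Python sketch of a store that adds inverse links automatically is given below; `RelationStore`, the relation labels and the practice of naming a synset by its first word are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Relations whose inverse is stored automatically. Entailment, troponymy and
# the cross-POS links (ability, capability, function) are one-way here.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "holonym": "meronym", "meronym": "holonym",
    "antonym": "antonym",  # symmetric
}


class RelationStore:
    """Typed links between synsets. For brevity a synset is named by its
    first word (an assumption; a real WN would use synset ids)."""

    def __init__(self) -> None:
        # (synset, relation) -> {(target synset, criterion or None)}
        self.edges: Dict[Tuple[str, str], Set[Tuple[str, Optional[str]]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str,
            criterion: Optional[str] = None) -> None:
        """Record `target` as the `relation` of `source`, plus any inverse link."""
        self.edges[(source, relation)].add((target, criterion))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self.edges[(target, inverse)].add((source, criterion))

    def related(self, source: str, relation: str) -> List[str]:
        """Targets linked to `source` by `relation`, sorted."""
        return sorted(t for t, _ in self.edges.get((source, relation), set()))


store = RelationStore()
store.add("belpatra", "hypernym", "pattaa")        # pattaa is the hypernym of belpatra
store.add("jaR", "holonym", "peR")                 # the root is part of the tree
store.add("moTaa", "antonym", "patlaa", criterion="size")
store.add("kharraaTaa lenaa", "entails", "sonaa")  # snore entails sleep
store.add("dahaaRanaa", "troponym_of", "bolanaa")  # roar is a manner of speaking
store.add("machlii", "ability", "tairnaa")         # fish -> swim

print(store.related("pattaa", "hyponym"))  # ['belpatra']
print(store.related("peR", "meronym"))     # ['jaR']
print(store.related("sonaa", "entails"))   # []: sleeping does not entail snoring
```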
Fig. 7. Web interface for Hindi Wordnet
Fig. 8. Web interface for Marathi Wordnet

Fig. 9. HWN data entry interface
4 Conclusion and Future Work
We propose to maintain the core structure of a WN as it is while building the Sanskrit WN, in the sense that the nodal elements will be synsets linked by lexical and semantic relations. What we propose to add is a language-specific approach that includes storing information related to morphology. Storing verbal roots in this way will cover almost all the Yaugika words, as well as some of the Yogarudha words.
References
1. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database. MIT Press (1998)
2. D., C., Bhattacharyya, P.: Creation of English and Hindi verb hierarchies and their application to Hindi WordNet building and English-Hindi MT. In: Proceedings of the Second Global Wordnet Conference, Brno, Czech Republic (2004)
3. G.B., P.: The Sanskrit Dhatupathas: A Critical Study. University of Poona, Pune (1961)
4. G.B., P.: A Concordance of Sanskrit Dhatupathas. Deccan College, Post Graduate Studies and Research Institute, Pune (1953)
5. Mohanty, S., Das Adhikary, K.P., Santi, P.K., Rout, G.P.: Proposed model of Sanskrit WordNet in concept capability of Sanskrit WordNet: for convergence of knowledge-base. In: Convergence 2003 (2003)
6. Bhagwat, V.B.: Paramalaghumañjūṣā with Marathi Translation. Dept. of Philosophy, University of Poona, Pune (2000)