1 Introduction
Wordnets (WNs) are accepted worldwide as useful lexical tools for Natural
Language Processing (NLP). Projects for building WNs for different languages of
the world have been under way for quite some time.1 The scenario for Indian
languages is also encouraging. The Indian Institute of Technology Bombay (IITB)
has successfully created WNs for Hindi and Marathi.2 There have been more than
100,000 hits on the sites for these resources.
The importance of developing a Sanskrit WN (SWN) in the context of Indian
Languages (ILs) cannot be over-emphasised. Languages in India are broadly
categorized into three families, one of which, namely Indo-European, has Sanskrit
as a major language historically. Many modern Indian languages like Hindi,
Marathi, Bengali, Gujarati, Punjabi, Oriya etc. have a substantial number of
borrowed Sanskrit words. Even the grammars of these languages have categories of
words called tadbhava (generated from Sanskrit) and tatsama (similar to
Sanskrit). SWN, it follows, can logically provide a natural platform for
integrating IL WNs. Several institutes and scholars have been trying to undertake
the task of building SWN with various strategies. Not much of substance, however,
is visible on this front. The main issue regarding the structure of SWN that comes
up in discussion is that, while building the SWN, traditional knowledge bases
(śāstric knowledge) should be used, and one should not blindly follow the
structures of existing WNs, which are based on western concepts.
It is this particular aspect that the present paper aims to study.
School on the other hand in this regard. Since the morphology of Sanskrit is
very rich and since the syntax is said to be embedded in the morphology, there
is a large influence of morphology on any of these theories. We cannot do away
with morphological considerations while building SWN.
Some attempts have been made so far to propose schemes for Sanskrit WNs.
Behra et al. proposed a Sanskrit WN which had only 22 synsets. S. Mohanty,
K.P. Das Adhikary, P.K. Santi and G.P. Rout presented a structure of a proposed
Sanskrit WN. This was a general structure of limited use. Although it recognized
four types of words in Sanskrit, namely Yaugika, Yogarudha, Rudha and Yaugika
Rudha, it focused entirely on nouns. It also suggested using the Vaisesika
Ontology, which is well accepted. It did not, however, take into consideration
the verbal roots, which morphologically form the core of the Sanskrit language
and on which a large number of Sanskrit nouns are based. An effective use of
verbal roots would lead to the major goal of a WN, namely word sense
disambiguation, as far as Sanskrit in particular, and other Indian languages in
general, are concerned.
We here propose the following:
In this, x and y are not the same objects, and the roots are called
sakarmaka. Wherever these two are one and the same object, the roots are
called akarmaka. This information is available to us from a semantic tree
bank that will be developed for all the synsets of the verbal roots.
(ii) Upasarga and meaning change: It is said that upasargas change the
meaning of the verbal roots.
Upasargeṇa dhātvartho balādanyatra nīyate |
Prahārāhārasaṃhāravihāraparihāravat || (That is, the meaning of
the dhātu is perforce taken elsewhere by the upasargas; just as
in the case of hṛ, when preceded by pra it means to strike, when
preceded by ā it means to eat, when preceded by sam it means
to kill, when preceded by vi it means to enjoy, and when preceded
by pari it means to solve.)
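The examples in the verse can be pictured as a small lookup from an (upasarga, root) pair to a gloss. The following is only a minimal sketch: the glosses are the ones given in the translation above, while the table name `UPASARGA_SENSES` and the two helper functions are hypothetical and do not describe the actual SWN storage format.

```python
# Illustrative only: the glosses come from the translation of the verse;
# the table layout and function names are assumptions, not the SWN schema.
UPASARGA_SENSES = {
    ("pra", "hṛ"): "to strike",
    ("ā", "hṛ"): "to eat",
    ("sam", "hṛ"): "to kill",
    ("vi", "hṛ"): "to enjoy",
    ("pari", "hṛ"): "to solve",
}


def sense_with_upasarga(upasarga, root):
    """Return the gloss recorded for upasarga + root, or None if none is recorded."""
    return UPASARGA_SENSES.get((upasarga, root))


def upasargas_for(root):
    """List the upasargas recorded for a root, in the order they were entered."""
    return [u for (u, r) in UPASARGA_SENSES if r == root]


print(sense_with_upasarga("pra", "hṛ"))
print(upasargas_for("hṛ"))
```

In a full wordnet each gloss would presumably be replaced by a reference to a synset, so that the combined form points to the synset it belongs to.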
We propose to link the original synsets of the verbal roots with the other synsets
to which that root will logically belong after it is associated with a particular
upasarga. We would also like to store the following information regarding a
verbal root (see Figure 2).
1. Svara is useful for Morphological Analysis.
dictionaries, MWN has been created derivatively from HWN. That is, the synsets
of HWN are adapted to MWN via addition or deletion of synonyms in the synset.
Figure 3 shows the creation of the synset for the word peR “tree” in MWN
via addition and deletion of synonyms from HWN. The synset in HWN for this
word is {peR, vriksh, paadap, drum, taru, viTap, ruuksh, ruukh, adhrip, taruvar}
“tree”. MWN deletes {peR, viTap, ruuksh, ruukh, adhrip} and adds {jhaaR} to
it. Thus, the synset for tree in MWN is {jhaaR, vriksh, taruvar, drum, taruu,
paadap} “tree”. Hindi and Marathi being close members of the same language
family, many Hindi words have the same meaning in Marathi. This is especially
so for tatsam words, which are directly borrowed from Sanskrit. The semantic
relations are borrowed directly, thus saving time and effort.
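The derivation of a target synset by deletion and addition can be sketched as a simple list operation. This is a minimal illustration using the "tree" example above; the function name `derive_synset` and the choice to place added words first are assumptions made for the sketch. It does not model spelling adaptation (the text lists taruu in MWN where HWN has taru) or any reordering.

```python
def derive_synset(source_synset, deletions, additions):
    """Derive a target-language synset from a source-language synset.

    Retained words keep their source order; added words are placed first.
    The ordering is an assumption of this sketch, not a documented rule.
    """
    removed = set(deletions)
    kept = [w for w in source_synset if w not in removed]
    new = [w for w in additions if w not in kept]
    return new + kept


hwn_tree = ["peR", "vriksh", "paadap", "drum", "taru",
            "viTap", "ruuksh", "ruukh", "adhrip", "taruvar"]

mwn_tree = derive_synset(
    hwn_tree,
    deletions=["peR", "viTap", "ruuksh", "ruukh", "adhrip"],
    additions=["jhaaR"],
)
print(mwn_tree)
```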
(i) Minimality: Only the minimal set that uniquely identifies the concept
is used to create the synset, e.g.,
{ghar, kamaraa} (room)
ghar- which is ambiguous- is not by itself sufficient to denote the concept of
a room. The addition of kamaraa to the synset brings out this unique sense.
(ii) Coverage: The synset should contain all the words denoting a concept.
The words are listed in order of (decreasing) frequency of their occurrence
in the corpus.
{ghar, kamaraa, kaksh} (room)
(iii) Replaceability: The words forming the synset should be mutually
replaceable in a specific context. Two synonyms may mutually replace each
other in a context C, if the substitution of the one for the other in C does
not alter the meaning of the sentence. Consider,
6 Verbal roots in the Sanskrit Wordnet
{svadesh, ghar} (motherland) – {apanaa desh} (the country where one is born)
amerikaa meN do saal bitaane ke baad shyaam svadesh/ ghar lauTaa
America in two years stay after Shyam motherland returned
‘Shyam returned to his motherland after spending two years in America.’
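The replaceability principle can be illustrated by generating the candidate sentences that a lexicographer would compare. The sketch below only produces the substitutions; whether the meaning is preserved in context C remains a human judgement. The function name `substitution_variants` is hypothetical.

```python
def substitution_variants(tokens, synset):
    """Return one sentence per synset member, swapped in wherever a member occurs."""
    members = set(synset)
    variants = []
    for candidate in synset:
        swapped = [candidate if t in members else t for t in tokens]
        variants.append(" ".join(swapped))
    return variants


sentence = "amerikaa meN do saal bitaane ke baad shyaam svadesh lauTaa".split()
for variant in substitution_variants(sentence, ["svadesh", "ghar"]):
    print(variant)
```

If every variant is judged to mean the same as the original sentence, the words are mutually replaceable in that context.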
The data is stored in the Devanāgarī script in a MySQL database. The part
of speech for each entry is listed in this database. In Figure 4 we provide sample
entries from both HWN and MWN.
Here, belpatra (a leaf of a tree named bel) is a kind of pattaa (leaf). pattaa
(leaf) is the hypernym of belpatra (a leaf of a tree named bel) and belpatra
(a leaf of a tree named bel) is a hyponym of pattaa (leaf).
4. Meronymy and Holonymy express the part-of relationship and its inverse.
{jaR, muul, sor} ‘root’ → {peR, vriksh, paadap, drum} ‘tree’
Here, jaR (root) is a part of peR (tree), which implies that jaR (root) is the
meronym of peR (tree) and peR (tree) is the holonym of jaR (root).
5. Entailment is a semantic relationship between two verbs. Any verb A entails
a verb B, if the meaning of B follows logically and is strictly included in the
meaning of A. This relation is unidirectional. For instance, snoring entails
sleeping, but sleeping does not entail snoring.
{kharraaTaa lenaa, naak bajaanaa} ‘snore’ → {sonaa} ‘sleep’
6. Troponymy is a semantic relation between two verbs when one is a specific
“manner” elaboration of another. For instance,
{dahaaRanaa} ‘to roar’ is the troponym of {bolanaa} ‘to speak’
7. Cross-linkage between different parts of speech: The HWN also links
synsets across different parts of speech. These links have not been taken from
the EWN. Links between “nouns” and “verbs” include the following:
(a) Ability link specifies the features inherited by a nominal concept. For
example,
{machlii, macchii, matsya, miin, maahii} ‘fish’ → {tairnaa, pairnaa,
paunrnaa} ‘swim’
(b) Capability link specifies features acquired by a nominal concept. For
example,
{vyakti, maanas} ‘person’ → {tairnaa, pairnaa, paunrnaa} ‘swim’
(c) Function link specifies function(s) associated with a nominal concept.
For example,
{adhyaapak, shikshak} ‘teacher’ → {paRhanaa, shikshaa denaa} ‘teach’
Links between “nouns” and “adjectives” are used to indicate typical properties
of a noun. For example, {sher} ‘tiger’ → {maansaahaarii} “carnivorous”. Links
between morphologically derived forms mark the root form from which a particular
word is derived by affixation. For example, {bhaaratiiyataa} “indianness”
is derived from {bhaaratiiya} “Indian” and is linked to it. In Figures 3.1 and 8
below we show the web interfaces for HWN and MWN, and in Figure 3.1 the
data entry interface.
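The semantic relations described in this section can be pictured as typed, directed links. The following is a minimal sketch under stated assumptions: it uses one head word per synset rather than the full synset, and the class `RelationStore`, the relation labels and the `INVERSE` table are hypothetical names chosen for illustration, not the HWN database layout. Inverse links are generated only for the two pairs whose inverses the text names (hypernym/hyponym and holonym/meronym); entailment is kept unidirectional.

```python
from collections import defaultdict

# Relations whose inverse is named in the text; all others are one-way.
INVERSE = {"hypernym": "hyponym", "holonym": "meronym"}
INVERSE.update({v: k for k, v in list(INVERSE.items())})


class RelationStore:
    """Typed, directed links between head words (a stand-in for synsets)."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        """Record that `target` is the `relation` of `source`."""
        self._edges[(source, relation)].add(target)
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._edges[(target, inverse)].add(source)

    def related(self, source, relation):
        """Return the sorted targets linked from `source` by `relation`."""
        return sorted(self._edges.get((source, relation), ()))


store = RelationStore()
store.add("belpatra", "hypernym", "pattaa")          # leaf of bel -> leaf
store.add("jaR", "holonym", "peR")                   # root -> tree
store.add("kharraaTaa lenaa", "entails", "sonaa")    # snore -> sleep
store.add("dahaaRanaa", "troponym_of", "bolanaa")    # roar -> speak
store.add("machlii", "ability", "tairnaa")           # fish -> swim
store.add("vyakti", "capability", "tairnaa")         # person -> swim
store.add("adhyaapak", "function", "paRhanaa")       # teacher -> teach

print(store.related("pattaa", "hyponym"))
print(store.related("peR", "meronym"))
print(store.related("sonaa", "entails"))
```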
References
1. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database. MIT Press (1998)
2. Chakrabarti, D., Bhattacharyya, P.: Creation of English and Hindi verb
hierarchies and their application to Hindi WordNet building and English-Hindi MT.
In: Proceedings of the Second Global Wordnet Conference, Brno, Czech Republic
(2004)
3. Palsule, G.B.: The Sanskrit Dhatupathas: A Critical Study. University of Poona,
Pune (1961)
4. Palsule, G.B.: A Concordance of Sanskrit Dhatupathas. Deccan College, Post
Graduate Studies and Research Institute, Pune (1953)
5. Mohanty, S., Das Adhikary, K.P., Santi, P.K., Rout, G.P.: Proposed model of
Sanskrit word-net in concept capability of Sanskrit word-net: for convergence of
knowledge-base. In: Convergence 2003 (2003)
6. Bhagwat, V.B.: Paramalaghumañjūṣā with Marathi Translation. Dept. of
Philosophy, University of Poona, Pune (2000)