
Sanskrit and Natural Language Processing

Dr. Srinivasa Varakhedi
Center for Advanced Studies and Research in Shabdabodha and NLP


Dream of a bee…..

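रात्रिः गमिष्यति भविष्यति सुप्रभातम्। भास्वान् उदेष्यति हसिष्यति पङ्कजश्रीः॥
इत्थं विचिन्तयति कोशगते द्विरेफे। हा हन्त हन्त नलिनीं गज उज्जहार॥

("The night will pass and a fine dawn will break; the sun will rise and the lotus will bloom in splendour." While the bee shut inside the bud was thinking thus, alas, an elephant uprooted the lotus.)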

Present situation of Sanskrit

- Sanskrit colleges are like a 'zoo'!
- No Govt. support unless we are productive
- Humanities and languages are being neglected
- How far will this support continue?
- A great tradition of learning is being lost
- No scope for novel research

Innovation is the key
- Sanskrit Shastras are competent enough to enter the science world
- Move out of Humanities and merge with science
- Analogy: Maths, Logic, Psychology
- We must find a practical approach for these Sanskrit sciences

What to do?
- Nyaya – No use in modern dialectics?
- Vyakarana – No application??
- Meemamsa – No practical approach! We have lost 80%

Relevance of Sanskrit Shastras in Modern Technology
- Fortunately, these shastras are found relevant in today's technology
  - Computing ideas in Panini
  - Text processing principles in Meemamsa
  - Formal languages in Nyaya
- We lack the technology and application area
- Story of Babbage!!!

Message of Acharya Shankara Bhagavatpada
- “avidyayaa mrtyum tiirtvaa vidyayaa amrtamashnute” (Ishavasya Upanishad)
- Sri Shankara Bhagavatpada comments on this:
  - avidyaa = karma
  - vidyaa = knowledge

Opportunity
- Emerging information technology has provided a great opportunity to survive
- गृह्णीयात् तिन्तृणीशाखां शिग्रुशाखाग्रहेण किम्? (Grasp the branch of the tamarind tree; what is gained by grasping the branch of a drumstick tree?)
- Solve a major contemporary problem like MT on the basis of the shastras
- Get new openings for Sanskritists
- Open a new avenue for research

Know How…
- Ultimate aim: finding an appropriate place for Sanskrit Shastras
- Method: solutions to contemporary problems, adopting modern technology
- Resource needed: adequate manpower who act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other

Change the scenario
[Diagram: Technology, Western Theories, INDIAN THEORIES]

Opportunities missed
- Industrial revolution
  - We missed this with some hasty decisions
- IT revolution
  - Indians are serving at the coding level, not at the design level!
- Knowledge revolution
  - We should take this advantage

Need of the hour
- We need
  - to understand how technology works
  - to understand contemporary problems
- Then
  - we will be able to give solutions in the light of the shastras and show the relevance of Indian theories

History and Progress
- Conference held at Bangalore in Dec 1987 on “Knowledge Representation and Sanskritam” generated tremendous interest
- What progress has been made since then?
- Nothing much has been achieved, except some efforts and projects here and there on a small scale, and that too in technical institutions
- Time is running out!

Complexity of the problem
- Different goal: the two disciplines, Technology and Shastras, are developed in different contexts
- Paradigm difference: modern scholars are accustomed to visual teaching methods. Traditional Pandits, on the other hand, prefer oral tradition
- Language barrier: neither of them understands the other's language! Tuning in the dialogue will take time

Who would bell the cat?
- It needs a long interaction between technologists and traditional Sanskrit scholars
- Technical institutions are always ready for such activities
- Not much interest is seen in Sanskrit institutions
- It is we Sanskritists who should bell the cat

Long process, like extraction of ghee from milk
- No miracle happens in the initial stage
- It's a big challenge: one or two persons are not enough
- We need hundreds of dedicated persons to achieve a small goal
  - A person can climb a small hill
  - A team can climb Everest

Identifying the “problem”
- Analogy: Brahman in Upanishads
- What is Brahman?
  - We can NOT show it, as it is imperceivable
  - We can NOT describe it, as it is beyond words
  - Hence we can direct you towards it by way of negating what we know (अपोह)
- शाखाचन्द्रमसारुन्धतीन्यायः

Possible areas
- Machine Translation
- Speech Processing
- Summary Extraction from huge texts
- Indo Wordnet as a base for IL-wordnets
- Developing Tools for IL Researchers
- Knowledge Representation schemes

Machine Translation
- English to Indian Languages
  - Word sense disambiguation
  - Karaka & Syntax Relation
  - Word-grouping
  - Idiomatic Expression
  - Shabdasutra
- MT among Indian Languages
  - Bi-language Electronic Dictionaries
  - Karaka & Vibhakti Relation

Major MT systems
- India
  - Angla-Bharati, IIT Kanpur
  - Shakti, IIIT Hyderabad
  - Mantra, CDAC Pune
  - SaHiT (Sanskrit Hindi Translator), CSS, JNU
  - Anusaaraka (RSV, HCU, IIIT)

Major MT systems
- Outside India
  - UNITRAN
  - BabelFish AltaVista (Systran)
  - ATR (bimodal, Japan)
  - JANUS (bimodal, US-Germany)
  - SLT (SRI, Cambridge)
  - VERBMOBIL (Germany)
  - DIPLOMAT (Carnegie-Mellon)
- Get a 125 page directory of available MT systems at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf

Summary Extraction
- Meemamsa principles applied to extract the summary of a text
- Upakramaadi Tatparya Lingas are used to extract the summary of a text at the Indian Institute of Science, Bangalore, in our consultancy
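A rough sketch of how such cues could drive extractive summarisation. It models only two of the traditional tatparya lingas, upakrama-upasamhara (opening and conclusion) and abhyasa (repetition). The function name `summarize`, the `edge_bonus` weight and the scoring formula are illustrative assumptions, not the system built in the consultancy.

```python
import re
from collections import Counter

def summarize(text, n=2, edge_bonus=2.0):
    """Pick the n highest-scoring sentences, returned in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # abhyasa (repetition): count how often each word occurs in the whole text
    freq = Counter(w for words in tokenized for w in words)

    scores = []
    for i, words in enumerate(tokenized):
        # reward words that recur in the text, normalised by sentence length
        repeated = [freq[w] for w in words if freq[w] > 1]
        score = sum(repeated) / len(words) if words else 0.0
        # upakrama / upasamhara: favour the opening and closing sentences
        if i == 0 or i == len(sentences) - 1:
            score += edge_bonus
        scores.append(score)

    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]

text = ("Dharma protects those who protect it. The weather was pleasant. "
        "Birds sang nearby. Therefore one should protect dharma.")
summary = summarize(text, n=2)
```

The remaining lingas (apurvata, phala, arthavada, upapatti) need semantic judgments that a word-count heuristic like this cannot capture.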

Wordnet / Concept-net based on NN ontology
- Wordnet is an electronic lexical reference resource system designed on the basis of semantic relations of words
  - Synonymy {Graha, nivaasa, …}
  - Hypernymy {Amra, vriksha, vanaspati…}
  - Antonymy {Shreemaan, akinchana}
  - Meronymy {nAsika, mukha, shariira, …}
  - Gradation {Shushka, …tara, …tama}
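The relations listed on this slide can be pictured as a small labelled graph. The sketch below is only an illustration: the class `MiniWordnet`, its method names, and the direction chosen for each link (for example, that vriksha is the hypernym of amra) are assumptions read off the slide's examples, not the design of any actual Indo Wordnet.

```python
from collections import defaultdict

class MiniWordnet:
    def __init__(self):
        # relation name -> word -> set of related words
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target, symmetric=False):
        self.relations[relation][source].add(target)
        if symmetric:
            self.relations[relation][target].add(source)

    def related(self, relation, word):
        return sorted(self.relations[relation][word])

    def hypernym_chain(self, word):
        """Follow hypernym links upward until no parent is recorded."""
        chain, seen, current = [], {word}, word
        while self.relations["hypernym"][current]:
            parent = sorted(self.relations["hypernym"][current])[0]
            if parent in seen:  # guard against accidental cycles
                break
            chain.append(parent)
            seen.add(parent)
            current = parent
        return chain

wn = MiniWordnet()
wn.add("synonym", "graha", "nivaasa", symmetric=True)
wn.add("hypernym", "amra", "vriksha")
wn.add("hypernym", "vriksha", "vanaspati")
wn.add("antonym", "shreemaan", "akinchana", symmetric=True)
wn.add("meronym", "mukha", "nAsika")     # nAsika is a part of mukha
wn.add("meronym", "shariira", "mukha")   # mukha is a part of shariira
```

Calling `wn.hypernym_chain("amra")` walks from the specific word to the more general ones, which is the kind of lookup an MT system could use when an exact equivalent is missing in the target language.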

Knowledge Engineering
- Representation
  - For data representation, several database management systems are available
  - For representing and retrieving useful information, there are various worked-out methodologies
  - Finally, Knowledge Representation needs special treatment, where Indian knowledge systems can be applied

Knowledge and its importance in AI
- AI researchers are interested in building intelligent systems
- Web technologies are looking forward to the semantic web instead of the syntactic web
- Knowledge is more valuable than data and information
  - Data – simple DoB
  - Info – age calculated
  - Knowledge – the judgment about suitability for the job at hand etc. This requires a lot of inputs from various K-sources
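The DoB example can be spelled out in a few lines. This is a toy illustration: the function names and the age limits used for the suitability rule are hypothetical, standing in for the many knowledge sources a real judgment would need.

```python
from datetime import date

def age_on(dob, today):
    """Information: age in whole years, derived from the raw date of birth."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def suitable_for_job(age, min_age=21, max_age=35):
    """Knowledge: a judgment that applies a (hypothetical) rule to the information."""
    return min_age <= age <= max_age

dob = date(1990, 6, 15)                      # data
age = age_on(dob, today=date(2020, 6, 14))   # information
verdict = suitable_for_job(age)              # knowledge
```

The first step is pure arithmetic. The second depends on a rule supplied from outside the data, which is where knowledge representation comes in.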

Computational Linguistics and Panini's Grammar
- The structure of Paninian Grammar is nothing but a computer program – Babbage!
- It has captured the base of universal principles of all languages
- CL requires formal rules for analysis and generation of language
- Slowly Chomsky and others are turning towards Panini…

The System of Panini
- Phonetic component
  - Phonemes
  - pratyahara
- Rule base
  - Vidhi (operations)
  - Samjna
  - paribhasha (metarules)
  - adhikara (headings)
  - atidesha (extension)
  - niyama (restriction)
- Lexicon
  - Dhatupaatha
  - Ganapaatha
- Lists
  - Affixes
  - Rule specific items
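The pratyahara device in the phonetic component is easy to show as code: a two-symbol abbreviation names every sound from a starting phoneme up to a marker letter in the Shiva Sutras. The sketch below is an illustration only. The IAST spelling, the data layout and the function name `pratyahara` are choices made here, and the rule of stopping at the first matching marker ignores the traditional conventions for the few ambiguous cases.

```python
# The fourteen Shiva Sutras as (sounds, marker) pairs; markers are the final "it" letters.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "Ṇ"),
    (["ṛ", "ḷ"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
    (["ha", "ya", "va", "ra"], "Ṭ"),
    (["la"], "Ṇ"),
    (["ña", "ma", "ṅa", "ṇa", "na"], "M"),
    (["jha", "bha"], "Ñ"),
    (["gha", "ḍha", "dha"], "Ṣ"),
    (["ja", "ba", "ga", "ḍa", "da"], "Ś"),
    (["kha", "pha", "cha", "ṭha", "tha", "ca", "ṭa", "ta"], "V"),
    (["ka", "pa"], "Y"),
    (["śa", "ṣa", "sa"], "R"),
    (["ha"], "L"),
]

def pratyahara(start, marker):
    """Sounds from `start` up to the first sutra, at or after it, that ends in `marker`."""
    sounds = []
    collecting = False
    for group, it in SHIVA_SUTRAS:
        for sound in group:
            if not collecting and sound == start:
                collecting = True
            if collecting:
                sounds.append(sound)
        if collecting and it == marker:
            return sounds
    raise ValueError(f"no pratyahara {start}-{marker}")

vowels = pratyahara("a", "C")       # aC: all vowels
semivowels = pratyahara("ya", "Ṇ")  # yaṆ: ya va ra la
```

One short label such as aC thus replaces a nine-item list wherever a rule needs to mention "any vowel", much like a named constant in a program.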

Paninian Model for Sentence Analysis
- Action – central theme
- Karakas – syntactico-semantic roles
- Visheshana-Visheshyabhava
- Concept of anabhihite… in switching to a different voice
- Vivakshaa – intention of speaker
- Form and meaning
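A minimal sketch of the action-centred model and of the anabhihite idea: a karaka already expressed by the verb ending takes the first case (prathama), while an unexpressed karta takes the third (tritiya) and an unexpressed karma the second (dvitiya). Only these two karakas are covered, and the class and function names are illustrative assumptions, not an existing parser's API.

```python
from dataclasses import dataclass, field

# Which karaka the verb ending expresses (abhihita) in each voice.
EXPRESSED_BY_VERB = {"active": "karta", "passive": "karma"}

# Case endings for karakas NOT expressed by the verb (anabhihita).
ANABHIHITA_VIBHAKTI = {"karta": "tritiya", "karma": "dvitiya"}

@dataclass
class Sentence:
    kriya: str                                   # the action: central node
    karakas: dict = field(default_factory=dict)  # role -> nominal stem

def assign_vibhakti(sentence, voice):
    """Return {stem: vibhakti} for the chosen voice."""
    expressed = EXPRESSED_BY_VERB[voice]
    result = {}
    for role, stem in sentence.karakas.items():
        if role == expressed:
            result[stem] = "prathama"   # already expressed by the verb ending
        else:
            result[stem] = ANABHIHITA_VIBHAKTI[role]
    return result

going = Sentence(kriya="gam", karakas={"karta": "rAma", "karma": "vana"})
active = assign_vibhakti(going, "active")
passive = assign_vibhakti(going, "passive")
```

The karaka roles stay fixed across both voices. Only the surface case marking changes, which is why the karaka level is a convenient pivot for analysis and for translation.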

Navya Nyaya -> AI?
Classify Nyaya into five parts…
1. Ontology
2. Epistemology
3. Technical Language
4. Semantics
5. Art of debate and fallacies

Ontology includes…
- Categories – Substance, Quality etc.
- Relations – SamavAya, SvarUpa…
- Universals – Types or classes…
Ontology helps various areas like NLP, K-Repr. and K-Engg., especially in cognitive sciences.

Epistemology deals with…
- Cognitive process
- Cognitive structure
It helps to solve the problems of cognitive sciences and K-Repr.

Technical Language
- NNL (Navya Nyaya Language) is a restricted language that has both features: the mechanical power of artificial languages and the expressive power of natural languages. The basic ideas behind this language will be helpful in Knowledge Representation.

Semantics
- The way of analysing semantics shown by the Navya Naiyayikas has been found crucially helpful in NLP and Machine Translation. E.g.:
  - Classification of words – rUdha, yoga
  - Syntactical analysis
  - Power of definitions
  - KR & NN

Semantics in MT
- Lexicography
  - Word/concept nets based on NN ontology
  - Classification of padas (words)
    - Rudha – word has convention, i.e. names…
    - Yaugika – word has etymological meaning… cook, driver
    - Yoga-rudha – word has etymology as well as convention… CD-driver

WSD – using different techniques
- Definitions of Karaka relations without any overlap
  - Kartrtvam = kriyAnukUlakritimattvam
  - Karmatvam = para-samaveta-kriyA-janyaphala-Ashrayatvam
- Going – Rama and Forest
  - Who is going where?
  - The result, contact, is possible in Rama too
  - To avoid such overlap, this def. is useful

Refinement of karaka relations
- Classification of Karma
  - Karma – reachable, understandable and so on
- Analysis of root semantics
  - Leave – He left the place / left from the place
  - Rats killed cats
- Analysis of expectancy (AkAnkshA)

To-infinitive relation
- I stand up to speak
- I want to speak
- He goes to London to study law
- He wants to study law in London
- To walk in the mornings is good for health

Namaste!
Special thanks to the authorities of Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram.