DOI: 10.21917/ijsc.2016.0188

Prachi Dalvi, Varsha Mandave, Madhu Gothkhindi, Ankita Patil, S. Kadam and Soudamini Pawar
Department of Computer Engineering, D Y Patil College of Engineering, India

Abstract
Ontology is defined as a shared specification of a conceptual vocabulary used for formulating knowledge-level theories about a domain of discourse. Ontology modeling is used for knowledge representation in various domains. India is an agriculture-based economy: the majority of the Indian population relies on farming, but technology is only sparsely used to aid farmers. Ontology-based modeling of agricultural knowledge can change this scenario, particularly when farmers can understand the knowledge easily in their native language. We propose a system that models and extracts knowledge in the Marathi language. The dataset is created by manually collecting information about different diseases related to crops. In this paper, we review various existing agriculture ontologies along with some Natural Language Processing (NLP) models. The ontology model for the agriculture domain aims to retrieve relevant answers to farmers' queries. We explored Rule-Based and Conditional Random Fields (CRF) based models for ontology extraction. The extraction methods and preprocessing phases of the proposed system are discussed.

Keywords:
Ontology Modeling, Agriculture, NLP, Marathi, Domain Ontology

1. INTRODUCTION
More than 70 percent of the population in India depends on agriculture as a means of livelihood. The agriculture domain is very vast. A large amount of data has been written in books, and a lot of electronic data is available today. Farmers in India are badly affected by not being able to get the vital information required to support their farming activities in a timely manner. Some of the required information can be found on government websites, in agriculture department leaflets, and in radio and television programs. Due to its unstructured and varied format, and the lack of targeted delivery methods, this knowledge is not reaching the farmers. India is a diverse country in which the language changes every 20 kilometers, which makes communication difficult. Moreover, as the majority of Indian farmers are not educated, it is difficult for them to handle the English language. It is therefore necessary to have a system that helps farmers gain knowledge in their native language.

Marathi is the regional language of Maharashtra state. It uses a modified version of the Devanagari script, and a few dialects of Marathi are Standard Marathi, Varhadi, Dangi and Ahirani. Over 68 million people in western and central India speak Marathi, which is an Indo-Aryan language. Like Hindi, it is written in the Devanagari script, which is also used to write Sanskrit. Marathi is among the languages with the largest number of native speakers in India.

Marathi is spoken throughout Maharashtra state, which consists of 34 different districts. Marathi is the most effective and common means of communication among farmers in Maharashtra, and most of the farmers in Maharashtra are able to understand it.

The most widely quoted definition of "ontology" was given by Tom Gruber in 1993, who defines an ontology as "an explicit specification of a conceptualization" (Gruber, 1993) [1]. Ontologies have proved their usefulness in different application scenarios, such as natural language processing, the semantic web, intelligent information integration, knowledge-based systems and digital libraries. Ontologies are developed to separate domain knowledge from operational knowledge, which makes it possible to reuse both.

Ontologies in specific domains such as health care have been developed on a large scale. In health care, the information regarding medical treatments is consistent worldwide. In the agricultural field, however, the information changes according to environmental conditions and geographic locations. Agricultural information has strong local characteristics in relation to climate, culture, history, languages and local plant varieties. Farmers in India come from different states, and different states have different languages. Language becomes a barrier because farmers are often unfamiliar with other languages. Due to this, it is difficult to build a universal ontology that answers farmers' queries according to environmental conditions and in their native language. The proposed system extracts knowledge in a native language, i.e. Marathi. It will help Marathi-speaking farmers gain knowledge regarding crop diseases.

Natural Language Processing (NLP) is a very active area of research and development in Computer Science. Examples of NLP applications are machine translation and automatic speech recognition. NLP techniques are used to process input in the form of natural language, i.e. human-understandable text. The idea behind NLP is to interpret the input as a whole by combining the structure and meaning of its words; that is, interpretations are obtained by matching patterns of words against the input utterance.

Fig.1. Architectural Overview


PRACHI DALVI et al.: ONTOLOGY EXTRACTION FOR AGRICULTURE DOMAIN IN MARATHI LANGUAGE USING NLP TECHNIQUES

Agriculture is considered to be a very important sector in India, and agro-based industries play a vital role in the economic growth of the country. It is important that all the data regarding the agriculture domain is well organized and properly arranged, so that farmers can easily retrieve interrelated data. Ontology extraction techniques can be used for this kind of integration. The use of an ontology for extraction may provide substantial benefit to users in terms of:
• Describing and representing data in an explicit manner.
• Understandability, i.e. farmers can understand the information easily in their native language.
• Support for the agriculture education system, agriculture domain experts and researchers.

The objective of this paper is to highlight the techniques and methods used in the phases of keyword identification, extraction and construction of an agricultural domain ontology for the Marathi language, along with ontology evaluation terminologies.

The remainder of this paper is structured as follows: Section 2 describes the literature review on existing agriculture ontologies and the challenges we faced. Section 3 discusses the proposed system and the ontology extraction methods. Section 4 discusses the results in tabular format, and Section 5 concludes the paper.

2. LITERATURE REVIEW
Juana Maria Ruiz-Martinez and four other researchers [2] proposed ontology learning from biomedical natural language documents using UMLS. This approach relies on natural language processing and knowledge acquisition techniques to obtain the relevant concepts and relations to be included in an OWL ontology; they proposed a methodology for building biomedical ontologies from texts. In India, researchers at the Indian Institute of Technology Bombay (IITB) are still working on a crop-specific ontology for the cotton crop and on building ontologies from text documents. Crop-specific ontologies for the rice crop were also built at IITK. The Agropedia platform is basically an agricultural knowledge platform that includes all aspects of agriculture in India; it was developed by the Indian Institute of Technology Kanpur (IITK).

According to Ling Cao et al., the Agriculture Literature Retrieval System defined agriculture literature concepts captured from the Encyclopedia of Chinese Agriculture and the Catalogue of Ancient Chinese Agricultural Literatures, together with keywords extracted from research papers on Chinese agricultural history [19].

The Food Safety Semantic Retrieval System is an ontology-based encyclopedia semantic retrieval experimental system. It gives users access to the accumulated knowledge in the food safety domain [18], such as food safety knowledge in the field of emergencies and the creation of raw food items.

2.1 EXISTING AGRICULTURE ONTOLOGIES
There are many ontologies available online in the agricultural domain, including ontologies for different types of crops. The following are some examples.

2.1.1 Integrated Agriculture Information Framework (IAIF):
The Integrated Agriculture Information Framework (IAIF) is one of the useful solutions for ontology extraction. The IAIF technique makes knowledge extraction possible from various domain-related repositories. Its main functions are extracting relevant information, and combining, merging and aggregating the data in existing knowledge repositories. The three sub-ontologies included in the IAIF agriculture ontology are the Domain Ontology, the Resource Ontology and the Linking Ontology [3]. Gelian Song, Maohua Wang and Xiao Ying put forward an agriculture domain knowledge ontology representation method. Through the expression of crop planting information, they transform natural language descriptions or unstructured information into formal, structured knowledge, and use that knowledge to support agricultural problem solving and decision making effectively [5] [6].

2.1.2 Scalable Service Oriented Agriculture Ontology for Precision Farming (ONTAgri):
The Scalable Service Oriented Agriculture Ontology for Precision Farming (ONTAgri) is proposed for use in the agriculture domain, which consists of several farming practices such as irrigation, fertilization and pesticide spraying [3] [4].

2.1.3 AGROVOC:
AGROVOC is a structured thesaurus created in 1980 by FAO and the European Communities. It covers the fields of food, agriculture, fisheries, forestry, animal husbandry, etc. It is a multilingual thesaurus used for a wide range of applications, and it contains more than 10,000 records. AGROVOC is owned and managed by FAO and maintained by an international community of experts and institutions active in the area of agriculture. It is widely used in specialized libraries as well as in digital libraries and repositories to index content, and it is also used as a specialized tagging resource for knowledge and content organization by FAO and other third-party stakeholders. Caterina Caracciolo, Armando Stellato and five other researchers [2] provide an overall description of the AGROVOC Linked Dataset and detail its maintenance and publication process.

2.1.4 Agricultural Ontology Service (AOS):
AOS is designed to use AGROVOC at its core. The main purpose of AOS is to achieve interoperability among different agriculture systems [3]. It also serves as a common set of core terms and relationships, as well as richer relationships, that can be shared among knowledge organization systems.

2.1.5 World Agriculture Information Center (WAICENT):
WAICENT is a multilingual knowledge management system powered by FAO. This knowledge repository consists of a universal meta-model and localized content for a variety of users, with appropriate interfaces that support information access in multiple languages [16]. With the help of WAICENT, FAO's knowledge of agriculture is available to users around the world through the internet [7].

ISSN: 2229-6959 (ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, OCTOBER 2016, VOLUME: 07, ISSUE: 01

2.1.6 Citrus Water and Nutrient Management System (CWMS):
This system addresses the water and nutrient balance processes for citrus production. Soil profile, root distribution, plant growth, irrigation system, nutrient cycling and weather are included in this system [15], which contains 700 symbols and 500 equations.

2.1.7 OntoSim-Sugarcane:
OntoSim is an application whose basic purpose is to represent hydrology, soil layers, soil cells, soil moisture, root distribution, crop growth on organic soils and nutrient uptake in southern Florida sugarcane production. 195 equations and 247 symbols are included in this collection [16].

2.1.8 OntoCrop Ontology:
This ontology was constructed for the horticulture domain of agriculture, and the author examines its usage in that particular domain. The ontology is used:
• As a domain model for rule knowledge base construction.
• As a refining and classification tool facilitating the indexing and searching process in a repository environment [17].

2.2 CHALLENGES

2.2.1 Data Collection:
Agriculture data in the Marathi language is not available electronically. The data was first collected in English and then converted into Marathi using Google Translate. This was the main challenge we faced during data collection.

2.2.2 Structured Data:
Another challenge we faced was that structured data was not available for agriculture in the Marathi language, so we had to deal with unstructured data.

2.2.3 Standard Tool:
There are no standard tools to model an ontology in the Marathi language. Hence the ontology is modeled manually.

2.2.4 Processing:
Parsing Marathi statements was difficult, and so was applying preprocessing techniques to natural language text in Marathi.

3. PROPOSED SYSTEM
The ontology extraction process is described in Fig.2. In the proposed system, the farmer enters a query in the Marathi language. The input of the process is this query, and the output is the answer related to the query, i.e. pesticide names. Three main phases are necessary:
(a) Preprocessing
(b) Keyword identification
(c) Knowledge extraction
For each query, these phases extract the ontological entities contained in the current text.

Fig.2. System Flow (Input Query in Marathi Format, Pre-processing, Keyword Identification, Extraction Model: Template based model / CRF model, Output Result)

3.1 PREPROCESSING
Preprocessing is an important task in Natural Language Processing (NLP). Here, data preprocessing is used for extracting interesting, non-trivial knowledge from the unstructured Marathi text query. The text corpus is processed using various techniques such as tokenization, morph analysis, POS tagging and stop word removal.

3.1.1 Part-of-Speech Tagging [12][8]:
Part-of-Speech (POS) tagging is a starting point for processing textual information. Words having similar syntactic behaviour are grouped into classes.

3.1.2 Tokenization:
In tokenization, a stream of text is broken into words, phrases and symbols, which are called tokens. The words in a sentence are explored through tokenization, which is used to identify the meaningful keywords. These tokens are then given to the parser.

3.1.3 Stop Word Removal:
Stop words are used to join words together in a sentence. They occur very frequently but carry little meaning on their own. For example, stop words like 'and', 'are', 'or' and 'this' are not used for the classification of documents. Stop words are not useful for searching, so they are removed.

3.1.4 Stemming:
The process of conflating the variant forms of a word into a common representation is called stemming. For example, the words "presentation", "presented" and "presenting" are reduced to the common representation "present".

3.1.5 Syntactically driven parsing:
The way that words can fit together to form higher-level units such as phrases, clauses and sentences is called syntax. Syntax analyses are obtained by applying a grammar that determines which sentences are legal in the language being parsed.

3.2 KEYWORD IDENTIFICATION
Identifying keywords is one of the important tasks when working with text. The proposed system extracts domain-specific terms from the text corpus. To extract key phrases from the text corpus, different lexical patterns are applied. The relevance of a key term is calculated by counting the frequency of the term in the text corpus. Keyword identification is useful because keywords reduce the dimensionality of text to the most important features.
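The preprocessing and keyword-identification steps above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the regular-expression tokenizer, the deliberately tiny stop-word list `MARATHI_STOP_WORDS` and the function names are hypothetical, and POS tagging, morph analysis and stemming are left out. Keyword relevance is scored by raw term frequency, as the text describes.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Marathi stop-word list for illustration only;
# the paper does not publish the list it used.
MARATHI_STOP_WORDS = {"आहे", "आहेत", "च्या", "ची", "चे", "वर", "तर", "आणि", "व"}

# A token is any run of characters that is not whitespace or common punctuation,
# including the Devanagari danda (U+0964) and double danda (U+0965).
# Vowel signs and viramas are not excluded, so they stay attached to their word.
TOKEN_PATTERN = re.compile(r"[^\s.,?!\u0964\u0965]+")


def tokenize(text: str) -> list[str]:
    """Break a query into word tokens."""
    return TOKEN_PATTERN.findall(text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in MARATHI_STOP_WORDS]


def term_frequencies(corpus: list[str]) -> Counter:
    """Count each remaining term across the corpus; frequent terms are keyword candidates."""
    counts: Counter = Counter()
    for document in corpus:
        counts.update(remove_stop_words(tokenize(document)))
    return counts


if __name__ == "__main__":
    # Illustrative query modelled on the papaya black-spot question.
    query = "पपई वर काळे ठिपके पडले आहेत. कशाची फवारणी करावी?"
    print(remove_stop_words(tokenize(query)))
    # ['पपई', 'काळे', 'ठिपके', 'पडले', 'कशाची', 'फवारणी', 'करावी']
```

Whitespace splitting is a simplification: in running Marathi text many postpositions are written attached to the noun (e.g. "पपईवर"), which a real system would handle with morph analysis rather than a stop list.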

3.3 EXTRACTION METHODS

3.3.1 Rule-based Method:
Rule-based models allow the rules to be written explicitly. A rule-based system consists of a set of rules, a working memory for storing states, a schema for matching the rules, and a conflict resolution schema for when more than one rule is applicable. Using the rule-based method, we developed the rules needed to extract the main verb of a sentence along with its subject and objects. Different rules are adopted for different categories of question. The extraction strategy combines regular expressions with term searching within a specific vocabulary. We also collected terms frequently mentioned in questions, such as कोणते and कोणत्या (both meaning "which"). First, we locate the attributes by identifying related keywords; then we apply different transformation rules.

3.3.2 Conditional Random Fields (CRF's) Method:
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data. The idea of a CRF is to define a conditional probability distribution over label sequences given a particular observation sequence. A CRF is a discriminative model and does not assume that the features are independent. We have used CRF for knowledge extraction in our project. The proposed system relies on a training corpus of sentences, which are split into short segments containing at most one word to extract. The first operation that the algorithm must naturally execute is preprocessing, i.e. stop word removal, tokenization and part-of-speech tagging, after which the words to be extracted are identified. Our CRF implementation consists of two steps. First, the CRF identifies relevant terms, which are marked as being part of a relevant entity. Second, if consecutive words are identified as belonging to one entity (e.g. for the mango crop), they are deterministically merged into one concept. Finally, each concept is represented as a concatenation of the consecutive entity words. Features include word identity, word score features, features for handling words that are new or have only been observed in other states so far, starting features, ending features, and transitions among class labels. We picked one to three keywords for each question, i.e. the crop name and the disease name. The CRF is used to classify the identified relevant entities and provides the answer to the farmer's query: by identifying the most useful keywords in the query, the related pesticides are extracted.

4. RESULTS AND DISCUSSIONS
Ontology evaluation basically depends on two aspects, i.e. quality and correctness. A number of frameworks and methodologies are available for ontology evaluation.

Table.2. Ontology Evaluation
Perspective    Evaluation Metric    Measure
Correctness    Accuracy             Precision: total number correctly found over the whole knowledge defined in the ontology
                                    Recall: total number correctly found over all knowledge that should be found
               Consistency          Count: number of terms with inconsistent meaning
Quality        Efficiency           Size
               Clarity              Number of word senses

Standard metrics (precision, recall and F-score) are used for measuring the performance. Let S be the size of the ground truth list (the agriculture expert's annotations), D the number of correct, distinct values returned by the system, and N the total number of values returned by the system. Then:

$$\text{precision} = \frac{D}{N} = \frac{\text{number of correct, distinct values returned by the system}}{\text{total number of results returned by the system}} \qquad (1)$$

$$\text{recall} = \frac{D}{S} = \frac{\text{number of correct, distinct values returned by the system}}{\text{size of the ground truth list}} \qquad (2)$$

$$\text{F score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (3)$$

We evaluated the values provided by the system with an agriculture expert to check their correctness. Two classes are considered, disease and non-disease: if the system provides a disease, the value 1 is assigned; if the system does not show any result, the value 2 (non-disease class) is assigned.

Table.1 shows the different questions asked by the farmers. For training and testing, we also included crops from other categories, such as mango, tomato and bhendi (okra).

Table.1. Questions Designed
Question ID    Question (English translation of the Marathi query)
1              The chilli crop is affected by aphids (mava). Please suggest a remedy.
2              There are spider mites on the brinjal flowers. Which pesticides should I use?
3              Black fly has attacked the orange crop. What should I do to get rid of the fly?
4              What measures should be taken against fruit-sucking pests in oranges?
5              Papaya leaves have become curled due to ring spot virus. Please suggest a remedy to straighten the leaves and eliminate the virus.
6              Black spots have appeared on papaya. Please let me know what should be sprayed to remove them.
7              Chilli is infested with red mites. How can they be destroyed?
8              How do I control scale insects on the pomegranate stem?
9              Caterpillars eating the leaves and pods of the drumstick tree have appeared. Please give proper guidance.
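The two CRF steps described above (per-token features, then merging consecutive entity words into one concept) can be sketched as follows. This is a hedged illustration, not the authors' code: the lexicons `CROP_NAMES` and `DISEASE_NAMES`, the label names and the function names are hypothetical, "starting" and "ending" features are interpreted here as both character prefixes/suffixes and query-start/query-end flags, and the training step is omitted. Lists of such per-token feature dictionaries are the input format accepted by CRF toolkits such as sklearn-crfsuite, whose `CRF.fit(X, y)` takes one feature-dict list and one label list per sentence.

```python
# Hypothetical crop and disease lexicons; the paper's actual lists are not published.
CROP_NAMES = {"पपई", "मिरची", "संत्रा", "वांगे"}
DISEASE_NAMES = {"मावा", "लालकोळी", "स्पायडरमाईटस"}


def token_features(tokens: list[str], i: int) -> dict:
    """Feature dictionary for token i of a preprocessed query."""
    word = tokens[i]
    return {
        "word": word,                         # word identity
        "is_crop": word in CROP_NAMES,        # crop-name lexicon feature
        "is_disease": word in DISEASE_NAMES,  # disease-name lexicon feature
        "prefix2": word[:2],                  # starting characters, help with unseen words
        "suffix2": word[-2:],                 # ending characters, e.g. case markers
        "BOS": i == 0,                        # token starts the query
        "EOS": i == len(tokens) - 1,          # token ends the query
        "prev_word": tokens[i - 1] if i > 0 else "<START>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<END>",
    }


def sentence_features(tokens: list[str]) -> list[dict]:
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def merge_entities(tokens: list[str], labels: list[str],
                   outside: str = "O") -> list[tuple[str, str]]:
    """Merge runs of consecutive tokens sharing an entity label into one concept."""
    concepts: list[tuple[str, str]] = []
    current_words: list[str] = []
    current_label = None
    for word, label in zip(tokens, labels):
        if label != outside and label == current_label:
            current_words.append(word)  # extend the running entity
            continue
        if current_words:  # close the previous entity, if any
            concepts.append((" ".join(current_words), current_label))
        if label == outside:
            current_words, current_label = [], None
        else:
            current_words, current_label = [word], label
    if current_words:
        concepts.append((" ".join(current_words), current_label))
    return concepts


if __name__ == "__main__":
    # Labels stand in for the output of a trained CRF (hypothetical).
    tokens = ["ring", "spot", "virus", "on", "papaya"]
    labels = ["DISEASE", "DISEASE", "DISEASE", "O", "CROP"]
    print(merge_entities(tokens, labels))
    # [('ring spot virus', 'DISEASE'), ('papaya', 'CROP')]
```

Merging on identical adjacent labels follows the paper's description but cannot separate two neighbouring entities of the same type; a BIO labelling scheme (B-DISEASE, I-DISEASE, ...) is the usual remedy.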

Fig.3 and Fig.4 show screenshots of the system implementation.

Fig.3. Enter query in Marathi
Fig.4. Answer of query

Overall performance of the rule-based system: Precision, Recall and F-score values were obtained for each question. For question 3 and question 9 there are no values, as the information is not present in the ontology; for such queries the system returns a message that the crop name or disease name is not present ("Information about the above crop or disease is not available").

The classification produced by the system is compared with the expert's classification below (1 = disease, 2 = non-disease).

Classification
Question ID    Classification    Expert
Q.1            1                 1
Q.2            1                 1
Q.3            2                 1
Q.4            1                 1
Q.5            1                 1
Q.6            1                 1
Q.7            1                 1
Q.8            1                 1
Q.9            2                 1
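The per-question scores follow directly from Eqs. (1)-(3). The sketch below computes them for one question from the list of values the system returned and the expert's ground-truth list, and macro-averages the F-score over several questions. It is illustrative only: the function names and sample values are hypothetical, and scoring a query with no returned values (such as questions 3 and 9) as zero is an assumption, since the paper does not state how those questions enter the average.

```python
from statistics import mean


def evaluate(returned: list[str], ground_truth: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) following Eqs. (1)-(3).

    D = number of correct, distinct values returned by the system
    N = total number of values returned by the system
    S = size of the ground-truth list
    """
    d = len(set(returned) & set(ground_truth))
    n = len(returned)
    s = len(ground_truth)
    precision = d / n if n else 0.0  # assumption: an empty answer scores 0
    recall = d / s if s else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


def average_f_score(per_question: list[tuple[list[str], list[str]]]) -> float:
    """Macro-average the F-score over (returned, ground_truth) pairs."""
    return mean(evaluate(returned, truth)[2] for returned, truth in per_question)


if __name__ == "__main__":
    # Hypothetical question: 4 values returned, 3 of them among the expert's 5.
    p, r, f = evaluate(["A", "B", "C", "X"], ["A", "B", "C", "D", "E"])
    print(p, r, round(f, 3))  # 0.75 0.6 0.667
```

Because D counts distinct correct values while N counts every returned value, duplicated answers lower precision without raising recall, which matches the wording of Eq. (1).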

5. CONCLUSION
In this paper, we have shown ontology modelling for the agriculture domain in the Marathi language. According to the farmer's query, the relevant information is extracted and a solution is provided. The average F-score we obtained for 9 questions is 76.98%. In future, the system can be expanded to more crops and made available in other Indian languages.

Fig.5. Precision graph for above query
Fig.6. Recall graph for above query
Fig.7. F-Score graph for above query

REFERENCES
[1] Brijesh Bhatt and Pushpak Bhattacharya, "Domain Specific Ontology Extractor for Indian Languages", Proceedings of the 10th Workshop on Asian Language Resources, pp. 75-84, 2012.
[2] Juana Maria Ruiz Martinez et al., "Ontology Learning from Biomedical Natural Language Documents using UMLS", Expert Systems with Applications, Vol. 38, No. 10, pp. 12365-12378, 2011.
[3] Gelian Song et al., "Study on Precision Agriculture Knowledge Presentation with Ontology", Proceedings of Conference on Modelling, Identification and Control, Vol. 3, pp. 732-738, 2012.
[4] Rayner Alfred et al., "Ontology-Based Query Expansion for Supporting Information Retrieval in Agriculture", Proceedings of 8th International Conference on Knowledge Management in Organizations, pp. 299-311, 2014.
[5] Aqeel-ur Rehman and Zubair A. Shaikh, "ONTAgri: Scalable Service Oriented Agriculture Ontology for Precision Farming", Proceedings of International Conference on Agricultural and Biosystems Engineering, 2011.
[6] Caterina Caracciolo et al., "The Agrovoc Linked Dataset", Semantic Web, Vol. 4, No. 3, pp. 341-348, 2013.
[7] Boris Lauser, Margherita Sini, Anita Liang, Johannes Keizer and Stephen Katz, "From Agrovoc to the Agricultural Ontology Service/Concept Server", Food and Agriculture Organization of the United Nations, 2006.
[8] A. Mangstl, "The World Agricultural Information Centre (Waicent) Faos Information Gateway", Proceedings of 1st European Conference for Information Technology in Agriculture, 1997.
[9] Chris Manning and Hinrich Schutze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999.
[10] Daniel Jurafsky and James H. Martin, "Speech and Language Processing", 2nd Edition, Prentice Hall, 2008.
[11] Leyla Zhuhadar et al., "A Synergistic Strategy for Combining Thesaurus based and Corpus-based Approaches in Building Ontology for Multilingual Search Engines", Computers in Human Behavior, Vol. 51, pp. 1107-1115, 2015.
[12] Hui Wang, Weide Zhang, Qiang Zeng, Zuofeng Li, Kaiyan Feng and Lei Liu, "Extracting Important Information from Chinese Operation Notes with Natural Language Processing Methods", Journal of Biomedical Informatics, Vol. 48, pp. 130-136, 2014.
[13] Alexandre Trilla, "Natural Language Processing Techniques in Text-to-Speech Synthesis and Automatic Speech Recognition", Departament de Tecnologies Media, pp. 1-5, 2009.
[14] Agropedia, Available at: http://www.agropedia.iitk.ac.in
[15] Howard Beck, Kelly Morgan, Yunchul Jung, Sabine Grunwald, Ho-Young Kwon and Jin Wu, "Ontology-based Simulation in Agricultural Systems Modeling", Agricultural Systems, Vol. 103, No. 7, pp. 463-477, 2010.
[16] Ho-Young Kwon, Sabine Grunwald, Howard W. Beck, Yunchul Jung, Samira H. Daroub, Timothy A. Lang and Kelly T. Morgan, "Ontology-based Simulation of Water Flow in Organic Soils applied to Florida Sugarcane", Agricultural Water Management, Vol. 97, No. 1, pp. 112-122, 2010.
[17] Michael T. Maliappis, "Applying an Agricultural Ontology to Web-based Applications", International Journal of Metadata, Semantics and Ontologies, Vol. 4, No. 1-2, pp. 133-140, 2009.
[18] Yuehua Yang, Junping Du and Meiyu Liang, "Study on Food Safety Semantic Retrieval System based on Domain Ontology", Proceedings of IEEE International Conference on Cloud Computing and Intelligence Systems, pp. 40-44, 2011.
[19] Ling Cao and Lin He, "Domain Ontology-based Construction of Agriculture Literature Retrieval System", Proceedings of 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4, 2008.