
(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 8, No. 1, April 2010

URBANIZING THE RURAL AGRICULTURE - KNOWLEDGE DISSEMINATION USING NATURAL LANGUAGE PROCESSING

Priyanka Vij (Author)
Student, Computer Science Engg.
Lingaya's Institute of Mgt. & Tech, Faridabad, Haryana, India
priyankavij6@gmail.com

Harsh Chaudhary (Author)
Student, Computer Science Engg.
Lingaya's Institute of Mgt. & Tech, Faridabad, Haryana, India
harsh_hps@yahoo.co.in

Priyatosh Kashyap (Author)
Student, Computer Science Engg.
Lingaya's Institute of Mgt. & Tech, Faridabad, Haryana, India
priyatoshkashyap@gmail.com

ABSTRACT - Indian rural agriculture faces many problems, including inadequate irrigation, unfavorable weather conditions, and a lack of knowledge regarding market prices, animals, tools and pest prevention methods.
Hence there is a need to develop a method that enables our rural farmers to gain knowledge. Knowledge can be gained by communicating with experts in the various fields of the agricultural sector. Therefore, we aim to provide farmers with an interactive kiosk panel through which they can get an easy and timely solution to their queries within 24 hours, without having to travel to distant places or make long-distance calls to gain information.
Hence we focus on the development of software that would provide immediate aid to the farmers in every possible manner. It would be an interactive system providing end-to-end connectivity between the farmers and international agricultural experts, who can help them solve their queries, thereby enhancing the sources of information available to the farmers.

Keywords: Rural Agriculture; Farmers; Natural Language Processing; Speech recognition; Language translation; Speech synthesis

I. INTRODUCTION
India is an agro-based country whose major sector is the rural region. Agriculture is one of the major sources of livelihood in India. Current agricultural practices are neither economically nor environmentally sustainable, and India's yields for many agricultural commodities are low.
Factors responsible for this include unpredictable climate, growth of weeds, lack of knowledge about land reforms and market prices, falling profit margins, a lack of technology, proper machinery, instant troubleshooting and knowledge of agricultural advancements, and poor communication.
To overcome these problems, farmers should be made aware of current trends in the field of agriculture, so that the entire agricultural system can be upgraded to overcome the bottlenecks in agricultural growth.
Communication is the backbone of solving any problem, irrespective of the field it belongs to. We have therefore made communication an integral part of our project. To make this communication interactive, we make use of an upcoming technology, "Natural Language Processing".

A. Natural Language Processing
Natural Language Processing is used to communicate with the computer in our natural language.
Using it, we aim to provide interactive end-to-end voice communication, in which the sender's voice in his own language is converted to voice in the receiver's language. This entire process of voice-to-voice transformation may be divided into 3 phases:

Figure 1. Process of Natural Language Processing

1) Speech recognition: Converting the spoken words to machine-readable input. It includes the conversion of continuous sound waves into discrete words.[1]
2) Language translation: The translation of one natural language into another.
3) Speech synthesis: The artificial production of human speech. A text-to-speech system converts normal language text into speech.[2]

B. Background and Related Work
A lot of work has been done in the field of agricultural extension to provide farmers with ready-to-use knowledge. Many methods for this have been implemented in India and in other countries.

1) aAQUA - Almost All Questions Answered (aAQUA) is a web-based query answering system that helps farmers with their agricultural problems. aAQUA is a multilingual (Marathi, Hindi, and English) system that provides online answers to questions asked over the Internet.[3]

163 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
Shortcomings:
It is a web portal, but the majority of the rural population in India is not literate.
A person must always be available to register the farmer's query, which is not always feasible.
If a person speaks in his regional language (e.g. Hindi), his query reaches only the experts who understand that language.

2) E-Choupal: Internet kiosks enable farmers to obtain information on mandi prices and good farming practices, and to place orders for agricultural inputs like seeds and fertilizers.[4] Each kiosk has access to the Internet.
Shortcoming:
It deals only with market facilities and not with troubleshooting.

3) Gyandoot: An intranet network of computers connecting the rural areas and fulfilling the everyday information-related needs of the rural people.[5] It makes use of information and communication technology to provide online services.
Shortcoming:
It has so far been implemented only as an intranet, within a small district.

4) Agriculture Information System: It uses Agri Portal, Mobile Agriculture, Kisan Help Desk, Agri Learning, Agri GIS and Integration.[6] It broadcasts information through mobile phones, by voice or SMS.
Shortcoming:
It uses voice interaction, but only through mobile phones.

C. Overcome The Shortcomings
We implement our system in an Information Kiosk, which helps us rectify the above shortcomings.
The farmers get first-hand information regarding crops, their market prices and anything else they wish to know within 24 hours, without having to go to various places in search of the proper tools and techniques in use.
We understand that a farmer may or may not be literate. Hence we make limited use of written language and emphasize the pictorial representation of data.
The farmers can record their queries themselves using the microphone. Hence no helper is needed to operate the system for them.
Moreover, since Natural Language Processing is used to convert Hindi to English, and English is the most commonly understood language, the system allows free interaction between a rural farmer and international experts, thereby connecting Indian farmers not only with the experts who understand their language but also with experts from other countries.
We do not deal only with marketing facilities, but generalize the system to communication about any field in which the farmers face a problem.
When we say that we connect the farmers with international experts, we mean that we pose no restriction or barrier such as an intranet; rather, we use the Internet.
Help lines that farmers can call depend entirely on mobile network coverage and incur call charges. We remove this dependency by using the Internet, connecting the two ends at the limited cost of a dedicated Internet line.

II. ARCHITECTURAL ISSUES AND CHALLENGES
The various challenges we face in building such a system are:
Providing information to the rural, computer-illiterate population via the kiosks was a big challenge. To make illiterate people comfortable with our system, we designed a user interface that depicts pictorially, at a glance, what each section is about. For example, Fig. 2 points to the field "WEATHER".

Figure 2. Weather Icon

Proper connectivity, as our kiosks require 24-hour Internet access. For this reason a dedicated Internet connection is essential. A proper backup must also be planned for resource constraints such as connectivity issues.
Creating a voice recognition system that converts voice into text. This is a challenge because such systems require a large amount of training for accurate recognition. The major concern is the difference in regional accents among people speaking the same language (such as Hindi).
Translating one language to another. Grammatical errors may completely change the meaning of a sentence, in which case the receiver might understand something completely different from what was actually meant.
Synthesizing speech from text. The computer may not be able to pronounce a word properly, and the resulting audio might be a word that rhymes with the intended one, changing the entire meaning of the sentence.
Adding intelligence to the system. For the system to respond immediately, a database must be created in which previously answered queries are stored along with their solutions, so that when a similar question is asked, the system retrieves the most appropriate solution using Pattern Matching. But,

sometimes the keyword may match while the sense of the question is completely different.
Providing a better service response time. Considerable process re-engineering is needed so that the response, which currently arrives once the expert answers (at most 24 hours), can be lowered to a few minutes. This is only possible if we are able to add intelligence to the system.

III. METHODOLOGY
The methodology is the sequence of routines we adopt for the development of the system. Its development is divided into 3 phases:

Figure 3. Flow diagram of the entire System, showing the 3 phases

Phase 1: Farmers Asking Queries From The Expert
The farmer is first identified. The farmer then clicks on the button showing the picture related to his query; for example, a problem pertaining to diseases of crops would have a picture of crops on it.
The farmer then asks the question by voice in his own language (e.g. Hindi), and it is recorded. The message is translated into text in the expert's language and sent to the expert of that field.
This process is implemented by a Speech Recognition technique, which converts the farmer's voice to text, followed by Language Translation, which converts that text to English text. The intermediate English query thus produced is also stored in the database for use in Phase 3.

Phase 2: Experts Answering Back The Queries
The specific expert responds to the farmer's query by answering in text. The English text is then converted into an audio signal in the farmer's language (e.g. Hindi). The farmer who asked the query hears this audio the next time he logs in to the system.
This process is implemented by Language Translation, which translates the written English answer back into text in the farmer's language, followed by Speech Synthesis, which converts that text into audio signals. This audio reply is heard by the farmer as a response to the query he had posted. The answer text is stored in the database next to the question, for use in Phase 3.

Phase 3: Farmers Getting Instant Responses if Query Already Answered
If a farmer asks a query that has already been asked by another farmer and answered, the answer to that query is returned to the farmer instantly.
This process of Intelligent Information Retrieval is carried out by analyzing the keywords present in the query and searching the database for a pre-answered query that matches them accurately. If a perfect match occurs that is in context with the farmer's query, the answer is given to the farmer instantly; otherwise the process of Phase 1 is carried out.

We now discuss the main techniques with which our project transforms the input voice into the audio output at the same kiosk where the farmer asks the question:

A. Speech Recognition
Speech recognition is the conversion of spoken words to machine-readable input. It includes the conversion of continuous sound waves into discrete words. Speech recognition fundamentally functions as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech.[7]

Figure 4. Speech Recognition Process

The elements of the pipeline are:
1) Transform the PCM digital audio into a better acoustic representation: The PCM audio captured by the sound card is converted into a frequency-domain acoustic representation using a Fast Fourier Transform (FFT). This representation is easier for the computer to analyze.
2) Figure out which phonemes are spoken: We begin by applying a "grammar" to the data so that the speech recognizer knows what phonemes to expect. A grammar could be anything from a context-free grammar to a full-blown language. The computer, supplied with a database of the phonemes of that grammar, tries to identify the phonemes in the digitized data and picks out the matching references.

3) Convert the phonemes into words: The identified phonemes are then looked up in the database of words of that particular grammar, in the order in which they appeared in the digitized data, so that, phoneme by phoneme, a complete word is identified.

B. Language Translation
Language translation, also known as Machine Translation, is the translation of one natural language into another. Machine translation generally uses natural language processing and tries to define rules for fixed constructions. The original text is encoded in a symbolic representation from which the translated text is derived.

Figure 5. Showing Language Translation between English & Hindi. [8]

A newer approach to machine translation, the statistical approach, is used. The computer is fed billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. Statistical learning techniques are then applied to build a translation model.
Thus Statistical Machine Translation works by comparing large numbers of parallel texts that have been translated between the source and target languages. From these it learns which words and phrases usually map to which others, which is analogous to the way humans acquire knowledge of other languages. The problem with statistical machine translation is that it requires a large number of translated sentences, which may be hard to find.

C. Speech Synthesis
Speech synthesis is the artificial production of human speech. A text-to-speech system converts normal language text into speech. Text-to-speech fundamentally functions as a pipeline of processes that converts text into PCM digital audio.[9] The processes are:
1) Text Normalization: This component of text-to-speech converts any input text into a series of spoken words. Text normalization works as follows. First, it isolates the words in the text and deals with them individually. Second, it searches for numbers, times, dates and other symbolic representations, which are analyzed and converted to words. Then abbreviations are converted to proper words. Finally, the normalizer uses its rules to decide whether each punctuation mark causes a word to be spoken or is silent.
2) Homograph disambiguation: A "homograph" is a word with the same spelling as another word but a different pronunciation. We must therefore work out what the text is talking about and decide which meaning is most appropriate in the given context. This is done by guessing the part of speech from the word endings, or by looking the word up in a lexicon.
3) Word pronunciation: The pronunciation module accepts the text and outputs a sequence of phonemes. It first looks the word up in its own pronunciation lexicon; if the word is not found, it falls back on "letter to sound" rules, which are "trained" on a lexicon of hand-entered pronunciations.
4) Prosody: Prosody is the pitch, speed and volume with which syllables, words, phrases and sentences are spoken. Without prosody, text-to-speech sounds very robotic. First, the engine identifies the beginning and end of sentences. Engines also identify phrase boundaries. Finally, algorithms try to determine which words in the sentence are important to the meaning, and these are emphasized.
5) Concatenate wave segments: The speech synthesis is almost done by this point. All the text-to-speech engine has to do is convert the list of phonemes, with their duration, pitch and volume, into digital audio.

D. Intelligent Information Retrieval
Information Retrieval refers to the process of retrieving information that matches a certain condition. Here we match the keywords in the query against the resources already available in the database. This is done with the help of Pattern Matching techniques, which help us identify the keywords: they mainly consider the nouns, verbs, phrases and adjectives that may be of importance in the sentence, and ignore connectors such as prepositions. The system then searches for those keywords in the database, and the best-matching source, the one containing the maximum number of keywords, is taken into consideration. The word "Intelligent" is used because the system uses its knowledge to update the keywords to be searched in the database.

IV. EVALUATING THE TECHNIQUES
The techniques were applied to sample data, and examples of their use are considered below:

Speech Recognition
Table 1 highlights an example in which the farmer speaks in Hindi and the spoken sentence is recognized by the computer. The error in the recognized form and its percentage are then reported.

TABLE 1: SPOKEN SENTENCE IS RECOGNIZED BY THE COMPUTER AS TEXT

Language Translation
Tables 2 and 3 show the translation of text from one language to another, and the error

in translation is reported, along with the percentage of error in translating.

TABLE 2: RECOGNIZED TEXT IS TRANSLATED FROM HINDI TO ENGLISH

In Table 2, the Hindi text is converted to English text, as the question was asked by the farmer in Hindi.

TABLE 3: EXPERT'S REPLY TO THE FARMER'S QUERY, & TRANSLATING IT FROM ENGLISH TO HINDI

In Table 3, the English text is converted to Hindi text, as a reply to the farmer's question.

Speech Synthesis
Table 4 shows the use of the text-to-speech application. The translated text of the expert's answer is read out by the computer, and the error in pronunciation and its percentage are also reported.

TABLE 4: THE HINDI ANSWER IN TEXT IS THEN MADE INTO SPEECH

The resulting errors are shown in a chart, categorized as Speech Recognition, Language Translation 1 & 2, and Speech Synthesis.

Figure 6. The error percentage occurring in various segments

V. FUTURE SCOPE
This system can be extended in many directions, which we identified while developing it. Having laid the foundation, we can propose extensions to our system such as:
The recent prices of commodities can be listed on the screen so that farmers get first-hand information about them.
Helping farmers make online transactions for certain items.
Real-time speech-to-speech translation. According to Times Online, Google is developing a speech-to-speech automated translator for Android phones.
Video conferencing can be integrated into this project, so that farmers can communicate with the experts by video, with full speech-to-speech translation between the two ends.
Weather forecasting can be included in this project, so that farmers are warned of unfavorable weather conditions and can plan their crops according to the weather instead of falling victim to its uncertainty.
A multilingual system, not only for Hindi but also for other regional and foreign languages.
Many questions are elaborate and descriptive in nature. Techniques for extracting the actual question from such data need to be explored.

VI. PROTOTYPE OF 'URBANIZING THE RURAL AGRICULTURE'
Here we show screenshots of a working prototype of our software:

Figure 7. Implementation of Speech Recognition & Translation from Hindi to English

This screen demonstrates the process when the farmer asks a question in his language. After the "Enable Speech" button is pressed, the system recognizes speech word by word. The recognized Hindi words spoken by the farmer are then converted into English by pressing the "Hindi to English Transform" button.
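The keyword matching behind Phase 3 (Section III.D) can be illustrated with a minimal sketch. It is not the prototype's code: the stop-word list, the Jaccard overlap score, the `min_overlap` threshold and the sample questions and answers are all assumptions made for illustration. The source only states that connectors such as prepositions are ignored and that the stored query sharing the most keywords is chosen.

```python
import re

# Hypothetical stop-word list standing in for the "connectors" that
# Section III.D says are ignored.
STOP_WORDS = {"a", "an", "the", "is", "are", "my", "in", "on", "of", "to",
              "for", "from", "how", "do", "i", "what", "with", "and"}


def extract_keywords(query):
    """Lower-case the query, split it into words and drop the connectors."""
    words = re.findall(r"[a-z]+", query.lower())
    return {w for w in words if w not in STOP_WORDS}


def best_answer(query, answered, min_overlap=0.6):
    """Return the stored answer whose question shares the most keywords.

    `answered` maps a stored English question to its expert answer.
    Returns None when no stored question is close enough, in which case
    the query would be forwarded to an expert as in Phase 1.
    """
    query_keys = extract_keywords(query)
    best, best_score = None, 0.0
    for question, answer in answered.items():
        stored_keys = extract_keywords(question)
        union = query_keys | stored_keys
        if not union:
            continue
        # Jaccard overlap (an assumed scoring rule): shared / total keywords.
        score = len(query_keys & stored_keys) / len(union)
        if score > best_score:
            best, best_score = answer, score
    return best if best_score >= min_overlap else None


# Hypothetical database of already answered English queries.
answered_queries = {
    "How do I protect wheat from rust disease?": "Spray a recommended fungicide.",
    "What is the market price of rice?": "Check the mandi price list.",
}
```

Calling `best_answer("How to protect my wheat from rust disease", answered_queries)` returns the stored fungicide answer, while an unrelated question returns `None`. As Section II notes, keyword overlap alone cannot tell when two questions share keywords but differ in sense, so a real system would need a stricter check before answering instantly.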

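Section IV reports an error percentage for each stage but does not state how it was computed. One common choice, shown here purely as an assumption, is the word error rate: the number of word insertions, deletions and substitutions needed to turn the system output into the reference sentence, divided by the number of reference words. The function name and the English example sentences are hypothetical, since the sentences from Tables 1 to 4 are not reproduced in the text.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance from hypothesis to reference, as a percentage
    of the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,      # reference word missing
                               current[j - 1] + 1,   # extra hypothesis word
                               substitution))        # match or substitution
        previous = current
    return 100.0 * previous[-1] / len(ref)
```

For example, `word_error_rate("the crop needs more water", "the crop need more water")` counts one substitution among five reference words. The same function can score a recognized sentence against what was spoken, or a translated sentence against a reference translation.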
Figure 8. Implementing the Translation from English to Hindi

This screen demonstrates the process when the expert answers, in his language, the query asked by the farmer. The expert logs in to the system and answers the problem to the best of his knowledge. The answer written in English is then converted into Hindi. This text is then converted into speech, which the farmer can listen to when he logs in to the system.

VII. CONCLUSION
Agriculture is the most important source of livelihood in India, but problems still prevail in the agricultural field. Hence there is a need to enhance it in order to overcome these problems. To this end we have tried to optimize agricultural output using technology.
We have tried to bridge the gap between the farmers and the experts. The farmers will get instant solutions to their queries in real time, or within 24 hours at most. This will not only help the farmers get the correct solution, but will also save the time they would have spent travelling to an expert for the solution to a particular query, as well as the money spent communicating with him.
Natural language processing is a major part of this project. It converts the farmer's language (e.g. Hindi) into English, which is widely understood internationally. Since the application of natural language processing removes the language barrier, queries can be put to international experts as well. This is our initiative towards the development of interactive software that would help farmers employ the latest technologies and enhance their crop productivity.
Rural agriculture is the most undeveloped part of our country, and until we find ways to improve it, we are hindering the progress of our country. This affects not only the 'Growth Rate of the Indian Economy' but also the 'Global Growth Rate', for when we grow, it helps the world grow.

ACKNOWLEDGEMENT
We are heartily thankful to Dr. T. V. Prasad (HOD, C.S.E Dept, Lingaya's Institute Of Management & Technology), whose encouragement, guidance and support from the initial to the final level enabled us to develop an understanding of the subject.
We are deeply grateful to our Project Guide, Mr. Mohd. Ahsan (CSE Dept), for his detailed and constructive comments, and for his important support throughout this work.
Special thanks to Mr. Ashutosh Kashyap (CEO, CodeInnovations) for giving us the idea to take up this project, and for his timely support and help in acquainting us with new technologies.

REFERENCES
[1] Speech Recognition, http://en.wikipedia.org/wiki/Speech_recognition
[2] Speech Synthesis, http://en.wikipedia.org/wiki/Speech_synthesis
[3] aAQUA – A Multilingual, Multimedia Forum for the Community, Krithi Ramamritham, Anil Bahuman, Ruchi Kumar, Aditya Chand, Subhasri Duttagupta, G.V. Raja Kumar, Chaitra Rao, Media Lab Asia, IIT Bombay
[4] E-choupal, http://www.itcportal.com/sets/echoupal_frameset.htm
[5] Gyandoot, http://www.icmrindia.org/casestudies/catalogue/IT%20and%20Systems/ITSY022.htm
[6] Agriculture Information System, http://www.egovcenter.in/webinar/pdf/agriculture.pdf
[7] Speech Recognition, http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition1.htm
[8] An Insight to Natural Language Processing, Priyanka Vij, Harsh Chaudhary, Priyatosh Kashyap, Students, Dept. of Computer Science Engg., Lingaya's Institute Of Mgt. & Tech, Faridabad, Haryana, India
[9] Speech Synthesis, http://project.uet.itgo.com/textto1.htm

AUTHORS PROFILE
Priyanka Vij is a final year computer science student at Lingaya's Institute of Mgt. & Tech., Faridabad, Haryana, India. Her areas of interest include Bio Informatics, Natural Language Processing and Software development life cycle.

Harsh Chaudhary is a final year computer science student at Lingaya's Institute of Mgt. & Tech., Faridabad, Haryana, India. His areas of interest include Computer architecture, Natural Language Processing and project management.

Priyatosh Kashyap is a final year information technology student at Lingaya's Institute of Mgt. & Tech., Faridabad, Haryana, India. His areas of interest include Virtual Reality, Natural Language Processing and Robotics.
