Professional Documents
Culture Documents
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 529
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
this field. The data mining techniques used in agriculture for Thus, we maximize P (Ci / X). The class Ci for which P (Ci / X)
prediction of problems, disease detections, optimizing the is maximized is called the maximum posteriori
pesticide and so on. It is capable to provide a lot of hypothesis[6]. By Bayes theorem
information on agricultural-related activities, which can then
be analyzed in order to find important information and to P (Ci / X) = P (X / Ci) P (Ci) / P(X).
collect relevant information. The data mining techniques are
Only P (X / Ci) P (Ci) is maximized since P(X) is constant.
used for disease detection, pattern recognition by using
Under given data sets with many attributes, it would be
multiple application in this study[2]. In an another study,
extremely computationally expensive to evaluate P (X / Ci).
Ahsan Abdullah, Stephen Brobst, Ijaz Pervaizhave shown
For reduction of computation, the nave assumption of class-
that how data mining integrated agricultural data including
conditional independence is made.
pest scouting, pesticide usage and meteorological recordings
is useful for optimization (and reduction) of pesticide
P (X / Ci) = i)
usage[3]. Jyotshna Solanki, Prof. (Dr.) Yusuf Mulge discussed
about the techniques used in agriculture for exacting the
meaningful data for relevant information. Cunningham, S. J., = P (x1/Ci)P(x2/Ci)P (x3/Ci) .. P (xn / Ci)
and Holmes worked on a classification system capable of
sorting mushrooms into quality grades and achieving an Bayesian classifiers have the minimum error rate in
accuracy similar to that attained by human inspectors [4]. In comparison to all other classifiers [6].
another study, D Ramesh, B Vishnu Vardhan describe certain
Data Mining techniques adopted in order to estimate crop 3.2. RIPPLE -DOWN RULE LEARNER (RIDOR)
yield analysis with existing data [5].
Ripple-Down Rules devised by Prof. Paul Compton is a
As per as this study is concern, it is found that, till now work strategy of building system incrementally while they are
has been done on the mushroom to find out the mushroom already in use [7]. When a system is not able to handle a
which is edible or not. No such cases found which works on case or situation appropriately, there is a need to alter the
the disease of the mushroom cultivation and their system in such a way that the previously acquired
management. This study actually focuses on this area to knowledge of the system is not degraded, just revised where
diagnosis the mushroom disease using classification necessary. The alteration is made simply and rapidly and the
techniques of Data Mining. difficulty of making a change should not increase as the
system develops. RDR can be categorized as a type of
apprentice learning [7]. The following formulation of a
3. DATA MINING CLASSIFICATION ALGORITHMS ripple-down rules is shown below:
3.1. NAVE BAYES
if condition then conclusion [because case] except
Nave Bayesian classification is a supervised learning
if ....
process. In addition, it is a statistical method for
classification purposes. This classification is based on
else if ...
Bayesian theorem. Bayes Nave Bayesian classifiers assume
that the effect of an attribute value on a given class is
If a rule fires, but it generates an incorrect conclusion, RDR
independent of the values of the other attributes. This
has a fascination to add an except branch. Besides this, one
assumption is called class conditional independence. It is
condition is also kept in knowledge that if a rule fails to fire
made to simplify the computations involved, in this sense, is
when it should, an else branch is added. In RDR, the
considered nave [6]. It is looked attractive on when the
appropriate condition is chosen in such a way that the new
dimensionality of the supplied inputs is high. The process of
rule to be inserted is as general as possible for diagnosing
maximum likelihood is being used for parameter estimation
correctly the new case. An RDR keeps track those cases
in nave Bayesian models. Let D be a training set of tuples
which make a new rule create.
and their associated class labels. The representation of every
tuple is being done by an n-dimensional attribute vector, X= 3.3. SEQUENTIAL MINIMAL OPTIMIZATION (SMO)
(X1,X2,X3,.,Xn). Let there are m Classes C= (C1,C2,C3,..,Cm).
Under given dataset, the Nave Bayesian classifiers will It is the extension of Support Vector Machines (SVM) is a
predict that given a tuple X belongs to the class having the method for the classification of both linear and nonlinear
highest posterior probability, conditioned on X. That is, the data [6]. Extensions of basic SVM algorithm such as the
nave Bayesian classifier predicts that tuple X belongs to the sequential Minimal optimization developed by John C. Platt
class Ci if and only if, of Microsoft research, which is utilized in this paper,
implemented in WEKA can be used to train SVM faster,
P (Ci / X) > P (Cj/ X) for 1<= j <= m , j i better group samples clusters. The goal of SVM is to search
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 530
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 531
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
Using cross-validation parameter the evaluation has been produces a classifier with an algorithm from 90 labeled data
performed to observe the performance of the three and applies that on the 10 testing data for set 1 and does the
classification algorithms. It has been done to confirm the same thing for set 2 to 10 and produces 9 more classifiers.
best technique for the Mushroom Diseases Diagnosis with Afterwards, it averages the performance of the 10 classifiers
the help of performance factors or criteria. produced from 10 equal sized (90 training and 10 testing)
sets.
This study has been focused on such criterias like Accuracy,
Attribute Selected Classifier, Kappa Statistic, Confusion Kappa statistics plays a significant role in terms of
Matrix, True Positive(TP) Rate, False Positive(FP) Rate, classification in mushroom dataset. It is a chance-corrected
Precision, Recall, F-Measure, Receiver Operating measure of agreement between the classifications and the
Characteristics(ROC) Area, and Error rate such as Mean true classes of the entire data set. Kappa actually calculated
Absolute Error, Root Mean Squared Error, Relative Absolute by taking the agreement expected by chance away from the
Error, Root Relative Squared Error. observed agreement and dividing by the maximum possible
agreement. When a value is greater than 0, that means the
Table -1: Attribute Data Set classifier is doing better than chance, which happens in all
the three classifier that the study have been used.
Number of
Attribute
Attribute Description
Type
Attribute
Another important protagonist works here in this study is
Values
Mean absolute error (MAE). It measures the average
Effect_Of_Damage Nominal 16
magnitude of the errors in a set of forecasts, without
Effect_on_Cap Nominal 6 considering their direction thereof. It measures accuracy for
Mycelium_or_Patch_Color Nominal 11 continuous variables. It is the average over the verification
Compost_And_Casing_Soil_Color Nominal 11
sample of the absolute values of the differences between
forecast and the corresponding observation. Root mean
Odour Nominal 6
squared error (RMSE) measures the average magnitude of
Habitat Nominal 10 the error by using quadratic scoring rule. Here the difference
Liquid_Exudate Nominal 3 between forecast and corresponding observed values are
Use_damp_wood Nominal 2 each squared and then averaged over the sample.
erheated_compost Nominal 3
Table -2: Comparative Study among the Algorithms
Chicken_Or_Horse_Manure_Mixer_In_Com
Nominal 2
post
Excessive_Ammonia_Or_Nitrogen_In_Com Evaluation on Training Data SMO Nave Bayes RIDOR
Nominal 2 Set Simple
post
Correctly Classified Instances 110 (100%) 110 (100%) 98
Peat_In_Casing Nominal 2
(89.0909 %)
Improper_Phase_II_Composting Nominal 2 Incorrectly Classified 0 (0%) 0 (0%) 12
Dry_and_leathery Nominal 2 Instances (10.9091 %)
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 532
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
The Nave Bayes, SMO and RIDOR classifiers are compared agricultural cultivation. Moreover, while farmer cultivate
with accuracy measures that are depicted in below Table 3. mushroom, same situation might be happened. As a result, at
Here, Initially the Nave Bayes and SMO classification any phase of undesirable growth of mushroom could be seen
algorithm have high accuracy to compare with RIDOR and thus it causes adversely affect the final mushroom yield.
algorithms for Mushroom Diseases Diagnosis dataset.
In this paper different classification algorithms such as the
Table -3: Accuracy Measure for Classification Nave Bayes, SMO and RIDOR are experimentally evaluated
using Mushroom Diseases Diagnosis dataset. Till now, no
Mining Classifiers Accuracy (%) work has been found for diagnosing mushroom diseases. So,
SMO 100
this field has challenging scope for future research work and
directions, which may be extended further. Here, the
Nave Bayes 100
classification algorithms are used on Mushroom Diseases
RIDOR 89.0909
Diagnosis dataset with the help of the data mining tool as
WEKA. It is deduced that Nave Bayes Classification
algorithm provides better results for Mushroom Diseases
But, after analyzing elaborately, this study finds that more Diagnosis dataset when compared with other classification
filtration is required for choosing best algorithm among three
algorithms.
classification algorithm. So, the TP Rate, FP Rate, Precision,
Recall, F-Measure, ROC area values have been verified.
In near future there is defiantly scope for using more
In detailed Accuracy by Class, both Nave Bayes and SMO different classifiers with more consistent dataset to get more
classification algorithm has also same value without accurate values for decision making. Finally, applications of
comparison of error rate measure. Besides this, at the time of advance data mining technology can enhance disease
comparing different type of error rates namely Mean diagnosis and treatment to avoid undesirable loss during
Absolute Error (MAE), Root Mean Squared Error (RMSE), mushroom cultivation.
Relative Absolute Error (RAE), Root Relative Squared Error
(RRSE) in the mushroom dataset, then among two algorithm
consisting of same accuracy it has been investigated that ACKNOWLEDGEMENT
Nave Bayes Classification algorithm has become finally more
preferable because of less prone to error among all different I am obliged to Dr. Dilip Roy Chowdhury who had inspired
type of error rates during this experimental analysis. Nave me to create this research work. Evaluating my interest and
Bayes Classification algorithm performs best with accuracy urge he encouraged and helped me to complete this
100% with less error rate. experimental work.
Table -4: Error Rate Measure for Best Classification .
REFERENCES
Name of Classification
MAE RMSE RAE RRSE
Algorithm
[1] S.R. Sharma, Diseases and Competitor Moulids of
SMO 0.1094 0.2284 93.3603 94.3904
Mushrooms and their Management,N.R.C.M.
2007Technical Bulletin, National Research Centre for
Nave Bayes 0.0002 0.0012 0.1278 0.4881 Mushroom(Indian Council of Agricultural Research).
RIDOR 0.0136 0.1168 11.6397 48.2497 [2] J. Solanki, Y. Mulge Different Techniques Used in Data
Mining in Agriculture, International Journal of
Advanced Research in Computer Science and Software
Engineering (IJARCSSE), pp. 1223-1227, Vol. 5, Issue 5,
ISSN:2277 128X, 2015.
6. CONCLUSION AND FUTURE SCOPES [3] Abdullah, A., Brobst, S., Pervaiz, I., Umer, M. and Nisar,
A. (2004). Learning Dynamics of Pesticide Abuse
The study has found that, with ever changing and growing through Data Mining. In Proc. Australasian Workshop
agricultural knowledge and increasing spread of diseases, on Data Mining and Web Intelligence (DMWI2004),
the mushroom farmers are sometimes confused and Dunedin, New Zealand. CRPIT, pp. 151-156, Vol. 32.
overloaded with information during decision making. It is Purvis, M., Ed. ACS, 2004.
then suggested to consult the specialists to resolve
confusion. But, however, this facility of consulting specialists [4] Cunningham S. J., and Holmes, Developing Innovative
might not always be available or not timely available [12]. Applications in Agriculture using Data Mining. In the
This is certainly a serious problem of mushroom agricultural Proceedings of the Southeast Asia Regional Computer
domain. For this, huge damage is seen during any type of Confederation Conference, pp. 1-2, 1999.
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 533
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
[5] D Ramesh, B Vishnu Vardhan, Data Mining Techniques Development using Artificial Intelligence and Soft Computing
and Applications to Agricultural Yield Data, techniques. Besides, he is continuing his research in the
International Journal of Advanced Research in fields of Artificial Neural Networking, Data Mining, Rough Set
Computer and Communication EngineeringVol. 2, Issue
Computing, Knowledgebase Design and Information
9, pp. 3477-3480, ISSN (Print) : 2319-5940, ISSN
(Online) : 2278-102, 2013. Retrieval. He is associated with various national and
international journals of repute as a member of reviewing
[6] Jiawei Han, Micheline Kamber, Jian Pei., Han Data committee and editorial committee.
Mining Concepts and Techniques, 3rd Edition, Morgan Email: diliproychowdhury@gmail.com
Kaufmann Publishers, imprint of Elsevier,ISBN 978-0-
12-381479-1, 2012.
Mr. Subhashis Ojha is working at Dept. of
[7] Paul Compton, Glenn Edwards, Byeong Kang, Leslie
Lazarus, Ron Malor, Phil Preston and Ashwin Computer Application, Mathurapur B. S. S.
Srinivasan, Ripple down rules: Turning knowledge High School, Mathurapur, Malda as an
acquisition into knowledge maintenance*, Artificial Assistant Teacher. Before that he was a
Intelligence in Medicine (Elsevier), Vol. 4, pp. 463-475, faculty at Computer Application
1992. Department of Bankura Unnayani Institute
of Engineering under West Bengal University of Technology.
[8] Richard Enyinnaya, Predicting Cancer-Related
He has received MCA degree. His research area lies with
Proteins in Protein-Protein Interaction Networks using
Network Approach and SMO-SVM Algorithm, Expert System, Artificial Intelligence and Soft Computing.
International Journal of Computer Applications, pp. 5 - E-mail: subhashis.ojha@gmail.com
9, Vol. 115, No. 3, 2015.
BIOGRAPHIES
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 534