Fig. 1. Block scheme of the proposed system

The block called weather/meteorological stations represents the network of meteorological stations. These stations are positioned at predetermined locations near the plantation. Parameters from both types of stations will be saved in the system database. Reports from automatic stations will be sent through the network automatically, while data from manual stations must be entered by a human. Data collection can be automated further if more automatic stations are equipped with spore traps. Data from the electronic microscope about identified spores (the pathogen to which the spore belongs, and whether the spore is of the active or passive type) will be saved in the same database as the meteorological data. More spore traps provide better coverage. One spore trap will be placed near a specific plant. In that way examination can be significantly faster, because spores that are not specific to the diseases attacking that plant are eliminated at the start. Spore traps will be examined by a phytopathologist. At this stage the data collection phase is over, as far as data obtained at the current time are concerned. For successful prediction, data from the previous ten or more years will also be entered in the database. Information about identified infections in the field was obtained from farmers and from the official bodies in charge of monitoring the occurrence of disease.

B. Data preprocessing
Data preprocessing must be applied to both the meteorological data and the data from the electronic microscope. Data preprocessing is an often neglected but important step in the data mining process. Data gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations (Temperature: 29 °C, Snow: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems could produce misleading results. Thus, the representation and quality of the data can be crucial for the analysis process. Steps like data preparation and filtering can take a considerable amount of processing time. Data preprocessing includes cleaning, normalization, transformation, feature extraction, selection, etc. The product of data preprocessing is the final training set. For our research, weather parameters other than those mentioned above are not necessary. Such parameters, which come from the digital meteorological stations, will be removed in this step.

C. Data processing
In this phase the final data set organized in the preprocessing phase will be processed by different data mining techniques. Data about identified spores will be classified according to the group of diseases that they can cause. Besides this, the spores of each pathogen are classified into two types. The first type represents active spores, meaning that such spores could cause disease under appropriate weather conditions. The second type represents passive spores. Passive spores cannot cause infection, even when the weather conditions are fulfilled. Data collected from the meteorological stations are more numerous. For each parameter the measurement is repeated many times during the day. This results in a large set of meteorological data for model training. Depending on the category of meteorological data, different classification or clustering techniques can be used. Technique selection must be based on the results that a particular technique provides. From all of the collected meteorological data and data about spores we create a training file. This file starts with the attribute definitions for all data. That means we must define whether each attribute is numeric or nominal. The last attribute represents the class attribute. Attributes must have a defined order of appearance in each instance. In that order we input values for a particular disease or group of diseases. Instances first contain the meteorological parameters, then the characteristics of the spores, and at the end the class value. Based on those instances we will create the training model. The number of instances is very important. Increasing the number of instances provides a better training model. To the created file we apply different classification techniques, and measure the percentage of correctly classified instances, standard deviation, mean absolute error, relative absolute error, and root relative squared error. The technique that gives the best results will be chosen for creation of the training model. In order to evaluate different data mining techniques we create a training dataset. The given dataset contains data from a one-year timespan. This dataset is created for two specific diseases. Over this data we apply multiple data mining techniques. Classification results are presented in Table 1. For this evaluation we used WEKA classes implemented in a C# forms application.

TABLE 1: CLASSIFICATION RESULTS AND STATISTICS
Classifier output                   J48       SMO       ZeroR
Correctly classified instances      90.32%    85.25%    73.77%
Incorrectly classified instances    9.68%     14.75%    26.23%
Mean absolute error                 0.0985    0.2659    0.2783
Standard deviation                  0.2812    0.3414    0.3681
Relative absolute error             25.39%    95.55%    100%
Root relative squared error         64.06%    92.75%    100%

For all classifiers we perform 10-fold cross-validation, without percentage split. This means that the whole dataset takes part in training: each instance is used for training in nine folds and for testing in one. Evaluation helps us to choose the best technique for model creation. After the evaluation, the training model that will be used for prediction is built. For that purpose we use the classifier that shows the best results. This is very important, because if we build a better training model,
prediction will be more accurate. From the table above we can see that the J48 classifier provides the highest percentage of correctly classified instances. For that reason, in this case we choose to build the training model with J48.

D. Prediction
For disease infection to occur, all of the mentioned parameters must have values in a specific range. Based on the classified data from the training model, we can predict whether the conditions for a possible infection are satisfied. From this moment we use the created trained model for prediction. A new set of instances provided by the meteorological stations and the laboratory will be used to create the test dataset. The test dataset has the same form as the training dataset. For the class values we can input two types of values. The first is a question mark, which indicates that we do not know which class value is appropriate for that set of data. In the second case we can predict the class value intuitively, and input our prediction. After all the current values of the instances have been entered in the dataset, prediction can start.
In the prediction phase we use our saved training model. For the prediction we must select the classifier that was used for model creation. The prediction output will be a class value for each instance, regardless of whether we entered a question mark or a predicted value. In addition, the degree of probability is essential. The degree of probability will vary depending on the values of the current parameters. If the current values of all parameters are similar to the corresponding values from the training set, the model will predict that the probability of infection is in a similar range.
For the predicted value, if there is just one disease in the training and testing set, and the conditions for that disease are fulfilled, we will get an answer. If we have more than one possible disease, the output will be the one with the highest probability.
We use mathematical regression methods for verification of the results obtained by the WEKA classification algorithms. Mathematical regression and statistics calculations will be performed in MATLAB. After the predictions are confirmed, and if some infection is possible, farmers will be notified. The notification will contain information about the present disease and a proposal of measures. For notifications like this we must create a database of farmers, and provide a message or mail service.

IV. CONCLUSION
Agricultural production is a complex job. One of its most unpredictable and complex tasks is chemical protection. The key factor for successful chemical protection of fruit from diseases and pests is choosing the right moment. This means that the selection of chemicals is not as complex as determining the timing of protection. Early detection of fruit disease has many benefits. From the farmers' point of view, methods like the one suggested provide important information for successful chemical protection. The second benefit for farmers is economic. They can save money if they reduce the number of chemical treatments. This is possible because the model indicates when the conditions for disease development are not fulfilled. In that case chemical treatment is not needed. From the perspective of healthy food, a reduced number of chemical treatments is very important. With appropriate detection and prediction we could achieve both successful chemical protection and healthy food.
The authors' future research will be the implementation and integration of the proposed model in real conditions. The authors plan to create a much bigger training dataset, with instances from about twenty years in the past. New parameters from the real world will be collected from automatic meteorological stations mounted on plantations. The obtained results will show the degree of accuracy of the practical application of the proposed model.

ACKNOWLEDGMENT
This paper is the result of collaboration with the Ministry of Education, Science and Technological Development of the Republic of Serbia within the projects TR 32023 and TR 35026.
The authors are grateful to the professors from the College of agriculture and food technology in Prokuplje for their collaboration in acquiring basic knowledge about certain plant pathogens.
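The screening described in the data preprocessing section (out-of-range values, impossible combinations such as snow at 29 °C, missing values, and removal of unneeded station parameters) could look roughly like the following Python sketch. The field names, the valid ranges and the snow threshold are hypothetical and are not taken from the paper; real limits would depend on the sensors and the region.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]

# Hypothetical valid ranges for the parameters the model is assumed to need.
VALID_RANGES = {
    "temperature_c": (-40.0, 50.0),
    "humidity_pct": (0.0, 100.0),
}
# Hypothetical threshold above which a snow report is treated as impossible.
MAX_SNOW_TEMPERATURE_C = 5.0
# Fields kept in the final training set; every other station field is dropped.
KEPT_FIELDS = ("temperature_c", "humidity_pct", "snow")


def find_problem(record: Record) -> Optional[str]:
    """Return a short reason if the record must be rejected, else None."""
    for name, (low, high) in VALID_RANGES.items():
        value = record.get(name)
        if value is None:
            return f"missing {name}"
        if not low <= float(value) <= high:
            return f"{name} out of range"
    if record.get("snow") is None:
        return "missing snow"
    if record["snow"] and float(record["temperature_c"]) > MAX_SNOW_TEMPERATURE_C:
        return "snow reported at high temperature"
    return None


def screen(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into cleaned ones and rejected ones with a reason."""
    kept: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for record in records:
        reason = find_problem(record)
        if reason is None:
            kept.append({name: record[name] for name in KEPT_FIELDS})
        else:
            rejected.append((record, reason))
    return kept, rejected
```

Keeping the rejected records together with a reason, rather than silently discarding them, makes it possible to check later whether a station is misreporting systematically.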
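The training and test files described in the data processing and prediction sections (attribute definitions first, each attribute numeric or nominal, the class attribute last, and a question mark for an unknown class value) match the layout of WEKA's ARFF text format. That the authors used ARFF is an assumption here, and the relation and attribute names below are hypothetical. A minimal Python sketch of a writer for such a file:

```python
from typing import List, Optional, Sequence, Tuple, Union

# An attribute is either the string "numeric" or a list of nominal labels.
AttributeType = Union[str, Sequence[str]]
Value = Optional[Union[float, str]]


def to_arff(
    relation: str,
    attributes: List[Tuple[str, AttributeType]],
    rows: List[Sequence[Value]],
) -> str:
    """Render instances as ARFF text; the last attribute is the class."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            labels = ",".join(kind)
            lines.append("@attribute " + name + " {" + labels + "}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("each instance needs one value per attribute")
        # None is written as "?", which marks an unknown value.
        lines.append(",".join("?" if value is None else str(value) for value in row))
    return "\n".join(lines) + "\n"
```

For example, a two-instance file with hypothetical attributes, where the second instance has an unknown class:

```python
attributes = [
    ("temperature_c", "numeric"),
    ("humidity_pct", "numeric"),
    ("spore_type", ["active", "passive"]),
    ("infection", ["yes", "no"]),  # class attribute, listed last
]
print(to_arff("disease", attributes, [
    (18.5, 82.0, "active", "yes"),
    (21.0, 60.0, "passive", None),
]))
```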
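The 10-fold cross-validation used for Table 1 can be illustrated with a small index-splitting sketch. It only shows how every instance ends up in the test part of exactly one fold and in the training part of the other nine. It does no shuffling or stratification, and it is not the WEKA implementation.

```python
from typing import List, Tuple


def k_fold_indices(n_instances: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all instances."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    # Deal the instance indices into k folds in round-robin order.
    folds = [list(range(start, n_instances, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        splits.append((train, test))
    return splits
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores would then be combined into one estimate.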
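Three of the error measures reported in Table 1 (mean absolute error, relative absolute error, root relative squared error) can be sketched with their usual textbook definitions, in which the relative measures compare the classifier's error with that of a baseline that always predicts the mean of the actual values. The sketch below works on plain numeric values; how WEKA derives these numbers from class probabilities is not reproduced here.

```python
import math
from typing import Dict, Sequence


def error_measures(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, relative absolute error (%) and root relative squared error (%)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean_actual = sum(actual) / n
    abs_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs_error = sum(abs(mean_actual - a) for a in actual)
    base_sq_error = sum((mean_actual - a) ** 2 for a in actual)
    if base_abs_error == 0:
        raise ValueError("relative errors are undefined when all actual values are equal")
    return {
        "mae": abs_error / n,
        "rae_pct": 100.0 * abs_error / base_abs_error,
        "rrse_pct": 100.0 * math.sqrt(sq_error / base_sq_error),
    }
```

Under these definitions a predictor that always outputs the mean scores 100% on both relative measures, which is consistent with the ZeroR column of Table 1 serving as the baseline against which J48 and SMO are compared.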