
THIAGARAJAR COLLEGE OF ENGINEERING,

MADURAI 625015.
Department of Computer Science and Engineering

Soil Data Analysis


20C113 - Amarnath V K | 20C122 - Kishore S | 20C129 - Ranjith R

Abstract
Agricultural research has benefited from technical advances such as automation and data mining.
Today, data mining is used in a wide range of areas, and many off-the-shelf data mining products
and domain-specific data mining applications are available, but data mining on
agricultural soil datasets is still a relatively young research field. The large amounts of data that are
nowadays virtually harvested along with the crops have to be analyzed and used to
their full extent.

Introduction
A soil test is the analysis of a soil sample to determine nutrient content, composition and other
characteristics. Tests are usually performed to measure fertility and indicate deficiencies that
need to be remedied. Soil testing laboratories are provided with suitable technical literature
on various aspects of soil testing, including testing methods and the formulation of fertilizer
recommendations. Soil testing helps farmers decide how much fertilizer and farmyard manure to
apply at various stages of the growth cycle of the crop.

Methodology
A soil classification system is essential for identifying soil properties. An expert system can
be a very powerful tool for identifying soils quickly and accurately. Traditional classification
systems rely on tables and flow charts. This manual approach takes a lot of time,
so a quick, reliable automated system for soil classification is needed to make better use
of technicians' time.

We propose an automated system for classifying soils based on fertility.
Being a rule-based system, it depends on the facts, concepts and theories required for its
implementation. The rules for soil classification were collected from a soil testing lab.
The soil sample instances were classified into six fertility class labels: Very High, High,
Moderately High, Moderate, Low, and Very Low. These class labels were
obtained with the help of this system and were then used for a comparative study of
classification algorithms.
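
The rule-based labelling step might look roughly like the sketch below. The attribute names (`n`, `p`, `k`), the cut points in `HYPOTHETICAL_CUTS` and the "weakest nutrient decides" rule are all assumptions made for illustration; the actual rules came from the soil testing lab and are not reproduced in this report.

```python
# Hypothetical rule-based fertility labelling (illustrative thresholds only).
LABELS = ["Very Low", "Low", "Moderate", "Moderately High", "High", "Very High"]

# Assumed cut points per nutrient; these are NOT the lab's real rules.
HYPOTHETICAL_CUTS = {
    "n": [140, 280, 420, 560, 700],
    "p": [5, 10, 17, 25, 35],
    "k": [60, 120, 200, 280, 360],
}


def score(value, cuts):
    """Return how many cut points the value meets or exceeds (0 to 5)."""
    return sum(value >= c for c in cuts)


def fertility_label(sample):
    """Label a sample by its weakest nutrient (an assumed limiting-factor rule)."""
    worst = min(score(sample[name], cuts) for name, cuts in HYPOTHETICAL_CUTS.items())
    return LABELS[worst]
```

Calling `fertility_label({"n": 300, "p": 12, "k": 250})` would assign one of the six class labels; swapping in the lab's real rules only requires changing the cut-point table and the combining rule.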
Soil Classification
Naive Bayes

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem
with strong (naive) independence assumptions. Depending on the precise nature of the
probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning
setting. An advantage of the naive Bayes classifier is that it only requires a small amount of
training data to estimate the parameters (means and variances of the variables) necessary for
classification.
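
As a minimal sketch of the parameter estimation described above, the following assumes numeric attributes modelled with a Gaussian per class. The function names `fit_gaussian_nb` and `predict` are illustrative and are not part of Weka or of the system used in this study.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-attribute mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Pick the class with the highest log posterior under the independence assumption."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)
```

Only a prior, a mean and a variance per attribute and class are stored, which is why a small training set can be enough to fit the model.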

J48 (C4.5)

J48 is an open source Java implementation of the C4.5 algorithm in the Weka data mining tool.
C4.5 is a program that creates a decision tree based on a set of labeled input data. This decision
tree can then be tested against unseen labeled test data to quantify how well it generalizes. The
algorithm was developed by Ross Quinlan as an extension of his earlier ID3 algorithm.
C4.5 extends ID3 to handle continuous attribute value ranges, pruning of decision
trees, rule derivation, and so on.

The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is
often referred to as a statistical classifier.
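
To illustrate how a continuous attribute can be handled, the sketch below scores binary threshold splits with the gain ratio criterion. It is a simplification: the helper names are hypothetical, midpoints between consecutive values are used as candidate thresholds, and the pruning and other refinements of C4.5 and J48 are not modelled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold versus value > threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    remainder = weights[0] * entropy(left) + weights[1] * entropy(right)
    gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values (needs at least two)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```

A tree builder would apply `best_threshold` to each numeric attribute, split on the best one, and recurse on the two subsets.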

JRip

This algorithm implements a propositional rule learner, Repeated Incremental Pruning to
Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized
version of IREP.

In this paper, three data mining classification techniques (naive Bayes, J48 (C4.5) and JRip)
were evaluated and compared on the basis of time, accuracy, error rate, true positive rate and
false positive rate. Tenfold cross-validation was used in the experiment. Our study showed
that the J48 (C4.5) model was the best classifier for the soil samples.
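
The evaluation measures named above can be sketched as follows. This is a simplified illustration with hypothetical helper names: the folds are contiguous and unshuffled, and the rates are computed for one class treated as positive, which may differ in detail from how Weka reports them.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def rates(actual, predicted, positive):
    """Accuracy, error rate, TPR and FPR, treating one class as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }
```

In a tenfold run, each classifier is trained on the `train` indices and scored on the `test` indices of every fold, and the ten sets of rates are averaged.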
Prediction Of Untested Attributes
Different attributes were predicted using regression algorithms such as Linear Regression, Least
Median of Squares and Simple Regression. According to these results, the value of the phosphorus
attribute was found to be the most accurately predicted, and it depends on the least number of attributes.

When all attributes are numeric, linear regression is a natural and simple technique to consider
for numeric prediction, but it suffers from the disadvantage of linearity: if the data exhibit a
non-linear dependency, it may not give good results. In such cases, or when outliers distort the
least-squares fit, the least median of squares technique is used.
Median regression techniques incur a high computational cost, which often makes them infeasible
for practical problems. The linear regression test for predicting phosphorus gave the most accurate
results. These predictions can be used to estimate phosphorus content without the traditional
chemical tests in soil testing labs, which will eventually save a lot of time. The statistical results of
these tests showed very limited variation among the predicted values of the phosphorus
attribute. Although the least median of squares algorithm is known to produce better results, we
noticed that the accuracy of linear regression was roughly equivalent to that of the least median of
squares algorithm.
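
The difference between the two fitting criteria can be sketched for a single predictor as follows. The function names are hypothetical, and `lms_line` is only a random-sampling approximation of least median of squares, not the implementation used in the experiments.

```python
import random
import statistics


def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def lms_line(xs, ys, trials=500, seed=0):
    """Approximate least median of squares by sampling lines through point pairs."""
    rng = random.Random(seed)
    best, best_median = None, float("inf")
    for _ in range(trials):
        i, j = rng.sample(range(len(xs)), 2)
        if xs[i] == xs[j]:
            continue  # vertical line, slope undefined
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        # Score the candidate by the median (not the sum) of squared residuals.
        median = statistics.median((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        if median < best_median:
            best, best_median = (a, b), median
    return best
```

The repeated sampling is what makes the median-based fit expensive compared with the closed-form least-squares solution; on data without gross outliers the two fits tend to agree closely.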
The Next Generation of Soil Testing and Analysis Technology

Technological advancements in soil testing, analysis and data management are long overdue, and
their absence is a major constraint on farmers' adoption of precision agriculture and on the profit
they realize from it.

Logiag Inc is a company that is introducing a new soil data management system called
LASERAG that can help crop input suppliers, retailers and agronomists to provide farmers with
the precise and accurate soil test results they need to take their field productivity to the next
level.

The LASERAG system has four components:

1. Breakthrough Soil Analysis Technology: an innovative laser-induced breakdown
spectroscopy (LIBS) technology has been adapted to analyse agricultural soil samples.
By focusing an intense laser beam on a dry, compressed soil sample, a plasma is
created at temperatures of up to 25,000 °C. The electrons of the atoms in the plasma are
excited and change energy levels. Once the beam is cut, the electrons settle back to their
original levels and emit atom-specific photons that are captured. The nutrients present in
the sample are thus identified and quantified, and the results are presented as standard soil
test results. Each sample is analysed 3000 times in a few seconds, eliminating human
error and increasing precision and repeatability. The new system is reported to be accurate
100% of the time on all nutrients.

2. A Cloud-Based Soil Information Management Software has been designed for the
service providers. The software allows the user to create customers' fields,
determine the sampling pattern, and import shapefiles and soil maps. Once the soil
test results are received, the user can print various reports, make prescriptions
and produce the shapefiles for variable-rate applications. The soil test results
are downloaded automatically each time the software is opened.

3. A Smart Phone GPS Application for the Soil Sampler in the Field. The
application synchronizes with the service provider software, and the sampler can
see their own position as a blue dot in the field. All the sampling points are also identified,
and the app facilitates sampling decisions. Each sample is put in a small cup made
of a porous plastic to enhance drying, with a QR code on top. The QR code is read
with the smart phone application. Once the top is closed, no other human being
will touch the sample. The cups are put in boxes of a dozen and dropped at the post
office at the end of the day. No writing is needed and no forms need to be filled out.

4. The Soil Analysis Lab itself, where an oven, a press and the machine housing the
laser are found, as well as a computer server with the lab software. The software
converts photon measurements to nutrient levels, stores the results on the cloud
and generates reports. Attached to the box is a computer screen that allows users
to see what is going on as the samples are analysed. Logiag is currently working
on the next generation of the system, which will reduce the size of the machine and
make its transport and installation simpler.

Conclusion
This new approach combines recent developments in chemistry and modern statistics.
Specifically, from chemistry we use NIR spectroscopy, and from statistics a collection of
regression models recently developed in nonparametric statistics. NIR
spectroscopy is a fast and accurate method that reduces the need for
conventional wet chemistry procedures. Functional statistics then allows us to exploit all the
information in the spectroscopic analysis, where the spectra are viewed as curves. These models
are easy to implement, and their efficiency is related to the homogeneity of the studied data,
so a suitably adapted model can be chosen for each case.
