You are on page 1of 30

Seminar Report

on
“Soil Classification using Machine Learning Methods and Crop
Suggestion Based on Soil Series’’

SUBMITTED BY

Dhobale Tushar
TECO2223A014

Under the guidance of


Mrs.Yasmin Khan

In partial fulfillment of the requirements for


Bachelors Degree in Computer Engineering
of
SAVITRIBAI PHULE PUNE UNIVERSITY[2022-2023]
Department of Computer Engineering

D. Y. PATIL COLLEGE OF ENGINEERING,


Akurdi, PUNE 411044.
Contents

TitlePage i

Certificate iii

Acknowledgement iv

Abstract v

1 Introduction 1

1.1 Problem Definition .. . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Motivation, Objective and Social Relevance...... 3

1.3 Planned Outcome . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Literature Survey of Topic 5

3 Discussion of Base paper 6

4 Algorithm 7

5 Summary 8

6 References 9

7 Plagiarism Report 10
D. Y. Patil College of Engineering Akurdi,Pune-411044
Department of Computer Engineering

CERTIFICATE

This is to certify that Dhobale Tushar has satisfactorily completed the

seminar work entitled “Soil Classification using Machine Learning

Methods and Crop Suggestion Based on Soil Series ”Which is a bonafide

work carried out by him under the supervision of Mrs.Yasin Khan and it is

approved for the partial fulfillment of requirement of Savitribai Phule Pune

University ,for the award of the degree of Bachelors of Engineering

ComputerEngg. for the academic year 2022-23.

Mrs.Yasmin Khan Dr.M.A.Potey


(SeminarGuide) (HOD Computer)

Place:Akurdi
Date:

iii
ACKNOWLEDGEMENT

With immense pleasure, I present the seminar report as part of the

curriculum of the T.E. Computer Engineering. I wish to thank all the people

who gave us an unending support right from when the idea was conceived.

I express sincere and profound thanks to Mrs.Yasmin Khan seminar Guide

,and HOD "HOD Name", who is ready to help with the most diverse

problems that I have encountered along the way. I express sincere thanks

to all staff and colleagues who have helped directly or indirectly in

completing this seminar successfully.

Name :- Dhobale Tushar

Roll No :- TECO2223A014

Signature:- :-

iv
ABSTRACT

Soil is an important ingredient of agriculture. There are several kinds


of soil. Each typeof soil can have different kinds of features and different
kinds of crops grow on differenttypes of soils. We need to know the features
and characteristics of various soil types to understand which crops grow
better in certain soil types. Machine learning techniques can be helpful in
this case. In recent years, it is progressed a lot. Machine learning is still an
emerging and challenging research field in agricultural data analysis. In this
paper, we have proposed a model that can predict soil series with land type
and according to prediction it can suggest suitable crops. Several machine
learning algorithms such as weighted k-NearestNeighbor (k-NN), Bagged
Trees, and Gaussian kernel based Support Vector Machines (SVM) are used
for soil classification. based method performs better than many
existingmethods. Soil dataset and crop dataset are used. The datasets
comprise of chemical attributes andgeographical attributes of soil and crops.
Algorithms like SVM and Ensembling technique can be used to classify the
soil series data and predict the suitable crops.

v
List of Figures

Figure Figure Name Page No.


No

1
2

vi
List of Tables

Sr. No. Page no

vii
Technical Seminar Soil classification

1 Introduction

INTRODUCTION TO Soil Classification using Machine Learning


Methods and Crop Suggestion Based on Soil Series

Agriculture is a source of livelihood in most parts of the world.


Agricultural produce is of great importance. But in recent years, the
agricultural produce is gradually decreasing. Soil plays a crucial role
in agriculture. Soil consists of nutrients, that are used by the plants
to grow. There are different types of soils available and each having
different properties. Crop‟s productivity is mainly based on the type
of soil. The possible way to improve productivity is that we choose a
right crop for the right land type. This can be done by first analysing
thesoil then classifying it into different soil groups. Based on these
soil groups and the geographical conditions, one can decide which
crop is best suited and is beneficial. The traditional methods are
Costly, long process and also time consuming. Hence, there is a need
for new technologies and methods to enhancethe existing system in
order to get faster and better results.

Machine learning is one of the budding technologies in the field


of agriculture. Machine Learning can be used to improve the
productivity and quality of the crops in the agricultural sector. It can
be used to find patterns among theagricultural data and classify it
into a more meaningful data. This data can be used for further
processes. Machine learning techniques usually follows the
following procedure: collecting data, processing the data,
trainingtesting of data samples. The algorithm such as SVM can be
used for classification of soil and crop prediction based on previous
patterns followed and the type of the soil. The project requires the
following datasets: soil dataset with several chemical properties has
its features and crop dataset with geographical attributes as its
features.

Department of Comp.Engg. 1 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil classification

The project aims at creating a model that efficiently classifies


the soil instances and map the soil type to the crop data to get better
prediction with higher accuracies. Soil prediction involves types of
crop classifications and geographical attributes. It also aims at
creating a system that processes the real- time soil data to predict
the crops with higher accuracy. The model involves two phases:
training phase and testing phase. Two datasets are used: Soil dataset
andcrop dataset. The predicted and actual classes are compared and
the list with correct classes is obtained.

Department of Comp.Engg. 2 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil classification

1.1 Problem Definition

Soil classification is a way of describing a given plot of soil. Most


gardeners choose to work with five different types or classifications of
soils: sandy, saline, peat, clay, or silty. Different combinations of air,
water, organic matter, and materials result in different soils.

Soil classification is an important first step in gardening because it


enables growers to first identify what type of soil they have on hand, and
then help them know what improvements they need to make, if any, to
easily turn an unproductive lot of land into loam.

Classifying soil also helps gardeners understand what type of plants


will grow and thrive in their particular soil.

Department of Comp.Engg. 3 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

1.2 Motivation, Objective and Social Relevance

a.) Motivation behind project topic

The purpose of this work is to categorize the soil samples


according to the nutrients present into it and to predict the crops
which can be yield in the particular soil. Also this project provides
suggestion to improve the soil if the farmer wants to yield specific
crop in that soil.

b.) Objectives of the work

The primary objective of this system is to classify the soil


according to the nutrients into it. For this, we have taken datasets of
soil samples. The soil will beclassified using decision tree algorithm
and type of soil will be displayed. Also we are going to predict the
crops suitable for the particular type of soil. In additionto this, we are
going to improve the soil if the farmer wants to yield particular crop
in the same soil by suggesting the requirements of the nutrients for
the samesoil.

To overcome these drawbacks we propose a modified decision tree


approach forsoil classification. Here, we first calculate the gini index
for different ranges of attribute values instead of computing for
every successive pairs as was done in CART algorithm. Then we have
used ratios of these computed gini indices to reduce the bias that
was introduced by information gain in C4.5 algorithm. The proposed
algorithm is described below.

Department of Comp.Engg. 4 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

1.3 Planned Outcome

Describe types and processes of weathering and erosion.


This section introduces you to weathering and erosion, both important parts
of the rock cycle. You will learn how different rocks are weathered and eroded and
the implications of this weathering.

What You’ll Learn to Do


• Identify causes of soil erosion and soil loss.
• Identify strategies to conserve and replenish soil.

Learning Activities
The learning activities for this section include the following:

• Reading: Causes of Soil Erosion


• Video: The Dust Bowl
• Reading: Avoiding Soil Loss
• Self Check: Soil Conservation

Department of Comp.Engg. 5 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

2 Literature Survey of Topic

The literature for the classification of the soil and prediction of crops are as
follows -

Soil Classification using Machine Learning Methods and Crop


Suggestion Based on Soil Series This project creates a model that can
predict soil series with land type and according to prediction it can suggest
suitable crops. It makes use machine learning algorithms such as weighted
K-Nearest Neighbor (KNN), Bagged Trees, and Gaussian kernel-based
Support Vector Machines (SVM) to classify the soil series. The Soil
classification philosophies used, follows the existence knowledge and
practical circumstances. On the land surfaces of earth, classification of soil
creates a link between soil samples and various kinds of natural entity.
Based on these classifications and the mapped data, the suitable crops
were suggested for a particular region.
In 2014, K AdityaShastry, Sanjay H A and Kavya H discussed various
classification techniques in the paper “A Novel Data Mining Approach for
Soil Classification”. The algorithms CART, C4.5 and proposed approach has
been studied. Accuracy obtained for C4.5 is more as compared to CART.
The proposed system uses gini index and giniratio. The impurity measure
gini index used in CART is biased towards attributes withhigher range of
values, while information gain used in C4.5 is biased towards attributeswith
high values. This drawback is removed using the proposed approach. From
the paper it is also observed that the manual soil classification being done
is very cumbersome and time consuming.
In 2015, Monali Paul, Santosh K. Vishwakarma and Ashok Verma in
“Analysis of SoilBehaviour and Prediction of Crop Yield using Data Mining
Approach” introduced thatData mining in agriculture field is somewhat a
novel research field. Data mining is theprocess of discovering unknown and
likely impressive patterns in large datasets. Stepsof data mining such as
selection, preprocessing, transformation, data mining and interpretation
has been discussed and Naïve bayes and K-nearest neighbor has been used
for prediction and analysis. Naive Bayes classifiers can be trained very
efficiently in a supervised learning setting and works much better with
complex real situations.K-
Nearest Neighbor makes predictions based on the outcome of the K
neighbors closest to that point. K- nearestNeighbour uses Euclidean
distance formula for calculation.
Department of Comp.Engg. 6 D.Y.Patil COE,Akurdi,Pune
Technical Seminar Soil Classification

In “Sensible approach for Soil Fertility Management using GIS


Cloud” Decision Support System (DSS) using GIS enabled cloud
technologies is implemented. The proposed approach of agricultural
information development and integration system ensures that
complete agricultural related data on cloud database can be
integrated into spatial maps through GIS technologies, to organize,
accumulate, and administer geospatial data in a cloud database
according to individual farmer land information for improving data
accuracy of digital agriculture fertilizer management. IaaS, PaaS and
Saashas been discussed with the respective layers used. GIS Cloud
server is used to incorporate, integrate, store, update and manage
complete information of agriculture.
In 2015, Amol D. Vibhute, K. V. Kale, Rajesh K. Dhumal and S.
C. Mehrotra in “Soil Type Classification and Mapping using
Hyperspectral Remote Sensing Data” has demonstrated use of
support vector machine algorithm for identification, mapping and
classification of various types of soil using high spectral resolution
Hyperspectral data. Gaussian Radial Basis Function (RBF) kernel of
SVM was used and the accuracy obtained is 71.18.
In 2010, Sofianita Mutalib, S-N-FadhlunJamian, Shuzlina
Abdul-Rahman and Azlinah Mohamed explained In “Soil
Classification: An Application of Self Organizing Map and k-means”
The two unsupervised technique Kohonen Self Organizing Map
(SOM) and k-means have been used to classify the soil.
Characteristics of soil such as parent material, soil horizontal profile,
color of soil, texture of soil and soil depth is listed. This paper
predicts the classification of soil and gives information about the
plants to be cultivated in specify type of soil.

Department of Comp.Engg. 7 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

3 Discussion of Base paper

The system uses the soil series data obtained by Soil Resources
Development Institute (SRDI) of Bangladesh. The group of soils which is
formed from the same kind of parent materials and remains under the similar
conditions of drainage, vegetation time and climate forms the soil series. It
also has the same patterns of soil horizons with differentiating properties.
Each soil series were named based on its locality. The main purpose of this
system is to create a suitable model for classifying various kinds of soil series
data along with suitable crops suggestion for certain regions of Bangladesh. In
this paper, they have worked on 9 soil series datasets obtained from six
upazillas of Khulna district, Bangladesh. The dataset considered had 383
samples with 11 classes .
The crop database was created by considering the Upazilla codes, map
units and class labels. The method involved two phases: training phase and
testing phase. Two datasets were used: Soil dataset and crop dataset. Soil
dataset contains class labelled chemical features of soil. The system follows an
architecture as seen in Fig2.1.1. The machine learning methods were used to
find the soil class. Three different methods used were: weighted K-NN,
Gaussian Kernel based SVM, and Bagged Tree.
Weighted KNN: KNN uses all training data, since weighting is used. The
nearest neighbours are given more weightage than the farther ones. The
distance is calculated and the majority of such classification is taken as the
final classification. The obtained accuracy of the classification using KNN was
92.93 % .
Gaussian Kernel based SVM: SVM separates the objects of classes into
different decision planes. A decision boundary separates the objects of one
class from the object of another class. Support vectors are the data points
which are nearest to the hyper- plane. Kernel function separates the non
linear data by transforming the inputs to higher dimensional space. In this
project they have used gaussian kernel function. The accuracy obtained using
SVM was 94.95% .
Bagged Trees: Bagging generates a set of models that classifies the
random samples of data and predicts the classes. When given a new instance,
the predictions of these models are aggregated to find the final prediction of
class. The accuracy obtained using this algorithm was 90.91% .
Big Data Analytics for Crop Prediction Mode Using Optimization
Technique Big data analysis is used in discover novel solutions which means
analyzing bulky data set ,so which place a decision making in agricultural field.

Department of Comp.Engg. 8 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

In this project they are predicting two classes that is good yield and bad yield
based on soil and environmental features like average temperature, average
humidity, total rain fall and production yield hybrid classifier model is used in
optimizing the features and it is also divided into three phases that is viz pre-
processing, feature selection and SVM_GWO that is grey wolf optimizer along
with Support Vector Machine(SVM) classification is used to improve the
accuracy, precision, recall and F-measure. In this the result shows that
SVM_GWO approach is better as compared to typical SVMs classification
algorithm.
The proposed methodology of GWO (Grey Wolf Optimization over SVM
(Support Vector Machine) classifier. The proposed approach is divided into
three phase
Data Pre-Processing: Data pre-processing is deemed to be the main step
in the data mining method and machine learning projects. In effective data
collection can subsequently lead to improper combination, weak control and
incorrect values . It moreover provides misleading results if the identification
of the data is not carried out clearly. Therefore, it is essentially important to
carry over quality and representation of data at the initial stage Feature
Selection: The algorithm of feature selections used to select a subset of the
features that are considered be important so that the redundant data can be
removed. In addition, it also eliminates the irrelevant and noisy features of the
data in order to make more simple and accurate. It also later helps in saving
memory and storage. Feature selection algorithms are categories viz in two
categories subset selection methods and feature ranking methods. The
selection methods of Subset essentially find the features that are required for
finest subset
Machine Learning Classifier: A feature selection algorithm selects a
subset of vital features and removes superfluous, unrelated and raucous
features for simpler and more precise data illustration
Proposed Algorithms: Proposed Algorithm of SVM_GWO . The objective
of this study is to increase the accuracy of prediction model by using different
parameters for future precision agriculture. The data set has been taken from
Food agriculture organization and applied to the processing model through
map reduce. To process the big data, map reduce is used to minimize the time
of execution and further classification is done. Feature selection and
extraction are important steps in classification systems.
This paper presents a hybrid model i.e. SVM_GWO that uses a
combinational approach for improving the classification accuracy, recall,
precision, f-measure by selecting the optimal parameters settings in SVM. In
this classification we have extract the feature vector with minimum error and
Department of Comp.Engg. 9 D.Y.Patil COE,Akurdi,Pune
Technical Seminar Soil Classification

converge and then SVM_GWO is developed for selecting the optimal SVM
parameters. Result show that the proposed approach is better than the typical
SVM classification algorithm with classification accuracy 77.09%, precision
75.38%, recall 74.189% and f measure 73.15%

1. Steps
A brief step by step procedure of designing the crop
recommendation system explained as follows:

Step 1: Input The input dataset is a comma separated values file containing
the soil dataset, which has to be subjected to pre-processing.

Step 2: Pre-processing of input data Input dataset is subject to various pre-


processing techniques such as filling of missing values, encoding of
categorical data and scaling ofvalues in the appropriate range

Step 3: Splitting into training and testing dataset The pre-processed


dataset is then splitinto training and testing dataset based on the specified
split ratio. The split ratio considered in the proposed work is 75:25, which
means 75% of the dataset is used for the training the ensemble model and
the rest 25% is used as test dataset.

Step 4: Building individual classifiers on the training dataset The training


dataset is fedto each of the independent base learners and the individual
classifiers are built using thetraining dataset.

Step 5: Testing the data on each of the classifiers The testing dataset is
applied on each of the classifiers, and the individual class labels are
obtained

Step 6: Ensembling the individual classifier output using Majority Voting


Technique

Department of Comp.Engg. 10 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

2. Proposed Mothodology

The System architecture of the proosed model is shown in below fig.

Proposed System Architecture

Department of Comp.Engg. 11 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

The method invoves two phases: training phase and testing phase. Two datasete are
used: Soil dataset and crop dataset. Soil dataset contains class labeled chemical features
of soil. Table I shows the details of the 12 chemical attributes of soil, used in our
method.

Department of Comp.Engg. 12 D.Y.Patil COE,Akurdi,Pune


Technical Seminar Soil Classification

Soil series and land type combinely reprsents the soil class in the
database. The machinelearning methods are used to find the soil class
(i.e. soil series and land type). Three diffrent methods are used:
weighted K-NN, Gaussian Kernel based SVM, and Bagged Tree.

DepartmentofComp.Engg. 13 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification

4 Algorithm

I. Weighted K-NN

A refinement of the k-NN classification algorithm is to


weigh the contributionof each of the k neighbors according to
their distance to the query point q x , giving greater weight wi to
closer neighbors [10]. It is given by

Where the Weight is,

In case q x exactly matches one of i x so that the denominator


becomes zero, we assign( ) q F x equals ( )i f x in this case. It makes sense
to use all training examples not just k if weighting is used, the algorithm
then becomes a global one. The only disadvantageis that the algorithm will
run more slowly.

DepartmentofComp.Engg. 14 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification

w1 w2

Xu

w3

Construct the theoretical


sample set and output
level

Calcuate the Euclidean distance di, y =


j
(f −y )
j =1

Inverse Euclidean = 1
distance as i ,y j
weight

New calculation formula wi, y


s =
' j=1 j yj

j=1 i, y j
Short-term power predicted value

The flowchart of the proposed W-K-NN algorithm.

DepartmentofComp.Engg. 15 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification
II. SVM

SVM is a supervised machine learning algorithm which works based


on the concept of decision planes that defines decision boundaries. A
decision boundary separates the objects of one class from the object of
another class [11]. Support vectors are the data points which are nearest
to the hyper-plane. Kernel function is used to separate non-linear databy
transforming input to a higher dimensional space. Gaussian radial basis
function kernel is used in our proposed method.

Where, K ( X i , X j ) = Feature vectors in input space, =2 || , || X i X j


High dimensionalspace of X and Y coordinate, and σ is a free parameter

DepartmentofComp.Engg. 16 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification
III. Bagged Tree

We have used a bagged decision tree ensemble classifier (consisting of 30


trees). Bagging generates a set of models each trained on a random
sampling of the data (Bootstrap resampling) [12]. The predictions from
those models are aggregated to produce the final prediction using
averaging

The soil classes that are used in the proposed method of these upazillas
are shown in table II.

DepartmentofComp.Engg. 17 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification

Two-third of the samples are used to train the model(s), rest one-
third of the samples are used to test the models. For correctly classified
samples, crop is suggeted for the corresponding map unit of corresponding
upazilla. These two geogrphical features andsuggested crop-list makes the
crop dataset. The total map units of six upazillas of Khulna district,
Bangladesh is shown in table III.

DepartmentofComp.Engg. 18 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification

DepartmentofComp.Engg. 19 D.Y.PatilCOE,Akurdi,Pune
Technical Seminar Soil Classification

5 Summary

A model is proposed for predicting soil series and providing suitable crop
yieldsuggestion for that specific soil. The research has been done on soil
datasets of six upazillas of Khulna region. The model has been tested by
applying different kinds of machine learning algorithm. Bagged tree and K-
NN shows good accuracy but among all the classifiers, SVM has given the
highest accuracy in soil classification. The proposed model is justified by a
properly made dataset and machine learning algorithms. The soil
classification accuracy and also the recommendation of crops for specific
soil are more appropriate than many existing methods. In future, providing
fertilizer recommendation is our concern, also data of other districts will
be added to make this model more reliable and accurate.
Machine learning as we have seen till now, is a very powerful tool and as
evitable, it has some great application. We have seen till now that machine
learning is very much dependent upon data. Thus it is important
understand that data is quite invaluable and as simple is it may sound, data
analysis is not an easy task. Machine learning have found tremendous
application and has evolved further into deep learning and neural
networks, but the core idea is more or less the same for all of them. This
paper delivers a smooth insight of how to implement machine learning.
There are various ways, methods and techniques available to handle and
solve various problems, in different situations imaginable. This paper is
limited to only supervised machine learning, and tries to explain only the
fundamentals of this complex process. ACKNOWLEDGMENT This paper is
written based upon the summer project.

Department of Comp.Engg. 20 D.Y.PatilCOE ,Akurdi,Pune


Technical Seminar Soil Classification

References

1. O.Aqboola,A.Akinsoni,S.Ovedeji,” Changes in Nutrient Contents of


Soil acrossDifferent Land Uses in a Forest Reserve, Environmental
Science,2017.

2. M. S. Shirsath, E. Cernadas,M.Delqado,R.Khan,‟‟Classification of
agricultural soilparameters in India,Compute Electronic
Agricuture,2017.

3. M. P. K. Devi, U. Anthiyur and M. S. Shenbagavadivu, "Enhanced


Crop Yield Prediction and Soil Data Analysis Using Data Mining",
International Journal ofModern Computer Science, vol. 4, no. 6,
2016.

4. K.Shastry, H.A.Sanjay, H.Kavya,”A novel data mining approach for soil


Classification, ,International Conference on Computer Science &
Education,2014.

5. D. Ramesh and B. V. Vardhan,”Data mining techniques and


applications to agricultural yield data”, International journal of
advanced research in computer andcommunication engineering, vol.
2, no. 9, pp. 3477-3480, 2013.

6. Gholap, J., Ingole, A., Gohil, J., Gargade, S. and Attar, V., Soil data
analysis usingclassification techniques and soil attribute
prediction,2012.

Department of Comp.Engg. 21 D.Y.PatilCOE ,Akurdi,Pune


Technical Seminar Soil Classification

Department of Comp.Engg. 22 D.Y.PatilCOE ,Akurdi,Pune


Technical Seminar Soil Classification

Department of Comp.Engg. 23 D.Y.PatilCOE ,Akurdi,Pune

You might also like