Submitted by
NARASIMHALU R
(PES1PG21CA154)
Ms.SAMYUKTA D KUMTA
Asst.Professor
Certificate
This is to certify that the project entitled CROP YIELD PREDICTION USING MACHINE LEARNING is the work of
NARASIMHALU R
(PES1PG21CA154)
in partial fulfillment for the completion of 3rd semester project work in the Program of
Study MCA under the rules and regulations of PES University, Bengaluru during the
period January 2022 - May 2022. The project report has been approved as it satisfies
the 3rd semester academic requirements.
I, NARASIMHALU R, hereby declare that the project entitled CROP YIELD PREDICTION
USING MACHINE LEARNING is an original work done by me under the guidance of Ms. SAMYUKTA
D KUMTA (PES University) and is being submitted in partial fulfillment of the requirements for
completion of 3rd Semester course work in the Program of Study MCA.
All corrections/suggestions indicated for internal assessment have been incorporated in the report.
The plagiarism check has been done for the report and is below the given threshold.
I further declare that the work reported in this project has not been submitted and will not be
submitted, either in part or in full, for the award of any other course.
Place: Bengaluru
Date : January 12, 2023
NARASIMHALU R
PES1PG21CA154
Acknowledgment
I take great pleasure in expressing my sincere gratitude to all those who have guided and supported
me in successfully completing this project.
I would like to express my sincere gratitude to the Vice Chancellor of PES University, Dr. J Suryaprasad
and Chairperson Dr. Veena S, who gave me an opportunity to go ahead with this project.
NARASIMHALU R
PES1PG21CA154
Abstract
Crop yield prediction uses statistical models and algorithms to estimate how much of a crop will be
produced in a particular year. It can draw on a variety of data sources, including weather, soil, and
historical yield data. Its aim is to support farmers, agricultural businesses, and governments in making
well-informed decisions about crop management and output. Machine learning can be applied to yield
forecasting in several ways. One strategy is supervised learning, in which a model is fitted to a labelled
collection of historical yield data and then used to forecast future yields from inputs such as
weather and soil conditions. An alternative is unsupervised learning, which finds patterns in the
data without being trained on labelled examples. Reliable forecasts require a considerable amount
of high-quality data on the factors that affect crop output, including temperature, precipitation, soil
type and quality, and the presence of pests and diseases. Once this data has been analysed, machine
learning algorithms can be used to forecast future crop yields. Overall, machine learning can help
farmers, agricultural businesses, and governments optimise crop management and production, improving
efficiency and reducing waste. By helping to ensure that there is enough food to meet demand, it can
also promote food security.
Contents
1 Introduction
1.1 Project Description
1.1.1 Problem Statement
1.1.2 Proposed Solution
1.1.3 Purpose
1.1.4 Scope and Limitations
2 Literature Survey
2.1 Domain Survey
2.2 Existing Systems
2.2.1 Comparative study of Existing Systems
2.3 Tools and Technologies
2.3.1 Tools
2.3.2 Technologies
2.4 Feasibility Study
5 System Design
5.1 Architecture Diagram
5.2 Data Flow Diagram
5.2.1 Context Diagram
5.3 Process Flow Diagram
6 Detailed Design
6.1 Use Case Diagram
6.2 Sequence Diagram
6.3 Class Diagram
7 Implementation
7.1 Screenshots
8 Testing
8.1 Frontend
8.2 Backend
9 Conclusion
10 Future Work
Appendix A: References
Appendix D: Paper
Appendix E: Poster
List of Tables
List of Figures
7.1 Screenshot 01
7.2 Screenshot 02
7.3 Screenshot 03
7.4 Screenshot 04
7.5 Screenshot 05
7.6 Screenshot 06
7.7 Screenshot 07
7.8 Screenshot 08
7.9 Screenshot 09
7.10 Screenshot 10
7.11 Screenshot 11
7.12 Screenshot 12
7.13 Screenshot 13
7.14 Screenshot 14
7.15 Screenshot 15
7.16 Screenshot 16
7.17 Screenshot 17
7.18 Screenshot 18
7.19 Screenshot 19
7.20 Screenshot 20
7.21 Screenshot 21
7.22 Screenshot 22
7.23 Screenshot 23
7.24 Screenshot 24
7.25 Screenshot 25
Chapter 1
Introduction
1.1.3 Purpose
Crop yield forecasting is meant to assist farmers, agricultural businesses, and governments in making
well-informed decisions about crop management and output, such as which crops to plant, when to plant
them, and how to manage them to increase production.
Crop yield forecasting can help farmers improve their agricultural methods and make the
most of their available resources and land. For instance, farmers can use yield prediction to determine
which crops are most likely to be productive in a particular year and adjust their planting and
management techniques accordingly. Doing so can reduce waste and improve crop production
efficiency. Beyond farmers, agricultural firms and governments can use yield forecasts to plan for
future food requirements and make sure there is enough food supply to meet demand. This can
contribute to greater food stability and security in both wealthy and developing nations. Overall,
crop yield prediction is an important tool for increasing the efficiency and effectiveness of
agricultural production and for ensuring that there is enough food to meet the demands of the
world's growing population.
Chapter 2
Literature Survey
2.3.2 Technologies
Operating System : Windows 8/10 (64 bit OS)
Programming Language : Python 3.9.12
Framework : Anaconda 4.12.0
Libraries : Keras, TensorFlow
IDE : Jupyter Notebook
Frontend : Flask 1.1.2
Economic Feasibility
The agriculture sector and farmers may benefit greatly from applying machine learning to predict crop
harvests. Accurate yield projections help farmers make informed decisions about planting and
harvesting, which may increase productivity and profitability.
The cost of creating a machine learning model for predicting crop yields varies with a
number of factors, including the amount of data available, the resources needed for data collection and
pre-processing, the complexity of the model, and the price of the necessary hardware and software. For
this model, the data was obtained from Kaggle free of cost, no external resources were
used for data collection and pre-processing, and the model was developed in Jupyter
Notebook, which is free, open-source software.
Operational Feasibility
Operational feasibility refers to the practicality and ease of implementing a project. The following
factors were considered for a crop yield prediction system using machine learning:
Data availability: The model needs data for training and testing. The data set was collected
from Kaggle and contains over 70 thousand records, which is sufficient to train and test the model.
Hardware and software requirements: The software used to build this model is Anaconda Navigator
(Jupyter Notebook). The hardware requirements are a PC with a 4th-generation i3 processor or above,
a hard disk with at least 500 GB of space, and 8 GB of RAM or more.
Chapter 3
Chapter 4
4.1 Users
• Farmers
Predictive models may help farmers optimise their inputs (such as water, fertiliser,
and pesticides) and make better-informed decisions about when to sow and harvest their crops.
As a result, yields may rise and costs may fall.
• Agri-Businesses
With the use of yield prediction models, businesses that create or market agricultural inputs (such
as seeds and fertiliser) may better understand market demand and optimise their production and
distribution.
• Investors
For investors in the agriculture industry, accurate yield forecasts can be helpful since they can
guide investment choices.
Model Validation : Once trained, the model may be used to generate predictions
on fresh data. To determine whether the model generalises to previously unseen
data, its accuracy can be assessed on a separate test data set. Additional
steps, such as hyperparameter tuning or feature engineering, may be required if
the model’s performance is unsatisfactory.
Result Analysis : Analysing the output of a crop production prediction model
created using machine learning involves several steps. First, calculate
performance measures such as accuracy, precision, recall, and F1 score to
assess the model’s performance. For numeric predictions, error metrics such as
mean absolute error (MAE) or root mean square error (RMSE) may be used instead.
Then compare the performance of the different models that were built to discover
which one performs better. This helps in determining the advantages
and disadvantages of each model.
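The measures named above can be written out in a few lines. The following is a minimal sketch in plain Python, not the project's own code: the function names, the class labels in the sample lists and the rainfall numbers are all made up for illustration, and a library such as scikit-learn would normally be used instead.

```python
import math

def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating one class as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def mae(y_true, y_pred):
    """Mean absolute error for numeric predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error for numeric predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative (made-up) labels and values
labels_true = ["Good", "Poor", "Good", "Average", "Good", "Poor"]
labels_pred = ["Good", "Poor", "Poor", "Average", "Good", "Good"]
report = classification_metrics(labels_true, labels_pred, positive="Good")

rain_true = [100.0, 150.0, 200.0]
rain_pred = [110.0, 140.0, 200.0]
rain_mae = mae(rain_true, rain_pred)
rain_rmse = rmse(rain_true, rain_pred)
```

Accuracy, precision, recall and F1 apply to class predictions, while MAE and RMSE apply to numeric predictions such as rainfall, so the two groups are not interchangeable.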
NFR2. Scalability
This model has the ability to handle an increase in the volume of requests without a
significant drop in performance.
NFR3. Reliability
The model is reliable and available to users at all times (24/7), with minimal downtime or
errors.
NFR4. Security
The model is secure and protected against unauthorized access or misuse. This may
include measures such as authentication and secure data handling. The application
includes register and login pages, which help to protect against unauthorized
access.
NFR5. Usability
The capability of the model to be understood, learned and used by the user, when used
under specified conditions. This model provides a user-friendly environment to the
user. Some aspects of functionality, reliability, and efficiency will affect usability.
NFR6. Maintainability
The capability of the software to be modified. The proposed model is easy to
modify, whether for corrections, improvements, or adaptation to changes
in environment, requirements and functional specifications.
NFR7. Portability
The capability of the model to be transferred from one environment to another. This
model can be transferred from one platform to another, whether the environment is
organizational, hardware or software. After the model is transferred
from one environment to another, its efficiency remains the same.
Chapter 5
System Design
The Architecture Diagram has three main phases: the User Interface phase, the Model Building
phase and the Model Evaluation phase. In the User Interface phase, the user can register or log in to
the application, enter parameters for the prediction, and see the result displayed in the user interface.
In the Model Building phase, preprocessing, data splitting and model building are carried out, producing
a model for testing. In the Model Evaluation phase, the test data set is given to the trained model
to check its accuracy; a model is selected based on its performance metrics, and the selected model is
saved as a pickle file. Later, the user can give new crop yield data to the saved model, which
returns the result to the user.
The Context Diagram consists of external entities, a process and data flows. Model and
User are the external entities, and Crop Yield Prediction is the process, i.e., the machine learning
algorithm. The Model provides the data set to the algorithm, and the algorithm returns the crop yield
result and the saved model details to the Model. Later, the user provides the input parameters, i.e., the
crop yield details (state name, season name, crop name, size of the area and average rainfall), to the
algorithm, which predicts the result and returns it to the user.
In the Process Flow Diagram, the collected crop yield data set is analysed in the data analysis
phase. The analysed data set then goes through feature engineering: label encoding converts the
categorical data into numerical data, data splitting divides the encoded data into two parts
for training and testing the model, and clustering groups the unlabelled data. In the
training phase, the training data set is given to both algorithms (K-Neighbors Classifier and Ridge
Classifier). After this phase, the testing data set is given to both algorithms for
testing and validation. Based on the performance and accuracy of the two algorithms, one model
is selected for decision making. New crop yield data is then given to the selected model, which
predicts the result and provides the estimated crop yield.
Chapter 6
Detailed Design
In the Use Case Diagram, User and Model are the actors, and login, logout, input dataset, data
splitting, train model, test model, load model into Flask, user input and view result are the use cases
performed by the actors. First, the model takes the data set as input; because data gathering is
mandatory for this, the relation is marked as <<include>>. The model then splits the data into two parts,
a training data set and a testing data set. The model is trained with the training data set, tested with
the testing data set to measure its accuracy, and then loaded into Flask, i.e., the front end. Later, the
user logs in to the application. The user must register first; after that, the user can log in
with an email id and password. After a successful login, the
user provides the input, i.e., state name, area, rainfall, season and crop. The model then returns
the result for that input, and the user can view the result.
In the sequence diagram, there are different lifelines and messages. The lifelines are User, Attributes,
Classification, Result, Data set, Login and Admin. First, the Admin uploads the collected
data set (collected from Kaggle); the model is then built and performs classification based on the
previous data. The user logs in to the system by providing a user id and password and enters the
parameters; based on these parameters, the model classifies the input and provides the result to the user.
In the class diagram, the user has attributes such as user name, email and password and can
register or log in to the system. The user data is stored in the database, i.e., in an Excel sheet created
by the admin. The admin can build a model and select the algorithm used to build it: based on
performance and accuracy metrics, the admin chooses among multiple algorithms and uses the chosen
algorithm to build a machine learning model that can predict the result
for new user data. The user can give new crop yield data to the model for crop yield prediction,
and the model provides the result to the user.
Chapter 7
Implementation
7.1 Screenshots
Flask
This screenshot shows the front end, developed using Flask with Python. Here we have imported
the libraries, defined the classes with their respective values, and opened the saved models using a
’with’ statement with the ’open’ function in ’rb’ mode; the ’rb’ mode opens a file in binary format for reading.
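The save-and-load round trip described here can be sketched as follows. This is an illustrative example only: the dictionary stands in for a trained model, and the file name `crop_model.pkl` and the temporary directory are assumptions, not the project's actual paths.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: any picklable object is saved the same way.
model = {"algorithm": "K-Neighbors Classifier",
         "classes": ["Good", "Poor", "Very Good", "Average"]}

# Hypothetical file name inside a temporary directory
model_path = os.path.join(tempfile.mkdtemp(), "crop_model.pkl")

# Save: 'wb' opens the file for writing in binary mode.
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)

# Load: 'rb' opens the file for reading in binary mode.
# The 'with' statement closes the file even if loading fails.
with open(model_path, "rb") as fh:
    loaded_model = pickle.load(fh)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.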
Here we have created a function that predicts the rainfall in a particular state in a particular season.
Using the GET and POST methods, it takes the state name and season name from the user and predicts
the rainfall for that input.
A section of the front end visualizes the data, such as the confusion matrix, the classification report
and other visualizations. To display them, we have created a function called submit: the user
selects an option from the list, and the visualization for that option is displayed.
Here the same submit function is called. The back function returns the user to the main dashboard
from the graphs section. We have also created a login function, which allows the user to log in to
the application: the user provides a user name and password to get access,
and the user name and password are verified against the Excel sheet.
Here we have defined a function for user registration. Before logging in, the user has to register with
email, user name and password. Once the user is successfully registered, the details, i.e., the email,
user name and password, are stored in the Excel sheet. From then on, the user only needs to enter
the user name and password to log in.
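The register-then-login flow can be sketched as below. This is a hedged illustration rather than the project's code: the in-memory dictionary `users` stands in for the Excel sheet, the function names are hypothetical, and salted password hashing is an assumption added here, since the report does not say how passwords are stored.

```python
import hashlib
import hmac
import os

# Stand-in for the Excel sheet: user name -> (email, salt, password hash)
users = {}

def _hash(password, salt):
    # PBKDF2 with SHA-256; storing a hash instead of the raw password is an assumption
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(email, user_name, password):
    """Store a new user; return False if the user name is already taken."""
    if user_name in users:
        return False
    salt = os.urandom(16)
    users[user_name] = (email, salt, _hash(password, salt))
    return True

def login(user_name, password):
    """Return True only if the user exists and the password matches."""
    record = users.get(user_name)
    if record is None:
        return False
    _, salt, stored = record
    return hmac.compare_digest(stored, _hash(password, salt))

# Illustrative usage with made-up credentials
registered = register("farmer@example.com", "farmer1", "s3cret")
duplicate = register("other@example.com", "farmer1", "another")
```

After registration, only the user name and password are needed to log in, which matches the flow described above.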
This is the last function in the front end: a function to log out from the dashboard. The user
can log out by clicking on logout at the top of the page, which redirects the user
to the login page.
In this screenshot, we have imported the libraries required to develop the crop
yield prediction model, such as pandas, numpy, warnings, seaborn and matplotlib. We
have also loaded the collected data set for further processing.
Here we have analysed the number of records and columns: there are 74975 records with 5 columns.
Next, we checked whether the data set has null values; it has none.
We have assigned areas with a quantile less than 1.0 to the lower outliers and areas with a quantile
greater than 0.90 to the higher outliers. We have also visualised the state counts: the data set contains
different states with different numbers of records, so for easy understanding we have visualised the
count for each state with a pie chart.
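Quantile-based outlier flagging can be sketched in plain Python as follows. The cut-offs are an assumption: the text gives 1.0 and 0.90, and this sketch uses 0.10 and 0.90 purely for illustration. The area values and the helper name `quantile` are also made up.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

# Illustrative (made-up) area sizes
areas = [120.0, 240.0, 329.0, 500.0, 900.0, 4443.0, 6000.0, 10838.0, 25000.0, 90000.0]

LOW_Q, HIGH_Q = 0.10, 0.90  # assumed cut-offs, not confirmed by the report

low_cut = quantile(areas, LOW_Q)
high_cut = quantile(areas, HIGH_Q)

lower_outliers = [a for a in areas if a < low_cut]    # unusually small areas
higher_outliers = [a for a in areas if a > high_cut]  # unusually large areas
```

With these toy values, only the smallest and the largest area fall outside the two cut-offs.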
As before, here we have visualised the season counts: there are six types of seasons with different
counts, which are shown as a horizontal bar graph.
Here we have visualised the crop counts: there are different types of crops with different counts,
which are shown as a bar graph.
In this screenshot we have done label encoding: state name, season name and crop name hold
categorical values, which have to be converted into numerical values.
Most machine learning algorithms work only with numerical data, so this is an important
phase in model building.
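Label encoding can be illustrated with a short plain-Python sketch. The season names below are made-up sample values and `fit_label_encoder` is a hypothetical helper; categories are numbered in sorted order, which is also how scikit-learn's `LabelEncoder` assigns codes.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

# Illustrative (made-up) season column
seasons = ["Kharif", "Rabi", "Summer", "Kharif", "Whole Year", "Rabi"]

season_codes = fit_label_encoder(seasons)      # category -> integer
encoded = [season_codes[s] for s in seasons]   # numeric column for the model

# The inverse mapping turns predictions back into readable names
code_to_season = {code: name for name, code in season_codes.items()}
decoded = [code_to_season[c] for c in encoded]
```

The same encoder would be applied separately to the state name and crop name columns, and the mappings must be kept so that user input can be encoded the same way at prediction time.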
Screenshots 13, 14 and 15 are related to clustering, which means identifying groups of similar
objects in the data set and grouping them. Here we made four clusters based on the median value of
the area size: the cluster with median 4443.0 is assigned to class 0, the cluster with median 240.0
to class 1, the cluster with median 10838.0 to class 2, and the cluster with median 329.0
to class 3. We have named the classes 0=’Good’, 1=’Poor’, 2=’Very Good’ and 3=’Average’.
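The step from cluster labels to named classes can be sketched as follows. The cluster labels are assumed to come from an earlier clustering step that is not shown, and the area values are made up so that each cluster's median matches the figure quoted in the text.

```python
from collections import defaultdict
from statistics import median

# Class names as given in the text
CLASS_NAMES = {0: "Good", 1: "Poor", 2: "Very Good", 3: "Average"}

# (cluster label, area) pairs; labels assumed to come from a prior clustering step
rows = [(0, 4000.0), (0, 4443.0), (0, 5000.0),
        (1, 200.0), (1, 240.0), (1, 300.0),
        (2, 10000.0), (2, 10838.0), (2, 12000.0),
        (3, 300.0), (3, 329.0), (3, 400.0)]

# Group the areas by cluster and compute the median area of each cluster
areas_by_cluster = defaultdict(list)
for label, area in rows:
    areas_by_cluster[label].append(area)
median_area = {label: median(values) for label, values in areas_by_cluster.items()}

# Replace the numeric cluster label with its readable class name
named_rows = [(CLASS_NAMES[label], area) for label, area in rows]
```

The named classes then serve as the target labels for the classifiers trained in the next step.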
Here we have split the data set into two parts, one for training the model and the other for
testing it, in the ratio 80:20: 80 percent of the data is for training and 20 percent is
for testing. We then built our first model, the K-Neighbors Classifier; the accuracy of
the model is 99.98 percent.
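An 80:20 split followed by a k-nearest-neighbours classifier can be sketched in plain Python as below. This is a toy illustration, not the project's implementation: `split_rows` and `knn_predict` are hypothetical helpers, the data is synthetic, and k=3 is an assumed setting that the report does not specify.

```python
import random
from collections import Counter

def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and split them into (train, test) parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, features, k=3):
    """Return the majority label among the k training rows closest to `features`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Synthetic rows: (state_code, season_code, area) -> class name
data = [((float(i % 3), float(i % 2), 200.0 + i), "Poor") for i in range(10)]
data += [((float(i % 3), float(i % 2), 10000.0 + i), "Very Good") for i in range(10)]

train_rows, test_rows = split_rows(data)  # 80 percent train, 20 percent test

correct = sum(knn_predict(train_rows, x) == y for x, y in test_rows)
accuracy = correct / len(test_rows)
```

Accuracy is measured only on the held-out 20 percent, so it estimates how the model behaves on data it has not seen during training.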
This screenshot relates to the model above. Here we visualize the confusion
matrix and classification report of the K-Neighbors Classifier. The classification report contains
precision, recall, f1-score and support for every class, and the confusion matrix contains the
counts for each pair of true and predicted labels.
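How a confusion matrix is built from true and predicted labels can be sketched as follows. The label lists are made up, and the layout (rows for true labels, columns for predicted labels) follows the convention used by scikit-learn; it is an assumption that the project's plot uses the same layout.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with one row per true label and one column per predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

labels = ["Good", "Poor", "Very Good", "Average"]

# Illustrative (made-up) true and predicted classes
y_true = ["Good", "Good", "Poor", "Very Good", "Average", "Poor"]
y_pred = ["Good", "Poor", "Poor", "Very Good", "Average", "Poor"]

cm = confusion_matrix(y_true, y_pred, labels)

# Support is the number of true samples per class, i.e. each row sum
support = [sum(row) for row in cm]
```

Diagonal entries count correct predictions, and off-diagonal entries show which classes are confused with each other.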
In this screenshot we start building our second model, the Ridge Classifier; the accuracy
obtained from this model is 86.76 percent.
As with the first model, we have visualised the confusion matrix and classification report of the Ridge
Classifier algorithm.
Confusion matrix for the Ridge Classifier Algorithm and model saving.
In this screenshot, we build a model to predict the rainfall. We have chosen the state
name, season name and rainfall for this prediction and split the data into a training set and a test
set.
Here we have started to build the model for rainfall prediction using a random forest regressor.
Here we have built the rainfall prediction model using a random forest regressor, obtained 100 percent
accuracy, and plotted the actual values against the predicted values.
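For a regressor, the score usually reported is the coefficient of determination R², which is what the `score` method of scikit-learn regressors returns. Whether the 100 percent figure above is an R² value is an assumption. The sketch below uses made-up rainfall values and a hypothetical helper `r2_score`.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative (made-up) actual and predicted rainfall values
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 245.0]

score = r2_score(actual, predicted)
perfect = r2_score(actual, actual)  # predictions identical to the actual values
```

A score of exactly 1.0 means every prediction equals the actual value, so a reported 100 percent is worth confirming on data the model did not see during training.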
Chapter 8
Testing
Test Case
An ML model’s performance is assessed using test cases, which are particular sets of inputs and the
expected results. Test cases are used to make sure that the model produces correct predictions and
operates as intended.
8.1 Frontend
8.2 Backend
Chapter 9
Conclusion
In conclusion, this model is mainly useful for farmers who face difficulties in cultivating their
crops and who do not know what to grow, and when to grow it, under given conditions.
The model helps them make better decisions, which can increase
productivity and improve their economic situation.
Chapter 10
Future Work