





Matric Number: 17090931

Name: Chow Kinn John

Abstract
1. Chapter 1: Introduction
1.1 Research Background
1.2 Research Problem Statement
1.3 Research Questions
1.4 Research Objective
1.5 Research Significance
2. Chapter 2: Literature Review
2.1 Relevant work for Rice Leaf Disease classification
2.2 Rice Disease Feature Extraction
2.3 Summary of classification models
3. Chapter 3: Research Methodology
3.1 Research Design
3.2 Decision Tree and SVM approach
3.2.1 Image Acquisition and Segmentation
3.2.2 Feature Extraction
3.3 CNN approach
3.4 Diagnostic Report Module
3.5 Research Instrumentation
3.6 Evaluation methods
3.7 Conclusion
4. References

Agriculture is one of the domains in which machine learning techniques are used
to identify plant diseases that reduce crop yield. Rice diseases cause major
problems for rice farmers, particularly because they are difficult to detect.
Machine learning techniques have the potential to greatly aid the detection and
classification of rice plant diseases. In this study, we used different machine
learning algorithms to classify four common rice plant diseases: Brown Spot,
Leaf Smut, Bacterial Blight, and Tungro. By analyzing images of infected plants
and using features such as the shape, size, and color of the lesions, the models
were able to classify the diseases with high accuracy. The proposed diagnostic
system was used to estimate infection intensity in terms of the extent and stage
of infection, providing a detailed overall diagnosis.

Keywords: Rice Disease, Machine Learning, Image Processing, Disease Classification

1.1 Research Background
Rice is the staple food of many countries, especially in Asia. Rice farmers
face many challenges, such as climate change and rice plant diseases. These
problems have severely reduced rice yields and caused huge financial losses.
As the world population grows in the coming years, we may face food shortages.
Statistics show that rice production is expected to decrease over the next
decade due to climate change (Tan et al., 2021, pp. 2–3). Climate change
appears to be a factor that cannot be controlled as the years go by.
On the other hand, farmers have difficulty detecting rice plant diseases. Early
detection of a rice plant disease helps farmers take proper action to reduce
its spread. Farmers who lack formal education may diagnose plant diseases
incorrectly. A wrong diagnosis results in the wrong treatment and wastes the
manpower and money a farmer invests in pesticides (Chen et al., 2021, p. 420).
Therefore, detection and classification have to be done together to obtain the
most accurate diagnosis and the proper treatment. In recent years, machine
learning techniques have been adopted for the detection and classification of
rice plant diseases. Adopting such preventive measures and taking timely,
appropriate action can protect farmers' livelihoods, which in turn results in
substantial growth in productivity.
Various rice pathogens give rise to bacterial, fungal, and viral diseases,
which can harm different parts of the plant. Reliable detection of diseases
based on prompt and accurate recognition of symptoms has become a demanding
task. Rice crops can be infected by several diseases. We have chosen five
harmful diseases in this paper. Among the five diseases considered, Brown Spot
and Blast are fungal diseases. Brown Spot is caused by a fungus and results in
the formation of small, dark brown spots on the leaves and stems of rice
plants. It is more common in warm, humid areas and is spread by spores that are
carried by wind and water. Blast is also caused by a fungus and results in the
development of large, necrotic lesions on the leaves, stems, and panicles of
rice plants. It tends to occur in cool, wet weather and is spread by spores
that are carried by wind and water.

Leaf smut is a fungal disease of rice plants that results in the development
of black or brown, powdery lesions on the leaves. These lesions, which can
affect the yield and quality of infected crops, may resemble soot or ash. Leaf
smut is spread by spores that are carried by wind, water, and insects, and it
is more prevalent in warm, humid climates. It can also be spread by
contaminated soil or seed.
Tungro is caused by a virus and results in the yellowing and stunting of rice
plants. It is transmitted by insects, such as the green leafhopper, which feed
on the plants and spread the virus. Bacterial Blight (BB) is caused by bacteria
and leads to the formation of small, water-soaked lesions on the leaves and
stems of rice plants. It is more likely to occur in warm, wet weather and is
spread by bacteria that are carried by wind and water (Home - IRRI Rice
Knowledge Bank, n.d., pt. Pest and Diseases).
In order to determine which machine learning model performs best in detection
and classification, we will use a variety of models in this study, namely the
CNN model, the Decision Tree method, and the Support Vector Machine (SVM)
(Venu Vasantha et al., 2022, pp. 1899–1911). A diagnostic system is proposed to
determine the level of infection for the four selected diseases of rice
plants.

1.2 Research Problem statement

Over the decades, rice has been a main source of food for human beings.
However, the emergence of multiple rice plant diseases has always been a major
issue for farmers in paddy fields. Failure to identify infected crops results
in mass destruction of crops and major yield loss. Advances in computer science
and technology have produced machine learning methods for detecting rice leaf
diseases. Classification has been carried out for most types of rice leaf
disease. However, the machine learning techniques used are only able to detect
and predict the type of disease. They are limited in determining the exact
stage of infection. Thus, a research gap remains in identifying the level of
infection of rice leaf diseases.

1.3 Research questions
I. What are some effective methods for detecting rice plant diseases?
II. What are the suitable machine learning tools for classification of rice leaf
diseases?
III. How can the proposed diagnostic system measure the severity of a rice
plant disease?

1.4 Research Objective

I. To identify the available machine learning tools that can be used to
detect rice leaf disease.
II. To evaluate the performance of the chosen machine learning techniques
based on accuracy and precision.
III. To analyse rice leaf diseases using the proposed methodology in order to
measure the infection intensity and generate a diagnosis report.

1.5 Research significance

The research scope covers the collection of image datasets of rice leaf
diseases. The datasets will be obtained from open-source databases and will
contain around 200–400 images.
The machine learning tools explored in this research are the Support Vector
Machine (SVM), the Decision Tree algorithm, and the CNN model.
The main contributions of the research are that rice farmers will be able to
save cost and time in identifying rice plant diseases and applying the correct
treatment, and that the diagnosis report provided is easy for farmers to
understand.

2.1 Relevant work for Rice Leaf Disease classification

Article 1: (Jagan et al., 2016)
  Dataset: Not mentioned.
  Techniques: K-NN and SVM.
  Best technique: K-NN.
  Feature extraction: Uses Scale Invariant Feature Transform (SIFT) for feature extraction.
  Evaluation: A confusion matrix was generated and a performance table was created for paddy plant disease recognition using SVM and K-NN.
  Limitation: Poor detection accuracy.
  Advantage: Good disease recognition.

Article 2: (Prajapati et al., 2017)
  Dataset: Images captured from rice fields in a village.
  Techniques: SVM (Gaussian kernel).
  Best technique: SVM.
  Feature extraction: K-means clustering; color, texture, and shape. 88 features taken from the disease portion of the leaf image.
  Evaluation: Compared three different segmentations and chose the best segmentation for the classification.
  Limitation: Dataset not large enough. Low accuracy for leaf smut disease. Difficulty differentiating leaf smut and brown spot disease.
  Advantage: Developed an easy-to-use GUI, from input image to disease classification.

Article 3: (Kim et al., 2017)
  Dataset: Historical rice blast disease data and historical climatic data of three different regions.
  Techniques: Long short-term memory neural network.
  Best technique: N/A.
  Feature extraction: N/A.
  Evaluation: Accuracy and F1-score.
  Limitation: Accuracies are not high. Performance of the model can be improved by adding more data.
  Advantage: Data in the form of quantitative data.

Article 4: (Narmadha & Arulvadivu, 2017)
  Dataset: RGB color images of paddy crop leaves captured using smartphones or a digital camera.
  Techniques: SVM and ANN.
  Best technique: N/A.
  Feature extraction: Shape feature extraction measures the breadth and height of the image by counting the object pixels. Color feature extraction into the Red-Green-Blue (RGB) channels.
  Evaluation: N/A.
  Limitation: N/A.
  Advantage: Able to detect paddy plant diseases such as Blast, Brown spot, and Narrow brown spot.

Article 5: (Bashir et al., 2019)
  Dataset: 400 images from sources including (Home - IRRI Rice Knowledge Bank, n.d.) and (Shutterstock, n.d.).
  Techniques: Bayes Classifier, KNN, SVM.
  Best technique: SVM.
  Feature extraction: Scale Invariant Feature Transform (SIFT) and K-means clustering.
  Evaluation: Precision and recall method. Confusion matrix.
  Limitation: Only 3 rice diseases chosen. Other types of crops could be covered in future research.
  Advantage: High accuracy of 94% for the SVM method.

Article 6: (Ahmed et al., 2019)
  Dataset: UCI Machine Learning Repository.
  Techniques: Logistic Regression, KNN, Decision Tree.
  Best technique: Decision Tree.
  Feature extraction: After applying an image filter, five features were selected using the Correlation Based Feature Selection technique, which selects the best 5 features.
  Evaluation: Accuracy, true positive rate, false positive rate, precision, recall, F-measure, and area under ROC.
  Limitation: Quality of datasets not high enough. Ensemble learning methods are not explored.
  Advantage: Four ML algorithm models were used in this research. Decision Tree had high accuracy results.

Article 7: (D. Vydeki, 1970)
  Dataset: Images captured from the paddy field using a high-resolution digital camera.
  Techniques: KNN and ANN classification models.
  Best technique: ANN classification models.
  Feature extraction: The segmented images are used to extract the features related to the disease infection.
  Evaluation: The performance of the KNN and ANN classifiers is measured using a confusion matrix.
  Limitation: Only one type of disease was used for the research.
  Advantage: High accuracy obtained by both KNN and ANN.

Article 8: (Anami et al., 2020)
  Dataset: 6000 images, with 500 images acquired per stress class. A total of 500 healthy field images per paddy crop.
  Techniques: Pre-trained VGG-16 CNN model.
  Best technique: N/A.
  Feature extraction: The image goes through a sequence of two convolutional and pooling layers to extract features, followed by a fully connected layer to interpret the extracted features.
  Evaluation: Confusion matrix used to calculate accuracy. Accuracy compared with a BPNN model.
  Limitation: Predicting the gap between yield potential and yield under stress is left for further studies.
  Advantage: Maximum average stress classification accuracy of 95.08% achieved.

Article 9: (Patel & Sharaff, 2021)
  Dataset: Images from (Kaggle: Your Home for Data Science, n.d.).
  Techniques: K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Neural Network with Different Layer Configurations (NN), Quadratic Linear Classifier (QL).
  Best technique: Adaptive Feature Selection Algorithm.
  Feature extraction: Features such as color maps, edge maps, texture maps, and region-based features were evaluated. The features were then used to convert the segmented regions into numerical values.
  Evaluation: Comparison with another study that uses BPNN, CNN, and SVM. The research compared accuracy, delay, precision, and recall.
  Limitation: Counting the number of tillers and grains is future work for the proposed technique.
  Advantage: N/A.

Article 10: (Venu Vasantha et al., 2022)
  Dataset: Images captured from rice fields.
  Techniques: Convolutional Neural Network (RDD_CNN), Infection Intensity Estimation Module.
  Best technique: RDD_CNN.
  Feature extraction: The background area of the image is suppressed using an image masking technique and image filters. The overall infected area is then calculated.
  Evaluation: Classifier performance on the 8 types of rice diseases based on precision, recall, F1 score, sensitivity, and specificity.
  Limitation: Only one type of disease is tested using the Infection Intensity Estimation Module. Other types of diseases can be used for this module.
  Advantage: Very high accuracy of 98.47% obtained from the disease classification model RDD_CNN.

Article 11: (Chaudhary & kumar, 2022)
  Dataset: Both healthy and unhealthy samples from (Kaggle: Your Home for Data Science, n.d.). The data set consists of 1488 healthy leaves and 523 brown spot leaf samples.
  Techniques: SVM and CNN.
  Best technique: CNN.
  Feature extraction: Image preprocessing, image segmentation, feature extraction (GLCM).
  Evaluation: Performance parameters of both SVM and CNN evaluated: precision, recall, F1-score, and support.
  Limitation: Only one type of disease, brown spot, is chosen for the research. No method that measures the severity of the disease.
  Advantage: High accuracy of 95% for the CNN method.

Article 12: (Sharma et al., 2022)
  Dataset: A rice leaf dataset with 5932 images and 1500 potato leaf images are used in the study.
  Techniques: Support Vector Machine (SVM), CNN, K-Nearest Neighbors (KNN), Decision Tree, and Random Forest.
  Best technique: CNN.
  Feature extraction: Feature extraction using CNN. Convolution layers extract high-level features by passing the image through a set of filters that extract meaningful information from it.
  Evaluation: Confusion matrix used to measure the accuracy.
  Limitation: The hyper-parameters of the proposed CNN model are not optimized.
  Advantage: High accuracy of 99.58% for paddy leaves and 97.66% for potato leaves.

2.2 Rice Disease Feature Extraction
In order to classify diseases, feature extraction is an important step before
the modeling process (Zamani et al., 2022). Supervised and unsupervised machine
learning techniques use different methods for image attribute selection. The
Scale Invariant Feature Transform method was used by Jagan et al. (2016) and
Bashir et al. (2019). K-means clustering was adopted by Prajapati et al.
(2017), while Chaudhary and kumar (2022) applied the gray level co-occurrence
matrix (GLCM) for attribute selection. Ahmed et al. (2019) used the
Correlation Based Feature Selection technique, which selects the top 5
features. Anami et al. (2020), Venu Vasantha et al. (2022), and Sharma et al.
(2022) applied CNN as their feature extraction tool.

2.3 Summary of classification models

Based on the critical analysis table, many different machine learning
algorithms have been used for the detection and classification of rice leaf
diseases. In Korea, Kim et al. (2017) used historical quantitative weather data
with a long short-term memory neural network. They were able to predict the
regions in which rice blast disease would occur, but the accuracy was low.
Ahmed et al. (2019) used Logistic Regression, K-Nearest Neighbour, Decision
Tree (J48), and Naive Bayes classifiers. The dataset was divided into 90% for
training and 10% for testing, and tenfold cross-validation was performed on
each algorithm. According to the evaluation, Decision Tree had the highest
training and testing accuracy scores.
KNN and ANN classifiers were used in another study (D. Vydeki, 1970,
pp. 31–37), which focused on classifying rice blast disease only. The accuracy
obtained was 70% for KNN and 90% for ANN. Prajapati et al. (2017, pp. 357–373)
aimed to detect paddy plant diseases using the SVM technique. They used images
captured from rice fields and obtained 93.33% training accuracy and 73.33%
testing accuracy. The lower testing accuracy was attributed to the dataset not
being large enough and to the difficulty of differentiating leaf smut from
brown spot disease.

In addition to using the Scale Invariant Feature Transform for feature
extraction, Bashir et al. (2019, pp. 239–250) used the same SVM technique and
reached a better accuracy of 94%. The SVM model detected three disease classes:
brown spot, false smut, and bacterial leaf blight. The main addition made by
Bashir and his team was a cure recommendation following the predicted rice
disease. However, the stage of the disease was not specified.
In papers by Chaudhary and Kumar (2022, pp. 464–473) and Sharma et al.
(2022, pp. 2125–2140), a plant disease detection model that used CNN was found
to be more accurate than the other models tested. The final accuracies reported
were 95% and 99.58%, respectively. The main limitation was that only one type
of paddy plant disease was chosen. The studies also lack a method for measuring
the severity of the disease. Venu Vasantha et al. (2022, pp. 1895–1914)
proposed a rice disease diagnostic system based on a CNN model. The proposed
model was able to classify 8 types of rice plant diseases, achieving an average
accuracy of 98.47%. The study also includes an infection intensity estimation
(IIE) module, which determines the percentage of infection on the paddy leaf.
Brown spot disease was chosen for the module. As future work, the IIE process
can be extended to other rice leaf diseases that adversely affect rice crop
yield.

3.1 Research Design
In this chapter, the following sections explain each step of the methods
proposed in this research. The flow chart below shows the outline of this
research.

Figure 1: Flow chart of the entire work

3.2 Decision Tree and SVM approach

3.2.1 Image Acquisition and Segmentation
The image dataset for rice disease was collected from Kaggle
(Rice-leaf-disease, 2022). The dataset consists of images of five rice
diseases: brown spot, leaf smut, blast, blight, and tungro. It contains a total
of 320 images: 40 each for brown spot and leaf smut, and 80 each for blast,
blight, and tungro. During preprocessing, the images are cropped to a specific
size and placed on a white background. Segmentation then divides the image into
a number of regions. The objective is to simplify the representation of the
image and make it more meaningful.
3.2.2 Feature Extraction
A feature extraction technique reduces the amount of data needed to describe an
image by representing its regions of interest as a compact feature vector. This
is helpful when large images must be retrieved and matched quickly. Feature
extraction is carried out using the gray level co-occurrence matrix (GLCM)
(Chaudhary & kumar, 2022).
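As a rough illustration of how a GLCM turns a leaf patch into a feature vector, the sketch below counts gray-level pairs at a fixed pixel offset and derives three common texture statistics. The horizontal one-pixel offset, the four gray levels, the choice of statistics, and the toy patch are all assumptions made for the example, not settings fixed by this proposal.

```python
import numpy as np

def glcm(gray, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dy, dx)."""
    matrix = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = gray.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[gray[r, c], gray[r2, c2]] += 1
    return matrix / matrix.sum()  # normalise counts to joint probabilities

def texture_features(p):
    """Contrast, angular second moment and homogeneity of a normalised GLCM."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "asm": float(np.sum(p ** 2)),
        # Inverse difference moment, one common definition of homogeneity.
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }

# Toy 4x4 patch quantised to 4 gray levels (illustrative values only).
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
features = texture_features(glcm(patch, levels=4))
```

The resulting dictionary is the kind of compact feature vector that could be passed to the Decision Tree and SVM classifiers; in practice several offsets and angles are usually combined.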
3.2.3 Disease classification and prediction
The dataset for our study contains 320 images. The dataset was split into
training and test sets using a resample filter, with the training set
containing 256 instances (80% of the total) and the test set containing 64
instances (20% of the total). No instance from the test set should appear in
the training set. After the algorithm has been trained, the test set will be
used for the disease prediction phase.
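The 80/20 split can be sketched as follows. This is a minimal stand-in for the resample filter, assuming a stratified split (each disease keeps the same train/test proportion) and a fixed random seed; the file names are hypothetical, while the class sizes are those given in Section 3.2.1.

```python
import random
from collections import defaultdict

# Class sizes from Section 3.2.1; the file names are hypothetical placeholders.
CLASS_SIZES = {"brown_spot": 40, "leaf_smut": 40, "blast": 80, "blight": 80, "tungro": 80}
samples = [(f"{label}_{i:03d}.jpg", label)
           for label, n in CLASS_SIZES.items() for i in range(n)]

def stratified_split(items, test_fraction=0.2, seed=42):
    """Shuffle each class separately and hold out test_fraction of it."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

train_set, test_set = stratified_split(samples)
```

Because every image is assigned to exactly one of the two lists, no test instance can appear in the training set.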
3.3 CNN approach
Convolutional networks are artificial neural networks inspired by the
connection patterns of neurons in the visual cortex. The receptive fields of
different neurons partially overlap, so that together they cover the visual
field. A convolution mathematically approximates the response of a neuron to a
stimulus in its receptive field. The pre-processed image will go through a
number of convolutional layers and pooling layers. Each convolutional layer
contains filters that act as pattern detectors. The pooling layers reduce the
size of the feature maps before they reach the fully connected layer. The
fully connected layer produces the output, which is the predicted class of the
image.
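To make the roles of the convolutional and pooling layers concrete, the following NumPy sketch applies one filter, a ReLU activation, and 2x2 max pooling to a toy image. The hand-made edge filter and the 6x6 input are assumptions for illustration only; in a trained CNN the filter values are learned, and the actual models are intended to be built with TensorFlow rather than with this code.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D filter (stride 1)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" with a vertical edge, and a hand-made vertical-edge filter.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)

activation = np.maximum(conv2d(image, edge_filter), 0.0)  # ReLU
pooled = max_pool(activation)
```

The filter responds only where the dark-to-bright edge falls inside its window, and pooling halves each spatial dimension of the resulting feature map (4x4 to 2x2 here).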

3.4 Diagnostic Report Module

This module was inspired by Venu Vasantha et al. (2022, pp. 1911–1914).
We will apply suitable filters to images of the diseased leaf to isolate the
areas of infection. The infection area is computed by counting the number of
infected pixels. This pixel count will then be converted to a percentage.
Threshold values on this percentage will identify the disease stage.
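The pixel-counting step can be sketched as below, assuming the filters have already produced a binary leaf mask and a binary lesion mask. The stage cut-offs (10% and 30%) and the stage names are hypothetical placeholders, since this proposal does not yet fix them.

```python
import numpy as np

# Hypothetical stage cut-offs in percent of leaf area; the proposal does not
# fix these values, so they are placeholders only.
STAGE_THRESHOLDS = [(10.0, "early"), (30.0, "intermediate")]

def infection_percentage(leaf_mask, lesion_mask):
    """Share of leaf pixels that are also lesion pixels, in percent."""
    leaf_pixels = np.count_nonzero(leaf_mask)
    if leaf_pixels == 0:
        raise ValueError("leaf mask is empty")
    infected = np.count_nonzero(np.logical_and(leaf_mask, lesion_mask))
    return 100.0 * infected / leaf_pixels

def disease_stage(percent):
    """Map an infection percentage to the first stage whose limit it does not exceed."""
    for upper_limit, stage in STAGE_THRESHOLDS:
        if percent <= upper_limit:
            return stage
    return "advanced"

# Toy 10x10 leaf with a 3x5 lesion, standing in for masks produced by filtering.
leaf = np.ones((10, 10), dtype=bool)
lesion = np.zeros((10, 10), dtype=bool)
lesion[2:5, 3:8] = True

percent = infection_percentage(leaf, lesion)  # 15 of 100 leaf pixels
report = f"Infected area: {percent:.1f}% (stage: {disease_stage(percent)})"
```

Dividing by the number of leaf pixels rather than by the whole image keeps the percentage independent of how much background surrounds the leaf.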

3.5 Research Instrumentation
The code will be written in the Python programming language. The models will
be implemented with TensorFlow, an open-source software library, and image
processing will be done with OpenCV.
3.6 Evaluation methods
The results of the classification models will be evaluated using a confusion
matrix for each disease covered in this research. Based on the confusion
matrix, we will compute the accuracy, precision, recall, true positive rate,
false positive rate, and area under ROC.
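The measures derived directly from a confusion matrix can be computed as in the sketch below, which treats each disease in turn as the positive class (one-vs-rest). The counts are made up for illustration and are not results. The area under ROC is not shown because it requires classifier scores rather than a confusion matrix.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        metrics[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,  # also the true positive rate
            "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        }
    accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
    return accuracy, metrics

# Made-up counts for three classes, for illustration only (not results).
labels = ["brown_spot", "blast", "tungro"]
confusion = [[8, 1, 1],
             [2, 7, 1],
             [0, 2, 8]]
accuracy, metrics = per_class_metrics(confusion, labels)
```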

3.7 Conclusion
In this proposal, an automated diagnosis report module is proposed. GLCM is
adopted to extract the features of rice leaf disease images for the Decision
Tree and SVM approach, while the CNN uses convolutional layer filters for
feature extraction. SVM, Decision Tree, and CNN are used to classify and
predict rice leaf disease. The performance of the models will be evaluated
using a confusion matrix to obtain the true positive rate and false positive
rate.


Ahmed, K., Shahidi, T. R., Irfanul Alam, S. M., & Momen, S. (2019). Rice Leaf Disease
Detection Using Machine Learning Techniques. 2019 International Conference on
Sustainable Technologies for Industry 4.0 (STI).

Alfred, R., Obit, J. H., Chin, C. P. Y., Haviluddin, H., & Lim, Y. (2021). Towards Paddy
Rice Smart Farming: A Review on Big Data, Machine Learning, and Rice Production
Tasks. IEEE Access, 9, 50358–50380.

Anami, B. S., Malvade, N. N., & Palaiah, S. (2020). Deep learning approach for recognition
and classification of yield affecting paddy crop stresses using field images. Artificial
Intelligence in Agriculture, 4, 12–20.

Bashir, K., Rehman, M., & Bari, M. (2019). Detection and Classification of Rice Diseases:
An Automated Approach Using Textural Features. January 2019, 38(1), 239–250.

Chaudhary, S., & kumar, U. (2022). Analysis of Methods of Machine Learning Techniques
for Detection and Classification of Brown Spot (Rice) Disease. Universal Journal of
Agricultural Research, 10(5), 464–473.

Chen, S., Zhang, K., Zhao, Y., Sun, Y., Ban, W., Chen, Y., Zhuang, H., Zhang, X., Liu, J., &
Yang, T. (2021). An Approach for Rice Bacterial Leaf Streak Disease Segmentation
and Disease Severity Estimation. Agriculture, 11(5), 420.

D. Vydeki, S. R. (1970). Application of machine learning in detection of blast disease in
South Indian rice crops. Journal of Phytology, 31–37.

Home - IRRI Rice Knowledge Bank. (n.d.).

Jagan, K., Balasubramanian, M., & Palanivel, S. (2016). Detection and Recognition of
Diseases from Paddy Plant Leaf Images. International Journal of Computer
Applications, 144(12), 34–41.

Kaggle: Your Home for Data Science. (n.d.).

Kaundal, R., Kapoor, A. S., & Raghava, G. P. (2006). Machine learning techniques in disease
forecasting: a case study on rice blast prediction. BMC Bioinformatics, 7(1).

Kim, Y., Roh, J. H., & Kim, H. (2017). Early Forecasting of Rice Blast Disease Using Long
Short-Term Memory Recurrent Neural Networks. Sustainability, 10(2), 34.

M N Abu Bakar, Abu Abdullah, N. Abdul Rahim, Haniza Yazid, S.N. Misman, & Maz
Jamilah Masnan. (2018). Rice Leaf Blast Disease Detection Using Multi-Level
Colour Image Thresholding. Journal of Telecommunication, Electronic and Computer
Engineering, 10, 1–6.

Narmadha, R. P., & Arulvadivu, G. (2017). Detection and measurement of paddy leaf disease
symptoms using image processing. 2017 International Conference on Computer
Communication and Informatics (ICCCI).

Patel, B., & Sharaff, A. (2021). Rice Crop Disease Prediction Using Machine Learning
Technique. International Journal of Agricultural and Environmental Information
Systems, 12(4), 1–15.

Prajapati, H. B., Shah, J. P., & Dabhi, V. K. (2017). Detection and classification of rice plant
diseases. Intelligent Decision Technologies, 11(3), 357–373.

Rice-leaf-disease. (2022, March 7). Kaggle.

Sharma, R., Singh, A., Kavita, Z. Jhanjhi, N., Masud, M., Sami Jaha, E., & Verma, S. (2022).
Plant Disease Diagnosis and Image Classification Using Deep Learning. Computers,
Materials & Continua, 71(2), 2125–2140.

Shutterstock. (n.d.). Stock Images, Photos, Vectors, Video, and Music.

Tan, B. T., Fam, P. S., Firdaus, R. B. R., Tan, M. L., & Gunaratne, M. S. (2021). Impact of
Climate Change on Rice Yield in Malaysia: A Panel Data Analysis. Agriculture,
11(6), 569.

Venu Vasantha, S., Samreen, S., & Lakshmi Aparna, Y. (2022). Rice Disease Diagnosis
System (RDDS). Computers, Materials & Continua 2022, 73(1), 1895–1914.

Zamani, A. S., Anand, L., Rane, K. P., Prabhu, P., Buttar, A. M., Pallathadka, H.,
Raghuvanshi, A., & Dugbakie, B. N. (2022). Performance of Machine Learning and
Image Processing in Plant Leaf Disease Detection. Journal of Food Quality, 2022, 1–


