
Republic of the Philippines

Laguna State Polytechnic University


Province of Laguna

COLLEGE OF COMPUTER STUDIES


RICE-LEAF AILMENT DETECTOR (RAD): DIFFERENTIABILITY OF SUPPORT
VECTOR MACHINE (SVM) AND CONVOLUTIONAL NEURAL NETWORK
(CNN) IN DETECTING RICE DISEASES USING IMAGE CLASSIFICATION

A Thesis Proposal
Presented to the Faculty of
College of Computer Studies
Laguna State Polytechnic University
San Pablo City Campus
San Pablo City

In Partial Fulfillment
of the Requirements for the Degree
Bachelor of Science in Computer Science

By:

Apiles, Jericho

Atienza, Jude A.

Eraña, Allen Troy

Ompangco, Kristine

Nov 2022
TABLE OF CONTENTS

PRELIMINARIES PAGE NO.


Title Page ………………………………………………………….. i
Approval Sheet ……………………………………………………. ii
Dedication …………………………………………………………. iii
Acknowledgement ………………………………………………… iv
Abstract …………………………………………………………….. v
Table of Contents …………………………………………………. vi
List of Figures ……………………………………………………... ix
List of Tables ………………………………………………………. x

Chapter I - The Problem and its Background

Introduction …………………………………………………………... 1
Background of the Study …………………………………………… 2
Objectives of the Study ……………………………………………. 4
Scope and Limitations ……………………………………………... 5
Significance of the Study ………………………………………….. 5
Research Framework ……………………………………………… 6
Technical Definition of Terms …………………………………... 9

Chapter II – Literature Review

Rice Propagation……………………………………………………… 10
Rice Disease ……………………………………………………. 11
Machine Learning …………………………………………………... 12
Supervised Learning….……………………………………….. 13
Image Acquisition……………………………………………………. 13
Feature Extraction…………… ……………………………………... 14
Feature Extraction in CNN and SVM…………………………………. 15
GLCM (Gray Level Co-Occurrence Matrix)…………………………. 16
Support Vector Machine (SVM)…………………………………….. 17
Convolutional Neural Network……………………………………... 18

Chapter III - Research Methodology

Research Design …………………………………………………… 21


Respondents of the Study ………………………………………... 22
Participants…………………………………………………… 22
Instrument……………………………………………………. 22
Materials and Equipment ………………………………………….. 22
Research Instrument ………………………………………………. 23
Research Procedure……………………………………………… 30
Development Methodology………………………………………. 31
Phase 1: System Requirements and Analysis……....... 32
Phase 2: Program Design………………………………. 33
Phase 3: Coding and Testing……………………………. 41
Phase 4: Integration and System Testing………………. 42
Phase 5: Deployment…………………………………….. 43
Statistical Treatment of Data ……………………………………… 44
List of Figures

FIGURE NO. TITLE PAGE NO.

1 Theoretical Framework 7

2 Conceptual Framework 8

3 Feature Extraction in CNN and SVM 15

4 Iterative Waterfall Model 31

5 Website View Head 34

6 Rice Condition Classifier 35

7 Disease Detector Section 36

8 Use Case Diagram 37

9 System Flow Chart 38

10 Context Diagram 40

List of Tables

TABLE NO. TITLE PAGE NO.

1 SUMI (Software Usability Measurement Inventory) Rating Scale 24

2 ISO 25010 Based Evaluation Questionnaire 27

3 Rating scale used to evaluate the developed system 29
4 Numerical Scale of ISO25010 30

5 Data Set for Model Training and Testing 32

6 Evaluation Set for Trained Models 42



CHAPTER I

The Problem and its Background


Introduction
Rice has always been a staple food in Filipino cuisine and is considered one of the most abundant crops in Asia. However, like most crops, it is susceptible to diseases that can compromise its safety and production, leading to a significant decrease in both quantity and quality. This study compares two algorithms for detecting rice disease. The first is the Support Vector Machine (SVM), a supervised machine learning algorithm used for classification and regression problems, in which data from a finite-dimensional space are mapped into a much higher-dimensional space (p dimensions) with the aim of finding the (p-1)-dimensional hyperplane, called a linear classifier, that separates the classes. The second is the Convolutional Neural Network (CNN), which is composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and output an activation value.

The researchers will use two models in this study, each trained with a different algorithm, to determine which is superior in detecting rice diseases. The SVM pipeline applies background removal, K-means clustering, feature extraction, and texture extraction, while the CNN pipeline applies data augmentation to the rice leaf images. The model that proves dominant between the two will be deployed on a website. This study will focus on the comparison of the Support Vector Machine and the Convolutional Neural Network in detecting rice plant ailments; the researchers will evaluate and test the results, which will serve as the basis for deciding which algorithm is implemented in the system.

Background of the Study


The Philippines ranked eighth in world rice production in 2018. Rice is widely grown in Luzon, Western Visayas, Southern Mindanao, and Central Mindanao. Over the decade from 1999 to 2008, rice production increased from 12 Mt to 19 Mt, as stated by FAOSTAT (2020).

The most common diseases of rice crops in the Philippines are: (1) Bacterial Leaf Blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), which affects the rice plant at the seedling stage, where infected leaves turn grayish-green and roll up. As the disease progresses, leaves turn yellow to straw-colored and wilt, leading whole seedlings to dry up and die. The disease occurs in both tropical and temperate environments, particularly in irrigated and rainfed lowland areas, and is commonly observed when strong winds and continuous heavy rains occur. (2) Leaf Scald, caused by the fungus Microdochium oryzae, results in a scalded appearance of the rice leaves. It develops in wet weather and under high doses of nitrogenous fertilizers, and is very common in high-water-content environments in Asia, Africa, and the U.S. Symptoms are usually gray-green, water-soaked lesions on the leaf that spread through a large part of the leaf blade. (3) Rice Leaf Blast, caused by the fungus Magnaporthe oryzae, affects the underlying parts of the plant and causes severe damage, usually during the seedling stage. It occurs where there is low soil moisture, stagnant water, and cool daytime temperatures. It attacks different parts of the plant: the collar, which can ultimately kill the entire leaf blade; the stem, which turns blackish and breaks easily (node blast); the neck of the panicle, where the infected part is girdled by a grayish-brown lesion or, when severe, the panicles fall over; and the branches of the panicles, which exhibit brown lesions when infected. (4) Narrow Brown Spot, caused by the fungus Sphaerulina oryzina, leads to the death of the leaves and early ripening of the rice grain. It occurs when the soil lacks potassium and at cooler temperatures of about 25-28 degrees Celsius, appearing as the plant reaches maturity. The symptoms are usually lesions on the leaf with a dark brown line parallel to the veins of the plant. It also causes leaf sheath discoloration known as "net blotch", a pattern of brown and light brown to yellow patches that resembles a net. (5) Brown Spot, the most common and most damaging disease, is caused by Bipolaris oryzae. Symptoms are usually observed from the seedling to the milk stage of the rice crop as multiple large spots that can kill a whole leaf. When the disease reaches the seeds, it results in unfilled grains and spotted, discolored seeds. The disease develops when the crops are unflooded, the soil is nutrient deficient, and, most importantly, humidity is very high, reaching 86 to 100% at temperatures between 16 and 36 degrees Celsius.

Because these illnesses cause considerable financial losses for everyone involved, the automation of rice disease identification and diagnosis is strongly advocated in agricultural regions. Many control schemes have been proposed, with deep learning emerging as the preferred method. Among the common classifiers are the (1) Support Vector Machine and the more recently adopted (2) Convolutional Neural Network.



Rice disease spots would be segmented, and shape and texture properties would be extracted. The SVM technique would be used to classify Bacterial Leaf Blight, Brown Spot, Narrow Brown Spot, Leaf Blast, and Leaf Scald. In the CNN algorithm, a model would be trained on a dataset of images of rice plants. It would start processing each image with randomly initialized values and then compare its output with the image's correct label to determine which leaves are healthy and which ones are infected.

Objectives of the study


The main objective of this study is to create a model application that

enhances the detection of illnesses and diseases on rice plants by implementing

pattern recognition.

Specific Objectives

1. To develop a model that is capable of:

A. Recognizing illnesses on rice plants using image classification models; and

B. Recommending treatments for the detected illnesses.

2. To compare the two models, Support Vector Machine and Convolutional Neural Network, in detecting rice ailments.

3. To test the system and determine the accuracy, recall, F-score, and precision of the trained image classification models.

4. To evaluate the usability of the proposed models using ISO 25010 and to test which image classification model is more effective.



Scope and Limitation

This research involves the creation of a model that is capable of determining whether a rice leaf is healthy or not. Based on the datasets gathered, the model can detect several conditions of the rice leaf: healthy, bacterial leaf blight, brown spot, leaf blast, leaf scald, and narrow brown spot.

Support Vector Machine (SVM) and Convolutional Neural Network (CNN) are the two pattern recognition models that will be employed throughout the software development process. The system will also recommend how to treat infected rice plants and produce an analysis of the detected images that classifies the rice diseases.

The researchers selected Windows as the platform on which the model would run; it would be implemented locally. The model can only handle uploaded images, and recognition is limited to the classes included in the dataset.

Significance of the Study


The research intends to improve the diagnosis of rice illnesses by including

image detection using pattern recognition so that the model can process and

analyze different textures and stains on rice leaves. This would also be critical for

our farmers to ensure the health of their crops. A website that can detect

irregularities would be beneficial in monitoring their harvests because it would

diagnose any infection that their rice has.

Beneficiaries of the proposed study include:



• Local Farmers/Landowners

The model will help in detecting rice illnesses, ensuring the wellness of their crops and preventing possible losses due to these diseases.

• Rice Market

There would be a significant change in the quality and quantity of rice delivered to the market, as farmers would be able to easily recognize abnormalities in their crops.

• Rice Consumers

The study could help consumers have a bowl of healthy rice on their plates.

Moreover, this research will contribute to the field of computer science by comparing the accuracy of the Support Vector Machine (SVM) and Convolutional Neural Network (CNN) models in recognizing irregular patches and textures in uploaded photographs.

Research Framework

Computer Vision is a field of artificial intelligence (AI) that enables

computers and systems to derive meaningful information from digital images,

videos, and other visual inputs — and take actions or make recommendations

based on that information. The system will rely on deep learning, which is based

on a form of algorithm known as a neural network, to provide more accurate assessments of the images required to produce an output.



Figure 1. Theoretical Framework

Image classification is the task of categorizing and assigning labels to

groups of pixels or vectors within an image dependent on particular rules. The

categorization law can be applied through one or multiple spectral or textural

characterizations. Then two of the most common methods to classify the overall

image through training data are ‘maximum likelihood’ and ‘minimum distance.’ For

instance, ‘maximum likelihood’ classification uses the statistical traits of the data

where the standard deviation and mean values of each textural and spectral

indices of the picture are analyzed first. Later, the likelihood of each pixel to

separate classes is calculated utilizing a normal distribution for the pixels in each

class. Moreover, a few classical statistics and probabilistic relationships are also

used. Eventually, the pixels are marked to a class of features that show the highest

likelihood.
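To make the 'maximum likelihood' rule concrete, the following is a minimal sketch (not part of the proposed system) that assumes per-class Gaussian statistics estimated from labeled training pixels; the class names, band counts, and sample values are purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_stats(pixels_by_class):
    """Estimate the mean vector and covariance of each class from training pixels."""
    stats = {}
    for label, pixels in pixels_by_class.items():  # pixels: (n_samples, n_bands)
        stats[label] = (pixels.mean(axis=0), np.cov(pixels, rowvar=False))
    return stats

def maximum_likelihood_classify(pixel, stats):
    """Assign the pixel to the class whose Gaussian gives it the highest likelihood."""
    scores = {label: multivariate_normal(mean, cov, allow_singular=True).pdf(pixel)
              for label, (mean, cov) in stats.items()}
    return max(scores, key=scores.get)

# Illustrative training pixels for two hypothetical classes, three bands each.
rng = np.random.default_rng(0)
training = {
    "healthy": rng.normal([60, 140, 70], 10, size=(200, 3)),
    "lesion":  rng.normal([120, 110, 60], 10, size=(200, 3)),
}
stats = fit_class_stats(training)
print(maximum_likelihood_classify(np.array([118, 112, 62]), stats))  # -> "lesion"
```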

Figure 1 represents the steps required in processing image classification. The diagram shows the phases of image acquisition, image pre-processing, supervised learning, spectral pattern recognition, and image classification.

Figure 2. Conceptual Framework


The first process is image acquisition, which recovers patterns from input data provided by an image. After retrieving an image, the system processes it to increase its quality so that it can be evaluated better. Simultaneously, picture samples would be fed into a model via supervised learning; during the training phase, the model looks for candidate features of a certain item using both the training data and the associated output. The image then undergoes spectral pattern recognition, which classifies each pixel of the image. Finally, image classification attempts to generate an output as well as an evaluation of the observed image.

Figure 2 depicts the model’s procedure as it progresses in a linear fashion. To completely execute image classification, the program requires a data source from which the image input is obtained. The model handles photos that the user uploads. The trained model then extracts textures from the photographs and compares them to model image samples to find abnormalities, after which it makes suggestions to the operator.

Technical Definition of Terms

Machine learning - refers to the usage and development of computer systems

that can learn and adapt without explicit instruction by analyzing and drawing

inferences from data patterns utilizing algorithms and statistical models.

Frame extraction - is the retrieval of an image from the frame in video footage.

Dataset - refers to a collection of data clustered into different categories.

Object detection - is a computer vision technique that identifies and locates objects in an image.

TensorFlow - is an end-to-end open-source platform for machine learning.

Pattern Recognition - is the automatic detection of regularities and patterns in

data.

Image Classification – is the process of taking information classes out of a

multiband raster image.

SVM – supervised learning models with corresponding learning algorithms that examine data for regression analysis and classification.

CNN – a class of artificial neural network (ANN) most frequently used to analyze visual imagery.

CHAPTER II

Literature Review

A review of literature is a categorization and assessment of what recognized

academics and researchers have published on a topic, grouped around a guiding

notion such as a research aim, thesis, or the problem or issue to be addressed. It

is the dissertation's scholarly core. It is an examination and synthesis of the source

materials written in a certain manner that moves from broad to narrow and

considers both theoretical and empirical concerns.

This chapter presents a review of issues related to the study and discusses how the system relates to similar issues and articles. It also covers the concepts of the system as presented in similar studies, both local and foreign.

Rice Propagation

Nearly half of the world's population relies on rice as a staple meal, and numerous nations have incorporated it into their cultural identity (Chauhan, Jabran, & Mahajan, 2017). The act of producing more plants of a specific species or cultivar is known as plant propagation; seeds develop into mature, fertile plants when exposed to the right environmental circumstances (Trinklein, 2022). Rice may be grown as a transplant in a nursery bed or can be grown from seed placed in wet or dry seed beds. When transplants are moved to the field from a nursery, they are always planted on dry seed beds; in the field, seeds may be spread or mechanically drilled. "The main source of additional rice production in recent years is improved yield growth. However, the government must implement a strategy to reduce population growth since the actual volume produced by the country is not enough to match the rice demand because of the high increase in population. If population growth is higher than the growth in yield, the country will continue to import rice from other countries to meet domestic demand for rice in the upcoming years." (Mamangon, 2016).

Rice Disease

When an epidemic of a rice disease arises, experts on the illness from various agricultural research institutions, or agricultural authorities chosen by the government, visit the location and offer guidance to the farmers. Compared to the number of farmers, however, there is sometimes insufficient expertise in rice disease, so in rural regions there is a tremendous demand for automated rice disease detection utilizing readily available tools (Rahman, Arko, Ali, & Khan, 2018; Dury, Bendjebbar, Hainzelin, Giordano, & Bricas, 2019). According to research conducted in India, agriculture is a key source of income and a means of subsistence, and rice is farmed as a staple meal throughout India's major areas. Diseases have been shown to have a significant negative impact on rice harvests, leading to substantial losses for the agricultural industry. Plant pathologists are looking for a precise and trustworthy means of diagnosing the illnesses affecting rice plants. The categorization of agricultural diseases is one area of crop remote sensing where machine learning has been utilized successfully, and deep learning is now a popular area of study for identifying agricultural diseases (Upadgyag, Kumar, & Kumar, 2021).

The researchers' expertise in this area was crucial for the study's purpose, which was to identify rice illnesses. According to the literature that has been collected, these illnesses impact the entire world, not just a portion of Asia, and make farming difficult. The researchers will compare two algorithms for detecting a specific list of illnesses that are currently accessible.

Machine Learning

Classification, regression, and online learning are a few of the forms of machine learning that can be utilized. The dataset can consist of text, numbers, or images. The fundamental element of machine learning is that the algorithm behind it primarily processes the input and the projected output. According to technical definitions, this process mostly comprises interconnected routes, also known as neural networks; these networks act as the artificial consciousness (Tubiera & Ricohermoso, 2021). The main goal of machine learning in the creation of the CNN and SVM models is to provide the data required by the models that will be implemented in the system, and this research leaned heavily in that direction. Machine learning has been employed to train these models in order to operate the system and pursue its goal. In this study, both models were trained using supervised learning.

Supervised Learning
Supervised learning, also known as supervised machine learning, is an area of machine learning and artificial intelligence. It is distinguished by the way it trains computers to accurately classify data or predict outcomes using labeled datasets (Education, 2020). The researchers used supervised learning in both models to calibrate the information that the machine needs in order to classify the photos.

Image Acquisition

The most crucial step in dealing with photos is to capture them before

evaluating them. Image acquisition is the term used for this. A proper camera

allows for image acquisition. ( Mishra, Kumar, & Shukla, 2017). It is the method

through which we produce a digital image of a situation. The components of this

representation are referred to as pixels and are known as an image (picture

elements). An image sensor is the technical term for the electrical equipment that

records a scene. The most widely utilized image sensor technologies are the charge-coupled device (CCD) and the complementary metal oxide semiconductor (CMOS) (Perez-Sanz, Navarro, & Marcos, 2017). Image acquisition includes

these steps: (1) determining the position to be monitored; (2) arranging artificial

targets or using natural targets on the measurement points; (3) selecting an

appropriate camera and lens; (4) assembling the camera lens and setting it firmly

on a relatively stationary object; (5) aiming at the target and acquiring images.

(Zhuang, 2022). Certain acquisition tools are needed to capture photos.

Transforms between multiple geometric models and spatial coordinate systems

are necessary for imaging with a camera. The scene's characteristics, particularly

the type of light radiation, are reflected in the image. Image acquisition requires

the use of optics and image sensors. The former focuses a portion of the scene on

the sensor, while the latter converts the electromagnetic radiation energy into an

electrical signal that can be processed, displayed, and interpreted as an image.

Taking a sample of pixel values from a physical picture that is focused on the image

plane is known as image sampling. This sample may be multi-spectral, color, or

monochrome. In a rectangular array, the sampling frequently produces results that

are evenly spaced. The use of space-invariant sensors is one of the additional

sampling strategies and approaches. (Zhang, 2021) Data from the samples will be

collected using this method. The two algorithms for identifying illness will analyze

and test images of rice that have been obtained. Researchers will employ image

capture in order to offer the results and compare them based on the data obtained

from the study.

Feature Extraction

One of the foundational elements of computer vision-based object recognition and classification is feature extraction. A feature is data that is utilized to solve a particular computer vision issue and is separate from the raw picture. The so-called "feature vectors" contain the features that have been retrieved from a picture (Perez-Sanz, Navarro, & Marcos, 2017). A feature is a significant and distinctive aspect of a picture. Using a proper algorithm, feature extraction is the

process of extracting certain pertinent features from a picture. The classification

speed, accuracy, and ability to train a classifier with a huge quantity of data are all

improved by an accurate feature selection strategy. The most pertinent information

from the original data is extracted during feature extraction, and that information is

then represented in a reduced dimensional space. Since a classifier won't be able

to detect pictures from badly picked features, feature selection is a crucial post-

processing step. (Hasan, Mahbub, Alom, & Nasim, 2019)

Feature Extraction in CNN and SVM

In order to bridge the gap between image classification and object

recognition, one method for object detection employed convolutional networks to

suggest areas that were then coupled with CNNs. A further drawback was that the

method could only handle a small volume of training data, although that is more of

a learning issue than a feature extraction one. First, a region proposal was

constructed from the input picture. Each area proposal was then categorized using category-specific linear SVMs after a CNN extracted a fixed-length feature vector from each proposal. From each area suggestion, a fixed-size CNN input is calculated using anisotropic picture scaling. The method made use of AlexNet, and the initial layers retrieved information about edges and colors, which were then aggregated in subsequent levels. Figure 3-A displays the result of the last convolutional layer together with the activation values, where the highest activations reveal units identifying red blobs, dog faces, and human faces. Figure 3-B displays the final semantic segmentation, which demonstrates the potency of these novel strategies.

Figure 3. Feature Extraction in CNN and SVM

The work of pre-processing images plays a vital role in the categorization process. The major goal of this part is to increase the dataset's image quality and

quantity. The quality, diversity, and quantity of the picture samples have a

significant impact on how well CNN extracts features. Therefore, the goal of the

pre-processing assignment is to increase the number of picture samples while

maintaining the quality and resolution of the original images. In order to increase

the number of photos, we manually trim huge photographs that include many

samples. (Hasan, Mahbub, Alom, & Nasim, 2019)

GLCM (Gray Level Co-Occurrence Matrix)

The gray-level co-occurrence matrix (GLCM), also referred to as the gray-level spatial dependence matrix, is a statistical technique for analyzing texture that takes into account the spatial relationship of pixels. The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image, creating a GLCM, and then extracting statistical measures from this matrix (MathWorks Inc., 2022). In a study on CBIR for the classification of cow types using GLCM and color feature extraction, the GLCM was used to compute contrast, energy, correlation, homogeneity, and entropy, and a confusion matrix was used to assess the proposed CBIR's accuracy. Based on the measurement results, accuracy was 95%, while precision and recall were 100% (T. Sutojo, et al., 2017).

In this study, the GLCM is employed to extract the leaf characteristics

related to diseases. The input data will be subjected to the image classification

process along with other processes. Given the aforementioned information, the

study can confidently use this feature extraction and rely on this information as its

foundation.
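To illustrate how GLCM texture descriptors such as contrast, energy, correlation, and homogeneity can be computed in practice, the following minimal sketch uses recent versions of scikit-image; the file name, offset, and orientations are illustrative assumptions rather than the study's actual settings.

```python
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.feature import graycomatrix, graycoprops

# Load a leaf image (hypothetical path) and quantize it to 8-bit gray levels.
image = imread("leaf_sample.jpg")
gray = (rgb2gray(image) * 255).astype(np.uint8)

# Co-occurrence matrix for a 1-pixel offset at four orientations.
glcm = graycomatrix(gray, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# Texture descriptors averaged over the four orientations.
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "energy", "correlation", "homogeneity")}
print(features)
```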

Support Vector Machine (SVM)

SVM is a supervised machine learning technique used for regression and classification. Each piece of data in SVM is represented as a point in an n-dimensional space, where n is the number of features being categorized. The classification is produced by finding the hyperplane that distinguishes between distinct groupings of the dispersed data points (Hossain, Mou, Hasan, & Chakraborty, 2018). Due to their relative simplicity and lower danger of overfitting, SVMs are frequently used with multi-voxel pattern analysis (MVPA) for processing high-dimensional imaging data (Pisner & Schnyer, 2020).



Research conducted by Hossain, Mou, Hasan, and Chakraborty (2018) on the recognition and detection of tea leaf disease using a Support Vector Machine shows that the algorithm can classify the disease with 93% accuracy, higher than other classifiers. The algorithm also helps to reduce and extract the required features, which shortens the required processing time. As shown in their simulation results, the proposed algorithm can classify a leaf 300 ms faster than previous research using SVM. This testifies to the effectiveness of SVM in detecting diseases, highlighting its fast processing despite a large amount of data. In this study, the SVM will be used as one of the approaches to be compared in detecting rice diseases, with the aim of identifying the best approach.
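As a hedged illustration of how an SVM classifier could be trained on extracted feature vectors, the following scikit-learn sketch uses placeholder feature arrays and labels rather than the study's actual data; the kernel and parameter choices are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder feature vectors (e.g. texture and color statistics per leaf) and labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))        # 300 leaves, 8 features each
y = rng.integers(0, 6, size=300)     # 6 leaf conditions, encoded 0-5

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```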

Convolutional Neural Network

Convolutional Neural Network (CNN) performance has been quite good on numerous problems involving computer vision and machine learning. CNN is helpful in many contexts, but it excels at tasks involving images. Application areas for CNN include object identification in pictures, semantic segmentation of images, and image classification (Wu, 2017). The concept of transfer learning is to transfer knowledge gained in one or more initial tasks and use it to improve learning in the current task. As a result, it has become possible to retrain an artificial neural network trained on one sample of data to perform tasks on a new data set, which significantly speeds up the learning process of the network (Khotsyanovsky, 2022). Convolutional neural network techniques are enhanced and optimized deep learning techniques. Two parts make up the classifier: (1) the convolutional part, which is made up of a series of convolutional layers, and (2) the fully connected layers. Basic characteristics like corners, edges, and lines are extracted by the lower-level convolutional layers. The intermediate level identifies individual components of things, such as the eyes, nose, and other features on faces, while the most advanced stages identify the entire object. Building deep learning models nowadays often involves using a transfer learning strategy, and many well-known models for image classification problems have been developed using it. In transfer learning, the pre-trained model is chosen depending on the dataset and its size. When the dataset is tiny, it is best practice to retrain only the classifier, using all of the pre-trained model's weights as feature extractors (Masud, 2021).
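The transfer-learning idea described above can be sketched with Keras as follows: a pre-trained base network is frozen and used as a feature extractor, and only a new classification head is trained. The choice of MobileNetV2, the input size, and the six-class head are illustrative assumptions, not the study's final architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # five diseases plus healthy

# Pre-trained base used as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False

# New classification head trained on the rice-leaf dataset.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```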

Convolutional Neural Networks (CNN) have been shown to be effective in the field of agriculture, particularly for assessing plant visual symptoms. There is a choice between (1) creating smaller models for specific crops or (2) creating a single multi-crop model, a much more difficult task (especially at early disease stages) but one with the benefit of the variability of the entire multi-crop image dataset to enrich image feature learning as these models grow in both the number of training images and the number of supported crops and diseases (Picon, et al., 2019).

Some research has proved the effectiveness of CNN in detecting plant diseases in many fields of agriculture. According to Gutiérrez, et al. (2020) in their work "Comparison of Convolutional Neural Network Architectures for Classification of Tomato Plant Diseases," every model used in the work was capable of distinguishing nine diseases in tomato leaves from the healthy class, and the GoogLeNet model with 22 layers achieved 99.72% classification of tomato diseases using transfer learning, which is highly statistically significant and demonstrates the effectiveness of classifying crop diseases with the combination of CNN and fine-tuning.

To sum up, the researchers utilize all the relevant literary pieces and documents in this chapter as a starting point for a closer examination of the terms and concepts that will be employed in the study. The information acquired supports the researchers' claims and positions in developing this project, concentrating on the core ideas of the two methods: the Support Vector Machine, which employs feature extraction to identify the illnesses affecting rice, and the Convolutional Neural Network, which emphasizes picture quality and improves objective results. This chapter acts as the supporting foundation of the study, whose main goal is to compare these two methodologies.



CHAPTER III

Research Methodology

Research Design

The research is divided into two phases. First, a comparison of the Support Vector Machine and Convolutional Neural Network pattern recognition models is performed. This includes training and testing the models to establish their accuracy and to determine which of them is better at recognizing abnormalities and illnesses on rice leaves from photos processed by the model. A prototype is then constructed to implement whichever of the two performs better, and the usability of the model is evaluated by the respondents. The assessment findings are interpreted and examined to discover whether the demographic features of the respondents can influence the model's usability.

Since the aim of this study is to compare two algorithms, the researchers used comparative research, which includes both qualitative and quantitative analysis of the study. Comparative research essentially compares two groups in an attempt to draw a conclusion about them. Researchers attempt to identify and analyze similarities and differences between groups; the purpose of these studies is to determine either similarities or differences between groups (Richardson, 2018).

Respondents of the Study

Participants

The participants of the study were rice farmers, rice researchers, IT professionals, and students in agriculture-related fields. The participants were selected using convenience, stratified, and quota sampling. The interviews will be conducted in San Isidro, Tiaong, Quezon for data gathering.

Instrument

The researchers will use the ISO 25010 software quality standard for the IT expert evaluation to ensure the credibility of the model. The researchers also prepared an evaluation questionnaire for the agricultural experts as well as for the farmers and rice researchers. With this, the data in the study will provide a credible result.

Materials and Equipment

The researchers will utilize the materials and equipment needed to complete the prototype. The hardware component for developing the model is a computer (desktop or laptop). On the software side, an Integrated Development Environment (IDE) is used for the creation of the model, with Python as the primary programming language, combined with web development tools such as HTML, CSS, Flask, and JavaScript to fully integrate the prototype into the website.



Research Instrument

This research uses SUMI (Software Usability Measurement Inventory), and the system is evaluated through the ISO/IEC 9126-1 standard in terms of software quality and through ISO 25010, as the instruments for gathering data. This assures that the data are credible and validated.

SUMI (Software Usability Measurement Inventory)

SUMI is a thoroughly tested and proven method of evaluating the quality of software from the viewpoint of the end user. Originally developed in the 1990s, SUMI has always measured user experience under the name "user satisfaction." This approach was used by the researchers to gauge how well the system was working and how satisfied users were with it. To scale the ratings of this questionnaire, Table 1 serves as the basis of interpretation. SUMI measures different aspects of user satisfaction: Efficiency, Affect, Helpfulness, Control, and Learnability.

Efficiency refers to whether the user believes the software makes it

possible to complete the task(s) quickly, effectively, and affordably or, at the other

extreme, whether they believe the software is impeding performance.

Affect is a psychological term for an emotional state. In this context, it refers

to whether using the software has left the user feeling mentally stimulated and

pleasant or whether they have experienced the opposite.



Helpfulness: This is the user's perception of how helpfully the software

communicates and aids in resolving operational issues.

Control refers to the user's perception that he or she, not the product, is in charge of determining the pace of the interaction.

Learnability is the simplicity with which a user can begin using and

discovering new features of the product.

Table 1.
SUMI (Software Usability Measurement Inventory) Rating Scale

SCALE    INTERPRETATION

1        Agree
2        Undecided
3        Disagree

ISO 25010

An evaluation system for product quality is built around the quality model. When assessing a software product's properties, the quality model specifies which quality traits will be taken into account. To assess the quality of the software created for this project, a project demonstration and a final evaluation are used. The ISO 25010 software quality model is used in this study; in it, software quality is divided into two broad dimensions: (1) product quality and (2) quality in use. In this study, the quality will be evaluated by:



Functional suitability:

1. Functional Completeness: degree to which the set of functions

covers all the specified tasks and user objectives.

2. Functional Correctness: degree to which the functions provides the

correct results with the needed degree of precision.

3. Functional Appropriateness: degree to which the functions

facilitate the accomplishment of specified tasks and objectives.

Performance efficiency

1. Time-behavior: degree to which the response and processing times

and throughput rates of a product or system, when performing its

functions, meet requirements.

2. Resource utilization: degree to which the amounts and types of

resources used by a product or system, when performing its

functions, meet requirements.

3. Capacity: degree to which the maximum limits of a product or system parameter meet requirements.

Usability:

1. Appropriateness recognisability: degree to which users can

recognize whether a product or system is appropriate for their needs.



2. Learnability: degree to which a product or system enables the user to learn how to use it with effectiveness, efficiency, freedom from risk, and satisfaction.

3. Operability: degree to which a product or system is easy to operate,

control and appropriate to use.

4. User error protection: degree to which a product or system protects

users against making errors.

5. User interface aesthetics: degree to which a user interface enables

pleasing and satisfying interaction for the user.

6. Accessibility: degree to which a product or system can be used by

people with the widest range of characteristics and capabilities to

achieve a specified goal in a specified context of use.

Reliability

1. Maturity: degree to which a system, product or component meets

needs for reliability under normal operation.

2. Availability: degree to which a product or system is operational and

accessible when required for use.

3. Fault tolerance: degree to which a system, product or component

operates as intended despite the presence of hardware or software

faults.

4. Recoverability: degree to which, in the event of an interruption or a

failure, a product or system can recover the data directly affected and

re-establish the desired state of the system.



Security

1. Confidentiality: degree to which the prototype ensures that data are

accessible only to those authorized to have access.

2. Integrity: degree to which a system, product or component prevents

unauthorized access to, or modification of, computer programs or

data.

3. Non-repudiation: degree to which actions or events can be proven to

have taken place, so that the events or actions cannot be repudiated

later.

4. Accountability: degree to which the actions of an entity can be traced

uniquely to the entity.

5. Authenticity: degree to which the identity of a subject or resource can be proved to be the one claimed.

Table 2.
ISO 25010 Based Evaluation Questionnaire
1 2 3 4 5
A. Functional Suitability
Functional Completeness. The system covers all the
specified tasks and user objectives.
Functional Correctness. The system provides the correct results
with the needed degree of precision.
Functional Appropriateness. The system facilitates the
accomplishment of specified tasks and objectives.

B. Performance Efficiency
Time Behavior. The system’s response and processing times
and throughput rates when performing its functions, meet
requirements.
Resource Utilization. The system’s amounts and types of
resources used when performing its functions, meet requirements.
Capacity. The system’s maximum limits of parameter meet
requirements.

C. Compatibility
Co-existence. The system can perform its required functions
efficiently while sharing a common environment and
resources with other products, without detrimental impact on
any other product.
Interoperability. The system can exchange information and use
the information that has been exchanged.

D. Usability
Appropriateness Recognizability. The system allows users to
recognize if it is appropriate for their needs.
Learnability. The system can be used by specified users to
achieve specified goals of learning to use the application with
effectiveness, efficiency, freedom from risk and satisfaction in a
specified context of use.
Operability. The system has attributes that make it easy to
operate and control.
User Error Protection. The system protects users against making
errors.
User Interaction Aesthetics. The system’s user interface enables
pleasing and satisfying interaction for the user.
Accessibility. The system can be used by people with the widest
range of characteristics and capabilities to achieve a specified
goal in a specified context of use.

E. Reliability
Maturity. The system meets the needs for reliability under normal
operation
Availability. The system is operational and accessible when
required for use.
Fault Tolerance. The system operates as intended despite the
presence of hardware or software faults.
Recoverability. The system can recover the data directly affected
and re-establish the desired state.

F. Security
Confidentiality. The system ensures that data are accessible only
to those authorized to have access.
Integrity. The system prevents unauthorized access to, or
modification of, computer programs or data.
Non-repudiation. Actions or events in the system can be proven to have taken place, so that they cannot be repudiated later.

The following procedures are employed for the demonstration and proper evaluation:

1. Present and demonstrate the project, its capabilities, and its applications so that the evaluators can investigate it. Five (5) IT experts and one (1) agricultural expert will be asked to evaluate the system to determine whether the application created is suitable for its mission and operational capabilities.

2. Distribute the ISO 25010 assessment tools, using the rating scale shown in Table 3, to the five (5) IT experts.

3. Collect the instruments and then tabulate the data.

4. Determine the mean of each criterion, as well as the overall mean.

5. Interpret the findings of the evaluation of the developed software's quality using the equivalent numerical scale shown in Table 4.

Table 3.
Rating scale used to evaluate the developed system
RATING SCALE    INTERPRETATION
1 Strongly Agree
2 Strongly Agree
3 Agree
4 Disagree
5 Strongly Disagree

Table 3 shows the numerical rating used in the ISO 25010 evaluation tool together with its equivalent interpretation of the evaluation result.

Table 4.
Numerical Scale of ISO25010
SCALE          INTERPRETATION
4.51 – 5.00    Excellent
3.51 – 4.50    Very Good
2.51 – 3.50    Good
1.51 – 2.50    Fair
1.00 – 1.50    Poor

Table 4 shows the numerical scale used to interpret the result of the conducted evaluation. The developers utilized the Likert scale formula to determine the evaluation results: 1.00 – 1.50 is interpreted as "Poor", 1.51 – 2.50 as "Fair", 2.51 – 3.50 as "Good", 3.51 – 4.50 as "Very Good", and 4.51 – 5.00 as "Excellent".

Research Procedure

Data collection began with a proposal letter requesting interviews with the target participants, followed by the interviews themselves. The initial interview was conducted to establish the purpose of the study, and the final interview will be used as the basis of the results. Interviews with the field owners, farmers, researchers, and IT experts will be conducted.

Development Methodology

Figure 4. Iterative Waterfall Model


https://www.techtarget.com/searchsoftwarequality/definition/waterfall-model
Figure 4 depicts an iterative waterfall model, which consists of five phases. All phases are cascaded to one another, and progress is represented as flowing smoothly downhill (like a waterfall) through the stages. It differs slightly from the basic model in that it gives feedback from each phase to its preceding phases, minimizing the time and effort required to fix problems. When faults are found later in the process, these feedback pathways allow programmers to remedy errors made earlier in the process. The feedback channels allow the phase in which errors were committed to be modified, and these modifications are reflected in subsequent phases.

Phase 1: System Requirements and Analysis

The researchers gathered relevant information for the study, such as the

technologies to be used and the dataset for training object detection algorithms.

Data were acquired through reading related literature and studies on rice, rice

diseases, machine learning, pattern recognition, and photographs used in identifying rice diseases, to advance the researchers' understanding.

Table No. 5
Data Set for Model Training and Testing

Labels                   Training Set           Test Set               Total
                         Number of Instances    Number of Instances    Number of Instances
Bacterial Leaf Blight            1934                   88                    2022
Leaf Blast                       1803                   91                    1894
Brown Spot                       1973                   93                    2066
Leaf Scald                        358                   90                     448
Narrow Brown Spot                 352                   88                     440
Healthy                           371                   93                     464
TOTAL                            6791                  543                    7334

To train the pattern recognition models, a dataset of rice disease images must be used as a training set so that the models can spot anomalies in the rice plant. This information was obtained from Kaggle (2021). The data set includes seven thousand three hundred thirty-four (7,334) images of six rice leaf conditions: two thousand twenty-two (2,022) Bacterial Leaf Blight, one thousand eight hundred ninety-four (1,894) Leaf Blast, two thousand sixty-six (2,066) Brown Spot, four hundred forty-eight (448) Leaf Scald, four hundred forty (440) Narrow Brown Spot, and four hundred sixty-four (464) Healthy. The photographs were chosen with the visual context and the various types, colors, damage types, and patterns in mind. The training set consists of 1934 Bacterial Leaf Blight, 1973 Brown Spot, 1803 Leaf Blast, 358 Leaf Scald, 352 Narrow Brown Spot, and 371 Healthy images. The test set contains 88 Bacterial Leaf Blight, 91 Leaf Blast, 93 Brown Spot, 90 Leaf Scald, 88 Narrow Brown Spot, and 93 Healthy images. Transfer learning is the technique used to train the Convolutional Neural Network (CNN) models, using the chosen library and programming language to train each model against the dataset. The models are trained within eight (8) hours (6 hours for modeling and 2 hours for fine-tuning) to guarantee that each model performs well in detecting the aforementioned conditions.
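As an illustration of how such an image dataset could be organized and loaded for training and testing, the following sketch assumes one folder per class under hypothetical dataset/train and dataset/test directories.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH = 32

# Assumed layout: dataset/train/<class_name>/*.jpg and dataset/test/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=IMG_SIZE, batch_size=BATCH, shuffle=True, seed=123)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/test", image_size=IMG_SIZE, batch_size=BATCH, shuffle=False)

print("classes:", train_ds.class_names)  # e.g. the six leaf conditions
```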

Phase 2: Program Design

The data collected and processed in the requirements-gathering phase could be used to create design strategies that provide more robust design choices. The researchers would use Bootstrap for the overall design and style of the user interface, such as the navigation menu, buttons, and alerts.

In this phase, the prototype is created as a web-based application employing HTML, JavaScript, and CSS for the front end of the website, again with Bootstrap for the general look and feel of the user interface. For the back end of the website, the researchers use Flask to integrate Python with the website.
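A minimal sketch of a Flask back end that accepts an uploaded leaf photo and returns a prediction is shown below; the template name, saved model file, input size, and class list are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf
from flask import Flask, request, render_template
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("rice_model.h5")  # hypothetical trained model file
CLASSES = ["Bacterial Leaf Blight", "Leaf Blast", "Brown Spot",
           "Leaf Scald", "Narrow Brown Spot", "Healthy"]

@app.route("/", methods=["GET"])
def index():
    return render_template("index.html")  # hypothetical template

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded image, resize it to the model's input size, and classify it.
    file = request.files["image"]
    img = Image.open(file.stream).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.array(img, dtype=np.float32), axis=0)
    probs = model.predict(batch)[0]
    label = CLASSES[int(np.argmax(probs))]
    return render_template("index.html", prediction=label)

if __name__ == "__main__":
    app.run(debug=True)
```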

A prototype is an important tool in the development of a product. It gives the developers an opportunity to show the product's possible outcome. Making a prototype for this study provides the opportunity to make design adjustments, solve layout issues, and improve the product's aesthetic appeal. The prototype serves as a rough draft of the product; after it is made, the product can be improved in terms of both function and design. The user interface created for the product in this study is represented by the photos that are provided. This is the display that appears to make operating the system easier for the user. The user can access and operate the system with ease because of its straightforward interface, which also presents the right information about the system.

Figure 5. Website View Head



Figure 5 presents the first part of the website, which includes the title of the web page, a brief description of the website, and a "Get Started" button to proceed to the next step. The system's name, function, and any associated tabs are all made clear to the user in this area, making the system more user-friendly. It also contains a statement indicating that the website is intended exclusively for research purposes.

Figure 6. Rice Condition Classifier

Figure 6 presents a brief description of the five common diseases of rice. This information covers each disease's causes, symptoms, and how to recognize it. The user can view the details next to the photographs and their names. This section lets the user identify the ailments and learn more about them; an informed user would better understand the system's role in detecting these disorders.



Figure 7. Disease Detector Section

This section showcases the upload function and a submit button. The upload function enables the users to upload a photo of a rice plant with a rice disease. The submit button submits the photo to the website so that the machine can predict the disease and display its predictions.



Figure 8. Use Case Diagram

This figure shows the use case diagram, which serves as a visual representation of the communication between the user and the system. The user uploads an image to the system, which is then scanned for the system's image classification. After the user uploads the image, clicking the submit button allows the image to go through the process of determining the state of the rice leaf, and the image is also displayed in that area.

Figure 9. System Flow Chart

This figure shows the flow chart of the system, which presents how the system determines the disease based on the user's input. The system starts when an input is recognized, which then undergoes the system's image handling. After the system is done with image handling, it proceeds to image classification, which identifies the disease or the status of the rice in the image. Following classification, the image is used to determine whether the leaf is healthy or afflicted with a disease. When the leaf state is healthy, the program stops; otherwise, the system displays the prediction and disease information.

The images would then undergo classification, which categorizes and labels the pixels and vectors within an image. In SVM, the pictures would first undergo background removal, which includes converting them into HSV format and then enhancing their contrast through histogram equalization. The image would then be segmented through thresholding, which creates a binary image by setting a threshold value on the pixel intensities. Based on these values, morphological transformations would be applied to determine the shape of the object. The first morphological transformation is erosion, which erodes the boundaries of the foreground object; it is followed by dilation, which increases the boundaries of the object. The first process removes noise, and the second helps in joining broken parts of the object. To segment the image, K-means clustering is used; its role is to group similar data points and discover underlying patterns. It identifies k centroids (k refers to the number of centroids needed in the dataset, a centroid being the imaginary or real location representing the center of a cluster) and then allocates every data point to the nearest cluster while keeping the clusters as compact as possible. Feature extraction constructs combinations of the variables to get around these problems while still describing the data with sufficient accuracy. It starts with masking, a method of indicating which elements of a matrix or vector should and should not be used, followed by contouring, the line joining all the points along the boundary of an image that have the same intensity, which is used to segment the affected part of the image. These two processes are essential in finding the mean and standard deviation of the pixels as well as the height and width of the segmented image. Lastly, texture is extracted from the image through the GLCM, a matrix defined over an image as the distribution of co-occurring pixel values at a given offset.
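The following is a minimal sketch, using OpenCV, of the background removal, thresholding, morphological cleaning, K-means clustering, and contouring steps described above; the file name, kernel size, Otsu thresholding, and k = 3 are illustrative assumptions rather than the study's exact settings.

```python
import cv2
import numpy as np

# Read the leaf image (hypothetical file) and convert it to HSV.
bgr = cv2.imread("leaf_sample.jpg")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# Enhance contrast with histogram equalization on the value channel.
hsv[:, :, 2] = cv2.equalizeHist(hsv[:, :, 2])

# Threshold to a binary mask (Otsu picks the threshold from the pixel intensities).
gray = cv2.cvtColor(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR), cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological erosion (removes noise) followed by dilation (rejoins broken parts).
kernel = np.ones((3, 3), np.uint8)
mask = cv2.dilate(cv2.erode(mask, kernel, iterations=1), kernel, iterations=1)

# K-means clustering on the masked pixels to group similar colours (k = 3 clusters).
pixels = hsv[mask > 0].reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Contour of the leaf region, then simple shape statistics of the segmented area.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
print("segment size:", w, "x", h, "mean HSV:", pixels.mean(axis=0))
```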

The CNN, on the other hand, would only require data augmentation, which consists of flipping, zooming, shearing, rotating, and cropping the images.
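A minimal sketch of this kind of augmentation with Keras' ImageDataGenerator follows; the specific ranges and the directory layout are illustrative assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Flipping, zooming, shearing, rotating, and shifting (cropping-like) transformations.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2,
    shear_range=0.2,
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rescale=1.0 / 255,
)

# Assumed layout: dataset/train/<class_name>/*.jpg
train_gen = augmenter.flow_from_directory(
    "dataset/train", target_size=(224, 224), batch_size=32, class_mode="sparse")
```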

Figure 10. Context Diagram

The system only interacts with one user at a time. The system's context diagram is shown in this figure. The system processes the information: users upload images, which the system compares against its dataset to identify their status. The system runs them through an algorithm and provides the necessary information. In addition to the findings of any tests it has run, the system will provide recommendations to the user.



Phase 3: Coding and Testing

The developers will use an Integrated Development Environment (IDE), Visual Studio Code developed by Microsoft, to manage the project files and structure. It is integrated with the programming language Python 3.8.10 as the main language of the project, and the tool is used for the entire development process of the website application. Moreover, the project requires a computer or laptop to access the website. The project will be developed using a laptop with the following system specifications: Intel i7-7700HQ @ 2.80 GHz CPU, 12 GB RAM, NVIDIA GeForce GTX 1050 Ti (4 GB) GPU, and the Windows 11 operating system.

The software application's user interface will be developed for web browsers utilizing tools like Bootstrap, HTML, CSS, JavaScript, and Flask. Through Bootstrap, the website is compatible with desktops, laptops, tablets, and smartphones; Bootstrap is in charge of ensuring that the website's software application is responsive and suitable for display on devices with various screen sizes. The basis of the user interface is made up of HTML, CSS, and JavaScript, which allow users to interact with the system's design theme. Flask connects the web page to the capabilities of the Python programming language. In addition, Python libraries are used as dependencies to train the model and establish its learning; TensorFlow and Keras are the dependencies used in this system.

Phase 4: Integration and System Testing

In testing the model for the prototype, two sets of data were used: one for training the model and one for evaluation. This ensures that the results can be compared with one another and helps identify whether the model is trained well enough to deploy on the website.

Table 6.
Evaluation Set for Trained Models

Labels                   Number of Instances
Bacterial Leaf Blight            88
Leaf Blast                       91
Brown Spot                       93
Leaf Scald                       90
Narrow Brown Spot                88
Healthy                          93
TOTAL                           543

To assess the image detection model's performance and establish its precision and accuracy, the system needs a dataset for evaluation with which to test the trained models. It includes images for each leaf condition: Bacterial Leaf Blight, Leaf Blast, Brown Spot, Leaf Scald, Narrow Brown Spot, and Healthy, consisting of a total of five hundred forty-three (543) instances: eighty-eight (88), ninety-one (91), ninety-three (93), ninety (90), eighty-eight (88), and ninety-three (93) instances per condition, respectively.

The percentage of accurately anticipated or identified features is known as accuracy. The model must have an accuracy of at least 70% in order to be considered acceptable. On the other hand, the model's precision is determined by how many of the predictions it classifies as positive are actually true. In the system, the researchers used the default accuracy and precision metrics from the scikit-learn library to test the system.
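As an illustration, accuracy, precision, recall, and F-score over the evaluation set could be computed with scikit-learn as in the following sketch; the label arrays shown are placeholders, not actual evaluation results.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder ground-truth labels and model predictions for the evaluation set.
y_true = [0, 1, 2, 2, 3, 4, 5, 5, 1, 0]
y_pred = [0, 1, 2, 3, 3, 4, 5, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f-score={f_score:.2f}")
```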

Phase 5: Deployment

The trained model that outperforms the other in the study will be deployed on the website. The test data for evaluation will be used on the website to evaluate the model's function. The system will be deployed as a website, using Flask as the main framework, and tested on various devices to determine the system's capability. This will serve as the model's prototype and will be used to collect research data.

In this period, the developers will evaluate and investigate problems found in the prototype application during revision and develop new solutions. As a result, the prototype's design and functionality may change and improve. Additionally, the functionalities and performance of the prototype will be retested and examined after the upgrade to see whether additional improvements are needed.

Statistical Treatment of Data


The formula for computing the percentage is given as:

P = (f / N) × 100

Where:
P = computed percentage
f = frequency of response
N = total number of respondents

The formula for computing the Average Weighted Mean is given as:

AWM = [(5 × f5) + (4 × f4) + (3 × f3) + (2 × f2) + (1 × f1)] / N

Where:
AWM = average weighted mean
f1 … f5 = frequency of responses for each rating (1 to 5)
N = total number of respondents

The percentage is computed by dividing the frequency of a response by the total number of respondents and multiplying the result by 100. The average weighted mean is computed by multiplying each rating by the number of respondents who gave that rating, summing the products, and dividing the sum by the total number of respondents.
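As a worked illustration of the two formulas, the short Python snippet below computes the percentages and the average weighted mean from a hypothetical frequency distribution of responses from 20 respondents.

```python
# Hypothetical frequency of responses per rating (1 to 5) from 20 respondents.
frequencies = {5: 8, 4: 6, 3: 3, 2: 2, 1: 1}
N = sum(frequencies.values())  # total number of respondents = 20

# Percentage of respondents per rating: P = (f / N) * 100
percentages = {rating: f / N * 100 for rating, f in frequencies.items()}

# Average weighted mean: AWM = sum(rating * frequency) / N
awm = sum(rating * f for rating, f in frequencies.items()) / N

print(percentages)    # e.g. {5: 40.0, 4: 30.0, ...}
print(round(awm, 2))  # 3.9, interpreted as "Very Good" on the Table 4 scale
```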
