
Computers and Electronics in Agriculture

TomFusioNet: A bi-stage Tomato disease recognition and illness classification framework for mobile applications using the late fusion of cross-domain transfer learning strategy
--Manuscript Draft--

Manuscript Number: COMPAG-D-22-00440

Article Type: Research Paper

Keywords: Deep Learning; Feature Fusion; Image Processing; Late Fusion; Precision Agriculture


Highlights:

• A bi-stage Tomato disease recognition and illness classification framework is proposed.
• A mobile application, "TomFusioNet", is developed for tomato disease analysis using leaf images.
• A late fusion of cross-domain transfer learning models is proposed to achieve better results.
• The implemented application framework does not require any human intervention.

TomFusioNet: A bi-stage Tomato disease recognition and illness classification framework for mobile applications using the late fusion of cross-domain transfer learning strategy

Harshit Kaushik1, Anvi Khanna2, Dilbag Singh3, Manjit Kaur3 and Heung-No Lee3
1 School of Computing and Information Technology, Manipal University Jaipur, Jaipur, Rajasthan 303007, India
2 Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
3 School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
Corresponding author: Heung-No Lee (e-mail: heungno@gist.ac.kr).
Abstract
Tomato crop disease is a critical concern for farmers that needs to be addressed to mitigate production loss. Recently, several multifarious technologies have been proposed for the rapid recognition of tomato crop diseases, but a wide research gap remains in the practical deployment of these systems. Additionally, the existing models suffer from various problems such as overfitting and gradient vanishing. To overcome these problems, this paper proposes an end-to-end mobile application framework, TomFusioNet, for tomato disease analysis using leaf images. For feature extraction, a late-fusion strategy is leveraged by aggregating the results of multiple cross-domain transfer learning models using a non-linearly weighted strategy. For the late fusion implementation, multi-layer perceptron (MLP) models are used as separate meta-learners. TomFusioNet's pipeline comprises two modules, namely DeepRec and DeepPred. DeepRec provides preliminary disease detection results, while DeepPred further identifies the class of illness in case of a positive diagnosis by DeepRec. This paper additionally highlights the significance of feature relevancy; therefore, a dedicated Hue, Saturation, and Value (HSV) color model-based background removal algorithm is incorporated into TomFusioNet's pipeline as a pre-processing step. A smartphone app is also designed for remote crop monitoring. The proposed DeepRec and DeepPred late-fusion models achieve an average accuracy of 99.93% and 98.32%, respectively. Extensive comparative analyses reveal that the proposed framework outperforms state-of-the-art models in terms of accuracy, weighted precision, recall, and F-measure. The proposed app framework does not require any human intervention because it follows an end-to-end feature pipeline from pre-processing to the final result, thereby proving its robustness and efficacy. Furthermore, the latency of predictive results on unseen data is less than 2 seconds; hence, it can be effectively used by farmers for rapid crop analysis from the comfort of their location.
Keywords: Deep Learning, Feature Fusion, Image Processing, Late Fusion, Precision Agriculture.
1. Introduction
Agriculture plays an inevitable role in boosting the economic upturn of a country. According to [1], the agriculture sector of India contributes 35% to the national income and 14% to the Gross Domestic Product (GDP). In India, tomato is the third most important crop after potato and onion; however, its harvest index is lower than that of other nations [2]. Bacterial diseases are a serious restricting factor that threatens the healthy yield of tomatoes. These diseases affect the nutritional value of the crop, which can have adverse effects on human health. According to past reports, failure to timely gauge and treat crop diseases has a negative correlation with the economic growth of a country. Studies have shown that crop losses due to pests, diseases, and weeds range from approximately 10% to 30% of overall crop production [2]. In the context of tomatoes, multifarious pathogens are responsible for disease. For instance, an annual economic loss of 20%-70% is attributed to blight disease, which is caused by a fungal pathogen, Phytophthora infestans [3]. Tomatoes retain a prominent place in the human diet; therefore, detecting their crop diseases is crucial to safeguarding this nutritional role.

The traditional method for tomato disease identification involves manual assessment by a plant expert or a farmer, which is a strenuous task. This assessment methodology is also subject to drawbacks such as high misjudgment error and low efficiency [4]. With the accelerated developments in artificial intelligence, the internet of things (IoT), and cyber-physical systems (CPS), the agricultural sector has started adopting precision farming – an emerging farming management paradigm that develops a decision support mechanism for the focused assessment of infected crops and plants – for the early identification and anomaly detection of inter-field variability in crops [5]. Previous decision-making systems relied extensively on the extraction of handcrafted features that can be fed into a machine learning model for disease prediction. For instance, K-nearest neighbor (KNN) [6], support vector machine (SVM) [7], and Naïve Bayes [8] algorithms have been employed extensively in this arena of agricultural research around the globe. However, manual feature extraction is laborious, time-consuming, and demands an expert with domain knowledge, which comes at an additional cost of human resources. With the advent of technology in recent years, the scientific community has seen remarkable deep learning applications in multifarious fields such as object detection [9], computer-aided diagnosis [10], crowd control [11], and human-computer interaction [12]. The automatic feature extraction capability of deep models has fueled their extensive surge in the domain of machine vision [13].

In the last decade, deep learning has gained momentum in precision farming, whereby a model automatically extracts and learns signs for anomaly detection in plant images through a gradient optimization strategy [14]. With access to thousands of publicly available plant images, several deep learning methodologies and cross-domain transfer learning models have been proposed and have achieved state-of-the-art results in farm applications [15]-[18]. Many researchers have also presented their solution to this problem as a binary classification study [19]-[22]. These works demonstrate that deep learning is effective in gauging the disease-causing patterns in plants.

However, the existing works suffer from drawbacks such as low accuracy and a high false-positive rate resulting in erroneous predictions. The majority of the works did not present any end-to-end modular framework that can be deployed in a real-world setting for the utility of farmers. During image acquisition, the presence of unnecessary background clutter, irrelevant illuminations, and color aberrations degrades the performance of a computer vision system [23]. According to Wertheimer's gestalt theory [24], the human perceptual system tends to group tokens, and this theory has been leveraged to develop various image segmentation algorithms. In the context of plant images, unrelated background features may develop a strong correlation with the local or global features of the leaf image.

Fig. 1. An illustration of irrelevant background features in tomato leaves: (a) irrelevant blurred background features in the early blight stage, (b) similar background color of soil in the late blight stage, and (c) non-uniform spread of shadow in the late blight stage.
For instance, background artifacts such as leaf shadow, soil, and pebbles have a similar color scheme to the disease-causing candidate features in the foreground (Fig. (1)). This can negatively affect feature extraction for diagnostic purposes. According to Arsenovic et al. [17], the removal of background information significantly improved the performance of their computer-aided agricultural precision system (CAAPS). Therefore, in recent years, the proliferation of image processing and segmentation algorithms has resulted in a surge of state-of-the-art background removal studies for plant disease localization [25]-[27].

Although the existing works have achieved good results, they have not integrated their segmentation schemes into an end-to-end modular framework. Most of the existing disease recognition works have proposed either traditional multiclass disease classification models or binary diagnostic mechanisms. However, none of the previous works have presented a co-optimized end-to-end framework that conducts diagnosis followed by classification of the disease for its early identification while considering the computational requirements. The existing models are also less generalizable for validation. Furthermore, the practical usage of a single-facet binary diagnostic model is limited in agricultural research because, apart from the diagnosis, crop infestation treatment is highly dependent upon the severity and the type of disease the crop is infested with. For example, early blight disease in tomatoes is caused by Alternaria linariae, which can be cured using the mancozeb fungicide, whereas late blight in tomatoes is caused by Phytophthora infestans, which is treated using another disinfectant, chlorothalonil [28]. Hence, the dosage and type of fungicide vary with the type of crop disease. Additionally, a few studies have employed complex image segmentation algorithms such as U-Nets [29] for background removal and encoder-decoders [30] for lesion localization; however, the resources on edge computing devices are limited, and complex architectures must be avoided for improved latency on the deployed device. The existing models also suffer from low accuracy and precision rates when validated on large image databases, which affects the robustness of the developed system. Therefore, the development of a computationally inexpensive, high-precision framework remains a critical problem in agricultural research [31].

Fig. 2. Operating scenario of the proposed framework on a mobile application.
With the proliferation of information and communication technology (ICT) and accessibility to cost-effective smartphones, the utility of mobile devices has increased manifold for the agricultural community [32]. Current smartphones offer many comprehensive features such as high-definition cameras, faster internet connectivity (4G/5G), and heavy computational power, along with cloud storage access in many cases. Hence, various novel applications have been proposed in the past to promote precision farming activities [33]-[35]. In the current study, a modular mobile application framework is recommended for the end-to-end analysis of diseases in tomato crops. Farmers are encouraged to capture photos of tomato leaves at different timestamps and upload the images in the mobile application locally on the phone. The framework obtains the captured image and applies built-in operations to remove unnecessary background information for precise analysis. The trained AI model fetches the image, conducts analysis on the server side, and transfers the results back to the user's mobile phone with a complete analysis report of the diagnosis and illness stage classification. The proposed framework is not computationally expensive because the computation does not happen on the user's mobile phone; rather, a cloud server is utilized for the entire inference. Fig. (2) depicts the operating scenario of the proposed precision framework.
To address the limitations of previous works, the following contributions are envisaged in this study.
• First, this paper presents a computationally inexpensive bi-stage mobile application framework, TomFusioNet, for tomato disease detection and illness classification. TomFusioNet comprises multiple cross-domain transfer learning models whose results are fused using a multi-layer perceptron-based late fusion model.
• Second, the proposed framework encompasses two components: a) a tomato disease recognition mechanism – DeepRec – which detects disease by analyzing tomato plant leaves; if any illness is recognized, the control is passed to b) the illness classification mechanism – DeepPred – which enables the TomFusioNet framework to further classify the tomato crop's illness into six classes (Early blight, Late blight, Bacterial spot, Leaf mold, Yellow leaf curl virus, and no-disease) for a complete analysis. This bi-stage strategy makes the proposed model robust and provides a concrete idea to the farmers for crop treatment through a mobile application.
• Third, a non-complex image segmentation strategy is leveraged using HSV color map transformation and pixel-level binary masking to remove unrelated background clutter which could otherwise hamper the prediction results. The proposed background subtraction algorithm is computationally inexpensive and, therefore, exhibits low latency during real-time validation.
The rest of the paper is organized as follows: Section 2 presents the literature review of recent studies. The materials and methods adopted for this study are presented in Section 3. Experimental results and evaluation metrics are discussed in Section 4. Section 5 presents the discussion and a quantitative comparative analysis with relevant studies. The concluding remarks, limitations, and future works are discussed in Section 6.
2. Literature review
Plant disease identification is critical in precision farming; therefore, a multitude of agricultural research has been conducted through the years in this domain. In this section, relevant procedures concerning plant disease diagnosis are reviewed extensively, covering: a) machine learning-based handcrafted feature extraction, b) image segmentation for plant disease localization, and c) deep learning strategies for plant disease identification.
2.1. Machine learning-based handcrafted feature extraction
Machine learning (ML) has been used extensively in the past to aid plant disease diagnostic frameworks. The Scale-Invariant Feature Transform (SIFT) [36] and the histogram of oriented gradients [37] are among the techniques discussed substantially in manual feature extraction studies. Hlaing et al. [38] proposed a statistical ML model for tomato plant disease classification. In this work, textural features of the leaf image are extracted with the SIFT algorithm and modelled using the generalized extreme value (GEV) distribution strategy. Finally, the features are fed into the SVM algorithm for classification. They claim to achieve the highest accuracy of 84%; however, only 3,474 images were included in the training set. A year later, the authors of [38] extended their work in [39], in which SIFT was again used for feature extraction, but a different strategy, the Johnson SB distribution technique, was used for modeling. Moreover, in this study, color features were added to the feature bank, and the SVM classifier was leveraged for feature learning. The accuracy and the prediction speed in [39] improved by 1% and 3,300 observations per second, respectively. A leaf image localization technique was proposed by Kurmi et al. [40] for crop disease identification. They leveraged morphological operations such as erosion and Gaussian filtering for leaf region refinement and manually extracted 20 handcrafted features such as maximum diameter, mean object intensity, and image Euler number. Logistic regression, SVM, and a multi-layer perceptron were used for classification; however, their model was only able to achieve an average accuracy of 93.2%, which is less than many other competitive models. Sabrol et al. [41] proposed a tomato plant classification study using handcrafted color, shape, and texture features. Their model achieved an accuracy of 97.3%. In [42], Histogram of Oriented Gradients (HOG), Segmented Fractal Texture Analysis (SFTA), and Local Ternary Pattern (LTP) features were extracted for experimentation. Principal component analysis (PCA) was also applied to reduce the dimensionality of the feature set, which helped achieve the highest accuracy of 98.7%. Although manual feature extraction techniques were effective, they were time-consuming and labor-intensive. The biggest challenge is the subjectivity of the process and the required availability of a domain expert who can first analyze and then identify useful features manually.
2.2. Image segmentation for disease localization
A major drawback of previous studies is the inaccurate analysis and non-meticulous extraction of disease-causing candidate features. Image segmentation has been a popular tool of choice for researchers and has been applied in diverse applications such as computer-aided diagnosis [43], object detection [44], 3D reconstruction [45], and remote sensing [46]. Background subtraction has become an active area of research in precision agriculture because unrelated background information can adversely affect the performance of a diagnostic system [47]. In a few cases, background information such as prominent edges, a dark background color scheme, and irrelevant illuminations mimics the appearance of the disease-causing candidate features of plant images, which leads to biased prediction results [48]. As a result, handcrafted feature extraction becomes an even more strenuous practice. Some of the prominent background removal techniques have been discussed extensively in the review studies [49] and [50]. Several automated and semi-automated image segmentation schemes have been presented in the past for plant disease diagnosis. For example, Ngugi et al. [51] presented a U-Net encoder architecture for tomato leaf segmentation. In this study, they proposed a conceptual model of a mobile application that can be deployed for the utility of farmers. However, the model was only validated on 1,408 plant images. Tian et al. [27] presented an adaptive clustering methodology for tomato leaf segmentation. In their approach, a validity index is calculated for each image to identify the optimal number of clusters. Although this technique is novel and achieved good segmentation results, calculating the validity index requires extra computation. Elangovan et al. [52] used the Otsu image segmentation algorithm for background removal and then manually extracted color and texture features. However, the research lacked concrete evaluation metrics to validate its findings. Singh et al. [53] introduced a soft computing technique for plant image segmentation and disease classification. They employed genetic algorithms to optimize the process of finding prominent edges, and their SVM classifier achieved an accuracy of 95.7%. However, a few of the disease-causing lesions were misclassified in their approach. Similarly, other thresholding techniques have been presented in the same domain [54,55], but none of these works have incorporated their methods into an end-to-end deployable framework. The majority of deep learning and machine learning-based segmentation techniques are computationally expensive and have unstable performance; therefore, it is not viable to deploy them practically.
2.3. Deep learning for disease identification
Deep learning models have been popularized in the last decade as they have demonstrated significant performance in various applications such as biomedicine [56], Industry 4.0 [57], autonomous driving [58], and remote sensing [59]. With access to massive amounts of data and the availability of high computation power, deep learning models have evolved beyond traditional machine learning algorithms [60]. In recent times, various state-of-the-art deep learning techniques have been presented in agricultural research for early disease detection, anomaly detection, and leaf pattern recognition, which could be of high utility to farmers. The performance of AlexNet and SqueezeNet was investigated by Durmus et al. [61] for tomato disease classification. They concluded that the performance of SqueezeNet is better for a mobile application framework in terms of computational requirements, and it achieved a final accuracy of 94.3%. Similarly, Sladojevic et al. [62] presented a convolutional neural network model for plant disease classification from leaf images. Image pre-processing steps such as affine transformation, perspective transformation, and image augmentation were conducted to improve model performance. Their model achieved a final accuracy rate of 96.3%; however, these results were reported on a dataset with a small number of images. In [15], an attention-based residual CNN model was developed for disease detection in tomato leaf images. This study was innovative because Karthik et al. [15] added separate attention-embedded residual progressive feature extraction blocks to extract the feature maps of different inception layers in the deep model. They achieved a final accuracy of 98%. Costa et al. [20] studied external defects on tomatoes using a ResNet-based binary classifier. After several steps of fine-tuning, their model achieved the highest accuracy of 97.7%.

From the existing works, it can be concluded that deep learning is efficient for disease identification. However, to the best of our knowledge, none of the previous works presented an end-to-end framework comprising a lightweight image segmentation algorithm along with feature extraction at low latency for real-world deployment through a mobile application. Additionally, previous deep models have similar architectural configurations, which affects the efficiency of feature extraction, resulting in lower precision and a high false-positive rate. Also, the majority of the segmentation algorithms integrated with crop disease diagnostic frameworks are computationally expensive; therefore, the algorithmic modelling of these systems needs to be realized for enhanced performance at minimal latency.
3. Materials and methods
This section discusses the materials and methods used to accomplish the objectives of this paper.

3.1 Tomato data acquisition
For the proposed study, the dataset is acquired from a public plant repository called PlantVillage [63]. The entire dataset encompasses 54,303 crop images spanning 14 crops. The dataset comprises multiple image formats, for instance, grayscale leaf images, segmented diseased regions, and colored leaf images in the RGB format.

Fig. 3. A multi-class corpus of tomato infestation images from the dataset.
Table 1. Information about the dataset.

| Class | Disease          | Images | Agent                  | Effective Fungicide | Signs and Symptoms                                                                                          |
|-------|------------------|--------|------------------------|---------------------|-------------------------------------------------------------------------------------------------------------|
| 1     | Early Blight     | 2500   | Alternaria linariae    | mancozeb            | Brown leaf spots which grow up to half an inch in diameter.                                                   |
| 2     | Late Blight      | 1534   | Phytophthora infestans | chlorothalonil      | Irregularly shaped water-soaked lesions are observed on the leaf during late blight.                          |
| 3     | Bacterial Spot   | 1820   | Xanthomonas campestris | ManKocide           | Yellow-green spots which turn brownish red after ageing.                                                      |
| 4     | Leaf Mold        | 3078   | Passalora fulva        | Amistar             | Yellowish spots with a pinch of pale green color on the upper surface of a leaf; finally turns bright yellow. |
| 5     | Yellow Leaf Curl | 3724   | Begomovirus            | Imidacloprid        | Leaf yellowing in an upward or downward fashion; leaf cupping and reduction in leaf size are also symptoms.   |
Table 1 summarizes the dataset used. In this research, RGB images are used, which are further sent to the image processing module for color space transformation. TomFusioNet contains two components: a) DeepRec and b) DeepPred. For the DeepRec component (binary classification), 10,836 images were utilized from the PlantVillage dataset [63]. There were only two categories: a) Disease and b) No Disease. The disease category comprised a mix of Early blight, Late blight, Bacterial spot, Leaf mold, and Yellow leaf curl virus images to cover all the disease-causing signs and symptoms. For DeepPred, we have utilized 12,942 tomato leaf images comprising five major infestation classes – Early blight, Late blight, Bacterial spot, Leaf mold, and Yellow leaf curl virus – plus healthy leaves. Fig. (3) presents a corpus of tomato leaf images from the PlantVillage dataset. The predefined image resolution is 256×256 pixels, while the data division for both components is explained in the sections below. During the initial visual data screening conducted by us, a strong correlation between the background and the foreground features was observed in the majority of images (see Fig. (1)). This unwanted correlation can result in erroneous predictions by the proposed AI model; therefore, an efficient background removal strategy is realized to aid the proposed computer-aided diagnostic model.
3.2 Framework development
3.2.1 Proposed TomFusioNet architecture

Fig. 4. An illustration of the proposed TomFusioNet framework.
In this research, an end-to-end mobile application framework is proposed for the disease identification and illness classification of tomato crops. Fig. (4) presents the TomFusioNet framework proposed in this study. It comprises two late fusion components: a) DeepRec for disease recognition and b) DeepPred for multiclass crop illness classification. The framework also includes an HSV color map-based background removal algorithm which is computationally inexpensive and can be successfully integrated into a mobile application. The TomFusioNet framework is implemented through a mobile app with the primary aim of providing an end-to-end precision farming solution to farmers at the comfort of their location. This mobile application will enable farmers to recognize crop disease and further identify the stage of disease infestation by automatically segmenting the background and applying deep learning algorithms.
3.2.2 Background removal algorithm
To develop predictive ability, deep learning algorithms such as CNNs interpret visual information using a gradient optimization technique. Therefore, irrelevant visual features can result in biased and incorrect predictions. According to the research on photorealism conducted by [64], complex and irrelevant background components of an image can develop a strong statistical and visual correlation with the foreground features, which can be misleading in some cases. In agricultural research, precise analysis of relevant candidate features is vital; however, the acquired images may include several unnecessary background interferents, for example, branch segments, grass, flowers, moths, stones, and soil. From Fig. (1), it can be observed that these artifacts mimic the appearance of the foreground candidate features responsible for illness in leaves [51]. The arrows in Fig. (1) point out the commonalities between the background and the candidate features. Hence, feature relevancy needs to be maintained for an AI model to conduct precise analysis. In the past, texture- and deep learning-based segmentation approaches have been proposed for crop disease detection [65,66]. However, these techniques are computationally expensive. Also, it is not viable to integrate and deploy these algorithms in a mobile application where latency is a critical concern.
The human eye is sensitive to color because it is the primary source of information for developing an understanding of a region of interest (ROI) in an image [67]. Therefore, several color models have been developed that enable humans to perceive their surrounding entities. In computer graphics research, color models are extensively employed because it is possible to manipulate the perceptual uniformity of colors without affecting the spatial features of an image [68]. Several color models have been proposed in the past for specific utilities, such as a) RGB, b) HSV, and c) CMYK. The RGB color model has been widely used for image segmentation because several applications require the colors to remain consistent with the hardware device configuration, and the RGB model fulfils this condition [67].

However, current RGB color model-based segmentation applications are device-oriented and, therefore, less generalizable. Also, the perceived color of an RGB image differs with variation in the configuration of the edge device being used for real-time validation. In the context of the proposed research, not all farmers will use the same mobile device with similar configurations; due to such rigid generalizability constraints, the RGB model is avoided in this study. Therefore, a device-independent color model is used for the proposed application.
To overcome the listed limitations, we have utilized the HSV (Hue, Saturation, Value) color model transformation to assist the background removal algorithm for tomato leaves. The preferred color model scheme has several advantages. First, the human eye is more sensitive to color intensity, and the HSV color model closely resembles human vision [69]. Second, HSV models are device-independent; therefore, if the segmented crop image is transferred to a remote server via different devices, the color composition remains constant throughout. Third, HSV images show less variation in hue values in the presence of external lighting as compared to an RGB image. For example, if a leaf image is taken in the presence of shadow blockage, the hue value will not fluctuate much compared to the red, green, and blue pixel values. This illumination invariance capability aids efficient feature extraction. Fig. (5) presents a flowchart of the color model conversion schema.

Fig. 5. Flowchart of the color model conversion.
The stepwise approach is explained below:
1) Initially, the batch of input RGB images is fed into the image processing module of the TomFusioNet framework and resized uniformly to a 256 × 256 pixel resolution for computational efficiency.
2) Since the RGB color model is not efficient for the proposed application, as explained in the literature, a batch-wise linear transformation is conducted at each iteration to convert the input leaf image from the RGB to the HSV color space. The steps are as follows. Initially, the R, G, B pixel values are normalized, which constrains the pixel range to [0, 1], as presented in Eq. (1). This is done because RGB to HSV is a lossy conversion that can result in data loss; therefore, to convert an 8-bit image, the values are scaled from [0, 255] to [0, 1].

r, g, b = R/255.0, G/255.0, B/255.0    (1)

Here, (r, g, b) ∈ [0, 1]. Next, the minimum and maximum pixel values across the three channels are calculated as M1 = Min(r, g, b) and M2 = Max(r, g, b). The difference between M1 and M2 is identified as Dhsv = |M1 − M2|. The HSV color space is composed of 'h': Hue, 's': Saturation, and 'v': Value. The hue value 'H' is computed based on five cases:
Case 1: If M1 = M2, then H = 0    (2)

Case 2: If M1 = r, then H = ((b − g)/Dhsv) × 60    (3)

Case 3: If M1 = g, then H = ((b − r)/Dhsv) × 60    (4)

Case 4: If M1 = b, then H = ((g − r)/Dhsv) × 60    (5)

Case 5: If M1 = 0, then H = 180    (6)
Here, r, g, and b stand for red, green, and blue, respectively. If the resultant hue (H) value is greater than 360, then H = H − 360. Finally, the saturation 'S' is calculated as:

S = ((M2 − M1)/M2) × 100    (7)

where S lies in the range [0, 100]. Let 'V' denote the value or brightness of the resultant HSV image. It is extracted as the maximum value of the R, G, and B channels; therefore, V = M2. Fig. (5) illustrates the flowchart of the color conversion model. The resultant HSV image, along with snapshots of the H, S, and V components, is shown in Fig. (6).
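For concreteness, a minimal Python sketch of steps 1) and 2) is given below, assuming OpenCV and NumPy are available. cv2.cvtColor implements the standard RGB-to-HSV mapping; the Eq. (1) channel normalization is shown explicitly as a separate helper.

```python
# A minimal sketch of the resize and color conversion steps, assuming
# OpenCV (cv2) and NumPy are installed.
import cv2
import numpy as np

def rgb_to_hsv(rgb_image: np.ndarray) -> np.ndarray:
    """Resize an RGB leaf image to 256x256 and convert it to HSV."""
    resized = cv2.resize(rgb_image, (256, 256))      # step 1: uniform resize
    return cv2.cvtColor(resized, cv2.COLOR_RGB2HSV)  # step 2: RGB -> HSV

def normalize_channels(rgb_image: np.ndarray) -> np.ndarray:
    """Eq. (1): scale 8-bit R, G, B values from [0, 255] to [0, 1]."""
    return rgb_image.astype(np.float32) / 255.0
```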
The data points of the original RGB and HSV images are mapped into three-dimensional Cartesian planes along the (x, y, z) axes (refer to Fig. (7)). It can be inferred from the visualization that the distribution of pixels in Fig. (7)(a) is neither localized nor visually separable. However, in Fig. (7)(b), the HSV pixel distributions for the brightness (V) and saturation (S) points are visually separable. This indicates that the HSV model is suitable for further segmentation processes.
Fig. 6. Illustration of different components of the image: (a) original image, (b) hue component, (c) saturation component, (d) value component, and (e) HSV image.

Fig. 7. Three-dimensional Cartesian visualization: (a) RGB color model and (b) HSV color model.

Fig. 8. Analysis of the background removal algorithm: (a) original image, (b) HSV image, (c) image mask, and (d) background-removed image.
3) For background removal of the HSV tomato image, individual channel masking is done. Image masking is a non-destructive process of removing unwanted areas of an image [70]. Individual bit-wise masks with minimum and maximum ranges for the green (Gm), brown (Bm), and yellow (Ym) colors were identified. The initial data survey conducted by us revealed that most tomato disease features portray a yellowish-brown color while the rest of the leaf surface is green; therefore, the threshold pixel ranges depend on these colors. The combined image mask (Xm) is prepared by the bitwise OR operation of the individual masks as Xm = (Gm | Bm | Ym). Finally, a bitwise AND operation is performed on the input HSV image, where binary '1' is assigned to the values falling in range and the rest are assigned a '0' value. A few areas in the image retained spatial noise; therefore, a Gaussian blur with a kernel size of (3, 3) is applied. The Gaussian function uses a normal distribution and the image variance to calculate the transformation value, which is applied around each pixel. It can be represented as [71]:

Gf(x, y) = (1/√(2πσ²)) × e^(−(x² + y²)/(2σ²))    (8)

Here, (x, y) represents the pixel coordinates and σ is the image variance (σ > 0). Fig. (8) shows the result of the proposed background removal algorithm. After this pre-processing step, the prepared images are fed to the feature extraction model in separate batches.
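A sketch of this masking step is given below, assuming OpenCV. The green, brown, and yellow (H, S, V) threshold ranges are illustrative placeholders, not the authors' calibrated values.

```python
# A minimal sketch of the HSV masking and denoising step; the inRange
# bounds below are hypothetical placeholders.
import cv2
import numpy as np

def remove_background(hsv_image: np.ndarray) -> np.ndarray:
    # Hypothetical (H, S, V) ranges for green, brown, and yellow regions
    g_mask = cv2.inRange(hsv_image, (35, 40, 40), (85, 255, 255))   # G_m
    b_mask = cv2.inRange(hsv_image, (10, 50, 20), (25, 255, 200))   # B_m
    y_mask = cv2.inRange(hsv_image, (25, 60, 60), (35, 255, 255))   # Y_m
    # Combined mask X_m = G_m | B_m | Y_m
    x_mask = cv2.bitwise_or(cv2.bitwise_or(g_mask, b_mask), y_mask)
    # Bitwise AND keeps only in-range (foreground) pixels
    foreground = cv2.bitwise_and(hsv_image, hsv_image, mask=x_mask)
    # Suppress residual spatial noise with a (3, 3) Gaussian blur, Eq. (8)
    return cv2.GaussianBlur(foreground, (3, 3), 0)
```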
3.2.3 Data preparation and artificial data generation
To ensure feature extraction homogeneity for both DeepRec and DeepPred, the pre-processed tomato leaves are uniformly resized to 224 × 224 × 3. Previous research has suggested that the performance of deep models is subject to the availability of training and validation data due to the involvement of millions of parameters [72]. To address this, geometric transformation-based artificial data augmentation was conducted before feature extraction. Batch-wise image transformation was done based on horizontal flips, height shift manipulation, rotation angle (θ = 65 degrees), shear transformation, and image cropping. Table 2 presents the values for all the augmentation parameters.
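The augmentation step can be sketched with Keras' ImageDataGenerator, as below; the rotation angle follows the text, while the shift, shear, and zoom magnitudes are placeholders since Table 2 is not reproduced here (zoom_range stands in for image cropping).

```python
# A hedged sketch of the geometric augmentation pipeline using Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    horizontal_flip=True,     # horizontal flips
    height_shift_range=0.1,   # height shift manipulation (placeholder value)
    rotation_range=65,        # rotation angle theta = 65 degrees
    shear_range=0.2,          # shear transformation (placeholder value)
    zoom_range=0.1,           # stands in for image cropping (placeholder)
    rescale=1.0 / 255.0,      # pixel normalization
)
# Example: stream 224x224 batches from a directory of pre-processed leaves
# train_flow = augmenter.flow_from_directory("tomato/train",
#                                            target_size=(224, 224))
```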
3.2.4 Cross-domain transfer learning
With access to large amounts of data, deep learning applications have drawn great attention from researchers around the globe. This paradigm has seen multifarious state-of-the-art applications in various domains, as reviewed in the literature above. With this increasing trend, it has been identified that the majority of these research papers are built on the foundation of convolutional neural networks (CNNs). CNNs are feed-forward neural networks based on the layer-wise abstraction of visual data to convert a feature map of pixels into a higher representation [73]. During training, the initial layers of a CNN automatically learn low-level feature information such as edges and colors, followed by the extraction of high-level information such as curvatures in the final layers. A single convolution function at a partially connected layer is an element-wise multiplication defined as [10]:

Conv[m, n] = (F × H)[m, n] = Σj Σk h[j, k] × f[m − j, n − k]    (9)

Here, F represents the input feature map and h is the convolution kernel. The literals m and n denote the feature matrix indices as rows and columns, respectively. During backward propagation, the gradient is automatically adjusted according to a specific optimization strategy. In the proposed research, stochastic gradient descent (SGD) is utilized, which works element-wise to minimize the cost function for updating weights:

Cf = (1/n) × Σ(i=1 to n) (Ŷi − Yi)²    (10)

The weight update process of SGD is defined as:

Wi = w − η∇Cf(w)    (11)

Here, η is the learning rate that decides the step size (η ∈ ℝ), while the sign of ∇Cf determines the direction of the gradient as positive or negative. This capability of learning intrinsic details through the gradient optimization strategy is the reason behind the major applications of CNNs.
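A worked single-step illustration of Eqs. (10) and (11) on a toy linear model follows; the data and model are illustrative only.

```python
# One SGD step on a tiny linear model, illustrating the cost C_f, Eq. (10),
# and the weight update W = w - eta * grad(C_f), Eq. (11). Pure NumPy.
import numpy as np

w = np.array([0.5])                      # current weight
x = np.array([1.0, 2.0, 3.0])            # inputs
y = np.array([2.0, 4.0, 6.0])            # targets (true relation: y = 2x)
eta = 0.1                                # learning rate

y_hat = w * x                            # predictions Y_hat
cost = np.mean((y_hat - y) ** 2)         # Eq. (10): C_f = 10.5 here
grad = np.mean(2 * (y_hat - y) * x)      # dC_f/dw for the squared error
w = w - eta * grad                       # Eq. (11): w moves toward 2.0
print(f"cost={cost:.3f}, updated w={w[0]:.3f}")   # cost=10.500, w=1.900
```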
Cross-domain transfer learning is based on a knowledge transfer strategy: feature weights learned for the ImageNet competition are reused and applied to other suitable applications [74]. This technique minimizes the time needed to train a model from scratch and has produced accurate results for a wide range of problems. Transferring knowledge from pre-trained models also minimizes data dependency, which is a serious issue in deep learning research. Also, the data does not have to be independent and identically distributed, which makes cross-domain transfer learning successful [75]. For example, consider a supervised learning task M with data distribution {Xi | Yi}, where Xi is a data point and Xi ∈ (X1, X2, X3, ..., Xn). Likewise, Yi is a label and Yi ∈ (Y1, Y2, Y3, ..., Yn). The task M can be represented as M = (y | k(x)), where k(x) denotes the prediction function. In the cross-domain transfer learning schema, the prediction function k(x) can be utilized to solve a task D = {Di | Ci}. Here, Di is the data point, Ci is the label, and M ≠ D.
3.2.5 Model architectures
In this study, modified architectures of the cross-domain transfer learning models InceptioNet [76], ResNet-50 [77], and VGG-16 [78] are adopted, as illustrated in Fig. (9). During feature extraction, only fine-tuned models are utilized in the DeepRec (disease identification) and DeepPred (disease classification) components, i.e., the initial layers of the models are frozen to reduce the number of backward propagation passes. Therefore, the total time for retraining the models is reduced, which results in lower computation requirements. The number of input neurons in the bottommost FC layers of DeepRec and DeepPred is set to 256 and 512, followed by sigmoid and SoftMax activation functions for conducting binary and multiclass classification, respectively.

Fig. 9. An illustration of the modified transfer learning architectures used before applying the late fusion strategy: (a) VGG-16 modified architecture, (b) InceptioNet modified architecture, and (c) ResNet-50 modified architecture.
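A minimal Keras sketch of one such modified backbone is shown below: ImageNet weights are loaded, the convolutional base is frozen, and a trainable head (global average pooling, a 256-neuron FC layer, and a sigmoid output) is attached for the binary stage. The single sigmoid unit is a simplification of the two-neuron output layer described in the text.

```python
# A hedged sketch of a frozen ResNet-50 backbone with a trainable head,
# assuming TensorFlow/Keras.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                                  # freeze the base

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(256, activation="relu")(x)    # bottom FC layer
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
branch = tf.keras.Model(inputs, outputs)

branch.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
               loss="binary_crossentropy", metrics=["accuracy"])
```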
3.2.6 Feature extraction layers
a) Fully connected layer: A fully connected (FC) layer has an end-to-end connection with all the neurons of the subsequent layer. While the frozen weights of the deep models were not updated during backward propagation, two fully connected dense layers with 256 and 512 neurons were trained at each iteration of the forward propagation. The FC output layers of DeepRec and DeepPred comprise 2 and 6 neurons, respectively. Mathematically, the multiplication of the weight matrix with the output feature vector from the previous FC layer is defined as:

Foutput = K(W · x + B)    (12)

where K is the ReLU activation function, W is the weight matrix updated during backward propagation, and x represents the resultant input vector from the previous layer. B is the bias term calculated during the linear transformation.

b) Global average pooling layer: This layer is added to regularize the deep models by conducting layer-wise dimensionality reduction during the forward propagation. There are two kinds of pooling kernels: a) maximum pooling and b) global average pooling. In this study, average pooling kernels of dimension (2 × 2) were added at the bottommost FC layers.

c) Activation layer and dropout: The ReLU activation function is employed in both components to induce non-linearity into model training during feature extraction. It is a popular activation function among researchers because of its efficiency, faster convergence, and lower computation requirements. During backward propagation, the negative gradients are ignored, resulting in the activation of positively valued neurons in the next iteration. The ReLU activation function is defined as:

JReLU(z) = max(0, z)    (13)

where z ∈ (0, n), n = {1, 2, 3, ..., K} and K ∈ (1, n). The sigmoid is a monotonic function with a range of [0, 1]. It is defined as:

Ssigmoid(x) = 1/(1 + e^(−x))    (14)

Here, x is the output from the final dense layer. In DeepRec, the sigmoid activation function is added to the final dense layer. DeepPred uses the SoftMax activation function in its final FC layer to obtain a multinomial probability distribution over the classes. According to [79], it assigns scores that sum to 1, as presented below:

SoftMax: σ(y)i = e^(yi) / Σl e^(yl)    (15)

Here, y represents the input vector and l indexes the labels.
3.2.7 Late fusion strategy
Generally, single-facet deep models face generalization issues due to similar architectural constraints, high gradient loss, and poor accuracy. To alleviate this issue, several information fusion techniques have been identified in the past to develop a multi-facet feature learning mechanism by using a diverse feature set for the training of a deep model [80]. Previous studies in different modalities have shown that fusion-based techniques not only aid in achieving higher accuracy but also provide pivotal information for extensive comparative analysis [81][82]. In the past, ensemble techniques such as bagging [83] and boosting [84] have been proposed in precision agriculture to achieve superior performance for crop disease diagnosis; however, the result aggregation strategy of these existing techniques is very simple and not efficient. For instance, the majority of ensemble techniques [83] obtain the final decision score by calculating a mean accuracy value over multiple deep models. In that case, all the models contribute the same amount to the result regardless of their individual performance.

To overcome these limitations, the two most prominent information fusion strategies, early fusion and late fusion, are considered. Early fusion involves the concatenation of multiple features from different classifiers into a higher-dimensional representation vector set. This vector set can be fed into a rule-based classifier such as an artificial neural network (ANN) or support vector machine (SVM) to produce the final predictions [85].

Fig. 10. Early fusion strategy.
Fig. (10) shows the feature concatenation of multiple models for early fusion. Its major limitation is the requirement of high computation power because of the large number of calculations on high-dimensional feature vectors. Therefore, a late fusion strategy is preferred in many cases. Late fusion, or decision-level fusion, refers to the independent training of several deep models (for instance, ANNs, MLPs, and CNNs) on unseen datasets, followed by the aggregation of their final prediction scores using a non-linear weighted average fusion technique [85]. This non-linear weighted average fusion of predictions is carried out at the decision stage by a meta-learner classifier model, which can be a multi-layer perceptron (MLP), deep neural network (DNN), SVM, etc. A selected batch of training and validation data is kept separate for the training of the meta-learner model to maintain the double-blind training scheme of the late fusion strategy. Fig. (11) depicts the working of a late fusion model. The base models are individually trained before decision-level fusion to achieve superior individual performance. This technique has several advantages. First, it maintains zero error correlation amongst the deep models. Second, the meta-learner can produce significantly more accurate results by deducing the biases of the generalizers (deep models) independently, which prevents underfitting. Third, the best classifiers can be identified for each modality before fusion for better predictive accuracy. However, this process can sometimes be time-consuming due to the separate model training, but the cost can be reduced by utilizing shallow architectures.

Fig. 11. Late fusion strategy.
In the current study, a late fusion of three cross-domain transfer learning models – ResNet-50, InceptioNet, and VGG-16 – is realized. Mathematically, the whole process can be summarized as follows. Let J represent the cross-domain transfer learning models (J1, J2, J3, ..., Jn) and K represent the class labels (K1, K2, K3, ..., Kn), where K = {0, 1} for binary and K = {0, 1, 2, 3, 4, 5} for multi-class classification. Then K̂ = Ji θ(xi, wi, Bi), where θ is the final activation function, sigmoid (see Eq. (14)) or SoftMax (see Eq. (15)), and Bi is the bias term. K̂ ∈ ℝ^(h×w×d), where h, w, and d are the height, width, and number of channels of the output feature map. In the proposed late fusion scheme, the non-linear meta-learner is a multi-layer perceptron model, denoted Ψ. The fusion objective function is defined as:

ScoreFusion = Ψ{(J1(k̂), J2(k̂), J3(k̂), J4(k̂), ..., Jn(k̂))}    (16)

Here, ScoreFusion is the final late fusion prediction result based on a non-linear weighted average strategy. During computation, the classifier which achieved better accuracy in the initial training contributes more to the final late fusion accuracy because the input predictions or scores are weighted differently by the non-linear MLP meta-learner [86]. During the fusion, the top layers of the transfer learning models are frozen, and their weights are not updated, which reduces the overall computational requirement, and the model converges faster. Since this is a multi-channel network, M copies of the data are fed into the MLP fusion model during fusion, where M is the number of transfer learning models used in the initial stage.
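A sketch of Eq. (16) in Keras follows: decision scores of the individually trained base models are concatenated and passed to the MLP meta-learner Ψ. The function names and the input dimensionality are illustrative; the hidden sizes follow the DeepRec MLP (64 and 128 neurons).

```python
# A hedged sketch of the late fusion meta-learner, assuming TensorFlow/Keras.
import numpy as np
import tensorflow as tf

def fuse_scores(branches, images):
    """Collect per-model decision scores J_i(k_hat) for a batch of images."""
    scores = [model.predict(images, verbose=0) for model in branches]
    return np.concatenate(scores, axis=1)   # late-fusion input vector

def build_meta_learner(n_inputs: int) -> tf.keras.Model:
    """Non-linear MLP meta-learner producing the binary Score_Fusion."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# meta = build_meta_learner(n_inputs=2)   # two base models, one score each
# meta.fit(fuse_scores(branches, x_train), y_train, batch_size=128)
```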
3.2.8 DeepRec
In the TomFusioNet framework, DeepRec is developed as a disease recognition component that takes an input image of dimensions 224 × 224 × 3 and produces a binary result: Disease (1) or No Disease (0). To invoke DeepRec, farmers can take an image with a smartphone, which is directly fed into the image pre-processing module for color space conversion (RGB to HSV) followed by background removal using the proposed masking algorithm (see Section 3.2.2). The pre-processed image is free from background clutter, as shown in Fig. (8), which makes it suitable for further analysis, as explained in the literature above. The main idea of the DeepRec mechanism is to produce a preliminary result for the farmers by identifying whether the tomato crops are infested with disease or not; if 'No', then the next image is analyzed. In case of positive infestation (output = Yes), the DeepPred mechanism of TomFusioNet is invoked automatically to conduct further analysis, as explained in the next section.
Initially, the transfer learning models ResNet-50, VGG-16, and InceptioNet were trained separately on the binary PlantVillage dataset comprising tomato leaves to identify their individual performance. At this stage, the dataset consisted of two classes: a) Disease and b) No Disease. To maintain uniformity, the infested category comprised an equivalent presence of Early blight, Late blight, Bacterial spot, Leaf mold, and Yellow leaf curl virus tomato images. InceptioNet and ResNet-50 outperformed the VGG-16 model; therefore, their modified versions were further used for the late fusion. The reason behind the superior performance of ResNet-50 is its high generalizability and efficient training on the ImageNet dataset [77]. It is originally a 50-layer architecture composed of multiple CNN stacks tied together using a skip connection strategy. These skip connections prevent the dying-neuron condition during the backward propagation of gradient descent [77]. Fig. (9) shows the ResNet-50 architecture with skip connections. Additionally, InceptioNet is a 27-layer architecture with sparse connections, as shown in Fig. (9). The 1 × 1 convolutions in the inception layers are included to reduce the dimensionality of the feature vector for the next layer, which is the core concept of this model.

Fig. 12. MLP component of DeepRec in TomFusioNet.
Late fusion is handled by a multi-layer perceptron meta-learner model built over the modified versions of ResNet-50 and InceptioNet. A total of 10,836 images is used, divided into three sets: train (54%), validation (23%), and test (23%). The modification in the late fusion architecture is done at the final decision layer of both InceptioNet and ResNet-50, where their decision scores are fed directly into the MLP component. The proposed MLP comprises an input layer followed by two FC layers and an output layer, as shown in Fig. (12). The multi-layer perceptron adopted for the late fusion of DeepRec includes 64 and 128 neurons in its hidden layers. At the end, the sigmoid activation function (refer to Eq. (14)) is added to the output dense layer to produce a binary disease classification result: Disease (1) or No Disease (0). The ReLU activation function (see Eq. (13)) is employed to provide non-linearity to the hidden layers during the fusion. The modification also includes freezing the transfer learning model weights to reduce training. During hyperparameter tuning, multiple dropout rates and batch sizes, along with different activation functions, were tested for DeepRec; finally, a single dropout layer with a rate of 0.2 and a batch size of 128 outperformed the other combinations and gave better results. The initial learning rate (LR) of 0.01 was also reduced uniformly at a rate of LR/50 after each epoch.
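This schedule can be sketched with a Keras LearningRateScheduler; reading "reduced uniformly at a rate of LR/50 after each epoch" as a linear decay of the initial rate divided by 50 is our assumption.

```python
# A hedged sketch of the DeepRec learning-rate schedule, assuming Keras.
import tensorflow as tf

INITIAL_LR = 0.01

def step_decay(epoch: int, lr: float) -> float:
    """Reduce the rate by INITIAL_LR / 50 after every epoch, floored at 0."""
    return max(lr - INITIAL_LR / 50.0, 0.0)

scheduler = tf.keras.callbacks.LearningRateScheduler(step_decay)
# meta_learner.fit(fused_train, y_train, batch_size=128,
#                  validation_data=(fused_val, y_val), callbacks=[scheduler])
```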

Fig. 13. DeepRec: late fusion component.
Fig. (13) illustrates the entire DeepRec late fusion framework. Depending upon the result, control of the framework is either automatically transferred to DeepPred for further analysis, or it loops back to the DeepRec component for the analysis of the next input image, as sketched below.
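The following sketch of this bi-stage control flow is illustrative only; the deeprec and deeppred objects are assumed to wrap the full pre-processing-to-score pipelines, and the class ordering and 0.5 threshold are assumptions.

```python
# A hedged sketch of the DeepRec -> DeepPred dispatch logic.
DISEASES = ["Early blight", "Late blight", "Bacterial spot",
            "Leaf mold", "Yellow leaf curl virus", "No disease"]

def analyze(image, deeprec, deeppred, threshold=0.5):
    x = image[None, ...]                          # add a batch dimension
    disease_score = float(deeprec.predict(x, verbose=0)[0, 0])
    if disease_score < threshold:                 # DeepRec: No Disease (0)
        return {"diagnosis": "No disease"}
    probs = deeppred.predict(x, verbose=0)[0]     # DeepPred: 6-way SoftMax
    return {"diagnosis": "Disease",
            "illness": DISEASES[int(probs.argmax())],
            "confidence": float(probs.max())}
```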
476 3.2.9 DeepPred
477 If the DeepRec gives output as Disease (1), then it means that the crop is unhealthy and the DeepPred component of the TomFusioNet
478 framework is invoked next in the pipeline. It further classifies the disease into multiple categories such as: Early blight, Late blight, Bacterial
479 spot, Leaf mould and, yellow leaf curl virus and gives an end-to-end result to the farmer. DeepPred is crucial in the proposed application of
480 precision farming because the kind of pesticide used to cure a disease varies as explained in the literature. A similar training strategy as
481 mentioned above is followed for DeepPred along with a few modifications in the meta-learner architecture.
482 Initially, the cross-domain transfer learning models such as ResNet-50, VGG-16, and Inception Net were separately trained on multi-
483 class tomato datasets for the above-mentioned categories including normal (healthy) leaf images. VGG-16 and ResNet-50 performed well and,
484 therefore, are used in the late fusion component of DeepPred for further feature extraction. VGG-16 is an award-winning architecture of the
485 2014 ImageNet competition [78] comprising 13 padded convolutional layers that take an input image of dimension 224 × 224 × 3. The
486 model also uses a convolution stride of 1 along with max-pooling kernels of size 2×2 for spatial downsampling. The original VGG-16
487 architecture also includes several 1 × 1 CNN layers to induce non-linearity in the objective loss function without altering the receptive
488 fields. Like DeepRec, late fusion is performed using a modified multi-layer perceptron model that aggregates the decision scores of VGG-16 and
489 ResNet-50 using a non-linear weighted average strategy. It also receives the pre-processed image (see section 3.2.2) of dimension 224 ×
490 224 × 3; however, unlike DeepRec, the MLP is not shallow and comprises an input layer followed by 3 hidden layers having 64, 128, and
491 512 neurons and a separate output dense layer with 6 neurons. The number of hidden layers is nevertheless kept to a minimum, keeping in
492 mind the long training time of both the VGG-16 and ResNet-50 networks. Fig. (14) shows the MLP of DeepPred. The initial layers of both transfer
493 learning models are frozen to prevent unnecessary weight updates during forward and backward propagation.
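As a hedged sketch of this step, pretrained backbones can be loaded and frozen in Keras as follows; the use of ImageNet weights and the freezing cut-off are assumptions based on the description above.

```python
from tensorflow.keras.applications import VGG16, ResNet50

# Hedged sketch of the weight-freezing step: ImageNet-pretrained backbones are
# loaded and their layers are marked non-trainable so that no weight updates
# occur during late-fusion training. Freezing every backbone layer is a
# simplifying assumption; the paper freezes the initial layers.
vgg16 = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
resnet50 = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))

for backbone in (vgg16, resnet50):
    for layer in backbone.layers:
        layer.trainable = False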
494 A total of 15,820 images were used at this feature extraction stage, divided into three sets: Train (60%), Validation (20%), and
495 Test (20%). A learning rate of 0.001, a batch size of 256, and a dropout rate of 0.3 were finalized after several hyperparameter tuning steps.
496 Since we are dealing with multiple classes and a large number of images, the number of epochs for DeepPred was fixed to 50. The modification in the
497 output FC layer of the late fusion model includes 6 nodes along with a SoftMax activation function (see Eq. (15)) because it gives a probability
498 distribution for multi-class classification. Fig. (15) presents the DeepPred late fusion component. The resulting predictions of the TomFusioNet
499 are shown to the farmer through a mobile application interface along with a complete report stating the disease recognition by DeepRec and the further
500 illness classification by DeepPred.
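A minimal Keras sketch of the DeepPred meta-learner, reconstructed from the description above (the input dimension, taken as the fused VGG-16 + ResNet-50 decision scores, is an assumption):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal sketch (a reconstruction from the description, not the authors'
# code) of the DeepPred meta-learner: three ReLU hidden layers of 64, 128,
# and 512 neurons, dropout 0.3, and a 6-way SoftMax output covering the five
# disease classes plus the healthy class.
def build_deeppred_meta_learner(input_dim=12):
    # input_dim (the fused VGG-16 + ResNet-50 decision scores) is an assumption.
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(6, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Per the settings above: model.fit(..., batch_size=256, epochs=50)
```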
501
502 Fig.14 MLP component of DeepPred in TomFusioNet
503
504
505 Fig.15 DeepPred: Late Fusion component
506 3.2.10 Hyper-parameter tuning
507 DeepPred is a significant component of TomFusioNet and, therefore, hyperparameter optimization of the meta-learner multi-layer perceptron
508 is done by evaluating its performance across different combinations of hidden-layer neurons. The number of neurons is significant because,
509 during backward propagation, the time complexity (𝑇) of a neural network is directly proportional to the 𝑛 layers and 𝑘 neurons in a single
510 layer as:
511 𝑇(𝑛) ∝ Θ(𝑛 ⋅ 𝑘)                                                      (17)
512
513 For this, a candidate set 𝑃𝑠 was selected comprising multiple hidden-neuron configurations, 𝑃𝑠 = {[64, 128], [64, 128, 512], [128, 256, 512],
514 [128, 128, 512], [256, 512, 1024]}. The neuron counts are arranged in ascending order (lower to higher) to keep a check on the time
515 complexity and model training time. The late fusion component of DeepPred is separately trained multiple times with each of these sets of hidden
516 neurons in the MLP model, while all other hyperparameter configurations were kept the same. The neuron combination with
517 the best performance metrics was selected for the final model to reduce the number of false negatives and achieve better accuracy.
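The search described above can be sketched as follows in Keras; build_mlp, the data placeholders, and the validation-accuracy selection criterion are illustrative assumptions consistent with the text, not the authors' actual tuning script.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(hidden_units, input_dim=12, n_classes=6):
    # Parametrised meta-learner; everything except the hidden layers is held
    # fixed, mirroring the search described above. input_dim is an assumption.
    model = keras.Sequential([layers.Input(shape=(input_dim,))])
    for units in hidden_units:
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dropout(0.3))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Candidate configurations from the text; each is trained with otherwise
# identical hyperparameters and the best validation score is kept.
Ps = [[64, 128], [64, 128, 512], [128, 256, 512],
      [128, 128, 512], [256, 512, 1024]]

# best_cfg, best_acc = None, 0.0
# for cfg in Ps:                                # X_*/y_* are data placeholders
#     mlp = build_mlp(cfg)
#     mlp.fit(X_train, y_train, epochs=50, batch_size=256, verbose=0)
#     _, acc = mlp.evaluate(X_val, y_val, verbose=0)
#     if acc > best_acc:
#         best_cfg, best_acc = cfg, acc
```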
518
519 4 Experimental Results
520 4.1 Experimental setup
521 The proposed experiment is designed using the Keras framework in the Python programming language. The hardware configuration of the
522 workstation is as follows: Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz and 8 GB RAM on a 64-bit Windows system equipped with an NVIDIA
523 GeForce GTX 1060 graphics card. Both the DeepPred and DeepRec models used the stochastic gradient descent optimizer (refer Eq. (11)) because
524 it achieves better convergence in less time as compared to the batch gradient descent algorithm [87]. During the experimentation, the data
525 division ratios, batch size, learning rates, and the number of epochs for both the components (DeepRec and DeepPred) were selected precisely
526 post hyperparameter tuning as mentioned in the previous section.
527
528 4.2 Evaluation metrics
529 Performance analysis of the deep frameworks in TomFusioNet is based on several evaluation metrics such as: Accuracy, Weighted Average
530 Precision, Recall, and F1-score.
531 a) Accuracy: It is the primary evaluation metric for this research and measures how well the model predicts the labels
532 of the input samples. It is defined as the ratio of correctly predicted samples to the total sample set as:
533
534 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (𝑇𝑃 + 𝑇𝑁) / (𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁)                          (18)
535
536 Here, 𝑇𝑃 denotes positive samples correctly identified as positive and 𝑇𝑁 denotes negative samples correctly identified as negative. 𝐹𝑃 refers to negative samples
537 predicted as positive and 𝐹𝑁 denotes positive samples predicted as negative.
538
539 b) Weighted Precision (WP): The precision metric measures how many of the samples predicted as positive are truly positive, i.e., the true positives out of all
540 predicted positives including the false positives. However, the conventional precision value can be misleading in the case of unbalanced classes.
541 Therefore, a variation of precision is considered in this paper. Weighted Average Precision (WaP) is the per-class precision multiplied
542 by an independent weight value depending upon the number of labels in each class as:
543
544 𝑊𝑎𝑃 = ∑ (𝑃𝑖 × 𝐿𝑖) / |𝐿|                                              (19)
545 Here, 𝑃𝑖 is the precision of class 𝑖, 𝐿𝑖 is the number of samples with label 𝑖, and |𝐿| is the total number of samples.
546
547 c) Weighted Recall (WR): Recall, or sensitivity, measures the true positive rate, i.e., the fraction of actual positive samples (true positives
548 plus false negatives) that are correctly identified. Similar to weighted precision, an independent weight value is added to the recall function to modify it to Weighted
549 Average Recall (WaR) for accurate analysis in the case of unbalanced classes as shown below:
550
551 𝑊𝑎𝑅 = ∑ (𝑅𝑖 × 𝐿𝑖) / |𝐿|                                              (20)
552 Here, 𝑅𝑖 is the recall of class 𝑖 and 𝐿𝑖 is the number of samples with label 𝑖.
553
554 d) Weighted F1-score (W-F1): The Weighted Average F-1 (WaF-1) score is calculated from the per-class harmonic mean of precision and recall, evaluating
555 the performance of a classifier. It produces a single combined score that accounts for the data distribution of the imbalanced classes as:
556
557 𝑊𝑎𝐹-1 = ∑ (𝐹𝑖 × 𝐿𝑖) / |𝐿|                                            (21)
558 Here, 𝐹𝑖 is the F-score of class 𝑖 and 𝐿𝑖 is the number of samples with label 𝑖.
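As an illustrative note, these label-weighted metrics correspond to scikit-learn's "weighted" averaging, which weights each class's score by its support; the labels below are toy values for demonstration only.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Toy labels for demonstration only; "weighted" averaging weights each class's
# score by its support, matching Eqs. (19)-(21).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("WaP     :", precision_score(y_true, y_pred, average="weighted"))
print("WaR     :", recall_score(y_true, y_pred, average="weighted"))
print("WaF-1   :", f1_score(y_true, y_pred, average="weighted"))
```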
559
560 4.3 Quantitative analysis
561 The quantitative analysis has been conducted separately for the two components of the TomFusioNet framework. The following subsections present:
562 a) performance analysis of DeepRec and b) performance analysis of DeepPred.
563
564 a) Performance analysis of DeepRec
565 To analyze the performance of DeepRec, several late fusion combinations are examined. All DeepRec models are trained on the same dataset with the
566 same hyperparameters. Table 3 presents the binary disease recognition ability by depicting the train and test accuracy along with the cross-
567 entropy loss and AUC values achieved by the individual models without fusion. It is evident that Inception Net and ResNet-50 perform well
568 for binary disease diagnosis and are, therefore, selected for late fusion using a custom multi-layer perceptron meta-learner. The meta-learner is trained on
569 a separate dataset with the train, validation, and test data division as 54:23:23. Experimental results reveal that the late fusion of Inception Net
570 and ResNet-50 achieves the best training, validation, and testing accuracy as 99.93%, 99.75%, and 99.76%, respectively. However, to confirm
571 its superiority, other combinations are also tested (see Table 4). The VGG-16 and InceptioNet late fusion model achieves the minimum accuracy.
572 The reason behind this may be that the VGG-16 architecture is very similar to a simple convolutional neural network and InceptioNet is a
573 comparatively shallow model; therefore, their late fusion model was unable to extract complex features and perform well. However, ResNet-50 is a
574 very deep architecture comprising skip connections that mitigate the vanishing-gradient and dying-neuron problems. Therefore, its combination with
575 the Inception Net was able to achieve an increase of 1.09% in the final test accuracy as compared to the next-best performing model of Table 4.
576
577 Table 3. Performance evaluation for binary disease recognition of custom cross-domain transfer learning models
Architecture     Train Accuracy (%)    Test Accuracy (%)    Cross entropy Loss (Binary)    Area under curve (AUC)
ResNet-50        92.71                 89.90                0.690                          0.93
VGG-16           90.09                 84.21                0.833                          0.91
InceptioNet      91.98                 87.90                0.714                          0.91
D-CNN            84.48                 80.91                0.970                          0.89
578
579 Table 4. Performance evaluation of DeepRec late fusion models
Fusion Architecture        Train Accuracy (%)    Validation Accuracy (%)    Test Accuracy (%)    Cross entropy Loss (Binary)    Area under curve (AUC)
ResNet-50 + VGG-16         98.60                 98.01                      98.67                0.013                          0.9982
VGG-16 + InceptioNet       83.93                 81.00                      83.09                0.41                           0.8728
InceptioNet + ResNet-50    99.93                 99.75                      99.76                0.0088                         0.9987
580
581 Extensive performance analyses have also been conducted on the proposed late fusion model using unseen test set images. Fig. (16) shows the
582 confusion matrix analysis of the different late-fusion combinations. It can be observed that apart from the proposed late fusion model, there
583 are a high number of false negatives and false positives in Fig. (16)(b) and Fig. (16)(c). False negatives are crucial in precision farming because
584 a missed detection may prevent the farmer from taking any remedial action in the case of an actual disease. Therefore, to develop an end-to-end
585 mechanism, these negative metrics should be minimized. Our proposed late fusion model reduces the possibility of false analysis because it
586 only has 6 false negatives, which is just 0.002% of the total test dataset. We further report the performance of DeepRec with and without
587 generating artificial data using geometric transformations. The results in Fig. (17) show that augmentation improved the final accuracy rate
588 of the late fusion model by 5.32%. Although these geometric transformations do not produce genuinely new images,
589 manipulations such as image shifting and a change in orientation can bring forward the edge cases of an image that
590 would have been ignored during feature extraction.
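A minimal Keras sketch of such geometric augmentation is given below; the specific parameter values are illustrative assumptions, not the authors' exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Geometric transformations of the kind discussed above: shifts and rotations
# surface edge cases of an image rather than creating genuinely new samples.
# All parameter values are illustrative assumptions.
augmenter = ImageDataGenerator(rotation_range=25,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               horizontal_flip=True)

# Augmented batches are generated on the fly during training, e.g.:
# model.fit(augmenter.flow(X_train, y_train, batch_size=128), epochs=30)
```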
591 Quantitative analysis has also been conducted for the model to deal with unbalanced classes using label-weighted metrics of
592 evaluation. Table 5 shows the weighted precision, recall, and F1-score values obtained per category by each fusion architecture. Although the
593 proposed DeepRec has a slightly lower sensitivity than VGG-16+ResNet-50 and VGG-16+InceptioNet in identifying healthy
594 images, it outperformed the other competitive fusions overall. To fairly analyze the learning process of the proposed
595 model, a comparative loss analysis is presented in Fig. (18). It can be observed that the training of the proposed fusion combination
596 from the 0th to the 30th epoch went smoothly with few fluctuations, proving its efficient generalizability over the entire training set.
597
598 Fig. 16. Confusion matrix analysis of DeepRec: (a) InceptioNet + ResNet-50, (b) VGG-16 + InceptioNet, (c) ResNet-50 + VGG-16
599
600
601 Fig. 17. Performance evaluation of DeepRec late fusion models with and without artificial data augmentation.
602
603 Table 5. Class-wise evaluation of the DeepRec late-fusion models in terms of Weighted Precision, Recall and F-measure
Fusion Model                        Label                  Weighted-Precision    Weighted-Recall    Weighted-F-measure
ResNet-50+InceptioNet (Proposed)    Healthy                1                     0.993              1
                                    Unhealthy (Disease)    0.99                  1                  1
VGG-16+InceptioNet                  Healthy                0.99                  1                  0.99
                                    Unhealthy (Disease)    0.97                  0.96               0.99
ResNet-50+VGG-16                    Healthy                0.97                  0.99               0.99
                                    Unhealthy (Disease)    0.99                  0.98               0.99
604
605
606 Fig.18 Binary Cross entropy loss analysis for DeepRec late fusion models
607
608 b) Performance analysis of DeepPred
609 The evaluation metrics mentioned above suggest that our DeepRec model can accurately recognize or detect disease presence in tomato crop
610 images. However, the DeepPred component presents another facet of predictions in our research by further classifying the type of illness as:
611 a) Early blight, b) Late blight, c) Bacterial spot, d) Leaf mould, e) Yellow leaf curl virus, and f) Healthy. The data pipeline for DeepPred is
612 similar to that of DeepRec, as shown in Fig. (4); however, DeepPred only receives input data if DeepRec detects infestation
613 in the leaf image. For feature fusion, DeepPred also uses the late fusion strategy. However, the multi-layer perceptron architecture is
614 different from DeepRec (see Fig. (14)). Initially, multi-class training of modified transfer learning models such as VGG-16, Inception Net, and
615 ResNet-50 was conducted on the PlantVillage dataset, and their performance is illustrated in Table 6. Since ResNet-50 and VGG-16 produced
616 better results, they were used for the late fusion-based feature extraction. Also, the training time for both the VGG-16 and ResNet-50 was
617 approximately 15 minutes, which was less than that of the Inception Net for the same data division.
618
619 Table 6. Performance evaluation for multi-class disease identification of custom cross-domain transfer learning models
Architecture     Train Accuracy (%)    Test Accuracy (%)    Cross entropy Loss (multiclass)    Area under curve (AUC)
ResNet-50        91.71                 90.91                0.721                              0.91
VGG-16           93.12                 92.81                0.710                              0.93
InceptioNet      87.32                 87.30                0.861                              0.89
D-CNN            86.81                 85.47                0.883                              0.86
620
621 Table 7. Performance evaluation of DeepPred late fusion models
Architecture               Train Accuracy (%)    Validation Accuracy (%)    Test Accuracy (%)    Cross entropy Loss    Area under curve (AUC)
VGG-16 + InceptioNet       96.78                 92.41                      92.81                0.184                 0.9768
InceptioNet + ResNet-50    92.09                 90.01                      88.76                0.190                 0.9592
ResNet-50 + VGG-16         98.32                 95.47                      95.52                0.021                 0.9910
622
623 For a fair evaluation, Table 7 presents the performance of the different late fusion combinations for multi-class classification. The proposed DeepPred model
624 achieved an overall train, validation, and test accuracy of 98.32%, 95.47%, and 95.52%, respectively. The fusion model surpassed the best
625 performing model of Table 6 by 5.2% and 2.7% in terms of the final train and test accuracy rates, respectively. Therefore, it is evident that the non-linear late
626 fusion technique was successful in enhancing the performance of the individual classifiers.
627 The dataset used for multi-class classification in DeepPred had a greater number of Early blight images compared to the late blight
628 disease. A similar imbalance trend is followed for the yellow leaf curl and bacterial spot images. Therefore, to fairly account for the dataset
629 imbalance during performance evaluation, Table 8 presents the performance of the proposed model and other competitive late-fusion
630 combinations in terms of the WaP, WaR, and WaF-1 scores. The metrics reveal that the model performed well on the unbalanced dataset and
631 further assure that the predicted positive and negative classes match their ground-truth values.
632
633
634 Fig.19 Learning loss analysis for DeepPred late fusion models
635
636 To evaluate the model convergence, Fig. (19) presents a comparative analysis of the loss curves. Evaluation of a deep learning model on the
637 unseen dataset is also crucial in determining its actual performance in real-life scenarios. Therefore, Fig. (20) shows the confusion matrix
638 analysis for all the late fusion combinations of DeepPred on the test set. The confusion matrix highlights that the proposed model has minimal
639 false negatives and high accuracy in predicting the true positives and true negatives. Although the model is less accurate in
640 differentiating between the late blight and the leaf mould diseases, this can be attributed to the fact that the test set was slightly
641 imbalanced, and it could be improved with access to more data.
642
643 Table 8. Class-wise evaluation of the DeepPred late-fusion models in terms of Weighted Precision, Recall and F-measure
Fusion Model                   Label                            Weighted-Precision    Weighted-Recall    Weighted-F-measure
ResNet-50+VGG-16 (Proposed)    Tomato Bacterial spot            0.99                  1.00               0.99
                               Tomato Early blight              0.99                  0.95               0.97
                               Tomato Late blight               0.98                  0.99               0.99
                               Tomato Leaf Mould                1.00                  0.99               0.99
                               Tomato Yellow Leaf Curl Virus    1.00                  1.00               1.00
                               Tomato healthy                   1.00                  1.00               1.00
VGG-16+InceptioNet             Tomato Bacterial spot            0.99                  0.95               0.97
                               Tomato Early blight              0.96                  0.70               0.81
                               Tomato Late blight               0.86                  0.97               0.91
                               Tomato Leaf Mould                0.90                  0.94               0.92
                               Tomato Yellow Leaf Curl Virus    1.00                  0.99               0.97
                               Tomato healthy                   0.95                  1.00               0.97
InceptioNet+ResNet-50          Tomato Bacterial spot            0.94                  0.96               0.95
                               Tomato Early blight              0.85                  0.76               0.80
                               Tomato Late blight               0.91                  0.94               0.93
                               Tomato Leaf Mould                0.83                  0.90               0.87
                               Tomato Yellow Leaf Curl Virus    0.99                  0.97               0.98
                               Tomato healthy                   0.97                  0.98               0.98
644 DeepPred plays a crucial role in correctly identifying the type of illness because various infestation candidate features are visually similar as
645 shown in Fig. (1). Therefore, it is important to enhance the predictive ability of the final model. To alleviate this issue, hyperparameter
646 optimization for the meta-learner (MLP component) was separately conducted as explained in Section 3.2.10. Different sets of hidden layer
647 configurations were identified, and their performance metrics are shown in Fig. (21). Although the neuron configuration
648 [256, 512, 1024] achieved the highest accuracy, its precision and recall are not competitive, which may be due to overfitting. Also,
649 owing to the large number of neurons, the fusion time of [256, 512, 1024] increased. The plot illustrates that the meta-learner model having the neuron
650 configuration [64, 128, 512] outperformed the other combinations, and this configuration was selected for the late fusion meta-learner of the
651 proposed DeepPred component. These results reveal the efficiency of the TomFusioNet framework in accurately recognizing (DeepRec) and classifying
652 tomato disease (DeepPred) with low latency and high accuracy.
653
654
655 Fig. 20. Confusion matrix analysis of DeepPred late fusion models. (a) ResNet-50 + InceptioNet, (b) VGG-16 + InceptioNet, (c) ResNet-50 +
656 VGG-16
657
658 Fig. 21. Performance evaluation of DeepPred late fusion models with different hidden layer configurations of meta-learner MLP
659 4.4 Mobile application development
660 The mobile application is at the forefront of the proposed TomFusioNet framework for functioning in real-world situations. Not all farmers have direct
661 access to government bodies to seek help and advice regarding crop disease outbreaks. Therefore, the primary aim of this mobile application
662 is to provide end-to-end disease diagnostic assistance to the farmers at the comfort of their location. Also, owning a camera-enabled smartphone
663 is not very expensive these days, making the solution cost-effective. The proposed mobile application incorporates several functionalities:
664 (a) It accepts the tomato crop image via the phone camera. (b) The input image can be directly uploaded from the phone gallery. (c) The input
665 image is fed into the TomFusioNet framework’s ‘DeepRec’ component for preliminary disease diagnosis. (d) If the crop is healthy, no further
666 action is taken, and the user is prompted to insert another image. (e) If DeepRec detects infestation, the control is automatically passed to the
667 DeepPred component of TomFusioNet for further illness classification. (f) The farmer can select the appropriate diagnostic mechanism from
668 the app’s user-friendly interface. The proposed bi-stage process has two advantages. First, since DeepPred is a heavier model with a
669 3-layered late fusion architecture (see Fig. (14)), the mobile’s computation is not wasted on DeepPred if the crop is already predicted
670 as healthy by DeepRec. Second, the preliminary knowledge of the type of illness predicted by DeepPred assists the farmer in deciding
671 the kind of fertilizers and pesticides to be used to cure the disease. Fig. (22) presents the control flow diagram of the mobile application.
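In outline, this bi-stage routing can be sketched as follows; the function names are hypothetical wrappers, not the app's actual API.

```python
# Illustrative sketch of the app's bi-stage routing; deeprec_predict and
# deeppred_classify are hypothetical wrappers around the two fusion models,
# stubbed here for clarity.
def deeprec_predict(image) -> int:
    """Stage 1 (stub): returns 1 for Disease, 0 for No disease."""
    raise NotImplementedError

def deeppred_classify(image) -> str:
    """Stage 2 (stub): returns one of the five illness class names."""
    raise NotImplementedError

def analyze_leaf(image):
    # Healthy leaves never reach the heavier DeepPred model.
    if deeprec_predict(image) == 0:
        return {"diagnosis": "Healthy", "illness": None}
    # DeepPred is invoked only on a positive diagnosis by DeepRec.
    return {"diagnosis": "Disease", "illness": deeppred_classify(image)}
```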
672
673 Fig. 22. Mobile app control flow diagram.
674
675 Fig. 23. The TomFusioNet app mobile screens
676
677 Table 9. Performance evaluation with state-of-art methodologies of the past.
State-of-art study     Accuracy Rate (%)
Hlaing et al. [38]     84
Durmus et al. [61]     94.3
Karthik et al. [15]    98
Khan et al. [88]       93
Tm et al. [89]         95
678 The front-end interface of the mobile application is presented in Fig. (23). Some additional features are also added to the mobile application
679 such as an assistance window stating the latest updates regarding government-recognized agricultural schemes and insurance plans useful for
680 the farmers. However, this feature requires an active internet connection on the mobile to stay updated.
681
682 5. Discussion and comparative analysis
683 Crop diseases are negatively correlated with the economic development of a country. According to a report of APEDA – an independent
684 body recognized by the government of India – tomato is the second most important crop in the world after potato; however, 30%-70% of its
685 production is affected by disease infestation. Therefore, its timely detection with high accuracy is of significant value. In recent years,
686 multifarious machine learning and deep learning methodologies have been proposed by researchers, but none of these studies discussed a
687 deployable end-to-end framework designed with the convenience of farmers in mind. This remains a major research gap concerning pragmatic
688 approaches. Moreover, although previous studies are highly skewed towards binary or multi-class classification and image segmentation, most of these
689 architectures exhibit low accuracy and poor evaluation metrics.
690 To mitigate the aforementioned issues, we have developed TomFusioNet – an end-to-end precision agriculture
691 framework that comprises a tomato disease detection module (DeepRec) and an illness classification module (DeepPred). As a pre-processing
692 step, the framework encompasses a color model transformation mechanism and an HSV-based image segmentation algorithm (refer Section
693 3.2.2) because HSV emulates human-like vision. This framework can be deployed as a mobile application that would assist the farmers in crop
694 analysis at the comfort of their location.
695 Experimental results reveal that the DeepRec module achieved an overall accuracy of 99.93% and 99.76% in the training and
696 testing phases, respectively. Finally, DeepPred obtained superior metrics for multi-class tomato illness classification on the train and test sets
697 with an overall accuracy rate of 98.32% and 95.52%, respectively. Table 9 presents the comparative analysis of the proposed model with the
698 competitive state-of-art techniques of the past. The proposed TomFusioNet outperformed the previous models with an average margin of 3%.
699 For instance, in an interesting study by Hlaing et al. [38], statistical features were manually extracted from tomato leaves to achieve the final
700 accuracy rate of 84%, which is lower than that of the proposed model by 14%. Durmus et al. [61] proposed the AlexNet and SqueezeNet architectures
701 for tomato disease detection using a mobile application. Their model achieved a final accuracy of 94.3%; however, no experimentation was
702 conducted on the test data to prove the robustness of their technique on unseen data. Karthik et al. [15] and Ngugi et al. [51] achieved good
703 overall accuracy; however, the dataset and the number of classes considered for the evaluation were limited. Apart from the limited data, these
704 works suffer from low precision and recall rates, which implies a high false positive rate. This raises a serious generalization issue for
705 the listed deep models. DeepPred has only a minimal false positive rate (see Fig. (20)), which proves its robustness and high specificity against
706 unseen images. Khan et al. [88] presented a novel fusion framework of color balancing and super-pixels to assist tomato disease detection
707 using unsupervised clustering and achieved a final accuracy of 93%. Their model achieved an overall false negative rate and true positive rate of
708 0.06 and 0.94, respectively, which is slightly inferior to the proposed approach. Therefore, the proposed TomFusioNet framework is effective in terms of
709 the aforementioned evaluation metrics when compared with the recent state-of-art techniques.
710
711 6. Conclusion and future work
712 In this research, the proposed TomFusioNet framework is capable of producing an end-to-end disease recognition and illness classification report
713 of tomato crops through a mobile application. The proposed framework not only detects the inherent disease but also provides a secondary
714 feature to classify the illness into five major classes: Early blight, Late blight, Bacterial spot, Leaf mould, and Yellow leaf curl virus via a single
715 data pipeline. The data pipeline also comprises an image processing module that transforms the color space from RGB to HSV to develop a
716 device-independent system that can perform background segmentation using HSV bit-wise masking. The pre-processing step of removing
717 the background clutter aids in focused feature extraction by the deep learning model. For feature extraction, a late fusion strategy of non-
718 linearly aggregating the results of cross-domain transfer learning models is proposed using custom-designed multi-layer perceptron
719 components. The uniqueness of the framework is its two components: DeepRec and DeepPred. While the first one provides a preliminary
720 decision of disease detection, the latter focuses on classifying the type of crop illness. The control passes to DeepPred only in the case of a positive
721 decision by DeepRec; otherwise, the pipeline loops back to receive the next input image. This bi-stage strategy reduces the computational
722 requirement for carrying out multi-class illness classification on healthy crop images. Finally, a mobile application use case is proposed for
723 TomFusioNet to provide an end-to-end solution to the farmers. This application would require a smartphone to receive the input image via
724 camera and perform predictive analysis using the proposed deep learning models. The framework is reliable and can be used as a preliminary
725 tool by the farmers who face tomato disease outbreaks and do not have direct access to government bodies for help. Experimental results reveal
726 that the proposed model achieved the final accuracy rate of 99.93% for disease recognition and 98.32% for multi-class illness classification
727 with a minimal false negative rate on the unseen dataset. The extensive comparative analysis illustrates that the metrics achieved by the
728 proposed model are superior to those of the state-of-art techniques of the past, which proves its effectiveness.
729 Although the TomFusioNet performance is extensively verified, there are a few limitations that can be resolved in the future. Limited
730 data is available for the early and late blight diseases, which show similar signs on a leaf; therefore, the proposed model had a lower recall rate
731 for these categories. The proposed model can be improved by adding the predictive ability for multiple crops (other than tomato) to target a
732 larger agricultural community. The deep model can be extended by leveraging meta-heuristic algorithms such as the Genetic algorithm (GA) and
733 Particle swarm optimization (PSO) algorithm for optimally selecting informative features. This would be highly impactful in improving the
734 performance metrics of the proposed deep learning models. If good results are obtained, this architecture can also be applied to other
735 applications concerning agricultural research.
736
737 CRediT Authorship Contribution Statement: Harshit Kaushik: Conceptualisation, Modelling, Experimentation, Writing, Figures, Editing;
738 Anvi Khanna: Experimentation, Analysis, Modelling; Dilbag Singh: Project administration, Supervision, Analysis, Writing, Editing, Review;
739 Manjit Kaur: Supervision, Analysis, Writing, Editing, Review; Heung-No Lee: Supervision, Analysis, Writing, Editing, Review, Funding
740
741 Acknowledgements: This work was partly supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information
742 Technology Research Center) support program (IITP-2021-2021-0-01835) supervised by the IITP (Institute for Information &
743 Communications Technology Planning & Evaluation) and the National Research Foundation of Korea (NRF) Grant funded by the Korean
744 government (MSIP) (NRF-2021R1A2B5B03002118).
745
746 Competing Interest: The authors declare no competing interest regarding the publication of this paper.
747
748 References:
749 [1] Singh, Anil Kumar, Ashutosh Upadhyaya, Sonia Kumari, Prem K. Sundaram, and Pawan Jeet. "Role of Agriculture in making India $5 trillion Economy
750 under Corona Pandemic Circumstance: Role of agriculture in Indian economy." Journal of AgriSearch 7, no. 2 (2020): 54-58.
751 [2] Kumar, Sanjeev. "Plant disease management in India: Advances and challenges." African Journal of Agricultural Research 9, no. 15 (2014): 1207-1217.
752 [3] Singh, Vipin Kumar, Amit Kishore Singh, and Ajay Kumar. "Disease management of tomato through PGPB: current trends and future perspective." 3
753 Biotech 7, no. 4 (2017): 1-10.
754 [4] Mutka, Andrew M., and Rebecca S. Bart. "Image-based phenotyping of plant disease symptoms." Frontiers in plant science 5 (2015): 734.
755 [5] Selmani, Abdelouahed, Hassan Oubehar, Mohamed Outanoute, Abdelali Ed-Dahhak, Mohammed Guerbaoui, Abdeslam Lachhab, and Benachir Bouchikhi.
756 "Agricultural cyber-physical system enabled for remote management of solar-powered precision irrigation." Biosystems Engineering 177 (2019): 18-30.
757 [6] Sujatha, R., Jyotir Moy Chatterjee, N. Z. Jhanjhi, and Sarfraz Nawaz Brohi. "Performance of deep learning vs machine learning in plant leaf disease
758 detection." Microprocessors and Microsystems 80 (2021): 103615.
759 [7] Hassanien, Aboul Ella, Tarek Gaber, Usama Mokhtar, and Hesham Hefny. "An improved moth flame optimization algorithm based on rough sets for tomato
760 diseases detection." Computers and Electronics in Agriculture 136 (2017): 86-96.
761 [8] Kusuma, Arya, and M. Dalvin Marno Putra. "Tomato maturity classification using naive bayes algorithm and histogram feature extraction." Journal of
762 Applied Intelligent System 3, no. 1 (2018): 39-48.
763 [9] Zhao, Zhong-Qiu, Peng Zheng, Shou-tao Xu, and Xindong Wu. "Object detection with deep learning: A review." IEEE transactions on neural networks and
764 learning systems 30, no. 11 (2019): 3212-3232.
765 [10] Kaushik, Harshit, Dilbag Singh, Manjit Kaur, Hammam Alshazly, Atef Zaguia, and Habib Hamam. "Diabetic retinopathy diagnosis from fundus images
766 using stacked generalization of deep models." IEEE Access 9 (2021): 108276-108292.
767 [11] Sánchez, Francisco Luque, Isabelle Hupont, Siham Tabik, and Francisco Herrera. "Revisiting crowd behaviour analysis through deep learning: Taxonomy,
768 anomaly detection, crowd emotions, datasets, opportunities and prospects." Information Fusion 64 (2020): 318-335.
769 [12] Xu, Wei. "Toward human-centered AI: a perspective from human-computer interaction." interactions 26, no. 4 (2019): 42-46.
770 [13] Zhang, Qingchen, Laurence T. Yang, Zhikui Chen, and Peng Li. "A survey on deep learning for big data." Information Fusion 42 (2018): 146-157.
771 [14] Mohanty, Sharada P., David P. Hughes, and Marcel Salathé. "Using deep learning for image-based plant disease detection." Frontiers in plant science 7
772 (2016): 1419.
773 [15] Karthik, R., M. Hariharan, Sundar Anand, Priyanka Mathikshara, Annie Johnson, and R. Menaka. "Attention embedded residual CNN for disease detection
774 in tomato leaves." Applied Soft Computing 86 (2020): 105933.
775 [16] Saeed, Farah, Muhammad Attique Khan, Muhammad Sharif, Mamta Mittal, Lalit Mohan Goyal, and Sudipta Roy. "Deep neural network features fusion
776 and selection based on PLS regression with an application for crops diseases classification." Applied Soft Computing 103 (2021): 107164.
777 [17] Arsenovic, Marko, Mirjana Karanovic, Srdjan Sladojevic, Andras Anderla, and Darko Stefanovic. "Solving current limitations of deep learning based
778 approaches for plant disease detection." Symmetry 11, no. 7 (2019): 939.
779 [18] Gonzalez-Huitron, Victor, José A. León-Borges, A. E. Rodriguez-Mata, Leonel Ernesto Amabilis-Sosa, Blenda Ramírez-Pereda, and Hector Rodriguez.
780 "Disease detection in tomato leaves via CNN with lightweight architectures implemented in Raspberry Pi 4." Computers and Electronics in Agriculture 181
781 (2021): 105951.
782 [19] Francis, Mercelin, and C. Deisy. "Disease detection and classification in agricultural plants using convolutional neural networks—a visual understanding."
783 In 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 1063-1068. IEEE, 2019.
784 [20] da Costa, Arthur Z., Hugo EH Figueroa, and Juliana A. Fracarolli. "Computer vision based detection of external defects on tomatoes using deep
785 learning." Biosystems Engineering 190 (2020): 131-144.
786 [21] Dhakate, Mrunmayee, and A. B. Ingole. "Diagnosis of pomegranate plant diseases using neural network." In 2015 fifth national conference on computer
787 vision, pattern recognition, image processing and graphics (NCVPRIPG), pp. 1-4. IEEE, 2015.
788 [22] Sannakki, Sanjeev S., Vijay S. Rajpurohit, V. B. Nargund, and Pallavi Kulkarni. "Diagnosis and classification of grape leaf diseases using neural networks."
789 In 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1-5. IEEE, 2013.
790 [23] Barbedo, Jayme GA. "Factors influencing the use of deep learning for plant disease recognition." Biosystems engineering 172 (2018): 84-91.
791 [24] Wertheimer, Max. "Laws of organization in perceptual forms." (1938).
792 [25] Wspanialy, Patrick, and Medhat Moussa. "A detection and severity estimation system for generic diseases of tomato greenhouse plants." Computers and
793 Electronics in Agriculture 178 (2020): 105701.
794 [26] Wang, Xiao-Feng, De-Shuang Huang, Ji-Xiang Du, Huan Xu, and Laurent Heutte. "Classification of plant leaf images with complicated
795 background." Applied mathematics and computation 205, no. 2 (2008): 916-926.
796 [27] Tian, Kai, Jiuhao Li, Jiefeng Zeng, Asenso Evans, and Lina Zhang. "Segmentation of tomato leaf images based on adaptive clustering number of K-means
797 algorithm." Computers and Electronics in Agriculture 165 (2019): 104962.
798 [28] Gleason, Mark Lawrence, and Brooke A. Edmunds. Tomato diseases and disorders. Ames, IA: Iowa State University, University Extension, 2005.
799 [29] Wang, Chunshan, Pengfei Du, Huarui Wu, Jiuxi Li, Chunjiang Zhao, and Huaji Zhu. "A cucumber leaf disease severity classification method based on the
800 fusion of DeepLabV3+ and U-Net." Computers and Electronics in Agriculture 189 (2021): 106373.
801 [30] Kolhar, Shrikrishna, and Jayant Jagtap. "Convolutional neural network based encoder-decoder architectures for semantic segmentation of
802 plants." Ecological Informatics 64 (2021): 101373.
803 [31] Barbedo, Jayme Garcia Arnal. "Plant disease identification from individual lesions and spots using deep learning." Biosystems Engineering 180 (2019): 96-
804 107.
805 [32] Ibrahim, Wan Mohd Rusydan Wan, Norzaidi Mohd Daud, and Roshidi Hassan. "The roles of ICT for knowledge management in agriculture." International
806 Journal of Technology Management and Information System 2, no. 2 (2020): 1-13.
807 [33] Petrellis, Nikos. "Mobile application for plant disease classification based on symptom signatures." In Proceedings of the 21st Pan-Hellenic Conference on
808 Informatics, pp. 1-6. 2017.
809 [34] Verma, Shradha, Anuradha Chug, Amit Prakash Singh, Shubham Sharma, and Puranjay Rajvanshi. "Deep learning-based mobile application for plant
810 disease diagnosis: A proof of concept with a case study on tomato plant." In Applications of image processing and soft computing systems in agriculture, pp.
811 242-271. IGI Global, 2019.
812 [35] Johannes, Alexander, Artzai Picon, Aitor Alvarez-Gila, Jone Echazarra, Sergio Rodriguez-Vaamonde, Ana Díez Navajas, and Amaia Ortiz-Barredo.
813 "Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case." Computers and electronics in agriculture 138 (2017): 200-209.
814 [36] Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of computer vision 60, no. 2 (2004): 91-110.
815 [37] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." In 2005 IEEE computer society conference on computer vision
816 and pattern recognition (CVPR'05), vol. 1, pp. 886-893. IEEE, 2005.
817 [38] Hlaing, Chit Su, and Sai Maung Maung Zaw. "Model-based statistical features for mobile phone image of tomato plant disease classification." In 2017 18th
818 international conference on parallel and distributed computing, applications and technologies (PDCAT), pp. 223-229. IEEE, 2017.
819 [39] Hlaing, Chit Su, and Sai Maung Maung Zaw. "Tomato plant diseases classification using statistical texture feature and color feature." In 2018 IEEE/ACIS
820 17th International Conference on Computer and Information Science (ICIS), pp. 439-444. IEEE, 2018.
821 [40] Kurmi, Yashwant, and Suchi Gangwar. "A leaf image localization based algorithm for different crops disease classification." Information Processing in
822 Agriculture (2021).
823 [41] Sabrol, H., and K. Satish. "Tomato plant disease classification in digital images using classification tree." In 2016 International Conference on
824 Communication and Signal Processing (ICCSP), pp. 1242-1246. IEEE, 2016.
825 [42] Aurangzeb, Khursheed, Farah Akmal, Muhammad Attique Khan, Muhammad Sharif, and Muhammad Younus Javed. "Advanced machine learning
826 algorithm based system for crops leaf diseases recognition." In 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 146-
827 151. IEEE, 2020.
828 [43] Chen, Liang, Paul Bentley, Kensaku Mori, Kazunari Misawa, Michitaka Fujiwara, and Daniel Rueckert. "DRINet for medical image segmentation." IEEE
829 transactions on medical imaging 37, no. 11 (2018): 2453-2462.
830 [44] Zhang, Laigang, Zhou Sheng, Yibin Li, Qun Sun, Ying Zhao, and Deying Feng. "Image object detection and semantic segmentation based on convolutional
831 neural network." Neural Computing and Applications 32, no. 7 (2020): 1949-1958.
832 [45] Mandikal, Priyanka, Navaneet KL, and R. Venkatesh Babu. "3d-psrnet: Part segmented 3d point cloud reconstruction from a single image." In Proceedings
833 of the European Conference on Computer Vision (ECCV) Workshops, pp. 0-0. 2018.
834 [46] Hossain, Mohammad D., and Dongmei Chen. "Segmentation for Object-Based Image Analysis (OBIA): A review of algorithms and challenges from remote
835 sensing perspective." ISPRS Journal of Photogrammetry and Remote Sensing 150 (2019): 115-134.
836 [47] Jaware, Tushar H., Ravindra D. Badgujar, and Prashant G. Patil. "Crop disease detection using image segmentation." World Journal of Science and
837 Technology 2, no. 4 (2012): 190-194.
838 [48] Picon, Artzai, Aitor Alvarez-Gila, Maximiliam Seitz, Amaia Ortiz-Barredo, Jone Echazarra, and Alexander Johannes. "Deep convolutional neural networks
839 for mobile capture device-based crop disease classification in the wild." Computers and Electronics in Agriculture 161 (2019): 280-290.
840 [49] Cremers, Daniel, Mikael Rousson, and Rachid Deriche. "A review of statistical approaches to level set segmentation: integrating color, texture, motion and
841 shape." International journal of computer vision 72, no. 2 (2007): 195-215.
842 [50] Pal, Nikhil R., and Sankar K. Pal. "A review on image segmentation techniques." Pattern recognition 26, no. 9 (1993): 1277-1294.
843 [51] Ngugi, Lawrence C., Moataz Abdelwahab, and Mohammed Abo-Zahhad. "Tomato leaf segmentation algorithms for mobile phone applications using deep
844 learning." Computers and Electronics in Agriculture 178 (2020): 105788.
845 [52] Elangovan, K., and S. Nalini. "Plant disease classification using image segmentation and SVM techniques." International Journal of Computational
846 Intelligence Research 13, no. 7 (2017): 1821-1828.
847 [53] Singh, Vijai, and Ak K. Misra. "Detection of plant leaf diseases using image segmentation and soft computing techniques." Information processing in
848 Agriculture 4, no. 1 (2017): 41-49.
849 [54] Dandawate, Yogesh, and Radha Kokare. "An automated approach for classification of plant diseases towards development of futuristic Decision Support
850 System in Indian perspective." In 2015 International conference on advances in computing, communications and informatics (ICACCI), pp. 794-799. IEEE,
851 2015.
852 [55] Kurmi, Yashwant, Suchi Gangwar, Dheeraj Agrawal, Satrughan Kumar, and Hari Shanker Srivastava. "Leaf image analysis-based crop diseases
853 classification." Signal, Image and Video Processing 15, no. 3 (2021): 589-597.
854 [56] Cao, Chensi, Feng Liu, Hai Tan, Deshou Song, Wenjie Shu, Weizhong Li, Yiming Zhou, Xiaochen Bo, and Zhi Xie. "Deep learning and its applications in
855 biomedicine." Genomics, proteomics & bioinformatics 16, no. 1 (2018): 17-32.
856 [57] Wang, Jinjiang, Yulin Ma, Laibin Zhang, Robert X. Gao, and Dazhong Wu. "Deep learning for smart manufacturing: Methods and applications." Journal
857 of manufacturing systems 48 (2018): 144-156.
858 [58] Uçar, Ayşegül, Yakup Demir, and Cüneyt Güzeliş. "Object recognition and detection with deep learning for autonomous driving
859 applications." Simulation 93, no. 9 (2017): 759-769.
860 [59] Ma, Lei, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. "Deep learning in remote sensing applications: A meta-analysis and
861 review." ISPRS journal of photogrammetry and remote sensing 152 (2019): 166-177.
862 [60] Zhang, Lu, Jianjun Tan, Dan Han, and Hao Zhu. "From machine learning to deep learning: progress in machine intelligence for rational drug
863 discovery." Drug discovery today 22, no. 11 (2017): 1680-1685.
864 [61] Durmuş, Halil, Ece Olcay Güneş, and Mürvet Kırcı. "Disease detection on the leaves of the tomato plants by using deep learning." In 2017 6th International
865 Conference on Agro-Geoinformatics, pp. 1-5. IEEE, 2017.
866 [62] Sladojevic, Srdjan, Marko Arsenovic, Andras Anderla, Dubravko Culibrk, and Darko Stefanovic. "Deep neural networks based recognition of plant diseases
867 by leaf image classification." Computational intelligence and neuroscience 2016 (2016).
868 [63] Hughes, David, and Marcel Salathé. "An open access repository of images on plant health to enable the development of mobile disease diagnostics." arXiv
869 preprint arXiv:1511.08060 (2015).
870 [64] Xue, Su, Aseem Agarwala, Julie Dorsey, and Holly Rushmeier. "Understanding and improving the realism of image composites." ACM Transactions on
871 graphics (TOG) 31, no. 4 (2012): 1-10.
872 [65] Guijarro, Marıa, Gonzalo Pajares, Isabel Riomoros, P. J. Herrera, X. P. Burgos-Artizzu, and Angela Ribeiro. "Automatic segmentation of relevant textures
873 in agricultural images." Computers and Electronics in Agriculture 75, no. 1 (2011): 75-83.
874 [66] Hossain, Eftekhar, Md Farhad Hossain, and Mohammad Anisur Rahaman. "A color and texture based approach for the detection and classification of plant
875 leaf disease using KNN classifier." In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1-6. IEEE, 2019.
876 [67] Ibraheem, Noor A., Mokhtar M. Hasan, Rafiqul Z. Khan, and Pramod K. Mishra. "Understanding color models: a review." ARPN Journal of science and
877 technology 2, no. 3 (2012): 265-275.
878 [68] Ganesan, P., V. Rajini, and R. Immanuvel Rajkumar. "Segmentation and edge detection of color images using CIELAB color space and edge detectors."
879 In INTERACT-2010, pp. 393-397. IEEE, 2010.
880 [69] Ganesan, Poonthalir, V. Rajini, B. S. Sathish, and Khamar Basha Shaik. "HSV color space based segmentation of region of interest in satellite images."
881 In 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 101-105. IEEE, 2014.
882 [70] Nunnari, Fabrizio, Abraham Ezema, and Daniel Sonntag. "The effects of masking in melanoma image classification with CNNs towards international
883 standards for image preprocessing." In International Conference on Wireless Mobile Communication and Healthcare, pp. 257-273. Springer, Cham, 2020.
884 [71] Flusser, Jan, Sajad Farokhi, Cyril Höschl, Tomáš Suk, Barbara Zitová, and Matteo Pedone. "Recognition of images degraded by Gaussian blur." IEEE
885 transactions on Image Processing 25, no. 2 (2015): 790-806.
886 [72] Shorten, Connor, and Taghi M. Khoshgoftaar. "A survey on image data augmentation for deep learning." Journal of big data 6, no. 1 (2019): 1-48.
887 [73] Albawi, Saad, Tareq Abed Mohammed, and Saad Al-Zawi. "Understanding of a convolutional neural network." In 2017 international conference on
888 engineering and technology (ICET), pp. 1-6. IEEE, 2017.
889 [74] Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "Imagenet: A large-scale hierarchical image database." In 2009 IEEE conference
890 on computer vision and pattern recognition, pp. 248-255. IEEE, 2009.
891 [75] Tan, Chuanqi, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. "A survey on deep transfer learning." In International conference
892 on artificial neural networks, pp. 270-279. Springer, Cham, 2018.
893 [76] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich.
894 "Going deeper with convolutions." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9. 2015.
895 [77] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE conference on
896 computer vision and pattern recognition, pp. 770-778. 2016.
897 [78] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
898 [79] Wang, Shui-Hua, Preetha Phillips, Yuxiu Sui, Bin Liu, Ming Yang, and Hong Cheng. "Classification of Alzheimer’s disease based on eight-layer
899 convolutional neural network with leaky rectified linear unit and max pooling." Journal of medical systems 42, no. 5 (2018): 1-11.
900 [80] Kussul, Nataliia, Mykola Lavreniuk, Sergii Skakun, and Andrii Shelestov. "Deep learning classification of land cover and crop types using remote sensing
901 data." IEEE Geoscience and Remote Sensing Letters 14, no. 5 (2017): 778-782.
902 [81] Poria, Soujanya, Erik Cambria, Rajiv Bajpai, and Amir Hussain. "A review of affective computing: From unimodal analysis to multimodal
903 fusion." Information Fusion 37 (2017): 98-125.
904 [82] Xie, Jie, and Mingying Zhu. "Handcrafted features and late fusion with deep learning for bird sound classification." Ecological Informatics 52 (2019): 74-
905 81.
906 [83] Lu, Jiang, Jie Hu, Guannan Zhao, Fenghua Mei, and Changshui Zhang. "An in-field automatic wheat disease diagnosis system." Computers and electronics
907 in agriculture 142 (2017): 369-379.
908 [84] Bhatt, Prakruti, Sanat Sarangi, Anshul Shivhare, Dineshkumar Singh, and Srinivasu Pappula. "Identification of Diseases in Corn Leaves using Convolutional
909 Neural Networks and Boosting." In ICPRAM, pp. 894-899. 2019.
910 [85] Maimaitijiang, Maitiniyazi, Vasit Sagan, Paheding Sidike, Sean Hartling, Flavio Esposito, and Felix B. Fritschi. "Soybean yield prediction from UAV using
911 multimodal data fusion and deep learning." Remote sensing of environment 237 (2020): 111599.
912 [86] Akilan, T., QM Jonathan Wu, Amin Safaei, and Wei Jiang. "A late fusion approach for harnessing multi-CNN model high-level features." In 2017 IEEE
913 International Conference on Systems, Man, and Cybernetics (SMC), pp. 566-571. IEEE, 2017.
914 [87] Smith, Samuel, Erich Elsen, and Soham De. "On the generalization benefit of noise in stochastic gradient descent." In International Conference on Machine
915 Learning, pp. 9058-9067. PMLR, 2020.
916 [88] Khan, Saiqa, and Meera Narvekar. "Novel fusion of color balancing and superpixel based approach for detection of tomato plant diseases in natural complex
917 environment." Journal of King Saud University-Computer and Information Sciences (2020).
918 [89] P. Tm, A. Pranathi, K. SaiAshritha, N. B. Chittaragi and S. G. Koolagudi, "Tomato Leaf Disease Detection Using Convolutional Neural Networks," 2018
919 Eleventh International Conference on Contemporary Computing (IC3), 2018, pp. 1-5, doi: 10.1109/IC3.2018.8530532.