
Computers and Electronics in Agriculture 178 (2020) 105792

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture


journal homepage: www.elsevier.com/locate/compag

A solanaceae disease recognition model based on SE-Inception


Zhenbo Li (a,b,c,⁎), Yongbo Yang (a,b), Ye Li (a,b), RuoHao Guo (a,b), Jinqi Yang (a,b), Jun Yue (d)

a College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
b Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture, Beijing 100083, China
c National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China
d College of Information and Electrical Engineering, LuDong University, Yantai 264025, China

ARTICLE INFO

Keywords: Disease recognition; Batch normalization; SE-Inception; Multi-scale feature extraction; Model implementation

ABSTRACT

Aiming at the diseases of tomato and eggplant, we present a solanaceae disease recognition model based on SE-Inception. Our model uses batch normalization (BN) layers to accelerate network convergence. In addition, the SE-Inception structure and a multi-scale feature extraction module are adopted to improve the accuracy of the model. Our sample data set consists of 4 disease categories (whitefly, powdery mildew, yellow smut and cotton blight) together with healthy leaves. To reduce overfitting, the data set is expanded by data augmentation through translation, rotation and flipping. Experiments show that, on our constructed dataset, the average recognition accuracy of the model is 98.29% and the model size is 14.68 MB. In addition, to verify the robustness of the model, it was also evaluated on the public PlantVillage data set, where its top-1 accuracy, top-5 accuracy and size are 99.27%, 99.99% and 14.8 MB, respectively. Moreover, we implemented a solanaceae disease image recognition system on Android using this model. Its average recognition accuracy and its recognition time for a single photo are 95.09% and 227 ms, respectively. Our model has a small number of parameters while maintaining high accuracy, which can meet the needs of automatic recognition of disease images on mobile devices. Data and code are available at https://github.com/Jujube-sun/diseaseRecognition.

1. Introduction

Plant diseases are one of the main reasons for the decline in the quality of agricultural products and for agricultural economic losses (Liang et al., 2019, Mohanty et al., 2016, Hassanien et al., 2017). Traditional plant disease identification relies mainly on manual observation and empirical judgment, which is slow and subjective. Accurate identification of plant diseases is the basis of rational medication and the key to preventing diseases from spreading.

With the development of image processing and machine vision technology, people have begun to use them to identify crop diseases. The main procedure is to preprocess the image, extract specific image features, and finally classify the extracted features with a classifier. Researchers have used different classifiers to identify crop diseases. Common classifiers include the Support Vector Machine (SVM) (Liran et al., 2017, Kamal et al., 2018), the Bayesian classifier (Liran et al., 2017, Kamal et al., 2018), random forest (Yongquan et al., 2018) and so on.

However, traditional machine vision disease recognition methods rely mainly on manually selected features. Disease image recognition based on deep learning usually uses a convolutional neural network (CNN), which is an end-to-end image recognition method. Since 2012, a series of representative CNN models have been proposed, including AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014), GoogLeNet (Szegedy et al., 2015), ResNet (He et al., 2016) and DenseNet (Huang et al., 2017). These deep learning models are now used in crop disease identification. Nachtigall et al. (2016) used the AlexNet model to recognize apple leaf diseases, reaching a recognition accuracy of 97.3% on a data set of 2539 labeled images covering 6 known disorders. Too et al. (2019) performed plant disease identification by fine-tuning existing deep convolutional neural network models such as VGG16, InceptionV4 (Szegedy et al., 2017) and ResNet. In addition, Zhong and Zhao (2020) identified apple leaf diseases with a DenseNet network combined with regression, multi-label classification and a focal loss function. The accuracies on the test set are 93.51%, 93.31% and 93.71%, respectively.

Although the above methods prove the feasibility of CNNs for plant disease identification, these models have a large number of parameters and long


Corresponding author at: P.O. Box 121, China Agricultural University, 17 Tsinghua East Road, Beijing 100083, PR China.
E-mail address: lizb@cau.edu.cn (Z. Li).

https://doi.org/10.1016/j.compag.2020.105792
Received 16 May 2020; Received in revised form 10 August 2020; Accepted 12 September 2020
Available online 24 September 2020
0168-1699/ © 2020 Elsevier B.V. All rights reserved.

training time, which makes them difficult to deploy on mobile devices. With the popularity of mobile devices, lightweight networks such as MobileNetV1 (Howard et al., 2017), MobileNetV2 (Sandler et al., 2018), ShuffleNetV1 (Zhang et al., 2018) and ShuffleNetV2 (Ma et al., 2018) have also been proposed. Xiaoqing et al. (2019) identified tomato leaf disease images with an improved multi-scale AlexNet, reaching an accuracy of 92.7% with a model size of just 29.9 MB. Yang et al. (2019) compared MobileNetV1 and InceptionV3 [23] for plant disease recognition on mobile devices. Their average recognition rates on the PlantVillage data set are 95.02% and 95.62%, respectively.

To address the problem of large model size, this paper proposes a new solanaceous disease recognition method based on SE-Inception, which is inspired by GoogLeNet. Our model combines multi-scale feature extraction, SENet (Hu et al., 2017), InceptionV2 and batch normalization (Ioffe and Szegedy, 2015). The proposed model was trained and tested on our constructed dataset and on the public PlantVillage dataset (Hughes and Salathe, 2015). We also compared its recognition results and model size with those of several existing lightweight networks, and our model performs well in this comparison.

The remainder of this paper is organized as follows. Section 2 introduces the experimental data and the data preprocessing methods. Section 3 presents the network structure of our model. Experimental results are described in Section 4, and the model implementation is described in Section 5. Finally, the paper is summarized in Section 6.

2. Datasets

2.1. Data acquisition

In this paper, we selected two datasets for the experiments. The first is our constructed solanaceous disease dataset, and the second is the PlantVillage dataset.

Our constructed solanaceous disease dataset consists of two parts. One part comes from the AI Challenger 2018 Crop Disease Challenge (https://challenger.ai/competition/pdr2018) (1315 photos). The other part was taken under natural light in Xinyuan Sunshine Plantation Park, Yongqing County, Langfang City, Hebei Province (520 photos). To reproduce the real natural environment, we took pictures from multiple angles in the morning and in the afternoon. The shooting equipment was a Sony RX100M3 camera and a Huawei Honor 10 mobile phone. A total of 5 types of solanaceae image samples were collected, including 4 kinds of diseases (powdery mildew, whitefly, cotton blight and yellow smut) and healthy leaves. The backgrounds of the images taken in the Xinyuan Sunshine Plantation Park are more complex than those of the AI Challenger images. Examples of the image samples are shown in Fig. 1.

PlantVillage (http://www.plantvillage.org) is a plant disease data set. It contains a large number of plant disease images, covering 13 kinds of plants and 26 types of diseased leaves in total. PlantVillage has a total of 38 categories and 54,305 images of plant disease leaves. In our experiment, we shuffled the dataset and divided it into a training set, a validation set and a test set at a ratio of 6:2:2. The original pictures are resized to 224 × 224 as the input for model training.

2.2. Data augmentation

Counting the total number of samples and their distribution over the categories shows that our constructed dataset is imbalanced. It contains 434 yellow smut, 161 cotton blight, 386 powdery mildew, 104 whitefly and 750 healthy leaf images. Because unbalanced data affect the recognition performance of deep learning models (Buda et al., 2018), data augmentation is performed for the categories with a small amount of data. Color is one of the key features for disease identification, so the color information of the original picture must not be changed during augmentation. Based on the Keras framework, the following three data augmentation methods are mainly adopted: (1) Random flip: flip the image along the horizontal and vertical directions. (2) Random angle rotation: rotate the image by a certain angle with the image center as the origin. (3) Image offset: shift the entire image along the horizontal or vertical direction by a certain distance. The distribution of the augmented data set is: 690 yellow smut, 644 cotton blight, 674 powdery mildew, 602 whitefly and 750 healthy leaf images. A detailed report of the dataset before and after applying the augmentation process is shown in Table 1.

3. Architecture of our constructed model

Our model uses the network structure of GoogLeNet as a reference to construct a new, lighter convolutional neural network with BN layers, a multi-scale feature extraction module and the SE-Inception (Szegedy et al., 2016, Hu et al., 2017) structure. To improve operating efficiency, the model needs to reduce its memory requirements while ensuring recognition accuracy.

3.1. Multi-scale feature extraction

Because different diseases differ in morphology and features, a multi-scale feature extraction module is proposed, which uses convolution kernels of different scales to extract features from the input pictures. The multi-scale feature extraction module can extract multiple local features simultaneously. In a convolutional neural network, the low-level convolutions retain as much of the original picture information as possible, mainly extracting simple features such as the color, texture and edges of the image, while the features extracted by the high-level convolutions are abstract and global (Yu et al., 2017).

The first layer in the GoogLeNet model uses a large 7 × 7 convolution kernel. Generally, a large-scale convolution is used at the

Fig. 1. Sample images of our constructed dataset.


Table 1
Detailed report of the constructed dataset before and after applying the augmentation process.

Class  Name             Original (AI)  Original (Xinyuan)  Original  After augmentation
1      Yellow smut      261            173                 434       690
2      Cotton blight    0              161                 161       644
3      Powdery mildew   343            43                  386       674
4      Whitefly         0              104                 104       602
5      Healthy leaves   711            39                  750       750
Total images            1315           520                 1835      3360
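The flip and offset operations listed in Section 2.2 only move pixels around, which is why they leave the color values untouched. The sketch below illustrates this on a tiny nested-list "image". It is not the authors' Keras pipeline: the function names, the zero fill value for vacated pixels and the toy image are assumptions made for illustration, and arbitrary-angle rotation is omitted for brevity.

```python
def flip_horizontal(img):
    """Mirror every row (left-right flip)."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows (up-down flip)."""
    return [row[:] for row in img[::-1]]


def shift(img, dx=0, dy=0, fill=0):
    """Shift the image dx columns to the right and dy rows down.

    Pixels pushed outside the frame are dropped; vacated positions
    receive `fill` (an assumed value, not taken from the paper).
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


# Toy 2 x 3 single-channel image (hypothetical values).
image = [[1, 2, 3],
         [4, 5, 6]]

flipped_h = flip_horizontal(image)
flipped_v = flip_vertical(image)
shifted_right = shift(image, dx=1)
shifted_down = shift(image, dy=1)
```

Every output pixel is either a copy of an input pixel or the fill value, so no color value is altered, in line with the constraint stated in the text.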

bottom layer to extract coarse-grained features such as edges and contours. When identifying different diseases of solanaceae crops, the following issues need to be considered: (1) The scale of different disease spots is different. In Fig. 1, the disease spots of powdery mildew and whitefly are relatively small and scattered, while those of cotton blight are more obvious. (2) The characterization information of different diseases is similar. The powdery mildew and whitefly disease spots in Fig. 1 are relatively small and scattered. The small color and texture differences (fine-grained characteristics) are the key to distinguishing these diseases.

In summary, the identification of different solanaceous diseases needs to consider both coarse-grained features (the size of the lesion) and fine-grained features (small differences in color and texture). In addition, the comprehensive extraction of multiple features is the key to characterizing the disease. Therefore, convolution kernels of different sizes are set on the first layer of the model to improve the response of the bottom layer to features of different granularity. Four different convolution kernels of 1 × 1, 3 × 3, 5 × 5 and 7 × 7 are used. The numbers of small convolution kernels (1 × 1, 3 × 3) and large convolution kernels (5 × 5, 7 × 7) are 32 and 16, respectively. The feature maps obtained after the convolution operations are merged into one tensor and passed on. The specific structure is shown in Fig. 2.

3.2. Batch normalization

To speed up the convergence of the network and prevent the model from overfitting, a Batch Normalization (BN) layer is added after each convolutional layer in the network. The main purpose of the BN layer is to pull the distribution of the input values of any neuron in each layer of the network back to the standard normal distribution with a mean of 0 and a variance of 1.

The algorithm of the BN layer is as follows. The input of the batch normalization algorithm is the values of x in a batch, $B = \{x_{1 \ldots m}\}$; the parameters to be learned are $\gamma$ and $\beta$; the output is $\{y_i = BN_{\gamma,\beta}(x_i)\}$.

First we calculate the mean and variance of the batch $B = \{x_{1 \ldots m}\}$:

$$ \mu_B \leftarrow \frac{1}{m}\sum_{i=1}^{m} x_i \tag{1} $$

$$ \sigma_B^2 \leftarrow \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{2} $$

In the above formulas, $\mu_B$ represents the batch mean and $\sigma_B^2$ represents the batch variance. After that, the data are normalized to obtain data $\hat{x}_i$ with mean 0 and variance 1:

$$ \hat{x}_i \leftarrow \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{3} $$

Finally, the original feature distribution is restored through reconstruction:

$$ y_i \leftarrow \gamma \hat{x}_i + \beta = BN_{\gamma,\beta}(x_i) \tag{4} $$

$$ \gamma = \sqrt{\mathrm{Var}[x_i]} \tag{5} $$

$$ \beta = E[x_i] \tag{6} $$

where $\gamma$ and $\beta$ are the parameters to be learned, Var represents the variance, and E represents the mean. When $\gamma$ and $\beta$ in formula (4) take the values of formula (5) and formula (6) respectively, the original feature distribution of a layer can be restored.

3.3. InceptionV2

Our model uses the InceptionV2 structure to decrease the number of parameters and thus reduce the size of the model. InceptionV2 is an improvement of the original InceptionV1 structure. The structures of InceptionV1 and InceptionV2 are shown in Fig. 3.

The InceptionV1 module is the basic module in GoogLeNet. It improves the utilization of network resources and has fewer parameters than VGGNet and AlexNet. InceptionV1 has 4 branches composed of 1 × 1, 3 × 3 and 5 × 5 convolution kernels. A convolutional neural network can obtain multi-scale feature maps by using different convolution kernels. The 1 × 1 convolution placed before the 3 × 3 and 5 × 5 convolutions reduces the dimension and the computational bottleneck while increasing the number of network layers, which can improve the expressive ability of the network.

The InceptionV2 module replaces the 5 × 5 convolution in InceptionV1 with two 3 × 3 convolutions. This has two advantages: first, a large number of parameters can be saved; second, the module can handle richer spatial features and increase the diversity of features. In order to accelerate the network

Fig. 2. Multi-scale feature extraction module.

Fig. 3. Structure of InceptionV1 and InceptionV2.
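The batch normalization steps of formulas (1) to (4) in Section 3.2 can be traced on a small batch of scalars. The following is a minimal sketch rather than the layer used in the paper: the function name, the toy batch and the value of the stabilizing constant `eps` are assumptions, since the source does not give them.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars, then scale by gamma and shift by beta.

    eps is an assumed small constant that keeps the division stable.
    """
    m = len(xs)
    mu = sum(xs) / m                                        # formula (1)
    var = sum((x - mu) ** 2 for x in xs) / m                # formula (2)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]   # formula (3)
    ys = [gamma * xh + beta for xh in x_hat]                # formula (4)
    return ys, mu, var


batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations

# With gamma = 1 and beta = 0 the output has mean 0 and variance close to 1.
normalized, mu, var = batch_norm(batch)

# Choosing gamma = sqrt(Var[x]) and beta = E[x], as in formulas (5) and (6),
# recovers the original values up to the small effect of eps.
restored, _, _ = batch_norm(batch, gamma=math.sqrt(var), beta=mu)
```

The second call illustrates the remark in the text that the learned parameters can undo the normalization when that is what the network needs.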


Fig. 4. SENet module.


[Fig. 4 diagram labels: X (H′ × W′ × C′) → Ftr → U (H × W × C) → Fsq → 1 × 1 × C → Fex (through 1 × 1 × C/r) → 1 × 1 × C → Fscale → H × W × C]

Fig. 5. Structure of SE-InceptionV2.

Fig. 7. Accuracy curves of different modules on training set.

Fig. 6. Structure of our model.

Table 2
Structure of the base model.

Type         Patch Size/Stride  Output Size
convolution  7 × 7/2            112 × 112 × 64
max pool     3 × 3/2            56 × 56 × 64
convolution  3 × 3/1            56 × 56 × 192
max pool     3 × 3/2            28 × 28 × 192
InceptionV2  1 × 1, 3 × 3/1     28 × 28 × 384
InceptionV2  1 × 1, 3 × 3/2     14 × 14 × 480
InceptionV2  1 × 1, 3 × 3/2     7 × 7 × 512
avg pool     7 × 7/1            1 × 1 × 512
linear                          512
softmax                         n
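The spatial sizes in Table 2 follow from the strides alone if the strided layers are assumed to use "same" padding, so that a layer with stride s maps a side length n to ceil(n / s), and the final 7 × 7 average pool is assumed to use no padding. The table does not state the padding mode, so both assumptions, and the helper names, are illustrative only.

```python
import math


def same_out(size, stride):
    """Output side length under 'same' padding (assumed): ceil(size / stride)."""
    return math.ceil(size / stride)


def valid_out(size, kernel, stride):
    """Output side length with no padding: floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1


# (layer type, stride) for the rows of Table 2 that precede the average pool.
layers = [
    ("convolution", 2),
    ("max pool", 2),
    ("convolution", 1),
    ("max pool", 2),
    ("InceptionV2", 1),
    ("InceptionV2", 2),
    ("InceptionV2", 2),
]

size = 224  # input side length used in the paper
sizes = []
for _name, stride in layers:
    size = same_out(size, stride)
    sizes.append(size)

# The 7 x 7 average pool collapses the remaining 7 x 7 map to a single value.
final = valid_out(size, kernel=7, stride=1)
```

Under these assumptions the computed side lengths match the Output Size column of Table 2 row by row.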

convergence speed, the BN layer is added after the convolution in the InceptionV2 structure.
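The parameter arguments in Sections 3.1 and 3.3 can be checked with a little arithmetic. The sketch below counts convolution weights only (no biases). It assumes a 3-channel RGB input, reads "32 and 16" as 32 filters for each small kernel size and 16 for each large one, and uses a hypothetical channel count of 64 for the comparison between one 5 × 5 convolution and two stacked 3 × 3 convolutions; none of these choices is confirmed by the source.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


# Multi-scale first layer: kernel size -> number of filters (assumed reading).
first_layer = {1: 32, 3: 32, 5: 16, 7: 16}
rgb_channels = 3  # assumed input depth

# Concatenating the four branches gives this many output channels.
out_channels = sum(first_layer.values())
first_layer_weights = sum(
    conv_weights(k, rgb_channels, n) for k, n in first_layer.items()
)

# InceptionV2 idea: replace one 5 x 5 convolution by two 3 x 3 convolutions.
c = 64  # hypothetical channel count, same for input and output
weights_5x5 = conv_weights(5, c, c)
weights_two_3x3 = 2 * conv_weights(3, c, c)
saving = 1 - weights_two_3x3 / weights_5x5
```

With equal input and output channels, the two stacked 3 × 3 convolutions need 18 weights per channel pair instead of 25, a reduction of 28% while covering the same 5 × 5 receptive field.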
Fig. 8. Loss curves of different modules on training set.

3.4. Squeeze-and-Excitation Networks

Squeeze-and-Excitation Networks (SENet) is the champion model of the classification task of the last ImageNet Large Scale Visual Recognition Challenge. SENet focuses on the relationship between channels and learns the importance of the different channel features through the network. The SENet module can be divided into three steps: compression, excitation and reconstruction. Compression and excitation are the core procedures of SENet. The SENet module is shown in Fig. 4.

$F_{tr}$ represents the convolution process from the feature map X to the feature map U. $F_{sq}$ is the compression operation of the module, which encodes the entire spatial information of a channel as a global feature.


Fig. 11. Comparison of the training accuracy and validation accuracy.


Fig. 9. Accuracy curves of different modules on validation set.

Fig. 12. Comparison of the training loss and validation loss.

Fig. 10. Loss curves of different modules on the validation set.

Table 3
The influence of different modules on the model.

Models            Size (MB)  Epoch  Parameters  FLOPs        Accuracy (%)
Base              13.1       27     1.71M       1.02 GFLOPs  85.47
Base_Multi        13.5       47     1.76M       1.15 GFLOPs  91.17
Base_BN           13.3       12     1.76M       1.15 GFLOPs  81.48
Base_Multi_BN     13.7       14     1.76M       1.16 GFLOPs  96.58
Base_Multi_BN_SE  14.6       24     1.86M       1.16 GFLOPs  98.29

Table 4
Classification results of SE-Inception. FLOPs are estimated for input of 3 × 224 × 224.

Model          FLOPs          Parameters  Accuracy (%)  Size (MB)
1SE-Inception  710.72 MFLOPs  0.51M       92.02         3.3
2SE-Inception  984.79 MFLOPs  0.95M       95.73         6.78
3SE-Inception  1.16 GFLOPs    1.84M       97.15         10.4
4SE-Inception  1.3 GFLOPs     2.03M       96.01         15.9
Ours           1.16 GFLOPs    1.86M       98.29         14.68

Fig. 13. Accuracy curve of Solanaceae validation set.

The entire step is implemented by global average pooling. The specific formula is as follows.

$$ z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j) \tag{7} $$

Through the $F_{sq}$ operation, the original feature map of size H × W × C is compressed into 1 × 1 × C. H, W, and C represent the


Fig. 14. Loss curve of the Solanaceae validation set.

Fig. 15. Precision curve of PlantVillage training set.

Table 5
Classification results on our constructed dataset. FLOPs are estimated for input of 3 × 224 × 224.

Model         FLOPs          Parameters  Accuracy (%)  Size (MB)
GoogLeNet     1.51 GFLOPs    9.95M       89.17         87.20
MobileNetV1   583.92 MFLOPs  3.21M       94.02         24.88
MobileNetV2   320.24 MFLOPs  2.23M       94.21         17.84
MobileNetV3   219.65 MFLOPs  4.2M        95.44         67.35
ShuffleNetV2  150.6 MFLOPs   1.26M       86.78         31.49
Ours          1.16 GFLOPs    1.86M       98.29         14.68

Fig. 16. Loss curve of PlantVillage training set.

height and width of the original feature map and the number of channels, respectively. $F_{ex}$ stands for the excitation operation, which passes the global features obtained by the $F_{sq}$ operation through a fully connected layer, a ReLU activation layer, a second fully connected layer and a Sigmoid layer in sequence, and learns the weight coefficient of each channel. The formula is shown below.

$$ s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\mathrm{ReLU}(W_1 z)) \tag{8} $$

In formula (8), $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, and $\sigma$ is the Sigmoid function. To better fit the complex correlation between channels, and to greatly reduce the number of parameters and calculations while adding more nonlinearity, the number of neurons in the first fully connected layer is C divided by r, where r is the compression ratio of the channel. The dimension is then increased by the second fully connected layer to obtain the 1 × 1 × C activation feature map. In addition, because of the correlation between the channels, Sigmoid is used instead of Softmax after the second fully connected layer. Finally, the result is restored to the original feature map size through $F_{scale}$.

Adding a residual attention mechanism to the classification network can gather local features of the target in the image and improve the recognition accuracy. Because SENet is easy to port, it is combined with InceptionV2 to obtain the SE-InceptionV2 structure shown in Fig. 5.

3.5. Overall structure and parameters

The model consists of a multi-scale convolution module, a

Table 6
Detailed results of specific diseases.

Model         Index      YellowSmut  CottonBlight  PowderyMildew  Whitefly  Healthy
MobileNetV1   precision  88.89       100.00        93.15          86.96     97.33
              recall     91.95       96.88         88.31          95.24     97.33
MobileNetV2   precision  89.66       96.88         90.91          95.44     97.99
              recall     89.66       96.88         90.91          100.00    97.33
MobileNetV3   precision  92.13       100.00        97.33          80.00     98.66
              recall     94.25       90.62         94.81          95.24     98.00
ShuffleNetV2  precision  80.23       96.88         93.22          45.24     97.30
              recall     79.31       96.88         71.43          90.48     96.00
GoogLeNet     precision  95.52       91.18         82.14          56.76     97.24
              recall     73.56       96.88         89.61          100.00    94.00
Our Model     precision  96.67       100.00        97.37          90.91     99.33
              recall     100.00      93.75         96.10          95.24     98.67
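Table 6 reports per-class precision and recall without spelling out how they are computed. Assuming the standard definitions (precision is the share of images predicted as a class that truly belong to it; recall is the share of a class's images that are predicted correctly), they can be read off a confusion matrix as sketched below. The 3-class matrix and the function name are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall(confusion, k):
    """Return (precision, recall) of class k.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    true_positive = confusion[k][k]
    predicted_as_k = sum(row[k] for row in confusion)  # column sum
    actually_k = sum(confusion[k])                     # row sum
    precision = true_positive / predicted_as_k if predicted_as_k else 0.0
    recall = true_positive / actually_k if actually_k else 0.0
    return precision, recall


# Hypothetical confusion matrix for three classes (rows: true, columns: predicted).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]

p0, r0 = precision_recall(confusion, 0)
p2, r2 = precision_recall(confusion, 2)
```

In this toy matrix, class 0 loses precision because images of the other two classes are predicted as class 0, which is the same mechanism that lowers precision on a class with small, easily confused disease spots.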


convolution module, 2 maximum pooling layers, 3 SE-Inception modules, 1 SE module, an average pooling layer and a fully connected layer. The convolution module is composed of a convolution layer and a BN layer. The specific structure is shown in Fig. 6.
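The squeeze, excitation and scale steps described in Section 3.4 can be traced on a tiny feature map. The sketch below is a plain-Python illustration under stated assumptions, not the authors' implementation: it uses C = 2 channels, a compression ratio r = 2, hand-picked weight matrices `W1` and `W2`, and no bias terms.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def se_block(U, W1, W2):
    """U: list of C channels, each an H x W nested list.

    W1 has shape (C/r) x C and W2 has shape C x (C/r), as in formula (8).
    """
    # Squeeze, formula (7): global average pooling per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in U]
    # Excitation, formula (8): FC -> ReLU -> FC -> Sigmoid.
    s = sigmoid(matvec(W2, relu(matvec(W1, z))))
    # Scale: reweight every channel by its learned coefficient.
    out = [[[s_c * px for px in row] for row in ch] for s_c, ch in zip(s, U)]
    return z, s, out


# Hypothetical 2-channel, 2 x 2 feature map and weights.
U = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1, mean 2.0
]
W1 = [[0.5, -0.5]]       # (C/r) x C = 1 x 2
W2 = [[1.0], [-1.0]]     # C x (C/r) = 2 x 1

z, s, out = se_block(U, W1, W2)
```

Because a Sigmoid is applied per channel rather than a Softmax across channels, each coefficient in `s` lies between 0 and 1 independently, so several channels can be emphasized at once.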

4. Experimental results and analysis

4.1. Experiment platform

The experimental platform runs the 64-bit Ubuntu 16.04 LTS operating system and is equipped with an Intel® Xeon® E5-2683 v3 CPU and an NVIDIA GeForce GTX 1080 Ti GPU. Python is used for programming. The deep learning frameworks used are TensorFlow 1.14 and Keras 2.2.5.

4.2. Training parameters


Fig. 17. Precision curve of PlantVillage validation set.
Taking into account both the performance of the hardware and the training results, training and testing are carried out in batches of 16 pictures, so the batch size is 16. The number of iterations is set to 40. The loss function is the cross-entropy loss, the weights are initialized with the Xavier method, the biases are initialized to 0, and the classification layer uses the Softmax function. Stochastic Gradient Descent (SGD) is used to optimize the model. The initial learning rate is set to 0.1 and the momentum is 0.9. The learning rate changes dynamically with the accuracy on the validation set: if the validation accuracy does not increase after 3 iterations, the learning rate is reduced by half. At the same time, to prevent overfitting, the early_stopping parameter is set: if the validation loss does not drop after 4 iterations, model training is considered complete. For both training and testing, the pictures are uniformly resized to 224 × 224 as the input of the model.
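The learning-rate and early-stopping rules above can be replayed on a sequence of per-epoch validation metrics. The following sketch is a simplified, framework-free reading of those rules; the function name, the exact patience bookkeeping and the metric values are assumptions. Keras provides the `ReduceLROnPlateau` and `EarlyStopping` callbacks for this kind of schedule, but the text does not say how the authors configured them.

```python
def train_schedule(val_acc, val_loss, lr=0.1, lr_patience=3,
                   stop_patience=4, factor=0.5):
    """Replay validation metrics and apply the two rules of Section 4.2.

    Returns the learning rate used in each epoch and the index of the
    epoch at which training stops early (None if it never stops).
    """
    best_acc, best_loss = float("-inf"), float("inf")
    acc_wait = loss_wait = 0
    lrs = []
    for epoch, (acc, loss) in enumerate(zip(val_acc, val_loss)):
        lrs.append(lr)
        # Rule 1: halve the learning rate after 3 epochs without a better accuracy.
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
            if acc_wait >= lr_patience:
                lr *= factor
                acc_wait = 0
        # Rule 2: stop after 4 epochs without a lower validation loss.
        if loss < best_loss:
            best_loss, loss_wait = loss, 0
        else:
            loss_wait += 1
            if loss_wait >= stop_patience:
                return lrs, epoch
    return lrs, None


# Hypothetical validation curves.
val_acc = [0.80, 0.85, 0.85, 0.84, 0.85, 0.86, 0.86]
val_loss = [0.60, 0.50, 0.52, 0.51, 0.53, 0.55, 0.54]

lrs, stopped_at = train_schedule(val_acc, val_loss)
```

In this replay the accuracy stalls for three epochs after epoch 1, so the rate drops from 0.1 to 0.05 for epoch 5, and the loss has not improved for four epochs by then, so training stops at epoch 5.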

Fig. 18. Loss curve of PlantVillage validation set.

Table 7
Classification results on validation set of PlantVillage.

Models        Size (MB)  Valid accuracy (%)
MobileNetV1   25.1       97.26
MobileNetV2   18.1       97.93
MobileNetV3   67.6       97.89
GoogLeNet     87.4       96.52
ShuffleNetV2  31.7       95.32
Our Model     14.8       99.27

Table 8
Detailed results on test set of PlantVillage. FLOPs are estimated for input of 3 × 224 × 224.

Model         FLOPs          Parameters  Top-1 (%)  Top-5 (%)
GoogLeNet     2 GFLOPs       10.05M      96.83      99.91
MobileNetV1   583.92 MFLOPs  3.25M       97.26      99.89
MobileNetV2   320.24 MFLOPs  2.27M       97.94      99.94
MobileNetV3   219.65 MFLOPs  4.24M       97.98      99.95
ShuffleNetV2  150.6 MFLOPs   1.29M       95.50      99.71
Ours          1.16 GFLOPs    1.89M       99.38      99.99

4.3. Model evaluation index

Taking into account the hardware and software constraints of mobile devices, the model is required to have a small memory requirement while achieving high precision. Therefore, the average recognition accuracy, the FLOPs and the number of parameters are used to measure the performance of the model.

4.3.1. The average recognition accuracy of the model

The average recognition accuracy is an important indicator of model performance. The average accuracy (AA) is calculated as follows:

$$ AA = \frac{1}{n_c}\sum_{i=1}^{n_c}\frac{N_i}{N_{Ti}} \tag{9} $$

where $n_c$ is the number of categories in the sample: $n_c = 5$ on our constructed dataset and $n_c = 38$ on the PlantVillage dataset. $\frac{N_i}{N_{Ti}}$ is the recognition accuracy of the i-th category, where $N_i$ is the number of pictures in the i-th category that are predicted as i, that is, the true positives (TP), and $N_{Ti}$ is the total number of samples in the i-th category.

4.3.2. Model FLOPs and parameters

The number of model parameters is an important indicator for deployment on mobile platforms, because it determines the size of the model. The memory space of a mobile platform is limited. If the model is too large, the application will load slowly and will not respond in time. The number of floating-point operations (FLOPs) is used to measure model complexity. Therefore, the size of the model is compressed as much as possible for the sake of file portability. To meet the hardware conditions of mobile devices, the model FLOPs and the number of parameters should be small.
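Formula (9) averages the per-category accuracies, so every category counts equally regardless of how many samples it has. The sketch below computes AA next to the plain overall accuracy for comparison; the function name and the per-category counts are hypothetical and are not results from the paper.

```python
def average_accuracy(correct_per_class, total_per_class):
    """Formula (9): mean over categories of N_i / N_Ti."""
    n_c = len(total_per_class)
    ratios = [n_i / n_t for n_i, n_t in zip(correct_per_class, total_per_class)]
    return sum(ratios) / n_c


# Hypothetical counts for five categories (correct predictions, total samples).
correct = [9, 8, 10, 5, 19]
total = [10, 10, 10, 10, 20]

aa = average_accuracy(correct, total)
overall = sum(correct) / sum(total)  # sample-weighted accuracy, for contrast
```

Here the weak fourth category pulls AA down to 0.83, while the overall accuracy is 0.85 because the largest category dominates the sample-weighted figure.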


[Fig. 19 flowchart branch labels: "Less than threshold"; "Greater than or equal to threshold"]

Fig. 19. Process design of solanaceous disease identification system.

4.4. Evaluation of model structure

To explore the influence of the multi-scale feature extraction module, the BN layer and SENet on the constructed model, experiments were carried out on our constructed dataset. The dataset has a total of 3360 images, which are divided into a training set and a validation set at a ratio of 8:2. The structure of the base model is shown in Table 2. It contains 2 convolutional layers, 2 maximum pooling layers, 3 InceptionV2 modules, 1 maximum pooling layer, and 1 fully connected layer. The base model and the models compared with it use the same training parameters and training methods, following Section 4.2. The accuracy and loss curves on the training set are shown in Figs. 7 and 8, respectively. Fig. 9 shows the accuracy curves of the different models on the validation set. Fig. 10 shows the loss curves of the different models on the validation set.

4.4.1. Evaluation of multi-scale feature extraction

To enhance the model's extraction of disease features at different scales, the first convolution operation of the base model was replaced with a multi-scale feature extraction module. This module performs feature extraction on the input picture through parallel 1 × 1, 3 × 3, 5 × 5 and 7 × 7 convolutions. In Figs. 7 and 9, Base represents the result of the network without the multi-scale feature extraction module, and Base_Multi represents the result after adding the module. Comparing these curves, it can be concluded that the multi-scale feature extraction module helps to extract disease features at different scales, thereby improving the recognition accuracy of the model. Figs. 8 and 10 also show that the loss is reduced after adding this module. The recognition accuracy of the model is 91.03%, which is higher than the benchmark model's 85.65%. Besides, the model size is just 13.5 MB.

4.4.2. Evaluation of BN

We added BN to the base model to explore its effect on the model. In Figs. 7–10, the blue curve represents Base_BN and the purple one represents Base. According to Figs. 7 and 8, the performance of the model on the training set improved significantly after adding BN, and the convergence time of the model was reduced. However, its performance on the validation set was not very good. We then added a BN layer after each convolutional layer of the model with the multi-scale feature extraction module, to explore the effect of combining BN and multi-scale feature extraction on recognition. Comparing the orange and green curves in Figs. 7 and 8, we found that the accuracy of Base_Multi_BN increased and its loss declined on the training set. The training time was also shortened by adding the BN layer. It can be seen from Fig. 9 that the average recognition accuracy is 96.58%, which is 5.7% higher than that of Base_Multi. After adding the BN layer, the size of the Base_Multi_BN model is 13.7 MB, hardly larger than that of the previous Base_Multi model.

4.4.3. Evaluation of SENet

To further improve the recognition performance of the model, we combined the original InceptionV2 structure and SENet into SE-InceptionV2. As shown by Base_Multi_BN_SENet in Figs. 7 and 8, the final model performed slightly worse than Base_Multi_BN and Base_BN on the training set. Base_Multi_BN_SENet in Fig. 9 is the


experimental results show that our model meets the requirements of


both high precision and small model size.

4.4.4. Evaluation of model organization


We conducted ablation experiments to determine the architecture of
the model. We deleted InceptionV2 module in Base_Multi_BN model
mentioned in Section 4.4.2 and used it as the basic model. Based on the
basic model , we used different numbers of SE-Inception modules to
experiment on our constructed dataset. The specific experimental re-
sults are shown in Table 4. At the beginning, the accuracy rate in-
creased as the number of SENet modules increased, according to
Table 4. The model had better results when using 3 SE-Inception. We
used 3 SE-Inception modules and 1 SENet module to build the model.
As can be seen from Table 4, the accuracy of our model reaches to
98.29%, which is higher than 4SE-Inception.

4.5. Comparative evaluation

4.5.1. Evaluation on our constructed datasets


Based on our constructed disease dataset, a comparative experiment
was carried out with MobileNetV1, MobileNetV3 , ShuffleNetV2 and
GoogLeNet. We trained the compared models from scratch. The dataset
is consistent with Section 4.4 dataset division. Uniformly, we also select
the model after the validation set loss convergence as the final pre-
servation model. The comparison of the training and validation curves
about our model are shown in Figs. 11 and 12, respectively. And we
show accuracy and loss curves of our model and compared models on
the validation set in Figs. 13 and 14.
According to Figs. 11 and 12, our model performs well on both the
training set and the validation set. It can be seen from Fig. 13 that the
recognition accuracy of our model is higher than that of MobileNetV1,
MobileNetV2, MobileNetV3, GoogLeNet and ShuffleNetV2. The average
recognition accuracy of our model reached 98.29%. Although the
convergence speed of ShuffleNetV2 is relatively fast, its accuracy is only
86.78%. Compared with MobileNetV3, our constructed model iterated
5 times more, but the accuracy has increased by 2.85 percentage points.
It can be seen from Fig. 14 that the loss of our constructed model is
lower than other models. The Parameters, FLOPs, size and accuracy of
the model are described in Table 5. The identification results of specific
diseases are shown in Table 6.
According to Table 5, our model has the highest accuracy and the
smallest size among these models: 98.29% and 14.68 MB, respectively.
Our model has fewer parameters than every other model except
ShuffleNetV2. Its weakness is that it requires 1.16 GFLOPs, which is
higher than MobileNetV1, MobileNetV2, MobileNetV3 and ShuffleNetV2.
The average recognition accuracies of MobileNetV1, MobileNetV2,
MobileNetV3, GoogLeNet and ShuffleNetV2 are 94.02%, 94.21%, 95.44%,
89.17% and 86.78%, respectively. Besides, these models are larger than
ours.
We show the precision and recall for each category in Table 6. It can
be seen from Table 6 that the precision of our model on yellow smut,
cotton blight, powdery mildew, whitefly and healthy leaves is 96.67%,
100%, 97.37%, 90.91% and 99.33%, respectively. The disease spots of
cotton blight are obvious, so all models perform well on it. However,
the disease spots of powdery mildew and whitefly are small and similar.
The models misjudge whitefly as another disease, so the precision on
whitefly is low.

Fig. 20. Examples of mobile devices identification.

Table 9
Application test of Solanaceae disease recognition system based on
Android platform.

Model          Recognition time (ms)   Accuracy (%)   System size (MB)
GoogLeNet      367                     88.01          120
MobileNetV1    219                     94.27          89.84
MobileNetV2    350                     93.19          85.38
MobileNetV3    235                     95.36          110
ShuffleNetV2   205                     85.83          92.34
Ours           227                     95.09          84.84

validation set curve of our final model. It can be seen from the curve
that after adding SENet, the average recognition accuracy of the model
has increased from 96.58% to 98.29%. In addition, the size of the model
has only increased slightly, from 13.7 MB in Base_Multi_BN to 14.6 MB.
The specific influence of different modules on the model recognition
effect is shown in Table 3. It can be seen from Table 3 that multi-scale
feature extraction and SENet help the model improve the recognition
effect, and BN helps accelerate the convergence of the model.
According to Table 3, the number of parameters increases to 1.86 M after
adding SENet, while the accuracy also increases to 98.29%. We can also
find that the parameters and FLOPs of BN and Multi are not high. The above

4.5.2. Evaluation on PlantVillage datasets

In order to verify the robustness of this model, it was also evaluated
on the public PlantVillage dataset. The PlantVillage dataset is divided
into a training set, a validation set and a test set at a ratio of
6:2:2. The dataset has a total of 54,305 disease pictures, including
32,571 pictures in the training set, 10,852 pictures in the validation
set, and 10,852 pictures in the test set. It includes 13 kinds of plants
and 26 types of diseased leaves in total. PlantVillage also includes 12
types of
healthy leaves, so the number of categories is 38. Using the training
parameters in Section 4.2, we trained the compared models from scratch.
The accuracy curve and loss curve on the training set are shown in
Figs. 15 and 16, respectively. The accuracy and loss curves on the
validation set are shown in Figs. 17 and 18. According to Figs. 15 and
16, our model performs better than GoogLeNet. Our number of iterations
is higher than that of the other models, and our model's performance on
the training set is slightly inferior to the other models. However, as
can be seen from Fig. 17, the recognition accuracy of our model on the
validation set is higher than that of MobileNetV1, MobileNetV2,
MobileNetV3, GoogLeNet and ShuffleNetV2. As can be seen from Fig. 18,
the loss of our model is lower than that of the other models. The
validation set accuracy and model size of the different models on
PlantVillage are shown in Table 7. FLOPs, parameters, top-1 and top-5
accuracy are shown in Table 8.
It can be seen from Table 7 that the accuracy of our model on the
validation set is 99.27%, which is 1.34 percentage points higher than
MobileNetV2. The validation set accuracies of MobileNetV1, MobileNetV2,
MobileNetV3, GoogLeNet and ShuffleNetV2 are 97.26%, 97.93%, 97.89%,
96.52% and 95.32%, respectively. The size of our model is 14.8 MB,
which is smaller than the others. The results of the models on the test
set are given in Table 8: our top-1 and top-5 accuracies are the
highest among these models. Our model has 1.29 M parameters, fewer than
MobileNetV1, MobileNetV2, MobileNetV3 and GoogLeNet. Its weakness is
its high FLOPs. The experimental results show that the model is robust
and performs well on both our constructed dataset and the public
dataset.

5. Solanaceae disease recognition system based on SE-Inception

We used SE-Inception to develop a Solanaceae disease recognition
system based on the Android platform. Our system was deployed on a
Huawei Honor 10 mobile phone. The design process of the system is
shown in Fig. 19. Our model requires the input image to be a color
image. After the user uploads a photo of any size, the system scales
the image to 224 × 224 × 3. The system returns the label corresponding
to the maximum probability value as the result. We set a threshold
value of 0.8 in the system. When the probability value of the most
likely category label is greater than or equal to 0.8, the recognition
result is returned to the user. When the probability value is less than
0.8, the user is asked to re-enter an image. Users can upload images in
two ways: shooting and local uploading. In addition to being displayed
directly to the user, the recognition results can also be saved locally
in the form of screenshots for users to view. The system operation
interface is shown in Fig. 20.
We used 367 images to test the system, and the test results are
shown in Table 9. It can be seen from Table 9 that the accuracy and
system size of our model are 95.09% and 84.84 MB, respectively. The
system contains the TensorFlow framework, so its size is larger than
the model size. Although ShuffleNetV2 has the fastest recognition
speed, its accuracy is lower. The accuracy of our model is almost the
same as that of MobileNetV3, but our system size is 25.16 MB smaller.
The average recognition time of a single picture in our system is
227 ms.

6. Conclusion

This paper proposes a solanaceous disease identification model
based on SE-Inception, which satisfies mobile devices' needs for
disease identification models. By using batch normalization layers
after each convolutional layer in the model, training time is greatly
reduced and training stability is also improved. At the same time, the
multi-scale feature extraction module is used to improve the
recognition accuracy of the model for different diseases. In addition,
the SE module is added to the model, so that the channel information
can be fully utilized to improve the recognition rate.
In order to verify the effectiveness of the model, we conducted
experiments on the PlantVillage dataset and our constructed dataset,
and compared it with some common lightweight network models. The
experimental results show that our model better balances the
recognition accuracy and the memory consumption required for
operation. It has high operating efficiency: the average recognition
accuracy of the model on our constructed dataset and the public
PlantVillage dataset reaches 98.29% and 99.27%, and the model size is
14.68 MB and 14.8 MB, respectively. The weakness of our model is that
it requires 1.16 GFLOPs. Although this is smaller than GoogLeNet, it is
still large compared with other models such as MobileNetV1,
MobileNetV2 and MobileNetV3. We also developed a Solanaceae disease
recognition system based on this model, which achieves a recognition
speed of 4 frames/s on a common Android platform and 95.09% recognition
accuracy on the test set, which preliminarily meets the production
demand for Solanaceae disease recognition on mobile platforms.
In future work, we will further adjust the model structure to reduce
the FLOPs of the model. In a word, this model achieves a higher
accuracy while occupying a smaller space, laying the foundation for
deployment on mobile devices, and provides method guidance for the
automatic identification of diseases in the agricultural field.

CRediT authorship contribution statement

Zhenbo Li: Conceptualization, Supervision, Formal analysis.
Yongbo Yang: Methodology, Software, Writing - original draft, Writing
- review & editing. Ye Li: Validation, Visualization. RuoHao Guo:
Investigation. Jinqi Yang: Data curation. Jun Yue: Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgements

Our deepest gratitude goes to the anonymous reviewers and editors
for their careful work that has helped improve this paper
substantially. This study is supported by the Hebei Province Science
and Technology Plan Project under grant no. 18047405D (Integration and
demonstration of Internet of Things technology for quality and safety
management of facility vegetables).

10
Z. Li, et al. Computers and Electronics in Agriculture 178 (2020) 105792

Processing Systems, 1097–1105. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
Liang, Q., Xiang, S., Hu, Y., Coppola, G., Zhang, D., Sun, W., 2019. PD2SE-Net: Computer- V., Rabinovich, A., 2015. Going deeper with convolutions. In: Proceedings of the IEEE
assisted plant disease diagnosis and severity estimation network. Comput. Electron. conference on computer vision and pattern recognition, 1–9.
Agric. 157, 518–529. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception
Liran, W., Jun, Y., Zhenbo, L., Guangjie, K., Haiping, Q., 2017. Multi- classification de- architecture for computer vision. In: Proceedings of the IEEE conference on computer
tection method of plant leaf disease based on kernel function SVM. Trans. Chin. Soc. vision and pattern recognition, 2818–2826.
Agric. Mach. 48 (S1), 166–171. Too, E.C., Yujian, L., Njuki, S., Yingchun, L., 2019. A comparative study of fine-tuning
Ma, N., Zhang, X., Zheng, H.T., Sun, J., 2018. ShuffleNet V2: Practical Guidelines for deep learning models for plant disease identification. Comput. Electron. Agric. 161,
Efficient CNN Architecture Design, 116–131. 272–279.
Mohanty, S.P., Hughes, D.P., Salathé, M., 2016. Using deep learning for image-based Yang, L., Quan, F., Shuzhi, W., 2019. Plant disease identification method based on
plant disease detection. Front. Plant Sci. 7 (1419). lightweight CNN and mobile application. Trans. Chin. Soc. Agric. Eng. 35 (17),
Nachtigall, L.G., Araujo, R.M., Nachtigall, G.R., 2016. Classification of apple tree dis- 194–204.
orders using convolutional neural networks. In: 2016 IEEE 28th International Yongquan, X., Bing, W., Jun, Z., Haipeng, H., Jingru, S., 2018. Identification of wheat leaf
Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 472–476. disease based on random forest method. J. Graph., 39(01), 57–62.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C., 2018. Mobilenetv2: Yu, W., Yang, K., Yao, H., Sun, X., Xu, P., 2017. Exploiting the complementary strengths
Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on of multi-layer CNN features for image retrieval. Neurocomputing 237, 235–241.
computer vision and pattern recognition, 4510–4520. Zhang, X., Zhou, X., Lin, M., Sun, J., 2018. ShuffleNet: an extremely efficient convolu-
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale tional neural network for mobile devices. In: Proceedings of the IEEE conference on
image recognition. arXiv preprint arXiv:1409.1556. computer vision and pattern recognition, 6848–6856.
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A., 2017. Inception-v4, inception-resnet Zhong, Y., Zhao, M., 2020. Research on deep learning in apple leaf disease recognition.
and the impact of residual connections on learning. In: Thirty-first AAAI conference Comput. Electron. Agric. 168, 105146.
on artificial intelligence.

11

You might also like