
sensors

Article
Apple Leaf Diseases Recognition Based on An
Improved Convolutional Neural Network
Qian Yan 1,2,† , Baohua Yang 3,† , Wenyan Wang 1 , Bing Wang 1,2, *, Peng Chen 4 and Jun Zhang 4
1 School of Electrical and Information Engineering, Anhui University of Technology,
Ma’anshan 243032, China; yanqian201288@163.com (Q.Y.); wywangahut@163.com (W.W.)
2 Key Laboratory of Power Electronics and Motion Control Anhui Education Department, Anhui University
of Technology, Ma’anshan 243032, China
3 School of Information and Computer, Anhui Agricultural University, Hefei 230036, China; ybh@ahau.edu.cn
4 Co-Innovation Center for Information Supply & Assurance Technology, Anhui University,
Hefei 230032, China; pchen@ahu.edu.cn (P.C.); junzhang@ahu.edu.cn (J.Z.)
* Correspondence: wangbing@ustc.edu
† These authors contributed equally to this work.

Received: 6 April 2020; Accepted: 17 June 2020; Published: 22 June 2020 

Abstract: Scab, frogeye spot, and cedar rust are three common types of apple leaf diseases, and the
rapid diagnosis and accurate identification of them play an important role in the development of
apple production. In this work, an improved model based on VGG16 is proposed to identify apple leaf
diseases, in which the global average pooling layer is used to replace the fully connected layer to reduce
the parameters and a batch normalization layer is added to improve the convergence speed. A transfer
learning strategy is used to avoid a long training time. The experimental results show that the overall
accuracy of apple leaf classification based on the proposed model can reach 99.01%. Compared
with the classical VGG16, the model parameters are reduced by 89%, the recognition accuracy is
improved by 6.3%, and the training time is reduced to 0.56% of that of the original model. Therefore,
the deep convolutional neural network model proposed in this work provides a better solution for
the identification of apple leaf diseases with higher accuracy and a faster convergence speed.

Keywords: apple leaf diseases; transfer learning; deep learning; convolutional neural networks

1. Introduction
Leaf diseases are one of the main obstacles to apple production. Among them, scab, frogeye spot,
and cedar rust are the three most common types of apple leaf diseases and have a bad impact on apple
growing. Therefore, the detection of apple leaf diseases has attracted more and more attention, and the
early identification of apple leaf disease is very important for the intervention of treatment. In the past,
disease identification methods were generally divided into manual identification and an expert system.
However, both of them are highly dependent on fruit growers and experts and are time-consuming
and usually poor in generalization.
With the development of machine learning methods, some computational models have been
proposed for plant disease diagnosis based on different algorithms. Some studies have found diseased
regions by K-means clustering-based segmentation and built disease recognition models using
supervised learning methods, including the random forest, support vector machine (SVM), and
K-nearest neighbor methods [1–3]. Rothe et al. used an active contour model for image segmentation
and extracted Hu’s moments as features for the training of an adaptive neuro-fuzzy inference system,
by which a classification accuracy of 85% can be achieved [4]. Gupta et al. proposed an autonomously
modified SVM-CS model where an SVM model was trained and optimized using the concept of a cuckoo
search [5]. However, these classification features are heavily dependent on man-made selection and the
recognition rates are not satisfactory.
In recent years, convolutional neural networks (CNNs) have shown good results in recognition
tasks by reducing the need for image preprocessing and improving the identification accuracy [6–13].
Leaf disease recognition based on CNNs has become a new hotspot in the agricultural informatization
area [14–16]. Lu et al. proposed a rice disease identification method based on deep CNN techniques
and achieved an accuracy of 95.48% on a dataset of 500 natural images of diseased and healthy
rice leaves [17]. Zhang et al. proposed the improved GoogLeNet and Cifar10 models and obtained
the average identification accuracies of 98.9% and 98.8%, respectively [18]. Liu et al. designed a
novel architecture of AlexNet to detect apple leaf diseases, and the experimental results showed that
this approach achieved an overall accuracy of 97.62% for disease identification [19]. Although the
recognition accuracy of these CNN models is higher than that of traditional machine learning methods,
there are still some shortcomings—such as high model complexity, a large number of parameters, and a long
training time—which prevent their application in real environments.
In this work, we propose a method for apple leaf disease identification based on an improved
deep convolution neural network architecture which can effectively reduce the model complexity and
training time. The network proposed in this work adopts the concept of transfer learning to pre-train a
VGG16 network and adjusts the network structure by removing three fully connected layers, adding
a global average pooling layer, a batch normalization layer, and a fully connected layer. Based on a
benchmark dataset, the proposed model, which can reach an 89% reduction in the model parameters of
the original VGG16 model, greatly reduced the training time and achieved a higher accuracy rate.

2. Methods

2.1. Data
The dataset in this work is from the “2018 ‘AI Challenger’ Global Challenge” and includes 10
kinds of plants with 27 categories of diseases. This work addresses the automatic identification of
apple leaf diseases; therefore, only apple leaves are selected from this dataset. There are four categories
of apple leaf images within the dataset, and Figure 1 lists some of them. With the exception of healthy
leaves, three types of disease images—i.e., scab, frogeye spot, and cedar rust—are collected within the
dataset. Typically, the lesions on scab leaves are gray-brown and nearly round or radial, frogeye spot is
tan and the shape is flakes or dots, and cedar rust leaves have round orange-yellow lesions with red
edges. Some frogeye spot and cedar rust lesions are similar in color and shape, which increases the difficulty in
recognition by computational methods.
In this work, there are 2446 pictures collected within our dataset, where 1340 of them are healthy,
411 are scab, 487 are frogeye spot, and 208 are cedar rust. In the original dataset, the dataset was
divided into two subsets—i.e., 2141 pictures were for model training and the remaining 305 ones for
testing. The details about the dataset are shown in Table 1.
Figure 1. Four kinds of apple leaves. (a) Healthy, (b) scab, (c) frogeye spot, (d) cedar rust.

Table 1. Details of the apple leaves.

Classes         Training Number    Test Number
Healthy         1172               168
Scab            360                51
Frogeye Spot    427                60
Cedar rust      182                26
Total           2141               305

2.2. VGG16 and Transfer Learning


2.2.1. VGG16

With the rapid development of deep learning, CNNs have been applied widely in different fields, especially in image classification and recognition and in target location and detection [20]. A CNN is a special multi-layer perceptron (MLP), or multilayered feed-forward neural network, which generally consists of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer. The convolution layer realizes dimensionality reduction and feature extraction by implementing two design concepts: local perception and parameter sharing. The pooling layer reduces the size of the data, and its subsampling is invariant to local linear transformations, which enhances the generalization ability of convolutional neural networks. The fully connected layer acts as a classifier in the whole neural network. It is common for multiple fully connected layers to be used after several rounds of convolution, with the resulting structure of the last convolutional layer flattened [21,22].
The VGG16 network comprises 13 convolutional layers with very small receptive fields, 3 × 3, and five max-pooling layers of size 2 × 2 for carrying out spatial pooling, followed by three fully connected layers, for a total of 16 weight layers. A classical VGG16 model involves about 138 million parameters, where the rectification nonlinearity (ReLU) activation is applied to all hidden layers and the softmax function is applied in the final layer [23]. The model also uses dropout regularization in the fully connected layers. A schematic of the VGG16 architecture is shown in Figure 2, where the marked red box shows a classifier consisting of three fully connected layers.

Figure 2. A schematic of the VGG16 architecture.
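As a quick check on the scale of that fully connected head, the split can be reproduced with the Keras implementation of VGG16 (a verification sketch assuming the TensorFlow 2.x bundling of Keras Applications):

```python
# VGG16 parameter counts with and without the three fully connected layers.
import tensorflow as tf

full = tf.keras.applications.VGG16(weights=None, include_top=True)
conv_base = tf.keras.applications.VGG16(weights=None, include_top=False)
print(full.count_params())       # ~138 million in total
print(conv_base.count_params())  # ~14.7 million in the convolutional layers
```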

2.2.2. Transfer Learning

CNNs typically require a large annotated image dataset to achieve a high predictive accuracy. However, the acquisition of such data is difficult and labeling them is costly in many areas. In light of these challenges, the concept of transfer learning is adopted in many previous studies for solving cross-domain image classification problems and has been shown to be very useful, where the “off-the-shelf” features of well-established CNNs, such as VGG16, AlexNet, and GoogLeNet, are pre-trained on large-scale annotated natural image datasets, such as ImageNet, where 15 million images are involved [24–27].

One common strategy of transfer learning is feature transfer, which removes the last layer of the pre-trained network and sends its previous activation values, which can be regarded as feature vectors, into classifiers for training. Another is parameter transfer, which re-initializes only a few layers of the network, such as the last layer, keeps the weight parameters of the pre-trained network in the other layers, and uses a new dataset to fine-tune the network parameters [28–30].

Because of the small amount of data in this work, training a neural network from scratch would take a long time, and the data insufficiency easily causes an over-fitting problem, which would give the model poor robustness. Therefore, we use the idea of transfer learning, where a model pre-trained on ImageNet is adapted to the classification and recognition of apple leaf diseases. Herein, the VGG16 is fine-tuned to fit our own data, which saves a lot of training time.

2.3. Improved CNNs Based on VGG16

A classical VGG16 network has a strong ability for image feature extraction and recognition. Its core idea is to use smaller convolution kernels to increase the depth of the network, which was the key to winning the runner-up position in the positioning and classification tasks of the ILSVRC Challenge in 2014. However, the VGG16 model has a huge number of parameters, which causes a slow convergence speed, a long training time, and a large storage requirement in practical applications.

To address these problems, this work improves the VGG16 model by using a global average pooling layer, a batch normalization layer, and a fully connected layer to replace the three fully connected layers in the original model. The global average pooling layer replaces the fully connected layers to reduce the parameters, and the batch normalization layer is added to improve the convergence speed. In order to avoid a long training time, the weights of the convolution layers are pre-trained by VGG16 on ImageNet. The stochastic gradient descent (SGD) optimizer is replaced by adaptive moment estimation (Adam) to accelerate the convergence of the network. The network structure is shown in Figure 3, where the improved classifier, consisting of a global average pooling layer, a batch normalization layer, and a fully connected layer, is shown within the marked green box.

Figure 3. A schematic of the proposed CNN based on VGG16.
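The modification just described can be sketched with the Keras framework that the authors name in Section 3. This is a minimal reconstruction under stated assumptions (TensorFlow 2.x bundling of Keras, and the 1 × 10−5 learning rate reported in Section 3.4), not the authors' released code:

```python
# Sketch of the improved VGG16: ImageNet-pre-trained convolutional base,
# with the original three-FC-layer classifier replaced by GAP -> BN -> FC.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_improved_vgg16(num_classes=4, input_shape=(227, 227, 3)):
    # Parameter transfer: reuse ImageNet weights in the convolutional layers
    # and drop the original fully connected head (include_top=False).
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape)

    x = layers.GlobalAveragePooling2D()(base.output)   # replaces Flatten + FC
    x = layers.BatchNormalization()(x)                 # speeds up convergence
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    # Adam replaces SGD to accelerate convergence (see Section 3.4).
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```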
2.3.1. Global Average Pooling Layer (GAP)

Global average pooling regularizes the whole network structure to prevent over-fitting and reduces the dimensions from 3D to 1D [31,32]. In this work, the feature maps in the last convolution layer are averaged into a series of 1D outputs, as shown in Figure 4. A GAP omits the expansion of the feature maps into vectors and the subsequent full-connection processing, and therefore greatly reduces the number of parameters. The advantage of a GAP over a fully connected layer is that it preserves the convolution structure better by enforcing a correspondence between feature maps and categories, making the classification based on the feature maps credible and well explained.
Figure 4. Global average pooling.
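As a back-of-the-envelope check on the parameter savings described above (assuming the standard 7 × 7 × 512 feature maps at the end of the VGG16 convolutional base and the four output classes of this task):

```python
# Weights feeding the first original FC layer (Flatten -> Dense(4096)) versus
# the GAP-based head (GlobalAveragePooling -> Dense(4)); biases omitted.
flatten_to_fc = 7 * 7 * 512 * 4096   # 102,760,448 weights
gap_to_fc = 512 * 4                  # 2,048 weights
print(flatten_to_fc, gap_to_fc)
```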

2.3.2. Batch Normalization (BN)

In deep learning, because the number of layers in the network is very large, if the data distribution at a certain layer starts to deviate significantly, this problem will intensify as the network deepens, which will increase the difficulty of model optimization. Normalization helps to alleviate this problem. Batch normalization divides the data into several groups and updates the parameters group by group. The data in one group jointly determine the direction of the gradient, which reduces the randomness of the descent. On the other hand, because the number of samples in a batch is much smaller than that in the entire dataset, the amount of calculation also drops significantly. The batch normalization layer normalizes the inputs to a layer before the activation function is applied, which can solve the problems of input data offset and amplification [33].

Based on the BN algorithm, the inputs to each layer are normalized so that the activation function does not distort the distribution of the neurons. The importance of individual neurons is weakened, and some of them may effectively be removed automatically. Because of the normalization in each epoch, the risk of parameter changes caused by a different data distribution is reduced and the convergence speed is accelerated.
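For reference, the batch normalization transform can be written out explicitly. For a mini-batch $B = \{x_1, \dots, x_m\}$ of inputs to a layer (the standard formulation, supplied here for completeness rather than quoted from the original text):

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2,$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta,$$

where $\gamma$ and $\beta$ are learned scale and shift parameters and $\varepsilon$ is a small constant for numerical stability.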
2.3.3. Adaptive Moment Estimation

Adam is an extension of the stochastic gradient descent algorithm which can iteratively update the neural network weights based on the training data [34,35]. This method not only stores the exponential decay mean of the squared gradients but also preserves the exponential decay mean of the previously calculated first-order and second-order moment estimates of the gradient, and it assigns different adaptive learning rates to different parameters. Optimization algorithms such as SGD maintain a single learning rate during the whole training process, whereas Adam can better adjust the learning rate as the parameters are backpropagated and updated. Thus, Adam has a fast convergence speed and an effective learning effect. It can also correct problems existing in other optimization techniques, such as the loss function fluctuation caused by the disappearance of the learning rate, slow convergence, or parameter updating with high variance.
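For completeness, the Adam update of Kingma and Ba [34] can be stated explicitly (standard form, not reproduced in the original text). With gradient $g_t$ at step $t$, it maintains decaying moment estimates and applies a bias-corrected, per-parameter step:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\alpha\,\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon},$$

with default decay rates $\beta_1 = 0.9$ and $\beta_2 = 0.999$ and a small constant $\varepsilon$.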
3. Results and Discussion

In this work, the proposed model was implemented with the Keras deep learning framework on an Intel® Core™ i7-8750H CPU (LENOVO, Jiangsu, China). The ImageNet pre-trained VGG16 CNN implemented within Keras Applications takes in a default image input size of 227 × 227. Therefore, all the pictures in our dataset were cut to the same size of 227 × 227.

The proposed CNN is trained on the 2141 training pictures and tested on the 305 test ones, and the confusion matrix of the prediction results is shown in Table 2. It can be found that the cedar rust classification is totally accurate, only one healthy picture is misclassified as scab, and only one picture each in the scab and frogeye spot categories is misclassified as healthy.
Table 2. Confusion matrix of the prediction results.

                             Predict Label
True Label      Healthy    Scab    Frogeye Spot    Cedar Rust
Healthy         167        1       0               0
Scab            1          50      0               0
Frogeye Spot    1          0       59              0
Cedar Rust      0          0       0               26
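A minimal sketch of the training and test runs above; the directory layout, batch size, and epoch count are illustrative assumptions, and build_improved_vgg16 refers to the reconstruction sketched in Section 2.3:

```python
# Hypothetical data pipeline and training call for the 227 x 227 leaf images.
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "apple_leaves/train", image_size=(227, 227), batch_size=32,
    label_mode="categorical")
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "apple_leaves/test", image_size=(227, 227), batch_size=32,
    label_mode="categorical")

model = build_improved_vgg16(num_classes=4)
model.fit(train_ds, validation_data=test_ds, epochs=60)
print(model.evaluate(test_ds))  # [loss, accuracy]
```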

For the three misclassified pictures in the original dataset, Figure 5 lists the original picture, the visualization of the last convolution layer, and the superposition of the heat map on the original picture. Some insights can be gained from these pictures. In Figure 5B, the strong light and small disease features may lead to the inaccurate extraction of disease features by the model. The frogeye spots in Figure 5C are small in size and light in color, which leads to prediction errors in comparison with the dark areas, for light regions are strongly learned by the network and therefore carry a bigger weight.

(A) Healthy is misclassified as scab.

(B) Scab is misclassified as healthy.

(C) Frogeye spot is misclassified as healthy.

Figure 5. The details of three misclassified pictures: (a) the original picture, (b) the visualization of the final convolution layer, (c) the superposition of the heat map of the original picture.

3.1. Comparison of Model Performance

To evaluate the performance of the proposed VGG model, four typical convolutional neural networks—i.e., AlexNet, GoogleNet, ResNet-34, and VGG16—are also implemented. Another apple leaf disease recognition structure, presented by Liu et al., where the inception structure was added into the AlexNet framework, has also been compared. The recognition accuracy of the different models is shown in Figure 6.

It can be found that the accuracy of AlexNet and the original VGG16 is 93.11%, that of ResNet-34 is 95.73%, and GoogleNet can reach 97.70%. When the inception structure was combined with AlexNet, the identification accuracy can be increased to 97.05%, which is higher than that of the original AlexNet. It can be seen that our work achieves the highest accuracy in the identification of apple leaf diseases—i.e., a 99.01% accuracy—which demonstrates the effectiveness of the proposed model. Compared to the other five models, whether in terms of precision, recall, or F1-score, our model achieved the highest value.

Figure 6. Recognition accuracy of different models.

Table 3 shows the precision, recall, F1-score, and accuracy achieved by the different models for the four categories of apple images. It shows that AlexNet does not learn the features of scab well enough and its detection effect is poor; the recognition of the improved Alex + Inception model is better than that of the original AlexNet; what is more, the original VGG16 network has the worst learning of each feature. For these four leaf types, all the networks have the best recognition rate for healthy leaves and the lowest recognition rate for scab. Regardless of the accuracy or the detection index of each leaf type, our model achieved the best results. In general, our model has the best recognition effect.

Table 3. Classification performance comparison among different models.

Classes         Model              Precision   Recall    F1-Score   Accuracy
Healthy         AlexNet            95.27%      95.83%    95.55%     95.83%
                ResNet34           98.14%      94.05%    96.05%     94.05%
                GoogleNet          98.80%      98.21%    98.51%     98.21%
                VGG16              93.14%      97.02%    95.04%     97.02%
                Alex + Inception   98.80%      98.21%    98.51%     98.21%
                Our Work           98.82%      99.40%    99.11%     99.40%
Scab            AlexNet            90.70%      76.47%    82.98%     76.47%
                ResNet34           84.21%      94.12%    88.89%     94.21%
                GoogleNet          97.96%      94.12%    96.00%     94.12%
                VGG16              90.91%      78.43%    84.21%     78.43%
                Alex + Inception   97.87%      90.20%    93.88%     90.20%
                Our Work           98.04%      98.04%    98.04%     98.04%
Frogeye Spot    AlexNet            92.06%      96.67%    94.31%     96.67%
                ResNet34           98.36%      100%      99.17%     96.67%
                GoogleNet          96.77%      100%      98.36%     100%
                VGG16              94.92%      93.33%    94.12%     93.33%
                Alex + Inception   95.24%      100%      97.56%     100%
                Our Work           100%        98.00%    99.16%     98.33%
Cedar rust      AlexNet            86.67%      100%      92.86%     100%
                ResNet34           100%        100%      100%       100%
                GoogleNet          92.59%      96.15%    94.34%     96.15%
                VGG16              92.59%      96.15%    94.34%     96.15%
                Alex + Inception   89.29%      96.15%    92.59%     96.15%
                Our Work           100%        100%      100%       100%

Table 4. Comparison of the training parameters and time.

Model Parameter Training Time (s)


AlexNet 58,297,732 360
ResNet34 22,671,492 685
GoogleNet 5,716,848 604
VGG16 134,252,356 123,007
Alex + Inception 5,654,356 491
Our Work 14,717,764 692

3.2. Convergence Rate Analysis


The loss values in this work are calculated by cross entropy. Figure 7 shows the accuracy and loss values of the six models during training. The experimental results show that AlexNet, ResNet-34, GoogleNet, Alex + Inception, and our convolutional neural network converge within 60 training epochs, while VGG16 converges slowly. It can be found that the proposed network structure converges in 10 training epochs, which is faster than the other five CNN models. The training process of GoogleNet is similar to that of ResNet-34, and both converge after 20 training epochs, and AlexNet and the Alex + Inception model tend to be stable after 40 epochs.

Figure 7. Convergence comparison: (a) accuracy values; (b) loss values.

3.3. Training Time and Parameters


Table 4 shows the number of parameters for each model and training time required when the
model becomes stable. It can be found that the classical VGG16 model has the most parameters and
the longest training time, the Alex + Inception model has the least training parameters, and AlexNet
has the shortest training time. Our improved model reduces the number of training parameters by 119,534,592 in
comparison to the original VGG16 model. The convolutional neural network proposed in this work
has fewer training parameters than AlexNet, ResNet34, and VGG16. The training time of the
proposed model is 692 s, which is similar to that of ResNet34 and GoogleNet.
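The reduction quoted above is simply the difference between the two parameter counts in Table 4:

```python
# Parameter reduction of the proposed model relative to classical VGG16.
print(134_252_356 - 14_717_764)  # 119534592
```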
3.4. Comparison of Optimal Algorithms

The optimization algorithm is of great importance for the model performance. In this work, the SGD optimization algorithm in the original VGG16 is replaced by the Adam optimization algorithm to improve the convergence rate. Figure 8 shows the training process of these two optimization algorithms with the same learning rate of 1 × 10−5. The results show that the model using the Adam algorithm has a faster convergence speed. It can be found that the testing accuracy is 98.03% when the SGD algorithm is used, while that of the Adam algorithm is 99.01%. From the loss curve in Figure 8, it can be seen that the Adam algorithm converges quickly and is more stable than SGD.

Figure 8. Comparison of the optimal algorithms: (a) accuracy values; (b) loss values.
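In Keras terms, the comparison above amounts to swapping the optimizer passed at compile time (a sketch; both optimizers use the 1 × 10−5 learning rate stated in the text):

```python
# The two optimizers compared in Figure 8, at the same learning rate.
import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=1e-5)    # 98.03% test accuracy reported
adam = tf.keras.optimizers.Adam(learning_rate=1e-5)  # 99.01% test accuracy reported
```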

3.5. Data Augmentation

In this work, the dataset used herein includes only 2446 pictures, which is very small in comparison to that with which the VGG16 was pre-trained. In order to evaluate the performance of the proposed method, a data augmentation strategy is adopted to amplify the original dataset and test the classification performance on it. The augmented dataset is generated from the original dataset by image geometric transformation, color changing, and noise adding, which increases the size of the training dataset from 2141 to 21,410.

Image rotation and flipping are two types of image geometric transformations where only the location of each pixel is changed. Rotating the pictures at different angles and flipping can expand the diversity of directions. It is generally difficult to capture each picture from different directions; therefore, to simulate this situation and eliminate the effect of direction on picture recognition, we rotated the original image around the center point by 90, 180, and 270 degrees and flipped it horizontally; a short code sketch of these operations follows Figure 9. As shown in Figure 9, after rotation and flipping, the number of pictures increased by 4 times relative to the original data set.

Figure 9. Direction disturbance: (a) initial, (b) 90, (c) 180, (d) 270, (e) horizontal flip.
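The direction disturbance above can be sketched as follows with Pillow (an assumed choice of image library; the paper does not specify its implementation):

```python
# Illustrative direction disturbance: the initial image plus 90/180/270-degree
# rotations and a horizontal flip, matching the five panels of Figure 9.
from PIL import Image

def direction_variants(img):
    variants = [img]                                        # (a) initial
    for angle in (90, 180, 270):                            # (b)-(d) rotations
        variants.append(img.rotate(angle, expand=True))
    variants.append(img.transpose(Image.FLIP_LEFT_RIGHT))   # (e) horizontal flip
    return variants
```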
Adjusting the brightness, contrast, and hue of the image is another common image augmentation method widely used in image processing. During the process of image acquisition, pictures may be affected by different weather and exposed to different intensities of light, which possibly affects the experimental results. In order to simulate image collection under different light backgrounds, we adjusted the brightness and contrast, as shown in Figure 10, and the data was expanded by 4 times.

Figure 10. Color enhancement: (a) initial, (b) low brightness, (c) high brightness, (d) low contrast, (e) high contrast.

To gain some insight into the noise effect on apple leaf pictures, which is also a common factor coming from image acquisition equipment and the natural environment, we added Gaussian noise to the original image, which is shown in Figure 11. A sketch of these color and noise perturbations follows Figure 11.

Figure 11. Gaussian noise: (a) initial, (b) Gaussian noise.
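The color enhancement and Gaussian noise steps above can be sketched in the same style (Pillow and NumPy are assumed; the enhancement factors and noise standard deviation are illustrative, not the authors' reported settings):

```python
# Illustrative color jitter (Figure 10) and additive Gaussian noise (Figure 11).
import numpy as np
from PIL import Image, ImageEnhance

def color_and_noise_variants(img, sigma=10.0):
    variants = [
        ImageEnhance.Brightness(img).enhance(0.6),   # low brightness
        ImageEnhance.Brightness(img).enhance(1.4),   # high brightness
        ImageEnhance.Contrast(img).enhance(0.6),     # low contrast
        ImageEnhance.Contrast(img).enhance(1.4),     # high contrast
    ]
    arr = np.asarray(img).astype(np.float32)
    noisy = np.clip(arr + np.random.normal(0.0, sigma, arr.shape), 0, 255)
    variants.append(Image.fromarray(noisy.astype(np.uint8)))
    return variants
```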

In the same experimental setup, the model we proposed is trained on the augmented 21,410 images, and the final classification accuracy can reach 99.34%. When we used the original dataset to train the model, the accuracy rate can also reach 99.01%. Figure 12 shows the recognition accuracy. It can be seen that after the data expansion, all the measures of the proposed model have been slightly improved.

Figure 12. Recognition accuracy for the initial and expanded data.

4. Conclusions

An improved convolution neural network model based on VGG16 is proposed in this work. The classifier of the classical VGG16 network is modified by adding a batch normalization layer, a global average pooling layer, and a fully connected layer to accelerate convergence and reduce the training parameters. The proposed model trains on 2141 apple leaves in the training set to identify apple leaf diseases. The experimental results show that the accuracy of the model test can reach 99.01% after 692 s of training. Compared with the classical VGG16 network, the model parameters are reduced by 119,534,592, and the accuracy is improved by 6.3%.

Although the training time is longer than that of AlexNet and ResNet, our model has fewer parameters and a higher accuracy. Compared with GoogleNet and Alex + Inception, some parameters and training time are sacrificed, but our model has the highest accuracy of up to 99.01%. After data expansion, the accuracy of the model can be increased to 99.34%. The convolution neural network proposed in this work can identify apple leaf diseases quickly and accurately and provides a feasible scheme for identifying apple leaf diseases.

In the future, our work can be improved in the following aspects: (1) collecting more kinds and quantities of apple disease pictures to enrich the datasets to train better models, (2) trying other deep convolution neural networks to improve the accuracy and speed of recognition, and (3) trying to run other deep learning methods and applying them to the real-time detection of apple diseases.


Author Contributions: Methodology, Q.Y.; software, W.W.; validation, P.C. and J.Z.; data curation, B.Y.;
writing—original draft preparation, Q.Y. and B.Y.; writing—review and editing, B.W.; project administration, B.W.
All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 61472282,
61672035, and 61872004), Educational Commission of Anhui Province (No. KJ2019ZD05), Open Fund from
Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (KF2017-02), Co-Innovation Center
for Information Supply and Assurance Technology in AHU (ADXXBZ201705), and Anhui Scientific Research
Foundation for Returnees.
Acknowledgments: Special thanks to reviewers for their valuable comments.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Padol, P.B.; Yadav, A.A. SVM classifier based grape leaf disease detection. In Proceedings of the 2016
Conference on Advances in Signal Processing (CASP), Pune, India, 9–11 June 2016; pp. 175–179.
2. Qin, F.; Liu, D.; Sun, B.; Ruan, L.; Ma, Z.; Wang, H. Identification of alfalfa leaf diseases using image
recognition technology. PLoS ONE 2016, 11, e0168274. [CrossRef]
3. Islam, M.; Dinh, A.; Wahid, K.; Bhowmik, P. Detection of potato diseases using image segmentation and
multiclass support vector machine. In Proceedings of the 2017 IEEE 30th Canadian Conference on Electrical
and Computer Engineering (CCECE), Windsor, ON, Canada, 30 April–3 May 2017; p. 134.
4. Rothe, P.; Kshirsagar, R. Cotton leaf disease identification using pattern recognition techniques. In Proceedings
of the 2015 International Conference on Pervasive Computing (ICPC), Pune, India, 8–10 January 2015;
pp. 1–6.
5. Gupta, T. Plant leaf disease analysis using image processing technique with modified SVM-CS classifier.
Int. J. Eng. Manag. Technol. 2017, 5, 11–17.
6. Mohanty, S.P.; Hughes, D.; Salathe, M. Inference of plant diseases from leaf images through deep learning.
Front. Plant Sci. 2016, 7, 1419. [CrossRef] [PubMed]
7. Sladojevic, S.; Arsenovic, M.; Anderla, A.; Culibrk, D.; Stefanovic, D. Deep neural networks based recognition
of plant diseases by leaf image classification. Comput. Intell. Neurosci. 2016, 2016, 1–11. [CrossRef] [PubMed]
8. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front.
Plant Sci. 2016, 7, 1419. [CrossRef]
9. Fuentes, A.; Yoon, S.; Kim, S.; Park, D. A robust deep-learning-based detector for real-time tomato plant
diseases and pests recognition. Sensors 2017, 17, 2022. [CrossRef]
10. Wang, G.; Sun, Y.; Wang, J. Automatic image-based plant disease severity estimation using deep learning.
Comput. Intell. Neurosci. 2017, 2017, 1–8. [CrossRef]
11. Xiao, Q.; Li, W.; Kai, Y.; Chen, P.; Zhang, J.; Wang, B. Occurrence prediction of pests and diseases in cotton on
the basis of weather factors by long short term memory network. BMC Bioinform. 2019, 20, 688. [CrossRef]
12. Yang, B.; Wang, M.; Sha, Z.; Wang, B.; Chen, J.; Yao, X.; Cheng, T.; Cao, W.; Zhu, Y. Evaluation of Aboveground
Nitrogen Content of Winter Wheat Using Digital Imagery of Unmanned Aerial Vehicles. Sensors 2019, 19,
4416. [CrossRef]
13. Hang, J.; Zhang, D.; Chen, P.; Zhang, J.; Wang, B. Classification of Plant Leaf Diseases Based on Improved
Convolutional Neural Network. Sensors 2019, 19, 4161. [CrossRef]
14. Li, W.; Chen, P.; Wang, B.; Xie, C. Automatic localization and count of agricultural crop pests based on an
improved deep learning pipeline. Sci. Rep. 2019, 9, 1–11. [CrossRef] [PubMed]
15. Xia, D.; Chen, P.; Wang, B.; Zhang, J.; Xie, C. Insect detection and classification based on an improved
convolutional neural network. Sensors 2018, 18, 4169. [CrossRef] [PubMed]
16. Xia, S.; Chen, P.; Zhang, J.; Li, X.; Wang, B. Utilization of rotation-invariant uniform LBP histogram
distribution and statistics of connected regions in automatic image annotation based on multi-label learning.
Neurocomputing 2017, 228, 11–18. [CrossRef]
17. Lu, Y.; Yi, S.; Zeng, N.; Liu, Y.; Zhang, Y. Identification of rice diseases using deep convolutional neural
networks. Neurocomputing 2017, 267, 378–384. [CrossRef]
18. Zhang, X.; Qiao, Y.; Meng, F.; Fan, C.; Zhang, M. Identification of maize leaf diseases using improved deep
convolutional neural networks. IEEE Access 2018, 6, 30370–30377. [CrossRef]
Sensors 2020, 20, 3535 14 of 14

19. Liu, B.; Zhang, Y.; He, D.; Li, Y. Identification of apple leaf diseases based on deep convolutional neural
networks. Symmetry 2018, 10, 11. [CrossRef]
20. Hinton, G.E.; Osindero, S.; Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18,
1527–1554. [CrossRef]
21. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep
convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and
transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [CrossRef]
22. Tajbakhsh, N.; Shin, J.Y.; Gurudu, S.R.; Hurst, R.T.; Kendall, C.B.; Gotway, M.B.; Liang, J. Convolutional
neural networks for medical image analysis: Full training or fine tuning? IEEE Trans. Med. Imaging 2016, 35,
1299–1312. [CrossRef]
23. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep Convolutional Neural Networks with
transfer learning for computer vision-based data-driven pavement distress detection. Constr. Build. Mater.
2017, 157, 322–330. [CrossRef]
24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks.
In Proceedings of the International Conference on Neural Information Processing System, Lake Tahoe, NV,
USA, 3 December 2012.
25. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A.
Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [CrossRef]
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent
Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice,
Italy, 22–29 October 2017; pp. 2223–2232.
28. Yu, Y.; Lin, H.; Meng, J.; Wei, X.; Guo, H.; Zhao, Z. Deep transfer learning for modality classification of
medical images. Information 2017, 8, 91. [CrossRef]
29. Taylor, M.E.; Stone, P. Transfer Learning for Reinforcement Learning Domains: A Survey. J. Mach. Learn. Res.
2009, 10, 1633–1685.
30. Zhuang, F.; Luo, P.; He, Q.; Shi, Z. Survey on Transfer Learning Research. J. Softw. 2015, 26, 26–39.
31. Lin, M.; Chen, Q.; Yan, S. Network in Network. arXiv 2013, arXiv:1312.4400.
32. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv
2014, arXiv:1409.1556.
33. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement
Learning. arXiv 2016, arXiv:1602.01783.
34. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
35. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks.
In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA,
21–26 July 2017; pp. 1125–1134.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
