Cîciu Radu-Marian - Research - Paper - Orchard Monitoring Using UAV To Capture Invaders Insect Images
Abstract— In this paper, we propose to detect and classify four different types of bugs. The operation is performed using transfer learning based on a Convolutional Neural Network (CNN). Transfer learning is known to achieve better accuracy. Transfer learning is a technique in deep learning where the knowledge learned by one network for a specific problem is used as the initial point for another problem. The learned knowledge includes network parameters such as weights and biases.

I. INTRODUCTION

The scope of the research is to develop a method and an algorithm that can detect and specify the name of the invading insect by processing the images gathered from the orchard. The dataset of images will be provided by one or more drones sent into the field to capture images of the orchard trees. After gathering the data, the images will be processed to extract useful information, such as the area of the spread pest, the concentration of insects in the orchard, the variety of insect species, etc.

There are many methods that could help us better detect and observe the invasive insect species. One of them could be the use of insect pheromones. Pheromones are odorous substances emitted by one gender to attract individuals of the opposite gender for mating, thus being an essential factor for the survival and perpetuation of the species. Pheromones are generally produced by females and play a role in orienting and attracting males from long distances. Due to the attractiveness of female pheromones, they have become very valuable in detecting and estimating insect populations and even in direct pest control. Through this detection, we can monitor the evolution of the pest, the severity of the attack, and the healing of the crop after the treatment has been applied, as well as how the attack is decreasing in intensity [JUL16].

Another observation method could be the use of ground robots. Autonomous robots equipped with cameras would be driven on a designed route through the orchard, aiming to capture targeted images and gather crop data. This is not a new concept, as scientists have continuously searched for ways to improve robots that can be used in all sorts of industries. Although the agricultural sector implies a series of factors to be considered when taking measurements, such as wind speed, ground type, light intensity, weather conditions, the placement of the orchard, etc., robots have been used more and more often in agricultural research, because the information they collect has helped researchers find solutions to several problems. We will see more and more robots involved in topics like this in the future [SIN10][BER15][ZHA19].

In many orchards, pests are one of the factors that can seriously affect crop production throughout the year. Just as weeds affect the growing conditions of plants, insects can compromise the entire crop if they are not controlled in time [BAR11].

Most harmful insects attack the plants immediately after hatching; when they enter the fruit, they bore through the epidermis of the leaves and of the stems. If they are not controlled effectively, the insects will multiply quickly and will compromise the entire crop in a very short time [RES09].

To reduce the damage caused by these harmful insects, a first important step is to identify the signs of an attack in time and to administer the treatments that combat them effectively.

II. STATE OF THE ART DESCRIPTION

The convolutional neural network (CNN) is a class of deep neural network (DNN) in the field of deep learning that has been widely used in computer vision and related studies. A CNN is built to have one input, several hidden layers (usually including convolutional layers, ReLU layers, max-pooling layers, and fully connected layers) and one output. Its most important feature is that a CNN needs little preprocessing and returns outstanding performance results, determined by the quality of the input objects, the size of the data, and the number of classes [ZHU20].

Related to our research, apart from AlexNet, there are other convolutional neural networks that could be used to classify the four insect classes we have chosen. They are considered superior to AlexNet, having more advanced and more efficient architectures. They are: GoogLeNet (2014), VGGNet (2014) and FractalNet (2016) [LU17][ZAH18].

GoogLeNet (2014)

GoogLeNet is another convolutional neural network; it is 22 layers deep (27 layers if the max-pooling layers are included). Pretrained versions of the network are available, trained on one of two datasets: ImageNet or Places365. The first, ImageNet, classifies images into 1000 object categories, such as phones, TVs, laptops, or animal species. Places365 does basically the same thing but divides images into 365 different place categories, such as field, park, runway, and lobby. Similar to AlexNet, we can retrain GoogLeNet to perform other tasks by applying transfer learning. At this moment, there are three versions of this network's Inception architecture, versions 1, 2 and 3 [SZE15][ZHA19].

GoogLeNet is built with a total of nine inception blocks and global average pooling to generate estimations. The inception block consists of four parallel paths. The first three paths use convolutional layers with window sizes of 1 × 1, 3 × 3, and 5 × 5 to extract information at different spatial sizes. The middle two paths perform a 1 × 1 convolution on the input to reduce the number of channels, reducing the model's complexity. The fourth path uses a 3 × 3 maximum pooling layer, followed by a 1 × 1 convolutional layer to change the number of channels. The four paths use appropriate padding to give the input and output the same height and width. Finally, the outputs of the paths are concatenated along the channel dimension and comprise the block's output [SZE15][ZAH18][ZHA19].

VGGNet (2014)

In VGGNet, the 1 × 1 convolution filters used in some configurations can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the convolutional layer input is chosen such that the spatial resolution is preserved after convolution. Spatial pooling is carried out by five max-pooling layers, which follow some of the convolutional layers. Max-pooling is performed over a 2 × 2 pixel window, with stride 2 [SIM15][QAS18][ZAH18][KAN19].
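The four parallel paths of the GoogLeNet inception block described in Section II can be checked numerically: with the usual output-size formula out = (in + 2 * padding - kernel) / stride + 1, each path's padding preserves the spatial size, and the concatenation simply sums the per-path channel counts. A minimal sketch in plain Python (the per-path channel counts are illustrative, not necessarily GoogLeNet's actual values):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def inception_output_shape(h, w, path_channels):
    """Shape after an inception block: four parallel paths whose paddings
    preserve height and width, then concatenation along channels."""
    # (kernel, padding) per path; the last entry is the 3x3 max-pooling path.
    paths = [(1, 0), (3, 1), (5, 2), (3, 1)]
    for kernel, padding in paths:
        assert conv_out(h, kernel, 1, padding) == h
        assert conv_out(w, kernel, 1, padding) == w
    # Concatenation along the channel dimension sums the channel counts.
    return (sum(path_channels), h, w)

print(inception_output_shape(28, 28, [64, 128, 32, 32]))  # (256, 28, 28)
```

The check makes explicit why "appropriate padding" matters: without it, the four paths would produce different spatial sizes and could not be concatenated.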
Three Fully-Connected (FC) layers follow a stack of
convolutional layers (which has a different depth in different
architectures): the first two have 4096 channels each, the third
performs 1000-way ILSVRC classification and thus contains
1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is
the same in all networks [SIM15][QAS18][ZAH18][KAN19].
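The soft-max layer mentioned above turns the 1000 FC outputs into class probabilities. A small sketch (shown with 5 scores instead of 1000, using the usual max-subtraction for numerical stability):

```python
import math

def softmax(scores):
    """Soft-max: exponentiate shifted scores, then normalize to sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
print(abs(sum(probs) - 1.0) < 1e-9)  # True: a valid probability distribution
```

The predicted class is simply the index of the largest probability, which here is the first score, since soft-max preserves the ordering of its inputs.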
All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalization (LRN); such normalization does not improve the performance on the ILSVRC dataset but leads to increased memory consumption and computation time [SIM15][QAS18][KAN19].
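Transfer learning, as used throughout this paper, amounts to keeping a pretrained network's feature-extraction parameters as the initial point and replacing only the classification head for the new task (here, four insect classes). A conceptual sketch with plain dictionaries standing in for real layers; the layer names and sizes are illustrative, not those of AlexNet or GoogLeNet:

```python
import random

def new_head(n_inputs, n_classes):
    """Freshly initialized weights for a replacement classifier head."""
    return [[random.gauss(0.0, 0.01) for _ in range(n_inputs)]
            for _ in range(n_classes)]

# Stand-in for a network pretrained on a large dataset (e.g. 1000 classes).
pretrained = {
    "conv1": [0.12, -0.40, 0.33],   # learned feature weights (kept)
    "conv2": [0.05, 0.27, -0.18],   # learned feature weights (kept)
    "fc_head": new_head(3, 1000),   # original 1000-way head (discarded)
}

# Transfer: reuse the feature layers, re-initialize the head for 4 classes.
transferred = {name: w for name, w in pretrained.items() if name != "fc_head"}
transferred["fc_head"] = new_head(3, 4)

print(len(transferred["fc_head"]))                  # 4 output classes
print(transferred["conv1"] == pretrained["conv1"])  # True: features reused
```

In a real framework the same pattern appears as freezing the pretrained layers and training only the new head on the insect images.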
Sensitivity

In pattern recognition and binary classification applications, the classifier evaluation measure known as sensitivity is defined as the ratio of the true positives to the sum of the true positives and false negatives. It essentially estimates how many of the actual positives the classification model classifies correctly.

F1-score

The F1-score is a function of both the precision and the sensitivity. It is an important statistical evaluation parameter for understanding the difference between precision and accuracy. The F1-score may be a good measure when a balance between precision and sensitivity is needed, and it is mostly used when the distribution of classes is unequal.

Parameter            Formula                                        Equation number
Accuracy (ACC)       ACC = 100 * (TP + TN) / (TP + FN + TN + FP)    (3.1)
Precision (PRE)      PRE = TP / (TP + FP)                           (3.2)
Sensitivity (SEN)    SEN = TP / (TP + FN)                           (3.3)
F1-score (F1)        F1 = TP / (TP + (FP + FN) / 2)                 (3.4)

Figure 5. Architecture of AlexNet: Convolution, max-pooling, LRN and FC - fully connected layer. (Layer sequence shown in the figure: Input; Conv., MXP, LRN; Conv., MXP, LRN; Conv. & ReLU; Conv. & ReLU; Conv. & ReLU; FC; FC; Soft-max.)
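Equations (3.1)-(3.4) map directly to code. A small sketch computing all four measures from raw TP/FP/TN/FN counts (the counts below are illustrative, not taken from the paper's experiment):

```python
def accuracy(tp, fp, tn, fn):
    """Eq. (3.1): percentage of all cases classified correctly."""
    return 100 * (tp + tn) / (tp + fn + tn + fp)

def precision(tp, fp):
    """Eq. (3.2): fraction of positive predictions that are correct."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Eq. (3.3): fraction of actual positives that are found."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Eq. (3.4): TP over TP plus half of both error counts."""
    return tp / (tp + (fp + fn) / 2)

tp, fp, tn, fn = 40, 10, 40, 10
print(accuracy(tp, fp, tn, fn))  # 80.0
print(precision(tp, fp))         # 0.8
print(sensitivity(tp, fn))       # 0.8
print(f1_score(tp, fp, fn))      # 0.8
```

Note that with equal error counts (FP = FN), precision, sensitivity, and F1 coincide, which is why F1 is mainly informative when the class distribution, and hence the error types, are unbalanced.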
IV. EXPERIMENTAL RESULTS

After approximately 25 minutes, the network finished processing the image cycle. A total of 16 epochs, with 24 images per epoch, have shown the efficiency of the transfer learning method. The system reached a 95.69% accuracy rate. The more images were processed, the higher the accuracy and the lower the loss.

Table 1. The confusion matrix for system data validation.

                 Class1   Class2   Class3   Class4   Classification overall
Class1             49                  2                      51
Class2              1       34         2        1             38
Class3                                60                      60
Class4              1                  2       57             60
Truth overall      51       34        66       58            209
Figure 7. Loss progress.

The matrix shows that the cases in which the network returned errors were very few and that the algorithm used was able to identify the insect species with high accuracy. The more insect classes are added, the higher the probability that the system will return more errors or reduced accuracy. The test results depend to a large extent on the similarities between the insects and on the number of clear images gathered from the field. For higher precision, the system could re-iterate over the existing data together with the new data. This is yet another topic that needs to be studied.
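The overall accuracy reported in Section IV can be cross-checked from the confusion matrix in Table 1: the diagonal holds the correctly classified samples, so accuracy is the diagonal sum over the grand total. A short sketch using the table's values:

```python
# Table 1: rows are the classified (predicted) classes Class1..Class4,
# columns are the true classes; blanks in the table are zeros.
confusion = [
    [49,  0,  2,  0],   # classified as Class1
    [ 1, 34,  2,  1],   # classified as Class2
    [ 0,  0, 60,  0],   # classified as Class3
    [ 1,  0,  2, 57],   # classified as Class4
]

correct = sum(confusion[i][i] for i in range(4))   # diagonal sum: 200
total = sum(sum(row) for row in confusion)         # all samples: 209
accuracy = 100 * correct / total
print(round(accuracy, 2))  # 95.69, matching the reported accuracy rate
```

The row and column totals of this matrix reproduce the "Classification overall" column (51, 38, 60, 60) and the "Truth overall" row (51, 34, 66, 58) of Table 1.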