2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC) | 978-1-6654-3185-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/ITOEC53115.2022.9734340

Abstract—With the falling hardware cost of Unmanned Aerial Vehicles (UAVs) and the development of deep learning algorithms, real-time object detection applied to UAV vision offers great advantages in many fields. However, the limited energy budget and computing power of the embedded devices used in drones, together with the variable object scales and complex backgrounds in UAV vision, restrict the application of drone-based object detection. In this paper, we optimized the generation of anchor boxes, introduced a new module that enlarges the receptive field to improve the detection of small targets, and used adaptively spatial feature fusion in the feature pyramid to strengthen the fusion of multi-scale features. Finally, we pruned the model to make it lighter and faster, and obtained an Average Precision (AP) of 89.7% on UAV car aerial images at a speed of 35.7 FPS when running on Neural Processing Units (NPUs), which demonstrates that the intelligent object detection algorithm can be processed efficiently in a hardware-resource-limited environment.

Keywords—UAV aerial images; car detection; embedded hardware; real-time processing

I. INTRODUCTION

In recent years, UAVs have played an irreplaceable role in many military and civilian fields by virtue of their light weight, maneuverability, flexible movement and low energy consumption. Real-time object detection is an important task in UAV applications and has been well explored. With the rapid development of computer vision and artificial intelligence technology, methods based on deep learning have become widely used among the many target detection algorithms because they do not need feature engineering and achieve higher precision.

Object detection algorithms based on deep learning are mainly divided into two-stage methods and one-stage methods. Typical algorithms of the former are R-CNN and its improved versions Fast R-CNN and Faster R-CNN. The detection speed of these algorithms is too slow to meet real-time requirements. One-stage algorithms do not need to extract candidate regions first, so their speed is greatly improved. Typical algorithms include the YOLO series and SSD.

However, the usual object detection methods based on deep learning need a large amount of computation and large storage space for intermediate results. The load capacity of a UAV is limited; for example, the maximum load weight of the Inspire 2 professional UAV produced by DJI (Dajiang) is 810 g. A UAV platform for object detection in car aerial images can therefore only use a battery-driven embedded system with limited computing resources. The limitation of hardware cost and transmission delay is thus still the primary problem for object detection tasks in UAV vision.

To sum up, the difficulties of deploying deep neural network algorithms to achieve effective UAV-based object detection focus on two aspects [6]:

(1) UAV vision has special mission scenarios and object characteristics. Mainstream object detection tasks mostly adopt a conventional horizontal angle of view, where the object scale is larger and the characteristics are more obvious. By contrast, the content of drone shots is complex, the targets to be detected are small, and their features are ambiguous, which is not the task type targeted by mainstream network designs.

(2) The computing resources of the UAV platform are limited, which makes it difficult to meet the need for real-time detection.

II. PROPOSED OBJECT DETECTION ALGORITHM

A. The framework of the YOLOv3 algorithm

We use YOLOv3 as a reference model due to its high precision and its ease of deployment on embedded platforms such as NPUs.

YOLOv3 is an algorithm proposed in recent years [1]. It uses a more powerful feature extraction network, Darknet-53. As shown in Figure 1, thanks to the introduction of a residual structure, Darknet-53 deepens the network to 53 layers, and the feature extraction capability is further improved. In addition, YOLOv3 draws on the anchor mechanism of Faster R-CNN and introduces the feature pyramid network (FPN) structure [2], performing detection on three feature scales of 13×13, 26×26, and 52×52. This greatly improves the detection performance of the YOLO series on small targets.
Authorized licensed use limited to: ULAKBIM UASL - Uluslararasi Kibris Universitesi. Downloaded on March 22,2023 at 05:15:40 UTC from IEEE Xplore. Restrictions apply.
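The three detection scales of YOLOv3 follow directly from the downsampling factors of the network. As a small sketch, assuming the standard YOLOv3 strides of 32, 16 and 8 and a square input (the function name and the input sizes 416 and 512 are illustrative assumptions, not values stated in the text), the grid sizes can be computed as:

```python
# Sketch: side length of each YOLOv3 detection grid for a square input.
# Assumption: the standard YOLOv3 strides (32, 16, 8); input sizes are illustrative.
YOLOV3_STRIDES = (32, 16, 8)


def detection_grid_sizes(input_size, strides=YOLOV3_STRIDES):
    """Return the feature-map side length at each detection scale."""
    if any(input_size % s for s in strides):
        raise ValueError("input_size must be divisible by every stride")
    return [input_size // s for s in strides]


print(detection_grid_sizes(416))  # [13, 26, 52]
print(detection_grid_sizes(512))  # [16, 32, 64]
```

A 416×416 input gives the 13×13, 26×26 and 52×52 grids quoted for YOLOv3. The finest grid has the smallest cells, which is why it is the scale most relevant to small targets.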
Fig. 1. YOLOv3 network structure

However, because the network structure of YOLOv3 is still too complicated, its detection speed is reduced, and it cannot run in real time on edge devices with limited computing power such as drones.

B. Optimize the parameters of the anchor

Figure 2 shows the sample distribution obtained from the sample widths and heights of the general COCO and VOC data sets. Figure 3 shows the sample distribution obtained from the sample widths and heights of the UAV aerial image data set used in this paper. We can clearly see that the sample size in the aerial data set varies greatly and that most of the targets are small, so it is necessary to re-select appropriate anchor parameters to make better predictions on the aerial data set.

Fig. 2. Bounding and anchor distribution of general data set (left) and UAV aerial image data set (right)

TABLE I. THE COMPARISON OF DIFFERENT ANCHOR GENERATION METHODS

Method             Number of anchors   Avg IOU
K-means            9                   0.73
Improved methods   9                   0.75

The test results show that the improved method for optimizing the anchor box parameters performs better on the aerial data set used in this paper. Table 1 compares the Avg IOU of the traditional K-means clustering algorithm and the improved method on this data set.

C. Context augmentation module

Object detection requires context information, especially for small objects, which make up most of the aerial data sets. Therefore, we propose an improved receptive field module (IRFM), shown in Figure 4, which merges multi-scale dilated convolutions [3] and sets a trainable parameter W to perform weighted fusion with the shortcut layer outputs.
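As a rough illustration of the idea behind the improved receptive field module (IRFM), the sketch below applies 3×3 dilated convolutions at several dilation rates to a single-channel feature map and blends their average with the shortcut (here, the input itself) using a scalar weight w. The dilation rates (1, 2, 3), the averaging of the branches, the fusion form w * branches + (1 - w) * shortcut, and all function names are assumptions made for illustration; the actual module is the one shown in Figure 4. Plain Python lists are used so the example has no dependencies.

```python
def effective_kernel(k, d):
    """Effective side length of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x, kernel, dilation):
    """Zero-padded 'same' 2-D dilated convolution of a single-channel map."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr = r + i * dilation - pad
                    cc = c + j * dilation - pad
                    if 0 <= rr < h and 0 <= cc < w:  # outside the map counts as zero
                        acc += kernel[i][j] * x[rr][cc]
            out[r][c] = acc
    return out


def irfm_sketch(x, kernels, dilations=(1, 2, 3), w=0.5):
    """Average the dilated branches, then blend with the shortcut using weight w."""
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    n = len(branches)
    return [[w * sum(b[r][c] for b in branches) / n + (1.0 - w) * x[r][c]
             for c in range(len(x[0]))] for r in range(len(x))]


# Toy input: a 7x7 map with a single active pixel in the center.
feature = [[0.0] * 7 for _ in range(7)]
feature[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
fused = irfm_sketch(feature, [ones, ones, ones])

print([effective_kernel(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
```

With the same nine weights per branch, the three dilation rates cover 3×3, 5×5 and 7×7 neighborhoods, which is how dilated convolutions add context without adding parameters. In a trained network, w would be learned together with the kernels.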
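The anchor re-selection discussed in Section B is commonly done by clustering the widths and heights of the ground-truth boxes. The sketch below implements only the traditional K-means baseline, using the 1 - IoU distance popularized by the YOLO family, and reports the Avg IOU metric compared in Table 1. It does not reproduce the improved method. The function names, the toy box list and the deterministic initialization are assumptions for illustration.

```python
def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def avg_iou(boxes, anchors):
    """Mean IoU between each box and its best-matching anchor."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)


def kmeans_anchors(boxes, k, iters=100):
    """K-means on (w, h) pairs with distance d = 1 - IoU."""
    # Assumed initialization: spread the initial centers over boxes sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int((i + 0.5) * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(b, anchors[i]))
            clusters[best].append(b)
        new_anchors = []
        for members, old in zip(clusters, anchors):
            if members:
                new_anchors.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_anchors.append(old)  # leave the center of an empty cluster unchanged
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy (width, height) samples in pixels; not taken from the paper's data set.
boxes = [(10, 12), (12, 10), (11, 11), (40, 20), (42, 22), (38, 18),
         (100, 60), (105, 58), (98, 62)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
print(round(avg_iou(boxes, anchors), 3))
```

For a YOLOv3-style detector, k would be 9 and the sorted anchors would be split into groups of three, one group per detection scale.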
method to perform channel pruning [5] and layer pruning simultaneously, compressing the width and the depth of the model respectively. The model compression process and its specific steps are shown in Figure 6.
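The selection step of such a compression pipeline can be sketched as follows. This is a minimal illustration that assumes channels are ranked by the magnitude of their batch-normalization scale factors with a single global threshold, and that layers are ranked by their mean scale factor; the excerpt does not state the exact criteria, so the criteria, the function names and the toy values are all assumptions.

```python
def channel_keep_masks(bn_gammas, prune_ratio):
    """Channel pruning (width): keep channels whose |gamma| exceeds a global threshold.

    bn_gammas maps a layer name to its list of BN scale factors.
    Returns a dict mapping each layer name to a list of booleans (True = keep).
    """
    magnitudes = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    n_prune = int(len(magnitudes) * prune_ratio)
    threshold = magnitudes[n_prune - 1] if n_prune > 0 else float("-inf")
    masks = {}
    for name, gammas in bn_gammas.items():
        mask = [abs(g) > threshold for g in gammas]
        if not any(mask):
            # Do not empty a layer here; keep its strongest channel.
            strongest = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            mask[strongest] = True
        masks[name] = mask
    return masks


def layers_to_prune(bn_gammas, n_layers):
    """Layer pruning (depth): return the n_layers names with the smallest mean |gamma|."""
    score = {name: sum(abs(g) for g in gs) / len(gs) for name, gs in bn_gammas.items()}
    return sorted(score, key=score.get)[:n_layers]


# Toy scale factors for three hypothetical layers.
gammas = {
    "conv1": [0.9, 0.01, 0.5, 0.02],
    "conv2": [0.03, 0.04, 0.8, 0.7],
    "conv3": [0.001, 0.002, 0.003, 0.004],
}
masks = channel_keep_masks(gammas, prune_ratio=0.5)
weakest = layers_to_prune(gammas, n_layers=1)
print(masks["conv1"])  # [True, False, True, False]
print(weakest)         # ['conv3']
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy.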
computing performance. We chose this board for its low power consumption, its accelerated computation of convolutional networks, and its support for the localized development of chips. Figure 7 shows the overall design of the target detection system.
still allocated for each scale. The calculation results are as follows:

TABLE II. ANCHOR BOXES ON DIFFERENT FEATURE MAP

Feature map   Anchor boxes (K-means)        Anchor boxes (improved method)
16x16         (64,91) (102,59) (153,128)    (64,86) (105,58) (151,127)
32x32         (38,34) (40,58) (66,36)       (43,22) (41,49) (68,35)
64x64         (15,36) (40,21) (23,44)       (11,9) (21,16) (26,30)

TABLE III. AP CALCULATED BY DIFFERENT METHODS

Anchor generation method   AP@0.5 (car)
K-means                    0.835

The table above shows that the object detection algorithm proposed in this paper, based on YOLOv3, improves AP by 7.4% compared with YOLOv3 on this data set, and that the AP after compression is still about 6.2% higher than that of YOLOv3.

Comprehensive experiment. We compare several groups of algorithms on the same data set. Table 5 shows the comparison results of YOLOv3-ours, YOLOv3, YOLOv3-spp and YOLOv4 in accuracy, model size and inference time. We can see that the size of the model is reduced by 86.5% compared to YOLOv3, and that the YOLOv3-ours model has great advantages in detection accuracy and model size.

TABLE VI. AP CALCULATED BY DIFFERENT SETS OF EXPERIMENTS