
2023 International Seminar on Intelligent Technology and Its Applications (ISITIA)

Detection of Indonesian Fishing Vessels on Unmanned Aerial Vehicle Images using YOLOv5s

979-8-3503-1395-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ISITIA59021.2023.10221071

Gramandha Wega Intyanto
Department of Electrical Engineering
University of Jember
Jember, Indonesia
gramandha@unej.ac.id

Khairul Anam
Department of Electrical Engineering
University of Jember
Jember, Indonesia
khairul@unej.ac.id

Berlian Juliartha Martin Putra
Pemeliharaan Komputer dan Jaringan
Akademi Komunitas Negeri Pacitan
Pacitan, Indonesia
berlian@aknpacitan.ac.id

Heri Prasetyo
Department of Informatics
Universitas Sebelas Maret
Surakarta, Indonesia
heri.prasetyo@staff.uns.ac.id

Abstract— At UPT Port, Marine and Fishery Resources Management Tamperan, Pacitan, several obstacles are often encountered in the operational supervision of fishing-vessel transportation. Some sailors or fishermen place their boats outside their assigned zone, so officers have to check which boats are misplaced and instruct their crews to correct the parking, even though personnel and technology are limited, which makes the data-collection process slow. To solve this problem, the authors researched the detection of Indonesian fishing vessel types (daplangan boat, jukung boat, hand line boat, and purse seine boat) in Unmanned Aerial Vehicle (UAV) images taken over the parking zone at the port. After obtaining the image data, preprocessing is carried out in several stages: image standardization by resizing as needed, data splitting (69% training data, 19% validation data, 11% test data) by cross-validation, and ground-truth creation. The deep-learning-based detection technique uses the YOLOv5s architecture. The results obtained with this method are a recall of 0.932, a precision of 0.941, mAP@0.5 of 0.972, and mAP@0.5:0.95 of 0.74.

Keywords— detection, Indonesian fishing vessels, unmanned aerial vehicle (UAV) image, deep learning, YOLOv5s

I. INTRODUCTION

Pacitan Regency is geographically located at the southwestern tip of East Java Province, where the southern region borders the Indian Ocean directly. This position, directly adjacent to the southern sea, provides great potential for economic growth in the maritime sector [1]. It can serve as a driving force for realizing the government's vision, which expresses the ideal of the Indonesian nation as the world's maritime axis [2]. The world maritime axis policy proclaimed five main pillars as priorities, among them the utilization of maritime natural resources [3]. In an effort to utilize maritime natural resources, the East Java provincial government assigned the UPT of ports and management of marine and fisheries resources (P2SKP) Tamperan, Pacitan, to carry out operational monitoring of marine and fisheries resources for fisheries shipping transportation [4]. In practice, the supervision of fish shipping at the Tamperan port faces several obstacles caused by the limited personnel and technology available to the officers. The most prominent cases concern fishing vessels parked at the port (Fig. 1).

The first case is that every day, morning and evening, an officer must record the number of boats parked at the port, and personnel are short. Because the counting is performed manually, the data-collection procedure takes a long time. The second case is that the port has a rule requiring fishing vessels of each type (daplangan, jukung, hand line, and purse seine) to be parked in their designated zone. However, some sailors or fishermen still park their boats outside their zone, so officers have to check which vessels do not conform and instruct their crews to correct the parking. These problems and constraints at the Tamperan Beach Port motivated the authors to conduct research on the fishing vessels in the parking zone, with the aim of detection, checklisting, and counting through digital imagery. The authors used a flying robot, often referred to as an unmanned aerial vehicle (UAV), to take pictures of the parking zones at the port, applying fishing-vessel detection with deep learning using the YOLOv5s model. Previous research on UAV imagery includes using UAV remote sensing to find and map individual fruit trees [5], ship detection based on YOLO [6], [7], building a UAV for humanitarian aid delivery [8], a review of recent advances in UAVs [9], and so on. Few previous studies have applied UAV imagery to the segmentation and detection of fishing vessels; most work relies on synthetic aperture radar (SAR) or satellite imagery. Deep learning is expected to be very efficient and effective in this case study of image segmentation and detection, both because of its computational approach to feature extraction and classification and on the evidence of previous studies [10]–[12]. Several deep learning models or architectures are used, and evaluation tests are carried out to determine the best accuracy in detecting fishing vessels in UAV images, so that the results can be recommended for problem-solving applications in future research.

II. METHOD

The research method consists of five stages: data collecting, data labeling, data preprocessing, model building, and evaluation. Detailed information for each stage is described below:

A. Data Collecting

We use UAV image data of the parking zone of the Tamperan Fishing Port, Pacitan. The camera drone used is the DJI model FC3170. The image data vary widely, from color differences caused by the weather to different parking-zone locations. The specifications of one image are given in Table 1.

TABLE I. IMAGE SPECIFICATION
Property                        | Value
Dimensions                      | 8000 x 6000
Resolution unit                 | 2
Horizontal & vertical resolution | 72 dpi
Bit depth                       | 24
Color representation            | sRGB

In each image there are roughly 300-500 boats. The dataset is selected empirically so that five main images are obtained, and image patches of 1024x1024 pixels are extracted with a shift of 256 pixels, as shown in Fig. 1. As a result, we obtain a total dataset of 1214 extracted image patches.

Fig. 1. Illustration of patches: (a) full image, (b) extracted image patches

B. Image Labeling

The UAV images were captured in the parking zone of Tamperan Pacitan Port. Each image contains fishing-vessel objects. The fishing-vessel category has four types: daplangan, hand line, purse seine, and jukung (indexed from 0 to 3). The categories of fishing vessels are shown in Fig. 2.

Fig. 2. Image labeling

In the dataset, we perform labeling with the labelme tool in the form of segmentation labels. We then convert these with the roboflow tool into the bounding-box label format used by YOLOv5 (Fig. 3).

Fig. 3. Sample data: (a) image data, (b) annotation segmentation, (c) annotation box

The 1214 images produced a total of 17500 annotations, distributed across the categories as follows: daplangan 11745, jukung 2928, handline 1742, and purse seine 1085. A graphic illustration is given in Fig. 4.

Fig. 4. Number of instances per fishing vessel type

C. Data Preprocessing

At this stage, we resize every image in the dataset to 640x640 pixels to speed up the computation during training. The data split is 69% training data (843 images), 19% validation data (236 images), and 11% test data (135 images) by cross-validation.

D. Model Building

YOLOv5, the fifth generation of the YOLO model family, is one of the most widely used object detection models. It has been used extensively for tasks including the detection of pedestrians, automobiles, airplanes, and ships [13]. The dataset was developed, and the object detection method chosen experimentally for training, validating, and testing the fishing-vessel detection model in the parking zone is YOLOv5 (You Only Look Once). The YOLOv5 model has three parts: the backbone network, the neck network, and the head network (Fig. 5). The backbone network processes the input data to extract features (feature extraction); the neck network collects and distributes features across different scales; the head network assesses the position and category of the target box. In YOLOv5, the backbone adopts a cross-stage partial network [14] to solve the gradient-information duplication problem and eliminate redundant gradients during optimization [5]. It also adopts path aggregation networks [15] and spatial pyramid pooling networks [16]. The neck network enhances the recognition of objects at various scales and the capacity to recognize the same object at different scales [5]. The head network is the same detection layer as in YOLOv3 and YOLOv4: it produces the final output vector with class probabilities, objectness scores, and bounding-box predictions, and chooses the best anchor box on the feature map [5].

Fig. 5. The structure of the YOLOv5 model [5]

Taking advantage of the flexibility of the YOLOv5 model configuration, the model file structure is modified through the depth (depth_multiple) and width (width_multiple) parameters. The model size is adjusted indirectly by changing the number of bottlenecks in the cross-stage partial networks and the number of convolution kernels in each convolution layer. This study uses the YOLOv5 model at the YOLOv5s scale. Parameter details are shown in Table 2.

TABLE II. PARAMETER
Model   | Number of layers | Width | Depth | Number of parameters | GFLOPs
YOLOv5s | 283              | 0.50  | 0.33  | 7263185              | 16.9
*GFLOPs denotes Giga Floating-point Operations Per Second

E. Evaluation

Ideally, evaluating deep learning models covers several aspects, such as accuracy, memory usage, and speed (inference time). However, accuracy-oriented evaluation metrics are the measures used in most related research to date [5], [10]. The accuracy of detection and segmentation algorithms can be assessed using the following popular methods.

The popular metrics for the accuracy of classic image segmentation models are precision (1), recall (2), and F1-score (3). The evaluation matrices (Fig. 7) typically display the outcomes of the three equations combined for analysis. TP denotes true positive, FP false positive, and FN false negative in (1) and (2) [5], [10].

Precision = TP / (TP + FP) (1)

Recall = TP / (TP + FN) (2)

This study combines recall and precision into the F1-score, defined as the harmonic mean of recall and precision [5], [10], where F1 denotes the F1-score, R the recall, and P the precision.

F1 = 2 · R · P / (R + P) (3)

The Average Precision (AP) measures how well a model detects a specific category in deep-learning object detection [5], [10], as defined in (4), where AP represents the average precision and P(R) is the precision-recall curve of the category. The mean average precision (mAP), the mean of the AP over the N categories, is a comprehensive measure of a model across all detection categories.

AP = ∫_0^1 P(R) dR, mAP = (1/N) Σ_{i=1}^{N} AP_i (4)

The Jaccard Index refers to the ratio of the intersection and union areas of the segmentation map and the projected ground truth; another name for it is Intersection over Union (IoU). In (5), A and B represent the ground truth and the segmentation prediction, and the value ranges between 0 and 1. Another popular metric is Mean-IoU, the average IoU across all classes. These metrics are used to report the performance of modern segmentation and detection algorithms [5], [10].

J(A, B) = IoU(A, B) = |A ∩ B| / |A ∪ B| (5)

III. RESULTS AND DISCUSSION

The model-building section above introduced the basic idea of the YOLOv5s algorithm. This section focuses on the analysis and performance results of YOLOv5s for fishing-vessel detection on UAV imagery. In this study, the device we used was Google Colab; Fig. 6 details the device specifications. The evaluation results are analyzed and displayed in Figs. 7-9. The predicted results on image data are shown in Figs. 10 and 11.

Fig. 6. Device specifications

The hyperparameters were fine-tuned to obtain the optimal ones, as follows:
• Initial learning rate: 0.01;
• Max epochs: 100;
• Minimum batch size: 16;
• Optimizer: SGDM.

Initially, 0.01 was chosen as the learning rate, 100 was specified as the maximum number of epochs, the minimum batch size was set to 16, and the models were trained with a Stochastic Gradient Descent with Momentum (SGDM) strategy to avoid overfitting and underfitting.

Fig. 7. The recognition accuracy of YOLOv5s (evaluation metrics)

Fig. 7 shows relatively high accuracy in each category of fishing vessel: 0.99 for the daplangan boat, 0.96 for the hand line boat, 0.94 for the jukung boat, and 0.97 for the purse seine boat. mAP@0.5 uses an IoU threshold of 0.5, and mAP@0.5:0.95 averages over IoU thresholds between 0.5 and 0.95, reflecting the average accuracy across all four categories in the evaluation metrics of the YOLOv5s model. Table 3 describes this in detail.
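The metrics defined in (1), (2), (3), and (5) translate directly into code. The following is a generic sketch of these standard formulas, not the authors' implementation; boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples:

```python
def precision(tp: int, fp: int) -> float:
    # Eq. (1): fraction of predicted boxes that are correct
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    # Eq. (2): fraction of ground-truth boxes that are found
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p: float, r: float) -> float:
    # Eq. (3): harmonic mean of precision and recall
    return 2 * p * r / (p + r) if p + r else 0.0

def iou(box_a, box_b) -> float:
    # Eq. (5): Jaccard index of two axis-aligned boxes
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection is typically counted as a TP when its IoU with a ground-truth box exceeds the chosen threshold (0.5 for mAP@0.5), which is how (5) feeds into (1) and (2).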

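The patch-extraction step of Section II-A (1024x1024 windows shifted every 256 pixels) amounts to a sliding-window crop. Below is a minimal NumPy sketch using a blank array at the paper's stated 8000x6000 resolution in place of a real UAV image; edge handling and patch selection are assumptions here, so the resulting count differs from the paper's 1214:

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int = 1024, stride: int = 256):
    """Slide a patch x patch window over an H x W x C image with the given stride."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
    return patches

# A dummy image at the resolution listed in Table I (8000 x 6000 px, RGB)
dummy = np.zeros((6000, 8000, 3), dtype=np.uint8)
tiles = extract_patches(dummy)
```

Each slice is a view into the source array, so the crop itself costs no extra memory until patches are saved to disk.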
TABLE III. RESULT OF EVALUATION METRICS
Class/Type  | Labels | Images | Recall | Precision | mAP@0.5 | mAP@0.5:0.95
all         | 3516   | 236    | 0.932  | 0.941     | 0.972   | 0.74
daplangan   | 2360   | 236    | 0.974  | 0.946     | 0.99    | 0.808
handline    | 384    | 236    | 0.948  | 0.963     | 0.977   | 0.766
jukung      | 577    | 236    | 0.827  | 0.887     | 0.93    | 0.575
purse seine | 195    | 236    | 0.981  | 0.97      | 0.99    | 0.809
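As a quick consistency check, the "all" row of Table III matches the unweighted mean of the four per-class mAP values to within rounding:

```python
# Per-class values copied from Table III: daplangan, handline, jukung, purse seine
map50 = [0.99, 0.977, 0.93, 0.99]
map50_95 = [0.808, 0.766, 0.575, 0.809]

mean_map50 = sum(map50) / len(map50)           # ~0.972, the "all" row value
mean_map50_95 = sum(map50_95) / len(map50_95)  # ~0.74, the "all" row value

assert abs(mean_map50 - 0.972) < 0.001
assert abs(mean_map50_95 - 0.74) < 0.001
```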

Fig. 8. The graphic curve of F1-score, precision, and recall with confidence

Fig. 9. Graphical visualization of evaluation metrics for the YOLOv5s model
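Curves like those in Fig. 8 are produced by sweeping the detection confidence threshold and recomputing precision, recall, and F1 at each setting. The sketch below illustrates the mechanism on made-up scores with a simplified matching rule; it is not the paper's data:

```python
def pr_f1_at_threshold(scores, labels, threshold):
    """scores: predicted confidences; labels: 1 if the detection matches a
    ground-truth box, else 0. Detections below the threshold are discarded."""
    kept = [label for s, label in zip(scores, labels) if s >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = sum(labels) - tp  # matched objects lost by thresholding (schematic)
    p = tp / (tp + fp) if kept else 1.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy data: higher-confidence detections are more often correct
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3]
labels = [1, 1, 1, 1, 0, 0]
curve = [pr_f1_at_threshold(scores, labels, t) for t in (0.1, 0.5, 0.7)]
```

Raising the threshold trades recall for precision, which is why the F1-confidence curve peaks at an intermediate operating point.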

Fig. 8 represents the relationship between precision, recall, F1-score, and confidence, as well as between recall and precision. It displays the outcomes before and after the improvement of the YOLOv5s algorithm. As shown in Fig. 9,
the training and validation curves of recall, precision, mAP@0.5, and mAP@0.5:0.95 for the YOLOv5s model over 0 to 100 epochs show stable accuracy after 80 epochs. In the evaluation metrics, the average accuracy value of each model trained from 1-100 epochs was selected as the final evaluation value. Fig. 9 also shows the loss indicator during the training and validation process. If the training loss value is close to the validation loss value, the model is not overfitting; a lower loss value indicates better accuracy. With YOLOv5s, there is no overfitting during training on the fishing-vessels dataset. The predicted results of image detection in this study are shown in Figs. 10 and 11.

Fig. 10. Sample image results: (a) image, (b) ground truth, (c) prediction results

Fig. 11. Data results: (a) label prediction results by index type, (b) label prediction results by name type

IV. CONCLUSIONS

The YOLOv5s model was able to detect fishing vessels with high accuracy: a recall of 0.932, a precision of 0.941, mAP@0.5 of 0.972, and mAP@0.5:0.95 of 0.74. This method is recommended for detecting fishing vessels in the parking zone. For future research, we will compare other methods for detecting fishing vessels in the parking zone in UAV images.

ACKNOWLEDGEMENT

Thanks to KEMENRISTEKDIKTI, especially DIKTI and DIKSI, for funding through the DPRM PDP 2022 grant. Thanks also to the collaborating universities: the University of Jember, Universitas Sebelas Maret, and Akademi Komunitas Negeri Pacitan.

REFERENCES

[1] R. R. Umami, "Analisis sektor potensial pengembangan wilayah guna mendorong pembangunan daerah di kabupaten pacitan," Universitas Diponegoro, 2014.
[2] A. F. Fandisyah, N. Iriawan, and W. S. Winahju, "Deteksi Kapal di Laut Indonesia Menggunakan YOLOv3," J. Sains dan Seni ITS, vol. 10, no. 1, pp. D25–D32, 2021.
[3] Widodo and A. Bandono, "Penguasaan Dan Pengembangan Iptek Kemaritiman Guna Mewujudkan Indonesia Sebagai Poros Maritim Dunia," J. Sci. Technol., vol. 14, no. 3, pp. 319–327, 2021.
[4] "Rencana Strategis Dinas Kelautan dan Perikanan Provinsi Jawa Timur," 2019. [Online]. Available: https://ppptamperan.dkp.jatimprov.go.id/wordpress/wp-content/uploads/2020/04/Renstra-2014-2019-RENCANA-STRATEGIS-DINAS-KELAUTAN-DAN-PERIKANAN-PROVINSI-JAWA-TIMUR.pdf
[5] Y. Xiong, X. Zeng, J. Liao, W. Lai, Y. Chen, and M. Zhu, "An approach to detecting and mapping individual fruit trees integrated YOLOv5 with UAV remote sensing," Preprints, 2022, doi: 10.20944/preprints202204.0007.v2.
[6] Y. Chang, A. Anagaw, L. Chang, Y. C. Wang, C. Hsiao, and W. Lee, "Ship Detection Based on YOLOv2 for SAR Imagery," Remote Sens., 2019, doi: 10.3390/rs11070786.
[7] H. Wu, Y. Hu, W. Wang, and X. Mei, "Ship Fire Detection Based on an Improved YOLO Algorithm with a Lightweight Convolutional Neural Network Model," Sensors, 2022, doi: 10.3390/s22197420.
[8] A. F. Babgei and H. Suryoatmojo, "Building an Unmanned Aerial Vehicle for Humanitarian Aid Delivery," Indones. J. Electron. Instrum. Syst., vol. 10, no. 1, pp. 53–64, 2020, doi: 10.22146/ijeis.55144.
[9] F. Ahmed, J. C. M. Anupam, K. Pankaj, and S. Yadav, "Recent Advances in Unmanned Aerial Vehicles: A Review," Arab. J. Sci. Eng., vol. 47, no. 7, pp. 7963–7984, 2022, doi: 10.1007/s13369-022-06738-0.
[10] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, "Image Segmentation Using Deep Learning: A Survey," IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–20, 2021.
[11] G. W. Intyanto, "Klasifikasi Citra Bunga dengan Menggunakan Deep Learning: CNN (Convolution Neural Network)," J. Arus Elektro Indones., vol. 7, no. 3, pp. 80–83, 2021.
[12] R. M. Connolly, K. I. Jinks, C. Herrera, and S. Lopez-Marcano, "Fish surveys on the move: Adapting automated fish detection and classification frameworks for videos on a remotely operated vehicle in shallow marine waters," Front. Mar. Sci., pp. 1–11, 2022, doi: 10.3389/fmars.2022.918504.
[13] Y. Zhang, Z. Guo, J. Wu, Y. Tian, H. Tang, and X. Guo, "Real-Time Vehicle Detection Based on Improved YOLO v5," 2022.
[14] C. Y. Wang, H. Y. Mark Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh, "CSPNet: A New Backbone That Can Enhance Learning Capability of CNN," IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 1571–1580, 2020, doi: 10.1109/CVPRW50498.2020.00203.
[15] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "PANet: Path Aggregation Network for Instance Segmentation," CVPR, pp. 8759–8768, 2018. [Online]. Available: http://arxiv.org/abs/1803.01534
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, 2015.

