You are on page 1of 14

PLOS ONE

RESEARCH ARTICLE

Bass detection model based on improved


YOLOv5 in circulating water system
Longqin Xu1,2,3,4☯, Hao Deng1,2,3,4☯, Yingying Cao5, Wenjun Liu5, Guohuang He1,2,3,4,
Wenting Fan1,2,3,4, Tangliang Wei1, Liang Cao1,2,3,4, Tonglai Liu ID1,2,3,4*,
Shuangyin Liu ID1,2,3,4,5*
1 College of Information Science and Technology, Zhongkai University of Agriculture and Engineering,
Guangzhou, China, 2 Academy of Intelligent Agricultural Engineering Innovations, Zhongkai University of
Agriculture and Engineering, Guangzhou, China, 3 Intelligent Agriculture Engineering Technology Research
Center of Guangdong Higher Education Institues, Zhongkai University of Agriculture and Engineering,
a1111111111 Guangzhou, China, 4 Guangzhou Key Laboratory of Agricultural Products Quality & Safety Traceability
a1111111111 Information Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, China, 5 College of
a1111111111 Computer and Information Engineering, Tianjin Agricultural University, Tianjin, China
a1111111111
☯ These authors contributed equally to this work.
a1111111111
* tonglailiu@zhku.edu.cn (TL); shuangyinliu@zhku.edu.cn (SL)

Abstract
OPEN ACCESS
The feeding amount of bass farming is closely related to the number of bass. It is of great
Citation: Xu L, Deng H, Cao Y, Liu W, He G, Fan W,
et al. (2023) Bass detection model based on significance to master the number of bass to achieve accurate feeding and improve the eco-
improved YOLOv5 in circulating water system. nomic benefits of the farm. In view of the interference caused by the problems of multiple tar-
PLoS ONE 18(3): e0283671. https://doi.org/ gets and target occlusion in bass data for bass detection, this paper proposes a bass target
10.1371/journal.pone.0283671
detection model based on improved YOLOV5 in circulating water system. Firstly, acquiring
Editor: Praveen Kumar Donta, TU Wien: by HD cameras, Mosaic-8, a data augmentation method, is utilized to expand datasets and
Technische Universitat Wien, AUSTRIA
improve the generalization ability of the model. And K-means clustering algorithm is applied
Received: October 27, 2022 to generate suitable coordinates of prior boxes to improve training efficiency. Secondly,
Accepted: March 13, 2023 Coordinate Attention mechanism (CA) is introduced into backbone feature extraction net-
Published: March 27, 2023 work and neck feature fusion network to enhance attention to targets of interest. Finally,
Soft-NMS algorithm replaces Non-Maximum Suppression algorithm (NMS) to re-screen
Copyright: © 2023 Xu et al. This is an open access
article distributed under the terms of the Creative prediction boxes and keep targets with higher overlap, which effectively solves the problems
Commons Attribution License, which permits of missed detection and false detection. The experiments show that the proposed model
unrestricted use, distribution, and reproduction in can reach 98.09% in detection accuracy and detection speed reaches 13.4ms. The pro-
any medium, provided the original author and
posed model can help bass farmers under the circulating water system to accurately grasp
source are credited.
the number of bass, which has important application value to realize accurate feeding and
Data Availability Statement: All relevant data are
water conservation.
within the paper and its Supporting information
files.

Funding: National Natural Science Foundation of


China under Grant 61871475, in part by the Special
Project of Laboratory Construction of Guangzhou
Innovation Platform Construction Plan under Grant Introduction
201905010006, Guangzhou Key Research and
Bass is an edible fish with high nutritional value. The amount of bait feeding in bass culture
Development Project under Grant
202103000033,201903010043, Guangdong
is closely related to the number of bass [1]. If the feeding amount is too small, it will easily
Science and Technology Project under Grant lead to mutual injury of bass due to starvation and reduce survival rates. If fed too much,
2020A1414050060,2020B0202080002, Innovation it will reduce the quality of water and waste resources. Therefore, monitoring the bass

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 1 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Team Project of Universities in Guangdong population is beneficial to scientific feeding of bass and control of culture density [2], which
Province under Grant 2021KCXTD019, is important to achieve efficient bass culture. The traditional methods of bass counting
Characteristic Innovation Project of Universities in
mainly relies on manual estimation, which is time-consuming and error-prone. With the
Guangdong Province under Grant KA190578826,
Natural Science Foundation of Guangdong application of computer vision technology in aquaculture, target detection algorithms are
Province under Grant 2023A1515011230, adopted to solve the problem of accurate counting by acquiring biological image data
Educational Science Planning Project of through HD cameras and other devices. Traditional target detection algorithms use image
Guangdong Province under Grant 2020GXJK102, processing methods to extract features [3–6]. Zhang [7] applied image processing methods
and Grant 2018GXJK072, Guangdong Province
such as binarization, expansion and erosion to extract fry images and employed connected
Graduate Education Innovation Program Project
under Grant 2022XSLT056, 2022JGXM115,
area algorithm and refinement algorithm to count bass in the images. Comparing the two
Guangdong Science and Technology Planning algorithms, the refinement algorithm has higher accuracy for images with high overlap, but
Project under Grant 2015A040405014. The it requires high quality image acquisition. In practical applications, most of the acquired
Technical Service Project for Xingning Pigeon images are not suitable for direct counting. After designing a tank with a stable flow rate,
Industrial Park of Zhongkai University of Wang [8] used local threshold segmentation to extract targets and calculated the total num-
Agriculture and Engineering (Construction and
ber of bass from non-overlapping regions in images. In addition, four-neighborhood labeling
Promotion of Visual Information Platform).
National Innovation Industry Training Program for [9], grayscale image analysis [10], image noise reduction and segmentation [11] are also
College Students in 2022 under Grant employed to biological counting. Convolutional neural networks can greatly improve accu-
202211347023. National Innovation Industry racy and generalization ability [12–15], which are widely used in the field of fish target detec-
Training Program for College Students in 2021 tion [16–21]. An improved YOLOv3 model was proposed by Cui [22] to implement a
under Grant 202111347004. Guangdong
counting system of puffer fish. The improved YOLOv3 model effectively reduced the occur-
Innovation Industry Training Program for College
Students in 2022 under Grant S202211347083.
rence of missed and false detection in overlapping areas, and the counting accuracy can
Guangdong Basic and Applied Basic Research reach 92.5%. Using local image training and migration learning, Lu [23] proposed a light-
Foundation 2022B 1515120059. Science and weight YOLOv4-based shrimp automatic counting model with 92.12% counting accuracy.
Technology Project of Yunfu City under Grant For high-resolution images, Chen [24] designed an adaptive cropping preprocessing algo-
2022020303. rithm to augment datasets and proposed a YOLOv5-based model, which achieved 92.55%
Competing interests: The authors have declared accuracy. Li [25] improved VGG-19 (a convolutional neural network) to achieve 99.79%
that no competing interests exist. accuracy in evaluating iced pomfret freshness. The fish recognition method based on convo-
lutional neural networks have high recognition accuracy, but the inference speed is slow,
which cannot meet the demand of rapid multi-target detection. Zhao [26] proposed a fish
detection method combining visual attention mechanism SKNet and YOLOv5, it had recog-
nition accuracy of 98.86% and recall rate of 96.64%, 2.14% higher and 2.29% higher com-
pared with YOLOv5. Li [27] proposed a novel method of abnormal behavior detection based
on image fusion, the BCS-YOLOv5 based image fusion achieved the best accuracy with an
average accuracy of 96.69%. Zhao [28] proposed a high-precision and lightweight end-to-
end target detection model based on deformable convolution and improved YOLOv4, the
proposed model has an accuracy of 95.47% while the parameter amount is reduced by 10
times and the FPS is doubled. To address the above problems, this paper proposes an
improved YOLOv5 model for bass detection under recirculating water aquaculture system,
aiming to improve the rapid localization ability and counting accuracy. Taking bass as the
research object, the proposed model introduces Coordinate Attention mechanism [29] based
on YOLOv5 model to accurately locate and identify the targets of interest, and utilizes the
Soft-NMS [30] algorithm instead of the original NMS [31] algorithm for selecting more suit-
able target boxes.

Materials and methods


Experimental steps of proposed work
In this study, The Experimental steps of the proposed work are shown in Fig 1.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 2 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 1. Experimental steps of the proposed work.


https://doi.org/10.1371/journal.pone.0283671.g001

Data acquisition
In the circulating water bass breeding workshop of Guangdong Provincial General Fisheries
Technology Promotion Station in Nansha District, Guangzhou, a bass video data collection
platform is built, mainly including a circular fish pond made of polypropylene plates, HD cam-
eras and video recorders. As shown in Fig 2, the diameter of the fish pond is 2.7 meters. The
datasets are collected from 7:00 am to 18:00 pm. In order to obtain complete images of fish
ponds from the top viewing angle, cameras are directly installed above the center of fish pond,
with a height of 1 meter from the water surface. The cameras have 1080P resolution, 2.8mm
focal length lens, 130 degree wide angle lens, and 30FPS MP4 video capture format. The
acquired videos are intercepted as JPG format images, and 1066 photos are intercepted at dif-
ferent time periods and light intensities, as shown in Fig 3.

Data annotation
In this paper, the datasets are manually labeled with the labeling tool LabelImg in VOC dataset
format, naming label target as fish, as shown in Fig 4. The annotation information is stored in
an XML file, which corresponds to the image file. Each line stores one target information, in
order: target category, X and Y axis coordinates of the center point of the detection boxes, and
target width and height.

Data augmentation
YOLOv5 model applied the data augmentation method Mosaic, which randomly scales and
crops four images and then concatenates them to form a single image. This method calculates
4 images at a time during the normalization operation, which can reduce the requirement for

Fig 2. Bass video data acquisition platform.


https://doi.org/10.1371/journal.pone.0283671.g002

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 3 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 3. Video capture image.


https://doi.org/10.1371/journal.pone.0283671.g003

memory and speed up training. Therefore, an improved data augmentation method Mosaic-8
is adopted in this paper. Mosaic-8 increases the number of images processed from 4 to 8 [32],
while random noise and random brightness are introduced to improve robustness and gener-
alization, as shown in Fig 5.

Improved YOLOv5 bass target detection algorithm


Principle of YOLOv5s algorithm. In this paper, lightweight detection model YOLOv5s
is applied, which has characteristics of fast detection, high performance, flexibility and rapid
deployment. YOLOv5s consists of a backbone feature extraction network (Backbone), a neck
feature fusion network (Neck), and a prediction layer (Prediction).
The backbone feature extraction network employs a CSPDarknet network structure that
mainly consists of a residual convolution layer (Bottleneck) and a CSPnet network structure,
which can increase the depth of networks and enhance the learning ability of convolutional
neural networks. Secondly, the backbone feature extraction network introduces a Focus net-
work structure to slice and stack the input images, expand the number of channels, decrease

Fig 4. Image annotation.


https://doi.org/10.1371/journal.pone.0283671.g004

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 4 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 5. Mosaic-8 data enhancement process.


https://doi.org/10.1371/journal.pone.0283671.g005

parameters and reduce the memory usage during training. Moreover, the backbone feature
extraction network also introduces a SPPF network structure, which employs maximum pool-
ing with different kernel sizes to increase the perceptual field.
Inspired by feature pyramids, the neck feature fusion network consists of a FPN [33] layer
with a PAN [34] layer. The FPN layer performs a down-sampling operation to deliver strong
semantic information from the top down, while the PAN layer performs an up-sampling oper-
ation to deliver strong localization information from the bottom down. They aggregate fea-
tures from different backbone feature layers to different prediction layers.
The prediction layer consists of three prediction feature layers, which aim to detect targets
of different sizes. The detection results output from the prediction layer are filtered by non-
maximal suppression (NMS) to obtain the highest scoring detection boxes. The structure of
the YOLOv5s network is shown in Fig 6.
Coordinate attention. Since environmental constraints and fixed camera shooting angles,
fish overlap and occlusion problems often occur, resulting in low model detection accuracy.
Therefore, in this paper, Coordinate Attention mechanism (CA) that combines location infor-
mation with channel information, is added to capture not only cross-channel information but
also direction-aware and location-sensitive information. The method helps to locate and iden-
tify the target of interestand improve accuracy in the case of overlap and occlusion.
As shown in Fig 7, CA mechanism is divided into two main steps, namely coordinate infor-
mation embedding and coordinate attention generation, aim to embed location information
into channel attention.
Improved backbone feature extraction network. To extract different scale features in the
backbone feature extraction network, CA mechanism is inserted into the C3 module, and the
CBS module on another branch is removed. As shown in Fig 8, the C3CA module is replaced
by the C3 module, which can effectively reduce parameters and make the model more
lightweight.
Improved neck feature fusion network. Similarly, the neck feature fusion network incor-
porates CA mechanism to obtain multi-scale feature information, which can help to locate tar-
gets of interest.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 5 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 6. YOLOv5s network structure.


https://doi.org/10.1371/journal.pone.0283671.g006

The framework diagram of the improved YOLOv5 perch detection model is shown in Fig 9.
Loss function. The loss function of the improved YOLOv5 target detection model is
shown in Eq 1, which consists of a bounding box regression loss Lbbox, a confidence loss Lcfd
and a object class loss Lcls.
Loss ¼ Lbbox þ Lcfd þ Lcls ð1Þ

The bounding box regression loss Lbbox employs CIOU that takes into account overlap area,
central point distance and aspect ratio between ground-turth boxes and predicted boxes, mak-
ing the detection accuracy higher, as shown in Eqs 3, 4, 5 and 6.
Lbbox ¼ 1 CIOU ð2Þ

rðb; bgt Þ
CIOU ¼ IOU av ð3Þ
c2

A\B
IOU ¼ ð4Þ
A[B

v
a¼ ð5Þ
ð1 IOUÞ þ v

� �2
4 wgt w
v¼ arctan gt arctan ð6Þ
p2 h h

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 6 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 7. Coordinate attention mechanism.


https://doi.org/10.1371/journal.pone.0283671.g007

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 7 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 8. C3 module replaced by C3CA module.


https://doi.org/10.1371/journal.pone.0283671.g008

where IoU denotes the ratio of the intersection of the prediction box A and the ground-turth
box B to their concatenation. ρ denotes Euclidean distance between the center point b of the
prediction box and the center point bgt of ground-turth boxes, and c presents the diagonal
length of the minimum enclosing box. α is a weight parameter, and v is a aspect ratio coeffi-
cient of the ground-turth box to the predicted box. h and w denote the height and width of the
prediction box, and hgt and wgt denote the height and width of ground-turth boxes.
The confidence loss Lcfd is shown as Eq 7.

K �K X
X M
Lcfd ¼ ^ i logCi þ ð1
Iijobj ½C ^ i Þlogð1
C ^ i Þ�
C
i¼0 j¼0
ð7Þ
X
K �K X
M
lnoobj ^ i logCi þ ð1
Iijnoobj ½C ^ i Þlogð1
C ^ i Þ�
C
i¼0 j¼0

where K × K denotes the grid parameters of the output layer, M denotes the number of

Fig 9. Improved YOLOv5 network structure.


https://doi.org/10.1371/journal.pone.0283671.g009

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 8 / 14


PLOS ONE Bass detection model based on improved YOLOv5

prediction boxes corresponding to each output grid, C is the total number of categories, and
λnoobj is a penalty weight.
The object class loss Lcls is shown as Eq 8.

X
K�K X
Lcls ¼ Iijobj ½P^ i ðcÞlogPi ðcÞ þ ð1 P^ i ðcÞÞlogð1 P^ i ðcÞÞ� ð8Þ
i¼0 c2cls

where P is the classification probability, Pi(c) is the probability that the i-th sample is of class c,
and P^ i ðcÞ is the probability that the i-th sample is predicted to be of class c.
Soft-NMS. In the post-processing stage of YOLOv5, Non-Maximum Suppression (NMS)
is used to remove most of duplicate predictors and reduce the incidence of false detection. Pre-
diction boxes are sorted first followed by filtering according to the pre-set threshold, and maxi-
mum value prediction boxes are output, as shown in Eq 9.

8
< sk ; AIOU ðM; bk Þ � Nj
sk ¼ ð9Þ
: 0; AIOU ðM; bk Þ < Nj

where sk is a score of the k-th pre-set box, Nj is the preset threshold, M is the current highest
scoring predictor box, bk is a k-th predictor box to be filtered, and AIoU is the overlap region
between M and bk.
When two fish are close to each other in a densely populated image, NMS only keeps the
highest scoring prediction boxes and delete low-scoring prediction boxes, resulting in a low
recall and missed detection. Therefore, Soft-NMS method is utilized to replace the traditional
NMS in this paper, as shown in Eq 10.

8
< sk ; AIOU ðM; bk Þ � Nj
sk ¼ ð10Þ
: s ð1 AIOU ðM; bk ÞÞ; AIOU ðM; bk Þ < Nj
k

Soft-NMS resets scores based on the overlap of similar detection boxes and retains the pre-
diction boxes for next screening, solving the problem that traditional NMS directly removes
low-scoring prediction boxes that lead to missed detection.

Experimental results and analysis


Experimental platform
All experiments are conducted in Ubuntu 16.04 and Pytorch 1.7.1, where GPU is GeForce
GTX 3090 with 32 GB of memory, CPU is Intel(R) Xeon(R) Gold 6146 @2.40 GHz.
The original YOLOv5s model introduces a prior boxes to speed up training. In this experi-
ment, K-means algorithm is applied to cluster different sizes of bounding boxes, and 9 cluster-
ing centers are regenerated as prior boxes. The coordinates of clustering centers are (12,72),
(46,27), (16,81), (21,76), (44,39), (28,63), (20,97), (41,58), and (35,79), respectively, as shown in
Fig 10.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 9 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 10. Cluster center distribution diagram.


https://doi.org/10.1371/journal.pone.0283671.g010

Evaluation metrics
The average precision (AP) and F1 are employed as evaluation metrics, where AP is the area
under the Precision-Recall curve, as shown in Eqs 11, 12 and 13.
TP
P¼ ð11Þ
TP þ FP

TP
R¼ ð12Þ
TP þ FN

Z 1
AP ¼ PðRÞdR ð13Þ
0

where P denotes the probability of correct prediction, R denotes the probability of incorrect
prediction, TP is the number of successful predictions, FP is the number of incorrect predic-
tions, and FN is the number of unpredicted predictions.
F1 is the average of precision and recall, as shown in Eq 14.
P�R
F1 ¼ 2 � ð14Þ
PþR

In addition, experiments also analyze the model performance in terms of detection time,
number of parameters, and model sizes.
Model training and hyperparameter setting. To verify the performance of the proposed
model, it is compared with target detection models YOLOv3tiny, YOLOv3, YOLOv4,
YOLOv7 and YOLOv5s. The total number of iterations for experiments are 200, the iteration
batch size is 32, the optimizer is Adam, and the default parameters are adopted. The learning
rate is set to 0.01, and the cosine annealing learning rate adjustment strategy is introduced.
The training loss and validation loss of each iteration are recorded and plotted, and the model
with the lowest loss in validation set is saved as the training result. The losses for training and
validation of different models are shown in Fig 11.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 10 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 11. Loss value change curve.


https://doi.org/10.1371/journal.pone.0283671.g011

Comparative analysis of different target detection models


As shown in Table 1, compared with YOLOv5, YOLOv5-CA-Soft-NMS shows 2.91% increase
in AP and 8.03% decrease in the amount of parameters, and 1.1ms decreases in detection time.
This shows that with the introduction of C3CA module, the backbone feature extraction net-
work can effectively extract location features and channel features to focus on the region of
interest. Compared with YOLOv5-CA, YOLOv5-CA-Soft-NMS has an increased detection
time by 1.6ms, but it can also meet the requirements of fast detection with higher accuracy.
This phenomenon indicates that although Soft-NMS has more computational overhead than
NMS, it is more effective in detecting dense overlapping perch targets.
In terms of accuracy, YOLOv5-CA-Soft-NMS improves AP by 3.21% and 0.43% compared
with YOLOv3-tiny and YOLOv4, respectively. However, compared with YOLOv3, AP is
reduced by 0.59%. It is possible that YOLOv3 classifies incorrect targets as correct predictions,
resulting in higher AP.
In terms of performance, compared with YOLOv5-CA-Soft-NMS, YOLOv3-tiny, YOLOv3,
YOLOV4 and YOLOv7 has 1.3 times, 9.69 times, 8.15 times and 5.76 times higher number of
parameters, 1.31 times, 15.3 times, 9.45 times and 5.63 times higher model size, and 1.26 times,
3.4 times, 2.6 times and 1.34 times higher detection time, respectively. As shown in Table 1,
YOLOv5-CA-Soft-NMS has a lower number of parameters, smaller model weights and better
detection speed.

Comparative analysis of detection effect of different models


To verify the effectiveness of detection, YOLOv5-CA-Soft-NMS and other models are com-
pared on the test set, as shown in Fig 12.
In region A of Fig 12(a), YOLOv5-CA successfully detects all targets that cross-obscure
each other, and all other models have missed detection. In region B of Fig 12(a),

Table 1. Comparative experimental results.


Model AP_0.5 Recall Precision F1 Parameters Model size Time
YOLOv3tiny 94.7% 88.71% 92.01% 90.32% 8666692 16.61MB 17ms
YOLOv3 98.68% 95.33% 97.34% 96.32% 62546518 193.89MB 46ms
YOLOv4 97.66% 93.71% 95.14% 94.42% 52569318 119.83MB 35ms
YOLOv7 97.71% 93.63% 95.32% 94.26% 37196556 71.32MB 18ms
YOLOv5 95.18% 91.26% 93.52% 92.37% 7012822 13.7MB 14.5ms
YOLOv5-CA 97.64% 92.59% 95.31% 93.93% 6413870 12.59MB 11.8ms
YOLOv5-CA-Soft-NMS 98.09% 93.7% 95.78% 94.72% 6449758 12.67MB 13.4ms
https://doi.org/10.1371/journal.pone.0283671.t001

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 11 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Fig 12. Comparison of detection effects of different models.


https://doi.org/10.1371/journal.pone.0283671.g012

YOLOv5-CA-Soft-NMS successfully detects all parallel-obscured targets, while all other mod-
els have missed detection phenomena.
In region C of Fig 12(b), YOLOv5-CA successfully detects all densely overlapping targets,
and all other models suffer from false detection.
In region E of Fig 12(c), there are partially overlapping targets and a large number of inde-
pendent targets. YOLOv5-CA-Soft-NMS successfully detects all targets, while other models
have missed or false detection. There are targets with a high degree of overlap in region F of
Fig 11(c), and YOLOv5-CA-Soft-NMS and YOLOv5-CA successfully detect all targets, while
other models have missed detection.
In summary, YOLOv5-CA-Soft-NMS detects targets better than other models and effec-
tively improves accuracy in the case of target occlusion.

Conclusions
For the problems of multiple target occlusion and overlap in fish dense farming scenarios, this
paper proposes a bass target detection model based on improved YOLOv5.
To effectively detect bass targets, CA mechanism is inserted to improve the backbone fea-
ture extraction network and neck feature fusion network of YOLOv5 to capture cross-channel
information, orientation awareness and location sensitive information. The average mean
accuracy of model detection is improved by 2.46% compared with YOLOv5s model.
To effectively retain targets with high overlap degree and improve the recognition accuracy,
Soft-NMS replaces NMS to re-screen prediction boxes. The detection accuracy is improved by
2.91% compared with YOLOv5s.
In this paper, we propose a bass detection model, YOLOv5-CA-Soft-NMS, which has a
detection speed of 13.4ms and a detection accuracy of 98.09%. Compared with YOLOv3-tiny,
YOLOv3, YOLOv4, YOLOv5 and YOLOv7, it has higher detection accuracy and good detec-
tion speed. The proposed model is more suitable for rapid detection of bass population, so that
farmers can feed bait scientifically, control cost and save resources, which has high practical
application value.

Supporting information
S1 Dataset.
(ZIP)

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 12 / 14


PLOS ONE Bass detection model based on improved YOLOv5

Acknowledgments
The authors would like to thank Prof. Shuangyin Liu, Dr. Tonglai Liu for revising the
manuscript.

Author Contributions
Conceptualization: Hao Deng, Shuangyin Liu.
Data curation: Hao Deng.
Funding acquisition: Shuangyin Liu.
Investigation: Yingying Cao.
Methodology: Liang Cao.
Resources: Shuangyin Liu.
Software: Longqin Xu.
Supervision: Tonglai Liu.
Validation: Wenjun Liu, Guohuang He, Wenting Fan, Tangliang Wei.
Writing – original draft: Hao Deng, Tonglai Liu.

References
1. Zhao J. Key points of healthy breeding technology for perch[J]. Fishery Guide to be Rich, 2020, (23):
49–51.
2. Li K, Jiang X, Chen E, et al. Auto-counting the Eel Anguilla in recirculating aquaculture system via deep
learning[J]. Oceanologia et Limnologia Sinica, 2022, 53(03): 664–674.
3. Awalludin EA, Muhammad W N A W, Arsad T N T, et al. Fish larvae counting system using image pro-
cessing techniques[C]//Journal of Physics: Conference Series. IOP Publishing, 2020, 1529(5):
052040.
4. Kesvarakul R, Chianrabutra C, Chianrabutra S. Baby shrimp counting via automated image processing
[C]//Proceedings of the 9th International Conference on Machine Learning and Computing. 2017: 352–
356.
5. Albuquerque P L F, Garcia V, Junior A S O, et al. Automatic live fingerlings counting using computer
vision[J]. Computers and Electronics in Agriculture, 2019, 167: 105015. https://doi.org/10.1016/j.
compag.2019.105015
6. Klapp I, Arad O, Rosenfeld L, et al. Ornamental fish counting by non-imaging optical system for real-
time applications[J]. Computers and electronics in agriculture, 2018, 153: 126–133. https://doi.org/10.
1016/j.compag.2018.08.007
7. Zhang J, Pang H, Cai W, et al. Using image processing technology to create a novel fry counting algo-
rithm[J]. Aquaculture and Fisheries, 2022, 7(4): 441–449. https://doi.org/10.1016/j.aaf.2020.11.004
8. Wang W, Xu J, Du Q. Study on a computer vision based automatic counting system of fries[J]. Fishery
Modernization, 2016, 43(03): 34–38, 73.
9. Liu S, Chen J, Liu X, et al. Study on Chlorella automatic counting based on the algae fluorescence exci-
tation effect[J]. Fishery Modernization, 2012, 39(05): 16–20.
10. Fan L, Liu Y. Automate fry counting using computer vision and multi-class least squares support vector
machine[J]. Aquaculture, 2013, 380: 91–98. https://doi.org/10.1016/j.aquaculture.2012.10.016
11. Huang L, Hu B, Cao N. The Novel Fries-Counting Method Based on Image Processing[J]. Hubei Agri-
cultural Sciences, 2012, 51(09): 1880–1882.
12. Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal
networks[J]. Advances in neural information processing systems, 2015, 28.
13. Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Pro-
ceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779–788.
14. Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot multibox detector[C]//European conference on com-
puter vision. Springer, Cham, 2016: 21–37.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 13 / 14


PLOS ONE Bass detection model based on improved YOLOv5

15. Bai Qiangy Gao Ronghuay Zhao Chunjiangy et al. Multi-scale behavior recognition method for dairy
cows based on improved YOLOV5s network[J]. Transactions of the Chinese Society of Agricultural
Engineering (Transactions of the CSAE), 2022, 38(12): 163–172.
16. Chen YY, Gong C Y, Liu Y Q. Fish identification method based on FTVGG16 convolutional neural net-
work[J]. Trans. Chin. Soc. Agric. Mach, 2019, 50: 223–231.
17. Cai K, Miao X, Wang W, et al. A modified YOLOv3 model for fish detection based on MobileNetv1 as
backbone[J]. Aquacultural Engineering, 2020, 91: 102117. https://doi.org/10.1016/j.aquaeng.2020.
102117
18. Hong Khai T, Abdullah S N H S, Hasan MK, et al. Underwater Fish Detection and Counting Using Mask
Regional Convolutional Neural Network[J]. Water, 2022, 14(2): 222. https://doi.org/10.3390/
w14020222
19. Kandimalla V, Richard M, Smith F, et al. Automated detection, classification and counting of fish in fish
passages with deep learning[J]. Frontiers in Marine Science, 2022: 2049. https://doi.org/10.3389/fmars.
2021.823173
20. Lainez S M D, Gonzales D B. Automated fingerlings counting using convolutional neural network[C]//
2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS). IEEE,
2019: 67–72.
21. Li D, Miao Z, Peng F, et al. Automatic counting methods in aquaculture: A review[J]. Journal of the
World Aquaculture Society, 2021, 52(2): 269–283. https://doi.org/10.1111/jwas.12745
22. Cui Z. Based on computer vision the counting system of puffer fish[D]. Dalian Ocean University, 2020.
23. Zhang L, Zhou X, Li B, et al. Automatic shrimp counting method using local images and lightweight
YOLOv4[J]. Biosystems Engineering, 2022, 220: 39–54. https://doi.org/10.1016/j.biosystemseng.
2022.05.011
24. Chen Z, Li Z, Yang Z, et al. Research on YOLOv5-based object detection method for factory farmed
shrimp[J]. Marine Fisheries: 1–13.
25. Li Zhenbo, Li Meng, Zhao Yuanyang, et al. Iced pomfret freshness evaluation method based on
improved VGG-19 convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural
Engineering (Transactions of the CSAE), 2021, 37(22):286–294.
26. ZHAO Meng, Hong YU, Haiqing LI, et al. Detection of fish stocks by fused with SKNet and YOLOv deep
learning[J]. Journal of Dalian Ocean University, 2022, 37(2): 312–319.
27. Li Xin, Hao Yinfeng, Zhang Pan, et al. A novel automatic detection method for abnormal behavior of sin-
gle fish using image fusion. Computers and Electronics in Agriculture, 2022, 203: 107435. https://doi.
org/10.1016/j.compag.2022.107435
28. Zhao Shili, Zhang Song, Lu Jiamin, Wang He, Feng Yu, Shi Chen, et al. A lightweight dead fish detec-
tion method based on deformable convolution and YOLOV4, Computers and Electronics in Agriculture,
2022, 198: 107098. https://doi.org/10.1016/j.compag.2022.107098
29. Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design[C]//Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition. 2021: 13713–13722.
30. Bodla N, Singh B, Chellapa R, et al. Soft-nms- improving object detection with one line of code C].2017
IEEE International Conference on Computer Vision (ICCV), 2017:5562–5569.
31. Neubeck A, Van Gool L. Efficient non-maximum suppression[C]//18th International Conference on Pat-
tern Recognition (ICPR’06). IEEE, 2006, 3: 850–855.
32. Guo L, Wang Q, Xue W, et al. A Small Object Detection Algorithm Based on Improved YOLOv5[J]. Jour-
nal of University of Electronic Science and Technology of China, 2022, 51(02): 251–258.
33. Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the
IEEE conference on computer vision and pattern recognition. 2017: 2117–2125.
34. Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the
IEEE conference on computer vision and pattern recognition. 2018: 8759–8768.

PLOS ONE | https://doi.org/10.1371/journal.pone.0283671 March 27, 2023 14 / 14

You might also like