Article
Artificial Reef Detection Method for Multibeam Sonar Imagery
Based on Convolutional Neural Networks
Zhipeng Dong 1, Yanxiong Liu 1,2, Long Yang 1,2,*, Yikai Feng 1,2, Jisheng Ding 1,2 and Fengbiao Jiang 1
1 The First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China
2 The Key Laboratory of Ocean Geomatics, Ministry of Natural Resources, Qingdao 266590, China
* Correspondence: yanglong@fio.org.cn
Abstract: Artificial reef detection in multibeam sonar images is an important measure for the monitoring and assessment of biological resources in marine ranching. To accurately detect artificial reefs in multibeam sonar images, this paper proposes an artificial reef detection framework for multibeam sonar images based on convolutional neural networks (CNNs). First, a large-scale multibeam sonar image artificial reef detection dataset, FIO-AR, was established and made public to promote the development of multibeam sonar image artificial reef detection. Then, an artificial reef detection framework based on a CNN was designed to detect the various artificial reefs in multibeam sonar images. Using the FIO-AR dataset, the proposed method is compared with some state-of-the-art artificial reef detection methods. The experimental results show that the proposed method achieves an 86.86% F1-score and a 76.74% intersection-over-union (IOU) and outperforms some state-of-the-art artificial reef detection methods.
Keywords: artificial reef detection; multibeam sonar images; artificial reef detection dataset;
convolutional neural networks; deep learning
Convolutional neural networks (CNNs) can learn and extract image features based on a unique network structure and a large amount of labeled data. When the training data are sufficient, the image features extracted by a CNN have good robustness and universality in different complex situations [26,27].
Therefore, some scholars have applied object detectors based on CNNs to artificial reef
detection in multibeam sonar images. For example, Xiong et al. used the Faster-RCNN [23]
and single-shot multibox detector (SSD) [21] to realize artificial reef detection in multibeam
sonar images [28]. Feldens et al. applied the You Only Look Once, version 4 (YOLOv4) [29]
object detector to detect artificial reefs using multibeam sonar images [30]. However, in
the detection results of Faster-RCNN, SSD and YOLOv4, the detected artificial reefs are
rectangular areas, making it difficult to accurately detect the boundary of artificial reefs.
In short, there are currently few studies on the automatic detection of artificial reefs in
multibeam sonar images.
The semantic segmentation framework based on CNNs can assign a class label to
each pixel in the image. It is suitable for the detection and extraction of irregularly shaped
objects. Since the success of fully convolutional networks (FCN) [31] for semantic image
segmentation, different semantic segmentation frameworks based on CNN have been
rapidly developed. The semantic segmentation architectures (e.g., U-Net [32], SegNet [33],
and Deeplab series [34–37]) have been successively proposed and successfully applied
to the semantic segmentation of different types of images. For example, for the seman-
tic segmentation of natural images, Long et al. proposed the FCN to detect multi-class
objects in the PASCAL VOC 2011 segmentation dataset [31]. Badrinarayanan et al. pro-
posed the SegNet for various object detections in the PASCAL VOC 2012 and MS-COCO
segmentation datasets [33]. Chen et al. developed the Deeplab series semantic segmen-
tation frameworks to detect various objects in the PASCAL VOC 2012 and MS-COCO
segmentation datasets [34–37]. Han et al. proposed the edge constraint-based U-Net for
salient object detection in natural images [38]. For the semantic segmentation of medical
images, Ronneberger et al. proposed the U-Net for biomedical image segmentation to
obtain good cell detection results [32]. Wang et al. used the Deeplabv3+ to detect the
pathological slices of gastric cancer in medical images [39]. Bi et al. combined a generative
adversarial network (GAN) with the FCN to detect blood vessels, cells and lung regions
in medical images [40]. Zhou et al. proposed the U-Net++ to detect cells, liver and lung
regions in medical images [41]. For the semantic segmentation of high spatial resolution
remote sensing images (HSRIs), Guo et al. combined U-Net with an attention block and
multiple losses to realize the detection of buildings for HSRIs [42]. Zhang et al. proposed
a deep residual U-Net to detect roads in HSRIs [43]. Diakogiannis et al. proposed the
ResUNet-a to realize the semantic segmentation of various geographic objects for HSRIs. The ResUNet-a can clearly detect buildings, trees, low vegetation and cars in the ISPRS 2D
semantic dataset [44]. Lin et al. proposed a nested SE-Deeplab model to detect and extract
roads in HSRIs [45]. Jiao et al. proposed the refined U-Net framework to detect clouds and
shadow regions in HSRIs. Different types of clouds and shadow regions in the HSRIs can
be accurately detected using the refined U-Net framework [46]. The semantic segmentation
frameworks based on CNNs have been successfully applied to various object detection tasks in
natural images, medical images and HSRIs. However, they have not yet been applied to
the semantic segmentation of multibeam sonar images.
To accurately detect artificial reefs in multibeam sonar images using a CNN-based semantic segmentation framework, this paper proposes an artificial reef detection framework for multibeam sonar images based on a CNN. First, a large-scale multibeam sonar image artificial reef detection dataset, FIO-AR, was established and made public to promote the development of multibeam sonar image artificial reef detection. Then, an artificial reef detection framework based on a CNN (AR-Net) is designed to detect various artificial reefs in multibeam sonar images.
The main contributions of this paper are summarized as follows.
3.1. Artificial Reef Detection Dataset for the Multibeam Sonar Images
Large-scale semantic segmentation datasets are the basis and key to the high perfor-
mance of the semantic segmentation frameworks based on CNNs. However, there are
currently no publicly available multibeam sonar image artificial reef detection datasets. To
solve this problem, a large-scale multibeam sonar image artificial reef detection dataset,
FIO-AR, was established and published for the first time to facilitate the development of
multibeam sonar image artificial reef detection. In this paper, the NORBIT-iWBMS and
Teledyne Reson SeaBat T50P multibeam echosounding (MBES) bathymeters were used to
detect three artificial reef areas in China. The NORBIT-iWBMS and Teledyne Reson SeaBat
T50P MBES bathymeters are shown in Figure 1, and their detailed parameters are presented
in Table 1. The three artificial reefs are located in Dalian, Tangshan and Qianliyan, respectively, as shown in Figure 2. Their latitudes and longitudes are roughly 121°50′41″E and 40°00′25″N, 118°58′50″E and 39°10′40″N, and 121°14′11″E and 36°13′40″N, respectively. The three artificial reef areas are approximately 0.46 km², 3 km² and 0.7 km², respectively.
The water depth of the three artificial reefs is about 2~4 m, 5~12 m and 30 m, respectively.
The three artificial reef areas are built mainly by throwing stones and reinforced con-
crete components.
Table 1. Detailed information for the NORBIT-iWBMS and Teledyne Reson SeaBat T50P.
For this paper, a rich variety of multibeam sonar artificial reef images were obtained
using two different MBES bathymeters to measure the three artificial reef areas at differ-
ent water depths. The obtained multibeam sonar artificial reef images are cropped into
1576 image blocks with a size of 512 × 512 pixels. The artificial reef areas in the 1576 image
blocks are accurately annotated using a professional semantic segmentation annotation tool
to establish the artificial reef detection dataset, FIO-AR. In the FIO-AR dataset, some images
and annotation samples are shown in Figure 3. In order to evenly match the distribution of
the training data and test data, we randomly selected 3/5 images as the training dataset,
1/5 images as the verification dataset, and 1/5 images as the test dataset from the FIO-AR
dataset. The FIO-AR dataset lays a good data foundation for artificial reef detection using the semantic segmentation frameworks based on a CNN.
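As a concrete illustration, the random 3/5, 1/5, 1/5 split described above can be scripted in a few lines. The Python sketch below assumes a hypothetical file layout (a fio_ar/images folder holding the 512 × 512 blocks); the actual FIO-AR release may organize its files differently.

```python
import random
from pathlib import Path

# Hypothetical layout: fio_ar/images holds the 1576 cropped 512 x 512 blocks.
random.seed(42)  # fixed seed so the split is reproducible

blocks = sorted(Path("fio_ar/images").glob("*.png"))
random.shuffle(blocks)

# 3/5 training, 1/5 verification, 1/5 test, as described in the text.
n = len(blocks)
n_train, n_val = 3 * n // 5, n // 5
splits = {
    "train": blocks[:n_train],
    "val": blocks[n_train:n_train + n_val],
    "test": blocks[n_train + n_val:],
}
for name, files in splits.items():
    # One file list per split, to be consumed by the training pipeline.
    Path(f"fio_ar/{name}.txt").write_text("\n".join(str(f) for f in files))
```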
Figure 2. Distribution location of three artificial reef areas.
3.2. The AR-Net Framework Design
The semantic segmentation framework based on the CNN can assign a class label to each pixel in the image. It is suitable for the detection and extraction of irregularly shaped objects. The semantic segmentation frameworks based on CNNs have been successfully applied to various object detection tasks in natural images, medical images and HSRIs. However, they have not yet been applied to the semantic segmentation of multibeam sonar images. With respect to this problem, an artificial reef semantic segmentation framework based on the CNN, the AR-Net, is designed to realize artificial reef detection and extraction in multibeam sonar images. The SegNet network structure has the advantage of a symmetrical structure in the encoding and decoding stages, which can achieve scale
consistency between the input image and the semantic segmentation result [33]. The U-Net
network structure can combine high-level semantic information and low-level location
information to achieve accurate object region extraction [32]. The residual module can make
the network parameters converge faster and better [47]. In this paper, the AR-Net combines the symmetrical structure of SegNet, the U-Net-style fusion of high-level semantic features with low-level location features, and a residual module that improves the convergence of the training parameters, in order to detect artificial reefs in multibeam sonar images. The AR-Net framework is an efficient and lightweight network
architecture, as shown in Figure 4.
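Since Figure 4 is not reproduced here, the following PyTorch sketch only approximates the ingredients this section describes: a symmetric encoder-decoder, two 3 × 3 convolutions per level, U-Net-style fusion of encoder features into the decoder, residual shortcuts, and a logistic (sigmoid) output. The level count and channel widths are illustrative placeholders, not the published AR-Net configuration (which is implemented in C++ on darknet).

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3 x 3 convolutions (stride 1, pad 1, as stated below) with a residual shortcut."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=1, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, stride=1, padding=1),
            nn.BatchNorm2d(c_out),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return torch.relu(self.conv(x) + self.skip(x))

class ARNetSketch(nn.Module):
    """Symmetric encoder-decoder with U-Net-style skip fusion and a sigmoid output."""
    def __init__(self, widths=(32, 64, 128, 256, 512)):  # placeholder widths
        super().__init__()
        self.enc = nn.ModuleList()
        c = 3
        for w in widths:                        # encoder levels
            self.enc.append(ResBlock(c, w))
            c = w
        self.pool = nn.MaxPool2d(2, 2)          # 2 x 2 kernel, stride 2, as stated below
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for w_hi, w_lo in zip(widths[::-1], widths[-2::-1]):
            self.up.append(nn.ConvTranspose2d(w_hi, w_lo, 2, stride=2))
            self.dec.append(ResBlock(2 * w_lo, w_lo))  # fused skip doubles the channels
        self.head = nn.Conv2d(widths[0], 3, 1)  # three output maps, thresholded later

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # fuse encoder features
        return torch.sigmoid(self.head(x))            # logistic output in [0, 1]

out = ARNetSketch()(torch.zeros(1, 3, 240, 240))  # 240 x 240 x 3 input, as stated below
print(out.shape)                                  # torch.Size([1, 3, 240, 240])
```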
In the AR-Net framework, five-level features are used for the image feature extraction
in the encoding stage. In each level feature, there are two feature maps. In the decoding
stage, six-level features are used for the image feature extraction. Among them, each of
the first five-level features has two feature maps, and the sixth-level feature has only one
feature map. Moreover, the feature maps obtained from each level feature in the encoding
stage are fused into the image feature extraction in the decoding stage. In the decoding
stage, the logistic function normalizes the sixth-level feature output value from 0 to 1. The
logistic function is shown in Formula (1). The output values of the logistic function are
used to calculate the network loss function and predict the network semantic segmentation
result. In the AR-Net framework, the width, height and number of bands of the input
image are 240, 240 and 3, respectively. The kernel size and stride are both set to 2 in the max-pooling layers. In the convolution layers, the size, stride and pad of the convolution kernel are set to 3, 1 and 1, respectively. During the training of the AR-Net framework, stochastic gradient descent (SGD) [48] with a mini-batch size of 12 is applied to train the network parameters. The learning rate is set to 0.0001 for the first 10,000 mini-batches, 0.00001 for mini-batches 10,000 to 50,000, and 0.000001 for mini-batches 50,000 to 100,000. The
momentum and weight decay are set to 0.9 and 0.0001, respectively. In the ground-truth
annotation, the pixel RGB values of the artificial reef area and the background are labeled as
(255, 255, 255) and (0, 0, 0), respectively. When calculating the loss function, the pixel values
of the ground-truth annotation result are divided by 255 to normalize them to 0~1. In this paper,
the root mean squared error (RMSE) is used to calculate the AR-Net loss value between the
normalized ground-truth annotation results and the output values of the logistic function.
The RMSE is calculated as shown in Formula (2).
$$s(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

$$Loss = \sqrt{\frac{1}{N \times m \times n \times 3} \sum_{k}^{N} \sum_{i}^{m} \sum_{j}^{n} \sum_{r}^{3} \left( L_{kijr} - L_{kijr}^{*} \right)^{2}} \tag{2}$$
where $x$ is the sixth-level feature output value. $Loss$ is the training loss value of the AR-Net framework for a mini-batch. $N$ is the number of images contained in a mini-batch. $m$ and $n$ are the height and width of the last feature map, respectively. $L_{kijr}$ is the output value of the logistic function at position $(r, j, i, k)$. $L_{kijr}^{*}$ is the truth value at the position $(r, j, i, k)$ of the normalized ground-truth annotation results.
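A minimal training sketch under the stated hyperparameters might look as follows; it reuses the ARNetSketch class from the earlier sketch, and next_batch is a hypothetical stand-in for the FIO-AR data loader.

```python
import torch

def rmse_loss(pred, target):
    # Formula (2): RMSE over a mini-batch of N images, m x n pixels, 3 channels.
    return torch.sqrt(torch.mean((pred - target) ** 2))

def next_batch(batch_size=12):
    # Hypothetical stand-in for the FIO-AR loader: random tensors of the right shapes.
    # Real labels would be the ground-truth annotations divided by 255.
    return torch.rand(batch_size, 3, 240, 240), torch.rand(batch_size, 3, 240, 240)

model = ARNetSketch()  # from the earlier sketch
# SGD with the stated settings: mini-batch 12, momentum 0.9, weight decay 0.0001.
opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9, weight_decay=1e-4)

for step in range(100_000):
    # Piecewise-constant schedule: 1e-4, then 1e-5 after 10,000 mini-batches,
    # then 1e-6 after 50,000, stopping at 100,000.
    opt.param_groups[0]["lr"] = 1e-4 if step < 10_000 else (1e-5 if step < 50_000 else 1e-6)
    images, labels = next_batch()
    loss = rmse_loss(model(images), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```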
In the AR-Net framework testing phase, the output values of the logistic function are multiplied by 255 to map them to 0~255. A gray image is then obtained from the output values of the logistic function using Formula (3) [49], the most widely used conversion from a color image to a gray image. If the pixel value in the gray image is greater than or equal to the threshold, the predicted image pixel corresponding to the location of the pixel in the gray image is an artificial reef; otherwise, it is the background. The threshold is set to 200 in this paper. In the prediction image, the artificial reef area and background are labeled as (255, 255, 255) and (0, 0, 0), respectively.

$$Gray = 0.299R + 0.587G + 0.114B \tag{3}$$

where Gray is the pixel value of the gray image. (R, G, B) are the output values of the three feature maps of the logistic function.
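The test-time post-processing therefore reduces to a weighted gray conversion followed by a fixed threshold. The NumPy sketch below assumes the standard luminance weights reconstructed in Formula (3).

```python
import numpy as np

def predict_mask(logistic_out, threshold=200):
    """Map three logistic feature maps (values in [0, 1]) to a binary reef mask."""
    rgb = logistic_out * 255.0                                  # scale (R, G, B) maps to 0~255
    gray = 0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]     # Formula (3)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8) # 255 = reef, 0 = background

mask = predict_mask(np.random.rand(3, 512, 512))  # toy input in [0, 1]
```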
4. Results
To verify the effectiveness of the AR-Net framework, the AR-Net framework is com-
pared with some state-of-the-art semantic segmentation algorithms (e.g., U-Net, SegNet
and Deeplab). The AR-Net is comprehensively tested and evaluated in terms of its accuracy,
efficiency, model complexity and generality. The AR-Net is implemented based on the
darknet framework and written in C++. The experiments are run on a workstation running Windows 10, with an Intel Xeon E5-2667 v4 CPU @ 3.20 GHz, 16 GB of RAM and an NVIDIA Quadro M4000 graphics card (8 GB of GPU memory).
Table 2. Performance comparisons of the six semantic segmentation algorithms for the FIO-AR
dataset. The bold numbers represent the maximum value in each column.
In Table 2, the precision of the AR-Net is slightly lower than that of the SegNet and
ResUNet-a but is higher than that of the other three semantic segmentation algorithms,
indicating that the AR-Net can well distinguish the artificial reef and background for the
multibeam sonar images. The recalls of the six semantic segmentation algorithms are
0.5312, 0.7425, 0.8559, 0.8191, 0.8184 and 0.8638, respectively. The recall of the AR-Net is
the largest, which shows that the AR-Net can clearly identify the artificial reef pixels in the
multibeam sonar images. The F1-scores of the six semantic segmentation algorithms are
0.6477, 0.7829, 0.864, 0.8557, 0.8455 and 0.8684, respectively. The AR-Net has the largest
F1-score, indicating that the AR-Net has the highest accuracy in the artificial reef detection
results for the multibeam sonar images. The IOUs of the six semantic segmentation
algorithms are 0.4789, 0.6432, 0.7605, 0.7477, 0.7324 and 0.7674, respectively. The IOU of
the AR-Net is the largest among the six semantic segmentation algorithms, which shows
that the semantic segmentation results of the AR-Net are the most consistent with the
annotation results of the ground truth for the FIO-AR dataset. The overall accuracy (OA) values of the six semantic segmentation algorithms are 0.9046, 0.9320, 0.9555, 0.9544, 0.9506 and 0.9568, respectively.
The AR-Net has the largest OA, showing that the AR-Net has the best classification results
for the multibeam sonar images. The quantitative evaluation results show that the AR-Net
outperforms the other five semantic segmentation algorithms and can obtain more accurate
artificial reef detection results for the FIO-AR dataset.
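For reference, the reported pixel-level scores follow the standard definitions of precision, recall, F1-score, IOU and OA, which can be computed from a predicted mask and a ground-truth mask as in this sketch.

```python
import numpy as np

def reef_metrics(pred, truth):
    """Pixel-level metrics for binary reef masks (True = artificial reef)."""
    tp = np.sum(pred & truth)        # reef pixels correctly detected
    fp = np.sum(pred & ~truth)       # background wrongly detected as reef
    fn = np.sum(~pred & truth)       # reef pixels missed
    tn = np.sum(~pred & ~truth)      # background correctly rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    oa = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, iou, oa

pred = np.random.rand(512, 512) > 0.5    # toy masks for illustration
truth = np.random.rand(512, 512) > 0.5
print(reef_metrics(pred, truth))
```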
Table 3. The average time consumption of the six semantic segmentation algorithms for the FIO-AR
dataset. The bold numbers represent the minimum value in the column.
This indicates that the AR-Net framework is effective for artificial reef detection for the
multibeam sonar images in the FIO-AR dataset.
Table 4. The model parameter size of the six semantic segmentation algorithms. The bold numbers
represent the minimum value in each column.
4.5. Quantitative Evaluation for Artificial Reef Detection Results of Large-Scale Multibeam
Sonar Images
To further verify the AR-Net framework’s artificial reef detection effectiveness and
generalizability, two large-scale multibeam sonar images are used to compare the AR-Net
framework with some state-of-the-art semantic segmentation algorithms. Two large-scale
multibeam sonar images show the artificial reef areas in Dongtou. They are roughly located
at 121°7′40″E and 27°50′30″N, and 121°11′20″E and 27°53′40″N, respectively. One image covers an area of approximately 1.05 km² and has a size of 4259 × 1785 pixels. The other image covers an area of approximately 0.47 km² and has a size of 2296 × 1597 pixels. The
artificial reef areas are built mainly by throwing stones and reinforced concrete components
in Dongtou. Based on the trained model using the FIO-AR database, different semantic
segmentation algorithms are tested and compared using two large-scale multibeam sonar
images. Two large-scale multibeam sonar images are divided into 512 × 512-pixel image
blocks, which are input into the semantic segmentation algorithms for artificial reef detec-
tion. When the image blocks are detected, the detection results are mapped back to the
original large-scale images.
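A sketch of this tile-and-stitch procedure is given below; model is a hypothetical callable that maps a 512 × 512 × 3 block to a 512 × 512 mask, and zero-padding the border blocks is one reasonable choice the excerpt does not specify.

```python
import numpy as np

def detect_large_image(image, model, tile=512):
    """Split an H x W x 3 sonar image into tiles, segment each tile, and
    map the per-tile masks back into a full-size prediction."""
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = image[y:y + tile, x:x + tile]
            # Border blocks may be smaller than 512 x 512; zero-pad before inference.
            ph, pw = tile - block.shape[0], tile - block.shape[1]
            padded = np.pad(block, ((0, ph), (0, pw), (0, 0)))
            mask = model(padded)  # hypothetical: returns a 512 x 512 uint8 mask
            out[y:y + tile, x:x + tile] = mask[:block.shape[0], :block.shape[1]]
    return out
```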
Table 5 shows the quantitative evaluation results of six semantic segmentation algo-
rithms for the artificial reef detection of the two large-scale multibeam images. In Table 5,
the F1-score values of the six semantic segmentation algorithms are 0.5394, 0.7729, 0.7853,
0.8163, 0.8049 and 0.8974, respectively. The F1-score value of the AR-Net is the largest,
indicating that the AR-Net has the highest accuracy in the artificial reef detection results for
the large-scale multibeam sonar images. The IOU values of the six semantic segmentation
algorithms are 0.3693, 0.6299, 0.6465, 0.6897, 0.6735 and 0.8139, respectively. The OAs of
the six semantic segmentation algorithms are 0.9025, 0.9336, 0.9451, 0.9514, 0.9486 and
0.9693, respectively. The AR-Net has the largest OA, showing that the AR-Net has the best
classification results for large-scale multibeam sonar images. The IOU of the AR-Net is the
largest among the six semantic segmentation algorithms, which shows that the semantic
segmentation results of the AR-Net are the most consistent with the annotation results of
the ground truth for the large-scale multibeam sonar images. The experimental results show
that the AR-Net is superior to the other five semantic segmentation algorithms and can
obtain more accurate artificial reef detection results for large-scale multibeam sonar images.
Table 5. Performance comparisons of the six semantic segmentation algorithms for the large-scale
multibeam images. The bold numbers represent the maximum value in each column.
Figure 5. The artificial reef detection results of the six semantic segmentation algorithms for the
multibeam sonar images in the FIO-AR dataset. (a–h) The first two columns are the multibeam
sonar images and ground-truth labels in the FIO-AR dataset, respectively. The next six columns
are the artificial reef detection results of the FCN, Deeplab, U-Net, SegNet, ResUNet-a and AR-Net,
respectively. The red rectangles represent the key comparison areas of the algorithm results.
In Figure 5a, the FCN and Deeplab fail to detect the larger artificial reef areas in the
red rectangular box. The U-Net, SegNet and ResUNet-a cannot detect the smaller-scale
artificial reef area in the red rectangle. The artificial reef detection results of the AR-Net are
basically consistent with the ground-truth labels, which can accurately detect the artificial
reef areas in the multibeam sonar images. In Figure 5b, the FCN and Deeplab miss detecting
the artificial reef areas in the red rectangle. The U-Net, SegNet and ResUNet-a produce
false detections for the artificial reef area in the red rectangle. The AR-Net can accurately
detect artificial reef areas in the multibeam sonar images. In Figure 5c, the FCN, Deeplab,
U-Net, SegNet and ResUNet-a have missed detections for the artificial reef areas in the red
rectangle. The artificial reef detection results of the AR-Net are basically consistent with
the ground-truth labels. In Figure 5d, the artificial reef detection results of the FCN and
Deeplab are quite different from the ground-truth labels, and there are a large number
of missed and false detections. For the artificial reef area in the red rectangle, the U-Net,
SegNet and ResUNet-a have false detections, while the AR-Net can accurately detect the
artificial reef in this area. In Figure 5e, the FCN and Deeplab miss the detection of artificial
reefs in red rectangles. The U-Net and SegNet mistakenly detect the gully areas in the red
rectangles as artificial reefs. The ResUNet-a and AR-Net can accurately detect discrete
artificial reefs in the multibeam sonar images. In Figure 5f, the FCN, Deeplab, U-Net,
SegNet and ResUNet-a cannot correctly detect the artificial reef area in the red rectangle.
However, the AR-Net can accurately detect the artificial reef area in the red rectangle,
and its artificial reef detection results are basically consistent with the ground-truth labels.
In Figure 5g, the FCN, Deeplab, U-Net, SegNet and ResUNet-a cannot accurately detect
discrete artificial reef areas in the red rectangle. The AR-Net can accurately detect discrete
artificial reef areas in the red rectangle, and its artificial reef detection results are the closest
to the ground-truth labels. In Figure 5h, for the artificial reef areas in the red rectangle,
the FCN, Deeplab, U-Net, SegNet and ResUNet-a cannot accurately detect them. The
AR-Net can accurately detect the artificial reef areas in the red rectangle, and its artificial
reef detection results are most similar to the ground-truth labels.
The visual evaluation results show that the artificial reef detection results of the AR-
Net are superior to those of the other five semantic segmentation algorithms for the FIO-AR
dataset. The AR-Net framework can accurately detect artificial reef areas with different
shapes in multibeam sonar images at different depths.
Figure 6. (a,d) are the large-scale multibeam sonar images. (b,e) are the artificial reef detection results using SegNet. (c,f) are the artificial reef detection results using AR-Net. The yellow rectangles with the numbers represent the key comparison areas of the algorithm results.
In Figure 6b, area 1, the SegNet misses detecting the artificial reef areas with small height differences from the seafloor. In Figure 6c, area 1, the AR-Net can basically accurately detect the artificial reef area, except for a small amount of false detection in the artificial reef boundary area. In Figure 6b, area 2, the SegNet falsely detects an undulating seafloor area as an artificial reef. In Figure 6c, area 2, the AR-Net can accurately identify the seafloor area. In Figure 6b, area 3, the SegNet misses the detection of artificial reef areas with small height differences from the seafloor and falsely detects undulating seafloor areas as artificial reefs. In Figure 6c, area 3, the AR-Net also falsely detects undulating seafloor areas as artificial reefs. From the overall comparative evaluation, the AR-Net is better than the SegNet for the detection results of artificial reefs in Figure 6a.
In Figure 6e, area 1, the SegNet has a large number of missed detections in artificial reef areas. In Figure 6f, area 1, the AR-Net can basically accurately detect the artificial reef area, except for a small amount of false detection in the artificial reef boundary area. In Figure 6e, area 2, the SegNet has a large number of missed detections and false detections in artificial reef areas. In Figure 6f, area 2, the AR-Net can accurately detect the artificial reef areas. In Figure 6e, area 3, the SegNet has a large number of missed detections and false detections in artificial reef areas. In Figure 6f, area 3, the AR-Net can accurately detect the artificial reef area, except for a small amount of false detection in the artificial reef boundary area. In Figure 6e, area 4, there are a large number of missed detections and false detections in the detection results of artificial reefs. In Figure 6f, area 4, the artificial reefs can be accurately detected using the AR-Net. Overall, in Figure 6e, the SegNet has a large number of missed detections and false detections in the detection results of densely distributed artificial reefs, while in Figure 6f, the AR-Net can accurately detect densely distributed artificial reefs, except for a small number of missed and false detections in the boundaries of artificial reefs.
The analysis results show that the AR-Net can accurately detect various types of
artificial reefs from actual large-scale multibeam sonar images, except for a small number of
false detections and missed detections in the boundary areas of artificial reefs. Therefore, the
AR-Net can be effectively applied to artificial reef detection in actual large-scale multibeam
sonar images.
6. Conclusions
In this paper, an artificial reef semantic segmentation framework, based on a CNN
(AR-Net), is designed for artificial reef detection of multibeam sonar images. The AR-Net
combines the symmetrical structure of SegNet, the U-Net-style fusion of high-level semantic features with low-level location features, and a residual module that improves the convergence of the training parameters, in order to detect artificial reefs in multibeam sonar images. Furthermore, a large-scale multibeam sonar image
artificial reef detection dataset, FIO-AR, was established and published for the first time
to facilitate the development of multibeam sonar image artificial reef detection. To verify
the effectiveness of the AR-Net, the FIO-AR dataset and two large-scale multibeam sonar
images were used to qualitatively and quantitatively compare the AR-Net with some
state-of-the-art semantic segmentation algorithms. The experimental results show that
the AR-Net is superior to other state-of-the-art semantic segmentation algorithms and can
accurately and efficiently detect artificial reef areas with different shapes in multibeam
sonar images at different depths, except for a small number of false detections and missed
detections in the boundary areas of artificial reefs. In future work, the boundary constraints
will be added to the semantic segmentation framework design to eliminate the missed and
false detections in the artificial reef boundary areas to further improve the accuracy of the
artificial reef detection results.
Author Contributions: Methodology, software, writing—original draft, writing—review and editing, Z.D.; funding acquisition, supervision, writing—review and editing, L.Y.; writing—review and editing, Y.L. and Y.F.; data collection and annotation, J.D. and F.J. All authors have read and agreed to the published version of the manuscript.
Abbreviations
References
1. Yang, H. Construction of marine ranching in China: Reviews and prospects. J. Fish. China 2016, 40, 1133–1140.
2. Yang, H.; Zhang, S.; Zhang, X.; Chen, P.; Tian, T.; Zhang, T. Strategic thinking on the construction of modern marine ranching in
China. J. Fish. China 2019, 43, 1255–1262.
3. Zhou, X.; Zhao, X.; Zhang, S.; Lin, J. Marine ranching construction and management in east china sea: Programs for sustainable
fishery and aquaculture. Water 2019, 11, 1237. [CrossRef]
4. Yu, J.; Zhang, L. Evolution of marine ranching policies in China: Review, performance and prospects. Sci. Total Environ. 2020,
737, 139782. [CrossRef]
5. Qin, M.; Wang, X.; Du, Y.; Wan, X. Influencing factors of spatial variation of national marine ranching in China. Ocean Coast.
Manag. 2021, 199, 105407. [CrossRef]
6. Kang, M.; Nakamura, T.; Hamano, A. A methodology for acoustic and geospatial analysis of diverse artificial-reef datasets. ICES
J. Mar. Sci. 2011, 68, 2210–2221. [CrossRef]
7. Zhang, D.; Cui, Y.; Zhou, H.; Jin, C.; Zhang, C. Microplastic pollution in water, sediment, and fish from artificial reefs around the
Ma’an Archipelago, Shengsi, China. Sci. Total Environ. 2020, 703, 134768. [CrossRef]
8. Yu, J.; Wang, Y. Exploring the goals and objectives of policies for marine ranching management: Performance and prospects for
China. Mar. Pol. 2020, 122, 104255. [CrossRef]
9. Castro, K.L.; Battini, N.; Giachetti, C.B.; Trovant, B.; Abelando, M.; Basso, N.G.; Schwindt, E. Early detection of marine invasive
species following the deployment of an artificial reef: Integrating tools to assist the decision-making process. J. Environ. Manag.
2021, 297, 113333. [CrossRef]
10. Whitmarsh, S.K.; Barbara, G.M.; Brook, J.; Colella, D.; Fairweather, P.G.; Kildea, T.; Huveneers, C. No detrimental effects of
desalination waste on temperate fish assemblages. ICES J. Mar. Sci. 2021, 78, 45–54. [CrossRef]
11. Becker, A.; Taylor, M.D.; Lowry, M.B. Monitoring of reef associated and pelagic fish communities on Australia’s first purpose
built offshore artificial reef. ICES J. Mar. Sci. 2016, 74, 277–285. [CrossRef]
12. Lowry, M.; Folpp, H.; Gregson, M.; Suthers, I. Comparison of baited remote underwater video (BRUV) and underwater visual
census (UVC) for assessment of artificial reefs in estuaries. J. Exp. Mar. Biol. Ecol. 2012, 416, 243–253. [CrossRef]
13. Becker, A.; Taylor, M.D.; Mcleod, J.; Lowry, M.B. Application of a long-range camera to monitor fishing effort on an offshore
artificial reef. Fish. Res. 2020, 228, 105589. [CrossRef]
14. Trzcinska, K.; Tegowski, J.; Pocwiardowski, P.; Janowski, L.; Zdroik, J.; Kruss, A.; Rucinska, M.; Lubniewski, Z.; Schneider von
Deimling, J. Measurement of seafloor acoustic backscatter angular dependence at 150 kHz using a multibeam echosounder.
Remote Sens. 2021, 13, 4771. [CrossRef]
15. Tassetti, A.N.; Malaspina, S.; Fabi, G. Using a multibeam echosounder to monitor an artificial reef. In Proceedings of the
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Piano di Sorrento, Italy,
16–17 April 2015.
16. Wan, J.; Qin, Z.; Cui, X.; Yang, F.; Yasir, M.; Ma, B.; Liu, X. MBES seabed sediment classification based on a decision fusion method
using deep learning model. Remote Sens. 2022, 14, 3708. [CrossRef]
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2014, 37, 1904–1916. [CrossRef]
18. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
19. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [CrossRef] [PubMed]
20. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 779–788.
21. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
22. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region based fully convolutional networks. In Proceedings of the Neural
Information Processing Systems (NIPS), Barcelona, Spain, 4–9 December 2016; pp. 379–387.
23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans.
Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [CrossRef]
24. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE International Conference on Computer
Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
25. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
26. Dong, Z.; Wang, M.; Wang, Y.; Liu, Y.; Feng, Y.; Xu, W. Multi-oriented object detection in high-resolution remote sensing imagery
based on convolutional neural networks with adaptive object orientation features. Remote Sens. 2022, 14, 950. [CrossRef]
27. Dong, Z.; Wang, M.; Wang, Y.; Zhu, Y.; Zhang, Z. Object detection in high resolution remote sensing imagery based on
convolutional neural networks with suitable object scale features. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2104–2114. [CrossRef]
28. Xiong, H.; Liu, L.; Lu, Y. Artificial reef detection and recognition based on Faster-RCNN. In Proceedings of the IEEE
2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China,
17–19 December 2021; Volume 2, pp. 1181–1184.
29. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
30. Feldens, P.; Westfeld, P.; Valerius, J.; Feldens, A.; Papenmeier, S. Automatic detection of boulders by neural networks. In
Hydrographische Nachrichten 119; Deutsche Hydrographische Gesellschaft E.V.: Rostock, Germany, 2021; pp. 6–17.
31. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE
International Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
32. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the
International Conference on Medical Image Computing and Computer-assisted Intervention, Munich, Germany, 5–9 October 2015;
Springer: Cham, Switzerland, 2015; pp. 234–241.
33. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [CrossRef] [PubMed]
34. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets
and fully connected CRFs. Comput. Sci. 2014, 357–361. [CrossRef]
35. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolu-
tional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [CrossRef]
[PubMed]
36. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017,
arXiv:1706.05587.
37. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image
segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018;
pp. 801–818.
38. Han, L.; Li, X.; Dong, Y. Convolutional edge constraint-based U-Net for salient object detection. IEEE Access 2019, 7, 48890–48900.
[CrossRef]
39. Wang, J.; Liu, X. Medical image recognition and segmentation of pathological slices of gastric cancer based on Deeplab v3+ neural
network. Comput. Met. Prog. Biomed. 2021, 207, 106210. [CrossRef]
40. Bi, L.; Feng, D.; Kim, J. Dual-path adversarial learning for fully convolutional network (FCN)-based medical image segmentation.
Vis. Comput. 2018, 34, 1043–1052. [CrossRef]
41. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In
Proceedings of the 4th Deep Learning in Medical Image Analysis (DLMIA) Workshop, Granada, Spain, 20 September 2018.
42. Guo, M.; Liu, H.; Xu, Y.; Huang, Y. Building extraction based on U-Net with an attention block and multiple losses. Remote Sens.
2020, 12, 1400. [CrossRef]
43. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [CrossRef]
44. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely
sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114. [CrossRef]
45. Lin, Y.; Xu, D.; Wang, N.; Shi, Z.; Chen, Q. Road extraction from very-high-resolution remote sensing images via a nested
SE-Deeplab model. Remote Sens. 2020, 12, 2985. [CrossRef]
46. Jiao, L.; Huo, L.; Hu, C.; Tang, P. Refined unet: Unet-based refinement network for cloud and shadow precise segmentation.
Remote Sens. 2020, 12, 2001. [CrossRef]
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
48. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Backpropagation applied to handwritten zip
code recognition. Neural Comput. 1989, 1, 541–551. [CrossRef]
49. Dong, Z.; Liu, Y.; Xu, W.; Feng, Y.; Chen, Y.; Tang, Q. A cloud detection method for GaoFen-6 wide field of view imagery based on
the spectrum and variance of superpixels. Int. J. Remote Sens. 2021, 42, 6315–6332. [CrossRef]
50. He, S.; Jiang, W. Boundary-assisted learning for building extraction from optical remote sensing imagery. Remote Sens. 2021,
13, 760. [CrossRef]