
Automation in Construction 159 (2024) 105254


UAV-deployed deep learning network for real-time multi-class damage detection using model quantization techniques
Xiaofei Yang *, Enrique del Rey Castillo, Yang Zou, Liam Wotherspoon
Department of Civil and Environmental Engineering, University of Auckland, Auckland 1023, New Zealand
* Corresponding author. E-mail address: fyan983@aucklanduni.ac.nz (X. Yang).
https://doi.org/10.1016/j.autcon.2023.105254
Received 10 June 2023; Received in revised form 20 December 2023; Accepted 21 December 2023; Available online 30 December 2023

ARTICLE INFO

Keywords: Energy-efficient deep learning network; Real-time damage detection; Concrete bridge; Quantization-aware training; Computer vision

ABSTRACT

Real-time damage detection algorithms deployed on Unmanned Aerial Vehicles (UAVs) can support flight control in real time, enabling the capture of higher quality inspection data. However, three challenges have hindered their wider application: 1) existing anchor-based damage detectors cannot generalize well to real-world scenarios and degrade the detection speed; 2) prior studies exhibit a low detection accuracy; 3) no previous study considers the energy consumption of the damage detector, which limits the UAVs' flight time. To meet these challenges, this paper presents the YOLOv6s-GRE-quantized method, an energy-efficient, anchor-free and real-time damage detection method built on top of the YOLOv6s algorithm. Firstly, the YOLOv6s-GRE method was developed, in which a generalized feature pyramid network (GFPN), a reparameterization efficient layer aggregation network (RepELAN) and an efficient detection head were introduced into YOLOv6s. Comparison experiments showed that the YOLOv6s-GRE method, in contrast to YOLOv6s, improved mAP50 by 2.3 percentage points while maintaining a comparable detection speed and without requiring an increase in model size. The YOLOv6s-GRE model was then reconstructed by the RepOptimizer (RepOpt) to equivalently transform it into a quantization-friendly model, addressing the quantization difficulty of the reparameterization model. Finally, the YOLOv6s-GRE model with RepOpt was quantized by the partial quantization-aware training technique, expediting the detection speed by 83.5% and saving energy by 79.7% while maintaining a comparable level of detection accuracy. Implementing the proposed method can significantly boost bridge inspection productivity.

1. Introduction

Bridges form the backbone of the modern transportation network but are susceptible to deterioration and corrosion due to adverse environmental conditions and increasing traffic loads [6]. The excessive deterioration of bridges may lead to traffic restrictions and temporary closures, significantly affecting a country's economy [40]. Proper bridge condition monitoring is critical to the safe use of in-service bridges. Among the various condition monitoring strategies, visual inspection is the most frequent and cost-effective, aiming to collect "relevant data and describe defects in terms of their type, location, extent, severity and, if possible, cause" [3]. Nevertheless, conventional bridge visual inspection has been recognized as time-consuming, error prone and sometimes dangerous [48]. Thus, new techniques with a more efficient and objective inspection decision process are in high demand given that the bridge stock is ageing and the infrastructure maintenance budget is decreasing.

Unmanned aerial vehicles (UAVs) have recently seen an increased use for bridge visual inspection due to their excellent maneuverability and a wide coverage of the inspection field of view [32,44]. According to a study from the Florida Department of Transportation, over half of bridges can be inspected by UAVs [34], saving up to 60% of inspection costs [46]. Existing UAV-assisted bridge inspection processes rely on fixed-view-point navigation or manual operation multiple meters away from the bridge to capture images (e.g. 5.5 m of working distance with a Ground Sampling Distance (GSD) of 1.5 mm/pixel [43]) to provide a wide coverage of viewpoints and avoid collision. However, this does not meet the bridge health monitoring standards that require sub-millimeter levels of damage measurement accuracy. It is also impractical to conduct a closer-proximity inspection for a long time, because this would decrease the UAV coverage of viewpoints, significantly lowering the inspection efficiency and increasing the flight time while compromising flight safety. A possible solution is to deploy real-time damage detection algorithms on the UAV, empowering it with damage awareness, precision position control and close-proximity inspection. The idea is that the UAV approaches the vicinity of the damaged area with the assistance of immediate flight control algorithms only when surface defects are detected, given that healthy concrete areas comprise over 80–90% of the total bridge appearance area [1]. This leads to a damage-aware coverage of viewpoints to capture high resolution damage data with sub-millimeter level of finer details, while ensuring flight safety. It is worth noting that previous work leveraging a ground server to process the UAV-captured videos is not appropriate to support real-time flight control because it may encounter delays caused by video encoding and streaming, as well as inevitable interruptions in the video stream caused by connection issues, obstacles, and weather conditions, increasing the response time [4,45]. The key of UAV-assisted real-time damage detection is to develop a light-weight, energy-efficient and real-time damage detection algorithm that can be deployed on UAVs to support higher quality inspection data acquisition. However, most existing damage detection algorithms are not appropriate for deployment on UAVs, as existing real-time damage detection models require expensive computational resources, high memory, a large storage footprint and high energy consumption [21].

Existing real-time damage detection methods mainly leveraged light-weight networks or tiny versions of deep learning models with small model depths and widths to reduce the model parameters and computational burden. An early example proposed by Jiang et al. [22] introduced a light-weight backbone called MobileNetV3 [25] into the original You Only Look Once version 3 (YOLOv3) algorithm [14] to improve damage detection accuracy and inference speed. The recently developed YOLOv5s-HSC network employed a combination of the YOLOv5s algorithm and Swin Transformer modules as well as attention modules to conduct real-time damage detection tasks [51]. Some limitations remain, despite the extensive work and significant improvements of the last few years, as listed below:

1) Previous studies rely on anchor-based methods that spend a large amount of time on anchor relevant computation, limiting inference speed and the ability to generalize to real-world scenarios.
2) The detection accuracy of existing real-time damage detection methods is still low due to the reliance on light-weight networks or tiny versions of deep learning models.
3) None of the existing studies considered the energy consumption of deep learning networks, which is a critical factor for their deployment on UAVs that are often constrained by their battery capacity.

This study presents the development of an energy-efficient, anchor-free and real-time damage detection method using a model quantization technique, the YOLOv6s-GRE-quantized method, appropriate for deployment on UAVs. The YOLOv6s algorithm [27] is the current most advanced anchor-free method and was selected as the baseline model because of its high detection accuracy and fast inference speed, as well as its small number of parameters and low floating-point operations (FLOPs) [27]. A generalized feature pyramid network (GFPN) [23,47] was firstly introduced into the YOLOv6s algorithm as the neck network to boost information exchange across distinct spatial scales and different levels of potential semantics concurrently. Along with the GFPN, a reparameterization-based efficient layer aggregation network (RepELAN) was proposed to replace the original Reparameterization Block (RepBlock) to reduce the model parameters and computing resources while improving the feature fusion ability and the inference speed. In addition, an efficient detection head was incorporated to decrease the model parameters. Finally, the model quantization technique was leveraged on the YOLOv6s-GRE method to reduce the computational complexity, memory, storage footprint, and energy usage of the onboard computer, as well as to increase the computation speed.

The main contributions of this study are summarized as follows:

1) An improved real-time anchor-free damage detection method called YOLOv6s-GRE was developed on top of the YOLOv6s algorithm to better generalize to real-world scenarios with higher detection accuracy.
2) The YOLOv6s-GRE model was reconstructed with RepOptimizer (RepOpt) [12] to equivalently transform the YOLOv6s-GRE model into a quantization-friendly model, addressing the quantization difficulty of the reparameterization model. The YOLOv6s-GRE model with RepOpt was quantized using a partial quantization-aware training technique to obtain a memory- and storage-footprint-saving, energy-efficient network that is appropriate for deployment on UAVs.

The structure of the paper is organized as follows. State-of-the-art research is discussed in Section 2. The methodology of the paper is introduced in Section 3. Section 4 elaborates on the experiment implementation, evaluation and analysis. Section 5 concludes the paper and discusses potential future research. Code will be made available at https://github.com/Xiaofei-Kevin-Yang/YOLOv6-GRE-Quantized.

2. Literature review

The combination of UAVs and automated damage detection algorithms has the potential to significantly improve the efficiency of the bridge visual inspection process from data collection and analysis to auxiliary decision-making. Currently, automated damage detection methods can be divided into two groups: post-flight and real-time damage detection [49].

2.1. Post-flight damage detection

Post-flight damage detection, also called offline detection, refers to the detection of damage after the data collection task. An extensive number of studies have focused on the field of post-flight damage detection. An early example leveraged Faster Region-based Convolutional Neural Networks (Faster R-CNNs) to automatically detect five damage types [8]. The experiment results demonstrated that the deep learning technique was superior compared to traditional image processing techniques and machine learning methods. Faster R-CNNs are a typical two-stage damage detector that can achieve a high detection accuracy but have a relatively low detection speed. To accelerate the detection speed, YOLOv3 has been investigated [22,48], indicating a three to five times faster detection speed than Faster R-CNNs while slightly compromising the overall detection accuracy. To improve the detection accuracy of the YOLO series, YOLOv4 [7] was presented on top of the YOLOv3 algorithm using more tuning methods such as mosaic data augmentation, and the Path Aggregation Network (PAN) was introduced into the network for better multi-level feature aggregation. An example of the use of YOLOv4 to perform automated damage detection was presented by Zou et al. [54], where depth-wise separable convolution blocks were added to improve the YOLOv4 network, leading to less computational cost. YOLOv5 was developed to further improve the detection accuracy. A recent example presented an improved YOLOv5 algorithm to conduct bridge surface defect detection, integrating a convolutional block attention module, a decoupled prediction head, and a focal loss function into the original network [37].

As the aforementioned YOLO algorithms are anchor-based methods, their detection performance is dependent on the selection of preset anchor box sizes. As the calculation of preset anchor box sizes is dataset-specific, anchor-based methods cannot generalize well to real-world scenarios. To address this issue, He et al. presented a novel anchor-free method called CenWholeNet [17], which is an improved version of CenterNet [52]. It predicted the center point, the diagonal length and the angle of the defect bounding box, considering both central information and whole information. In addition, a parallel attention module was also introduced into the network to further increase the detection accuracy.


Fig. 1. Overall architecture of the original YOLOv6s model. Numbers indicate the input size of each layer within the model.

Most existing studies in the realm of automated damage detection focused on post-flight damage detection. Deep learning networks for post-flight detection are normally based on a large number of parameters, requiring heavy computational resources and an expensive energy cost as well as a long detection time. Thus, it is challenging to deploy these post-flight algorithms directly on UAVs to perform real-time damage detection.

2.2. Real-time damage detection

Real-time damage detection refers to the detection of damaged areas synchronously during the UAV data collection process. The real-time detection results can be used for guiding UAVs to fly closer to the damaged area to capture higher resolution damage with finer details. An early example leveraged the UAV to capture video that was wirelessly streamed to a ground server for detecting defects. However, this process can result in significant delays caused by video encoding and streaming, as well as inevitable interruptions and cuts in the video stream caused by connection issues, obstacles, and weather conditions. In addition, video streaming transmission has high-bandwidth requirements, further increasing the response time. A modified Faster R-CNN was then presented to automatically detect multiple defects from video frames [5]. The experiment results demonstrated that the presented method was a quasi-real-time detection method and showed superior performance on small and blurry defects. While this method achieved a real-time detection speed, it could not be deployed on the UAVs due to the large number of model parameters. To meet this challenge, Kumar et al. [26] presented a real-time multi-UAV damage detection system, where the YOLOv3 algorithm was deployed on the Jetson-TX2 [38] onboard computer of a hexacopter based on Pixhawk's hardware standards. Although this method achieved real-time detection and deployment on the UAVs, its high computational complexity limited the onboard application. Efficient neural network architecture design presents a high potential to reduce parameters and the computational complexity, and introducing efficient neural networks into existing methods could be a good solution to improve computational efficiency. A recent example proposed by Jiang et al. [21] used a lightweight backbone called MobileNetv2 [36] to replace the original backbone of the YOLOv3 algorithm. The proposed method with efficient backbone design significantly reduced the computational burden. Nevertheless, the lightweight MobileNetv2, which leveraged depth-wise convolutions to reduce the model size, increased the memory access cost, thus decreasing the detection speed. In addition, a pruning algorithm [34] was also developed to decrease the number of unimportant deep network parameters. Recent work proposed a novel deep learning network called YOLOv4-FPM [32] on top of the original YOLOv4 [21] algorithm to achieve real-time concrete bridge crack detection. This method firstly leveraged a focal loss function [33] to alleviate the imbalance problem between positive and negative samples, thus improving its detection accuracy for images with complex background information. A pruning algorithm [30] was then employed to remove unimportant nodes and parameters in the deep learning model, reducing the model size and complexity as well as expediting the detection speed. The limitation of this method is that the YOLOv4-FPM algorithm is an anchor-based method that cannot generalize well in real-world scenarios.


Fig. 2. Architecture of the RepVGG Block during (a) training phase and (b) inference phase.

Starting from YOLOv5, researchers designed multiple versions with different model sizes by controlling the model depths and widths for application in different devices. They normally employed the small version models to perform the real-time detection task. For example, Zhao et al. [51] proposed a YOLOv5s-HSC method on top of the small version of the original YOLOv5s model, where Swin transformer blocks [31] and coordinate attention modules [19] were added to further improve the damage detection accuracy. An additional detection head was introduced into the network to alleviate the issue of defect scale variation; however, this significantly increased the number of model parameters and the computational burden. A light-weight and improved YOLOv5s called YOLOv5s-GTB was developed by Xiao et al. [35], where the light-weight GhostNet [15] and a Bi-directional Feature Pyramid Network (BiFPN) [39] were leveraged as the backbone and neck network respectively. A transformer multi-headed self-attention mechanism was also introduced into the proposed network. The experiment results showed that the proposed method not only reduced the number of parameters by 42% and had a faster detection speed but also achieved a better detection accuracy compared to the original YOLOv5s algorithm.

To summarize, while existing real-time damage detection algorithms achieved considerable success, they still face several challenges. Firstly, existing studies were mainly developed on top of anchor-based approaches that cannot generalize well to real-world scenarios. Secondly, real-time damage detection algorithms exhibit a low detection accuracy because they are based on light-weight and small versions of deep learning models. In addition, we also found that none of the existing studies considered the energy consumption of deep learning networks, which is critical to energy-constrained UAVs.

3. Methodology

This section firstly explains the overall architecture of the YOLOv6s algorithm, from the backbone module and neck module to the head module, to help readers understand the main components of the YOLOv6s algorithm. Secondly, three improvements, i.e., the GFPN neck, the RepELAN block and an efficient detection head, are illustrated in detail to achieve a better trade-off between damage detection accuracy and speed. Finally, the principle of the model quantization technique used in this study is described to improve the energy efficiency of the proposed YOLOv6s-GRE method.

3.1. Overview of YOLOv6s architecture

The original YOLOv6s [27] network is a small version model designed for a mobile platform. The Reparameterization Visual Geometry Group (RepVGG) [13] network was used as the backbone module to extract damage features from images. The RepVGG decoupled the training and inference architectures with a structural reparameterization technique, significantly improving its feature representation ability. Subsequently, the Reparameterization Path Aggregation Network (Rep-PAN) [27] was leveraged as the neck module to perform multi-scale feature aggregation, which is an enhanced PANet [29] established by RepBlocks. An efficient decoupled head was designed for damage classification and localization. The overall architecture of the original YOLOv6s model is presented in Fig. 1. The details of the main blocks in each module are presented in the following sections.

3.1.1. Backbone module

The backbone module of the YOLOv6s model in Fig. 1 leveraged a structural re-parameterization technique to develop an efficient feature representation network denoted as EfficientRep. The backbone module contains five stages, and each stage starts with a down-sample layer via a stride-2 convolutional operation. The main components of the backbone module are comprised of RepVGG blocks, RepBlocks and a SimSPPF block.

The RepVGG [13] block is designed using a reparameterization structure, decoupling the training-phase multi-branch and inference-phase plain architecture, resulting in a better trade-off between accuracy and computational efficiency. The architecture of the RepVGG block is illustrated in Fig. 2. The core of the structural re-parameterization technique is the equivalent conversion of a certain network architecture to another network by transforming its parameters. To be specific, during the training phase, the RepVGG block is constructed using 3 × 3 convolution (3 × 3 Conv2d), 1 × 1 convolution (1 × 1 Conv2d) and identity branches if the stride is set to 1. The input is firstly processed by convolutional layers with kernels of 3 × 3, 1 × 1 and identity respectively, followed by the batch normalization (BN) layer. After the normalization, the outputs are combined together via an element-wise add operation and passed through the Rectified Linear Unit (ReLU) activation function. The RepVGG block has two branches with convolutional kernels of 3 × 3 and 1 × 1 respectively if the stride is set to 2. It is worth noting that the stride is set to 2 when the RepVGG block is used as a standalone block, while the stride is adjusted to 1 when the RepVGG block is integrated within the RepBlock framework.


The transformation of the parameters of the RepVGG block is performed after training, where the well-trained parameters of the convolutional layers with kernels of 3 × 3, 1 × 1 and identity, as well as the batch normalization layers, are leveraged to construct a single convolution layer with a 3 × 3 kernel. During the inference phase, the RepVGG block is built using this 3 × 3 convolution layer and a ReLU activation. The RepBlock is constructed by a stack of RepVGG blocks. The architectures of the RepBlock in the training and inference phases are shown in Fig. 3.

Fig. 3. Architecture of the RepBlock during (a) training phase and (b) inference phase.
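To make the branch-merging step concrete, the following is a minimal PyTorch sketch of how the trained 3 × 3, 1 × 1 and identity branches (each followed by batch normalization) can be folded into a single 3 × 3 kernel and bias. It assumes stride 1, a single group and equal input/output channels, and the function names are illustrative rather than the authors' or the RepVGG repository's implementation.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv_weight, bn):
    # Fold a BatchNorm layer into the preceding convolution's weight and bias.
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                          # per-output-channel scale
    fused_w = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_b = bn.bias - bn.running_mean * scale
    return fused_w, fused_b

def reparameterize(conv3x3, bn3, conv1x1, bn1, bn_id, channels):
    # Merge the 3x3, 1x1 and identity branches of a trained RepVGG-style
    # block into one 3x3 kernel and bias (the inference-time structure).
    w3, b3 = fuse_conv_bn(conv3x3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1x1.weight, bn1)
    w1 = nn.functional.pad(w1, [1, 1, 1, 1])         # lift the 1x1 kernel to 3x3
    # The identity branch is an implicit 3x3 convolution with a one-hot kernel.
    w_id = torch.zeros(channels, channels, 3, 3)
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    wi, bi = fuse_conv_bn(w_id, bn_id)
    return w3 + w1 + wi, b3 + b1 + bi
```

The fused kernel and bias can then be loaded into a plain nn.Conv2d with a 3 × 3 kernel for inference, which is the equivalence the text above describes.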
The Simplified Spatial Pyramid Pooling Fast (SimSPPF) block aims to efficiently combine features at different scales to extend the receptive field. The architecture of the SimSPPF block is presented in Fig. 4. The input is firstly processed by the SimConv block that contains the Sigmoid Linear Unit (SiLU), followed by three max-pooling operations (MaxPool) to extract features at different scales. Subsequently, the features of different scales are concatenated together (Concatenate) and then operated on by the SimConv block. SimSPPF reduces the computational cost and memory usage of the feature fusion process by using max-pooling instead of convolutional operations, expediting the detection speed of the YOLOv6 model.

Fig. 4. Architecture of the SimSPPF block.
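A compact sketch of the SimSPPF structure described above is given below. The hidden width (half the input channels) and the 5 × 5 pooling kernel are assumptions commonly used in SPPF-style blocks rather than values stated in the text, and the activation follows the SiLU description above.

```python
import torch
import torch.nn as nn

class SimConv(nn.Module):
    # Conv2d + BatchNorm + activation; the text above describes a SiLU activation.
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SimSPPF(nn.Module):
    # One SimConv, three chained max-pools (growing receptive field),
    # concatenation, and a final SimConv.
    def __init__(self, c_in, c_out, pool_k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = SimConv(c_in, c_mid, k=1)
        self.pool = nn.MaxPool2d(kernel_size=pool_k, stride=1, padding=pool_k // 2)
        self.cv2 = SimConv(c_mid * 4, c_out, k=1)

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        return self.cv2(torch.cat([x, p1, p2, p3], dim=1))
```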
3.1.2. Neck module

The neck module in the YOLOv6s model aims to refine and consolidate the feature maps extracted from the backbone module. The architecture of the neck module is shown in Fig. 1. The neck module leverages the RepBlock to reconstruct the Path Aggregation Network (PANet) [29] used in YOLOv4 [7] and YOLOv5 [41], denoted as Reparameterization PANet (Rep-PAN). The Rep-PAN combines different feature maps with distinct resolutions through lateral connections, top-down connections and bottom-up connections. Lateral connections are leveraged to connect feature maps of different scales to enable information to be shared at different levels. The top-down pathway enhances the semantic feature representation by up-sampling (UpSample) high-level semantic feature maps via a ConvTranspose2d operation and merging them with lower-level feature maps using lateral connections. The bottom-up pathway further improves the localization feature representation by iteratively down-sampling the feature maps and fusing them with the low-level localization features through lateral connections and skip connections, which also shortens the information path.

3.1.3. Head module

The YOLOv6s model consists of three detection heads to predict small, medium and large-scale defects. The details of the head module and the relevant blocks are shown in Fig. 1. These detection heads adopt a decoupled head design to perform the damage classification and box regression tasks separately. The aggregated feature maps generated from the neck module are firstly processed by a 1 × 1 Conv block, followed by the operation of two parallel branches with a single 3 × 3 Conv block and a 1 × 1 convolutional layer (Conv2d) respectively. The outputs of these two branches are then concatenated together to predict the coordinate values of bounding boxes and the corresponding class probabilities.

3.2. Overview of YOLOv6s-GRE architecture

The overall architecture of the YOLOv6s-GRE method is illustrated in Fig. 5. Three improvements were introduced into the original YOLOv6s model. First, the GFPN Neck module (as presented in Fig. 5) was introduced to deepen the neck module and achieve sufficient multi-scale feature fusion, inspired by the idea that the heavy neck paradigm is more suitable for damage detection tasks [23]. Secondly, a Reparameterization Efficient Layer Aggregation Network (RepELAN) block replaced the original RepBlock in the GFPN Neck module. This acted as the feature fusion block, enhancing the feature learning ability and lowering the computational resource requirements. Finally, an efficient detection head module (Efficient Head in Fig. 5) was included to reduce the model size with a negligible reduction in the detection accuracy. Details of each improvement are elaborated in the following subsections.

3.2.1. Generalized feature pyramid network (GFPN)

This study adopts a deeper and larger neck module called GFPN, as previous work demonstrated that a heavy neck design can perform better than a heavy backbone design in damage detection tasks [23]. The architecture of the GFPN neck module is presented in Fig. 5. The GFPN firstly aggregates different levels of features extracted from the backbone module with skip-layer and cross-scale connections, followed by a top-down pathway to enhance the high-level semantics with multi-scale low-level spatial information, enabling more effective information transmission from the early layers to the later layers. Subsequently, the down-sampled feature maps are fused with the previous high-level semantics following a bottom-up pathway. The GFPN is able to achieve sufficient information exchange across high-level semantic information and low-level spatial information.

3.2.2. Reparameterization efficient layer aggregation network (RepELAN)

The efficient layer aggregation network (ELAN) [42] follows a gradient path design strategy, which can simultaneously improve the network learning ability and inference speed. The architecture of the ELAN block is presented in Fig. 6. The ELAN block allows flexibility when setting the number of convolutional block stacks to strike a trade-off between accuracy and computational efficiency. In addition, previous research [13] has shown that the RepVGG block performed better than conventional convolutional blocks.


Fig. 5. Overall architecture of the YOLOv6s-GRE method on top of the YOLOv6s model. Numbers represent the input size of each layer within the YOLOv6s-GRE method.

Fig. 6. Architecture of the ELAN block.

Fig. 7. Architecture of the proposed RepELAN block.

Inspired by both advantages, we presented a RepELAN block to reconstruct the ELAN block with RepVGG blocks to further improve the network detection performance. The architecture of the proposed RepELAN block is illustrated in Fig. 7. Specifically, we firstly constructed a BottleRep block using the RepVGG block with a residual connection, as shown in Fig. 8. Subsequently, BottleRep blocks were used to replace the original convolutional block stacks, and the number of BottleRep block stacks was selected as 3 considering the trade-off between accuracy and computational efficiency. In addition, 1 × 1 SimConv blocks were also used to substitute the original 1 × 1 Conv blocks to accelerate the detection speed. As can be seen from Fig. 7, the RepELAN block has two branches: the first branch is processed by a 1 × 1 SimConv block to change the number of channels, while the second branch contains a 1 × 1 SimConv block to change the channel numbers and a stack of 3 BottleRep blocks to extract features. The output of the first branch and the output of each BottleRep block are finally concatenated together, followed by a 1 × 1 SimConv operation.
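The description above can be summarized in the following sketch. SimConv is assumed to be the block sketched in Section 3.1.1, `block` stands for the training-phase RepVGG block defined elsewhere in the network, and the use of two RepVGG blocks inside BottleRep and the hidden width are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BottleRep(nn.Module):
    # Two RepVGG-style blocks with a residual connection, following the
    # BottleRep description above. `block` is the RepVGG block class.
    def __init__(self, channels, block):
        super().__init__()
        self.conv1 = block(channels, channels)
        self.conv2 = block(channels, channels)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class RepELAN(nn.Module):
    # Two 1x1 SimConv branches; the second branch feeds a stack of three
    # BottleRep blocks; the first-branch output and every BottleRep output
    # are concatenated and projected by a final 1x1 SimConv.
    def __init__(self, c_in, c_out, c_hidden, block, num_repeats=3):
        super().__init__()
        self.branch1 = SimConv(c_in, c_hidden, k=1)
        self.branch2 = SimConv(c_in, c_hidden, k=1)
        self.bottlereps = nn.ModuleList(
            BottleRep(c_hidden, block) for _ in range(num_repeats))
        self.out = SimConv(c_hidden * (num_repeats + 1), c_out, k=1)

    def forward(self, x):
        outs = [self.branch1(x)]
        y = self.branch2(x)
        for m in self.bottlereps:
            y = m(y)
            outs.append(y)
        return self.out(torch.cat(outs, dim=1))
```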


Fig. 8. Architecture of BottleRep block.

3.2.3. Efficient detection head

To better trade off the detection accuracy and speed, as well as reduce the model size at the algorithm level, we presented an efficient decoupled detection head that removes the original middle 3 × 3 convolutional layers, leaving a 1 × 1 Conv block and two task projection layers (i.e., one linear layer for classification and one linear layer for regression). The architecture of the proposed detection head is presented in Fig. 5.
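A minimal sketch of such an efficient decoupled head is shown below, assuming four box-regression channels per location and an unchanged stem width; it illustrates the structure rather than the authors' exact layer configuration.

```python
import torch.nn as nn

class EfficientHead(nn.Module):
    # A single 1x1 Conv block followed by two 1x1 projection layers,
    # one for class scores and one for box regression.
    def __init__(self, c_in, num_classes, num_reg=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(c_in, c_in, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU())
        self.cls_pred = nn.Conv2d(c_in, num_classes, kernel_size=1)
        self.reg_pred = nn.Conv2d(c_in, num_reg, kernel_size=1)

    def forward(self, x):
        x = self.stem(x)
        return self.cls_pred(x), self.reg_pred(x)
```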

3.3. Model quantization technique

Real-time damage detection algorithms have seen increased use on mobile platforms; however, the high energy consumption, computational expense and memory usage present bottlenecks when deploying these algorithms on UAVs [53]. Model quantization techniques aim to store and compute model weight parameters with a lower bit-width representation, such as using 8-bit integer (INT8) precision to replace the typical 32-bit floating point (FP32) precision [16]. INT8 quantization allows for a 4 times reduction in the model size and a 4 times reduction in memory bandwidth requirements compared to typical FP32 arithmetic [20]. Hardware supporting INT8 precision is typically 2 to 4 times faster than FP32 precision [2]. In addition, INT8 multiplication arithmetic consumes 18.5 times less energy than FP32 multiplication arithmetic [18]. Therefore, the model quantization technique can not only reduce the memory footprint but also the energy consumption, keeping comparable levels of accuracy compared to full precision (FP32) and making the quantized model affordable for UAVs. Nevertheless, prior model quantization techniques are not effective for the YOLOv6s-GRE method due to the extensive use of reparameterization blocks, which amplifies the standard deviation of the weight parameter distribution. This study firstly leveraged the RepOptimizer (RepOpt) [12] method to equivalently transform the YOLOv6s-GRE model into a quantization-friendly model, addressing the quantization challenge of reparameterization blocks. Subsequently, a sensitivity analysis was performed to identify quantization-sensitive layers, and these quantization-sensitive layers were then converted into full precision arithmetic as a compromise. Lastly, partial quantization-aware training (QAT) was conducted on the YOLOv6s-GRE model with RepOpt. Details are explained in the following subsections.
RepOpt-VGG, a RealVGG block was leveraged (as shown in Fig. 9) to
3.3.1. Re-parameterizing optimizer

The YOLOv6s-GRE method heavily employs RepVGG blocks (as shown in Fig. 2) [13] in the network architecture due to their better trade-off between detection accuracy and inference speed. The RepVGG block incorporates model-specific prior knowledge using multiple branches during the training phase while merging the multi-branch architecture into the plain architecture with a single 3 × 3 convolutional layer during the inference phase. However, these reparameterization-based blocks face quantization difficulty because of the enlarged dynamic numerical range caused by the RepVGG blocks' intrinsic multi-branch design. For example, a performance degradation of greater than 20% on ImageNet [11] has been observed after a standard post-training quantization [12]. To meet this challenge, RepOpt-VGG [12] was proposed, which develops a two-stage optimization pipeline [10]. Within RepOpt-VGG, a RealVGG block is leveraged (as shown in Fig. 9) to replace the original RepVGG block during the training phase, and the model-specific prior knowledge is instead introduced into the model optimizer, which is achieved by adjusting the gradients based on model-specific hyper-parameters. This technique is known as Gradient Re-parameterization, and the resulting optimizers are called RepOptimizers.

Fig. 9. Architecture of RealVGG Block.

Inspired by the RepOpt-VGG network, we firstly replaced the RepVGG blocks with RealVGG blocks in the YOLOv6s-GRE model and then trained the YOLOv6s-GRE method with the RepOpt to obtain quantization-friendly weights.
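Conceptually, Gradient Re-parameterization leaves the plain RealVGG architecture untouched and instead rescales the gradients of its 3 × 3 convolutions by constant "Grad Mult" tensors derived from the merged branches. The sketch below shows one SGD-with-momentum step under this idea; the grad_mults mapping, its values and the function name are illustrative assumptions and do not reproduce the RepOptimizer released with [12].

```python
import torch

def repopt_sgd_step(named_params, grad_mults, momentum_buf, lr=0.01, momentum=0.937):
    # One SGD-with-momentum step in which each gradient is first rescaled by a
    # constant "Grad Mult" tensor, so the plain 3x3 convolution is updated as if
    # the merged 1x1/identity branches were still present.
    with torch.no_grad():
        for name, p in named_params:
            if p.grad is None:
                continue
            g = p.grad
            if name in grad_mults:            # only the reparameterized 3x3 weights
                g = g * grad_mults[name]
            buf = momentum_buf.setdefault(name, torch.zeros_like(p))
            buf.mul_(momentum).add_(g)
            p.add_(buf, alpha=-lr)

# Usage sketch: named_params = list(model.named_parameters()); grad_mults maps
# selected weight names to precomputed constant tensors obtained in the first
# (hyper-parameter search) stage of the two-stage pipeline.
```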


Fig. 10. Illustration of FakeQuantize insertion process.

3.3.2. Analysis of quantization sensitivity

Quantization sensitivity analysis aims to quantify the neural network's degree of sensitivity to changes in the precision of its weights and activations. The mean average precision (mAP) was calculated for each layer contained in the YOLOv6s-GRE method trained by RepOpt, with and without quantization, to obtain the sensitivity distribution. The mAP differences with and without quantization were leveraged as the evaluation metric to measure the quantization errors.
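One way to carry out such a scan is to quantize one layer at a time while keeping the rest of the network in FP32 and to record the resulting drop in mAP50, as sketched below. quantize_single_layer and evaluate_map50 are hypothetical placeholders for the TensorRT/PyTorch tooling actually used, not names from a specific API.

```python
def sensitivity_scan(model, layer_names, quantize_single_layer, evaluate_map50, val_loader):
    # Rank layers by how much mAP50 drops when only that layer runs in INT8.
    baseline = evaluate_map50(model, val_loader)       # FP32 reference accuracy
    drops = {}
    for name in layer_names:
        q_model = quantize_single_layer(model, name)   # INT8 for this layer only
        drops[name] = baseline - evaluate_map50(q_model, val_loader)
    # Most sensitive layers first; these are the ones kept in FP32 during partial QAT.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```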
3.3.3. Partial quantization-aware training

The core of INT8 precision quantization is to map the FP32 precision floating point value $x_f \in (\alpha, \beta)$ to the INT8 precision quantization value $x_q \in (\alpha_q, \beta_q)$. This study adopted the partial quantization-aware training method to quantize the YOLOv6s-GRE model with RepOpt, leading to a negligible compromise in accuracy. Specifically, the most sensitive layers were firstly assigned full precision according to the sensitivity distribution obtained from the quantization sensitivity analysis. Secondly, the YOLOv6s-GRE with RepOpt was trained using FP32 full precision. The FakeQuantize module was then inserted before the convolutional operations and after the ReLU operations within the non-sensitive layers. The insertion of the FakeQuantize module into the network is illustrated in Fig. 10.

The model with FakeQuantize modules is denoted as the QAT model. The FakeQuantize module is comprised of a quantization operation (Quantize) and a de-quantization operation (De-Quantize) in a sequential manner, as shown in Fig. 11.

Fig. 11. Architecture of FakeQuantize module.

The quantization operation is performed according to the quantization function below:

$$x_q = f_q(x_f, s, z) = \mathrm{clip}\left(\mathrm{round}\left(\frac{x_f}{s}\right) + z,\ \alpha_q,\ \beta_q\right) \quad (1)$$

where $x_q$ represents the INT8 precision quantization tensor and $x_f$ denotes the FP32 precision floating point tensor; $\alpha$ and $\beta$ stand for the minimum and maximum values of the FP32 precision floating point tensor, and $\alpha_q$ and $\beta_q$ are the minimum and maximum values of the INT8 precision quantization tensor. $s = \frac{\beta - \alpha}{\beta_q - \alpha_q}$ denotes the scale factor, $z = \mathrm{round}\left(\frac{\alpha_q \beta - \alpha \beta_q}{\beta - \alpha}\right)$ represents the zero point, and the clip function is computed as below:

$$\mathrm{clip}(x_f, \alpha_q, \beta_q) = \begin{cases} \alpha_q & x_f < \alpha_q \\ x_f & \alpha_q \le x_f \le \beta_q \\ \beta_q & x_f > \beta_q \end{cases} \quad (2)$$

The de-quantization function is defined as follows:

$$x_d = f_d(x_q, s, z) = s\,(x_q - z) \quad (3)$$

Here, note that the FakeQuantize operation will result in information loss because the floating-point values after quantization and de-quantization are not completely recoverable due to the clip and round operations within the quantization function. The information loss $\Delta$ is computed as below:

$$\Delta = x_f - f_d\big(f_q(x_f, s, z), s, z\big) \quad (4)$$

The information loss could further result in network accuracy degradation after the quantization operation. Therefore, the information loss caused by the FakeQuantize operation was added into the overall loss function, and the pre-trained FP32 full precision model weights were leveraged to initialize the QAT model. In addition, a self-distillation approach and graph optimization [27] were also leveraged to further improve the quantization accuracy and speed. Finally, the QAT model was finetuned and the quantization parameters were saved accordingly.
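Eqs. (1)–(4) can be written out directly in PyTorch as below. This is only a re-statement of the equations for illustration, with an example FP32 range and the signed INT8 range as assumed values; in practice the FakeQuantize modules are supplied by the quantization toolchain.

```python
import torch

def quantize(x_f, s, z, alpha_q=-128, beta_q=127):
    # Eq. (1): map an FP32 tensor onto the INT8 grid (clip implements Eq. (2)).
    return torch.clamp(torch.round(x_f / s) + z, alpha_q, beta_q)

def dequantize(x_q, s, z):
    # Eq. (3): map the INT8 values back to floating point.
    return s * (x_q - z)

# Scale and zero point for a tensor with an assumed FP32 range [alpha, beta].
alpha, beta = -0.8, 1.2                       # example FP32 range
alpha_q, beta_q = -128, 127                   # signed INT8 range
s = (beta - alpha) / (beta_q - alpha_q)
z = round((alpha_q * beta - alpha * beta_q) / (beta - alpha))

x_f = torch.randn(4)
x_fq = dequantize(quantize(x_f, s, z), s, z)  # fake-quantized tensor
delta = x_f - x_fq                            # Eq. (4): information loss
```

Running the quantize and de-quantize pair back to back, as in the last two lines, is exactly the FakeQuantize behaviour described above: the network keeps training in floating point while experiencing the rounding and clipping error of INT8.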


Fig. 12. Examples of different types of annotated defect images in the concrete bridge damage dataset.

4. Experiment and results

4.1. Data preparation

This study employed our previously built concrete bridge damage dataset, which was labelled by a group of researchers with a consistent standard using the image annotation tool LabelImg [28]. The dataset comprises 1969 images with 4385 annotations, covering three common defect types: spalling (1359 annotations), exposed rebar (1950 annotations) and efflorescence (1076 annotations), considering the availability of the damage dataset and prior extensive studies on other defects such as cracks. The dataset was then partitioned into a training dataset, a validation dataset and a testing dataset in a 7:1:2 ratio. The training dataset, validation dataset and testing dataset respectively contain 1377 images, 197 images and 395 images. Mosaic [7,41] and Mixup [50] were employed as strong data augmentation methods to enhance the data diversity. Some examples of defect images contained in the dataset are presented in Fig. 12.
4.2. Implementation details

The Linux operating system, along with an Intel(R) Xeon(R) E5-2680 v4 CPU and a single NVIDIA RTX3090 GPU with 24 GB memory, was used to train and test the proposed real-time damage detection method. PyTorch 1.9.0 [1] was selected as the deep learning library, and Python 3.8.5 was used as the programming language to develop the proposed method. During the training phase, the image input size was set to 640 pixels × 640 pixels. The batch size and the number of training epochs were set to 32 and 100 respectively. In addition, stochastic gradient descent (SGD) [24] was chosen as the optimizer with a momentum of 0.937. The initial learning rate was equal to 0.01, and the learning rate decay followed a cosine schedule. The first 3 epochs were used for warm-up.
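The optimizer and learning-rate schedule described above can be assembled in PyTorch roughly as follows; the linear warm-up shape and the decay floor of the cosine schedule are assumptions, since only the initial rate, momentum and warm-up length are stated.

```python
import math
import torch

def build_optimizer_and_schedule(model, epochs=100, warmup_epochs=3,
                                 lr0=0.01, momentum=0.937):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr0, momentum=momentum)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                    # linear warm-up over 3 epochs
            return (epoch + 1) / warmup_epochs
        # cosine decay over the remaining epochs
        t = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * t))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```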
During testing, the mean average precision (mAP) and the frames per second (FPS) were used as the evaluation metrics for detection accuracy and speed respectively. The number of model parameters and floating-point operations (FLOPs) were also computed to measure the model size and the computational complexity. TensorRT version 8.2.3.0 was selected for the implementation of the quantization operation. In terms of the evaluation metric for energy consumption, the Joule (J) was leveraged, which is the product of power consumption and time. Here, note that the evaluation experiment on the energy consumption of the damage detection algorithms was conducted on the NVIDIA RTX3090 GPU with 24 GB memory to demonstrate the viability of employing the proposed method for energy conservation.
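Energy in joules (power × time) can be estimated by sampling the GPU board power during inference and integrating it over the run time, for example with the NVML Python bindings as sketched below. The paper does not state how power was logged, so the sampling approach and interval are assumptions.

```python
import threading
import time
import pynvml

def measure_energy_joules(run_inference, device_index=0, interval_s=0.05):
    # Estimate energy = mean sampled board power (W) x elapsed time (s).
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples, stop = [], threading.Event()

    def sampler():
        while not stop.is_set():
            # nvmlDeviceGetPowerUsage returns milliwatts
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler)
    start = time.time()
    thread.start()
    run_inference()                     # e.g. a loop over the test images
    stop.set()
    thread.join()
    elapsed = time.time() - start
    pynvml.nvmlShutdown()
    return (sum(samples) / max(1, len(samples))) * elapsed
```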
4.3. Evaluation

The effectiveness of the YOLOv6s-GRE method was firstly evaluated through comparison experiments. The effect of each improvement in the YOLOv6s-GRE method was also analyzed via an ablation study [33]. Finally, the evaluation of the quantization results was carefully performed to validate the advantages of the quantized model of the YOLOv6s-GRE method. Details are illustrated in the following subsections.


Fig. 13. Some comparison examples of prediction results generated by (a) the original YOLOv6s and (b) the YOLOv6s-GRE method.


Table 1
Comparison of the results between the YOLOv6s and the YOLOv6s-GRE.

Method | mAP50 | FPS | Params | FLOPs
YOLOv6s | 64.8% | 290 | 17.19 M | 44.07 G
YOLOv6s-GRE | 67.1% | 282 | 17.89 M | 43.33 G

4.3.1. Comparison experiments with YOLOv6s

Detailed qualitative and quantitative comparison experiments were firstly conducted between the original YOLOv6s method and the YOLOv6s-GRE method. Prediction results of the YOLOv6s-GRE method were firstly qualitatively compared with the original YOLOv6s method. Some examples of the qualitative comparison between these two methods are presented in Fig. 13.

The prediction results generated by the original YOLOv6s can miss the detection of some defects, as can be seen from Fig. 13. A possible reason for the missed damage detection is that the surface of the exposed rebar is covered by some concrete dust, resulting in a similar feature representation between the exposed rebar and the concrete surface. In contrast, the YOLOv6s-GRE method exhibits stronger detection ability, resulting in higher confidence scores, and can generate more accurate prediction results compared with the original YOLOv6s. In addition to the qualitative evaluation, we also quantitatively compared the YOLOv6s-GRE method with the original YOLOv6s method in terms of the overall detection accuracy and the accuracy of each defect type. Comparison results between the YOLOv6s and the YOLOv6s-GRE method are summarized in Table 1. The YOLOv6s-GRE method exceeded the YOLOv6s model by 2.3 percentage points of mAP50 for the detection accuracy with a comparable detection speed; however, the model size increased slightly, and the computational complexity decreased compared to the YOLOv6s method.

Results of the method comparison for each defect type are presented in Fig. 14, showing that the YOLOv6s-GRE method achieved the best detection accuracy across all defect types. The mAP50 metrics for exposed rebar, efflorescence and spalling improved by 2.9, 2.1 and 1.9 percentage points respectively.

4.3.2. Ablation study

The ablation study analyses the effect of each improvement within the YOLOv6s-GRE method, and the results are presented in Table 2. Introducing the GFPN neck module into the original YOLOv6s model improved the detection accuracy, with mAP50 increasing by 2.8 percentage points. Nevertheless, this operation also resulted in a detection speed decrease of 6.9% in FPS and a considerable increase in both the model size and computational complexity, by 8.08 M parameters and 12.51 GFLOPs respectively. The implementation of the proposed RepELAN block maintained the same level of detection accuracy and speed as the preceding method while reducing the model size and computational complexity by 23.1% and 18.4%. We finally evaluated the effect of the proposed efficient detection head, which has an adverse effect on the detection accuracy with a slight reduction of mAP50 by 0.5 percentage points. Conversely, the proposed efficient detection head increased the detection speed from 272 FPS to 282 FPS while reducing the model size by 1.55 M and the computational resource requirement by 2.83 G. By implementing the above improvements, the YOLOv6s-GRE method was more accurate compared to the YOLOv6s method and has a comparable level of detection speed, while reducing the model size and computational complexity to make it more appropriate for deployment on a UAV onboard computer.

Table 2
Effect of each improvement in the YOLOv6s-GRE method.

No. | Method | mAP50 | FPS | Params | FLOPs
A | YOLOv6s | 64.8% | 290 | 17.19 M | 44.07 G
B | A + GFPN | 67.6% | 270 | 25.27 M | 56.58 G
C | B + RepELAN | 67.6% | 272 | 19.44 M | 46.16 G
D | C + Efficient head | 67.1% | 282 | 17.89 M | 43.33 G

4.3.3. Quantization evaluation

This section aims to evaluate the performance of the quantized YOLOv6s-GRE model. We firstly validated the effectiveness of the YOLOv6s-GRE method with RepOpt to address the quantization challenge of reparameterization blocks. The overall comparison results between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt are listed in Table 3. In general, the YOLOv6s-GRE method with RepOpt achieved a similar performance in terms of detection accuracy, and it comes with a similar detection speed, model size and computational complexity compared to the YOLOv6s-GRE method.

Table 3
Overall comparison results between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt.

Method | mAP50 | FPS | Params | FLOPs
YOLOv6s-GRE | 67.1% | 282 | 17.89 M | 43.33 G
YOLOv6s-GRE with RepOpt | 66.9% | 285 | 17.89 M | 43.48 G
Fig. 14. mAP50 results for each defect type based on the YOLOv6s and the YOLOv6s-GRE methods.


Fig. 15. mAP50 results for each defect type between the YOLOv6s-GRE method and the YOLOv6s-GRE method with RepOpt.

Fig. 16. Difference for the metric of mAP50 of each layer with and without quantization.

In addition to the overall comparison, we also performed a comparison across each defect type, as shown in Fig. 15. The YOLOv6s-GRE method with RepOpt had a similar detection performance for each defect type compared to the YOLOv6s-GRE method. Thus, the YOLOv6s-GRE method with RepOpt can be recognized as an equivalent to the YOLOv6s-GRE method according to these comparison experiments for the overall performance and the performance for each defect type.

The quantization sensitivity analysis was performed on all layers in the YOLOv6s-GRE method with RepOpt. The differences in mAP50 with and without quantization for each layer are shown in Fig. 16. As can be seen from Fig. 16, the performance of the detect.cls_preds.1 layer dropped the most among all layers, with a 1.58 percentage point reduction in mAP50 after the quantization operation. According to the sensitivity distribution, the top 10 sensitive layers, namely those with differences larger than 1 percentage point, were set to full precision during partial quantization-aware training, considering the trade-off among the detection accuracy, speed and energy efficiency.

Finally, the YOLOv6s-GRE method with RepOpt was quantized, and the performance and energy consumption of the quantized model were analyzed.


Table 4
Overall comparison across different quantization methods and precision arithmetic in terms of the metrics of mAP50, FPS and energy consumption.

No. | Method | Precision | mAP50 | FPS | Energy consumption
1 | YOLOv6s | FP32 | 64.8% | 290 | 136 J
2 | YOLOv6s-GRE | FP32 | 67.1% | 282 | 149 J
3 | YOLOv6s-GRE with RepOpt | FP32 | 66.9% | 285 | 148 J
4 | YOLOv6s-GRE with RepOpt | FP16 | 66.8% | 379 | 106 J
5 | 3 + post-training quantization (PTQ) | INT8 | 65.1% | 561 | 21 J
6 | 3 + quantization-aware training (QAT) | INT8 | 66.2% | 535 | 23 J
7 | 3 + Partial QAT | INT8 | 66.7% | 523 | 30 J

The overall comparison results, including the metrics of mAP50, FPS and energy consumption across different quantization methods and distinct precision arithmetic, are presented in Table 4. Method No. 1 (the original YOLOv6s) consumed 136 J of energy during inference on the testing dataset. Method No. 2 (YOLOv6s-GRE) increased the energy consumption by 9.6% despite the improvement in the detection accuracy, compared to Method No. 1. As can be seen from Method No. 3 and Method No. 4, the damage detection algorithm using a smaller floating-point precision can significantly enhance the detection speed and energy efficiency, by 33.0% and 28.4% respectively, while maintaining a similar detection accuracy. Method No. 5, Method No. 6 and Method No. 7 compared different quantization methods, namely PTQ, QAT and partial QAT, implemented on Method No. 3. Firstly, Method No. 5 achieved 65.1% mAP50, a degradation of 1.8 percentage points, while considerably expediting the detection speed by 96.8% and saving energy by 85.8%, in contrast to Method No. 3. Next, Method No. 6 achieved a better trade-off than Method No. 5, increasing the detection accuracy but slightly lowering the detection speed and energy efficiency. Finally, Method No. 7, quantized by the partial QAT technique, was evaluated and achieved the best detection accuracy among the three quantization methods. Given the above, Method No. 7 was selected as the final quantization method to reduce the loss of detection accuracy as much as possible after quantization. Overall, Method No. 7 reached a 66.7% detection accuracy, a 523 FPS detection speed and a 30 J energy consumption, which maintained a comparable level of detection accuracy while increasing the detection speed by 83.5% and saving energy by 79.7%, in contrast to Method No. 3.

5. Conclusions

Automatically detecting concrete bridge damage in real-time using UAVs and onboard computers remains a worldwide challenge. This paper firstly presented an improved real-time anchor-free damage detection method called YOLOv6s-GRE on top of the YOLOv6s to better generalize to real-world scenarios and to better balance the damage detection accuracy and speed. Specifically, three improvements have been made: 1) the GFPN neck module was introduced to improve the damage detection accuracy; 2) the RepELAN block was designed and added to reduce the model size and computational complexity; and 3) the efficient detection head was presented to expedite the detection speed and further decrease the model size and complexity. In general, the YOLOv6s-GRE method, in contrast to YOLOv6s, achieved an improvement of 2.3 percentage points in mAP50, while maintaining a comparable detection speed and without requiring an obvious increase in model size or computational complexity. Subsequently, the YOLOv6s-GRE method was reconstructed with the RepOptimizer to equivalently transform the YOLOv6s-GRE method into a quantization-friendly model. The YOLOv6s-GRE method with RepOpt was then quantized using the partial quantization-aware training technique. The quantized model significantly expedited the detection speed by 83.5% in FPS and saved 79.7% of the energy consumption while keeping a comparable level of detection accuracy.

Some limitations remain despite the great success of this research. Firstly, existing damage detection methods are still manually designed with a fixed network architecture, which cannot adapt well to specific hardware with different constraints such as UAVs. Secondly, model compression techniques are underused in the damage detection field and need to be further investigated in future studies. Thirdly, no previous study integrates real-time damage detection methods with the UAVs' dynamic path planning algorithms to perform flight control in real time, which would lead to higher quality inspection data acquisition with finer damage details.

To address these limitations, future research might focus on: 1) developing a hardware-specific damage detection method using the Neural Architecture Search [9] technique to automatically design the network architecture according to the hardware capacity; 2) leveraging multiple model compression techniques in combination to further reduce the model size and computational complexity as well as the energy consumption; and 3) incorporating real-time damage detection methods into automated path planning algorithms to capture higher quality inspection data and boost inspection efficiency.

CRediT authorship contribution statement

Xiaofei Yang: Writing – original draft, Software, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Enrique del Rey Castillo: Writing – review & editing, Supervision, Project administration, Conceptualization. Yang Zou: Writing – review & editing, Supervision, Funding acquisition, Data curation, Conceptualization. Liam Wotherspoon: Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

The authors would like to acknowledge the support by the University of Auckland FRDF Grant (Project No. 3716476).

References

[1] PyTorch 1.9.0 Documentation. https://pytorch.org/docs/1.9.0/.
[2] Quantization – PyTorch 1.13 Documentation. https://pytorch.org/docs/stable/quantization.html.
[3] H. Agency, Inspection Manual for Highway Structures: Vol. 1: Reference Manual, The Stationery Office, 2007. ISBN: 9780115527975.
[4] M.S. Alam, B. Natesha, T. Ashwin, R.M.R. Guddeti, UAV based cost-effective real-time abnormal event detection using edge computing, Multimed. Tools Appl. 78 (2019) 35119–35134, https://doi.org/10.1007/s11042-019-08067-1.
[5] R. Ali, D. Kang, G. Suh, Y.-J. Cha, Real-time multiple damage mapping using autonomous UAV and deep faster region-based neural networks for GPS-denied structures, Autom. Constr. 130 (2021) 103831, https://doi.org/10.1016/j.autcon.2021.103831.
[6] ASCE, 2021 Report Card for America's Infrastructure, American Society of Civil Engineers. https://infrastructurereportcard.org/, 2021.
[7] A. Bochkovskiy, C.-Y. Wang, H.-Y.M. Liao, Yolov4: optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934 (2020), https://doi.org/10.48550/arXiv.2004.10934.
[8] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using convolutional neural networks, Comp. Aid. Civ. Infrastr. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[9] Y. Chen, T. Yang, X. Zhang, G. Meng, X. Xiao, J. Sun, Detnas: backbone search for object detection, Adv. Neural Inf. Proces. Syst. 32 (2019), https://doi.org/10.48550/arXiv.1903.10979.


[10] X. Chu, L. Li, B. Zhang, Make RepVGG greater again: a quantization-aware approach, arXiv preprint arXiv:2212.01593 (2022), https://doi.org/10.48550/arXiv.2212.01593.
[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE (2009) 248–255, https://doi.org/10.1109/CVPR.2009.5206848.
[12] X. Ding, H. Chen, X. Zhang, K. Huang, J. Han, G. Ding, Re-parameterizing your optimizers rather than architectures, arXiv preprint arXiv:2205.15242 (2022), https://doi.org/10.48550/arXiv.2205.15242.
[13] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, J. Sun, Repvgg: making vgg-style convnets great again, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13733–13742, https://doi.org/10.48550/arXiv.2101.03697.
[14] A. Farhadi, J. Redmon, Yolov3: An Incremental Improvement, Computer Vision and Pattern Recognition, Springer Berlin/Heidelberg, Germany, 2018, pp. 1804–2767, https://doi.org/10.48550/arXiv.1804.02767.
[15] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, C. Xu, Ghostnet: more features from cheap operations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1580–1589, https://doi.org/10.48550/arXiv.1911.11907.
[16] S. Han, H. Mao, W.J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv preprint arXiv:1510.00149 (2015), https://doi.org/10.48550/arXiv.1510.00149.
[17] Z. He, S. Jiang, J. Zhang, G. Wu, Automatic damage detection using anchor-free method and unmanned surface vessel, Autom. Constr. 133 (2022) 104017, https://doi.org/10.1016/j.autcon.2021.104017.
[18] M. Horowitz, 1.1 computing's energy problem (and what we can do about it), in: 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), IEEE, 2014, pp. 10–14, https://doi.org/10.1109/ISSCC.2014.6757323.
[19] Q. Hou, D. Zhou, J. Feng, Coordinate attention for efficient mobile network design, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13713–13722, https://doi.org/10.48550/arXiv.2103.02907.
[20] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713, https://doi.org/10.48550/arXiv.1712.05877.
[21] S. Jiang, Y. Cheng, J. Zhang, Vision-guided unmanned aerial system for rapid multiple-type damage detection and localization, Struct. Health Monit. (2022), https://doi.org/10.1177/14759217221084878.
[22] Y. Jiang, D. Pang, C. Li, A deep learning approach for fast detection and classification of concrete damage, Autom. Constr. 128 (2021) 103785, https://doi.org/10.1016/j.autcon.2021.103785.
[23] Y. Jiang, Z. Tan, J. Wang, X. Sun, M. Lin, H. Li, GiraffeDet: a heavy-neck paradigm for object detection, arXiv preprint arXiv:2202.04256 (2022), https://doi.org/10.48550/arXiv.2202.04256.
[24] N. Ketkar, Stochastic Gradient Descent, in: Deep Learning with Python: A Hands-on Introduction, 2017, pp. 113–132, https://doi.org/10.1007/978-1-4842-2766-4_8.
[25] B. Koonce, MobileNetV3, in: Convolutional Neural Networks with Swift for Tensorflow, Springer, 2021, pp. 125–144, https://doi.org/10.1007/978-1-4842-6168-2.
[26] P. Kumar, S. Batchu, S.R. Kota, Real-time concrete damage detection using deep learning for high rise structures, IEEE Access 9 (2021) 112312–112331, https://doi.org/10.1109/ACCESS.2021.3102647.
[27] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, YOLOv6: a single-stage object detection framework for industrial applications, arXiv preprint arXiv:2209.02976 (2022), https://doi.org/10.48550/arXiv.2209.02976.
[28] T. Lin, LabelImg, Online, https://github.com/tzutalin/labelImg, 2015.
[29] S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768, https://doi.org/10.48550/arXiv.1803.01534.
[30] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, C. Zhang, Learning efficient convolutional networks through network slimming, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2736–2744, https://doi.org/10.48550/arXiv.1708.06519.
[31] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022, https://doi.org/10.48550/arXiv.2103.14030.
[32] M. Maboudi, M. Homaei, S. Song, S. Malihi, M. Saadatseresht, M. Gerke, et al., arXiv preprint arXiv:2205.03716 (2022), https://doi.org/10.48550/arXiv.2205.03716.
[33] R. Meyes, M. Lu, C.W. de Puiseau, T. Meisen, Ablation studies in artificial neural networks, arXiv preprint arXiv:1901.08644 (2019), https://doi.org/10.48550/arXiv.1901.08644.
[34] L.D. Otero, N. Gagliardo, D. Dalli, W.-H. Huang, P. Cosentino, Proof of Concept for Using Unmanned Aerial Vehicles for High Mast Pole and Bridge Inspections, Florida Institute of Technology, Department of Engineering Systems, Melbourne, FL, United States, 2015. https://rosap.ntl.bts.gov/view/dot/29176.
[35] X. Ruiqiang, YOLOv5s-GTB: light-weighted and improved YOLOv5s for bridge crack detection, arXiv preprint arXiv:2206.01498 (2022), https://doi.org/10.48550/arXiv.2206.01498.
[36] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Mobilenetv2: inverted residuals and linear bottlenecks, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2018) 4510–4520, https://doi.org/10.48550/arXiv.1801.04381.
[37] S. Sun, W. Liu, R. Cui, YOLO based bridge surface defect detection using decoupled prediction, in: 2022 7th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), IEEE, 2022, pp. 117–122, https://doi.org/10.1109/ACIRS55390.2022.9845546.
[38] A.A. Süzen, B. Duman, B. Şen, Benchmark analysis of jetson tx2, jetson nano and raspberry pi using deep-cnn, in: 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), IEEE, 2020, pp. 1–5, https://doi.org/10.1109/HORA49412.2020.9152915.
[39] M. Tan, R. Pang, Q.V. Le, Efficientdet: scalable and efficient object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10781–10790, https://doi.org/10.48550/arXiv.1911.09070.
[40] N.G. Thompson, M. Yunovich, D. Dunmire, Cost of corrosion and corrosion maintenance strategies, Corros. Rev. 25 (3–4) (2007) 247–262, https://doi.org/10.1515/CORRREV.2007.25.3-4.247.
[41] Ultralytics, YOLOv5. https://github.com/ultralytics/yolov5/tree/v6.1.
[42] C.-Y. Wang, H.-Y.M. Liao, I.-H. Yeh, Designing network design strategies through gradient path analysis, arXiv preprint arXiv:2211.04800 (2022), https://doi.org/10.48550/arXiv.2211.04800.
[43] F. Wang, Y. Zou, E. del Rey Castillo, J. Lim, Optimal UAV Image Overlap for Photogrammetric 3D Reconstruction of Bridges, Vol. 1101, IOP Publishing, 2022, p. 022052, https://doi.org/10.1088/1755-1315/1101/2/022052.
[44] F. Wang, Y. Zou, C. Zhang, J. Buzzatto, M. Liarokapis, E. del Rey Castillo, J.B. Lim, UAV navigation in large-scale GPS-denied bridge environments using fiducial marker-corrected stereo visual-inertial localisation, Autom. Constr. 156 (2023) 105139, https://doi.org/10.1016/j.autcon.2023.105139.
[45] J. Wang, Z. Feng, Z. Chen, S. George, M. Bala, P. Pillai, S.-W. Yang, M. Satyanarayanan, Bandwidth-efficient live video analytics for drones via edge computing, in: 2018 IEEE/ACM Symposium on Edge Computing (SEC), IEEE, 2018, pp. 159–173, https://doi.org/10.1109/SEC.2018.00019.
[46] J. Wells, B. Lovelace, Unmanned Aircraft System Bridge Inspection Demonstration Project Phase II Final Report, Dept. of Transportation, Research Services & Library, Minnesota, 2017. https://rosap.ntl.bts.gov/view/dot/32636.
[47] X. Xu, Y. Jiang, W. Chen, Y. Huang, Y. Zhang, X. Sun, DAMO-YOLO: a report on real-time object detection design, arXiv preprint arXiv:2211.15444 (2022), https://doi.org/10.48550/arXiv.2211.15444.
[48] C. Zhang, C.C. Chang, M. Jamshidi, Concrete bridge surface damage detection using a single-stage detector, Comp. Aid. Civ. Infrastr. Eng. 35 (4) (2020) 389–409, https://doi.org/10.1111/mice.12500.
[49] C. Zhang, Y. Zou, F. Wang, E. del Rey Castillo, J. Dimyadi, L. Chen, Towards fully automated unmanned aerial vehicle-enabled bridge inspection: where are we at? Constr. Build. Mater. 347 (2022) 128543, https://doi.org/10.1016/j.conbuildmat.2022.128543.
[50] H. Zhang, M. Cisse, Y.N. Dauphin, D. Lopez-Paz, mixup: beyond empirical risk minimization, arXiv preprint arXiv:1710.09412 (2017), https://doi.org/10.48550/arXiv.1710.09412.
[51] S. Zhao, F. Kang, J. Li, Concrete dam damage detection and localisation based on YOLOv5s-HSC and photogrammetric 3D reconstruction, Autom. Constr. 143 (2022) 104555, https://doi.org/10.1016/j.autcon.2022.104555.
[52] X. Zhou, D. Wang, P. Krähenbühl, Objects as points, arXiv preprint arXiv:1904.07850 (2019), https://doi.org/10.48550/arXiv.1904.07850.
[53] Y. Zhou, S.-M. Moosavi-Dezfooli, N.-M. Cheung, P. Frossard, Adaptive quantization for deep neural network, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018, https://doi.org/10.48550/arXiv.1712.01048.
[54] D. Zou, M. Zhang, Z. Bai, T. Liu, A. Zhou, X. Wang, W. Cui, S. Zhang, Multicategory damage detection and safety assessment of post-earthquake reinforced concrete structures using deep learning, Comp. Aid. Civ. Infrastr. Eng. (2022), https://doi.org/10.1111/mice.12815.
