
Automation in Construction 124 (2021) 103484

Contents lists available at ScienceDirect

Automation in Construction
journal homepage: www.elsevier.com/locate/autcon

Automated crack severity level detection and classification for ballastless track slab using deep convolutional neural network
Weidong Wang a,b, Wenbo Hu a,b, Wenjuan Wang c, Xinyue Xu a,b, Mengdi Wang a,b, Youyin Shi a,b, Shi Qiu a,b,*, Erol Tutumluer d

a School of Civil Engineering, Central South University, Changsha 410075, China
b MOE Key Laboratory of Engineering Structures of Heavy-haul Railway, Central South University, Changsha 410075, China
c School of Business Administration, Capital University of Economics and Business, Beijing 100026, China
d Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801-2352, USA

A R T I C L E  I N F O

Keywords:
Deep convolutional neural network
Automated crack classification
Ballastless track slab
Image processing technology
Severity level quantification

A B S T R A C T

The classification and treatment of cracks of different severity levels based on width measurement is a critical consideration in the maintenance of ballastless track slab. Existing deep learning methods cannot directly quantify cracks and must rely on image processing technologies to post-process the initial deep learning results, inevitably leading to multiple steps and low efficiency. This paper proposes a novel quantitative classification method for cracks of different severity levels based on deep convolutional neural networks, using the orthogonal projection method to preprocess training data and define the severity levels; the method is validated and evaluated from four aspects: network structures, crack data, classification methods, and environmental conditions. Results show that the Inception-ResNet-v2 network can classify crack images into three severity levels without pixel segmentation or post-processing, with accuracy, precision, recall and F1 score all exceeding 93%, and with good robustness and adaptability to noise and light intensity.

1. Introduction

High-speed railway (HSR) ballastless track slab (BTS) deteriorates as service time increases. Distresses such as cracks not only reduce the strength of the track structure and shorten the service life of the BTS, but may also cause fasteners to fall off and rails to shift, which threatens the operation safety of HSR [1,2]. The severity level of cracks is an important decision-making factor for the maintenance and rehabilitation strategies of BTS. Due to the high operating frequency and short maintenance windows of HSR, it is difficult to obtain comprehensive, accurate, detailed, and timely crack information on BTS through manual visual inspection. Various machine vision methods have been developed to automatically detect cracks and replace conventional manual visual recognition.

Image processing technologies (IPTs) were the first to be widely used in the field of crack detection, calculating and analyzing image features to distinguish cracks from background based on heuristic rules, which can be summarized as edge detection, threshold segmentation, region growing and filtering. Although commonly used edge detectors (Roberts, Prewitt, Sobel and Canny, etc.) can accurately segment fuzzy boundaries between surface cracks and background based on differences in pixel gray value [3–5], they produce residual noise in the final output binary image; their performance on noisy images is especially poor and tends to yield discontinuous crack edges [6,7]. Adu-Gyamfi et al. [8] proposed an image denoising and enhancement method that combines empirical mode decomposition (EMD) and weighted reconstruction techniques, which can extract more effective image features from noisy images than most edge detectors. Unlike edge detection, threshold segmentation processes not just the boundary pixels of the cracks but all pixels of the whole image, which are judged as crack or background by setting an appropriate threshold (global, local, or adaptive) [9–13]. Tang et al. [14] used fuzzy set theory and a boundary histogram to determine the optimal threshold for distinguishing crack pixels from background pixels by maximizing the fuzzy index entropy. However, threshold selection remains highly uncertain, and the detection result easily loses the boundary information of the cracks when the contrast between the cracks and the background is low. The region

* Corresponding author at: School of Civil Engineering, Central South University, Changsha 410075, China.
E-mail address: sheldon.qiu@csu.edu.cn (S. Qiu).

https://doi.org/10.1016/j.autcon.2020.103484
Received 16 March 2020; Received in revised form 5 October 2020; Accepted 3 November 2020
Available online 22 January 2021
0926-5805/© 2020 Elsevier B.V. All rights reserved.

growth algorithm solves the problem of poor crack detection accuracy under low-contrast conditions by setting seed pixels in the target area and continuously expanding them [15–17]. Although the anti-noise performance of region growing is better than edge detection and threshold segmentation, it is still sensitive to noise, and the seed pixels rely on manual determination, which may lead to voids in the crack detection results. Zhang et al. [18] matched pre-designed filters to crack characteristics according to shape, direction or intensity to detect cracks, achieving higher detection accuracy and better suppression of environmental noise than edge detectors.

The heuristic rules of the above IPTs rely heavily on prior knowledge and engineering experience (e.g., pavement distress matrices) and must be personally designed and adjusted for each unique crack detection requirement [19]; they target a single crack feature, with high specificity, low repeatability and incompleteness. Their performance varies under different complex environmental conditions, which also hinders fully automated crack recognition with IPTs. The application of machine learning algorithms (MLAs) is an effective measure to realize automatic detection of complex and diverse cracks. The essence of machine learning is feature learning: extracted image features (entropy, texture, HOG, SIFT, LBP, GLCM, etc.) are used to fully train SVM, BPANN, NBC, KNN and other classifiers to learn the similarity between various crack images, so that the computer can grasp the recognition rules and detect cracks in unknown image data [20–26]. Cha et al. [27] used the Hough transform and other image processing methods to extract features such as horizontal and vertical lengths of bolts, which were used to fully train a linear SVM to distinguish between tight and loose bolts. Xu et al. [28] extracted parameters representing crack characteristics from each sub-image, and then manually selected sub-images with representative parameters to train an ANN. Shi et al. [29] extracted crack features at multiple levels and directions from labeled data to fully train a crack forest, which solved the problem of detecting noise-containing cracks with complex topology. However, the image features required to fully train existing machine learning classifiers rely on IPTs for pre-extraction and manual labeling, resulting in low detection accuracy, long processing times and a detection range limited by shallow and scant features [7]. In addition, unsupervised MLAs such as clustering (the K-means algorithm) [30], principal component analysis (PCA) [31] and Gaussian mixture models [32] have also been applied to crack detection of roads, bridges and other infrastructure. Their advantage is that the image data required for training does not need manual labeling in advance, which reduces manual intervention; their limitation is that they can only detect crack images with obvious textures and are generally not as accurate as traditional supervised MLAs [33].

Moving from manually processing limited image features with IPTs and MLAs to automatically processing rich, arbitrary image features with deep learning is a great advancement. Deep learning methods with the convolutional neural network (CNN) as the core can automatically extract rich and deep abstract features from massive infrastructure surface crack data to master recognition rules. The CNN relies on the convolutional layers (a large number of convolution kernels) inside the network to perform a convolution operation with a neighborhood of the input crack image, which slides from the upper left to the lower right of the image with a certain stride and outputs the deep abstract feature map for large-scale, diverse rapid crack detection tasks [34]. Various convolutional network models derived from the original CNN have been used to identify, locate and characterize cracks of different shapes and locations [35,36]. Mandal et al. [37] used the YOLOv2 network to accurately identify and locate lateral cracks, longitudinal cracks and alligator cracks at different locations. Faster R-CNN has been used to simultaneously identify and locate different types of structural damage, and the average precision (AP) for cracks reached 94.7%, adapting well to various image sizes and lighting conditions [38]. The SDDNet proposed by Cha et al. [39] can efficiently segment crack features and remove complex backgrounds and crack-like features. Although the detection performance of existing convolutional network models for cracks of various structures (pavement, bridge, tunnel, building) has been demonstrated to be superior to IPTs and MLAs, with fast detection speed, high accuracy, and good adaptability to complex environmental conditions [40–48], it still faces challenges in quantifying cracks.

The existing detection methods based on convolutional network models cannot directly quantify cracks; they must rely on one or more IPTs to post-process the initial deep learning results to calculate specific numerical indexes of cracks and further define the severity level. Kang et al. [49] went through three steps to quantify cracks: first, Faster R-CNN was used to identify and locate the cracks, then an improved tubularity flow field (TUFF) algorithm was used to further segment the specific features of the cracks, and finally an improved distance transform method (DTM) was used to calculate the length and width. Beckman et al. [50] used the RANSAC algorithm to further divide and calculate the volume of concrete spalling based on the positioning results of Faster R-CNN. Ni et al. [51] first used a dual-scale convolutional neural network to extract detailed morphological features of the cracks and then estimated the width of the detected cracks based on the Zernike moment operator. Yang et al. [52] skeletonized FCN crack characterization results based on the medial axis algorithm and used the ratio between pixel area and pixel length to define the average width of cracks. In the above process, all crack inspection data must go through region location, pixel segmentation and IPT-based post-processing to calculate specific numerical index values for quantification, which inevitably leads to multiple steps, high computational costs and low automation; in particular, the IPT-based post-processing greatly limits the massive data processing capability of convolutional network models.

The severity level of a crack can be estimated based on other cracks of known severity at that location [53]. Therefore, this paper proposes a novel quantitative classification method based on IPT pre-processing and DCNN to address the challenges of existing crack quantification methods. The low-efficiency calculation of specific numerical index values of cracks based on IPT is completed in the training set in advance, and the training set is then used to fully train the DCNN to directly classify cracks of different severity levels from BTS inspection data. The fully trained DCNN does not need to segment pixel features or calculate specific numerical indexes of cracks one by one; instead, it quantitatively classifies the cracks at the image level according to the similarity of features between training data and inspection data, which overcomes the multiple steps, high computational costs and low automation of existing crack quantification methods. In addition, unlike the existing shallow convolutional network models (generally within 10 layers) used for crack detection [54–57], the DCNN not only increases the structural depth but also expands the width of each layer in a parallel manner, so that it can fully learn the input image and extract more complex and advanced effective feature information to ensure the accuracy of crack quantification [58–61].

The objective of this research is to establish an automated classifier that can quantify cracks according to their severity levels. Adequate robustness and adaptability to adverse environments such as weak light and massive background noise are needed for this classifier. This paper uses a DCNN to implement this classifier and evaluates and tests its performance. The content of this research is shown in Fig. 1. Section 2 introduces the structural composition and image processing process of the DCNN, using the Inception-ResNet-v2 network as an example for elaboration. Section 3 first explains the source of the crack images; then the average width of each crack is calculated by the orthogonal projection method, and the average width is used as a quantitative classification index to divide the collected crack images into three severity levels (label 1, label 2, label 3); finally, the images are preprocessed to build a crack image database containing 15,000 images. Section 4 uses transfer learning to compare the recognition effect of six existing DCNNs on the database, and selects the Inception-ResNet-v2 network with the best performance for in-depth learning, training, and testing on the database. Section 5 uses different training and testing sets to validate the classification results of the


Fig. 1. Research roadmap for detection and classification of cracks with three severity levels.

network, which is compared with four traditional machine learning methods, evaluating performance by accuracy, F1 score, and test time. Section 6 further tests the consistency of the classification performance of the Inception-ResNet-v2 network in three adverse environments. Section 7 concludes this paper by summarizing contributions and limitations.

2. Methodology

2.1. Overall architecture

This paper uses a DCNN to detect and classify crack images of different severity levels, and employs the Inception-ResNet-v2 network as an example to introduce its structural composition and image processing process. The overall structure of the Inception-ResNet-v2 network mainly includes the input layer, convolutional layers, pooling layers, the Stem module, Inception-resnet modules, Reduction modules, and the final Dropout layer and Softmax layer (classification layer). The image processing can be regarded as a complex non-linear mapping from input original image pixels to output classification score values. The principle is shown in Eq. (1).

f(xi, W, b) = Wxi + b  (1)

where f(xi, W, b) is the mapping, xi is the image pixel information, W is the weight parameter of the image pixels, and b is the bias vector.

As shown in Fig. 2(a), the crack image of BTS is first sent to the input layer of the Inception-ResNet-v2 network in the form of a three-dimensional tensor. Then the feature matrix containing rich and deep feature information is automatically extracted from the input image data as it passes through a Stem module and several Inception-resnet modules and Reduction modules in turn. Finally, the feature matrix enters the Softmax layer, which outputs the probability that the image belongs to each of the three severity labels; the label with the highest probability is the final prediction for the image, which completes the forward learning process of the input image. At the same time, the error information (the gap between the real value and the predicted value) is propagated from the final classification layer back to each convolutional layer in the network through back propagation. Forward learning and back propagation together constitute the image classification process of the DCNN.

Unlike existing shallow CNNs that rely on a simple series combination of convolutional and pooling layers, the three unique key modules (Stem module, Inception-resnet module, Reduction module) of the Inception-ResNet-v2 network enable it to extract richer and more effective feature information while avoiding vanishing gradients. They are described in detail in parts (b), (c), and (d) of Fig. 2, following the data processing sequence.

2.2. Stem module

The Stem module consists of several convolutional and pooling layers that extract the underlying features of the input crack image, as shown in Fig. 2(b). The convolutional layer uses a large number of convolution kernels (filters) of different sizes such as 1, 3, and 5 to perform a convolution operation with a neighborhood of the input image (the weight parameters are used to perform an inner product with the data in the local window of the input image matrix), sliding from the upper left to the lower right of the image with a certain stride and outputting the abstract feature map. The non-linear activation function ReLU is used to non-linearly transform the output of the convolutional layer, controlling the activation threshold of each neuron; its mathematical expression is shown in Eq. (2). The pooling layer includes average pooling and maximum pooling, which reduce the size of the image matrix without changing its depth. Macroscopically, this can be regarded as converting a high-resolution picture into a low-resolution picture without affecting the image quality, so as to reduce the number of parameters in the entire neural network.

f(x) = max(x, 0)  (2)
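The convolution, ReLU (Eq. (2)) and max-pooling operations described above can be illustrated with a minimal NumPy sketch; the 8 × 8 input and the 3 × 3 edge-like kernel are illustrative stand-ins, not the network's actual parameters.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (valid padding), taking the inner
    product with each local window, as described in Section 2.2."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    # Eq. (2): f(x) = max(x, 0)
    return np.maximum(x, 0)

def max_pool(fmap, size=2):
    """Keep the maximum of each size x size window, shrinking the map."""
    oh, ow = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

img = np.random.rand(8, 8)           # stand-in for a grayscale crack patch
k = np.array([[1., 0., -1.]] * 3)    # illustrative 3x3 vertical-edge kernel
fmap = max_pool(relu(conv2d(img, k)))
print(fmap.shape)                    # (3, 3): 8 -> 6 after 3x3 conv, 6 -> 3 after 2x2 pool
```

Note how pooling halves each spatial dimension without touching channel depth, matching the description above.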


Fig. 2. Inception-ResNet-v2 deep convolutional neural network structure composition: (a) Overall structure. (b) Stem module. (c) Inception-resnet-A module. (d)
Reduction-A module.
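The residual shortcut shown in Fig. 2(c), where the module learns the difference between input and output and adds it back to the input, can be sketched minimally as follows; `inception_branch` is a hypothetical stand-in for the module's parallel convolution branches, not the real layer stack.

```python
import numpy as np

def inception_branch(x, w):
    """Hypothetical stand-in for the parallel convolution branches:
    some learned transform F(x), here a scaled ReLU."""
    return np.maximum(w * x, 0)

def inception_resnet_block(x, w=0.5):
    """Residual connection: output = x + F(x). The identity path lets
    gradients flow unchanged, mitigating vanishing gradients in deep stacks."""
    return x + inception_branch(x, w)

x = np.array([1.0, -2.0, 3.0])
y = inception_resnet_block(x)
print(y)  # x plus relu(0.5 * x), elementwise
```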

2.3. Inception-resnet module

The Inception-resnet module, which consists of the Inception module and the Resnet module, further extracts the advanced features of the crack image, as shown in Fig. 2(c). The Inception module combines convolutional layers in parallel, increasing the width of the network and the non-linearity of the structure while reducing the number of parameters in the entire network, which speeds up computation and digs out deeper image features compared with the direct serial combination of traditional models (AlexNet, VGGNet). The Resnet module directly connects the input and output of the Inception-resnet module, learning the difference between them and adding that difference to the input to form the output, which can effectively solve the gradient vanishing problem caused by excessive network depth and improve the accuracy of image recognition.

2.4. Reduction module and the final classification layer

The Reduction module aggregates the valid information in the feature information extracted from the crack image by halving the data


space and deepening the channels, as shown in Fig. 2(d). The role of the Dropout layer is to randomly remove some neurons during the network's forward learning process, thereby reducing the number of parameters in the entire network structure, which can prevent overfitting and improve robustness. The Dropout layer usually retains only 80% of the neurons. The Softmax layer is the final classification layer: the feature vector extracted from the crack image is input, and the probability distribution of the image over the three classification labels is output; the most probable label is the final prediction for the image. The cross-entropy loss is used to evaluate the error between the predicted value and the real value of the image, judging the training and testing effect of the convolutional network structure. Its mathematical expression is shown in Eq. (3).

L = −[y log ŷ + (1 − y) log(1 − ŷ)]  (3)

where y is the real label and ŷ is the predicted label.

3. Preparation and processing of the database

3.1. Image acquisition

Building a large and comprehensive sample image database is a prerequisite for DCNN-based image recognition. In this paper, the appearance of CRTS III BTS is scanned by high-resolution line-array cameras mounted on the track inspection vehicle to collect the original crack image data. The image acquisition equipment and region are shown in Fig. 3.

The original BTS images collected by the track inspection vehicle are further cropped to obtain 4650 crack images (400 pixel × 400 pixel) to remove a large number of irrelevant background factors and reduce the amount of subsequent DCNN computation, as shown in Fig. 4. The corresponding real region size is about 23 mm × 23 mm after conversion from the reference object.

3.2. Image annotation

Each cropped BTS crack image (400 pixel × 400 pixel) needs to be labeled in order to obtain the reference required for training and testing. Existing research usually divides sample images into two labels based on whether or not there are cracks in the image, and detects cracks at the image block level without considering the geometric size information of the cracks. Since the width characteristic, which represents the severity level of a crack, has an important impact on the maintenance and service methods of BTS and the evaluation of its service status, this paper proposes a hierarchical labeling method based on the average width of cracks. If the average width of the crack in the image block exceeds a certain limit, it is labeled as a high severity level crack; otherwise, it is labeled as a low severity level crack; an image without a crack is labeled as health slab. This divides all 4650 crack images into different severity levels. First, the orthogonal projection method [62] is used to calculate the continuous width of the cracks in each image, and the corresponding average width value is obtained. Then the cracks of different severity levels are labeled according to the "Rules for the maintenance of high-speed railway ballastless track lines (trial)" (Rail Transport [2012] No. 83) and other specifications [63–66].

1) Continuous width measurement of cracks based on the orthogonal projection method.

Due to the complexity of the characteristics of the crack width, the manual measurement method based on a limited number of data

Fig. 3. Image acquisition of BTS surface: (a) Track inspection vehicle. (b) Acquisition region.


Fig. 4. Cropped BTS crack images: (a) Crack image/1. (b) Crack image/2. (c) Crack image/3. (d) Crack image/4.

samples is insufficient to describe the true variation of the crack width, and it easily causes the final average crack width to be overestimated. Therefore, the orthogonal projection method is used to accurately describe the changes in the continuous width pixels of the cracks, and the average width pixels of the cracks are calculated. A rail image with balanced width and pixels under the same shooting conditions is selected as the reference object, and the average width of the crack is obtained according to the ratio of pixels to true geometric size.

The first step is to convert the original crack image into a binary image and extract the skeleton information of the crack, as shown in Fig. 5(b) and (c). The second step uses the Canny edge operator to extract crack edge pixels, tracking and matching edge pixels counterclockwise over eight neighborhoods; by matching all edge pixels, the binary edge is transformed into a layered sequence to obtain the complete crack contour, as shown in Fig. 5(d). The third step is to determine the direction of the crack skeleton according to the extracted skeleton information; the direction of the crack skeleton is defined as the tangential direction at each skeleton point, derived from the local adjacent skeleton points, as shown in Fig. 5(e). The fourth step is to derive the orthogonal projection ray as the normal of the skeleton from each skeleton point pixel. Each orthogonal projection ray has two intersection points with the two contour lines of the crack, as shown by the two yellow intersection points in Fig. 5(f). The Euclidean distance between these two intersections is used to determine the crack width at that point in the crack. The mathematical expression is shown in Eq. (4).

aij = ‖xi − xj‖2  (4)

where aij is the width at a point in the crack; xi, xj ∈ C ∩ R; C is the crack contour point set; R is the orthogonal projection ray point set.

The continuous width of the crack calculated by the orthogonal projection method is a pixel width. This article further uses the apparent image of the rail under the same shooting conditions as a reference to calculate the actual geometric width of a point in the crack by analogy. The specification of the rail is 60 kg/m, the surface width of the rail head is consistent, its actual geometric size is 73 mm, and its pixel value is 1275 pixels, so the actual geometric width of a point in the crack can be obtained as Eq. (5).

Wci = Wr × (Pci / Pr)  (5)

where Wci represents the actual geometric width of a point in the crack; Wr represents the actual geometric width of the rail head surface; Pci represents the pixel width of a point in the crack; and Pr represents the pixel width of the rail head surface.

Fig. 6 shows the calculation results of the continuous width of the cracks in the four example images in Fig. 4. The average width index values of these four cracks are 0.36 mm, 0.49 mm, 0.19 mm, and 0.17 mm, as shown by the red dotted lines in the figure. The width of most points inside each crack is stable near the average width. The average width can therefore be used to distinguish cracks of different width levels.

2) Crack label division.

The geometric widths of cracks in BTS differ, and so do their severity and the maintenance methods required. The "Rules for the maintenance of high-speed railway ballastless track lines (trial)" (Rail Transport [2012] No. 83) uses 0.1 mm, 0.2 mm and 0.3 mm width thresholds to distinguish low, medium and high severity cracks in BTS: 0.1 mm is the level I crack and should be recorded; 0.2 mm is the level II crack and needs to be included in the maintenance plan and repaired in a timely manner; 0.3 mm is the level III crack and should be repaired immediately. In addition, the other three standards, such as the "Code for durability design on concrete structure of railway" (TB10005–2010), generally use the width threshold of 0.2 mm to distinguish cracks of different severity levels, as shown in Table 1.

Based on the above five standards of BTS in HSR, this paper uses 0.2 mm as the width threshold to classify more than 3000 CRTS III BTS crack images (400 pixel × 400 pixel). The average width of the cracks in each image is calculated based on the orthogonal projection method. Crack images with an average width greater than 0.2 mm are labeled as label 1 (high severity level crack); crack images with an average width less than 0.2 mm are labeled as label 2 (low severity level crack); and normal BTS images without cracks are selected as a control label and labeled as label 3 (health slab). According to the 4:1 principle of training set data to testing set data, 905 images (400 pixel × 400 pixel) are taken as the testing set and 3620 images (400 pixel × 400 pixel) as the training set, as shown in Table 2. For label 1 (high severity level crack), the low-pressure grouting method should be used for immediate repair; for label 2 (low severity level crack), the crack should be included in the maintenance plan and the surface-enclosed method should be used for repair.

In this paper, the crack images of BTS of different severity levels are labeled and identified based on the average width value, which improves the efficiency of crack detection and treatment.

3.3. Image pre-processing

The data scale of the three image labels is uneven, the characteristics of the cracks are not obvious, and their degree of similarity with the background environment is high, which brings great difficulties to subsequent image recognition. This paper uses data enhancement and histogram equalization [67] to pre-process the crack images (400 pixel × 400 pixel) of BTS. Data enhancement mainly uses random angle rotation, random mirror transformation (up and down, left and right), random miscut (shear) transformation, elastic distortion and other methods to randomly change each image while maintaining the same pixel size, which expands the size of the original image set and makes the number of images of the three labels reach a balance (Fig. 7). It is


Fig. 5. The continuous width of cracks calculated based on the orthogonal projection method: (a) Original image. (b) Binarization. (c) Skeletonization. (d) Extract
crack outline. (e) Determine the orientation of the crack skeleton. (f) Derive orthogonal projection lines and calculate intersection width.
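A toy version of the width measurement in Fig. 5 is sketched below on a synthetic, perfectly vertical crack, where the skeleton tangent is vertical and the orthogonal projection ray reduces to a horizontal scan; the general curved-crack case follows the full procedure of [62] and is not reproduced here.

```python
import numpy as np

# Synthetic binary crack mask: a 5-pixel-wide vertical band (crack = 1).
mask = np.zeros((20, 20), dtype=int)
mask[:, 8:13] = 1

def width_at(mask, row, col):
    """Walk along the normal of a vertical skeleton (here the horizontal
    direction) until leaving the crack on both sides; Eq. (4) reduces to
    the Euclidean distance between the two boundary crossings."""
    left = col
    while left - 1 >= 0 and mask[row, left - 1]:
        left -= 1
    right = col
    while right + 1 < mask.shape[1] and mask[row, right + 1]:
        right += 1
    return float(right - left + 1)

# The skeleton of the band is its center column; the tangent is vertical,
# so the orthogonal projection ray is horizontal at every skeleton point.
skeleton_col = 10
widths = [width_at(mask, r, skeleton_col) for r in range(mask.shape[0])]
print(sum(widths) / len(widths))  # average width in pixels: 5.0
```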

conducive to fully learning image feature information. Histogram equalization is a non-linear stretching of the gray histogram of the original image, transforming the histogram into a uniform distribution, increasing the dynamic range of the pixel gray values, and highlighting the differences between the crack area and the surrounding background area, which enhances the overall contrast of the image (Fig. 8).

The data augmentation method is used to perform five cycles (random changes) on the original images of the three labels, which makes the image scale 5 times larger with no duplicate images.
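The two pre-processing steps can be sketched as below; `equalize_hist` and `random_augment` are illustrative NumPy implementations (a production pipeline would more likely use OpenCV or an augmentation library), and they cover only flips and 90° rotations out of the transformations listed above.

```python
import numpy as np

def equalize_hist(img):
    """Map gray levels through the normalized CDF so that the output
    histogram is approximately uniform (img: uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_norm = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # scale to 0..1
    lut = np.round(cdf_norm * 255).astype(np.uint8)         # gray-level lookup table
    return lut[img]

def random_augment(img, rng):
    """Random flips and a random 90-degree rotation that keep the
    400 x 400 pixel size unchanged, as required above."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    if rng.random() < 0.5:
        img = np.flipud(img)
    return np.rot90(img, k=rng.integers(0, 4))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(400, 400), dtype=np.uint8)
aug = random_augment(equalize_hist(img), rng)
print(aug.shape)  # (400, 400)
```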


Fig. 6. The calculation result of the continuous width and average width of the crack: (a) Crack image/1. (b) Crack image/2. (c) Crack image/3. (d) Crack image/4.
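Using the reference values above (rail head: 73 mm corresponding to 1275 pixels), Eq. (5) becomes a one-line conversion from pixel width to geometric width:

```python
# Eq. (5): convert a crack's pixel width to geometric width using the
# rail head surface as the reference object (73 mm <-> 1275 pixels).
RAIL_WIDTH_MM = 73.0
RAIL_WIDTH_PX = 1275.0

def crack_width_mm(crack_width_px):
    return RAIL_WIDTH_MM * crack_width_px / RAIL_WIDTH_PX

# A crack about 3.5 pixels wide maps to roughly 0.2 mm, the labeling threshold.
print(round(crack_width_mm(3.5), 3))
```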

According to the 4:1 principle of training set data to testing set data, 3000 images (400 pixel × 400 pixel) are taken as the testing set and 12,000 images (400 pixel × 400 pixel) as the training set. Table 3 shows the labels and corresponding numbers of the crack image database on BTS.

Table 1
Provisions of the existing codes for the limit of concrete cracks of BTS in HSR (unit: mm).

| Existing code | Crack width limit |
| Rules for the maintenance of high-speed railway ballastless track lines (trial) (Rail Transport [2012] No. 83) | I: 0.1; II: 0.2; III: 0.3 |
| Code for durability design on concrete structure of railway (TB10005–2010) | Carbonized environment: 0.2; other adverse environments: 0.15 |
| Code for design of concrete structures (GB50010–2010) | 0.2 |
| Standard for acceptance of track works in high-speed railway (TB10754–2018) | 0.1 |

4. Detection and classification of crack images

4.1. Performance comparison of six existing deep convolutional neural networks

Existing DCNNs (such as VGG16, Inception V3, and ResNet V1 50) can achieve a recognition accuracy of more than 95% on the ImageNet image database (including 1.2 million images with 1000 labels) and have been widely used [68]. As for the crack images of BTS, due to the large differences between the morphological characteristics of cracks and conventional image classification data sets, it is inconclusive as to

Table 2
Classification labels and quantity of cracks in BTS of HSR.

Image category | Label | Average width | Training set | Testing set | Maintenance measures | Total
High severity level crack | 1 | w > 0.2 mm | 940 | 235 | Immediate processing (low pressure grouting) | 1175
Low severity level crack | 2 | w ≤ 0.2 mm | 1580 | 395 | Included in maintenance plan and timely repair (surface-enclosed method) | 1975
Health slab | 3 | No cracks | 1200 | 300 | None | 1500
Total | | | 3620 | 905 | | 4650
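The labeling rule behind Table 2 reduces to a threshold on the measured average width; a minimal sketch (function name and return strings are illustrative, thresholds are those of Table 2):

```python
from typing import Optional, Tuple

def severity_label(avg_width_mm: Optional[float]) -> Tuple[int, str]:
    """Map a crack's measured average width to its Table 2 label and
    maintenance measure. None means the slab has no crack."""
    if avg_width_mm is None:
        return 3, "none (health slab)"
    if avg_width_mm > 0.2:                       # high severity
        return 1, "immediate processing (low pressure grouting)"
    return 2, "include in maintenance plan (surface-enclosed method)"

label, measure = severity_label(0.35)  # label 1: high severity
```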


Fig. 7. Data enhancement.

Fig. 8. Histogram equalization: (a) Original image. (b) Equalized image. (c) Histogram of the original image. (d) Histogram of the equalized image.


Table 3
Labels and quantity of the pre-processed crack image database.

Image category | Label | Average width | Training set | Testing set | Image size/pixel | Total
High severity level crack | 1 | w > 0.2 mm | 4000 | 1000 | 400 × 400 | 5000
Low severity level crack | 2 | w ≤ 0.2 mm | 4000 | 1000 | 400 × 400 | 5000
Health slab | 3 | No cracks | 4000 | 1000 | 400 × 400 | 5000
Total | | | 12,000 | 3000 | | 15,000

which network can achieve excellent recognition results. Therefore, this paper uses transfer learning to compare and analyze six existing DCNNs (all with excellent recognition performance on the ImageNet image database) and selects the network with the highest accuracy in detecting and classifying cracks of the three labels.
Transfer learning transfers the structure and weight parameters of a pre-trained DCNN to a new image classification problem for training and recognition [69]. These weight parameters have been trained and validated on large-scale image data in advance and have strong generalization ability. They can extract feature vectors that are more streamlined and more expressive from the new image data set, which greatly saves training time. For new image classification problems, the transfer learning method can quickly obtain the optimal recognition effect of the DCNNs.
This paper uses six existing DCNNs with high recognition accuracy on the ImageNet image database: VGG16, Inception V2, Inception V3, Inception V4, Inception-ResNet-v2 and ResNet V1 50. All the network structures and pre-trained weight parameters are transferred to the problem of crack recognition of BTS. The last classification layer of each network is modified to output the three severity labels of this paper, the 12,000 training set images are trained once (1 epoch), and the recognition effect of each network is validated on the 3000 testing set images. Table 4 shows the crack recognition accuracy and the number of learnable parameters of these six networks. More learnable parameters mean deeper convolutional layers, enabling the network to extract richer image features and master more accurate recognition rules. Therefore, the Inception-ResNet-v2 network, with more learnable parameters, achieves far better recognition accuracy than the Inception V2, Inception V3, Inception V4, and ResNet V1 50 networks. However, unduly deep convolutional layers not only increase the complexity and memory footprint of the network, but also cause the parameters of the initial convolutional layers not to be updated in time (gradient disappearance), resulting in a decrease in the final accuracy. Therefore, although the VGG16 network has more learnable parameters than the Inception-ResNet-v2 network, its recognition accuracy is lower. The Inception-resnet module inside the Inception-ResNet-v2 network combines convolution layers in parallel, unlike the VGG16 network, and directly connects the input layer and the output layer, expanding the width of the network and avoiding gradient disappearance, which allows the Inception-ResNet-v2 network to achieve higher recognition accuracy than the VGG16 network with fewer learnable parameters.
Therefore, this paper chooses the Inception-ResNet-v2 network for further, deeper training and testing on the database in order to complete the detection and classification of crack images with three width levels. In addition, pre-processing (including data enhancement and histogram equalization) of the original crack images improves the recognition accuracy by 5%, indicating that the quality of the image database also has an important impact on the final recognition effect.

4.2. Hyperparameters

The Inception-ResNet-v2 network controls and optimizes the image training process through the learning rate, weight decay, step, batch size, and back-propagation algorithm. These structural parameters are called hyperparameters. There is no general conclusion on the setting strategy of hyperparameters, which is mainly related to the actual image data. It is necessary to adjust and find the hyperparameter settings most suitable for the image data according to the errors on the training set and the testing set.
This paper takes the GPU as the computing core (CPU: AMD 2990WX@3.0 GHz, RAM = 64 GB; GPU: NVIDIA GeForce RTX 2080Ti) and relies on TensorFlow, a deep learning framework developed by Google. A large number of parameter adjustment experiments were performed on the 12,000 training images and 3000 testing images to obtain the optimal hyperparameter settings as follows: the batch size (number of images read per update) is set to 64, the number of epochs (training rounds) to 60, the weight decay coefficient to 0.00005 and the Dropout ratio to 0.8, and the Adam algorithm is used to update the weights of each layer. At the same time, a large initial learning rate (the magnitude of each update of the weight parameters) of 0.01 is set and the learning rate is updated using an exponential decay method. After completing each training round (epoch), the learning rate decays to 0.9 times its previous value, decaying to approximately 0.00001 after 60 training rounds, as shown in Fig. 9.
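The transfer-learning setup described above can be sketched with tf.keras (an illustration under assumptions, not the authors' exact pipeline; passing `weights="imagenet"` would load the pre-trained parameters, while `weights=None` is used here so the sketch builds offline):

```python
import tensorflow as tf

# Backbone without its 1000-class ImageNet head; set weights="imagenet"
# to transfer the pre-trained parameters instead of random initialization.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights=None, input_shape=(400, 400, 3), pooling="avg")

# New final classification layer: a 3-way softmax for the severity labels.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Sanity-check forward pass on one dummy 400x400 RGB image.
probs = model(tf.zeros((1, 400, 400, 3)))
# model.fit(train_images, train_labels, epochs=1)  # one epoch per network in the comparison
```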

Table 4
Recognition accuracy and number of learnable parameters of six existing DCNNs (image data: training set 12,000; testing set 3000; size 400 × 400; epoch 1).

Network | Accuracy (no preprocessing)/% | Accuracy (preprocessing)/% | Total parameters/10^6
VGG16 | 49.32 | 56.01 | 138.86
Inception V2 | 69.88 | 74.27 | 10.16
Inception V3 | 52.98 | 58.11 | 24.33
Inception V4 | 53.02 | 58.45 | 43.71
Inception-ResNet-v2 | 75.70 | 80.08 | 56.88
ResNet V1 50 | 68.59 | 72.59 | 23.51
Fig. 9. Learning rate decaying exponentially.
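The schedule of Fig. 9 (initial rate 0.01, multiplied by 0.9 after every completed epoch) is plain exponential decay; a minimal sketch:

```python
def learning_rate(epoch: int, initial_rate: float = 0.01, decay: float = 0.9) -> float:
    """Exponential decay: the rate is multiplied by `decay` after each
    completed training round (epoch), as in Fig. 9."""
    return initial_rate * decay ** epoch

rates = [learning_rate(e) for e in range(61)]
# After 60 rounds the rate has fallen to roughly 1e-5 (0.01 * 0.9**60 ~ 1.8e-5).
```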


Fig. 10. Loss and accuracy curve of optimal parameter settings: (a) Loss function curves for training and testing sets. (b) Accuracy curves for training and testing sets.

4.3. Training and testing results

As shown in Fig. 10, both the training set loss and the testing set loss reached convergence and stability after setting the hyperparameters and nearly 10,000 steps of training. At this point the two losses are closest and the training of the network structure has reached the fitting state. In the fitting state, the peak accuracy on the training set appeared in the 54th training round, reaching 96.53%; the peak accuracy on the testing set reached 96.17% (the 56th training round). In this paper, the 96.17% testing set accuracy is used as the classification accuracy of the Inception-ResNet-v2 network.

5. Comparison and evaluation

5.1. Comparison of different training and testing sets

When using the Inception-ResNet-v2 network to detect and classify the database, only the features of the 12,000 training set images are fully learned, while the 3000 testing set images are only used to validate the learning effect of the network. Because the images of the training and testing sets are fixed, the generality and repeatability of the Inception-ResNet-v2 network cannot be proven, and training may fall into local minimum or maximum values. This paper uses the cross-validation (k = 5) method to establish multiple training and testing sets based on the database [70], and uses the accuracy and F1 score to compare and evaluate the classification effect of the network.
First, the crack image database (15,000 images, 3 labels) was randomly and evenly divided into five smaller image sets (k = 1, 2, 3, 4, 5); each image set contains 3000 images and each label accounts for 1000 images. Then one of the image sets is taken as the testing set each time, without replacement, and the remaining four image sets are used as the training set, which yields a total of five different training sets with corresponding testing sets; the number of images in each training set and testing set is consistent with the original. The Inception-ResNet-v2 network is used to learn each training set and to output the confusion matrix of classification results on the corresponding testing set. Accuracy and the detection time of a single image are commonly used to evaluate the overall performance of a classifier. In addition, precision, recall and F1 score are calculated from the confusion matrix output by the classifier to properly account for false positives and false negatives [71–76]. Precision is the proportion of correct detection results among all actual detection results. Recall is the proportion of correct detection results among all real results that should be detected. The F1 score is the harmonic mean of precision and recall; its value ranges from 0 to 1, where 0 represents the worst detection effect of the method and 1 represents the best, as shown in Eqs. (6), (7) and (8).

Precision = TP / (TP + FP)  (6)

Recall = TP / (TP + FN)  (7)

F1 score = 2 × Precision × Recall / (Precision + Recall)  (8)

where TP (true positive) is a positive image correctly recognized by the network; FP (false positive) is a negative image incorrectly labeled as positive; FN (false negative) is a positive image incorrectly labeled as negative.
As shown in Fig. 11, the classification accuracy of the fully trained Inception-ResNet-v2 network on the five testing sets reaches more than 96%, and the average detection time for a single image is only 0.0075 s, which indicates that the network can accurately and efficiently distinguish crack images with three severity levels. The precision and recall of the Inception-ResNet-v2 network on the five testing sets are all above 90%, and the gap between different testing sets is stable within 6%, which shows that the classification performance of the network is little disturbed by false positives and false negatives and that it has good generality and repeatability for different training and testing sets. In addition, cracks with high severity levels have lower recall and are more susceptible to false negative errors, while false positive errors are more likely to occur in the recognition results for cracks with low severity levels (lower precision). The F1 score, the harmonic mean of precision and recall, is used to consider false positives and false negatives more objectively; its average value on the five testing sets exceeds 96%, which is close to the classification accuracy. This once again proves the excellent classification performance of the Inception-ResNet-v2 network.
The cross-validation (k = 5) method lets most of the crack images in the database serve both as training and as testing data, which is beneficial for obtaining as much effective information as possible from the limited image data, fully mining the feature information of cracks with different morphologies, and avoiding falling into local extreme values (overfitting). This paper uses the average value of accuracy, detection time, precision, recall and F1 score over the five training and testing sets obtained by the cross-validation method as the final classification result of the Inception-ResNet-v2 network, as shown in Eq. (9). Accordingly, the accuracy and the detection time of a single image are 97.41% and 0.0075 s, respectively. The precision, recall and F1 score for detecting


Fig. 11. Evaluation indicators of the Inception-ResNet-v2 network on five testing sets: (a) Accuracy and detection time. (b) Precision. (c) Recall. (d) F1 score.
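Eqs. (6)-(8) follow directly from the confusion-matrix counts, and Eq. (9) averages each index over the k folds; a sketch with made-up counts and hypothetical fold accuracies:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Eqs. (6)-(8): precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def cross_val_average(per_fold_scores):
    """Eq. (9): Q = (sum of q_i over the k folds) / k."""
    return sum(per_fold_scores) / len(per_fold_scores)

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)     # illustrative counts
Q = cross_val_average([0.97, 0.96, 0.98, 0.97, 0.99])  # hypothetical fold accuracies
```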

high severity cracks (label 1) are 98.37%, 93.82% and 95.99% respectively, and those of low severity cracks (label 2) are 94.25%, 98.39% and 96.23% respectively. The above three indicators of the health slab (label 3) are all 100%.

Q = (Σ_{i=1}^{k} q_i) / k  (9)

where q_i is the evaluation index of each testing set; k is the number of divided image sets; Q is the final classification result of the Inception-ResNet-v2 network.

5.2. Comparison of different classification methods

This paper evaluates the crack classification performance of the DCNN through a comparison test between the Inception-ResNet-v2 network and four traditional machine learning methods: Support Vector Machine (SVM), Back Propagation Artificial Neural Network (BPANN), Naive Bayesian Classifier (NBC) and K-Nearest Neighbor (KNN). SVM is defined as a linear classifier with the largest margin in the feature space. It relies on kernel function mapping to set a classification hyperplane in the n-dimensional space composed of the input training data, which maximizes the separation between different classes. In this paper, the HOG and GLCM features are extracted from the image and fused as training data for the SVM classifier. BPANN is a non-linear mapping from input space to output space. It consists of an input layer, hidden layers and an output layer; the large numbers of neurons between adjacent layers are connected to each other through weight coefficients. Input data is inferred and classified by adjusting weights and thresholds, and back propagation is used to continuously reduce the error [23]. KNN is a machine learning method based on analogy reasoning. It compares the testing image with the training images and calculates the distance between them; it selects the label of the training image with the smallest distance from the testing image and assigns it to the testing image [25]. NBC is a machine learning algorithm based on probability theory. It uses the prior knowledge received from the training images to obtain the posterior probability of the testing image; the decision function is modified continuously by updating information to make the classification more accurate [23].
As shown in Fig. 12, according to the histogram of the evaluation indicators (accuracy, detection time, F1 score/label 1, F1 score/label 2, F1 score/label 3), the Inception-ResNet-v2 network established in this paper has the best classification effect on the entire testing set, with an accuracy of 97.41% and an F1 score above 95% for each label. It is followed by BPANN and KNN: the recognition accuracy of these two methods on the entire testing set is more than 85%, but their F1 scores for label 1 (high severity level crack) and label 2 (low severity level crack) are not high. NBC can recognize label 3 (health slab) well but cannot distinguish between label 1 and label 2. The classification effect of SVM on the entire testing set and on each label does


Fig. 12. Performance indicators value of classification results of five methods: (a) Accuracy and detection time. (b) F1 score.
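A sketch of the kind of SVM baseline compared here, assuming scikit-image and scikit-learn (only HOG features are shown, whereas the paper fuses HOG with GLCM; the toy random images and parameter values stand in for the BTS database):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(img: np.ndarray) -> np.ndarray:
    """Histogram-of-oriented-gradients descriptor for one gray image."""
    return hog(img, orientations=9, pixels_per_cell=(50, 50), cells_per_block=(2, 2))

# Toy stand-ins for severity-labeled crack images (real training uses the BTS database).
rng = np.random.default_rng(0)
images = rng.random((20, 400, 400))
labels = np.tile([1, 2, 3], 7)[:20]            # severity labels 1, 2, 3

X = np.stack([hog_features(im) for im in images])
clf = SVC(kernel="rbf").fit(X, labels)          # kernel-mapped hyperplane classifier
pred = clf.predict(X[:5])
```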

not fluctuate much, at only about 70%. In addition, the Inception-ResNet-v2 network in this paper has the shortest detection time, only 0.0075 s, which greatly improves the efficiency of crack detection and classification compared to traditional machine learning methods. In summary, the Inception-ResNet-v2 network can effectively detect and classify crack images of BTS with three severity levels, with high accuracy and high efficiency, and is superior to traditional machine learning methods.

6. Consistency of classification results in adverse environmental conditions

In order to explore the robustness and adaptability of the Inception-ResNet-v2 network, this paper selects 300 images (100 images per label) from the original testing set to further test the classification effect of the network under adverse environmental conditions of three types: interference from background noise, the influence of light intensity, and image blur caused by human factors. Different degrees of salt and pepper noise are added to the original images to simulate the interference of background noise; a Gaussian blur operation is performed on the images to simulate blurred images caused by focusing errors during image acquisition; and the brightness values of the images are changed to simulate the impact of light in the natural environment, as shown in Fig. 13. The three types of images are analyzed and evaluated in terms of the accuracy and F1 score indices.

6.1. Background noise

Salt and pepper noise is a common noise in digital images. It can simulate the interference of background noise by randomly turning pixels of the image black or white. The signal noise rate (SNR, value range 0-1) is used to measure the noise in the image; the smaller the SNR, the larger the proportion of noise in the image. As shown in Fig. 14(a), with the continuous increase of noise in the image, the accuracy and F1 score of the network slowly decrease for all testing images and for each label. Even when nearly half of the pixels in the image are converted into noise, the accuracy and F1 score remain above 80%.

6.2. Light intensity (too strong or too weak)

The brightness value (range 0-255) of each pixel in the image is increased or decreased to simulate the change of light under natural environmental conditions. As shown in Fig. 14(b), appropriately increasing or decreasing the light has little effect on the consistency of the accuracy and F1 score. However, the F1 score of label 1 (high severity level crack) decreases under strong light, and in a dark environment it is difficult to accurately distinguish between cracks (label 1, label 2) and health slabs (label 3). The overall accuracy on the 300 testing images decreases by 10% in extreme light environments.

6.3. Image blur

Gaussian blur uses a Gaussian function with normal distribution to
Fig. 13. Adverse environmental conditions: (a) Background noise. (b) Strong light. (c) Weak light. (d) Image blur.
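The three degradations of Fig. 13 can be simulated with a few NumPy operations (a sketch with illustrative parameter values; the Gaussian kernel size follows the (6σ + 1) × (6σ + 1) rule stated for Eq. (10)):

```python
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(img: np.ndarray, snr: float) -> np.ndarray:
    """Keep a fraction `snr` of pixels; flip the rest to black or white."""
    out = img.copy()
    noise_mask = rng.random(img.shape) > snr
    out[noise_mask] = rng.choice([0, 255], size=int(noise_mask.sum()))
    return out

def change_brightness(img: np.ndarray, delta: int) -> np.ndarray:
    """Shift every pixel's brightness, clipped to the 0-255 range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian of size (6*sigma+1) x (6*sigma+1)."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

img = rng.integers(0, 256, size=(400, 400), dtype=np.uint8)
noisy = salt_and_pepper(img, snr=0.5)        # half the pixels become noise
bright = change_brightness(img, delta=80)    # strong-light simulation
kernel = gaussian_kernel(sigma=2.0)          # 13 x 13 blur kernel for convolution
```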


Fig. 14. The trend of accuracy and F1 score under three adverse environmental conditions: (a) Background noise. (b) Light intensity (too strong or too weak). (c)
Image blur.

perform a convolution operation on the original image, which simulates blurred images collected due to artificial focus errors. The mathematical expression is shown in Eq. (10).

F(r) = (1 / (√(2πσ²))^N) · e^(−r²/(2σ²))  (10)

where σ is the standard deviation of the normal distribution (the larger the value, the more blurred the image) and r refers to the Gaussian fuzzy matrix, which is generally taken as (6σ + 1) × (6σ + 1) in two-dimensional image space.
As shown in Fig. 14(c), for label 3 (health slab), the F1 score is little affected by blurred images; for label 1 (high severity level crack) and label 2 (low severity level crack), blurred images have a large influence on the F1 score of the network. When the blur degree of the image is small, it is easy to misjudge high severity level cracks as low severity level cracks, which causes the F1 score of label 1 to decrease greatly; however, as the blur degree of the images increases, the difference between low severity level cracks and health slabs gradually narrows while high severity level cracks still retain a certain degree of recognizability, which causes the F1 scores of label 2 and label 3 to decrease and the F1 score of label 1 to increase. Affected by image blur, the overall classification accuracy of the network on the 300 testing images is reduced to half of the original.
In summary, the classification results of the Inception-ResNet-v2 network can still guarantee good consistency under the influence of background noise and light intensity, with the accuracy and F1 score remaining above 80%. But for blurred images, the accuracy and F1 score of crack classification are significantly reduced. This shows that the Inception-ResNet-v2 network has good robustness and adaptability to the influence of background noise and light intensity, but its recognition of blurred images needs to be improved.

7. Conclusions

This paper proposes a novel quantitative classification method based on IPT pre-processing and DCNN. Unlike existing crack quantification methods with multiple steps, high computational costs and low automation, the fully trained Inception-ResNet-v2 network needs neither to segment the pixel features of cracks nor to calculate detailed numerical indicators; it can accurately and efficiently detect and classify cracks with three severity levels at the image level based on the similarity of features between training data and inspection data, achieving an accuracy of 97.41% and taking only 0.0075 s to detect a single image (400 pixel × 400 pixel). For the crack of high severity level (label 1), the precision, recall and F1 score are 98.37%, 93.82% and 95.99% respectively; for the crack of low severity level (label 2), these are


94.25%, 98.39% and 96.23% respectively. The above three indicators of the health slab (label 3) are all 100%. In addition, the method is general and repeatable and has good robustness and adaptability to noise and light intensity.
The database quality and the DCNN architecture are two key factors affecting the accuracy of the final classification. The orthogonal projection method is used during image annotation to calculate the continuous width of each point inside the crack based on the extracted crack pixel skeleton and, further, to obtain the average width of the crack, which effectively overcomes the low efficiency and large deviation of manual measurement methods. Data enhancement and histogram equalization are used to pre-process the crack images, which improves the accuracy by 5%. The transfer learning method is used to compare and analyze the classification effect of six DCNN architectures: VGG16, Inception V2, Inception V3, Inception V4, Inception-ResNet-v2 and ResNet V1 50. The results show that the Inception-ResNet-v2 network has the best classification performance on the crack image database of BTS.
Comparative experiments are performed to evaluate the detection and classification performance of the Inception-ResNet-v2 network from the data and methodology perspectives, respectively. Data-wise, the cross-validation method (k = 5) is used to obtain five different training sets and corresponding testing sets from the single crack image database, and the network is trained and tested on each. The results show that the Inception-ResNet-v2 network has good generality and repeatability for different image data. Methodology-wise, the network is compared with four widely used machine learning methods (SVM, BPANN, NBC and KNN), which proves that deep learning methods have significant advantages in automatic crack detection.
This paper further tests the classification effect of the Inception-ResNet-v2 network under three adverse environmental conditions: salt and pepper noise is added to the images to simulate the interference of background noise; the brightness values of the images are changed to simulate changes in the light intensity of the natural environment; and Gaussian blur is performed on the images to simulate image blur caused by human factors. The results show that the Inception-ResNet-v2 network has good robustness and adaptability to adverse environments such as background noise interference and strong or weak light; but for image blur caused by human factors, the accuracy and F1 score fluctuate greatly with the change of the standard deviation.
The fully trained DCNN is applied for the first time to detect and classify cracks with different severity levels of BTS in HSR, and it is demonstrated to be feasible and effective. The proposed method greatly improves the efficiency and scope of maintenance and repairs while reducing labor requirements, which is of great significance for the safe operation of HSR. However, some limitations should also be noted: only the average width is used in this paper to define the severity of cracks, without considering the influence of length on the detection results, and when the average width of a crack in the inspection data differs greatly from the training data, the DCNN may not be able to make correct predictions. In addition, the presence of shadows and foreign objects will also bring great challenges to the detection results of the DCNN. Therefore, future research is expected to further enrich the crack database, increase the detectable length or width range, and include more possible adverse environmental conditions.

Declaration of Competing Interest

None.

Acknowledgements

The research is supported by the High-Speed Railway Infrastructure Joint Fund of the National Natural Science Foundation of China (U1734208); Science and Technology Support Plan of the Department of Science and Technology of Guizhou Province, China ([2018] 2154); and Key Project of China State Railway Group Co., Ltd. (N2019G024).

References

[1] S. Zhu, M. Wang, W. Zhai, C. Cai, C. Zhao, D. Zeng, J. Zhang, Mechanical property and damage evolution of concrete interface of ballastless track in high-speed railway: experiment and simulation, Constr. Build. Mater. 187 (2018) 460–473, https://doi.org/10.1016/j.conbuildmat.2018.07.163.
[2] Z.P. Zeng, J.W. Wang, S.W. Shen, P. Li, A.S. Abdulmumin, W.D. Wang, Experimental study on evolution of mechanical properties of CRTS III ballastless slab track under fatigue load, Constr. Build. Mater. 210 (2019) 639–649, https://doi.org/10.1016/j.conbuildmat.2019.03.080.
[3] I. Abdel-Qader, O. Abudayyeh, M.E. Kelly, Analysis of edge-detection techniques for crack identification in bridges, J. Comput. Civ. Eng. 17 (4) (2003) 255–263, https://doi.org/10.1061/(ASCE)0887-3801(2003)17:4(255).
[4] S.Y. Lee, S.H. Lee, D.I. Shin, Y.K. Son, C.S. Han, Development of an inspection system for cracks in a concrete tunnel lining, Can. J. Civ. Eng. 34 (8) (2007) 966–975, https://doi.org/10.1139/L07-008.
[5] N.D. Hoang, Q.L. Nguyen, Metaheuristic optimized edge detection for recognition of concrete wall cracks: a comparative study on the performances of Roberts, Prewitt, Canny, and Sobel algorithms, Adv. Civil Eng. 2018 (2018) 7163580, https://doi.org/10.1155/2018/7163580.
[6] S. Dorafshan, R.J. Thomas, M. Maguire, Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete, Constr. Build. Mater. 186 (2018) 1031–1045, https://doi.org/10.1139/L07-008.
[7] Y.J. Cha, W. Choi, O. Buyukozturk, Deep learning-based crack damage detection using convolutional neural networks, Comp. Aid. Civil Infrastruct. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[8] Y.O. Adu-Gyamfi, N.O.A. Okine, G. Garateguy, R. Carrillo, G.R. Arce, Multiresolution information mining for pavement crack image analysis, J. Comput. Civ. Eng. 26 (6) (2012) 741–749, https://doi.org/10.1061/(asce)cp.1943-5487.0000178.
[9] H. Zakeri, F.M. Nejad, A. Fahimifar, Rahbin: a quadcopter unmanned aerial vehicle based on a systematic image processing approach toward an automated asphalt pavement inspection, Autom. Constr. 72 (2016) 211–235, https://doi.org/10.1016/j.autcon.2016.09.002.
[10] D. Zhang, S. Qu, L. He, S. Shi, Automatic ridgelet image enhancement algorithm for road crack image based on fuzzy entropy and fuzzy divergence, Opt. Lasers Eng. 47 (11) (2009) 1216–1225, https://doi.org/10.1016/j.optlaseng.2009.05.014.
[11] W. Zhang, Z. Zhang, D. Qi, Y. Liu, Automatic crack detection and classification method for subway tunnel safety monitoring, Sensors (Basel) 14 (10) (2014) 19307–19328, https://doi.org/10.3390/s141019307.
[12] D. Zhang, Q. Li, Y. Chen, M. Cao, L. He, B. Zhang, An efficient and reliable coarse-to-fine approach for asphalt pavement crack detection, Image Vis. Comput. 57 (2017) 130–146, https://doi.org/10.1016/j.imavis.2016.11.018.
[13] H.D. Cheng, X.J. Shi, C. Glazier, Real-time image thresholding based on sample space reduction and interpolation approach, J. Comput. Civ. Eng. 17 (4) (2003) 264–272, https://doi.org/10.1061/(ASCE)0887-3801(2003)17:4(264).
[14] Y.G. Tang, X.M. Zhang, X.L. Li, X.P. Guan, Application of a new image segmentation method to detection of defects in castings, Int. J. Adv. Manuf. Technol. 43 (5–6) (2009) 431–439, https://doi.org/10.1007/s00170-008-1720-1.
[15] Q.Q. Li, Q. Zou, D.Q. Zhang, FoSA: F* seed-growing approach for crack-line detection from pavement images, Image Vis. Comput. 29 (12) (2011) 861–872, https://doi.org/10.1016/j.imavis.2011.10.003.
[16] H. Oliveira, P.L. Correia, Road surface crack detection: improved segmentation with pixel-based refinement, in: 25th European Signal Processing Conference (EUSIPCO), IEEE, Greece, 2017, pp. 2026–2030, https://doi.org/10.23919/eusipco.2017.8081565.
[17] D.J. Zhang, Q.Q. Li, Y. Chen, An efficient and reliable coarse-to-fine approach for asphalt pavement crack detection, Image Vis. Comput. 57 (2017) 130–146, https://doi.org/10.1016/j.imavis.2016.11.018.
[18] A. Zhang, Q. Li, K.C.P. Wang, S. Qiu, Matched filtering algorithm for pavement cracking detection, Transp. Res. Rec. 2367 (2013) 30–42, https://doi.org/10.3141/2367-04.
[19] B. Steven, J.T. Harvey, B.W. Tsai, An Assessment of Automated Pavement Distress Identification Technologies in California (Technical Memorandum No. UCPRC-TM-2008-13), University of California Pavement Research Center (UCPRC), 2009, pp. 1–70. http://www.ucprc.ucdavis.edu/pdf/stage_4_3.3_eval_autom_pav_assess_techn_tm.pdf (Last accessed: 2020-07-05).
[20] G. Miguel, B. David, M. Oscar, Adaptive road crack detection system by pavement classification, Sensors 11 (10) (2011) 9628–9657, https://doi.org/10.3390/s111009628.
[21] S.F. Wang, S. Qiu, W.J. Wang, D. Xiao, K.C.P. Wang, Cracking classification using minimum rectangular cover-based support vector machine, J. Comput. Civ. Eng. 31 (5) (2017) 04017027, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000672.
[22] Y.F. Pan, X.F. Zhang, G. Cervone, L.P. Yang, Detection of asphalt pavement potholes and cracks based on the unmanned aerial vehicle multispectral imagery, IEEE J. Select. Topics Appl. Earth Observ. Remote Sens. 11 (10) (2018) 3701–3712, https://doi.org/10.1109/JSTARS.2018.2865528.
[23] N.D. Hoang, Q.L. Nguyen, Automatic recognition of asphalt pavement cracks based on image processing and machine learning approaches: a comparative study on classifier performance, Math. Probl. Eng. 2018 (2018) 6290498, https://doi.org/10.1155/2018/6290498.


[24] N.D. Hoang, Q.L. Nguyen, A novel method for asphalt pavement crack classification based on image processing and machine learning, Eng. Comput. 35 (2) (2018) 487–498, https://doi.org/10.1007/s00366-018-0611-9.
[25] F. Duan, S. Yin, P. Song, W. Zhang, C. Zhu, H. Yokoi, Automatic welding defect detection of x-ray images by using cascade AdaBoost with penalty term, IEEE Access 7 (2019) 125929–125938, https://doi.org/10.1109/ACCESS.2019.2927258.
[26] Q. Yang, Y. Deng, Evaluation of cracking in asphalt pavement with stabilized base course based on statistical pattern recognition, Int. J. Pavement Eng. 20 (4) (2017) 417–424, https://doi.org/10.1080/10298436.2017.1299528.
[27] Y.J. Cha, K. You, W. Choi, Vision-based detection of loosened bolts using the Hough transform and support vector machines, Autom. Constr. 71 (2) (2016) 181–188, https://doi.org/10.1016/j.autcon.2016.06.008.
[28] G.A. Xu, J.L. Ma, F.F. Liu, X.X. Niu, Automatic recognition of pavement surface crack based on BP neural network, in: Proceedings of the 2008 International Conference on Computer and Electrical Engineering (ICCEE), IEEE, Phuket, Thailand, 2008, pp. 19–22, https://doi.org/10.1109/ICCEE.2008.96.
[29] Y. Shi, L.M. Cui, Z.Q. Qi, Automatic road crack detection using random structured forests, IEEE Trans. Intell. Transp. Syst. 17 (12) (2016) 3434–3445, https://doi.org/10.1109/TITS.2016.2552248.
[30] H.Y. Ju, W. Li, S. Tighe, R.R. Deng, S. Yan, Illumination compensation model with
[49] D. Kang, S.S. Benipal, D.L. Gopal, Y.J. Cha, Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning, Autom. Constr. 118 (2020) 103291, https://doi.org/10.1016/j.autcon.2020.103291.
[50] G.H. Beckman, D. Polyzois, Y.J. Cha, Deep learning-based automatic volumetric damage quantification using depth camera, Autom. Constr. 99 (2019) 114–124, https://doi.org/10.1016/j.autcon.2018.12.006.
[51] F.T. Ni, J. Zhang, Z.Q. Chen, Zernike-moment measurement of thin-crack width in images enabled by dual-scale deep learning, Comp. Aid. Civil Infrastruct. Eng. 34 (5) (2019) 367–384, https://doi.org/10.1111/mice.12421.
[52] X.C. Yang, H. Li, Y.T. Yu, X.C. Luo, T. Huang, X. Yang, Automatic pixel-level crack detection and measurement using fully convolutional network, Comp. Aid. Civil Infrastruct. Eng. 33 (12) (2018) 1090–1109, https://doi.org/10.1111/mice.12412.
[53] M. Alinizzi, J.Y. Qiao, A. Kandil, Integration and Evaluation of Automated Pavement Distress Data in INDOT's Pavement Management System (Joint Transportation Research Program Publication No. FHWA/IN/JTRP-2017/07), Purdue University, 2017, pp. 1–70, https://doi.org/10.5703/1288284316507.
[54] Z. Tong, J. Gao, A.M. Sha, L.Q. Hu, S. Li, Convolutional neural network for asphalt pavement surface texture analysis, Comp. Aid. Civil Infrastruct. Eng. 33 (12) (2018) 1056–1072, https://doi.org/10.1111/mice.12406.
[55] M. Kouzehgar, Y.K. Tamilselvam, M.V. Heredia, M.R. Elara, Self-reconfigurable
k-means algorithm for detection of pavement surface cracks with shadow, facade-cleaning robot equipped with deep-learning-based crack detection based on
J. Comput. Civ. Eng. 34 (1) (2020), 04019049, https://doi.org/10.1061/(ASCE) convolutional neural networks, Autom. Constr. 108 (2019) 102959, https://doi.
CP.1943-5487.0000869. org/10.1016/j.autcon.2019.102959.
[31] I. Abdel-Qader, S. Pashaie-Rad, O. Abudayyeh, PCA-based algorithm for [56] J. Lee, H.S. Kim, N. Kim, E.M. Ryu, J.W. Kang, Learning to detect cracks on
unsupervised bridge crack detection, Adv. Eng. Softw. 37 (12) (2006) 771–778, damaged concrete surfaces using two-branched convolutional neural network,
https://doi.org/10.1016/j.advengsoft.2006.06.002. Sensors 19 (21) (2019) 4796, https://doi.org/10.3390/s19214796.
[32] L. Qiu, F. Fang, S.F. Yuan, C. Boller, Y.Q. Ren, An enhanced dynamic Gaussian [57] S. Park, S. Bang, H. Kim, H. Kim, Patch-based crack detection in black box images
mixture model-based damage monitoring method of aircraft structures under using convolutional neural networks, J. Comput. Civ. Eng. 33 (3) (2019),
environmental and operational conditions, Struct. Health Monitor. Int. J. 18 (2) 04019017, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000831.
(2019) 524–545, https://doi.org/10.1177/1475921718759344. [58] C. Szegedy, W. Liu, Y.Q. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
[33] S. Mathavan, M. Rahman, K. Kamal, Use of a self-organizing map for crack V. Vanhoucke, A. Rabinovich, Going Deeper with Convolutions, IEEE Conference
detection in highly textured pavement images, J. Infrastruct. Syst. 21 (3) (2015), on Computer Vision and Pattern Recognition (CVPR), IEEE, Boston, MA, 2015,
04014052, https://doi.org/10.1061/(ASCE)IS.1943-555X.0000237. pp. 1–9, https://doi.org/10.1109/cvpr.2015.7298594.
[34] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444, [59] K.M. He, X.Y. Zhang, S.Q. Ren, J. Sun, Deep Residual Learning for Image
https://doi.org/10.1038/nature14539. Recognition, IEEE Conference on Computer Vision and Pattern Recognition
[35] B. Kim, S. Cho, Image-based concrete crack assessment using mask and region- (CVPR), IEEE, Seattle, WA, 2016, pp. 770–778, https://doi.org/10.1109/
based convolutional neural, Struct. Control. Health Monit. 26 (8) (2019), e2381, CVPR.2016.90.
https://doi.org/10.1002/stc.2381. [60] C. Szegedy, V. Vanhoucke, S. Loffe, J. Shlens, Z. Wojna, Rethinking the Inception
[36] Y. Xu, Y.Q. Bao, J.H. Chen, W.M. Zuo, H. Li, Surface fatigue crack identification in Architecture for Computer Vision, IEEE Conference on Computer Vision and
steel box girder of bridges by a deep fusion convolutional neural network based on Pattern Recognition (CVPR), IEEE, Seattle, WA, 2016, pp. 2818–2826, https://doi.
consumer-grade camera images, Struct. Health Monitor. Int. J. 18 (3) (2019) org/10.1109/CVPR.2016.308.
653–674, https://doi.org/10.1177/1475921718764873. [61] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale
[37] V. Mandal, L. Uong, Y. Adu-Gyamfi, Automated Road Crack Detection Using Deep Image Recognition, arXiv preprint arXiv: 1409.1556 277, 2014, https://arxiv.org/
Convolutional Neural Networks, IEEE International Conference on Big Data (Big abs/1409.1556.
Data), IEEE, Seattle, WA, 2018, pp. 5212–5215, https://doi.org/10.1109/ [62] S. Qiu, W.J. Wang, S.F. Wang, K.C.P. Wang, Methodology for accurate AASHTO
BigData.2018.8622327. PP67-10–based cracking quantification using 1-mm 3D pavement images,
[38] Y.J. Cha, W. Choi, G. Suh, Autonomous structural visual inspection using region- J. Comput. Civ. Eng. 31 (2) (2017), 04016056, https://doi.org/10.1061/(ASCE)
based deep learning for detecting multiple damage types, Comp. Aid. Civil CP.1943-5487.0000627.
Infrastruct. Eng. 33 (9) (2018) 731–747, https://doi.org/10.1111/mice.12334. [63] National Railway Administration of the People’’s Republic of China, Rules for the
[39] W. Choi, Y.J. Cha, SDDNet: real-time crack segmentation, IEEE Trans. Ind. Maintenance of High-Speed Railway Ballastless Track Lines (trial) (Rail Transport
Electron. 67 (9) (2020) 8016–8025, https://doi.org/10.1109/TIE.2019.2945265. [2012] No. 83), China Railway Press, 2012. http://www.nra.gov.cn/jgzf/flfg/gfxw
[40] L. Zhang, F. Yang, Y.D. Zhang, Road Crack Detection Using Deep Convolutional j/zt/other/201803/t20180321_54084.shtml ((Last accessed: 2020-01-03) (in
Neural Network, 23rd IEEE International Conference on Image Processing (ICIP), Chinese)).
IEEE, Phoenix, AZ, 2016, pp. 3708–3712, https://doi.org/10.1109/ [64] National Railway Administration of the People’’s Republic of China, Code for
icip.2016.7533052. Durability Design on Concrete Structure of Railway (TB10005–2010), China
[41] K. Makantasis, E. Protopapadakis, A. Doulamis, N. Doulamis, C. Loupos, Deep Railway Press, 2010. http://www.cssn.net.cn/cssn/front/gbdetail.jsp?A001=N
Convolutional Neural Networks for Efficient Vision Based Tunnel Inspection, 11th jg1MTM5Mw ((Last accessed: 2020-01-03) (in Chinese)).
IEEE International Conference on Intelligent Computer Communication and [65] China Building Science Academy, Code for Design of Concrete Structures
Processing (ICCP), IEEE, Cluj Napoca, Romania, 2015, pp. 335–342, https://doi. (GB50010–2010), China Building Industry Press, 2010. http://www.cssn.net.
org/10.1109/ICCP.2015.7312681. cn/cssn/front/gbdetail.jsp?A001=NjE1NDI0Mw ((Last accessed: 2020-01-03) in
[42] E. Protopapadakis, N. Doulamis, Image based approaches for tunnels’ defects Chinese).
recognition via robotic inspectors, 11th international symposium on visual [66] National Railway Administration of the People’’s Republic of China, Standard for
computing (ISVC), Adv. Visual Comput. PT I 9474 (2015) 706–716, https://doi. Acceptance of Track Works in High-Speed Railway (TB10754–2018), China
org/10.1007/978-3-319-27857-5_63. Railway Press, 2018. http://www.cssn.net.cn/cssn/front/gbdetail.jsp?A001=N
[43] N.N. Wang, Q.G. Zhao, S.Y. Li, X.F. Zhao, P. Zhao, Damage classification for zk1OTg1MA ((Last accessed: 2020-01-03) in Chinese).
masonry historic structures using convolutional neural networks based on still [67] J.A. Stark, Adaptive image contrast enhancement using generalizations of
images, Comp. Aid. Civil Infrastruct. Eng. 33 (12) (2018) 1073–1089, https://doi. histogram equalization, IEEE Trans. Image Process. 9 (5) (2000) 889–896, https://
org/10.1111/mice.12411. doi.org/10.1109/83.841534.
[44] N.D. Hoang, Q.L. Nguyen, V.D. Tran, Automatic recognition of asphalt pavement [68] C. Szegedy, S. Loffe, V. Vanhoucke, A.A. Alemi, Inception-v4, Inception-ResNet and
cracks using metaheuristic optimized edge detection algorithms and convolution the impact of residual connections on learning, in: 31st AAAI Conference on
neural network, Autom. Constr. 94 (2018) 203–213, https://doi.org/10.1016/j. Artificial Intelligence, AAAI, San Francisco, CA, 2017, pp. 4278–4284.
autcon.2018.07.008. https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewPaper/14806
[45] K.G. Zhang, H.D. Cheng, B.Y. Zhang, Unified approach to pavement crack and (Last accessed: 2020-01-03).
sealed crack detection using preclassification based on transfer learning, [69] K. Gopalakrishnan, S.K. Khaitan, A. Choudhary, A. Agrawal, Deep convolutional
J. Comput. Civ. Eng. 32 (2) (2018), 04018001, https://doi.org/10.1061/(ASCE) neural networks with transfer learning for computer vision-based data-driven
CP.1943-5487.0000736. pavement distress detection, Constr. Build. Mater. 157 (2017) 322–330, https://
[46] B. Kim, S. Cho, Automated vision-based detection of cracks on concrete surfaces doi.org/10.1016/j.conbuildmat.2017.09.110.
using a deep learning technique, Sensors 18 (10) (2018) 3452, https://doi.org/ [70] I. Valavanis, D. Kosmopoulos, Multiclass defect detection and classification in weld
10.3390/s18103452. radiographic images using geometric and texture features, Expert Syst. Appl. 37
[47] H.W. Huang, Q.T. Li, D.M. Zhang, Deep learning based image recognition for crack (12) (2010) 7606–7614, https://doi.org/10.1016/j.eswa.2010.04.082.
and leakage defects of metro shield tunnel, Tunn. Undergr. Space Technol. 77 [71] A. Zhang, K.C.P. Wang, B.X. Li, et al., Automated pixel-level pavement crack
(2018) 166–176, https://doi.org/10.1016/j.tust.2018.04.002. detection on 3D asphalt surfaces using a deep-learning network, Comp. Aid. Civil
[48] F.C. Chen, M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using Infrastruct. Eng. 32 (10) (2017) 805–819, https://doi.org/10.1016/10.1111/
convolutional neural network and naive Bayes data fusion, IEEE Trans. Ind. mice.12297.
Electron. 65 (5) (2018) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.

16
W. Wang et al. Automation in Construction 124 (2021) 103484

[72] X.X. Zhang, D. Rajan, B. Story, Concrete crack detection using context-aware deep J. Comput. Civ. Eng. 32 (5) (2018), 04018041, https://doi.org/10.1061/(ASCE)
semantic segmentation network, Comp. Aid. Civil Infrastruct. Eng. 34 (11) (2019) CP.1943-5487.0000775.
951–971, https://doi.org/10.1111/mice.12477. [75] B. Kim, S. Cho, Automated vision-based detection of cracks on concrete surfaces
[73] A. Zhang, K.C.P. Wang, Y. Fei, et al., Automated pixel-level pavement crack using a deep learning technique, Sensors 18 (10) (2018) 3452, https://doi.org/
detection on 3D asphalt surfaces with a recurrent neural network, Comp. Aid. Civil 10.3390/s18103452.
Infrastruct. Eng. 34 (3) (2019) 213–229, https://doi.org/10.1111/mice.12409. [76] Y. Fei, K.C.P. Wang, A. Zhang, Pixel-level cracking detection on 3D asphalt
[74] A. Zhang, K.C.P. Wang, Y. Fei, et al., Deep learning-based fully automated pavement images through deep-learning- based CrackNet-V, IEEE Trans. Intell.
pavement crack detection on 3D asphalt surfaces with an improved CrackNet, Transp. Syst. 21 (1) (2020) 273–284, https://doi.org/10.1109/
TITS.2019.2891167.

17

You might also like