Sensors

Article
Enhanced Intelligent Identification of Concrete
Cracks Using Multi-Layered Image
Preprocessing-Aided Convolutional Neural Networks
Ronghua Fu 1 , Hao Xu 2,3 , Zijian Wang 4 , Lei Shen 1 , Maosen Cao 1,5, *, Tongwei Liu 1 and
Drahomír Novák 6
1 Department of Engineering Mechanics, Hohai University, Nanjing 210098, China;
ronghua@hhu.edu.cn (R.F.); leishen@hhu.edu.cn (L.S.); twliu@hhu.edu.cn (T.L.)
2 School of Aeronautics and Astronautics, Faculty of Vehicle Engineering and Mechanics, State Key
Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian 116024,
China; xuhao@dlut.edu.cn
3 Department of Civil and Environmental Engineering, Northwestern University, Chicago, IL 60626, USA
4 Key Laboratory of C&PC Structures, Ministry of Education, Southeast University, Nanjing 211189, China;
wzj@seu.edu.cn
5 Jiangxi Provincial Key Laboratory of Environmental Geotechnical Engineering and Disaster Control,
Jiangxi University of Science and Technology, Ganzhou 341000, China
6 Institute of Structural Mechanics, Faculty of Civil Engineering, Brno University of Technology, 60200 Brno,
Czech Republic; novak.d@fce.vutbr.cz
* Correspondence: cmszhy@hhu.edu.cn

Received: 5 March 2020; Accepted: 30 March 2020; Published: 3 April 2020
Abstract: Crack identification plays an essential role in the health diagnosis of various concrete
structures. Among different intelligent algorithms, convolutional neural networks (CNNs)
have been demonstrated to be a promising tool capable of efficiently identifying the existence and
evolution of concrete cracks by adaptively recognizing crack features from a large amount of concrete
surface images. However, the accuracy as well as the versatility of conventional CNNs in crack
identification is largely limited, due to the influence of noise contained in the background of the
concrete surface images. The noise originates from highly diverse sources, such as light spots, blurs,
and surface roughness, wear, and stains. With the aim of enhancing the accuracy, noise immunity, and versatility
of CNN-based crack identification methods, a framework of enhanced intelligent identification of
concrete cracks is established in this study, based on a hybrid utilization of conventional CNNs with
a multi-layered image preprocessing strategy (MLP), of which the key components are homomorphic
filtering and the Otsu thresholding method. Relying on the comparison and fine-tuning of classic CNN
structures, networks for detection of crack position and identification of crack type are built, trained,
and tested, based on a dataset composed of a large number of concrete crack images. The effectiveness
and efficiency of the proposed framework involving the MLP and the CNN in crack identification are
examined by comparative studies, with and without the implementation of the MLP strategy. Crack
identification accuracy subject to different sources and levels of noise influence is investigated.
Keywords: concrete crack identification; convolutional neural network; homomorphic filtering;
structural health monitoring; signal processing
1. Introduction
Concrete structures in various forms, such as high-rise buildings, bridges and dams, suffer
from continuous health deterioration during long service periods [1,2], making real-time detection
Sensors 2020, 20, 2021; doi:10.3390/s20072021 www.mdpi.com/journal/sensors

of different types of structural damage a crucial demand [3]. Traditional visual inspection methods
relying on human labor for damage diagnosis entail unavoidable limitations such as high dependence
on individual subjectivity, expertise, and an extensive amount of labor [4]. In past decades, image
processing techniques demonstrated great advantages in aspects of efficiency, accuracy, objectivity [5],
etc., and were widely adopted to extract damage features, where crack recognition from concrete surface
images formed a particularly active area. Some representative methods for crack identification include
the threshold segmentation method, edge detection method, and artificial intelligence (AI) method.
A threshold segmentation algorithm proposed by Otsu [6] has been widely used in various image
recognition tasks [7], and modifications of that algorithm have been made for crack identification [8–12].
For instance, considering noise influence as the major obstacle for accurate image recognition,
Migdal et al. [13] conducted noise suppression with the assistance of Markov random fields of binary
segmentation, taking into account neighboring pixel classifications. While the process produced a
prominent noise reduction effect, the features of crack and noise become difficult to distinguish with an
increase in the extent of noise interference. Image edge detection was also adopted for crack recognition,
with representative applications established on Roberts, Prewitt and Sobel operators [14]. However,
the edge detection algorithm caused a typical ill-posed problem in that it was highly sensitive to noise
influence, especially due to light and distortion, and optimal solutions were difficult to obtain [15].
Modifications have been made to address the above issues. For example, noise removal algorithms of
nonlinear total variation proved effective in noise reduction and thus were able to enhance the accuracy
of edge detection [16]; Cha et al. [17] used Canny edge detection with a classifier of linear support vector
machine to distinguish loosened bolts from intact ones; Ayenu-Prah et al. [18] employed the Sobel
edge detection algorithm using bidimensional empirical mode decomposition for crack identification
and noise suppression; Acharya et al. [19] reduced noise using gradient-based edge detection based on
the difference between edge and non-edge pixels, where images in a color filter array were treated;
Yan et al. [20] developed an edge detection algorithm based on morphological filtering set to overcome
the interference of noise; Zhou et al. [21] adopted an improved edge detection algorithm based on a
wavelet transform to detect pavement distress, showing an advantage in quantifying the degree of
damage. However, the accuracy of the aforementioned algorithms is still considered limited in crack
identification when taking into account severe noise influences that are common in actual damage
detection scenarios.
With rapid development in recent years, AI methods have been increasingly applied in structural
damage detection [22], where crack recognition based on convolutional neural networks (CNNs) has
shown promise in engineering applications [23]. Compared to conventional machine learning methods,
CNNs are particularly powerful in learning the characteristics of images using a simpler network
structure [24]. Leveraging this merit, CNN-based methods can identify cracks with high efficiency,
especially when dealing with multi-classification [25] and large-scale problems [26]. Moreover, relying
on transfer learning, existing CNN structures can easily be modified and well utilized to solve
similar types of problems [27], enhancing the adaptability of CNNs for treating different crack images.
Chen et al. [28] proposed a method based on CNNs and naïve Bayes data fusion to detect cracks in a
single frame of video; Browne et al. [29] suggested the implementation of separable filters to detect
sewer cracks based on a CNN model. More applications can be found in the studies of Cha [30],
Zhao [31], Zhang [32], and Gavilán et al. [33], where the key concepts of crack detection were analogous.
That is, the image was first divided into a number of sub-regions, from which crack features were
extracted to form the feature vectors. The CNN was trained based on these feature vectors and then
used to identify cracks in each sub-region of the full image. Finally, the crack in the full image was
recognized by combining the identification results of each sub-region. While possessing inherent noise
immunity, CNNs still face challenges in accurate identification of concrete cracks [34]. That is because
the image background noise exhibits a large number of distinct features, which increase the difficulty
of crack feature extraction and largely jeopardize the accuracy, efficiency, and versatility of CNNs.
Unfortunately, as commonly encountered in actual practice, severe background noise originating in
concrete surfaces from complex and diverse sources can hardly be avoided.
In this paper, crack identification from images of concrete surfaces was carried out based on the
hybrid utilization of CNNs and a multi-layered image preprocessing (MLP) strategy, defined as the
MLP–CNN framework, the central components of which are homomorphic filtering and the Otsu
thresholding method. Apart from clear improvement in detection accuracy and noise immunity, the
MLP–CNN framework is also able to strengthen the versatility of crack identification, meaning that
different types and levels of background noise can be treated using a uniform framework. Specifically,
both crack position detection (CPD) and crack type identification (CTI) networks were built, trained
and tested, based on datasets composed of a number of concrete crack images. The effectiveness and
efficiency of the developed framework were examined by comparing crack detection results with and
without the implementation of the MLP strategy. Crack identification accuracy subject to different
sources and levels of noise influence was investigated.
2. MLP–CNN Framework
The architecture of the proposed MLP–CNN framework is shown in Figure 1. The multi-layered
preprocessing structure includes a combination of filtering and feature extraction techniques applied
in sequence to fulfill different functions. The first layer, one of the key components in the MLP,
is homomorphic filtering [35], used to process concrete surface images in the frequency domain.
For image processing, homomorphic filtering is able to suppress low-frequency components, such as
those associated with lighting variations, while highlighting high-frequency components associated
with local details such as crack edges. Considering that frequency-domain filtering can cause oscillation
of the grayscale in the output image, known as the ringing effect [36], the selection of filter types in
homomorphic filtering has a direct impact on the denoising effect. Three high-pass filters commonly
used for homomorphic filtering are the Gaussian filter, Butterworth filter, and ideal high-pass filter.
As can be seen in Figure 2b, Gaussian filtering produces the optimal denoising effect, by showing
the significant crack feature with minimal ringing effect. In Figure 2c,d, the crack features are largely
submerged in the backgrounds. Figure 2c, treated by the Butterworth filter, is severely contaminated by
high-frequency noise likely associated with the ringing effect; Figure 2d, treated by the ideal high-pass
filter, shows too weak contrast in the colors of the crack and background. These unsatisfactory results
are due to the filter characteristics, such as the dramatic changes of gradient in the filter functions.
Thus, the Gaussian filter was adopted in this study for homomorphic filtering.
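The homomorphic filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the textbook Gaussian high-emphasis transfer function H(u, v) = (gamma_h - gamma_l)[1 - exp(-c D^2(u, v) / d0^2)] + gamma_l, and the function names and the parameter values (d0, gamma_l, gamma_h, c) are hypothetical choices that the paper does not specify.

```python
import numpy as np


def gaussian_highpass(shape, d0=30.0, gamma_l=0.5, gamma_h=2.0, c=1.0):
    """Gaussian high-emphasis transfer function, centred on the DC term.

    All default parameter values are illustrative assumptions.
    """
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    d2 = u[:, None] ** 2 + v[None, :] ** 2  # squared distance from the centre
    return (gamma_h - gamma_l) * (1.0 - np.exp(-c * d2 / d0 ** 2)) + gamma_l


def homomorphic_filter(image, **kwargs):
    """Apply log -> FFT -> high-emphasis filter -> inverse FFT -> exp."""
    img = np.asarray(image, dtype=np.float64)
    log_img = np.log1p(img)  # log(1 + I) avoids log(0)
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))  # move DC to the centre
    h = gaussian_highpass(img.shape, **kwargs)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * h)).real
    return np.expm1(filtered)  # undo the log1p
```

With gamma_l below 1 and gamma_h above 1, slowly varying illumination is attenuated while rapid intensity changes such as crack edges are emphasized. The smooth Gaussian roll-off avoids the sharp cutoff that is associated with the ringing effect.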
Figure 1. Diagram of the overall architecture of the MLP–CNN framework.

Figure 2. (a) Original image containing a crack, and the crack feature extraction results treated by:
(b) Gaussian filter; (c) Butterworth filter; (d) ideal high-pass filter for homomorphic filtering; and (e) the
Otsu thresholding method.
The other key layer of the MLP structure is the Otsu thresholding algorithm [6] used to identify
the maximum value of gray in the image. Image binarization is carried out in accordance with the
adaptive selection of thresholds. As can be seen in Figure 2e, the Otsu thresholding method, which
is applied after the homomorphic filtering, transforms the crack images into binarized images with
prominent crack features highlighted.
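The adaptive threshold selection can be illustrated with a short sketch. It assumes the standard formulation of Otsu's method, in which the threshold is the gray level that maximizes the between-class variance of the histogram; the function names, the flat pixel list, and the binarization polarity are assumptions made for illustration only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the lower class, pixels > t the upper class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty, nothing left to separate
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map pixels above the threshold to 1 and the rest to 0 (assumed polarity)."""
    return [1 if p > t else 0 for p in pixels]
```

Because the threshold is derived from each image's own histogram, no manual tuning per image is needed, which is what makes the selection adaptive.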
The causes of background noise in the images can be complex, due to unexpected situations.
Moreover, additional noise may be generated during the filtering process to distort the image. Thus,
to obtain the optimal de-noising effect, two additional layers, median filtering and a Gaussian filter,
were inserted between the homomorphic filtering layer and the Otsu thresholding layer. Median
filtering is a nonlinear signal-processing technique based on the theory of order statistics. With good
noise suppression effect, median filtering is particularly effective in maintaining crack edge
information, due to its demonstrated merit in image edge protection [37]. The Gaussian filter, with
the smooth feature of the Gaussian function, is applied next to process the image, reducing the
accumulation of errors during filtering. By using the MLP structure, concrete surface images
containing different noise types and levels can be rapidly transformed into binary images with
significant crack features, and thus the generalization abilities of the CNNs applied subsequently are
maximized.
In the following study, the construction and application of the CNNs are introduced in Section 3,
where CPD and CTI networks were built to identify the position and type of cracks, respectively.
The constructions of these two CNNs are conducted in a similar way to that of the CNN-based
methods introduced in the literature. In Section 4, comparative studies are presented, showing crack
identification results achieved with and without the implementation of the MLP.
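The edge-preserving behavior of the median filtering layer described in this section can be illustrated with a small sketch. The 3 × 3 window, the border handling, and the function name are assumptions for illustration; the paper does not state the window size used in the MLP.

```python
def median_filter3(img):
    """3x3 median filter on a list-of-lists grayscale image.

    Border pixels use only the neighbours that exist; for an even-sized
    window the upper median is taken so that outputs stay integers.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = sorted(
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            )
            out[r][c] = window[len(window) // 2]
    return out


# A dark region (0) next to a bright region (100), plus one bright speck.
clean = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][4] = 255  # isolated impulse noise

restored = median_filter3(noisy)
```

In this example the isolated speck is removed while the boundary between the dark and bright regions stays in place, which a linear averaging filter would blur.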
3. Concrete Crack Identification Based on CNN
3.1. Overall Structure
A general CNN structure is shown in Figure 3. The structure is mainly composed of three layers:
the input layer, the feature extraction layer and the final layer, where the sub-layers in the feature
extraction layer (i.e., L2, L3, L4, and L5) can be repeated by considering practical demands. For specific
crack detection aims, two network types with analogous structures were constructed, defined as crack
position detection (CPD) and crack type identification (CTI) networks, respectively.
Figure 3. Overall structure of CNN.

3.1.1. Feature Extraction Layer
The feature extraction layer consists of multiple convolutional layers (CONVs) and pooling layers.
The CONVs (in L2 as shown in Figure 3) feature sparse connections between the convolution kernels
and the pixel matrices of the input images, enabling efficient network training that benefits from the
relatively small number of network parameters [38]. Besides, the memory usage for computation can
be reduced due to the weight sharing of the CONVs [39]. Initial weights and biases of the convolution
kernel in L2 were generated randomly. Activation functions (L3) were used to introduce nonlinear
characteristics into the network [40]. Examples of nonlinear functions are shown in Figure 4. As can be
seen, the sigmoid function ranging between 0 and 1 shows a small gradient that can result in a slow
convergence speed and the problem of the gradient vanishing during CNN training. The hyperbolic
tangent (tanh) ranging between -1 and 1 shows a larger gradient than that of the sigmoid and thus is
easier for optimization. The rectified linear unit (ReLU), in the simple mathematical form as shown in
Figure 4, has the largest gradient, which can increase the training efficiency and accuracy of the CNN [41].
Therefore, ReLU was selected as the activation function in the following study, located after the CONVs
shown as L3 in Figure 3.
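The gradient comparison above can be checked numerically with a short sketch. The function definitions follow the formulas listed with Figure 4; the derivative helpers and their names are added here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    """Derivative of the sigmoid; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x):
    """Derivative of tanh; its maximum is 1 at x = 0."""
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return max(0.0, x)


def d_relu(x):
    """Derivative of ReLU; constant 1 for every positive input."""
    return 1.0 if x > 0 else 0.0


for x in (0.5, 1.0, 5.0):
    print(x, d_sigmoid(x), d_tanh(x), d_relu(x))
```

Near the origin the tanh gradient exceeds the sigmoid gradient, and for large positive inputs both decay toward zero while the ReLU gradient stays at 1, which is the property that mitigates vanishing gradients during training.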
[Plot: y versus x for -3 ≤ x ≤ 3, showing ReLU, y = max(0, x); sigmoid, y = (1 + e^(-x))^(-1); and tanh, y = tanh(x)]
Figure 4. Examples of commonly used activation functions for neural networks.

A normalization layer (shown as L4 in Figure 3) is often included in a CNN to enhance the
generalization ability of the network by creating a competitive mechanism for local neuron activity [42].
The present crack recognition network adopted batch normalization (BN), the advantage of which was
demonstrated in [43], as the normalization layer to alleviate the gradient vanishing problem in network
training.
The pooling layer, shown as L5 in Figure 3, was constructed to perform aggregate statistics, based on
the location features of different images, to further reduce data complexity as well as the probability of
overfitting. The MaxPooling method was adopted for crack recognition.
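A minimal sketch of max pooling is given below for illustration. The 2 × 2 window with stride 2 and the function name are assumptions; the pooling window size used in the networks is not stated in this section.

```python
def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a list-of-lists feature map.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool2x2(fm)  # [[4, 2], [2, 8]]
```

Each output value keeps only the strongest response in its window, so the feature map shrinks to a quarter of its size while the dominant local features are retained.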
3.1.2. Final Layer

The final layer normally contains multiple fully connected layers (FCs), activation functions and a
softmax layer. Dropout layers may also be included for improvement of accuracy. Dropout reduces
co-dependence between nodes by randomly resetting parts of the weights or outputs of the fully
connected layer to zero during the CNN training process, whereby the problem of overfitting can be
prevented [44]. As can be seen in Figure 3, the Dropout layer is often between FCs in the final layer.
The output of the softmax layer indicates the probability of an object being classified into a category.
The softmax layer is located before the output layer and is usually adopted for multi-classification
problems (i.e., C > 2). The final output of the network requires utilization of a softmax function to
generate the categories of objects with the highest probability. The softmax function is defined as:
$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}}, \tag{1}$$
where $S_i$ is the value of the softmax function for the $i$th element; $V_i$ is the $i$th element in the
set $V$ to be classified; and $C$ is the total number of categories. Softmax is often located at the end of
the final layer and is connected to the output layer, representing the relative probability of different
categories [45].
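Equation (1) can be evaluated with the following sketch. Subtracting the maximum score before exponentiation is a common numerical-stability step that is added here as an assumption; it does not change the result of Equation (1).

```python
import math


def softmax(v):
    """Softmax of a list of scores, following Equation (1)."""
    m = max(v)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical network outputs for C = 3 categories
probs = softmax(scores)
predicted = probs.index(max(probs))  # category with the highest probability
```

The outputs are positive and sum to one, so they can be read as the relative probabilities of the categories, and the predicted category is the index of the largest value.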
3.1.3. Training Algorithms

During the training process, forward propagation produced a loss, defined as the difference between
the network output and the true values, and the CNN parameters were optimized by evaluating this loss.
Specifically, stochastic gradient descent was used to minimize the cost function [46] and obtain optimal
weights and biases. The cost function of the ith CNN layer is:
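$$J^{(i)}(\theta_0, \theta_1) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2, \tag{2}$$
the partial derivative of which is:
$$\frac{\partial J^{(i)}(\theta_0, \theta_1)}{\partial \theta_j} = \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}, \tag{3}$$
and the parameter updating formula is written as:
$$\theta_j := \theta_j - \alpha \frac{\partial J^{(i)}(\theta_0, \theta_1)}{\partial \theta_j}. \tag{4}$$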
In Equations (2)–(4), $i$ represents the CNN neuron index; $j$ represents the CNN layer index and $\alpha$
is the learning rate. Note that each iteration of the gradient descent algorithm is affected by $\alpha$. $\theta_0$ and
$\theta_1$ are the weight and bias of the CNN neurons to be updated; $h_\theta(x^{(i)})$ is the predicted value of the
CNN output, and $y^{(i)}$ is the true value.
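The update rule in Equations (2)–(4) can be illustrated on a toy problem. The sketch below assumes a single linear unit, h(x) = theta0 * x + theta1, with theta0 as the weight and theta1 as the bias; this hypothesis, the sample data, and the learning rate are illustrative assumptions and do not reproduce the CNN training in the paper.

```python
def predict(theta, x):
    theta0, theta1 = theta
    return theta0 * x + theta1  # weight * input + bias (assumed linear unit)


def cost(theta, x, y):
    """Squared-error cost of one sample, as in Equation (2)."""
    return 0.5 * (predict(theta, x) - y) ** 2


def sgd_step(theta, x, y, alpha):
    """One stochastic gradient descent update, as in Equations (3) and (4)."""
    err = predict(theta, x) - y
    grad = (err * x, err * 1.0)  # the bias sees a constant input of 1
    return tuple(t - alpha * g for t, g in zip(theta, grad))


# Toy samples drawn from y = 2x + 1 (hypothetical data).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
theta = (0.0, 0.0)
for _ in range(2000):
    for x, y in samples:
        theta = sgd_step(theta, x, y, alpha=0.05)
```

Each update moves the parameters against the gradient of a single sample's cost, scaled by the learning rate; a rate that is too large makes the iteration diverge, while one that is too small slows convergence.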
3.2. Construction of the CPD Network
3.2.1. Preparation of Data Sets

The images for crack detection were live pictures of different concrete surfaces, each containing
1024 × 1024 pixels. The full-size images were divided into small sub-regions, defined as regions of
interest (ROIs), with the uniform resolution of 32 × 32. ROIs containing cracks are called crack ROIs
and all the others are called background ROIs, as shown in Figure 5. The training set for the CPD
network included 1500 crack ROIs and 5000 background ROIs, whereas the test set included 542 crack
ROIs and 3680 background ROIs.
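The division of a full-size image into ROIs can be sketched as follows. Non-overlapping tiling and the function name are assumptions for illustration; the labeling of each ROI as crack or background is not shown.

```python
def split_into_rois(image, roi=32):
    """Split a list-of-lists image into non-overlapping roi x roi tiles.

    Returns a dict mapping (tile_row, tile_col) to the tile. Rows or
    columns that do not fill a complete tile are ignored.
    """
    rows, cols = len(image), len(image[0])
    rois = {}
    for r in range(0, rows - roi + 1, roi):
        for c in range(0, cols - roi + 1, roi):
            rois[(r // roi, c // roi)] = [row[c:c + roi] for row in image[r:r + roi]]
    return rois


image = [[0] * 1024 for _ in range(1024)]  # placeholder for a 1024 x 1024 image
rois = split_into_rois(image)
print(len(rois))  # (1024 / 32) ** 2 = 1024 ROIs per image
```

Keeping the tile indices makes it possible to map each ROI classified as a crack back to its position in the full image.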
Figure 5. Diagram illustrating the method for preparing datasets for the CPD network, which consists
of crack and background regions of interest (ROIs).
3.2.2. Structure of the CPD Network

Because the CPD network was designed to deal with two-classification problems, a simple
network structure was preferred to meet the requirement of training efficiency without an overfitting
problem. Three classical CNN structures for two-dimensional image recognition, Alexnet, Lenet5
and CIFAR10 [47], were tested for accuracy of identification of ROIs along with the training process, as
shown in Figure 6. Table 1 presents the ultimate accuracy and total time of training. From observation
of Figure 6 and Table 1, Lenet5 shows the lowest recognition accuracy for crack and background
identification, whereas Alexnet shows the highest accuracy. However, the training of Alexnet is very
time-consuming because of its complex structure. Therefore, the CIFAR10 network can be considered
a favorable option for establishing a CPD network, due to its balance of efficiency and accuracy. More
importantly, with its simple structure, the CIFAR10 network can be easily adjusted based on finely
tuned parameters to achieve improvement of accuracy. Nevertheless, Alexnet was used to construct
the CTI network, as will be introduced.
Figure 6. Accuracy variations during the training process using Alexnet, Lenet5, and CIFAR10, respectively (vertical axis: accuracy (%); horizontal axis: iteration).
Table 1. Ultimate accuracy and training time of ROI recognition by Alexnet, Lenet5, and CIFAR10, respectively.
CNN       Test accuracy (%)   Time
Alexnet   98.6                140 min 7 s
Lenet5    45.2                1 min 25 s
CIFAR10   94.6                8 min 12 s
Three adjusted structures based on CIFAR10, referred to as Cracknet1-1, 1-2 and 1-3, were tested for comparison. Cracknet1-1 has a feature extraction layer including three CONVs, three pooling layers, and three ReLU activation functions, without normalization and dropout layers. The learning rate (i.e., α in Equation (4)) of Cracknet1-1 is marked as LR1 in Figure 7, with the initial value of 0.001, which is multiplied by a scale factor of 0.1 every 8 epochs. Cracknets1-2 and 1-3 are similar to 1-1, except that the former includes a dropout layer, whereas the latter adopts the constant learning rate of LR2 (=0.001) as shown in Figure 7. The ultimate accuracies of Cracknets1-1, 1-2, and 1-3 were 94.6%, 95.8% and 94.4%, respectively. It can be seen that, with the addition of a dropout layer, the accuracy of Cracknet1-2 increased about 1% compared with that of Cracknet1-1. On the other hand, the learning rate variation had little impact on the accuracy of the CPD network. However, a CNN is sometimes difficult to converge, which requires the learning rate to decrease along with the training epoch to achieve better convergence. Thus, LR1 was the preferred option in this study. Therefore, by considering the overall performance in both accuracy and learning rate, Cracknet1-2 was selected as the CPD network, with detailed information as shown in Table 2. Apart from the extraction layer as already introduced, the final layer consisted of two FCs, one dropout layer, one ReLU activation function and one softmax layer connected to the output layer. There were 40 epochs, with 96 iterations per epoch, for the training of the network.
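The LR1 schedule described above (initial value 0.001, multiplied by 0.1 every 8 epochs) can be sketched as a step decay. This is a minimal illustration only: the function name `step_decay_lr`, its parameter names, and the zero-based epoch indexing are assumptions and are not taken from the paper.

```python
def step_decay_lr(epoch, initial_lr=0.001, factor=0.1, drop_every=8):
    """Step-decay learning rate (illustrative sketch of LR1).

    epoch is assumed to be zero-based; the rate is multiplied by
    `factor` once every `drop_every` epochs.
    """
    return initial_lr * factor ** (epoch // drop_every)


def constant_lr(epoch, lr=0.001):
    """Constant learning rate (illustrative sketch of LR2)."""
    return lr


if __name__ == "__main__":
    # Print both schedules at the start of each 8-epoch block over 40 epochs.
    for epoch in range(0, 40, 8):
        print(epoch, step_decay_lr(epoch), constant_lr(epoch))
```

Under these assumptions, LR1 falls by five orders of magnitude in steps over the 40 training epochs, while LR2 stays at 0.001 throughout.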
Figure 7. Two learning rate curves during the network training period (curves: LR1 and LR2; vertical axis: value of learning rate; horizontal axis: epoch of CNN).
Table 2. Specific structure of the CPD network (where h, w, and d are height, width, and depth, respectively; Conv is convolution layer; MP is MaxPooling layer; FC is fully connected layer; D is dropout layer; S is softmax layer; NOI is number of input; NOO is number of output).
Layer      L1        L2       L3     L4      L5       L6     L7      L8
Operator   Input     Conv     ReLU   MP      Conv     ReLU   MP      Conv
h/w/d      32/32/3   5/5/32   -      3/3/-   5/5/32   -      3/3/-   5/5/64
Stride     -         1        -      2       1        -      2       1

Layer      L9     L10
Operator   ReLU   MP
h/w/d      -      3/3/-
Stride     -      2

Layer      L11   L12    L13   L14   L15   L16
Operator   FC    ReLU   D     FC    S     Output
NOI        576   -      -     256   -     2
NOO        256   -      -     2     -     2
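As a consistency check on Table 2, the number of inputs to the first fully connected layer (576) can be reproduced from the layer sizes. The sketch below assumes that each 5 × 5 convolution uses a zero padding of 2 (so it preserves the spatial size) and that the 3 × 3 max-pooling layers use no padding; neither padding value is stated in the paper, and the helper names are hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def fc_input_size(input_size=32, conv_depths=(32, 32, 64)):
    """Flattened feature size after three Conv + MaxPooling stages.

    Assumptions (not stated in the source): conv padding = 2,
    pooling padding = 0.
    """
    size = input_size
    depth = None
    for depth in conv_depths:
        size = out_size(size, kernel=5, stride=1, padding=2)  # Conv 5x5, stride 1
        size = out_size(size, kernel=3, stride=2, padding=0)  # MP 3x3, stride 2
    return size * size * depth


if __name__ == "__main__":
    print(fc_input_size())  # spatial size goes 32 -> 15 -> 7 -> 3, depth 64
```

With these assumed paddings the spatial size shrinks from 32 to 15, 7, and finally 3, so the flattened feature vector has 3 × 3 × 64 = 576 elements, in agreement with the NOI of layer L11.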
3.3. Crack Position Detection Using Sliding Window
To detect cracks using the CPD network, the image was scanned by a sliding window as shown in Figure 8. The window size was 32 × 32, which is the same as that of the ROIs. With ROI types (i.e., crack or background) and coordinates as the outputs, crack positions were identified in terms of the coordinates of the crack ROIs.
Figure 8. Schematic diagram of the procedure of crack position detection using a sliding window.
Figure 9 shows the crack position detection results using the CPD network or the Alexnet classifier. The sliding step of the window was set at 32 pixels, which means that no overlap exists between two adjacent windows. It can be seen that the constructed CPD network achieves detection accuracy similar to that of the Alexnet, despite using a simpler structure.

Figure 9. Crack position detection (CPD) results using the (a) constructed CPD and (b) Alexnet network.
The impact of variation of the sliding step on crack recognition accuracy is shown in Figure 10, where a different set of surface images is used. It can be seen that, with the sliding step of 32 pixels, the crack recognition results exhibit several discontinuities, losing important information in the main portions of the cracks. With reduction of the step size to 16 pixels, on the other hand, recognition accuracy is considerably enhanced. This is because, with reduction of the step size, adjacent windows begin to overlap and share some areas of interest [48], which are repeatedly identified by the CPD network, increasing the possibility of correct crack recognition.
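The sliding-window scan can be sketched as follows. This is a minimal illustration, not the authors' implementation: `classify_roi` is a hypothetical stand-in for the trained CPD network and is assumed to return the string "crack" or "background" for a 32 × 32 ROI; partial windows at the image border are simply skipped.

```python
import numpy as np


def sliding_windows(height, width, window=32, step=32):
    """Yield the (row, col) top-left corner of every full window."""
    for row in range(0, height - window + 1, step):
        for col in range(0, width - window + 1, step):
            yield row, col


def detect_crack_positions(image, classify_roi, window=32, step=32):
    """Return the top-left coordinates of all windows labeled as crack.

    classify_roi is a hypothetical callable standing in for the CPD network.
    """
    height, width = image.shape[:2]
    hits = []
    for row, col in sliding_windows(height, width, window, step):
        roi = image[row:row + window, col:col + window]
        if classify_roi(roi) == "crack":
            hits.append((row, col))
    return hits


if __name__ == "__main__":
    # Toy image and toy classifier, for illustration only.
    img = np.zeros((64, 64))
    img[0:32, 0:32] = 1.0
    toy = lambda roi: "crack" if roi.mean() > 0.5 else "background"
    print(detect_crack_positions(img, toy, step=32))
    print(len(list(sliding_windows(64, 64, step=16))))
```

Halving the step from 32 to 16 pixels roughly quadruples the number of windows on a large image, so each region is examined by several overlapping windows at the cost of a proportionally longer scan.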
Figure 10. Crack position detection results subject to (a) 32 and (b) 16 pixels in step size of the sliding window.
3.4. Construction of the CTI Network
3.4.1. Preparation of Data Sets

Unlike the data sets of the CPD network, consisting of crack and background ROIs, the data
sets of the CTI network consisted directly of full images of the concrete surface. To improve training
efficiency, the resolution of the original images was reduced from 1024 × 1024 to 227 × 227. Five crack
types, transverse cracks, vertical cracks, left oblique cracks, right oblique cracks, and mesh cracks, were
defined, with examples shown in Figure 11. Therefore, the CTI was constructed as a five-classification
network without the assistance of a sliding window. The training and test sets, which were under
moderate noise influence, contained 1500 and 577 images, respectively.
Figure 11. Five crack types: (a) transverse cracks; (b) vertical cracks; (c) left oblique cracks; (d) right oblique cracks; (e) mesh cracks.
3.4.2. Structure of CTI Network
Two structures, labeled Cracknet2-1 and Cracknet2-2, were constructed and compared. Cracknet2-1 was based on Lenet5, with the feature extraction layer consisting of two CONVs and two pooling layers. Cracknet2-2 was based on the Alexnet structure and was trained with the learning rate LR1 shown in Figure 7, using 40 epochs with 13 iterations per epoch. The structure of Cracknet2-2 is shown in Table 3. The feature extraction layer includes five CONVs, three pooling layers, five ReLU activation functions, and two batch normalization layers. The final layer consists of three FCs, two ReLU activation functions, two dropout layers, and a softmax layer connected to the output layer. Figure 12 compares the accuracy of Cracknets 2-1 and 2-2 for crack type identification during the training process. The ultimate accuracy levels of Cracknets 2-1 and 2-2 are 34.2% and 99.3%, respectively. The superiority of Cracknet2-2 can easily be seen.
Table 3. Specific structure of the CTI network (BN is batch normalization layer; the meanings of the other abbreviations are consistent with those in Table 2).
Layer      L1          L2         L3     L4   L5      L6        L7     L8
Operator   Input       Conv       ReLU   BN   MP      Conv      ReLU   BN
h/w/d      227/227/3   11/11/96   -      -    3/3/-   5/5/128   -      -
Stride     -           4          -      -    2       1         -      -

Layer      L9      L10       L11    L12       L13    L14       L15    L16
Operator   MP      Conv      ReLU   Conv      ReLU   Conv      ReLU   MP
h/w/d      3/3/-   3/3/384   -      3/3/192   -      3/3/128   -      3/3/-
Stride     2       1         -      1         -      1         -      2

Layer      L17    L18    L19   L20    L21    L22   L23    L24   L25
Operator   FC     ReLU   D     FC     ReLU   D     FC     S     Output
NOI        9216   -      -     4096   -      -     4096   -     5
NOO        4096   -      -     4096   -      -     5      -     5
For comparison, Cracknet2-2 was further tuned to obtain another two structures, Cracknets 2-3
and 2-4, in which 2-3 used a different learning rate and 2-4 deleted the dropout layer. The ultimate
accuracy levels of Cracknets 2-3 and 2-4 were 98.4% and 98.9%, respectively. It can be seen that under
the learning rate of LR1 and the dropout layer, Cracknet2-2 still gave the best results, and was thus
selected as the CTI network.
To examine the effectiveness of crack recognition by the CTI network (i.e., Cracknet 2-2), a confusion
matrix [49] that is often used to evaluate the quality of classifiers was constructed to quantify the
classification result, as shown in Figure 13. The horizontal and vertical elements of the matrix are the
true and predicted values, respectively. The sum of each row represents the actual number of samples
in a given class, whereas the sum of each column represents the number of samples predicted by the
network. According to the matrix, the accuracy of classification can be deemed high, although false
classification does occur in certain cases. That is, 20 (out of 25) cases of mesh crack were correctly
identified, but five cases were wrongly recognized as vertical cracks; 94 (out of 95) cases of right oblique
cracks were correctly identified, with one case wrongly recognized as a transverse crack. The overall accuracy of the CTI network was estimated to be 99.3%.
Figure 12. Accuracy variations along the training process using Cracknet2-1 and Cracknet2-2, respectively (vertical axis: accuracy (%); horizontal axis: iteration).
Figure 13. Confusion matrix constructed based on the crack type identification results based on the CTI network.
More detailed analysis was conducted to examine the first 25 feature maps of the 2nd and 3rd convolutional layers in the CTI network, as shown in Figure 14. Based on convolution visualization, the crack features extracted by convolution kernels can be seen, showing information about edge, direction, textures, color, etc. Although each type of crack has similar color patterns, the patterns of crack edges and textures show larger differences, and difference in the direction patterns is significant. For the given classification problem, prominent crack direction patterns give rise to the high recognition accuracy of 99.3% using the CTI network.
Figure 14. Visualization of crack features in the (a) second and (b) third convolutional layer.
4. Concrete Crack Recognition Based on the MLP–CNN Framework
Based on the establishment of the CTI and CPD networks and the sliding window strategy for crack detection, as introduced in Section 3, the MLP–CNN framework was used in this section, focusing on processing different types and levels of the influence of image noise.
4.1. Comparison among Feature Extraction Algorithms
The background noise in concrete surface images can exhibit complex and diverse forms that greatly increase the difficulty of crack recognition. For example, photography of cracked concrete surface is easily interfered with by factors such as a light spot, or a blur caused by camera shake (where an extreme case is drone photography of dams, bridges, roads, etc.). Other factors, such as rain erosion, daily wear, or human interference can lead to largely uneven concrete surfaces or stains. It was shown in Section 3 that the effectiveness of the CPD and CTI network was satisfactory under relatively low levels of background noise. However, severe noise results in a large decrease in the crack detection accuracy of both traditional image feature extraction methods and CNN-based methods.
Figure 15 shows the results of crack feature extraction obtained by different preprocessing
techniques, i.e., the linear enhancement filtering method, the iterative threshold segmentation method,
the bit plane slicing method, and the MLP used in this study. Subject to relatively severe background
noise, the apparent drawbacks of conventional crack extraction algorithms can be seen in the figure,
where it is difficult to effectively differentiate the features of noise from those of the cracks. Moreover,
there are considerable differences between the feature extraction capacities of the different methods.
The results of the MLP, on the other hand, reveal significant crack features by minimizing the noise
interference. More importantly, with its binary nature and high precision, the MLP shows considerable
versatility in treating different noise types and levels and thus can be combined with CNN to detect
cracks under different circumstances.
Comparison results between the MLP-based results and the results of edge detection algorithms
(based on Prewitt, Roberts and Sobel operators, respectively) are shown in Figure 16. It can be observed
that the MLP-treated results are able to reveal much more prominent crack features.
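For readers unfamiliar with the edge detection baselines, the sketch below shows how a gradient-magnitude edge map is obtained with the standard Sobel and Prewitt kernels. It is an illustration of the general technique only, not the implementation used in the paper; the helper names `filter3x3` and `edge_magnitude` are hypothetical, and border pixels are dropped rather than padded.

```python
import numpy as np

# Standard horizontal-gradient kernels; the vertical ones are their transposes.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)


def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel (valid region only)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out


def edge_magnitude(img, kernel_x):
    """Gradient magnitude from a horizontal kernel and its transpose."""
    gx = filter3x3(img, kernel_x)
    gy = filter3x3(img, kernel_x.T)
    return np.hypot(gx, gy)


if __name__ == "__main__":
    # Toy image with a vertical step edge between columns 2 and 3.
    img = np.zeros((5, 6))
    img[:, 3:] = 1.0
    print(edge_magnitude(img, SOBEL_X)[0])
    print(edge_magnitude(img, PREWITT_X)[0])
```

Such operators respond to any intensity gradient, including those produced by stains or surface roughness, which is consistent with the difficulty reported above in separating noise from crack features.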
Figure 15. Comparison of crack feature extraction algorithms, subject to (a) no treatment; (b) linear enhancement filtering method; (c) iterative threshold segmentation method; (d) bit plane slicing; (e) MLP proposed in this paper.
Figure 16. Comparison between edge detection and MLP, subject to (a) no treatment; (b) Prewitt operator; (c) Roberts operator; (d) Sobel operator and (e) MLP proposed in this study.
4.2. Crack Position Detection Subject to Moderate Noise Level
It was found that, with use of the MLP–CNN framework, the accuracy of the CPD network increased from 96.5% to 99.6%. A straightforward illustration is shown in Figure 17, where the confusion matrix in Figure 17a shows that the original CPD network results in 114 (out of 542) crack ROIs that are misclassified as background ROIs, and of the background ROIs, 34 (out of 3680) are misclassified as crack ROIs. As shown in Figure 17b, with the use of the MLP–CNN, instances of misclassification are largely reduced, i.e., only 8 (out of 542) crack ROIs and 5 background ROIs are misclassified.
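The overall accuracies quoted above follow from the confusion matrix counts as the ratio of correctly classified ROIs to all ROIs. The sketch below is illustrative only: rows are taken as true classes (crack, background) and columns as predicted classes, and the MLP–CNN matrix assumes the same totals of 542 crack and 3680 background ROIs as the original CPD, which the text implies for the crack ROIs but does not state for the background ROIs.

```python
import numpy as np


def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()


# Rows: true class (crack, background); columns: predicted class.
ORIGINAL_CPD = [[542 - 114, 114],
                [34, 3680 - 34]]
# Assumption: the same 542 crack and 3680 background ROIs are used here.
MLP_CNN = [[542 - 8, 8],
           [5, 3680 - 5]]

if __name__ == "__main__":
    print(round(100 * accuracy_from_confusion(ORIGINAL_CPD), 1))
    print(round(100 * accuracy_from_confusion(MLP_CNN), 2))
```

Under these assumptions the original CPD counts give 4074/4222, or about 96.5%, matching the reported value, and the MLP–CNN counts give 4209/4222, or about 99.7%, which agrees with the reported 99.6% to within rounding.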
Figure 17. Confusion matrix constructed based on the recognition results using (a) the original CPD and (b) the MLP–CNN.
To further examine the crack detection accuracy, 5000 background ROIs combined with varied numbers of crack ROIs (from 300 to 1800) were used in the training set. As shown in Figure 18, the test accuracy of both the original CPD and the MLP–CNN tends to be stable after the number of crack ROIs exceeds 1500. It is notable that the accuracy enhancement with use of the MLP–CNN is especially significant under relatively small amounts of training data.
Figure 18. Accuracy of the original CPD and the MLP–CNN subject to different numbers of crack ROIs in the training set (vertical axis: accuracy (%); horizontal axis: number of crack ROIs).
Figure 19 shows the CPD results using a sliding window with the sliding step of 16 pixels. The results of the original CPD show large discontinuities along the main portion of the crack, whereas the MLP–CNN has an obviously superior accuracy of detection, by which all cracks are identified precisely, as shown in Figure 19b.
Figure 19. Crack position detection (CPD) effect based on (a) the original CPD and (b) MLP–CNN, under moderate level of noise influence.
4.3. Crack Type Identification Subject to Moderate Noise Level
The accuracy of the CTI network changed from 99.3% to 98.8% with use of the MLP–CNN framework under moderate noise. Although there was a slight accuracy decrease of 0.6% after using MLP–CNN, additional analysis showed that the difference in accuracy before and after the MLP treatment was negligible.
The variation of accuracy in crack type identification along with the increase of the size of the training set is shown in Figure 20. The abscissa represents the number of images (from 300 to 1800, with the increment of 300) in the training set that includes the five crack types. The ordinate shows the testing accuracy of crack type classification using the original CTI and the MLP–CNN, respectively. It can be seen that the overall accuracy of crack type identification using the original CTI and the MLP–CNN, respectively, under different sizes of training sets, is comparable. Moreover, the accuracy tends to remain stable along with the increase in size of the training set. To conclude, no obvious improvement of detection accuracy was obtained using MLP–CNN under a relatively low level of noise influence. Improvement in detection accuracy is expected to be achieved under higher noise levels.
[Figure 20 plot: accuracy (%) against training set size of CTI (300, 600, 900, 1200, 1500, 1800); series: Original CTI, MLP-CNN]
Figure 20. Accuracy of the original CTI and the MLP–CNN subject to different numbers of crack ROIs.
4.4. Crack Position Detection Subject to Severe Noise Influence
Severe noise from different sources, including light spots, blurs, and surface anomalies (e.g., uneven
surfaces, low color contrast, stains), is considered in this section. The noisy images were processed
by both the original CPD and the MLP–CNN framework.
4.4.1. Light Spots

Light spots were simulated by first setting the center of the light source, i.e., (x0 , y0 ). With w and h
defined as the width and height of an image, x0 and y0 were set at w/3 and h/2, respectively. Thus,
images including light spots were generated in accordance with:
$$\Delta(x, y) = k \times \left(1 - \frac{\sqrt{(x - x_0)^2 + (y - y_0)^2}}{r}\right), \tag{5}$$
$$L(x, y) = l(x, y) + \Delta(x, y). \tag{6}$$
where k is a parameter relating to the brightness of the light spot, which is equal to 0.4; r is the radius
of the light source; ∆(x, y) is a matrix of the same size as the input image, representing the increase in
the pixel values of the image. Adding it to the matrix of the original image l(x, y) gives L(x, y) after the
lighting simulation.
The closer a pixel was to the light source, the greater was its brightness value. A total of 80 images
after light spot treatment were randomly selected to form the test set. The crack position detection
results obtained using the original CPD and the MLP–CNN are shown in Figure 21. The original CPD
fails to identify some main portions of the crack, which, interestingly, are located close to the
light source. In contrast, the MLP–CNN shows significant resistance to light spot influence,
leading to accurate identification results.
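The light spot model of Equations (5) and (6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `add_light_spot`, the default radius, the assumption that pixel values lie in [0, 1], and the clipping of the result are all assumptions that do not come from the paper. Pure Python is used to keep the example dependency-free; an array library would be the usual choice for real images.

```python
import math

def add_light_spot(image, k=0.4, r=None):
    """Apply Equations (5) and (6) to a grayscale image given as rows of floats in [0, 1]."""
    h, w = len(image), len(image[0])
    x0, y0 = w / 3, h / 2              # light-source center, as stated in the text
    if r is None:
        r = max(w, h) / 2              # assumption: the paper does not report r
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dist = math.hypot(x - x0, y - y0)
            delta = k * (1 - dist / r)  # Equation (5)
            delta = max(delta, 0.0)     # assumption: no darkening beyond the radius r
            # Equation (6), clipped to the valid range (assumption)
            row.append(min(image[y][x] + delta, 1.0))
        out.append(row)
    return out
```

With k = 0.4, a pixel at the light-source center is brightened by 0.4, and the increase falls off linearly to zero at distance r.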
Figure 21. The (a) original crack images, and the results of crack position detection (CPD) based on
(b) the original CPD and (c) the MLP–CNN, under the influence of light spots.
4.4.2. Blurs
Motion blurs were used to simulate camera movement caused by environmental or human factors.
The motion displacement was set to 20 pixels with motion angle of 15°. After the simulation of motion
blur, 80 crack images were selected to form a blurred test set. The crack position detection results,
under the blur noise, based on the original CPD and the MLP–CNN are shown in Figure 22. As is
evident, the blurring of image results in poor crack recognition with use of the original CPD, causing
serious discontinuity and even failure in crack identification, as seen from Figure 22b. Fortunately,
the MLP–CNN largely increases the identification accuracy under the blurring conditions, so that all
details of crack information are well maintained.
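A linear motion blur with the stated displacement and angle can be approximated by averaging each pixel along a short line segment. The sketch below is a hypothetical illustration, not the procedure used in the paper: it assumes that the 20-pixel displacement is the length of the blur segment and that image edges are replicated, and the helper names `motion_blur_kernel` and `blur` are invented for this example.

```python
import math

def motion_blur_kernel(length=20, angle_deg=15.0):
    """Return a normalized square kernel approximating a linear motion blur."""
    size = length if length % 2 == 1 else length + 1   # odd size so the kernel has a center
    c = size // 2
    theta = math.radians(angle_deg)
    kernel = [[0.0] * size for _ in range(size)]
    samples = 4 * length                                # oversample the line segment
    for i in range(samples + 1):
        t = (i / samples - 0.5) * (length - 1)          # signed position along the segment
        x = int(round(c + t * math.cos(theta)))
        y = int(round(c - t * math.sin(theta)))         # image rows grow downward
        kernel[y][x] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Correlate a grayscale image (rows of floats) with the kernel, replicating edges."""
    h, w, c = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, krow in enumerate(kernel):
                yy = min(max(y + dy - c, 0), h - 1)     # clamp row index (edge replication)
                for dx, kv in enumerate(krow):
                    if kv:
                        xx = min(max(x + dx - c, 0), w - 1)
                        acc += kv * image[yy][xx]
            out[y][x] = acc
    return out
```

Because the kernel is normalized to sum to one, the overall brightness of the image is preserved, while thin features such as cracks are smeared along the motion direction.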
Figure 22. (a) Original crack images, and the results of crack position detection (CPD) based on
(b) the original CPD and (c) the MLP–CNN, under the influence of blurs.
4.4.3. Surface Anomalies
120 images subject to noise due to background surface anomalies in terms of wear, stains, etc.,
were selected to form the test set of surface anomalies. The detection results for uneven concrete
surfaces are shown in Figure 23. It can be seen that the overall detection accuracy of the original CPD
is satisfactory, but false alarms still exist in the intact region, as shown in Figure 23a. The false alarms
can be effectively prevented with the use of MLP–CNN as shown in Figure 23b. Figure 24 shows the
detection results subject to low contrast in color between crack and background, defined here as
another type of surface anomaly. With the original CPD, several background ROIs were misclassified as
crack ROIs, and some main portions of cracks were missed. In contrast, using MLP–CNN, the majority
of crack information was well manifested, containing few misclassifications and rich crack information.
Figure 25 shows the detection results with regard to surface stains, in which it can be seen that the
stains further raise the issue of misclassification, with most stain areas in the background identified as
cracks with use of the original CPD. Such issues are largely prevented with use of the MLP–CNN.
Figure 23. Results of crack position detection (CPD) based on (a) the original CPD and (b) the
MLP–CNN, with uneven concrete surfaces.

Figure 24. Results of crack position detection (CPD) based on (a) the original CPD and (b) the
MLP–CNN, with low color contrast between crack and background.
Figure 25. Results of crack position detection (CPD) based on (a) the original CPD and (b) the
MLP–CNN, with stains in the background.
4.5. Crack Type Identification Subject to Severe Noise Influence
Table 4 shows four test sets used by the CTI network, each representing a specific type of noise,
where moderate noise refers to noise influence as discussed in Sections 4.2 and 4.3.
Table 4. The four types of test set and the number of images contained in each.
Test Set            Moderate Noise    Light Spot    Blur    Surface Anomaly
Size of each set    577               80            80      120
Table 5 presents the accuracy of the original CTI network and the MLP–CNN in crack type
identification subject to different types and levels of noise influence. Under moderate noise
influence, the accuracy of the original CTI and the MLP–CNN is comparable. Under severe noise
influence, however, a clear difference in identification accuracy between the two can be seen.
Specifically, the MLP–CNN improves the accuracy by 2.8 percentage points under light spot influence,
5.4 under blur influence, and 4.7 under surface anomaly influence. In summary, the MLP–CNN achieves
higher recognition accuracy than the CNN without the MLP, particularly under the influence of
severe noise.
Table 5. Accuracy of crack type identification under different noise types and levels.
Noise Condition    Accuracy of CTI Network (%)    Accuracy of MLP–CNN (%)
Moderate noise     99.3                           98.7
Light spot         86.5                           89.3
Blur               82.6                           88.0
Surface anomaly    89.6                           94.3
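The improvements quoted in the text follow directly from the entries of Table 5. The short sketch below recomputes them as differences in percentage points; the accuracy values are copied from the table, while the dictionary layout and the function name `accuracy_gain` are assumptions made only for this illustration.

```python
# Accuracy values (%) copied from Table 5: (original CTI, MLP-CNN)
table5 = {
    "Moderate noise": (99.3, 98.7),
    "Light spot": (86.5, 89.3),
    "Blur": (82.6, 88.0),
    "Surface anomaly": (89.6, 94.3),
}

def accuracy_gain(results):
    """Return MLP-CNN accuracy minus original CTI accuracy, in percentage points."""
    return {name: round(mlp - cti, 1) for name, (cti, mlp) in results.items()}

gains = accuracy_gain(table5)
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} percentage points")
```

The differences for light spot, blur, and surface anomaly are 2.8, 5.4, and 4.7 percentage points, matching the text, while the moderate noise case shows a small decrease of 0.6 percentage points.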
5. Conclusions
In recognition of the influence of severe noise in concrete surface images, which largely
degrades the accuracy of existing crack identification methods, an MLP–CNN framework was
established in this paper. The framework combines a CNN with a multi-layered preprocessing (MLP)
technique, the key elements of which are homomorphic filtering and the Otsu thresholding method.
With their binary nature and prominent crack features, the crack images processed by the MLP enabled
a significant enhancement of the accuracy and versatility of CNN-based concrete crack
identification.
Compared to the original CNN networks, i.e., CPD and CTI, the MLP–CNN framework proved
able to improve crack identification. Specifically, under a moderate noise level, the accuracy of CPD
was increased by 3.1% using the MLP–CNN, whereas no clear improvement in CTI was observed.
Severe noise was then introduced in the form of light spots, blurs, or surface anomalies, and a clear
enhancement of crack identification was observed: (a) Under the noise influence of light spots and
blurs, large portions of cracks were missed or misclassified by the original CPD, whereas the
MLP–CNN identified the cracks precisely, with the majority of crack information preserved. For CTI,
the MLP–CNN increased recognition accuracy by 2.8 and 5.4 percentage points under the influence of
light spots and blurs, respectively. (b) Under noise influence from surface anomalies, specifically
uneven surfaces, low color contrast between crack and background, and stains, the original CPD both
missed crack information and gave false alarms in the background region. These two drawbacks were
well addressed by applying the MLP–CNN. For CTI, the MLP–CNN increased the recognition accuracy
by 4.7 percentage points.
The MLP–CNN framework shows potential for accurate identification of surface cracks in
concrete structures such as bridges and dams. Its efficiency can be maximized in future scenarios
such as the rapid capture of massive numbers of concrete surface images using drones.
The robustness and flexibility of the MLP–CNN framework can be further enhanced in future studies by
adjusting the CNN types and parameters and enlarging the datasets. Future work also aims to recognize
more comprehensive crack information quantitatively, such as crack depths and additional patterns.
Author Contributions: Conceptualization, D.N. and M.C.; methodology, R.F., L.S. and H.X.; software, R.F.;
validation, D.N., Z.W. and M.C.; formal analysis, L.S., Z.W. and H.X.; investigation, R.F. and Z.W.; resources, M.C.;
data curation, R.F.; writing—original draft preparation, R.F.; writing—review and editing, T.L.; visualization, H.X.;
supervision, D.N.; project administration, M.C.; funding acquisition, M.C. All authors have read and agreed to the
published version of the manuscript.
Funding: This work is supported by the Transportation Science Research Project of Jiangsu Province (No.:
2019Z02), the Key R&D Project of Jiangxi Science and Technology Department (No.: 20192BBH80021), and the
Chinese (Jiangsu)-Czech Bilateral Co-funding R&D Project (No.: BZ2018022).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Song, G.; Ma, N.; Li, H.-N. Applications of shape memory alloys in civil structures. Eng. Struct. 2006, 28,
1266–1274. [CrossRef]
2. Song, G.; Gu, H.; Mo, Y.-L. Smart aggregates: Multi-functional sensors for concrete structures—A tutorial
and a review. Smart Mater. Struct. 2008, 17, 033001. [CrossRef]
3. Bayat, M.; Pakar, I.; Domairry, G. Recent developments of some asymptotic methods and their applications
for nonlinear vibration equations in engineering problems: A review. Lat. Am. J. Solids Struct. 2012, 9, 1–93.
[CrossRef]
4. Phares, B.M.; Rolander, D.D.; Graybeal, B.A.; Washer, G.A. Reliability of visual bridge inspection. Public
Roads 2001, 64.
5. Chen, P.-H.; Shen, H.-K.; Lei, C.-Y.; Chang, L.-M. Support-vector-machine-based method for automated steel
bridge rust assessment. Autom. Constr. 2012, 23, 9–19. [CrossRef]
6. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. ManCybern. 1979, 9,
62–66. [CrossRef]
7. Vala, H.J.; Baxi, A. A review on Otsu image segmentation algorithm. Int. J. Adv. Res. Comput. Eng. Technol.
(Ijarcet) 2013, 2, 387–389.
8. Kirschke, K.; Velinsky, S. Histogram-based approach for automated pavement-crack sensing. J. Transp. Eng.
1992, 118, 700–710. [CrossRef]
9. Oh, H.; Garrick, N.W.; Achenie, L.E. Segmentation algorithm using iterative clipping for processing
noisy pavement images. In Proceedings of the Imaging Technologies: Techniques and Applications in
Civil Engineering. Second International ConferenceEngineering Foundation; and Imaging Technologies
Committee of the Technical Council on Computer Practices, Davos, Switzerland, 25–30 May 1997; American
Society of Civil Engineers: Reston, VA, USA, 1998.
10. Bonnet, N.; Cutrona, J.; Herbin, M. A ‘no-threshold’histogram-based image segmentation method. Pattern
Recognit. 2002, 35, 2319–2322. [CrossRef]
11. Oliveira, H.; Correia, P.L. Automatic Road Crack Segmentation using Entropy and Image Dynamic
Thresholding. In Proceedings of the 2009 17th European Signal Processing Conference, Glasgow, UK,
24–28 August 2009; pp. 622–626.
12. Al-Amri, S.S.; Kalyankar, N.V. Image segmentation by using threshold techniques. arXiv 2010, arXiv:1005.4020.
13. Migdal, J.; Grimson, W.E.L. Background subtraction using markov thresholds. In Proceedings of the 2005
Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION’05)-Volume 1, Breckenridge,
CO, USA, 5–7 January 2005; pp. 58–65.
14. Muthukrishnan, R.; Radha, M. Edge detection techniques for image segmentation. Int. J. Comput. Sci. Inf.
Technol. 2011, 3, 259.
15. Ziou, D.; Tabbone, S. Edge detection techniques-an overview. Pattern Recognit. Image Anal. C/C Raspoznavaniye
Obraz. I Anal. Izobr. 1998, 8, 537–559.
16. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear
Phenom. 1992, 60, 259–268.
17. Cha, Y.-J.; You, K.; Choi, W. Vision-based detection of loosened bolts using the Hough transform and support
vector machines. Autom. Constr. 2016, 71, 181–188.
18. Ayenu-Prah, A.; Attoh-Okine, N. Evaluating pavement cracks with bidimensional empirical mode
decomposition. Eurasip J. Adv. Signal Process. 2008, 2008, 1–7.
19. Acharya, T.; Tsai, P.-S. Edge-detection Based Noise Removal Algorithm. US Patent 6,229,578, 8 May 2001.
Sensors 2020, 20, 2021 23 of 24
20. Maode, Y.; Shaobo, B.; Kun, X.; Yuyao, H. Pavement crack detection and analysis for high-grade highway.
In Proceedings of the 2007 8th International Conference on Electronic Measurement and Instruments, Xi’an,
China, 16–18 August 2007; pp. 4-548–4-552.
21. Zhou, J.; Huang, P.S.; Chiang, F.-P. Wavelet-based pavement distress detection and evaluation. Opt. Eng.
2006, 45, 027007.
22. Knezevic, M.; Cvetkovska, M.; Hanák, T.; Braganca, L.; Soltesz, A. Artificial Neural Networks and Fuzzy
Neural Networks for Solving Civil Engineering Problems. Complexity 2018, 2018. [CrossRef]
23. Cha, Y.-J.; Choi, W. Vision-based concrete crack detection using a convolutional neural network. In Dynamics
of Civil Structures; Springer: Cham, Switzerland, 2017; Volume 2, pp. 71–73.
24. Simard, P.Y.; Steinkraus, D.; Platt, J.C. Best practices for convolutional neural networks applied to visual
document analysis. In Proceedings of the ICDAR, Edinburgh, UK, 3–6 August 2003; IEEE Computer Society:
Washington, DC, USA, 2003; Volume 3, pp. 958–962.
25. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep
convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and
transfer learning. IEEE Trans. Med Imaging 2016, 35, 1285–1298. [CrossRef]
26. Yan, Z.; Zhang, H.; Piramuthu, R.; Jagadeesh, V.; DeCoste, D.; Di, W.; Yu, Y. HD-CNN: Hierarchical deep
convolutional neural networks for large scale visual recognition. In Proceedings of the IEEE International
Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2740–2748.
27. Radenović, F.; Tolias, G.; Chum, O. CNN image retrieval learns from BoW: Unsupervised fine-tuning with
hard examples. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland,
2016; pp. 3–20.
28. Chen, F.-C.; Jahanshahi, M.R. NB-CNN: Deep learning-based crack detection using convolutional neural
network and Naïve Bayes data fusion. IEEE Trans. Ind. Electron. 2017, 65, 4392–4400. [CrossRef]
29. Browne, M.; Ghidary, S.S. Convolutional neural networks for image processing: An application in robot
vision. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, Melbourne, Australia,
4–7 December 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 641–652.
30. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural
networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
31. Zhao, X.; Li, S. A method of crack detection based on convolutional neural networks. In Proceedings of the
11th International Workshop on Structural Health Monitoring, Menlo Park, CA, USA, 12–14 September 2017;
pp. 978–984.
32. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network.
In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA,
25–28 September 2016; pp. 3708–3712.
33. Gavilán, M.; Balcones, D.; Marcos, O.; Llorca, D.F.; Sotelo, M.A.; Parra, I.; Ocaña, M.; Aliseda, P.; Yarza, P.;
Amírola, A. Adaptive road crack detection system by pavement classification. Sensors 2011, 11, 9628–9657.
[PubMed]
34. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep
cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [PubMed]
35. Fries, R.; Modestino, J. Image enhancement by stochastic homomorphic filtering. IEEE Trans. Acoust.
SpeechSignal Process. 1979, 27, 625–637.
36. Lee, Y.-L.; Park, H.-W. Signal Adaptive Filtering Method and Signal Adaptive Filter for Reducing Blocking
Effect and Ringing Noise. Google Patents, U.S. Patent No. 6,259,823, 10 July 2001.
37. Ma, Z.; He, K.; Wei, Y.; Sun, J.; Wu, E. Constant time weighted median filtering for stereo matching and
beyond. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia,
1–8 December 2013; pp. 49–56.
38. Ioannou, Y.; Robertson, D.; Cipolla, R.; Criminisi, A. Deep roots: Improving cnn efficiency with hierarchical
filter groups. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu,
HI, USA, 21–26 July 2017; pp. 1231–1240.
39. Miao, S.; Wang, Z.J.; Liao, R. A CNN regression approach for real-time 2D/3D registration. IEEE Trans. Med
Imaging 2016, 35, 1352–1363.
40. Leshno, M.; Lin, V.Y.; Pinkus, A.; Schocken, S. Multilayer feedforward networks with a nonpolynomial
activation function can approximate any function. Neural Netw. 1993, 6, 861–867.
Sensors 2020, 20, 2021 24 of 24
41. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth
International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011;
pp. 315–323.
42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Proceedings of the Advances in neural information processing systems, Lake Tahoe, NV, USA, 3–6 December
2012; pp. 1097–1105.
43. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate
shift. arXiv 2015, arXiv:1502.03167.
44. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent
neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
45. Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with gumbel-softmax. arXiv 2016, arXiv:1611.01144.
46. Burges, C.; Shaked, T.; Renshaw, E.; Lazier, A.; Deeds, M.; Hamilton, N.; Hullender, G. Learning to rank
using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, Bonn,
Germany, 7–11 August 2005; pp. 89–96.
47. Krizhevsky, A.; Hinton, G. Convolutional deep belief networks on cifar-10. Unpubl. Manuscr. 2010, 40, 1–9.
48. Schmugge, S.J.; Rice, L.; Nguyen, N.R.; Lindberg, J.; Grizzi, R.; Joffe, C.; Shin, M.C. Detection of cracks in
nuclear power plant using spatial-temporal grouping of local patches. In Proceedings of the 2016 IEEE
Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016;
pp. 1–7.
49. Hay, A. The derivation of global estimates from a confusion matrix. Int. J. Remote Sens. 1988, 9, 1395–1398.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Sensors 2020, 20, 2021; doi:10.3390/s20072021 www.mdpi.com/journal/sensors
