CIE Lab of Concrete

Cement and Concrete Research 161 (2022) 106926
End-to-end concrete appearance analysis based on pixel-wise semantic segmentation and CIE Lab
Zhexin Hao a,*, Xinyu Qi b
a School of Civil Engineering, Southeast University, Nanjing, Jiangsu, China
b School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu, China
ARTICLE INFO

Keywords: Concrete appearance analysis; Semantic segmentation; FCN; VGG; CIE Lab

ABSTRACT

The traditional expert-on-site acceptance method can no longer meet the need for intellectualization of the modern concrete industry. The cutting-edge achievements in the field of computer vision have immensely accelerated the automation of concrete surface-related engineering. We propose an end-to-end concrete appearance multiple identification method based on pixel-wise semantic segmentation and a corresponding method for quantitative chromatic aberration analysis based on CIE Lab. The objective function of the proposed Fully Convolutional Network (FCN)-based network was optimized in terms of both empirical and structural risks, and the complexity was reduced to improve its generalization performance. After comparing different preposed base CNNs, the best results show 98.43 % pixel accuracy (PA), 91.33 % mean pixel accuracy (MPA), 83.89 % mean intersection over union (MIoU), and 96.95 % frequency weighted intersection over union (FWIoU). In addition, intra-class variation was explored under different light intensity and surface moisture content to demonstrate the robustness of the proposed method.
1. Introduction

As it is strong, durable and relatively inexpensive, concrete is the most used material worldwide in civil and infrastructure engineering. However, appearance defects such as cracks and pores may reduce the material's resistance to water/chemical transport [1] and thus the durability of concrete structures [2]. In addition, appearance defects are key items in many visual condition assessment manuals [3]. Beyond the impact of defects on material and structural performance, the quality of concrete appearance is becoming more important as aesthetic requirements rise and material construction techniques improve. Traditional concrete appearance analysis is based on visual inspection and manual sample measurement, which strongly depend on the specialist's knowledge and experience and lack objectivity in quantitative analysis [4]. This creates the need for a more objective and automated analysis method that requires little or no direct human intervention.

In recent years, Machine Learning (ML) has been widely applied to automated image processing in the field of concrete appearance analysis. Riding the wave of the development of computer vision, traditional ML algorithms have been utilized in their respective concrete surface object detection tasks, such as Support Vector Machines (SVM) [5,6], K-Nearest Neighbor (KNN) [7,8], and Decision Tree (DT) [9,10]. These traditional ML approaches did obtain appreciable results within their own closed sets under pre-configured experimental conditions, but such “shallow models” usually perform unsatisfactorily in real circumstances due to the lack of non-linear activation and the imperfection of hand-crafted features (error, repetition, subjectivity, etc.).

To solve the problems above, the Deep Neural Network (DNN), as an advanced branch of ML, has been employed in various visual-data-based tasks [11–18]. Among them, the Convolutional Neural Network (CNN) [19] has become the representative network because the extraction of image features by convolution is logically similar to the semantic understanding process conducted within cortical neurons. In concrete appearance image analysis, CNN has been widely used for crack [20–23] and damage [24–28] detection. Despite the excellent results of the above works, they are high-cost, impose strict requirements on the size of the input images, and cannot achieve end-to-end output.

In 2015, the Fully Convolutional Network (FCN) [29] was proposed. As the pioneering work of semantic segmentation, FCN implemented a low-cost, pixel-wise and end-to-end architecture for the first time, and its preposed base CNN is replaceable. Novel works have been carried out
* Corresponding author.
E-mail address: zxhao@seu.edu.cn (Z. Hao).
https://doi.org/10.1016/j.cemconres.2022.106926
Received 7 March 2022; Received in revised form 14 July 2022; Accepted 25 July 2022
Available online 9 August 2022
0008-8846/© 2022 Elsevier Ltd. All rights reserved.
Fig. 1. Flowchart of the proposed analysis method.
Fig. 2. Schematic of dataset process and label preparation.
for concrete crack and damage detection [30–37]. However, most of the studies focused only on a single category of target (usually cracks) and ignored other concrete appearance targets such as pores, designed seams, or holes. They may therefore have poor resistance to target interference and cannot perform further analysis of the general surface (e.g. chromatic aberration) in practical applications. Several studies [38,39] considered multiple categories of targets. However, although the effect of surface pores on crack identification is not negligible, pores were not treated as an independent category, not to mention their own relevance as a target that can be quantified and analyzed for concrete appearance engineering. Furthermore, the environmental conditions at the time of image collection, such as light intensity and surface moisture content, were not considered and analyzed, so the robustness of the models is limited. In addition, the tiny inter-class variation and the great class-inequality across categories in concrete appearance datasets are both non-negligible, but neither was taken into consideration in the former works.

Furthermore, concrete appearance color has received even less attention. The little work available [40–43] also lacks automated quantitative evaluation of chromatic aberration, not to mention the fact that the analysis of chromatic aberration is significantly influenced by various targets on the concrete surface.

In order to address the above issues, an end-to-end concrete appearance analysis method based on pixel-wise semantic segmentation and CIE Lab is proposed. The content of this research is described as follows. In Section 2, the dataset preparation, network architecture, model optimization, and chromatic aberration analysis method are introduced. Then, the performance of the proposed model is evaluated and the end-to-end analysis outputs are presented in Section 3. Finally,
Fig. 3. Schematic of one-hot encoding.
Section 4 discusses the robustness of the proposed model and the proposed chromatic aberration analysis.

2. Methodology

The proposed concrete appearance analysis method includes dataset preparation, a pixel-wise semantic segmentation model, and chromatic aberration analysis, as shown in Fig. 1. At first, image pre-processing, including geometric correction, cropping, and stitching of images, was carried out. Then, a dataset of concrete appearance images in different circumstances was established for the training, validation, and testing of automated semantic segmentation. The pre-processed images were classified into pores, cracks, peculiarities, and general surfaces according to the target types, and the dataset was prepared by manual coding of labels. Afterward, a pixel-wise semantic segmentation model based on a convolutional neural network was established. The training and optimization were conducted to perform pixel-wise recognition based on the dataset. At last, the chromatic aberration analysis was carried out after removing the effect of the concrete surface targets.

The data processing, network training, and evaluation were programmed in Python 3.7 using PyCharm 2020. The deep-learning framework for pixel-wise semantic segmentation was PyTorch 1.7. The parallel computing framework for the network training was CUDA 10.2 and cuDNN 8.0. The specifications of the workstation used for training and testing the network were as follows: NVIDIA Quadro M5000, Intel Xeon E5-2620 v4 @ 2.10 GHz CPU.

2.1. Dataset preparation

The images of concrete were taken from Xiong'an Station of Beijing-Xiong'an Intercity Railway, Chaoyang Station of Beijing-Shenyang high-speed Railway, and Nanjing-Jurong Intercity Rail Transit, all in China, as well as from concrete bulk samples prepared in the laboratory. The images were captured by both an unmanned aerial vehicle (UAV) DJI Mavic 2 Pro under auto mode, and an orbital large format scanner Senniao A0W. The camera view of the UAV was 77° and the distance between the UAV and the concrete surface was 1.5 m. The resolution of the scanner was from 300 dpi × 300 dpi to 900 dpi × 900 dpi. Different environmental circumstances at the time the images were captured were taken into consideration, including different temperature (0–40 °C), wind speed (0–8 m/s), wind direction (−90° to +90°), lighting conditions (1200–60,010 Lux), and surface moisture content (0–100 %). In addition to the pores and cracks on the surface of concrete, marks, stains, formwork seams, designed seams, designed holes such as tie-bolt holes, pre-buried parts and pre-drilled holes, and other surface objects were also captured in real situations.

To eliminate the geometric distortion of images due to the equipment, acquisition methods, personnel operations, etc., Affine Transformation, Projection Transformation and Polar Coordinate Transformation were carried out. Taking a 5152 × 8560 JPG image obtained by the scanner as an example, the dataset preparation process is shown in Fig. 2. The labeling was done by the authors on the basis of on-site marking of the objects and was checked by experts of the
Fig. 4. Network architecture of the proposed pixel-wise semantic segmentation method.
engineering projects. The labeling accuracy was super pixel level, with an actual accuracy of 0.5 ± 0.1 mm. Because the network established in this paper (Fig. 4) needs to down-sample 5 times, the size of a unit needs to be divisible by 2^5 = 32. Considering the computing power and the training effect, the size of the unit was set as 480 × 352. The format was converted to PNG to avoid compression losses during transmission and storage.

The information of the concrete appearance images in this paper can be divided into three categories: general surfaces, defective objects, and peculiarities. Defective objects refer to material or structural defects such as pores and cracks. Peculiarities refer to the designed seams and holes, as well as other non-general concrete surfaces such as color cards and scaleplates for measurement. Considering the demand of subsequent evaluation metrics calculation and the difference of the features of the targets, the targets to be identified and the corresponding labels are as follows: red ([R, G, B] = [128, 0, 0]) for pores, yellow ([R, G, B] = [128, 128, 0]) for cracks, green ([R, G, B] = [0, 128, 0]) for peculiarity, and black ([R, G, B] = [0, 0, 0]) for general surface (background). Each pixel in each unit was labeled using Labelme compiled by Python 3.7. Then the generated JSON label file was converted to PNG using the compiled json_to_dataset codes to obtain the labeled image as shown in the lower right corner of Fig. 2. The cut-acquired unit image and the corresponding labeled image form a pair of data.

One-hot encoding was used to encode N states using N-bit state registers, each state having its own independent register bit, with only one bit valid at any time [44]. For the four-class problem ([pore, peculiarity, crack, general surface]) in this paper, if the true label of the current pixel is pore, the corresponding one-hot encoding is [1,0,0,0]. The schematic of one-hot encoding is shown in Fig. 3.

The dataset established includes 2463 sets of images captured by UAV (1021 sets) and orbital scanner (1442 sets) and the corresponding labeled images prepared. The database contains: (1) images captured by UAV with different image size, ISO (100–12,800), shutter time (1/8000–1/2 s), aperture value (2.8–11), distance from the surface (0.5–4 m), single acquisition area, and images captured by scanner with different resolution and scanning speed; (2) images under different environmental conditions such as temperature (0–40 °C), wind speed (0–8 m/s), wind direction (−90° to +90°), light intensity (1200–60,010 Lux), and concrete surface moisture content (0–100 %); (3) images captured with different concrete strength (C40 to C60), surface shapes and colors. The proposed model was built, modified and evaluated based on this database. The training, validation, and test sets were split at a sample ratio of 8:1:1.

2.2. Concrete appearance classification network (CACN)

2.2.1. Network architecture

Fully Convolutional Network for Semantic Segmentation (FCN) [29] is an advanced, real-time semantic segmentation method that has been used since 2015. It can adapt classic classification networks such as AlexNet [45], VGG net [46], and GoogLeNet [47] into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. Such networks use a stack of small convolutional kernels, which significantly reduces the network parameters and thus the training time and storage capacity required for the model, and an increased network depth, i.e., a larger number of nonlinear mappings, which allows the network to extract more discriminative features and improves its performance. In addition, FCN defines a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentation. Traditional CNNs are suitable for classification and regression tasks at the image level and cannot perform classification tasks at the pixel level. Furthermore, the outputs of traditional CNNs are probability vectors, so the input image cannot be mapped directly to a predicted image of the same size, i.e., end-to-end (image-to-image) prediction cannot be achieved. FCN solves these problems to a certain extent and significantly reduces the requirement on the input image size.
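The label coding of Section 2.1 can be illustrated with a minimal, dependency-free sketch. It maps the four label colors to class indices, produces the one-hot vector for each pixel, and checks that a unit size survives five 2 × 2 poolings (a factor of 2^5 = 32). The class order follows the paper's list [pore, peculiarity, crack, general surface]; all function and variable names are hypothetical and are not taken from the authors' code.

```python
# Label colors as listed in Section 2.1; indices follow the paper's class order
# [pore, peculiarity, crack, general surface]. Names here are illustrative only.
LABEL_COLORS = {
    (128, 0, 0): 0,    # pore (red)
    (0, 128, 0): 1,    # peculiarity (green)
    (128, 128, 0): 2,  # crack (yellow)
    (0, 0, 0): 3,      # general surface / background (black)
}
NUM_CLASSES = 4
DOWNSAMPLE_FACTOR = 2 ** 5  # five 2x2 poolings


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Return a list with a single 1 at position class_index."""
    return [1 if k == class_index else 0 for k in range(num_classes)]


def encode_label_image(rgb_rows):
    """Map rows of (R, G, B) label pixels to rows of one-hot vectors."""
    return [[one_hot(LABEL_COLORS[tuple(px)]) for px in row] for row in rgb_rows]


def coarse_size(width, height, factor=DOWNSAMPLE_FACTOR):
    """Size of the feature map after all poolings; rejects incompatible sizes."""
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return width // factor, height // factor


# Toy 2x2 label image (hypothetical data).
example = [[(128, 0, 0), (0, 0, 0)],
           [(128, 128, 0), (0, 128, 0)]]
encoded = encode_label_image(example)
print(encoded)
print(coarse_size(480, 352))
```

For the 480 × 352 unit used in the paper, `coarse_size` gives the 15 × 11 feature map size quoted for the network output before up-sampling.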
The proposed automated concrete appearance pixel-wise semantic segmentation method was developed based on FCN. The modified network architecture is shown in Fig. 4. It consists of five blocks of convolution (Conv1-Conv5), each of which includes several convolutions with 3 × 3 convolution kernels, as shown in the blue block. Each block of convolution is followed by 2 × 2 pooling (Pool1-Pool5), as shown in the green block. In order to realize a pixel-wise end-to-end semantic segmentation result, the original fully-connected layer is replaced with two layers of 1 × 1 convolution (Conv6 and Conv7). The size of the output feature map is only affected by the pooling because the same padding is chosen for each convolution. For the original RGB image of size 480 × 352, after five poolings, the length and width of the output become 1/32 of those of the original image, i.e., 15 × 11, and 4096 feature maps of size 15 × 11 are obtained at the output after Conv7. Finally, the feature maps are restored to the size of the original image by 32 times up-sampling. It should be noted that the VGG-16 here (inside the pink dashed box) is a preposed base network, which can be replaced with any CNN, such as LeNet [19], AlexNet [45], GoogLeNet [47], etc. The model was fine-tuned based on the pre-training of the selected preposed network, and the feature maps are skip-fused, superimposed on the output, and classified into CACN-32s, CACN-16s, and CACN-8s according to the multiplicity of up-sampling.

In CACN-32s, the feature map of Pool5 is up-sampled 32 times to the original image size. In CACN-16s, the feature map of Pool5 is expanded 2 times, fused with Pool4, and up-sampled 16 times to the original image size. In CACN-8s, the fusion result of Pool5 and Pool4 is expanded 2 times, fused with Pool3, and up-sampled 8 times to the original image size. It can be seen that the shallower the layer, the more geometric information and the less high-level semantic information it carries, and vice versa. CACN-32s only restores the features of the Conv5 convolutional kernels, and the geometric features at the shallow level are missing. Therefore, the model complements the geometric information by forward iteration. CACN-16s and CACN-8s are constructed by skip-fusion, so that the model can strike a balance between global and local class information.

2.2.2. Learning algorithm

Although the architecture of FCN can provide an end-to-end pipeline, the preset parameters and learning strategy need to be modified to meet the requirements of automated concrete appearance pixel-wise semantic segmentation.

Softmax was chosen as the activation function. It defines the output probability of a neuron as the ratio of the exponential of that neuron's input to the sum of the exponentials of the inputs of all neurons in the layer:

$$\text{softmax}: \quad a_j^L = \frac{e^{z_j^L}}{\sum_k e^{z_k^L}} \tag{1}$$

where $k \in \{0, 1, \ldots, m-1\}$ indexes the $m$ categories, $z_j^L$ denotes the input of the j-th neuron in the L-th layer (last layer), and $a_j^L$ denotes the output of the j-th neuron in the L-th layer (last layer).

However, the original softmax classifier has two apparent issues for this task. First, the classification score is so sensitive that it can be pushed arbitrarily high by enlarging the norm of some of the obtained features, as demonstrated by the following proposition:

Proposition: For all $z$ that satisfy $w_i^T z > w_j^T z$, $\forall j \neq i$, and any $\alpha > 1$, we have:

$$\frac{\exp\{\alpha w_i^T z\}}{\sum_j \exp\{\alpha w_j^T z\}} \geq \frac{\exp\{w_i^T z\}}{\sum_j \exp\{w_j^T z\}} \tag{2}$$

where $w$ denotes the weight and $z$ denotes the feature obtained from training.

Proof: Dividing the numerator and denominator on each side of the inequality by the respective numerator ($\exp\{\alpha w_i^T z\}$ on the left, $\exp\{w_i^T z\}$ on the right), we can obtain:

$$\frac{1}{\sum_j \exp\{\alpha (w_j^T z - w_i^T z)\}} \geq \frac{1}{\sum_j \exp\{w_j^T z - w_i^T z\}} \tag{3}$$

where $\alpha > 1$, $\exp(x)$ is monotonically increasing, and $w_j^T z - w_i^T z < 0$ for all $j \neq i$, so each term of the left-hand sum is no larger than the corresponding term of the right-hand sum. Therefore, the inequality is directly established.

As a result, the features obtained through the above naive training may not be suitable for classification. Furthermore, the training and testing processes are decoupled because it is impossible to collect all images of concrete in advance. Because open-set samples are unseen by the trained model, the features of the four classes learned by the deep CNNs need to be not only separable but also significantly distinguishable. Inspired by recent advances in face recognition [48–50], we introduce an L2-norm layer and a scale layer to search for a hypersphere representation of the features:

$$w' = \frac{w}{\|w\|} \tag{4}$$

$$z' = \beta \frac{z}{\|z\|} \tag{5}$$

where $\beta > 0$ is the scaler in the scale layer, i.e. the radius of the hypersphere.

Second, the softmax classifier is weak in modeling difficult or extreme samples. Unfortunately, our task is a typical class-inequality task. During the preparation of the dataset, we found that the percentage of the four targets was uneven. Specifically, the vast majority of pixels in the images were general surface, while the defective objects and peculiarities were rare, in addition to the fact that there are fewer cracks on the concrete surface as concrete preparation techniques improve. This class-inequality could bias the model's predictions when minimizing the empirical risk, leading to poor generalizability. To address this problem, we add a balancing factor, which is determined by Eq. (6):

$$B_i = \frac{\text{median}}{\text{class frequency}} = \frac{1/m}{p_i/n} = \frac{n}{p_i \cdot m} \tag{6}$$

where $i$ denotes the category, $m$ denotes the number of categories, $n$ denotes the total number of pixels in the dataset, and $p_i$ denotes the number of pixels of category $i$.

Cross Entropy was chosen as the basic objective function, which is determined by Eq. (7):

$$\mathrm{CEH}(p, q) = E_{p(x)}[-\log q(x)] = H_{p(x)}(q(x)) = -\sum_{i=1}^{n} p(x_i) \log q(x_i) \tag{7}$$

where $H_{p(x)}(q(x))$ denotes the number of bits needed for encoding with $q(x)$ under the distribution $p(x)$, and $H(p(x))$ denotes the minimum number of bits required for the true distribution $p(x)$. Considering $p(x)$ as the labeled value (i.e., true distribution) and $q(x)$ as the predicted value, the smaller the Cross Entropy, the smaller the difference between the true value and the predicted value, and thus the better the prediction of the model.

Combined with Eq. (1), Eq. (7) can be transformed into Eq. (8):

$$\mathrm{CEH}(p, q) = -\sum_{i=1}^{n} p(x_i) \log q(x_i) = -\sum_i y_i \log(a_i) = -\sum_i y_i \log(\text{softmax}) \tag{8}$$

where $y_i$ denotes the true label value. Since one-hot encoding was used, $y_i$ takes 0 or 1. In Python, the Cross Entropy is encapsulated in the Negative Log Likelihood (NLLLoss) in the torch.nn library. The optimized loss function is determined by Eq. (9):
Fig. 5. Schematic of conversion from RGB to CIE Lab.
Table 1
Parameters of the proposed pixel-wise detection model.

Parameters                  Values   Notes
Batch size                  32       Batch training sample size
Optimizer                   Adam     Gradient Descent Optimizer
Learning rate - VGG         10⁻⁴     Initial learning rate (VGG)
Learning rate - AlexNet     10⁻³     Initial learning rate (AlexNet)
Learning rate - GoogLeNet   5⁻⁵      Initial learning rate (GoogLeNet)
Learning rate decay         0.75     Learning rate × 0.75 every 100 epochs
L2 coefficient              10⁻³     L2-regularization coefficient

Fig. 5. Results of extended comparison of CACN model. [Plot of PA, MPA, MIoU and FWIoU for CACN-VGG, CACN-GoogLeNet and CACN-AlexNet.]

Fig. 6. Results under different skip-fusion structures. [Plot of PA, MPA, MIoU and FWIoU for CACN-32s, CACN-16s and CACN-8s.]

$$
\begin{aligned}
\mathrm{CEH}(p, q) &= -\sum_i B_i \times y_i \log(\text{softmax}) \\
&= -\sum_i B_i \times y_i \log \frac{\exp\{{w'_i}^{T} z'\}}{\sum_j \exp\{{w'_j}^{T} z'\}} \\
&= -B_r \times \log \frac{\exp\{{w'_r}^{T} z'\}}{\sum_j \exp\{{w'_j}^{T} z'\}}
\end{aligned} \tag{9}
$$

where $r$ denotes the true category of the pixel, i.e., the only index with $y_r = 1$ under one-hot encoding.

Apart from the solutions for the empirical risk mentioned above, the structural risk was also taken into consideration. In deep learning, overfitting is a typical structural risk: the method is able to detect objects in images in the training dataset but does not generalize well outside the training set, which means that there are more training iterations than optimal. This problem stems from the structure of the network itself and is common and difficult to avoid in deep learning. L2-regularization was used in response to the overfitting problem. The 2-norm is determined by Eq. (10):

$$\|w\|_2 = \sqrt{\sum_{i=1}^{n} |w_i|^2} \tag{10}$$

where $w$ denotes the original weight vector. L2-regularization can be seen as a penalty term of the loss function, i.e. a regularizer added to the loss function, which makes the newly obtained optimized objective function trade off between the two, as shown in Eq. (11):

$$\text{optimized objective function} = \sum_{n=1}^{N} \mathrm{CEH}_n(p, q) + \lambda \|w\|_2^2 \tag{11}$$

where $N$ denotes the total number of pixels and $\lambda$ denotes the L2 coefficient, which was set as 10⁻³ by experiment.

Thus the complex solutions that may result from only optimizing the loss function are no longer optimal. Adding the regularizer can make the solution simpler, which conforms to Occam's Razor, and is also more in line with the bias and variance analysis (variance indicates the complexity of the model). By reducing the complexity of the model, the degree of overfitting was reduced, thereby the
Fig. 7. Results of determining the optimal batch size: (a) IoU - Batch size, (b) CA - Batch size.
Fig. 8. An instance of pixel-wise semantic segmentation results with different batch sizes.
generalization ability of the model was improved. Random multi-scale training was applied by training the proposed method for different resolutions to enhance the accuracy as well.

2.2.3. Training procedure

The learning rate and the batch size of training in the proposed method were set by experiment at 10⁻⁴ and 32, respectively. S-fold cross validation was used during the training procedure. Evaluation and test were carried out after each epoch to calculate IoU, PA, CA, etc. (in Section 3.1) to determine the optimal model in iterations and to provide guidance on model optimization and tuning. Each batch was trained with linearly increasing learning rates, beginning with the minimum learning rate at 5 % of the maximum learning rate. The learning rate decay was 0.75 every 100 epochs. The model eventually went through >400 epochs to reach the optimum.
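The loss construction of Section 2.2.2 (hypersphere normalization in Eqs. (4) and (5), balancing factor in Eq. (6), weighted cross entropy in Eq. (9), and the L2 penalty of Eq. (11)) can be sketched for a single pixel in plain Python. This is an illustration under stated assumptions, not the authors' PyTorch implementation: the function names, the toy weights and features, the pixel counts, and the value β = 10 are all hypothetical, since the paper does not report them here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def balancing_factors(pixel_counts):
    """Eq. (6): B_i = n / (p_i * m) for per-class pixel counts p_i."""
    n, m = sum(pixel_counts), len(pixel_counts)
    return [n / (p * m) for p in pixel_counts]


def hypersphere_logits(weights, feature, beta):
    """Eqs. (4)-(5): scores w'_j^T z' with w' = w/||w||, z' = beta * z/||z||."""
    z = [beta * x for x in l2_normalize(feature)]
    return [sum(a * b for a, b in zip(l2_normalize(w), z)) for w in weights]


def log_softmax(logits):
    """Numerically stable log of Eq. (1)."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(s - top) for s in logits))
    return [s - log_sum for s in logits]


def balanced_cross_entropy(weights, feature, true_class, factors, beta):
    """Eq. (9) for one pixel: -B_r * log softmax_r, r being the true class."""
    log_probs = log_softmax(hypersphere_logits(weights, feature, beta))
    return -factors[true_class] * log_probs[true_class]


def l2_penalty(weights, lam):
    """Regularizer of Eq. (11): lambda * ||w||_2^2 over all weights."""
    return lam * sum(x * x for w in weights for x in w)


# Toy example (all numbers hypothetical).
pixel_counts = [50, 30, 20, 900]  # pore, peculiarity, crack, general surface
factors = balancing_factors(pixel_counts)
weights = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.3]]
feature = [2.0, 0.5, 0.1]
loss = balanced_cross_entropy(weights, feature, 0, factors, beta=10.0)
objective = loss + l2_penalty(weights, lam=1e-3)  # one-pixel version of Eq. (11)
print(factors, loss, objective)
```

Because both the weights and the feature are normalized, rescaling `feature` leaves the loss unchanged, which is the property that removes the norm sensitivity described by the proposition in Eq. (2). Rare classes receive balancing factors above 1 and the dominant general-surface class a factor below 1.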
Fig. 9. Units of pixel-wise semantic segmentation results: (a) captured by UAV, (b) captured by scanner.
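The per-epoch evaluation mentioned in Section 2.2.3 relies on the confusion-matrix metrics defined in Section 3.1 (Eqs. (16) to (21)). A minimal sketch of those formulas is given below, with rows as true categories and columns as predicted categories, as in the paper. The function names and the toy 4 × 4 matrix are hypothetical, and the sketch assumes every category has at least one true pixel.

```python
def _row_sums(cm):
    """Number of true pixels per category."""
    return [sum(row) for row in cm]


def _col_sums(cm):
    """Number of predicted pixels per category."""
    return [sum(col) for col in zip(*cm)]


def pixel_accuracy(cm):
    """PA: diagonal sum divided by the total number of pixels."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(_row_sums(cm))


def class_accuracy(cm):
    """CA per category: diagonal value divided by its row sum."""
    return [cm[i][i] / r for i, r in enumerate(_row_sums(cm))]


def mean_pixel_accuracy(cm):
    """MPA: average of the per-category accuracies."""
    ca = class_accuracy(cm)
    return sum(ca) / len(ca)


def iou(cm):
    """IoU per category: diagonal / (row sum + column sum - diagonal)."""
    rows, cols = _row_sums(cm), _col_sums(cm)
    return [cm[i][i] / (rows[i] + cols[i] - cm[i][i]) for i in range(len(cm))]


def mean_iou(cm):
    """MIoU: average of the per-category IoU values."""
    values = iou(cm)
    return sum(values) / len(values)


def frequency_weighted_iou(cm):
    """FWIoU: IoU weighted by each category's share of true pixels."""
    rows = _row_sums(cm)
    total = sum(rows)
    return sum(r / total * v for r, v in zip(rows, iou(cm)))


# Toy confusion matrix for [pore, peculiarity, crack, general surface].
cm = [
    [3, 1, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 11],
]
print(pixel_accuracy(cm), mean_pixel_accuracy(cm))
print(iou(cm), mean_iou(cm), frequency_weighted_iou(cm))
```

In this toy matrix the dominant fourth category pulls FWIoU above MIoU, which mirrors the gap between the two metrics on a class-imbalanced dataset.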
2.3. Chromatic aberration analysis

The chromatic aberration analysis was carried out with the effects of the concrete surface targets removed and only the general surface (background) retained. RGB color space cannot be directly converted to CIE Lab color space [51], and it is necessary to convert RGB color space to XYZ color space and then XYZ color space to CIE Lab color space. The conversion between RGB and XYZ is shown in Eqs. (12) and (13).

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.4124 & 0.3576 & 0.1805 \\ 0.2126 & 0.7152 & 0.0722 \\ 0.0193 & 0.1192 & 0.9505 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{12}$$

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 3.2406 & -1.5372 & -0.4986 \\ -0.9689 & 1.8758 & 0.0415 \\ 0.0557 & -0.2040 & 1.0570 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{13}$$

It can be seen that X = 0.4124R + 0.3576G + 0.1805B, where the sum of the coefficients is 0.9505, which is very close to 1. The range of RGB is [0, 255], so if the sum of the coefficients is equal to 1, then the range of X must also be [0, 255]. We can therefore consider modifying the coefficients equivalently so that the sum is equal to 1, thus achieving a mapping of XYZ and RGB in the same range. Then, the XYZ color space is mapped to the CIE Lab color space as follows:

$$\begin{cases} L^* = 116 f(Y/Y_n) - 16 \\ a^* = 500 \left[ f(X/X_n) - f(Y/Y_n) \right] \\ b^* = 200 \left[ f(Y/Y_n) - f(Z/Z_n) \right] \end{cases} \tag{14}$$

where

$$f(t) = \begin{cases} t^{1/3}, & t > \left( \dfrac{6}{29} \right)^3 \\ \dfrac{1}{3} \left( \dfrac{29}{6} \right)^2 t + \dfrac{4}{29}, & \text{otherwise} \end{cases} \tag{15}$$

A diagram of the conversion of the RGB color space to the CIE Lab color space is shown in Fig. 5.

3. Results and evaluation

3.1. Evaluation metrics of the classification model

Pixel accuracy (PA), mean pixel accuracy (MPA), intersection over union (IoU), mean intersection over union (MIoU), frequency weighted intersection over union (FWIoU), and class accuracy (CA) were applied to evaluate the automated concrete appearance pixel-wise semantic segmentation method. Among them, PA, MPA, MIoU and FWIoU are evaluation metrics for the whole sample set, while IoU and CA are evaluation metrics for a certain category. In the confusion matrix of this task, the rows refer to the true category to which the pixel belongs, i.e., the label value, and the columns refer to the predicted category for the pixel, i.e., the predicted value. In the following equations, $N_{ij}$ is defined as the number of pixels with true value of category $i$ and predicted value of category $j$. The number of categories $k$ is 4.

PA is the ratio of correctly predicted pixels to the total pixels, which is expressed in the confusion matrix as the ratio of the sum of the diagonal to the total number of pixels, as shown in Eq. (16):

$$\mathrm{PA} = \frac{\sum_{i=1}^{k} N_{ii}}{\sum_{i=1}^{k} \sum_{j=1}^{k} N_{ij}} \tag{16}$$

MPA, also known as mean class accuracy, is the average of the percentage of pixels correctly predicted for each category, which is expressed in the confusion matrix by summing the pixel accuracy of each category row by row and dividing by the number of rows (total number of categories), as shown in Eq. (17):

$$\mathrm{MPA} = \frac{1}{k} \sum_{i=1}^{k} \frac{N_{ii}}{\sum_{j=1}^{k} N_{ij}} \tag{17}$$

IoU of a category is the ratio of the intersection of the true and predicted values of the category to the union of the true and predicted values, which is expressed in the confusion matrix as the ratio of the diagonal value to (row sum + column sum − diagonal value), where the row and column are those in which the diagonal value is located, as shown in Eq. (18). MIoU is the average of all IoU, as shown in Eq. (19). FWIoU is an upgraded version of
MIoU, which brings in the frequency weights, as shown in Eq. (20). The weight is the ratio of the number of pixels that truly belong to each category (the row sum) to the total number of pixels.

$$\mathrm{IoU}_i = \frac{N_{ii}}{\sum_{j=1}^{k} N_{ij} + \sum_{j=1}^{k} N_{ji} - N_{ii}} \tag{18}$$

$$\mathrm{MIoU} = \frac{1}{k} \sum_{i=1}^{k} \frac{N_{ii}}{\sum_{j=1}^{k} N_{ij} + \sum_{j=1}^{k} N_{ji} - N_{ii}} \tag{19}$$

$$\mathrm{FWIoU} = \sum_{i=1}^{k} \frac{\sum_{j=1}^{k} N_{ij}}{\sum_{i=1}^{k} \sum_{j=1}^{k} N_{ij}} \cdot \frac{N_{ii}}{\sum_{j=1}^{k} N_{ij} + \sum_{j=1}^{k} N_{ji} - N_{ii}} \tag{20}$$

CA of a category is the ratio of the number of pixels correctly predicted for that category to the total number of pixels in that category, which is expressed in the confusion matrix as the ratio of the diagonal value to the sum of the row in which the diagonal value is located, as shown in Eq. (21):

$$\mathrm{CA}_i = \frac{N_{ii}}{\sum_{j=1}^{k} N_{ij}} \tag{21}$$

Fig. 10. An instance of pixel-wise semantic segmentation result of a precast concrete column.

3.2. Evaluation metrics of chromatic aberration

Standard Deviation of Chromatic Aberration (SDCA) was proposed to indicate the degree of chromatic aberration of the specimen itself. It is calculated as follows: First, the average Lab coordinate $(\bar{L}, \bar{a}, \bar{b})$ of all pixel points is calculated and defined as the average pixel point. Then the Euclidean distance between each pixel and the average pixel point is calculated. Finally, the standard deviation of the distances between all pixels and the average pixel point is calculated as SDCA. If the RGB color space is used, L, a, b can be simply replaced by R, G, B respectively. The calculation formula is as follows:

$$\mathrm{SDCA} = \mathrm{std}\left\{ \sqrt{(L_i - \bar{L})^2 + (a_i - \bar{a})^2 + (b_i - \bar{b})^2} \right\} \tag{22}$$

Standard Deviation from Design (SDD) was proposed to indicate the degree of variation from the chosen standard color card or the design color. It is calculated as follows: Take the Lab coordinate $(L', a', b')$ of the standard color card of the original design and define it as the standard design color. Then calculate the Euclidean distance between each pixel and the standard design color. The standard deviation of the distances between each pixel and the standard design color is then calculated as SDD. If the RGB color space is used, L, a, b can be simply replaced by R, G, B respectively. The calculation formula is as follows:

$$\mathrm{SDD} = \mathrm{std}\left\{ \sqrt{(L_i - L')^2 + (a_i - a')^2 + (b_i - b')^2} \right\} \tag{23}$$

3.3. Pixel classification results

The parameters of the proposed pixel-wise object detection model are determined by experiment, as shown in Table 1. In order to determine the best-matching preposed base network, the model was extended to different CNNs for comparison. Fig. 5 shows the results of the comparison of the extension of the model with 32 times up-sampling. The preposed CNNs include VGG-16, GoogLeNet, and AlexNet, all of which are classic in the image recognition and object detection field. As can be seen, CACN-VGG achieves the best performance in all four evaluation metrics, with CACN-GoogLeNet being the second and CACN-AlexNet the worst. The numbers of convolutional layers of GoogLeNet, VGG-16, and AlexNet are 22, 16, and 8, respectively, and the receptive fields are 907, 404, and 355, respectively. Comparing the experimental results of these three CNNs, it can be seen that the hierarchy and the receptive field of the network have an optimal interval. Comparing the results of MIoU and FWIoU, it can be seen that bringing in frequency weights has significantly improved on the original MIoU, which makes further distinction between different algorithms and proves the effectiveness of frequency weights on a category-imbalanced dataset.

Based on the above experimental results, VGG-16 was chosen as the preposed network for this model. Fig. 6 shows the pixel classification results by using CACN-32s, CACN-16s, and CACN-8s, respectively. It can be seen from the four evaluation metrics that the skip-fusion structure (CACN-16s, CACN-8s) significantly improves the performance of the model compared with CACN-32s, which only considers the last layer of the feature maps. CACN-8s achieves the best results in most of the evaluation metrics (PA, MPA, FWIoU), and is only slightly lower than CACN-16s in MIoU. The proposed model achieved the highest PA of 98.43 %, MPA of 91.33 %, MIoU of 83.89 %, and FWIoU of 96.95 %. The above results prove the effectiveness of the skip-fusion structure, i.e., adding shallow geometric information to deep semantic information could help achieve finer cuts on object boundaries, thus significantly
9
100
Pore 100
Peculiarity
95 Crack
CA (Class Accuracy)
90
90
85
80
80
Pore
Peculiarity
75 Crack
70 70
0 10000 20000 30000 4000 0 500 00 60 000 0 10000 20000 30000 40000 50000 60000
Light intensity / Lux Light intensity / Lux
(a) (b)
Fig. 11. Robustness of object detection with light intensity variance: (a) IoU- Light intensity, (b) CA-Light intensity.
100
Pore
Peculiarity 100
95
Crack
90 95
CA (Class Accuracy)
85
90
80
85
75 Pore
Peculiarity
80 Crack
70
0 20 40 60 80 100 0 20 40 60 80 100
Surface moisture content /% Surface moisture content /%
(a) (b)
Fig. 12. Robustness of object detection with variation of light intensity: (a) IoU-Surface moisture content, (b) CA-Surface moisture content.
improving the overall performance of pixel classification.

Batch size refers to the number of samples learned in one iteration, i.e., per weight update. Too small a batch size triggers error back-propagation and weight updates before effective feature information has been learned, so the algorithm may fail to converge. On the other hand, too large a batch size leads to infrequent iterative updates and slow parameter correction. To explore the best training performance, the batch size was adjusted to improve the IoU and CA of each category. Fig. 7 shows the pixel classification results with different batch sizes after 400 epochs in CACN-8s. The classification results improve as the batch size increases, because a more accurate error descent direction produces less training oscillation. Performance peaks at a certain batch size and then gradually declines, because once the batch size exceeds a certain level, the direction of gradient descent no longer changes and training falls into a local extremum (i.e., a local depression of the error surface). The optimal batch sizes for pore, peculiarity, and crack are 32, 32, and 48, respectively, from the perspective of IoU, and 32, 32, and 32, respectively, from the perspective of CA. With other conditions being the same, a larger batch size can improve the parallelization efficiency of large matrix multiplication, thus improving memory utilization, and can reduce the number of iterations needed to traverse the entire dataset once (1 epoch), thus reducing the training time.

The pixel-wise semantic segmentation results also differ with batch size, as in the instance given in Fig. 8. When the batch size is too small (2–12), the detection accuracy is low and the recognition of object edge shapes is poor. As the batch size increases, the recognition results gradually approach the real situation and reach the optimum at 32–40. When the batch size is too high (64–96), the model tends to fall into a local optimum and the recognition accuracy is reduced. Combining the above results, the batch size was set to 32.

3.4. Pixel-wise semantic segmentation results

Since the automated cropping, semantic segmentation, re-splicing restoration (to the input size), and output analysis of detection results (including surface pore area ratio, maximum surface pore diameter, maximum surface crack width, and length) were compiled within the proposed model, the optimized and fixed model has no size requirements for input images and requires no complex operations from users, and thus can be called "automated". Some unit results of the proposed pixel-wise semantic segmentation model are shown in Fig. 9. Moreover, the instance of a 1440 × 8440 input precast concrete column image captured by UAV and the corresponding output images are shown in
Fig. 10. Whether it is for surface pores or cracks, design seams or formwork seams, or even peculiarities such as stains, the color cards, and the scaleplate on the concrete surface, the output results can be directly perceived.

Among the samples with unsatisfactory detection results, the main failures were concentrated on the peculiarities. This is because the peculiarity targets contain many different styles of pixels, which are difficult to classify with very high accuracy; this is further discussed in Section 4. Follow-up studies could consider further refinement of the types of peculiarity targets or use instance segmentation, and also take inter-unit continuity into consideration.

Based on the results of the semantic segmentation, the corresponding surface pore area ratio, maximum surface pore diameter, maximum surface crack width, and length were automatically calculated. After more than 150 instances of experimental verification, the relative deviations of these four parameters between the results of the automated analysis and the manual measurements were not higher than 2.8 %, 1.2 %, 5.1 %, and 2.1 %, respectively. For the instance image given above, the whole automatic detection and analysis process took less than 3 s. This is efficient and accurate for pixel-wise analysis of large-size images.

Unlike other multiple classification models, the approach proposed in this paper aims not only to segment the cracks as accurately as possible, but also to view and analyze various targets equally (as the modified learning algorithm proposed in Section 2.2.2 aims to do). Among them, surface pore, as an independent category, can provide a quantitative analysis of the concrete surface, and is also an important interfering term for crack segmentation. In addition, design seams, formwork seams, and peculiarities such as stains, color cards, and the scaleplate all need to be identified to avoid their interference with chromatic aberration analysis of the general surface. Furthermore, unlike previous studies, the environmental conditions at the time of image collection were taken into consideration. In particular, light intensity and surface moisture content have non-negligible impacts on semantic segmentation, thus affecting the quantitative analysis of cracks, pores, etc., and that of chromatic aberration, which is discussed in Section 4.

Fig. 13. Instances of images of the same surface with variation of surface moisture content: (a) captured by UAV, (b) captured by scanner.

4. Discussion

In addition to identifying inter-class variations, intra-class variations are further discussed to analyze the robustness of the proposed model. One of the key reasons for using a convolutional neural network in this task is to reduce the impact of image quality fluctuation on analysis results due to differences in image capture conditions, such as equipment, environmental conditions, individual differences in a manual process, etc. The dataset of this task contains images captured by UAV under different parameters (image size, ISO, exposure time, f-number, and distance from the surface) and by scanner under different scanning resolutions and speeds; under different environmental circumstances (temperature, wind speed, wind direction, light intensity, angle of light exposure, and surface moisture content); and of concrete with different strengths, surface shapes, and colors. Owing to the richness of the dataset, the proposed model already has considerable robustness in terms of the principle of deep learning.

The two most influential and non-negligible effects (light intensity and surface moisture content) were identified experimentally. The robustness of the proposed model with respect to these two effects was verified as described below. The validation samples were not in the training set, and there were no fewer than 20 per subgroup.
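As a minimal sketch of how the metrics in Eqs. (18)–(20) and the class accuracy (CA) follow from a confusion matrix, the Python function below computes them from a k × k matrix whose entry N_ij counts the pixels of true class i predicted as class j. The function name, the dictionary keys, and the toy matrix are hypothetical illustrations rather than the authors' implementation. PA and MPA use their standard definitions (overall diagonal fraction, and mean of CA), which is an assumption here because they are defined outside this section.

```python
def segmentation_metrics(conf):
    """Compute segmentation metrics from a confusion matrix.

    conf[i][j] = number of pixels of true class i predicted as class j.
    Assumes every class occurs at least once, so no denominator is zero.
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    row_sum = [sum(conf[i]) for i in range(k)]                       # pixels truly in class i
    col_sum = [sum(conf[j][i] for j in range(k)) for i in range(k)]  # pixels predicted as class i
    diag = [conf[i][i] for i in range(k)]                            # correctly predicted pixels

    # Eq. (18): per-class intersection over union
    iou = [diag[i] / (row_sum[i] + col_sum[i] - diag[i]) for i in range(k)]
    # CA: diagonal value divided by the sum of its row
    ca = [diag[i] / row_sum[i] for i in range(k)]

    return {
        "PA": sum(diag) / total,       # standard pixel accuracy (assumed definition)
        "CA": ca,
        "MPA": sum(ca) / k,            # mean of CA (assumed definition)
        "IoU": iou,
        "MIoU": sum(iou) / k,          # Eq. (19)
        "FWIoU": sum(row_sum[i] / total * iou[i] for i in range(k)),  # Eq. (20)
    }


# Hypothetical 3-class confusion matrix (100 pixels in total).
toy = [[50, 2, 3],
       [4, 30, 1],
       [5, 0, 5]]
metrics = segmentation_metrics(toy)
```

In this toy case the rare third class has a much lower IoU than the others, so MIoU is pulled down more strongly than FWIoU, which weights each class by its pixel frequency.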
Fig. 14. Comparison of the robustness of the proposed chromatic aberration analysis method (CACN-Lab) and RGB mode under different light intensity and surface moisture content: (a) SDCA - Light intensity; (b) SDD - Light intensity; (c) SDCA - Surface moisture content; (d) SDD - Surface moisture content. [Plots: y-axes, mean deviation of SDCA or SDD from the manually measured reference; x-axes, light intensity / Lux (0 to 60,000) or surface moisture content / % (0 to 100); lines labelled CL, UCL, and LCL for both RGB and CACN-Lab.]
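The control limits drawn in Fig. 14 follow the rule given in Section 4.3: a centre line at the mean of the subgroup means, with upper and lower limits three standard deviations away. The sketch below illustrates that rule under stated assumptions: the function name and the subgroup values are hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed because the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def control_limits(subgroups):
    """Return (LCL, CL, UCL) for a list of subgroups of deviations.

    CL is the mean of the subgroup means; sigma is the standard deviation
    of the subgroup means (sample form, an assumption); limits are CL -/+ 3*sigma.
    """
    subgroup_means = [mean(group) for group in subgroups]
    cl = mean(subgroup_means)
    sigma = stdev(subgroup_means)
    return cl - 3 * sigma, cl, cl + 3 * sigma


# Hypothetical deviations of SDCA from the manual reference, one list per subgroup.
deviations = [[0.1, 0.3], [0.0, 0.2], [0.2, 0.4]]
lcl, cl, ucl = control_limits(deviations)

# Subgroup means lying outside the limits would indicate an unstable process.
outliers = [m for m in (mean(g) for g in deviations) if not lcl <= m <= ucl]
```

A method is more robust in the sense used here when its centre line lies closer to zero and its control limits span a narrower band.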
4.1. Model robustness on light intensity

The robustness metrics (IoU and CA) of the object detection of the proposed model with light intensity variance are shown in Fig. 11. The relative deviations of IoU of pore, peculiarity, and crack are no more than 2.5 %, 0.7 %, and 1.2 %, respectively, and the relative deviations of CA are no more than 1.1 %, 3.2 %, and 2.4 %, respectively. The surface of concrete shows different states under different light intensities. At first, as the light intensity increases, the difference between the objects to be detected and the general concrete surface gradually increases. After reaching a certain level, it no longer increases or even decreases. The light intensity during image capture therefore has an optimal interval for object detection. In this case, it is 5000–20,000 Lux.

4.2. Model robustness on surface moisture content

The robustness metrics (IoU and CA) of the object detection of the proposed model with surface moisture content variance are shown in Fig. 12. The relative deviations of IoU of pore, peculiarity, and crack are no more than 4.8 %, 15.5 %, and 6.0 %, respectively, and the relative deviations of CA are no more than 3.5 %, 17.8 %, and 0.7 %, respectively. As the surface of concrete gradually dries (as shown in Fig. 13), the color of concrete becomes lighter, and the difference between objects and the general surface increases. In a moist state, surface pores and cracks contain more water due to their larger specific surface area. The wet air is subject to gravity, so the drying of the concrete surface is a non-uniform process. The water contained in or around objects makes them darker, which adversely affects the detection. Among the categories, detection of peculiarity is affected most and fluctuates most because it contains many different kinds of targets. The detection results in the air-dry state are closest to the truth.

4.3. Robustness of chromatic aberration analysis

To further analyze the robustness of the proposed chromatic aberration analysis, a quality control chart was used. With more than 10 factor groups, Fig. 14 shows the XBar-S control charts of SDCA and SDD for the proposed method and for directly using RGB without target identification to eliminate defective objects and peculiarities, under different light intensity and surface moisture content. The vertical coordinates are the average deviation of SDCA and SDD from the manual measurements for each subgroup. The blue dashed line is the specification limit (SL_CACN) of the proposed method, and the green dashed lines are its upper and lower control limits (UCL_CACN, LCL_CACN). The red dashed line is the specification limit of using RGB color mode without removing defective objects and peculiarities (SL_RGB), under the same
calculation method as SDCA, and the black dashed lines are its upper and lower control limits (UCL_RGB, LCL_RGB). The specification limits (labelled CL in Fig. 14) and control limits are as follows:

\[ SL = \bar{\bar{X}} \tag{18} \]

\[ UCL = CL + 3\sigma \tag{19} \]

\[ LCL = CL - 3\sigma \tag{20} \]

where σ is the standard deviation of the subgroup means \bar{X}, also known as the process standard deviation.

As can be seen from Fig. 14, whether in terms of the deviation of the control limits from 0, the range between the upper and lower control limits, or the fluctuation of the data, the proposed CACN-Lab method is more robust than the RGB mode under different light intensity and surface moisture content.

5. Conclusions

An end-to-end concrete appearance analysis method based on pixel-wise semantic segmentation is proposed to detect multiple kinds of targets on concrete surfaces (pore, crack, peculiarity, and general surface) and to analyze the chromatic aberration. A UAV and an orbital scanner were used to collect 2463 raw images under different equipment parameters and environmental conditions. After pre-processing, the unit size was set to 480 × 352 and the pixel locations of all targets and their corresponding labels were specified. The training, validation, and test sets were split at a sample ratio of 8:1:1. Based on the weights of different targets and cross entropy, the objective function was optimized to deal with the category imbalance problem, i.e., empirical risk. As for structural risk, L2-regularization was used to address the overfitting problem. S-fold cross-validation was used during the training procedure, and evaluation and testing were carried out after each epoch to determine the optimal model across iterations and to guide model optimization and tuning.

The optimal batch size and learning rate were determined experimentally as 32 and 10^-4, respectively. VGG-16, GoogLeNet, and AlexNet were used as the preposed base CNNs to determine the best-matching replaceable parts of the network structure. CACN-VGG with an 8 times up-sampling skip-fusion structure performed best. The proposed model achieved the highest PA of 98.43 %, MPA of 91.33 %, MIoU of 83.89 %, and FWIoU of 96.95 %. According to the semantic segmentation results, whether for surface pores or cracks, design seams or formwork seams, or even peculiarities such as stains, the color cards, and the scaleplate on the concrete surface, all targets could be correctly recognized. Based on the results, the corresponding surface pore area ratio, maximum surface pore diameter, maximum surface crack width, and length were automatically calculated, and the relative deviations between the results of the automated analysis and the manual measurements were not higher than 2.8 %, 1.2 %, 5.1 %, and 2.1 %, respectively. The proposed model has no requirement on the input image size, and the whole automatic detection and analysis process for a 1440 × 8440 raw image took less than 3 s.

The two most influential and non-negligible effects, light intensity and surface moisture content, were identified experimentally, and the robustness of the proposed model was verified. With light intensity variance, the relative deviations of IoU of pore, peculiarity, and crack are no more than 2.5 %, 0.7 %, and 1.2 %, respectively, and the relative deviations of CA are no more than 1.1 %, 3.2 %, and 2.4 %, respectively. With surface moisture content variance, the relative deviations of IoU of pore, peculiarity, and crack are no more than 4.8 %, 15.5 %, and 6.0 %, respectively, and the relative deviations of CA are no more than 3.5 %, 17.8 %, and 0.7 %, respectively. Surface moisture content has a negative impact on the identification of peculiarity in particular and needs to be controlled during the image capture stage.

Two evaluation metrics of chromatic aberration were proposed to indicate the degree of chromatic aberration of the specimen itself and the degree of variation from the chosen standard color card or the design color. The chromatic aberration analysis was carried out with the effects of the concrete surface targets removed and only the general surface (background) retained. The proposed CACN-CIE chromatic aberration analysis method is more robust than the RGB mode under different light intensity and surface moisture content.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

The authors are grateful to the Editor and the reviewers for their constructive comments and valuable suggestions to improve the quality of the article.

