
Crack identification from concrete structure images using deep transfer learning

Amena Qadri Syed
Department of Computer Science
Birla Institute of Technology and Science Pilani, Dubai Campus
Dubai, UAE
f20170027@dubai.bits-pilani.ac.in

J. Angel Arul Jothi
Department of Computer Science
Birla Institute of Technology and Science Pilani, Dubai Campus
Dubai, UAE
angeljothi@dubai.bits-pilani.ac.in

Anusree K
Department of Computer Science
Birla Institute of Technology and Science Pilani, Dubai Campus
Dubai, UAE
p20180904@dubai.bits-pilani.ac.in

2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP) | 978-1-6654-4290-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/AISP53593.2022.9760670

Abstract—Early crack identification in civil structures is an essential task to prolong the life of the structures and to ensure public safety. This research aims to develop an automated crack identification system using deep learning models and the SDNET2018 dataset. Image augmentation is applied to overcome the effect of unbalanced data. Deep pre-trained models, namely VGG16, InceptionV3, ResNet-50, ResNet-101 and ResNet-152, are trained and tested using the cracked and uncracked images of decks and pavements from the dataset. The experimental results show that the classification models obtained using transfer learning on the cracked and non-cracked pavement and deck image dataset have accuracy values of 70.59%, 60.31%, 71.93%, 75.40%, and 74.77% for the VGG16, InceptionV3, ResNet-50, ResNet-101, and ResNet-152 pre-trained models respectively.

Keywords—Crack detection, data augmentation, machine learning, deep learning, transfer learning, convolutional neural network

I. INTRODUCTION

Damage to an engineering structure can be defined as any change to the material and/or geometric properties of its structural system. The breaking or fracturing of concrete results in its complete or incomplete separation into two or more parts, and this separation is called a crack. The maintenance of concrete structures ensures their aesthetics, durability, and public safety. Early crack identification is of high importance and is a matter of prime concern for inspectors monitoring structural integrity.

In the past, manual inspection was performed to identify cracks in structures. This took into account information about the structure such as its age, design, details about the initial observation of cracks on it, deflection, staining, water leakages, maintenance details, etc. Thus, manual inspection required a lot of time, close observation, regular visits by the inspectors, well-focused inspection, and manual recording of details. Human error and negligence are unavoidable factors which may lead to deterioration of the structures through long-term accumulation of damage that is not detected early, which may prove to be a threat to public safety.

Development in the field of Internet of Things (IoT) and machine learning (ML) has enabled autonomous structural health monitoring (SHM) systems to be developed and implemented in a cost and labour effective manner [10]. Through these systems it is possible to regularly monitor and collect image data regarding the health of civil structures. The images collected could be analyzed by an automatic crack identification and detection algorithm which could warn the concerned personnel in case of any potential threats. Moreover, the identified cracks can be treated with the necessary measures to reduce damage to the structure, thereby increasing its life and enhancing public safety.

Research in the past on automated crack detection from images incorporated image processing and machine learning techniques to facilitate reliable and accurate detection and prediction of cracks from concrete structures. Recently, deep learning, a branch of machine learning, has revolutionized the way computer vision problems are solved. Deep learning focuses on the learning of data representations based on the human brain's perception of information through the way neurons interact. Though deep learning has been applied to solve several problems in various domains, limited research has been done to use this technique to identify if an image has a crack.

This work aims to develop an automated crack identification system by implementing popular transfer learning CNN models on the SDNET2018 image dataset of decks and pavements. Transfer learning has recently been a popular technique for computer vision tasks, as it enables the development of complex neural networks with ease. In transfer learning, a pre-trained deep neural network's knowledge is used to solve a task which is similar to the task the pre-trained architecture was initially trained to achieve [10]. The automated system developed in this work classifies an input image as cracked or uncracked. This study also compares the performance of the various transfer learning CNN models to identify the best performing model, which can be leveraged in further research to better support the inspection of cracked structures.

The structure of this manuscript is as follows: In Section II, we present the Literature Review, followed by the explanation of the dataset in Section III. The details of the convolutional neural networks and transfer learning through pre-trained models are presented in Section IV. The proposed method is given in Section V. In Section VI, the evaluation metrics are discussed. The results obtained from the research and the discussion are presented in Section VII. The conclusions drawn from the study are presented in Section VIII.

II. LITERATURE REVIEW

The advent of data driven methods has substantially contributed to the progress in the domain of civil engineering and inspection methods through different techniques to utilize the data effectively. Initial work on the image data focused on

Authorized licensed use limited to: University of Leeds. Downloaded on October 10,2023 at 12:52:55 UTC from IEEE Xplore. Restrictions apply.
image processing techniques that included pre-processing of image data for intensity enhancement, such as histogram-based grayscale transformations [4], adaptive grayscale transformation, Canny edge detection and morphological operations [3], logarithmic transformation [5], and thresholding methods such as the OTSU algorithm, which yield well-thresholded results with clearly distinguishable cracked regions [6,7].

Initial research utilizing image data leveraged different techniques to obtain the image features and solve the crack detection problem through basic image processing methods. In [1], the images were resized to focus only on the regions of interest, on which edge detection was performed using the Prewitt kernel, which generates a gradient approximated with central differences, followed by the application of some element-wise operations. The gradient matrix so obtained held information that can be used to threshold edges based on a certain criterion. Mean detection was performed on the previously detected edges, whose location was made clearer through a mask dilation, for better thresholded image results. Unwanted points that were not cracks were removed through orientation kernels. The angle and width of the crack were then determined through the use of angle kernels.

In [2], the filtering and thresholding process was carried out in a different manner by first applying a gray level transformation on the pavement images to deal with the inhomogeneous gray level in the background as well as the noise, with histogram analysis and identification of the background signal used to segment the background from the original image, followed by denoising through a median filter for the elimination of isolated noise points. The spatial domain processing has the histogram as its foundation, which is leveraged for effective intensification of images, transforming the image gray level as per the specified gray level distribution through histogram equalization.

Real-time image data was used for crack detection in [3] through unmanned aerial monitoring, where images were captured using a camera together with a Raspberry Pi model 3 and ultrasonic sensors. The images were processed by applying a grayscale adaptive threshold to extract the region of interest, inverted, and passed through a median filter for denoising. Image morphology techniques followed by a fitted contour and skeletal morphology were then applied, after which the region of the crack was detected and the width of the crack was measured.

Histogram thresholding is put to use for the detection of potential cracks in concrete surfaces, wherein line emphasis filters are used to remove blob-like and sheet-like noise and to feature the line structure corresponding to a crack. Moving average filters are then applied to smooth the histogram for further analysis, namely the detection of significant peaks in the histogram based on the offset distance and crossover index, which are dynamic parameters. The result is put through image binarization as per a set threshold [4].

Pavement images collected using a camera were analyzed for crack detection, where the images were pre-processed through logarithmic transformations that map a shorter range of values to a wider range of values. The darker pixels are thereby expanded and the brighter pixels are compressed, enhancing the image's illumination. This is followed by the use of bilateral filters for image smoothing, where the random noise of the images is removed without loss in the precision of the edges. The Canny algorithm detects the edges, after which morphological filtering is applied. Feature extraction is performed by calculating the horizontal and vertical projective integrals, which serve as the information used by the decision tree classifier to identify the crack regions. The classification algorithm used is the C4.5 decision tree algorithm, implemented using the Weka suite, to classify the images into the classes Crack and Non-Crack [5].

Adaptive linear grayscale image transformation is applied to deal with the difficulties of low levels of colour saturation and contrast and the high complexity of the composition of the background. The images are denoised and their geometric features are extracted. The OTSU segmentation algorithm is applied to these images for crack segmentation. The suspected crack images are thus obtained and features are extracted from them for a classification task. The features measured for classification are the length-width ratio of the rectangle circumscribed in a crack image, the proportion of crack pixels in an image, the proportion of points of curvature mutation, the image compactness, computed through an equation incorporating crack area and crack perimeter, and the ratio of fitting difference, where a series of selected points from a target image are fit to a curve. An SVM whose parameters are optimized through cross validation is used to classify crack and non-crack images, which are equally divided between the classes in the training and test sets [6].

In [7], a similar approach is followed, where the OTSU algorithm performs the threshold segmentation, followed by feature extraction using the GLCM method and classification of the extracted features by an SVM. This study additionally compares the performance of 5 different SVM kernel functions, namely the dot, radial, polynomial, anova, and neural kernels, for the prediction of the Crack and No Crack classes to identify the best performing SVM. The study concludes that the Anova kernel with a penalty factor of 0.5 best classifies the cracked and non-cracked structures.

In [9], methods of efficient data collection through sensors and other devices were discussed such that the data initially collected does not require more time for pre-processing. The paper emphasizes the development of low cost and robust networks for monitoring the structural health of engineering structures. This information helps in effective feature extraction that can be used for prediction of the crack and its damage assessment as well as for predicting the future lifetime of the structure.

Crack detection using traditional image processing methods was not efficient enough to provide a clear understanding and distinction of the cracked and uncracked images. This led to research on methods that could identify and extract features of the images easily. CNNs were then leveraged to extract features from the images, following which the classification layer of the CNN assigns the images to a class based on the extracted features. Image feature extraction and classification is a complex process requiring extremely long training time and an enormous dataset to build on the results. The idea of transfer learning, which had been leveraged for object recognition tasks, was implemented in [11] to classify general objects of the GHIM10K and CalTech256 datasets using a fine-tuned pretrained VGG19 architecture followed by an SVM classifier.
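
The logarithmic transformation used for pre-processing in [5] can be illustrated with a minimal sketch. It assumes 8-bit grayscale intensities and the common form s = c * log(1 + r), with c chosen so that the maximum intensity maps to itself; the function names are illustrative and are not taken from [5].

```python
import math


def log_transform(pixel, max_value=255):
    """Map one intensity in [0, max_value] through s = c * log(1 + r)."""
    # c is scaled so that max_value is mapped back onto max_value.
    c = max_value / math.log(1 + max_value)
    return round(c * math.log(1 + pixel))


def log_transform_image(image, max_value=255):
    """Apply the mapping to every pixel of a grayscale image (list of rows)."""
    return [[log_transform(p, max_value) for p in row] for row in image]


sample = [[0, 10, 50], [100, 205, 255]]
print(log_transform_image(sample))
```

With this mapping the dark range 0 to 50 is stretched over most of the output range, while the bright range 205 to 255 is squeezed into a few levels, which is the expansion and compression effect described above.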

Transfer learning gained a lot of popularity and was also put to use for a similar classification task for assuring the quality of welding structures. A pre-trained MobileNet architecture followed by a softmax classification layer, optimized using Global Average Pooling and DropBlock technology, proved to perform much better than traditional convolutional neural networks in the classification of defective welding structures [12].

The ability of pre-trained models to solve real world problems of varying nature inspired researchers to apply them further in the field of civil engineering and inspection. In [10], pre-trained AlexNet, VGGNet13 and ResNet18 were used on manually collected crack structure images to classify crack images and detect regions of cracks in those images, following which a comparative analysis of these three neural networks was made to identify the best performing pre-trained architecture for the crack detection task.

III. DATASET

The SDNET2018 dataset [8], which contains RGB images of cracked and uncracked concrete structures (decks, pavements, and walls), is utilized in this work. Specifically, we use only the deck and pavement images from the dataset for this study. The dataset contains 2025 cracked images and 11,595 uncracked images of decks, and 2608 cracked and 21,726 uncracked images of pavements. Thus, the total numbers of cracked and uncracked images used in this study are 4633 and 33,321 respectively. All images are in jpeg format, captured by a 16 MP Nikon digital camera. The images in the dataset are of the dimensions 256x256 pixels. Figure 1 shows a few sample cracked and uncracked images of decks and pavements from the dataset.

Fig. 1. Cracked and non-cracked pavement and deck images from the dataset.

IV. BACKGROUND

A. Convolutional Neural Network

Convolutional Neural Network (CNN) is a type of deep learning architecture that helps in classification problems. It has revolutionized the way images can be processed and analyzed. CNNs are composed of convolutional layers, max pooling layers, fully connected layers, and Softmax layers. Convolutional layers of a CNN use filters that have learnable weights. The filters have a width, height and depth arrangement of neurons to accept the input image or tensor, perform the convolution operation, and apply an activation function which produces the output tensor. The most preferred activation function in a CNN is the ReLU activation function. This introduces a non-linearity in the CNN for it to be able to learn the real world data that has non-negative linear values.

The convolution layer helps the CNN to extract features from the images. The convolutional layer is followed by the pooling layer (usually a max pooling layer) that down-samples the output obtained from the convolutional layer so that the cost of computation and the probability of overfitting are reduced. There can be several alternating convolutional and pooling layers in a CNN. A fully connected layer, which takes the form of a flattened vector, follows the last pooling layer in a CNN. The softmax layer takes the features from the fully connected layer to calculate the probabilities of the individual classes through a normalized exponential function. Finally, the classification result is the class which has the highest probability. A basic architecture of a CNN includes an input layer, multiple convolution layers followed by pooling layers, fully connected layers, and finally the output layers. The convolution layers serve as image feature extractors, with the features serving as a basis for the resultant classification.

B. Pre-trained CNNs and Transfer Learning

Further studies on CNNs led to the development of deeper neural networks called pre-trained CNNs. A pre-trained CNN is a deep learning model that has been trained on a large number of images belonging to some domain to solve some classification problem. Some of the popular pre-trained CNNs are the AlexNet architecture, the VGG architectures, the Inception architectures, the ResNet architectures and the DenseNet architectures. These networks along with their pre-trained weights are available to the research community [13]. Each of these models is pre-trained on the ImageNet dataset, which is commonly used for research in the computer vision domain and consists of 1,281,167 training images along with 50,000 validation and 100,000 test images in total, with 1,000 categories.

Despite being highly efficient in solving several real-world classification problems, CNNs suffer from requiring large amounts of training data and long training time along with high computational power. The requirement of large training data and long training time is countered by the use of transfer learning. It is the application of the knowledge gained by a CNN while solving a task to solve a similar task through a pre-trained network, generating results faster and thereby reducing the training time. Transfer learning has the main aim of transferring knowledge from the source to accomplish a target. Different features are extracted in different layers of a deep learning model. Transfer learning incorporates tasks where domain adaptation is required, in which the distribution of data at the source is shifted to the representation required by the target domain; domain confusion, which is facilitated through transferring features across domains, wherein the representations of the source domain and the destination domain are forced to be similar, which as a result confuses the domain; and classification problems with a different number of target classes as per the task to be solved. This approach helps in solving various real-world problems by facilitating faster training along with good efficiency. The application of transfer learning models
on the test data yields accurate results, making the models reliable.

V. METHODOLOGY AND IMPLEMENTATION

In this work, the analysis of image characteristics through pre-trained CNNs and transfer learning has been put to use for the classification of concrete structure images as cracked and uncracked. The proposed method is detailed in Fig. 2.

Fig. 2. Methodology followed for crack and non-cracked image classification

A. Data Pre-processing

The images from the dataset are read and resized to 128 x 128 pixels. It can be observed from the SDNET2018 dataset that the number of cracked images is less than the number of uncracked images for both decks and pavements. To resolve the issue of the unbalanced dataset, image augmentation techniques are used to increase the number of cracked images in the dataset. The cracked images are augmented such that each original image is rotated to a random angle between 0 and 180 degrees as well as between 0 and -180 degrees, and flipped horizontally and vertically. Each of these augmented images is dynamically processed at runtime. After data augmentation the images are split into two sets: a training set and a testing set. The images are distributed such that the training set has 32,400 images and the testing set has 8,208 images. The training images consisted of 16,200 cracked images and 16,200 uncracked images. The testing images had the composition of 4,204 cracked images and 4,204 uncracked images.

B. Pre-trained CNNs Used

In this work, the pretrained VGG16 architecture, InceptionV3 architecture, ResNet-50 architecture, ResNet-101 architecture, and ResNet-152 architecture are used for the classification of cracked and non-cracked surfaces. These architectures are considered due to their wide use in the transfer learning studies of the past.

The VGG16 architecture is a CNN architecture that has 16 layers, composed of 13 convolution (conv) layers and 3 fully connected layers, hence the name VGG16. It has approximately 138 million parameters. The conv layers have a 3x3 receptive field and a stride of 1. The conv layers use padding so that the output has the same spatial resolution as that of the input after convolution. Max pooling layers of 2 x 2 with a stride of 2 are used after certain conv layers. Finally, two fully connected layers having 4096 units each, followed by a softmax layer having 1000 units (as it is trained to classify the 1000 classes in the ImageNet dataset), are present. The activation function used in the conv layers is ReLU, due to its efficient computation for faster learning and its reduction of the vanishing gradient problem [13].

InceptionV3 is a deep learning network having 48 layers and fewer parameters than VGGNet, and is Google's third edition of the Inception CNN. The Inception CNN, initially introduced with the name GoogLeNet, is popularly known today as InceptionV1 and was made of inception modules. The InceptionV1 architecture was refined through the inclusion of batch normalization in InceptionV2. This was followed by the addition of the ideas of factorization, auxiliary classifiers and grid size reduction in the third iteration. Different approaches to factorization have been incorporated in InceptionV3. First, factorization into smaller convolutions was implemented, which replaces a bigger convolution by smaller convolutions. Another approach to factorization is through asymmetrical convolutions, wherein an n x n convolution can be achieved by a 1 x n convolution followed by an n x 1 convolution. Factorization helped in the reduction of the number of parameters without causing the network efficiency to reduce [14].

The ResNet architectures are residual networks which are extremely deep and have convolutional layers and identity blocks. It was the first network with the concept of the skip connection, which helps in mitigating the vanishing gradient problem by letting the gradient flow through an alternate path. ResNets of variable sizes can be constructed. The model is then able to ensure that the performance of the higher layers is at least as good as that of the layers in the lower portion. They are designed to perform 3x3 convolution in all convolutional layers, and the number of filters increases with the depth of the network; it varies from 64 to 2048 (in ResNet-50, ResNet-101, and ResNet-152). Only one 3x3 max pooling layer, with a stride of 2, is present in all the different ResNet models, applied after the first layer. Every model ends with an average pooling layer that replaces the fully connected layer, which helps reduce the model complexity as this layer has no parameters to optimize. This is followed by an output layer with 1000 neurons to classify the 1000 classes of the ImageNet dataset, with the Softmax activation function applied to it to provide the probability of the class the image belongs to. A ResNet-50 pretrained model is 50 layers deep, ResNet-101 is 101 layers deep, and ResNet-152 is 152 layers deep.

C. Transfer Learning the models

The pre-trained CNNs with ImageNet weights are loaded and their fully connected layers are left out so that the models can be modified for solving the task of concern as per the requirements. Each layer of the pre-trained CNNs has already been trained, therefore the layers are explicitly made non-trainable to implement transfer learning effectively. Then, the output layer of the pre-trained CNNs is flattened to obtain a
single dimension. This is followed by a fully connected layer having 128 neurons along with the ReLU activation function. A dropout layer with a 50% dropout rate follows this arrangement and is succeeded by the final output layer, which is again a fully connected layer. The final output layer has 2 neurons to represent the cracked and the non-cracked classes with the Softmax activation function. The properties applied to the dataset used in this paper are selected based on the research results of the papers studied and other online sources that have utilized these values for classification of other image data. These properties, called the hyperparameters, play a crucial role in the training process. The hyperparameter values are set the same for each transfer learning model implemented in this paper to compare their performances based on the evaluation metrics.

VI. MODEL IMPLEMENTATION AND EVALUATION

All the models in this work are implemented in Python with the stochastic gradient descent (SGD) optimizer, 100 epochs, a learning rate of 0.00015 and binary cross entropy loss. An Early Stopping condition was used with a patience value of 10, where the validation loss is monitored and checked for its minimum value. A Model Checkpoint is also set to capture the model with the maximum validation accuracy.

The performance of the pre-trained deep learning models leveraged for the classification of cracked and uncracked images through transfer learning is evaluated through a confusion matrix, from which the accuracy, precision, recall, and F1-score are obtained. Since two classes (cracked and non-cracked) are considered by this study, we obtain a 2x2 confusion matrix. Let true positives be the instances whose predicted and actual class labels are cracked, true negatives be the instances whose predicted and actual class labels are non-cracked, false positives be the predicted cracked images that are actually non-cracked images, and false negatives be the predicted non-cracked images that are actually cracked images.

The accuracy is a measure of how accurately the model has made its predictions for the task. Precision is another metric, which computes how many correctly predicted cracked images were present out of all cracked images predicted by the model. Precision values showcase a model's reliability. Recall is the number of correctly predicted cracked images divided by the sum of the correctly predicted cracked images and the wrongly predicted non-cracked images which were actually cracked images. Recall values are necessary to not let go of an actual cracked image, even if it may be at the cost of raising a false alarm. The F1-score is calculated as the harmonic mean of the recall and precision values, thereby reducing the need for the use of multiple metrics. It is used to understand the impact of high or low precision values along with high or low recall values obtained for a classification model. A higher value of F1-score implies high precision and high recall values, implying a better performance.

VII. RESULTS AND DISCUSSION

The models VGG16, InceptionV3, ResNet-50, ResNet-101, and ResNet-152 are compared based on accuracy, precision, recall, and F1-score. TABLE I presents the confusion matrices of all the pre-trained models used. The class '0' represents the uncracked image class and the class '1' represents the cracked image class.

TABLE I. CONFUSION MATRICES WHERE 0 IS FOR CRACKED AND 1 IS FOR NON-CRACKED IMAGES FOR TRANSFER LEARNING BASED MODELS: (A) VGG16, (B) INCEPTIONV3, (C) RESNET-50, (D) RESNET-101, (E) RESNET-152

Class    0       1
0        2265    1335
1        1079    3529
(A)

Class    0       1
0        537     3063
1        195     4413
(B)

Class    0       1
0        2178    1422
1        882     3726
(C)

Class    0       1
0        2241    1359
1        660     3948
(D)

Class    0       1
0        2172    1428
1        643     3965
(E)

TABLE II shows the results of the comparison. On the basis of classification accuracy, the best performance was by ResNet-101 with an accuracy of 75.40%, followed by ResNet-152 with a close accuracy of 74.77%, and then by ResNet-50 with an accuracy of 71.93%. The VGG16 had a close accuracy of 70.59%, and the InceptionV3 was found to be the least accurate model with an accuracy of 60.31%. On the basis of precision, the ResNet based models again displayed better values in comparison to the VGG16 and InceptionV3 models. ResNet-101 had a precision of 74.39%, followed by ResNet-152 with a precision of 73.52%, and VGG16 had 72.55% as its precision. The ResNet-50 had a close precision value of 72.38%. The InceptionV3 had the lowest precision of 59.03%.

The results for recall values show that InceptionV3 had better recall values when compared to all the other models, which is in contrast to the results of accuracy and precision. InceptionV3 had a recall value of 95.77%, followed by ResNet-152 having a recall of 86.05%, succeeded by ResNet-
101 with a recall of 85.68%, and ResNet-50 with a recall of 80.86%. The lowest recall value is that of VGG16, at 76.58%.

A comparison of the F1-scores reveals that the deeper ResNet networks have higher F1-scores, with ResNet-101 and ResNet-152 having close values of 79.64% and 79.29% respectively, followed by ResNet-50 having an F1-score of 76.38%. The VGG16 is found to have the next highest score of 74.51%, followed by the InceptionV3 with an F1-score of 73.04%.

TABLE II. PERFORMANCE COMPARISON ON THE BASIS OF CLASSIFICATION METRICS

Pre-trained Model    Accuracy    Precision    Recall    F1-score
VGG16                70.59%      72.55%       76.58%    74.51%
InceptionV3          60.31%      59.03%       95.77%    73.04%
ResNet-50            71.93%      72.38%       80.86%    76.38%
ResNet-101           75.40%      74.39%       85.68%    79.64%
ResNet-152           74.77%      73.52%       86.05%    79.29%

VIII. CONCLUSION

Through the implementation of the concepts discussed in this paper and the observed results, it can be concluded that neural networks with greater depth with respect to the number of layers serve as better classifiers. The idea of transfer learning facilitates better working and easier implementation in terms of training time and ease of use. The issues of concern can thereby be resolved with minimal resource utilization, and the approach can be available for the use of all despite limitations of resources. Future studies can solve the tasks of segmentation discussed in the papers referred to through neural networks that are pre-trained on large datasets, which serve as convenient feature extractors, and the last layers of the network can incorporate a machine learning classifier to classify the images with even better accuracy.

REFERENCES

[1] L. S. Calderón and J. Bairán, "Crack detection in concrete elements from RGB pictures using modified line detection Kernels," 2017 Intelligent Systems Conference (IntelliSys), London, 2017, pp. 799-805, doi: 10.1109/IntelliSys.2017.8324222.
[2] Z. Qingbo, "Pavement Crack Detection Algorithm Based on Image Processing Analysis," 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 2016, pp. 15-18, doi: 10.1109/IHMSC.2016.96.
[3] A. C. Paglinawan, F. R. G. Cruz, N. D. Casi, P. A. B. Ingatan, A. B. C. Karganilla and G. V. G. Moster, "Crack Detection Using Multiple Image Processing for Unmanned Aerial Monitoring of Concrete Structure," TENCON 2018 - 2018 IEEE Region 10 Conference, Jeju, Korea (South), 2018, pp. 2534-2538, doi: 10.1109/TENCON.2018.8650313.
[4] T. H. Dinh, Q. P. Ha and H. M. La, "Computer vision-based method for concrete crack detection," 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand, 2016, pp. 1-6, doi: 10.1109/ICARCV.2016.7838682.
[5] Cubero-Fernandez, A., Rodriguez-Lozano, F.J., Villatoro, R. et al. Efficient pavement crack detection and classification. J Image Video Proc. 2017, 39 (2017). https://doi.org/10.1186/s13640-017-0187-0
[6] S. Liang, X. Jianchun and Z. Xun, "An Algorithm for Concrete Crack Extraction and Identification Based on Machine Vision," in IEEE Access, vol. 6, pp. 28993-29002, 2018, doi: 10.1109/ACCESS.2018.2844100.
[7] Y. Sari, P. B. Prakoso and A. R. Baskara, "Road Crack Detection using Support Vector Machine (SVM) and OTSU Algorithm," 2019 6th International Conference on Electric Vehicular Technology (ICEVT), Bali, Indonesia, 2019, pp. 349-354, doi: 10.1109/ICEVT48285.2019.8993969.
[8] Maguire, M., Dorafshan, S., & Thomas, R. J. (2018). SDNET2018: A concrete crack image dataset for machine learning applications. Utah State University. https://doi.org/10.15142/T3TD19
[9] Mustapha S, Lu Y, Ng C-T, Malinowski P. Sensor Networks for Structures Health Monitoring: Placement, Implementations, and Challenges—A Review. Vibration. 2021; 4(3):551-585. https://doi.org/10.3390/vibration4030033
[10] Yang, C.; Chen, J.; Li, Z.; Huang, Y. Structural Crack Detection and Recognition Based on Deep Learning. Appl. Sci. 2021, 11, 2868. https://doi.org/10.3390/app11062868
[11] M. Shaha and M. Pawar, "Transfer Learning for Image Classification," 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2018, pp. 656-660, doi: 10.1109/ICECA.2018.8474802.
[12] H. Pan, Z. Pang, Y. Wang, Y. Wang and L. Chen, "A New Image Recognition and Classification Method Combining Transfer Learning Algorithm and MobileNet Model for Welding Defects," in IEEE Access, vol. 8, pp. 119951-119960, 2020, doi: 10.1109/ACCESS.2020.3005450.
[13] Simonyan, K. & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818-2826, doi: 10.1109/CVPR.2016.308
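
As a rough check on the size of the classification head described in Section V.C (flatten, a 128-neuron fully connected layer, dropout, and a 2-neuron output layer), the following sketch counts its trainable weights. It assumes the VGG16 convolutional base, whose five 2 x 2 pooling stages reduce a 128 x 128 input to a 4 x 4 map with 512 channels; the function names are illustrative and do not come from the authors' implementation.

```python
def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


def head_params(height=128, width=128, channels=512, downsample=32,
                hidden_units=128, classes=2):
    """Count trainable parameters of Flatten -> Dense(128) -> Dropout -> Dense(2).

    Assumption: the frozen base shrinks each spatial side by `downsample`
    (32 for the five pooling stages of VGG16) and emits `channels` maps.
    """
    flattened = (height // downsample) * (width // downsample) * channels
    hidden = dense_params(flattened, hidden_units)
    output = dense_params(hidden_units, classes)
    # Dropout has no weights, so it adds nothing to the total.
    return {"flattened": flattened, "hidden": hidden,
            "output": output, "total": hidden + output}


print(head_params())
```

Under these assumptions the flattened vector has 8192 entries and the head holds roughly 1.05 million trainable weights, while the frozen base contributes none. Other bases give different counts because their final feature maps differ in size.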

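The stopping and checkpointing rules of Section VI (patience of 10 on the validation loss, checkpoint at the maximum validation accuracy, at most 100 epochs) can be sketched without any training framework. The loop below replays a recorded history of validation values; it follows the usual behaviour of such callbacks, which is an assumption, since the paper does not name the library used, and the function name is hypothetical.

```python
def simulate_training(val_losses, val_accuracies, patience=10, max_epochs=100):
    """Return (stopped_epoch, checkpoint_epoch) for a recorded history.

    stopped_epoch is None when training runs to the end of the history.
    Epochs are numbered from 1.
    """
    best_loss = float("inf")
    best_accuracy = float("-inf")
    wait = 0
    checkpoint_epoch = None
    stopped_epoch = None
    history = list(zip(val_losses, val_accuracies))[:max_epochs]
    for epoch, (loss, accuracy) in enumerate(history, start=1):
        # Checkpoint rule: keep the epoch with the highest validation accuracy.
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            checkpoint_epoch = epoch
        # Early stopping rule: count epochs without a new minimum loss.
        if loss < best_loss:
            best_loss = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                stopped_epoch = epoch
                break
    return stopped_epoch, checkpoint_epoch


losses = [1.0, 0.9, 0.8] + [0.85] * 20
accuracies = [0.5, 0.6, 0.7, 0.72] + [0.65] * 19
print(simulate_training(losses, accuracies))
```

In this made-up history the loss stops improving after epoch 3, so training halts ten epochs later, and the saved model is the one from epoch 4, where the validation accuracy peaked. The two rules monitor different quantities, so the checkpointed epoch need not be the epoch with the lowest loss.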

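The metric definitions of Section VI can be checked against the confusion matrices of Table I with a short sketch. It assumes that rows hold the actual class and columns the predicted class, and it treats class 1 as the positive class. The table does not state its orientation, so this reading is an assumption, but under it the VGG16 matrix gives the 70.59% accuracy and 74.51% F1-score quoted in Section VII.

```python
def binary_metrics(matrix, positive=1):
    """Accuracy, precision, recall and F1 from matrix[actual][predicted]."""
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Counts copied from Table I, sub-tables (A) to (E).
TABLE_I = {
    "VGG16": [[2265, 1335], [1079, 3529]],
    "InceptionV3": [[537, 3063], [195, 4413]],
    "ResNet-50": [[2178, 1422], [882, 3726]],
    "ResNet-101": [[2241, 1359], [660, 3948]],
    "ResNet-152": [[2172, 1428], [643, 3965]],
}

for name, matrix in TABLE_I.items():
    acc, prec, rec, f1 = binary_metrics(matrix)
    print(f"{name:12s} {acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")
```

Running the loop prints one row per model in the column order of Table II, which makes it easy to compare each reported figure with the value implied by the corresponding confusion matrix.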