
Expert Systems With Applications 214 (2023) 118966


A learning-based crack defect detection and 3D localization framework for automated fluorescent magnetic particle inspection
Qiang Wu, Xunpen Qin ∗, Kang Dong, Aixian Shi 1 , Zeqi Hu 1
School of Automotive Engineering, Wuhan University of Technology, Wuhan, 430070, China
Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan, 430070, China

ARTICLE INFO

Keywords:
Magnetic Particle Inspection
Crack segmentation
3D reconstruction
CNN

ABSTRACT

With the rapid development of deep learning, target detection and segmentation in dirty backgrounds have become readily available. Fluorescent magnetic particle inspection (MPI) based on such technology will be a promising alternative for automated crack defect inspection. Most previous studies in MPI have focused only on crack detection. Instead, we frame it as a crack 3D localization problem, since cracks on non-machined surfaces of metal parts need to be polished and re-inspected, which relies on their 3D positions. Although good results were obtained in defect detection, it is still challenging to perform pixel-level segmentation of micro-cracks from the large background to obtain crack 2D pixels for 3D reconstruction. This paper proposes a two-stage convolutional neural network (CNN) method for metal part crack defect detection and segmentation at the image-pixel level. The first stage detects and crops the potential cracks to a small area, and the second stage learns the context of cracks in the detected patches. A window-based stereo matching method is then used to find matching pixels of cracks and to map crack image plane points to 3D world points. We also illustrate the entire system's model deployment and signaling work to apply these methods. Both computational and experimental results based on the system are presented for validation. The training precision of target detection reaches 96.3%, its average precision reaches 85.4%, and the average precision reaches 98.3% when the Intersection-over-Union (IoU) threshold is 0.5. The Dice score reaches 94% in pixel-level segmentation, and the average precision is 99.3% when the probability threshold is set to 0.5. The corresponding efficiency reaches 19 FPS and 18 FPS, and the mean absolute errors of the 3D coordinates of reconstructed crack defects are all within 1 mm in the X-, Y- and Z-directions.

1. Introduction

Fluorescent Magnetic Particle Inspection (MPI) is an early and widely used inspection method for Non-Destructive Testing (NDT) of surface and near-surface flaws of metal parts (Karthik et al., 2019; Lee et al., 2003; Li, Yang, Cai, & Kang, 2020; Staněk & Škvor, 2019). It makes use of the leakage magnetic field formed at a surface defect to adsorb magnetic suspension particles carrying fluorescent dyes, so that they accumulate at the crack. Under ultraviolet light, indications with high color contrast are formed on the part surface (Lovejoy, 1993). Compared with eddy current testing (Miao, Li, & Tian, 2020) and ultrasonic testing (Honarvar & Varvani-Farahani, 2020), fluorescent MPI has a simple process and high inspection sensitivity, and is not affected by the size and shape of parts.

The key processes of fluorescent MPI include suspension liquid spraying, magnetization, and visual inspection. Semi-automatic fluorescent MPI detectors equipped with drenching and magnetizing have been widely used in the manufacturing industry, but the visual inspection still mainly depends on manual work at present. Previous work on fluorescent MPI automatic defect detection has investigated the use of traditional image processing techniques (Abend, 1999; Shipway, Barden, Huthwaite and Lowe, 2019; Shipway, Huthwaite, Lowe and Barden, 2019; Tang, Niu, Wee, & Han, 1995; Zheng, Xie, Viens, Birglen, & Mantegh, 2013); moreover, the visual feature sensitivity of fluorescent MPI was studied under the interaction of magnetic field strength, material properties, particle size and fluorescence intensity (Biederer et al., 2009; Eisenmann, Enyart, Lo, & Brasche, 2015; Li et al., 2020). Recent breakthroughs in deep learning, particularly Convolutional Neural Networks, make it possible to extract weak features from complex backgrounds, and such methods have been applied in this field (Shipway, Huthwaite, Lowe, & Barden, 2021; Tout, Meguenani, Urban, & Cudel, 2021; Yang, Yang, Li, Chen, & Min, 2022; Zhou, Liu, Xu, & Deng, 2019).

∗ Corresponding author at: School of Automotive Engineering, Wuhan University of Technology, Wuhan, 430070, China.
E-mail addresses: wuqiangyl@whut.edu.cn (Q. Wu), qxp915@hotmail.com (X. Qin), dkang0808@gmail.com (K. Dong), 2795687951@qq.com (A. Shi),
821759037@qq.com (Z. Hu).
1 Both authors contributed equally to the writing of the paper.

https://doi.org/10.1016/j.eswa.2022.118966
Received 19 April 2022; Received in revised form 16 September 2022; Accepted 1 October 2022
Available online 8 October 2022
0957-4174/© 2022 Published by Elsevier Ltd.

Fig. 1. Examples of crack defects. Their shapes and positions are random. The size of the crack defect area in the image on the left is 25 × 25, accounting for 1/4000 of the entire image, and its width is only about 7 pixels.

Works based on image processing have shown a good beginning. However, as shown in Fig. 1, the large-area background occupies almost all of the image. Algorithms developed on the basis of single or few crack features are often too reliant on defect shape, size, a clean background, etc., and are difficult to transplant to a new environment, resulting in poor detection and segmentation performance.

There is one more issue that deserves attention. Relevant national and association standards (British Standards Institution, 1999; International Association of Classification Societies, 2021; Standardization Administration of China, 2016) point out that the forgings shall be sound and free from such segregation, cracks or defects that preclude their intended use, and surface defects shall be removed by chipping and/or grinding. As a result, in manual operation, visual inspection and defect marking are performed simultaneously under a UV lamp, and then cracks are polished and re-inspected. Therefore, obtaining the 3D position of crack defects is also essential for automated inspection. However, there is no relevant research to the best of our knowledge. At the same time, when the research goal turns to obtaining the 3D coordinates of cracks, pixel-level segmentation of the image is required. Obtaining crack pixels directly from a sizeable uncorrelated background area makes the model difficult to converge and results in low segmentation precision.

To this end, we propose a simultaneous crack detection and 3D localization framework for fluorescent MPI, which computes micro-crack 3D coordinates from a segmentation map. Like other image-based defect detection methods, we use a CNN to extract features. The significant difference between this method and previous works is that our ultimate goal is to explicitly compute the 3D world coordinates of crack defects from 2D images. The main idea of our approach is to reconstruct 3D information from image pairs containing only cracks, which are obtained from the segmentation model. To improve segmentation precision and efficiency by ignoring large background regions, we start with an object detection model and limit the computational scope. A critical feature of our method is that only textured cracks from pixel-level segmentation participate in restoring 3D coordinates at each reconstruction. We equip the system with a robotic arm that communicates with the workstation via Socket to take images that cover the surface of parts. Finally, the complete fluorescent MPI framework is built with model deployment, signal transmission, and computational efficiency in mind to improve system usability.

To summarize, our contributions are:

• A framework to simultaneously detect and reconstruct cracks for automated fluorescent MPI.
• An efficient crack segmentation using a two-stage convolutional neural network with object detection.
• An executable system, including model deployment, signal communication, computational efficiency, etc.

2. Related works

Detection of steel surface defects by image processing has been studied for decades (Chin & Harlow, 1982; Luo, Fang, Liu, Yang, & Sun, 2020; Neogi, Mohanta, & Dutta, 2014). Traditional methods use manual parameters to find abnormal characteristics of defects in the spatial and frequency domains. Statistical methods (Liu et al., 2017; Shi, Kong, Wang, Liu, & Zheng, 2016; Wang, Li, Gan, Yu, & Yang, 2019; Wang et al., 2018) study the regular and periodic distribution of pixel intensity values by measuring the statistical characteristics of the pixel spatial distribution. The precision of most of these approaches is correlated with features such as color and texture in the spatial domain, which limits their portability. In view of the instability of statistical methods in the case of illumination changes and pseudo-defect interference, it is easier to separate defect feature information in the transform domain (Choi et al., 2017; Liu & Yan, 2014; Wu, Xu, Chu, Duan, & Siebert, 2019). This kind of method is more suitable for images with regular texture features, but it is difficult to apply to natural textures with strong randomness. Model-based methods (Shi & Qiao, 2018; Yu et al., 2018; Zhiznyakov, Privezentsev, & Zakharov, 2015; Zhou, Wu, Liu, Lu, & Hu, 2018) project the original texture of image patches to a low-dimensional distribution through a special structural model enhanced by parameter learning. However, the performance of these methods is not only susceptible to environmental changes but also requires tedious manual tuning.

Recently, deep learning has brought large improvements to the detection of surface defects in complex backgrounds (Ali & Cha, 2019; Mei, Wang, & Wen, 2018; Nguyen, Perry, Bone, Le, & Nguyen, 2021; Wang, Chen, Qiao and Snoussi, 2018; Zou et al., 2018). In defect detection, anchor-box-based target localization algorithms represented by YOLO and Faster R-CNN are widely used (Cheng & Yu, 2020; Li, Su, Geng, & Yin, 2018; Lin et al., 2019; Liu, Li, Wen, Chen, & Yang, 2019; Zhang, Kang, Ni, & Ren, 2021; Zhao, Chen, Huang, Li, & Cheng, 2021). Other excellent networks are also used. Zhao et al. (2021) proposed a new RetinaNet-based deep neural network with difference channel attention and adaptive spatial feature fusion for steel surface defect detection. Liu et al. (2019) proposed a GAN-based one-class classification method for strip steel surface defect detection to address the imbalanced class distributions caused by the sparse distribution of abnormal samples. Fu et al. (2019) present an effective CNN model to achieve fast and accurate steel surface defect classification. Some other studies aim to perform pixel-level operations on images to obtain richer semantic information of defects (Shi, Wu, Yang, & Deng, 2021; Song, Song, & Yan, 2020; Youkachen, Ruchanurucks, Phatrapomnant, & Kaneko, 2019). Industrial metal defect samples are difficult to obtain and label, which makes the models difficult to train. Some studies combine weak supervision with supervision to achieve mixed supervision, improve the recognition precision of the model, and reduce the dependence on samples (He, Song, Meng, & Yan, 2019; Kim, Park, & Park, 2019).

According to their priorities, some existing models perform only region-level detection or pixel-level segmentation without considering the 3D positions of defects. Meanwhile, in some cases, the defect target-to-image ratio and background noise are important factors affecting the precision of models. Inspired by the above-mentioned related research, to realize the effective detection of small-scale and weak cracks in the complex background of fluorescent MPI, this paper adopts a two-stage method. It uses target detection and segmentation models at the same time to reduce the influence of non-correlated areas. Finally, the conversion of 2D positions into 3D depths is realized by finding crack matching pixels in the images.


Fig. 2. Schematic diagram of fluorescent MPI system including robot arm, cameras,
detector, and host computer. The workstation receives working signals from the robot
and camera and processes them through deployed deep learning models. The tool,
camera and robot world coordinate system are {𝐓}, {𝐂}, and {𝐖𝐂𝐒}, respectively.

3. Framework and module design details

The framework is designed with full consideration of the requirements of fluorescent MPI. In particular, we observed three limitations
to automated visual inspection: ① fixed-mount vision systems have
difficulty acquiring images that cover the surface of metal parts; ②
visual characteristics of micro-cracks are weak and difficult to segment
directly from the dirty background; ③ cracks observed need to be
marked for subsequent processing. Motivated by these observations, a
stereo vision detection system with a multi-degree-of-freedom (DOF)
robot arm is integrated. Fig. 2 shows the system schematic. It mainly
consists of four parts: 6DOF manipulator, Vision system, Fluorescence
MPI detector and Host computer. Cameras, a binocular vision system,
driven by a robotic arm, combined with the rotation of the steel parts,
can capture images from multiple views to form a field of view covering
the surface of the metal parts. The fluorescence MPI detector integrates
drenching, magnetizing, circumrotating, and demagnetizing functions
that can realize automatic clamping, spraying, and magnetization. The
host computer is responsible for calculation and communication. It
communicates with the fluorescent MPI detector via a Programmable
Logic Controller (PLC) to control the rotation of metal parts, controls
the robot to drive cameras via Socket programming, and triggers the vi-
sion system to take photos via a Transmission Control Protocol/Internet
Protocol (TCP/IP) connection.
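For concreteness, the signaling just described can be sketched as a small Python loop on the workstation side; the addresses, message strings, and trigger command below are illustrative assumptions, not the actual protocol used by the robot controller or the cameras.

```python
# Minimal sketch of the workstation-side signaling loop described above.
# Port numbers and message formats are hypothetical placeholders.
import socket

ROBOT_ADDR = ("192.168.0.10", 5000)   # hypothetical robot controller address
CAMERA_ADDR = ("192.168.0.20", 6000)  # hypothetical camera trigger service

def run_station(viewpoints):
    with socket.create_connection(ROBOT_ADDR) as robot, \
         socket.create_connection(CAMERA_ADDR) as camera:
        for pose in viewpoints:
            # Ask the robot to move the cameras to the next viewpoint.
            robot.sendall(f"MOVE {pose}\n".encode())
            # Wait for the in-position signal before triggering the cameras.
            if robot.recv(1024).decode().strip() != "IN_POSITION":
                continue
            camera.sendall(b"TRIGGER\n")   # capture a stereo image pair
            ack = camera.recv(1024)        # e.g. an id or path of the new images
            yield pose, ack
```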
Fig. 3 details the flow of back-end processing. It mainly consists of four steps: image acquisition, crack detection, crack segmentation and stereo matching. In the image acquisition stage, the workstation first receives the in-position signal from the robot arm and then triggers the cameras to capture images. These images are passed sequentially to the crack detection model. Cracks (located in the overlapping field of view of the binocular cameras) should be detected by both cameras at the same time to avoid repeated detection during camera movement, or else the camera should be moved to the following location. Equally important, the size of the detected crack (measured by the size ratio of the Bounding Box (BBox) to the image) should be within a set threshold value. The steel part is defined as defective directly if the threshold is exceeded. When these criteria are satisfied simultaneously, the original images are cropped to fixed-size patches around the center of the BBox. The patches are used as the input of the crack segmentation model. Then the crack pixels are aligned to the original image coordinate system through the cropping offset. As a result, binary images containing only cracks are obtained. To obtain 3D coordinates on cracks by stereo vision, the binary images need to be rectified by the camera parameters. The remapped images are aligned horizontally, and pixel disparities are calculated by stereo matching and converted into a depth map or 3D point cloud. Finally, the coordinates are further transformed to the robot world coordinate system, which the robot can recognize, through the hand–eye relationship between the camera and the robot tool.

Fig. 3. Step-by-step procedure to obtain crack 3D coordinates. The process includes crack defect detection, patch cropping, crack segmentation, image mapping, image rectification and pixel matching.

3.1. Crack detection

To solve the segmentation imbalance problem caused by small crack defects, a detection model is first used to detect candidate blocks. The influence of network depth on crack resolution should be considered in the selection of the crack detection model. As the network depth increases, convolutional networks reduce the feature map resolution by adjusting the convolutional stride or by pooling to increase the receptive field of the convolutional kernel. As the resolution decreases, the target gradually loses location information in the feature map. In this work, Scaled-YOLOv4 (Wang, Bochkovskiy, & Liao, 2021) is used for crack detection.


Fig. 4. Scheme of the Scaled-YOLOv4 network, as proposed by Wang, Bochkovskiy, and Liao (2021), exemplified with three parts. The input is an image of height H and width
W. The box regression head predicts the pixel coordinates x1, y1, x2, y2 for anchor boxes.

Scaled-YOLOv4 is a high-precision single-stage object detection model that improves on the original YOLO (Redmon, Divvala, Girshick, & Farhadi, 2016) and its descendants (Bochkovskiy, Wang, & Liao, 2020; Redmon & Farhadi, 2017, 2018) in terms of detection speed and precision. It supports a higher input network size (resolution) while ensuring inference speed, which is beneficial for detecting micro-cracks. Meanwhile, Scaled-YOLOv4 was the state of the art at the time we used it. Another important reason we chose it is that it provides a set of models (YOLOv4-Tiny, YOLOv4-CSP, YOLOv4-Large) for different GPUs (low-end, high-end). This makes it easy for us to deploy the model on different devices; YOLOv4-P5 is used in this paper.

As shown in Fig. 4, the network structure consists of three parts: the backbone for feature extraction, the neck for semantic representation of the extracted features, and the head for prediction. Images are first processed by an initial convolutional layer with a kernel size of 3 × 3, a stride of 1, and 32 channels. Then five CSPDark blocks follow. Each CSPDark block first down-samples by convolution with stride S = 2; consequently, the down-sampling rates are 2, 4, 8, 16, 32. After down-sampling, the feature map is divided into two parts with the same number of channels. Part 1 performs convolution operations using successive convolution layers with kernel sizes of 3 × 3 and 1 × 1 and is connected by a residual block. Part 2 performs no operation and concatenates directly with the final output of Part 1. The numbers of residual blocks of the CSPDark blocks are 1, 3, 15, 15, 7, and the numbers of channels are 64, 128, 256, 512, 1024. Then a CSP-ized PAN (Path Aggregation Network) (Liu, Qi, Qin, Shi, & Jia, 2018) structure is used at the neck of the network, and the bottom-up path is extended to make it easier for low-level information to propagate to the top level. In the top-to-bottom process, the CSP-ized Spatial Pyramid Pooling (SPP) (He, Zhang, Ren, & Sun, 2015) module is constructed from four parallel branches: max-pooling layers with kernel sizes of 5 × 5, 9 × 9, 13 × 13, and a skip connection. SPP extends the receptive field to realize the fusion of local and global features. It enriches the expression ability of the feature map, which is beneficial for detecting targets with large scale differences. Then, through two inverse CSP modules, up-sampling is performed between the modules. From bottom to top, we down-sample by convolution with stride S = 2. To extract additional semantic features, the feature layers obtained from CSPDarknet53 are concatenated after convolution, then up-sampled, followed by down-sampling, which is stacked with the remaining feature layers to enhance the features. Three YOLO heads with sizes of 28 × 28, 56 × 56, and 112 × 112 are used to fuse and interact with feature maps of different scales to detect objects of different sizes.

This paper addresses single-class crack detection without considering the classification loss. The loss function used in this model consists of two parts, the object localization offset CIoU loss $L_{CIoU}$ (Zheng et al., 2020) and the object confidence cross-entropy loss $L_{BCE}$, as shown in Eqs. (1)–(3), where $\lambda_i$ ($i = 1, 2$) are the balance coefficients:

$Loss = \lambda_1 L_{BCE} + \lambda_2 L_{CIoU}$,   (1)

$L_{CIoU} = 1 - \mathrm{IoU}(y, \hat{y}) + \dfrac{\rho^2(y, \hat{y})}{c^2} + \alpha\nu$,   (2)

$L_{BCE} = -\sum \mathrm{Obj}_i \left[ y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \right]$.   (3)

In Eq. (2), $\mathrm{IoU}(y, \hat{y}) = \dfrac{y \cap \hat{y}}{y \cup \hat{y}}$ represents the Intersection over Union between the prediction box and the ground truth, $\rho$ represents the Euclidean distance between the center points of the prediction box $y$ and the target box $\hat{y}$, and $c$ represents the diagonal distance of the minimum closure region that can contain both the prediction box and the target box. $\alpha\nu$ is the penalty for the aspect ratio, where $\nu = \dfrac{4}{\pi^2}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right)^2$ is a positive number, $(w^{gt}, h^{gt})$ and $(w, h)$ are the true and predicted width and height of the BBox, respectively, and $\alpha = \dfrac{\nu}{(1 - \mathrm{IoU}) + \nu}$ measures the consistency of the aspect ratio. In Eq. (3), $\mathrm{Obj}_i$ indicates whether there is an object in predicted BBox $i$, and its value is 0 or 1.
3.2. Crack segmentation

The purpose of fluorescence MPI image segmentation is to obtain the pixel coordinates of cracks, which can be used to calculate their corresponding 3D coordinates. The semantic information and structure of the patches cropped by the crack detection model are relatively simple. Therefore, in order to improve the segmentation efficiency, reduce the computational cost, and balance the segmentation precision, a widely used U-Net structure (Ronneberger, Fischer, & Brox, 2015), an encoding–decoding symmetric network, is adopted. The structure of the model is shown in Fig. 5. It includes 4 down-sampling and up-sampling steps. Skip connections are used at every stage to ensure that the feature maps integrate more low-level features. Up-sampling fuses features at different scales and refines the edge information of the segmented images.

The image features are extracted by the encoding part, while the pixel-level segmentation is obtained by the decoding part. The model has five convolutional stages. Two convolution layers with 3 × 3 kernels are used in each stage, and the ReLU (Rectified Linear Unit) is used as the activation function.


Fig. 5. Scheme of the encoding–decoding symmetric network. The input is an image with 3 channels. The output is a pixel probability map with 1 channel of the same size as the original image.

Fig. 6. Remapping crack pixels back to the original image. The yellow line represents the crack, and the green box is the cropped patch.

The padding is set to 1 to keep the image size after the convolution operation. In the encoder, of two adjacent stages, the next stage is obtained from the previous one by 2 × 2 max-pooling down-sampling. The image size is halved and the number of channels is doubled accordingly. Thus, from the first to the fifth stage, there are 32, 64, 128, 256, 512 channels, which is the inverse of the decoder process. While the encoding part is used for extracting features, the decoding part is used for spatially localizing patterns in the image. After the feature map of the previous stage is up-sampled, it is concatenated with the same-stage output of the encoder into a single layer through a skip connection. In the last stage of the decoder, the output is the same size as the input. Finally, a 1 × 1 convolution is used to fuse the multi-channel feature maps, producing a single-channel feature map while maintaining the size of the image. Each pixel on the feature map is classified by applying the sigmoid activation function to obtain the final segmentation map.
to obtain the final segmentation map.
The loss contains two parts: the binary cross-entropy loss $L_{BCE}$ and the Dice loss $L_{Dice}$ (Milletari, Navab, & Ahmadi, 2016), as shown in Eqs. (4)–(6), where $\lambda_i$ ($i = 1, 2$) are the balance coefficients, $\hat{y}$ is the ground truth and $y$ is the predicted value:

$Loss = \lambda_1 L_{BCE} + \lambda_2 L_{Dice}$,   (4)

$L_{Dice} = 1 - \dfrac{2\, (y \cap \hat{y})}{|y| + |\hat{y}|}$,   (5)

$L_{BCE} = -\left[ \hat{y} \log y + (1 - \hat{y})\log(1 - y) \right]$.   (6)
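A short sketch of this combined loss, assuming PyTorch tensors and the unit balance coefficients reported in Section 4.2:

```python
# Sketch of the segmentation loss in Eqs. (4)-(6) for a binary crack mask.
import torch

def dice_loss(y, y_hat, eps=1e-7):
    inter = (y * y_hat).sum()                        # |y ∩ y_hat|
    return 1 - 2 * inter / (y.sum() + y_hat.sum() + eps)

def bce_loss(y, y_hat, eps=1e-7):
    # y: predicted probabilities in [0, 1]; y_hat: ground-truth mask {0, 1}.
    return -(y_hat * torch.log(y + eps) + (1 - y_hat) * torch.log(1 - y + eps)).mean()

def segmentation_loss(y, y_hat, lam1=1.0, lam2=1.0):
    return lam1 * bce_loss(y, y_hat) + lam2 * dice_loss(y, y_hat)
```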
3.3. Stereo matching

According to Scharstein and Szeliski (2002), stereo matching algorithms usually carry out some operations in four steps: ① matching cost (Kanade & Okutomi, 1994; Zabih & Woodfill, 1994); ② cost aggregation (Hirschmuller, 2007); ③ disparity calculation and optimization; ④ disparity refinement (Kolmogorov, Criminisi, Blake, Cross, & Rother, 2006; Woodford, Torr, Reid, & Fitzgibbon, 2009). Recovering the depth of an entire scene is often time-expensive due to complex optimizations. Under fluorescent lighting, the low color contrast of the scene makes correct pixel matching difficult. We take a compromise approach and only perform stereo matching on cracks segmented by the segmentation model. Therefore, crack pixel coordinates on patches must be aligned to the original image. As shown in Fig. 6, the center of the patch is offset from the top left corner of the original image by $(x_{box}, y_{box})$, and its height and width are $h_{crop}$ and $w_{crop}$, respectively. The pixel coordinates of cracks within the patch can be expressed as $(x_i, y_i)$. The position $(x_{crack}, y_{crack})$ of cracks in the original image can be calculated by:

$x_{crack} = x_{box} - \dfrac{w_{crop}}{2} + x_i$, $\quad y_{crack} = y_{box} - \dfrac{h_{crop}}{2} + y_i$,   (7)

where $(h_{crop}, w_{crop})$ is set to a fixed size (512, 512).
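The remapping of Eq. (7) amounts to a simple offset; a sketch with hypothetical variable names mirroring the equation is:

```python
# Sketch of the patch-to-image remapping in Eq. (7): crack pixels found in a
# 512 x 512 patch cropped around a detection centre (x_box, y_box) are mapped
# back to original-image coordinates.
def patch_to_image(crack_pixels, x_box, y_box, w_crop=512, h_crop=512):
    remapped = []
    for x_i, y_i in crack_pixels:
        x_crack = x_box - w_crop / 2 + x_i
        y_crack = y_box - h_crop / 2 + y_i
        remapped.append((x_crack, y_crack))
    return remapped
```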
Fig. 7. SAD-window-based stereo matching diagram. The matching pixel in the target image is the one whose window has the smallest sum of absolute differences with respect to the reference window.

The original image pairs containing only cracks are undistorted by their respective camera distortion factors and stereo rectified by the cameras' relative poses to obtain horizontally consistent image pairs (Hartley, 1999). The position of the segmented crack feature is accurate and belongs to the strong-texture area of the binary image. Then, a fast block matching method is used to find matching points between the reference image and the target image through a minimum ''sum of absolute differences'' (SAD) window. For the background area, a threshold is used to determine whether it participates in the calculation, to improve the matching efficiency. As shown in Fig. 7, the size of the matching window is set to $n \times n$. Taking the window center pixel $(x, y)$ in the reference image $I_R(x, y)$ as the starting point, the disparity search is performed on $I_T(x, y)$ in the target image and its horizontal right side. The search range is within the pre-selected maximum disparity $d_{max}$ ($d \in [0 \ldots d_{max}]$). The gray-scale changes between the two windows are calculated, and the center pixel $(x + d, y)$ of the window in the target image $I_T$ with the smallest change $C(x, y, d)$ is used as the matching point of $I_R(x, y)$; the disparity is $d$, and $C(x, y, d)$ is defined as:

$C(x, y, d) = \sum_{i,j=-n}^{n} \left| I_R(x + j, y + i) - I_T(x + d + j, y + i) \right|$.   (8)

After obtaining the disparity map, the depth $Z_{cam} = f \cdot T / d$ can be calculated according to the similarity principle, where $T$ is the baseline, i.e., the distance between the two cameras, and $f$ is the focal length of the cameras. Further, the 3D coordinates of those matching points can be calculated:

$\left[ X_{cam}, Y_{cam}, Z_{cam} \right]^T = Z_{cam} \left[ \dfrac{x_{crack} - C_x}{f}, \dfrac{y_{crack} - C_y}{f}, 1 \right]^T$,   (9)

where $C_x$, $C_y$ are the camera optical center coordinates, and the coordinates $\left[ X_{cam}, Y_{cam}, Z_{cam} \right]$ indicate that the points are in the camera coordinate system.
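A NumPy sketch of the SAD search in Eq. (8) and the back-projection in Eq. (9) is given below; the window size, maximum disparity and the use of zero as the background value are assumptions, and a practical implementation would vectorize these loops.

```python
# Sketch of SAD window matching (Eq. (8)) and back-projection (Eq. (9)),
# applied only to pixels flagged as crack by the segmentation step.
import numpy as np

def match_and_reconstruct(I_R, I_T, f, baseline, Cx, Cy, n=5, d_max=64):
    h, w = I_R.shape
    points = []
    for y in range(n, h - n):
        for x in range(n, w - n - d_max):
            if I_R[y, x] == 0:                       # background pixel: skip
                continue
            ref = I_R[y - n:y + n + 1, x - n:x + n + 1].astype(np.float32)
            costs = [np.abs(ref - I_T[y - n:y + n + 1,
                                      x + d - n:x + d + n + 1]).sum()
                     for d in range(d_max + 1)]      # C(x, y, d) from Eq. (8)
            d = int(np.argmin(costs))
            if d == 0:
                continue
            Z = f * baseline / d                     # depth from similar triangles
            X = Z * (x - Cx) / f                     # Eq. (9)
            Y = Z * (y - Cy) / f
            points.append((X, Y, Z))
    return np.array(points)
```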


Table 1
Comparison of different target detection methods.
Method Size FPS AP AP50 AP75 AP_S AP_M AP_L
SSD 512 67 51.7% 85.1% 61.1% 38.6% 61.5% 60.4%
Faster R-CNN 512 45 57.6% 90.5% 64.0% 46.9% 61.5% 70.7%
Faster R-CNN 896 31 69.9% 94.4% 79.6% 46.2% 70.9% 76.7%
Scaled-YOLOv4 512 77 79.5% 97.3% 87.4% 61.0% 75.7% 91.0%
Scaled-YOLOv4 896 38 85.4% 98.3% 93.5% 72.1% 82.6% 86.8%

Fig. 8. Physical system. It mainly consists of the robot arm, cameras, and the fluorescence particle detector.

To further transform the coordinates to the robot world coordinate system (WCS), it is necessary to calibrate the hand–eye relationship $\mathbf{H}^{cam}_{tool}$ between the camera frame and the robot tool frame. The transformation can be defined as:

$\left[ X_{wcs}, Y_{wcs}, Z_{wcs} \right]^T = \mathbf{H}^{tool}_{wcs}\, \mathbf{H}^{cam}_{tool} \left[ X_{cam}, Y_{cam}, Z_{cam} \right]^T$,   (10)

where $\mathbf{H}^{tool}_{wcs}$ is the relationship between the robot tool frame and the robot world frame.

For each point, the error in the X-, Y-, and Z- directions is defined as:

$\sigma = \| \mathbf{P} - \mathbf{P_i} \|$,   (11)

where $\mathbf{P} \in \{X, Y, Z\}$ represents the actual coordinate in the world frame and $\mathbf{P_i} \in \{X_i, Y_i, Z_i\}$ is the coordinate reconstructed by stereo vision. Finally, the accuracy of reconstruction is measured by the Mean Absolute Error $\mathrm{MAE}(\mathbf{P}, \mathbf{P_i})$, the Mean Square Error $\mathrm{MSE}(\mathbf{P}, \mathbf{P_i})$ and the percentage $\mathrm{Percentage}(\mathbf{P}, \mathbf{P_i})$ of points where the difference between the estimated value and the actual value is greater than $\sigma_D$, as shown in Eqs. (12)–(14):

$\mathrm{MAE}(\mathbf{P}, \mathbf{P_i}) = \dfrac{1}{N} \sum_{i=1}^{N} \sigma$,   (12)

$\mathrm{MSE}(\mathbf{P}, \mathbf{P_i}) = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} \sigma^2}$,   (13)

$\mathrm{Percentage}(\mathbf{P}, \mathbf{P_i}) = \dfrac{1}{N} \sum_{i=1}^{N} \{\sigma > \sigma_D\}$.   (14)
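For reference, the frame transformation in Eq. (10) and the error statistics in Eqs. (11)–(14) can be sketched with homogeneous 4 × 4 matrices as follows; the array shapes and function names are assumptions.

```python
# Sketch of Eq. (10) with homogeneous matrices and of the per-axis error
# statistics in Eqs. (11)-(14).
import numpy as np

def cam_to_world(P_cam, H_tool_wcs, H_cam_tool):
    # P_cam: (N, 3) points in the camera frame -> (N, 3) points in the robot world frame.
    P_h = np.hstack([P_cam, np.ones((len(P_cam), 1))])           # homogeneous coordinates
    return (H_tool_wcs @ H_cam_tool @ P_h.T).T[:, :3]

def reconstruction_errors(P_true, P_est, sigma_D=1.0):
    sigma = np.abs(P_true - P_est)                               # Eq. (11), per axis
    mae = sigma.mean(axis=0)                                     # Eq. (12)
    mse = np.sqrt((sigma ** 2).mean(axis=0))                     # Eq. (13)
    pct = (sigma > sigma_D).mean(axis=0)                         # Eq. (14)
    return mae, mse, pct
```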
4. Experiments and results

As shown in Fig. 8, a CJW-2000 fluorescent particle flaw detector is adopted for drenching, magnetizing, circumrotating, and demagnetizing. The circumferential magnetization is 0∼2000 A, and the longitudinal magnetization is 0∼16 000 AT, which can be adjusted continuously and controlled by the power-off phase. The UV lamp's illumination intensity is not less than 1000 μW/cm² at 380 mm away from the work-piece surface. The manipulator is an ABB robot equipped with a PC interface communication function. The stereo vision system combines two ACA2040-90uc Basler RGB cameras. The host computer is equipped with an NVIDIA 2080Ti graphics card with 11 GB of memory.

4.1. Detection

Dataset The original data set is collected at the fluorescent magnetic particle inspection site of a Chinese auto parts manufacturing company. An ACA2040-90UC Basler color camera is used. The objects are car steering knuckles and steering knuckle arms. A total of 102 fluorescence images with crack defects are obtained. The original image size is 2046 × 2040. Labelme (Russell, Torralba, Murphy, & Freeman, 2008), an image annotation tool, is used to label the ground truth for target detection and segmentation. The original data set is randomly divided according to the ratio 8:1:1: 80% of the images are used as the training set, 10% as the test set and 10% as the validation set. Some standard data augmentation tricks are used during training to increase the variability of the input images, including flipud, fliplr, random rotation ($[-\frac{\pi}{6}, \frac{\pi}{6}]$), scaling (0.8∼0.95 times), brightness enhancement (1.2∼1.5 times), color-space inversion, and Gaussian blur ($\sigma \in [0, 3]$). Each image is augmented to 50 images. As a result, the data set is expanded to 5100 pieces.
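A small sketch of these augmentation operations with OpenCV/NumPy is shown below; the sampling ranges follow the text, while the probabilities and the exact order of operations are assumptions (the same geometric transforms would also be applied to the label masks).

```python
# Sketch of the augmentations listed above (flips, rotation, scaling, brightness,
# colour inversion, Gaussian blur) for an 8-bit image.
import cv2
import numpy as np

def augment(img, rng=np.random.default_rng()):
    if rng.random() < 0.5:
        img = np.flipud(img)                                   # flipud
    if rng.random() < 0.5:
        img = np.fliplr(img)                                   # fliplr
    img = np.ascontiguousarray(img)
    h, w = img.shape[:2]
    angle = rng.uniform(-30, 30)                               # [-pi/6, pi/6] in degrees
    scale = rng.uniform(0.8, 0.95)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # rotation + scaling
    img = cv2.warpAffine(img, M, (w, h))
    img = np.clip(img * rng.uniform(1.2, 1.5), 0, 255).astype(np.uint8)  # brightness
    if rng.random() < 0.5:
        img = 255 - img                                        # colour-space inversion
    img = cv2.GaussianBlur(img, (0, 0), sigmaX=rng.uniform(0.1, 3.0))
    return img
```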
Implementation The batch size is set to 8. The balance coefficients $\lambda_1$ and $\lambda_2$ for the object localization offset CIoU loss and the object confidence cross-entropy loss $L_{BCE}$ are set to 0.05 and 1, respectively. The weights for the detection heads 28 × 28, 56 × 56 and 112 × 112 are set to 4.0, 1.0 and 0.4, respectively. The initial learning rate is set to 0.02. During the training process, some common training tricks, including an adaptive learning rate, momentum and other techniques, are used to improve the model's precision, and a total of 300 epochs are trained. Eq. (15) is used as the precision evaluation index:

$Pr = \dfrac{TP}{TP + FP}$, $\quad Re = \dfrac{TP}{TP + FN}$,   (15)

where $Pr$, $Re$, $TP$, $FP$, $FN$ are Precision, Recall, and the numbers of True Positive, False Positive, and False Negative samples, respectively.

Performance Fig. 9 shows the Scaled-YOLOv4 precision and loss during the training process. P5_896_Pre indicates that the model is pre-trained on the COCO dataset (Lin et al., 2014), i.e., initialized with pre-trained parameters; P5_896_No means initialization with random parameters. As can be seen from the figure, the model fine-tuned from the pre-trained parameters converges faster and its loss drops faster. In the first 50 epochs, the precision of the pre-trained model rises to 80%, and the loss drops by more than 50%. In the following training process, the precision increases continuously, and at the end of training the precision of the pre-trained model reaches 96.3%. However, the precision achieved without pre-training is only 80.2%.

On the other hand, we compare the performance of several commonly used object detection methods on the data set with image sizes of 512 × 512 and 896 × 896, including SSD (Liu et al., 2016) and Faster R-CNN with a ResNet50 backbone (He, Zhang, Ren, & Sun, 2016). These models are all pre-trained on the COCO dataset. The P–R (Precision–Recall) curves under different IoU thresholds (0.5, 0.75, 0.90) are shown in Fig. 10. When IoU = 0.5, those methods have almost the same excellent performance, with Scaled-YOLOv4 and Faster R-CNN holding the lead. With the increase of recall, Scaled-YOLOv4 still maintains high precision. When the image size is 512 × 512, the recall of Faster R-CNN and SSD stays around 0.9 while their precision drops rapidly. As the IoU threshold increases to 0.75, the precision loss of Scaled-YOLOv4 is small for different image sizes, but the precision and recall of Faster R-CNN and SSD decrease severely at image size 512 × 512. When the IoU increases to 0.90, the precision and recall of all models decrease rapidly; Scaled-YOLOv4 maintains a relatively high precision compared with the other methods. The evaluation metrics of those methods are listed in Table 1. The values marked in bold indicate the best result for the corresponding item. At an image size of 896 × 896, the average precision of Scaled-YOLOv4 is 85.4%, which is 34% higher than SSD and 15% higher than Faster R-CNN. The average precision drops 10% when the image size drops to 512 × 512, but it still outperforms Faster R-CNN and SSD by about 20%.


Fig. 9. Precision and loss during training for Scaled-YOLOv4 with and without pre-training.

Fig. 10. P–R (Precision–Recall) curves for different methods and image sizes. From left to right: IoU thresholds are 0.5, 0.75, 0.90, respectively.

Fig. 11. Example result of Scaled-YOLOv4 detection experiment on images with different type of cracks.

In terms of inference speed, Scaled-YOLOv4 also leads with a speed of over 77 frames per second. At the same time, the size of the image has a great impact on the precision of those methods.

The crack detection results of Scaled-YOLOv4 at different scales are shown in Fig. 11(a), (b), and (c), respectively. The crack defect detection in Fig. 11(a)–(b) is challenging because of the small size and low contrast. The precision of searching for these non-salient targets can reach 85%, indicating that the model can solve the detection task of small targets well. At the same time, the model also shows good detection results when the length of the surface crack, as shown in Fig. 11(c), is much larger than the others. Fig. 11(d) shows a case where there are multiple cracks on the surface of a part. In Fig. 11(e), there appear to be more than two lateral cracks emanating from a point and a median crack extending from the same point towards the surface. At the end of the lateral branch cracks, the crack visual features are significantly weakened and discontinuous. The model does not perform satisfactorily enough in such cases: it can accurately identify the well-characterized crack trunk, but end branches are missed. Nevertheless, this deficiency can be compensated in the cropping step. Overall, the model has high precision and good robustness, giving good predictions on both the test and training datasets. The model has good adaptability, can adapt to cracks of different scales and shapes, and has high recognition precision for cracks with obscure features.


Fig. 12. Example of detection failure when image size is 512 × 512. The red box is Ground Truth, and the green box is the detection result. Top row: SSD. Middle row: Faster
R-CNN. Bottom row: Scaled-YOLOv4.

Fig. 12 shows some false detection examples at image size 512 × 512. The red boxes are ground truth, and the green boxes are the detection results. From top to bottom are SSD, Faster R-CNN and Scaled-YOLOv4. In SSD and Faster R-CNN, the first three columns from left to right are examples of false alarms, while the last two columns are examples of missed detections. In the case of false alarms, SSD and Faster R-CNN are prone to mistaking color-sharp edges for cracks. In the missed detection examples, the crack vision features in the other images are weakened during the image down-sampling process, which makes the models unable to detect them effectively. In contrast, Scaled-YOLOv4 has few failed detection cases, which are shown in the third row. Except for the fourth example, all the other examples occur only once. Again, the first two examples are mis-detections of crack-like regions. In the third and fourth examples, the model splits a ''single'' crack into ''multiple'' targets, especially in the fourth case, which appeared many times in the test. In the fifth example, the crack is not detected, possibly due to subtle image enhancement differences. In general, a small image resolution can accelerate the convergence and inference speed of the model, but improper image sampling may lower the precision. Without significantly affecting the inference speed, this paper adopts an image size of 896 × 896 to ensure more stable detection precision.

4.2. Segmentation

Dataset Crack detection is done by using Scaled-YOLOv4 on the training, validation and test sets used in the target detection stage. The corresponding target regions are cropped and filtered with a size of 512 × 512, obtaining a total of 4050 images and maintaining a ratio of 8:1:1. Using the same image enhancement methods and parameters as for target detection (flipud, fliplr, random rotation, scaling, brightness enhancement, color-space inversion, Gaussian blur), the dataset is doubled, yielding a total of 8100 images.

Implementation The model is trained from scratch for 300 epochs. The batch size is set to 6, and the initial learning rate is 0.001. As shown in Eqs. (4)–(6), the balance coefficients $\lambda_1$ and $\lambda_2$ for the binary cross-entropy loss $L_{BCE}$ and the Dice loss $L_{Dice}$ are both set to 1; $\hat{y}$ is the ground truth and $y$ is the predicted value.

The Dice coefficient score between the predicted image and the target image is used as the main precision indicator and is defined as:

$AC_{Dice} = \dfrac{2\, (y \cap \hat{y})}{|y| + |\hat{y}|}$.   (16)

In the segmentation phase, a positive sample is a crack pixel, and a negative sample is a background pixel.

Performance Fig. 13, from left to right, shows the training precision, the training loss and the P–R curves of several common segmentation methods, including SegNet (Badrinarayanan, Kendall, & Cipolla, 2017) and FCN (Long, Shelhamer, & Darrell, 2015). They all show good performance and reach convergence within 50 epochs with a precision of over 80%. Among them, U-Net achieves the best results under the image size of 512 × 512, and its Dice precision reaches 93.8%. The training results of U-Net under two crop sizes of 512 × 512 (U-Net:512) and 1024 × 1024 (U-Net:1024) are also compared. When the crop size changes from 512 to 1024, the U-Net precision drops by about 3.8%, and the convergence speed is relatively slow. The main reason for this may be that the large background brings more negative samples. In the P–R curves, U-Net:512 holds the curve closest to the upper-right corner of the chart and achieves the best precision and recall values. Table 2 shows that U-Net:512 obtains the highest average precision, reaching 99.3%. At the same time, the model maintains great advantages in terms of FPS, parameter quantity, and computation quantity. Its 7.76 M parameters are 1/4 of SegNet (29.44 M) and 1/17 of FCN (134.27 M), and its MAC (Multiply–Accumulate) operations do not exceed 1/3 of those of the other methods (SegNet: 160.56 G, FCN: 190.36 G).

Fig. 13. The accuracy and loss of different segmentation methods during training and their corresponding P–R curves. From left to right: Precision, Loss and P–R curve.

Table 2
Comparison of different segmentation methods.
Method Size FPS AP Dice Params MACs
U-Net 512 101 99.3% 93.8% 7.76 M 54.96 G
U-Net 1024 25 97.2% 90.2% 7.76 M 219.84 G
SegNet 512 38 99.0% 91.8% 29.44 M 160.56 G
FCN 512 31 97.5% 89.0% 134.27 M 190.36 G

Table 3
Binocular vision system calibration result.
Calibration
  f/mm k S_x S_y C_x C_y
  L 16.6 −141.6 5.5e−6 5.5e−6 1024.9 1032.2
  R 17.0 −106.5 5.5e−6 5.5e−6 1048.7 1043.9
  T: X/mm Y/mm Z/mm R_x/° R_y/° R_z/°
     69.1 −3.3e−6 1.6 0.06 359.3 359.8
Rectification
  f/mm k S_x S_y C_x C_y
  L 16.8 0 5.5e−6 5.5e−6 1113.3 1059.4
  R 16.8 0 5.5e−6 5.5e−6 1172.5 1059.4
  T: X/mm Y/mm Z/mm R_x/° R_y/° R_z/°
     69.1 0 0 0 0 0
Eye-in-hand
  H: X/mm Y/mm Z/mm R_x/° R_y/° R_z/°
     −103.3 49.5 85.7 358.6 0.68 256.3

The small number of parameters and the low computation cost also enable U-Net to achieve a higher inference speed of 101 FPS.

In Fig. 14, the segmentation results of those methods on some typical examples are compared. From left to right are the cropped patches, the ground truth and the segmentation results, respectively. Overall, each model, including U-Net, SegNet, and FCN, obtains satisfactory results, and U-Net and SegNet perform better on details compared to FCN. In the first two examples, the targets to be segmented are smaller; the color contrast of the crack defect in the former image is lower, and the cracks in the second image are finer. Both are difficult to segment directly in the original image. In the first example, high-precision crack segmentation results are obtained by each model. The Dice values of U-Net, SegNet and FCN are 0.9956, 0.9931 and 0.9643, respectively. In the second example, U-Net (Dice: 0.8744) and SegNet (Dice: 0.8689) are able to segment the cracks accurately, but FCN (Dice: 0.5169) shows discontinuities. The shape of the crack in the third example is irregular, and there is a little information loss at the end of the crack in the segmentation result of FCN (Dice: 0.8471), while U-Net (Dice: 0.9662) and SegNet (Dice: 0.9524) perform better and retain more complete details. The fourth is a more difficult segmentation example. Although the cracks are large, multiple cracks are connected and the contours are not clear. When segmenting such cracks, U-Net (Dice: 0.9609) and SegNet (Dice: 0.9492) have higher detail differentiation ability, and the boundaries between cracks are effectively distinguished. In FCN (Dice: 0.8151), the segmentation result is slightly wider than the ground truth, and the contour details are slightly worse, but the contour information is completely preserved. In the fifth example, the color of the image has changed, with higher color contrast in the upper part of the crack but weaker contrast in the lower part. Compared with U-Net (Dice: 0.9725) and SegNet (Dice: 0.9437), FCN (Dice: 0.7264) has serious information loss in the lower part of the crack.

In Fig. 15, we test U-Net with an image crop size of 1024 × 1024. We single out examples with crack shapes similar to those in the above comparison and find that U-Net shows satisfactory segmentation results. The Dice values are 0.9761, 0.8703, 0.8392, 0.9265, 0.9441, respectively. Combining the inference speed and MAC size in Table 2 and the statistics of the actual image size of crack defects in object detection, we finally choose 512 × 512 as the input size.

4.3. Depth estimation

Calibration The high-precision parameter calibration of the vision system is an important prerequisite to ensure the measurement precision. It mainly includes the camera internal parameters, the relative pose $\mathbf{T}$ of camera system R in relation to camera system L, and the relative pose $\mathbf{H}$ between the cameras and the world coordinate system (eye-in-hand). The camera calibration method proposed by Zhang (1999) and the hand–eye calibration method (Park & Martin, 1994; Tsai, Lenz, et al., 1989) are adopted in this paper. The calibrated results are shown in Table 3, where $f$ represents the focal length of the camera, $S_x$ and $S_y$ represent the width and height of a single pixel, and $C_x$, $C_y$ represent the coordinates of the optical center. $\mathbf{T}$, $\mathbf{H}$ represent the stereoscopic pose and the eye-in-hand relationship, respectively. $(X, Y, Z)$, $(R_x, R_y, R_z)$ represent the translation distance and rotation angle along the X-, Y-, and Z-axis. With all the calibrated parameters, image rectification is performed to make pairs of conjugate epipolar lines collinear and parallel to the horizontal image axis. After rectification, the distortion coefficient $k$ of the camera becomes 0, and the two cameras are rectified to horizontal alignment. Their optical centers $C_y$ are equal, the heights are the same ($Z = 0$), the rotation angles $R_x$, $R_y$, $R_z$ are 0, and there is only translation in the X direction.
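The rectification step described above can be sketched with the standard OpenCV calls; the variable names below are assumptions, and the intrinsics, distortion coefficients and relative pose would come from the calibration in Table 3.

```python
# Sketch of stereo rectification with OpenCV: K_L, K_R are 3x3 intrinsic matrices,
# d_L, d_R distortion coefficients, and R, T the pose of camera R relative to camera L.
import cv2

def rectify_pair(img_L, img_R, K_L, d_L, K_R, d_R, R, T):
    size = img_L.shape[1], img_L.shape[0]                     # (width, height)
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_L, d_L, K_R, d_R, size, R, T)
    map_Lx, map_Ly = cv2.initUndistortRectifyMap(K_L, d_L, R1, P1, size, cv2.CV_32FC1)
    map_Rx, map_Ry = cv2.initUndistortRectifyMap(K_R, d_R, R2, P2, size, cv2.CV_32FC1)
    rect_L = cv2.remap(img_L, map_Lx, map_Ly, cv2.INTER_LINEAR)
    rect_R = cv2.remap(img_R, map_Rx, map_Ry, cv2.INTER_LINEAR)
    return rect_L, rect_R, Q                                  # Q reprojects disparity to 3D
```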
Dataset We have three knuckle arms with magnetic cracks. In the experiment, a total of 5 image pairs are obtained by the binocular camera (two ACA2040-90UC Basler color cameras) from different angles and positions, shown in the top row of Fig. 16.

Stereo Matching Fig. 16 shows the disparity maps and corresponding point clouds of five groups of crack defects. The top row is the left reference image. The second row is the disparity map obtained by stereo matching according to Eq. (8). The bottom row is the 3D point cloud obtained by Eq. (9). The background area is limited by the threshold and does not participate in the calculation. The color intensity in the depth maps is positively correlated with the actual coordinate value.


Fig. 14. Comparison of results obtained by different methods on five sample images (From top to bottom).

Fig. 15. U-Net segmentation samples at image crop size of 1024 × 1024.


Fig. 16. Examples of stereo matching for depth estimation. Top row: Original left image. Second row: Disparity map. Bottom row: Point cloud.

Fig. 17. Crack defects 3D coordinates in the camera coordinate system and the world coordinate system.

Coordinate Transformation The coordinates in the camera coordinate system often need to be transformed into other coordinate systems, such as the robot world coordinate system, for use. The coordinate transformation in this paper has two main purposes: one is for subsequent robot marking, and the other is for reconstruction error analysis. The first crack defect point cloud in the camera coordinate system is shown in Fig. 17(a). Its coordinates transformed by Eq. (10) into the robot coordinate system are shown in Fig. 17(b).

Ground Truth It is often difficult to obtain actual 3D data and align it to the corresponding image by vision systems or other 3D techniques for error analysis. Thanks to the fact that only the 3D coordinates of the crack are required, it is possible to obtain the 3D coordinates of some points on the surface crack of the part as the real values through the robot tool. Combined with the coordinate transformation, the reconstructed 3D coordinates of the crack, as well as the ground truth in the robot coordinate system, can be obtained. The main problem is that the actual values introduce robotic system errors, such as TCP (Tool Center Point) calibration errors and manual manipulation errors. The second problem is that the number of reconstructed 3D points is much larger than the number of acquisitions, and it is necessary to find the point corresponding to the actual value in the point cloud. We pick out the corresponding points from the point cloud by Euclidean distance minimization, which is an imprecise but reasonable approach assuming the two objects are close enough. Finally, we selected a total of 150 points in 5 groups of experiments.

Error Analysis Under the previous assumptions, the errors of the five sets of data along the X-, Y- and Z- axes were calculated by Eq. (11), as shown in Fig. 18 (from left to right). The evaluation metrics in Eqs. (11)–(14) are analyzed, and the results are shown in Table 4. The mean absolute errors on the X-, Y- and Z- axes are 1.67, 1.25 and 1.19 mm, respectively.


Fig. 18. 3D reconstruction errors. From left to right: X-, Y-, and Z-axis errors, respectively.

Table 4
The statistical results of crack defect 3D reconstruction.
             MAE (mm)          MSE (mm)          Percentage (<1 mm)      Percentage (<2 mm)      Percentage (<3 mm)
             x     y     z     x     y     z     x      y      z        x      y      z        x       y      z
Calculated   1.67  1.25  1.19  1.95  1.43  1.35  30.8%  40.3%  40.9%    55.7%  87.9%  91.9%    95.3%   99.3%  99.3%
Corrected    0.90  0.61  0.62  1.03  0.79  0.84  53.0%  81.9%  87.2%    99.3%  99.3%  98.0%    100.0%  99.3%  98.0%

The root mean square errors are 1.95, 1.43 and 1.35 mm, respectively. The percentages of points with errors less than 1 mm are 30.8%, 40.3%, and 40.9%, respectively, and the errors of more than 95% of the points are within 3 mm. Because the calibration errors of the robot often show directivity at the macro level, the errors are fitted horizontally, and the intercepts of the yellow horizontal lines in Fig. 18 are −1.65, −1.19 and −1.06 mm, respectively. The average absolute errors corrected by these intercepts are 0.90, 0.61, and 0.62 mm on the X-, Y-, and Z- axes, respectively. More than 50% of the points are within 1 mm, and more than 98% are within 2 mm.

4.4. Model deployment

The models are deployed using the NVIDIA Triton Inference Server, an open source inference service that can be used to deploy models from all popular frameworks. It supports common frameworks such as TensorFlow and PyTorch. NVIDIA Triton Inference Servers maximize performance and reduce end-to-end latency by running multiple models simultaneously on GPUs. The system communication and processing pseudo-code are shown in Algorithm 1. Its inputs include the image pair $I = (I_L, I_R)$, the camera parameters $K$, the hand–eye relationship $\mathbf{H}^{cam}_{tool}$, and the robot positions $\mathbf{H}^{tool}_{wcs} \leftarrow \{\mathbf{H}^1, \ldots, \mathbf{H}^n\}$. Its output is a list $P$ of surface crack defect 3D coordinates. Firstly, the camera is initialized, and communication is established with the robot through Socket/TCP. After receiving the client's movement command, the robot drives the camera to move to the preset position and feeds back the signal to the client. Then the cameras collect images at this position and perform detection processing through the model deployed on the server side. Here we use parallel processing and image queues. The detected crack areas $\{D_L, D_R\}$ are checked against two conditions, and if the conditions are satisfied, operations such as cropping, segmentation, and reconstruction are continued to calculate the 3D coordinates of the crack. When done, the robot is asked to move to the next position $\mathbf{H}^{tool}_{wcs}$. In the middle, the PLC drives the rotary mechanism to rotate the parts, and the next round of photos is collected cyclically.

Algorithm 1: Crack detection and 3D localization
Input: Image pair $I = (I_L, I_R)$; camera parameters $K$; hand–eye relationship $\mathbf{H}^{cam}_{tool}$; robot positions $\mathbf{H}^{tool}_{wcs} \leftarrow \{\mathbf{H}^1, \ldots, \mathbf{H}^n\}$
Output: A list $P$ of surface crack defect 3D coordinates
  Initialization: camera TCP; robot Socket; models
  $P \leftarrow \{\}$   ⊳ list of surface crack defect 3D coordinates
  for $i \in \{1, 2, \ldots, n\}$ do
      Model ← $\{I_L, I_R\}$ ← camera ← Robot Socket
      $\{D_L^{(1)}, \ldots, D_L^{(s)}\}, \{D_R^{(1)}, \ldots, D_R^{(t)}\}$ ← Detection($I_L, I_R$)
      if size of $d \in \{D_L, D_R\} > \sigma$ then
          End
      else if $d \in \{D_L, D_R\}$ is empty or $d \notin I_L \cap I_R$ then
          Continue
      else
          $\{R_L^{(1)}, \ldots, R_L^{(s)}\}, \{R_R^{(1)}, \ldots, R_R^{(t)}\}$ ← Crop($I^i, D^i$)
          $\{S_L^{(1)}, \ldots, S_L^{(s)}\}, \{S_R^{(1)}, \ldots, S_R^{(t)}\}$ ← Segment($R_L, R_R$)
          $\{I_L^S, I_R^S\}$ ← Region-mapping($S_L, S_R$)
          $I_{dis}$ ← Stereo-matching($I_L^S, I_R^S$)
          $P_{cam}$ ← 3D coordinates in the camera frame($I_{dis}, K$)
          $P_{world}$ ← ($P_{cam}, \mathbf{H}^{tool}_{wcs}, \mathbf{H}^{cam}_{tool}$)
          $P \leftarrow P \cup P_{world}$
      end
  end
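As an illustration of querying a model deployed on the Triton Inference Server from the workstation, a minimal HTTP-client sketch is shown below; the model name, tensor names and shapes are placeholders, not the actual server configuration.

```python
# Sketch of a Triton HTTP-client call to the deployed detection model.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def detect(image_chw: np.ndarray):
    # image_chw: float32 array of shape (3, 896, 896), already pre-processed.
    inp = httpclient.InferInput("images", [1, *image_chw.shape], "FP32")
    inp.set_data_from_numpy(image_chw[np.newaxis].astype(np.float32))
    out = httpclient.InferRequestedOutput("boxes")
    result = client.infer(model_name="scaled_yolov4_p5", inputs=[inp], outputs=[out])
    return result.as_numpy("boxes")   # candidate crack bounding boxes
```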
Table 5
Efficiency and GPU memory consumption.
Steps Size Run-time GPU Mem
Detection 896 × 896 53 ms 3.2 GB
Segmentation 512 × 512 55 ms 1.4 GB
Stereo matching 2046 × 2040 105 ms –

The GPU memory consumption and run time of the models deployed on the workstation are shown in Table 5. The image precision is 32-bit float. In detection, the image size is down-sampled to 896 × 896, the average GPU consumption is 3.2 GB, and the average inference time of a single image is 53 ms. The detection results are displayed on the original images with a size of 2046 × 2040. In the image segmentation stage, the image size is 512 × 512, the average GPU consumption is 1.4 GB, and the average inference time of a single image is 55 ms. In the stereo matching stage, the image size is 2046 × 2040, and it takes 105 ms to calculate on the CPU. These times are for the entire operation process, including pre-processing and network delays, etc. According to the calculation for two pictures, the total time of the models is about 320 ms. It should be noted that not all images require segmentation and stereo reconstruction. On the other hand, this paper uses a robot to drive the camera to move, and the magnetic particle flaw detector stops every 90 degrees to take 3 groups of photos each time, a total of 12 groups of photos. The rotational speed of the mechanism is 180 degrees per second. Thanks to parallel computing and image queues, the average detection time of a single part is within 5 s (excluding the processes of loading and unloading, drenching and magnetizing, etc.) for the currently measured formed parts shown in Fig. 16.


computing and image queues, the average detection time of a single Biederer, S., Knopp, T., Sattel, T. F., Lüdtke-Buzug, K., Gleich, B., Weizenecker, J.,
part is within 5 s (excluding the processes of loading and unloading, et al. (2009). Magnetization response spectroscopy of superparamagnetic nanopar-
ticles for magnetic particle imaging. Journal of Physics D: Applied Physics, [ISSN:
drenching and magnetizing, etc.) for the currently measured forming
0022-3727] 42(20), Article 205007. http://dx.doi.org/10.1088/0022-3727/42/20/
parts shown in Fig. 16. 205007.
Bochkovskiy, A., Wang, C., & Liao, H. M. (2020). Yolov4: Optimal speed and accuracy
5. Conclusion of object detection. http://dx.doi.org/10.48550/arXiv.2004.10934, arXiv preprint
arXiv:2004.10934.
British Standards Institution (1999). Open die steel forgings for general engineering
This paper developed a automated fluorescent MPI framework for purposes-part 1: General requirements. https://dlscrib.com/download/bs-10250-4-
simultaneous crack defect detection and its 3D localization. First, a 2000_59b98ded08bbc5bc27894d06_pdf.
Cheng, X., & Yu, J. (2020). Retinanet with difference channel attention and adaptively
two-stage model is used to obtain crack defect pixel coordinates. The
spatial feature fusion for steel surface defect detection. IEEE Transactions on
first stage operates on the image and removes the noisy background Instrumentation and Measurement, 70, 1–11. http://dx.doi.org/10.1109/TIM.2020.
area. The second stage is used to classify all pixels in the crack re- 3040485.
gion localized in the first stage. Then, the images are rectified by Chin, R. T., & Harlow, C. A. (1982). Automated visual inspection: A survey. IEEE
Transactions on Pattern Analysis and Machine Intelligence, PAMI-4(6), 557–573. http:
the vision system parameters, and the stereo disparity is estimated
//dx.doi.org/10.1109/TPAMI.1982.4767309.
by matching the crack defect pixels in the left and right images to Choi, D. c., Jeon, Y. J., Kim, S. H., Moon, S., Yun, J. P., & Kim, S. W. (2017).
restore 3D depth maps. Next, the model deployment, communication Detection of pinholes in steel slabs using Gabor filter combination and morpho-
and efficiency analysis of the whole system are completed. Calculations logical features. Isij International, 57(6), 1045–1053. http://dx.doi.org/10.2355/
and experiments based on this system are designed. High crack defect isijinternational.ISIJINT-2016-160.
Eisenmann, D. J., Enyart, D., Lo, C., & Brasche, L. (2015). Review of progress in
detection, segmentation precision, and low crack spatial position error magnetic particle inspection. AIP Conference Proceedings, 1581(1), 1505. http://
are obtained. dx.doi.org/10.1063/1.4865001.
In future work, the convenience, efficiency and economy of multiple fixed cameras versus a camera-carrying manipulator will be compared by optimizing the camera viewpoints, so that the most suitable arrangement scheme can be selected according to actual needs. We also plan to analyze the specificity of fluorescence images and crack defects and to continuously optimize the structure and size of the model.
CRediT authorship contribution statement

Qiang Wu: Designed the study, Contributed to analysis and manuscript preparation. Xunpen Qin: Conception of the study. Kang Dong: Data analysis and experimental platform. Aixian Shi: Data analysis and experimental platform. Zeqi Hu: Data analysis and experimental platform.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgments

The authors would like to thank all the Hubei Key Laboratory of Advanced Technology for Automotive Components staff for supporting this work. The work was supported by the Major Project of Technological Innovation in Hubei Province (2020BED010) and the China Postdoctoral Science Foundation (2020M682498).
