
SDDNet: Surface Defect Detection Model for Classification and Localization of Surface Defects
Manjeet Kaur¹, Krishan Kumar Chauhan¹, Isibor Kennedy Ihianle², Kayode Owa², Naveen Aggarwal¹, Renu Vig¹, Garima Joshi¹∗
¹ UIET, Panjab University, Chandigarh, India
² School of Science and Technology, Nottingham Trent University, 50 Shakespeare Street, Nottingham, NG1 4FQ, United Kingdom

Abstract

Quality inspection of fasteners is essential for safe and reliable operation. Manual inspection is usually necessary to assure product quality, but it results in production bottlenecks, lower productivity, and poor efficiency. To meet the increased demand for high-quality products, sophisticated visual inspection systems must be integrated into production lines to address these obstacles. Such systems can also support industrial sustainability, since they help increase industry throughput by identifying flaws and pinpointing their root cause. Object detection frameworks have recently shown extraordinary performance in image classification and localization applications, and by customizing an object detector, an intelligent defect detection system can be designed for quality inspection tasks. In this work, a single shot detector (SSD) approach based on MobileNetV2, called the Surface Defect Detection Network (SDDNet), is investigated, deployed, and assessed. The work examines how to adjust the anchors of the single shot detector to take different object shapes and sizes into account. The proposed method improves the defect detection accuracy to 98.04%, compared with 94.64% without adjusting the anchor aspect ratios.


Keywords: MobileNetV2; object detection; anchor tuning; defect detection; precision


∗Corresponding author
Email address: joshi_garima5@yahoo.com

Preprint submitted to Journal of LATEX Templates October 20, 2022

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4254498
1. Introduction

For sustainable operations in the fastener manufacturing industry, defects in fasteners must be recognised and associated with their source so that the underlying cause can be determined and corrected [1]. The various types of surface defects caused by mechanical processing are classified as cracks, wrinkles, dents, scratches, cuts, and missing or misaligned threads [2]. A crack is visible damage along a fastener's border that is typically caused by a stressed surface during the heating process [3]; fastener raw materials may also develop cracks. Wrinkles are imperfections caused by material displacement during forging, particularly of nuts [4]. A dent occurs when metal is under-filled during forging, leaving a deep mark in the surface. Scratches are another common defect, caused by excessive thermal stress on the fastener's surface during heat treatment [5]. Cuts on the surface are frequently produced by improper handling of the workpiece during tool movement. Damaged threads are caused by mismanagement of production tools, resulting in overfilling, misalignment, and mismatched threads on the fasteners [6]. An automatic segregation mechanism is therefore essential to detect these defects.

Although image processing and machine learning-based approaches are efficient, they have certain limitations. Image processing procedures require images to be pre-processed, filtered, and scaled, while machine learning methods require an extensive feature extraction process to reach the required degree of accuracy [7]. Furthermore, since these algorithms need prior knowledge of all defect shapes, they cannot cope flexibly with noisy, low-quality, or low-resolution images. Deep learning has been shown to be much more effective at overcoming these limitations [8].

The defect detection model proposed in this paper aims to both classify and localize all the defects present on the surface. From an object detection point of view, surface defect detection requires a trade-off between the speed and the accuracy of the model; in the proposed work, an accurate and precise defect classification and localization model is targeted rather than a fast but less accurate one. This paper makes three main contributions: (1) it presents a deep learning model for image classification and localization; (2) it evaluates the model on two different datasets containing fasteners of different sizes and defects; and (3) it shows that using anchors with different aspect ratios significantly improves the classification performance of the proposed surface defect detection network for the different features present. The resulting approach can thus be implemented in real-world automotive applications to provide better quality inspection of fasteners. The paper is organised as follows. Section 2 discusses related work in the literature. Section 3 describes the proposed methodology of the study, including a description of the datasets. Section 4 presents the experimental results, and Section 5 provides concluding remarks and proposed future work.

2. Related Works

Related work on automatic defect detection systems for fasteners using deep learning techniques is summarized in this section. Kou et al. designed a defect detection model based on YOLOv3 to extract rich feature information; dense convolution blocks were included, which substantially improved the feature characterization ability. The model achieved 71.3% mAP on the GC10-DET dataset and 72.2% mAP on the NEU-DET dataset [9]. Xu et al. integrated the layers of the Darknet-53 model with a deep feature neural network; this model was based on YOLOv3 and achieved more than 75% accuracy [10]. In another study, Gai et al. developed a VGGNet architecture to detect defects on steel metal surfaces. They used an industrial camera to capture images of defective steel surfaces and found that their model performed better than traditional methods [11]. Song et al. presented a CNN-based technique in which damaged screws were considered the defective case, with images taken by an industrial camera; the model achieved 98% accuracy [12]. Patar et al. developed a faster region-based CNN (Faster R-CNN) using the same screw classes as Song et al., implemented on a Raspberry Pi 3 with a camera module for image capture. The model outperformed the usual template matching and single shot detection models, with an accuracy of over 98% [13]. Chen et al. proposed a novel three-stage DCNN-based detection setup, with two detectors to localize the cantilever joints and their fasteners and a classifier to inspect the fasteners; the model achieved a high detection rate [14]. On the publicly accessible GC10-DET dataset, Lv et al. proposed an end-to-end defect detection method for metallic surfaces based on the single shot multibox detector, and used a hard negative mining method to resolve the data imbalance problem [15].

A comprehensive study of the literature shows that a variety of deep learning-based techniques for defect identification on fasteners have been investigated [16]. Both transfer learning with pre-trained models and models developed on top of pre-trained algorithms have been extensively employed for defect assessment [17]. He et al. used three aspect ratios (1:1, 1:2, 2:1) in the Region Proposal Network (RPN), which slides over the feature map to extract regions: it takes an image as input and returns a set of rectangular boxes, each with an objectness score [18]. Chen et al. used k-means clustering to determine the size of the anchor boxes in an extended feature pyramid network (EFPN), with three anchor boxes per level. Feature maps from the upper layers used larger anchor boxes to capture large objects, and the lower layers used smaller anchor boxes to extract small objects [19]. Other authors observed that the YOLOv2 and YOLOv3 models could not distinguish the small defects in the NEU-DET dataset, and proposed an end-to-end defect detection network (EEDN) based on Faster R-CNN, which overcame this problem by using anchor boxes [15].

Although the related works mentioned above are commendable, not all defect types, sizes, and shapes were considered. A major limitation of YOLOv3 object detection models is their inability to recognise extremely small defects in higher-resolution images. Moreover, models that can match industrial quality standards have not yet been implemented, and no machine vision-based image dataset for fasteners has been generated so far. Although lightweight single shot detectors are capable of performing in real-time scenarios, techniques that can enhance the performance of these models for defect identification systems still need to be investigated. Finally, defect detection models should be extended to consider objects of various sizes.

3. Proposed Methodology
The proposed methodology for automatic defect localization based on a machine vision camera is shown in Figure 1. As depicted in the block diagram, images of defective and non-defective fasteners are captured using a machine vision camera, and data augmentation is applied to the resulting images. The images are then processed, annotated, and labelled, and used as input to the Surface Defect Detection Network (SDDNet) classifier, with anchor box tuning to optimise performance.

Figure 1: Proposed methodology for machine vision based automatic defect localization

3.1. Dataset Description


To train the object detection framework for automatic surface defect detection, a database of images of different types of defects is needed. Unlike generic object detection problems, fastener defect detection has no standard database available for this purpose. Hence, a database specifically for small fasteners (nuts), the Small Nut Surface Defect Dataset, was created for the object detection task in this paper. The proposed model has also been tested and validated on a publicly available dataset, the NEU-DET surface defect database. Details of these datasets are presented in the following paragraphs.

3.1.1. Small Nut Surface Defect Dataset
Samples of defective and non-defective nuts of size M6¹ (shape: hexagonal; width across flats: 10 mm; thickness: 5.2 mm) and M8 (shape: hexagonal; width across flats: 13 mm; thickness: 6.8 mm) were collected from several industries for this study. A five-megapixel LUCID Vision machine vision camera with a Sony IMX264 CMOS global-shutter sensor was used under diffused light to capture the samples. Data augmentation by scaling, contrast variation, and rotation resulted in a total of 1220 images. Each image was cropped to the required region to reduce the complexity of the model. Fine details of the defects are visible in the images captured with the machine vision camera; the clear difference can be seen in Figure 2. Figure 3 depicts the level of complexity in defect detection, ranging from large, clearly visible defects to minor cracks that are difficult to detect. It can also be seen that even defects of the same type differ in shape and size.

Figure 2: a) Image of smallest target size of nut b) Microscopic crack visible with machine vision camera c) Machine vision camera setup

To identify the defects, each image was annotated and labelled as part of data processing in readiness for the classifier. In this study, four classes are targeted: dent, crack, scratch, and non-defective fasteners. The images were labelled with the LabelImg tool. A bounding box is drawn around the defective surface, which is used to identify and classify the defect. For each labelled image, an XML file is created that contains the exact coordinates and class label of every bounding box describing a defect position.
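LabelImg saves these XML files in the Pascal VOC layout, with one `<object>` element per box that holds a `<name>` and a `<bndbox>` containing `xmin`, `ymin`, `xmax`, and `ymax`. As a minimal sketch assuming that layout, the annotations could be read back with Python's standard library as follows; the helper name `read_labelimg_xml`, the file name, and the coordinates in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (class label, xmin, ymin, xmax, ymax) in pixels
Box = Tuple[str, int, int, int, int]


def read_labelimg_xml(xml_text: str) -> List[Box]:
    """Return every labelled bounding box in a Pascal VOC style annotation."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates may be stored as "120" or "120.0"; normalise to int.
        xmin, ymin, xmax, ymax = (int(float(bndbox.findtext(tag)))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, xmin, ymin, xmax, ymax))
    return boxes


# Hypothetical annotation for one image with two defects.
EXAMPLE_XML = """<annotation>
  <filename>nut_0001.png</filename>
  <size><width>640</width><height>640</height><depth>3</depth></size>
  <object>
    <name>crack</name>
    <bndbox><xmin>120</xmin><ymin>88</ymin><xmax>190</xmax><ymax>104</ymax></bndbox>
  </object>
  <object>
    <name>dent</name>
    <bndbox><xmin>300</xmin><ymin>310</ymin><xmax>342</xmax><ymax>351</ymax></bndbox>
  </object>
</annotation>"""

print(read_labelimg_xml(EXAMPLE_XML))
```

Each returned tuple pairs a class label with its box, which is the form a detector's training pipeline typically needs.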

¹ Engineers Edge

Figure 3: Level of Complexity in the Dataset
3.1.2. NEU-DET Surface Defect Database
Experiments were also performed on a dataset created by Northeastern University (NEU), known as the NEU-DET dataset [15]. The dataset consists of six types of surface defects on hot-rolled steel strips: patches, pitted surface, scratches, rolled-in scale, crazing, and inclusion. It contains 1800 grayscale images, 300 per class, each with a resolution of 200x200 pixels.

3.2. Surface Defect Detection Network (SDDNet)
The surface defect detection network (SDDNet) is a hybrid model based on the single shot detector (SSD) architecture, which takes only one shot to detect all the objects present in an image. Here, SSD uses MobileNetV2 as the backbone network [20]. MobileNet is a lightweight network that substantially reduces the network's complexity, cost, and size, and is intended for real-time object detection. SSD uses multi-scale features and default anchor boxes; in this model, the multi-scale features are computed by a feature pyramid network (FPN) layer [21], so the FPN architecture provides the pyramid structure for feature extraction. Some deep learning architectures up-sample the intermediate outputs directly to the size of the final predicted mask, which results in a coarse mask. Up-sampling by one resolution step, fusing the result with the previous intermediate output, and continuing the up-sampling in this manner preserves the finest details: if an n × n feature map is first up-sampled to 2n × 2n and then to 4n × 4n, more of the local structure is preserved than when up-sampling directly to 4n × 4n. This procedure is known as up-sampling in a pyramidal manner, and the fused feature maps create the final output. The SSD head contains six prediction layers, which are auxiliary convolutional layers known as the detection head [22]; these layers use the feature maps to make predictions. Each box of the detection head has two outputs: confidence scores and a box offset (box prediction). The confidence score indicates how likely the box is to contain an object; if the object score is higher than the background score, the box is selected as a positive default box. Each default box has a certain location, size, shape, and aspect ratio, and the box offset adjusts these parameters by predicting four offsets relative to the original default bounding box [23]. In the proposed model, the default anchor boxes have been replaced by new, tuned anchor boxes.
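The four box offsets can be made concrete with the centre-size encoding used in the original SSD formulation. The sketch below assumes that encoding, with variances of 0.1 for the centre and 0.2 for the size (a common default, not a value stated in this paper); `decode_offsets` is a hypothetical helper name.

```python
import numpy as np


def decode_offsets(anchors: np.ndarray, offsets: np.ndarray,
                   variances=(0.1, 0.2)) -> np.ndarray:
    """Apply predicted offsets (tx, ty, tw, th) to default boxes.

    anchors: (N, 4) default boxes as (cx, cy, w, h)
    offsets: (N, 4) predicted offsets, one row per default box
    returns: (N, 4) decoded boxes as (xmin, ymin, xmax, ymax)
    """
    # Centre shifts are scaled by the size of the default box.
    cx = anchors[:, 0] + offsets[:, 0] * variances[0] * anchors[:, 2]
    cy = anchors[:, 1] + offsets[:, 1] * variances[0] * anchors[:, 3]
    # Width and height are rescaled multiplicatively.
    w = anchors[:, 2] * np.exp(offsets[:, 2] * variances[1])
    h = anchors[:, 3] * np.exp(offsets[:, 3] * variances[1])
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


default_box = np.array([[0.5, 0.5, 0.2, 0.2]])
# Zero offsets reproduce the default box itself.
print(decode_offsets(default_box, np.zeros((1, 4))))
# tw = 5 * ln(2) doubles the width, because exp(5 * ln(2) * 0.2) = 2.
print(decode_offsets(default_box, np.array([[0.0, 0.0, 5 * np.log(2), 0.0]])))
```

Because every prediction is expressed relative to a default box, a default box whose shape is far from an object's shape needs large offsets, which is one motivation for tuning the anchors.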

Figure 4: Proposed Architecture of SDDNet

3.2.1. Anchor Box Tuning

Anchor box tuning is one of the important factors that can be adjusted to boost performance and optimize the model in terms of accuracy and efficiency [24]. If the anchor boxes are not adjusted to the needs of the dataset, the CNN may fail to detect some small, large, or irregular objects. Considerable effort can therefore be made to ensure that the correct anchor boxes are chosen by tuning them to the dataset. SSD assigns training targets to the anchors as follows. First, multiple anchor boxes are created for each predictor, approximating the possible positions, sizes, and shapes of objects. Next, for each anchor box, the object whose bounding box has the largest IoU (Intersection over Union) with it is found. If this IoU is more than 50%, the neural network is instructed to learn that object. If the maximum IoU is less than 40%, the anchor box is treated as containing no object. Otherwise, the network is told that the detection is ambiguous and that the example should not be learnt [25]. In practice this works well, and the hundreds of predictors do a good job of assessing whether an image contains a particular type of object. However, if the default anchor box setup is used, some objects may not reach a 50% IoU with any of the anchor boxes. In that case, the neural network is never made aware of these objects and so can never learn to detect them.
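The matching rule above can be sketched in NumPy. This is a minimal illustration assuming corner-format boxes and the 50%/40% thresholds quoted in the text; it is not the authors' training code, and the function names and box coordinates are hypothetical. The second call shows the failure mode just described: a small object that reaches 50% IoU with none of the anchors is never assigned to any of them.

```python
import numpy as np


def iou_matrix(anchors: np.ndarray, gts: np.ndarray) -> np.ndarray:
    """Pairwise IoU between anchors (N, 4) and ground truths (M, 4), boxes as (xmin, ymin, xmax, ymax)."""
    x1 = np.maximum(anchors[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)


def match_anchors(anchors, gts, pos_thr=0.5, neg_thr=0.4):
    """One label per anchor: matched object index (>= 0), -1 for no object, -2 for ignored."""
    iou = iou_matrix(anchors, gts)
    best_gt = iou.argmax(axis=1)        # object with the largest IoU for each anchor
    best_iou = iou.max(axis=1)
    labels = np.full(len(anchors), -2)  # ambiguous by default: not learnt
    positive = best_iou > pos_thr
    labels[positive] = best_gt[positive]
    labels[best_iou < neg_thr] = -1     # treated as containing no object
    return labels


anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 22], [20, 20, 30, 30]], dtype=float)
print(match_anchors(anchors, np.array([[0, 0, 10, 10]], dtype=float)))  # [ 0 -2 -1]
print(match_anchors(anchors, np.array([[1, 1, 3, 3]], dtype=float)))    # [-1 -1 -1]
```

In the first call the second anchor overlaps the object with IoU 100/220 (about 0.45), so it falls in the ignored band; in the second call the tiny object is matched by no anchor at all.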

3.2.2. Anchor tuning using k-means clustering


For anchor tuning, the k-means algorithm (an unsupervised learning approach) is used, as it offers a computationally fast and efficient way to find a collection of aspect ratios that covers the majority of the shapes in the dataset [26]. This is accomplished by identifying common clusters among the dataset's bounding boxes and then locating the centroids of these clusters with k-means clustering.

Figure 5: Height Width Scatter plot and Anchor Box Visualisation from the Training Dataset

The resulting anchor boxes for the dataset are shown in Figure 5. The height and width scatter plot shows that most values are concentrated in the lower range. Therefore, to train the model with the most suitable anchors, the aspect ratio values (the height/width ratio of the anchor boxes) are chosen so that each ground-truth bounding box can be matched to one of multiple predefined anchor boxes during training [27]. In this work, anchor sets with 5, 10, and 15 aspect ratios were selected for training the model, and the results were compared with the default anchor value of 3, with aspect ratios 0.5, 1.0, and 2.0. Table 1 provides the anchor values and aspect ratios used. For visualization and comparison, the anchor boxes for the default anchors and for 15 anchors are shown in Figure 6.
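A minimal sketch of the clustering step follows, assuming plain one-dimensional k-means on the height/width ratios of the ground-truth boxes. The paper does not state the distance measure or the initialisation, so quantile initialisation is used here only to make the result deterministic; the function name and the box sizes are hypothetical.

```python
import numpy as np


def kmeans_aspect_ratios(heights, widths, k: int, iters: int = 100) -> np.ndarray:
    """Cluster box aspect ratios (height / width) into k centroids with 1-D k-means."""
    ratios = np.asarray(heights, dtype=float) / np.asarray(widths, dtype=float)
    # Start from evenly spaced quantiles so the result is deterministic.
    centroids = np.quantile(ratios, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assignment step: each ratio joins its nearest centroid.
        assign = np.abs(ratios[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        updated = np.array([ratios[assign == j].mean() if np.any(assign == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return np.sort(centroids)


# Hypothetical boxes: two thin (scratch-like), two near-square, two tall.
print(kmeans_aspect_ratios([1, 2, 9, 11, 60, 80], [10] * 6, k=3))
```

Clustering raw ratios treats 0.1 and 10 asymmetrically; clustering the logarithm of the ratios is a common alternative, and which one suits this dataset is a design choice rather than something the paper specifies.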

Anchor Values   Aspect Ratios
3               0.50, 1.00, 2.00
5               0.15, 0.75, 1.00, 1.34, 6.90
10              0.10, 0.30, 0.40, 0.50, 0.70, 1.80, 2.30, 2.80, 4.00, 6.90
15              0.07, 0.15, 0.24, 0.35, 0.45, 0.65, 0.75, 1.00, 1.34, 1.55, 2.25, 2.90, 4.17, 6.90, 15.40

Table 1: Anchor values and aspect ratios
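To see what the Table 1 ratios mean geometrically, each ratio r = height/width can be turned into an anchor shape of fixed area. The sketch assumes a hypothetical base size of 32 pixels and the height/width convention stated in this paper; some detection frameworks define the ratio as width/height instead, in which case the two sides swap. The function name is hypothetical.

```python
import math

DEFAULT_RATIOS = [0.50, 1.00, 2.00]
TUNED_15_RATIOS = [0.07, 0.15, 0.24, 0.35, 0.45, 0.65, 0.75, 1.00,
                   1.34, 1.55, 2.25, 2.90, 4.17, 6.90, 15.40]


def anchor_shapes(base_size, ratios):
    """Return (width, height) per ratio r = height / width, keeping each anchor's area at base_size**2."""
    shapes = []
    for r in ratios:
        w = base_size / math.sqrt(r)
        h = base_size * math.sqrt(r)
        shapes.append((w, h))
    return shapes


for name, ratios in [("3 anchors", DEFAULT_RATIOS), ("15 anchors", TUNED_15_RATIOS)]:
    rounded = [(round(w), round(h)) for w, h in anchor_shapes(32, ratios)]
    print(name, rounded)
```

The tuned set spans far thinner and far taller boxes than the default 0.5 to 2.0 range, which is what lets elongated defects reach a usable IoU with at least one anchor.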

Figure 6: Aspect Ratios for 3 and 15 Anchor Boxes

3.3. Performance Metrics


For classification, the dataset is split into training and test subsets in an 80:20 ratio. During detection, the model predicts numerous bounding boxes for each object and eliminates unneeded boxes based on their confidence values. The performance of the model is computed using the following metrics:

• Accuracy: one of the most important classification metrics, defined as the total number of correct predictions divided by the total number of predictions, multiplied by one hundred.

• Mean Average Precision (mAP): for multiclass classification, mAP is the mean of the average precision across all classes. For a prediction to count as precise, the predicted label has to be correct and the prediction has to match the ground-truth labels. Average precision is the precision averaged across all distinct recall levels. This metric is important because it summarises a detector's performance in a single number that can be used directly to compare different detectors.

• Average Recall (AR): recall is the number of true positives divided by the number of ground truths, and estimates how many of the objects the model has found. Average recall is the recall averaged over all IoUs from 0.5 to 1, and can be computed as twice the area under the recall-IoU curve.

• True Positive (TP): IoU measures the overlap between the predicted bounding box and a ground-truth box. If the IoU is greater than the threshold, the prediction is classified as a TP. A threshold of 50% (0.5) is used to derive the results.

• Confidence Score: the confidence score is the probability, assigned by the algorithm, that a detection is correct, expressed as a percentage. These scores are used when computing the mean average precision at different IoU thresholds. In this work, the threshold is set at 0.5; that is, a detected label is taken for evaluation only if its confidence is over 50%.
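The metrics above can be sketched as follows, assuming detections have already been marked as true positives or not at IoU 0.5. Average precision uses all-point interpolation of the precision-recall curve (the interpolation scheme is an assumption, since the paper does not specify it). Average recall uses the fact that twice the area under the recall-IoU curve between 0.5 and 1 equals twice the mean of max(IoU - 0.5, 0) over the ground truths, where IoU is each ground truth's best overlap. All function names are hypothetical.

```python
import numpy as np


def accuracy_percent(num_correct: int, num_predictions: int) -> float:
    """Correct predictions divided by all predictions, multiplied by one hundred."""
    return 100.0 * num_correct / num_predictions


def average_precision(scores, is_tp, num_gt: int) -> float:
    """All-point interpolated AP for one class."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # Pad the curve and make precision monotonically non-increasing.
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_average_precision(per_class_ap) -> float:
    """Mean of the per-class AP values."""
    return float(np.mean(per_class_ap))


def average_recall(best_ious) -> float:
    """Twice the area under the recall-IoU curve for IoU thresholds in [0.5, 1]."""
    o = np.asarray(best_ious, dtype=float)
    return float(2.0 * np.mean(np.maximum(o - 0.5, 0.0)))


print(accuracy_percent(49, 50))                                 # 98.0
print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2))  # about 0.833
print(average_recall([0.9, 0.7, 0.4]))                          # about 0.4
```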

4. Results and Discussion


This section presents the results achieved with the methodology described in the previous sections. A set of experiments was performed first on the Small Nut Surface Defect Dataset and then on the NEU-DET dataset. The class-wise performance for the scratch, dent, crack, and non-defective (ND) classes of the Small Nut Surface Defect Dataset using 3, 5, 10, and 15 anchors is presented in Figure 7. The model with 15 anchors provided the highest precision for dent, crack, and ND. For scratch, the highest precision was provided by the model with 10 anchors. Since scratch is considered a non-critical defect, it can be concluded that 15 anchors are best suited to the Small Nut Surface Defect Dataset. The classification loss and localization loss are also minimal in the case of 15 anchors.

Figure 7: Class-wise Performance

Figure 8 compares the performance parameters obtained by adjusting the anchors, and the results are also listed in Table 2. In total, four experiments were conducted. The first experiment used the default anchor value of 3, which has only three aspect ratio values. With 15 anchors and 50% IoU, the class-wise classification accuracy of the fasteners is 98.04%, which outperforms the default anchor value of 3 with an accuracy of 94.64%. A further classification was performed to distinguish between the three defects, with non-defective as the fourth class. The overall mAP is 97% with 15 anchors at 50% IoU. Crack was best detected with 10 anchors and scratch with 15 anchors, with 99.28% and 99.27% accuracy, respectively. Although 10 anchors successfully classified all classes, 15 anchors gave the best average accuracy, precision, and recall. Comparing the relative performance of all the anchor values shows that 15 anchors performed best, as presented in Figure 8. This further demonstrates the significance of tuning the anchors to achieve the best performance. These classification results show that the proposed model is capable of distinguishing cracks, scratches, and dents from non-defective features as part of defective fastener detection.

Figure 8: Performance of SDDNet with Anchor Variation (accuracy, mAP@0.5, and average recall for Anchors = 3, 5, 10, and 15; vertical axis from 0.935 to 0.985)
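The F1 scores reported in Table 2 follow from the class-wise precision and recall as their harmonic mean, F1 = 2PR/(P + R). As a quick check, the sketch below recomputes the F1 scores for the 3-anchor setting from the Table 2 precision and recall values; rounded to two decimals they match the reported F1 row. The helper name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


classes = ["Dent", "Crack", "Scratch", "Non Defectives"]
precision_3 = [0.92, 1.00, 0.92, 0.99]  # Table 2, Classwise Precision, 3 anchors
recall_3 = [1.00, 1.00, 0.96, 0.89]     # Table 2, Classwise Recall, 3 anchors

for name, p, r in zip(classes, precision_3, recall_3):
    print(name, round(f1_score(p, r), 2))
```

The harmonic mean penalises imbalance: the non-defective class has near-perfect precision but lower recall, so its F1 stays at 0.94.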

The learning rate is one of the most crucial hyperparameters to tune. It determines how much the weights of the network change at each update during training: if the learning rate is low, the model requires more training steps and time, since each update changes the weights by only a small amount. Figure 9 shows the predicted results of the proposed model on the small nut dataset. Figure 10 illustrates the difference between training the model without and with anchor tuning. Without anchor tuning, the learning rate falls drastically within 5000 training steps, which indicates that the weights of the model are updated divergently. With anchor tuning, on the other hand, the learning rate varies slowly with respect to the training steps, since the weight updates are also slower and result in a lower loss. In this way, training was extended to 35,000 steps with a learning rate of 0.0179.
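The role of the learning rate can be illustrated with a toy example that is unrelated to the actual SDDNet training run: plain gradient descent on the one-dimensional loss L(w) = w²/2, whose gradient is w. Each update is w ← w - η·w, so a smaller learning rate η needs more steps to reach the minimum, while a learning rate that is too large makes the updates diverge. The function name and all values are hypothetical.

```python
from typing import Optional


def steps_to_converge(lr: float, w0: float = 5.0, tol: float = 1e-3,
                      max_steps: int = 100_000) -> Optional[int]:
    """Gradient descent on L(w) = w**2 / 2.

    Returns the number of steps until |w| < tol, or None if the updates
    diverge or do not converge within max_steps.
    """
    w = w0
    for step in range(1, max_steps + 1):
        w = w - lr * w  # gradient of w**2 / 2 is w; the step size scales with lr
        if abs(w) < tol:
            return step
        if abs(w) > 1e6:  # updates are growing instead of shrinking
            return None
    return None


for lr in (0.01, 0.1, 2.5):
    print(lr, steps_to_converge(lr))
```

With η = 0.01 the weight shrinks by only 1% per step, so convergence takes roughly ten times as many steps as with η = 0.1; with η = 2.5 each update overshoots the minimum by more than it started, so the iterates grow without bound.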
Overall Results
Anchors   mAP@0.5   Accuracy   Ave. Recall   Tot. Pred.
3         0.95      0.95       0.96          431 (42%)
5         0.96      0.97       0.97          540 (53%)
10        0.97      0.97       0.97          616 (60%)
15        0.97      0.98       0.98          703 (68%)

Classwise Accuracy
Anchors   Dent   Crack   Scratch   Non Defectives
3         0.98   1.00    0.96      0.95
5         0.99   0.99    0.98      0.97
10        0.99   0.99    0.98      0.98
15        0.99   0.99    0.99      0.99

Classwise Precision
Anchors   Dent   Crack   Scratch   Non Defectives
3         0.92   1.00    0.92      0.99
5         0.97   0.98    0.92      1.00
10        0.97   0.96    0.97      0.99
15        0.98   0.94    0.99      0.96

Classwise Recall
Anchors   Dent   Crack   Scratch   Non Defectives
3         1.00   1.00    0.96      0.89
5         0.99   1.00    0.99      0.91
10        0.99   0.98    0.98      0.95
15        0.98   1.00    0.99      0.96

F1 Score
Anchors   Dent   Crack   Scratch   Non Defectives
3         0.96   1.00    0.94      0.94
5         0.98   0.99    0.96      0.96
10        0.98   0.97    0.97      0.97
15        0.98   0.97    0.99      0.98

Table 2: Analysis of Performance with Anchor Tuning @0.5 IoU

Models                     mAP@0.5   Accuracy   Recall   Tot. Pred.   Pred. Rate
SSD MobileNetV2 320x320    0.86      0.85       0.86     411          0.40
SSD MobileNetV2 640x640    0.95      0.95       0.96     431          0.23
EfficientDet               0.87      0.92       0.94     408          0.39
Proposed SDDNet            0.97      0.98       0.98     703          0.68

Table 3: Comparison with other models on the Small Nut Dataset (Pred. = Prediction)

Figure 11 compares the total loss value over the training steps with and
without anchor tuning. It is apparent that a model without anchor tuning incurs a larger overall loss. The anchor sizes were adjusted based on the different types of defects identified in the dataset; as a result, the model can detect small, large, and irregular faults, which reduces the overall loss. Classification loss arises when the predicted class label of a bounding box does not match its ground-truth label. Regularization loss is generated by the network's regularization function and can guide the optimization algorithm in the appropriate direction; it helps to prevent over-fitting. Localization loss measures the difference between the predicted bounding box and the labelled bounding box.

Figure 9: Predictions on Small Nut Dataset: (a), (c) labelled; (b), (d) predicted

Figure 12 illustrates the classification loss, regularization loss, and localization loss for various anchor sizes. The minimum classification loss of 0.42 is achieved with 3 anchors. The regularization loss is unaffected by anchor tuning, while a localization loss of approximately 0.3 is achieved with anchor tuning. The decreased localization loss indicates an overall improvement in the precision of the proposed model.
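The three loss terms can be sketched as below. The paper does not state the exact loss functions, so this assumes a softmax cross-entropy classification loss, the smooth L1 localization loss used by the original SSD, and an L2 weight penalty with a hypothetical weight-decay value. The sketch also shows why the regularization term is unaffected by anchor tuning: it depends only on the network weights, not on the anchors or boxes.

```python
import numpy as np


def classification_loss(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy for one box."""
    z = logits - logits.max()  # shift for numerical stability
    return float(np.log(np.exp(z).sum()) - z[label])


def localization_loss(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    """Smooth L1 between predicted and target box offsets."""
    diff = np.abs(pred - target)
    return float(np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).sum())


def regularization_loss(weights, weight_decay: float = 4e-5) -> float:
    """L2 penalty on the network weights (weight_decay is a hypothetical value)."""
    return float(weight_decay * sum(np.sum(w ** 2) for w in weights))


logits = np.zeros(4)                     # four classes, no preference yet
pred = np.array([0.5, 2.0, 0.0, 0.0])    # predicted offsets for one box
target = np.zeros(4)                     # target offsets for that box
weights = [np.ones((3, 3)), np.ones(2)]  # stand-in network weights

total = (classification_loss(logits, 0)
         + localization_loss(pred, target)
         + regularization_loss(weights))
print(total)
```

Smooth L1 is quadratic for small offset errors and linear for large ones, so a single badly placed box does not dominate the localization loss.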
Figure 10: Comparison of Learning Rate

Figure 11: Comparison of Total Training Loss

The proposed model was also trained and analysed on the NEU-DET dataset. The predictions made by the model were compared with the test set by matching the class and location of each defect in the predicted image and the test image. The four entries of the confusion matrix (true positive, false negative, false positive, true negative) were calculated from this comparison.
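The comparison can be sketched as a greedy matcher: predictions are visited from highest to lowest confidence, and each one counts as a true positive if it has the same class as a not-yet-matched ground-truth box with an IoU of at least 0.5. This is an assumption about the matching procedure, which the paper does not detail, and the sketch counts only TP, FP, and FN. The function names and boxes are hypothetical; crazing and patches are NEU-DET class names.

```python
from typing import List, Sequence, Tuple

BoxT = Sequence[float]  # (xmin, ymin, xmax, ymax)


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds: List[Tuple[str, float, BoxT]],
                  gts: List[Tuple[str, BoxT]],
                  iou_thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy class-aware matching; returns (TP, FP, FN)."""
    matched = [False] * len(gts)
    tp = fp = 0
    for label, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, (gt_label, gt_box) in enumerate(gts):
            if matched[j] or gt_label != label:
                continue  # each ground truth is matched once, and only by its own class
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # wrong class, poor localization, or a duplicate detection
    fn = matched.count(False)  # ground truths that no prediction found
    return tp, fp, fn


gts = [("crazing", (0, 0, 10, 10)), ("patches", (20, 20, 40, 40))]
preds = [("crazing", 0.9, (1, 1, 10, 10)),
         ("patches", 0.8, (0, 0, 10, 10)),
         ("crazing", 0.6, (0, 0, 10, 10))]
print(count_matches(preds, gts))  # (1, 2, 1)
```

The third prediction overlaps the crazing box perfectly but is still a false positive, because that ground truth was already claimed by a higher-confidence detection.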
Figure 12: Loss Values at different Anchor Sizes

Table 4 compares the proposed model with and without anchor tuning on the NEU-DET dataset. With the default 3 anchors, a precision of 69%, an accuracy of 96.24%, and a recall of 80% are achieved, and the prediction rate is only around 25%. With anchor tuning, on the other hand, the model achieves a precision of 99%, an accuracy of 99.82%, a recall rate of 99%, and a prediction rate of 67%.

Figure 13: Predictions on NEU-DET dataset

Proposed Model       mAP@0.5   Accuracy   Recall   Tot. Pred.   Pred. Rate
Default Anchors      0.69      0.96       0.80     205          0.25
With Anchor Tuning   0.99      0.99       0.99     543          0.67

Table 4: NEU-DET Dataset Results

Model              mAP@0.5   Recall   F1-Score
FRCN               0.70      0.93     0.80
SSD                0.72      0.96     0.83
YOLOv2             0.50      0.74     0.60
YOLOv3             0.40      0.73     0.52
EEDN               0.72      0.99     0.84
IMN-YOLOv3         0.98      0.96     0.97
Proposed SDDNet    0.99      0.99     0.99

Table 5: NEU-DET Dataset Results Comparison with other Models

Table 5 shows a detailed comparison of the proposed model with other models in terms of mAP@0.5, recall, and F1 score on the NEU-DET dataset. The test set of the dataset is used to evaluate the proposed model, which achieves competitive results: 99% mAP@0.5, a 99.8% recall rate, and a 99.3% F1 score in the real-time scenario at 25 fps.

5. Conclusion
pe
In this article, a single shot detector (SSD) approach based on MobileNetV2, referred to as the Surface Defect Detection Network (SDDNet), was proposed for the classification and localization of surface defects, specifically on fasteners. The study used the Small Nut Surface Defect Dataset, covering M6 and M8 nut sizes, as well as a publicly available dataset. The experimental results show that the proposed SDDNet outperformed the SSD MobileNetV2 320x320, SSD MobileNetV2 640x640, and EfficientDet models in classifying fasteners with scratches, dents, and cracks, as well as non-defective fasteners. An analysis of the overall performance of all the models indicates that anchor tuning is an important factor in effectively detecting surface defects of various shapes and sizes. The results also demonstrate that anchor scales and aspect ratios play a significant role in classifying defects on objects of various sizes, since the anchors define multiple bounding boxes with varying scales and aspect ratios centered on each feature-map location. The best result was achieved with an anchor size of 15, which was therefore identified as the optimal anchor size for the Small Nut Surface Defect Dataset when scratches are taken into consideration. Using SDDNet with anchor tuning allows all objects in an image to be detected in a single pass, because a separate prediction is computed at every potential position, which eliminates the need to scan the image with a sliding window. Future work will focus on increasing the detection speed for on-the-fly defect detection.
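The role of anchor scales and aspect ratios discussed above can be illustrated with a minimal sketch of SSD-style default-box generation: for every cell of each feature map, one box is created per aspect ratio and centered on that cell, so the detector makes a separate prediction at every position in a single pass instead of sliding a window over the image. The function name `generate_anchors`, the feature-map sizes, the scales, and the aspect ratios below are hypothetical values chosen for illustration, not the SDDNet configuration; the original SSD additionally adds one extra square box per cell, which is omitted here for brevity.

```python
import math
from typing import List, Sequence, Tuple

# Assumed anchor format for this sketch: (cx, cy, w, h), normalised to [0, 1].
Anchor = Tuple[float, float, float, float]


def generate_anchors(fmap_h: int, fmap_w: int, scale: float,
                     aspect_ratios: Sequence[float]) -> List[Anchor]:
    """Hypothetical helper: SSD-style default boxes, one per aspect ratio,
    centred on every feature-map cell. Each box keeps area scale**2."""
    anchors: List[Anchor] = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Centre of the current cell in normalised image coordinates.
            cx = (col + 0.5) / fmap_w
            cy = (row + 0.5) / fmap_h
            for ar in aspect_ratios:
                # Aspect ratio is width / height.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                anchors.append((cx, cy, w, h))
    return anchors


if __name__ == "__main__":
    # Hypothetical multi-scale setup: (square feature-map size, scale) pairs.
    feature_maps = [(20, 0.1), (10, 0.3), (5, 0.6)]
    # Hypothetical tuned ratios, including elongated boxes (3.0 and 1/3)
    # that could better match thin defects such as scratches.
    ratios = [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0]
    total = 0
    for size, s in feature_maps:
        boxes = generate_anchors(size, size, s, ratios)
        total += len(boxes)
        print(f"{size}x{size} map, scale {s}: {len(boxes)} anchors")
    print("total predictions per image:", total)
```

In this setup each cell carries five anchors, giving 20x20x5 + 10x10x5 + 5x5x5 = 2625 box predictions per image. Adding aspect ratios increases this count linearly, which is the accuracy and speed trade-off that anchor tuning has to balance.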

Acknowledgement

This research was supported by the NTU-PU (Nottingham Trent University, United Kingdom, and Panjab University, India) Science and Technology Partnership Collaborative (STPC) research grant.
