Article
Research on a Surface Defect Detection Algorithm
Based on MobileNet-SSD
Yiting Li, Haisong Huang *, Qingsheng Xie, Liguo Yao and Qipeng Chen
Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University,
Guiyang 550025, China; tgl226537@163.com (Y.L.); qsxie@gzu.edu.cn (Q.X.); yaoliguo1990@163.com (L.Y.);
cqplll@gmail.com (Q.C.)
* Correspondence: huang_h_s@126.com; Tel.: +86-139-851-46670

Received: 26 August 2018; Accepted: 13 September 2018; Published: 17 September 2018

Abstract: This paper aims to achieve real-time and accurate detection of surface defects by using
a deep learning method. For this purpose, the Single Shot MultiBox Detector (SSD) network was
adopted as the meta structure and combined with the base convolution neural network (CNN)
MobileNet into the MobileNet-SSD. Then, a detection method for surface defects was proposed based
on the MobileNet-SSD. Specifically, the structure of the SSD was optimized without sacrificing its
accuracy, and the network structure and parameters were adjusted to streamline the detection model.
The proposed method was applied to the detection of typical defects like breaches, dents, burrs and
abrasions on the sealing surface of a container in the filling line. The results show that our method can
automatically detect surface defects more accurately and rapidly than lightweight network methods
and traditional machine learning methods. The research results shed new light on defect detection in
actual industrial scenarios.

Keywords: surface defects; meta structure; convolution neural network; MobileNet-SSD

1. Introduction
Intellisense and pattern recognition technologies have made progress in robotics [1–3], computer
engineering [4,5], health-related issues [6], natural sciences [7] and industrial academic areas [8,9].
Among them, computer vision technology develops particularly quickly. It mainly uses a binary
camera, digital camera, depth camera and charge-coupled device (CCD) camera to collect target images,
extract features and establish corresponding mathematical models, and to complete the processing of
target recognition, tracking and measurement. For example, Kamal et al. comprehensively consider
the continuity and constraints of human motion. After contour extraction of the acquired depth
image data, the Hidden Markov Model (HMM) is used to identify human activity. This system is
highly accurate in recognition and has the ability to effectively deal with rotation and deficiency of the
body [10]. Jalal et al. use texture and shape vectors to reduce feature vectors and extract important
features in facial recognition through density matching scores and boundary fixation, so as to manage the
key processing steps of face activity (recognition accuracy, recognition speed and security) [11]. In [12],
vehicle damage is classified by a deep learning method, and the recognition accuracy of a small
data set was up to 89.5% by the introduction of transfer learning and an integrated learning method.
This provides a new way for automatic processing of vehicle insurance. Zhang et al. combine the
four features of color, time motion, gradient norm and residual motion to identify the hand position in each
frame of a video. The method uses a weighted linear combination to evaluate the different combinations
of these features and establishes a precise hand detector [13]. With the continuous improvement of
computer hardware and the deepening of research on complex image classification, the application
prospect of computer vision technology will be more and more extensive.


Surface defect detection is an important issue in modern industry. Traditionally, surface defects
are often detected in the following steps: first, pre-processing of the target image by image processing
algorithms. Image pre-processing technology can process pixels accurately. By setting and adjusting
various parameters according to actual requirements, the image quality can be improved by de-noising,
changing brightness and improving contrast, laying a foundation for subsequent processing; second,
carry out histogram analysis, wavelet transform or Fourier transform. The above transformation
methods can obtain the representation of an image in a specific space, which is convenient for the
artificial designing and extracting feature; finally, the image is classified according to its features
using a classifier. Common methods include thresholding, decision trees or support vector machine
(SVM). Most of the existing surface defect detection algorithms are based on machine vision [14–19].
Considering the mirror feature of ceramic balls, [17] obtains the stripe distortion image of defective
parts according to the principle of the fringe reflection and locates the defect positions by reverse ray
tracing. The research method is suitable for surface defect detection of ceramic balls and other phases,
but fails to achieve high accuracy due to the selection and design of radiographic models in reverse
ray tracing. Jian et al. realize the automatic detection of glass surface defects on cell phone screens
through fuzzy C-means clustering. Specifically, the image was aligned by contour registration during
pre-processing, and then the defective area was segmented by projection segmentation. Despite the
high accuracy, the detection approach consumes too much time (1.6601 s) [18]. Win et al. integrate
a median-based Otsu image thresholding algorithm with contrast adjustment to achieve automatic
detection of the surface defects on titanium coating. The proposed method is simple and, to some
extent, immune to variation in light and contrast. However, when the sample size is large, the optimal
threshold calculation is too inefficient and the grey information is easily contaminated by dry noise
points [19]. To sum up, the above surface detection methods can only extract a single feature, and derive
a comprehensive description of surface defects from it. These types of approaches only work well
on small sample datasets, but not on large samples and complex objects and backgrounds in actual
production. To solve this problem, one viable option is to improve the approach with deep learning.
In recent years, deep learning has been successfully applied to image classification,
speech recognition and natural language processing [20–22]. Compared with the traditional machine
learning method, it has the following characteristics: deep learning can simplify or even omit the
pre-processing of data, and directly use the original data for model training; deep learning is composed
of multi-layer neural networks, which solve the defects in the traditional machine learning methods
of artificial feature extraction and optimization. So far, deep learning has been extensively adopted
for surface defect detection. For example, in [23], Deep Belief Network (DBN) was adopted to obtain
the mapping relationship between training images of solar cells and non-defect templates, and the
comparison between reconstructed images and defect images was used to complete the defect detection
of the test images. Cha et al. employ a deep convolution neural network (CNN) to identify concrete
cracks in complex situations, e.g., strong spots, shadows and ultra-thin cracks, and prove that the
deep CNN outperformed the traditional tools like Canny edge detector and Sobel edge detector [24].
Han et al. detect various types of defects on hub surfaces with residual network (ResNet)-101 as the
base net and the faster region-based CNN (Faster R-CNN) as the detector, and achieve a high mean
average precision of 86.3% [25]. The above studies fully verify the excellent performance of deep
learning in detecting surface defects. Nevertheless, there are few studies on product surface defect
detection using several main target detection networks in recent years, such as YOLO (You Only
Look Once), SSD (Single Shot MultiBox Detector) [26] and so on. The detection performance of these
networks in surface defect detection needs to be further verified and optimized.
This paper presents a surface defect detection method based on MobileNet-SSD. By optimizing the
network structure and parameters, this method can meet the requirements of real-time and accuracy
in actual production. It was verified in the filling line and the results show that our method can
automatically locate and classify the defects on the surface of the products.
2. Surface Defect Detection Method

2.1. Image Pre-Processing

There are two purposes of image pre-processing: the first is to enhance image quality, ensuring sharp contrast and filtering out noise, for example by adopting histogram equalization and gray-scale transformation to enhance contrast, or methods such as median filtering and adaptive filtering to remove noise. The second is to segment the image to facilitate subsequent feature extraction, for example by threshold segmentation, edge segmentation or region segmentation. In this paper, data enhancement and defect area planning were carried out in view of the features of a container mouth in a filling line, such as stability, monotonous shape and a limited number of images. The pre-processing flow is illustrated in Figure 1.

Figure 1. The flow of image pre-processing. NMS = non-maximum suppression.
2.1.1. Data Enhancement

The defect images on the sealing surface of a container in the filling line were collected by a CCD camera. A total of 400 images were taken, covering defects like breaches, dents, burrs and abrasions. The images were converted to the size of 300 × 300 as the training inputs. Such a size reduces the computing load in training without losing too much information from the images. Then, data expansion was performed to increase the number of images and prepare the dataset for K-fold cross-validation. During data expansion, the following actions were targeted: rotation, horizontal migration, vertical migration, scaling, tangential transformation and horizontal flip. In this way, the CNN could learn more invariant image features and prevent over-fitting. The implementation method of data expansion is shown in Table 1.
Table 1. The implementation method of data expansion.

Rotation: random rotation of the images from 0° to 10°.
Horizontal migration: horizontal migration of the images; the offset value is 10% of the length of the images.
Vertical migration: vertical migration of the images; the offset value is 10% of the length of the images.
Scaling: enlargement of the images by a 10% scale.
Tangential transformation: horizontal stretching of the pixel points; the stretch value is 10% of each point's distance to the vertical axis.
Horizontal flip: random flipping of the images with a horizontal axis as the symmetry axis.
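These six expansion modes map closely onto the parameters of a standard augmentation utility; the following is a minimal sketch assuming the Keras ImageDataGenerator API (the paper uses Tensorflow but does not list its augmentation code):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,       # random rotation from 0 to 10 degrees
    width_shift_range=0.1,   # horizontal migration: 10% of image length
    height_shift_range=0.1,  # vertical migration: 10% of image length
    zoom_range=0.1,          # scaling by up to 10%
    shear_range=0.1,         # tangential (shear) transformation
    horizontal_flip=True,    # random flip
)
# expanded = datagen.flow(images, labels, batch_size=32)  # images: (N, 300, 300, 3)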

The K-fold cross validation method divides the expanded dataset C into K discrete subsets. During network training, one subset was selected as the test set, while the remaining (K-1) subsets were combined into the training set. Each training run outputs the classification accuracy of the network model on the selected test set. The same process was repeated K times, and the mean accuracy was taken as the true accuracy of the model.
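As an illustration of this protocol, here is a minimal sketch using scikit-learn's KFold splitter; the split utility, the toy data and the build_model() constructor are assumptions for illustration, not details from the paper:

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(400, 64).astype("float32")   # stand-in for the expanded dataset C
y = np.random.randint(0, 4, size=400)           # four defect classes

accuracies = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
    model = build_model()                        # hypothetical network constructor
    model.fit(X[train_idx], y[train_idx])        # train on the K-1 merged subsets
    accuracies.append(model.score(X[test_idx], y[test_idx]))
print(np.mean(accuracies))                       # mean accuracy over the K runs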

2.1.2. Defect Area Planning


The defect samples contain lots of useless background information that may affect the recognition
quality of the detection algorithms. Our defect samples have the following features: the defects
are concentrated on the sealing surface, whose round shape remains unchanged. In view of this,
the Hough circle detection was used in the pre-processing phase to locate the edge of the cover,
and mitigate the impact of useless background on the recognition accuracy. The Hough circle transform
begins with the extraction of edge location and direction with a Canny operator. The Canny operation
involves six steps: RGB to gray conversion, Gaussian filtering, gradient and direction computation,
non-maximum suppression (NMS), double threshold selection and edge detection. Among them,
Gaussian filtering is realized by the two-dimensional (2D) Gaussian kernel convolution:

K = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}  (1)
The purpose of Gaussian filtering is to smoothen the image and remove as much noise as possible.
Then, the Sobel operator was introduced to obtain the gradient amplitude and its direction, which can
enhance the image and highlight the points with significant changes in neighboring pixels. The operator
contains two groups of 3 × 3 kernels. One group is transverse detection kernels, and the other is
vertical detection kernels. The formula of the operator is as follows:

G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A, \quad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A  (2)

where A is a smooth image; Gx is an image after transverse gradient detection; and Gy is an image
after vertical gradient detection. The gradient can be expressed as:

|\nabla f| = \mathrm{mag}(\nabla f) = \left[ \left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2 \right]^{1/2}  (3)

The gradient direction can be expressed as:

a(x, y) = \arctan \left( \frac{G_y}{G_x} \right)  (4)

During edge detection, the NMS was adopted to find the maximum gradient of local pixels by
comparing the gradients and gradient directions. After obtaining the binary edge image, the Hough
circle transform was employed to detect the circles. Under the coordinate system (c1 , c2 , r), the formula
of Hough circle transform can be described as:
(x - c_1)^2 + (y - c_2)^2 = r^2  (5)
where (c_1, c_2) is the center coordinate of the circle and r is the radius. The detection was realized through the following steps: first, the non-zero points in the image are traversed, and line segments along the gradient direction (radius direction) and the opposite direction are drawn. The intersection point of the segments is the circle center. Then, the maximum circle is obtained by setting the threshold value. After that, the rectangular region is generated from the maximum radius. The size of the image was normalized to 300 × 300 × 3. Figure 2 shows the regional planning process.

Figure 2. (a) Original image; (b) Gaussian smooth image; (c) X directional gradient graph; (d) Y directional gradient graph; (e) Amplitude image; (f) Angle image; (g) Non-maximum value suppression graph; (h) Edge detection chart; (i) Maximum circle detection; (j) Maximum circle position of the original graph; (k) Cutting; (l) Normalization.
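This pipeline maps onto standard OpenCV primitives; the following is a minimal sketch, in which the file name, kernel size and Hough thresholds are illustrative assumptions rather than the paper's settings:

import cv2
import numpy as np

img = cv2.imread("container.png")              # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # RGB to gray conversion
blur = cv2.GaussianBlur(gray, (5, 5), 1.5)     # Gaussian filtering, Equation (1)
edges = cv2.Canny(blur, 50, 150)               # Sobel gradients, NMS, double threshold

# Hough circle transform; keep the maximum circle (HOUGH_GRADIENT runs Canny internally)
circles = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                           param1=150, param2=40, minRadius=50, maxRadius=0)
if circles is not None:
    c1, c2, r = max(np.round(circles[0]).astype(int), key=lambda c: c[2])
    crop = img[max(c2 - r, 0):c2 + r, max(c1 - r, 0):c1 + r]  # rectangle from max radius
    roi = cv2.resize(crop, (300, 300))                        # normalize to 300 x 300 x 3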

2.2. Defect Detection Model Based on MobileNet-SSD Network

2.2.1. Principles of MobileNet Feature Extraction

The MobileNet network [27] was developed to improve the real-time performance of deep learning under limited hardware conditions. This network can reduce the number of parameters without sacrificing accuracy. Previous studies have shown that MobileNet only needs 1/33 of the parameters of Visual Geometry Group-16 (VGG-16) to achieve the same classification accuracy in ImageNet-1000 classification tasks.
Figure 3 shows the basic convolution structure of MobileNet. Conv_Dw_Pw is a depth-wise separable convolution structure. It is composed of depth-wise layers (Dw) and point-wise layers (Pw). The Dw are depth-wise convolutional layers using 3 × 3 kernels, while the Pw are common convolutional layers using 1 × 1 kernels. Each convolution result is treated by the batch normalization algorithm and the activation function rectified linear unit (ReLU).

Figure 3. The basic convolutional structure of MobileNet. Dw = depth-wise layers. Pw = point-wise layers. BN = batch normalization. Conv = convolution. ReLU6 = rectified linear unit 6.

In this paper, the activation function ReLU is replaced by ReLU6, and the normalization is carried
out by the batch normalization (BN) algorithm, which supports the automatic adjustment of data
distribution. The ReLU6 activation function can be expressed as:

y = min(max(z, 0), 6) (6)

where z is the value of each pixel in the feature map.


The depth-wise separable convolutional structure enables MobileNet to speed up training and greatly reduces the amount of calculation. The reasons are as follows:
The standard convolution structure can be expressed as:

G_N = \sum_M K_{M,N} * F_M  (7)

where K_{M,N} is the filter, and M and N are respectively the numbers of input and output channels. F_M denotes the input images, including feature maps, which use the fill style of zero padding.
When the size and the number of channels of the input images are respectively D_F × D_F and M, it is necessary to have N filters with M channels and the size of D_K × D_K to output N feature images of the size D_F × D_F. The computing cost is D_K · D_K · M · N · D_F · D_F.
By contrast, the Dw formula can be expressed as:

\hat{G}_M = \sum \hat{K}_{1,M} * F_M  (8)

where \hat{K}_{1,M} is the filter, and F_M has the same meaning as in Formula (7). When the step size is one, zero filling ensures that the size of the feature graph is unchanged after the application of the depth-wise separable convolutional structure. When the step size is two, zero filling ensures that the size of the feature graph obtained after the application of the depth-wise separable convolutional structure becomes half of that of the input image/feature graph; that is, a dimensional reduction is realized.
The depth-wise separable convolution structure of MobileNet can obtain the same outputs as those of standard convolution based on the same inputs. The Dw phase needs M filters with one channel and the size of D_K × D_K. The Pw phase needs N filters with M channels and the size of 1 × 1. In this case, the computing cost of the depth-wise separable convolution structure is D_K · D_K · M · D_F · D_F + M · N · D_F · D_F, about \frac{1}{N} + \frac{1}{D_K^2} of that of standard convolution.
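As a quick numerical check of this ratio, with hypothetical layer dimensions (not values from the paper):

# Hypothetical dimensions: 3 x 3 kernels, 32 x 32 feature maps, 128 -> 256 channels
D_K, D_F, M, N = 3, 32, 128, 256
standard  = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
print(separable / standard)    # ~0.115
print(1 / N + 1 / D_K ** 2)    # identical: 1/N + 1/D_K^2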
Besides, the data distribution is changed by each convolution layer during network training. If the data fall on the edge of the activation function, the gradient will disappear and the parameters will no longer be updated. The BN algorithm pulls the data back toward something like a standard normal distribution and adjusts them with two learnable parameters, which prevents gradient disappearance and avoids the tuning of complex parameters (e.g., learning rate and dropout ratio).
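Putting these pieces together, here is a minimal sketch of one Conv_Dw_Pw block from Figure 3, assuming the tf.keras layer API (the function name and wiring are illustrative; the paper does not list its layer code):

import tensorflow as tf
from tensorflow.keras import layers

def conv_dw_pw(x, pw_filters, stride=1):
    # Dw: depth-wise 3 x 3 convolution, one filter per input channel
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)  # BN adjusts the data distribution
    x = layers.ReLU(max_value=6.0)(x)   # ReLU6: y = min(max(z, 0), 6)
    # Pw: point-wise 1 x 1 convolution that mixes channels
    x = layers.Conv2D(pw_filters, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)
    return x

inputs = tf.keras.Input(shape=(300, 300, 3))
x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)  # standard first convolution
x = conv_dw_pw(x, 64)                                        # first Dw/Pw block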

2.2.2. SSD Meta Structure


The SSD network is a regression model, which uses the features of different convolution layers to perform classification regression and bounding box regression. The model resolves the conflict between translation invariance and variability, and achieves good detection precision and speed.
In each selected feature map, there are k frames that differ in size and width-to-height ratio. These frames are called the default boxes. Figure 4 shows default boxes on feature maps of different convolutional layers. Each default box predicts the B class scores and the four position parameters. Hence, B · k · w · h class scores and 4 · k · w · h position parameters must be predicted for a w × h feature image. This requires (B + 4) · k · w · h convolution kernels of the size 3 × 3 to process the feature map. Then, the convolution results should be taken as the final feature for classification regression and bounding box regression. Here, B is set to four because there are four typical defects on the sealing surface of a container in the filling line. The scale of the default boxes for each feature map is computed as:

S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1} (k - 1), \quad k \in [1, m]  (9)

where m is the number of feature maps, and S_{max} and S_{min} are parameters that can be set. In order to control the fairness of the feature vectors in the training and test experiments, the same five kinds of width-to-height ratios a_r = \{1, 2, 3, 0.5, 0.33\} were used to generate the default boxes. Then, each default box can be described as:

w_k^a = S_k \sqrt{a_r}, \quad h_k^a = S_k / \sqrt{a_r}  (10)

where w_k^a is the width of the default boxes and h_k^a is the height of the default boxes.

Next, a default box of scale S_k' = \sqrt{S_k S_{k+1}} should be added when the width-to-height ratio is one. The center of each default box is ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature unit and i, j \in [0, |f_k|]. The intersection over union (IoU) between area A and area B can be calculated as:

IoU = \frac{area(A) \cap area(B)}{area(A) \cup area(B)}  (11)

If the IoU of a default box and a calibration box (ground-truth box) is greater than 0.5, the default box matches the calibration box of that category.

The SSD is an end-to-end training model. The overall loss function of the training contains the confidence loss L_{conf}(s, c) of the classification regression and the position loss L_{loc}(r, l, g) of the bounding box regression. This function can be described as:

L(s, r, c, l, g) = \frac{1}{N} \left( L_{conf}(s, c) + \alpha L_{loc}(r, l, g) \right)  (12)

where \alpha is a parameter to balance the confidence loss and the position loss; s and r are the eigenvectors of the confidence loss and the position loss, respectively; c is the classification confidence; l is the offset of the predicted box, including the translation offset of the center coordinate and the scaling offset of the height and width; g is the calibration box of the target's actual position; and N is the number of default boxes that match the calibration boxes of this category.
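To make Equations (9)-(12) concrete, here is a small pure-Python sketch; the parameter values follow Section 3 (S_max = 0.95, S_min = 0.2, m = 6), while the box coordinates and loss inputs are placeholders:

import math

S_max, S_min, m = 0.95, 0.2, 6
scales = [S_min + (S_max - S_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]  # Eq. (9)

ratios = [1, 2, 3, 0.5, 0.33]
boxes = [(scales[0] * math.sqrt(a), scales[0] / math.sqrt(a)) for a in ratios]   # Eq. (10)
boxes.append((math.sqrt(scales[0] * scales[1]),) * 2)  # extra box S'_k for ratio one

def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); Equation (11)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union
# A default box matches a ground-truth box of a category when iou(...) > 0.5.

def overall_loss(L_conf, L_loc, N, alpha=1.0):
    # Equation (12); L_conf and L_loc are assumed already summed over matched boxes
    return (L_conf + alpha * L_loc) / N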

Figure 4. Default boxes on feature maps.

2.2.3. Surface Defect Detection Algorithm Based on MobileNet-SSD Model

In the filling line, the sealing surface is easily damaged by friction, collision and extrusion during the recycling and transport of pressure vessels. The common defects include breaches, dents, burrs and abrasions on the sealing surface. In this paper, the MobileNet-SSD model can greatly reduce the number of parameters and achieve higher accuracy under the limited hardware conditions. The complete model contains four parts: the input layer for importing the target image, the MobileNet base net for extracting image features, the SSD for classification regression and bounded box regression, and the output layer for exporting the detection results.

This model supports fast and accurate detection because the structure of MobileNet reduces the complexity of computing. However, the structure of Pw changes the distribution of the output data of Dw. This may cause a loss of precision. To solve this problem, the fully-connected layers were abandoned, and eight standard convolutional layers were added with the aim to widen the receptive field of the feature image, adjust the data distribution and enhance the translation invariance of the classification task. To prevent the disappearance of the gradient, the BN layer and the activation function ReLU6 were introduced to each layer of the added structure. In addition, the two-layer feature image in MobileNet and the four-layer feature image in the added standard convolutional layers constituted a feature image pyramid (Figure 5). Kernels 3 × 3 in size were adopted as the convolutional kernels to convolve the selected feature maps. The convolutional results were taken as the final feature for classification regression and bounded box regression.

Figure 5. MobileNet-Single Shot MultiBox Detector (SSD) network feature pyramid.

A 300 × 300 image was taken as the input. The six layers of the pyramid respectively contain 4, 6, 6, 6, 6 and 6 default boxes. Besides, different 3 × 3 kernels were adopted for classification and location with the step length of one. The numbers in brackets are the amounts of 3 × 3 filters applied around each location in the feature map, namely the amount of default boxes × the number of categories (classification) and the amount of default boxes × 4 (location), respectively. The general structure of MobileNet-SSD is shown in Table 2.
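As a sketch of these classification and location heads in tf.keras: the spatial sizes and channel counts of the six starred feature maps below are inferred from Table 2 for a 300 × 300 input, and the wiring itself is an illustrative assumption:

import tensorflow as tf
from tensorflow.keras import layers

B = 4                               # four defect classes
ks = [4, 6, 6, 6, 6, 6]             # default boxes per pyramid layer
# (size, channels) inferred for Conv7, Conv13, Conv14_2, Conv15_2, Conv16_2, Conv17_2
fmaps = [tf.keras.Input((s, s, c)) for s, c in
         [(19, 512), (10, 1024), (5, 512), (3, 256), (2, 256), (1, 128)]]
heads = []
for fmap, k in zip(fmaps, ks):
    cls = layers.Conv2D(k * B, 3, padding="same")(fmap)  # default boxes x categories
    loc = layers.Conv2D(k * 4, 3, padding="same")(fmap)  # default boxes x 4 offsets
    heads.append((cls, loc))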

Table 2. General structure of MobileNet-SSD. ※ = the feature image to be used in classification regression and bounded box regression.

Convolution | Step | Input | Output
Conv0_BN_ReLU6(3 × 3) | 2 | 3 | 32
Conv1_Dw_Pw(3 × 3) | Dw:1 | 32 | 64
Conv2_Dw_Pw(3 × 3) | Dw:2 | 64 | 128
Conv3_Dw_Pw(3 × 3) | Dw:1 | 128 | 128
Conv4_Dw_Pw(3 × 3) | Dw:2 | 128 | 256
Conv5_Dw_Pw(3 × 3) | Dw:1 | 256 | 256
Conv6_Dw_Pw(3 × 3) | Dw:2 | 256 | 512
※Conv7_Dw_Pw(3 × 3) | Dw:1 | 512 | 512
Conv8_Dw_Pw(3 × 3) | Dw:1 | 512 | 512
Conv9_Dw_Pw(3 × 3) | Dw:1 | 512 | 512
Conv10_Dw_Pw(3 × 3) | Dw:1 | 512 | 512
Conv11_Dw_Pw(3 × 3) | Dw:1 | 512 | 512
Conv12_Dw_Pw(3 × 3) | Dw:2 | 512 | 1024
※Conv13_Dw_Pw(3 × 3) | Dw:1 | 1024 | 1024
Conv14_1_BN_ReLU6(1 × 1) | 1 | 1024 | 256
※Conv14_2_BN_ReLU6(3 × 3) | 2 | 256 | 512
Conv15_1_BN_ReLU6(1 × 1) | 1 | 512 | 128
※Conv15_2_BN_ReLU6(3 × 3) | 2 | 128 | 256
Conv16_1_BN_ReLU6(1 × 1) | 1 | 256 | 128
※Conv16_2_BN_ReLU6(3 × 3) | 2 | 128 | 256
Conv17_1_BN_ReLU6(1 × 1) | 1 | 256 | 64
※Conv17_2_BN_ReLU6(3 × 3) | 2 | 64 | 128

In Table 2, Conv_BN_ReLU6 is a standard convolutional layer, while Conv1_Dw_Pw is a depth-wise separable convolutional layer. Besides, the sign ※ marks the feature images to be used in classification regression and bounded box regression. Considering the small size of the target defects, the feature image of the shallow Conv7_Dw_Pw output was adopted for further analysis.

3. Experimental Results and Analysis

Our experiment targets an oil chili filling production line in China's Guizhou Province. During the detection, the image of the sealing surface was transmitted via the image acquisition unit to the host for image signal processing. Then, the corresponding features were extracted and the defects were detected and marked by the MobileNet-SSD network. Specifically, the MobileNet-SSD served as the training base net of the pre-processed database; then, the trained model was migrated to the detection network for boundary box regression and classification regression. A total of five width-to-height ratios were selected according to the defect size, namely, 1, 2, 3, 0.5 and 0.33, respectively. S_max was set to 0.95 and S_min to 0.2. The six layers of the pyramid respectively contain 4, 6, 6, 6, 6 and 6 default boxes. During the training, the IoU of the positive samples fell in (0.5, 1), that of the negative samples fell in (0.2, 0.5), and that of the difficult samples fell in (0, 0.2). In addition, the learning rate of this paper is an exponential decay learning rate initialized to 0.1, with random initialization of the weights and bias terms.

3.1. Image Processing Unit

As shown in Figure 6, the image acquisition unit consists of a transmission mechanism, a proximity switch sensor, an industrial CCD camera, a light-emitting diode (LED) arc light source and a lens. Before the acquisition, the LED arc light source was adjusted to calibrate the brightness, and the lens was mounted on the CCD camera. Then, the aperture and focal length were adjusted to ensure the imaging quality of the acquisition unit. Under the action of the transmission mechanism, the sensor detected the workpiece and produced pulse signals when the vessel maintained a constant speed and spacing. These signals triggered the CCD camera to take photos. In order to ensure that the container runs to the center of the view field, the sensor needs to be accurately debugged.

Figure 6. Image acquisition device. CCD = charge-coupled device. LED = light-emitting diode.

The detection network was trained on the following hardware: an Intel Core i7 7700K processor (Vietnam, 2017), which has a main frequency of 4.2 GHz, 32 GB memory and a GeForce TITAN X graphics processing unit (GPU). The software part used the Ubuntu 14.04.2 operating system and the Tensorflow deep learning framework. Twenty percent of the samples in the pre-processed library were allocated to the test set and the other 80% to the training set.

3.2. Comparison of Three Deep Learning Networks

The loss function and the accuracy of the proposed MobileNet-SSD surface defect detection algorithm on the test set (Figure 7) were compared to those of the VGGNet [28], an excellent detection network in 2014 ImageNet, and MobileNet. The three algorithms were trained via migration learning and data enhancement. The training parameters and results are shown in Table 3.
Figure 7. Loss function and accuracy of the three detection networks. VGG = visual geometry group.
Table 3. Training parameters and results of the three detection networks. SGD = stochastic gradient descent.

Parameter VGG-16 MobileNet MobileNet-SSD
Basic learning rate 0.003 0.003 0.003
Number of iterations of training 10,000 10,000 10,000
Verifying the number of iterations 50 50 50
Batch quantity 128 128 128
Optimization algorithm SGD SGD SGD
Network parameter (million) ≈12.30 ≈1.76 ≈1.92
Amount of calculation (million) ≈1325.00 ≈127.54 ≈157.32
Mean of loss function 0.14 0.19 0.09
Mean accuracy (%) 93.91 92.33 96.73

In the training process, the detection networks were tested once every two hundred iterations of the training set. The loss function and accuracy values in Table 3 are the mean values obtained from the 40th to the 50th iterations of the test set. It is clear that the MobileNet-SSD detection algorithm achieved better accuracy than the other two networks, with far fewer network parameters than VGG-16.

3.3. Results of Defect Detection Network


The trained network parameters were adopted for the MobileNet-SSD defect detection network.
The test set image contained four different types of defect samples, each of which had 30 images
obtained through resampling. Each sample involved one or more defects. The detection results of
the trained MobileNet-SSD defect detection network on the four kinds of defect samples are shown in
Table 4.

Table 4. Detection results of the trained MobileNet-SSD algorithm.

Defect Type Sample Number Successful Detection Number Leakage Number Error Detection Number Positive Rate (%)
Breach 30 30 0 0 100.00
Dent 30 27 2 1 90.00
Burr 30 28 1 1 93.33
Abrasion 30 29 1 0 96.67
Total 120 114 4 2 95.00

It can be seen from Table 4 that the surface defect detection network completed the defect marking of the 120 defect samples with a 95.00% positive rate (114/120). There were missed and falsely detected samples among the dent and burr defects, and missed samples among the abrasion defects. This is because breaches are more obvious than the other defect types; the remaining errors are also related to the image quality and to the subjectivity of the human labeling.
When the filling line was in operation, the container passed the image acquisition device within a certain distance, triggering the CCD camera to take photos. The defect detection network then performed a forward operation. If there were defects in the image, the alarm buzzed and the defect type and location were identified by the host (Figure 8). A single forward operation of the network took 0.12 s per image.
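The on-line stage thus reduces to one timed forward pass per triggered frame. The loop below is a sketch, not the system's actual code; detect_defects is a hypothetical stand-in for a forward pass of the trained network, and the 0.5 score threshold is an assumed value.

import time

def detect_defects(image):
    # Stand-in for a forward pass of the trained MobileNet-SSD network;
    # a real implementation would run the TensorFlow graph here.
    return []  # list of (defect_type, bounding_box, score) tuples

def on_camera_trigger(image, score_threshold=0.5):
    start = time.time()
    detections = detect_defects(image)
    elapsed = time.time() - start  # about 0.12 s per image in this work
    alarms = [d for d in detections if d[2] >= score_threshold]
    if alarms:
        # In the real system, an alarm buzzes and the host displays the
        # defect type and location (Figure 8).
        print("Defect(s) found in %.3f s: %s" % (elapsed, alarms))
    return alarms
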
Figure 8. Detection effect of sealing surface defects.

3.4. Degree of Defect Detection

The defects of the same type may differ in terms of severity. Here, the pre-processed datasets were divided into three categories based on the defect severity: easy, medium and hard. The recognition result can serve as a yardstick of the network classification quality. Seventy percent of all samples were divided into the training set and the remaining 30% into the test set. The detection results of the breaches are shown in Figure 9.

Figure 9. Static image detection results of notches.

Figure 10 shows the precision–recall (PR) curves on the experimental dataset. The advanced multi-task CNN (MTCNN) [29] and Faceness-Net [30] were contrasted with the proposed MobileNet-SSD algorithm ("MS" in the figure) on the experimental dataset. The experimental results show that the recall rates of the proposed algorithm were 93.11%, 92.18% and 82.97% in the easy, medium and hard subsets, respectively, and its performance was better than that of the contrastive algorithms.
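A PR curve of this kind is obtained by sweeping the detector's confidence threshold. The following is a minimal sketch using scikit-learn, under our own assumptions: per-detection confidence scores and binary ground-truth match labels (1 if a detection matches a real defect) are available, and the values below are made-up example data, not results from the paper.

from sklearn.metrics import precision_recall_curve

# Hypothetical example data: one label and one confidence per detection.
y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.50, 0.40, 0.30]

# Each threshold yields one (precision, recall) point on the curve.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r in zip(precision, recall):
    print("precision=%.2f recall=%.2f" % (p, r))
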
Figure 10. (a) Easy precision–recall (PR) curves of three different algorithms; (b) Medium PR curves of three different algorithms; (c) Hard PR curves of three different algorithms. MS = MobileNet-SSD. MTCNN = multi-task convolution neural network.

3.5. Contrast Experiment

Three comparative experiments were designed to further validate the proposed algorithm. In the first experiment, the proposed algorithm was compared to five lightweight feature extraction networks, including SqueezeNet [31], MobileNet, performance vs accuracy net (PVANet) [32], MTCNN and Faceness-Net. The feature extraction accuracy of each algorithm for the ImageNet classification task is displayed in Table 5. In the second experiment, the above five networks were contrasted with the proposed algorithm in defect detection of the filling line in terms of correct detection rate, training time and the detection time per image (Table 6).

Table 5. Feature extraction accuracy of each algorithm. GFLOPS = floating-point operations per second. GPU = graphics processing unit. PVANet = performance vs accuracy net.

Model Top-1 (%) Top-5 (%) GFLOPS GPU Time


SqueezeNet 57.64 80.40 0.858 700 ms
MTCNN 63.85 85.03 0.897 640 ms
MobileNet 69.37 89.22 0.513 610 ms
Faceness-Net 70.31 84.39 0.858 720 ms
PVANet 72.43 90.16 0.606 150 ms
MobileNet-SSD 81.26 94.62 0.812 120 ms

Table 6. Correct detection rate, training time and the detection time per image of each network.

Number Model Positive Rate Training Time (Day) Detection Time (s)
1 SqueezeNet 85.83% 2 0.31
2 Faceness-Net 90.83% 3 0.59
3 MobileNet 90.83% <1 0.54
4 MTCNN 91.67% 3 0.64
5 PVANet 94.17% 2 0.25
6 MobileNet-SSD 95.00% <1 0.12

As shown in the two tables, the MobileNet-SSD surface defect model is fast and stable, thanks to the feature pyramid of the improved SSD meta-structure. In general, the proposed algorithm outperformed the contrastive algorithms in correct detection rate, training time and detection time. The final detection time of our algorithm was merely 120 milliseconds per image, which meets the real-time requirements of the industrial filling line.
In Contrast Experiment 3, four traditional defect recognition methods, namely k-nearest neighbor (KNN) [33], the HMM [34–36], a combined SVM and HMM [37] and the back propagation neural network (BPNN) [38], were implemented and compared with the method in this paper. The KNN method uses the Euclidean distance as its distance function; the HMM adopts a sampling window of 5 × 4 and uses the discrete cosine transform (DCT) coefficients as its observation vectors. The SVM and HMM method is the same as in the literature [37]. The number of hidden-layer nodes of the BP neural network was set to 30. The above models were also applied to detect the defects on the sealing surface of a container in the filling line. The statistical results are shown in Table 7.
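For reference, the KNN baseline reduces to nearest-neighbour voting under the Euclidean metric. The sketch below is a minimal reconstruction with scikit-learn; the feature dimensionality, the value of k and the random data are illustrative assumptions, since the paper does not list them.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical flattened image features and defect/no-defect labels.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 64)
y_train = rng.randint(0, 2, size=100)

# metric="euclidean" matches the distance function chosen above;
# n_neighbors=5 is an illustrative value of k.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict(rng.rand(3, 64)))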

Table 7. Correct detection rate and the detection time per image of each model. HMM = Hidden Markov Model. SVM = support vector machine. KNN = k-nearest neighbor. BPNN = back propagation neural network.

Number Model Positive Rate Detection Time (s)


1 KNN 65.83% 0.31
2 HMM 66.67% 0.39
3 SVM and HMM 80.83% 0.32
4 BPNN 84.17% 0.14
5 MobileNet-SSD 95.00% 0.12

As can be seen from Table 7, compared with the other traditional defect detection methods, the MobileNet-SSD method has a higher positive detection rate. Under the same hardware conditions, MobileNet-SSD also maintained the fastest detection speed, although the differences among the five methods were small. In addition, the results of HMM and KNN are not ideal. The likely reason is that the proportion of defect pixels in each image is small and the sealing surface of a container contains a lot of background information, while KNN and HMM do not extract specific features of the image before classifying. By contrast, both the BP neural network and MobileNet-SSD are based on neural networks, which can automatically learn features, so the accuracy rates of these two methods are relatively high. MobileNet-SSD, due to its unique deep convolutional structure, can learn the deep, separable features of defects with a larger receptive field, so it achieves a higher positive detection rate.
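The depthwise separable convolution underlying this structure factorizes a standard convolution into a per-channel spatial filter followed by a 1 × 1 pointwise convolution that mixes channels. The following Keras sketch of one such block is given for illustration only; the input size and filter count are arbitrary choices, not the exact configuration used in this paper.

import tensorflow as tf

inputs = tf.keras.Input(shape=(300, 300, 3))

# Depthwise step: one 3 x 3 spatial filter applied per input channel.
x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)

# Pointwise step: a 1 x 1 convolution mixes the channels, here into
# 32 output feature maps.
x = tf.keras.layers.Conv2D(filters=32, kernel_size=1)(x)

model = tf.keras.Model(inputs, x)
model.summary()  # shows far fewer parameters than a full 3 x 3 Conv2D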

4. Conclusions
This paper proposes a surface defect detection method based on the MobileNet-SSD network,
and applies it to identify the types and locations of surface defects. In the pre-processing phase,
a regional planning method was presented to cut out the main body of the defect, reduce redundant
parameters and improve detection speed and accuracy. Meanwhile, the robustness of the algorithm was
elevated by data enhancement. The philosophy of MobileNet, a lightweight network, was introduced
to enhance the detection accuracy, reduce the computing load and shorten the training time of this
algorithm. The MobileNet and SSD were adjusted to detect the surface defects, such that the proposed
method could differentiate small defects from the background. The feasibility of the proposed method
was verified by defect detection for the sealing surface of an oil chili filling production line in Guizhou,
China. Specifically, an image acquisition device was established for the sealing surface and the deep
learning framework was adopted to mark the defect positions. The results show that the proposed
method can identify most defects in the production environment at high speed and with high accuracy. However,
the system also has its limitations. Deep learning models have a certain dependence on the hardware
platform because of computationally intensive processes, and they are not suitable for embedded
systems with limited computing performance. Future research will further improve the proposed method through integration with embedded chips and the Internet of Things, balance the classification accuracy against the number of parameters of the detection method, and expand the application scope of our method to complex defects in industrial processes.

Author Contributions: Project administration, Y.L.; validation, Y.L.; resources, H.H. and Q.X.; investigation, L.Y.
and Q.C.
Funding: This study was supported by the Major Project of Science and Technology in Guizhou Province
(No. [2017]3004) and Natural Science Foundation in Guizhou Province (No. [2015]2043).
Conflicts of Interest: The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References
1. Uddin, M.T.; Uddiny, M.A. Human activity recognition from wearable sensors using extremely randomized
trees. In Proceedings of the International Conference on Electrical Engineering and Information
Communication Technology, Dhaka, Bangladesh, 21–23 May 2015; pp. 1–6.
2. Jalal, A.; Sarif, N.; Kim, J.T.; Kim, T.S. Human Activity Recognition via Recognized Body Parts of Human
Depth Silhouettes for Residents Monitoring Services at Smart Home. Indoor Built Environ. 2013, 22, 271–279.
[CrossRef]
3. Zhan, Y.; Kuroda, T. Wearable sensor-based human activity recognition from environmental background
sounds. J. Ambient Intell. Hum. Comput. 2014, 5, 77–89. [CrossRef]
4. Jalal, A. Security Architecture for Third Generation (3G) using GMHS Cellular Network. In Proceedings
of the International Conference on Emerging Technologies, Islamabad, Pakistan, 12–13 November 2008;
pp. 74–79.
5. Shire, A.N.; Khanapurkar, M.M.; Mundewadikar, R.S. Plain Ceramic Tiles Surface Defect Detection Using
Image Processing. In Proceedings of the International Conference on Emerging Trends in Engineering and
Technology, Port Louis, Mauritius, 18–20 November 2012; pp. 215–220.
6. Shang, L.; Yang, Q.; Wang, J.; Li, S.; Lei, W. Detection of rail surface defects based on CNN image recognition
and classification. In Proceedings of the International Conference on Advanced Communication Technology,
Chuncheon-si, Korea, 11–14 February 2018; pp. 45–51.
7. Jalal, A.; Kim, S. Advanced Performance Achievement using Multi-Algorithmic Approach of Video
Transcoder for Low Bitrate Wireless Communication. ICGST Int. J. Graph. Vis. Image Process. 2004, 5,
27–32.
Appl. Sci. 2018, 8, 1678 16 of 17

8. Deutschl, E.; Gasser, C.; Niel, A.; Werschonig, J. Defect detection on rail surfaces by a vision based system.
In Proceedings of the Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 507–511.
9. Yazdchi, M.; Yazdi, M.; Mahyari, A.G. Steel Surface Defect Detection Using Texture Segmentation Based
on Multifractal Dimension. In Proceedings of the International Conference on Digital Image Processing,
Bangkok, Thailand, 7–9 March 2009; pp. 346–350.
10. Kamal, S.; Jalal, A.; Kim, D. Depth Images-based Human Detection, Tracking and Activity Recognition Using
Spatiotemporal Features and Modified HMM. J. Electr. Eng. Technol. 2016, 11, 1921–1926. [CrossRef]
11. Jalal, A.; Kim, S. Global Security Using Human Face Understanding under Vision Ubiquitous Architecture
System. World Acad. Sci. Eng. Technol. 2006, 13, 7–11.
12. Patil, K.; Kulkarni, M.; Sriraman, A.; Karande, S. Deep Learning Based Car Damage Classification.
In Proceedings of the IEEE International Conference on Machine Learning and Applications, Cancun,
Mexico, 18–21 December 2017; pp. 50–54.
13. Zhang, Z.; Alonzo, R.; Athitsos, V. Experiments with computer vision methods for hand detection.
In Proceedings of the Petra 2011 International Conference on Pervasive Technologies Related to Assistive
Environments, Crete, Greece, 25 May 2011; pp. 1–6.
14. Jeon, Y.J.; Choi, D.C.; Lee, S.J.; Yun, J.P.; Kim, S.W. Steel-surface defect detection using a switching-lighting
scheme. Appl. Opt. 2016, 55, 47–57. [CrossRef] [PubMed]
15. Kabouri, A.; Khabbazi, A.; Youlal, H. Applied multiresolution analysis to infrared images for defects
detection in materials. NDT E Int. 2017, 92, 38–49.
16. Krummenacher, G.; Ong, C.S.; Koller, S.; Kobayashi, S.; Buhmann, J.M. Wheel Defect Detection with Machine
Learning. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1176–1187. [CrossRef]
17. Fu, L.; Wang, Z.; Liu, C. Research on surface defect detection of ceramic ball based on fringe reflection.
Opt. Eng. 2017, 56, 104104.
18. Jian, C.; Gao, J.; Ao, Y. Automatic Surface Defect Detection for Mobile Phone Screen Glass Based on Machine
Vision. Appl. Soft Comput. 2016, 52, 348–358. [CrossRef]
19. Win, M.; Bushroa, A.R.; Hassan, M.A.; Hilman, N.M.; Ide-Ektessabi, A. A Contrast Adjustment Thresholding
Method for Surface Defect Detection Based on Mesoscopy. IEEE Trans. Ind. Inform. 2017, 11, 642–649.
[CrossRef]
20. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
21. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level
classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [CrossRef] [PubMed]
22. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.;
Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017,
42, 60–88. [CrossRef] [PubMed]
23. Xian-Bao, W.; Jie, L.; Ming-Hai, Y.; Wen-Xiu, H.; Yun-Tao, Q. Solar Cells Surface Defects Detection Based on
Deep Learning. Pattern Recognit. Artif. Intell. 2014, 27, 517–523.
24. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional
Neural Networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
25. Han, K.; Sun, M.; Zhou, X.; Zhang, G.; Dang, H.; Liu, Z. A new method in wheel hub surface defect detection:
Object detection algorithm based on deep learning. In Proceedings of the International Conference on
Advanced Mechatronic Systems, Xiamen, China, 6–9 December 2017; pp. 335–338.
26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox
Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands,
11–14 October 2016; pp. 21–37.
27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017,
arXiv:1704.04861.
28. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv
2014, arXiv:1409.1556.
29. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded
Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [CrossRef]
30. Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Faceness-Net: Face Detection through Deep Facial Part Responses.
IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1845–1859. [CrossRef] [PubMed]
Appl. Sci. 2018, 8, 1678 17 of 17

31. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
32. Hong, S.; Roh, B.; Kim, K.H.; Cheon, Y.; Park, M. PVANet: Lightweight Deep Neural Networks for Real-time Object Detection. arXiv 2016, arXiv:1611.08588.
33. Hunt, M.A.; Karnowski, T.P.; Kiest, C.; Villalobos, L. Optimizing automatic defect classification feature
and classifier performance for post. In Proceedings of the 2000 IEEE/SEMI Advanced Semiconductor
Manufacturing Conference and Workshop, Boston, MA, USA, 12–14 September 2000; pp. 116–123.
34. Jalal, A.; Kamal, S.; Kim, D. A depth video sensor-based life-logging human activity recognition system for
elderly care in smart indoor environments. Sensors 2014, 14, 11735–11759. [CrossRef] [PubMed]
35. Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using
spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [CrossRef]
36. Jalal, A.; Kamal, S.; Kim, D. Individual detection-tracking-recognition using depth activity images.
In Proceedings of the 2015 12th International Conference on IEEE Ubiquitous Robots and Ambient
Intelligence (URAI), Goyang, Korea, 28–30 October 2015; pp. 450–455.
37. Wu, H.; Pan, W.; Xiong, X.; Xu, S. Human activity recognition based on the combined SVM&HMM.
In Proceedings of the 2014 IEEE International Conference on Information and Automation (ICIA), Hailar,
China, 28–30 July 2014; pp. 219–224.
38. Islam, M.A.; Akhter, S.; Mursalin, T.E.; Amin, M.A. A suitable neural network to detect textile defects.
In Proceedings of the International Conference on Neural Information Processing, Hong Kong, China,
3–6 October 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–438.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
