
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.3030620, IEEE Transactions on Industrial Informatics

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1

Intersecting machining feature localisation and recognition via single shot multibox detector

Peizhi Shi, Qunfen Qi, Yuchu Qin, Paul J. Scott, and Xiangqian Jiang

Abstract—In Industrie 4.0, machines are expected to become autonomous, self-aware and self-correcting. One important step in the area of manufacturing is feature recognition, which aims to detect all the machining features in a 3D model. In this research area, recognising and locating a wide variety of highly intersecting features is extremely challenging, as the topology information of the features is substantially damaged by feature intersection. Motivated by the single shot multibox detector (SSD), this paper presents a novel deep learning approach named SsdNet to tackle the machining feature localisation and recognition problem. The typical SSD is designed for 2D image object detection rather than 3D feature recognition. Therefore, the network architecture and output of SSD are modified to fulfil the purpose of this research. In addition, some advanced techniques are utilised to further enhance the recognition performance. Experimental results on the benchmark dataset confirm that the proposed method achieves state-of-the-art feature recognition performance (95.20% F-score), localisation performance (90.62% F-score) and recognition efficiency (243.85 milliseconds per model).

Index Terms—Industrie 4.0, 3D feature localisation, feature recognition, single shot multibox detector, deep learning.

This work was supported by the EPSRC UKRI Innovation Fellowship (Ref. EP/S001328/1), EPSRC Future Advanced Metrology Hub (Ref. EP/P006930/1) and EPSRC Fellowship in Manufacturing (Ref. EP/R024162/1).
The authors are with the EPSRC Future Advanced Metrology Hub, School of Computing and Engineering, University of Huddersfield, Huddersfield HD1 3DH, United Kingdom (e-mail: p.shi@hud.ac.uk; q.qi@hud.ac.uk; y.qin@hud.ac.uk; p.j.scott@hud.ac.uk; x.jiang@hud.ac.uk). Corresponding author: Qunfen Qi.
The source code is available online: https://github.com/PeizhiShi/SsdNet.

I. INTRODUCTION

In the realm of manufacturing, every product starts with a computer-aided design (CAD) model (or a set of models). As we are now marching towards a new era of smart manufacturing (or so-called Industrie 4.0), machines are expected to become autonomous, self-aware and self-correcting. One of the essential steps towards such advances is the ability of a machine to "understand" a given CAD model, that is, to recognise any machining features of the model. This is called feature recognition.

Machining feature recognition has been an active research topic since the 1980s, and a large number of methods have been proposed. Most methods were implemented based on manually designed rules. In these rule-based approaches, recognising a wide variety of highly intersecting features remains a challenging task [1]–[4], as in-depth knowledge about different features and feature combinations is required. In recent years, machine learning techniques have been widely utilised in the area of smart manufacturing (e.g. machine fault diagnosis [5], predictive maintenance [6], and 3D object acquisition [7], retrieval [8] and recognition [9] for manufacturing automation), as well as in other research areas (e.g. image retrieval [10], medical volume segmentation [11]). These techniques enable intelligent agents to learn from data automatically, without being explicitly programmed. To this end, two novel approaches named FeatureNet [12] and MsvNet [13] that adopt machine learning techniques for intersecting feature recognition have been developed. The two approaches are general-purpose methods in which ad-hoc rules are no longer required, so they can recognise a wide variety of features without imposing burdens on the recogniser designer. In both approaches, unsupervised segmentation algorithms were utilised to divide intersecting features into separate features according to the features' shape information. Deep learning methods were then employed to recognise these segmented features one by one. However, it is rather difficult to accurately segment intersecting features according to shape information in an unsupervised way, since the topology information of the features might be destroyed by feature intersection. This leads to a large number of features in a CAD model being mis-recognised or mis-located, as is evident in the experiments carried out in [13].

Motivated by an effective yet efficient object detection algorithm named the single shot multibox detector (SSD) [14], this paper presents a novel method called SsdNet in which feature segmentation and recognition are carried out together via supervised learning. The typical SSD is a deep neural network designed for 2D image object detection, which cannot be directly employed to recognise 3D models. Therefore, this paper modifies the network architecture and outputs of SSD to tackle the intersecting 3D machining feature localisation and recognition problem. This paper further utilises data augmentation (DA) and transfer learning (TL) to improve the training performance.

The main contribution of this paper is an approach named SsdNet capable of yielding state-of-the-art intersecting feature recognition performance, localisation performance and recognition efficiency. As a minor contribution, a comprehensive evaluation of SsdNet and other learning-based approaches is also conducted.

The rest of this paper is organised as follows. Section II reviews the existing intersecting feature recognition methods and identifies the research gaps in this area. Section III presents a novel method called SsdNet that overcomes the limitations of the existing methods. Section IV fully examines the performance of SsdNet and compares it to other intersecting feature localisation and recognition approaches. Section V draws the conclusion.


II. RELATED WORK

Recognition and localisation are two critical tasks in intelligent system development [15], [16]. In the area of manufacturing, feature recognition refers to the task of predicting the correct number and types of features appearing in a given CAD model, while feature localisation refers to the task of finding the precise locations of the features in the CAD model. Machining feature localisation and recognition can be carried out by either rule-based [2] or learning-based approaches [17]. In the former, human developers utilise their knowledge and experience to design rules for localisation and recognition, while the latter aim to create feature recognisers via machine learning techniques from human-labelled data. As the isolated feature recognition problem has already been solved satisfactorily, this section focuses on intersecting feature localisation and recognition approaches.

A promising rule-based intersecting machining feature recognition approach is the hint-based approach [18]. This approach introduced an important concept named the hint, which refers to the minimum indispensable parts of a feature. During recognition, the hint-based system first obtains all potential hints from a 3D model. Then, a geometric completion procedure, which includes heuristic geometric reasoning and matching procedures, is defined and adopted to find features in a given CAD model according to the hint instances. Both hint-based and other existing rule-based methods (e.g. STEP-based [19], [20], volumetric decomposition [21], and graph-based approaches [22]–[24]) suffer from a number of limitations: (1) in-depth knowledge about different features is required to design a reliable rule-based approach; (2) designing heuristic rules becomes more challenging in intersecting feature recognition, as the topology of a feature is destroyed and most faces in the feature are lost; the rule developer therefore has to consider all combinations of features, and carefully check whether the proposed rules (e.g. hints, geometric completion procedures) are valid in all situations; (3) most rule-based approaches adopt matching or searching algorithms to identify the potential features in the 3D model (e.g. the geometric completion procedure in hint-based approaches), which is computationally expensive [2].

As noted, designing heuristic rules for intersecting feature localisation and recognition is not an easy task. Some learning-based approaches have been applied to reduce the effort required from rule developers. However, most of these approaches (e.g. [25]–[27]) can only tackle limited types of feature intersections, and/or focus on specific types of CAD representations. To tackle the above issues, Zhang et al. [12] presented a feature recogniser named FeatureNet which can locate and recognise any type of intersecting feature in a given CAD model. In this approach, an unsupervised segmentation algorithm, the watershed algorithm, was employed to divide intersecting features into separate features according to the features' shape information. A 3D convolutional network was then utilised to recognise these segmented features one by one. In general, the watershed algorithm can yield the expected results when segmenting features with a low overlap degree, but fails to separate highly intersecting features, as the shape information of most features is lost because of the feature intersection.

To solve the issues arising from FeatureNet, Shi et al. proposed a novel intersecting feature recognition approach named MsvNet [13]. In this approach, a 3D model with intersecting features was first segmented into separate features via another unsupervised learning algorithm, the selective search algorithm, according to the 2D shape information of the features. Then, these segmented features were passed through a novel view-based 2D convolutional neural network (CNN) for further recognition. Unlike the watershed algorithm, which only produces one set of segmentation results for one 3D model, the selective search algorithm can enumerate most potential features in a 3D model. Therefore, more intersecting features are likely to be found by the selective search algorithm, which leads to better localisation and recognition performance than FeatureNet. Both FeatureNet and MsvNet suffer from the following limitations: (1) due to the nature of the unsupervised segmentation algorithms involved in these methods, a large number of highly intersecting features could still be mis-recognised or mis-located, as the topology information of these features is substantially damaged by the feature intersection; unsatisfactory localisation and recognition results could therefore be produced, which is also illustrated in the experiments; (2) FeatureNet and MsvNet can be regarded as two-stage methods in which feature segmentation and recognition are conducted separately; segmented features thus need to be passed through the neural networks multiple times, which slows down the whole recognition process.

III. FRAMEWORK

This section first discusses the relevant issues raised by the existing approaches, which motivate the proposed method, and gives an overview of the proposed approach to intersecting feature localisation and recognition. Then, the neural network construction process and the final feature localisation and recognition process are illustrated in detail.

A. Overview

As discussed in Section II, existing learning-based methods (MsvNet [13] and FeatureNet [12]) suffer from a number of limitations, which motivate the research conducted in this paper. Therefore, the main research problem that this paper aims to tackle is: how to locate and recognise highly intersecting machining features in a CAD model efficiently. To solve this research problem, several advanced methods are explored in this paper.

Both MsvNet and FeatureNet are two-stage methods with the above-mentioned limitations. A one-stage method that conducts feature segmentation and recognition together via supervised learning seems to be a proper solution to the above issues. In a supervised algorithm, different kinds of intersecting features can be seen at the training stage rather than the test stage, which allows for producing much better segmentation and recognition performance.


In a one-stage algorithm, feature segmentation and recognition are carried out together, which can speed up the recognition process. As is evident in the experiments conducted in [13], segmenting intersecting features in 3D space is rather arduous. Experimental results also demonstrated that it is relatively easy to locate and recognise 3D intersecting features from 2D view images [13]. To this end, a one-stage supervised feature segmentation and recognition algorithm based on 2D view images is an ideal solution to the research problem. In other words, the proposed deep neural network takes a view image as input, and predicts the types and 3D locations of all features appearing in this view direction. Finally, the 3D bounding boxes obtained from the different view directions are combined to form the final results. In summary, SsdNet consists of two parts: one-stage supervised feature localisation and recognition (to predict the types and locations of features appearing in different view directions), and result fusion (to form the final prediction results). The machine learning techniques employed in each part are presented in the next two sections respectively.

B. Network construction

The main purpose of this section is to construct a deep neural network that maps a 2D view image to the 3D locations of all features appearing in this view direction. To attain this goal, the single shot multibox detector (SSD) [14], an effective yet efficient one-stage object detection algorithm, is adopted in this paper, as it is capable of identifying the objects appearing in an image effectively. The original SSD is designed for image object detection, where the output of the algorithm is a set of 2D bounding boxes. It therefore cannot be applied to the research problem directly, since the output in feature localisation should be the 3D locations of the features rather than 2D bounding boxes. Hence, this paper adjusts the output of the original SSD to tackle the 3D feature localisation and recognition problem. In addition to the output, the architecture of the SSD is also modified to make the training and recognition processes more efficient.

This section first discusses how to prepare the data for training. Then, the novel network architecture designed for the research problem and its training process are presented in detail.

1) Data preparation: As the SSD-based approach uses supervised learning, 3D models with intersecting features and their corresponding labels are required at the training stage. In practice, however, it may be easier to obtain 3D models with single features than models with intersecting features. To tackle this problem, this approach synthesises multi-feature models by combining single-feature models. In this paper it is assumed that a dataset consisting of different types of 3D voxelised single-feature models is available. Before training, a number of 3D models with single features (e.g. 2-10 models) are randomly selected from this dataset and combined via a boolean operation to form a multi-feature model. The types and bounding boxes of all the features in this newly constructed model are also recorded and denoted as the label of the 3D model. In this way, a large number of labelled 3D multi-feature models can be constructed efficiently.

As discussed in Section II and [13], feature segmentation in 3D space is much more challenging than in 2D space. Therefore, a 3D model with intersecting features is converted into a number of view images in this paper. These images are employed as inputs to the proposed network, which allows an effective and efficient network for feature segmentation and recognition to be trained easily. Suppose that the dimension of a voxelised 3D model is d × d × d. This approach scans the model from six directions, and takes six d × d images from the model accordingly. In each image, the value of each pixel represents the maximal depth of all features appearing at this position, as exemplified in Figure 1.

Fig. 1. 2D images taken from different directions: (A) the original model, (B) the transparent model, (C) view labels in six directions and (D) the images with view labels. Each pixel value of the images in (D) represents the maximal depth of all features appearing at this position.

To improve the training performance, data augmentation (DA), a widely used technique in the area of machine learning, is adopted in this approach. This method can considerably increase the diversity of the training samples. In this paper, three DA strategies are employed: random flipping, random resizing, and random combination. In the first strategy, the 2D image is horizontally or vertically flipped with a small probability. This strategy is capable of producing new training images that contain features at different locations. In the second strategy, constant padding is applied to the top, bottom, left and right of the 2D training image, and the padded image is then resized to the original size. This strategy can produce new training images with smaller machining features. In the third strategy, the 2D image is combined with another training image via the element-wise maximum operation, and the labels (or bounding boxes) of the two images are concatenated. This strategy allows for producing training images which contain more machining features.

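The six-direction view-image generation described above (scanning the voxel model along each axis and recording the maximal feature depth at every pixel) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the axis ordering, the convention that feature voxels are marked 1, and the depth values 1..d are all assumptions for the sketch.

```python
import numpy as np

def view_depth_images(voxels):
    """Render six d x d depth images from a d x d x d binary voxel grid.

    Each pixel stores the maximal depth (along the viewing axis) of the
    feature voxels at that position, and 0 where no feature voxel exists.
    Illustrative sketch of the idea in Section III-B, not the paper's code.
    """
    d = voxels.shape[0]
    idx = np.arange(1, d + 1)           # depth values 1..d along an axis
    images = []
    for axis in range(3):               # scan along x, y, z axes
        for flip in (False, True):      # two opposite directions per axis
            v = np.flip(voxels, axis=axis) if flip else voxels
            shape = [1, 1, 1]
            shape[axis] = d
            depth = v * idx.reshape(shape)   # broadcast depth indices
            images.append(depth.max(axis=axis))
    return images                       # list of six (d, d) depth images

# usage: a single block "feature" in a 64^3 grid
grid = np.zeros((64, 64, 64), dtype=np.uint8)
grid[10:20, 10:20, 0:5] = 1
imgs = view_depth_images(grid)
```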

The detailed DA process is illustrated in Figure 2.

Fig. 2. Data augmentation strategies: (A) random flipping; (B) random resizing; and (C) random combination.

2) Network architecture: The original SSD [14] is a feed-forward convolutional neural network that maps a 2D image to a set of 2D bounding boxes and class scores of the objects appearing in this image. It employs a high-resolution image (3 × 300 × 300) as input, which is computationally expensive. The output of the proposed framework is supposed to be a set of 3D feature bounding boxes rather than 2D boxes. Therefore, this paper adjusts the architecture and output of the original SSD to fulfil the purpose of this paper. The modified network takes a smaller 2D view image as input, and predicts a set of 3D bounding boxes and class scores of all features appearing in this view direction. This modification produces a much better result in terms of recognition accuracy and efficiency.

As shown in Table I, the network contains two components: a base net and a multibox net. The former is employed to produce six activation maps of different sizes based on the input view image, while the latter is utilised to predict the types and locations of the 3D machining features at multiple scales based on the six activation maps.

TABLE I
THE NETWORK ARCHITECTURE. THE NETWORK CONSISTS OF A BASE NET AND A MULTIBOX NET. THE BASE NET IS UTILISED TO CREATE SIX ACTIVATION MAPS, WHILE THE MULTIBOX NET IS EMPLOYED TO PRODUCE THE FINAL RESULTS BASED ON THE ACTIVATION MAPS.

Input: 3×64×64

Base net                                     Activation map   Multibox net          Output
Conv 64, Conv 64, Max Pool,
Conv 128, Conv 128                        →  128×32×32     →  Norm, Conv 6×(c+5) →  6×(c+5)×32×32
Max Pool, Conv 256, Conv 256, Conv 256    →  256×16×16     →  Norm, Conv 6×(c+5) →  6×(c+5)×16×16
Max Pool, Conv 512, Conv 512, Conv 512    →  512×8×8       →  Norm, Conv 6×(c+5) →  6×(c+5)×8×8
Max Pool, Conv 512, Conv 512, Conv 512,
Max Pool, Conv 1024, Conv 1024            →  1024×4×4      →  Conv 6×(c+5)       →  6×(c+5)×4×4
Conv 256, Conv 512                        →  512×2×2       →  Conv 6×(c+5)       →  6×(c+5)×2×2
Conv 128, Conv 256                        →  256×1×1       →  Conv 6×(c+5)       →  6×(c+5)×1×1

It is also observed from Table I that the network contains 25 convolutional layers in total (19 in the base net, 6 in the multibox net). Each convolutional layer consists of a fixed number of kernels, which are applied to the inputs of the neurons. The kernel operation is conducted as:

    z(i, j, k) = Σ_{l,m,n} q(l, j + m − 1, k + n − 1) k(i, l, m, n),    (1)

where q is the input of the neuron, z(i, j, k) is the output at location (j, k) for the ith channel, and k is the kernel matrix.

In addition to the convolutional layers, three l2 norm layers [28] are adopted to normalise the activation maps obtained from the earlier convolutional layers, since the earlier activation maps usually have larger values than the later maps. After the l2 normalisation, all activation maps have a similar value range. The l2 norm operation is defined as:

    o(i, j, k) = γ_i · z(i, j, k) / sqrt(Σ_i |z(i, j, k)|²),    (2)

where γ_i is a learnable scaling factor for channel i.

For an activation map, six pre-defined reference bounding boxes are associated with each cell of the activation map in this approach (see the dotted line boxes in Figure 3 (A)). It can be observed from Figure 3 (A) that each reference box is centred at a cell of the activation map, and has a fixed shape and size. The bounding box of a machining feature, however, is supposed to have arbitrary shape, size and location (see the purple dashed line box in Figure 3 (B)). To attain this goal, the multibox network predicts the offsets of the machining feature bounding box relative to the pre-defined reference bounding box, and the confidence score for each type of machining feature (as shown in Figure 3 (B)). Therefore, 6 × m × m reference bounding boxes can be pre-defined based on an activation map of size m × m, and 6 × m × m machining feature bounding boxes can be constructed based on these pre-defined reference boxes. The dimension of the final predictions is 6 × (c + 5) × m × m, where 6 refers to the number of machining feature bounding boxes per cell, c is the number of feature types and 5 refers to the five offset values (height h, width w, depth l and centre coordinate (cx, cy)) of the machining feature bounding box relative to the pre-defined reference box (as illustrated in Figure 3 (B)).

3) Training: In the proposed deep neural network, it is essential to know which reference bounding box is responsible for predicting a certain ground truth bounding box. To tackle this issue, Liu et al. [14] presented a positive and negative matching strategy to find the corresponding reference bounding boxes for a ground truth bounding box.

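The l2 norm operation of Eq. (2) above can be sketched as follows. This is an illustrative NumPy version, assuming a channels-first (C, H, W) activation map; in the actual network the layer of [28] is trainable, with γ learned alongside the other parameters.

```python
import numpy as np

def l2norm_layer(z, gamma, eps=1e-10):
    """l2 norm layer (Eq. 2): rescale each spatial position (j, k) to unit
    l2 norm across the channel axis, then multiply by a learnable
    per-channel scaling factor gamma.

    z: activation map of shape (C, H, W); gamma: shape (C,).
    """
    norm = np.sqrt((z ** 2).sum(axis=0, keepdims=True)) + eps  # per (j, k)
    return gamma[:, None, None] * z / norm
```

After this operation, every activation map has a comparable value range, regardless of how large the raw activations from the earlier layers were.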

Fig. 3. Reference bounding boxes and machining feature bounding boxes in the activation map (adapted from [14]). (A) A 4 × 4 activation map and the six corresponding reference bounding boxes (the dotted line boxes) in one cell. (B) A machining feature bounding box (the purple dashed line box) and its corresponding reference box (the red dotted line box). ∆(h, w, l, cx, cy) in (B) refer to the location offset values of the purple machining feature box relative to the red reference box. (s1, ..., sc) refer to the confidence scores for all feature types.

In this approach, the ratio of the overlapped area to the joined area, called the Intersection over Union (IoU) value [13], is utilised to measure the degree of overlap between two bounding boxes. A match is positive when the IoU value between the two boxes is greater than 0.5; the remaining matches are regarded as negative. There are obviously more negative matches than positive ones, which makes the training data extremely imbalanced. Therefore, this approach only selects the negative reference bounding boxes with the top loss values, and guarantees that the ratio between the selected positive and negative matches is 1:3, as suggested in [14].

Suppose that there is a match between a ground truth bounding box t and a reference box d. The width, height, depth and centre coordinate of a box are denoted as w, h, l and (cx, cy) respectively. The encoded offsets of the ground truth box t relative to the reference box d are defined as:

    t̂^w = log(t^w / d^w),  t̂^h = log(t^h / d^h),  t̂^l = log(t^l / d^l),
    t̂^cx = (t^cx − d^cx) / d^w,  t̂^cy = (t^cy − d^cy) / d^h,    (3)

where t^w, t^h, t^l and (t^cx, t^cy) refer to the width, height, depth and centre coordinate of the ground truth bounding box; d^w, d^h, d^l and (d^cx, d^cy) refer to the width, height, depth and centre coordinate of the reference bounding box, as illustrated in Figure 3; and t̂^w, t̂^h, t̂^l, (t̂^cx, t̂^cy) refer to the width, height, depth and centre coordinate offset values of the ground truth box t relative to the reference box d.

In this approach, a confidence loss (L_conf) and a localisation loss (L_loc) are employed to train the neural network. The former measures how confident the deep network is in making a class prediction, while the latter is the mismatch between the predicted box and the ground truth box. The confidence loss is defined as:

    L_conf(x, s) = − Σ_{i∈Pos} x_{i,j}^k log( e^{s_i^k} / Σ_k e^{s_i^k} ) − Σ_{i∈Neg} log( e^{s_i^0} / Σ_k e^{s_i^k} ),    (4)

where x_{i,j}^k is an indicator value, which equals one when there is a match between the ith reference bounding box and the jth ground truth box for feature type k; s_i^k refers to the predicted confidence score for feature type k obtained from the ith reference box; and e is the exponential constant, approximately equal to 2.71828. The localisation loss is defined as:

    L_loc(x, p, t) = Σ_{m∈{cx,cy,w,h,l}} Σ_{i∈Pos} x_{i,j}^k SmoothL1(p_i^m − t̂_j^m),    (5)

where p_i^m refers to the predicted localisation offset based on the ith reference box, and SmoothL1 is the Smooth L1 loss [29]. This loss function is selected since it is less sensitive to outliers than other loss functions, and is capable of preventing exploding gradients during training [29]. The overall loss over all matches is calculated as:

    L(x, s, p, t) = (1/N) (L_conf(x, s) + L_loc(x, p, t)),    (6)

where N refers to the number of matches.

At the beginning of the training stage, network parameter initialisation is an important step, since a better learning result can be achieved from a well-initialised neural network. To tackle this problem, transfer learning (TL), a popular method for knowledge transfer and parameter initialisation, is adopted in this paper. This technique is capable of employing the knowledge gained from one problem to solve another problem. In general, a network trained on a dataset of visual objects contains deep knowledge of object detection, and can be utilised to initialise the parameters (e.g. weights and biases) of another network for object detection. Therefore, this paper employs an SSD network pre-trained on the pascal visual object classes (VOC) benchmark set [30] to initialise the parameters of the proposed network, since VOC is a large set for object detection. The detailed process of utilising TL is presented in Section IV-B. During training, this approach employs the Adam optimiser to minimise the loss function L(x, s, p, t), as this optimiser can converge to a minimum faster than other optimisers.

C. Feature localisation and recognition

As illustrated in the previous section, a neural network which maps a 2D view image to a number of possible 3D feature locations is constructed. The next issue is therefore how to utilise this network for feature localisation and recognition based on a 3D model rather than 2D images. To attain this goal, the six view images of the 3D model are first passed through the network. Then the outputs of the neural network are decoded (see Figure 4 (A) and (B)) as follows:

    t̄^w = d^w e^{p^w},  t̄^h = d^h e^{p^h},  t̄^l = d^l e^{p^l},
    t̄^cx = p^cx d^w + d^cx,  t̄^cy = p^cy d^h + d^cy,    (7)

where p^w, p^h, p^l and (p^cx, p^cy) refer to the predicted width, height, depth and centre coordinate offset values based on the reference box d, and t̄^w, t̄^h, t̄^l and (t̄^cx, t̄^cy) refer to the decoded width, height, depth and centre coordinate of the predicted bounding box.

After decoding, six sets of machining features are produced based on the six view images of the 3D model (see Figure 4 (B)). Then, the bounding boxes of these features are concatenated together, as illustrated in Figure 4 (C).

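The offset encoding of Eq. (3) and the decoding of Eq. (7) are exact inverses of each other, which the following sketch makes concrete. The dict-based box representation is an assumption chosen for readability, not the paper's data structure.

```python
import math

def encode(t, d):
    """Encode ground-truth box t relative to reference box d (Eq. 3).
    Boxes are dicts with keys w, h, l (width, height, depth) and
    cx, cy (centre coordinate). Illustrative sketch."""
    return {
        "w": math.log(t["w"] / d["w"]),
        "h": math.log(t["h"] / d["h"]),
        "l": math.log(t["l"] / d["l"]),
        "cx": (t["cx"] - d["cx"]) / d["w"],
        "cy": (t["cy"] - d["cy"]) / d["h"],
    }

def decode(p, d):
    """Decode predicted offsets p back into an absolute box (Eq. 7)."""
    return {
        "w": d["w"] * math.exp(p["w"]),
        "h": d["h"] * math.exp(p["h"]),
        "l": d["l"] * math.exp(p["l"]),
        "cx": p["cx"] * d["w"] + d["cx"],
        "cy": p["cy"] * d["h"] + d["cy"],
    }
```

For any reference box d, `decode(encode(t, d), d)` recovers t, so the network only has to learn the (normalised) offsets rather than absolute box coordinates.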

It is observed that there are several redundant features in IV. E XPERIMENTAL RESULTS
the result achieved from the previous step since the neural Based on the framework presented in the previous section,
network is designed to find all potential features from a given this section first makes a comparison between the proposed
CAD model. To remove these features, soft non-maximum approach and other learning-based approaches (the MsvNet
suppression (Soft-NMS) [31], the state-of-the-art bounding [13] and FeatureNet [12]) in terms of intersecting machining
box selection method, with maximum cut algorithm [32] is feature localisation and recognition. Then, the effects of dif-
adopted. As suggested in [13], this method is capable of ferent learning strategies in the SsdNet are further examined.
eliminating redundant features effectively.
A. Benchmark dataset
As shown in Section III and [12], [13], a single feature
dataset is required to fully train the SsdNet, MsvNet and
FeatureNet. Therefore, the benchmark single feature set con-
structed in [12] is adopted in this experiment since it is a
diverse set with 24 different types of machining features. In
total, there are 24000 3D STL models in this set, 1000 for
each type of features.
In addition to the single feature dataset, a multi-feature
dataset is also required to test the localisation and recognition
performances of different approaches. Therefore, the bench-
mark multi-feature set presented in [13] is employed in this
experiment for testing purpose since this set consists of 1000
STL models with highly intersecting features. Shi et al. [13]
divided the dataset into ten different groups according to the
intersecting degree of features.
All the methods in this comparative study require 3D voxelised models for training and testing. Therefore, a toolbox named binvox is employed to convert the 3D STL models in both benchmark sets into 64 × 64 × 64 voxel grids, as carried out in [12], [13]; each set thus contains models of shape 64 × 64 × 64. To make a fair comparison, all the experiments
are conducted under an identical optimal setting as suggested
in [13]. The above networks are trained and validated on the
benchmark single feature set [12], and tested on the benchmark
multi-feature set [13]. The single feature set is divided into
training and validation sets (90%:10%). At the training phase,
only 512 models per feature type (51.2%) are utilised to train
the networks, the same as in [13]. All the models in the
benchmark multi-feature set are selected to form a test set.
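The 90%:10% split with 512 training models per feature type can be sketched as follows. This is a minimal illustration; `split_single_feature_set` and the model-id layout are assumptions for the sketch, not the authors' code.

```python
import random

def split_single_feature_set(models_per_type, n_train=512, val_ratio=0.1, seed=0):
    """Split each feature type's models into a 10% validation set and a
    training set of n_train models drawn from the remaining 90%."""
    rng = random.Random(seed)
    train, val = [], []
    for feature_type, ids in models_per_type.items():
        ids = list(ids)
        rng.shuffle(ids)
        n_val = int(len(ids) * val_ratio)            # 100 models per type
        val += [(feature_type, m) for m in ids[:n_val]]
        train += [(feature_type, m) for m in ids[n_val:n_val + n_train]]
    return train, val

# 24 feature types with 1000 STL model ids per type, as in the benchmark set
dataset = {t: range(t * 1000, (t + 1) * 1000) for t in range(24)}
train, val = split_single_feature_set(dataset)   # 12288 / 2400 models
```

Shuffling before slicing keeps the validation models disjoint from the training pool within each feature type.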
The information about the training, validation and test sets is summarised in Table II.

Fig. 4. Result fusion process: (A) predicted features achieved from each direction in 2D images (the depth information about each bounding box is not displayed), (B) predicted features achieved from each direction in 3D models, (C) combined features in a 3D space, and (D) final result.
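The lifting-and-concatenation step of Figure 4 (B)-(C), in which each view's detections are placed into the common voxel frame and then pooled into one candidate list, might look as follows. The view-axis convention and helper names are illustrative assumptions, not the paper's exact implementation.

```python
def view_box_to_3d(view_axis, box_2d, depth_range):
    """Lift a 2D detection from one orthographic view into an axis-aligned
    3D box (x1, y1, z1, x2, y2, z2). view_axis is the axis (0, 1 or 2)
    the view looks along; the depth range extends along that axis."""
    (u1, v1, u2, v2), (d1, d2) = box_2d, depth_range
    spans = {view_axis: (d1, d2)}
    u_axis, v_axis = [a for a in range(3) if a != view_axis]
    spans[u_axis], spans[v_axis] = (u1, u2), (v1, v2)
    lo = [spans[a][0] for a in range(3)]
    hi = [spans[a][1] for a in range(3)]
    return (*lo, *hi)

# detections from two of the six viewing directions, then concatenated
per_view = {0: [((1, 2, 5, 6), (10, 20))], 2: [((0, 0, 3, 3), (4, 8))]}
candidates = [view_box_to_3d(axis, box, depth)
              for axis, dets in per_view.items() for box, depth in dets]
```

The pooled `candidates` list is what the subsequent Soft-NMS step would prune.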
TABLE II
THE DATASET DESCRIPTION.

              Training Set           Validation Set         Test Set
Source        FeatureNet Set [12]    FeatureNet Set [12]    MsvNet Set [13]
Dimension     64 × 64 × 64           64 × 64 × 64           64 × 64 × 64
Property      single feature models  single feature models  multi-feature models
Feature Type  24 machining features  24 machining features  24 machining features
Set size      512 models per type    100 models per type    1000 models in total

The Soft-NMS [31] algorithm starts with a list of 3D location bounding boxes B = {b1, ..., bn} and their corresponding localisation-recognition scores S = {s1, ..., sn}. In this algorithm, both the bounding boxes and the corresponding score values are captured from the output layer of the SsdNet. Then, a greedy procedure is carried out that moves the 3D bounding box bm with the highest score value from B to a new bounding box set D, and reduces the score value of each bounding box bi in B in proportion to the IoU value between bm and bi. This greedy procedure terminates when all boxes in B have been moved to D. At the end, boxes in D with high score values are selected as final results via the max-cut algorithm [32].
It is observed from Figure 4 (D) that this approach effectively removes redundant and wrongly recognised machining features.
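The greedy decay loop described above might be sketched as follows. A Gaussian score decay is shown for illustration; the paper's exact decay function and the max-cut selection step [32] are not reproduced, and all names are illustrative.

```python
import math

def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for i in range(3):
        inter *= max(min(a[i + 3], b[i + 3]) - max(a[i], b[i]), 0)
    vol = lambda box: (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])
    union = vol(a) + vol(b) - inter
    return inter / union if union else 0.0

def soft_nms_3d(boxes, scores, sigma=0.5):
    """Greedy Soft-NMS [31]: repeatedly move the highest-scoring box from B
    to the output set D, decaying the scores of the boxes remaining in B
    in proportion to their IoU with the moved box."""
    boxes, scores = list(boxes), list(scores)
    kept_boxes, kept_scores = [], []
    while boxes:
        m = max(range(len(scores)), key=scores.__getitem__)
        top = boxes.pop(m)
        kept_boxes.append(top)
        kept_scores.append(scores.pop(m))
        scores = [s * math.exp(-iou_3d(top, b) ** 2 / sigma)
                  for s, b in zip(scores, boxes)]
    return kept_boxes, kept_scores

# two heavily overlapping detections and one distant one
boxes = [(0, 0, 0, 10, 10, 10), (1, 1, 1, 11, 11, 11), (40, 40, 40, 50, 50, 50)]
scores = [0.9, 0.8, 0.7]
kept, new_scores = soft_nms_3d(boxes, scores)
# the distant box keeps its score; the overlapping one is decayed
```

In the paper, the max-cut algorithm [32] then thresholds the scored set D; in this sketch a fixed score cut-off would play that role.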


proposed network consists of a base net and a multibox net. Transfer learning (TL) adopts a pre-trained SSD network on the VOC dataset to initialise the parameters in the base net, since the structures of the base nets in the original SSD network [14] and the proposed SsdNet are identical. The bias in each neuron of the multibox net is set to zero, while each weight in the multibox net is set to a small random number. Other TL techniques (e.g. weight freezing) are not adopted in this paper. The batch size is set as 16, while the number of learning epochs is set to 4 (700,000 training steps in total). The probability of applying each data augmentation (DA) strategy to the training images is 50%. The learning rate is initially set as 10−4 and then set as 10−5 in the 3rd epoch. This simple learning rate decay scheme for Adam is utilised since it is able to yield better learning results [33]. The values of the aforementioned hyper-parameters are determined according to the validation loss. In the MsvNet and FeatureNet, the values of all the hyper-parameters are identical to those in [13]. It is worth noting that the intersecting feature segmentation part in the FeatureNet is a reimplemented version provided by Shi et al. [13], where the watershed algorithm with a default configuration is utilised. An Intel i9-9900X PC with 128 GB memory and an NVIDIA RTX 2080 Ti GPU is employed to carry out the experiments reported in the following sections.

For a machining feature detector, it is important to measure its ability to locate and recognise the appearing features. As suggested in [34], the F-score is adopted in this comparative study as this metric is suitable for multi-object classification and detection problems [35]. The F-score is the weighted average of precision and recall. The precision is the average fraction of correctly recognised/located features (true positives) among all the recognised/located features; the recall is the average fraction of correctly recognised/located features (true positives) among the total appearing features.

C. Recognition performance

This section focuses on examining the recognition performances of different approaches. Therefore, the F-score is employed as the evaluation metric, where the true positive value tp_i for a 3D model is calculated as

    tp_i = min(pred_i, gt_i),    (8)

as implemented in [13]. tp_i refers to the true positive value, which is the number of correctly recognised type i features in a 3D model, pred_i is the number of predicted type i features in this model, and gt_i is the actual number of type i features appearing in this model. For instance, a 3D model contains five holes (gt_hole = 5) and two pockets (gt_pocket = 2). The feature recogniser, however, reports that there are four holes (pred_hole = 4) and three pockets (pred_pocket = 3) in this 3D model. Therefore, the number of correctly recognised holes and pockets in this model should be four (tp_hole = min(pred_hole, gt_hole) = min(4, 5)) and two (tp_pocket = min(pred_pocket, gt_pocket) = min(3, 2)) respectively. Such a calculation only focuses on the evaluation of recognition performance without considering whether the predicted features are located correctly.

TABLE III
F-SCORE FOR FEATURE RECOGNITION (%).

                     Test data group [13]
Method       all     1      2      3      4      5      6      7      8      9      10
SsdNet      95.20  99.33  98.82  98.02  97.38  97.02  95.77  95.53  94.08  93.26  90.91
MsvNet      76.24  95.93  87.95  83.87  78.22  76.33  76.83  77.49  75.14  70.90  66.77
FeatureNet  57.45  92.83  81.01  64.35  59.09  60.13  57.01  54.78  53.80  50.18  46.49

Table III shows the F-score for feature recognition on different data groups. As illustrated in the table, the SsdNet achieves the highest recognition F-score for all groups, which means that the proposed method is capable of producing more correct predictions than incorrect ones, and also of finding more features from CAD models. As discussed in Section II, the MsvNet and FeatureNet were proposed based on unsupervised segmentation algorithms, which are not very suitable for 3D models with highly intersecting features since the shape information of most features is damaged because of feature intersection. Therefore, the SsdNet could produce much better results as a supervised segmentation algorithm is utilised.

D. Localisation performance

While the recognition performances of different approaches were examined in the previous section, this section further evaluates whether these approaches can accurately find the locations of the features in the CAD model. In this experiment, the F-score metric is also employed, but the way of calculating the true positives (tp_i) is different. As suggested in [34], a detection is considered a true positive only when the IoU value between the ground truth and predicted boxes is greater than 0.5. If multiple prediction boxes match the same ground-truth box, this metric only keeps the box with the top prediction score. For instance, there are five holes in a 3D model (gt_hole = 5). The system finds four holes in this model (pred_hole = 4). Among the four holes, only one hole is located precisely. Therefore, the number of correctly recognised and located holes in this model should be one instead of four (tp_hole = 1). Such a calculation allows for evaluating whether the predicted features are located correctly.

TABLE IV
F-SCORE FOR FEATURE LOCALISATION (%).

                     Test data group [13]
Method       all     1      2      3      4      5      6      7      8      9      10
SsdNet      90.62  96.73  96.22  93.53  92.74  92.59  91.57  90.90  89.24  88.19  85.09
MsvNet      58.26  85.36  77.20  66.22  60.72  60.99  59.51  59.76  56.66  50.15  44.33
FeatureNet  38.37  88.01  70.04  51.48  43.73  41.75  40.75  34.85  31.72  24.84  19.26

Table IV illustrates the F-score for feature localisation on different data groups. It is evident from the table that the SsdNet produces the highest F-score for all groups, especially when recognising models with highly intersecting features (e.g. models in groups 7-10). To fully examine the reason for this, Figure 5 illustrates two 3D models with intersecting features and their predicted bounding boxes achieved from different approaches. The original CAD model in Figure 5
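The per-model computation in Eq. (8) and the resulting F-score can be sketched as follows. The harmonic-mean form (F1) is shown, and the helper names are illustrative rather than the authors' code; for localisation, tp would instead count only predictions whose IoU with a distinct ground-truth box exceeds 0.5.

```python
def recognition_tp(pred_counts, gt_counts):
    """Eq. (8): per-type true positives for recognition, tp_i = min(pred_i, gt_i)."""
    return {t: min(pred_counts.get(t, 0), n_gt) for t, n_gt in gt_counts.items()}

def f_score(tp, n_pred, n_gt):
    """F-score from precision (tp / predictions) and recall (tp / ground truth)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# the worked example from the text: five holes and two pockets present,
# four holes and three pockets predicted
gt = {"hole": 5, "pocket": 2}
pred = {"hole": 4, "pocket": 3}
tp = recognition_tp(pred, gt)
score = f_score(sum(tp.values()), sum(pred.values()), sum(gt.values()))
```

With tp = 4 + 2 = 6 against 7 predictions and 7 ground-truth features, precision and recall coincide, so the F-score equals 6/7.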


(A) consists of five features: a rectangular through slot, a vertical circular end blind slot, a triangular through slot, a chamfer and a circular blind step. Among these features, the rectangular through slot and the vertical circular end blind slot are overlapped together; the triangular through slot and the circular blind step are also intersecting features. In the FeatureNet [12], an unsupervised learning algorithm named the watershed algorithm was first employed to segment features according to their 3D shape information. From the segmentation result achieved from the FeatureNet in Figure 5 (A), it is evident that this algorithm fails to segment the intersecting features appearing in the given CAD model. The MsvNet [13] employed another unsupervised learning algorithm, named the selective search algorithm, to segment the features. Unlike the watershed algorithm, which only produces one set of segmentation results for one 3D model, the selective search algorithm aims to enumerate all (or most) possible features in a given 3D CAD model (see the segmentation result achieved from the MsvNet in Figure 5 (A)). Therefore, most intersecting features are very likely to be found by the selective search algorithm, which could lead to a better localisation performance than the FeatureNet. Due to the nature of unsupervised learning, however, six instead of five features are detected by the MsvNet (see the final result achieved from the MsvNet in Figure 5 (A)). For 3D models with highly intersecting features, the topology of each feature may be destroyed. In these situations, using unsupervised segmentation algorithms for feature segmentation and localisation is particularly arduous. The SsdNet is a one-stage method which directly segments, locates and recognises the intersecting features via a supervised learning algorithm. From the final result achieved from the SsdNet in Figure 5 (A), different types of intersecting features can be identified correctly since a supervised segmentation algorithm is employed. The CAD model in Figure 5 (B) also contains five features, while four of them are overlapped together. It is observed that the FeatureNet fails to segment these four intersecting features correctly, which leads to the wrong localisation result (see the final result achieved from the FeatureNet in Figure 5 (B)). From this figure, it is evident that the MsvNet is capable of detecting two features correctly, while the SsdNet could locate and recognise all these highly intersecting features easily, even when the shape information of these features is substantially damaged due to the feature intersection.

E. Efficiency

This section further compares the proposed method to the others in terms of efficiency, since the runtime performance of a feature recognition system is critical to computer-aided manufacturing. In this experiment, the following evaluation metrics are utilised: (1) the average time taken by different methods in recognising a 3D model in the test set; (2) the average time taken in data pre-processing (e.g. converting a 3D model into a set of 2D images); (3) the average time taken in feature segmentation; (4) the average time taken in feature recognition (e.g. the forward pass); (5) the average time taken in post-processing (e.g. converting the outputs of the network into a set of 3D bounding boxes); (6) the average time taken in feature selection; (7) the average number of segmented features; and (8) the average number of forward passes. For a fair comparison, all the experiments are conducted on an Intel i9-9900X PC with 128 GB memory and an NVIDIA RTX 2080 Ti GPU.

From Table V, it is observed that the SsdNet is the most efficient intersecting machining feature localisation and recognition method (243.85 ms per model). As stated in the previous sections, the SsdNet and MsvNet employ 2D images rather than 3D models as inputs. Therefore, these two approaches take similar constant times for pre-processing the input data (145.04 and 131.64 ms per model respectively). As discussed in Sections II and III, the MsvNet and FeatureNet are two-stage methods in which feature segmentation and recognition are conducted separately. In these approaches, machining features are first separated via unsupervised algorithms. Then, the neural networks need to recognise these segmented features one by one, which is time-consuming. The SsdNet, however, is a one-stage method where feature segmentation and recognition are carried out together. It predicts the feature types and bounding boxes from the input models directly, without an independent feature segmentation process. Therefore, the SsdNet can achieve a better runtime performance than the others. This is supported by the results illustrated in Table V. It is observed that the MsvNet takes 519.16 (= 373.32 + 145.84) ms for feature segmentation and recognition, the FeatureNet takes 362.72 (= 314.53 + 48.20) ms, and the SsdNet only takes 9.49 ms. It is also visible that the SsdNet and MsvNet take similar amounts of time for post-processing the outputs and selecting features.

As discussed previously, the MsvNet and FeatureNet are two-stage methods where the segmented features need to be passed through the networks separately. Therefore, the average number of segmented features and the average number of forward passes are identical in these approaches (see Table V). The SsdNet is a one-stage method where the average number of forward passes is a constant value (six forward passes per model). Therefore, the time complexity of the SsdNet is O(1).

F. Benefits assessment of the SsdNet

As described in Section III, the SsdNet employs several training strategies, which could affect the final prediction performances. Therefore, experiments under the following settings are conducted to examine the effects of these strategies: (1) The SsdNet with the default configuration suggested in the previous sections is employed. In this experiment, TL and DA are enabled at the training stage. The output of the network is a set of 3D bounding boxes. The learning rate is initially set as 10−4 and then changed to 10−5 at the 3rd epoch. At the training phase, 512 models per feature type are utilised to train the networks. (2) In this setting, the output of the neural network is a set of 2D bounding boxes rather than 3D boxes, which is identical to the original SSD algorithm [14]. The depth information of a potential feature is calculated based on a heuristic estimation method suggested in [13]. In this estimation, the depth of a bounding box is set to the


Fig. 5. Two 3D models with intersecting features and their predicted bounding boxes yielded by the FeatureNet, MsvNet and SsdNet. The original CAD model in (A) contains five features with a medium degree of overlap, while the original model in (B) consists of five features with a high degree of overlap. The intermediate and final results (e.g. results achieved from feature segmentation, recognition and selection) yielded by the three approaches are presented.

TABLE V
THE COMPARISON TABLE.

                                    Average Time (ms)                                                  Average Number
Method      (1) All  (2) Pre-processing  (3) Segmentation  (4) Recognition  (5) Post-processing  (6) Selection  (7) Segmented Features  (8) Forward Passes
SsdNet      243.85   145.04              -                 9.49             59.77                18.19          65.55                   6
MsvNet      724.18   131.64              373.32            145.84           51.60               6.67           24.46                   24.46
FeatureNet  381.65   -                   314.53            48.20            11.06               -              6.12                    6.12

maximal depth of all the features appearing in this 2D box. The rest of the configurations are identical to those in the previous setting. (3) The TL and DA in this setting are disabled during the training. (4) In this setting, the SsdNet with 2D outputs is employed. The TL and DA are also disabled during training. (5) and (6) To examine the benefits of the learning rate decay strategy, the learning rates are set as fixed values in these two settings (10−4 and 10−5 respectively). (7)-(10) To examine whether the proposed method is capable of producing satisfactory results when there are not sufficient 3D models for training, the number of models per feature type utilised for training is set as 256, 128, 64 and 32 respectively in these four settings. (11) and (12) For comparison purposes, the MsvNet and FeatureNet with the default settings are adopted as baselines; 512 models per feature type are employed for training.

TABLE VI
EXPERIMENTAL RESULTS (%) BASED ON DIFFERENT CONFIGURATIONS.

       Method      TL & DA  Output  Learning Rate  #Models  Fr     Fl
(1)    SsdNet      X        3D      10−4, 10−5     512      95.20  90.62
(2)    SsdNet      X        2D      10−4, 10−5     512      89.64  78.32
(3)    SsdNet               3D      10−4, 10−5     512      91.30  86.45
(4)    SsdNet               2D      10−4, 10−5     512      86.01  74.38
(5)    SsdNet      X        3D      10−4           512      94.36  89.70
(6)    SsdNet      X        3D      10−5           512      93.34  88.93
(7)    SsdNet      X        3D      10−4, 10−5     256      93.51  88.89
(8)    SsdNet      X        3D      10−4, 10−5     128      90.44  85.54
(9)    SsdNet      X        3D      10−4, 10−5     64       85.99  79.46
(10)   SsdNet      X        3D      10−4, 10−5     32       84.29  77.76
(11)   MsvNet      -        -       -              512      76.24  58.26
(12)   FeatureNet  -        -       -              512      57.45  38.37

Table VI illustrates the recognition and localisation F-scores (denoted as Fr and Fl respectively) under the 12 experimental configurations. It is evident from settings (1)-(10) that the SsdNet with the default configuration produces the best results in terms of feature localisation and recognition. From settings (1) and (2), it is observed that the network with 3D outputs is better than that with 2D outputs. This result


indicates that the deep learning algorithm outperforms the heuristic estimation method in calculating the depths of features. It is evident from settings (1) and (3) that TL and DA could enhance the localisation and recognition performances. This phenomenon can also be observed from the results captured from settings (2) and (4). It is evident from settings (1), (5) and (6) that the SsdNet with a simple learning rate decay strategy works better than the SsdNet with a fixed learning rate. Such evidence is also supported by [33]. In settings (7)-(10), it is observed that the number of 3D models utilised for training could largely affect the final recognition and localisation performances. The proposed method could achieve near-optimal results when 128-256 models per feature type are employed for training. In addition, the SsdNet with a limited number of training samples (e.g. 32 models per feature type) could still produce better results than the MsvNet and FeatureNet, as evident in settings (10), (11) and (12). From settings (1), (11) and (12), it is visible that the SsdNet with a supervised feature segmentation method outperforms the other approaches with unsupervised segmentation methods.

V. CONCLUSION

In conclusion, this paper proposed a novel method for intersecting feature localisation and recognition via the single shot multibox detector (SSD). A thorough evaluation was carried out to compare the proposed method to the others. Experimental results demonstrated that the SsdNet achieved state-of-the-art localisation and recognition performances on the benchmark test set due to the supervised learning algorithm employed for feature segmentation. In addition, the training strategies adopted in this paper considerably enhanced the recognition performances. Furthermore, the SsdNet is more efficient than the others as it is a one-stage method. The proposed method could be utilised in a computer-aided process planning (CAPP) system which produces a set of manufacturing operations and machine tools based on the feature localisation and recognition results. With the insight obtained from this work, the feasibility of minimising the required training samples and of extending this approach to free-form features will be explored in ongoing work.

REFERENCES

[1] J. Han, M. Pratt, and W. C. Regli, "Manufacturing feature recognition from solid models: a status report," IEEE Transactions on Robotics and Automation, vol. 16, no. 6, pp. 782-796, 2000.
[2] B. Babic, N. Nesic, and Z. Miljkovic, "A review of automated feature recognition with rule-based pattern recognition," Computers in Industry, vol. 59, no. 4, pp. 321-337, 2008.
[3] A. K. Verma and S. Rajotia, "A review of machining feature recognition methodologies," International Journal of Computer Integrated Manufacturing, vol. 23, no. 4, pp. 353-368, 2010.
[4] X. Xu, "Integrating advanced computer-aided design, manufacturing, and numerical control," Information Science Reference, 2009.
[5] S. Shao, S. McAleer, R. Yan, and P. Baldi, "Highly accurate machine fault diagnosis using deep transfer learning," IEEE Transactions on Industrial Informatics, vol. 15, no. 4, pp. 2446-2455, 2018.
[6] G. A. Susto, A. Schirru, S. Pampuri, S. McLoone, and A. Beghi, "Machine learning for predictive maintenance: A multiple classifier approach," IEEE Transactions on Industrial Informatics, vol. 11, no. 3, pp. 812-820, 2014.
[7] R. C. Luo and C. W. Kuo, "A scalable modular architecture of 3D object acquisition for manufacturing automation," in IEEE 13th International Conference on Industrial Informatics. IEEE, 2015, pp. 269-274.
[8] C. Zhang, G. Zhou, H. Yang, Z. Xiao, and X. Yang, "View-based 3D CAD model retrieval with deep residual networks," IEEE Transactions on Industrial Informatics, 2019.
[9] R. C. Luo and C. W. Kuo, "Intelligent seven-DoF robot with dynamic obstacle avoidance and 3-D object recognition for industrial cyber-physical systems in manufacturing automation," Proceedings of the IEEE, vol. 104, no. 5, pp. 1102-1113, 2016.
[10] H. Wang, Z. Li, Y. Li, B. B. Gupta, and C. Choi, "Visual saliency guided complex image retrieval," Pattern Recognition Letters, vol. 130, pp. 64-72, 2020.
[11] M. Al-Ayyoub, S. AlZu'bi, Y. Jararweh, M. A. Shehab, and B. B. Gupta, "Accelerating 3D medical volume segmentation using GPUs," Multimedia Tools and Applications, vol. 77, no. 4, pp. 4939-4958, 2018.
[12] Z. Zhang, P. Jaiswal, and R. Rai, "FeatureNet: Machining feature recognition based on 3D convolution neural network," Computer-Aided Design, vol. 101, pp. 12-22, 2018.
[13] P. Shi, Q. Qi, Y. Qin, P. J. Scott, and X. Jiang, "A novel learning-based feature recognition method using multiple sectional view representation," Journal of Intelligent Manufacturing, vol. 31, no. 5, pp. 1291-1309, 2020.
[14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21-37.
[15] M. A. Alsmirat, F. Al-Alem, M. Al-Ayyoub, Y. Jararweh, and B. B. Gupta, "Impact of digital fingerprint image quality on the fingerprint recognition accuracy," Multimedia Tools and Applications, vol. 78, no. 3, pp. 3649-3688, 2019.
[16] Y. Liu and M. Zhu, "Processed RGB-D SLAM based on HOG-Man algorithm," International Journal of High Performance Computing and Networking, vol. 14, no. 3, pp. 376-384, 2019.
[17] B. R. Babic, N. Nesic, and Z. Miljkovic, "Automatic feature recognition using artificial neural networks to integrate design and manufacturing: Review of automatic feature recognition systems," Artificial Intelligence for Engineering Design, Analysis and Manufacturing, vol. 25, no. 3, p. 289, 2011.
[18] J. H. Vandenbrande and A. A. Requicha, "Spatial reasoning for the automatic recognition of machinable features in solid models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 12, pp. 1269-1285, 1993.
[19] T. Dipper, X. Xu, and P. Klemm, "Defining, recognizing and representing feature interactions in a feature-based data model," Robotics and Computer-Integrated Manufacturing, vol. 27, no. 1, pp. 101-114, 2011.
[20] A. Mokhtar and X. Xu, "Machining precedence of 2½D interacting features in a feature-based data model," Journal of Intelligent Manufacturing, vol. 22, no. 2, pp. 145-161, 2011.
[21] Y. Woo, "Fast cell-based decomposition and applications to solid modeling," Computer-Aided Design, vol. 35, no. 11, pp. 969-977, 2003.
[22] Y. Li, Y. Ding, W. Mou, and H. Guo, "Feature recognition technology for aircraft structural parts based on a holistic attribute adjacency graph," Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, vol. 224, no. 2, pp. 271-278, 2010.
[23] S. Xu, N. Anwer, and C. Mehdi-Souzani, "Machining feature recognition from in-process model of NC simulation," Computer-Aided Design and Applications, vol. 12, no. 4, pp. 383-392, 2015.
[24] G. Campana and M. Mele, "An application to stereolithography of a feature recognition algorithm for manufacturability evaluation," Journal of Intelligent Manufacturing, pp. 1-16, 2018.
[25] L. Ding and Y. Yue, "Novel ANN-based feature recognition incorporating design by features," Computers in Industry, vol. 55, no. 2, pp. 197-222, 2004.
[26] N. Öztürk and F. Öztürk, "Hybrid neural network and genetic algorithm based machining feature recognition," Journal of Intelligent Manufacturing, vol. 15, no. 3, pp. 287-298, 2004.
[27] E. Brousseau, S. Dimov, and R. Setchi, "Knowledge acquisition techniques for feature recognition in CAD models," Journal of Intelligent Manufacturing, vol. 19, no. 1, pp. 21-32, 2008.
[28] W. Liu, A. Rabinovich, and A. C. Berg, "ParseNet: Looking wider to see better," arXiv preprint arXiv:1506.04579, 2015.
[29] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[30] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.


[31] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, "Soft-NMS: improving object detection with one line of code," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5561-5569.
[32] C. Largeron, C. Moulin, and M. Géry, "MCut: a thresholding strategy for multi-label classification," in International Symposium on Intelligent Data Analysis. Springer, 2012, pp. 172-183.
[33] A. C. Wilson, R. Roelofs, M. Stern, N. Srebro, and B. Recht, "The marginal value of adaptive gradient methods in machine learning," in Advances in Neural Information Processing Systems, 2017, pp. 4148-4158.
[34] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietikäinen, "Deep learning for generic object detection: A survey," International Journal of Computer Vision, vol. 128, no. 2, pp. 261-318, 2020.
[35] M. Sokolova and G. Lapalme, "A systematic analysis of performance measures for classification tasks," Information Processing & Management, vol. 45, no. 4, pp. 427-437, 2009.

Paul J. Scott is currently a professor at the EPSRC Future Advanced Metrology Hub of the University of Huddersfield. He received a Ph.D. degree in Statistics from Imperial College London in 1983. He has an honours degree in Mathematics and an M.Sc. degree in Statistics. His research interests are in manufacturing informatics, geometrical product specifications and verification, philosophy of the measurement of product geometry, and foundations of specifying and characterising solutions for real world industrial problems. He was the project leader for twenty published ISO standards and is currently working on four new ISO documents. He is a fellow of the Royal Statistical Society (FRSS), an EPSRC Fellow of Manufacturing, a leading member of ISO TC 213, a founder member of the strategic group AG1 and the technical review group AG2 of ISO TC 213, a convenor of the working group WG15 (Filtration and Extraction) and the advisory group AG12 (Mathematics for Geometrical Product Specifications) of ISO TC 213, a core member of BSI TDW4, a convenor of BSI TDW4/-/9, a visiting industrial professor of Taylor Hobson Ltd, and the Taylor Hobson Chair for Computational Geometry.

Peizhi Shi is currently a research fellow at the EPSRC Future Advanced Metrology Hub, University of Huddersfield, UK. He received his PhD in computer science from the University of Manchester in 2019, master's degree in software engineering from the University of Science and Technology of China in 2013, and bachelor's degree in computer science from the Guilin University of Electronic Technology in 2010. His current research interests include machine learning, 3D object localisation and recognition, machine perception, and their applications in intelligent system development.

Qunfen Qi is currently a senior research fellow at the EPSRC Future Advanced Metrology Hub, University of Huddersfield, UK. She received a Ph.D. degree in Precision Engineering from the University of Huddersfield in 2013. She is an EPSRC UKRI Innovation Fellow, an EPSRC Peer Review Full College Member, an EPSRC Women in Engineering Society (WES) member, and a fellow of the Higher Education Academy (FHEA). Her research lies in knowledge modelling for manufacturing, covering smart information systems, abstract mathematical theory (category theory), geometrical product specifications (GPS), additive manufacturing (AM), and surface metrology. She has worked for fifteen years in developing decision-making tools for smart product design and inspection, using category theory as its foundation.

Xiangqian Jiang is currently the chair professor and the director of the EPSRC Future Advanced Metrology Hub, University of Huddersfield, and the Royal Academy of Engineering and Renishaw Chair in Precision Metrology. She has a D.Sc. degree in Precision Engineering and a Ph.D. degree in Surface Metrology. Her research interests mainly lie in Surface Measurement, Precision Engineering, and Advanced Manufacturing Technologies. She was made a Dame Commander (DBE) of the Order of the British Empire for services to Engineering and Manufacturing in 2017. She is a fellow of the Royal Academy of Engineering (FREng), a fellow of the Royal Society of Arts (FRSA), a fellow of the Institute of Engineering Technologies (FIET), a fellow of the International Academy of Production Research (FCIRP), a fellow of the International Society for Nanomanufacturing (FISNM), a principal member of ISO TC 213 and BSI TW/4, an advisory member for the UK national measurement system, and the UK Chairman of the International Academy of Production Research.
Yuchu Qin is currently a Ph.D. candidate at the EPSRC Future Advanced Metrology Hub, University of Huddersfield, UK. He received a D.Eng. degree in Measurement Technology and Instrument from the School of Mechanical Science and Engineering, Huazhong University of Science and Technology, China in 2017. He has an M.Eng. degree in Computer Application Technology and a B.Eng. degree in Computer Science and Technology. His research interests include intelligent manufacturing, geometrical product specifications, CAD data interoperability, and knowledge representation and reasoning. He has published over twenty papers about manufacturing informatics in international journals such as Knowledge-Based Systems, Advanced Engineering Informatics, Computer-Aided Design, and Journal of Computing and Information Science in Engineering. He has also co-authored and published one monograph about knowledge representation of geometrical product specifications.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/