
X-YOLO: A deep learning based toolset with multiple optimization strategies for contraband detection

Haoyue Wang
School of Data Science and Engineering, East China Normal University, Shanghai, China
wsunshine1089@163.com

Wei Wang∗
School of Data Science and Engineering, East China Normal University, Shanghai, China
wwang@dase.ecnu.edu.cn

Yao Liu
School of Data Science and Engineering, East China Normal University, Shanghai, China
liuyao@cc.ecnu.edu.cn

ABSTRACT
Observing X-ray images manually is a common method for detecting contraband in packages. Long-term continuous observation is prone to visual fatigue, leading to missed detections and false detections. Motivated by aiding operators in detecting contraband in packages, we propose X-YOLO, a deep learning based toolset with multiple optimization strategies for contraband detection that increases detection precision. Path enhancement is designed to shorten the information path between the lower layers and the uppermost layer. We replace Leaky ReLU with Swish and Steps with SGDR to make training stable. Mixup, a dataset-independent method for data augmentation, is used to increase the amount of training data and improve the generalization of the model without expert knowledge. To solve the issue that Intersection over Union (IoU) cannot deal with two non-overlapping objects, we apply Generalized Intersection over Union (GIoU) as the bounding box loss. The experimental results show that X-YOLO achieves an mAP of up to 96.02% and a recall of up to 98.55%, surpassing Faster R-CNN, SSD, YOLOv1, YOLOv2, Tiny-YOLO, YOLOv3, YOLOv3-tiny, YOLOv3-spp and YOLOv3 with subsets of the optimization strategies.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Computer vision; Computer vision problems; Object detection.

KEYWORDS
contraband detection, X-YOLO, optimization strategies

ACM Reference Format:
Haoyue Wang, Wei Wang, and Yao Liu. 2020. X-YOLO: A deep learning based toolset with multiple optimization strategies for contraband detection. In ACM Turing Celebration Conference - China (ACM TURC'20), May 22–24, 2020, Hefei, China. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3393527.3393549

∗Wei Wang is the corresponding author.

1 INTRODUCTION
X-ray security screening is widely used to maintain aviation and transportation security. It poses an important image-based object detection task in which operators must inspect compact, cluttered and highly variable package contents in a limited time. In recent years, China's logistics industry has been booming. It has risen from a downstream industry to a leading industry that guides production and promotes consumption, bringing convenience and benefits to many people.

Contraband detection plays an important role in the daily package logistics industry and the security industry. It is responsible for preventing contraband such as inflammables and explosives from entering shipping channels, managing special freight items such as knives, and monitoring the smuggling of key national contraband such as drugs. With the popularity and rapid development of online shopping, the goods transported by the logistics industry have become complex and diverse. The number of online logistics packages has far exceeded what can be handled manually, which has brought huge challenges to logistics package supervision.

Although traditional contraband detection applies computer vision, those processes still require highly trained operators to observe the resulting images and draw the correct conclusions. Gesick et al. [1] proposed an algorithm based on the scale invariant feature transform (SIFT) [2]. They attempted this algorithm, edge detection combined with pattern matching, and Daubechies wavelet transforms [3] to detect concealed weapons. Although the results of the three approaches were not ideal, they inspired later researchers to explore contraband detection.

Chan et al. [4] explored the feasibility of applying SIFT to transmission X-ray images, which can solve the problem of X-ray image correspondences. The results revealed that the transparency property of X-ray images can provide additional information that helps find correspondences in X-ray images, while the same information may constrain the matching process under certain conditions.

Bastan et al. [5] proposed two dense sampling methods as keypoint detectors for texture-free objects and extended the SPIN color descriptors to utilize material information. They then proposed a multi-view branch-and-bound search algorithm for multi-view object detection. However, they only tested three object categories.

Riffo [6] proposed a methodology for automatic detection of threat objects using single X-ray images. The detection method used a visual vocabulary and an occurrence structure generated from a training dataset containing representative X-ray images of the threat objects to be detected. They detected three different threat objects: razor blades, shuriken and handguns. The true positive rates were 99.0% for razor blades, 97.0% for shuriken and 89.0% for handguns.

Later, Riffo, together with Mery et al. [7], evaluated ten different computer vision strategies, again detecting razor blades, shuriken and handguns. They achieved more than 95.0% accuracy with methods based on visual vocabularies and deep features. To the best knowledge of the authors, it was the first experiment in baggage inspection (and probably in X-ray testing) that used deep learning.

Kundegorski et al. [8] explored the use of various feature point descriptors as visual word variants within a Bag-of-Visual-Words (BoVW) representation scheme, which can be used to classify images within baggage security X-ray imagery for threat detection. For a firearms detection task, the model achieved an accuracy of 94.0%. However, they did not evaluate classification performance for additional object types.

Akcay et al. [9] addressed the image classification problem posed within the context of X-ray baggage security screening through deep convolutional neural networks (CNN) based on AlexNet [10]. Due to the limited training data, they applied transfer learning: they first trained for generalized image classification tasks where sufficient training data existed and then optimized for the specific application domain. Although they achieved 98.92% detection accuracy for the handgun detection problem, too few types of baggage objects were detected, and they did not investigate localization of X-ray baggage objects within the images either. Later, they [11] trained a support vector machine (SVM) [12] classifier on CNN features. In addition to classification, they also explored multiple CNN-driven detection approaches. Using YOLOv2 with input images of size 416×416, they achieved a mean average precision (mAP) of 97.4% for a two-class firearm detection problem.

Based on a classification task and a location search task, Yuanxi Wei [13] used a multi-task transfer learning method on the SSD network to address the difficulty of collecting a comprehensive image dataset of dangerous goods. The experiments were performed with SSD300 on image datasets filtered from GDXray and achieved an mAP of 91.5%.

In this paper, we propose X-YOLO, a deep learning based toolset with multiple optimization strategies for contraband detection that increases detection precision. The rest of the paper is organized as follows. Section 2 introduces preliminaries of YOLO. Section 3 introduces X-YOLO, a deep learning based toolset with multiple optimization strategies. Section 4 demonstrates the effectiveness of X-YOLO in contraband detection. The final section gives a summary and proposes future work.

2 PRELIMINARIES OF YOLO
The main task of object detection is to locate the objects of interest in images. It requires not only accurately determining the specific category of each object, but also giving the bounding box of each object. Deep learning is a subset of artificial intelligence and machine learning. Without introducing manual coding rules or human domain knowledge, it can automatically learn features of data such as images, videos or text. At present, deep learning based object detection algorithms have surpassed traditional detection methods and become the mainstream. According to their design ideas, they can be divided into two types: object detection algorithms based on region proposals, such as R-CNN, Fast R-CNN and Faster R-CNN [14], and object detection algorithms based on end-to-end training, such as YOLO [15] and SSD [16].

YOLO is an important object detection algorithm based on deep learning. In this paper, the deep learning based toolset for contraband detection is realized around YOLO. YOLO models object detection as a regression problem and directly returns class probabilities and positions of objects. It uses end-to-end training: the back propagation of the loss function runs through the entire network. The original image is first divided into grid cells and each cell is detected independently. If the center of an object falls in a cell, that cell is responsible for predicting the object. Each cell predicts B bounding boxes, and each bounding box not only returns its own position but also predicts a confidence. Confidence indicates the probability that the bounding box contains an object to be detected and the accuracy of the predicted position of the bounding box. We can calculate confidence as follows:

confidence = Pr(Object) × IoU_pred^truth    (1)

Here, Pr(Object) takes 1 if an object falls within the bounding box and 0 otherwise. IoU_pred^truth is the intersection over union (IoU) between the predicted bounding box and the ground truth.

Each bounding box predicts five values, x, y, w, h and confidence, where the (x, y) coordinate represents the center of the bounding box relative to the grid cell boundary, and w and h represent the ratios of the width and height of the bounding box relative to the entire image. The x, y, w and h are all limited to [0, 1]. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).

Therefore, YOLO divides the image into S × S grid cells, and for each grid cell it predicts B bounding boxes and C class probabilities. These predictions are encoded as an S × S × (B × 5 + C) tensor.

During test time, the class probabilities are multiplied by the individual box confidence predictions to obtain the class-specific confidence scores for each bounding box:

Pr(Class_i | Object) × Pr(Object) × IoU_pred^truth = Pr(Class_i) × IoU_pred^truth    (2)

These scores represent the probability that the predicted box belongs to a class and the accuracy with which the predicted box matches the object.

After getting the scores for each box, we can set a threshold and filter out the boxes with low scores. Non-maximum suppression (NMS) is then performed on the remaining boxes to produce the test results.
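To make the prediction decoding concrete, the sketch below turns an S × S × (B × 5 + C) output tensor into final detections: it forms the class-specific scores of equation (2), filters low-scoring boxes and applies NMS. It is a minimal illustration under assumed conventions (per-cell layout of B boxes followed by C class probabilities, a 0.5 score threshold and torchvision's nms), not the authors' implementation.

```python
import torch
from torchvision.ops import nms

def decode_yolo_output(pred: torch.Tensor, num_classes: int,
                       score_thresh: float = 0.5, nms_iou: float = 0.45):
    """Turn an S x S x (B*5 + C) prediction tensor into final detections.

    Each cell is assumed to store B boxes as (x, y, w, h, confidence) followed by
    C shared class probabilities. Returns boxes (x1, y1, x2, y2), scores, classes.
    """
    S = pred.shape[0]
    B = (pred.shape[2] - num_classes) // 5
    boxes, scores, classes = [], [], []
    for i in range(S):                                   # grid row
        for j in range(S):                               # grid column
            cell = pred[i, j]
            class_probs = cell[B * 5:]                   # Pr(Class_i | Object)
            for b in range(B):
                x, y, w, h, conf = cell[b * 5: b * 5 + 5]
                cls_scores = class_probs * conf          # eq. (2): class-specific scores
                score, cls = cls_scores.max(dim=0)
                if score < score_thresh:                 # drop low-scoring boxes
                    continue
                cx, cy = (j + x) / S, (i + y) / S        # cell offset -> image-relative center
                boxes.append(torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]))
                scores.append(score)
                classes.append(cls)
    if not boxes:
        return torch.empty(0, 4), torch.empty(0), torch.empty(0, dtype=torch.long)
    boxes, scores, classes = torch.stack(boxes), torch.stack(scores), torch.stack(classes)
    keep = nms(boxes, scores, nms_iou)                   # non-maximum suppression
    return boxes[keep], scores[keep], classes[keep]

detections = decode_yolo_output(torch.rand(13, 13, 2 * 5 + 6), num_classes=6)
```

In practice, YOLOv3 decodes multi-scale feature maps with anchor priors, but the score computation, thresholding and NMS steps are the same.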

Figure 1: X-YOLO Structure

3 X-YOLO
We propose X-YOLO, a deep learning based toolset, to improve the efficiency of contraband detection. The structure of X-YOLO is shown in Figure 1. Several published techniques are used for reference, and multiple optimization strategies are integrated into the X-YOLO framework. In this section, we detail the optimization strategies used in X-YOLO.

3.1 SGDR
Most deep learning based object detection algorithms use Steps to reduce the learning rate: as the number of iterations grows, the learning rate is reduced automatically without manual intervention. Although Steps can change the learning rate quickly, training may become unstable in subsequent iterations.

A warmup learning rate is a way to avoid gradient explosions in the initial stages of training. SGDR [17] (Stochastic Gradient Descent with Warm Restarts) is a stochastic gradient descent technique with warm restarts. It can reduce training time and avoid gradients near zero. The restarts are not performed from scratch but are emulated by increasing the learning rate, with the old value used as an initial solution. Although the optimization process may be damaged in the short term, it eventually converges to a better local minimum.

Within the i-th run, the learning rate is decayed with cosine annealing for each batch as follows:

η_t = η^i_min + (1/2)(η^i_max − η^i_min)(1 + cos(π · T_cur / T_i))    (3)

where η^i_min and η^i_max are the ranges for the learning rate, and T_cur accounts for how many epochs have been performed since the last restart.
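As an illustration, equation (3) can be implemented directly, and PyTorch already provides the same schedule as CosineAnnealingWarmRestarts. The model, learning-rate range and restart period below are placeholder values, not the X-YOLO training configuration.

```python
import math
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def sgdr_lr(t_cur: float, t_i: float, eta_min: float = 1e-5, eta_max: float = 1e-3) -> float:
    """Equation (3): cosine-annealed learning rate within the i-th run."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# PyTorch ships the same schedule as a ready-made scheduler; the tiny model and the
# hyperparameters below are placeholders, not the X-YOLO training configuration.
model = torch.nn.Conv2d(3, 16, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2, eta_min=1e-5)

for epoch in range(30):
    # ... one pass over the training set would run here ...
    scheduler.step()                                  # warm restart after each period
    print(epoch, optimizer.param_groups[0]["lr"])     # follows eq. (3) between restarts
```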
3.2 Swish
Swish [18] is a self-gated activation function, defined as follows:

Swish(x) = x · σ(βx)    (4)

where σ(·) is the sigmoid function and β is either a constant or a trainable parameter. Figure 2 plots the graph of Swish for different values of β. When β = 0, Swish becomes the linear function f(x) = x/2. When β = 1, Swish is approximately linear for x > 0 and approximately saturated for x < 0. If β → +∞, σ(βx) approaches a 0-1 step function and Swish approaches ReLU. Swish can thus be loosely viewed as a smooth function that nonlinearly interpolates between the linear function and the ReLU function. In this paper, β is 1.

Figure 2: Swish with different β
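For reference, a Swish module implementing equation (4) is only a few lines; with the paper's setting β = 1 it coincides with PyTorch's built-in SiLU activation. Whether a trainable β would ever be used in X-YOLO is not stated, so that option below is purely illustrative.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish(x) = x * sigmoid(beta * x), eq. (4). beta = 1 matches nn.SiLU()."""
    def __init__(self, beta: float = 1.0, trainable: bool = False):
        super().__init__()
        beta_t = torch.tensor(float(beta))
        # beta can be a fixed constant or a trainable parameter, as described in [18].
        self.beta = nn.Parameter(beta_t) if trainable else beta_t

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)

x = torch.linspace(-5, 5, steps=11)
print(Swish()(x))       # beta = 1, the setting used in the paper
print(nn.SiLU()(x))     # identical values from the built-in module
```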
3.3 GIoU
IoU, also known as the Jaccard index, is the most commonly used metric for comparing the similarity between two arbitrary shapes. However, IoU as both a metric and a loss has two issues. One is that IoU cannot correctly distinguish between different alignments of two objects: if two objects overlap in different directions but the intersections are of the same size, the IoU values will be exactly equal. The other is that if two objects do not overlap, the IoU value is zero and does not reflect how far apart the two shapes are. In the case of non-overlapping objects, if IoU is used as a loss, its gradient will be zero and cannot be optimized.

GIoU [19] (Generalized Intersection over Union) not only has the same characteristics as IoU, but also makes up for these drawbacks. The calculation of GIoU is summarized in Algorithm 1.

Algorithm 1 Generalized Intersection over Union
Input: two arbitrary convex shapes A, B ⊆ S ∈ R^n
Output: GIoU
1: For A and B, find the smallest enclosing convex shape C, C ⊆ S ∈ R^n.
2: IoU = |A ∩ B| / |A ∪ B|
3: GIoU = IoU − |C \ (A ∪ B)| / |C|

Compared with IoU, GIoU needs to find the smallest convex shape C enclosing both A and B. Therefore, even when two arbitrary convex shapes do not overlap, GIoU can still be calculated, which solves the problem that IoU is not suitable as a loss function and better reflects how the shapes overlap.

For 2D object detection tasks (such as comparing two axis-aligned bounding boxes), GIoU has a straightforward solution. In this case, the intersection and the smallest enclosing object both have rectangular shapes, and the coordinates of their vertices are obtained by comparing the vertex coordinates of the two bounding boxes with minimum and maximum functions.
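A minimal sketch of Algorithm 1 for the axis-aligned case follows, with boxes given as (x1, y1, x2, y2); it is written for clarity rather than taken from the X-YOLO code. Used as a bounding box loss, one would minimize 1 − GIoU.

```python
import torch

def giou(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    """Algorithm 1 for axis-aligned boxes given as (x1, y1, x2, y2) tensors."""
    # Intersection rectangle (may be empty).
    ix1, iy1 = torch.max(box_a[0], box_b[0]), torch.max(box_a[1], box_b[1])
    ix2, iy2 = torch.min(box_a[2], box_b[2]), torch.min(box_a[3], box_b[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union                                   # step 2

    # Smallest enclosing box C (step 1).
    cx1, cy1 = torch.min(box_a[0], box_b[0]), torch.min(box_a[1], box_b[1])
    cx2, cy2 = torch.max(box_a[2], box_b[2]), torch.max(box_a[3], box_b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    return iou - (area_c - union) / area_c                # step 3

a = torch.tensor([0.0, 0.0, 2.0, 2.0])
b = torch.tensor([3.0, 3.0, 5.0, 5.0])                    # disjoint boxes
print(giou(a, b))   # negative value, unlike IoU which would be 0
# GIoU loss for bounding box regression: loss = 1 - giou(a, b)
```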
3.4 Mixup
Deep neural networks generally require a large amount of training data to obtain good results. When the amount of data is limited, increasing it through data augmentation can improve the robustness of the model and avoid overfitting. However, traditional data augmentation methods such as flipping and cropping are dataset-dependent and thus require expert knowledge. Mixup [20] is a dataset-independent way to perform data augmentation:

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j    (5)

where (x_i, y_i) and (x_j, y_j) are two examples drawn at random from the training data and λ ∈ [0, 1]. Mixup therefore extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. The mixup vicinal distribution can be understood as a form of data augmentation that encourages the model to behave linearly in between training examples. This linear behaviour reduces the amount of undesirable oscillation when predicting outside the training examples.
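The sketch below applies equation (5) to a batch, drawing λ from a Beta(α, α) distribution and pairing each example with a random partner from the same batch, which is the usual implementation trick from [20]; the α value and the classification-style one-hot labels are illustrative assumptions rather than details given in the paper.

```python
import torch

def mixup_batch(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Eq. (5): blend each example with a randomly chosen partner from the same batch.

    images: (N, C, H, W) float tensor; labels: (N, num_classes) one-hot / soft targets.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()   # lambda in [0, 1]
    perm = torch.randperm(images.size(0))                          # random partner indices
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_images, mixed_labels

x = torch.rand(4, 3, 544, 544)                       # batch of four images (dummy data)
y = torch.eye(6)[torch.tensor([0, 1, 2, 3])]         # one-hot labels for 6 classes
mx, my = mixup_batch(x, y)
print(mx.shape, my[0])                               # mixed images and soft labels
```

A common detection-style variant blends the images in the same way but keeps the bounding boxes of both source images, weighting their losses by λ and 1 − λ.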

3.5 Assisted Excitation
YOLOv3 faces two challenges [21]. One is difficulty in localization, the other is the extreme imbalance of positive and negative samples. Localization problems occur because classification and localization in YOLOv3 are performed simultaneously. The dense sliding windows of single-stage detectors cause the imbalance of positive and negative samples.

Assisted excitation is based on the idea of curriculum learning [22]: if we learn simpler tasks first and then continue to perform more complex tasks, we obtain better performance in terms of local minima and generalization. The technique is only applied during training and its excitation factor is gradually decreased, so it does not affect detection speed.

First, the bounding box mapping g is defined as follows:

g(i, j) = 1 if some bounding box exists at cell (i, j), and 0 otherwise    (6)

An assisted excitation module can then be described as follows:

a^{l+1}_{(c,i,j)} = a^l_{(c,i,j)} + α(t) · e_{(c,i,j)}    (7)

where a^l and a^{l+1} are the activation tensors at levels l and l + 1, e is the excitation tensor and α is the excitation factor, which depends on the epoch number t. Here (c, i, j) refer to the channel number, row and column. During training, α(t) starts with a nonzero value for the initial epochs and gradually decays to zero. e is a function of a^l and the ground truth.

The excitation e at bounding box locations can follow different strategies; however, performance is best when the excitation is the information of the bounding box locations shared across all channels. The excitation tensor e is computed as follows:

e_{(c,i,j)} = (g(i, j) / d) · Σ_{c=1}^{d} a^l_{(c,i,j)}    (8)

where d represents the number of feature channels. The decay function over time is as follows:

α(t) = 0.5 × (1 + cos(π · t / Max_Iteration))    (9)

The effect of the excitation decays to 0 over training, at which point the network is equivalent to the detection network of the original YOLO.
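A minimal sketch of equations (6) to (9) as a training-only module is shown below. How the ground-truth mask g is rasterized and how Max_Iteration is supplied are assumptions for illustration; the module would be inserted between two feature levels of the detector.

```python
import math
import torch
import torch.nn as nn

class AssistedExcitation(nn.Module):
    """Adds ground-truth-guided excitation to activations, eqs. (6)-(9)."""
    def __init__(self, max_iteration: int):
        super().__init__()
        self.max_iteration = max_iteration

    def alpha(self, t: int) -> float:
        # Eq. (9): cosine decay of the excitation factor from 1 to 0.
        return 0.5 * (1 + math.cos(math.pi * t / self.max_iteration))

    def forward(self, a: torch.Tensor, gt_mask: torch.Tensor, t: int) -> torch.Tensor:
        """a: (N, d, H, W) activations; gt_mask: (N, 1, H, W) holding g(i, j) from eq. (6)."""
        if not self.training:
            return a                               # no excitation at inference time
        d = a.size(1)
        # Eq. (8): channel-averaged activation, masked by the bounding-box map.
        e = gt_mask * a.sum(dim=1, keepdim=True) / d
        # Eq. (7): add the (decayed) excitation to every channel.
        return a + self.alpha(t) * e

ae = AssistedExcitation(max_iteration=100)
ae.train()
feat = torch.rand(2, 256, 17, 17)                  # dummy feature map
mask = (torch.rand(2, 1, 17, 17) > 0.8).float()    # dummy g(i, j): cells containing boxes
print(ae(feat, mask, t=10).shape)
```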
4 RESULTS OF EXPERIMENT
4.1 Experimental Setup
In the experiments, we use the X-ray contraband image dataset from the Jinnan digital manufacturing algorithm challenge of the Tianchi competition. The dataset contains five classes of contraband, including iron shell lighter, black nail lighter, knife, battery capacitance and scissors. There are 981 training images and 1194 test images with a resolution of about 96 pixels by 96 pixels, and each picture may contain more than one class of contraband.

Figure 3: X-ray image

We use YOLOv3 as the experimental framework to prove the effectiveness of X-YOLO. Some models may overfit during training due to the limited data, so we use data augmentation methods such as horizontal flipping, random cutting, color distortion and random sampling to increase the amount of data. Finally, 9810 images are divided into training and validation sets with a proportion of 4:1. Path enhancement [23] is designed to shorten the information path between the lower layers and the uppermost layer. Moreover, we add antialiasing [24], a software technique for diminishing jaggies.

We set the number of categories to 6 and the size of the network input to 544 × 544. We train with a batch size of 64, a subdivision of 16 and an initial learning rate of 0.001.

The bounding box priors of X-YOLO are determined using k-means clustering, and 12 cluster centers are obtained on the dataset: (8×8), (10×13), (16×30), (33×23), (32×32), (30×61), (62×45), (59×119), (80×80), (116×90), (156×198) and (373×326). The experiments are implemented on an NVIDIA TITAN Xp in a PyTorch environment.
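The anchor priors can be reproduced with a short k-means routine over the ground-truth box sizes. The paper does not state the distance metric, so the sketch below assumes the common YOLO convention of clustering (width, height) pairs under a 1 − IoU distance with median updates; the helper names and the dummy data are illustrative.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs assuming shared top-left corners; shapes (N, 2) and (K, 2)."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 12, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster (w, h) pairs with distance 1 - IoU and return k anchor priors."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]   # random initial centers
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)      # nearest center = highest IoU
        for j in range(k):
            if np.any(assign == j):
                # median update per cluster (the mean is another common choice)
                anchors[j] = np.median(boxes[assign == j], axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]            # sort by area, as usual

# Dummy (w, h) pairs standing in for the ground-truth boxes of the training set.
wh = np.abs(np.random.default_rng(1).normal(80, 50, size=(2000, 2))) + 5
print(kmeans_anchors(wh, k=12).round().astype(int))
```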
4.2 Evaluation Criteria
Mean average precision (mAP) is used as the evaluation criterion to assess the performance of the models. It is calculated based on precision and recall, which are computed according to equations (10) and (11) respectively:

precision = TP / (TP + FP)    (10)

recall = TP / (TP + FN)    (11)

Here, TP (true positive) is an outcome where the model correctly predicts the positive class, FP (false positive) is an outcome where the model incorrectly predicts the positive class, and FN (false negative) is an outcome where the model incorrectly predicts the negative class.

We define samples with both GIoU and classification probability greater than 0.5 as positive samples. Average precision (AP) is the area under the precision-recall curve, i.e., the integral of the precision-recall curve, as shown in equation (12). In practice, we compute AP as in equation (13) by accumulating the areas of several rectangles under the precision-recall curve. Based on AP, mAP can be expressed as equation (14), where N is the number of classes of contraband in the dataset.

AP = ∫_0^1 p(r) dr    (12)

AP = (1/n) Σ_{i=1}^{n} max_{r ∈ [(i−1)/n, i/n]} p(r)    (13)

mAP = (1/N) Σ_{n=1}^{N} AP(n)    (14)
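For completeness, equations (13) and (14) can be evaluated from per-class precision-recall points as sketched below, using the usual step-wise interpolation of the precision-recall curve; the number of recall intervals and the toy data are illustrative, and this is not the evaluation script used in the paper.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray, n: int = 11) -> float:
    """Eq. (13): mean of the best precision over each of n recall intervals.

    With step-wise interpolation, the maximum of p(r) over [(i-1)/n, i/n] equals
    the best precision among points whose recall is at least (i-1)/n.
    """
    ap = 0.0
    for i in range(1, n + 1):
        lo = (i - 1) / n
        mask = recall >= lo
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / n

def mean_average_precision(per_class_pr) -> float:
    """Eq. (14): mAP is the mean of AP over the N contraband classes."""
    return sum(average_precision(r, p) for r, p in per_class_pr) / len(per_class_pr)

# Toy precision-recall points for two classes (real curves come from ranked detections).
pr_class1 = (np.array([0.0, 0.4, 0.8, 1.0]), np.array([1.0, 0.9, 0.7, 0.5]))
pr_class2 = (np.array([0.0, 0.5, 1.0]), np.array([1.0, 0.8, 0.6]))
print(mean_average_precision([pr_class1, pr_class2]))
```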
4.3 Results
The changes of loss during training are shown in Figure 4. Compared with YOLOv3 equipped with subsets of the optimization strategies and with the other models, X-YOLO has similar convergence ability.

Figure 4: The loss of models. (a) Faster R-CNN; (b) SSD; (c) YOLOv1, YOLOv2 and YOLOv3; (d) comparison of X-YOLO and YOLOv3 with some of the optimization strategies.

The results of X-YOLO contraband detection are shown in Figure 5. X-YOLO can not only identify the classes and locations of contraband, but also handle a single piece of contraband, multiple pieces of the same class, and multiple pieces of different classes in the images.

Figure 5: Detection results

We use mAP to evaluate the performance of the models. The comparison results are shown in Figure 6. The performance of X-YOLO is better than that of the other models: it achieves an mAP of up to 96.02% and a recall of up to 98.55%, which are 1.32% and 2.69% higher than YOLOv3 respectively.

Figure 6: Model performance comparison

The experimental results show that SSD, YOLOv1, YOLOv2 and Tiny-YOLO are not suitable for contraband detection; even the mAP of SSD does not reach 70.0%. Faster R-CNN, YOLOv3, YOLOv3-tiny and YOLOv3-spp achieve reasonable results for contraband detection. The detection performance of YOLOv3 with subsets of the optimization strategies of X-YOLO is improved to varying degrees. In short, X-YOLO is the best of the compared models for contraband detection.

5 CONCLUSION
Motivated by aiding operators in detecting contraband in packages, we developed X-YOLO, a deep learning based toolset with multiple optimization strategies for contraband detection that helps increase detection precision. Multiple optimization strategies are integrated into the X-YOLO framework. Path enhancement is designed to shorten the information path between the lower layers and the uppermost layer. We apply Swish and SGDR to make training more stable. Mixup, a dataset-independent method for data augmentation, is used to increase the amount of training data and improve the generalization of the model without expert knowledge. GIoU is used as the bounding box loss to solve the problem that IoU cannot deal with two non-overlapping objects.

The experimental results show that X-YOLO achieves an mAP of up to 96.02% and a recall of up to 98.55%, surpassing Faster R-CNN, SSD, YOLOv1, YOLOv2, Tiny-YOLO, YOLOv3, YOLOv3-tiny, YOLOv3-spp and YOLOv3 with subsets of the optimization strategies.

Future work will consider the detection of more contraband types, the integration of more optimization strategies, the combination of the advantages and disadvantages of various algorithms, and the application of other models.

6 ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China (Grant No. 61672384).
REFERENCES
[1] A. D. Pitcher, J. J. McCombe, E. A. Eveleigh, and N. K. Nikolova. Compact
transmitter for pulsed-radar detection of on-body concealed weapons. In 2018
IEEE/MTT-S International Microwave Symposium - IMS, pages 919–922, June 2018.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional
networks for visual recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 37(9):1904–1916, Sep. 2015.
[3] R. R. Thirrunavukkarasu, T. Meeradevi, A. Ravi, D. Ganesan, and G. P. Vadivel.
Detection r peak in electrocardiogram signal using daubechies wavelet transform
and shannon’s energy envelope. In 2019 5th International Conference on Advanced
Computing Communication Systems (ICACCS), pages 1044–1048, March 2019.
[4] D. Mery and A. K. Katsaggelos. A logarithmic x-ray imaging model for baggage
inspection: Simulation and object detection. In 2017 IEEE Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW), pages 251–259, July 2017.
[5] Muhammet Baundefinedtan. Multi-view object detection in dual-energy x-ray
images. Mach. Vision Appl., 26(7–8):1045–1060, November 2015.
[6] V. Riffo and D. Mery. Automated detection of threat objects using adapted
implicit shape model. IEEE Transactions on Systems, Man, and Cybernetics: Systems,
46(4):472–482, April 2016.
[7] D. Mery, E. Svec, M. Arias, V. Riffo, J. M. Saavedra, and S. Banerjee. Modern com-
puter vision techniques for x-ray testing in baggage inspection. IEEE Transactions
on Systems, Man, and Cybernetics: Systems, 47(4):682–692, April 2017.
[8] M. E. Kundegorski, S. Akcay, M. Devereux, A. Mouton, and T. P. Breckon. On using
feature descriptors as visual words for object detection within x-ray baggage
security screening. In 7th International Conference on Imaging for Crime Detection
and Prevention (ICDP 2016), pages 1–6, Nov 2016.
[9] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon. Transfer learning
using convolutional neural networks for object classification within x-ray bag-
gage security imagery. In 2016 IEEE International Conference on Image Processing
(ICIP), pages 1057–1061, Sep. 2016.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning
for image recognition. In The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), June 2016.
[11] S. Akcay, M. E. Kundegorski, C. G. Willcocks, and T. P. Breckon. Using deep
convolutional neural network architectures for object classification and detection
within x-ray baggage security imagery. IEEE Transactions on Information Forensics
and Security, 13(9):2203–2215, Sep. 2018.
[12] Dengsheng Zhang. Support Vector Machine, pages 179–205. Springer International
Publishing, Cham, 2019.
[13] Yuanxi Wei and Xiaoping Liu. Dangerous goods detection based on transfer learning in x-ray images. Neural Computing and Applications, Jul 2019.
[14] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
[15] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[16] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, pages 21–37, Cham, 2016. Springer International Publishing.
[17] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. CoRR, abs/1608.03983, 2016.
[18] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions. CoRR, abs/1710.05941, 2017.
[19] Seyed Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian D. Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. CoRR, abs/1902.09630, 2019.
[20] Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. CoRR, abs/1710.09412, 2017.
[21] Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, and Babak Nadjar Araabi. Assisted excitation of activations: A learning technique to improve object detectors. CoRR, abs/1906.05388, 2019.
[22] Alex Graves, Marc G. Bellemare, Jacob Menick, Rémi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1311–1320, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
[23] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. CoRR, abs/1803.01534, 2018.
[24] Adam Marrs, Josef Spjut, Holger Gruen, Rahul Sathe, and Morgan McGuire. Adaptive temporal antialiasing. In Proceedings of the Conference on High-Performance Graphics, HPG '18, New York, NY, USA, 2018. Association for Computing Machinery.
