Abstract
Deep neural networks (DNNs) achieve outstanding performance on many difficult tasks, such as image classification and malware detection. However, recent studies on adversarial examples, which carry maliciously crafted perturbations that are indistinguishable to human eyes yet mislead machine learning models, show that such models are vulnerable to security attacks. Though various adversarial retraining techniques have been developed in the past few years, none of them is scalable. In this paper, we propose a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models. The proposed method retrains the model with both Gaussian noise augmentation and adversarial generation techniques for better generalization. Furthermore, an ensemble model is utilized during the testing phase to increase the robust test accuracy. The results from our extensive experiments demonstrate that the proposed approach increases the robustness of the DNN model against various adversarial attacks, specifically the fast gradient sign method (FGSM) attack, the Carlini and Wagner (C&W) attack, the Projected Gradient Descent (PGD) attack, and the DeepFool attack. To be precise, the robust classifier obtained by our proposed approach maintains a performance accuracy of 99% on average on the standard test set. Moreover, we empirically evaluate the runtime of two of the most effective adversarial attacks, the C&W attack and the basic iterative method (BIM) attack, and find that the C&W attack can utilize the GPU for faster adversarial example generation than the BIM attack can. For this reason, we further develop a parallel implementation of the proposed approach, which makes it scalable for large datasets and complex models.
Keywords: Machine learning, Adversarial examples, Deep learning (DL)
© The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit
to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The
images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Lin et al. EURASIP Journal on Information Security (2022) 2022:1 Page 2 of 15
Fig. 1 Two images, an arbitrary image of “5” from [56] (left) and the corresponding adversarial example generated by the C&W attack (right), are indistinguishable to a human. However, the machine learning-based image recognition model (see Section 6.1 for the neural network architectures) labels the left image as “5” and the right image as “3”
image on the left hand side. While the left image is classified as “5,” the right image is classified as “3” by a state-of-the-art digit recognition classifier whose performance accuracy is 99% on the MNIST test set. This result demonstrates the vulnerability of machine learning models.

Various proactive and reactive defense methods against adversarial examples have been proposed over the years [19]. Examples of proactive defenses include adversarial retraining [14, 20], defensive distillation [21], and classifier robustifying [22, 23]. Examples of reactive defenses are adversarial detection [24–29], input reconstruction [30], and network verification [31, 32]. However, all these defenses were later shown to be either ineffective against stronger adversarial attacks, such as the C&W attack, or inapplicable to large networks. For instance, Reluplex [31] is only feasible for a network with a few hundred neurons. In this work, we propose a distributive retraining approach against adversarial examples that preserves the performance accuracy even under strong attacks such as the C&W attack.

Contrary to some existing studies that attempt to detect adversarial examples at the cost of reducing the classifier’s accuracy, we aim to maintain the classifier’s prediction accuracy by increasing model capacity and generating additional training data. Moreover, the proposed approach can be applied to any classifier since it is classifier-independent. In particular, the proposed approach is useful for safety- and security-critical systems, such as machine learning classifiers for malware detection [9] and autonomous vehicles [11].

Our proposed iterative approach can be considered an adversarial (re)training technique. First, we train the model with normal images and normal images with random Gaussian noise added (the latter are simply called noise images in this paper). Next, we generate adversarial images using attack techniques such as PGD, DeepFool, and C&W attacks. Then, we retrain the model based on the adversarial images generated in the previous step, normal images, and noise images. This step can be considered a combination of adversarial (re)training [13] and Gaussian data augmentation (GDA) [33]; however, we use soft labels instead of hard labels. This retraining process is repeated until the acceptable robustness or the maximum iteration number is reached. The resulting classifier has n classes. The first n − 1 classes are normal image classes, and the last one is the adversary class.

After the model is fully trained, we can test it. At testing time, a small random Gaussian noise is added to a given test image before it is inputted into the model. Since our model is trained with Gaussian noise, this does not affect the classification of normal images. However, if the test image is adversarial, it is likely to have been generated from an optimization algorithm that introduces a well-designed minimal perturbation to a normal image. We can improve the likelihood of distinguishing the normal image from the adversarial image by disturbing this well-calculated perturbation with noise. Additionally, we propose an ensemble model in which a given test image without random noise is also input to the model, and its output is compared with the output of the test image with random noise added. If the two outputs are the same, that is the final output of the model. If the two outputs are different, the given test image is marked as adversarial to alert the system for further examination.

The key contributions of this paper are the following:

1 We propose an iterative approach to training a model to either label the adversarial sample with the label of the original/natural sample or at least label it as the adversarial sample and alert the system for further examination. Compared to other studies, we treat the perturbation differently depending on its size. Our goal is to train a robust model that can
classify the instance correctly when the perturbation η < η0, and label it either correctly or as an adversarial example when the perturbation η ≥ η0, that is, when η is larger than some application-specific perturbation limit η0. This makes for better generalization, as DNN models learn to focus on the important features and ignore the unimportant ones.

2 The iterative process is highly parallelizable. Since the iterative model at time t is not much different from the iterative model at time t − 1, the older model should work as well as the current model for generating adversarial examples, due to the transferability of adversarial examples. Some GPU nodes can be used to generate adversarial examples based on the previous iterative model instead of waiting for a new iterative model. Therefore, the adversarial examples for the next iteration t can be generated before the new iterative model is produced.

3 The trained model based on our proposed methodology not only is robust against adversarial examples but also maintains the accuracy of the original classifier. Some existing methods strengthen the model with respect to some attacks but reduce the accuracy of the classifier at the same time. Our model, through multiple data generating methods and larger network capacity, refines the data representation at each iteration; therefore, we can maintain robustness and accuracy at the same time.

4 We propose an ensemble model that outputs the final label based on the labels provided by two tests, one with a small random noise added and one without. The small random noise is aimed at distorting the optimal perturbation injected by the adversary. We have trained our model against random noise at training time; hence, the small random noise does not affect the classification accuracy of normal images. However, it may disturb the optimal adversarial example generated by an adversary. If the original image and the image with added test noise produce different outputs from the model, the input image is likely to be adversarial.

The remainder of this paper is organized as follows. We introduce the threat model in Section 2. In Section 3, we present related work, and in Section 4, we provide the necessary background information. In Section 5, we present the proposed approach, followed by an evaluation in Section 6. In Section 8, we give conclusions and future work.

2 Threat model
Before discussing the related work, we define the threat model formally. In this work, we consider evasion attacks in both white-box settings (the adversary has full knowledge of the trained model F) and gray-box settings (the adversary has no knowledge of the trained model F but has knowledge of the training set used).

The ultimate goal of an adversary is to generate adversarial examples x′ that mislead the trained model. That is, for each input (x, y), the adversary’s goal is to solve the following optimization problem:

min_{x′} ‖x − x′‖₂²  s.t. x′ ∈ [0, 1]ⁿ, F(x′) = y′, (1)

where y′ ≠ y are class labels and ε is the maximum allowable perturbation that is undetectable by human eyes.

3 Related work
Over the past few years, various adversarial (re)training methods have been developed to mitigate adversarial attacks [14, 20]. Nevertheless, Tramèr et al. [34] showed that a two-step attack can easily bypass a classifier trained using adversarial examples by first adding a small noise to the input and then performing any classical attack technique such as FGSM or DeepFool. Instead of injecting adversarial examples into the training set, Zantedeschi et al. [35] suggested adding small noises to the normal images to generate additional training examples. They compared their method with other defense methods, such as adversarial training and label smoothing, and showed that their approach is robust and does not compromise the performance of the original classifier. The last type of proactive countermeasure is to robustify a classifier, that is, to build more robust neural networks [22, 23]. For instance, in [22], Bradshaw combined DNNs with Gaussian processes (GP) to make scalable GPDNNs and showed that they are less susceptible to the FGSM attack.

Furthermore, Yuan et al. [19] introduced three reactive countermeasures for adversarial examples: adversarial detection [24–29], input reconstruction [30, 36], and network verification [31, 32]. Adversarial detection comprises many techniques for detecting adversarial examples. For instance, Feinman et al. [24] assumed that the distribution of adversarial samples is different from the distribution of natural samples and proposed a detection method based on kernel density estimates in the subspace of the last hidden layer and Bayesian neural network uncertainty estimates with dropout randomization. However, Carlini and Wagner [37] showed that kernel density estimation, which is the most effective defense technique among the ten defenses they considered on MNIST, is completely ineffective on CIFAR-10. Grosse et al. [25] used Fisher’s permutation test with Maximum Mean Discrepancy (MMD) to check whether a sample is adversarial or natural. Though MMD is a powerful statistical test, it
fails to detect the C&W attack [37]. This indicates that there may not be a significant difference between the distribution of an adversarial sample and the distribution of a natural sample when the adversarial perturbation is subtle. Xu et al. [29] used feature squeezing techniques to detect adversarial examples, and the proposed technique can be combined with other defenses, such as defensive distillation, for defense against adversarial examples.

Input reconstruction is another category of reactive defense, where a model is used to find the distribution of natural samples and an input is projected onto the data manifold. In [30], a denoising contractive autoencoder network (CAE) is trained to transform an adversarial example into the corresponding natural one by removing the added perturbation. Song et al. [36] used PixelCNN, which provides the discrete probability of raw pixel values in an image, to calculate the probabilities of all training images. At test time, a test instance is inputted and its probability is computed and ranked. Then, a permutation test is used to detect an adversarial perturbation. In addition, they proposed PixelDefend to purify an adversarial example by solving the following optimization problem:

max_{x′} P(x′) subject to ‖x′ − x‖∞ ≤ ε. (2)

Last but not least, network verification formally proves whether a given property of a DNN model is violated. For instance, Reluplex [31] used a satisfiability modulo theories solver to verify whether there exists an adversarial example within a specified distance of some input for a DNN model with the ReLU activation function. Later, Carlini et al. [38] showed that Reluplex can also support max-pooling by encoding max operators as follows:

max(x, y) = ReLU(x − y) + y. (3)

Initially, Reluplex could only handle the L∞ norm as a distance metric. Using Eq. (3), Carlini et al. [38] encoded the absolute value of a sample x as

|x| = max(x, −x) = ReLU(x − (−x)) + (−x) (4)
= ReLU(2x) − x. (5)

In this way, Reluplex can handle the L1 norm as a distance metric as well. However, Reluplex is computationally infeasible for large networks. More specifically, it can only handle networks with a few hundred neurons [38]. Using Reluplex and k-means clustering, Gopinath et al. [32] proposed DeepSafe to provide safe regions of a DNN. On the other hand, Reluplex can be used by an attacker as well. For instance, Carlini et al. [38] proposed a Ground-Truth Attack that uses the C&W attack as an initial step for a binary search to find an adversarial example with the smallest perturbation by invoking Reluplex iteratively. In addition, Carlini et al. [38] also proposed a defense evaluation using Reluplex to find a provably minimally distorted example. Since it is based on Reluplex, it is computationally expensive and only works on small networks. However, Yuan et al. [19] concluded that almost all defenses have limited effectiveness and are vulnerable to unseen attacks. For instance, the C&W attack is effective against most existing adversarial detection methods, though it is computationally expensive [37]. In [37], the authors showed that ten proposed detection methods cannot withstand white-box attacks and/or black-box attacks constructed by minimizing defense-specific loss functions.

4 Background
This section provides a brief introduction to neural networks and adversarial machine learning.

4.1 Neural networks
Machine learning automates the task of writing rules for a computer to follow. That is, given an input and a desired outcome, machine learning can find the needed set of rules automatically. Deep learning is a branch of machine learning that automates the feature selection process. That is, one does not even need to specify the features; a neural network can extract them from raw input data. For instance, in image classification, an image is inputted into the neural network, and the convolutional layers of the neural network extract the important features from the image directly. This makes deep learning desirable for many complex tasks, such as natural language processing and image classification, where software engineers have difficulty writing rules for a computer to learn such tasks.

The performance of a neural network is remarkable in domains such as image classification [1–3], natural language processing [4–6], and machine translation [39]. By the Universal Approximation Theorem [40], any continuous function in a compact space can be approximated to any desirable accuracy by a feed-forward neural network with at least one hidden layer and a suitable activation function [41]. This theorem explains the wide application of deep neural networks. However, it does not give any constructive guideline on how to find such a universal approximator. In 2017, Lu et al. [42] established the Universal Approximation Theorem for width-bounded ReLU networks, which showed that a fully connected width-(n + 4) ReLU network, where n is the input dimension, is a universal approximator. These two Universal Approximation Theorems explain the ability of deep neural networks to learn.

A feed-forward neural network can be written as a function F : X → Y, where X is its input space or sample space and Y is its output space. If the task is classification, Y is a set of discrete classes. If the task is regression, Y is a subset of Rⁿ. For each sample x ∈ X, F(x) = f_L(f_{L−1}(. . . (f_1(x)))), where f_i(x) = A(w_i x + b), A(·) is an activation function, such as the non-linear Rectified Linear Unit (ReLU), sigmoid, softmax, identity, and
hyperbolic tangent (tanh); w_i is a matrix of weight parameters; b is the bias unit; i = 1, 2, . . . , L; and L is the total number of layers in the neural network. For classification, the activation function for the last layer is usually a softmax function. The key to the state-of-the-art performance of neural networks is the optimized weight parameters that minimize a loss function J, a measure of the difference between the predicted output and its true label. A common method used to find the optimized weights is the back-propagation algorithm; see, for example, [43] and [44] for an overview.

In this paper, we consider DNN models for image classification. A gray-scale image x has h × w pixels, i.e., x ∈ R^{hw}, where each component x_i is normalized so that x_i ∈ [0, 1]. Similarly, for a colored image x with RGB channels, x ∈ R^{3hw}. In the following subsection, we consider the attack approaches for generating an adversarial image x′ from the natural or original image x, where x can be either gray-scale or colored.

4.2 Attack approaches
An adversarial example/sample x′ is a generated sample that is close to a natural sample x to human eyes but is misclassified by a DNN model [13]. That is, the modification is so subtle that a human observer does not even notice it, but a DNN model misclassifies it. Formally, an adversarial sample x′ satisfies the following conditions:

• ‖x − x′‖₂² < ε, where ε is the maximum allowable perturbation that is not noticeable by human eyes.
• x′ ∈ [0, 1].
• F(x′) = y′, F(x) = y, and y ≠ y′, where y and y′ are class labels.

In general, there are two types of adversarial attacks on DNN models: untargeted and targeted. In an untargeted attack, the attacker does not have a specific target label in mind when trying to fool a DNN model. In contrast, in a targeted attack, the attacker tries to mislead a DNN model into classifying an adversarial sample x′ as a specific target label t. In 2013, Szegedy et al. [13] first generated such an adversarial example using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) attack. Given a natural sample x and a target label t ≠ F(x), they used the L-BFGS method to find an adversarial sample x′ that satisfies the following box-constrained optimization problem:

min c‖x − x′‖₂² + J(x′, t), subject to x′ ∈ [0, 1]^m and F(x′) = t, (6)

where c is a constant that can be found by line search, and m = hw if the image is gray-scale and m = 3hw if the image is colored. Because of this computationally intensive search for the optimal c, the L-BFGS attack is time-consuming and impractical. Over the course of the next few years, many other gradient-based attacks were proposed, for example:

• Fast Gradient Sign Method (FGSM)
FGSM is a one-step algorithm that generates perturbations in the direction of the loss gradient, i.e., the amount of perturbation is

η = ε · sgn(∇_x J(x, l)) (7)

and

x′ = x + ε · sgn(∇_x J(x, l)), (8)

where ε is an application-specific imperceptible perturbation the adversary intends to inject, and sgn(∇_x J(x, l)) is the sign of the loss gradient [14]. The Basic Iterative Method (BIM) is an iterative application of FGSM in which a finer perturbation is obtained at each iteration. This method was introduced by Kurakin et al. [45] for generating adversarial examples in the physical world. Projected Gradient Descent (PGD) is a popular variant of BIM that initializes through uniform random noise [46]. The Iterative Least-likely Class Method (ILLC) [45] is similar to BIM but uses the least likely class as the targeted class to maximize the cross-entropy loss; hence, it is a targeted attack algorithm.

• Jacobian-based Saliency Map Attack (JSMA)
JSMA uses the Jacobian matrix of a given image to find the input features of the image that make the most significant changes to the output [47]. Then, the value of such a pixel is modified so that the likelihood of the target classification is higher. The process is repeated until the limit on the number of modifiable pixels is reached or the algorithm succeeds in generating an adversarial example that is misclassified as the target class. Therefore, this is another targeted attack, but it has a high computation cost.

• DeepFool
DeepFool searches for the shortest distance to cross the decision boundary using an iterative linear approximation of the classifier and an orthogonal projection of the sample point onto it [15], as shown in Fig. 2. This untargeted attack generates adversarial examples with a smaller perturbation compared with L-BFGS, FGSM, and JSMA [17, 48]; for details, see [15]. The Universal Adversarial Perturbation is an updated version of DeepFool that is transferable [49]. It uses the DeepFool method to generate the minimal perturbation for each image and finds the universal perturbation that satisfies the following two constraints:

‖η‖_p ≤ ε, (9)
P(F(x) ≠ F(x + η)) > 1 − δ, (10)
where η is the amount of L2 perturbation that an adversary adds to a normal image x, and δ is an extremely small constant allowing the division to be defined, say δ = 10⁻¹⁰. We use this definition of robustness in this paper since it captures the intuition that requiring a larger perturbation to fool the classifier indicates a more robust classifier.

3 Combine the regular training set with the adversarial images and noisy images generated in steps 1 and 2, respectively, to train the model. Instead of hard labels for these images, soft labels are used. For instance, hand-written “7” and “1” are similar in that both have a long vertical line. Hence, rather than labeling a hand-written digit as 100% “7” or 100% “1”, we may say it is 80% “7” and 20% “1”. This reflects the structural similarity between “7” and “1”. More precisely, the soft labels for an adversarial image, a random image, and a clean image are defined as follows, respectively. Let τ be a hyperparameter that measures the acceptable size of perturbation. Let us first define the

Table 1 Training parameters
Parameter       MNIST
αNormalNoise    0.44
αBIM            0.44
αCW             0.54
αDF             0.54
τNormalNoise    0.1
τBIM            0.1
τCW             0.0265
τDF             0.0265
β               α/2 + (1 − α)/11

The implications of the proposed approach are discussed in Section 7, where we show that our proposed approach is robust when a small noise is injected into the test instance before the classification is performed, and that it is also robust against both white-box and black-box attacks. However, we have not studied whether or not our proposed approach is robust against adaptive attacks; a further study is needed in the future.
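As a concrete illustration of the soft-label scheme in step 3, the sketch below builds an 11-class soft label (ten digit classes plus one adversary class) from a smoothing weight α. The mixing rule, which reproduces the β = α/2 + (1 − α)/11 entry as we read it from the garbled Table 1, and the function name are our illustrative assumptions, not the authors' code.

```python
def soft_label(true_class, alpha, n_classes=11, adversarial=False):
    """Build a soft label over 10 digit classes plus 1 adversary class.

    A mass of (1 - alpha) is spread uniformly over all n_classes to
    smooth the label. The remaining mass alpha goes to the true class
    for a clean image, or is split evenly between the true class and
    the adversary class (index n_classes - 1) for an adversarial image,
    so each of those entries becomes alpha/2 + (1 - alpha)/11.
    NOTE: this mixing rule is an assumption read from Table 1.
    """
    label = [(1.0 - alpha) / n_classes] * n_classes
    if adversarial:
        label[true_class] += alpha / 2.0
        label[n_classes - 1] += alpha / 2.0
    else:
        label[true_class] += alpha
    return label

clean = soft_label(7, alpha=0.44)                    # clean "7"
adv = soft_label(7, alpha=0.44, adversarial=True)    # adversarial "7"
```

With α = 0.44 the adversarial label puts equal weight on the true class and the adversary class, while the clean label concentrates its mass on the true class.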
Table 3 Classification accuracy on the MNIST test dataset under the white-box attack

Attack                       FGSM    C&W    BIM    DeepFool
Original                     29%     7%     7%     29%
Adversarial training (ART)   98%     49%    98%    42%
Robust classifier            100%    98%    99%    99%

Table 5 Classification accuracy on the CIFAR-10 test dataset under white-box attacks

Attack                       FGSM    C&W    BIM    DeepFool
Original                     16%     46%    8%     13%
Robust classifier            47%     78%    80%    36%
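For reference, the FGSM perturbation of Eq. (8) in Section 4.2, which drives the first column of these tables, can be sketched on a toy differentiable model. The logistic model and its closed-form gradient below are stand-ins for a real DNN, not the paper's implementation.

```python
import math

def fgsm(x, y, w, b, eps):
    """One-step FGSM (Eq. (8)): x' = x + eps * sign(dJ/dx).

    Toy logistic model p = sigmoid(w.x + b) with cross-entropy loss J;
    for this model dJ/dx_i = (p - y) * w_i, so the sign of the loss
    gradient is available in closed form.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))          # predicted probability
    grad = [(p - y) * wi for wi in w]       # dJ/dx for cross-entropy loss
    sign = lambda g: (g > 0) - (g < 0)
    # Step in the gradient-sign direction, then clip back to [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

x = [0.2, 0.8]
x_adv = fgsm(x, y=1, w=[1.5, -2.0], b=0.1, eps=0.1)   # -> [0.1, 0.9]
```

Each feature moves by exactly ε in the direction that increases the loss, which is why the perturbation is bounded in the L∞ norm.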
Fig. 3 Training and validation loss. The training loss decreases with every epoch, whereas the validation loss does not decrease much after the third
epoch. Hence, the training process can be stopped after three epochs to prevent overfitting
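The stopping rule described in the Fig. 3 caption, halting once the validation loss stops improving, can be sketched as follows. The loss values and the improvement threshold are illustrative, not the experiment's numbers.

```python
def should_stop(val_losses, min_delta=1e-3):
    """Stop when validation loss no longer improves by at least min_delta.

    val_losses: validation loss after each completed epoch, oldest first.
    Returns True once the most recent epoch failed to improve on the best
    loss seen so far, mirroring the plateau rule in the Fig. 3 caption
    (an illustrative sketch, not the authors' training code).
    """
    if len(val_losses) < 2:
        return False
    best_before = min(val_losses[:-1])
    return val_losses[-1] > best_before - min_delta

history = [0.90, 0.45, 0.30, 0.29, 0.295]  # loss plateaus after epoch 3
stops = [should_stop(history[:i + 1]) for i in range(len(history))]
# stops -> [False, False, False, False, True]
```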
step) by assigning a CPU (or GPU) to each adversarial crafting technique. Similarly, data parallelization can be performed during training.

The framework for parallelization is shown in Fig. 4. Initially, the original model and a random sample of the original images are used to generate fake images. The original model is needed because we want to generate adversarial images by adding an application-specific imperceptible perturbation to the original image that makes it misclassified by the original model. There are various ways to obtain such an application-specific imperceptible perturbation. For instance, the FGSM attack generates such a perturbation by using the model’s loss gradient, as described in Section 4. After the fake images are generated, they are saved to the fake-image storage folder. During the retraining step, a random sample of fake images and original images, as well as copies of the original model, are sent to the multiple GPUs for retraining. After the retraining, the updated model is checked to see whether it is robust against adversarial attacks. If so, the updated model is saved, and the process ends. Otherwise, the updated model is saved as the new model for the next stage of fake image generation and retraining.

To demonstrate the importance of selecting the right resource for different adversarial example crafting techniques, we conduct an experiment on a standalone personal computer (12 CPU threads). First, we independently run each attack on the same computer and measure the run time for generating a batch of 64, 128, 256, and 512 adversarial examples without GPU resources. To reduce the variation, we repeat the same experiment 30 times for each attack. Then, we install a K40C, a Titan V, and both the K40C and Titan V, respectively, and repeat the experiments. The results are shown in Figs. 5 and 6. The BIM attack runs faster on the CPU than on the GPUs. In fact, when an attacker tries to utilize the GPUs, the adversarial image generation speed goes down. As shown in Fig. 5, though we utilize different types of GPU, the adversarial image generation speed does not change among the GPU types. On the contrary, the C&W attack takes advantage of the GPU and runs faster on the GPU than on the CPU. Furthermore, we see that the C&W attack runs faster on the Titan V than on the K40C. Table 6 is obtained by averaging over batch sizes. As shown, the BIM attack runs twice as fast on the CPU as on the GPU, whereas the C&W attack runs faster on the GPU. The variation among the runs is smaller under BIM attacks on average, as shown by the smaller standard deviations. Moreover, we create a row called Titan V + K40C (average), which takes the average speed of the Titan V and the K40C. Comparing the values in this row with the value in the last row, we see that the actual speed with two GPUs is not equal to the average of the speeds if the GPU is utilized (as in the C&W attack).

7.3 Ensembling
At test time, a small random noise is injected into the test instance before performing the classification. This small noise is aimed at distorting the intentional perturbation injected by the adversary. An experiment is performed to illustrate this as follows. First, 100 adversarial images are generated using two popular adversarial attack algorithms, BIM and FGSM. Then, using NumPy’s
Fig. 4 Parallelization framework. We decouple the fake image generation step from the retraining step by only updating the model used for fake image generation when a new model is obtained. This way, the fake image generation step can continuously generate adversarial images while the retraining continues. Furthermore, the fake image generation step is parallelized using multi-processing. Note that the processors can be GPUs or CPUs; GPUs are better in terms of computational efficiency, and CPUs can store more images
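The decoupling shown in Fig. 4 can be sketched as a producer/consumer pair: a generator keeps crafting perturbed images from whatever model snapshot is current, while the trainer consumes them and publishes updated snapshots, so generation never blocks on retraining. This is a thread-based toy; the paper's system uses multiple GPU/CPU processes, and the model, attack, and queue here are stand-ins.

```python
import queue
import threading

fake_images = queue.Queue()   # stands in for the fake-image storage folder
model_version = {"v": 0}      # latest model snapshot shared with the generator

def craft(image, version):
    """Stand-in attack: perturb an image using model snapshot `version`."""
    return [pixel + 0.01 for pixel in image], version

def generator(images, rounds):
    # Keeps producing adversarial images from the current snapshot instead
    # of waiting for the next retrained model (relies on transferability).
    for _ in range(rounds):
        for img in images:
            fake_images.put(craft(img, model_version["v"]))

def trainer(batches):
    consumed = []
    for _ in range(batches):
        consumed.append(fake_images.get())   # retrain on a batch of fakes
        model_version["v"] += 1              # publish the updated snapshot
    return consumed

images = [[0.1, 0.2], [0.3, 0.4]]
t = threading.Thread(target=generator, args=(images, 3))
t.start()
consumed = trainer(batches=6)
t.join()
```

Because the generator only reads the published snapshot, fake images for iteration t can be crafted against the model from iteration t − 1, matching the argument in contribution 2.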
[58] normal number generator with a mean of 0.01 and a variance of 0.01, a small random normal noise is generated for each adversarial image. This small random normal noise is injected into the adversarial image. Next, the original classifier classifies these noisy adversarial images. If the predicted label is different from the ground truth, then the attack is successful (we consider an untargeted attack); otherwise, it is unsuccessful. This process is repeated 30 times. The average result is shown in Fig. 7. Even without robust adversarial training, the classifier (original) can reduce the attack success rate by 14% for the FGSM attack and 20% for the BIM attack when the perturbation injected by the attacker is 0.01. Even when the perturbation increases to 0.05, the injected random noise can reduce the attack success rate of FGSM and BIM by 6% and 14%, respectively.

Since our proposed model is trained with random noise, it is robust against it. The small noise added to a natural image is not likely to change the label of the image, since we have trained our model with small Gaussian noise. However, if the test instance is adversarial, then it was likely produced by an optimization algorithm that searches for a minimal perturbation to a clean image, and the random noise injected into such an image will disturb this well-calculated minimal perturbation. Therefore, if the label of a given test image without added random noise is different from that with added random noise, it is likely an adversarial image, and the system is alerted.

7.4 Limitations
Section 6 considered both white-box attacks (the attacker knows the model architecture) and black-box attacks (the attacker does not know the model architecture). The proposed model is robust against both types of attacks, as shown in Section 5. However, those evaluations
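The test-time ensemble described above can be sketched as follows: the same image is classified with and without a small Gaussian noise, and a disagreement flags the input as likely adversarial. The classifier below is a stand-in; the noise parameters (mean 0.01, variance 0.01, i.e., standard deviation 0.1) follow the experiment in Section 7.3, and the function name is illustrative.

```python
import random

def ensemble_predict(classify, image, mean=0.01, std=0.1, seed=0):
    """Classify an image twice: as-is and with small Gaussian noise added.

    If both predictions agree, return (label, False); otherwise flag the
    input as likely adversarial with (clean_label, True) so the system
    can be alerted for further examination.
    """
    rng = random.Random(seed)
    # Add noise per pixel and clip back to the valid [0, 1] range.
    noisy = [min(1.0, max(0.0, p + rng.gauss(mean, std))) for p in image]
    clean_label = classify(image)
    noisy_label = classify(noisy)
    return clean_label, clean_label != noisy_label

# Stand-in classifier: thresholds the mean pixel value.
classify = lambda img: int(sum(img) / len(img) > 0.5)

label, flagged = ensemble_predict(classify, [0.9, 0.9, 0.9])
```

For a confidently classified natural image, the small noise leaves the label unchanged and `flagged` stays `False`; an adversarial input sitting on a carefully tuned perturbation is far more likely to flip under the same noise.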
Fig. 5 Adversarial image generation speed under the BIM attack. Note that the CPU has a much faster adversarial image generation speed than the K40C, the Titan V, and both K40C and Titan V
Fig. 6 Adversarial image generation speed under the C&W attack. Note that the CPU has a much slower adversarial image generation speed than the K40C, the Titan V, and both K40C and Titan V in this attack. That is, the C&W attack shows the opposite result to the BIM attack
Fig. 7 Reduction in the attack success rate (%) as a function of perturbation injected by the FGSM and BIM attacks, respectively