A Comparative Study of Adversarial Training Method - 2023 - Future Generation Co

Future Generation Computer Systems 142 (2023) 165–181
Contents lists available at ScienceDirect
Future Generation Computer Systems

journal homepage: www.elsevier.com/locate/fgcs
A comparative study of adversarial training methods for neural models

of source code
∗
Zhen Li a,b , Xiang Huang a , , Yangrui Li b , Guenevere Chen c
a
School of Cyber Security and Computer, Hebei University, Baoding, 071002, Hebei Province, China
b
School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074, Hubei Province, China
c
Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, 78249, TX, USA
article info a b s t r a c t
Article history: Adversarial training has been employed by researchers to protect AI models of source code. However, it
Received 15 August 2022 is still unknown how adversarial training methods in this field compare to each other in effectiveness
Received in revised form 30 November 2022 and robustness. This study surveys and investigates existing adversarial training methods, and conducts
Accepted 22 December 2022
experiments to evaluate these neural models’ performance in the domain of source code. First, we
Available online 26 December 2022
examine the process of adversarial training to identify four dimensions that could be used to classify
Keywords: different adversarial training methods into five categories, which are Mixing Directly, Composite Loss,
Adversarial training Adversarial Fine-tuning, Min–max + Composite Loss, and Min–max. Second, we conduct empirical
Robustness evaluations of these classified adversarial training methods under two tasks (i.e., code summarization
Source code and code authorship attribution) to determine their performance of effectiveness and robustness.
Comparative study
Experimental results indicate that the performance of certain combinations of adversarial training
techniques (i.e., min–max with composite loss, or directly-sample with ordinary loss) would be much
better than other combinations or other techniques used alone. Our experiments also reveal that the
model’s robustness of defensive methods can be enhanced by using diverse input data for adversarial
training, and that the number of fine-tuning epochs has little or no impact on model’s performance.
© 2022 Elsevier B.V. All rights reserved.
1. Introduction and the defensive target: data anomaly detection, data augmenta-
tion, model enhancement, and adversarial training. Among them,
Deep Learning (DL) approaches have been increasingly applied adversarial training is the most widely used defensive method,
to tackle a number of tasks in the source code domain, such as which inspires us to conduct this study to survey and investigate
code summarization [1–3], code functionality classification [4,5], the existing approaches in this field.
variable type inference [6,7], code authorship attribution [8–10], To the best of our knowledge, a systematic and independent
code auto-completion [11,12], and vulnerability detection [13]. evaluation of adversarial training methods that are designed for
Although DL models have outstanding performance, they are neural models of source code is still lacking in the literature. For
vulnerable to adversarial attacks. For example, applying small and instance, existing studies [14–21] focus mainly on proposing a
imperceptible perturbations to the input data of DL models can novel adversarial training method. These studies normally eval-
result in producing incorrect predictions. Such perturbed inputs uate the newly proposed method by comparing it with existing
that mislead the model’s output are termed adversarial examples. methods; however, they only selected one or two other categories
of adversarial training methods for comparison. In contrast, our
This type of vulnerability has been discovered in DL models across
study compares five different categories of methods collected
numerous fields including computer vision and natural language
by surveying the existing defenses. We also observe that many
processing. Similar to all the other DL application domains, DL
evaluations in the existing works such as [15–19] dealt with one
models of code are also susceptible to adversarial attacks.
specific task. They did not evaluate the proposed approach’s per-
To address adversarial attacks on neural models of source
formance across multiple tasks, which makes their assessments
code, researchers have proposed defensive methods to improve
less generic and their conclusions could be task-dependent. In
the robustness of models. These methods can be divided into four
contrast, this paper evaluates adversarial training approaches’
types based on the time when the defensive method is deployed
performance under two tasks, namely, code summarization and
code authorship attribution. By conducting evaluations across
∗ Corresponding author. tasks, this paper achieves insights that are more universally ap-
E-mail address: huang_xiang@stumail.hbu.edu.cn (X. Huang). plicable and task-independent. For example, we find that the
https://doi.org/10.1016/j.future.2022.12.030
0167-739X/© 2022 Elsevier B.V. All rights reserved.
Z. Li, X. Huang, Y. Li et al. Future Generation Computer Systems 142 (2023) 165–181
performance of pre-trained models does not improve as the num- hand, KD+BU method is time efficient but has a lower detection
ber of fine-tuning epochs increases, which is applicable to both rate than ML-LOO. Note that this study aims at evaluating the
tasks we study. detection of attacks (or reactive defenses) rather than adversarial
training (proactive defenses), which distinguishes the study from
Contributions. We conduct the first independent comparative
our work.
study to evaluate the performance of existing adversarial training
Existing studies, however, have not evaluated adversarial
methods that have been applied in the domain of source code for
training methods in the source code domain. Also, the methods
enhancing the robustness of DL models. The main contributions
that are effective in the image domain cannot be directly applied
of this paper are as follows.
to enhance source code DL model performance. This is primarily
First, we survey and derive a categorization of existing adver-
because the source code domain faces its unique challenges,
sarial training methods for neural models of source code based
such as the discreteness of the code space and high cost of on-
on four dimensions of input data, input model, learning sample
line transformations. We discuss these challenges in detail in
obtaining strategy, and loss computation. We define these four
dimensions because we have examined the processes of vari- Section 5.
ous adversarial training methods and observed that these four The second type of studies concerns how robust different
respects suffice to uniquely identify adversarial training methods. models are in themselves. Rabin et al. [25] carried out a compara-
Second, we independently and quantitatively evaluate the per- tive study on the robustness of several code models, i.e., code2vec,
formance of each category of existing adversarial training meth- code2seq, and GGNN. Applis et al. [26] proposed Lampion, a
ods for neural models of source code by carrying out experiments testing framework that can evaluate the robustness of source-
under two different tasks. code-based models, and used this framework to evaluate the
Third, our experimental results indicate that the DL models CodeBERT model. Our study is different because we focus on how
trained using Optimization-Objective-based (OO-based) methods much robustness different adversarial training defenses give to
achieve higher robustness when adversarial examples are regen- the models, rather than how robust they inherently are.
erated more frequently. In the training process, OO-based adver-
sarial training methods combine with composite loss better than 2.2. Adversarial training
ordinary loss to achieve higher robustness, but this is not the case
with the other type of training, namely Data-Augmentation-like Extensive research has been carried out on adversarial training
methods. in the Computer Vision (CV) and Natural Language Processing
The rest of this paper is organized as follows. Section 2 in- (NLP) fields. Based on how the training works, we divide methods
troduces related work. Section 3 elaborates on our methodology, in these fields into three major categories: Data-Augmentation-
including the research questions, experiment steps, and how to like (DA-like), Optimization-Objective-based (OO-based), and Vir-
categorize existing defenses. Section 4 presents our experimental tual Adversarial Training (VAT).
results and summarizes insights. Section 5 discusses the chal- In CV, OO-based methods are the earliest and the most popu-
lenges and prospects in this field, and limitations of this study. lar. In OO-based methods, adversarial examples are generated or
Section 6 concludes the paper. chosen during adversarial training based on the current state of
model. In generating or choosing the examples, it is expected that
2. Related work the objective function be maximized or minimized for individual
samples in order to best fool the model. In their pioneering
2.1. Comparative study of model robustness works, Szegedy et al. [27] and Goodfellow et al. [28] proposed
L-BFGS and FGSM respectively as the inner maximizers (gener-
Comparative studies of model robustness can be grouped into ators). Because transforming images is usually computationally
two types. The first type of studies focuses on how or why efficient, OO-based methods are widely adopted in the works
defensive methods perform differently in enhancing model’s ro- to follow [29–34]. Meanwhile, in DA-like methods, adversarial
bustness. Our work is similar to this type of studies, which is examples to be learned are obtained by attacking a pre-trained
currently lacking in the source code domain. For example, Li model (i.e., a surrogate). These pre-generated examples are then
et al. [22] carried out a comparative study on the adversarial combined with the original training set in some way to form a
training methods used in long-tailed classification in the im- new set which will be used for training the final robust model.
age classification domain. They categorized existing adversarial DA-like methods such as [35,36] are much less common in the
training method using three levels of components: information, image domain. Some researchers have also proposed GAN-based
methodology, and optimization. They observed that statistical adversarial training [37,38], which we view as a subtype of OO-
perturbations combine well with hybrid optimization to produce based methods. In this type of training, a generator tries to
robust models. Additionally, the gradient-based method usually minimize its objective to generate adversarial data that is closer
improves the performance of both the head and tail classes. to real data, while a discriminator tries to maximize its objective
Similarly, Ma et al. [23] comparatively studied the adversarial to more accurately identify adversarial data. In addition to su-
training methods solving the cold start problem in recommender pervised training, Miyato et al. [39] proposed Virtual Adversarial
systems. They defined three levels identical to Li et al. [22] to Training, which is a semi-supervised method that can guide the
classify adversarial training. They found that data perturbations training without use of label information. There are many works
and gradient-based perturbations lead to more competitive per- following this line of research.
formance than feature perturbations and statistical perturbations. In NLP, the landscape is different in that DA-like methods [40–
Moreover, hybrid optimization has a better performance than 44] gain more popularity than in CV. This is mainly because
adversarial optimization. Zhang et al. [24] empirically compared transformations cannot be done on text as efficiently as on im-
five detection methods, namely SPBAS, ML-LOO, KD+BU, LID, and ages, as the latter is continuous, while the former is discrete. An
MAHA, for protecting DL models from adversarial attacks in the image can be perturbed by altering the numerical value of its
image domain. Overall, the ML-LOO method, which detects ad- pixels, and this can be done almost arbitrarily, but a text cannot
versarial examples using feature attribution, is the best among be perturbed simply by altering arbitrary letters, which can result
the five. However, we notice that ML-LOO takes a long time to in invalid words. However, the later discovery that perturbations
extract features, making this method less efficient. On the other can be applied not on text itself, but on its embeddings, has led
166
Fig. 1. Workflow of this paper, where important dimensions that differentiate between adversarial training methods are marked in bold.
to a growth of OO-based methods in this field [45–50]. Like in CV, are the important dimensions that can differentiate between
GAN [51] and VAT [52,53] are also applied to adversarial training different adversarial training methods, and will be the subject of
in the NLP domain. our analysis.
In the source code domain, we are faced with the same prob- To ensure the comprehensiveness of our methodology, we
lem of inefficient transformation. However, perturbing vector have carefully selected the important dimensions. First, we col-
representation cannot be directly applied because source code, lected currently available works about adversarial training in the
compared to natural languages, is limited by stricter rules. Re- code domain that (a) claim to be proposing a novel adversarial
searchers have managed to design adversarial training methods training method; and (b) only utilize adversarial training (i.e., not
suitable for source code, and we will study these methods in this integrating it with other defensive methods). These criteria serve
paper. to include methods that are exactly relevant to adversarial train-
ing. The methods we finally included are listed in Section 3.4.
2.3. Adversarial defenses for source code Next, we examined their training process, and summarized the
typical steps of adversarial training. They are shown as Steps 1
In response to adversarial attacks, researchers naturally de- and 2 in Fig. 1. Then, we determined which ones of the typ-
signed countermeasures to enhance the robustness of models of ical steps qualify as important dimensions. We exclude those
source code. These measures can be classified based on the timing steps that are identical across all adversarial training methods,
when they are deployed (during training or testing), and on the and choose those that contain some differences as important
target (data or model). The first type of defense is data anomaly dimensions. In this way, we make sure that available adversarial
detection. This defense tries to identify and exclude anomalies training methods in the code domain are covered, and for each
or perturbations from the data before testing [14]. The second method, all steps that make a difference are included in the study.
type is data augmentation. This defense tries to add more (but To investigate how different existing adversarial training meth-
not necessarily adversarial) examples into the training set in an ods affect model performance, we first prepare the input data
attempt to adapt the model to possible variations of the original and model to be used in adversarial training. Next, we carry
data [20,54,55]. The third type is model enhancement. This defense out adversarial training using different methods, changing the
tries to improve the model itself, for example by introducing a learning sample obtaining strategy and loss computation method
score or threshold beyond which the prediction of the model to study their effect. Last, through experimental analysis, we
should be deemed unreliable [18]. The last type is adversarial compare the robust models and the normal models in terms of
training. This defense tries to add the worst-case perturbations of their performance. In this workflow, we study the effect of each of
the original data into the training set [14–21]. It is different from the important dimensions by controlling variables. When carrying
data augmentation in that it requires learning the worst (i.e., most out adversarial training, we change one dimension at a time so
likely to mislead the model) examples. In this paper, we focus on that any difference in the performance of the resulting model can
this last type of defense, since it is the most widely used. be attributed to that dimension exclusively. In addition, we will
use statistical testing (elaborated in Section 4.1) to support the
3. Methodology conclusions we reach about their effects. In doing so, we ensure
the validity of our methodology.
This paper aims to study the impact of different existing ad- In the rest of this section, we will elaborate on the three
versarial training methods on the performance of neural models steps of our workflow, and derive a categorization of existing
of source code, under two tasks. Particularly, the tasks in question adversarial training methods based on the four dimensions we
are code authorship attribution and code summarization. We defined.
have chosen these two tasks because they are the most studied
tasks in adversarial training in the code domain, according to our 3.1. Step 1: Input preparation
survey. We will conduct our study from these four dimensions:
(a) input data, (b) input model, (c) learning sample obtaining In this step, we prepare the input data and input model
strategy during training, and (d) loss computation. Fig. 1 shows used in adversarial training. Different adversarial training meth-
the workflow of this paper, in which there are three steps: input ods vary in these two dimensions, so we will prepare different
preparation, adversarial training, and experimental analysis. In data and models to study how various training methods impact
the first two steps, some elements are marked in bold. These performance.
167
3.1.1. Preparing model The Min–max strategy requires maximizing the loss of individ-
First, we prepare the input model, which includes two parts: ual samples while minimizing the expectation of the losses on all
pre-training and parameter-tuning. Models can be divided into samples through gradient descent. In our experiments, to study
two types according to their degree of training: the effect of learning sample obtaining strategy, we manually
modified the training code written by Henke et al. [15] for the
• Non-pre-trained model: The model is untrained and the
weights of the model are newly initialized. Most adversarial code summarization task to make it randomly pick perturbed
training methods train from scratch with this type of model. samples from the input instead of picking the loss-maximizing
• Pre-trained model: The model has been trained before and ones. In this way, we force the training to adopt the directly-
the weights of the model have been updated by previ- sample strategy to compare it with the original Min–max strat-
ous training. In this case, adversarial training is actually egy. For the code authorship attribution task, we write two ver-
fine-tuning a pre-trained model. sions of training code, which is based on the implementation by
Quiring et al. [56], that utilize the two strategies respectively.
We will compare the performance of these two types of mod-
Note that the Min–max strategy does not require on-line
els. For the pre-trained type of model, we need to prepare it in
generation of perturbed samples. While on-line generation is
this step by training a normal model on clean data. Furthermore,
commonly adopted in the image domain, it is usually very com-
for a category of adversarial training known as Adversarial
putationally expensive in the domain of code. We refer the reader
Fine-tuning, which uses pre-trained models, the number of
epochs to fine-tune (which is a hyperparameter of the input to the discussion in Section 5 for detail. Therefore, one solution
model) may have an effect on the final model’s performance. We is to generate the perturbed samples before training, pick the
will also tune this hyperparameter in this step to study its effect. ones with the maximum losses among these during training,
and then regenerate the perturbed samples for picking after a
3.1.2. Preparing data certain interval (e.g., every 2 epochs). This is how Henke et al. [15]
We prepare the input data, which again includes two parts: addressed the problem, and we will adopt this solution too.
preparing adversarial and perturbed examples, and synthesizing
the training set. In addition to clean data, some adversarial train-
3.2.2. Loss computation
ing methods require adversarial examples, while others further
Adversarial training methods vary in their loss computation.
require perturbed examples (i.e., examples that result from failed
There are mainly two types of loss:
attacks). In this step, we need to prepare these examples. This
can be done by attacking a normal model (e.g., the pre-trained • Ordinary loss: This type of loss does not differentiate be-
model we have just trained) on the training set. Note that certain tween samples.
conditions must be met for the attack chosen, which will be • Composite loss: In this type of loss, some terms are com-
described in Appendix. puted on clean samples, others are computed on adversar-
After preparing the training examples, we synthesize the train- ial or perturbed samples. The terms are assigned certain
ing set. Different adversarial training methods require different weights and then added together as the final loss.
training sets, but they can be roughly classified into two types:
In our experiments, we manually modified the training code
• Clean data + adversarial data: The training set consists of by Henke et al. [15] and Quiring et al. [56] to study the effect
original clean samples and adversarial examples. of loss computation. Additionally, we will conduct an ablation
• Clean data + adversarial data + perturbed data: The train- study on the Min–max + Composite Loss method to evaluate
ing set consists of original clean samples, adversarial exam-
whether the two components combine to achieve a better result
ples, and perturbed examples.
than when they are used alone.
Thus, in this step, we need to arrange the adversarial and per- There are some caveats when doing adversarial training. First,
turbed examples we have just prepared so that they fit different when experimenting on the ‘‘Directly-sample + Ordinary loss’’
adversarial training methods. method (which we will later name as Mixing Directly) under
the code summarization task, we found that the training does
3.2. Step 2: Adversarial training not converge. Therefore, we modified the training code and in-
troduced gradient clipping to address the problem. Second, not
In this step, we carry out adversarial training using the data all clean samples can be successfully attacked to generate their
and model we prepared in the previous step. We will obtain adversarial version, while some clean samples may correspond
a robust model when this step is finished. In the training loop to multiple adversarial versions due to the application of dif-
of adversarial training, there are two dimensions differentiating ferent transformations. Thus, for the ‘‘Composite Loss’’ and
various adversarial training methods: learning sample obtaining ‘‘Adversarial Fine-tuning’’ methods, we will use the origi-
strategy and loss computation. In this step, we will change them nal clean sample when there is no adversarial version available,
to study their effect. and we will randomly pick one if there are multiple adversarial
versions. Besides, we assign a weight of 0 to clean samples when
3.2.1. Learning sample obtaining strategy
doing Adversarial Fine-tuning, because this type of training
In the existing adversarial training methods, we see two types
requires fine-tuning on adversarial examples only.
of strategy for obtaining the samples to learn in the current
training iteration:
3.3. Step 3: Experimental analysis
• Directly-sample strategy: This strategy directly samples
from the input and learns the sampled data points as they
are. Now that we have trained all the necessary models, in this
• Min–max strategy: This strategy applies transformations step, we attack these models on the testing set, and then evaluate
on the samples from the input, and/or picks the ones that their performance with certain metrics. We aim to answer the
maximize the loss to learn. research questions and gain insights through evaluation.
168
3.3.1. Attacking on testing set

TP
We need to attack the normal and robust models on the Recall = . (5)
testing set. This is different from what we did when preparing TP + FN
adversarial data in Step 1 because now we are attacking on F1-score is defined as the harmonic mean of Precision (P) and
the testing set instead of the training set. The conditions for Recall (R), as shown below:
attack (see Appendix) still apply, and under the code authorship 2 P ·R
attribution task, the default random seed is used when attacking. F1 -score = =2· . (6)
P −1 + R−1 P +R
3.3.2. Evaluating with metrics Note that for the authorship attribution task, each class has
We analyze the results with the help of certain metrics. The its own F1-score, and we will report the average of the F1-score
metrics we use include accuracy, F1-score, untargeted attack on all classes. For the code2seq model used in the code summa-
success rate (ASRunt ), and, for code authorship attribution only, rization task, the prediction given by the model is a sequence
targeted attack success rate (ASRtar ). Accuracy is defined as the of words which represent the name that the model gives to the
proportion of correctly predicted samples to the total number of input function or method. For example, given a function that
samples in the testing set, as formally shown below: implements the binary search algorithm, a code2seq model may
give the prediction ‘‘binary search’’. Each word in the predic-
N
1 ∑ tion is known as a subtoken. To better assess the performance
Accuracy = 1 (M (xi ) = yi ) , (1) of code2seq, we report two versions of accuracy and ASRunt :
N
i=1 method-level and subtoken-level. Method-level means that in the
where N is the total number of samples in the testing set, M(·) definition of the metric in question, all ‘‘samples’’ or ‘‘examples’’
is the model, xi is the ith sample in the testing set, yi is the label are interpreted as ‘‘methods’’. Subtoken-level means that in the
corresponding to xi , and 1(·) is the indicator function. definition of the metric in question, all ‘‘samples’’ or ‘‘examples’’
ASRunt is defined as the proportion of adversarial examples are interpreted as ‘‘subtokens’’. For example, method-level accu-
(generated by an untargeted attack) to the number of samples racy is defined as the proportion of correctly predicted methods
correctly predicted by the normal model, as formally shown to the total number of methods in the testing set.
below:
3.4. Categorization of adversarial training methods
N
1∑
ASRunt = 1 (M (xi ) = yi ∧ M (t(xi )) ̸ = yi ) , (2) This subsection provides a categorization of existing adversar-
S
i=1 ial training methods for models of code based on the four dimen-
where S is the number of samples correctly predicted by the sions we define. The categorization will serve as the basis for our
normal model, and t(·) is any of the applicable transformations. experiments. We first divide existing adversarial training meth-
ASRtar is defined as the proportion of adversarial examples ods, according to their learning sample obtaining strategy be-
(generated by a targeted attack) to the total number of samples cause it determines the main process of training, into two major
in the testing set. An adversarial example in the targeted sense categories: Data-Augmentation-like (DA-like) and Optimization-
must be predicted by a model to be a certain class or result (the Objective-based (OO-based). DA-like methods are relatively
target) designated by the attacker. The formal definition of ASRtar straightforward. They first require training a normal model for
is: attacking on. The goal of the attack is to obtain adversarial
N
examples, which will then be combined with the clean training
1 ∑ ( set in some way to train the final robust model. During training,
1 M (t (xi )) = y∗i ,
)
ASRtar = (3)
N the samples to learn come directly from the input data. OO-based
i=1
methods formulate the defense as an optimization problem. They
where y∗i is the target designated by the attacker corresponding require designing an objective function that reflects the robust-
to xi . We only assess ASRtar under the authorship attribution task, ness of the model, and attempt to maximize or minimize this
and leave the assessment of ASRtar for the code summarization objective by generating or choosing the best examples from
task to future work because to the best of our knowledge, as of the input data to learn on the fly. Thus, during training, the
the time of writing of this paper, there is no publicly available training set constantly varies to adapt to the updates of the model
open-source targeted attack against the code2seq model used in weights. These two categories may be further broken down,
our code summarization experiments. as shown in Table 1. We will describe the methods in detail
F1-score is a classical metric in machine learning. It is calcu- presently, and the notations used in the descriptions are given
lated based on Precision and Recall. Precision and Recall, in turn, in Table 2.
are calculated based on the True Positives (TP), False Positives
(FP), and False Negatives (FN). For the purposes of the code sum- 3.4.1. Data-augmentation-like methods
marization task, a TP is a word that appears in both the prediction
Mixing Directly. This method trains a normal model M on the
and the true label. An FP is a word that appears in the prediction clean training set D, and carries out an attack on it, obtaining the
but not in the true label. An FN is a word that appears in the true adversarial examples set Dadv . It then mixes D and Dadv directly,
label but not in the prediction. These definitions are the same as forming a new training set D′ , and the final model M ∗ is obtained
used by Henke et al. [15]. For the authorship attribution task, a by training on D′ with the original model architecture and param-
multiclass classification problem, TP of a particular class is the eters. In this method, clean data and adversarial data are treated
number of samples predicted correctly to be this class. FP of a equally. In the computer vision domain, Szegedy et al. [27] first
particular class is the number of samples predicted to be this proposed this method. An example of this method applied to the
class but are actually not. FN of a particular class is the number of code domain is seen in Zhang et al. [17].
samples wrongly predicted to be some class other than this class Theoretically, the advantages of this method are that it is
in question. Formally, Precision and Recall are defined as: simple to understand and easy to implement. It can be employed
TP without on-line training, so it is also efficient in terms of training
Precision = , (4) time. On the other hand, its disadvantage is that it cannot adapt
TP + FP
169
Table 1
Categorization of existing adversarial training methods for models of code.
Category Dimensions Examples
Input data Input model Learning sample Loss computation
obtaining strategy
Mixing Directly Clean Non-pre-trained Directly-sample Ordinary Zhang et al. [17]
+ Adversarial
Composite Loss Clean Non-pre-trained Directly-sample Composite Li et al. [20],
+ Adversarial Ablation study on
DA-like Min–max +
Composite Loss
Adversarial Fine-tuning Clean Pre-trained Directly-sample Ordinary Yefet et al. [14],
+ Adversarial Springer et al.
[16],
Yang et al. [21]
Min–max + Composite Loss Clean Non-pre-trained Min–max Composite Yefet et al. [14],
+ Adversarial Henke et al. [15],
+ Perturbed Srikant et al. [19]
OO-based
Min–max Clean Non-pre-trained Min–max Ordinary Bielik et al. [18],
+ Adversarial Ablation study on
+ Perturbed Min–max +
Composite Loss
Table 2 3.4.2. Optimization-objective-based methods

Notations used in describing adversarial training methods.
Notation Definition Min–max + Composite Loss. This method models adversarial
training as the optimization problem:
D Clean training set
Dadv Adversarial training set argmin E maxL(w, t(x), y), (8)
xD A sample from the clean training set w∈H (x,y)∼D t ∈T
xadv A sample from the adversarial training set
xtrans A sample generated by a code transformation (not where T is the set of applicable transformations, and t(x) is syn-
necessarily adversarial) onymous with xtrans in our discussion. It tries to (re)generate or
M Normal model trained on a clean training set choose the loss-maximizing sample xtrans to approximate the max
M∗ Robust model obtained through adversarial training part, while doing gradient descent on the generated or chosen
samples as well as the clean samples to satisfy the min part. Dur-
ing training, the losses of xD and xtrans are calculated and added
to the model’s variation during training because the training set together. Madry et al. [33] first proposed this method in the image
is generated based on attacking the non-robust model, and does domain. Yefet et al. [14], Henke et al. [15] and Srikant et al. [19]
not change as the training progresses. In addition, it may decrease applied this method to the domain of code. Yefet et al. [14]
the model’s accuracy on the clean data. use gradient ascent and BFS to generate xtrans that maximizes
the loss, using one of two transformations during training, and
Composite Loss. This method trains a normal model M on the then train on the generated samples. Henke et al. [15] use one
clean training set D, and carries out an attack on it, obtaining of eight transformations, regenerate all xtrans every two epochs,
the adversarial examples set Dadv . For each sample xD in D, an and then train on the generated or regenerated samples with the
adversarial example xadv corresponding to xD is randomly chosen maximum loss. Srikant et al. [19] adopt a similar process to that
to calculate the composite loss: of Henke et al. [15], but they formulate the problem as a joint
optimization problem that needs to be solved by determining the
L′ = α ∗ L(xD , yD ) + β ∗ Ladv (xadv , yD ), (7) perturbation site and value.
where L is the original loss, Ladv is the adversarial loss, and α Theoretically, the advantage of this method is that it adapts to
and β are weights assigned to the clean data or the adversarial the constant change of the model during training by dynamically
data. Then the training proceeds in this composite-loss way until generating or choosing the sample that maximizes the loss with
it converges or reaches a certain condition. respect to the current state of the model to learn. However, it is
very time-consuming, which is one of its disadvantages.
Theoretically, the advantage of this method lies in the fact
that it can maintain a high accuracy on the clean data, while also Min–max. This method is largely identical with Min–max + Com-
improving the model’s robustness, as a result of striking a balance posite Loss, except that only the loss of xtrans is calculated and
between clean and adversarial data during training. It has the used to update model weights. In other words, for each clean
same disadvantage of unadaptability as with Mixing Directly. sample, a perturbed version is generated to replace it, rather than
augment it. In the domain of code, Bielik et al. [18] adopt this
Adversarial Fine-tuning. This method trains a normal model
method in the variable type inference task.
M on the clean training set D, and carries out an attack on it, Theoretically, the advantages and disadvantages of this
obtaining the adversarial examples set Dadv . Then, for each xD , it method are similar to those of Min–max + Composite Loss.
chooses an adversarial version xadv from Dadv and trains on it for However, this method may also harm the accuracy on clean data.
one epoch to obtain M ∗ . The applications of this method in the
code domain are seen in Yefet et al. [14], Springer et al. [16], and 4. Experimental results
Yang et al. [21].
Theoretically, the advantage of this method is that it can train In this section, we present and discuss our experimental re-
on newly seen samples without completely training from scratch, sults. Through the discussions, we answer the following Research
thus reducing the training time. It also has the same disadvantage Questions (RQs) relating to the four dimensions we identify in the
of unadaptability as Mixing Directly. previous section:
170
Table 3
Datasets used for the two tasks in our experiments.
Task Dataset # Authors # Programs/Methods
(Training:Testing)
Authorship attribution GCJ C++ 204 1428:204
Code summarization java-small – 150000:20000
• RQ1: How does learning sample obtaining strategy during

training impact model performance? Different adversarial
training methods may have different strategies for obtaining
the samples to be learned in the current training iteration.
Some methods directly sample from the input and learn
them as they are, while others first perturb samples from
the input, and/or pick those that maximize the loss to learn.
We will empirically evaluate these two strategies and study
their impact on model performance.
• RQ2: How does loss computation impact model perfor-
mance? Some adversarial training methods use the ordi-
Fig. 2. Distribution of F1-score using the Mixing Directly method under code
nary loss, where all terms are computed against all samples summarization.
equally, while others use a composite loss, where some
terms are computed against clean samples, other terms
against adversarial or perturbed samples. We seek to study
a high performance under this configuration, and so a grid search
the impact of different loss functions on model performance.
is not necessary.
• RQ3: How does input data impact model performance?
Some adversarial training methods require the input to be Datasets. We use one dataset for each task. The datasets we use
adversarial examples, while others require both adversarial are listed in Table 3. For authorship attribution, we use the GCJ
and perturbed examples (i.e., examples that result from C++ dataset, in agreement with Abuhamad et al. [8] and Quiring
failed attacks). Even within the same class of examples, the et al. [56]. GCJ is a programming contest held by Google, and
composition may be different in other ways (e.g., type of the GCJ C++ dataset includes code written by the contestants. For
perturbations may vary). We seek to study the impact that code summarization, we use the java-small dataset, in agreement
certain properties of the input data has on model perfor- with Henke et al. [15]. The java-small dataset is prepared by
mance. the authors of code2seq, and is suitable for code summarization.
• RQ4: How does input model impact model performance? We carry out k-fold cross validations for authorship attribution
Most adversarial training methods begin with a non-pre- (where k = 8) and for code summarization (where k = 10). Specif-
trained model, but a type of training known as Adver- ically, we divide the original dataset into k subsets. For each fold,
sarial Fine-tuning continues training on a pre-trained we use k-1 subsets for training and the remaining 1 subset for
model. Besides, for Adversarial Fine-tuning, the num- testing. We make sure each subset is used once for testing. We
ber of epochs to fine-tune (which is a hyperparameter of will report the average performance of all folds in the following
the input model) may have an effect on the final model’s subsections. For targeted attack in authorship attribution, we can-
performance. We seek to study the impact that the input not carry out cross validation because this type of attacks takes a
model has on final model performance. very long time to complete. For one specific model on one fold,
it will take 3 days on average on our machine. We have 8 folds
Furthermore, we carry out an ablation study on the Min–max and more than 20 models, considering the various dimensions
+ Composite Loss method to verify its two components com- outlined in the previous section. Therefore, cross validation is
bine to produce a better performance than when they are used prohibitively expensive for targeted attacks. As a result, for ASRtar
individually. The results will be given when we explore RQ1 and under authorship attribution, we will only report the results for
RQ2. We have published our implementation of experiments on the first fold.
GitHub at https://github.com/AdvTrainEvalSrcDomain/adv-train-
evaluation-source-code-domain. Data distribution. To facilitate statistical testing when answering
the RQs, we demonstrate the distribution of F1-score on all folds
4.1. Experimental setup under code summarization, in Fig. 2. We repeat on the first two
folds once to gather more data points for more accurate analysis.
It is evident that the data points are normally distributed, which
Tasks and Models. We experiment on two tasks: code authorship
justifies the use of statistical t-tests. Due to the enormous amount
attribution and code summarization. In authorship attribution,
of data, we take the distribution when the model is trained
the model is given a program (a single file containing source
using Mixing Directly as an example to plot the figure. The
code), and is expected to predict its author correctly. We use
remainder of the data on all metrics and tasks has been tested
the model proposed by Abuhamad et al. [8] and implemented by
and found to follow similar distribution.
Quiring et al. [56] for this task. We will refer to it as Abuhamad. In
code summarization, the model is given a snippet of code (usually Statistical testing. We conduct paired t-tests to validate the
a function or a method), and is expected to summarize its func- comparison results. First, we formulate hypotheses regarding the
tionality by, for example, predicting its function name. We use the performance of the two methods under comparison. Second, we
code2seq model for this task. We use default hyperparameters pair the two methods’ F1-scores or ASRunt for each of all folds,
for both models, except that for Abuhamad, we fix the maximum and then for each pair, we calculate the mean and the standard
number of training epochs to 300 and the number of RNN units error of its difference. Next, we compute the value of t by dividing
to 288 because we have observed that the model always achieves the absolute value of the mean by the standard error. Then, we
171
Table 4 Table 5
Comparison of model performance of adversarial training methods using dif- Statistical testing for learning sample obtaining strategy.
ferent learning sample obtaining strategies and loss computation. For code Model (task) Hypotheses Conclusions
summarization, method-level metrics are shown on the left side of the slashes;
subtoken-level metrics are on the right. F1-score: Reject H0
H0: Mixing Directly ≥ Min–max
Model (task) Adv. Train. Accuracy F1-score ASRunt ASRtar
category (%) (%) (%) H1: Mixing Directly < Min–max
No defense 10.2/29.8 43.4 91.0/36.7 – method-level ASRunt : Reject H0
Mixing 10.6/27.5 39.5 75.5/25.0 – H0: Mixing Directly ≥ Min–max
Directly
DA-like H1: Mixing Directly < Min–max
Comp. Loss 10.5/31.2 42.6 80.7/30.0 –
code2seq (Ablation) subtoken-level ASRunt : Accept H0
(Code Summ.) Min–max 11.6/31.9 44.0 75.3/30.9 – H0: Mixing Directly ≥ Min–max
+ Comp. Loss H1: Mixing Directly < Min–max
OO-based
Min–max 10.8/31.0 42.9 81.8/21.8 – code2seq
(Ablation) F1-score: Reject H0
(Code Summ.) H0: Composite Loss ≥ Min–max + Comp.
No defense 81.3 77.7 71.9 10.2
Mixing 81.8 78.0 43.7 0.8
Loss
Directly H1: Composite Loss < Min–max + Comp.
DA-like Loss
Comp. Loss 83.8 80.1 60.0 4.7
Abuhamad (Ablation)
(Author. Attrib.) method-level ASRunt : Accept H0
Min–max 81.9 77.6 49.3 5.1
+ Comp. Loss H0: Composite Loss ≥ Min–max + Comp.
OO-based
Min–max 5.4 3.6 92.9 0.0 Loss
(Ablation) H1: Composite Loss < Min–max + Comp.
Loss
subtoken-level ASRunt : Accept H0
H0: Composite Loss ≥ Min–max + Comp.
look up the p-value in the table for t-distribution and compare it Loss
with our predefined significance level α = 0.05. Finally, we reach H1: Composite Loss < Min–max + Comp.
conclusions about our hypotheses. We will give an example of Loss
how one of the tests is conducted in RQ1. For subsequent tests, F1-score: Reject H0
we will only state our hypotheses and conclusions. H0: Mixing Directly ≤ Min–max
H1: Mixing Directly > Min–max
4.2. RQ1: Impact of learning sample obtaining strategy ASRunt : Reject H0

H0: Mixing Directly ≥ Min–max
H1: Mixing Directly < Min–max
In this subsection, we evaluate the impact of learning sample
F1-score: Reject H0
obtaining strategy on model performance. To that end, we also Abuhamad
(Author. Attrib.) H0: Composite Loss ≤ Min–max + Comp.
conduct an ablation study on Min–max + Composite Loss to Loss
investigate the necessity of its incorporating the Min–max strat- H1: Composite Loss > Min–max + Comp.
egy. In the tables demonstrating the results, we will highlight Loss
in bold the figures that represent the best performance in each ASRunt : Reject H0
column for each task. For the code summarization task, we will H0: Composite Loss ≤ Min–max + Comp.
show the values of subtoken-level metrics on the right side of Loss
H1: Composite Loss > Min–max + Comp.
the slashes. For ease of description, we will refer to accuracy and Loss
F1-score collectively as ‘‘effectiveness’’, and to ASRunt and ASRtar
collectively as ‘‘robustness’’. Note that lower ASRunt and ASRtar
indicate higher robustness.
Table 4 demonstrates the performance of models trained using study the effect of data volume in RQ2. Additionally, as part of the
different learning sample obtaining strategies. DA-like methods ablation study, we also note that Min–max + Composite Loss
adopt the sample-directly strategy, and OO-based methods adopt performs better than Composite Loss.
the Min–max strategy. For the code summarization task, as the As for the authorship attribution task, most adversarial train-
results show, all adversarial training methods boost model ro- ing methods boost model robustness, but effectiveness is harmed
bustness effectively, and OO-based methods perform generally to varying degrees. To our surprise, DA-like methods perform
better than DA-like methods. This can be accounted for by the better in terms of effectiveness, and Mixing Directly, which
fact that OO-based methods adapt the model to the worst per- is a DA-like method, performs best in terms of robustness. This
turbations, and require the perturbed examples to be regenerated method not only achieves an accuracy on par with Min–max +
regularly to ensure the examples are up-to-date as the training Composite Loss, an OO-based method, but also secures a lower
progresses, so OO-based methods generalize better to unseen ASRunt (43.7% vs 49.3%). This may be accounted for by the limited
perturbations. In contrast, DA-like methods do not change the
times of perturbed examples regeneration we use in this task for
perturbed examples during training, so they may be less prepared
OO-based methods. We will study its impact presently. As part of
in the face of new perturbations. Among OO-based methods,
the ablation study, we note that Min–max + Composite Loss, in
Min–max + Composite Loss performs best on virtually all met-
comparison with Composite Loss, has a better ASRunt and ASRtar
rics, be it effectiveness or robustness. Compared with using no
defense, it manages to boost accuracy, while significantly de- but lower accuracy.
creasing the method-level ASRunt by 15.7%. In terms of subtoken- For statistical testing, we formulate six hypotheses for code
level ASRunt , Min–max, which is an OO-based method, performs summarization and four hypotheses for authorship attribution.
best with a drop by 14.9%. However, it is worth noting that The hypotheses are shown in Table 5. We observe that for code
Mixing Directly harms subtoken-level accuracy and F1-score summarization, OO-based methods perform better in effective-
more than any other method. This could be because it requires ness, but for authorship attribution, the DA-like method is more
learning much more adversarial examples than other methods, effective. In terms of robustness, for code summarization, there
but it does not take both clean and adversarial examples into is evidence that OO-based methods are better, but for authorship
account when computing loss, so accuracy is harmed. We will attribution, the DA-like method is better. To sum up, the results
172
Table 6 saturated accuracy. For the latter, the model continues to learn
Comparison of Mixing Directly and Min–max + Composite Loss with differ- and improve as the training progresses, so the robustness keeps
ent number of times of perturbed examples regeneration, code summarization
rising as more adversarial examples are regenerated for learning.
task.
Model (task) Adv. Train. # Times of Accuracy F1-score ASRunt
category regeneration (%) (%)
Mixing – 10.6/27.5 39.5 75.5/25.0
code2seq Directly
(Code Summ.) 9 11.6/31.9 44.0 75.3/30.9
Min–max
+ Comp. Loss 2 12.6/31.9 43.4 80.3/32.7
are mixed, and there does not seem to be a learning sample To sum up the discussion, we present:
obtaining strategy which is decisively better.
As an example, we describe the testing process for the F1-
scores of Mixing Directly and Min–max under code summa-
rization. First, we decide we want to test whether the F1-score of
Mixing Directly is greater than or equal to that of Min–max, or
less than it, which gives us two hypotheses H0 and H1. Second,
we calculate the mean and the standard error of the difference
between the F1-scores of the two methods for each pair (fold):
∑10
∆F 1i
∆F 1 = i=1 = −3.45, (9)
10


1 ∑10 4.3. RQ2: Impact of loss computation
σ =√ (∆F 1i − ∆F 1)2 = 1.06, (10)
10 In this subsection, we evaluate the impact of loss computation
i=1
σ on model performance. To that end, we also conduct an ablation
σ∆F 1 = √ = 0.34. (11) study on the Min–max + Composite Loss method to investigate
10 the necessity of its incorporating the composite loss.
Next, we compute the value of t: Table 4 shows the performance of models trained using dif-
ferent loss functions. As the results demonstrate, under code
| ∆F 1 |
t= = 10.29. (12) summarization task, for DA-like methods, composite loss per-
σ∆ F 1 forms worse than ordinary loss (Mixing Directly) in that the
Then, we obtain the p-value from the t-distribution table and method-level and subtoken-level ASRunt rise by 5.2% and 5.0%
compare it with the significance level: respectively, suggesting a decline in model robustness. However,
composite loss performs better than ordinary loss (Mixing Di-
p < 0.001 < α = 0.05. (13) rectly) in terms of subtoken-level accuracy and F1-score. For
Finally, because the p-value is less than the significance level, OO-based methods, the results demonstrate that the composite
we reach a conclusion: the hypothesis H0 is rejected, which loss (Min–max + Composite Loss) performs better than the
means this test supports the statement that the F1-score of Mix- ordinary loss (Min–max) in terms of both effectiveness and ro-
ing Directly is less than that of Min–max under code summa- bustness. These phenomena may be explained by the fact that
rization, which is highly statistically significant. composite loss takes both clean and adversarial examples into
consideration, so the model tries to strike a balance between
effectiveness and robustness during training, which can possibly
lead to a lower robustness and a higher effectiveness. We also
suspect the worse performance of composite loss for DA-like
methods is due to a difference in data volume, which we will
investigate presently.
For the authorship attribution task, we observe a trend similar
to that under the code summarization task. For DA-like meth-
ods, composite loss performs worse than ordinary loss (Mixing
Directly) in robustness but better in effectiveness. For OO-
Reflecting on the training process for Min–max + Compos- based methods, we observe something peculiar: composite loss
ite Loss, we wonder if the number of times that perturbed (Min–max + Composite Loss) performs better than ordinary loss
examples are regenerated has an impact on model performance. (Min–max) in terms of accuracy and ASRunt , but worse in terms
Due to limited time, we only regenerated perturbed examples of ASRtar . Besides, the accuracy of ordinary loss (Min–max) drops
twice in the authorship attribution task, but we regenerated nine so significantly (to 5.4%) that the model prediction is not reliable
times (which is the default) in the code summarization task. at all. This may account for the 0.0% ASRtar in Min–max, because
To verify our intuition, we try to regenerate only twice in code the MCTS attack will not attack samples that are misclassified at
summarization and see if this has an impact. The results are the beginning, so there are little samples that could be attacked
shown in Table 6. We find that for Min–max + Composite Loss, to begin with, let alone those that could be targeted-attacked
when we reduce the number of times of regeneration to two, the successfully. We speculate that this could be because in this
model’s ASRunt experiences an obvious rise. However, changing method, the model training is guided by adversarial examples
the number of times of regeneration seems to have no substantial only, and the examples to learn change very frequently, so the
impact on effectiveness. This may be explained by the fact that model has trouble converging stably to an optimal solution. In
in Min–max + Composite Loss, clean samples are unchanged, contrast, in Min–max + Composite Loss, training is guided by
while adversarial examples are constantly changing, so the model both unchanged clean samples and changing adversarial exam-
may learn the former well in a relatively short time, leading to a ples, so there is some stability in the examples to learn. However,
173
Table 7 Table 8
Statistical testing for loss computation. Comparison of mixing all adversarial examples versus one adversarial example
Model (task) Hypotheses Conclusions for each clean sample, for the Mixing Directly method. For code sum-
marization, method-level metrics are shown on the left side of the slashes;
F1-score: Reject H0 subtoken-level metrics are on the right.
H0: Mixing Directly ≥ Composite Loss
H1: Mixing Directly < Composite Loss category (%) (%) (%)
method-level ASRunt : Reject H0 No defense 10.2/29.8 43.4 91.0/36.7 –

H0: Mixing Directly ≥ Composite Loss Mixing Directly 10.6/27.5 39.5 75.5/ 25.0 –
(Adding all adv.
H1: Mixing Directly < Composite Loss examples)
subtoken-level ASRunt : Reject H0 code2seq Mixing Directly 8.4/29.4 42.3 77.2/26.8 –
(Code Summ.) (One adv. example
H0: Mixing Directly ≥ Composite Loss for each clean
H1: Mixing Directly < Composite Loss sample)
code2seq
F1-score: Reject H0 Composite Loss 10.5/31.2 42.6 80.7/30.0 –
(Code Summ.)
H0: Min–max ≥ Min–max + Comp. Loss No defense 81.3 77.7 71.9 10.2
H1: Min–max < Min–max + Comp. Loss Mixing Directly 81.8 78.0 43.7 0.8
(Adding all adv.
method-level ASRunt : Reject H0 examples)
H0: Min–max ≤ Min–max + Comp. Loss Abuhamad Mixing Directly 88.4 85.0 50.0 2.4
(Author. Attrib.) (One adv. example
H1: Min–max > Min–max + Comp. Loss
for each clean
subtoken-level ASRunt : Accept H0 sample)
H0: Min–max ≤ Min–max + Comp. Loss Composite Loss 83.8 80.1 60.0 4.7
H1: Min–max > Min–max + Comp. Loss
F1-score: Reject H0 Table 9
H0: Mixing Directly ≥ Composite Loss Statistical testing for Mixing Directly and Composite Loss with the same
H1: Mixing Directly < Composite Loss data volume.
ASRunt : Reject H0 Model (task) Hypotheses Conclusions
H0: Mixing Directly ≥ Composite Loss F1-score: Accept H0
H1: Mixing Directly < Composite Loss H0: Mixing Directly (1 adv. for 1 clean) ≤ Composite Loss
Abuhamad H1: Mixing Directly (1 adv. for 1 clean) > Composite Loss
(Author. Attrib.) F1-score: Reject H0 method-level ASRunt : Reject H0

code2seq
H0: Min–max ≥ Min–max + Comp. Loss H0: Mixing Directly (1 adv. for 1 clean) ≥ Composite Loss
(Code Summ.)
H1: Mixing Directly (1 adv. for 1 clean) < Composite Loss
H1: Min–max < Min–max + Comp. Loss
subtoken-level ASRunt : Reject H0
ASRunt : Reject H0 H0: Mixing Directly (1 adv. for 1 clean) ≥ Composite Loss
H0: Min–max ≤ Min–max + Comp. Loss
F1-score: Reject H0
H1: Min–max > Min–max + Comp. Loss H0: Mixing Directly (1 adv. for 1 clean) ≤ Composite Loss
Abuhamad H1: Mixing Directly (1 adv. for 1 clean) > Composite Loss
(Author. Attrib.) ASRunt : Reject H0
H0: Mixing Directly (1 adv. for 1 clean) ≥ Composite Loss
this degradation of performance does not appear in code sum-
marization, which may indicate that it is model-dependent. As
part of our ablation study, this confirms the necessity of using a
composite loss in Min–max + Composite Loss. performance of these two methods when they receive the same
For statistical testing, we formulate six hypotheses for code amount of data to learn. Concretely, we prepare a training set in
summarization and four hypotheses for authorship attribution. which each clean sample corresponds to at most one adversarial
The hypotheses are shown in Table 7. We observe that in terms version (some clean samples may not have adversarial versions
of effectiveness, composite loss usually performs better than or- because of failed attacks). Then, we train on this training set using
dinary loss. In terms of robustness, when using DA-like methods, Mixing Directly.
ordinary loss performs better, whereas for OO-based methods, The results are shown in Table 8. We observe that as far as
there is more evidence supporting the claim that composite loss DA-like methods are concerned, even with the same amount of
is better. training data, ordinary loss still performs better than composite
loss in terms of robustness. Meanwhile, the results for effective-
ness are divided: for code summarization, composite loss is more
effective, but for authorship attribution, ordinary loss prevails.
We speculate this may be because in ordinary loss, a clean sample
and its adversarial version are treated separately, so there may be
two backpropagations. On the other hand, in composite loss, the
clean sample and its adversarial version are treated as one, their
losses added together, so there is only one backpropagation. More
backpropagations mean more updates of the model weights and
more accurate fitting of the input, and therefore may explain the
higher robustness of ordinary loss. As for the lower effectiveness,
However, before we jump to the conclusion that composite
loss does not combine well with DA-like methods, we would it may be that the code2seq model needs more samples than the
like to take the issue of data volume into account. In DA-like Abuhamad model does in order to achieve higher effectiveness,
methods, Mixing Directly, which uses the ordinary loss, re- which is also why when we add all adversarial examples into the
quires that all adversarial examples be mixed into the training training set, the former model sees a rise in effectiveness. For OO-
set, but Composite Loss requires that for each clean sample, based methods, however, with both submethods, there is only
one adversarial version of it be chosen to learn. These particular one backpropagation because Min–max only learns adversarial
requirements of existing adversarial training methods have led examples, so composite loss manifests its strengths.
to a difference in the volume of the training data. If one method For statistical testing, we formulate three hypotheses for code
has more data to learn than the other, it is possible that it may summarization and two hypotheses for authorship attribution.
have benefited from this fact. Therefore, we also investigate the The hypotheses are shown in Table 9. The conclusions confirm
174
In answering RQ1, we find that OO-based methods perform

Table 10
better than DA-like methods under the code summarization task,
Comparison of including perturbed examples versus excluding perturbed exam-
ples for DA-like methods. For code summarization, method-level metrics are but we wonder if this is because OO-based methods include
shown on the left side of the slashes; subtoken-level metrics are on the right. perturbed examples in the input training set. Therefore, we also
category (%) (%) (%) add perturbed examples into the input for DA-like methods to
No defense 10.2/29.8 43.4 91.0/36.7 – investigate the impact of input data. The results are shown in
Mixing 10.6/27.5 39.5 75.5/25.0 – Table 10. We observe that under the authorship attribution task,
Directly
Mixing Directly 9.2/28.5 40.2 85.0/30.1 – adding perturbed examples improves the model performance by
code2seq + pert. examples
(Code Summ.) DA-like
a bit, but does not make any substantial difference. Under the
Composite 10.5/31.2 42.6 80.7/30.0 –
Loss code summarization task, we observe a similar trend: no substan-
Composite Loss 11.2/31.5 42.8 82.4/30.8 – tial improvement is made by including perturbed examples into
+ pert. examples
No defense 81.3 77.7 71.9 10.2
the training set.
Mixing 81.8 78.0 43.7 0.8 For statistical testing, we formulate six hypotheses for code
Directly
summarization and four hypotheses for authorship attribution.
Mixing Directly 81.6 78.0 41.4 0.5
Abuhamad + pert. examples The hypotheses are shown in Table 11. The tests overwhelmingly
(Author. Attrib.) DA-like
Composite 83.8 80.1 60.0 4.7 support the claim that including perturbed examples for DA-
Loss
Composite Loss 80.1 75.4 54.9 2.9 like methods does not help either effectiveness or robustness for
+ pert. examples
code summarization. Under authorship attribution, robustness is
enhanced by a bit at the cost of effectiveness.
Table 11
Statistical testing for including perturbed examples.
Model (task) Hypotheses Conclusions
F1-score: Accept H0
H0: Mixing Directly ≥ Mixing Directly + pert. examples
H1: Mixing Directly < Mixing Directly + pert. examples
method-level ASRunt : Reject H0
code2seq F1-score: Accept H0
(Code Summ.) H0: Composite Loss ≥ Composite Loss + pert. examples
H1: Composite Loss < Composite Loss + pert. examples
In addition, we also wonder if the types of transformations
H0: Composite Loss ≥ Composite Loss + pert. examples used in generating the input adversarial examples have an impact
on model performance. Is there a type of transformation which,
H0: Composite Loss ≥ Composite Loss + pert. examples when applied to generate adversarial examples, will be more
F1-score: Accept H0
effective in misleading the model than other types, irrespective of
the model or task? To answer that, we exclude all adversarial ex-
ASRunt : Accept H0 amples generated by each of the chosen transformations to study
Abuhamad H0: Mixing Directly ≥ Mixing Directly + pert. examples
(Author. Attrib.) H1: Mixing Directly < Mixing Directly + pert. examples their effect. For code summarization, the chosen transforma-
F1-score: Reject H0 tions are RenameFields, RenameLocalVariables, RenamePa-
H0: Composite Loss ≤ Composite Loss + pert. examples
H1: Composite Loss > Composite Loss + pert. examples rameters, AddDeadCode, InsertPrintStatements, Unroll-
ASRunt : Reject H0 Whiles, WrapTryCatch, and ReplaceTrueFalse. For
H0: Composite Loss ≤ Composite Loss + pert. examples
H1: Composite Loss > Composite Loss + pert. examples authorship attribution, the chosen transformations are
DeclNam:variable, DeclNam:function, IncludeAdd, Type-
def:addtypedefs, For:for_to_while, While:while_
that with the same amount of training data, DA-like methods to_for, DataStr:truefalse_to_one_zero, and iwyu. See
combined with ordinary loss will be more robust than with com- Appendix for details on these transformations. The results are
posite loss. However, results are inconclusive as to their influence shown in Figs. 3 and 4. We only mount untargeted attacks in this
on effectiveness. experiment. This is because we have observed in the previous
To sum up the discussion, we present: investigations that the ASRtar often has a trend in line with that
of the ASRunt , and so it suffices to examine ASRunt and reach
conclusions about robustness.
We find that practically all transformations contribute posi-
tively to model robustness, and removing one of most of them
will result in a drop in robustness. Under the authorship attribu-
tion task, iwyu (removing unused headers) has a great impact on
model robustness. For example, for Mixing Directly, excluding
iwyu results in a sharp rise in ASRunt by 18.5% compared to
4.4. RQ3: Impact of input data using all transformations. This may be because for Abuhamad, a
token-based model, iwyu often changes important tokens cru-
In this subsection, we evaluate the impact of input data on
model performance. To that end, we approach the evaluation cial to the model’s prediction. When this type of transforma-
from two aspects: the impact of including perturbed examples tion is excluded from the training set, the model becomes more
in training and the impact of transformation types used in gen- vulnerable to these token-based perturbations. Under the code
erating adversarial examples. summarization task, it is InsertPrintStatements that has a
175
Fig. 4. Comparison of using different types of transformations in generating

adversarial examples for training, authorship attribution task. The minus sign
(-) means this type of transformation is excluded from the training set.
Fig. 3. Comparison of using different types of transformations in generating

adversarial examples for training, code summarization task. The minus sign (-)
means this type of transformation is excluded from the training set. To sum up the discussion, we present:
strong impact. For example, for Mixing Directly, excluding

InsertPrintStatements leads to a significant rise in method-
level ASRunt by 10.0%. Similarly, this may be accounted for because
InsertPrintStatements perturbs the AST of the code, which
may include some tree nodes that code2seq relies heavily on to
give its predictions. We also note that while IncludeAdd does
something similar to InsertPrintStatements, both adding
unused code into the program, it has little impact on robustness
in authorship attribution. This may indicate that attack strength
does not transfer between tasks or models. In other words, one
type of transformation may be very effective against a certain
task, but much less so against another. 4.5. RQ4: Impact of input model
In this subsection, we evaluate the impact of input model

on the final model’s robustness. To that end, we approach the
evaluation from two aspects: the impact of pre-trained models
and the impact of number of epochs for fine-tuning.
Unlike other adversarial training methods which begin train-
ing from a non-pre-trained model, a category of adversarial train-
ing known as ‘‘Adversarial Fine-tuning’’ continues train-
ing for one epoch on a pre-trained normal model, and only on
adversarial examples. Meanwhile, for each clean sample, only
one of its adversarial versions will be chosen for use in fine-
tuning. Because of the small number of fine-tuning epochs, this
176
Table 12
Comparison of using non-pre-trained model versus pre-trained model as input Table 13
model. For code summarization, method-level metrics are shown on the left side Statistical testing for input model.
of the slashes; subtoken-level metrics are on the right. Model (task) Hypotheses Conclusions
Model (task) Adv. Train. Accuracy F1-score ASRunt ASRtar F1-score: Reject H0
category (%) (%) (%) H0: Mixing Directly (1 adv. for 1 clean) ≥ Adversarial
Fine-tuning
No defense 10.2/29.8 43.4 91.0/36.7 – H1: Mixing Directly (1 adv. for 1 clean) < Adversarial
Mixing Directly 10.6/27.5 39.5 75.5/25.0 – Fine-tuning
Non-pre-trained
Model Mixing Directly 8.4/29.4 42.3 77.2/26.8 – method-level ASRunt : Accept H0
(One adv. example H0: Mixing Directly (1 adv. for 1 clean) ≥ Adversarial
code2seq Fine-tuning
(Code Summ.) for each clean
sample) H1: Mixing Directly (1 adv. for 1 clean) < Adversarial
Fine-tuning
Composite Loss 10.5/31.2 42.6 80.7/30.0 –
code2seq subtoken-level ASRunt : Reject H0
Pre-Trained Adversarial 10.5/30.3 43.6 78.0/27.8 – (Code Summ.) H0: Mixing Directly (1 adv. for 1 clean) ≥ Adversarial
Model Fine-tuning Fine-tuning
H1: Mixing Directly (1 adv. for 1 clean) < Adversarial
No defense 81.3 77.7 71.9 10.2
Fine-tuning
Mixing Directly 81.8 78.0 43.7 0.8
Non-pre-trained F1-score: Reject H0
Model Mixing Directly 88.4 85.0 50.0 2.4 H0: Composite Loss ≥ Adversarial Fine-tuning
Abuhamad (One adv. example H1: Composite Loss < Adversarial Fine-tuning
(Author. Attrib.) for each clean
sample)
H0: Composite Loss ≤ Adversarial Fine-tuning
Composite Loss 83.8 80.1 60.0 4.7 H1: Composite Loss > Adversarial Fine-tuning
Pre-Trained Adversarial 79.2 74.4 53.0 4.3 subtoken-level ASRunt : Reject H0
Model Fine-tuning H0: Composite Loss ≤ Adversarial Fine-tuning
H1: Composite Loss > Adversarial Fine-tuning
F1-score: Reject H0
H0: Mixing Directly (1 adv. for 1 clean) ≤ Adversarial
Fine-tuning
H1: Mixing Directly (1 adv. for 1 clean) > Adversarial
method guarantees a very short training time, but we wonder if Fine-tuning
it performs better in terms of effectiveness and robustness when ASRunt : Accept H0
Abuhamad H0: Mixing Directly (1 adv. for 1 clean) ≤ Adversarial
compared with training from scratch using non-pre-trained mod- (Author. Attrib.) Fine-tuning
els. Therefore, we compare the performance of Adversarial H1: Mixing Directly (1 adv. for 1 clean) > Adversarial
Fine-tuning
Fine-tuning and other DA-like methods which begin training F1-score: Reject H0
from a non-pre-trained model. Under the authorship attribution H0: Composite Loss ≤ Adversarial Fine-tuning
task, we fine-tune for 15 epochs. This is because unlike in code ASRunt : Reject H0
summarization, where the code2seq model only needs training H0: Composite Loss ≤ Adversarial Fine-tuning
for a total of 20 epochs, the Abuhamad model used in author-
ship attribution requires more epochs (in our case, 300) for it
to be fully trained. Likewise, to ensure the Abuhamad model is
sufficiently fine-tuned, we have chosen the number of epochs for
fine-tuning so that it is 1/20 that of the total number of epochs
that the pre-trained model has been trained for, i.e., we fine-tune
for 15 epochs based on a pre-trained model that has been trained
for 300 epochs. This is the same ratio as used in training the
code2seq model (i.e., 1 fine-tuned vs. 20 pre-trained).
Table 12 compares the model performance when using non-
pre-trained and pre-trained input models. We observe that for the
In Adversarial Fine-tuning, existing implementations all
code summarization task, pre-trained input models outperform
choose 1 as the number of fine-tuning epochs, but the rationale
non-pre-trained ones in terms of F1-score. With the same amount
is not clear, which makes us wonder if this number is optimal.
of input data (one adversarial example for each clean sample),
Thus, we change this number to study its impact. Under the code
Adversarial Fine-tuning performs better than Composite
Loss, and only a little worse than Mixing Directly in terms summarization task, we try fine-tuning for 1, 3, and 5 epochs.
of robustness. For the authorship attribution task, with the same Under the authorship attribution task, we try fine-tuning for 15,
amount of input data, Adversarial Fine-tuning (using pre- 30, and 45 epochs. The results are shown in Figs. 5 and 6. We
trained models) outperforms Composite Loss (using non-pre- find that in authorship attribution, the accuracy stays at about
trained models) in terms of robustness, but its effectiveness 80%, the ASRunt at about 52%, and the ASRtar about 4%. In code
is harmed. Compared with Mixing Directly (using non-pre- summarization, a similar trend is observed. For example, the
trained models) with the same amount of data, Adversarial subtoken-level accuracy and ASRunt stay at about 28%, and the
Fine-tuning attains a comparable robustness, but a lower ef- F1-score does not change much. In conclusion, while we cannot
fectiveness, which is acceptable considering its much less training assert that fine-tuning for 1 (or 15 for authorship attribution)
time. epoch(s) is the best, fine-tuning for more epochs does not lead
For statistical testing, we formulate six hypotheses for code to substantial improvement either. This may be accounted for by
summarization and four hypotheses for authorship attribution. To the fact that adversarial examples require imperceptibility, so the
ensure the same data volume, we use one adversarial example for perturbations are small. Thus, the model only needs a few number
each clean sample in Mixing Directly to compare with Adver- of epochs to adapt to them and learn well.
sarial Fine-tuning. The hypotheses are shown in Table 13.
We observe that under both tasks, Adversarial Fine-tuning
(using pre-trained models) is always more robust than Compos-
ite Loss (using non-pre-trained models) and usually slightly
less robust than Mixing Directly (using non-pre-trained mod-
els). Nothing can be concluded about the effectiveness of these
methods.
177
Fig. 5. Comparison of different numbers of epochs for fine-tuning, code summarization task.
for continuous space such as the image space. This constraint

ensures that the adversarial examples are as close to the original
as possible. For source code, there is no direct parallel to the
Lp norm. Even if a parallel is devised, e.g., by mapping tokens
to continuous vectors, perturbing a vector under the guidance
of gradient can result in a vector that does not correspond to
any valid token, due to the discrete nature of the code space.
Furthermore, source code is bound by lexical and syntactical
rules, which are absent in images and not so strict in NLP. Simply
replacing a keyword with its natural language synonym can break
the program. Existing methods largely work around this challenge
by utilizing semantics-preserving code transformations, failing to
satisfy the need to keep adversarial examples close to the original.
Second, code transformation cannot be done efficiently during
training (i.e., in an on-line manner). This is because code process-
ing tools used to transform code often parses the source code into
Fig. 6. Comparison of different numbers of epochs for fine-tuning, authorship
an internal representation that is different from what is expected
attribution task. in the training of a deep neural network. To implement on-line
transformation in the code domain would mean parsing the code
and translating back and forth between different representations
To sum up the discussion, we present: during training, which would be prohibitively expensive in terms
of time. Existing methods work around this by regenerating ad-
versarial examples after a certain interval (e.g., every 2 epochs),
and not for every epoch. However, as our experimental results
have shown, less frequent regeneration can result in drops in
robustness. Thus, how to attain both high robustness and high
efficiency remains an open question.
On the positive side, our insights may shed some light on
the second challenge. We find that adversarial fine-tuning takes
little time and has similar robustness to other methods. It may be
possible to combine it with OO-based methods to help efficiency.
For example, first pre-train a model using Mixing Directly,
adapting the model to common adversarial examples. Next, fine-
tune the model using Min–max + Composite Loss. Because
the model has already seen some adversarial examples, it may
5. Discussion achieve high robustness faster in the fine-tuning stage. We leave
the verification of our theory to future work.
5.1. Challenges and prospects
5.2. Limitations
Like the image domain, the domain of source code has applied
adversarial training to address the issue of robustness. However, There are several limitations to this paper. First, limited mod-
there are several unique challenges that must be resolved which els and datasets. We only evaluate two models (code2seq and
prevent researchers from directly applying methods from the Abuhamad) and two datasets (java-small and GCJ C++). It is es-
image domain. sential that more models and datasets be taken into account in
First, the space of source code is a discrete one. The standard future assessments of the adversarial training methods in the
practice of adversarial training in the image domain is PGD-AT. code domain.
PGD-AT requires solving the Min–max problem by perturbing Second, limited metrics. We use four metrics: accuracy, F1-
the image under an Lp norm constraint, which is feasible only score, untargeted and targeted attack success rate. However, only
178
three of those are applicable to both of the tasks considered in Appendix. Caveats for attack
this paper. As a result, we have limited metrics to resort to when
we compare the effectiveness and robustness across tasks. As we mentioned in Section 3, certain conditions must be
Last, regenerated perturbed examples. In training using the met for the attack chosen. First, the attack must be semantics-
Min–max + Composite Loss method under authorship attri- preserving. Second, the attack must have a granularity that
bution, we only regenerated the perturbed examples twice. For matches that of the model under attack. For example, an attack
more accurate evaluation, perturbed examples should be regen- that contains a transformation that splits a function in two is
erated nine times for the model of the Min–max + Composite probably not fit for a model whose granularity is at the function
Loss method to match the training under code summarization. level, since the model will now view the original function as
two separate, unrelated functions, and this will negatively and
6. Conclusion unfairly affect the model’s performance. Third, the attack must
be effective against the code representation of the model. For
This paper carries out a comprehensive evaluation on the example, an attack that contains only one transformation that
adversarial training methods used in the domain of code. Con- changes the names of certain variables in the program may be
cretely, we first collect and examine existing adversarial training effective against token-based models, but less so against AST-
methods and classify them into five categories based on the four based models. Last, it is advised that the attack contain diverse
dimensions we define. Then, we raise research questions relating transformations in order to better evaluate model robustness.
to the four dimensions and conduct experiments on each of the In our experiments under the code summarization task, we
five categories of methods. Finally, we are able to reach con-
employ eight transformations used by Henke et al. [15]:
clusions as to the effects of learning sample obtaining strategy,
loss functions, input data, and input model, which together char- • AddDeadCode: Adds a statement of the form if (false)
acterize an adversarial training method. We find that different { ... }, which contains unused code.
learning sample obtaining strategies may suit different tasks, and • RenameLocalVariables: Replaces the name of a local
that OO-based methods combined with composite loss can attain variable in the program.
a better robustness than with ordinary loss. We also find that the • RenameParameters: Replaces the name of a function pa-
quality of adversarial examples, rather than perturbed examples, rameter in the program.
is key to adversarial training, and that pre-trained models can • RenameFields: Replaces the name of a referenced field in
achieve relatively good performance with a short training time. the program.
We believe this paper provides a systematic approach for the • ReplaceTrueFalse: Replaces a boolean literal.
study of adversarial training methods in the code domain, and • UnrollWhiles: Unrolls the body of a while loop exactly
can motivate future assessments and improvements of adversar- one step.
ial training in this domain. The limitations of this work lie in • WrapTryCatch: Wraps the program by a try-catch state-
limited datasets, models and metrics, and regenerated perturbed ment.
examples. Future works may extend the vision of this paper by • InsertPrintStatements: Inserts a print statement at a
collecting more datasets, and selecting more models and metrics random location.
for experimentation. Moreover, more time should be devoted to
regenerating perturbed examples for more accurate evaluation of For the code authorship attribution task, we have to use a
OO-based methods. different attack because the one by Henke et al. [15] does not
support the C++ language. We choose 8 out of 36 transformations
CRediT authorship contribution statement in the MCTS attack implemented by Quiring et al. [56] in order to
approximate the attack strength under the code summarization
Zhen Li: Conceptualization, Methodology, Project administra- task. We try to choose transformations that are the same types
tion. Xiang Huang: Methodology, Software, Validation, Investiga- as those used in the code summarization task. However, not all
tion, Writing – original draft. Yangrui Li: Validation, Investiga- transformations in code summarization have a counterpart in
tion. Guenevere Chen: Writing – review & editing. authorship attribution. In such cases, we choose transformations
that are similar (e.g., While:while_to_for vs UnrollWhiles,
Declaration of competing interest both of which affect while loops) or strong (e.g., iwyu, which
removes unused headers, a strong attack for the token-based
The authors declare that they have no known competing finan- Abuhamad model). The transformations we finally use for the
cial interests or personal relationships that could have appeared code authorship attribution task are as follows.
to influence the work reported in this paper.
• DeclNam:variable: Replaces the name of a variable de-
Data availability clared in the program.
• DeclNam:function: Replaces the name of a function de-
We have shared the link to our implementation in the clared in the program.
manuscript. • IncludeAdd: Adds unused headers.
• Typedef:addtypedefs: Adds unused typedefs.
Acknowledgments • For:for_to_while: Changes a for loop into a while
loop.
We thank the anonymous reviewers for their insightful com- • While:while_to_for: Changes a while loop into a for
ments, which guided us in improving the paper. This work was loop.
supported by the National Natural Science Foundation of China • DatStr:truefalse_to_one_zero: Changes the boolean
under Grant No. 62272187 and the Natural Science Foundation literals true and false into integer literals 1 and 0.
of Hebei Province, China under Grant No. F2020201016. Any • iwyu: Removes unused headers.
opinions, findings, conclusions or recommendations expressed in
this work are those of the authors and do not reflect the views of Note that we limit the perturbation budget – maximum num-
the funding agencies in any sense. ber of sites the attack is allowed to perturb – to 3. Concretely,
179
we filter out those samples which have been perturbed in more [15] J. Henke, G. Ramakrishnan, Z. Wang, A. Albarghouth, S. Jha, T. Reps,
than 3 sites when the attack is finished, because the nature of Semantic robustness of models of source code, in: Proceedings of the
29th IEEE International Conference on Software Analysis, Evolution and
the MCTS attack does not allow us to directly set a limit on its
Reengineering, IEEE, Honululu, HI, United States, 2022, pp. 526–537.
number of perturbed sites during attack. We have to limit the [16] J.M. Springer, B.M. Reinstadler, U.-M. O’Reilly, STRATA: simple, gradient-
perturbation budget because there is no upper limit as to how free attacks for models of code, 2020, arXiv preprint arXiv:2009.
many sites can be perturbed in the MCTS attack by default, which 13562.
violates the imperceptibility of perturbations that commonly de- [17] H. Zhang, Z. Li, G. Li, L. Ma, Y. Liu, Z. Jin, Generating adversarial examples
for holding robustness of source code processing models, in: Proceedings
fines adversarial examples. We have set the budget to 3 to ensure of the 34th AAAI Conference on Artificial Intelligence, AAAI, New York, NY,
the success rate. A lower budget would have caused the success United States, 2020, pp. 1169–1176.
rate to be much lower and therefore we could not have obtained [18] P. Bielik, M. Vechev, Adversarial robustness for code, in: Proceedings of
enough adversarial examples. the 37th International Conference on Machine Learning, ACM, 2020, pp.
896–907, Virtual.
Besides, under the code summarization task, for each original
[19] S. Srikant, S. Liu, T. Mitrovska, S. Chang, Q. Fan, G. Zhang, U.-M.
sample, each one of the eight transformations will be applied O’Reilly, Generating adversarial computer programs using optimized ob-
to generate a total of eight transformed samples for use in the fuscations, in: Proceedings of the 9th International Conference on Learning
training. However, under the code authorship attribution task, for Representations, 2021.
each original sample, the MCTS attack chooses the best one out of [20] Z. Li, G.Q. Chen, C. Chen, Y. Zou, S. Xu, RoPGen: Towards robust code
authorship attribution via automatic coding style transformation, in: Pro-
eight transformations to apply, generating only one transformed ceedings of the 44th IEEE/ACM International Conference on Software
sample. In this case, there will not be enough training samples for Engineering, IEEE, 2022, pp. 1906–1918.
the types of adversarial training that use the Min–max strategy. [21] Z. Yang, J. Shi, J. He, D. Lo, Natural attack for pre-trained models of code,
To cope with this problem, we force the MCTS attack to generate 2022, arXiv preprint arXiv:2201.08698.
[22] X. Li, H. Ma, L. Meng, X. Meng, Comparative study of adversarial training
slightly different transformed samples by modifying its random
methods for long-tailed classification, in: Proceedings of the 1st Interna-
seed used to initialize the attack. In addition to the random seed tional Workshop on Adversarial Learning for Multimedia, ACM, New York,
used by the original author, we use three other seeds for both NY, United States, 2021, pp. 1–7.
untargeted and targeted attack. [23] H. Ma, X. Li, L. Meng, X. Meng, Comparative study of adversarial training
methods for cold-start recommendation, in: Proceedings of the 1st Interna-
tional Workshop on Adversarial Learning for Multimedia, ACM, New York,
References NY, United States, 2021, pp. 28–34.
[24] S. Zhang, S. Chen, X. Liu, C. Hua, W. Wang, K. Chen, J. Zhang, J. Wang,
[1] U. Alon, S. Brody, O. Levy, E. Yahav, code2seq: Generating sequences from Detecting adversarial samples for deep learning models: a comparative
structured representations of code, 2018, arXiv preprint arXiv:1808.01400. study, IEEE Trans. Netw. Sci. Eng. 9 (1) (2021) 231–244.
[2] M. Allamanis, H. Peng, C. Sutton, A convolutional attention network [25] M.R.I. Rabin, N.D. Bui, K. Wang, Y. Yu, L. Jiang, M.A. Alipour, On the gener-
for extreme summarization of source code, in: Proceedings of the 33rd alizability of neural program models with respect to semantic-preserving
International Conference on Machine Learning, ACM, New York, NY, United program transformations, Inf. Softw. Technol. 135 (2021) 106552.
States, 2016, pp. 2091–2100. [26] L. Applis, A. Panichella, A. van Deursen, Assessing robustness of ML-based
[3] U. Alon, M. Zilberstein, O. Levy, E. Yahav, code2vec: Learning distributed program analysis tools using metamorphic program transformations, in:
representations of code, Proc. ACM Program. Lang. 3 (POPL) (2019) 1–29. Proceedings of the 36th IEEE/ACM International Conference on Automated
[4] L. Mou, G. Li, L. Zhang, T. Wang, Z. Jin, Convolutional neural networks Software Engineering, IEEE, 2021, pp. 1377–1381.
over tree structures for programming language processing, in: Proceedings [27] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R.
of the 30th AAAI Conference on Artificial Intelligence, AAAI, Phoenix, AZ, Fergus, Intriguing properties of neural networks, in: Proceedings of the 2nd
United States, 2016, pp. 1287–1293. International Conference on Learning Representations, Banff, AB, Canada,
[5] J. Zhang, X. Wang, H. Zhang, H. Sun, K. Wang, X. Liu, A novel neural 2014.
source code representation based on abstract syntax tree, in: Proceedings [28] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial
of the 41st International Conference on Software Engineering, ACM/IEEE, examples, in: Proceedings of the 3rd International Conference on Learning
Montreal, Quebec, Canada, 2019, pp. 783–794. Representations, 2015.
[6] V.J. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis, Deep learning type [29] A. Sinha, Z. Chen, V. Badrinarayanan, A. Rabinovich, Gradient adversarial
inference, in: Proceedings of the 26th ACM Joint Meeting on European training of neural networks, 2018, arXiv preprint arXiv:1806.08028.
Software Engineering Conference and Symposium on the Foundations of [30] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial machine learning at scale,
Software Engineering, ACM, Lake Buena Vista, FL, United States, 2018, pp. 2016, arXiv preprint arXiv:1611.01236.
152–162. [31] R. Huang, B. Xu, D. Schuurmans, C. Szepesvári, Learning with a strong
[7] J. Schrouff, K. Wohlfahrt, B. Marnette, L. Atkinson, Inferring javascript types adversary, 2015, arXiv preprint arXiv:1511.03034.
using graph neural networks, 2019, arXiv preprint arXiv:1905.06707. [32] C. Lyu, K. Huang, H.-N. Liang, A unified gradient regularization family
[8] M. Abuhamad, T. AbuHmed, A. Mohaisen, D. Nyang, Large-scale and for adversarial examples, in: Proceedings of the 15th IEEE International
language-oblivious code authorship identification, in: Proceedings of the Conference on Data Mining, IEEE, 2015, pp. 301–309.
25th ACM SIGSAC Conference on Computer and Communications Security, [33] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep
ACM, Toronto, Canada, 2018, pp. 101–114. learning models resistant to adversarial attacks, in: Proceedings of the
[9] A. Caliskan-Islam, R. Harang, A. Liu, A. Narayanan, C. Voss, F. Yamaguchi, 6th International Conference on Learning Representations, Vancouver, BC,
R. Greenstadt, De-anonymizing programmers via code stylometry, in: Canada, 2018.
Proceedings of the 24th USENIX Conference on Security Symposium, [34] U. Shaham, Y. Yamada, S. Negahban, Understanding adversarial training:
Washington, D.C., United States, 2015, pp. 255–270. Increasing local stability of supervised models through robust optimization,
[10] B. Alsulami, E. Dauber, R. Harang, S. Mancoridis, R. Greenstadt, Source code Neurocomputing 307 (2018) 195–204.
authorship attribution using long short-term memory based networks, in: [35] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel,
Proceedings of the 22nd European Symposium on Research in Computer Ensemble adversarial training: Attacks and defenses, in: Proceedings of the
Security, Springer, Oslo, Norway, 2017, pp. 65–82. 6th International Conference on Learning Representations, 2018.
[11] M. Brockschmidt, M. Allamanis, A.L. Gaunt, O. Polozov, Generative code [36] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, Deepfool: A simple and
modeling with graphs, in: Proceedings of the 7th International Conference accurate method to fool deep neural networks, in: Proceedings of the 29th
on Learning Representations, New Orleans, LA, United States, 2019. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp.
[12] J. Li, Y. Wang, M.R. Lyu, I. King, Code completion with neural attention 2574–2582.
and pointer networks, in: Proceedings of the 27th International Joint Con- [37] H. Wang, C.-N. Yu, A direct approach to robust deep learning using
ference on Artificial Intelligence, Morgan Kaufmann, Stockholm, Sweden, adversarial networks, in: Proceedings of the 6th International Conference
2018, pp. 4159–4165. on Learning Representations, 2018.
[13] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, Y. Zhong, Vuldeepecker: [38] D. Stutz, M. Hein, B. Schiele, Disentangling adversarial robustness and gen-
A deep learning-based system for vulnerability detection, in: Proceedings eralization, in: Proceedings of the 32nd IEEE/CVF Conference on Computer
of the 25th Annual Network and Distributed System Security Symposium, Vision and Pattern Recognition, 2019, pp. 6976–6987.
ISOC, San Diego, CA, United States, 2018, pp. 259–273. [39] T. Miyato, S.-i. Maeda, M. Koyama, S. Ishii, Virtual adversarial training: A
[14] N. Yefet, U. Alon, E. Yahav, Adversarial examples for models of code, Proc. regularization method for supervised and semi-supervised learning, IEEE
ACM Program. Lang. 4 (OOPSLA) (2020) 1–30. Trans. Pattern Anal. Mach. Intell. 41 (8) (2018) 1979–1993.
180
[40] J. Li, S. Ji, T. Du, B. Li, T. Wang, TextBugger: Generating adversarial [56] E. Quiring, A. Maier, K. Rieck, Misleading authorship attribution of source
text against real-world applications, in: Proceedings of the 26th Annual code using adversarial learning, in: Proceedings of the 28th USENIX
Network and Distributed System Security Symposium, 2019. Conference on Security Symposium, USENIX Association, Santa Clara, CA,
[41] Y. Zang, F. Qi, C. Yang, Z. Liu, M. Zhang, Q. Liu, M. Sun, Word-level textual United States, 2019, pp. 479–496.
adversarial attacking as combinatorial optimization, in: Proceedings of the
58th Annual Meeting of the Association for Computational Linguistics,
2020, pp. 6066–6080.
[42] R. Jia, P. Liang, Adversarial examples for evaluating reading comprehension Zhen Li received the Ph.D. degree in Cyberspace Secu-
systems, in: Proceedings of the 2017 Conference on Empirical Methods in rity at Huazhong University of Science and Technology,
Natural Language Processing, 2017, pp. 2021–2031. Wuhan, China, in 2019. She was a Postdoctoral Fellow
[43] H. Zhang, H. Zhou, N. Miao, L. Li, Generating fluent adversarial examples at the University of Texas at San Antonio, USA, from
for natural languages, in: Proceedings of the 57th Annual Meeting of the 2019 to 2021. She is an associate professor at Hebei
Association for Computational Linguistics, 2019, pp. 5564–5569. University, Baoding China, and also an associate profes-
[44] I. Staliūnaitė, P.J. Gorinski, I. Iacobacci, Improving commonsense causal sor at Huazhong University of Science and Technology,
reasoning by adversarial training and data augmentation, in: Proceedings Wuhan, China. She is a member of the IEEE and a mem-
of the 35th AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. ber of the ACM. Her research interests mainly include
13834–13842. software security and artificial intelligence security.
[45] D. Wang, C. Gong, Q. Liu, Improving neural language modeling via ad-
versarial training, in: Proceedings of the 36th International Conference on
Xiang Huang received the B.E. degree in Computer Sci-
Machine Learning, PMLR, 2019, pp. 6555–6565.
ence and Technology from Foshan University, Foshan,
[46] C. Zhu, Y. Cheng, Z. Gan, S. Sun, T. Goldstein, J. Liu, FreeLB: Enhanced
China, in 2020. He is currently pursuing the M.E. degree
adversarial training for natural language understanding, in: Proceedings of
in Computer Technology at Hebei University, Baoding,
the 8th International Conference on Learning Representations, 2020.
China. His primary research interests lie in software
[47] J. Ebrahimi, D. Lowd, D. Dou, On adversarial examples for character-
security.
level neural machine translation, in: Proceedings of the 27th International
Conference on Computational Linguistics, 2018, pp. 653–663.
[48] H. Liu, Y. Zhang, Y. Wang, Z. Lin, Y. Chen, Joint character-level word
embedding and adversarial stability training to defend adversarial text,
in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, Vol.
34, 2020, pp. 8384–8391.
[49] M. Yasunaga, J. Kasai, D. Radev, Robust multilingual part-of-speech tagging Yangrui Li is currently pursuing the bachelor’s de-
via adversarial training, in: Proceedings of the 2018 Conference of the gree in Cyberspace Security at Huazhong University of
North American Chapter of the Association for Computational Linguistics: Science and Technology, Wuhan, China. Her primary
Human Language Technologies, Vol. 1 (Long Papers), 2018, pp. 976–986. research interests lie in software security.
[50] J.Y. Yoo, Y. Qi, Towards improving adversarial training of NLP models, in:
Findings of the Association for Computational Linguistics: EMNLP 2021,
2021, pp. 945–956.
[51] D. Kang, T. Khot, A. Sabharwal, E. Hovy, Adventure: Adversarial training
for textual entailment with knowledge-guided examples, in: Proceedings of
the 56th Annual Meeting of the Association for Computational Linguistics,
Association for Computational Linguistics (ACL), 2018, pp. 2418–2428.
[52] H.-K. Poon, W.-S. Yap, Y.-K. Tee, W.-K. Lee, B.-M. Goi, Hierarchical gated
recurrent neural network with adversarial and virtual adversarial training
Guenevere Chen received the Ph.D. degree in Electri-
on text classification, Neural Netw. 119 (2019) 299–312.
cal and Computer Engineering from Mississippi State
[53] L. Li, X. Qiu, Token-aware virtual adversarial training in natural language
University, Mississippi State, MS, USA, in 2014. She is
understanding, in: Proceedings of the 35th AAAI Conference on Artificial
an assistant professor with the Department of Elec-
Intelligence, Vol. 35, 2021, pp. 8410–8418.
trical and Computer Engineering, University of Texas
[54] R. Compton, E. Frank, P. Patros, A. Koay, Embedding java classes with
at San Antonio, San Antonio, TX, USA. Her primary
code2vec: Improvements from variable obfuscation, in: Proceedings of the
research area is autonomic computing and cyber se-
17th International Conference on Mining Software Repositories, IEEE/ACM,
curity. Her research topics include human factors and
Seoul, Republic of Korea, 2020, pp. 243–253.
their impacts on cybersecurity, blockchain, health-
[55] D. Wang, Z. Jia, S. Li, Y. Yu, Y. Xiong, W. Dong, X. Liao, Bridging pre-
care information system and IoMT security, intelligent
trained models and downstream tasks for source code understanding, in:
transportation system security, industrial control sys-
Proceedings of the 44th IEEE/ACM International Conference on Software
tems security (SCADA and IIoT), software vulnerability detection, and end-to-end
Engineering, IEEE, 2022, pp. 287–298.
security solutions.
181

A Comparative Study of Adversarial Training Method - 2023 - Future Generation Co

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Comparative Study of Adversarial Training Method - 2023 - Future Generation Co

Uploaded by

Copyright:

Available Formats

Future Generation Computer Systems 142 (2023) 165–181

Contents lists available at ScienceDirect

Future Generation Computer Systems

A comparative study of adversarial training methods for neural models

3.3.1. Attacking on testing set

Table 2 3.4.2. Optimization-objective-based methods

• RQ1: How does learning sample obtaining strategy during

4.2. RQ1: Impact of learning sample obtaining strategy ASRunt : Reject H0

method-level ASRunt : Reject H0 No defense 10.2/29.8 43.4 91.0/36.7 –

(Author. Attrib.) F1-score: Reject H0 method-level ASRunt : Reject H0

In answering RQ1, we find that OO-based methods perform

Fig. 4. Comparison of using different types of transformations in generating

Fig. 3. Comparison of using different types of transformations in generating

strong impact. For example, for Mixing Directly, excluding

In this subsection, we evaluate the impact of input model

for continuous space such as the image space. This constraint

You might also like